Tuesday, March 18, 2008

Analysis of the persistence layer from the SCEA 5 perspective

The technology grew up too fast and there's a lot of different ways to perform data persistence in an Enterprise application, you can do a simple INSERT sql, like:

INSERT INTO Customer VALUES (1, 'Isaac', 'Newton')

or you could use a more sophisticated way to do that, with some ORMs frameworks, but... before to talk about the benefits of one over the other, and which one should you pick in the real life or in the SCEA, lets explain every word.

ORM (stands for Object relationship management): The definition from wikipedia says:

"Object-Relational Mapping (aka ORM, O/RM, and O/R mapping) is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages. This creates, in effect, a "virtual object database" which can be used from within the programming language."
but what does this means in the real life of an application?, ok... this is really simple, instead of doing this:

INSERT INTO Customer VALUES (1, 'Isaac', 'Newton')

you will do:

Customer cust = new Customer();
cust.setName('Isaac');
cust.setLastName('Newton');

somepersistenceframework.doPersist(cust);


its very clear in the second sample that the LastName is Newton and the First name is 'Issac', futhermore you don't have to care about the type mapping. What will happen if your customer has a birthday and you want to save it? lets see:

INSERT INTO Customer VALUES (1, 'Isaac', 'Newton', '1642/Dec/25');

but... will be the '1642/Dec/25' a valid SQL format? I don't think so... ok, lets fix it:

INSERT INTO Customer VALUES (1, 'Isaac', 'Newton', '1642/12/25');

Now... will this gone a work? it depends on which database you are working on, if the DB is SQL Server then it will, but if this is an Oracle DB then... nope... why? in Oracle you must specified the Date format as: "TO_DATE('YYYY/MM/DD', '1642/12/25')", then you will need to hard code your formatting options in your code if you want to create a good application.

But, what if you use ORM? then your application should look like:

Customer cust = new Customer();
cust.setName('Isaac');
cust.setLastName('Newton');
cust.setBirthDate(new java.util.Date(1642, 11, 25)); // or use a java.util.Calendar to get a Date

somepersistenceframework.doPersist(cust);


its easier right? ok, this is what ORM means, create objects and let the framework (or container) do the work of synchronize the object with the underlying DB. (this will be done in two ways, to persist the data like INSERTS, UPDATEs or DELETEs and also will be done to retrieve the data).

Note here: btw, I don't forget the use of Named parameters in the javax sql statements (that will fix the date format problem), I just put it as an example, but if you will judge me... ok, lets change the problem, what if the problem is with the identities? in SQL you have it but in Oracle you must implement it as Sequence in a dual table, then you will need code a different method for SQL and Oracle... or what will you say about the BLOB datas? or... I think I made my point in here :D


The evolution of the concept

One of the earliest implementation of this ORM was JDO as a framework and EJB 1.1 as a J2EE standard, in the EJB 1.0 the first approach of this was something called BPM (Bean Persistence Management), which means: "Separate all the SQLs from the Data and the container will let the object know when to synchronize"... so... what's the difference with the DAO Pattern? the main difference is that the server will do all the caching work and will let you know when the data must be saved or retrieved, this will ensure that your application will be scalable, and it will have a good performance (off course, everything depends on the quality of your SQL statements), also it's easily maintained. Then, how this works in our previous sample? here its is:

the Customer class will implement a two new methods named "ejbStore" and "ejbLoad", with these new methods your class should call a DAO class which will implement the persistence.

The only advantage here is the separation of concerns and, off course, the scalability, etc. But the developers quickly hated it, just because it was a lot of work and seems not to powerful. (but it was!)

With this problems in EJB 1.0 the JDO was a good start point for a really ORM framework, the main idea in here is that you will create the objects and an XML which will tell the framework how it will persist this object. JDO could work as a standalone framework, this will able your simple applications to use a persistence framework and forget the behaviour of the persistence layer, this was one of the biggest advantages over EJB.

The big disadvantage in EJB was that the developer should care about the implementation issues I talked about (the problem with formats like Oracle and SQL Server dates), so the developers were angry about that.

Then Sun came out with the CMP (Container Management persistence Entities) which has the advantage that it will take care of the SQL stuff and you will only need to create the entities, the promised land came to earth... but was not the best thing in the world, in fact Sun recommended that if you will need performance then you need to go back to the BMP because the algorithm was not the best and so the performance. That changed in EJB 2.0 and 2.1, were the new Entities had a very good performance (even better than if you write your own code), but they were too complex and the people still had in mind the poor performance in the earlier version.

Recently Sun launched EJB 3.0 based on a new ORM framework (JPA) and a new specification (mainly based over the current persistence frameworks like hibernate and JDO).

Ok, that was the quick flash back, lets move to the main of this discussion, comparison between the different persistence approaches. (maybe I miss some points in the history or omitted some details)

Analysis of the persistence layer from the SCEA 5 perspective

Ok, first of all... let me tell you something... the test will rely on the new JEE specification, so you will not need to be worried about CMP or BMP (why? just because they disappeared and were replaced by JPA). So the question will be, what is the best for my application: JPA, DAOs, JDBC, etc. in front of this topics: ease of development, performance, scalability, extensibility, and security?

The JPA is a framework based on the idea of POJOs (Plain Old JavaObjects) which means: "Leave the Customer class as you have it and the container (or framework) will take care of the persistence", the only thing you need to do its to provide some information about your object to the container, how you do that? if you're using JPA you will do that using Annotations, if you're using Hibernate then you will have the option to put it in a mapping xml file. Lets see how the Customer class will be if you put the JPA annotations:



import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Customer implements java.io.Serializable {
@Id
private int id;

private String name;
private String lastName;
private Date birthDate;

public int getId() {
return id;
}

public void setId(int id) {
this.id = id;
}

public String getName() {
return name;
}

public void setName(String name) {
this.name = name;
}

public String getLastName() {
return lastName;
}

public void setLastName(String lastName) {
this.lastName = lastName;
}

public Date getBirthDate() {
return birthDate;
}

public void setBirthDate(Date birthDate) {
this.birthDate = birthDate;
}


}



and how can you retrieve and save a Customer? just doing a minimum work:

put this as a variable in your Session Bean:

@PersistenceContext(unitName="BizAgi.RenderPU")
private EntityManager em;


and the persist the Customer using:

em.persist(c);

so, if you ask me what will be the easier method for development be my guess... the EJB 3.0 its really easy and far more maintainable than the previous EJB 2.1, what about the performance? the performance its not really a problem of one Entity or the way you access your db, I think its a problem of how did you solve your design issues before start the development, but from the SCEA perspective... again EJB 3.0 its faster because it has a lot of improves in performance, cache things and reuse, unless you create your own framework to persist and take care of all the cache issues and object queries EJB 3.0 will be faster every time. Off course, if you compare a simple statement over EJB 3.0 the statement will be faster than the entities, but... that's not the normal behaviour in an Enterprise application, you will have several tables, and several rows on them.

What about scalability? ok, the container ensures that your data will be replicated over all the nodes in the cluster, so you will not need to add anything else to keep the clusters up to date, in this way you could add new nodes to the array and made your application more scalable writing zero code lines, what about JDBC? it is not scalable? mmm scalable relies on how you would add new customers, the jdbc protocol will not be a limitation, the JDBC problem relies on the data cache, with the JDBC aproximation you will need to be in touch with your db to keep synchronization between the nodes, and this will be very expensive. In this way, the JDBC its scalable yes, but will lead you to a performance problem because the lack of cache. And JDO? well JDO and EJB 3 are pretty much the same thing (in the approach concepts) the difference relies again in the cache service, the framework will not replicate the changes and this will generate some problems. DAO will solve any of this? really... it does not, why? just because DAO its an implementation pattern and it will be implemented using JDBC, EJB 3 or JDO as the persistence layer.

In the extensibility context which one is best? all of the approaches are extensibles, but which one will needs more time? I think JDBC, just because the addition of new fields will need a lot of changes, UPDATE, INSERT, FIND and DELETE statements could change, in the EJB 3 you will need to add a new variable to the class and create its accessors (getters and setters) and thats it. Again EJB 3 wins the battle over JDBC, but what about JDO or hibernate? in terms of extensibility JDO and hibernate has a disadvantage, the mapping xml file, you need to do one more step over EJB 3, put the mapping data in that file (beware, hibernate has a new implementation that will get the entity information from the annotations, so hibernate will be as extensible as EJB 3 is).

And Security? ok, let me define security... do you need a user and password to create a connection with the db? for all the approaches the answer will be YES, so where is the difference? as far as I see none.

Conclusion

In the SCEA you will not be answered to choose between EJB 3, EJB 2.1 (CMP or BMP), JDO or Hibernate, it will assume if you are doing the SCEA test you will be approach your answers with his framework JPA, so you will be asked about how to ensure transactions or change things, you will not be asked about which one is the best solution. (From Sun's perspective EJB 3 is the best... off course... I do either :D)

Hope this article solves all the questions about the persistence you could have in order to pass the test, if you have any further questions about it please let me know and I will edit this article to solve some of them.

6 comments:

Rashmi said...

Thanks Cross, still working on this and sure would have tons of qs end of the week.

Appreciate your help.

Thanks
Rashmi

Cross said...

Good Penguin, please let me know if you have any further questions, btw in order to promote this site please share the link with your friends or forum communities.

Omer said...

Hi corss, I stumbled on your wiki from the SCEA 5 Google group. I'm thinking of doing the exam this year. Do you recommend any books for the exam?. I had a look around and all the books cover the old exam so my guess I need one of those books + EJB3 book. Let me know please

Thanks
Omer

Cross said...

Hi Omer, from my experience its easy to pass the exam with the old books and some knowledge in the new things like EJB3, JPA, etc.

I think you should be ready reading the guides for the old SCEA and taking a good read on the following topics:
1. EJB (take a look of the java tutorial 5 and you will be ok)
2. JEE Components:
- take a read about JSF, how it works and how to use it in your design
3. Design Patterns: Some of the patterns could change, and some new could appeared. So take a good read on the blueprints patterns, GoF, etc.

I hope this help you.

Rashmi said...

Cross a q, we know that JDO is kind of ORM so why do we talk abt comparing JDO with ORM, initially this really confused me.

After I saw your post, though I now know I am right but still don't get it.

Thanks
Rashmi

Cross said...

Hi Rashmi, yes you're right, JDO is an implementation of the ORM concept, take a look of this: http://db.apache.org/jdo/jdo_v_jpa_orm.html you will see the following: "There are 2 prevalent specification in the Java ORM world. JDO2 provides the most complete definition, whilst JPA is the most recent." this comes from the JDO's creators.

From the SCEA perspective you will not have any problem, because they will never ask you to explain the difference or something like that. But it's very important to understand this definition as an Architect professional.