Java Persistence Performance: no-sql

Monday, April 2, 2012

EclipseLink JPA supports MongoDB

EclipseLink 2.4 will support JPA access to NoSQL databases. This support is already part of the EclipseLink development trunk and can be tried out using the milestone or nightly builds. Initial support is provided for MongoDB and Oracle NoSQL. A pluggable platform and adapter layer allows other databases to be supported.

NoSQL is a classification of database systems that do not conform to the relational database or SQL standard. They have various roots, from distributed internet databases to object databases, XML databases and even legacy databases. They have recently become popular because of their use in large scale distributed databases at Google, Amazon, and Facebook.

There are various NoSQL databases including:

MongoDB
Oracle NoSQL
Cassandra
Google BigTable
CouchDB

EclipseLink's NoSQL support allows the JPA API and JPA annotations/xml to be used with NoSQL data. EclipseLink also supports several NoSQL specific annotations/xml, including @NoSql, which marks a class as mapping to NoSQL data.

EclipseLink's NoSQL support is based on the EIS support offered since EclipseLink 1.0. EclipseLink's EIS support allowed persisting objects to legacy and non-relational databases. EclipseLink's EIS and NoSQL support uses the Java Connector Architecture (JCA) to access the data-source, similar to how EclipseLink's relational support uses JDBC. EclipseLink's NoSQL support can be extended to other NoSQL databases through the creation of an EclipseLink EISPlatform class and a JCA adapter.

Let's walk through an example of using EclipseLink's NoSQL support to persist an ordering system's object model to a MongoDB database.

The source for the example can be found here, or from the EclipseLink SVN repository.

Ordering object model

The ordering system consists of four classes, Order, OrderLine, Address and Customer. The Order has a billing and shipping address, many order lines, and a customer.

public class Order implements Serializable {
    private String id;
    private String description;
    private double totalCost = 0;
    private Address billingAddress;
    private Address shippingAddress;
    private List<OrderLine> orderLines = new ArrayList<OrderLine>();
    private Customer customer;
    ...
}

public class OrderLine implements Serializable {
    private int lineNumber;
    private String description;
    private double cost = 0;
    ...
}

public class Address implements Serializable {
    private String street;
    private String city;
    private String province;
    private String country;
    private String postalCode;
    ...
}

public class Customer implements Serializable {
    private String id;
    private String name;
    ...
}

Step 1 : Decide how to store the data

There is no standard for how NoSQL databases store their data. Some NoSQL databases only support key/value pairs; others support structured hierarchical data such as JSON or XML.

MongoDB stores data as BSON (binary JSON) documents. The first decision that must be made is how to store the objects. Normally each independent object would compose a single document, so a single document could contain Order, OrderLine and Address. Since customers can be shared amongst multiple orders, Customer would be its own document.
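To make this storage decision concrete, here is a minimal sketch that uses plain Java maps (not the MongoDB driver or EclipseLink) to picture the layout: the address and order lines are nested inside the order document, while the customer lives in its own document and is referenced only by its id. The class name, field names and sample values are illustrative assumptions, not what EclipseLink will actually write.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: models the document layout with plain collections.
final class OrderDocumentSketch {

    // Customer is shared between orders, so it gets its own document.
    static Map<String, Object> customerDocument(String id, String name) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("name", name);
        return doc;
    }

    // Order, Address and OrderLine all live in a single document.
    static Map<String, Object> orderDocument(String id, String customerId) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Ottawa");
        address.put("country", "Canada");

        Map<String, Object> line = new LinkedHashMap<>();
        line.put("lineNumber", 1);
        line.put("description", "machine");
        line.put("cost", 2999.0);

        List<Map<String, Object>> lines = new ArrayList<>();
        lines.add(line);

        Map<String, Object> order = new LinkedHashMap<>();
        order.put("_id", id);
        order.put("billingAddress", address); // nested document
        order.put("orderLines", lines);       // nested list of documents
        order.put("customer", customerId);    // reference by id only
        return order;
    }

    public static void main(String[] args) {
        Map<String, Object> customer = customerDocument("c1", "ACME");
        System.out.println(customer);
        System.out.println(orderDocument("o1", (String) customer.get("_id")));
    }
}
```

The point of the sketch is the shape: one write stores the whole order, and only the shared customer needs a second document.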

Step 2 : Map the data

The next step is to map the objects. Each root object in the document will be mapped as an @Entity in JPA. The objects that are stored by being embedded within their parent's document are mapped as @Embeddable. This is similar to how JPA maps relational data, but in NoSQL embedded data is much more common because of the hierarchical nature of the data format. In summary, Order and Customer are mapped as @Entity, OrderLine and Address are mapped as @Embeddable.

The @NoSql annotation is used to map NoSQL data. This tags the classes as mapping to NoSQL data instead of traditional relational data. It is required in each persistent class, both entities and embeddables. The @NoSql annotation allows the dataType and the dataFormat to be set.

The dataType is the equivalent of the table in relational data; its meaning can differ depending on the NoSQL data-source being used. With MongoDB the dataType refers to the collection used to store the data. The dataType defaults to the entity name (in upper case), which in turn defaults to the simple class name.

The dataFormat depends on the type of data being stored. Three formats are supported by EclipseLink: XML, Mapped, and Indexed. XML is the default, but since MongoDB uses BSON, which is similar to a Map in structure, Mapped is used. In summary, each class requires the @NoSql(dataFormat=DataFormatType.MAPPED) annotation.

@Entity
@NoSql(dataFormat=DataFormatType.MAPPED)
public class Order

@Embeddable
@NoSql(dataFormat=DataFormatType.MAPPED)
public class OrderLine

Step 3 : Define the Id

JPA requires that each Entity define an Id. The Id can either be a natural id (assigned by the application) or a generated id (assigned by EclipseLink). MongoDB also requires an _id field in every document. If no _id field is present, Mongo will generate and assign the _id field using an OID (object identifier), which is similar to a UUID (universally unique identifier).

You are free to use any field or set of fields as your Id in EclipseLink with NoSQL, the same as for a relational Entity. To use an application assigned id as the Mongo id, simply name its field "_id". This can be done through the @Field annotation, which is similar to the @Column annotation (which will also work) but without the relational details; it has only a name. So, to define the field Mongo will use for the id, include @Field(name="_id") in your mapping.

To use the generated Mongo OID as your JPA Id, simply include @Id, @GeneratedValue, and @Field(name="_id") in your object's id field mapping. The @GeneratedValue tells EclipseLink to use the Mongo OID to generate this id value. @SequenceGenerator and @TableGenerator are not supported in MongoDB, so these cannot be used. The generation types IDENTITY, TABLE and SEQUENCE are also not supported. You can use the EclipseLink @UuidGenerator if you wish to use a UUID instead of the Mongo OID, or you can use your own custom generator. The id value for a Mongo OID or a UUID is not numeric, so it can only be mapped as String or byte[].

@Id
@GeneratedValue
@Field(name="_id")
private String id;
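As a small illustration of why such ids cannot be mapped to a numeric attribute, the sketch below uses only the JDK's java.util.UUID. A UUID is 128 bits, which does not fit a long; it is naturally carried as a 36-character string or as 16 bytes. (A Mongo OID is a different, 12-byte value and is not generated here.) The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Illustration only: shows the two shapes a UUID id can take in an entity.
final class IdShapeSketch {

    // Form suitable for a String id attribute.
    static String asString(UUID uuid) {
        return uuid.toString();
    }

    // Form suitable for a byte[] id attribute: two longs, 16 bytes in total.
    static byte[] asBytes(UUID uuid) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(uuid.getMostSignificantBits());
        buffer.putLong(uuid.getLeastSignificantBits());
        return buffer.array();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println(asString(uuid) + " (" + asString(uuid).length() + " chars)");
        System.out.println(asBytes(uuid).length + " bytes");
    }
}
```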

Step 4 : Define the mappings

Each attribute in your object has to be mapped. If no annotation/xml is defined for the attribute, then its mapping will be defaulted. Defaulting rules for NoSQL data follow the JPA defaulting rules, so most simple mappings do not require any configuration if defaults are used. The field names used in the Mongo BSON document will mirror the object attribute names (in upper case). To provide a different BSON field name, the @Field annotation is used.
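The upper-case defaulting described here for field names, and in Step 2 for the dataType, can be pictured with plain string handling. This is only an illustration of the naming convention as this post states it, not EclipseLink's own defaulting code, and the helper name is hypothetical.

```java
import java.util.Locale;

// Illustration only: the default name is the Java name in upper case.
final class DefaultNameSketch {

    // Hypothetical helper: upper-cases a class or attribute name.
    static String defaultName(String javaName) {
        return javaName.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(defaultName("Order"));      // collection (dataType)
        System.out.println(defaultName("orderLines")); // BSON field name
    }
}
```

Under that convention the Order entity ends up in a collection named ORDER, which matches the collection name used by the native query in Step 6.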

Any embedded value stored in the document is persisted using the @Embedded JPA annotation. An embedded collection will use the JPA @ElementCollection annotation. The @CollectionTable of the @ElementCollection is not used or supported in NoSQL: the data is stored within the document, so no separate table is required. The @AttributeOverride is also neither required nor supported with NoSQL, as the embedded objects are nested in the document and do not require unique field names. The @Embedded annotation/xml is normally not required, as it is defaulted. The @ElementCollection is required, as defaulting does not currently work for @ElementCollection in EclipseLink.

The relationship annotations/xml @OneToOne, @ManyToOne, @OneToMany, and @ManyToMany are only to be used with external relationships in NoSQL. Relationships within the document use the embedded annotations/xml. External relationships are supported to other documents. To define an external relationship a foreign key is used. The id of the target object is stored in the source object's document. In the case of a collection, a collection of ids is stored. To define the name of the foreign key field in the BSON document the @JoinField annotation/xml is used.

The mappedBy option on relationships is not supported for NoSQL data; for bi-directional relationships, the foreign keys would need to be stored on both sides. It is also possible to define a relationship mapping using a query, but this is not currently supported through annotations/xml, only through a DescriptorCustomizer.

@Basic
private String description;
@Basic
private double totalCost = 0;
@Embedded
private Address billingAddress;
@Embedded
private Address shippingAddress;
@ElementCollection
private List<OrderLine> orderLines = new ArrayList<OrderLine>();
@ManyToOne(fetch=FetchType.LAZY)
private Customer customer;

Step 5 : Optimistic locking

Optimistic locking is supported with MongoDB. It is not required, but if locking is desired, the @Version annotation can be used.

Note that MongoDB does not support transactions, so if a lock error occurs during a transaction, any objects that have been previously written will not be rolled back.

@Version
private long version;
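Conceptually, version-based optimistic locking is a compare-and-update: a write succeeds only if the stored version still matches the version that was read, and the version is then incremented. The in-memory sketch below illustrates that idea under simplifying assumptions (single thread, a map standing in for the database). It is not how EclipseLink or MongoDB implement the check, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a map stands in for the database.
final class VersionCheckSketch {

    // Immutable snapshot of a stored record and its version.
    static final class Row {
        final String description;
        final long version;

        Row(String description, long version) {
            this.description = description;
            this.version = version;
        }
    }

    private final Map<String, Row> store = new HashMap<>();

    void insert(String id, String description) {
        store.put(id, new Row(description, 1));
    }

    Row read(String id) {
        return store.get(id);
    }

    // Writes only if nobody changed the row since it was read.
    // Returns false for a stale version, which is the "lock error" case.
    boolean update(String id, String newDescription, long expectedVersion) {
        Row current = store.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        store.put(id, new Row(newDescription, expectedVersion + 1));
        return true;
    }

    public static void main(String[] args) {
        VersionCheckSketch db = new VersionCheckSketch();
        db.insert("o1", "pinball");
        long readVersion = db.read("o1").version;
        System.out.println(db.update("o1", "first writer", readVersion));  // true
        System.out.println(db.update("o1", "second writer", readVersion)); // false
    }
}
```

In JPA the failed check surfaces as an OptimisticLockException rather than a boolean; the boolean here just keeps the sketch short.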

Step 6 : Querying

MongoDB has its own JSON based query-by-example language. It does not support SQL (i.e. NoSQL), so querying has limitations.

EclipseLink supports both JPQL and the Criteria API on MongoDB. Not all aspects of JPQL are supported. Most basic operations are supported, but joins, sub-selects, group bys, and certain database functions are not. Querying on embedded values and element collections is supported, as are ordering, like, and selecting attribute values.

Not all NoSQL databases support querying, so EclipseLink's NoSQL support only supports querying if the NoSQL platform supports it.

Query query = em.createQuery("Select o from Order o where o.totalCost > 1000");
List orders = query.getResultList();

Query query = em.createQuery("Select o from Order o where o.description like 'Pinball%'");
List orders = query.getResultList();

Query query = em.createQuery("Select o from Order o join o.orderLines l where l.description = :desc");
query.setParameter("desc", "shipping");
List orders = query.getResultList();

Native queries are also supported in EclipseLink NoSQL. For MongoDB the native query is in MongoDB's command language.

Query query = em.createNativeQuery("db.ORDER.findOne({\"_id\":\"" + oid + "\"})", Order.class);
Order order = (Order)query.getSingleResult();
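Because the native command above is assembled by string concatenation, it can help to keep that assembly in one small helper. The sketch below builds the same command text and escapes backslashes and double quotes in the id, JSON style. The helper names are hypothetical, and whether EclipseLink's command parser honours these escapes is an assumption that this post does not confirm.

```java
// Illustration only: builds the command text; it does not talk to MongoDB.
final class NativeCommandSketch {

    // JSON-style escaping of backslash and double quote (assumed to be sufficient).
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Produces text of the form db.<collection>.findOne({"_id":"<id>"}).
    static String findOneById(String collection, String id) {
        return "db." + collection + ".findOne({\"_id\":\"" + escape(id) + "\"})";
    }

    public static void main(String[] args) {
        System.out.println(findOneById("ORDER", "4f826dba84ae1ad2c2b1a5f4"));
    }
}
```

The resulting string would then be passed to em.createNativeQuery(...) exactly as in the snippet above.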

Step 7 : Connecting

The connection to a Mongo database is done through the JPA persistence.xml properties. The "eclipselink.target-database" property must define the Mongo platform "org.eclipse.persistence.nosql.adapters.mongo.MongoPlatform". A connection spec must also be defined through "eclipselink.nosql.connection-spec" to be "org.eclipse.persistence.nosql.adapters.mongo.MongoConnectionSpec". Other properties can also be set such as the "eclipselink.nosql.property.mongo.db", "eclipselink.nosql.property.mongo.host" and "eclipselink.nosql.property.mongo.port". The host and port can accept a comma separated list of values to connect to a cluster of Mongo databases.

<persistence-unit name="mongo-example" transaction-type="RESOURCE_LOCAL">
    <class>model.Order</class>
    <class>model.OrderLine</class>
    <class>model.Address</class>
    <class>model.Customer</class>
    <properties>
        <property name="eclipselink.target-database" value="org.eclipse.persistence.nosql.adapters.mongo.MongoPlatform"/>
        <property name="eclipselink.nosql.connection-spec" value="org.eclipse.persistence.nosql.adapters.mongo.MongoConnectionSpec"/>
        <property name="eclipselink.nosql.property.mongo.port" value="27017"/>
        <property name="eclipselink.nosql.property.mongo.host" value="localhost"/>
        <property name="eclipselink.nosql.property.mongo.db" value="mydb"/>
        <property name="eclipselink.logging.level" value="FINEST"/>
    </properties>
</persistence-unit>
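The same settings can also be collected in a map, which is handy when the host or port comes from the environment. The sketch below only builds the map, using the property names from the persistence.xml above, so it runs without any JPA library on the classpath. With JPA available, such a map could be passed as the second argument of Persistence.createEntityManagerFactory("mongo-example", properties), where it overrides the values in persistence.xml; that call is left as a comment. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: gathers the connection properties in one place.
final class MongoPropertiesSketch {

    static Map<String, String> mongoProperties(String host, int port, String db) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("eclipselink.target-database",
                "org.eclipse.persistence.nosql.adapters.mongo.MongoPlatform");
        properties.put("eclipselink.nosql.connection-spec",
                "org.eclipse.persistence.nosql.adapters.mongo.MongoConnectionSpec");
        properties.put("eclipselink.nosql.property.mongo.host", host);
        properties.put("eclipselink.nosql.property.mongo.port", String.valueOf(port));
        properties.put("eclipselink.nosql.property.mongo.db", db);
        return properties;
    }

    public static void main(String[] args) {
        Map<String, String> properties = mongoProperties("localhost", 27017, "mydb");
        properties.forEach((name, value) -> System.out.println(name + " = " + value));
        // With JPA on the classpath (not needed for this sketch):
        // Persistence.createEntityManagerFactory("mongo-example", properties);
    }
}
```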

Summary

The full source code to this demo is available from SVN.

To run the example you will need a Mongo database, which can be downloaded from http://www.mongodb.org/downloads.

EclipseLink also supports NoSQL access to other data-sources including:

Oracle NoSQL
XML files
JMS
Oracle AQ

Thursday, March 29, 2012

NoSQL

In the beginning data was free and wild. It was not confined to rows and columns and not bound to standardization. Data access was unruly and proprietary. These were the first "NoSQL" databases. They consisted of flat file, hierarchical and network databases such as VSAM, IMS and ADABAS.

Then there was SQL, and things were good.

SQL was developed during the golden age of data in the 1970s. Database access became standardized through the SQL language and the relational model. The 1970s saw the birth of relational database products such as RDBMS, Ingres, Oracle and DB2. The 1980s saw ANSI standardization of the SQL language, and the adoption of client-server computing.

However, the legacy databases still existed, as well as the legacy applications that accessed them. New applications needed to access the old data, and this was in general a very painful experience.

Back in the good old Smalltalk days during the 1990s, Smalltalk was unofficially adopted as the programming language of choice for large corporate projects. It was the beginning of the commercial adoption of object-oriented programming, in Smalltalk, C++ and other OO languages. Things were great, but there was a dark side. All of the data was stored in relational databases, or worse, legacy mainframe databases. Fitting round objects into square relational tables was difficult and cumbersome. Two solutions emerged: object-oriented databases and object-relational mapping.

New commercial object-oriented database management systems (OODBMS) emerged in the 1990s including Versant, Gemstone and ObjectStore. They were integrated with their respective languages, Smalltalk and C++, and stored data as it was represented in memory, instead of in the relational model. These were the 2nd generation of "NoSQL" databases. There was little standardization and solutions were mainly proprietary. Access to the data from non object-oriented languages was difficult. The world did not adopt this new model, as it had previously adopted the relational model. The world's data remained in the trusted, standardized and universally accessible relational model.

Object-relational mapping allowed objects to be used in the programming model, but have them converted to relational rows and SQL when persisted. A lot of OR mapping frameworks were built, including many corporate in-house solutions. TopLink, a product from The Object People, became the leading OR mapper in the Smalltalk language. In C++ there was Persistence, as well as various other products in various languages.

Although the relational model was the industry standard for any new applications, much of the world's data remained in mainframe databases. The data was slowly being migrated, but most corporations still had mainframe data. Consulting for TopLink clients in the 90s, I found that most were building applications on relational databases, but still had to get some data from the mainframe. This is when we created the first version of TopLink's "NoSQL" support. Of course NoSQL was not a buzzword at the time, so the offering was called TopLink for the Mainframe. The main problem was that everyone's mainframe data and access was different, so the product involved lots of consulting.

When Java came along, TopLink moved from Smalltalk to Java. OR mapping became very popular in Java and many new products came to market. The first real OR standard came in the form of EJB CMP. It had its "issues" to say the least, and was coupled with the J2EE platform. A new competing standard, JDO, was created in retaliation to CMP. To reconcile the issue of having two competing Java standards, JPA was created to replace them both, and was adopted by most OR mapping products.

In response to the popularity of object-oriented computing, the relational database vendors created the object-relational paradigm. This allowed storage of structured object types and collections in relational tables. SQL3 (SQL 1999) defined new query syntax to access this data. Despite some initial hype, the object-relational paradigm was not successful, and although the features remain in Oracle, DB2 and Postgres, the world stayed with the trusted relational model.

The panic around Y2K had the good fortune of getting most corporations and governments off mainframe databases, and into relational databases. Some legacy data still remained, so we also offered TopLink for the Mainframe in Java. At that time the Internet was taking off, and XML was becoming popular. Since XML is hierarchical data that you could convert any mainframe data to, it became part of our solution for accessing legacy data and the TopLink SDK was born.

With the explosion of the Internet, XML was becoming increasingly popular. This led once again to the questioning of the relational model, and to the creation of XML databases (the 3rd generation of NoSQL). There were several XML databases that achieved much hype, but limited market success. The relational database vendors responded by adding support for storing XML in relational tables.

Again the world stayed with the relational model.

The TopLink SDK also provided simple object to XML mapping, perhaps the first such product to do so. As XML usage in Java became mainstream, the TopLink SDK was split into two products. TopLink MOXy became TopLink's object to XML mapping solution. TopLink EIS became TopLink's legacy data persistence solution.

Around 2009 the term NoSQL was used to categorize the new distributed databases being used at Google, Amazon and Facebook. These databases were characterized as being highly scalable, not adhering to ACID transaction semantics, and having limited querying. The NoSQL term grew to include the various other non-relational databases that have emerged throughout the ages.

Is the relational model dead? Will the world switch to the NoSQL model, and will data once again be free? Only time will tell. If history teaches us anything, one would expect the relational model to persist. NoSQL has already been renamed in some circles to "Not Only SQL", to leave room for the NoSQL databases to support the SQL standard. In fact, some NoSQL databases already have JDBC drivers. My intuition is a union of the two models; perhaps this has already begun, with some NoSQL databases adding SQL support, and some relational databases extending their clustering support, such as MySQL Cluster.

EclipseLink 2.4 will contain JPA support for NoSQL databases. Initial support will include MongoDB and Oracle NoSQL. This support is already available in the EclipseLink nightly builds. Technically, this is not new functionality, as EclipseLink (formerly TopLink) has been supporting non-relational data for over a decade, but the JPA support is new.

In the upcoming months I will be blogging about some of the new features in EclipseLink to support NoSQL. This blog post is solely an introduction, so sorry to those expecting hard content.
