
Monday, June 10, 2013

Cool performance features of EclipseLink 2.5

The main goal of the EclipseLink 2.5 release was the support of the JPA 2.1 specification, as EclipseLink 2.5 was the reference implementation for JPA 2.1. For a list of JPA 2.1 features look here, or here.

Most of the features that went into the release were to support JPA 2.1 features, so there was not a lot of development time for other features. However, I was still able to sneak in a few cool new performance features. The features are not well documented yet, so I thought I would outline them here.

Indexing Foreign Keys

The first feature is automatic indexing of foreign keys. Most people incorrectly assume that databases index foreign keys by default. Well, they don't. Primary keys are auto indexed, but foreign keys are not. This means any query based on a foreign key will be doing full table scans. That includes any OneToMany, ManyToMany, or ElementCollection relationship, many OneToOne relationships, and most queries on any relationship involving joins or object comparisons. This can be a major performance issue, and you should always index your foreign key fields.

EclipseLink 2.5 makes indexing foreign key fields easy with a new persistence unit property:

"eclipselink.ddl-generation.index-foreign-keys"="true"
This will have EclipseLink create an index for all mapped foreign keys if EclipseLink is used to generate the persistence unit's DDL. Note that DDL generation is now standard in JPA 2.1, so to enable DDL generation in EclipseLink 2.5 you can now use:
"javax.persistence.schema-generation.database.action"="create"
EclipseLink 2.5 and JPA 2.1 also support several new DDL generation features, including allowing user scripts to be executed. See DDL generation for more information.
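For example, both properties can be supplied as persistence unit property overrides. This is an illustrative sketch (the helper class and the idea of building the map programmatically are mine, not from the post); the same keys can equally be set as <property> elements in persistence.xml.

```java
import java.util.HashMap;
import java.util.Map;

public class TuningProps {
    // Builds the property overrides discussed above. The map would be passed
    // to Persistence.createEntityManagerFactory("my-pu", props) (unit name
    // "my-pu" is hypothetical).
    public static Map<String, String> tuningProperties() {
        Map<String, String> props = new HashMap<String, String>();
        // Index all mapped foreign keys during DDL generation.
        props.put("eclipselink.ddl-generation.index-foreign-keys", "true");
        // Standard JPA 2.1 DDL generation: create the schema in the database.
        props.put("javax.persistence.schema-generation.database.action", "create");
        return props;
    }
}
```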

Query Cache Invalidation

EclipseLink has always supported a query cache. Unlike the object cache, the query cache is not enabled by default, but must be enabled through the query hint "eclipselink.query-results-cache". The main issue with the query cache is that the results of queries can change when objects are modified, so the query cache could become out of date. Previously the query cache did support time-to-live and daily invalidation through the query hint "eclipselink.query-results-cache.expiry", but it was not kept in synch with changes as they were made.

In EclipseLink 2.5 automatic invalidation of the query cache was added. So if you had a query "Select e from Employee e" and had enabled query caching, every execution of this query would hit the cache and avoid accessing the database. If you then insert a new Employee, in EclipseLink 2.5 the query cache for all queries on Employee is automatically invalidated. The next query will access the database, get the correct result, and update the cache, so all subsequent queries will once again obtain cache hits. Since the query cache is now kept in synch, the new persistence unit property "eclipselink.cache.query-results"="true" was added to enable the query cache on all named queries. If, for some reason, you want to allow stale data in your query cache, you can disable invalidation using the QueryResultsCachePolicy.setInvalidateOnChange() API.
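The invalidation behaviour can be pictured with a toy model (this is illustrative Java, not EclipseLink internals): each cached query result is tagged with the entity class it came from, and any change to that class throws the cached results away so the next query repopulates them.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryCacheModel {
    // query string -> cached result
    private final Map<String, Object> results = new HashMap<String, Object>();
    // query string -> entity class the cached results came from
    private final Map<String, Class<?>> resultClass = new HashMap<String, Class<?>>();

    public Object get(String query) {
        return results.get(query);
    }

    public void put(String query, Class<?> entityClass, Object result) {
        results.put(query, result);
        resultClass.put(query, entityClass);
    }

    // Called when an instance of entityClass is inserted, updated, or deleted:
    // drop every cached query whose results involve that class.
    public void invalidateFor(Class<?> entityClass) {
        results.keySet().removeIf(q -> resultClass.get(q) == entityClass);
    }
}
```

After invalidateFor(Employee.class), the next "Select e from Employee e" misses the cache, goes to the database, and repopulates the cache for subsequent hits.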

Query cache invalidation is also integrated with cache coordination, so even if you modify an Employee on another server in your cluster, the query cache will still be invalidated. The query cache invalidation is also integrated with EclipseLink's support for Oracle Database Change Notification. If you have other applications accessing your database, you can keep the EclipseLink cache in synch with an Oracle database using the persistence unit property "eclipselink.cache.database-event-listener"="DCN". This support was added in EclipseLink 2.4, but in EclipseLink 2.5 it will also invalidate the query cache.

Tuners

EclipseLink 2.5 added an API to make it easier to provide tuning configuration for a persistence unit. The SessionTuner API allows a set of tuning properties to be configured in one place, and provides deployment time access to the EclipseLink Session and persistence unit properties. This makes it easy to have a development, debug, and production configuration of your persistence unit, or provide different configurations for different hardware. The SessionTuner is set through the persistence unit property "eclipselink.tuning".

Concurrent Processing

The most interesting performance feature provided in EclipseLink 2.5 is still in a somewhat experimental stage. The feature allows for a session to make use of concurrent processing.

There is no public API to configure it as of yet, but if you are interested in experimenting it is easy to set through a SessionCustomizer or SessionTuner.


public class MyCustomizer implements SessionCustomizer {
  public void customize(Session session) {
    // Internal (non-public) API: cast to AbstractSession to enable
    // concurrent processing for the session.
    ((AbstractSession)session).setIsConcurrent(true);
  }
}

Currently this enables two main features, one is the concurrent processing of result sets. The other is the concurrent loading of load groups.

In any JPA object query there are three parts: the execution of the query, the fetching of the data, and the building of the objects. Normally the query is executed, all of the data is fetched, then the objects are built from the data. With concurrency enabled, two threads are used instead: one to fetch the data, and one to build the objects. The two phases overlap, reducing overall elapsed time (while using the same amount of CPU). This can provide a benefit if you have a multi-CPU machine, and even if you don't, it allows the client to be doing processing at the same time as the database machine.
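The overlapping of fetching and building can be sketched as a simple producer-consumer pipeline (a toy model of the idea, not EclipseLink's actual implementation): one thread feeds "rows" into a queue while the caller's thread builds "objects" from them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ConcurrentResultSetSketch {
    private static final String[] POISON = new String[0]; // end-of-data marker

    public static List<String> fetchAndBuild(final List<String[]> rows) {
        final BlockingQueue<String[]> queue = new ArrayBlockingQueue<String[]>(16);
        List<String> objects = new ArrayList<String>();

        // Fetcher thread: stands in for reading the JDBC result set.
        Thread fetcher = new Thread(() -> {
            try {
                for (String[] row : rows) {
                    queue.put(row);
                }
                queue.put(POISON); // signal that there are no more rows
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();

        // Meanwhile this thread builds objects as rows arrive.
        try {
            for (String[] row = queue.take(); row != POISON; row = queue.take()) {
                objects.add(row[0] + " " + row[1]); // stands in for entity building
            }
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return objects;
    }
}
```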

The second feature allows all of the relationships for all of the resulting objects to be queried and built concurrently (only when using a shared cache). So, if you queried 32 Employees and also wanted each Employee's address, the address queries could all be executed and built concurrently, resulting in significantly less response time. This requires a LoadGroup to be set on the query. LoadGroup defines a new API, setIsConcurrent(), to allow concurrency to be enabled (this defaults to true when the session is set to be concurrent).

A LoadGroup can be configured on a query using the query hint "eclipselink.load-group", "eclipselink.load-group.attribute", or through the JPA 2.1 EntityGraph query hint "javax.persistence.loadgraph".

Note that for concurrency to improve your application's performance you need to have spare CPU time. So, to benefit the most you need multiple-CPUs. Also, concurrency will not help you scale an application server that is already under load from multiple client requests. Concurrency does not use less CPU time, it just allows for the CPUs to be used more efficiently to improve response times.

Monday, March 7, 2011

JVM Performance - Part III - Concurrent Maps

Concurrent Maps

The main difference between Hashtable and HashMap is that Hashtable is synchronized. For this reason Hashtable is still used in a lot of concurrent code because it is, in theory, thread safe.  In practice that guarantee is weaker than it looks: if you don't write your concurrent code correctly, it will not be thread safe no matter how many synchronized methods you call.

For example, you could call get() on the Hashtable, then call put() if the key is not there. Both operations are synchronized and thread safe individually, but in between your get() and your put() another thread could have done the same thing and put something there already, in which case your code may be incorrect and may have thrown away the other thread's data.  With a Hashtable the solution is to synchronize the whole check-then-put operation on the map.

Object value = map.get(key);        // first, unsynchronized check
if (value == null) {
    synchronized (map) {
        value = map.get(key);       // re-check while holding the lock
        if (value == null) {
            value = buildValue(key);
            map.put(key, value);
        }
    }
}

JDK 1.5 added the ConcurrentMap interface and its ConcurrentHashMap implementation, which is thread safe, and designed and optimized for concurrent access. Internally it splits the map into segments, so threads accessing different segments do not contend for the same lock. It also provides useful API such as putIfAbsent(), which atomically puts a value in the map unless the key is already present.  Using putIfAbsent() is more efficient than using a synchronized get() and put(), in both concurrency and performance.

Object value = map.get(key);
if (value == null) {
    Object newValue = buildValue(key);
    // putIfAbsent() returns the previously mapped value,
    // or null if newValue was stored.
    Object existing = map.putIfAbsent(key, newValue);
    value = (existing == null) ? newValue : existing;
}
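Wrapped up as a small self-contained class (the class and method names here are illustrative), the pattern guarantees that every caller ends up with the same cached instance, no matter which thread wins the race:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LazyCache {
    private final ConcurrentMap<String, Object> map =
        new ConcurrentHashMap<String, Object>();

    // Thread-safe lazy population without any explicit synchronization:
    // whichever thread wins the putIfAbsent() race, all callers see the
    // same cached instance afterwards.
    public Object getOrBuild(String key) {
        Object value = map.get(key);
        if (value == null) {
            Object newValue = new Object(); // stands in for buildValue(key)
            Object existing = map.putIfAbsent(key, newValue);
            value = (existing == null) ? newValue : existing;
        }
        return value;
    }
}
```

The cost of losing the race is one wasted buildValue() call; the winner's value is kept and the loser's is discarded, which is usually far cheaper than holding a lock around the build.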

So, how does the performance and concurrency of HashMap, Hashtable and ConcurrentHashMap stack up?  This test compares the performance of gets and puts in various Map implementations using 1 to 32 threads. It does 100 gets or puts on a Map of size 100.  Two machines were tested.  The first machine is my Windows XP desktop, which has two cores. The second machine is an 8 core Linux server.  All tests were run 5 times and averaged; Oracle Sun JDK 1.6.23 was used.

Threads is the number of threads running the test.  Average is the total number of operations performed in the time period by all threads combined.  %STD is the percentage standard deviation in the results.  %DIF is the percentage difference between the run and the single threaded run (for the same Map type and operation).
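The original harness is not shown in the post; the following is a rough sketch of that kind of measurement (names and the fixed time window are my assumptions): each thread performs batches of 100 get() calls on a shared, pre-populated map until time runs out, and the per-thread operation counts are summed.

```java
import java.util.Map;

public class MapBenchmarkSketch {
    // Runs `threads` worker threads, each doing batches of 100 gets on the
    // shared map for roughly `millis` ms, and returns total operations done.
    public static long runGets(final Map<Integer, Integer> map,
                               int threads, long millis) {
        final long[] counts = new long[threads];
        Thread[] workers = new Thread[threads];
        final long end = System.currentTimeMillis() + millis;
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long ops = 0;
                while (System.currentTimeMillis() < end) {
                    for (int i = 0; i < 100; i++) { // 100 gets per batch
                        map.get(i);
                    }
                    ops += 100;
                }
                counts[id] = ops;
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }
}
```

Note that concurrent get() on an unsynchronized HashMap is only safe here because no thread mutates the map during the run; the put() variant of the test would need the same structure with map.put(i, i) in the inner loop.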

Concurrent Map Performance Comparison (desktop, 2cpu)

Map                 Operation  Threads  Average    %STD   %DIF (with 1 thread)
HashMap             get        1        3551306    0.06%  0%
HashMap             get        2        4121102    0.03%  16%
HashMap             get        4        4132506    0.15%  16%
HashMap             get        8        4227485    0.68%  19%
HashMap             get        16       4402532    1.36%  23%
HashMap             get        32       4426514    1.61%  24%
Hashtable           get        1        1132956    0.06%  0%
Hashtable           get        2        364236     0.08%  -211%
Hashtable           get        4        274603     0.14%  -312%
Hashtable           get        8        277188     1.08%  -308%
Hashtable           get        16       277881     0.78%  -307%
Hashtable           get        32       296779     2.51%  -281%
ConcurrentHashMap   get        1        2771098    0.04%  0%
ConcurrentHashMap   get        2        3466451    2.30%  25%
ConcurrentHashMap   get        4        3458492    0.33%  24%
ConcurrentHashMap   get        8        3510282    0.31%  26%
ConcurrentHashMap   get        16       3613182    2.36%  30%
ConcurrentHashMap   get        32       3599489    2.23%  29%
HashMap             put        1        3897925    0.07%  0%
HashMap             put        2        2614784    0.01%  -49%
HashMap             put        4        2473011    0.21%  -57%
HashMap             put        8        2482743    0.30%  -57%
HashMap             put        16       2506519    0.53%  -55%
HashMap             put        32       2579715    0.30%  -51%
Hashtable           put        1        1042076    0.33%  0%
Hashtable           put        2        474199     3.06%  -119%
Hashtable           put        4        179550     6.71%  -480%
Hashtable           put        8        183102     1.63%  -469%
Hashtable           put        16       393085     0.68%  -165%
Hashtable           put        32       398277     1.10%  -161%
ConcurrentHashMap   put        1        1336292    0.21%  0%
ConcurrentHashMap   put        2        557880     3.71%  -139%
ConcurrentHashMap   put        4        390736     1.64%  -241%
ConcurrentHashMap   put        8        362653     1.21%  -268%
ConcurrentHashMap   put        16       1492123    0.20%  11%
ConcurrentHashMap   put        32       1564926    0.10%  17%

Concurrent Map Performance Comparison (server, 8cpu)

Map                 Operation  Threads  Average    %STD   %DIF (with 1 thread)
HashMap             get        1        3047533    0.0%   0%
HashMap             get        2        7500603    0.1%   146%
HashMap             get        4        14080828   0.01%  362%
HashMap             get        8        25160569   0.01%  769%
HashMap             get        16       17215757   1.2%   464%
HashMap             get        32       11797330   7.7%   287%
Hashtable           get        1        1165834    0.06%  0%
Hashtable           get        2        434485     16.9%  -168%
Hashtable           get        4        203231     2.7%   -473%
Hashtable           get        8        201290     2.1%   -479%
Hashtable           get        16       358459     2.3%   -225%
Hashtable           get        32       303975     4.7%   -283%
ConcurrentHashMap   get        1        2119602    0.0%   0%
ConcurrentHashMap   get        2        5044317    0.1%   137%
ConcurrentHashMap   get        4        9422460    0.09%  344%
ConcurrentHashMap   get        8        10195480   0.0%   381%
ConcurrentHashMap   get        16       9799273    1.2%   362%
ConcurrentHashMap   get        32       9557975    0.1%   350%
HashMap             put        1        1729801    0.02%  0%
HashMap             put        2        1347323    0.1%   -28%
HashMap             put        4        1267770    0.02%  -36%
HashMap             put        8        1056226    0.0%   -63%
HashMap             put        16       1055462    0.01%  -63%
HashMap             put        32       1055139    0.01%  -63%
Hashtable           put        1        1391458    0.08%  0%
Hashtable           put        2        211793     13.1%  -556%
Hashtable           put        4        191052     2.8%   -628%
Hashtable           put        8        200480     3.4%   -594%
Hashtable           put        16       399748     2.3%   -248%
Hashtable           put        32       400840     3.2%   -247%
ConcurrentHashMap   put        1        1503588    0.2%   0%
ConcurrentHashMap   put        2        441143     0.8%   -240%
ConcurrentHashMap   put        4        380565     1.1%   -295%
ConcurrentHashMap   put        8        354054     11.8%  -324%
ConcurrentHashMap   put        16       1736618    2.8%   15%
ConcurrentHashMap   put        32       1699647    5.7%   13%

Very interesting results. Given the desktop machine has 2 CPUs, I would have expected the 2+ threaded tests to achieve at most 2x the throughput of the single threaded test.  For the server results with 8 CPUs, I would expect throughput to roughly double up to 8 threads, then flatten out.

Given Hashtable is synchronized, HashMap is not, and ConcurrentHashMap is partially synchronized, I would have expected HashMap to be about 2x, Hashtable to be about 1x, and ConcurrentHashMap to be somewhere in between. The thing I like best about running performance tests is that you rarely get what you expect, and these results are not what I would have expected.

My basic premise holds, in that for get() HashMap had the best concurrency, then ConcurrentHashMap, and then Hashtable.  I would not have expected Hashtable to do so badly though.  Given it is synchronized, only one thread can perform a get() at a time, so naively one would expect the same results as a single thread.  In reality it had 5x worse performance than a single thread.  The reasons for this include that synchronization has a certain overhead, which modern JVMs optimize out when only a single thread is accessing the object, but with multiple threads this overhead becomes very apparent.  Also, contention in general has a huge performance cost, as the threads are busy waiting for the lock to become available.  In addition, just having multiple threads adds some overhead in context switching and such, so in general the more threads, the worse the performance, unless the threads are doing something useful.

The desktop results for get() are worse than I would have expected with 2 CPUs, but perhaps the second CPU was busy with other things, such as the OS and garbage collection, etc.

The put() results are much more perplexing.  First of all, I know that running concurrent puts on a HashMap does not make much sense, as HashMap does not support concurrent puts.  I ran the test anyway, just to see what would happen, and the result is quite surprising.  I would expect similar results to the get() test.  However, HashMap had much worse results, similar to Hashtable, as if it were having some contention.  How could this be given that HashMap has no synchronized methods?  My only explanation is that the puts required modifying the same memory locations, so were experiencing contention on the memory access.  If you have a better explanation, please comment.

I would have also expected the put() concurrency of ConcurrentHashMap to be better.  I think the primary reason it was not is that my tests looped over the same set of keys in the same order, so each thread was trying to put the same key at the same time.  Since ConcurrentHashMap works by splitting the map into segments and only locking one segment on put() instead of the entire Map, it still had contention, because for the most part the threads were accessing the same keys, and thus the same segments, at the same time.  I think this is also why the > 8 thread runs fared better: with more threads the loops got more out of synch, and had fewer segment conflicts.  This is an important point: ConcurrentHashMap will only perform well when it has a large enough size to have many segments, and when the access to it is random.  If all the threads are accessing a single segment, it is really no better than a Hashtable.

So, what does all this mean?  In general, if you have static meta-data that is read-only and requires concurrent access, then use HashMap (only puts are unsafe; concurrent gets on a map that is not being modified are fine).  If you require concurrent read-write access, then use ConcurrentHashMap.  If you just like being old school, then use Hashtable, but beware the hidden costs of concurrency.