Saturday, January 29, 2011

Hibernate, Ehcache & JGroups

In this post I describe my experience implementing Hibernate L2 cache replication.

Hibernate Cache

As we all know, Hibernate has two levels of cache.

  1. 1st Level Cache - the Hibernate Session
  2. 2nd Level Cache (L2) - the SessionFactory together with an external cache provider

* From here on I refer to the 2nd Level Cache as the L2 cache

L2 Cache

The L2 cache can store Entities, Collections, Query results & Timestamps. Since Hibernate 3.3 each type gets a separate region to store its data. What goes into the cache also varies by type. A preview of each follows.

Entities:
  • The entity cache does not store instances of an entity. Instead it stores entities in a 'dehydrated state' (minus associations)
          Ex.
          {id -> {attribute1,attribute2,attribute3}}
          {1 -> {"name",20, null}}
          {2 -> {"a name",30, 4}}
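To make the dehydrated form concrete, here is a minimal sketch in plain Java. The class `DehydratedEntityCacheSketch` and its methods are hypothetical names of mine, not Hibernate classes, and I am assuming the third attribute in the example above is the id of an associated row kept in place of the association itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; Hibernate's real cache entry classes differ.
class DehydratedEntityCacheSketch {
    // One region keyed by entity id; the value is the flat attribute array.
    private final Map<Long, Object[]> region = new HashMap<>();

    // "Dehydrate": keep plain attribute values only. The association is
    // assumed to be kept as a foreign key id, not as an object.
    void put(long id, String name, int age, Long relatedId) {
        region.put(id, new Object[] {name, age, relatedId});
    }

    Object[] get(long id) {
        return region.get(id);
    }

    // Builds the two rows used in the example above.
    static DehydratedEntityCacheSketch sample() {
        DehydratedEntityCacheSketch cache = new DehydratedEntityCacheSketch();
        cache.put(1L, "name", 20, null);
        cache.put(2L, "a name", 30, 4L);
        return cache;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sample().get(2L)));
    }
}
```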

Collections:
  • Stores the collection of primary key IDs, not the actual dehydrated entities

Query Cache:
  • The query cache does not cache the state of the actual entities in the result set
  • It caches only identifier values and results of value type
  • The query together with its parameters is used as the key
        Ex.
        {query,{parameters}} -> {id of the entity}
        {"from Employee as e where e.joinedDate=:date", [12/07/2011]} -> [3423]
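The key structure above can be sketched in plain Java as follows. `QueryCacheSketch` and its nested `Key` class are hypothetical names for illustration (Hibernate has its own internal key type), and the date parameter is kept as a plain string to stay close to the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not Hibernate's query cache implementation.
class QueryCacheSketch {
    // Key = query string + bound parameter values.
    static final class Key {
        final String query;
        final List<Object> params;

        Key(String query, Object... params) {
            this.query = query;
            this.params = Arrays.asList(params);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return query.equals(other.query) && params.equals(other.params);
        }

        @Override
        public int hashCode() {
            return Objects.hash(query, params);
        }
    }

    // Value = only the ids of the matching entities, never their state.
    private final Map<Key, List<Long>> region = new HashMap<>();

    void put(Key key, List<Long> ids) { region.put(key, ids); }

    List<Long> get(Key key) { return region.get(key); }

    static QueryCacheSketch sample() {
        QueryCacheSketch cache = new QueryCacheSketch();
        cache.put(new Key("from Employee as e where e.joinedDate=:date", "12/07/2011"),
                  Arrays.asList(3423L));
        return cache;
    }

    public static void main(String[] args) {
        Key sameQuery = new Key("from Employee as e where e.joinedDate=:date", "12/07/2011");
        System.out.println(sample().get(sameQuery));
    }
}
```

The same query with a different parameter value is a different key, so it misses the cache and goes to the database.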

Timestamps:
  • Last updated timestamp for each entity, keyed by table name.
        {"tablename":"timestamp"}


Hibernate 3.3 cache SPI

  • Removes synchronization in the Hibernate cache plumbing
  • Provides finer grained control over cache region storage and cache strategies
  • Cache providers (the old CacheProvider interface) are deprecated in favour of region factories

Distributed cache behaviours


Local vs. Replication vs. Invalidation

Local - As the name implies, data stored in the cache is local to the server instance. It is not replicated to the other instances in the cluster.

Replication - On any change, the data stored in the cache is copied to the other instances in the cluster.

Invalidation - On any change, an invalidation request carrying the cache key is sent to the other instances in the cluster, so that they remove the stale entry from their own caches.
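The three modes differ only in what the other nodes are told after a change. Here is a minimal in-memory sketch of that difference; `ClusterModeSketch`, `Mode` and `Node` are hypothetical names, and a real cluster would of course send messages over the network instead of touching the peers' maps directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the three modes; no real networking involved.
class ClusterModeSketch {
    enum Mode { LOCAL, REPLICATION, INVALIDATION }

    static final class Node {
        final Map<String, String> cache = new HashMap<>();
    }

    private final Mode mode;
    private final List<Node> nodes = new ArrayList<>();

    ClusterModeSketch(Mode mode, int nodeCount) {
        this.mode = mode;
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
    }

    Node node(int index) { return nodes.get(index); }

    // Apply a change on one node, then propagate it according to the mode.
    void update(int origin, String key, String value) {
        nodes.get(origin).cache.put(key, value);
        for (int i = 0; i < nodes.size(); i++) {
            if (i == origin) continue;
            Map<String, String> peer = nodes.get(i).cache;
            switch (mode) {
                case REPLICATION:
                    peer.put(key, value);   // copy the new data to the peer
                    break;
                case INVALIDATION:
                    peer.remove(key);       // peer drops its entry and reloads later
                    break;
                default:
                    break;                  // LOCAL: peers are not told anything
            }
        }
    }

    public static void main(String[] args) {
        ClusterModeSketch cluster = new ClusterModeSketch(Mode.INVALIDATION, 2);
        cluster.node(1).cache.put("country:1", "old value");
        cluster.update(0, "country:1", "new value");
        System.out.println(cluster.node(1).cache.containsKey("country:1"));
    }
}
```

Invalidation sends only the key, which is why it puts less load on the network than copying the full data.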

Synchronous vs. Asynchronous

Defines whether replication is synchronous or asynchronous. If synchronous, the calling thread has to wait until replication completes. If asynchronous, the caller does not wait, but the data across instances may be inconsistent until replication succeeds.
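The trade-off can be sketched with a single background sender thread. Everything here is hypothetical (`SyncAsyncReplicationSketch`, the `network` latch standing in for network delay); it only illustrates the waiting behaviour, not how Ehcache or JGroups implement it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one local cache, one peer cache, one sender thread.
class SyncAsyncReplicationSketch {
    final Map<String, String> local = new ConcurrentHashMap<>();
    final Map<String, String> peer = new ConcurrentHashMap<>();
    private final ExecutorService sender = Executors.newSingleThreadExecutor();

    // The latch simulates network delay: the copy lands only once it is opened.
    private Future<?> send(String key, String value, CountDownLatch network) {
        Runnable copy = () -> {
            try {
                network.await();
                peer.put(key, value);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        return sender.submit(copy);
    }

    // Synchronous: the caller blocks until the peer has the value.
    void putSync(String key, String value, CountDownLatch network) throws Exception {
        local.put(key, value);
        send(key, value, network).get();
    }

    // Asynchronous: the caller returns at once; the peer may lag behind.
    Future<?> putAsync(String key, String value, CountDownLatch network) {
        local.put(key, value);
        return send(key, value, network);
    }

    void shutdown() { sender.shutdown(); }

    // Returns the peer's value before and after the delayed async copy arrives.
    static String[] demo() {
        SyncAsyncReplicationSketch cluster = new SyncAsyncReplicationSketch();
        try {
            CountDownLatch slowNetwork = new CountDownLatch(1);
            Future<?> pending = cluster.putAsync("k", "v1", slowNetwork);
            String before = cluster.peer.get("k");   // copy has not arrived yet
            slowNetwork.countDown();
            pending.get();
            String after = cluster.peer.get("k");
            return new String[] {before, after};
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            cluster.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("peer before delivery: " + result[0]);
        System.out.println("peer after delivery: " + result[1]);
    }
}
```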

Initial state transfer
Brings a newly started server to the same cache state as the rest of the cluster

Eviction
Removes data from the local cache. In a distributed cache, an invalidation request is also sent to the other instances

Cache provider selection

Note: Infinispan, a cache provider from JBoss, replaces Ehcache as the default cache provider for Hibernate since Hibernate 3.5

About Ehcache 2.x

( copied from Ehcache site)
Ehcache 2.0 provides a new Hibernate 3.3 SPI caching plugin, JTA, Write-Behind, a new ultra-fast Bulk Loading API for clustered caches and dynamic runtime cache reconfiguration.
Ehcache 2.0 is fully backward compatible with earlier versions of Ehcache.
The Terracotta Server Array ("TSA") has also been re-engineered to dovetail with Ehcache to provide these features with cluster coherence,high availability and persistence.
This release also introduces some improvements to ehcache which reduce memory use (over 1.6 and 1.7) by the cache and improve the eviction algorithms.

Ehcache distributed cache

The following additional replicated caching mechanisms are available.

  • RMI - Point-to-point protocol
  • JMS - Not supported by > WAS 5
  • Terracotta - Commercial product
  • JGroups - Popular & stable, based on IP multicasting; JBoss and Tomcat clusters are based on this technology; complex architecture

My preference is JGroups over the others, so from here on I talk mostly about JGroups


The main features of JGroups

  • Group creation and deletion.
  • Group members can be spread across LANs or WANs
  • Membership detection and notification about joined/left/crashed members
  • Detection and removal of crashed members
  • Sending and receiving of member-to-group messages (point-to-multipoint)
  • Sending and receiving of member-to-member (Unicast) messages (point-to-point)
  • Fragmentation of large messages
  • Reliable unicast and multicast message transmission. Lost messages are retransmitted

IP Multicasting:

The sender sends a single datagram (from the sender's unicast address) to the multicast address, and the intermediary routers take care of making copies and sending them to all receivers that have registered their interest in data from that sender.

Hibernate 3.3 + Ehcache 2.0


Region factory assignment in the session factory configuration:

<property name="hibernate.cache.region.factory_class">net.sf.ehcache.hibernate.EhCacheRegionFactory</property>

Ehcache + JGroups

Configuring JGroups replication needs the following changes in ehcache.xml

CacheManagerPeerProviderFactory:
This change is common to all cache types

<cacheManagerPeerProviderFactory
     class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
     properties="connect=UDP(mcast_addr=231.12.21.132;mcast_port=45566;ip_ttl=32;
     mcast_send_buf_size=150000;mcast_recv_buf_size=80000):
     PING(timeout=2000;num_initial_members=6):
     MERGE2(min_interval=5000;max_interval=10000):
     FD_SOCK:VERIFY_SUSPECT(timeout=1500):
     pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):
     UNICAST(timeout=5000):
     pbcast.STABLE(desired_avg_gossip=20000):
     FRAG:
     pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;
     shun=false;print_local_addr=true)"
     propertySeparator="::"
     />

CacheEventListenerFactory:
The change below has to be made for each cache individually.


<cache
    name="com.somecompany.someproject.domain.Country"
    maxElementsInMemory="10000"
    eternal="false"
    timeToIdleSeconds="300"
    timeToLiveSeconds="600"
    overflowToDisk="true">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true, replicatePuts=true,
        replicateUpdates=true, replicateUpdatesViaCopy=false,
        replicateRemovals=true" />
</cache>


Ehcache Replication Properties

Here is the mapping of the Replication properties vs Ehcache properties










Best practices

Here are my suggested configuration options.
These are only suggestions; the actual values should be derived from your requirements.

Entities & Collections
     Synchronous Invalidation
     Less load on the network & cluster
     ['replicatePutsViaCopy' - Synchronous Invalidation is not supported by the Ehcache-JGroups implementation]
Query cache
     Local mode
     Cached query results are invalidated based on the values in the timestamps cache, so we can disable replication here.
TimeStamps cache
     Synchronous Replication
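Why the query cache can stay local while the timestamps cache must be replicated is easier to see in code. The sketch below is my simplified model of the check, with hypothetical names (`TimestampCheckSketch`, `tableUpdated`, `lookup`) and a logical counter instead of real time. It is not Hibernate's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a local query cache validated against a timestamps region.
class TimestampCheckSketch {
    // Timestamps region: table name -> logical time of the last update.
    private final Map<String, Long> lastUpdate = new HashMap<>();
    // Local query cache: query -> cached ids, plus the time each was cached.
    private final Map<String, long[]> results = new HashMap<>();
    private final Map<String, Long> cachedAt = new HashMap<>();
    private long clock = 0;

    // Called on every insert/update/delete. In a cluster, this is the part
    // that has to reach every node (hence synchronous replication).
    void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    void cacheQuery(String query, long... ids) {
        results.put(query, ids);
        cachedAt.put(query, ++clock);
    }

    // A cached result is usable only if the table has not changed since.
    long[] lookup(String query, String table) {
        Long at = cachedAt.get(query);
        if (at == null) return null;                       // never cached
        Long updated = lastUpdate.get(table);
        if (updated != null && updated > at) return null;  // stale, re-run the query
        return results.get(query);
    }

    public static void main(String[] args) {
        TimestampCheckSketch cache = new TimestampCheckSketch();
        cache.cacheQuery("from Country", 1L, 2L);
        System.out.println(cache.lookup("from Country", "COUNTRY") != null);
        cache.tableUpdated("COUNTRY");
        System.out.println(cache.lookup("from Country", "COUNTRY") != null);
    }
}
```

As long as every node sees the latest table timestamp, a stale local query result is rejected on lookup without any query cache replication.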


Sunday, July 25, 2010

My fav iOS4 apps

Recently I bought an iPod touch 3G. The interface is amazing. I have also upgraded my iPod to 'iOS4', which is even better! Thanks to Apple's new policy of free OS upgrades, even for iPods.

I find multitasking, folders and playlist creation the most useful features in iOS4.

Another great thing is the iOS4 apps. There are so many to download. Here is my app list:

Games:
=====
Music:
-------
DigiLite - drums
Virtuoso - Piano
Drum kit Lite

Action:
-------
Monkey Flight
RollerCoaster rush

Racing:
-------
Need for Speed - Undercover
Racing GTI
Traffic Rush

Fancy :
--------
Paper Toss
Action Bowling
Indian Rummy

Utils:
====
WordWeb
GoodReader
Twitter
Fring
Opera mini

Here are my favorites:

Games:
--------
1. NFS - (Obviously, I bought this game :) ) really great, reminds me of my college days.
2. Indian Rummy
3. Paper Toss
4. Traffic Rush

Utils:
-----
1. WordWeb
2. GoodReader - PDF reader

Wednesday, June 30, 2010

Cashing in on Caching...

Here are some very recent developments in caching...




The reason behind these new developments is 'Clouds'.

Why caching?

"It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read"

Then Why RDBMS?

If we reduce the number of DB calls, then why do we need a complex RDBMS?
A simple key-value store would be more than enough.

So is all the investment in caching meant to replace the RDBMS?

Maybe... But as software engineers, we need to learn about cache layers without forgetting the RDBMS. :)

Monday, July 20, 2009

Performance analysis

We have been analyzing the performance of our Spring+Hibernate based web application since March. The main task is to identify the bottlenecks & other hot spots.

Functional Problems:

  • Less throughput - <>
  • Memory leak - Heap memory is not released, which leads to frequent application server restarts
  • Low capacity - The Excel reading part is currently restricted to about 20MB, but the minimum requirement is 40MB+. With bigger files, Excel reading (POI API) fails for lack of memory.
  • XML plays an important role, and converting XML (Excel to XML) into an object graph is memory intensive because of its size

How to find the hot spots ??

  • Consider a single-user, single-thread scenario
  • Measure memory & execution time from the UI down to the Repository
  • We developed a simple AOP-based monitoring API that intercepts the application at different layers, collects statistics, and dumps them into a log file once the complete cycle is over.
  • To analyse heap usage, we did some profiling with the 'YourKit Java profiler'.
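The idea behind the monitoring API can be sketched with a plain JDK dynamic proxy. This is not our actual code, which was built on Spring AOP; `TimingMonitorSketch`, the `Repository` interface and the in-memory `stats` list are hypothetical stand-ins for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of layer interception using java.lang.reflect.Proxy.
class TimingMonitorSketch {
    // Stand-in for one application layer.
    interface Repository {
        String load(int id);
    }

    // Collected statistics; a real monitor would dump these to a log file.
    static final List<String> stats = new ArrayList<>();

    // Wraps any interface-typed target so that every call is timed.
    static <T> T monitor(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow the target's own exception
            } finally {
                long micros = (System.nanoTime() - start) / 1000;
                stats.add(type.getSimpleName() + "." + method.getName()
                        + " took " + micros + " us");
            }
        };
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }

    // Runs one monitored call and returns its result.
    static String demo() {
        stats.clear();
        Repository real = id -> "row-" + id;
        Repository monitored = monitor(Repository.class, real);
        return monitored.load(7);
    }

    public static void main(String[] args) {
        System.out.println(demo());
        stats.forEach(System.out::println);
    }
}
```

Wrapping each layer (UI, service, repository) the same way gives a per-layer breakdown of where the time goes, without touching the business code.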

Outcome:

* Heap analysis shows that

  • Entity objects are left behind due to a Hibernate bug
  • There is a flaw in our Hibernate query cache usage (details of both are available in the threads below)
* Excel reading takes too much memory and time
  • The POI user model maps the Excel file into many different types of objects (Workbook -> Worksheet -> Row -> Cell -> etc.), but their core structure is a list of Record objects. All the above objects are created just to improve usability. Because of this, data extraction from our complex Excel sheet takes more memory and time.
* 32-bit JVM
  • This is also a problem, because we can allocate only 3GB (max) to the JVM. The address-space limit for a 32-bit JVM is 4GB, which is too little for handling bigger XML and Excel files

Solution(Recommendations):

To solve the memory leak:
  • Upgrade the Hibernate jar
  • Avoid passing entity objects as parameters to cacheable queries
  • Manage the Hibernate 2nd level cache efficiently (define expiry and eviction criteria through ehcache.xml)
Excel reading:
  • Use the POI event model, which works directly on the Record objects instead of the complex object graph. POI claims a 10 times improvement in memory footprint. But re-writing the existing logic is ........
To handle the bigger XML and Excel files:
  • Upgrade the JVM to 64-bit so that we can allocate more memory. But we have to take care of GC issues (collections may be slow).
To increase the user base:
  • Horizontal scaling. Watch this space to know more about scaling; I have just started learning about it.

Monday, May 11, 2009

Hibernate leaves Lazy enabled objects in ThreadLocal

After a long monitoring exercise we found that some of our objects, which are huge in size, were left in the heap. Because of this, frequently restarting our production server had become a routine task.

The profiler output shows that the objects reside in the ThreadLocal map. All of them are lazy-enabled Hibernate entity objects.

After Googling we came to know that there is a serious bug in Hibernate's CGLIB proxy creation mechanism.

Because of that, none of our 'lazy=proxy' objects are removed from their ThreadLocal (Hibernate uses a ThreadLocal to hold the proxy object temporarily). Why a ThreadLocal? That is internal to Hibernate. The problem is that they forgot to clear the object held in the ThreadLocal.

So it will stay around (together with the object it's managing, and whatever object graph it may be connected to) until the next time a proxy is created for that type on that thread.

This is a really costly bug, because we enable lazy loading for potentially big objects like collections, BLOB data etc. So there is a huge impact on performance.

We noticed this problem in Hibernate 3.2.0; it is fixed in 3.2.3. We have started another exercise to analyse the impact of the upgrade. This will go on.... (See the Hibernate bug URL below.)

Another way to solve this problem (as an alternative to upgrading the Hibernate jar) is to get hold of the thread and clean up its ThreadLocal map. Doing that is worse than upgrading, because we have to figure out when and where to do this operation.

Another general question: why can't the thread pool clean up the thread before it is returned to the pool? If the thread pool provided pre & post processors to initialize & clean up a thread, that would be fine. I don't know whether something like that is already available. If anybody is aware of one, please let me know.

Even if it is available, I would need access to the server's thread pool... no way, right? Better that I upgrade the jar; I hope the jar upgrade won't create new problems.
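For a pool that we create ourselves, java.util.concurrent.ThreadPoolExecutor does expose beforeExecute and afterExecute hooks that run on the worker thread. The sketch below shows the idea under that assumption; `CleaningPoolSketch` and the `SCRATCH` thread-local are hypothetical names. It does not help with an application server's own pool, and it can only clear thread-locals we have a reference to.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a pool that cleans a known ThreadLocal after each task.
class CleaningPoolSketch {
    // Stand-in for some per-thread state that tasks forget to clear.
    static final ThreadLocal<byte[]> SCRATCH = new ThreadLocal<>();

    static ThreadPoolExecutor newCleaningPool() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>()) {
            @Override
            protected void afterExecute(Runnable task, Throwable failure) {
                super.afterExecute(task, failure);
                SCRATCH.remove();   // runs on the worker thread that ran the task
            }
        };
    }

    // True if a second task on the same worker still sees the first task's value.
    static boolean leftoverSeenBySecondTask() {
        ThreadPoolExecutor pool = newCleaningPool();
        try {
            Runnable leaky = () -> SCRATCH.set(new byte[1024]);   // never cleans up
            Callable<Boolean> check = () -> SCRATCH.get() != null;
            pool.submit(leaky).get();
            return pool.submit(check).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("leftover visible: " + leftoverSeenBySecondTask());
    }
}
```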

General Understanding:

* Don't forget the 'remove' method of the ThreadLocal object.
* Be wary of Hibernate releases before 3.2.3


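try {
    // ... work that sets and uses the thread-local value ...
} finally {
    threadLocal.remove(); // always clear it, so a pooled thread does not keep the object alive
}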
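A small experiment shows why the finally block matters on pooled threads. This is only an illustration with hypothetical names (`ThreadLocalLeakSketch`, `HOLDER`); a single-thread executor stands in for the server's thread pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a pooled thread keeps ThreadLocal values between tasks.
class ThreadLocalLeakSketch {
    static final ThreadLocal<Object> HOLDER = new ThreadLocal<>();

    // Runs two tasks on the same pooled thread. Returns true if the second
    // task can still see the value that the first task stored.
    static boolean staleValueVisible(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                try {
                    HOLDER.set(new Object());   // e.g. a big lazy proxy
                } finally {
                    if (cleanUp) {
                        HOLDER.remove();        // the fix from the snippet above
                    }
                }
            };
            Callable<Boolean> second = () -> HOLDER.get() != null;
            pool.submit(first).get();
            return pool.submit(second).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove: " + staleValueVisible(false));
        System.out.println("with remove:    " + staleValueVisible(true));
    }
}
```

Without the remove call the object stays reachable from the worker thread for as long as the thread lives, which in a server pool can mean until restart.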

Reference:
http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481

Sunday, March 15, 2009

Aspect Oriented Programming in my way..

Here is my simple presentation about 'AOP in the Spring way'. I presented it at our account-level technical event. This is my second contribution to this event; the first was about 'Multipurpose Template engines' (Feb 2008).

I explained a bit about AOP concepts & how Spring manages transactions using AOP.

My conclusion is 'Think in terms of AOP, not in terms of AOP frameworks'. Our main concern should be to improve the 'Separation of Concerns' in our application in some way.

Presentation for you:

Monday, December 1, 2008

Eclipse Mylyn & Trac

For the past 2 weeks I have been using Trac (a project management tool) inside Eclipse. It seems good. There are some useful features we can adopt that might ease our development process. There are terms like activate & deactivate tasks, context, schedule etc.

* A context holds the collection of modified files for a particular task.
* When we work on multiple tasks, it is very easy to retrieve the modified files for a particular task using its context.
* Also, when we commit the files, the task number and the subject line of the task are automatically added to the svn comment, and Trac is updated with the svn revision number of the modified files.

I found these features useful. Connectors are available for JIRA, Trac & Bugzilla.

I have yet to try conflict scenarios, such as a single file that is associated with multiple contexts. Watch this space for my results.

It is really a good plug-in to try.

Ref:
How to : http://wiki.eclipse.org/index.php/Mylyn_User_Guide#Eclipse_settings