Monday, December 24, 2007

Super scalable Java applications with Terracotta and the Amazon Elastic Compute Cloud (EC2).

written by Marcel Panse

I just got this idea, and I don't know if it works at all. I'm just playing with it and thought I might as well write it down here right away.

Amazon (S3 and EC2)

Amazon has launched a couple of web services. One of them is EC2 (Elastic Compute Cloud) and another is S3 (Simple Storage Service). S3 is a distributed storage solution where you can store a practically unlimited amount of data online. This storage is distributed over lots and lots of (virtual) servers around the globe.
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.
With S3 you only pay about $0.18 per GB of storage per month and about $0.16 per GB of data transferred.
EC2 stands for 'Elastic Compute Cloud' and makes it possible to launch an image of a server online. This virtual server is launched somewhere in the computing cloud, which again is spread over lots of physical servers.
Just as Amazon Simple Storage Service (Amazon S3) enables storage in the cloud, Amazon EC2 enables "compute" in the cloud. Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use.
You pay about $0.10 per instance hour. I can imagine this saves a lot of money, since you only launch the number of servers you need at that moment and you can scale up or down whenever you like.
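To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Java. The three prices are the approximate ones quoted above; the workload (instance counts, hours, gigabytes) and the class name are made up purely for illustration.

```java
// Rough monthly cost estimate based on the approximate prices quoted in this post.
final class CloudCostEstimate {
    static final double EC2_PER_INSTANCE_HOUR = 0.10;  // USD per instance hour
    static final double S3_PER_GB_STORED = 0.18;       // USD per GB per month
    static final double S3_PER_GB_TRANSFERRED = 0.16;  // USD per GB transferred

    // Cost of running a number of instances for a number of hours each.
    static double ec2Monthly(int instances, double hoursPerInstance) {
        return instances * hoursPerInstance * EC2_PER_INSTANCE_HOUR;
    }

    // Cost of storing and transferring data in S3 for one month.
    static double s3Monthly(double gbStored, double gbTransferred) {
        return gbStored * S3_PER_GB_STORED + gbTransferred * S3_PER_GB_TRANSFERRED;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2 instances running all month (720 hours),
        // plus 3 extra instances that only run during a 40-hour peak.
        double compute = ec2Monthly(2, 720) + ec2Monthly(3, 40);
        // Hypothetical data volume: 50 GB stored, 100 GB transferred.
        double storage = s3Monthly(50, 100);
        System.out.printf("EC2: $%.2f, S3: $%.2f, total: $%.2f%n",
                compute, storage, compute + storage);
    }
}
```

In this made-up scenario the elastic setup costs 2 × 720 × $0.10 + 3 × 40 × $0.10 = $156 for compute, whereas keeping all five instances running around the clock would cost 5 × 720 × $0.10 = $360.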

Terracotta

Open Terracotta is Open Source JVM-level clustering software for Java. It delivers clustering as a runtime infrastructure service, which simplifies the task of clustering a Java application immensely, by effectively clustering the JVM underneath the application, instead of clustering the application itself.

Open Terracotta's JVM-level clustering can turn single-node, multi-threaded applications into distributed, multi-node applications, often with no code changes. Terracotta plugs into the Java Memory Model in order to maintain semantics of Java (Java Language Specification), such as pass-by-reference, thread coordination, and garbage collection across the cluster. Open Terracotta's JVM-level clustering is enabled through declarative configuration (XML), and provides fine-grained, field-level replication, which means that objects do not need to implement Java serialization.
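To make "no code changes" a bit more concrete, this is the kind of ordinary single-JVM, multi-threaded code the description is talking about. It is only a sketch: the class and field names are made up, it uses no Terracotta API at all, and the XML configuration that would mark the map as shared and the synchronized methods as cluster-wide locks is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Ordinary single-JVM code: no Terracotta imports and no Serializable.
final class VisitCounter {
    // Assumption of this sketch: in a Terracotta setup this field would be the
    // object declared as shared in the XML configuration (not shown).
    private final Map<String, Integer> visitsPerUser = new HashMap<>();

    // Plain Java synchronization guards the map against concurrent threads.
    synchronized int recordVisit(String user) {
        return visitsPerUser.merge(user, 1, Integer::sum);
    }

    synchronized int visits(String user) {
        return visitsPerUser.getOrDefault(user, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        VisitCounter counter = new VisitCounter();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.recordVisit("marcel");
            }
        };
        // Two threads in one JVM; the idea is that with JVM-level clustering
        // they could just as well be threads on two different servers.
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter.visits("marcel"));
    }
}
```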

With a simple load balancer it doesn't matter which server a request is sent to, because all servers are on the grid and share the same memory, so all user sessions are available on all servers. When a new server is added to the grid, it automatically picks up the state of the application and can receive requests immediately. This makes it very easy to scale up when needed.
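Because no request has to stick to a particular server, the load balancer really can be trivial. A minimal round-robin sketch, with made-up server names and a hypothetical class name, might look like this. It only covers adding servers, not removing them.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin balancer: no session affinity is needed when every
// server in the grid can see every session.
final class RoundRobinBalancer {
    private final List<String> servers = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> initialServers) {
        if (initialServers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        servers.addAll(initialServers);
    }

    // A freshly launched server can take requests as soon as it is added.
    void addServer(String server) {
        servers.add(server);
    }

    // Pick the next server in turn; floorMod keeps the index valid
    // even after the counter overflows.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer =
                new RoundRobinBalancer(List.of("server-1", "server-2"));
        System.out.println(balancer.pick());
        System.out.println(balancer.pick());
        balancer.addServer("server-3"); // scale up
        for (int i = 0; i < 3; i++) {
            System.out.println(balancer.pick());
        }
    }
}
```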

Another idea is to use Amazon's S3 service (distributed online storage) for file storage, because it too scales practically without limit. All data transfer between the EC2 cloud and S3 is free!

Winning combination?

The idea is to create a standard Java backend application (the usual Spring/Hibernate/etc. application). Then we take that application and launch it in a Terracotta grid. That makes it scalable over multiple servers that share the same resources through Terracotta. Take it a step further by creating a server image of it and launching that in Amazon's EC2 cloud.

Possible challenges (problems do not exist, right?)

All images in the EC2 cloud are stateless: when a server instance goes offline or restarts, all of its state is gone. Every instance starts from a clean image, so anything you write to local disk is lost. That shouldn't be a problem if we write all files to S3.

Databases are another matter. If we have, say, five instances in the cloud, all running Terracotta and thus sharing the same memory and state, then we don't have to persist anything to the database and can just keep it in memory. Terracotta could drastically reduce the database load. But we can't keep everything in memory: you don't want historic data that is practically never used anymore to take up memory. You only want to keep objects in memory that you are likely to query a lot.

Where do we store the data that we don't want in memory? I heard about Amazon SimpleDB (SDB), which is going to be launched pretty soon (at the time of writing). I haven't looked into it yet, but it sounds promising.
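The "keep the hot objects in memory, push the rest somewhere else" part could be sketched as a small least-recently-used cache. This is only an illustration of the idea: the class name is made up, and the cold store is a plain Map standing in for whatever ends up being used (S3, a database, SimpleDB).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Keeps the most recently used objects in memory and moves the least
// recently used ones to a "cold" store once the limit is reached.
final class HotObjectCache<K, V> {
    private final Map<K, V> coldStore; // stand-in for the external store
    private final LinkedHashMap<K, V> hot;

    HotObjectCache(final int maxHotEntries, final Map<K, V> coldStore) {
        this.coldStore = coldStore;
        // accessOrder = true makes iteration order least recently used first.
        this.hot = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxHotEntries) {
                    // Write the evicted entry to the cold store before dropping it.
                    coldStore.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(K key, V value) {
        hot.put(key, value);
    }

    // Look in memory first; on a miss, pull the object back from the cold store.
    synchronized V get(K key) {
        V value = hot.get(key);
        if (value == null) {
            value = coldStore.remove(key);
            if (value != null) {
                hot.put(key, value);
            }
        }
        return value;
    }

    synchronized boolean isHot(K key) {
        return hot.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, String> cold = new HashMap<>();
        HotObjectCache<String, String> cache = new HotObjectCache<>(2, cold);
        cache.put("order-1", "old order");
        cache.put("order-2", "recent order");
        cache.put("order-3", "new order"); // pushes order-1 to the cold store
        System.out.println("order-1 in memory? " + cache.isHot("order-1"));
        System.out.println("cold store keys: " + cold.keySet());
    }
}
```

In a Terracotta grid the in-memory part would be shared between all servers, so only the cold store would need to live outside the cluster.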

I'm going to work this out and try it later. If you have any comments or suggestions, let me know!