What are the best approaches to clustering/distributing a Java server application?
I’m looking for an approach that allows you to scale horizontally by adding more application servers, and more database servers.
- What technologies (software engineering techniques or specific technologies) would you suggest to approach this type of problem?
- What techniques do you use to design a persistence layer that scales to many readers/writers?
- How do you scale application transactions and scale access to shared data? (The best approach is to eliminate shared data entirely; what techniques can you apply to eliminate it?)
- Different approaches seem to be needed depending on whether your transactions are read- or write-heavy, but I feel like an application optimized for heavy writes would also be efficient for reads.
The “best” solution would let you write a Java application for a single node and “hide” most of the details of accessing/locking shared data.
In a distributed environment the most difficult issue always comes down to multiple transactions accessing shared data. There seem to be two common approaches to concurrent transactions:
- Explicit locks (which is extremely error prone and slow to coordinate across multiple nodes in a distributed system)
- Software transactional memory (STM) AKA optimistic concurrency where a transaction is rolled back during a commit if it discovers that shared state has changed (and the transaction can later be retried).
Which approach scales better and what are the trade-offs in a distributed system?
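To make the optimistic approach concrete, here’s a minimal single-JVM sketch of the read/validate/commit/retry cycle using `AtomicReference.compareAndSet` (the class and method names are my own, not from any of the products below); distributed STM products apply the same cycle across nodes, just with network-visible state:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Optimistic concurrency sketch: read a snapshot, compute a new value,
// and commit only if nobody else changed the shared state in between;
// otherwise discard the work ("roll back") and retry the transaction.
public class OptimisticCell<T> {
    private final AtomicReference<T> state;

    public OptimisticCell(T initial) {
        state = new AtomicReference<>(initial);
    }

    public T update(UnaryOperator<T> transaction) {
        while (true) {
            T snapshot = state.get();                      // read
            T proposed = transaction.apply(snapshot);      // compute
            if (state.compareAndSet(snapshot, proposed)) { // validate + commit
                return proposed;
            }
            // another writer committed first: retry with a fresh snapshot
        }
    }
}
```

The appeal over explicit locks is that no coordination happens unless there is actual contention; the cost is wasted work on every retry, which is why write-heavy workloads with hot shared objects are the worst case for this model.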
I’ve been researching scaling solutions (and in general applications that provide an example of how to scale) such as:
- Terracotta – provides “transparent” scaling by extending the Java memory model to include distributed shared memory, using Java’s concurrency locking mechanisms (synchronized, ReentrantReadWriteLock).
- Google App Engine for Java – lets you write Java (or Python) applications that are distributed across “cloud” servers, where the platform decides which server handles a transaction and you use BigTable to store your persistent data (I’m not sure how transactions that access shared data, or lock contention, are handled so that it can scale effectively).
- Darkstar MMO Server – Darkstar is Sun’s open-source MMO (massively multiplayer online) game server. It scales transactions in a transactional-thread manner, allowing a given transaction to run only for a certain amount of time before committing; if it takes too long, it is rolled back (kinda like software transactional memory). They’ve been researching multi-node server setups for scaling.
- Hibernate’s optimistic locking – if you are using Hibernate, its optimistic concurrency support gives you software-transactional-memory-style behavior.
- Apache CouchDB – is supposed to “scale” naturally to many reader/writer DBs in a mesh configuration. (Is there a good example of how it manages locking data or ensures transaction isolation?)
- JCache – scales “read”-heavy apps by caching the results of common queries; in Google App Engine you can use it to access memcached and to cache other frequently read data.
Terracotta seems to be the most complete solution, in that you can “easily” modify an existing server application to support scaling (after defining @Root objects and @AutoLockRead/Write methods). The trouble is that to really get the most performance out of a distributed application, optimization for distributed systems isn’t really an afterthought: you kinda have to design it with the knowledge that any object access could potentially block on network I/O.
To scale properly, it seems like it always comes down to partitioning data and load balancing transactions so that a given “execution unit” (CPU core -> thread -> distributed application node -> DB master node) mostly operates on data it owns.
It seems, though, that to make any app scale properly by clustering you need to be able to partition your transactions in terms of the data they read and write. What solutions have people come up with for distributing their application’s data (Oracle, Google BigTable, MySQL, data warehousing), and how do you generally manage partitioning data (many write masters, with many more read DBs, etc.)?
In terms of scaling your data persistence layer, what type of configuration scales out best for partitioning your data across many readers/many writers? (Generally I’d partition my data by a given user, or whatever core entity is your “root” object, with each one owned by a single master DB.)
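The partition-by-root-entity idea above can be sketched as a simple shard router: hash the owning user’s id over the list of master DBs so that all of one user’s reads and writes land on a single master (the class and shard names here are hypothetical placeholders):

```java
import java.util.List;

// Routes each "root" entity (here, a user) to exactly one master DB,
// so single-user transactions never need cross-shard coordination.
public class ShardRouter {
    private final List<String> masters;

    public ShardRouter(List<String> masters) {
        this.masters = masters;
    }

    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    public String masterFor(String userId) {
        return masters.get(Math.floorMod(userId.hashCode(), masters.size()));
    }
}
```

Plain modulo hashing like this reshuffles nearly every key when a master is added or removed; consistent hashing is the usual refinement when the shard set needs to grow without mass data migration.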