[Cosmo-dev] Deployment of Cosmo persistence load distribution
jared at wordzoo.com
Mon Jul 17 13:52:21 PDT 2006
Thread opening: Deployment of Cosmo persistence load distribution. aka,
MySQL can't be scaled to arbitrary sizes. In which we discuss the looming
issue of scaling the Hosted Service up. Act 1.
I was recently discussing with someone the near future of Cosmo storage
separation into a MySQL layer. Though we were both delighted, I discovered
this person was assuming that, essentially, once the persistence layer is
implemented on MySQL, one can just keep adding MySQL boxes to scale up to
arbitrary sizes. Scaling problem solved!
No, this is sorely not the case: MySQL, like most RDBMS platforms,
supports decent "active/hot-standby" failover configurations, but
distributing a transactional workload across multiple databases is still a
very hard problem.
Technically speaking, the problem is not read operations. For most
read-heavy workloads (static web page hosting, image hosting, RSS, email,
etc.) there are reasonably well-understood patterns for distributing the
load. Caching and time-delayed synchronized filesystems figure prominently
in those patterns.
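To make the read side a bit more concrete, here's a minimal sketch in Java
(illustrative only, not Cosmo code; the ResourceStore interface and class
names are made up) of the read-through caching pattern: repeated reads of
feed/static content are served from memory and only fall through to the
database on a miss.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    /** Hypothetical backing store; in practice this would hit MySQL. */
    interface ResourceStore {
        byte[] fetch(String path);
    }

    /**
     * Read-through cache: serve repeated reads of static/feed content from
     * memory and only hit the backing database on a miss. Invalidation on
     * write and cache expiry are omitted for brevity.
     */
    class CachingResourceStore implements ResourceStore {
        private final ResourceStore backend;
        private final ConcurrentMap<String, byte[]> cache =
            new ConcurrentHashMap<String, byte[]>();

        CachingResourceStore(ResourceStore backend) {
            this.backend = backend;
        }

        public byte[] fetch(String path) {
            byte[] cached = cache.get(path);
            if (cached != null) {
                return cached;          // cache hit: no database read
            }
            byte[] fresh = backend.fetch(path);
            cache.put(path, fresh);     // populate on miss
            return fresh;
        }
    }

Invalidation (what happens when someone writes) and expiry are where the
real work is; the sketch punts on both, which is exactly why caching helps
reads far more than it helps writes.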
Write operations are harder. Much, much harder.
There are a few standard patterns, a few best practices, some exotic
approaches, and an encyclopedia-sized trail of not-very-successful attempts.
Solutions like Oracle RAC do allow distributing a transactional write load.
MySQL has a comparable product (MySQL Cluster), but it was initially
implemented to work only with in-memory databases. Though someone is likely
to pop up with "what about" and "MySQL does X now" responses, I hope it's
not controversial to say this is a hard problem with limited solutions.
We need to determine what we're doing to have a large-scale story by the
Beta timeframe. Beta should technically be testing that implemented
solution as well, so there is time pressure. (We can't test one model
through Beta and then switch to something else after 1.0.) The days when
Cosmo could handwave about the distributed backend are past, especially as
we intend to take on the persistence layer more directly with a Hibernate
backend.
So I'm stating the problem as how to distribute the persistence layer over
multiple (3 or more) physical boxes. (I write 3 or more to exclude
configurations where we've just done "failover" and only one of the
databases is writing at a time, which still leaves us with a maximum write
I/O cap.)
It's worth pointing out that we don't absolutely have to solve the
"3-or-more servers" load-distribution problem. It is possible to buy larger
and larger database servers ($50k, $250k, $1M, etc.), buy two of them to
implement an active/hot-standby configuration, and scale a hosted service up
to pretty much arbitrary sizes. The same can be true for implementing a
"true" distributed database like Oracle RAQ; we can throw large amounts of
money at the problem to make the "database layer" appear as one large
infrastructure that the app can treat as a black box.
Realistically, though, we likely need to solve this problem through direct
application support of persistence-layer distribution. In practical terms,
the way most people solve this is by having multiple partitioned/separate
databases DB1, DB2, DB3, etc., and having the application look up which
database server to connect to for most read/write operations. You then need
to code up features to provide cross-database management and provide a
global lookup table that any client can access any time (to avoid the "which
database to use" bootstrapping problem).
Pretty much everyone who has worked in development or operations at an ASP
(Application Service Provider) has seen this in action (John can testify),
and though it's a hassle, it's at least known to solve the problem in a
pretty reasonable way. I bet people who've worked in these shops can attest
to ongoing incidents, bugs, etc. related to this "manually-maintained"
distribution layer. But is it a necessary evil for Cosmo and the Hosted
Service?
This deployment issue (of persistence layer load distribution) is one that
has existed since day 1 in Cosmo, has not yet had a plan-of-record, and
exists regardless of any Cosmo/Scooby code mingling. I do believe scaling
plays into the Cosmo/Scooby merge discussion, but it's subtle and any
discussion there should be flavored by our position on the overall scaling
problem.
My goals here are to:
1) Clarify that moving data storage off to MySQL leaves many vexing
database scaling problems unsolved.
2) Identify the unsolved problem of distributing the RDBMS load in any
large-scale deployment.
3) Propose that we are likely to need to build our own app->database lookup
layer.
4) Open discussion to see if there are any other viable architectures we'd
like to settle on which avoid us having to build our own app->database
lookup layer.
I haven't analyzed the options in detail, but this email is too long
anyway. I'm happy to discuss the details vigorously in subsequent messages.
Please respond with any thoughts you have on how we might best scale Cosmo
to large sizes.