[Cosmo-dev] Re: [service-dev] Deployment of Cosmo persistence load distribution

Jared Rhine jared at wordzoo.com
Mon Jul 17 23:03:12 PDT 2006


Brian Moseley wrote:
>> I bet people who've worked in these shops can attest
>> to ongoing incidents, bugs, etc related to this "manually-maintained"
>> distribution layer.
> 
> yep. at critical path we wrote a dns-based "locator service" that,
> given a username, returned connection info for the data services to
> which that user had access. i don't recall any major operational
> problems with this service other than the need to make sure it was
> extremely performant and highly available.

Fair enough; forgive my operational FUD.  My last experience was at a 60TB
Oracle/Veritas/Hitachi architecture with about 500 app servers.  Even with
all that, downtimes and switchovers there still took on the order of a dozen
minutes.  Most of the issues I saw were with loading data into the
redirection databases and keeping it in sync with all the other databases.
They could fairly be called process problems rather than technical ones,
though they produced outages nonetheless.

> this was also a win for manageability because, once we had per-user
> granularity for service location, we were able to build migration
> tools that operated on a per-user basis. this allowed us to rebalance
> user data when new hardware was added, for instance, with minimal
> interruption to individual users and no interruption to the overall
> service.

Well, automated migration tools that run without service downtime are more a
necessity than a benefit once you partition your data.  I would tend to
strongly agree that yes, for cpath's and the Hosted Service's purposes,
performing the lookup at the granularity of the user is probably the right
way to go.  I'd probably couple the lookup with an "organization" id, so
that multiple simultaneous instances running on the same servers and lookup
service can be kept entirely apart just through the lookup.
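To make the "organization" id idea concrete, here is a minimal sketch of a per-user locator keyed by (organization id, username), together with the kind of per-user migration Brian describes: freeze one user, copy their data, then flip that user's locator entry. Everything here (the Locator class, the freeze step, the copy_user_data callback, the server ids) is hypothetical and not part of Cosmo; a real locator would sit in a highly available store, not a Python dict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Key = Tuple[str, str]  # (organization id, username)


@dataclass
class Locator:
    """Hypothetical per-user service locator; values are opaque server ids."""
    entries: Dict[Key, str] = field(default_factory=dict)
    # Users currently being moved; lookups for them fail fast so clients retry.
    frozen: Set[Key] = field(default_factory=set)

    def assign(self, org_id: str, username: str, server_id: str) -> None:
        self.entries[(org_id, username)] = server_id

    def lookup(self, org_id: str, username: str) -> str:
        key = (org_id, username)
        if key in self.frozen:
            raise RuntimeError(f"{username} is being migrated; retry shortly")
        return self.entries[key]

    def migrate(self, org_id: str, username: str, target: str,
                copy_user_data: Callable[[str, str, str], None]) -> None:
        """Move one user's data without touching any other user."""
        key = (org_id, username)
        source = self.entries[key]
        self.frozen.add(key)            # stop new requests for this user only
        try:
            copy_user_data(username, source, target)
            self.entries[key] = target  # flip only after the copy succeeded
        finally:
            self.frozen.discard(key)    # unfreeze whether or not the copy worked


if __name__ == "__main__":
    loc = Locator()
    loc.assign("osaf", "alice", "db1")
    loc.assign("acme", "alice", "db7")  # same username, different org: kept apart
    loc.migrate("osaf", "alice", "db2", lambda user, src, dst: None)
    print(loc.lookup("osaf", "alice"), loc.lookup("acme", "alice"))  # db2 db7
```

Because the key includes the organization id, two hosted instances can share servers and a single lookup service without ever seeing each other's users, and a failed copy leaves the user on the old server.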

So OK, let's say we've got a service locator in operation; we can look up a
username and get back what, a server id?

Is that server id going to be the MySQL database hostname to use?  Or the
Cosmo app server to use?

In terms of deployment, I'd really like to have a stateless Cosmo.  Though
we probably need to partition RDBMS data, wouldn't it be great to have an
army of identical Cosmo servers ready to serve requests for any username at all?

This of course is greatly helped by a stateless Cosmo, and I admit I felt a
chill recently when I read your comment that part of the planned changes
(the merge, I believe?) may require a stateful Cosmo.  I didn't fully catch
why, though.

With a stateful Cosmo, can I have even 2 Cosmo servers serving parallel
requests for a given set of users (i.e., a particular database slice, users
aa-ag)?  Or were you thinking of a 1:1 correspondence between database
servers and app servers?
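A sketch of the slice question, assuming users are range-partitioned by username as in the "aa-ag" example: one table maps a username to its database slice, and a second maps each slice to the app servers allowed to serve it. The slice bounds and host names are made up for illustration. Listing more than one app server for a slice, as done for the first slice here, is only safe if Cosmo keeps no per-user state between requests; with a stateful Cosmo each list would shrink to one entry, which is the 1:1 model.

```python
import bisect
from typing import Dict, List

# Hypothetical slice table, sorted by the first username in each slice.
# The first slice starts at "" so it also catches names sorting before "ah"
# (digits, punctuation); with these bounds it is the "aa-ag" slice.
SLICE_STARTS: List[str] = ["", "ah", "n", "t"]
SLICE_DBS: List[str] = ["db-aa-ag", "db-ah-m", "db-n-s", "db-t-z"]

# App servers allowed to serve each slice. One entry per slice is the
# 1:1 app:db model; a stateless Cosmo could list several interchangeable ones.
APP_SERVERS: Dict[str, List[str]] = {
    "db-aa-ag": ["cosmo1", "cosmo2"],
    "db-ah-m": ["cosmo3"],
    "db-n-s": ["cosmo4"],
    "db-t-z": ["cosmo5"],
}


def slice_for(username: str) -> str:
    """Return the database slice holding this user's data."""
    i = bisect.bisect_right(SLICE_STARTS, username.lower()) - 1
    return SLICE_DBS[i]


def app_server_for(username: str, request_no: int) -> str:
    """Spread requests for one slice across its app servers round-robin."""
    pool = APP_SERVERS[slice_for(username)]
    return pool[request_no % len(pool)]
```

This also suggests one answer to "MySQL hostname or Cosmo app server?": the locator could return a slice id, with the database host and the app server pool both derived from it.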

For Scooby/web-ui sessions, I can likely provide for session stickiness, so
that subsequent web sessions end up back at the same server.  I'm trying to
think of how session stickiness issues relate to non-Scooby HTTP requests,
like test/admin scripts, CalDAV, etc.  Do I need to stick a layer in
front of Cosmo that auto-detects the incoming username and then proxies to
the correct app server?
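One way to picture that front layer is a router behind the single public IP that picks an app server per request: an existing browser session stays pinned, and otherwise the username carried by the request decides. This sketch assumes CalDAV and script clients authenticate with HTTP Basic auth and that sessions are identified by a servlet-style JSESSIONID cookie; choose_backend, the sticky table and the locate callback are hypothetical names, not an existing Cosmo or proxy API.

```python
import base64
import binascii
from http.cookies import SimpleCookie
from typing import Callable, Dict, Mapping, Optional


def username_from_basic_auth(headers: Mapping[str, str]) -> Optional[str]:
    """Return the username from an HTTP Basic Authorization header, or None.

    Expects header names already lower-cased.
    """
    value = headers.get("authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, _password = decoded.partition(":")
    return user if sep and user else None


def choose_backend(raw_headers: Mapping[str, str],
                   locate: Callable[[str], str],
                   sticky: Dict[str, str],
                   fallback: str) -> str:
    """Pick the app server for one request arriving at the shared public IP."""
    headers = {k.lower(): v for k, v in raw_headers.items()}
    # 1. Browser (Scooby) sessions: honor an existing session pin.
    cookie = SimpleCookie()
    cookie.load(headers.get("cookie", ""))
    session = cookie.get("JSESSIONID")
    if session is not None and session.value in sticky:
        return sticky[session.value]
    # 2. CalDAV clients, test/admin scripts: route on the authenticated username.
    user = username_from_basic_auth(headers)
    if user is not None:
        return locate(user)
    # 3. Anonymous requests (e.g. a login page) can go to any server.
    return fallback
```

Routing on the username inside the proxy, rather than answering with a redirect, means a CalDAV PUT never has to be re-sent to a different host.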

Best practice, I feel, is to avoid doing cheesy things like having a
www14.osaf.us and redirecting to the right server if connections end up on
the wrong server.  (I realized recently that redirection works fine for
browsers, but maybe not so well for CalDAV PUTs etc.; I haven't tried it.)
I'd like to have an exposed IP address (per data center/cluster), with
whatever proxying/redirection magic is necessary happening behind the scenes.

I admit that what I really want in our architecture is separate Scooby
layers with session persistence, stateless layers of Cosmo servers, and
service-locator lookups of user data to specific databases.  In this model,
each layer can be sized independently and all layers are fully
load-balanced.  I'm definitely not saying that other models can't be
attractive, but I do want to ask these types of questions when we start
talking about user partitioning.

>> 4) Open discussion to see if there are any other viable architectures

> all of my experience is with partitioning data as you've described, so
> i'm curious to hear alternate suggestions.

Well, I thought recently that "distributed + transactional = EJB" so a JEE
solution might be one route.  I admit I don't fully understand how JEE would
implement *distributed* serialization to databases, though the word
"persistence" is plastered all over the EJB specs.  I have no "full" EJB
deployments in my background; Bobby can say more, probably others.

There might be some Java implementations of a distributed writable cache
that use non-EJB APIs and are nice and lightweight.  If that cache layer
knew how to distribute its storage, then all Cosmo servers could use the
same distributed write API.
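As an illustration of a cache layer that "knows how to distribute its storage", here is a sketch using consistent hashing, a technique swapped in for illustration and not one named in the thread: every Cosmo would call the same put/get, and a hash ring decides which storage node owns each key. The class names, node names and the in-memory dicts standing in for remote nodes are all hypothetical; replication, failure handling and transactions are left out entirely.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple


class HashRing:
    """Consistent-hash ring mapping keys to storage nodes (illustrative only)."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        # Each node gets many virtual points on the ring to even out load.
        self._ring: List[Tuple[int, str]] = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(replicas))
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]


class DistributedWriteCache:
    """Same put/get API on every app server; the ring picks the owning node."""

    def __init__(self, nodes: List[str]) -> None:
        self.ring = HashRing(nodes)
        self.stores: Dict[str, Dict[str, str]] = {n: {} for n in nodes}

    def put(self, key: str, value: str) -> None:
        self.stores[self.ring.node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.stores[self.ring.node_for(key)].get(key)
```

The usual reason to prefer a ring over `hash(key) % N` is that adding a node moves only roughly 1/N of the keys, and only onto the new node, which matters for the add-new-hardware rebalancing case Brian mentions.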

One can split the read and write layers, distributing reads widely and
centralizing writes.  But I guess that doesn't actually let you scale writes
up to arbitrary volumes the way true partitioning does.
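The read/write split can be sketched as a small statement router, which also shows why it doesn't scale writes: every non-read statement still lands on the one primary. The router, its verb list and the server names are hypothetical, and a real deployment would also have to cope with replication lag (a read sent to a replica right after a write may not see that write yet).

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route statements: reads fan out over replicas, writes go to one primary."""

    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # With no replicas, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        normalized = " ".join(words).upper()
        # Locking reads (SELECT ... FOR UPDATE) must run on the primary.
        if verb in self.READ_VERBS and "FOR UPDATE" not in normalized:
            return next(self._replicas)
        # Everything else (INSERT, UPDATE, DELETE, DDL, ...) hits the primary,
        # so total write capacity is still that of a single server.
        return self.primary
```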

If we're truly partitioning users, we can distribute all the way down to
individual boxes so that Scooby, Cosmo, and MySQL all run on the same box.
If we can't have multiple Cosmo servers work in parallel on a given slice of
users (i.e., the 1:1 app:db model), then this is an attractive option.

Overall, partitioning data across RDBMS servers is a prudent route and I
expect us to end up there.  There's a lot of nuance in the layer above that,
so I hope you'll indulge all the questions/conversation in this area.

-- Jared

