[Dev] Re: ownership and relays (4suite uuid.py)
rys mccusker
david at treedragon.com
Tue Apr 8 17:40:19 PDT 2003
David Paigen wrote:
> petite_abeille at mac.com wrote:
>>Why would you want to mangle object id and repository id? If you need
>>more information about your object aside from its identifier (e.g. its
>>entity, how to get to it, etc), this most likely doesn't belong to the
>>oid itself.
>
> Either I don't understand the semantics of object "ownership" and access,
> or I don't understand your question.
this comes up a lot, and I should write a wiki page which describes
a general plan for how object ownership affects things like replication
and synchronization. I was describing it to Katie the day we discussed
persistent event queues, and it is about as large a topic, but more confusing.
here is a summary version:
no one owns a canonical version of an object. Object sharing is always
simulated by copying. So a canonical copy of an object is created only
by cooperative simulation, through copying between repositories.
when we discussed this with John Anderson, he came up with the term
"relay" to emphasize the aspect he thought was most important. Say you
want to have a "centralized" server which publishes amalgamated data
from multiple clients. In the model I propose, a central server does
not know that it is central, and has no responsibility for ensuring
integrity of the distributed copies. It is only a dumb relay.
Clients who want to publish their data put a copy into the relay.
Latency for being up-to-date depends upon relay update lag time.
if we had a distributed database which guaranteed global consistency
of some view of some data, this would put the burden on the database
to ensure a condition which holds across time and space, with a
failure mode which tends to be catastrophic instead of incremental.
instead, we want a model which explicitly acknowledges your data is
stale to the degree lag time between replication opportunities has
permitted data to get out of sync. in this model, you can negotiate
the degree of freshness you need with clients you cooperate with.
instead of implicitly guaranteeing global consistency, we explicitly
allow you to get into sync modulo some degree of staleness you are
willing to accept.
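
the relay idea can be sketched in a few lines. This is only a minimal illustration, not Chandler code: the `Relay` class, its method names, and the choice to measure staleness as time since the last published copy are all assumptions made for the example.

```python
import time


class Relay:
    """Hypothetical dumb relay: it stores copies and knows nothing else.

    It does not know it is "central" and makes no integrity guarantees.
    """

    def __init__(self):
        # object id -> (payload copy, time the copy was published)
        self._copies = {}

    def publish(self, oid, payload, now=None):
        """A client puts a copy of its data into the relay."""
        stamp = time.time() if now is None else now
        self._copies[oid] = (payload, stamp)

    def fetch(self, oid, max_staleness, now=None):
        """Return (payload, age, acceptable).

        `acceptable` is True when the copy is no older than the staleness
        the caller said it was willing to accept.
        """
        payload, published = self._copies[oid]
        current = time.time() if now is None else now
        age = current - published
        return payload, age, age <= max_staleness
```

the caller, not the relay, decides how fresh is fresh enough: the same copy can be acceptable to one client and too stale for another.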
now I will try to apply this in the context of the following remarks.
> If Chandler can view, add, or modify items from many repositories at
> once, Chandler will obviously need to know where those items reside
> so that changes can be monitored or delivered.
the same item of data might appear in multiple repositories, and
Chandler will not try to guarantee they stay in sync. Chandler will only
synchronize repositories, which gets resolved at the granularity of
object changes. (to avoid locking, this might require optimistic
algorithms, which might abort a sync if an object changes during a sync.)
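
a rough sketch of what such an optimistic sync could look like, assuming each object carries a version counter. The names `VersionedRepository`, `sync_object` and `SyncAborted`, and the `during_copy` hook used to simulate a concurrent change, are all made up for illustration.

```python
class SyncAborted(Exception):
    """Raised when an object changed while it was being synchronized."""


class VersionedRepository:
    """Hypothetical repository keeping a version counter per object."""

    def __init__(self):
        self.objects = {}  # oid -> (version, value)

    def put(self, oid, value):
        # Every local change bumps the object's version.
        version = self.objects.get(oid, (0, None))[0] + 1
        self.objects[oid] = (version, value)


def sync_object(source, target, oid, during_copy=None):
    """Copy one object without locking; abort if it changed meanwhile."""
    version_seen, value = source.objects[oid]
    if during_copy is not None:
        during_copy()  # test hook standing in for concurrent activity
    if source.objects[oid][0] != version_seen:
        # The object changed under us: give up rather than copy stale state.
        raise SyncAborted(oid)
    target.objects[oid] = (version_seen, value)
```

an aborted sync leaves the target untouched, so it can simply be retried at the next replication opportunity.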
in a view in which items from multiple repositories are mixed (I suppose
we will do that sooner or later) we would mark these items with both
repository and object ID. If object ID _included_ repository ID, then
the two would not both be needed, because the separate repository ID would be redundant.
John Anderson and I had originally been planning to include repository
ID _inside_ each object ID, so that objects could be assigned IDs by
a repository, without any concern for how other repositories assigned IDs.
then as long as repositories are assigned IDs uniquely, there's no problem.
if a system would perform more efficiently using IDs native to a database
underneath a repository, that would be an argument for database-assigned IDs.
Otherwise you might just take the hit of requiring client-assigned IDs,
using a collision-resistant globally unique ID scheme, and accept whatever
indirection mapping is required between IDs at the database level.
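
the indirection for the client-assigned case might look roughly like this. It is a sketch under assumptions: `IdMap` and its methods are hypothetical, random version 4 UUIDs from Python's standard `uuid` module stand in for "a collision resistant global unique ID scheme", and plain integers stand in for native database IDs.

```python
import uuid


class IdMap:
    """Hypothetical two-way map between global IDs and native database IDs."""

    def __init__(self):
        self._native_by_global = {}
        self._global_by_native = {}

    def register(self, native_id, global_id=None):
        """Bind a native ID to a global ID, minting a random one if needed."""
        if global_id is None:
            global_id = uuid.uuid4()  # assigned without asking the database
        self._native_by_global[global_id] = native_id
        self._global_by_native[native_id] = global_id
        return global_id

    def to_native(self, global_id):
        return self._native_by_global[global_id]

    def to_global(self, native_id):
        return self._global_by_native[native_id]
```

every lookup by global ID pays for one extra dictionary hop, which is the "hit" being traded against database-assigned IDs.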
> Looking over that last paragraph I feel an example is in order:
> - I am at work running Chandler, but I am connected to my Chandler at home.
> - I buy a book during lunch hour (shock! what a surprise!).
> - I enter the book information into my CardCatalog application (in Chandler).
> Note: CardCatalog resides in my home repository, I just have access at work.
> Chandler must now deliver the book item to the repository at home.
the Chandler model we have anticipated is that you will make changes
to a repository, and such changes might be replicated and/or
synchronized with other repositories. But Chandler will not
spontaneously make updates to multiple repositories, except for special
cases, such as compositing multiple repositories to look like one.
(then writing to the composited repository would require pushing changes
back to the right source repository; witness theoretical applications
for separating Chandler annotations from IMAP repositories.)
> Furthermore, if the ordered pair of [repository ID, object ID] is the
> UUID Chandler uses, then object IDs can safely be incremented integers
> and simplicity rules. Then you just need to find a way to generate
> good 16 byte UUIDs for the repository and don't worry about performance.
that was exactly the plan. I don't know if it's the best plan.
> Having both a repository ID and an object ID, together, as a UUID has
> precedent and is good design. Together does not mean mangled, as long
> as you can still extract one or the other.
it sounded acceptable to John Anderson, and struck me as suitably close
to maximally efficient to not worry about a little more here and there.
the purpose of adding the repository ID was not to indicate ownership
of an object, but just uniqueness by virtue of where an object was created.
Say you have repositories with IDs A and B. in the latter, you might create
an object with local ID 1, whose UUID would be [B+1], and when you copied
this object into repository A, it would still have UUID [B+1], so it would
not actually be useful to show which repository contained the object.
you would only know that it had been created in repository B. It might
have come to repository A by way of repository C, so you don't even know
what the source repository was in case you want to push back changes.
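
to make the [B+1] example concrete, here is a small sketch. The `ObjectId` tuple, the `Repository` class and its `create`/`copy_from` methods are invented for illustration, and the repository IDs are random UUIDs only because that is one easy way to get 16-byte unique values.

```python
import uuid
from typing import NamedTuple


class ObjectId(NamedTuple):
    """Ordered pair [repository ID, local ID]; either part can be extracted."""
    origin: uuid.UUID  # repository where the object was created
    local: int         # simple incremented integer within that repository


class Repository:
    """Hypothetical repository that assigns IDs in its own namespace."""

    def __init__(self):
        self.repo_id = uuid.uuid4()
        self._next_local = 1
        self.objects = {}  # ObjectId -> value

    def create(self, value):
        oid = ObjectId(self.repo_id, self._next_local)
        self._next_local += 1
        self.objects[oid] = value
        return oid

    def copy_from(self, other, oid):
        # Copying keeps the ID unchanged, so the ID records the origin only.
        self.objects[oid] = other.objects[oid]


A, B, C = Repository(), Repository(), Repository()
oid = B.create("book")   # [B+1]
C.copy_from(B, oid)      # B -> C
A.copy_from(C, oid)      # C -> A: still [B+1], with no trace of C
```

after the two copies, `oid.origin` is still B's ID even though the object now lives in A, B and C, and nothing in the ID says the copy in A arrived by way of C.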
so we were only going to use repository ID as a disambiguating namespace
for the sake of uniqueness. We could as well just use very collision-resistant
IDs, assigned without the knowledge of a database under the
repository, which sacrifices the potential performance advantage
of using native database unique IDs.
Rys