[Cosmo-dev] simplifying eimml
Phillip J. Eby
pje at telecommunity.com
Fri Dec 1 12:44:03 PST 2006
At 12:06 PM 12/1/2006 -0800, Brian Moseley wrote:
>On 12/1/06, Morgen Sagen <morgen at osafoundation.org> wrote:
>>I believe that
>>one issue that came up when he and I were working on the algorithm
>>was that records we get from Cosmo need to be grouped by item UUID,
>>so that the Chandler sharing layer knows which item the records
>>correspond to without needing to know what the various fields inside
>>a record actually signify -- in other words, without having to know
>>that the first field of a record happens to be a UUID.
>
>hmm... we had talked a while back about metadata records that
>described the structure of the data records, but thinking about how
>that would work is making my head spin.
They would just be other EIM records. Think of it as being like the system
tables in a relational database server, that contain rows describing the
columns of the tables in that database. :)
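
A minimal sketch of that idea, assuming made-up record types (`NoteRecord` and `FieldRecord` are hypothetical names for illustration, not part of any actual EIM API). The metadata is just more records, and the metadata record type can describe itself the way system tables do:

```python
from collections import namedtuple

# Hypothetical record type standing in for an ordinary data record.
NoteRecord = namedtuple("NoteRecord", ["uuid", "body"])

# Hypothetical metadata record type: one record per field of a record type.
FieldRecord = namedtuple("FieldRecord", ["record_type", "position", "name"])


def describe(record_type):
    """Return the metadata records describing the fields of record_type."""
    return [
        FieldRecord(record_type.__name__, position, name)
        for position, name in enumerate(record_type._fields)
    ]


# The metadata records describe the data records and also themselves.
metadata = describe(NoteRecord) + describe(FieldRecord)
for record in metadata:
    print(record)
```
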
>if we group by item, then is there any need to include the uuid in
>each individual record?
Yes. There is no requirement that EIM records in the general case contain
any UUIDs at all, let alone ones that are in any way related to the item
identifier.
>and if we do this grouping, then we lose the
>tabular eim structure, don't we? is that a problem?
We don't lose it; we're just *batching* sets of records. For all practical
purposes, the "envelope identifier" could just as well be some kind of hash
code that divides the record keyspace in a meaningful way, so we don't have
to load records for the entire collection when doing diffs.
The essential information model of EIM is unchanged by the batching.
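
A rough sketch of why batching leaves the model alone, assuming records are plain tuples and the envelope identifier is supplied from outside the record (the `batch`, `flatten`, `by_item`, and `by_bucket` names and the sample data are all hypothetical). Whether records are grouped by item or by an arbitrary hash bucket, flattening the batches gives back the same set of records:

```python
import zlib
from collections import defaultdict


def batch(records, envelope_key):
    """Group records into envelopes keyed by envelope_key(record)."""
    batches = defaultdict(list)
    for record in records:
        batches[envelope_key(record)].append(record)
    return dict(batches)


def flatten(batches):
    """Recover the plain set of records, ignoring the envelopes."""
    return {record for group in batches.values() for record in group}


# Hypothetical records: (record type, key field, value).
records = [
    ("note", "n1", "hello"),
    ("event", "n1", "10:00"),
    ("note", "n2", "bye"),
]

# Assumed external mapping from record to owning item; the batching code
# never looks inside a record to find out which item it belongs to.
item_of = {records[0]: "item-1", records[1]: "item-1", records[2]: "item-2"}


def by_item(record):
    return item_of[record]


def by_bucket(record):
    # Any stable hash that splits the keyspace would do equally well.
    return zlib.crc32(repr(record).encode("utf-8")) % 2


assert flatten(batch(records, by_item)) == set(records)
assert flatten(batch(records, by_bucket)) == set(records)
```
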
>i had assumed that it was the responsibility of the eim processor to
>group records related to the same item. is that at odds with your sync
>algorithm?
It's at odds with our non-relational storage. The Chandler repository
doesn't give us an efficient way to do what are essentially relational
operations. If we were storing the sync baseline in a relational way, we
could just look up the records directly by record keys instead of loading
groups of records by looking up an item key.
If/when we get a better way to handle this aspect, the item "envelopes"
requirement could be dropped, and the sync algorithm could change from
looping over recordsets to looping over individual records. But with or
without the batches, the information model remains the same.
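
To illustrate the two loop shapes, here is a simplified sketch (not the actual sync algorithm; the function names, the sample data, and the choice of the first two fields as the record key are assumptions, and deletions are ignored). One version loads the baseline per envelope, the other looks each record up directly by its key, and both report the same changed records:

```python
def diff_by_envelope(baseline, incoming):
    """Loop over recordsets: both arguments map envelope id -> set of records."""
    changes = set()
    for envelope_id, records in incoming.items():
        old_records = baseline.get(envelope_id, set())
        changes |= records - old_records
    return changes


def diff_by_record(baseline_index, incoming_records, key):
    """Loop over individual records, looked up directly by record key."""
    return {
        record
        for record in incoming_records
        if baseline_index.get(key(record)) != record
    }


def record_key(record):
    # Assumption for this sketch: the first two fields form the record key.
    return record[:2]


baseline = {
    "item-1": {("note", "n1", "hello"), ("event", "n1", "10:00")},
    "item-2": {("note", "n2", "bye")},
}
incoming = {
    "item-1": {("note", "n1", "hello!"), ("event", "n1", "10:00")},
    "item-3": {("note", "n3", "new")},
}

# "Relational" view of the same data: a flat index keyed by record key.
baseline_index = {
    record_key(r): r for group in baseline.values() for r in group
}
incoming_records = {r for group in incoming.values() for r in group}

changed_a = diff_by_envelope(baseline, incoming)
changed_b = diff_by_record(baseline_index, incoming_records, record_key)
assert changed_a == changed_b
```
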