[Cosmo-dev] simplifying eimml

Phillip J. Eby pje at telecommunity.com
Fri Dec 1 12:44:03 PST 2006


At 12:06 PM 12/1/2006 -0800, Brian Moseley wrote:
>On 12/1/06, Morgen Sagen <morgen at osafoundation.org> wrote:
>>I believe that
>>one issue that came up when he and I were working on the algorithm
>>was that records we get from Cosmo need to be grouped by item UUID,
>>so that the Chandler sharing layer knows which item the records
>>correspond to without needing to know what the various fields inside
>>a record actually signify -- in other words, without having to know
>>that the first field of a record happens to be a UUID.
>
>hmm... we had talked a while back about metadata records that
>described the structure of the data records, but thinking about how
>that would work is making my head spin.

They would just be other EIM records.  Think of it as being like the system 
tables in a relational database server, that contain rows describing the 
columns of the tables in that database.  :)
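
As a rough sketch of that idea (the record types and field names below are invented for illustration and are not part of EIM), the metadata can be expressed as ordinary records of one more record type, which can also describe itself:

```python
from collections import namedtuple

# Hypothetical data record type; the fields are illustrative only.
NoteRecord = namedtuple("NoteRecord", ["uuid", "title", "body"])

# Hypothetical metadata record type: one record per field of some
# record type, much like a row in a database's system "columns" table.
FieldRecord = namedtuple(
    "FieldRecord", ["record_type", "position", "field_name", "is_key"]
)

def describe(record_type, key_fields):
    """Return the FieldRecord metadata records for a record type."""
    return [
        FieldRecord(record_type.__name__, position, name, name in key_fields)
        for position, name in enumerate(record_type._fields)
    ]

# Metadata about the data records...
note_schema = describe(NoteRecord, key_fields={"uuid"})

# ...and metadata about the metadata records, in the same shape.
meta_schema = describe(FieldRecord, key_fields={"record_type", "position"})
```

A receiver that understands only FieldRecord could then find out which fields of a NoteRecord form its key without any built-in knowledge of notes.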


>if we group by item, then is there any need to include the uuid in
>each individual record?

Yes.  There is no requirement that EIM records in the general case contain 
any UUIDs at all, let alone ones that are in any way related to the item 
identifier.


>and if we do this grouping, then we lose the
>tabular eim structure, don't we? is that a problem?

We don't lose it; we're just *batching* sets of records.  For all practical 
purposes, the "envelope identifier" could just as well be some kind of hash 
code that divides the record keyspace in a meaningful way, so we don't have 
to load records for the entire collection when doing diffs.

The essential information model of EIM is unchanged by the batching.
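
A minimal sketch of why batching leaves the model alone, assuming records are plain tuples and using made-up helper names (batch_records, flatten): the same flat set of records comes back out whether the envelope identifier is an item UUID or an arbitrary hash of the record key.

```python
import zlib
from collections import defaultdict

def batch_records(records, envelope_of):
    """Group records into batches keyed by envelope_of(record)."""
    batches = defaultdict(list)
    for record in records:
        batches[envelope_of(record)].append(record)
    return dict(batches)

def flatten(batches):
    """Recover the flat list of records from a dict of batches."""
    return [record for batch in batches.values() for record in batch]

# Illustrative records: (record type, key, value). Not real EIM data.
records = [
    ("note", "uuid-1", "Buy milk"),
    ("event", "uuid-1", "2006-12-01"),
    ("note", "uuid-2", "Call Bob"),
]

# Envelope identifier = the item UUID found in the record.
by_item = batch_records(records, lambda r: r[1])

# Envelope identifier = a hash that splits the keyspace into 4 buckets.
by_hash = batch_records(
    records, lambda r: zlib.crc32(repr(r[:2]).encode()) % 4
)
```

Either grouping only decides which records are loaded together; the records themselves are untouched.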


>i had assumed that it was the responsibility of the eim processor to
>group records related to the same item. is that at odds with your sync
>algorithm?

It's at odds with our non-relational storage.  The Chandler repository 
doesn't give us an efficient way to do what are essentially relational 
operations.  If we were storing the sync baseline in a relational way, we 
could just look up the records directly by record keys instead of loading 
groups of records by looking up an item key.

If/when we get a better way to handle this aspect, the item "envelopes" 
requirement could be dropped, and the sync algorithm could change from 
looping over recordsets to looping over individual records.  But with or 
without the batches, the information model remains the same.
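
To illustrate the two loop shapes (this is a simplified sketch with hypothetical function names, not the actual sync algorithm, and it assumes a baseline stored as sets of record tuples), diffing per recordset and diffing over individual records find the same changes:

```python
def diff_recordsets(baseline, incoming):
    """Loop over envelopes; each maps an envelope id to a set of records.

    Returns {envelope_id: (added, removed)} for envelopes that changed.
    """
    changes = {}
    for envelope_id in sorted(baseline.keys() | incoming.keys()):
        old = baseline.get(envelope_id, set())
        new = incoming.get(envelope_id, set())
        if old != new:
            changes[envelope_id] = (new - old, old - new)
    return changes

def diff_records(baseline, incoming):
    """Diff two flat sets of records, with no envelopes involved."""
    return incoming - baseline, baseline - incoming

def flatten(recordsets):
    """Merge all recordsets into one flat set of records."""
    return set().union(*recordsets.values())

# Illustrative data only.
baseline = {
    "uuid-1": {("note", "uuid-1", "Buy milk")},
    "uuid-2": {("note", "uuid-2", "Call Bob")},
}
incoming = {
    "uuid-1": {("note", "uuid-1", "Buy oat milk")},
    "uuid-2": {("note", "uuid-2", "Call Bob")},
}

changes = diff_recordsets(baseline, incoming)
added, removed = diff_records(flatten(baseline), flatten(incoming))
```

Only the unit of loading differs: the first form touches one envelope at a time, while the second needs record-level lookups that a relational store would make cheap.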


