[Cosmo-dev] simplifying eimml

Phillip J. Eby pje at telecommunity.com
Thu Nov 30 19:28:44 PST 2006


At 04:35 PM 11/30/2006 -0800, Brian Moseley wrote:
>On 11/30/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>Why?  Well, for example, yesterday Morgen and I hashed out an all-purpose
>>diff-processing, attribute-filtering, *and* conflict-detecting sync
>>algorithm for processing *arbitrary* EIM records for *arbitrary* parcels
>>and schemas.  It seems a waste to have to reinvent all those wheels for
>>one-off static formats.
>
>nice. got that written up on the wiki in english (as opposed to python)? :)

No, sorry, just Python, and a rather idiosyncratic Hungarian notation that 
probably only Morgen and I understand at this point.

Definitions: a translator is an object that's responsible for converting 
EIM records to and from Chandler items, a recordset is a collection of EIM 
records, and a diff (the 'dFoo' variables) is a collection of record diffs 
and records to be deleted.  The rsOldBase and rsNewBase are hashtables of 
recordsets keyed by item UUID, holding the records as seen by the other end 
of the connection.  The algorithm assumes that subtracting recordsets 
produces diffs, that diffs can be added to recordsets to update them, and 
that subtracting two diffs produces a "lost updates" diff (i.e., what 
changes the inbound diff overwrites in the outbound diff).  Finally, the 
sync_filter and publish_filter are objects that know what fields to 
"censor" or pretend are unchanged.  (We also designed a registration 
framework so third-party devs can register filters and have them 
automatically picked up by the sharing system.)  The 'Cosmo 0.6' comments 
refer to the fact that (from what Morgen told me) Cosmo won't be tracking 
per-field changes and thus will always need to send or receive entire 
records.  So those bits of the algorithm would be a bit simpler if that 
capability were available in future.  But this basic algorithm should work 
fine for now, and will essentially be unchanged for future versions, 
because all of the schema-specific stuff stays in the translators provided 
by parcel developers:

  dLost = {}    # lost-update diffs (conflicts), keyed by item UUID

  # 1. Export every locally changed item into the new base
  for item in changed_items:
     rsNewBase[item.itsUUID] = Recordset(translator.exportItem(item))

  # 2. Apply inbound changes, noting any local changes they overwrite
  for itemUUID, rs in inbound_diff.items():
     dInbound = rs - rsOldBase.setdefault(itemUUID, empty_rs)    # Cosmo 0.6
     if itemUUID in rsNewBase:
         # Check for conflicts and update outbound data
         # to include inbound changes
         dLocal = rsNewBase[itemUUID] - rsOldBase[itemUUID]
         dLost[itemUUID] = dLocal - dInbound
         rsNewBase[itemUUID] += dInbound
     translator.importRecords(sync_filter(dInbound))
     translator.deleteRecords(dInbound.deletions)
     rsOldBase[itemUUID] += dInbound

  # 3. Send outbound changes and bring the old base up to date
  for itemUUID, rs in rsNewBase.items():
     if itemUUID in rsOldBase:
         dOutbound = sync_filter(rs - rsOldBase[itemUUID])
     else:
         dOutbound = publish_filter(rs)
         rsOldBase[itemUUID] = empty_rs
     #send(dOutbound) # not in Cosmo 0.6
     rsOldBase[itemUUID] += dOutbound
     send(rsOldBase[itemUUID])   # Cosmo 0.6
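
To make the recordset/diff arithmetic that the algorithm relies on a bit 
more concrete, here is a minimal, self-contained sketch.  It is not the 
actual Chandler code: the class layout, the 'changes' and 'records' 
attributes, and the exact meaning given to "diff - diff" are assumptions 
made for illustration.  Only the three operations themselves and the 
'deletions' attribute come from the description above.

```python
class Diff:
    """Per-record field changes plus the keys of records to delete."""

    def __init__(self, changes=None, deletions=()):
        self.changes = {k: dict(v) for k, v in (changes or {}).items()}
        self.deletions = set(deletions)

    def __sub__(self, inbound):
        # Assumed meaning of "diff - diff": the parts of this (local) diff
        # that the inbound diff would overwrite, i.e. the lost updates.
        lost = {}
        for key, fields in self.changes.items():
            if key in inbound.deletions:
                lost[key] = dict(fields)
                continue
            theirs = inbound.changes.get(key, {})
            clash = {f: v for f, v in fields.items()
                     if f in theirs and theirs[f] != v}
            if clash:
                lost[key] = clash
        return Diff(lost)


class Recordset:
    """Records stored as {record key: {field name: value}} (assumed layout)."""

    def __init__(self, records=None):
        self.records = {k: dict(v) for k, v in (records or {}).items()}

    def __sub__(self, base):
        # recordset - recordset -> diff describing how to get from base to self
        changes = {}
        for key, fields in self.records.items():
            old = base.records.get(key)
            if old is None:
                changes[key] = dict(fields)
            else:
                delta = {f: v for f, v in fields.items() if old.get(f) != v}
                if delta:
                    changes[key] = delta
        return Diff(changes, set(base.records) - set(self.records))

    def __add__(self, diff):
        # recordset + diff -> new, updated recordset (also serves "+=")
        merged = {k: dict(v) for k, v in self.records.items()}
        for key, fields in diff.changes.items():
            merged.setdefault(key, {}).update(fields)
        for key in diff.deletions:
            merged.pop(key, None)
        return Recordset(merged)


# Both sides edited the title of the same record since the last sync.
old_base = Recordset({"note-1": {"title": "Draft", "body": "hello"}})
new_base = Recordset({"note-1": {"title": "Final", "body": "hello"}})
inbound = Recordset({"note-1": {"title": "Remote", "body": "hello"}})

d_inbound = inbound - old_base    # what the other end changed
d_local = new_base - old_base     # what we changed locally
d_lost = d_local - d_inbound      # our title edit, about to be overwritten
new_base += d_inbound             # outbound data now includes their change
```

Because the operators return new objects rather than mutating in place, 
sharing one empty recordset as a default (as the setdefault() call above 
does) is harmless in this sketch.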



>>Chandler needs a general EIM exchange format for database dump
>>and reload support, schema evolution, and sharing interop with multiple
>>client versions.  The idea of having a general EIM transport format is
>>that it allows the transmission schema to be varied independently of the
>>transport format, so we're coding serialization and deserialization once --
>>and only once.
>
>oh, so you were going to use eimml for dump and reload? i guess i
>should have realized that but did not.

At the least, it'll be used to prototype dump and reload.  If performance 
isn't sufficient, we may produce some kind of "binary eimml" equivalent, 
but the point is simply that if things are based on the EIM *information 
model* then the actual transmission format is just a thin wrapper.  So, 
producing a thin XML wrapper or thin binary wrapper or thin CSV wrapper or 
even thin SQL wrapper isn't that difficult.  But creating a one-off non-EIM 
format for Cosmo is a different story altogether.

I understand that it may seem like a bit more effort to try to do the stuff 
Cosmo "understands" in EIM, but if I may suggest, it might be easier if you 
create simple objects that represent EIM abstractions like records, and 
have an EIM<->XML converter.  Then, for the static schema you want to 
access, just pull it out of the EIM record objects.  While this may be 
slightly more work than direct XML access, it will make it easier to handle 
changes to that "static" schema, because you will just change what EIM 
records you pull stuff out of.
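
As a rough illustration of that suggestion (in Python, since that's what 
I have to hand), the sketch below keeps a plain record object and a pair 
of EIM<->XML functions, and reads the "static" field off the record 
rather than off the XML.  The element layout, the example.com URI, and 
the names Record, record_to_xml and record_from_xml are all hypothetical; 
this is not the actual eimml format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Record:
    uri: str                      # namespace URI naming the record type
    fields: dict = field(default_factory=dict)


def record_to_xml(record):
    # Hypothetical layout: one <record> element in the type's namespace,
    # with one child element per field.
    elem = ET.Element(f"{{{record.uri}}}record")
    for name, value in record.fields.items():
        child = ET.SubElement(elem, f"{{{record.uri}}}{name}")
        child.text = value
    return ET.tostring(elem, encoding="unicode")


def record_from_xml(text):
    elem = ET.fromstring(text)
    # ElementTree spells qualified names as "{namespace-uri}localname".
    uri = elem.tag[1:].split("}", 1)[0]
    fields = {child.tag.split("}", 1)[1]: child.text or "" for child in elem}
    return Record(uri, fields)


NOTE_URI = "http://example.com/eim/note"    # made-up record type URI

original = Record(NOTE_URI, {"title": "Groceries", "body": "milk, eggs"})
parsed = record_from_xml(record_to_xml(original))

# The "static schema" code only ever touches the record object:
title = parsed.fields["title"]
```

If the schema changes, only the last line (which record and which field 
to read) would need to move; the converter stays as it is.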

On the Chandler side, we will be registering record implementation classes 
by namespace URI, so the XML reader will effectively just look up the right 
static type and instantiate it.  You could use a similar approach, except 
simpler because you only have one schema to support, while we are 
supporting parcel devs registering their own record types and translators.

Anyway, I guess I'm suggesting that you could think of it as being like SAX 
for EIM, except that you have some kind of lookup mechanism to determine 
what to instantiate or invoke for each record type.  And if there's no 
handler registered for that type, you can then just store the EIM record 
wherever you're storing stuff you "don't understand".  Then, none of that 
code changes when you expand or change the schema, just the stuff you 
register.  Heck, you can probably use those Java @ things (annotations?) to 
mark what URIs your classes handle.  We're using Python's equivalent @ 
things (decorators) to register import and export methods for item types 
and record types.
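
Here is a small Python sketch of that lookup idea, using a decorator to 
register a handler per record-type URI and a fallback that just keeps 
records nobody has claimed.  The names (handles, dispatch, HANDLERS, 
UNHANDLED) and the example.com URIs are invented for illustration and 
are not Chandler's actual registration API.

```python
HANDLERS = {}     # record-type URI -> function that knows that type
UNHANDLED = []    # records we "don't understand", kept untouched


def handles(uri):
    """Decorator: register the decorated function as the handler for uri."""
    def register(func):
        HANDLERS[uri] = func
        return func
    return register


@handles("http://example.com/eim/note")
def import_note(fields):
    return "note: " + fields["title"]


def dispatch(uri, fields):
    """Look up the handler for a record type, or stash the record."""
    handler = HANDLERS.get(uri)
    if handler is None:
        UNHANDLED.append((uri, fields))
        return None
    return handler(fields)


known = dispatch("http://example.com/eim/note", {"title": "Groceries"})
unknown = dispatch("http://example.com/eim/task", {"due": "2006-12-01"})
```

Supporting a new record type then means adding one more decorated 
function; dispatch() itself never changes.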

But those are just ideas; it's been way too long since I did any Java stuff 
myself, so some of these things I'm so blithely spouting off about may be 
much harder to do than I remember.  :)


