[Cosmo-dev] simplifying eimml
Phillip J. Eby
pje at telecommunity.com
Thu Nov 30 19:28:44 PST 2006
At 04:35 PM 11/30/2006 -0800, Brian Moseley wrote:
>On 11/30/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>Why? Well, for example, yesterday Morgen and I hashed out an all-purpose
>>diff-processing, attribute-filtering, *and* conflict-detecting sync
>>algorithm for processing *arbitrary* EIM records for *arbitrary* parcels
>>and schemas. It seems a waste to have to reinvent all those wheels for
>>one-off static formats.
>
>nice. got that written up on the wiki in english (as opposed to python)? :)
No, sorry, just Python, and a rather idiosyncratic hungarian notation that
probably only Morgen and I understand at this point.
Definitions: a translator is an object that's responsible for converting
EIM records to and from Chandler items, a recordset is a collection of EIM
records, and a diff (the 'dFoo' variables) is a collection of record diffs
and records to be deleted. The rsOldBase and rsNewBase are hashtables of
recordsets keyed by item UUID, holding the records as seen by the other end
of the connection. The algorithm assumes that subtracting recordsets
produces diffs, that diffs can be added to recordsets to update them, and
that subtracting two diffs produces a "lost updates" diff (i.e., what
changes the inbound diff overwrites in the outbound diff). Finally, the
sync_filter and publish_filter are objects that know what fields to
"censor" or pretend are unchanged. (We also designed a registration
framework so third-party devs can register filters and have them
automatically picked up by the sharing system.) The 'Cosmo 0.6' comments
refer to the fact that (from what Morgen told me) Cosmo won't be tracking
per-field changes and thus will always need to send or receive entire
records. So those bits of the algorithm would be a bit simpler if that
capability were available in future. But this basic algorithm should work
fine for now, and will essentially be unchanged for future versions,
because all of the schema-specific stuff stays in the translators provided
by parcel developers:
for item in changed_items:
    rsNewBase[item.itsUUID] = Recordset(translator.exportItem(item))

for itemUUID, rs in inbound_diff.items():
    dInbound = rs - rsOldBase.setdefault(itemUUID, empty_rs)   # Cosmo 0.6
    if itemUUID in rsNewBase:
        # Check for conflicts and update outbound data
        # to include inbound changes
        dLocal = rsNewBase[itemUUID] - rsOldBase[itemUUID]
        dLost[itemUUID] = dLocal - dInbound
        rsNewBase[itemUUID] += dInbound
    translator.importRecords(sync_filter(dInbound))
    translator.deleteRecords(dInbound.deletions)
    rsOldBase[itemUUID] += dInbound

for itemUUID, rs in rsNewBase.items():
    if itemUUID in rsOldBase:
        dOutbound = sync_filter(rs - rsOldBase[itemUUID])
    else:
        dOutbound = publish_filter(rs)
        rsOldBase[itemUUID] = empty_rs
    #send(dOutbound)   # not in Cosmo 0.6
    rsOldBase[itemUUID] += dOutbound
    send(rsOldBase[itemUUID])   # Cosmo 0.6
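To make the recordset arithmetic that the algorithm assumes a bit more concrete, here is a rough sketch. It is not the actual Chandler code: the Recordset and Diff classes below, their attribute names, and the idea of treating a record as a plain tuple keyed by its first field are all assumptions made up for illustration. It only shows the three operations the algorithm relies on: recordset minus recordset gives a diff, recordset plus diff gives an updated recordset, and diff minus diff gives the "lost updates".

```python
def _key(record):
    # Assumption: a record is a tuple whose first element identifies it.
    return record[0]


class Diff:
    """Records to add or update (inclusions) plus records to delete."""

    def __init__(self, inclusions=(), deletions=()):
        self.inclusions = set(inclusions)
        self.deletions = set(deletions)

    def __sub__(self, inbound):
        # "Lost updates": local changes whose record the inbound diff also
        # touches, with a different value, and would therefore overwrite.
        touched = {_key(r) for r in inbound.inclusions | inbound.deletions}
        lost = {r for r in self.inclusions
                if _key(r) in touched and r not in inbound.inclusions}
        return Diff(inclusions=lost)


class Recordset:
    """A collection of records, at most one per key."""

    def __init__(self, records=()):
        self.records = {_key(r): r for r in records}

    def __sub__(self, base):
        # The changes needed to turn `base` into this recordset.
        changed = {r for k, r in self.records.items()
                   if base.records.get(k) != r}
        removed = {r for k, r in base.records.items()
                   if k not in self.records}
        return Diff(changed, removed)

    def __add__(self, diff):
        # Returns a new recordset, so `rs += diff` never mutates a shared one.
        result = Recordset(self.records.values())
        for r in diff.deletions:
            result.records.pop(_key(r), None)
        for r in diff.inclusions:
            result.records[_key(r)] = r
        return result


base = Recordset([("title", "Lunch"), ("where", "Cafe")])
local = Recordset([("title", "Lunch with Bob"), ("where", "Cafe")])
remote = Recordset([("title", "Team lunch"), ("where", "Cafe"),
                    ("when", "noon")])

dLocal = local - base        # the title changed locally
dInbound = remote - base     # the title changed remotely, "when" was added
dLost = dLocal - dInbound    # the local title change gets overwritten
merged = local + dInbound    # local state after applying inbound changes
```

Because `+` returns a new object in this sketch, a line like `rsOldBase[itemUUID] += dInbound` rebinds the dictionary entry instead of modifying a shared `empty_rs` in place.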
>>Chandler needs a general EIM exchange format for database dump
>>and reload support, schema evolution, and sharing interop with multiple
>>client versions. The idea of having a general EIM transport format is
>>that it allows the transmission schema to be varied independently of the
>>transport format, so we're coding serialization and deserialization once --
>>and only once.
>
>oh, so you were going to use eimml for dump and reload? i guess i
>should have realized that but did not.
At the least, it'll be used to prototype dump and reload. If performance
isn't sufficient, we may produce some kind of "binary eimml" equivalent,
but the point is simply that if things are based on the EIM *information
model* then the actual transmission format is just a thin wrapper. So,
producing a thin XML wrapper or thin binary wrapper or thin CSV wrapper or
even thin SQL wrapper isn't that difficult. But creating a one-off non-EIM
format for Cosmo is a different story altogether.
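As an illustration of the "thin wrapper" point, here is a minimal Python sketch in which one record object is written out two ways. The Record type, the XML element and attribute names, and the example.com URI are all invented for this example; this is not the real eimml syntax, just the shape of the idea that the wire format is a small layer over the same information model.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical record shape: a namespace URI naming the record type,
# plus a dict of field names to string values.
Record = namedtuple("Record", "uri fields")


def to_xml(records):
    # Element and attribute names are made up; they are not real eimml.
    root = ET.Element("records")
    for rec in records:
        el = ET.SubElement(root, "record", uri=rec.uri)
        for name, value in rec.fields.items():
            ET.SubElement(el, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    return [
        Record(el.get("uri"),
               {f.get("name"): f.text or "" for f in el.findall("field")})
        for el in root.findall("record")
    ]


def to_csv(records):
    # Same records, different thin wrapper: one row per field.
    out = io.StringIO()
    writer = csv.writer(out)
    for rec in records:
        for name, value in rec.fields.items():
            writer.writerow([rec.uri, name, value])
    return out.getvalue()


records = [Record("http://example.com/eim/event",
                  {"title": "Lunch", "where": "Cafe"})]
xml_text = to_xml(records)
csv_text = to_csv(records)
```

Neither writer knows anything about events in particular; a new record type needs no change to the serialization code.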
I understand that it may seem like a bit more effort to try to do the stuff
Cosmo "understands" in EIM, but if I may suggest, it might be easier if you
create simple objects that represent EIM abstractions like records, and
have an EIM<->XML converter. Then, for the static schema you want to
access, just pull it out of the EIM record objects. While this may be
slightly more work than direct XML access, it will make it easier to handle
changes to that "static" schema, because you will just change what EIM
records you pull stuff out of.
On the Chandler side, we will be registering record implementation classes
by namespace URI, so the XML reader will effectively just look up the right
static type and instantiate it. You could use a similar approach, except
simpler because you only have one schema to support, while we are
supporting parcel devs registering their own record types and translators.
Anyway, I guess I'm suggesting that you could think of it as being like SAX
for EIM, except that you have some kind of lookup mechanism to determine
what to instantiate or invoke for each record type. And if there's no
handler registered for that type, you can then just store the EIM record
wherever you're storing stuff you "don't understand". Then, none of that
code changes when you expand or change the schema, just the stuff you
register. Heck, you can probably use those Java @ things (annotations?) to
mark what URIs your classes handle. We're using Python's equivalent @
things (decorators) to register import and export methods for item types
and record types.
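Roughly, the lookup-and-dispatch idea might look like the following in Python. This is only a sketch, not Chandler's actual registration API: the `handles` decorator, the `handlers` and `unknown` containers, and the example.com URIs are hypothetical names chosen for the example.

```python
handlers = {}   # namespace URI -> function that understands that record type
unknown = []    # records nobody registered for, kept untouched


def handles(uri):
    """Decorator that registers a function as the handler for one URI."""
    def register(func):
        handlers[uri] = func
        return func
    return register


@handles("http://example.com/eim/note")   # made-up URI
def import_note(fields):
    return "note: " + fields["body"]


def dispatch(uri, fields):
    # Called by the reader once per record, a bit like a SAX callback.
    handler = handlers.get(uri)
    if handler is None:
        # No handler registered: store the record as-is for later.
        unknown.append((uri, fields))
        return None
    return handler(fields)
```

Supporting a new record type then means adding one decorated function; `dispatch` and the reader that calls it stay the same.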
But those are just ideas; it's been way too long since I did any Java stuff
myself, so some of these things I'm so blithely spouting off about may be
much harder to do than I remember. :)