[Chandler-dev] Dump and reload sketch
Phillip J. Eby
pje at telecommunity.com
Fri May 12 14:53:06 PDT 2006
At 10:59 AM 5/12/2006 -0700, Morgen Sagen wrote:
>On May 11, 2006, at 5:17 PM, Phillip J. Eby wrote:
>>
>>By "stable external format", I mean a format that does not change
>>significantly from one Chandler release to the next, and which
>>allows for version detection of the format itself, as well as
>>providing version and schema information for the parcels whose data
>>is contained in the format.
>
>We're also talking about defining a new "sharing format" which has
>some of the same requirements that you spell out: needing a
>relatively stable format, needing to maintain ref-collection order
>even if the format structure is simple and non-nested, avoiding being
>bitten by onValueChanged( ) calls during sync, etc. So perhaps there
>is an opportunity for re-use here.

Certainly both systems would benefit from getting rid of
sequence-to-sequence birefs.

>>At this point I haven't covered much actual API detail, or anything
>>at all about the actual external format. I don't actually care
>>much about the external format, since it's not a requirement that
>>it be processed by other programs, and parcel writers will never
>>see it directly. The API will only expose streams of records of
>>elementary types, and provide a way for parcel writers to transform
>>individual records as the streams go by, and to do pre- and post-
>>processing on the repository contents.
>
>Ah, well, the sharing format is intended to be processed by Cosmo and
>other apps, so perhaps that doesn't fit in with your goals. However,
>I am hoping that the new sharing format can be something as simple as
>a series of RDF triples (could be represented in XML, or not, doesn't
>really matter as long as we have the equivalent of namespaces to
>handle attribute name collisions). What were you thinking your dump/
>restore records might look like?

From the POV of the dump/reload API and framework, data will essentially
be composed of tuples of elementary types. You can think of this as being
conceptually equivalent to a collection of relational database tables; in
fact the main difference between the format and a relational database is
that everything is read-only sequential access with no rewinding. That is,
processing will occur on a stream of records, which may at certain points
be interleaved.

That's the information model; how the data is actually stored on disk isn't
particularly important, except that it be reasonably efficient in time and
space (which probably means XML is out ;-)).
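
To make the information model a bit more concrete, here is a rough Python
sketch of what "streams of tuples, transformed as they go by" could look
like. It is only an illustration under assumptions: the record shape (a
record type name plus a tuple of elementary values), the names dump_stream,
upgrade and add_status, and the sample data are all hypothetical, and none
of this is the actual dump/reload API.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (record type name, tuple of elementary values).
Record = Tuple[str, tuple]
Transform = Callable[[tuple], Iterable[tuple]]


def dump_stream() -> Iterator[Record]:
    """Stand-in for a dump: a forward-only stream of interleaved records."""
    yield ("item", (1, "Buy milk"))
    yield ("tag", (1, "errands"))
    yield ("item", (2, "Call Bob"))
    yield ("tag", (2, "phone"))


def upgrade(records: Iterable[Record],
            transforms: Dict[str, Transform]) -> Iterator[Record]:
    """Apply per-type transforms in a single pass, with no rewinding."""
    for kind, fields in records:
        transform = transforms.get(kind)
        if transform is None:
            # No transform registered for this record type: pass it through.
            yield kind, fields
        else:
            # A transform may emit zero, one, or several replacement records.
            for new_fields in transform(fields):
                yield kind, new_fields


def add_status(fields: tuple) -> Iterator[tuple]:
    """Example schema upgrade: 'item' records gain a third field."""
    item_id, title = fields
    yield (item_id, title, "open")


upgraded = list(upgrade(dump_stream(), {"item": add_status}))
```

Because both the source and the transformer are generators, each record is
seen exactly once and in order, which is the "read-only sequential access"
constraint described above.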

The overlap between sharing and dump/reload does worry me a bit. It would
suck for parcels to have to write *two* sets of code to do schema upgrades.