[Chandler-dev] Dump and reload sketch

Phillip J. Eby pje at telecommunity.com
Fri May 12 14:53:06 PDT 2006


At 10:59 AM 5/12/2006 -0700, Morgen Sagen wrote:

>On May 11, 2006, at 5:17 PM, Phillip J. Eby wrote:
>>
>>By "stable external format", I mean a format that does not change
>>significantly from one Chandler release to the next, and which
>>allows for version detection of the format itself, as well as
>>providing version and schema information for the parcels whose data
>>is contained in the format.
>
>We're also talking about defining a new "sharing format" which has
>some of the same requirements that you spell out: needing a
>relatively stable format, needing to maintain ref-collection order
>even if the format structure is simple and non-nested, avoiding being
>bitten by onValueChanged() calls during sync, etc.  So perhaps there
>is an opportunity for re-use here.

Certainly both systems would benefit from getting rid of 
sequence-to-sequence birefs (bidirectional references).


>>At this point I haven't covered much actual API detail, or anything
>>at all about the actual external format.  I don't actually care
>>much about the external format, since it's not a requirement that
>>it be processed by other programs, and parcel writers will never
>>see it directly.  The API will only expose streams of records of
>>elementary types, and provide a way for parcel writers to transform
>>individual records as the streams go by, and to do pre- and
>>post-processing on the repository contents.
>
>Ah, well, the sharing format is intended to be processed by Cosmo and
>other apps, so perhaps that doesn't fit in with your goals.  However,
>I am hoping that the new sharing format can be something as simple as
>a series of RDF triples (could be represented in XML, or not, doesn't
>really matter as long as we have the equivalent of namespaces to
>handle attribute name collisions).  What were you thinking your
>dump/restore records might look like?

From the point of view of the dump/reload API and framework, data will
essentially be composed of tuples of elementary types.  You can think of
this as conceptually equivalent to a collection of relational database
tables; in fact, the main difference between the format and a relational
database is that all access is read-only and sequential, with no rewinding.
That is, processing will occur on a stream of records, in which records from
different tables may at certain points be interleaved.
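A minimal sketch of this stream model in Python, assuming (hypothetically) that the first element of each tuple names the "table" a record belongs to and that a parcel registers one transform function per table. The names `apply_transforms` and `add_status`, and the record layouts, are illustrative only and are not Chandler APIs. The input is consumed once, front to back, and records from different tables pass through interleaved in their original order.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# A record is a tuple of elementary values.  Hypothetical convention:
# the first element names the "table" the record belongs to.
Record = Tuple[object, ...]
Transform = Callable[[Record], Optional[Record]]


def apply_transforms(records: Iterable[Record],
                     transforms: Dict[str, Transform]) -> Iterator[Record]:
    """Yield each record, passed through the transform for its table (if any).

    The input is read exactly once, front to back, with no seeking or
    rewinding.  A transform returns a (possibly rewritten) record, or None
    to drop the record from the output stream.
    """
    for record in records:
        transform = transforms.get(record[0])
        if transform is None:
            yield record              # no hook registered: pass through unchanged
            continue
        result = transform(record)
        if result is not None:
            yield result


def add_status(record: Record) -> Record:
    # Hypothetical schema upgrade: "item" records gain a trailing status field.
    table, uuid, title = record
    return (table, uuid, title, "active")


# Usage: "item" and "ref" records arrive interleaved in a single stream.
stream = iter([
    ("item", "u1", "Lunch"),
    ("ref", "u1", "u2"),
    ("item", "u2", "Meeting"),
])
upgraded = list(apply_transforms(stream, {"item": add_status}))
```

Chaining generators like this keeps memory use flat, since no stage ever needs to hold or revisit records it has already passed along.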

That's the information model; how the data is actually stored on disk isn't 
particularly important, except that it be reasonably efficient in time and 
space (which probably means XML is out ;-)).
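Since the on-disk encoding only needs to be reasonably compact and fast, here is one possible encoding, sketched with Python's standard `pickle` module: a header carrying a format identifier, a format version, and per-parcel version strings (the version-detection requirements quoted earlier), followed by one pickle per record. The magic string, header layout, and function names are hypothetical assumptions, not the actual Chandler format. Pinning the pickle protocol keeps older dumps readable by newer Python versions; `pickle` is only appropriate for dumps the application itself produced, since unpickling untrusted data is unsafe.

```python
import io
import pickle
from typing import Dict, Iterable, Iterator, Tuple

MAGIC = "example-dump"   # hypothetical format identifier
FORMAT_VERSION = 1       # hypothetical; bumped only if the container layout changes
PROTOCOL = 2             # pinned so dumps stay readable by later Python versions

Record = Tuple[object, ...]


def write_dump(stream, parcel_versions: Dict[str, str],
               records: Iterable[Record]) -> None:
    """Write a versioned header, then one pickle per record."""
    pickler = pickle.Pickler(stream, protocol=PROTOCOL)
    pickler.dump((MAGIC, FORMAT_VERSION, parcel_versions))
    for record in records:
        pickler.clear_memo()      # keep each record's pickle self-contained
        pickler.dump(record)


def read_dump(stream) -> Tuple[Dict[str, str], Iterator[Record]]:
    """Validate the header; return parcel versions and a lazy record iterator."""
    unpickler = pickle.Unpickler(stream)
    magic, version, parcel_versions = unpickler.load()
    if magic != MAGIC or version != FORMAT_VERSION:
        raise ValueError("unsupported dump format: %r, version %r" % (magic, version))

    def records() -> Iterator[Record]:
        while True:
            try:
                yield unpickler.load()
            except EOFError:      # end of the stream: no more records
                return

    return parcel_versions, records()


# Usage: round-trip two records through an in-memory buffer.
buf = io.BytesIO()
write_dump(buf, {"example.parcel": "1.0"},
           [("item", "u1", "Lunch"), ("ref", "u1", "u2")])
buf.seek(0)
versions, records = read_dump(buf)
```

Because `read_dump` returns a lazy iterator, a reload can feed records straight into a per-record transform pipeline without holding the whole dump in memory.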

The overlap between sharing and dump/reload does worry me a bit.  It would 
suck for parcels to have to write *two* sets of code to do schema upgrades.
