[Chandler-dev] Re: sharing format / dump and reload question

Phillip J. Eby pje at telecommunity.com
Thu Jul 6 14:47:37 PDT 2006


[cc'd to Chandler-Dev because I've been meaning to post something about 
information models anyway, so I might as well start with *something*]

At 01:11 PM 7/6/2006 -0700, Morgen Sagen wrote:
>On Jun 30, 2006, at 12:08 PM, Phillip J. Eby wrote:
>>At 02:02 PM 6/29/2006 -0700, Morgen Sagen wrote:
>>>Hey Phillip,
>>>
>>>Given your ideas about the information model and the needs of dump
>>>and reload, would Google's data API format suffice?
>>>
>>>See their 'Kinds' document:
>>>
>>>    http://code.google.com/apis/gdata/common-elements.html
>>>
>>>and especially this section for an example:
>>>
>>>    http://code.google.com/apis/gdata/common-elements.html#gdEventKind
>>>
>>>It's just an extension of Atom XML schema, and by starting with
>>>Google's schema we'd instantly get interoperability with a useful
>>>service.
>>
>>It's interesting, but it doesn't have a uniform or elementary
>>information model.  Notice, for example, the embedded iCalendar
>>data in gd:recurrence.  I agree that being able to share in this
>>format seems useful for interoperability purposes, but it doesn't
>>appear to solve our other issues.  (Note also the idiosyncratic
>>overlap in semantics between gd:recurrence and
>>gd:recurrenceException.)
>>
>>I would say, though, that if this is the kind of thing you want to
>>be able to do, it's probably a good idea for me to look at it and
>>see what could be done to reduce it to an elementary representation.
>
>What do you mean by "elementary"?  I thought the way they store
>icalendar content in the gd:recurrence field was strange at first,
>but actually it would probably be convenient since Jeffrey's vobject
>lib groks icalendar.

I'm saying that the gd:recurrence stuff means there's no information model; 
it's just a data format.  In fact, it's *two* data formats.  :)

The difference would be that in a uniform information model, all the facts 
carried by the iCalendar data would be represented in the same way as 
every other fact in the model.  Or, conversely, all the other facts would 
be represented in vobject form.
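To make the "two data formats" point concrete, here is a minimal Python sketch.  The entry is a hypothetical, simplified imitation of a GData event (the namespace URIs are placeholders, not Google's), and the iCalendar handling is a deliberately naive line splitter standing in for a real parser such as vobject.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry imitating gd:recurrence: the outer format is XML, but
# the recurrence facts arrive as an embedded iCalendar text blob.
GD = "urn:example:gd"  # placeholder namespace, not the real GData URI
entry_xml = f"""<entry xmlns:gd="{GD}">
<gd:recurrence>DTSTART:20060706T100000Z
RRULE:FREQ=WEEKLY;COUNT=4</gd:recurrence>
</entry>"""

root = ET.fromstring(entry_xml)
blob = root.find(f"{{{GD}}}recurrence").text

# The XML parser stops here.  Getting at the frequency needs a second parser
# for iCalendar content lines (a deliberately naive one in this sketch).
props = dict(line.split(":", 1) for line in blob.strip().splitlines())
rrule = dict(part.split("=", 1) for part in props["RRULE"].split(";"))
print(rrule["FREQ"])  # WEEKLY

# In a uniform model the same facts are ordinary fields, read exactly the
# way every other fact is read.
R = "urn:example:recurrence"  # placeholder namespace
uniform = ET.fromstring(
    f'<entry xmlns:r="{R}"><r:dtstart>20060706T100000Z</r:dtstart>'
    "<r:freq>WEEKLY</r:freq><r:count>4</r:count></entry>"
)
print(uniform.find(f"{{{R}}}freq").text)  # WEEKLY
```

In the first half, one fact needs two parsers; in the second, the recurrence is just more fields.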

Some examples of information models are the relational model, the 
LDAP/X.500 directory model, and the XML document model.  XML's information 
model consists of hierarchies, text, elements, and attributes.  LDAP's 
model is a hierarchy of named objects, with multivalued text 
attributes.  The relational model is tables of atomic values.
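As a rough illustration of that summary, the sketch below shapes one hypothetical fact ("Alice's email address is alice@example.com") in each of the three models, using plain Python structures for the relational and LDAP cases and ElementTree for XML.  The names, DN, and values are made up for the example.

```python
import xml.etree.ElementTree as ET

# One hypothetical fact, shaped by each of the three information models.

# Relational: a row of atomic values in a table with fixed columns.
people_columns = ("name", "email")
people_rows = [("Alice", "alice@example.com")]

# LDAP/X.500: a named object in a hierarchy, with multivalued text attributes.
ldap_entry = {
    "dn": "cn=Alice,ou=people,dc=example,dc=com",
    "attrs": {"cn": ["Alice"], "mail": ["alice@example.com"]},
}

# XML document model: a hierarchy of elements, attributes, and text.
person = ET.Element("person", name="Alice")
ET.SubElement(person, "email").text = "alice@example.com"

# Each model can carry the fact; only the shape differs.
print(people_rows[0][people_columns.index("email")])
print(ldap_entry["attrs"]["mail"][0])
print(person.find("email").text)
print(ET.tostring(person, encoding="unicode"))
```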

Of course, this is a simplistic summary, because you can create many 
information models *in* XML, by restricting expressiveness to provide 
meaning.  You can also express relational-like models in XML or vice 
versa.  Any of these information models is sufficient to express ideas from 
the others.

The Chandler information model can be described as the meta-schema that 
defines which schemas we can express in Chandler.  The Schema API 
documentation could be viewed as a summary of this information model or 
meta-schema.

However, the information model that we use in Chandler is way too rich for 
an interchange format, which should be simpler and more, well, 
*elementary*, if it's to be robust in the face of schema changes.  So 
although we could define an information model that matches the one we use 
for Chandler itself, this would just be moving the schema evolution 
problems around and not solving them.

I would suggest we define a model based on a restricted subset of the 
relational model, but possibly *expressed* using XML namespaces to 
represent the "fields" of the different "tables".  The reason I say 
"relational" rather than just saying "XML" is that the relational model 
contains some key ideas that XML does not.  For example, the relational 
model insists that individual data values be elementary, atomic, and 
normalized, without nesting or hierarchy.  I think that these are important 
qualities for data interchange and upgradeability, because the hierarchy 
you write data out in may not be the ideal hierarchy for reading it back 
in, whether after a schema change or across implementation boundaries 
(e.g. Chandler vs. Cosmo/Scooby).
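A rough sketch of what "restricted relational, expressed with XML namespaces" might look like, assuming one hypothetical namespace URI per logical table, one flat `record` element per row keyed by an item UUID, and a writer that refuses nested values.  None of these element names or URIs come from Chandler; they are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per logical "table".
ITEM_NS = "urn:example:chandler:item"
EVENT_NS = "urn:example:chandler:event"

# Only elementary values are allowed; no lists, dicts, or nested records.
ATOMIC = (str, int, float, bool)

def record_element(uuid, table_ns, fields):
    """Serialize one row of one logical table as a flat XML element.

    Each field becomes a child element in the table's namespace, and any
    non-atomic value is rejected, so a reader never has to undo the
    writer's choice of hierarchy.
    """
    rec = ET.Element("record", uuid=uuid)
    for name, value in fields.items():
        if not isinstance(value, ATOMIC):
            raise TypeError(f"{table_ns} {name}: non-atomic value {value!r}")
        ET.SubElement(rec, f"{{{table_ns}}}{name}").text = str(value)
    return rec

# The same item contributes a row to each of two tables, linked only by uuid.
doc = ET.Element("records")
doc.append(record_element("1234", ITEM_NS, {"title": "Staff meeting"}))
doc.append(record_element("1234", EVENT_NS,
                          {"start": "2006-07-06T10:00:00", "duration": 60}))

ET.register_namespace("item", ITEM_NS)
ET.register_namespace("event", EVENT_NS)
print(ET.tostring(doc, encoding="unicode"))
```

Because the item and event fields are separate rows joined only by the UUID, a reader on the other side of an implementation boundary can regroup them however its own schema prefers.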

Another such important quality is discoverability: it should be possible 
to map from, e.g., namespace URLs to handlers, if namespace URLs are being 
used to identify logical "tables".
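For discoverability, one possible shape is a registry keyed by namespace URI, so an importer can look up a handler for each logical table it encounters and set aside the ones it doesn't recognize.  The decorator, handler, and URIs below are hypothetical, sketched under that assumption.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical registry: namespace URI of a logical "table" -> importer.
Handler = Callable[[str, Dict[str, str]], None]
HANDLERS: Dict[str, Handler] = {}

def handles(namespace: str):
    """Decorator that registers a handler for one namespace URI."""
    def register(func: Handler) -> Handler:
        HANDLERS[namespace] = func
        return func
    return register

imported = []

@handles("urn:example:chandler:event")
def import_event(uuid: str, fields: Dict[str, str]) -> None:
    imported.append(("event", uuid, fields))

def split_tag(tag: str):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag

def dispatch(record: ET.Element) -> List[str]:
    """Group a record's fields by namespace and pass each group to its
    registered handler; return the namespaces nobody claimed."""
    groups: Dict[str, Dict[str, str]] = {}
    for child in record:
        ns, local = split_tag(child.tag)
        groups.setdefault(ns, {})[local] = child.text
    unclaimed = []
    for ns, fields in groups.items():
        handler = HANDLERS.get(ns)
        if handler is None:
            unclaimed.append(ns)  # caller decides: preserve, skip, or report
        else:
            handler(record.get("uuid"), fields)
    return unclaimed

rec = ET.fromstring(
    '<record uuid="1234" xmlns:e="urn:example:chandler:event" '
    'xmlns:x="urn:example:unknown">'
    "<e:start>2006-07-06T10:00:00</e:start><x:color>blue</x:color>"
    "</record>"
)
leftover = dispatch(rec)
print(leftover)  # ['urn:example:unknown']
```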

The key here is that the information model is 
representation-independent.  Whether we use binary, XML, YAML, or even 
Python pickles to physically express the information model, there is still 
a schema that defines the scope of what you can "say" in that model.  Is 
this making any more sense?
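To illustrate representation independence, the sketch below treats the model as a set of elementary facts, `(uuid, table namespace, field, value)` tuples (an assumed shape, not a proposal from this thread), and round-trips it through XML, JSON (standing in for YAML, which the Python standard library doesn't include), and a pickle.  The physical bytes differ; the set of facts that comes back does not.

```python
import json
import pickle
import xml.etree.ElementTree as ET

# Facts in a hypothetical elementary model: (uuid, table namespace, field, value).
facts = {
    ("1234", "urn:example:chandler:item", "title", "Staff meeting"),
    ("1234", "urn:example:chandler:event", "start", "2006-07-06T10:00:00"),
}

def to_json(facts):
    return json.dumps(sorted(facts))

def from_json(text):
    # JSON has no tuples, so each fact comes back as a list; convert it.
    return {tuple(f) for f in json.loads(text)}

def to_xml(facts):
    root = ET.Element("facts")
    for uuid, ns, field, value in sorted(facts):
        ET.SubElement(root, f"{{{ns}}}{field}", uuid=uuid).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    result = set()
    for el in ET.fromstring(text):
        ns, _, field = el.tag[1:].partition("}")
        result.add((el.get("uuid"), ns, field, el.text))
    return result

# Three physical representations, one set of facts.
assert from_json(to_json(facts)) == facts
assert from_xml(to_xml(facts)) == facts
assert pickle.loads(pickle.dumps(facts)) == facts
```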


