[Chandler-dev] Type Definitions for sharing format API

Phillip J. Eby pje at telecommunity.com
Fri Sep 22 10:52:53 PDT 2006


Last week, Brian Moseley, Ted, Morgen, and I met to work out a basis for 
defining interoperable data types between Cosmo and Chandler.  I'm in the 
process of incorporating this new information into the API proposal, and 
should have an updated version of it soon.  In the meantime, here's a recap 
of what we settled on at the meeting, and how I see it being incorporated 
into the API proposal.


Primitive Types
---------------

We ended up deciding on five "primitive" data types:

* Bytes[length], where the maximum length must be specified and it must be 
1024 or less

* Text[length], where the maximum length (in bytes of UTF-8 encoding) must 
be specified and it must be 1024 or less

* Lob, a blob of arbitrary-length data.  Unlike Chandler repository lobs, 
this type does *NOT* include encoding or mime-type information; these must 
be specified as separate fields if needed.

* Integer, an unsigned 32-bit integer

* Datetime, a date and time value with a timezone name *and a UTC 
offset*.  There will be a timezone name reserved for "local" time.  (The 
UTC offset ensures that the time's meaning is unambiguous, in the event 
that two systems have a different definition for the same timezone name, 
due to e.g. changes in the timezone database.)
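To make the constraints above concrete, here is a minimal Python sketch of how the primitives might be checked.  The function names, the EIMDatetime class, and the LOCAL_TZ value are hypothetical illustrations, not part of the proposal (the actual reserved "local" name hasn't been chosen); Lob needs no check since it is just arbitrary-length bytes.

```python
import datetime
from dataclasses import dataclass

MAX_FIELD_LENGTH = 1024   # upper bound on n for Bytes[n] and Text[n]
UINT32_MAX = 2**32 - 1    # Integer is an unsigned 32-bit value
LOCAL_TZ = "local"        # hypothetical placeholder for the reserved "local" zone name


def _check_declared_length(length: int) -> None:
    # Assumes a declared length must be positive as well as at most 1024.
    if not 1 <= length <= MAX_FIELD_LENGTH:
        raise ValueError(f"declared length must be 1..{MAX_FIELD_LENGTH}")


def check_bytes(value: bytes, length: int) -> bytes:
    """Validate a Bytes[length] value."""
    _check_declared_length(length)
    if len(value) > length:
        raise ValueError(f"{len(value)} bytes exceeds Bytes[{length}]")
    return value


def check_text(value: str, length: int) -> str:
    """Validate a Text[length] value; length counts UTF-8 bytes, not characters."""
    _check_declared_length(length)
    size = len(value.encode("utf-8"))
    if size > length:
        raise ValueError(f"{size} UTF-8 bytes exceeds Text[{length}]")
    return value


def check_integer(value: int) -> int:
    """Validate an Integer (unsigned 32-bit) value."""
    if not 0 <= value <= UINT32_MAX:
        raise ValueError(f"{value} is outside the unsigned 32-bit range")
    return value


@dataclass(frozen=True)
class EIMDatetime:
    """A Datetime primitive: wall-clock time plus timezone name and UTC offset."""
    local: datetime.datetime        # naive wall-clock time in the named zone
    tzname: str                     # e.g. "US/Pacific", or LOCAL_TZ
    utc_offset: datetime.timedelta  # offset in effect for this value

    def as_utc(self) -> datetime.datetime:
        # The stored offset fixes the instant even if the two systems'
        # timezone databases disagree about what tzname means.
        return (self.local - self.utc_offset).replace(tzinfo=datetime.timezone.utc)
```

For example, 10:52:53 in US/Pacific during daylight time carries an offset of -7 hours, so as_utc() yields 17:52:53 UTC no matter what either side's timezone database says about "US/Pacific".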


Type Aliasing or "Typedefs"
---------------------------

The system will allow for extension of these primitive types via "type 
defs".  That is, you could define a "UUID" type as having a representation 
of Bytes[16] or Text[36].  So the metadata describing a schema will include 
a URI to define the "meaning" of the data that is represented.  Borrowing 
from my previous example, if we have a record type defined thus in Chandler:

     @sharing.recordtype("URI for 'itemrecord'")
     def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
         ...  # details omitted

We might represent the type information as a set of EIM records like this:

("URI for 'itemrecord'", "itsUUID",        "URI for 'UUID'", "Bytes",   16)
("URI for 'itemrecord'", "title",          "",               "Text",   256)
("URI for 'itemrecord'", "body",           "",               "Lob",      0)
("URI for 'itemrecord'", "createdOn",      "",               "Datetime", 0)
("URI for 'itemrecord'", "description",    "",               "Text",  1024)
("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'", "Bytes",   16)

with the various "URI for" bits replaced by appropriate URIs.  The idea 
here is that types with no special semantics beyond those of their 
primitive representation don't need a URI.
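As an illustration, metadata records like the ones above could be generated mechanically from a list of field declarations.  This is a Python sketch only; FieldType and metadata_records are hypothetical names, and the "URI for" strings are the same placeholders used in the table.

```python
from typing import List, NamedTuple, Sequence, Tuple

# Primitive names as they appear in the metadata records.
PRIMITIVES = {"Bytes", "Text", "Lob", "Integer", "Datetime"}


class FieldType(NamedTuple):
    primitive: str  # one of PRIMITIVES
    size: int = 0   # max length for Bytes/Text; 0 where no size applies
    uri: str = ""   # "" when the type means nothing beyond its primitive


def metadata_records(record_uri: str,
                     fields: Sequence[Tuple[str, FieldType]]) -> List[tuple]:
    """Produce one (record URI, field name, type URI, primitive, size) row per field."""
    rows = []
    for name, ftype in fields:
        if ftype.primitive not in PRIMITIVES:
            raise ValueError(f"unknown primitive {ftype.primitive!r}")
        rows.append((record_uri, name, ftype.uri, ftype.primitive, ftype.size))
    return rows


UUID = FieldType("Bytes", 16, "URI for 'UUID'")
item_fields = [
    ("itsUUID", UUID),
    ("title", FieldType("Text", 256)),
    ("body", FieldType("Lob")),
    ("createdOn", FieldType("Datetime")),
    ("description", FieldType("Text", 1024)),
    ("lastModifiedBy", UUID),
]
rows = metadata_records("URI for 'itemrecord'", item_fields)
```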

Separating a type's *meaning* from its *representation* means that 
EIM-based applications can trade data without *needing* to understand it, 
while still being able to provide better support for the types they do 
understand.
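A rough sketch of what that might look like on the receiving side: a field whose type URI is recognized gets converted to a native value, while anything else is kept in its primitive form so it can still be stored and passed along.  KNOWN_TYPES and decode_field are hypothetical names, and the UUID entry assumes the Bytes[16] representation from the table above.

```python
import uuid
from typing import Any, Callable, Dict

# Hypothetical registry of type URIs this application understands, each
# mapped to a function that turns the primitive value into a native one.
KNOWN_TYPES: Dict[str, Callable[[Any], Any]] = {
    "URI for 'UUID'": lambda raw: uuid.UUID(bytes=raw),
}


def decode_field(type_uri: str, raw: Any) -> Any:
    """Convert raw if its type URI is understood; otherwise return it unchanged."""
    converter = KNOWN_TYPES.get(type_uri)
    if converter is None:
        # Unknown (or empty) type URI: keep the primitive value as-is, so the
        # data can still be stored and re-shared without being understood.
        return raw
    return converter(raw)
```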


API Changes/Additions
---------------------

Here are my current ideas for incorporating this type information into the API.

First, I would move type and dependency information to the default values 
of the record type declaration, so that to do the above, we might do 
something like this:

     @sharing.recordtype("URI for 'itemrecord'")
     def itemrecord(
         itsUUID        = schema.UUID,
         title          = sharing.TextType(256),
         body           = sharing.LobType,
         createdOn      = sharing.DateType,
         description    = sharing.TextType(1024),
         lastModifiedBy = schema.UUID,
     ):
         ...

You'll notice there's a mix of schema.* and sharing.* API calls here; the 
idea is that sharing would provide type constructors for the primitive 
types, and there would be standard representations registered for schema 
types that can be unambiguously defined.  For example, schema.UUID can have 
a representation defined as sharing.BytesType(16, "...some URI...").  There 
would be a registration system to allow mapping schema types to sharing 
types, e.g.:

     sharing.typedef(schema.UUID, sharing.BytesType(16, "...some URI..."))

So, from then on, using 'schema.UUID' to define a field type would "do the 
right thing".
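One possible shape for the registration and lookup is sketched below in Python.  It uses inspect.signature to read the default values off the decorated function and substitutes any registered typedef.  The BytesType and TextType classes here are stripped-down stand-ins, SchemaUUID stands in for schema.UUID, and storing the result in a `fields` attribute is an assumption, not part of the proposal.

```python
import inspect
from typing import Any, Dict


class BytesType:
    """Stand-in for sharing.BytesType: a size and an optional type URI."""
    def __init__(self, size: int, uri: str = ""):
        self.size, self.uri = size, uri


class TextType:
    """Stand-in for sharing.TextType."""
    def __init__(self, size: int, uri: str = ""):
        self.size, self.uri = size, uri


class SchemaUUID:
    """Stand-in for schema.UUID."""


_typedefs: Dict[Any, Any] = {}


def typedef(schema_type: Any, sharing_type: Any) -> None:
    """Register the sharing representation to use for a schema type."""
    _typedefs[schema_type] = sharing_type


def recordtype(uri: str):
    """Decorator: read each field's type from the parameter's default value."""
    def decorate(func):
        fields = {}
        for name, param in inspect.signature(func).parameters.items():
            if param.default is inspect.Parameter.empty:
                raise TypeError(f"field {name!r} has no type declared")
            # Registered schema types resolve to their sharing type;
            # sharing types given directly are used as-is.
            fields[name] = _typedefs.get(param.default, param.default)
        func.uri, func.fields = uri, fields
        return func
    return decorate


typedef(SchemaUUID, BytesType(16, "...some URI..."))


@recordtype("URI for 'itemrecord'")
def itemrecord(itsUUID=SchemaUUID, title=TextType(256), lastModifiedBy=SchemaUUID):
    ...
```

In this sketch the lookup happens when the decorator runs, so a typedef has to be registered before any record type that uses it is defined; a real implementation might defer the lookup instead.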

The type constructors (BytesType, DateType, LobType, TextType, and IntType) 
would all accept arguments to set the type's URI, size, and converters to 
translate the native type (e.g. UUIDs) to and from the primitive 
representation (e.g. bytes).  So, for example, one might actually do the 
above type registration as:

     sharing.typedef(
         schema.UUID,
         sharing.BytesType(
             size=16, uri="...some URI...",
             repr=uuid_to_bytes, eval=bytes_to_uuid
         )
     )

Where uuid_to_bytes and bytes_to_uuid are appropriate conversion 
functions.  This then allows the EIM API to serialize and deserialize 
records using a parcel's preferred datatypes, and helps minimize the amount 
of coding that someone has to do to represent common data types in their 
sharing schema.
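A minimal sketch of how a BytesType with repr/eval converters might behave, using Python's standard uuid module for the conversions (`UUID.bytes` gives the 16-byte form and `uuid.UUID(bytes=...)` rebuilds it).  The serialize/deserialize method names are hypothetical; the proposal only describes the constructor arguments.

```python
import uuid
from typing import Any, Callable, Optional


class BytesType:
    """Sketch of a Bytes[size] type with optional native-value converters."""

    # repr/eval shadow the builtins inside __init__ only, mirroring the
    # keyword names used in the proposal.
    def __init__(self, size: int, uri: str = "",
                 repr: Optional[Callable[[Any], bytes]] = None,
                 eval: Optional[Callable[[bytes], Any]] = None):
        if not 1 <= size <= 1024:
            raise ValueError("Bytes size must be between 1 and 1024")
        self.size = size
        self.uri = uri
        # Without converters, values are assumed to already be bytes.
        self._to_bytes = repr if repr is not None else (lambda value: value)
        self._from_bytes = eval if eval is not None else (lambda data: data)

    def serialize(self, value: Any) -> bytes:
        data = self._to_bytes(value)
        if len(data) > self.size:
            raise ValueError(f"{len(data)} bytes exceeds Bytes[{self.size}]")
        return data

    def deserialize(self, data: bytes) -> Any:
        return self._from_bytes(data)


def uuid_to_bytes(u: uuid.UUID) -> bytes:
    return u.bytes  # 16-byte big-endian form


def bytes_to_uuid(data: bytes) -> uuid.UUID:
    return uuid.UUID(bytes=data)


uuid_type = BytesType(size=16, uri="...some URI...",
                      repr=uuid_to_bytes, eval=bytes_to_uuid)
```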

In addition to being able to register type aliases like this, there should 
also be support for specifying types by referring to fields provided by 
other record types.


Open Issues
-----------

* The record type I've been using as an example above should probably 
actually define the "lastModifiedBy" field's type as being a reference to 
the "itsUUID" field: a self-referential dependency.  I don't currently have 
a way to express this.

* The metadata format example doesn't include field-to-field references 
for either the same or different record types, but it needs to.

* There's still no way to express what field(s) represent a record's 
primary key.  In the examples we've played with so far, this tends to be 
either a UUID or the entire record is its primary key, but I'm not sure 
that other combinations can't arise.  (Primary key definition is needed in 
order to implement a "diff" or "delta" mechanism for transmitting 
incremental updates.)

* A new potential issue is that of type representation changes.  If you 
change the definition of a type or its representation between schema 
versions, you could alter the schema in an incompatible way.  I'm not sure 
that this is really a *new* issue, just that the type aliasing machinery 
might make it easier to make this mistake.  I need to give this some more 
thought; suggestions are welcome.
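To illustrate why the primary key matters for the delta mechanism, here is a small Python sketch with hypothetical names and records as plain tuples.  When the key is a UUID, an edited record shows up as a change; when the entire record is its own key, the same edit can only be expressed as a removal plus an addition.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple


def index_by_key(records: Sequence[Record], field_names: Sequence[str],
                 key_fields: Optional[Sequence[str]]) -> Dict[Tuple, Record]:
    """Map each record's primary key to the record itself."""
    if key_fields is None:
        # No declared key: the whole record acts as its own key.
        return {rec: rec for rec in records}
    positions = [field_names.index(f) for f in key_fields]
    return {tuple(rec[i] for i in positions): rec for rec in records}


def delta(old: Sequence[Record], new: Sequence[Record],
          field_names: Sequence[str],
          key_fields: Optional[Sequence[str]] = None
          ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (added, removed, changed) records between two snapshots."""
    before = index_by_key(old, field_names, key_fields)
    after = index_by_key(new, field_names, key_fields)
    added = [after[k] for k in after if k not in before]
    removed = [before[k] for k in before if k not in after]
    changed = [after[k] for k in after if k in before and after[k] != before[k]]
    return added, removed, changed
```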

In general, actually, any feedback or thoughts on the open issues (or the 
current state of the API proposal in general) would be useful.  Thanks.


