[Chandler-dev] Type Definitions for sharing format API

Grant Baillie grant at osafoundation.org
Thu Sep 28 09:31:04 PDT 2006


Hi, Phillip

This all makes sense ... I imagine the API would provide typedefs for  
common non-primitive types, like signed integer, or floating-point.  
Out of curiosity: why the restrictions on length of Bytes and Text  
(and the size of Integer)?

--Grant

On 22 Sep, 2006, at 10:52, Phillip J. Eby wrote:

> Last week, Brian Moseley, Ted, Morgen, and I met to work out a  
> basis for defining interoperable data types between Cosmo and  
> Chandler.  I'm in the process of incorporating this new information  
> into the API proposal, and should have an updated version of it  
> soon.  In the meantime, here's a recap of what we settled on at the  
> meeting, and how I see it being incorporated into the API proposal.
>
>
> Primitive Types
> ---------------
>
> We ended up deciding on five "primitive" data types:
>
> * Bytes[length], where the maximum length must be specified and it  
> must be 1024 or less
>
> * Text[length], where the maximum length (in bytes of UTF-8  
> encoding) must be specified and it must be 1024 or less
>
> * Lob, a blob of arbitrary-length data.  Unlike Chandler repository  
> lobs, this type does *NOT* include encoding or mime-type  
> information; these must be specified as separate fields if needed.
>
> * Integer, an unsigned 32-bit integer
>
> * Datetime, a date and time value with a timezone name *and a UTC  
> offset*.  There will be a timezone name reserved for "local" time.   
> (The UTC offset ensures that the time's meaning is unambiguous, in  
> the event that two systems have a different definition for the same  
> timezone name, due to e.g. changes in the timezone database.)
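To make the size limits above concrete, here is a minimal sketch in present-day Python. The function names (check_declared_length, check_bytes, check_text, check_integer) are hypothetical and not part of the proposed API; the sketch only restates the limits listed above.

```python
MAX_DECLARED_LENGTH = 1024   # Bytes[length] and Text[length] may not exceed this
MAX_INTEGER = 2**32 - 1      # largest value of an unsigned 32-bit integer


def check_declared_length(length):
    """Reject a declared maximum length above the 1024 limit."""
    if length > MAX_DECLARED_LENGTH:
        raise ValueError("declared length must be 1024 or less")
    return length


def check_bytes(value, length):
    """Check a bytes value against its declared Bytes[length]."""
    check_declared_length(length)
    if len(value) > length:
        raise ValueError("value is longer than %d bytes" % length)
    return value


def check_text(value, length):
    """Check a str against Text[length], counting UTF-8 bytes, not characters."""
    check_declared_length(length)
    if len(value.encode("utf-8")) > length:
        raise ValueError("value is longer than %d bytes of UTF-8" % length)
    return value


def check_integer(value):
    """Check that a value fits in an unsigned 32-bit integer."""
    if not 0 <= value <= MAX_INTEGER:
        raise ValueError("value is outside the unsigned 32-bit range")
    return value
```

Because Text limits are counted in encoded bytes, a 600-character string made of two-byte characters would not fit in Text[1024].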
>
>
> Type Aliasing or "Typedefs"
> ---------------------------
>
> The system will allow for extension of these primitive types via  
> "type defs".  That is, you could define a "UUID" type as having a  
> representation of Bytes[16] or Text[36].  So the metadata  
> describing a schema will include a URI to define the "meaning" of  
> the data that is represented.  Borrowing from my previous example,  
> if we have a record type defined thus in Chandler:
>
> @sharing.recordtype("URI for 'itemrecord'")
> def itemrecord(itsUUID, title, body, createdOn, description,
>                lastModifiedBy):
>     ...  # details omitted
>
> We might represent the type information as a set of EIM records  
> like this:
>
> ("URI for 'itemrecord'", "itsUUID",        "URI for 'UUID'", "Bytes",      16)
> ("URI for 'itemrecord'", "title",          "",               "Text",      256)
> ("URI for 'itemrecord'", "body",           "",               "Lob",         0)
> ("URI for 'itemrecord'", "createdOn",      "",               "Datetime",    0)
> ("URI for 'itemrecord'", "description",    "",               "Text",     1024)
> ("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'", "Bytes",      16)
>
> The various "URI for" bits would be replaced with appropriate
> URIs.  The idea here is that types with no special semantics beyond
> those of the primitive representation don't need a URI.
>
> This idea of separating a type's *meaning* from its
> *representation* means that EIM-based applications can trade data
> without *needing* to understand it, while still being able to
> provide better support for types that they do understand.
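A rough sketch of how metadata records like the ones shown earlier might be produced as plain tuples. The helper names type_records and has_special_meaning are hypothetical, and the URI strings are the same placeholders used in the example; the five-column layout (record type URI, field name, type URI, primitive name, size) is read off that example rather than taken from any finished specification.

```python
RECORD_URI = "URI for 'itemrecord'"   # placeholder, as in the example
UUID_URI = "URI for 'UUID'"           # placeholder, as in the example

# (field name, type URI or "" for a plain primitive, primitive name, size)
FIELDS = [
    ("itsUUID",        UUID_URI, "Bytes",    16),
    ("title",          "",       "Text",     256),
    ("body",           "",       "Lob",      0),
    ("createdOn",      "",       "Datetime", 0),
    ("description",    "",       "Text",     1024),
    ("lastModifiedBy", UUID_URI, "Bytes",    16),
]


def type_records(record_uri, fields):
    """Return one metadata tuple per field of the given record type."""
    return [
        (record_uri, name, type_uri, primitive, size)
        for name, type_uri, primitive, size in fields
    ]


def has_special_meaning(record):
    """True if the field carries a type URI beyond its primitive representation."""
    return record[2] != ""


records = type_records(RECORD_URI, FIELDS)
```

An application that does not recognize a type URI could still store and forward the field using only the primitive name and size in the last two columns.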
>
>
> API Changes/Additions
> ---------------------
>
> Here are my current ideas for incorporating this type information  
> into the API.
>
> First, I would move type and dependency information to the default  
> values of the record type declaration, so that to do the above, we  
> might do something like this:
>
>     @sharing.recordtype("URI for 'itemrecord'")
>     def itemrecord(
>         itsUUID        = schema.UUID,
>         title          = sharing.TextType(256),
>         body           = sharing.LobType,
>         createdOn      = sharing.DateType,
>         description    = sharing.TextType(1024),
>         lastModifiedBy = schema.UUID,
>     ):
>         ...
>
> You'll notice there's a mix of schema.* and sharing.* API calls  
> here; the idea is that sharing would provide type constructors for  
> the primitive types, and there would be standard representations  
> registered for schema types that can be unambiguously defined.  For  
> example, schema.UUID can have a representation defined as  
> sharing.BytesType(16, "...some URI...").  There would be a  
> registration system to allow mapping schema types to sharing types,  
> e.g.:
>
>     sharing.typedef(
>         schema.UUID, sharing.BytesType(16, "...some URI...")
>     )
>
> So, from then on, using 'schema.UUID' to define a field type would  
> "do the right thing".
>
> The type constructors (BytesType, DateType, LobType, TextType, and  
> IntType) would all accept arguments to set the type's URI, size,  
> and converters to translate the native type (e.g. UUIDs) to and  
> from the primitive representation (e.g. bytes).  So, for example,  
> one might actually do the above type registration as:
>
>     sharing.typedef(
>         schema.UUID,
>         sharing.BytesType(
>             size=16, uri="...some URI...",
>             repr=uuid_to_bytes, eval=bytes_to_uuid
>         )
>     )
>
> Where uuid_to_bytes and bytes_to_uuid are appropriate conversion  
> functions.  This then allows the EIM API to serialize and  
> deserialize records using a parcel's preferred datatypes, and helps  
> minimize the amount of coding that someone has to do to represent  
> common data types in their sharing schema.
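One way the registration and conversion described above could fit together, as a sketch in present-day Python. Everything here is a stand-in: the BytesType class, the typedef and lookup functions, and the dump and load methods are hypothetical illustrations, not the proposed sharing API, and the standard library's uuid.UUID is used in place of schema.UUID. The "...some URI..." placeholder is kept exactly as in the proposal.

```python
import uuid


class BytesType:
    """Hypothetical stand-in for the proposed sharing.BytesType."""

    def __init__(self, size, uri="", repr=None, eval=None):
        if size > 1024:
            raise ValueError("Bytes size must be 1024 or less")
        self.size = size
        self.uri = uri
        # 'repr' and 'eval' mirror the keyword names in the proposal; they
        # shadow the builtins only inside this constructor.
        self.to_primitive = repr or (lambda value: value)
        self.from_primitive = eval or (lambda data: data)

    def dump(self, value):
        """Convert a native value to bytes and enforce the declared size."""
        data = self.to_primitive(value)
        if len(data) > self.size:
            raise ValueError("value does not fit in Bytes[%d]" % self.size)
        return data

    def load(self, data):
        """Convert primitive bytes back to the native value."""
        return self.from_primitive(data)


def uuid_to_bytes(value):
    return value.bytes            # the 16 raw bytes of the UUID


def bytes_to_uuid(data):
    return uuid.UUID(bytes=data)


_TYPEDEFS = {}                    # native type -> registered sharing type


def typedef(native_type, sharing_type):
    """Register the sharing representation for a native type."""
    _TYPEDEFS[native_type] = sharing_type


def lookup(native_type):
    """Return the sharing type registered for a native type."""
    return _TYPEDEFS[native_type]


typedef(
    uuid.UUID,
    BytesType(size=16, uri="...some URI...",
              repr=uuid_to_bytes, eval=bytes_to_uuid),
)
```

With a registration like this in place, serialization code would only need to call lookup(type(value)).dump(value), without knowing anything about UUIDs itself.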
>
> In addition to being able to register type aliases like this, there
> should also be support for specifying types by referring to fields
> provided by other record types.
>
>
> Open Issues
> -----------
>
> * The record type I've been using as an example above should  
> probably actually define the "lastModifiedBy" field's type as being  
> a reference to the "itsUUID" field: a self-referential dependency.   
> I don't currently have a way to express this.
>
> * The metadata format example doesn't include field-to-field
> references, either within the same record type or between different
> record types, but it needs to.
>
> * There's still no way to express what field(s) represent a
> record's primary key.  In the examples we've played with so far,
> the primary key tends to be either a UUID or the entire record,
> but I'm not sure that other combinations can't arise.
> (Primary key definition is needed in order to implement a "diff" or
> "delta" mechanism for transmitting incremental updates.)
>
> * A new potential issue is that of type representation changes.  If  
> you change the definition of a type or its representation between  
> schema versions, you could alter the schema in an incompatible  
> way.  I'm not sure that this is really a *new* issue, just that the  
> type aliasing machinery might make it easier to make this mistake.   
> I need to give this some more thought; suggestions are welcome.
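The reason a primary key is needed for a "diff" or "delta" mechanism can be sketched as follows. The function diff_records and the convention of naming key fields by tuple position are assumptions made purely for illustration; the proposal does not yet define how keys are declared.

```python
def diff_records(old, new, key_indexes):
    """Compare two lists of record tuples, matching records by primary key.

    key_indexes lists the tuple positions that form the primary key.
    Returns (added, removed, changed), where 'changed' holds the new
    version of each record whose key exists on both sides.
    """
    def key_of(record):
        return tuple(record[i] for i in key_indexes)

    old_by_key = {key_of(r): r for r in old}
    new_by_key = {key_of(r): r for r in new}

    added = [r for k, r in new_by_key.items() if k not in old_by_key]
    removed = [r for k, r in old_by_key.items() if k not in new_by_key]
    changed = [
        r for k, r in new_by_key.items()
        if k in old_by_key and old_by_key[k] != r
    ]
    return added, removed, changed


old = [("u1", "first title"), ("u2", "second title")]
new = [("u1", "edited title"), ("u3", "third title")]

# Key is the first field only (e.g. a UUID): an edit shows up as a change.
by_uuid = diff_records(old, new, key_indexes=(0,))

# Key is the entire record: the same edit shows up as a removal plus an addition.
by_whole_record = diff_records(old, new, key_indexes=(0, 1))
```

The two calls show why the choice matters: with a UUID key the edited record is reported as changed, while with the whole record as the key there is no notion of "changed" at all.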
>
> In general, actually, any feedback or thoughts on the open issues  
> (or the current state of the API proposal in general) would be  
> useful.  Thanks.
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Open Source Applications Foundation "chandler-dev" mailing list
> http://lists.osafoundation.org/mailman/listinfo/chandler-dev


