[Chandler-dev] Type Definitions for sharing format API
Grant Baillie
grant at osafoundation.org
Thu Sep 28 09:31:04 PDT 2006
Hi, Phillip
This all makes sense ... I imagine the API would provide typedefs for
common non-primitive types, like signed integer, or floating-point.
Out of curiosity: why the restrictions on length of Bytes and Text
(and the size of Integer)?
--Grant
On 22 Sep, 2006, at 10:52, Phillip J. Eby wrote:
> Last week, Brian Moseley, Ted, Morgen, and I met to work out a
> basis for defining interoperable data types between Cosmo and
> Chandler. I'm in the process of incorporating this new information
> into the API proposal, and should have an updated version of it
> soon. In the meantime, here's a recap of what we settled on at the
> meeting, and how I see it being incorporated into the API proposal.
>
>
> Primitive Types
> ---------------
>
> We ended up deciding on five "primitive" data types:
>
> * Bytes[length], where the maximum length must be specified and it
> must be 1024 or less
>
> * Text[length], where the maximum length (in bytes of UTF-8
> encoding) must be specified and it must be 1024 or less
>
> * Lob, a blob of arbitrary-length data. Unlike Chandler repository
> lobs, this type does *NOT* include encoding or mime-type
> information; these must be specified as separate fields if needed.
>
> * Integer, an unsigned 32-bit integer
>
> * Datetime, a date and time value with a timezone name *and a UTC
> offset*. There will be a timezone name reserved for "local" time.
> (The UTC offset ensures that the time's meaning is unambiguous, in
> the event that two systems have a different definition for the same
> timezone name, due to e.g. changes in the timezone database.)
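A minimal sketch of how the constraints above might be checked in Python. The helper names (check_bytes, check_text, check_integer) and the SharedDatetime class are hypothetical and not part of the proposal. Requiring a positive declared length is an assumption, as is the timezone name used in the usage lines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_LENGTH = 1024          # upper bound for a declared Bytes/Text length
MAX_INTEGER = 2**32 - 1    # largest unsigned 32-bit value


def _check_declared_length(length: int) -> None:
    # Assumption: a declared length must be positive; the proposal only
    # says it must be specified and must be 1024 or less.
    if not 0 < length <= MAX_LENGTH:
        raise ValueError("declared length must be between 1 and 1024")


def check_bytes(value: bytes, length: int) -> bytes:
    """Validate a value against Bytes[length]."""
    _check_declared_length(length)
    if len(value) > length:
        raise ValueError("value exceeds declared maximum length")
    return value


def check_text(value: str, length: int) -> str:
    """Validate a value against Text[length]; the limit counts UTF-8 bytes."""
    _check_declared_length(length)
    if len(value.encode("utf-8")) > length:
        raise ValueError("encoded text exceeds declared maximum length")
    return value


def check_integer(value: int) -> int:
    """Validate a value against Integer (unsigned 32-bit)."""
    if not 0 <= value <= MAX_INTEGER:
        raise ValueError("value does not fit in an unsigned 32-bit integer")
    return value


@dataclass(frozen=True)
class SharedDatetime:
    """Hypothetical Datetime value: wall-clock time, zone name, UTC offset."""
    local: datetime        # naive wall-clock time in the named zone
    tzname: str            # timezone name
    utc_offset: timedelta  # offset from UTC that was in effect

    def to_utc(self) -> datetime:
        # The stored offset makes the instant unambiguous even if the
        # receiver's timezone database disagrees about the named zone.
        return (self.local - self.utc_offset).replace(tzinfo=timezone.utc)


stamp = SharedDatetime(datetime(2006, 9, 22, 10, 52),
                       "America/Los_Angeles", timedelta(hours=-7))
utc_stamp = stamp.to_utc()
```

Note that the Text limit counts encoded bytes, not characters, so a one-character string such as "é" already needs a declared length of at least 2.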
>
>
> Type Aliasing or "Typedefs"
> ---------------------------
>
> The system will allow for extension of these primitive types via
> "type defs". That is, you could define a "UUID" type as having a
> representation of Bytes[16] or Text[36]. So the metadata
> describing a schema will include a URI to define the "meaning" of
> the data that is represented. Borrowing from my previous example,
> if we have a record type defined thus in Chandler:
>
>     @sharing.recordtype("URI for 'itemrecord'")
>     def itemrecord(itsUUID, title, body, createdOn, description,
>                    lastModifiedBy):
>         # details omitted
>
> We might represent the type information as a set of EIM records
> like this:
>
>     ("URI for 'itemrecord'", "itsUUID", "URI for 'UUID'",
>         "Bytes", 16)
>     ("URI for 'itemrecord'", "title", "",
>         "Text", 256)
>     ("URI for 'itemrecord'", "body", "",
>         "Lob", 0)
>     ("URI for 'itemrecord'", "createdOn", "",
>         "Datetime", 0)
>     ("URI for 'itemrecord'", "description", "",
>         "Text", 1024)
>     ("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'",
>         "Bytes", 16)
>
> (Substitute appropriate URIs for the various "URI for" bits.) The
> idea here is that types with no special semantics beyond those of
> the primitive representation don't need a URI.
>
> This idea of separating a type's *meaning* from its
> *representation* means that EIM-based applications can trade data
> without *needing* to understand it, while still being able to
> provide better support for types that they do understand.
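As a rough illustration of the meaning/representation split, the type-information records above could be generated from a small field description. FieldType and type_records are hypothetical names used only for this sketch; the tuple layout (record URI, field name, type URI, primitive, size) is taken from the example records.

```python
from typing import List, NamedTuple, Sequence, Tuple


class FieldType(NamedTuple):
    """Hypothetical description of one field's representation and meaning."""
    primitive: str   # "Bytes", "Text", "Lob", "Integer", or "Datetime"
    size: int = 0    # maximum length; 0 where it does not apply
    uri: str = ""    # empty when there are no semantics beyond the primitive


def type_records(record_uri: str,
                 fields: Sequence[Tuple[str, FieldType]]) -> List[tuple]:
    """Build one metadata tuple per field, in the layout shown above."""
    return [(record_uri, name, ftype.uri, ftype.primitive, ftype.size)
            for name, ftype in fields]


UUID = FieldType("Bytes", 16, "URI for 'UUID'")

records = type_records("URI for 'itemrecord'", [
    ("itsUUID", UUID),
    ("title", FieldType("Text", 256)),
    ("body", FieldType("Lob")),
    ("lastModifiedBy", UUID),
])
```

A receiver that does not recognize "URI for 'UUID'" can still store and forward the field as Bytes[16], since the primitive and size are always present.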
>
>
> API Changes/Additions
> ---------------------
>
> Here are my current ideas for incorporating this type information
> into the API.
>
> First, I would move type and dependency information to the default
> values of the record type declaration, so that to do the above, we
> might do something like this:
>
>     @sharing.recordtype("URI for 'itemrecord'")
>     def itemrecord(
>         itsUUID = schema.UUID,
>         title = sharing.TextType(256),
>         body = sharing.LobType,
>         createdOn = sharing.DateType,
>         description = sharing.TextType(1024),
>         lastModifiedBy = schema.UUID,
>     ):
>         ...
>
> You'll notice there's a mix of schema.* and sharing.* API calls
> here; the idea is that sharing would provide type constructors for
> the primitive types, and there would be standard representations
> registered for schema types that can be unambiguously defined. For
> example, schema.UUID can have a representation defined as
> sharing.BytesType(16, "...some URI..."). There would be a
> registration system to allow mapping schema types to sharing types,
> e.g.:
>
>     sharing.typedef(
>         schema.UUID, sharing.BytesType(16, "...some URI...")
>     )
>
> So, from then on, using 'schema.UUID' to define a field type would
> "do the right thing".
>
> The type constructors (BytesType, DateType, LobType, TextType, and
> IntType) would all accept arguments to set the type's URI, size,
> and converters to translate the native type (e.g. UUIDs) to and
> from the primitive representation (e.g. bytes). So, for example,
> one might actually do the above type registration as:
>
>     sharing.typedef(
>         schema.UUID,
>         sharing.BytesType(
>             size=16, uri="...some URI...",
>             repr=uuid_to_bytes, eval=bytes_to_uuid
>         )
>     )
>
> Where uuid_to_bytes and bytes_to_uuid are appropriate conversion
> functions. This then allows the EIM API to serialize and
> deserialize records using a parcel's preferred datatypes, and helps
> minimize the amount of coding that someone has to do to represent
> common data types in their sharing schema.
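To make the registration idea concrete, here is a self-contained sketch of how such a typedef registry and converter pair could behave. It does not use the real sharing or schema modules: BytesType, typedef, and lookup below are hypothetical stand-ins, Python's uuid.UUID stands in for schema.UUID, and the dump/load method names are assumptions. Only the keyword names (size, uri, repr, eval) follow the proposal.

```python
import uuid


class BytesType:
    """Hypothetical stand-in for the proposed sharing.BytesType."""

    def __init__(self, size, uri="", repr=None, eval=None):
        if size > 1024:
            raise ValueError("Bytes length must be 1024 or less")
        self.size = size
        self.uri = uri
        # Converters default to the identity when none are supplied.
        self.to_primitive = repr or (lambda value: value)
        self.from_primitive = eval or (lambda data: data)

    def dump(self, value):
        """Convert a native value to bytes, enforcing the declared size."""
        data = self.to_primitive(value)
        if len(data) > self.size:
            raise ValueError("value exceeds declared maximum length")
        return data

    def load(self, data):
        """Convert bytes back to the native value."""
        return self.from_primitive(data)


_registry = {}  # native type -> sharing type


def typedef(native_type, sharing_type):
    """Register the sharing representation for a native type."""
    _registry[native_type] = sharing_type


def lookup(native_type):
    """Return the sharing representation registered for a native type."""
    return _registry[native_type]


def uuid_to_bytes(value):
    return value.bytes


def bytes_to_uuid(data):
    return uuid.UUID(bytes=data)


typedef(
    uuid.UUID,
    BytesType(size=16, uri="...some URI...",
              repr=uuid_to_bytes, eval=bytes_to_uuid),
)

original = uuid.UUID("12345678-1234-5678-1234-567812345678")
wire = lookup(uuid.UUID).dump(original)      # 16 raw bytes
restored = lookup(uuid.UUID).load(wire)      # back to a uuid.UUID
```

With a registry like this, a record field declared with the native type can be serialized and deserialized without the parcel author writing any conversion code at the point of use.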
>
> In addition to being able to register type aliases like this, there
> should also be support for specifying types by referring to fields
> provided by other record types.
>
>
> Open Issues
> -----------
>
> * The record type I've been using as an example above should
> probably actually define the "lastModifiedBy" field's type as being
> a reference to the "itsUUID" field: a self-referential dependency.
> I don't currently have a way to express this.
>
> * The metadata format example doesn't include field-to-field
> references either for the same or different record types, but it
> needs to.
>
> * There's still no way to express what field(s) represent a
> record's primary key. In the examples we've played with so far,
> the primary key tends to be either a UUID or the entire record,
> but I'm not sure that other combinations can't arise.
> (Primary key definition is needed in order to implement a "diff" or
> "delta" mechanism for transmitting incremental updates.)
>
> * A new potential issue is that of type representation changes. If
> you change the definition of a type or its representation between
> schema versions, you could alter the schema in an incompatible
> way. I'm not sure that this is really a *new* issue, just that the
> type aliasing machinery might make it easier to make this mistake.
> I need to give this some more thought; suggestions are welcome.
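On the primary-key point, a small sketch may help show why a "diff" needs to know the key fields. The function diff_records and its key_fields parameter are hypothetical; records are modeled as plain tuples and the key as a tuple of field positions, which is only one possible way to express it.

```python
def diff_records(old, new, key_fields):
    """Compare two record sets, matching records by their key fields.

    Returns (added, changed, removed) lists of records.
    """
    def key(record):
        return tuple(record[i] for i in key_fields)

    old_by_key = {key(r): r for r in old}
    new_by_key = {key(r): r for r in new}

    added = [r for k, r in new_by_key.items() if k not in old_by_key]
    removed = [r for k, r in old_by_key.items() if k not in new_by_key]
    changed = [r for k, r in new_by_key.items()
               if k in old_by_key and old_by_key[k] != r]
    return added, changed, removed


old = [("a", "Title 1"), ("b", "Title 2")]
new = [("a", "Title 1 edited"), ("c", "Title 3")]

# Key is the first field (for example, a UUID).
by_uuid = diff_records(old, new, key_fields=(0,))

# Key is the entire record.
by_whole_record = diff_records(old, new, key_fields=(0, 1))
```

When the first field is the key, the edit to record "a" is reported as a change. When the entire record is the key, the same edit can only be reported as one removal plus one addition, so the choice of key determines what an incremental update looks like.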
>
> In general, actually, any feedback or thoughts on the open issues
> (or the current state of the API proposal in general) would be
> useful. Thanks.
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Open Source Applications Foundation "chandler-dev" mailing list
> http://lists.osafoundation.org/mailman/listinfo/chandler-dev