Open Source Applications Foundation

[Dev] Introduction; Shimmer Prototype Database

Morgen Sagen Fri, 8 Nov 2002 17:15:49 -0800


Greetings, all:

My name is Morgen Sagen and I'm an engineer here at OSAF.  I've been
trying to find time to introduce myself to the lists; the response has
been overwhelming and list/site administration has been keeping me busy.
This article talks about the prototype database I wrote which is used by
Andy Hertzfeld's 'Vista'[1], and like Vista, the database was meant only
as a means of exploration, not for the final product.  

[1] http://osafoundation.org/Vista_prototype.htm

Mitch, Al Cho, and I started looking at how we could model the
information we wanted to store in a PIM.  After several brainstorming
sessions where we drew schema diagrams of varying complexity, we
realized information can be boiled down into statements of the form:  

   <this> <has-relationship-with> <that>

"Morgen works-for OSAF", "OSAF is-located-in San Francisco", etc.  Treat
the nouns as nodes, treat the relationships as arcs, and you have a
directed graph, which should let you traverse information in interesting
and flexible ways.  At this point we came across the work being done
with RDF[2], found that there was already a formal specification for
just this type of schema description, and we adopted RDF/RDFS as our
model.  

[2] http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
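
To make the "nodes and arcs" idea concrete, here is a minimal Python
sketch (an illustration only, not Shimmer's code). The in-memory list
and the follow() helper are hypothetical names invented for this
example; the two statements are the ones quoted above.

```python
from collections import defaultdict

# Hypothetical in-memory store: each statement is (this, relationship, that).
triples = [
    ("Morgen", "works-for", "OSAF"),
    ("OSAF", "is-located-in", "San Francisco"),
]

# Index the arcs by the node they leave: subject -> [(predicate, object), ...]
outgoing = defaultdict(list)
for subject, predicate, obj in triples:
    outgoing[subject].append((predicate, obj))


def follow(start, *predicates):
    """Walk from `start` along the given sequence of arcs.

    Returns the set of nodes reached after following every arc in order.
    """
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj
            for node in frontier
            for pred, obj in outgoing.get(node, ())
            if pred == predicate
        }
    return frontier


# "Where is the organization Morgen works for located?"
where = follow("Morgen", "works-for", "is-located-in")
```

Chaining arcs this way is what lets a query hop from a person to an
employer to a city without any of those links being built into a fixed
record layout.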

Around this time Andy began work on the Vista prototype, and to support
this I wrote a small testbed database (Shimmer) which stores RDF triples
(statements of the form "subject-predicate-object"), supports RDF
concepts such as subclasses and subproperties, and allows queries that
take advantage of the graph model.  I explored ways of presenting
information to the application without it having to be exposed to the
concept of triples.  One way was to build a template mechanism which
maps groups of RDF triples into tree-structured records and vice-versa.
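
As a rough illustration of a query that takes subproperties into
account, the Python sketch below treats a statement made with a
subproperty as also matching its parent property. This is only an
assumption about how such a query could work, not Shimmer's
implementation; pims:contact_method and pims:contact_phone are
hypothetical properties made up for the example.

```python
# Hypothetical rdfs:subPropertyOf table: child property -> parent property.
# (Simplified: each property has at most one parent.)
SUB_PROPERTY_OF = {
    "pims:contact_email_address": "pims:contact_method",  # hypothetical parent
    "pims:contact_phone": "pims:contact_method",          # hypothetical property
}

triples = [
    ("A", "pims:contact_email_address", "C"),
    ("A", "pims:contact_phone", "P"),
    ("A", "pims:contact_name", "B"),
]


def property_and_subproperties(prop):
    """Return `prop` plus every property that is transitively a subproperty of it."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for child, parent in SUB_PROPERTY_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def objects(store, subject, prop):
    """Objects of statements about `subject` whose predicate is `prop` or a subproperty."""
    wanted = property_and_subproperties(prop)
    return [o for s, p, o in store if s == subject and p in wanted]


# Asking for the general property also finds the more specific statements.
methods = objects(triples, "A", "pims:contact_method")
```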

  
Say you want an application to be able to store contact information for
a person; you would first create a contact template (in RDF), which
defines the fields of a contact record (such as name, email address,
phone, residence, employer).  A field definition includes the RDF
property and class type required to represent the value of that field;
it can also include a reference to another template, which lets you
create hierarchical records.  Next, using the contact template you would
create a record and "attach" it to the person (internally this means the
resource representing the given person is used as the root node of the
record's tree).  Multiple records can be attached to any given resource,
each record representing a logical grouping of properties; you might
attach a Business Contact record, Medical record, Personal Interests
record, etc. to someone, and each record can potentially have different
access control settings for sharing (keeping certain properties private
and others shared with specific groups, etc.).  
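
One way to picture records as separately shareable groups of
statements is the Python sketch below. It is a guess at the shape of
the idea rather than Shimmer's data model: the Record class, the
shared_with set, and the pims:employer / pims:blood_type properties are
all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    root: str        # resource the record is attached to
    template: str    # name of the template that shaped the record
    triples: list    # the (subject, predicate, object) statements it groups
    shared_with: set = field(default_factory=set)  # empty set means private


def visible_triples(records, root, group):
    """Statements attached to `root` that `group` is allowed to see."""
    return [
        triple
        for record in records
        if record.root == root and group in record.shared_with
        for triple in record.triples
    ]


# Two records attached to the same person, with different sharing settings.
records = [
    Record("person:morgen", "BusinessContact",
           [("person:morgen", "pims:employer", "OSAF")],
           shared_with={"coworkers"}),
    Record("person:morgen", "Medical",
           [("person:morgen", "pims:blood_type", "O")]),
]

for_coworkers = visible_triples(records, "person:morgen", "coworkers")
```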

Here's a portion of an example Contact template, in this case with two
fields, "name" and "email"; note that each field in turn refers to
another template, shown below:

   <rdf:Description rdf:about="&pim;Template_Contact">
       <rdf:type rdf:resource="&shim;Template" />
       <shim:name>Contact Template</shim:name>
   
       <shim:field>
           <rdf:Description rdf:about="&pim;field_contact_name">
               <rdf:type rdf:resource="&shim;Field" />
               <shim:name>name</shim:name>
               <shim:description>Contact name</shim:description>
               <shim:predicate>&pims;contact_name</shim:predicate>
               <shim:valueType>&pims;PersonName</shim:valueType>
               <shim:hasTemplate>&pim;Template_PersonName</shim:hasTemplate>
           </rdf:Description>
       </shim:field>
   
       <shim:field>
           <rdf:Description rdf:about="&pim;field_contact_email">
               <rdf:type rdf:resource="&shim;Field" />
               <shim:name>email</shim:name>
               <shim:description>Email account</shim:description>
               <shim:predicate>&pims;contact_email_address</shim:predicate>
               <shim:valueType>&pims;EmailAddress</shim:valueType>
               <shim:hasTemplate>&pim;Template_EmailAddress</shim:hasTemplate>
           </rdf:Description>
       </shim:field>
   
       [...]
   
   </rdf:Description>
   

Here is a portion of an example Name template, with two fields, "first"
and "last":

   <rdf:Description rdf:about="&pim;Template_PersonName">
       <rdf:type rdf:resource="&shim;Template" />
       <shim:name>Person Name Template</shim:name>
    
       <shim:field>
           <rdf:Description rdf:about="&pim;field_personname_first">
               <rdf:type rdf:resource="&shim;Field" />
               <shim:name>first</shim:name>  
               <shim:description>First name</shim:description>
               <shim:predicate>&pims;person_first_name</shim:predicate>
               <shim:valueType>&rdfs;Literal</shim:valueType>
           </rdf:Description>
       </shim:field>
   
       <shim:field>
           <rdf:Description rdf:about="&pim;field_personname_last">
               <rdf:type rdf:resource="&shim;Field" />
               <shim:name>last</shim:name>
               <shim:description>Last name</shim:description>
               <shim:predicate>&pims;person_last_name</shim:predicate>
               <shim:valueType>&rdfs;Literal</shim:valueType>
           </rdf:Description>
       </shim:field>

   </rdf:Description>
   

Here's an oversimplified EmailAddress template, with just one field,
"address":

   <rdf:Description rdf:about="&pim;Template_EmailAddress">
       <rdf:type rdf:resource="&shim;Template" />
       <shim:name>Email Address Template</shim:name>
   
       <shim:field>
           <rdf:Description rdf:about="&pim;field_emailaddress_address">
               <rdf:type rdf:resource="&shim;Field" />
               <shim:name>address</shim:name>
               <shim:description>Account address</shim:description>
               <shim:predicate>&pims;email_address</shim:predicate>
               <shim:valueType>&rdfs;Literal</shim:valueType>
           </rdf:Description>
       </shim:field>
   
   </rdf:Description>


Using these templates, Shimmer allows the app to "insert" a record via a
syntax close to this:

   <insert template='Contact'>
      <name>
         <first value='Morgen'/>
         <last value='Sagen'/>
      </name>
      <email>
         <address value='xyzzy@work.org'/>
      </email>
      <email>
         <address value='xyzzy@home.org'/>
      </email>
   </insert>

Shimmer then adds the appropriate RDF triples to the store:

   *Subject*      *Predicate*                  *Object*
   -------------  ---------------------------  ----------------
   [resource A] - pims:contact_name          - [resource B]
   [resource A] - pims:contact_email_address - [resource C]
   [resource A] - pims:contact_email_address - [resource D]
   [resource B] - pims:person_first_name     - "Morgen"
   [resource B] - pims:person_last_name      - "Sagen"
   [resource C] - pims:email_address         - "xyzzy@work.org"
   [resource D] - pims:email_address         - "xyzzy@home.org"

Resource A represents the person we are attaching the record to.
Shimmer automatically generates the intermediate resources B, C, and D.
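
The expansion from a nested record into triples can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions, not Shimmer's code: the TEMPLATES dictionary is a
hand-flattened stand-in for the RDF templates above, and
insert_record() and the new_resource callback are hypothetical names.

```python
# Flattened stand-in for the RDF templates:
# template name -> {field name: (predicate, sub-template name or None)}
TEMPLATES = {
    "Contact": {
        "name": ("pims:contact_name", "PersonName"),
        "email": ("pims:contact_email_address", "EmailAddress"),
    },
    "PersonName": {
        "first": ("pims:person_first_name", None),
        "last": ("pims:person_last_name", None),
    },
    "EmailAddress": {
        "address": ("pims:email_address", None),
    },
}


def insert_record(root, template_name, fields, new_resource):
    """Expand a nested record into triples rooted at `root`.

    `fields` is a list of (field name, value) pairs. The value is a string
    for literal fields, or a nested list of pairs for fields that refer to
    another template. `new_resource` returns a fresh intermediate resource.
    """
    triples = []
    template = TEMPLATES[template_name]
    for field_name, value in fields:
        predicate, sub_template = template[field_name]
        if sub_template is None:
            # Literal field: the value itself is the object.
            triples.append((root, predicate, value))
        else:
            # Hierarchical field: link to a new node, then expand beneath it.
            node = new_resource()
            triples.append((root, predicate, node))
            triples.extend(insert_record(node, sub_template, value, new_resource))
    return triples


record = [
    ("name", [("first", "Morgen"), ("last", "Sagen")]),
    ("email", [("address", "xyzzy@work.org")]),
    ("email", [("address", "xyzzy@home.org")]),
]

labels = iter("BCD")  # stand-ins for the generated resources B, C, and D
triples = insert_record("A", "Contact", record, lambda: next(labels))
```

The sketch emits the statements depth-first, so their order differs
from the table above, but the set of seven triples is the same.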


The concept of records provides a way to group RDF statements (triples)
into units that can be shared, edited, and deleted by the application
while still allowing flexible queries that view the data as a graph.
While Shimmer uses RDF triples as its internal storage format, Chandler
most likely will not -- we are looking at persistent object mechanisms
(such as ZODB), and Python objects will replace Shimmer's records. The
philosophy of RDF is what will live on; someone on the list mentioned
the word "intertwingled", which I think perfectly summarizes that
philosophy.  The user will be able to create structured relationships
(not just generic links) between any items, and create views based on
those relationships (not just display items that have been placed into
hierarchical folders).

~morgen