[Dev] Introduction; Shimmer Prototype Database
Morgen Sagen
Fri, 8 Nov 2002 17:15:49 -0800

Greetings, all:

My name is Morgen Sagen and I'm an engineer here at OSAF. I've been trying to find time to introduce myself to the lists; the response has been overwhelming and list/site administration has been keeping me busy. This article talks about the prototype database I wrote which is used by Andy Hertzfeld's 'Vista'[1], and like Vista, the database was meant only as a means of exploration, not for the final product.

[1] http://osafoundation.org/Vista_prototype.htm

Mitch, Al Cho, and I started looking at how we could model the information we wanted to store in a PIM. After several brainstorming sessions where we drew schema diagrams of varying complexity, we realized information can be boiled down into statements of the form:

    <this> <has-relationship-with> <that>

"Morgen works-for OSAF", "OSAF is-located-in San Francisco", etc. Treat the nouns as nodes, treat the relationships as arcs, and you have a directed graph, which should let you traverse information in interesting and flexible ways. At this point we came across the work being done with RDF[2], found that there was already a formal specification for just this type of schema description, and we adopted RDF/RDFS as our model.

[2] http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

Around this time Andy began work on the Vista prototype, and to support this I wrote a small testbed database (Shimmer) which stores RDF triples (statements of the form "subject-predicate-object"), supports RDF concepts such as subclasses and subproperties, and allows queries that take advantage of the graph model.

I explored ways of presenting information to the application without it having to be exposed to the concept of triples. One way was to build a template mechanism which maps groups of RDF triples into tree-structured records and vice-versa.
Say you want an application to be able to store contact information for a person; you would first create a contact template (in RDF), which defines the fields of a contact record (such as name, email address, phone, residence, employer). A field definition includes the RDF property and class type required to represent the value of that field; it can also include a reference to another template, which lets you create hierarchical records. Next, using the contact template you would create a record and "attach" it to the person (internally this means the resource representing the given person is used as the root node of the record's tree). Multiple records can be attached to any given resource, each record representing a logical grouping of properties; you might attach a Business Contact record, Medical record, Personal Interests record, etc. to someone, and each record can potentially have different access control settings for sharing (keeping certain properties private and others shared with specific groups, etc.). 
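
To make the template idea concrete, here is a minimal Python sketch of how a tree-structured record could be expanded into triples, with the target resource used as the root of the tree. This is not Shimmer's code or API: the names `FieldDef`, `Template`, `expand`, and `new_node`, the `_:bN` labels for intermediate resources, and the `person:morgen` identifier are all hypothetical, chosen only for illustration. Access control and value-type checking are left out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class FieldDef:
    """One field of a template (hypothetical structure, not Shimmer's)."""
    name: str                               # field name used in a record
    predicate: str                          # RDF property carrying the value
    template: Optional["Template"] = None   # nested template for sub-records


@dataclass
class Template:
    """A named collection of field definitions."""
    name: str
    fields: dict = field(default_factory=dict)  # field name -> FieldDef

    def add(self, f: FieldDef) -> "Template":
        self.fields[f.name] = f
        return self


def expand(template, record, root, new_node):
    """Expand a record into (subject, predicate, object) triples.

    `record` maps field names to a literal, a nested dict, or a list of
    either (for repeated fields). `root` is the resource the record is
    attached to; `new_node` mints a fresh intermediate resource.
    """
    triples = []
    for name, value in record.items():
        fdef = template.fields[name]  # KeyError for an undeclared field
        values = value if isinstance(value, list) else [value]
        for v in values:
            if fdef.template is None:
                # Plain value: attach it directly to the current node.
                triples.append((root, fdef.predicate, v))
            else:
                # Sub-record: create an intermediate node and recurse.
                node = new_node()
                triples.append((root, fdef.predicate, node))
                triples.extend(expand(fdef.template, v, node, new_node))
    return triples


# Example templates; the predicate names are illustrative.
person_name = (Template("PersonName")
               .add(FieldDef("first", "pims:person_first_name"))
               .add(FieldDef("last", "pims:person_last_name")))
email_address = (Template("EmailAddress")
                 .add(FieldDef("address", "pims:email_address")))
contact = (Template("Contact")
           .add(FieldDef("name", "pims:contact_name", person_name))
           .add(FieldDef("email", "pims:contact_email_address", email_address)))

ids = count(1)


def new_node():
    # Hypothetical naming scheme for generated intermediate resources.
    return f"_:b{next(ids)}"


triples = expand(
    contact,
    {
        "name": {"first": "Morgen", "last": "Sagen"},
        "email": [{"address": "xyzzy@work.org"}, {"address": "xyzzy@home.org"}],
    },
    "person:morgen",
    new_node,
)

for t in triples:
    print(t)
```

Because each nested template gets its own intermediate node, a repeated field (two email addresses here) simply produces two arcs with the same predicate leaving the root. Going the other way, from triples back to a record, would walk the same template and follow those arcs.
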

Here's a portion of an example Contact template, in this case with two fields, "name" and "email"; note that each field in turn refers to another template, shown below:

    <rdf:Description rdf:about="&pim;Template_Contact">
      <rdf:type rdf:resource="&shim;Template" />
      <shim:name>Contact Template</shim:name>
      <shim:field>
        <rdf:Description rdf:about="&pim;field_contact_name">
          <rdf:type rdf:resource="&shim;Field" />
          <shim:name>name</shim:name>
          <shim:description>Contact name</shim:description>
          <shim:predicate>&pims;contact_name</shim:predicate>
          <shim:valueType>&pims;PersonName</shim:valueType>
          <shim:hasTemplate>&pim;Template_PersonName</shim:hasTemplate>
        </rdf:Description>
      </shim:field>
      <shim:field>
        <rdf:Description rdf:about="&pim;field_contact_email">
          <rdf:type rdf:resource="&shim;Field" />
          <shim:name>email</shim:name>
          <shim:description>Email account</shim:description>
          <shim:predicate>&pims;contact_email_address</shim:predicate>
          <shim:valueType>&pims;EmailAddress</shim:valueType>
          <shim:hasTemplate>&pim;Template_EmailAddress</shim:hasTemplate>
        </rdf:Description>
      </shim:field>
      [...]
    </rdf:Description>

Here is a portion of an example Name template, with two fields, "first" and "last":

    <rdf:Description rdf:about="&pim;Template_PersonName">
      <rdf:type rdf:resource="&shim;Template" />
      <shim:name>Person Name Template</shim:name>
      <shim:field>
        <rdf:Description rdf:about="&pim;field_personname_first">
          <rdf:type rdf:resource="&shim;Field" />
          <shim:name>first</shim:name>
          <shim:description>First name</shim:description>
          <shim:predicate>&pims;person_first_name</shim:predicate>
          <shim:valueType>&rdfs;Literal</shim:valueType>
        </rdf:Description>
      </shim:field>
      <shim:field>
        <rdf:Description rdf:about="&pim;field_personname_last">
          <rdf:type rdf:resource="&shim;Field" />
          <shim:name>last</shim:name>
          <shim:description>Last name</shim:description>
          <shim:predicate>&pims;person_last_name</shim:predicate>
          <shim:valueType>&rdfs;Literal</shim:valueType>
        </rdf:Description>
      </shim:field>
    </rdf:Description>

Here's an oversimplified EmailAddress template, with just one field, "address":

    <rdf:Description rdf:about="&pim;Template_EmailAddress">
      <rdf:type rdf:resource="&shim;Template" />
      <shim:name>Email Address Template</shim:name>
      <shim:field>
        <rdf:Description rdf:about="&pim;field_emailaddress_address">
          <rdf:type rdf:resource="&shim;Field" />
          <shim:name>address</shim:name>
          <shim:description>Account address</shim:description>
          <shim:predicate>&pims;email_address</shim:predicate>
          <shim:valueType>&rdfs;Literal</shim:valueType>
        </rdf:Description>
      </shim:field>
    </rdf:Description>

Using these templates, Shimmer allows the app to "insert" a record via a syntax close to this:

    <insert template='Contact'>
      <name>
        <first value='Morgen'/>
        <last value='Sagen'/>
      </name>
      <email>
        <address value='xyzzy@work.org'/>
      </email>
      <email>
        <address value='xyzzy@home.org'/>
      </email>
    </insert>

Shimmer then adds the appropriate RDF triples to the store:

    *Subject*        *Predicate*                     *Object*
    -------------    ---------------------------     ----------------
    [resource A]  -  pims:contact_name            -  [resource B]
    [resource A]  -  pims:contact_email_address   -  [resource C]
    [resource A]  -  pims:contact_email_address   -  [resource D]
    [resource B]  -  pims:person_first_name       -  "Morgen"
    [resource B]  -  pims:person_last_name        -  "Sagen"
    [resource C]  -  pims:email_address           -  "xyzzy@work.org"
    [resource D]  -  pims:email_address           -  "xyzzy@home.org"

Resource A represents the person we are attaching the record to. Shimmer automatically generates the intermediate resources B, C, and D.

The concept of records provides a way to group RDF statements (triples) into units that can be shared, edited, and deleted by the application while still allowing flexible queries that view the data as a graph.

While Shimmer uses RDF triples as its internal storage format, Chandler most likely will not: we are looking at persistent object mechanisms (such as ZODB), and Python objects will replace Shimmer's records. The philosophy of RDF is what will live on; someone on the list mentioned the word "intertwingled", which I think perfectly summarizes that philosophy. The user will be able to create structured relationships (not just generic links) between any items, and create views based on those relationships (not just displaying items that have been placed into hierarchical folders).

~morgen