[Cosmo-dev] Re: conference call followup

Brian Moseley bcm at osafoundation.org
Wed Jul 19 11:56:25 PDT 2006

<moving this discussion to cosmo-dev, where all follow-up should take place.>

thanks guys! that was quick work. i'm glad to see discussion starting
so quickly.

to catch everybody up, randy and charles have offered to work on the
meat and potatoes of the hibernate project that i proposed several
days ago.

On 7/19/06, rletness at simdeskcorp.com <rletness at simdeskcorp.com> wrote:

> We have been thinking about a database schema for a hibernate/db persistence
> layer the past couple of days.  Here is a first pass at an ERD (still
> more to be defined) of what we were thinking.  We wanted to come up with
> something that:
> 1. was hierarchical
> 2. abstract enough to support DAV (resource model), but not too abstract
> as to be inefficient and complex to develop against
> 3. similar to the JCR model
> The idea is that everything revolves around a resource.  A resource can
> have many types (this is the mixin idea) and any number of properties.
> To make this model hierarchical, a resource has a parent and a URI,
> which is essentially the path to the resource like "a/b/c".  We toyed
> around with only storing the resource name instead of the full path, but
> that means getting to a child node "a/b/c" means first getting a, then
> getting b, then getting c...icky tree traversal that would get nasty
> pretty fast.  The drawback of storing the complete path is that when you
> move a resource it affects all descendants, but moving large portions of
> the tree doesn't happen too often.
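
to make that trade-off concrete, here's a rough python sketch of the "store the full path" option. this is not the proposed schema: the dict keyed by path and the `move` helper are made up for illustration. lookup is a single key access, but a move has to rewrite the stored path of every descendant.

```python
# hypothetical sketch: each resource is stored under its full path
def move(resources, old_path, new_path):
    """Re-key old_path and all of its descendants under new_path.

    Returns the new mapping and the number of stored paths rewritten.
    """
    prefix = old_path + "/"
    moved = {}
    touched = 0
    for path, value in resources.items():
        if path == old_path:
            moved[new_path] = value
            touched += 1
        elif path.startswith(prefix):
            # descendant: swap the old prefix for the new one
            moved[new_path + "/" + path[len(prefix):]] = value
            touched += 1
        else:
            moved[path] = value
    return moved, touched


resources = {"a": "A", "a/b": "B", "a/b/c": "C", "a/x": "X"}
moved, touched = move(resources, "a/b", "a/z")
```

here moving "a/b" rewrites two stored paths ("a/b" and "a/b/c"); in a real table that would be an update over every row whose path starts with the old prefix.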

i'd like to explore the notion of having the data model implement an
"item soup", in which items are identified by uuid as well as by
unique name. this is symmetrical to the chandler repository and would
allow a very simple transfer of ideas between the two. i think the
less dav-centric the data model is, the better.

of course, we'd need to be able to map webdav onto such a model. as
you point out, we'd need to map paths to uuids. the icky tree
traversal is pretty much what jackrabbit does - each repository item
has a reference to its parent and children, but it doesn't know its
place in the hierarchy. jackrabbit makes liberal use of caching to
avoid having to read every item from its database for every path.

the fact is that webdav hierarchies in cosmo don't tend to be very
deep. inside a user's home collection, you tend to have a few calendar
collections, and that's about it. anything nested within the calendar
collection is essentially a caldav implementation detail, not a user
choice. when we add task and contact support, those collections will
likely sit right inside the home collection as well. so i don't think
that path resolution by tree traversal would in practice be terribly
inefficient. if you disagree, let's hear why :)
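
here's a minimal sketch of what i mean, in python just for brevity. everything in it is an assumption for the sake of discussion: the `Item` and `ItemSoup` names, the in-memory dicts standing in for tables, and the reading of "unique name" as "unique among siblings". items live in a flat soup keyed by uuid, each one only knows its parent, and a path is resolved one segment at a time.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    name: str
    parent_uid: Optional[str] = None
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


class ItemSoup:
    """Hypothetical flat store: items are found by uuid, paths by traversal."""

    def __init__(self):
        self.by_uid = {}     # uid -> Item
        self.children = {}   # (parent_uid, name) -> uid

    def add(self, name, parent=None):
        parent_uid = parent.uid if parent else None
        key = (parent_uid, name)
        if key in self.children:
            raise ValueError("name already used under this parent: " + name)
        item = Item(name, parent_uid)
        self.by_uid[item.uid] = item
        self.children[key] = item.uid
        return item

    def resolve(self, path):
        """Walk the path one segment at a time; one lookup per segment."""
        current_uid = None
        for segment in path.strip("/").split("/"):
            current_uid = self.children.get((current_uid, segment))
            if current_uid is None:
                return None
        return self.by_uid[current_uid]


soup = ItemSoup()
home = soup.add("bcm")
calendar = soup.add("calendar", home)
event = soup.add("event.ics", calendar)
found = soup.resolve("bcm/calendar/event.ics")
```

resolving a typical caldav path costs three lookups here, one per segment, and moving an item only changes its own parent reference and name.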

i'd also like to use the terms "item" and "attribute" rather than
"resource" and "property", again for reasons of symmetry with
chandler's repository.

sorry to seem like i'm continually throwing new requirements out. a
lot of these thoughts are emerging just now based on discussions in
various threads on cosmo-dev.

> We also discussed storing a resource's content (blob) in a separate
> table, or as a property of the resource (similar to JCR world).  That's
> still an open question. The reason that we have separate tables for
> different property types is that hibernate provides nice subclass
> features.  Defining a new property type is as simple as creating a new
> table, and new property subclass mapping and little code change.  For
> now we were thinking string and binary properties, but any number of
> property types could be added.

i've used int and date types in jcr as well. we should support
whatever types hibernate has built-in support for, since using that
support gives us a simple mapping to the java types in our model classes.

i wonder if we should explicitly address content types like event and
calendar in the schema. we certainly have to be able to fulfill
queries like "find all events in this calendar that occur in this time
range" and "find the event in this calendar with this uid" as well as
"list all items in this collection".
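
as a rough illustration of what "explicit" could look like, here's a sketch using python's sqlite3 module. the `event` table, its column names and the sample rows are all hypothetical, timestamps are assumed to be utc strings in one fixed format so they compare correctly as text, and recurrence is ignored entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event (
        item_uid     TEXT PRIMARY KEY,   -- uuid of the stored item
        calendar_uid TEXT NOT NULL,      -- uuid of the containing calendar
        ical_uid     TEXT NOT NULL,      -- icalendar UID of the event
        dtstart      TEXT NOT NULL,
        dtend        TEXT NOT NULL,
        UNIQUE (calendar_uid, ical_uid)  -- one icalendar uid per calendar
    )
""")
conn.execute("CREATE INDEX event_range ON event (calendar_uid, dtstart, dtend)")
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", [
    ("item-1", "cal-1", "uid-a", "2006-07-24T09:00:00Z", "2006-07-24T10:00:00Z"),
    ("item-2", "cal-1", "uid-b", "2006-07-26T13:00:00Z", "2006-07-26T14:30:00Z"),
    ("item-3", "cal-2", "uid-a", "2006-07-24T09:00:00Z", "2006-07-24T10:00:00Z"),
])


def events_in_range(calendar_uid, start, end):
    """Events in the calendar that overlap the half-open range [start, end)."""
    rows = conn.execute(
        "SELECT item_uid FROM event "
        "WHERE calendar_uid = ? AND dtstart < ? AND dtend > ? "
        "ORDER BY dtstart",
        (calendar_uid, end, start),
    )
    return [row[0] for row in rows]


def event_by_uid(calendar_uid, ical_uid):
    """The event in the calendar with this icalendar uid, or None."""
    row = conn.execute(
        "SELECT item_uid FROM event WHERE calendar_uid = ? AND ical_uid = ?",
        (calendar_uid, ical_uid),
    ).fetchone()
    return row[0] if row else None


in_range = events_in_range("cal-1", "2006-07-24T00:00:00Z", "2006-07-25T00:00:00Z")
by_uid = event_by_uid("cal-2", "uid-a")
```

the point is only that both queries become plain indexed lookups once the event's start, end and uid are real columns rather than something buried in a blob.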

also, should an item's "collection-ness" also be explicit in the
model? i think maybe so. a collection will have child items, but a
regular item won't. a collection will also have slightly different
"built in" properties than a regular item (see the cnd for the
existing repository schema).

> The user table is stubbed out (should map to the current cosmo user
> model) and there is nothing about tickets, but those can easily be added.

yea, i'm more worried about the data model for content right now than for users.

re tickets - each stored item can have 0..* tickets granted on it. a
ticket is uniquely identified by a string id (unique to the resource,
but in general reusing ticket ids is not a good idea). acls will be
similar, but we don't need to worry about those yet.
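
in sketch form (python again, with a made-up `TicketStore` class and plain strings standing in for item uuids and ticket ids), the constraint is just that a ticket id can't repeat on the same item:

```python
class TicketStore:
    """Hypothetical in-memory store: each item has zero or more tickets."""

    def __init__(self):
        self._tickets = {}  # item_uid -> set of ticket ids

    def grant(self, item_uid, ticket_id):
        ids = self._tickets.setdefault(item_uid, set())
        if ticket_id in ids:
            # ticket ids must be unique per item
            raise ValueError("ticket id already granted on this item")
        ids.add(ticket_id)

    def tickets_for(self, item_uid):
        return sorted(self._tickets.get(item_uid, ()))


store = TicketStore()
store.grant("item-1", "ticket-a")
store.grant("item-1", "ticket-b")
granted = store.tickets_for("item-1")
none_granted = store.tickets_for("item-2")
```

in a relational schema this would presumably be a ticket table with a unique constraint on the (item, ticket id) pair.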

> The plan is to decide on a db schema, generate hibernate mapping
> files/classes, and then dao interfaces/hibernate implementations.  We
> are thinking that there also needs to be a set of application objects
> defined that map to the hibernate classes to decouple the hibernate
> model from the app.

can you be more specific about what you mean by that last sentence?
why wouldn't we want to use the hibernate model objects directly?

> We are just trying to figure out if we are going in the right
> direction.  Suggestions, comments, concerns??

i think we are definitely headed in the right direction. some more
requirements and ideas will probably shake out as we continue working,
but i'm hopeful that we'll move quickly into actually coding. i'm
probably not going to get to start the jcr-server work until after os
con (which occurs next week), but you should be able to work on your
stuff in isolation before then.

also, in case it wasn't clear, i'll be looking for this work to
eventually be submitted as a patch. i'll create a bug to which patches
can be attached for review.

> -Randy and Charles
> Brian Moseley wrote:
> > * the data must be hierarchically related
> > * the most fundamental data elements are "resources" and "collections"
> > * both resources and collections have name/value "properties"
> > * resource content can be really big (blob)
> > * we don't need to know the specific type of content present in a
> > resource
> > * if we do have a media type for a resource, though, then we can
> > potentially store more properties with the resource, which allows
> > those resources to participate in focused queries ("find all events in
> > this date range" as opposed to "find all resources owned by bcm")
> > * similarly, if we specifically type a collection, we can make some
> > rules about the resources that are placed in the collection (each
> > event in a calendar collection must have a unique icalendar uid within
> > that collection)
> >
> > even though the dav-based protocols that we use to access specific
> > content types (caldav for events, carddav for contacts) don't
> > specifically allow content to be intermixed (events live in calendar
> > collections, contacts live in addressbook collections), our data model
> > should allow us to store arbitrary mixed content in a given
> > collection. in other words, we should allow both events and contacts
> > within a single collection. when accessed by caldav, we'll just send
> > back the events. when accessed by carddav, we'll just send back the
> > contacts. additionally, resources themselves should be able to have
> > 0..* types - a resource could be of unknown type, or an event, or a
> > contact, or both an event and a contact. in the osaf world, assigning
> > multiple types to a content item is known as "stamping".
> >
> > jcr gives us api for navigating hierarchical relationships between
> > data, for querying those hierarchical relationships (give me all
> > descendants of node /b/bc/bcm that are of type calendar collection),
> > and mixin node types so that we can assign new content types to
> > existing nodes. it also gives us a simple locking api which we don't
> > use yet.
> >
> > in order to service caldav queries, as you point out, we can't just
> > store the icalendar representation of an event in a blob column
> > (although we still do that so that we can guarantee GET requests for
> > the event will preserve byte-for-byte equality with the event that was
> > originally PUT, which means we can use strong etags). we also have to
> > parse the icalendar object and store and index the contained
> > components, properties and parameters so that they can be queried
> > later. we currently do that by "flattening" the icalendar data into a
> > set of jcr properties. you can see this in the CalendarFlattener class
> > (i believe that's the name, can look it up if you can't find it).
> >
> > i'm sure there's more for me to tell you, but this is probably a good
> > start.
> >
