Open Source Applications Foundation

[Dev] RDF and ZODB

Michael R. Bernstein 31 Oct 2002 22:55:51 -0800


Ok, given what little I know about Chandler's proposed use of RDF stored
in the ZODB, I went hunting for the RedFoot developers to ask them about
their library. I caught up with Eikon on the #redfoot IRC channel.

I started by assuming that triples needed to be represented by class
instances and stored somehow, but this turns out not to be the case. A
triple really only consists of references to a subject, predicate, and
object, so the RDFLib triple store uses nested dictionaries to store
triples. Each triple is stored in two sets of nested dictionaries as
follows:

spo[s][p][o] = 1
pos[p][o][s] = 1

(s)ubject, (p)redicate, (o)bject.
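
To make the two-index layout above concrete, here is a minimal sketch using plain Python dicts. The class name, method names, and sample data are my own illustration, not RDFLib's actual API.

```python
class TripleStore:
    """Toy in-memory triple store (hypothetical names, not RDFLib's API)."""

    def __init__(self):
        self.spo = {}  # subject -> predicate -> object -> 1
        self.pos = {}  # predicate -> object -> subject -> 1

    def add(self, s, p, o):
        # Each triple is recorded once in each index.
        self.spo.setdefault(s, {}).setdefault(p, {})[o] = 1
        self.pos.setdefault(p, {}).setdefault(o, {})[s] = 1

    def objects(self, s, p):
        # "What does subject s have for predicate p?"
        return list(self.spo.get(s, {}).get(p, {}))

    def subjects(self, p, o):
        # "Which subjects have predicate p with object o?"
        return list(self.pos.get(p, {}).get(o, {}))


store = TripleStore()
store.add("msg1", "from", "alice")
store.add("msg1", "to", "bob")
store.add("msg2", "from", "alice")
```

The spo index answers lookups that start from a subject, and the pos index answers lookups that start from a predicate and object, which is presumably why each triple is stored twice.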

Persisting very large dicts in the ZODB is usually a bad idea, because a
dict is stored as a single record, so accessing even one key/value pair
requires loading the whole dict into memory. For this case, the ZODB
provides a more efficient persistent data type called a BTree (a
balanced tree, not a binary tree). The ZODB BTree implementation has
largely the same API as a dict, so it can more or less be used as a dict
replacement, but BTrees keep their keys in sorted order and spread them
across separately stored nodes, so getting the value for a key only
requires loading the branch of the tree that leads to that key/value
pair.

BTree documentation can be found here:
http://www.zope.org/Members/ajung/BTrees/FrontPage

Anyway, Eikon downloaded the Standalone ZODB package
(http://www.zope.org/Products/StandaloneZODB) and in a couple of hours
had successfully modified his in-memory triple store to use BTrees
inside a ZODB instance.
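
I haven't seen Eikon's modified code, so the following is only a guess at the general shape of such a change: a triple store whose nested mappings are created by a pluggable factory. It assumes the third-party BTrees package (OOBTree) and falls back to plain dicts if that package is not installed. The class and method names are hypothetical, and attaching the indexes to a ZODB root and committing a transaction is left out.

```python
try:
    # Third-party package shipped with the ZODB; assumed to be installed.
    from BTrees.OOBTree import OOBTree as default_mapping
except ImportError:
    # Fallback so the sketch still runs without the ZODB.
    default_mapping = dict


class MappingTripleStore:
    """Triple store over any dict-like mapping type (hypothetical sketch)."""

    def __init__(self, mapping=default_mapping):
        self.mapping = mapping
        self.spo = mapping()  # subject -> predicate -> object -> 1
        self.pos = mapping()  # predicate -> object -> subject -> 1

    def _add(self, index, a, b, c):
        # Create the nested mappings on demand, using the same factory.
        if a not in index:
            index[a] = self.mapping()
        if b not in index[a]:
            index[a][b] = self.mapping()
        index[a][b][c] = 1

    def _lookup(self, index, a, b):
        inner = index.get(a)
        if inner is None:
            return []
        leaf = inner.get(b)
        if leaf is None:
            return []
        return list(leaf.keys())

    def add(self, s, p, o):
        self._add(self.spo, s, p, o)
        self._add(self.pos, p, o, s)

    def objects(self, s, p):
        return self._lookup(self.spo, s, p)

    def subjects(self, p, o):
        return self._lookup(self.pos, p, o)


store = MappingTripleStore()
store.add("msg1", "from", "alice")
store.add("msg2", "from", "alice")
```

Because only the mapping factory changes, the store logic is the same for dicts and BTrees, which fits the "more or less a dict replacement" description above.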

All this looks very promising, although without knowing more about
Morgen Sagan's Shimmer RDF database prototype, it's hard for me to tell
whether I'm barking up the wrong tree here, or duplicating his efforts.

In any case, storing the triples is only part of the story. A triple
is just a set of references tying three object instances (a subject, a
predicate, and an object) together in a relationship. Presumably these
objects are also stored in the ZODB somewhere.

There are a couple of different ways we could store these object
instances. We could:

 - Store everything in one BTree, giving each object a unique id.

 - Store each object type in a separate BTree (email, contacts,
   events, etc.)

 - Store all Items (subjects and objects) in one BTree,
   and the predicates somewhere else, perhaps in another BTree

Some more information about the ZODB:
 http://www.zope.org/Documentation/Articles/ZODB1
 http://www.zope.org/Documentation/Articles/ZODB2
 http://www.zope.org/Documentation/Books/ZDG/current/Persistence.stx

Of course, this approach may be entirely naive, but I think it serves as
a point of departure, at least.

Michael Bernstein.