Open Source Applications Foundation

[Dev] RDF tax

David Hyatt Fri, 15 Nov 2002 17:09:54 -0800

One lesson I learned while working on Mozilla is that it can be tricky 
to make a scalable data source using RDF, even if the underlying format 
(beneath the RDF representation) is scalable.  As an example, consider 
a large mailbox with 50,000 messages.  If you force the UI to display 
information through communication with your RDF layer, then you can end 
up with a real challenge on your hands as far as avoiding walking too 
much of the RDF data (especially when hierarchies are involved).  This 
problem is compounded by aggregation, where even simple things like 
counting how many items are in a subfolder become challenging (since 
all data sources have to get involved).  Trying to do projection in a 
message display becomes very difficult, especially with aggregation.
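
To make the counting problem concrete, here is a minimal sketch in Python. It is not Mozilla's RDF API: the names TripleSource, CompositeSource, targets, and count_children are hypothetical, and the linear scan inside each source is an assumption made to keep the example short. The point it illustrates is that an aggregated count must consult every data source and merge the answers before a single number is known.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleSource:
    """Hypothetical in-memory data source holding RDF-style triples."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples: List[Triple] = list(triples)
        self.visited = 0  # number of triples examined so far

    def targets(self, subject: str, predicate: str) -> Set[str]:
        """Return every object o such that (subject, predicate, o) is stored."""
        found: Set[str] = set()
        for s, p, o in self.triples:
            self.visited += 1
            if s == subject and p == predicate:
                found.add(o)
        return found


class CompositeSource:
    """Hypothetical aggregate: answers a query by asking every member source."""

    def __init__(self, sources: Iterable[TripleSource]) -> None:
        self.sources = list(sources)

    def targets(self, subject: str, predicate: str) -> Set[str]:
        merged: Set[str] = set()
        for source in self.sources:
            # Every source is consulted, and duplicates are merged away.
            merged |= source.targets(subject, predicate)
        return merged


def count_children(source: CompositeSource, folder: str) -> int:
    """Counting means materializing the whole merged child set first."""
    return len(source.targets(folder, "child"))


# Two sources that overlap on messages 900..999.
local = TripleSource(("inbox", "child", f"msg{i}") for i in range(1000))
remote = TripleSource(("inbox", "child", f"msg{i}") for i in range(900, 1200))
combined = CompositeSource([local, remote])

total = count_children(combined, "inbox")
```

Even with an index in each source, the merge step remains: the aggregate cannot add up per-source counts, because the same item may be asserted by more than one source.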

In Mozilla we ended up using RDF for mailboxes, but we dumped it for 
messages (and sacrificed the aggregation capabilities of RDF in the 
process).   This is an advantage of using Mozilla's tree view, since it 
works with RDF for smaller data sources (e.g., mailboxes and bookmarks) 
but you can still plug in your own non-RDF back end to the tree view if 
you need something more scalable.
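
As a rough illustration of the pluggable back end idea, here is a sketch in Python. It is only loosely modeled on a tree view that asks its back end for a row count and for individual cells; MessageView, cell_text, and fake_load_row are hypothetical names, and the loader stands in for whatever scalable, non-RDF store holds the messages.

```python
from typing import Callable, List


class MessageView:
    """Hypothetical view back end: the UI asks only for the rows it draws."""

    def __init__(self, row_count: int,
                 load_row: Callable[[int], List[str]]) -> None:
        self._row_count = row_count
        self._load_row = load_row  # assumed cheap random access by row index
        self.loads = 0             # how many rows were actually fetched

    @property
    def row_count(self) -> int:
        return self._row_count

    def cell_text(self, row: int, column: int) -> str:
        if not 0 <= row < self._row_count:
            raise IndexError(row)
        self.loads += 1
        return self._load_row(row)[column]


def fake_load_row(row: int) -> List[str]:
    """Stand-in for reading one message summary from a non-RDF store."""
    return [f"Subject {row}", f"sender{row}@example.com"]


# A mailbox of 50,000 messages, with only 30 rows scrolled into view.
view = MessageView(50_000, fake_load_row)
visible = [view.cell_text(row, 0) for row in range(100, 130)]
```

The view never walks the whole mailbox: the cost of drawing is proportional to the number of visible rows, not to the number of messages.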


On Friday, November 15, 2002, at 01:04 PM, David McCusker wrote:

> Bill Seitz wrote:
>> Which may or may not be relevant
> Below I cite a small part of that material by Joe Gregorio so I can
> comment on the use of RDF in Chandler, since apparently some folks
> are concerned about paying the RDF tax, and worry this will make the
> Chandler framework(s) complex.
> Joe Gregorio (
> : Note that mail in Mozilla is stored in mbox format and that there is
> : a separate code layer above it that presents it as an RDF data source.
> : They didn't go back and change the mbox format so that it was native
> : RDF. This is very important because it leaves the mbox format alone,
> : allowing the current set of tools that manipulate mbox format to still
> : work. I'll repeat that for emphasis: They left the native format alone.
> : This is very different from picking up a working format and forcing
> : it into a convoluted form so that it is natively RDF. We all know how
> : successful that's been.
> Chandler content will not typically be stored in any native RDF format,
> although it makes sense to generate some on demand when this is what
> the user wants.  (Say, if a user wants to see addresses represented as
> an RDF serialization, why not do this when requested?  It need not
> imply anything about how the content is stored the rest of the time.)
> As Joe Gregorio mentions, mail in Mozilla is stored in mbox format, and
> yet the content is still presented at some higher level as RDF content.
> Chandler can store content in any physical encoding it wishes and still
> present content as RDF compatible in every suitable context.
> We're currently thinking along the lines of transparent Python object
> persistence for Chandler, which might also make this content visible
> in other ways when accessed along a more direct route than straight
> through the Chandler app framework interfaces.  The persistent object
> format won't have any character specific to RDF.
> And yet, we also want RDF schemas which can describe all the content, so
> Python objects are visible in an RDF world that can query and otherwise
> deal with the content as if it has a native RDF representation somewhere.
> So Chandler will use RDF in the sense that when someone wants to operate
> on content with RDF based interfaces (including internal Chandler view
> components) this will work in a highly supported way, especially since
> Chandler will depend on this working well itself.
> However, this doesn't mean RDF in any way determines what can be stored
> persistently, or what it looks like to a reader or writer when an RDF
> based interface is not the one desired.
> The RDF tax won't get you when you want to avoid it.  Of course, some
> folks will want to understand the internals of Chandler intimately, and
> using RDF internally implies a need to understand how RDF gets used in
> Chandler contexts.  That will be a barrier to entry if RDF complexity
> stops new folks from getting involved, or old folks from understanding
> how to solve problems as they crop up in development.
> So I'll try to make sure the RDF parts have a description that is easy
> to understand, so the barrier to entry is as low as possible.  I'm not
> sure how I'll do that yet, but it doesn't sound hard in principle.
> Probably this means I'll write an RDF primer suitable for Chandler that
> aims to make it sound simpler than other documents do.  (I have not yet
> read the recent documents whose links were posted by Wes Felter; see
> which includes the following links:
> The last time I spent a lot of time thinking about RDF, when I was at
> Netscape, I thought of a spatial interpretation of RDF graphs that might
> or might not be helpful when I rethink it and write a description.  But
> even more important might be an explanation of how it relates to Chandler
> content without overwhelming a reader with RDF specifics.
> Maybe I'll describe an in-memory representation that emphasizes space
> efficiency and is easier for coders to grasp than grammars for the
> text serialization formats.  Diagrams might clarify things later.
> Anyway, I hope to keep RDF from making Chandler hard to understand when
> folks want to get involved.  But the engineering for persistence in the
> system is only now getting underway for the long term production plans,
> so I can't tell you yet how I'll achieve the desired effects I mentioned
> concerning independence from the RDF perspective.
> --David McCusker