[Dev] (dataflow image) RDF vs Python objects

David McCusker david at osafoundation.org
Wed Nov 27 00:51:21 PST 2002


(I was unable to post to my weblog from the office earlier today,
presumably because of firewall problems.  But I have no details.)

This note describes an image that illustrates the notions I mentioned
in http://lists.osafoundation.org/pipermail/dev/2002-November/000331.html.

I have resisted enclosing the image itself in this email, even though
it's only a 40K jpeg file.  It might be too big for some folks.  So I
put a copy on my personal weblog (along with a copy of this note),
and that's where you can see the image until another copy appears on
the right OSAF web page later.  See the following weblog entry:

http://www.treedragon.com/ged/map/ti/newNov02.htm#26nov02-rdf-vs-python

Katie Capps Parlante also wrote a strawman design document today with
another diagram emphasizing interobject relationships in a UML style
presentation, and that complements the image I discuss below, without
much in the way of overlap.  (Some of my Python file name references
are pure handwaving, however, and Katie supplies realistic names.)

The ASCII diagram below is a loose recreation of the original image,
but with almost all the arrows and their respective labels missing,
so this text diagram won't make much sense on its own.  But at least
it gives body to the noun-objects involved in the process I describe.

                     +------------+
     +----------+    |  event.py  |   +-----------+
     |  data.py |    +------------+   |  rdfio.py |
     +----------+                     +-----------+
                      +---+
                      |   |    /--------/       +-------------+
               DATA:  +-+-+   < import <        |             |
                 +---+  |      \--------\    +-------------+  |
+----+ <-store  |   +--+                    |             |  |
|    | load->   +---+         \--------\    |  .RDF       |--+
|    |                         > export >   |             |
|    |                        /--------/    +-------------+
|    |
|    |         +-------------+
| db |         | SCHEMA:     |
|    | load->  | +---+       |   \--------\       +-------------+
|    | <-store | |   |  +--+ |    > export >      |             |
|    |         | |   +--+  | |   /--------/    +-------------+  |
|    |         | |   |  |  | |                 |             |  |
+----+         | +---+  |  | |   /--------/    |  .RDFS      |--+
                |        +--+ |  < import <     |             |
                +-------------+   \--------\    +-------------+

             +--------------+           +-------------+
             |  schema.py   |           |  rdfsio.py  |
             +--------------+           +-------------+


When using the system as a persistent storage model, most traffic moves
back and forth over 'load' and 'store' arcs between the data and db,
and between schema and db. (Note the word "repository" is very nearly
interchangeable with the word "database" in most of these discussions.)

The Python objects corresponding to the in-memory graphs are defined in
files data.py, event.py, and schema.py, but we won't have static Python
code for every object, because some of them will be dynamically created
at runtime in response to RDF and RDFS descriptions.  For example, the
schema.py file might not exist, since all the objects describing each
schema might be generated dynamically from RDF descriptions.

Python doesn't have strongly typed interface files for objects, so we
hope to use RDFS schema files to define the expected data formats of
objects in memory, and the expected way of exporting these objects when
they are serialized as RDF/XML.
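To make that concrete, here is a hypothetical sketch of what such a
schema object might look like once derived from RDFS.  Every name in
it (Schema, PropertySpec, validate, the CalendarEvent properties) is
invented for illustration; the real design might look quite different.

```python
# Hypothetical sketch: a schema object built from an RDFS description,
# recording the properties an in-memory Python object is expected to
# carry.  All names here are invented for illustration.

class PropertySpec:
    def __init__(self, name, range_type, required=False):
        self.name = name              # RDF property name, e.g. "headline"
        self.range_type = range_type  # expected Python type for the value
        self.required = required

class Schema:
    def __init__(self, class_name, properties):
        self.class_name = class_name
        self.properties = {p.name: p for p in properties}

    def validate(self, obj):
        """Check that obj carries the attributes the schema expects."""
        for spec in self.properties.values():
            value = getattr(obj, spec.name, None)
            if value is None:
                if spec.required:
                    raise ValueError("missing required property: " + spec.name)
                continue
            if not isinstance(value, spec.range_type):
                raise TypeError("bad type for property: " + spec.name)

# A schema like this might be generated dynamically from RDFS rather
# than written by hand in a schema.py file:
event_schema = Schema("CalendarEvent", [
    PropertySpec("headline", str, required=True),
    PropertySpec("duration", int),
])
```

The point is only that the metainformation ends up as ordinary Python
objects we can inspect and persist, not as static type declarations.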

When a database is first created, the appropriate RDFS schema file
is parsed to populate the persistent store with a metadata description
of object formats and the way they should be serialized.  The code
used to turn RDFS into a suitable in-memory schema data structure is
defined behind some interface, and here I show a file named
rdfsio.py providing the implementation as a plausible dummy.

Subsequent sessions need not reparse the RDFS because the content is
already persistent in the database.  However, we'll want to support
dynamic redefinition of schemas, and this might involve reparsing a
new RDFS file, and somehow upgrading the existing db metadata.
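The bootstrap flow above might be sketched as follows.  This is purely
illustrative: the names (open_database, parse_rdfs, SCHEMA_KEY) are
invented, a plain dict stands in for the persistent store, and the
parser is a stub for whatever rdfsio.py would really do.

```python
# Hypothetical sketch of the bootstrap flow: on first creation the RDFS
# file is parsed and the resulting schema metadata is persisted; later
# sessions read it back from the db instead of reparsing.

SCHEMA_KEY = "__schema__"

def parse_rdfs(path):
    # Stand-in for a real RDFS parser (something like rdfsio.py);
    # here it just returns a dict keyed by class name.
    return {"CalendarEvent": {"headline": "string", "duration": "integer"}}

def open_database(db, rdfs_path):
    schema = db.get(SCHEMA_KEY)
    if schema is None:
        # First session: populate the store with schema metadata.
        schema = parse_rdfs(rdfs_path)
        db[SCHEMA_KEY] = schema
    # Subsequent sessions reuse the persisted copy.  Dynamic schema
    # redefinition would reparse a new RDFS file at this point and
    # somehow merge the result into the existing metadata.
    return schema

store = {}   # a plain dict standing in for the persistent db
first = open_database(store, "calendar.rdfs")
again = open_database(store, "calendar.rdfs")
```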

Import from RDF involves instantiating another Python based service,
which knows how to turn a SAX stream parse of RDF into suitable Python
objects in memory, which are persisted to the database.  The importer
code (here illustrated by a dummy file named rdfio.py) will look up
the Python implementations of objects mentioned by name in the RDF,
and this will dynamically load implementations on demand when they exist.

For example, we might have a CalendarEvent class which subclasses RDFObject,
which in turn subclasses a Persistent base class.  (We plan to have a thin
class layer above ZODB, so we are not strongly coupled to ZODB, so we'll
have our own Persistent base class for objects that persist themselves
transparently.)  In this case, when CalendarEvent appears in the RDF, we'll
load file CalendarEvent.py when the class is not already defined.

In contrast, the RDF might also mention a MusicItem type, which does not
have an actual Python implementation predefined.  So the importer would
find no source code defining MusicItem, and would instantiate a
base RDFObject instead, populating it with suitable content from
the RDF specification, perhaps informed by the schema from RDFS and
accessed as a Python data structure in memory.
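The importer's lookup-with-fallback might be sketched like this.  The
names RDFObject and instantiate are invented here, and the real code
(something like rdfio.py) would be driven by a SAX parse rather than
a plain dict of properties.

```python
# Hypothetical sketch of the importer's class lookup: try to load a
# Python module named after the RDF type, and fall back to a generic
# RDFObject when no static implementation exists.
import importlib

class RDFObject:
    """Generic container for RDF-described objects with no static class."""
    def __init__(self, type_name, properties):
        self.rdf_type = type_name
        self.__dict__.update(properties)

def instantiate(type_name, properties):
    try:
        # Late binding: CalendarEvent in the RDF loads CalendarEvent.py.
        module = importlib.import_module(type_name)
        cls = getattr(module, type_name)
        return cls(**properties)
    except (ImportError, AttributeError):
        # No source code defines this type (e.g. MusicItem), so use a
        # plain RDFObject populated from the RDF content instead.
        return RDFObject(type_name, properties)

item = instantiate("MusicItem", {"title": "Blue in Green"})
```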

(We could dynamically generate a MusicItem.py file at runtime, because
Python allows this sort of thing.  But this seems potentially exotic
and confusing for some developers, and it's not necessary.  So it might
be simpler to say not all RDFObject instances have static Python code.)

The main purpose of the illustrating image is to suggest the following
propositions, which are expected to hold true in the cooperation
between RDF and persistence:

* Conversion to and from RDF invokes Python objects for this purpose,
   and they might be plugins. The process is not magic -- there will be
   source code that says how it happens.

* Import will typically attempt to load Python modules by name when
   they can be found, so creation of suitable Python objects is handled
   by late dynamic binding based on names found in RDF text files.

* The metainformation expressed by schemas in RDFS files becomes
   explicit Python objects in memory, which are persisted to the db,
   and are used at runtime to guide the handling of data objects.

* Python schema objects are modifiable at runtime, and the changes are
   not only persisted to the database, but are also exported to new
   RDFS files for future database creation.

* Well known schema objects might have specialized Python code loaded
   by the RDFS importer.

* Everything that happens, and everything that describes content in the
   system, is expressed with actual Python objects at runtime that
   mediate desired effects with object methods.

* Data becomes Python objects.  Schemas become Python objects.  Actions
   like import and export become Python objects.  There's a pattern.
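The pattern in that last bullet might be sketched as follows, with
actions modeled as ordinary objects alongside data and schemas.  The
class names (Action, ImportRDF, ExportRDFS) are invented for
illustration, and a dict stands in for the db.

```python
# Hypothetical sketch: actions like import and export are themselves
# Python objects with methods that mediate the desired effects.

class Action:
    """Base class for import/export actions, themselves ordinary objects."""
    def run(self, db):
        raise NotImplementedError

class ImportRDF(Action):
    def __init__(self, path):
        self.path = path
    def run(self, db):
        # A real implementation would SAX-parse self.path and persist
        # the resulting objects; here we just record that it ran.
        db.setdefault("log", []).append("import " + self.path)

class ExportRDFS(Action):
    def __init__(self, path):
        self.path = path
    def run(self, db):
        db.setdefault("log", []).append("export " + self.path)

db = {}
for action in (ImportRDF("events.rdf"), ExportRDFS("schema.rdfs")):
    action.run(db)
```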

--David McCusker



