[Dev] (dataflow image) RDF vs Python objects
David McCusker
Wed, 27 Nov 2002 00:51:21 -0800
(I was unable to post to my weblog from the office earlier today, presumably because of firewall problems, but I have no details.)

This note describes an image that illustrates the notions I mentioned in http://lists.osafoundation.org/pipermail/dev/2002-November/000331.html. I have resisted enclosing the image itself in this email, even though it's only a 40K jpeg file, because it might be too big for some folks. So I put a copy on my personal weblog (along with a copy of this note), and that's where you can see the image until another copy appears on the right OSAF web page later. See the following weblog entry:

http://www.treedragon.com/ged/map/ti/newNov02.htm#26nov02-rdf-vs-python

Katie Capps Parlante also wrote a strawman design document today with another diagram emphasizing interobject relationships in a UML style presentation. It complements the image I discuss below, without much overlap. (Some of my Python file name references are pure handwaving, however, and Katie supplies realistic names.)

The ASCII diagram below is a loose recreation of the original image, but with almost all the arrows and their labels missing, so this text diagram really won't make any sense. But at least I have given body to the noun-objects involved in the process I describe.

[ASCII diagram; its layout did not survive in this copy. Labeled elements: Python files data.py, event.py, schema.py, rdfio.py, and rdfsio.py; in-memory DATA and SCHEMA graphs; .RDF and .RDFS files; a db; 'import' and 'export' arcs; 'load' and 'store' arcs.]

When using the system as a persistent storage model, most traffic moves back and forth over 'load' and 'store' arcs between the data and db, and between schema and db. (Note the word "repository" is very nearly interchangeable with the word "database" in most of these discussions.)

The Python objects corresponding to the in-memory graphs are defined in files data.py, event.py, and schema.py, but we won't have static Python code for every object, because some of them will be dynamically created at runtime in response to RDF and RDFS descriptions. For example, the schema.py file might not exist, since all the objects describing each schema might be generated dynamically from RDF descriptions.

Python doesn't have strongly typed interface files for objects, so we hope to use RDFS schema files to define the expected data formats of objects in memory, and the expected way of exporting these objects when they are serialized as RDF/XML. When a database is first created, the appropriate RDFS schema file is parsed to populate the persistent store with a metadata description of object formats and the way they should be serialized.
The code used to turn RDFS into a suitable in-memory schema data structure is defined by some interface for doing this, and here I show a file named rdfsio.py providing the implementation as a plausible dummy. Subsequent sessions need not reparse the RDFS because the content is already persistent in the database. However, we'll want to support dynamic redefinition of schemas, and this might involve reparsing a new RDFS file, and somehow upgrading the existing db metadata.

Import from RDF involves instantiating another Python based service, which knows how to turn a SAX stream parse of RDF into suitable Python objects in memory, which are persisted to the database. The importer code (here illustrated by a dummy file named rdfio.py) will look up the Python implementations of objects mentioned by name in the RDF, and this will dynamically load implementations on demand when they exist.

For example, we might have a CalendarEvent class which subclasses RDFObject, which in turn subclasses a Persistent base class. (We plan to have a thin class layer above ZODB, so we are not strongly coupled to ZODB, so we'll have our own Persistent base class for objects that persist themselves transparently.) In this case, when CalendarEvent appears in the RDF, we'll load file CalendarEvent.py when the class is not already defined.

In contrast, the RDF might also mention a MusicItem type, which does not have an actual Python implementation predefined. So the importer would see no source code defining MusicItem, and would instead instantiate a base RDFObject, and populate this with suitable content from the RDF specification, perhaps informed by the schema from RDFS and accessed as a Python data structure in memory. (We could dynamically generate a MusicItem.py file at runtime, because Python allows this sort of thing. But this seems potentially exotic and confusing for some developers, and it's not necessary.
So it might be simpler to say not all RDFObject instances have static Python code.)

The main purpose of the illustrating image is to suggest the following propositions, which are expected to be true when RDF and persistence cooperate:

* Conversion to and from RDF invokes Python objects for this purpose, and they might be plugins. The process is not magic -- there will be source code that says how it happens.

* Import will typically attempt to load Python modules by name when they can be found, so creation of suitable Python objects is handled by late dynamic binding based on names found in RDF text files.

* The metainformation expressed by schemas in RDFS files becomes explicit Python objects in memory, which are persisted to the db, and are used at runtime to guide the handling of data objects.

* Python schema objects are modifiable at runtime, and the changes are not only persisted to the database, but are also exported to new RDFS files for future database creation.

* Well known schema objects might have specialized Python code loaded by the RDFS importer.

* Everything that happens, and everything that describes content in the system, is expressed with actual Python objects at runtime that mediate desired effects with object methods.

* Data becomes Python objects. Schemas become Python objects. Actions like import and export become Python objects. There's a pattern.

--David McCusker
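
The late binding described in this note (load a class by name when a module for it exists, otherwise fall back to a base RDFObject) might look roughly like the sketch below. It is only an illustration under assumptions, not actual OSAF code: the helper names find_class and instantiate are hypothetical, the Persistent and RDFObject bodies are empty stand-ins, and the demo fakes a CalendarEvent module in memory instead of reading CalendarEvent.py from disk. It uses importlib from current Python, which did not exist when this note was written.

```python
import importlib
import sys
import types


class Persistent:
    """Stand-in for the thin persistence base class (assumption: no real storage)."""


class RDFObject(Persistent):
    """Generic holder for RDF properties, used when no specific class is found."""

    def __init__(self, type_name, properties):
        self.type_name = type_name
        self.properties = dict(properties)


def find_class(type_name):
    """Hypothetical helper: look for a class named type_name in a module of the same name."""
    try:
        module = importlib.import_module(type_name)
    except ImportError:
        return None  # no module with that name can be imported
    cls = getattr(module, type_name, None)
    # Only accept real classes that take part in the RDFObject hierarchy.
    if isinstance(cls, type) and issubclass(cls, RDFObject):
        return cls
    return None


def instantiate(type_name, properties):
    """Hypothetical helper: build the most specific object available for an RDF type name."""
    cls = find_class(type_name) or RDFObject
    return cls(type_name, properties)


# Demo setup: pretend CalendarEvent.py exists by registering a stand-in module.
class CalendarEvent(RDFObject):
    pass


fake_module = types.ModuleType("CalendarEvent")
fake_module.CalendarEvent = CalendarEvent
sys.modules["CalendarEvent"] = fake_module

event = instantiate("CalendarEvent", {"title": "Design review"})
song = instantiate("MusicItem", {"artist": "unknown"})

print(type(event).__name__)  # CalendarEvent
print(type(song).__name__)   # RDFObject
```

In this sketch the MusicItem data is still kept, in a plain RDFObject, so nothing from the RDF is lost just because no MusicItem.py exists.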