Open Source Applications Foundation

[Dev] (db policy) transparent persistence

Michael McLay Wed, 27 Nov 2002 01:39:01 -0500


On Tuesday 26 November 2002 03:38 pm, David McCusker wrote:
> Before I respond to earlier database threads, I should make progress
> on some disclosure and interface design fronts.  This is a disclosure
> message, and a separate post will discuss import/export interfaces.
>
> This is a note on the plan to use transparent object persistence.  I
> think it's a good idea, but more importantly, John Anderson intuits
> this is a good choice, and I think it fits Andy Hertzfeld's desire for
> high usability criteria in engineering judgment.

I agree with the choice of transparent object persistence, but you also need
to blend in a couple of additional features to facilitate collaboration between
multiple databases owned by multiple users. For instance, suppose I'm scheduling
a meeting with 10 people in 4 different organizations. One organization is off
the net at the moment, so several people cannot be scheduled while their
system is down. I tentatively schedule the other attendees and have a
transaction pending for the remainder. When their system comes back online I
find that one of the remaining people's calendars is full that day, so I need
to roll back the other scheduling transactions and select an alternative date
that meets everyone's schedule. The process is complicated because one person
I'm inviting to the meeting is a higher-up in my organization. I don't have
authorization to reserve time on their schedule, so scheduling the meeting
requires first getting permission from this person to be on their schedule.
While I'm waiting for approval for the time slot I have tentatively reserved
time on the schedules of the other 9 attendees.

Even this simple scenario will also need a sophisticated access
control mechanism [0], with features such as the ability to override the
roles for some activities for specific individuals. We also need
transaction control so that transactions that do not successfully run to
completion on all remote systems can be easily rolled back. This rollback
capability has the pleasant side effect of providing a handy undo feature for
the application. (Have you tried using the undo tabs through the Zope
management interface? They've done a nice job of making the prior
transactions atomic.)
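
To make the rollback idea concrete, here is a rough sketch of "reserve
everywhere or nowhere". The Calendar class and the schedule() function are
made-up names for illustration only; they are not Zope or Chandler APIs, and
a real version would talk to remote servers rather than local objects.

```python
class Calendar:
    """Hypothetical stand-in for one attendee's remote calendar."""

    def __init__(self, owner, busy=()):
        self.owner = owner
        self.busy = set(busy)      # slots with confirmed entries
        self.tentative = set()     # slots held by a pending transaction

    def reserve(self, slot):
        """Tentatively hold a slot; fail if it is already taken."""
        if slot in self.busy or slot in self.tentative:
            raise ValueError(f"{self.owner} is not free at {slot}")
        self.tentative.add(slot)

    def commit(self, slot):
        """Turn a tentative hold into a confirmed entry."""
        self.tentative.discard(slot)
        self.busy.add(slot)

    def rollback(self, slot):
        """Drop a tentative hold without confirming it."""
        self.tentative.discard(slot)


def schedule(calendars, slot):
    """Reserve the slot on every calendar, or on none of them."""
    held = []
    try:
        for cal in calendars:
            cal.reserve(slot)
            held.append(cal)
    except ValueError:
        # One attendee could not be scheduled: undo every tentative hold.
        for cal in held:
            cal.rollback(slot)
        return False
    for cal in held:
        cal.commit(slot)
    return True
```

Calling schedule() with a slot that one attendee cannot make leaves every
other calendar exactly as it was, which is the same property that makes an
undo feature cheap to offer.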

A tricky part of this architecture will be the server-to-server interface for
collaborative scheduling. This is going to be a peer-to-peer protocol in
which the person calling the meeting is connecting with N external servers
to find a common time for a meeting with the N external persons. The
algorithm for finding the time slot will probably need rules based on the
importance of attendance of the individuals. Some people must be there or it
is a show stopper. Others may be invited out of courtesy and can be
scheduled in spite of preexisting conflicts.
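
One possible shape for such a rule, as a sketch only: treat a conflict for a
required attendee as disqualifying, and among the remaining slots prefer the
one that the fewest courtesy invitees would miss. The function name and the
"busy" mapping are assumptions of mine, not part of any existing protocol.

```python
def pick_slot(candidate_slots, busy, required, optional):
    """Return the best slot, or None if no slot suits every required attendee.

    busy maps an attendee name to the set of slots already taken.
    """
    best, best_missing = None, None
    for slot in candidate_slots:
        # A conflict for any required attendee is a show stopper.
        if any(slot in busy.get(person, set()) for person in required):
            continue
        # Courtesy invitees may conflict; prefer the slot where fewest do.
        missing = sum(slot in busy.get(person, set()) for person in optional)
        if best is None or missing < best_missing:
            best, best_missing = slot, missing
    return best
```

In the peer-to-peer case the "busy" sets would be gathered from the N external
servers, subject to whatever each server's access control allows it to reveal.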

The current Zope server does not support peer-to-peer transactions. The ZSync
"product" provides for a server-to-server synchronization. ZServer relies on 
XMLRPC calls between the servers, but this is not as complex a problem as 
managing access control lists across systems, conducting negotiations between 
servers, and then managing transaction rollback on remote systems when an 
activity is canceled.

Using XMLRPC for ZSync may have been a bit of a stretch. Building the
infrastructure for schedule synchronization will be facilitated by using a
peer-to-peer framework. The BEEP [1] framework would be an obvious choice.
The RoadRunner C library is maturing to the point where the framework is
becoming usable, and there are Python bindings built on top of RoadRunner.
There is also Beepy, which is a pure Python implementation of BEEP. The
initial work on interoperability testing is just getting underway. On top of
BEEP there are a couple of other layers of software that may be of assistance
as well. The "Application Exchange Core" (APEX) message relaying service [2]
provides a core architecture for communications between applications. There
are access control services [3] and publish and subscribe services [4] built
on top of APEX. There is also an instant messaging protocol built on top of
APEX [5].

The BEEP developers have also mapped iCAL onto the BEEP framework with the
introduction of CAP [6]. Rich Salz has written a good introductory article [7]
on BEEP. Ideally I'd like to see Mozilla integrate BEEP for HTTP [8] into
the client and Apache integrate it into the server. Other technology will
follow if Mozilla and Apache take the lead. It's time to move beyond the TCP,
single-threaded straitjacket of HTTP.

> What does transparent object persistence mean?
>
> It means persistent content is mainly the attributes of some
> collection of objects, or of subobjects recursively embedded in other
> top level persistent objects.  Interacting with this content involves
> using normal Python objects.  Database updates merely involve modifying
> these objects and then committing the database.
>
> There need not be any overt operations on a database per se.  However,
> it should also be possible to read and write the database through
> alternative means, so it's not necessary for every single change to
> actually manifest in memory as a Python object before it can exist.
>
> (Content can appear in a database by other means, but an app developer
> cannot prove it did not come from a Python object in memory first.  If
> it gets shown to you as a Python object when you read it, how can you
> tell it was not originally a Python object when written?  You can't.)
>
> However content gets in the database, it's possible to look at all of
> it as the attributes of Python objects that can be accessed by asking
> other Python objects for them.  The root of a database should have an
> app object, and from this it should be possible to navigate to any
> object in the database by using the APIs of objects traversed down
> from an app object.  (And we can have other top level objects besides
> the app, of course.)
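
As a small illustration of "modify the objects, then commit", here is a sketch
that uses the standard library's shelve module as a stand-in. It is not ZODB
and not the Chandler design; the "app" key, the demo() function, and the
contact layout are all invented for the example.

```python
import os
import shelve
import tempfile
from types import SimpleNamespace


def demo(path):
    """Store an app object, change it in place, and read it back."""
    # writeback=True caches loaded objects so in-place changes get saved.
    with shelve.open(path, writeback=True) as db:
        contact = SimpleNamespace(name="Ann", phones=[])
        db["app"] = SimpleNamespace(contacts=[contact])
        db.sync()                      # first "commit"

        # Update by touching ordinary attributes, then commit again.
        db["app"].contacts[0].phones.append("555-0100")
        db.sync()

    # Reopen and navigate down from the app object to reach the content.
    with shelve.open(path) as db:
        return db["app"].contacts[0].phones
```

Running demo(os.path.join(tempfile.mkdtemp(), "demo")) returns the phone list
that was appended to in memory, with no explicit write call for that change.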

The Zope server can be accessed through HTTP, FTP, and XMLRPC, and if the
server has an embedded SQL database adaptor the SQL database can be updated
using the usual SQL network connection to the database. The control loop for
accessing Zope is built on top of the Python asyncore module. I hope this list
of capabilities is expanded to include a BEEP interface to Zope in the near
future. And then there are those crazy Twisted guys :-) Barry Warsaw was
impressed by Twisted. I wouldn't be surprised if Twisted were to be placed at
the bottom layer of Zope someday.

[...]

> Does this mean the database must be an object database?
>
> No, not really, because the layer that serializes Python objects when
> they leave memory (or when they get flushed) can write to an API
> that doesn't assume much about how it gets stored.  So the database
> can be a relational database, as long as it has some way (maybe not
> in the core RDB part) which will store attributes never previously
> described in the table schemas.

This is how the Zope adaptors to databases work. There are some interesting
issues raised in this architecture. Hiding an SQL database this way also
enables "smart queries" to be written. (need to add a reference
to "SQL with brains" here.)
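
To illustrate the point in the quoted text about storing attributes the table
schemas never described, here is a minimal sqlite3 sketch. The table names
(item, extra) and the helper functions are hypothetical, and this is not how
the Zope adaptors are implemented; it only shows one way the "not in the core
RDB part" side table could look. Values are stored as text for simplicity.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a fixed table and a side table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT)")
    # Attributes the schema never anticipated go in as name/value rows.
    conn.execute(
        "CREATE TABLE extra (item_id INTEGER, name TEXT, value TEXT, "
        "PRIMARY KEY (item_id, name))"
    )
    return conn


def save(conn, kind, attrs):
    """Write one object: a row in item plus one extra row per attribute."""
    cur = conn.execute("INSERT INTO item (kind) VALUES (?)", (kind,))
    item_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO extra (item_id, name, value) VALUES (?, ?, ?)",
        [(item_id, name, str(value)) for name, value in attrs.items()],
    )
    conn.commit()
    return item_id


def load(conn, item_id):
    """Read the attributes back as a plain dict of strings."""
    rows = conn.execute(
        "SELECT name, value FROM extra WHERE item_id = ?", (item_id,)
    )
    return dict(rows.fetchall())
```

A serialization layer could call save() when an object leaves memory and
load() when it is faulted back in, without the caller seeing any SQL.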

> How are searches expressed?
>
> You can hide the way a database searches for content by asking a
> Python object in memory to create a new Python object that represents
> the results of a search.  Then asking this result object for objects
> it contains will expose search results as Python objects in memory.
> (Sorry for repeating the word "object" so many times.)
>
> Abstract Chandler database API layers must partly be specified as the
> APIs of Python objects that answer queries like this, so folks who
> write database plugins can provide implementations of these Python
> objects that put the right face on however a database actually does
> things under the covers.
>
> Is there a pattern for making this kind of thing work?
>
> Yes, a lot of this style of database plugin system can be implemented
> easily if the interfaces involved use a "factory" pattern.  Let's
> assume you've never heard of that before.  What's a factory?
>
> A factory is an object which creates or gives access to other objects.
> Instead of creating objects out of the blue, or assuming you know
> where to go look for them, you instead go to a factory object and ask
> it for what you want.  It gives you objects you request, but you
> don't know how the factory does memory management, or where it gets
> the objects that satisfy factory requests.
>
> So a database plugin will emphasize a factory based interface. The
> root of a database plugin might be an object that provides access to
> the factory objects which answer questions about the database.  For
> example, to perform a search (which generates Python objects that
> satisfy a search) you can go to a factory object and ask for a suitable
> search factory, and then ask this factory your query, and it will
> return something that actually generates the result objects.
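
A rough sketch of the factory arrangement described above, with every class
and method name invented for illustration (none of this is a Chandler or Zope
interface): the plugin root hands out factories by name, the search factory
answers a query, and the result object yields matches as ordinary Python
objects.

```python
class SearchResult:
    """Hands back result objects; callers never see the backend."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)


class SearchFactory:
    """Answers queries; here the 'database' is just a list of dicts."""

    def __init__(self, items):
        self._items = items

    def query(self, **criteria):
        matches = [
            item for item in self._items
            if all(item.get(key) == value for key, value in criteria.items())
        ]
        return SearchResult(matches)


class DatabasePlugin:
    """Root object of a plugin: gives access to factories by name."""

    def __init__(self, items):
        self._factories = {"search": SearchFactory(items)}

    def factory(self, name):
        return self._factories[name]
```

A different plugin could keep the same three interfaces and run the query
against a relational or object database instead of a list.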

The words are slightly different, but the plugin factory based interface you
are describing is found throughout the Zope Wikis. In a tutorial on
creating "Products" Hathaway states:

   One of the defining characteristics of Zope products are that they can be
   added to Zope Folders. To allow your product to be created in this way you
   need to provide a creation form and a factory method. Factory methods are
   methods whose purpose is to create an instance of a class and place it in
   the ZODB.

A "Product" is a plugin that can be added at different places in the Zope
hierarchy, so one plugin might be visible to a calendar but not to a
contact list. The factory methods work as you described, creating content
and placing it in the database. The discovery process for factories is
being refined in Zope3. For Zope2 the process was through acquisition, an
interesting idea, but one that is implicit rather than explicit. "import
this" warns against the implicit, and Zope3 is backing away from acquisition.

Here [8] is one example which talks about how to adapt content for new views
of the content. The "Example" section about halfway down the page discusses
the issues encountered when storing the contact data for a contact database
in a relational database while still providing a transparent means of
accessing this content through the Zope object database interface. This
specific example should be of immediate interest to Chandler.

The Zope database adaptors have been heavily field tested. They provide great
flexibility for gluing Zope to existing databases within an organization.
Potential users will want to integrate Chandler with existing databases so
you will eventually need to provide this same glue layer for Chandler. For
instance, the access control mechanism might map directly to an LDAP server
within a company for user authentication.

> Sorry if this sounds tedious.  It's something easy to implement by
> turning a crank.  All the artistry is in trying to make the interface
> elegant and clear.  It doesn't represent a technical engine problem.
>
> I'll stop this note here before I veer too far from the original intent
> of explaining the transparent object policy generally.

Please let me know if my running commentary about the parallels of Zope and
Chandler is more annoying than helpful. I see a strong pattern in the
requirements and the bits and pieces of this pattern may not be immediately
obvious to you if you haven't been watching the evolution of Python and
Zope. Hopefully the references will be helpful in filling in the gaps in
your view of this pattern.

[0] http://www.zope.org//Wikis/DevSite/Projects/ComponentArchitecture/SecurityFramework
[1] http://www.beepcore.org/beepcore/specsdocs.jsp and
    http://www.beepcore.org/beepcore/docs/rfc3080.jsp
[2] http://www.beepcore.org/beepcore/docs/apex-core.jsp
[3] http://www.beepcore.org/beepcore/docs/rfc3341.jsp
[4] http://www.beepcore.org/beepcore/docs/apex-pubsub.html
[5] http://www.beepcore.org/beepcore/project.jsp?projectid=11
[6] http://www.ietf.org/internet-drafts/draft-ietf-calsch-cap-09.txt
[7] http://www.xml.com/pub/a/2002/10/16/ends.html
[8] http://www.zope.org//Wikis/DevSite/Projects/ComponentArchitecture/AdaptContentForViews