Open Source Applications Foundation

[Dev] (db policy) transparent persistence

David McCusker Tue, 26 Nov 2002 12:38:48 -0800


Before I respond to earlier database threads, I should make progress
on some disclosure and interface design fronts.  This is a disclosure
message, and a separate post will discuss import/export interfaces.

This is a note on the plan to use transparent object persistence.  I
think it's a good idea, but more importantly, John Anderson intuits
this is a good choice, and I think it fits Andy Hertzfeld's desire for
high usability criteria in engineering judgment.

Basically, when I was interviewing, I heard this is what folks wanted
and it's technically feasible and should lead to good results.  I can
satisfy this expressed desire, so this is the plan (as opposed to some
strikingly different approach to organizing storage interfaces).

So the policy of choosing transparent Python object persistence at the
high level, where Chandler apps interact with content, is probably not
open to discussion.  I'm going to do mostly what John Anderson and Andy
Hertzfeld want when they express a preference, and in this case I think
they'll be happy with the choice and are unlikely to change their minds.

(In an alternative universe, they might have been especially keen on SQL
queries as the universal way to see persistent content, and then I'd
sign up to see things that way instead.  In that case it would have
been harder for me to intuit that things would be fine in the long run.)

What does transparent object persistence mean?

It means persistent content is mainly the attributes of some
collection of objects, or of subobjects recursively embedded in other
top level persistent objects.  Interacting with this content involves
using normal Python objects.  Database updates merely involve modifying
these objects and then committing the database.
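
As a rough illustration only, here is a minimal sketch of what "modify
objects, then commit" could look like from app code.  ToyDatabase,
Contact, and the "app" key are hypothetical names invented for this
example, and the in-memory snapshot merely stands in for real storage.

```python
import copy


class ToyDatabase:
    """Hypothetical store: a committed snapshot stands in for the disk."""

    def __init__(self):
        self.root = {}        # top level persistent objects, by name
        self._committed = {}  # what a fresh session would see

    def commit(self):
        # Persist everything reachable from the root.
        self._committed = copy.deepcopy(self.root)

    def reopen(self):
        # Simulate a new session: drop memory state, reload the snapshot.
        self.root = copy.deepcopy(self._committed)
        return self.root


class Contact:
    """Ordinary Python class; nothing database-specific in it."""

    def __init__(self, name, email):
        self.name = name
        self.email = email


db = ToyDatabase()
db.root["app"] = {"contacts": [Contact("Ada", "ada@example.org")]}
db.commit()

# An update is just an attribute assignment followed by a commit.
contact = db.root["app"]["contacts"][0]
contact.email = "ada@example.com"
db.commit()

# A change that is never committed does not survive a reopen.
contact.name = "Changed but never committed"
reloaded = db.reopen()["app"]["contacts"][0]
```

After the reopen, the reloaded contact carries the committed email
change but still has its original name.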

There need not be any overt operations on a database per se.  However,
it should also be possible to read and write the database through
alternative means, so it's not necessary for every single change to
actually manifest in memory as a Python object before it can exist.

(Content can appear in a database by other means, but an app developer
cannot prove it did not come from a Python object in memory first.  If
it gets shown to you as a Python object when you read it, how can you
tell it was not originally a Python object when written?  You can't.)

However content gets in the database, it's possible to look at all of
it as the attributes of Python objects that can be accessed by asking
other Python objects for them.  The root of a database should have an
app object, and from this it should be possible to navigate to any
object in the database by using the APIs of objects traversed down
from an app object.  (And we can have other top level objects besides
the app, of course.)

But even though content is accessed as if it were all Python objects,
all in memory all the time, in fact most objects won't be in memory
most of the time when the database is significantly bigger than
what an app actually uses at any given moment.

Objects appear in memory on demand, pulled from some serialized form
in the database into memory as a Python object, in response to calls
to access the objects.  So an app developer can't tell objects were
not previously in memory before the calls.
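
One common way to get this effect in Python is a placeholder object
that loads its state the first time an attribute is touched.  The
sketch below assumes a hypothetical Ghost class and a dict of JSON
strings standing in for the serialized store; it is not a proposal
for the actual Chandler mechanism.

```python
import json

# Hypothetical serialized store: object id -> JSON text.
STORE = {
    "note-1": json.dumps({"title": "Groceries", "body": "milk, eggs"}),
}
LOADS = []  # ids that were actually deserialized, for demonstration


class Ghost:
    """Hypothetical placeholder that fills itself in on first access."""

    def __init__(self, oid):
        self.__dict__["_oid"] = oid
        self.__dict__["_loaded"] = False

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # which is the case for stored attributes before loading.
        state = self.__dict__
        if state["_loaded"]:
            raise AttributeError(name)
        state.update(json.loads(STORE[state["_oid"]]))
        state["_loaded"] = True
        LOADS.append(state["_oid"])
        try:
            return state[name]
        except KeyError:
            raise AttributeError(name) from None


note = Ghost("note-1")
loads_before = list(LOADS)  # nothing has been read from the store yet
title = note.title          # first access triggers the load
body = note.body            # already in memory, no second load
```

The calling code only ever writes note.title, so it has no way to see
whether the object was already in memory.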

(Side note: for performance optimization purposes in some contexts,
app code might want to reduce display latency for users by "pre-
touching" content before it gets displayed, in an attempt to bring
it into memory earlier than the actual deadline, to take advantage
of potential parallelism in the system for reading stored content.
But this is something to ignore in the near future.)

Does this mean the database must be an object database?

No, not really, because the layer that serializes Python objects when
they leave memory (or when they get flushed) can write to an API
that doesn't assume much about how it gets stored.  So the database
can be a relational database, as long as it has some way (maybe not
in the core RDB part) to store attributes never previously described
in the table schemas.
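
To make that concrete, here is a small sketch using the standard
sqlite3 module.  The table names, column names, and the save/load
helpers are assumptions made up for this example: attributes the
schema knows about go in ordinary columns, and anything else lands in
a side table as JSON text.

```python
import json
import sqlite3

KNOWN_COLUMNS = ("name", "email")  # attributes the schema describes

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id TEXT PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE extra_attr (obj_id TEXT, attr TEXT, json_value TEXT,"
    " PRIMARY KEY (obj_id, attr))")


def save(oid, attrs):
    """Write known attributes to columns, the rest to the side table."""
    known = [attrs.get(col) for col in KNOWN_COLUMNS]
    conn.execute(
        "INSERT OR REPLACE INTO contact VALUES (?, ?, ?)", [oid, *known])
    conn.execute("DELETE FROM extra_attr WHERE obj_id = ?", (oid,))
    for attr, value in attrs.items():
        if attr not in KNOWN_COLUMNS:
            conn.execute(
                "INSERT INTO extra_attr VALUES (?, ?, ?)",
                (oid, attr, json.dumps(value)))
    conn.commit()


def load(oid):
    """Rebuild the full attribute dict from both tables."""
    row = conn.execute(
        "SELECT name, email FROM contact WHERE id = ?", (oid,)).fetchone()
    if row is None:
        raise KeyError(oid)
    attrs = dict(zip(KNOWN_COLUMNS, row))
    extras = conn.execute(
        "SELECT attr, json_value FROM extra_attr WHERE obj_id = ?", (oid,))
    for attr, value in extras:
        attrs[attr] = json.loads(value)
    return attrs


original = {"name": "Ada", "email": "ada@example.com",
            "nickname": "Countess", "tags": ["math", "engines"]}
save("c1", original)
restored = load("c1")
```

Here "nickname" and "tags" were never declared in the contact table,
yet they come back with the object.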

How are searches expressed?

You can hide the way a database searches for content by asking a
Python object in memory to create a new Python object that represents
the results of a search.  Then asking this result object for objects
it contains will expose search results as Python objects in memory.
(Sorry for repeating the word "object" so many times.)

Abstract Chandler database API layers must partly be specified as the
APIs of Python objects that answer queries like this, so folks who
write database plugins can provide implementations of these Python
objects that put the right face on however a database actually does
things under the covers.
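
A minimal sketch of that shape, with hypothetical names (Folder,
ResultSet, Item) that are not part of any real Chandler API: the app
asks one object to search, gets back a result object, and only sees
ordinary Python objects when it iterates.

```python
class Item:
    """Plain content object used for the demonstration."""

    def __init__(self, kind, title):
        self.kind = kind
        self.title = title


class ResultSet:
    """Hypothetical result object; a plugin would supply its own."""

    def __init__(self, candidates, predicate):
        self._candidates = candidates
        self._predicate = predicate

    def __iter__(self):
        # Matching happens only when the caller asks for objects.
        return (obj for obj in self._candidates if self._predicate(obj))


class Folder:
    """Hypothetical in-memory object that knows how to start a search."""

    def __init__(self, items):
        self.items = list(items)

    def search(self, **criteria):
        def matches(obj):
            return all(getattr(obj, key, None) == value
                       for key, value in criteria.items())
        return ResultSet(self.items, matches)


folder = Folder([
    Item("note", "Groceries"),
    Item("event", "Lunch"),
    Item("note", "Ideas"),
])
results = folder.search(kind="note")
titles = [item.title for item in results]
```

A different backend could return a ResultSet that runs a SQL query or
an index lookup instead, and the loop over results would not change.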

Is there a pattern for making this kind of thing work?

Yes, a lot of this style of database plugin system can be implemented
easily if the interfaces involved use a "factory" pattern.  Let's
assume you've never heard of that before.  What's a factory?

A factory is an object which creates or gives access to other objects.
Instead of creating objects out of the blue, or assuming you know
where to go look for them, you instead go to a factory object and ask
it for what you want.  It gives you objects you request, but you
don't know how the factory does memory management, or where it gets
the objects that satisfy factory requests.

So a database plugin will emphasize a factory based interface.  The
root of a database plugin might be an object that provides access to
the factory objects which answer questions about the database.  For
example, to perform a search (which generates Python objects that
satisfy a search) you can go to a factory object and ask for a suitable
search factory, and then ask this factory your query, and it will
return something that actually generates the result objects.
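
The chain described above might look roughly like this.  Everything
here (MemoryPlugin, MemorySearchFactory, the factory and query methods)
is a hypothetical sketch of the pattern, not a committed interface.

```python
from types import SimpleNamespace


class MemorySearchFactory:
    """Hypothetical search factory backed by a list of dicts."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, **criteria):
        # Return a generator; result objects are built as they are pulled.
        def results():
            for row in self._rows:
                if all(row.get(k) == v for k, v in criteria.items()):
                    yield SimpleNamespace(**row)
        return results()


class MemoryPlugin:
    """Hypothetical plugin root: hands out factories by kind."""

    def __init__(self, rows):
        self._factories = {"search": MemorySearchFactory(rows)}

    def factory(self, kind):
        return self._factories[kind]


plugin = MemoryPlugin([
    {"kind": "note", "title": "Groceries"},
    {"kind": "event", "title": "Lunch"},
    {"kind": "note", "title": "Ideas"},
])

# The caller goes root -> search factory -> query -> result objects,
# without knowing where the objects come from or how they are managed.
search = plugin.factory("search")
titles = [obj.title for obj in search.query(kind="note")]
```

Another plugin only has to offer the same factory and query methods;
app code written against them would not need to change.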

Sorry if this sounds tedious.  It's something easy to implement by
turning a crank.  All the artistry is in trying to make the interface
elegant and clear.  It doesn't represent a technical engine problem.

I'll stop this note here before I veer too far from the original intent
of explaining the transparent object policy generally.

--David McCusker