[Dev] (db policy) transparent persistence
David McCusker
Tue, 26 Nov 2002 12:38:48 -0800
Before I respond to earlier database threads, I should make progress on some disclosure and interface design fronts. This is a disclosure message, and a separate post will discuss import/export interfaces.

This is a note on the plan to use transparent object persistence. I think it's a good idea, but more importantly, John Anderson intuits this is a good choice, and I think it fits Andy Hertzfeld's desire for high usability criteria in engineering judgment. Basically, when I was interviewing, I heard this is what folks wanted, and it's technically feasible and should lead to good results. I can satisfy this expressed desire, so this is the plan (as opposed to some strikingly different approach to organizing storage interfaces).

So the policy of choosing transparent Python object persistence at the high level, where Chandler apps interact with content, is probably not open to discussion. I'm going to do mostly what John Anderson and Andy Hertzfeld want when they express a preference, and in this case I think they'll be happy with the choice and are unlikely to change their minds. (In an alternative universe, they might have been especially keen on SQL queries as the universal way to see persistent content, and then I'd sign up to see things that way instead. With that one, it would have been harder for me to intuit that things would be fine in the long run.)

What does transparent object persistence mean?

It means persistent content is mainly the attributes of some collection of objects, or of subobjects recursively embedded in other top level persistent objects. Interacting with this content involves using normal Python objects. Database updates merely involve modifying these objects and then committing the database. There need not be any overt operations on a database per se.
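To make "modify the objects, then commit" concrete, here is a minimal sketch. It is not Chandler's API: it assumes Python's standard `shelve` module as a stand-in store, and the `app` root, the `contacts` attribute, and the sample values are all hypothetical names chosen for illustration.

```python
import os
import shelve
import tempfile
from types import SimpleNamespace

# Hypothetical store location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "store")

# writeback=True makes shelve keep the objects it hands out in a cache,
# so in-place changes to them are written back on sync().
db = shelve.open(path, writeback=True)

# SimpleNamespace stands in for ordinary application objects with attributes.
ada = SimpleNamespace(name="Ada", email="ada@example.org")
db["app"] = SimpleNamespace(contacts=[ada])
db.sync()  # "commit"

# An update is plain attribute assignment followed by a commit;
# there is no explicit write or UPDATE call against the store.
db["app"].contacts[0].email = "ada@example.com"
db.sync()
db.close()

# Reopen the store to show the change was persisted.
db = shelve.open(path)
stored_email = db["app"].contacts[0].email
db.close()
```

The only database-shaped calls here are opening the store and committing; everything in between is ordinary Python attribute access, which is the property described above.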
However, it should also be possible to read and write the database through alternative means, so it's not necessary for every single change to actually manifest in memory as a Python object before it can exist. (Content can appear in a database by other means, but an app developer cannot prove it did not come from a Python object in memory first. If it gets shown to you as a Python object when you read it, how can you tell it was not originally a Python object when written? You can't.)

However content gets into the database, it's possible to look at all of it as the attributes of Python objects that can be accessed by asking other Python objects for them. The root of a database should have an app object, and from this it should be possible to navigate to any object in the database by using the APIs of objects traversed down from the app object. (And we can have other top level objects besides the app, of course.)

But even though content is accessed as if it were all Python objects, all in memory all the time, in fact most objects won't be in memory most of the time when the database is significantly bigger than what an app actually uses at any given moment. Objects appear in memory on demand, pulled from some serialized form in the database into memory as Python objects, in response to calls to access them. So an app developer can't tell that objects were not already in memory before the calls.

(Side note: for performance optimization purposes in some contexts, app code might want to reduce display latency for users by "pre-touching" content before it gets displayed, in an attempt to bring it into memory earlier than the actual deadline, to take advantage of potential parallelism in the system for reading stored content. But this is something to ignore in the near future.)

Does this mean the database must be an object database?
No, not really, because the layer that serializes Python objects when they leave memory (or when they get flushed) can write to an API that doesn't assume much about how content gets stored. So the database can be a relational database, as long as it has some way (maybe not in the core RDB part) to store attributes never previously described in the table schemas.

How are searches expressed?

You can hide the way a database searches for content by asking a Python object in memory to create a new Python object that represents the results of a search. Then asking this result object for the objects it contains will expose search results as Python objects in memory. (Sorry for repeating the word "object" so many times.) Abstract Chandler database API layers must partly be specified as the APIs of Python objects that answer queries like this, so folks who write database plugins can provide implementations of these Python objects that put the right face on however a database actually does things under the covers.

Is there a pattern for making this kind of thing work?

Yes, a lot of this style of database plugin system can be implemented easily if the interfaces involved use a "factory" pattern. Let's assume you've never heard of that before.

What's a factory?

A factory is an object which creates or gives access to other objects. Instead of creating objects out of the blue, or assuming you know where to go look for them, you instead go to a factory object and ask it for what you want. It gives you the objects you request, but you don't know how the factory does memory management, or where it gets the objects that satisfy factory requests. So a database plugin will emphasize a factory based interface. The root of a database plugin might be an object that provides access to the factory objects which answer questions about the database.
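As a rough illustration of a factory based plugin interface, the following sketch shows a plugin root that hands out a search factory, which in turn returns a result object that produces Python objects on request. Every name here (`DatabasePlugin`, `SearchFactory`, `SearchResult`, `DictPlugin`, and the sample rows) is hypothetical; the post does not specify an actual interface, and a plain dictionary stands in for whatever store a real plugin would wrap.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class SearchResult:
    """Represents a search; materializes matching objects only when iterated."""

    def __init__(self, loader, keys):
        self._loader = loader  # callable that brings one stored item into memory
        self._keys = keys

    def __iter__(self):
        for key in self._keys:
            yield self._loader(key)


class SearchFactory(ABC):
    @abstractmethod
    def search(self, attr, value):
        """Return a SearchResult for items whose attribute equals value."""


class DatabasePlugin(ABC):
    @abstractmethod
    def search_factory(self):
        """Return the factory that answers search requests."""


class DictSearchFactory(SearchFactory):
    """Toy implementation over a dict; a real plugin could wrap any store."""

    def __init__(self, rows):
        self._rows = rows

    def search(self, attr, value):
        keys = [k for k, row in self._rows.items() if row.get(attr) == value]
        # Callers receive ordinary objects and never see how rows are stored.
        return SearchResult(lambda k: SimpleNamespace(**self._rows[k]), keys)


class DictPlugin(DatabasePlugin):
    def __init__(self, rows):
        self._rows = rows

    def search_factory(self):
        return DictSearchFactory(self._rows)


plugin = DictPlugin({
    1: {"kind": "note", "title": "groceries"},
    2: {"kind": "event", "title": "standup"},
    3: {"kind": "note", "title": "ideas"},
})
result = plugin.search_factory().search("kind", "note")
titles = [obj.title for obj in result]
```

Application code depends only on the abstract classes, so swapping `DictPlugin` for a plugin backed by a relational or object database would not change the calling code.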
For example, to perform a search (which generates Python objects that satisfy the search), you can go to a factory object and ask for a suitable search factory, then give this factory your query, and it will return something that actually generates the result objects.

Sorry if this sounds tedious. It's something easy to implement by turning a crank. All the artistry is in trying to make the interface elegant and clear. It doesn't represent a technical engine problem.

I'll stop this note here before I veer too far from the original intent of explaining the transparent object policy generally.

--David McCusker