[Chandler-dev] [Sum] The Great Architecture Discussion of 2007

Phillip J. Eby pje at telecommunity.com
Tue Oct 9 18:58:09 PDT 2007


At 05:27 PM 10/9/2007 -0700, Andi Vajda wrote:
>Not throw out. Migrate to a new schema. Just like in a relational database.
>If you change the low-level layout (format), core schema, or app 
>schema (table layout) someone needs to migrate the data. It might be 
>apparently easier in a relational schema but not so once you've 
>carefully optimized it and duplicated stuff left and right to get 
>the desired performance. Essentially, it becomes harder once the 1-1 
>correspondence between programmer's view (kind/class) and SQL table is broken.

Have a look at Hibernate, which is used by Cosmo: it uses an XML file 
that specifies the mapping between objects and database.  The 
contents of this file are never known to the application, which 
simply uses its own object model.
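
To make the separation concrete, here is a minimal Python sketch in which the mapping lives in a plain dictionary standing in for Hibernate's XML file. It is not Hibernate and not Chandler code. The `Note` class, the `MAPPING` layout, and the table and column names are all hypothetical. The point is that `Note` never mentions tables, and the generic `save`/`load` functions read only the mapping.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Note:
    """Plain domain class: it knows nothing about tables or columns."""
    uuid: str
    title: str
    body: str

# Hypothetical mapping data, playing the role of Hibernate's XML file.
# It could just as well be loaded from a separate file at startup.
MAPPING = {
    Note: {
        "table": "notes",
        "key": "uuid",
        "columns": {"uuid": "id", "title": "display_name", "body": "content"},
    }
}

def save(conn, obj):
    """Store obj using only the mapping, never the class internals."""
    m = MAPPING[type(obj)]
    cols = m["columns"]
    names = ", ".join(cols.values())
    marks = ", ".join("?" for _ in cols)
    values = [getattr(obj, attr) for attr in cols]
    conn.execute(
        f"INSERT OR REPLACE INTO {m['table']} ({names}) VALUES ({marks})", values
    )

def load(conn, cls, key):
    """Rebuild a domain object from its mapped row, or return None."""
    m = MAPPING[cls]
    cols = m["columns"]
    row = conn.execute(
        f"SELECT {', '.join(cols.values())} FROM {m['table']} "
        f"WHERE {cols[m['key']]} = ?",
        (key,),
    ).fetchone()
    return None if row is None else cls(**dict(zip(cols, row)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, display_name TEXT, content TEXT)")
save(conn, Note("n1", "Groceries", "milk, eggs"))
print(load(conn, Note, "n1"))  # Note(uuid='n1', title='Groceries', body='milk, eggs')
```

Renaming a column here means editing `MAPPING` and the table, not `Note`.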

Hibernate maps object retrieval and queries to SQL, and applications 
either use the collections defined by the mapping or use "HQL",
which is an SQL-like query language that queries in terms of the 
*object* schema, rather than the relational one.  And it takes care 
of all the non-1-1-ness in the mapping.
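
As a rough illustration of querying in terms of the object schema, the toy below compiles attribute-equality criteria into SQL for a class whose attributes are split across two tables, so the caller never sees the join. It is far simpler than HQL (equality only, one join style), and `NOTE_MAPPING`, `compile_query`, and `find` are hypothetical names.

```python
import sqlite3

# Hypothetical mapping for one class split across two tables: the kind of
# non-1-1 layout a mapper hides from the application.
NOTE_MAPPING = {
    "base": "items",
    "key": "id",
    "joins": {"note_bodies": "item_id"},  # extra table -> its foreign-key column
    "attrs": {                            # object attribute -> (table, column)
        "uuid": ("items", "id"),
        "title": ("items", "title"),
        "body": ("note_bodies", "text"),
    },
}

def compile_query(mapping, **criteria):
    """Turn an object-level equality query into SQL plus parameters."""
    base, key = mapping["base"], mapping["key"]
    select = ", ".join(f"{t}.{c}" for t, c in mapping["attrs"].values())
    joins = " ".join(
        f"JOIN {t} ON {t}.{fk} = {base}.{key}" for t, fk in mapping["joins"].items()
    )
    where, params = [], []
    for attr, value in criteria.items():
        table, column = mapping["attrs"][attr]  # KeyError for unknown attributes
        where.append(f"{table}.{column} = ?")
        params.append(value)
    sql = f"SELECT {select} FROM {base} {joins}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

def find(conn, mapping, **criteria):
    """Run the compiled query and return attribute dicts, not raw rows."""
    sql, params = compile_query(mapping, **criteria)
    names = list(mapping["attrs"])
    return [dict(zip(names, row)) for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE note_bodies (item_id TEXT REFERENCES items(id), text TEXT);
    INSERT INTO items VALUES ('n1', 'Groceries'), ('n2', 'Todo');
    INSERT INTO note_bodies VALUES ('n1', 'milk, eggs'), ('n2', 'file taxes');
""")
print(find(conn, NOTE_MAPPING, body="file taxes"))
# [{'uuid': 'n2', 'title': 'Todo', 'body': 'file taxes'}]
```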

Now, if you add new types to the application schema, of course you 
have to add to the XML file.  But in principle you could generate the 
XML in a logical fashion from the new piece of application schema, so 
that even that step is not necessary when you are first adding to the 
application.
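
A sketch of generating the mapping mechanically, assuming domain types are declared as Python dataclasses: a default mapping and its `CREATE TABLE` statement are derived from the declared fields, so a hand-written mapping would only be needed to override the defaults. The naming convention (`contacts` for `Contact`), the type table, and the function names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, fields

# Hypothetical default conventions: table name = lowercased class name + "s",
# one column per field, column type chosen from the field's Python type.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL", bytes: "BLOB"}

def derive_mapping(cls):
    """Build a default mapping from a dataclass's declared fields."""
    return {
        "table": cls.__name__.lower() + "s",
        "columns": {f.name: (f.name, SQL_TYPES[f.type]) for f in fields(cls)},
    }

def create_table_sql(mapping):
    """Emit DDL for the mapped table."""
    cols = ", ".join(f"{col} {typ}" for col, typ in mapping["columns"].values())
    return f"CREATE TABLE IF NOT EXISTS {mapping['table']} ({cols})"

@dataclass
class Contact:  # a newly added piece of application schema
    name: str
    email: str
    priority: int

ddl = create_table_sql(derive_mapping(Contact))
print(ddl)  # CREATE TABLE IF NOT EXISTS contacts (name TEXT, email TEXT, priority INTEGER)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
```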

Now, Hibernate is not available for Python (although I suppose you
could make it so with JCC!), but it illustrates the point that it is
possible to separate things in this fashion.  I believe there is at
least one Python ORM that claims to be inspired by or to work like
Hibernate, though.  I also seem to recall that SQLAlchemy for Python
has a great deal of flexibility in mapping between different
relational schemas, such that your code can deal with a logical
schema rather than an actual one.

There is also the possibility of just rolling Yet Another Python ORM, 
perhaps based on EIM.  But these things don't matter as much as
layering the application in such a way that it does not *care* how 
things actually get stored.  Chandler's domain model objects should 
not be subclasses of a storage type, for example.  (i.e., they should 
not be repository.Items).
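
A minimal sketch of that layering, assuming a structural interface (`typing.Protocol`) between the application and storage: `Event` is a plain class with no storage base class, and application code depends only on the `EventStore` protocol. `Event`, `EventStore`, `MemoryStore`, and `invite` are hypothetical names; a repository- or SQLite-backed class could satisfy the same protocol without `Event` changing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol

@dataclass
class Event:
    """Plain domain object: no storage base class, no persistence hooks."""
    uuid: str
    title: str
    attendees: List[str] = field(default_factory=list)

class EventStore(Protocol):
    """What the application needs from storage, and nothing more."""
    def put(self, event: Event) -> None: ...
    def get(self, uuid: str) -> Optional[Event]: ...

class MemoryStore:
    """One possible back end; it satisfies EventStore structurally."""
    def __init__(self) -> None:
        self._data: Dict[str, Event] = {}

    def put(self, event: Event) -> None:
        self._data[event.uuid] = event

    def get(self, uuid: str) -> Optional[Event]:
        return self._data.get(uuid)

def invite(store: EventStore, uuid: str, person: str) -> None:
    """Application logic written against the protocol, not a back end."""
    event = store.get(uuid)
    if event is None:
        raise KeyError(uuid)
    event.attendees.append(person)
    store.put(event)

store = MemoryStore()
store.put(Event("e1", "Standup"))
invite(store, "e1", "andi")
print(store.get("e1").attendees)  # ['andi']
```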

That way, we will be able to experiment with different mappings and 
different back ends for optimum performance.  For that matter, we 
could use more than one back end if we chose, such that email bodies 
might be stored in mbox files, while their headers get indexed in 
SQLite.  (While all being dumpable and reloadable, of course.)
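
A rough sketch of the split back end, using only the standard library: full messages go into an mbox file through `mailbox.mbox`, while sender and subject are indexed in SQLite. The `SplitMailStore` class, its table layout, and the choice of indexed headers are hypothetical. It also relies on the integer keys returned by `mailbox.mbox.add`, which work within one open mailbox; a real design would probably want a more durable identifier such as the Message-ID.

```python
import email.message
import mailbox
import os
import sqlite3
import tempfile

class SplitMailStore:
    """Hypothetical store: message bodies live in an mbox file, while a few
    headers are indexed in SQLite for fast lookup."""

    def __init__(self, mbox_path):
        self.mbox = mailbox.mbox(mbox_path, create=True)
        self.index = sqlite3.connect(":memory:")
        self.index.execute(
            "CREATE TABLE headers (mbox_key INTEGER, subject TEXT, sender TEXT)"
        )

    def add(self, sender, subject, body):
        msg = email.message.Message()
        msg["From"] = sender
        msg["Subject"] = subject
        msg.set_payload(body)
        key = self.mbox.add(msg)  # the full message goes to the mbox file
        self.index.execute(       # selected headers go to the SQLite index
            "INSERT INTO headers VALUES (?, ?, ?)", (key, subject, sender)
        )
        return key

    def bodies_from(self, sender):
        """Query the index, then fetch the bodies from the mbox."""
        rows = self.index.execute(
            "SELECT mbox_key FROM headers WHERE sender = ? ORDER BY mbox_key",
            (sender,),
        ).fetchall()
        return [self.mbox.get_message(key).get_payload() for (key,) in rows]

    def close(self):
        self.mbox.close()
        self.index.close()

mbox_path = os.path.join(tempfile.mkdtemp(), "inbox.mbox")
store = SplitMailStore(mbox_path)
store.add("pje@example.com", "Architecture", "Layer the storage.")
store.add("andi@example.com", "Re: Architecture", "Migrate the schema.")
bodies = store.bodies_from("andi@example.com")
store.close()
print(bodies)
```

Because the mbox file is an ordinary mailbox, it remains readable by other mail tools independently of the index.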

And, it is likely that for some period, we will still back-end to the 
repository -- we just would go through a mapping layer of some sort 
first.  (And that would mean that we could do some physical schema 
tuning there, without needing to mess with the application layer.)
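
A sketch of physical tuning confined to the mapping layer, assuming the existing repository can be treated as a key/record store (a plain dict stands in for it here). Two hypothetical mappers store the same `Task` in different layouts: one flat, one that packs rarely used attributes into a JSON blob. The application round-trips identical objects either way. `Task`, `FlatMapper`, `PackedMapper`, and `MappedRepository` are illustrative names, not Chandler APIs.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Task:
    uuid: str
    title: str
    done: bool
    notes: str = ""
    color: str = ""

class FlatMapper:
    """Layout 1: every attribute becomes its own field in the stored record."""
    def to_record(self, task):
        return asdict(task)

    def from_record(self, record):
        return Task(**record)

class PackedMapper:
    """Layout 2 (hypothetical tuning): frequently used attributes stay as
    fields; rarely used ones are packed into a single JSON blob."""
    HOT = ("uuid", "title", "done")

    def to_record(self, task):
        data = asdict(task)
        record = {k: data.pop(k) for k in self.HOT}
        record["extra"] = json.dumps(data, sort_keys=True)
        return record

    def from_record(self, record):
        record = dict(record)  # copy so the stored record is not mutated
        extra = json.loads(record.pop("extra"))
        return Task(**record, **extra)

class MappedRepository:
    """What the application talks to; 'backend' stands in for the existing
    repository."""
    def __init__(self, mapper):
        self.mapper = mapper
        self.backend = {}

    def save(self, task):
        self.backend[task.uuid] = self.mapper.to_record(task)

    def load(self, uuid):
        return self.mapper.from_record(self.backend[uuid])

task = Task("t1", "Write summary", False, notes="see thread")
repo = MappedRepository(PackedMapper())
repo.save(task)
print(repo.backend["t1"])
# {'uuid': 't1', 'title': 'Write summary', 'done': False, 'extra': '{"color": "", "notes": "see thread"}'}
```

Switching from `FlatMapper` to `PackedMapper` changes only how records are laid out, not the `Task` class or the code that uses it.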


