[Chandler-dev] [Sum] The Great Architecture Discussion of 2007

Phillip J. Eby pje at telecommunity.com
Tue Oct 9 16:38:04 PDT 2007

At 04:12 PM 10/9/2007 -0700, Andi Vajda wrote:

>On Tue, 9 Oct 2007, Phillip J. Eby wrote:
>>1. application-level code meddling in storage-level details
>Could you give some examples ?

Any place where the application creates collections or works with 
indexes in order to get better performance than "naive" iteration or 
queries would give.

>>2. lack of sufficient domain-specific query APIs
>Again, please give an example of what you'd like ?

This isn't a repository problem - it's a domain-layer problem.  If 
the places where we're doing #1 were at least consolidated to single 
points of reference, #1 wouldn't be so bad.
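
As a minimal sketch of what "a single point of reference" could look 
like: the class and method names below (EventStore, events_between) 
are hypothetical and are not Chandler or repository APIs.  The sorted 
list stands in for whatever index the storage layer really uses; the 
point is only that callers ask a domain-level question and never touch 
the index themselves.

```python
import bisect


class EventStore:
    """Hypothetical domain layer; the only code that knows an index exists."""

    def __init__(self):
        # Storage-level detail: (start, title) pairs kept sorted by start.
        self._by_start = []

    def add(self, start, title):
        bisect.insort(self._by_start, (start, title))

    def events_between(self, lo, hi):
        """Return titles of events with lo <= start < hi, in start order."""
        # (lo,) sorts before every (lo, title), so this finds the first
        # entry whose start is >= lo; likewise for hi.
        first = bisect.bisect_left(self._by_start, (lo,))
        last = bisect.bisect_left(self._by_start, (hi,))
        return [title for _, title in self._by_start[first:last]]


store = EventStore()
for start, title in [(9, "standup"), (14, "review"), (11, "planning")]:
    store.add(start, title)

# Application code states what it wants, not how to find it.
morning = store.events_between(8, 12)
```

If the index later changes form, only EventStore has to change.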

>>3. no indirection between the application's logical schema and its 
>>physical storage schema
>Seems incorrect. I can change the physical storage schema (core 
>schema or even repo format) without affecting app code. Or am I 
>misunderstanding something ?

Sorry, I am using the relational meaning of logical and physical.  A 
logical schema does not include indexes or views, while a physical 
schema does.  I'm also extending this to refer to the lack of 
distinction between our preferred form of data as encapsulated 
objects, versus the best divisions of data from a performance point of view.

The core schema and repo format aren't a factor in this, as they're 
at an even lower level than the "physical" schema I'm talking 
about.  In the repository today, the "physical" schema consists of 
whatever sets/collections and indexes you create, which is rather 
analogous to creating indexes or materialized views in an RDBMS, only 
without the same transparency.  In an RDBMS, if you add an index or a 
materialized view, it doesn't change how you retrieve your data: it 
just goes faster.  So you can do application specific tuning without 
changing your application.
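
To illustrate that transparency, here is a small sketch using Python's 
sqlite3 module as a stand-in RDBMS.  The table, column, and index 
names are made up for the example.  The function find_by_owner plays 
the role of application code: it is identical before and after the 
index is added, and only the query plan changes.

```python
import sqlite3


def find_by_owner(conn, owner):
    # Application-level query: it never mentions any index.
    rows = conn.execute(
        "SELECT title FROM items WHERE owner = ? ORDER BY title", (owner,)
    )
    return [title for (title,) in rows]


def query_plan(conn):
    # Ask SQLite how it would run the lookup; the last column of each
    # row is the human-readable plan detail.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT title FROM items WHERE owner = 'alice'"
    )
    return " ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO items (owner, title) VALUES (?, ?)",
    [("alice", "schema notes"), ("bob", "repo format"), ("alice", "query API")],
)

before = find_by_owner(conn, "alice")
plan_before = query_plan(conn)

# Physical tuning only: add an index.  No application code changes.
conn.execute("CREATE INDEX items_by_owner ON items (owner)")

after = find_by_owner(conn, "alice")
plan_after = query_plan(conn)
```

The results in before and after are the same list; the difference 
shows up only in the plan, which should mention items_by_owner once the 
index exists.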

>>4. implementing a generic database inside another generic database
>That was the goal, originally.

Not quite; having a generic database was the goal, not that it be 
implemented *inside* another generic database.  It is one thing to 
have a BerkeleyDB persistence layer driven by the application's 
dynamic schema, and another one altogether to implement a database on 
top of a fixed BerkeleyDB schema.

For comparison purposes, consider OpenLDAP: it is a generic, 
hierarchical, networked database implemented atop 
BerkeleyDB.  However, instead of having a fixed schema for storing 
values, items, etc., in BerkeleyDB, it is dynamically extended as 
attribute types and indexes are added.  So the database is 
*represented* in BerkeleyDB, rather than being implemented *inside* BerkeleyDB.

The same distinction applies to, say, MySQL with its BerkeleyDB 
storage engine, which implements each table using separate BerkeleyDB 
data structures, rather than creating a generic "rows" data structure.

So, when I say it is implemented "inside" another database, I mean 
that the schema of the repository is not reflected in the schema of 
its back-end storage, and thus the repository cannot take full 
advantage of the back-end's features for performance.
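
The "inside" versus "represented in" distinction can be sketched with 
sqlite3 standing in for the back end (this is an analogy, not how the 
repository or BerkeleyDB are actually laid out; all table and column 
names are hypothetical).  Layout A keeps one fixed, generic table for 
every attribute of every item.  Layout B lets the back-end schema 
mirror the application schema, so the back end's own index can answer 
the question directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout A: a fixed generic schema; the application schema is invisible
# to the back end.
conn.execute("CREATE TABLE item_values (item_id INTEGER, attr TEXT, value TEXT)")

# Layout B: the back-end schema reflects the application schema.
conn.execute(
    "CREATE TABLE events (item_id INTEGER PRIMARY KEY, title TEXT, location TEXT)"
)
conn.execute("CREATE INDEX events_by_location ON events (location)")

sample = [(1, "standup", "room A"), (2, "review", "room B"), (3, "planning", "room A")]
for item_id, title, location in sample:
    conn.execute("INSERT INTO item_values VALUES (?, 'title', ?)", (item_id, title))
    conn.execute(
        "INSERT INTO item_values VALUES (?, 'location', ?)", (item_id, location)
    )
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", (item_id, title, location))

# Layout A needs a self-join to reassemble each item from generic rows.
generic = conn.execute(
    "SELECT t.value FROM item_values AS loc "
    "JOIN item_values AS t ON t.item_id = loc.item_id AND t.attr = 'title' "
    "WHERE loc.attr = 'location' AND loc.value = ? ORDER BY t.value",
    ("room A",),
).fetchall()

# Layout B asks the same question as a plain lookup on an indexed column.
reflected = conn.execute(
    "SELECT title FROM events WHERE location = ? ORDER BY title", ("room A",)
).fetchall()
```

Both queries return the same titles; the second one simply leaves more 
of the work to the back end.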

>Not to have a hard compiled app against a hard compiled relational 
>schema. If Chandler is to become a hard compiled application with a 
>static schema, where all data types have to be determined in 
>advance, then of course, the chandler repository is overkill and can 
>be replaced by some specifically optimized, domain-specific, schema.

I'm not sure what you mean by "hard compiled".  Nothing stops us from 
having a relational schema that's extensible by parcels, or from 
doing so dynamically.  In truth, the schemas we use with the 
repository today are no less "hard compiled".  If we at some future 
time allow user-defined fields, there are still ways to represent 
them within such a relatively-static schema, or to simply modify the 
schema at runtime.
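
As a rough sketch of the "modify the schema at runtime" option, again 
using sqlite3 as a stand-in: add_field and columns are hypothetical 
helpers, and the notes table is invented for the example.  A parcel or 
a user-defined field would extend the table after the application is 
already running.

```python
import sqlite3


def add_field(conn, table, field, sql_type="TEXT"):
    # Identifiers cannot be bound as SQL parameters, so validate them
    # before building the statement.
    if not (table.isidentifier() and field.isidentifier() and sql_type.isalpha()):
        raise ValueError("unsafe identifier")
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {field} {sql_type}")


def columns(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the name.
    if not table.isidentifier():
        raise ValueError("unsafe identifier")
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO notes (title) VALUES ('first note')")

# Later, at runtime, a new field is introduced.
add_field(conn, "notes", "priority", "INTEGER")
conn.execute("INSERT INTO notes (title, priority) VALUES ('second note', 2)")

schema_now = columns(conn, "notes")
rows = conn.execute("SELECT title, priority FROM notes ORDER BY id").fetchall()
```

Rows written before the change simply read back NULL (None in Python) 
for the new field.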

>>5. implementing generic indexes inside of generic indexes
>How so ? What are you thinking about ?

The skip list system is the main one I have in mind, but if I 
correctly understand how versions and values are stored, then those 
would be included too.
