[Dev] schema api: final review before migration

Phillip J. Eby pje at telecommunity.com
Tue Jun 14 14:12:51 PDT 2005


At 10:23 AM 6/14/2005 -0700, Katie Capps Parlante wrote:
>Its our goal to finish the migration by the next milestone. For the most 
>part, the dev platform team will handle the migration, but you need to be 
>aware of some of the issues during the transition. Phillip will send more 
>details about the migration.

During development of 0.6, Chandler is moving from defining its schemas 
partly in XML and partly in Python to defining them entirely in 
Python.  While this has many benefits, there are also some things you 
should be aware of, especially during the change-over process.

Currently, because we define part of the schema in Python (the class 
hierarchy) and part in XML (the parcel and kind hierarchies), it's possible 
for them to be inconsistent in some ways.  For example, there are a few 
classes whose name is different from the name of the Kind that uses the 
class.  There are also several Kinds that share the same class, rather than 
creating a new class for each Kind.  Finally, there are classes that 
inherit from different classes than those inherited by their Kind.

For the most part, these minor inconsistencies don't affect Chandler's 
operation right now, but when we move to having only *one* place where this 
information is specified, it will be necessary to know which representation 
is correct: the parcel.xml or the class?  Later today, I'll be posting 
about the inconsistencies I've found, and my proposed resolution for 
them.  These kinds and classes will need to be made consistent before they 
can be migrated.

It's important that they be resolved correctly, however; bug #3242 was the 
result of making the Kind match the class, when the class should've been 
made to match the Kind.  Doing it the other way around can produce bugs, 
too.  Luckily, there are only a handful of inconsistencies, and most of 
them should be straightforward to resolve.

You may have also noticed various '__parcel__' variables popping up around 
the codebase, along with various 'import' statements in __init__.py files, 
and that some imports in modules are changing from this format:

     import foo.bar.Baz as Baz

to this format:

     from foo.bar import Baz

These changes are all to support the schema API, or more precisely, to 
support the *transition* to the schema API, during which we need to have 
the schema API and parcel.xml-defined schemas interoperating.  So, what do 
these changes mean, and how do they affect you?

The '__parcel__' setting tells the schema API that the classes in that 
module belong to the named parcel.  These parcel names are Python package 
names, not repository paths or XML namespace URIs.  In the long run, we 
will be doing away with both repository paths and XML namespace URIs as a 
way of identifying parcels, since they will become redundant with respect 
to Python package names.

However, until we have actually done away with the use of repository paths 
in the Chandler code base, we need to ensure that the new schema lives 
under exactly the same repository paths as the old schema, and that's where 
'__parcel__' comes in.  After the transition, we won't care about 
repository paths any more, because we'll be importing classes from 
packages, not retrieving kinds using paths.  But until then, we need 
'__parcel__' settings to maintain backward compatible repository paths.

Typically, '__parcel__' is set to the name of the enclosing 
package.  For example, in the 'osaf.contentmodel.ContentModel' module, 
'__parcel__' is set to "osaf.contentmodel", because that's the name of the 
package that contains the relevant parcel.xml file.
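
To make the idea concrete, here is a minimal sketch of how a tool could map 
a class back to its parcel through the module-level '__parcel__' string.  
The helper 'parcel_name_for' and the class 'ExampleItem' are hypothetical 
names made up for illustration, and the module is simulated in memory; this 
is not how the schema API itself is implemented.

```python
import sys
import types


def parcel_name_for(cls):
    """Return the '__parcel__' string of the module that defines cls.

    Hypothetical helper: returns None when the module sets no '__parcel__'.
    """
    module = sys.modules[cls.__module__]
    return getattr(module, "__parcel__", None)


# Simulate the 'osaf.contentmodel.ContentModel' module in memory.
module = types.ModuleType("osaf.contentmodel.ContentModel")
module.__parcel__ = "osaf.contentmodel"  # a package name, not a repository path
sys.modules[module.__name__] = module


class ExampleItem(object):
    """Stand-in for a class defined in the module above (hypothetical)."""


# Pretend the class was defined inside the simulated module.
ExampleItem.__module__ = module.__name__

parcel = parcel_name_for(ExampleItem)
print(parcel)
```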

You do not need to add '__parcel__' strings to existing modules unless 
you're one of the people who is porting existing packages.  However, if you 
are reorganizing a parcel's schema and want to move kinds between parcels, 
you may need to change this setting if it already exists.  If you are not 
sure what to do with a '__parcel__' setting during the transition period, 
please contact me.  Or if you're adventurous, you can run the 
'schema_status' command before and after your change, diffing the outputs 
to see if you broke anything.  :)  (Always remembering to run the tests 
before checkin, of course.)

The second kind of change you need to be aware of is the addition of 
imports to __init__.py files.  These ensure that all of a parcel's classes 
are defined and present in the package's top-level module when the 
corresponding parcel is loaded.  This is mostly a transitional change, and 
might go away in some cases if we end up flattening the package structures 
a bit in a later milestone.  In general, however, all of a package's 
dependencies need to be imported and ready to use by the time the package 
is fully imported.

Therefore, if you add a new Kind+class during the transition, please make 
sure that the containing package's __init__.py is updated to import the new 
class.  If the class has the same name as an enclosing module, you will 
need to rename it in the import.  For example, in 
osaf.contentmodel.__init__, you'll see this:

     from ItemCollection import ItemCollection as __ItemCollection

This is because the class name (ItemCollection) would clash with the module 
name (also ItemCollection).  Renaming it in the import prevents this 
collision.  The schema API doesn't care what name it has in the package 
__init__, just as long as it's there.  (Its original name in the defining 
module, however, *must* match the name of the Kind, and the Kind *must* be 
defined in the parcel.xml of the package named in the module's '__parcel__' 
setting.)
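
The clash can be reproduced with a small throwaway package.  The sketch 
below is an illustration only: the package names 'clashing' and 'renamed' 
are made up, and the imports use the explicit leading-dot form that current 
Python versions require for relative imports, rather than the implicit form 
shown above.

```python
import importlib
import os
import sys
import tempfile
import types


def make_package(root, name, init_source):
    """Create package `name` holding an ItemCollection module and class."""
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    with open(os.path.join(pkg_dir, "ItemCollection.py"), "w") as f:
        f.write("class ItemCollection(object):\n    pass\n")


root = tempfile.mkdtemp()
# Without renaming, the class replaces the submodule in the package namespace.
make_package(root, "clashing",
             "from .ItemCollection import ItemCollection\n")
# With renaming, the module and the class are both reachable.
make_package(root, "renamed",
             "from .ItemCollection import ItemCollection as __ItemCollection\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
clashing = importlib.import_module("clashing")
renamed = importlib.import_module("renamed")

clash_is_class = isinstance(clashing.ItemCollection, type)
renamed_is_module = isinstance(renamed.ItemCollection, types.ModuleType)
renamed_has_class = isinstance(getattr(renamed, "__ItemCollection"), type)
print(clash_is_class, renamed_is_module, renamed_has_class)
```

In the 'clashing' package the name ItemCollection ends up referring to the 
class, so the module can no longer be reached as an attribute of the package.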

By the way, you'll notice that it's a pain to have classes whose name is 
the same as the module name, and we strongly suggest you avoid it in 
future.  There are two common practices for avoiding this.  One, which is 
already used in many places in Chandler, is to make the module name a plural 
(such as 'ContainerBlocks') so that it is different from the singular class 
name (such as 'ContainerBlock').  Another practice, which is more common 
among large Python frameworks (e.g. Twisted, Zope, PEAK), is to use 
all-lowercase names for modules.  For example, in Twisted, the 
'twisted.internet.selectreactor' module contains the 'SelectReactor' 
class.  (In addition to avoiding name clashes, this convention also makes 
it obvious whether a particular piece of code is working with a module or a 
class.)

The third major class of change is moving from 'import x.y.Z as Z' to 'from 
x.y import Z' or just 'import Z'.  This change is made for practical 
reasons, not esthetic ones.  A quirk of the Python import machinery makes 
the old form fail when a module imports a sibling module by its full dotted 
name while the parent package named in that import statement is itself still 
being imported.

For example, in osaf.contentmodel.ItemCollection, the code originally did this:

     import osaf.contentmodel.ContentModel as ContentModel

However, now that we have added code to osaf.contentmodel that imports 
ItemCollection, this statement executes *while* the osaf.contentmodel 
package is still being imported, and it therefore fails.  Changing the 
statement to:

     from osaf.contentmodel import ContentModel

fixes the problem.  We could also just say:

     import ContentModel

because the current package *is* osaf.contentmodel.

You do not need to go through your code and change these, but you should be 
aware of the issue, and we recommend you write new import statements in one 
of the two other forms.  (Note: you can still use 'as' with these forms to 
rename a module; it's not 'as' that causes the problem, but rather the 
attempt to do an absolute import while the parent package is still in the 
process of being imported.)
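
For the curious, the underlying mechanism can be observed directly.  As far 
as I understand it, the old form looks up each dotted component as an 
attribute of its parent (osaf, then osaf.contentmodel, and so on), but a 
package is only bound as an attribute of its parent *after* it has finished 
importing.  The sketch below uses made-up package names ('demo_top', 
'demo_sub') and records what a submodule sees while its package is still 
being imported.  It is written for current Python versions, where (since 
3.7) the 'import a.b.c as c' form no longer fails in this situation, so it 
inspects the state rather than triggering the error.

```python
import importlib
import os
import sys
import tempfile


def write(path, source):
    """Write Python source text to a file."""
    with open(path, "w") as f:
        f.write(source)


root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_top", "demo_sub")
os.makedirs(sub_dir)

write(os.path.join(root, "demo_top", "__init__.py"), "")
# The package imports its own submodule while it is still being imported.
write(os.path.join(sub_dir, "__init__.py"), "from . import Item\n")
# The submodule records what it can see at that moment.
write(os.path.join(sub_dir, "Item.py"),
      "import sys\n"
      "parent = sys.modules['demo_top']\n"
      "ATTR_BOUND_DURING_IMPORT = hasattr(parent, 'demo_sub')\n"
      "IN_SYS_MODULES_DURING_IMPORT = 'demo_top.demo_sub' in sys.modules\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
item = importlib.import_module("demo_top.demo_sub.Item")

# Mid-import: the package is registered, but not yet an attribute of its parent.
print(item.ATTR_BOUND_DURING_IMPORT, item.IN_SYS_MODULES_DURING_IMPORT)
# After the import has finished, the attribute is bound as usual.
bound_afterwards = hasattr(sys.modules["demo_top"], "demo_sub")
print(bound_afterwards)
```

The 'from x.y import Z' form avoids the problem because it does not rely on 
that parent attribute being present yet.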

The current plan for schema migration is to have all Kind, Attribute, and 
Cloud definitions moved to Python by the next milestone date.  In the next 
milestone, we'll be looking at getting rid of path dependencies, flattening 
packages, and beginning to take advantage of the benefits of the schema API 
(like being able to create and test items without needing to go through a 
parcel load operation).   These benefits unfortunately are not available 
*during* the transition period, because of the need for backward 
compatibility, and because we want to disturb things as little as possible 
during the transition process.  Thanks for your patience and assistance.


