[Dev] Schema API update

Phillip J. Eby pje at telecommunity.com
Wed Apr 20 16:53:58 PDT 2005


On Monday, I met with Katie, Ted, Andi, Grant, and Morgen to review the 
Schema API proposal and its impact on parcel loading, etc.  We identified a 
number of issues, some of which we resolved during the meeting, and others 
that I've been working on since then and now have resolutions for.


Mapping Python modules to Parcels
---------------------------------

First, the proposal didn't address the mapping of Python modules to parcel 
objects, or how Parcel subclasses would be defined/used.  I propose to 
address this by defining an API that will be used to create null-view 
parcels for importing: ``schema.parcel_for_module(module_name)``.  So, e.g.::

     aParcel = schema.parcel_for_module('osaf.contentmodel')

would return a null-view parcel object for 
``//parcels/osaf/contentmodel``.  This API will work by checking whether 
the named module has a ``__parcel__`` variable defined, and if not, it will 
create one using the module's ``__parcel_class__`` variable, if 
defined.  If there is neither a ``__parcel__`` nor ``__parcel_class__``, it 
will just create a stock Parcel for the module.  If a new parcel object is 
created, it will be saved in the ``__parcel__`` attribute of the module so 
that subsequent invocations will return the same parcel object.  There will 
have to be some locking support in order to make this API threadsafe, using 
``threading.RLock`` because this API is recursive.  That is, in order to 
create a parcel for a module, it will first ask for the parent module's 
parcel, in order to know what parent to set on the child parcel.  So, the 
locking has to support re-entrancy.  Finally, the API will need to be able 
to return a meaningful value when a null module (i.e. the empty string 
``""``) is requested, so that the recursion has a place to "bottom out".


Mapping XML Namespaces to Modules
---------------------------------

Second, during the meeting Morgen pointed out that the XML namespaces used 
in ``parcel.xml`` today do not directly correspond to the modules where 
contentmodel classes live.  That is, parcels correspond to Python 
"packages", but not to modules.  So, in order to allow gradual transition, 
when we port packages to use the schema API, we'll need to "flatten" them 
so that all the package's classes can be imported directly from the 
package.  (E.g. by moving the code directly into the package ``__init__.py``.)

Note that if the flattened package ends up as just an ``__init__.py`` with 
no ``parcel.xml``, it can then also be changed to be just a module instead 
of a package.  For example, if we were porting the 
``osaf.contentmodel.contacts`` parcel to use the schema API, we could just 
take its ``Contacts.py``, rename it to ``contacts.py`` and move it into 
``osaf.contentmodel``, thus moving the 
``osaf.contentmodel.contacts.Contacts.Contact`` class to just 
``osaf.contentmodel.contacts.Contact``.  Then, the location of the content 
classes will match the XML namespaces used in current ``parcel.xml`` files, 
and the corresponding repository paths.
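
The mismatch can be made concrete with a tiny sketch. It assumes that a class belongs to the parcel of the module defining it, using the same ``//parcels/...`` convention as ``parcel_for_module()``. The helper name ``parcel_path_for_class`` is hypothetical.

```python
def parcel_path_for_class(class_path):
    """Repository path of the parcel for a class, given the class's dotted path.

    Hypothetical assumption: a class belongs to the parcel of the module that
    defines it, i.e. everything before the last dot.
    """
    module_name = class_path.rpartition(".")[0]
    return "//parcels/" + module_name.replace(".", "/")


# Today: Contact lives in the Contacts.py module inside the contacts package.
print(parcel_path_for_class("osaf.contentmodel.contacts.Contacts.Contact"))
# //parcels/osaf/contentmodel/contacts/Contacts  (one level too deep)

# After moving Contacts.py to osaf/contentmodel/contacts.py:
print(parcel_path_for_class("osaf.contentmodel.contacts.Contact"))
# //parcels/osaf/contentmodel/contacts  (matches the existing parcel path)
```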

Of course, parcels that have instance data in ``parcel.xml`` cannot be 
converted from packages to modules, because they still need a separate 
directory for the ``parcel.xml`` to live in.  Such parcels can still be 
flattened by moving the schema classes into the ``__init__.py``, however.

(Note: this is a slightly different resolution than the one(s) we discussed 
at the meeting on Monday.  This modified approach has a lower likelihood of 
error during porting, and also achieves the side benefit of helping to 
reduce the current deep package nesting of our parcels.)


Parcel Synch and Update
-----------------------

Morgen's questions at the meeting also exposed a couple of issues where the 
sequence of parcel loading and imports could make a difference to the 
resulting repository contents.  The schema API is intended to support lazy 
loading on a couple of different levels, but parcel loading is a more 
synchronous process.  Schema classes can't load themselves into the 
repository right away for three reasons: 1) they don't know what repository 
"the repository" is, 2) they may have dependencies that aren't yet 
imported, and 3) we don't want to have to import all possible modules at 
startup.

So, in order to ensure that namespaces referenced by a ``parcel.xml`` file 
have been initialized in the repository, there will be an API along the 
lines of ``schema.synchronize_parcel(repository_view, path)``.  The parcel 
loader will invoke this API when setting up an XML namespace, to ensure 
that the dependent parcel(s) have been imported, and the relevant 
schema(s), if any, are added to the repository.
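
Here is a rough sketch of what ``synchronize_parcel()`` might do. It assumes that parcel paths map back to module names by the ``//parcels/...`` convention. The repository view is modeled as a plain dict, and each module is assumed to list its schema classes in a hypothetical ``__schema__`` attribute. Neither of these is part of the proposed API.

```python
import importlib
import sys
import types

PARCEL_ROOT = "//parcels/"  # assumed path prefix, mirroring parcel_for_module()


def module_name_for_path(path):
    """Map a parcel path such as '//parcels/osaf/contentmodel' to a module name."""
    if not path.startswith(PARCEL_ROOT):
        raise ValueError(f"not a parcel path: {path!r}")
    return path[len(PARCEL_ROOT):].replace("/", ".")


def synchronize_parcel(repository_view, path):
    """Import the module behind `path` and add its schema to the view.

    `repository_view` is a dict standing in for a real view, and
    `__schema__` is a hypothetical per-module list of schema classes.
    Returns the paths of the Kinds added by this call.
    """
    # Importing the module also imports the modules it depends on.
    module = importlib.import_module(module_name_for_path(path))
    added = []
    for cls in getattr(module, "__schema__", ()):
        kind_path = path + "/" + cls.__name__
        if kind_path not in repository_view:  # don't re-add on later calls
            repository_view[kind_path] = cls  # stand-in for creating a Kind
            added.append(kind_path)
    return added


if __name__ == "__main__":
    class Contact:  # hypothetical schema class
        pass

    demo = types.ModuleType("demo_contacts")  # throwaway parcel module
    demo.__schema__ = [Contact]
    sys.modules["demo_contacts"] = demo
    view = {}
    print(synchronize_parcel(view, "//parcels/demo_contacts"))  # ['//parcels/demo_contacts/Contact']
    print(synchronize_parcel(view, "//parcels/demo_contacts"))  # []
```

The property that matters is idempotence. The loader can call this for every namespace it encounters without adding the same Kinds twice.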

It also became clear during the meeting that changes to schema modules 
can't practically be detected at present by any parcel loading mechanism, 
and some expressed the opinion that when one changes a parcel's schema, one 
generally needs to recreate the repository.  So, Andi offered to add a 
checksum facility to the repository so that when Kinds are imported they 
will be checked against the existing Kind in the repository, and an error 
will occur if they differ in any substantial way, thereby alerting you to 
the need to recreate your repository once you use a changed schema.
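
The repository's checksum facility hasn't been designed yet, so the following only illustrates the idea. It digests a canonical serialization of a Kind definition and compares it with the digest recorded when that Kind was first imported. The dict layout of a Kind, ``COSMETIC_FIELDS``, ``SchemaMismatch``, and ``check_kind`` are all hypothetical.

```python
import hashlib
import json

# Hypothetical: fields that don't affect stored data, so changing them
# shouldn't force a repository rebuild.
COSMETIC_FIELDS = {"description"}


class SchemaMismatch(Exception):
    """An imported Kind differs from the Kind already in the repository."""


def kind_checksum(kind):
    """Digest of the substantive parts of a Kind definition (a plain dict here)."""
    substantive = {k: v for k, v in kind.items() if k not in COSMETIC_FIELDS}
    # sort_keys (applied at every nesting level) makes the text, and so the
    # digest, independent of dict insertion order.
    canonical = json.dumps(substantive, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_kind(stored, path, kind):
    """Record `kind`'s checksum on first import; raise if it later changes."""
    digest = kind_checksum(kind)
    previous = stored.setdefault(path, digest)
    if previous != digest:
        raise SchemaMismatch(f"schema for {path} has changed; recreate your repository")
    return digest


if __name__ == "__main__":
    stored = {}
    path = "//parcels/osaf/contentmodel/contacts/Contact"
    # Made-up Kind definitions for the demo.
    v1 = {"attributes": {"emailAddress": {"type": "String", "cardinality": "single"}}}
    check_kind(stored, path, v1)
    check_kind(stored, path, dict(v1, description="Someone you know"))  # cosmetic: OK
    v2 = {"attributes": {"emailAddress": {"type": "String", "cardinality": "list"}}}
    try:
        check_kind(stored, path, v2)
    except SchemaMismatch as exc:
        print(exc)
```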

Alternatively, we could attempt to support simple schema evolution.  In a 
hallway conversation yesterday, John asked if we could include a way to 
reload parcels in such a way as to incorporate changes to both code and 
schema at runtime, to afford a faster development feedback loop.  This is 
not on the feature list for 0.6, but I'll be watching for opportunities to 
move us in this direction, perhaps by adding some sort of "upgrade hook" to 
parcel modules or to kind classes, and maybe a way to specify a schema's 
version.  This would naturally have some overlap with repository schema 
evolution and would be something Andi and I would need to talk about more 
before coming up with an actionable plan.
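
None of this is designed yet, so the following shows only one possible shape for such a hook, under hypothetical conventions. A parcel module declares an integer ``__schema_version__`` and provides ``upgrade_from_N(view)`` functions that migrate stored data one version at a time.

```python
import types


class NoUpgradePath(Exception):
    """Stored data can't be brought up to the code's schema version."""


def upgrade_parcel(module, stored_version, view):
    """Run `module`'s upgrade hooks, one version step at a time.

    Hypothetical conventions: `module.__schema_version__` is an int, and
    `module.upgrade_from_N(view)` migrates stored data from version N to N + 1.
    Returns the version the data is now at.
    """
    target = getattr(module, "__schema_version__", 0)
    if stored_version > target:
        raise NoUpgradePath(f"stored data is version {stored_version}, code is {target}")
    for version in range(stored_version, target):
        hook = getattr(module, f"upgrade_from_{version}", None)
        if hook is None:
            # No hook for this step: fall back to recreating the repository.
            raise NoUpgradePath(f"no upgrade_from_{version}(); recreate the repository")
        hook(view)
    return target


if __name__ == "__main__":
    def upgrade_from_1(view):
        # Hypothetical change in version 2: 'title' renamed to 'displayName'.
        for item in view.values():
            item["displayName"] = item.pop("title")

    parcel_module = types.SimpleNamespace(__schema_version__=2, upgrade_from_1=upgrade_from_1)
    view = {"item1": {"title": "Lunch"}}
    print(upgrade_parcel(parcel_module, 1, view), view)  # 2 {'item1': {'displayName': 'Lunch'}}
```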


Clouds and Endpoints
--------------------

I previously mentioned that I needed someone "Wise in the Way of Clouds" to 
knock some sense into my head about how they work, and there were at least 
two such people there on Monday, so now my head hurts, but I'm closer to 
knowing what would work.  :)  It's likely that it will look something like 
this::

     class ContentItem(schema.Item):
         # ... other stuff here

         __clouds__ = dict(
             sharing = schema.Cloud(
                 byRef = [displayName, body, issues, createdOn]
             )
         )

The idea here is that ``__clouds__`` is a dictionary mapping cloud aliases 
(like ``sharing``) to ``schema.Cloud()`` objects, which are the same as 
regular clouds except that you'll specify attributes by referencing the 
descriptors rather than strings representing the attribute names.  And 
you'll group the names by policy rather than specifying a policy for each 
name.  This is still very vague, and feedback to help steer this in the 
right direction would be welcome, especially if I've made a stupid mistake 
like assuming that the order of endpoints in a cloud is inconsequential, 
when in fact it makes a difference.  (And yes, I'm assuming that, so if 
that's wrong, somebody please apply the appropriate clue-by-four to my 
head.  Thanks!)
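
Here is a minimal sketch of how ``schema.Cloud()`` could accept descriptors grouped by policy and still produce the per-attribute (name, policy) endpoints that a regular cloud uses. ``Attribute`` and this ``Cloud`` class are hypothetical stand-ins, not the schema API. ``ContentItem`` here is a plain class rather than a ``schema.Item`` subclass.

```python
class Attribute:
    """Hypothetical schema descriptor that records its own attribute name."""

    def __set_name__(self, owner, name):
        # Called by type.__new__ once the class body has finished executing.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the descriptor itself
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class Cloud:
    """Cloud endpoints grouped by policy, given as descriptors, not strings."""

    def __init__(self, **endpoints_by_policy):
        self.endpoints_by_policy = endpoints_by_policy

    def endpoints(self):
        # Names are looked up lazily: while the class body that builds the
        # Cloud is still running, __set_name__ hasn't been called yet.
        # Declaration order is kept, in case endpoint order turns out to matter.
        return [(attr.name, policy)
                for policy, attrs in self.endpoints_by_policy.items()
                for attr in attrs]


class ContentItem:  # stand-in for a schema.Item subclass
    displayName = Attribute()
    body = Attribute()
    issues = Attribute()
    createdOn = Attribute()

    __clouds__ = dict(
        sharing=Cloud(byRef=[displayName, body, issues, createdOn])
    )


print(ContentItem.__clouds__["sharing"].endpoints())
# [('displayName', 'byRef'), ('body', 'byRef'), ('issues', 'byRef'), ('createdOn', 'byRef')]
```

The sketch exposes one subtlety. The ``__clouds__`` dict is built while the class body is still executing, before Python calls ``__set_name__`` on the descriptors. A ``Cloud`` therefore has to defer turning descriptors into names until after the class exists.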


Making the Transition
---------------------

At Monday's meeting, Katie asked for input on how we might proceed with the 
actual transition, in terms of who, what, how, and when.  My initial 
proposal stated that porting of parcels needed to take place from the 
"inside out", such that a parcel containing a base class needed to be 
ported prior to a parcel containing a subclass of that base class, because 
``parcel.xml`` files can refer to schema items defined in modules, but not 
the other way around.  Andi commented that this need not be the case, 
because I was going to have to implement such linkage in order to connect 
to the core schema (e.g. ParcelManager et al).

After thinking about this some more, though, I realized that although Andi 
is correct, that only works if the parcels are loaded into the null 
repository view, which means we'd have to load all parcels into the null 
view every time, and that's not really what we want.  So, at this point I 
think we should stick with the strategy of working from the "inside out", 
beginning with core schema elements and working our way out.
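
Working "inside out" amounts to a topological sort of the parcel dependency graph. The sketch below uses the standard library's ``graphlib`` (Python 3.9+). The example graph, including the ``core`` placeholder for the core schema parcels, is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def porting_order(depends_on):
    """Order parcels so each comes after every parcel it subclasses from.

    `depends_on` maps a parcel name to the parcels defining its base classes.
    graphlib.CycleError is raised for a cycle, meaning those parcels would
    have to be ported together.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical graph; "core" stands in for the core schema parcels.
    print(porting_order({
        "core": [],
        "osaf.contentmodel": ["core"],
        "osaf.contentmodel.contacts": ["osaf.contentmodel"],
    }))
    # ['core', 'osaf.contentmodel', 'osaf.contentmodel.contacts']
```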

And, since the implementation design allows us to port parcels 
incrementally, this will allow us to control the scope of porting in 0.6, 
because we can "stop any time we want".  So, I don't have any specific 
recommendations regarding the who/what/when parts.  Probably what will 
happen is that once I have enough of the schema API implemented, I will try 
to port a parcel's schema (probably the osaf.contentmodel.* parcels) and 
see what happens.  Either it will then be an example for how to do other 
parcels, or I'll learn what doesn't work well in the schema API.  We'll 
also know then how long it might take to port a parcel, and can do more 
planning at that time.


