[Dev] Schema API update
Phillip J. Eby
pje at telecommunity.com
Wed Apr 20 16:53:58 PDT 2005
On Monday, I met with Katie, Ted, Andi, Grant, and Morgen to review the
Schema API proposal and its impact on parcel loading, etc. We identified a
number of issues, some of which we resolved during the meeting, and others
that I've been working on since then and now have resolutions for.
Mapping Python modules to Parcels
---------------------------------
First, the proposal didn't address the mapping of Python modules to parcel
objects, or how Parcel subclasses would be defined/used. I propose to
address this by defining an API that will be used to create null-view
parcels for importing: ``schema.parcel_for_module(module_name)``. So, e.g.::
    aParcel = schema.parcel_for_module('osaf.contentmodel')
would return a null-view parcel object for
``//parcels/osaf/contentmodel``. This API will work by checking whether
the named module has a ``__parcel__`` variable defined, and if not, it will
create one using the module's ``__parcel_class__`` variable, if
defined. If there is neither a ``__parcel__`` nor ``__parcel_class__``, it
will just create a stock Parcel for the module. If a new parcel object is
created, it will be saved in the ``__parcel__`` attribute of the module so
that subsequent invocations will return the same parcel object. There will
have to be some locking support in order to make this API threadsafe, using
``threading.RLock`` because this API is recursive. That is, in order to
create a parcel for a module, it will first ask for the parent module's
parcel, in order to know what parent to set on the child parcel. So, the
locking has to support re-entrancy. Finally, the API will need to be able
to return a meaningful value when a null module (i.e. the empty string
``""``) is requested, so that the recursion has a place to "bottom out".
Mapping XML Namespaces to Modules
---------------------------------
Second, during the meeting Morgen pointed out that the XML namespaces used
in ``parcel.xml`` today do not directly correspond to the modules where
contentmodel classes live. That is, parcels correspond to Python
"packages", but not to modules. So, in order to allow gradual transition,
when we port packages to use the schema API, we'll need to "flatten" them
so that all the package's classes can be imported directly from the
package. (E.g. by moving the code directly into the package ``__init__.py``.)
Note that if the flattened package ends up as just an ``__init__.py`` with
no ``parcel.xml``, it can then also be changed to be just a module instead
of a package. For example, if we were porting the
``osaf.contentmodel.contacts`` parcel to use the schema API, we could just
take its ``Contacts.py``, rename it to ``contacts.py`` and move it into
``osaf.contentmodel``, thus moving the
``osaf.contentmodel.contacts.Contacts.Contact`` class to just
``osaf.contentmodel.contacts.Contact``. Then, the location of the content
classes will match the XML namespaces used in current ``parcel.xml`` files,
and the corresponding repository paths.
Of course, parcels that have instance data in ``parcel.xml`` cannot be
converted from packages to modules, because they still need a separate
directory for the ``parcel.xml`` to live in. Such parcels can still be
flattened by moving the schema classes into the ``__init__.py``, however.
(Note: this is a slightly different resolution than the one(s) we discussed
at the meeting on Monday. This modified approach has less likelihood of
error during porting, and also achieves the side benefit of helping to
reduce the current deep package nesting of our parcels.)
Parcel Synch and Update
-----------------------
Morgen's questions at the meeting also exposed a couple of issues where the
sequence of parcel loading and imports could make a difference to the
resulting repository contents. The schema API is intended to support lazy
loading on a couple of different levels, but parcel loading is a more
synchronous process. Schema classes can't load themselves into the
repository right away for three reasons: 1) they don't know what repository
"the repository" is, 2) they may have dependencies that aren't yet
imported, and 3) we don't want to have to import all possible modules at
startup.
So, in order to ensure that namespaces referenced by a ``parcel.xml`` file
have been initialized in the repository, there will be an API along the
lines of ``schema.synchronize_parcel(repository_view, path)``. The parcel
loader will invoke this API when setting up an XML namespace, to ensure
that the dependent parcel(s) have been imported, and the relevant
schema(s), if any, are added to the repository.
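To make the intent concrete, here is a rough sketch of what such a call might do. Nothing here is settled: the dict standing in for a repository view, the handling of the ``//parcels/`` prefix, the ``is_schema_class`` marker attribute, and the fake ``demo_parcel`` module are all assumptions for illustration only:

```python
import importlib
import sys
import types

PARCEL_ROOT = "//parcels/"


def module_name_for_path(path):
    """Translate a repository path into a dotted module name."""
    if not path.startswith(PARCEL_ROOT):
        raise ValueError("not a parcel path: %r" % (path,))
    return path[len(PARCEL_ROOT):].replace("/", ".")


def synchronize_parcel(repository_view, path):
    """Import the parcel's module and register its schema classes.

    repository_view is a plain dict here (an assumption); existing
    entries are left untouched, so the call is safe to repeat.
    """
    module = importlib.import_module(module_name_for_path(path))
    for name, obj in vars(module).items():
        # is_schema_class is a hypothetical marker attribute.
        if isinstance(obj, type) and getattr(obj, "is_schema_class", False):
            repository_view.setdefault(path + "/" + name, obj)
    return module


# A fake parcel module so the sketch runs on its own.
demo = types.ModuleType("demo_parcel")
demo.Contact = type("Contact", (), {"is_schema_class": True})
demo.helper = lambda: None            # not a schema class, so skipped
sys.modules["demo_parcel"] = demo

view = {}
synchronize_parcel(view, "//parcels/demo_parcel")
print(sorted(view))    # ['//parcels/demo_parcel/Contact']
```

The import is what triggers loading of the dependent parcel; the registration step only runs for the view that was passed in, so schema classes never need to know which repository is "the repository".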
It also became clear during the meeting that changes to schema modules
can't be practically detected at present by any parcel loading mechanism,
and some expressed the opinion that when one changes a parcel's schema, one
generally needs to recreate their repository. So, Andi offered to add a
checksum facility to the repository so that when Kinds are imported they
will be checked against the existing Kind in the repository, and an error
will occur if they differ in any substantial way, thereby alerting you to
the need to recreate your repository once you use a changed schema.
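The kind of check Andi described could look roughly like the following sketch. The attribute-dictionary representation of a Kind, the ``kind_checksum`` and ``import_kind`` helpers, and the ``SchemaMismatch`` exception are all hypothetical; the real facility would live in the repository and may well compare different things:

```python
import hashlib


class SchemaMismatch(Exception):
    """Hypothetical error raised when an imported Kind has changed."""


def kind_checksum(attributes):
    """Hash a Kind's attribute definitions in a stable order."""
    canonical = repr(sorted(attributes.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def import_kind(stored_checksums, kind_name, attributes):
    """Record a new Kind, or verify it against the stored checksum."""
    checksum = kind_checksum(attributes)
    stored = stored_checksums.setdefault(kind_name, checksum)
    if stored != checksum:
        raise SchemaMismatch(
            "%s differs from the stored Kind; recreate the repository"
            % kind_name)
    return checksum


stored = {}
import_kind(stored, "Contact", {"displayName": "String"})
import_kind(stored, "Contact", {"displayName": "String"})    # unchanged
try:
    import_kind(stored, "Contact", {"displayName": "Text"})  # changed
except SchemaMismatch as exc:
    print(exc)
```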
Alternatively, we could attempt to support simple schema evolution. In a
hallway conversation yesterday, John asked if we could include a way to
reload parcels in such a way as to incorporate changes to both code and
schema at runtime, to afford a faster development feedback loop. This is
not on the feature list for 0.6, but I'll be watching for opportunities to
move us in this direction, perhaps by adding some sort of "upgrade hook" to
parcel modules or to kind classes, and maybe a way to specify a schema's
version. This would naturally have some overlap with repository schema
evolution and would be something Andi and I would need to talk about more
before coming up with an actionable plan.
Clouds and Endpoints
--------------------
I previously mentioned that I needed someone "Wise in the Way of Clouds" to
knock some sense into my head about how they work, and there were at least
two such people there on Monday, so now my head hurts, but I'm closer to
knowing what would work. :) It's likely that it will look something like
this::
    class ContentItem(schema.Item):
        # ... other stuff here
        __clouds__ = dict(
            sharing = schema.Cloud(
                byRef = [displayName, body, issues, createdOn]
            )
        )
The idea here is that ``__clouds__`` is a dictionary mapping cloud aliases
(like ``sharing``) to ``schema.Cloud()`` objects, which are the same as
regular clouds except that you'll specify attributes by referencing the
descriptors rather than strings representing the attribute names. And
you'll group the names by policy rather than specifying a policy for each
name. This is still very vague, and feedback to help steer this in the
right direction would be welcome, especially if I've made a stupid mistake
like assuming that the order of endpoints in a cloud is inconsequential,
when in fact it makes a difference. (And yes, I'm assuming that, so if
that's wrong, somebody please apply the appropriate clue-by-four to my
head. Thanks!)
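For anyone who wants something to poke at, here is a self-contained sketch of how such a ``Cloud`` object might hold its endpoints. It is written in modern Python, and the ``Attribute`` stand-in, the ``endpoints()`` method, and the use of a set (which bakes in my assumption that endpoint order doesn't matter) are all hypothetical:

```python
class Attribute:
    """Hypothetical stand-in for a schema attribute descriptor."""

    def __set_name__(self, owner, name):
        self.name = name      # filled in when the class is created


class Cloud:
    """Groups attribute descriptors by policy, e.g. byRef=[...]."""

    def __init__(self, **policies):
        self.policies = {p: list(attrs) for p, attrs in policies.items()}

    def endpoints(self):
        # A set, reflecting the assumption that order does not matter.
        return {(attr.name, policy)
                for policy, attrs in self.policies.items()
                for attr in attrs}


class ContentItem:
    displayName = Attribute()
    body = Attribute()

    __clouds__ = dict(
        sharing=Cloud(byRef=[displayName, body]),
    )


print(sorted(ContentItem.__clouds__["sharing"].endpoints()))
# [('body', 'byRef'), ('displayName', 'byRef')]
```

Because the descriptors are referenced by name inside the class body, a misspelled attribute fails at import time with a ``NameError`` instead of silently producing a broken cloud.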
Making the Transition
---------------------
At Monday's meeting, Katie asked for input on how we might proceed with the
actual transition, in terms of who, what, how, and when. My initial
proposal stated that porting of parcels needed to take place from the
"inside out", such that a parcel containing a base class needed to be
ported prior to a parcel containing a subclass of that base class, because
``parcel.xml`` files can refer to schema items defined in modules, but not
the other way around. Andi commented that this need not be the case,
because I was going to have to implement such linkage in order to connect
to the core schema (e.g. ParcelManager et al).
After thinking about this some more, though, I realized that although Andi
is correct, that only works if the parcels are loaded into the null
repository view, which means we'd have to load all parcels into the null
view every time, and that's not really what we want. So, at this point I
think we should stick with the strategy of working from the "inside out",
beginning with core schema elements and working our way out.
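The "inside out" ordering amounts to a topological sort of parcels by their base-class dependencies. A minimal sketch using the standard library's ``graphlib`` (Python 3.9 or later) follows; the dependency table is invented for the example and does not describe the real parcels:

```python
from graphlib import TopologicalSorter

# Maps each parcel to the parcels that hold its base classes.
# These particular dependencies are invented for the example.
depends_on = {
    "osaf.contentmodel": [],
    "osaf.contentmodel.contacts": ["osaf.contentmodel"],
    "osaf.contentmodel.calendar": ["osaf.contentmodel"],
    "example.app": [
        "osaf.contentmodel.contacts",
        "osaf.contentmodel.calendar",
    ],
}

# static_order() yields every parcel after all of its dependencies.
porting_order = list(TopologicalSorter(depends_on).static_order())
print(porting_order)
```

Any prefix of ``porting_order`` is a valid stopping point, since every parcel in it has had all of its base-class parcels ported already.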
And, since the implementation design allows us to port parcels
incrementally, this will allow us to control the scope of porting in 0.6,
because we can "stop any time we want". So, I don't have any specific
recommendations regarding the who/what/when parts. Probably what will
happen is that once I have enough of the schema API implemented, I will try
to port a parcel's schema (probably the osaf.contentmodel.* parcels) and
see what happens. Either it will then be an example for how to do other
parcels, or I'll learn what doesn't work well in the schema API. We'll
also know then how long it might take to port a parcel, and can do more
planning at that time.