[Dev] DRAFT: Python Schema API proposal
Bryan Stearns
stearns at osafoundation.org
Mon Apr 18 11:45:46 PDT 2005
Phillip,
How does the plan below deal with stamping?
...Bryan
Phillip J. Eby wrote:
> -------------------------------------
> Defining Chandler Schemas with Python
> -------------------------------------
>
>
> Introduction
> ============
>
> As many of you may know, I've for some time now been promoting the
> idea of replacing parcel XML with Python code for defining item
> schemas, and I created a proof-of-concept for this in the "Spike"
> project, found under 'internal' in the Chandler CVS.
>
> Since the PyCon sprints, it's my understanding that there's now a
> broad and actionable consensus at OSAF that it is indeed desirable to
> move to using Python syntax in place of XML for parcels' schema
> definition. So, after working with Andi and Grant to get the
> necessary infrastructure in place within Chandler, I'd like to present
> my proposal for what the Python schema definitions will look like, how
> migration might take place, and what new possibilities for Chandler
> development these changes will enable.
>
> If you haven't had a chance to look at Spike yet, you may find it
> helpful to read at least the "Introduction" section of this document:
>
> http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/src/spike/schema.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup
>
>
> which presents a simple Python syntax for defining schemas. The
> actual syntax used in Chandler will be different, but the above
> document gives a good introduction to the concept, with lots of
> working examples. (In fact, the document is designed for use with
> Python's "doctest" module and is literally a part of Spike's unit
> tests. As much as is practical, I'll be using this approach for the
> changes to Chandler, so that the API will be documented and tested at
> the same time as it's developed.)
>
> You'll notice, by the way, that the documentation doesn't talk much
> about Kinds, or names, paths, repository views, and parents. That's
> because in Spike's API, you don't need any of these things in order to
> create an Item. You just create the item, and until you take some
> action to store it, it's simply an ordinary Python object.
>
>
> How it will Work
> ================
>
> Here's a snippet of XML from the parcel.xml of the osaf.contentmodel
> package::
>
>     <Kind itsName="ContentItem">
>       <superKinds itemref="Item"/>
>       <classes key="python">osaf.contentmodel.ContentModel.ContentItem</classes>
>       <description>Content Item is the abstract super-kind for
>       things like Contacts, Calendar Events, Tasks, Mail Messages, and
>       Notes. Content Items are user-level items, which a user might file,
>       categorize, share, and delete.</description>
>       <Attribute itsName="body">
>         <displayName>Body</displayName>
>         <type itemref="Lob"/>
>         <description>All Content Items may have a body to contain
>         notes. It's not decided yet whether this body would instead contain
>         the payload for resource items such as presentations or spreadsheets
>         -- resource items haven't been nailed down yet -- but the payload may
>         be different from the notes because payload needs to know MIME type,
>         etc.</description>
>       </Attribute>
>
> Here's the corresponding code in the proposed schema API::
>
>     from application import schema  # not sure if this is where it will go
>     from repository.schema import Types
>
>     class ContentItem(schema.Item):
>         """Base class for content items
>
>         Content items (such as contacts, notes, photos, etc.) are
>         user-level items that a user might file, categorize, share,
>         and delete.
>         """
>
>         body = schema.One(Types.Lob,
>             displayName = "Body",
>             doc = """\
>     All Content Items may have a body to contain notes. It's not decided
>     yet whether this body would instead contain the payload for resource
>     items such as presentations or spreadsheets -- resource items haven't
>     been nailed down yet -- but the payload may be different from the notes
>     because payload needs to know MIME type, etc."""
>         )
>
> The fundamental idea here is that Python class definitions replace
> Kind elements, and Python property definitions replace Attribute
> elements. Superkinds are defined by inheritance. Parcels are Python
> packages. Standard Python "import" statements replace XML namespace
> definitions.
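The following is a minimal sketch of how a class statement can carry the same information as a Kind element. It assumes a hypothetical `One` descriptor and a hypothetical `Item` base class; it is not the Chandler API. Attribute definitions find out their own names through `__set_name__`, and superkinds are simply read off the class's bases.

```python
# Sketch only: "One" and "Item" are hypothetical stand-ins, not Chandler code.

class One:
    """Single-valued attribute that records its own name and aspects."""

    def __init__(self, value_type=None, **aspects):
        self.value_type = value_type
        self.aspects = aspects
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically (Python 3.6+) while the class body is built.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the attribute definition
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Item:
    @classmethod
    def kind_info(cls):
        """Derive Kind-style metadata from the Python class itself."""
        attributes = [name for name, value in vars(cls).items()
                      if isinstance(value, One)]
        superkinds = [base.__name__ for base in cls.__bases__
                      if issubclass(base, Item)]
        return {"name": cls.__name__, "superKinds": superkinds,
                "attributes": sorted(attributes)}


class ContentItem(Item):
    body = One(str, displayName="Body")


class Note(ContentItem):
    title = One(str, displayName="Title")
```

Here `Note.kind_info()` reports `ContentItem` as the superkind purely because of inheritance, and each attribute definition keeps its aspects (such as `displayName`) as ordinary data.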
>
> This has several useful consequences. First, it makes item classes
> independent of parcel loading, which means they're easy to unit test.
> You can simply create instances of items in order to run tests on
> them. Second, it means that content classes don't need getKind()
> methods and other chicanery to get access to a Kind object, just to be
> able to create instances. Indeed, in all the ways that matter, items
> will just be normal Python objects until/unless you link them with
> items that are already stored in the repository (at which time they
> will become persistent).
>
> This means routines that create new items will no longer need to know
> what repository view the item is intended for. Instead, such routines
> can simply create an instance of the appropriate class and return it
> without further ado. As soon as the caller links the new item to a
> persisted item (e.g. by setting an attribute), the new item will be
> persisted as well. (This functionality will be made possible by the
> "null view" and "view migration" features that Andi has added to the
> repository.)
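The toy model below shows the "plain object until linked" behavior. It is an illustration under stated assumptions and not Andi's null-view or view-migration code. `Repository`, `persist()` and the `view` attribute are hypothetical names. An item whose `view` is `None` is an ordinary object. Linking it to a persisted item pulls it into that item's repository, along with anything it already refers to.

```python
# Toy illustration only: Repository, persist() and "view" are hypothetical.

class Repository:
    def __init__(self):
        self.stored = []

    def persist(self, item):
        if item.view is None:
            item.view = self              # set first, so cycles terminate
            self.stored.append(item)
            # Items the new item already links to follow it in.
            for value in list(vars(item).values()):
                if isinstance(value, Item):
                    self.persist(value)


class Item:
    def __init__(self, **values):
        object.__setattr__(self, "view", None)   # None means "not persisted"
        for name, value in values.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Linking a new item to an already-persisted item persists it too.
        if isinstance(value, Item) and self.view is not None:
            self.view.persist(value)


repo = Repository()
root = Item()
repo.persist(root)

note = Item(body="hello")    # just a Python object; no view needed
root.child = note            # linking it to a persisted item persists it
```

The routine that built `note` never needed to know about `repo`. Only the caller that linked it did.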
>
>
> Code vs. Data
> -------------
>
> Sometimes when I describe the preceding, people wonder if this use of
> Python means that we are giving up on being "data driven", or if we
> will still be able to allow users to create kinds and attributes. No,
> we are not giving up on data-driven, and we will be just as dynamic as
> before.
>
> If you're not familiar with Python's ultra-dynamic nature, it might
> seem at first that writing code must be less flexible or less dynamic
> than writing XML, but this is not at all the case. The Python code
> for a schema definition is just a script that creates data objects.
> These data objects are no different from the data objects you would
> create by reading XML. The only technical difference is that the
> Python code doesn't have to parse the XML first! (Of course, there
> are aesthetic differences, too.)
>
> Note also that just because some schema is defined by writing Python
> classes, it doesn't stop Chandler from allowing users to create
> attributes or kinds. Again, if you're used to more static languages
> like Java or C++, it's natural to think of a class as something
> fixed. But Python allows you to trivially create new classes on the
> fly. For example::
>
>     def create_a_class(docstring, base_class=object):
>         class aNewClass(base_class):
>             __doc__ = docstring
>         return aNewClass
>
> This function returns a new, distinct class object each time it's
> called. Each returned class will have the name "aNewClass", but it
> will be a distinct class object. (And you could change its name by
> setting its ``__name__`` attribute, if you wanted to.)
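Here is a quick check of the claims above, using the `create_a_class` function exactly as written. Two calls give two distinct classes that share a name and carry different docstrings. Renaming one does not affect the other, and a generated class can itself serve as a base.

```python
def create_a_class(docstring, base_class=object):
    class aNewClass(base_class):
        __doc__ = docstring
    return aNewClass


A = create_a_class("A first kind")
B = create_a_class("A second kind")

# Same name, but two different class objects with their own docstrings.
same_name = A.__name__ == B.__name__ == "aNewClass"
distinct = A is not B

B.__name__ = "Task"                       # renaming B leaves A untouched
C = create_a_class("A subkind of A", A)   # generated classes can be bases
```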
>
> If methods were defined in this "nested class" statement, they would
> have access to any parameters that were passed to ``create_a_class``,
> which would allow the methods to be customized for each new class
> created. In effect, Python is its own macro language at this level.
> Also note that there's no speed disadvantage here; the statements are
> compiled only once (when the module is compiled), no matter how many
> times you call the function and create new classes. They are not
> compiled on the fly; the statements are just the same as any other
> Python statements, and there is absolutely no observable distinction
> between the dynamically created classes and "normal" classes, because
> *all* Python classes are dynamically generated in exactly the same way!
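The sketch below illustrates the point about nested methods, using a hypothetical `make_kind` function. Each generated class gets a method that closes over the arguments of its own call. Both methods are still built from one code object that was compiled once, along with the enclosing function.

```python
def make_kind(kind_name, default_title):
    """Hypothetical factory: returns a new class customized by its arguments."""
    class Generated(object):
        def describe(self):
            # kind_name and default_title come from this particular call.
            return "%s: %s" % (kind_name, default_title)
    Generated.__name__ = kind_name
    return Generated


Task = make_kind("Task", "Untitled task")
Note = make_kind("Note", "Untitled note")

task_text = Task().describe()
note_text = Note().describe()

# Distinct function objects, but one shared, precompiled code object.
shares_code = Task.describe.__code__ is Note.describe.__code__
```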
>
> So as you can see, Python is an extremely *fluid* language, and the
> assumption that "code" is harder to change than data doesn't really
> carry over from other languages. "Hard coding" *isn't*, in other
> words. So, it's trivial to define fresh classes and descriptors to
> represent user-defined kinds and attributes, and in fact the
> repository already does this kind of class generation today to support
> multiple inheritance of kinds.
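As an example of defining a fresh class for a user-defined kind, the sketch below uses the built-in three-argument `type()` call together with a small descriptor. `UserAttribute`, `ContentItem` and `define_user_kind` are hypothetical names chosen for illustration. The point is only that the result is an ordinary class, indistinguishable from one written with a `class` statement.

```python
# Hypothetical names throughout; this is not how the repository does it.

class UserAttribute:
    """Descriptor standing in for a user-defined attribute."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class ContentItem:
    pass


def define_user_kind(kind_name, attribute_names, base=ContentItem):
    """Build a class for a kind the user created at runtime."""
    namespace = {name: UserAttribute(name) for name in attribute_names}
    namespace["__doc__"] = "User-defined kind %r" % kind_name
    return type(kind_name, (base,), namespace)


Recipe = define_user_kind("Recipe", ["servings", "cuisine"])
r = Recipe()
r.servings = 4
```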
>
> What do we gain from this? Well, it won't be necessary to keep track
> of or look up Kinds in order to create items: just create an instance
> of the class. And if there's a class for every Kind that needs to be
> referenced "statically" in code, then you won't need to also keep
> track of repository paths in order to get access to a kind; just
> import the class and ask for its kind.
>
>
> Parcel Loading
> --------------
>
> There are no plans to change the current parcel loading arrangements;
> parcel.xml will remain a valid way to define schemas and instances.
> The only change likely to be made to parcel loading is to ensure that
> a parcel's Python modules are imported before trying to process
> instances defined in the parcel.xml. This is to ensure that the kinds
> are present in the repository before the instances are created. Apart
> from this change, however, the parcel.xml format should not be impacted.
>
> Existing parcels will be changed to use the new schema definition
> mechanism on an "inside out" basis. That is, superkinds will be
> changed before subkinds. This is because kinds defined in a
> parcel.xml can refer to kinds defined in a Python module, but not the
> other way around. So, likely the contentmodel parcel will be changed
> first.
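The "inside out" ordering is a topological sort of the superkind graph, so that every superkind comes before its subkinds. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The kind names are illustrative, and `TaskEvent` is a made-up kind with two superkinds that exercises multiple inheritance. None of this is a description of the actual contentmodel parcel.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# kind -> set of its superkinds (illustrative, not the real parcel)
superkinds = {
    "ContentItem": set(),
    "Note": {"ContentItem"},
    "Task": {"ContentItem"},
    "CalendarEvent": {"ContentItem"},
    "TaskEvent": {"Task", "CalendarEvent"},  # hypothetical mixin kind
}

# static_order() yields each node only after all of its predecessors,
# i.e. superkinds before subkinds: the order in which to convert them.
conversion_order = list(TopologicalSorter(superkinds).static_order())
```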
>
> There is, however, a new step that will have to be done when new kinds
> or attribute definitions are added to a parcel defined using Python.
> Each kind or attribute needs a permanent UUID assigned to it, as this
> UUID will be used to synchronize the Python module with the
> repository, and in the future it may be used to help support schema
> evolution. Spike has a tool that will automatically assign UUIDs for
> you, so that you don't have to do it by hand::
>
> http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/src/spike/uuidgen.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup
>
>
> (Of course, it will have to be ported to work with the new Chandler
> schema API, because Spike doesn't currently integrate with the
> repository.)
>
> If you forget to run the tool over a module whose schema has changed,
> and you didn't set up the UUIDs by hand, an exception will be raised
> when you try to create instances of the new or changed classes. There
> should be a reminder in the error message telling you to run the UUID
> generation tool to resolve the error.
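The sketch below shows one way the missing-UUID check could work, assuming that each kind stores its UUID in a hypothetical `__uuid__` class attribute. `MissingUUIDError` and `suggest_uuid_line` are also made-up names. The check looks only at the class's own namespace, because an inherited UUID belongs to the superkind, not to the new subclass.

```python
import uuid


class MissingUUIDError(Exception):
    """Raised when a kind has not been assigned its permanent UUID."""


class Item:
    __uuid__ = "3f2b8a4e-0c1d-4e5f-9a6b-7c8d9e0f1a2b"  # made-up value

    def __init__(self):
        cls = type(self)
        # vars(cls), not getattr(): inherited UUIDs don't count.
        if "__uuid__" not in vars(cls):
            raise MissingUUIDError(
                "%s has no UUID assigned; please run the UUID generation "
                "tool over module %r" % (cls.__qualname__, cls.__module__))


def suggest_uuid_line():
    """The kind of line a generator tool might insert into a class body."""
    return "__uuid__ = %r" % str(uuid.uuid4())


class Note(Item):
    __uuid__ = "b7e3c1d2-5a4f-4e6b-8c9d-0a1b2c3d4e5f"  # made-up value


class Draft(Note):      # new subclass; the tool hasn't been run yet
    pass


note = Note()           # fine
try:
    Draft()
except MissingUUIDError as exc:
    message = str(exc)
```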
>
>
> API "Quick Reference"
> ---------------------
>
> It is currently an open issue where the API will live. But it's going
> to be a module called ``schema``, such that you'll do ``from somewhere
> import schema``; it's just not clear yet what ``somewhere`` will be.
> Here are the main features of interest:
>
> ``schema.Item``
> The base class for persistent items; inherit from it or a
> subclass. Note that your Python inheritance relationship will
> determine the superkind hierarchy of your newly defined kinds, so you
> will want to be sure that you subclass the appropriate base kind
> class, rather than subclassing everything directly from ``schema.Item``.
>
> ``schema.One``
> Define an attribute of "single" cardinality, optionally specifying
> any attribute aspects like its type and display name.
>
> ``schema.Many``
> Define an attribute of "set" cardinality (once this is available
> in the repository), optionally specifying any attribute aspects like
> its type and display name.
>
> ``schema.Sequence``
> Define an attribute of "list" cardinality, optionally specifying
> any attribute aspects like its type and display name.
>
> ``schema.Mapping``
> Define an attribute of "dict" cardinality, optionally specifying
> any attribute aspects like its type and display name.
>
> ``schema.Cloud``
> Define a cloud attribute. (This isn't entirely worked out yet;
> Spike was using a different approach to the cloud concept, so I may
> need some assistance from someone wise in the ways of clouds before
> getting a concrete API defined for this.)
>
> In order to reference types (as opposed to kinds), you'll import them
> from ``repository.schema.Types``. For example, ``Types.String`` to
> define a string attribute. For attributes that reference other kinds,
> you'll just import the corresponding class directly from the
> appropriate module.
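To make the quick reference concrete, the four attribute constructors can be sketched as plain descriptors. Only the cardinality names ("single", "set", "list", "dict") come from the list above. The class names mirror the proposal, but their behavior is an assumption, and Python builtins such as `str` stand in for `Types.String` and friends. Multi-valued attributes start out as empty containers, while an unset single-valued attribute raises `AttributeError`.

```python
# Hypothetical stand-ins for schema.One/Many/Sequence/Mapping.

class _Attribute:
    cardinality = None

    def __init__(self, value_type=None, **aspects):
        self.value_type = value_type   # e.g. str here; Types.String in Chandler
        self.aspects = aspects         # displayName, doc, ...
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def _empty(self):
        """Initial value for an unset attribute; None means 'no default'."""
        return None

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the attribute definition
        if self.name not in instance.__dict__:
            empty = self._empty()
            if empty is None:
                raise AttributeError(self.name)
            instance.__dict__[self.name] = empty
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class One(_Attribute):
    cardinality = "single"


class Many(_Attribute):
    cardinality = "set"

    def _empty(self):
        return set()


class Sequence(_Attribute):
    cardinality = "list"

    def _empty(self):
        return []


class Mapping(_Attribute):
    cardinality = "dict"

    def _empty(self):
        return {}


class Contact:  # illustrative attributes, not the real Contact kind
    fullName = One(str, displayName="Full Name")
    tags = Many(str, displayName="Tags")
    emailAddresses = Sequence(str, displayName="Email Addresses")
    phoneNumbers = Mapping(str, displayName="Phone Numbers")


c = Contact()
c.tags.add("friend")
c.emailAddresses.append("someone@example.com")
c.phoneNumbers["work"] = "555-0100"
```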
>
> Attribute aspects will mostly be keyword arguments to the attribute
> definitions. Inverse attributes for bidirectional relationships will
> be specified with an ``inverse`` keyword, and as in Spike they will
> refer to an attribute of the other class. For example::
>
>     class ContentItem(schema.Item):
>         ...
>         creator = schema.One(
>             displayName = "Created By",
>             doc = "Link to the contact who created the item",
>         )
>
>     class Contact(ContentItem):
>         itemsCreated = schema.Many(
>             ContentItem,  # set of ContentItems
>             inverse = ContentItem.creator,
>             # ... other aspects
>         )
>
> Notice that the inverse need only be specified on *one* side of the
> bidirectional relationship -- whichever side is defined last.
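The sketch below shows how declaring the inverse on only one side can still link both sides. The attribute defined last receives the earlier definition through `inverse=` and back-patches it. The `One` and `Many` classes here are hypothetical and far simpler than the real API. Only the declaration style follows the proposal. To keep the example short, links are changed through the single-valued side; the set-valued side is read-only.

```python
# Hypothetical One/Many; only the "inverse=Other.attr" style is from the proposal.

class One:
    def __init__(self, value_type=None, inverse=None, **aspects):
        self.value_type, self.inverse, self.aspects = value_type, inverse, aspects
        if inverse is not None:
            inverse.inverse = self          # back-patch the earlier side

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        old = instance.__dict__.get(self.name)
        instance.__dict__[self.name] = value
        if self.inverse is not None:        # keep the other side in sync
            if old is not None:
                self.inverse.__get__(old, type(old)).discard(instance)
            if value is not None:
                self.inverse.__get__(value, type(value)).add(instance)


class Many:
    def __init__(self, value_type=None, inverse=None, **aspects):
        self.value_type, self.inverse, self.aspects = value_type, inverse, aspects
        if inverse is not None:
            inverse.inverse = self          # back-patch the earlier side

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.setdefault(self.name, set())

    def __set__(self, instance, value):
        raise AttributeError("link items via the single-valued side")


class ContentItem:
    creator = One(displayName="Created By")


class Contact(ContentItem):
    itemsCreated = Many(ContentItem, inverse=ContentItem.creator)


alice, bob, note = Contact(), Contact(), ContentItem()
note.creator = alice     # alice.itemsCreated now contains note
note.creator = bob       # moved: removed from alice, added to bob
```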
>
>
> Implementation Tasks
> ====================
>
> 1. Update Spike's code generator tests to use the repository's new
> "null view" instead of a memory repository. (DONE; this yielded a 40%
> speed improvement for the tests, dropping pack load time from roughly
> 1.3 seconds to about 0.8 seconds.)
>
> 2. Add Spike tests to prototype programmatic creation of repository
> Kinds and Attributes, and setting their UUIDs at construction time.
>
> 3. Test subclassing the repository's new C-based descriptor types and
> adding Spike-style metadata to them.
>
> 4. Implement the actual schema API and doctests in the main Chandler
> codebase for Kinds and Attributes. (This is pending a decision of
> where the API should live in the Chandler package namespace; maybe
> that decision can be wrapped up next week while I'm in SFO.)
>
> 5. Define and implement a cloud-definition API (probably needs some
> input from persons Wise in the Ways of Clouds).
>
> 6. Port Spike's UUID generation tool (and docs) to work with modules
> using the Chandler schema API.
>
> 7. Attempt a port of the ``contentmodel`` parcel using the API,
> possibly with participation from others. (Note: Andi would need to have
> completed the repository auto-import feature before this would
> actually be usable in the Chandler application.)
>
> 8. Modify the parcel loading facilities to ensure that modules
> defining kinds are imported before loading parcel.xml files that
> define instances of those kinds. (This might need to be done by
> someone other than me; it might also require some minor changes to
> existing parcels or to the rules for how parcel loading is sequenced.)
>
> 9. Investigate possible synergy between the descriptor-level aspect
> caching that Andi wants to do for performance reasons, and the aspect
> setting that the schema API needs to do for schema definition
> reasons. (This will probably actually happen while I'm in SFO next
> week; it's only at the bottom of this list because it's optional in
> the general scheme of things.)
>
> 10. Investigate the feasibility of implementing Spike's
> ``schema.Relationship`` concept for Chandler, to allow creation of
> global attributes that don't appear in a class' static API, allowing
> parcels to expand/extend existing parcels.
>
>
> In Conclusion
> =============
>
> * Python class definitions offer a compact and convenient way to
> specify Chandler schemas that will be easier and less error-prone to
> use than parcel.xml, without losing any of Chandler's current or
> planned flexibility.
>
> * parcel.xml isn't going away, and during the transition any schema
> components defined in parcel.xml should be able to co-exist with those
> defined using Python (barring any inter-dependency issues and assuming
> no other issues arise).
>
> * Using Python-defined schema means that content items can be unit
> tested in isolation, without parcel loading overhead. That makes fast
> unit tests possible and enables a test-driven approach to developing
> the non-UI portions of Chandler. It also reduces coupling between
> routines that currently have to ferry repository views or items around
> in order to be able to find kinds and set parents on newly created items.
>
> I hope that this was informative and helpful. I will be in OSAF's San
> Francisco offices next Monday through Thursday (April 18th-21st), so
> if you'd like to spend some time talking about any aspect of this
> proposal during those days, please let me know. Thanks!
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Open Source Applications Foundation "Dev" mailing list
> http://lists.osafoundation.org/mailman/listinfo/dev
>