[Dev] DRAFT: Python Schema API proposal

Bryan Stearns stearns at osafoundation.org
Mon Apr 18 11:45:46 PDT 2005


Phillip,

How does the plan below deal with stamping?

...Bryan


Phillip J. Eby wrote:

> -------------------------------------
> Defining Chandler Schemas with Python
> -------------------------------------
>
>
> Introduction
> ============
>
> As many of you may know, I've been promoting for some time now the
> idea of replacing parcel XML with Python code for defining item
> schemas, and I created a proof-of-concept for this in the "Spike"
> project, found under 'internal' in the Chandler CVS.
>
> Since the PyCon sprints, it's my understanding that there's now a 
> broad and actionable consensus at OSAF that it is indeed desirable to 
> move to using Python syntax in place of XML for parcels' schema 
> definition.  So, after working with Andi and Grant to get the 
> necessary infrastructure in place within Chandler, I'd like to present 
> my proposal for what the Python schema definitions will look like, how 
> migration might take place, and what new possibilities for Chandler 
> development these changes will enable.
>
> If you haven't had a chance to look at Spike yet, you may find it 
> helpful to read at least the "Introduction" section of this document:
>
> http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/src/spike/schema.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup 
>
>
> which presents a simple Python syntax for defining schemas.  The 
> actual syntax used in Chandler will be different, but the above 
> document gives a good introduction to the concept, with lots of 
> working examples.  (In fact, the document is designed for use with 
> Python's "doctest" module and is literally a part of Spike's unit 
> tests.  As much as is practical, I'll be using this approach for the 
> changes to Chandler, so that the API will be documented and tested at 
> the same time as it's developed.)
>
> You'll notice, by the way, that the documentation doesn't talk much 
> about Kinds, or names, paths, repository views, and parents.  That's 
> because in Spike's API, you don't need any of these things in order to 
> create an Item.  You just create the item, and until you take some 
> action to store it, it's simply an ordinary Python object.
>
>
> How It Will Work
> ================
>
> Here's a snippet of XML from the parcel.xml of the osaf.contentmodel 
> package::
>
>     <Kind itsName="ContentItem">
>         <superKinds itemref="Item"/>
>         <classes key="python">osaf.contentmodel.ContentModel.ContentItem</classes>
>         <description>Content Item is the abstract super-kind for 
> things like Contacts, Calendar Events, Tasks, Mail Messages, and 
> Notes. Content Items are user-level items, which a user might file, 
> categorize, share, and delete.</description>
>         <Attribute itsName="body">
>             <displayName>Body</displayName>
>             <type itemref="Lob"/>
>             <description>All Content Items may have a body to contain 
> notes.  It's not decided yet whether this body would instead contain 
> the payload for resource items such as presentations or spreadsheets 
> -- resource items haven't been nailed down yet -- but the payload may 
> be different from the notes because payload needs to know MIME type, 
> etc.</description>
>         </Attribute>
>
> Here's the corresponding code in the proposed schema API::
>
>     from application import schema    # not sure if this is where it will go
>     from repository.schema import Types
>
>     class ContentItem(schema.Item):
>         """Base class for content items
>
>         A content item (such as a contact, note, photo, etc.)  Content
>         objects are user-level items that a user might file, categorize,
>         share, and delete.
>         """
>
>         body = schema.One(Types.Lob,
>             displayName = "Body",
>             doc = """\
>             All Content Items may have a body to contain notes.  It's not
>             decided yet whether this body would instead contain the payload
>             for resource items such as presentations or spreadsheets --
>             resource items haven't been nailed down yet -- but the payload
>             may be different from the notes because payload needs to know
>             MIME type, etc."""
>         )
>
> The fundamental idea here is that Python class definitions replace 
> Kind elements, and Python property definitions replace Attribute 
> elements.  Superkinds are defined by inheritance.  Parcels are Python 
> packages.  Standard Python "import" statements replace XML namespace 
> definitions.
>
> This has several useful consequences.  First, it makes item classes 
> independent of parcel loading, which means they're easy to unit test.  
> You can simply create instances of items in order to run tests on 
> them.  Second, it means that content classes don't need getKind() 
> methods and other chicanery to get access to a Kind object, just to be 
> able to create instances.  Indeed, in all the ways that matter, items 
> will just be normal Python objects until/unless you link them with 
> items that are already stored in the repository (at which time they 
> will become persistent).
>
> This means routines that create new items will no longer need to know 
> what repository view the item is intended for.  Instead, such routines 
> can simply create an instance of the appropriate class and return it 
> without further ado.  As soon as the caller links the new item to a 
> persisted item (e.g. by setting an attribute), the new item will be 
> persisted as well.  (This functionality will be made possible by the 
> "null view" and "view migration" features that Andi has added to the 
> repository.)
>
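A toy sketch of the "persist on link" behaviour described above may help. It does not use the repository at all: ``ToyView``, ``ToyItem`` and ``make_note`` are hypothetical names invented for illustration, and the real "null view" and "view migration" features may work quite differently.

```python
class ToyView(object):
    """Hypothetical stand-in for a repository view."""

    def __init__(self, name):
        self.name = name
        self.items = set()


class ToyItem(object):
    """Hypothetical item that starts out in no view at all."""

    def __init__(self, view=None):
        object.__setattr__(self, "view", None)
        if view is not None:
            self._move_to(view)

    def _move_to(self, view):
        # Bypass our own __setattr__ so that moving does not recurse.
        object.__setattr__(self, "view", view)
        view.items.add(self)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if isinstance(value, ToyItem):
            # Linking an unstored item to a stored one stores it too.
            if value.view is not None and self.view is None:
                self._move_to(value.view)
            elif self.view is not None and value.view is None:
                value._move_to(self.view)


def make_note(text):
    # The creating routine never needs to know about any view.
    note = ToyItem()
    note.text = text
    return note


main_view = ToyView("main")
folder = ToyItem(main_view)      # already "persisted"
note = make_note("hello")
was_unstored = note.view is None
folder.latest = note             # linking migrates the note
```

The point of the sketch is only that ``make_note`` takes no view argument; the caller decides where the item ends up by linking it.
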
>
> Code vs. Data
> -------------
>
> Sometimes when I describe the preceding, people wonder if this use of 
> Python means that we are giving up on being "data driven", or if we 
> will still be able to allow users to create kinds and attributes.  No,
> we are not giving up on being data-driven, and Chandler will be just as
> dynamic as before.
>
> If you're not familiar with Python's ultra-dynamic nature, it might
> seem at first that writing code must be less flexible or less dynamic
> than writing XML, but this is not at all the case.  The Python code
> for a schema definition is just a script that creates data objects.  
> These data objects are no different than the data objects you would 
> create by reading XML.  The only technical difference is that the 
> Python code doesn't have to parse the XML first!  (Of course, there 
> are aesthetic differences, too.)
>
> Note also that just because some schema is defined by writing Python 
> classes, it doesn't stop Chandler from allowing users to create 
> attributes or kinds.  Again, if you're used to more static languages 
> like Java or C++, it's natural to think of a class as something 
> fixed.  But Python allows you to trivially create new classes on the 
> fly.  For example::
>
>     def create_a_class(docstring,base_class=object):
>         class aNewClass(base_class):
>             __doc__ = docstring
>         return aNewClass
>
> This function returns a new, distinct class object each time it's 
> called.  Each returned class will have the name "aNewClass", but it 
> will be a distinct class object.  (And you could change its name by 
> setting its ``__name__`` attribute, if you wanted to.)
>
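A short usage sketch of the function above, repeated here so the block runs on its own. The variable names ``first`` and ``second`` and the docstrings passed in are made up for illustration.

```python
def create_a_class(docstring, base_class=object):
    class aNewClass(base_class):
        __doc__ = docstring
    return aNewClass


first = create_a_class("First generated class")
second = create_a_class("Second generated class", base_class=dict)

# Both classes report the same name, yet they are separate objects.
same_name = first.__name__ == second.__name__ == "aNewClass"
distinct = first is not second

# Renaming a class is ordinary attribute assignment.
first.__name__ = "Note"
```
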
> If methods were defined in this "nested class" statement, they would 
> have access to any parameters that were passed to ``create_a_class``, 
> which would allow the methods to be customized for each new class 
> created.  In effect, Python is its own macro language at this level.  
> Also note that there's no speed disadvantage here; the statements are 
> compiled only once (when the module is compiled), no matter how many 
> times you call the function and create new classes.  They are not 
> compiled on the fly; the statements are just the same as any other 
> Python statements, and there is absolutely no observable distinction 
> between the dynamically created classes and "normal" classes, because 
> *all* Python classes are dynamically generated in exactly the same way!
>
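As a minimal sketch of methods picking up the factory's parameters, consider the following. ``make_kind_class``, ``kind_name`` and ``default_title`` are hypothetical names chosen for this example, not part of any proposed API.

```python
def make_kind_class(kind_name, default_title):
    """Build a class whose methods are customized by the arguments."""

    class GeneratedItem(object):
        def __init__(self, title=None):
            # default_title is looked up in the enclosing call.
            self.title = default_title if title is None else title

        def describe(self):
            # kind_name is likewise captured from the enclosing call.
            return "%s: %s" % (kind_name, self.title)

    GeneratedItem.__name__ = kind_name
    return GeneratedItem


Task = make_kind_class("Task", "Untitled task")
Note = make_kind_class("Note", "Untitled note")
```

Each call yields a class whose methods see that call's own arguments, so ``Task`` and ``Note`` behave differently even though they come from the same statements.
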
> So as you can see, Python is an extremely *fluid* language, and the 
> assumption that "code" is harder to change than data doesn't really 
> carry over from other languages.  "Hard coding" *isn't*, in other 
> words.  So, it's trivial to define fresh classes and descriptors to 
> represent user-defined kinds and attributes, and in fact the 
> repository already does this kind of class generation today to support 
> multiple inheritance of kinds.
>
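One way user-defined kinds and attributes could be turned into classes and descriptors at run time is sketched below, using only the built-in ``type()`` and the descriptor protocol. ``SimpleAttribute`` and ``define_kind`` are hypothetical helpers for illustration; this is not how the repository actually generates its classes.

```python
class SimpleAttribute(object):
    """Hypothetical stand-in for a schema attribute descriptor."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self          # accessed on the class itself
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


def define_kind(name, bases=(object,), **attributes):
    """Create a new class from data: a name plus attribute defaults."""
    namespace = dict(
        (attr, SimpleAttribute(attr, default))
        for attr, default in attributes.items()
    )
    return type(name, bases, namespace)


Recipe = define_kind("Recipe", servings=4)

# Attributes can also be added after the class already exists.
setattr(Recipe, "cuisine", SimpleAttribute("cuisine", "unknown"))

r = Recipe()
r.servings = 6
```
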
> What do we gain from this?  Well, it won't be necessary to keep track 
> of or look up Kinds in order to create items: just create an instance 
> of the class.  And if there's a class for every Kind that needs to be 
> referenced "statically" in code, then you won't need to also keep 
> track of repository paths in order to get access to a kind; just 
> import the class and ask for its kind.
>
>
> Parcel Loading
> --------------
>
> There are no plans to change the current parcel loading arrangements; 
> parcel.xml will remain a valid way to define schemas and instances.  
> The only change likely to be made to parcel loading is to ensure that 
> a parcel's Python modules are imported before trying to process 
> instances defined in the parcel.xml.  This is to ensure that the kinds 
> are present in the repository before the instances are created.  Apart 
> from this change, however, the parcel.xml format should not be impacted.
>
> Existing parcels will be changed to use the new schema definition 
> mechanism on an "inside out" basis.  That is, superkinds will be 
> changed before subkinds.  This is because kinds defined in a 
> parcel.xml can refer to kinds defined in a Python module, but not the 
> other way around.  So, likely the contentmodel parcel will be changed 
> first.
>
> There is, however, a new step that will have to be done when new kinds 
> or attribute definitions are added to a parcel defined using Python.  
> Each kind or attribute needs a permanent UUID assigned to it, as this 
> UUID will be used to synchronize the Python module with the 
> repository, and in the future it may be used to help support schema 
> evolution.  Spike has a tool that will automatically assign UUIDs for
> you, so that you don't have to do it by hand:
>
> http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/src/spike/uuidgen.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup 
>
>
> (Of course, it will have to be ported to work with the new Chandler 
> schema API, because Spike doesn't currently integrate with the 
> repository.)
>
> If you forget to run the tool over a module whose schema has changed, 
> and you didn't set up the UUIDs by hand, an exception will be raised 
> when you try to create instances of the new or changed classes.  There 
> should be a reminder in the error message telling you to run the UUID 
> generation tool to resolve the error.
>
>
> API "Quick Reference"
> ---------------------
>
> It is currently an open issue where the API will live.  But it's going 
> to be a module called ``schema``, such that you'll do ``from somewhere 
> import schema``; it's just not clear yet what ``somewhere`` will be.  
> Here are the main features of interest:
>
> ``schema.Item``
>     The base class for persistent items; inherit from it or a 
> subclass.  Note that your Python inheritance relationship will 
> determine the superkind hierarchy of your newly defined kinds, so you 
> will want to be sure that you subclass the appropriate base kind 
> class, rather than subclassing everything directly from ``schema.Item``.
>
> ``schema.One``
>     Define an attribute of "single" cardinality, optionally specifying 
> any attribute aspects like its type and display name.
>
> ``schema.Many``
>     Define an attribute of "set" cardinality (once this is available 
> in the repository), optionally specifying any attribute aspects like 
> its type and display name.
>
> ``schema.Sequence``
>     Define an attribute of "list" cardinality, optionally specifying 
> any attribute aspects like its type and display name.
>
> ``schema.Mapping``
>     Define an attribute of "dict" cardinality, optionally specifying 
> any attribute aspects like its type and display name.
>
> ``schema.Cloud``
>     Define a cloud attribute.  (This isn't entirely worked out yet; 
> Spike was using a different approach to the cloud concept, so I may 
> need some assistance from someone wise in the ways of clouds before 
> getting a concrete API defined for this.)
>
> In order to reference types (as opposed to kinds), you'll import them 
> from ``repository.schema.Types``.  For example, ``Types.String`` to 
> define a string attribute.  For attributes that reference other kinds, 
> you'll just import the corresponding class directly from the 
> appropriate module.
>
> Attribute aspects will mostly be keyword arguments to the attribute 
> definitions.  Inverse attributes for bidirectional relationships will 
> be specified with an ``inverse`` keyword, and as in Spike they will 
> refer to an attribute of the other class.  For example::
>
>     class ContentItem(schema.Item):
>         ...
>         creator = schema.One(
>             displayName = "Created By",
>             doc = "Link to the contact who created the item",
>         )
>
>     class Contact(ContentItem):
>         itemsCreated = schema.Many(
>             ContentItem,    # set of ContentItem
>             inverse = ContentItem.creator,
>             ...
>         )
>
> Notice that the inverse need only be specified on *one* side of the 
> bidirectional relationship -- whichever side is defined last.
>
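To make the "specify the inverse on one side only" idea concrete, here is a toy pair of descriptors that keep both ends of a one-to-many link in sync. These ``One`` and ``Many`` classes are stand-ins written for this sketch, not the proposed ``schema`` API; ``Role``, ``target`` and the private storage keys are assumptions, and only the One/Many pairing is handled.

```python
class Role(object):
    """Toy base descriptor; NOT the Chandler schema API."""

    def __init__(self, target=None, inverse=None, **aspects):
        self.target = target
        self.aspects = aspects      # e.g. displayName, doc
        self.inverse = inverse
        if inverse is not None:
            # Naming the inverse on one side links both sides.
            inverse.inverse = self

    def __set_name__(self, owner, name):
        self.name = name
        self.key = "_" + name       # private per-instance storage slot


class Many(Role):
    def _members(self, obj):
        return obj.__dict__.setdefault(self.key, set())

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return frozenset(self._members(obj))


class One(Role):
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.key)

    def __set__(self, obj, value):
        old = obj.__dict__.get(self.key)
        if old is not None and self.inverse is not None:
            self.inverse._members(old).discard(obj)
        obj.__dict__[self.key] = value
        if value is not None and self.inverse is not None:
            self.inverse._members(value).add(obj)


class ContentItem(object):
    creator = One(displayName="Created By")


class Contact(ContentItem):
    itemsCreated = Many(ContentItem, inverse=ContentItem.creator)


alice = Contact()
note = ContentItem()
note.creator = alice    # alice.itemsCreated now contains note
```

In this sketch the "many" side is read-only and is maintained entirely by assignments to the "one" side, which keeps the example short; a real implementation would presumably allow updates from either end.
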
>
> Implementation Tasks
> ====================
>
> 1. Update Spike's code generator tests to use the repository's new 
> "null view" instead of a memory repository.  (DONE; this yielded a 40% 
> speed improvement for the tests, dropping pack load time from roughly 
> 1.3 seconds to about 0.8 seconds.)
>
> 2. Add Spike tests to prototype programmatic creation of repository 
> Kinds and Attributes, and setting their UUIDs at construction time.
>
> 3. Test subclassing the repository's new C-based descriptor types and 
> adding Spike-style metadata to them.
>
> 4. Implement the actual schema API and doctests in the main Chandler
> codebase for Kinds and Attributes.  (This is pending a decision on
> where the API should live in the Chandler package namespace; maybe
> that decision can be wrapped up next week while I'm in SFO.)
>
> 5. Define and implement a cloud-definition API (probably needs some 
> input from persons Wise in the Ways of Clouds)
>
> 6. Port Spike's UUID generation tool (and docs) to work with modules 
> using the Chandler schema API
>
> 7. Attempt a port of the ``contentmodel`` parcel using the API,
> possibly with participation by others.  (Note: Andi would need to have
> completed the repository auto-import feature before this would 
> actually be usable in the Chandler application.)
>
> 8. Modify the parcel loading facilities to ensure that modules 
> defining kinds are imported before loading parcel.xml files that 
> define instances of those kinds.  (This might need to be done by 
> someone other than me; it might also require some minor changes to 
> existing parcels or to the rules for how parcel loading is sequenced.)
>
> 9. Investigate possible synergy between the descriptor-level aspect 
> caching that Andi wants to do for performance reasons, and the aspect 
> setting that the schema API needs to do for schema definition 
> reasons.  (This will probably actually happen while I'm in SFO next 
> week; it's only at the bottom of this list because it's optional in 
> the general scheme of things.)
>
> 10. Investigate the feasibility of implementing Spike's 
> ``schema.Relationship`` concept for Chandler, to allow creation of 
> global attributes that don't appear in a class' static API, allowing 
> parcels to expand/extend existing parcels.
>
>
> In Conclusion
> =============
>
> * Python class definitions offer a compact and convenient way to 
> specify Chandler schemas that will be easier and less error-prone to 
> use than parcel.xml, without losing any of Chandler's current or 
> planned flexibility.
>
> * parcel.xml isn't going away, and during the transition any schema 
> components defined in parcel.xml should be able to co-exist with those 
> defined using Python (barring any inter-dependency issues and assuming 
> no other issues arise).
>
> * Using Python-defined schema means that content items can be unit 
> tested in isolation, without parcel loading overhead, making fast unit 
> tests possible, enabling a test-driven approach to development of the 
> non-UI portions of Chandler.  It also reduces coupling between 
> routines that currently have to ferry repository views or items around 
> in order to be able to find kinds and set parents on newly created items.
>
> I hope that this was informative and helpful.  I will be in OSAF's San 
> Francisco offices next Monday through Thursday (April 18th-21st), so 
> if you'd like to spend some time talking about any aspect of this 
> proposal during those days, please let me know.  Thanks!
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Open Source Applications Foundation "Dev" mailing list
> http://lists.osafoundation.org/mailman/listinfo/dev
>

