[Dev] schema evolution discussion at PyCon

Phillip J. Eby pje at telecommunity.com
Tue Feb 28 11:52:41 PST 2006


So today, Katie, Ted, Grant, John, and others took part in a 
discussion about how to do upgrades in 0.6->0.7->0.8, and we 
identified a general strategy for "worst case" upgrades, as well as 
some opportunities for less-worse cases.  Here are the scenarios:

1. Transparent upgrade - new parcels, new kinds, new attributes on 
existing kinds.  This pretty much works today, except for new 
attributes.  New attributes with an initialValue will also need some 
code to apply that initial value to existing 
items.  These features can probably be implemented entirely within 
the schema API.

2. Incremental upgrade - moves, renames, attribute type changes, and 
other changes that can be implemented by referring to only a *single 
item at a time*.  These could possibly be implemented by giving an 
"upgrade" classmethod access to a view that has the old schema in 
effect, or perhaps a set of values/references in a dictionary.  The 
old item would be classless, and any incompatible data would first be 
purged from the current view.  Thus, the upgrade method would be 
responsible for copying/transforming values from the old view to the 
new view in the changed format.  Implementing this would require 
repository changes as well as schema API features.  The schema API 
would need to be able to detect a schema change, and each class would 
require a schema version number (optional for the first version of a 
class, required as soon as any non-transparent changes are made).

3. Full upgrade - any non-incremental, non-transparent change (such 
as a repository format change, or a complex schema change) would 
require a backup of the existing data to some serialized format, 
which would then be reloaded in a clean repository.  Each parcel will 
be responsible for its own data, but some to-be-designed framework 
will manage the overall process.  Parcels will have "backup" and 
"restore" methods to do this, which will be called by the 
framework.  Parcels without explicit methods will get default 
implementations that will save and load the data, but will be unable 
to support schema upgrades without additional code.  That is, if a 
parcel's schema changes, its "restore" method at least needs to 
change to be able to load the old backup format into the new schema.
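
To make scenario 2 a little more concrete, here is a rough sketch of what a per-class version number and "upgrade" classmethod might look like.  It uses the "dictionary of old values" variant rather than an old-schema view.  Everything in it (the Contact class, schema_version, load_item, the renamed attributes) is made up for illustration and is not part of the existing schema API:

```python
# Hypothetical sketch: none of these names come from the real schema API.

class Contact:
    # Schema version for this class; bumped on each non-transparent change.
    schema_version = 2

    def __init__(self, full_name, emails):
        self.full_name = full_name
        self.emails = emails

    @classmethod
    def upgrade(cls, old_version, old_values):
        """Build a current-schema item from ONE old item's raw values."""
        values = dict(old_values)  # work on a copy; leave the old data alone
        if old_version < 2:
            # v1 -> v2: 'name' was renamed to 'full_name', and the single
            # 'email' string became a list called 'emails'.
            values["full_name"] = values.pop("name")
            email = values.pop("email", None)
            values["emails"] = [email] if email else []
        return cls(**values)


def load_item(cls, stored_version, stored_values):
    """Upgrade an item only if its stored schema version is out of date."""
    if stored_version > cls.schema_version:
        raise ValueError("item was written by a newer schema")
    if stored_version < cls.schema_version:
        return cls.upgrade(stored_version, stored_values)
    return cls(**stored_values)


old = {"name": "Ada Lovelace", "email": "ada@example.com"}
contact = load_item(Contact, 1, old)
print(contact.full_name, contact.emails)
```

The point of the sketch is that upgrade() only ever sees one item's data at a time, which is what separates this scenario from a full upgrade.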

If we only implemented one of the above scenarios, it would have to 
be #3, because it can support any kind of change, whereas the other 
two can only support certain classes of simple changes.  It was also 
proposed that we could support upgrades from 0.6 by backporting the 
"backup" methods to a 0.6.x release, thus allowing 0.7 to "restore" 
the resulting backups.

We also talked about the actual mechanism for doing an upgrade, which 
might need to consist initially of running the old version with a 
command line option to perform a backup, then running the new version 
to do the restore.  This could potentially be done by an installer to 
give a better UI for people who don't use command lines.
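
As a sketch only, the two-step mechanism could be driven by a pair of command line options along these lines.  The option names (--backup, --restore) and the plan() helper are assumptions made for illustration; they are not existing options:

```python
import argparse


def build_parser():
    # Hypothetical options; the real option names are still to be decided.
    parser = argparse.ArgumentParser(prog="app")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--backup", metavar="FILE",
                       help="write all parcel data to FILE, then exit")
    group.add_argument("--restore", metavar="FILE",
                       help="load parcel data from FILE into a clean repository")
    return parser


def plan(argv):
    """Turn command line arguments into an (action, path) pair."""
    args = build_parser().parse_args(argv)
    if args.backup:
        return ("backup", args.backup)
    if args.restore:
        return ("restore", args.restore)
    return ("run", None)


# What an installer might do on the user's behalf:
steps = [
    plan(["--backup", "data.bak"]),   # run with the OLD version
    plan(["--restore", "data.bak"]),  # run with the NEW version
]
print(steps)
```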

The next steps are to design and implement 
the backup/restore framework, and define actual serialization 
format(s) for our existing parcels.  Once that's done, we can 
consider adding incremental/transparent update features as 
well.  There will need to be broader discussions on all of these 
topics, but that may be able to wait until there's at least a design 
proposal for the backup/restore framework.  There's also some 
potential for overlap between the serialization format(s) and the 
"stable sharing protocol" that needs to happen in 0.7, but it's not 
yet clear how much.
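
Purely as a strawman for that design discussion, here is one possible shape for the backup/restore framework.  The Parcel base class, the format_version attribute, backup_all/restore_all, and the choice of JSON as the serialized format are all assumptions made for the sake of the example, not decisions:

```python
import json


class Parcel:
    """Default implementation: saves and reloads data, no schema upgrades."""
    name = "base"
    format_version = 1

    def __init__(self):
        self.items = []

    def backup(self):
        return {"format": self.format_version, "items": list(self.items)}

    def restore(self, data):
        if data["format"] != self.format_version:
            raise ValueError("%s: cannot restore format %r"
                             % (self.name, data["format"]))
        self.items = list(data["items"])


class OldCalendarParcel(Parcel):
    """Stands in for the parcel as shipped in the old release."""
    name = "calendar"


class CalendarParcel(Parcel):
    """New release: the schema changed, so restore() had to change too."""
    name = "calendar"
    format_version = 2

    def restore(self, data):
        items = data["items"]
        if data["format"] == 1:
            # Format 1 stored 'title'; format 2 calls it 'summary'.
            items = [{"summary": i["title"], "start": i["start"]}
                     for i in items]
        elif data["format"] != self.format_version:
            raise ValueError("calendar: cannot restore format %r"
                             % data["format"])
        self.items = [dict(i) for i in items]


def backup_all(parcels):
    """Framework side: ask each parcel to serialize its own data."""
    return json.dumps({p.name: p.backup() for p in parcels})


def restore_all(parcels, text):
    """Framework side: hand each parcel back its own saved data."""
    saved = json.loads(text)
    for p in parcels:
        if p.name in saved:
            p.restore(saved[p.name])


old = OldCalendarParcel()
old.items = [{"title": "PyCon", "start": "2006-02-24"}]
dump = backup_all([old])       # done by the old version

new = CalendarParcel()
restore_all([new], dump)       # done by the new version
print(new.items)
```

Here the default Parcel.restore() refuses a backup in any other format, while CalendarParcel overrides it to read the old format into the new schema.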

And that's where we broke for lunch.  :)  Any questions?  Did I miss anything?




