[Dev] Keeping Users' Data

Phillip J. Eby pje at telecommunity.com
Mon Feb 6 15:43:10 PST 2006


At 12:05 AM 2/3/2006 -0800, Morgen Sagen wrote:
>Although I had always assumed we would have code within each parcel
>that knew how to upgrade its data from the previous version to the
>current version, a more elegant solution would be to have a log of
>schema changes coupled with some transformation code that could apply
>those changes, as you described in your October email.  So how do
>those two methods compare?  You say:
>
>>"With sufficient care and infrastructure support, we can relatively
>>easily support manual schema upgrades, in the sense of having
>>installParcel() make the changes, if we entirely forbid certain
>>classes of schema change that could not be implemented in this way.
>>However, the amount of developer care required currently appears
>>prohibitive, in the sense that it's going to seriously impede our
>>flexibility to refactor."
>
>Do you mean that if we did take the route of putting hand-built
>transformation code into installParcel(), the amount of
>transformation code would be unwieldy?

I don't know.  I don't know what kind of changes we're going to have.  The 
biggest question of all is, when does this discipline begin?  If we don't 
need to support upgrades before the release of 0.7, then there's a lot less 
to be done, and it's not certain that we need to provide any significant 
evolution infrastructure until 0.8.  For one thing, we can try to complete 
major moves before then, and we can make an effort to document and prepare 
for the freezing of key schemas.


>If we go the schema-change-log route, developers would still have to
>create log entries for each change, or are you saying we could
>automatically build that log?

That all depends.  :)  So far, my observation has been that there are 
approaches that can manage detection of simple kinds of changes, but 
serious changes are harder to deal with.
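To make the distinction concrete, here's a minimal sketch of what a schema-change log with mechanical replay might look like. All the names (RenameAttribute, AddAttribute, the Event attributes) are hypothetical illustrations, not actual Chandler APIs, and items are modeled as plain dicts; the point is just that "simple" changes can be declarative records that replay themselves, while anything more serious would need a hand-written transform:

```python
class RenameAttribute:
    """A logged change: move a value from an old attribute name to a new one."""
    def __init__(self, old_name, new_name):
        self.old_name, self.new_name = old_name, new_name

    def apply(self, item):
        # Only touch items that actually carry the old attribute.
        if self.old_name in item:
            item[self.new_name] = item.pop(self.old_name)


class AddAttribute:
    """A logged change: add a new attribute with a default, if absent."""
    def __init__(self, name, default):
        self.name, self.default = name, default

    def apply(self, item):
        item.setdefault(self.name, self.default)


# The log itself is just an ordered list of changes between two
# schema versions; these entries are invented for illustration.
CHANGE_LOG = [
    RenameAttribute("startTime", "effectiveStartTime"),
    AddAttribute("transparency", "confirmed"),
]


def upgrade(item):
    """Replay every logged change, in order, against one item."""
    for change in CHANGE_LOG:
        change.apply(item)
    return item
```

A log like this could in principle be built automatically by diffing two schema versions, but only for the kinds of change that fit a declarative record; a split or merge of kinds would still need custom code.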

In all honesty, the best recommendations I've seen suggest that having a 
well-defined externalization of your data (e.g. read/write XML, iCal, 
tab-delimited, etc.) is usually the best way to ensure upgradeability.  In 
effect, the object-oriented approach to databases is essentially a really 
bad idea, because it tends to couple internal implementation details to 
your schema.  I wish I'd known that a few years sooner.  :)
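The externalization idea can be sketched in a few lines. This is only an illustration of the shape of it, not any real Chandler serialization code: the old version writes data out in a flat text form it promises to keep stable, the new version reads that form back, and neither side's internal class layout ever touches the stored representation:

```python
def externalize(event):
    """Old version: write a flat, stable key:value form (iCal-ish, invented)."""
    return "".join("%s:%s\n" % (k, v) for k, v in sorted(event.items()))


def internalize(text):
    """New version: parse the same stable form back into whatever the
    current internal representation happens to be."""
    event = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        event[key] = value
    return event
```

The upgrade then reduces to "export with the old code, import with the new code", and the stable text format is the only schema anyone has to keep compatible.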

But clearly, that insight doesn't help us much anyway.  :)  Our current 
situation is more of a chicken-and-egg problem.  I don't know what 
evolution features we really need, because we haven't yet upgraded users' 
data.  We haven't yet upgraded users' data because we haven't decided as of 
what point we will be *keeping* their data.  And we haven't done that, 
because we don't know what kind of schema evolution we can support, and so 
on.  :)

I suppose one way to investigate the issue might be to study the revision 
logs of the schema version number, to see what kinds of things it has been 
changed for in the past.


>There's always the chance that if we do the log-based transformation
>system, some parcel writer will want to be able to perform an upgrade
>that the transformation system doesn't support.  In that case could
>they have custom upgrade code in their parcel, or would all upgrading
>need to go through the transformation system?

That I don't know.  One question that's in my mind about this is whether 
these transforms can be incremental and "just-in-time" or whether they have 
to be all-at-once.  If they're all-at-once, they're going to need a 
progress UI, which means there needs to be more of an API structure than 
just "we call some code and stuff happens".  It'll need to be organized in 
a way that allows progress updates to take place.
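To show what "more of an API structure" might mean, here is one hypothetical shape for it (none of these names exist in Chandler): the upgrade runner owns the iteration, so it can feed a progress callback for the UI, rather than just calling opaque parcel code; and a just-in-time alternative would instead apply pending transforms lazily as each item is first loaded:

```python
def run_upgrade(items, transforms, report_progress):
    """All-at-once style: apply every transform to every item,
    reporting (done, total) after each item so a UI can show progress."""
    total = len(items)
    for i, item in enumerate(items):
        for transform in transforms:
            transform(item)
        report_progress(i + 1, total)


def lazy_load(raw_item, transforms):
    """Just-in-time style: apply pending transforms the first time an
    item is loaded, so no progress UI is needed at startup."""
    for transform in transforms:
        transform(raw_item)
    return raw_item
```

The all-at-once version is what forces the API question: the runner has to be in charge of the loop, or there is nothing to hang progress updates on.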


