[Chandler-dev] Server-side backups theoretically desirable?
jared at wordzoo.com
Fri Aug 25 13:58:51 PDT 2006
Grant Baillie wrote:
> In principle, this could be fixed on the client side by remembering all
> the ETags we ever synced each resource against...
It doesn't seem we need to go that far; versioning of some kind seems to be
a reasonable best practice.
In general, I like version numbers over last-modified. I really like the
way simple sync architectures like Palm's thing seem quite reliable.
Server-side versions have been discussed, but how about client-side versions?
Instead of making the server keep track of all state, leave ETag as an
advisory indication of the resource changing. If the ETag has changed, go
ahead and download, but check the version numbers to make sure the version
id has not dropped. If it has dropped, for whatever reason, either note
that and notify the user of conflicts, or silently drop it and overwrite it
with your copy. If the version id is higher, overwrite the Chandler
repository with the server copy.
Thoughts on this approach?
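
To make the proposal concrete, here is a minimal Python sketch of the decision
logic, not anything that exists in Chandler. The names Resource, Action, decide
and fetch_remote are all made up for illustration. The equal-version case is
not covered above, so flagging it as a conflict is my own assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SKIP = "skip"                # ETag unchanged, nothing to download
    TAKE_SERVER = "take_server"  # server version is higher: overwrite local copy
    CONFLICT = "conflict"        # server version dropped: tell the user
    PUSH_LOCAL = "push_local"    # server version dropped: overwrite server copy


@dataclass
class Resource:
    etag: str
    version: int  # hypothetical client-maintained version id


def decide(local, remote_etag, fetch_remote, overwrite_on_drop=False):
    """Decide what to do with one resource.

    The ETag is only advisory: it tells us whether a download is worthwhile.
    The version id carried by the resource decides which copy wins.
    """
    if remote_etag == local.etag:
        return Action.SKIP
    remote = fetch_remote()  # download only when the ETag has changed
    if remote.version > local.version:
        return Action.TAKE_SERVER
    if remote.version < local.version:
        # The server went backwards, e.g. after a restore from backup.
        return Action.PUSH_LOCAL if overwrite_on_drop else Action.CONFLICT
    # Same version but a different ETag: not covered by the proposal,
    # treated as a conflict here (assumption).
    return Action.CONFLICT
```

For example, with a local copy at version 5, a server copy at version 3 yields
CONFLICT by default, or PUSH_LOCAL when overwrite_on_drop is set.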
> It's a little tricky, because ETags are a per-resource
> thing, whereas our sync mechanism is repository-wide.
A momentary diversion: at the last CodeCon I saw a presentation that
included a discussion of using Merkle tries (yeah, tries not trees; I'm a
little fuzzy on the specifics) as a rapid synchronization mechanism.
The idea was essentially that you'd take the ETags and sort them into
an ordered tree of a certain kind. You do this with the trees on both sides of
the synchronization. You then compute a hash over the concatenation of the
ETags of the resources at each level of the tree. Each level hashes over
the ETags/hashes of the level below it. At the root, you end up with a
single hash representing the state of the whole tree.
To synchronize, you check the root node's hash. If they differ, you fetch
the hashes of the root's children and check which of those differ. As you
walk the tree, you can ignore those subtrees where the hashes match and
descend into only those that differ, quickly finding which nodes have changed
and transferring only those.
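
A rough Python sketch of that walk, under my own assumptions: resources are
keyed by a path string, the tree is a character trie over those paths, and
SHA-256 is the hash. The names build and changed_keys are hypothetical. This
is the general hash-tree idea as I understand it, not the scheme from the
talk. Both trees live in memory here; over a network you would fetch the
child hashes one level at a time instead.

```python
import hashlib

END = "\0"  # terminator, so one key may be a prefix of another (assumes no NUL in keys)


def _h(parts):
    """Hash a sequence of strings, with a separator to keep boundaries distinct."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x1f")
    return digest.hexdigest()


def _seal(node):
    """Fill in hashes bottom-up: each node hashes over its children's hashes."""
    if not node["children"]:
        return node["hash"]
    parts = []
    for label in sorted(node["children"]):
        parts += [label, _seal(node["children"][label])]
    node["hash"] = _h(parts)
    return node["hash"]


def build(etags):
    """Turn {resource_key: etag} into a trie whose root hash covers everything."""
    root = {"children": {}, "hash": None}
    for key, etag in etags.items():
        node = root
        for ch in key + END:
            node = node["children"].setdefault(ch, {"children": {}, "hash": None})
        node["hash"] = _h([etag])  # leaf: hash of the resource's ETag
    _seal(root)
    return root


def changed_keys(a, b, prefix=""):
    """List keys that differ between two tries, skipping subtrees whose hashes match."""
    if a is not None and b is not None and a["hash"] == b["hash"]:
        return []
    kids_a = a["children"] if a else {}
    kids_b = b["children"] if b else {}
    if not kids_a and not kids_b:
        return [prefix[:-1]]  # reached a leaf; strip the terminator
    out = []
    for label in sorted(set(kids_a) | set(kids_b)):
        out += changed_keys(kids_a.get(label), kids_b.get(label), prefix + label)
    return out
```

Calling changed_keys(build(local_etags), build(server_etags)) returns the keys
that were added, removed, or whose ETag differs, without visiting any subtree
whose hashes already agree.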
Just a thought; I have no practical experience with this type of model,
but I can see some interesting advantages, particularly to performance if
it's not too heavyweight to calculate the hashing.
> An alternative is to take into account the resource's HTTP Last-Modified
> value as well as the ETag. Used on its own, Last-Modified is unreliable,
> but there's probably a way to use it in conjunction with ETag that would
> work well.
I agree that server-side last-modified is a reasonable option for improving
the survivability of a server-side rollback. I've got a better feeling
about version numbers in the long term, though; for Beta it's maybe not as critical.
Version numbers have better properties across multiple servers, etc.