Open Source Applications Foundation

[Dev] ZODB is not a Storage Technology (Re: other formats )

Jeremy Hylton Mon, 4 Nov 2002 12:28:02 -0500


>>>>> "DM" == David McCusker <david@treedragon.com> writes:

  DM> I'm just learning about ZODB, so thanks for clueing me in
  DM> faster.  Now I know little but later I expect to get into the
  DM> nitty gritty details.

The ZODB Wiki is a decent place to get started.  It's got links to
a variety of info, a couple of tutorials and a paper by Jim Fulton
that gets into implementation internals.

  >> Currently, I understand the Storage selected as the ZODB back-end
  >> for Chandler is the Berkeley DB, but ZODB has other storages
  >> available, and it's certainly possible to create more.

  DM> I understand Berkeley DB has fabulous btree indexes for maps,
  DM> provided they are stored as one btree index per file. (Has that
  DM> changed?)  Does Berkeley DB provide a way to store arbitrary
  DM> sized objects without putting each one in a separate file?  I
  DM> gathered it didn't years ago.

  DM> A conventional thing to do with arbitrary sized objects is
  DM> append them to a single file (like mbox format files containing
  DM> email messages), and then index them from other files which
  DM> summarize the contents.  This approach requires a file rewrite
  DM> to compact after object deletes, which has time proportional to
  DM> db size rather than deleted content size.

  DM> I should look into the way Berkeley DB hooks into ZODB so I'll
  DM> have better informed ideas regarding the way it works and the
  DM> way alternatives could be plugged in as replacements.  I hope I
  DM> get around to starting this research in a few days.  (I'm
  DM> writing a spec at home right now.)

I wonder how appropriate Berkeley DB is for end user applications.
Running a Berkeley database entails a lot of management responsibility
-- checkpointing, log management, recovery, deadlock detection, etc.
It's a database, and running a database requires some database
administration.

The cost of administration is a drawback for any database.  I imagine
that Chandler would want to minimize the amount of administration an
end-user needs to do.  The ZODB storage with the least administrative
costs is FileStorage, which works much like you describe -- append
arbitrary objects to a single file.  It needs to be packed
occasionally; pack is the operation that removes old revisions of
objects.
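
To make the append-and-pack idea concrete, here is a toy sketch in
Python.  It is not FileStorage's actual file format or API; the class
AppendOnlyStore, its methods, and the record layout are all made up
for illustration, and it keeps everything in an in-memory stream
rather than a real file.  Each store() appends a new revision and the
index remembers only the latest one; pack() copies the latest
revisions into a fresh stream and drops the rest.

```python
import io
import struct


class AppendOnlyStore:
    """Toy append-only object store (hypothetical; not ZODB's format)."""

    # Record header: 8-byte object id, 4-byte payload length.
    HEADER = struct.Struct(">QI")

    def __init__(self):
        self.stream = io.BytesIO()
        self.index = {}  # object id -> offset of its latest revision

    def store(self, oid, data):
        # Always write at the end; older revisions stay where they are.
        offset = self.stream.seek(0, io.SEEK_END)
        self.stream.write(self.HEADER.pack(oid, len(data)))
        self.stream.write(data)
        self.index[oid] = offset

    def load(self, oid):
        # Read the latest revision recorded in the index.
        self.stream.seek(self.index[oid])
        _, length = self.HEADER.unpack(self.stream.read(self.HEADER.size))
        return self.stream.read(length)

    def size(self):
        return self.stream.seek(0, io.SEEK_END)

    def pack(self):
        # Copy only the latest revision of each object into a new stream,
        # in their original order, then swap the new stream in.
        packed = AppendOnlyStore()
        for oid in sorted(self.index, key=self.index.get):
            packed.store(oid, self.load(oid))
        self.stream, self.index = packed.stream, packed.index


store = AppendOnlyStore()
store.store(1, b"first draft")
store.store(1, b"second draft")   # supersedes the first revision
store.store(2, b"other object")
before = store.size()
store.pack()
after = store.size()
```

In this sketch the cost of pack() scales with the data that is kept,
not with the data that is thrown away, which is the trade-off you
point out for single-file designs.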

The BerkeleyDB storage for ZODB is still experimental, and in any case
it's intended more for server-side environments where there's a
sysadmin on hand to properly manage the database.

Jeremy