[Dev] transactions per second

Jeremy Hylton jeremy at alum.mit.edu
Mon Nov 11 15:32:04 PST 2002

>>>>> "DM" == David McCusker <david at treedragon.com> writes:

  >> There's only one kind of transaction in ZODB, and it has the
  >> standard ACID properties.

  DM> If D stands for durability, and flush() is not durable, then why
  DM> do you say ACID when the D is not supported?  (Just curious.  It
  DM> takes me a while to calibrate other folks' standards.)

Because I'm rushing a bit too much when I respond to these emails :-(.

FileStorage calls flush() during the prepare() phase of the two-phase
commit (2PC).  It writes a status byte and calls fsync() when the 2PC
commit occurs.  So there is the possibility that a storage crashes
after voting yes on the transaction but before the transaction
actually completes.  I think that's good enough for D, though we're
still not doing "careful writes."
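
For readers who want to see the sequence above in code, here is a
minimal sketch.  It is not FileStorage's actual code or on-disk format:
the record layout (8-byte length, one status byte, payload), the status
values, and the names prepare() and finish() are all assumptions made
up for illustration.

```python
import os
import struct

# Hypothetical record layout (NOT the real FileStorage format):
# 8-byte big-endian payload length, 1 status byte, then the payload.
HEADER = struct.Struct(">Qc")
PENDING = b"c"    # status written during the prepare phase
COMMITTED = b" "  # status written when the commit completes


def prepare(f, payload):
    """Phase 1: append the record marked pending, then flush()."""
    f.seek(0, os.SEEK_END)
    start = f.tell()
    f.write(HEADER.pack(len(payload), PENDING))
    f.write(payload)
    f.flush()  # application buffer -> OS buffers; not yet forced to disk
    return start


def finish(f, start):
    """Phase 2: overwrite the status byte, then fsync()."""
    f.seek(start + 8)  # the status byte follows the 8-byte length field
    f.write(COMMITTED)
    f.flush()
    os.fsync(f.fileno())  # ask the OS to push the file's buffers to disk
```

A crash between prepare() and finish() leaves a record whose status
byte is still the pending value, which is the window described above.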

  DM> Jeremy Hylton wrote:
  >> A ZODB transaction with FileStorage calls flush() before
  >> reporting that a transaction has committed.

  DM> Which moves the buffering from the app to the operating system,
  DM> and doesn't guarantee content is on disk.  I know you know this,
  DM> so this is for other readers.

To be sure that everyone is clear on these points: flush() only hands
the data to the operating system, while the fsync() call copies any
file buffers in memory to disk.  I believe it also updates the file
metadata.
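
A small sketch of the distinction, using only the standard library.
The helper name durable_append() is hypothetical.  Whether the file's
directory entry is also made durable is platform-dependent and is not
addressed here.

```python
import os


def durable_append(path, data):
    """Append data to path and ask the OS to force it to disk."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # Python's buffer -> operating system buffers
        os.fsync(f.fileno())  # operating system buffers -> storage device
```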

  >> That's the durability guarantee you get.

  DM> When FileStorage works by appending, and the transactions are
  DM> marked so the starts and ends are findable, this is an adequate
  DM> amount of durability since one can ignore incomplete
  DM> transactions.  (Incidentally, this is exactly how I had the Mork
  DM> text db format work for Netscape.)

Right.  The failure modes leave transactions without a valid status
byte on the disk.  In log-structured storage like this, it is always
safe to ignore an incomplete transaction at the end of the file.
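
A recovery scan along these lines might look like the sketch below.
It assumes a made-up record layout (8-byte length, one status byte,
payload) rather than the real FileStorage format, and scan_committed()
is a hypothetical name.  It stops at the first record that is
truncated or lacks the committed status and reports where the valid
prefix of the log ends.

```python
import os
import struct

# Hypothetical layout (NOT the real FileStorage format):
# 8-byte big-endian payload length, 1 status byte, then the payload.
HEADER = struct.Struct(">Qc")
COMMITTED = b" "


def scan_committed(f):
    """Return (payloads, end_offset) for the committed prefix of the log."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    f.seek(0)
    payloads = []
    good_end = 0
    while True:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # clean end of file, or a truncated header
        length, status = HEADER.unpack(header)
        if status != COMMITTED or length > size - f.tell():
            break  # never marked committed, or payload cut short
        payloads.append(f.read(length))
        good_end = f.tell()
    return payloads, good_end
```

Anything past end_offset can be ignored, or truncated away before new
transactions are appended.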

A careful write, as I understand it, would provide better durability
by writing to two different files on different media to guard against
low-level failures.

