[Dev] ZODB is not a Storage Technology (Re: other formats)
John Anderson
Sat, 09 Nov 2002 09:09:49 -0800
Thanks for the very nice overview. Makes lots of sense and it will help me as we jump into the code. I did have one question, see below.

John

Mike C. Fletcher wrote:

> Okay, here's a quick overview of the guts, presented as an outline.
> I've assumed you'll be reading the summaries with the source-code open
> in another window to see what's being described, so I've not gone into
> any details as to how anything is done.
>
> The objects likely best to concentrate on for understanding the
> low-level guts are the FileStorage, the Connection, and the
> _defaulttransaction. I've given you quick summaries of what you'll
> find in most of the files in the ZODB4 CVS packages (ZODB, Transaction
> and Persistence), the zLOG project is just logging facilities, nothing
> really close to the core of the ZODB. The indentation is primarily
> showing usage patterns (for instance, fsindex is really only used by
> FileStorage AFAIK), though I've also used it to group items which can
> be considered sub-categories of the superior item.
>
> I'll work on details tomorrow if I can get some more time,
> questions/directions in which you'd like more coverage quite welcome.
> BTW: I've copied the ZODB-dev list so that others can correct anything
> I've messed up, or add anything that they consider critical to
> understanding the system.
>
> Enjoy,
> Mike
>
> ZODB:
>    Storage (BaseStorage sub-classes):
>        """Storages are responsible for maintaining object state records
>
>        They can also maintain undo (transaction) and versional records.
>        """
>        FileStorage:
>            """Default ZODB storage
>
>            The FileStorage is a linear aggregate of all transactions,
>            and transactions are aggregates of all changed objects.
>            Transactions are added at the end of the file, with
>            later changes to a particular object conceptually overwriting
>            the earlier changes.
>
>            Versions (personal views of the dbase) are just transactions
>            which are declared to have version information.
>            The versions form linked lists (they point to the last
>            transaction in the version).
>
>            Storages which have undo support (such as filestorage) have
>            a pack method which basically copies all objects forward until
>            there is a single current set. Then discards anything not in
>            the current set.

Does it copy "in place" so that if you pulled the plug while in pack your file is corrupted?

>
>            """
>            fsIndex:
>                """Index from persistent OID -> file position index
>                The fsIndex provides optimised index to individual objects
>                within the data file of the FileStorage. The index can
>                be rebuilt merely by scanning through the entire datafile.
>                """
>        TmpStore:
>            """Storage for transaction save-points"""
>        DBMStorage:
>            """Simple storage based on GDBM/AnyDBM"""
>        MappingStorage:
>            """A demonstration of a volatile in-memory storage"""
>
>        utility mechanisms:
>            TimeStamp:
>                """TimeStamp C extension type"""
>            Serialize:
>                """Pickle-like storage (cPickle plus some custom code)"""
>            referencesf:
>                """finds object refs in pickle strings"""
>            file_lock:
>                """(small) wrapper to do cross-platform locking of files"""
>            fsdump, fsrecover:
>                """Debugging/utility code"""
>
>    Connection:
>        """Object-space in which application objects live
>
>        Uses an in-memory object-cache (see below)
>
>        Provides object-access (get root dict, get object by oid)
>        though normal access is via getting root and then
>        drilling down through the object references.
>
>        Other than this, almost the entire class is support
>        for the transaction and persistence mechanisms.
>        """
>        ExportImport:
>            """Mix-in providing XML import/export"""
>    DB:
>        """Manages multiple Connections to a storage
>
>        Provides a pool of connections
>        Provides mechanisms for applying functions
>        to all object caches in all connections
>        Tracks object modifications for versions? (not
>        sure about this, I've never used versions)
>
>        Provides most of the primitives on which Connection and
>        Transaction build the transaction mechanism.
>        (tpc_*)
>        """
>
>
> Transaction:
>    _defaultTransaction:
>        """The default transaction machinery
>
>        Combined with the connection object, this is most
>        of the transaction-driving code in the system. It
>        is fairly tightly coupled to the Persistent module
>        (e.g. it assumes _p_jar and the like on all registered
>        objects).
>        """
>    Transaction:
>        """Data-storage for the current transaction"""
>    Manager:
>        """Entry point for transaction APIs"""
>
> Persistence:
>    _persistent:
>        """Python 2.2.2 implementation of IPersistent
>
>        Basically, this is a Pure-python version of the cPersistence
>        code that really gets used (I'm not sure if there's code
>        anywhere to fall back to using this version if the cPersistence
>        code isn't compiled).
>
>        This is quite useful for figuring out what's going on,
>        but (having used it for a few months), it seemed too slow
>        to be of use in a real-world system (too much time spent in
>        __getattribute__).
>        """
>    cPersistence:
>        """Provides optimised IPersistent implementation"""
>
>    Cache:
>        """Provides an in-memory object cache to reduce reloads from disk
>
>        Basically this is a high-level cache, it has a target size
>        and a few methods implementing garbage collection. The
>        DB calls the connection's GC methods, then the connection calls
>        its cache's GC methods.
>        """
>
>    particular data-types:
>        PersistentDict, PersistentList:
>            """Dictionary and List types which track their changes
>
>            Basically allow you to use them as lists/dicts without
>            needing to spend code tracking changes yourself. These
>            items, however, re-store the entire list/dict on each
>            save, so see BTree for large dicts.
>            """
>        BTrees:
>            """BTree implementation using individually persistent nodes
>
>            Allows large dictionaries to be stored so that only a small
>            sub-set of the dictionary needs to be re-stored on
>            modifications
>            """
>        Function, Module, Package:
>            """References to these types w/ importing
>
>            Never used these myself (I think they're new),
>            they appear to store name-references, or actual
>            code objects in the case of functions.
>            """
>
>
>
> John Anderson wrote:
>
>> I'd be interested in an overview of the guts. Start with a big
>> picture, then move into some details and describe what's in which
>> files. I'd like to eventually learn the code base so I can decide how
>> to improve it.
>>
>> John
>>
>> Mike C. Fletcher wrote:
>>
>>> At what level would you like the description (I've been using ZODB
>>> for years now, and have just released a calendaring application on
>>> it). I assume you understand the basics, so are you looking for
>>> analysis of where/how it starts to fail/how to update it, or what
>>> the actual machinery inside is doing for any given action?
>>>
>>> I'll push some time around and try to get a description posted this
>>> weekend if you can tell me which area you need.
>>>
>>> Enjoy,
>>> Mike
>>>
> ...
>
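
As a rough illustration of the FileStorage outline quoted above (records appended at the end of a file, an OID -> file position index that can be rebuilt by scanning, and a pack that keeps only the current record for each object), a toy sketch in Python might look like the following. None of this is ZODB code or the FileStorage file format: the record layout, the function names (append_record, build_index, load, pack) and the file name are all made up for the example. The pack here writes to a temporary file and swaps it in afterwards; whether FileStorage itself works that way is exactly the question asked above, and this sketch does not answer it.

```python
import os
import struct
import tempfile

# Hypothetical record layout for this toy only (NOT the FileStorage format):
# 8-byte big-endian oid, 4-byte payload length, then the payload bytes.
HEADER = struct.Struct(">QI")


def append_record(path, oid, data):
    """Append one object state at the end of the file and return its offset."""
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        f.write(HEADER.pack(oid, len(data)))
        f.write(data)
    return pos


def build_index(path):
    """Rebuild the oid -> offset index by scanning the whole file.

    A later record for the same oid replaces the earlier entry, which is
    the "conceptually overwriting" behaviour described in the outline.
    """
    index = {}
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of file
            oid, length = HEADER.unpack(header)
            f.seek(length, os.SEEK_CUR)  # skip the payload
            index[oid] = pos
    return index


def load(path, index, oid):
    """Return the current payload for oid using the index."""
    with open(path, "rb") as f:
        f.seek(index[oid])
        _, length = HEADER.unpack(f.read(HEADER.size))
        return f.read(length)


def pack(path):
    """Copy only the current records into a new file, then swap it in.

    Assumption of this toy: the original file is never modified in place,
    so an interrupted pack leaves the old file untouched.
    """
    index = build_index(path)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out:
        for oid in sorted(index, key=index.get):  # keep original file order
            data = load(path, index, oid)
            out.write(HEADER.pack(oid, len(data)))
            out.write(data)
        out.flush()
        os.fsync(out.fileno())
    os.replace(tmp, path)  # swap the packed copy in under the old name
```

Usage would be along the lines of calling append_record a few times for the same oid, then build_index followed by load to read the latest state, and pack to drop the superseded records. Transactions, versions and undo are left out entirely.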