[Chandler-dev] Indexing, import performance and 'nodefer'

Grant Baillie grant at osafoundation.org
Mon Jul 2 10:22:42 PDT 2007


Hi, all

This is a summary of the thinking behind a change I recently  
committed to iCalendar import:

<http://viewcvs.osafoundation.org/chandler/?rev=14887&view=rev>

that speeds up the 3000-event calendar import test case by 35%-40%.
There are similar gains to be had in other performance scenarios  
(like reload, and possibly subscribe), but there's some trickiness  
involved, so it's good to document things.

The main part of the diff (in parcels/osaf/sharing/stateless.py)
wraps the import code in a "with repoView.reindexingDeferred():"
block. It turns out that before this change, we were spending an
enormous amount of time reinserting items in indexes, as a result of
setting attributes that could affect the various indexes Chandler uses.
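
To make the cost concrete, here is a minimal toy sketch of the idea, not Chandler's repository code. ToySortedIndex, ToyView, set_start_time and reindexing_deferred are all hypothetical names invented for illustration; only the general shape (a "with" block that postpones index maintenance until it exits) comes from the change described above.

```python
import bisect
from contextlib import contextmanager


class ToySortedIndex:
    """Sorted list of (key, item_id) pairs; counts every (re)insertion."""

    def __init__(self):
        self.entries = []    # sorted list of (key, item_id)
        self.keys = {}       # item_id -> key currently stored in entries
        self.reinserts = 0

    def place(self, item_id, key):
        old = self.keys.get(item_id)
        if old is not None:
            self.entries.remove((old, item_id))
        bisect.insort(self.entries, (key, item_id))
        self.keys[item_id] = key
        self.reinserts += 1


class ToyView:
    """Hypothetical stand-in for a repository view with one index."""

    def __init__(self):
        self.index = ToySortedIndex()
        self._deferring = False
        self._dirty = {}     # item_id -> latest key seen while deferring

    def set_start_time(self, item_id, start_time):
        if self._deferring:
            # Remember only the latest value; touch the index later.
            self._dirty[item_id] = start_time
        else:
            self.index.place(item_id, start_time)

    @contextmanager
    def reindexing_deferred(self):
        self._deferring = True
        try:
            yield
        finally:
            # Bring the index up to date once, when the block exits.
            self._deferring = False
            for item_id, key in self._dirty.items():
                self.index.place(item_id, key)
            self._dirty.clear()


def import_events(view, events, passes=3):
    # Each pass stands in for another indexed attribute change on import.
    for n in range(passes):
        for item_id, start in events:
            view.set_start_time(item_id, start + n)


events = [(i, 1000 - i) for i in range(200)]

eager = ToyView()
import_events(eager, events)

deferred = ToyView()
with deferred.reindexing_deferred():
    import_events(deferred, events)

print(eager.index.reinserts, deferred.index.reinserts)
```

In this toy, the eager view re-places an item in the index on every attribute change, while the deferred view does it once per item at the end of the block; both end up with the same index contents.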

Currently, there are two usage patterns for repository indexes in  
Chandler:

1) Indexes used to make sure items are unique: The cases I know of
are the EmailAddress and Location kinds. We don't want to create a
new item every time you address an item to a given email address, so we
index the collection of all EmailAddress items, and use the index
(actually, multiple indexes) to reuse an existing item if possible when
you add or import an email address.
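
As a rough illustration of this first pattern, the following sketch does a "find or create" lookup through an index. It is a toy under stated assumptions: ToyEmailAddress, ToyEmailAddressCollection and get_or_create are hypothetical names, a plain dict stands in for the repository index, and matching on a lower-cased, stripped address is an assumption of the example rather than a description of how Chandler compares addresses.

```python
class ToyEmailAddress:
    """Hypothetical stand-in for an EmailAddress item."""

    def __init__(self, address, full_name=""):
        self.address = address
        self.full_name = full_name


class ToyEmailAddressCollection:
    """All known addresses, plus a dict acting as the uniqueness index."""

    def __init__(self):
        self.items = []
        self._by_address = {}   # normalized address -> item

    @staticmethod
    def _normalize(address):
        # Assumption for this sketch: compare case-insensitively.
        return address.strip().lower()

    def get_or_create(self, address, full_name=""):
        key = self._normalize(address)
        existing = self._by_address.get(key)
        if existing is not None:
            return existing              # reuse the item already indexed
        item = ToyEmailAddress(address.strip(), full_name)
        self.items.append(item)
        self._by_address[key] = item     # index must be current right away
        return item


addresses = ToyEmailAddressCollection()
first = addresses.get_or_create("Grant@Example.org")
second = addresses.get_or_create(" grant@example.org ")
```

The point of the sketch is that the lookup runs in the middle of the import, so the index has to reflect every address added so far.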

2) Indexes used for sorting or searching in the UI: Examples here are  
the indexes used for sorting on a dashboard column, and also the global
startTime-related indexes used by the calendar UI to find all the  
relevant events for a given week/day.
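
For the second pattern, a sorted index makes "everything in this week" a pair of binary searches. The sketch below is only an analogy using Python's bisect module on a sorted list; events_in_range and the sample data are hypothetical and say nothing about how the repository's indexes are implemented internally.

```python
import bisect
from datetime import datetime, timedelta


def events_in_range(index, start, end):
    """Return ids of events with start <= startTime < end.

    `index` is a list of (startTime, event_id) tuples kept in sorted order.
    A 1-tuple like (start,) sorts before any (start, event_id) pair, so
    bisect_left finds the first entry at or after the boundary.
    """
    lo = bisect.bisect_left(index, (start,))
    hi = bisect.bisect_left(index, (end,))
    return [event_id for _, event_id in index[lo:hi]]


index = sorted([
    (datetime(2007, 7, 10, 15, 0), "review"),
    (datetime(2007, 7, 2, 9, 0), "standup"),
    (datetime(2007, 7, 4, 12, 0), "picnic"),
])

week_start = datetime(2007, 7, 2)
this_week = events_in_range(index, week_start, week_start + timedelta(days=7))
```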

It turns out that it's OK to defer the indexes in #2 above for import  
(or reload, which is similar): the UI is already being notified of  
changes to the items it's displaying, so we don't need to keep all  
the indexes instantaneously up-to-date.

However, in case #1, deferring indexing often leads to errors that  
look like:

LookupError: Access to skiplist is denied, it is marked INVALID

because the deferring has left the index in a temporarily  
inconsistent state, but we're trying to iterate/insert into the  
index. So, for case #1, Andi added a 'nodefer' keyword argument to
the createIndex() call, which means that these indexes will always
keep themselves consistent (i.e. essentially ignore
reindexingDeferred()). This allows us to defer indexing for the
remaining indexes, which, happily, is where the most time was
previously wasted.
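
The interaction between deferral and 'nodefer' can be modeled with a small toy. Everything here (ToyIndex, ToyView, create_index, changed, reindexing_deferred) is a hypothetical stand-in written for this example; the real createIndex() takes other arguments that are not shown, and the real error message differs. The sketch only assumes what is described above: a deferred index is unusable until the block exits, and a 'nodefer' index stays consistent throughout.

```python
from contextlib import contextmanager


class ToyIndex:
    def __init__(self, name, nodefer=False):
        self.name = name
        self.nodefer = nodefer
        self.keys = {}        # item_id -> key
        self.valid = True
        self._pending = {}    # changes held back while deferring

    def update(self, item_id, key, deferring):
        if deferring and not self.nodefer:
            self._pending[item_id] = key
            self.valid = False           # stale until flush()
        else:
            self.keys[item_id] = key     # kept consistent immediately

    def find(self, key):
        if not self.valid:
            raise LookupError("index %r is marked invalid" % self.name)
        return [i for i, k in self.keys.items() if k == key]

    def flush(self):
        self.keys.update(self._pending)
        self._pending.clear()
        self.valid = True


class ToyView:
    def __init__(self):
        self.indexes = []
        self._deferring = False

    def create_index(self, name, nodefer=False):
        index = ToyIndex(name, nodefer=nodefer)
        self.indexes.append(index)
        return index

    def changed(self, index, item_id, key):
        index.update(item_id, key, self._deferring)

    @contextmanager
    def reindexing_deferred(self):
        self._deferring = True
        try:
            yield
        finally:
            self._deferring = False
            for index in self.indexes:
                index.flush()


view = ToyView()
by_address = view.create_index("emailAddress", nodefer=True)
by_start = view.create_index("startTime")

with view.reindexing_deferred():
    view.changed(by_address, "addr1", "grant@example.org")
    view.changed(by_start, "event1", 900)
    found = by_address.find("grant@example.org")   # usable: never deferred
    try:
        by_start.find(900)
        deferred_lookup_failed = False
    except LookupError:
        deferred_lookup_failed = True               # stale inside the block

after = by_start.find(900)   # rebuilt when the block exited
```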

--Grant


