[Chandler-dev] Chandler background full-text indexing

Andi Vajda vajda at osafoundation.org
Tue May 23 20:50:05 PDT 2006


On Tue, 23 May 2006, Heikki Toivonen wrote:

> While once a minute is good proof-of-concept, I believe we need to have
> a centralized way to force indexing to happen before using functionality
> that requires up-to-date indexes.

Indeed. That is why I mentioned the repository.notifyIndexer() API in my
previous message. It forces the indexer to run right away. Note that even if
it runs right away, it does so in the background and may take a few seconds
(or more) to complete.
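
To illustrate the "runs right away, but in the background" behavior, here is a
minimal toy sketch. It is not Chandler's repository code: ToyRepository,
waitUntilIndexed() and close() are hypothetical names invented for this
example, and only the notifyIndexer() name comes from the message above. The
real method's signature and internals are not shown in this thread.

```python
import threading


class ToyRepository:
    """Toy stand-in for a repository with a background indexer thread."""

    def __init__(self, interval=60.0):
        self._interval = interval          # periodic pass, e.g. once a minute
        self._cond = threading.Condition()
        self._pending = []                 # committed but not yet indexed
        self._requested = False            # set by notifyIndexer()
        self._closed = False
        self.indexed = []                  # stands in for the full-text index
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def commit(self, items):
        # Background-indexed commit: record the work, do not index here.
        with self._cond:
            self._pending.extend(items)

    def notifyIndexer(self):
        # Wake the indexer now instead of waiting for the next interval.
        # Returns immediately; the indexing itself happens on the thread.
        with self._cond:
            self._requested = True
            self._cond.notify_all()

    def waitUntilIndexed(self, timeout=None):
        # Hypothetical helper: block until the requested pass has finished.
        with self._cond:
            return self._cond.wait_for(
                lambda: not self._pending and not self._requested, timeout)

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._thread.join()

    def _run(self):
        with self._cond:
            while not self._closed:
                # Sleep until notified, closed, or the interval elapses.
                self._cond.wait_for(
                    lambda: self._requested or self._closed, self._interval)
                if self._closed:
                    break
                self._requested = False
                self.indexed.extend(self._pending)
                self._pending.clear()
                self._cond.notify_all()


repo = ToyRepository()
repo.commit(["note-1", "note-2"])
repo.notifyIndexer()                       # returns before indexing is done
finished = repo.waitUntilIndexed(timeout=5.0)
repo.close()
```

The point of the sketch is that a caller that needs fresh results right after
notifyIndexer() still has to wait for the background pass to complete.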

> For example, suppose a user synchronizes their collections, and follows
> up with a search. If the indexer hasn't run yet, the search will not
> find the newly synced items (and will return garbage for changed stuff).

When a user synchronizes their collections, that operation also takes place in 
the background, in a different view (see Morgen for details), not in the UI 
view.

All views do their PyLucene indexing at commit() time UNLESS they're told not 
to with setBackgroundIndexed(True). In Chandler, currently, the only view that 
does NOT index during commit() is the UI view. All others work as they used 
to, that is, indexing is done during commit().
Therefore, when the sync completes, the indexing has also been done and search 
returns consistent results.
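
The difference between the two indexing styles can be sketched with a toy
model. ToyView, its add() method and its deferred list are hypothetical and
only mimic the behavior described above; setBackgroundIndexed() and commit()
are the names used in this message, but their real implementations are not
shown here.

```python
class ToyView:
    """Toy stand-in for a repository view; not the real Chandler class."""

    def __init__(self, name, index):
        self.name = name
        self._index = index            # shared set standing in for the index
        self._backgroundIndexed = False
        self._changed = []             # uncommitted changes
        self.deferred = []             # committed, left for the indexer

    def setBackgroundIndexed(self, value):
        self._backgroundIndexed = bool(value)

    def add(self, item):
        self._changed.append(item)

    def commit(self):
        changed, self._changed = self._changed, []
        if self._backgroundIndexed:
            # Indexing is skipped here and left to the background indexer.
            self.deferred.extend(changed)
        else:
            # Default: index as part of commit().
            self._index.update(changed)


index = set()

sync_view = ToyView("sync", index)     # background thread's own view
sync_view.add("synced event")
sync_view.commit()                     # indexed by the time commit() returns

ui_view = ToyView("ui", index)
ui_view.setBackgroundIndexed(True)
ui_view.add("typed note")
ui_view.commit()                       # not searchable until the indexer runs
```

In this model a search issued right after the sync view's commit() finds the
synced item, while the UI view's item shows up only once the deferred work has
been indexed.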

> It is not scalable/reliable to add spot checks to the code to force
> indexing just before actions that we know will need to have fresh
> indexes (like run indexer before executing search).

This is only an issue in views that rely on background indexing, currently
only the UI view. It is up to the app to decide which views need which style
of indexing.

> Looking at Tinderbox perf data, the new code more than halved the time
> it takes to import a large calendar. All in all, our new code is about
> 5% faster than it was before we started indexing stuff (on Windows,
> didn't check other platforms).
>
> We will need new tests and may need to modify existing tests to work
> with indexing in a deterministic way, measure actual indexing perf etc.

I think that to make meaningful time comparisons you need to either run 
indexing as it used to be run, that is, during commit (still three times 
faster on Mac now), or not at all (until we have tests that depend on 
indexes).

I added a command line flag to this effect. By default --indexer is set to 
background but you might want to run the perf tests with --indexer=foreground 
or with --indexer=none (support for that last value is not yet checked in).
That way, you don't have background indexing kicking in more or less at random 
and slowing down whatever is currently being measured.
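
For test harnesses that want to mirror this flag, a minimal sketch using
Python's argparse follows. It only illustrates the three values and the
default described above; build_parser() and the program name are hypothetical,
and this is not how Chandler itself parses its options.

```python
import argparse

# The three values named in the message; "none" was not yet checked in
# when the message was written.
INDEXER_MODES = ("background", "foreground", "none")


def build_parser():
    parser = argparse.ArgumentParser(prog="perf-harness-sketch")
    parser.add_argument(
        "--indexer",
        choices=INDEXER_MODES,
        default="background",
        help="background: index outside commit(); "
             "foreground: index during commit(); "
             "none: do not index at all",
    )
    return parser


default_mode = build_parser().parse_args([]).indexer
perf_mode = build_parser().parse_args(["--indexer=foreground"]).indexer
```

A perf run would then pass --indexer=foreground (or none) so that no
background pass fires in the middle of a measurement.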

Andi..

