[Chandler-dev] Chandler background full-text indexing
Andi Vajda
vajda at osafoundation.org
Tue May 23 20:50:05 PDT 2006
On Tue, 23 May 2006, Heikki Toivonen wrote:
> While once a minute is good proof-of-concept, I believe we need to have
> a centralized way to force indexing to happen before using functionality
> that requires up-to-date indexes.
Indeed. That is why I mentioned the repository.notifyIndexer() API in my
previous message. It forces the indexer to run right away. Note that even when
it runs right away, it still does so in the background and may take a few
seconds (or more) to complete.
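A minimal sketch of what "run right away, but still in the background" means,
assuming a simple design of one worker thread that wakes up either on a timer
(e.g. once a minute) or when notified. BackgroundIndexer, notify() and the
index_fn callback are hypothetical names for illustration; this is not the
repository's actual implementation of notifyIndexer().

```python
import threading


class BackgroundIndexer:
    """Hypothetical sketch: an indexer thread woken by a timer or on demand."""

    def __init__(self, index_fn, interval=60.0):
        self._index_fn = index_fn      # the actual (possibly slow) indexing work
        self._interval = interval      # periodic wake-up, e.g. once a minute
        self._wakeup = threading.Event()
        self._stopping = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def notify(self):
        # Similar in spirit to notifyIndexer(): returns immediately;
        # the indexing itself still happens on the background thread.
        self._wakeup.set()

    def stop(self):
        self._stopping = True
        self._wakeup.set()
        self._thread.join()

    def _run(self):
        while True:
            # Sleep until notified or until the interval elapses.
            self._wakeup.wait(self._interval)
            self._wakeup.clear()
            if self._stopping:
                break
            self._index_fn()


# Usage: notify() does not block, so a caller that needs a fresh index
# (e.g. before a search) must wait for the indexing pass to finish.
done = threading.Event()
runs = []


def index():
    runs.append(1)
    done.set()


indexer = BackgroundIndexer(index, interval=60.0)
indexer.start()
indexer.notify()
done.wait(5)
indexer.stop()
```

The point of the sketch is the timing: the notifying thread returns at once,
and only the worker thread pays the indexing cost.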
> For example, suppose a user synchronizes their collections, and follows
> up with a search. If the indexer hasn't run yet, the search will not
> find the newly synced items (and will return garbage for changed stuff).
When a user synchronizes their collections, that operation also takes place in
the background, in a different view (see Morgen for details), not in the UI
view.
All views do their PyLucene indexing at commit() time UNLESS they're told not
to with setBackgroundIndexed(True). In Chandler, currently, the only view that
does NOT index during commit() is the UI view. All others work as they used
to, that is, indexing is done during commit().
Therefore, when the sync completes, the indexing has also been done and search
returns consistent results.
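To make the two indexing styles concrete, here is a toy model, assuming a plain
dict stands in for the PyLucene index. SketchView, set_text() and
run_background_indexing() are hypothetical; setBackgroundIndexed() only mirrors
the name used above, and its behavior here is a guess, not the repository code.

```python
class SketchView:
    """Hypothetical stand-in for a repository view; not Chandler's actual class."""

    def __init__(self, name, index):
        self.name = name
        self._index = index            # shared "full-text index": item id -> text
        self._background = False       # default: index during commit()
        self._dirty = {}               # changes made since the last commit
        self.pending = []              # committed but not yet indexed

    def setBackgroundIndexed(self, flag):
        self._background = flag

    def set_text(self, item_id, text):
        self._dirty[item_id] = text

    def commit(self):
        changes, self._dirty = self._dirty, {}
        if self._background:
            # Defer: a background indexer picks these up later.
            self.pending.extend(changes.items())
        else:
            # Index synchronously, so a search right after commit() sees them.
            self._index.update(changes)

    def run_background_indexing(self):
        self._index.update(self.pending)
        self.pending.clear()


index = {}
sync_view = SketchView("sync", index)  # like the sync view: indexes at commit()
ui_view = SketchView("ui", index)      # like the UI view: background-indexed
ui_view.setBackgroundIndexed(True)

sync_view.set_text("event-1", "team meeting")
sync_view.commit()                     # indexed as part of the commit
ui_view.set_text("note-1", "groceries")
ui_view.commit()                       # only queued
print(sorted(index))                   # ['event-1']
ui_view.run_background_indexing()
print(sorted(index))                   # ['event-1', 'note-1']
```

This is why a search after a completed sync is consistent: the sync view's
changes are already in the index by the time its commit() returns.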
> It is not scalable/reliable to add spot checks to the code to force
> indexing just before actions that we know will need to have fresh
> indexes (like run indexer before executing search).
This is only an issue in views that rely on background indexing, currently only
the UI view. It is up to the app to decide which views need which style of
indexing.
> Looking at Tinderbox perf data, the new code more than halved the time
> it takes to import a large calendar. All in all, our new code is about
> 5% faster than it was before we started indexing stuff (on Windows,
> didn't check other platforms).
>
> We will need new tests and may need to modify existing tests to work
> with indexing in a deterministic way, measure actual indexing perf etc.
I think that to make meaningful time comparisons you need to either run
indexing as it used to be run, that is, during commit (still three times
faster on Mac now), or not at all (until we have tests that depend on
indexes).
I added a command line flag to this effect. By default, --indexer is set to
background, but you might want to run the perf tests with --indexer=foreground
or with --indexer=none (support for that last value is not yet checked in).
That way, you don't have background indexing kicking in more or less at random
and slowing down whatever is currently being measured.
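A sketch of how the three --indexer values could be parsed and mapped to an
indexing policy. It uses the standard argparse module purely for illustration;
Chandler's real option parsing may look different, and parse_indexer_mode()
and indexing_policy() are hypothetical helpers. Remember that "none" is not yet
supported in the checked-in code.

```python
import argparse


def parse_indexer_mode(argv):
    """Hypothetical sketch of the --indexer option; the real parser may differ."""
    parser = argparse.ArgumentParser(prog="chandler")
    parser.add_argument(
        "--indexer",
        choices=["background", "foreground", "none"],
        default="background",
        help="when the UI view does its full-text indexing",
    )
    return parser.parse_args(argv).indexer


def indexing_policy(mode):
    """Map a mode to (index_at_all, defer_to_background) for the UI view."""
    return {
        "background": (True, True),    # index later, off the commit() path
        "foreground": (True, False),   # index during commit(), as before
        "none": (False, False),        # skip indexing (e.g. for perf runs)
    }[mode]


for argv in ([], ["--indexer=foreground"], ["--indexer=none"]):
    mode = parse_indexer_mode(argv)
    print(mode, indexing_policy(mode))
# background (True, True)
# foreground (True, False)
# none (False, False)
```

Restricting the value with choices means a typo such as --indexer=fg fails at
startup instead of silently falling back to background indexing during a
timed run.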
Andi..