[pylucene-dev] Help understanding "performance" issues.

Andi Vajda vajda at osafoundation.org
Wed Feb 21 10:56:13 PST 2007


On Wed, 21 Feb 2007, Rune Hansen wrote:

> I've set up a MultiSearcher* inside a patched CherryPy 3.0 server (patched 
> with PythonThreads).
> Using a Queue, I've created searchables (MultiSearcher spanning 10 indexes 
> with approximately 900,000 documents combined) which are made available 
> through CherryPy's .thread_data facility to the server's 10 threads.
>
> When timing a search of medium complexity, one searchable returns after ~0.3 
> seconds.
> The optimum seems to be two searchables. Increasing the number of 
> searchables to three or more does not produce higher throughput; it 
> actually slows all the requests down. Reducing the number of searchables 
> to one produces half the throughput of two searchables.
>
> For example, with ab -n100 -c8 and the searchables available to 10 threads
> (average of 5 runs on each):
>
>   searchables   requests per second (mean)
>   1             1.66
>   2             3.05
>   3             2.98
>   4             2.95
>
> I have a hard time understanding this behavior. Is it because of how Lucene 
> accesses an IndexReader? Is it because of hardware limitations? Can it be 
> programmed "smarter" at my end?

I'm not sure. There have been many threads about this on 
java-dev at lucene.apache.org. A bunch of work was done in the area of locks and 
indexes in Lucene 2.1, so I'd try to upgrade to PyLucene 2.1 as well.
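
For reference, the setup described in the quoted message (a fixed number of
searchables handed out to the server threads through a Queue) can be sketched
roughly as below. This is only an illustration under stated assumptions:
FakeSearchable, SearchablePool and run_requests are hypothetical names, and
FakeSearchable is a stand-in that sleeps instead of calling PyLucene, so the
sketch shows the pooling pattern and not real Lucene behavior.

```python
import queue
import threading
import time
from contextlib import contextmanager


class FakeSearchable:
    """Hypothetical stand-in for a MultiSearcher; it sleeps instead of searching."""

    def __init__(self, name, latency=0.01):
        self.name = name
        self.latency = latency

    def search(self, query):
        time.sleep(self.latency)  # simulated search time, not a real index lookup
        return (self.name, query)


class SearchablePool:
    """Hands out a fixed set of searchables to threads through a Queue."""

    def __init__(self, searchables):
        self._free = queue.Queue()
        for searchable in searchables:
            self._free.put(searchable)

    @contextmanager
    def borrow(self):
        searchable = self._free.get()  # blocks until a searchable is free
        try:
            yield searchable
        finally:
            self._free.put(searchable)  # always hand it back to the pool


def run_requests(pool_size, n_threads=10, n_requests=40):
    """Run n_requests fake searches on n_threads threads sharing pool_size searchables."""
    pool = SearchablePool([FakeSearchable("s%d" % i) for i in range(pool_size)])
    todo = queue.Queue()
    for i in range(n_requests):
        todo.put("query %d" % i)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                query = todo.get_nowait()
            except queue.Empty:
                return  # no requests left for this thread
            with pool.borrow() as searchable:
                hit = searchable.search(query)
            with results_lock:
                results.append(hit)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    for size in (1, 2, 3, 4):
        results, elapsed = run_requests(size)
        print("%d searchable(s): %.1f requests/sec" % (size, len(results) / elapsed))
```

Because the stand-in only sleeps, throughput in this sketch grows with the
pool size up to the number of threads. It does not reproduce the plateau at
two searchables reported above, so it says nothing about where that plateau
comes from; swapping a real searcher into the pool would be the way to
measure that.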

Andi..


