[pylucene-dev] Help understanding "performance" issues.

Rune Hansen rune.hansen at scanmine.com
Wed Feb 21 02:28:34 PST 2007


Hi,
I've set up a MultiSearcher* inside a patched CherryPy 3.0 server
(patched with PythonThreads).
Using a Queue, I've created searchables (each a MultiSearcher spanning 10
indexes with approximately 900,000 documents combined), which are
available through CherryPy's .thread_data facility to the server's 10
threads.
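
For illustration, the pooling arrangement described above can be sketched
roughly as follows. This is a minimal sketch, not the actual server code:
FakeSearchable, make_pool, pooled_search and run_workers are hypothetical
names, FakeSearchable stands in for a real MultiSearcher, and plain
threading threads stand in for the CherryPy worker threads. It uses the
Python 3 module name queue (the module is called Queue on Python 2.4).

```python
import queue
import threading


class FakeSearchable:
    """Hypothetical stand-in for a MultiSearcher; not a PyLucene class."""

    def __init__(self, name):
        self.name = name

    def search(self, query):
        # A real searchable would run the query against its indexes here.
        return "%s:%s" % (self.name, query)


def make_pool(n_searchables):
    """Fill a thread-safe queue with n_searchables searchables."""
    pool = queue.Queue()
    for i in range(n_searchables):
        pool.put(FakeSearchable("searchable-%d" % i))
    return pool


def pooled_search(pool, query):
    """Borrow a searchable, run one search, and always hand it back."""
    searchable = pool.get()  # blocks until a searchable is free
    try:
        return searchable.search(query)
    finally:
        pool.put(searchable)


def run_workers(pool, n_threads, query):
    """Run one pooled search on each of n_threads worker threads."""
    results = []
    lock = threading.Lock()

    def worker():
        result = pooled_search(pool, query)
        with lock:
            results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a pool of two and ten worker threads, at most two searches run at any
moment and the other threads wait in pool.get(), so the pool size caps the
search concurrency regardless of the thread count.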

When timing a search of medium complexity, one searchable returns  
after ~0.3 seconds.
The optimum seems to be two searchables. Increasing the number to
three or more does not produce higher throughput; it actually slows
all the requests down. Reducing the number of searchables to one
gives roughly half the throughput of two.

For example:
ab -n100 -c8 on one searchable available to 10 threads:
    Requests per second:    1.66 [#/sec] (mean)
ab -n100 -c8 on two searchables available to 10 threads:
    Requests per second:    3.05 [#/sec] (mean)
ab -n100 -c8 on three searchables available to 10 threads:
    Requests per second:    2.98 [#/sec] (mean)
ab -n100 -c8 on four searchables available to 10 threads:
    Requests per second:    2.95 [#/sec] (mean)
(average of 5 runs on each)
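
As a rough cross-check, these throughput figures can be turned into a mean
time per request: with a fixed concurrency (-c8), the mean time per request
is the concurrency divided by the requests per second. The sketch below
applies that to the four measurements; mean_latency_seconds is a
hypothetical helper name, and the result includes time spent waiting for a
free searchable, so it is not comparable to the ~0.3 second single-search
timing.

```python
def mean_latency_seconds(concurrency, requests_per_second):
    """Mean time per request when `concurrency` requests are kept in flight."""
    return concurrency / requests_per_second


# (number of searchables, measured requests per second) from the ab runs
runs = [(1, 1.66), (2, 3.05), (3, 2.98), (4, 2.95)]

for searchables, rps in runs:
    latency = mean_latency_seconds(8, rps)
    print("%d searchable(s): %.2f s per request" % (searchables, latency))
```

This gives about 4.8 seconds per request with one searchable and about 2.6
to 2.7 seconds with two or more.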

I have a hard time understanding this behavior. Is it because of how
Lucene accesses an IndexReader? Is it because of hardware limitations?
Can it be programmed "smarter" at my end?

It seems to me that the MultiSearcher can access (at most) two  
searchables, perhaps not simultaneously, but at least it is able to  
efficiently switch between them. Introduce a third searchable and  
performance starts to drop. The throughput of the server is almost  
equal to running with two searchables, but each request takes longer  
to process.

I'm testing this on a dual 3.2 GHz Xeon server with 2 GB memory. The
Python process takes up 200-600 MB of memory. When running at full load,
processor usage is around 180%. Python 2.4, PyLucene 2.0
binary. Not much else is running on this machine.

*also tested with ParallelMultiSearcher. MultiSearcher seems to be  
quite a bit faster.

regards
/rune

Happy those, who can remain at Highbury!
Jane Austen (1775-1817)

