[pylucene-dev] Locked index

Andi Vajda vajda at osafoundation.org
Sun Feb 27 09:55:34 PST 2005


>> The Lucene locking code doesn't use the OS locking APIs offered by Java 
>> 1.5, yet. Until then, that code is a little brittle. If you need more 
>> reliability in this area, try using a database for your index such as the 
>> DbDirectory implementation built around Berkeley DB. PyLucene supports it. 
>
> How does performance compare?

Berkeley DB is pretty fast, but I haven't benchmarked it against FSDirectory.
Berkeley DB is not SQL-based. DbDirectory uses two B-trees for all its 
storage needs. I always use DbDirectory and it's fast enough for me.
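
To make the two-B-tree layout concrete, here is a minimal sketch in plain
Python. It is not the DbDirectory code: the dicts stand in for the B-trees,
and the class name, key layout and block size are assumptions made up for
illustration. One tree holds per-file metadata, the other holds fixed-size
data blocks.

```python
BLOCK_SIZE = 16  # hypothetical block size, kept tiny so the demo is easy to follow


class TwoTreeDirectory:
    """Toy model of an index directory stored in two key-value trees.

    Plain dicts stand in for the B-trees; a real store would be
    on disk, ordered and transactional.
    """

    def __init__(self):
        self.files = {}   # file name -> length in bytes
        self.blocks = {}  # (file name, block number) -> bytes

    @staticmethod
    def _block_count(length):
        # Number of blocks needed to hold `length` bytes (rounded up).
        return (length + BLOCK_SIZE - 1) // BLOCK_SIZE

    def write_file(self, name, data):
        # Replace any previous content, then store the data block by block.
        self.delete_file(name)
        for offset in range(0, len(data), BLOCK_SIZE):
            key = (name, offset // BLOCK_SIZE)
            self.blocks[key] = data[offset:offset + BLOCK_SIZE]
        self.files[name] = len(data)

    def read_file(self, name):
        # Look up the length, then concatenate the blocks in order.
        count = self._block_count(self.files[name])
        return b"".join(self.blocks[(name, i)] for i in range(count))

    def delete_file(self, name):
        # Remove the metadata entry and every block that belongs to the file.
        length = self.files.pop(name, None)
        if length is None:
            return
        for i in range(self._block_count(length)):
            del self.blocks[(name, i)]

    def list_files(self):
        return sorted(self.files)
```

Writing a 40-byte file with this sketch produces one entry in `files` and
three entries in `blocks`; deleting it removes all four.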

The overhead you're going to have to deal with comes from running a database. 
There are going to be large files, transaction logs, backups, etc. to 
manage. There are also a number of configuration options that affect 
performance, which you need to tune for your application.
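
As one example of that kind of tuning, Berkeley DB reads a DB_CONFIG file
from the environment's home directory. The fragment below is only a sketch:
the values are arbitrary placeholders, not recommendations, and the right
settings depend on your index size and available memory.

```
# DB_CONFIG (placeholder values, for illustration only)
# cache size: 0 GB + 268435456 bytes (256 MB), in 1 cache region
set_cachesize 0 268435456 1
# maximum size of a single transaction log file: 10 MB
set_lg_max 10485760
```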

The advantages of a database such as Berkeley DB are well worth it, especially 
for large indexes which take quite some time to rebuild in case of corruption.

> Will this work well for big indexes?  I'm at 6 GB and it looks like it'll hit 
> about 20 GB at the rate I'm going.

Berkeley DB claims to scale up to terabytes.

Andi..


