[pylucene-dev] index compatibility between jcc and gcj

Andi Vajda vajda at osafoundation.org
Tue Feb 12 23:28:40 PST 2008


On Wed, 13 Feb 2008, Alexandre Fiori wrote:

> i would like to know if it's possible to generate an index with pylucene-jcc
> in such a way that it's compatible with pylucene-gcj.
> that's because i figured out that pylucene-jcc is much faster creating the
> index, but slower searching.
>
> with the same data i generated two indexes, one using pylucene-jcc and
> another using pylucene-gcj.
> jcc can add 100.000 documents in ~56s, while gcj do the same in ~4m15s.
> but searching is different. jcc takes ~0.718s while gcj takes only
> 0.069s for the same query.
>
> also, i've found that jcc version of pylucene can read/search indexes
> created with gcj version, but not the opposite.
> i would like to know if it's possible generate indexes faster with gcj, or,
> if it's possible to generate it by using jcc version and read/search with
> gcj version.

If you're using the same version of the underlying Java Lucene software to
build both jcc- and gcj-PyLucene, I expect their indexes to be fully
compatible. It is my understanding that newer Lucene versions are capable of
reading older Lucene indexes but not the other way around.

The timing differences you're seeing are most likely due to the fact that a
long running task, such as index creation, gives the Java VM (embedded in
jcc-PyLucene) a better chance to compile the bytecode. gcj-PyLucene is
faster at first but is eventually passed by jcc-PyLucene once the embedded
JVM has had a chance to compile the bytecode. If you were to run a sizeable
bunch of search queries in both jcc-PyLucene and gcj-PyLucene, I'm not sure
which one would come out ahead. I suspect that jcc-PyLucene actually might.
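
One way to check the warm-up effect described above is to run the same
batch of queries several times in one process and time each pass
separately, rather than timing a single query. The following is only a
rough sketch: time_batches and fake_search are hypothetical names, and
fake_search is a placeholder to be swapped for whatever function performs
the actual search in your jcc- or gcj-PyLucene code.

```python
import time


def time_batches(run_query, queries, batches=5):
    """Run the same list of queries several times, timing each pass.

    Returns a list of elapsed seconds, one entry per pass. If run_query
    is backed by a JIT-compiling JVM, later passes may come out faster
    than the first one.
    """
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        for query in queries:
            run_query(query)
        timings.append(time.perf_counter() - start)
    return timings


def fake_search(query):
    # Hypothetical stand-in for a real search call (assumption, not a
    # PyLucene API); it only does some busy work so the sketch runs.
    return sum(ord(c) for c in query * 100)


if __name__ == "__main__":
    queries = ["title:lucene", "body:index", "author:vajda"] * 200
    for number, seconds in enumerate(time_batches(fake_search, queries), 1):
        print("pass %d: %.4fs" % (number, seconds))
```

Comparing the first pass against the later ones, for each of the two
builds, should show whether the single-query numbers are dominated by
start-up cost.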

Andi..

