[pylucene-dev] ubuntu forum post about PyLucene
vajda at osafoundation.org
Thu Nov 1 22:13:14 PDT 2007
I found this interesting post comparing the GCJ and JCC PyLucene flavors on an
Mostly correct. Taking the final points made, comments inline:
1. GCJ version seems to be incompatible with python web frameworks, as well
Yeah, the threading issue in PyLucene with GCJ is a long-standing pain that
got resolved in PyLucene with JCC.
2. GCJ has limits regarding file size for indexes, and sometimes cannot
optimize your data
That is true with GCJ 3.x. GCJ 4.x has a fix for the 2 GB file size limit in
the Java runtime classes. Of course, your mileage with GCJ 4.x will vary.
3. GCJ is very, very fast making search
GCJ is faster than Sun's JRE in getting started. If your search is a
short-lived program, GCJ is indeed faster. I did notice that this performance
difference shrank as the program's running time grew.
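A back-of-the-envelope sketch of why a startup advantage fades over a longer run. All numbers below are made up for illustration, not measurements of GCJ or Sun's JRE, and the model assumes both runtimes spend the same time per query once started.

```python
def total_time(startup, per_query, queries):
    """Total wall time: one fixed startup cost plus a cost per query."""
    return startup + per_query * queries

# Hypothetical figures, in seconds. These are NOT benchmark results.
FAST_START = 0.05   # a runtime that starts quickly
SLOW_START = 0.50   # a runtime that starts slowly
PER_QUERY = 0.01    # assumed identical for both runtimes

def slowdown(queries):
    """Ratio of slow-start total time to fast-start total time."""
    slow = total_time(SLOW_START, PER_QUERY, queries)
    fast = total_time(FAST_START, PER_QUERY, queries)
    return slow / fast

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(slowdown(n), 2))
```

Under these assumptions the ratio moves toward 1 as the number of queries per process grows, so startup cost matters most for one-shot programs.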
4. JCC is more complicated to install and require java installed (at least
Well, it depends. If you have to build your own GCJ, I'd argue that installing
PyLucene with JCC is vastly simpler. Building openjdk on Linux is also
comparatively easier (?) than building GCJ.
5. Programs using JCC version always need LD_LIBRARY_PATH
Not anymore. By using "-Wl,-rpath=libpath" in setup.py's LFLAGS, this problem
- and arguably, security issue - is resolved. No need to set LD_LIBRARY_PATH
anymore. svn trunk's version of JCC's setup.py has an example.
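As a rough sketch of the idea, the linker flags could be assembled along these lines. The helper name rpath_lflags and the example JRE directory are assumptions for illustration only; the actual setup.py in JCC's svn trunk may be organized differently.

```python
def rpath_lflags(lib_dirs):
    """Build GNU ld flags that link against lib_dirs and record them as rpath."""
    flags = []
    for d in lib_dirs:
        flags.append("-L%s" % d)           # where the linker finds the library
        flags.append("-Wl,-rpath=%s" % d)  # where the loader finds it at run time
    return flags

# Hypothetical JRE library directory; adjust to the local installation.
LFLAGS = rpath_lflags(["/usr/lib/jvm/java/jre/lib/i386/client"])

if __name__ == "__main__":
    print(" ".join(LFLAGS))
```

With the library directory embedded in the extension at link time, the dynamic loader no longer depends on LD_LIBRARY_PATH to locate it.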
6. JCC needs to start java VM everytime you run the program, so in cases
like mine (cgi application) it's a bit slower
Yes, that's true. I spent some time today trying to detect the missing call to
initVM() but it's more complicated than I thought without adding the check
everywhere. I thought of adding it to findClass() only, a relatively slow
operation the first time, but it's harder than I thought. More on this
later. In the meantime, I put BIG notices at the top of both PyLucene's and
JCC's README files about the need to call initVM() before calling into the VM.
To dispel another fallacy in the post, initVM() is indeed documented along
with all its arguments in JCC's README file starting at line 189 of 
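Until such a check exists in JCC itself, an application can guard against the missing call on its own side. The following is a minimal sketch, not part of PyLucene or JCC: ensure_vm is a hypothetical helper that runs an initializer once per process and hands back the same result afterwards. It assumes the initializer returns a non-None value, as initVM() does.

```python
import threading

_vm_lock = threading.Lock()
_vm_env = None

def ensure_vm(init):
    """Call init() once per process; later calls return the cached result."""
    global _vm_env
    with _vm_lock:
        if _vm_env is None:
            _vm_env = init()
        return _vm_env

# Intended use with the JCC flavor of PyLucene, assuming it is installed:
#   import lucene
#   ensure_vm(lambda: lucene.initVM(lucene.CLASSPATH))
```

This only avoids repeated or forgotten initialization within one process. A CGI program still pays the VM startup cost on every request.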
7. JCC is about 3 times slower than GCJ when searching records, but seems to
be fast importing data
See comment (3)
8. JCC seems to be more stable and can optimize indexes bigger than 2.4GB
Yes, the Sun-originating VMs are much more mature than GCJ's in many ways. Now
that Sun is sponsoring an open source JDK and JRE, OpenJDK, I expect most
of the open source energy in Java land to be focusing on it (see the IcedTea
project) instead of GCJ. The amount of traffic on the GCJ mailing list is not
what it used to be...
 http://fitzsim.org/blog/?p=16 and http://fitzsim.org/blog/?p=17