[pylucene-dev] What has happened to PyLucene?
erik at cq2.nl
Thu Jan 15 23:20:41 PST 2009
So, Helmut, PyLucene seems alive and kicking!
I liked the GCJ approach because of its:
- speed: although not a tenfold increase, it is significant to us
- ABI: 10-100 fold faster access to the Lucene API, from C++ that is.
- memory use: just less, and integrated, no separate heap etc
I do appreciate PyLucene's goal to support all of Lucene, and I agree
that maintaining all the proxies/stubs was painful. In fact, they
caused most of the build/install trouble: I was astonished to learn
that gcj is able to compile lucene.jar with one simple instruction.
However, our goal here is almost the opposite: use the smallest
possible core of Lucene, the part it is good at, and no more. Lucene
is darn good at ranked full-text/zone-based search, so we use that.
For faceted search, clustering, range search, n-grams etc we use other
software, and here comes the integration issue: the need for speed.
Generally speaking, crossing VM boundaries is extremely expensive,
mainly because of call dispatching and data conversions. Going from
Python to C++ and then to Java is crossing a VM boundary twice, first
using Python's C-API and then the JNI. We avoid much of the C-API by
creating higher level C++ interfaces that just do more in a single
call, for example perform a query and retrieve the results. Next we
avoided the JNI by using the ABI (formerly used by PyLucene as well).
This yielded a huge performance improvement. Multiple orders of
magnitude.
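
To make the "do more in a single call" point concrete, here is a minimal
sketch in plain Python. It is not our actual code: CountingBackend,
search, doc_text and search_and_fetch are hypothetical names, and the
"boundary" is only simulated by a counter. It just shows how a coarse
interface cuts the number of crossings for the same result.

```python
class CountingBackend:
    """Hypothetical stand-in for a native search library.

    Every method call counts as one simulated boundary crossing.
    """

    def __init__(self, index):
        self.index = index      # {doc_id: text}
        self.crossings = 0      # simulated boundary crossings so far

    # Fine-grained interface: one crossing per small operation.
    def search(self, term):
        self.crossings += 1
        return [doc_id for doc_id, text in sorted(self.index.items())
                if term in text.split()]

    def doc_text(self, doc_id):
        self.crossings += 1
        return self.index[doc_id]

    # Coarse-grained interface: query and retrieval in a single crossing.
    def search_and_fetch(self, term):
        self.crossings += 1
        return [(doc_id, text) for doc_id, text in sorted(self.index.items())
                if term in text.split()]


index = {1: "ranked full text search", 2: "faceted search", 3: "n-grams"}

# Fine-grained: one call for the query, then one call per hit.
fine = CountingBackend(index)
fine_results = [(doc_id, fine.doc_text(doc_id))
                for doc_id in fine.search("search")]

# Coarse-grained: everything in one call.
coarse = CountingBackend(index)
coarse_results = coarse.search_and_fetch("search")

print(fine.crossings, coarse.crossings)
```

Both variants return the same hits; the fine-grained one pays one
crossing per hit on top of the query, the coarse one pays once.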
As a last remark, we do not use generated code for proxies. Instead,
we use ctypes to interface to our C++ code, which then uses the ABI to
interface to Lucene. This keeps our build process as simple as a few
single-line compile statements.
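
For readers who have not used ctypes, a minimal sketch of the pattern
follows. Our own C++ library is not shown here, so the C runtime's
strlen stands in for it; with a real C++ library the exported functions
would need extern "C" linkage so that ctypes can find them by name. The
fallback to None assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Locate the C runtime. If it cannot be found by name, fall back to the
# symbols already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# One call from Python straight into native code, no generated proxy needed.
length = libc.strlen(b"lucene")
print(length)
```

No wrapper code is generated or compiled for this: the only build step
is compiling the shared library itself.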
Oh, before I forget: we don't use threads.
So, probably we used PyLucene for a specific feature of it that was
not expected, but I hope I made clear why we stick to GCJ. And I
really think that if PyLucene is to cover all of Lucene, the JCC
approach is a good one, and I am glad to hear that it is stable. I
will have to face the GCJ trouble on my own, if and when it appears, I
On Thu, Jan 15, 2009 at 9:12 PM, TJ Ninneman <tj at twopeasinabucket.com> wrote:
> On Jan 15, 2009, at 11:02 AM, Bill Janssen wrote:
>> Erik Groeneveld <erik at cq2.nl> wrote:
>>> But I admit that after the major
>>> strategy change that involved using JCC instead of GCJ, I am
>>> to a different GCJ solution. Probably other do so as well?
> What solution?
>> Nope. I dislike the JVM, particularly its handling of memory, so I
>> share your pain,
> Agreed, my memory consumption went up by almost a full order of
> magnitude.
> With that being said, the new JCC based one just rocks in almost every
> way. Even when I would develop a pure python, multi-threaded server
> with GCJ PyLucene I invariably would have constant problems. Now I
> can run my code within a rock solid mod_wsgi Apache server and I never
> have issues.
> It's a beautiful thing...RAM is cheap, downtime isn't.
Seek You Too
twitter, skype: ejgroene
mobile: 0624 584 029