Re: [pylucene-dev] HighFreqTerms from org.apache.lucene.misc

Dirk Rothe d.rothe at semantics.de
Tue Apr 1 11:28:10 PDT 2008


On Tue, 01 Apr 2008 20:20:56 +0200, Andi Vajda <vajda at osafoundation.org>  
wrote:

>
> On Tue, 1 Apr 2008, Dirk Rothe wrote:
>
>>>> Ok, but after inspecting the Java code, this was pretty trivial to
>>>> implement in Python. Just out of curiosity: do you think the Java
>>>> version would be (significantly) faster? I'm not sure I understand
>>>> the performance implications of the JCC bridge.
>>>  I don't know. How about measuring it?
>>>  The JCC bridge involves converting some literals (such as strings)
>>> from Java to Python, releasing the GIL (global interpreter lock) when
>>> leaving Python and reacquiring it when returning.
>>>  The JCC bridge also keeps track of the Java objects returned to
>>> Python so that they don't get garbage collected until Python no longer
>>> uses them. This is implemented via a C++ multimap.
>>>  It's been shown before that using a Python HitCollector (used in a
>>> very tight loop by the Lucene core) is significantly slower than using
>>> the Java equivalent [1].
>>
>> Ok, I will try to measure it.
>>
>> That will be after I understand the Makefile's JAR/Java handling
>> better, which I guess means after my theoretical CS exams next week ;).
>
> To add a JAR file to the PyLucene build, look at line 171 in the  
> Makefile for the current list of JAR files. Looking above that line  
> should show you how to add another JAR file.

Yeah, I have seen that; it doesn't look that hard.

thnx, dirk
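For reference, the HighFreqTerms idea discussed in this thread boils down to keeping the N terms with the highest document frequency while walking the term dictionary. Below is a minimal pure-Python sketch of that step, assuming the terms have already been pulled out of the index as (field, text, doc_freq) tuples. The function name `high_freq_terms` and the tuple layout are hypothetical, and the PyLucene code that enumerates the index is deliberately left out rather than guessed at.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# Hypothetical flattened view of one index term: (field, text, doc_freq)
TermInfo = Tuple[str, str, int]


def high_freq_terms(terms: Iterable[TermInfo], num_terms: int = 100,
                    field: Optional[str] = None) -> List[TermInfo]:
    """Return the num_terms terms with the highest doc_freq, highest first.

    If field is given, only terms from that field are considered.
    Terms with equal doc_freq keep their original enumeration order.
    """
    if field is not None:
        terms = (t for t in terms if t[0] == field)
    # heapq.nlargest scans the input once and keeps only the current
    # top num_terms entries, a bounded priority queue in effect.
    return heapq.nlargest(num_terms, terms, key=lambda t: t[2])
```

In PyLucene the input would come from iterating the index reader's terms, so every term in that loop crosses the Python/Java boundary; that per-term crossing is the cost worth measuring before deciding whether the Java class is worth adding to the build.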
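On the "how about measuring it?" question, the standard-library `timeit` module is enough for a first comparison. The sketch below is only an illustration of the measuring approach: `per_item_callback` and `batch` are hypothetical pure-Python stand-ins showing how a per-hit Python call in a tight loop (the HitCollector situation mentioned in the thread) shows up in timings. It does not exercise JCC, so it says nothing about the extra GIL and object-tracking cost of actually crossing into Java.

```python
import timeit
from typing import Callable, List, Tuple

N = 100_000  # number of simulated hits per run


def best_time(func: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Best average seconds per call of func over `repeat` rounds of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def collect(doc: int, score: float, hits: List[Tuple[int, float]]) -> None:
    # One Python-level function call per hit, like a Python HitCollector.collect()
    hits.append((doc, score))


def per_item_callback() -> List[Tuple[int, float]]:
    hits: List[Tuple[int, float]] = []
    for doc in range(N):
        collect(doc, 1.0, hits)
    return hits


def batch() -> List[Tuple[int, float]]:
    # Same result built without a per-hit function call
    return [(doc, 1.0) for doc in range(N)]


if __name__ == "__main__":
    print(f"callback per hit: {best_time(per_item_callback):.4f}s")
    print(f"batch:            {best_time(batch):.4f}s")
```

Swapping the two stand-ins for the Python HighFreqTerms port and the Java class called through PyLucene would give the actual numbers in question.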



