[pylucene-dev] Large index files: Sort leads to "GC Warning: Repeated allocation of very large block"

Andi Vajda vajda at osafoundation.org
Wed Dec 19 08:56:10 PST 2007


On Wed, 19 Dec 2007, Marc Weeber wrote:

> I downloaded and installed the jcc version (man, that was a positively 
> different experience!), and changed my test script accordingly. The problem 
> is still there: the sort asks for a humongous amount of memory. I have to 
> provide a maxheap='470m' or it will die with an out of memory error.
>
> It seems that the searcher object becomes this big. Interestingly, if I make 
> a loop for different queries, and create a new searcher object in each 
> iteration, there is no garbage collection (GC), and memory explodes again. 
> This behavior occurs in both the gcc and jcc versions. Of course, I should 
> stick to one searcher, but it is interesting to note that the GC in the jcc 
> and gcc versions does not behave differently.

This could point at an issue with Lucene itself. Maybe you should ask 
java-users at lucene.apache.org ? Let us know what you find out :)

Or, it could be a leak in PyLucene, that is, a bug that makes it hold on to 
objects when it shouldn't. But, since you've seen this in both versions of 
PyLucene, which are very different in their Java/C++/Python interfacing 
code, I kind of doubt it. Does invoking gc explicitly help?
(Both in Python with gc.collect() and in Java from Python with System.gc())
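
To make the suggestion concrete, here is a rough sketch of what such a test 
loop could look like. It is only an illustration: FakeSearcher and 
run_queries are hypothetical names, the stand-in class replaces a real 
searcher so the snippet runs without an index, and java_gc is an optional 
callable that you would point at whatever your PyLucene build exposes for 
java.lang.System.gc() (I'm assuming such a wrapper is reachable from Python).

```python
import gc
import weakref


class FakeSearcher(object):
    """Hypothetical stand-in for a searcher object.

    It holds a reference to itself, so reference counting alone never
    frees it; only the cyclic garbage collector can.
    """

    def __init__(self):
        self.me = self


def run_queries(make_searcher, queries, java_gc=None):
    """Create one searcher per query and force a collection after each.

    Returns the number of searchers still alive at the end.
    java_gc is an optional callable standing in for the Java-side
    System.gc() call (an assumption, not a verified PyLucene API).
    """
    refs = []
    for query in queries:
        searcher = make_searcher()
        refs.append(weakref.ref(searcher))
        # ... the sorted search for `query` would go here ...
        del searcher
        gc.collect()              # explicit Python-side collection
        if java_gc is not None:
            java_gc()             # explicit Java-side collection request
    return sum(1 for ref in refs if ref() is not None)


alive = run_queries(FakeSearcher, ["a", "b", "c"])
print("searchers still alive:", alive)
```

If the count stays above zero with real searchers even after both 
collectors have been asked to run, something is still holding a reference 
to them, and that would be the place to start looking.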

Andi..

