[pylucene-dev] Large index files: Sort leads to "GC Warning:
Repeated allocation of very large block"
Andi Vajda
vajda at osafoundation.org
Wed Dec 19 08:56:10 PST 2007
On Wed, 19 Dec 2007, Marc Weeber wrote:
> I downloaded and installed the jcc version (man, that was a positively
> different experience!), and changed my test script accordingly. The problem
> is still there: the sort asks for a humongous amount of memory. I have to
> provide a maxheap='470m' or it will die with an out of memory error.
>
> It seems that the searcher object becomes this big. Interestingly, if I make
> a loop for different queries, and create a new searcher object in each
> iteration, there is no garbage collection (GC), and memory explodes again.
> This behavior is both for gcc and jcc versions. Of course, I should stick to
> one searcher, but it is interesting to note that the GC between jcc and gcc
> versions does not behave differently.
This could point at an issue with Lucene itself. Maybe you should ask
java-users at lucene.apache.org? Let us know what you find out :)

Or, it could be a leak in PyLucene, which could have a bug of holding on to
objects when it shouldn't. But, since you've seen this in both versions of
PyLucene, which are very different in their Java/C++/Python interfacing
code, I kind of doubt it. Does invoking gc explicitly help?
(Both in Python with gc.collect() and in Java from Python with System.gc())
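
A rough way to check the Python half of that question is sketched below. It
uses only the standard library; FakeSearcher, count_live and measure are
hypothetical names made up for the illustration, not PyLucene classes, and
the real searcher may behave differently. The Java half (System.gc() called
through the wrapper) is not shown, since how System is exposed depends on
the PyLucene build.

```python
import gc
import weakref


class FakeSearcher:
    """Hypothetical stand-in for a searcher object; not a PyLucene class."""

    def __init__(self, name):
        self.name = name
        # Deliberate reference cycle: reference counting alone cannot
        # free this object, only the cyclic garbage collector can.
        self.me = self


def count_live(refs):
    """Count how many weak references still point at a live object."""
    return sum(1 for r in refs if r() is not None)


def measure(n):
    """Create n throwaway searchers in a loop, then force a collection.

    Returns (live objects before gc.collect(), live objects after).
    """
    gc.disable()  # keep automatic collection from hiding the effect
    try:
        refs = []
        for i in range(n):
            searcher = FakeSearcher("query-%d" % i)
            refs.append(weakref.ref(searcher))
            del searcher  # drop the only external reference
        before = count_live(refs)
        gc.collect()  # explicit collection, as suggested above
        after = count_live(refs)
    finally:
        gc.enable()
    return before, after


before, after = measure(5)
print("live before gc.collect():", before)
print("live after gc.collect():", after)
```

If the count does not drop for the real searcher objects after an explicit
collection, something is still holding a reference to them, which would
point toward a leak rather than lazy collection.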
Andi..