[pylucene-dev] Large index files: Sort leads to "GC Warning: Repeated allocation of very large block"

Marc Weeber marc at weeber.net
Wed Dec 19 01:16:36 PST 2007


Hi Andi and others,

I downloaded and installed the jcc version (man, that was a positively  
different experience!), and changed my test script accordingly. The  
problem is still there: the sort asks for a humongous amount of
memory. I have to provide maxheap='470m' or it will die with an
out-of-memory error.

It seems that the searcher object becomes this big. Interestingly, if
I loop over different queries and create a new searcher object in
each iteration, there is no garbage collection (GC), and memory
explodes again. This happens with both the gcj and jcc versions. Of
course, I should stick to one searcher, but it is interesting to note
that GC does not behave differently between the jcc and gcj versions.
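
For reference, a minimal sketch of the "stick to one searcher" pattern,
written so it does not depend on PyLucene itself. The helper name
run_queries and its parameters are made up for illustration; the
commented wiring at the bottom assumes the jcc flavor with Lucene 2.x
style calls and is untested here.

```python
from contextlib import closing


def run_queries(open_searcher, queries, search):
    """Open ONE searcher, run every query against it, then close it.

    open_searcher: zero-argument callable returning an object with close()
    queries:       iterable of query objects
    search:        callable (searcher, query) -> result
    All three names are hypothetical, chosen for this sketch.
    """
    results = []
    # closing() guarantees searcher.close() runs even if a search raises,
    # so the searcher is released exactly once.
    with closing(open_searcher()) as searcher:
        for query in queries:
            results.append(search(searcher, query))
    return results


# Assumed wiring for the jcc flavor (index_path, queries and sort are
# placeholders for whatever the test script already defines):
#
#   import lucene
#   lucene.initVM(lucene.CLASSPATH, maxheap='470m')
#   hits = run_queries(lambda: lucene.IndexSearcher(index_path),
#                      queries,
#                      lambda s, q: s.search(q, sort))
```

The point is only that the searcher is created outside the query loop,
so whatever it caches for sorting is allocated once rather than once
per query.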

best,

Marc




On 18 dec 2007, at 23:44, Andi Vajda wrote:

>
> On Tue, 18 Dec 2007, Marc Weeber wrote:
>
>> Is there a way to a) be more prudent on the memory usage or b)  
>> another more memory efficient (and without warnings) way of getting  
>> the cooccurrence info?
>
> Yes, you might have better luck at controlling and watching memory  
> usage by switching to a regular Java VM, switching to the JCC flavor  
> of PyLucene.
>
>    http://svn.osafoundation.org/pylucene/trunk/jcc
>
> The gcj flavor is on its way to deprecation as the open source Java  
> energies seem to be moving to openjdk, away from gcj.
>
> If you'd like to stick with gcj, you might want to ask java at gcc.gnu.org 
>  about memory use and control.
>
> Andi..
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev


