[pylucene-dev] Large index files: Sort leads to "GC Warning: Repeated allocation of very large block"

Aaron Lav asl2 at pobox.com
Thu Dec 20 08:53:35 PST 2007


On Thu, Dec 20, 2007 at 03:47:42PM +0100, Marc Weeber wrote:
> hi all,
> 
> I think you're right. The field to sort on is a date field in the  
> string format of YYYY-MM-DD. I indeed started looking into the Java  
> sorting things, and I am not too much surprised any more of the memory  
> load. Good thing is that after the first search+sort, it is *really*  
> fast: a cooccurrence search (two terms per doc in a boolean query)  
> together with a sort on date in the 50M collection is between 50ms and  
> 200ms (timed in Python, before and after the search), with no real  
> difference between the jcc and gcj builds.
> 
> >
> >
> >If you have a lot of dead space (reader.maxDoc() >> reader.numDocs()),
> >optimizing should decrease memory usage.
> Do you mean calling .optimize() on the index? I have already done that. Or  
> do you mean something different?

Just optimize() on the index.  If you've done that, then maxDoc() should
be equal to numDocs().
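
To put rough numbers on the dead-space point, here is a minimal sketch in
plain Python. It assumes the string sort cache keeps one 4-byte int ordinal
per document slot, sized by maxDoc() rather than numDocs(); that figure and
the helper name sort_cache_estimate are assumptions for illustration, not
something measured on your index. The two arguments stand in for the values
you would read from reader.maxDoc() and reader.numDocs().

```python
def sort_cache_estimate(max_doc, num_docs, bytes_per_slot=4):
    """Rough per-field estimate of the per-document part of a sort cache.

    max_doc / num_docs stand in for reader.maxDoc() / reader.numDocs().
    bytes_per_slot=4 is an assumption: one int ordinal per document slot.
    The unique term strings themselves are not counted here.
    """
    if num_docs < 0 or num_docs > max_doc:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    deleted = max_doc - num_docs  # slots still held by deleted documents
    return {
        "deleted_slots": deleted,
        "dead_fraction": deleted / max_doc if max_doc else 0.0,
        "ordinal_array_bytes": max_doc * bytes_per_slot,
        "bytes_saved_by_optimize": deleted * bytes_per_slot,
    }


# Optimized 50M-document index: no dead space left to reclaim.
optimized = sort_cache_estimate(50_000_000, 50_000_000)

# Hypothetical unoptimized index with 10M deleted documents still counted.
unoptimized = sort_cache_estimate(60_000_000, 50_000_000)
```

Under these assumptions the ordinal array alone is about 200 MB for 50M
documents, allocated as a single block, which would be consistent with the
"very large block" warning. Once maxDoc() equals numDocs(), optimizing again
cannot shrink it further.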

     Aaron

