[pylucene-dev] lucene.JavaError: java.lang.OutOfMemoryError: Java heap space

Brian Merrell brian at merrells.org
Tue Jan 8 16:48:33 PST 2008

I get an OutOfMemoryError: Java heap space after indexing fewer than 40,000
documents. Here are the details:

PyLucene-2.2.0-2 JCC
Ubuntu 7.10 64-bit running on a 4 GB Core 2 Duo
Python 2.5.1

I am starting Lucene with the following:
lucene.initVM(lucene.CLASSPATH, maxheap='2048m')
MergeFactor: I've tried everything from 10 to 10,000
MaxMergeDocs and MaxBufferedDocs: left at their defaults

I believe the problem somehow stems from a filter I've written that turns
tokens into bigrams: each input token yields two tokens, the original token
and a new token created by concatenating the text of the current and previous
tokens. These bigrams add many unique tokens, but I didn't think that would
be a problem (aren't they all flushed to disk?).
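To make the filter's behavior concrete, here is a minimal pure-Python sketch of the token logic described above. It is not PyLucene's TokenFilter API; the function names `bigram_filter` and `unique_terms` are hypothetical, and it assumes the first token (which has no predecessor) is emitted on its own and that bigrams are formed by plain string concatenation with no separator.

```python
from typing import Iterable, Iterator, Optional


def bigram_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Hypothetical sketch: yield each token, then previous+current.

    Assumption: the first token has no predecessor, so only the
    original token is emitted for it.
    """
    previous: Optional[str] = None
    for token in tokens:
        yield token                      # the original token
        if previous is not None:
            yield previous + token       # concatenated bigram token
        previous = token


def unique_terms(tokens: Iterable[str]) -> int:
    """Count distinct terms, i.e. the vocabulary the index would see."""
    return len(set(tokens))


if __name__ == "__main__":
    print(list(bigram_filter(["new", "york", "city"])))
    words = "the cat sat on the mat".split()
    print(unique_terms(words), unique_terms(bigram_filter(words)))
```

For `["new", "york", "city"]` this yields `new, york, newyork, city, yorkcity`. On the six-word sentence the distinct-term count goes from 5 to 10, which shows how quickly bigrams inflate the vocabulary. A real Lucene token also carries offsets and a position increment, which this sketch ignores.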

Any ideas or suggestions would be greatly appreciated.
