I get an OutOfMemoryError: Java heap space after indexing fewer than 40,000
documents.  Here are the details.

PyLucene-2.2.0-2 JCC
Ubuntu 7.10 64bit running on 4GB Core 2 Duo
Python 2.5.1

I am starting Lucene with the following:
lucene.initVM(lucene.CLASSPATH, maxheap='2048m')
MergeFactor (I've tried everything from 10 to 10,000)
MaxMergeDocs and MaxBufferedDocs are at their defaults

I believe the problem somehow stems from a filter I've written that turns
tokens into bigrams (each token returns two tokens, the original token and a
new token created from concatenating the text of the current and previous
token).  These bigrams add a lot of unique tokens, but I didn't think that
would be a problem (aren't they all flushed out to disk?).
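
To make the filter's behaviour concrete, here is a minimal pure-Python sketch
of the token stream described above.  It is not the PyLucene filter itself:
the name bigram_tokens is hypothetical, and it assumes that the bigram is the
previous token's text followed by the current token's text, and that the very
first token (which has no predecessor) produces no bigram.

```python
def bigram_tokens(tokens):
    """Yield each token, then a bigram built from the previous and current token.

    Assumptions (not confirmed by the actual filter): the bigram is
    previous + current with no separator, and the first token yields no bigram.
    """
    prev = None
    for tok in tokens:
        yield tok                 # the original token
        if prev is not None:
            yield prev + tok      # the concatenated bigram token
        prev = tok


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    stream = list(bigram_tokens(words))
    # Compare distinct terms with and without the bigrams.
    print(len(set(words)), "unique terms without bigrams")
    print(len(set(stream)), "unique terms with bigrams")
```

Even on this tiny input the number of distinct terms grows noticeably once the
bigrams are added, which is the effect I suspect is behind the heap usage.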

Any ideas or suggestions would be greatly appreciated.

