[pylucene-dev] Re: lucene.JavaError: java.lang.OutOfMemoryError: Java heap space
Andi Vajda
vajda at osafoundation.org
Tue Jan 8 22:08:10 PST 2008
On Tue, 8 Jan 2008, Brian Merrell wrote:
> # java -version
>
> java version "1.6.0_03"
> Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
>
> # which java
> /usr/bin/java
>
> It doesn't seem to crash when I remove the filter. However, this may be
> misleading, as I don't have nearly as many tokens (particularly unique tokens)
> without the filter. The problem may still exist, with the symptoms delayed.
This could indicate that there is indeed a leak in the code generated for
the extension. I intend to take a closer look at what's being generated
tomorrow or Thursday. This dictionary should not be growing unless your
Python code keeps references to all these objects. Are the values in the
returned dict (the refcounts) mostly the same? If so, what is that value?
In other words, what does myvm._dumpRefs().values() look like?
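One way to answer that question without pasting thousands of values is to tally how often each refcount occurs. The following is only a minimal sketch: it assumes, as described above, that the dict's values are the refcounts, and the helper name summarize_refcounts and the sample data are made up for illustration.

```python
from collections import Counter


def summarize_refcounts(refs):
    """Map each distinct refcount to the number of dict entries that have it.

    `refs` is assumed to be a dict whose values are refcounts, which is how
    the dict returned by _dumpRefs() is described in this thread.
    """
    return Counter(refs.values())


# Stand-in data for illustration only; the keys are opaque placeholders.
sample = {101: 1, 102: 1, 103: 2, 104: 1}

summary = summarize_refcounts(sample)
for refcount, entries in sorted(summary.items()):
    print("refcount %d: %d entries" % (refcount, entries))
```

With a running VM this would be called as summarize_refcounts(myvm._dumpRefs()). If nearly every entry shares the same small refcount while the total keeps climbing, that would point toward references never being released rather than toward a few heavily shared objects.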
> After 3000 documents I get len(myvm._dumpRefs()) == 12270 and it
> seems to be increasing by about 4000 for each 1000 documents.
>
> I didn't even realize C++ code was being generated. I doubt I can help
> directly with this but would be happy to provide anything that would help
> those more knowledgeable than I debug this.
JCC generates over 100,000 lines of C++ code to integrate Java Lucene and
Python. I used to write this by hand, phew.
If you could send me one (or ten) document(s) and your indexing code (you
already posted your filter code), that should help me reproduce this.
Andi..