[pylucene-dev] Re: lucene.JavaError: java.lang.OutOfMemoryError: Java heap space

Brian Merrell brian at merrells.org
Tue Jan 8 19:57:41 PST 2008


# java -version

java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)

# which java
/usr/bin/java

It doesn't seem to crash when I remove the filter.  However, this may be
misleading, as I don't have nearly as many tokens (particularly unique
tokens) without the filter.  The problem may still exist, with the symptoms
merely delayed.

After 3000 documents I get len(myvm._dumpRefs()) == 12270, and it seems to
be increasing by about 4000 for each 1000 documents.
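A small helper like the following can make that growth pattern easier to spot.  This is only a sketch: `snapshot_fn` stands in for whatever returns the live-reference dict (e.g. a lambda wrapping the VM's _dumpRefs() that Andi describes below), and `process_batch` is a hypothetical callable that indexes one batch of documents.

```python
def ref_growth(snapshot_fn, process_batch, batches):
    """Run each batch of indexing work, then record the size of the
    reference table.  Returns (batch_index, table_size, delta) tuples,
    so a steadily positive delta -- like the ~4000 refs per 1000
    documents above -- stands out as a probable leak."""
    sizes = []
    prev = len(snapshot_fn())
    for i, batch in enumerate(batches):
        process_batch(batch)
        size = len(snapshot_fn())
        sizes.append((i, size, size - prev))
        prev = size
    return sizes
```

In real use you would pass something like snapshot_fn=lambda: myvm._dumpRefs() and let process_batch add the next chunk of documents to the index.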

I didn't even realize C++ code was being generated.  I doubt I can help
directly with this, but I would be happy to provide anything that would help
those more knowledgeable than I am debug this.
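For the gc.collect()/optimize experiment mentioned in the quoted thread below, the loop might look like this.  A sketch only: `writer` stands in for a PyLucene IndexWriter (addDocument() and optimize() are Lucene 2.x API), and the batch size of 1000 is arbitrary.

```python
import gc

def index_with_gc(writer, documents, every=1000):
    """Add documents, forcing a Python GC pass and a Lucene optimize
    every `every` documents, to see whether the heap growth is driven
    by uncollected Python-side wrappers."""
    for n, doc in enumerate(documents, 1):
        writer.addDocument(doc)
        if n % every == 0:
            gc.collect()       # release unreferenced Python wrappers
            writer.optimize()  # merge index segments
```

If the ref table from _dumpRefs() still grows after each gc.collect(), the extra references are presumably being held on the C++/Java side rather than by Python garbage.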

-brian


On 1/8/08, Andi Vajda <vajda at osafoundation.org> wrote:
>
>
> On Tue, 8 Jan 2008, Brian Merrell wrote:
>
> > Thanks for the quick reply.  I haven't used Java in years, so my
> > apologies if I am not able to provide useful debug info without some
> > guidance.
> >
> > Memory does seem to be running low when it crashes.  According to top,
> > python is using almost all of the 4GB when it bails.
>
> That may be misleading because all the memory used belongs to the Python
> process -- even Java's, since it's loaded via shared libraries into the
> Python process.
>
> > I don't know what Java VM I am using.  How do I determine this?
>
> At the shell prompt enter: java -version
> For example, on my Mac, I get:
>
>    java version "1.5.0_13"
>    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
>    Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
>
> Also, what does 'which java' return ?
>
> > I will try running it with calls to gc.collect() and optimize and see
> > if that helps.  Any suggestions on how to debug _dumpRefs?
>
> _dumpRefs() returns a dict with Java objects as keys and their ref counts
> as values. If this dict is unusually large, something's amiss. What is
> "unusually large"? Time will tell :)
>
> > P.S.  My filter is implemented in Python.  In fact here is the code:
>
> Another thing to try (proceeding by elimination) is to index your
> documents without your custom filter. Does it still run out of memory?
> If the answer is no, clearly the Python filter integration code needs to
> be looked at closely (that is, the generated C++ for that code). Maybe
> something's leaking there.
>
> Andi..
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>
