[pylucene-dev] Re: lucene.JavaError: java.lang.OutOfMemoryError: Java heap space
Brian Merrell
brian at merrells.org
Tue Jan 8 19:57:41 PST 2008
# java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
# which java
/usr/bin/java
It doesn't seem to crash when I remove the filter. However, this may be
misleading, as I don't have nearly as many tokens (particularly unique tokens)
without the filter. The problem may still exist, with the symptoms merely delayed.
After 3000 documents I get len(myvm._dumpRefs()) == 12270 and it
seems to be increasing by about 4000 for each 1000 documents.
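
For anyone who wants to reproduce these measurements, here is a minimal sketch of how the size of the reference table could be sampled during indexing. It assumes only what is described in this thread: a callable that returns a dict of Java objects to ref counts (myvm._dumpRefs in my case). The RefTableMonitor class and its method names are hypothetical helpers for illustration, not part of PyLucene.

```python
class RefTableMonitor(object):
    """Record the size of a reference table at chosen points during indexing.

    dump_refs is any zero-argument callable returning a dict; in this thread
    that would be myvm._dumpRefs (an assumption about the caller's setup).
    """

    def __init__(self, dump_refs):
        self._dump_refs = dump_refs
        self.samples = []  # list of (docs_indexed, table_size) pairs

    def sample(self, docs_indexed):
        """Store and return the current number of entries in the table."""
        size = len(self._dump_refs())
        self.samples.append((docs_indexed, size))
        return size

    def growth_per_1000(self):
        """Average number of entries gained per 1000 documents, measured
        between the first and last samples. Returns None when fewer than
        two samples exist or no documents were indexed between them."""
        if len(self.samples) < 2:
            return None
        first_docs, first_size = self.samples[0]
        last_docs, last_size = self.samples[-1]
        if last_docs == first_docs:
            return None
        return (last_size - first_size) * 1000.0 / (last_docs - first_docs)
```

Calling monitor.sample(n) every 1000 documents and printing monitor.growth_per_1000() at the end should show whether the table levels off or keeps growing, with and without the filter.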
I didn't even realize C++ code was being generated. I doubt I can help
directly with this, but I would be happy to provide anything that would help
those more knowledgeable than I am debug this.
-brian
On 1/8/08, Andi Vajda <vajda at osafoundation.org> wrote:
>
>
> On Tue, 8 Jan 2008, Brian Merrell wrote:
>
> > Thanks for the quick reply. I haven't used Java in years so my apologies if
> > I am not able to provide useful debug info without some guidance.
> >
> > Memory does seem to be running low when it crashes. According to top,
> > python is using almost all of the 4GB when it bails.
>
> That may be misleading because all the memory used belongs to the Python
> process. Even Java's since it's loaded in via shared libraries into the
> Python process.
>
> > I don't know what Java VM I am using. How do I determine this?
>
> At the shell prompt enter: java -version
> For example, on my Mac, I get:
>
> java version "1.5.0_13"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
> Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
>
> Also, what does 'which java' return ?
>
> > I will try running it calling gc.collect() and running optimize and see if
> > that helps. Any suggestions on how to debug _dumpRefs?
>
> _dumpRefs() returns a dict of java objects as keys and their ref count as
> values. If this dict is unusually large, something's amiss. What is
> "unusually" ? Time will tell :)
>
> > P.S. My filter is implemented in Python. In fact here is the code:
>
> Another thing to try (proceeding by elimination), is to index your documents
> without your custom filter. Does it still run out of memory ? If the answer
> is no, clearly the python filter integration code needs to be looked at
> closely (that is, the generated C++ for that code). Maybe something's
> leaking there.
>
> Andi..
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>