[pylucene-dev] TermDocs.read() method
Andi Vajda
vajda at osafoundation.org
Tue Sep 9 11:59:55 PDT 2008
On Sep 9, 2008, at 11:21, Martin Bachwerk <bachwerk at i5.informatik.rwth-aachen.de> wrote:
> Yea, tried with maxheap=128m.. it stabilized at around 230MB RAM..
> need to check performance though..
> The question is just.. when I iterate with .next() no memory is
> eaten up.. it lives on around 30-40MB.. and like this it grows to
> 800.. just strange.
For questions about the Lucene APIs themselves, you'd be better off
asking java-user at lucene.apache.org, as there is more expertise
hanging out there.
> But since I don't know much Java and this is all not so critical,
> I'll just leave it be for now.. Thanks for the help! :)
Great !
Andi..
>
>
> Martin
>>
>> On Tue, 9 Sep 2008, Martin Bachwerk wrote:
>>
>>> Hello again,
>>>
>>> the index is kinda large indeed.. even though I have
>>> Field.Store.NO set for the actual content.. (ok the documents are
>>> 2-3k large on average, but it could be smaller still..)
>>>
>>> The memory use is just growing and growing.. though doesn't go
>>> into critical area, it just ate up 800megs out of 1024 I have in
>>> some 15 mins.. after that it stayed stable. I guess this would be
>>> acceptable.. but I don't quite understand why it is the case..
>>
>> If it stabilized, it could just mean that this is the memory
>> necessary for Java Lucene to work with your index. Have you tried
>> reducing the max memory so that you use less but gc more often ?
>>
>>> The arrays are pretty much dependent on the term (i.e. word).. for
>>> words like "is" they're around the size of the number of
>>> documents.. for rare words they can be 1-2-3.. entries long..
>>>
>>> I don't have Java code to test all this sorry.
>>
>> It could be written :) It's pretty much a one-to-one mapping for
>> the API calls. This is what I would do next to isolate this if I
>> were to debug this further right now.
>>
>> Andi..
>>
>>
>
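The isolation test suggested in the quoted message comes down to running the same enumeration both ways, once with next() and once with read(), and comparing results and memory. Below is a minimal sketch of such a harness, written in Python. FakeTermDocs is a hypothetical in-memory stand-in, not the PyLucene class. It only mirrors the shape of the Java TermDocs methods next(), doc(), freq() and read(int[] docs, int[] freqs), where read() fills the two arrays and returns the number of entries read, or zero once exhausted. The helper names and the batch size are likewise assumptions made for illustration.

```python
class FakeTermDocs:
    """Hypothetical in-memory stand-in for a TermDocs enumeration."""

    def __init__(self, postings):
        self._postings = list(postings)  # (doc, freq) pairs
        self._pos = -1                   # positioned before the first entry

    def next(self):
        """Advance by one entry; False once the enumeration is exhausted."""
        self._pos += 1
        return self._pos < len(self._postings)

    def doc(self):
        return self._postings[self._pos][0]

    def freq(self):
        return self._postings[self._pos][1]

    def read(self, docs, freqs):
        """Fill docs/freqs in place, up to len(docs) entries.

        Returns the number of entries written, 0 when exhausted.
        """
        n = 0
        while n < len(docs) and self._pos + 1 < len(self._postings):
            self._pos += 1
            docs[n], freqs[n] = self._postings[self._pos]
            n += 1
        return n


def collect_with_next(term_docs):
    """Walk the enumeration one entry at a time."""
    out = []
    while term_docs.next():
        out.append((term_docs.doc(), term_docs.freq()))
    return out


def collect_with_read(term_docs, batch_size=32):
    """Walk the enumeration in batches, reusing one pair of buffers."""
    docs = [0] * batch_size
    freqs = [0] * batch_size
    out = []
    while True:
        n = term_docs.read(docs, freqs)
        if n == 0:
            break
        out.extend(zip(docs[:n], freqs[:n]))
    return out


# Made-up postings, only to show that both loops see the same entries.
postings = [(doc, 1 + doc % 3) for doc in range(0, 100, 7)]
by_next = collect_with_next(FakeTermDocs(postings))
by_read = collect_with_read(FakeTermDocs(postings), batch_size=4)
```

Against a real index, the same two loops would run over the enumeration obtained for each term, with process memory watched between the runs. Reusing one fixed-size pair of buffers is a design choice of this sketch. Whether allocating arrays sized per term is what drives the growth reported above is not established in this thread.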