[pylucene-dev] finalizing the deadly embrace

anurag uniyal anuraguniyal at yahoo.com
Mon Jan 21 02:20:33 PST 2008


Is there any problem with calling finalize() in my code?

rgds
Anurag


----- Original Message ----
From: anurag uniyal <anuraguniyal at yahoo.com>
To: Andi Vajda <vajda at osafoundation.org>
Cc: pylucene-dev at osafoundation.org
Sent: Monday, 21 January, 2008 3:28:47 PM
Subject: Re: [pylucene-dev] finalizing the deadly embrace


This still fails *here* after removing __del__.
 
I did rebuild JCC; that's why the custom analyzer/parser now works without calling finalize().
test_PythonDirectory.py is also passing.
 
src : trunk rev 381
lucene.VERSION = 2.2.0-613493
Python 2.4.3
Ubuntu 6.06.1 LTS
java 1.5.0_06-b05
 
rgds
Anurag

 
----- Original Message ----
From: Andi Vajda <vajda at osafoundation.org>
To: anurag uniyal <anuraguniyal at yahoo.com>
Cc: pylucene-dev at osafoundation.org
Sent: Monday, 21 January, 2008 1:33:13 PM
Subject: Re: [pylucene-dev] finalizing the deadly embrace


On Mon, 21 Jan 2008, anurag uniyal wrote:

> It does solve the problem for the custom Analyzer, parsers, etc.
> But my code with custom filters still goes out of memory.
> In the code below, if I comment out the 'result = MyFilter(result)' line,
> it works.

I don't seem to be able to reproduce this. It's working fine for me. I even 
increased the loop to 1,000,000. Monitoring the process, its size 
remains constant too.

Maybe the __del__() method is causing trouble?
But I left it in and all seemed fine for me.

So, what's different here?

  - did you rebuild JCC?
  - did you rebuild PyLucene? (what's lucene.VERSION returning? see the snippet below)
  - what version of Python are you using?
  - on what OS?
  - what version of Java?
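
For reference, here is a quick way to gather most of those details in one
go. This is a minimal sketch; it assumes lucene.VERSION is a plain module
constant, available right after import:

----
import sys, platform, lucene

print lucene.VERSION        # PyLucene build, e.g. '2.2.0-613493'
print sys.version           # Python interpreter version
print platform.platform()   # OS details
----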

Andi..


----
import lucene
# tiny max heap so any leak triggers an OutOfMemoryError quickly
lucene.initVM(lucene.CLASSPATH, maxheap='1m')
from lucene import (Token, PythonAnalyzer, PythonTokenStream,
                    StandardTokenizer, LowerCaseFilter)
from lia.analysis.AnalyzerUtils import AnalyzerUtils

class MyFilter(PythonTokenStream):
    # class-level bookkeeping: how many filters were created, which are alive
    count = 0
    filters = []

    def __init__(self, tokenStream):
        super(MyFilter, self).__init__()
        self.input = tokenStream
        MyFilter.count += 1
        self.id = MyFilter.count
        MyFilter.filters.append(self.id)

    def next(self):
        return self.input.next()

    def __del__(self):
        #self.input = None
        MyFilter.filters.remove(self.id)

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        super(MyAnalyzer, self).__init__()

    def tokenStream(self, fieldName, reader):
        result = StandardTokenizer(reader)
        result = LowerCaseFilter(result)

        # my filtering
        result = MyFilter(result)

        return result

text = 'TESTING the TESTS'
analyzer = MyAnalyzer()
try:
    for i in xrange(10000):
        if i % 100 == 0: print i
        tokens = AnalyzerUtils.tokensFromAnalysis(analyzer, text)
except lucene.JavaError, e:
    print i, e
    print "%s MyFilter instances remain:" % len(MyFilter.filters)
    print MyFilter.filters
----
rgds
Anurag


----- Original Message ----
From: Andi Vajda <vajda at osafoundation.org>
To: pylucene-dev at osafoundation.org
Sent: Sunday, 20 January, 2008 7:18:34 AM
Subject: Re: [pylucene-dev] finalizing the deadly embrace


On Thu, 17 Jan 2008, Andi Vajda wrote:

> Thinking about this some more, I believe that Anurag's finalizer proxy idea 
> is on the right track. It provides the "trick" needed to break the deadly 
> embrace when the ref count of the python object is down to 1, that is, down 
> to when the only reference is the one from the Java parent wrapper.
>
> When the finalizer proxy's refcount goes to zero, it is safe to assume that 
> only Java _may_ still be needing the object. This is enough then to replace 
> the strong global reference to the Java parent wrapper with a weak global 
> reference thereby breaking the deadly embrace and letting Java garbage 
> collect it when its time has come. When that time comes, the finalize() 
> method on it is normally called by the Java garbage collector and the python 
> ref count to the Python extension instance is brought to zero and the object 
> is finally freed.
>
> This assumes, of course, that when such an extension object is instantiated, 
> the finalizer proxy is actually returned.
>
> I should be able to implement this in C/C++ so that the performance hit is 
> minimal and in a way that is transparent to PyLucene users.
>

I checked the implementation of this idea into svn trunk rev 381.
It is no longer necessary to call finalize() by hand :)

I removed the finalize() calls from test_PythonDirectory.py, and test_Sort.py 
can now be run forever, without any leakage.
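
In rough pure-Python terms, the mechanism amounts to the sketch below. All
names here are invented for illustration; the actual implementation is C++
inside JCC and swaps a JNI global reference for a weak global one rather
than juggling Python objects:

----
class JavaParentWrapper(object):
    """Stands in for the Java-side wrapper holding the Python instance."""
    def __init__(self):
        # a strong (global) reference pins the wrapper against the Java GC
        self.ref_strength = 'strong'

    def demote_to_weak(self):
        # equivalent of replacing the global ref with a weak global ref:
        # the Java GC may now collect the wrapper, and its finalize() will
        # release the last reference to the Python extension instance
        self.ref_strength = 'weak'

class FinalizerProxy(object):
    """Returned to user code in place of the extension instance itself."""
    def __init__(self, instance, java_parent):
        self._instance = instance
        self._java_parent = java_parent

    def __getattr__(self, name):
        # transparent delegation, so user code never notices the proxy
        return getattr(self._instance, name)

    def __del__(self):
        # proxy refcount hit zero: only Java _may_ still need the object,
        # so break the embrace by weakening the Java-side reference
        self._java_parent.demote_to_weak()

parent = JavaParentWrapper()
proxy = FinalizerProxy(object(), parent)
del proxy                             # user code drops its last reference
assert parent.ref_strength == 'weak'  # Java GC is now free to finalize it
----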

It is necessary to rebuild both JCC and PyLucene to try this out.
I'd be curious to see if this solves your problem, Brian?

Andi..
_______________________________________________
pylucene-dev mailing list
pylucene-dev at osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev

