[pylucene-dev] finalizing the deadly embrace

Andi Vajda vajda at osafoundation.org
Mon Jan 21 00:03:13 PST 2008

On Mon, 21 Jan 2008, anurag uniyal wrote:

> It does solve the problem for Custom Analyzer and parsers etc.
> But my code with Custom filters still goes out of memory.
> In code below if i comment out 'result = MyFilter(result)' line
> it works.

I don't seem to be able to reproduce this. It's working fine for me. I even 
increased the loop to 1,000,000. Monitoring the process, its size 
remains constant too.
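One way to watch the process footprint from inside such a loop is the stdlib `resource` module (a sketch; Unix only, and the helper name is mine):

```python
import resource

def max_rss_kb():
    """Peak resident set size of this process.

    Reported in kilobytes on Linux; note that macOS reports bytes.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Printing this every few hundred iterations gives a rough leak signal: a value that keeps climbing across iterations points at objects that are never collected.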

Maybe the __del__() method is causing trouble ?
But I left it in and all seemed fine for me.

So, what's different here ?

   - did you rebuild JCC ?
   - did you rebuild PyLucene ? (what's lucene.VERSION returning ?)
   - what version of Python are you using ?
   - on what OS ?
   - what version of Java ?
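For reference, most of those answers can be gathered in one place (a sketch; `lucene.VERSION` is the attribute mentioned above, and the import is guarded in case no PyLucene build is on the path):

```python
import sys
import platform

def environment_report():
    # collect the version details asked about above
    info = {
        'python': sys.version.split()[0],
        'os': platform.platform(),
    }
    try:
        import lucene            # needs a PyLucene build on sys.path
        info['pylucene'] = lucene.VERSION
    except ImportError:
        info['pylucene'] = 'not installed'
    return info
```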


import lucene
lucene.initVM(lucene.CLASSPATH, maxheap='1m')
from lucene import (Token, PythonAnalyzer, PythonTokenStream,
                    StandardTokenizer, LowerCaseFilter)
from lia.analysis.AnalyzerUtils import AnalyzerUtils

class MyFilter(PythonTokenStream):
    count = 0
    filters = []

    def __init__(self, tokenStream):
        super(MyFilter, self).__init__()
        self.input = tokenStream
        MyFilter.count += 1
        self.id = MyFilter.count
        # track live filter ids so the handler below can report leaks
        MyFilter.filters.append(self.id)

    def next(self):
        return self.input.next()

    def __del__(self):
        #self.input = None
        MyFilter.filters.remove(self.id)


class MyAnalyzer(PythonAnalyzer):

    def __init__(self):
        super(MyAnalyzer, self).__init__()

    def tokenStream(self, fieldName, reader):
        result = StandardTokenizer(reader)
        result = LowerCaseFilter(result)

        # my filtering
        result = MyFilter(result)

        return result


text = 'TESTING the TESTS'
analyzer = MyAnalyzer()
try:
    for i in xrange(10000):
        if i % 100 == 0:
            print i
        tokens = AnalyzerUtils.tokensFromAnalysis(analyzer, text)
except lucene.JavaError, e:
    print i, e
    print "%s MyFilter remain:" % len(MyFilter.filters)
    print MyFilter.filters

----- Original Message ----
From: Andi Vajda <vajda at osafoundation.org>
To: pylucene-dev at osafoundation.org
Sent: Sunday, 20 January, 2008 7:18:34 AM
Subject: Re: [pylucene-dev] finalizing the deadly embrace

On Thu, 17 Jan 2008, Andi Vajda wrote:

> Thinking about this some more, I believe that Anurag's finalizer proxy idea 
> is on the right track. It provides the "trick" needed to break the deadly 
> embrace when the ref count of the python object is down to 1, that is, down 
> to when the only reference is the one from the Java parent wrapper.
> When the finalizer proxy's refcount goes to zero, it is safe to assume that 
> only Java _may_ still be needing the object. This is enough then to replace 
> the strong global reference to the Java parent wrapper with a weak global 
> reference thereby breaking the deadly embrace and letting Java garbage 
> collect it when its time has come. When that time comes, the finalize() 
> method on it is normally called by the Java garbage collector and the python 
> ref count to the Python extension instance is brought to zero and the object 
> is finally freed.
> This assumes, of course, that when such an extension object is instantiated, 
> the finalizer proxy is actually returned.
> I should be able to implement this in C/C++ so that the performance hit is 
> minimal and in a way that is transparent to PyLucene users.
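The actual fix lives in JCC's C/C++ layer, but the shape of the trick can be sketched in pure Python with `weakref` (the class names here are illustrative stand-ins, not JCC's):

```python
import weakref

class JavaWrapper(object):
    """Stand-in for the Java parent that pins the Python extension."""
    def __init__(self):
        self.ref = None            # strong reference: one half of the embrace

class FinalizerProxy(object):
    """Returned to user code instead of the raw extension instance.

    When the last Python reference to this proxy dies, only Java may
    still need the target, so the Java side's strong reference is
    demoted to a weak one and collection can proceed.
    """
    def __init__(self, target, java_side):
        self._target = target
        self._java_side = java_side

    def __getattr__(self, name):
        # delegate everything else to the wrapped extension
        return getattr(self._target, name)

    def __del__(self):
        # refcount is down to the Java parent only: break the embrace
        self._java_side.ref = weakref.ref(self._target)
```

Once the proxy dies, the wrapper holds only a weak reference, so when the "Java" side later lets go, the extension is freed instead of being pinned forever.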

I checked the implementation of this idea into svn trunk rev 381.
It is no longer necessary to call finalize() by hand :)

I removed the finalize() calls from test_PythonDirectory.py, and test_Sort.py 
can now be run forever, without any leakage.

It is necessary to rebuild both JCC and PyLucene to try this out.
I'd be curious to see if this solves your problem, Brian ?

pylucene-dev mailing list
pylucene-dev at osafoundation.org

