[pylucene-dev] Automatic call to finalize

Andi Vajda vajda at osafoundation.org
Thu Jan 17 10:44:47 PST 2008


On Thu, 17 Jan 2008, anurag uniyal wrote:

> I have a custom parser which is created and used to parse queries at 
> several places. Instead of calling finalize each time, I have wrapped my 
> custom parser in a python class and keep a ref to it, delegating all 
> calls and it works well, without any weakrefs.

> Do you think such behaviour is correct? Please tell me if it can create
> any problems.

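This looks like it could work. There are caveats, however, with Python
__del__() methods on objects that are part of reference cycles: the garbage
collector does not free such cycles and instead appends the objects to the
gc.garbage list. See the docs for the 'garbage' list [1].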

Andi..

[1] http://docs.python.org/lib/module-gc.html


---------
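import lucene
lucene.initVM(lucene.CLASSPATH, maxheap='1m')


class MyParser(object):
    """Python-side wrapper that owns a PythonQueryParser extension and
    calls finalize() on it when the wrapper itself is collected."""

    class _MyParser(lucene.PythonQueryParser):
        def __init__(self, defaultField, analyzer):
            super(MyParser._MyParser, self).__init__(defaultField, analyzer)

    def __init__(self, defaultField, analyzer):
        self._parser = self._MyParser(defaultField, analyzer)

    def __getattr__(self, name):
        # Only called for names not found on the wrapper; delegate them.
        # Guard '_parser' so a half-constructed wrapper cannot recurse forever.
        if name == '_parser':
            raise AttributeError(name)
        return getattr(self._parser, name)

    def __del__(self):
        # Release the Java side's reference to the extension instance.
        parser = self.__dict__.get('_parser')
        if parser is not None:
            parser.finalize()


analyzer = lucene.StandardAnalyzer()
for i in xrange(100000):
    if i % 100 == 0:
        print i
    customParser = MyParser("body", analyzer)
    query = customParser.parse("anurag")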
---------
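The behaviour Andi describes can be observed without a JVM. The minimal sketch below uses a hypothetical `FakeJavaParser` as a stand-in for the PyLucene extension object; it only records whether `finalize()` was called. It assumes CPython, where reference counting runs `__del__()` as soon as the last reference to the wrapper disappears. Once the wrapper is part of a reference cycle, `__del__()` runs only when the cyclic collector reaches it. On Python 2 such a cycle is never freed and lands in `gc.garbage`, so the expected output below assumes Python 3.4 or later.

```python
import gc


class FakeJavaParser(object):
    """Hypothetical stand-in for a PythonQueryParser extension instance."""

    def __init__(self):
        self.finalized = False

    def parse(self, text):
        return "query(%s)" % text

    def finalize(self):
        # In PyLucene this would release the Java side's reference.
        self.finalized = True


class Wrapper(object):
    """Delegating wrapper that calls finalize() on the wrapped object."""

    def __init__(self, parser):
        self._parser = parser

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself.
        if name == "_parser":
            raise AttributeError(name)
        return getattr(self._parser, name)

    def __del__(self):
        parser = self.__dict__.get("_parser")
        if parser is not None:
            parser.finalize()


def demo():
    # Case 1: no cycle. Dropping the last reference runs __del__ at once.
    plain = FakeJavaParser()
    w = Wrapper(plain)
    parsed = w.parse("anurag")  # delegated through __getattr__
    del w
    released_immediately = plain.finalized

    # Case 2: the wrapper refers to itself, forming a reference cycle.
    cyclic = FakeJavaParser()
    gc.disable()  # keep automatic collection from interfering
    try:
        w = Wrapper(cyclic)
        w.self_ref = w
        del w
        before_collect = cyclic.finalized
        gc.collect()  # explicit collection still runs while gc is disabled
        after_collect = cyclic.finalized
    finally:
        gc.enable()

    return parsed, released_immediately, before_collect, after_collect


if __name__ == "__main__":
    print(demo())  # → ('query(anurag)', True, False, True) on CPython 3.4+
```

Without a cycle, finalization is deterministic. With one, it is deferred to the collector, and on older Pythons it does not happen at all.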


----- Original Message ----
From: anurag uniyal <anuraguniyal at yahoo.com>
To: Andi Vajda <vajda at osafoundation.org>; pylucene-dev at osafoundation.org
Sent: Friday, 11 January, 2008 4:55:43 PM
Subject: Re: [pylucene-dev] memory leak status


Hi,

I am using the latest trunk code, but I am still getting java.lang.OutOfMemoryError.

It may be due to a problem in my code, so I have created and attached a sample script that shows the problem.

The script just adds a simple document to an IndexWriter, each time from a new thread.
It works without threading, and it also works if the document's field is UN_TOKENIZED, but with TOKENIZED it fails.

Thanks a lot!
Anurag


----- Original Message ----
From: Andi Vajda <vajda at osafoundation.org>
To: pylucene-dev at osafoundation.org
Sent: Friday, 11 January, 2008 4:26:53 AM
Subject: Re: [pylucene-dev] memory leak status


On Thu, 10 Jan 2008, Andi Vajda wrote:

>      I think I'm going to be adding support for the manual way via
>      finalize() shortly.

This just got checked in to rev 377.

The tests in test/test_PythonDirectory.py can now be run in an endless loop
without leaking. See the source of these tests for an example of finalize() use.

     > python test/test_PythonDirectory.py -loop
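Manual finalization amounts to calling finalize() once a Python extension object is no longer needed, including when an exception is raised while it is in use. The sketch below wraps that discipline in a context manager. `finalizing` and `StubExtension` are hypothetical names, not part of PyLucene, and the stub only counts finalize() calls. With PyLucene one would pass a real extension instance instead.

```python
from contextlib import contextmanager


@contextmanager
def finalizing(obj):
    """Yield obj and always call obj.finalize() on exit (hypothetical helper)."""
    try:
        yield obj
    finally:
        obj.finalize()


class StubExtension(object):
    """Hypothetical stand-in for a PyLucene Python extension instance."""

    def __init__(self):
        self.finalize_calls = 0

    def finalize(self):
        self.finalize_calls += 1


def run(loops):
    stubs = []
    for _ in range(loops):
        with finalizing(StubExtension()) as ext:
            stubs.append(ext)

    # finalize() must also run when the body raises.
    failing = StubExtension()
    try:
        with finalizing(failing):
            raise ValueError("simulated failure")
    except ValueError:
        pass

    return [s.finalize_calls for s in stubs], failing.finalize_calls


if __name__ == "__main__":
    print(run(3))  # → ([1, 1, 1], 1)
```

Tying finalize() to a `with` block keeps the release explicit and exception-safe without relying on `__del__()`.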

I'm still hoping to find a reliable way to automate this...

To rebuild PyLucene with this change, you also need to rebuild JCC.

Andi..
_______________________________________________
pylucene-dev mailing list
pylucene-dev at osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev








-----Inline Attachment Follows-----

import threading

import lucene
lucene.initVM(lucene.CLASSPATH, maxheap='5m')


class MyDocument(lucene.Document):
    # UN_TOKENIZED works; switching to TOKENIZED triggers the OutOfMemoryError.
    indexType = lucene.Field.Index.UN_TOKENIZED

    def __init__(self):
        lucene.Document.__init__(self)
        self.add(lucene.Field("body", "what a body", lucene.Field.Store.YES,
                              MyDocument.indexType))


class DocThread(threading.Thread):
    """Adds a single document to the shared writer from a new thread."""

    def __init__(self, writer):
        threading.Thread.__init__(self)
        self.writer = writer
        self.error = None

    def run(self):
        try:
            # Every Python thread that calls into the JVM must attach first.
            lucene.getVMEnv().attachCurrentThread()
            self.writer.addDocument(MyDocument())
        except Exception, e:
            self.error = e


def main():
    _store = lucene.FSDirectory.getDirectory("/tmp/index/", True)
    _writer = lucene.IndexWriter(_store, lucene.StandardAnalyzer(), True)

    # One short-lived thread per document; stop at the first error.
    for i in xrange(500):
        if i % 100 == 0:
            print i

        t = DocThread(_writer)
        t.start()
        t.join()

        if t.error:
            print t.error
            break


main()
print "lucene.Field.Index.UN_TOKENIZED works but TOKENIZED fails..."
MyDocument.indexType = lucene.Field.Index.TOKENIZED
main()



