[pylucene-dev] Analyzer memory leak

Robert Kaye rob at eorbit.net
Fri Jan 28 13:06:20 PST 2005


Hi!

First off, thanks for PyLucene -- it totally rocks!

I've been working on a Python script that uses Lucene to look up text 
in an index -- it works great, but it ran my machine out of memory in 
an all-night test. :-( After a little digging around, I've come up 
with this little Python program that reproduces the memory leak:

#!/usr/bin/env python

import PyLucene
from stringreader import StringReader

analyzer = PyLucene.StopAnalyzer()
while True:
     query = u"any old text here will cause a leak"
     for token in query.split(u' '):
         stream = analyzer.tokenStream("", StringReader(token))
         while stream.next(): pass
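
To put numbers on the growth rather than waiting for the machine to run out of memory, something like the following could be wrapped around the loop. This is only a rough sketch: it uses the standard `resource` module (Unix only), the helper names `peak_rss_kb`, `sample_growth` and `tokenize_once` are made up for illustration, and `tokenize_once` is a placeholder that does not call PyLucene at all.

```python
import resource

def peak_rss_kb():
    # ru_maxrss is the peak resident set size of this process;
    # Linux reports it in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def sample_growth(work, rounds=5, calls_per_round=1000):
    """Call work() repeatedly and record the peak RSS after each round."""
    samples = []
    for _round in range(rounds):
        for _call in range(calls_per_round):
            work()
        samples.append(peak_rss_kb())
    return samples

def tokenize_once():
    # Placeholder workload (an assumption): in the real test this would be
    # the analyzer.tokenStream(...) loop from the script above.
    for token in u"any old text here will cause a leak".split(u' '):
        pass

samples = sample_growth(tokenize_once)
print(samples)
```

If the samples keep climbing round after round with the real tokenizing loop plugged in, that points at the loop body rather than at the rest of the script.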

I know that my use of the analyzer is a bit strange, but I want to 
examine which words get tossed as stop words, and I need to correlate 
tokenized Lucene queries with the non-tokenized query strings.

Is there something I need to do that I am not doing?

This happens on Linux:

kernel: 2.4.24 (eeek -- it's time to upgrade!)
gcj: gcj (GCC) 3.4.4 20041218 (prerelease) (Debian 3.4.3-6)
python: Python 2.3.4 (#2, Jan  5 2005, 08:24:51)
PyLucene: 0.9.6

Any tips at all would be appreciated!


The StringReader class used in the example above is mostly ripped off 
from one of the PyLucene unit tests:

#!/usr/bin/env python
class StringReader(object):

     def __init__(self, text):
         self.text = unicode(text)

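     def read(self, length = -1):
         # Return up to `length` characters; -1 means "everything left".
         text = self.text
         if text is None:
             # Already exhausted: signal end of input.
             return ''

         if length == -1 or length >= len(text):
             # The rest fits in one read; mark the reader as exhausted.
             self.text = None
             return text

         # Hand back the first `length` characters and keep the remainder.
         text = text[0:length]
         self.text = self.text[length:]
         return text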

     def close(self):
         pass
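
For anyone who wants to check how the reader hands out its text, here is a small stand-alone sketch of the same read() logic. It is not the original file: `str` stands in for `unicode` so that it also runs on Python 3, and the sample string is arbitrary.

```python
class StringReader(object):
    # Same logic as the class above, with str standing in for unicode so
    # the sketch also runs on Python 3 (an assumption, not the original).

    def __init__(self, text):
        self.text = str(text)

    def read(self, length=-1):
        text = self.text
        if text is None:
            # Nothing left to hand out.
            return ''
        if length == -1 or length >= len(text):
            # Return everything that remains and mark the reader exhausted.
            self.text = None
            return text
        # Return the first `length` characters, keep the remainder.
        self.text = text[length:]
        return text[:length]

    def close(self):
        pass

reader = StringReader("any old text")
first = reader.read(3)   # partial read
rest = reader.read()     # everything that is left
after = reader.read()    # the reader is exhausted by now
```

Once the text has been handed out the reader only ever returns an empty string, which is how the consumer knows it has reached the end of the input.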

--

--ruaok         Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye     --     rob at eorbit.net     --    http://mayhem-chaos.net


