[pylucene-dev] Broken optimization?

Andi Vajda vajda at osafoundation.org
Wed Feb 1 11:29:03 PST 2006


On Wed, 1 Feb 2006, Jared Kuolt wrote:

> On Wed, 2006-02-01 at 10:25 -0800, Andi Vajda wrote:
>> What version of PyLucene are you running ?
> PyLucene.py states 1.0
>
>> On what operating system ?
> Ubuntu 5.04
>
>> Did you build it yourself ? if so, with what version of gcj ?
> I don't actually know but I'll try to find out. In any case, gcj is
> version 4.0.2

PyLucene 1.0 is built from Java Lucene 1.4.3, so you may have hit a bug in Java 
Lucene itself. You could ask on the java-user at lucene.apache.org mailing list 
or upgrade to the latest PyLucene 1.9, which is very close to Java Lucene 1.9's 
svn HEAD revision. Even though there is no official Java Lucene 1.9 release 
yet, it appears to be very stable and has had many bugs fixed since release 
1.4.3. Indexes created with 1.4.3 are supposed to be readable from Lucene 1.9 
(the opposite is not true).

For a recent source tarball of PyLucene 1.9 see 
http://pylucene.osafoundation.org.

You should also use gcj 3.4.x with x >= 3; gcj build instructions are included 
near the bottom of PyLucene's INSTALL file. I've had little luck using gcj 4.x 
so far.
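
As a rough illustration of the "3.4.x, x >= 3" rule above, here is a minimal 
sketch that checks a version string against it. The function name 
is_recommended_gcj is made up for this example, and the assumption that the 
first dotted triple in the text is the gcj version may not hold for every 
`gcj --version` banner.

```python
import re


def is_recommended_gcj(version_text):
    """Return True if version_text names gcj 3.4.x with x >= 3.

    Hypothetical helper: it takes the first "major.minor.patch" triple
    found in the text and assumes that triple is the gcj version.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        # No dotted triple found, so the version cannot be confirmed.
        return False
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor) == (3, 4) and patch >= 3
```

Passing it the 4.0.2 mentioned earlier in this thread returns False, while a 
string such as "3.4.3" returns True.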

Andi..
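
Separately from the version question, the quoted script below calls 
writer.optimize() every 10000 records. Nothing in this thread establishes that 
this causes the error, but optimize() merges the index's segments and is 
commonly run once after bulk indexing, so deferring it to a single call before 
close() is one thing that could be tried. The sketch below shows only that loop 
shape. RecordingWriter and index_all are hypothetical stand-ins written for 
this example, not PyLucene classes; only the method names addDocument, 
optimize and close are taken from the script and traceback.

```python
class RecordingWriter(object):
    """Hypothetical stand-in for an index writer; it only records calls."""

    def __init__(self):
        self.calls = []

    def addDocument(self, doc):
        self.calls.append("add")

    def optimize(self):
        self.calls.append("optimize")

    def close(self):
        self.calls.append("close")


def index_all(records, writer):
    """Add every record, optimize once at the end, and always close."""
    count = 0
    try:
        for record in records:
            writer.addDocument(record)
            count += 1
        # Single optimize after the last add, instead of one per batch.
        writer.optimize()
    finally:
        # Close the writer even if adding or optimizing raised.
        writer.close()
    return count
```

With a real writer, the same function would be called with the writer object 
in place of RecordingWriter().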

>
>>
>> Andi..
>>
>> On Wed, 1 Feb 2006, Jared Kuolt wrote:
>>
>>> Hello all,
>>>
>>> I've recently inherited a PyLucene project with very little knowledge of
>>> PyLucene and Lucene itself.
>>>
>>> To cut to the chase, I think I've screwed something up. Basically there
>>> is a script we have that runs, pulling records from a text file, and
>>> then puts them into a "queue" to update the indexes later.
>>>
>>> In any case, the basic gist of the script is this:
>>>
>>> #### START ####
>>>
>>> def main():
>>>    reader = Reader("../text.txt")
>>>    '''to add id for reindexing profile'''
>>>    queue = StaleQueue()
>>>    '''lucene index to store events'''
>>>    indexDirectory = "/home/data/qdb/events"
>>>    analyzer = PyLucene.StandardAnalyzer()
>>>    writer = PyLucene.IndexWriter(indexDirectory, analyzer, False)
>>>    counter = 0
>>>    line = reader.get_line()
>>>    while(line):
>>>        counter +=1
>>>        if counter % 10000 == 0:
>>>            print "passing: %s" %counter
>>>            writer.optimize()
>>>            print "optimized - resuming"
>>>        handler = EfDemo(line)
>>>        indexer = EventIndexer(writer)
>>>        indexer.index(handler.get_id(), handler)
>>>        queue.add(handler.get_id())
>>>        line = reader.get_line()
>>>    writer.close()
>>>
>>> #### END ####
>>>
>>> It dies after 1000 records (probably has to do with the optimization...
>>> but I have no clue how to check):
>>>
>>> #### START ####
>>>
>>> Traceback (most recent call last):
>>>  File "./qdbefdemohandler.py", line 48, in ?
>>>    main()
>>>  File "./qdbefdemohandler.py", line 29, in main
>>>    indexer.index(handler.get_andii_id(), handler)
>>>  File "/usr/local/lib/python2.4/site-packages/qdb/qdbindexer.py", line
>>> 17, in index
>>>    self.writer.addDocument(doc)
>>>  File "/usr/lib/python2.4/site-packages/PyLucene.py", line 1902, in
>>> addDocument
>>>    def addDocument(*args): return
>>> _PyLucene.IndexWriter_addDocument(*args)
>>> PyLucene.JavaError:
>>> java.io.FileNotFoundException: /home/data/qdb/events/_30fvb.fnm (No such
>>> file or directory)
>>>
>>> #### END ####
>>>
>>> Thoughts? Help a PyLu newbie out! :)
>>>
>>> --
>>> Jared Kuolt <jaredk at morefocus.com>
>>>
>>>
>>> _______________________________________________
>>> pylucene-dev mailing list
>>> pylucene-dev at osafoundation.org
>>> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>>>
> -- 
> Jared Kuolt <jaredk at morefocus.com>
> morefocus, inc.
>
>
