[pylucene-dev] PyLucene optimize error (due to file size limit?)
chan kang
ddanddan at gmail.com
Mon Mar 20 19:38:07 PST 2006
The result of my "make test" is as follows:
find test -name 'test_*.py' | xargs -n 1 python
> ...
> ----------------------------------------------------------------------
> Ran 3 tests in 0.011s
>
> OK
> .....
> ----------------------------------------------------------------------
> Ran 5 tests in 0.023s
>
> OK
> .
> ----------------------------------------------------------------------
> Ran 1 test in 0.004s
>
> OK
> .
> ----------------------------------------------------------------------
> Ran 1 test in 0.001s
>
> OK
> F
> ======================================================================
> FAIL: test_bug1564 (__main__.Test_Bug1564)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "test/test_bug1564.py", line 52, in test_bug1564
> self.assertEqual(hits.length(), 1)
> AssertionError: 0 != 1
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.003s
>
> FAILED (failures=1)
> E
> ======================================================================
> ERROR: test_bug1763 (__main__.Test_Bug1763)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "test/test_bug1763.py", line 62, in test_bug1763
> self.assertEqual(hits.doc(0).get('id'), '2')
> File "/usr/lib/python2.4/site-packages/PyLucene.py", line 2611, in doc
> def doc(*args): return _PyLucene.Hits_doc(*args)
> JavaError: java.lang.IndexOutOfBoundsException: Not a valid hit number: 0
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.028s
>
> FAILED (errors=1)
> .
> ----------------------------------------------------------------------
> Ran 1 test in 0.005s
>
> OK
> .
> ----------------------------------------------------------------------
> Ran 1 test in 0.001s
>
> OK
> FF
> ======================================================================
> FAIL: testAfter (__main__.DateFilterTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "test/test_DateFilter.py", line 131, in testAfter
> self.assertEqual(0, result.length())
> AssertionError: 0 != 1
>
> ======================================================================
> FAIL: testBefore (__main__.DateFilterTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "test/test_DateFilter.py", line 82, in testBefore
> self.assertEqual(0, result.length())
> AssertionError: 0 != 1
>
> ----------------------------------------------------------------------
> Ran 2 tests in 0.006s
>
> FAILED (failures=2)
> E
> ======================================================================
> ERROR: testDocBoost (__main__.DocBoostTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "test/test_DocBoost.py", line 68, in testDocBoost
> hitCollector())
> File "/usr/lib/python2.4/site-packages/PyLucene.py", line 2888, in search
> def search(*args): return _PyLucene.Searcher_search(*args)
> JavaError: java.lang.NullPointerException
>
> ----------------------------------------------------------------------
> Ran 1 test in 12.866s
>
> FAILED (errors=1)
> xargs: python: terminated by signal 6
> make: *** [test] Error 123
>
As you can see, the make process doesn't even finish.
What should I do in this case?
Is there any way to deal with this problem other than
waiting for the stable package to come out?
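Because GNU `xargs` stops as soon as one of the commands it runs is killed by a signal, a crash in one test file can hide the results of the files after it. A minimal sketch of a workaround, assuming the `test/test_*.py` layout that `make test` uses above: run each file in its own interpreter and record how it exited. The helpers `run_tests` and `summarize` are hypothetical names, and the code targets a current Python 3 rather than the Python 2.4 shown in the tracebacks. On POSIX, `subprocess` reports a child killed by signal N as return code -N, so an abort like the one above would show up as -6.

```python
import glob
import os
import subprocess
import sys


def run_tests(test_dir="test", pattern="test_*.py"):
    """Run each matching test file in a fresh interpreter.

    Returns a list of (path, returncode) pairs. Unlike `xargs -n 1 python`,
    a child that dies from a signal does not stop the loop.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(test_dir, pattern))):
        # One interpreter per file, as with `xargs -n 1 python`.
        returncode = subprocess.call([sys.executable, path])
        results.append((path, returncode))
    return results


def summarize(results):
    """Print one line per file: ok, failed (exit status), or killed (signal)."""
    for path, code in results:
        if code == 0:
            status = "ok"
        elif code < 0:
            # On POSIX, subprocess reports death by signal N as -N.
            status = "killed by signal %d" % -code
        else:
            status = "failed (exit status %d)" % code
        print("%-40s %s" % (path, status))
```

From the top of the PyLucene source tree, `summarize(run_tests())` prints one line per test file, so a crash is pinned to the file that caused it instead of ending the whole run.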
Currently, I am unable to sort the search results using
search(query, sort).
So I tried sorting afterwards, by retrieving the documents
with map(lambda i: hits.doc(i), range(len(hits)))
and then sorting the documents according to a single field.
However, this is far too slow once the number of hits goes above
a certain amount.
I really hope there's a way to search and sort at the same time...
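The post-retrieval sort described above can be sketched with the loop variable fixed and a key function passed to `sorted`. `FakeHits` and `FakeDoc` are hypothetical stand-ins that mimic only the `length()`, `doc(i)` and `get(name)` calls seen in the test output above, so the example runs without PyLucene; `sort_hits_by_field` is likewise an illustrative name. With real hits, the cost is likely dominated by loading every stored document, which is why this approach slows down as the number of hits grows.

```python
class FakeDoc(object):
    """Hypothetical stand-in for a stored document with string fields."""

    def __init__(self, fields):
        self._fields = fields

    def get(self, name):
        return self._fields.get(name)


class FakeHits(object):
    """Hypothetical stand-in for a result set exposing length() and doc(i)."""

    def __init__(self, docs):
        self._docs = docs

    def length(self):
        return len(self._docs)

    def doc(self, i):
        return self._docs[i]


def sort_hits_by_field(hits, field, reverse=False):
    """Load every hit, then sort the documents by one stored field.

    This costs one doc(i) load per hit plus an O(n log n) sort.
    A missing field is treated as the empty string.
    """
    docs = [hits.doc(i) for i in range(hits.length())]
    # Fall back to "" so None is never compared with a string.
    return sorted(docs, key=lambda d: d.get(field) or "", reverse=reverse)
```

Values are compared as strings, so numeric or date fields need a sortable text form (for example zero-padded numbers or YYYYMMDD dates).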
Please help...
Thank you very much in advance.
- Chan
2006/3/21, Andi Vajda <vajda at osafoundation.org>:
>
>
> On Sat, 18 Mar 2006, chan kang wrote:
>
> > Hi...
> > I've read a posting from 2004 about an error in PyLucene's optimize():
> > http://lists.osafoundation.org/pipermail/pylucene-dev/2004-August/000089.html
> > At the end of the thread, he said that he had solved the problem by
> > using "gcc (GCC) 3.5.0 20040717".
> >
> > I think my situation is somewhat similar to his.
> > What happens is that my optimize() call ends with a JavaError,
> > and it doesn't even say where the error occurred.
> > The error message is as follows:
> >
> > optimizing................
> >> Traceback (most recent call last):
> >> File "in.py", line 124, in ?
> >> writer.optimize()
> >> File "/usr/lib/python2.4/site-packages/PyLucene.py", line 2276, in optimize
> >> def optimize(*args): return _PyLucene.IndexWriter_optimize(*args)
> >> PyLucene.JavaError
> >
> > Since I've been encountering this type of error ever since I tried to
> > optimize larger indices, I've switched from optimizing only once at the
> > end to optimizing after every certain number of transactions (writes to
> > the index).
> > Whenever optimizing succeeds, I copy the index to another temporary
> > directory, so the copied index is always the latest successfully
> > optimized version.
> >
> > However, I still get the same type of error.
> > So I looked into the index directory and found that the
> > "so-far-successful index" is 2GB. I think this means that optimize()
> > succeeded up to 2GB, and the error occurred when it tried to produce
> > a single file larger than 2GB.
> > Because the figure is exactly 2GB, I guessed that it might be a Linux
> > file size limit, so I tried to create a file of about 12GB with the command
> > dd if=/dev/zero of=big.file bs=1M count=12000
> >
> > To my surprise, the 12GB file was created successfully. Does this mean
> > the problem has nothing to do with a Linux file size limit?
> > My gcc version is 4.x (gcc (GCC) 4.0.3 20060212 (prerelease) (Debian
> > 4.0.2-9)), the file system is ext3, and the kernel is Debian 2.6.15.
> > After reading this posting:
> > http://lists.osafoundation.org/pipermail/pylucene-dev/2004-August/000092.html
> > I started to seriously consider actually "downgrading" to gcc 3.5.0,
> > because he said that worked...
>
> There was a bug in gcc 3.x involving a 2GB file size limit. gcj 3.5 morphed
> into gcj 4.0, which has that particular problem fixed.
>
> If you'd like to try a newer version of gcj with PyLucene, 'make test'
> will tell you if your build is sane.
>
> Andi..
>
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>
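The 2GB figure in this thread matches the largest value a signed 32-bit file offset can hold, 2^31 - 1 = 2,147,483,647 bytes, while the dd command above writes 12000 blocks of 1 MiB, about 12.6 GB. The dd test only shows that the kernel and ext3 accept files that large, not that every program on the system can write them. A minimal sketch for checking whether an index has reached that boundary is a plain directory scan for files larger than 2^31 - 1 bytes; `find_large_files` and `DD_TEST_BYTES` are hypothetical names, and the threshold is a parameter so the scan can be tried on small files.

```python
import os

# Largest value a signed 32-bit integer can hold: 2,147,483,647 bytes.
SIGNED_32BIT_MAX = 2**31 - 1

# Size written by `dd if=/dev/zero of=big.file bs=1M count=12000`.
DD_TEST_BYTES = 12000 * 1024 * 1024  # 12,582,912,000 bytes, about 12.6 GB


def find_large_files(index_dir, limit=SIGNED_32BIT_MAX):
    """Return sorted (path, size) pairs for files under index_dir larger than limit."""
    large = []
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                large.append((path, size))
    return sorted(large)
```

Pointed at the index directory after a failed optimize(), an empty result suggests the failure happened before any segment file crossed the 2GB mark.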