[pylucene-dev] Re: segfault in KeywordAnalyzerTest with jcc-enabled PyLucene

Andi Vajda vajda at osafoundation.org
Fri Nov 30 08:51:33 PST 2007


On Fri, 30 Nov 2007, Felix Schwarz wrote:

> Andi Vajda wrote:
>>> The segfault seems to be in testSimpleKeywordAnalyzer() before:
>>> self.assertEqual(ts.next().termText(), input)
>>> The program terminates immediately after ts.next().
>> 
>> Could it be that there is a mismatch in Unicode char width between the
>> Python you compiled PyLucene with and the Python you're running it with
>> (which should be the same, really)?
>
> How can I check this?
> I'm just using the Python that comes with CentOS 5 and did not modify
> anything in PyLucene (besides some Makefile/setup.py changes).
>
>> The name of the function on the stack, 'PyUnicodeUCS4_FromUnicode',
>> suggests this.
>> 
>> To debug this, use gdb. You can recompile PyLucene with DEBUG=1 to disable 
>> optimizations and get a better gdb experience.
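A quick way to answer "How can I check this?" from the Python side is to ask each interpreter how wide its Unicode chars are. The sketch below relies on `sys.maxunicode`, which is 0xFFFF on a narrow (UCS2, 2-byte) build and 0x10FFFF on a wide (UCS4, 4-byte) build; the helper name `unicode_width` is hypothetical. Run it with both the Python used to build PyLucene and the one used to run the tests: the two results should match. Python 3.3 and later always report 0x10FFFF, since they no longer come in narrow and wide builds.

```python
import sys


def unicode_width():
    """Return the size in bytes of a Unicode char on this interpreter build.

    Hypothetical helper: sys.maxunicode is 0xFFFF on a narrow (UCS2) build
    and 0x10FFFF on a wide (UCS4) build.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4


if __name__ == "__main__":
    print("sys.maxunicode = %#x, Unicode char width = %d bytes"
          % (sys.maxunicode, unicode_width()))
```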

Edit JCCEnv.cpp and add:

printf("sizeof(Py_UNICODE) == sizeof(jchar): %d\n",
        sizeof(Py_UNICODE) == sizeof(jchar));

to the top of the JCCEnv::fromJString function and rebuild.
If it prints '1', I suspect a problem: unless I'm mistaken,
PyUnicodeUCS4_FromUnicode expects 4-byte Unicode chars, yet Java's jchar is
2 bytes wide. Python comes in two flavors of Unicode char: 2 bytes wide
(UCS2) and 4 bytes wide (UCS4).
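To make the suspected failure concrete, here is a small Python sketch, assuming (as suspected above) that the UCS4 function ends up reading a raw buffer of 2-byte jchars together with the jchar count. All helper names are hypothetical, and `struct` only stands in for the C memory layout: n jchars occupy 2n bytes, while a 4-byte reader asked for n chars wants 4n bytes, so it runs off the end of the buffer (in C, a likely segfault). A correct conversion instead decodes the UTF-16 code units, joining surrogate pairs into single code points (the widened result shown assumes Python 3 or a wide Python 2 build).

```python
import struct


def to_jchars(text):
    """Encode text as UTF-16 code units, the layout of Java's 2-byte jchar."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def pack_jchars(units):
    """Lay the code units out in memory: 2 bytes per jchar."""
    return struct.pack("<%dH" % len(units), *units)


def naive_ucs4_read(buf, count):
    """Mimic a UCS4 reader that assumes 4 bytes per char.

    Raises struct.error because buf holds only 2 * count bytes.
    """
    return struct.unpack("<%dI" % count, buf)


def widen_to_ucs4(units):
    """Decode the UTF-16 units (joining surrogate pairs) into code points."""
    return [ord(ch) for ch in pack_jchars(units).decode("utf-16-le")]


if __name__ == "__main__":
    # U+1D11E lies outside the BMP, so Java stores it as a surrogate pair.
    units = to_jchars(u"caf\u00e9 \U0001D11E")
    buf = pack_jchars(units)
    print("%d jchars occupy %d bytes" % (len(units), len(buf)))
    try:
        naive_ucs4_read(buf, len(units))
    except struct.error as exc:
        print("naive 4-byte read fails: %s" % exc)
    print("widened code points: %s" % [hex(cp) for cp in widen_to_ucs4(units)])
```

In C there is no bounds check to raise an error, which is why the same mistake shows up as a crash inside PyUnicodeUCS4_FromUnicode rather than as an exception.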

Of course, I could be completely wrong and misleading you. Only stepping
through gdb can actually tell.

Andi..

