[pylucene-dev] Re: segfault in KeywordAnalyzerTest with
vajda at osafoundation.org
Fri Nov 30 08:51:33 PST 2007
On Fri, 30 Nov 2007, Felix Schwarz wrote:
> Andi Vajda wrote:
>>> The segfault seems to occur in testSimpleKeywordAnalyzer(), before the
>>> assertion in:
>>> self.assertEqual(ts.next().termText(), input)
>>> The program terminates immediately after ts.next().
>> Could it be that there is a mismatch in Unicode character width between
>> the Python you compiled PyLucene with and the Python you're running it
>> with (which really should be the same one)?
> How can I check this?
> I'm just using the Python which comes with CentOS 5 and did not modify
> anything in PyLucene (besides some Makefile/setup.py stuff).
>> The name of the function on the stack, 'PyUnicodeUCS4_FromUnicode',
>> suggests this.
>> To debug this, use gdb. You can recompile PyLucene with DEBUG=1 to disable
>> optimizations and get a better gdb experience.
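Since the question was how to check this: one rough check from the outside, assuming a Python 2-era interpreter like the one in this thread, is to compare the running interpreter's Unicode width (sys.maxunicode is 0xFFFF on a narrow build) with the PyUnicodeUCS2_/PyUnicodeUCS4_ names the compiled extension references. Python 2 headers rename the PyUnicode_* API to one of those two prefixes, so the names reveal the width the extension was built for. The helper names below are hypothetical, and the byte scan is a heuristic, not a real symbol-table parse.

```python
import sys

def interpreter_unicode_width():
    """Bytes per Py_UNICODE in the running interpreter: 2 (narrow) or 4 (wide).

    Only meaningful for Python 2.x and 3.0-3.2; from 3.3 on (PEP 393)
    sys.maxunicode is always 0x10FFFF, so this always reports 4.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4

def extension_unicode_flavors(path):
    """Return which PyUnicodeUCS2_/PyUnicodeUCS4_ prefixes appear in a binary.

    Crude byte scan with no ELF parsing: the PyUnicode_* names an
    extension references reveal the width its headers were set up for.
    """
    with open(path, "rb") as f:
        data = f.read()
    return set(flavor for flavor in ("UCS2", "UCS4")
               if ("PyUnicode%s_" % flavor).encode("ascii") in data)

def check(path):
    """Print both flavors; return True if no conflicting flavor was found."""
    running = "UCS%d" % interpreter_unicode_width()
    compiled = extension_unicode_flavors(path)
    print("running interpreter: %s" % running)
    print("extension references: %s" % (", ".join(sorted(compiled)) or "none"))
    return compiled <= {running}
```

For example, check("/path/to/_PyLucene.so") (a hypothetical path to the built extension) returns False when the extension references a width other than the interpreter's.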
Edit JCCEnv.cpp and add

    printf("sizeof(Py_UNICODE) == sizeof(jchar): %d\n",
           sizeof(Py_UNICODE) == sizeof(jchar));

to the top of the JCCEnv::fromJString function, then rebuild.
If it prints '1' (i.e. this build treats Py_UNICODE as 2 bytes, like jchar),
I suspect a problem: unless I'm mistaken, PyUnicodeUCS4_FromUnicode expects
4-byte Unicode chars, yet Java's jchar is 2 bytes wide. Python can be built
with either of two flavors of Unicode chars: 2 bytes wide or 4 bytes wide.
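To illustrate the hypothesis (this is not JCC's actual code), the sketch below packs 2-byte jchar values the way a Java string buffer holds them and reads them back as 4-byte units, assuming a little-endian machine such as x86. A 4-byte reader asked for n characters wants 4n bytes, but only 2n are there, so it decodes nonsense and reads past the end of the buffer, which can crash. The function name is hypothetical.

```python
import struct

def read_jchars_as_ucs4(jchars):
    """Simulate a 4-byte (UCS4) reader handed a buffer of 2-byte jchars.

    Returns (code_points_read, bytes_requested, bytes_available).
    """
    n = len(jchars)
    buf = struct.pack("<%dH" % n, *jchars)   # 2 bytes per jchar, little-endian
    bytes_requested = 4 * n                   # UCS4 reader wants 4 bytes per char
    whole_units = len(buf) // 4               # 4-byte units actually present
    code_points = struct.unpack("<%dI" % whole_units, buf[:4 * whole_units])
    return code_points, bytes_requested, len(buf)
```

For example, read_jchars_as_ucs4([ord(c) for c in "ab"]) returns ((6422625,), 8, 4): one bogus code point (0x620061, well above the U+10FFFF maximum) and a request for 8 bytes from a 4-byte buffer.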
Of course, I could be completely wrong and misleading you. Only stepping
through gdb can tell for sure.