[pylucene-dev] problem creating binary fields

Andi Vajda vajda at osafoundation.org
Sat May 12 09:13:13 PDT 2007


On Sat, 12 May 2007, Andra Tori wrote:

> I have problems creating binary fields; I have narrowed the problem down
> to the non-ASCII characters in the supposedly binary data of the field.
>
> Here's the testcase:
> ------------------------------------
> import PyLucene
>
> a = PyLucene.Field("show_tokens", '\xf3',
>        PyLucene.Field.Store.YES)

Lucene expects Unicode strings. If you pass in a regular byte string such as
'\xf3', PyLucene will assume it is a UTF-8 string when converting it to
Unicode for Lucene.
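To illustrate the implicit conversion described above, here is a minimal sketch in Python 3 (the post itself uses Python 2 syntax). `as_lucene_text` is a hypothetical helper, not part of PyLucene. It only models the behavior stated here, namely that byte strings are decoded as UTF-8 before they reach Lucene. It shows why a lone 0xF3 byte fails while the two-byte UTF-8 encoding of the same character decodes cleanly.

```python
def as_lucene_text(value):
    """Hypothetical helper modeling the conversion described above:
    Unicode text passes through unchanged, byte strings are decoded as UTF-8."""
    if isinstance(value, bytes):
        # Strict decoding: bytes that are not valid UTF-8 raise UnicodeDecodeError.
        return value.decode("utf-8")
    return value


# b"\xc3\xb3" is the valid UTF-8 encoding of U+00F3 (LATIN SMALL LETTER O WITH ACUTE).
text = as_lucene_text(b"\xc3\xb3")
print(hex(ord(text)))  # 0xf3

# A lone 0xF3 byte starts a multi-byte UTF-8 sequence but has no
# continuation bytes after it, so decoding it fails.
try:
    as_lucene_text(b"\xf3")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```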

Given that '\xf3' is not a valid UTF-8 string, the conversion fails. If you're going
to use non-UTF-8 strings with PyLucene, you need to convert them to Unicode yourself
first, for example with u'\xf3' or unicode('\xe9', 'iso-8859-1').

Andi..

