[pylucene-dev] Document encoding?

Andi Vajda vajda at osafoundation.org
Wed Mar 7 09:30:48 PST 2007


On Wed, 7 Mar 2007, Jarek Zgoda wrote:

> It seems that I can't properly store UTF-8 encoded documents using PyLucene
> (by "properly" I mean the documents are searchable and can be returned in the
> form they were stored in). Should I use only unicode objects in my
> search/indexing machinery code, as PyLucene returns search results' fields as
> unicode objects?

PyLucene wraps Java Lucene by compiling it with gcj. Java strings are always
Unicode. If you pass UTF-8 encoded strings to PyLucene APIs, they are converted
to Unicode before being passed to the wrapped Java Lucene APIs, because Unicode
is all those APIs understand.
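The practical consequence for indexing code can be sketched in plain Python 3, where `str` plays the role of the 2007-era `unicode` type. This is a minimal illustration under stated assumptions, not PyLucene code: `to_text` and `store_and_fetch` are hypothetical helpers, and a dict stands in for the index. Bytes are decoded once at the boundary, everything inside is Unicode, and the caller re-encodes only if it needs bytes again.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper (not PyLucene API): decode bytes, pass str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def store_and_fetch(value):
    """Hypothetical stand-in for indexing a field and reading it back.

    Like Java, the 'index' (a plain dict here) holds only Unicode text.
    """
    document = {"title": to_text(value)}
    return document["title"]


raw = "zażółć gęślą jaźń".encode("utf-8")   # UTF-8 bytes from the caller
result = store_and_fetch(raw)               # comes back as Unicode text
print(type(result).__name__)                # str
print(result.encode("utf-8") == raw)        # True: re-encode to recover bytes
```

Decoding at a single boundary keeps the rest of the search/indexing code working only with Unicode, which matches what the wrapped Java APIs expect and what they return.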

Andi..

