[Dev] pylucene fsdirectory patch and unicode issue
vajda at osafoundation.org
Sat May 1 16:16:48 PDT 2004
I just integrated your changes. Here is what I did:
- The patches were relative to the 'old' build sources. Our build is in
flux: we are about to move to a 'new' build, and there has been quite a bit
of shifting around in our CVS repository. The actively maintained PyLucene
sources are in internal/PyLucene.
For more information on the 'new' build infrastructure, please see:
I manually added FSDirectory to PyLucene.i and to the make files and
re-generated the rest.
- The problem with the unicode test was that you were passing a unicode
string to an InputStreamReader. As in Java, from which I borrowed this
idea, input streams are for bytes and readers are for unicode chars.
If you want to read unicode chars from a unicode string you can:
- encode it as utf-8 bytes and pass it to an InputStreamReader, which is
a little wasteful since the job of the InputStreamReader is to stream
- or pass the unicode string to a StringReader, a class I added to your
tester and to repository/util/Streams.py. Your tester is also checked
into the new internal/PyLucene/test directory.
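
To illustrate the two options above, here is a minimal sketch in current
Python 3, where str plays the role of the unicode type and bytes the role
of the old str. The two classes below are hypothetical stand-ins written
for this example only; they are not the classes in Streams.py, and the
real ones may well differ in interface and behaviour.

```python
import io


class InputStreamReader:
    """Hypothetical stand-in: wraps a byte stream and decodes it to text."""

    def __init__(self, stream, encoding="utf-8"):
        # TextIOWrapper decodes the underlying bytes incrementally.
        self._reader = io.TextIOWrapper(stream, encoding=encoding)

    def read(self, size=-1):
        return self._reader.read(size)


class StringReader:
    """Hypothetical stand-in: serves characters straight from a text string."""

    def __init__(self, text):
        self._buffer = io.StringIO(text)

    def read(self, size=-1):
        return self._buffer.read(size)


text = "sample text \u00e9 " * 20

# Option 1: encode to utf-8 bytes, then let the reader decode them again.
via_bytes = InputStreamReader(io.BytesIO(text.encode("utf-8"))).read()

# Option 2: hand the text to a StringReader; no encode/decode round trip.
direct = StringReader(text).read()
```

Both readers yield the same characters; the second simply skips the
encode and decode steps, which is why it is the less wasteful choice when
the text is already a unicode string.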
On Thu, 29 Apr 2004, Kapil Thangavelu wrote:
> hi folks,
> attached is a patch against cvs head to add lucene's standard
> fsdirectory store to PyLucene. swig files were regenerated with swig
> also attached is a unittest file, with one failing test (prefix XXX)
> which attempts to index unicode with pylucene, using a copy of the input
> stream reader from repository.utils.Streams which does string encoding.
> i was wondering if anyone had any idea as to the cause of this error,
> because afaics they should return the same value because the encoding by
> input stream reader amounts to the following
> unicode(u'sample text'*20).encode('utf-8')
> unicode('sample text'*20).encode('utf-8')
> and the return values are both of type str and have the same value.
> i've attached the traceback from the unit test as well.