[pylucene-dev] Problems with StringReader()

Andi Vajda vajda at osafoundation.org
Tue Nov 28 09:05:02 PST 2006


On Tue, 28 Nov 2006, BEADLING, Philip, GBM wrote:

>    def highlight( self, searchText, searchResultFilenames ):
>        for filename in searchResultFilenames:
>            # Find text directory from documents directory and convert
>            # network fileshare to local mount
>            textFile = filename.replace("\\Documents\\","\\Text\\") + ".txt"
>            textFile = textFile.replace("\\", "/")
>            textFile = textFile.replace("//networkshare/IRDcaf/Documentation", "/Documentation")
>
>            print "<br>", searchText, "<br>", textFile
>            if os.path.isfile( textFile ):
>                filen = open( textFile, 'r' )
>                textString = filen.read()
>                filen.close()
>                term = Term( "field", searchText )
>                termQuery = TermQuery( term )
>                scorer = QueryScorer( termQuery )
>                highlighter = Highlighter( scorer )
>                simpAn = SimpleAnalyzer()
>                # PROBLEM IS HERE!!!!
>                reader = PyLucene.StringReader( textString )
>                tokenStream = simpAn.tokenStream("field", reader )
>                print highlighter.getBestFragment( tokenStream, textString )
>

At first glance, it doesn't look like 'textString' is going to be of type
'unicode' in the above code sample. What comes out of a Python file's read
method is an object of type 'str'. I believe PyLucene will try to convert
the 'str' into a 'unicode' object by assuming 'utf-8' encoding. If your 'str'
is not 'utf-8' encoded, then that conversion is going to fail.
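
One way to avoid relying on the implicit conversion is to decode the file 
contents yourself before handing them to StringReader. The following is only 
a minimal sketch: the helper name 'read_text' and the 'latin-1' encoding are 
assumptions made for illustration, since the actual encoding of your text 
files is not known from the code above. The demo uses a temporary file in 
place of your data.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a file as raw bytes and decode it into unicode text.

    'read_text' is a hypothetical helper; 'encoding' must match how the
    file was actually written, which is an assumption in this sketch.
    """
    f = open(path, 'rb')
    try:
        raw = f.read()
    finally:
        f.close()
    # Raises UnicodeDecodeError if the bytes are not valid in 'encoding'
    return raw.decode(encoding)


# Demo: write bytes that are valid Latin-1 but not valid UTF-8
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"caf\xe9")
os.close(fd)
try:
    # Decoding with the right encoding succeeds
    text = read_text(demo_path, "latin-1")
    # Decoding the same bytes as UTF-8 fails
    try:
        read_text(demo_path, "utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
finally:
    os.remove(demo_path)
```

With something like this, 'textString' would be a 'unicode' object before it 
reaches PyLucene.StringReader( textString ), so no 'utf-8' guess should be 
needed on the PyLucene side.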

If you send in a runnable piece of code (with the required data) that 
reproduces the problem you're experiencing, I might be able to help you 
better.

Andi..

