[pylucene-dev] Problems with StringReader()
Andi Vajda
vajda at osafoundation.org
Tue Nov 28 09:05:02 PST 2006
On Tue, 28 Nov 2006, BEADLING, Philip, GBM wrote:
> def highlight( self, searchText, searchResultFilenames ):
>     for filename in searchResultFilenames:
>         # Find text directory from documents directory and convert
>         # network fileshare to local mount
>         textFile = filename.replace("\\Documents\\","\\Text\\") + ".txt"
>         textFile = textFile.replace("\\", "/")
>         textFile = textFile.replace("//networkshare/IRDcaf/Documentation", "/Documentation")
>
>         print "<br>", searchText, "<br>", textFile
>         if os.path.isfile( textFile ):
>             filen = open( textFile, 'r' )
>             textString = filen.read()
>             filen.close()
>             term = Term( "field", searchText )
>             termQuery = TermQuery( term )
>             scorer = QueryScorer( termQuery )
>             highlighter = Highlighter( scorer )
>             simpAn = SimpleAnalyzer()
>             # PROBLEM IS HERE!!!!
>             reader = PyLucene.StringReader( textString )
>             tokenStream = simpAn.tokenStream("field", reader )
>             print highlighter.getBestFragment( tokenStream, textString )
>
At first glance, it doesn't look like 'textString' is going to be of type
'unicode' in the above code sample. What comes out of a Python file's read
method is an object of type 'str'. I believe PyLucene will try to convert
the 'str' into a 'unicode' object by assuming 'utf-8' encoding. If your 'str'
is not 'utf-8' encoded, that conversion is going to fail.
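
One way to rule this out is to decode the file contents yourself, so that
what gets handed over is already a unicode string. The following is only a
minimal sketch: the helper name 'read_as_unicode' is hypothetical, and the
'latin-1' fallback is an assumption, since the real encoding of the text
files is not known from the code above.

```python
import io


def read_as_unicode(path, encoding="utf-8", fallback="latin-1"):
    """Read a file as raw bytes and decode it explicitly to unicode.

    'encoding' is tried first. 'fallback' is an assumed second guess and
    should be replaced with whatever encoding the files really use.
    """
    # Open in binary mode so no implicit decoding happens on read.
    with io.open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the first encoding. latin-1 maps every
        # byte to a character, so this never raises, but the result is only
        # correct if the file really is latin-1.
        return raw.decode(fallback)
```

The returned value could then take the place of 'textString' in both the
PyLucene.StringReader( textString ) call and the getBestFragment call.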
If you send in a runnable piece of code (with the required data) that
reproduces the problem you're experiencing, I might be able to help you
better.
Andi..