[pylucene-dev] First shot at custom tokenfilter
Ofer Nave
ofer at smarter.com
Mon Mar 26 16:01:06 PST 2007
> -----Original Message-----
> From: pylucene-dev-bounces at osafoundation.org
> [mailto:pylucene-dev-bounces at osafoundation.org] On Behalf Of Ofer Nave
> Sent: Monday, March 26, 2007 4:49 PM
>
> I checked the PyLucene README, and the note regarding custom
> tokenfilters said this:
>
> "In order to instantiate such a custom token filter, the additional
> tokenFilter() factory method defined on
> org.apache.lucene.analysis.TokenStream instances needs to be invoked
> with the Python extension instance."
>
> However, I couldn't find reference to any tokenFilter()
> methods in the TokenStream class family in the Lucene 2.1 docs.
I finally figured out that it might be smart to compare my implementation to
the PyLucene version of the SynonymAnalyzer and SynonymFilter classes from
LIA (Lucene in Action; yeah, I'm slow).
The SynonymAnalyzer class defines tokenStream like this:
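    def tokenStream(self, fieldName, reader):
        # Standard chain: tokenize, normalize, lowercase, drop stop words
        tokenStream = LowerCaseFilter(StandardFilter(StandardTokenizer(reader)))
        tokenStream = StopFilter(tokenStream, StandardAnalyzer.STOP_WORDS)
        # Python-side filter that pulls its tokens from tokenStream
        filter = SynonymFilter(tokenStream, self.engine)
        # Hand the Python filter to the tokenFilter() factory described in the README
        return tokenStream.tokenFilter(filter)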
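To make the filter's side of this concrete, here is a pure-Python sketch of the kind of logic a synonym filter's next() runs, modeled loosely on the LIA example. It does not touch PyLucene: Token, ListTokenStream, DictSynonymEngine and SynonymFilterSketch are hypothetical stand-ins, and the real filter works on Lucene Token objects instead.

```python
from collections import deque


class Token(object):
    """Hypothetical stand-in for a Lucene token: term text plus position increment."""

    def __init__(self, text, position_increment=1):
        self.text = text
        self.position_increment = position_increment


class ListTokenStream(object):
    """Hypothetical input stream; next() returns None when exhausted, like Lucene 2.x."""

    def __init__(self, words):
        self._tokens = deque(Token(w) for w in words)

    def next(self):
        return self._tokens.popleft() if self._tokens else None


class DictSynonymEngine(object):
    """Hypothetical synonym source backed by a plain dict."""

    def __init__(self, table):
        self._table = table

    def getSynonyms(self, word):
        return self._table.get(word, [])


class SynonymFilterSketch(object):
    """Emits each input token, then its synonyms stacked at the same position."""

    def __init__(self, input_stream, engine):
        self.input = input_stream
        self.engine = engine
        self._pending = deque()

    def next(self):
        # Flush queued synonyms before pulling the next real token.
        if self._pending:
            return self._pending.popleft()
        token = self.input.next()
        if token is None:
            return None
        for synonym in self.engine.getSynonyms(token.text):
            # Increment 0 places the synonym at the same position as the original.
            self._pending.append(Token(synonym, position_increment=0))
        return token


def collect(stream):
    """Drain a stream into (text, position_increment) pairs."""
    out = []
    token = stream.next()
    while token is not None:
        out.append((token.text, token.position_increment))
        token = stream.next()
    return out
```

With the table {"quick": ["fast", "speedy"]} and input ["quick", "fox"], collect() returns [("quick", 1), ("fast", 0), ("speedy", 0), ("fox", 1)]. Note that the filter only ever talks to its input stream; it never needs to know how Lucene will call it.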
I found this very strange (especially the part about giving the filter the
stream object AND giving the stream the filter object), but it does match
the note in the README regarding the tokenFilter() factory method that I
previously didn't understand.
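One plausible reading of why the stream is also handed the filter, assuming the README means that tokenFilter() wraps the Python object in a Java-side token filter: Lucene only ever calls that wrapper, the wrapper forwards each next() call to the Python instance, and the Python instance pulls from the original stream. The sketch below models that round trip in plain Python; MockJavaTokenStream, ForwardingTokenFilter and UpperCaseFilter are hypothetical names, not PyLucene classes.

```python
class MockJavaTokenStream(object):
    """Hypothetical stand-in for a Java TokenStream holding plain strings."""

    def __init__(self, words):
        self._words = list(words)

    def next(self):
        # Mirrors the Lucene 2.x convention of returning None at end of stream.
        return self._words.pop(0) if self._words else None

    def tokenFilter(self, python_filter):
        # Assumed factory: wrap the Python extension so callers see a "Java" filter.
        return ForwardingTokenFilter(self, python_filter)


class ForwardingTokenFilter(object):
    """Hypothetical Java-side wrapper that delegates next() to the Python object."""

    def __init__(self, input_stream, python_filter):
        self.input = input_stream
        self._python_filter = python_filter

    def next(self):
        return self._python_filter.next()


class UpperCaseFilter(object):
    """Plain Python object (not a stream subclass) that reads from its input."""

    def __init__(self, input_stream):
        self.input = input_stream

    def next(self):
        word = self.input.next()
        return None if word is None else word.upper()


def analyze(words):
    stream = MockJavaTokenStream(words)
    python_filter = UpperCaseFilter(stream)      # the filter gets the stream...
    wrapped = stream.tokenFilter(python_filter)  # ...and the stream gets the filter
    out = []
    token = wrapped.next()
    while token is not None:
        out.append(token)
        token = wrapped.next()
    return out
```

Here analyze(["foo", "bar"]) returns ["FOO", "BAR"]. Under this assumption the two-way hookup is not circular: the filter reads from the stream, while the stream merely manufactures the wrapper object that the rest of the analyzer chain consumes.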
I reimplemented my FooAnalyzer using this pattern and it works now. I still
don't know why, but at least it works. :)
-ofer