[pylucene-dev] language dependent analyzer
Andi Vajda
vajda at osafoundation.org
Mon Dec 10 10:36:29 PST 2007
On Mon, 10 Dec 2007, Helmut Jarausch wrote:
> sorry, but I need some more help.
>
> I'm trying to index our library. Each book entry contains the table of
> contents (TOC). 'Analyzing' this should be dependent on the language the
> book is written in.
> So, I need a customized Analyzer (probably using
> PerFieldAnalyzerWrapper)
> which 'analyzes' the TOC dependent on the (recorded) language of the
> book.
>
> Is there an example of a customized analyzer whose action depends
> on the data currently being indexed?
For general Lucene usage questions such as this one, you are encouraged to
contact the Lucene user list at java-users at lucene.apache.org. Any solution
you find there is directly applicable to PyLucene. This list is for
Python-specific or PyLucene-specific issues.

The Lucene user list gets a lot more traffic and you're more likely to find
an answer there.

As for language-specific analyzers, PyLucene supports all the Java Lucene
analyzers [1] and the snowball package [2] by including them in the build
by default. It is trivial to add more such packages to jcc-PyLucene since
all the wrappers are machine-generated. See the jcc Makefile [3] for an
example of how the current set of Lucene .jar files is wrapped.
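
To make the idea of choosing an analyzer from the recorded language more
concrete, here is a minimal sketch of the dispatch pattern only. Everything
in it is an assumption made for illustration: the stand-in analyzer classes,
the ANALYZERS table, and the analyzer_for and analyze_toc helpers are
hypothetical names and are not part of PyLucene or Lucene. The stand-ins
exist so the sketch runs without PyLucene installed; in a real index the
table would map language codes to wrapped Lucene analyzer instances instead.

```python
# Hypothetical stand-ins so this sketch runs without PyLucene.
# With PyLucene, the table below would hold real analyzer instances
# from the wrapped analyzers or snowball packages instead.


class PlainStandIn:
    """Stand-in analyzer: lowercases, splits on whitespace, drops stopwords."""

    stopwords = frozenset()

    def tokens(self, text):
        return [t for t in text.lower().split() if t not in self.stopwords]


class EnglishStandIn(PlainStandIn):
    stopwords = frozenset({"the", "of", "and"})


class GermanStandIn(PlainStandIn):
    stopwords = frozenset({"der", "die", "das", "und"})


# Language code (as recorded with each book) -> analyzer to use.
ANALYZERS = {"en": EnglishStandIn(), "de": GermanStandIn()}
DEFAULT_ANALYZER = PlainStandIn()


def analyzer_for(language):
    """Return the analyzer for a language code, or the default one."""
    return ANALYZERS.get(language, DEFAULT_ANALYZER)


def analyze_toc(book):
    """Analyze a book's TOC with the analyzer chosen by its language."""
    return analyzer_for(book.get("language")).tokens(book["toc"])


if __name__ == "__main__":
    book = {"language": "de", "toc": "Die Grundlagen der Analysis"}
    print(analyze_toc(book))
```

The point of the table lookup is that the choice is made per book, before
the TOC is analyzed, rather than inside a single analyzer. How the chosen
analyzer is then handed to the index writer depends on the Lucene version
in use, so check the IndexWriter API documentation or ask on the Lucene
user list.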
Andi..
[1] http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/
[2] http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/snowball/
[3] http://svn.osafoundation.org/pylucene/trunk/jcc/Makefile