[pylucene-dev] language dependent analyzer

Andi Vajda vajda at osafoundation.org
Mon Dec 10 10:36:29 PST 2007


On Mon, 10 Dec 2007, Helmut Jarausch wrote:

> sorry, but I need some more help.
>
> I'm trying to index our library. Each book entry contains the table of
> contents (TOC). 'Analyzing' this should depend on the language the
> book is written in.
> So I need a customized Analyzer (probably using PerFieldAnalyzerWrapper)
> which 'analyzes' the TOC according to the (recorded) language of the
> book.
>
> Is there an example of a customized analyzer whose action depends
> on the data currently being indexed?

For general Lucene usage questions such as this one, you are encouraged to
contact the Lucene user list at java-users at lucene.apache.org. That list
gets a lot more traffic, so you're more likely to find an answer there, and
any solution you find there is directly applicable to PyLucene. This list is
about Python-specific or PyLucene-specific issues.

As for language-specific analyzers, PyLucene supports all the Java Lucene
analyzers [1] and the snowball package [2]; both are included in the build
by default. It is trivial to add more such packages to jcc-PyLucene since
all the wrappers are machine-generated. See its Makefile [3] for an example
of how the current set of Lucene .jar files is wrapped.
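
One possible way to make the analysis depend on the recorded language is to
pick the analyzer per document instead of per field. The sketch below only
illustrates that selection step. It uses plain-Python stand-ins, not PyLucene
classes: make_stopword_analyzer, ANALYZERS, DEFAULT_ANALYZER, analyzer_for
and analyze_toc are hypothetical names, and the stop word lists are made up.
With PyLucene the registry values would presumably be real analyzer
instances from the packages mentioned above.

```python
# Sketch only: plain-Python stand-ins, not PyLucene classes.
# With PyLucene, the registry values would be real Analyzer instances
# taken from the contrib analyzers or the snowball package.


def make_stopword_analyzer(stop_words):
    """Return a toy analyzer: lowercase, split on whitespace, drop stop words."""
    stop = frozenset(stop_words)

    def analyze(text):
        return [tok for tok in text.lower().split() if tok not in stop]

    return analyze


# Hypothetical registry keyed by the language code recorded with each book.
ANALYZERS = {
    "en": make_stopword_analyzer(["the", "of", "and", "a"]),
    "de": make_stopword_analyzer(["der", "die", "das", "und"]),
}

# Fallback for languages without a dedicated analyzer: no stop words removed.
DEFAULT_ANALYZER = make_stopword_analyzer([])


def analyzer_for(language):
    """Pick the analyzer for a book's recorded language, with a fallback."""
    return ANALYZERS.get(language, DEFAULT_ANALYZER)


def analyze_toc(book):
    """Analyze one book's table of contents with its language's analyzer."""
    return analyzer_for(book["language"])(book["toc"])


books = [
    {"language": "en", "toc": "The History of Algebra"},
    {"language": "de", "toc": "Die Geschichte der Algebra"},
    {"language": "fr", "toc": "Histoire des nombres"},
]

for book in books:
    print(book["language"], analyze_toc(book))
```

In the Lucene 2.x API, as far as I know, IndexWriter.addDocument also accepts
an analyzer as a second argument, which is where the analyzer selected for
each book would be passed. Check this against the version you build with.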

Andi..

[1] http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/
[2] http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/snowball/
[3] http://svn.osafoundation.org/pylucene/trunk/jcc/Makefile

