vajda at osafoundation.org
Thu Nov 24 08:57:45 PST 2005
On Thu, 24 Nov 2005, Victor Peinado wrote:
> Hello all,
> I'm indexing Spanish documents with Lucene and I need to avoid stop
> words. I'm quite new to PyLucene and so far the StandardAnalyzer
> worked well enough.
> But now I need to do more complex things. Is there any SpanishAnalyzer
> in the official distribution of Lucene or PyLucene, as those ones for
> German or Russian? If there isn't, is it very difficult to extend
> Analyzer to implement a kind of SpanishAnalyzer? What issues should
> I keep in mind? Any tip/idea/documentation I should read first?
I don't think there is a SpanishAnalyzer in Java Lucene 1.4.3.
There may be something in the snowball contrib package (also included in
Creating a custom analyzer in Python with PyLucene can be pretty simple. See the
"Lucene in Action" samples ported to Python in the PyLucene distribution.
If all you want is a different set of stop words, it might even be very
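
To make the stop-word idea concrete, here is a minimal plain-Python sketch of
what a stop-word filter does to a token stream. It deliberately does not use
any PyLucene classes, so it is not the PyLucene API; the names
SPANISH_STOP_WORDS, tokenize, remove_stop_words and analyze are hypothetical,
and the stop list is a tiny illustrative sample rather than a real Spanish
stop list.

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only;
# a usable Spanish stop list would be considerably longer.
SPANISH_STOP_WORDS = frozenset([
    "a", "con", "de", "el", "en", "la", "las", "los",
    "para", "por", "que", "un", "una", "y",
])


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def remove_stop_words(tokens, stop_words=SPANISH_STOP_WORDS):
    """Return the tokens that are not in the stop list, in original order."""
    return [token for token in tokens if token not in stop_words]


def analyze(text, stop_words=SPANISH_STOP_WORDS):
    """Tokenize the text, then drop stop words (assumed pipeline order)."""
    return remove_stop_words(tokenize(text), stop_words)
```

With these definitions, analyze("El perro de la casa") returns
['perro', 'casa']. In an actual analyzer the same two steps (tokenizing, then
filtering against a stop set) would be done by Lucene's own tokenizer and
filter classes rather than by hand.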
For more specific information about a SpanishAnalyzer or how to go about
creating your own, you might ask the java-user at lucene.apache.org mailing list
where such Lucene-specific (Java or not) questions are best addressed.