[pylucene-dev] lucene.analysis

Andi Vajda vajda at osafoundation.org
Sat Feb 23 11:26:36 PST 2008


On Feb 22, 2008, at 22:02, "Dirk Rothe" <d.rothe at semantics.de> wrote:

> On Fri, 22 Feb 2008 11:52:08 +0100, Andi Vajda <vajda at osafoundation.org 
> > wrote:
>
>>
>> On Feb 21, 2008, at 23:26, "Dirk Rothe" <d.rothe at semantics.de> wrote:
>>
>>> On Fri, 22 Feb 2008 11:06:40 +0100, Andi Vajda <vajda at osafoundation.org 
>>> > wrote:
>>>
>>>>
>>>> On Feb 21, 2008, at 22:10, "Dirk Rothe" <d.rothe at semantics.de>  
>>>> wrote:
>>>>
>>>>> I'm not sure if I'm missing something obvious, but how can I access  
>>>>> the classes in org.apache.lucene.analysis.de?
>>>>>
>>>>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/analysis/de/package-summary.html
>>>>>
>>>>> I haven't found it in the pylucene namespace.
>>>>>
>>>>
>>>> The Java Lucene package structure is flattened in PyLucene. In  
>>>> other words, just import the class name from lucene:
>>>>   from lucene import GermanAnalyzer
>>>>
>>>> Andi..
>>>
>>> aah, I see, but there are two GermanStemmers:
>>>
>>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//allclasses-frame.html
>>> [..]
>>> FuzzyQuery.ScoreTermQueue
>>> FuzzyTermEnum
>>> German2Stemmer
>>> GermanAnalyzer
>>> GermanStemFilter
>>> GermanStemmer
>>> GermanStemmer
>>> GradientFormatter
>>> GreekAnalyzer
>>> GreekCharsets
>>> GreekLowerCaseFilter
>>> HTMLDocument
>>> HTMLParser
>>> [..]
>>>
>>> One is from:
>>> java.lang.Object
>>> net.sf.snowball.SnowballProgram
>>>     net.sf.snowball.ext.GermanStemmer
>>>
>>> and the other from:
>>> java.lang.Object
>>> org.apache.lucene.analysis.de.GermanStemmer
>>>
>>> pylucene seems to wrap only the second one.
>>>
>>
>> Ugh. I'm afraid PyLucene wraps both but only one of them sticks.  
>> Both the analyzer contrib package and the porter stemmer packages  
>> are part of the build.
>> You get to choose:
>> - remove one of the jar files from your build in PyLucene's Makefile.
>> - rename one of the classes with a patch to the sources if you need  
>> to use both.
>>
>
> OK, I will do that.
>
> If this is a JCC "feature", I guess it would be nice to have a  
> resolution strategy for these cases in the future.

Yes, at least adding a --rename option would be a good move. I've also  
been thinking of adding an option not to flatten the Java package  
structure in Python. The C++ is already properly namespaced.
Another option, already available, is to use --exclude to skip the  
duplicate that is in the way. This is done in PyLucene, for example,  
with the query parser's Token class.
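
To make the collision concrete, here is a toy model in plain Python of what
flattening does to two classes that share a short name. It is only a sketch:
the flatten() helper, its exclude and rename parameters, the name
SnowballGermanStemmer and the order in which the two classes are processed are
all assumptions made for illustration, not JCC's actual implementation.

```python
# Toy model of a flattened namespace. NOT JCC's real code: flatten(), its
# parameters and the processing order below are illustrative assumptions.

def flatten(class_names, exclude=(), rename=None):
    """Map fully qualified Java class names to short Python names.

    exclude: fully qualified names to skip (the idea behind --exclude).
    rename:  hypothetical {qualified name: short name} mapping, sketching
             what a --rename option could do.
    """
    rename = rename or {}
    namespace = {}
    for qualified in class_names:
        if qualified in exclude:
            continue
        # Default short name is the last dotted component.
        short = rename.get(qualified, qualified.rsplit(".", 1)[-1])
        # A later class with the same short name silently replaces the earlier one.
        namespace[short] = qualified
    return namespace


classes = [
    "net.sf.snowball.ext.GermanStemmer",
    "org.apache.lucene.analysis.de.GermanStemmer",
]

# Plain flattening: both map to "GermanStemmer", so only one sticks.
collided = flatten(classes)

# Excluding one duplicate lets the other take the short name.
excluded = flatten(
    classes, exclude={"org.apache.lucene.analysis.de.GermanStemmer"}
)

# Renaming one duplicate keeps both reachable.
renamed = flatten(
    classes,
    rename={"net.sf.snowball.ext.GermanStemmer": "SnowballGermanStemmer"},
)
```

In this model the class processed last wins the short name, so excluding or
renaming one of the two duplicates is what makes the other one (or both)
reachable from the flat namespace.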

Andi..

>
>
> --dirk
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev

