[pylucene-dev] Null Pointer Exception when Extending TokenFilter

Andi Vajda vajda at osafoundation.org
Fri Sep 15 12:24:38 PDT 2006


On Fri, 15 Sep 2006, Rob Young wrote:

> Python: 2.4.3
> PyLucene: 2.0.0
>
> I am trying to write a custom Analyser and TokenFilter but I keep getting a 
> NullPointerException whenever I try to place another filter after mine. If I 
> change the order of the filters so that mine is last everything is fine. Any 
> ideas on what the problem may be?
>
> Also, not a huge problem, but a little confusing. Why do I always have to 
> override the constructor, even if I am adding nothing of significance?

Look at the examples in tests/test_PositionIncrementTestCase.py, where a 
custom Python analyzer is set up (see the testSetPosition unit test). The 
Lucene Python 'extension' mechanism isn't really an 'extension' mechanism so 
much as a wrapper in reverse.

Basically, a PythonAnalyzer Java class implements native methods that call a 
Python object that implements the Analyzer protocol. Hence there is no need to 
make your Python analyzer a subclass of Analyzer. A subclass of object is 
enough.
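
As a rough illustration of the "wrapper in reverse" idea, here is a minimal, 
self-contained sketch. It does not import PyLucene: Token, WordStream, 
LowerCaseStream, PlainAnalyzer and HostWrapper are all hypothetical stand-ins, 
and the tokenStream( fieldName, text ) signature and the convention of 
returning None at the end of the stream are assumptions made for the example. 
The point is only that the wrapping side holds a plain object and calls its 
methods, so no base class is needed on the Python side.

```python
# Illustrative stand-ins only; none of these classes come from PyLucene.

class Token(object):
    """Hypothetical minimal token: it only carries its text."""
    def __init__(self, text):
        self.text = text

    def termText(self):
        return self.text


class WordStream(object):
    """Plain object acting as a token stream: next() returns a Token or None."""
    def __init__(self, text):
        self.words = text.split()
        self.pos = 0

    def next(self):
        if self.pos == len(self.words):
            return None  # assumed convention: None marks the end of the stream
        token = Token(self.words[self.pos])
        self.pos += 1
        return token


class LowerCaseStream(object):
    """Plain object wrapping another stream; no filter base class involved."""
    def __init__(self, input):
        self.input = input

    def next(self):
        token = self.input.next()
        if token is None:
            return None
        return Token(token.termText().lower())


class PlainAnalyzer(object):
    """Subclass of object only; it just provides a tokenStream() method."""
    def tokenStream(self, fieldName, text):
        return LowerCaseStream(WordStream(text))


class HostWrapper(object):
    """Stands in for the wrapping side: it holds the object and calls into it."""
    def __init__(self, analyzer):
        self.analyzer = analyzer

    def terms(self, fieldName, text):
        stream = self.analyzer.tokenStream(fieldName, text)
        result = []
        token = stream.next()
        while token is not None:
            result.append(token.termText())
            token = stream.next()
        return result


terms = HostWrapper(PlainAnalyzer()).terms("contents", "A little chunK oF Text")
print(terms)
```

Because HostWrapper only ever calls tokenStream() and next(), any object 
providing those methods can be plugged in, which is the sense in which a 
subclass of object is enough.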

Andi..

>
>
> from PyLucene import \
>   Analyzer, TokenFilter, StringReader, \
>   StandardTokenizer, LowerCaseFilter
>
> class TestAnalyzer( Analyzer ):
>   def __init__( self ):
>       pass
>   def tokenStream( self, reader ):
>       result = StandardTokenizer( reader )
>       # If I change the order of these two filters
>       # it works OK
>       result = LowerCaseFilter( result )
>       result = TestFilter( result )
>       return result
>
> class TestFilter( TokenFilter ):
>   def __init__( self, input ):
>       self.input = input
>   def __iter__( self ):
>       return self
>   def next( self ):
>       token = self.input.next()
>       if not token:
>           raise StopIteration
>       return token
>
> text = "A little chunK oF Text foR Me to analyze as a test for this problem I'm having"
> tokenstream = TestAnalyzer().tokenStream( StringReader( text ) )
> for token in tokenstream:
>   print token.termText()
>
>

