[pylucene-dev] First shot at custom tokenfilter
Andi Vajda
vajda at osafoundation.org
Mon Mar 26 20:00:27 PST 2007
On Mon, 26 Mar 2007, Ofer Nave wrote:
>> -----Original Message-----
>> From: pylucene-dev-bounces at osafoundation.org
>> [mailto:pylucene-dev-bounces at osafoundation.org] On Behalf Of Ofer Nave
>> Sent: Monday, March 26, 2007 5:01 PM
>>
>> I reimplemented my FooAnalyzer using this pattern and it
>> works now. I still don't know why, but at least it works. :)
>
> Ever since I started using a custom Analyzer and TokenFilter, my index build
> script keeps crashing. Usually it just freezes at a random point, and won't
> even respond to ctrl-c (I have to use kill -9 in another terminal). One
> time it ended with: 'Fatal Python error: This thread state must be current
> when releasing'. One time it finished successfully (out of about 20
> attempts). This is from repeated runs without changing any code.
If you submit a piece of code that reproduces the problem, I can take a look
at it (best would be something like a unit test, see PyLucene/test).
Also, what is your OS? Did you build PyLucene yourself? If so, which gcj?
Does 'make test' pass? What is your version of Python?
Andi..
>
> I'm not creating any threads. It's a straight python script, no apache or
> web stuff involved. The only change has been the custom analyzer and
> tokenfilter.
>
> For reference:
>
> ---
> import PyLucene
>
> class TermJoinTokenFilter(object):
>
>     TOKEN_TYPE_JOINED = "JOINED"
>
>     def __init__(self, tokenStream):
>         self.tokenStream = tokenStream
>         self.a = None
>         self.b = None
>
>     def __iter__(self):
>         return self
>
>     def next(self):
>         if self.a:
>             # emitted prev last time - need to set next, emit prev + next,
>             # and reset prev to None
>             self.b = self.tokenStream.next()
>             if self.b is None:
>                 return None
>             joined = PyLucene.Token(self.a.termText() + self.b.termText(),
>                                     self.a.startOffset(), self.a.endOffset(),
>                                     self.TOKEN_TYPE_JOINED)
>             joined.setPositionIncrement(0)
>             self.a = None
>             return joined
>         elif self.b:
>             # emitted prev + next last time - need to emit next,
>             # set prev to next, and reset next to None
>             self.a = self.b
>             self.b = None
>             return self.a
>         else:
>             # first call ever - set prev to first token and emit first token
>             self.a = self.tokenStream.next()
>             return self.a
>
> class TermJoinAnalyzer(object):
>
>     def __init__(self, analyzer=PyLucene.StandardAnalyzer()):
>         self.analyzer = analyzer
>
>     def tokenStream(self, fieldName, reader):
>         tokenStream = self.analyzer.tokenStream(fieldName, reader)
>         filter = TermJoinTokenFilter(tokenStream)
>         return tokenStream.tokenFilter(filter)
> ---
>
> -ofer
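
One way to narrow a problem like this down is to run the filter's three-branch
logic against plain Python objects, with no PyLucene involved. The sketch below
is only an illustration under stated assumptions: FakeToken, FakeTokenStream,
JoinLogic and drain are hypothetical stand-ins invented for the example, not
PyLucene classes, and it assumes a Python recent enough to have the built-in
next(). If the token sequence comes out as expected here, the joining logic
itself is probably fine, which would point toward the interaction with the
PyLucene runtime instead.

```python
class FakeToken(object):
    """Hypothetical stand-in for PyLucene.Token; only what the filter uses."""

    def __init__(self, text, start, end, type="word"):
        self.text, self.start, self.end, self.type = text, start, end, type
        self.positionIncrement = 1

    def termText(self):
        return self.text

    def startOffset(self):
        return self.start

    def endOffset(self):
        return self.end

    def setPositionIncrement(self, n):
        self.positionIncrement = n


class FakeTokenStream(object):
    """Hypothetical stand-in for a token stream: next() returns None at the end."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def next(self):
        return next(self._tokens, None)


class JoinLogic(object):
    """Same three branches as TermJoinTokenFilter.next() in the quoted code."""

    TOKEN_TYPE_JOINED = "JOINED"

    def __init__(self, tokenStream):
        self.tokenStream = tokenStream
        self.a = None
        self.b = None

    def next(self):
        if self.a:
            # previous call emitted a plain token: fetch the next one and emit the join
            self.b = self.tokenStream.next()
            if self.b is None:
                return None
            joined = FakeToken(self.a.termText() + self.b.termText(),
                               self.a.startOffset(), self.a.endOffset(),
                               self.TOKEN_TYPE_JOINED)
            joined.setPositionIncrement(0)
            self.a = None
            return joined
        elif self.b:
            # previous call emitted a join: now emit the second token on its own
            self.a = self.b
            self.b = None
            return self.a
        else:
            # first call: emit the first token unchanged
            self.a = self.tokenStream.next()
            return self.a


def drain(stream):
    """Call next() until it returns None, collecting (text, type, increment)."""
    out = []
    while True:
        tok = stream.next()
        if tok is None:
            return out
        out.append((tok.termText(), tok.type, tok.positionIncrement))


words = [FakeToken("new", 0, 3), FakeToken("york", 4, 8), FakeToken("city", 9, 13)]
result = drain(JoinLogic(FakeTokenStream(words)))
print(result)
```

For the three input words, each original token is emitted as-is, and a JOINED
token with position increment 0 appears between each adjacent pair. A check
like this could also be wrapped in a unittest.TestCase to match the style of
the tests under PyLucene/test.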