[pylucene-dev] creating new Fragmenter
vajda at osafoundation.org
Wed Nov 22 08:58:29 PST 2006
On Wed, 22 Nov 2006, Brian Whitman wrote:
> I am attempting to create a sentence splitting fragmenter in PyLucene.
> I see that the Fragmenter is not a class but an interface definition.
> Is there a way to create a new Fragmenter type that PyLucene can access via
> highlighter.setTextFragmenter(SentenceFragmenter()) ?
> I have tried to create a Python class that looks like
> class SentenceFragmenter:
>     def start(self, text):
>     def isNewFragment(self, token):
> but I get an InvalidArgsError in the setTextFragmenter call.
For that to work, there has to be an extension point. At the moment, there is
no extension point for Fragmenter. I can certainly add one.
An extension point is a wrapper in reverse: it is a Java class that extends a
Lucene class or implements a Lucene interface, and whose implementation
methods are native methods that invoke a wrapped Python implementation.
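
To make the idea concrete, here is a rough pure-Python sketch of the pattern.
It is not PyLucene code: Fragmenter, Highlighter and FragmenterExtensionPoint
are hypothetical stand-ins written only for this illustration, TypeError
stands in for the InvalidArgsError mentioned above, and tokens are assumed to
be simple (term, start, end) tuples rather than Lucene Token objects.

```python
import re
from abc import ABC, abstractmethod


class Fragmenter(ABC):
    """Hypothetical stand-in for the Lucene Fragmenter interface."""

    @abstractmethod
    def start(self, text): ...

    @abstractmethod
    def isNewFragment(self, token): ...


class FragmenterExtensionPoint(Fragmenter):
    """Plays the role of the Java class with native methods: it satisfies
    the interface and forwards every call to the wrapped Python object."""

    def __init__(self, impl):
        self._impl = impl

    def start(self, text):
        self._impl.start(text)

    def isNewFragment(self, token):
        return bool(self._impl.isNewFragment(token))


class Highlighter:
    """Hypothetical library side: only accepts real Fragmenter instances."""

    def setTextFragmenter(self, fragmenter):
        if not isinstance(fragmenter, Fragmenter):
            # Stand-in for the InvalidArgsError seen in the original report.
            raise TypeError("expected a Fragmenter")
        self.fragmenter = fragmenter


class SentenceFragmenter:
    """Plain Python class with the right method names but no declared type.
    Assumption: a token is a (term, start, end) tuple."""

    def start(self, text):
        self.text = text
        self.last_end = 0

    def isNewFragment(self, token):
        term, start, end = token
        # A new sentence begins if '.', '!' or '?' separates this token
        # from the previous one.
        gap = self.text[self.last_end:start]
        self.last_end = end
        return any(ch in ".!?" for ch in gap)


def fragment_flags(text):
    """Run the wrapped fragmenter over word tokens and collect its answers."""
    highlighter = Highlighter()
    highlighter.setTextFragmenter(FragmenterExtensionPoint(SentenceFragmenter()))
    highlighter.fragmenter.start(text)
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    return [highlighter.fragmenter.isNewFragment(t) for t in tokens]
```

In this sketch, passing SentenceFragmenter() straight to setTextFragmenter is
rejected because it is not a Fragmenter, while the wrapped instance is
accepted and its methods are still the ones that run.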
Currently, the only extension point available for the highlighter package is
for Formatter (see cpp/PythonHighlight.cpp and its uses in lucene.cpp).
Java Lucene has a number of well-known extension points and many of them are
implemented by PyLucene. Since they are a fair amount of work to make, I have
only implemented the obvious ones and the ones used by Java Lucene samples,
such as those in the "Lucene in Action" book ported to PyLucene.
I'm open to adding new ones as people find they are blocked by missing them
for their use cases. Patches are also welcome...
I should have a Fragmenter extension point later today...