[pylucene-dev] Is there PyNutch?
Brian Whitman
brian.whitman at variogr.am
Wed Feb 14 09:34:34 PST 2007
On Feb 14, 2007, at 12:27 PM, Jack L wrote:
> The core of Nutch - Lucene has a Python port PyLucene. I wonder
> if there is a Python port for Nutch? We have some distributed
> Nutch searchers running. I'm thinking, it would be nice to
> have the merger/frontend available to Python and take advantage of
> the powerful Python web frameworks.
There is a Python frontend to Nutch built by Dennis Kubes:
http://wiki.apache.org/nutch/Automating_Fetches_with_Python
In our setup we mix Nutch's Java parsers and crawlers with our
own homebuilt Python ones. We use Solr via a Python class to inject
data into the main Nutch index. You have to be very careful with
index and segment merging, but otherwise it works well.
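
To give an idea of what such a Python class has to do, here is a minimal sketch, not our actual code. It builds a Solr XML update message and POSTs it to the update handler. The URL, the function names and the field names are all assumptions for illustration; the fields you send have to match your own Solr schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a local Solr instance; adjust to your own deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr XML <add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # A list or tuple becomes a repeated (multi-valued) field.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                ET.SubElement(doc, "field", name=name).text = str(item)
    return ET.tostring(add, encoding="unicode").encode("utf-8")


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the raw response body."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def index_documents(docs, url=SOLR_UPDATE_URL):
    """Add documents, then commit so they become visible to searchers."""
    post_xml(build_add_message(docs), url)
    post_xml(b"<commit/>", url)
```

A crawler process would then call something like index_documents([{"id": url, "title": title, "content": text}]) for each batch it has parsed (hypothetical field names again).
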
I was initially using PyLucene for this task, but I found that Solr
does a great job of abstracting the index files from the application,
and we can run multiple crawl processes on many machines, all feeding
the same Solr-led index. With PyLucene/Lucene you need to worry
about locks and managing the IndexWriter/IndexReader yourself.
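
To show why the locks matter: Lucene guards an index with a write lock, so only one IndexWriter can have it open at a time. The toy sketch below is not PyLucene code. It only mimics that single-writer rule with an exclusively created lock file, and the class and exception names are made up for illustration.

```python
import os


class IndexLockedError(Exception):
    """Raised when another writer already holds the lock (hypothetical name)."""


class ToyIndexWriter:
    """Stand-in for an index writer that owns an exclusive lock file."""

    def __init__(self, index_dir):
        self.lock_path = os.path.join(index_dir, "write.lock")
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise IndexLockedError("index is locked: " + self.lock_path)
        os.close(fd)
        self.closed = False

    def close(self):
        """Release the lock so another writer can open the index."""
        if not self.closed:
            os.remove(self.lock_path)
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

A second ToyIndexWriter on the same directory fails until the first one is closed, which is the kind of coordination every crawl process would need if they all wrote to one index directly. With Solr in front, the processes just send updates over HTTP and Solr owns the writer.
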
For more on Nutch->Solr, see
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html