[pylucene-dev] Is there PyNutch?

Brian Whitman brian.whitman at variogr.am
Wed Feb 14 09:34:34 PST 2007


On Feb 14, 2007, at 12:27 PM, Jack L wrote:
> The core of Nutch - Lucene has a Python port PyLucene. I wonder
> if there is a Python port for Nutch? We have some distributed
> Nutch searchers running. I'm thinking, it would be nice to
> have the merger/frontend available to Python and take advantage of
> the powerful Python web frameworks.


There is a Python frontend to Nutch built by Dennis Kubes:
http://wiki.apache.org/nutch/Automating_Fetches_with_Python

And in our setup we mix Nutch's Java parsers and crawlers with our
own homebuilt Python ones. We use Solr via a Python class to inject
data into the main Nutch index. You have to be very careful with
index and segment merging, but otherwise it works well.
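The Python class itself isn't shown here. As a rough sketch of the general approach, assuming Solr's XML update handler at a hypothetical `http://localhost:8983/solr/update` and hypothetical field names (`id`, `title`, `content`) that would have to match your Solr schema, the code below builds an `<add>` message and posts it over HTTP. A `<commit/>` message makes the added documents visible to searchers. `build_add_xml` and `post_to_solr` are illustrative names, not part of any library.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: adjust host, port and core path for your Solr install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build a Solr XML <add> message (as a str) from a list of field dicts.

    A list or tuple value is emitted as repeated <field> elements, which is
    how multi-valued fields are sent. ElementTree escapes &, < and > for us.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_body, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


# Usage (needs a running Solr; field names must exist in its schema):
#   docs = [{"id": "http://example.com/", "title": "Example", "content": "..."}]
#   post_to_solr(build_add_xml(docs))
#   post_to_solr("<commit/>")
```

Because every crawler only speaks HTTP to Solr, any number of Python or Java processes can feed documents this way without touching the index files directly.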

I was initially using PyLucene for this task, but I found that Solr
does a great job at abstracting the index files from the application,
and we can run multiple crawl processes on many machines all feeding
to the same Solr-led index. With PyLucene/Lucene you need to worry
about locks and the IndexWriter/IndexReader.
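To make the locking concern concrete: a Lucene index accepts only one writer at a time, enforced with a lock file in the index directory. The sketch below mimics that idea in plain Python and is not PyLucene's API; `index_write_lock`, `LockHeldError` and the default lock file name are hypothetical, chosen for illustration.

```python
import os
from contextlib import contextmanager


class LockHeldError(RuntimeError):
    """Raised when another writer already holds the (hypothetical) lock."""


@contextmanager
def index_write_lock(index_dir, name="write.lock"):
    """Hold an exclusive lock file in index_dir for the duration of the block.

    Illustrative only: similar in spirit to a single-writer index lock,
    not a wrapper around PyLucene.
    """
    path = os.path.join(index_dir, name)
    try:
        # O_CREAT | O_EXCL fails atomically if the lock file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(f"index is locked: {path}") from None
    os.close(fd)
    try:
        yield path
    finally:
        # Release the lock even if the writing code raises.
        os.remove(path)


# Usage (hypothetical):
#   with index_write_lock("/data/index"):
#       ...  # open the writer, add documents, close the writer
```

Every crawl process writing to a shared index would have to coordinate through something like this, and a process that dies without cleaning up leaves a stale lock behind. Sending all updates through a single Solr server moves that bookkeeping out of the crawlers.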

For more on Nutch->Solr, see
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html