[pylucene-dev] Is there PyNutch?
jlist9 at yahoo.ca
Wed Feb 14 11:06:34 PST 2007
Thanks for the reply. (I'm not sure if this discussion is of interest
to the PyLucene dev list. If it's considered OT, I shall take the next
I looked at the first link you sent. It's not actually what I'm
looking for. In our setup, we have multiple crawler/indexer/searcher
boxes talking to one merger/web server front-end using Nutch IPC.
The front-end box sends queries to multiple back-end searchers,
merges the results it receives, and presents them in a web page.
I'm hoping to find a way to replace the front-end Java implementation
with Python. So, the piece I'm looking for does not touch the
segments. Instead, it speaks Nutch IPC and parses the query
strings, issues queries to the back-end, and merges results and puts
them in a web page.
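To make the merging step concrete, here is a rough sketch in Python. It does not speak Nutch IPC; it only assumes each back-end searcher can be wrapped as a callable that takes a query and a hit limit and returns a list of scored hits. The names search_all, merge_hits and fake_backend, and the "url"/"score" fields, are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def merge_hits(per_backend_hits, top_n=10):
    """Merge per-backend hit lists into the top_n hits by descending score."""
    all_hits = (hit for hits in per_backend_hits for hit in hits)
    return heapq.nlargest(top_n, all_hits, key=lambda hit: hit["score"])


def search_all(backends, query, top_n=10):
    """Send the query to every back-end in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        per_backend_hits = list(pool.map(lambda b: b(query, top_n), backends))
    return merge_hits(per_backend_hits, top_n)


def fake_backend(hits):
    """Hypothetical stand-in for a remote searcher; a real one would do the IPC call."""
    return lambda query, top_n: hits[:top_n]


if __name__ == "__main__":
    backends = [
        fake_backend([{"url": "a", "score": 0.9}, {"url": "b", "score": 0.2}]),
        fake_backend([{"url": "c", "score": 0.5}]),
    ]
    for hit in search_all(backends, "python nutch", top_n=2):
        print(hit["url"], hit["score"])
```

Each back-end only needs to return its own top_n hits, since no hit outside a box's local top_n can appear in the merged top_n. This assumes scores are comparable across back-ends, which may not hold if each box scores against its own index statistics.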
Thanks for mentioning your experience with Solr. I haven't tried it
with a large amount of data. My concern is that inserting via HTTP POST
is much less efficient than local file access (the Nutch approach).
I'm not sure whether it can handle millions of submissions a day.
Wednesday, February 14, 2007, 9:34:34 AM, you wrote:
> On Feb 14, 2007, at 12:27 PM, Jack L wrote:
>> The core of Nutch, Lucene, has a Python port, PyLucene. I wonder
>> if there is a Python port for Nutch? We have some distributed
>> Nutch searchers running. I'm thinking it would be nice to
>> have the merger/frontend available to Python and take advantage of
>> the powerful Python web frameworks.
> There is a Python frontend to Nutch built by Dennis Kubes:
> And in our setup we mix Nutch's Java parsers and crawlers with our
> own homebuilt Python ones. We use Solr via a Python class to inject
> data into the main Nutch index. You have to be very careful with
> index and segment merging but otherwise it works well.
> I was initially using PyLucene for this task but I found that Solr
> does a great job at abstracting the index files from the application,
> and we can run multiple crawl processes on many machines all feeding
> to the same Solr-led index. With PyLucene/Lucene you need to worry
> about locks and the IndexWriter/IndexReader.
> For more on Nutch->Solr, see
> http://blog.foofactory.fi/2007/02/online-