[pylucene-dev] Announcing Grassyknoll,
was Re: Need to build a high-load searcher
Pete
pfein at pobox.com
Mon Mar 19 18:31:45 PST 2007
On Monday March 19 2007 8:39 pm, Jack L wrote:
> This is very interesting. Because I'm planning on deploying
> a solr-based search functionality soon, and I'd rather use Python,
If you're looking for something to deploy next week, Grassyknoll's not it. ;)
As mentioned, it's early alpha. That said, I have the full support of my
employer for this and we're going to be re-deploying our production site on
it, so it's going to get done, soon.
> I wonder if you have any numbers comparing the performance/CPU load
> /memory footprint, etc. between Grasyknoll and solr?
Sorry, I don't have anything like that ATM. One of the other devs was going
to be doing some benchmarking over the weekend, but he's not back from
vacation.
Currently, I'm using wsgiref as the server, which is single-threaded. This
makes development much easier, but isn't going to give very good performance.
The nice thing about wsgi is that it's relatively easy to swap servers.
However, interfacing a multi-threaded webserver with PyLucene is non-trivial,
as this ML will attest. ;) I've got a really good idea of how to go about
this, taking full advantage of PyLucene's GIL-releasing benefits, but that's
going to have to wait until the internals get nailed down a bit more.
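
To illustrate the server-swapping point, here is a minimal sketch, not Grassyknoll's actual code. The `app` and `serve` names, the host/port and the response body are all made up for the example; the only real APIs used are wsgiref's `make_server` and, in a comment, Paste's `httpserver.serve`.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Hypothetical stand-in for the search application (not Grassyknoll)."""
    body = b"hello from a WSGI app\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="localhost", port=8000):
    # wsgiref's simple server handles one request at a time, which is
    # convenient for development but not for load.
    make_server(host, port, app).serve_forever()
    # Swapping servers means replacing only the line above, for example:
    #   from paste import httpserver
    #   httpserver.serve(app, host=host, port=port)
    # The app callable itself stays untouched.
```

Calling `serve()` starts the development server; the threading issues with PyLucene mentioned above are not addressed by this sketch.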
> - Grasyknoll search vs lucene search
Grassyknoll's built on PyLucene. Supposedly, PyLucene is about 2x as fast as
Java Lucene. Andi's got numbers on the website, IIRC.
> - Grasyknoll web server vs jetty
I've never used jetty. We'll probably end up using pasteserver, though flup's
a possibility as well. I don't have performance numbers on either, but I
suspect PyLucene will be the bottleneck.
> solr also has a Python output format. Any chance Grasyknoll can
> provide the same format to make it easy to port the front-end
> application? And/or a similar REST URL scheme?
To be honest, I've never used Solr & my eyes tend to glaze over reading the
docs. If you dig up the relevant links, I'll take a look. ;) I'm eager to
make this easy to use for folks, so supporting formerly-Solr clients
certainly seems reasonable. I'm planning on supporting quite a range of
output formats, including (but not limited to) JSON, XML, pickle and some
form of HTML for debugging/browsing.
As for the REST URL scheme, it's pretty standard:
GET http://foo.com/?q=find+me+things
GET http://foo.com/my_doc_id/
PUT http://foo.com/my_doc_id/
DELETE http://foo.com/my_doc_id/
POST http://foo.com/ which'll create a unique id for you using uuid.
We're also going to support *Many versions of the above, which would allow you
to batch a bunch of operations into a single request. This is for
performance reasons on the Lucene side.
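
The URL scheme above could be dispatched roughly as follows. This is only a sketch under assumptions: the `route` function and the operation names ("search", "fetch", "store", "delete", "create") are hypothetical, and `uuid.uuid4()` is a guess at which uuid variant would be used. The *Many batch variants are left out.

```python
import uuid
from urllib.parse import parse_qs


def route(method, path, query_string=""):
    """Map a request onto (operation, argument). All names are hypothetical."""
    doc_id = path.strip("/")
    if not doc_id:
        if method == "GET":
            # GET /?q=find+me+things -> search for the decoded query
            terms = parse_qs(query_string).get("q", [""])[0]
            return ("search", terms)
        if method == "POST":
            # POST / -> create a document under a generated unique id
            # (uuid4 is an assumption; the post only says "using uuid")
            return ("create", str(uuid.uuid4()))
    elif method in ("GET", "PUT", "DELETE"):
        # GET/PUT/DELETE /my_doc_id/ -> act on a single document
        ops = {"GET": "fetch", "PUT": "store", "DELETE": "delete"}
        return (ops[method], doc_id)
    return ("error", "unsupported: %s %s" % (method, path))
```

For example, `route("GET", "/", "q=find+me+things")` yields a search for "find me things", and `route("PUT", "/my_doc_id/")` yields a store operation on `my_doc_id`.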
IIRC, the Solr python output format is intended to be eval()'d. That seems a
little dubious to me from a security perspective (though the same applies to
pickle, I suppose).
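
One way a client could sidestep the eval() concern, sketched here with `ast.literal_eval` (in the standard library from Python 2.6 on), which accepts only literals and refuses function calls. The response text below is made up for illustration and is not real Solr output; `parse_literal` and `is_rejected` are hypothetical helper names.

```python
import ast

# Made-up response text in the style of a Python-literal output format.
response_text = "{'numFound': 2, 'docs': [{'id': 'a'}, {'id': 'b'}]}"


def parse_literal(text):
    """Parse literals only; raises ValueError or SyntaxError otherwise."""
    return ast.literal_eval(text)


def is_rejected(text):
    """Return True if the text is not a plain Python literal."""
    try:
        parse_literal(text)
    except (ValueError, SyntaxError):
        return True
    return False


result = parse_literal(response_text)
```

With plain eval(), a string such as `__import__('os').getcwd()` would be executed; here `is_rejected` reports it as not a literal instead. Nothing comparable makes unpickling untrusted data safe.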
--
Peter Fein || 773-575-0694 || pfein at pobox.com
http://www.pobox.com/~pfein/ || PGP: 0xCCF6AE6B
irc: pfein at freenode.net || jabber: peter.fein at gmail.com