[pylucene-dev] pylucene and recommendations for RAM

David Pratt fairwinds at eastlink.ca
Thu Apr 5 15:17:40 PDT 2007


Hi Andi. Sorry for not being clear: I was thinking of using twisted's pb 
to retrieve the objects from remote servers over the wire. I should be 
able to load balance the pb servers easily enough, choosing which server 
handles aggregation, and hopefully merge the results there. I thought 
hits objects might be able to be merged.
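
To make the merging idea concrete, here is a minimal sketch in plain 
Python. It assumes each remote server returns a list of (doc_id, score) 
tuples rather than Hits objects; that wire format and the name 
merge_hits are my own assumptions, not anything PyLucene provides. It 
keeps the best score seen for each doc_id so duplicates are dropped.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical wire format: one (doc_id, score) tuple per hit.
Hit = Tuple[str, float]


def merge_hits(result_lists: Iterable[Iterable[Hit]], limit: int = 10) -> List[Hit]:
    """Merge per-server result lists into one ranked, de-duplicated list."""
    best: Dict[str, float] = {}
    for hits in result_lists:
        for doc_id, score in hits:
            # Keep only the highest score seen for each document id.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first; ties broken by doc_id for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


server_a = [("doc1", 0.9), ("doc2", 0.5)]
server_b = [("doc2", 0.7), ("doc3", 0.1)]
merged = merge_hits([server_a, server_b], limit=2)
```

One caveat with this sketch: scores computed against separate indexes 
may not be directly comparable, so ranking on the raw score is only a 
first approximation.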

To get better performance, each remote server could keep its index in 
RAM as opposed to on the filesystem. As for writing to the distributed 
index, I could keep track of which index an object is written to in the 
same way the load balancing is done for searching, but a db is needed to 
keep track. A replication strategy could use the db to rebuild the index 
in case a machine goes down (otherwise you would temporarily lose 
results from all servers until the index was rebuilt).
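
A rough sketch of the tracking db, assuming sqlite3 and a simple 
hash-based choice of server. The table name, the function names and the 
server labels are all hypothetical; the point is only that the db 
records where each object was indexed, so the list of objects on a 
failed machine can be read back and re-indexed.

```python
import sqlite3
import zlib


def open_registry(path=":memory:"):
    """Open (or create) the db that maps each doc_id to a server."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS placement ("
        "doc_id TEXT PRIMARY KEY, server TEXT NOT NULL)"
    )
    return conn


def choose_server(doc_id, servers):
    """Pick a server deterministically from a checksum of the doc_id."""
    return servers[zlib.crc32(doc_id.encode("utf-8")) % len(servers)]


def record_placement(conn, doc_id, server):
    """Remember which server's index holds this document."""
    conn.execute(
        "INSERT OR REPLACE INTO placement (doc_id, server) VALUES (?, ?)",
        (doc_id, server),
    )
    conn.commit()


def docs_on(conn, server):
    """List the documents to re-index if this server's index is lost."""
    rows = conn.execute(
        "SELECT doc_id FROM placement WHERE server = ? ORDER BY doc_id",
        (server,),
    )
    return [row[0] for row in rows]


servers = ["index-a", "index-b"]  # hypothetical server labels
conn = open_registry()
for doc_id in ("doc1", "doc2", "doc3"):
    record_placement(conn, doc_id, choose_server(doc_id, servers))
```

If a machine goes down, docs_on(conn, "index-a") gives the doc ids whose 
source objects would need to be fed back into a fresh index.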

Does this type of approach have any merit? What sort of strategy do you 
envision? Many thanks.

Regards,
David



Andi Vajda wrote:
> 
> On Thu, 5 Apr 2007, David Pratt wrote:
> 
>> Hi Andi. Yes, I like the idea of a pure python approach also. Quick 
>> question - if one is able to obtain hits objects over the wire from a 
>> number of servers, is pyLucene capable of merging hits objects as it 
>> stands now? I believe it is but cannot recall at the moment.
> 
> PyLucene is not capable of receiving anything over the wire.
> If it were, would it be capable to not return duplicate hits ? Not 
> without extra code caching all received hits for duplicate checking.
> 
> The Hits object is probably a bad place to start as it's very closely 
> tied to the IndexSearcher that returns them.
> 
> Andi..
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev at osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
> 

