[pylucene-dev] pylucene and recommendations for RAM
David Pratt
fairwinds at eastlink.ca
Thu Apr 5 08:43:50 PDT 2007
Hi Pete. Many thanks for this advice. It seems that a cluster would
best solve this, with the index spread over a number of lower-end
servers. From what I have read on large-scale indexing, this seems to be
the usual approach (but with as much RAM as possible per server). I am
looking at costs, so lower-end servers with 2 GB of RAM are attractive;
I would just use more of them.
I have only used pylucene for tests on smaller indexes. Is a cluster
arrangement possible using pylucene? I am not a Java programmer, so I
would like to stay with what I know. Many thanks.
Regards,
David
Pete wrote:
> On Thursday April 5 2007 9:33 am, David Pratt wrote:
>> I realize that the amount of RAM needed will be based on the size of the
>> index, how many documents and what you are storing in the index itself -
>> but some anecdotal information would be helpful. I am looking at an
>> index that could reach 20 - 50 million documents. Will a commodity
>> server with 2 GB of RAM be enough?
>
> IIRC, it's more a function of how quickly you're adding data than total size.
> Though this may be incorrect when merging segments (aka optimizing). A fast
> disk helps quite a lot too.
>
> You'll want to configure the IndexWriter for bulk loading. The relevant items
> are setMergeFactor, which controls how often segments are merged on disk, and
> setMaxBufferedDocs, which controls how many docs are held in RAM before being
> written out. A higher value for both will be faster, though be aware that an
> index built with a high merge factor is slower to query, so you'd probably
> want to optimize() at the end. On our indexing server, with ~4 KB documents,
> setMaxBufferedDocs(200) uses about 700MB of RAM. See the Javadocs & Lucene
> In Action for more details.
>
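To make the trade-off between the two settings concrete, here is a rough
sketch in plain Python. It is a toy model, not Lucene code: it assumes a
segment is flushed every max_buffered_docs documents and that merge_factor
segments at the same level are merged into one segment at the next level.
The function name simulate_segments and the bookkeeping are my own
illustration, and the real merge policy has more details than this.

```python
def simulate_segments(num_docs, max_buffered_docs, merge_factor):
    """Toy model of buffered flushes and level-based segment merges.

    Returns (segments, merges): the sizes (in documents) of the segments
    left on disk, and the number of merge operations performed.
    This is an illustrative assumption, not Lucene's actual implementation.
    """
    if max_buffered_docs < 1 or merge_factor < 2:
        raise ValueError("need max_buffered_docs >= 1 and merge_factor >= 2")

    levels = [[]]  # levels[i] holds the sizes of segments at merge level i
    merges = 0

    def flush(size):
        nonlocal merges
        levels[0].append(size)
        level = 0
        # Cascade: once a level holds merge_factor segments, merge them
        # into a single segment on the next level up.
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            levels[level] = []
            merges += 1
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1

    buffered = 0
    for _ in range(num_docs):
        buffered += 1
        if buffered == max_buffered_docs:
            flush(buffered)
            buffered = 0
    if buffered:
        flush(buffered)  # write out the final, partially filled buffer

    segments = [size for level in levels for size in level]
    return segments, merges


if __name__ == "__main__":
    for mf in (10, 50):
        segs, merges = simulate_segments(1000, 10, mf)
        print("merge_factor=%d: %d segments, %d merges" % (mf, len(segs), merges))
```

In this model a higher merge factor does fewer merges while building but
leaves more segments behind, which matches the advice to optimize() at the
end of a bulk load.
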
> On the searching front, a dedicated commodity box w/ 2 GB can probably serve
> around 2 million documents (again, depending on document size). Multiple
> CPUs will let you serve more simultaneous queries.
>
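As a back-of-the-envelope check on the cluster idea, the figure above
(around 2 million documents per 2 GB search box) can be turned into a
server count. This is only a sketch: it assumes capacity scales linearly
and that documents are spread evenly across servers, neither of which has
been measured here. The name servers_needed is made up for illustration.

```python
def servers_needed(total_docs, docs_per_server=2000000):
    """Estimate how many search servers are needed for total_docs documents.

    docs_per_server defaults to the anecdotal ~2 million documents per
    2 GB box quoted in this thread; treat it as an assumption to be
    replaced with measurements from a test index.
    """
    if total_docs < 0 or docs_per_server <= 0:
        raise ValueError("total_docs must be >= 0 and docs_per_server > 0")
    # Integer ceiling division: round up to whole servers.
    return -(-total_docs // docs_per_server)


if __name__ == "__main__":
    for docs in (20000000, 50000000):
        print("%d documents: about %d servers" % (docs, servers_needed(docs)))
```

On those assumptions, the 20 to 50 million document range works out to
roughly 10 to 25 search servers of this size.
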
>> I guess it is possible to build a test index with sample data to
>> determine this also. Many thanks.
>
> You should probably ask the Lucene list, but please report any test results
> here as well (you could put them on the wiki too).
>