[Dev] index requirements (chandler query lang)

petite_abeille petite_abeille at mac.com
Sat Dec 21 05:15:48 PST 2002


On Saturday, Dec 21, 2002, at 11:39 Europe/Zurich, David McCusker wrote:

> Finding answers to search queries rapidly on very large data sets will
> normally require a btree index for each sorted collection supporting
> one kind of search.  So we're characterizing our btree index plans.

What I have found works best is to split the problem in two: persistence
and indexing.

For persistence, I may use one btree per object class, or fewer if I
have a proliferation of entities. In any case, each btree simply
consists of two data types: a fixed-length binary UUID used to look up
an object, and a serialized version of that object's data.
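
A minimal sketch of such a per-class store, assuming Python: a pair of sorted lists searched with the standard `bisect` module stands in for the on-disk btree, and JSON stands in for the serialization format. `ObjectStore` and the class names are hypothetical, not the poster's actual code.

```python
import bisect
import json
import uuid


class ObjectStore:
    """Ordered map from a fixed-length 16-byte UUID to serialized object data.

    Sorted Python lists stand in for an on-disk btree here, and JSON is an
    illustrative serialization format (both are assumptions).
    """

    def __init__(self, class_name):
        self.class_name = class_name
        self._keys = []   # sorted 16-byte UUID keys, as a btree keeps them
        self._blobs = []  # serialized data, parallel to self._keys

    def _position(self, key):
        # Binary search, standing in for a btree descent.
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def put(self, obj_id, data):
        key = obj_id.bytes  # always exactly 16 bytes: a fixed-length key
        blob = json.dumps(data, sort_keys=True).encode("utf-8")
        i, found = self._position(key)
        if found:
            self._blobs[i] = blob  # overwrite the existing record
        else:
            self._keys.insert(i, key)
            self._blobs.insert(i, blob)

    def get(self, obj_id):
        i, found = self._position(obj_id.bytes)
        return json.loads(self._blobs[i].decode("utf-8")) if found else None

    def __len__(self):
        return len(self._keys)


# One store per object class (these class names are hypothetical).
stores = {"Message": ObjectStore("Message"), "Person": ObjectStore("Person")}
msg_id = uuid.uuid4()
stores["Message"].put(msg_id, {"subject": "index requirements", "from": "PA"})
print(stores["Message"].get(msg_id)["subject"])  # index requirements
```

Keying on the raw 16 bytes keeps every key the same size, which is what makes the btree pages simple; the store knows nothing about the object's fields beyond its opaque serialized blob.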

For search and indexing, I don't use a btree but an inverted index
instead, and I automatically index every last significant byte of
everything. Each index entry refers to an object UUID; no object data
is actually stored in the index.
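
A minimal sketch of an inverted index along these lines, assuming Python: every field value is split into lowercase alphanumeric terms (the tokenization rule is an assumption), each term maps to the set of object UUIDs containing it, and a multi-term query returns the UUIDs that contain all terms. `InvertedIndex` is a hypothetical name; note that only UUIDs, never object data, end up in the postings.

```python
import re
import uuid
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of object UUIDs that contain it.

    Only UUIDs are stored; object data lives in the separate store.
    Tokenizing into lowercase alphanumeric words is an assumption.
    """

    TOKEN = re.compile(r"[a-z0-9]+")

    def __init__(self):
        self._postings = defaultdict(set)  # term -> {uuid.UUID, ...}

    def tokens(self, value):
        return self.TOKEN.findall(str(value).lower())

    def add(self, obj_id, data):
        # Index every field value of the object, not just selected fields.
        for value in data.values():
            for term in self.tokens(value):
                self._postings[term].add(obj_id)

    def search(self, query):
        # AND semantics: an object matches only if it contains every term.
        terms = self.tokens(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))  # copy, don't mutate
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
a, b = uuid.uuid4(), uuid.uuid4()
index.add(a, {"subject": "Chandler query language", "from": "PA"})
index.add(b, {"subject": "btree index plans", "from": "David"})
print(index.search("query") == {a})        # True
print(index.search("index plans") == {b})  # True
```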

In a nutshell, the inverted index takes care of finding all the
relevant object UUIDs for a given query, and the btree takes care of
resolving their content.
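
The two-step lookup can be sketched end to end as follows, assuming Python, with a plain dict standing in for the UUID-keyed btree and a term-to-UUID-set dict standing in for the inverted index. `save`, `query`, and the tokenization rule are hypothetical illustrations, not the poster's implementation.

```python
import json
import re
import uuid
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")

# Hypothetical stand-ins: a dict for the UUID-keyed btree, and a
# term -> set-of-UUIDs dict for the inverted index.
object_store = {}                   # uuid bytes -> serialized object
inverted_index = defaultdict(set)   # term -> {uuid bytes, ...}


def save(data):
    obj_id = uuid.uuid4().bytes
    object_store[obj_id] = json.dumps(data).encode("utf-8")
    # The index holds only the UUID, never the data itself.
    for value in data.values():
        for term in TOKEN.findall(str(value).lower()):
            inverted_index[term].add(obj_id)
    return obj_id


def query(text):
    # Step 1: the inverted index finds the UUIDs matching every term.
    terms = TOKEN.findall(text.lower())
    if not terms:
        return []
    ids = set.intersection(*(inverted_index.get(t, set()) for t in terms))
    # Step 2: the btree (here a dict) resolves each UUID to its content.
    return [json.loads(object_store[i]) for i in sorted(ids)]


save({"subject": "index requirements", "from": "PA"})
save({"subject": "btree plans", "from": "David"})
print(query("index PA"))  # [{'subject': 'index requirements', 'from': 'PA'}]
```

Keeping the index free of object data means it stays small, and updating an object's content only touches the store plus whichever postings changed.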

Here are the actual figures as far as space usage goes:

Raw data: 45 %
Objects: 37 %
Indexes: 18 %

As always, YMMV.

Cheers,

PA.
