[Dev] index requirements (chandler query lang)
petite_abeille
petite_abeille at mac.com
Sat Dec 21 05:15:48 PST 2002
On Saturday, Dec 21, 2002, at 11:39 Europe/Zurich, David McCusker wrote:
> Finding answers to search queries rapidly on very large data sets will
> normally require a btree index for each sorted collection supporting
> one kind of search. So we're characterizing our btree index plans.
What I found to work best is to split the problem in two: persistency
and indexing.
For persistency, I may use one btree per object class, or fewer if I
have a proliferation of entities. But in any case, the btree simply
consists of two data types: a fixed-length binary uuid to look up an
object, and a serialized version of its data.
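To make the persistency side concrete, here is a minimal sketch of that key/value layout. It is only an illustration: the `ObjectStore` class is a hypothetical name, a plain dict stands in for the on-disk btree, and `pickle` stands in for whatever serialization is actually used.

```python
import pickle
import uuid


class ObjectStore:
    """Hypothetical stand-in for one btree: 16-byte uuid -> serialized object."""

    def __init__(self):
        # Assumption: a dict replaces the real on-disk btree in this sketch.
        self._records = {}

    def put(self, obj):
        # uuid4().bytes is always 16 bytes, i.e. a fixed-length binary key.
        key = uuid.uuid4().bytes
        self._records[key] = pickle.dumps(obj)
        return key

    def get(self, key):
        # Look up the serialized record by uuid and rebuild the object.
        return pickle.loads(self._records[key])


store = ObjectStore()
key = store.put({"title": "index requirements", "author": "PA"})
record = store.get(key)
```

Nothing in the store knows about queries; it only maps a uuid to a blob.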
As far as search and indexing go, I don't use a btree but an inverted index
instead, and automatically index every single last significant byte of
everything. Each index entry refers to an object uuid. No data is
actually stored in an index.
In a nutshell, the inverted index takes care of finding all the
relevant object uuids given a query and the btree of resolving their
content.
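The split described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the names `objects`, `postings`, `tokenize`, `add` and `search` are hypothetical, terms are simple lowercase words rather than "every significant byte", and in-memory dicts stand in for both the btree and the index.

```python
import re
import uuid
from collections import defaultdict

objects = {}                 # stand-in for the btree: uuid -> stored data
postings = defaultdict(set)  # inverted index: term -> set of object uuids


def tokenize(text):
    # Assumption: index lowercase alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def add(text):
    key = uuid.uuid4().bytes
    objects[key] = text
    for term in tokenize(text):
        # The index holds uuids only, never the data itself.
        postings[term].add(key)
    return key


def search(query):
    terms = tokenize(query)
    if not terms:
        return []
    # Step 1: the inverted index finds the uuids matching every term.
    hits = set.intersection(*(postings.get(term, set()) for term in terms))
    # Step 2: the store resolves each uuid to its content.
    return [objects[key] for key in hits]


add("btree index plans")
add("inverted index for search")
both = search("index")
one = search("inverted index")
```

Here `both` holds the two stored texts (in no particular order) and `one` holds only the second, since the query terms are intersected.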
Here are the actual figures as far as space usage goes:
Raw data: 45 %
Objects: 37 %
Indexes: 18 %
As always, YMMV.
Cheers,
PA.