[Dev] Re: large random IDs (4suite uuid.py)
J C Lawrence
claw at kanga.nu
Tue Apr 8 23:20:04 PDT 2003
On Tue, 08 Apr 2003 19:37:07 -0700
rys mccusker <david at treedragon.com> wrote:
> J C Lawrence wrote:
>> rys mccusker <david at treedragon.com> wrote:
> I said "common occurrence" to mean it would not be rare (as in,
> happening never or almost never). I did not mean it would be typical
> and usual.
Absolutely -- I was not finding fault with the literal truth of your
statement, but with the inference that it was a reasonable expectation.
This is the same logic that says "1" is a correct answer to, "How
many hands do you have?" If you have two hands, you also have one hand.
> that scenario is not the expected case for your average user.
It might not be now. Given a deployed Chandler consumer-level base for
~12 months I wouldn't bet on it.
> I had in mind some populations in which this case would occur more
> often than rarely. I did not want this case to be broken.
> One should expect far more object IDs to be used than are user
> visible. when I was at Netscape ...
BTW Do you know Mike Belshe?
> ... was asked to optimize address books and their synchronization with
> LDAP, because numerous client companies demanded support for address
> books with populations in excess of 100,000 users.
In that line, BitKeeper has not-so-recently started running into the
problem that users have made more than 2^32 changes to single files
within BK controlled repositories. BK is not an old product -- and
hasn't been deployed for that long in those cases (~3 years IIRC). A
32-bit int used to enumerate changes to a single file in a repository was
expected to last considerably longer.
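To put rough numbers on that (the rates here are my own hypotheticals, not
BK's actual figures), it's a one-liner to see how fast an automated process
can burn through a fixed-width counter:

```python
# Back-of-the-envelope: years until a fixed-width change counter wraps.
# The 50 changes/second rate is a hypothetical (think generated merge
# or metadata churn), not a measured BitKeeper workload.

SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_exhaust(bits, changes_per_second):
    """Years until a `bits`-wide counter wraps at the given change rate."""
    return (2 ** bits) / changes_per_second / SECONDS_PER_YEAR

print(round(years_to_exhaust(32, 50), 1))          # -> 2.7 (years)
print(f"{years_to_exhaust(64, 50):.2e}")           # -> 1.17e+10 (years)
```

Fifty automated touches a second exhausts 2^32 in under three years, which
is why "expected to last considerably longer" estimates keep getting
ambushed; the same rate against a 64-bit counter outlives the sun.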
> we are not targeting the enterprise market, as we have said
> repeatedly, and yet this situation still occurs and ought to be
> accounted for in architecture. perhaps some significant subsets of
> companies will try to synchronize big portions of repositories.
A weather eye on possible futures rarely hurts.
> and I think you were right that the estimate of average object ID
> population size might have been conservative in some ways, assuming
> data never goes away, and becomes more valuable, and people store more
> things, and share more with each other.
Consumer grade off-the-shelf PCs now ship from supermarket shelves with
in excess of 100Gig of DASD. Odds are fair that by the time Chandler
v1.0 rolls out we'll be talking the better part of a TB as the base.
Humans tend to be pack rats, especially if the system doesn't actively
encourage or force them to clean up after themselves.
> (however, it's a stretch for me to estimate a user might have a huge
> number of _unique_ object IDs whose provenance was not sharing from
> other users in the same pool of users.)
If Chandler is successful then users will be tempted to keep everything
they touch within its domain, and if its view and report tools are good
enough they won't have an impetus to clean up after themselves and they
will be tempted to never delete anything, potentially making every
packet dribble and transient file yet another Chandler object. Add
network active objects (objects that are "alive" in the network sense
and are thus regularly updated or manipulated via basic network
activities) and there's the possibility for very rapid and long term
sustained object ID drains. It's the horror of a fully persistent setup
(especially if you support versioning), but it's also what people tend to
naively expect and think "should happen".
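Even so, the standard birthday-bound arithmetic suggests 128-bit random
IDs survive that horror comfortably. A sketch (the pack-rat rate below is
my own worst-case assumption, not a Chandler projection):

```python
# Birthday bound: with n IDs drawn uniformly from a space of 2^b values,
# the probability of any collision is approximately n^2 / 2^(b+1).

def collision_probability(n_ids, bits):
    """Approximate birthday-bound collision probability for n random IDs."""
    return (n_ids ** 2) / (2 ** (bits + 1))

# A pathological pack rat minting a million objects per second,
# nonstop, for thirty years -- roughly 9.5e14 IDs:
n = 10 ** 6 * 3600 * 24 * 365 * 30

print(f"{collision_probability(n, 128):.1e}")   # -> 1.3e-09
```

So the sustained-drain scenario argues against small or sequential ID
schemes, not against large random ones: at 128 bits even the pack rat's
lifetime collision odds stay around one in a billion.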
Probably not worth explicitly planning for, but a good candidate for a
J C Lawrence                ---------(*)  Satan, oscillate my metallic sonatas.
claw at kanga.nu                           He lived as a devil, eh?
http://www.kanga.nu/~claw/                Evil is a name of a foeman, as I live.