[Dev] uuid could/should be simplified

Andi Vajda vajda at osafoundation.org
Mon May 17 23:05:48 PDT 2004


Some responses to Roger's points:

> it appears that C extensions to Python for Windows require one to
> purchase Microsoft Visual C++, and that comes in the .NET package which
> runs $700 or so. Not what I wanted to hear.

Well, not exactly. Python on Windows requires you to compile extensions for
it using the same compiler python itself was compiled with.
Therefore, on Windows, you have several options:

   - compile python with a free compiler such as cygwin's gcc or mingw.
   - compile python with the $100.00 version of Microsoft Visual C,
     same as the $700.00 version but without full optimization
   - compile python with the $0.00 version of Microsoft Visual C,
     same as the $700.00 version but command line only
     http://msdn.microsoft.com/visualc/vctoolkit2003/

> The Leach / Salz spec calls for a 16 byte (128 bit) uuid, and the specific
> Chandler format is divided into three parts (!): eight bytes of timestamp +
> version, two bytes of "clock sequence" and six bytes of ethernet card
> "mac" address or random bits if the computer has no ethernet card. Each
> ethernet card has a unique address built into it, and the combination of
> time + unique mac address should make for a truly globally unique ID. Turns
> out to be not so simple.

There is nothing specific about the Chandler format. I'm trying to
implement what is specified in '3.1.2 UUID layout' of the spec at
http://www.ics.uci.edu/pub/ietf/webdav/uuid-guid/draft-leach-uuids-guids-01.txt.
The timestamp field is actually 7 1/2 bytes (see below).

> "This is a security concern, due to the potential to trace hardware by
> following the IEEE 802 address. The UUID/GUID I-D by Paul Leach contains an
> alternate mechanism for generating the "node" field using pseudo-random
> numbers which doesn't have this security concern."

Absolutely. I have it on my todo list to change this at some point. But here
are some lame excuses about how it got to where it is now:
  Using the MAC address purports to enhance uniqueness. There are other ways
  to address this, spelled out by the spec. If security is an issue
  however, generating cryptographically random numbers isn't exactly
  fast. Last time I implemented a UUID facility, it was in Java and getting
  a MAC address from Java was not an option, so I resorted to creating a
  cryptographically random node id. On some platforms it took several
  seconds to generate. From python, however, calling C is not taboo so I
  thought I'd give the MAC address approach a try.

Nonetheless, you're absolutely correct, using the raw MAC address is not
very secure and needs to be remedied at some point.
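
For illustration, here is a minimal Python sketch of the random node
alternative. It assumes the OS random source (os.urandom) is acceptable, the
function name is hypothetical, and it is not what uuid.c does today. As I
read the spec, a randomly generated node id should have the multicast bit
set so that it can never collide with a real IEEE 802 address.

```python
import os


def random_node_id() -> int:
    """Return a 48-bit random node id with the multicast bit set."""
    # Six random bytes from the OS, interpreted as a big-endian integer.
    node = int.from_bytes(os.urandom(6), "big")
    # The multicast bit is the least significant bit of the first octet,
    # which is bit 40 of the 48-bit value.
    return node | (1 << 40)
```

Since the node id only needs to be generated once per process, the cost of
the random source is paid at startup rather than on every UUID.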

> So that finished off the mac address concept for me -- but note that
> Chandler is still using it! The other python uuid generator at [link] was
> nice and simple -- use the current IP address if the computer is on the
> internet, otherwise random bits. At first blush it seems the time plus the
> internet address should be unique, but it finally dawned on me that often
> IP addresses are dynamically allocated, so when person A shuts down an
> internet

Uh, IP addresses are far from unique universally and not any more
secure. They're only unique on the subnet they're allocated for. This has
nothing to do with dynamically allocated IPs either. IP addresses are
definitely out.

> Now about the timestamp? Here's another surprise. The Chandler
> implementation overlays the high order bits of the date/time with a
> hardcoded uuidversion number. So the time and date cannot be recovered from
> the Chandler uuid, and the resulting time portion of the uuid is cyclical
> over I dunno what, maybe some years, instead of occurring just once until
> the timestamp runs out in the year 3000 and something.

Chandler does what the spec spells out at '3.2.1 Basic algorithm' and '3.2.6
UUID Generation details', namely the timestamp is defined as a 60 bit value
measured in 100 nanosecond intervals from '00:00:00.00, 15 October 1582'.
The 4 most significant bits of the time_hi_and_version field are set as
spelled out in 3.2.6: set the 4 most significant bits (bits numbered 12 to 15
inclusive) of the time_hi_and_version field to the 4-bit version number
corresponding to the UUID version being created, as shown in the table in
section 3.1.3.
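
The field layout described above can be sketched in Python as follows. The
function names are made up for this example and this is not the uuid.c code;
it only shows that the version number occupies the 4 bits above a 60 bit
timestamp and does not overwrite any of it.

```python
def pack_time_fields(timestamp: int, version: int = 1):
    """Split a 60-bit timestamp into (time_low, time_mid, time_hi_and_version)."""
    if not 0 <= timestamp < (1 << 60):
        raise ValueError("timestamp must fit in 60 bits")
    time_low = timestamp & 0xFFFFFFFF        # low 32 bits
    time_mid = (timestamp >> 32) & 0xFFFF    # next 16 bits
    time_hi = (timestamp >> 48) & 0x0FFF     # top 12 bits
    # Bits 12 to 15 of time_hi_and_version carry the version number.
    return time_low, time_mid, (version << 12) | time_hi


def unpack_timestamp(time_low: int, time_mid: int, time_hi_and_version: int) -> int:
    """Recover the 60-bit timestamp by masking off the version bits."""
    return ((time_hi_and_version & 0x0FFF) << 48) | (time_mid << 32) | time_low
```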

There is nothing preventing these 60 timestamp bits from being recovered.
By the way, this is also a security hole. Not only can one find out on
which computer the UUID was generated, one can also find out very precisely
when.
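
To make the leak concrete, here is a small sketch that reads both pieces of
information back out of a time-based UUID. It assumes the uuid module from
the current Python standard library rather than Chandler's own code, and the
helper name is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID timestamp epoch, taken here as UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def when_and_where(u: uuid.UUID):
    """Return (creation time, node id) encoded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based UUID")
    # u.time is the 60-bit count of 100 ns intervals; 10 of them make 1 us.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return created, u.node
```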

Note that 60 bits of 100 nanosecond intervals starting on 10/15/1582 should,
if I'm not mistaken, run out at around the year 5238.
(2^60 / (10^7 * 86400 * 365)) + 1582 ~= 5238
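
The same back-of-the-envelope estimate in Python, using 365 day years as
above (so leap days are ignored and the result is only approximate):

```python
TICKS_PER_SECOND = 10 ** 7          # 100 ns intervals per second
SECONDS_PER_YEAR = 86400 * 365      # ignores leap days

# Number of years a 60-bit counter of 100 ns ticks can cover.
years = 2 ** 60 / (TICKS_PER_SECOND * SECONDS_PER_YEAR)
rollover_year = 1582 + years
```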

> I guess this is just a bug that could be fixed, but it means as things
> are, this part of the Chandler format is also not usable.

So, what's the bug here? Am I missing something that needs to be
addressed before the year 5238?

Where the Chandler implementation diverges from the spec is in how clock_seq
is set. The Chandler UUID implementation does not keep track of clock_seq
values between process restarts. Instead, it uses the OS's advertised
quality random number generator API to initialize clock_seq upon process
initialization, and increments clock_seq whenever two successive UUID
generations in the same process see identical timestamps, something that
depends on the OS's clock resolution. Setting the clock back without
restarting the Chandler process is obviously a weak point of this
implementation. Generating more than 65535 UUIDs within one OS clock
cycle is also a weakness.
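
A rough Python sketch of that clock_seq handling, not the actual uuid.c
code. The class name and the injectable clock parameter are hypothetical,
and masking clock_seq to the 14 bits of the spec's layout is an assumption
that may not match what uuid.c does.

```python
import os
import time

# 100 ns intervals between 15 October 1582 and 1 January 1970.
UNIX_EPOCH_OFFSET = 0x01B21DD213814000


class ClockSeqGenerator:
    """Yield (timestamp, clock_seq) pairs for one process lifetime."""

    def __init__(self, clock=None):
        # The clock can be replaced, e.g. for testing; defaults to the OS clock.
        self._clock = clock or self._now
        # clock_seq is not persisted: it is seeded from the OS RNG at startup.
        self.clock_seq = int.from_bytes(os.urandom(2), "big") & 0x3FFF
        self._last = None

    @staticmethod
    def _now() -> int:
        return time.time_ns() // 100 + UNIX_EPOCH_OFFSET

    def next(self):
        ts = self._clock()
        # Bump clock_seq only when the clock did not advance since last call.
        if ts == self._last:
            self.clock_seq = (self.clock_seq + 1) & 0x3FFF
        self._last = ts
        return ts, self.clock_seq
```

Note that this sketch, like the behaviour described above, does nothing when
the clock goes backwards, and clock_seq wraps around if too many UUIDs are
requested within one clock tick.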

> The Chandler uuid.c module can currently be found in the full source
> download at [link] -- look for chandler/Chandler/repository/util/ext/uuid.c

That code was recently moved; it's now in internal/UUIDext/uuid.c


The current implementation of Chandler's UUID is indeed not very secure
when privacy is taken into account - something that needs to be remedied.
At this point - we're embarking on item sharing - I view it as a debugging
feature to be able to trace UUIDs. For the final product, I'd like the UUID
generation to be more respectful of privacy.
How can this be achieved without too much of a performance hit ? Last time I
looked into cryptographically secure random number generators, they weren't
exactly fast. Could a hashing scheme be overlaid with the current MAC and
timestamp based approach ?
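
One possible shape for such a scheme, purely as a sketch under assumptions:
hash the MAC address together with a secret per-installation salt once at
startup and use the result as the node field. The function name, the salt
and the choice of SHA-256 are all hypothetical, and I have not measured
whether this is adequate for privacy. It does not hide the timestamp.

```python
import hashlib


def hashed_node_id(mac: bytes, salt: bytes) -> int:
    """Derive a 48-bit node id from a MAC address and a secret salt."""
    # Without the secret salt, the 48-bit MAC space could be searched
    # by anyone trying to reverse the hash.
    digest = hashlib.sha256(salt + mac).digest()
    node = int.from_bytes(digest[:6], "big")
    # Set the multicast bit so the value cannot be mistaken for a real
    # IEEE 802 address.
    return node | (1 << 40)
```

Because the hash is computed once per process rather than once per UUID,
the per-UUID cost would stay the same as with the raw MAC address.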

Andi..


On Mon, 17 May 2004, Roger Eaton wrote:

> For my experience working with the uuid.c function, see comments at
> http://blog.voiceofhumanity.net/newslog2.php/_v252/__show_article/_a000252-000040.htm
>
> For a number of reasons (simplicity, security, cleanup) the uuid might
> better be 128 random
> bits instead of using the complex time/clockseq/node format that it
> does.   Details are listed
> in the article above.
>
> Hoping to persuade,
>
> -- Roger Eaton
> rogereaton at brandx.net
> http://blog.voiceofhumanity.net
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Open Source Applications Foundation "Dev" mailing list
> http://lists.osafoundation.org/mailman/listinfo/dev
>



