[Chandler-dev] empty value report (perf)

Andi Vajda vajda at osafoundation.org
Thu Jan 18 17:49:53 PST 2007


A while ago I noticed that the Chandler 'Welcome Note' had 49 values out of
the box. That number seemed a little high to me so I looked into this issue
a little more. Below is what I found so far.

Using Katie's alpha4.ini file, which restores the collections she normally
uses while dogfooding Chandler, I end up with a repository containing 1816
ContentItem instances. This includes the office calendar.

    >>> l=list(ContentItem.iterItems(view))
    >>> len(l)
    1816

These 1816 instances contain a total of 58856 values or references, that 
is, 58856 named entries in their _values and _references dictionaries.

    >>> sum(len(i._values) + len(i._references) for i in l)
    58856

More precisely, 32274 literal values and
                 21852 references (bi-refs, ref collections, None, ...)

Focusing on literal values, how many false values, i.e., values that are None,
False, empty lists, empty dicts, etc., are there:

    >>> sum(sum(1 for v in i._values._dict.itervalues() if not v) for i in l)
    17944

Hmm, that's a lot of false values. 56% !
How many of these are empty dicts or empty lists:

    >>> sum(sum(1 for v in i._values._dict.itervalues()
                if not v and isinstance(v, dict)) for i in l)
    2848

    >>> sum(sum(1 for v in i._values._dict.itervalues()
                if not v and isinstance(v, list)) for i in l)
    1043
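
The counts above can also be gathered in a single pass by tallying false
values by type. This is only a sketch: the plain dicts below are hypothetical
stand-ins for each item's literal values and nothing here touches a Chandler
repository. It is written so that it also runs on current Python versions.

```python
# Hypothetical stand-in data: each dict plays the role of one item's
# literal values (what the session above reads from i._values._dict).
items = [
    {"headers": {}, "read": False, "title": "Welcome"},
    {"exdates": [], "rdates": [], "private": False, "body": None},
    {"headers": {}, "mine": True, "displayName": ""},
]

def falsy_by_type(items):
    """Return {type name: count} for every value that tests false."""
    tally = {}
    for values in items:
        for v in values.values():
            if not v:
                name = type(v).__name__
                tally[name] = tally.get(name, 0) + 1
    return tally

total = sum(len(values) for values in items)
tally = falsy_by_type(items)
falsy = sum(tally.values())
print(tally)
print("%d of %d values are false (%.0f%%)" % (falsy, total, 100.0 * falsy / total))
```

Grouping by type name in one pass avoids writing a separate isinstance query
for each kind of empty value.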

A lot of empty dicts it seems.
Digging further and getting help from a little count function:

    def count(d, s):
        for n in s:
            if n in d:
                d[n] += 1
            else:
                d[n] = 1

Then I looked at which attributes hold empty dicts, and how many occurrences
of each there are:

    >>> d = {}
    >>> for a in ((n for n,v in i._values._dict.iteritems()
                   if not v and isinstance(v, dict)) for i in l):
            count(d, a)
    >>> for n,c in d.iteritems():
            print "%50s: %4d" %(n, c)

   osaf.pim.calendar.EventStamp.icalendarParameters:  450
                              downloadedMessageUIDS:    2
                                           manifest:    4
                    osaf.pim.mail.MailStamp.headers:  798
   osaf.pim.calendar.EventStamp.icalendarProperties:  796
            osaf.pim.mail.MailStamp.chandlerHeaders:  798
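
For anyone wanting to try the same per-attribute tally outside of Chandler,
here is a self-contained sketch. The items are hypothetical plain dicts
(attribute name to value), empty_attrs is a made-up helper name, and
collections.Counter (available in later Python versions) stands in for the
count function.

```python
from collections import Counter

# Hypothetical stand-in items: attribute name -> literal value.
items = [
    {"headers": {}, "chandlerHeaders": {}, "subject": "hi"},
    {"headers": {}, "chandlerHeaders": {"X-Test": "1"}},
    {"icalendarParameters": {}, "manifest": {}},
]

def empty_attrs(items, kind):
    """Count, per attribute name, how many items hold an empty value of type kind."""
    counts = Counter()
    for values in items:
        counts.update(n for n, v in values.items()
                      if not v and isinstance(v, kind))
    return counts

d = empty_attrs(items, dict)
for n, c in sorted(d.items()):
    print("%50s: %4d" % (n, c))
```

Passing list instead of dict as the second argument gives the corresponding
tally for empty lists.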

Indeed, a small number of attributes are set up with empty dicts, but these add
up. Would there be a way to not do that ? Using a defaultValue is not going to
work, as a defaultValue is a schema value that is shared by all attributes
needing it. Using a mutable value as a defaultValue is not good.
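
The hazard with a mutable default can be shown with a small sketch. The
Attribute and Item classes below are hypothetical stand-ins, not the Chandler
schema API; they only assume that an item without a local value falls back on
one default object held by the schema.

```python
class Attribute(object):
    """Hypothetical stand-in for a schema attribute with one shared default."""
    def __init__(self, default):
        self.default = default

class Item(object):
    """Hypothetical item: local values first, schema default otherwise."""
    def __init__(self, schema):
        self.schema = schema
        self.values = {}

    def get(self, name):
        if name in self.values:
            return self.values[name]
        return self.schema[name].default

schema = {"headers": Attribute({})}    # one dict object for every item
a, b = Item(schema), Item(schema)

a.get("headers")["Subject"] = "hello"  # mutates the shared default in place
print(b.get("headers"))                # b sees the change made through a
```

Neither item ever stored a local value, yet both now report the same header,
which is why an empty dict or list cannot simply be moved into a default.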

Similarly, for empty lists, we have:

                                       messageQueue:    3
                                      filterClasses:    8
                                            exdates:   61
              osaf.pim.mail.MailStamp.referencesMID:  798
                                             rdates:   68
                                         bymonthday:   57
                                           invitees:   48

Looks like at least one candidate for some rethinking...

Looking at simpler values, such as True or False, easy to use with
defaultValue since they're immutable, it looks like we have lots of
attributes with a local False value:

                                           isActive:    3
               osaf.pim.calendar.EventStamp.anyTime:  451
                                          recursive:    1
                                             useSSL:   15
         osaf.usercollections.UserCollection.canAdd:    1
                                               read: 1815
                                               test:    7
                                        untilIsDate:   68
                       osaf.pim.mail.MailStamp.toMe:  798
                                            private: 1816
                                            useAuth:    2
   osaf.usercollections.UserCollection.allowOverlay:    4
   osaf.usercollections.UserCollection.colorizeIcon:    4
           osaf.pim.calendar.EventStamp.isGenerated:   15
     osaf.usercollections.UserCollection.renameable:    4
                                         needsReply: 1816
                osaf.pim.calendar.EventStamp.allDay:  456
                     osaf.pim.mail.MailStamp.fromMe:  798
                                             hidden:    8
                 osaf.pim.mail.MailStamp.isOutbound:  798


Similarly, for True, we have:

    osaf.usercollections.UserCollection.dontDisplayAsCalendar:    7
                                                  established:    8
                         osaf.pim.calendar.EventStamp.anyTime:  347
                                                    recursive:    8
    osaf.usercollections.UserCollection.outOfTheBoxCollection:    4
                                                         test:    1
                                                      useAuth:    1
                                                         mine: 1816
                          osaf.pim.calendar.EventStamp.allDay:  341
                                                leaveOnServer:    2
                                                       useSSL:    4
                                                         read:    1
                     osaf.pim.calendar.EventStamp.isGenerated:  262
  osaf.usercollections.UserCollection.iconNameHasClassVariant:    1
                                                       active:    8
                                                     isActive:    7

How about making 'mine' be True by default ?
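
To illustrate what such defaults could save, here is a minimal sketch in which
a value is stored locally only when it differs from an immutable default. The
Item class and its defaults table are hypothetical and say nothing about how
Chandler's repository actually stores or looks up a defaultValue.

```python
class Item(object):
    """Hypothetical item that stores a value only when it differs from the default."""
    defaults = {"mine": True, "private": False, "needsReply": False}

    def __init__(self):
        self.values = {}

    def set(self, name, value):
        if name in self.defaults and value == self.defaults[name]:
            self.values.pop(name, None)   # rely on the shared default instead
        else:
            self.values[name] = value

    def get(self, name):
        return self.values.get(name, self.defaults.get(name))

items = [Item() for _ in range(1816)]
for item in items:
    item.set("mine", True)
    item.set("private", False)
    item.set("needsReply", False)
items[0].set("private", True)             # the one value that is not the default

stored = sum(len(item.values) for item in items)
print(stored)
```

With these three attributes at their defaults on 1816 items, only the single
differing value is stored locally, while every item still reads back the same
values as before.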

Now, looking at the number of values and references per ContentItem
instance, it seems that they have at least 15 and at the most 54.

    >>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
    >>> m.sort()
    >>> m[0]
    (15, <UUID: cf01b286-a753-11db-b1d2-9e1578b66e66>)
    >>> m[-1]
    (54, <UUID: 05b7c14e-a754-11db-b1d3-9e1578b66e66>)

Breaking it up by number of items for a given number of values, I get:

    >>> from itertools import groupby
    >>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
    >>> m.sort()
    >>> [(n, len(list(g))) for n, g in groupby(m, lambda x: x[0])]
    [(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
     (22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
     (29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
     (47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
     (54, 36)]

It looks like a large share of ContentItem instances (797 of 1816, about 44%)
have at least 46 values !
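
The share of items at or above a given value count can be read off the
histogram directly. The sketch below copies the (value count, number of items)
pairs from the groupby output and needs no repository; share_at_least is a
hypothetical helper name.

```python
# (value count, number of items) pairs copied from the groupby output above.
hist = [(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
        (22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
        (29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
        (47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
        (54, 36)]

def share_at_least(hist, threshold):
    """Return (items with >= threshold values, total items)."""
    total = sum(c for _, c in hist)
    above = sum(c for n, c in hist if n >= threshold)
    return above, total

above, total = share_at_least(hist, 46)
print("%d of %d items (%.0f%%) have at least 46 values"
      % (above, total, 100.0 * above / total))

# Cross-check: total number of entries implied by the histogram.
entries = sum(n * c for n, c in hist)
print(entries)
```

The histogram accounts for all 1816 items and all 58856 entries counted
earlier, and 797 of those items have 46 values or more.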

Reducing these value counts can speed up many things such as item commit or
item load.

Andi..

