[Chandler-dev] empty value report (perf)
Andi Vajda
vajda at osafoundation.org
Thu Jan 18 17:49:53 PST 2007
A while ago I noticed that the Chandler 'Welcome Note' had 49 values out of
the box. That number seemed a little high to me so I looked into this issue
a little more. Below is what I found so far.
Using Katie's alpha4.ini file, which restores the collections she normally
uses while dogfooding Chandler, I end up with a repository containing 1816
ContentItem instances. This includes the office calendar.
>>> l=list(ContentItem.iterItems(view))
>>> len(l)
1816
These 1816 instances contain a total of 58856 values or references, that
is, 58856 named entries in their _values and _references dictionaries.
>>> sum(len(i._values) + len(i._references) for i in l)
58856
More precisely, 32274 literal values and
21852 references (bi-refs, ref collections, None, ...)
Focusing on literal values, how many false values, i.e., values that are None,
False, empty lists, empty dicts, etc., are there ?
>>> sum(sum(1 for v in i._values._dict.itervalues() if not v) for i in l)
17944
Hmm, that's a lot of false values. 56% !
How many of these are empty dicts or empty lists:
>>> sum(sum(1 for v in i._values._dict.itervalues()
...         if not v and isinstance(v, dict)) for i in l)
2848
>>> sum(sum(1 for v in i._values._dict.itervalues()
...         if not v and isinstance(v, list)) for i in l)
1043
1043
A lot of empty dicts it seems.
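For readers without a Chandler repository at hand, here is a minimal sketch of the same kind of tally. The `items` list of plain dicts is a made-up stand-in for each item's `_values._dict`, and `falsy_by_type` is a hypothetical helper; only the counting technique is meant to carry over. It is written for a current Python rather than the 2.x interpreter used in the transcripts.

```python
from collections import Counter

# Hypothetical stand-ins for the literal values of three items.
items = [
    {"headers": {}, "read": False, "title": "Welcome"},
    {"headers": {"X-Foo": "bar"}, "exdates": [], "read": True},
    {"headers": {}, "exdates": [], "body": None},
]


def falsy_by_type(items):
    """Count false values across all items, keyed by type name."""
    tally = Counter()
    for values in items:
        for v in values.values():
            if not v:
                tally[type(v).__name__] += 1
    return tally


total = sum(len(values) for values in items)
tally = falsy_by_type(items)
falsy = sum(tally.values())
print(tally, "%d of %d values are false" % (falsy, total))
```

Keying the tally by type name gives the dict, list, bool and None breakdown in a single pass instead of one pass per type.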
Digging further and getting help from a little count function:
def count(d, s):
    for n in s:
        if n in d:
            d[n] += 1
        else:
            d[n] = 1
Then, I was interested in seeing which attributes and how many occurrences
of them had empty dicts:
>>> d = {}
>>> for a in ((n for n,v in i._values._dict.iteritems()
...            if not v and isinstance(v, dict)) for i in l):
...     count(d,a)
>>> for n,c in d.iteritems():
...     print "%50s: %4d" %(n, c)
osaf.pim.calendar.EventStamp.icalendarParameters: 450
downloadedMessageUIDS: 2
manifest: 4
osaf.pim.mail.MailStamp.headers: 798
osaf.pim.calendar.EventStamp.icalendarProperties: 796
osaf.pim.mail.MailStamp.chandlerHeaders: 798
Indeed, a small number of attributes are set up with empty dicts, but these add
up. Would there be a way to not do that ? Using a defaultValue is not going to
work, as a defaultValue is a schema value that is shared by all attributes
needing it. Using a mutable value as a defaultValue is not good, since a
change made through one item would be seen by every other item sharing it.
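To illustrate why a shared mutable default is risky, here is a sketch in plain Python, not Chandler's schema API. `Stamp`, `LazyStamp` and `DEFAULT_HEADERS` are hypothetical names. The first class hands every instance the same dict; the second creates the dict only on first write, which is one possible way to avoid storing an empty dict per item.

```python
# Hypothetical illustration; this is not Chandler's schema API.
DEFAULT_HEADERS = {}  # a single dict object shared by every reader


class Stamp(object):
    """Returns the shared default when no local value is set."""

    def __init__(self):
        self._values = {}

    @property
    def headers(self):
        return self._values.get("headers", DEFAULT_HEADERS)


class LazyStamp(object):
    """Stores no dict at all until something is written."""

    def __init__(self):
        self._values = {}

    def get_header(self, name, default=None):
        # Reading never creates a local value.
        return self._values.get("headers", {}).get(name, default)

    def set_header(self, name, value):
        # The dict is created locally on first write only.
        self._values.setdefault("headers", {})[name] = value


a, b = Stamp(), Stamp()
a.headers["Subject"] = "hi"          # mutates the shared default
leaked = b.headers.get("Subject")    # b now sees a's header

c, d = LazyStamp(), LazyStamp()
c.set_header("Subject", "hi")        # only c gets a local dict
```

With the lazy variant, items that never set a header keep an empty `_values`, so nothing is stored for them.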
Similarly, for empty lists, we have:
messageQueue: 3
filterClasses: 8
exdates: 61
osaf.pim.mail.MailStamp.referencesMID: 798
rdates: 68
bymonthday: 57
invitees: 48
Looks like at least one candidate for some rethinking...
Looking at simpler values, such as True or False, easy to use with
defaultValue since they're immutable, it looks like we have lots of
attributes with a local False value:
isActive: 3
osaf.pim.calendar.EventStamp.anyTime: 451
recursive: 1
useSSL: 15
osaf.usercollections.UserCollection.canAdd: 1
read: 1815
test: 7
untilIsDate: 68
osaf.pim.mail.MailStamp.toMe: 798
private: 1816
useAuth: 2
osaf.usercollections.UserCollection.allowOverlay: 4
osaf.usercollections.UserCollection.colorizeIcon: 4
osaf.pim.calendar.EventStamp.isGenerated: 15
osaf.usercollections.UserCollection.renameable: 4
needsReply: 1816
osaf.pim.calendar.EventStamp.allDay: 456
osaf.pim.mail.MailStamp.fromMe: 798
hidden: 8
osaf.pim.mail.MailStamp.isOutbound: 798
Similarly, for True, we have:
osaf.usercollections.UserCollection.dontDisplayAsCalendar: 7
established: 8
osaf.pim.calendar.EventStamp.anyTime: 347
recursive: 8
osaf.usercollections.UserCollection.outOfTheBoxCollection: 4
test: 1
useAuth: 1
mine: 1816
osaf.pim.calendar.EventStamp.allDay: 341
leaveOnServer: 2
useSSL: 4
read: 1
osaf.pim.calendar.EventStamp.isGenerated: 262
osaf.usercollections.UserCollection.iconNameHasClassVariant: 1
active: 8
isActive: 7
How about making 'mine' be True by default ?
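A rough sketch of what such a default could save, again in plain Python with made-up data: values equal to a declared immutable default are not stored locally, and reads fall back to the default. The `DEFAULTS` table and both helper names are assumptions for illustration, not Chandler's API.

```python
# Hypothetical defaults; only immutable values are safe to share.
DEFAULTS = {"mine": True, "private": False, "needsReply": False}


def strip_defaults(values, defaults=DEFAULTS):
    """Return a copy of values without entries equal to their default."""
    return dict(
        (name, v) for name, v in values.items()
        if not (name in defaults
                and type(v) is type(defaults[name])
                and v == defaults[name])
    )


def get_value(values, name, defaults=DEFAULTS):
    """Read a local value, falling back to the shared default."""
    if name in values:
        return values[name]
    return defaults[name]


stored = {"mine": True, "private": False, "needsReply": True, "title": "x"}
slim = strip_defaults(stored)
print(len(stored), "->", len(slim), "local values")
```

The type check keeps a local `0` from being dropped just because it compares equal to a `False` default.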
Now, looking at the number of values and references per ContentItem
instance, it seems that they have at least 15 and at the most 54.
>>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
>>> m.sort()
>>> m[0]
(15, <UUID: cf01b286-a753-11db-b1d2-9e1578b66e66>)
>>> m[-1]
(54, <UUID: 05b7c14e-a754-11db-b1d3-9e1578b66e66>)
Breaking it up by number of items for a given number of values, I get:
>>> from itertools import groupby
>>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
>>> m.sort()
>>> [(n, len(list(g))) for n, g in groupby(m, lambda x: x[0])]
[(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
(22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
(29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
(47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
(54, 36)]
It looks like a large share of ContentItem instances have at least 46 values:
797 of the 1816, or about 44% !
Reducing these value counts can speed up many things such as item commit or
item load.
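The distribution can be tallied directly to see how the items and their entries split around the 46 mark. The sketch below copies the (entry count, item count) pairs from the groupby output; the variable names are illustrative.

```python
# (values + references per item, number of items), copied from the
# groupby output in the transcript.
dist = [
    (15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
    (22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
    (29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
    (47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
    (54, 36),
]

total_items = sum(count for _, count in dist)
heavy_items = sum(count for n, count in dist if n >= 46)

total_entries = sum(n * count for n, count in dist)
heavy_entries = sum(n * count for n, count in dist if n >= 46)

print("items with >= 46 entries: %d of %d (%.0f%%)"
      % (heavy_items, total_items, 100.0 * heavy_items / total_items))
print("entries held by those items: %d of %d (%.0f%%)"
      % (heavy_entries, total_entries, 100.0 * heavy_entries / total_entries))
```

On these figures, 797 of the 1816 items (about 44%) carry 46 or more entries, and those items account for 38769 of the 58856 entries (about 66%).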
Andi..