[General] Updated Hub usage metric
jared at wordzoo.com
Mon Nov 12 10:56:28 PST 2007
At the top of:
is a new metric: "Daily Hub visitors by Application". The rest of this
email describes what the metric actually measures, its limitations, and
how best to proceed towards getting an understanding of Chandler's
+ Hub usage in 5 buckets
This metric proposes to break down all incoming Hub traffic into 5 buckets:
- Chandler Desktop
- Web browser
- Mozilla Calendar(s) (Lightning, Sunbird)
- iCal 3.x
- Other (everything else)
+ Metric details and examples
How does this metric work? I count up all the IP + "HTTP User Agent"
pairs I see in the Chandler Server logs. This gives a list like:
184.108.40.206 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
rv:220.127.116.11) Gecko/20070725 Firefox/18.104.22.168
22.214.171.124 Chandler/0.7.1 (Windows; U; i386; pt_BR)
126.96.36.199 Chandler/0.7.0.1 (Windows; U; i386; pt_BR)
188.8.131.52 Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:184.108.40.206)
220.127.116.11 Chandler/0.7.2-rc1 (Windows; U; i386; en_CA)
18.104.22.168 Chandler/0.7.0.1 (Windows; U; i386; en_US)
22.214.171.124 DAVKit/2.0 (10.5; wrbt) iCal 3.0
126.96.36.199 Chandler/0.7.1 (Macintosh; U; i386; en_US)
188.8.131.52 DAVKit/2.0 (10.5; wrbt) iCal 3.0
184.108.40.206 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US;
rv:220.127.116.11) Gecko/20071025 Firefox/18.104.22.168
So then I go through this list and sort each user-agent into one of the
buckets shown on the graph. The "other" bucket holds clients we recognize
explicitly but that don't have their own bucket. This includes iCal 2,
Evolution, NetNewsWire, and a long tail of other apps.
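For illustration, the bucketing step might look roughly like the Python
sketch below. This is not the actual analysis code: the substring rules,
the names BUCKET_RULES, bucket_for and daily_counts, and the example
addresses (documentation-range IPs) are all assumptions.

```python
from collections import Counter

# Hypothetical substring rules, checked in order. The real rule list is
# not shown in this email, so these entries are assumptions.
BUCKET_RULES = [
    ("Chandler/", "Chandler Desktop"),
    ("iCal 3.", "iCal 3.x"),
    ("Lightning", "Mozilla Calendar"),
    ("Sunbird", "Mozilla Calendar"),
    ("Firefox", "Web browser"),
    ("MSIE", "Web browser"),
    ("Safari", "Web browser"),
    ("iCal 2.", "Other"),
    ("Evolution", "Other"),
    ("NetNewsWire", "Other"),
]


def bucket_for(user_agent):
    """Return the bucket name for a user-agent, or None if unrecognized."""
    for needle, bucket in BUCKET_RULES:
        if needle in user_agent:
            return bucket
    return None


def daily_counts(pairs):
    """Count unique (ip, user_agent) pairs per bucket for one day.

    Unrecognized user-agents (including robots) are skipped entirely.
    """
    counts = Counter()
    for ip, user_agent in set(pairs):
        bucket = bucket_for(user_agent)
        if bucket is not None:
            counts[bucket] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("192.0.2.1", "Chandler/0.7.1 (Windows; U; i386; pt_BR)"),
        ("192.0.2.1", "Chandler/0.7.1 (Windows; U; i386; pt_BR)"),
        ("192.0.2.2", "DAVKit/2.0 (10.5; wrbt) iCal 3.0"),
        ("192.0.2.3", "SomeBot/1.0"),
    ]
    print(dict(daily_counts(sample)))
```

Rule order matters in a scheme like this: the more specific calendar
clients are checked before the generic browser strings.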
+ Undercounting and double-counting
If we don't recognize a client, it's not counted. The same goes for
robots/web-spiders. This means we are undercounting total usage a little
bit, and new clients aren't counted until I add them to the
Since we include IP address in the pair, we will undercount multiple
people using the same client behind a firewall, etc.
We will double-count people using the same client at home and at work.
We will double-count people using both Chandler and a web browser from
the same machine (there are a fair number of these). We will
double-count people who upgraded their apps so that the version number
changed during the day.
We will undercount anyone using Chandler Server that's not on Chandler
Hub. We will undercount anyone using Chandler Desktop but not syncing
to the Hub.
If you use an app once in a day or 500 times, you get one "hit" in the
above metrics. The goal is to try to count "people"; IP+User-agent is
serving as a proxy for that. Due to our designs, there's no real way to
link people using Chandler Desktop to those following a ticketed URL to
the Hub, for instance.
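A tiny Python sketch of how these effects fall out of counting unique
IP + user-agent pairs. The function name count_hits and the sample
addresses are made up for illustration; this is not the production code.

```python
def count_hits(requests):
    """Number of unique (ip, user_agent) pairs in one day's requests."""
    return len(set(requests))


# 500 requests from the same client at one address count as one hit.
heavy_user = [("192.0.2.10", "Chandler/0.7.1 (Windows; U; i386; en_US)")] * 500

# One person who upgrades mid-day appears under two user-agent strings.
upgrader = [
    ("192.0.2.20", "Chandler/0.7.1 (Macintosh; U; i386; en_US)"),
    ("192.0.2.20", "Chandler/0.7.2-rc1 (Macintosh; U; i386; en_US)"),
]

# Two people behind one firewall with identical clients collapse into
# a single pair.
shared_office = [
    ("192.0.2.30", "DAVKit/2.0 (10.5; wrbt) iCal 3.0"),
    ("192.0.2.30", "DAVKit/2.0 (10.5; wrbt) iCal 3.0"),
]

if __name__ == "__main__":
    print(count_hits(heavy_user))     # 1
    print(count_hits(upgrader))       # 2
    print(count_hits(shared_office))  # 1
```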
How does this metric fall short? In lots of ways. What we'd really
like to understand are classic marketing dimensions of recency,
frequency, and depth of interaction. In particular, we should establish
a better way to understand our "regular" users. We have no functional
definition of "regular", and no good way to measure it even if we did.
+ Writes vs reads as proxy for "regular user"?
One way to define a "regular" user might be "writes vs reads".
Intuitively, a regular user might make changes to an event/todo, while a
passive user might just view occasionally or just sync in the
background. So if we had a way to separate "people who have made a
change this week" from "people who made no changes", then we might
have the beginnings of a "regular user" vs "casual user" metric.
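One crude way to approximate this from logs alone, sketched in Python
under loud assumptions: that the HTTP method is available per request,
and that methods like PUT/POST/DELETE roughly indicate a change. The
names WRITE_METHODS and split_regular_casual are hypothetical, and the
method is only a proxy, since it doesn't say what actually changed.

```python
# Assumption: these HTTP/WebDAV methods indicate a write. A sync client
# may use some of them without the user changing anything.
WRITE_METHODS = {"PUT", "POST", "DELETE", "PROPPATCH", "MKCOL", "MOVE", "COPY"}


def split_regular_casual(requests):
    """Split one week's clients into (regular, casual) sets.

    requests: iterable of (ip, user_agent, http_method) tuples.
    A client is "regular" if it issued at least one write method.
    """
    seen = set()
    writers = set()
    for ip, user_agent, method in requests:
        key = (ip, user_agent)
        seen.add(key)
        if method.upper() in WRITE_METHODS:
            writers.add(key)
    return writers, seen - writers


if __name__ == "__main__":
    week = [
        ("192.0.2.1", "Chandler/0.7.1 (Windows; U; i386; en_US)", "GET"),
        ("192.0.2.1", "Chandler/0.7.1 (Windows; U; i386; en_US)", "PUT"),
        ("192.0.2.2", "DAVKit/2.0 (10.5; wrbt) iCal 3.0", "GET"),
    ]
    regular, casual = split_regular_casual(week)
    print(len(regular), len(casual))
```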
I'd be curious to hear reactions to the "regular == makes changes" idea.
It might be an interesting metric to watch, but I worry that it might
exclude lots of great users getting substantial value from the Chandler
Project but who don't make lots of changes to their lists.
+ Next steps?
I suspect I'm getting closer to having wrung out the info available to
me in current Chandler Server log files. It's not too hard to think of
great next-step metrics to measure, but the ideas need to be finessed
into what's actually feasible, and translated into an implementable
feature that's incrementally better than what we have now. In
particular, we'll probably proceed to work on a Chandler Server feature
to note what actions are actually occurring in the MC and Atom
protocols (number of items changed, for instance).
+ Big dips in usage are a measurement artifact
A note on days like Nov 3rd and Oct 31st where the traffic seems to
plummet: this is an artifact of the measurement, not real. Because of a
failure to plan properly on my part, my metric analysis code only
works properly with 1 log file per day. On days when I update the Hub,
multiple files are generated in production; only 1 gets counted. It is
obviously possible to fix this issue, but it will require a significant
amount of refactoring. It's on the plan, but a relnote for now. My