[General] Updated Hub usage metric - PPD perspective

Mimi Yin mimi at osafoundation.org
Wed Nov 28 20:21:56 PST 2007


I realized my list was long and a little bit all over the place. I've  
tried to distill the concrete next actions from your reply and have  
created an item in the shared Strategy collection to keep track of  
them called "PPD Metrics page".

On Nov 26, 2007, at 3:48 PM, Jared Rhine wrote:

> One next step I'll take is to cut-and-paste your list as-is into a  
> new PPD dashboard page.  There, as a prototype, I'll go ahead and  
> point to the metric streams we do have, and prototype the ones that  
> are missing but feasible.  When it's a little more fleshed out, we  
> can refactor the various dashboards and put the logical stuff  
> together.

That sounds like a good plan o' action.

> For the overall issue of metrics, there are still some looming  
> issues such as "what counts as a person" topic in my last email,  
> some missing bits until we add some yet-to-be-specified logging  
> features, and developing a shared sense of the very short list of  
> metrics we should focus on most.
>
> Next, going point-by-point on your list:
>
>> Better visibility into how Desktop users are using the service.
>>
>> Cumulatively, over time...
>>
>> # of unique user accounts
>> # of collections
>> # of items
>> # of unique publishers
>> # of unique subscribers
>> # of subscriptions
>
> Many of these are available, in aggregate by counting the number of  
> rows in various database tables.  Items, events, subscriptions, and  
> tickets are here:
> http://dashboard.osafoundation.org/dashboard/hub

Does each of the bars in these graphs represent a total? As in, the  
total # of items/events/subscriptions and tickets on the server?

> Let's spec out "unique publishers" and "unique subscribers" more;  
> is that "total number of people who have published a collection"?

+ unique publishers, as in unique Hub accounts?
+ unique subscribers is trickier as it involves anonymous  
subscriptions from the Desktop. So # of Desktop clients with  
subscriptions?

> Mildly tricky given the autocreation of the Hub OOTB.  Maybe we  
> want "total number of accounts which have more than 1 collection"?   
> Dunno quite how to implement right now.

I think there are actually 3 metrics here. There's:
1. # of collections published to the Hub from other clients  
(Chandler, iCal 3, etc); and then there's
2. # of collections users create on the Hub via the Hub UI; then there's
3. # of collections that have items in them
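
A minimal sketch of the "accounts which have more than 1 collection" idea quoted above, in Python. It assumes a hypothetical export of (account, collection) rows from the server database and assumes that exactly one collection per account is autocreated; neither assumption has been checked against the real schema.

```python
from collections import Counter

def accounts_with_extra_collections(rows, auto_created=1):
    """Count accounts owning more collections than the autocreated ones.

    rows: iterable of (account_id, collection_id) pairs (hypothetical export).
    auto_created: assumed number of collections created out of the box.
    """
    per_account = Counter(account for account, _ in rows)
    return sum(1 for n in per_account.values() if n > auto_created)
```

This only approximates "unique publishers": it cannot tell which client created a collection, so metrics 1 and 2 above would still need the origin of each collection to be recorded somewhere.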

> "Unique subscribers" is "number of accounts which have any  
> subscriptions created"?

# of accounts on Hub + anonymous subscribers from the Desktop + iCal  
subscribers, etc...

>
> I don't think "unique" adds anything to the "# of user accounts",  
> right?  I'll add one somewhere, though this is a bit outside the  
> "Desktop visibility" sphere.  (I suppose most of the above are too).

Hrm, well I guess it'd be nice to figure out if a single person has  
multiple accounts...

>
>> # of syncs per day per user
>
> Most "per X per Y" requests are going to be either tricky or in  
> need of some simplification.  You don't really want, I assume, a  
> list of 2000+ rows, with the number of syncs each user has done,  
> produced each day.  Plus there will be a privacy issue/discussion  
> each time we say "per user".  So what sort of simplification is  
> helpful?  I'm not sure what you'd do with an average syncs per day,  
> anyway; is your thought to get the classic insight of about how  
> long users leave Desktop running in a day?

Yes, that's the goal. Well, I think the idea is just to get a visual  
sense of the spread of the # of syncs across all our users, not  
necessarily to be able to literally see the # of syncs for each  
user...and then to get a sense of how that spread changes over time.  
But I agree, we can simplify by splitting this into 2 graphs:

1. Spread of syncs across user population on any given day.
2. Average # of syncs over time.

I'm essentially trying to get at 'quality of use' as opposed to  
'quantity of users'.
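
A rough sketch of how the two graphs could be derived, in Python. It assumes the server logs can be reduced to one (user_id, day) pair per sync; that input format and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def syncs_per_user_by_day(events):
    """events: iterable of (user_id, day) pairs, one per sync (assumed format).

    Returns {day: Counter({user_id: sync_count})}.
    """
    per_day = defaultdict(Counter)
    for user_id, day in events:
        per_day[day][user_id] += 1
    return per_day

def spread_for_day(per_day, day):
    """Graph 1: maps a sync count to the number of users with that count."""
    return Counter(per_day[day].values())

def average_by_day(per_day):
    """Graph 2: mean syncs per active user for each day, in day order."""
    return {day: sum(c.values()) / len(c) for day, c in sorted(per_day.items())}
```

Neither output contains a per-user row, only counts of users per bucket and a daily mean, which may sidestep part of the "per user" privacy discussion. Users with zero syncs on a day are not counted in the average.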

>
>> Spread of # of collections per user account
>> Spread of # of items per collection
>
> Histograms?  This will be good info, though I'll first need to  
> write some additional framework stuff to collect and present  
> (graph) histogram-style reports.
>
>> # of collections per item
>
> Hmmm, a nice inverse.  Ok.  Seems like maybe a low-hanging fruit if  
> I do a straight average of all items (not broken down by item).

Kewl.
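
A small sketch of both reports in Python, assuming a hypothetical list of distinct (collection_id, item_id) membership pairs pulled from the database. The bucket size is arbitrary.

```python
from collections import Counter

def items_per_collection_histogram(memberships, bucket_size=10):
    """Map each bucket's lower bound to the number of collections in it.

    memberships: iterable of distinct (collection_id, item_id) pairs.
    """
    per_collection = Counter(c for c, _ in memberships)
    hist = Counter((n // bucket_size) * bucket_size
                   for n in per_collection.values())
    return dict(sorted(hist.items()))

def average_collections_per_item(memberships):
    """Straight average of how many collections each item appears in."""
    per_item = Counter(i for _, i in memberships)
    return sum(per_item.values()) / len(per_item) if per_item else 0.0
```

Empty collections never appear in the membership pairs, so they would have to be counted separately to get a true zero bucket.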

>>
>> For both: per collection and per user...
>> # of new items per day
>> # of items edited per day
>> # of edits per item per day
>> # of editors per item per day
>> # of editors per collection per day
>
> All these will require additional server infrastructure and  
> logging, I'm pretty sure, to be able to start a meaningful  
> analysis.  Also, the standard "per X" questions apply; you don't  
> really want a table with every collection and another table with  
> every user, right?

So if we take # of edits per item over time - The point of that would  
be to see how many items users touch on a day-to-day basis. And then  
to take that and see how it spreads out across collections and/or users:

+ About 30% of items are edited on any given day
+ 80% of those edited items are spread across 30% of the collections
+ 80% of the collections have 15+ items edited with an average of 3  
or more edits per day on each of those items
+ 30% of the users make 80% of the edits
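
Statements of this shape ("X% of the users make 80% of the edits") can be computed from plain per-user or per-collection edit counts. A minimal sketch in Python, assuming those counts are already available as a list; the function name is hypothetical.

```python
def share_needed(counts, target=0.8):
    """Smallest fraction of contributors whose combined count reaches
    `target` of the total, taking the largest contributors first.

    counts: list of edit counts, one per user (or per collection).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    ordered = sorted(counts, reverse=True)
    running = 0
    for n, c in enumerate(ordered, start=1):
        running += c
        if running >= target * total:
            return n / len(ordered)
    return 1.0
```

For example, `share_needed([60, 25, 5, 4, 3, 1, 1, 1, 0, 0])` returns 0.2, meaning 20% of those users account for at least 80% of the edits. Tracking that one number over time gives the "spread" without a table of every user.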

> The "# of editors" questions could use some additional dev  
> exploration; offhand, I'd have to think about what data we'd need  
> to capture to produce that.
>
>> # of read/write versus read-only subscriptions
>
> Hurm, ok.  That sounds like a chunk of scripting to look up each  
> kind, given just the raw ticket string in the logs.
>
>> # of subscribers per collection
>> # collections per user
>
> It seems like lots of the above is useful info for Hub users too.   
> Do you mean to collect the above specifically for Desktop users,  
> split off?  I'm guessing it might wind up being difficult to split  
> the two categories even; for something like # of editors, we might  
> wind up with an aggregate of both Desktop and Hub users making  
> edits on an item.

I think it's important to distinguish between Desktop and Hub users.  
I'd like to work towards understanding if there are any qualitative  
differences between the way Desktop and Hub-only users use Chandler.

>
>> Better visibility into the users who are accessing the Hub UI  
>> directly
>>
>> # of times per week Hub UI users visit Hub UI
>> Average session time per visit
>> # and %age of 1-time Hub UI users
>
> These hinge on the complicated "what is a person" issues raised in  
> my recent email.  It's currently tricky for me to separate tickets  
> from accounts for Hub use, although the aggregates are reasonably  
> accurate I think.  Offhand, these are awesome questions, but they  
> are daunting to approach from an implementation perspective.  Maybe  
> I can find a reasonable place to start with an implementation if I  
> think about it for a while.

Yes, this is by no means a request for immediate solutions. Just a  
brain dump on my part.

>
> Session time is just the utter bane of log analysis people since  
> the beginning of web servers.  It's inherently a painful guess with  
> some tradeoffs.  Some additional backend work will be needed to put  
> a session id where needed or to design a heuristic for creating  
> one.  I'll see what happens if I use a regular stats package.   
> Maybe it can say something useful about average session time even  
> if it does weird things with the URL analysis itself.

Ok.
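
For reference, one common heuristic for creating a session where no session id exists is an inactivity timeout. A sketch in Python, assuming each log hit can be reduced to a (visitor_key, timestamp) pair; the 30-minute gap is an assumed convention, not something measured on the Hub.

```python
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff, tune as needed

def session_lengths(hits, gap=SESSION_GAP):
    """Split each visitor's hits into sessions wherever two consecutive
    hits are more than `gap` apart, and return every session's duration.

    hits: iterable of (visitor_key, datetime) pairs (assumed format).
    """
    by_visitor = {}
    for key, ts in hits:
        by_visitor.setdefault(key, []).append(ts)
    durations = []
    for stamps in by_visitor.values():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > gap:
                durations.append(prev - start)  # close the previous session
                start = ts
            prev = ts
        durations.append(prev - start)  # close the last session
    return durations

def average_session(durations):
    """Mean session duration, or zero if there are no sessions."""
    if not durations:
        return timedelta()
    return sum(durations, timedelta()) / len(durations)
```

The tradeoff mentioned above shows up directly: a visit with a single hit has a duration of zero, and time spent after the last hit is never seen, so the average is a lower bound at best.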

>
>> # of Hub UI users that have subscriptions to Desktop collections
>> - How many of these have accounts? How many don't?
>
> My first reaction is "a collection is a collection", that there's  
> no difference between a Desktop collection and a Hub collection.   
> But I guess you're thinking "a collection which is published into  
> an account used by a user that is primarily a Desktop user".  Which  
> makes sense as a product question (acknowledging overlap between  
> "Desktop users" and "Hub users").  But technically, it doesn't seem  
> quite feasible to do this separation properly.

Just to clarify: You're saying it doesn't seem quite feasible to  
distinguish between Hub Users subscribing to collections created by  
other Hub Users versus collections created and published by Desktop  
users? Again, I'm trying to understand what, if any, qualitative  
differences exist between Hub-only and Desktop + Hub users.

> To the second bit, I think all users with subscriptions have an  
> account; what did you mean by the second line exactly?

My assumption is that there are Desktop users that subscribe to  
collections without getting an account.

>
>> # of Hub UI users that have subscriptions to collections published  
>> from other clients? Which clients?
>
> Another very hard one, it seems.  It would be interesting to keep  
> historical information about the first client which created each  
> collection, but I think it's a backburner request, as it'd take a  
> fair bit of infrastructure for probably not a lot of payoff other  
> than a prerequisite for this question.

I'm trying to understand how our users are interacting with each  
other with respect to the Chandler 'Eco-system'. As in, what %age of  
users are working entirely within the Chandler Universe versus users  
who are interacting with lots of people from outside of the Chandler  
eco-system.

>> # of Hub UI users without subscriptions
>
> Can you wrap up with a little more detail about this "Desktop user"  
> vs "Hub UI user".  Is the thought that each account should be split  
> into one of these buckets depending on past behavior?  And which  
> they use "more"?  (Recognizing that we'll never identify anonymous  
> users accessing via a ticket.)

? What about counting anonymous users via IP address + GUID from  
Desktop client?
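
A sketch of what that counting could look like, in Python. It assumes each logged request can be turned into a dict with an 'ip' field and, when the Desktop client sends one, a 'guid' field; requests without a GUID fall back to an IP + browser pair. All field names are hypothetical, and the Desktop client is not known to send such a GUID today.

```python
def count_distinct_clients(requests):
    """Count distinct anonymous clients in a batch of logged requests.

    requests: iterable of dicts with 'ip' and optionally 'guid' and
    'user_agent' keys (assumed log format).
    """
    seen = set()
    for r in requests:
        if r.get("guid"):
            key = ("guid", r["ip"], r["guid"])
        else:
            key = ("ua", r["ip"], r.get("user_agent", ""))
        seen.add(key)
    return len(seen)
```

Keying on the IP as well as the GUID would count a laptop twice when it moves between networks; keying on the GUID alone would avoid that if the GUID is stable per install.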

>
> Thanks for the list, Mimi.  It's always helpful.  A number of them  
> just scare me and I wonder how I'll ever make headway on them, but  
> it's certainly useful to have the list.  I may not have answers  
> right now, but maybe something clever will come up like the  
> "IP+browser pair == 1 user" trick outlined in that previous email.

Should we be thinking about adding something on the Desktop end to  
help us identify Desktop users and collections published from Desktop  
clients?

I realize this list is daunting. I assume we will proceed as usual by  
figuring out priorities and chipping away at the hard stuff by  
phasing and staging things.

Looking back over this list, I think the highest priority for me is  
to figure out a way to break down our numbers so we have numbers for  
Desktop users syncing to the Hub. I understand that this isn't  
straightforward with how things work today, but I don't quite get  
what all the technical barriers are.

Mimi


