[Cosmo-dev] space usage reports

Jared Rhine jared at wordzoo.com
Thu Nov 9 02:42:42 PST 2006


Brian Moseley wrote:
> i just checked in the ability for admins to get space usage reports
> with cmp.

Excellent, thanks.

I haven't actually used the feature hands-on yet.  But judging from just the
output and protocol I see, it seems serviceable as-is.  Great!

Below, I review the questions you pose at:

http://wiki.osafoundation.org/bin/view/Journal/BrianMoseleyCosmoSpaceUsageReports

It'd be good for the future docs to be clear about what's being measured
with the size; I assume it is content-length.  The zero-length items got my
attention.

You heard me wondering if adding the mime type as a column would be helpful.
I'm still on the fence about that; I could see it being useful in
reporting, though I suppose I could also script up a report based on
filename extension.  It could introduce some difficulties, as in, what's the
mime type of a collection (directory)?  It looks like I outlined "Which
types of resources are consuming the most space for a user" as a use case in
the original ticket writeup, so maybe that argues for mime-type inclusion by
default.

I like the one-line-per-entry format, and the support for both server-side
and specified-user reports.  Certainly with these base tools, it wouldn't be
but a dozen-odd lines for an admin to script a needed "top space users"
report.  To do so, I'd fetch a list of all users, then iterate over each,
summing up all entries for that user on the 2nd column; no sweat.

I like the tab-separation.  Resources are more likely to end up with spaces
than tabs in their names, so it makes it easy to split-on-tab.

I've no problem with the current default sorting order (by path, depth-first).

I'm ok with deferring query params for sorting.  It'd be a nice feature, but
not required.  If params were supported, I'd want at least four items to be
included: sorting on size, sorting on lastmod, reverse sorting (largest to
smallest size, or most recent to oldest files), and the ability to retrieve
only the top N entries.  These are pretty damn easy to script on the
client side too, though, so until at least a second person asks for the
features, it seems like "Preview focus" mode favors deferring them.
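As a rough illustration of the client-side version, the sketch below parses a report and sorts it by size or lastmod, either direction, keeping the top N. It assumes the same tab-separated path, size, lastmod layout, with lastmod as an RFC 3339 UTC string. `Entry`, `parse_report` and `top_n` are hypothetical names. Sorting lastmod as plain strings is only correct when every timestamp uses the same offset (e.g. all "Z") and the same precision.

```python
from typing import Callable, List, NamedTuple


class Entry(NamedTuple):
    path: str
    size: int
    lastmod: str  # assumed RFC 3339 UTC, e.g. "2006-11-09T10:42:42Z"


def parse_report(report_text: str) -> List[Entry]:
    """Turn tab-separated report lines into Entry tuples."""
    entries = []
    for line in report_text.splitlines():
        if not line.strip():
            continue
        path, size, lastmod = line.split("\t")[:3]
        entries.append(Entry(path, int(size), lastmod))
    return entries


def top_n(entries: List[Entry], key: Callable[[Entry], object],
          n: int, largest_first: bool = True) -> List[Entry]:
    """Sort on key (descending by default) and keep the first n entries."""
    return sorted(entries, key=key, reverse=largest_first)[:n]


# Usage:
#   biggest = top_n(entries, key=lambda e: e.size, n=10)
#   newest  = top_n(entries, key=lambda e: e.lastmod, n=10)
#   oldest  = top_n(entries, key=lambda e: e.lastmod, n=10, largest_first=False)
```

A "worst offenders" report is then just `top_n` on size across the combined entries.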

What's the auth scheme?  Admin user only?  If so, I'm not going to worry
about denial-of-service attacks, and I'd prefer to keep the aggregate report
available.  As for paging of the user list, I don't care about it yet, and I
tend to think paging isn't going to cut down on computational load much.

Regarding timestamp format, that's vaguely thorny.  I tend to prefer regular
ISO 8601, which I see is close enough to RFC 3339 that we're talking about
the same thing.  When I code timestamps, I usually go ahead and include the
T as a separator.  But often I'm putting timestamps into filenames where I'm
trying to avoid whitespace.  In those cases, I often translate the colon to
a dash so as to support copying the files onto Windows too.  Sometimes
timezones go in, sometimes not.  Hmmm, I suppose it's slightly better to
include the timezone (or a Z if everything is UTC) for clarity, unless the
spec says it's always UTC.  As long as one is going for clarity, I suppose
an unambiguous standard is better, so RFC 3339 (full, with T separator and a
timezone) is better than trying to pick a personal style that's pretty.
Parseable, standards, and automation seem to outweigh raw usability in this
case.
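To make the RFC 3339 option concrete, here is a small sketch using only Python's standard `datetime` module. It normalizes an aware timestamp to UTC with a T separator and a Z suffix, parses it back, and derives the colon-free variant for filenames. The helper names are hypothetical, and it assumes whole-second precision is enough.

```python
from datetime import datetime, timedelta, timezone

RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"  # full date, T separator, Z for UTC


def rfc3339_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 in UTC."""
    return dt.astimezone(timezone.utc).strftime(RFC3339_UTC)


def parse_rfc3339_utc(ts: str) -> datetime:
    """Parse the Z-suffixed form produced above back into an aware datetime."""
    return datetime.strptime(ts, RFC3339_UTC).replace(tzinfo=timezone.utc)


def filename_safe(ts: str) -> str:
    """Swap colons for dashes so the string is also a valid Windows filename."""
    return ts.replace(":", "-")


if __name__ == "__main__":
    # This message's own date: Thu Nov 9 02:42:42 PST 2006 (UTC-8).
    sent = datetime(2006, 11, 9, 2, 42, 42, tzinfo=timezone(timedelta(hours=-8)))
    ts = rfc3339_utc(sent)
    print(ts)                 # 2006-11-09T10:42:42Z
    print(filename_safe(ts))  # 2006-11-09T10-42-42Z
```

The filename variant stays a local convention; the report itself would carry the plain RFC 3339 form.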

To the question of "should we list by uid instead of path", I've realized
I'm not sure if you're talking about users or resources.  I think I prefer
path in both cases, though maybe you'd like to show an example of what
by-uid would look like.

I think that totals and subtotals are better not included here; they are
really more a UI function and if they are included, then any client-side
scripting logic needs to parse some lines differently than others, an
unnecessary complication.

A "worst offenders" report would be nice, but lower priority than any other
features.  If the four query param items mentioned above were implemented
later, the worst offenders report drops out pretty naturally.

Regarding XML format, using the accept headers isn't bad.  It'd be nice to
be able to pass it in optionally as a query param too though.
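One way the two could coexist is sketched below: an explicit query parameter wins, otherwise the Accept header decides, otherwise plain text. The parameter name `format` and its values are hypothetical, not anything Cosmo defines. This simplified version also ignores Accept q-values, so a header like "text/xml;q=0" would still select XML.

```python
from typing import Mapping, Optional

XML_TYPES = {"application/xml", "text/xml"}


def choose_format(accept_header: Optional[str],
                  query_params: Mapping[str, str]) -> str:
    """Return "xml" or "text" for a space usage report request."""
    # 1. Explicit override via a hypothetical ?format= query parameter.
    explicit = query_params.get("format", "").lower()
    if explicit in ("xml", "text"):
        return explicit
    # 2. Fall back to the Accept header (media types only, q-values ignored).
    if accept_header:
        media_types = {part.split(";")[0].strip().lower()
                       for part in accept_header.split(",")}
        if media_types & XML_TYPES:
            return "xml"
    # 3. Default: the existing tab-separated text report.
    return "text"
```

Letting the query parameter win means a plain browser or curl user can get XML without fiddling with headers.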

Bottom line, sounds like I'm saying:
* +1 to adding mime-type column
* +1 to RFC 3339 timestamps
* +1 to sticking with path
* +1 to keeping aggregate (all users) report
* -1 to query params until features more pressing or in web UI
* -1 to subtotals
* +0 to XML format for 0.6 but include query param switch if you do

-- Jared

