[Cosmo-dev] generating test data

Katie Capps Parlante capps at osafoundation.org
Tue Apr 10 15:28:00 PDT 2007

As the desktop team turns its attention to performance, we're revisiting the
test data set that we've been using for performance metrics.

I'm not sure whether the desktop and server teams ought to coordinate
their efforts on test data, so I'm sending this to both teams in case
they should.

Previously, we generated a set of calendar data and stored it in an
.ics file (tools/cats/datafiles/Generated3000.ics). The size of the
calendar and some of its characteristics were based on Mitch's calendar
at the time.

We're now at a point where we'd like to have a data set that
- is based on real users' data
- contains multiple collections
- contains tasks, notes, and messages in addition to calendar data (a
data set that reflects the Preview feature set)

Rather than storing test data in .ics files, it makes sense to use our
new EIM-based format. We'd like to start with real user data, then
obfuscate it to protect the innocent.

Morgen has checked in a tool that allows us to obfuscate a "dump" file
from the desktop app. (Thanks Morgen!)

From Morgen:
> So yesterday I checked in Tools > Save and Restore > Obfuscated dump
> to file...
> It sets an obfuscation attribute to True on the translator object,
> and then the various exporter callbacks honor its setting, instead
> emitting X's for the appropriate fields, and skipping a bunch of item
> types such as accounts and passwords altogether.  The one thing I'm
> not sure about obfuscating is the mail item and all its various
> fields.  Does it matter if email addresses are included in these
> dumps?  If so, someone will have to go and tweak export_mail() in
> translator.py to obfuscate the appropriate fields.
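Morgen's description boils down to a simple pattern: one flag on the translator, checked by each exporter callback. Here is a rough Python sketch of that pattern, under stated assumptions. The names Translator, export_item, export_generic, test_address and SKIPPED_TYPES, and the dict-based items, are all hypothetical and are not the actual API in translator.py. The sketch also shows one possible answer to the mail question. Each real address is mapped to a stable, made-up test address on example.com, so sender/recipient relationships survive while the real addresses don't.

```python
from dataclasses import dataclass, field

# Hypothetical item types that are dropped entirely from an obfuscated dump.
SKIPPED_TYPES = {"account", "password"}


def obfuscate_text(value: str) -> str:
    """Replace every non-whitespace character with 'X', keeping length and spacing."""
    return "".join(ch if ch.isspace() else "X" for ch in value)


@dataclass
class Translator:
    # Hypothetical flag mirroring the "obfuscation attribute" described above.
    obfuscation: bool = False
    # Maps each real address to a stable, made-up test address.
    address_map: dict = field(default_factory=dict)

    def test_address(self, address: str) -> str:
        """Return the same fake address every time a given real address is seen."""
        if address not in self.address_map:
            self.address_map[address] = f"user{len(self.address_map) + 1}@example.com"
        return self.address_map[address]

    def export_item(self, item: dict):
        """Dispatch to a per-type exporter; return None if the item is skipped."""
        if self.obfuscation and item["type"] in SKIPPED_TYPES:
            return None
        exporter = getattr(self, f"export_{item['type']}", self.export_generic)
        return exporter(item)

    def export_generic(self, item: dict) -> dict:
        """Copy the item, X-ing out free-text fields when obfuscating."""
        record = dict(item)
        if self.obfuscation:
            for key in ("title", "body"):
                if key in record:
                    record[key] = obfuscate_text(record[key])
        return record

    def export_mail(self, item: dict) -> dict:
        """Like export_generic, but also swaps addresses for test addresses."""
        record = self.export_generic(item)
        if self.obfuscation:
            record["from"] = self.test_address(record["from"])
            record["to"] = [self.test_address(a) for a in record["to"]]
        return record


if __name__ == "__main__":
    translator = Translator(obfuscation=True)
    items = [
        {"type": "event", "title": "Lunch with Mitch", "body": "Room 2"},
        {"type": "password", "title": "secret"},
        {"type": "mail", "title": "Hi", "from": "a@x.org", "to": ["b@x.org", "a@x.org"]},
    ]
    for item in items:
        record = translator.export_item(item)
        if record is not None:
            print(record)
```

Keeping the address mapping stable (the same real address always becomes the same test address) preserves who-mails-whom structure, which may matter for realistic performance data. Using example.com avoids colliding with real mailboxes.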

So from here:
- Should we tweak mail addresses to use test accounts? (looking to Brian 
Kirsch here)
- Can we make use of this tool or data for server performance efforts?
If so, what are the next steps?


