[Chandler-dev] No more perf regression bugs?

Heikki Toivonen heikki at osafoundation.org
Mon Apr 16 18:41:09 PDT 2007


Bryan Stearns wrote:
> where we focus on those problems. However, I don't think
> performance-regression bugs targeting a particular commit are useful or
> positive for me as a developer:
> 
> - Performance analysis works better when all the changes affecting a
> metric are in place: we can analyze the whole chain, not just one piece
> at a time.

This is where we have differing opinions. I think it is easier to
understand why performance changed by looking at the individual change,
rather than by starting from an analysis of the whole chain.

> - It's not like we can just back out the commit and discard the feature
> requirement. Most of the time, it's not possible to implement new
> features without some performance cost.

I have a background in a project where performance regressions were
considered extremely serious, and many a feature went back to the
drawing board because the initial implementation was not performant
enough. I understand we are not even in a position to attempt that with
Chandler yet, because there are too many critical features missing, and
doing it the hard way can slow feature development significantly. I also
understand that in some cases performance costs cannot be avoided.

But having said that, AFAIK I have not asked for any checkin to be
backed out because of a performance regression.

> - It's *really* demoralizing to work hard on a feature, then have a
> performance-regression bug filed against it a day or so later (usually
> after you've started to dig into the next feature).

I am sorry, that has certainly not been my intent in filing the bugs. I
believe the wording I have used was along these lines: "please check
whether the checkin contained obvious performance bugs, and if so, fix
them; otherwise mark the bug as invalid". I did not mean to criticize
the feature or the implementation.

> I also have trouble with our performance-monitoring mechanisms: many of
> the measurements vary widely, even when run with the same version of
> the code. Here are 19 runs against a single revision on a single
> platform, and the standard deviation is 1/3 of the average time!
> 
> http://builds.osafoundation.org/perf_data/detail_20070415.html#creating_a_new_event_in_the_cal_view_after_large_data_import.double_click_in_the_calendar_view

This is a problem. John's script record/playback tool could potentially
stabilize some results, but it has been quite fragile and does not yet
have the functionality to replace all the tests.
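
To put a number on the noise Bryan points out, the standard deviation
divided by the mean (the coefficient of variation) is a handy summary.
A minimal sketch in Python; the timings below are made up and only stand
in for the real per-run numbers from the Tinderbox logs:

import math

def coefficient_of_variation(timings):
    mean = sum(timings) / len(timings)
    # Sample variance (n - 1 in the denominator).
    variance = sum((t - mean) ** 2 for t in timings) / (len(timings) - 1)
    return math.sqrt(variance) / mean

timings = [0.9, 1.4, 0.6, 2.1, 1.0]  # hypothetical seconds per run
print("coefficient of variation: %.2f" % coefficient_of_variation(timings))

A value around 0.33 corresponds to the "standard deviation is 1/3 of
the average" case Bryan describes.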

In some cases, running a test with indexing disabled may produce results
with less deviation. However, I am slightly against doing this on
Tinderbox, because users will be running with the indexer enabled. As a
developer you can of course make that change locally if it stabilizes
the test results.

Also, if the test framework gets in the way (like it does in many
cases), we as developers can and should modify our local code and insert
the profiler calls ourselves where appropriate.
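
For example, something along these lines is what I mean by inserting
profiler calls locally; the function name and output file are just
placeholders, and the standard cProfile/pstats modules are one way to
do it:

import cProfile
import pstats

def operation_under_test():
    # Hypothetical stand-in for the code path being measured,
    # e.g. creating a new event in the calendar view.
    sum(range(100000))

# Profile just the interesting operation, bypassing the test framework.
cProfile.run('operation_under_test()', 'perf.prof')
stats = pstats.Stats('perf.prof')
stats.sort_stats('cumulative').print_stats(20)  # top 20 by cumulative time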

> That's a weekend day: on a weekday, or a slower platform, there may be
> only one perf run of each revision (or none at all). Because of this,
> the graphs cover too short a period to reliably see the effect of a
> single commit.

This is also a problem. One thing I have been hoping to do is make the
tests report on the last 24-hour period instead of just the current day.
However, the reports were originally coded with per-day reporting in
mind, and changing that will be a fair amount of work; so far other
tasks have seemed more important.
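
Roughly what I have in mind for the 24-hour window, assuming the report
generator could be handed a plain list of (timestamp, value) samples;
that data shape is my assumption, not how the reports are currently
structured:

import time

def last_24_hours(samples, now=None):
    """Return the (timestamp, value) samples from the trailing 24 hours.

    samples is a hypothetical list of (unix_timestamp, value) tuples;
    the real report code would pull these from the perf logs.
    """
    if now is None:
        now = time.time()
    cutoff = now - 24 * 60 * 60
    return [(ts, value) for (ts, value) in samples if ts >= cutoff]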

Also, running the performance tests takes a long time, and I would like
to run each test 5 times instead of the current 3 to get a bit more
stability that way. We could get faster hardware, but we would still
need to run the tests on the reference platforms as well.
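
As a sketch of what the extra repetitions could buy us, reporting the
median of N runs would also blunt the effect of a single slow run;
run_test below is a hypothetical hook, not an existing entry point in
our framework:

def measure(run_test, repetitions=5):
    """Run one test `repetitions` times and return (median, all timings).

    run_test is a hypothetical hook that executes the test once and
    returns the elapsed time in seconds; the median is less sensitive
    to a single slow outlier than the mean.
    """
    timings = sorted(run_test() for _ in range(repetitions))
    mid = len(timings) // 2
    if len(timings) % 2:
        median = timings[mid]
    else:
        median = (timings[mid - 1] + timings[mid]) / 2.0
    return median, timings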

> (While we're on the subject: I also don't like the way we state our
> performance targets: If we say that 1 second is "acceptable", but the
> "goal" is .1 seconds, I'm going to stop looking at a problem once it
> reaches "acceptable" and switch to another problem, and won't try to
> improve the first one further until all the other metrics are at the
> "acceptable" level -- and probably not until after all other bugs are
> fixed, too, which hasn't happened yet. I'd be happier if the table on

I believe that is how it is supposed to work.

> the tbox page used a shade of green once a measurement got to
> "acceptable", and a brighter shade if it got to the "goal": the table
> would make us look a lot less screwed than the red and orange mess there
> now, which makes it look like we're making no progress at all.)

Personally I prefer the orange, since it is an improvement over red, and
orange still means we are kind of screwed but OK for Preview. Showing
green would seem a bit like lying to me.

Of course, if the majority wants to change the colors I can do that. If
not, with many browsers you can use a user.css stylesheet and color the
entries any way you like: http://www.squarefree.com/userstyles/

-- 
  Heikki Toivonen

