[Dev] Re: Functional Tests Fail: Tinderbox reports success

Heikki Toivonen heikki at osafoundation.org
Mon Feb 13 15:51:14 PST 2006

John sent this privately first, but since this seems useful to general
discussion I am posting all of his email (along with my comments) with
his permission.

John Anderson wrote:
> The other day I wrote a new functional test. When I added it to the rest
> of the tests and ran them, mine passed but the NewCollection test
> failed. Even when I ran without my test, NewCollection failed. I
> mentioned this to Heikki, who said that functional tests were passing on
> tinderbox. He asked me if I could look into why it was failing on my box.
> I spent most of yesterday looking into this, but didn't find the bug. It
> turned out to be very complicated. The test randomly fails in different
> ways and sometimes doesn't fail at all. However, I learned a lot that
> I'd like to share with you, some of which might explain why tinderbox
> sometimes doesn't report failures.
> Dan noticed my new test and pointed out a mistake I made. I needed to
> catch exceptions if I wanted them to be reported as errors. I had
> incorrectly assumed that exceptions would be caught automatically and
> logged as failures.

Yeah, this is bad, and the current way we do it is more like a
workaround. Something better would be nice.
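
As a rough sketch of what "something better" could look like (this is
not what the framework does today): a small wrapper that runs a test
callable and records any uncaught exception as a failure, so test
authors do not have to add their own handlers. The names
run_functional_test and log are hypothetical.

```python
import traceback


def run_functional_test(name, test_func, log):
    """Run one test callable; record an uncaught exception as a failure.

    `log` is any list-like object; each entry is (name, status, details).
    Returns True if the test passed, False otherwise.
    """
    try:
        test_func()
    except Exception:
        # Keep the full traceback so the failure can be diagnosed later.
        log.append((name, "FAIL", traceback.format_exc()))
        return False
    log.append((name, "PASS", ""))
    return True
```

With this, a test that raises is logged as a failure even if its author
forgot to catch anything.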

> This got me thinking. How do tinderbox and the functional test
> framework handle Chandler crashes, Python exceptions that happen before
> or after the functional tests, C exceptions, or hangs?

Not sure.

We used to have problems on Windows with unit tests that would cause a
python crash, which in turn would pop up a dialog that would require a
user to OK before the tests would continue. I think nowadays the dialogs
may come up but the tests still continue.

> As it turns out my new functional test runs the skins menu, which has
> been broken for months. When I recently hooked it up, it caused Chandler
> to crash on mac (widgets accessed a deallocated pointer). This menu has
> been one of the best ways of testing the framework on which Chandler is
> built, and often finds newly introduced bugs that otherwise would be
> very hard to find. So catching a Python exception in my functional test
> wouldn't have caught the mac bug.
> Whoever launches Chandler (tinderbox or some test harness) is going to
> have to detect seg faults, etc. Is this happening today?

I am about 80% sure we detect test failure in that case, but not certain.
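
For illustration only, here is a minimal sketch of how a launcher
script could tell a crash apart from an ordinary failure by looking at
the child's exit status. It uses the modern Python subprocess module
and is an assumption about how one might do it, not a description of
the tinderbox scripts. The function names are hypothetical.

```python
import subprocess
import sys


def classify_exit(returncode):
    """Turn a child process return code into a short status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports a child killed by signal N as -N,
        # so a segmentation fault (SIGSEGV, 11) shows up as -11.
        return "crashed (signal %d)" % -returncode
    # On Windows a crash appears as a large positive code instead,
    # so it lands in this branch along with ordinary failures.
    return "failed (exit code %d)" % returncode


def run_and_classify(cmd):
    """Run `cmd` (a list of arguments) and classify how it ended."""
    proc = subprocess.run(cmd)
    return classify_exit(proc.returncode)
```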

> Chandler catches uncaught Python exceptions, which get lost in the
> release versions and in the debug version of Chandler are displayed in a
> dialog along with anything else written to stderr. This probably
> isn't what you want when running functional tests. I think the best
> solution is for whoever launches Chandler to run the functional tests
> to pass the --nocatch and --stderr arguments and log the stderr
> output; if that output is not empty, the functional test fails. Is
> this happening today?

Not as far as I know.
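
A minimal sketch of John's suggestion, assuming a launcher script
written with the modern Python subprocess module: capture stderr, log
it, and treat any output there (or a non-zero exit) as a failure. The
function name is hypothetical, and the real command line would be the
Chandler executable with --nocatch and --stderr plus whatever arguments
select the functional tests.

```python
import subprocess
import sys


def run_with_stderr_check(cmd):
    """Run `cmd` and fail if it exits non-zero or writes to stderr.

    Returns (passed, stderr_text) so the caller can log the output.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    passed = proc.returncode == 0 and proc.stderr.strip() == ""
    return passed, proc.stderr
```

One caveat with this approach: anything that writes harmless warnings
to stderr would also fail the run, so the output may need filtering.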

> If we did catch exceptions in this way we wouldn't need to add
> try/except blocks like Dan did for my functional test.
> It's very important to run the functional tests on both release and
> debug versions of Chandler since the debug versions contain lots of
> extra testing code. Are we doing this?

Release versions only at the moment. The functional tests are run by the
perf tinderboxes which only run tests in release mode.

> I also noticed some time ago that the TestLaunchChandler test fails on
> Windows (it hangs when trying to quit). Could this be the same problem
> as bug #4773 <https://bugzilla.osafoundation.org/show_bug.cgi?id=4773>,
> which we can't reproduce? I looked into it and it turns out to be a C
> thread deadlock in repository quit, which is consistent with bug #4773.
> This would be a really great bug to fix. Does tinderbox (or the test
> framework) detect functional tests that fail because they hang?

If a test hangs, there is no automatic recovery or reporting. You can
see on the Tinderbox page when a machine has stopped sending reports,
which is likely an indication of a hang. We have a bug filed to recover from
hangs, but at the moment even the only theoretical approach I know of
does not work on Windows. See

Hmm, possibly the script that starts Chandler itself could also kill a
stuck Chandler process.
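
Something along these lines, as a sketch only: the starting script runs
the process with a time limit and kills it if the limit passes. This
uses the timeout support in the modern Python subprocess module; the
function name and the idea of a fixed limit are assumptions, and I have
not checked how well this would behave for Chandler on Windows.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd`; report "hang" if it does not finish in time.

    subprocess.run kills the child and waits for it when the timeout
    expires, then raises TimeoutExpired. Processes started by the child
    itself are not killed by this.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if proc.returncode == 0 else "failed"
```

Reporting "hang" as its own status would let the Tinderbox page show a
hang explicitly instead of just going quiet.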

> I asked Bear why TestLaunchChandler wasn't failing on tinderbox and he
> said it wasn't being run. I don't remember the exact reason he gave for
> why it couldn't be run, but I think it was something about the tinderbox
> machine requiring a display and adding a display was a security problem.
> Since TestLaunchChandler just launches Chandler and the functional tests
> also need to launch Chandler, why can we run functional tests but not
> TestLaunchChandler?

AFAIK it wasn't a security problem; we just couldn't get it to work. We
run the performance boxes differently, which is why it works there.
Regular boxes can be controlled simply via ssh access; the perf boxes
need physical presence or VNC to control the tests.

It seems we should be able to switch to VNC control with the regular
Tbox clients as well. The most important ones would be the quick build
boxes. If that works, we can make those run functional tests in both
debug and release mode, which will also allow us to stop running them on
the perf Tboxes.

> TestLaunchChandler was supposed to be a smoke test of Chandler -- to make
> sure it doesn't crash. However, it doesn't run the launchers, so it
> won't detect any problems with them. We should switch over to running
> all our functional tests with the launchers.

Yes, we should. I'll file a bug on that.

> Finally, I noticed that TestNewCollection, when run alone, fails by
> hanging forever, but when run in the order of the functional tests,
> fails with an attribute error. I was surprised to learn that each
> functional test isn't run from the same known starting point, i.e. they
> are all run one after another in the same execution of Chandler. For
> someone like me, who just wants to add a new functional test, it would
> be convenient if my test results didn't depend on the random state of
> Chandler left over after the tests that ran before it -- especially when
> some of the tests access network resources.
> So that's my brain dump after a day of tracking down functional test
> failures. Let me know if I misunderstand any of the issues. I'm hoping
> that we can get the functional tests working in tinderbox soon. I'm not
> sure how best to proceed and I'm open to suggestions. Perhaps Aparna or
> Heikki could file and assign some bugs, if they don't already exist. If
> anybody needs my help fixing any of these problems let me know.
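
On the point about a known starting point, one possible approach is to
run each functional test in its own process with a fresh, empty working
directory, so no state leaks from one test to the next. The sketch
below only illustrates that idea; run_isolated and build_command are
hypothetical, and how Chandler would be pointed at a fresh directory is
left to the caller because I am not assuming any particular option.

```python
import subprocess
import sys
import tempfile


def run_isolated(test_names, build_command):
    """Run each test in a separate process with its own scratch directory.

    `build_command(name, scratch_dir)` must return the argument list to
    launch for that test. Returns a dict mapping test name to True/False.
    """
    results = {}
    for name in test_names:
        # A new temporary directory per test gives every run the same
        # empty starting state; it is removed when the block exits.
        with tempfile.TemporaryDirectory() as scratch_dir:
            proc = subprocess.run(build_command(name, scratch_dir))
            results[name] = proc.returncode == 0
    return results
```

The cost is a full startup per test, which would make the whole run
slower than running everything in one execution.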

  Heikki Toivonen
