[Swan-dev] should final.sh shut down pluto?
Andrew Cagney
andrew.cagney at gmail.com
Sun Jun 26 23:58:40 UTC 2016
On 26 June 2016 at 09:40, D. Hugh Redelmeier <hugh at mimosa.com> wrote:
> | From: Paul Wouters <paul at nohats.ca>
>
> | > On Jun 26, 2016, at 03:49, D. Hugh Redelmeier <hugh at mimosa.com> wrote:
> | >
> | > I'm quite unhappy about uncaptured core files. At the moment, I'm
> | > not willing to say what a good solution would be.
> |
> | That's not quite was it is.
>
> I'm not sure what you mean by that sentence.
My "recollection" of the issue on IRC was wrong. This e-mail has the
correct details.
> | Not all tests perform a shutdown and so some
> | errors in a shutdown that could cause a core would theoretically not be
> | detected". Any core files made are always detected.
>
> I don't know what you are claiming.
>
> When I ran all the tests Friday morning (using kvmrunner.py), several
> assertion failures showed up but no core files were preserved. All
> failures were of the same assertion (which we know about).
>
> I'm a dumb user. I don't know why this happened.
>
> But I'll make a sequence of guesses:
> - the assertion failed during shutdown of Pluto
The entire system was shutting down; pluto didn't quite make it and dumped core.
> - maybe shutdown was performed by final.sh (I don't know)
You shouldn't need to know.
However, to be specific, final.sh scripts sometimes do the following:
- shut down pluto
- check for core files and save them
but some skip one or both steps. Regardless, it shouldn't matter.
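
To make the two steps concrete, here is a minimal Python sketch of
"shut down pluto, then save any core files". It is not what final.sh
actually contains: the function names, the shutdown_pluto callable and
the "core*" file pattern are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def save_core_files(run_dir, output_dir, pattern="core*"):
    """Copy any core files found in run_dir into output_dir.

    Returns the sorted list of file names that were saved. The "core*"
    pattern is an assumption; the real name depends on how the guest
    is configured to name core files.
    """
    run_dir, output_dir = Path(run_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for core in sorted(run_dir.glob(pattern)):
        if core.is_file():
            shutil.copy2(core, output_dir / core.name)
            saved.append(core.name)
    return saved


def final_steps(shutdown_pluto, run_dir, output_dir):
    """Shut pluto down first, then look for cores.

    Doing the steps in this order means a core dumped during shutdown
    is still picked up. shutdown_pluto is a hypothetical callable
    supplied by the caller.
    """
    shutdown_pluto()
    return save_core_files(run_dir, output_dir)
```

The point of the ordering is that the core check only sees a shutdown
crash if it runs after pluto has exited.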
> - the pluto log file captured the ASSERT failure
By luck we have this breadcrumb. If pluto were to dump core for some
other reason, we'd have nothing.
> - the core file doesn't get captured in this case.
Right, per above. Capturing core files is an optional part of
final.sh, and that step can run before pluto exits.
> | As antony said, it is VERY useful to run a single test and then ssh in
> | while the tunnel is still up.
>
> Yes. So maybe we need to cleanly separate shutdown into a separate step
> and have the script-runner capable of stopping at any designated step.
> For an example of this kind of control, look at rpmbuild's -b parameter.
Yes, an explicit option like --stop-before <script>.
However, I view that as a "nice to have".
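
As a rough sketch of how such an option could behave (kvmrunner.py has
no such option today as far as this thread says; the script names and
helper functions below are hypothetical):

```python
import argparse

# Hypothetical script order for one test; real tests define their own.
SCRIPTS = ["init.sh", "run.sh", "final.sh"]


def scripts_to_run(scripts, stop_before=None):
    """Return the scripts that precede stop_before (all of them if None)."""
    if stop_before is None:
        return list(scripts)
    if stop_before not in scripts:
        raise ValueError(f"unknown script: {stop_before}")
    return list(scripts[:scripts.index(stop_before)])


def parse_args(argv=None):
    """Parse a --stop-before SCRIPT option; the default runs everything."""
    parser = argparse.ArgumentParser(description="sketch of a test runner")
    parser.add_argument("--stop-before", metavar="SCRIPT", default=None,
                        help="stop before running SCRIPT")
    return parser.parse_args(argv)
```

With this, passing --stop-before final.sh would leave the tunnel up
for someone to ssh in, while the default still runs every script.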
To me the "must have" is consistent default behaviour whether running
an individual test or a group of tests. For instance:
./testing/utils/kvmrunner.py testing/pluto
./testing/utils/kvmrunner.py testing/pluto/basic-pluto-01
./testing/utils/kvmrunner.py testing/pluto/basic-pluto-*
should all run the tests the same way.
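
One way to get that consistency is to expand every argument into a
flat list of test directories first, and then run each test through
the same code path. A minimal sketch, assuming (and this is only an
assumption for the example) that a test directory can be recognised by
a description.txt file inside it:

```python
from pathlib import Path

# Assumption for this sketch: a directory is a test if it contains this file.
MARKER = "description.txt"


def find_tests(paths):
    """Expand command-line paths into a sorted, de-duplicated list of tests.

    A path that is itself a test is used as-is; any other directory is
    treated as a group and its immediate test subdirectories are used.
    """
    tests = set()
    for path in map(Path, paths):
        if (path / MARKER).is_file():
            tests.add(path)
        elif path.is_dir():
            tests.update(p for p in path.iterdir() if (p / MARKER).is_file())
    return sorted(tests)
```

Since the shell expands basic-pluto-* into individual directories, all
three invocations reduce to a list of tests, and nothing downstream
needs to know how the list was specified.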
> | We have a bunch of tests running shutdown
> | but don't for the majority of tests. I think that's fine.
>
> I don't. The reason is that each test tests different paths through the
> system and each might cause different problems that linger undetected
> until shutdown.
>
> I come to that conclusion honestly: I don't have a core dump for any of
> these particular assertion failures. But I might be mis-diagnosing.
>
> There is a chance that kvmrunner.py needs some added code to make me
> happy. I know that it isn't an advertised component of our test system.
> Do you see core dumps for these assertion failures?
So far the best solution I've seen involves always shutting down pluto
_before_ shutting down the entire system (if systemd is causing pluto
to crash, we have another problem). Perhaps final.sh should be required
to run a new script "swan-destroy", or perhaps that should be run
outside of the *.sh scripts.
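
If the teardown were done outside the *.sh scripts, the runner could
enforce the order itself. A minimal sketch, where all four callables
are hypothetical hooks supplied by the runner and not existing
kvmrunner.py functions:

```python
from contextlib import ExitStack


def run_test(run_scripts, shutdown_pluto, save_cores, shutdown_system):
    """Run one test, then always tear down in a fixed order.

    ExitStack runs callbacks in reverse order of registration, and runs
    them even if run_scripts() raises, so they are registered last-first.
    """
    with ExitStack() as stack:
        stack.callback(shutdown_system)  # runs third: take the system down
        stack.callback(save_cores)       # runs second: pick up any core
        stack.callback(shutdown_pluto)   # runs first: stop pluto cleanly
        run_scripts()
```

Because the core check sits between the two shutdowns, a pluto crash
during its own shutdown is captured before the system goes away.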