[Swan-dev] what is swan-transmogrify doing?

Mon Jan 25 17:39:53 UTC 2016

On 25 January 2016 at 10:50, Paul Wouters <paul at nohats.ca> wrote:
> On Mon, 25 Jan 2016, Andrew Cagney wrote:
>
>>
>> I've now twice tracked a test VM's slow boot (<30s -> >1minute) down
>> to the Python (I'll get to that in a moment) script
>> "swan-transmogrify" that is run, every time the machine is booted,
>> from "/etc/rc.d/rc.local".  The first slow down I encountered was
>> tracked down to "chcon -R --reference /var/log /testing/pluto"; and
>
>
> that's to prevent selinux warnings inside the guest. It could possibly
> be commented out. Maybe Tuomo can fix it better.

The current theory is that it be moved to the init process and just
run over the current test's directory.

However, your suggestion to comment it out is interesting.  The build
uses /source and that doesn't get chcon'd.  The test scripts are in
/testing/guestbin and that doesn't get chcon'd.  And I've even run
tests where the logs are written to /source and, again, it isn't
chcon'd.  Perhaps the reason it was needed has gone.
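If the relabel does turn out to be needed, moving it into the test's init step and limiting it to that test's directory might look like this minimal Python sketch. The function name and its `dry_run` flag are hypothetical. Only the `chcon` arguments come from the command quoted above.

```python
import subprocess


def relabel_test_dir(test_dir, reference="/var/log", dry_run=False):
    """Recursively copy the SELinux context of `reference` onto `test_dir`.

    Hypothetical helper: the chcon arguments are the ones quoted in this
    thread, but limited to a single test's directory instead of all of
    /testing/pluto.
    """
    cmd = ["chcon", "-R", "--reference", reference, str(test_dir)]
    if not dry_run:
        # check=True raises CalledProcessError if chcon exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Called with a (hypothetical) `/testing/pluto/example-test`, this relabels one test's files rather than walking every test directory on each boot.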

>> the latest, I suspect, is with "systemctl restart network.service"
>> interacting with some tools needed for Kerberos.
>
>
> The network restart is because it reconfigured the "base" VM into
> one of the known guests "west, east, etc".

So, because rc.local is run really late, the VMs:

- bring up the network
- shutdown the network
- fiddle
- bring up the network again

This seems very inefficient, especially given that some of the boot
slowness is network related.  If the script is to be run at all, running
it very early in the boot process, before the network is up, would be
better.  Yes, that means that, as part of cloning the test VM, the script
would need to be installed into the root file system: part of a
post-processing step like the one I suggested below, and not part of
kickstart.

>> Well, according to
>> https://libreswan.org/wiki/Test_Suite#swan-transmogrify "This python
>> script compares the nic of eth0 with the list of known MAC addresses
>> from the XML files. By identifying the MAC, it knows which identity
>> (west, east, etc) it should take on. Files are copied from
>> /testing/baseconfigs/ into the VM's /etc directory and the network
>> service is restarted. "
>>
>> That sounds like a simple "first-boot" like operation; not something
>> that needs to be done every time, and not something requiring MAC
>> magic.  In fact, we could even run something like:
>
>
> Yes. But the original idea was that we would be using a single image
> (eg uml disk) and use a copy-on-write to transmogrify it into west,east,
> etc and then throw away the COW at the end of the test. This to ensure
> no state is left behind for the next test.

BTW, the actual clone is really really cheap.  Unfortunately, the
first-boot of the result is really slow - presumably copying files
that get touched - so it is a tradeoff.
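For reference, the MAC lookup described on the wiki boils down to something like this minimal Python sketch. It is not the actual swan-transmogrify code. The addresses in `KNOWN_MACS` are made up (the real script takes them from the XML files) and the helper names are hypothetical. `/sys/class/net/eth0/address` is simply the standard Linux sysfs location of an interface's MAC.

```python
from pathlib import Path

# Made-up MAC-to-identity table for illustration only; the real script
# builds this mapping from the VMs' XML definitions.
KNOWN_MACS = {
    "52:54:00:00:00:01": "west",
    "52:54:00:00:00:02": "east",
}


def read_mac(interface="eth0", sysfs="/sys/class/net"):
    """Return the lower-cased MAC address of `interface` as exposed by sysfs."""
    return Path(sysfs, interface, "address").read_text().strip().lower()


def identify(mac, known=KNOWN_MACS):
    """Map a MAC address to a guest identity, or None if it is unknown."""
    return known.get(mac.strip().lower())
```

On a guest, `identify(read_mac())` would return `"west"`, `"east"` and so on, or `None` for an unknown NIC. The explicit-host-name approach proposed in this thread would drop both helpers and take the identity as an argument instead.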

> But you are right, with KVM we deviated a little bit from this. But
> I think for the docker tests Antony did, it might still be needed
> because those images are created from scratch for each test.

I wonder if docker is sufficiently different to be better served with
dedicated scripts.

>> i.e., initiate the transmogrification (google thinks that is a word!)
>> once, from outside the VM, and with an explicit host name. No need for
>> any MAC magic, and no need for repeated fiddling with config files or
>> restarting networks during every boot.
>
>
> We do need to ensure some files are always correct for that test, such
> as resolv.conf that might be different for different tests. But in
> general, yes we could probably get away with that for the KVM tests.
> We also always create the NSS database from a known saved copy that
> only contains the raw RSA key. Some tests need a cert added, some
> tests need to get that cert via IKE. So it is important to always
> clean up the NSS and start from the minimal one.

Yes, re-creating /etc/ipsec.d from scratch at the start of each test
sounds like a good idea.  However, that is a task for swan-prep alone,
not one to share with swan-transmogrify.
/etc/resolv.conf could be more of a challenge; a boot script that
restores it may well be needed.  But does swan-transmogrify even do
that now?
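Re-creating /etc/ipsec.d from a known saved copy at the start of each test could be done along these lines. This is a minimal Python sketch. The helper name and the assumption that the minimal NSS database is kept as a plain saved directory are both hypothetical, and it makes no special effort to restore SELinux labels or ownership.

```python
import shutil
from pathlib import Path


def reset_nss_dir(target, pristine):
    """Replace `target` with a fresh copy of the saved `pristine` directory.

    Hypothetical helper: e.g. target=/etc/ipsec.d and pristine=a saved
    copy containing only the raw RSA key, so no certs or other state
    from a previous test survive.
    """
    target = Path(target)
    if target.exists():
        # Throw away everything the previous test left behind.
        shutil.rmtree(target)
    # copytree requires that the destination does not exist yet.
    shutil.copytree(pristine, target)
```

swan-prep could call `reset_nss_dir("/etc/ipsec.d", saved_copy)`, where `saved_copy` is a placeholder for wherever the minimal database is kept.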

>> Who knows. It contains docker magic. It silently runs random
>> processes.  And it's written in Python and not SH (boot scripts IMNSHO
>> should be simple SH).
>
>
> Shell sucks for variables and sub shells and automating interactive
> things :P

Right, that's why swantest and swan-prep are written in Python and not
run during the boot.

>>  However, what I suspect is that, over time, it has accumulated
>> stuff, out of sight and out of mind, that really really needs to be
>> transparent and initiated from the "*init.sh" scripts.  For instance,
>> both swan-transmogrify and swan-prep screw around with the contents
>> of /etc/ipsec.d/*.
>
>
>> thoughts,
>> Andrew
>>
>> PS: Why are the VMs running NFS; and why do the VMs suck down useless
>> megabytes of debuginfo RPMs :-)
>
>
> We weren't sure if we were moving the 9P filesystem to NFS or not. 9P
> is nice because it works even if OE screwed up the network. But 9P is
> not writable in RHEL (by design) and it seems slower than NFS by a lot.
> But NFS can go down when the networking gets screwed up by IPsec. So
> for instance we could lose logging.
>
> debuginfo is needed to get proper gdb backtraces :P But you know that :P

I'm not so sure.  .eh_frame has unwind information, so full debug
information isn't needed to unwind through a random library; the
debug-info lets you debug those libraries.  Either way, I'm not sure
we need debug-info for strongswan and other stand-alone programs by
default.  Better, I think, to provide an easy way to install
debug-info if/when needed.
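One possible shape for that "install it when needed" path is a small helper along these lines. This is a minimal Python sketch assuming a dnf-based guest where the `dnf debuginfo-install` command (from dnf-plugins-core) is available. The helper name and `dry_run` flag are hypothetical.

```python
import subprocess


def install_debuginfo(packages, dry_run=False):
    """Install debug-info for `packages` on demand, e.g. before running gdb.

    Assumes a dnf-based guest with the dnf-plugins-core
    "debuginfo-install" command available.
    """
    if not packages:
        raise ValueError("no packages given")
    cmd = ["dnf", "debuginfo-install", "-y", *packages]
    if not dry_run:
        # check=True raises CalledProcessError if dnf fails.
        subprocess.run(cmd, check=True)
    return cmd
```

The default image then stays lean, and only a test that actually needs a full backtrace pays the download cost.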

Andrew

> Paul