[Swan-dev] status of failing tests

Thu Feb 1 16:37:23 UTC 2018

On Thu, 1 Feb 2018, D. Hugh Redelmeier wrote:

> - several failures that were only IKE retransmissions.  Just ignore them.
>  But a bit weird when IMPAIR_RETRANSMITS is set.

can happen in IKEv1 were both ends retransmit?

> - not sure: maybe retransmission needed but suppressed?
>  testing/pluto/ikev1-x509-12-san-dn-match failed west:output-different
> 	-004 "san" #2: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0xESPESP <0xESPESP xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none NATD=none DPD=passive}
> 	+002 "san" #2: suppressing retransmit because IMPAIR_RETRANSMITS is set
> 	+002 "san" #2: IMPAIR RETRANSMITS: suppressing re-key
> 	+002 "san" #2: deleting state (STATE_QUICK_I1)

So if the packet is lost, you get the suppressing message when it neds
to retransmit. So that is one reason why the output differs even if
we have impair set.

> - state numbers differ, but everything else seems OK

I do see these and worry a bit. It seems to indicate a rekey has
happened that we did not expect.

>  testing/pluto/ikev2-delete-05-sa-start failed west:output-different
>  testing/pluto/ikev2-delete-06-start-both failed west:output-different

Some of this is because of changed behaviour, and it needs looking at
more carefully, which is why I haven't updated the reference output yet.

> - sometimes "ping -c N' sends more than N packets.  Huh?

We should migrate to using fping.

>  testing/pluto/ikev2-62-host-ondemand failed north:output-different
>  testing/pluto/ikev2-62-host-ondemand-instance failed north:output-different
>  testing/pluto/klips-passthrough-00 failed west:output-different
> - one dropped packet

again, for the ones where we just send 4 pings to confirm, we should
just use fping.

>  testing/pluto/ikev1-algo-esp-sha2-01-netkey-klips failed west:output-different
>  testing/pluto/ikev1-algo-esp-sha2-02-netkey-klips failed west:output-different
>  testing/pluto/newoe-07-ike-replace-initiator failed road:output-different
>  testing/pluto/certoe-09-packet-host failed road:output-different
>  testing/pluto/klips-algo-twofish-01 failed west:output-different
>  testing/pluto/klips-algo-serpent-01 failed west:output-different
>  testing/pluto/klips-algo-cast-01 failed west:output-different
>  testing/pluto/ah-pluto-07-klips-netkey failed west:output-different
>  testing/pluto/l2tp-01 failed north:output-different
>  testing/pluto/l2tp-02 failed north:output-different
>
> - more than one dropped packet

I wish we could get this more stable....

>  testing/pluto/interop-ikev2-strongswan-38-mobike-initiator failed north:output-different
> 	-2 packets transmitted, 2 received, 0% packet loss, time XXXX
> 	-rtt min/avg/max/mdev = 0.XXX/0.XXX/0.XXX/0.XXX ms
> 	+2 packets transmitted, 0 received, 100% packet loss, time XXXX
>
> - something funny with MOBIKE
>  testing/pluto/interop-ikev2-strongswan-38-mobike-pool failed east:output-different road:output-different
> 	west: +peer supports MOBIKE

That message seems strongswan version specific. I thin newer versions
have removed this?

> - script changed and odd message change
> 	- ip xfrm state
> 	+ ipsec look

This is a change I made. ipsec look runs both ip xfrm pol and ip xfrm
state but adds a sorting wrapper. This prevents false positives where
you see flipping of 192.1.2.23 with 192.1.2.45.

>  testing/pluto/interop-ikev2-strongswan-39-mobike-responder failed east:output-different road:output-different
> 	-	proto esp spi 0xSPISPIXX reqid REQID mode tunnel
> 	+	proto esp spi 0xSPISPIXX reqid REQIDREQID mode tunnel

I think related to the ipsec look somehow double fixing the REQID.

>  testing/pluto/newoe-21-liveness-clear failed east:output-different road:output-different
> 	road: script changed!

likely something else still went wrong, so output was not yet updated.

> - tcpdump didn't terminate properly

not sure about that.

>  testing/pluto/nflog-03-conns failed west:output-different
>
> - who knows
>  testing/pluto/newoe-06-prio failed east:output-different road:output-different
> 	-000 #1: "road-east-ikev2" --- cut ---

I don't know what those cut lines are and why they appear. Its a test
anomaly.

> - bonus xfrms

If you see "transport mode" bonus xfrms, those are the larval states.
They seem to linger sometimes for a bit and then we see them. It's
harmless (but annoying)

>  testing/pluto/newoe-15-portpass failed road:output-different
>  testing/pluto/newoe-18-private-clearall failed road:output-different
>  testing/pluto/newoe-19-poc-poc-clear failed road:output-different
>  testing/pluto/certoe-07-nat-2-clients failed road:output-different
>
> - state serial numbers differ

Again, I'm a little concerned why this happens.

>  testing/pluto/delete-sa-04 failed east:output-different
>  testing/pluto/delete-sa-05 failed east:output-different west:output-different
> 	(also different conn chosen?)
>  testing/pluto/klips-basic-pluto-01 failed east:output-different
>
> - mysteries

Needs looking at.

>  testing/pluto/nat-pluto-04 failed east:output-different
> 	-? (192.1.2.254) at 12:00:00:de:ad:ba [ether] on eth1

That flips as well on some tests. I don't know why or how to avoid it.

>  testing/pluto/interop-ikev1-strongswan-11-ah-initiator-sha512 failed west:output-different
> 	-generating QUICK_MODE request 0123456789 [ HASH ]
> 	-sending packet: from 192.1.2.45[500] to 192.1.2.23[500] (XXX bytes)

seems strongswan version dependent too.

>  testing/pluto/interop-ikev2-strongswan-35-ipsec-rekey failed west:output-different
> 	-westnet-eastnet-ikev2{6}:  DELETING, TUNNEL, reqid 1
> 	+westnet-eastnet-ikev2{6}:  REKEYING, TUNNEL, reqid 1, expires in 10 seconds
> 	-westnet-eastnet-ikev2{7}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: SPISPI_i SPISPI_o
> 	-westnet-eastnet-ikev2{7}:   192.0.1.0/24 === 192.0.2.0/24

That is also a race condition of strongswan. I've prevented some with a
sleep 1 statement.

>  testing/pluto/dnssec-pluto-01 failed west:output-different
> 	-000 "westnet-eastnet-etc-hosts-auto-add": 192.0.1.0/24===192.1.2.45<192.1.2.45>[@west]...192.1.2.23<east-from-hosts-file>[@east]===192.0.2.0/24; unrouted; eroute owner: #0
> 	-000 "westnet-eastnet-etc-hosts-auto-add":     oriented; my_ip=unset; their_ip=unset; my_updown=ipsec _updown;

That is still a bug that needs fixing. It suggests that /etc/hosts was
not properly read as part of the libunbound resolver context. It's been
buggy for a number of releases.

>  testing/pluto/nss-cert-10-notyetvalid-responder-ikev2 failed east:output-different west:output-different
>  east:
> 	+"nss-cert" #2: ERROR: netlink response for Del SA esp.ESPSPIi at 192.1.2.45 included errno 3: No such process
> 	+"nss-cert" #2: ERROR: netlink response for Del SA esp.ESPSPIi at 192.1.2.23 included errno 3: No such process
>  west a mess.  But:
> 	-003 "nss-cert" #2: Certificate E=testing at libreswan.org,CN=notyetvalid.testing.libreswan.org,OU=Test Department,O=Libreswan,L=Toronto,ST=Ontario,C=CA failed verification
> 	-003 "nss-cert" #2: ERROR: Peer's Certificate has expired.

The notyetvalid tests depend on libfaketime, which does not seem to work
on all distros. I suspect it needed a never version and when people
reinstalled to stock f22, it broke again.

>  testing/pluto/seccomp-03-updown failed road:output-different
> 	+Jan 27 16:47:44 unbound[1091:0] error: can't bind socket: Cannot assign requested address for ::1
> 	+Jan 27 16:47:44 unbound[1091:0] fatal error: could not open ports

That's because unbound fails to start when there is no ipv6. Probably
easiest just to tell swan-prep to do -6

>  testing/pluto/ipsec-hostkey-ckaid-02-fips failed west:output-different
> 	-Generated RSA key pair with CKAID <<CKAID#1>> was stored in the NSS database
> 	+FIPS HMAC integrity verification test failed.

It seems this fails for installs in /usr/local/
I blame Andrew.

>  testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail failed west:output-different
> 	-003 "westnet-eastnet-ipv4-psk-ppk" #1: connection requires PPK, but PPK_ID did not match any loaded PPK
> 	+003 "westnet-eastnet-ipv4-psk-ppk" #1: connection requires PPK, but we didn't find one
> 	+leak: fork pid, item size: 64
> 	+leak detective found 1 leaks, total size 64

I'll double check with Vukasin. Although fork pid is not likely his
fault :)

>  testing/pluto/ikev2-ppk-static-09-no_ppk-initiator failed east:output-different
> 	- # TODO  put in a grep line confirming NO PPK usage
> 	-east #

I'll fix that one up too.

> - leaks (known to Andrew?)

I'll leave that to Andrew then :)

> - seemed to have more tunnels than expected
>  testing/pluto/multinet-01 failed west:output-different

Would need to investigate

> - early stop?
>  testing/pluto/klips-netkey-pluto-06 failed east:output-different west:output-different

if final.sh runs a status or trafficstatus and also shuts down pluto for
a leak report, there is a race between nodes. If one shuts down fast,
the other won't see the proper status because it will have processed the
deletes from the other peer's shutdown. The rule is to not have status
and shutdown in final.sh.

> - stale reference logs?
>  testing/pluto/interop-ikev2-strongswan-07-strongswan failed west:output-different
> 	+received AUTH_LIFETIME of XXXXs, scheduling reauthentication in XXXXs
> 	+peer supports MOBIKE

strongswan versioning.

> - requid changed
>  testing/pluto/interop-ikev2-strongswan-35-responder-rekey-pfs failed west:output-different
> 	-westnet-eastnet-ikev2{2}:  DELETING, TUNNEL, reqid 1
> 	+westnet-eastnet-ikev2{2}:  REKEYING, TUNNEL, reqid 1, expires in 59 minutes
> 	and more

Its the same race about delete/rekeying you are catching (see earlier)

> - cert oddity
>  testing/pluto/nss-cert-09-notyetvalid-initiator failed east:output-different west:output-different
> 	-"nss-cert" #1: ERROR: Peer's Certificate has expired.

see libfaketime earlier. also if you do not renew the certificates
within one week of generating them, these tests start to fail.

> - key oddity
>  testing/pluto/nss-cert-nosecret failed west:output-different
> 	-000 TIMESTAMP, 1024 RSA Key AwXXXXXXX (has private key), until TIMESTAMP ok
> 	+000 TIMESTAMP, 1024 RSA Key AwXXXXXXX (no private key), until TIMESTAMP ok

based on test name it seems the new output is right. Just not sure why
it was wrong before. Needs looking at.

Paul