[Swan-dev] analyses of regression in test ikev2-ike-rekey-03

Paul Wouters paul at nohats.ca
Fri Jun 15 21:52:20 UTC 2018


On Wed, 6 Jun 2018, Paul Wouters wrote:

> On Tue, 5 Jun 2018, Antony Antony wrote:
>
>>  I noticed a regression in test ikev2-ike-rekey-03. I was hunting an issue
>>  Tuomo found. While trouble shooting it I noticed this regression.
>
>>  http://testing.libreswan.org/results/testing/v3.22-1500-g2efd341-master/ikev2-ike-rekey-03/

I looked at this test case. Clearly there is a bug now, but the
situation is complicated by the fact that the test case seems
flawed to begin with due to the extremely short rekey timers used.

This is my observation on east:

[initial setup of all tunnels]

westnet-eastnet-ikev2c #1 IKE SA
westnet-eastnet-ikev2a #2 [192.0.2.0-192.0.2.255:0-65535 0] -> [192.0.1.0-192.0.1.255:0-65535 0]  ESP=>0x030eb7c4 <0xeb332c41
westnet-eastnet-ikev2b #3 [192.0.200.0-192.0.200.255:0-65535 0] -> [192.0.100.0-192.0.100.255:0-65535 0]  ESP=>0xc8c58945 <0x71312be9
westnet-eastnet-ikev2c #4 [192.0.201.0-192.0.201.255:0-65535 0] -> [192.0.101.0-192.0.101.255:0-65535 0]  ESP=>0x98a747b7 <0x4513d3f4

[rekey event as responder]
westnet-eastnet-ikev2a #5: negotiated new IPsec SA [192.0.2.0-192.0.2.255:0-65535 0] -> [192.0.1.0-192.0.1.255:0-65535 0] ESP=>0xd09a6b80 <0x13cdc164

[timer event unrelated to anything of remote peer kicks in]
| handling event EVENT_SA_EXPIRE for child state #4
westnet-eastnet-ikev2c #4: deleting state (STATE_V2_IPSEC_R)
[this means 192.0.201.0/24 - 192.0.101.0/24 tunnel is now BROKEN]

[Remote deletes old ipsec SA which was just rekeyed]
westnet-eastnet-ikev2c #1: received Delete SA payload: delete IPSEC State #2 now
westnet-eastnet-ikev2a #2: deleting other state #2 connection (STATE_V2_IPSEC_R) "westnet-eastnet-ikev2a"

[remote rekey event]
"westnet-eastnet-ikev2b" #6: negotiated new IPsec SA [192.0.200.0-192.0.200.255:0-65535 0] -> [192.0.100.0-192.0.100.255:0-65535 0] ESP=>0x304bf497 <0x970d9cc0

[remote delete ipsec sa which was just rekeyed]
"westnet-eastnet-ikev2c" #1: received Delete SA payload: delete IPSEC State #3 now

[next remote sends rekey ipsec sa for #4 which we already expired by timer!]
"westnet-eastnet-ikev2c" #1: CREATE_CHILD_SA no such IPsec SA to rekey SA(0x98a747b7) Protocol PROTO_v2_ESP
"westnet-eastnet-ikev2c" #7: responding to CREATE_CHILD_SA message (ID 8) from 192.1.2.45:500 with encrypted notification CHILD_SA_NOT_FOUND

[remote then tries to delete the thing we told it we already don't know anything about]
#1: received delete request for PROTO_v2_ESP SA(0x98a747b7) but corresponding state not found

[remote replaces #4 which we already deleted]
"westnet-eastnet-ikev2c" #8: negotiated new IPsec SA [192.0.201.0-192.0.201.255:0-65535 0] -> [192.0.101.0-192.0.101.255:0-65535 0] ESP=>0x63221c98 <0x3d570d8c
[ this means 192.0.201.0/24 - 192.0.101.0/24 tunnel is now repaired]

[remote rekeys #5 with #9, but our end then mistakenly deletes wrong state]
"westnet-eastnet-ikev2a" #9: negotiated new IPsec SA [192.0.2.0-192.0.2.255:0-65535 0] -> [192.0.1.0-192.0.1.255:0-65535 0] ESP=>0x10cb8a4c <0x9b32c05b
"westnet-eastnet-ikev2c" #8: deleting state (STATE_V2_IPSEC_R)
[note westnet-eastnet-ikev2a vs westnet-eastnet-ikev2c]

[remote deletes the replaced ipsec sa]
"westnet-eastnet-ikev2c" #1: received Delete SA payload: delete IPSEC State #5 now

And it goes on.

So the first problem is that #4 is deleted before it is rekeyed on east
and everything is a bit confused from there on.

But I think the real issue is that we have code that doesn't fully
understand the concept of shared IKE SA. With the IKE SA being on
"westnet-eastnet-ikev2c", it seems that it deletes another conn's
IPsec SA (on westnet-eastnet-ikev2a). Maybe this is because it didn't
find one on westnet-eastnet-ikev2c, because a timer just deleted it.


I think that instead of this playing with timers, it is worth
introducing a new whack command that rekeys an IKEv2 SA, so that we can
do rekeying without using 30 second timers in our test cases.

Looking at if the IPsec SA's properly migrate to the new IKE SA I am
observing the following:

[first incoming IKE SA rekey for #1]
| "westnet-eastnet-ikev2c" #1 received IKE Rekey Request CREATE_CHILD_SA from 192.1.2.45:500 Child "westnet-eastnet-ikev2c" #11 in STATE_V2_REKEY_IKE_R will process it further

We know that 3 IPsec SA's should move from #1 to #11

| #11 inherit #10 from parent #1
| #11 inherit #9 from parent #1
| #11 inherit #7 from parent #1

So that seems to work properly (unlike my expectation)

But the next one confuses me. I would expcet a IKE REKEY Request for
#11, but the next one I see is actually:

| "westnet-eastnet-ikev2c" #13 received IKE Rekey Request CREATE_CHILD_SA from 192.1.2.45:500 Child "westnet-eastnet-ikev2c" #17 in STATE_V2_REKEY_IKE_R will process it further

So where did #13 come from? Why is that not #11 ?

| switching from parent? #1 to child #11 in FSM processor

What does this mean? Surely it does not mean it switched from a parent
state to a child state because this involves none of the child IPsec
SA's. But the code indeed switches?

                 struct state *cst = process_v2_child_ix(md, st);
                 if (cst == NULL) {
                         /* no go. Could improve the status code? */
                         complete_v2_state_transition(mdp, STF_FAIL);
                         return;
                 }

                 DBGF(DBG_CONTROL,
                      "switching from parent? #%lu to child #%lu in FSM processor",
                      st->st_serialno, cst->st_serialno);

                /* switch from parent state to child state */
                 st = cst;
 	}

 	md->st = st;

It even updated the state associated to the md? That seems wrong.
This is a CREATE_CHILD_SA not involving any IPsec SA, yet we moved
state from IKE SA to IPsec SA? If that IPsec SA would have a different
IKE SA, the resulting IKE message would be undecryptable by the remote
endpoint. Also remember above we deleted #4, so what if there is no
IPsec SA state? The above code just fails out when cst == NULL.

Back to #11, it completes the new IKE SA. We now have two IKE SA's #1
and #11. And we then receive a delete for #1. Note that I see:

| STF_OK but no state object remains

and processing stops. So we did not send an informational delete answer.
(will this cause any msgid deadlock? Maybe not because it was the last
  message to be send/received over that IKE SA anyway)

The next message is from the new IKE SA.

"westnet-eastnet-ikev2c" #11: CREATE_CHILD_SA no such IPsec SA to rekey SA(0x63221c98) Protocol PROTO_v2_ESP

This is the state we (wrongly) deleted before. The remote assumed it migrated.

So what about #11 vs #13? It seems we just derail even more:

"westnet-eastnet-ikev2c" #11: received Delete SA payload: delete IPSEC State #10 now
"westnet-eastnet-ikev2b" #10: deleting other state #10 connection (STATE_V2_IPSEC_R) "westnet-eastnet-ikev2b"
"westnet-eastnet-ikev2b" #10: ESP traffic information: in=336B out=336B
"westnet-eastnet-ikev2c" #12: deleting other state #12 (STATE_CHILDSA_DEL)
"westnet-eastnet-ikev2c" #12: ERROR: netlink response for Del SA esp.549ef532 at 192.1.2.45 included errno 3: No such process
"westnet-eastnet-ikev2c" #12: ERROR: netlink response for Del SA esp.0 at 192.1.2.23 included errno 3: No such process
"westnet-eastnet-ikev2c" #7: deleting other state #7 (STATE_CHILDSA_DEL) "westnet-eastnet-ikev2c" #7: ERROR: netlink response for Del SA esp.65d93d00 at 192.1.2.45 included errno 3: No such process
"westnet-eastnet-ikev2c" #7: ERROR: netlink response for Del SA esp.0 at 192.1.2.23 included errno 3: No such process
"westnet-eastnet-ikev2c" #11: deleting state (STATE_IKESA_DEL)
packet from 192.1.2.45:500: proposal 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_512;DH=MODP2048 chosen from remote proposals
"westnet-eastnet-ikev2c" #13: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=aes_gcm_16_256 integ=n/a prf=sha2_512 group=MODP2048}

Things come crashing down, and a new IKE SA is started from scratch,
with I presume fresh IPsec SA's to be generated as well. And I assume
we do another loop with similar results.

I've stopped tracking it at this point. This specific test run has log
files updated to vault:/tmp/ikev2-ike-rekey-03 if someone wants to have
a look. But I think it would be better to not use short timers, and
invoke an IKE rekey via whack, and then observe what is happening.

Paul


More information about the Swan-dev mailing list