[Swan-dev] Current IKEv2 insist/pending/rekey issues

Matt Rogers mrogers at redhat.com
Mon Jun 3 23:38:08 EEST 2013


Hi all, I've been investigating a few IKEv2 issues and I'd like some feedback on the problems and the fixes that I've included. All problems here are seen in the current libreswan git and also on the latest RHEL Openswan versions.

1. ikev2=insist as responder. This one is simple to reproduce, just set ikev2=insist on the responder and initiate with IKEv1. You'll notice that the responder still accepts the v1 proposal, even though v1 is disabled in the policy. The ipsec.conf man page states that ikev2=insist means we should only propose and accept v2. The fix for this was simple, see v2insist.patch. 

2. IKEv2 parent SA's are always in a pending state. This one manifests in an interesting way and is reproducible when enabling a dpdtimeout on a IKEv2 connection such as:

conn v2md5
        left=west
        right=east
        ike=3des-md5;modp1024
        phase2alg=3des-md5
        ikev2=insist
        authby=secret
        auto=start
        dpdaction=restart
        dpddelay=30
        dpdtimeout=300
        pfs=no

What you'll notice in the logs is that while the SA is established and has a default timeout:

Jun  3 15:18:13 east pluto[32036]: "v2md5" #2: transition from state STATE_PARENT_I2 to state STATE_PARENT_I3
Jun  3 15:18:13 east pluto[32036]: "v2md5" #2: negotiated tunnel [192.168.1.113,192.168.1.113:0-65535 0] -> [192.168.1.112,192.168.1.112:0-65535 0]
Jun  3 15:18:13 east pluto[32036]: | NAT-T: their IKE port is '500'
Jun  3 15:18:13 east pluto[32036]: | NAT-T: forceencaps is 'disabled'
Jun  3 15:18:13 east pluto[32036]: "v2md5" #2: STATE_PARENT_I3: PARENT SA established tunnel mode {ESP=>0x03312823 <0xb7197a49 xfrm=3DES_192-HMAC_MD5 NATOA=none NATD=none DPD=none}
Jun  3 15:18:13 east pluto[32036]: | releasing whack for #2 (sock=-1)
Jun  3 15:18:13 east pluto[32036]: | releasing whack for #1 (sock=-1)
Jun  3 15:18:13 east pluto[32036]: | deleting event for #2
Jun  3 15:18:13 east pluto[32036]: | inserting event EVENT_SA_REPLACE, timeout in 28790 seconds for #2

EVENT_PENDING_PHASE2 will continually run against the child SA since the parent is still considered pending:

Jun  3 15:24:11 east pluto[32036]: | pending review: connection "v2md5" checked
Jun  3 15:24:11 east pluto[32036]: | checking connection "v2md5" for stuck phase 2s (1370287093+ 3*300) <= 1370287451

Eventually the SA is considered stuck, and is replaced:

Jun  3 15:34:11 east pluto[32036]: | checking connection "v2md5" for stuck phase 2s (1370287093+ 3*300) <= 1370288051
Jun  3 15:34:11 east pluto[32036]: | connection "v2md5" stuck, restarting
Jun  3 15:34:11 east pluto[32036]: pending Quick Mode with 192.168.1.112 "v2md5" took too long -- replacing phase 1
Jun  3 15:34:11 east pluto[32036]: | processing connection v2md5
Jun  3 15:34:11 east pluto[32036]: | kernel_alg_db_new() initial trans_cnt=128
Jun  3 15:34:11 east pluto[32036]: | kernel_alg_db_new() will return p_new->protoid=3, p_new->trans_cnt=1
Jun  3 15:34:11 east pluto[32036]: | kernel_alg_db_new()     trans[0]: transid=3, attr_cnt=1, attrs[0].type=5, attrs[0].val=1
Jun  3 15:34:11 east pluto[32036]: | returning new proposal from esp_info
Jun  3 15:34:11 east pluto[32036]: | creating state object #3 at 0x7f2f57b9a6d0
Jun  3 15:34:11 east pluto[32036]: | processing connection v2md5
Jun  3 15:34:11 east pluto[32036]: | ICOOKIE:  83 d4 cf 56  d9 91 ef 4a
Jun  3 15:34:11 east pluto[32036]: | RCOOKIE:  00 00 00 00  00 00 00 00
Jun  3 15:34:11 east pluto[32036]: | state hash entry 11
Jun  3 15:34:11 east pluto[32036]: | inserting state object #3
Jun  3 15:34:11 east pluto[32036]: | inserting event EVENT_SO_DISCARD, timeout in 0 seconds for #3
Jun  3 15:34:11 east pluto[32036]: | event added at head of queue
Jun  3 15:34:11 east pluto[32036]: | processing connection v2md5
Jun  3 15:34:11 east pluto[32036]: | Queuing pending Quick Mode with 192.168.1.112 "v2md5"
Jun  3 15:34:11 east pluto[32036]: "v2md5" #3: initiating v2 parent SA

>From this point on the SA has a forced rekey with each phase2 connection check. The solution here was to unpend the parent SA inside of success_v2_state_transition. Although unpend() was not very friendly to a v2 parent SA and I needed to avoid the quick_outI1 that happens there. See v2unpend.patch.

3. So with the unpend changes (or by not setting a dpdtimeout), the v2 SA's are good to rekey normally, right?  Not quite :(

If you modify that configuration to read:

conn v2md5
        left=west
        right=east
        ike=3des-md5;modp1024
        phase2alg=3des-md5
        ikev2=insist
        authby=secret
        auto=start  
        ikelifetime=10m
        salifetime=3m
        rekeymargin=10s
        rekeyfuzz=0%
        pfs=no

You'll now notice that whenever EVENT_SA_REPLACE is executed on a child SA, a v1 proposal takes its place, and an assert happens.

Jun  3 15:55:05 east pluto[32218]: | next event EVENT_SA_REPLACE in 3 seconds for #2
Jun  3 15:55:05 east pluto[32218]: | next event EVENT_SA_REPLACE in 3 seconds for #2
Jun  3 15:55:08 east pluto[32218]: |
Jun  3 15:55:08 east pluto[32218]: | next event EVENT_SA_REPLACE in 0 seconds for #2
Jun  3 15:55:08 east pluto[32218]: | *time to handle event
Jun  3 15:55:08 east pluto[32218]: | handling event EVENT_SA_REPLACE
Jun  3 15:55:08 east pluto[32218]: | event after this is EVENT_PENDING_DDNS in 8 seconds
Jun  3 15:55:08 east pluto[32218]: | processing connection v2md5
Jun  3 15:55:08 east pluto[32218]: "v2md5" #2: replacing stale IPsec SA
Jun  3 15:55:08 east pluto[32218]: | creating state object #3 at 0x7f0561d866a0
Jun  3 15:55:08 east pluto[32218]: | processing connection v2md5
Jun  3 15:55:08 east pluto[32218]: | ICOOKIE:  63 d6 d0 75  c8 aa bc e2
Jun  3 15:55:08 east pluto[32218]: | RCOOKIE:  00 00 00 00  00 00 00 00
Jun  3 15:55:08 east pluto[32218]: | state hash entry 24
Jun  3 15:55:08 east pluto[32218]: | inserting state object #3
Jun  3 15:55:08 east pluto[32218]: | inserting event EVENT_SO_DISCARD, timeout in 0 seconds for #3
Jun  3 15:55:08 east pluto[32218]: | event added at head of queue
Jun  3 15:55:08 east pluto[32218]: | processing connection v2md5
Jun  3 15:55:08 east pluto[32218]: | Queuing pending Quick Mode with 192.168.1.112 "v2md5"
Jun  3 15:55:08 east pluto[32218]: "v2md5" #3: initiating Main Mode
Jun  3 15:55:08 east pluto[32218]: | **emit ISAKMP Message:
Jun  3 15:55:08 east pluto[32218]: |    initiator cookie:
Jun  3 15:55:08 east pluto[32218]: |   63 d6 d0 75  c8 aa bc e2
Jun  3 15:55:08 east pluto[32218]: |    responder cookie:
Jun  3 15:55:08 east pluto[32218]: |   00 00 00 00  00 00 00 00
Jun  3 15:55:08 east pluto[32218]: |    next payload type: ISAKMP_NEXT_SA
Jun  3 15:55:08 east pluto[32218]: |    ISAKMP version: ISAKMP Version 1.0 (rfc2407)
Jun  3 15:55:08 east pluto[32218]: |    exchange type: ISAKMP_XCHG_IDPROT
Jun  3 15:55:08 east pluto[32218]: |    flags: none
Jun  3 15:55:08 east pluto[32218]: |    message ID:  00 00 00 00
Jun  3 15:55:08 east pluto[32218]: "v2md5" #3: ASSERTION FAILED at /root/libreswan/programs/pluto/spdb_struct.c:192: emp_sp->prop_conj_cnt == 1
Jun  3 15:55:08 east pluto[32218]: "v2md5" #3: using kernel interface: netkey
Jun  3 15:55:08 east pluto[32218]: "v2md5" #3: interface lo/lo ::1
Jun  3 15:55:08 east pluto[32218]: "v2md5" #3: interface lo/lo 127.0.0.1
Jun  3 15:55:08 east pluto[32218]: "v2md5" #3: interface lo/lo 127.0.0.1

This was due to a few decisions in pick_initiator that resulted in choosing a v1 initiator function even though our policy has v1 disabled. The assert happens later on not because of the incorrect function, but because the policy was mostly reset. The assert was still happening on the v2 initiator function, so restoring the policy from the connection right before the function is called allows the rekey to happen properly. See v2rekey.patch.

As an aside, It seems that right now DPD settings doen't have any effect on v2 SA's and while we do process informational exchanges, we are missing the ability to send blank ones to use as liveness checks. Any plans/thoughts on this?

One final off topic thing, I always have to make this change to initsystems/systemd/ipsec.service.in on Fedora18 systems:

-ExecStartPre=@FINALSBINDIR@/ipsec addconn --config @FINALCONFFILE@ --checkconfig
+ExecStartPre=@FINALLIBDIR@/addconn --config @FINALCONFFILE@ --checkconfig

For some reason, /usr/local/sbin/ipsec addconn doesn't always honor my auto= settings, but /usr/local/libexec/ipsec/addconn does. I haven't really looked into it much so just seeing if anyone else has noticed that.

Thanks and regards,
Matt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2rekey.patch
Type: text/x-patch
Size: 1693 bytes
Desc: not available
URL: <https://lists.libreswan.org/pipermail/swan-dev/attachments/20130603/8c1dd3ab/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2unpend.patch
Type: text/x-patch
Size: 2658 bytes
Desc: not available
URL: <https://lists.libreswan.org/pipermail/swan-dev/attachments/20130603/8c1dd3ab/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v2insist.patch
Type: text/x-patch
Size: 759 bytes
Desc: not available
URL: <https://lists.libreswan.org/pipermail/swan-dev/attachments/20130603/8c1dd3ab/attachment-0002.bin>


More information about the Swan-dev mailing list