[Swan-dev] found in my postponed folder: Re: xfrmi branch (fwd)

Paul Wouters paul at nohats.ca
Fri Jan 17 13:37:27 UTC 2020


I found this one in my postponed folder. I am not sure what parts are
still relevant or not.

Paul


On Mon, 30 Sep 2019, Paul Wouters wrote:

>  Subject: xfrmi branch

I looked further into my vpn.nohats.ca client issue, and found that
the following diff breaks it for me (even when leftiface-id is
not set):

-    oops="$(eval ${it} 2>&1)"
-    st=$?
-    if [ -z "${oops}" -a ${st} -ne 0 ]; then
-       oops="silent error, exit status ${st}"
+
+    if [ "${ROUTE}" = "yes" -o "${XFRMI_ROUTE}" = "yes" ]; then
+        oops="$(eval ${it} 2>&1)"
+       st=$?
+
+       st_r=$(eval err_check ${st} ${oops} "$it")
+       if [ ${st_r} -ne 0 ]; then
+           return ${st}
+       fi

This changes the eval of $it to happen only if ROUTE or XFRMI_ROUTE is
set. But if you look slightly above it in the script, it= is only ever
set if PLUTO_PEER_CLIENT is 0.0.0.0/0, so that we get our halfroutes.
This is why it breaks my vpn client connection with leftiface-id=no.
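
To illustrate the failure mode, here is a minimal sketch, not the real
updown script. The function names build_it, run_gated and run_ungated
are made up for this example, and "echo" stands in for actually running
"ip route", so nothing touches the routing table. It assumes, as
described above, that it= is only set for a 0.0.0.0/0 peer client.

```shell
#!/bin/sh
# Hypothetical reduction of the updown route logic (names are illustrative).

# Build the route command only for a 0.0.0.0/0 peer client (the halfroutes).
build_it() {
    peer_client="$1"
    it=""
    if [ "${peer_client}" = "0.0.0.0/0" ]; then
        # "echo" replaces the real command so the sketch has no side effects.
        it="echo ip route replace 0.0.0.0/1 && echo ip route replace 128.0.0.0/1"
    fi
    printf '%s' "${it}"
}

# Gated variant, mirroring the diff: eval only if ROUTE or XFRMI_ROUTE is yes.
run_gated() {
    it="$(build_it "$1")"
    if [ "${ROUTE:-no}" = "yes" ] || [ "${XFRMI_ROUTE:-no}" = "yes" ]; then
        eval "${it}"
    fi
}

# Ungated variant, mirroring the old behaviour: eval whenever it= is non-empty.
run_ungated() {
    it="$(build_it "$1")"
    if [ -n "${it}" ]; then
        eval "${it}"
    fi
}
```

With ROUTE and XFRMI_ROUTE both unset, run_gated 0.0.0.0/0 prints
nothing, while run_ungated 0.0.0.0/0 prints both halfroute commands.
That is the regression: the halfroutes are silently skipped.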

So the patch for this is basically:


             it="ip route ${cmd} 0.0.0.0/1 ${parms2} && ip route ${cmd} 128.0.0.0/1 ${parms2}"
+           HALFROUTES=yes

[...]

      if [ "${ROUTE}" = "yes" -o "${XFRMI_ROUTE}" = "yes" ]; then
+       if [ "${HALFROUTES}" = "no" ]; then


I also confirmed that when using XFRMi and scope 50, we do not need to
set any halfroutes and can omit any "via" parameter, and things work,
but that needs a second fix.

This is the second patch I needed:

       # use nexthop if nexthop is not %direct and POINTPOINT is not set
       if [ "${PLUTO_NEXT_HOP}" != "${PLUTO_PEER}" -a -z "${POINTPOINT}" ]; then
-       parms2="via ${PLUTO_NEXT_HOP}"
+       # XFRM interface needs no nexthop
+       if [ -z "${PLUTO_XFRMI_ROUTE}"  ]; then
+          parms2="via ${PLUTO_NEXT_HOP}"
+       fi

Otherwise we try to set a route the kernel refuses and packet flow is
broken because the scope 50 table isn't set properly.
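
The nexthop decision can be sketched as a small standalone function.
This is only an illustration under my own assumptions: choose_parms2 is
a made-up name, and it takes its inputs as arguments instead of reading
the PLUTO_* environment variables the way the real script does.

```shell
#!/bin/sh
# Hypothetical helper: choose_parms2 NEXT_HOP PEER POINTPOINT XFRMI_ROUTE
# Prints the "via ..." route parameter, or nothing when no nexthop is wanted.
choose_parms2() {
    next_hop="$1"
    peer="$2"
    pointpoint="$3"
    xfrmi_route="$4"
    parms2=""
    # Use nexthop only if it differs from the peer and POINTPOINT is not set.
    if [ "${next_hop}" != "${peer}" ] && [ -z "${pointpoint}" ]; then
        # An XFRM interface route needs no nexthop, so leave parms2 empty.
        if [ -z "${xfrmi_route}" ]; then
            parms2="via ${next_hop}"
        fi
    fi
    printf '%s' "${parms2}"
}
```

For example, choose_parms2 192.0.2.254 198.51.100.1 "" "" prints
"via 192.0.2.254", while passing "yes" as the fourth argument prints
nothing.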

With these two fixes, my vpn.nohats.ca client works with and without
leftiface-id=yes

I added a test case ikev2-xfrmi-05-remote-access-client, copied from a
test case without xfrm interfaces enabled. So a regression in either
case will show up with a test failure.

Note there are still minor issues with the updown script. We still get
a few non-fatal errors that show up in the "good" reference output.
Those still need fixing.

I reran the ikev2-xfrmi test cases and they still pass with these
changes.

Note, when testing my updown changes, I ran the xfrmi test cases, and
ikev2-xfrmi-01 once showed a crasher on east:

|  expiring aged bare shunts from shunt table
|  spent 0.00563 milliseconds in global timer EVENT_SHUNT_SCAN
|  processing global timer EVENT_SHUNT_SCAN
|  expiring aged bare shunts from shunt table
|  spent 0.00418 milliseconds in global timer EVENT_SHUNT_SCAN
in event_schedule (type=<optimized out>, delay=..., st=0x55d765980000)
at
/usr/src/debug/libreswan-3.28-0.rc877_ga79330704c_xfrmi.x86_64/programs/pluto/timer.c:637
#5  0x000055d7641634d3 in timer_event_cb (unused_fd=<optimized out>,
unused_event=<optimized out>, arg=<optimized out>) at
/usr/src/debug/libreswan-3.28-0.rc877_ga79330704c_xfrmi.x86_64/programs/pluto/timer.c:336
#6  0x00007fd106615a5a in event_process_active_single_queue
(base=base@entry=0x55d765967e40, activeq=0x55d765968100,
max_to_process=max_to_process@entry=2147483647,
endtime=endtime@entry=0x0) at event.c:1646
#7  0x00007fd10661630f in event_process_active (base=0x55d765967e40) at
event.c:1738
#8  event_base_loop (base=0x55d765967e40, flags=flags@entry=0) at
event.c:1961
#9  0x000055d764160865 in call_server () at
/usr/src/debug/libreswan-3.28-0.rc877_ga79330704c_xfrmi.x86_64/programs/pluto/server.c:1496

which maps to the timer event hitting the default case and expecting no
st_event but having one:

          case  EVENT_v1_SEND_XAUTH:
                  passert(st->st_send_xauth_event == NULL);
                  st->st_send_xauth_event = ev;
                  break;

          default:
                  passert(st->st_event == NULL);
                  st->st_event = ev;
                  break;
          }

When this happened, I was running namespace-based testing, but I also had
ipsec1 to vpn.nohats.ca up on my bare metal. I stopped the host ipsec,
killed all pluto processes, and reran the test a few times, and no
crasher happened?

Paul


