[Swan-dev] crash after pluto: Fix addresspool reference count

wolfgang at linogate.de wolfgang at linogate.de
Fri Oct 6 19:29:33 UTC 2017


On Fri, 6 Oct 2017 19:24:39 +0200, Antony Antony wrote
> On Thu, Oct 05, 2017 at 09:57:06PM +0200, Wolfgang Nothdurft wrote:
> > On 05.10.2017 at 20:57, Antony Antony wrote:
> > > On Thu, Oct 05, 2017 at 08:36:52PM +0200, Wolfgang Nothdurft wrote:
> > > > On 05.10.2017 at 20:18, Antony Antony wrote:
> > > > > Wow, this patch looks like a heavy hammer solution. To reference
> > > > > count the pool for each lease? There is something else going on.
> > > > > I imagine reproducing #299 will give more info. I also wonder why
> > > > > there is no unreference when the lease goes away. Did you check
> > > > > for a memory leak after this patch?
> > > > > 
> > > > > Thanks for the proposed patch, it gave a bit more insight into the
> > > > > issue.
> > > > > 
> > > > 
> > > > A memory leak is not the problem, because at the moment
> > > > unreference_addresspool is called too often.
> > > > 
> > > > My final solution at the moment is to call unreference_addresspool
> > > > from the release-leases function and when the non-instance connection
> > > > is deleted.
> > > > 
> > > > The question is what the refcount stands for. If it is only for
> > > > installing an addresspool, it is not necessary in my opinion. But I'm
> > > > not as deep in the code as the one who wrote it initially.
> > > 
> > > An addresspool is shared between connections. Each connection adds one
> > > reference count. I think a connection instance may also add a reference
> > > count, I am not sure any more.
> > > 
> > > A lease should not add a reference count to the pool. At least that is
> > > the idea.
> > > 
> > > I will look into it soon, probably tomorrow.
> > > 
> > 
> > Ah ok, then it is easy. Then the unreference call is wrong and should only
> > be called when the non-instance connection is deleted.
> > 
> > I have updated lsw#299 with the final patch.
> 
> Thanks for the new patch. I reviewed it and realized this would 
> break when deleting an established connection.
> 
> Here is the core dump after applying the patch,
> https://bugs.libreswan.org/attachment.cgi?id=112
> 
> To reproduce, run test xauth-pluto-16. After the connection from road is 
> established on east, the responder, delete it:
> ipsec auto --delete modecfg-east-21
> 
> and pluto crashes. If I remember correctly, the reason is that when 
> deleting a connection pluto deletes the CK_TEMPLATE first. So both 
> instance and template should refcount.
> 
> (gdb) bt
> #0  0x000055b6925de68c in rel_lease_addr (c=0x7fe2b3bdeb08)
>     at /home/build/libreswan/programs/pluto/addresspool.c:183
> #1  0x000055b6925f0c08 in delete_connection (c=0x7fe2b3bdeb08, relations=false)
>     at /home/build/libreswan/programs/pluto/connections.c:282
> #2  0x000055b6925f1471 in delete_connections_by_name (
>     name=0x7fff367a1960 "modecfg-east-21", strict=true)
>     at /home/build/libreswan/programs/pluto/connections.c:415
> #3  0x000055b69266dbda in whack_process (whackfd=27, m=0x7fff367a1440)
>     at /home/build/libreswan/programs/pluto/rcv_whack.c:392
> #4  0x000055b69266eaae in whack_handle (whackctlfd=4)
>     at /home/build/libreswan/programs/pluto/rcv_whack.c:779
> #5  0x000055b69266e73b in whack_handle_cb (fd=4, event=2, arg=0x0)
>     at /home/build/libreswan/programs/pluto/rcv_whack.c:679
> #6  0x00007fe2ba8693f9 in event_persist_closure (ev=0x7fe2b3f92f70,
>     base=0x7fe2b3b1fd80) at event.c:1319
> #7  event_process_active_single_queue (activeq=0x7fe2b3b25ff0, base=0x7fe2b3b1fd80)
>     at event.c:1363
> #8  event_process_active (base=<optimized out>) at event.c:1438
> #9  event_base_loop (base=0x7fe2b3b1fd80, flags=0) at event.c:1639
> #10 0x000055b692615d17 in main_loop ()
>     at /home/build/libreswan/programs/pluto/server.c:813
> #11 0x000055b692616270 in call_server ()
>     at /home/build/libreswan/programs/pluto/server.c:946
> #12 0x000055b692612b3d in main (argc=5, argv=0x7fff367a53b8)
>     at /home/build/libreswan/programs/pluto/plutomain.c:1814
> 
> Maybe you need to share address pools too, I am not sure.

Sorry, I missed that the initial problem was triggered with a static IP
configured in /etc/ipsec.d/passwd.
 
I have attached a patch for the xauth-pluto-22 test that reproduces lsw#299
with v3.21. It also triggers the rel_lease_addr crash with my current patch.

The actual problem seems to occur when a new addresspool is installed from
ikev1_xauth.c.
This code is originally mine, and I think when I implemented it I overlooked
that the pool is shared and not copied for the instance.
I can look into reworking it next week.
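
To make the model from this thread concrete, here is a minimal sketch of how I
now understand it: the pool is shared, every connection (template and instance)
holds one reference, and a lease does not touch the refcount. This is not pluto
code. All names (pool_sketch, conn_sketch, conn_attach_pool, conn_detach_pool,
lease_take, lease_release) are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real pluto structures. */
struct pool_sketch {
    unsigned refcnt;  /* number of connections sharing this pool */
    unsigned leases;  /* leases handed out; deliberately not part of refcnt */
};

struct conn_sketch {
    struct pool_sketch *pool;  /* shared pointer, never a private copy */
};

/* Allocate an empty pool with no references yet. */
static struct pool_sketch *pool_new(void)
{
    return calloc(1, sizeof(struct pool_sketch));
}

/* A connection (template or instance) takes one reference on the pool. */
static void conn_attach_pool(struct conn_sketch *c, struct pool_sketch *p)
{
    p->refcnt++;
    c->pool = p;
}

/*
 * A connection drops its reference exactly once, when it is deleted.
 * Returns 1 if this was the last reference and the pool was freed.
 */
static int conn_detach_pool(struct conn_sketch *c)
{
    struct pool_sketch *p = c->pool;

    if (p == NULL)
        return 0;           /* already detached; nothing to do */
    c->pool = NULL;         /* guard against a second unreference */
    assert(p->refcnt > 0);
    if (--p->refcnt == 0) {
        free(p);
        return 1;
    }
    return 0;
}

/* Taking or releasing a lease changes only the lease count. */
static void lease_take(struct conn_sketch *c)
{
    c->pool->leases++;
}

static void lease_release(struct conn_sketch *c)
{
    if (c->pool != NULL && c->pool->leases > 0)
        c->pool->leases--;
}
```

With this model, deleting the template first leaves the pool alive as long as
an instance still references it, so a later lease release on the instance does
not touch freed memory.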
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xauth-pluto-22-lsw299-crash
Type: application/octet-stream
Size: 2004 bytes
Desc: not available
URL: <https://lists.libreswan.org/pipermail/swan-dev/attachments/20171006/d2338e89/attachment.obj>

