[Swan] Pluto consumes all available memory

Paul Wouters paul at nohats.ca
Wed May 20 22:26:30 EEST 2015


On Wed, 20 May 2015, Will Roberts wrote:

>       Which were? There must have been a lot of them?
> 
> These are the exact same leaks in the exact same quantity from my original report, nothing is building up over time and being detected.
> 
> May 20 18:01:57 sanfrancisco pluto[7864]: leak: 4 * struct event in event_schedule(), item size: 32
> May 20 18:01:57 sanfrancisco pluto[7864]: leak: pluto crypto helpers, item size: 64
> May 20 18:01:57 sanfrancisco pluto[7864]: leak detective found 5 leaks, total size 96

That's all? That's a leak of 96 bytes, so where did the memory go?

You did not link this against ElectricFence, right? Because that would
cause a lot of memory fragmentation, which could explain this.

>       You had 34623 states since startup. Usually that indicates tunnels that
>       are infinitely failing to establish. I suspect something in the error
>       path is leaking that memory.
> 
> Our monitoring system repeatedly sets up and tears down tunnels; its purpose
> is to verify that the system is able to accept incoming requests and properly
> create the tunnel. This process happens every 5 minutes, from each of our 3
> monitoring systems. So that is 864 tunnels a day, and with 2 states used per
> tunnel, 34623 is roughly 20 days worth of testing tunnels.

Ok that makes sense then.
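
For the record, the arithmetic can be checked with a quick sketch like the
one below. It uses only the figures quoted above (5 minute interval, 3
monitoring systems, 2 states per tunnel, state number 34623); the variable
names are just illustrative.

```python
# Back-of-the-envelope check of the state count, using the quoted figures.
MINUTES_PER_DAY = 24 * 60
probe_interval_min = 5      # each monitor sets up a tunnel every 5 minutes
monitors = 3                # number of monitoring systems
states_per_tunnel = 2       # states used per tunnel, as stated in the report
observed_states = 34623     # highest state number seen in the log

tunnels_per_day = monitors * MINUTES_PER_DAY // probe_interval_min
states_per_day = tunnels_per_day * states_per_tunnel
days = observed_states / states_per_day

print(tunnels_per_day)   # 864
print(states_per_day)    # 1728
print(round(days, 1))    # 20.0
```

So the state counter is consistent with roughly 20 days of monitoring
traffic and does not by itself point at tunnels failing in a loop.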

>             May 20 18:01:57 sanfrancisco pluto[7864]: "wonderproxy-L2TP" #34623: ERROR: netlink response for Del SA esp.b7f1c7e7 at 198.199.98.122 included
>             errno 3: No such process

>       Those are SAs the kernel deleted but pluto thought should still
>       be there. I'm confused about what would have happened to those.
> 
> These are ones that were detected as failures by our monitoring system. Here is the full log for #34622/34623:

> May 20 17:49:13 sanfrancisco pluto[7864]: "wonderproxy-L2TP"[15046] 69.90.78.100 #34623: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
> May 20 17:49:13 sanfrancisco pluto[7864]: "wonderproxy-L2TP"[15046] 69.90.78.100 #34623: STATE_QUICK_R1: sent QR1, inbound IPsec SA installed, expecting QI2
> May 20 17:49:13 sanfrancisco pluto[7864]: "wonderproxy-L2TP"[15046] 69.90.78.100 #34623: unable to popen up-host command

Ok, it failed to start the updown script, presumably because you already
ran out of memory. So that's a red herring.

> No, I don't see any failures/errors in the log until this issue is triggered.

So you are running out of memory on the machine. And you know for sure
it is pluto taking up that memory? Almost all memory allocations go
through leak-detective, which shows there are no leaks. So this is very
mysterious.
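
One way to confirm which process is growing is to sample its resident set
size over time. Below is a minimal sketch, assuming a Linux host where
/proc/<pid>/status is available; the helper names parse_vm_rss_kib and
read_rss_kib are made up for illustration and are not part of libreswan.

```python
import re
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read a process's resident set size on Linux; None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            return parse_vm_rss_kib(fh.read())
    except OSError:
        # Process does not exist, or /proc is not available on this system.
        return None
```

Calling read_rss_kib(7864) (the pluto PID from the log above) every few
minutes and recording the result would show whether pluto's memory use
actually climbs between restarts, or whether something else on the box is
eating the memory.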

Paul

