<div dir="ltr"><div dir="ltr"><div>Thanks for the sysctl recommendations! I'll start by changing only the net.core.rmem_* and wmem_* values, so it's easier to tell which setting actually fixes the problem. Along those lines ...</div><div><br></div><div>I can't find confirmation, but I assume net.core.rmem_* and wmem_* are the only params that will have any effect on netlink sockets. The net.ipv4.* params seem to be specific to the AF_INET address family, and shouldn't affect netlink.</div><div><br></div><div>And judging by the name, I'm also assuming net.core.netdev_max_backlog only affects packets coming through real networking devices, not netlink. Does that sound correct to you?</div><div><br></div><div>"Errno 105" is in the error message I posted initially ... Is that what you're looking for? It corresponds to ENOBUFS.</div><div><br></div><div>Thanks again, Paul.</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jan 25, 2019 at 10:08 PM Paul Wouters <<a href="mailto:paul@nohats.ca">paul@nohats.ca</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, 25 Jan 2019, Alan Szlosek wrote:<br>

<br>

> We're using Libreswan 3.25 (netkey) on Linux 4.15.0-1020-aws and are seeing the following error pop up in our pluto logs from time to time, sometimes several per hour.<br>

> <br>

>     ERROR: recvfrom() failed in netlink_get. Errno 105: No buffer space available<br>

<br>

That looks like the kernel ran out of memory perhaps?<br>

<br>

> Our CPU usage has stayed below 40%, and memory usage has stayed below 50%.<br>

<br>

But you say that's not the case. hmm. odd.<br>

<br>

> Is there some value we can tune with sysctl that will affect the buffer associated with these netlink sockets?<br>

<br>

You can try something like:<br>

<br>

# /etc/sysctl.d/pwouters-highspeed.conf<br>

# increase TCP max buffer size settable using setsockopt()<br>

net.core.rmem_max = 536870912<br>

net.core.wmem_max = 536870912<br>

# increase Linux autotuning TCP buffer limit<br>

net.ipv4.tcp_rmem = 16384 349520 16777216<br>

net.ipv4.tcp_wmem = 16384 349520 16777216<br>

# recommended to increase this for CentOS6 with 10G NICs or higher<br>

net.core.netdev_max_backlog = 250000<br>

# don't cache ssthresh from previous connection<br>

net.ipv4.tcp_no_metrics_save = 1<br>

# If you are using Jumbo Frames, also set this<br>

net.ipv4.tcp_mtu_probing = 1<br>

# recommended for CentOS7/Debian8 hosts<br>

net.core.default_qdisc = fq<br>

<br>

> What else should we be considering?<br>

<br>

You can try git master or apply this patch which should at least give us<br>

the proper failure code from recvfrom():<br>

<br>

<a href="https://github.com/libreswan/libreswan/commit/9482cbfd03bee42aa8ad4f0b7a2c3f84d02cf550" rel="noreferrer" target="_blank">https://github.com/libreswan/libreswan/commit/9482cbfd03bee42aa8ad4f0b7a2c3f84d02cf550</a><br>

<br>

Paul<br>

</blockquote></div>
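<div><br></div><div>For anyone who wants to check how net.core.rmem_max interacts with a netlink socket's receive buffer, here is a minimal Python sketch. It is not taken from pluto; the function names are made up for illustration. It assumes Linux, where setsockopt(SO_RCVBUF) clamps the requested size to net.core.rmem_max and then doubles it for bookkeeping overhead (per socket(7)); the kernel's small minimum floor is ignored here.</div>

```python
import socket


def read_rmem_max(path="/proc/sys/net/core/rmem_max"):
    """Return the current net.core.rmem_max value in bytes (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def expected_rcvbuf(requested, rmem_max):
    """Estimate what getsockopt(SO_RCVBUF) reports after an unprivileged
    setsockopt(SO_RCVBUF, requested): the request is capped at rmem_max,
    then doubled. Assumption: the kernel's minimum floor is ignored."""
    return 2 * min(requested, rmem_max)


def netlink_rcvbuf(requested=None):
    """Open a NETLINK_XFRM socket, optionally request a receive buffer
    size, and return the effective SO_RCVBUF. Linux only; creating the
    socket may fail if the xfrm netlink family is unavailable."""
    proto = getattr(socket, "NETLINK_XFRM", 6)  # 6 in <linux/netlink.h>
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, proto) as s:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    limit = read_rmem_max()
    want = 8 * 1024 * 1024  # hypothetical 8 MiB request
    print("net.core.rmem_max:", limit)
    print("default netlink SO_RCVBUF:", netlink_rcvbuf())
    print("expected after request:", expected_rcvbuf(want, limit))
    print("actual after request:", netlink_rcvbuf(want))
```

<div>Running it before and after raising net.core.rmem_max should show whether a larger request actually takes effect on a netlink socket. Whether pluto itself requests a larger buffer is a separate question that this sketch does not answer.</div>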