[Swan-dev] leaks and cores

Andrew Cagney andrew.cagney at gmail.com
Sun May 16 15:07:06 UTC 2021


On Wed, 3 Mar 2021 at 09:43, Andrew Cagney <andrew.cagney at gmail.com> wrote:
>
> Another update:
>
> We're down to:
>
>
> https://testing.libreswan.org/v4.3-86-g701b488bb9-main/ikev2-59-multiple-acquires-alias/OUTPUT/east.pluto.log.gz
> Mar  2 15:56:47.592566: leak: heap logger, item size: 64
> Mar  2 15:56:47.592656: leak: heap logger prefix, item size: 26
> Mar  2 15:56:47.592737: leak: DH shared secret, item size: 104
> Mar  2 15:56:47.592807: leak: DH crypto, item size: 384
> Mar  2 15:56:47.592876: leak: dh, item size: 40
> Mar  2 15:56:47.592937: leak: calc_dh_local_secret, item size: 32
> Mar  2 15:56:47.593002: leak detective found 6 leaks, total size 650
>
> but it only happens occasionally, so presumably it's a race between
> revival, crypto helpers, and the main thread during shutdown
>

Looks like a leak in an IKEv1 background task that gets scheduled during
shutdown.
(This matters because IKEv2 will also need background jobs working for
fragmentation.)

Having the post-mortem run refcnt.awk over the log helped a lot:
https://testing.libreswan.org/v4.4-178-gcd08e1a67e-main/ikev1-x509-18-id-notany/OUTPUT/east.console.diff
++ awk -f /testing/utils/refcnt.awk /tmp/pluto.log
+ERROR: : '0x7fb8c2480fb8' has count 1
+2780: | newref clone logger at 0x7fb8c2480fb8(0->1) (in submit_task() at server_pool.c:354)
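
As an aside, refcnt.awk works off the newref (and matching delref) trace
lines in the log: each one carries the pointer, the old->new count and the
caller, so a post-mortem pass can flag any pointer whose count never gets
back to zero.  A purely illustrative sketch of that style of tracing
(hypothetical code, not libreswan's actual refcnt.h):

/* illustrative only: log every refcount transition so a post-mortem
 * pass over the log can flag pointers whose count never drops to 0 */
#include <stdio.h>
#include <stdlib.h>

struct logger {
    unsigned refcnt;
    /* ... prefix, whack fds, ... */
};

static void logger_addref(struct logger *l, const char *func,
                          const char *file, int line)
{
    unsigned old = l->refcnt++;
    fprintf(stderr, "| newref clone logger at %p(%u->%u) (in %s() at %s:%d)\n",
            (void *)l, old, l->refcnt, func, file, line);
}

static void logger_delref(struct logger *l, const char *func,
                          const char *file, int line)
{
    unsigned old = l->refcnt--;
    fprintf(stderr, "| delref clone logger at %p(%u->%u) (in %s() at %s:%d)\n",
            (void *)l, old, l->refcnt, func, file, line);
    if (l->refcnt == 0) {
        free(l);
    }
}

int main(void)
{
    struct logger *l = calloc(1, sizeof(*l));
    logger_addref(l, __func__, __FILE__, __LINE__);   /* 0->1 */
    /* drop the next call and the post-mortem pass reports
     * "'0x...' has count 1", i.e. the same shape as the leak above */
    logger_delref(l, __func__, __FILE__, __LINE__);   /* 1->0, freed */
    return 0;
}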

The refcnt.awk output let me track down the sequence of events:

- shutting down (at this point, think of it more as a suggestion than an
actual action)

- an IKEv1 state schedules a job (compute final DH) and puts it into the
background
| job 4 helper 0 #2 DH shared secret (dh): added to pending queue
| #2 is idle; has background offloaded task

- shutdown stops the helper threads (task still on queue)
| one helper thread exited, 0 remaining
| scheduling callback all helper threads stopped (#0)

- states deleted
| FOR_EACH_STATE_... in foreach_state_by_connection_func_delete
"ikev2-westnet-eastnet-x509-cr" #2: deleting state (STATE_MAIN_R2) aged
0.117802s and NOT sending notification

Normally deleting the state orphans the job, leaving it to the helper
threads to clean up (rough sketch below); except here the helper threads
aren't running.
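
For context, a minimal sketch of that orphaning pattern, using made-up
names rather than pluto's real structures: deleting the state can't free
a job that a helper thread may still be chewing on, so it just detaches
the job and leaves the helper to free everything once the computation
completes.

/* illustrative only, not pluto's real structures */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct job {
    void *owner;           /* owning state; NULL once orphaned */
    char *cloned_logger;   /* stands in for the cloned heap logger */
    char *dh_secret;       /* stands in for the DH crypto buffers */
};

static void free_job(struct job *j)
{
    free(j->cloned_logger);
    free(j->dh_secret);
    free(j);
}

/* main thread: the state is deleted while the job is still in flight */
static void orphan_job(struct job *j)
{
    j->owner = NULL;       /* a helper will notice and discard the result */
}

/* helper thread: the computation finished */
static void job_completed(struct job *j)
{
    if (j->owner == NULL) {
        printf("owner gone; discarding result\n");
    } else {
        printf("delivering result to owner\n");
    }
    free_job(j);           /* either way the helper frees the job */
}

int main(void)
{
    int fake_state;        /* stands in for the owning struct state */
    struct job *j = calloc(1, sizeof(*j));
    j->owner = &fake_state;
    j->cloned_logger = strdup("heap logger");
    j->dh_secret = strdup("DH shared secret");

    orphan_job(j);         /* state deleted first ... */
    job_completed(j);      /* ... but only a running helper ever gets here */
    return 0;
}

The leak is exactly the case where job_completed() never runs because the
helper threads have already exited with the job still sitting on the
pending queue.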

Several options come to mind:
- once the states are gone, free any still-outstanding jobs (see the
sketch after this list)
- bravely trust locked refcounts and use them to detect when a job can be
deleted ...
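
A rough sketch of the first option, again with hypothetical names rather
than pluto's actual pending-queue code: once the states are gone and the
helpers have stopped, nothing will ever service the pending queue, so
shutdown walks it and frees the jobs itself.

/* illustrative only, not pluto's actual pending-queue code */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct job {
    struct job *next;      /* simple singly-linked pending queue */
    char *cloned_logger;   /* stands in for the cloned heap logger */
    char *dh_secret;       /* stands in for the DH crypto buffers */
};

static struct job *pending_queue;   /* jobs waiting for a helper thread */

static void free_job(struct job *j)
{
    free(j->cloned_logger);
    free(j->dh_secret);
    free(j);
}

/* call once the states are gone and the last helper has exited */
static void free_outstanding_jobs(void)
{
    while (pending_queue != NULL) {
        struct job *j = pending_queue;
        pending_queue = j->next;
        fprintf(stderr, "freeing job abandoned on the pending queue\n");
        free_job(j);
    }
}

int main(void)
{
    /* queue a job that no helper will ever service */
    struct job *j = calloc(1, sizeof(*j));
    j->cloned_logger = strdup("heap logger");
    j->dh_secret = strdup("DH shared secret");
    j->next = pending_queue;
    pending_queue = j;

    /* ... helper threads stop, states are deleted ... */
    free_outstanding_jobs();   /* nothing left for leak detective to find */
    return 0;
}

The drain would have to run only after the last helper has exited,
otherwise it races with a helper popping the same queue, which is
presumably where the second (locked refcount) option comes in.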

