[Swan] Libreswan state machine. What to do after STATE_QUICK_R2?
rstyczynski at gmail.com
Tue Apr 13 08:32:52 UTC 2021
It's Oracle OCI; the log was captured on the OCI side. So it looks like OCI uses a customised Libreswan. That was my suspicion, as Oracle recommends Libreswan for special purposes. I have implemented a Libreswan gateway combined with Pacemaker and an OCI floating IP (quite different from regular Linux), but for this Azure connection I wanted to use the out-of-the-box OCI feature.
It looks like a series of bugs in the Azure implementation. They do not honour the SA lifetime configuration for IKEv1 (we are trying with the proper CLI parameters, but the engine does not apply them), and they do not respond to rekey messages in time. Sometimes they do, but in the majority of cases they do not. My workaround is to keep both tunnels (OCI always maintains two tunnels) started with a 30-minute offset, so that the costly recreation of each IPsec tunnel (5 minutes) happens at different moments, keeping the VPN connection always ready for TCP traffic.
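A rough sketch of why the offset helps, under simplifying assumptions: each tunnel is torn down at every 3600 s expiry (worst case, as if the rekey never succeeds) and needs about 300 s to come back. The function names are illustrative only and not part of any Libreswan, OCI or Azure tooling.

```python
LIFETIME_S = 3600     # OCI-side SA lifetime mentioned in this thread
RECREATE_S = 300      # approximate time to recreate a tunnel (5 minutes)
OFFSET_S = 30 * 60    # start delay between the two tunnels


def tunnel_is_up(t, start):
    """True if a tunnel first started at second `start` is usable at second t.

    Assumed model: up for LIFETIME_S, then down for RECREATE_S, repeating.
    """
    if t < start:
        return False
    phase = (t - start) % (LIFETIME_S + RECREATE_S)
    return phase < LIFETIME_S


def seconds_without_tunnel(offset, horizon):
    """Count seconds in [0, horizon) during which neither tunnel is up."""
    return sum(
        1
        for t in range(horizon)
        if not tunnel_is_up(t, 0) and not tunnel_is_up(t, offset)
    )


if __name__ == "__main__":
    day = 24 * 3600
    print("no offset:       ", seconds_without_tunnel(0, day), "s without VPN")
    print("30 minute offset:", seconds_without_tunnel(OFFSET_S, day), "s without VPN")
```

In this model the two outage windows never overlap when the tunnels are offset by 30 minutes, whereas tunnels started together are down at the same time on every expiry.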
Many thanks for your time,
> On 12 Apr 2021, at 17:42, Paul Wouters <paul at nohats.ca> wrote:
> On Mon, 12 Apr 2021, Ryszard Styczynski wrote:
>> Subject: Re: [Swan] Libreswan state machine. What to do after STATE_QUICK_R2?
>> Thanks for sharing the state machine. Now I understand what is going on. I see the following sequence in the logs:
>> 00:12:59 no rekeying on traffic selector override connection
> This log line does not match any libreswan (or openswan) code. It is
> customized. Is this the Oracle OCI side or the Azure side?
>> 00:17:29 deleting state (STATE_QUICK_R2) aged 3600.122s and sending notification
> This message confirms it is libreswan.
>> The VPN is established between Oracle OCI and Azure. I'm not sure which technology is used by Oracle OCI, but it
>> looks like Libreswan. I received information from the partner operating Azure that they have a bug in the SA lifetime
>> configuration for IKEv1; the provided setting is ignored and they always use a value of 27000 s. It's not compatible
>> with the OCI side configuration of 3600 s.
> lifetime is not negotiated. Whichever side has the shortest lifetime
> starts the rekey process, provided they are not configured with
>> OCI side expects to renegotiate phase 2 after 1h, waits 15 seconds for this
>> to happen (17:29 - 17:44) and gives up.
> That sounds like rekey=no with salifetime=3600 or maybe rekey=yes with
> keyingtries=1 or something silly like that?
>> The whole connection is destroyed and recreated. Azure notices this.
>> Finally Azure, as the initiator (OCI is the responder, which is even visible in the *_R* states), tries to recreate the
>> VPN. This succeeds after 5 minutes.
>> Mystery solved. The 15-second timeout may be custom tuning in the OCI implementation.
> You cannot change the Azure side either?