[Swan] Libreswan state machine. What to do after STATE_QUICK_R2?
rstyczynski at gmail.com
Mon Apr 12 15:11:30 UTC 2021
Thanks for sharing the state machine. Now I understand what is going on. I see the following sequence in the logs:
00:12:59 no rekeying on traffic selector override connection
00:17:29 deleting state (STATE_QUICK_R2) aged 3600.122s and sending notification
00:17:29 ESP traffic information in=513KB out=745KB
00:17:44 terminating SAs using this connection
00:17:44 deleting state (STATE_MAIN_R3) aged 7228.122s and sending notification
00:17:44 added connection description "xxxx"
00:17:44 listening for IKE messages
00:17:44 forgetting secrets
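The gap between the two deletions can be read straight off these timestamps. Below is a minimal Python sketch that parses a few of the lines above and reports when each state was deleted and how old it was. It is a hypothetical helper, not Libreswan or OCI tooling, and its regular expression assumes the exact `deleting state (...) aged ...s` wording shown in this excerpt.

```python
import re
from datetime import datetime

# A subset of the timestamped lines from the log excerpt above. The excerpt
# carries no date, so only the time of day is parsed.
LOG = """\
00:12:59 no rekeying on traffic selector override connection
00:17:29 deleting state (STATE_QUICK_R2) aged 3600.122s and sending notification
00:17:44 deleting state (STATE_MAIN_R3) aged 7228.122s and sending notification
"""

# Assumes the "deleting state (NAME) aged N.NNNs" wording seen above.
DELETE_RE = re.compile(r"deleting state \((STATE_\w+)\) aged ([\d.]+)s")


def parse(log: str) -> list[tuple[datetime, str]]:
    """Split each line into (time of day, message)."""
    events = []
    for line in log.splitlines():
        stamp, message = line.split(" ", 1)
        events.append((datetime.strptime(stamp, "%H:%M:%S"), message))
    return events


def deletions(events) -> dict[str, tuple[datetime, float]]:
    """Map each deleted state name to (time of deletion, SA age in seconds)."""
    found = {}
    for when, message in events:
        match = DELETE_RE.search(message)
        if match:
            found[match.group(1)] = (when, float(match.group(2)))
    return found


deleted = deletions(parse(LOG))
child_time, child_age = deleted["STATE_QUICK_R2"]
parent_time, parent_age = deleted["STATE_MAIN_R3"]
gap = (parent_time - child_time).total_seconds()
print(f"Phase 2 SA deleted at {child_time:%H:%M:%S}, age {child_age} s")
print(f"Phase 1 SA deleted at {parent_time:%H:%M:%S}, age {parent_age} s")
print(f"Gap between the two deletions: {gap:.0f} s")
```

This prints a 15 s gap, with the phase 2 SA deleted at an age of just over 3600 s.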
The VPN is established between Oracle OCI and Azure. I'm not sure which technology Oracle OCI uses, but it looks like Libreswan. I received information from the partner operating Azure that they have a bug in the SA lifetime configuration for IKEv1: the configured setting is ignored and they always use a value of 27000 s. This is not compatible with the OCI side, which is configured for 3600 s.

The OCI side expects phase 2 to be renegotiated after 1 h, waits 15 seconds for this to happen (00:17:29 to 00:17:44), and then gives up. The whole connection is destroyed and recreated, which Azure notices. Finally, Azure, as the initiator (OCI is the responder, which is even visible in the *_R* state names), recreates the VPN. This succeeds after 5 minutes.
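The timing argument can be modelled in a few lines. This is a hypothetical Python sketch of the explanation above, not of Libreswan's, OCI's, or Azure's actual logic. It assumes the responder gives up 15 s after its 3600 s phase 2 lifetime runs out, and it assumes the initiator starts a rekey 540 s before its own lifetime expires; that margin is a value picked purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical model of the explanation above; not Libreswan, OCI or Azure code.
RESPONDER_LIFETIME_S = 3600   # OCI side phase 2 SA lifetime
GRACE_S = 15                  # gap observed in the log before phase 1 is torn down
REKEY_MARGIN_S = 540          # ASSUMED: how early the initiator starts a rekey


@dataclass
class Initiator:
    name: str
    effective_lifetime_s: int  # lifetime the initiator actually uses

    def rekey_at(self, margin_s: int = REKEY_MARGIN_S) -> int:
        """Seconds after SA creation at which this initiator would start a rekey."""
        return self.effective_lifetime_s - margin_s


def outcome(initiator: Initiator) -> str:
    """The rekey succeeds only if it starts before the responder gives up."""
    gives_up_at = RESPONDER_LIFETIME_S + GRACE_S
    if initiator.rekey_at() <= gives_up_at:
        return "rekeyed"
    return "torn down"


buggy = Initiator("Azure (lifetime setting ignored)", 27000)
fixed = Initiator("Azure (honoring 3600 s)", 3600)
for peer in (buggy, fixed):
    print(f"{peer.name}: rekey at {peer.rekey_at()} s -> {outcome(peer)}")
```

With the 27000 s lifetime, the initiator would not start a rekey until hours after the responder has given up, so the SA is torn down. With a matching 3600 s lifetime, the rekey starts well inside the responder's window.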
Mystery solved. The 15-second timeout may be custom tuning in the OCI implementation.
> On 9 Apr 2021, at 03:22, Paul Wouters <paul at nohats.ca> wrote:
> On Thu, 8 Apr 2021, Ryszard Styczynski wrote:
>> I'm looking for the IPsec state machine implemented in Libreswan. I can guess how the states are related, but having the state machine would give me a definitive answer.
> For IKEv1, the state machine is in programs/pluto/ikev1.c
>> My current question is: what is the next state after STATE_QUICK_R2? Should the IPsec engine wait for rekeying? How long? How many times should it repeat the waiting step? Should it go back to STATE_MAIN and delete the SA? When?
>> I currently see in my system that:
>> 1. STATE_QUICK_R2 may go to STATE_MAIN_R3, delete the SA, and re-establish the connection from phase 1; this happens after 15 seconds
>> 2. STATE_QUICK_R2 may go to STATE_QUICK_R1 and process rekeying; this happens when the peer responds within 15 seconds
>> How can I understand why the SA is sometimes deleted (which causes the 5-minute line drop) and sometimes rekeying completes? How can I control the time limits?
> A proper exchange looks like:
> paul@thinkpad:~/libreswan.git/testing/pluto/basic-pluto-01 (main=)$ grep STATE_ OUTPUT/east.pluto.log |grep transition
> | IKEv1: transition from state STATE_MAIN_R0 to state STATE_MAIN_R1
> | IKEv1: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2
> | IKEv1: transition from state STATE_MAIN_R2 to state STATE_MAIN_R3
> | IKEv1: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
> | IKEv1: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
> Nothing should really happen after 15 seconds, so perhaps you should
> show us your logs to see what is happening?
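To compare a pluto log against that reference sequence, here is a small Python sketch. It is a hypothetical helper, not part of Libreswan. It extracts the `transition from state ... to state ...` lines shown above and reports which of the expected responder transitions never appear. The regular expression assumes that exact wording.

```python
import re

# Responder-side transitions of a normal IKEv1 exchange, as listed in the
# basic-pluto-01 output above: Main Mode (phase 1), then Quick Mode (phase 2).
EXPECTED = [
    ("STATE_MAIN_R0", "STATE_MAIN_R1"),
    ("STATE_MAIN_R1", "STATE_MAIN_R2"),
    ("STATE_MAIN_R2", "STATE_MAIN_R3"),
    ("STATE_QUICK_R0", "STATE_QUICK_R1"),
    ("STATE_QUICK_R1", "STATE_QUICK_R2"),
]

# Assumes the "transition from state X to state Y" wording shown above.
TRANSITION_RE = re.compile(r"transition from state (STATE_\w+) to state (STATE_\w+)")


def transitions(log: str) -> list[tuple[str, str]]:
    """Return every (from_state, to_state) pair found in the log text."""
    return TRANSITION_RE.findall(log)


def missing(log: str) -> list[tuple[str, str]]:
    """Return the expected transitions that never appear in the log."""
    seen = set(transitions(log))
    return [step for step in EXPECTED if step not in seen]


# The five lines quoted above, i.e. a complete exchange.
FULL = """\
| IKEv1: transition from state STATE_MAIN_R0 to state STATE_MAIN_R1
| IKEv1: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2
| IKEv1: transition from state STATE_MAIN_R2 to state STATE_MAIN_R3
| IKEv1: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
| IKEv1: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
"""

# The same log cut off after phase 1: Quick Mode never completed.
PHASE1_ONLY = "\n".join(FULL.splitlines()[:3])

print(missing(FULL))
print(missing(PHASE1_ONLY))
```

To check a real log, pass the output of the `grep STATE_ ... | grep transition` pipeline shown above to `missing()`. The check deliberately ignores ordering and repetition, so a log that contains several Quick Mode rekeys still passes.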