[Swan-dev] IPSec restarts intermittently and crashes sometimes, PAYLOAD_MALFORMED issue observed: resend without logs

Rajeev Gaur rajeev.gaur at niyuj.com
Tue Feb 2 18:39:44 UTC 2016


Thank you Paul,

Regarding what you wrote in your last message [You should not have ......],
the support team and I are trying to work it out.

It would be great if you could elaborate on some of the points you
mentioned. Some of these questions may seem basic to you, but your answers
would still help me.

1) How did you understand that the sites are acting as both initiator and
responder?

2) Just for clarity: because the sites are acting as both initiator and
responder and their ikelifetime and salifetime are different, you suggested
making them the same so that, even though the sites switch roles, one role
does not persist for the full duration of the other. The roles then switch
at the same intervals. Also, rather than my devices triggering the rekeying,
it is triggered when the cisco router HST (hello state timer) expires. Am I
right?

3) I discussed the Cisco bug point with my PM and they are looking for this
intermediate router. But how did you determine that this is a Cisco router?

4) So far I have not been able to reproduce this problem in-house in my lab.
Now my PM is asking whether, if I understand the problem, I can reproduce it
in-house. One point here: I have gathered 3 devices and am trying to set up
one as the head office and the other two as branch offices. But how can I
make all of them behave as both initiator and responder, so that the issue
is reproduced?
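
For point 2, this is roughly how I understand the two suggested variants
would look in ipsec.conf. It is only a sketch: the connection names are
placeholders of mine, all other connection parameters are omitted, and only
one of the two variants would be used at a time.

```
# Variant A (assumed name): never initiate rekeying, wait for the Cisco side
conn branch-rekey-by-peer
    rekey=no
    ikelifetime=24h
    salifetime=24h

# Variant B (assumed name): short lifetimes so that libreswan rekeys first
conn branch-rekey-by-us
    ikelifetime=45m
    salifetime=45m
```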

Thanks
Rajeev

On Fri, Jan 29, 2016 at 4:37 PM, Paul Wouters <paul at nohats.ca> wrote:

> You should not have that many instances of the same connection. It also
> seems these are both initiating and responding, and all of these hang in
> quick mode, so phase 2 negotiation (rekeying).
>
> Possibly your problems start when timing causes the end points to switch
> roles from initiating to responding. One common cause is a pfs= mismatch
> where one end is tolerant of a mismatch. But perhaps in your case there is
> a bad selection of PSK when switching between initiating and responding. You
> can try setting rekey=no with ikelifetime= and salifetime= set to 24h and
> hope hst the Cisco will initiate to you for rekey. Or try the reverse and
> set the lifetimes to 45m to ensure libreswan always initiates before Cisco.
>
> It looks like a Cisco bug
>
> Paul
>
> Sent from my iPhone
>
> On Jan 29, 2016, at 11:17, Rajeev Gaur <rajeev.gaur at niyuj.com> wrote:
>
> Hi Paul
>
> I have attached "ipsec status" output, do you feel the AUTH algos used
> here could be an issue?
>
> Thanks
> Rajeev
>
> On Wed, Jan 27, 2016 at 3:57 PM, Rajeev Gaur <rajeev.gaur at niyuj.com>
> wrote:
>
>> Hi Paul
>>
>> One request here: did you have a chance to look at the 24 and 96 site logs?
>> Do the logs show the same behavior?
>> If yes, let me check "ipsec status".
>> But if you find it different, please tell me what difference you
>> found.
>> Then I will dig into that.
>>
>> Thanks
>> Rajeev
>>
>> On Wed, Jan 27, 2016 at 3:07 AM, Paul Wouters <paul at nohats.ca> wrote:
>>
>>> On Tue, 26 Jan 2016, Rajeev Gaur wrote:
>>>
>>> Hi Rajeev,
>>>
>>> I wrote:
>>>
>>>       PAYLOAD_MALFORMED message is received quite often.
>>>>
>>>> That could be because the other end still has state which the restarted
>>>> end does not have.
>>>>
>>>>       process_packet_tail() -> in_struct() -> [%s of %s has an unknown
>>>> value = next payload type of ISAKMP Hash Payload has
>>>>       an unknown value: 201]
>>>>
>>>>
>>>> It usually signifies an error in PSK/crypto, so the entire message is
>>>> garbage. You can also tell because 201 is not defined, although it
>>>> is in the space of "private use" numbers as listed at
>>>>
>>>>
>>>> http://www.iana.org/assignments/ipsec-registry/ipsec-registry.xhtml#ipsec-registry-21
>>>>
>>>> [RG]:
>>>> Digging further, I found that the problem occurs at the following place in
>>>> programs/pluto/ikev1.c:
>>>>
>>>>     if (!in_struct(&pd->payload, sd, &md->message_pbs,
>>>>                    &pd->pbs)) {
>>>>         loglog(RC_LOG_SERIOUS,
>>>>                "%smalformed payload in packet",
>>>>                excuse);
>>>>         status_update(STATE_PROBABLE_AUTH_FAILURE,
>>>>                       ip_str(&md->sender), md->sender_port);
>>>>         SEND_NOTIFICATION(PAYLOAD_MALFORMED);
>>>>         return;
>>>>     }
>>>>
>>>> What does the status_update with STATE_PROBABLE_AUTH_FAILURE mean here?
>>>> Also, I have checked and rechecked the PSK and config, and I did not find
>>>> any issue.
>>>> Please suggest something here.
>>>>
>>>
>>> As I said, a mismatching AUTH can cause this when using PSK, because the
>>> packet will just become something encrypted to the wrong PSK. So it is
>>> decrypted but then becomes nonsense, and we can only try to interpret
>>> it, which then fails on the first or second payload.
>>>
>>> Paul
>>>
>>
>>
> <sites_ipsec_status.txt>
>
>
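
P.S. To illustrate the "unknown value: 201" point from the quoted thread,
here is a minimal standalone C sketch of how a next-payload byte can be
classified using the ranges in RFC 2408 section 3.1. It is not pluto's code:
the enum and function names are hypothetical, and values assigned by later
RFCs (for example the NAT-T payloads) are deliberately ignored.

```c
#include <stdint.h>

/* Classes of ISAKMP "next payload" values per RFC 2408 section 3.1.
 * Hypothetical names, for illustration only. */
enum np_class {
    NP_DEFINED,      /* 0..13: NONE, SA, ..., HASH (8), ..., Vendor ID (13) */
    NP_RESERVED,     /* 14..127: reserved in RFC 2408 */
    NP_PRIVATE_USE   /* 128..255: private use, no standard meaning */
};

/* Classify a single next-payload byte taken from a decrypted message. */
static enum np_class classify_next_payload(uint8_t np)
{
    if (np <= 13)
        return NP_DEFINED;
    if (np <= 127)
        return NP_RESERVED;
    return NP_PRIVATE_USE;
}

/* Human-readable label for a class, e.g. for a log line. */
static const char *np_class_name(enum np_class c)
{
    switch (c) {
    case NP_DEFINED:     return "defined";
    case NP_RESERVED:    return "reserved";
    case NP_PRIVATE_USE: return "private use";
    }
    return "unknown";
}
```

With this sketch, classify_next_payload(8) falls in the defined range (the
Hash payload), while classify_next_payload(201) falls in the private-use
range. That is consistent with the explanation that a packet decrypted with
the wrong key yields essentially random bytes in the payload header.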