[Swan-dev] shannon_entropy [was Re: ikev2-14-missing-ke test failure]

D. Hugh Redelmeier hugh at mimosa.com
Sat Jan 3 23:29:11 EET 2015


| 003 WARNING: PSK entropy of 2.251629 is less than minimum Shannon Entropy of 3.500000 - this will be rejected as of July 1st, 2015

I don't understand the wording of this message.

Claude Shannon "invented" information theory.  He's the one that
introduced the term "entropy" for a measure of information in a
signal.  Von Neumann suggested the term to him.  According to
<https://en.wikipedia.org/wiki/History_of_entropy#Information_theory>

    My greatest concern was what to call it. I thought of calling it
    ‘information’, but the word was overly used,
    so I decided to call it ‘uncertainty’. When
    I discussed it with John von Neumann, he had a better idea. Von
    Neumann told me, ‘You should call it entropy, for two
    reasons. In the first place your uncertainty function has been
    used in statistical mechanics under that name, so it already has a
    name. In the second place, and more important, nobody knows what
    entropy really is, so in a debate you will always have the
    advantage.’

In any case, we just call it entropy, not Shannon Entropy, when we're
in a computer context.

To calculate entropy, one needs an a priori model of probabilities of
symbols (often characters, but not always) in the message stream
(password).  I don't know a good one.
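
As a reminder, given symbol probabilities p_1, ..., p_n, the
definition is

	H = - sum_i p_i * log2(p_i)

bits per symbol.  Everything hinges on where the p_i come from.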

The libreswan/secrets.c function shannon_entropy has this model: the
password is made up of independent characters, and the probability of
a character in any position is its observed frequency in the password
itself, i.e. the number of times that character appears in the actual
password divided by the length of the password.  I don't see the
sense of this model.
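
In code, the calculation appears to boil down to something like the
following.  This is my own sketch of that model, not the actual
secrets.c code (names are mine; compile with -lm):

#include <math.h>
#include <string.h>

/* Sketch: entropy in bits per character, where each character's
 * probability is taken to be its observed frequency in the password
 * itself -- the model described above. */
static double sketch_shannon_entropy(const char *password)
{
	size_t size = strlen(password);
	unsigned histogram[256] = { 0 };
	double entropy = 0.0;

	if (size == 0)
		return 0.0;

	/* count occurrences of each byte value */
	for (size_t i = 0; i < size; i++)
		histogram[(unsigned char)password[i]]++;

	for (int i = 0; i < 256; i++) {
		if (histogram[i] != 0) {
			double p = (double)histogram[i] / size;
			entropy -= p * log2(p);
		}
	}
	return entropy;
}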

Let's look at some values calculated by the function (a throwaway
driver for the sketch above appears after the list).

- password empty, shannon_entropy = 0
  This seems reasonable for the purpose, but not strictly true.

- password a single character, shannon_entropy = 0
  log2(1) is 0 in:  
  entropy -= (double)histogram[i] / size * log2((double)histogram[i]/size);
  This seems like an underestimate.

- password of any number of repetitions of a single character,
  shannon_entropy = 0 again.
  This seems like an underestimate.

- password of two distinct characters, shannon_entropy =
	-2 * (1/2) log2(1/2)
	= 1
  Clearly there is AT LEAST 1 bit of entropy: the two characters
  can appear in either order.  But there are no points for the
  choice of the actual characters.  So this is quite an underestimate.

- password of n distinct characters, shannon_entropy =
	-n * (1/n) log2(1/n)
	= log2(n)
  To get a shannon_entropy of 56 (the strength of DES)
  you would need a 2^56 character password!
  This is an increasingly serious underestimate.

- password is an English word.  shannon_entropy takes no notice of
  this property.  A relative overestimate.
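
For what it's worth, a throwaway driver over the sketch above
reproduces those numbers (my own test code, not from libreswan; paste
it after the sketch in one file and link with -lm):

#include <stdio.h>

int main(void)
{
	/* expected under the model: 0, 0, 0, 1, log2(4) = 2 */
	const char *samples[] = { "", "a", "aaaaaaaa", "ab", "abcd" };

	for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
		printf("%-10s -> %f\n", samples[i],
		       sketch_shannon_entropy(samples[i]));
	return 0;
}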

Here's a hint that the code reflects this problem:
	#define MIN_SHANNON_ENTROPY 3.5
That's a horribly low threshold if we actually were measuring entropy.
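If the number really were entropy, 3.5 bits would mean only about
2^3.5, roughly 11, equally likely possibilities.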

Recommendation: don't use this function -- it probably isn't
calculating anything useful and certainly isn't calculating entropy.

[I welcome corrections.]

