[Swan-dev] addconn: 64kB is not enough for everyone

D. Hugh Redelmeier hugh at mimosa.com
Thu Oct 5 01:55:15 UTC 2017


| commit adc3765ce2cd74864cae843ad33619ff926a7eae
| Author: Kim B. Heino <b at bbbs.net>
| Date:   Wed Oct 4 10:53:03 2017 +0300
| 
|     addconn: 64kB is not enough for everyone
|     
|     On systems where sizeof(int) == sizeof(ssize_t) netlink.h:NLMSG_OK() gives
|     a warning "comparison between signed and unsigned integer expressions".
|     This can be avoided by casting ssize_t to size_t. Negative values are already
|     checked so this cast does not loose sign bit.

Warning: the following argument is long and intricate.  It might have
bugs.  It is also boring.

What's GCC's warning about?  C just cannot compare an unsigned and a
signed type correctly UNLESS the unsigned type is widened to a signed
type.  When does widening occur?  When the other operand's type is
wider than the unsigned type or the unsigned type is narrower than
int.

  int < unsigned // bad
  int < unsigned short // OK, as long as sizeof(int) > sizeof(short)
  size_t < ssize_t // bad
  long < size_t // OK, as long as sizeof(long) > sizeof(size_t)

By "bad", I mean that the result of the comparision will be wrong if
the signed operand happens to have a negative value.  Generally a
compiler assumes that a value with a signed type might have a
negative value.  Compilers could be smarter.

GCC (and clang, etc.) could now be smarter.  They could notice that
(int) sizeof(struct nlmsghdr) is of a signed type but the value will
never be negative.  In this case, the unsigned comparision (as
prescribed by the C standard) would always yield the correct result
and no warning is necessary.  I'm assuming that this is not the case.

/usr/include/linux/netlink.h:

#define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
                     	   (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
                    	   (nlh)->nlmsg_len <= (len))

and
	__u32 nlmsg_len; /* Length of message including header */

Let's look at the two conjuncts that involve len.

1) (len) >= (int)sizeof(struct nlmsghdr)

2) (nlh)->nlmsg_len <= (len)

So:
1 compares len with int
2 compares len with uint32_t

To avoid warnings, typeof(len) must be signed and wider than uint32_t
or unsigned and narrower than int.

Before the change of this commit, I chose "unsigned and narrower than
int" (unsigned short).

This commit has chosen size_t, which is unsigned.  But is it wider than
uint32_t?  Not on x86-32.  Or on other pure 32-bit architctures
(32-bit ARM).

So this commit message is wrong.  Casting to size_t should not
suppress warnings on all machines.  (However, the resulting object
code should run correctly on all machines.)

I guess that the safest fix is to use type int64_t.  It is signed and
surely wider than uint32_t.  In some distant/theoretical future it
might not be as wide as ssize_t.  At that time, we can drop the cast
altogether.

I admit that this is quite annoying.  The flaw is in the definition of
NLMSG_OK.  It does not even comform to its own documentation,
netlink(3)

       int NLMSG_OK(struct nlmsghdr *nlh, int len);

The netlink maintainers refuse to fix this.

The GCC warning is newer than netlink.  But the warning is correct and
would have been useful all along.



More information about the Swan-dev mailing list