[Swan-dev] tainted^D^D^D^D sensitive addresses

Thu Mar 11 22:56:42 UTC 2021

On Thu, 11 Mar 2021 at 16:26, Paul Wouters <paul at nohats.ca> wrote:
>
> On Thu, 25 Feb 2021, Andrew Cagney wrote:
>
> > I'm sitting on a patch that adds still more str_*_sensitive()
> > functions but I'm getting the feeling that it is both excessive and
> > deficient.  For instance, consider:
> >
> >   str_endpoints_sensitive(local, remote)   1.2.3.4:0 -ICMP-> 4.3.2.1:8
> >   str_endpoints_sensitive(remote, local)   4.3.2.1:8 -ICMP-> 1.2.3.4:0
> >
> > with sanitizing turned on, we end up with:
> >
> >  str_endpoints_sensitive(local, remote) <endpoint> -<sensitive>-> <endpoint>
> >
> > which is really not very helpful.
> >
> > Hence, I'm wondering if instead (if it isn't obvious from the subject)
> > we can pick up on both Antony's ip_range.is_subnet hint and perl's
> > tainted variable hack and add a .sensitive bit to ip_* types.  The
> > functions manipulating ip_* types would propagate the bit, and the
> > str_*() functions would check the bit when emitting strings.
> >
> > That way, because only the remote endpoint has the sensitive bit set,
> > we'd be able to log the more meaningful:
> >
> >   str_endpoints(remote, local)   <endpoint> -ICMP-> 1.2.3.4:0
> >   str_endpoints(local, remote)   1.2.3.4:0 -ICMP-> <endpoint>
>
> But is that good enough?

Two issues:

1. identifying addresses that are sensitive
2. deciding how much or little information to hide

I think we're falling short on the first point (once we know an
address is sensitive, deciding how little to display is easy).
We're relying on developers to remember, each time they add or update a
log line, to use the appropriate str_*_sensitive() call.
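
To make the .sensitive bit idea quoted above concrete, here is a rough
sketch.  Everything in it is hypothetical: struct endpoint, the
sanitize_logs switch, and these str_endpoint()/str_endpoints()
signatures are stand-ins for illustration, not the real ip_endpoint
type or the existing str_*() functions.  The point is only that the
address is marked once, where it is created, and the formatter checks
the bit so the log call site doesn't have to remember anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an ip_* type; not the real ip_endpoint. */
struct endpoint {
	const char *addr;	/* textual address, e.g. "1.2.3.4" */
	unsigned port;
	bool sensitive;		/* the proposed "tainted" bit */
};

/* Assumed global switch standing in for the log-sanitizing option. */
static bool sanitize_logs = true;

/* Derived values are copies, so the sensitive bit travels with them. */
static struct endpoint endpoint_set_port(struct endpoint e, unsigned port)
{
	e.port = port;
	return e;
}

/* Format one endpoint; hide it only when it is marked sensitive. */
static const char *str_endpoint(const struct endpoint *e,
				char *buf, size_t len)
{
	assert(e != NULL && buf != NULL && len > 0);
	if (e->sensitive && sanitize_logs) {
		snprintf(buf, len, "<endpoint>");
	} else {
		snprintf(buf, len, "%s:%u", e->addr, e->port);
	}
	return buf;
}

/* Format a src -> dst pair; each side consults its own bit. */
static const char *str_endpoints(const struct endpoint *src,
				 const struct endpoint *dst,
				 const char *proto,
				 char *buf, size_t len)
{
	char s[64], d[64];
	snprintf(buf, len, "%s -%s-> %s",
		 str_endpoint(src, s, sizeof(s)), proto,
		 str_endpoint(dst, d, sizeof(d)));
	return buf;
}
```

With local = { "1.2.3.4", 0, false } and remote = { "4.3.2.1", 8, true },
str_endpoints(&local, &remote, "ICMP", ...) hides only the remote side,
whichever direction the src/dst pair is passed in.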

> The point is that if the logs are available to
> a third party, they could reveal identities. If we end up displaying our
> ephemeral data then someone with logs + captured traffic could still
> connect the IP addresses to an entity?

> But perhaps that is already possible?

Timestamps in the log file would make that pretty easy.

> If we end up at the "one logline per state change" logging level,
> perhaps all of this is moot? And we could just look at censoring the
> few lines that could possibly get displayed?

It's the error paths where we have problems, even if that is down to
one line.  We also have problems with code that works in terms of
src/dst (which could be incoming or outgoing).

> > One place I know this will fall flat is with dbg() lines.  The current
> > convention is to not sanitize addresses in dbg() lines; the above
> > would make that impossible (short of str_endpoints_insensitive() ...
>
> Enabling verbose logging or debug is game over for privacy. We can't
> start obfuscating things there.