[Swan-dev] const version of str_*()

D. Hugh Redelmeier hugh at mimosa.com
Mon Sep 16 06:22:09 UTC 2019


| From: Andrew Cagney <andrew.cagney at gmail.com>

Nice analysis.

| But here's a different problem.  We've code micro-optimized as follows:
| 
|   char buf[ID_LEN]
|   idtoa(&id1, buf);
| 
|   if (dbg) printf("id1 = %s\n", buf);
|   ...
|   if (dbg) printf("id 2 = %s\n", buf);
| 
| presumably to minimise the number of string conversions.  Except, with
| certain irony, when !dbg and things matter, it will make an
| unnecessary call.

Generally those are cases where I was not bold enough to eliminate
that code when I converted to the new clearer routines.

I support your being bold.  (Except when I disagree :-)

Sometimes these things have been moved outside loops.  But even in
these cases your new routine would be helpful without moving the call
into the loop.

| However, I wonder if we we can tweak this further and eliminate the
| need to declare the buffers; and the desire to optimise, vis:
| 
|    id_buf id_str(const id_t *id) __attribute__((const))
|    ...
|    printf("id1 = %s\n", id_str(&id1).buf);
|    printf("id2 = %s\n", id_str(&id2).buf);
|    printf("id1 = %s\n", id_str(&id1).buf);

I like this idea, but only if you drop your objection to inline
functions.

A function that dereferences a pointer arg cannot have
__attribute__((const)).  So you must adjust the id_str function to
receive the id_t "by-value".

id_buf id_str(const id_t id) __attribute__((const))

(With inlining there should be no cost to this change.)

But there's another problem.

I don't think that C guarantees that the id_buf value would live long
enough.  The expression is more explicitly written

	&id_str(&id1).buf[0]

As soon as the & is applied, I believe that C's semantics allows the
temporary id_buf value to be discarded by the implementation.  Before
the pointer is passed to printf.  So the pointer must not be
dereferenced.  Thus this code probably ventures into the dreaded
"undefined behaviour".

In my brief participation in the C++ Standards process, I learned that
this was called the "lifetime of temporaries problem".  It's a real
mess in C++ but I think that it has been nailed down (with an
arbitrary hack).  There's much less need in C so I don't think that it
has been addressed.

Summary C++ has a hack that probably makes your function correct but C
does not.  Or more correctly, the idiomatic use of the function --
the function itself is well-defined.

See, for example
<https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbclx01/lifetime_temp.htm>

A less ambitious use of your function might still be an improvement
over the current code.  Perhaps not worth the effort to refactor.

	const id_buf b1 = id_str(id1);
	printf("id1 = %s\n", b1.buf);

	const id_buf b2 = id_str(id2);
	printf("id2 = %s\n", b2.buf);

	const id_buf b3 = id_str(id3);
	printf("id3 = %s\n", b3.buf);

| https://lwn.net/Articles/285332/
| 
| - the need to declare the buffer is gone
| - the compiler can eliminate duplicate calls
| - the odds of passing wrong buffers is reduced further

Yes.

I thought that C was going to adopt something like that but I don't
immediately see anything in the C-11 standard.

| it should also be efficient.  Small structures are returned in 1-2
| registers; and larger structures are returned by reference, the above
| is compiled as:
| 
|    id_buf tmp;
|    str_id(&tmp, &id)
|    tmp.buf
| 
| making it equivalent to the old str_id() call.

This operational model is not the definition of C.  I'm against
exploiting something that is technically Undefined Behaviour.


More information about the Swan-dev mailing list