What Are "Tentative" Symbols?
Ali Bahrami Friday September 22, 2006
In the Linker and Libraries Guide, you will encounter discussion of tentative symbols. Based on the name, we might expect that such a symbol is missing something, but what? And why does the linker have to treat them as a special case?
A tentative symbol is a symbol used to track a global variable when we don't know its size or initial value: in other words, a symbol to which we have not yet assigned a storage address. Such symbols are also known as "common block" symbols, because they have their origins in the implementation of Fortran COMMON blocks. They are historical baggage: something that needs to work for compatibility with the past, but also something to avoid in new code.
Consider the following two C declarations, made at outer file scope:
    int foo;
    int foo = 0;

Superficially, these both appear to declare a global variable named foo with an initial value of 0. However, the first definition is tentative: it will have a value of 0 only if some other file doesn't explicitly give it a different value. The outcome depends on what else we link this file against.
To get a better handle on this, let's create two separate C files (t1.c and t2.c) and experiment:
t1.c
    #include <stdio.h>

    #ifdef TENTATIVE_FOO
    int foo;
    #else
    int foo = 0;
    #endif

    int main(int argc, char *argv[])
    {
        printf("FOO: %d\n", foo);
        return (0);
    }
t2.c
int foo = 12;
First, we compile and link t1.c by itself, using both forms of declaration for variable foo:
    % cc -DTENTATIVE_FOO t1.c; ./a.out
    FOO: 0
    % cc t1.c; ./a.out
    FOO: 0
As expected, they give identical results. Now, let's add t2.c to the mix and see what happens:
    % cc -DTENTATIVE_FOO t1.c t2.c; ./a.out
    FOO: 12
    % cc t1.c t2.c; ./a.out
    ld: fatal: symbol `foo' is multiply-defined:
            (file t1.o type=OBJT; file t2.o type=OBJT);
    ld: fatal: File processing errors. No output written to a.out
    ./a.out: No such file or directory

As you can see, the two different ways of declaring foo are not 100% equivalent. The tentative declaration of foo in t1.c took on the value provided by the declaration in t2.c. In contrast, the linker was unwilling to merge the two non-tentative definitions of foo that had different values, and instead issued a fatal link error.
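The same resolution can be seen in miniature inside a single source file, because the C language itself allows a tentative definition to coexist with one initialized definition of the same object. The following is only a sketch to illustrate the rule, not part of the experiment above; the helper names read_foo and show_foo are made up for the illustration.

```c
#include <stdio.h>

/* Tentative definition: no initializer, so the final value is still open. */
int foo;

/* A second tentative definition of the same object is legal in C. */
int foo;

/* The one initialized definition. It decides the value of foo. */
int foo = 12;

/* Hypothetical helper: lets a caller check the value foo ended up with. */
int read_foo(void)
{
    return (foo);
}

/* Hypothetical helper: prints foo in the same format that t1.c uses. */
void show_foo(void)
{
    printf("FOO: %d\n", read_foo());
}
```

Calling show_foo() from a main function prints FOO: 12, the same value the tentative foo picked up from t2.c. Adding a second initialized definition to this file would be rejected by the compiler, much as the linker rejected the two non-tentative definitions.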
Normal C rules say that a variable at file scope without an explicit value is assigned an initial value of 0. However, the existence of other global variables with the same name can change this. The C compiler is only able to see the code in the single file it is compiling, and cannot know how to handle this case. So, it marks the symbol as tentative (in ELF terms a common symbol, usually recorded with the reserved section index SHN_COMMON or, where the toolchain uses it, the symbol type STT_COMMON), and leaves it for the linker to figure out. The linker is in a position to match up all of these symbols and merge them into a single instance. The linker has no insight into programmer intent though, and it cannot protect you from doing this by accident. The result usually works, but is fragile.
The other declaration form (with a value) causes a non-tentative symbol to be created: an STT_OBJECT symbol that is assigned to an actual section. In this case, the linker refuses to merge multiple definitions and reports an error instead, as it did in the experiment above. This is the right behavior if you care about robust and scalable code.
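For readers who like to poke at symbol tables, here is a minimal sketch of how a tool might tell the two kinds of symbol apart. It assumes a system that ships an <elf.h> header defining Elf64_Sym, SHN_COMMON, and STT_COMMON (Linux with glibc does; the header may live elsewhere on other systems), and it assumes that a symbol counts as tentative if it carries either the SHN_COMMON section index or the STT_COMMON type. The function name sym_is_tentative is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Return true if an ELF64 symbol table entry looks tentative.
 * Assumption: either the reserved section index SHN_COMMON or the
 * STT_COMMON symbol type marks the symbol as a common symbol.
 */
bool sym_is_tentative(const Elf64_Sym *sym)
{
    return (sym->st_shndx == SHN_COMMON ||
        ELF64_ST_TYPE(sym->st_info) == STT_COMMON);
}
```

A caller would obtain the Elf64_Sym entries by reading the .symtab section of a relocatable object; that part is omitted here to keep the sketch short.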
It is worth noting that you will never see a tentative symbol with local scope. It can only happen to global symbols, because global symbols in different files are the only way you can get this form of aliasing to occur.
Sadly, reliance on common symbols did not end with Fortran. We still sometimes find this practice in C code. Two files will both declare:
    int foo;

and then expect that they are both referring to a single global variable, with an initial value of 0. This is not necessary. The proper solution has existed for decades. The safe way to do the above is to have exactly one definition of the global variable, in a single file. The other files that need access to it use the "extern" keyword to let the compiler know what is going on. The statement

    extern int foo;

is a reference, not a definition, and it has a single unambiguous interpretation.
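As a rough sketch of that arrangement, the pieces are shown here collapsed into one file so they can be compiled together. The file names in the comments (foo.h, foo.c, user.c) and the helper functions are hypothetical; in a real project each part would live in its own file.

```c
/*
 * Sketch of the single-definition pattern, collapsed into one file.
 * The file names in the comments below are hypothetical.
 */

/* --- foo.h: included by every file that uses the variable --- */
extern int foo;     /* reference only; allocates no storage */

/* --- foo.c: the one file that owns the variable --- */
int foo = 0;        /* the single definition, with an explicit value */

/* --- user.c: any other file, which sees only foo.h --- */
void bump_foo(void)
{
    foo++;
}

int read_foo(void)
{
    return (foo);
}
```

Because only one file defines foo, and does so with an explicit value, the symbol is not tentative. A stray second definition elsewhere then produces a link error instead of being silently merged.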
You should always try to minimize or eliminate global variables. However, when you do use them: