This post aims to help understand the missing-symbol and symbol-conflict problems that appear while building a program, mostly at link time.
linkage in c / cpp
A translation unit is an implementation file (.c/.cpp) together with all the headers it includes.
- With internal linkage, a symbol is only visible within its own translation unit. The static keyword or an anonymous namespace (C++) forces internal linkage.
- With external linkage, a symbol can be used from other translation units. extern declares a symbol with external linkage, unless an earlier declaration already gave it internal linkage.
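By default,
- Non-const global variables have external linkage
- const global variables have internal linkage in C++ (in C they have external linkage)
- Functions have external linkage
- inline functions:
  - ISO C (C99 and later): an inline definition on its own does not provide an external definition
  - GNU C (gnu89 inline semantics): an external definition is emitted
  - ISO C++: external linkage; each translation unit that needs the function may emit a copy, and the linker merges them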
Linkage determines what kind of symbol the compiler generates, so it affects visibility. However, it can only hide a symbol inside a single translation unit: anything shared between several source files must have external linkage.
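The defaults above can be seen in a single C translation unit. This is a minimal sketch assuming a C99-or-later compiler such as GCC or Clang; all names are made up for illustration. Compiled as C++ instead, the const line would get internal linkage. After cc -c -O0, nm should show uppercase (global) letters for shared_total, api_version, add_to_total, get_call_count and twice, and lowercase (local) letters for call_count and clamp_non_negative.

```c
/* Non-const file-scope object: external linkage by default.
   Other translation units may use it after declaring "extern int shared_total;". */
int shared_total = 0;

/* A declaration with extern: refers to the object above, defines nothing new. */
extern int shared_total;

/* static forces internal linkage: invisible to other translation units. */
static int call_count = 0;

/* In C, a const file-scope object still has external linkage.
   The same line compiled as C++ would have internal linkage. */
const int api_version = 2;

/* Internal-linkage helper function. */
static int clamp_non_negative(int x) { return x < 0 ? 0 : x; }

/* Functions have external linkage by default. */
int add_to_total(int x) {
    call_count++;
    shared_total += clamp_non_negative(x);
    return shared_total;
}

int get_call_count(void) { return call_count; }

/* ISO C (C99+): an inline definition on its own does not provide an external
   definition. The extern declaration below makes this translation unit emit
   one, so other translation units can call twice() as well. */
inline int twice(int x) { return 2 * x; }
extern int twice(int x);
```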
static keyword
When used in a declaration of a class member, it declares a static member. When used in a declaration of an object, it specifies static storage duration (except if accompanied by thread_local). When used in a declaration at namespace scope, it specifies internal linkage.
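In C, static keeps two of these three meanings (static class members exist only in C++). A small sketch with hypothetical names showing both: a file-scope static with internal linkage, and a function-local static with static storage duration.

```c
/* File scope: static gives internal linkage, so other translation units
   cannot refer to next_ticket. */
static int next_ticket = 100;

int take_ticket(void) { return next_ticket++; }

/* Block scope: static gives static storage duration. The variable is
   initialized only once, not on every call, and keeps its value between calls. */
int next_id(void) {
    static int id = 0;
    return ++id;
}
```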
declaration
An external declaration is a declaration made at file scope, outside of any function.
A tentative definition is a file-scope object declaration with no initializer and either no storage-class specifier or static. It may or may not act as a definition: if an actual definition is found earlier or later in the same translation unit, the tentative definition just acts as a declaration. Otherwise, it becomes a definition initialized to zero at the end of the translation unit.
Unlike extern declarations, which don't change the linkage of an identifier if a previous declaration established it, tentative definitions may disagree in linkage with another declaration of the same identifier. If two declarations of the same identifier are in scope and have different linkage, the behavior is undefined.
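The rules above can be checked with a few file-scope declarations. This is a small C sketch with illustrative names; it is not valid C++, which has no tentative definitions and would report a redefinition.

```c
/* The first two lines are tentative definitions (no initializer, no
   storage-class specifier). The third is the actual definition, so the
   first two act as plain declarations. */
int retry_limit;
int retry_limit;
int retry_limit = 3;

/* A tentative definition with no real definition in this translation unit:
   at the end of the unit it becomes a definition initialized to 0. */
int error_count;

/* static + no initializer: also tentative, with internal linkage. */
static int cache_hits;
static int cache_hits;

/* Linkage disagreement, left commented out because the behavior is undefined
   (GCC and Clang typically reject it with an error):
     static int flags;
     int flags;        // tentative definition with external linkage
   By contrast, "static int flags; extern int flags;" is fine: the extern
   declaration keeps the earlier internal linkage. */

int count_cache_hit(void) { return ++cache_hits; }
```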
linkage to symbol
- External linkage becomes a global symbol.
  - A global symbol can be strong or weak.
- Internal linkage becomes a local symbol.
  - A local symbol is not visible outside its object file, similar in effect to HIDDEN visibility.
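A sketch of how linkage maps to symbols, assuming GCC or Clang: __attribute__((weak)) is a compiler extension rather than ISO C, and the function names are hypothetical. If another object file in the same static link provided a strong backend_name, the linker would pick that one over the weak fallback.

```c
#include <string.h>

/* External linkage -> global symbol. An ordinary definition is strong:
   two strong definitions of the same name cause a multiple definition error. */
int strong_answer(void) { return 42; }

/* Weak global symbol (GCC/Clang extension). If another object file in the
   same link provides a strong backend_name, the linker uses that one and
   silently drops this fallback. */
__attribute__((weak)) const char *backend_name(void) { return "default"; }

/* Internal linkage -> local symbol, not visible to other object files. */
static int local_offset(void) { return 1; }

int uses_local(void) { return strong_answer() + local_offset(); }

int backend_is_default(void) { return strcmp(backend_name(), "default") == 0; }
```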
symbol visibility for linking
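From the linker's point of view, a global symbol is either undefined (EXTERNAL) or defined with one of the four ELF visibility levels, giving five cases:
- EXTERNAL: defined elsewhere
- DEFAULT: can be directly referenced from other components, can be preempted
- PROTECTED: can be directly referenced from other components, cannot be preempted
- HIDDEN: cannot be directly referenced from other components
- INTERNAL: cannot be referenced from other components, directly or indirectly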
compilation
A translation unit is compiled into one object file, which can:
- expect symbols from other objects: EXTERNAL
  - e.g. a function declared in a header, or an extern variable
- provide symbols:
  - for other objects: DEFAULT
  - only for itself: HIDDEN
Visibility does not prevent a symbol from being generated; it tells the linker whether the symbol may be used from outside the module being linked. Visibility mainly matters to the linker (the compiler can also use it to emit more direct calls to hidden symbols).
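In nm:
- EXTERNAL is U (undefined).
- DEFAULT is T (a global symbol in the text section).
- HIDDEN is still T in the object file; the linker turns it into a local symbol, shown as t, in the linked output. Symbols with internal linkage are already t in the object file.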
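To see these categories in nm, a small file like the one below can be compiled and inspected. This is a sketch assuming GCC or Clang on Linux (ELF) with GNU nm; the function names are made up, and the letters in the comments are what one would typically expect at -O0, where calls are not inlined or folded away.

```c
#include <string.h>

/* Try:  cc -c -O0 symbols.c && nm symbols.o
   (macOS nm would also show a leading underscore on each name.) */

/* strlen is only referenced here and defined elsewhere (libc): "U strlen".
   name_length itself has default visibility: "T name_length". */
size_t name_length(const char *name) { return strlen(name); }

/* Hidden visibility: still "T hidden_twice" in the object file; it becomes
   a local "t" symbol only after linking into a shared library or executable. */
__attribute__((visibility("hidden"))) int hidden_twice(int x) { return 2 * x; }

/* Internal linkage: already a local symbol in the object file: "t square". */
static int square(int x) { return x * x; }

int square_twice(int x) { return hidden_twice(square(x)); }
```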
static library
A static library is simply a collection of objects. It is created with ar instead of ld, meaning that no linking is performed.
linking
for objects and static libraries
The linker maintains two lists: one for exported (defined) symbols and one for undefined symbols.
A multiple definition error happens when a symbol is exported twice. An undefined reference error happens when symbols are still undefined at the end of linking.
Object files are always imported by the linker, and both lists are updated accordingly.
When the linker encounters a static library, it goes through each object in it:
- If the object exports a symbol that is in the undefined list, the linker imports that object.
- If the object has no symbol of interest, it is skipped for now.
- The iteration may run several times to resolve dependencies between objects inside the library.
Note:
- A multiple definition might not be reported if the object containing it is never imported.
- By importing only the necessary objects from a static library, we reduce binary size. The standard library even puts each function in its own object to help with this.
- --whole-archive forces all objects in an archive to be imported.
- After the linker has looked at a library, it will not look at it again, even if the library contains a symbol that is needed later. This behavior can be overridden by certain flags.
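The scanning rule above can be modeled in a few lines. This is a simplified simulation, not how a real linker is implemented: struct object, import_object and scan_archive are hypothetical names, and real linkers consult the archive's symbol index rather than walking a member list.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 32

/* Hypothetical, simplified object file: the symbols it exports (defines)
   and the symbols it needs (references). Both lists are NULL-terminated. */
struct object {
    const char *name;
    const char *exports[4];
    const char *needs[4];
    int imported;
};

/* The linker's two lists. */
static const char *exported[MAX_SYMS];
static int n_exported;
static const char *undef[MAX_SYMS];
static int n_undef;
static int multiple_definitions;

static int find(const char **list, int n, const char *sym) {
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], sym) == 0) return i;
    return -1;
}

void linker_reset(void) { n_exported = n_undef = multiple_definitions = 0; }

/* Import an object: record its exports (resolving pending references),
   then register its own references that are still unresolved. */
void import_object(struct object *obj) {
    obj->imported = 1;
    for (int i = 0; obj->exports[i]; i++) {
        const char *sym = obj->exports[i];
        if (find(exported, n_exported, sym) >= 0) {   /* exported twice */
            multiple_definitions++;
            continue;
        }
        assert(n_exported < MAX_SYMS);
        exported[n_exported++] = sym;
        int u = find(undef, n_undef, sym);
        if (u >= 0) undef[u] = undef[--n_undef];     /* no longer undefined */
    }
    for (int i = 0; obj->needs[i]; i++) {
        const char *sym = obj->needs[i];
        if (find(exported, n_exported, sym) < 0 && find(undef, n_undef, sym) < 0) {
            assert(n_undef < MAX_SYMS);
            undef[n_undef++] = sym;
        }
    }
}

/* Scan one archive: import a member only if it exports a currently
   undefined symbol; repeat until a full pass imports nothing new. */
void scan_archive(struct object *members, int count) {
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int m = 0; m < count; m++) {
            if (members[m].imported) continue;
            for (int i = 0; members[m].exports[i]; i++) {
                if (find(undef, n_undef, members[m].exports[i]) >= 0) {
                    import_object(&members[m]);
                    progress = 1;
                    break;
                }
            }
        }
    }
}

int undefined_count(void) { return n_undef; }
int multiple_definition_count(void) { return multiple_definitions; }
```

For example, if main.o needs parse, parse.o needs util and util.o needs log, and the archive stores its members in the order log.o, util.o, parse.o, then each of the first three passes imports exactly one member. A member that nobody references is never imported, so a duplicate symbol inside it would never be reported.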
for shared libraries
When binding symbols at runtime, the dynamic linker searches libraries in the same order as they were specified on the link line and uses the first definition of the symbol it encounters. If more than one library happens to define the same symbol, only the first definition applies.
- A duplicated symbol does not trigger an error at link time.
- Function interposition: e.g. malloc can be replaced by a definition found earlier in the search order.
  - This also means a library can end up calling an unexpected function from another object.
- -symbolic: forces a library to bind references to its own (internal) definitions.
  - It can also break extern variables shared with the executable: the library and the executable may end up using different copies.
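The first-definition-wins rule, and how -symbolic changes it, can be modeled as a lookup over the load order. A simplified sketch: struct dso, resolve and resolve_symbolic are hypothetical, and a real dynamic linker also deals with symbol versions, lazy binding and LD_PRELOAD.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical loaded module (executable or shared library) and the
   symbols it defines. exports is NULL-terminated. */
struct dso {
    const char *name;
    const char *exports[4];
};

static int defines(const struct dso *d, const char *sym) {
    for (int i = 0; d->exports[i]; i++)
        if (strcmp(d->exports[i], sym) == 0) return 1;
    return 0;
}

/* Default rule: walk the lookup scope in load order (executable first, then
   libraries in link-line order) and take the first definition. Later
   duplicates are ignored; nothing reports them as an error. */
const char *resolve(const struct dso *scope, int n, const char *sym) {
    for (int i = 0; i < n; i++)
        if (defines(&scope[i], sym)) return scope[i].name;
    return NULL;  /* unresolved */
}

/* With -symbolic (-Bsymbolic for ld), a library that defines the symbol
   binds its own references to that definition before consulting the scope. */
const char *resolve_symbolic(const struct dso *scope, int n,
                             const struct dso *requester, const char *sym) {
    if (defines(requester, sym)) return requester->name;
    return resolve(scope, n, sym);
}
```

With a load order of the executable, a hypothetical libprofiler.so that defines malloc, and then libc, resolve picks libprofiler.so for malloc and libc's definition is silently ignored. With resolve_symbolic, references made from inside libc itself still bind to libc's own malloc.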
linking order
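The linker looks at all the files on the command line from left to right.
- For every file A that depends on file B, A needs to come before B, so that A's undefined symbols are registered by the time B is scanned.
- For circular dependencies, we can use these linker flags:
  - --start-group archives --end-group: the archives are searched repeatedly until no new undefined references are created, with a possible "significant performance cost"
  - --undefined=symbol: enters the symbol as undefined up front, so an archive member defining it is pulled in even if no earlier file references it
  - --whole-archive: forces the entire library to be linked
- By default, the linker adds a DT_NEEDED tag for each dynamic library mentioned on the command line, regardless of whether the library is actually needed or not (the --no-as-needed behavior). With --as-needed, a tag is only added for a library that satisfies an undefined reference at that point in the link, so the order of shared libraries matters too.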
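One way to satisfy the "A before B" rule is to treat the inputs as a dependency graph and emit a topological order. A sketch under simplifying assumptions: inputs are numbered 0..n-1, and add_dependency, clear_dependencies and link_order are hypothetical helpers. A detected cycle is exactly the case where a single left-to-right pass cannot work and --start-group/--end-group (or repeating an archive on the command line) is needed.

```c
#include <assert.h>

#define MAX_INPUTS 8

/* deps[a][b] != 0 means input a uses a symbol defined in input b,
   so a must appear before b on the link line. */
static int deps[MAX_INPUTS][MAX_INPUTS];
static int state[MAX_INPUTS];   /* 0 = unvisited, 1 = on the DFS stack, 2 = done */
static int postorder[MAX_INPUTS];
static int post_len;

void clear_dependencies(void) {
    for (int a = 0; a < MAX_INPUTS; a++)
        for (int b = 0; b < MAX_INPUTS; b++)
            deps[a][b] = 0;
}

void add_dependency(int user, int provider) { deps[user][provider] = 1; }

/* Depth-first search: an input is appended only after everything it depends on. */
static int visit(int v, int n) {
    if (state[v] == 2) return 1;
    if (state[v] == 1) return 0;        /* back edge: circular dependency */
    state[v] = 1;
    for (int w = 0; w < n; w++)
        if (deps[v][w] && !visit(w, n)) return 0;
    state[v] = 2;
    postorder[post_len++] = v;
    return 1;
}

/* Writes into out[0..n-1] a link order in which every input precedes the
   inputs it depends on. Returns 0 if there is a cycle. */
int link_order(int n, int out[]) {
    assert(n <= MAX_INPUTS);
    post_len = 0;
    for (int v = 0; v < n; v++) state[v] = 0;
    for (int v = 0; v < n; v++)
        if (!visit(v, n)) return 0;
    for (int i = 0; i < n; i++) out[i] = postorder[n - 1 - i];  /* reverse postorder */
    return 1;
}
```

Reverse postorder puts each input before everything it depends on, which is the order a single left-to-right scan needs.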
changing visibility
With shared libraries, we want to change symbol visibility so that:
- only the necessary functions are exposed, reducing the risk of ABI problems
- symbol collisions are avoided
macro on Windows
- __declspec(dllexport): export the symbol from a DLL
- __declspec(dllimport): import this symbol from some DLL
macro on Linux
- __attribute__ ((visibility ("default"))): the symbol is visible, for either exporting or importing
- __attribute__ ((visibility ("hidden"))): the symbol is not visible outside the shared library
We can pass -fvisibility=hidden when compiling (it is a compiler flag, not a linker flag) to change the default visibility to hidden.
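The macros above are usually wrapped in one project-specific macro so the same header works on both platforms. A common pattern, sketched with the hypothetical names MYLIB_API, MYLIB_LOCAL and MYLIB_BUILDING: the library is built with MYLIB_BUILDING defined (typically -DMYLIB_BUILDING from the build system; it is defined inline here so the file stands alone), and consumers include the header without it.

```c
/* This file is part of the library itself, so it exports rather than imports. */
#define MYLIB_BUILDING 1

#if defined(_WIN32)
#  ifdef MYLIB_BUILDING
#    define MYLIB_API __declspec(dllexport)   /* building the DLL */
#  else
#    define MYLIB_API __declspec(dllimport)   /* using the DLL */
#  endif
#  define MYLIB_LOCAL                          /* DLLs export nothing by default */
#else
#  define MYLIB_API   __attribute__((visibility("default")))
#  define MYLIB_LOCAL __attribute__((visibility("hidden")))
#endif

/* Internal helper: hidden on ELF platforms, so it stays out of the
   shared library's dynamic symbol table. */
MYLIB_LOCAL int mylib_scale(int x) { return x * 10; }

/* Public API: exported even when compiling with -fvisibility=hidden. */
MYLIB_API int mylib_compute(int x) { return mylib_scale(x) + 1; }
```

Combined with -fvisibility=hidden, only the functions marked MYLIB_API are exported from the shared library, which matches the Windows default of exporting nothing unless asked.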
scripts
- With a version script (Linux) or a .def file (Windows), we choose which symbols to keep.
- In its simplest form, the script just lists the exposed symbols.
static lib
With static libraries, we want to constrain which symbols end up in the final result.
- single object pre-link: a macOS-only solution.
  - It performs a link step on the static library's objects, combining them into one object.
  - It can then hide the library's internal symbols.
- --exclude-libs: hides all symbols of the listed static libraries when they are linked into a shared library.
- strip the unnecessary symbols, if the necessary symbols are known.
  - This might affect internal symbols.
- ar -x: extracts the object files from a static library.
  - The objects can then be linked with proper visibility.
  - Every object will be linked.
objcopy
objcopy can be used to change the symbol visibility of an object.
Op Note -> objcopy
symbol conflict
If two symbols conflict, the conflict might:
- trigger nothing if the symbol is defined in a shared library
  - in which case you might get unintended behavior
- go unnoticed if the object containing it is not imported
- trigger a multiple definition error if the object is imported because of another function it provides
Possible solutions:
- use a different namespace
- wrap with a header and build into a dynamic library
  - this requires changing symbol visibility
- use objcopy to modify the symbol
Reference
- library order in static linking
- internal and external linkage
- C++ visibility
- mac symbol hiding
- How to write shared libraries
- Good Practices in Library Design, Implementation, and Maintenance
- share library symbol conflict
- Inside story on shared library
- visibility
todo
- static library conflict
- there were two interesting cases:
  - zhihu: singleton pattern across DLLs?
  - singleton with reference count?
- use a static member + include the class everywhere
  - binary size?
  - symbol resolution?