Direct Binding - the -zdirect/-Bdirect options, and probing
Rod Evans Monday July 21, 2008
In a previous posting I introduced the use of direct bindings within the OSNet consolidation. A comment to this posting questioned the difference between the two options -z direct and -B direct, and pointed out that runtime errors can occur during process execution if a lazy dependency (typically enabled with -B direct) cannot be found.
In this entry, I'll discuss the difference between the -z direct and -B direct options, and offer a useful technique for handling the case where the lazy dependency is not present at runtime.
First, the difference between -z direct and -B direct. A full discussion of these options can be found in the Direct Binding chapter of the Linker and Libraries Guide.
Aside from lazy loading being enabled by -B direct, the essential difference between these options is a trade-off between ease of use and degree of control.
-B direct can be specified anywhere on the command line, and results in any external and internal symbol bindings being established as direct. This means that if an object such as libxy.so both defines xy() and references xy(), then a direct binding will be established within the same object.
% cc -G -o libxy.so xy.c -Bdirect -Kpic
% elfdump -y libxy.so | fgrep xy
    [7]  DB          <self>          xy
Hence, -B direct is a blunt club that hits everything. In contrast, -z direct is sensitive to its position in the command line, and can therefore be used in a more precise manner. Only external references that are resolved to dependencies that follow -z direct are established as direct. In the following example, only the references to libX.so and libY.so will have direct bindings established.
% cc -o libxy.so xy.c -lA -lB -z direct -lX -lY
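To make the self-binding case concrete, here is a minimal sketch of what an xy.c might contain. The original source is not shown in this entry, so the function bodies and the call_xy() routine are hypothetical, invented only for illustration.

```c
/* xy.c: hypothetical source for libxy.so, not the original file */

/* Exported interface that another object might want to interpose on. */
int
xy(void)
{
	return (1);
}

/*
 * A second exported routine (hypothetical) that references xy(). A
 * reference like this one, made from inside the object that also defines
 * the symbol, is the kind that elfdump reports as bound to <self> when
 * the object is built with -B direct.
 */
int
call_xy(void)
{
	return (xy() + 1);
}
```

Built with -B direct, the reference inside call_xy() is bound directly to the xy() in the same object. Built with -z direct placed ahead of the dependencies instead, that internal reference is not made direct, so it remains open to an interposing xy() supplied elsewhere in the process.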
But the real question is why would you use one option over the other? -B direct is recommended where possible, due to its simplicity and ease of use. However, there are cases where finer-grained control is needed, and -z direct is more appropriate.
One example is libproc. This library contains many routines that users (typically debugging tools) wish to interpose upon. We want libproc to have direct bindings to any of the dependencies it requires (libc, libelf, etc.), but we do not wish libproc to directly bind to itself. Therefore, by using -z direct we can build libproc to bind directly to its own dependencies while freely binding to any interposers, for any of the interfaces libproc defines. This interposition is provided regardless of whether the interposers are explicitly defined (a requirement, as we do not have control over all the consumers of libproc).
Note, we even went a little bit further, and defined all the libproc interfaces as NODIRECT, which prevents any direct binding to libproc. This was to prevent any dependencies binding to libproc instead of to an interposer.
The comment to my previous blog entry also raised the issue of how lazy loading can be compromised if a lazy dependency cannot be found. Typically, lazy loading is used to locate dependencies that are expected to exist. Historically, interfaces like dlopen(3C) have been used to test for the presence of dependencies that might not exist. However, a useful technique is to use lazy loading and test for the existence of a dependency with dlsym(3C). By testing for the existence of a known interface within a lazy dependency, you can verify that the dependency exists and then feel free to call any other interface within that dependency.
When a dependency is bound to, the SONAME of that dependency is recorded in the caller.
% cc -G -o libxy.so -hlibxy.so xy.c -Kpic
% elfdump -d libxy.so | fgrep SONAME
    [2]  SONAME        0x1        libxy.so
% cc -o main main.c -z lazyload -L. -lxy
% elfdump -d main | egrep "NEEDED|POSFLAG"
    [0]  POSFLAG_1     0x1        [ LAZY ]
    [1]  NEEDED        0x163      libxy.so
With this dependency established, you can protect yourself from calling the interfaces within the dependency unless the interface family you are interested in is known to exist.
if (dlsym(RTLD_PROBE, "symbol_in_libxy_1") != NULL) {
        /*
         * feel free to call any-and-all interfaces in libxy
         */
        symbol_in_libxy_1();
        symbol_in_libxy_2();
        ....
With this model you don't need to know the name of the object that provides the interfaces, as the name was recorded at link-time. And, the dlsym() will trigger an attempt to load the dependency associated with the symbol. All other references can be made directly through function calls rather than through dlsym().
This allows the compiler, or verification tools like lint, to ensure that you are calling the function with the proper argument and return types, and will therefore lead to safer and more robust code.
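The type-safety point can be sketched by putting the two calling styles side by side. This is only an illustration under stated assumptions: strlen() stands in for an interface that would really live in a lazy dependency, the helper names are hypothetical, and on platforms whose dlfcn.h lacks RTLD_PROBE the sketch falls back to a handle for the running program, which performs no lazy loading.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/*
 * RTLD_PROBE is the Solaris handle used in the text. Where it is missing,
 * this sketch substitutes a handle for the running program so that it
 * still compiles; that fallback is an assumption for portability only.
 */
static void *
probe_handle(void)
{
#ifdef RTLD_PROBE
	return (RTLD_PROBE);
#else
	return (dlopen(NULL, RTLD_LAZY));
#endif
}

/* Pointer style: every call goes through a hand-cast dlsym() result. */
size_t
length_via_pointer(const char *s)
{
	/* Nothing checks that this cast matches the real signature. */
	size_t (*fn)(const char *) =
	    (size_t (*)(const char *))dlsym(probe_handle(), "strlen");

	return (fn != NULL ? fn(s) : 0);
}

/* Probe style: one dlsym() test, then an ordinary prototyped call. */
size_t
length_via_probe(const char *s)
{
	if (dlsym(probe_handle(), "strlen") != NULL)
		return (strlen(s));
	return (0);
}
```

In the pointer style a mistaken cast compiles silently, whereas in the probe style the call is checked against the prototype from the header like any other function call.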
The use of dlopen() is still appropriate for selecting between differing objects, or when the caller has no advance knowledge of the dependency, as is the case with plugins. In other cases, the use of lazy loading together with dlsym(), as outlined above, is recommended, as the implementation is usually easier to write, debug and deploy.
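For the plugin case, where the object name is only known at runtime, a dlopen()-based loader might look roughly like the following. The run_plugin() helper and the plugin_init entry point are hypothetical names chosen for this sketch, not part of any real plugin interface.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry point that every plugin is assumed to export. */
#define	PLUGIN_ENTRY	"plugin_init"

/*
 * Load the plugin at 'path' and run its entry point. Returns the entry
 * point's result, or -1 if the object or the symbol cannot be found.
 */
int
run_plugin(const char *path)
{
	void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	int (*init)(void);
	int ret;

	if (handle == NULL) {
		(void) fprintf(stderr, "dlopen: %s\n", dlerror());
		return (-1);
	}

	/* The caller must know, and trust, the entry point's signature. */
	init = (int (*)(void))dlsym(handle, PLUGIN_ENTRY);
	if (init == NULL) {
		(void) dlclose(handle);
		return (-1);
	}

	ret = init();
	(void) dlclose(handle);
	return (ret);
}
```

Here the path has to be supplied by the caller, and each interface has to be looked up and cast by hand, which is the overhead the lazy loading approach avoids when the dependency is known at link-time.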