Library Bindings - let's be a little bit more precise, shall we?
Michael Walker Tuesday June 14, 2005
But first, a little history on what we currently do. Solaris (and *nix systems in general) does the following when a process is executed. The kernel loads the required program (an ELF object) into memory and also loads the runtime linker (ld.so.1(1)) into memory. The kernel then initially transfers control to the runtime linker. It's the runtime linker's job to examine the program that was loaded and find any dependencies it has (in the form of shared objects), load those shared objects into memory, and then bind all of the symbol references (function calls, data references, etc.) from the program to each of those dependencies. Of course, as it loads each shared object it must in turn do the same examination on each of them and load any dependencies they require. Once all of the dependencies are loaded and their symbols have been bound, the runtime linker runs the .init sections of each shared object loaded and finally transfers control to the executable, which calls main(). Most people think a process starts with main(), but amazing things happen before we even get there.
Here we will look specifically at how the runtime linker binds the various symbol references between all of the objects loaded into memory. Let's take a simple example first: an application that links against a couple of shared objects and libc.
```
% more *.c
::::::::::::::
bar.c
::::::::::::::
#include <stdio.h>

void
bar()
{
        printf("inside of bar\n");
}
::::::::::::::
foo.c
::::::::::::::
#include <stdio.h>

void
foo()
{
        printf("inside of foo\n");
}
::::::::::::::
prog.c
::::::::::::::
#include <stdio.h>

int
main(int argc, char *argv[])
{
        extern void foo();
        extern void bar();

        foo();
        bar();
        return (0);
}
% cc -G -o foo.so -Kpic foo.c -lc
% cc -G -o bar.so -Kpic bar.c -lc
% cc -o prog prog.c ./foo.so ./bar.so
```

We've now got a program, prog, which is bound against three shared objects: foo.so, bar.so and libc.so. The program makes two function calls, one to foo() and one to bar(), located in its dependent shared objects. Running ldd on the executable shows its dependencies, and a run of it shows the execution flow:
```
% ldd prog
        ./foo.so =>      ./foo.so
        ./bar.so =>      ./bar.so
        libc.so.1 =>     /lib/libc.so.1
        libm.so.2 =>     /lib/libm.so.2
        /platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1
% ./prog
inside of foo
inside of bar
%
```

Nothing too fancy really, but it's an example we can use to examine what bindings are going on. When the program prog makes reference to foo and bar, it's up to the runtime linker to find definitions for these functions and bind the program to them. First the runtime linker loads the dependent shared objects (listed above). As each object is loaded into memory, a Link-Map entry is created for it, and the entries are appended to a Link-Map list in the order the objects are loaded. In the case above the Link-Map list would contain:
```
prog -> foo.so -> bar.so -> libc.so.1 -> libm.so.2 -> libc_psr.so.1
```

When the runtime linker needs to find a definition for a symbol, it starts at the head of the list and searches each object in turn. If the symbol is found, the runtime linker binds to it; if not, it proceeds to the next object on the list. The following demonstrates what's happening. I'll run the prog program with some runtime linker diagnostics turned on to trace what it is doing. I'm concentrating specifically on foo and bar for this example; of course there are thousands of other bindings going on:
```
% LD_DEBUG=symbols,bindings ./prog
...
20579: 1: symbol=foo; lookup in file=./prog [ ELF ]
20579: 1: symbol=foo; lookup in file=./foo.so [ ELF ]
20579: 1: binding file=./prog to file=./foo.so: symbol `foo'
...
20579: 1: symbol=bar; lookup in file=./prog [ ELF ]
20579: 1: symbol=bar; lookup in file=./foo.so [ ELF ]
20579: 1: symbol=bar; lookup in file=./bar.so [ ELF ]
20579: 1: binding file=./prog to file=./bar.so: symbol `bar'
...
```

Not so bad really, but it's not the most efficient way to find a symbol, is it? When we were looking for the symbol bar, we had to search 3 objects before we found it. Now imagine a more complex application with many more shared objects and much larger symbol tables. If I look at firefox, I can see that it has about 50 shared objects loaded:
```
% pldd `pgrep firefox-bin`
28294:  /disk3/local/firefox/firefox-bin
/lib/libpthread.so.1
/lib/libthread.so.1
/lib/libc.so.1
/disk3/local/firefox/libmozjs.so
/disk3/local/firefox/libxpcom.so
/usr/sfw/lib/libgtk-1.2.so.0.9.1
/usr/sfw/lib/libgmodule-1.2.so.0.0.10
/usr/sfw/lib/libglib-1.2.so.0.0.10
/usr/openwin/lib/libXext.so.0
/usr/openwin/lib/libX11.so.4
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libm.so.2
/usr/sfw/lib/libgdk-1.2.so.0.9.1
/disk3/local/firefox/libssl3.so
/disk3/local/firefox/libnss3.so
/disk3/local/firefox/libplc4.so
/disk3/local/firefox/libplds4.so
/disk3/local/firefox/libnspr4.so
/disk3/local/firefox/libsoftokn3.so
/lib/librt.so.1
/lib/libdl.so.1
/lib/libaio.so.1
/lib/libmd5.so.1
/usr/openwin/lib/libXt.so.4
/platform/sun4u-us3/lib/libc_psr.so.1
/usr/lib/libCrun.so.1
/usr/lib/libdemangle.so.1
/disk3/local/firefox/cpu/sparcv8plus/libnspr_flt4.so
/lib/libm.so.1
/disk3/local/firefox/libsmime3.so
/usr/openwin/lib/libXp.so.1
/disk3/local/firefox/libxpcom_compat.so
/usr/lib/libCstd.so.1
/usr/lib/cpu/sparcv8plus/libCstd_isa.so.1
/lib/libw.so.1
/lib/libmp.so.2
/lib/libscf.so.1
/lib/libuutil.so.1
/usr/openwin/lib/libSM.so.6
/usr/openwin/lib/libICE.so.6
/usr/lib/iconv/646%UTF-16BE.so
/usr/lib/iconv/UTF-16BE%646.so
/usr/jdk/instances/jdk1.5.0/jre/plugin/sparc/ns7/libjavaplugin_oji.so
/platform/sun4u/lib/libmd5_psr.so.1
/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/libjavaplugin_nscp.so
/disk3/local/firefox/components/libjar50.so
/usr/dt/lib/libXm.so.4
/disk3/local/firefox/libfreebl_hybrid_3.so
/usr/sfw/lib/mozilla/libnssckbi.so
%
```

On average, each of those objects has a symbol table with over 2,500 symbols. Doing a linear search from the head of the link-map list until you find the symbol just doesn't seem that practical anymore. Firefox is typical of modern applications these days; if you were to take a look at StarOffice, you would find a single program that depends on over 90 different shared objects.
There's got to be a better way, right? There is: we call it Direct Bindings. Instead of doing a linear search at runtime, you can ask the link-editor to record not only which shared objects you bound against, but also which symbols you obtained from each shared object. If an object is built with Direct Bindings, the runtime linker changes how it looks up symbols and, at runtime, binds directly to the object that offered each symbol when the program was linked. It's a much more efficient model. Here's the same prog, this time built with Direct Bindings, which is done by passing the -Bdirect link-editor option on the link line:
```
% cc -Bdirect -o prog prog.c ./foo.so ./bar.so
```

When you link with -Bdirect, the link-editor stores additional information in the object, including where each symbol was found at link time. This can be viewed with elfdump as follows:
```
% elfdump -y prog

Syminfo Section:  .SUNW_syminfo
     index  flgs        bound to        symbol
       ...
      [15]  DBL     [1] ./foo.so        foo
      [19]  DBL     [3] ./bar.so        bar
       ...
%
```

If we repeat the earlier experiment, running the program and examining the bindings the runtime linker actually performs, we see a much more efficient search:
```
% LD_DEBUG=symbols,bindings ./prog
...
20728: 1: symbol=foo; lookup in file=./foo.so [ ELF ]
20728: 1: binding file=./prog to file=./foo.so: symbol `foo'
...
20728: 1: symbol=bar; lookup in file=./bar.so [ ELF ]
20728: 1: binding file=./prog to file=./bar.so: symbol `bar'
...
%
```

Notice that we now find each symbol in the first object we look in. Much better.
Direct Bindings have been in Solaris for a few releases now, although because they are not the default, not everyone is familiar with them. They have matured quite a bit over the last few years, and we are now starting to use them for some of our core shared objects. If you look at the X11 shared objects delivered with Solaris, you'll find that they are built with Direct Bindings:
```
% elfdump -y /usr/lib/libX11.so | head

Syminfo Section:  .SUNW_syminfo
     index  flgs        bound to        symbol
       [1]  D           <self>          _XimXTransDisconnect
       [2]  D       [8] libc.so.1       snprintf
       [3]  D           <self>          _XcmsFreeIntensityMaps
       [4]  D           <self>          _XcmsTableSearch
       [5]  D           <self>          _XDeq
       [6]  D           <self>          XGetWMSizeHints
       [7]  D           <self>          XUnmapWindow
%
```

Besides being more efficient, Direct Bindings are also much more precise. It can get very tricky to control the name space when you start to combine all of the shared objects that modern applications use. If two shared objects happen to offer a symbol of the same name (not by intention), the default lookup binds to the first one found, which is probably not what the user intended. If, however, we bind to exactly the definition that was found when the object was built, there will be far fewer surprises.
Along these lines, it's worth giving a cautionary note to those relinking their existing applications with Direct Bindings enabled. As we apply Direct Bindings to more and more applications, we have found a few cases where there are multiple definitions of a single symbol, and by changing the binding model you can change the behavior of the application. In most, if not all, of these cases this was a bug in the design of the application, but a program can come to depend on that behavior and then fail when run with Direct Bindings.
Further details on Direct Bindings specifically, and on the runtime linker (ld.so.1(1)) and link-editor (ld(1)) in general, can be found in the Linker and Libraries Guide, which is part of the standard Solaris documentation.
Examples of tracing what the runtime linker is doing can be found in a blog entry by Rod titled Tracing a link-edit.