(Not) Managing More Than One Of The Same Object In A Process
Ali Bahrami Wednesday January 06, 2016
I had a conversation with a coworker this week about one of those recurring questions that come up from time to time. There is an existing and widely used system library, and there's a desire to provide a better variant of it, using the same library name, with the same SONAME. The two objects offer the same interfaces, but they cannot coexist within a given process. The question was about whether the linkers can prevent both from being loaded into a single process. I had to deliver the unwelcome news that the linkers cannot do that with 100% reliability, and that they aren't intended to support that sort of design. This sort of discussion comes up frequently enough that I think it would be useful to explore the underlying issues.
For the purposes of this discussion, let's call that library libbar. The 2 copies of libbar live in different locations on the system, but are otherwise the same from a linking point of view. The question was: If the main a.out program uses one libbar, and other dependencies of a.out use the other libbar, is there any way to guarantee that only the good libbar gets loaded, and is used by both?
The short answer is no. The only completely safe way to manage this is to only have one libbar. If you need a variant, give it a different name, and SONAME, and possibly different function names too, or at least use direct bindings. The only really simple thing is one library.
Note that this is not the same question as "How can I design libbar such that having multiple copies loaded and used simultaneously is safe?". That's an easier question to answer: You do it by making the library completely reentrant, and by ensuring that all new APIs are backward compatible with old versions. That is, of course, easier said than done.
To demonstrate how things can go wrong with 2 copies of 1 library, I wrote a small test program. Before we dive into this, you might find it useful to review how library names and SONAMES work: Please see How To Name A Solaris Shared Object.
It helps to first understand how the runtime linker finds dependencies for an object. It's pretty simple:
Sometimes, people believe that the runpath of the main program somehow controls how dependencies of dependencies are found. They'll say something like:
"My program calls libfoo, and libbar. libfoo is itself linked to a different copy of libbar. I want only one libbar to be loaded, but I've been told by the linker experts that these two copies will both be loaded. And yet, I only see the first one being used, which is what I was hoping for, but which seems to contradict the experts. What is really going on, and why can't I just do this?"

The confusion stems from the fact that there's more involved than merely finding and loading libraries in this particular game of pachinko. After finding and loading libraries, the runtime linker carries out symbol resolution, the process of determining how symbols are bound between objects at runtime. If direct bindings are in play, then symbols are bound as the direct bindings dictate. Otherwise, it's done by interposition: The objects in the process are examined in the order that they were loaded, and the first object to provide the desired symbol wins. You can therefore see that it's possible in our question above for 2 copies of libbar to be loaded, while only one is used. It's more complicated than that, however. It's easy to imagine scenarios in which the one used changes, as well as scenarios where both are used.
Let's make this concrete with an example. I have a main program that calls functions foo() and bar(), each of which is in a library (libfoo, and libbar respectively):
    % cat main.c
    #include <stdio.h>

    extern void foo(void);
    extern void bar(void);

    int
    main(int argc, char **argv)
    {
            (void) printf("main calls foo\n");
            foo();
            (void) printf("main calls bar\n");
            bar();
    }

foo() also calls bar():
    % cat foo.c
    #include <stdio.h>

    extern void bar(void);

    void
    foo(void)
    {
            (void) printf(" foo calls bar\n");
            bar();
    }

Now the twist: There are actually 2 libraries named bar. foo() is linked to lib1/libbar.so.1, while main is linked to lib2/libbar.so.1. Both libbar's have the same object name, and the same SONAME.
    % cat bar.c
    #include <stdio.h>

    void
    bar(void)
    {
            printf(" bar is in library %s\n", BAR_STR);
    }

BAR_STR is set via -D on the cc command line when the 2 libbar directories are built.
I have provided a tarball with these files, and a Makefile, that you can download and use to reproduce these experiments. Unpack it in an empty directory, and follow along below:
First, let's build it without any special options. I'll show the output from make for this first experiment to give you a sense of what it does, but will elide it from following ones in the interest of brevity:
    % make
    mkdir lib1
    cc -G -Kpic -DBAR_STR=\"bar_lib1\" bar.c -hlibbar.so.1 \
        -o lib1/libbar.so.1 -zdefs -lc
    rm -f lib1/libbar.so
    ln -s libbar.so.1 lib1/libbar.so
    cc -G -Kpic foo.c -hlibfoo.so.1 \
        -o libfoo.so.1 -L lib1 -R lib1 -zdefs -lbar -lc
    rm -f libfoo.so
    ln -s libfoo.so.1 libfoo.so
    mkdir lib2
    cc -G -Kpic -DBAR_STR=\"bar_lib2\" bar.c -hlibbar.so.1 \
        -o lib2/libbar.so.1 -zdefs -lc
    rm -f lib2/libbar.so
    ln -s libbar.so.1 lib2/libbar.so
    cc main.c -o main -L. -Llib2 -R. -Rlib2 -zdefs -lfoo -lbar

ldd shows that there will be 2 libbar objects in the process:
    % ldd main
            libfoo.so.1 =>   ./libfoo.so.1
            libbar.so.1 =>   lib2/libbar.so.1
            libc.so.1 =>     /lib/libc.so.1
            libbar.so.1 =>   lib1/libbar.so.1

and debug output shows that both are actually pulled into the process:
    % LD_DEBUG=all ./main 2>&1 | grep 'link map' | grep libbar.so
    04689: file=lib2/libbar.so.1  [ ELF ]; generating link map
    04689: file=lib1/libbar.so.1  [ ELF ]; generating link map

However, only one is actually used, the one "controlled" by the a.out:
    % ./main
    main calls foo
     foo calls bar
     bar is in library bar_lib2
    main calls bar
     bar is in library bar_lib2

This didn't happen because the a.out controlled the loading of objects though. It happened because the a.out's libbar was already in memory, and symbol binding is being done via the traditional interposition rules. The symbol bar() could have come from any object in the process, not necessarily from libbar.
It is not safe to assume that the copy of libbar tied to the a.out will always be the one that "wins". One way to change that is to enable lazy loading, which defers object loading until the first access to the object is made:
    % make clean
    rm -rf lib? libfoo.so* main
    % LD_OPTIONS=-zlazyload make
    <...make output elided...>
    % ./main
    main calls foo
     foo calls bar
     bar is in library bar_lib1
    main calls bar
     bar is in library bar_lib1

Now, lib1/libbar wins, rather than lib2/libbar as before. Thanks to lazy loading, libfoo pulled in lib1/libbar before main got around to pulling in lib2/libbar.
Direct bindings offer another way to perturb the results, and can lead to both libraries being called.
    % make clean
    rm -rf lib? libfoo.so* main
    % LD_OPTIONS=-Bdirect make
    <...make output elided...>
    % ./main
    main calls foo
     foo calls bar
     bar is in library bar_lib1
    main calls bar
     bar is in library bar_lib2

Preloading is yet another way to change the outcome:
    % LD_PRELOAD=lib1/libbar.so.1 ./main
    main calls foo
     foo calls bar
     bar is in library bar_lib1
    main calls bar
     bar is in library bar_lib1
    % LD_PRELOAD=lib2/libbar.so.1 ./main
    main calls foo
     foo calls bar
     bar is in library bar_lib2
    main calls bar
     bar is in library bar_lib2

There are probably other ways too. For instance, we haven't even discussed the use of dlopen().
It is indeed true that normal small programs can manage these pitfalls without much issue. But consider the complexity of a situation like that in firefox, where multiple dependencies have dependencies on each other:
    % ldd /usr/bin/firefox | wc -l
          90

At some point, the interdependencies will overwhelm your ability to predict, or to manage. At the limit, the only 100% safe and predictable way to manage this issue is to ensure that there is never more than one instance of a given library on the system.