The Cost Of ELF Symbol Hashing |
Ali Bahrami Tuesday October 21, 2008
Linux and Solaris are ELF cousins. Both systems use the same base ELF standard, though we've both made our own OS specific extensions over the years. Recently, the GNU linker folks made some changes to how Linux does symbol hashing in their ELF objects. I've been learning about what they've done, and that in turn caused me to consider the bigger picture of ELF hashing overhead.
The runtime linker maintains a list of objects currently mapped into a programs address space. To find a desired symbol, it starts at the head of this list and searches each one in turn using a hash lookup operation, until the symbol is found (success) or the end of the list is hit (failure).
The per-symbol cost of symbol lookup hashing grows with:
In the past, when the list of objects in a program was very short, it was not necessary to search many objects before a given symbol was found. Most symbols were in the first, often only, object. Hence, most hash operations were successful, and hash overhead was not a significant concern. In modern programs however, failed hash operations dominate. It is usually necessary to perform one or more failing hash operations before getting to the object that has a desired symbol. The more objects, the larger the percentage of failing hashes.
Unless somehow mitigated, the per-symbol cost of hashing will continue to grow as programs grow larger, possibly to a level where the user can feel the effect. There are several ways in which this overhead can be reduced:
ELF versioning allow symbols to be assigned to versions, thereby creating interfaces that can be maintained for backward compatibility as the object evolves. In a version definition, the scope reduction operator can be used to tell the link-editor that any global symbols not explicitly assigned to a version should not be visible outside the object. For example, the following exports the symbol foo from a version named VERSION_1.0, while reducing the scope of all other global symbols to local:
Some language compilers offer symbol visibility keywords that have similar effect.VERSION_1.0 { global: foo; local: *; };
Eliminating unnecessary symbols from the hash table reduces the average length of the hash chains, and speeds the lookup. In addition, hiding unnecessary symbols from object's external interface prevents accidental interposition in which a given library exports a function intended to only be for its own use, and that symbol ends up interposing on a symbol of the same name in a different object.
Prelinking and direct bindings are very different solutions that attack the problem of hashing overhead along different axis. Systems will see greatly reduced symbol hash overhead with either strategy, but symbol hashing will still occur. As such, the cost of the hash lookup operation is still of interest.
- Prelink
- The Linux operating system emphasizes their prelink system. Prelink analyzes all the executables on the system, and the libraries they depend on. Non-conflicting addresses are assigned for all the libraries, and then the executables and libraries are pre-bound to be loaded at those addresses. In effect, the work normally done by the runtime linker for each object at process startup is instead done once. The runtime linker, recognizing a prelinked object at process startup, will map it to its assigned location, and immediately return rather than do the usual relocation processing.
Prelinking is a complex per-system operation, though Linux does a good job of hiding and managing this complexity. Changes to the system can require the prelink operation to be redone.
Prelinking pares the cost of ELF runtime processing to the absolute minimum. As part of that, it completely eliminates symbol hashing at startup. A further advantage is that it does not require any changes to the objects in question. All of the complexity is kept in the prelink system itself.
Prelinking will not prevent hashing from occurring if:
- The object is loaded via dlopen().
- The object is not prelinked
- If the object is prelinked for one system, but is then used by another, either as a copy or via a network filesystem. Prelinking is a per-system concept. The system can use an object prelinked for a different system, but the benefits of prelinking may be lost.
- The objects on the system have changed, altering the prelinking needed. In this case, prelinking can be recomputed.
- Direct Binding
- The Solaris operating system employs a combination of direct binding, preferably in conjunction with lazy loading and lazy binding. In non-direct binding, an object records the symbols it requires from other objects. In direct binding, each such symbol also records the object expected to provide that symbol.
Direct bindings were developed with several goals in mind:
- They harden large programs against the effects of unintended symbol interposition, when the wrong library provides a given symbol due to the order in which libraries get loaded. This makes it easier to reliably build larger programs.
- They allow multiple instances of a named interface to be referenced by different objects, further expanding the ability of complex programs to interface with multiple, perhaps incompatible, external interfaces. For instance, when a program depends on many objects that are developed independently by others, sometimes those objects have dependencies on different incompatible versions of some other object.
- Performance: At runtime, the runtime linker uses the direct binding information to skip the O(n) search of all the objects in the process, and instead to go directly to the right library for each symbol, carrying out a single successful hash operation to locate it within the object.
The the first two items were the driving issues that led to the invention of direct bindings, reflecting real problems we were encountering in the field.
Lazy binding, which is the default, reduces the amount of hashing done even further, by delaying the binding operation until the first use of a given symbol by the program. If a given symbol is not needed during the lifetime of a process, it is never looked up, and the hashing that would otherwise occur is eliminated. Most programs do not use all of the symbols available to them in a given execution, so lazy binding amplifies the benefits of direct binding.
Direct bindings do not eliminate hashing, but they eliminate the unproductive failure cases, leaving a single successful hash lookup per symbol. In a sense, they bring us back to the performance of early ELF systems where everything was found in a single library and most hash operations were successful. Direct bindings have a relatively simple implementation. They work with dlopen(), and across network filesystems. However, many objects do not use direct bindings. Unlike prelink, direct bindings require explicit action to be taken when the object is built. Converting an existing non-direct object to use them can require some analysis to be carried out by the programmer, to enable desired interposition that direct bindings would otherwise prevent.
A Bloom filter is never wrong when it says an item does not exist in a set. Applied to ELF hash tables, the runtime linker can test a symbol name against a Bloom filter, and if rejected, immediately move on to the next object. Since most symbol lookups end in failure as discussed above, this has the potential to eliminate a large number of unnecessary hash operations. It is possible for a Bloom filter to incorrectly indicate that an item exists in the set when it doesn't. When this happens, it is caught by the following hash lookup, so correct linker operation is not affected. Since false positives are rare, this does not significantly affect performance.
It is interesting to note that the use of a Bloom filter makes a successful symbol lookup slightly more expensive than it would be otherwise. The hash table alone can be used to determine if a symbol exists or not, so the Bloom filter is pure overhead in the success case. Despite that, the Bloom filter is a winning optimization, because it is very cheap to compute compared to a hash table lookup, and because most hash operations are against libraries that end up not containing the desired symbol.
It is also worth noting that the runtime linker is free to skip the use of the Bloom filter and proceed directly to the hash lookup operation. This may be worthwhile in situations where the runtime linker has other reasons to believe the lookup will be successful. In particular, if the runtime linker is directed to a given object via a direct binding, the odds of a failed symbol lookup should be zero, so there is no need to screen before the hash lookup.
Nonetheless, the GNU hash embodies several worthwhile ideas:
It is clear that hash overhead can be measured and reduced. By and large however, we have not found ELF hash overhead to be a hot spot for real programs. It seems that the other things that go on in a program generally dominate the hit that comes from ELF symbol hashing. The core Solaris OS is now built using direct bindings, which has allowed us to harden and simplify aspects of its design. Interestingly enough, measurements do not reflect a significant resulting change in system performance. This does not prove that hash overhead should be ignored, but it does tend to suggest that one has to look towards extremely large non-direct bound programs in order to demonstrate the issue.
It's all food for thought, and perhaps time for some experimentation and measurement.
[11] Cross Link-Editor | [13] GNU Hash ELF Sections |