The Problem(s) With Solaris SVR4 Link-Editor Mapfiles

Ali Bahrami — Wednesday January 06, 2010

Surfing with the Linker-Aliens

Until recently, I've never really felt that I fully understood the mapfile language used by the Solaris link-editor (ld), despite having used it for years. It's a terse and arbitrary language that does not encourage intuition, full of special cases and odd twists. No matter how many times you read the mapfile chapter of the Linker and Libraries Guide, you're left with a sneaking suspicion that some things just don't fit, or that you've missed something.

Lately, I've been working on a new mapfile syntax to replace this original language, which Solaris inherited as part of its System V Release 4 origins. In the process, I've examined every line of the manual, and of the code, many times. I believe I understand it all the way down now, and I'd like to record some of what I've learned here. My main reason for doing this is as justification for undertaking a replacement language. Oddly enough though, I believe that this information will make it easier to decode, use, and write these older mapfiles. Once you understand the quirks, you can work around them.

This discussion will not cover the new syntax — that will come in a subsequent installment. However, I do want to reassure you that full support for the original mapfile language will remain in place. We're not about to force anyone to rewrite 20+ years worth of mapfiles. The goal is to freeze the old support in its current form, provide a better alternative, and gradually move the world to it over a period of years.

Terse To A Fault / Not Extensible

The core of the old syntax is simple enough: You can create segments, set attributes for them, and assign sections to them. One can easily believe that it seemed adequate and reasonable to its creators. Their primary design decision was to make SVR4 Mapfiles a magic character language. The purpose of a given statement is specified using special characters (=, :, |, @). Options to these statements are further distinguished from each other using other special characters (?, $, ...), or single letter prefixes.

Languages face continuous pressure to expand and provide new features. The initial language may have seemed spare and elegant, but it failed to provide a scalable mechanism for expansion, and this has proven to be a terrible weakness:

Most of these characters have no mnemonic value. The human mind struggles to remember what they stand for, resulting in frequent trips to the reference manual to decode them. The problem gets worse as the number of supported features grows, and is exacerbated by the fact that most people only read mapfiles on an occasional basis. The syntax is not reinforced by constant use the way some other terse languages are.

SVR4 Mapfile Syntax As Evolution

For me, the best way to understand the SVR4 mapfile syntax has been to start with its original form, and then consider how and where each subsequent feature has been added.

In the late 1980s, starting around 1986 or so, Unix System V Release 4 (SVR4) was being developed at AT&T. They created a new linking format (ELF) to resolve the inadequacies of the previous format (COFF) used in SVR3. SVR3 had a rather elaborate mapfile syntax. Rather than stay with this syntax, the SVR4 people designed a new, smaller, and simpler replacement. We don't know their reasons for this decision, and can only guess that they didn't think the SVR3 language was necessary, or a good fit with their new ELF based link-editor. As an aside, while researching different mapfile languages during the design of the replacement syntax for Solaris, I discovered that there is a notable similarity between the SVR3 mapfile language and GNU ld linker scripts. SVR3 lives on, as does much of Unix, in its influence on later systems.

The original SVR4 language was very small, consisting of four different possible statements. All of these have the form:

name magic ... ;
where name is a segment name, and magic is a character that determines what the directive does:

Solaris started with the original SVR4 code base. Since then, Sun has added three more top level statements:

File Control Directives and Capabilities use the same form of syntax as the original four directives. Symbol scope/version blazed a new path, using {} to group symbol names within:
[version-name] {
    scope:
	symbol [= ...];
	*;
} [inherited-version-name...];
In the following subsections, I will present a brief description of each of these top level mapfile statements, and discuss the various odd or unfortunate aspects of each. If you're not familiar with the mapfile language, it may be helpful to have the Linker and Libraries Guide available as well.
Segment Definition (=)
Segment definition statements can be used to create a new segment, or to modify existing ones:
segment_name = segment-attribute-value... ;
If a segment-attribute-value is one of (LOAD, NOTE, NULL, STACK), then it defines the type of segment being created:

If a segment-attribute-value starts with the '?' character, then it is a segment flag:

?R, ?W, ?X
Set the Read (PF_R), Write (PF_W), and eXecute (PF_X) program header flags, respectively. This is a feature of the original SVR4 syntax, and is self explanatory.
?E
The Empty flag can be used with either a LOAD or a NULL segment:

  1. Applied to a LOAD segment, the ?E flag creates a "reservation". This is an obscure and little used feature by which a program header is written to the output object, "reserving" a region of the address space for use by the program, which presumably knows how to locate it and do something useful. Sections cannot be assigned to such a segment.

  2. Applied to a NULL segment, the ?E flag adds extra PT_NULL program headers to the end of the program header array. This feature is useful for post optimizers which rewrite objects to add segments, and need a place to create corresponding PT_LOAD program headers for them.

  3. The ?E flag is meaningless when applied to NOTE or STACK segments.

The Empty flag was added by Sun. It should be noted that ?E does not correspond to an actual program header flag. Its treatment as a flag in the mapfile syntax, rather than as a different sort of option (using a magic character other than '?' as a prefix), was primarily a matter of implementation convenience.

?N
Normally, the link-editor makes the ELF and program headers part of the first loadable segment in the object. The ?N flag, if set on the first loadable segment, prevents this from occurring. The headers are still placed in the output object, but are not part of a segment, and therefore not available at runtime. It is meaningless to apply ?N to a non-LOAD segment.

This flag was added by Sun. As with ?E, it does not correspond to a real program header flag. Its representation as a flag is a matter of implementation convenience.

?O
This is another flag, added by Sun, that does not correspond to a real program header flag. It is used to control the order of placement of sections from input files within the output sections in the segment. Sections are assigned to segments via the ':' mapfile directive. Normally, sections are added in the order seen by the link-editor. When ?O is set, the order of the input sections matches the order in which these assignment directives are found in the mapfile.

This feature was added to support the use of the -xF option to the compiler. That option causes each function to be placed in its own section, rather than all of the functions from a given source file going into a single generic text section. Then, their order can be specified using a mapfile, as with this example taken from the Linker and Libraries Guide:

text = LOAD ?RXO;
text : .text%foo
text : .text%bar
text : .text%main
	
The result of using this mapfile will be for foo(), bar(), and main() to be placed adjacent to each other at the head of the segment, in that order. This feature can be used to put routines that call each other close together, to enhance cache performance. It is worth noting that it was also necessary to set the R and X flags, even though they already are RX on a text segment. This is a quirk of the SVR4 syntax: Any change to the flags replaces the previous value, so we have to specify the flags we want to keep (RX) as well as the one we want to set (O).
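The flag-replacement quirk can be modeled in a few lines of C. This is only an illustrative sketch, not the link-editor's code: the function name apply_flag_string and the SKETCH_ORDER bit are hypothetical names invented here, while the PF_* values are the standard ELF ones.

```c
#include <assert.h>
#include <stddef.h>

/* Standard ELF program header flag values (as in <elf.h>). */
#define PF_X 0x1
#define PF_W 0x2
#define PF_R 0x4
/* Hypothetical bit standing in for the link-editor's internal "order" state. */
#define SKETCH_ORDER 0x100

/*
 * Model of the SVR4 rule: a '?' flag string REPLACES the segment's
 * current flags instead of being merged into them.
 * Returns the new flag word; unknown letters are ignored in this sketch.
 */
unsigned apply_flag_string(unsigned current, const char *letters)
{
    unsigned result = 0;    /* start from nothing: old flags are dropped */

    assert(letters != NULL);
    (void)current;          /* deliberately unused: that is the quirk */
    for (; *letters != '\0'; letters++) {
        switch (*letters) {
        case 'R': result |= PF_R; break;
        case 'W': result |= PF_W; break;
        case 'X': result |= PF_X; break;
        case 'O': result |= SKETCH_ORDER; break;
        default: break;
        }
    }
    return result;
}
```

In this model, applying "O" alone to a segment that started out as RX leaves only the order bit set, which is why the example mapfile spells out ?RXO.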

A segment-attribute-value can also be a numeric value, prefixed with one of the letters (A, L, P, R, V), to set the Alignment, Maximum length, Physical address, Rounding, or Virtual address of a LOAD segment, respectively.
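To make the single-letter prefix convention concrete, here is a minimal sketch of how such a token might be split into its letter and its number. The function name parse_prefixed_number is hypothetical, and the use of strtoull with base 0 (accepting decimal and 0x hex) is my assumption rather than a description of the real parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a token such as "V0x80000000" into its prefix letter and value.
 * Accepts the prefixes A, L, P, R and V followed by a complete number.
 * Returns 0 on success, -1 if the token does not have that shape.
 */
int parse_prefixed_number(const char *tok, char *kind,
    unsigned long long *value)
{
    char *end;

    assert(tok != NULL && kind != NULL && value != NULL);
    /* Check for the empty token first: strchr() would match the NUL. */
    if (tok[0] == '\0' || strchr("ALPRV", tok[0]) == NULL || tok[1] == '\0')
        return -1;
    *value = strtoull(tok + 1, &end, 0);
    if (*end != '\0')       /* trailing junk: not a prefixed number */
        return -1;
    *kind = tok[0];
    return 0;
}
```

Note that a keyword such as LOAD also begins with one of the prefix letters. The sketch rejects it only because "OAD" is not a number, which hints at how fragile single-letter prefixes are as a design.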

The syntax for segment definition suffers from a variety of issues:

Section to Segment Assignment (:)
The link-editor contains an internal list of entrance criteria each of which contains section attributes. To place a section in an output segment, it compares the section to each item in this list. If a section matches all of the items in a given entrance criteria, then the section is assigned to the corresponding segment, and the search ends.
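The first-match search described above can be sketched as follows. The struct and function names are hypothetical, the criterion is reduced to a section name and a section type, and treating NULL or 0 as "don't care" is an assumption of this sketch rather than the link-editor's actual representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified entrance criterion: NULL/0 means "don't care". */
struct criterion {
    const char *segment;     /* segment that receives matching sections */
    const char *sect_name;   /* required section name, or NULL */
    unsigned    sect_type;   /* required section type, or 0 */
};

/*
 * Walk the list in order and return the segment of the first criterion
 * whose every specified attribute matches; NULL if none matches.
 */
const char *assign_segment(const struct criterion *list, size_t n,
    const char *name, unsigned type)
{
    assert(list != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (list[i].sect_name != NULL &&
            strcmp(list[i].sect_name, name) != 0)
            continue;
        if (list[i].sect_type != 0 && list[i].sect_type != type)
            continue;
        return list[i].segment;   /* first match wins; search ends */
    }
    return NULL;
}
```

Because the search stops at the first match, the position of a criterion in the list matters as much as its contents: a later, more specific entry is never consulted if an earlier one already matches.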

Sections can be assigned to a specific segment via the following syntax, which uses the ':' magic character. The result of such a statement is to place a new entrance criteria on the internal list:

segment_name : section-attribute-value... [: file-name...];

If a section-attribute-value starts with a '$' prefix, then it specifies a section type. This can be one of ($PROGBITS, $SYMTAB, $STRTAB, $REL, $RELA, $NOTE, $NOBITS).

If a section-attribute-value starts with a '?' prefix, then it specifies one or more section header flags: A (SHF_ALLOC), W (SHF_WRITE), or X (SHF_EXECINSTR). To specify that a given flag must not be present, you can prepend it with the '!' character.
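One way to picture the '!' negation is as two bit masks: flags that must be set, and flags that must be clear. The sketch below assumes that model; the function names are hypothetical, and only the SHF_* values are the standard ELF ones.

```c
#include <assert.h>
#include <stddef.h>

/* Standard ELF section header flag values (as in <elf.h>). */
#define SHF_WRITE     0x1
#define SHF_ALLOC     0x2
#define SHF_EXECINSTR 0x4

/*
 * Split a flag string such as "A!W" (the text after '?') into the flags
 * that must be set and the flags that must be clear.
 * Returns 0 on success, -1 on an unrecognized character or dangling '!'.
 */
int parse_section_flags(const char *s, unsigned *must_set,
    unsigned *must_clear)
{
    int negate = 0;

    assert(s != NULL && must_set != NULL && must_clear != NULL);
    *must_set = *must_clear = 0;
    for (; *s != '\0'; s++) {
        unsigned bit;

        if (*s == '!') {          /* applies to the next flag letter */
            negate = 1;
            continue;
        }
        switch (*s) {
        case 'A': bit = SHF_ALLOC; break;
        case 'W': bit = SHF_WRITE; break;
        case 'X': bit = SHF_EXECINSTR; break;
        default: return -1;
        }
        if (negate)
            *must_clear |= bit;
        else
            *must_set |= bit;
        negate = 0;
    }
    return negate ? -1 : 0;
}

/* A section matches when every required bit is set and no forbidden bit is. */
int section_flags_match(unsigned sh_flags, unsigned must_set,
    unsigned must_clear)
{
    return (sh_flags & must_set) == must_set &&
        (sh_flags & must_clear) == 0;
}
```

Under this model, "A!W" selects allocable sections that are not writable, while flags that are not mentioned at all (here X) are simply ignored.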

A section-attribute-value that does not start with a '$' or '?' prefix is a section name.

If there is a second colon (':') character on the line, then all items following it are file paths, and if any of these match the path for the input file containing the section to be assigned, it is considered to be a match. If the path name is prefixed with a '*', then the basename of the path is compared to the given name rather than the entire path.
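The file matching rule is small enough to sketch directly. The function name file_matches is hypothetical and the code is my reading of the rule as described above, not the link-editor's implementation.

```c
#include <assert.h>
#include <string.h>

/*
 * Compare an input file path against one file name from a ':' directive.
 * A leading '*' means "compare only the basename"; it is not a glob.
 * Returns 1 on a match, 0 otherwise.
 */
int file_matches(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
    if (pattern[0] == '*') {
        const char *slash = strrchr(path, '/');
        const char *base = (slash != NULL) ? slash + 1 : path;

        return strcmp(pattern + 1, base) == 0;
    }
    return strcmp(pattern, path) == 0;
}
```

With this reading, "*popcorn.o" matches obj/popcorn.o, plain "popcorn.o" does not, and a '*' anywhere other than the first character is just an ordinary character to compare.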

Odd aspects of section to segment assignment:

Section-Within-Segment Ordering (|)
Section within segment ordering can be used to cause the link-editor to order output sections within a segment in a specified order. The specification is done by section name:
segment_name | section_name1;
segment_name | section_name2;
segment_name | section_name3;

The named sections are placed at the head of the segment in the order listed.

One might expect to be able to put more than one section on a line (you can't), and the use of '|' may cause a Unix user to make some invalid assumptions about shell pipes, or the C bitwise OR operator. However, there's nothing really terrible about this directive.

It's also not terribly useful --- I'm not sure I've ever seen it used outside of our link-editor tests.

Segment size symbols (@)
The '@' magic character is used to create an absolute symbol containing the length of the associated segment, in bytes:

segment_name @ symbol_name;

There is no corresponding mechanism to create a symbol containing the starting address of a segment, so it is debatable how useful the length is. Perhaps the user is expected to know the name of the first item (possibly a function in a text segment) and use that. In any case, we've never seen this feature used outside of our own tests.

The use of '@' carries no useful mnemonic information, but that's not unique to this particular directive.

Symbol Scope/Version Definition ({})
Symbol scope/versioning directives allow you to build objects that group symbols into named versions. When objects are built, they record the versions they require from dependencies, and at runtime, the runtime linker ld.so.1 validates that the necessary versions are present. Versioning was introduced in Solaris 2.5, and was later adopted (with extensions) by the GNU/Linux developers in a manner compatible with Solaris. This is easily the most successful part of the mapfile language, and has proven to be a very useful feature. Today, most mapfiles we encounter contain only symbol versioning.

Scope/versioning definitions have the form:

[version-name] {
    scope:
	symbol [= ...];
	*;
} [inherited-version-name...];
If no version-name is specified, it's a simple scope operation, where global names are assigned to the unnamed "global" version. If a version name is given, the symbols within are assigned to that version, and the version can specify other versions that it inherits from.

Within the {} braces, one can encounter three different types of item:

  1. A symbol scope name (default/global, eliminate, exported, hidden/local, protected, singleton, symbolic), followed by a colon. These statements change the current scope, which starts as global, to the one specified. Any symbols listed after a scope declaration receive that scope, until changed by a following scope definition.

  2. A '*', which is called the scope auto-reduction operator. All global symbols in the final object not explicitly listed in a scope/version directive are given the current scope, which must be hidden/local, or eliminate. Auto-reduction is a powerful tool for preventing implementation details of an object from becoming visible to other objects.

  3. A symbol name, optionally followed by a '=' operator and attributes, finally terminated with a ';'.

The attributes that are allowed for a symbol are:

The scope/symbol directives are by far the most successful part of the SVR4 mapfile language, and there is relatively little to complain about. However, there are aspects of the way the symbol attributes work that could certainly be improved, caused in my opinion by an evident attempt to fit things stylistically with the rest of the language:
File Control Directives (-)
File control directives allow you to tell the link-editor to restrict the symbol versions available from a sharable object dependency being linked to the output object. The most common use for this feature is to limit your object to a set of functionality associated with a specific release of the operating system:

shared_object_name - version_name [version_name ...];

where version_name is the name of versions found within the shared object.

When a given shared object is specified with one of these directives, the link-editor will only consider using symbols from the object that come from the listed versions, or the versions they inherit. The link-editor will then make the versions actually used dependencies for the output object.

Alternatively, a version_name can be specified using the form:

$ADDVERS=version_name

In this case, the specified name is made a dependency for the output object whether or not it was actually needed by the link.

There are some odd aspects to file control directives:

Hardware/Software Capabilities (=)
The hardware and software capabilities of an object can be augmented, or replaced, using mapfile capability directives:
hwcap_1 = capitem...;
sfcap_1 = capitem...;
where the values on the right hand side of the '=' operator can be one of the following:

Perhaps the most unfortunate fact about the capability directives is that they use the '=' magic character, which normally indicates a segment definition. This has some odd ramifications:

One can understand the temptation to reuse '=' for capabilities, instead of picking some other unused magic character. Which one would you pick to convey the idea of 'capability'? I don't find any of the available characters (%, ^, &, ~) compelling in the least. Still, this overloading of '=' is a problem.

As a demonstration of how very similar mapfile lines can have wildly different meanings, consider the following example, which uses the debug feature of the link-editor to show us how mapfile lines are interpreted:

% cat hello.c
#include <stdio.h>

int
main(int argc, char **argv)
{
        printf("hello\n");
        return (0);
}
% cat mapfile-cap
HwCaP_1 = LOAD ?RWX;		# A segment
hwcap_1 = V0x12;                # A capability
% LD_OPTIONS=-Dmap cc hello.c -Mmapfile-cap
debug: 
debug: map file=mapfile-cap
debug: segment declaration (=), segment added:
debug: 
debug: segment[3] sg_name:  HwCaP_1
debug:     p_vaddr:      0           p_flags:    [ PF_X PF_W PF_R ]
debug:     p_paddr:      0           p_type:     [ PT_LOAD ]
debug:     p_filesz:     0           p_memsz:    0
debug:     p_offset:     0           p_align:    0x10000
debug:     sg_length:    0
debug:     sg_flags:     [ FLG_SG_ALIGN FLG_SG_FLAGS FLG_SG_TYPE ]
debug: 
debug: hardware/software declaration (=), capabilities added:
debug: 

Other misfeatures of the capability syntax are the overloading of the '$' prefix to indicate an instruction to the link-editor ($OVERRIDE), and the use of the 'V' prefix in front of numeric values. These prefixes have different, though similar, meanings elsewhere, which makes the language hard to understand.
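The difference between the two lines in mapfile-cap comes down to an exact, case-sensitive comparison of the name on the left of the '='. A minimal sketch of that dispatch, using hypothetical names of my own (the enum and classify_equals_directive are not taken from the link-editor source), might look like this:

```c
#include <assert.h>
#include <string.h>

enum directive_kind { DIR_SEGMENT, DIR_CAPABILITY };

/*
 * Decide how a "name = ..." line is treated: exactly "hwcap_1" or
 * "sfcap_1" selects a capability directive; any other spelling,
 * including a different letter case, names a segment.
 */
enum directive_kind classify_equals_directive(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "hwcap_1") == 0 || strcmp(name, "sfcap_1") == 0)
        return DIR_CAPABILITY;
    return DIR_SEGMENT;
}
```

A typo in the name, or a stray capital letter, therefore silently changes the meaning of the whole line rather than producing an error.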

Mapfile Magic Character Decoder Ring

Another strategy for understanding SVR4 mapfiles is to organize things by magic character.

Most mapfile directives have the form:

name magic ... ;
where name is generally (but not always) a segment name, and magic is a character that determines what the directive does.

The following is a comprehensive list, in no particular order, of the magic characters and related syntactic elements used in the current SVR4 mapfile language:

Character    Meaning
=
  1. Create a new segment, or modify the attributes of an existing one, as long as the segment is not named 'hwcap_1', or 'sfcap_1'.

  2. If '=' is used to reference a "segment" named 'hwcap_1', or 'sfcap_1', then this is a hardware or software capabilities directive, and not a segment directive at all. This means that you cannot create a segment named 'hwcap_1', or 'sfcap_1'. However, these names are case sensitive, so you can create segments of those names using any other case. For example, HWcap_1 would name a segment rather than refer to hardware capabilities.

  3. Within a symbol scope/version, associate a symbol name to one or more following attributes.

  4. Within a "File Control Directive", associate the $ADDVERS option (a use of the '$' magic character) with a version name, causing the given version to be added to the output object even if it is not directly used.
:

  1. Assign sections to segments.

  2. If used twice in a section to segment assignment directive, the second one indicates that the items following it are not section names, as they have been to that point, but are file paths from which the previous sections can come.
| Specify output section ordering within a segment. It does not mean "pipe" as it would in the shell, nor does it mean 'OR' as it would in a C-style programming language.
@ Create a "size symbol" for the specified segment, containing the length of the segment. It is not clear how useful these are, since there is no corresponding "address symbol" that might be used to locate the start of the segment for which we have a size. We've never seen it used.
- A "File Control Directive", used to specify the version definitions to be used from the sharable objects linked to the output object.
{ } Grouping, used to contain the symbols within a scope/version directive.
; Terminates all directives, similar to its purpose in the C programming language.
*

  1. Following the second ':' character in a section to segment assignment directive (:), as a prefix to the file names specified following the ':', specifies that the link-editor should compare the basename of the file providing the input section to the prefixed string, rather than comparing the full file path. The use of '*' in a file path is easily confused with the Unix shell "glob" wildcard character. However, this use in the mapfile is not a glob, and only has its special basename meaning if seen as the first character in the name.

  2. Within a symbol scope/version directive, the scope auto-reduction operator, which causes all symbols not otherwise assigned to a symbol version to be reduced to the current scope, which must be local/hidden, eliminate, or protected.
?
  1. Within a segment directive (=), indicates segment flags: 'E' (Empty), 'N' (Nohdr), O (Order), R (Read), W (Write), and 'X' (eXecute). Note that only RWX represent real program header flags. The others (ENO) are not really segment flags but communicate segment related information to the link-editor. This is an example of overloading --- they are "flag like", so it was convenient to treat them as flags rather than use some other magic character to represent them.

  2. Within a section to segment assignment directive (:), indicates section flags: 'A' (Allocable), 'W' (Writable), and 'X' (eXecinstr). Within these flags, the '!' character can be used to specify that the following flag must not be set in the candidate section.
$

  1. Within a section to segment mapping directive (:), a prefix used to indicate that the name following is a section type (PROGBITS, SYMTAB, etc) rather than a section name.

  2. Within a "File Control Directive", a prefix used to indicate that the following name is a special option to be applied to a version. Currently, the only such option is $ADDVERS.
! Within a section to segment mapping directive (:), and within the specification of section flags (?), negates the meaning of a given flag, indicating that the flag must not be set.
A
  1. When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a segment alignment.

  2. When used in section to segment assignment (:) flag value(?), specifies the SHF_ALLOC section header flag.
E When used within a segment definition (=) for a flag (?) value, alters the meaning of LOAD or NULL segments. When applied to a LOAD segment, ?E specifies that this segment is to be reserved (Empty). No sections are assigned to it, but a program header is generated and at runtime, the region is available to the running program to use. This is an obscure and little used feature. When applied to a NULL segment, reserves an additional PT_NULL program header, for the use of post optimizers that will add segments to the object. Note that this "flag" does not correspond to an actual program header flag.
L When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a maximum segment size.
N When used within a segment definition (=) flag value (?): By default, the first segment in an object, which is usually the text segment, contains the ELF header found at the start of the file, making the ELF header available to the runtime linker. The ?N flag specifies that if this segment is the first in the file, it should omit the ELF header. Note that this "flag" does not correspond to an actual program header flag, and that it has no meaning if the segment does not end up being first.
O When used within a segment definition (=) flag value (?): Input sections assigned to the segment should be ordered within their output sections in the order that section assignment directives (:) for the segment are encountered within the mapfile. Note that this "flag" does not correspond to an actual program header flag.
P When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a physical address.
R

  1. When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a segment rounding value.

  2. When used within a segment definition (=) flag value(?), specifies the READ (PF_R) program header flag value.
S When used in a symbol scope/version directive, as a prefix to a numeric value in a symbol's attributes, specifies that the number provides the symbol size (st_size).
V
  1. When used in a segment (=) directive, as a prefix to a numeric value, indicates that the number is a virtual address.

  2. When used in a hwcap_1 or sfcap_1 capabilities definition (=), as a prefix to a value that has not been recognized as a hardware or software capability name, indicates that the item is a number.

  3. When used in a symbol scope/version directive, as a prefix to a numeric value in a symbol's attributes, specifies that the number provides the symbol value (st_value).
W

  1. When used within a segment definition (=) flag value (?), specifies the WRITE (PF_W) program header flag value.

  2. When used in section to segment assignment (:) flag value(?), specifies the SHF_WRITE section header flag.
X

  1. When used within a segment definition (=) flag value (?), specifies the EXECUTE (PF_X) program header flag value.

  2. When used in section to segment assignment (:) flag value(?), specifies the SHF_EXECINSTR section header flag.

Time For A Fresh Start

The original mapfile language inherited from AT&T was no beauty, but it was good enough to go forward with. We've continued to build on it for 2 decades for a variety of good reasons, primarily that it was getting the job done, that it wasn't preventing progress, and that there has been plenty of other work to do. The sort of users who write mapfiles are up to dealing with a little ugliness, and perhaps have been a bit more tolerant than the situation deserves. The "mapfile situation" has been a concern for years. Put simply, it is not asking too much that a programmer with a reasonable (not necessarily deep) grasp of linker concepts be able to read and understand the intent of a mapfile without resorting to a reference manual or linker source code. Nor should it be a difficult chore to fit a new feature into the language cleanly.

The mapfile syntax issue usually comes up in the context of wanting to add a new feature, and despairing at the ugliness of what that implies. One is usually in the middle of solving a considerably more focused and urgent problem, and not willing or able to take an extensive detour to replace underlying infrastructure. And so we've moved forward, adding one thing, and then another, with the situation slowly, but not catastrophically, getting worse each time. The current state of our mapfile language is such that we shy away from adding new features, and we are aware of other projects that may need some link-editor support in the near future. The right infrastructure simplifies everything it touches, and as we know all too well, the reverse is also true.

We've known for quite a while that eventually it would be necessary to tackle this issue systematically and produce a new mapfile language for Solaris. That time has finally arrived.

Comments

Michael Ernest — Wednesday January 06, 2010

This is great news, and solid background work.

As a matter of personal interest, I've tried once or twice to divine what's going on with ld. I never got too far, owing to other demands, but neither could I remember that I ever got a foothold from the attempts.

These blog articles are saving me a ton of time in background reading, at the very least. At their best, they're turning on a number of lights for me. Thanks! And keep up the great work.

Published Elsewhere

https://blogs.sun.com/ali/entry/the_problem_s_with_solaris/
https://blogs.oracle.com/ali/entry/the_problem_s_with_solaris/
https://blogs.oracle.com/ali/the-problems-with-solaris-svr4-link-editor-mapfiles/
