A New Mapfile Syntax for Solaris

Ali Bahrami — Thursday January 07, 2010

Surfing with the Linker-Aliens

In the previous entry, I discussed at length the problems and misfeatures of the original Solaris mapfile language that we inherited with System V Release 4 Unix. The original language was not designed to be extended, yet we've built on top of it for 20+ years. Although we could continue to do so, we have come to a point where a new language that retains the good features of the old, while addressing its shortcomings, would pay dividends.

My project to create a replacement mapfile language is in its final stages. I believe that the resulting syntax is simple, highly readable, and easily extended. Yet, the result is also highly evolutionary. I think anyone who knows the old language will have little difficulty understanding and quickly putting the new one to use. The implementation is complete, and I've used it to build a copy of the Solaris OSnet workspace with all of its mapfiles rewritten using the new syntax. Yesterday, the PSARC case for this work was approved, a significant milestone:

PSARC/2009/688 Human readable and extensible ld mapfile syntax

We're currently in a restricted build period leading up to the release of the next OpenSolaris, and this work will have to wait to integrate until after that, probably in the second half of February. However, the work is essentially done, and this seems like a good time to get some information about it into circulation.

The case materials for PSARC/2009/688 include a replacement mapfile chapter for the Solaris Linker and Libraries Guide. The old chapter will be preserved as an appendix for the benefit of those needing to decrypt existing mapfiles. Until this new material appears in the published manual, I hope you will find this HTML version helpful.

There is little reason to repeat the information in that document here. Instead, I would like to describe the underlying principles we used to design this new language, and to provide a series of examples in which a single item is expressed in both the old and new syntaxes. I think that these examples probably offer the fastest way for someone who already knows the old syntax to start using the new one. I will refer to the Linker and Libraries Manual frequently in this discussion, often using the abbreviation LLM.

Design, Testing, and Base Principles

The new syntax was developed in an iterative manner, starting with a paper design, written in the form of a replacement for the current LLM mapfile chapter, and progressing to implementation and testing with real mapfiles. With each iteration, I would take the lessons learned, debate and discuss the options with my fellow linker alien Rod Evans, and alter the design to address the issues and move forward. As might be expected, there were false starts, and surprises along the way, but eventually things solidified around the final design.

Once we had a final design and a working implementation of it, I modified our linker tests so that each test that uses a mapfile now does so twice, once with the old syntax, and once with the new. This has two important benefits:

  1. I can ensure that the new syntax can do anything the old one can (modulo a few obscure features not taken forward), by comparing the two resulting objects to make sure they are identical.

  2. The old syntax will continue to be used, so it will not fail due to bit rot.

As I iterated through the design process, I developed and refined the following list of requirements and observations that in turn guided subsequent iterations. Listed in no particular order:

New Syntax Overview
The full definition of the version 2 mapfile language can be found online. As mentioned earlier, I won't be repeating that information here. Instead, I'll provide a high level overview, with an eye towards showing how the wish list from the previous section was fulfilled.

A version 2 mapfile can contain two types of directive: control directives and regular directives, each described below.

As with the version 1 syntax, '#' is the comment character. A # on a line, and everything following it, is ignored by the link-editor, as are empty lines.

The first non-comment, non-empty, line in a version 2 mapfile must be the control directive:

$mapfile_version 2
Any mapfile that does not start with this line is interpreted as a version 1 mapfile, in which case the full original syntax is supported.
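As a minimal sketch of this rule, a version 2 mapfile may open with any number of comment and blank lines, as long as the first real statement is the control directive. The segment name and address here are illustrative only, borrowed from the examples later in this entry:

```text
# Header comments and blank lines may precede the version directive.
# The link-editor ignores them when deciding which syntax is in use.

$mapfile_version 2

LOAD_SEGMENT text { VADDR = 0x80008000 };
```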
Control Directives
Aside from $mapfile_version, there are control directives that provide a conditional input facility that can be used to restrict specific mapfile lines to specific platforms:

$if expr
...
[$elif expr]
       ...
[$else]
...
$endif

The sole purpose of this facility is to allow you to write something like

$if _sparc && _ELF32
    32-bit sparc thing
$elif _x86 && _ELF64
    64-bit x86 thing
$else
    others
$endif

as a way to handle minor per-platform variations in an otherwise identical mapfile.

Users of C and related languages will instantly recognize this as being very similar to the C preprocessor, with '$' substituted for '#'. That is true, but the similarity is very superficial:

  1. Mapfiles have no macro concept.

  2. The expressions evaluated by $if are purely logical (boolean true/false), with no concept of numeric values, and significantly simpler than those of CPP.

I had a few reasons for making '$' the character for control directives:

  1. To give C programmers a strong visual hint that they're not using CPP, and should have different expectations. As I mentioned earlier, if you need a macro pre-processor, Unix has many available that you can use outside the link-editor.

  2. To preserve '#' as the mapfile comment character.

  3. '$' has no previous meaning at the start of a mapfile statement in the original version 1 syntax.
Reasons 2 and 3 both relate to the fact that the link-editor reads the mapfile to determine which version of syntax is being used. By keeping the same comment character, and using a character for control directives not already used at the start of a statement by the old syntax, the link-editor can safely read and discard opening header comments, locate the first statement in the file, and unambiguously determine if the mapfile is using version 1 or version 2 syntax.

There are a small number of predefined values available for use in $if/$elif expressions:

_ELF32   _ELF64
_sparc   _x86
true

I expect these to be sufficient for nearly any mapfile. However, the $add control directive exists to define new values, and $clear to remove them. $add might be used to define convenient shorthand for longer expressions. For example, if you were writing a mapfile that had a large number of special cases involving the 64-bit x86 architecture, a definition like the following might be convenient:

$if _ELF64 && _x86
$add amd64
$endif
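Once defined, such a name can be tested like any predefined value. The following sketch assumes amd64 is added as above, and reuses the 64-bit x86 text segment address from the conditional input example near the end of this entry. It is illustrative, not taken from a real mapfile:

```text
$mapfile_version 2

# Define the shorthand once...
$if _ELF64 && _x86
$add amd64
$endif

# ...then test it wherever a 64-bit x86 special case is needed.
$if amd64
LOAD_SEGMENT text { VADDR = 0x480000 };
$endif
```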

Lastly, the $error directive allows you to make your mapfiles safe against attempts to use them in an unexpected context. The text following the directive is issued as a fatal error by the link-editor, which then exits. I expect it to be used as follows:

$if _sparc
sparc thing
$elif _x86
x86 thing
$else
$error unknown platform
$endif

The error message includes the mapfile name, and the line number where the $error directive was encountered.

Regular Directives
The regular directives all specify object-related information.

They all share a common underlying abstract syntax, based on the idea of name-value pairs, and the use of {} brackets for grouping, and to express sub-attributes.

All directives are terminated by the ';' character, as are attributes of directives.

Described informally, the simplest form is a directive name without a value:

directive;
The next form is a directive name with a value, or a whitespace separated list of values.
directive = value...;
The '=' operator shown above sets the given directive to the given value or value list. The '+=' operator instead adds the value to the directive's current value, and the '-=' operator removes it.

More complex directives manipulate items that take multiple attributes enclosed within {...} brackets to group the attributes together as a unit:

directive [name] {
        attribute [= value];
        ...
} [name...];

Such a directive can have a name before the opening '{', which is used to name the result of the given statement. As an example, this may be a segment, or version name. One or more optional names may also be allowed following the closing '}', before the terminating ';'. These names are used to express that the named item being defined has a relationship with other named items. For example, the SYMBOL_VERSION directive uses this for inherited version names.

Note that the format for attributes within this form follows the same pattern as that of the simple directive form.

Some directives may have attributes that in turn have sub-attributes. In such cases, the sub-attributes are also grouped within nested { ... } brackets to reflect this hierarchy:

directive [name] {
        attribute {
                subattribute [= value];
                ...
        };
        ...
} [name...];

Such nesting can be carried out to arbitrary depth, as required to express the meaning of a given directive. In practice, 1-2 levels of nesting are sufficient for the directives currently defined. I don't anticipate very deep nesting being necessary, but the ability to do so gives me confidence that the new syntax is sufficiently flexible, and that we will be able to expand it as necessary going forward.
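To make the abstract form concrete, here is a sketch that maps it onto one of the segment directives used in the comparison section of this entry. The trailing comments are annotations added for illustration, labeling which part of the pattern each line corresponds to:

```text
$mapfile_version 2
LOAD_SEGMENT donkey {           # directive, followed by its name
        FLAGS = READ EXECUTE;   # attribute with a list of values
        ALIGN = 0x1000;         # attribute with a single value
        ASSIGN_SECTION {        # attribute that takes sub-attributes
                IS_NAME = .data;        # sub-attribute
        };
};
```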

Old and New Syntax Compared

I think that the best way to evaluate the new mapfile syntax is to show how one might express the same concepts using both. In the subsections that follow, I will show examples in the old syntax and then re-write them using the new. This won't be a comprehensive demonstration of every possible option, but will touch on all of the main features.
Segments/Sections (Elephant, Monkey, and Donkey Ride Again)
The Linker and Libraries Manual contains the following example, which comes from the original AT&T documentation. This example shows how segments are created and sections assigned to them using the old syntax:
elephant : .data : peanuts.o *popcorn.o; 
monkey : $PROGBITS ?AX; 
monkey : .data; 
monkey = LOAD V0x80000000 L0x4000; 
donkey : .data; 
donkey = ?RX A0x1000; 
text = V0x80008000;
I have re-written this example for the new replacement mapfile chapter, as it provides a direct comparison between the old and new syntaxes. The old chapter, and my replacement, both contain a description of what each line means. I'll reproduce the new version here, omitting the explanations:
$mapfile_version 2
LOAD_SEGMENT elephant {
        ASSIGN_SECTION {
                IS_NAME=.data;
                FILE_PATH=peanuts.o;
        };
        ASSIGN_SECTION {
                IS_NAME=.data;
                FILE_OBJNAME=popcorn.o;
        };
};
LOAD_SEGMENT monkey {
        VADDR=0x80000000;
        MAX_SIZE=0x4000;
        ASSIGN_SECTION {
                TYPE=progbits;
                FLAGS=ALLOC EXECUTE;
        };
        ASSIGN_SECTION {
                IS_NAME=.data
        };
};
LOAD_SEGMENT donkey {
        FLAGS=READ EXECUTE;
        ALIGN=0x1000;
        ASSIGN_SECTION {
                IS_NAME=.data;
        };
};
LOAD_SEGMENT text {
        VADDR=0x80008000
};
The original is extremely compact, but also very cryptic. The new version is considerably longer, as it uses our recommended style of one item per line, with consistent indentation to show structure. The improvement in readability is substantial. I believe that most programmers can read this and follow its meaning without having to look up the syntax. I'm quite sure the same cannot be said of the old one.

Also note that the new version can be significantly compacted without losing much readability, though there's not much value in doing so:

$mapfile_version 2
LOAD_SEGMENT elephant {
        ASSIGN_SECTION { IS_NAME=.data; FILE_PATH=peanuts.o };
        ASSIGN_SECTION { IS_NAME=.data; FILE_OBJNAME=popcorn.o };
};
LOAD_SEGMENT monkey {
        VADDR=0x80000000; MAX_SIZE=0x4000;
        ASSIGN_SECTION { TYPE=progbits; FLAGS=ALLOC EXECUTE };
        ASSIGN_SECTION { IS_NAME=.data };
};
LOAD_SEGMENT donkey {
        FLAGS=READ EXECUTE; ALIGN=0x1000;
        ASSIGN_SECTION { IS_NAME=.data; };
};
LOAD_SEGMENT text { VADDR=0x80008000 };
Output Section Ordering
The version 1 syntax uses the '|' character to specify output section ordering. The LLM gives this example:

segment_name | section_name1;
segment_name | section_name2;
segment_name | section_name3;

In the version 2 syntax, this mapfile would be written as

$mapfile_version 2
LOAD_SEGMENT segment_name {
        OS_ORDER = section_name1 section_name2 section_name3;
};		
Size Symbol Declarations
The version 1 syntax for creating a size symbol is:

segment_name @ symbol_name;

In the version 2 syntax, this is:

$mapfile_version 2
LOAD_SEGMENT segment_name { SIZE_SYMBOL = symbol_name };
File Control Directives

In the version 1 syntax, File Control Directives, indicated by the '-' character, are used to establish the versions that are available from shared objects linked to the object being created. In the new syntax, this is done using the DEPEND_VERSIONS directive.

For example, the following specifies that the version SUNW_1.20, as well as any version inherited by SUNW_1.20, is available for use by the object being created. It also forces SUNW_1.19 to be listed as a dependency, whether or not a symbol from SUNW_1.19 is actually used:

libc.so - SUNW_1.20 $ADDVERS=SUNW_1.19;

The same requirement can be expressed in the new syntax as:

$mapfile_version 2
DEPEND_VERSIONS {
        ALLOW =   SUNW_1.20;
        REQUIRE = SUNW_1.19;
};
Capabilities

Hardware and software capability directives are used to augment or replace the capabilities found in the input objects. For example consider the following statements in the version 1 syntax:

hwcap_1 = mmx;		    # Add MMX to existing hardware capabilities
hwcap_1 = mmx $OVERRIDE;    # Replace existing hardware capabilities with MMX

sfcap_1 = addr32;	    # Add ADDR32 to existing software capabilities
sfcap_1 = addr32 $OVERRIDE; # Replace existing software capabilities with ADDR32

Rewritten using the version 2 syntax:

$mapfile_version 2
CAPABILITY {
	HW += mmx;          # Add MMX to existing hardware capabilities
	HW = mmx;           # Replace existing hardware capabilities with MMX

	SF += addr32;       # Add ADDR32 to existing software capabilities
	SF = addr32;        # Replace existing software capabilities with ADDR32
};
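The '-=' operator, which has no direct counterpart in the version 1 statements above, removes a value instead. A minimal sketch, assuming the input objects carry the MMX hardware capability and that it should be dropped from the output object:

```text
$mapfile_version 2
CAPABILITY {
        HW -= mmx;          # Remove MMX from the hardware capabilities
};
```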
Symbol Versions

The syntax for symbol scope/versioning symbols is the least changed:

The following things are different:

For a large number of mapfiles, the only change necessary will be to add the $mapfile_version control directive to the file, and to put the keyword SYMBOL_SCOPE or SYMBOL_VERSION in front of each scope/version.
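As a sketch of that minimal conversion, consider a typical version definition. The version name MYLIB_1.1 and the symbols foo and bar are hypothetical, chosen only for illustration. The version 1 form is shown in comments, followed by its version 2 equivalent:

```text
# Version 1:
#
#   MYLIB_1.1 {
#           global:
#                   foo;
#                   bar;
#           local:
#                   *;
#   };
#
# Version 2 equivalent:

$mapfile_version 2
SYMBOL_VERSION MYLIB_1.1 {
        global:
                foo;
                bar;
        local:
                *;
};
```

The body of the definition is untouched. Only the version directive and the leading keyword are new.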

To show the difference in how symbol attributes are specified, consider the following directive in the old syntax that uses every possible symbol attribute. This is not a realistic example, as many of these options are not mutually compatible. However, it serves to highlight the full set of syntax differences:

VER_1.2 {
        foo = V0x12345678 S0x23
                FUNCTION DATA COMMON
                FILTER libfoo.so
                AUX libfoo.so
                PARENT EXTERN DIRECT NODIRECT INTERPOSE DYNSORT NODYNSORT;

        protected:
               *;
} VER_1.1;

Rewriting this in the version 2 syntax gives:

$mapfile_version 2
SYMBOL_VERSION VER_1.2 {
        foo {
                VALUE = 0x12345678; SIZE = 0x23;
                TYPE = FUNCTION;    TYPE = DATA;    TYPE=COMMON;
                FILTER = libfoo.so;
                AUX = libfoo.so;
                FLAGS = PARENT EXTERN DIRECT NODIRECT INTERPOSE DYNSORT NODYNSORT;
        };

        protected:
               *;
} VER_1.1;

Although the attribute syntax has changed, it is very similar.

Ordered Input Sections
The compiler usually places the functions within a single source file together in a single text section in the resulting object. Such an object is an all or nothing proposition: to use any one of these functions, the link-editor must take the entire text section as a unit. The contents of such a section are fixed in place, and cannot be altered by the linker.

The Sun compilers support a command line flag, -xF, that causes each function to instead be placed in its own separate section. This gives the link-editor finer grained control, as it can omit unused functions while still pulling in the ones needed to complete the link. The link-editor also has the opportunity to arrange these functions in arbitrary order relative to each other, under user control, specified via the mapfile.

The documentation for the original version 1 syntax in the LLM gives this example:

text = LOAD ?RXO;
text : .text%foo;
text : .text%bar;
text : .text%main;

The result of using this mapfile will be for foo(), bar(), and main() to be placed adjacent to each other at the head of the segment, in that order. The ordering is implicit in the order in which the three section to segment statements (':' lines) are given in the mapfile.

The version 2 syntax accomplishes this reordering as follows:

$mapfile_version 2
LOAD_SEGMENT text {
        ASSIGN_SECTION bar  { IS_NAME = .text%bar };
        ASSIGN_SECTION main { IS_NAME = .text%main };
        ASSIGN_SECTION foo  { IS_NAME = .text%foo };
        IS_ORDER = foo bar main;
};
Conditional Input
This example comes from the linker tests. We have a test that sets an address for the text segment, and this test sets a different address for each of 32-bit sparc, 64-bit sparc, 32-bit x86, and 64-bit x86. As a result, we have four mapfiles:

mapfile-sparc
text = V0x40000;

mapfile-sparcv9
text = V0x100400000;

mapfile-i386
text = V0x8080000;

mapfile-amd64
text = V0x480000;

The version 2 syntax can employ conditional input to represent all of these differing values within a single mapfile, simplifying the test makefile. The $error control directive is used to catch cases where this test is run on a new, previously unknown platform, and to provide a meaningful error to the developer:

$mapfile_version 2

$if _sparc

$if _ELF64
LOAD_SEGMENT text { VADDR = 0x100400000 };
$else
LOAD_SEGMENT text { VADDR = 0x40000 };
$endif

$elif _x86

$if _ELF32
LOAD_SEGMENT text { VADDR = 0x8080000 };
$else
LOAD_SEGMENT text { VADDR = 0x480000 };
$endif

$else
$error unknown platform
$endif
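The same logic can also be written without nesting, by combining the predefined values with '&&'. This is only a sketch of an alternative style, and it assumes that $elif may appear more than once within a single $if block:

```text
$mapfile_version 2

$if _sparc && _ELF32
LOAD_SEGMENT text { VADDR = 0x40000 };
$elif _sparc && _ELF64
LOAD_SEGMENT text { VADDR = 0x100400000 };
$elif _x86 && _ELF32
LOAD_SEGMENT text { VADDR = 0x8080000 };
$elif _x86 && _ELF64
LOAD_SEGMENT text { VADDR = 0x480000 };
$else
$error unknown platform
$endif
```

The flat form makes each platform and address pair visible on adjacent lines, at the cost of repeating the architecture test.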

Updates

21 February 2016
Chris Lent at Cooper Union pointed out that I had omitted a leading $ in my example:
$if _sparc && _ELF32
    32-bit sparc thing
elif _x86 && _ELF64
    64-bit x86 thing
$else
    others
$endif
I have added the missing $ to the elif. Thank you for your attention to detail!

Comments

Chris Quenelle — Monday January 11, 2010

Cool. I haven't gotten very far through the docs yet, but it seems that the CAPABILITY directive is specified so that: "FOO -= bar" followed by "FOO += bar" results in bar being omitted from the value of FOO. In other words, += and -= are order-independent. Thus it seems that if one mapfile turns off a flag, a later mapfile cannot turn it back on again (unless it resets the complete value of the flag). I've been involved in specifying the behavior of non-trivial compiler options in the past. It seems simple, but it's actually quite hard to get something that works for all the common cases.

Ali Bahrami — Wednesday January 13, 2010

Thanks Chris! I've made a small change to how this works that should address the issue of '-=' locking out the later ability to add a value back with '+='. As described in the doc, there are two bitmasks, 'value', and 'exclude'. When adding a bit to one of these masks, the same bit will be removed from the other. Hence, a later '+=' can undo the action of an earlier '-='.

Published Elsewhere

https://blogs.sun.com/ali/entry/a_new_mapfile_syntax_for/
https://blogs.oracle.com/ali/entry/a_new_mapfile_syntax_for/
https://blogs.oracle.com/ali/a-new-mapfile-syntax-for-solaris/
