64-bit Archives Needed |
Ali Bahrami Friday November 11, 2011
A little over a year ago, we received a question from someone who was trying to build software on Solaris. He was getting errors from the ar command when creating an archive. At that time, the ar command on Solaris was a 32-bit command. There was more than 2GB of data, and the ar command was hitting the file size limit for a 32-bit process that doesn't use the largefile APIs.
Even in 2011, 2GB is a very large amount of code, so we had not heard this one before. Most of our toolchain was extended to handle 64-bit sized data back in the 1990's, but archives were not changed, presumably because there was no perceived need for it. Since then of course, programs have continued to get larger, and in 2010, the time had finally come to investigate the issue and find a way to provide for larger archives.
As part of that process, I had to do a deep dive into the archive format, and also do some Unix archeology. I'm going to record what I learned here, to document what Solaris does, and in the hope that it might help someone else trying to solve the same problem for their platform.
The requirement to pad members to an even length is a dead giveaway as to the age of the archive format. It tells you that this format dates from the 1970's, and more specifically from the era of 16-bit systems such as the PDP-11 that Unix was originally developed on. A 32-bit system would have required 4 bytes, and 64-bit systems such as we use today would probably have required 8 bytes. 2 byte alignment is a poor choice for ELF object archive members. 32-bit objects require 4 byte alignment, and 64-bit objects require 64-bit alignment. The link-editor uses mmap() to process archives, and if the members have the wrong alignment, we have to slide (copy) them to the correct alignment before we can access the ELF data structures inside. The archive format requires 2 byte padding, but it doesn't prohibit more. The Solaris ar command takes advantage of this, and pads ELF object members to 8 byte boundaries. Anything else is padded to 2 as required by the format.
The ar command hides the special members from you when you list the contents of an archive, so most users don't know that they exist. There are only two possible special members: A symbol table that maps ELF symbols to the object archive member that provides it, and a string table used to hold member names that exceed 15 characters. The '/' convention for tagging special members provides room for adding more such members should the need arise. As I will discuss below, we took advantage of this fact to add an alternate 64-bit symbol table special member which is used in archives that are larger than 4GB.
The archive file format is not formally standardized. However, the ar command and archive format were part of the original Unix from Bell Labs. Other systems started with that format, extending it in various often incompatible ways, but usually with the same common shared core. Most of these systems use the same magic number to identify their archives, despite the fact that their archives are not always fully compatible with each other. It is often true that archives can be copied between different Unix variants, and if the member names are short enough, the ar command from one system can often read archives produced on another.
In practice, it is rare to find an archive containing anything other than objects for a single operating system and machine type. Such an archive is only of use on the type of system that created it, and is only used on that system. This is probably why cross platform compatibility of archives between Unix variants has never been an issue. Otherwise, the use of the same magic number in archives with incompatible formats would be a problem.
I was able to find information for a number of Unix variants, described below. These can be divided roughly into three tribes, SVR4 Unix, BSD Unix, and IBM AIX. Solaris is a SVR4 Unix, and its archives are completely compatible with those from the other members of that group (GNU/Linux, HP-UX, and SGI IRIX).
Widening the existing SVR4 archive symbol tables rather than inventing something new is the simplest path forward. There is ample precedent for this approach in the ELF world. When ELF was extended to support 64-bit objects, the approach was largely to take the existing data structures, and define 64-bit versions of them. We called the old set ELF32, and the new set ELF64. My guess is that there was no need to widen the archive format at that time, but had there been, it seems obvious that this is how it would have been done.
- AIX
- AIX is an exception to rule that Unix archive formats are all based on the original Bell labs Unix format. It appears that AIX supports 2 formats (small and big), both of which differ in fundamental ways from other Unix systems:
- These formats use a different magic number than the standard one used by Solaris and other Unix variants.
- They include support for removing archive members from a file without reallocating the file, marking dead areas as unused, and reusing them when new archive items are inserted.
- They have a special table of contents member (File Member Header) which lets you find out everything that's in the archive without having to actually traverse the entire file. Their symbol table members are quite similar to those from other systems though.
- Their member headers are doubly linked, containing offsets to both the previous and next members.
Of the Unix systems described here, AIX has the only format I saw that will have reasonable insert/delete performance for really large archives. Everyone else has O(n) performance, and are going to be slow to use with large archives.
- BSD
- BSD has gone through 4 versions of archive format, which are described in their manpage. They use the same member header as SVR4, but their symbol table format is different, and their scheme for long member names puts the name directly after the member header rather than into a string table.
- GNU/Linux
- The GNU toolchain uses the SVR4 format, and is compatible with Solaris.
- HP-UX
- HP-UX seems to follow the SVR4 model, and is compatible with Solaris.
- IRIX
- IRIX has 32 and 64-bit archives. The 32-bit format is the standard SVR4 format, and is compatible with Solaris. The 64-bit format is the same, except that the symbol table uses 64-bit integers.
IRIX assumes that an archive contains objects of a single ELFCLASS/MACHINE, and any archive containing ELFCLASS64 objects receives a 64-bit symbol table. Although they only use it for 64-bit objects, nothing in the archive format limits it to ELFCLASS64. It would be perfectly valid to produce a 64-bit symbol table in an archive containing 32-bit objects, text files, or anything else.
- Tru64 Unix (Digital/Compaq/HP)
- Tru64 Unix uses a format much like ours, but their symbol table is a hash table, making specific symbol lookup much faster. The Solaris link-editor uses archives by examining the entire symbol table looking for unsatisfied symbols for the link, and not by looking up individual symbols, so there would be no benefit to Solaris from such a hash table. The Tru64 ld must use a different approach in which the hash table pays off for them.
Widening the existing symbol table format to 64-bits is therefore the obvious way to proceed. For Solaris 11, I implemented that, and I also updated the ar command so that a 64-bit version is run by default. This eliminates the 2 most significant limits to archive size, leaving only the limit on an individual archive member.
We only generate a 64-bit symbol table if the archive exceeds 4GB, or when the new -S option to the ar command is used. This maximizes backward compatibility, as an archive produced by Solaris 11 is highly likely to be less than 4GB in size, and will therefore employ the same format understood by older versions of the system. The main reason for the existence of the -S option is to allow us to test the 64-bit format without having to construct huge archives to do so. I don't believe it will find much use outside of that.
Other than the new ability to create and use extremely large archives, this change is largely invisible to the end user. When reading an archive, the ar command will transparently accept either form of symbol table. Similarly, the ELF library (libelf) has been updated to understand either format. Users of libelf (such as the link-editor ld) do not need to be modified to use the new format, because these changes are encapsulated behind the existing functions provided by libelf.
As mentioned above, this work did not lift the limit on the maximum size of an individual archive member. That limit remains fixed at 4GB for now. This is not because we think objects will never get that large, for the history of computing says otherwise. Rather, this is based on an estimation that single relocatable objects of that size will not appear for a decade or two. A lot can change in that time, and it is better not to overengineer things by writing code that will sit and rot for years without being used.
It is not too soon however to have a plan for that eventuality. When the time comes when this limit needs to be lifted, I believe that there is a simple solution that is consistent with the existing format. The archive member header size field is an ASCII string, like the name, and as such, the overflow scheme used for long names can also be used to handle the size. The size string would be placed into the archive string table, and its offset in the string table would then be written into the archive header size field using the same format "/ddd" used for overflowed names.
[17] Solaris 11 | [19] -z guidance |