Last Significant Page Update:

2025-09-12

Comments to:

elftoolchain@jkoshy.net

Since I am dealing with a lot of ELF-related detail for my Elftoolchain work, I thought that it might be a good idea to collate the material into a Handbook.

The current plan is to structure the ELF Handbook as a reference book, with emphasis on how ELF concepts translate to the bits inside of files, and how ELF’s semantics are implemented by an operating system.  There will be plenty of diagrams illustrating the behavior expected at the level of machine code.

Please see the section “Concept Index” below for the list of concepts that are planned to be covered.

Proposed Handbook Structure

The structure below is tentative, and very much subject to change.

Chapters / Appendices

Introduction/Overview

Hello World (Static)
  • A simple, standalone, hello-world executable.

  • Examination of the generated ELF object.

  • How the resulting process is laid out in virtual memory.

Hello World (Dynamic)
  • Examination of the generated ELF objects.

  • How the resulting process is laid out in virtual memory.

  • Overview of the startup process.

ELF File Structure
  • Sufficient detail to make the following sections comprehensible.

Dynamic Executables

Relocations Explained

Linking
Static Linking

Dynamic Linking
  • Link time actions and runtime actions.

  • A step-by-step guide (for the default symbol lookup rules).

Handling Archives

Loading
  • Process layout in virtual memory.

  • Static vs dynamic executables.

  • Init/Fini sections.

Multi-Threading Support
  • PT_TLS segments.

C++ Support
  • Name mangling.

Rust Support

Symbol Versioning
  • How it works.

  • How it is represented in the ELF.

Symbol and Object Capabilities
  • How capabilities work; what they are used for.

  • Symbol and object lookup rules.

Using The Runtime Loader
  • Various RTLD_* options to dlopen().

(How-To) Evolving A Shared Library
  • How to preserve binary compatibility with prior binaries.

  • Tools to check ABI compatibility.

  • Using symbol versioning.

  • Using multiple .so files for binary-incompatible changes.

  • The notion of ‘interfaces’.

(How-To) Interposing Shared Objects
  • An example showing how to override the definition of say printf().

  • Finding and invoking the downstream definition using dlsym(RTLD_NEXT).

Figures

ar(1) Archive Structure

Showing the header, and member chaining.

ELF structure (executables)

Showing the header, PHDR table, and segments.

ELF structure (relocatables)

Showing the header, SHDR table, and section data.

Linking (static linking)

Diagram of how sections are merged.

Linking (dynamic linking)

Link time merging of sections, runtime resolution of symbols.

Process Layout (Static Executables)

Showing the mapping of text, bss and data segments.

Process Layout (Dynamic Executables)

Showing the mapping of text, bss, and data segments across multiple objects in an address space.

Shared Object Lookup

Showing how objects are found on disk.

Shared Objects: Disk sharing

Showing the sharing of disk space vs. static linking.

Shared Objects: RAM sharing

Showing the sharing of physical memory of text segments by the virtual memory system.

Symbol Lookup (Capabilities)

Showing the selection of capability-specific symbols in preference to a generic one.

Symbol Lookup (Default Lookup)

Showing symbol binding within a dynamic object and to a different object.

Symbol Lookup (Group Lookup)

Restricting symbol binding to within a group of objects.

Symbol Lookup (Versions)

Showing resolution of versioned symbols.

Tables

Auto-generated Symbols

Each auto-generated symbol along with a short description of the symbol.

LD_* environment variables

All LD_* environment variables and their impact on the linker’s behavior.

Linker Expansions

All string tokens (like $ORIGIN) known to the linker with their meaning, and the contexts where they are expanded.

Program Segments

Segments known to the kernel or runtime linker, with their semantics.

Reserved Symbols

Reserved symbols and their meanings.

Section Types

Short definitions of each section type; examples of where elements of that type would be used.

Relocation Types

Each relocation type defined for RISC-V or ARM (say), with examples of source code that results in that relocation type.

Concept Index

A (partial) list of concepts to be covered in the handbook.

32- vs 64-bit objects
  • Compile-time and runtime search paths.

  • Pitfalls in mix-and-match linking (if at all supported by an architecture).

Absolute Symbols
  • What these are.

  • How to define such symbols.

  • Where such symbols are needed.

  • Why these kinds of symbols are discouraged in shared objects.

Ancillary Shared Objects

E.g., for holding debug information separately from a ‘main’ object:

  • Why ancillary objects are useful or needed.

  • How to generate these files.

ar(1) archives
  • What these are.

  • How used during program development.

  • Archive structure.  Extensions to the format.

  • Archive symbol tables: SVR4 and BSD tables.

  • Special members (like __.LIBDEP, //, etc.).

  • Using libelf to access archive members.

  • Using libarchive to access archive members.
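
As a rough illustration of the archive structure mentioned above, here is a Python sketch that walks the member headers of an archive held in memory. It assumes the common ar(1) layout: an 8-byte magic string, then 60-byte member headers, with member data padded to an even length. The names `iter_ar_members` and `make_member` are hypothetical helpers for this sketch, and special members and long-name extensions are returned as-is rather than interpreted.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name(16) mtime(12) uid(6) gid(6) mode(8) size(10) fmag(2)


def iter_ar_members(data):
    """Yield (raw_name, size, payload) for each member of an ar archive."""
    if data[:len(AR_MAGIC)] != AR_MAGIC:
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % offset)
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii"))
        start = offset + HEADER_SIZE
        yield name, size, data[start:start + size]
        # Member data is padded with one byte when its size is odd.
        offset = start + size + (size & 1)


def make_member(name, payload):
    """Build one member (header plus padded data); used only for the demo."""
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(payload)).encode("ascii")
    padding = b"\n" if len(payload) & 1 else b""
    return header + payload + padding


if __name__ == "__main__":
    archive = AR_MAGIC + make_member("a.o/", b"abc") + make_member("b.o/", b"wxyz")
    for name, size, _ in iter_ar_members(archive):
        print(name, size)
```

The raw name is returned untouched because SVR4-style and BSD-style archives terminate and extend names differently; decoding those conventions would be the job of a fuller example.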

Archive Processing by the Linker
  • How linkers look for symbol definitions in archives.

  • Speeding up the link using lorder and tsort.

  • Using undefined symbols to force an archive lookup.

  • One-pass linkers.

Auditing the Runtime Linker
  • What this is and why it is useful.

  • Environment variables controlling the audit.

Auto-Generated Symbols
  • What these are.

  • When these are generated during the link.

Backward Compatibility

See Object Evolution.

Capabilities (Link-time)
  • Describe what these are and where they are useful.

  • Types of capabilities: hardware, platform, machine, software, etc.

  • Object capabilities vs symbol capabilities.

  • How archives and capabilities interact.

  • The notion of a ‘lead symbol’.

C++ Name Mangling
  • The need for name mangling.

  • Mangling schemes in current use.

  • The Itanium ‘standard’ for name mangling.

  • Potential issues with mangled names.

  • Tools to mangle and demangle C++ names.
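
To give a flavor of what a mangled name encodes, here is a small Python sketch of Itanium-style mangling for the simplest case only: a free function, optionally nested in namespaces, taking a handful of builtin parameter types. The function `mangle_function` and the `BUILTIN_CODES` table are hypothetical names for this sketch. It does not handle substitutions, the special treatment of `std`, templates, pointers, references, or qualifiers, so it should not be mistaken for a real mangler.

```python
# Type codes for a few builtin types, as used by the Itanium C++ ABI.
BUILTIN_CODES = {
    "void": "v",
    "bool": "b",
    "char": "c",
    "int": "i",
    "unsigned int": "j",
    "long": "l",
    "float": "f",
    "double": "d",
}


def mangle_function(qualified_name, param_types=()):
    """Mangle a free function name such as 'ns::foo' with builtin parameters."""
    parts = qualified_name.split("::")
    # Each identifier is encoded as its length followed by its characters.
    encoded = "".join("%d%s" % (len(part), part) for part in parts)
    if len(parts) > 1:
        # Nested names are bracketed by 'N' ... 'E'.
        encoded = "N" + encoded + "E"
    # An empty parameter list is encoded as a single 'void'.
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    return "_Z" + encoded + params


if __name__ == "__main__":
    print(mangle_function("foo", ["int"]))               # _Z3fooi
    print(mangle_function("ns::foo", ["int", "double"]))  # _ZN2ns3fooEid
```

The return type does not appear in the output, which is one of the points the section on potential issues with mangled names would need to cover.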

C++ Templates
  • What these are, and the ELF structures that they are compiled to.

  • Handling duplicate template expansions at link time: code, data, entries that would go into .bss.

  • .gnu.linkonce sections.

COMDAT Sections
  • Why needed and where used.

  • Examples of C and FORTRAN sources that require COMDAT semantics.

  • How these sections are handled during linking.

  • Interaction with dynamic objects.

  • Any gotchas.

Compensating Dependencies
  • Why programmers end up specifying these.

  • Build problems due to their use.

  • Solutions.

Compilation vs Runtime Environments
  • List the differences between link behavior in compile-time and runtime environments.

Compilation Models
  • Why needed by some architectures.

  • Mix-and-match of objects built with differing models.

Compressed Debug Sections
  • How represented in an ELF object.

  • How to create these.

  • Costs and benefits.

Controlling Symbol Visibility
  • Why useful.

  • How to control symbol visibility.

Copy Relocations
  • What these are.

  • Why these are useful.

Cross- vs native linking
  • Issues in handling non-native architectures.

Cyclic Link-time Dependencies
  • Why these arise.

  • How to handle them.

Data Segments
  • Various types (.data, .bss, etc.) and their properties.

  • How the linker assembles these from its input relocatable objects.

  • Control of data placement using linker scripts.

  • Relocations that are applicable.

  • When potentially sharable across processes by the virtual memory system.

  • How data segments are distributed across shared objects.

Default Symbol Lookup Process
  • Describe the sequence of objects searched by the runtime loader for symbols.

  • Searching in dependent objects.

  • Lazy loading of dependent objects.

  • Searching in dlopen()’ed objects.

  • Search scopes: world vs local.

Deferred Symbol References
  • What these are.

  • Where useful.

Dependent Shared Objects
  • Define the notion of a ‘dependent object’.

  • How object dependencies are specified at link time.

  • How dependencies are represented in the ELF format.

  • Rules for look up of dependent objects at link time and at run time.

  • Control of the lookup path using linker variables like $ORIGIN.

Direct Bindings
  • What these are.

  • Where useful.

  • How to specify direct bindings.

  • Conversely, how to prevent a symbol from being directly bound.

  • Why singleton symbols cannot be directly bound.

Displacement Relocations
  • What these are.

  • How they arise.

dlopen() and dlsym()
  • What these APIs are for.

  • How these APIs work: search paths, linker variables like $ORIGIN, the effect of the various RTLD_* flags.

Dynamic Linking
  • Describe what this is.

  • Advantages and disadvantages.

  • How represented in the ELF file format.

  • The need for position-independent code.

  • The need for a GOT and PLT.

  • Evolving APIs with backward compatibility when using dynamic linking.

Dynamic Objects
  • Shared objects, dynamic executables, position-independent executables.

  • How these look to the OS virtual memory manager.

  • Startup (.init) and teardown (.fini) semantics, when these are invoked.

  • Relocations used by dynamic objects; examples of source code that generates such relocations.

.dynamic Segment
  • The function of this section in an ELF object.

  • How pointed to from the ELF program header.

Dynamic String Tokens
  • List the linker tokens (e.g. $ORIGIN, etc.) supported by BSD and GNU runtime linkers, and their semantics.

  • Security considerations with setuid executables.
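
A minimal Python sketch of token expansion, covering only $ORIGIN, might look like the following. It assumes that $ORIGIN stands for the directory containing the object being loaded and that both the `$ORIGIN` and `${ORIGIN}` spellings are accepted. The function name `expand_origin` is hypothetical, and the sketch deliberately ignores other tokens, symbolic links, and the setuid restrictions noted above.

```python
import posixpath
import re

# Matches both the bare and the braced spelling of the token.
ORIGIN_TOKEN = re.compile(r"\$ORIGIN\b|\$\{ORIGIN\}")


def expand_origin(search_path, object_path):
    """Expand $ORIGIN in a colon-separated search path.

    `object_path` is assumed to be the absolute path of the object whose
    search path is being expanded.
    """
    origin = posixpath.dirname(object_path)
    return [ORIGIN_TOKEN.sub(lambda match: origin, entry)
            for entry in search_path.split(":")]


if __name__ == "__main__":
    for directory in expand_origin("$ORIGIN/../lib:/usr/lib", "/opt/app/bin/prog"):
        print(directory)
```

The expanded entries are left un-normalized (for example `/opt/app/bin/../lib`), since whether and when a runtime linker canonicalizes such paths is exactly the kind of detail the handbook entry would need to pin down per platform.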

elfdump / objdump / readelf (Utilities)
  • Briefly describe how to use these tools.

ELF File Format
  • Describe the format.

  • The ELF header and its features.

  • How sections are described in a relocatable.

  • How segments are described in an executable object.

  • Possibly reuse/rework content from Libelf by Example.
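
As a sketch of how the ELF header translates to bits in a file, the following Python fragment decodes the fixed-size header that starts every ELF object. It assumes the standard field order of the ELF32 and ELF64 headers, with the class and byte order taken from `e_ident`. The function name `parse_elf_header` is hypothetical, and no validation beyond the magic number is attempted.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EI_CLASS, EI_DATA = 4, 5            # indices into e_ident
ELFCLASS32, ELFCLASS64 = 1, 2
ELFDATA2LSB, ELFDATA2MSB = 1, 2

FIELD_NAMES = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
               "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
               "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf_header(data):
    """Decode the ELF header found at the start of `data` into a dict."""
    ident = data[:16]
    if ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF object")
    byte_order = {ELFDATA2LSB: "<", ELFDATA2MSB: ">"}[ident[EI_DATA]]
    # Addresses and file offsets are 4 bytes wide in ELF32 and 8 in ELF64.
    word = {ELFCLASS32: "I", ELFCLASS64: "Q"}[ident[EI_CLASS]]
    layout = byte_order + "HHI" + word * 3 + "I" + "H" * 6
    values = struct.unpack_from(layout, data, len(ident))
    return dict(zip(FIELD_NAMES, values))


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for name, value in parse_elf_header(f.read(64)).items():
            print("%-12s %#x" % (name, value))
```

Run against any ELF file, this prints the same fields that tools such as readelf show in their header listing, only without symbolic names for the values.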

ELF Section Groups
  • Describe what these are.

  • Why useful.

  • How to specify and use section groups.

  • Linker behavior with section groups.

ELF Sections
  • What these are, and the role they play.

  • How represented in an ELF file.

  • Various properties that sections can have.

  • Which ELF sections end up in an executable and which are purely for link-time use.

ELF Sections vs Segments
  • The point in the object’s lifecycle that each concept (section or segment) is relevant.

  • How sections map to segments.

  • How the linker combines sections.

  • Discarded sections.

ELF Segments
  • How these are constructed from sections.

  • How represented in the ELF file format.

ELF Standards
  • The main ‘standard’ and who (apparently) maintains it.

  • Processor-specific ABIs and who maintains these.

  • psABI documents lacking clear owners.

  • Architectures using ELF but without a formal psABI (e.g. VAX).

ELF Startup: Dynamic Executables
  • The PT_INTERP segment and its contents.

  • How the kernel invokes the ELF interpreter.

  • What the interpreter does.

  • Relocations and fix-ups needed at startup.

  • Loading dependent shared objects.

  • Lazy loading.

ELF Startup: Static Executables
  • The _start entry point.

  • Relocations needed at load time.

  • How program segments map to virtual memory, copy-on-write sharing.

  • The machine environment when control passes to _start.

  • The runtime environment expected by the C language main().

Encapsulation Symbols
  • What these are, and where these would be useful.

  • How to get a linker to generate these.

Environment Variables
  • Describe the environment variables that influence linker behavior (e.g. LD_LIBRARY_PATH, LD_AUDIT, etc.).

  • Describe the environment variables that influence runtime loader behavior.

Filter Objects
  • Where filter objects are useful.

  • Filter object types.

  • Creating filter objects and linking against them.

  • Symbol lookup when using filter objects.

.fini and .fini_array Sections
  • What these are for.

  • When invoked during the process/shared object’s usage lifecycle.

  • How represented in the ELF object.

  • The runtime environment that code in these sections should expect.

  • Specifying code to be added to these sections.

  • How the linker merges these sections across relocatables.

Global Offset Table (GOT)
  • Why a GOT is needed, source/machine code examples.

  • How represented in an ELF object.

  • Relocations applicable.

Hiding Obsolete APIs
  • Using stubs to remove functions from future compilations, while keeping them around for older programs.

  • See Object Evolution below.

Immediate Reference
  • Where triggered.

  • Forcing using LD_BIND_NOW.

.init and .init_array Sections
  • What these sections are for.

  • When the code in these sections is invoked during a process/shared object’s usage lifecycle.

  • How represented in the ELF object.

  • The runtime environment that code in these sections should expect.

  • Specifying code to be added to these sections.

  • How the linker merges these sections across relocatables.

.interp Segment
  • What this is and what it holds.

Kernel Loader
  • What this does.

  • Describe differences from the runtime loader for user programs.

Kernel Modules
  • What these are.

  • How these are different from dynamic executables meant for userspace.

  • How represented in ELF form.

  • Describe NetBSD/FreeBSD kernel modules.

Lazy Loading (of Objects)
  • What this is.

  • Advantages/drawbacks of lazy loading.

  • How to specify objects as lazily loaded.

Lazy Reference (of Symbols)
  • Why needed.

Lead Symbols
  • What these are.

  • How they guide the lookup for capability-specific symbols.

Library Naming Conventions
  • For Unix.

Linker Scripts
  • What these are and where these are useful.

  • Implicit (default) scripts.

  • Linker script features (for a few linkers).

Link Editor
  • Describe broadly what a link editor does.

  • How invoked.

  • How symbols are resolved.

  • Impact of command-line position of objects and archives on symbol resolution.

  • Controlling the layout and content of the linker’s output.

Link Editor Extensions
  • How to extend the functionality of the link editor at runtime.

Linker Environment Variables
  • See Environment Variables.

Mapfiles
  • What these are.

  • Where and how used.

Multiple Symbol Definitions
  • How these are resolved when using dynamic objects.

  • Impact of search order.

  • Examples of unexpected behavior.

  • Difference between static linking and dynamic linking.

Non-symbolic Relocations
  • Describe what these are.

Object Evolution
  • Techniques to preserve backward compatibility when evolving objects.

  • How to use symbol versioning.

  • How to use multiple .so objects.

  • SONAME functionality.

  • Defining interfaces for shared objects.

  • Additive vs non-additive changes to interfaces.

  • Controlling symbol scope / keeping symbols ‘local’.

  • Using filter objects.

  • Differences in linker lookup vs runtime lookup of symbols.

  • Differences in linker lookup vs runtime lookup of dependent objects.

Object Groups (Link time)
  • Why useful.

  • Forcing symbol lookup to be within a group.

Object Interposition
  • What this is.

  • Why useful.

  • How to use LD_PRELOAD.

Object Versioning
  • How to version files to preserve backward compatibility.

  • Using .soname.

Parent Objects (Plugins)
  • What these are.

  • How to specify a ‘parent’ object using mapfiles.

.plt Segment
  • What this is.

  • Why needed.

  • Lazy resolution of procedure symbols.

Position Independent Code
  • What PIC looks like at the machine level.

  • Advantages of PIC.

  • Disadvantages of PIC: extra indirections, loss of performance, code size, etc.

Position Independent Executables
  • What these are.

  • How to create PIEs.

.preinit_array Segment
  • Describe the semantics of this segment.

Procedure Linkage Table (PLT)
  • What this is.

  • Why necessary for dynamic objects.

  • Lazy resolution of procedure symbols: why useful.

  • Content of a PLT entry before and after symbol resolution.

Relocatable Object
  • The structure of a relocatable object.

  • What specifically is ‘relocatable’ about the contents.

  • How relocations needed are represented in ELF.

Relocations
  • What relocations are, and why they are needed.

  • Types of relocations.

  • Where relocations are defined in a psABI.

  • Examples of instructions being modified by relocation.

  • When relocations are applied during the linking and loading process.

  • Tools to look at relocations in an ELF object.

  • Errors during relocation processing.
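
A worked example may help make "instructions being modified by relocation" concrete. The Python sketch below applies one PC-relative 32-bit relocation using the calculation S + A - P, where S is the symbol's address, A the addend, and P the address of the field being patched. This is the formula used by x86-64's R_X86_64_PC32, chosen here only as a familiar case; the function name `apply_pc32` and the addresses in the demo are made up for illustration.

```python
def apply_pc32(image, offset, base, symbol_address, addend):
    """Patch a 32-bit little-endian field in `image` with S + A - P.

    `image` is a bytearray holding section contents that will be loaded
    at virtual address `base`; `offset` locates the field within it.
    """
    place = base + offset                     # P
    value = symbol_address + addend - place   # S + A - P
    if not -2**31 <= value < 2**31:
        # A real linker would report a relocation overflow here.
        raise OverflowError("relocation does not fit in 32 bits")
    image[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


if __name__ == "__main__":
    # An x86-64 'call rel32' (opcode 0xe8) with an unresolved displacement.
    text = bytearray(b"\xe8\x00\x00\x00\x00")
    # The displacement is relative to the end of the instruction, which is
    # why the addend is -4 for a field that starts 4 bytes before that end.
    apply_pc32(text, offset=1, base=0x1000, symbol_address=0x2000, addend=-4)
    print(text.hex())   # e8fb0f0000
```

The patched displacement is 0xffb, and 0x1005 (the address of the next instruction) plus 0xffb is 0x2000, the target of the call.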

Reserved Symbols
  • A list of reserved symbols (like _init, _etext, etc.) and their meanings.

Runnable Process
  • How a file on disk is transformed into a runnable process.

  • Read-only vs read-write parts.

  • Executable vs non-executable memory.

  • Stacks.

  • Threads.

Runpaths
  • How used to find shared object dependencies.

  • How specified at link time.

  • How stored in the ELF file format.

Runtime Linker
  • Where found on the file system.

  • How invoked by the kernel, how information about the current executable is passed to the linker.

  • Initialization steps.

  • When and how control is passed to the main executable.

  • Directories searched for dependent objects.

  • How dependencies are represented in the ELF format.

  • Symbol lookup within a set of loaded objects.

  • Application access using dlsym().

  • Flags controlling runtime linker behavior.

Rust
  • Name mangling rules for Rust.

  • Anything else that is Rust specific.

Singleton Symbols
  • What these are, and why they are useful.

  • How multiple definitions across shared objects are handled.

Startup Performance (Dynamic Objects)
  • Slow-downs due to startup actions, relative to static executables.

  • Lazy symbol lookup (of procedure symbols).

  • Measuring startup performance.

  • Mitigations.

  • Caching by the runtime loader.

String Tables
  • What they hold.

  • Which string tables exist in an ELF object.
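
As a small illustration of what a string table holds, the Python sketch below looks up a name by its byte offset. It assumes the usual ELF convention that a string table is a sequence of NUL-terminated strings whose first byte is NUL, so that offset 0 names the empty string. The function name `strtab_lookup` and the sample table are hypothetical.

```python
def strtab_lookup(strtab, offset):
    """Return the NUL-terminated string that starts at `offset` in `strtab`."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("utf-8")


if __name__ == "__main__":
    table = b"\0main\0printf\0"
    print(repr(strtab_lookup(table, 0)))   # ''
    print(repr(strtab_lookup(table, 1)))   # 'main'
    print(repr(strtab_lookup(table, 6)))   # 'printf'
    print(repr(strtab_lookup(table, 8)))   # 'intf'
```

The last lookup shows that an offset may point into the middle of another string, which is the property that suffix-sharing schemes for shrinking string tables rely on.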

String Table Compression
  • Why compression is needed.

  • Methods of compressing a string table.

  • Performance implications.

Stub Objects
  • Used for speeding up builds.

  • Also used for ensuring build correctness, to separate interface from implementation.

  • How to create, and use.

Symbol Binding (concept)
  • Define the notion of binding.

  • Weak vs strong binding.

Symbol Capabilities
  • What these are used for.

  • Examples of use for hardware-optimized functionality.

  • The ‘lead symbol’ that represents a family of related symbols distinguished by capability.

  • How represented in an ELF object.

Symbol Elimination
  • Why needed.

  • How to specify symbols to be eliminated from symbol tables.

Symbolic Binding (of Symbols)
  • Why needed.

  • How to specify symbolic binding for a shared object.

  • Differences from Direct Binding for symbols.

Symbolic Relocations
  • What these are.

  • Contrast with non-symbolic relocations.

  • Source code examples requiring symbolic relocation.

Symbol Interposition
  • What this is.

  • Where useful to override functionality.

  • Where unwanted; how to prevent unwanted interposition with Direct Binding.

Symbol Resolution
  • Simple vs Complex resolutions of symbols.

  • Handling symbols with differing characteristics.

  • Symbol resolution states: ‘undefined’, ‘tentative’, ‘defined’.

  • Precedence of resolution states.

Symbol Scopes
  • ‘Global’ and ‘Local’ scope.

  • Reducing the scope of symbols using mapfiles / linker scripts.

Symbol Search Order
  • Default Symbol Search Order (‘World’ search), vs local (‘Group’) searches.

  • The impact of the order of loading of dependent objects.

  • Linker lookups vs runtime lookups.

Symbol Tables
  • What these are, and what they are used for.

  • The structure of each symbol table entry.

  • The associated string table.

  • Link-time tables (.symtab/.strtab) vs runtime table (.dynsym/.dynstr).

  • Table generation process.
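
To show the structure of a symbol table entry, here is a Python sketch that decodes one entry from a 64-bit, little-endian symbol table. It assumes the Elf64_Sym field order (st_name, st_info, st_other, st_shndx, st_value, st_size) and splits st_info into binding and type the way the ELF64_ST_BIND and ELF64_ST_TYPE macros do. The function name `parse_elf64_sym` is hypothetical, and the name is left as a string table offset rather than resolved.

```python
import struct

# Little-endian Elf64_Sym: 4 + 1 + 1 + 2 + 8 + 8 = 24 bytes per entry.
ELF64_SYM = struct.Struct("<IBBHQQ")

BINDINGS = {0: "STB_LOCAL", 1: "STB_GLOBAL", 2: "STB_WEAK"}
TYPES = {0: "STT_NOTYPE", 1: "STT_OBJECT", 2: "STT_FUNC",
         3: "STT_SECTION", 4: "STT_FILE"}


def parse_elf64_sym(symtab, index):
    """Decode entry `index` of a little-endian ELF64 symbol table."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = \
        ELF64_SYM.unpack_from(symtab, index * ELF64_SYM.size)
    binding = st_info >> 4    # upper four bits of st_info
    sym_type = st_info & 0xF  # lower four bits of st_info
    return {
        "st_name": st_name,   # offset into the associated string table
        "binding": BINDINGS.get(binding, binding),
        "type": TYPES.get(sym_type, sym_type),
        "visibility": st_other & 0x3,
        "st_shndx": st_shndx,
        "st_value": st_value,
        "st_size": st_size,
    }


if __name__ == "__main__":
    # Entry 0 is the reserved null symbol; entry 1 is a global function.
    table = bytes(24) + ELF64_SYM.pack(1, (1 << 4) | 2, 0, 1, 0x401000, 42)
    print(parse_elf64_sym(table, 1))
```

Pairing `st_name` with a lookup in the associated string table (.strtab for .symtab, .dynstr for .dynsym) recovers the symbol's name.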

Symbol Types
  • The types expressible in ELF.

Symbol Versioning
  • How to specify versions for sets of symbols.

  • The use of symbol versioning.

  • How represented in the ELF file.

  • Symbol lookup in the presence of multiple symbol versions.

  • The ‘base’ symbol version, based on the object’s name.

  • “Empty” versions.

  • Using dlsym() to look up version identifiers.

  • Pinning down version dependencies at link time.

Symbol Visibility
  • The meaning of ‘Local’ or ‘Global’ visibility.

  • Adjusting symbol visibility at link time.

  • Singleton symbols.

Tentative Symbols
  • How to define these.

  • Where useful.

  • Lack of ordering guarantees in output files.

Text Relocations
  • What these are.

  • Why they are bad (pessimal copy-on-write behavior).

  • Why they arise. Source code examples.

  • How to find them e.g. the findtextrel utility.

Thread-local Storage
  • The semantics of thread-local storage.

  • How to specify thread-local storage in source code.

  • How represented in an ELF object.

  • Segments holding thread-local storage (of type PT_TLS).

  • Sections with TLS data: (initialized) .tdata, (uninitialized) .tbss.

  • Various TLS models.

  • How TLS segments are processed at startup and termination of a dynamic object.

Undefined Symbols
  • What these are.

  • The effect of undefined symbols on a static link.

  • The effect of undefined symbols on dynamic objects.

  • Specifying additional undefined symbols at link time, and why one would do such a thing.

Unused Sections
  • How the linker determines that a section is unused.

  • How the linker handles cyclic references between sections where the section group is otherwise unused.

  • The default behavior of unused sections.

  • Forcing unused sections to be kept around.

Versioned Filenames
  • Using versioned filenames to allow older binaries to run.

  • Rules for using version numbers in filenames for shared objects.

  • Using symbolic links.

  • How dependencies are recorded in an ELF object.

  • Runtime lookup of the correct versioned filename.

Weak Symbols
  • What these are.

  • Why they are useful.

  • How to define weak symbols.

  • Why they are considered to be fragile and error-prone.

  • Using dlsym(RTLD_PROBE) instead of weak symbols.