Dissecting an ELF File
WIP
Stages of Program Execution
As we now know, binaries are built in several stages. For the purposes of what's coming ahead, I'll start with a very high-level overview of what the binary looks like and what needs to be loaded and set up in memory for execution.
=> Case One: Statically-linked binary: LOAD_TIME
Even a simple Hello world! program has a bunch of external code linked in from the C libraries. You can create an executable that is fully linked at compile time; this is what a statically-linked binary is. You pass the -static flag to GCC when you compile the hello.c file. In a very generic overview, the linker that gcc invokes at the end of the compilation phase takes the hello.o file and fills in all the external and local functions it uses, whose declarations were pulled in during the preprocessing phase. To do this, the linker needs what's known as the Section View of the binary: a convenient organization of the binary that specifies which data/code goes where and, more importantly, lets the linker resolve functions and variables in the code, that is, associate each variable with its value and each function with its code.
Picture a big program with a number of X-file.c files where you #include both local-header.h and <global-header.h>. With the help of the Section View, the linker is able to resolve, for example, a reference to the variable X to the storage that holds its value 97, and a call such as add(y, x); to its associated code defined elsewhere. For this purpose there is a section that holds names (an array of NULL-terminated strings) for all function/variable names, typically called .strtab, and another section that holds symbolic references tying each function/variable name to its associated code/value, typically called .symtab. Alongside them are the relocation sections, named .rel.<section> or .rela.<section> in ELF object files (referred to here as .reloc). Relocation entries specify the addresses of places that need relocation and how to apply it. In a statically-linked binary these relocation sections generally don't end up in the file image or the memory image, since the linker has already consumed the relocation information from the object files and resolved all relocations.
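As a rough illustration of how .symtab and .strtab relate: a symbol entry does not store its name, only an offset (st_name) into the string table. The sketch below assumes a Linux system that ships <elf.h>; symbol_name is a hypothetical helper written for this page, not something provided by the toolchain.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the name of a symbol by following st_name into the string table.
 * Returns NULL when the offset is out of range or when no terminating NUL
 * is found before the end of the table.
 * Hypothetical helper, for illustration only. */
static const char *symbol_name(const Elf64_Sym *sym,
                               const char *strtab, size_t strtab_size)
{
    if (sym->st_name >= strtab_size)
        return NULL;
    /* Make sure the string is terminated inside the table. */
    if (memchr(strtab + sym->st_name, '\0', strtab_size - sym->st_name) == NULL)
        return NULL;
    return strtab + sym->st_name;
}
```

With a string table laid out as "\0main\0add", a symbol whose st_name is 1 resolves to main and one whose st_name is 6 resolves to add. The same lookup applies to .dynsym and .dynstr.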
Other sections you've come across are .rodata, .text and .bss. Several factors separate these sections from each other, but for now think of which memory permissions a blob of code/data needs, and what uses that blob of code/data.
~$ gcc -no-pie -static hello.c -o hello-static
/* Recent GCC versions build a PIE (position-independent executable) by default; pass -no-pie to get a conventional, fixed-address executable.
file output for a default (PIE) build:
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4a9dc1c471da58c90d25934a1ffbafa5836404b5, for GNU/Linux 3.2.0, not stripped */
~$ file hello-static
hello-static: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=3a7d527c20706a085b3e3145161dc3fec289b36a, for GNU/Linux 3.2.0, not stripped
=> Case Two: Dynamically-linked binary: LOAD_TIME
We can choose to dynamically link the binary at load time (normal dynamic linking) or at run time (lazy binding). The difference lies in run-time performance. Linking at load time is self-explanatory: all symbols and references are resolved once the program is loaded into memory, which takes time up front, but then the program runs fast. The other option is linking at run time, which defers each relocation until the first call to an unresolved symbol; this costs nothing when loading the binary, but the first call to each symbol is slower than it would be with load-time linking. Linux uses lazy binding by default, but setting the LD_BIND_NOW environment variable forces the dynamic linker to perform all relocations right away.
Back on track: the dynamic linker needs to do the same job the linker did at compile time when we statically linked the binary. It needs the Section View to map symbols to strings, but in this case it refers to other sections, .dynsym and .dynstr, along with .rel.dyn (.rela.dyn on x86-64), which contains relocation information, and a special section, .dynamic, that is a roadmap for the linker and the OS loader to load and set up the binary for execution.
~$ gcc -no-pie hello.c -o hello-dynamic
~$ file hello-dynamic
hello-dynamic: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1e0b60ec718684de1c3c63854bcd4c4eb2659874, for GNU/Linux 3.2.0, not stripped
Note that debuggers need .strtab and .symtab for debugging symbols, but most binaries are shipped stripped. .dynstr and .dynsym, on the other hand, are never stripped, because the dynamic linker needs them at run time. The distinction between the two pairs therefore tells tools such as strip which symbol/string tables can be removed and which cannot.
=> Case Three: Executing the binary: EXECUTION_TIME
The typical scenario is that the OS loader grabs the binary's file image and maps it into memory. But how? It needs a map to do so. To be more precise, the OS loader works together with the dynamic linker to set up the binary: the OS loader is what fires up the dynamic linker at load time and maps it into memory first. Either way, let's stick to the mechanism by which this setup is done. For the OS loader and the dynamic linker to map the binary's file image into memory, they need a memory layout. The Segment View provides it, giving a generic layout of the binary: what are the memory permissions of this chunk, and where should it be mapped?
In this view, the sections the linker needed for linking and relocation are contained within bigger segments that are mapped into memory. Later we will see that not all sections get mapped into memory, .reloc in statically-linked binaries being one example. A segment of type PT_LOAD is, simply put, a segment that gets loaded into memory. Such a segment can have Read-Execute (RE) flags, which signify code; it will almost certainly contain the .text, .plt, .init and .fini sections, which all contain code. Another PT_LOAD segment with Read-Write (RW) flags contains .bss, .data and others.
Long story short, as we will see next, the Segment View is needed by the OS loader and the dynamic linker to load the binary and set it up for execution, while the Section View is needed by the linker to perform relocation, and not all sections get mapped into memory.
ELF-Executable Headers -> Ehdr
The Elf64_Ehdr structure starts at offset 0 and serves as a map to the rest of the file. It records the ELF type, architecture information, the entry point address where execution begins, the offsets of the other header tables, and other related information.
// ELF header (Ehdr)
// The ELF header is described by the type Elf32_Ehdr or Elf64_Ehdr:
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
uint16_t e_type;
uint16_t e_machine;
uint32_t e_version;
ElfN_Addr e_entry;
ElfN_Off e_phoff;
ElfN_Off e_shoff;
uint32_t e_flags;
uint16_t e_ehsize;
uint16_t e_phentsize;
uint16_t e_phnum;
uint16_t e_shentsize;
uint16_t e_shnum;
uint16_t e_shstrndx;
} ElfN_Ehdr;
The entries we care about are:
-> e_ident;
A 16-byte identification array. Its first four bytes are the magic number, which the OS loader requires in order to treat the file as an ELF file.
00000000 7F 45 4C 46 : .ELF
-> e_type;
ELF type:
ET_NONE(0) -> An unknown type.
ET_REL(1) -> A relocatable file: an object file (.o) that has not been linked yet.
ET_EXEC(2) -> An executable file.
ET_DYN(3) -> A shared object: a shared library or a position-independent executable (PIE).
ET_CORE(4) -> A core file: a core dump, for example from a program that crashed with SIGSEGV.
-> e_entry;
Virtual address where execution starts. This is not main() but rather _start, the startup code that GCC links into every binary by default.
-> e_phoff;
File offset of the program header table => Phdr
-> e_shoff;
File offset of the section header table => Shdr
-> e_phnum;
Number of entries in the program header table, in other words, the number of program headers.
-> e_shnum;
Number of entries in the section header table, in other words, the number of section headers.
-> e_shstrndx;
Index of the section header table entry for the section that holds the section names (.shstrtab).
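To make the first two fields concrete, here is a minimal sketch that checks the magic bytes and turns e_type into a label. It assumes a Linux system with <elf.h>, which provides ELFMAG, SELFMAG and the ET_* constants; has_elf_magic and elf_type_name are hypothetical helper names.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if buf starts with the 4-byte ELF magic 7F 45 4C 46, else 0. */
static int has_elf_magic(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}

/* Map an e_type value to a short label for the types listed above. */
static const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_NONE: return "NONE";
    case ET_REL:  return "REL";
    case ET_EXEC: return "EXEC";
    case ET_DYN:  return "DYN";
    case ET_CORE: return "CORE";
    default:      return "UNKNOWN";
    }
}
```

A caller would read the first bytes of the file into a buffer, reject it if has_elf_magic returns 0, and only then interpret the buffer as an Elf64_Ehdr.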

ELF Header of a simple Hello binary
~$ readelf -h hello-static
ELF Header:
Magic: 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - GNU
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401bc0
Start of program headers: 64 (bytes into file)
Start of section headers: 860912 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 10
Size of section headers: 64 (bytes)
Number of section headers: 32
Section header string table index: 31
~$ readelf -h hello-dynamic
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401050
Start of program headers: 64 (bytes into file)
Start of section headers: 14648 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
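The numbers in these headers fit together with simple arithmetic: a header table occupies entry size times entry count bytes, starting at its offset. A small sketch of that calculation follows; table_size and table_end are hypothetical helper names.

```c
#include <stdint.h>

/* Size in bytes of a header table: entry size times entry count. */
static uint64_t table_size(uint16_t entsize, uint16_t count)
{
    return (uint64_t)entsize * count;
}

/* File offset one past the last byte of a header table. */
static uint64_t table_end(uint64_t offset, uint16_t entsize, uint16_t count)
{
    return offset + table_size(entsize, count);
}
```

For hello-dynamic, the program header table is 13 * 56 = 728 bytes (0x2d8, the size readelf -l reports for the PHDR segment) and spans file offsets 64 to 792; its section header table ends at 14648 + 31 * 64 = 16632. For hello-static, the section header table ends at 860912 + 32 * 64 = 862960.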
Program (Segment) Headers -> Phdr
The program header table starts at e_phoff and is an array of Elf64_Phdr structures, each describing one segment. Program headers, or segments, hold the execution information the OS loader and the dynamic linker need: they provide the Segment View of the binary, that is, how the program is laid out on disk and how it should be mapped into memory. I'll use header and segment interchangeably, as both refer to the same information.
Structure of a single segment: type Elf64_Phdr
TODO: File Padding not present in linux example
typedef struct {
uint32_t p_type;
uint32_t p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
uint64_t p_filesz;
uint64_t p_memsz;
uint64_t p_align;
} Elf64_Phdr;
Structure members we care about:
-> p_type;
Specifies the segment type:
PT_LOAD:
-> A segment that is mapped into memory; each one is characterized by its memory permissions.
.text is a read/execute section, so it's contained in an RE LOAD segment.
.rodata is read-only, so it's contained in an R LOAD segment.
.bss is writable, so it's contained in an RW LOAD segment.
PT_DYNAMIC:
-> A segment that holds dynamic linking information; only present if the executable is dynamically linked.
PT_INTERP:
-> This segment contains a NULL-terminated string: the path of the dynamic linker (program interpreter) for this executable. The interpreter is mapped into memory together with the LOAD segments and gets control first; it then does its job of filling in the function pointers the executable imports from external libraries.
PT_PHDR:
-> Describes the program header table itself: its location and its size in bytes.
-> p_offset:
Where this segment's data starts in the file.
-> p_vaddr:
Where in memory this segment starts.
-> p_paddr:
** Physical address; not really relevant on modern Linux systems, as all programs are loaded into virtual memory.
-> p_filesz:
The size of this segment in the file image.
-> p_memsz:
The size of this segment in the memory image.
-> p_align:
This field indicates an alignment constraint for segments loaded in memory. If its value is 0 or 1, no particular alignment is needed; otherwise it must be a power of 2.
-> p_flags:
** Memory permissions for the segment, e.g. R, RE, RW.
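Two of these members are easy to misread, so here is a minimal sketch, assuming a Linux system with <elf.h> (which defines PF_R, PF_W and PF_X). It decodes p_flags into the R/RE/RW shorthand used above and computes how many bytes of a segment exist only in memory. flags_to_str and zero_fill_bytes are hypothetical helper names.

```c
#include <elf.h>
#include <stdint.h>

/* Render p_flags in the shorthand used in the text, e.g. "RE" or "RW".
 * out must have room for at least 4 bytes. */
static void flags_to_str(uint32_t p_flags, char out[4])
{
    int i = 0;
    if (p_flags & PF_R) out[i++] = 'R';
    if (p_flags & PF_W) out[i++] = 'W';
    if (p_flags & PF_X) out[i++] = 'E';
    out[i] = '\0';
}

/* Bytes that exist only in the memory image and are zero-filled at load
 * time (this is where .bss lives).
 * Returns 0 for a malformed header where p_memsz < p_filesz. */
static uint64_t zero_fill_bytes(const Elf64_Phdr *ph)
{
    return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}
```

A segment with p_filesz 0x220 and p_memsz 0x228, for instance, carries 8 bytes that take up no room in the file.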
NOTE the difference between the hello-static and hello-dynamic segments: the static binary has no PHDR, INTERP or DYNAMIC segment.
~$ readelf -l hello-dynamic
Elf file type is EXEC (Executable file)
Entry point 0x401050
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000400318 0x0000000000400318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000004d0 0x00000000000004d0 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x00000000000001e5 0x00000000000001e5 R E 0x1000
LOAD 0x0000000000002000 0x0000000000402000 0x0000000000402000
0x0000000000000158 0x0000000000000158 R 0x1000
LOAD 0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
0x0000000000000220 0x0000000000000228 RW 0x1000
DYNAMIC 0x0000000000002e20 0x0000000000403e20 0x0000000000403e20
0x00000000000001d0 0x00000000000001d0 RW 0x8
NOTE 0x0000000000000338 0x0000000000400338 0x0000000000400338
0x0000000000000020 0x0000000000000020 R 0x8
NOTE 0x0000000000000358 0x0000000000400358 0x0000000000400358
0x0000000000000044 0x0000000000000044 R 0x4
GNU_PROPERTY 0x0000000000000338 0x0000000000400338 0x0000000000400338
0x0000000000000020 0x0000000000000020 R 0x8
GNU_EH_FRAME 0x0000000000002014 0x0000000000402014 0x0000000000402014
0x0000000000000044 0x0000000000000044 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
0x00000000000001f0 0x00000000000001f0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.sec .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.gnu.property
08 .note.gnu.build-id .note.ABI-tag
09 .note.gnu.property
10 .eh_frame_hdr
11
12 .init_array .fini_array .dynamic .got
~$ readelf -l hello-static
Elf file type is EXEC (Executable file)
Entry point 0x401bc0
There are 10 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000518 0x0000000000000518 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x0000000000092db5 0x0000000000092db5 R E 0x1000
LOAD 0x0000000000094000 0x0000000000494000 0x0000000000494000
0x0000000000025315 0x0000000000025315 R 0x1000
LOAD 0x00000000000ba140 0x00000000004bb140 0x00000000004bb140
0x00000000000050d0 0x0000000000006880 RW 0x1000
NOTE 0x0000000000000270 0x0000000000400270 0x0000000000400270
0x0000000000000020 0x0000000000000020 R 0x8
NOTE 0x0000000000000290 0x0000000000400290 0x0000000000400290
0x0000000000000044 0x0000000000000044 R 0x4
TLS 0x00000000000ba140 0x00000000004bb140 0x00000000004bb140
0x0000000000000020 0x0000000000000060 R 0x8
GNU_PROPERTY 0x0000000000000270 0x0000000000400270 0x0000000000400270
0x0000000000000020 0x0000000000000020 R 0x8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x00000000000ba140 0x00000000004bb140 0x00000000004bb140
0x0000000000002ec0 0x0000000000002ec0 R 0x1
Section to Segment mapping:
Segment Sections...
00 .note.gnu.property .note.gnu.build-id .note.ABI-tag .rela.plt
01 .init .plt .text __libc_freeres_fn .fini
02 .rodata .stapsdt.base .eh_frame .gcc_except_table
03 .tdata .init_array .fini_array .data.rel.ro .got .got.plt .data __libc_subfreeres __libc_IO_vtables __libc_atexit .bss __libc_freeres_ptrs
04 .note.gnu.property
05 .note.gnu.build-id .note.ABI-tag
06 .tdata .tbss
07 .note.gnu.property
08
09 .tdata .init_array .fini_array .data.rel.ro .got
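The Offset and VirtAddr columns are enough to translate a virtual address back to a position in the file, which is what a tool has to do to find the bytes at the entry point. The following is a sketch under the assumption that the program headers have already been read into an array; vaddr_to_offset is a hypothetical helper name and <elf.h> is assumed to be available.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Translate a virtual address to a file offset by finding the PT_LOAD
 * segment whose file-backed range contains it.
 * Returns 1 and stores the offset on success, 0 if no segment matches. */
static int vaddr_to_offset(const Elf64_Phdr *phdrs, size_t count,
                           uint64_t vaddr, uint64_t *offset)
{
    for (size_t i = 0; i < count; i++) {
        const Elf64_Phdr *ph = &phdrs[i];
        if (ph->p_type != PT_LOAD)
            continue;
        /* Only the first p_filesz bytes of a segment exist in the file. */
        if (vaddr >= ph->p_vaddr && vaddr - ph->p_vaddr < ph->p_filesz) {
            *offset = vaddr - ph->p_vaddr + ph->p_offset;
            return 1;
        }
    }
    return 0;
}
```

For hello-dynamic, the entry point 0x401050 falls inside the R E LOAD segment (VirtAddr 0x401000, Offset 0x1000), so its code sits at file offset 0x1050. For hello-static, the entry point 0x401bc0 maps to file offset 0x1bc0 in the same way.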
Section Header -> Shdr
The section header table starts at e_shoff and is an array of Elf64_Shdr structures, each containing information about a specific section. The section header table provides the Section View that is used by the linker at link time and during dynamic linking. Each section is a blob of code/data with no particular structure, except for special sections; in fact, the structure of a section depends on its content.
Section headers provide a convenient organization of the binary that is used by the linker and also parsed by static binary analysis tools. If an ELF file doesn't need linking, the section header table is not needed; if the table is absent, the Ehdr element e_shoff is 0.
Structure of a single Section: Elf64_Shdr
typedef struct {
uint32_t sh_name;
uint32_t sh_type;
uint64_t sh_flags;
Elf64_Addr sh_addr;
Elf64_Off sh_offset;
uint64_t sh_size;
uint32_t sh_link;
uint32_t sh_info;
uint64_t sh_addralign;
uint64_t sh_entsize;
} Elf64_Shdr;
elements we care about, all of them ... ( ̚‿̚ )
-> sh_name:
Offset into .shstrtab, an array of NULL-terminated strings holding one name per section header.
-> sh_type:
Section type, which tells the linker and analysis tools how to interpret the section's data.
SHT_NULL -> the very first entry in the section header table, indicating an empty section.
SHT_PROGBITS -> a section that contains code or data.
SHT_NOBITS -> used for the .bss section, which contains uninitialized data.
SHT_DYNAMIC -> a section holding dynamic linking information.
SHT_DYNSYM -> contains the symbol table for the dynamic linker.
SHT_SYMTAB -> contains the symbol table for static linking.
SHT_STRTAB -> a string table.
SHT_REL, SHT_RELA -> relocation information.
-> sh_flags:
Additional information about the section:
SHF_WRITE -> a section that is writable at run time, e.g. .bss, .got.plt
SHF_ALLOC -> a section that gets loaded into memory; sections without this flag are discarded and don't end up in the memory image.
SHF_EXECINSTR -> a section that contains executable instructions.
-> sh_addr:
If this section ends up in the memory image of the binary, this field holds the address at which the section's first byte should reside in memory.
-> sh_offset:
File offset from the beginning of the file to the first byte of this section.
-> sh_size:
Size of the section in bytes. A section of type SHT_NOBITS, such as .bss, occupies no space in the file.
-> sh_link:
Index of a related section header, which the linker needs. Some sections are related, and the linker has to know about these relationships for linking purposes. For example, SHT_SYMTAB and SHT_DYNSYM sections each have an associated SHT_STRTAB section header that contains the names of the symbols in question.
-> sh_info:
Extra information whose meaning depends on the section type.
-> sh_addralign:
Like p_align, defines an alignment constraint, if any; the value must be a power of 2.
-> sh_entsize:
Some special sections with a well-defined structure (such as tables of Elf64_Rel or Elf64_Sym) have fixed-size entries; this field contains the size of one entry.
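sh_size, sh_entsize and sh_type combine in two small calculations that come up constantly when walking sections. The sketch below assumes <elf.h> is available; section_entry_count and section_file_bytes are hypothetical helper names.

```c
#include <elf.h>
#include <stdint.h>

/* Number of fixed-size entries in a table-like section such as .symtab.
 * Returns 0 when the section does not hold fixed-size entries. */
static uint64_t section_entry_count(const Elf64_Shdr *sh)
{
    return sh->sh_entsize ? sh->sh_size / sh->sh_entsize : 0;
}

/* Bytes the section occupies in the file; SHT_NOBITS sections take none. */
static uint64_t section_file_bytes(const Elf64_Shdr *sh)
{
    return sh->sh_type == SHT_NOBITS ? 0 : sh->sh_size;
}
```

For a symbol table, section_entry_count gives the number of Elf64_Sym records to iterate over, while section_file_bytes tells a parser how much to read starting at sh_offset.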
Sections we care about:
-> null: The first section header is always of type SHT_NULL.
-> .init: Code that runs before control is handed to the program's main entry point.
-> .fini: Code that runs when the program terminates.
-> .dynamic