Dissecting an ELF File
WIP
Stages of Program Execution
As we now know, a binary is built in several stages. For what's coming ahead, I'll start with a very high-level overview of what the binary looks like and what needs to be loaded and set up in memory for execution.
=> Case One: Statically-linked binary: LOAD_TIME
Even a simple Hello world! program has a bunch of external code linked in from the C library. You can create an executable in which all of that code is linked in at build time; this is what a statically-linked binary is. You pass the -static flag to GCC when you compile hello.c. In a very generic overview, the linker, which gcc invokes at the end of the compilation pipeline, takes the hello.o file and fills in the code for all external and local functions that are used (their declarations were pulled in during the pre-processing phase). To do this, the linker needs what's known as the Section View of the binary: a convenient organization of the binary that specifies which data/code goes where and, more importantly, lets the linker resolve functions and variables in the code, i.e. associate each variable with its storage and each function with its code.
Picture a big program with a number of X-file.c files where you #include both local-header.h and <global-header.h>. With the help of the Section View, the linker can resolve, for example, a reference to a variable X to the storage that holds its value 97, and a call such as add(y, x); to the code defined elsewhere. For this purpose there is a section that holds the names, an array of NULL-terminated strings for all function/variable names, typically called .strtab, and another section that holds symbolic references tying these names to their associated code/value, typically called .symtab. Next to them sits a relocation section, called .reloc here (toolchains typically name these sections .rel.text, .rela.text and so on). Its entries specify the addresses of the places that need relocation and how to apply each relocation. This relocation section ends up in neither the file image nor the memory image of the statically linked binary, since the linker consumes the relocation information from the object file and resolves all relocations.
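To make the .strtab/.symtab relationship concrete, here is a minimal sketch in Python. It is a toy model, not the real Elf64_Sym layout: the string table, the symbol list, the addresses and the helper names read_name and resolve are all made up for illustration. The one real idea it shows is that a symbol stores an offset into a blob of NULL-terminated strings rather than the name itself.

```python
def read_name(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string starting at `offset` in `strtab`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


# Hypothetical string table. Byte 0 is NUL so that offset 0 means "no name".
strtab = b"\x00main\x00add\x00X\x00"

# Hypothetical symbol table: (offset of the name in strtab, address).
symtab = [(1, 0x401126), (6, 0x401106), (10, 0x404028)]


def resolve(name: str) -> int:
    """Find the address recorded for `name`, the way a linker lookup would."""
    for name_offset, address in symtab:
        if read_name(strtab, name_offset) == name:
            return address
    raise KeyError(name)


for name_offset, address in symtab:
    print(f"{read_name(strtab, name_offset):<5} {address:#x}")
```

Storing offsets instead of strings keeps every symbol entry the same size, which is why the names live in a separate section.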
Other sections you've come across are .rodata, .text and .bss. There are several factors that separate these sections from each other, but for now think of which memory permissions a blob of code/data needs, and what uses that blob of code/data.
~$ gcc -no-pie -static hello.c -o hello-static
/* newer GCC versions default to PIE, 'position independent executable'; use -no-pie to get a normal executable.
file output:
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4a9dc1c471da58c90d25934a1ffbafa5836404b5, for GNU/Linux 3.2.0, not stripped */
~$ file hello-static
hello-static: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=3a7d527c20706a085b3e3145161dc3fec289b36a, for GNU/Linux 3.2.0, not stripped
=> Case Two: Dynamically-linked binary: LOAD_TIME
We can choose to dynamically link the binary at load time (normal dynamic linking) or at run time (lazy binding). The difference lies in run-time performance. Linking at load time is self-explanatory: all symbols and references are resolved once the program is loaded into memory. This takes time at first, but then the program runs fast. The other option is linking at run time, which does not spend time on a relocation until the first call to an unresolved symbol. This costs nothing when the binary is loaded, but each first call is slower than it would be with load-time linking. Linux uses lazy binding by default, but with the LD_BIND_NOW environment variable set, the dynamic linker is forced to perform all relocations right away.
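The trade-off between the two modes can be sketched with a toy model. This is not how the dynamic linker is implemented (the real mechanism goes through the PLT and GOT in machine code). The Binder class, slow_lookup and the fake libc dictionary are hypothetical names used only to show when symbol lookups happen in each mode.

```python
def slow_lookup(name, libraries, log):
    """Stand-in for the dynamic linker's symbol search; records each search."""
    log.append(name)
    for lib in libraries:
        if name in lib:
            return lib[name]
    raise LookupError(name)


class Binder:
    def __init__(self, needed, libraries, bind_now=False):
        self.libraries = libraries
        self.lookups = []   # every symbol search performed so far
        self.table = {}     # resolved entries, like filled-in GOT slots
        if bind_now:        # eager: resolve everything at load time
            for name in needed:
                self.table[name] = slow_lookup(name, libraries, self.lookups)

    def call(self, name, *args):
        if name not in self.table:   # lazy: resolve on the first call only
            self.table[name] = slow_lookup(name, self.libraries, self.lookups)
        return self.table[name](*args)


libc = {"puts": len, "abs": abs}   # hypothetical "library": name -> callable

lazy = Binder(["puts", "abs"], [libc])
lazy.call("abs", -3)
lazy.call("abs", -4)
print("lazy lookups: ", lazy.lookups)    # only symbols that were actually called

eager = Binder(["puts", "abs"], [libc], bind_now=True)
print("eager lookups:", eager.lookups)   # everything, before any call
```

In the lazy case a symbol that is never called is never looked up, which is where the faster start-up comes from.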
Back on track: the dynamic linker needs to do the same job that the linker did at build time when we statically linked the binary. It needs a way to map symbols to strings, but in this case the dynamic linker refers to other sections: .dynsym and .dynstr, along with .rel.dyn (.rela.dyn on x86-64), which contains relocation information, and a special section, .dynamic, that is a roadmap for the dynamic linker and the OS loader to load and set up the binary for execution.
Note that debuggers need .strtab and .symtab for debugging symbols, but binaries are mostly stripped. .dynstr and .dynsym, on the other hand, are never stripped, because the dynamic linker needs them. So the distinction between the two pairs lets tools such as strip know which symbol/string tables can be removed and which cannot.
=> Case Three: Executing the binary: EXECUTION_TIME
The typical scenario is that the OS loader grabs the binary's file image and maps it into memory, but how? It needs a map to do so. To be more precise, the OS loader works together with the dynamic linker to set the binary up: the OS loader is what maps the dynamic linker into memory and starts it at load time. Anyhow, let's stick to the mechanism with which this setup is done. To map the binary's file image into memory, the OS loader and the dynamic linker need a memory layout, and the Segment View provides it: what are the memory permissions of this chunk, and where should it be mapped? As a generic view, the Sections that the linker needed for its linking and relocations are contained in bigger Segments that are mapped into memory; later we will see that not all sections get mapped into memory, as with .reloc for statically linked binaries. For example, a segment of type PT_LOAD is, simply put, a segment that gets loaded into memory. Such a segment can have Read-Execute (RE) flags, which signify code, and it will most certainly contain the .text, .plt, .init and .fini sections, which all contain code. Another PT_LOAD segment with Read-Write (RW) flags contains .bss, .data and others.
Long story short, as we will see next, the Segment View is needed by the OS loader and the dynamic linker to load the binary and set it up for execution, while the Section View is needed by the linker to perform relocations, and not all sections get mapped into memory.
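The relationship between the two views can be sketched as a containment check: a section belongs to a segment when its address range falls inside the segment's range. The layout below (addresses, sizes and the helper containing_segment) is invented for illustration and is not taken from a real binary.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    flags: str    # memory permissions, e.g. "RE" or "RW"
    vaddr: int    # where the segment starts in memory
    memsz: int    # size of the segment in memory


class Section(NamedTuple):
    name: str
    addr: int     # 0 means "not mapped into memory"
    size: int


def containing_segment(sec: Section, segments: Sequence[Segment]) -> Optional[Segment]:
    """Return the segment whose address range covers the whole section."""
    for seg in segments:
        if seg.vaddr <= sec.addr and sec.addr + sec.size <= seg.vaddr + seg.memsz:
            return seg
    return None


# Hypothetical layout of a small statically linked binary.
segments = [Segment("RE", 0x401000, 0x1000), Segment("RW", 0x404000, 0x200)]
sections = [
    Section(".text", 0x401040, 0x180),
    Section(".data", 0x404000, 0x10),
    Section(".bss", 0x404010, 0x20),
    Section(".symtab", 0, 0x300),    # never mapped into memory
]

for sec in sections:
    seg = containing_segment(sec, segments)
    print(f"{sec.name:<8} -> {seg.flags if seg else 'not loaded'}")
```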
ELF-Executable Headers -> Ehdr
The Elf64_Ehdr structure starts at offset 0 and serves as a map to the rest of the file. It records the ELF type, architecture information, the entry point address where execution begins, offsets to the other file headers, and other related information.
the entries we care about are:
-> e_ident; 16 identification bytes that start with the magic number 0x7f 'E' 'L' 'F', required for the OS loader to process the file as an ELF file.
-> e_type; ELF type
ET_NONE(0) -> An unknown type.
ET_REL(1) -> A relocatable file: an object file (.o) that has not been linked yet.
ET_EXEC(2) -> An executable file.
ET_DYN(3) -> A shared object: position-independent code (PIC) such as a .so library; PIE executables carry this type too.
ET_CORE(4) -> A core file: a core dump, e.g. from a program crashing with SIGSEGV.
-> e_entry; virtual address where execution starts. This is not really main() but rather _start, initialization code that is tacked onto every binary compiled with GCC.
-> e_phoff; file offset to the program header table => Phdr
-> e_shoff; file offset to the section header table => Shdr
-> e_phnum; number of entries in the program header table, in other words, the number of program headers.
-> e_shnum; number of entries in the section header table, in other words, the number of section headers.
-> e_shstrndx; simply put, the index of the section header table entry for the section that contains the section names (.shstrtab).

ELF Header of a simple Hello binary
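As a rough sketch of how these fields sit in the file, the following Python snippet unpacks an Elf64_Ehdr with the struct module. It assumes a 64-bit little-endian file and runs on a synthetic 64-byte header built in place, so it needs no real binary. The helper name parse_ehdr and the values in the fake header are made up for illustration.

```python
import struct

EHDR_FORMAT = "<16sHHIQQQIHHHHHH"   # little-endian Elf64_Ehdr, 64 bytes
EHDR_FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
               "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
               "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def parse_ehdr(data: bytes) -> dict:
    """Decode the ELF header found at offset 0 of `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if data[4:6] != b"\x02\x01":
        raise ValueError("this sketch only handles 64-bit little-endian files")
    hdr = dict(zip(EHDR_FIELDS, struct.unpack_from(EHDR_FORMAT, data, 0)))
    hdr["e_type"] = ET_NAMES.get(hdr["e_type"], hex(hdr["e_type"]))
    return hdr


# Synthetic header (made-up values): ET_EXEC, x86-64, two program headers,
# no section header table.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
fake = struct.pack(EHDR_FORMAT, ident, 2, 62, 1, 0x401000, 64, 0, 0,
                   64, 56, 2, 64, 0, 0)

hdr = parse_ehdr(fake)
for key in ("e_type", "e_entry", "e_phoff", "e_shoff", "e_phnum", "e_shnum"):
    print(key, hdr[key])
```

To try it on a real file, pass the file's bytes instead, e.g. the contents of hello-static read in binary mode.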
Program Segment Headers -> Phdr
The program header table starts at e_phoff and is an array of Elf64_Phdr structures, each describing a particular segment. Program headers, or segments, hold the execution information that the OS loader and the dynamic linker need: they provide the Segment View of the binary, i.e. how the program is laid out on disk and how it should be mapped into memory. I'll use header and segment interchangeably, as both denote the same information.
Structure of a single segment: type Elf64_Phdr
TODO: File Padding not present in linux example
structure members we care about:
-> p_type; specifies SEGMENT type:
PT_LOAD: -> segment that is mapped into memory; each one is identified by its memory permissions.
.text is a read/execute section, so it's contained in an RE LOAD segment.
.rodata is read-only, thus it's contained in an R LOAD segment.
.bss is writable, thus it's contained in an RW LOAD segment.
PT_DYNAMIC: -> segment that holds dynamic linking information; only present if the executable is dynamically linked.
PT_INTERP: -> this segment contains a NULL-terminated string with the path of the dynamic linker for this executable. The loader maps the LOAD segments and this dynamic linker into memory, then the dynamic linker does its job of filling in the function pointers that the executable imports from external libraries.
PT_PHDR: -> describes the program header table itself, i.e. the location and size in bytes of the table.
-> p_offset: where this segment's data starts in the file.
-> p_vaddr: where in memory this segment starts.
-> p_paddr: physical address; not really relevant on modern Linux systems, as all programs get loaded into virtual memory.
-> p_filesz: the size of this segment in the file image.
-> p_memsz: the size of this segment in the memory image.
-> p_align: this field indicates an alignment constraint for segments loaded into memory. If its value is 0 or 1, no particular alignment is needed; otherwise it must be a power of 2.
-> p_flags: memory permissions for the segment, e.g. R, RE, RW.
NOTE the difference between hello-static and hello-dynamic segments.
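A minimal sketch of walking the program header table in Python follows. It assumes a 64-bit little-endian layout and uses a synthetic two-entry table with made-up values. The helpers parse_phdrs, flags_to_str and alignment_ok are illustrative names, not part of any library. The alignment check encodes the rule from the ELF specification that p_vaddr and p_offset must be congruent modulo p_align.

```python
import struct

PHDR_FORMAT = "<IIQQQQQQ"   # little-endian Elf64_Phdr, 56 bytes
PHDR_FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr", "p_paddr",
               "p_filesz", "p_memsz", "p_align")
PT_NAMES = {1: "PT_LOAD", 2: "PT_DYNAMIC", 3: "PT_INTERP", 6: "PT_PHDR"}


def parse_phdrs(data: bytes, e_phoff: int, e_phnum: int) -> list:
    """Decode e_phnum program headers starting at file offset e_phoff."""
    size = struct.calcsize(PHDR_FORMAT)
    return [dict(zip(PHDR_FIELDS,
                     struct.unpack_from(PHDR_FORMAT, data, e_phoff + i * size)))
            for i in range(e_phnum)]


def flags_to_str(p_flags: int) -> str:
    """Render p_flags the way this text writes them: R, RE, RW."""
    return "".join(ch for bit, ch in ((4, "R"), (2, "W"), (1, "E")) if p_flags & bit)


def alignment_ok(ph: dict) -> bool:
    """p_vaddr and p_offset must be congruent modulo p_align."""
    align = ph["p_align"]
    return align in (0, 1) or ph["p_vaddr"] % align == ph["p_offset"] % align


# Synthetic table (made-up values): one code segment, one data segment.
table = (struct.pack(PHDR_FORMAT, 1, 5, 0x1000, 0x401000, 0x401000, 0x200, 0x200, 0x1000)
         + struct.pack(PHDR_FORMAT, 1, 6, 0x2000, 0x404000, 0x404000, 0x10, 0x30, 0x1000))

phdrs = parse_phdrs(table, 0, 2)
for ph in phdrs:
    zero_fill = ph["p_memsz"] - ph["p_filesz"]   # bytes that exist only in memory (.bss)
    print(PT_NAMES.get(ph["p_type"], ph["p_type"]), flags_to_str(ph["p_flags"]),
          hex(ph["p_vaddr"]), "zero-filled bytes:", zero_fill, "aligned:", alignment_ok(ph))
```

The gap between p_memsz and p_filesz in the RW segment is how .bss gets its space: the loader maps the file bytes and zero-fills the rest.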
Section Header -> Shdr
The section header table starts at e_shoff and is an array of Elf64_Shdr structures, each containing information about a specific section. The section header table provides the Section View that is used by the linker at link time and with dynamic linking. Each section is a blob of code/data with no particular structure, except for special sections; in fact, the structure of a section depends on its content.
Section headers provide a convenient organization of the binary that is used by the linker and also parsed by static binary analysis tools. If an ELF file doesn't need linking, the section header table is not needed; when the table is absent, the Ehdr field e_shoff is 0.
Structure of a single Section: Elf64_Shdr
elements we care about, all of them ... ( ̚‿̚ )
-> sh_name: offset into .shstrtab, an array of NULL-terminated strings, one for each section header.
-> sh_type: section type, which tells the linker and other tools how to interpret the section's data.
SHT_NULL -> the very first entry in the section header table, indicating an empty section.
SHT_PROGBITS -> section that contains code or data.
SHT_NOBITS -> section that takes no space in the file, e.g. .bss, which contains uninitialized data.
SHT_DYNAMIC -> section holding dynamic linking information.
SHT_DYNSYM -> contains the symbol table for the dynamic linker.
SHT_SYMTAB -> contains the symbol table for static linking.
SHT_STRTAB -> string table.
SHT_REL, SHT_RELA -> relocation information.
-> sh_flags: additional information about the section
SHF_WRITE -> section that is writable at run time, e.g. .bss, .got.plt
SHF_ALLOC -> section that gets loaded into memory; sections without this flag are discarded by the loader and don't end up in the memory image.
SHF_EXECINSTR -> indicates a section that contains executable instructions.
-> sh_addr: if this section ends up in the memory image of the binary, this field holds the address at which the section's first byte should reside in memory.
-> sh_offset: file offset from the beginning of the file to the first byte of this section.
-> sh_size: size of the section in bytes. A section of type SHT_NOBITS, such as .bss, occupies no space in the file even though its sh_size may be nonzero.
-> sh_link: index of a related section header; this information is needed by the linker. Some sections are related to each other, and the linker needs to know these related section headers for linking purposes. E.g. SHT_SYMTAB and SHT_DYNSYM sections both have an associated SHT_STRTAB section that contains the names of the symbols in question.
-> sh_info: extra information whose meaning depends on the section type.
-> sh_addralign: like p_align, defines an alignment constraint, if any; its value must be a power of 2.
-> sh_entsize: some special sections with a well-defined structure (such as arrays of Elf64_Rel or Elf64_Sym) have fixed-size entries; this field contains the size of one entry.
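The way sh_name, e_shstrndx and sh_flags work together can be sketched as follows. The snippet assumes a 64-bit little-endian layout and builds a tiny synthetic image (a .shstrtab blob followed by three section headers) instead of reading a real file. The helper names and all values are illustrative. The flag letters W, A and X stand for SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR.

```python
import struct

SHDR_FORMAT = "<IIQQQQIIQQ"   # little-endian Elf64_Shdr, 64 bytes
SHDR_FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
               "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")
SHF_LETTERS = ((0x1, "W"), (0x2, "A"), (0x4, "X"))


def parse_shdrs(data: bytes, e_shoff: int, e_shnum: int) -> list:
    """Decode e_shnum section headers starting at file offset e_shoff."""
    size = struct.calcsize(SHDR_FORMAT)
    return [dict(zip(SHDR_FIELDS,
                     struct.unpack_from(SHDR_FORMAT, data, e_shoff + i * size)))
            for i in range(e_shnum)]


def read_name(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string starting at `offset`."""
    return strtab[offset:strtab.index(b"\x00", offset)].decode("ascii")


def section_table(data: bytes, e_shoff: int, e_shnum: int, e_shstrndx: int) -> list:
    """Return (name, sh_type, flag letters) for every section header."""
    shdrs = parse_shdrs(data, e_shoff, e_shnum)
    names = shdrs[e_shstrndx]                     # header of .shstrtab
    strtab = data[names["sh_offset"]:names["sh_offset"] + names["sh_size"]]
    rows = []
    for sh in shdrs:
        flags = "".join(ch for bit, ch in SHF_LETTERS if sh["sh_flags"] & bit)
        rows.append((read_name(strtab, sh["sh_name"]), sh["sh_type"], flags))
    return rows


# Synthetic image (made-up values): the name strings first, then three headers.
shstrtab = b"\x00.text\x00.shstrtab\x00"
headers = (struct.pack(SHDR_FORMAT, *([0] * 10))                                   # SHT_NULL
           + struct.pack(SHDR_FORMAT, 1, 1, 0x6, 0x401000, 0x1000, 0x180, 0, 0, 16, 0)  # .text
           + struct.pack(SHDR_FORMAT, 7, 3, 0, 0, 0, len(shstrtab), 0, 0, 1, 0))        # .shstrtab
image = shstrtab + headers

rows = section_table(image, len(shstrtab), 3, 2)
for name, sh_type, flags in rows:
    print(f"{name or '(null)':<10} type={sh_type} flags={flags or '-'}")
```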
Sections we care about:
-> null: First section is always a SHT_NULL
-> .init: initialization code that runs before control is handed to the program's main entry point.
-> .fini: termination code that runs when the program exits normally.
-> .dynamic