Dissecting an ELF File
WIP
Tools needed
readelf, hexedit, ldd, objdump, gdb
Stages of Program Execution
Binaries are built in several stages, as we now know. For what's coming ahead, I'll start with a very high-level overview of what the binary looks like and what needs to be loaded and set up in memory for execution.
=> Case One: Statically-linked binary: LOAD_TIME
Even a simple Hello world! program has a bunch of external code linked in from the C library. You can create an executable in which all of that code is linked in at build time; this is what a statically-linked binary is. You pass the -static flag to GCC when you compile the hello.c file. In a very generic overview, the linker that gcc invokes at the end of the compilation phase takes the hello.o file and resolves every reference to the external and local functions it uses (the pre-processing phase only pulled in their declarations). To do this, the linker needs what is known as the Section View of the binary: a convenient organization of the binary that specifies which data/code goes where and, more importantly, lets the linker resolve the functions/variables in the code, i.e. associate each variable with its value and each function with its code.
Picture a big program with a number of X-file.c files that #include both "local-header.h" and <global-header.h>. With the help of the Section View, the linker can associate, for example, a variable X with its value 97, and a call such as add(y, x); with the code of add defined elsewhere. For this purpose there is a section that holds the names of all functions/variables as an array of NULL-terminated strings, typically called .strtab, and another section that holds the symbol entries tying each of those names to its associated code/value, typically called .symtab. Alongside them are the relocation sections (in ELF these are named .rel.<section> or .rela.<section>, e.g. .rela.text). Each relocation entry specifies an address that needs patching and how to apply the patch. For a statically linked binary this relocation information generally ends up in neither the file-image nor the memory-image, since the linker consumes it from the object files and resolves all relocations itself.
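To make the .strtab/.symtab relationship concrete, here is a minimal Python sketch. The two tables are hand-built for illustration (they are not taken from a real binary), and the helper names read_cstring and symbol_names are made up for this example; the only thing taken from the ELF format is the 24-byte little-endian Elf64_Sym layout.

```python
import struct

# Elf64_Sym, little-endian: st_name (u32), st_info (u8), st_other (u8),
# st_shndx (u16), st_value (u64), st_size (u64) -> 24 bytes per entry.
ELF64_SYM = struct.Struct("<IBBHQQ")


def read_cstring(table: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in a string table."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("ascii")


def symbol_names(symtab: bytes, strtab: bytes):
    """Yield (name, value) for every entry of a raw symbol table blob."""
    for st_name, _info, _other, _shndx, st_value, _size in ELF64_SYM.iter_unpack(symtab):
        # st_name is a byte offset into the string table, not an entry index.
        yield read_cstring(strtab, st_name), st_value


# Hypothetical tables: a string table and three symbol entries pointing into it.
strtab = b"\x00main\x00add\x00"
symtab = (
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0)                # entry 0 is the null symbol
    + ELF64_SYM.pack(1, 0x12, 0, 1, 0x401000, 32)   # name offset 1 -> "main"
    + ELF64_SYM.pack(6, 0x12, 0, 1, 0x401020, 16)   # name offset 6 -> "add"
)

symbols = list(symbol_names(symtab, strtab))
for name, value in symbols:
    print(f"{value:#010x}  {name!r}")
```

The symbol table never stores a name itself, only an offset into the string table, which is why the two sections always travel together.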
Other sections you've come across are .rodata, .text, and .bss. Several factors separate these sections from each other, but for now think about what memory permissions each blob of code/data needs and what uses it.
=> Case Two: Dynamically-linked binary: LOAD_TIME
We can choose to dynamically link the binary at load-time (normal dynamic linking) or at run-time (lazy binding). The difference lies in run-time performance. Linking at load-time is self-explanatory: all symbols and references are resolved once the program is loaded into memory, which takes time up front, but then the program runs without further resolution work. Linking at run-time defers each function relocation until the first call to the unresolved symbol, so loading the binary costs nothing extra, but the first call to each function is slower than with load-time linking. Linux uses lazy linking/binding by default, but when the LD_BIND_NOW environment variable is set, the dynamic linker is forced to perform all relocations right away.
Back on track: the dynamic linker needs to do the same job that the linker did at build time when we statically linked the binary, so it too needs a way to map symbols to strings. In this case, though, the dynamic linker refers to other sections: .dynsym and .dynstr, along with .rel.dyn (.rela.dyn on x86-64), which contains relocation information, and a special section, .dynamic, that is a roadmap for the dynamic linker and the OS loader to load and set up the binary for execution.
Note that debuggers need .strtab and .symtab for debugging symbols, but binaries are mostly stripped. .dynstr and .dynsym, on the other hand, are never stripped, because the dynamic linker needs them at run-time. The distinction between the two pairs is what lets tools such as strip know which symbol/string tables can be removed and which cannot.
=> Case Three: Executing the binary: EXECUTION_TIME
The typical scenario is that the OS loader grabs the binary's file-image and maps it into memory. But how? It needs a map to do so. To be more precise, the OS loader works along with the dynamic linker to set up the binary: the OS loader is what maps the dynamic linker into memory and starts it at load-time, before the program itself runs. Anyhow, let's stick to the mechanism with which this setup is done. For the OS loader and the dynamic linker to map the binary's file-image into memory, they need a memory layout. The Segment View provides this generic layout of the binary: what are the memory permissions of this chunk, and where should it be mapped? Viewed this way, the Sections the linker needed for its linking and relocations are contained within bigger Segments that are mapped into memory; later we will see that not all sections get mapped into memory, as with the relocation sections for statically linked binaries. For example, take a segment of type PT_LOAD: simply put, that is a segment that gets loaded into memory. Such a segment can have Read-Execute (RE) flags, which signify code; this type of segment will almost certainly contain the .text, .plt, .init, and .fini sections, which all contain code. Another PT_LOAD segment with Read-Write (RW) flags contains .bss, .data, and others.
Long story short, as we will see next, the Segment View is needed by the OS loader and the dynamic linker to load the binary and set it up for execution, while the Section View is needed by the linker to perform relocations, and not all sections get mapped into memory.
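As a rough illustration of "sections live inside segments", the following Python sketch checks which segment (if any) covers each section's address range. Every address, size, and name here is a hypothetical value picked for the example, and containing_segment is not a real tool's API; it only mirrors the containment idea described above.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    flags: str   # memory permissions, e.g. "RE" or "RW"
    vaddr: int   # where the segment starts in memory
    memsz: int   # size of the segment in memory


class Section(NamedTuple):
    name: str
    addr: int    # address in memory (0 if the section is not loaded)
    size: int
    alloc: bool  # True if the section is meant to be loaded into memory


def containing_segment(sec: Section, segments: Sequence[Segment]) -> Optional[Segment]:
    """Return the segment whose address range fully covers `sec`, or None."""
    if not sec.alloc:
        return None  # sections that are not loaded belong to no segment
    for seg in segments:
        if seg.vaddr <= sec.addr and sec.addr + sec.size <= seg.vaddr + seg.memsz:
            return seg
    return None


# Hypothetical layout: one code segment and one data segment.
segments = [Segment("RE", 0x401000, 0x1000), Segment("RW", 0x403000, 0x800)]
sections = [
    Section(".text", 0x401040, 0x200, True),
    Section(".data", 0x403010, 0x20, True),
    Section(".bss", 0x403400, 0x100, True),
    Section(".symtab", 0, 0x300, False),
]

layout = {}
for sec in sections:
    seg = containing_segment(sec, segments)
    layout[sec.name] = seg.flags if seg else None
print(layout)
```

On a real binary, readelf -l prints a comparable section-to-segment mapping.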
ELF-Executable Headers -> Ehdr
The Elf64_Ehdr structure starts at offset 0 and serves as a map to the rest of the file. It records the ELF type, architecture information, the entry point address where execution begins, offsets to the other header tables, and other related information.
The entries we care about are:
-> e_ident;
Identification bytes starting with the magic number (0x7f 'E' 'L' 'F'), required for the OS loader to process the file as an ELF file.
-> e_type;
ELF type
ET_NONE(0)
-> An unknown type.
ET_REL(1)
-> A relocatable file: an object file (.o) that has not been linked yet.
ET_EXEC(2)
-> An executable file.
ET_DYN(3)
-> A shared object: position-independent code (PIC), i.e. a shared library (.so) or a position-independent executable (PIE).
ET_CORE(4)
-> A core file: a core dump written when a program crashes, e.g. on SIGSEGV.
-> e_entry;
Virtual address (VA) where execution starts. This is not really main() but rather _start, which is initialization code linked into every binary compiled with GCC. This is the real entry point of the program; unlike Windows, Linux does not have TLS_Callbacks.
-> e_phoff;
File offset to the program header table => Phdr
-> e_shoff;
File offset to the section header table => Shdr
-> e_phnum;
number of entries in program header table, in other words, number of program headers.
-> e_shnum;
number of entries in section header table, in other words, number of section headers.
-> e_shstrndx;
Simply put, it's the index of the section header table entry for the section that contains the section names.
Picture it: the section header table starts at e_shoff and has e_shnum entries; the entry at index e_shstrndx describes .shstrtab, the section that holds the names of all sections.
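The fields above can be pulled out of the first 64 bytes of a file with Python's struct module. This is a minimal sketch that only handles 64-bit little-endian ELF; the sample header is synthetic (its values are hypothetical, not read from a real hello binary), and parse_ehdr is a name made up for this example.

```python
import struct

# Elf64_Ehdr, little-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx -> 64 bytes in total.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)
E_TYPES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def parse_ehdr(data: bytes) -> dict:
    """Decode an Elf64_Ehdr from the start of `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    hdr = dict(zip(FIELDS, EHDR.unpack_from(data, 0)))
    hdr["e_type"] = E_TYPES.get(hdr["e_type"], hex(hdr["e_type"]))
    return hdr


# Synthetic header with hypothetical values, so the example needs no file on disk.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample = EHDR.pack(ident, 2, 62, 1, 0x401000, 64, 0x3000, 0, 64, 56, 4, 64, 10, 9)

hdr = parse_ehdr(sample)
print(hdr["e_type"], hex(hdr["e_entry"]), hdr["e_phnum"], hdr["e_shnum"], hdr["e_shstrndx"])
```

For a real binary, pass the first 64 bytes of the file instead of sample and compare the result with the output of readelf -h.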
ELF Header of a simple Hello binary
Program Segment Headers -> Phdr
The program header table starts at e_phoff and is an array of Elf64_Phdr structures, each describing a particular segment. Program headers, or segments, carry the execution information needed by the OS loader and the dynamic linker: they provide the Segment View of the binary, i.e. the layout of the program on disk and how it should be mapped into memory. I'll use header and segment interchangeably, as both denote the same information.
Structure of a single segment: type Elf64_Phdr
TODO: File Padding not present in linux example
structure members we care about:
-> p_type;
Specifies the segment type:
PT_LOAD:
-> A segment that is mapped into memory; each one is identified by its memory permissions.
.text is a read/execute section, so it's contained in an RE LOAD segment.
.rodata is read-only, so it's contained in an R LOAD segment.
.bss is writable, so it's contained in an RW LOAD segment.
PT_DYNAMIC:
-> A segment that holds dynamic linking information; only present if the executable is dynamically linked.
PT_INTERP:
-> This segment contains a NULL-terminated string: the path of the dynamic linker for this executable. The OS loader maps the dynamic linker into memory along with the LOAD segments and hands control to it first; the dynamic linker then does its job of filling in the pointers to functions that the executable imports from external libraries.
PT_PHDR:
-> Describes the program header table itself: its location and size in bytes.
-> p_offset:
Where this segment's data starts in the file.
-> p_vaddr:
Where in memory this segment starts.
-> p_paddr:
** Physical address; not really used on modern Linux, as all programs get loaded into virtual memory.
-> p_filesz:
The size of this segment in the file image.
-> p_memsz:
The size of this segment in the memory image.
-> p_align:
This field indicates an alignment constraint for segments loaded in memory. If its value is 0 or 1, no particular alignment is needed; otherwise it must be a power of 2.
-> p_flags:
** Memory permissions for the segment, e.g. R, RE, RW.
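The members listed above map directly onto a 56-byte record, so a program header table can be decoded with a few lines of Python. This is a sketch for 64-bit little-endian ELF only; the two entries in the sample table are hypothetical values, and parse_phdrs and flags_to_str are names invented for this example.

```python
import struct

# Elf64_Phdr, little-endian: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align -> 56 bytes per entry.
PHDR = struct.Struct("<IIQQQQQQ")
FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr", "p_paddr",
          "p_filesz", "p_memsz", "p_align")
P_TYPES = {0: "PT_NULL", 1: "PT_LOAD", 2: "PT_DYNAMIC", 3: "PT_INTERP",
           4: "PT_NOTE", 6: "PT_PHDR"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flags_to_str(flags: int) -> str:
    """Render p_flags in the R / W / E notation used in these notes."""
    return (("R" if flags & PF_R else "")
            + ("W" if flags & PF_W else "")
            + ("E" if flags & PF_X else ""))


def parse_phdrs(table: bytes) -> list:
    """Decode every Elf64_Phdr entry in a raw program header table."""
    segments = []
    for raw in PHDR.iter_unpack(table):
        ph = dict(zip(FIELDS, raw))
        ph["p_type"] = P_TYPES.get(ph["p_type"], hex(ph["p_type"]))
        ph["p_flags"] = flags_to_str(ph["p_flags"])
        segments.append(ph)
    return segments


# Hypothetical table: one code segment and one data segment.
table = (
    PHDR.pack(1, PF_R | PF_X, 0x1000, 0x401000, 0x401000, 0x1D5, 0x1D5, 0x1000)
    + PHDR.pack(1, PF_R | PF_W, 0x2E10, 0x403E10, 0x403E10, 0x230, 0x238, 0x1000)
)

segments = parse_phdrs(table)
for ph in segments:
    print(ph["p_type"], ph["p_flags"], hex(ph["p_vaddr"]), hex(ph["p_filesz"]), hex(ph["p_memsz"]))
```

In the second entry p_memsz is larger than p_filesz; the extra bytes exist only in memory, which is how .bss takes up room at run-time without taking up room in the file.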
NOTE the difference between the hello-static and hello-dynamic segments.
Linux enforces memory alignment to the 0x1000 page size, as the OS loader deals with page-sized CHUNKS of memory.
Linux does not enforce file alignment.
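A small Python sketch of what page alignment means for a LOAD segment. It assumes the 0x1000 page size mentioned above; the ELF format asks that p_vaddr and p_offset of a loadable segment be congruent modulo p_align, and the loader maps whole pages around the segment. The function names and the sample numbers are hypothetical.

```python
PAGE_SIZE = 0x1000  # page size assumed in the text


def offsets_congruent(p_offset: int, p_vaddr: int, p_align: int) -> bool:
    """Check that p_vaddr and p_offset agree modulo p_align."""
    if p_align in (0, 1):
        return True  # no alignment constraint
    return p_offset % p_align == p_vaddr % p_align


def mapped_page_range(p_vaddr: int, p_memsz: int, page: int = PAGE_SIZE):
    """Return the page-aligned [start, end) range that covers the segment."""
    start = p_vaddr & ~(page - 1)                       # round down to a page
    end = (p_vaddr + p_memsz + page - 1) & ~(page - 1)  # round up to a page
    return start, end


# Hypothetical RW segment: starts mid-page and spills into the next page.
start, end = mapped_page_range(0x403E10, 0x238)
print(hex(start), hex(end))
print(offsets_congruent(0x2E10, 0x403E10, 0x1000))
```

The segment itself is only 0x238 bytes long, but because it starts at 0x403e10 and crosses a page boundary, two full pages get mapped for it.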
Section Header -> Shdr
The section header table starts at e_shoff and is an array of Elf64_Shdr structures, each describing a specific section. The section header table provides the Section View that the linker uses at link-time and with dynamic linking. Each section is a blob of code/data with no particular structure, except for special sections; in fact, the structure of a section depends on its content.
Section headers provide a convenient organization of the binary that is used by the linker and also parsed by static binary analysis tools. If an ELF file doesn't need linking, the section header table is not needed; in that case the Ehdr field e_shoff will be 0.
Structure of a single Section: Elf64_Shdr
elements we care about, all of them ... ( ̚‿̚ )
-> sh_name:
Offset into .shstrtab, an array of NULL-terminated strings holding one name per section header.
-> sh_type:
Section type, which tells the linker and analysis tools how to interpret the section's data.
SHT_NULL
-> The very first entry in the section header table, indicating an empty section.
SHT_PROGBITS
-> A section that contains either code or data.
SHT_NOBITS
-> For the .bss section, which contains uninitialized data.
SHT_DYNAMIC
-> A section holding dynamic linking information.
SHT_DYNSYM
-> Contains the symbol table for the dynamic linker.
SHT_SYMTAB
-> Contains the symbol table for static linking.
SHT_STRTAB
-> A string table.
SHT_REL, SHT_RELA
-> Relocation information.
-> sh_flags:
Additional information about the section.
SHF_WRITE
-> A section that is writable at run-time, e.g. .bss, .got.plt
SHF_ALLOC
-> A section that gets loaded into memory; sections without this flag don't end up in the memory-image.
SHF_EXECINSTR
-> A section that contains executable instructions.
-> sh_addr:
If this section ends up in the memory-image of the binary, this field holds the address at which the section's first byte should reside in memory.
-> sh_offset:
File offset from the beginning of the file to the first byte of this section.
-> sh_size:
Size of the section in bytes. A section of type SHT_NOBITS (the .bss section) can have a non-zero size here but occupies no space in the file.
-> sh_link:
Index of a related section header; this information is needed by the linker. Some sections have a relationship, and the linker needs to know the related section headers for linking purposes. E.g. SHT_SYMTAB and SHT_DYNSYM sections both have an associated SHT_STRTAB section header that contains the names of the symbols in question.
-> sh_info:
Extra information whose meaning depends on the section type.
-> sh_addralign:
Like p_align, defines an alignment constraint, if any; its value must be a power of 2.
-> sh_entsize:
Some special sections have a well-defined structure (such as arrays of Elf64_Rel or Elf64_Sym) with fixed-size entries; this field contains the size of one entry.
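Here is a minimal Python sketch that decodes one Elf64_Shdr entry and names its type and flags. It assumes 64-bit little-endian ELF; the .bss-like header it decodes is built by hand from hypothetical values, and parse_shdr and the extra size_in_file key are inventions of this example, not part of the ELF format.

```python
import struct

# Elf64_Shdr, little-endian: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
# sh_size, sh_link, sh_info, sh_addralign, sh_entsize -> 64 bytes per entry.
SHDR = struct.Struct("<IIQQQQIIQQ")
FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
          "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")
SH_TYPES = {0: "SHT_NULL", 1: "SHT_PROGBITS", 2: "SHT_SYMTAB", 3: "SHT_STRTAB",
            4: "SHT_RELA", 6: "SHT_DYNAMIC", 8: "SHT_NOBITS", 9: "SHT_REL",
            11: "SHT_DYNSYM"}
SH_FLAGS = {0x1: "SHF_WRITE", 0x2: "SHF_ALLOC", 0x4: "SHF_EXECINSTR"}


def parse_shdr(raw: bytes) -> dict:
    """Decode a single 64-byte Elf64_Shdr entry."""
    sh = dict(zip(FIELDS, SHDR.unpack(raw)))
    sh["sh_type"] = SH_TYPES.get(sh["sh_type"], hex(sh["sh_type"]))
    sh["sh_flags"] = [name for bit, name in SH_FLAGS.items() if sh["sh_flags"] & bit]
    # Helper key for this example: SHT_NOBITS sections take no space in the file.
    sh["size_in_file"] = 0 if sh["sh_type"] == "SHT_NOBITS" else sh["sh_size"]
    return sh


# Hypothetical .bss-like header: writable, loaded, 0x20 bytes in memory only.
bss = parse_shdr(SHDR.pack(11, 8, 0x3, 0x404040, 0x3040, 0x20, 0, 0, 32, 0))
print(bss["sh_type"], bss["sh_flags"], hex(bss["sh_size"]), bss["size_in_file"])
```

Whether a section needs memory is decided by SHF_ALLOC, while whether it needs file space is decided by its type, so the two questions are answered by different fields.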
How to get a section name: e_shstrndx gives us the index of the .shstrtab section header, and each section's sh_name gives us an offset into that section header string table.
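The lookup just described can be sketched in Python as follows. To stay self-contained, the example builds a tiny in-memory image (an ELF header, a .shstrtab blob, and three section headers) from hypothetical values rather than reading a real file; section_names is a name made up for this sketch, and it assumes 64-bit little-endian ELF.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, 64 bytes


def section_names(image: bytes) -> list:
    """Return the name of every section, in section header table order."""
    ehdr = EHDR.unpack_from(image, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]

    # Step 1: entry e_shstrndx of the table describes the .shstrtab section.
    strhdr = SHDR.unpack_from(image, e_shoff + e_shstrndx * e_shentsize)
    str_offset, str_size = strhdr[4], strhdr[5]  # sh_offset, sh_size
    shstrtab = image[str_offset:str_offset + str_size]

    # Step 2: each header's sh_name is a byte offset into .shstrtab.
    names = []
    for i in range(e_shnum):
        sh_name = SHDR.unpack_from(image, e_shoff + i * e_shentsize)[0]
        end = shstrtab.index(b"\x00", sh_name)
        names.append(shstrtab[sh_name:end].decode("ascii"))
    return names


# Hypothetical image: header | .shstrtab | padding | 3 section headers.
shstrtab = b"\x00.text\x00.shstrtab\x00"             # 17 bytes, stored at offset 64
shoff = 88                                           # 64 + 17, padded up to 88
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
ehdr = EHDR.pack(ident, 1, 62, 1, 0, 0, shoff, 0, 64, 0, 0, 64, 3, 2)
shdrs = (
    SHDR.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)          # index 0: SHT_NULL
    + SHDR.pack(1, 1, 0x6, 0, 0, 0, 0, 0, 16, 0)     # index 1: .text (empty here)
    + SHDR.pack(7, 3, 0, 0, 64, 17, 0, 0, 1, 0)      # index 2: .shstrtab
)
image = ehdr + shstrtab + bytes(7) + shdrs

names = section_names(image)
print(names)
```

The first name is the empty string because the SHT_NULL entry has sh_name 0, and offset 0 of a string table is always a lone NULL byte.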
Sections we care about:
-> null: The first section is always an SHT_NULL section.
-> .init: Initialization code run before control is handed to the program's main entry point.
-> .fini: Termination code run when the program exits normally.
-> .dynamic