Dissecting an ELF File

WIP

Tools needed

readelf, hexedit, ldd, objdump, gdb

Stages of Program Execution

Binaries are linked at different stages, as we now know. For the purposes of what's coming ahead, I'll start with a very high-level overview of what the binary looks like and what needs to be loaded and set up in memory for execution.

=> Case One: Statically-linked binary: `LINK_TIME`

Even a simple Hello world! program pulls in a good deal of external code from the C library. You can create an executable in which all of that code is linked in at build time; this is what a statically-linked binary is. You pass the -static flag to GCC when you compile hello.c. In a very generic overview, gcc invokes the linker as the last step of the build; the linker takes hello.o together with the library objects it depends on and resolves every reference to a function or variable, whether it is defined locally or externally. To do this, the linker needs what's known as the Section View of the binary: a convenient organization of the object file that specifies which code/data goes where and, more importantly, lets the linker resolve symbols, i.e. associate each variable name with its storage and each function name with its code.

Picture a big program made of a number of X-file.c files that #include both "local-header.h" and <global-header.h>. With the help of the Section View, the linker can replace, for example, a reference to a variable x with the address where x (holding, say, the value 97) is stored, and a call such as add(y, x); with the address of the code for add defined elsewhere. For this purpose there is a section that holds names, an array of NULL-terminated strings for all function/variable names, typically called .strtab, and another section that holds the symbol entries tying each name to its address and size, typically called .symtab. Alongside them are the relocation sections, loosely called .reloc here (in ELF they are actually named .rel.* or .rela.*, e.g. .rela.text): each entry gives the address of a place that needs patching and instructions on how to apply the patch. For the most part these relocation sections end up in neither the file image nor the memory image of a statically-linked binary, since the linker consumes them from the object files and resolves the relocations itself.
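The relationship between .symtab and .strtab can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the string table and the two symbol entries are made-up data, and the helper names `name_at` and `read_symbols` are hypothetical. Only the Elf64_Sym field layout (st_name, st_info, st_other, st_shndx, st_value, st_size) comes from the ELF format itself.

```python
import struct

# Elf64_Sym layout: st_name (u32), st_info (u8), st_other (u8),
# st_shndx (u16), st_value (u64), st_size (u64) = 24 bytes, little endian.
ELF64_SYM = struct.Struct("<IBBHQQ")


def name_at(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in a string table."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


def read_symbols(symtab: bytes, strtab: bytes):
    """Yield (name, value, size) for every entry of a raw symbol table."""
    for st_name, _info, _other, _shndx, st_value, st_size in ELF64_SYM.iter_unpack(symtab):
        yield name_at(strtab, st_name), st_value, st_size


# Made-up data: a tiny string table and two symbol entries that point into it.
strtab = b"\x00main\x00add\x00"
symtab = (
    ELF64_SYM.pack(1, 0x12, 0, 1, 0x401136, 0x20)    # st_name=1 -> "main"
    + ELF64_SYM.pack(6, 0x12, 0, 1, 0x401126, 0x10)  # st_name=6 -> "add"
)
symbols = list(read_symbols(symtab, strtab))

if __name__ == "__main__":
    for name, value, size in symbols:
        print(f"{name:<6} addr=0x{value:x} size={size}")
```

The point of the split is that a symbol entry stays fixed-size: it stores only an offset, and the variable-length name lives in the string table.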

Other sections you've come across are .rodata, .text and .bss. Several factors separate these sections from each other, but for now think about what memory permissions a given blob of code/data needs, and what uses it.

~$ gcc -no-pie -static hello.c -o hello-static
/* Recent GCC versions build a PIE (position-independent executable) by default; use -no-pie to get a conventional, fixed-address executable.

file output for a default (PIE) build:
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4a9dc1c471da58c90d25934a1ffbafa5836404b5, for GNU/Linux 3.2.0, not stripped */

~$ file hello-static
hello-static: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=3a7d527c20706a085b3e3145161dc3fec289b36a, for GNU/Linux 3.2.0, not stripped

=> Case Two: Dynamically-linked binary: `LOAD_TIME`

We can choose to dynamically link the binary at load time (normal, eager binding) or at run time (lazy binding). The difference lies in run-time performance. Linking at load time is self-explanatory: all symbols and references are resolved once the program is loaded into memory, which takes time at start-up, but the program then runs without further linking work. The other option, linking at run time, postpones each function relocation until the first call to the unresolved symbol; loading is faster, but the first call to each function pays the lookup cost. On Linux the dynamic linker binds lazily by default, but setting the LD_BIND_NOW environment variable (or linking with -z now) forces it to perform all relocations right away.
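The trade-off between the two binding modes can be mimicked with a toy model. This is only an analogy in Python, not how the dynamic linker is implemented: `LIBRARY`, `resolve`, `make_lazy_stub` and `bind_now` are hypothetical names, and a dictionary stands in for the table of resolved addresses.

```python
# Toy model only: a dict stands in for a shared library, another dict for
# the table of already-resolved function addresses.
LIBRARY = {"puts": lambda s: f"puts({s})"}
lookups = []  # records every (expensive) symbol resolution


def resolve(name):
    """The expensive step: look the symbol up and remember that we did."""
    lookups.append(name)
    return LIBRARY[name]


def make_lazy_stub(name, table):
    """Return a stub that resolves `name` on its first call only."""
    def stub(*args):
        if name not in table:         # first call: slot not filled yet
            table[name] = resolve(name)
        return table[name](*args)     # later calls go straight through
    return stub


def bind_now(names):
    """Eager binding: resolve every symbol before the program runs."""
    return {name: resolve(name) for name in names}


lazy_table = {}
puts = make_lazy_stub("puts", lazy_table)

if __name__ == "__main__":
    print(puts("hello"), puts("again"), lookups)
```

With the lazy stub nothing is resolved until `puts` is first called, and the second call reuses the cached entry; `bind_now` pays for every lookup up front whether or not the function is ever called.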

Back on track: the dynamic linker needs to do the same job the linker did at build time when we statically linked the binary, so it needs the same kind of information to map symbols to strings. In this case it refers to the sections .dynsym and .dynstr, along with .rela.dyn and .rela.plt (.rel.dyn and .rel.plt on some architectures), which contain relocation information, and a special section .dynamic that is a roadmap for the dynamic linker and the OS loader to load and set up the binary for execution.

~$ gcc -no-pie hello.c -o hello-dynamic

~$ file hello-dynamic 
hello-dynamic: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1e0b60ec718684de1c3c63854bcd4c4eb2659874, for GNU/Linux 3.2.0, not stripped

Note that debuggers need .strtab and .symtab to show symbol names, but these are removed when a binary is stripped, which is common. .dynstr and .dynsym, on the other hand, are needed at run time and are left in place by strip. The distinction between the two sets tells tools which symbol/string tables can safely be stripped and which cannot.

=> Case Three: Executing the binary: `EXECUTION_TIME`

The typical scenario is that the OS loader grabs the binary's file image and maps it into memory, but how? It needs a map to do so. More precisely, the OS loader works together with the dynamic linker to set the binary up: the OS loader is what maps the dynamic linker into memory and starts it at load time. Anyhow, let's stick to the mechanism by which this setup is done.

To map the file image into memory, the OS loader and the dynamic linker need a memory layout. The Segment View provides it: a generic layout of the binary that answers questions such as what the memory permissions of this chunk are and where it should be mapped. The sections the linker needed for linking and relocation are contained within bigger segments that are mapped into memory; later we will see that not all sections get mapped, for example the relocation sections (.reloc) of statically linked binaries.

For example, take a segment of type PT_LOAD; simply put, that is a segment that gets loaded into memory. Such a segment can have Read-Execute (RE) flags, which signifies code, and it will most certainly contain the .text, .plt, .init and .fini sections, which all contain code. Another PT_LOAD segment with Read-Write (RW) flags contains .bss, .data and others.

Long story short, as we will see next, the Segment View is needed by the OS loader and the dynamic linker to load the binary and set it up for execution, while the Section View is needed by the linker to perform relocation; not all sections get mapped into memory.
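To make the Segment View concrete, here is a small sketch that answers two questions a loader-like tool has to answer: which LOAD segment contains a given virtual address, and where that address lives in the file. The segment values are copied from the readelf -l hello-dynamic output shown later on this page; the function names `find_segment` and `vaddr_to_offset` are hypothetical.

```python
# (p_offset, p_vaddr, p_filesz, p_memsz, flags) for the four LOAD segments
# of hello-dynamic, as printed by readelf -l.
LOADS = [
    (0x0000, 0x400000, 0x4D0, 0x4D0, "R"),
    (0x1000, 0x401000, 0x1E5, 0x1E5, "RE"),
    (0x2000, 0x402000, 0x158, 0x158, "R"),
    (0x2E10, 0x403E10, 0x220, 0x228, "RW"),
]


def find_segment(vaddr):
    """Return the LOAD segment whose memory range contains `vaddr`, or None."""
    for seg in LOADS:
        _offset, base, _filesz, memsz, _flags = seg
        if base <= vaddr < base + memsz:
            return seg
    return None


def vaddr_to_offset(vaddr):
    """Translate a virtual address to a file offset.

    Returns None when the address is unmapped or lies in the zero-filled
    tail of a segment (memory beyond p_filesz has no bytes in the file).
    """
    seg = find_segment(vaddr)
    if seg is None:
        return None
    offset, base, filesz, _memsz, _flags = seg
    delta = vaddr - base
    return offset + delta if delta < filesz else None


if __name__ == "__main__":
    entry = 0x401050  # entry point of hello-dynamic
    print(find_segment(entry)[4], hex(vaddr_to_offset(entry)))
```

For the hello-dynamic entry point 0x401050 this lands in the RE segment, at file offset 0x1050.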

ELF-Executable Headers -> `Ehdr`

The Elf64_Ehdr structure starts at offset 0 and serves as a map to the rest of the file. It records the ELF type, architecture information, the entry point address where execution begins, offsets to the other header tables, and other related information.


  // ELF header (Ehdr)
    //   The ELF header is described by the type Elf32_Ehdr or Elf64_Ehdr:

           #define EI_NIDENT 16
           typedef struct {
               unsigned char e_ident[EI_NIDENT];
               uint16_t      e_type;
               uint16_t      e_machine;
               uint32_t      e_version;
               ElfN_Addr     e_entry;
               ElfN_Off      e_phoff;
               ElfN_Off      e_shoff;
               uint32_t      e_flags;
               uint16_t      e_ehsize;
               uint16_t      e_phentsize;
               uint16_t      e_phnum;
               uint16_t      e_shentsize;
               uint16_t      e_shnum;
               uint16_t      e_shstrndx;
           } ElfN_Ehdr;

the entries we care about are:

-> e_ident; identification bytes. The first four are the magic number, required for the OS loader to process the file as an ELF file.

00000000   7F 45 4C 46   :   .ELF

-> e_type; ELF type

ET_NONE(0) -> An unknown type.

ET_REL(1) -> A relocatable file: an object file (.o) that has not been linked yet.

ET_EXEC(2) -> An executable file.

ET_DYN(3) -> A shared object: a shared library (.so) or a position-independent executable (PIE).

ET_CORE(4) -> A core file: a core dump, e.g. from a program crash on SIGSEGV.

-> e_entry; virtual address (VA) where execution starts. This is not really main() but rather _start, initialization code from the C runtime start-up files that GCC links into every ordinary binary.

This is the real entry point of the program. Unlike Windows, Linux has no TLS callbacks that run before it, although in a dynamically linked binary the dynamic linker itself runs before control reaches _start.

-> e_phoff; file offset to the program header table => Phdr

-> e_shoff; file offset to the section header table => Shdr

-> e_phnum; number of entries in program header table, in other words, number of program headers.

-> e_shnum; number of entries in section header table, in other words, number of section headers.

-> e_shstrndx; simply put, the index of the section header table entry that describes the section holding the section names, .shstrtab.

Picture it: the section header table starts at e_shoff and has e_shnum entries; the entry at index e_shstrndx describes .shstrtab, the section that contains the names of all sections.

ELF Header of a simple Hello binary


~$ readelf -h hello-static

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - GNU
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401bc0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          860912 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         10
  Size of section headers:           64 (bytes)
  Number of section headers:         32
  Section header string table index: 31


~$ readelf -h hello-dynamic

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401050
  Start of program headers:          64 (bytes into file)
  Start of section headers:          14648 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
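The fields readelf prints above can also be decoded by hand. The following is a minimal sketch using Python's struct module; it handles only little-endian ELF64, and instead of opening a file it rebuilds a 64-byte header from the hello-static values shown above. The names `parse_ehdr`, `FIELDS` and `ET_NAMES` are illustrative, not from any library.

```python
import struct

# Elf64_Ehdr: e_ident[16] followed by the fixed-width fields, in declaration order.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
          "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
          "e_shentsize", "e_shnum", "e_shstrndx")
ET_NAMES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_ehdr(data: bytes) -> dict:
    """Decode the first 64 bytes of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == LSB
        raise ValueError("this sketch handles only little-endian ELF64")
    return dict(zip(FIELDS, ELF64_EHDR.unpack_from(data)))


# Header rebuilt from the hello-static readelf output (62 is EM_X86_64).
ident = bytes.fromhex("7f454c46020101030000000000000000")
sample = ELF64_EHDR.pack(ident, 2, 62, 1, 0x401BC0, 64, 860912, 0,
                         64, 56, 10, 64, 32, 31)
hdr = parse_ehdr(sample)

if __name__ == "__main__":
    print(ET_NAMES[hdr["e_type"]], hex(hdr["e_entry"]), hdr["e_shstrndx"])
```

To try it on a real binary, the same function should work on the first 64 bytes read from the file, e.g. `parse_ehdr(open("hello-static", "rb").read(64))`.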

Program `Segment` Headers -> `Phdr`

The program header table starts at e_phoff and is an array of Elf64_Phdr structures, each describing a particular segment. Program headers, or segments, hold the execution information the OS loader and the dynamic linker need: they provide the Segment View of the binary, i.e. the layout of the program on disk and how it should be mapped into memory. I'll use header and segment interchangeably, as both denote the same information.

Structure of a single segment: type Elf64_Phdr

TODO: File Padding not present in linux example

 typedef struct {
               uint32_t   p_type;
               uint32_t   p_flags;
               Elf64_Off  p_offset;
               Elf64_Addr p_vaddr;
               Elf64_Addr p_paddr;
               uint64_t   p_filesz;
               uint64_t   p_memsz;
               uint64_t   p_align;
           } Elf64_Phdr;

structure members we care about:

-> p_type; specifies SEGMENT type:

PT_LOAD: -> a segment that is mapped into memory; each one carries its own memory permissions.

.text is a read/execute section, so it's contained in an RE LOAD segment.

.rodata is read-only, thus it's contained in an R LOAD segment.

.bss is writable, thus it's contained in an RW LOAD segment.

PT_DYNAMIC: -> a segment that holds dynamic linking information; only present if the executable is dynamically linked.

PT_INTERP: -> this segment contains a NULL-terminated string: the path of the dynamic linker (interpreter) for this executable. The OS loader reads this path, maps the LOAD segments and the dynamic linker it names into memory, and then the dynamic linker does its job of filling in the function pointers the executable imports from external libraries.

PT_PHDR: -> describes the program header table itself: its location and size in bytes.

-> p_offset: where this segment's data starts in the file.

-> p_vaddr: where in memory this segment starts.

-> p_paddr: physical address; not really relevant on modern Linux systems, as all programs get loaded into virtual memory.

-> p_filesz: the size of this segment in the file image.

-> p_memsz: the size of this segment in the memory image.

-> p_align: this field indicates an alignment constraint for segments loaded in memory. If its value is 0 or 1, no particular alignment is needed; otherwise it must be a power of 2.

-> p_flags: memory permissions for the segment, e.g. R, RE, RW.
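A single program header can be decoded the same way as the ELF header. The sketch below packs and then parses one Elf64_Phdr using the values of the RW LOAD segment from the hello-dynamic listing on this page; `parse_phdr` and `flags_str` are hypothetical helper names. It also shows why p_filesz and p_memsz can differ: the extra bytes in memory are zero-filled (this is where .bss lives).

```python
import struct

# Elf64_Phdr: p_type, p_flags (u32 each), then p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (u64 each) = 56 bytes, little endian.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
KEYS = ("p_type", "p_flags", "p_offset", "p_vaddr", "p_paddr",
        "p_filesz", "p_memsz", "p_align")
PT_NAMES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR"}
PF_X, PF_W, PF_R = 1, 2, 4


def flags_str(p_flags: int) -> str:
    """Render permission bits the way readelf does, e.g. 'R E' or 'RW '."""
    return "".join(ch if p_flags & bit else " "
                   for ch, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X)))


def parse_phdr(raw: bytes) -> dict:
    """Decode one 56-byte program header."""
    return dict(zip(KEYS, ELF64_PHDR.unpack(raw)))


# The RW LOAD segment of hello-dynamic, taken from the readelf -l output.
raw = ELF64_PHDR.pack(1, PF_R | PF_W, 0x2E10, 0x403E10, 0x403E10,
                      0x220, 0x228, 0x1000)
seg = parse_phdr(raw)
zero_fill = seg["p_memsz"] - seg["p_filesz"]  # bytes present only in memory

if __name__ == "__main__":
    print(PT_NAMES[seg["p_type"]], flags_str(seg["p_flags"]), zero_fill)
```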

NOTE the difference between hello-static and hello-dynamic segments.

~$ readelf -l hello-dynamic 

Elf file type is EXEC (Executable file)
Entry point 0x401050
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000400318 0x0000000000400318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000004d0 0x00000000000004d0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x00000000000001e5 0x00000000000001e5  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000402000 0x0000000000402000
                 0x0000000000000158 0x0000000000000158  R      0x1000
  LOAD           0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                 0x0000000000000220 0x0000000000000228  RW     0x1000
  DYNAMIC        0x0000000000002e20 0x0000000000403e20 0x0000000000403e20
                 0x00000000000001d0 0x00000000000001d0  RW     0x8
  NOTE           0x0000000000000338 0x0000000000400338 0x0000000000400338
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x0000000000000358 0x0000000000400358 0x0000000000400358
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000400338 0x0000000000400338
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_EH_FRAME   0x0000000000002014 0x0000000000402014 0x0000000000402014
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                 0x00000000000001f0 0x00000000000001f0  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 
   03     .init .plt .plt.sec .text .fini 
   04     .rodata .eh_frame_hdr .eh_frame 
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss 
   06     .dynamic 
   07     .note.gnu.property 
   08     .note.gnu.build-id .note.ABI-tag 
   09     .note.gnu.property 
   10     .eh_frame_hdr 
   11     
   12     .init_array .fini_array .dynamic .got

~$ readelf -l hello-static 

Elf file type is EXEC (Executable file)
Entry point 0x401bc0
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000518 0x0000000000000518  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000092db5 0x0000000000092db5  R E    0x1000
  LOAD           0x0000000000094000 0x0000000000494000 0x0000000000494000
                 0x0000000000025315 0x0000000000025315  R      0x1000
  LOAD           0x00000000000ba140 0x00000000004bb140 0x00000000004bb140
                 0x00000000000050d0 0x0000000000006880  RW     0x1000
  NOTE           0x0000000000000270 0x0000000000400270 0x0000000000400270
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x0000000000000290 0x0000000000400290 0x0000000000400290
                 0x0000000000000044 0x0000000000000044  R      0x4
  TLS            0x00000000000ba140 0x00000000004bb140 0x00000000004bb140
                 0x0000000000000020 0x0000000000000060  R      0x8
  GNU_PROPERTY   0x0000000000000270 0x0000000000400270 0x0000000000400270
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x00000000000ba140 0x00000000004bb140 0x00000000004bb140
                 0x0000000000002ec0 0x0000000000002ec0  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.property .note.gnu.build-id .note.ABI-tag .rela.plt 
   01     .init .plt .text __libc_freeres_fn .fini 
   02     .rodata .stapsdt.base .eh_frame .gcc_except_table 
   03     .tdata .init_array .fini_array .data.rel.ro .got .got.plt .data __libc_subfreeres __libc_IO_vtables __libc_atexit .bss __libc_freeres_ptrs 
   04     .note.gnu.property 
   05     .note.gnu.build-id .note.ABI-tag 
   06     .tdata .tbss 
   07     .note.gnu.property 
   08     
   09     .tdata .init_array .fini_array .data.rel.ro .got

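Linux maps segments at page granularity, which is why the LOAD segments have an alignment of 0x1000 (the 4 KiB page size on x86-64): the OS loader deals with memory in page-sized CHUNKS.

Linux does not force the file offsets themselves to be page-aligned; a segment's p_offset only has to be congruent to its p_vaddr modulo p_align.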
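The alignment rule can be checked against the LOAD segments of hello-dynamic listed above. This is a small sketch with hypothetical helper names (`is_congruent`, `mapped_range`): it verifies that p_offset and p_vaddr agree modulo p_align, and computes the page-aligned memory range that has to be mapped to cover a segment.

```python
PAGE = 0x1000  # page size assumed here: 4 KiB, as on x86-64 Linux

# (p_offset, p_vaddr, p_memsz, p_align) of the hello-dynamic LOAD segments.
LOADS = [
    (0x0000, 0x400000, 0x4D0, 0x1000),
    (0x1000, 0x401000, 0x1E5, 0x1000),
    (0x2000, 0x402000, 0x158, 0x1000),
    (0x2E10, 0x403E10, 0x228, 0x1000),
]


def is_congruent(offset: int, vaddr: int, align: int) -> bool:
    """True if p_offset and p_vaddr are equal modulo p_align."""
    if align <= 1:  # 0 or 1 means no alignment constraint
        return True
    return offset % align == vaddr % align


def mapped_range(vaddr: int, memsz: int, page: int = PAGE):
    """Page-aligned [start, end) range that covers the segment in memory."""
    start = vaddr & ~(page - 1)
    end = (vaddr + memsz + page - 1) & ~(page - 1)
    return start, end


if __name__ == "__main__":
    for offset, vaddr, memsz, align in LOADS:
        start, end = mapped_range(vaddr, memsz)
        print(hex(offset), is_congruent(offset, vaddr, align), hex(start), hex(end))
```

Note how the last segment starts at file offset 0x2e10, which is not page-aligned, yet it is congruent to 0x403e10 modulo 0x1000, so the file page and the memory page line up.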

Section Header -> `Shdr`

The section header table starts at e_shoff and is an array of Elf64_Shdr structures, each describing a specific section. It provides the Section View that is used by the linker at link time and in dynamic linking. Each section is a blob of code/data with no particular structure, except for special sections; in fact, the structure of a section depends on its content.

Section headers provide a convenient organization of the binary that is used by the linker and also parsed by static binary analysis tools. An ELF file that doesn't need linking does not need a section header table; if the table is absent, the Ehdr field e_shoff is 0.

Structure of a single Section: Elf64_Shdr


typedef struct {
               uint32_t   sh_name;
               uint32_t   sh_type;
               uint64_t   sh_flags;
               Elf64_Addr sh_addr;
               Elf64_Off  sh_offset;
               uint64_t   sh_size;
               uint32_t   sh_link;
               uint32_t   sh_info;
               uint64_t   sh_addralign;
               uint64_t   sh_entsize;
           } Elf64_Shdr;

elements we care about, all of them ... ( ̚‿̚ )

-> sh_name: offset into .shstrtab, an array of NULL-terminated strings, one per section header.

-> sh_type: section type, which tells the linker and other tools how to interpret the section's contents.

SHT_NULL -> marks an inactive entry with no associated section; the very first entry in the section header table is always of this type.

SHT_PROGBITS -> section that contains either code or data.

SHT_NOBITS -> a section that occupies no space in the file, such as .bss, which holds uninitialized data.

SHT_DYNAMIC -> section holding dynamic linking information.

SHT_DYNSYM -> contains the symbol table for the dynamic linker.

SHT_SYMTAB -> contains the full symbol table used for link-time (static) linking.

SHT_STRTAB -> strings table

SHT_REL, SHT_RELA -> for relocation information

-> sh_flags: additional information about the section

SHF_WRITE -> section that is writable at run time, e.g. .bss, .got.plt

SHF_ALLOC -> section that gets loaded into memory; sections without this flag are not loaded and don't end up in the memory image.

SHF_EXECINSTR -> indicates a section that contains executable instructions.

-> sh_addr: if this section ends up in the memory image of the binary, this field holds the address at which the section's first byte should reside in memory.

-> sh_offset: file offset from the beginning of the file to the first byte of this section

-> sh_size: size of the section in bytes. A section of type SHT_NOBITS, such as .bss, may have a non-zero sh_size yet occupies no space in the file.

-> sh_link: index of a related section header; this information is needed by the linker. Some sections are related, and the linker needs to know the related section header for linking purposes. E.g. a SHT_SYMTAB or SHT_DYNSYM section has an associated SHT_STRTAB section header that contains the names of the symbols in question.

-> sh_info: information depending on the section.

-> sh_addralign: like p_align, defines an alignment constraint, if any; its value must be 0, 1, or a power of 2.

-> sh_entsize: some special sections have a well-defined structure (such as arrays of Elf64_Rel or Elf64_Sym) with fixed-size entries; this field contains the size of one entry.

How to get a section name: e_shstrndx gives the index of the .shstrtab section header, and sh_name gives the offset into that section header string table where the name starts.
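The two-step name lookup can be sketched as follows. The "file image" here is made up for illustration (just a string table followed by four section headers, not a valid ELF file), and `section_names` and `cstring` are hypothetical helpers; only the Elf64_Shdr field order comes from the structure above.

```python
import struct

# Elf64_Shdr: sh_name, sh_type (u32), sh_flags, sh_addr, sh_offset, sh_size (u64),
# sh_link, sh_info (u32), sh_addralign, sh_entsize (u64) = 64 bytes.
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_NULL, SHT_PROGBITS, SHT_STRTAB, SHT_NOBITS = 0, 1, 3, 8
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def cstring(table: bytes, offset: int) -> str:
    """Return the NUL-terminated string starting at `offset`."""
    return table[offset:table.index(b"\x00", offset)].decode("ascii")


def section_names(image: bytes, e_shoff: int, e_shnum: int, e_shstrndx: int):
    """Resolve every section's name via the .shstrtab entry at e_shstrndx."""
    headers = [ELF64_SHDR.unpack_from(image, e_shoff + i * ELF64_SHDR.size)
               for i in range(e_shnum)]
    str_off, str_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    shstrtab = image[str_off:str_off + str_size]
    return [cstring(shstrtab, h[0]) for h in headers]  # h[0] is sh_name


# Made-up image: the string table at offset 0, then the section header table.
strings = b"\x00.text\x00.bss\x00.shstrtab\x00"
table = b"".join([
    ELF64_SHDR.pack(0, SHT_NULL, 0, 0, 0, 0, 0, 0, 0, 0),
    ELF64_SHDR.pack(1, SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR,
                    0x401000, 0x1000, 0x100, 0, 0, 16, 0),
    ELF64_SHDR.pack(7, SHT_NOBITS, SHF_ALLOC | SHF_WRITE,
                    0x404000, 0x3000, 8, 0, 0, 1, 0),
    ELF64_SHDR.pack(12, SHT_STRTAB, 0, 0, 0, len(strings), 0, 0, 1, 0),
])
image = strings + table
names = section_names(image, e_shoff=len(strings), e_shnum=4, e_shstrndx=3)

if __name__ == "__main__":
    print(names)
```

The first entry resolves to an empty name because the SHT_NULL header has sh_name 0, and offset 0 of a string table is always a NUL byte.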

Sections we care about:

-> null: the first section header is always SHT_NULL.

-> .init: initialization code run by the start-up code before control is handed to the program's main().

-> .fini: finalization code run when the program terminates normally.

-> .dynamic

Last updated 5 months ago
