Dissecting a PE File
Intro
Operating systems play a key role in reversing. That’s because programs are tightly integrated with operating systems, and plenty of information can be gathered by probing this interface. Moreover, the eventual bottom line of every program is in its communication with the outside world (the program receives user input and outputs data on the screen, writes to a file, and so on), which means that identifying and understanding the bridging points between application programs and the operating system is critical.
Secrets of Reverse Engineering
Core operating-system concepts and the low-level work that happens behind the scenes are required knowledge for reverse engineering, whether the target is a user-land application, a kernel module or driver, or even firmware.
This episode goes through the PE file format, dissecting its headers and highlighting the entries that reveal important information and save a lot of time when hunting for anomalies in malicious binaries and detecting malformed ones.
Basic pseudo-structure of what we need to know about a PE file:
DOS Header
The MS-DOS header is followed by a STUB that prints "This program cannot be run in DOS mode", though this stub can be changed.
IMPORTANT ENTRIES
-> e_magic;
The ASCII characters MZ (0x5A4D), the initials of Mark Zbikowski. If the OS loader doesn't find this signature, it rejects the executable.
-> e_lfanew;
The last member of the DOS header structure (at file offset 0x3C). It contains the file offset of the next header, the NT header.
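To make the two entries concrete, here is a minimal sketch of reading them with Python's struct module. The function name parse_dos_header and the synthetic 64-byte sample are made up for illustration; the offsets 0x00 and 0x3C are the standard positions of e_magic and e_lfanew.

```python
import struct

DOS_MAGIC = 0x5A4D  # "MZ" read as a little-endian 16-bit value


def parse_dos_header(data: bytes) -> int:
    """Return e_lfanew (file offset of the NT headers) or raise ValueError."""
    if len(data) < 0x40:
        raise ValueError("file too small to hold a DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0x00)
    if e_magic != DOS_MAGIC:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew


# Synthetic 64-byte DOS header: "MZ", zero padding, then e_lfanew = 0x80.
sample = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x80)
print(hex(parse_dos_header(sample)))
```

On a real binary you would pass the bytes read from disk instead of the synthetic sample.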
NT Header | PE Header
This structure in its entirety is considered the PE header.
It wraps almost everything the Windows loader needs to process the binary: after the MS-DOS STUB come a standard COFF file header and the Windows-specific PE extension. The structure has three entries, and the last two are structures embedded in it.
IMPORTANT ENTRIES
-> Signature;
The ASCII string PE followed by two null bytes (PE\0\0).
-> _FILE_HEADER{ ... };
The standard COFF file header.
PE is built on top of the COFF file format as an extension of it, and in some resources you'll find this structure referenced as [ File Header | COFF File Header ].
-> _OPTIONAL_HEADER{ ... };
The structure the Windows loader needs to load, set up, and execute a binary image, i.e. an executable.
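The layout of the three entries can be sketched as plain offset arithmetic. This is only an illustration: locate_nt_parts and the zero-filled sample image are hypothetical, and the sketch assumes the File Header is the usual 20 bytes long.

```python
PE_SIGNATURE = b"PE\x00\x00"
FILE_HEADER_SIZE = 20  # size of the COFF file header in bytes


def locate_nt_parts(data: bytes, e_lfanew: int) -> dict:
    """Return the file offsets of the three parts of the NT headers."""
    if data[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        raise ValueError("missing PE signature")
    file_header = e_lfanew + len(PE_SIGNATURE)
    optional_header = file_header + FILE_HEADER_SIZE
    return {
        "signature": e_lfanew,
        "file_header": file_header,
        "optional_header": optional_header,
    }


# Synthetic image: 0x80 bytes of padding, the signature, an empty file header.
image = b"\x00" * 0x80 + PE_SIGNATURE + b"\x00" * FILE_HEADER_SIZE
for name, offset in locate_nt_parts(image, 0x80).items():
    print(f"{name:16} at {offset:#x}")
```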
=> COFF File-Header
This is the generic COFF file header. Other formats can extend it according to their own specification, and the extension lives in a separate header; for Windows that header is called _OPTIONAL_HEADER.
IMPORTANT ENTRIES
-> Machine;
The target CPU architecture: [ x86 (32-bit) -> 0x14C | x64 (64-bit) -> 0x8664 ]
-> NumberOfSections;
The number of section headers that follow the NT headers.
-> TimeDateStamp;
The time the binary was built, set at link time. Tracking this value is a very powerful technique for following malicious binaries, so it is often changed to mislead analysts. It is not the only TimeDateStamp entry in the binary, though: there is a debug TimeDateStamp entry in the _Debug_Directory{ } structure, as we will see later.
-> Characteristics;
Attributes of the object or image file to indicate its type.
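A small sketch of decoding these fields, assuming the standard 20-byte File Header layout. parse_file_header, the dictionary keys, and the sample values are invented for illustration; only the two Machine values from the text are mapped.

```python
import struct
from datetime import datetime, timezone

MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000


def parse_file_header(raw: bytes) -> dict:
    """Decode the 20-byte COFF file header that follows the PE signature."""
    (machine, n_sections, timestamp, _symbol_ptr, _n_symbols,
     optional_size, characteristics) = struct.unpack_from("<HHIIIHH", raw, 0)
    return {
        "machine": MACHINES.get(machine, f"unknown ({machine:#06x})"),
        "sections": n_sections,
        # TimeDateStamp is seconds since 1970-01-01 UTC; it can be forged.
        "linked_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "optional_header_size": optional_size,
        "is_executable": bool(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


# Made-up header: x64, 5 sections, timestamp 1_000_000_000, an EXE.
raw = struct.pack("<HHIIIHH", 0x8664, 5, 1_000_000_000, 0, 0, 240, 0x0022)
for key, value in parse_file_header(raw).items():
    print(f"{key:22} {value}")
```

A timestamp far in the future, or one that disagrees with the debug directory, is worth a second look.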
=> Optional Headers
The extension specification for Windows images, i.e. the PE file format. It is REQUIRED for executables and OPTIONAL only in the sense that object files don't need it. Object files may carry one, but there it is nothing more than bloat, since they don't really follow the PE format and can be thought of as a sort of archive.
IMPORTANT ENTRIES:
->Magic;
The true determinant of whether the image is 32-bit or 64-bit: [ PE32 (32-bit) -> 0x10B | PE32+ (64-bit) -> 0x20B ]. While the [ Machine; ] field in the COFF File Header names the CPU architecture the image runs on, this field is what actually tells the OS loader whether to parse the _OPTIONAL_HEADER with the 32-bit or the 64-bit layout.
-> AddressOfEntryPoint;
Contains an RVA (Relative Virtual Address, an offset from the base of the loaded image) pointing to where execution begins once the loader has mapped and set up the image. This is not necessarily the start of .text or of main();, but in general it is where the image starts executing code, and it is where a debugger typically stops after loading the binary.
->ImageBase;
Contains the preferred address at which the image should be mapped in memory. For a 32-bit .exe the default is the virtual address 0x00400000 when ASLR is not enabled. For DLLs, developers were encouraged to rebase the file, i.e. choose a non-default ImageBase (the default for DLLs is 0x10000000), to avoid collisions and save the loader the burden of relocating the DLL at runtime. That was the pre-ASLR era; nowadays the ImageBase doesn't really matter, as the kernel memory manager picks a random base.
There is a discussion of REBASING, RELOCATING, and ASLR later...
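The three fields covered so far (Magic, AddressOfEntryPoint, ImageBase) can be read with a few lines of Python. This is a sketch under stated assumptions: parse_optional_basics and the sample buffer are made up, the Magic values are 0x10B for PE32 and 0x20B for PE32+, and ImageBase sits at offset 28 (4 bytes) in PE32 but at offset 24 (8 bytes) in PE32+.

```python
import struct


def parse_optional_basics(opt: bytes) -> dict:
    """Read Magic, AddressOfEntryPoint and ImageBase from an optional header."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    (entry_rva,) = struct.unpack_from("<I", opt, 16)
    if magic == 0x10B:        # PE32: 32-bit ImageBase after BaseOfData
        kind = "PE32"
        (image_base,) = struct.unpack_from("<I", opt, 28)
    elif magic == 0x20B:      # PE32+: 64-bit ImageBase, no BaseOfData field
        kind = "PE32+"
        (image_base,) = struct.unpack_from("<Q", opt, 24)
    else:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    return {
        "kind": kind,
        "entry_rva": entry_rva,
        "image_base": image_base,
        # Where execution would start if the image loads at its preferred base.
        "entry_va": image_base + entry_rva,
    }


# Synthetic PE32 optional header prefix with only the three fields filled in.
opt32 = bytearray(96)
struct.pack_into("<H", opt32, 0, 0x10B)
struct.pack_into("<I", opt32, 16, 0x1500)
struct.pack_into("<I", opt32, 28, 0x00400000)
info = parse_optional_basics(bytes(opt32))
print(info["kind"], hex(info["entry_va"]))
```

With ASLR the real entry address is the randomized base plus the same RVA, which is why the RVA is the value worth recording.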
->SectionAlignment;
Windows specifies a SectionAlignment that the loader validates before mapping what are called Sections into memory. The value (in bytes) is basically the architecture's memory page size, e.g. [ 0x1000 == 4096 bytes ], or a multiple of it, so every section occupies a multiple of this size once mapped in memory.
-> FileAlignment;
Contains the alignment factor (in bytes) used to align the raw data of the file image on disk. It must be a power of 2. The default value is [ 0x200 -> 512 bytes ], the traditional hard-disk sector size, while [ 0x1000 -> 4096 bytes ], the page size, matches newer HDDs. A section's raw data on file is padded out when it takes less than a multiple of this value.
SectionAlignment must be equal to or greater than FileAlignment.
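Both alignment fields boil down to the same rounding rule. A minimal sketch, assuming power-of-two alignments as required above (align_up is a hypothetical helper name, and the 0x1234-byte section is a made-up example):

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2")
    return (value + alignment - 1) & ~(alignment - 1)


FILE_ALIGNMENT = 0x200
SECTION_ALIGNMENT = 0x1000

data_size = 0x1234  # bytes of real content in a hypothetical section
print(hex(align_up(data_size, FILE_ALIGNMENT)))     # space taken on disk
print(hex(align_up(data_size, SECTION_ALIGNMENT)))  # space taken in memory
```

Here the same section takes 0x1400 bytes on disk and 0x2000 bytes once mapped, which is why file offsets and RVAs drift apart.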
->DLLCharacteristics;
This field is mostly for security, declaring support for ASLR, DEP, integrity checks, etc.
ASLR (Address Space Layout Randomization).
Explicit support for ASLR, which allows DLLs to be dynamically relocated anywhere in memory, with a .reloc section listing every place / function offset that needs fixing. => set by the linker option /DYNAMICBASE
For executables another option, /FIXED:NO, must also be set to let the linker generate a .reloc section, as relocation is set to [ OFF ] for executables by default.
Integrity Checks.
Forces a check of the digital signature attached to the image to verify its integrity. If the signature is missing or mangled, the OS loader refuses to load the image.
=> set by the linker option /INTEGRITYCHECK
DEP (Data Execution Prevention).
A flag that marks the image as compatible with DEP, so the stack, heap, and data are mapped as non-executable and no region of memory should be [ WX -> Writable and Executable ] at the same time.
=> set by the linker option /NXCOMPAT
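NO_SEH / SAFESEH
Long story short, if the NO_SEH flag is set, the binary declares that it doesn't use Structured Exception Handling, so no SE handler may be called in this image.
The related linker option /SAFESEH (32-bit x86 only) does something different: it records a table of the image's legitimate exception handlers so that only those can be dispatched.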
Control Flow Guard.
An image with explicit support for Control Flow Guard, which checks the targets of indirect calls at runtime.
=> set by the linker option /guard:cf
=> disabled by /guard:cf- (compiler) or /GUARD:NO (linker)
Terminal Server Aware Applications
Marks the application as aware of Terminal Server, the mechanism behind using RDP (Remote Desktop Protocol) to control remote desktops and interact with them as if you were sitting at their GUI.
=> set by the linker option /TSAWARE
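The flags above are single bits in the 16-bit DLLCharacteristics value, so decoding them is a matter of masking. A sketch, assuming the bit values published for IMAGE_DLLCHARACTERISTICS_*; the function name, the shortened labels, and the sample value 0x8160 are chosen for illustration:

```python
# Bit values of the DllCharacteristics flags discussed above.
DLL_CHARACTERISTICS = {
    0x0020: "HIGH_ENTROPY_VA",
    0x0040: "DYNAMIC_BASE (ASLR)",
    0x0080: "FORCE_INTEGRITY",
    0x0100: "NX_COMPAT (DEP)",
    0x0400: "NO_SEH",
    0x4000: "GUARD_CF",
    0x8000: "TERMINAL_SERVER_AWARE",
}


def decode_dll_characteristics(value: int) -> list:
    """Return the names of the mitigation flags set in DllCharacteristics."""
    return [name for bit, name in DLL_CHARACTERISTICS.items() if value & bit]


def missing_mitigations(value: int) -> list:
    """List ASLR/DEP flags that are absent, a quick triage hint."""
    wanted = {0x0040: "ASLR", 0x0100: "DEP"}
    return [name for bit, name in wanted.items() if not value & bit]


flags = decode_dll_characteristics(0x8160)
print(flags)
print(missing_mitigations(0x0000))
```

A binary with neither ASLR nor DEP is not malicious by itself, but it is the kind of anomaly worth noting during triage.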
-> _IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
Basically an array of structures called Data Directories, each one locating a table with information about imports, exports, relocations, TLS, debug information, signatures, etc.
For each structure / data directory ->
-> VirtualAddress;
In fact an RVA (Relative Virtual Address) to the data structure.
-> Size;
Size of the data structure in bytes.
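A sketch of walking that array. The assumptions: the array starts at offset 96 of a PE32 optional header and at offset 112 of a PE32+ one, the NumberOfRvaAndSizes field sits in the 4 bytes right before it, and the names follow the usual directory order. parse_data_directories and the synthetic header are invented for the example.

```python
import struct

DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]


def parse_data_directories(opt: bytes) -> dict:
    """Return {name: (rva, size)} for every non-empty data directory."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    table = {0x10B: 96, 0x20B: 112}.get(magic)
    if table is None:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    (count,) = struct.unpack_from("<I", opt, table - 4)
    count = min(count, len(DIRECTORY_NAMES))  # never trust the stored count
    found = {}
    for index in range(count):
        rva, size = struct.unpack_from("<II", opt, table + 8 * index)
        if rva or size:
            found[DIRECTORY_NAMES[index]] = (rva, size)
    return found


# Synthetic PE32 optional header holding a single import directory.
opt = bytearray(96 + 16 * 8)
struct.pack_into("<H", opt, 0, 0x10B)
struct.pack_into("<I", opt, 92, 16)
struct.pack_into("<II", opt, 96 + 8 * 1, 0x2000, 0x50)
directories = parse_data_directories(bytes(opt))
print(directories)
```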
Section Headers
IMMEDIATELY AFTER the PE | NT headers there is an array of structures, the SECTION HEADERS, with [ _IMAGE_FILE_HEADER.NumberOfSections ] elements.
Each structure / section header consists of:
IMPORTANT ENTRIES
-> Name[8];
An array of 8 bytes of UTF-8 characters containing the name of the section, null-padded if the name is shorter than 8 bytes. If the name is exactly 8 bytes long, there is no null-terminating character.
-> Misc.VirtualSize;
Size of the section in memory.
Misc is a UNION, a structure used to store multiple interpretations of the same exact data, so Misc.VirtualSize and Misc.PhysicalAddress occupy the same bytes. All we will encounter is Misc.VirtualSize, since PhysicalAddress is not really referenced anymore on recent architectures.
-> VirtualAddress;
RVA -> Offset to the Section relative to the Image's base address.
Absolute Virtual Address of a section == [ SectionHeader.VirtualAddress + OptionalHeader.ImageBase ]
-> SizeOfRawData;
Contains Size of section on disk.
-> PointerToRawData;
Contains the offset of the section on disk, relative to the beginning of the file.
For sections with writable memory protection there is often a noticeable difference between the section's VirtualSize and SizeOfRawData, because uninitialized data takes space in memory but none on disk.
-> Characteristics;
Different characteristics of a section, including memory permissions, the type of data in the section, etc.
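The entries above are enough to parse the section table and translate an RVA into a file offset. A sketch assuming the standard 40-byte section header layout; the function names, dictionary keys, and the two synthetic sections are hypothetical.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# three relocation/line-number fields (skipped), two counts, Characteristics.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes


def parse_section_headers(data: bytes, offset: int, count: int) -> list:
    sections = []
    for index in range(count):
        (name, vsize, vaddr, raw_size, raw_ptr,
         _, _, _, _, flags) = SECTION_HEADER.unpack_from(
            data, offset + index * SECTION_HEADER.size)
        sections.append({
            # Null-padded, but not terminated when exactly 8 bytes long.
            "name": name.split(b"\x00", 1)[0].decode("utf-8", "replace"),
            "virtual_size": vsize, "virtual_address": vaddr,
            "raw_size": raw_size, "raw_pointer": raw_ptr,
            "characteristics": flags,
        })
    return sections


def rva_to_offset(sections: list, rva: int):
    """Map an RVA to a file offset, or None if nothing on disk backs it."""
    for section in sections:
        delta = rva - section["virtual_address"]
        if 0 <= delta < section["raw_size"]:
            return section["raw_pointer"] + delta
    return None


table = SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400,
                            0, 0, 0, 0, 0x60000020)
table += SECTION_HEADER.pack(b".data", 0x3000, 0x3000, 0x200, 0x1800,
                             0, 0, 0, 0, 0xC0000040)
sections = parse_section_headers(table, 0, 2)
print(hex(rva_to_offset(sections, 0x1010)))  # inside .text
print(rva_to_offset(sections, 0x3500))       # memory-only part of .data
```

The second lookup returns None because that part of .data exists only in memory, which is the VirtualSize / SizeOfRawData gap in action.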
=> Sections
Sections are portions of data / code with similar memory protections and purpose, grouped together. Sections have to be explicitly ordered. Common sections are:
.text -> Contains code. In kernel-mode drivers it is NON_PAGEABLE, i.e. it cannot be paged out of memory to disk.
.data -> Contains global data that can be changed, READ / WRITE protections.
.bss -> Contains data that's not initialized, gets merged with .data, takes no space on disk but takes space in memory.
.rdata -> READ-ONLY data, e.g. strings.
.idata -> Contains imports information, usually merged with .rdata
.edata -> Contains exports information, usually merged with .rdata
.pdata -> Contains exception-processing information (the function table used for unwinding).
.reloc -> Contains relocation information listing all the constants that need fixing by the loader.
.rsrc -> Contains resources, from icons in different resolutions to STUBS and kernel modules that are dropped and later executed in the case of malware.
SECTIONS can have any name and can be merged by the linker, though the linker will warn about merging sections with different memory permissions.
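Since names can't be trusted, the memory permissions in Characteristics say more about a section than its name does. A sketch, assuming the standard IMAGE_SCN_MEM_* bit values; the helper names and sample values are illustrative, and "UPX0" stands in for a typical packer section.

```python
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def permissions(characteristics: int) -> str:
    """Render a section's memory permissions as an 'rwx'-style string."""
    pairs = (("r", IMAGE_SCN_MEM_READ),
             ("w", IMAGE_SCN_MEM_WRITE),
             ("x", IMAGE_SCN_MEM_EXECUTE))
    return "".join(letter if characteristics & bit else "-"
                   for letter, bit in pairs)


def is_write_execute(characteristics: int) -> bool:
    """True when a section is both writable and executable, a common anomaly."""
    both = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE
    return (characteristics & both) == both


samples = {".text": 0x60000020, ".data": 0xC0000040, "UPX0": 0xE0000080}
for name, flags in samples.items():
    note = "  <- writable and executable" if is_write_execute(flags) else ""
    print(f"{name:8} {permissions(flags)}{note}")
```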
For an understanding of the PE file format, this pretty much does the job. In later parts we will dive deeply into the Data Directories, discussing imports, exports, debug information, relocations, and much more.