Dissecting a PE File Format: Data Directories p1, Imports and Exports
In this episode and the next few, we will touch on important entries in the Data Directory array of structures. For a refresher on the PE file format, you might want to take a look at Dissecting a PE Binary: PE File Format.
A lot of information defines the characteristics and behavior of a program, and these definitions are well structured as part of the PE file format in what's called the [ DataDirectory ]. [ DataDirectory ] is an array of structures, each of type [ IMAGE_DATA_DIRECTORY ]. Each structure is called a DIRECTORY ENTRY and can be accessed by its index into the [ DataDirectory ] or by its MACRO name.
Each Data Directory entry is yet another structure with two members -> an [ RVA ] to the intended directory and the [ Size ] of the directory table.
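To make the two-member layout concrete, here is a minimal Python sketch that reads one entry out of the raw DataDirectory bytes. The function name `read_directory_entry` and the sample bytes are made up for illustration; the index constants follow the MACRO names in winnt.h.

```python
import struct

# IMAGE_DATA_DIRECTORY is two little-endian DWORDs: VirtualAddress (RVA) and Size.
DATA_DIRECTORY_ENTRY = struct.Struct("<II")

# MACRO names from winnt.h for the two entries covered in this episode.
IMAGE_DIRECTORY_ENTRY_EXPORT = 0
IMAGE_DIRECTORY_ENTRY_IMPORT = 1

def read_directory_entry(data_directory: bytes, index: int) -> tuple:
    """Return (rva, size) for the entry at `index` (hypothetical helper)."""
    return DATA_DIRECTORY_ENTRY.unpack_from(
        data_directory, index * DATA_DIRECTORY_ENTRY.size
    )

# Made-up bytes for two entries, only to exercise the helper.
sample = struct.pack("<IIII", 0x1000, 0x80, 0x2000, 0x50)
export_rva, export_size = read_directory_entry(sample, IMAGE_DIRECTORY_ENTRY_EXPORT)
import_rva, import_size = read_directory_entry(sample, IMAGE_DIRECTORY_ENTRY_IMPORT)
print(hex(export_rva), hex(export_size), hex(import_rva), hex(import_size))
```

Each entry is 8 bytes, so entry N simply lives at offset N * 8 from the start of the array.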
In this episode we will discuss the Import and Export Directories in detail, since they work together: the PE-Loader needs both to load a PE binary successfully.
The Export Directory is the 1st entry in the [ DataDirectory ] array. Like all directory entries, it has two members: an [ RVA ] to the Export Directory and the [ Size ] of the Export Directory.
For Exports we will investigate [ KERNEL32.DLL ]
:
For each library that exports functions for other binaries to use, there is only ONE Export Directory structure, defined with struct type [ IMAGE_EXPORT_DIRECTORY ].
KERNEL32.dll
Data Directory Array of Structures:
=> Each structure is 8 bytes; the 1st member is an [ RVA ] to the directory in memory.
Structure of The Export Directory
IMPORTANT ENTRIES:
-> Name;
RVA to the name of the DLL.
-> Base;
First ordinal number. When resolving exports by ordinal, this value is needed to check whether a given ordinal is valid: an ordinal must NOT be less than [ Base ] and must NOT be greater than [ Base + NumberOfFunctions - 1 ].
-> NumberOfFunctions;
Number of Entries in the EAT [ Export Address Table ]
.
-> NumberOfNames;
Number of Entries in the ENT [ Export Name Table ]
.
-> AddressOfFunctions;
RVA to the EAT [ Export Address Table ], an array of RVAs to functions.
EAT entries point to either -> A) function CODE, if the exported function is implemented in this DLL. B) a Forwarder String, if the export is forwarded to another DLL.
-> AddressOfNames;
RVA to the ENT [ Export Names Table ].
-> AddressOfNameOrdinals;
RVA to the EOT [ Export Table of Name Ordinals ] (called EOT here for consistency).
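As a rough sketch of how the members above sit in the file, the snippet below unpacks the 40-byte [ IMAGE_EXPORT_DIRECTORY ] with Python's `struct` module. The field order follows winnt.h; the helper name `parse_export_directory` is hypothetical, and every sample value except the three table RVAs quoted later in this post is a placeholder.

```python
import struct
from collections import namedtuple

# Field order as declared in winnt.h: 2 DWORDs, 2 WORDs, then 7 DWORDs.
ExportDirectory = namedtuple("ExportDirectory", [
    "Characteristics", "TimeDateStamp", "MajorVersion", "MinorVersion",
    "Name", "Base", "NumberOfFunctions", "NumberOfNames",
    "AddressOfFunctions", "AddressOfNames", "AddressOfNameOrdinals",
])
EXPORT_DIRECTORY = struct.Struct("<IIHHIIIIIII")  # 40 bytes, little-endian

def parse_export_directory(raw: bytes, offset: int = 0) -> ExportDirectory:
    """Unpack one IMAGE_EXPORT_DIRECTORY found at `offset` (hypothetical helper)."""
    return ExportDirectory._make(EXPORT_DIRECTORY.unpack_from(raw, offset))

# Placeholder values, except the three table RVAs quoted for KERNEL32.dll.
raw = struct.pack("<IIHHIIIIIII", 0, 0, 0, 0, 0x9D000, 1, 3, 3,
                  0x990A8, 0x9AA28, 0x9C3A8)
exports = parse_export_directory(raw)
print(hex(exports.AddressOfFunctions), exports.Base, exports.NumberOfNames)
```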
Calculating a Function Address from the EAT by Ordinal
An ordinal, after subtracting [ Base ], is an index into the EAT.
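The calculation can be sketched like this. It is a toy model, not loader code: the EAT is a plain Python list of made-up RVAs, and both function names are hypothetical.

```python
def eat_index_for_ordinal(ordinal: int, base: int, number_of_functions: int) -> int:
    """Turn an export ordinal into an EAT index, rejecting out-of-range ordinals."""
    index = ordinal - base
    if not 0 <= index < number_of_functions:
        raise ValueError(f"ordinal {ordinal} is outside the export range")
    return index

def resolve_by_ordinal(eat: list, ordinal: int, base: int) -> int:
    """Return the function RVA stored in the EAT for `ordinal`."""
    return eat[eat_index_for_ordinal(ordinal, base, len(eat))]

# Made-up EAT with three function RVAs and Base = 1.
toy_eat = [0x1010, 0x1040, 0x10A0]
print(hex(resolve_by_ordinal(toy_eat, 2, base=1)))
```

With Base = 1, ordinal 2 lands on index 1 of the toy table, and ordinal 0 or ordinal 4 is rejected.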
Now let's check the Export Directory structure of KERNEL32.dll
When looking through the values in the Export Directory, you can notice two different offsets, the FileOffset and the RVA. For example, when we click on the RVA value for [ Name ], we see a different offset on the left side; this is the FileOffset. PE-Bear automatically maps values containing an RVA to the corresponding FileOffset, so keep that in mind and don't get confused. Throughout the export directory values, the difference between a memory RVA and its corresponding FileOffset is constant (1800h for the table RVAs listed below).
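The mapping PE-Bear does for us is plain section arithmetic, and a minimal sketch looks like the following. The section values are hypothetical, picked only so that the AddressOfFunctions RVA quoted in this post (990A8) maps to its quoted FileOffset (978A8); a real tool reads them from the section headers.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA where the section starts in memory
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # file offset where the section starts on disk

def rva_to_file_offset(rva: int, sections: list) -> Optional[int]:
    """Map an RVA to a file offset using the section that contains it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.pointer_to_raw_data
    return None  # the RVA is not backed by any section

# Hypothetical .rdata section, NOT read from the real KERNEL32.dll headers.
sections = [Section(".rdata", 0x80000, 0x30000, 0x7E800)]
print(hex(rva_to_file_offset(0x990A8, sections)))
```

The constant difference seen in the tool is just `virtual_address - pointer_to_raw_data` of whichever section holds the data.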
Now let's walk through the Export Directory. We can see that the values of [ NumberOfFunctions ] and [ NumberOfNames ] are the same. This is not always the case, BUT [ NumberOfNames ] defines the number of entries in BOTH the ENT [ Export Names Table ] and the [ Export Name Ordinals Table ], because the two are parallel arrays with the same number of entries. So to find the ordinal for a given export function name, you look up the name string's index in the ENT [ Export Name Table ]; the same index in the [ Export Name Ordinals Table ] holds its ordinal, AND that ordinal is an index to its entry (a function pointer) in the EAT [ Export Address Table ].
Now let's look at these three tables for KERNEL32.dll. Mark A highlights the RVAs to the [ EAT | ENT | Exports Ordinals Table ]:
-> AddressOfFunctions;
RVA [ 990A8 ] | FileOffset [ 978A8 ]
Starts right after the [ IMAGE_EXPORT_DIRECTORY ] and contains RVAs to [ Functions ] OR RVAs to a [ ForwarderString ], like the first entry, with the format [ <DLL_Name>.<Function_Name> ]. For a forwarder, the loader loads the target DLL into the process's memory and looks up the exported function there. In the case of NTDLL, it is already loaded in every process as shared memory, so the loader looks up [ RtlAcquireSRWLockExclusive ] right away.
-> AddressOfNames;
RVA [ 9AA28 ] | FileOffset [ 99228 ]
Starts right after the EAT array and contains RVAs to function names; PE-Bear has already resolved the function names for us.
-> AddressOfNameOrdinals;
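RVA [ 9C3A8 ] | FileOffset [ 9ABA8 ]
Starts right after the ENT and holds one 16-bit entry per exported name. Each entry is the index into the EAT for the name at the same position in the ENT. Imports by ordinal do not go through this table: the PE-Loader subtracts [ Base ] from the ordinal and indexes the EAT directly.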
So here is what happens if a binary imports a function from KERNEL32.dll; there are two options:
-> Import by Name: The binary stores the name of the API it needs somewhere in its image. The PE-Loader takes the name and goes to the KERNEL32.dll ENT [ Export Names Table ], binary-searches for the name and, when found, takes its index into the ENT. It then goes to the [ Export Name Ordinals Table ] and reads the ordinal that sits at the same index (remember that the ENT and the Export Name Ordinals Table have the same number of entries). THAT ordinal is the INDEX INTO the EAT, so the loader goes to the EAT array and grabs the RVA that sits at this index, and voilà, we have our needed API function address.
Long ride, eh? Lucky you, the PE-Loader does the whole job.
-> Import by Ordinal: Easy. The PE-Loader subtracts [ Base ] from the ordinal and uses the result directly as the index into the EAT, so there is no need to look up the ENT or the Export Name Ordinals Table.
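The import-by-name walk can be sketched with three toy tables. This is a simplified model under stated assumptions: the ENT here already holds resolved name strings instead of RVAs, the names are kept sorted so a binary search works, all table contents are made up, and `resolve_by_name` is a hypothetical helper rather than anything the loader exposes.

```python
from bisect import bisect_left

def resolve_by_name(name: str, ent: list, eot: list, eat: list) -> int:
    """ENT index -> name-ordinal entry -> EAT entry, as described above."""
    i = bisect_left(ent, name)          # binary search over the sorted names
    if i == len(ent) or ent[i] != name:
        raise KeyError(name)
    eat_index = eot[i]                  # same index, parallel table
    return eat[eat_index]               # RVA of the exported function

# Made-up tables: three names, their EAT indexes, and three function RVAs.
toy_ent = ["CreateFileW", "ExitProcess", "Sleep"]
toy_eot = [2, 0, 1]
toy_eat = [0x2000, 0x3000, 0x1000]

print(hex(resolve_by_name("ExitProcess", toy_ent, toy_eot, toy_eat)))
```

Note that the toy name-ordinal table deliberately does not run 0, 1, 2, which is why the extra hop through it is needed before touching the EAT.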
So let's make a nice visual using old school text dashes to try to stick all these pieces together:
The Import Directory is the 2nd entry in the [ DataDirectory ] array, a typical directory entry of type [ IMAGE_DATA_DIRECTORY ] with two members, an [ RVA ] and a [ Size ]. Interestingly enough, this [ RVA ] points to an array of structures, each of type [ IMAGE_IMPORT_DESCRIPTOR ]. There is one structure for each DLL the binary imports from.
What's also interesting is that this array of [ IMAGE_IMPORT_DESCRIPTOR ] ends with a terminator structure that is full of NULL values.
IMPORTANT ENTRIES:
-> UNION { Characteristics; OriginalFirstThunk; };
Typically read as [ OriginalFirstThunk ]; Characteristics is the old name. It contains an RVA to the INT [ Import Names/Lookup Table ], which is an array of structures of type [ IMAGE_THUNK_DATA ].
-> Name;
RVA to the name of the DLL. For each DLL the binary imports from, there is an [ IMAGE_IMPORT_DESCRIPTOR ] structure describing all of that DLL's exported APIs the binary is importing.
-> FirstThunk;
RVA to the IAT [ Import Address Table ], an array of pointers to functions whose entries are also of type [ IMAGE_THUNK_DATA ]. The IAT has its own Data Directory entry, alongside the IATs used by the other types of imports.
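Walking the descriptor array until the all-NULL terminator can be sketched as below. The 20-byte layout (five DWORDs: OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk) follows winnt.h; the generator name is hypothetical, and the sample reuses the INT and IAT RVAs quoted later for notepad.exe together with a made-up Name RVA.

```python
import struct

# IMAGE_IMPORT_DESCRIPTOR: five little-endian DWORDs, 20 bytes in total.
IMPORT_DESCRIPTOR = struct.Struct("<IIIII")

def iter_import_descriptors(raw: bytes, offset: int = 0):
    """Yield one dict per imported DLL, stopping at the all-zero terminator."""
    while offset + IMPORT_DESCRIPTOR.size <= len(raw):
        fields = IMPORT_DESCRIPTOR.unpack_from(raw, offset)
        if not any(fields):             # terminator: every member is NULL
            break
        oft, _timestamp, _forwarder_chain, name_rva, first_thunk = fields
        yield {"OriginalFirstThunk": oft, "Name": name_rva, "FirstThunk": first_thunk}
        offset += IMPORT_DESCRIPTOR.size

# One descriptor followed by the terminator; the Name RVA 0x2E000 is made up.
raw = struct.pack("<IIIII", 0x2D560, 0, 0, 0x2E000, 0x268B0) + bytes(20)
for descriptor in iter_import_descriptors(raw):
    print({key: hex(value) for key, value in descriptor.items()})
```

The terminator is what lets a parser find the end of the array without trusting the [ Size ] member alone.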
Although [ OriginalFirstThunk ] and [ FirstThunk ] are two different RVAs to different memory addresses, BOTH contain the same data on disk. That's because, for a dynamically linked program, imported functions are resolved only at load time: the actual pointers to imported functions are filled in when the binary is loaded in memory. This is memory efficient, since for an imported DLL there is only ONE copy of its code in memory, shared between processes. Statically linking, by contrast, means carrying a copy of every function you import inside your program, leading to a bloated binary.
There are multiple ways of importing functions from other DLLs, with different trade-offs for speed and optimization. In this episode we will only discuss Normal Imports, where imported functions are resolved at load time, when the binary is loaded into memory.
For normal imports, the linker fills the IAT and the INT with the same data on disk. When the image is loaded in memory, the PE-Loader overwrites the entries in the IAT with the real addresses of the imported functions.
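Here is a toy simulation of that overwrite, just to show which array changes and which one stays put. Everything in it is invented for illustration: the RVAs, the API names, the "loaded" addresses, and the helper `bind_iat`.

```python
# On disk, the INT and the IAT hold the same hint/name RVAs (made-up values).
int_on_disk = [0x2DFD4, 0x2DFE2]
iat_on_disk = list(int_on_disk)

# Made-up lookups standing in for the hint/names table and a loaded DLL's exports.
hint_names = {0x2DFD4: "Sleep", 0x2DFE2: "ExitProcess"}
loaded_exports = {"Sleep": 0x7FF800011000, "ExitProcess": 0x7FF800012340}

def bind_iat(int_entries: list, names: dict, exports: dict) -> list:
    """Build the in-memory IAT: one real address per INT entry (hypothetical helper)."""
    return [exports[names[rva]] for rva in int_entries]

iat_in_memory = bind_iat(int_on_disk, hint_names, loaded_exports)
print([hex(address) for address in iat_in_memory])
```

The INT is left untouched, which is why a tool can still recover the imported names from a loaded image.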
But before we move to this, let's view the Import Directory of notepad.exe
.
Again, keeping in mind the RVA to FileOffset translation that PE-Bear does for us, let's follow the ImportDirectory->RVA to see the array of import structures.
As we see, following the Imports tab leads us to the RVA pointed at by DataDirectory->VirtualAddress, where we have the Import Descriptor structures filled in with their members' values.
For each DLL's IMPORT_DESCRIPTOR there is an [ OriginalFirstThunk ] and a [ FirstThunk ]. Let's flash back to their [ IMAGE_THUNK_DATA ] structure:
So if we follow a DLL's [ OriginalFirstThunk ] or [ FirstThunk ], we find an array of ULONGLONG values. As you might recall, a UNION in C lets one memory location hold one of several data types at a given time, so the four members of the UNION in our case are four different interpretations of the same ULONGLONG:
-> ForwarderString;
If the API is forwarded from another DLL, this refers to a literal string naming the API in this format -> [ <DLL_name>.<API_name> ].
-> Function;
Real memory address of the API.
-> Ordinal;
Contains an Ordinal of the imported API.
-> AddressOfData;
Contains an RVA to an entry in the hint/names table.
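Which member applies to a given on-disk entry is decided by the top bit of the value. The sketch below assumes a 64-bit image and uses the `IMAGE_ORDINAL_FLAG64` constant from winnt.h; the helper name `decode_thunk64` and the sample values are made up (the RVA is FileOffset 2CDD4 plus an assumed 1200h delta).

```python
IMAGE_ORDINAL_FLAG64 = 0x8000000000000000  # top bit set -> import by ordinal

def decode_thunk64(value: int) -> tuple:
    """Classify one on-disk IMAGE_THUNK_DATA64 value (hypothetical helper)."""
    if value == 0:
        return ("end", None)                      # NULL entry closes the array
    if value & IMAGE_ORDINAL_FLAG64:
        return ("ordinal", value & 0xFFFF)        # low 16 bits carry the ordinal
    return ("hint_name_rva", value & 0x7FFFFFFF)  # low 31 bits carry the RVA

for thunk in (0x2DFD4, 0x8000000000000010, 0):
    print(decode_thunk64(thunk))
```

After loading, the same slot in the IAT is read through the Function member instead, since it now holds a real address.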
Now let's look at notepad.exe and explore the entries in the Imports tab, which are the [ IMAGE_IMPORT_DESCRIPTOR ] structures for each DLL we are importing from:
Taking a good, deep look at this screenshot might help in figuring out the difference between real PE structures and the representation of information an analyst gets from a helper tool like PE-Bear, PE-Studio, etc.
The tab with [ KERNEL32.dll [83 entries] ] that I highlighted in red is a way of presenting all information about the APIs imported from each DLL so an analyst can easily explore and analyze the file, but a structure like this doesn't actually exist in the file image or in memory in this form. It is a summary of what we might need when looking at the imports of a program.
Now let's move to the juicy stuff, starting with [ OriginalFirstThunk ] and [ FirstThunk ], which I highlighted in grey. Both hold RVAs to different places in memory, leading us to different FileOffsets, yet they contain the same information, which is the data of the APIs imported from KERNEL32.dll.
So for the table at Mark A): this is the [ IMAGE_IMPORT_DESCRIPTOR ] of KERNEL32.dll filled in, so we have:
Note that PE-Bear also resolves some members for our sake: for example, the [ Name RVA ] is resolved, along with the function count for this DLL and whether it's bound or not, which depends on the value of [ TimeDateStamp ], but that is for another talk.
So we have the [ INT ] at RVA [ 2D560 ] and the IAT at RVA [ 268B0 ], and by following each RVA we get two identical arrays at their corresponding FileOffsets. At Mark B) the data in both arrays is the same, and if we look closer at all the entries for KERNEL32.dll, we see that for each API the [ OriginalThunk ] and [ Thunk ] hold an identical ULONGLONG value, which is an RVA to yet another structure. Following it, we end up in the imports hint/names table, where we have an array of hint/names for all APIs imported from KERNEL32.dll.
So for the [ IMAGE_THUNK_DATA ] we have the AddressOfData member:
For each API we have a hint/names entry:
Now we have an INT at FileOffset [ 2C360 ] and an IAT array at FileOffset [ 256B0 ], and on disk both hold pointers to the same data, which is the hint/names table at FileOffset [ 2CDD4 ].
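A single hint/names entry is a 2-byte hint followed by a NULL-terminated ASCII name (IMAGE_IMPORT_BY_NAME in winnt.h), so reading one can be sketched as below. The helper name `parse_hint_name` is hypothetical, and the sample bytes, hint values included, are made up rather than taken from notepad.exe.

```python
import struct

def parse_hint_name(raw: bytes, offset: int) -> tuple:
    """Read one hint/name entry: WORD hint, then a NULL-terminated ASCII name."""
    (hint,) = struct.unpack_from("<H", raw, offset)
    end = raw.index(b"\x00", offset + 2)          # find the string terminator
    return hint, raw[offset + 2:end].decode("ascii")

# Two made-up entries laid out back to back.
table = struct.pack("<H", 0x1A) + b"Sleep\x00" + struct.pack("<H", 0x2B) + b"ExitProcess\x00"
print(parse_hint_name(table, 0))
print(parse_hint_name(table, 8))
```

The hint is only a suggested starting index into the exporting DLL's name table; the name is what the lookup actually relies on.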
So let's try to summarize this in a visual that helps us imagine what is going on on the imports side of a PE binary:
Based on Matt Pietrek's MSDN column on the PE file format.
You can actually make a good exercise out of this: watch a binary's IAT | INT on disk vs. in memory after it's loaded.
That's it for now. Next time we will go through the rest of the import types, like Bound Imports and Delay-Loaded Imports.