Dissecting a PE File Format Data Directories p1 Imports Exports

In this episode and the next few, we will cover the important entries in the Data Directory array of structures. For a refresher on the PE file format, take a look at Dissecting a PE Binary: PE File Format.

Data Directory

A lot of information defines the characteristics and behavior of a program. These definitions are well structured as part of the PE file format in what's called the [ DataDirectory ]. [ DataDirectory ] is an array of structures, each of type [ IMAGE_DATA_DIRECTORY ]. Each structure is called a DIRECTORY ENTRY and can be accessed by its index into the [ DataDirectory ] or by its macro name.

// accessing a DIRECTORY_ENTRY by its index into the DataDirectory array,
// which is a member of the OptionalHeader structure.

// accessing the Import Directory by index
IMAGE_DATA_DIRECTORY * pImportDirectory = &pOptionalHeader->DataDirectory[1];

// accessing the Export Directory by macro name
IMAGE_DATA_DIRECTORY * pExportDirectory = &pOptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];

/********* winnt.h *********/

// Directory Entries

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

Each Data Directory entry is yet another structure with two members: an [ RVA ] to the intended directory and the [ Size ] of the directory table.

typedef struct _IMAGE_DATA_DIRECTORY {
	DWORD VirtualAddress;    // RVA to the Directory TABLE
	DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
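
Before following a directory entry, it helps to check that the entry actually exists. Here is a minimal sketch of that check, assuming the number of entries comes from the optional header's NumberOfRvaAndSizes field. The `DataDirectory` type and the `get_directory` helper are made-up names for illustration, mirroring winnt.h so the snippet builds without windows.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal mirror of IMAGE_DATA_DIRECTORY (illustrative type name). */
typedef struct {
    uint32_t VirtualAddress; /* RVA of the directory table */
    uint32_t Size;           /* size of the table in bytes */
} DataDirectory;

#define DIRECTORY_ENTRY_EXPORT 0
#define DIRECTORY_ENTRY_IMPORT 1

/* Returns the entry at `index`, or NULL when the index is past the number of
 * entries the header declares or the entry is empty (RVA or Size is 0). */
const DataDirectory *get_directory(const DataDirectory *dirs,
                                   uint32_t number_of_rva_and_sizes,
                                   uint32_t index)
{
    if (index >= number_of_rva_and_sizes)
        return NULL;
    if (dirs[index].VirtualAddress == 0 || dirs[index].Size == 0)
        return NULL;
    return &dirs[index];
}
```

An executable that exports nothing would typically have an empty entry 0, so the helper returns NULL for it.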

In this episode we will discuss the Import and Export Directories in detail, as they work together: the PE-Loader needs both to load a PE binary successfully.

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory

Export Directory

The 1st entry in the [ DataDirectory ] array. Like all directory entries, this structure has two members: an [ RVA ] to the Export Directory and the [ Size ] of the Export Directory.

For Exports we will investigate [ KERNEL32.DLL ]:

// structure of 8bytes

typedef struct _IMAGE_DATA_DIRECTORY {
	DWORD VirtualAddress;    // RVA to the EXPORT TABLE
	DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

For each library that exports functions for other binaries to use, there is only ONE Export Directory structure, defined with struct type [ IMAGE_EXPORT_DIRECTORY ].

// accessing the Export Directory
// pBaseAddress is the base address of the loaded image; the RVA is an offset from it

IMAGE_DATA_DIRECTORY * pExportDataDirectory = &pOptHeader->DataDirectory[0];
IMAGE_EXPORT_DIRECTORY * pExportDirectory = (IMAGE_EXPORT_DIRECTORY *) (pBaseAddress + pExportDataDirectory->VirtualAddress);

KERNEL32.dll Data Directory Array of Structures:

=> Each structure is 8 bytes; the 1st member is an [ RVA ] to the directory in memory.

Structure of The Export Directory

typedef struct _IMAGE_EXPORT_DIRECTORY {
	DWORD	Characteristics;
	DWORD	TimeDateStamp;
	WORD	MajorVersion;
	WORD	MinorVersion;
	DWORD	Name;                  // RVA to the name of the DLL
	DWORD	Base;                  // First ordinal number | Used to calculate valid Ords
	DWORD	NumberOfFunctions;     // Number of entries in the EAT
	DWORD	NumberOfNames;         // Number of entries|names in ENT == Number of Ordinals in Ordinal Table

	DWORD	AddressOfFunctions;    
// POINTER | RVA to exported functions EAT -> (Array of Pointers)

	DWORD	AddressOfNames;        
// POINTER | RVA to exported function Names Table ENT -> (Array of RVAs to Name Strings)

	DWORD	AddressOfNameOrdinals; 
// POINTER | RVA to Ordinal Table.

} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

IMPORTANT ENTRIES:

-> Name; RVA to the name of the DLL.

-> Base; First ordinal number. When resolving exports by ordinal, this value is needed to check whether a given ordinal is valid.

An ordinal must NOT be less than [ Base ] and must be less than [ Base+NumberOfFunctions ], because an ordinal minus [ Base ] is an index into the EAT.

// Pseudo code to check if an ordinal is valid
if(ord < pExportDirectory->Base || ord >= (pExportDirectory->Base + pExportDirectory->NumberOfFunctions))
    return FALSE;
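
The same check can be written as a small standalone function. This is only a sketch: `is_valid_ordinal` is a made-up helper name, and it assumes the upper bound comes from NumberOfFunctions, since an ordinal minus Base is used as an index into the EAT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid ordinals run from Base to Base + NumberOfFunctions - 1.
 * Subtracting before comparing avoids overflow in base + count. */
bool is_valid_ordinal(uint32_t base, uint32_t number_of_functions,
                      uint32_t ordinal)
{
    return ordinal >= base && ordinal - base < number_of_functions;
}
```

With Base = 1 and nine functions, ordinals 1 through 9 pass and both 0 and 10 are rejected.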

-> NumberOfFunctions; Number of Entries in the EAT [ Export Address Table ].

-> NumberOfNames; Number of Entries in the ENT [ Export Name Table ].

-> AddressOfFunctions; RVA to the EAT [ Export Address Table ] Array of Pointers to functions.

EAT entries point to either -> A) Function CODE, if the exported function is local to the DLL, or B) a Forwarder String, if the export is forwarded to another DLL.

/* list of pointers to functions */

address_of_function[0];     
address_of_function[1];
address_of_function[2];
address_of_function[3];
.
.
.
.
address_of_function[NumberOfFunctions - 1];

-> AddressOfNames; RVA to the ENT [ Export Names Table ].

/* List of pointers to strings, ordered lexically */

address_of_name[0];    // pointer to a NULL-terminated string entry in the ENT
address_of_name[1];
address_of_name[2];
address_of_name[3];
address_of_name[4];
.
.
.
address_of_name[NumberOfNames - 1];

-> AddressOfNameOrdinals; RVA to the [ Export Name Ordinals Table ], called the EOT here for consistency.

/* Array of WORDs */
name_ordinal[0];   // Ordinal values are indices into the EAT
name_ordinal[1];
name_ordinal[2];
name_ordinal[3];
.
.
.
.
name_ordinal[NumberOfNames - 1];

Calculating a Function Address from the EAT by Ordinal

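The ordinal stored in the ordinal table is an unbiased index into the EAT, so for the Nth name:

AddressOfFunctions[AddressOfNameOrdinals[N]];

An ordinal as used by an importer is biased by [ Base ], so it is subtracted first:

AddressOfFunctions[ord - Base];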
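
Both lookups can be sketched with bounds checks added. The helper names are made up for illustration, and the tables are plain C arrays here rather than RVAs inside a mapped image, which is a simplifying assumption.

```c
#include <stdint.h>

/* Function RVA for a biased ordinal (as an importer would pass it),
 * or 0 when the ordinal is out of range. */
uint32_t rva_by_ordinal(const uint32_t *functions, uint32_t number_of_functions,
                        uint32_t base, uint32_t ordinal)
{
    if (ordinal < base || ordinal - base >= number_of_functions)
        return 0;
    return functions[ordinal - base];
}

/* Function RVA for the n-th entry of the name table, or 0 when n or the
 * stored index is out of range. The ordinal table holds unbiased indices. */
uint32_t rva_by_name_index(const uint32_t *functions, uint32_t number_of_functions,
                           const uint16_t *name_ordinals, uint32_t number_of_names,
                           uint32_t n)
{
    if (n >= number_of_names || name_ordinals[n] >= number_of_functions)
        return 0;
    return functions[name_ordinals[n]];
}
```

Note that with Base = 1, the biased ordinal 7 and the stored index 6 select the same EAT slot.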

Now let's check the Export Directory structure of KERNEL32.dll

When looking through the values in the Export Directory, you will notice two different offsets: the FileOffset and the RVA. For example, when we click on the RVA value for [ Name ], we see a different offset on the left side; this is the FileOffset. PE-Bear automatically maps each value containing an RVA to its corresponding FileOffset, so keep that in mind and don't get confused. For the export table values listed below, there is a 1800h difference between an RVA in memory and its corresponding FileOffset.
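
The RVA to FileOffset mapping that the tool performs can be sketched by hand from the section headers. This is a minimal version under stated assumptions: `SectionInfo` and `rva_to_file_offset` are illustrative names, the three fields mirror members of IMAGE_SECTION_HEADER, and the section values in the usage note are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The three section header fields the translation needs (illustrative type). */
typedef struct {
    uint32_t VirtualAddress;   /* RVA where the section is mapped */
    uint32_t PointerToRawData; /* file offset of the section's data */
    uint32_t SizeOfRawData;    /* size of the section's data on disk */
} SectionInfo;

/* Returns the file offset for `rva`, or 0 when no section's raw data holds it. */
uint32_t rva_to_file_offset(const SectionInfo *sections, size_t count,
                            uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &sections[i];
        if (rva >= s->VirtualAddress &&
            rva - s->VirtualAddress < s->SizeOfRawData)
            return rva - s->VirtualAddress + s->PointerToRawData;
    }
    return 0;
}
```

For a hypothetical section mapped at RVA 80000h with raw data at 7E800h, the RVA 990A8h translates to FileOffset 978A8h, so the constant gap is simply VirtualAddress minus PointerToRawData for that section.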

Now let's walk through the Export Directory. We can see that the values of [ NumberOfFunctions ] and [ NumberOfNames ] are the same here; this is not always the case. [ NumberOfNames ] defines the number of entries in BOTH the ENT [ Export Names Table ] and the [ Export Name Ordinals Table ], because the two are parallel arrays of the same length. So to find the ordinal for a given exported function name, you look up the name string's index in the ENT, and that same index selects its entry in the [ Export Name Ordinals Table ]. The ordinal stored there is in turn the index of the function pointer in the EAT [ Export Address Table ].

Now let's look at these three tables for KERNEL32.dll

Mark A highlights the RVAs to the [ EAT | ENT | Exports Ordinals Table ]:

-> AddressOfFunctions; RVA [ 990A8 ] | FileOffset [ 978A8 ]

Starts right after the [ IMAGE_EXPORT_DIRECTORY ] and contains RVAs to [ Functions ] OR RVAs to a [ ForwarderString ], like the first entry, with the format [ <DLL_Name>.<Function_Name> ]. For a forwarder, the loader loads the named DLL into the process's memory and looks up the exported function there. NTDLL is already loaded in every process, so the loader can look up [ RtlAcquireSRWLockExclusive ] right away.
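
How does a parser tell a forwarder from a real function? A common rule, sketched here, is that an EAT RVA falling inside the export directory's own range (the RVA and Size from its Data Directory entry) points at a forwarder string instead of code. Both helper names are made up, and the numbers in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the EAT entry points inside [dir_rva, dir_rva + dir_size). */
bool is_forwarder_rva(uint32_t func_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return func_rva >= dir_rva && func_rva - dir_rva < dir_size;
}

/* Splits "DLL.Function" at the last '.'. Copies the DLL part into dll_out and
 * returns a pointer to the function part, or NULL when the string has no
 * usable '.' or the DLL part does not fit in dll_cap bytes. */
const char *split_forwarder(const char *forwarder, char *dll_out, size_t dll_cap)
{
    const char *dot = strrchr(forwarder, '.');
    if (dot == NULL || dot == forwarder)
        return NULL;
    size_t len = (size_t)(dot - forwarder);
    if (len >= dll_cap)
        return NULL;
    memcpy(dll_out, forwarder, len);
    dll_out[len] = '\0';
    return dot + 1;
}
```

With a hypothetical export directory at RVA 99080h of size DF00h, an entry of 9E000h would be treated as a forwarder, while 1F000h would be treated as code.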

-> AddressOfNames; RVA [ 9AA28 ] | FileOffset [ 99228 ]

Starts right after the EAT array and contains RVAs to function names; PE-Bear has already resolved the function names for us.

-> AddressOfNameOrdinals; RVA [ 9C3A8 ] | FileOffset [ 9ABA8 ]

Starts right after the ENT and contains the ordinals for the named functions. Each entry pairs with the ENT entry at the same index and gives that name's index into the EAT.

So here is what happens when a binary imports a function from KERNEL32.dll. There are two options:

-> Import by Name: The binary stores the name of the API it needs in its own image. The PE-Loader takes that name and binary-searches the KERNEL32.dll ENT [ Export Names Table ] for it. When found, it takes the name's index in the ENT and reads the ordinal at that same index in the [ Export Name Ordinals Table ] (remember that the ENT and the Export Name Ordinals Table have the same number of entries). THAT ordinal is the INDEX INTO the EAT, so the loader grabs the RVA stored at that index in the EAT, and voilà, we have the address of the API function we need.

Long ride, eh? Luckily for you, the PE-Loader does the whole job.

-> Import by Ordinal: Easy. The PE-Loader subtracts [ Base ] from the ordinal and uses the result directly as the index into the EAT, so there is no need to look up the ENT or the Export Name Ordinals Table.
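
The by-name path can be sketched as a binary search over the sorted name table. This is not how the Windows loader is actually written, just an illustration of the steps: `resolve_by_name` is a made-up helper, and the names are plain C strings here rather than RVAs to strings, which is a simplifying assumption.

```c
#include <stdint.h>
#include <string.h>

/* Binary-searches the sorted name table for `wanted`. On a hit, the ordinal
 * stored at the same index selects the EAT entry. Returns the function RVA,
 * or 0 when the name is absent or the stored index is out of range. */
uint32_t resolve_by_name(const char *const *names, const uint16_t *name_ordinals,
                         uint32_t number_of_names, const uint32_t *functions,
                         uint32_t number_of_functions, const char *wanted)
{
    uint32_t lo = 0, hi = number_of_names;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(wanted, names[mid]);
        if (cmp == 0) {
            uint16_t index = name_ordinals[mid];
            return index < number_of_functions ? functions[index] : 0;
        }
        if (cmp < 0)
            hi = mid;      /* wanted sorts before names[mid] */
        else
            lo = mid + 1;  /* wanted sorts after names[mid] */
    }
    return 0;
}
```

The binary search only works because the name table is sorted, which is why the linker emits it in lexical order.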

So let's make a nice visual using old-school text dashes to tie all these pieces together:

EAT Array Of Function Pointers
--------------------------
NumberOfFunctions = 9
AddressOfFunctions[]   --->     Addr0  Addr1  Addr2  Addr3  Addr4  Addr5  Addr6 Addr7 Addr8
                                  |     |      |                            |           |
                                  |     |      |        ____________________|           |
ENT Array Of RVA to Names         |     |      |       |       _________________________|
-------------------------         |     |      |       |      |
NumberOfNames = 5                 |     |      |       |      |
AddressOfNames[]       --->     name0  name1  name2  name3  name4 
                                  |     |      |       |      |
                                  |     |      |       |      |
EOT Array of Ordinals             |     |      |       |      |
---------------------             |     |      |       |      |
NumberOfNames = 5                 |     |      |       |      | 
AddressOfNameOrdinals[]  ->       0     1      2       6      8 

______________________________________________________

Example: Calculate Function Pointer to [ name3 ]:

index of name3 = 3

Ordinal of name3 == AddressOfNameOrdinals[3] =  6

Function Address == AddressOfFunctions[6] = Addr6

Import Directory

The 2nd entry in the [ DataDirectory ] array, a typical directory entry of type [ IMAGE_DATA_DIRECTORY ] with two members, an [ RVA ] and a [ Size ]. Interestingly enough, this [ RVA ] points to an array of structures, each of type [ IMAGE_IMPORT_DESCRIPTOR ]. There is one structure for each DLL the binary imports from.

What's also interesting is that this array of [ IMAGE_IMPORT_DESCRIPTOR ] ends with a terminator structure whose members are all NULL.

IMAGE_DATA_DIRECTORY * pImportDataDirectory = &pOptionalHeader->DataDirectory[1];
IMAGE_IMPORT_DESCRIPTOR * pImportDescriptors = (IMAGE_IMPORT_DESCRIPTOR *) (pBaseAddress + pImportDataDirectory->VirtualAddress);

#define IMAGE_DIRECTORY_ENTRY_IMPORT         1

typedef struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
	union {
		DWORD Characteristics;
		DWORD OriginalFirstThunk; // RVA to Import Lookup Table (INT)
	} DUMMYUNIONNAME;
	DWORD TimeDateStamp;    
	DWORD ForwarderChain;    
	DWORD Name;              // RVA to the name of the DLL
	DWORD FirstThunk;        // RVA to the IAT, overwritten by the PE-Loader at load time

// for each DLL there is an IMPORT_DESCRIPTOR structure
// FirstThunk -> RVA to this DLL's entries in the IAT, i.e. the RVA of the first import from this DLL
	
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;
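
Because the descriptor array has no explicit count, a parser walks it until it hits the all-zero terminator. Here is a minimal sketch: `ImportDescriptor` and `count_import_descriptors` are illustrative names, the union is flattened to a single field, and the `max_entries` cap is an added safety assumption for malformed files.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal mirror of IMAGE_IMPORT_DESCRIPTOR with the union flattened. */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the INT */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the DLL name string */
    uint32_t FirstThunk;         /* RVA of the IAT */
} ImportDescriptor;

/* Counts the descriptors that come before the all-zero terminator,
 * never reading more than max_entries structures. */
size_t count_import_descriptors(const ImportDescriptor *desc, size_t max_entries)
{
    size_t n = 0;
    while (n < max_entries &&
           (desc[n].OriginalFirstThunk != 0 || desc[n].TimeDateStamp != 0 ||
            desc[n].ForwarderChain != 0 || desc[n].Name != 0 ||
            desc[n].FirstThunk != 0))
        n++;
    return n;
}
```

A reasonable value for the cap is the directory's Size divided by the size of one descriptor (20 bytes).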

IMPORTANT ENTRIES:

-> UNION { Characteristics; OriginalFirstThunk; }; Typically referred to as [ OriginalFirstThunk ]; Characteristics is the old name. Contains an RVA to the INT [ Import Names/Lookup Table ], which is an array of structures of type [ IMAGE_THUNK_DATA ].

// INT Structure
// 64-bit layout (IMAGE_THUNK_DATA64 in winnt.h); the 32-bit variant uses DWORD members

typedef struct _IMAGE_THUNK_DATA {
	union {
		ULONGLONG ForwarderString;    
		ULONGLONG Function;           
		ULONGLONG Ordinal;           
		ULONGLONG AddressOfData;      
	} u1;
} IMAGE_THUNK_DATA, *PIMAGE_THUNK_DATA;

-> Name; RVA to the name of the DLL. For each DLL the binary imports from, there is an [ IMAGE_IMPORT_DESCRIPTOR ] structure describing all of that DLL's exported APIs that the binary imports.

-> FirstThunk; RVA to the IAT [ Import Address Table ], which is an array of pointers to functions. It is also an array of structures of type [ IMAGE_THUNK_DATA ].

// IAT Structure

typedef struct _IMAGE_THUNK_DATA {
	union {
		ULONGLONG ForwarderString;    
		ULONGLONG Function;           
		ULONGLONG Ordinal;           
		ULONGLONG AddressOfData;      
	} u1;
} IMAGE_THUNK_DATA, *PIMAGE_THUNK_DATA;

The IAT also has its own Data Directory entry; other import types, such as Bound and Delay-Load imports, have their own entries too.

#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table

Although [ OriginalFirstThunk ] and [ FirstThunk ] are two different RVAs pointing to different memory addresses, BOTH tables contain the same data on disk. That's because, for a dynamically linked program, imported functions are resolved only at load time: the actual pointers to imported functions are filled in when the binary is loaded into memory. This is memory efficient, since there is only ONE copy of an imported DLL's code in memory, shared between processes. Statically linking everything, by contrast, means carrying a copy of every function you import inside your program, leading to a bloated binary.

There are multiple ways of importing functions from other DLLs, with different trade-offs for speed and optimization. In this episode we will only discuss normal imports, which are resolved at load time, when a dynamically linked binary is loaded into memory.

So for normal imports, the linker fills the IAT and INT with the same data on disk. When the image is loaded into memory, the PE-Loader overwrites the entries in the IAT with the real addresses of the imported functions.
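
That overwrite step can be modeled in a few lines. The sketch below is a toy model, not the real loader: `patch_iat` and `fake_resolve` are made-up names, the resolver just adds a made-up base address instead of doing an export lookup, and both tables are plain arrays of 64-bit entries ending with a zero terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that maps one INT entry to a resolved address. */
typedef uint64_t (*resolve_fn)(uint64_t int_entry);

/* Stand-in resolver for the example: pretends every import lives at a
 * made-up base address plus the value of its INT entry. */
uint64_t fake_resolve(uint64_t int_entry)
{
    return 0x7FF800000000ULL + int_entry;
}

/* Walks the INT until its zero terminator and writes the resolved address
 * into the matching IAT slot. The INT itself is left untouched.
 * Returns the number of entries patched. */
size_t patch_iat(const uint64_t *int_table, uint64_t *iat, resolve_fn resolve)
{
    size_t i = 0;
    for (; int_table[i] != 0; i++)
        iat[i] = resolve(int_table[i]);
    return i;
}
```

After the call, the INT still holds the original RVAs, which is what lets a tool recover import names from a loaded image even though the IAT now holds addresses.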

But before we move on to this, let's view the Import Directory of notepad.exe.

Again keeping in mind the RVA to FileOffset translation that PE-Bear does for us, let's follow the ImportDirectory->RVA to see the array of import structures.

As we see, following the Imports tab leads us to the RVA pointed at by DataDirectory->VirtualAddress, where the Import Descriptor structures are filled in with their members' values.

For each DLL's IMPORT_DESCRIPTOR there is an [ OriginalFirstThunk ] and a [ FirstThunk ]. Let's flash back to their [ IMAGE_THUNK_DATA ] structure:

// IMAGE_THUNK_DATA

typedef struct _IMAGE_THUNK_DATA {
	union {
		ULONGLONG ForwarderString;    
		ULONGLONG Function;           
		ULONGLONG Ordinal;           
		ULONGLONG AddressOfData;      
	} u1;
} IMAGE_THUNK_DATA, *PIMAGE_THUNK_DATA;

So if we follow a DLL's [ OriginalFirstThunk ] or [ FirstThunk ], we find an array of ULONGLONG values. As you might recall, a union in C lets several members of different types share the same memory, with only one of them meaningful at any given time. So the four members of our union are four different interpretations of the same 8 bytes. We have four ULONGLONG members:

-> ForwarderString; If the API is forwarded from another DLL, this points to a literal string naming the API in this format -> [ <DLL_name>.<API_name> ].

-> Function; Real Memory Address to the API.

-> Ordinal; Contains an Ordinal of the imported API.

-> AddressOfData; Contains an RVA to an entry in the hint/names table, a structure of type [ IMAGE_IMPORT_BY_NAME ].

/* Import hint/name entry */

typedef struct _IMAGE_IMPORT_BY_NAME {
	WORD	Hint;
	BYTE	Name[1];
} IMAGE_IMPORT_BY_NAME,*PIMAGE_IMPORT_BY_NAME;
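
To decide which union member applies to a given thunk on disk, a parser looks at the top bit of the value. This sketch assumes a 64-bit image: bit 63 set means import by ordinal (low 16 bits), otherwise the low 31 bits are the RVA of the hint/name entry. The helper names are made up; winnt.h offers IMAGE_SNAP_BY_ORDINAL64 and IMAGE_ORDINAL64 macros for the same purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG64 in winnt.h. */
#define ORDINAL_FLAG64 0x8000000000000000ULL

/* True when the thunk imports by ordinal instead of by name. */
bool thunk_is_ordinal(uint64_t thunk)
{
    return (thunk & ORDINAL_FLAG64) != 0;
}

/* Ordinal number, meaningful only when thunk_is_ordinal() is true. */
uint16_t thunk_ordinal(uint64_t thunk)
{
    return (uint16_t)(thunk & 0xFFFF);
}

/* RVA of the IMAGE_IMPORT_BY_NAME entry, meaningful only for name imports. */
uint32_t thunk_hint_name_rva(uint64_t thunk)
{
    return (uint32_t)(thunk & 0x7FFFFFFF);
}
```

For a 32-bit image the idea is the same, with the flag in bit 31 of a DWORD.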

Now let's look at notepad.exe and explore the entries in the Imports tab, which are the [ IMAGE_IMPORT_DESCRIPTOR ] structures for each DLL we are importing from:

Taking a good, deep look at this screenshot might help you see the difference between the real PE structures and the way a helper tool like PE-Bear or PE-Studio presents the information an analyst needs.

The tab with [ KERNEL32.dll [83 entries] ] that I highlighted in red presents all the information about the APIs imported from each DLL in a way that is easy for an analyst to explore, but a structure like this doesn't actually exist in the file image or in memory in this form. It is a summary of what we might need when looking at the imports of a program.

Now let's move to the juicy stuff, starting with the [ OriginalFirstThunk ] and [ FirstThunk ] that I highlighted in grey. Both hold RVAs to different places in memory, leading us to different FileOffsets, yet they contain the same information, which is the data of the APIs imported from KERNEL32.dll.

So for the table at Mark A) This is the [ IMAGE_IMPORT_DESCRIPTOR ] of KERNEL32.dll filled in, so we have:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {

	DWORD OriginalFirstThunk; // RVA to INT [ 2D560 ]  at FileOffset [ 2C360 ]
	DWORD TimeDateStamp;    
	DWORD ForwarderChain;     
	DWORD Name ;              // RVA to Dll Name [ 2E3A8 ]
	DWORD FirstThunk;         // RVA to IAT [ 268B0 ] at FileOffset [ 256B0 ]
	
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;

Note that PE-Bear also resolves some members for our sake: for example, it resolves the [ Name RVA ], shows the function count for this DLL, and shows whether it is bound or not, which depends on the value of [ TimeDateStamp ], but that is for another talk.

So we have the [ INT ] at RVA [ 2D560 ] and the IAT at RVA [ 268B0 ], and by following each RVA we get two identical arrays at their corresponding FileOffsets. We see at Mark B) that the data in both arrays is the same, and if we look closer at the entries for KERNEL32.dll, we see that for each API the [ OriginalThunk ] and [ Thunk ] hold an identical ULONGLONG value, which is an RVA to yet another structure. Following that, we end up in the imports hint/names table, which holds a hint/name entry for every API imported from KERNEL32.dll.

So for the [ IMAGE_THUNK_DATA ] we have the AddressOfData Member:

 IMAGE_THUNK_DATA {
    ULONGLONG AddressOfData;   // RVA to hint/names Table
};

For each API we have hint/names entry:

/* Import hint/name entry */

typedef struct _IMAGE_IMPORT_BY_NAME {
	WORD	Hint;                  // [ 2B7 ] for current API
	BYTE	Name[1];               // Null-terminated ASCII string starting at Name[0]
                                       // GetProcAddress\x0
} IMAGE_IMPORT_BY_NAME,*PIMAGE_IMPORT_BY_NAME;

Now we have an INT at FileOffset [ 2C360 ] and an IAT array at FileOffset [ 256B0 ], and on disk both hold pointers to the same data, which is the hint/names table at FileOffset [ 2CDD4 ].
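
Once you have the FileOffset of a hint/name entry, reading it from the raw file bytes is straightforward: a little-endian WORD followed by a null-terminated string. Here is a minimal sketch with bounds checks; `read_hint_name` is a made-up helper name, and the caller is assumed to have already translated the RVA to a file offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads the hint/name entry at `offset` inside a file buffer of `size` bytes.
 * Stores the hint in *hint_out and returns a pointer to the name, or NULL
 * when the entry does not fit or the name is not terminated in the buffer. */
const char *read_hint_name(const uint8_t *data, size_t size, size_t offset,
                           uint16_t *hint_out)
{
    if (offset > size || size - offset < 3)
        return NULL;
    /* Hint is a little-endian 16-bit value. */
    *hint_out = (uint16_t)(data[offset] | (data[offset + 1] << 8));
    const char *name = (const char *)&data[offset + 2];
    if (memchr(name, '\0', size - offset - 2) == NULL)
        return NULL;
    return name;
}
```

The bytes B7 02 followed by "GetProcAddress" and a zero byte decode to hint 2B7h and the name GetProcAddress, matching the entry shown above.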

So let's try to summarize this in a visual that helps us picture what is going on at the imports side of a PE binary:

Based on Matt Pietrek's MSDN column on the PE file format


IMAGE_IMPORT_DESCRIPTOR        -> Import Lookup    Hint/Name    -> Import Address
                               |      Table          Table      |      Table
_______________________        |      _____        _________    |   ___________
OriginalFirstThunk(INT) _______|      RVA-A            44       |      RVA-A
                                           ->      GetMessage() |   <-
           .                          _____        _________    |   ___________
                                      RVA-B            72       |      RVA-B
                                           ->      LoadIcon()   |   <-
           .                          _____        _________    |   ___________
Name: imported DLL --> "USER32.dll"   RVA-C            19       |      RVA-C
                                           ->      IsWindow()   |   <-
           .                          _____        _________    |   ___________
First Thunk (RVA to IAT) __                                     | (IAT overwritten by PELoader
                           |                                    | at load-time with real Memory 
                           |                                    |   Addresses)
           .               |                                    |  
                           |                                    |
           .               |                                    |
                           |                                    |
                           |____________________________________|

You can actually make a good exercise out of this: watch a binary's IAT and INT on disk versus in memory after it's loaded.

This is it for now, next time we will go through the rest of Importing types like Bound Imports and Delay-Loaded Imports.
