Disassemble that binary
This article and the next few serve as a refresher on basic and core concepts in Assembly Language and in Reverse Engineering C Code Constructs, touching on important low-level Computer Architecture fundamentals.
References
A Program's Journey Under Execution
A process is the active state of a program as it is laid out in memory, "active" meaning the state of the program while it is under execution. Preparing and loading the program into memory and managing its execution is done by the Windows Loader. We'll touch on the steps involved.
First
Your intent and action of double-clicking an application makes the OS warm up: it gets a copy of that program, allocates some memory for it and places this copy, aka the Program Image, in RAM. As we know, RAM is where active programs are placed, and by active I mean programs under execution.
Second
Now the program is in RAM, or let's say Main Memory, and the OS gives some attention to the newly arrived program: it tidies up an Address Space in which the program's routines are placed and executed.
This is where we find the 3 segments .text, .data and .bss. If you have practiced writing some assembly in any dialect, you have had to define these 3 main segments; the system uses them to organize the program in RAM:
-> .text: where the machine instructions, aka Code, are put. This is where the IP (Instruction Pointer) fetches instructions from, including when it suddenly jumps in the middle of execution to run another piece of code and then returns to its normal flow. .text is readable/executable only: you cannot write into this segment (you no longer can).
-> .data: where your initialized global/static variables reside. .data is writable, and its contents change at runtime.
-> .bss: where uninitialized global/static variables are located. All of them are initialized to zero.
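To see the three segments in practice, here is a minimal C sketch (variable and function names are hypothetical) annotated with where a typical compiler places each kind of object. Section naming and merging vary by toolchain and platform (MSVC, for instance, may fold uninitialized data into .data), so treat the comments as the classic layout rather than a guarantee; on Linux, nm on the compiled object lists such symbols with types D/d (data), B/b (bss) and T/t (text).

```c
#include <assert.h>

int counter = 42;          /* initialized global           -> .data */
static int hits;           /* uninitialized static         -> .bss  */
int table[256];            /* uninitialized global array   -> .bss  */

/* The compiled machine code of this function is placed in .text. */
int next_id(void)
{
    static int id = 100;   /* initialized static local     -> .data */
    hits++;                /* .bss data is writable at runtime      */
    return id++;
}

int get_hits(void) { return hits; }

/* Holds at program start: .data keeps its initializers, .bss is zero-filled. */
void check_initial_layout(void)
{
    assert(counter == 42);
    assert(hits == 0);
    for (int i = 0; i < 256; i++)
        assert(table[i] == 0);
}
```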
The OS will then set up The Stack and The Heap:
The Stack: a data structure that is perfect for temporary data storage, where you can PUSH data for manipulation, calculations, etc. and POP it when you're done. It is designed in a LIFO manner, specifically for program flow, so routines can call other subroutines. The Stack is marked non-executable and read/write (RW), meaning that if the IP points somewhere on the stack, you will get a "trying to execute non-executable memory" kind of error. Recall the .text segment? This is where executable code should be located, and the IP fetches instructions from there.
The Heap: less structured memory that, in the classic layout, neighbors The Stack and grows in the opposite direction. This is where C functions like malloc(), free(), etc. work: you can freely allocate and free memory dynamically in the Heap. The Heap is readable and writable, i.e. a dynamic memory region.
Now your program is laid out perfectly in RAM.
Long story short: the stack grows toward address zero, and by PUSHing data onto it you decrement the stack pointer by the data size. "The stack grows down to lower addresses", "the stack grows up to lower addresses" (when diagrams draw address zero at the top), "the stack shrinks or grows": these are all mental visualizations describing the same thing.
This is the very basic program layout in memory for a single Program Image, and there are many of these in memory. Add more regions for the DLLs loaded with the program and space for Kernel-Land, and you can pretty much say this is all of it. There is a reason for The Stack and The Heap neighboring each other this way, but we will not get into it here.
Third
Magic..
Remember, Remember, X86 | X86-64
I cannot stress this enough: this is important. If we want to hack into binaries, or know why our binary was hacked, we need to know what a binary is. I will go through this topic and review it in later episodes until we are really comfortable with it. Now, even if you couldn't care less:
X86 | X86-64 Architecture, Registers and Data Types:
X86 architecture can operate in two main modes:
Real mode: the mode the processor is in when it is first powered on; it runs only 16-bit code.
Protected mode: the mode in which modern operating systems operate (64-bit long mode builds on it). It is the processor state in which VM (Virtual Memory / Paging, etc.) is supported.
X64 / X86-64: extends the X86 architecture with a 64-bit instruction set, still using variable-length instructions. For more on instruction sets and different computer architectures, go here.
Now we need to go over some important stuff about X86 / X64. I suggest you go over the basics, with all the types of instructions, etc., using this resource. This takes time, but as I said, we need to remember how important it is.
Registers:
Registers are the CPU's basic storage units, mainly there to save time so the CPU doesn't need to access RAM. They include the GPRs (General Purpose Registers), Control Registers, Segment Registers and EFLAGS. X86 has 8 GPRs, some of which can be further divided into 16-bit and 8-bit registers.
GPRs are often used for specific operations, but you can use them as you want. Here are some conventions:
AX → Accumulator Register, stores the Return Value of a function if the return value doesn't exceed Register Size.
BX → Base Register, Contains a Pointer to Data.
CX → Counter Register, used as a counter in loops and in REP-prefixed string operations; CL also holds the count for shifts and rotates.
DX → Data Register, used for I/O and arithmetic.
AX, BX, CX and DX can each be divided into two 8-bit registers: AH/AL, BH/BL, CH/CL, DH/DL.
SP | BP → Pointer Registers, used to store stack addresses: the Stack Pointer points to the top of the stack, and the Base Pointer points to the base of the current stack frame.
SI | DI → Index Registers, used to point to data as Source Index | Destination Index respectively, to load from/write to memory during string operations.
GPRs are extended to 32-bit registers (EAX, EBX, ..., ESP, EBP, ESI, EDI) and 64-bit registers (RAX, RBX, ...).
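The way one register splits into narrower views can be modelled in C with casts and shifts. This is a sketch, not real register access: a uint64_t stands in for RAX, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A uint64_t stands in for RAX; each helper returns one narrower view of it. */
uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint16_t ax_of(uint64_t rax)  { return (uint16_t)rax; }        /* bits 15..0 */
uint8_t  ah_of(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */
uint8_t  al_of(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */
```

So for RAX = 0x1122334455667788, EAX is 0x55667788, AX is 0x7788, AH is 0x77 and AL is 0x88.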
Segment Registers are 6 in total (CS, DS, SS, ES, FS, GS). Their original purpose is to locate the segments of the binary being executed, though their use has been extended nowadays and they are not really used solely for this purpose.
CS → points to the Code Segment, which contains all the instructions to be executed.
DS → points to the Data Segment.
SS → points to the Stack Segment, which contains the return addresses of subroutines and procedures.
ES, FS, GS → Extra Segment registers, filled with data by the Operating System; e.g. on 32-bit Windows FS points to the current thread's TEB (from which the PEB and the exception handler chain are reached), while 64-bit Windows uses GS for the TEB.
Data Types:
X86: an instruction like mov eax, 0x666 will store the value as a 32-bit dword, extended with 0s (0x00000666); for other sizes we can use explicit size overrides (byte, word, dword) and differently sized registers (AL, AX, EAX).
X64: yes.. 64-bit qwords.
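Writes behave differently from reads on x86-64, which is worth internalizing early. The helpers below (hypothetical names, with a uint64_t standing in for RAX) model what a MOV of an immediate into each sub-register does to the full 64-bit register: a 32-bit write zero-extends into the upper half, while 16- and 8-bit writes leave the other bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* mov eax, imm32: the upper 32 bits of RAX are cleared (zero-extension). */
uint64_t mov_eax(uint64_t rax, uint32_t imm) { (void)rax; return imm; }

/* mov ax, imm16: only bits 15..0 change. */
uint64_t mov_ax(uint64_t rax, uint16_t imm) { return (rax & ~0xFFFFull) | imm; }

/* mov al, imm8: only bits 7..0 change. */
uint64_t mov_al(uint64_t rax, uint8_t imm) { return (rax & ~0xFFull) | imm; }

/* mov ah, imm8: only bits 15..8 change. */
uint64_t mov_ah(uint64_t rax, uint8_t imm)
{
    return (rax & ~0xFF00ull) | ((uint64_t)imm << 8);
}
```

Starting from RAX = 0x1122334455667788, mov eax, 0x666 leaves RAX = 0x0000000000000666, while mov ax, 0x666 leaves RAX = 0x1122334455660666.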
Instruction Set: Memory Manipulation | Arithmetic Operations:
Data Movement is classified into five general methods:
→ Immediate to Register
→ Immediate to Memory
→ Memory to Register / Register to Memory
→ Memory to Memory
→ Register to Register
Let's look at these data movement instructions. In C words, the equivalent of the ASM square brackets [ ] is dereferencing a pointer, e.g.:
Let's look at this example in pseudo C:
Now translating the ASM instructions to pseudo C:
[Base + offset] Memory Access
This form is commonly used to access structure members or data buffers, where the offset is either an immediate or a register.
Now we need to give some attention to instruction 05: the value to be written is meant to be stored in 32 bits, as specified by the dword size override. This single write therefore overwrites the first three fields of the structure, "type", "importance" and "number", which are 1, 1 and 2 bytes in size.
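The [Base + offset] form maps directly to structure member access in C. The sketch below uses a hypothetical structure with the 1, 1 and 2-byte fields described above (plus a hypothetical extra field to show a nonzero offset) and reproduces the effect of a single dword store at offset 0. It assumes a little-endian x86 host, so the low byte of the stored value lands in type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure matching the field sizes described above. */
struct record {
    uint8_t  type;        /* offset 0, 1 byte  */
    uint8_t  importance;  /* offset 1, 1 byte  */
    uint16_t number;      /* offset 2, 2 bytes */
    uint32_t value;       /* offset 4, hypothetical extra field */
};

/* C equivalent of  mov dword [ebx], imm : one 4-byte store at [base + 0]. */
void store_dword_at_base(struct record *r, uint32_t imm)
{
    assert(offsetof(struct record, number) == 2);
    memcpy((unsigned char *)r + 0, &imm, sizeof imm);  /* covers type, importance, number */
}

/* C equivalent of  mov eax, [ebx+4] : a load from [base + offset]. */
uint32_t load_at_offset4(const struct record *r)
{
    uint32_t v;
    memcpy(&v, (const unsigned char *)r + 4, sizeof v);
    return v;
}
```

Storing 0x00020101 this way sets type = 1, importance = 1 and number = 2 in one instruction, which is exactly why a single dword write can initialize three fields at once.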
[Base + Index * scale] Memory Access
This form is used to access array-type objects: the base register holds the start address of the array, the index counts over the elements, and the scale is the size in bytes of each element (1, 2, 4 or 8).
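The [Base + Index * scale] form is exactly C array indexing. This sketch computes the element address by hand to make base, index and scale explicit; the function names are hypothetical, and the exact instruction a compiler emits for the loop (something like add eax, [ebx + ecx*4]) will vary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* [base + index * scale] with scale = sizeof(int32_t) = 4. */
const int32_t *element_addr(const int32_t *base, size_t index)
{
    uintptr_t addr = (uintptr_t)base + index * sizeof *base;
    return (const int32_t *)addr;
}

/* Sums an array by walking it with base/index/scale addressing. */
int32_t sum(const int32_t *base, size_t count)
{
    assert(base != NULL || count == 0);
    int32_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += *element_addr(base, i);
    return total;
}
```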
Copy String/Memory between two memory locations:
-> MOVSB, MOVSW, MOVSD: move 1, 2 or 4 bytes respectively from memory to memory. MOVS implicitly uses ESI as the Source Index and EDI as the Destination Index. It does not change DF, the "Direction Flag"; it is controlled by it: if DF=0, both ESI and EDI are incremented after each move, and if DF=1, both are decremented. The update value is equal to the size moved by MOVS, so it is either ±1, 2 or 4 bytes.
In some cases MOVS is accompanied by the REP prefix, which repeats the instruction using ECX as a counter, decrementing it after each move until it reaches zero.
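To make the ESI/EDI/ECX/DF interplay concrete, here is a sketch of REP MOVSB run on a tiny simulated machine. The struct and function names are hypothetical; memory is a flat byte array indexed by 32-bit "addresses", so the registers can wrap around just as they would on the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* A tiny simulated machine: flat memory plus the registers MOVS relies on.
   "Addresses" are indices into mem; all names are hypothetical. */
struct cpu {
    uint8_t  mem[64];
    uint32_t esi, edi, ecx;
    int      df;              /* Direction Flag: 0 = increment, 1 = decrement */
};

/* MOVSB: copy the byte at [ESI] to [EDI], then step both by 1 according to DF. */
static void movsb(struct cpu *c)
{
    assert(c->esi < sizeof c->mem && c->edi < sizeof c->mem);
    c->mem[c->edi] = c->mem[c->esi];
    int step = c->df ? -1 : 1;
    c->esi += step;           /* unsigned arithmetic wraps like a 32-bit register */
    c->edi += step;
}

/* REP MOVSB: while ECX != 0, perform one MOVSB and decrement ECX. */
void rep_movsb(struct cpu *c)
{
    while (c->ecx != 0) {
        movsb(c);
        c->ecx--;
    }
}
```

With DF = 1, ESI and EDI start at the last byte of each buffer and the copy runs backwards, which is what lets an overlapping copy toward higher addresses avoid clobbering its own source.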
Now is the perfect spot to talk about MOV and LEA, with instructions 01-02. LEA has the format lea destination, source; it doesn't access memory, it just calculates a memory address and puts it in the destination. It is not used exclusively for memory addresses, though: LEA comes in handy for calculating values without accessing memory, resulting in fewer instructions.
There is a huge difference between mov eax, [ebx+4], which reads the 4 bytes stored at address ebx+4 into EAX, and lea eax, [ebx+4], which puts the number ebx+4 itself into EAX.
Scan String in Memory:
-> SCASB, SCASW, SCASD: compare AL/AX/EAX respectively against 1, 2 or 4 bytes of data in memory at address EDI, setting the flags accordingly. EDI is then automatically incremented/decremented depending on the DF bit.
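REPNE SCASB has long been used as an inline strlen. The sketch below emulates that idiom step by step in C, assuming DF = 0 (forward scanning); the function name is hypothetical, and the final computation mirrors the not ecx / dec ecx pair you will often find right after the scan.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates the classic inline strlen, assuming DF = 0:
     AL  = 0      byte to look for (the terminator)
     EDI = str    scan position
     ECX = -1     "no limit"
   SCASB compares AL with [EDI] and increments EDI; REPNE repeats while
   ZF = 0 (bytes differ) and ECX != 0, decrementing ECX each time. */
size_t scasb_strlen(const char *edi)
{
    assert(edi != NULL);
    const char al = '\0';
    size_t ecx = (size_t)-1;
    int zf = 0;
    while (ecx != 0 && !zf) {
        zf = (al == *edi);   /* SCASB: set ZF if AL == [EDI] */
        edi++;               /* DF = 0, so EDI moves forward */
        ecx--;               /* REPNE decrements ECX */
    }
    /* ECX was decremented once per byte, terminator included:
       length = ~ECX - 1 (the "not ecx / dec ecx" pair). */
    return ~ecx - 1;
}
```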
Compare String to String in Memory:
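-> CMPSB, CMPSW, CMPSD: compare 1, 2 or 4 bytes of data in memory at ESI (Source) with the data at EDI (Destination), in other words the two strings are the operands, and set the flags accordingly; both ESI and EDI are then updated according to DF. With the REPE/REPNE prefixes, this compares whole strings.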
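In the same spirit, REPE CMPSB behaves like memcmp. A sketch assuming DF = 0, with a hypothetical function name; it returns only -1, 0 or 1, whereas memcmp may return any negative or positive value.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates REPE CMPSB, assuming DF = 0: ESI and EDI point at the two buffers
   and ECX holds the byte count. CMPSB compares [ESI] with [EDI] and increments
   both; REPE keeps going while the bytes are equal (ZF = 1) and ECX != 0. */
int cmpsb_memcmp(const unsigned char *esi, const unsigned char *edi, size_t ecx)
{
    assert(ecx == 0 || (esi != NULL && edi != NULL));
    while (ecx != 0) {
        unsigned char a = *esi++;
        unsigned char b = *edi++;
        ecx--;
        if (a != b)                /* ZF = 0: REPE stops here */
            return a < b ? -1 : 1; /* CF (unsigned compare) tells which is smaller */
    }
    return 0;                      /* ECX ran out with every byte equal */
}
```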
Store String in Memory:
→ STOSB, STOSW, STOSD: store 1, 2 or 4 bytes from AL/AX/EAX respectively to the destination address EDI; EDI is then updated according to the value of the DF flag.
With the REP prefix, it is commonly used to initialize a buffer to a constant value.
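REP STOSB is the classic inline memset. This sketch, assuming DF = 0 and using a hypothetical function name, spells out the AL, EDI and ECX roles.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates REP STOSB, assuming DF = 0:
   AL = fill byte, EDI = destination, ECX = count.
   STOSB writes AL to [EDI] and increments EDI; REP repeats ECX times. */
void *stosb_memset(void *dst, int value, size_t count)
{
    assert(dst != NULL || count == 0);
    unsigned char *edi = dst;
    unsigned char al = (unsigned char)value;
    size_t ecx = count;
    while (ecx != 0) {
        *edi++ = al;   /* stosb */
        ecx--;         /* rep   */
    }
    return dst;
}
```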
Load String from Memory:
→ LODSB, LODSW, LODSD: from the same family, used to load 1, 2 or 4 bytes of data from memory at source address ESI into AL/AX/EAX respectively; ESI is then updated according to DF.
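LODS is usually paired with a loop that processes each loaded value. This sketch, assuming DF = 0, uses a simple 8-bit additive checksum as the loop body; both the checksum and the function name are hypothetical, chosen only to give AL something to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A LODSB loop, assuming DF = 0: each LODSB loads [ESI] into AL and
   increments ESI; the loop body then works on AL. */
uint8_t lodsb_checksum(const uint8_t *esi, size_t ecx)
{
    assert(esi != NULL || ecx == 0);
    uint8_t sum = 0;
    while (ecx-- != 0) {
        uint8_t al = *esi++;   /* lodsb */
        sum += al;             /* 8-bit add, wraps around modulo 256 */
    }
    return sum;
}
```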
As you can clearly see, this seems to never end, and yes, there is a plethora of instructions for memory manipulation, all of which can be used to implement the functionality of C/C++ memory manipulation functions. C has a very thin line separating it from machine code, if we consider assembly as an artificial representation that just gives us a human-readable form of machine code (and assembly has many flavors, each built for a different system). Because of this fine line, C is considered very dangerous and powerful when dealing with memory, depending of course on which side you are on :). This is why over time we got high-level languages that put a good separation between the developer and the memory/machine, making it harder to blindly destroy stuff (although we end up doing it anyway); they are considered pretty much safer, easier and more scalable.
Instruction Set: Stack Operations | Function Invocation:
Yep, this is where most crimes begin.. suicides too. It's the playground of functions, data and instructions. Again, this is very important to grasp: once you are familiar with it (and by familiar I mean looking at it no longer makes you want to jump right out of the window), the inner workings of programs will get crystal clear. Now let's begin with the playground: The Stack.
In computer science, a call stack is a data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or machine stack, and is often shortened to just The Stack. The Stack has 3 primary tasks: Passing Function Parameters, Local-Data Storage, Storing Return Addresses.
Stack Layout:
Because The Stack is a LIFO structure, it has two instructions to literally PUSH and POP data, and they are named exactly that: PUSH and POP. That memory region is pointed to by the stack pointer ESP. ESP is very dynamic, as it always points to the top of the stack: when we PUSH new data, ESP gets decremented (recall, the stack grows toward address zero), and when we POP data off the stack, ESP gets incremented. In 32-bit code ESP is updated by ±4 bytes, or by ±2 bytes with an operand-size override prefix.
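The PUSH/POP mechanics can be simulated in a few lines of C. The sketch below models a toy 32-bit stack as a byte array whose indices act as addresses (all names are hypothetical); it shows ESP moving toward lower addresses on push and back up on pop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SIZE = 64 };

/* Toy 32-bit stack: indices into mem act as addresses. */
struct stack32 {
    uint8_t  mem[STACK_SIZE];
    uint32_t esp;   /* address of the current top of stack */
};

/* An empty stack: ESP starts at the highest address. */
void stack_init(struct stack32 *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_SIZE;
}

/* push: ESP -= 4, then store the dword at [ESP]. */
void push32(struct stack32 *s, uint32_t value)
{
    assert(s->esp >= 4);              /* otherwise: stack overflow */
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* pop: load the dword at [ESP], then ESP += 4. */
uint32_t pop32(struct stack32 *s)
{
    assert(s->esp + 4 <= STACK_SIZE); /* otherwise: stack underflow */
    uint32_t value;
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Pushing two dwords onto the empty 64-byte stack moves ESP from 64 to 56, and popping them returns the values in reverse order while ESP climbs back to 64.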
Stack Frames and Function Calls:
What happens in Vegas, stays in Vegas
Since we are still mentally sound, this topic should be a soft blow. A stack frame is the region of The Stack that holds a subroutine's state information: its local variables, its return address and its caller's base address. Frames are created and destroyed in LIFO order.
Local variables are local to the function being called: they are the variables you define inside a function, aka local scope. The return address is simply where to go when the function's execution is finished; this return address is handed to the great EIP register. EIP flows in sequential order, but it can suddenly be handed a different location when a subroutine/function call pops up. With each subroutine call, a dedicated stack frame is initialized holding its parameters and local variables. If you're lucky, this could be it, but most likely we are not lucky and we will find ourselves inside an inception, with nested function calls and nested stack frames. Interestingly enough, each subroutine's stack frame contains its caller's base address, so as not to lose mommy.
Let's dissect the inception by implementing this function:
-> First
Where we at?
The program's execution flow starts when a function is called. Functions and their local variables are evaluated at runtime, which is very logical and makes sense, and the structure that keeps track of this is called The Call Stack. You start by evaluating the parameters passed with the function call (if it has any), then you save the return address (the location right after where the function was called) and jump to the function's location/definition. You evaluate the function's local variables (if there are any), execute the function, evaluate and save its return value (if there is any), then go back to resume execution at the return address you saved. Of course, it's not you who does all this stuff, but we are keeping a close eye on it.
-> Second
The Function's Prologue
Let's look at this from the eyes of the stack, with EBP:ESP as our main focus. At line 03 there is a sudden function call: we jump to the location of addme() and initialize a stack frame for this subroutine (it's a subroutine because it is called from inside the main() function). First, the location of the instruction right after the call, aka the return address, is saved, and EIP is given the new location where the subroutine addme() lies, so it can execute. The return address is saved on the stack by PUSHing it; the call instruction at line 03 does both of these steps. Next we set up a base frame for our subroutine, which is basically the address from which we will reference our parameters and local variables; in other words, our stack frame's Home Page. We have two candidates for this task, EBP and ESP, as they are the main registers for stack manipulation. The winner is EBP: since ESP is very dynamic and changes with each POP/PUSH, EBP will be used as this reference/home-page/base-frame. We set EBP to the top of the stack (the value of ESP), so we can PUSH/POP data and still have a pivot to reference from; this is done by the instruction at line 16. Clearly EBP is used as every stack frame's base frame/home page, which means the current EBP holds our caller's pivot, so we need to save/preserve it before we use EBP as the base frame of our new stack frame. This is done by PUSHing EBP onto the stack before setting it as our base frame, with the instruction at line 15.
Obviously, The Stack is designed to efficiently store temporary data at runtime.
-> Third
The inner workings of the subroutine: initializing local variables (if there are any), executing the function, etc. The local variables are referenced relative to EBP, the base frame, at an offset, with the access size dictated by a size override, in lines 18-19. The caller's EBP is PUSHed right after the return address as the start of our stack frame construction, and EBP is then pointed at it, so EBP with offset 0 holds the saved EBP and [EBP+4] holds the return address. Any local variable placed after that is referenced with a - (minus) offset (recall, the stack grows toward zero). That's one thing; the function's parameters passed with the function call are another. These are pushed before the call instruction at line 03, in lines 01-02; this is a logical step in the call stack, evaluating the parameters passed to a function before actually jumping to execute it. Hence you'll find that a subroutine's parameters are referenced with a + (plus) offset relative to EBP, starting at [EBP+8] in 32-bit code.
The subroutine's return value (if there is any) is placed in EAX.
-> Fourth
The Function's Epilogue
Now that we are done executing the subroutine, we move everyone back to their place. With the instruction at line 22 we set ESP to the value of our base frame EBP; this automatically discards whatever local variables were PUSHed onto the stack, so we have a clean slate. Then at line 23, we load the SAVED (PUSHed) value of our caller's base frame back into EBP. Finally, ret pops the saved return address into EIP. This means we are now awake from an inception, ready to continue living (POPing/PUSHing local variables) in our caller's stack frame, or maybe to initiate a new one.
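The whole walkthrough can be replayed on a toy machine. The sketch below assumes addme(a, b) takes two 32-bit integers and returns their sum (a hypothetical signature) and follows the cdecl sequence described above: arguments pushed right to left, call, the push ebp / mov ebp, esp prologue, one local at [ebp-4], parameters at [ebp+8] and [ebp+12], the mov esp, ebp / pop ebp epilogue, ret, and finally the caller's add esp, 8. The register and memory model is hypothetical, not a real emulator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit machine: byte-addressed stack memory plus a few registers. */
struct vm {
    uint8_t  mem[128];
    uint32_t esp, ebp, eax, eip;
};

static uint32_t rd(const struct vm *m, uint32_t addr)
{
    uint32_t v;
    assert(addr + 4 <= sizeof m->mem);
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

static void wr(struct vm *m, uint32_t addr, uint32_t v)
{
    assert(addr + 4 <= sizeof m->mem);
    memcpy(&m->mem[addr], &v, 4);
}

static void push(struct vm *m, uint32_t v) { assert(m->esp >= 4); m->esp -= 4; wr(m, m->esp, v); }
static uint32_t pop(struct vm *m) { uint32_t v = rd(m, m->esp); m->esp += 4; return v; }

/* Callee: hypothetical addme(a, b) returning a + b. */
static void addme(struct vm *m)
{
    push(m, m->ebp);                  /* prologue: push ebp (save caller's base frame) */
    m->ebp = m->esp;                  /*           mov ebp, esp                        */
    m->esp -= 4;                      /*           sub esp, 4 (room for one local)     */
    uint32_t a = rd(m, m->ebp + 8);   /* parameters sit at + offsets: [ebp+8]          */
    uint32_t b = rd(m, m->ebp + 12);  /*                              [ebp+12]         */
    wr(m, m->ebp - 4, a + b);         /* the local sits at a - offset: [ebp-4]         */
    m->eax = rd(m, m->ebp - 4);       /* return value goes in EAX                      */
    m->esp = m->ebp;                  /* epilogue: mov esp, ebp (discard locals)       */
    m->ebp = pop(m);                  /*           pop ebp (restore caller's frame)    */
    m->eip = pop(m);                  /*           ret (return address -> EIP)         */
}

/* Caller side of  addme(a, b)  under cdecl; returns EAX. */
uint32_t call_addme(struct vm *m, uint32_t a, uint32_t b, uint32_t return_address)
{
    push(m, b);                       /* arguments pushed right to left */
    push(m, a);
    push(m, return_address);          /* call: push the return address, then jump */
    addme(m);
    m->esp += 4 * 2;                  /* cdecl: caller cleans up (add esp, 8) */
    return m->eax;
}
```

After the call, ESP and EBP hold exactly what they held before it and EIP holds the return address, which is the whole point of the prologue/epilogue pair plus the caller's cleanup.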
Calling Conventions | Function Invocation:
The way a function receives its parameters and returns its value is dictated by calling conventions. A calling convention is a set of rules dictating how function calls work at the machine level; it is defined by the Application Binary Interface (ABI) of a particular system. That explains the instruction at line 04 (right after the function call): the caller cleans the parameters off the stack, which means this function call follows the CDECL calling convention.
It's always good to keep references to have your back. This is the instruction reference for x86 and amd64, with all the instructions to look up. This is the "Call Stack" logic, the high-level implementation of the stack frame inception.
X64
X64 is an extension of X86, so it has most of the architecture properties with minor differences.
Registers, Data Types and Arithmetic Operations:
This time we have 16 64-bit GPRs with an R prefix (RAX, RBX, ..., plus R8-R15). RBP can still be used as the base frame pointer, with local variables referenced relative to it, yet X64 code often treats RBP as just another GPR and references local variables relative to RSP.
-> X64 supports the concept of RIP-relative addressing.
-> Most instructions default to a 32-bit operand size; a REX.W prefix selects 64-bit operands. Writing a 32-bit register (e.g. EAX) automatically zero-extends the result into the full 64-bit register (RAX), while writing an 8- or 16-bit register leaves the upper bits untouched.
Registers:
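X64 has brought 8 additional registers, R8-R15:
R8-R11 -> considered volatile (caller-saved): a called function may overwrite them, so their values can't be relied upon after a call.
R12-R15 -> non-volatile (callee-saved): a function that uses them must save them first and restore them before returning.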
Function Invocation:
Most x64 calling conventions pass parameters through registers:
Windows x64 has a single calling convention: the first 4 parameters are passed through RCX, RDX, R8, R9, and the remaining parameters are passed through the stack from right to left.
Linux x64 (System V) passes the first 6 parameters through the registers RDI, RSI, RDX, RCX, R8, R9, and the rest on the stack.
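The two register orders can be captured as lookup tables. A sketch covering integer and pointer arguments only (floating-point arguments go in XMM registers and are ignored here); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum reg { REG_RDI, REG_RSI, REG_RDX, REG_RCX, REG_R8, REG_R9, REG_STACK };

/* Integer/pointer argument registers, in order, for each convention. */
static const enum reg win64_args[] = { REG_RCX, REG_RDX, REG_R8, REG_R9 };
static const enum reg sysv_args[]  = { REG_RDI, REG_RSI, REG_RDX, REG_RCX, REG_R8, REG_R9 };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Where the n-th (0-based) integer argument goes; REG_STACK once registers run out. */
enum reg win64_arg(size_t n) { return n < COUNT(win64_args) ? win64_args[n] : REG_STACK; }
enum reg sysv_arg(size_t n)  { return n < COUNT(sysv_args)  ? sysv_args[n]  : REG_STACK; }
```

For a call like f(a, b, c, d, e), Windows x64 puts a-d in RCX, RDX, R8, R9 and e on the stack, while System V puts all five in RDI, RSI, RDX, RCX, R8.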
That almost concludes THIS discussion about X86 - X64. I cannot promise I won't revisit it; it has to happen, because the easier it gets to read, understand and spot Code Constructs, the faster we can spot weaknesses and understand malware's functionality.
There is more to it than meets the eye
Endianness:
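As a quick illustration of endianness: x86 and x64 are little-endian, storing the least significant byte at the lowest address, while big-endian (network byte order) stores the most significant byte first. The sketch below lays out a value both ways using shifts, so it behaves the same on any host; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian (x86/x64): least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Big-endian (network byte order): most significant byte first. */
void store_be32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Rebuilds a dword from the bytes a hex dump of little-endian memory shows. */
uint32_t load_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] | (uint32_t)in[1] << 8 |
           (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
}
```

So 0x11223344 appears in an x86 memory dump as 44 33 22 11, and the dword written by mov eax, 0x666 shows up as 66 06 00 00.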
Virtual Memory
An Abstraction of physical memory (RAM) functionality created by the Operating System for all running processes.
Each process running in memory has the illusion of having its own address space; well, it's not really an illusion, it's more of a Hardware-OS magic. X86 supports the concept of privilege separation through an abstraction called Ring Level. In User-Mode, aka Ring 3, each application runs in a user-mode process that comes with its own private virtual address space and handle table. In Kernel-Mode, aka Ring 0, all code shares a single virtual address space.
Each process is given the illusion of a fixed amount of virtual memory, depending on the architecture:
-> for 32-bit: 2^32 bytes of addressable memory, ~4GB, so the addressable range of a 32-bit process is 0x00000000 - 0xFFFFFFFF.
-> for 64-bit: in theory 2^64 bytes (0x0000000000000000 - 0xFFFFFFFFFFFFFFFF); in practice current x86-64 CPUs implement 48-bit (or, with 5-level paging, 57-bit) virtual addresses.
Virtual Address Translation:
Memory addresses are divided into Physical Addresses and Virtual Addresses. When Paging is enabled, instructions use Virtual Addresses, while Physical Addresses are the actual memory locations used when the processor accesses memory. The translation is done by the MMU (Memory Management Unit), which translates every virtual address into a physical address before the access happens. CR0-CR4 are Control Registers used for memory paging and hardware virtualization. DR0-DR7 are Debug Registers used for setting hardware breakpoints: only DR0-DR3 can hold breakpoint addresses, DR6 reports debug status, DR7 holds the control settings, and DR4-DR5 are reserved.
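For classic 32-bit x86 paging (4 KiB pages, no PAE), the MMU splits a virtual address into a page directory index, a page table index and a page offset, with CR3 holding the physical address of the page directory. A sketch of that split, with hypothetical names; PAE and 64-bit paging use more levels and different bit widths.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a 32-bit virtual address under classic x86 paging. */
struct va_parts {
    uint32_t pde_index;   /* bits 31..22: entry in the page directory (CR3 points to it) */
    uint32_t pte_index;   /* bits 21..12: entry in the page table                       */
    uint32_t offset;      /* bits 11..0 : byte offset inside the 4 KiB page             */
};

struct va_parts split_va(uint32_t va)
{
    struct va_parts p;
    p.pde_index = va >> 22;
    p.pte_index = (va >> 12) & 0x3FF;
    p.offset    = va & 0xFFF;
    return p;
}

/* The final step: physical = page frame base (taken from the PTE) + offset. */
uint32_t to_physical(uint32_t frame_base, uint32_t va)
{
    assert((frame_base & 0xFFF) == 0);   /* frames are 4 KiB aligned */
    return frame_base | (va & 0xFFF);
}
```

For example, 0x00401234 splits into directory index 1, table index 1 and offset 0x234; if that page maps to the frame at 0x00ABC000, the physical address is 0x00ABC234.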
Interrupts and Exceptions:
Interrupts -> Hardware Interrupts are caused by hardware devices and are asynchronous by nature. An interrupt can be thought of as being associated with a number that is an index into an array of function pointers: when an interrupt is received, the CPU executes the function at the index associated with the interrupt, then continues execution wherever it was before the interrupt. Software Interrupts may be intentionally caused by executing a special instruction which, by design, invokes an interrupt when executed. Such instructions function similarly to subroutine calls and are used for a variety of purposes, such as requesting operating system services and interacting with device drivers.
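The array-of-function-pointers picture translates almost literally into C. This toy dispatcher, loosely modelling how the CPU finds a handler through the Interrupt Descriptor Table, uses hypothetical handler names and vector numbers; real handlers are installed and invoked by the CPU and OS, not by a C function call.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*isr_t)(void);

static int timer_ticks;
static int keys_pressed;

static void timer_isr(void)    { timer_ticks++; }
static void keyboard_isr(void) { keys_pressed++; }

/* Hypothetical vector numbers; the x86 IDT has 256 entries. */
enum { VECTOR_TIMER = 32, VECTOR_KEYBOARD = 33, NUM_VECTORS = 256 };

/* The interrupt number indexes this table of handlers. */
static isr_t vector_table[NUM_VECTORS] = {
    [VECTOR_TIMER]    = timer_isr,
    [VECTOR_KEYBOARD] = keyboard_isr,
};

/* "Raise" interrupt n: look up the handler and call it, then return to the
   interrupted code. Returns 0 if no handler is installed. */
int raise_interrupt(unsigned n)
{
    assert(n < (unsigned)NUM_VECTORS);
    if (vector_table[n] == NULL)
        return 0;
    vector_table[n]();
    return 1;
}

int ticks(void) { return timer_ticks; }
int keys(void)  { return keys_pressed; }
```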
Exceptions -> caused by instructions, and divided into two categories: Faults and Traps. A Fault occurs when the processor executes an instruction that raises a correctable exception, e.g. a Page Fault: the processor saves the current execution state and hands control to the exception handler (for a page fault, the OS brings the missing page in), then the faulting instruction is re-executed. A Trap is issued when a program needs servicing from the OS by executing a special kind of instruction: the processor executes the system call handler and then resumes execution right after the trapping instruction.
Breakpoints -> INT3 is a software interrupt used by debuggers to implement breakpoints. The opcode of INT3 is 0xCC: the debugger writes it over the first byte of the instruction you want to set a breakpoint at, and when that byte executes, INT3 generates an exception; the OS stops the program and transfers control to the debugger. This is basically self-modifying code, as the code changes while it runs. DR0-DR3 are CPU registers that, when set, trigger Hardware Breakpoints: they hold addresses the CPU will remember to pause at, with DR7 storing the control information. This avoids any change to the code. Meanwhile, Traps are used by debuggers to implement Step-In: setting TF, the Trap Flag, makes the processor execute only one instruction at a time, raising a debug exception after each one.
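Planting a software breakpoint boils down to saving one byte and writing 0xCC in its place. The sketch below simulates that on a byte buffer standing in for code (no real process is touched; the struct and function names are hypothetical).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { INT3_OPCODE = 0xCC };

struct breakpoint {
    size_t  offset;   /* where the breakpoint is set              */
    uint8_t saved;    /* original first byte of that instruction  */
};

/* Set: remember the original byte, overwrite it with 0xCC. */
struct breakpoint bp_set(uint8_t *code, size_t offset)
{
    struct breakpoint bp = { offset, code[offset] };
    code[offset] = INT3_OPCODE;
    return bp;
}

/* Remove: put the original byte back so the instruction can execute.
   A real debugger also moves EIP back by 1, because after INT3 runs,
   EIP points just past the 0xCC byte. */
void bp_clear(uint8_t *code, const struct breakpoint *bp)
{
    assert(code[bp->offset] == INT3_OPCODE);
    code[bp->offset] = bp->saved;
}
```

For example, planting a breakpoint at offset 0 of the bytes 55 89 E5 C3 (push ebp; mov ebp, esp; ret) turns the first byte into CC and records 55 so it can be restored.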
Now if this turned out to be a drag, it's okay; most of these lower-level concepts are easy to forget unless reviewed and practiced. Later we will discuss Assembly/C Code Constructs, which will lay a good foundation for reverse engineering full programs.