C Code constructs and Assembly Primer

WIP

Computer Architecture | Instruction Set Architecture (ISA)

The end goal of this discussion is a solid understanding, at the lowest level possible, of how a program works by interacting with the computer as hardware. To get there, we first step back and look at a computer in its most basic structure.

This is the basic architecture at the heart of any computer, not just PCs but modern computing devices in general. It has three main interfaces (CPU | Main Memory | Peripherals) with means to transfer data/information between them.

CPU: Central Processing Unit, for processing information, consists of:

ALU -> for doing the Math.

Control Unit -> for executing code.

Registers -> for storing data for fast processing. Registers are small storage locations inside the CPU that are accessed by name; on 32-bit X86 a general-purpose register is 32 bits wide, the machine's native word size (Intel's manuals call this a DWORD, because they keep WORD at 16 bits).

Main Memory -> Stores data moderately fast. We can think of Main Memory as a linear sequence of bytes, each accessed by its address.

Peripherals -> Devices attached to the Computer to interact with the outside world, e.g. keyboards, mice, etc.

Bridges -> Responsible for handling transactions between these main interfaces by coordinating communication between Buses.

Buses -> Physical means to transfer information between different interfaces/components.

System Bus -> for transferring data to/from CPU.

Memory Bus -> for transferring data to/from Main Memory.

I/O Bus -> for Peripheral Devices.
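To make the "linear sequence of bytes" picture concrete, here is a minimal C sketch that looks at the individual bytes a 32-bit value occupies in memory. The helper name byte_at is made up for this illustration, and which byte shows up first depends on the machine's byte order (X86 is little-endian).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the byte stored `offset` bytes past the start of `value`.
 * byte_at is a hypothetical helper, not part of any library.
 * On little-endian X86, offset 0 holds the least significant byte. */
unsigned byte_at(uint32_t value, size_t offset)
{
    unsigned char bytes[sizeof value];

    /* Copy the object's raw memory so we can index it byte by byte. */
    memcpy(bytes, &value, sizeof value);
    return offset < sizeof value ? bytes[offset] : 0u;
}
```

Calling byte_at(0x11223344, 0) on an X86 machine should give 0x44, since the lowest address holds the lowest-order byte there.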

Assembly vs Machine Code

Speaking of low level: even for the simplest possible program, the classic Hello World!, there are tons of little operations working together under the hood to produce the output, from flipping transistors and fetching data from memory to sending signals over buses to the graphics card to print out Hello World!. For all that, the only language computers understand is 1s and 0s, i.e. binary code.

In this episode we will discuss the two lowest-level forms of code, the ones that translate directly to binary: Machine Code and Assembly. Assembly is arguably less a programming language of its own than the simplest human-readable representation of Machine Code.

Machine Code -> the most detailed, lowest-level interaction with the processor there is. All programming languages are abstractions over it that make the processor easier to work with, given the growing number and complexity of modern applications. Machine Code is a direct interaction with the CPU: moving information in and out of memory and registers, controlling the system bus, driving the ALU, etc.

Is it possible to program in Machine Code? Well... if you're done installing Arch from scratch over and over and emulating full systems with QEMU, I guess you're depressed enough to try it out :)

Assembly -> on the other hand, is the quickest shorthand for direct access to the CPU: mnemonics stand for the various instructions and save us from writing a bunch of hex Machine Code for a simple operation like ADD / SUB. The more complex an instruction set is, the more complex the operation a single mnemonic can represent.

As we move up to high-level languages, from compiled languages to interpreted ones, we move further and further away from direct interaction with the processor.

Compiling Code vs Reverse Engineering Compiled Code

The compilation process for any program goes from compiling source code to assembly code, then assembling that assembly code to machine code. In reverse engineering we use tools to do the very opposite.

Source Code --> Compiler -> Assembly --> Assembler -> Machine Code

To disassemble/reverse engineer binaries we use a disassembler, which takes machine code and translates it back to assembly code. Getting close to the program's original functionality takes much more work when it was compiled with optimization enabled. With other tools we can then decompile this assembly code into pseudocode to make our lives even easier.

Machine Code --> Disassembler -> Assembly Code --> Decompiler -> Pseudocode
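Both pipelines can be tried on a function small enough that its disassembly fits on one screen. The sketch below assumes a GCC toolchain with binutils on X86; the file name add.c is just a placeholder.

```c
/* add.c (placeholder name): a tiny function to push through the toolchain.
 *
 * Forward direction, assuming GCC on X86:
 *   gcc -S -masm=intel add.c     -> add.s  (assembly text, Intel syntax)
 *   gcc -c add.c                 -> add.o  (object file with machine code)
 *
 * Reverse direction, assuming binutils:
 *   objdump -d -M intel add.o    -> disassembly of the machine code
 */
int add(int a, int b)
{
    return a + b;
}
```

Comparing add.s with the objdump output is a quick way to see how close a disassembler gets to what the compiler originally emitted.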

Instruction Set Architecture (ISA)

There are tons of different Assembly dialects out there, each designed around an Instruction Set Architecture (ISA) that defines HOW to design the dialect. The HOW has nothing to do with syntax: the syntax is defined entirely by the assembler, and there is no standard syntax for Assembly in general, nor for X86 or any other particular architecture. For X86, for example, we have AT&T syntax, which is what GCC uses by default, and Intel syntax, which is mostly used by reverse engineering tools, IDA (The Interactive Disassembler) in particular.

Back to the HOW. The ISA, or Instruction Set Architecture, defines in general:

Registers: which registers are available on the processor, how many of them, and what size?
Data manipulation: what are the data formats, and how is data in memory accessed?
Instructions: what is the scope of operations an assembly dialect for this ISA can use? Primitive instructions only, e.g. ADD / SUB, or complex instructions as well?

An Instruction Set Architecture is implemented by a system's Micro-Architecture, which is the full implementation of registers, memory, and all the other logic circuits that build up a computer. Combining a Micro-Architecture with an ISA gives us different Computer Architectures, which can be broadly divided into:

CISC (Complex Instruction Set Computing) -> LARGE set of complex instructions, as found in X86 and friends. It is hard/expensive to design, consumes more power and makes for physically larger chips, e.g. (X86 for PCs)

RISC (Reduced Instruction Set Computing) -> SMALL set of simple instructions, easy/cheap to design, low power consumption, and its chips are small in size, e.g. (ARM for phones | MIPS for IoT devices | PowerPC)

Among all the different Computer Architectures, CISC and RISC are the ones we're mostly concerned with. For a nice discussion of both, check these articles.

01: https://medium.com/macoclock/interesting-remarks-on-risc-vs-cisc-microprocessors-1f034dca16ff

02: https://medium.com/swlh/what-does-risc-and-cisc-mean-in-2020-7b4d42c9a9de

Programming with X86

In this section we will jump straight into programming in X86, just enough to make it easy to reverse engineer other people's code by reading its disassembly. For X86 basics (registers, basic instructions, moving and copying data, accessing memory, fetching data from memory, scanning and comparing strings, etc.) you might want to take a look at the X86 section in a previous episode: Disassemble that Binary.

Still, I will briefly discuss X86 Addressing Modes as a starter, because in my view this is ONE of the most important aspects of X86, especially when reverse engineering code bases written in high-level languages: knowing the X86 Addressing Modes helps spot data structures, whether simple or complex.

Addressing Modes

The different X86 Addressing Modes exist to support high-level language constructs and data structures efficiently. They allow fast access to elements of one- or two-dimensional arrays, members of simple or complex structures, records and arrays of records, etc. As we will see shortly in the examples, an addressing mode can also be quite simple, for instance just accessing a local variable on the stack. An addressing mode is basically HOW the address of such an element is written. The available modes depend on the address size, 32bit or 16bit. For the record, most forms available with 32bit addresses also exist with 16bit addresses, but 16bit addressing is more restrictive: only BX or BP can act as the base, only SI or DI as the index, and there is no scaling factor.

So we will go through the addressing modes that matter to us, with some examples to work on:

Absolute Addressing: Used when a variable sits at a fixed address, so we can refer to it by its absolute address, or by its label if it has one.

Effective Calculated Addresses: [ absolute_address ]

e.g: Global Variables in C


// C
// global variable

int x = 4;


; asm
; assuming global variable sits at address 0x1000

mov  eax, [0x1000]


; we can use simple arithmetic to calculate absolute address
; only addition and subtraction

mov  eax, [0xdeadbeef + 1337]


; we can use label as an absolute address
; label in assembly is a MARK of a MEMORY LOCATION

mov  eax, [label]
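On the C side, absolute addressing is what a plain global access tends to turn into. A minimal sketch, assuming a non-position-independent 32-bit build (with position-independent code the compiler usually reaches globals another way); the function names are made up for the illustration:

```c
int x = 4; /* global: one fixed location for the whole run of the program */

/* Reading the global; a compiler would typically emit
 * something like: mov eax, [x] */
int read_global(void)
{
    return x;
}

/* Writing the global; typically something like: mov [x], eax */
void write_global(int value)
{
    x = value;
}
```

In a disassembler the label x is usually gone, and what remains is the bare address or an auto-generated name for it.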

Indirect Addressing -> Best used with pointers in C; in asm a register holds the address.

Effective Calculated Address: [ register ]

e.g: Pointers

// C

int x = 5;
int *p = &x;

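; asm
; the address is held in a general-purpose register
; with 32bit addressing any 32bit GPR works; with 16bit addressing only BX, BP, SI or DI

mov  eax, [ebx]
mov  eax, [bx]

We can think of the square brackets [] in asm as the dereference operator * applied to a pointer in C: mov eax, [ebx] reads the value stored at the address held in EBX, much like eax = *p. (The & operator goes the other way and produces an address.)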
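A small C sketch of the same idea, with the pointer playing the role of the register that holds the address. The function names are made up for the illustration, and the asm in the comments is what a compiler might plausibly emit, not a guaranteed output.

```c
/* Load through a pointer; roughly: mov eax, [ebx] with p in EBX. */
int load_through(const int *p)
{
    return *p;
}

/* Store through a pointer; roughly: mov [ebx], eax. */
void store_through(int *p, int value)
{
    *p = value;
}

/* Small driver: increments x only via its address. */
int indirect_demo(void)
{
    int x = 5;
    int *p = &x; /* p holds the address of x, like a register would */

    store_through(p, load_through(p) + 1);
    return x;
}
```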

Based Addressing -> For offsets inside a structure: when the structure's base address is known, we use a register as the base and a constant displacement as the offset.

Effective Calculated Address: [ base + displacement ]

e.g: Local variables in a function's stack frame, addressed relative to the frame base pointer (EBP)

// C
// local variable inside a function

int i = 4;

; asm
; with a standard frame pointer, local variables are addressed relative to EBP
; (optimized builds may address them relative to ESP instead)
; the frame base pointer (EBP) -> Base
; displacement as offset from base


; consider i sits 8 bytes below the frame base
mov  eax, [ebp-8]


; function arguments are also addressed relative to EBP, at positive offsets
mov  eax, [ebp+8]
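The same base + displacement arithmetic can be done by hand in C on a structure. This is a sketch with made-up names (struct point, load_at); offsetof from stddef.h supplies the displacement the compiler would hard-code into the instruction.

```c
#include <stddef.h>
#include <string.h>

struct point { int x; int y; int z; };

/* Read the int stored at [base + displacement] using byte arithmetic. */
int load_at(const void *base, size_t displacement)
{
    int value;

    memcpy(&value, (const unsigned char *)base + displacement, sizeof value);
    return value;
}

/* Same result as p.z; a compiler would emit something like
 * mov eax, [ebx+8] when ints are 4 bytes and EBX holds &p. */
int based_demo(void)
{
    struct point p = { 10, 20, 30 };

    return load_at(&p, offsetof(struct point, z));
}
```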

Indexed Addressing -> Best used to access records or arrays, where you know the base of the structure (called the displacement in this case) and an index to the element as an offset from that base. It can be used with or without a scaling factor.

The scaling factor specifies the size in bytes of one element; X86 allows 1, 2, 4 or 8.

Format w/o scaling factor: [ displacement + index ]

Format w/ scaling factor: [ displacement + index*scale ]

e.g: records | arrays

// C
// array of 4byte-sized elements

int x[5];

; asm
; accessing x[3], the element at index 3 of an array of integers
; scaling factor == 4bytes

; with scaling factor
; base of the array in EBX
; index of the element in ECX
; ECX should hold 3 to refer to x[3]
mov  eax, [ebx + ecx*4]


; without the scaling factor
; ECX must hold the byte offset, 3*4 = 12, to refer to x[3]
mov  eax, [ebx + ecx]

; generally a scaling factor saves us from counting BYTES to the element

; asm
; the base of an array can be a register holding the address
; or an absolute address
; or a label
; (index stands for an index register such as ECX, scale for 1, 2, 4 or 8)

mov  eax, [ebx + index*scale]
mov  eax, [0x100 + index*scale]
mov  eax, [label + index*scale]
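Here is the index*scale computation spelled out in C, as a sketch with made-up function names. The compiler does exactly this multiplication for us every time we write x[i]; the scale is sizeof(int).

```c
#include <stddef.h>
#include <string.h>

/* Load the int at [base + index*scale], where scale == sizeof(int). */
int load_indexed(const int *base, size_t index)
{
    const unsigned char *bytes = (const unsigned char *)base;
    int value;

    memcpy(&value, bytes + index * sizeof(int), sizeof value);
    return value;
}

/* Returns x[3] if the hand-computed address matches, -1 otherwise. */
int indexed_demo(void)
{
    int x[5] = { 10, 11, 12, 13, 14 };

    return load_indexed(x, 3) == x[3] ? x[3] : -1;
}
```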

Based Indexed Addressing -> For a complex structure containing elements of different data sizes. We need the base address of the structure, a displacement in bytes from the beginning of the structure to the member, and an index INTO the member we want to access. This can also be used with or without a scaling factor.

Format w/o scaling factor: [ base + index + displacement ]

Format w/ scaling factor: [ base + index*scale + displacement ]

e.g: Complex Structs | two-dimensional Arrays

// C
// struct with two members of different types
// the displacement to a member is its offset from the struct's base:
// the sizes of the members before it plus any alignment padding

struct {
    short i;
    int a[4];
} s;

Now let's say we want to access a[2], the third element of array a:

; asm
; EBX as base == beginning of the structure
; ECX as index == index INTO the member to access (2 in this case)
; displacement == offset of the member from the struct's base; with default
;   alignment that is 4 here: sizeof(short) plus 2 bytes of padding so the
;   int array is 4-byte aligned (it would be 2 only in a packed struct)
; scale == size of one element of the member


mov  eax, [ebx + 4*ecx + 4]

; again, without a scaling factor ECX must be 8
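To check the displacement rather than guess it, offsetof reports where the compiler really placed the array, padding included. The sketch below gives the struct a tag (s_type, a name invented here) so it can be passed around, and rebuilds [ base + index*scale + displacement ] by hand.

```c
#include <stddef.h>
#include <string.h>

struct s_type {
    short i;
    int a[4];
};

/* Displacement of member a from the start of the struct.
 * With default alignment on X86 this is usually 4, not sizeof(short). */
size_t displacement_of_a(void)
{
    return offsetof(struct s_type, a);
}

/* Load the int at [base + index*scale + displacement]. */
int load_member_element(const struct s_type *base, size_t index)
{
    const unsigned char *bytes = (const unsigned char *)base;
    int value;

    memcpy(&value, bytes + displacement_of_a() + index * sizeof(int),
           sizeof value);
    return value;
}

/* Same result as s.a[2]. */
int based_indexed_demo(void)
{
    struct s_type s = { 7, { 100, 101, 102, 103 } };

    return load_member_element(&s, 2);
}
```

When reversing, a constant displacement that is larger than the sizes of the preceding fields add up to is a good hint that padding is involved.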

Instruction Set: Control Flow | Loops

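Control flow and loop constructs are easy to spot: they are implemented as some test followed by a jump. The instructions involved are JMP (unconditional jump), Jcc (conditional jump), TEST and CMP. CMP compares two operands by subtracting the second from the first; TEST performs a bitwise AND and is mostly used to check whether an operand is zero (e.g. test eax, eax). Neither stores its result; both only update the status flags in EFLAGS (ZF, SF and the others), which the following Jcc then reads. The condition code in the Jcc mnemonic selects which flag combination triggers the jump, e.g. JE/JZ (equal/zero), JNE/JNZ (not equal/not zero), JL and JLE (signed less, less or equal), JG and JGE (signed greater, greater or equal).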
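To see how a test and a jump cooperate, here is a small C model of the three flags that signed conditional jumps look at. It is a simplified sketch (only ZF, SF and OF are modeled, and all names are made up), not a full EFLAGS emulation.

```c
#include <stdint.h>

struct flags { int zf, sf, of; };

/* Model of: cmp a, b  -> computes a - b, keeps only the flags. */
struct flags cmp_flags(int32_t a, int32_t b)
{
    int64_t exact = (int64_t)a - (int64_t)b;   /* result without wraparound */
    uint32_t r = (uint32_t)a - (uint32_t)b;    /* 32-bit result the CPU sees */
    struct flags f;

    f.zf = (r == 0);
    f.sf = (int)(r >> 31);                     /* sign bit of the result */
    f.of = (exact < INT32_MIN || exact > INT32_MAX);
    return f;
}

/* Model of: test a, b -> computes a & b, keeps only the flags (OF = 0). */
struct flags test_flags(int32_t a, int32_t b)
{
    uint32_t r = (uint32_t)a & (uint32_t)b;
    struct flags f = { r == 0, (int)(r >> 31), 0 };

    return f;
}

/* jl is taken when SF != OF; jle when ZF == 1 or SF != OF. */
int jl_taken(struct flags f)  { return f.sf != f.of; }
int jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }
```

With this model, test edi, edi followed by jle is taken exactly when EDI is zero or negative, which is why compilers use that pair for a check like len <= 0.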

The following is a very interesting example for discussing Control Flow and Loops:

char *sub_1000AE3B (char *str)            // function : sub_1000AE3B
{
	int len, i=0, j=0;
	len = lstrlenA(str);
	if (len <= 0) {
		str[j] = 0;
		return str;
	}
	
	while (j < len) {
		str[i] = str[j];
		j = j+3;
		i = i+1;
	}
	
	str[i] = 0;
	return str;
}

01:sub_1000AE3B proc near
02:  push   edi
03:  push   esi
04:  call   ds:lstrlenA
          ; calling lstrlenA with ESI as parameter ; ESI -> str
05:  mov    edi,  eax
          ; saving the returned value in EDI ; EDI -> len
06:  xor    ecx,  ecx
          ; i = 0
07:  xor    edx,  edx
          ; j = 0
08:  test   edi,  edi
09:  jle    short loc_1000AE5B
          ; if (len <= 0) { ... }

10:loc_1000AE4D:
          ; while() {...}
11:  mov    al, [edx+esi]
12:  mov    [ecx+esi], al
          ; str[i] = str[j]
13:  add    edx,  3
          ; j = j+3
14:  inc    ecx
          ; i = i+1
15:  cmp    edx,  edi
16:  jl     short loc_1000AE4D
          ; loop again while (j < len)
17:  mov    byte ptr [esi+ecx],  0
          ; loop done: str[i] = 0
18:  mov    eax,  esi
19:  pop    edi
20:  retn

21:loc_1000AE5B:
22:  mov    byte ptr [esi+edx],  0
          ; len <= 0: *(esi+edx) = 0   -> str[j] = 0
23:  mov    eax,  esi
24:  pop    edi
25:  retn
26: sub_1000AE3B endp

01: sub_1000AE3B proc near and 26: sub_1000AE3B endp are directives, not instructions; they mark the beginning and end of a procedure. Line 02 preserves EDI on the stack. Lines 03-05 push ESI (str) as the parameter, call lstrlenA() and copy the returned value from EAX into EDI; this corresponds to len = lstrlenA(str); . Lines 06-07 set i=0 , j=0 by XORing ECX and EDX with themselves.

Lines 08-09 test the returned value: if len <= 0, the jle Jcc jumps to the if() body at loc_1000AE5B; otherwise execution continues into the while() loop at loc_1000AE4D. Lines 13-16 show the typical shape of a loop: the index values are updated (here with add and inc, elsewhere often inc|dec), the loop's condition is tested with cmp, and if it still holds we jump back to the start. This corresponds to the while loop while (j < len).

Lines 11-12 access the jth and ith elements in an interesting way: with ESI as our structure's base address and i -> ECX, j -> EDX as offsets, we can reference str[i] -> [esi+ecx] | str[j] -> [esi+edx] (recall accessing memory buffers / structure members by using a base register and an offset). Together they implement str[i] = str[j]; using register AL as a temporary. The same base-plus-offset access shows up again, this time writing to memory, in the two mov byte ptr [...], 0 stores that terminate the string.
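To experiment with the function outside Windows, here is a portable stand-in. It assumes strlen from the C standard library in place of the Win32 lstrlenA, and the names keep_every_third and demo_matches are invented for this sketch.

```c
#include <string.h>

/* Portable stand-in for sub_1000AE3B: keeps every third character
 * of str, in place, and returns str. */
char *keep_every_third(char *str)
{
    int len = (int)strlen(str); /* strlen stands in for lstrlenA */
    int i = 0, j = 0;

    if (len <= 0) {
        str[j] = 0;
        return str;
    }

    while (j < len) {
        str[i] = str[j]; /* mov al, [edx+esi] / mov [ecx+esi], al */
        j = j + 3;       /* add edx, 3 */
        i = i + 1;       /* inc ecx */
    }

    str[i] = 0;
    return str;
}

/* "abcdefgh" keeps the characters at indices 0, 3 and 6. */
int demo_matches(void)
{
    char buf[] = "abcdefgh";

    return strcmp(keep_every_third(buf), "adg") == 0;
}
```

Compiling this with optimization and disassembling it is a nice exercise: the output will differ in details from the listing, but the test-then-jump skeleton and the base-plus-index stores should still be recognizable.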

LOOP Instructions:

This is a very nice example of LOOP / STOS / LODS working together:

/*Rough C*/
/* esi and edi are treated as pointers to 4-byte values, so ++ advances them by 4 bytes */

do {
		eax = *esi;
		esi++;
                // lodsd
		eax = ~eax;
                // not
		*edi = eax;
		edi++;
                // stosd
		ecx--;
} while (ecx != 0);

8B CA        mov      ecx, edx
              ; loading the iteration count into ecx for the loop's condition
        loc_CFB8F:
AD           lodsd
              ; loads a dword from [ESI] into eax ; esi += 4 (with DF clear)
F7 D0        not      eax
              ; eax = ~eax  -> bitwise NOT (one's complement), not arithmetic negation
AB           stosd
              ; writes eax (now ~eax) to [EDI] ; edi += 4 (with DF clear)
E2 FA        loop     loc_CFB8F
              ; LOOP decrements ecx, then jumps back to the start if ecx != 0
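The same sequence can be written as an ordinary C function, which is handy for checking one's reading of the disassembly. This is a sketch with invented names: src plays the role of ESI, dst of EDI and count of ECX, and the direction flag is assumed clear so both pointers move forward.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulates: lodsd / not eax / stosd / loop. */
void invert_dwords(const uint32_t *src, uint32_t *dst, size_t count)
{
    /* Guard added for safety: LOOP decrements before it tests, so the raw
     * instruction sequence entered with ECX == 0 would wrap around and keep
     * going instead of stopping. */
    if (count == 0)
        return;

    do {
        uint32_t eax = *src++;   /* lodsd: eax = [esi], esi += 4 */
        eax = ~eax;              /* not eax: flip every bit */
        *dst++ = eax;            /* stosd: [edi] = eax, edi += 4 */
    } while (--count != 0);      /* loop: ecx--, jump back if ecx != 0 */
}

/* Inverts three dwords and returns the last result. */
uint32_t invert_demo(void)
{
    uint32_t in[3]  = { 0x00000000u, 0xFFFFFFFFu, 0x12345678u };
    uint32_t out[3] = { 0, 0, 0 };

    invert_dwords(in, out, 3);
    return out[2];
}
```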
