Errant Security
Null Pointers

After you’ve competed in enough Capture the Flag events, you begin to get a feel for the different types of challenges thrown at you. The standard categories are Cryptography, Web Exploitation, Steganography, Binary Exploitation, and Reverse Engineering. Different events may add or remove a category here and there, but in general you will see some variation of this list. For this article, I will focus on binary exploitation. My goal is to create a series of articles outlining what binary exploitation is, my methodology for approaching these problems, and some practical examples/walkthroughs of actual vulnerabilities!


What is Binary Exploitation?

Before getting into the technical nitty-gritty, we need to define what a binary exploitation vulnerability is. In general, binary exploitation refers to a set of challenges whose goal is to disrupt the logical flow of an application in a way the author did not intend. Common examples are buffer overflows and heap overflows. Nearly every binary exploitation challenge or exploit starts with a corruption of memory. For example, you may overwrite a function pointer in a memory structure, overwrite the return address of a function, or even do something as simple as change a flag within an in-memory configuration block! If you don’t know what this means, don’t worry. I’ll discuss it in more depth.

In all of these cases, the user is given the opportunity to provide either malformed input or more input than expected. A well-written program accounts for both of these cases with error checking. A good rule of thumb is that whitelisting is better than blacklisting: specifically enumerate every correct input, and reject all others as errors. Doing this alleviates many of the bugs that create the opportunity for binary exploitation.

Before attempting to exploit a vulnerability, we first have to explore how modern processors execute code. By its very nature, a binary exploit turns the normal operation of the processor and its interaction with memory against the programmer, so a solid understanding of the processor itself is required. Today, I’ll go over some basics of the low-level operation of a processor.


Registers: how a processor… processes…

A processor operates using small, variable-like objects known as registers, which store temporary/intermediate values as well as the current processor state. Registers live inside the physical processor itself for performance reasons, which allows instructions to run extremely quickly: even with modern RAM, reaching out over the memory bus is expensive compared to a register access.

On an x86-based processor, there are many registers in use. We will mainly deal with what are called General Purpose Registers (GPRs). These registers are used by every function to store data and intermediate values. They can hold memory addresses or regular values.

Each base GPR has the same number of bits as the processor on which it is used. So, on a 64-bit processor, each GPR is 64 bits wide. In order to maintain backwards compatibility, the registers are named according to their size. On the original 16-bit 8086 x86 processors, registers had no prefix and were 16 bits wide. Next, on 32-bit i386 processors, registers were given an “e” prefix (for “extended”); these extended registers are 32 bits wide. Most recently, 64-bit registers are named with an “r” prefix. The following are the GPRs on x86-based systems:

  1. [re]ax - the accumulator, generally used in arithmetic
  2. [re]bx - the base, normally used to store an offset
  3. [re]cx - the counter, used in looping operations
  4. [re]dx - data, result of multiplication/division and in/out port calls
  5. [re]sp - stack pointer, points to the newest item on the stack
  6. [re]bp - base pointer, the base of the current procedure’s stack frame (used to reference parameters)
  7. [re]di - destination index, used as a pointer and is normally the destination for string operations
  8. [re]si - source index, used as a pointer and is normally the source for string operations

There are other registers you may see used on x64 systems, such as r8-r15. These are essentially extra general purpose registers. The names r0 through r7 are not used; those slots correspond to the eight registers named above, which keep their legacy names for backwards compatibility reasons.

For all eight of the GPRs above, you can always access smaller versions. For example, on a 64-bit processor, rax references the 64-bit accumulator, while eax references the low 32 bits of the accumulator. Additionally, ax through dx are divided into [a-d]l and [a-d]h (the low and high bytes). For any given register, the bits are divided up like so:

Register Breakdown

Setting rax to 0x12345678ABCDEF00 and then setting al to 0xAA will result in rax=0x12345678ABCDEFAA. There is one other register which is incredibly important to binary exploitation. The ip or “Instruction Pointer” register (rip on 64-bit, eip on 32-bit) holds the address of the next instruction to be executed. This register is not directly accessible to code, but may be indirectly inspected using the call instruction. The call instruction first pushes the address of the next instruction onto the stack, and then jumps to the call target. So, an example function in assembly to retrieve the address of the next instruction would be:

get_eip:
mov eax,dword[esp]
ret

We can then use this function to get the address of an instruction easily:

call get_eip
some_label:

After the call, eax would hold a pointer to some_label!


Memory: what a processor processes

Memory on x86 processors is an interesting case. In the old days of 16-bit processors, memory segmentation was the primary form of protection. Some memory was protected (each segment had R/W/X permissions), but an address, say 0xDEADBEEF, referred to its physical location within the memory chips installed on the motherboard, and a process could address all of physical memory.

With the advent of 32-bit processors, we received a gift: paging. Paging can be intimidating because its implementation gets confusing, but the idea is simple: allow the processor to map different physical portions of memory to arbitrary locations in an imaginary “virtual memory”. This idea effectively killed segmentation (although it is still technically around in x86 processors for backwards compatibility reasons). With paging, a programmer is able to map arbitrary portions of physical memory to any address he likes in virtual memory and set access permissions, including Read, Write, and Execute bits, for individual pages. Originally, these pages were 4KB long; larger 4MB pages were later added with the Page Size Extension (PSE).

Modern operating system implementors took this one step further by segregating each process into its own “virtual memory world”. Each process references its own memory as if nothing else were running, and is unable to see other processes’ memory. It’s not just inaccessible; from the process’s point of view, it doesn’t exist. Two processes may both be loaded at 0x08045000 with no issues, because each process has its own physical memory mapping. This is great for security! No longer can individual processes simply read other processes’ memory. We can now fully segregate processes!


Program Loading: how is memory initialized?

As discussed above, modern PCs utilize memory paging to create an individual virtual address space for each process. When loading a process, individual libraries which are reused across the system are first mapped into the new process address space. Then, the new process itself is mapped into the address space. When the shared libraries and other system components are loaded at random addresses, we call this Address Space Layout Randomization (ASLR). Many programs are compiled with static addresses referenced within their code; because of this, they have to be loaded at a specific address, regardless of ASLR. Other programs, called Position Independent Code (PIC) or Position Independent Executables (PIEs), are compiled in such a way that they can be loaded anywhere. This allows the system to randomize the entire address space, making binary exploitation very difficult, but not impossible.

During the loading process, most modern operating systems also set access permissions on the program’s memory pages according to their usage. For example, code pages are Read-Only and Executable, while data pages are Read-Write but not Executable. Marking the data pages of a program as non-executable is commonly referred to as “NX”, a security feature requested at build time and honored by the operating system at runtime. A program not built with NX will be vulnerable even if the operating system is capable of the protection.

ASLR, NX, and PIE are by far the biggest protections against binary exploitation. With these protections in place, even a skilled attacker with a bona fide buffer overflow and ip control will have a tough time executing anything useful. This is because even with control over the ip, we are normally unable to modify any executable sections of code. Additionally, the addresses within the program are randomized and hidden from the attacker, so we cannot even reliably jump to a specific location within known code!

Methods for subverting these security mechanisms usually involve a memory leak. The attacker is somehow able to leak an address, either within the binary itself or, if they are lucky, within the standard libraries. With a little bit of recon and a single leaked address, an attacker is capable of complete application takeover using techniques such as Return Oriented Programming (ROP).


The Stack: how is memory used?

Aside from static/global memory allocated at load time, a program also uses temporary memory for local variables and for storing/saving values for later use. For this purpose, the stack is used. As discussed in the register section above, the stack is pointed to by the sp register. Machine instructions like push and pop add and remove items from the stack. A push instruction first decrements sp by one word (2, 4, or 8 bytes depending on the processor mode), then copies the given value to this address. This grows the stack. A pop instruction first copies the value at sp to the given location, then increments sp by the word size (the inverse of push).

For every function, there is a defined range of stack memory allocated. This range is called the function’s “stack frame”. The stack frame ranges from a low address of sp and a high address of bp. The stack frame is built during the function preamble (a set of assembly instructions added to the beginning of every function). For example, consider this C function:

int do_stuff(int a, int b)
{
	int local_var = a+5;
	return local_var+b;
}

This is a simple function, albeit inefficient. We take in two parameters, add five to the first, store the result in a local variable, and return the sum of the local variable and the second parameter. Depending on optimization options and compiler version, one possible compilation of this function could be:

do_stuff:
push ebp               ; preamble: save the caller's base pointer
mov ebp,esp            ; preamble: start our new stack frame
sub esp,4              ; preamble: reserve 4 bytes for local_var
mov edx,dword[ebp+8]   ; load the first parameter, a
add edx,5              ; a + 5
mov dword[ebp-4],edx   ; local_var = a + 5
mov eax,dword[ebp-4]   ; return value starts as local_var
add eax,dword[ebp+12]  ; add the second parameter, b
mov esp,ebp            ; epilogue: release the local variables
pop ebp                ; epilogue: restore the caller's frame
ret

Those familiar with assembly or C programming will likely cringe or say “The compiler wouldn’t do that!”. This is true, but it’s hard to create a simple example where the compiler wouldn’t simply optimize away your local variables :P. Back to the lesson at hand: the first three instructions are what we call the preamble. We first save the base pointer on the stack, which allows us to restore the previous function’s stack frame later. Next, we load the base pointer with the current value of sp; this initializes our new stack frame. Lastly, we subtract some number of bytes from sp in order to reserve that memory for our local variables. From this point on, bp will not change within the function. sp may move down and back up throughout the course of the function, but will never be larger than bp. bp, therefore, serves as a fixed location from which to reference both local variables (at ebp-OFFSET) and parameters (at ebp+OFFSET).

Depending on your calling convention, parameters may be stored on the stack prior to calling a function (more details below). When referencing parameters relative to ebp, you must always remember to add an extra offset for both the saved bp and the saved ip. bp is saved by the function preamble, and ip is saved by the call instruction before entering the function itself. On a 32-bit processor, this is an extra 8 bytes; on a 64-bit processor, it would be 16 bytes. Consider the following function:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
	char hello_buffer[512];

	if( argc != 2 ){
		printf("usage: %s [name]\n", argv[0]);
		exit(1);
	}
	
	snprintf(hello_buffer, 512, "Hello, %s\n", argv[1]);
	puts(hello_buffer);

	return 0;
}

On a 32-bit x86 target utilizing the cdecl calling convention, this function may have a stack frame resembling the following:

Example Stack Frame


Calling Conventions: how functions coordinate.

In assembly, there is no concept of functions. Assembly only natively understands addresses, instructions, data, and registers. The concept of functions is an abstraction implemented by programmers to make logical program segregation and code reuse straightforward. The rules regarding parameter passing, register usage, and return value passing are collectively referred to as a “Calling Convention”. There are many calling conventions in use, and the one you use must match the code you are interacting with. I will describe the most common 32-bit and 64-bit Unix conventions. Other common calling conventions are documented on Wikipedia.

cdecl: The C Calling Convention

On 32-bit Unix-like machines, the most common default calling convention is cdecl. This is the default C calling convention. For our purposes, the following details are important:

  1. Parameters are pushed onto the stack from right to left (e.g. the last parameter is pushed first). The caller is responsible for removing the parameters from the stack.
  2. Integer and pointer return values are returned via the accumulator (rax, eax, ax).
  3. Floating Point return values are returned via fp0.
  4. eax, ecx, edx are caller saved. This means the function may clobber and not save them.
    • All other registers must be saved by the callee (aka the function).
  5. All Floating Point registers (fp0 through fp7) must be empty upon return, unless fp0 is used for a return value.

More information can be seen at the Wiki.

System V AMD64 Calling Convention

On 64-bit Unix-based systems, the System V AMD64 Calling Convention is the default. The following are some important notes:

  1. The first six integer or pointer parameters are passed in: rdi, rsi, rdx, rcx, r8, and r9. Further parameters are passed on the stack, pushed from right to left as in cdecl. Caller must cleanup all stack parameters.
  2. Return values less than or equal to 64-bits are returned in rax, larger return values are sent in rdx:rax.
  3. rbx, rbp, and r12-r15 are callee saved. All other registers are caller saved.

More information can be seen at the Wiki.

Why do I care?

Knowing where your parameters are is important. If you are building an exploit which calls a system-level function, you need to understand where that function expects its parameters. Additionally, in all these cases, the return address is the last thing stored before a function begins executing (it sits at the sp address upon entry to the function). For buffer overflows, overwriting this value is the goal of the exploit. Understanding where parameters are, and the agreed-upon rules for calling functions, is pivotal to triggering an exploit without crashing your target.


Conclusion

At this point, you should understand how a processor goes from a binary executable file to actually executing code. You should also understand how a program interacts with internal functions and external libraries. These are fundamental concepts which must be understood before you can fully grasp and independently implement your own binary exploits. Stay tuned for a follow-on article utilizing your new knowledge to discuss common binary exploitation vulnerabilities!

