2016-11-08

HackTheVote 2016: Vermatrix Supreme Writeup

I picked this challenge to work on first since it was the lowest point crypto challenge, which also just happened to be written in Python. The challenge had the following flavor text:

Working in IT for a campaign is rough; especially when your candidate uses his password as the IV for
your campaign's proprietary encryption scheme, then subsequently forgets it. See if you can get it 
back for him. The only hard part is, he changes it whenever he feels like it.

nc vermatrix.pwn.democrat 4201

handout

author's irc nick: negasora

The handout link provided the source to a python2 program, which presumably was the source of the challenge running on the remote server:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
import sys, random, time

flag = "flag{1_sw34r_1F_p30Pl3_4cTu4lLy_TrY_Th1s}"

def printmat(matrix):
    for row in matrix:
        for value in row:
            print value,
        print ""
    print ""


def pad(s):
    if len(s)%9 == 0:
        return s
    for i in xrange((9-(len(s)%9))):
        s.append(0)
    return s

def genBlockMatrix(s):
    outm = [[[7 for x in xrange(3)] for x in xrange(3)] for x in xrange(len(s)/9)]
    for matnum in xrange(0,len(s)/9):
        for y in xrange(0,3):
            for x in xrange(0,3):
                outm[matnum][y][x] = s[(matnum*9)+x+(y*3)]
    return outm


def fixmatrix(matrixa, matrixb):
    out = [[0 for x in xrange(3)] for x in xrange(3)]   
    for rn in xrange(3):
        for cn in xrange(3):
            out[cn][rn] = (int(matrixa[rn][cn])|int(matrixb[cn][rn]))\
            &~(int(matrixa[rn][cn])&int(matrixb[cn][rn]))
    return out


def chall():
    IV = [c for c in '?????????']
    seed = "??????????????????"


    blocks = genBlockMatrix(pad(IV + [ord(c) for c in seed]))

    res = [[0 for i in xrange(3)] for i in xrange(3)]
    for i in xrange(len(blocks)):
        res = fixmatrix(res, blocks[i])


    print "SEED: " + str(seed)
    printmat(res)

    data = raw_input("")

    data = data.replace(' ', '').strip().split(',')

    if len(data) != 9:
        return False

    for i in xrange(len(IV)):
        if str(IV[i]) != str(data[i]):
            return False

    return True


if chall():
    print flag

Looking over the source, it appeared the flag was stored in the first variable flag, and would only display if the chall() function returned true. Since the script didn't import socket I could only assume all I/O would be handled via STDIN and STDOUT, probably through netcat.

At this point, I wanted to know what the output looked like, so ran ncat vermatrix.pwn.democrat 4201, which gave the output:

SEED: %bIdKOc'I
997 678 527
827 773 1161
1025 836 562

Connecting a second time gave the output:

SEED: (v=]UPNh?
1339 617 777
1055 770 465
677 1021 609

And running the program locally with python2 handout.4838bbdb8619b3a581352c628c6b0b86475b94c9519347a520c90cf1822351ae.py gave the error:

Traceback (most recent call last):
  File "handout.4838bbdb8619b3a581352c628c6b0b86475b94c9519347a520c90cf1822351ae.py", line 66, in <module>
    if chall():
  File "handout.4838bbdb8619b3a581352c628c6b0b86475b94c9519347a520c90cf1822351ae.py", line 46, in chall
    res = fixmatrix(res, blocks[i])
  File "handout.4838bbdb8619b3a581352c628c6b0b86475b94c9519347a520c90cf1822351ae.py", line 33, in fixmatrix
    out[cn][rn] = (int(matrixa[rn][cn])|int(matrixb[cn][rn]))&~(int(matrixa[rn][cn])&int(matrixb[cn][rn]))
ValueError: invalid literal for int() with base 10: '?'

With this error it's obvious to see that the values provided in the script on lines 39 and 40 are the ones that are likely to be randomized with each connect to the remote version of the script. It is here that the meat of the challenge becomes apparent.

First Steps


We can see by looking at the chall() function's lines 58 and 62 two strict conditions for the program to be able to print the flag for us:

if len(data) != 9:
    return False
for i in xrange(len(IV)):
    if str(IV[i]) != str(data[i]):
        return False

where data is read in via raw_input(""). This indicates is that whatever data is sent in must be exactly 9 elements long, and, according to the code on line 55, must be comma delimited:

data = data.replace(' ', '').strip().split(',')

Additionally, we know from the source and output that the value of SEED is simply printed out, and is also 9 characters long, so we'll go ahead and fill those in on the local copy of the script:

IV = [c for c in '012345678']
seed = "HERPDERPP"

Running the local script again:

-> % python2 handout.py
SEED: HERPDERPP
72 70 84 
81 64 66 
80 85 8 

we see it is awaiting our raw_input(), and after supplying the IV in comma-delimited form:

0,1,2,3,4,5,6,7,8

we verify that the flag is output:

flag{1_sw34r_1F_p30Pl3_4cTu4lLy_TrY_Th1s}

The Challenge


Now the question is, how do we work backwards from what we're given (SEED and a 3x3 matrix of numbers) to get to the appropriate IV? Well, since we know the Matrix gets printed out, lets see what goes into making it.

We see that the printmat(matrix) function iterates over a nested iterable (probably a list) and prints each value in the row before printing a newline, meaning from our output before that matrix probably looks something like [[72, 70, 84], [81, 64, 66], [80, 85, 8]]. The only time printmat() gets called is on line 51 with res as the argument.

So now we must look at how res is constructed. On line 45, res is initialized to a list of three lists all with the value of 0:

res = [[0 for i in xrange(3)] for i in xrange(3)]

If you've never worked with list comprehensions then this line might look weird, however it's equivalent to writing:

res = []
for i in xrange(3):
    tmp = []
    for j in xrange(3):
        tmp.append(0)
    res.append(tmp)

which results in res holding [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

The next two lines show that a function fixmatrix(matrixa, matrixb) is run on res, outputing its result to res, but also making use of another variable, blocks.

blocks


blocks is generated on line 43 with:

blocks = genBlockMatrix(pad(IV + [ord(c) for c in seed]))

Here we've got another nested list comprehension statement within a double function call. I forgot to mention that this is also how IV is generated on line 39:

IV = [c for c in '012345678']

which produces ['0', '1', '2', '3', '4', '5', '6', '7', '8'].

[ord(c) for c in seed] generates the list [72, 69, 82, 80, 68, 69, 82, 80, 80], and using the + operator simply concatenates both lists into a single list:

['0', '1', '2', '3', '4', '5', '6', '7', '8', 72, 69, 82, 80, 68, 69, 82, 80, 80] which is 18 elements long.

When we look at the pad() function, we see that its main function is to ensure that the len() of the list supplied to the function is divisible by 9, which 18 already is. If it weren't, then 0's would simply be appended to the list until its length was divisible by 9:

def pad(s):
    if len(s)%9 == 0:
        return s
    for i in xrange((9-(len(s)%9))):
        s.append(0)
    return s

The last function call of this line (#43) is to genBlockMatrix().

genBlockMatrix()


This function starts off initializing a 3-dimensional array filled with 7's by using a list comprehension:

outm = [[[7 for x in xrange(3)] for x in xrange(3)] for x in xrange(len(s)/9)]

where the x and y dimensions are of magnitude 3, and the z dimension is of magnitude len(s)/9, which comes out to 2 when len(s) == 18. Therefore the matrix's "signature" is outm[z][y][x] == outm[2][3][3].

If you're having a hard time visualizing this, this is what it looks like:

[[[7, 7, 7], [7, 7, 7], [7, 7, 7]], [[7, 7, 7], [7, 7, 7], [7, 7, 7]]]

Alternatively:

[[ [7, 7, 7], 
   [7, 7, 7], 
   [7, 7, 7] ],

 [ [7, 7, 7], 
   [7, 7, 7], 
   [7, 7, 7] ]]

The next 4 lines (22-25) simply read in the supplied list (s) into the matrix, left-to-right, top-to-bottom, front-to-back. The result:

[[['0', '1', '2'], ['3', '4', '5'], ['6', '7', '8']], [[72, 69, 82], [80, 68, 69], [82, 80, 80]]]

Now that we have the value for blocks (and res) we can dive into the final method of the program, fixmatrix(matrixa, matrixb)

fixmatrix()


blocks = genBlockMatrix(pad(IV + [ord(c) for c in seed]))

res = [[0 for i in xrange(3)] for i in xrange(3)]
for i in xrange(len(blocks)):
    res = fixmatrix(res, blocks[i])

As we can see in the code, fixmatrix() is run a number of times equal to the len(blocks), which in our case is 2. Also to note, each time fixmatrix runs, the output is reassigned to res, and in the invocation, only blocks's index is incremented.

Now, for the meat of the function:

def fixmatrix(matrixa, matrixb):
    out = [[0 for x in xrange(3)] for x in xrange(3)]   
    for rn in xrange(3):
        for cn in xrange(3):
            out[cn][rn] = (int(matrixa[rn][cn])|int(matrixb[cn][rn]))\
            &~(int(matrixa[rn][cn])&int(matrixb[cn][rn]))
    return out

Here we can see that res is matrixa, and blocks[i] is matrixb. On the first line, a 3x3 matrix out is initialized with all 0's. Next is a double for loop which iterates over all 9 elements in the out matrix. The entirety of the transformation is done on the assignment line, out[cn][rn] =....

Keeping in mind that cn refers to column number and rn refers to row number, then each element of the out matrix is being set to the bitwise expression on the same line. Removing the unnecessary visual fluff from the expression yields:

(matrixa | matrixb) &~ (matrixa & matrixb), or in logical notation: \((A \lor B) \land \lnot (A \land B)\)

If you haven't already, you should recognize this expression as XOR or exclusive-or, which can also be written as \(A \oplus B\). I've included below a truth table so that you can see how A and B are related.

\(A\) \(B\) \(A \lor B\) \(A \land B\) \(\lnot(A \land B)\) (\(A \lor B) \land\lnot(A \land B)\)
0 0 0 0 1 0
0 1 1 0 1 1
1 0 1 0 1 1
1 1 1 1 0 0

An important property of XOR is that when you XOR any data with 0, you simply get the supplied data as your result. For example, \(752875 \oplus 0 == 752875\). Now, back to the challenge, something important I have left out so far is that on the out[cn][rn] = ... assignment, you'll notice that the [cn][rn] are swapped for both out and matrixb. This means that both of these matrices are transposed. BUT, since out is being assigned to (and it's transposition matches matrixb's) then it's really as if matrixa is the one that is transposed.

Taking all this information into account, since the first run of fixmatrix is done with matrixa == res, and res being initialized 0's, and matrixb == blocks[0] == IV, we have:

$$out = \begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix}^\intercal \oplus \begin{bmatrix}0&1&2\\3&4&5\\6&7&8\end{bmatrix} = \begin{bmatrix}0&1&2\\3&4&5\\6&7&8\end{bmatrix}$$

A second run of fixmatrix has the previous result of res == IV assigned to matrixa, with matrixb set to seed:

$$out = \begin{bmatrix}0&1&2\\3&4&5\\6&7&8\end{bmatrix}^\intercal \oplus \begin{bmatrix}H&E&R\\P&D&E\\R&P&P\end{bmatrix} = \begin{bmatrix}0&3&6\\1&4&7\\2&5&8\end{bmatrix} \oplus \begin{bmatrix}H&E&R\\P&D&E\\R&P&P\end{bmatrix}$$

Which means that the final value of res, which is printed out on line 51 with printmat(res) is simply the XOR of the seed string with the (transposed) values of IV!

One of the greatest properties of XOR is that if \(C = A \oplus B\), then \(C \oplus B = A\) (and \(C \oplus A = B\)). This means that if we simply XOR the values from the (transposed) output of printmatrix(res) with the ordinal values of the printed seed, we'll recover the IV.

The Solution


Here is my hacked-together solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
#!/usr/bin/env python3
import sys
import socket
import re
from codecs import decode

#Regex Pattern to capture seed string
seedPat = re.compile(r'^SEED:\s(.+)$')

#Create our Socket object and connect to the challenge server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('vermatrix.pwn.democrat', 4201))

#Receive the challenge data
data = b''
data += s.recv(128)
#Turn bytes into strings
data = data.decode('utf8')
#create a list
rows = data.split('\n')

#print formatted data
print(data)
#print list
print(rows)
#remove empty list element
del(rows[4])

#perform a regex match agains the seed string
#could also use rows[0][6:]
seedMatch = seedPat.match(rows[0]).groups(1)[0]

#construct our 'res' matrix
matrix = rows[1] + ' ' + rows[2] + ' ' + rows[3]
matrix = [int(x) for x in matrix.split(' ')]

#verify captured data
print(seedMatch)
print(matrix)

#if the seed is short (9 chars)
if len(seedMatch) == 9:
    #perform XOR of res and seed values:
    g = [x[0] ^ ord(x[1]) for x in zip(matrix,seedMatch)]

#otherwise seed is 18 chars
else:
    #use last half of seed first
    g = [x[0] ^ ord(x[1]) for x in zip(matrix,seedMatch[9:18])]
    #transpose intermediate result
    tmp = [g[0],g[3],g[6],g[1],g[4],g[7],g[2],g[5],g[8]]
    #then XOR tmp value with first half of seed
    g = [x[0] ^ ord(x[1]) for x in zip(tmp,seedMatch[0:9])]

print("G: {}".format(g))

#store transposed result
z = "{},{},{},{},{},{},{},{},{}\n".format(g[0],g[3],g[6],g[1],g[4],g[7],g[2],g[5],g[8])
#convert string to bytes
z = bytes(z,'ascii')

#verify bytestring about to be sent 
print("Calculated IV: {}".format(z))

#send response to server
s.sendall(z)

#receive flag
data = b''
data += s.recv(128)
data = data.decode('utf8')
rows = data.split('\n')

#print flag
print(data)

Something I didn't find out until after I coded up a quick solution is that the challenge server usually spit out seeds that were 18 characters in length instead of 9 like I had witnessed my first few times running the challenge. In these cases, this solution I had written failed since I didn't take into account the second half of the seed. After retrieving the flag by spamming my solution until I got a seed of 9, I modified it to accept seeds of length 18.

You'll see the extra step that is needed starting on line 47 (with the else statement) is that the last half of the seed must be used first, followed by transposing the intermediate solution, before XORing this with the first half of the seed.

In both cases the result matrix g has to be transposed a final time before returning the value to the server. What is interesting to note is that this form of encryption is extremely similar to CBC mode of AES, minus the matrix transposition. Read up on this here.

And finally, running the script:

-> % ./solver.py
SEED: S;edzS'IeF"MA_^:Lg
924 1224 798
926 410 845
801 521 1084

['SEED: S;edzS\'IeF"MA_^:Lg', '924 1224 798', '926 410 845', '801 521 1084', '']
S;edzS'IeF"MA_^:Lg
[924, 1224, 798, 926, 410, 845, 801, 521, 1084]
G: [905, 996, 894, 1166, 447, 534, 884, 858, 1086]
Calculated IV: b'905,1166,884,996,447,858,894,534,1086\n'
flag{IV_wh4t_y0u_DiD_Th3r3}