15 Nov 2021
Itâs been a bit! Cryptography is fun, but so is coding theory, so weâre gonna take a dive in that direction starting today.
If youâve ever scanned a QR code before, youâve probably noticed that you donât need to get a very good picture of the code to have it work. In fact, as Wikipedia shows, you can literally tear off part of the code and it will still scan fine.
The trick is error correction codes. These are special algorithms for encoding information. If certain parts of the message are lost or modified, the original message can still be decoded. There is a limit to how much damage can be done to the encoded message, but as long as the damaged message is âclose enoughâ to the undamaged one, the original message can be decoded from it.
Today begins our journey to implement such a code from scratch. Weâll go right for the one QR codes use, called a Reed-Solomon code. These algorithms are built around finite fieldsâin our case the field \(\operatorname{GF}(2^8)\)âso weâll start by explaining finite field operations today.
A first date with \(\small{\operatorname{GF}(2^8)}\)
First, what is the finite field \(\operatorname{GF}(2^8)\)?
Simply put, itâs the unique field of order \(2^8 = 256\). (Donât worry about the unique part! Weâll show now that it exists, and at the end that itâs unique.)
To work with this field, we want an explicit construction. Let \(\mathbb{Z}_2\) be the integers mod 2, i.e. \(\mathbb{Z}/2\mathbb{Z}\). Then we can define
\[\operatorname{GF}(2^8) := \mathbb{Z}_2[X]/(f(X)),\]
where \(f(X) \in \mathbb{Z}_2[X]\) is any irreducible polynomial of degree \(8\).
We note this is indeed a field, and that by subtracting off multiples of \(f(X)\), every element of this field can be uniquely represented by a polynomial of degree \(< 8\) in \(\mathbb{Z}_2[X]\). Moreover, no matter what irreducible degree \(8\) polynomial we choose for \(f(X)\), weâll get the same field up to isomorphism.
To make this a little more concrete, one element of \(GF(2^8)\) might be given by the polynomial
\[g(X) = X^7 + X^4 + X^2.\]
and another might be given by the polynomial
\[h(X) = X^6 + X^2 + 1.\]
To keep things simple, weâll tend to conflate an element of our field with the polynomial of degree \(< 8\) representing it. Weâll also fix a âstandardâ choice of \(f\):
\[f(X) = X^8 + X^4 + X^3 + X + 1.\]
We can add \(g\) and \(h\) (remember the coefficients are mod 2) in \(\operatorname{GF}(2^8)\) to get
\[\begin{align*}
g(X) + h(X) &= X^7 + X^6 + X^4 + 2X^2 + 1\\
&= X^7 + X^6 + X^4 + 1.
\end{align*}\]
Since the degree is still \(< 8\), thereâs no need to reduce. To multiply \(f\) and \(g\), we can perform the multiplication in \(\mathbb{Z}_2[X]\) and then reduce mod \(f\):
\[\begin{align*}
g(X) \cdot h(X) &= X^{13} + X^{10} + X^9 + X^8 + X^7 + X^6 + X^2\\
&\rightsquigarrow X^7 + X^6 + X^3.
\end{align*}\]
This reduction can be done by polynomial long division, taking the remainder of \(g(X) \cdot h(X)\) divided by \(f(X).\)
Lastly, for inverses, we can use the extended Euclidean algorithm. Given a polynomial \(g(X) \in \mathbb{Z}_2[X],\) the extended Euclidean algorithm finds \(a(X), b(X) \in \mathbb{Z}_2[X]\) such that
\[a(X) f(X) + b(X) g(X) = \operatorname{gcd}(f(X), g(X)) = 1\]
where the latter inequality holds since \(f(X)\) is irreducible. In this case, \(b(X) g(X) \in 1 + (f(X))\), so \(b(X)\) is inverse to \(g(X)\) in \(\operatorname{GF}(2^8)\).
Coding it up!
With just this, we have enough to begin writing our own implementation of \(\operatorname{GF}(2^8)\)! Weâll do it in Python (of course) and represent polynomials as lists of coefficients, starting with the highest-degree monomial.
First, our modulusâŚ
# MOD is the coefficients of the modulus we choose for GF(2^K),
# which here is x^8 + x^4 + x^3 + x + 1
MOD = [1, 0, 0, 0, 1, 1, 0, 1, 1]
âŚand some basic utility functionsâŚ
# trim leading zeros from a polynomial (list of coefficents,
# starting with the largest-degree monomial)
def _trim(poly):
if _is_zero(poly):
return []
return poly[-(_degree(poly) + 1):]
# test if a polynomial is zero
def _is_zero(poly):
return all([coeff == 0 for coeff in poly])
Next, a function to get the degree of a polynomial:
# get the degree of a polynomial
def _degree(poly):
if _is_zero(poly):
return None
# get the highest-degree monomial with non-zero coefficient
exps = range(len(poly) - 1, -1, -1)
degree = max([exp for exp, coeff in zip(exps, poly)
if coeff == 1])
return degree
Now we can get into the real nitty-gritty. Adding two polynomials in \(\mathbb{Z}_2[X]\) is just xoring, but since our input polynomials might be different lengths, weâll want to zero-pad our arguments first:
# add two binary polynomials (i.e. as elts of Z/2Z[X])
def _add(poly_a, poly_b):
# zero-extend the shorter
len_diff = len(poly_a) - len(poly_b)
if len_diff > 0:
poly_b = [0] * len_diff + poly_b
elif len_diff < 0:
poly_a = [0] * (-len_diff) + poly_a
# just xor each coefficient
return [coeff_a ^ coeff_b
for coeff_a, coeff_b in zip(poly_a, poly_b)]
Next, multiplying in \(\mathbb{Z}_2[X]\). To do this, weâll create a result polynomial called prod
and accumulate partial sums into it. To we multiply \(a(X)\) and \(b(X)\), we can simply take each monomial in \(a(X)\), multiply them by \(b(X)\) (which just consists of âshifting overâ the coefficients, since each monomial has a coefficient of \(1\)), and sum the results.
# multiply two binary polynomials
def _multiply(poly_a, poly_b):
# create a product with all zeros
prod = [0 for _ in
range((len(poly_a) - 1) + (len(poly_b) - 1) + 1)]
# go through all self's terms
exps = range(len(poly_a)-1, -1, -1)
for exp, coeff_a in zip(exps, poly_a):
# if a term is nonzero
if coeff_a == 1:
# create a partial sum by shifting other's coefficients
part_sum = poly_b + exp * [0]
part_sum = (len(prod) - len(part_sum)) * [0] + part_sum
# add it into the product
prod = _add(prod, part_sum)
return prod
Now, we want a function to reduce one polynomial mod another using long division. We can do this by repeatedly taking our modulus, âshiftingâ it up (i.e. multiplying it by some power of \(X\)) until itâs the same degree as our polynomial, and then subtracting out the shifted modulus. Weâll also record what multiple we took of the modulus, and total that up into a quotient.
# divide one polynomial by another
def _reduce(poly, mod):
# get mod degree
mod_degree = _degree(mod)
# remove leading zeroes from mod
mod = _trim(mod)
# create quotient
quot = [0 for _ in range(len(poly))]
# long division to reduce
while True:
# get poly degree
poly_degree = _degree(poly)
# if we're fully reduced, break
if poly_degree is None or poly_degree < mod_degree:
break
# shift mod to cancel out the highest monomial
shift_amount = poly_degree - mod_degree
shift_mod = mod + [0] * shift_amount
# subtract result out of poly
poly = _add(poly, shift_mod)
# add corresponding value to quotient
quot[-(shift_amount + 1)] = 1
return quot, poly
We now have all the groundwork laid! Letâs define a class for elements of \(\operatorname{GF}(2^8).\) The only attribute of the class is a list of coefficients, again starting with the highest-degree monomial. (Also, __str__
and __repr__
are safe to ignore, but I wanted to include them just for completeness.)
class QRFiniteField:
"""
An element of the finite field GF(2^8).
Attributes
----------
coeffs : list[int]
A list of binary coefficients, starting with highest-degree
monomial, of the element of GF(2^8).
"""
def __init__(self, coeffs):
self.coeffs = _trim(coeffs)
def __str__(self):
if _is_zero(self.coeffs):
poly = '0'
else:
# get list of exponents, then monomials
exps = range(len(self.coeffs) - 1, -1, -1)
monos = [f'x^{exp}' if exp != 0 else '1'
for coeff, exp in zip(self.coeffs, exps)
if coeff == 1]
# join together with plusses
poly = ' + '.join(monos)
return poly
def __repr__(self):
# add some context for repr
return f'<value {self} in field GF(2^8)>'
Adding in our finite field is just as in \(\mathbb{Z}_2[X]\):
def __add__(self, other):
# adding is just adding binary polynomials
coeffs = _add(self.coeffs, other.coeffs)
return QRFiniteField(coeffs)
Multiplying is the same, but now with reduction:
def __mul__(self, other):
# multiply as regular binary polynomials
prod = _multiply(self.coeffs, other.coeffs)
# reduce the product
_, prod = _reduce(prod, MOD)
# return the result
return QRFiniteField(prod)
Lastly, to compute inverses, we just implement the extended Euclidean algorithm.
def inv(self):
"""
Compute the inverse of this element in GF(2^8).
Returns
-------
QRFiniteField:
The inverse of this element.
"""
# use extended euclidean method
# set initial values
r_last, r_cur = self.coeffs, MOD
s_last, s_cur = [1], [] # polynomials 1, 0
# iterate until we're done
while True:
# calculate next values
q_cur, r_next = _reduce(r_last, r_cur)
s_next = _add(s_last, _multiply(q_cur, s_cur))
if _is_zero(r_next):
# quit if no remainder
break
else:
# otherwise, step forward
r_last, r_cur = r_cur, r_next
s_last, s_cur = s_cur, s_next
# the inverse is stored in s_cur at the end
return QRFiniteField(s_cur)
Finally, we can run some tests and assert things work:
if __name__ == '__main__':
# run a basic test from Wikipedia, Finite Fields
a = QRFiniteField([0, 1, 0, 1, 0, 0, 1, 1])
b = QRFiniteField([1, 1, 0, 0, 1, 0, 1, 0])
assert a.inv().coeffs == b.coeffs
assert (a * a.inv()).coeffs == [1]
And lo and behold, things really work! If youâre curious, or want to play with the code, itâs available at this repository. Iâll be adding code there in the next entries in this series.
For the curious reader, there are certainly lots of ways to optimize finite field arithmetic. The C example here on Wikipedia shows a neat trick for speeding up multiplication in \(\operatorname{GF}(2^8)\), which performs multiplication and reduction simultaneously.
A little math, as a treat
I promised proofs, and here they are! These arenât critical to understanding how to work with \(\operatorname{GF}(2^8)\) in practice, but since I couldnât find somewhere that had all these arguments in one place, I thought Iâd collect them here for the curious reader:
\(\operatorname{GF}(2^8)\) is a field: To see that \(\operatorname{GF}(2^8)\) is a field, it sufficies to show that \((f(X))\) is a maximal ideal in \(\mathbb{Z}_2[X]\). This is true, because if \((f(X))\) werenât maximal, it would be contained in some maximal ideal \(\mathfrak{m}\) containing \(g(X) \notin (f(X)).\) But since \(f(X)\) is irreducible, it is coprime to any such polynomial \(g(X)\). By Bezoutâs lemma, we can find \(a(X), b(X) \in \mathbb{Z}_2[X]\) such that
\[\small{a(X) f(X) + b(X) g(X) = 1.}\]
But then since \(f(X), g(X) \in \mathfrak{m}\), we have \(1 \in \mathfrak{m}\), a contradiction. Thus \(\operatorname{GF}(2^8)\) is a field.
Elements of \(\operatorname{GF}(2^8)\) are represented by polynomials of degree \(< 8\): Next, we claim any element of this ring can be uniquely represented by a polynomial of degree \(< 8\). Indeed, if \(g(X) \in \mathbb{Z}_2[X]\) is a representative of some element of \(\operatorname{GF}(2^8)\), then we can use polynomial long division to write
\[\small{g(X) = a(X) f(X) + r(X),}\]
where \(a(x), r(x) \in \mathbb{Z}_2[X]\) and \(\operatorname{deg} r(x) < 8.\) Since \(g(X)\) and \(r(X)\) differ by a multiple of \(f(X)\), then \(r(X)\) is also a representative of our element of \(\operatorname{GF}(2^8)\). This shows our claim.
These representatives are unique: Moreover, we claim any two elements with distinct representatives of degree \(< 8\) are distinct. If this wasnât the case, weâd have equal elements of \(\operatorname{GF}(2^8)\) with representatives \(g(X), h(X)\) which are distinct but both have degree \(< 8.\) Subtracting \(g\) and \(h\), we would get a representative \(g(X) - h(X) \neq 0\) of \(0 \in \operatorname{GF}(2^8)\) with degree \(< 8\). But this is impossible, as the quotient map
\[\small{\mathbb{Z}_2[X] \to \mathbb{Z}_2[X] / (f(X))}\]
only sends multiples of \(f(X)\) to \(0\), meaning \(g(X) - h(X)\) canât be a representative of \(0\).
\(\operatorname{GF}(2^8)\) is well defined: Lastly, we want to show that our choise of \(f(X)\) doesnât matter. It is sufficient to show the stronger statement that all finite fields of order \(2^8\) are isomorphic. Take any field \(F\) of order \(2^8\). \(F\) must have characteristic \(2\) (as \(F\)âs characteristic must divide its order). We claim that \(F\) is isomorphic to the splitting field of
\[\small{h(X) = X^{2^8} - X.}\]
It sufficies to show that every element of \(F\) is a root of \(h(X)\), as then \(h(X)\) must split into
\[\small{h(X) = \prod_{a \in F} (x - a)}.\]
And indeed, \(0 \in F\) is clearly a root of \(h(X)\), and any other \(a \in F^\times\) must have order dividing \(\lvert F^\times \rvert = 2^8 - 1,\) meaning \(a\) is a root of \(X^{2^8 - 1} - 1\) and thus also a root of \(h(X) = X^{2^8} - X.\) Since splitting fields are unique up to isomorphism, we have all fields of order \(2^8\) are isomorphic.
Thanks for reading! Let me know if you liked this article, or if thereâs any improvements I can make for future ones. đŚ
22 May 2021
3kCTF 2021 was this past weekend, and there was one crypto challenge I really liked for its subtlety: Secure Roots.
The puzzle consists of a single Python file running on a server which we can connect to:
# secure_roots.py
from Crypto.Util.number import getPrime, long_to_bytes
import hashlib, os, signal
def xgcd(a, b):
if a == 0:
return (b, 0, 1)
else:
g, y, x = xgcd(b % a, a)
return (g, x - (b // a) * y, y)
def getprime():
while True:
p = getPrime(1024)
if p % 4 == 3:
return p
class Server():
def __init__(self):
self.private , self.public = self.gen()
print("Prove your identity to get your message!\n")
print("Public modulus : {}\n".format(self.public))
def gen(self):
p = getprime()
q = getprime()
return (p, q), p*q
def decrypt(self, c):
p, q = self.private
mp = pow(c, (p+1)//4, p)
mq = pow(c, (q+1)//4, q)
_, yp, yq = xgcd(p, q)
r = (yp * p * mq + yq * q * mp) % (self.public)
return r
def sign(self, m):
U = os.urandom(20)
c = int(hashlib.sha256(m + U).hexdigest(), 16)
r = self.decrypt(c)
return (r, int(U.hex(), 16))
def verify(self, m, r, u):
U = long_to_bytes(u)
c = int(hashlib.sha256(m + U).hexdigest(), 16)
return c == pow(r, 2, self.public)
def get_flag(self):
flag = int(open("flag.txt","rb").read().hex(), 16)
return pow(flag, 0x10001, self.public)
def login(self):
m = input("Username : ").strip().encode()
r = int(input("r : ").strip())
u = int(input("u : ").strip())
if self.verify(m, r, u):
if m == b"Guest":
print ("\nWelcome Guest!")
elif m == b"3k-admin":
print ("\nMessage : {}".format(self.get_flag()))
else :
print ("This user is not in our db yet.")
else:
print("\nERROR : Signature mismatch.")
if __name__ == '__main__':
signal.alarm(10)
S = Server()
r, u = S.sign(b"Guest")
print("Username : Guest\nr : {0}\nu : {1}\n".format(r , u))
S.login()
On first glance, this appears to be a login system that uses some kind of signature to identify users. To construct a signature for a user, it takes a message (the userâs username) and performs a SHA256 hash of it with random padding, which is then passed through the self.decrypt()
function to obtain a signature. On startup, the user is given a signature for the Guest
account consisting of the signature itself \(r\) and the output \(u\) of bytes_to_long
applied to the padding To get the flag, the user has to pass in a signature instead for the 3k-admin
account, then decrypt an encoded flag from get_flag()
.
Upon inspection, we find that this looks like an implementation of the Rabin cryptosystem. Indeed, everything seems to line up: encryption is squaring mod \(n = p \cdot q\), decryption is taking a modular square root using the Chinese Remainder Theorem trick, etc. In fact, this appears to be verbatim the algorithm on Wikipedia, except for the fact that we always pick one of the four rootsâin the notation of the above link,
\[r_1 = (y_p \cdot p \cdot m_q + y_q \cdot q \cdot m_p) \bmod n\]
where
\[m_p = c^{1/4(p+1)} \bmod p \quad \text{and} \quad m_p = c^{1/4(q+1)} \bmod q,\]
and \(y_p \cdot p + y_q \cdot q=1\). Ignoring the fanciness with \(y_p\) and \(y_q\), we are simply taking \(r_1\) to be the unique value mod \(n=p \cdot q\) which equals \(m_p \bmod p\) and \(m_q \bmod q\).
Now, we claim \(m_p\) is a square root of \(m \bmod p\) when \(m \bmod p\) is a square. The reasoning is outlined in this handy MSE post: since \(m \bmod p\) is a square, we have the Legendre symbol \(\left(\frac{m}{p}\right) = m^{(pâ1)/2} = 1 \bmod p\), and multiplying through by \(m\) we get \(m^{(p+1)/2} = m \bmod p\), so indeed \(m_p^2 = m \bmod p\).
Digging around some more though reveals what might be wrong with this implementation of Rabin signatures: in particular, page 8 of this chapter in Steven Galbraithâs book states that we need to make sure our signed message \(m\) is a square before attempting to compute a square root, replacing it with \(nâm\) if this isnât the case (one will always be a square). Indeed, if \(m\) isnât a square, then \(r_1\) clearly wonât be the modular square root of \(m\). We can verify this is a problem, as attempting to log in with Guest
âs given credentials fails exactly half the time.
So if \(m\) has no modular square root, then what is \(r_1\)? Well, if \(m \bmod n\) is not a square root, then by the Chinese Remainder Theorem, we must have that itâs not a square mod \(p\) or not a square mod \(q\). Without loss of generality, say \(m \bmod p\) is not a square, i.e.
\[\left(\frac{m}{p}\right) = m^{(pâ1)/2} = â1.\]
Then \(m_p^2 = m^{(p+1)/2} = âm \bmod p\), i.e. \(m_p\) is a modular square root of \(âm \bmod p\). Hereâs the key: in this case, \(r_1^2 \bmod p = âm \bmod p\), meaning \(p \mid r_1^2 + m\). Since \(n = p \cdot q\), we then can recover \(p\) by computing
\[p = \operatorname{gcd}(r_1^2 + m, n)\]
and \(q = n / p\). Having factored \(n\), we now have the private key for not only the Rabin signatures but also the get_flag()
secret, which is RSA encrypted with the same keys.
To implement this, we note that when \(m \bmod p\) is a square, the output of \(\operatorname{gcd}(r_1^2+m,n)\) is almost certainly \(1\), so we can simply keep making requests until the value of \(p\) we obtain is not \(1\). From there, since \(n = p \cdot q\) is factored, we can simply create a signature for 3k-admin
using the functions from the challenge script, then use \(p\) and \(q\) to decrypt the RSA-encrypted flag.
# solution.py
import hashlib
from Crypto.Util.number import GCD, long_to_bytes, bytes_to_long
from gmpy2 import is_square
from pwn import *
# copied from challenge script
def xgcd(a, b):
if a == 0:
return (b, 0, 1)
else:
g, y, x = xgcd(b % a, a)
return (g, x - (b // a) * y, y)
# copied
def decrypt(c):
mp = pow(c, (p+1)//4, p)
mq = pow(c, (q+1)//4, q)
_, yp, yq = xgcd(p, q)
r = (yp * p * mq + yq * q * mp) % n
return r
# copied
def sign(m):
U = os.urandom(20)
c = int(hashlib.sha256(m + U).hexdigest(), 16)
r = decrypt(c)
return (r, int(U.hex(), 16))
# copied
def verify(m, r, u):
U = long_to_bytes(u)
c = int(hashlib.sha256(m + U).hexdigest(), 16)
return c == pow(r, 2, n)
p = 1
while p == 1:
# get values for signature and public key
conn = remote('secureroots.2021.3k.ctf.to', 13371)
data = conn.recv().splitlines()
n = int(data[2].split()[-1])
r = int(data[5].split()[-1])
u = int(data[6].split()[-1])
# try to recover p
a = int(hashlib.sha256(
b'Guest' + long_to_bytes(u)).hexdigest(), 16)
p = GCD(n, a + r**2)
# recover q
q = n // p
# try to sign (remember this fails half the time)
while True:
r, u = sign(b'3k-admin')
# if valid, go on
if verify(b'3k-admin', r, u):
break
# send the signature
conn.send(b'3k-admin\n')
print(conn.recv())
conn.send(f'{r}\n'.encode())
print(conn.recv())
conn.send(f'{u}\n'.encode())
# get the encrypted flag
m = int(conn.recv().split()[-1])
# compute RSA private key from p and q
phi = (p - 1) * (q - 1)
d = pow(0x10001, -1, phi)
# decrypt the flag
plain = long_to_bytes(pow(m, d, n))
print(plain)
And, after all that, we are done! Iâve misplaced the flag, haha, but was proud to place 22nd overall putting in a handful of odd hours. Maybe someday Iâll have the skills (and the time!) to participate competitively. Til then, youâre stuck with me and my slow writeups :)
20 Apr 2021
Much of math and cryptography to me is beautiful because of the story it tells. Yesterday, before sitting down and solving this CTF challenge, Iâd been having an incredibly rough day. Fortuantely, the girl who I absolutely adore knew I was having a rough time and took me out for easily the best date of my life. I was moved nearly to tears.
This article is dedicated to her and her brilliant, inventive, hilarious, and kind spirit. Thank you, hun, for helping me tell a better storyâin particular one with you in it.
Anyway, on to the crypto! After such a rollercoaster of a day, I sat down and remembered that PlaidCTF had taken place just the two days before. Naturally, I put off all my other responsibilities and cracked open what looked to be the easiest puzzle by solve count: XORSA.
In this article, Iâve tried to give a faithful representation of my thought process to (1) realistically show where my mind went when solving this puzzle and (2) prove to you how awful I am at CTFs.
The puzzle consists of a tarball containing three files: a public key public.pem
, an encoded flag flag.enc
, and a Sage script xorsa.sage
:
# xorsa.sage
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP
from secret import p,q
x = 16158503035655503426113161923582139215996816729841729510388257123879913978158886398099119284865182008994209960822918533986492024494600106348146394391522057566608094710459034761239411826561975763233251722937911293380163746384471886598967490683174505277425790076708816190844068727460135370229854070720638780344789626637927699732624476246512446229279134683464388038627051524453190148083707025054101132463059634405171130015990728153311556498299145863647112326468089494225289395728401221863674961839497514512905495012562702779156196970731085339939466059770413224786385677222902726546438487688076765303358036256878804074494
assert p^^q == x
n = p*q
e = 65537
d = inverse_mod(e, (p-1)*(q-1))
n = int(n)
e = int(e)
d = int(d)
p = int(p)
q = int(q)
flag = open("flag.txt","rb").read().strip()
key = RSA.construct((n,e,d,p,q))
cipher = PKCS1_OAEP.new(key)
ciphertext = cipher.encrypt(flag)
open("flag.enc","wb").write(ciphertext)
open("private.pem","wb").write(key.exportKey())
open("public.pem","wb").write(key.publickey().exportKey())
This appears to be a standard RSA implementation, except at the start weâre given \(p \oplus q\). PKCS#1 OAEP was new to me, but after reviewing the Wikipedia page, we see that itâs plain old RSA with a padding scheme designed to give âall or nothingâ security, i.e. to recover the plaintext one must recover the entire padded message.
As a first step to solving this puzzle, we want to look into what x
actually is: if \(p\) and \(q\) XOR to it, then it might be worth inspecting bitwise:
>>> bin(x)
'0b111111111111111111111111111111111111111111111111111111101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111111110111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111101111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111011111111111111111111111111111111111110111111111111111111101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110'
One observation: thatâs a lot of ones. Unsurprisingly, that became relevant later, but at first it wasnât obvious why this was the case.
At first, I thought it could be because one of the primes was a Mersenne prime, which has a binary expression of all ones. Unfortunately, taking a string of ones of the same bit length as m
(or any similar length) did not yield a prime. I tried seeing if one of the primes was almost all ones by testing if taking \(p\) as x
with some of the zeros replaced with ones would yield \(pq = n\), but this sadly wasnât the case.
At this point, I spent a good while wandering in the desert. My first thought was maybe there was a way to recover \(p\) and \(q\) given both \(p \oplus q\) and \(pq\). After trying a few toy examples and Googling furiously for something relevant, I found this sad StackExchange post from a year ago asking this exact question with no reply. On the plus side, during this search I found a trick that allows one to recover two numbers from their sum and their XOR, which is detailed in the first answer to this StackOverflow post. This turned out to be inspirational later, but as I had a product, not a sum, it wasnât immediately useful.
Eventually I gave up on trying to recover \(p\) and \(q\) and wondered if I could get some weaker piece of information sufficient to crack the cipher. In particular, I wondered if \(d\) would be recoverable. To find \(d\) from \(e = 65537\), I would need to calculate \((p - 1)(q - 1)\). A quick calculation gives
\[(p - 1)(q - 1) = pq - p - q - 1 = n - (p + q) - 1,\]
i.e. we donât need both \(p\) and \(q\), just \(p + q\). We have the XOR, we want the sum: this sounds like the trick Iâd found online! In particular, this trick hinges on the fact that
\[p + q = 2 (p \land q) + p \oplus q\]
since when we add two numbers, we count the places where \(p\) and \(q\) both have a \(1\) twice, and the places where only one of \(p\) and \(q\) is \(1\) once.
Looking back at our binary string for x
, we note that at almost every position along \(p\) and \(q\), one of them is \(1\) while the other is \(0\). Only at the positions in x
with a \(0\) do we have that both \(p\) and \(q\) have the same digit. Because there are so few of these positions, we can try every possibility! Now it makes sense why \(p\) and \(q\) would be chosen such that x
is mostly ones.
In particular, to calculate a possible value for \(p + q\), we start with x
. For each position \(k\) that weâve decided \(p\) and \(q\) both have a \(1\), we then add \(2^k + 2^k = 2^{k+1}\). Easy enough!
To test whether my guess for \(p + q\) was correct, I used it to calculate \((p - 1)(q - 1)\) as above, then calculated \(d = e^{-1} \bmod (p - 1)(q - 1)\) and tried to use \(d\) to decrypt the message. The problem was that when you create a key with PyCryptodome using RSA.construct((n, e, d))
, it uses the classic trick as described here to factor \(n\) into \(p\) and \(q\) using \(d\), which was prohibitively slow. As far as I can tell, thereâs no way around this. The closest I got was settingconsistency_check=False
and giving phony values for \(p\) and \(q\)âthis actually bypassed calculating \(p\) and \(q\) and allowed me to construct a PKCS1_OAEP
object, but then decryption seemed to fail for every possible value of \(p + q\).
After a bit, I realized instead of trying to use \(d\) to decrypt, I could instead use \((p - 1)(q - 1)\) with \(n\) to recover \(p\) and \(q\) directly. Implementing the trick shown here, I was able to, for each possible value of \(p + q\), compute \((p - 1)(q - 1)\) and then see if this yielded an integer value for \(p\). If it did, I could then use \((p - 1)(q - 1)\) and \(e\) to compute \(d\), pass all values into RSA.construct
, and decrypt the message.
The script to do this is given belowâneedless to say, itâs a bit messy, but it does the job.
# soln.py
import itertools
from decimal import *
from Crypto.PublicKey import RSA
from Crypto.Util.number import inverse
from Crypto.Cipher import PKCS1_OAEP
# set decimal precision to something massive for later
getcontext().prec = 1000
# store value of x from original script
x = 1615850303565550342611316192358213921599681672984172951038825712387991397815888639809911928486518200899420996082291853398649202449460010634814639439152205756660809471045903476123941182656197576323325172293791129338016374638447188659896749068317450527742579007670881619084406872746013537022985407072063878034478962663792769973262447624651244622927913468346438803862705152445319014808370702505410113246305963440517113001599072815331155649829914586364711232646808949422528939572840122186367496183949751451290549501256270277915619697073108533993946605970413224786385677222902726546438487688076765303358036256878804074494
# read in n and e from the public key
with open('public.pem', 'rb') as f:
pubkey = RSA.importKey(f.read())
n, e = pubkey.n, pubkey.e
# read in the ciphertext
with open('flag.enc', 'rb') as f:
ciphertext = f.read()
# get the binary of x
b = '0' + bin(x)[2:]
# get the powers 2^k such that the corresponding place in b is zero
powers = [len(b) - i - 1 for i in range(len(b)) if b[i] == '0']
# iterate over all subsets of positions where p and q share the same bit
done = False
for count in range(0, len(powers) + 1):
for subset in itertools.combinations(powers, count):
# for each such subset, assume that p and q have ones at those positions
# and calculate how much that contributes to p + q beyond x
offset = sum([2**(k + 1) for k in subset])
# get the resulting value of p + q
total = x + offset
# calculate phi given this assumption
phi = n - total - 1
# now, calculate the radicand in the formula for p given phi and n
radicand = (n + 1 - phi)**2 - 4 * n
# if it's negative, go to the next subset
if radicand < 0:
continue
# if the radicand is actually a square, calculate p, q, and d
if (int(Decimal(radicand).sqrt()))**2 == radicand:
rad = int(Decimal(radicand).sqrt())
p = ((n + 1) - phi + rad) // 2
q = n // p
d = inverse(e, phi)
done = True
break
if done:
break
# construct a key from p, q, and d and decrypt the flag
key = RSA.construct((n, e, d, p, q))
cipher = PKCS1_OAEP.new(key)
flag = cipher.decrypt(ciphertext)
print(flag)
With a minute or so of thought, our script outputs the long-awaited flag:
PCTF{who_needs_xor_when_you_can_add_and_subtract}
I sadly couldnât submit this flag, as the competition was over, but needless to say I was far more excited to solve the challenge than anyone reasonably should have been! It was a beautiful end to a night beyond compare, and going to bed that night, I felt remarkably at peace.
17 Apr 2021
For my first âactualâ post on this blog, Iâll share a trick I learned through a remarkable headache.
I was working on implementing the GGH97 lattice cryptosystem for a CTF problem I was designing. The GGH public key cryptosystem has a beautiful attack on it discovered in 1999 by Phong Nguyen in his paper âCryptanalysis of the GoldreichâGoldwasserâHalevi Cryptosystem from Cryptoâ97.â Iâll detail the attack in my next post, but for now, suffice it to say the attack hinges on lattice reduction, and my lattice reductions in Sage were taking orders of magnitude longer than the results in the literature indicated they should.
In particular, the original work was done on a 500 MHz DEC Alpha using NTL, and even when I used the NTL backend in Sage (versus the default fpLLL backend) any call to Matrix_integer_dense.LLL()
/BKZ()
took prohibitively long for GGH public keys larger than say, \(80 \times 80\). Since the primary empirical hardness results for GGH that necessitate the Ngu99 attack demand keys of \(120 \times 120\) and larger, this wouldnât do.
After ages of searching, I found this ticket from 14 years ago on the SageMath development webpage, where someone had complained that lattice reduction was significantly faster in Magma than in Sage. In response to this, the user malb
noted that the slowness was due to using NTLâs âexactâ algorithm rather than the floating point implementation. The patch this user developed then allowed passing algorithm='NTL:LLL_FP'
to LLL()
, allowing for comparable speed to Magma.
To try this myself, I peeked at the current documentation and found that the modern interface uses fp=...
instead to specify precision. In particular, passing fp='fp'
provided more than enough precision for cryptanalysis of GGH and gave me even faster reduction than that reported in Nyu99. If youâre using BKZ()
, the same values can be passed for fp
.
This may be obvious to people, but I shared it on the CryptoHack Discord and enough people (read: at least 2 people) thought it was cool that I thought I should post it.
tl;dr: check out the fp
argument for LLL()
in Sage if lattice reduction is slow!
16 Apr 2021
Iâm making a blog now! My hope is that Iâm sufficiently bad at what I like to learn (algebraic geometry and post-quantum crypto) that I can explain things well enough for people like me to understand. Expect content here soon!