Friday, December 18, 2015

Codes and Coding

Dear readers, hi fellows,
since my master's thesis was about linear codes and their applications in cryptography, I would like to present some of its content in a nutshell and to recommend some of the software that I used in the course of writing it.

(Figure: multiple scientific fields are involved; created using TikZ)
Modern cryptography covers, among other things, the three areas of privacy, authentication and integrity of messages. The aim of coding theory, on the other hand, can be described in one word: reliability, that is, adding redundancy to messages in order to detect transmission errors. At first glance, the two fields cover quite different areas of digital information processing.

The motivation of my thesis was to showcase the combination of multiple scientific fields within coding theory and modern cryptography and to explore how they interact in the domain mentioned above and why they are a good choice.

At the center of attention stood the McEliece public-key cryptosystem, its variant (Niederreiter's cryptosystem) and generalizations to other code classes, after the code classes of interest had been introduced and discussed.

The McEliece public-key cryptosystem was published in 1978 and is based on disguising a special code as a general code. It was of interest back then and remains so today because it seems to stand solid in the so-called post-quantum era, when a powerful quantum computer becomes available to attackers. More than 35 years ago the original code parameters, codeword length $n = 1024$, code dimension (or number of message bits) $k = 524$ and error-correcting capability $t = 50$, were proposed for the binary code underlying the McEliece public-key cryptosystem, aiming at $2^{64}$ bit security. To achieve a reasonable security level today, they have been adjusted to $n = 6960$, $k = 5413$, $t = 119$ because of attacks that have been explored since then.

Given the number $t$ of erroneous positions and a general code $C$, the task of decoding $y = c + e$ is to find a codeword $c \in C$ such that the Hamming weight (the number of non-zero entries) of the error satisfies $w(y-c) = w(e) = t$. The complexity-theoretic literature shows that this problem is NP-complete; accordingly, no algorithm is known that fulfills this task in polynomial time in the input size (the length of $y$ respectively $c$).
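To make the decoding task concrete, here is a minimal (and exponential-time) sketch that enumerates all codewords of a toy $[7,4]$ Hamming code, rather than a Goppa code, and picks the one closest to $y$ in Hamming distance:

```python
from itertools import product

def hamming_weight(v):
    """Number of non-zero entries of a binary vector."""
    return sum(v)

def brute_force_decode(G, y):
    """Find the codeword c minimizing w(y - c) by enumerating
    all 2^k codewords of the code generated by G (exponential cost,
    which is exactly why general decoding is considered hard)."""
    k, n = len(G), len(G[0])
    best_c, best_w = None, n + 1
    for m in product([0, 1], repeat=k):
        c = [sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        w = hamming_weight([(y[j] - c[j]) % 2 for j in range(n)])
        if w < best_w:
            best_c, best_w = c, w
    return best_c, best_w

# generator matrix of a [7,4] Hamming code in systematic form
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]
c = [1,0,0,0,0,1,1]   # a codeword
y = [1,0,0,0,0,1,0]   # the codeword with one bit flipped (t = 1)
decoded, t = brute_force_decode(G, y)
print(decoded, t)     # [1, 0, 0, 0, 0, 1, 1] 1
```

Since the minimum distance of this code is 3, the single flipped bit is corrected uniquely.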

The choice of a Goppa code as the private key, which is disguised by multiplication with a scrambling matrix and a permutation matrix to obtain a public key that looks as if it belonged to a general code, was made for performance reasons: Goppa codes can be decoded efficiently thanks to Patterson's algorithm.
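A toy sketch of the disguising step (for illustration only: the real scheme uses a large binary Goppa code, and the small generator matrix here is a hypothetical stand-in) computes the public generator matrix as $G_{pub} = S \cdot G \cdot P$ with a random invertible $S$ and a random permutation matrix $P$:

```python
import random

def mat_mul(A, B):
    """Binary matrix product mod 2."""
    return [[sum(a * b for a, b in zip(row, col)) % 2
             for col in zip(*B)] for row in A]

def rank_gf2(M):
    """Rank of a binary matrix via Gaussian elimination over GF(2)."""
    M = [row[:] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col]:
                M[i] = [x ^ y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def random_invertible(k):
    """Rejection-sample a random invertible k x k scrambling matrix S."""
    while True:
        S = [[random.randint(0, 1) for _ in range(k)] for _ in range(k)]
        if rank_gf2(S) == k:
            return S

def random_permutation_matrix(n):
    perm = random.sample(range(n), n)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]

# private generator matrix G (toy 4x7 stand-in for a Goppa code)
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]
S = random_invertible(4)
P = random_permutation_matrix(7)
G_pub = mat_mul(mat_mul(S, G), P)   # the published "general-looking" key
```

Since $S$ is invertible and $P$ only permutes columns, $G_{pub}$ generates a code equivalent to the private one, but without revealing its Goppa structure.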

Interestingly, many attacks are not applicable to the class of binary Goppa codes that McEliece's scheme was based on, but only to similar schemes and generalizations allowing other alphabets or other structures; binary Goppa codes were seemingly proposed with foresight.

In spite of these advantages and the resistance, so far, to quantum-computer attacks, the biggest drawback of code-based public-key cryptosystems is their rather long public keys compared to other schemes. If one adds a security margin, under the assumption of a working quantum computer performing the general attack via Grover's algorithm, the public key size grows to about 1 MB.
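This figure is easy to reproduce for the updated parameters, under the assumption that the public key is stored in systematic form, so that only the $k \times (n-k)$ redundant part of the generator matrix needs to be published:

```python
# McEliece public key size for n = 6960, k = 5413 (systematic form:
# only the k x (n-k) non-identity part of the generator matrix is stored)
n, k = 6960, 5413
bits = k * (n - k)
print(bits / 8 / 1e6)  # ~1.05 MB
```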

In the thesis I implemented the fast algorithm (by N. J. Patterson) for algebraically decoding binary Goppa codes in Sage and demonstrated the full functionality of the system in yet another chapter by giving a complete example. Since I was writing everything up in $\LaTeX$, using sagetex even allowed computing the results and inserting them into the .tex file on the fly while the document is being compiled!

My recommendation (to my fellows) for researching, analyzing, developing and testing algorithms in a structured manner, with the ability to create nice plots and documents rather easily, is to give Sage or Spyder a try. Even if your research does not actually require you to code anything, playing around with these readily available tools is instructive and worth a try.

Sage is available online or offline via installation, and it is a good choice if you need access to, e.g., number-theoretic functionality. The aforementioned Spyder may be preferable if your case calls for a somewhat more automated handling of the implemented primitive at hand, closer to the programming language Python.

(Photo: autumnal Bochum around Oct 31)
As usual, a little plurivalent:
Happy Coding and Happy Halloween - because $Dec(25) = Oct(31)$, right?!
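For the skeptical, the pun checks out in one line of Python:

```python
# decimal 25 written in octal is 31, so Dec(25) = Oct(31)
print(int("31", 8))  # 25
```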

Tuesday, December 15, 2015

Spring School on Symmetric Cryptography

We are happy to announce the ECRYPT-CSA Spring School on Symmetric Cryptography. The spring school will be held in Bochum, Germany. Bochum is located within the famous "Ruhr Area", Germany's former primary location for coal mining and steel industry. Although most coal mines have been closed, the hidden beauty of those old days has been preserved in many places.

The Spring school takes place right before FSE 2016.

The goal of this school is to provide academia and industry with some of the necessary background to understand symmetric cryptography. On top of that, several recent developments in symmetric cryptography design and implementation will be presented.

Sunday, December 13, 2015

Post-Snowden Cryptography (PSC - Brussels)

On December 9 and 10, the Post-Snowden Cryptography Workshop took place in Brussels. The main topic of the discussion was cryptography and privacy after the revelations from Edward Snowden.

Day one

On the first day of the workshop, we learned about Qubes OS and how it works from Joanna Rutkowska. In the second talk, Kenny Paterson presented some disturbing facts about the NSA's capabilities from Snowden's revelations, such as a rumoured "real-time breakage of RC4" and the "ability to break certain unnamed VPN products". He also gave some solutions to the new problems of surveillance, such as building cryptographic implementations that are "strong, fast and developer-friendly." Next, Phil Zimmermann gave a speech and answered questions. In the last talk of the first day, Jon A. Solworth presented the Ethos Operating System and how his group built the OS. He explained some important features of the OS, such as how its networking is "encrypted, cryptographically authenticated, and authorized."

Day two

On the second day, Christopher Soghoian spoke about the revelations of the NSA and how the FBI uses this information, i.e. how law enforcement is the "little brother" that does the actions. Claudia Diaz presented some results about website fingerprinting on Tor, i.e. how to collect information from a client that is using Tor. Website fingerprinting consists of recognizing the traffic patterns of specific web pages without other information. Ian Goldberg presented DP5. DP5 allows clients to register and interact with friends without leaking metadata.

Next, Christian Grothoff talked about the development of GNUnet and how it is built; to illustrate one small aspect of the work, namely the complexity of developing any library, he presented a dependency graph of GNUnet. Then Jacob Appelbaum presented a "modest post-Snowden proposal." (In fact, the more precise term would be post-Snowden-revelations, since Snowden is still alive.) In this talk he presented a different view of the situation, namely that we do not need to hide information. In the last talk of the day, Nathan Freitas presented an overview of the Guardian Project and his experience with the development of secure apps for Android. He showed that the Guardian Project offers SDKs for encryption in mobile applications, such as SQLCipher for encrypting databases.
The event presented an ethical view of cryptography and of the path cryptography is following. The timing for this discussion could hardly have been better: following Phillip Rogaway's essay "The Moral Character of Cryptographic Work," cryptographers need to pay more attention to the implications of their work. In simple words, this means that we should care about what we are doing and how it will impact society. We need to ask ourselves what kind of legacy we want to leave behind, and what benefits we want to bring to the world in general.

Slides for some speakers' talks are available on the Post-Snowden Cryptography Workshop website.


The videos of the talks are also available online.

Friday, December 4, 2015

Workshop on Lattice Cryptography

It is the day after AsiaCrypt 2015 and there are two workshops being held in Auckland. The one most relevant for my research is that on Lattice-Based Cryptography, which consists of four talks: one by Jung Hee Cheon on "Multilinear Maps and Their Cryptanalysis", one by Amit Sahai on "Obfuscation", one by Fre Vercauteren on "Weak Instances of RLWE" and one by Martin Albrecht on "Small Secret LWE".

Cheon first described a very naive version of multilinear maps and then went on to show how this can be attacked by creating non-trivial encodings of zero, and then taking greatest common divisors. Then he went on to generalise this naive scheme to the CLT scheme (which is a bit like the DGHV FHE scheme). The naive attack does not apply to CLT, as the dimension is increased, meaning that taking naive greatest common divisors no longer works. Cheon then showed how to extend the naive attack to the CLT case by turning the gcd extraction into an eigenvalue extraction problem. This is done by building quadratic forms which represent encodings of zero. The result is that for the CLT scheme one can break the equivalent of the DLP problem.
Cheon then went on to present the GGH scheme, which is a bit like the NTRU FHE scheme; except that instead of encrypting via c=[(m+r*p)/z] for an integer p, one encodes via c=[(m+r*g)/z] for a polynomial g which generates the ideal lattice <g>. Modifying the prior attack for this situation allows us to recover a basis of this ideal. Finding a short vector in this lattice can still be hard; however, by utilizing encodings of zero one can actually solve the equivalent of the CDH problem.
Both attacks rely heavily on the presence of encodings of zero. So the attacks do not apply to settings in which no such encodings are published, e.g. applications such as indistinguishability obfuscation (iO).

Amit Sahai then gave an introduction to iO; he motivated it via an analogy of an attacker who captures your brain and is able to read and tamper with every neuron, yet we still do not want the attacker to know what we are thinking about. This is the problem which obfuscation tries to solve in the computing realm. Martin pointed out that this would be a great way to produce malware!

Amit then put Multi-Party Computation within this analogy. He suggested we can think of MPC as protecting our brain against the tampering adversary, by dividing the brain up into portions. As long as one portion is kept out of the adversaries control we can use MPC to protect our thoughts. Obfuscation tries to do the same, without there needing to be an honest part of the brain.

Any program which is suitable for obfuscation must be unlearnable from query access to the program, since otherwise the adversary could learn the program from its input/output behaviour. However, black-box obfuscation has been shown to be impossible; essentially because there are contrived programs which are unlearnable but for which one cannot produce an obfuscation, since any obfuscated version of such a program admits an explicit attack.

This is why iO was presented as a concept: it at least seems possible to achieve. The idea is that if you have two equivalent programs and obfuscate one of them, then the adversary cannot tell which one was obfuscated. One way of thinking of this is as a pseudo-canonicalizer. The question is what useful things one could do given an obfuscator satisfying the iO definition. Amit gave the application of building demo versions of software without needing to re-engineer the software.

Fre Vercauteren then gave a more in-depth analysis of a paper from CRYPTO this year on Weak Instances of Ring-LWE. The CRYPTO paper gave instances where decision Ring-LWE was easy, but search appeared to be hard. However, Fre's talk showed that the search problem was in fact easy from the start, and thus the CRYPTO paper was less surprising than it at first seemed to be. As with all things Ring-LWE, the question arises as to how to choose the error distributions.

Fre spent the first part of his talk discussing the geometry of number fields, and in particular the Minkowski embedding. The Ring-LWE problem generates errors according to a discrete Gaussian distribution in the Minkowski embedding, whereas Poly-LWE generates the errors according to a discrete Gaussian in the polynomial embedding.

Eisenträger et al. discussed cases for which Poly-LWE is easy; these were then extended by Elias et al. to special cases of decision Ring-LWE. They did this by mapping the special Ring-LWE instance to a special Poly-LWE instance, pulling the problem back from Ring-LWE to Poly-LWE via the matrix which defines the Minkowski embedding. The Poly-LWE attack requires that q is larger than f(1), and hence q will "kind of show up" in the coefficients of the defining polynomial f. So the fields being attacked are very special indeed.

(This post originally appeared in the BristolCrypto blog)

Tuesday, December 1, 2015

Optimality of frequency analysis on deterministically-encrypted database columns

This fall at CCS '15, Naveed, Kamara, and Wright published a paper about their evaluation of some passive attacks on encrypted databases (EDBs). (There are posts about their paper on Kamara's blog and on the Bristol Cryptography Blog.) Their paper is important because it illustrates the power of simple attacks on columns encrypted with property-preserving encryption. All the attacker needs is a snapshot of the encrypted database and some relevant, publicly-available auxiliary data. In this post, I briefly explain an observation that Kenny Paterson and I made about the optimality of frequency analysis.

About the attacks

Two of the attacks Naveed, Kamara, and Wright evaluated, frequency analysis and ℓp-optimization, operate on deterministically-encrypted columns. (Of course, the appeal of deterministic encryption is that it naturally allows equality testing on encrypted data.) Both of these attacks are passive—they do not rely on seeing any queries to the EDB, and they require only a copy of it at rest. They do, however, require an auxiliary data set. In essence, an auxiliary data set is one that should have the same distribution as the target data (i.e., the data underlying the encrypted columns). Here's the main idea of the two attacks:

  • frequency analysis decrypts the column by matching the most frequent ciphertext with the most frequent plaintext from the auxiliary data (and so on for the other less frequent ciphertexts).
  • ℓp-optimization decrypts the column by matching the frequencies of the ciphertexts with the frequencies of the auxiliary plaintexts in a way that minimizes the ℓp-distance of their histograms. (See Naveed et al.'s paper for details.)
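A minimal sketch of the first attack, frequency analysis, is just a matter of sorting both sides by frequency (the ciphertext labels and auxiliary values below are made up for illustration):

```python
from collections import Counter

def frequency_analysis(ciphertexts, auxiliary):
    """Guess the decryption map by pairing the i-th most frequent
    ciphertext with the i-th most frequent auxiliary plaintext."""
    ct_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    pt_ranked = [p for p, _ in Counter(auxiliary).most_common()]
    return dict(zip(ct_ranked, pt_ranked))

# hypothetical deterministically-encrypted column and auxiliary data set
column = ["x9", "x9", "x9", "q2", "q2", "z7"]
aux    = ["FR", "FR", "FR", "FR", "DE", "DE", "BE"]
print(frequency_analysis(column, aux))
# {'x9': 'FR', 'q2': 'DE', 'z7': 'BE'}
```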

Attacking a deterministically-encrypted database column using this type of auxiliary data is like attacking a monoalphabetic substitution cipher given plaintext letter frequencies. Consider the following example: suppose you intercept the following ciphertext of some English message that was encrypted with a keyword cipher.


The simplest attack is well known: just look up English letter frequencies and you can probably crack the cipher by matching the most frequent letter in the ciphertext with the most frequent letter of the language (E in English). A slightly more powerful attack would involve looking at which pairs of letters occur most often next to each other, and using bigram statistics (most common bigram in English: TH) to crack the cipher. The question of whether frequency analysis is an optimal attack on a keyword cipher is just not relevant, since there is other information to exploit.

In the EDB context, though, the letters (i.e., individual data values) are not ordered! The ciphertext would look more like this, with one "letter" per record:

(Table: a single encrypted column, "Attribute 1", with one ciphertext value per record)

Even if you knew the alphabet (say, EU country names), higher-order frequency statistics (like of bigrams) just can't apply since there is no concept of order or adjacency.

Our observation

So, given a copy of an EDB column and an auxiliary data set that is correlated with it, what is the best we can do in terms of attacking a deterministically-encrypted column? What should we use—frequency analysis, ℓp-optimization, or something else? We propose applying maximum likelihood estimation (MLE) to answer this question. MLE is a statistical technique for finding the parameter value of a probability distribution that maximizes the likelihood1 of seeing the particular sample data observed.

Why does it apply here? Well, we have samples of data that should have the distribution of the auxiliary data, but they've been relabelled (i.e., encrypted). The attacker wants to find the mapping between plaintext values (from the auxiliary data) and the relabelled values (from the encrypted target data). In other words, the distribution's parameter that we want MLE to find is a permutation.

Which permutation makes the target data most likely? The cool thing—and with hindsight, the unsurprising thing—is that according to the maximum likelihood principle, frequency analysis is exactly the optimal technique2 that arises from maximizing the likelihood function!
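One can check this numerically on a toy example: enumerating all candidate permutations and scoring each by its log-likelihood under an assumed auxiliary distribution recovers exactly the frequency-matching assignment. (The distribution and column below are made up for illustration.)

```python
from collections import Counter
from itertools import permutations
from math import log

# auxiliary distribution over 3 plaintext values (assumed known)
probs = {"FR": 0.6, "DE": 0.3, "BE": 0.1}
# observed encrypted column (the labels hide the plaintexts)
column = ["a"] * 7 + ["b"] * 2 + ["c"] * 1
counts = Counter(column)

def log_likelihood(mapping):
    """Log-likelihood of the column if ciphertext x decrypts to mapping[x]."""
    return sum(n * log(probs[mapping[ct]]) for ct, n in counts.items())

# maximize over all permutations of the plaintext alphabet
best = max(
    (dict(zip(counts, perm)) for perm in permutations(probs)),
    key=log_likelihood,
)
print(best)  # {'a': 'FR', 'b': 'DE', 'c': 'BE'}
```

The winner assigns the most frequent ciphertext to the most probable plaintext, and so on down the ranking, exactly as frequency analysis does.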

You can read the details in the short note we posted on ePrint.

1 What's the difference between probability and likelihood? Informally, probability matters when we're given a certain model and want to know about samples from it. Likelihood, on the other hand, matters when we're given the data, and want to know about the model (or distribution) from which it arises. Here's a simple example to illustrate the difference. Given a fair coin, we could ask (and compute) what is the probability of getting 2 heads when tossing it 10 times. On the other hand, suppose we have some coin of questionable origin. We toss it 10 times and observe 2 heads. Then, we could ask (and compute, using MLE) whether the coin is fair.
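The coin example above can be made concrete in a few lines; a grid search stands in for the closed-form MLE, which here is simply the observed fraction of heads, 2/10:

```python
from math import comb

# Probability: given a fair coin, the chance of exactly 2 heads in 10 tosses
p_fair = comb(10, 2) * 0.5**10
print(round(p_fair, 4))   # 0.0439

# Likelihood: given 2 heads in 10 tosses, which bias p maximizes
# L(p) = C(10,2) * p^2 * (1-p)^8 ?  The MLE is p = 2/10.
candidates = [i / 100 for i in range(1, 100)]
mle = max(candidates, key=lambda p: comb(10, 2) * p**2 * (1 - p)**8)
print(mle)                # 0.2
```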

2 MLE may be optimal, but it is not infallible. If the sample size is very small (i.e., if there are few entries in the column of target data), then MLE won't perform well.

Monday, November 30, 2015

Privacy Preserving Authentication for Lightweight Devices

Welcome again, dear reader!

I have recently read "End-to-end Design of a PUF-based Privacy Preserving Authentication Protocol" by Aysu et al. which was published at CHES 2015.
They have designed and implemented a complete protocol for authenticating lightweight devices and include a security proof.
I would like to start by pointing out their definition of privacy:
A trusted server and a set of [...] deployed devices will authenticate each other where devices require anonymous authentication. [...] Within this hostile environment, the server and the devices will authenticate each other such that the privacy of the devices is preserved against the adversary.
So they are protecting the privacy of the authenticating devices against an adversary, but not against the server to which they are authenticating! If your background is in public-key crypto, or if you consider internet technologies, this might seem surprising. After all, can't we just establish a secure channel to the server, so that an eavesdropper or man-in-the-middle will never learn anything?
Of course, it turns out the situation is not so simple:
  • Even if you use public-key crypto you have to be very careful. In 2014, a paper by Degabriele et al. showed how to fingerprint Australian health care cards using only ciphertexts encrypted with a card-specific public key. Here is an explanation of the issue. The problem is that they used RSA, which does not offer key privacy. Key privacy is the property that, given a ciphertext and a set of public keys, it should not be possible to determine which key was used for encryption.
  • For RFID tags it is actually often assumed that public key crypto is not possible at all. It is either too expensive (in terms of hardware) or too slow. One standard for e-ticketing, for example, requires a response time of less than 300 ms [source].
That the authors consider a symmetric system is also obvious from the extra capabilities that they grant their adversary:
The malicious adversary can control all communication between the server and (multiple) devices. Moreover, the adversary can obtain the authentication result from both parties and any data stored in the non-volatile memory of the devices.
Of course, if you want to use symmetric crypto you need to make sure that an adversary cannot just extract the shared key from a device.

And that is what the authors tackle in this paper, and where their adversarial model comes from!
Their solution to the problem involves another interesting invention, so-called physically unclonable functions (PUFs). These use properties of the hardware to create a function which can only be evaluated on one device. Usually this is done by using some more or less random physical properties, like the initial contents of a memory chip after it is powered on, which are then post-processed so that outputs from different devices are random but the output of the same device is reasonably consistent.

With the help of physically unclonable functions the device and the server can derive a new shared secret for each session but the secret cannot be extracted from the device! Awesome.
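To get a feeling for the post-processing idea, here is a small simulation. This is my own toy model, not the construction from the paper: each "device" has fixed per-bit biases, readouts are noisy samples of those biases, and a per-bit majority vote over several readouts yields a stable, device-specific response:

```python
import random

random.seed(1)
N_BITS = 64

def make_device():
    """A hypothetical SRAM PUF: each cell strongly prefers 0 or 1."""
    return [random.choice([0.05, 0.95]) for _ in range(N_BITS)]

def read_puf(device):
    """One noisy power-up readout of the device's cells."""
    return [1 if random.random() < bias else 0 for bias in device]

def stable_response(device, reads=15):
    """Post-process several readouts by per-bit majority vote."""
    readouts = [read_puf(device) for _ in range(reads)]
    return [1 if sum(bits) > reads // 2 else 0 for bits in zip(*readouts)]

dev_a, dev_b = make_device(), make_device()
# same device -> consistent output; different devices -> different outputs
print(stable_response(dev_a) == stable_response(dev_a))  # True (w.h.p.)
print(stable_response(dev_a) == stable_response(dev_b))  # False (w.h.p.)
```

Real PUF post-processing uses error-correcting codes (fuzzy extractors) rather than plain majority voting, but the goal is the same: a repeatable secret that never needs to be stored in non-volatile memory.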

Video lectures for Multiparty Computation

I am a big fan of filmed talks and conferences, as I personally find they are the best (or at least the kindest) way to get into a new subject. Considering this, that I am new to the MPC world, and that you might have ended up here looking for some online learning resources, here are a few talks I have collected that might help you delve into this interesting area of research. When a name is given, it is that of the person who gave the talk.

Introductory talks 

Practical Two-Party Computation Based on Yao's Protocol 
The TinyOT Protocol 
SPDZ and Specific Protocols 
Other interesting results in MPC (mostly practice-focused): 

Thursday, November 26, 2015

Lattice meeting in Lyon

On November 25 and 26 I, together with some colleagues from the ENS lab, took part in a lattice meeting held at ENS Lyon and organized by Damien Stehlé.

We took a morning train from Paris to Lyon, where we arrived at about 11 am. We then had lunch in a Spanish restaurant, where we enjoyed some tapas, tortillas, serrano ham and other delicious things, before heading for ENS Lyon for the meeting.

The first talk, based on this article, was given by Nicolas Gama, from Inpher Switzerland and UVSQ France, and the title was "Solving shortest and closest vector problems: the decomposition approach". He began by giving some background notions on lattices (and also showed a very nice 3D animation of a lattice which was very appreciated by the audience) before moving on to the real topic of the talk. After introducing the Minkowski inequality he talked about the hardness of the SVP problem and showed us a graph which gives an idea of how the hardness changes when we choose a different approximation factor.
Then he talked about lattice basis reduction, starting from Lagrange's algorithm, which unfortunately works only in dimension 2; for higher dimensions we have the LLL algorithm by Lenstra, Lenstra and Lovász.
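Lagrange's dimension-2 reduction is simple enough to sketch in a few lines (a toy implementation for intuition, not for cryptanalytic use):

```python
def gauss_reduce(u, v):
    """Lagrange's (a.k.a. Gauss's) reduction in dimension 2: returns a
    basis whose first vector is a shortest non-zero lattice vector."""
    def norm2(w):
        return w[0] * w[0] + w[1] * w[1]
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        # subtract from v the integer multiple of u closest to it
        m = round((u[0] * v[0] + u[1] * v[1]) / norm2(u))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= norm2(u):
            return u, v
        u, v = v, u

print(gauss_reduce((1, 0), (100, 1)))  # ((1, 0), (0, 1))
```

This is the 2-dimensional ancestor of LLL: the same "reduce, swap if shorter" loop, generalized to higher dimensions with a relaxed exchange condition.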
The central part of the talk was dedicated to sieving algorithms to solve SVP and CVP problems. Three kinds of approach were presented: based on geometric clustering, based on "representation and decomposition" and based on "towers of lattices", which means considering many lattices, each one included in the following one.

The second talk, based on this article, was given by Ilya Ozerov, who is about to complete his PhD at RUB, and its title was "On computing nearest neighbours with applications to decoding of binary linear codes". After a quick introduction to codes he presented a new algorithm which achieves the fastest decoding for random binary linear codes, with a complexity of $2^{0.097n}$, which is a nice improvement on Prange's $2^{0.121n}$ (1962) and subsequent results.

After two very interesting talks, we had a nice dinner in a bouchon lyonnais and... now we get ready for day two! In particular, we will talk about algebraic algorithms for LWE and nearest neighbour searching with applications to lattice sieving.

Stay tuned for the second part!

Day 2

The second day started with a talk given by Ludovic Perret, from UPMC. The talk, based on this paper, was about "Algebraic algorithms for LWE". After a very quick introduction to the LWE problem and the worst-case/average-case connection with the Gap-SVP problem, the speaker presented the Arora-Ge algorithm and introduced the Binary-Error LWE problem, which features a binary error but provides the same security guarantees as the normal version of the problem. In this case we have to pay particular attention to the number of samples that is given out: if the error is binary then it must be limited, while in the general case there is no such constraint and we can have as many samples as we want. In the second part of the talk, a proof of the Arora-Ge algorithm was presented.

Léo Ducas from CWI gave the second talk of the day about "New directions in nearest neighbour searching with applications to lattice sieving". During this talk the technique of sieving and the Gap-NNS problem were presented.

This was my first lattice meeting and I have to say it was a really nice experience, so I would like to thank the organizers and all the friends I spent time with. I think such meetings are a wonderful opportunity to learn new things, ask questions and discuss new ideas, but also to meet people from different institutions who work on various problems.
Should anyone be interested, the next lattice meeting will be on January 21 and 22 in Paris. It will be organized by ENS Paris, and we hope to welcome many researchers!