Tuesday, August 23, 2016

CRYPTO 2016 – BACKDOORS, BIG KEYS AND REVERSE FIREWALLS ON COMPROMISED SYSTEMS

The morning of the second day at CRYPTO 2016 started with a track on “Compromised Systems”, consisting of three talks covering scenarios and attacks usually disregarded in the vast majority of the cryptographic literature. All three also shared a concern about mass surveillance.

Suppose Alice wishes to send a message to Bob privately over an untrusted channel. Cryptographers have worked hard on this scenario, developing a whole range of tools with different notions of security and setup assumptions. One of the most common assumptions among them is that Alice has access to a trusted computer with a proper implementation of the cryptographic protocol she wants to run. The harsh truth is that this is a naïve assumption: the Snowden revelations show us that powerful adversaries can and will corrupt users’ machines through extraordinary means, such as subverting cryptographic standards, intercepting and tampering with hardware on its way to users, or deploying Tailored Access Operations units.

Nevertheless, the relevance of these talks is not just a matter of a trending topic, or of distrust of the authoritarian and unaccountable practices of intelligence agencies. More frequently than we would like, presumably accidental vulnerabilities (such as POODLE, Heartbleed, etc.) are found in popular cryptographic software, leaving the end user unprotected even when using honest implementations. In the meantime, as Paul Kocher recalled in his invited talk the day after, it often passes unnoticed in our community that, when we design our primitives and protocols, we blindly rely on a mathematical model of reality that sometimes has little to do with it.

In the same way that the CHES community has become more aware, mainly the hard way, that relying on the wrong assumptions leads to false confidence in the security of deployed systems and devices, I think those of us not that close to hardware should also step back and ask how realistic our assumptions are. This includes, as these talks addressed in different ways, starting to assume that some standards might be compromised at some point (and that most systems will be), and understanding what can still be done in those cases.

What would a cryptography that worries not only about prevention, but also about the whole security cycle, look like? How can the cryptography and information security communities come closer together?

Message Transmission with Reverse Firewalls— Secure Communication on Corrupted Machines


The reverse firewalls framework was recently introduced by Mironov and Stephens-Davidowitz, with a paper that has already been discussed in our group’s seminars and this same blog. A secure reverse firewall is a third party that “sits between Alice and the outside world” and modifies her sent and received messages so that even if her machine has been corrupted, Alice’s security is still guaranteed.

Their elegant construction does not require the users to place any additional trust in the firewall, and relies on the underlying cryptographic schemes being rerandomizable. Within this threat model, they describe both impossibility results and concrete constructions.

For example, in the context of semantically secure public-key encryption, in order to provide a reverse firewall for Bob, the scheme must allow a third party to rerandomize a public key and to map ciphertexts under the rerandomized public key back to ciphertexts under the original public key. In the same context, Alice’s reverse firewall must be able to rerandomize the ciphertext she sends to Bob, in such a way that Dec(Rerand(Enc(m))) = m.
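
To make the rerandomization requirement concrete, here is a minimal sketch using textbook ElGamal over a tiny toy group, since ElGamal ciphertexts are rerandomizable. It only illustrates the property Dec(Rerand(Enc(m))) = m that the firewall needs; it is not the construction from the paper, and the parameters are purely illustrative.

```python
import secrets

# Toy group: p = 2q + 1 with p = 23, q = 11; g = 2 generates the order-11 subgroup.
# (Illustration only; real parameters would be a standardized large group.)
p, q, g = 23, 11, 2

def keygen():
    sk = secrets.randbelow(q - 1) + 1              # sk in [1, q-1]
    return sk, pow(g, sk, p)                       # (sk, pk = g^sk)

def encrypt(pk, m):                                # m is a subgroup element
    r = secrets.randbelow(q - 1) + 1
    return pow(g, r, p), (m * pow(pk, r, p)) % p   # (g^r, m * pk^r)

def rerandomize(pk, ct):
    """What Alice's reverse firewall does to outgoing ciphertexts: multiply in a
    fresh encryption of 1, so the plaintext is preserved but any randomness chosen
    by a corrupted sender is erased."""
    c1, c2 = ct
    s = secrets.randbelow(q - 1) + 1
    return (c1 * pow(g, s, p)) % p, (c2 * pow(pk, s, p)) % p

def decrypt(sk, ct):
    c1, c2 = ct
    return (c2 * pow(c1, p - 1 - sk, p)) % p       # c2 * c1^{-sk}

sk, pk = keygen()
m = pow(g, 5, p)                                   # some message in the subgroup
ct = encrypt(pk, m)
assert decrypt(sk, rerandomize(pk, ct)) == m       # Dec(Rerand(Enc(m))) = m
```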

Big-Key Symmetric Encryption: Resisting Key Exfiltration


The threat addressed in Bellare’s talk is that of malware that aims to exfiltrate a user’s key, most likely over her system’s network connection. In their work, the authors design schemes that protect against this kind of Advanced Persistent Threat by making secret keys so big that their undetected exfiltration is difficult, while keeping the user’s overhead almost exclusively in terms of storage instead of speed.

Their main result is a subkey prediction lemma, which gives a nice bound on an adversary’s ability to guess a modest-length subkey derived by randomly selecting bits of a big key from which partial information has already been leaked. This approach, known as the Bounded Retrieval Model, has been, in the words of the authors, largely a theoretical area of research, whereas they give a fully concrete security analysis with good numerical bounds, constants included.
Other highlighted aspects of the paper were the concrete improvements over [ADW09] and a key encapsulation technique carefully analysed under different security assumptions (random oracle, standard model).
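
As a rough illustration of the big-key idea (not the paper’s actual scheme: the probe counts, selector encoding and key encapsulation details all differ), one can derive a short session subkey by hashing the bits of a large locally stored key at pseudorandomly chosen positions. Exfiltrating enough of the big key to predict such subkeys requires moving a conspicuous amount of data.

```python
import os, hashlib, secrets

BIG_KEY = os.urandom(1 << 20)          # 1 MiB "big key" stored on the user's machine
NUM_PROBES = 256                       # number of random bit positions to read

def get_bit(data: bytes, i: int) -> int:
    return (data[i // 8] >> (i % 8)) & 1

def derive_subkey(big_key: bytes, selector: bytes) -> bytes:
    """Expand a short random selector into probe positions, read those bits of the
    big key, and hash everything down to a 128-bit subkey."""
    n_bits = len(big_key) * 8
    stream = hashlib.shake_256(selector).digest(8 * NUM_PROBES)
    h = hashlib.sha256(selector)
    probes = []
    for j in range(NUM_PROBES):
        pos = int.from_bytes(stream[8 * j:8 * j + 8], "big") % n_bits
        probes.append(get_bit(big_key, pos))
        h.update(pos.to_bytes(8, "big"))
    h.update(bytes(probes))
    return h.digest()[:16]

selector = secrets.token_bytes(16)     # fresh per session, can be sent in the clear
subkey = derive_subkey(BIG_KEY, selector)
```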

Backdoors in Pseudorandom Number Generators: Possibility and Impossibility Results


The last talk of the session focused on the concrete problem of backdoored pseudorandom generators (PRGs) and PRNGs with input, fundamental building blocks of cryptographic protocols that have already been successfully compromised in practice, as we learnt when the DUAL_EC_DRBG scandal came to light.
In their paper, the authors revisit a previous abstraction of backdoored PRGs [DGA+15], which modelled the adversary (Big Brother) with weaker powers than it could actually have. By giving concrete backdoored-PRG constructions, they show how that model fails. They also study robust PRNGs with input, for which they show that Big Brother is still able to predict the values of the PRNG state backwards, and they give bounds on the number of previous states it can compromise, depending on the state size of the generator.
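
For intuition, here is a toy backdoored PRG in the spirit of DUAL_EC_DRBG, written multiplicatively over a toy prime field instead of an elliptic curve. It only illustrates the general trapdoor idea (the designer keeps a secret relating the two public constants) and is not one of the constructions from the paper.

```python
# Toy parameters: not a secure group, illustration only.
p = 2**127 - 1            # toy prime modulus
g = 3                     # public constant, plays the role of the point P
d = 0xDEADBEEF            # Big Brother's trapdoor
Q = pow(g, d, p)          # public constant, plays the role of Q = d*P

def prg_step(state):
    output = pow(g, state, p)        # what the user sees
    next_state = pow(Q, state, p)    # kept internal
    return output, next_state

def big_brother_predict(output):
    """From one output block, the trapdoor holder recovers the next state
    (output^d = g^{d*s} = Q^s) and can predict all future outputs."""
    return pow(output, d, p)

state = 123456789
out1, state = prg_step(state)
out2, _ = prg_step(state)
assert prg_step(big_brother_predict(out1))[0] == out2
```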

[ADW09] J. Alwen, Y. Dodis, and D. Wichs. Leakage-resilient public-key cryptography in the bounded-retrieval model. In S. Halevi, editor, CRYPTO 2009, volume 5677 of LNCS, pages 36–54. Springer, Heidelberg, Aug. 2009.

[DGA+15] Y. Dodis, C. Ganesh, A. Golovnev, A. Juels, and T. Ristenpart. A formal treatment of backdoored pseudorandom generators. In E. Oswald and M. Fischlin, editors, EUROCRYPT 2015, Part I, volume 9056 of LNCS, pages 101–126. Springer, Heidelberg, Apr. 2015.

Saturday, July 30, 2016

ArcticCrypt

From the 17th to the 22nd of July, the northernmost crypto workshop ever organized took place in Longyearbyen, Svalbard, at latitude 78.13°N, only about 1300 kilometers from the North Pole.

Nordenskiöld glacier
Three ECRYPT-NET fellows (Marie-Sarah, Matthias and Ralph) had the opportunity to join the fascinating event.

The program was a good mixture of invited talks and talks by researchers who had submitted a paper. The topics ranged from symmetric cryptography to fully homomorphic encryption, digital signatures and side-channel attacks. The full program can be found here: http://arcticcrypt.b.uib.no/program/.

One of the most interesting talks on Monday was given by Eik List: POEx: A Beyond-Birthday-Bound-Secure On-Line Cipher. He presented POEx, which achieves beyond-birthday-bound security with one call to a tweakable block cipher and one call to a 2n-bit universal hash function per message block. He then sketched the security proof and gave possible instantiations.

In the night from Monday to Tuesday, at midnight under the midnight sun, there were two fascinating talks. The first was given by Ron Rivest: Symmetric Encryption based on Keyrings and Error Correction. The second was by Adi Shamir: How Can Drunk Cryptographers Locate Polar Bears.

Adi Shamir explaining how drunk cryptographers can locate polar bears.

On Tuesday, Joan Daemen gave his invited talk on Generic security of full-state keyed duplex. He briefly explained sponge constructions and how they can be used for authenticated encryption, then showed how to achieve beyond-birthday-bound security using sponges. Finally, he presented a new core for sponge-based modes, the (full-state) keyed duplex construction.
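
For readers unfamiliar with the duplex idea, here is a minimal toy sketch of how a keyed duplex object can provide authenticated encryption: keep a b-bit state, and alternately absorb blocks and squeeze keystream/tag material through a permutation. The "permutation" below is just a stand-in hash, and real duplex modes (including the full-state keyed duplex from the talk) differ in important details such as padding, domain separation and the rate/capacity split.

```python
import hashlib

RATE = 16          # bytes absorbed/squeezed per call (toy value)
STATE = 32         # total state size in bytes (toy value)

def permute(state: bytes) -> bytes:
    # Toy stand-in for the underlying permutation.
    return hashlib.sha256(state).digest()

class ToyDuplex:
    def __init__(self, key: bytes):
        self.state = permute(key.ljust(STATE, b"\x00"))

    def duplex(self, block: bytes) -> bytes:
        """Absorb up to RATE bytes into the outer part of the state, permute,
        and return RATE bytes of keystream/tag material."""
        padded = block.ljust(RATE, b"\x00")
        outer = bytes(a ^ b for a, b in zip(self.state[:RATE], padded))
        self.state = permute(outer + self.state[RATE:])
        return self.state[:RATE]

# Encrypt-then-tag, one block at a time (toy usage).
key = b"sixteen byte key"
d = ToyDuplex(key)
msg = b"attack at dawn!"
keystream = d.duplex(b"")                           # squeeze keystream
ct = bytes(m ^ k for m, k in zip(msg, keystream))
tag = d.duplex(ct)                                  # absorb ciphertext, squeeze tag

# The receiver runs the same duplex object with the same key to decrypt and verify.
d2 = ToyDuplex(key)
pt = bytes(c ^ k for c, k in zip(ct, d2.duplex(b"")))
assert pt == msg and d2.duplex(ct) == tag
```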

On Wednesday, a full day of sightseeing was planned: a boat trip to the "ghost town" Pyramiden. We started in Longyearbyen, where we saw some minke whales; the captain told us he had also seen a blue whale a few days earlier. After a while we approached the bird cliffs where many seagulls and puffins were nesting. Birds are very important for the ecosystem of Svalbard, as they carry nutrients from the sea to the land. From there we continued our journey to the Nordenskiöld glacier, a huge glacier with blueish shining ice. After a whiskey with glacier ice, we continued to our final destination. The "ghost town" Pyramiden was a Russian coal-mining settlement that was closed in 1998 and has been a tourist attraction since 2007.
Reindeer
Polar bear
Polar bear warning sign
On Thursday, Marie-Sarah presented her paper Security of BLS and BGLS signatures in a multi-user setting, which was related to her master's thesis at the University of Waterloo. She was inspired to revisit the topic after reading about how the tightness of security reductions for the Schnorr signature scheme played a role in the CFRG's selection of an elliptic-curve-based signature scheme for standardization. The standard notion of existential unforgeability under chosen-message attacks is in the single-user setting, since the adversary is given one public key to target. In the more realistic multi-user setting, the attacker gets all users' public keys and can choose which one to attack. In her paper, she first analysed the BLS (Boneh-Lynn-Shacham) signature scheme's security in a manner similar to what was done for Schnorr: is key-prefixing necessary to maintain unforgeability of signatures in a multi-user setting? Next, she analysed the multi-user security of the aggregate signature scheme BGLS (Boneh-Gentry-Lynn-Shacham). She proposed a security notion for aggregate signatures in a multi-user setting, analogous to the multi-user notion for normal (non-aggregate) signatures, and then analysed BGLS's security in this model.

On Friday, Gregor Leander presented Structural Attacks on Block Ciphers, in which he covered invariant subspace attacks. He also introduced an improved technique called nonlinear invariant attacks.



Matthias, Marie-Sarah and Ralph
The artist of this sign had never seen a polar bear in his life. Therefore, they told him to paint a big white dog.
Santa Claus mailbox

Wednesday, July 13, 2016

Crypto events in Île-de-France

The sunny weather and the general holiday feeling did not keep crypto enthusiasts around Paris from meeting and discussing the latest advances in the field. On the one hand, the Paris Crypto Day brought together people working on different aspects of cryptography who are based in the Paris area. The last such meeting was organized by ENS on 30.06.2016 and was fortunate to have Anne Canteaut (INRIA Paris), Leo Reyzin (BU), Victor Shoup (NYU) and Rafael Pass (Cornell) speaking about their research. On the other hand, on July 5-7 Paris also hosted a workshop organized within the HEAT (Homomorphic Encryption Applications and Technology) programme. It was held at Université Pierre et Marie Curie (a.k.a. Paris 6) and consisted of six invited talks given by well-known researchers from the homomorphic encryption community and ten "regular" talks given by younger researchers and students.


Paris Crypto Day

The first presentation was given by Anne Canteaut, on Algebraic Distinguishers against Symmetric Primitives. The talk presented a unified view of cube distinguishers and the more recently introduced division property. These attacks build on Knudsen's higher-order differential attacks, which exploit properties of the polynomial representation of the cipher. The presentation was much appreciated by symmetric and asymmetric cryptographers alike.

Victor Shoup gave a talk about hash proof systems [1] and their applications, in which he reviewed definitions, constructions and applications. Hash proof systems can be seen as a family of keyed hash functions $H_{sk}$ associated with a language $L$ defined over a domain $D$. The secret hashing key $sk$ can be used to compute the hash value of every input $x \in D$. Magically, there is a second way to compute the same hash value: it uses a projection key $pk$ (derived from $sk$) together with a witness $w$ for $x \in L$. The original definition of hash proof systems requires that the projection key does not depend on the word $x$; the later notion of smooth projective hash functions relaxes this. Smooth projective hash functions have found applications in, among others, password-authenticated key exchange.
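
As a concrete example, here is a sketch of the classic DDH-based hash proof system from [1], over a toy group: the language is $L = \{(g_1^w, g_2^w)\}$, the secret hashing key hashes any word in the domain, and the projection key together with a witness $w$ recomputes the same value for words in $L$. Parameters are illustrative only.

```python
import secrets

p, q = 23, 11                  # toy group: p = 2q + 1, subgroup of order q
g1, g2 = 2, 3                  # two generators of the order-q subgroup

def keygen():
    x1, x2 = secrets.randbelow(q), secrets.randbelow(q)
    sk = (x1, x2)
    pk = (pow(g1, x1, p) * pow(g2, x2, p)) % p     # projection key g1^x1 * g2^x2
    return sk, pk

def hash_secret(sk, word):
    """Hash any word (u1, u2) in the domain using the secret hashing key."""
    (x1, x2), (u1, u2) = sk, word
    return (pow(u1, x1, p) * pow(u2, x2, p)) % p

def hash_public(pk, witness):
    """Recompute the same value using only the projection key and a witness w
    for membership in L = {(g1^w, g2^w)}."""
    return pow(pk, witness, p)

sk, pk = keygen()
w = secrets.randbelow(q)                           # witness
word_in_L = (pow(g1, w, p), pow(g2, w, p))
assert hash_secret(sk, word_in_L) == hash_public(pk, w)
```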

Leo Reyzin from Boston University presented an analysis of SCrypt (joint work with Joel Alwen, Jeremiah Blocki, and Krzysztof Pietrzak). SCrypt was originally introduced by Colin Percival in 2009 for Tarsnap, and its potential applications include the realization of time-lock puzzles from memory-hard problems. The starting point for their work was the key derivation function in SCrypt. As stated by Leo during the talk, SCrypt is defined as the result of $n$ steps, where each step consists of selecting one of two previously computed values (the selection depends on the values themselves) and hashing them. It is conjectured that this function is memory-hard. The new result shows that in the Parallel Random Oracle Model, SCrypt is maximally memory-hard. One metric used is the product of time and memory used during the execution of SCrypt, for which the authors show the bound must be $\Theta(n^2)$. Interestingly, for a non-constant amount of memory used during the computation (a scenario that better models real applications), a more accurate metric, defined as the sum of memory usage over time, is again proven to be bounded by $\Theta(n^2)$, and this holds even if the adversary is allowed to make an unbounded number of parallel random oracle queries at each step.
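
To get a feel for the structure Leo described, here is a toy rendition of a data-dependent memory-hard function in the spirit of SCrypt's inner loop; it is not Percival's actual function, and the parameters are purely illustrative. Honest evaluation wants the whole table of previously computed values in memory, while a low-memory evaluator must keep recomputing entries on demand.

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def toy_memory_hard(password: bytes, n: int = 1 << 14) -> bytes:
    # Phase 1: fill a table of n previously computed values by iterated hashing.
    v = [b""] * n
    x = H(password)
    for i in range(n):
        v[i] = x
        x = H(x)
    # Phase 2: n data-dependent steps; which entry is read depends on x itself.
    for _ in range(n):
        j = int.from_bytes(x[:8], "big") % n
        x = H(x + v[j])
    return x

tag = toy_memory_hard(b"correct horse battery staple")
```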

The last speaker was Rafael Pass, from Cornell, who gave a gripping talk about the Analysis of the Blockchain Protocol in Asynchronous Networks. During his talk, Rafael defined the notions of consistency and liveness in asynchronous networks. He then explained his result proving that the blockchain consensus mechanism satisfies strong forms of consistency and liveness in an asynchronous network with adversarial delays that are a priori bounded.


HEAT Workshop

The workshop was really interesting because, besides new theoretical advances in the field, many talks were about the practical side of FHE: how to set the parameters, concrete results in cryptanalysis, libraries and real-world applications. The part about lattice reduction techniques stood out in particular.

In particular, Antoine Joux gave a talk named "The prehistory of lattice-based cryptanalysis", where he reviewed some lattice reduction algorithms (Gauss's algorithm for two dimensions and LLL for higher dimensions) and presented some cryptanalytic results, e.g. Shamir's attack on the Merkle-Hellman knapsack cryptosystem and low-density attacks on knapsacks. Basically, lattice reduction aims at finding a "good" basis, made of short and almost orthogonal vectors, from a "bad" one, made of long and far-from-orthogonal vectors. With a good basis, problems like SVP or CVP become easy and it is possible to break cryptosystems based on these problems. There are algorithms that do this (like the famous LLL), but the conclusion was that lattice-based cryptography remains secure as long as lattices are big enough: all the lattice reduction algorithms work well only if the dimension is not too high. In higher dimensions many problems appear and lattice reduction remains hard.
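
For the two-dimensional case Joux started from, Gauss's (Lagrange's) reduction algorithm is short enough to show in full; it is the ancestor of LLL, and its output basis contains a shortest nonzero vector of the lattice. A small sketch:

```python
def gauss_reduce(u, v):
    """Two-dimensional lattice reduction: repeatedly subtract the nearest integer
    multiple of the shorter vector from the longer one, swapping until no progress."""
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        m = round(dot(u, v) / dot(u, u))        # nearest-integer projection coefficient
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u

# Example: a "bad" (long, far from orthogonal) basis of a simple lattice.
print(gauss_reduce((101, 100), (202, 199)))      # -> ((0, -1), (101, 0))
```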

Another interesting talk on a related topic was "An overview of lattice reduction algorithms" by Damien Stehlé, who pointed out that lattice reduction has two main uses: besides the predictable one of cryptanalysing lattice-based cryptosystems (such as NTRU and all those based on SIS and LWE), it is useful for cryptanalysing other cryptosystems as well, like variants of RSA. He then presented the two main algorithms in this field, BKZ and LLL, and outlined their differences, like the global strategy used by BKZ versus the local one used by LLL. He also introduced faster LLL [2], an improvement of the LLL algorithm which is the subject of one of his most recent works. In his conclusions, he mentioned some open problems, and finding a "quantum acceleration" is certainly one of the most interesting ones. In fact, as far as we know, lattice problems are not significantly easier for quantum computers, and this is the reason why they are considered among the most promising candidates for post-quantum cryptography.

If you are into coding, this may be interesting: Shi Bai gave a short talk about FPLLL, an implementation of the floating-point LLL and BKZ reduction algorithms created by Damien Stehlé. It is a C++ library (also available in Python under the name FPyLLL) which is also used by the popular Sage. Its goal, as stated by the authors, is to provide benchmarks for lattice reduction algorithms and, more generally, lattice reduction for everyone. More details can be found at https://github.com/fplll/fplll and contributions are welcome!
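
A tiny taste of what using the Python wrapper looks like, roughly following the examples in the project's documentation; treat the exact call names as an assumption and check the repository for the current API.

```python
from fpylll import IntegerMatrix, LLL

A = IntegerMatrix.random(40, "uniform", bits=30)   # a random 40x40 integer basis
LLL.reduction(A)                                    # LLL-reduce it in place
print(A[0])                                         # first (now much shorter) basis vector
```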

Besides lattice reduction algorithms, another interesting talk was given by Florian Bourse, who presented a recent work [3] about circuit privacy for FHE. The main result is that it is possible to homomorphically evaluate branching programs over GSW ciphertexts without revealing anything about the computation, i.e. the branching program, except for the result and a bound on the circuit's size, by adding just a small amount of noise at each step of the computation. This means that the "price" to pay is quite low, especially compared to other techniques based on bootstrapping. Also, this method does not rely on not-so-well-understood assumptions like circular security, and only assumes the hardness of LWE with polynomial modulus-to-noise ratio.


References

[1] R. Cramer and V. Shoup. Universal hash proofs and a paradigm for adaptive chosen ciphertext secure public-key encryption. In EUROCRYPT 2002, volume 2332 of LNCS, pages 45–64. Springer, 2002.

[2] A. Neumaier and D. Stehlé. Faster LLL-type reduction of lattice bases. In ISSAC 2016.

[3] F. Bourse, R. Del Pino, M. Minelli, and H. Wee. FHE circuit privacy almost for free. CRYPTO 2016, to appear.


This blog post has been collaboratively written by Michele and Razvan.

Tuesday, July 12, 2016

The Subset-Sum Problem

The Subset-Sum problem (also known as the knapsack problem) we want to review in this blog post is defined as follows: given $n, S, a_1, a_2, \dots, a_n \in \mathbb{N}$, find \begin{align} I \subseteq [n] = \{1,2, \dots, n\}: \sum_{i \in I} a_i = S. \quad (1) \end{align}
Historical Remarks

This old problem was first studied in 1897, the same year the first airborne expedition aiming to reach the geographical North Pole (NP) started (and ended...), and it was one of the first problems proven to be NP-complete, meaning that worst-case instances are believed to be computationally intractable. Subset-Sum was shown NP-complete via a chain of reductions from '3-SAT' to the 'Graph Coloring Problem' to 'Exact Cover' to Knapsack and close variants thereof. These rigorous reduction proofs were carried out in the early 1970s, and the Subset-Sum problem features on Karp's somewhat famous list of 21 NP-complete problems, all of which are infeasible to solve with current computers and algorithms and thus a possible basis for cryptographic primitives. In the following table one can see how the expected time/space requirements of algorithms solving (1) in hard cases evolved as the techniques were refined by modern research:
Expected time and space requirements of algorithms solving (1) in average hard instances.
The time and space requirements of current approaches are considerably lower than those of exhaustive search, of a generic meet-in-the-middle attack, or even of a finer combinatorial split into four (or more) lists. The currently best known algorithm, asymptotically speaking, is a quantum algorithm. Determining a lower bound on the run-time is still an open research question; it seems more difficult than seeing a possible link between the declining polar bear population in the Arctic and the retreating sea ice accelerated by the observable rapid climate change.

Let us review two classical techniques that led to remarkable speed-ups:

Technique 1 - Meet in the Middle

Schroeppel-Shamir: Combining disjoint sub-problems of smaller weight.
Hard instances of the Subset-Sum problem are characterized by relatively large elements $(\log_2 a_i \approx n)$ and a balanced solution, i.e. $|I| \approx \frac n 2$ in Equation (1). Identifying subsets of $[n]$ with length-$n$ vectors $x$ over the 'number-set' $\{0,1\}$ via $i \in I \Leftrightarrow x[i] = 1$, one constructs lists $L_1, L_2$ of partial solutions that are merged into a solution in $L_0$:
Algorithms based on the birthday paradox construct expected collisions in the second component of the sub-problems in the lists $L_1, L_2$, forcing any $x \in L_0$ to fulfil (1). The difficulty is to estimate the list size needed so that a solution exists (and is observed) with high probability. It is desirable that the algorithm be more likely to terminate with a non-empty $L_0$, i.e. $|L_0| \geq 1$ (a solution), than one is to see a polar bear towards the north-east, or to meet one in the middle of Svalbard, Norway.
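
For concreteness, here is a minimal meet-in-the-middle solver for Equation (1): split the items into two halves, enumerate all subset sums of each half, and look for a pair of partial sums meeting at $S$. Time and memory are roughly $2^{n/2}$ instead of the $2^n$ of exhaustive search; the Schroeppel-Shamir and representation-based algorithms surveyed above refine this basic idea.

```python
from itertools import combinations

def subset_sums(items):
    """Map each achievable sum of a sub-list to one index set achieving it."""
    sums = {}
    for r in range(len(items) + 1):
        for combo in combinations(range(len(items)), r):
            sums.setdefault(sum(items[i] for i in combo), combo)
    return sums

def meet_in_the_middle(a, S):
    half = len(a) // 2
    left, right = a[:half], a[half:]
    left_sums = subset_sums(left)                     # list L1: sums of the left half
    for s2, combo2 in subset_sums(right).items():     # list L2: sums of the right half
        combo1 = left_sums.get(S - s2)                # collision: s1 + s2 = S
        if combo1 is not None:
            return sorted(combo1) + [half + i for i in combo2]
    return None

a = [13, 34, 55, 89, 144, 233, 377, 610]
print(meet_in_the_middle(a, 479))   # -> [0, 3, 6]  (13 + 89 + 377 = 479)
```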

Technique 2 - Enlarge Number Set

BCJ11: Adding length $n$ sub-solutions increases the number-set.
The idea of Howgrave-Graham-Joux (2010), later extended by Becker-Coron-Joux (2011), was to allow multiple representations of a solution. This comes at the price of enlarging the number-set, i.e. possibly having $$x_0[i] = x_1[i] + x_2[i] \not \in \{0, 1\}.$$ Although this constructs many more potential solutions and requires a non-trivial filtering step to remove 'inconsistent' ones when merging the two lists $L_1$ and $L_2$ back into one, it still gives an overall asymptotic speed-up on top of the first improvements due to constructing colliding sums.
The number-set used by the authors was $\{-1,0,1\}$, with a $-1$ indicating a summand appearing on both sides of Equation (1).
After constructing sufficiently many sub-problems and their respective partial solutions, a collision can be expected; the combination then forms a solution for the given instance.


Applications 
The cryptanalytic methods for structurally approaching the Subset-Sum problem are valuable algorithmic meta-techniques that are also applicable to other hard problems, such as the lattice- and code-based problems underlying many cryptosystems.
Credits: http://fav.me/d3a1n08
Such problems are promising candidates for the construction of post-quantum cryptosystems, cloud security applications and encrypted Polar Bear TV broadcasts. There is a conference focusing on such topics (Polar bears and NP-complete problems) coming up - stay tuned.



PS: The bad image quality is due to Blogger not letting me include vector graphics like .pdf or .eps, nor render them directly from LaTeX code... :-(

Sunday, July 10, 2016

Workshop on PIR, distributed storage, and network coding at RHUL

Friday at Royal Holloway, there was a one-day workshop on private information retrieval (PIR), distributed storage, and network coding. Four speakers gave talks on topics including coding for distributed storage, coding vs. replication, and multi-server PIR schemes.

Coding for distributed storage

The first speaker, P. Vijay Kumar, gave an overview of coding for distributed storage. When a storage node fails (due to hardware failure or corrupted data, for example), there are two main issues to consider: repair bandwidth (how much data it must download to recover) and repair degree (how many other nodes must communicate with it to help it recover). We learned about which types of coding schemes are used in practice. For some applications, triple replication—without any coding—is the standard solution. The RAID 6 storage architecture uses any maximum distance separable (MDS) code that tolerates 2 failures. Facebook uses HDFS-RAID with one of two erasure codes (blog post by Facebook engineers). HDFS-RAID is a modified instance of the Hadoop distributed file system (HDFS, which replicates data three times) that uses erasure codes to reduce the effective replication of data to about 2. Windows Azure Storage uses Local Reconstruction Codes (LRC) (Microsoft blog post, paper).
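
As a toy contrast between replication and erasure coding (a RAID-5-style single parity block, rather than the RAID 6 or LRC codes mentioned above), the sketch below stores three data blocks plus one XOR parity block, tolerating one lost node at about 1.33x storage instead of the 3x of triple replication.

```python
import secrets

data_blocks = [secrets.token_bytes(16) for _ in range(3)]   # k = 3 data blocks

def xor_blocks(blocks):
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

parity = xor_blocks(data_blocks)                            # 1 parity block (4 nodes total)

# Node 1 fails: its block is recovered from the two surviving data blocks plus parity.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert recovered == data_blocks[1]
```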

PIR and coding

The second speaker, Alex Vardy, presented techniques for PIR that use coding instead of replication.

Private information retrieval (PIR) is relatively new; it was proposed in a 1995 paper by Chor, Goldreich, Kushilevitz, and Sudan. The setting for PIR is as follows. A server has a public, static database of n items (bits, for example) and a user wants to retrieve a certain one without the server knowing exactly which. Ideally, a computationally-unbounded server wouldn't be able to get any information about the user's request: the distribution of queries wouldn't depend on the index of the requested data item. But the only PIR scheme for a single database that satisfies this notion of "perfect privacy" requires the server to send the entire database in response to every query. Instead, there are computational notions of privacy for PIR (where the server is computationally bounded) and for settings with more than one copy of the database, there are other information-theoretic notions.
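
The classic two-server information-theoretic scheme in the spirit of the Chor-Goldreich-Kushilevitz-Sudan line of work is simple enough to sketch: send a uniformly random subset of indices to one server and the same subset with the desired index flipped to the other; each server replies with the XOR of the selected bits, and the two answers XOR to the requested bit. Each query on its own is a uniformly random set, so a single non-colluding server learns nothing about the index.

```python
import secrets

def server_answer(db_bits, query_set):
    ans = 0
    for i in query_set:
        ans ^= db_bits[i]
    return ans

def pir_query(db_bits, index):
    n = len(db_bits)
    q1 = {i for i in range(n) if secrets.randbits(1)}     # uniformly random subset
    q2 = q1 ^ {index}                                      # same subset, desired index flipped
    a1 = server_answer(db_bits, q1)                        # answer from server 1
    a2 = server_answer(db_bits, q2)                        # answer from server 2
    return a1 ^ a2                                         # equals db_bits[index]

db = [secrets.randbits(1) for _ in range(64)]
assert pir_query(db, 42) == db[42]
```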

One interesting aspect of the multi-server PIR setting that Alex presented was that the servers are assumed to be non-colluding. There was some discussion about whether this assumption makes sense, and whether any communication at all should be allowed between the servers. For example, if all but one of the servers receives a query, will they know?

The third speaker, Salim El Rouayheb, presented some coding-based PIR schemes with a different security model: some number b of the nodes are passive adversaries who can collude. We learned about Freenet, a privacy-friendly P2P distributed storage system for sharing files (paper).

Network coding

The last talk, by Tuvi Etzion, was about network coding and its connections to distributed storage and PIR. A network is represented by an acyclic directed graph that can have parallel arcs (edges) and whose arcs each have an associated capacity. We considered two types of multicast networks. In the first, there is one source node that must distribute h messages (field elements) to n receiver nodes. In the second, there are h source nodes with one message each and n receiver nodes that need to receive all of the messages. In the scalar linear setting, each node "transforms" the messages it receives according to its local coding vector. (In the vector linear setting, nodes have local coding matrices.)

An example of a simple linear multicast network. The top two nodes are the source nodes and A and B are the messages they want to send to both of the receiver nodes (the bottom two nodes).
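
In code, the coding in this kind of example is just XOR over GF(2): the bottleneck link forwards A ^ B, and each receiver combines the one original message it sees with the coded packet to recover the other (a toy sketch of the standard butterfly-network example, not a slide from the talk).

```python
A, B = 0b10110010, 0b01101100          # the two source messages
coded = A ^ B                          # what the shared (bottleneck) link carries

receiver1 = (A, coded)                 # receiver 1 sees A directly plus the coded packet
receiver2 = (B, coded)                 # receiver 2 sees B directly plus the coded packet

assert receiver1[1] ^ receiver1[0] == B    # receiver 1 recovers B
assert receiver2[1] ^ receiver2[0] == A    # receiver 2 recovers A
```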

Tuvi shared his conjecture that for a multicast network of the first type with two messages, there is no vector solution that outperforms the optimal scalar solution. (See one of Tuvi's recent papers for related results.) It was neat that properties of multicast networks can be expressed succinctly in terms of graph-theoretic properties, like the size of a minimum cut between source nodes and receiver nodes.

I enjoyed this educational yet relaxed one-day workshop. What I found most interesting was how cloud storage providers and companies like Facebook encode their data. It's clear why they care about even small reductions in effective data replication: they want to maximize availability and tolerate hardware failures while reducing storage and communication costs. As a skeptical cryptographer, I can't help but wonder how they compose these coding schemes with encryption and other cryptographic tools to protect users' privacy.

Friday, June 10, 2016

The Enterprising Researcher: Innovation, Entrepreneurship and University IP

Let's have a break from pure cryptography for a moment. Lift your head from the security proof you are trying to come up with, or from the poly-time factoring algorithm you are coding. Now look around: each object surrounding you passed through a lot of stages before becoming the commercial product you stare at. There is also a good chance that many of them were once ideas in published research papers, probably similar to those we are all planning to write at some point: papers carrying not only academic work but also content that, through effort, development and vision, ended up being an object in our everyday life.

"The Enterprising Researcher: Innovation, Entrepreneurship and University IP" is a workshop I attended organised by Bristol Basecamp. It shed light on the process an idea needs to undergo to become a product and gave insights on the fact that the word "entrepreneurship" carries a stronger meaning than just "commercialising an idea" (valuable by itself). It suggests a sense of innovation and creation: like an artist who uses brushes to outline figures (and more importantly emotions), the entrepreneur combines resources and ideas to shape new markets and products (from which, most importantly, desires of people arise). This poetic point of view was particularly evident in the first talk.

Harry Destecroix and Laurent Chabanne are respectively CEO and CSO of Ziylo, a spin-off of the University of Bristol with expertise in continuous carbohydrate-sensing technology. They shared their experience of bringing an academic idea to the market: successes, problems, obstacles and satisfactions are all part of the start-upper's life! Above all, I was impressed by how much passion and dedication they put into their work, hence I will often quote their own exact words to briefly describe their story, which is really worth listening to.

Their journey began with a scientific discovery (sugar-sensing platforms using synthetic receptors to isolate glucose-like structures), the outcome of research carried out by Professor Anthony Davis' research group at the University of Bristol. We are still in the academic world at this point, hence the usual laws governing scientific publications still rule. Entering the market is different: first of all, it requires money. Since the idea was born in the university, the first source of help came from the inside. Research and Enterprise Development (RED) provides support and assists the growth of entrepreneurs from within the university. This is not just the step where you need money, though; it is also the phase during which awareness and information about business are vital. As they said, you need to "find complementary skill sets to yours", for instance someone who's an expert in marketing, because "it costs a lot of money and time to write a business plan" (and it's not something they teach you in school, I may add). Eventually, you need to "stop talking about tech and do marketing". I also enjoyed the human factor around this point, as they suggested starting these kinds of adventures with people you trust, with friends! It reminds us that start-ups, as well as huge corporations, are first of all human communities.

Another crucial point is that the trip is not easy. Now they are successfully working with SETsquared and Innovate UK, but raising funding was an issue to face and there might be moments in which the project seems to be failing: "you need to get used to the fact that you can lose your job at any time. If you’re not ok with that, don't bother". The important thing is to always have a positive attitude and to have faith in what you're doing, after all "without downs you don't know when you're up".

Funding is just one obstacle (although quite a steep one to climb) that you might encounter on your way to the market. I would like to mention another one they brought up, as there is a very important entrepreneurship lesson to learn there. Being a company in biochemistry, they needed labs: expensive and dedicated structures, not the kind listed on Rightmove! Instead of merely coping with their specific problem, they identified a potential market and (very recently) founded NS Science, a commercial real estate company focused on providing labs and science facilities to businesses. I would like to stress the moral about "being an entrepreneur" lying behind this example: they didn't just commercialise a service, they shaped the need of a community. Now that their company is around, other people who have similar issues may feel the desire to produce something new thanks to NS Science's facilities. Even better: NS Science may inspire other people who had put aside their ideas because of a lack of space, for instance. Let me give a final (rather famous) example of the "needs/desires creation" aspect of entrepreneurship: did you need iPads before they were invented? What changed, apart from the invention of iPads itself? I believe that, after scratching the surface of possible answers, the outcome could be incredibly surprising.

The second talk took a step back: how to publish research ideas, and the related legal aspects. Kathryn Smith, Research Engagement Librarian, focused on the importance of making research papers easily available through green open access, that is, repositories of manuscripts that differ from the published version (sometimes only in the formatting) so that they can be freely released. This is motivated by the fact that some entities may have restricted access to research outcomes (industry, public sector, charity foundations…) and that "you don't know who's out there: investors, new collaborators, people who could possibly develop your work in ways you hadn't even thought of". In our specific field, ePrint is an example of green open access. The University of Bristol also has its own green open access repository called Pure, whose team also checks for possible legal incompatibilities with the journals the paper is published in. In this regard, she pointed to a really useful search engine for manuscript publication policies of journals: SHERPA/RoMEO. For example, the picture shows the result of querying "LNCS", where being a "RoMEO green journal" means that you "can archive pre-print and post-print or publisher's version/PDF".

The third talk also focused on legal aspects: Sue Sunstrom, Head of Commercialisation and Impact Development, answered several questions about Intellectual Property (IP) and how it relates to the university. First of all, there exist different kinds of IP (trademark, copyright, patent, trade secret, design...). They offer different features and are applicable to different (either concrete or abstract) objects. Since it can be quite expensive, choosing the right form of IP (hence the right form of protection) is crucial. But defence against competitors is not the only reason why IP matters: investors like it because it is proof that you own something different and possibly innovative. It makes the object it protects valuable in some situations (think of industrial secrets, for instance) and, like every form of value, it can be used as a currency in commercial trades. The interest of universities in IP on the outcomes of research stems from various motivations: developing practical applications (and hence having an impact on society), attracting funding (for new research, hence possibly new IP, entering a virtuous circle) and strengthening the link with industry are just a few of them.

The last talk was given by Prof Alan Palmer on translational life-science research, exemplified by the process of commercialising drugs. Such a topic is closely related to the biomedical field and concerns the academic and industrial efforts made to develop new solutions for prevention, diagnosis and therapy. I am intentionally brief here; just have a look at Prof Palmer's LinkedIn profile to get a feeling for what an entrepreneur is!

Back to our beloved crypto, I think that this event has added a brick to my personal awareness (and I hope yours, after this reading) of what is going on out there, following what AvonCrypt had started. It was the perfect occasion to see how these realities exist in our field too (and I have the feeling it is a particularly fertile one). Thanks to this event, I understood what probably happened to those companies behind the scenes, and I also found out about another example (my bad for not knowing it before!): there is a start-up in Bristol called KETS which has recently won a prize and related funding for its use of "quantum cryptography to improve data encryption, ensuring information is safe in all situations, from bank transactions to critical infrastructure, and to individuals shopping online from the comfort of their own home".

As Harry and Laurent said during their talk, "start-upping involves learning" so "take a book and read about business and economics": a suggestion I will certainly follow.


Marco
(Thanks to Matthias, Simon and Marie-Sarah for comments and corrections)

Monday, June 6, 2016

A visit to the National Museum of Computing in Bletchley Park

Last Friday, I went to the National Museum of Computing (TNMOC) in Bletchley Park, home of the British Government Code and Cypher School during World War II. It was hosting a special event to celebrate two new acquisitions: a Lorenz teleprinter recently found on eBay and a Lorenz SZ42 cipher machine on long-term loan from the Norwegian Armed Forces Museum. TNMOC now has all of the key parts (either original or rebuilt) used in the process of encryption, interception, and decryption of Lorenz messages sent by the German High Command during WWII. Five women of the WRNS (Women's Royal Naval Service, whose members are often called "wrens") who operated Colossus, and the relatives of others who contributed to breaking Lorenz, attended this special event.

John Whetter, one of the leaders of the team at TNMOC that rebuilt the British Tunny (in the background), holds a Spruchtafel, next to the Lorenz machine on loan from Norway.

The Lorenz cipher

The story of Lorenz, Bill Tutte, and Tommy Flowers is perhaps less well known than the story of Enigma and Alan Turing. The Enigma machine encrypted messages sent among units of the German army, navy, and air force. It had 3 or 4 rotors and operated directly on an alphabet of 26 letters, which were then transmitted in Morse code.

The Lorenz SZ42, on the other hand, was custom-built for the German High Command to send the most important strategic messages to its Field Marshals. It was more complex and less portable than an Enigma machine. The Lorenz cipher could handle letters, punctuation, and spacing: each character was encoded as 5 bits according to the Baudot code. The machine had 12 wheels, each with a number of cams (pins) on it. The numbers of cams on the wheels were co-prime. The "key" was in two parts: the starting position of each wheel ("wheel setting"), and the pattern of raised or lowered cams on each wheel ("wheel pattern"). The wheel settings were supposed to be changed for each message, while the wheel patterns were changed infrequently—for the first few years. When the wheel patterns did begin to change more frequently, however, Colossus II was operational and could find them.
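
As a toy model of the additive structure described above (and only that; the real SZ42 combined chi, psi and motor wheels in a more involved way), one can generate each 5-bit keystream character by taking one bit from each of five wheels of pairwise coprime lengths, where the cam patterns and the starting positions together form the key. The wheel lengths below happen to be those of the SZ42's chi wheels.

```python
import secrets

WHEEL_SIZES = [41, 31, 29, 26, 23]          # pairwise coprime lengths, one wheel per bit

def make_wheels():
    """Wheel patterns: one random cam (0 or 1) per position on each wheel."""
    return [[secrets.randbits(1) for _ in range(size)] for size in WHEEL_SIZES]

def keystream(wheels, settings, length):
    """Each output character takes one bit from each wheel; all wheels step once per character."""
    out = []
    pos = list(settings)                     # starting positions = per-message wheel settings
    for _ in range(length):
        ch = 0
        for b, wheel in enumerate(wheels):
            ch |= wheel[pos[b] % len(wheel)] << b
            pos[b] += 1
        out.append(ch)
    return out

def crypt(chars, wheels, settings):          # XOR: the same operation encrypts and decrypts
    return [c ^ k for c, k in zip(chars, keystream(wheels, settings, len(chars)))]

wheels = make_wheels()                       # "wheel patterns", changed infrequently
settings = [3, 7, 11, 0, 19]                 # "wheel settings", chosen per message
plaintext = [secrets.randbits(5) for _ in range(20)]   # 5-bit Baudot-style characters
assert crypt(crypt(plaintext, wheels, settings), wheels, settings) == plaintext
```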

The entire process of intercepting a message went roughly as follows.

1. Setting up to send the message

  • The sending operator in Berlin picks six pairs of letters at random from a prepared sheet.
  • He or she types them in to the teleprinter (without the Lorenz machine attached). The output is a paper tape with punched holes corresponding to the Baudot encoding of the letters.
  • Next, the operator uses a board of wheel settings (a Spruchtafel) to determine the starting position of the Lorenz SZ42's rotors. Each of the letters corresponds to a number.
Lorenz SZ40 (Tunny) Indicator Reading Board

German Lorenz operators consulted a Spruchtafel to determine which wheel settings (starting positions) to use based on a given 12-letter indicator. (source)

2. Encrypting and sending the message

  • Now, the teleprinter operator in Berlin hooks up the Lorenz encryption machine to the teleprinter and types the plaintext message.
  • The encrypted message is output on the same perforated paper tape, again encoded with the Baudot code.
  • The paper tape corresponding to the 12-letter indicator and the ciphertext is fed to a radio transmitter, which broadcasts it.

3. Intercepting the message

  • Radio receivers at an intercept station at Knockholt, Kent (south-east of London) pick up the encrypted message.
  • The faint signals are fed to an undulator, which uses an ink pen to record a continuous trace of the signal on a strip of paper tape, the "slip".
  • Slip readers (people, not machines) translate the highs and lows on the slip to characters according to the Baudot code. To minimize errors, two or more slip-readers read each transmission.
  • The characters are typed in to a perforator that produces another strip of paper upon which the characters are encoded in Baudot code.
  • The intercepted message is sent to Bletchley Park (100 km away) in two ways: by secure landline and by motorcycle courier.

4. Decrypting the message

  • The perforated tape is fed to Colossus, which outputs the most likely wheel settings (Colossus I) and wheel patterns (Colossus II onwards).

The input to the Colossus machine is perforated paper tape with characters in 5-bit Baudot code.

WWII-era cryptography vs. modern cryptography

I went to TNMOC with Thyla van der Merwe, another PhD student at Royal Holloway, to speak to the guests for a few minutes about cryptography today and how it works now compared to how it worked in the WWII era.

Thyla and Marie-Sarah next to TNMOC's rebuilt Colossus.

Thyla explained the benefits of using a stream cipher, like the Lorenz cipher—they're fast, they don't propagate ciphertext errors, and they require only small buffers. These properties made it appropriate for encrypting radio transmissions. She pointed out how ordinary citizens of the WWII era probably didn't use encryption, while today, it is ubiquitous: everyone who's been online or had a cell phone has used it.

I talked about what makes "modern" cryptography different. At the time of WWII, public-key cryptography had not yet been discovered, so sharing keys for any kind of symmetric protocol was still hard. Cryptography in that era also didn't have the precise definitions, clear assumptions, and rigorous security reductions we have today. (Katz and Lindell's textbook does a wonderful job of explaining these three features of modern cryptography.) Although these more formal aspects of modern cryptography are powerful, their strength in the real world is limited in two ways. First, they may not capture all of the information or capabilities an attacker may have (e.g., side-channel attacks). Second, and maybe even more importantly, they come with the assumption that protocols are implemented and used exactly as they should be.

For example, cryptographers know how important it is that a stream cipher (like the Lorenz cipher) never re-uses the same keystream for different messages, because the XOR of two ciphertexts would equal the XOR of the two plaintexts. If the two messages are similar, then keystream re-use is particularly dangerous. This mistake is exactly what led cryptanalysts at Bletchley Park to decrypt two long messages and obtain 4000 characters of keystream: in August 1941, a long message was retransmitted with a few minor changes, but with the same key settings. Within a few months, cryptanalyst John Tiltman had recovered the keystream. By January 1942, Bill Tutte had fully reverse-engineered the Lorenz machine... without ever having seen it!
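
A quick demonstration of why this is fatal for any additive stream cipher, from Lorenz to RC4: XORing two ciphertexts produced with the same keystream cancels the keystream and leaves the XOR of the plaintexts, from which similar messages can be teased apart.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

keystream = secrets.token_bytes(32)               # reused by mistake
p1 = b"ATTACK THE BRIDGE AT DAWN"
p2 = b"ATTACK THE BRIDGE AT DUSK"                 # a near-identical retransmission
c1, c2 = xor(p1, keystream), xor(p2, keystream)

assert xor(c1, c2) == xor(p1, p2)                 # the keystream has vanished
print(xor(c1, c2))                                # nonzero only where the messages differ
```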

The operator or implementer of a cryptographic protocol that uses a stream cipher may not understand how important it is that the keystream never be re-used, or may simply make a mistake. This type of mistake did not happen only in WWII. In the late 1990s, the IEEE 802.11 standard specified the WEP (Wired Equivalent Privacy) protocol for Wi-Fi networks. WEP uses the stream cipher RC4 to encrypt traffic from access points (wireless routers) to mobile stations (all devices wirelessly connected to the network). Partly due to the WEP protocol's design, and partly due to how the access points' keys tended to be managed in practice, the same RC4 keystream was frequently re-used in implementations. (The key supplied to RC4 was a concatenation of a 24-bit IV, sent in plaintext along with each encrypted message, and a 40-bit shared secret key, which was rarely changed.) Read more about WEP's shortcomings in Borisov, Goldberg, and Wagner's CCS 2001 paper.

Modern cryptography may offer many new tools, definitions, and rigorous proofs, but some things will never change: designing protocols that are secure in the real world is still really hard, and breaking cryptographic schemes still requires a great deal of creativity, analysis, and dedication.

More about the Lorenz story

Determining how the Lorenz machine worked was only the first step. Tommy Flowers, an engineer at the Post Office Research Station, designed and built an emulator, "Tunny," of the Lorenz machine. An entire section at Bletchley Park (the "Testery," named after the section head, Ralph Tester) was devoted to decrypting the messages—which they did by hand for the first 12 months. Max Newman and Tommy Flowers designed and built machines to speed up the decryption process: the "Heath Robinson" and "Colossus". Colossus was the first electronic digital machine that was programmable (with plugs and switches). Heath Robinson and Colossus were operated (and named, actually) by members of the WRNS.