# Hash functions and random oracles

We have seen pseudorandom generators, functions and permutations, as
well as Message Authentication codes, CPA and CCA secure encryptions.
This week we will talk about cryptographic hash functions and some of
their magical properties. We motivate this by the *bitcoin*
cryptocurrency. As usual our discussion will be highly abstract and
idealized, and any resemblance to real cryptocurrencies, living or dead,
is purely coincidental.

## The “bitcoin” problem

Using cryptography to create a *centralized* digital-currency is fairly
straightforward, and indeed this is what is done by Visa, Mastercard
etc.. The main challenge with bitcoin is that it is *decentralized*.
There is no trusted server, there are no “user accounts”, no central
authority to adjudicate claims. Rather we have a collection of anonymous
and autonomous parties that somehow need to agree on what is a valid
payment.

### The currency problem

Before talking about cryptocurrencies, let’s talk about currencies in
general.I am not an economist by any stretch of the imagination, so please
take the dicussion below with a huge grain of salt. I would
appreciate any comments on it. At an abstract level, a *currency* requires two components:

- A scarce resource
- A mechanism for determining and transferring
*ownership*of certain quantities of this resource.

The original currencies were based on commodity money. The scarce resource was some commodity having intrinsic value, such as gold or silver, or even salt or tea, and ownership based on physical possession. However, as commerce increased, carrying around (and protecting) the large quantity of the commodities became impractical, and societies shifted to representative money, where the currency is not the commodity itself but rather a certificate that provides the right to the commodity. Representative money requires trust in some central authority that would respect the certificate. The next step in the evolution of currencies was fiat money, which is a currency (like today’s dollar, ever since the U.S. moved off the gold standard) that does not correspond to any commodity, but rather only relies on trust in a central authority. (Another example is the Roman coins, which though originally made of silver, have underdone a continous process of debasement until they contained less than two percent of it.) One advantage (sometimes disadvantage) of a fiat currency is that it allows for more flexible monetary policy on parts of the central authority.

### Bitcoin architecture

Bitcoin is a fiat currency without a central authority. A priori this seems like a contradiction in terms. If there is no trusted central authority, how can we ensure a scarce resource? who settles claims of ownership? and who sets monetary policy? Bitcoin (and other cryptocurrencies) is about the solution for these problems via cryptographic means.

The basic unit in the bitcoin system is a *coin*. Each coin has a unique
identifier, and a current *owner* .This is one of the places where we simplify and deviate from the
actual Bitcoin system. In the actual Bitcoin system, the atomic unit
is known as a *satoshi* and one bitcoin (abberviated BTC) is \(10^8\)
satoshis. For reasons of efficiency, there is no individual
identifier per satoshi and transactions can involve transfer and
creation of multiple satoshis. However, conceptually we can think of
atomic coins each of which has a unique identifier. Transactions in the system have
either the form of “mint coin with identifier \(ID\) and owner \(P\)” or
“transfer the coin \(ID\) from \(P\) to \(Q\)”. All of these transactions are
recorded in a public *ledger*.

Since there are no user accounts in bitcoin, the “entities” \(P\) and \(Q\)
are not identifiers of any person or account. Rather \(P\) and \(Q\) are
“computational puzzles”. A *computational puzzle* can be thought of as a
string \(\alpha\) that specifies some “problem” such that it’s easy to
verify whether some other string \(\beta\) is a “solution” for \(\alpha\),
but it is hard to find such a solution on your own. (Students with
complexity background will recognize here the class **NP**.) So when we
say “transfer the coin \(ID\) from \(P\) to \(Q\)” we mean that whomever holds
a solution for the puzzle \(Q\) is now the owner of the coin \(ID\) (and to
verify the authenticity of this transfer, you provide a solution to the
puzzle \(P\).) More accurately, a transaction involving the coin \(ID\) is
self-validating if it contains a solution to the puzzle that is
associated with \(ID\) according to the latest transaction in the ledger.

Please re-read the previous paragraph, to make sure you follow the logic.

One example of a puzzle is that \(\alpha\) can encode some large integer
\(N\), and a solution \(\beta\) will encode a pair of numbers \(A,B\) such
that \(N=A\cdot B\). Another more generic example (that you can keep in
mind as a potential implementation for the puzzles we use here) is that
\(\alpha\) will be a string in \(\{0,1\}^{2n}\) while \(\beta\) will be a
string in \(\{0,1\}^n\) such that \(\alpha = G(\beta)\) where
\(G:\{0,1\}^n\rightarrow\{0,1\}^{2n}\) is some pseudorandom generator. The
real Bitcoin system typically uses puzzles based on *digital
signatures*, a concept we will learn about later in this course, but you
can simply think of \(P\) as specifying some abstract puzzle and every
person that can solve \(P\) can perform transactions on the coins owned by
\(P\).There are reasons why Bitcoin uses digital signatures and not
these puzzles. The main issue is that we want to bind the puzzle not
just to the coin but also to the particular transaction, so that if
you know the solution to the puzzle \(P\) corresponding to the coin
\(ID\) and want to use that to transfer it to \(Q\), it won’t be
possible for someone to take your solution and use that to transfer
the coin to \(Q'\) before your transaction is added to the public
ledger. We will come back to this issue after we learn about digital
signatures. In particular if you lost the solution to the puzzle then you
have no access to the coin, and if someone stole the solution from you,
then you have no recourse or way to get your coin back. People have
managed to lose millions of
dollars
in this way.

## The bitcoin ledger

The main idea behind bitcoin is that there is a public *ledger* that
contains an ordered list of all the transactions that were ever
performed and are considered as valid in the system. Given such a
ledger, it is easy to answer the question of who owns any particular
coin. The main problem is how does a collection of anonymous parties
without any central authority agree on this ledger? This is an instance
of the *consensus* problem in distributed computing. This seems quite
scary, as there are very strong negative results known for this problem;
for example the famous Fischer, Lynch Patterson (FLP)
result
showed that if there is even one party that has a *benign* failure
(i.e., it halts and stop responding) then it is impossible to guarantee
consensus in an asynchronuous network. Things are better if we assume
synchronicity (i.e., a global clock and some bounds on the latency of
messages) as well as that a majority or supermajority of the parties
behave correctly. The central clock assumption is typically
approximately maintained on the Internet, but the honest majority
assumption seems quite suspicious. What does it mean a “majority of
parties” in an anonymous network where a single person can create
multiple “entities” and cause them to behave arbitrarily (known as
“byzantine” faults in distributed parlance)? Also, why would we assume
that even one party would behave honestly- if there is no central
authority and it is profitable to cheat then they everyone would cheat,
wouldn’t they?

Perhaps the main idea behind bitcoin is that “majority” will correspond to a “majority of computing power”, or as the original bitcoin paper says, “one CPU one vote” (or perhaps more accurately, “one cycle one vote”). It might not be immediately clear how to implement this, but at least it means that creating fictitious new entities (sometimes known as a Sybill attack after the movie about multiple-personality disorder) cannot help. To implement it we turn to a cryptographic concept known as “proof of work” which was originally suggested by Dwork and Naor in 1991 as a way to combat mass marketing email.This was a rather visionary paper in that it foresaw this issue before the term “spam” was introduced and indeed when email itself, let alone spam email, was hardly widespread.

Consider a pseudorandom function \(\{ f_k \}\) mapping \(n\) bits to \(\ell\)
bits. On average, it will take a party Alice \(2^\ell\) queries to obtain
an input \(x\) such that \(f_k(x)=0^\ell\). So, if we’re not too careful, we
might think of such an input \(x\) as a *proof* that Alice spent \(2^\ell\)
time.

Stop here and try to think if indeed it is the case that one cannot find an input \(x\) such that \(f_k(x)=0^\ell\) using much fewer than \(2^\ell\) steps.

The main question in using PRF’s for proofs of work is who is holding
the key \(k\) for the pseudorandom function. If there is a trusted server
holding the key, then sure, finding such an input \(x\) would take on
average \(2^\ell\) queries, but the whole point of bitcoin is to *not*
have a trusted server. If we give \(k\) to a party Alice, then can we
guarantee that she can’t find a “shortcut” to find such an input without
running \(2^\ell\) queries? The answer, in general, is **no**.

Indeed, it is an excellent exercise to prove that (under the PRF conjecture) that there exists a PRF \(\{ f_k \}\) mapping \(n\) bits to \(n\) bits and an efficient algorithm \(A\) such that \(A(k)=x\) such that \(f_k(x)=0^\ell\).

However, suppose that \(\{ f_k \}\) was somehow a “super-strong PRF” that
would behave like a random function *even to a party that holds the
key*. In this case, we can imagine that making a query to \(f_k\)
corresponds to tossing \(\ell\) independent random coins, and it would not
be feasible to obtain \(x\) such that \(f_k(x)=0^\ell\) using much less than
\(2^\ell\) cycles. Thus presenting such an input \(x\) can serve as a “proof
of work” that you’ve spent \(2^\ell\) cycles or so. By adjusting \(\ell\) we
can obtain a proof of spending \(T\) cycles for a value \(T\) of our choice.
Now if things would go as usual in this course then I would state a
result like the following:

Theorem:Under the PRG conjecture, there exist super strong PRF.

Unfortunately such a result is *not* known to be true, and for a very
good reason. Most natural ways to define “super strong PRF” will result
in properties that can be shown to be *impossible to achieve*.
Nevertheless, the intuition behind it still seems useful and so we have
the following heuristic:

The random oracle heuristic (aka “Random oracle model”, Bellare-Rogaway 1993):If a “natural” protocol is secure when all parties have access to a random function \(H:\{0,1\}^n\rightarrow\{0,1\}^\ell\), then it remains secure even when we give the parties thedescriptionof a cryptographic hash function with the same input and output lengths.

We don’t have a good characterization as to what makes a protocol “natural” and we do have fairly strong counterexamples to this heuristic (though they are arguably “unnatural”). That said, it still seems useful as a way to get intuition for security, and in particular to analyze bitcoin (and many other practical protocols) we do need to assume it, at least given current knowledge.

The random oracle heuristic is very different from all the conjectures
we considered before. It is **not** a formal conjecture since we don’t
have any good way to define “natural” and we do have examples of
protocols that are secure when all parties have access to a random
function but are **insecure** whenever we replace this random function
by **any** efficiently computable function (see the homework exercises).

We can now specify the “proof of work” protocol for bitcoin. Given some identifier \(ID\in\{0,1\}^n\), an integer \(T \ll 2^n\), and a hash function \(H:\{0,1\}^{2n}\rightarrow\{0,1\}^n\), the proof of work corresponding to \(ID\) and \(T\) will be some \(x\in\{0,1\}^*\) such that the first \(\lceil \log T \rceil\) bits of \(H(ID\| x)\) are zero.The actual bitcoin protocol is slightly more general, where the proof is some \(x\) such that \(H(ID\|x)\), when interpreted as a number in \([2^n]\), is at most \(T\). There are also other issues about how exactly \(x\) is placed and \(ID\) is computed from past history that we ignore here.

### From proof of work to consensus on ledger

How does proof of work help us in achieving consensus? The idea is that
every transaction \(t_i\) comes up with a proof of work of some \(T_i\) time
with respect to some identifier that is unique to \(t_i\). The *length* of
a ledger \((t_1,\ldots,t_n)\) is the sum of the corresponding \(T_i\)’s
which correspond to the total number of cycles invested in creating this
ledger.

An honest party in the bitcoin network will accept the longest valid
ledger it is aware of. (A ledger is *valid* if every transaction in it
of the form “transfer the coin \(ID\) from \(P\) to \(Q\)” is self-certified
by a solution of \(P\), and the last transaction in the ledger involving
\(ID\) either transferred or minted the coin \(ID\) to \(P\)). If a ledger \(L\)
corresponds to the majority of the cycles that were available in this
network then every honest party would accept it, as any alternative
ledger would be necessarily shorter. (See Reference:ledgerfig.)

The question is then how do we get to that happy state given that many
parties might be non-malicious but still *selfish* and might not want to
volunteer their computing power for the goal of creating a consensus
ledger. Bitcoin achieves this by giving some incentive, in the form of
the ability to mint new coins, to any party that adds to the ledger.
This means that if we are already in the situation where there is a
consensus ledger \(L\), then every party has an interest in continuing
this ledger \(L\), and not any alternative, as they want their minting
transaction to be part of the new consensus ledger. In contrast if they
“fork” the consensus ledger then their work may well be for vain. Thus
one can hope that the consensus ledger will continue to grow. (This is a
rather hand-wavy and imprecise argument, see this
paper for a more in depth analysis;
this is also related to the phenomenon known as preferential
attachment.)

**Cost to mine, mining pools:** Generally, if you know that completing a
\(T\)-cycle proof will get you a single coin, then making a single query
(which will succeed with probability \(1/T\)) is akin to buying a lottery
ticket that costs you a single cycle and has probability \(1/T\) to win a
single coin. One difference over the actual lottery is that there is
also some probability that you’re working on the wrong fork of the
ledger, but this incentivizes people to avoid this as much as possible.
Another, perhaps even more major difference, is that things are setup so
that this is a *profitable* enterprise and the cost of a cycle is
smaller than the value of \(1/T\) coins. Just like in the lottery, people
can and do gather in groups (known as “mining pools”) where they pool
together all their computing resources, and then split the award if they
win it. Joining a pool doesn’t change your expectation of winning but
reduces the *variance*. In the extreme case, if everyone is in the same
pool, then for every cycle you spend you get exactly \(1/T\) coins. The
way these pools work in practice is that someone that spent \(C\) cycles
looking for an output with all zeroes, only has probability \(C/T\) of
getting it, but is very likely to get an output that begins with
\(\log C\) zeroes. This output can serve as their own “proof of work” that
they spent \(C\) cycles and they can send it to the pool management so
they get an appropriate share of the reward.

The real bitcoin:There are several aspects in which the protocol described above differs from the real bitcoin protocol. Some of them were already discussed above: Bitcoin typically uses digital signatures for puzzles (though it has a more general scripting language to specify them), and transactions involve a number of satoshis (and the user interface typically displayes currency is in units of BTC which are \(10^8\) satoshis). The Bitcoin protocol also has a formula designed to factor in the decrease in dollar cost per cycle so that bitcoins become more expensive to mine with time. There is also a fee mechanism apart from the mining to incentivize parties to add to the ledger. (The issue of incentives in bitcoin is quite subtle and not fully resolved, and it is possible that parties’ behavior will change with time.) The ledger does not grow by a single transaction at a time but rather by ablockof transactions, and there is also some timing synchronization mechanism (which is needed, as per the consensus impossiblity results). There are other differences as well; see the Bonneau et al paper as well as the Tschorsch and Scheuermann survey for more.

## Collision resistance hash functions and creating short “unique” identifiers

Another issue we “brushed under the carpet” is how do we come up with
these unique identifiers per transaction. We want each transaction \(t_i\)
to be *bound* to the ledger state \((t_1,\ldots,t_{i-1})\), and so the ID
of \(t_i\) should contain also the ID’s all the prior transactions. But
yet we want this ID to be only \(n\) bits long. Ideally, we could solve
this if we had a *one to one* mapping \(H\) from \(\{0,1\}^N\) to
\(\{0,1\}^n\) for some very large \(N\gg n\). Then the ID corresponding to
the task of appending \(t_i\) to \((t_1,\ldots,t_{i-1})\) would simply be
\(H(t_1\|\cdots\|t_i)\). The only problem is that this is of course
clearly impossible- \(2^N\) is *much* bigger than \(2^n\) and there is no
one to one map from a large set to a smaller set. Luckily we are in the
magical world of crypto where the impossible is routine and the
unimaginable is occasional. So, we can actually find a function \(H\) that
is “essentially” one to one.

The main idea is the following simple result, which can be thought of as one side of the so called “birthday paradox”:

If \(H\) is a random function from some domain \(S\) to \(\{0,1\}^n\), then the probability that after \(T\) queries an attacker finds \(x\neq x'\) such that \(H(x)=H(x')\) is at most \(T^2/2^n\).

Let us think of \(H\) in the “lazy evaluation” mode where for every query the adversary makes, we choose a random answer in \(\{0,1\}^n\) at the time it is made. (We can assume the adversary never makes the same query twice since a repeat query can be simulated by repeating the same answer.) For \(i< j\) in \([T]\) let \(E_{i,j}\) be the event that \(H(x_i)=H(x_j)\). Since \(H(x_j)\) is chosen at random and independently from the prior choice of \(H(x_i)\), the probability of \(E_{i,j}\) is \(2^{-n}\). Thus the probability of the union of \(E_{i,j}\) over all \(i,j\)’s is less than \(T^2/2^n\), but this probability is exactly what we needed to calculate.

This means that a random function \(H\) is *collision resistant* in the
sense that it is hard for an efficient adversary to find two inputs that
collide. Thus the random oracle heuristic would suggest that a
cryptographic hash function can be used to obtain the following object:

A collection \(\{ h_k \}\) of functions where
\(h_k:\{0,1\}^*\rightarrow\{0,1\}^n\) for \(k\in\{0,1\}^n\) is a *collision
resistant hash function (CRH) collection* if the map
\((k,x)\mapsto h_k(x)\) is efficiently computable and for every efficient
adversary \(A\), the probability over \(k\) that \(A(k)=(x,x')\) such that
\(x\neq x'\) and \(h_k(x)=h_k(x')\) is negligible.Note that the other side of the birthday bound shows that you can
always find a collision in \(h_k\) using roughly \(2^{n/2}\) queries.
For this reason we typically need to double the output length of
hash functions compared to the key size of other cryptographic
primitives (e.g., \(256\) bits as opposed to \(128\) bits).

Once more we do *not* know a theorem saying that under the PRG
conjecture there exists a collision resistant hash function collection,
even though this property is considered as one of the desiderata for
cryptographic hash functions. However, we do know how to obtain
collections satisfying this condition under various assumptions that we
will see later in the course such as the learning with error problem and
the factoring and discrete logarithm problems. Furthermore if we
consider the weaker notion of security under a *second preimage attack*
(also known as being a “universal one way hash function” or UOWHF) then
it *is* known how to derive such a function from the PRG assumption.

A collection \(\{ h_k \}\) of collision resistant hash functions is an incomparable object to a collection \(\{ f_s \}\) of pseudorandom functions with the same input and output lengths. On one hand, the condition of being collision-resistant does not imply that \(h_k\) is indistinguishable from random. For example, it is possible to construct a valid collision resistant hash function where the first output bit always equals zero (and hence is easily distinguishable from a random function). On the other hand, unlike Reference:prfdef, the adversary of Reference:crhdef is not merely given a “black box” to compute the hash function, but rather the key to the hash function. This is a much stronger attack model, and so a PRF does not have to be collision resistant. (Constructing a PRF that is not collision resistant is a nice and recommended exercise.)

## Practical constructions of cryptographic hash functions

While we discussed hash functions as *keyed* collections, in practice
people often think of a hash function as being a *fixed keyless
function*. However, this is because most practical constructions involve
some hardwired standardized constants (often known as IV) that can be
thought of as a choice of the key.

Practical constructions of cryptographic hash functions start with a
basic block which is known as a *compression function*
\(h:\{0,1\}^{2n}\rightarrow\{0,1\}^n\). The function
\(H:\{0,1\}^*\rightarrow\{0,1\}^n\) is defined as
\(H(m_1,\ldots,m_t)=h(h(h(m_1,IV),m_2),\cdots,m_t)\) when the message is
composed of \(t\) blocks (and we can pad it otherwise). See
Reference:merkledamgardfig. This construction is known as the
Merkle-Damgard construction and we know that it does preserve collision
resistance:

Let \(H\) be constructed from \(h\) as above. Then given two messages \(m \neq m' \in \{0,1\}^{tn}\) such that \(H(m)=H(m')\) we can efficiently find two messages \(x \neq x' \in \{0,1\}^{2n}\) such that \(h(x)=h(x')\).

The intuition behind the proof is that if \(h\) was invertible then we
could invert \(H\) by simply going backwards. Thus in principle if a
collision for \(H\) exists then so does a collision for \(h\). Now of course
this is a vacuous statement since both \(h\) and \(H\) shrink their inputs
and hence clearly have collisions. But we want to show a *constructive*
proof for this statement that will allow us to transform a collision in
\(H\) to a collision in \(h\). This is very simple. We look at the
computation of \(H(m)\) and \(H(m')\) and at the first block in which the
inputs differ but the output is the same (there must be such a block).
This block will yield a collision for \(h\).

### Practical random-ish functions

In practice we want much more than collision resistance from our hash
functions. In particular we often would like them to be PRF’s as well.
Unfortunately, the Merkle-Damgard construction is *not* a PRF even when
\(IV\) is random and secret. This is because we can perform a *length
extension attack* on it. Even if we don’t know \(IV\), given
\(y=H_{IV}(m_1,\ldots,m_t)\) and a block \(m_{t+1}\) we can compute
\(y' = h(y,m_{t+1})\) which equals \(H_{IV}(m_1,\ldots,m_{t+1})\).

One fix for this is to use a different \(IV'\) in the *end* of the
encryption. That is, we define:

\(H_{IV,IV'}(m_1,\ldots,m_t) = h(IV',H_{IV}(m_1,\ldots,m_t))\)

A variant of this construction (where \(IV'\) is obtained as some simple function of \(IV\)) is known as HMAC and it can be shown to be a pseudorandom function under some pseudorandomness assumptions on the compression function \(h\). It is very widely implemented. In many cases where I say “use a cryptographic hash function” in this course I actually mean to use an HMAC like construction that can be conjectured to give at least a PRF if not stronger “random oracle”-like properties.

The simplest implementation for a compression function is to take a
*block cipher* with an \(n\) bit key and an \(n\) bit message and then
simply define
\(h(x_1,\ldots,x_{2n})=E_{x_{n+1},\ldots,x_{2n}}(x_{1},\ldots,x_{n})\). A
more common variant is known as Davies-Meyer where we also XOR the
output with \(x_{n+1},\ldots x_{2n}\). In practice people often use
*tailor made* block ciphers that are designed for some efficiency or
security concerns.

### Some history

Almost all practically used hash functions are based on the Merkle-Damgard paradigm. Hash functions are designed to be extremely efficientFor example, the Boneh-Shoup book quotes processing times of up to 255MB/sec on a 1.83 Ghz Intel Core 2 processor, which is more than enough to handle not just Harvard’s network but even Lamar College’s. which also means that they are often at the “edge of insecurity” and indeed have fallen over the edge.

In 1990 Ron Rivest proposed MD4, which was already shown weaknesses in 1991, and a full collision has been found in 1995. Even faster attacks have been since found and MD4 is considered completely insecure.

In response to these weaknesses, Rivest designed MD5 in 1991. A weakness was shown for it in 1996 and a full collision was shown in 2004. Hence it is now also considered insecure.

In 1993 the National Institute of Standards proposed a standard for a
hash function known as the *Secure Hash Algorithm (SHA)*, which has
quite a few similarities with the MD4 and MD5 functions. This function
is known as SHA-0, and the standard was replaced in 1995 with SHA-1 that
includes an extra “mixing” (i.e., bit rotation) operation. At the time
no explanation was given for this change but SHA-0 was later found to be
insecure. In 2002 a variant with longer output, known as SHA-256, was
added (as well as some others). In 2005, following the MD5 collision,
significant weaknesses were shown in SHA-1. In 2017, a full SHA-1
collision was found. Today SHA-1 is considered
insecure and SHA-256 is recommended.

Given the weaknesses in MD-5 and SHA-1 , NIST started in 2006 a competition for a new hashing standard, based on functions that seem sufficiently different from the MD5/SHA-0/SHA-1 family. (SHA-256 is unbroken but it seems too close for comfort to those other systems.) The hash function Keccak was selected as the new standard SHA-3 in August of 2015.

### The NSA and hash functions.

The NSA is the world’s largest employer of mathematicians, and is very heavily invested in cryptographic research. It seems quite possible that they devote far more resources to analyzing symmetric primitives such as block ciphers and hash functions than the open research community. Indeed, the history above suggests that the NSA has consistently discovered attacks on hash functions before the cryptographic community (and the same holds for the differential cryptanalysis technique for block ciphers). That said, despite the “mythic” powers that are sometimes ascribed to the NSA, this history suggests that they are ahead of the open community but not so much ahead, discovering attacks on hash functions about 5 years or so ahead.

There are a few ways we can get “insider views” to the NSA’s thinking. Some such insights can be obtained from the Snowden documents. The Flame malware has been discovered in Iran in 2012 after operating since at least 2010. It used an MD5 collision to achieve its goals. Such a collision was known in the open literature since 2008, but Flame used a different variant that was unknown in the literature. For this reason it is suspected that it was designed by a western intelligence agency.

Another insight into NSA’s thoughts can be found in pages 12-19 of NSA’s internal Cryptolog magazine which has been recently declassified; one can find there a rather entertaining and opinionated (or obnoxious, depending on your point of view) review of the CRYPTO 1992 conference. In page 14 the author remarks that certain weaknesses of MD5 demonstrated in the conference are unlikely to be extended to the full version, which suggests that the NSA (or at least the author) was not aware of the MD5 collisions at the time.

### Cryptographic vs non-cryptographic hash functions:

Hash functions are of course also widely used for *non-cryptographic*
applications such as building hash tables and load balancing. For these
applications people often use *linear* hash functions known as *cyclic
redundancy codes (CRC)*. Note however that even in those seemingly
non-cryptographic applications, an adversary might cause significant
slowdown to the system if he can generate many collisions. This can and
has
been used to obtain denial of service attacks. As a rule of thumb, if
the inputs to your system might be generated by someone who does not
have your best interests at heart, you’re better off using a
cryptographic hash function.