What is an encryption collision?

What are "collisions" of encryption algorithms? Can attacks create an encryption collision?

To answer your question, I need to step through various aspects of cryptography.

One type of cryptographic algorithm is called a hash function. Hash functions take a message of any length as input and then output a short, fixed-length value called a hash, digest or checksum. Hash functions have many uses in cryptography because any change to the original input, accidental or otherwise, will change the resulting hash value. This means hashes can be used in many forms of authentication, such as digital signatures and message authentication codes. They can also verify file integrity because even the slightest change to a document, message, or any type of data will change the hash value.

A hash function is not the same as an encryption function. Encryption is a two-way operation, transforming data from a cleartext to ciphertext and back, whereas hashes compile a stream of data into a small digest, a summarized form if you will, and it's strictly a one-way operation. Because of the one-way nature of hash functions, the hash values of passwords are often stored instead of the passwords themselves. As there's no way to find out for sure which password produced a particular hash, they are not useful to an attacker.

But there's an inescapable problem here. If we are creating a small fixed, length-hash value, say 128 bits, to represent any piece of data, large or small, it means that there are far more possible input values than there are unique hash values. Therefore more than one input stream can produce the same hash value. When this occurs, it is known as a collision. A hash function is deemed collision-resistant if it is hard to find two inputs that hash to the same output. Collision-resistant doesn't mean that no collisions exist; simply that they are difficult to find.

A successful encryption collision attack on a cryptographic hash function could compromise the security of all kinds of digital systems. For example, many software publishers provide the MD5 (Message-Digest algorithm 5) hash value of their downloadable software. This enables users to verify that the file is authentic and has not been tampered with. However, if an attacker could maliciously modify the source code, but manage to keep the same hash value, anyone downloading the doctored version wouldn't know that it wasn't the genuine software.

So to answer your second question, attacks do try to find encryption collisions. In March 2005, two researchers created two X.509 digital certificates with different public keys but with the same MD5 hash. Since then, an algorithm has been published that can find an MD5 collision in under a minute.

Fortunately, as MD5 has been shown to be non-collision resistant, it's being replaced by the SHA-2 family of hash functions in most applications.

What is an encryption collision?

Michael Cobb reviews how encryption collision attacks on cryptographic hash functions could compromise the security of all kinds of digital systems.

Dig Deeper on Data security and privacy

What is a UUID (Universally Unique Identifier)?

How to use the John the Ripper password cracker

What is a checksum?

hashing

Related Q&A from Michael Cobb

The differences between inbound and outbound firewall rules

Symmetric vs. asymmetric encryption: Understand key differences

How to protect port 139 from SMB attacks