What is MD5? Understanding Message-Digest Algorithms

MD5 is a cryptographic hash function that produces message digests, which are used to verify data and to support digital signatures. It remains widely used despite being declared “cryptographically broken” over a decade ago.

As a cryptographic hash, it has known security vulnerabilities, most notably that collisions are practical to find. A collision occurs when two distinct messages produce the same hash value.

MD5 can still be used for non-cryptographic purposes, such as a checksum to verify data integrity against unintentional corruption. MD5 produces a 128-bit digest. Even with its known security issues, it remains one of the most commonly used message-digest algorithms.

What is the MD5 message-digest algorithm?

Published as RFC 1321 around 30 years ago, the MD5 message-digest algorithm is still widely used today. MD5 takes a message of variable length as input and produces a compact 128-bit output. It was designed for digital signature applications, in which a large file is securely reduced to a short digest, and that digest, rather than the whole file, is then signed with a private (or secret) key and verified with the matching public key.
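
As a minimal sketch of the fixed-size output, the following Python snippet uses the standard library's hashlib module to hash an empty message and a one-megabyte message. The helper name md5_hex is only illustrative; both digests come out at 128 bits (32 hexadecimal characters), regardless of input length.

```python
import hashlib


def md5_hex(message: bytes) -> str:
    """Return the MD5 digest of `message` as 32 hexadecimal characters."""
    return hashlib.md5(message).hexdigest()


# Inputs of very different lengths (illustrative values only).
short_digest = md5_hex(b"")
long_digest = md5_hex(b"a" * 1_000_000)

# Each hex character encodes 4 bits, so 32 characters = 128 bits.
print(len(short_digest) * 4, short_digest)
print(len(long_digest) * 4, long_digest)
```

Some hardened Python builds (for example, those running in FIPS mode) may restrict hashlib.md5, so treat this as an illustration rather than a portable recipe.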

MD5 can also be used to detect file corruption or inadvertent changes within large collections of files. Command-line implementations are available, and the algorithm is implemented in common programming languages such as Java, Perl, and C. In this role, MD5 serves as a checksum that verifies data integrity. Other non-cryptographic uses include determining the partition for a specific key in a partitioned database.
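
The partitioning use can be sketched as follows. This is an assumed scheme for illustration, not the method of any particular database: the function name partition_for, the example key, and the choice to reduce the first 8 bytes of the digest modulo the partition count are all hypothetical.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions).

    Hypothetical scheme: hash the key with MD5, interpret the first
    8 bytes of the digest as an unsigned integer, and reduce it modulo
    the number of partitions.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same key always lands on the same partition.
print(partition_for("customer-42", 8))
```

Here MD5 is used only to spread keys evenly and deterministically. Nothing in this scheme relies on collision resistance, which is why the known weaknesses matter less for this purpose.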

MD5 tools can either print (generate) or check (verify) 128-bit hashes. However, MD5 has serious, well-documented vulnerabilities, so it should not be used for security purposes.
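
The generate and verify modes can be sketched in Python as shown here. The function names md5_of_file and verify_file and the chunk size are assumptions made for this example. Reading in chunks keeps memory use flat for large files.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Generate: return the MD5 digest of a file as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Verify: compare a file's digest with a previously recorded one."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A check like this detects accidental corruption. It does not protect against an attacker who can deliberately craft a colliding file.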

History of MD5 use

Developed as an extension of the cryptographic hash function MD4, MD5 was created in 1991 by Ronald Rivest of RSA Data Security, Inc. and the MIT Laboratory for Computer Science to replace that earlier function, which was deemed insecure. It was published in the public domain a year later. Just one year after that, a “pseudo-collision” of the MD5 compression function was discovered.

The timeline of discovered (and exploited) MD5 vulnerabilities is as follows:

  • In 1996, a collision of the MD5 compression function was reported, and cryptographers recommended replacing MD5 with a different cryptographic hash function such as SHA-1.
  • Early in 2004, a project began to prove that MD5 was vulnerable to a birthday attack because of its small hash size of 128 bits.
  • By mid-2004, an analytical attack that created collisions for the full MD5 was completed in only an hour.
  • In 2005, a practical collision was demonstrated using two X.509 certificates with different public keys and the same MD5 hash value. Days later, an algorithm was created that could construct MD5 collisions in just a few hours.
  • A year later, in 2006, an algorithm was published that used tunnelling to find a collision within one minute on a single notebook computer. 
  • In 2008, MD5 was officially declared “cryptographically broken” after it was shown that MD5 collisions can be used to forge X.509 certificates that appear to be issued by well-known, trusted certificate authorities (CAs).

Despite these known vulnerabilities, MD5 is still used today, even though more secure alternatives exist.

Security issues with MD5

The MD5 hash function’s security is considered to be severely compromised. Collisions can be found within seconds, and they can be used for malicious purposes. 

In 2012, the Flame spyware, which infiltrated thousands of computers and devices in Iran, was considered one of the most troublesome security issues of the year. Flame used MD5 hash collisions to generate counterfeit Microsoft update certificates, which allowed its malicious code to pass as legitimately signed software. Fortunately, the vulnerability was discovered quickly, and a software update was issued to close this security hole. This involved switching to using SHA-1 for Microsoft certificates.

A hash collision occurs when two different inputs produce the same hash value, or output. Because a hash maps inputs of any length to a fixed-size output, collisions must exist. The security of a hash algorithm depends on collisions being computationally infeasible to find, so any practical way to produce them is a vulnerability that can be exploited.
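
To make the idea concrete, here is a small sketch of a birthday-style search. It does not break MD5 itself. As an assumption for the demo, it keeps only the top 24 bits of each digest, so a collision shows up after a few thousand attempts. For an n-bit hash, a brute-force birthday search needs on the order of 2^(n/2) hashes, which is roughly 2^64 for the full 128-bit MD5 output. The practical attacks on MD5 are analytical and far faster than that. The function names and message format are illustrative.

```python
import hashlib
from itertools import count


def truncated_md5(message: bytes, bits: int) -> int:
    """Return only the top `bits` bits of the MD5 digest, as an integer."""
    full = int.from_bytes(hashlib.md5(message).digest(), "big")
    return full >> (128 - bits)


def find_truncated_collision(bits: int = 24):
    """Hash distinct messages until two share the same truncated value.

    Returns (first_message, second_message, number_of_messages_hashed).
    """
    seen = {}  # truncated hash value -> message that produced it
    for i in count():
        message = f"message-{i}".encode()
        tag = truncated_md5(message, bits)
        if tag in seen:
            return seen[tag], message, i + 1
        seen[tag] = message


first, second, attempts = find_truncated_collision(24)
print(first, second, attempts)
```

The two messages differ but agree on the truncated hash, which is exactly the situation a collision describes. Shorter outputs make such pairs easier to find.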

Threat actors can craft collisions so that a digital signature issued for one message is also valid for another. Because both messages produce the same hash value, the recipient’s verification succeeds, and the threat actor’s message is accepted as legitimate even though it did not come from the actual sender.

What programs use MD5?

Even though it has known security issues, MD5 is still used for