Serious Security: MD5 considered harmful – to the tune of
In a fascinating legal deliberation handed down by the French data protection regulator CNIL (Commission Nationale de l’Informatique et des Libertés), the energy company Électricité de France, or EDF for short, has been fined EUR 600,000 (about $600,000).
The legal declaration is, in the manner of such things, rather long and (to non-lawyers, at least) linguistically orotund, which means you need reasonable proficiency in French to understand all the ins and outs of the matter, but the overall case boils down to four infringements.
The first three are concerned with general data-related interactions with customers, covering:
- Sending commercial marketing emails without proper consent.
- Collecting data without clarifying what or why.
- Not handling requests reliably when customers asked to see their data, to or get it deleted.
But it’s the last complaint that piqued our interest: Sur le manquement à l’obligation d’assurer la sécurité des données.
In English, this loosely translates as failure to store data securely, and relates very specifically to the insecure handling of passwords.
MD5 considered harmful
The regulator noted, amongs other things, that despite claiming it was salting-and-then-hashing passwords using an accepted hashing algorithm, EDF still had more than 25,000 users’ passwords “secured” with a single MD5 hash as recently as July 2022.
As you will have heard many times on Naked Security, storing the cryptographic hash of a password means that you can validate a password when it is presented simply by recomputing its hash and comparing it with the hash of the password that was originally chosen.
If the hashes match, then you can safely infer that the passwords match, without ever needing to store the actual password.
When presented, the password only ever needs to be held temporarily in memory, and can be discarded as soon as its hash is calculated.
As long as the hashing algorithm is considered cryptographically secure, it can’t usefully be “run in reverse”, so you can’t work backwards from the hash to reveal anything about the password itself. (A hash of this sort is known in the jargon as a one-way function.)
Similarly, a decent hashing algorithm prevents you starting with a known hash and devising some input value – any input, not necessarily the original password – that produces the desired hash.
You would need to try input after input until you got lucky, which for hashes even of 128 bits would take too long to be a practicable attack. (A hash with the safety precaution of not allowing you to figure out multiple inputs with the same output is said to be collision resistant.)
But MD5, as you probably know, has significant problems with collisions, as does its immediate successor SHA-1 (both these hashes came out in the early 1990s).
These days, neither algorithm is recommended for use anywhere, by anyone, for any purpose, given that there are similar but still-secure alternatives that can easily be used to replace them, such as SHA-256 and SHA-512:
MD5 hashes are 128 bits, or 16 bytes, long. SHA-256 and SHA-512 are 2x and 4x as long respectively. But it is not this extra hash length alone that makes them more suitable. Their primary advantage over MD5 is that they don’t have any specific known problems with collisions, so their cryptographic safety is not considered generally doubtful as a result.
Salting and stretching
In short, you wouldn’t expect any company, let alone an energy sector behemoth like EDF, to use MD5 for any cryptographic purpose at all, let alone for securing passwords.
Even worse, however, was the lack of salting, which is where a chunk of data that’s chosen randomly for each user is mixed in with the password before its hash is calculated.
The reason for a salt is simple: it ensures that the hash values of potential passwords cannot be calculated in advance and then brought along to help with an attack.
Without salting, every time any user chooses the password 123456, the crooks know in advance what its hash would be.
Even if the user chooses a more suitable password, such as 34DF6467!Lqa9, you can tell in advance that its MD5 hash will be 7063a00e 41866d47 f6226e60 67986e91.
If you have a long enough list of precomputed passwords, or of partially computed passwords (known rather splendidly in the jargon as a rainbow table), you may be able to recover the password via the table rather than by trying trillions of password combinations until you get lucky.
Salting means that you would need a complete, precomputed rainbow table for every user (the table is determined by the combination of salt + password), and you wouldn’t be able to compute each rainbow table – a task that can take several weeks and occupy terabytes of disk space – until you recovered the salts anyway,
But there’s more you need to do.
Even if you include a salt, so that precomputed “hash dictionaries” can’t be used, and you use a trusted cryptographic algorithm such as SHA-512, one hash calculation alone is sufficiently quick that attackers who have acquired a database of hashes can still try out billions of possible passwords a second, or even more.
So you should use what’s called stretching as well, where you not only salt the initial password, but then pass the input through the hashing algorithm thousands of times or more in a loop, thus making attacks considerably more time-consuming for any crooks who want to try.
Unlike repeated addition, where you can use a single multiplication as a shortcut to replace, say, the calcuation 5+5+5+5+5+5 with 6×5, there are no shortcuts for repeated hashes. To hash an input 1000 times requires 1000 “turns” of the cryptographic calculation handle.
Not just an MD5 problem
Ironically, it seems that although EDF only had 25,800 passwords hashed with MD5, and claimed in its defence that it was mostly using SHA-512 instead, it still wasn’t always salting or stretching the stored hashes.
The regulator reports that 11,200,000 passwords had correctly been salted-and-hashed, but there were nevertheless 2,400,000 that had simply been hashed directly once, whether with MD5 or SHA-512.
Apparently, EDF has now got its password storage up to scratch, but the company was fined EUR 600,000 anyway, and will remain publicly listed online on CNIL’s “naughty step” for the next two years.
We can’t be sure what fine would have been imposed if the judgment had involved poor hashing only, and EDF hadn’t also had to answer for the three other data protection offences listed at the start…
…but it does go to show that bad cryptographic choices can cost you money in more ways than one!
What to do?
Store your customers’ passwords securely!
The extra computational cost of salting-and-stretching can be chosen so that individual users are not inconvenienced when they login, yet would-be attackers have their attack speeds increased by several orders of magnitude.
A password recovery attack that might take a week to extract 10% of passwords stored as simple one-shot hashes would, in theory, take 200 years (10,000 weeks) if you were to make the the cost of computing each trial password 10,000 times harder.
Read our excellent explainer article on this very subject:
In short, we recommend the PBKDF2 “stretching” algorithm with SHA-256 as its core hash, with a per-user random salt of 16 bytes (128 bits) or more.
This matches the recommendations in CNIL’s latest judgement.
CNIL doesn’t offer advice for the number of PBKDF2 iterations, but as you will see in our article, our advice (October 2022) is to use 200,000 or more. (You can regularly increase the number of loops to keep up with the increase in computing power.)
If you don’t want to use PBKDF2, we suggest reading up on the algorithms bcrypt, scrypt and Argon2 to help you make a wise choice.
Don’t get caught out on the cryptographic naughty step!