It is important to store the passwords of user accounts securely. There have been many high-profile incidents in which a security breach let hackers obtain database dumps of user passwords. The 2012 LinkedIn hack and the 2013 Adobe hack are two of many similar cases. Because the passwords were stored inappropriately, the hackers (read: crackers) were able to recover the passwords of many user accounts and publish them on the Internet, resulting in an embarrassing PR fiasco for the companies.
The first important thing to note is that passwords should be stored hashed, not encrypted. For readers without a background in cryptography, the important distinction is that encryption is a reversible operation while hashing isn't. Adobe encrypted its passwords instead of hashing them, which was one of the primary failures behind its breach.
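The second important thing to note is that general-purpose hashing algorithms like MD5 or the SHA family are not suitable for password hashing, even where they are still considered cryptographically strong (MD5 and SHA-1 no longer are, since practical collision attacks exist against both). There are two primary reasons for this.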
1. Lack of salts
The basic cryptographic hash lacks a salt. What exactly is a salt? A salt is simply a piece of random data, unique to each password, that is combined with the password before hashing it. The purpose of a salt is to prevent rainbow table attacks from working. Without salts, an attacker can generate rainbow tables, or use ones widely available on the Internet, to rapidly crack common passwords. Hashing without salts was one of the reasons the password hashes from the LinkedIn hack were quickly broken.
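To make the rainbow table problem concrete, here is a minimal sketch in which a plain dictionary of precomputed hashes stands in for a real rainbow table (a real one uses a more compact chain structure). The password list and the crack() helper are invented for illustration, and SHA-256 is used only as a representative fast hash.

```python
import hashlib
import os

# Hypothetical list of common passwords an attacker hashes ahead of time.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Precomputed table: hex digest -> password. A stand-in for a rainbow table.
lookup_table = {
    hashlib.sha256(p.encode("utf-8")).hexdigest(): p for p in common_passwords
}


def crack(leaked_hash):
    """Return the password if its hash is in the precomputed table, else None."""
    return lookup_table.get(leaked_hash)


# Unsalted: the digest depends only on the password, so the table finds it.
unsalted = hashlib.sha256(b"password").hexdigest()
print(crack(unsalted))

# Salted: random bytes change the digest, so the precomputed table misses.
salt = os.urandom(16)
salted = hashlib.sha256(salt + b"password").hexdigest()
print(crack(salted))
```

The first lookup succeeds instantly, while the second fails because the attacker would need a separate table for every possible salt value.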
A common practice is to prepend a randomly generated salt to the password before hashing it.
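import hashlib
import os

password = b"password"     # bytes, because hashlib operates on bytes in Python 3
salt = os.urandom(16)      # 16 random bytes; store them alongside the hash
m = hashlib.md5()
m.update(salt + password)  # the salt is prepended to the password
m.hexdigest()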
However, this still isn't good enough due to the second reason.
2. Lack of a slowness factor
The second reason why normal cryptographic hashes are inappropriate for password hashing is that they are fast. While speed is an advantage in most computing tasks, it is a liability in password hashing.
While a normal system only has access to an ordinary CPU for hashing passwords, attackers are able to use specialized hardware such as GPUs in their password cracking attempts. Modern GPUs are able to compute billions of MD5 or SHA hashes per second, which allows attackers to quickly crack passwords.
Special-purpose password hashing algorithms are designed to address these two problems. Most such algorithms can be configured to make hashing on specialized hardware more difficult, which reduces the advantage attackers have.
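A rough back-of-the-envelope calculation shows why the slowness factor matters. The sketch below assumes a rate of one billion fast hashes per second (the low end of the "billions" mentioned above), an 8-character lowercase password, and a slow hash that simply costs the attacker 200,000 fast hashes per guess. All three numbers are illustrative assumptions, and real hardware does not scale this neatly.

```python
def seconds_to_exhaust(alphabet_size, length, guesses_per_second):
    """Worst-case seconds to try every password of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second


GPU_HASHES_PER_SECOND = 1e9  # assumption: one billion fast hashes per second
ROUNDS = 200000              # assumption: iterations applied by a slow hash

# Every 8-character lowercase password, one fast hash per guess.
fast = seconds_to_exhaust(26, 8, GPU_HASHES_PER_SECOND)

# Same keyspace, but each guess now costs ROUNDS hash computations.
slow = seconds_to_exhaust(26, 8, GPU_HASHES_PER_SECOND / ROUNDS)

print("fast hash: %.0f seconds" % fast)
print("slow hash: %.0f days" % (slow / 86400))
```

Under these assumptions the exhaustive search goes from a few minutes to more than a year, while a legitimate login pays the extra cost only once.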
Proper password hashing in Python
Currently, there are three different functions recommended for password hashing: PBKDF2, Bcrypt and Scrypt. There is an ongoing Password Hashing Competition that will hopefully yield better algorithms in the near future.
Let's take a look at how to use PBKDF2 to hash passwords. While Bcrypt and Scrypt are generally considered stronger algorithms, PBKDF2 is more widely available. The code examples will be Python-specific, but the concepts should apply to any language.
Passlib is an excellent Python library with good support for PBKDF2. Installing it is simple. As a Python developer, you should already have pip installed:
$ pip install passlib
Passlib has a very simple interface.
from passlib.hash import pbkdf2_sha256

hash = pbkdf2_sha256.encrypt("password", rounds=200000, salt_size=16)
The encrypt() function takes three parameters here. The first is the password to be hashed. The second, rounds, is much more interesting: it controls the number of iterations that PBKDF2 applies to the password. By tuning this parameter, PBKDF2 can be configured to require a large amount of computing time per hash. A common question is how many rounds to use. Unfortunately, there is no one magic number. Instead, developers should benchmark the time a given iteration count takes on the system they are deploying to and raise the count until the delay is the longest they can accept. The higher the iteration count, the better. The third parameter, salt_size, is the length in bytes of the salt that Passlib generates for you. If it is not specified, the default of 16 bytes is fine. Note that Passlib 1.7 and later deprecate encrypt() in favor of hash(), with settings such as rounds and salt_size supplied through using().
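The benchmarking advice above can be sketched as follows. This uses hashlib.pbkdf2_hmac from the standard library rather than Passlib, so the timings only approximate what Passlib would show on the same machine. The helper names, the doubling strategy and the 0.05 second budget are assumptions chosen for illustration, not recommendations.

```python
import hashlib
import os
import time


def time_pbkdf2(rounds, repeats=3):
    """Return the best-of-N wall-clock seconds for one PBKDF2-HMAC-SHA256 hash."""
    salt = os.urandom(16)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, rounds)
        best = min(best, time.perf_counter() - start)
    return best


def tune_rounds(target_seconds, start_rounds=10000, max_rounds=10000000):
    """Double the round count until one hash takes at least target_seconds."""
    rounds = start_rounds
    while rounds < max_rounds and time_pbkdf2(rounds) < target_seconds:
        rounds *= 2
    return rounds


# Example budget of 0.05 s per hash; pick your own based on login latency.
rounds = tune_rounds(0.05)
print("rounds for this machine:", rounds)
```

Run it on the deployment hardware, not a development laptop, since the result depends entirely on the CPU doing the hashing.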
Verifying the hash when it is time to authenticate the user is a simple matter as well. Passlib has a verify() function that returns a boolean value based on the success of the verification.
from passlib.hash import pbkdf2_sha256

pbkdf2_sha256.verify("password", hash)
And there we have it: three lines of code to add strong password hashing to your Python projects. It's that simple, and it will help you avoid the mistakes that LinkedIn and Adobe made.
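For readers curious about what a library like Passlib is doing on their behalf, here is a minimal standard-library sketch of the same salt-then-iterate scheme. It is not Passlib's implementation or storage format: the function names and the (salt, rounds, key) tuple are assumptions made for illustration, and the round count simply mirrors the example above.

```python
import hashlib
import hmac
import os

ROUNDS = 200000  # mirrors the Passlib example; tune for your own hardware
SALT_SIZE = 16   # bytes


def hash_password(password, rounds=ROUNDS):
    """Return (salt, rounds, key); all three must be stored with the account."""
    salt = os.urandom(SALT_SIZE)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return salt, rounds, key


def verify_password(password, salt, rounds, expected_key):
    """Recompute the key with the stored salt and rounds, then compare."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(key, expected_key)


salt, rounds, key = hash_password("password")
print(verify_password("password", salt, rounds, key))  # True
print(verify_password("passw0rd", salt, rounds, key))  # False
```

Storing the round count next to the salt lets you raise it later without invalidating existing hashes. In practice a maintained library is still the better choice, since it also handles encoding and upgrades.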
- Passlib is a password hashing library for Python 2 & 3, which provides cross-platform implementations of over 30 password hashing algorithms.
- Linux: Install pip Client To Install Python Packages