Enhance Security Hash Function For Web Development

Previously I wrote an article on Better Hashing Password in PHP and another on PHP Secure Login Tips And Tricks. Both articles are closely linked to hash functions, and I find that I didn't explain hash functions clearly enough in either of them. This is especially true of Better Hashing Password in PHP, where some readers were confused by the security terms used, which caused some misunderstanding of the article. Hence, I decided to write an article that answers some common hash function questions in detail and covers the improvements any developer can make to further secure their website. By the end you should have a better understanding of secure hash functions and how to apply them in your web development.

Stop Using MD5 or SHA-1 in the future

If you have read Better Hashing Password in PHP, you will know that in 2008 US-CERT of the U.S. Department of Homeland Security said that MD5 should be considered cryptographically broken and unsuitable for further use. Wikipedia explains the MD5 vulnerabilities in good detail, and that will surely make you switch from MD5 to one of the better hash functions out there. SHA-1, on the other hand, is more secure than MD5. However, collision attacks have also been found on SHA-1. This is no surprise, as both SHA-1 and MD5 are descended from MD4. SHA-1 is still strong enough for now, but it may not stay that way given how quickly computing power is advancing. Furthermore, a faster way of creating a collision has been reported that requires only 2⁵² operations, a significant reduction from the earlier 2⁶³. What does this mean? These SHA-1 attacks affect collision resistance, not pre-image resistance. In short, after about 2⁵² operations the researchers can generate two different messages that hash to the same digest value, which takes roughly a few months on dedicated hardware. Finding a SHA-1 collision by brute force would still require about 2⁸⁰ operations. By the way, for people who have no clue what SHA means, it stands for Secure Hash Algorithm, which is required by law for use in certain U.S. Government applications, including use within other cryptographic algorithms and protocols, for the protection of sensitive unclassified information. Thus, the SHA family of algorithms should be pretty solid for anyone who wishes to secure their web system, unless a weakness has been found, as is the case with SHA-1.

Why Collision Is Bad

Any hash function will face collisions eventually because of the pigeonhole principle: members of a very large set are mapped to a relatively short bit string. Collisions of any kind are undesirable. A collision means that two different messages generate the same hash value, so one hash value may be produced by more than one message. Simply put, a hash function does not give a one-to-one relationship between messages and hash values. What this means for security is that there may exist a message 'n', different from 'm', that can be used to access the same system. Therefore, the more collision resistant a hash function is, the better it is for security. Let me give an example of how much damage a collision can cause. Assume a system has had its hashed passwords compromised. Suppose that, instead of the password 'hello', another value 'bye' generated the same hash value (which is not true in reality). The hacker could then use 'bye' to access the system. Now, instead of a 5-character word, a matching hash value can be found with a 3-character word. This can mean a shorter time to crack your hashed value and an easier job for the hacker, although that is not always the case. Obviously this is just an example, and security-conscious people will do much more than this to prevent birthday or brute force attacks.
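The pigeonhole effect is easy to see once the digest is made artificially short. The sketch below is only an illustration: it truncates SHA-256 to its first 4 hex characters (16 bits) so that a collision turns up quickly. The names tinyHash and findTruncatedCollision and the "message-N" inputs are made up for this example and do not come from any library.

```php
<?php
// Illustration only: a 16-bit "hash" made by truncating SHA-256.
function tinyHash(string $message): string
{
    return substr(hash('sha256', $message), 0, 4);
}

// Hash "message-0", "message-1", ... until two of them share a digest.
// With only 65536 possible digests this must stop within 65537 tries.
function findTruncatedCollision(): array
{
    $seen = [];
    for ($i = 0; ; $i++) {
        $message = "message-$i";
        $digest = tinyHash($message);
        if (isset($seen[$digest])) {
            return [$seen[$digest], $message, $digest];
        }
        $seen[$digest] = $message;
    }
}

[$first, $second, $digest] = findTruncatedCollision();
echo "$first and $second both hash to $digest\n";
```

A real 160-bit or 256-bit digest behaves the same way in principle; the search space is just far too large for a loop like this to finish.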

Moving To SHA-2 Hash Function

As I mentioned in Better Hashing Password in PHP, it is time to start considering a move towards a more solid hash function in which no weakness has been found yet. Most U.S. government applications will be required to move to the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512) by 2010, as stated by NIST (National Institute of Standards and Technology) in its hash function policy. Hence, web security conscious people should start moving towards SHA-2 too. PHP 5.1.2 and later offers this option through the hash function:

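$phrase = 'This is my password';
// sha1() already returns a hex string; base64_encode() only re-encodes it.
$sha1a  = base64_encode(sha1($phrase));
// hash() takes the algorithm name as a plain quoted string.
$sha1b  = hash('sha1', $phrase);
$sha256 = hash('sha256', $phrase);
$sha384 = hash('sha384', $phrase);
$sha512 = hash('sha512', $phrase);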

For users on a version lower than PHP 5.1.2, you can try mhash, a PHP extension that wraps the open source mhash library.

$phrase = 'This is my password';
$sha1a =  base64_encode(sha1($phrase));
$sha1b =  base64_encode(bin2hex(mhash(MHASH_SHA1,$phrase)));
$sha256= base64_encode(bin2hex(mhash(MHASH_SHA256,$phrase)));
$sha384= base64_encode(bin2hex(mhash(MHASH_SHA384,$phrase)));
$sha512= base64_encode(bin2hex(mhash(MHASH_SHA512,$phrase)));

SHA-2 should be used if you are a web security minded person. SHA-1 can certainly still be used, but additional enhancements have to be in place to strengthen it against the various attacks on its weaknesses. On the other hand, you may want to avoid MD5 for very sensitive data.
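One visible difference between these functions is the size of the digest they produce. As a small sketch (the variable names are mine), the loop below checks which of the algorithms are available through hash_algos() and records the length of each hex digest, which is the output size in bits divided by 4.

```php
<?php
$phrase = 'This is my password';

// Hex digest length = output bits / 4 (e.g. SHA-256 gives 64 hex characters).
$lengths = [];
foreach (['md5', 'sha1', 'sha256', 'sha384', 'sha512'] as $algo) {
    if (in_array($algo, hash_algos(), true)) {
        $lengths[$algo] = strlen(hash($algo, $phrase));
    }
}
print_r($lengths);
```

If you switch algorithms, remember that the database column holding the hash has to be wide enough for the longer digest.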

Is SHA-2 Totally Secure?

As mentioned previously, no hash function can avoid collisions entirely. It is the same as stating that any system can be compromised given enough time and resources. The objective of a hash function is not to provide a totally uncrackable solution but to delay the process of breaking it, to the point where it might require a few hundred years to achieve.

Currently, the best public attacks on SHA-2 break 24 of the 64 or 80 rounds, where 64 and 80 are the number of rounds (loop iterations) in SHA-256 and SHA-512 respectively. It is therefore safe to say that no collision has been found in the SHA-2 family yet, and that the SHA-2 family has high collision resistance at the moment.

Attacks On Hash Function

An attack can be sped up with the right resources. One way is to divide the cracking process among different computers; another is to exploit the CPU architecture and take advantage of multiple cores on one processor or multiple processors on a single machine. This makes cracking a password much easier. The attacks below are some of the methods an attacker might use.

Birthday Attack

Most attacks on hash functions target collision resistance, as a collision attack is easier to launch. At the same time, practical preimage attacks have yet to be found on established hash functions. The typical attack against a hash function is therefore a collision attack, the generic form of which is known as a birthday attack. We mentioned earlier that every hash function faces the collision problem. Therefore, it is theoretically possible to find two messages that produce the same hash value.

The birthday attack is a statistical probability problem. The method used to find a collision is to simply evaluate the function ƒ for different input values that may be chosen randomly or pseudorandomly until the same result is found more than once. Because of the birthday problem, this method can be rather efficient.

Let's see how efficient it is. Given 'n' inputs and 'k' possible outputs, there are n(n-1)/2 pairs of inputs. For each pair, there is a probability of 1/k that both inputs produce the same output. So once you have about k/2 pairs, which happens when n is around sqrt(k), the probability that a matching pair exists is already close to 40%, and it passes 50% at about 1.18 * sqrt(k) inputs. In other words, if n is greater than sqrt(k), there is a good chance of finding a collision.
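To put numbers on this, here is a minimal sketch of the birthday calculation. The function names are my own; the first computes the exact probability by multiplying out the "no collision" chances, and the second uses the usual approximation 1 - e^(-n(n-1)/(2k)), which is the only practical option when k is as large as a real digest space.

```php
<?php
// Exact probability that at least two of $n uniformly random inputs
// collide when there are $k equally likely outputs.
function birthdayCollisionProbability(int $n, float $k): float
{
    $noCollision = 1.0;
    for ($i = 1; $i < $n; $i++) {
        $noCollision *= 1.0 - $i / $k;
    }
    return 1.0 - $noCollision;
}

// Approximation 1 - e^(-n(n-1)/(2k)); expm1() keeps precision for small values.
function birthdayApproximation(float $n, float $k): float
{
    return -expm1(-$n * ($n - 1) / (2 * $k));
}

// The classic case: 23 people and 365 possible birthdays.
printf("23 people, 365 days: %.4f\n", birthdayCollisionProbability(23, 365));
// n = sqrt(k) for a 160-bit digest such as SHA-1.
printf("2^80 inputs, 2^160 outputs: %.4f\n", birthdayApproximation(2 ** 80, 2 ** 160));
```

The second call is why a brute force collision search on a 160-bit hash is quoted at around 2⁸⁰ operations rather than 2¹⁶⁰.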

Brute Force Attack

A brute force attack is the most fundamental attack, yet it is often used. Brute force is often used against hash values because a hash function generates a one-way hash value. This means the hash value is irreversible, which is what makes it such a secure function. The attacker's only option is to try each and every candidate word until one matches. However, this method is very time consuming, as the search space may be huge.
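As a rough sketch of what "try each and every word" means in code, the hypothetical bruteForce function below walks through every string over a given alphabet up to a maximum length and compares its SHA-256 digest with the target. It is deliberately tiny (lowercase letters, three characters) so that it finishes instantly; a real attack uses the same idea on far larger search spaces.

```php
<?php
// Try every string over $alphabet up to $maxLength characters and
// return the one whose SHA-256 digest matches $targetHash, or null.
function bruteForce(string $targetHash, string $alphabet, int $maxLength): ?string
{
    $base = strlen($alphabet);
    for ($length = 1; $length <= $maxLength; $length++) {
        $total = $base ** $length;
        for ($n = 0; $n < $total; $n++) {
            // Write $n in base $base to get the next candidate string.
            $candidate = '';
            $rest = $n;
            for ($pos = 0; $pos < $length; $pos++) {
                $candidate .= $alphabet[$rest % $base];
                $rest = intdiv($rest, $base);
            }
            if (hash_equals($targetHash, hash('sha256', $candidate))) {
                return $candidate;
            }
        }
    }
    return null;
}

$target = hash('sha256', 'cab');
var_dump(bruteForce($target, 'abcdefghijklmnopqrstuvwxyz', 3));
```

Every extra character multiplies the number of candidates by the alphabet size, which is where the time cost comes from.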

Rainbow Table Attack

Another common attack is to use a rainbow table to try to figure out the original message given a hash value. Unlike brute force, which tries every single combination of characters at attack time, a rainbow table attack uses precomputed tables of hash chains, which offer a time-memory trade-off, to look up the message. The table content does not depend on the hash value to be inverted. It is created once and then repeatedly used for lookups unmodified. Increasing the length of the chains decreases the size of the table. It also increases the time required to perform lookups, and this is the time-memory trade-off of the rainbow table. In the simple case of one-item chains, the lookup is very fast, but the table is very big. Once chains get longer, the lookup slows down, but the table size goes down. The table here refers to the rainbow table, in which all kinds of passwords have been hashed and stored. Building it usually takes a while, as the table has to be extremely large to have a chance of containing a given hash, and as stated above, it is then reused for lookups unmodified. Once you understand this, you can visit kestas for an explanation of how a rainbow table attack works.
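The one-item-chain case mentioned above is simple enough to sketch. This is not a real rainbow table (there are no chains or reduction functions), just a plain precomputed lookup table with a made-up candidate list and a hypothetical helper name, but it shows the "build once, reuse for every lookup" idea.

```php
<?php
// Precompute hash => password once for a (hypothetical) candidate list.
function buildLookupTable(array $candidates, string $algo = 'sha256'): array
{
    $table = [];
    foreach ($candidates as $password) {
        $table[hash($algo, $password)] = $password;
    }
    return $table;
}

$table = buildLookupTable(['123456', 'password', 'letmein', 'hello']);

// Each later lookup is a single array access, reused for every stolen hash.
$stolen = hash('sha256', 'letmein');
echo $table[$stolen] ?? 'not in table', "\n";
```

A real rainbow table stores only the start and end of each chain, trading this instant lookup for a much smaller table.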

Defensive Strategies

There are known ways of attacking a hash function. Similarly, there are defensive strategies that web developers can apply to better protect their systems.

Using SALT

Another common way of enhancing a hash function is to use a SALT. A salt strengthens user passwords in case the table that stores them is ever compromised. A SALT is just an additional string appended to the password before it is hashed and stored in the database table. The exact same SALT is required during verification, when the password the user entered is compared against the stored one. However, some of us are confused about how much a SALT can actually help a web system.
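Here is a minimal sketch of that store-and-verify flow, assuming PHP 7 or later for random_bytes(). The function names and the array standing in for a database row are my own choices, not part of any library.

```php
<?php
// Create a random salt and the salted hash to store for a new password.
function createSaltedHash(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 16 random bytes = 128 bits of salt
    return ['salt' => $salt, 'hash' => hash('sha256', $password . $salt)];
}

// Recompute the hash with the stored salt and compare in constant time.
function verifySaltedHash(string $password, array $row): bool
{
    return hash_equals($row['hash'], hash('sha256', $password . $row['salt']));
}
```

Because each call draws a fresh salt, two users who pick the same password end up with different stored hashes.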

Before I explain the SALT criteria, let's see what a SALT can do for us. Suppose your system database was compromised and no SALT was used for your password hashing. What will you do? Basically you will send an email to everyone and ask them to change their passwords, like Reddit did (well, the only difference is that they did not even hash their passwords). On the other hand, if a SALT is being used, you will just have to modify the SALT algorithm, without causing so much chaos. Furthermore, the compromised password hashes will be much harder to crack.

Let's look at a brute force attack. Assume the hacker knows the length of your SALT (since the table is compromised) and you did not change the length (assume it is 1000 characters). The hacker begins to attack your system using the password hashes they obtained. Let's assume there are around 94 characters that can appear in your SALT or password (the printable ASCII characters excluding space; please refer to the ASCII table). Hence, a brute force attack needs up to 94 tries to hit a single-character password (with no salt). Now, we have 1000 characters of SALT. This means our hacker has to try up to 94¹⁰⁰⁰ combinations, which is about 10¹⁹⁷³, to get a shot at cracking one password on the list of hacked passwords (this assumes the hacker has to guess the salt value itself, not only the password). That is a great many combinations for just one password! I don't think one year is enough for them to crack one password using a single computer. But correct me if I'm wrong. Anyway, we can see from an interesting table in research by Lockdown.co.uk - The Home Computer Security Centre, written on Friday 10th July 2009 04:01, that the longer a password is and the more kinds of character it uses, the longer it takes to crack in a simple brute force attack. After looking at that table, try to imagine how long 1000 characters of different combinations would take.
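The size of such a search space is easy to check with logarithms, since the number itself is far too large for any numeric type. The helper below (keyspaceExponent is a made-up name) returns the power of ten that corresponds to alphabetSize^length.

```php
<?php
// Base-10 exponent of alphabetSize^length, i.e. the x in 10^x.
// Computed with logarithms because the value itself would overflow.
function keyspaceExponent(int $alphabetSize, int $length): float
{
    return $length * log10($alphabetSize);
}

printf("94^8    is about 10^%.1f\n", keyspaceExponent(94, 8));
printf("94^1000 is about 10^%.1f\n", keyspaceExponent(94, 1000));
```

The same helper works for any alphabet, for example 26 for lowercase-only passwords or 62 for letters plus digits.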

Next, let's look at how a rainbow table attack would attempt to crack such a password. We now understand that a rainbow table precomputes hash chains. The goal is to precompute a data structure that, given any output h of the hash function, can either locate the p in P such that H(p) = h, or determine that there is no such p in P. Since all our passwords are salted uniquely, the attacker has to generate a rainbow table for each salt and each possible password, multiplying the effort the attacker must make. Furthermore, given the size of the SALT, such an attack is infeasible: because of the sizable investment in computing power required, rainbow tables beyond fourteen places in length are not yet common. So, choosing a password that is longer than fourteen characters or that contains non-alphanumeric symbols may force an attacker to resort to brute-force methods.

However, OUR TABLE WAS COMPROMISED! This doesn't mean our system is compromised as well. It really depends on the SALT function you create, as mentioned in Better Hashing Password in PHP. A paranoid SALT implementation might meet the following criteria:

1. A SALT more than 64 bits long, where 64 bits is the minimum SALT length required to be secure as specified in PKCS #5
2. Randomly populated ASCII values for each SALT character
3. Validate the SALT to ensure it contains symbols, uppercase, lowercase and numbers
4. base64_encode it before storing it in the database (decode it when used)
5. Use other dynamic fields in the table that users might change as part of the SALT, such as email address, telephone number, last login
6. Validate that no such password hash already exists in the table.

Basically, the above criteria should be enough for a very solid SALT implementation that will not compromise your system even if your database is compromised. Let me explain the reasoning behind each criterion, in the same order.

1. A longer salt means more values to brute force and a bigger rainbow table is required.
2. This ensures that any kind of character is considered for the salt, to generate a different hash for each password.
3. This is just a validation to ensure every kind of character is being used in the salt.
4. Encode the salt value in the database so that a careless hacker will use the encoded value directly.
5. This is the criterion that makes the system secure. We are not doing hash(password + salt); instead we use data from this table or another table and hash it like this: hash(password + salt + username + secret answer). Take note that each variable used here to create the hash value is static data (it doesn't change). Hence, the hacker will not know the sequence and data used to hash the password unless the SALT function is also compromised, which is possible (developers can be hackers). However, the criterion above says dynamic data. This means the SALT function generates a new SALT or password hash to be written to the table whenever the dynamic data changes. Since the last login time only changes upon login, we can use this dynamic data and generate the hash in this sequence: hash(password + salt + last login time). Hence, whenever a user logs in, a new password hash is generated and the password field (which is a hash) is updated. During user validation we use hash(password + salt + database last login time), and upon successful login we update the database with the current time and the new password hash. Similarly, the SALT can be regenerated on every login as well, so a new salt is used for every single login. Now, if a hacker compromises your database, they have to crack it before the user next logs in, which I would say is pretty solid.
6. In case of collision, always validate that the hash value cannot be found in the table before inserting it.
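The rotating hash(password + salt + last login time) idea can be sketched as follows. Everything here is an assumption made for illustration: the function names, the use of SHA-256, and the array that stands in for a database row with 'salt', 'last_login' and 'hash' columns.

```php
<?php
// Hypothetical helper: combine password, salt and the last login time.
function rotatingHash(string $password, string $salt, int $lastLogin): string
{
    return hash('sha256', $password . $salt . $lastLogin);
}

// $user stands in for a database row with 'salt', 'last_login' and 'hash'.
// Returns the updated row on success, or null when the password is wrong.
function loginAndRotate(array $user, string $password, int $now): ?array
{
    $expected = rotatingHash($password, $user['salt'], $user['last_login']);
    if (!hash_equals($user['hash'], $expected)) {
        return null;
    }
    // Successful login: pick a fresh salt and re-hash with the new login time.
    $user['salt'] = bin2hex(random_bytes(16));
    $user['last_login'] = $now;
    $user['hash'] = rotatingHash($password, $user['salt'], $now);
    return $user;
}
```

The re-hash is only possible at login because that is the one moment the server sees the plain password; in a real system the returned row would be written back to the database in the same transaction.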

Like I said, this is a totally paranoid solution that some of you might like to have. But a basic hash(password + salt + username + secret answer) will be quite enough to protect your system, provided you follow the PKCS #5 way of constructing your salt. You may want to visit PHP Secure Login Tips And Tricks for login security.

Iterative Hashing Technique

One way of enhancing your hash function is through iterative hashing. This technique takes a particular hash function and re-applies it to its own output many times (around 1,000-10,000), making the result incredibly hard to crack. It is also known as key stretching or key strengthening in cryptography. Assuming a hacker manages to crack a hash value within 8 hours, rehashing it that many times might change this from 8 hours to 8,000-80,000 hours per password. I think we can imagine how an attack fares against such a technique, where every cracked hash reveals yet another hash to be cracked. It's like opening a birthday present and finding a new box to open inside (makes me cry). Now, we all know that the SHA family hash functions are all considered fast. SHA-2 family functions might run slower than SHA-1 as the block size is bigger, but the difference is not large.

What most of us will worry about is the time required for that many rounds of hashing each time a user logs in or creates an account. This depends on how secure your system needs to be. You don't expect a bank web portal to compromise security for speed, do you? (hacker: HELL YEAH! SPEED!! SPEED!!) Hence, you may want to tune this according to your needs, with at least 1000 iterations, as something between 2 and 100 iterations is equivalent to nothing: such hashes might already exist in a rainbow table. On the other hand, according to Wikipedia, the slowest personal computers in use today (2009) can do about 65,000 SHA-1 hashes in one second using compiled code. Thus a program that uses key strengthening can use 65,000 rounds of hashes and delay the user for at most one second. The standard mentioned in PKCS #5 is around 1000 and above. And in case you have never heard of PKCS: in cryptography, PKCS refers to a group of Public Key Cryptography Standards devised and published by RSA Security.

A SALT can be appended at each iteration to make your iteration algorithm more secure, but it really depends on how you design it.
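As a minimal sketch of iterative hashing with the salt mixed into every round, the hypothetical stretchedHash function below re-applies SHA-256 a configurable number of times. For comparison, the last line calls PHP's built-in hash_pbkdf2() (available since PHP 5.5), which implements PBKDF2, the key-stretching function defined in PKCS #5; the salt string used there is just a placeholder.

```php
<?php
// Re-hash the digest $iterations times, mixing the salt into every round.
function stretchedHash(string $password, string $salt, int $iterations = 1000): string
{
    $digest = hash('sha256', $password . $salt);
    for ($i = 1; $i < $iterations; $i++) {
        $digest = hash('sha256', $digest . $salt);
    }
    return $digest;
}

$custom = stretchedHash('This is my password', 'per-user-salt', 1000);

// Built-in alternative: PBKDF2 with the same hash function and round count.
$builtIn = hash_pbkdf2('sha256', 'This is my password', 'per-user-salt', 1000);
```

The two results are not interchangeable, since PBKDF2 uses HMAC internally, so pick one scheme and store the iteration count next to the hash in case you want to raise it later.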

Is Double Hashing Better?

Double hashing here is different from iterative hashing. Double hashing uses two different hash functions instead of one. The term is confusing because some people also use double hashing to mean iterative hashing. An example is


sha1(md5($pass))

With this construction, your system is at least as weak as the weakest of the hash algorithms you are using (md5). It is true that both iterative and double hashing reduce the search space, but with iterative hashing the reduction is insignificant compared to the benefit gained. Double hashing, on the other hand, is bad because it weakens the system: the hacker only needs to break the weaker hash function. Therefore, try to avoid double hashing and go for iterative hashing instead. Furthermore, hashing just two times with the same algorithm is considered suboptimal.

Conclusion

Enhancing the security of hash functions is a pretty interesting topic. Understanding the strengths and weaknesses of the hash function you use is necessary to better protect your web system. Hash functions are one of the main methods that most of us, web developers and the internet at large, use to secure ourselves. Using a hash function without fully understanding it is like a man on a battlefield with a gun who has little to no understanding of its capabilities and weaknesses. And that's bad. (not that gun where you bring along with you to the toilet)