Hashing... First Step to Secure Software Programming

Where did it all start?
Almost anyone who programs has implemented a login page at some point. A login page might look simple and straightforward, but in reality it is not. For our first login page, most of us did a plain string comparison of the username and password against values hard-coded in the program. Some of us retrieved the username and password from a database and compared them with the credentials the user entered, perhaps based on roles. In real-world enterprise or corporate applications, authentication is not as simple as that.

Authentication
Authentication is the process of verifying that the person accessing a system is who they claim to be. It plays an important role in most systems and applications because it prevents unauthorized access.

Authentication methodologies
1. Username and password authentication
The traditional way of authenticating a user is a plain-text comparison of the username and password. The main drawback of this mechanism is that anyone who can view the usernames and passwords in the database (for example, a database administrator) can easily access an account and carry out fraudulent activities. So the password should never be visible to any third party.

Fig 1 - Traditional Username, Password authentication method

Fig 2 - Db table to store username and password for traditional authentication methodology

2. Password Hashing

The need for "hashing" grew out of the problem described above. Hashing is a technique that converts an input of any length into a fixed-length output string that cannot practically be reversed. The hashed output is known as the "hash value" or "digest". Some common hash function families are MD (Message Digest) and SHA (Secure Hash Algorithm).

Fig 3 - Hashing the string "logIn123" using SHA-256 hash function

Regardless of the input length, the length of the hashed output stays the same for a given hash function (SHA-1, SHA-256, MD2, MD5, etc.). Fig 4 shows that for the input strings "a" and "abc", which differ in length, SHA-256 produces different hash values, but the length of the hash value (number of hexadecimal characters) is the same (64) for both. When the same strings "a" and "abc" are hashed with SHA-1, the hash values again differ from each other, but the output length is the same (40) for both.

If we look at the string "abc" hashed by SHA-256 and by SHA-1, we get two different hash values with two different output lengths. So for the same input, different hash functions produce different digests, and the digest length depends on the function. When different inputs are hashed with the same hash function, the digests differ but the digest length is the same for all of them. As a general rule of thumb, the greater the bit length of the hash value, the greater the protection, because the work factor for cryptanalysis grows significantly. If two different input values produce the same hash value, it is called a hash collision. In general, the SHA-2 functions such as SHA-256 are preferred for higher protection, because practical collisions have been found for MD5 and, more recently, for SHA-1.

Fig 4 - Fixed length output for different hash functions
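
To try the comparison in Fig 4 yourself, here is a minimal sketch using Java's standard MessageDigest class. The class name DigestLengthDemo and the helper hashHex are my own names for illustration, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestLengthDemo {

    // Hashes the text with the named algorithm and returns the digest as lowercase hex.
    static String hashHex(String algorithm, String text) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes print as two hex characters.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported hash function: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"SHA-256", "SHA-1"}) {
            for (String input : new String[] {"a", "abc"}) {
                String hex = hashHex(algorithm, input);
                System.out.println(algorithm + " of \"" + input + "\": " + hex
                        + " (" + hex.length() + " hex characters)");
            }
        }
    }
}
```

Each hexadecimal character represents 4 bits, so the 256-bit SHA-256 digest prints as 64 characters and the 160-bit SHA-1 digest as 40, whatever the input length.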

Hashing is "one-way" because it is irreversible. Because of this property, hashing cannot be used where you need to convert the hashed value back to the original for future use.
E.g.: Hashing credit card numbers is not advisable if the system needs the number again later. Once the input value is hashed, it cannot be converted back to its original form, so the original data would be lost.

How does authentication with hashing work?
Let's take a look at the following scenario, where a user registers with a system for the first time and later logs in to that system with the credentials he/she created.

Fig 5 - User authentication flow with hashing
The flow goes as follows:
1. Signup: The user enrolls with the system by providing credentials (username and password).
2. The password is hashed using a hash function (here SHA-1 has been used).
3. The hashed password is sent to the database along with the username and stored.
4. Login: Once enrolled, the user has to provide the same credentials whenever he/she wants to access the system.
5. The entered password is hashed using the same hash function that was used at signup.
6. The hash value generated for the entered password is sent to the server.
7. The server requests the stored hashed password for that username from the database.
8. If the user exists, the stored hashed password is sent to the server.
9. The server compares the two hash values and decides whether the passwords are the same.
10. The server sends the result to the client, and access is granted only if the two hash values match.
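
The flow above can be sketched in a few lines of Java. This is only an illustration under some assumptions of mine: the class HashedLoginDemo is hypothetical, an in-memory map stands in for the database table, client and server are collapsed into one class, and SHA-256 is used instead of the SHA-1 shown in the figure.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class HashedLoginDemo {

    // Hypothetical stand-in for the database table: username -> hashed password.
    private final Map<String, byte[]> userTable = new HashMap<>();

    // Steps 2 and 5: hash the password with the same function at signup and login.
    static byte[] hash(String password) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps 1-3: store the username with the hashed password, never the plain text.
    boolean signUp(String username, String password) {
        if (userTable.containsKey(username)) {
            return false; // username already taken
        }
        userTable.put(username, hash(password));
        return true;
    }

    // Steps 4-10: hash the entered password and compare it with the stored hash.
    boolean logIn(String username, String password) {
        byte[] storedHash = userTable.get(username);
        if (storedHash == null) {
            return false; // no such user
        }
        return MessageDigest.isEqual(storedHash, hash(password));
    }

    public static void main(String[] args) {
        HashedLoginDemo system = new HashedLoginDemo();
        system.signUp("alice", "logIn123");
        System.out.println("Correct password accepted: " + system.logIn("alice", "logIn123"));
        System.out.println("Wrong password accepted: " + system.logIn("alice", "letMeIn"));
    }
}
```

MessageDigest.isEqual is used for the comparison because it is designed to compare digests without leaking timing information.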

3. Salted Hashing
Everything looks good and secure with hashing. But is that all? No, there is still a problem. What if two users have the same password? What if a hacker or an unauthorized person learns one of the two users' passwords (through a dictionary or rainbow table attack, for example) and gets access to the database? That person can then access both users' accounts, because identical passwords produce identical hash values, as shown in Fig 6. (Note that this is not a hash collision, which involves two different inputs. Here the two inputs are the same.)

Fig 6 - Same hashed value password for two different users in database

This is where "salted hashing" comes into play. Before looking at salted hashing, let's look at the term "salt". A salt is nothing but a randomly generated string. In salted hashing, this randomly generated string is appended to the password, and the combined string is then converted to a digest by a hash function. As long as each user gets a different salt, two users with the same password will, in practice, not end up with the same hashed password. Therefore, authentication is more secure. In addition, some prefer to combine the username with the password and salt before hashing. It all depends on the requirements. Following are some of the common ways of doing salted hashing.

Fig 7 - Common ways of implementing salted hash
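
As a rough sketch of the simplest variant (the password followed by the salt), the snippet below hashes the same password for two users with two different salts. The class SaltedHashDemo, the 16-byte salt length and the choice of SHA-256 are my assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;

class SaltedHashDemo {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a random salt; 16 bytes is an assumed length for this demo.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    // Computes hash(password followed by salt) with SHA-256.
    static byte[] saltedHash(String password, byte[] salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(password.getBytes(StandardCharsets.UTF_8));
            md.update(salt);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sharedPassword = "logIn123";

        // Two users pick the same password but each gets a separate salt.
        byte[] hashForUserA = saltedHash(sharedPassword, newSalt());
        byte[] hashForUserB = saltedHash(sharedPassword, newSalt());

        System.out.println("Same stored hash for both users: "
                + Arrays.equals(hashForUserA, hashForUserB));
    }
}
```

Because the salt is part of the hashed input, the salt has to be stored next to the hash so that the same computation can be repeated at login.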

Let's try out salted hashing with a simple Java application.

First, the user registers with the system by providing a username and password. Since this is for demonstration purposes, I have included the selection of the hash function on the signup page as well. In a real scenario, the developers decide which hash function to use based on the requirements. Of the general-purpose hash functions shown here, SHA-256 is the one currently recommended; for password storage in production, deliberately slow password-hashing functions such as bcrypt, scrypt, Argon2 or PBKDF2 are generally advised instead.

Fig 8 - User signup page

Once the user has registered with the system, the username and the hashed password are stored in the database along with the randomly generated salt and the type of hash function.

Fig 9 - Database table containing username, hashed password, salt and type of hash function

When the user logs in to the system, the corresponding salt, hashed password and hash function type are retrieved from the database based on the username (the username is unique). Then the password the user entered is hashed with the retrieved salt using the corresponding hash function. If the resulting hash and the hashed password retrieved from the database are the same, the user can enter the system as an authorized user. This is how salted hashing works.
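
The registration and login steps can be sketched roughly as follows. This is not the code of the demo application in the figures. The class SaltedLoginDemo, the UserRecord fields and the in-memory map that stands in for the table in Fig 9 are all assumptions of mine.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class SaltedLoginDemo {

    // Hypothetical row of the user table (the username is the map key).
    static final class UserRecord {
        final byte[] hashedPassword;
        final byte[] salt;
        final String hashFunction;

        UserRecord(byte[] hashedPassword, byte[] salt, String hashFunction) {
            this.hashedPassword = hashedPassword;
            this.salt = salt;
            this.hashFunction = hashFunction;
        }
    }

    private final Map<String, UserRecord> userTable = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Hashes the password followed by the salt with the named hash function.
    static byte[] saltedHash(String hashFunction, String password, byte[] salt) {
        try {
            MessageDigest md = MessageDigest.getInstance(hashFunction);
            md.update(password.getBytes(StandardCharsets.UTF_8));
            md.update(salt);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported hash function: " + hashFunction, e);
        }
    }

    // Registration: generate a salt, hash, and store hash, salt and function type.
    void register(String username, String password, String hashFunction) {
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        byte[] hashed = saltedHash(hashFunction, password, salt);
        userTable.put(username, new UserRecord(hashed, salt, hashFunction));
    }

    // Login: look up the record, repeat the hash with the stored salt, compare.
    boolean logIn(String username, String password) {
        UserRecord record = userTable.get(username);
        if (record == null) {
            return false; // unknown username
        }
        byte[] candidate = saltedHash(record.hashFunction, password, record.salt);
        return MessageDigest.isEqual(candidate, record.hashedPassword);
    }

    public static void main(String[] args) {
        SaltedLoginDemo system = new SaltedLoginDemo();
        system.register("alice", "logIn123", "SHA-256");
        System.out.println("Correct password accepted: " + system.logIn("alice", "logIn123"));
        System.out.println("Wrong password accepted: " + system.logIn("alice", "guess"));
    }
}
```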

Because hashing is one-way, you cannot get the original password back from the salted hash. Salted hashing also protects users against precomputed dictionary and rainbow table attacks, since an attacker has to repeat the work for every salt. So far we have seen hashing being used for authentication. Is hashing limited to authentication, or is it used elsewhere? Yes, there are other significant places where hashing is used.

Uses of hashing in real-world scenarios
Apart from authentication, hashing is used for the following purposes as well.

1. Ensuring the integrity of messages during communication

Fig 10 - Message sent from A to B

Why a message needs to be verified during communication:
  1. To verify that the message is from A (verifying the sender)
  2. To verify that the message received by B has not been altered by any "man in the middle" and is the same message sent by A.
How does this verification take place in a communication?
Let's take a look at the flow shown in Fig 11 below.

Fig 11 - Hashing for integrity verification

In the above scenario, A sends a message along with a hash value H1. This hash value was generated by passing the message through a hash function. When B receives the message along with H1, B hashes the received message with the same hash function used by A and gets the hash value H2.

If H2 == H1,
  • The received message is the same as the original sent by A (the message was not modified)
  • A can be accepted as the sender, provided H1 itself could not have been replaced on the way

if not,
  • The message was corrupted in transit, or the communication between A and B has been attacked by some unauthorized entity. Therefore, the message is altered and/or the final message received by B is not from A.

Keep in mind that an attacker who can replace the message can also recompute a plain hash and replace H1. To really verify the sender, the hash is usually combined with a secret, for example a keyed hash (HMAC) with a key shared by A and B, or a digital signature.
This is how the message integrity is handled by hashing. 
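
Here is a minimal sketch of the comparison B performs. The class IntegrityCheckDemo and its method names are hypothetical, SHA-256 is an assumed choice, and the sketch assumes H1 reaches B unmodified. It only shows how a change in the message is detected.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class IntegrityCheckDemo {

    // Used by A to produce H1 and by B to produce H2.
    static byte[] digest(String message) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // B's check: hash the received message (H2) and compare it with the received H1.
    static boolean isUnmodified(String receivedMessage, byte[] h1) {
        byte[] h2 = digest(receivedMessage);
        return MessageDigest.isEqual(h1, h2);
    }

    public static void main(String[] args) {
        String sent = "Transfer 100 to account 42";
        byte[] h1 = digest(sent); // computed by A before sending

        System.out.println("Untouched message passes: " + isUnmodified(sent, h1));
        System.out.println("Altered message passes: "
                + isUnmodified("Transfer 900 to account 42", h1));
    }
}
```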

2. Hashing for indexing in database
An index in a database is mainly used to speed up searches. Hashing improves database search through the hash index. The records in a table are divided into a set of groups known as "buckets". Each bucket is identified by a hash value, and a record is placed in the bucket that matches the hash of its key.

Fig 12 - Hashing for indexing in database

During a search, the input value is hashed using the same hash function. The resulting hash value identifies the bucket to look in, so only that bucket has to be searched instead of the whole table, which speeds up the search.
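
To make the bucket idea concrete, here is a toy sketch. It is not how any particular database implements its hash index. The class HashIndexDemo is hypothetical, and it uses Java's String.hashCode (a fast, non-cryptographic hash) as an assumed hash function.

```java
import java.util.ArrayList;
import java.util.List;

class HashIndexDemo {

    // Hypothetical record: the indexed key plus the rest of the row.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Row>> buckets = new ArrayList<>();

    HashIndexDemo(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Maps a key to a bucket number; floorMod keeps the result non-negative.
    int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // Places the record in the bucket chosen by the hash of its key.
    void insert(String key, String value) {
        buckets.get(bucketFor(key)).add(new Row(key, value));
    }

    // Hashes the search key and scans only the matching bucket.
    String find(String key) {
        for (Row row : buckets.get(bucketFor(key))) {
            if (row.key.equals(key)) {
                return row.value;
            }
        }
        return null; // not found
    }

    public static void main(String[] args) {
        HashIndexDemo index = new HashIndexDemo(8);
        index.insert("alice", "row for alice");
        index.insert("bob", "row for bob");
        System.out.println("alice is in bucket " + index.bucketFor("alice"));
        System.out.println("Lookup result: " + index.find("alice"));
    }
}
```

Several keys can land in the same bucket, which is why find still compares the keys inside the bucket before returning a row.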

I hope this basic introduction to hashing will be helpful for beginners. Try out these concepts and ideas by implementing some simple programs to get practical experience as well.
