What is Hashing?

January 30, 2018

Hashing is a cryptographic technique that takes a piece of data of any size and maps it to a fixed-length bit string that serves as an effectively unique “fingerprint” of that data. The resulting value, a representative image of the original message, is referred to as a “digital fingerprint”, “message digest” or “hash value”.
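As a minimal sketch of this idea, the snippet below uses SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are illustrative assumptions, not part of the original article. It shows that inputs of very different sizes produce digests of the same length, and that a one-character change produces a different digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample inputs: a short message, a much longer one,
# and a copy of the short message with its last character changed.
short_digest = sha256_hex(b"hello")
long_digest = sha256_hex(b"hello" * 100_000)
changed_digest = sha256_hex(b"hellp")

# Both digests are 64 hex characters (256 bits), regardless of input size.
print(len(short_digest), len(long_digest))

# A small change to the input yields a different fingerprint.
print(short_digest == changed_digest)
```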

Different scenarios require different cryptographic techniques. For example, to ensure confidentiality, an encryption method is used that allows the original message to be reconstructed by anyone who knows the appropriate key*. A cryptographic hash, however, is used to allow for a quick comparison of large data sets and to verify that the data has not been altered.
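The integrity check described here can be sketched as follows, assuming SHA-256 as the hash function. The function names `digest` and `is_unaltered` and the sample messages are hypothetical. The idea is that the two parties compare short digests rather than the full data.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, expected_digest: str) -> bool:
    """Check whether `data` still hashes to the digest recorded earlier."""
    # hmac.compare_digest compares the two strings in constant time.
    return hmac.compare_digest(digest(data), expected_digest)


# Hypothetical scenario: the sender publishes the digest of the original message.
original = b"Transfer 100 to Alice"
published_digest = digest(original)

print(is_unaltered(original, published_digest))                  # True
print(is_unaltered(b"Transfer 900 to Alice", published_digest))  # False
```

Comparing two 64-character digests costs the same whether the underlying data is a single sentence or a multi-gigabyte file, which is what makes this approach practical for large data sets.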

While encryption, as in the confidentiality example above, is reversible, a cryptographic hash is a one-way function: inverting it is computationally infeasible. So, even if someone knows the hash value of the data, they cannot derive the original message from it. Only someone who knows the original data can show that the hash value was created from it, by recomputing the hash and checking that it matches, thus confirming that the data has not been altered from its original form.

The only practical way to recover the input (original) data from a hash function’s output is to try a large number of potential inputs to see if one of them produces a match. If two different inputs result in the same output, then a collision** has occurred. While it ideally should be impossible to find two different messages whose hash values are identical, collision resistance doesn’t necessarily mean that no collisions exist, but rather that they are very difficult to find.
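The trial-and-error search described above can be illustrated with a deliberately tiny input space. This is a sketch under stated assumptions: the secret is assumed to be a 4-digit PIN (only 10,000 candidates), and the function name `find_preimage` is hypothetical. For realistic inputs the number of candidates is far too large for this approach to work.

```python
import hashlib
import string
from itertools import product
from typing import Optional


def find_preimage(target_hex: str, alphabet: str, length: int) -> Optional[str]:
    """Try every string of `length` characters from `alphabet` until one
    hashes to `target_hex`; return it, or None if no candidate matches."""
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


# Hypothetical secret: the searcher sees only the hash value, not the PIN.
target = hashlib.sha256(b"4821").hexdigest()

recovered = find_preimage(target, string.digits, 4)
print(recovered)
```

The search succeeds only because the set of possible inputs is small and known in advance. The hash function itself is never inverted; every candidate is simply hashed and compared.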

*Public/private key cryptography is an encryption method whereby one can encrypt data with the recipient’s public key and the recipient can then decrypt it using their private key (or vice versa, depending on the objective).

**An example is a SHA-256 collision. The SHA-256 hash function produces 256 bits of output from a far larger set of possible inputs. Thus, some inputs will necessarily hash to the same output. If an attacker finds a collision, they can use it to substitute an authorized message with an unauthorized one.
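To make the counting argument in this footnote concrete, the sketch below shrinks the output to 16 bits by keeping only the first two bytes of a SHA-256 digest. This truncation is an assumption made purely for illustration, as are the names `truncated_hash` and `find_collision`. With only 65,536 possible outputs, a collision is guaranteed within 65,537 distinct inputs and is usually found much sooner. No collision is implied for full 256-bit SHA-256.

```python
import hashlib
from typing import Tuple


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Return only the first `n_bytes` bytes of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Hash the messages b"0", b"1", b"2", ... until two of them share
    the same truncated hash, then return that pair."""
    seen = {}  # maps truncated hash -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, n_bytes)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1


first, second = find_collision()

# Two different messages with the same 16-bit truncated hash.
print(first, second, truncated_hash(first).hex())
```

The same argument applies to the full 256-bit output, but the number of attempts needed grows so quickly with the output size that this kind of search is not feasible there.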