@codegasms/sha256 NPM

SHA-256 Algorithm

Usage

    const hashString = require('@codegasms/sha256');

    const message = 'Hello World';
    
    const hashedMessageHex = hashString(message, 'hex');
    const hashedMessageBinary = hashString(message, 'binary');
    const hashedMessageBase64 = hashString(message, 'base64');

Unicode Codepoint To UTF-8 Conversion

Unicode is represented as U+XXXX, where XXXX is a hexadecimal number. UTF-8 is a direct mapping of the unicode characters i.e. Encoding format of Unicode characters in one-byte units, hence the suffix -8, more specifically it is a conversion of a codepoint into a set of one to four bytes.

Range 1 : 0xxxxxxx
Out of the available 8-bits, 1-bit is reserved to show that this is of Type-1 UTF-8 i.e. one-byte encoding. The rest of the bits i.e. 7-bits can be used to map $2^7 = 128$ numbers.
e.g. U+0047 (Unicode of A i.e ASCII "A") is mapped as 0 1 0 0 0 0 0 1
This contains the mappings of ASCII characters (0-127)
Range 2 : 110xxxxx 10xxxxxx
Out of the available 16-bits, first 3-bits of the first byte are reserved to show that this is of Type-2 UTF-8 i.e. two-byte encoding. This is shown by using two 1s and a 0. The first 2-bits of the second byte are also reserved to denote the same thing. To avoid ambiguity in memory access, both the bytes have some reserved bits to denote the type of UTF-8 range. The rest of the bits i.e. 5-bits from the first byte and the 6-bits from the second byte can be used to map $2^{(5+6)} = 2^{11} = 2048$ numbers.
e.g. U+00B9 (Unicode of $^1$ i.e. Superscript "1") is mapped as 1 1 0 0 0 0 1 0
This contains the mappings of unicode characters (128-2047)
Range 3 : 1110xxxx 10xxxxxx 10xxxxxx
Out of the available 24-bits, first 4-bits of the first byte are reserved to show that this is of Type-3 UTF-8 i.e. three-byte encoding. This is shown by using three 1s and a 0. The first 2-bits of the second byte and third byte are also reserved to denote the same thing. To avoid ambiguity in memory access, all the bytes have some reserved bits to denote the type of UTF-8 range. The rest of the bits i.e. 5-bits from the first byte and the 6-bits from the second byte & third byte can be used to map $2^{(4+6+6)} = 2^{16} = 65536$ numbers.
e.g. U+2070 (Unicode of $^0$ i.e. Superscript "0") is mapped as 1 1 1 0 0 0 1 0 1 0 1 1 0 0 0 0
This contains the mappings of unicode characters (2048-65535)
Range 4 : 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Out of the available 32-bits, first 5-bits of the first byte are reserved to show that this is of Type-4 UTF-8 i.e. four-byte encoding. This is shown by using four 1s and a 0. The first 2-bits of the second byte, third byte and fourth byte are also reserved to denote the same thing. To avoid ambiguity in memory access, all the bytes have some reserved bits to denote the type of UTF-8 range. The rest of the bits i.e. 5-bits from the first byte and the 6-bits from the second byte, third byte & fourth byte can be used to map $2^{(3+6+6+6)} = 2^{21} = 2097152$ numbers.
This contains the mappings of unicode characters (65536-2097151), but we have not yet reached the upper-bound yet. Range 4 is (65536-1114111) as of today. (Discalimer : This data is prone to changes on a regular basis, check the Unicode website for more information)

/**
 * Converts Binary Unicode Codepoint into Binary UTF-8 string
 * Unicode is represented as U+XXXX where XXXX is the hexadecimal codepoint
 * Convert the codepoint to respective ary-byte depending on the lenght of the codepoint
 * @param {number} codepoint - Unicode Codepoint as a number
 * @returns {string} - UTF-8 string for the given unicode codepoint
 */
function unicodeToUtf8Binary(codepoint) {
    const str = codepoint.toString(2);      // Converts to base-2 binary string
    const length = str.length;              // Length of the binary string

    // Range 1 : 0xxxxxxx
    if (length <= 7) {
        return `0${'0'.repeat(7 - length)}${str}`;
    }

    // Range 2 : 110xxxxx 10xxxxxx
    else if (length <= 11) {
        return `110${'0'.repeat(11 - length)}${str.slice(0, -6)} 10${str.slice(-6)}`;
    }

    // Range 3 : 1110xxxx 10xxxxxx 10xxxxxx
    else if (length <= 16) {
        return `1110${'0'.repeat(16 - length)}${str.slice(0, -12)} 10${str.slice(-12, -6)} 10${str.slice(-6)}`;
    }

    // Range 4 : 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    else if (length <= 21) {
        return `11110${'0'.repeat(21 - length)}${str.slice(0, -18)} 10${str.slice(-18, -12)} 10${str.slice(-12, -6)} 10${str.slice(-6)}`;
    }

    // Out Of Range
    else {
        throw new Error('Invalid Unicode Codepoint received : Out Of UTF-8 Range');
    }
}

String To UTF-8 & UTF-8 To String

Using this logic one can simply build 2 conversion functions i.e.

UTF-8 To String

/**
 * Converts UTF-8 string to human-readable string
 * @param {string} str - UTF-8 string to be converted to human-readable string 
 * @returns {string} - Human-readable string for the given UTF-8 string
 */
function fromUTF8String(str) {
    let utf8StringArray = str.split(' ');
    let output = [];
    let i = 0;
    while (i < utf8StringArray.length) {
        // Range 1 : 0xxxxxxx
        if (utf8StringArray[i].startsWith('0')) {
            let unit = utf8StringArray.slice(i, i + 1).join('');
            let codepoint = parseInt(unit.slice(1), 2);
            output.push(String.fromCodePoint(codepoint));
            i += 1;
        }
        // Range 2 : 110xxxxx 10xxxxxx
        else if (utf8StringArray[i].startsWith('110')) {
            let unit = utf8StringArray.slice(i, i + 2).join('');
            let codepoint = parseInt(unit.slice(3, 8) + unit.slice(11), 2);
            output.push(String.fromCodePoint(codepoint));
            i += 2;
        }
        // Range 3 : 1110xxxx 10xxxxxx 10xxxxxx
        else if (utf8StringArray[i].startsWith('1110')) {
            let unit = utf8StringArray.slice(i, i + 3).join('');
            let codepoint = parseInt(unit.slice(4, 8) + unit.slice(11, 17) + unit.slice(20), 2);
            output.push(String.fromCodePoint(codepoint));
            i += 3;
        }
        // Range 4 : 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        else if (utf8StringArray[i].startsWith('11110')) {
            let unit = utf8StringArray.slice(i, i + 4).join('');
            let codepoint = parseInt(unit.slice(5, 8) + unit.slice(11, 17) + unit.slice(20, 26) + unit.slice(29), 2);
            output.push(String.fromCodePoint(codepoint));
            i += 4;
        }
        else {
            throw new Error('Invalid UTF-8 string');
        }
    }
    return output.join('');
};

String To UTF-8

/**
 * Converts any human-readable string to UTF-8 string
 * @param {string} str - Human-readbale string to be converted to UTF-8 
 * @returns {string} - UTF-8 string for the given unicode string
 */
function toUTF8String(str) {
        let utf8StringArray = [];

        for (let symbol of str) {
            const codepoint = symbol.codePointAt(0);
            utf8StringArray.push(unicodeToUtf8Binary(codepoint));
        };

        return utf8StringArray.join(' ');
    };

Explaining The SHA-256

Preprocessing

Preprocessing needs to be done before implementing the SHA-256 algorithm. The preprocessing consists of 3 steps :

Padding The Message (To ensure that the message length is a multiple of 512-bits)
Parsing The Message into Message blocks (Entire message is broken down into n blocks of size 512-bits each, and SHA-256 runs n iterations to compute the Hashed Value)
Setting The Initial Hash Value (Defined as "The 32-bit fractional part of square root of first 8 prime numbers")

Padding the Message

The Message :
512-bits message = UTF-8 Encoding + 1 {Mark the start of padding hereafter} + padding {0000...000} + length(UTF-8 Encoding) {64-bits reserved}

To create the padded message, we need to add 2 paddings :

Zero padding after '1' and before start of length

    let zeroPadding = '0'.repeat(512 - (message.length + 1 + 64) % 512);

Zero padding to make length = 64-bit

    let lengthPadding = '0'.repeat(64 - message.length.toString(2).length);

Parsing the Message

For SHA-256, the padded message is parsed into N 512-bit blocks, M(1) ... M(N). Since the 512-bits of the input block may be expressed as sixteen 32-bit words, the first 32-bits of message block i are denoted M(0)i , the next 32 bits are M(1)i , and so on up to M(15)i.

Clever way is to use RegEx to parse the message
Divide into blocks of 512-bits first
    let blocks = [];

for (let i = 0; i < message.length; i += 512) {
    blocks.push(message.slice(i, i + 512));
}

>
> 2. Divide each blocks into sizteen 32-bits words
>
```javascript
    return blocks.map((x) => {
        let chunks = [];

        let j = 0;
        for (let i = 0; i < x.length; i += 32) {
            chunks.push(x.slice(i, i + 32));
        }
    });

Setting up the Initial Hash Values

These Hash Values were obtained by taking the first 32-bits of the fractional parts of the square roots of the first 8 prime numbers.

const H = [
    0x6a09e667,
    0xbb67ae85,
    0x3c6ef372,
    0xa54ff53a,
    0x510e527f,
    0x9b05688c,
    0x1f83d9ab,
    0x5be0cd19
];

Hash Computation

Operations In SHA-256

There are some predefined operations in SHA-256 algorithm which are :

Ch(x, y, z) - Choose : y if x is set else z

function Choose(x, y, z) {
    return (x & y) ^ (~x & z);
};

Maj(a, b, c) - Majority : max(frequency(1) in (a,b,c), frequency(0) in (a,b,c))
```
function Majority(x, y, z) {
    return (x & y) ^ (x & z) ^ (y & z);
};
```

Σ0(x) - Sigma0 : Rotates the number to the right by 2, 13, and 22 bits and then XORs the result

function Sigma0(x) {
    return rightRotate(x, 2, 32) ^ rightRotate(x, 13, 32) ^ rightRotate(x, 22, 32);
};

Σ1(x) - Sigma1 : Rotates the number to the right by 6, 11, and 25 bits and then XORs the result

function Sigma1(x) {
    return rightRotate(x, 6, 32) ^ rightRotate(x, 11, 32) ^ rightRotate(x, 25, 32);
};

σ0(x) - sigma0 : Rotates the number to the right by 7, 18, and shifts to the right by 3 bits and then XORs the result
```
function sigma0(x) {
    return rightRotate(x, 7, 32) ^ rightRotate(x, 18, 32) ^ (x >>> 3);
}
```
σ1(x) - sigma1 : Rotates the number to the right by 17, 19, and shifts to the right by 10 bits and then XORs the result
```
function sigma1(x) {
    return rightRotate(x, 17, 32) ^ rightRotate(x, 19, 32) ^ (x >>> 10);
}
```

Each Message Block, M(1) ... M(N), is processed in order, following the steps:
Prepare Message Schedule
Initialize Working Variables
Find State of Variables at each Iteration and,
Compute New $i^{th}$ Hash Values
After N iterations, concatenate H0 ... H7 i.e. H0 || H1 || H2 || H3 || H4 || H5 || H6 || H7 to obtain the bashA-256 Hash.

Prepare the Message Schedule

Perform the following operation:

$messageSchedulei = \begin{cases} parsedMessagei & \text{if } 0 \leq i \leq 15 \ \sigma_1(messageSchedulei - 2) + messageSchedulei - 7 + \sigma_0(messageSchedulei - 15) + messageSchedulei - 16 & \text{if } 16 \leq i \leq 63 \end{cases}$

let messageSchedule = [];

for (let t = 0; t < 64; t++) {
    if (t <= 15) {
        messageSchedule[t] = parsedMesssage[i][t];
    } else {
        messageSchedule[t] = (sigma1(messageSchedule[t - 2]) + messageSchedule[t - 7] + sigma0(messageSchedule[t - 15]) + messageSchedule[t - 16]) % Math.pow(2, 32);
    }
}

Initialize Working Variables

Initialize 8 working variables a, b, c, d, e, f, g and h with $(i - 1)th$ hash value.

a = H0(i)
b = H1(i)
c = H2(i)
d = H3(i)
e = H4(i)
f = H5(i)
g = H6(i)
h = H7(i)

Find the State of Variables

Iterate in the Message Schedules and perform the following operations:

T1 = h + Σ1(a) + Ch(e, f, g) + K[i] + mwssageSchedule[i]
T2 = Σ0(a) + Maj(a, b, c)
h = g
g = f
f = e
e = d + T1
d = c
c = b
b = a
a = T1 + T2

for (let t = 0; t < 64; t++) {
    T1 = (h + Sigma1(e) + Choose(e, f, g) + K[t] + messageSchedule[t]) % Math.pow(2, 32);
    T2 = (Sigma0(a) + Majority(a, b, c)) % Math.pow(2, 32);
    h = g;
    g = f;
    f = e;
    e = (d + T1) % Math.pow(2, 32);
    d = c;
    c = b;
    b = a;
    a = (T1 + T2) % Math.pow(2, 32);
}

Compute New $i^{th}$ Hash Values

Compute the $i^{th}$ immediate hash value by performing the following operations:

H0(i) = a + H0(i - 1)
H1(i) = b + H1(i - 1)
H2(i) = c + H2(i - 1)
H3(i) = d + H3(i - 1)
H4(i) = e + H4(i - 1)
H5(i) = f + H5(i - 1)
H6(i) = g + H6(i - 1)
H7(i) = h + H7(i - 1)

Final Result

After completing N iterations, the resulting 256-bits Message Digest of the message is :

$H0(N) | H1(N) | H2(N) | H3(N) | H4(N) | H5(N) | H6(N) | H7(N)$

2 years ago

2 years ago