1.0.3 • Published 4 years ago

c8r v1.0.3

License
MIT
Repository
github

Intro

Compressor is a JavaScript encoder/decoder for arrays of numbers. It encodes the numbers into a compact Base 64 URL string.

Input

[5, 0, 5, 6, 3, 4, 5, 0, 5, 6];

Output

"Coucou";

Compressor can encode any array of numbers, and decode a Base 64 URL string back into an array.

Usage

Install with npm or yarn

npm install c8r

Or

yarn add c8r

Encode

encode([1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]);

Returns "Ahh"

Decode

decode("ChaLut");

Returns [4, 1, 3, 2, 1, 3, 5, 6, 5, 5]

Use case

The original use case was to send an array of numbers directly in a URL, like this: myapplication.io/?collection=[1, 2, 3, 4, 50, 600…]

But this is not an efficient way of doing so: each digit uses a character (which is 8 or 16 bits), and so does each comma separator and each bracket…

So we look for an alternative, such as encoding into Base 64.
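To put a number on that overhead, here is a minimal sketch that counts the characters of the naive parameter. The helper name naiveQueryLength is hypothetical, and it assumes the value is serialized with JSON.stringify and escaped with encodeURIComponent, which turns each bracket and comma into a three-character escape.

```javascript
// Hypothetical helper: character cost of sending the array as a plain query value.
function naiveQueryLength(numbers) {
  // JSON.stringify yields "[1,2,3,4,50,600]" (no spaces).
  const raw = JSON.stringify(numbers);
  // encodeURIComponent escapes "[", "]" and "," as %5B, %5D and %2C.
  const escaped = encodeURIComponent(raw);
  return { raw: raw.length, escaped: escaped.length };
}

const cost = naiveQueryLength([1, 2, 3, 4, 50, 600]);
console.log(cost); // { raw: 16, escaped: 30 }
```

Six small numbers already cost 16 characters before escaping and 30 after it.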

Why not use btoa?

btoa transforms a binary string into an ASCII Base 64 string.

Example:

btoa([1, 2, 3, 4, 5, 6, 7, 8]);

Returns "MSwyLDMsNCw1LDYsNyw4"

atob("MSwyLDMsNCw1LDYsNyw4");

Returns "1,2,3,4,5,6,7,8"

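This is very inefficient: the array is first converted to the string "1,2,3,4,5,6,7,8", so every digit and every ',' delimiter costs a full byte, and Base 64 then emits four output characters for every three input bytes. The eight small numbers above end up as 20 characters.

You can try to optimize this, for example with a typed array: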

// Store each number in one byte, then Base 64 encode the bytes.
function toBase64(input) {
  var buffer = new Uint8Array(input);
  var binary = "";
  for (var b = 0; b < buffer.byteLength; b++) {
    binary += String.fromCharCode(buffer[b]);
  }
  return window.btoa(binary);
}

Example: toBase64([1, 2, 3, 4, 5, 6]) returns "AQIDBAUG"

This is better, but not optimal.

It has several flaws, starting with a size limit on the input (8 bits per number, which won’t let you use numbers greater than 255), and an increase in total size, as String.fromCharCode produces 16-bit characters…

You can also try to combine two 8-bit integers into one 16-bit character:
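The 255 limit is easy to observe. The short sketch below relies only on standard Uint8Array behavior: values that do not fit in 8 bits are stored modulo 256, so they are silently corrupted rather than rejected.

```javascript
// Values outside 0..255 wrap around modulo 256 when stored in a Uint8Array.
const bytes = new Uint8Array([255, 256, 300]);
const stored = Array.from(bytes);
console.log(stored); // [ 255, 0, 44 ]
```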

const [first, second] = [42, 12];

String.fromCharCode((first << 8) + second); // returns "⨌"

Unfortunately, btoa rejects characters outside the Latin-1 range (code points above 255)…

How does it work?
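The failure can be reproduced with the sketch below. It assumes a runtime that exposes a global btoa (browsers, or a recent Node.js); the variable names are only illustrative.

```javascript
// Two 8-bit integers packed into one 16-bit code unit (0x2A0C, "⨌").
const packed = String.fromCharCode((42 << 8) + 12);

let rejected = false;
try {
  btoa(packed);
} catch (error) {
  // btoa throws because the code unit is above 255.
  rejected = true;
}
console.log(rejected); // true
```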

The first optimization is to find the smallest number of bits that can encode every number given to the encode function.

For example, take the following input: [1, 2, 3, 0, 2, 3, 1, 3, 2, 2, 3, 0, 0, 3, 1]

2 bits are needed to encode each number (as each one is 0, 1, 2, or 3).
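As an illustration of this step, the bit width can be derived from the largest value. This is a minimal sketch, not the c8r source: bitsNeeded is a hypothetical helper and it assumes non-negative integers.

```javascript
// Hypothetical helper: smallest bit width that fits every number in the array.
// Assumes non-negative integers.
function bitsNeeded(numbers) {
  const max = Math.max(0, ...numbers);
  // The length of the binary representation is the width; 0 still needs 1 bit.
  return max.toString(2).length;
}

console.log(bitsNeeded([1, 2, 3, 0, 2, 3, 1, 3, 2, 2, 3, 0, 0, 3, 1])); // 2
console.log(bitsNeeded([5, 0, 5, 6, 3, 4, 5, 0, 5, 6])); // 3
```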

Given that, the first Base 64 character of the output records the number of bits used to encode each number.

Example: 2 bits => the output begins with the character B.
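In the Base 64 URL alphabet, B sits at index 1, and the other examples in this README start with A for 1-bit input and C for 3-bit input. That suggests the header stores the bit width minus one. The sketch below works under that assumption, which is inferred from the examples rather than confirmed by the library source; headerFor is a hypothetical name.

```javascript
// Base 64 URL alphabet (RFC 4648, section 5): A-Z, a-z, 0-9, "-", "_".
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Assumption inferred from the README examples: header index = bit width - 1.
function headerFor(bits) {
  return ALPHABET[bits - 1];
}

console.log(headerFor(1), headerFor(2), headerFor(3)); // A B C
```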

Next, the bits of the numbers are grouped into 6-bit packets, each of which becomes one Base 64 symbol.

Example: [1, 2, 3] becomes 01 10 11, which is b in Base 64.
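The packing step can be sketched as follows. packSymbols is a hypothetical helper, not the c8r API, and the zero padding of an incomplete last packet is an assumption, since the README does not say how the library handles it.

```javascript
// Base 64 URL alphabet (RFC 4648, section 5).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: joins fixed-width binary numbers, then cuts 6-bit symbols.
function packSymbols(numbers, bits) {
  // Write every number as a fixed-width binary string, e.g. 1 -> "01" for 2 bits.
  let stream = numbers.map((n) => n.toString(2).padStart(bits, "0")).join("");
  // Assumption: pad the tail with zero bits up to a multiple of 6.
  while (stream.length % 6 !== 0) stream += "0";
  let out = "";
  for (let i = 0; i < stream.length; i += 6) {
    // Each 6-bit packet is an index into the alphabet.
    out += ALPHABET[parseInt(stream.slice(i, i + 6), 2)];
  }
  return out;
}

console.log(packSymbols([1, 2, 3], 2)); // b
console.log(packSymbols([1, 2, 3, 0, 2, 3, 1, 3, 2, 2, 3, 0, 0, 3, 1], 2)); // bLesN
```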

The result is BbLesN (more compact than [1, 2, 3, 0, 2, 3, 1, 3, 2, 2, 3, 0, 0, 3, 1], isn’t it?)
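Reading the string back reverses the same steps. The decoder below is a sketch under the same assumptions, not the c8r implementation: decodeSketch is a hypothetical name, the header is taken to hold the bit width minus one (inferred from the examples), and leftover bits shorter than one number are dropped.

```javascript
// Base 64 URL alphabet (RFC 4648, section 5).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical decoder sketch, not the c8r source.
function decodeSketch(text) {
  if (text.length === 0) return [];
  // Assumption inferred from the README examples: header index + 1 = bits per number.
  const bits = ALPHABET.indexOf(text[0]) + 1;
  // Expand every remaining symbol into its 6-bit binary form.
  let stream = "";
  for (const symbol of text.slice(1)) {
    stream += ALPHABET.indexOf(symbol).toString(2).padStart(6, "0");
  }
  // Cut the stream into fixed-width numbers; leftover bits are dropped.
  const numbers = [];
  for (let i = 0; i + bits <= stream.length; i += bits) {
    numbers.push(parseInt(stream.slice(i, i + bits), 2));
  }
  return numbers;
}

console.log(decodeSketch("ChaLut")); // [4, 1, 3, 2, 1, 3, 5, 6, 5, 5]
```

When the total number of bits is not a multiple of 6, padding bits in the last symbol could be read back as extra numbers. The README does not describe how the library resolves that case, so this sketch leaves it open.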