0.0.2 • Published 5 years ago

unzipit.js v0.0.2

Weekly downloads
-
License
MIT
Repository
-
Last release
5 years ago

unzipit.js

Random access unzip library for browser based JavaScript

How to use

import unzipit from 'unzipit';

async function readFiles(url) {
  const {zip, entries} = await unzipit(url);

  // print all entries an their sizes
  for (const entry in entries) {
    console.log(entry.name, entry.size);
  }
  
  // read the 4th entry as an arraybuffer
  const arrayBuffer = await entries[3].arrayBuffer();

  // read the 9th entry as a blob and tag it with mime type 'image/png'
  const blob = await entries[8].blob('image/png');
}

You can also pass a Blob, ArrayBuffer, TypedArray, or your own Reader

Why?

Most of the js libraries I looked at would decompress all files in the zip file. That's probably the most common use case but it didn't fit my needs. I needed to, as fast as possible, open a zip and read a specific file. The better libraries only worked on node, I needed a browser based solution for Electron.

Note that to repo the behavior of most unzip libs would just be

import unzipit from 'unzipit';

async function readFiles(url) {
  const {zip, entries} = await unzipit(url);
  for (const entry in entries) {
    if (!entry.isDirectory) {
      const data = await entry.arrayBuffer();
    }
  }
}

One other thing is that many libraries seem bloated. IMO the smaller the API the better. I don't need a library to try to do 50 things via options and configuration. Rather I need a library to handle the main task and make it possible to do the rest outside the library. This makes a library far far more flexible.

As an example some libraries provide no raw data for filenames. Apparently many zip files have non-utf8 filenames in them. The solution for this library is to do that on your own.

Example

const {zip, entries} = await unzipit(url);
// decode names as big5 (chinese)
const decoder = new TextDecoder('big5');
entries.forEach(entry => {
  entry.name = decoder.decode(entry.nameBytes);
});

So much easier than passing in functions to decode names or setting flags whether or not to decode them.

Same thing with filenames. If you care about slashes or backslashes do that yourself outside the library

const {zip, entries} = await unzipit(url);
// change slashes and backslashes into -
entries.forEach(entry => {
  entry.name = name.replace(/\\|\//g, '-');
});

Some libraries both zip and unzip. IMO those should be separate libraries as there is ZERO code to share between both. Plenty of projects only need to do one or the other.

Similarly inflate and deflate libraries should be separate from zip, unzip libraries. You need one or the other not both. See zlib as an example.

Finally this library is ES7 based.

One area I'm not sure about is worker support. I want this code to be able to deflate in a worker but the question is at what level should that happen. Should I wrap an inflate library in a worker interface an use it here? Or should I make the user wrap this library at a higher level?

API

const {zip, entries} = await unzipit(url/blob/arraybuffer/reader)
// note: If you need more options for your url then fetch your own blob and pass the blob in
class Zip {
  comment,  // the comment for the zip file
  commentBytes,  // the raw data for comment, see nameBytes
}
class ZipEntry {
  async blob(type)   // returns a Blob for this entry (optional type as in 'image/jpeg'
  async arrayBuffer() // returns an ArrayBuffer for this entry
  async text() // returns text, assumes the text is valid utf8. If you want more options decode arrayBuffer yourself
  async json() // returns text with JSON.parse called on it. If you want more options decode arrayBuffer yourself
  name,        // name of entry
  nameBytes,   // raw name of entry (see notes)
  size,    // size in bytes
  compressedSize, // size before decompressing
  comment,  // the comment for this entry
  commentBytes, // the raw comment for this entry
  lastModDate, // a Date
  isDirectory,
}

Notes:

Caching

If you ask for the same entry twice it will be read twice and decompressed twice. If you want to cache entires implement that at a level above unzipit

Streaming

You can't stream zip files. The only valid way to read a zip file is to read the central directory which is at the end of the zip file. Sure there are zip files where you can cheat and read the local headers of each file but that is an invalid way to read a zip file and it's trivial to create zip files that will fail when read that way but are perfectly valid zip files.

If your server had some kind of API that lets you randomly access parts of a file then it would theoretically be possible. Unfortunately AFAIK there are no web standards for remote random access file reading (WEBDAV?) so whatever proprietary protocol you use you'd have to adapt on your own. To do this you'd make your own Reader. It just needs to support a length property and a read(offset, size) method. You can imagine an class like

class NetworkReader {
  constructor(url) {
    this.url = url;
  }
  async init() {
    const req = await fetch(`${url}?cmd=length`);
    this.length = await req.json();
  }
  async read(offset, size) {
    const req = await fetch(`${url}?offset=${offset}&size=${size}`);
    const buffer = await req.arrayBuffer();
    return buffer;
  }
}

To use it you'd do something like

import unzipit from 'unzipit';

async function readFiles(url) {
  const reader = new NetworkReader(url);
  await reader.init();
  const {zip, entries} = await unzipit(reader);
  for (const entry in entries) {
    const data = await entry.arrayBuffer();
  }
}

Non UTF-8 Filenames

The zip standard predates unicode so it's possible and apparently not uncommon for files to have non-unicode names. entry.nameBytes contains the raw bytes of the filename. so you are free to decode the name using your own methods.

Testing

When writing tests serve the folder with your favorite web server (recommend http-server) then go to http://localhost:8080/test/ to easily re-run the tests.

Of course you can also npm test to run them from the command line.

Debugging

Follow the instructions on testing but add ?timeout=0 to the URL as in http://localhost:8080/tests/?timeout=0

Acknowledgements

The code is heavily based on yazul

Licence

MIT

0.0.2

5 years ago