1.0.0 • Published 3 years ago

@jedithepro/filetype.js v1.0.0

License
MIT
Repository
github

filetype.js

Detect the file type of a Buffer/Uint8Array/ArrayBuffer

The file type is detected by checking the magic number of the buffer.
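To make the idea concrete, here is a minimal sketch of magic-number matching for a handful of formats. The SIGNATURES table and sniff() function are hypothetical names used for illustration, not this package's internals. The real detector covers far more formats and looks deeper into container files; a ZIP signature, for instance, is also the start of .docx, .xlsx and .epub files.

```javascript
// Minimal magic-number sniffer (illustrative sketch, not the package's code).
// Each entry lists the leading bytes that identify a format.
const SIGNATURES = [
  { ext: 'png', mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { ext: 'jpg', mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { ext: 'gif', mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { ext: 'pdf', mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { ext: 'zip', mime: 'application/zip', bytes: [0x50, 0x4b, 0x03, 0x04] } // "PK\x03\x04"
];

// Returns {ext, mime} for the first signature that matches the leading bytes,
// or undefined when nothing matches. `bytes` is a Buffer or Uint8Array.
function sniff(bytes) {
  for (const { ext, mime, bytes: signature } of SIGNATURES) {
    if (bytes.length >= signature.length && signature.every((b, i) => bytes[i] === b)) {
      return { ext, mime };
    }
  }
  return undefined;
}

console.log(sniff(Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00])));
//=> {ext: 'png', mime: 'image/png'}
```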

This package is for detecting binary-based file formats, not text-based formats like .txt, .csv, .svg, etc.

Installation

$ npm install @jedithepro/filetype.js

Usage

Node.js

Determine file type from a file:

const FileType = require('@jedithepro/filetype.js');

(async () => {
	console.log(await FileType.fromFile('Unicorn.png'));
	//=> {ext: 'png', mime: 'image/png'}
})();

Determine file type from a Buffer, which may be a portion of the beginning of a file:

const FileType = require('@jedithepro/filetype.js');
const readChunk = require('read-chunk');

(async () => {
	const buffer = readChunk.sync('Unicorn.png', 0, 4100);

	console.log(await FileType.fromBuffer(buffer));
	//=> {ext: 'png', mime: 'image/png'}
})();
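If you would rather not add read-chunk as a dependency, the same 4100-byte prefix can be read with Node's core fs module. readPrefixSync() is a hypothetical helper, sketched under the assumption that a synchronous read is acceptable in your context:

```javascript
const fs = require('fs');

// Read at most `length` bytes from the start of a file using only Node's core fs module
// (hypothetical helper; an alternative to read-chunk, not part of this package).
function readPrefixSync(filePath, length) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buffer = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, buffer, 0, length, 0);
    // Trim the result when the file is shorter than `length`.
    return buffer.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Usage, mirroring the example above:
// const buffer = readPrefixSync('Unicorn.png', 4100);
// console.log(await FileType.fromBuffer(buffer));
```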

Determine file type from a stream:

const fs = require('fs');
const FileType = require('@jedithepro/filetype.js');

(async () => {
	const stream = fs.createReadStream('Unicorn.mp4');

	console.log(await FileType.fromStream(stream));
	//=> {ext: 'mp4', mime: 'video/mp4'}
})();
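Conceptually, detecting a file type from a stream only needs the first bytes of the data. The sketch below collects such a prefix with for await...of; readStreamPrefix() is a hypothetical helper, not part of this package. Breaking out of the loop destroys the source stream, so the remaining data is discarded; use FileType.stream() when you still need it.

```javascript
const { Readable } = require('stream');

// Collect the first `size` bytes of a readable stream (fewer if it ends first).
// Hypothetical helper illustrating the idea; not the package's implementation.
async function readStreamPrefix(readable, size) {
  const chunks = [];
  let total = 0;
  for await (const chunk of readable) {
    chunks.push(chunk);
    total += chunk.length;
    if (total >= size) break; // leaving the loop early destroys the stream
  }
  return Buffer.concat(chunks).subarray(0, size);
}

// Usage with an in-memory stream standing in for fs.createReadStream():
readStreamPrefix(Readable.from([Buffer.from('%PD'), Buffer.from('F-1.7\n...')]), 4)
  .then(prefix => console.log(prefix.toString()));
//=> %PDF
```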

The stream method can also be used to read from a remote location:

const got = require('got');
const FileType = require('@jedithepro/filetype.js');

const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';

(async () => {
	const stream = got.stream(url);

	console.log(await FileType.fromStream(stream));
	//=> {ext: 'jpg', mime: 'image/jpeg'}
})();

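Another stream example, which detects the file type of decrypted data with FileType.stream() and then writes the data to a file with the matching extension: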

const stream = require('stream');
const fs = require('fs');
const crypto = require('crypto');
const FileType = require('@jedithepro/filetype.js');

(async () => {
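	// alg, key and iv must be the values that were used to encrypt the file.
	const read = fs.createReadStream('encrypted.enc');
	const decipher = crypto.createDecipheriv(alg, key, iv);

	// The callback form of stream.pipeline() needs a callback as its last argument;
	// it returns the last stream in the chain (here, decipher).
	const decrypted = stream.pipeline(read, decipher, error => {
		if (error) {
			console.error(error);
		}
	});

	const fileTypeStream = await FileType.stream(decrypted);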

	console.log(fileTypeStream.fileType);
	//=> {ext: 'mov', mime: 'video/quicktime'}

	const write = fs.createWriteStream(`decrypted.${fileTypeStream.fileType.ext}`);
	fileTypeStream.pipe(write);
})();
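FileType.stream() hands back a stream that still contains all of the original data after detection. One way to get that behavior, sketched here with a hypothetical peekStream() helper rather than the package's actual code, is to consume a prefix through the stream's async iterator and then replay it ahead of the remaining chunks:

```javascript
const { Readable } = require('stream');

// Read the first `size` bytes of a stream for inspection, then return a new stream
// that replays those bytes followed by the rest of the data, so nothing is lost.
// Hypothetical helper sketching the idea; not the package's implementation.
async function peekStream(readable, size) {
  const iterator = readable[Symbol.asyncIterator]();
  const chunks = [];
  let total = 0;
  while (total < size) {
    const { value, done } = await iterator.next();
    if (done) break;
    chunks.push(value);
    total += value.length;
  }
  const head = Buffer.concat(chunks);

  async function* replay() {
    if (head.length > 0) yield head; // bytes already consumed while peeking
    // Keep draining the same iterator instead of restarting the source.
    while (true) {
      const { value, done } = await iterator.next();
      if (done) return;
      yield value;
    }
  }

  return { prefix: head.subarray(0, size), stream: Readable.from(replay()) };
}

(async () => {
  const source = Readable.from([Buffer.from('GIF8'), Buffer.from('9a...')]);
  const { prefix } = await peekStream(source, 4);
  console.log(prefix.toString());
  //=> GIF8
})();
```

Because Readable.from() defaults to object mode, the replayed stream yields the original Buffer chunks, which can still be piped to fs.createWriteStream().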

API

FileType.fromBuffer(buffer)

Detect the file type of a Buffer, Uint8Array, or ArrayBuffer.

The file type is detected by checking the magic number of the buffer.

If file access is available, it is recommended to use FileType.fromFile() instead.

Returns a Promise for an object with the detected file type and MIME type:

  • ext - One of the supported file types
  • mime - The MIME type

Or undefined when there is no match.

buffer

Type: Buffer | Uint8Array | ArrayBuffer

A buffer representing file data. It works best if the buffer contains the entire file, but it may also work with a smaller portion.
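Since Buffer is a subclass of Uint8Array, the three accepted input types reduce to two cases. A minimal normalization sketch, where toUint8Array() is a hypothetical helper and the error message is our own, not the package's:

```javascript
// Normalize the accepted input types to a Uint8Array view without copying the bytes.
// Hypothetical helper; Buffer is already a Uint8Array subclass, so it passes through.
function toUint8Array(input) {
  if (input instanceof Uint8Array) return input; // covers Buffer as well
  if (input instanceof ArrayBuffer) return new Uint8Array(input); // view over the same memory
  throw new TypeError(`Expected Buffer, Uint8Array or ArrayBuffer, got ${typeof input}`);
}

const arrayBuffer = new Uint8Array([0x25, 0x50, 0x44, 0x46]).buffer;
console.log(toUint8Array(arrayBuffer)[0]);
//=> 37
```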

FileType.fromFile(filePath)

Detect the file type of a file path.

The file type is detected by checking the magic number of the file.

Returns a Promise for an object with the detected file type and MIME type:

  • ext - One of the supported file types
  • mime - The MIME type

Or undefined when there is no match.

filePath

Type: string

The file path to parse.

FileType.fromStream(stream)

Detect the file type of a Node.js readable stream.

The file type is detected by checking the magic number of the stream data.

Returns a Promise for an object with the detected file type and MIME type:

  • ext - One of the supported file types
  • mime - The MIME type

Or undefined when there is no match.

stream

Type: stream.Readable

A readable stream representing file data.

FileType.fromTokenizer(tokenizer)

Detect the file type from an ITokenizer source.

This method is used internally, but can also be used for a special "tokenizer" reader.

A tokenizer propagates the internal read functions, which allows alternative transport mechanisms for accessing files to be implemented and used.

Returns a Promise for an object with the detected file type and MIME type:

  • ext - One of the supported file types
  • mime - The MIME type

Or undefined when there is no match.

An example is @tokenizer/http, which requests data using HTTP range requests. Unlike a conventional stream, a tokenizer can skip ahead (seek, fast-forward) through the data. For example, you may only need to read the first 6 bytes and the last 128 bytes, which is an advantage when reading the entire file would take longer.

const {makeTokenizer} = require('@tokenizer/http');
const FileType = require('@jedithepro/filetype.js');

const audioTrackUrl = 'https://test-audio.netlify.com/Various%20Artists%20-%202009%20-%20netBloc%20Vol%2024_%20tiuqottigeloot%20%5BMP3-V2%5D/01%20-%20Diablo%20Swing%20Orchestra%20-%20Heroines.mp3';

(async () => {
	const httpTokenizer = await makeTokenizer(audioTrackUrl);
	const fileType = await FileType.fromTokenizer(httpTokenizer);

	console.log(fileType);
	//=> {ext: 'mp3', mime: 'audio/mpeg'}
})();

Or use @tokenizer/s3 to determine the file type of a file stored on Amazon S3:

const FileType = require('@jedithepro/filetype.js');
const S3 = require('aws-sdk/clients/s3');
const {makeTokenizer} = require('@tokenizer/s3');

(async () => {
	// Initialize the S3 client
	const s3 = new S3();

	// Initialize the S3 tokenizer.
	const s3Tokenizer = await makeTokenizer(s3, {
		Bucket: 'affectlab',
		Key: '1min_35sec.mp4'
	});

	// Figure out what kind of file it is.
	const fileType = await FileType.fromTokenizer(s3Tokenizer);
	console.log(fileType);
})();

Note that only the minimum amount of data required to determine the file type is read (okay, just a bit extra to prevent too many fragmented reads).
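The same "read only what you need" principle can be illustrated with plain file handles. In this sketch, readRange() and inspectHeadAndTail() are hypothetical helpers built on fs.promises, not the ITokenizer interface. They read the first 6 bytes and, as an example of a trailing structure, the last 128 bytes, where an MP3 file may carry an ID3v1 tag starting with "TAG":

```javascript
const fs = require('fs');

// Read `length` bytes at an absolute `position` without touching the rest of the file
// (hypothetical helper showing random access; not the ITokenizer API).
async function readRange(fileHandle, position, length) {
  const buffer = Buffer.alloc(length);
  const { bytesRead } = await fileHandle.read(buffer, 0, length, position);
  return buffer.subarray(0, bytesRead);
}

// Peek at the first 6 bytes and at the trailing 128-byte block of a file.
async function inspectHeadAndTail(filePath) {
  const fileHandle = await fs.promises.open(filePath, 'r');
  try {
    const { size } = await fileHandle.stat();
    const head = await readRange(fileHandle, 0, Math.min(6, size));
    const tail = size >= 128 ? await readRange(fileHandle, size - 128, 128) : Buffer.alloc(0);
    // An ID3v1 tag, if present, occupies the last 128 bytes and starts with "TAG".
    return { head, hasId3v1: tail.subarray(0, 3).toString('latin1') === 'TAG' };
  } finally {
    await fileHandle.close();
  }
}
```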

FileType.extensions

A Set of supported file extensions.

FileType.mimeTypes

A Set of supported MIME types.
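A common use of the detected type is validating uploads against an allowlist. The names ALLOWED_MIME_TYPES and isAllowed() below are hypothetical; the check rejects undefined, which the from*() methods return when nothing matches. At startup you could also verify each allowlist entry with FileType.mimeTypes.has(), so that a typo or an undetectable type is caught early.

```javascript
// Application-defined allowlist (hypothetical example values).
const ALLOWED_MIME_TYPES = new Set(['image/png', 'image/jpeg', 'image/gif']);

// `fileType` is the {ext, mime} object, or undefined, returned by a from*() method.
function isAllowed(fileType) {
  // undefined means no signature matched, so the file is rejected.
  return fileType !== undefined && ALLOWED_MIME_TYPES.has(fileType.mime);
}

console.log(isAllowed({ ext: 'png', mime: 'image/png' }));
//=> true
console.log(isAllowed(undefined));
//=> false
```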

Supported file types

  • jpg
  • png
  • apng - Animated Portable Network Graphics
  • gif
  • webp
  • flif
  • cr2 - Canon Raw image file (v2)
  • cr3 - Canon Raw image file (v3)
  • orf - Olympus Raw image file
  • arw - Sony Alpha Raw image file
  • dng - Adobe Digital Negative image file
  • nef - Nikon Electronic Format image file
  • rw2 - Panasonic RAW image file
  • raf - Fujifilm RAW image file
  • tif
  • bmp
  • icns
  • jxr
  • psd
  • indd
  • zip
  • tar
  • rar
  • gz
  • bz2
  • 7z
  • dmg
  • mp4
  • mid
  • mkv
  • webm
  • mov
  • avi
  • mpg
  • mp1 - MPEG-1 Audio Layer I
  • mp2
  • mp3
  • ogg
  • ogv
  • ogm
  • oga
  • spx
  • ogx
  • opus
  • flac
  • wav
  • qcp
  • amr
  • pdf
  • epub
  • mobi - Mobipocket
  • exe
  • swf
  • rtf
  • woff
  • woff2
  • eot
  • ttf
  • otf
  • ico
  • flv
  • ps
  • xz
  • sqlite
  • nes
  • crx
  • xpi
  • cab
  • deb
  • ar
  • rpm
  • Z
  • lz
  • cfb
  • mxf
  • mts
  • wasm
  • blend
  • bpg
  • docx
  • pptx
  • xlsx
  • jp2 - JPEG 2000
  • jpm - JPEG 2000
  • jpx - JPEG 2000
  • mj2 - Motion JPEG 2000
  • aif
  • odt - OpenDocument for word processing
  • ods - OpenDocument for spreadsheets
  • odp - OpenDocument for presentations
  • xml
  • heic
  • cur
  • ktx
  • ape - Monkey's Audio
  • wv - WavPack
  • asf - Advanced Systems Format
  • dcm - DICOM Image File
  • mpc - Musepack (SV7 & SV8)
  • ics - iCalendar
  • glb - GL Transmission Format
  • pcap - Libpcap File Format
  • dsf - Sony DSD Stream File (DSF)
  • lnk - Microsoft Windows file shortcut
  • alias - macOS Alias file
  • voc - Creative Voice File
  • ac3 - ATSC A/52 Audio File
  • 3gp - Multimedia container format defined by the Third Generation Partnership Project (3GPP) for 3G UMTS multimedia services
  • 3g2 - Multimedia container format defined by the 3GPP2 for 3G CDMA2000 multimedia services
  • m4v - MPEG-4 Visual bitstreams
  • m4p - MPEG-4 files with audio streams encrypted by FairPlay Digital Rights Management as were sold through the iTunes Store
  • m4a - Audio-only MPEG-4 files
  • m4b - Audiobook and podcast MPEG-4 files, which also contain metadata including chapter markers, images, and hyperlinks
  • f4v - ISO base media file format used by Adobe Flash Player
  • f4p - ISO base media file format protected by Adobe Access DRM used by Adobe Flash Player
  • f4a - Audio-only ISO base media file format used by Adobe Flash Player
  • f4b - Audiobook and podcast ISO base media file format used by Adobe Flash Player
  • mie - Dedicated meta information format which supports storage of binary as well as textual meta information
  • shp - Geospatial vector data format
  • arrow - Columnar format for tables of data
  • aac - Advanced Audio Coding
  • it - Audio module format: Impulse Tracker
  • s3m - Audio module format: ScreamTracker 3
  • xm - Audio module format: FastTracker 2
  • ai - Adobe Illustrator Artwork
  • skp - SketchUp
  • avif - AV1 Image File Format
  • eps - Encapsulated PostScript
  • lzh - LZH archive
  • pgp - Pretty Good Privacy
  • asar - Archive format primarily used to enclose Electron applications
  • stl - Standard Tessellated Geometry File Format (ASCII only)

Pull requests are welcome for additional commonly used file types.

The following file types will not be accepted: