4.0.0 • Published 2 years ago

html-encoding-sniffer v4.0.0

License
MIT

Determine the Encoding of a HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

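const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

// No encoding argument, so readFileSync returns the raw bytes as a Buffer.
const htmlBytes = fs.readFileSync("./html-page.html");
// The result is an encoding name, for example "UTF-8" or "windows-1252".
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);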

The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
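
As a rough illustration of the accepted input types, the sketch below builds the same bytes three ways using only built-ins. The sample markup and the variable names are invented for the example; any of the three values could be passed as the first argument.

```javascript
// Sample markup, invented for this example.
const html = '<meta charset="utf-8"><p>hello</p>';

// 1. A plain Uint8Array, here produced by the built-in TextEncoder.
const asUint8Array = new TextEncoder().encode(html);

// 2. A Node.js Buffer, which is itself a Uint8Array subclass.
const asBuffer = Buffer.from(html, "utf8");

// 3. A Uint8Array view over an ArrayBuffer
//    (for example, one obtained from a network response).
const arrayBuffer = asUint8Array.buffer.slice(
  asUint8Array.byteOffset,
  asUint8Array.byteOffset + asUint8Array.byteLength
);
const asView = new Uint8Array(arrayBuffer);

console.log(asBuffer instanceof Uint8Array); // true
console.log(asView.length === asUint8Array.length); // true
```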

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the bytes into a string:

const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
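
The pre-scan mentioned in the introduction can be pictured with a deliberately naive sketch. This is not the package's implementation: the real algorithm is a byte-level state machine that also deals with comments, attribute parsing, and label validation. The hypothetical naivePrescan function below only applies a regular expression to the first 1024 bytes and returns the raw label it finds, or null.

```javascript
// Hypothetical helper for illustration only; not part of html-encoding-sniffer.
function naivePrescan(bytes) {
  // Only the first 1024 bytes are inspected.
  const head = bytes.subarray(0, 1024);

  // Map each byte to one character so ASCII markup survives unchanged.
  const text = Array.from(head, (b) => String.fromCharCode(b)).join("");

  // Look for charset=... inside a <meta ...> tag. This covers both
  // <meta charset="..."> and the http-equiv/content form, very loosely.
  const match = /<meta[^>]*?\bcharset\s*=\s*["']?([^"'\s\/>]+)/i.exec(text);

  // Return the raw label in lowercase, or null when nothing was found.
  return match ? match[1].toLowerCase() : null;
}

const early = new TextEncoder().encode(
  '<!doctype html><meta charset="ISO-8859-2"><title>x</title>'
);
console.log(naivePrescan(early)); // "iso-8859-2"

// A declaration that starts after the first 1024 bytes is not seen.
const late = new TextEncoder().encode(
  " ".repeat(2000) + '<meta charset="utf-8">'
);
console.log(naivePrescan(late)); // null
```

Note that the sketch returns a label as written in the markup, whereas the package returns a canonical encoding name.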

Options

You can pass two potential options to htmlEncodingSniffer:

const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel,
  defaultEncoding
});

These represent two possible inputs into the encoding sniffing algorithm:

  • transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably a HTTP Content-Type header), which overrides everything but a BOM.
  • defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
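
The precedence described by these two options can be sketched as a small standalone function. This is a simplified illustration under stated assumptions, not the package's code: bomEncoding and resolveEncoding are hypothetical names, the inputs are assumed to be already-validated encoding names rather than labels, and sniffedEncoding stands in for whatever the pre-scan of the bytes found.

```javascript
// Hypothetical helper: report the encoding indicated by a byte order mark, if any.
function bomEncoding(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  return null;
}

// Hypothetical sketch of the precedence order:
// BOM, then transport layer, then sniffed value, then the default.
function resolveEncoding(bytes, options = {}) {
  const {
    transportLayerEncoding = null, // assumed already resolved from a label
    sniffedEncoding = null, // assumed result of pre-scanning the bytes
    defaultEncoding = "windows-1252"
  } = options;

  return (
    bomEncoding(bytes) ??
    transportLayerEncoding ??
    sniffedEncoding ??
    defaultEncoding
  );
}

const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x3c, 0x70, 0x3e]);
const withoutBom = new Uint8Array([0x3c, 0x70, 0x3e]);

// A BOM wins even when the transport layer says otherwise.
console.log(resolveEncoding(withBom, { transportLayerEncoding: "ISO-8859-2" })); // "UTF-8"

// Without a BOM, the transport layer overrides the sniffed value.
console.log(
  resolveEncoding(withoutBom, {
    transportLayerEncoding: "ISO-8859-2",
    sniffedEncoding: "UTF-8"
  })
); // "ISO-8859-2"

// With no other information, the default applies.
console.log(resolveEncoding(withoutBom)); // "windows-1252"
```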

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Versions

  • 4.0.0: 2 years ago
  • 3.0.0: 4 years ago
  • 2.0.1: 5 years ago
  • 2.0.0: 5 years ago
  • 1.0.2: 8 years ago
  • 1.0.1: 9 years ago
  • 1.0.0: 9 years ago