4.0.0 • Published 1 year ago

html-encoding-sniffer v4.0.0

Weekly downloads: 12,167,819
License: MIT
Repository: github
Last release: 1 year ago

Determine the Encoding of an HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);

The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
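
The pre-scan mentioned above can be pictured with a deliberately naive sketch. It is not the package's implementation and skips most of the HTML Standard's rules (comments, http-equiv handling, proper attribute parsing). The function name naiveMetaCharsetPrescan is hypothetical, and it returns the raw label found in the markup rather than a canonical encoding name.

```javascript
// Simplified illustration only: NOT the package's code and NOT the full
// HTML Standard prescan. naiveMetaCharsetPrescan is a hypothetical name.
function naiveMetaCharsetPrescan(bytes) {
  // Inspect at most the first 1024 bytes, mirroring the prescan window.
  const head = bytes.subarray(0, 1024);
  // latin1 maps every byte to exactly one character, so ASCII patterns
  // such as "<meta charset=" survive regardless of the real encoding.
  const text = Buffer.from(head.buffer, head.byteOffset, head.byteLength)
    .toString("latin1");
  // Capture the label that follows "charset=" inside a <meta ...> tag.
  const match = /<meta[^>]*\bcharset\s*=\s*["']?([A-Za-z0-9._:-]+)/i.exec(text);
  return match ? match[1].toLowerCase() : null;
}

const sample = new TextEncoder().encode(
  '<!DOCTYPE html><meta charset="ISO-8859-2"><title>x</title>'
);
console.log(naiveMetaCharsetPrescan(sample)); // "iso-8859-2"
```

A declaration that starts after the first 1024 bytes is not seen by this sketch, which is the practical consequence of the prescan window. The package itself handles the many cases this sketch ignores.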

htmlEncodingSniffer returns a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the bytes into a string:

const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
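
If adding another dependency is not desirable, the built-in TextDecoder (a global in Node.js) is one possible substitute for the decoding step. This is a sketch of an alternative, not something the package prescribes: the helper name decodeWithSniffedEncoding is hypothetical, and the set of encodings TextDecoder supports depends on how Node.js was built.

```javascript
// Hypothetical helper: decodes bytes with an encoding name such as the one
// returned by the sniffer, using the built-in TextDecoder instead of
// whatwg-encoding.
function decodeWithSniffedEncoding(bytes, encodingName) {
  // TextDecoder matches encoding labels case-insensitively and throws a
  // RangeError for labels it does not support.
  return new TextDecoder(encodingName).decode(bytes);
}

const bytes = new TextEncoder().encode("<p>café</p>");
console.log(decodeWithSniffedEncoding(bytes, "UTF-8")); // <p>café</p>
```

Wrapping the call in try/catch is a reasonable precaution when the encoding name comes from untrusted input or from a Node.js build with limited encoding support.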

Options

You can pass two options to htmlEncodingSniffer, both of them optional:

const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel,
  defaultEncoding
});

These represent two possible inputs into the encoding sniffing algorithm:

  • transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably an HTTP Content-Type header), which overrides everything but a BOM.
  • defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
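
The precedence described by these two options can be summarized in a small sketch. It only models the order of preference (BOM, then transport layer, then an encoding sniffed from the bytes, then the default). The function pickEncoding and its parameter names are hypothetical, and unlike the package it neither reads bytes nor validates labels.

```javascript
// Hypothetical helper illustrating precedence only; the real package works
// on raw bytes and validates encoding labels, which this sketch does not.
function pickEncoding({
  bomEncoding = null,            // encoding implied by a byte order mark
  transportLayerEncoding = null, // e.g. taken from a Content-Type header
  prescanEncoding = null,        // encoding sniffed from the bytes
  defaultEncoding = "windows-1252"
} = {}) {
  // Earlier sources win; later ones are consulted only when the earlier
  // ones produced nothing.
  return bomEncoding || transportLayerEncoding || prescanEncoding || defaultEncoding;
}

console.log(pickEncoding({ bomEncoding: "UTF-8", transportLayerEncoding: "ISO-8859-2" })); // UTF-8
console.log(pickEncoding({ transportLayerEncoding: "ISO-8859-2", prescanEncoding: "UTF-8" })); // ISO-8859-2
console.log(pickEncoding({})); // windows-1252
```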

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Versions

4.0.0    1 year ago
3.0.0    3 years ago
2.0.1    5 years ago
2.0.0    5 years ago
1.0.2    7 years ago
1.0.1    8 years ago
1.0.0    8 years ago