html-encoding-sniffer v4.0.0

Published 5 months ago · License: MIT · Repository: GitHub

Determine the Encoding of an HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part is how it pre-scans the first 1024 bytes of the stream, looking for <meta charset> and <meta http-equiv="Content-Type"> declarations.

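```js
const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

// Read the raw bytes; fs returns a Buffer, which is a Uint8Array subclass
const htmlBytes = fs.readFileSync("./html-page.html");
// Returns a canonical encoding name, e.g. "windows-1252"
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);
```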

The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
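To make the pre-scan idea concrete, here is a deliberately simplified sketch that works on the same kind of Uint8Array input. The helper `toyPrescanCharset` is hypothetical: it is not part of html-encoding-sniffer's API, and it only regex-matches a `charset=` value inside a `<meta ...>` tag, skipping most of what the HTML Standard's prescan does (comment handling, proper attribute parsing, label-to-encoding lookup).

```js
// Hypothetical toy helper: NOT html-encoding-sniffer's implementation and
// NOT the HTML Standard's full prescan. It just regex-matches a charset=
// value inside a <meta ...> tag; no comment skipping, no real attribute
// parsing, no label-to-encoding lookup.
function toyPrescanCharset(bytes) {
  // Like the standard's prescan, only the first 1024 bytes are examined.
  const head = bytes.subarray(0, 1024);

  // Map each byte to one character so the ASCII markup can be matched
  // with a regular expression.
  const text = Array.from(head, (b) => String.fromCharCode(b)).join("");

  // Capture the value after charset= in the first <meta ...> tag that has one.
  const match = /<meta\s[^>]*?charset\s*=\s*["']?\s*([^\s"'\/>;]+)/i.exec(text);

  // Encoding labels are ASCII case-insensitive, so normalize to lowercase.
  return match ? match[1].toLowerCase() : null;
}

const enc = new TextEncoder();
console.log(toyPrescanCharset(enc.encode('<meta charset="ISO-8859-2">')));
// prints: iso-8859-2
console.log(toyPrescanCharset(enc.encode(" ".repeat(1100) + '<meta charset="utf-8">')));
// prints: null (the declaration lies beyond the first 1024 bytes)
```

The 1024-byte cutoff is the part worth noticing: a declaration placed after a long preamble is simply never seen, which is why the standard expects the charset declaration early in the document.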

The returned value will be a canonical encoding name (not a label), such as "windows-1252". You might then combine this with the whatwg-encoding package to decode the bytes into a string:

```js
const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
```
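If you would rather avoid an extra dependency, the WHATWG Encoding API's `TextDecoder`, built into modern Node.js and browsers, can decode the bytes using the sniffed name instead. This is a swapped-in alternative, not the approach this package documents; `decodeWithSniffedEncoding` is a hypothetical helper name, and decoding legacy encodings such as windows-1252 assumes a Node.js build with ICU data (official builds include it).

```js
// Swapped-in alternative to whatwg-encoding: the built-in WHATWG TextDecoder.
// decodeWithSniffedEncoding is a hypothetical helper name.
function decodeWithSniffedEncoding(bytes, encodingName) {
  // TextDecoder takes an encoding label (case-insensitive); canonical names
  // such as "UTF-8" and "windows-1252" are accepted as labels too.
  // It throws a RangeError if the label is not supported.
  const decoder = new TextDecoder(encodingName);
  return decoder.decode(bytes);
}

// The bytes 0x63 0x61 0x66 0xE9 spell "café" in windows-1252.
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
console.log(decodeWithSniffedEncoding(bytes, "windows-1252")); // prints: café
```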

Options

You can pass an options object with two properties to htmlEncodingSniffer:

```js
const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel,
  defaultEncoding
});
```

These represent two possible inputs into the encoding sniffing algorithm:

  • transportLayerEncodingLabel is an encoding label obtained from the "transport layer" (typically an HTTP Content-Type header). It overrides everything except a byte order mark (BOM).
  • defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
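The order in which these inputs are consulted can be modeled as: byte order mark first, then a recognized transport-layer label, then the <meta> pre-scan, then the default. The sketch below is a hypothetical toy (`toySniff` and its tiny label table are illustrative assumptions, not this package's code); it reuses the option names transportLayerEncodingLabel and defaultEncoding, but recognizes only a handful of labels and uses a regex in place of the real pre-scan.

```js
// Hypothetical, heavily simplified model of the precedence:
// BOM > transport-layer label > <meta> pre-scan > default encoding.
// NOT html-encoding-sniffer's implementation; the label table covers only
// a few labels from the WHATWG Encoding Standard.
const TOY_LABELS = new Map([
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
  ["windows-1252", "windows-1252"],
  ["iso-8859-1", "windows-1252"],
  ["latin1", "windows-1252"],
  ["iso-8859-2", "ISO-8859-2"],
]);

// Map a label to a canonical name after trimming whitespace and lowercasing.
function toyLabelToName(label) {
  if (typeof label !== "string") return null;
  return TOY_LABELS.get(label.trim().toLowerCase()) ?? null;
}

function toySniff(bytes, { transportLayerEncodingLabel, defaultEncoding = "windows-1252" } = {}) {
  // 1. A byte order mark wins over everything else.
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";

  // 2. A recognized transport-layer label comes next.
  const fromTransport = toyLabelToName(transportLayerEncodingLabel);
  if (fromTransport !== null) return fromTransport;

  // 3. Simplified pre-scan of the first 1024 bytes for a charset declaration.
  const head = Array.from(bytes.subarray(0, 1024), (b) => String.fromCharCode(b)).join("");
  const match = /<meta\s[^>]*?charset\s*=\s*["']?\s*([^\s"'\/>;]+)/i.exec(head);
  const fromMeta = match ? toyLabelToName(match[1]) : null;
  if (fromMeta !== null) return fromMeta;

  // 4. Otherwise fall back to the default encoding.
  return defaultEncoding;
}

const page = new TextEncoder().encode('<meta charset="iso-8859-2">');
console.log(toySniff(page)); // prints: ISO-8859-2
console.log(toySniff(page, { transportLayerEncodingLabel: "utf-8" })); // prints: UTF-8
console.log(toySniff(new Uint8Array([0xef, 0xbb, 0xbf]), { transportLayerEncodingLabel: "latin1" })); // prints: UTF-8
console.log(toySniff(new Uint8Array(0))); // prints: windows-1252
```

The real algorithm has more detail than this; for example, the standard's pre-scan treats a declared UTF-16 encoding as UTF-8, which the toy does not.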

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Version history

  • 4.0.0: 5 months ago
  • 3.0.0: 3 years ago
  • 2.0.1: 4 years ago
  • 2.0.0: 4 years ago
  • 1.0.2: 6 years ago
  • 1.0.1: 8 years ago
  • 1.0.0: 8 years ago