html-proofer.js v1.1.7
html-proofer.js
JavaScript port of gjtorikian/html-proofer (Hash: 93ba616eb45b7ba844197fc29824995e8fcd2279
, Version: 4.0.1
)
Currently, the library is fully working and I'm using it internally for my projects.
Motivation
I've used an original html-proofer for many projects to check static documentation link integrity. I was running it with locally installed ruby, locally with ruby in docker, on CI runner and it was working fine.
But in some cases, I had a requirement to create a custom set of checks with pretty complex logic including integration with non-public services. Ruby is not a wide-spread programming language in my working environment it was pretty hard to find someone who either know Ruby or was willing to learn a new language just to maintain or occasionally write new checks.
I've decided to port html-proofer from Ruby to JavaScript as JavaScript was already a part of the stack I was using and it perfectly complements other automated tests for static sites. JavaScript is a mainstream language and well known withing the dev community, so it is not a problem anymore to find developers with the required skills.
I've tried to keep the original html-proofer API as much as possible, but some APIs I had to change to be better consumed from JavaScript world.
Usage
Disregard the method of usage the library should be installed first.
Install in current folder
npm install html-proofer.js
or install globally (it would be available in any folder)
npm install -g html-proofer.js
Use as CLI
Running for current folder:
npx htmlproofer .
Output would look like the following (if there are no issues detected):
Running 3 checks (Links, Images, Scripts) in . on *.html files...
Ran on X files!
HTML-Proofer finished successfully.
Use as Library
You can import library and implement custom checks or just run default set of checks on desired file/folder.
Let's assume we want to check that our html files does not contain mailto links to octocat@github.com
mailto_octocat.html
<h1>Hello</h1>
<a href="mailto:octocat@github.com">hey!</a>
<a href="mailto:someoneelse@github.com">ho!</a>
You can create custom check class
const {HTMLProofer, Check, DummyReporter} = require('html-proofer.js')
class MailToOctocat extends Check {
internalRun() {
for (const node of this.html.css('a')) {
const link = this.createElement(node)
if (link.isIgnore()) {
continue
}
if (this.isMailtoOctocat(link)) {
this.addFailure(`Don't email the Octocat directly!`, link.line)
}
}
}
isMailtoOctocat(link) {
return link.url.rawAttribute === 'mailto:octocat@github.com'
}
}
Now we are ready to submit our custom check to HTMLProofer
const reporter = new DummyReporter()
const options = {
checks: [MailToOctocat],
}
const path = '<directory>'
main = async () => {
const proofer = HTMLProofer.checkDirectory(path, options, reporter)
await proofer.run()
console.log(proofer.failedChecks)
}
main()
as a result it should report something like that:
Running 1 check (MailToOctocat) in <directory> on *.html files...
Ran on 1 file!
HTML-Proofer found 1 failure!
[
Failure {
path: '<directory>/mailto_octocat.html',
checkName: 'MailToOctocat',
description: "Don't email the Octocat directly!",
line: 3,
status: null,
content: null
}
]
Configuration
The HTMLProofer
constructor takes an optional hash of additional options:
Option | Description | Default |
---|---|---|
allow_hash_href | If true , assumes href="#" anchors are valid | true |
allow_missing_href | If true , does not flag a tags missing href . In HTML5, this is technically allowed, but could also be human error. | false |
assume_extension | Automatically add specified extension to files for internal links, to allow extensionless URLs (as supported by most servers) | .html |
checks | An array of Strings indicating which checks you want to run | Links,Images,Scripts |
check_external_hash | Checks whether external hashes exist (even if the webpage exists) | true |
check_sri | Check that <link> and <script> external resources use SRI | false |
directory_index_file | Sets the file to look for when a link refers to a directory. | index.html |
disable_external | If true , does not run the external link checker | false |
enforce_https | Fails a link if it's not marked as https . | true |
extensions | An array of Strings indicating the file extensions you would like to check (including the dot) | ['.html'] |
ignore_empty_alt | If true , ignores images with empty/missing alt tags (in other words, <img alt> and <img alt=""> are valid; set this to false to flag those) | true |
ignore_files | An array of Strings or RegExps containing file paths that are safe to ignore. | [] |
ignore_empty_mailto | If true , allows mailto: href s which do not contain an email address. | false |
ignore_missing_alt | If true , ignores images with missing alt tags | false |
ignore_status_codes | An array of numbers representing status codes to ignore. | [] |
ignore_urls | An array of Strings or RegExps containing URLs that are safe to ignore. This affects all HTML attributes, such as alt tags on images. | [] |
log_level | Sets the logging level. One of debug , info , warn , or error | info |
only_4xx | Only reports errors for links that fall within the 4xx status code range. | false |
root_dir | The absolute path to the directory serving your html-files. | "" |
swap_attributes | JSON-formatted config that maps element names to the preferred attribute to check | {} |
swap_urls | A hash containing key-value pairs of RegExp => String . It transforms URLs that match RegExp into String via gsub . | {} |
ancestors_ignorable | Check ancestor elements for data-proofer-ignore attribute, this could cause performance degradation for large sites (disable it if not required) | true |