1.0.0 • Published 6 years ago

html-find-replace-element-attrs v1.0.0

html-find-replace-element-attrs

Find or replace (sync or async) attribute values of elements (e.g. img src) in an HTML string. Works in the browser and in Node.js.

This is a new library and its stability has not been battle-tested. If you find a bug, please report it.

API

find(html, options?)

Find all values of an attribute on elements of a given type.

htmlFindReplaceElementAttrs.find(
  `<div><img src='a.jpg' alt=''><img src="b.jpg" alt=''><img src=c.jpg alt=''></div>`,
  { tag: "img", attr: "src" }
)

produces:

[
    {
        "value": "a.jpg",
        "index": 15,
        "quoteType": "'"
    },
    {
        "value": "b.jpg",
        "index": 39,
        "quoteType": "\""
    },
    {
        "value": "c.jpg",
        "index": 62,
        "quoteType": " "
    }
]
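Because index is the offset of the value itself (not of the tag or the attribute name), each result can be mapped back to the original string with a plain slice. The sketch below hard-codes the results listed above instead of calling the library, so it runs on its own; the variable names are illustrative.

```javascript
// The html string and the results are copied from the example above.
const html = `<div><img src='a.jpg' alt=''><img src="b.jpg" alt=''><img src=c.jpg alt=''></div>`;
const results = [
  { value: "a.jpg", index: 15, quoteType: "'" },
  { value: "b.jpg", index: 39, quoteType: "\"" },
  { value: "c.jpg", index: 62, quoteType: " " },
];

// Slicing value.length characters starting at index recovers each value.
const recovered = results.map(r => html.slice(r.index, r.index + r.value.length));
console.log(recovered); // [ 'a.jpg', 'b.jpg', 'c.jpg' ]
```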

If the attribute value is a URL, find can also resolve relative URLs for you. To enable this, set parseAttrValueAsUrl to true and specify baseUrl in options.

htmlFindReplaceElementAttrs.find(
  "<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>",
  { tag: "img", attr: "src", parseAttrValueAsUrl: true, baseUrl: "http://www.example.com/hello/world/" },
)

produces:

[
    {
        "value": "../a.jpg",
        "index": 15,
        "quoteType": "'",
        "parsedUrl": "http://www.example.com/hello/a.jpg"
    },
    {
        "value": "/b.jpg",
        "index": 42,
        "quoteType": "'",
        "parsedUrl": "http://www.example.com/b.jpg"
    },
    {
        "value": "c.jpg",
        "index": 67,
        "quoteType": "'",
        "parsedUrl": "http://www.example.com/hello/world/c.jpg"
    }
]
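The parsedUrl values above match ordinary relative URL resolution. As a point of comparison (an assumption: this README does not say which resolver the library uses internally), the standard URL constructor available in browsers and Node.js gives the same results for the same inputs.

```javascript
// Same base URL and attribute values as in the example above.
const baseUrl = "http://www.example.com/hello/world/";
const values = ["../a.jpg", "/b.jpg", "c.jpg"];

// new URL(relative, base) resolves a relative reference against a base.
const resolved = values.map(v => new URL(v, baseUrl).href);
console.log(resolved);

// Without a base, a relative value cannot be resolved and the constructor throws.
let threw = false;
try {
  new URL("c.jpg");
} catch (e) {
  threw = true;
}
console.log(threw); // true
```

With the standard constructor, the trailing slash on the base matters: without it, world is treated as a file name and c.jpg resolves to http://www.example.com/hello/c.jpg.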

To parse protocol-relative URLs (those starting with //), set urlProtocol in options.

htmlFindReplaceElementAttrs.find(
  "<div><img src='//example.com/a.jpg' alt=''><img src='//example.com/b.jpg' alt=''></div>",
  { tag: "img", attr: "src", parseAttrValueAsUrl: true, urlProtocol: "https" },
)

produces:

[
    {
        "value": "//example.com/a.jpg",
        "index": 15,
        "quoteType": "'",
        "parsedUrl": "https://example.com/a.jpg"
    },
    {
        "value": "//example.com/b.jpg",
        "index": 53,
        "quoteType": "'",
        "parsedUrl": "https://example.com/b.jpg"
    }
]

options

tag

The tag name of the element to look for (e.g. "img").

attr

The attribute to look for (e.g. "src").

parseAttrValueAsUrl

If set to true, each returned item also contains parsedUrl.

If the HTML contains relative URLs, make sure to set baseUrl as well; otherwise an exception is thrown when parseAttrValueAsUrl is on.

baseUrl

The base URL against which relative URLs are resolved when parseAttrValueAsUrl is true (e.g. "http://example.com/test/").

urlProtocol

Used when parsing protocol-relative URLs. Defaults to the protocol of baseUrl, or "http" if baseUrl is not set.
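The fallback order described for urlProtocol can be sketched as follows. resolveProtocolRelative is a hypothetical helper written for illustration only; it is not part of the library's API, and the library's own implementation may differ.

```javascript
// Hypothetical helper: choose a protocol for a value that starts with "//".
function resolveProtocolRelative(value, options = {}) {
  if (!value.startsWith("//")) return value; // not protocol-relative, leave as is
  let protocol = options.urlProtocol;
  if (!protocol && options.baseUrl) {
    // URL#protocol includes the trailing colon (e.g. "https:"), so strip it.
    protocol = new URL(options.baseUrl).protocol.replace(/:$/, "");
  }
  // Fall back to "http" when neither option is given.
  return (protocol || "http") + ":" + value;
}

console.log(resolveProtocolRelative("//example.com/a.jpg", { urlProtocol: "https" }));
// https://example.com/a.jpg
```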

return value

The return value is an array of {value, index, quoteType, parsedUrl?} objects.

value

The attribute value exactly as it appears in the HTML string.

index

Zero-based position of the first character of the value in the HTML string.

quoteType

The kind of quote that surrounds the value.

Can be ", ', or a single space.

Caution: a single space indicates "no quotes", e.g. in <img width=100>
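Since a space stands for "no quotes", code that rebuilds an attribute from a result has to special-case it. A minimal sketch, where formatAttr is a hypothetical helper and not part of the library:

```javascript
// Hypothetical helper: rebuild `name=value` using the quote style from a find result.
function formatAttr(name, item) {
  // A quoteType of " " means the original value had no surrounding quotes.
  const quote = item.quoteType === " " ? "" : item.quoteType;
  return `${name}=${quote}${item.value}${quote}`;
}

console.log(formatAttr("src", { value: "a.jpg", quoteType: "'" })); // src='a.jpg'
console.log(formatAttr("width", { value: "100", quoteType: " " })); // width=100
```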

parsedUrl

Present only when parseAttrValueAsUrl is set to true.

It is the final, absolute URL of the value in the context you provide (baseUrl and urlProtocol).

Useful when you need to act on the links in attribute values, e.g. when you need to download and localize all images referenced by img src.

replace(html, callback, options?)

Replace the values of an attribute on elements of a given type. The second argument is either a replacement string or a callback function.

htmlFindReplaceElementAttrs.replace(
  "<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>",
  "http://www.abc.com/1.jpg",
  { tag: "img", attr: "src" },
)

produces:

<div><img src='http://www.abc.com/1.jpg' alt=''><img src='http://www.abc.com/1.jpg' alt=''><img src='http://www.abc.com/1.jpg' alt=''></div>

The second argument can also be a callback function that returns the new value for each match.

htmlFindReplaceElementAttrs.replace(
  "<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>",
  item => item.value.toUpperCase(),
  { tag: "img", attr: "src" },
)

produces:

<div><img src='../A.JPG' alt=''><img src='/B.JPG' alt=''><img src='C.JPG' alt=''></div>

The parameter passed to the callback has the same shape as an item in the return value of find, i.e. {value, index, quoteType, parsedUrl?}.

The options are the same as those of find, too.

For example, you may use parseAttrValueAsUrl:

htmlFindReplaceElementAttrs.replace(
  "<div><img src='../a.jpg' alt=''><img src='//example2.com/b.jpg' alt=''></div>",
  item => item.parsedUrl,
  {
    tag: "img",
    attr: "src",
    parseAttrValueAsUrl: true,
    baseUrl: "https://www.example.com/hello/world/",
    urlProtocol: "http",
  },
)

produces:

<div><img src='https://www.example.com/hello/a.jpg' alt=''><img src='http://example2.com/b.jpg' alt=''></div>

It also adds quotes automatically in some cases, for example when the original value is unquoted and the replacement contains a space.

htmlFindReplaceElementAttrs.replace(
  "<div><img src=a.jpg></div>",
  "hello world.jpg",
  {
    tag: "img",
    attr: "src"
  },
)

produces:

<div><img src="hello world.jpg"></div>
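This README does not spell out when quotes are added. One plausible rule, based on the HTML syntax for unquoted attribute values (which may not be empty and may not contain whitespace, ", ', =, <, > or `), is sketched below. needsQuotes and writeUnquotedAttr are hypothetical helpers, not part of the library.

```javascript
// Hypothetical check: can this value be written without quotes in HTML?
function needsQuotes(value) {
  return value === "" || /[\s"'=<>`]/.test(value);
}

// Hypothetical writer for an attribute that was originally unquoted.
// Note: this sketch does not escape double quotes inside the value.
function writeUnquotedAttr(name, value) {
  return needsQuotes(value) ? `${name}="${value}"` : `${name}=${value}`;
}

console.log(writeUnquotedAttr("src", "hello world.jpg")); // src="hello world.jpg"
console.log(writeUnquotedAttr("src", "a.jpg"));           // src=a.jpg
```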

options

Same as those of find.

return value

string

replaceAsync(html, callback, options?)

The async version of replace.

It returns a Promise, and the callback may return a Promise too.

This is useful when you need to perform async operations during the replacement.

htmlFindReplaceElementAttrs.replaceAsync(
  '<img src="./abc.jpg"/>',
  item => new Promise(resolve => {
    setTimeout(() => resolve(item.value.toUpperCase()), 1000)
  }),
  {
    tag: "img",
    attr: "src",
  }
)

produces a Promise that will resolve in 1000ms, with the value:

<img src="./ABC.JPG"/>

options

Same as those of replace.

return value

Promise<string>
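One common way to build an async replacement like this is to run all callbacks first, then splice the results into the string from the last match to the first so that earlier indices stay valid. This is a sketch of the general technique, not necessarily how the library does it. spliceAsync is a hypothetical name, and the match list is hard-coded in the shape returned by find (sorted by index, non-overlapping).

```javascript
// Hypothetical helper: apply async replacements to matches in find's result shape.
async function spliceAsync(html, matches, callback) {
  // Run every callback; plain return values and Promises are both accepted.
  const replacements = await Promise.all(matches.map(m => callback(m)));
  let out = html;
  // Work from the last match to the first so earlier indices are not shifted.
  for (let i = matches.length - 1; i >= 0; i--) {
    const { index, value } = matches[i];
    out = out.slice(0, index) + replacements[i] + out.slice(index + value.length);
  }
  return out;
}

const html = '<img src="./abc.jpg"/>';
const matches = [{ value: "./abc.jpg", index: 10, quoteType: "\"" }];
const pending = spliceAsync(html, matches, item => Promise.resolve(item.value.toUpperCase()));
pending.then(result => console.log(result)); // <img src="./ABC.JPG"/>
```

Because the callbacks are started together with Promise.all, slow operations such as downloads run concurrently rather than one after another.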

DEMOS

Download all images and replace their links in the HTML

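const fs = require("fs");
const axios = require("axios");

// html is the input HTML string; htmlFindReplaceElementAttrs is the imported library.
htmlFindReplaceElementAttrs.replaceAsync(
  html,
  // Download each image and use its local path as the new src value.
  item => downloadImage(item.parsedUrl).then(downloadResult => downloadResult.path),
  { tag: "img", attr: "src", parseAttrValueAsUrl: true, baseUrl: "https://www.gravatar.com" },
).then(replacedHtml => {
  console.log(replacedHtml);
});

// Download a remote image to a temporary file and resolve with its local path.
function downloadImage(imageUrl) {
  return axios({
    url: imageUrl,
    responseType: "stream",
  }).then(response => new Promise((resolve, reject) => {
    const tmpPath = "tmp_" + Math.random();
    const writer = fs.createWriteStream(tmpPath);
    response.data.pipe(writer);
    // Resolve only after the file has been fully written to disk.
    writer.on("finish", () => resolve({ path: tmpPath }));
    writer.on("error", reject);
  }));
}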

LICENSE

MIT