3.1.2 • Published 7 months ago

html-json-converter v3.1.2


HTML JSON Converter

A TypeScript library to convert HTML to JSON and vice versa. Supports both Node.js and browser environments.


Installation

npm install html-json-converter

Usage

Server-Side Usage

// Recommended: import from the server-specific entry point
import { ServerHTMLJSONConverter } from 'html-json-converter/server';
// Alternatively, import from the package root (use one import or the other, not both):
// import { ServerHTMLJSONConverter } from 'html-json-converter';

const converter = new ServerHTMLJSONConverter();

// HTML to JSON
const html = '<div class="test">Hello World</div>';
const json = converter.toJSON(html);
console.log(json);
/* Output:
{
  tag: "div",
  attributes: { class: "test" },
  children: ["Hello World"]
}
*/

// JSON to HTML
const jsonObj = {
  tag: "div",
  attributes: { class: "test" },
  children: ["Hello World"]
};
const htmlOutput = converter.toHTML(jsonObj);
console.log(htmlOutput);
/* Output:
<div class="test">
    Hello World
</div>
*/
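
The JSON shape shown in the outputs above can be described with a small TypeScript type. The sketch below is inferred from the example outputs only; the names JsonNode and isJsonNode are hypothetical, and the library may export its own type under a different name.

```typescript
// Hypothetical type inferred from the example outputs in this README.
type JsonNode = {
  tag: string;
  attributes?: Record<string, string>;
  children?: Array<JsonNode | string>;
};

// Runtime check that an unknown value (for example, parsed from a file) has that shape.
function isJsonNode(value: unknown): value is JsonNode {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as { tag?: unknown; attributes?: unknown; children?: unknown };
  if (typeof candidate.tag !== 'string') return false;

  const attrs = candidate.attributes;
  if (attrs !== undefined) {
    if (typeof attrs !== 'object' || attrs === null) return false;
    const record = attrs as Record<string, unknown>;
    if (!Object.keys(record).every((key) => typeof record[key] === 'string')) return false;
  }

  const children = candidate.children;
  if (children !== undefined) {
    if (!Array.isArray(children)) return false;
    return children.every((child) => typeof child === 'string' || isJsonNode(child));
  }
  return true;
}

const exampleNode: unknown = {
  tag: 'div',
  attributes: { class: 'test' },
  children: ['Hello World'],
};
console.log(isJsonNode(exampleNode)); // true
```

A guard like this can be useful before passing untrusted JSON to toHTML, since a malformed object would otherwise only fail during conversion.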

Browser Usage

// This is the only way to import the client-side version of the library.
import { ClientHTMLJSONConverter } from 'html-json-converter/client';

const converter = new ClientHTMLJSONConverter();

// Usage is the same as server-side

Features

Void Elements

Void elements, such as <img>, <br>, and <hr>, cannot have children. The converter enforces this rule: in the example below, toJSON emits no children key for a void element, and toHTML throws if a void element's JSON includes children.

// Valid void element
const html = '<img src="test.jpg" alt="Test"/>';
const json = converter.toJSON(html);
/* Output:
{
  tag: "img",
  attributes: {
    src: "test.jpg",
    alt: "Test"
  }
}
*/

// Attempting to convert a void element with children in JSON to HTML will throw an error
const invalidJson = {
  tag: "img",
  attributes: { src: "test.jpg" },
  children: ["Invalid content"] // This is not allowed
};

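try {
  converter.toHTML(invalidJson);
} catch (error) {
  // In strict TypeScript the caught value is `unknown`, so narrow it before reading `message`
  console.error(error instanceof Error ? error.message : error);
  // Output: Void element <img> cannot have children.
}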
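
To avoid the exception, a JSON tree can be checked before it is handed to toHTML. The following is a minimal sketch, not part of the library: findVoidViolations and JsonNode are hypothetical names, the void tag list is copied from the Supported HTML Elements section, and treating an empty children array as acceptable is an assumption.

```typescript
// Hypothetical node type inferred from the example outputs in this README.
type JsonNode = {
  tag: string;
  attributes?: Record<string, string>;
  children?: Array<JsonNode | string>;
};

// Void tags as listed under "Supported HTML Elements".
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr',
]);

// Walks the tree and returns the path of every void element that has children.
// Assumption: an empty `children` array is not treated as a violation.
function findVoidViolations(node: JsonNode, path: string = node.tag): string[] {
  const children = node.children ?? [];
  const violations: string[] = [];
  if (VOID_TAGS.has(node.tag) && children.length > 0) {
    violations.push(path);
  }
  for (const child of children) {
    if (typeof child !== 'string') {
      violations.push(...findVoidViolations(child, `${path} > ${child.tag}`));
    }
  }
  return violations;
}

const suspectTree: JsonNode = {
  tag: 'div',
  children: [
    { tag: 'img', attributes: { src: 'test.jpg' }, children: ['Invalid content'] },
    { tag: 'p', children: ['ok'] },
  ],
};
console.log(findVoidViolations(suspectTree)); // [ 'div > img' ]
```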

Raw Text Elements

The content of raw text elements, such as <script> and <style>, is kept as a single text node rather than being parsed as markup. The full list is under Supported HTML Elements.

const html = '<style>.test { color: red; }</style>';
const json = converter.toJSON(html);
/* Output:
{
  tag: "style",
  children: [".test { color: red; }"]
}
*/

Nested Elements

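Nested HTML structures are converted recursively: each child element becomes an object in its parent's children array. In the example below, the whitespace between tags does not appear in the output.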

const html = `
<div class="container">
  <h1>Title</h1>
  <p>Paragraph</p>
</div>
`;
const json = converter.toJSON(html);
/* Output:
{
  tag: "div",
  attributes: { class: "container" },
  children: [
    {
      tag: "h1",
      children: ["Title"]
    },
    {
      tag: "p",
      children: ["Paragraph"]
    }
  ]
}
*/
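
Because the output is a plain recursive structure, it can be processed with an ordinary depth-first walk. The sketch below collects the text of a tree; collectText and JsonNode are hypothetical names, and the node shape is an assumption based on the example outputs.

```typescript
// Hypothetical node type inferred from the example outputs in this README.
type JsonNode = {
  tag: string;
  attributes?: Record<string, string>;
  children?: Array<JsonNode | string>;
};

// Depth-first walk that joins every non-empty text child in document order.
function collectText(node: JsonNode | string): string {
  if (typeof node === 'string') return node;
  return (node.children ?? [])
    .map((child) => collectText(child))
    .filter((text) => text.length > 0)
    .join(' ');
}

// The nested output from the example above.
const nestedTree: JsonNode = {
  tag: 'div',
  attributes: { class: 'container' },
  children: [
    { tag: 'h1', children: ['Title'] },
    { tag: 'p', children: ['Paragraph'] },
  ],
};
console.log(collectText(nestedTree)); // "Title Paragraph"
```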

Supported HTML Elements

Element     Type      Allows Children  Allows Attributes
a           Normal    Yes              Yes
abbr        Normal    Yes              Yes
address     Normal    Yes              Yes
article     Normal    Yes              Yes
aside       Normal    Yes              Yes
audio       Normal    Yes              Yes
b           Normal    Yes              Yes
bdi         Normal    Yes              Yes
bdo         Normal    Yes              Yes
blockquote  Normal    Yes              Yes
body        Normal    Yes              Yes
button      Normal    Yes              Yes
canvas      Normal    Yes              Yes
caption     Normal    Yes              Yes
cite        Normal    Yes              Yes
code        Normal    Yes              Yes
colgroup    Normal    Yes              Yes
data        Normal    Yes              Yes
datalist    Normal    Yes              Yes
dd          Normal    Yes              Yes
del         Normal    Yes              Yes
details     Normal    Yes              Yes
dfn         Normal    Yes              Yes
dialog      Normal    Yes              Yes
div         Normal    Yes              Yes
dl          Normal    Yes              Yes
dt          Normal    Yes              Yes
em          Normal    Yes              Yes
fieldset    Normal    Yes              Yes
figcaption  Normal    Yes              Yes
figure      Normal    Yes              Yes
footer      Normal    Yes              Yes
form        Normal    Yes              Yes
h1          Normal    Yes              Yes
h2          Normal    Yes              Yes
h3          Normal    Yes              Yes
h4          Normal    Yes              Yes
h5          Normal    Yes              Yes
h6          Normal    Yes              Yes
head        Normal    Yes              Yes
header      Normal    Yes              Yes
hgroup      Normal    Yes              Yes
html        Normal    Yes              Yes
i           Normal    Yes              Yes
iframe      Normal    Yes              Yes
ins         Normal    Yes              Yes
kbd         Normal    Yes              Yes
label       Normal    Yes              Yes
legend      Normal    Yes              Yes
li          Normal    Yes              Yes
main        Normal    Yes              Yes
map         Normal    Yes              Yes
mark        Normal    Yes              Yes
menu        Normal    Yes              Yes
meter       Normal    Yes              Yes
nav         Normal    Yes              Yes
noscript    Normal    Yes              Yes
object      Normal    Yes              Yes
ol          Normal    Yes              Yes
optgroup    Normal    Yes              Yes
option      Normal    Yes              Yes
output      Normal    Yes              Yes
p           Normal    Yes              Yes
picture     Normal    Yes              Yes
pre         Normal    Yes              Yes
progress    Normal    Yes              Yes
q           Normal    Yes              Yes
rp          Normal    Yes              Yes
rt          Normal    Yes              Yes
ruby        Normal    Yes              Yes
s           Normal    Yes              Yes
samp        Normal    Yes              Yes
section     Normal    Yes              Yes
select      Normal    Yes              Yes
small       Normal    Yes              Yes
span        Normal    Yes              Yes
strong      Normal    Yes              Yes
sub         Normal    Yes              Yes
summary     Normal    Yes              Yes
sup         Normal    Yes              Yes
table       Normal    Yes              Yes
tbody       Normal    Yes              Yes
td          Normal    Yes              Yes
template    Normal    Yes              Yes
tfoot       Normal    Yes              Yes
th          Normal    Yes              Yes
thead       Normal    Yes              Yes
time        Normal    Yes              Yes
tr          Normal    Yes              Yes
u           Normal    Yes              Yes
ul          Normal    Yes              Yes
var         Normal    Yes              Yes
video       Normal    Yes              Yes

Void Elements
area        Void      No               Yes
base        Void      No               Yes
br          Void      No               Yes
col         Void      No               Yes
embed       Void      No               Yes
hr          Void      No               Yes
img         Void      No               Yes
input       Void      No               Yes
keygen      Void      No               Yes
link        Void      No               Yes
meta        Void      No               Yes
param       Void      No               Yes
source      Void      No               Yes
track       Void      No               Yes
wbr         Void      No               Yes

Raw Text Elements
script      Raw Text  Yes              Yes
style       Raw Text  Yes              Yes
textarea    Raw Text  Yes              Yes
title       Raw Text  Yes              Yes

Foreign Elements
svg         Foreign   Yes              Yes
math        Foreign   Yes              Yes

Document Fragment vs Full Documents

The converter supports both HTML fragments and full HTML documents.

// Fragment
const fragment = '<p>Hello</p>';
const fragmentJson = converter.toJSON(fragment);
/* Output:
{
  tag: "p",
  children: ["Hello"]
}
*/

// Full Document
const doc = '<!DOCTYPE html><html><body><p>Hello</p></body></html>';
const docJson = converter.toJSON(doc);
/* Output:
{
  tag: "html",
  children: [
    {
      tag: "head"
    },
    {
      tag: "body",
      children: [
        {
          tag: "p",
          children: ["Hello"]
        }
      ]
    }
  ]
}
*/
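
Since a full document is wrapped in html, head, and body while a fragment is not, code that handles both may want to unwrap the body first. This is a minimal sketch under the assumption that the JSON has the shape shown above; findFirstByTag and JsonNode are hypothetical names, not library exports.

```typescript
// Hypothetical node type inferred from the example outputs in this README.
type JsonNode = {
  tag: string;
  attributes?: Record<string, string>;
  children?: Array<JsonNode | string>;
};

// Returns the first node with the given tag in depth-first order, or undefined.
function findFirstByTag(node: JsonNode, tag: string): JsonNode | undefined {
  if (node.tag === tag) return node;
  for (const child of node.children ?? []) {
    if (typeof child === 'string') continue;
    const match = findFirstByTag(child, tag);
    if (match !== undefined) return match;
  }
  return undefined;
}

// The full-document output from the example above.
const fullDocument: JsonNode = {
  tag: 'html',
  children: [
    { tag: 'head' },
    { tag: 'body', children: [{ tag: 'p', children: ['Hello'] }] },
  ],
};

// Unwrap to the body's children so the result resembles the fragment case.
const bodyNode = findFirstByTag(fullDocument, 'body');
console.log(JSON.stringify(bodyNode?.children)); // [{"tag":"p","children":["Hello"]}]
```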

Custom Elements

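Register custom elements by passing a customElements map to the constructor. Each entry declares the element's type and whether it allows children and attributes.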

import { ServerHTMLJSONConverter, HTMLElementType } from 'html-json-converter';
const customElements = {
  'my-component': { type: HTMLElementType.NORMAL, allowChildren: true, allowAttributes: true },
  'my-void-element': { type: HTMLElementType.VOID, allowChildren: false, allowAttributes: true }
};

const converter = new ServerHTMLJSONConverter({ customElements });

const html = '<my-component><span>Content</span></my-component>';
const json = converter.toJSON(html);
/* Output:
{
  tag: "my-component",
  children: [
    {
      tag: "span",
      children: ["Content"]
    }
  ]
}
*/
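
One way to picture the customElements option is as extra rows added to the Supported HTML Elements table. The sketch below models that with a plain lookup map. It is not the library's implementation: ElementKind is a local stand-in for HTMLElementType (the RAW_TEXT member name is a guess), ElementDefinition, builtIns, and buildRegistry are hypothetical names, and the rule that custom entries override built-in ones is an assumption.

```typescript
// Local stand-in for the library's HTMLElementType; RAW_TEXT is an assumed member name.
enum ElementKind {
  NORMAL = 'normal',
  VOID = 'void',
  RAW_TEXT = 'raw-text',
}

type ElementDefinition = {
  type: ElementKind;
  allowChildren: boolean;
  allowAttributes: boolean;
};

// A few rows from the "Supported HTML Elements" table.
const builtIns: Record<string, ElementDefinition> = {
  div: { type: ElementKind.NORMAL, allowChildren: true, allowAttributes: true },
  img: { type: ElementKind.VOID, allowChildren: false, allowAttributes: true },
  style: { type: ElementKind.RAW_TEXT, allowChildren: true, allowAttributes: true },
};

// Assumption: custom definitions are merged over the built-in ones.
function buildRegistry(
  custom: Record<string, ElementDefinition>,
): Record<string, ElementDefinition> {
  return { ...builtIns, ...custom };
}

const elementRegistry = buildRegistry({
  'my-component': { type: ElementKind.NORMAL, allowChildren: true, allowAttributes: true },
  'my-void-element': { type: ElementKind.VOID, allowChildren: false, allowAttributes: true },
});
console.log(elementRegistry['my-void-element'].allowChildren); // false
```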

Configuration

You can customize the converter's behavior using the ConverterConfig interface.

import { type ConverterConfig, HTMLElementType, ServerHTMLJSONConverter } from 'html-json-converter';

const config: ConverterConfig = {
  useTab: false,      // Use spaces instead of tabs for indentation
  tabSize: 2,         // Number of spaces per indentation level
  customElements: {   // Register custom elements
    'custom-tag': { type: HTMLElementType.NORMAL, allowChildren: true, allowAttributes: true }
  }
};

const converter = new ServerHTMLJSONConverter(config);
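
The interplay of useTab and tabSize can be sketched as a small function that builds the indentation for a given nesting depth. This is an illustration only: IndentOptions and indentFor are hypothetical names, and the assumption that tabSize is ignored when useTab is true is not confirmed by the library.

```typescript
// Hypothetical subset of the configuration that affects indentation.
type IndentOptions = {
  useTab: boolean;
  tabSize: number;
};

// Assumption: one tab per level when useTab is true, otherwise tabSize spaces per level.
function indentFor(depth: number, options: IndentOptions): string {
  const unit = options.useTab ? '\t' : ' '.repeat(options.tabSize);
  return unit.repeat(depth);
}

// Two levels deep with the configuration shown above yields four spaces.
console.log(JSON.stringify(indentFor(2, { useTab: false, tabSize: 2 }))); // "    "
```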

Important Notes

  • Enforcement of HTML Rules: The converter enforces certain HTML rules:
    • Void Elements:
      • Cannot have children.
      • Must be self-closing in the output HTML.
    • Non-Void Elements:
      • Cannot be self-closed.
      • Must have separate opening and closing tags, even if they have no children.
  • Parser Behavior:
    • When converting HTML to JSON, the converter relies on the HTML parser (JSDOM on the server, DOMParser in the browser).
    • The parser may correct malformed HTML automatically.
    • Invalid HTML (e.g., void elements with children) may be parsed differently than expected due to parser correction.
  • Whitespace Handling:
    • Whitespace and indentation in the output HTML are controlled by the useTab and tabSize configuration options.
  • Error Handling:
    • The converter will throw errors when attempting to violate enforced HTML rules during conversion.
    • Examples include adding children to void elements in JSON when converting to HTML.
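
The rules above can be illustrated with a small JSON-to-HTML serializer. This is a sketch written for this README, not the library's code: render, renderAttributes, and JsonNode are hypothetical names, the " />" form for void elements and the four-space default indent are assumptions, and escaping of text and attribute values is omitted for brevity.

```typescript
// Hypothetical node type inferred from the example outputs in this README.
type JsonNode = {
  tag: string;
  attributes?: Record<string, string>;
  children?: Array<JsonNode | string>;
};

// A subset of the void tags listed under "Supported HTML Elements".
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// Serializes attributes as ` key="value"` pairs (no escaping in this sketch).
function renderAttributes(attributes: Record<string, string> = {}): string {
  return Object.keys(attributes)
    .map((key) => ` ${key}="${attributes[key]}"`)
    .join('');
}

function render(node: JsonNode | string, depth = 0, indentUnit = '    '): string {
  const pad = indentUnit.repeat(depth);
  if (typeof node === 'string') return pad + node;

  const openTag = `${pad}<${node.tag}${renderAttributes(node.attributes)}`;
  const children = node.children ?? [];

  if (VOID_TAGS.has(node.tag)) {
    // Rule: void elements cannot have children and are emitted self-closing.
    if (children.length > 0) {
      throw new Error(`Void element <${node.tag}> cannot have children.`);
    }
    return `${openTag} />`;
  }
  // Rule: non-void elements always get separate opening and closing tags.
  if (children.length === 0) {
    return `${openTag}></${node.tag}>`;
  }
  const body = children
    .map((child) => render(child, depth + 1, indentUnit))
    .join('\n');
  return `${openTag}>\n${body}\n${pad}</${node.tag}>`;
}

console.log(render({ tag: 'div', attributes: { class: 'test' }, children: ['Hello World'] }));
console.log(render({ tag: 'div' })); // <div></div>
```

The first call prints the same three-line layout as the toHTML output in the Server-Side Usage example.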

License

This project is licensed under the MIT License.


Additional Considerations

  • Graceful Handling of Invalid HTML:
    • While the converter enforces rules during JSON to HTML conversion, it handles invalid HTML input gracefully when converting HTML to JSON.
    • The parser may automatically correct or ignore invalid structures.
  • Custom Element Types:
    • You can define custom element types and specify whether they are void, raw text, or normal elements.
    • This allows for flexibility when working with web components or custom tags.
  • Cross-Environment Consistency:
    • Both ServerHTMLJSONConverter and ClientHTMLJSONConverter aim to provide consistent behavior across Node.js and browser environments.
    • Be aware that slight differences may occur due to underlying parser implementations.

Examples

Handling Malformed HTML

const html = '<div>Unclosed div';
const json = converter.toJSON(html);
/* Output:
{
  tag: "div",
  children: ["Unclosed div"]
}
*/
// The parser corrects the unclosed <div> tag.

Enforcing Rules During Conversion

// Attempting to add children to a void element
const invalidJson = {
  tag: "br",
  children: ["Should not be here"]
};

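try {
  converter.toHTML(invalidJson);
} catch (error) {
  // In strict TypeScript the caught value is `unknown`, so narrow it before reading `message`
  console.error(error instanceof Error ? error.message : error);
  // Output: Void element <br> cannot have children.
}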

Using in Next.js (Client-Side)

"use client";

import { useState, useEffect } from 'react';
import { ClientHTMLJSONConverter } from 'html-json-converter/client';

export default function Demo() {
    const [htmlJSON, setHtmlJSON] = useState<string | null>(null);

    useEffect(() => {
        const converter = new ClientHTMLJSONConverter();
        const complexHTMLString = `<div>
                                    <h1>My Title</h1>
                                    <p>My paragraph</p>
                                    <div>
                                        <h2>My Subtitle</h2>
                                        <p>My sub paragraph</p>
                                    </div>
                                    <section style="color:red;">
                                        <h3>My Subtitle</h3>
                                        <p>My sub paragraph</p>
                                    </section>
                                    <img src="https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png" alt="Google Logo" />
                                    </div>`;
        const json = converter.toJSON(complexHTMLString);
        setHtmlJSON(JSON.stringify(json));
    }, []);

    return (
        <div className="max-w-5xl mx-auto text-left font-mono">
            {htmlJSON}
        </div>
    );
}

Using in Next.js (Server-Side)

import { ServerHTMLJSONConverter } from 'html-json-converter/server';

export default async function Demo() {
    const complexHTMLString = `<div>
                                <h1>My Title</h1>
                                <p>My paragraph</p>
                                <div>
                                    <h2>My Subtitle</h2>
                                    <p>My sub paragraph</p>
                                </div>
                                <section style="color:red;">
                                    <h3>My Subtitle</h3>
                                    <p>My sub paragraph</p>
                                </section>
                                <img src="https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png" alt="Google Logo" />
                                </div>`;
    const converter = new ServerHTMLJSONConverter();
    const json = converter.toJSON(complexHTMLString);

    return (
        <div className="max-w-5xl mx-auto text-left font-mono">
            {JSON.stringify(json, null, 2)}
        </div>
    );
}

Note: Tested only with Next.js 14.2.11.


Feedback and Contributions

I appreciate your feedback and contributions. If you encounter issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.