0.0.5 • Published 6 months ago

sitemap2docext v0.0.5

Sitemap 2 Doc

This module downloads all web pages listed in the Sitemap.xml file and compiles them into a single document.

Requires Node.js v20 or later.

Designed for AI Embedding Generation
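
The first step of the flow described above is reading page URLs out of a sitemap. The following is a minimal sketch of that step only, assuming a standard sitemap whose entries are wrapped in `<loc>` tags. `extractSitemapUrls` and the sample XML are hypothetical and are not taken from the module's source, which may parse sitemaps differently.

```javascript
// Hypothetical helper, not part of sitemap2doc: pulls page URLs out of sitemap XML.
function extractSitemapUrls( xml ) {
    const urls = []
    // Each page entry in a standard sitemap is wrapped in <loc>...</loc>.
    const pattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g
    for( const match of xml.matchAll( pattern ) ) {
        // Sitemaps escape "&" as "&amp;", so undo that before using the URL.
        urls.push( match[ 1 ].replaceAll( '&amp;', '&' ) )
    }
    return urls
}

// Made-up sample input for illustration.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs?a=1&amp;b=2</loc></url>
</urlset>`

console.log( extractSitemapUrls( sampleSitemap ) )
```

A regular expression keeps the sketch dependency-free. A real XML parser would be more robust against CDATA sections and nested sitemap index files.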

Quickstart

Terminal

npm init -y && npm i sitemap2doc

Node (index.mjs)

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )

Terminal

node index.mjs

Methods

getDocument()

| Key | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| projectName | String | Set project name | true | |
| sitemapUrl | String | Set sitemap source | true | |
| silent | Boolean | Control terminal output | false | false |
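
The option table can be checked before calling `getDocument()`. This is a rough sketch that only mirrors the types and required flags listed above. `validateGetDocumentOptions` is a hypothetical helper, not part of the module, and the module may perform its own, different validation.

```javascript
// Hypothetical validator mirroring the option table; not part of sitemap2doc.
function validateGetDocumentOptions( { projectName, sitemapUrl, silent = false } = {} ) {
    const messages = []
    // projectName and sitemapUrl are marked as required Strings.
    if( typeof projectName !== 'string' ) {
        messages.push( 'projectName: required, must be a String' )
    }
    if( typeof sitemapUrl !== 'string' ) {
        messages.push( 'sitemapUrl: required, must be a String' )
    }
    // silent is optional and defaults to false.
    if( typeof silent !== 'boolean' ) {
        messages.push( 'silent: must be a Boolean' )
    }
    return { ok: messages.length === 0, messages }
}

console.log( validateGetDocumentOptions( { projectName: 'test' } ) )
```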

Example

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )

Terminal output

  Get Sitemap     https://...
  Get Pages       0 1 2 3 4 5 6 7 8 9
  Merge           0

getConfig()

Returns the current config. The default config is defined in ./src/data/config.mjs.

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
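// Read the current config and change one setting
const config = s2d.getConfig()
config['download']['chunkSize'] = 4

// Apply the modified config, then run with it
await s2d
   .setConfig( { config } )
   .getDocument( { ... } )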
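
The example changes `config['download']['chunkSize']`, but this README does not say what the value controls. Assuming it is the number of pages fetched per batch, batching could look roughly like the sketch below. `mapInChunks` is a hypothetical helper and the task is a stand-in for a real download. The module's actual download logic may work differently.

```javascript
// Hypothetical helper: runs `task` over `items` in sequential batches of `chunkSize`.
async function mapInChunks( items, chunkSize, task ) {
    if( !Number.isInteger( chunkSize ) || chunkSize < 1 ) {
        throw new RangeError( 'chunkSize must be a positive integer' )
    }
    const results = []
    for( let i = 0; i < items.length; i += chunkSize ) {
        const chunk = items.slice( i, i + chunkSize )
        // Items inside one chunk run concurrently; the next chunk waits for this one.
        const chunkResults = await Promise.all( chunk.map( ( item ) => task( item ) ) )
        results.push( ...chunkResults )
    }
    return results
}

// Usage with a stand-in task instead of a real page download.
mapInChunks( [ 'a', 'b', 'c', 'd', 'e' ], 4, async ( id ) => id.toUpperCase() )
    .then( ( pages ) => console.log( pages ) )
```

Under this assumption, a smaller chunk size puts less load on the target server at the cost of a longer total run time.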

setConfig()

All module settings are stored in a config file (see ./src/data/config.mjs). This file can be completely overridden by passing an object during initialization.

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
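// Start from the current config and change one setting
const config = s2d.getConfig()
config['download']['chunkSize'] = 4

// Apply the modified config, then run with it
await s2d
   .setConfig( { config } )
   .getDocument( { ... } )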
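
This README does not state whether `getConfig()` returns a copy or a live reference. If the defaults need to stay untouched, one option is to edit a deep copy and pass that to `setConfig()`. The sketch below uses a stand-in object: `defaultConfig`, its value, and `withChunkSize` are hypothetical, and the real keys live in ./src/data/config.mjs.

```javascript
// Stand-in for the object returned by getConfig(); the real keys and values
// live in ./src/data/config.mjs and may differ.
const defaultConfig = { download: { chunkSize: 1 } }

// Hypothetical helper: returns a modified deep copy and leaves the input untouched.
function withChunkSize( config, chunkSize ) {
    const copy = structuredClone( config )
    copy.download.chunkSize = chunkSize
    return copy
}

const customConfig = withChunkSize( defaultConfig, 4 )
console.log( customConfig.download.chunkSize, defaultConfig.download.chunkSize ) // prints: 4 1
```

`structuredClone` is built into the Node.js versions this module targets, so no extra dependency is needed.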

License

The module is available as open source under the terms of the Apache 2.0 License.
