robotstxt.js

robotstxt.js is a lightweight JavaScript library for parsing robots.txt files. It provides a standards-compliant parser for both browser and Node.js environments.

Directives

  • Clean-param
  • Host
  • Sitemap
  • User-agent
    • Allow
    • Disallow
    • Crawl-delay
    • Cache-delay
    • Comment
    • NoIndex
    • Request-rate
    • Robot-version
    • Visit-time
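
For orientation, a hypothetical robots.txt using several of these directives might look like the following; the host, paths, and values are placeholders, not output from this library.

```
User-agent: GoogleBot
Allow: /public
Disallow: /private
Crawl-delay: 5
Visit-time: 0600-0845

Clean-param: sessionid /catalog
Host: example.com
Sitemap: https://example.com/sitemap.xml
```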

Benefits

  • Accurately parse and interpret robots.txt rules.
  • Ensure compliance with robots.txt standards to avoid accidental blocking of legitimate bots.
  • Easily check URL permissions for different user agents programmatically.
  • Simplify the process of working with robots.txt in JavaScript applications.

Usage

Here's how to use robotstxt.js to analyze robots.txt content and check crawler permissions.

Node.js

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js")
// ...
```

JavaScript

```javascript
// Parse robots.txt content
const robotsTxtContent = `
User-Agent: GoogleBot
Allow: /public
Disallow: /private
Crawl-Delay: 5
Sitemap: https://example.com/sitemap.xml
`;

const parser = robotstxt(robotsTxtContent);

// Check URL permissions
console.log(parser.isAllowed("/public/data", "GoogleBot"));   // true
console.log(parser.isDisallowed("/private/admin", "GoogleBot")); // true

// Get specific user agent group
const googleBotGroup = parser.getGroup("googlebot"); // Case-insensitive
if (googleBotGroup) {
    console.log("Crawl Delay:", googleBotGroup.getCrawlDelay()); // 5
    console.log("Rules:", googleBotGroup.getRules().map(rule =>
        `${rule.type}: ${rule.path}`
    )); // ["allow: /public", "disallow: /private"]
}

// Get all sitemaps
console.log("Sitemaps:", parser.getSitemaps()); // ["https://example.com/sitemap.xml"]

// Check rules for the default (wildcard *) user agent
console.log(parser.isAllowed("/protected", "*")); // true (no wildcard rules exist in this example, so access defaults to allowed)
```

Installation

NPM

npm i @playfulsparkle/robotstxt-js

Yarn

yarn add @playfulsparkle/robotstxt-js

Bower (deprecated)

bower install playfulsparkle/robotstxt.js

API Documentation

Core Methods

  • robotstxt(content: string): RobotsTxtParser - Creates a new parser instance with the provided robots.txt content.
  • getReports(): string[] - Get an array of parsing reports (errors, warnings, etc.).
  • isAllowed(url: string, userAgent: string): boolean - Check if a URL is allowed for the specified user agent (throws if parameters are missing).
  • isDisallowed(url: string, userAgent: string): boolean - Check if a URL is disallowed for the specified user agent (throws if parameters are missing).
  • getGroup(userAgent: string): Group | undefined - Get the rules group for a specific user agent (case-insensitive match).
  • getSitemaps(): string[] - Get an array of discovered sitemap URLs from Sitemap directives.
  • getCleanParams(): string[] - Retrieve Clean-param directives for URL parameter sanitization.
  • getHost(): string | undefined - Get canonical host declaration for domain normalization.
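
A minimal sketch exercising these core methods follows; the robots.txt content is illustrative, and the exact string format returned by getReports() and getCleanParams() is not asserted here.

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js");

const parser = robotstxt(`
User-agent: *
Disallow: /tmp
Clean-param: sessionid /catalog
Host: example.com
Sitemap: https://example.com/sitemap.xml
`);

// Parser diagnostics collected while reading the content.
console.log(parser.getReports());

// Permission checks for the wildcard agent.
console.log(parser.isAllowed("/catalog/item", "*")); // true
console.log(parser.isDisallowed("/tmp/cache", "*")); // true

// Site-wide directives gathered during parsing.
console.log(parser.getSitemaps());    // ["https://example.com/sitemap.xml"]
console.log(parser.getCleanParams()); // Clean-param entries as strings
console.log(parser.getHost());        // "example.com" (or undefined if absent)
```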

Group Methods (via getGroup() result)

User Agent Info

  • getName(): string - User agent name for this group.
  • getComment(): string[] - Associated comments for this group from Comment directives.
  • getRobotVersion(): string | undefined - Robots.txt specification version.
  • getVisitTime(): string | undefined - Recommended crawl time window.

Crawl Management

  • getCacheDelay(): number | undefined - Cache delay in seconds.
  • getCrawlDelay(): number | undefined - Crawl delay in seconds.
  • getRequestRates(): string[] - Request rate limitations.

Rule Access

  • getRules(): Rule[] - All rules (allow/disallow/noindex) for this group.
  • addRule(type: string, path: string): void - Add a rule to this group (throws if type or path is missing).
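
The group accessors can be exercised with a short sketch like the one below; the directive values are illustrative, the logged results depend on the parsed content, and the lowercase rule type passed to addRule() is an assumption rather than a documented requirement.

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js");

const parser = robotstxt(`
User-agent: GoogleBot
Allow: /public
Disallow: /private
Crawl-delay: 5
Cache-delay: 10
Visit-time: 0600-0845
`);

const group = parser.getGroup("googlebot"); // case-insensitive lookup

if (group) {
    console.log(group.getName());         // user agent name for this group
    console.log(group.getCrawlDelay());   // 5 (seconds)
    console.log(group.getCacheDelay());   // 10 (seconds)
    console.log(group.getVisitTime());    // recommended crawl window, or undefined
    console.log(group.getRequestRates()); // request-rate strings (empty here)

    // Inspect existing rules, then add one programmatically.
    group.getRules().forEach(rule => console.log(rule.type, rule.path));
    group.addRule("disallow", "/drafts"); // throws if type or path is missing
}
```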

Specification Support

Full Support

  • User-agent groups and inheritance
  • Allow/Disallow directives
  • Wildcard pattern matching (*)
  • End-of-path matching ($)
  • Crawl-delay directives
  • Sitemap discovery
  • Case-insensitive matching
  • Default user-agent (*) handling
  • Multiple user-agent declarations
  • Rule precedence by specificity
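
As a quick illustration of the wildcard, end-of-path, and precedence behavior listed above, here is a small sketch; the expected results assume the standard longest-match semantics of the Robots Exclusion Protocol.

```javascript
const { robotstxt } = require("@playfulsparkle/robotstxt-js");

const parser = robotstxt(`
User-agent: *
Allow: /downloads/public
Disallow: /downloads/
Disallow: /*.pdf$
`);

// "$" anchors a pattern to the end of the URL path.
console.log(parser.isAllowed("/files/report.pdf", "*"));      // false (matches /*.pdf$)
console.log(parser.isAllowed("/files/report.pdf.html", "*")); // true (no longer ends in ".pdf")

// Precedence by specificity: the longer Allow rule beats the shorter Disallow.
console.log(parser.isAllowed("/downloads/public/tool.zip", "*")); // true
console.log(parser.isAllowed("/downloads/archive.zip", "*"));     // false
```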

Support

Node.js

robotstxt.js runs on Node.js 6.x and later, including all currently supported Node.js versions.

Browser Support

This library is written using modern JavaScript ES2015 (ES6) features. It is expected to work in the following browser versions and later:

| Browser | Minimum Supported Version |
| --- | --- |
| Desktop Browsers | |
| Chrome | 49 |
| Edge | 13 |
| Firefox | 45 |
| Opera | 36 |
| Safari | 14.1 |
| Mobile Browsers | |
| Chrome Android | 49 |
| Firefox for Android | 45 |
| Opera Android | 36 |
| Safari on iOS | 14.5 |
| Samsung Internet | 5.0 |
| WebView Android | 49 |
| WebView on iOS | 14.5 |
| Other | |
| Node.js | 6.13.0 |

License

robotstxt.js is licensed under the terms of the BSD 3-Clause License.
