llmxml v1.4.2
LLMXML
A library for converting between Markdown and LLM-friendly XML formats, with section extraction capabilities.
Features
- Bidirectional conversion between Markdown and LLM-XML
- Fuzzy section matching and extraction
- Precise heading level control
- Configurable tag formatting and attribute output
- Automatic preservation of JSON structures
- Smart handling of code blocks
Installation
npm install llmxml
Quick Start
import { createLLMXML } from 'llmxml';
const llmxml = createLLMXML();
// Convert Markdown to LLM-XML
const xml = await llmxml.toXML(`
# Title
Content with JSON: {"name":"John","age":30}
## Section
Content
`);
// Result:
// <Title>
// Content with JSON: {
// "name": "John",
// "age": 30
// }
// <Section>
// Content
// </Section>
// </Title>
// Convert LLM-XML to Markdown
const markdown = await llmxml.toMarkdown(xml);
// Extract sections
const section = await llmxml.getSection(markdown, 'Section');
Section Extraction
Extract sections by heading title, with optional fuzzy matching:
// Extract a single section with options
const section = await llmxml.getSection(content, 'Setup Instructions', {
level: 2, // Only match h2 headers (1-6)
exact: false, // Set to true to require exact matches
includeNested: true, // Include subsections
fuzzyThreshold: 0.8 // Minimum match score (0-1)
});
// Extract multiple matching sections
const sections = await llmxml.getSections(content, 'setup', {
// Same options as getSection
fuzzyThreshold: 0.7
});
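To make the fuzzyThreshold option concrete, here is a minimal sketch of threshold-based heading matching. It is not llmxml's scoring algorithm, which this README does not describe. The bigrams, similarity, and matchHeadings helpers are hypothetical and use a Dice coefficient over character bigrams purely for illustration.

```typescript
// Hypothetical helpers for illustration only; llmxml's real scoring may differ.

// Split a string into lowercase character bigrams, ignoring whitespace.
function bigrams(text: string): string[] {
  const s = text.toLowerCase().replace(/\s+/g, '');
  const out: string[] = [];
  for (let i = 0; i < s.length - 1; i++) {
    out.push(s.slice(i, i + 2));
  }
  return out;
}

// Dice coefficient over bigrams: 1 for identical strings, 0 for no overlap.
function similarity(a: string, b: string): number {
  const x = bigrams(a);
  const y = bigrams(b);
  if (x.length === 0 || y.length === 0) {
    // Too short for bigrams: fall back to a case-insensitive equality check.
    return a.toLowerCase() === b.toLowerCase() ? 1 : 0;
  }
  const remaining = y.slice();
  let shared = 0;
  for (const gram of x) {
    const idx = remaining.indexOf(gram);
    if (idx !== -1) {
      shared++;
      remaining.splice(idx, 1); // count each bigram at most once
    }
  }
  return (2 * shared) / (x.length + y.length);
}

// Keep headings whose score meets the threshold, best match first.
function matchHeadings(headings: string[], query: string, fuzzyThreshold = 0.7) {
  return headings
    .map(title => ({ title, score: similarity(title, query) }))
    .filter(m => m.score >= fuzzyThreshold)
    .sort((a, b) => b.score - a.score);
}

const headings = ['Setup Instructions', 'Set-up Instruction', 'License'];
console.log(matchHeadings(headings, 'Setup Instructions', 0.8));
```

In this sketch, raising the threshold toward 1 narrows the result to near-identical titles, while lowering it admits looser matches. The fuzzyThreshold option documented above presumably makes the same trade-off.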
Configuration
Configure behavior when creating an instance:
const llmxml = createLLMXML({
// Default threshold for fuzzy matching (0-1)
defaultFuzzyThreshold: 0.7,
// Warning emission level
warningLevel: 'all', // 'all' | 'none' | 'ambiguous-only'
// Control XML attribute output
includeTitle: false, // Include title attribute (default: false)
includeHlevel: false, // Include hlevel attribute (default: false)
verbose: false, // Include both title and hlevel (default: false)
// Tag name formatting (default: 'PascalCase')
tagFormat: 'PascalCase', // 'snake_case' | 'SCREAMING_SNAKE' | 'camelCase' | 'PascalCase' | 'UPPERCASE'
});
// Examples with different configurations:
const withAttributes = createLLMXML({ verbose: true });
const xml1 = await withAttributes.toXML('# Long Title');
// <LongTitle title="Long Title" hlevel="1">
const snakeCase = createLLMXML({ tagFormat: 'snake_case' });
const xml2 = await snakeCase.toXML('# Long Title');
// <long_title>
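The tagFormat values can be illustrated with a small standalone converter. This is a sketch under assumptions, not llmxml's implementation: toTagName, words, and capitalize are hypothetical names, punctuation handling is a guess, and the behavior shown for 'UPPERCASE' (words joined without a separator) is assumed rather than documented.

```typescript
// Hypothetical sketch; llmxml's actual tag-name rules may differ.
type TagFormat = 'snake_case' | 'SCREAMING_SNAKE' | 'camelCase' | 'PascalCase' | 'UPPERCASE';

// Split a heading into alphanumeric words, dropping punctuation and spaces.
function words(title: string): string[] {
  return title.match(/[A-Za-z0-9]+/g) ?? [];
}

// Uppercase the first letter and lowercase the rest.
function capitalize(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
}

function toTagName(title: string, format: TagFormat = 'PascalCase'): string {
  const parts = words(title);
  switch (format) {
    case 'snake_case':
      return parts.map(w => w.toLowerCase()).join('_');
    case 'SCREAMING_SNAKE':
      return parts.map(w => w.toUpperCase()).join('_');
    case 'camelCase':
      return parts.map((w, i) => (i === 0 ? w.toLowerCase() : capitalize(w))).join('');
    case 'UPPERCASE':
      // Assumption: words are joined with no separator.
      return parts.map(w => w.toUpperCase()).join('');
    case 'PascalCase':
    default:
      return parts.map(w => capitalize(w)).join('');
  }
}

console.log(toTagName('Long Title'));               // LongTitle
console.log(toTagName('Long Title', 'snake_case')); // long_title
```

The two printed values agree with the tag names shown in the configuration examples above. The remaining formats follow the same split-then-join pattern.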
Round-trip Conversions
Use roundTrip to convert Markdown to XML and back in a single call while preserving document structure:
// Convert markdown to XML and back, preserving all structure
const roundTripped = await llmxml.roundTrip(`
# Title
## Section
Content
`);
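One way to sanity-check a round trip is to compare the heading outline before and after. The helpers below are a hypothetical sketch and not part of llmxml. They recognize only ATX headings (lines starting with # through ######) and skip fenced code blocks, so Setext headings and other Markdown structure are not compared.

```typescript
// Hypothetical helpers for checking that headings survive a round trip.
interface Heading {
  level: number;
  title: string;
}

// Collect ATX headings, ignoring anything inside fenced code blocks.
function headingOutline(markdown: string): Heading[] {
  const outline: Heading[] = [];
  let inFence = false;
  for (const line of markdown.split('\n')) {
    // A line opening with three backticks or tildes toggles fence state.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const m = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (m) {
      outline.push({ level: m[1].length, title: m[2] });
    }
  }
  return outline;
}

// True when both documents have the same headings at the same levels, in order.
function sameOutline(before: string, after: string): boolean {
  return JSON.stringify(headingOutline(before)) === JSON.stringify(headingOutline(after));
}

console.log(headingOutline('# Title\n## Section\nContent\n'));
```

With an llmxml instance in scope, a check such as sameOutline(source, await llmxml.roundTrip(source)) would flag headings that were dropped or re-leveled.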
Warning System
Warnings are emitted for potentially ambiguous situations, such as a query that matches more than one section:
// Register warning handler
llmxml.onWarning(warning => {
// Warning structure:
// {
// code: 'AMBIGUOUS_MATCH' | 'UNKNOWN_WARNING' | etc,
// message: string,
// details: {
// matches?: Array<{
// title: string,
// score: {
// exactMatch: boolean,
// fuzzyScore: number,
// contextualScore: number,
// level: number,
// // ... other scoring details
// }
// }>,
// }
// }
});
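A handler will usually want to turn a warning into something loggable. The sketch below mirrors the warning structure documented above. The LLMXMLWarning and MatchScore interface names and the describeWarning helper are hypothetical, details is treated as optional for safety, and scoring fields not listed above are omitted.

```typescript
// Shape mirrors the documented warning structure; names are hypothetical.
interface MatchScore {
  exactMatch: boolean;
  fuzzyScore: number;
  contextualScore: number;
  level: number;
}

interface LLMXMLWarning {
  code: string;
  message: string;
  details?: {
    matches?: Array<{ title: string; score: MatchScore }>;
  };
}

// Turn a warning into a single log line, listing candidates for ambiguous matches.
function describeWarning(warning: LLMXMLWarning): string {
  const matches = warning.details?.matches ?? [];
  if (warning.code !== 'AMBIGUOUS_MATCH' || matches.length === 0) {
    return `[${warning.code}] ${warning.message}`;
  }
  const candidates = matches
    .map(m => `"${m.title}" (h${m.score.level}, fuzzy ${m.score.fuzzyScore.toFixed(2)})`)
    .join(', ');
  return `[${warning.code}] ${warning.message} Candidates: ${candidates}`;
}

console.log(describeWarning({ code: 'UNKNOWN_WARNING', message: 'Something looked odd.' }));
// [UNKNOWN_WARNING] Something looked odd.
```

It could be wired up as llmxml.onWarning(warning => console.warn(describeWarning(warning))).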
Error Handling
Failed operations throw errors that carry a code property identifying the failure:
try {
const section = await llmxml.getSection(content, 'nonexistent');
} catch (error) {
if (error.code === 'SECTION_NOT_FOUND') {
console.log('Section not found:', error.message);
}
// Other error codes:
// - PARSE_ERROR: Failed to parse document
// - INVALID_FORMAT: Document format is invalid
// - INVALID_LEVEL: Invalid header level
// - INVALID_SECTION_OPTIONS: Invalid section extraction options
}
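When a missing section is an expected outcome and not a failure, the try/catch can be folded into a small wrapper. This is a sketch, not part of llmxml: getSectionOrNull, hasCode, and the GetSection type are hypothetical, and the sketch assumes getSection resolves to a string. The lookup function is passed in so the helper does not depend on a particular instance.

```typescript
// Hypothetical wrapper: treat a missing section as null, rethrow everything else.
// Assumption: the injected lookup resolves to the section text as a string.
type GetSection = (content: string, title: string) => Promise<string>;

// True when the thrown value is an object whose code property equals `code`.
function hasCode(error: unknown, code: string): boolean {
  return typeof error === 'object' && error !== null && (error as { code?: unknown }).code === code;
}

async function getSectionOrNull(
  getSection: GetSection,
  content: string,
  title: string
): Promise<string | null> {
  try {
    return await getSection(content, title);
  } catch (error) {
    if (hasCode(error, 'SECTION_NOT_FOUND')) {
      return null;
    }
    throw error; // PARSE_ERROR, INVALID_FORMAT, and the other codes still propagate
  }
}
```

With an instance in scope, it could be called as getSectionOrNull((c, t) => llmxml.getSection(c, t), content, 'nonexistent').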
Documentation
License
MIT