0.1.0 • Published 6 months ago
md-llm
A Markdown to llmxml (.llm) file format bridge that transforms Markdown documents into structured AST nodes optimized for LLM processing.
What's llmxml? Why?
- LLMs <3 XML. But it doesn't have to be strict, insanely nested XML. It can be mostly-flat pseudo-XML.
- Humans <3 markdown.
- Markdown sections can be nested hierarchies with # headers.
- md-llm takes markdown documents and breaks them into nested nodes that can be output as XML
- After converting them to nested nodes, you can also use it to target specific portions of a document, giving you importable markdown 'modules' in meld.
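To make the nesting idea concrete, here is a minimal sketch of how `#` headers could be folded into a tree using a stack of open sections. It is not md-llm's actual implementation: `SketchSection` and `nestSections` are hypothetical names, text before the first header is dropped, and `#` lines inside code fences are not special-cased.

```typescript
// Hypothetical sketch; md-llm's real parser may work differently.
interface SketchSection {
  title: string;
  level: number; // 1 for "#", 2 for "##", and so on
  body: string[];
  children: SketchSection[];
}

function nestSections(markdown: string): SketchSection[] {
  const roots: SketchSection[] = [];
  const stack: SketchSection[] = []; // currently open sections, outermost first

  for (const line of markdown.split("\n")) {
    const header = /^(#{1,6})\s+(.*)$/.exec(line);
    if (header) {
      const section: SketchSection = {
        title: header[2].trim(),
        level: header[1].length,
        body: [],
        children: [],
      };
      // Close every open section at the same or a deeper level.
      while (stack.length > 0 && stack[stack.length - 1].level >= section.level) {
        stack.pop();
      }
      const parent = stack[stack.length - 1];
      if (parent) {
        parent.children.push(section);
      } else {
        roots.push(section);
      }
      stack.push(section);
    } else if (stack.length > 0 && line.trim() !== "") {
      // Non-empty, non-header lines belong to the innermost open section.
      stack[stack.length - 1].body.push(line);
    }
  }
  return roots;
}

const tree = nestSections(
  "# System Context\nSome context here...\n## Project Setup\nInstructions for setup:"
);
console.log(JSON.stringify(tree, null, 2));
```

Here "Project Setup" ends up as a child of "System Context" because its header level is deeper.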
This library is consumed by the oneshot and meld CLI tools. If you like what this does, you probably want those.
Features
- Bidirectional conversion between Markdown and LLM-optimized formats
- Semantic section detection and processing
- Rich support for Markdown elements:
  - Headers with customizable depth
  - Code blocks with language and metadata
  - Lists (ordered, unordered, and task lists)
  - Tables with alignment and formatting
  - Blockquotes and thematic breaks
  - HTML content preservation
  - Frontmatter processing
  - Definition lists
  - References and footnotes
- Extensible transform pipeline
- Fuzzy section matching
- High performance and memory efficient
- Customizable tag name generation
- Modular architecture for custom transforms
Installation
npm install md-llm
Quick Start
import { mdToLlm } from 'md-llm';
const markdown = `
# System Context
Some context here...
## Project Setup
Instructions for setup:
\`\`\`bash
npm install
\`\`\`
`;
const result = await mdToLlm(markdown);
console.log(result.ast);
Output:
{
  type: 'tag',
  name: 'Document',
  children: [
    {
      type: 'tag',
      name: 'SystemContext',
      children: [
        { type: 'text', value: 'Some context here...' },
        {
          type: 'tag',
          name: 'ProjectSetup',
          children: [
            { type: 'text', value: 'Instructions for setup:' },
            {
              type: 'tag',
              name: 'Code',
              attributes: { language: 'bash' },
              children: [{ type: 'text', value: 'npm install' }]
            }
          ]
        }
      ]
    }
  ]
}
API Reference
Core Function
async function mdToLlm(
  markdown: string,
  options?: MdToLlmOptions
): Promise<ParseResult>
Options
interface MdToLlmOptions {
  headerDepth?: number;                  // 1-6, default 2
  tagNameMap?: Record<string, string>;   // Custom header->tag mappings
  preserveHeaderText?: boolean;          // Keep original text as first line?
  customTransforms?: MdToLlmTransform[]; // Add custom transforms
}
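As an illustration of how a header could become a tag name and how a `tagNameMap` entry might override it, here is a small sketch. The `toTagName` helper is hypothetical and not part of md-llm's API. The PascalCase rule is inferred only from the Quick Start output (`System Context` becomes `SystemContext`), so the library's real rules may differ.

```typescript
// Hypothetical helper; md-llm's real naming rules may differ.
function toTagName(
  headerText: string,
  tagNameMap: Record<string, string> = {}
): string {
  // An explicit mapping wins over the generated name.
  if (Object.prototype.hasOwnProperty.call(tagNameMap, headerText)) {
    return tagNameMap[headerText];
  }
  // Otherwise join the alphanumeric words in PascalCase.
  const words = headerText.match(/[A-Za-z0-9]+/g) ?? [];
  return words.map((w) => w.charAt(0).toUpperCase() + w.slice(1)).join("");
}

console.log(toTagName("System Context")); // SystemContext
console.log(toTagName("Project Setup", { "Project Setup": "Setup" })); // Setup
```

This sketch does not guard against headers that start with a digit, which would not be valid as strict XML element names.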
Transform Pipeline
The library uses a modular transform pipeline that processes different Markdown elements:
- HeaderTransform: Converts headers to semantic tags
- CodeFenceTransform: Processes code blocks with language and metadata
- ListTransform: Handles ordered and unordered lists
- TableTransform: Processes tables with alignment
- BlockquoteTransform: Handles blockquotes
- ThematicBreakTransform: Processes horizontal rules
- FrontmatterTransform: Extracts YAML frontmatter
- DefinitionTransform: Processes definition lists
- ReferenceTransform: Handles link references and footnotes
- TaskListTransform: Processes task lists
- HtmlTransform: Preserves HTML content
- SectionTransform: Handles section boundaries and hierarchy
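One plausible way such a pipeline could dispatch nodes is "first transform that accepts the node wins". The sketch below assumes that ordering rule and a plain-text fallback; neither is confirmed by the library. `SketchMdNode`, `SketchLlmNode`, `SketchTransform`, and `runPipeline` are simplified, hypothetical stand-ins for the library's types.

```typescript
// Simplified stand-ins for the library's types (assumptions for this sketch).
interface SketchMdNode {
  type: string;
  value?: string;
}

interface SketchLlmNode {
  type: "tag" | "text";
  name?: string;
  value?: string;
}

interface SketchTransform {
  canTransform(node: SketchMdNode): boolean;
  transform(node: SketchMdNode): SketchLlmNode;
}

// Try each transform in order; the first one that accepts the node handles it.
function runPipeline(
  node: SketchMdNode,
  transforms: SketchTransform[]
): SketchLlmNode {
  for (const t of transforms) {
    if (t.canTransform(node)) {
      return t.transform(node);
    }
  }
  // Assumed fallback: keep unrecognized nodes as plain text.
  return { type: "text", value: node.value ?? "" };
}

const thematicBreak: SketchTransform = {
  canTransform: (n) => n.type === "thematicBreak",
  transform: () => ({ type: "tag", name: "Break" }),
};

console.log(runPipeline({ type: "thematicBreak" }, [thematicBreak]));
console.log(runPipeline({ type: "paragraph", value: "hi" }, [thematicBreak]));
```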
Custom Transforms
You can create custom transforms by implementing the MdToLlmTransform interface:
interface MdToLlmTransform {
  transform(node: MdastNode): LlmAstNode;
  canTransform(node: MdastNode): boolean;
}
Example custom transform:
class CustomTransform implements MdToLlmTransform {
  canTransform(node: MdastNode): boolean {
    return node.type === 'customType';
  }

  transform(node: MdastNode): LlmAstNode {
    return {
      type: 'tag',
      name: 'CustomTag',
      attributes: {},
      children: []
    };
  }
}

// Use in options
const options = {
  customTransforms: [new CustomTransform()]
};
Node Types
The library uses two main node types:
interface TagNode {
  type: 'tag';
  name: string;
  attributes?: Record<string, string>;
  children: (TagNode | TextNode)[];
}

interface TextNode {
  type: 'text';
  value: string;
}
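To show how these two node types could map onto pseudo-XML text, here is a minimal recursive serializer. The `toPseudoXml` function is a hypothetical illustration, not the library's own output routine. It redeclares the two interfaces so the snippet is self-contained, escapes only double quotes in attribute values, and leaves text content untouched.

```typescript
interface TagNode {
  type: "tag";
  name: string;
  attributes?: Record<string, string>;
  children: (TagNode | TextNode)[];
}

interface TextNode {
  type: "text";
  value: string;
}

// Hypothetical serializer: one line per opening tag, text value, and closing tag.
function toPseudoXml(node: TagNode | TextNode, depth = 0): string {
  const pad = "  ".repeat(depth);
  if (node.type === "text") {
    return pad + node.value;
  }
  const attrs = Object.entries(node.attributes ?? {})
    .map(([key, value]) => ` ${key}="${value.replace(/"/g, "&quot;")}"`)
    .join("");
  const inner = node.children.map((child) => toPseudoXml(child, depth + 1));
  return [`${pad}<${node.name}${attrs}>`, ...inner, `${pad}</${node.name}>`].join("\n");
}

const code: TagNode = {
  type: "tag",
  name: "Code",
  attributes: { language: "bash" },
  children: [{ type: "text", value: "npm install" }],
};

console.log(toPseudoXml(code));
// <Code language="bash">
//   npm install
// </Code>
```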
Section Processing
The library provides powerful section processing capabilities:
interface Section {
  id: string;
  title: string;
  level: number;
  frontmatter?: string;
  content: Node[];
  metadata: {
    hasDefinitionLists: boolean;
    hasTaskLists: boolean;
    hasFootnotes: boolean;
    references: {
      links: Map<string, string>;
      footnotes: Map<string, string>;
    };
  };
  parent?: Section;
  children: Section[];
}
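The sketch below illustrates one way to target a section by an approximate title. It assumes that "fuzzy" means a case-insensitive, punctuation-insensitive substring match, which may be looser or stricter than what the library actually does. `SketchSec`, `normalizeTitle`, and `findSection` are hypothetical names, and only the `title` and `children` fields of `Section` are modeled.

```typescript
// Reduced stand-in for Section: only the fields this sketch needs.
interface SketchSec {
  title: string;
  children: SketchSec[];
}

// Lowercase and collapse anything that is not a letter or digit into spaces.
const normalizeTitle = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

// Depth-first search; returns the first section whose title contains the query.
function findSection(sections: SketchSec[], query: string): SketchSec | undefined {
  const wanted = normalizeTitle(query);
  if (wanted === "") {
    return undefined; // an empty query would otherwise match everything
  }
  for (const section of sections) {
    if (normalizeTitle(section.title).includes(wanted)) {
      return section;
    }
    const hit = findSection(section.children, query);
    if (hit) {
      return hit;
    }
  }
  return undefined;
}

const doc: SketchSec[] = [
  {
    title: "System Context",
    children: [{ title: "Project Setup", children: [] }],
  },
];

console.log(findSection(doc, "project-setup")?.title); // Project Setup
```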
Error Handling
The library provides detailed error information in the ParseResult:
interface ParseResult {
  ast: DocumentNode;
  errors?: Error[];
}
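A caller might want to check `errors` before trusting `ast`. The following sketch shows one way to do that with a small helper. `SketchParseResult` and `astOrThrow` are hypothetical, and treating any reported error as fatal is an assumption; the library may well return a usable partial AST alongside errors.

```typescript
// Generic stand-in for ParseResult so the snippet is self-contained.
interface SketchParseResult<T> {
  ast: T;
  errors?: Error[];
}

// Return the AST, or throw one combined error if any errors were reported.
function astOrThrow<T>(result: SketchParseResult<T>): T {
  if (result.errors && result.errors.length > 0) {
    const summary = result.errors.map((e) => e.message).join("; ");
    throw new Error(`Parse reported ${result.errors.length} error(s): ${summary}`);
  }
  return result.ast;
}

// Usage with hand-built results standing in for what mdToLlm might return.
console.log(astOrThrow({ ast: { type: "tag", name: "Document" } }));

try {
  astOrThrow({ ast: null, errors: [new Error("unclosed code fence")] });
} catch (err) {
  console.error((err as Error).message);
}
```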
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the Meld License - see the LICENSE file for details.