0.1.0 • Published 11 months ago
md-llm
A Markdown to llmxml (.llm) file format bridge that transforms Markdown documents into structured AST nodes optimized for LLM processing.
What's llmxml? Why?
- LLMs <3 XML. But it doesn't have to be strict, deeply nested XML. It can be mostly-flat pseudo-XML.
- Humans <3 markdown.
- Markdown sections can be nested hierarchies with # headers.
- md-llm takes Markdown documents and breaks them into nested nodes that can be output as XML.
- Once a document is converted to nested nodes, you can also target specific portions of it, which gives you importable Markdown 'modules' in meld.
This library is consumed by the oneshot and meld CLI tools. If you like what this does, you probably want those.
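To make the "nested nodes that can be output as XML" idea concrete, here is a minimal sketch of rendering a node tree as mostly-flat pseudo-XML. The node shapes mirror the TagNode and TextNode interfaces listed under Node Types; `renderPseudoXml` and `sampleTree` are hypothetical names for illustration, not part of md-llm's API, and the sketch does no escaping.

```typescript
// Node shapes copied from the Node Types section of this README.
interface TextNode {
  type: 'text';
  value: string;
}
interface TagNode {
  type: 'tag';
  name: string;
  attributes?: Record<string, string>;
  children: (TagNode | TextNode)[];
}

// Hypothetical helper (not md-llm's API): serialize a node tree as indented
// pseudo-XML. Values are written as-is, with no XML escaping.
function renderPseudoXml(node: TagNode | TextNode, depth = 0): string {
  const pad = '  '.repeat(depth);
  if (node.type === 'text') {
    return pad + node.value;
  }
  const attrs = Object.entries(node.attributes ?? {})
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');
  const body = node.children.map((child) => renderPseudoXml(child, depth + 1));
  return [`${pad}<${node.name}${attrs}>`, ...body, `${pad}</${node.name}>`].join('\n');
}

// A hand-built tree: one top-level section containing one nested section.
const sampleTree: TagNode = {
  type: 'tag',
  name: 'SystemContext',
  children: [
    { type: 'text', value: 'Some context here...' },
    {
      type: 'tag',
      name: 'ProjectSetup',
      children: [{ type: 'text', value: 'Instructions for setup:' }],
    },
  ],
};

console.log(renderPseudoXml(sampleTree));
```

Each header becomes one tag and its body text sits directly inside it, so nesting depth tracks header depth rather than Markdown's inline structure.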
Features
- Bidirectional conversion between Markdown and LLM-optimized formats
- Semantic section detection and processing
- Rich support for Markdown elements:
  - Headers with customizable depth
  - Code blocks with language and metadata
  - Lists (ordered, unordered, and task lists)
  - Tables with alignment and formatting
  - Blockquotes and thematic breaks
  - HTML content preservation
  - Frontmatter processing
  - Definition lists
  - References and footnotes
- Extensible transform pipeline
- Fuzzy section matching
- High performance and memory efficient
- Customizable tag name generation
- Modular architecture for custom transforms
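As an illustration of tag name generation, the Quick Start output maps the header "System Context" to the tag `SystemContext`. The sketch below shows one way such a mapping could work, with an explicit map taking precedence over the default. `toTagName` is a hypothetical helper and the PascalCase and leading-digit rules are assumptions, not md-llm's documented behavior.

```typescript
// Hypothetical helper (an assumption, not md-llm's implementation): derive a
// PascalCase tag name from header text, letting an explicit map override it.
function toTagName(headerText: string, tagNameMap: Record<string, string> = {}): string {
  // Only honor keys the caller actually set, not inherited object properties.
  if (Object.prototype.hasOwnProperty.call(tagNameMap, headerText)) {
    return tagNameMap[headerText];
  }
  // Split on anything that is not a letter or digit, then capitalize each word.
  const words = headerText.match(/[A-Za-z0-9]+/g) ?? [];
  const tagName = words.map((word) => word.charAt(0).toUpperCase() + word.slice(1)).join('');
  // XML-style names cannot start with a digit, so prefix those with an underscore.
  return /^[0-9]/.test(tagName) ? `_${tagName}` : tagName;
}

console.log(toTagName('System Context'));
console.log(toTagName('FAQ', { FAQ: 'Questions' }));
```

The second call shows the override path, which is the role the `tagNameMap` option appears to play in MdToLlmOptions.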
Installation
npm install md-llm
Quick Start
import { mdToLlm } from 'md-llm';
const markdown = `
# System Context
Some context here...
## Project Setup
Instructions for setup:
\`\`\`bash
npm install
\`\`\`
`;
const result = await mdToLlm(markdown);
console.log(result.ast);
Output:
{
  type: 'tag',
  name: 'Document',
  children: [
    {
      type: 'tag',
      name: 'SystemContext',
      children: [
        { type: 'text', value: 'Some context here...' },
        {
          type: 'tag',
          name: 'ProjectSetup',
          children: [
            { type: 'text', value: 'Instructions for setup:' },
            {
              type: 'tag',
              name: 'Code',
              attributes: { language: 'bash' },
              children: [{ type: 'text', value: 'npm install' }]
            }
          ]
        }
      ]
    }
  ]
}
API Reference
Core Function
async function mdToLlm(
  markdown: string,
  options?: MdToLlmOptions
): Promise<ParseResult>
Options
interface MdToLlmOptions {
  headerDepth?: number; // 1-6, default 2
  tagNameMap?: Record<string, string>; // Custom header->tag mappings
  preserveHeaderText?: boolean; // Keep original text as first line?
  customTransforms?: MdToLlmTransform[]; // Add custom transforms
}
Transform Pipeline
The library uses a modular transform pipeline that processes different Markdown elements:
- HeaderTransform: Converts headers to semantic tags
- CodeFenceTransform: Processes code blocks with language and metadata
- ListTransform: Handles ordered and unordered lists
- TableTransform: Processes tables with alignment
- BlockquoteTransform: Handles blockquotes
- ThematicBreakTransform: Processes horizontal rules
- FrontmatterTransform: Extracts YAML frontmatter
- DefinitionTransform: Processes definition lists
- ReferenceTransform: Handles link references and footnotes
- TaskListTransform: Processes task lists
- HtmlTransform: Preserves HTML content
- SectionTransform: Handles section boundaries and hierarchy
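A pipeline of transforms like the ones listed above can be sketched as a loop that offers each node to the transforms in order. This is a minimal illustration, not the library's internals: the `MdastNode` and `LlmAstNode` shapes are simplified stand-ins, `CodeFenceSketch`, `TextFallback`, and `runPipeline` are hypothetical names, and the "first matching transform wins" rule is an assumption.

```typescript
// Simplified stand-ins for the library's node types (assumed shapes).
interface MdastNode {
  type: string;
  value?: string;
  lang?: string;
}
type LlmAstNode =
  | { type: 'tag'; name: string; attributes?: Record<string, string>; children: LlmAstNode[] }
  | { type: 'text'; value: string };

// Same two-method contract as the MdToLlmTransform interface in this README.
interface MdToLlmTransform {
  canTransform(node: MdastNode): boolean;
  transform(node: MdastNode): LlmAstNode;
}

// Hypothetical transform: turns an mdast 'code' node into a <Code> tag.
class CodeFenceSketch implements MdToLlmTransform {
  canTransform(node: MdastNode): boolean {
    return node.type === 'code';
  }
  transform(node: MdastNode): LlmAstNode {
    return {
      type: 'tag',
      name: 'Code',
      attributes: node.lang ? { language: node.lang } : {},
      children: [{ type: 'text', value: node.value ?? '' }],
    };
  }
}

// Hypothetical catch-all: anything unclaimed becomes plain text.
class TextFallback implements MdToLlmTransform {
  canTransform(): boolean {
    return true;
  }
  transform(node: MdastNode): LlmAstNode {
    return { type: 'text', value: node.value ?? '' };
  }
}

// Assumed dispatch rule: the first transform that accepts a node handles it.
function runPipeline(nodes: MdastNode[], transforms: MdToLlmTransform[]): LlmAstNode[] {
  const output: LlmAstNode[] = [];
  for (const node of nodes) {
    const match = transforms.find((candidate) => candidate.canTransform(node));
    if (match) {
      output.push(match.transform(node));
    }
  }
  return output;
}

const converted = runPipeline(
  [
    { type: 'code', lang: 'bash', value: 'npm install' },
    { type: 'text', value: 'Instructions for setup:' },
  ],
  [new CodeFenceSketch(), new TextFallback()],
);
console.log(JSON.stringify(converted, null, 2));
```

Under this rule the order of the transforms matters, so a catch-all such as `TextFallback` belongs at the end of the list.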
Custom Transforms
You can create custom transforms by implementing the MdToLlmTransform interface:
interface MdToLlmTransform {
  transform(node: MdastNode): LlmAstNode;
  canTransform(node: MdastNode): boolean;
}
Example custom transform:
class CustomTransform implements MdToLlmTransform {
  canTransform(node: MdastNode): boolean {
    return node.type === 'customType';
  }
  transform(node: MdastNode): LlmAstNode {
    return {
      type: 'tag',
      name: 'CustomTag',
      attributes: {},
      children: []
    };
  }
}
// Use in options
const options = {
  customTransforms: [new CustomTransform()]
};
Node Types
The library uses two main node types:
interface TagNode {
  type: 'tag';
  name: string;
  attributes?: Record<string, string>;
  children: (TagNode | TextNode)[];
}
interface TextNode {
  type: 'text';
  value: string;
}
Section Processing
Sections are represented by the following structure, which tracks hierarchy, frontmatter, and reference metadata:
interface Section {
  id: string;
  title: string;
  level: number;
  frontmatter?: string;
  content: Node[];
  metadata: {
    hasDefinitionLists: boolean;
    hasTaskLists: boolean;
    hasFootnotes: boolean;
    references: {
      links: Map<string, string>;
      footnotes: Map<string, string>;
    };
  };
  parent?: Section;
  children: Section[];
}
Error Handling
Errors encountered during parsing are returned in the optional errors array of the ParseResult:
interface ParseResult {
  ast: DocumentNode;
  errors?: Error[];
}
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the Meld License - see the LICENSE file for details.