code-tokenizer-md v2.0.0
Created to push creative limits.
Process git repository files into markdown with token counting and sensitive data redaction.
Overview
code-tokenizer-md is a Node.js tool that processes git repository files,
cleans code, redacts sensitive information, and generates markdown documentation
with token counts.
graph TD
Start[Start] -->|Read| Git[Git Files]
Git -->|Clean| TC[TokenCleaner]
TC -->|Redact| Clean[Clean Code]
Clean -->|Generate| MD[Markdown]
MD -->|Count| Results[Token Counts]
style Start fill:#000000,stroke:#FFFFFF,stroke-width:4px,color:#ffffff
style Git fill:#222222,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
style TC fill:#333333,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
style Clean fill:#444444,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
style MD fill:#555555,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
style Results fill:#666666,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
Features
Data Processing
- Reads files from git repository
- Removes comments and unnecessary whitespace
- Redacts sensitive information (API keys, tokens, etc.)
- Counts tokens using llama3-tokenizer-js
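The cleaning and redaction steps listed above can be pictured as a chain of regular-expression passes. The following is a minimal sketch under stated assumptions, not the package's actual implementation: the function names (`stripComments`, `collapseWhitespace`, `redactSecrets`, `cleanSource`) and every regex are hypothetical and chosen only for illustration.

```javascript
// Hypothetical sketch: these helpers and patterns are illustrative,
// not the patterns shipped in TokenCleaner.js.

// Remove block comments and whole-line // comments.
// Limitation: comment-like text inside string literals is not protected.
function stripComments(code) {
  return code
    .replace(/\/\*[\s\S]*?\*\//g, '')
    .replace(/^[ \t]*\/\/.*$/gm, '');
}

// Drop trailing spaces and blank lines.
function collapseWhitespace(code) {
  return code
    .replace(/[ \t]+$/gm, '')
    .replace(/\n{2,}/g, '\n')
    .trim();
}

// Redact quoted values assigned to names that look like credentials
// (assumed naming convention: apiKey, api_key, token, secret, password).
function redactSecrets(code) {
  return code.replace(
    /\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)(['"])[^'"]*\3/gi,
    '$1$2$3[REDACTED]$3'
  );
}

function cleanSource(code) {
  return redactSecrets(collapseWhitespace(stripComments(code)));
}

const input = [
  '// load config',
  'const apiKey = "sk-12345";   ',
  '',
  '/* block */',
  'const port = 8080;',
].join('\n');

const cleaned = cleanSource(input);
console.log(cleaned);
```

Redaction runs last here so that secrets appearing only inside comments are already gone before the pattern match. A real cleaner would probably need a wider pattern set than this one.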
Analysis Types
- Token counting per file
- Total token usage
- File content analysis
- Sensitive data detection
Data Presentation
- Markdown formatted output
- Code block formatting
- Token count summaries
- File organization hierarchy
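To make the output shape concrete, here is a rough sketch of how per-file sections and a token summary could be assembled. The layout, the `renderMarkdown` helper, and the `{ path, content }` input shape are assumptions, and `countTokens` is a whitespace-splitting stand-in so the example runs without dependencies. The tool itself counts with its llama3-tokenizer-js dependency, so real counts will differ.

```javascript
// Stand-in counter (assumption): splits on whitespace instead of using
// a real tokenizer, so the numbers are only illustrative.
function countTokens(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical renderer: one heading, token count, and code block per file,
// followed by a total. The real output layout may differ.
function renderMarkdown(files) {
  const fence = '`'.repeat(3);
  let total = 0;
  const sections = files.map(({ path, content }) => {
    const tokens = countTokens(content);
    total += tokens;
    return `## ${path}\n\nTokens: ${tokens}\n\n${fence}\n${content}\n${fence}`;
  });
  const markdown = `${sections.join('\n\n')}\n\nTotal tokens: ${total}\n`;
  return { markdown, total };
}

const report = renderMarkdown([
  { path: 'src/a.js', content: 'const a = 1;' },
  { path: 'src/b.js', content: 'let b;' },
]);
console.log(report.markdown);
```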
Requirements
- Node.js (>=14.0.0)
- Git repository
- npm or npx
Installation
npm install -g code-tokenizer-md
Usage
Quick Start
npx code-tokenizer-md
Programmatic Usage
// Run as an ES module: top-level await requires Node.js 14.8 or later.
import { MarkdownGenerator } from 'code-tokenizer-md';
const generator = new MarkdownGenerator({
  dir: './project',
  outputFilePath: './output.md',
});
const result = await generator.createMarkdownDocument();
Project Structure
src/
├── index.js # Main exports
├── TokenCleaner.js # Code cleaning and redaction
├── MarkdownGenerator.js # Markdown generation logic
└── cli.js # CLI implementation
Dependencies
{
"dependencies": {
"llama3-tokenizer-js": "^1.0.0"
},
"peerDependencies": {
"node": ">=14.0.0"
}
}
Extending
Adding Custom Patterns
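import { MarkdownGenerator } from 'code-tokenizer-md';
const generator = new MarkdownGenerator({
  // Strip "TODO:" markers from the processed code.
  customPatterns: [{ regex: /TODO:/g, replacement: '' }],
  // Replace every occurrence of "mySecret" with a redaction marker.
  customSecretPatterns: [{ regex: /mySecret/g, replacement: '[REDACTED]' }],
});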
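Each pattern is a `{ regex, replacement }` pair. One plausible way to apply such a list is to run the pairs in order over the text. The `applyPatterns` helper below is hypothetical and only illustrates that idea; how the package combines custom patterns with its built-in ones is not specified here.

```javascript
// Hypothetical helper: applies each { regex, replacement } pair in order.
function applyPatterns(text, patterns) {
  return patterns.reduce(
    (acc, { regex, replacement }) => acc.replace(regex, replacement),
    text
  );
}

const customPatterns = [{ regex: /TODO:/g, replacement: '' }];
const customSecretPatterns = [{ regex: /mySecret/g, replacement: '[REDACTED]' }];

const sample = 'TODO: rotate mySecret before release';
const output = applyPatterns(sample, [...customPatterns, ...customSecretPatterns]);
console.log(JSON.stringify(output));
```

Note the `g` flag on each regex: without it, `String.prototype.replace` only rewrites the first match, so later occurrences of a secret would be left in place.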
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
Contribution Guidelines
- Follow Node.js best practices
- Include appropriate error handling
- Add documentation for new features
- Include tests for new functionality (the project does not yet have a test suite)
- Update the README for significant changes
License
MIT © 2024 Geoff Seemueller
Note
This tool requires a git repository to function properly.