2.0.0 • Published 9 months ago

code-tokenizer-md v2.0.0

License
AGPL-3.0-or-later

code-tokenizer-md

Created to push creative limits.

Process git repository files into markdown with token counting and sensitive data redaction.

Overview

code-tokenizer-md is a Node.js tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts.

graph TD
   Start[Start] -->|Read| Git[Git Files]
   Git -->|Clean| TC[TokenCleaner]
   TC -->|Redact| Clean[Clean Code]
   Clean -->|Generate| MD[Markdown]
   MD -->|Count| Results[Token Counts]
   style Start fill:#000000,stroke:#FFFFFF,stroke-width:4px,color:#ffffff
   style Git fill:#222222,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style TC fill:#333333,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style Clean fill:#444444,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style MD fill:#555555,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style Results fill:#666666,stroke:#FFFFFF,stroke-width:2px,color:#ffffff

Features

Data Processing

  • Reads files from a git repository
  • Removes comments and unnecessary whitespace
  • Redacts sensitive information (API keys, tokens, etc.)
  • Counts tokens using llama3-tokenizer-js
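The cleaning and redaction steps listed above can be sketched with a few regular expressions. This is a minimal illustration and not the package's TokenCleaner: the function names `cleanCode` and `redactSecrets` and both pattern lists are assumptions made for the example, and a regex-based approach can misfire on comment-like text inside string literals.

```javascript
// Hypothetical patterns for illustration; the package's real rules may differ.
const COMMENT_PATTERNS = [
  /\/\*[\s\S]*?\*\//g, // block comments: /* ... */
  /^[ \t]*\/\/.*$/gm, // whole-line comments: // ...
];

const SECRET_PATTERNS = [
  // Assignments such as apiKey = "..." or token: '...'
  /\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)(["'])[^"']*\3/gi,
];

// Strip comments, trailing whitespace, and blank lines.
function cleanCode(source) {
  let out = source;
  for (const pattern of COMMENT_PATTERNS) {
    out = out.replace(pattern, '');
  }
  return out
    .split('\n')
    .map((line) => line.trimEnd())
    .filter((line) => line.trim() !== '')
    .join('\n');
}

// Replace the quoted value of secret-looking assignments, keeping the name.
function redactSecrets(source) {
  let out = source;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(
      pattern,
      (_match, name, separator, quote) =>
        `${name}${separator}${quote}[REDACTED]${quote}`
    );
  }
  return out;
}
```

Running `redactSecrets(cleanCode(source))` mirrors the Clean then Redact order shown in the overview diagram.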

Analysis Types

  • Token counting per file
  • Total token usage
  • File content analysis
  • Sensitive data detection
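Per-file and total token counts amount to a simple aggregation. The sketch below uses a stand-in tokenizer (`approximateTokens`, an assumption for this example) so that it runs without dependencies; its counts will not match those of a real LLM tokenizer. The `countTokens` name and the shape of its return value are also hypothetical.

```javascript
// Stand-in tokenizer (assumption): words and single punctuation marks.
// Pass a real tokenizer's encode function as `tokenize` for meaningful counts.
function approximateTokens(text) {
  return text.match(/\w+|[^\s\w]/g) ?? [];
}

// files: object mapping file path -> file content
function countTokens(files, tokenize = approximateTokens) {
  const perFile = {};
  let total = 0;
  for (const [path, content] of Object.entries(files)) {
    const count = tokenize(content).length;
    perFile[path] = count;
    total += count;
  }
  return { perFile, total };
}
```

Keeping the tokenizer as a parameter means the aggregation logic stays the same whichever encoder is plugged in.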

Data Presentation

  • Markdown formatted output
  • Code block formatting
  • Token count summaries
  • File organization hierarchy
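To make the output format concrete, here is one way a document with per-file code blocks and token summaries might be assembled. The heading text, the layout, and the function names `renderFileSection` and `renderDocument` are assumptions for illustration; the actual output of the tool may look different.

```javascript
const FENCE = '`'.repeat(3); // code-fence marker, built this way to keep the example readable

// Render one file as a heading, a token count, and a fenced code block.
function renderFileSection(path, content, tokenCount) {
  const base = path.split('/').pop();
  // Use the file extension as the fence language hint, if there is one.
  const lang = base.includes('.') ? base.split('.').pop() : '';
  return [
    `## ${path}`,
    '',
    `Tokens: ${tokenCount}`,
    '',
    `${FENCE}${lang}`,
    content,
    FENCE,
    '',
  ].join('\n');
}

// files: array of { path, content, tokenCount }
function renderDocument(files) {
  // Sort by path so files in the same directory appear together.
  const sorted = [...files].sort((a, b) => a.path.localeCompare(b.path));
  const total = sorted.reduce((sum, file) => sum + file.tokenCount, 0);
  const sections = sorted.map((file) =>
    renderFileSection(file.path, file.content, file.tokenCount)
  );
  return ['# Project Files', '', `Total tokens: ${total}`, '', ...sections].join('\n');
}
```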

Requirements

  • Node.js (>=14.0.0)
  • Git repository
  • npm or npx

Installation

npm install -g code-tokenizer-md

Usage

Quick Start

npx code-tokenizer-md

Programmatic Usage

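// Top-level await needs an ES module and Node.js 14.8 or later;
// otherwise wrap the call in an async function.
import { MarkdownGenerator } from 'code-tokenizer-md';

const generator = new MarkdownGenerator({
  dir: './project', // repository directory to process
  outputFilePath: './output.md', // path of the generated markdown file
});

// Process the repository and generate the markdown document.
const result = await generator.createMarkdownDocument();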

Project Structure

src/
├── index.js              # Main exports
├── TokenCleaner.js       # Code cleaning and redaction
├── MarkdownGenerator.js  # Markdown generation logic
└── cli.js                # CLI implementation

Dependencies

{
  "dependencies": {
    "llama3-tokenizer-js": "^1.0.0"
  },
  "peerDependencies": {
    "node": ">=14.0.0"
  }
}

Extending

Adding Custom Patterns

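import { MarkdownGenerator } from 'code-tokenizer-md';

const generator = new MarkdownGenerator({
  // Extra cleanup rules: text matching `regex` is replaced with `replacement`.
  customPatterns: [{ regex: /TODO:/g, replacement: '' }],
  // Extra redaction rules for project-specific secrets.
  customSecretPatterns: [{ regex: /mySecret/g, replacement: '[REDACTED]' }],
});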
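How `{ regex, replacement }` rules like these behave can be illustrated with plain `String.prototype.replace`. The helper `applyPatterns` below is hypothetical and only shows the general mechanism; whether the package applies cleanup rules before redaction rules, as done here, is an assumption based on the overview diagram.

```javascript
// Apply { regex, replacement } rules in order; each rule sees the previous rule's output.
function applyPatterns(text, patterns) {
  return patterns.reduce(
    (current, { regex, replacement }) => current.replace(regex, replacement),
    text
  );
}

const customPatterns = [{ regex: /TODO:/g, replacement: '' }];
const customSecretPatterns = [{ regex: /mySecret/g, replacement: '[REDACTED]' }];

const input = '// TODO: rotate mySecret';
const output = applyPatterns(applyPatterns(input, customPatterns), customSecretPatterns);
console.log(output); // "TODO:" removed, "mySecret" replaced by "[REDACTED]"
```

Note the `g` flag on each pattern: without it, `replace` only changes the first match in the text.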

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

Contribution Guidelines

  • Follow Node.js best practices
  • Include appropriate error handling
  • Add documentation for new features
  • Include tests for new functionality (the project does not yet have a test suite)
  • Update the README for significant changes

License

MIT © 2024 Geoff Seemueller

Note

This tool requires a git repository to function properly.
