2.0.0 • Published 6 months ago

code-tokenizer-md v2.0.0

License
AGPL-3.0-or-later

code-tokenizer-md

Created to push creative limits.

Process git repository files into markdown with token counting and sensitive data redaction.

Overview

code-tokenizer-md is a Node.js tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts.

graph TD
   Start[Start] -->|Read| Git[Git Files]
   Git -->|Clean| TC[TokenCleaner]
   TC -->|Redact| Clean[Clean Code]
   Clean -->|Generate| MD[Markdown]
   MD -->|Count| Results[Token Counts]
   style Start fill:#000000,stroke:#FFFFFF,stroke-width:4px,color:#ffffff
   style Git fill:#222222,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style TC fill:#333333,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style Clean fill:#444444,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style MD fill:#555555,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style Results fill:#666666,stroke:#FFFFFF,stroke-width:2px,color:#ffffff

Features

Data Processing

  • Reads files from git repository
  • Removes comments and unnecessary whitespace
  • Redacts sensitive information (API keys, tokens, etc.)
  • Counts tokens using llama3-tokenizer
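The cleaning and redaction steps listed above could be approximated with a few regular expressions. This is a minimal sketch under stated assumptions: `cleanCode`, `redactSecrets`, and every regex below are hypothetical illustrations, not the package's actual `TokenCleaner` implementation.

```javascript
// Hypothetical helpers: names and regexes are assumptions for illustration only.

// Strip block and line comments, then drop redundant whitespace.
function cleanCode(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, '')    // block comments
    .replace(/(^|[^:])\/\/.*$/gm, '$1')  // line comments, sparing "://" in URLs
    .replace(/[ \t]+$/gm, '')            // trailing whitespace
    .replace(/\n{2,}/g, '\n')            // runs of blank lines
    .trim();
}

// Mask quoted values assigned to names that look like credentials.
function redactSecrets(source) {
  return source.replace(
    /\b(api[_-]?key|token|secret|password)(\s*[:=]\s*)(['"])[^'"]*\3/gi,
    '$1$2$3[REDACTED]$3'
  );
}

const input = [
  '/* config */',
  'const apiKey = "sk-12345"; // do not commit',
  '',
  '',
  'const url = "https://example.com";',
].join('\n');

console.log(redactSecrets(cleanCode(input)));
```

Regex-based stripping is a deliberate simplification: it can misfire on comment-like text inside string literals, so a production cleaner would likely need language-aware parsing.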

Analysis Types

  • Token counting per file
  • Total token usage
  • File content analysis
  • Sensitive data detection
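Per-file and total token counts can be sketched as a small aggregation. The function name `summarizeTokens` and the stand-in counter are assumptions; the counter is injected so the example runs without dependencies. With llama3-tokenizer-js installed, a counter along the lines of `(text) => llama3Tokenizer.encode(text).length` should be usable instead.

```javascript
// Hypothetical aggregation: `summarizeTokens` is an illustrative name,
// not part of the package's documented API.
function summarizeTokens(files, countTokens) {
  // Count tokens for each file, keeping the path alongside the count.
  const perFile = files.map(({ path, content }) => ({
    path,
    tokens: countTokens(content),
  }));
  // Sum the per-file counts into a repository-wide total.
  const total = perFile.reduce((sum, file) => sum + file.tokens, 0);
  return { perFile, total };
}

// Stand-in counter (assumption): whitespace-separated words, not real tokens.
const roughCount = (text) => text.split(/\s+/).filter(Boolean).length;

const summary = summarizeTokens(
  [
    { path: 'src/index.js', content: 'export default 42;' },
    { path: 'src/cli.js', content: 'console.log("hi");' },
  ],
  roughCount
);

console.log(summary.perFile, summary.total);
```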

Data Presentation

  • Markdown formatted output
  • Code block formatting
  • Token count summaries
  • File organization hierarchy
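The presentation features above might translate into a renderer like the following. It is a sketch only: `renderMarkdown`, the heading layout, and the summary line are assumptions, since this README does not show the generated file's exact format.

```javascript
// Hypothetical renderer: the output layout is an assumption for illustration.
function renderMarkdown(files) {
  const fence = '`'.repeat(3); // code-fence delimiter for the generated markdown
  // Sort by path so files from the same directory appear together.
  const ordered = [...files].sort((a, b) => a.path.localeCompare(b.path));
  // One section per file: heading, token count, fenced content.
  const sections = ordered.map(({ path, content, tokens }) =>
    [`## ${path}`, '', `Tokens: ${tokens}`, '', fence, content, fence].join('\n')
  );
  const total = ordered.reduce((sum, file) => sum + file.tokens, 0);
  return [...sections, `Total tokens: ${total}`].join('\n\n') + '\n';
}

console.log(
  renderMarkdown([
    { path: 'src/index.js', content: 'export default 42;', tokens: 5 },
    { path: 'src/cli.js', content: 'run();', tokens: 3 },
  ])
);
```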

Requirements

  • Node.js (>=14.0.0)
  • Git repository
  • npm or npx

Installation

npm install -g code-tokenizer-md

Usage

Quick Start

npx code-tokenizer-md

Programmatic Usage

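import { MarkdownGenerator } from 'code-tokenizer-md';

// Point the generator at a git repository and choose where the markdown is written.
const generator = new MarkdownGenerator({
  dir: './project',
  outputFilePath: './output.md',
});

// Top-level await requires an ES module context.
const result = await generator.createMarkdownDocument();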

Project Structure

src/
├── index.js              # Main exports
├── TokenCleaner.js       # Code cleaning and redaction
├── MarkdownGenerator.js  # Markdown generation logic
└── cli.js                # CLI implementation

Dependencies

{
  "dependencies": {
    "llama3-tokenizer-js": "^1.0.0"
  },
  "engines": {
    "node": ">=14.0.0"
  }
}

Extending

Adding Custom Patterns

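import { MarkdownGenerator } from 'code-tokenizer-md';

const generator = new MarkdownGenerator({
  // Extra cleaning rule: strip TODO markers from the output.
  customPatterns: [{ regex: /TODO:/g, replacement: '' }],
  // Extra redaction rule: mask a project-specific secret.
  customSecretPatterns: [{ regex: /mySecret/g, replacement: '[REDACTED]' }],
});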
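To illustrate how `{ regex, replacement }` pairs like these could be applied, here is a minimal sketch. `applyPatterns` and `defaultPatterns` are hypothetical names, and running built-in rules before custom ones is an assumed order; the package's internals may differ.

```javascript
// Hypothetical default: the package's built-in patterns are not listed in this README.
const defaultPatterns = [{ regex: /[ \t]+$/gm, replacement: '' }];

// Run every { regex, replacement } pair in order: defaults first, then custom ones.
function applyPatterns(text, customPatterns = []) {
  return [...defaultPatterns, ...customPatterns].reduce(
    (current, { regex, replacement }) => current.replace(regex, replacement),
    text
  );
}

const cleaned = applyPatterns('let x = 1; // TODO: rename   \nuse(mySecret);', [
  { regex: /TODO:/g, replacement: '' },
  { regex: /mySecret/g, replacement: '[REDACTED]' },
]);

console.log(cleaned);
```

Each pattern should carry the `g` flag, as in the examples above; without it, `String.prototype.replace` only rewrites the first match.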

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

Contribution Guidelines

  • Follow Node.js best practices
  • Include appropriate error handling
  • Add documentation for new features
  • Include tests for new functionality (the project does not yet have a test suite)
  • Update the README for significant changes

License

MIT © 2024 Geoff Seemueller

Note

This tool requires a git repository to function properly.
