Git-repo-parser NPM

git-repo-parser

A powerful tool to scrape all files from a GitHub repository and convert them into JSON or plain text format.

Installation

Install the package globally using npm:

npm install -g git-repo-parser

Or add it to your project as a dependency:

npm install git-repo-parser

Usage

Command Line Interface (CLI)

This package provides two CLI commands:

git-repo-to-json: Scrapes a GitHub repository and saves the result as a JSON file.
git-repo-to-text: Scrapes a GitHub repository and saves the result as a plain text file.

Example usage:

git-repo-to-json https://github.com/username/repo-name.git
git-repo-to-text https://github.com/username/repo-name.git

The scraped data will be saved as files.json or files.txt in your current directory.

Programmatic Usage

You can also use the package in your Node.js projects:

import { scrapeRepositoryToJson, scrapeRepositoryToPlainText } from 'git-repo-parser';

// To get JSON output
const jsonResult = await scrapeRepositoryToJson('https://github.com/username/repo-name.git');

// To get plain text output
const textResult = await scrapeRepositoryToPlainText('https://github.com/username/repo-name.git');

API

`scrapeRepositoryToJson(repoUrl: string): Promise<FileData[]>`

Scrapes the given GitHub repository and returns a promise that resolves to an array of FileData objects.

`scrapeRepositoryToPlainText(repoUrl: string): Promise<string>`

Scrapes the given GitHub repository and returns a promise that resolves to a string containing the repository contents in a structured plain text format.

FileData Interface

The FileData interface represents the structure of files and directories in the JSON output:

interface FileData {
    name: string;
    path: string;
    type: 'file' | 'directory';
    children?: FileData[];
    content?: string;
}

Features

Clones the repository locally (temporary)
Ignores binary files and common non-source files
Supports nested directory structures
Provides both JSON and plain text output formats
Cleans up cloned repository after scraping

Ignored Files

The following file types and patterns are ignored during scraping:

package-lock.json
Binary files (pdf, png, jpg, jpeg, gif, ico, svg, woff, woff2, eot, ttf, otf)
Media files (mp4, avi, webm, mov, mp3, wav, flac, ogg, webp)
Debug and error logs (npm-debug, yarn-debug, yarn-error)
Configuration files (tsconfig, jest.config)
The .git directory

License

This project is licensed under the MIT License.

Author

arnab2001

Contributing

Contributions, issues, and feature requests are welcome. Feel free to check issues page if you want to contribute.

Show your support

Give a ⭐️ if this project helped you!

1 year ago

1 year ago