
Sane Chain

An attempt to make langchainjs easier to work with

WIP: ~~nothing works yet, just saving the name~~ Some things work, but nothing is tested and there are no warranties :1st_place_medal:

Adds the following:

Utility Classes
1. DocumentLoader

Loaders
1. ChatGPT Loader
2. Simpler GithubRepoLoader

Utility Classes

DocumentLoader

DocumentLoader wraps the langchainjs loaders (plus the sanechain ones) in a single class that loads all your documents, regardless of type.

Example:

import { DocumentLoader } from 'sanechain' // assumed to be exported like GithubRepoLoader

const filesAndDirectories = [
  'path/to/somefile.md',
  'path/to/somefile.pdf',
  'path/to/somefile.text',
  'path/to/somefile.html',
  'path/to/somedirectory',
  'https://github.com/some/repo',
  'https://github.com/some/other_repo',
  'path/to/chatgpt.json'
]

const documentLoader = new DocumentLoader(filesAndDirectories)
// Loading is async, so await the results (assumes both methods return promises).
const documents = await documentLoader.loadDocuments()
const splitDocuments = await documentLoader.splitDocuments()
// Loading may take a while; a queue system is planned to speed things up.
// @todo Add full parity with all langchain Python loaders.
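
As a rough illustration of the "regardless of type" idea, the sketch below classifies each entry by URL prefix or file extension, which is the decision a loader would need before picking a parser. `classifySource`, `extensionOf`, and the kind labels are hypothetical names made up for this example, not sanechain's actual internals, and a path without an extension is only assumed to be a directory.

```javascript
// Hypothetical sketch: decide which kind of loader an input needs.
// None of these names come from sanechain; they are illustrative only.
const KIND_BY_EXTENSION = {
  '.md': 'markdown',
  '.pdf': 'pdf',
  '.txt': 'text',
  '.text': 'text',
  '.html': 'html',
  '.json': 'json',
};

// Return the lowercase extension of the last path segment, or '' if none.
function extensionOf(source) {
  const base = source.split('/').pop();
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

function classifySource(source) {
  // Remote repositories are recognised by their URL prefix.
  if (source.startsWith('https://github.com/')) return 'github';
  const ext = extensionOf(source);
  // No extension: assume a directory (a real loader would stat the path).
  if (ext === '') return 'directory';
  // Extensions missing from the table are reported as 'unknown'.
  return KIND_BY_EXTENSION[ext] ?? 'unknown';
}
```

Calling `filesAndDirectories.map(classifySource)` on the list above would give one label per entry, which a dispatcher could then use to pick the matching langchainjs or sanechain loader.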

Loaders

ChatGPT Loader

import { ChatGPTLoader } from './chat_gpt_loader.js';

const loader = new ChatGPTLoader('path/to/chat/log.json', 10);
const documents = await loader.load();
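
To give a feel for what a loader like this has to do, here is a minimal sketch that flattens a chat export into `{ pageContent, metadata }` objects, the shape langchain documents use. The input shape is an assumption (an array of conversations, each with a `title` and a `mapping` of nodes whose `message` carries `author.role` and `content.parts`), and `conversationsToDocuments` and `numLogs` are hypothetical names. Treating the limit as "the first N conversations" is only a guess at what the second constructor argument above means.

```javascript
// Hypothetical sketch: flatten a ChatGPT export into plain documents.
// Assumed input shape (not confirmed by sanechain):
//   [{ title, mapping: { id: { message: { author: { role }, content: { parts } } } } }]
function conversationsToDocuments(conversations, numLogs = 0) {
  // numLogs > 0 keeps only the first numLogs conversations (assumed meaning).
  const selected = numLogs > 0 ? conversations.slice(0, numLogs) : conversations;
  return selected.map((conversation, index) => {
    const lines = [];
    for (const node of Object.values(conversation.mapping ?? {})) {
      const message = node.message;
      // Skip nodes without a message or without a parts array.
      const parts = message?.content?.parts;
      if (!Array.isArray(parts)) continue;
      // Keep only string parts; skip messages that end up empty.
      const text = parts.filter((p) => typeof p === 'string').join('\n').trim();
      if (text === '') continue;
      lines.push(`${message.author?.role ?? 'unknown'}: ${text}`);
    }
    return {
      pageContent: lines.join('\n\n'),
      metadata: { title: conversation.title ?? '', logIndex: index + 1 },
    };
  });
}
```

Each conversation becomes one document, so a later text splitter can still break long chats into smaller chunks.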

Simpler GithubRepoLoader

Pass in a GitHub link, get the repo's documents back.

import { GithubRepoLoader } from 'sanechain';

const loader = new GithubRepoLoader('https://github.com/owner/repo', { /* params */ });
const documents = await loader.load();
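
Turning a pasted link into something a loader can act on mostly comes down to pulling the owner and repository name out of the URL. The sketch below shows one way to do that with the standard `URL` class; `parseGithubUrl` is a hypothetical helper written for this example and is not part of sanechain's API.

```javascript
// Hypothetical helper: split a GitHub repository URL into owner and repo.
function parseGithubUrl(link) {
  const url = new URL(link); // throws a TypeError on malformed input
  if (url.hostname !== 'github.com') {
    throw new Error(`Not a github.com URL: ${link}`);
  }
  // The first two non-empty path segments are the owner and the repository.
  const [owner, repo] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repo) {
    throw new Error(`Expected https://github.com/<owner>/<repo>, got: ${link}`);
  }
  // Drop a trailing ".git" so clone URLs work too.
  return { owner, repo: repo.replace(/\.git$/, '') };
}
```

Validating the link up front means a bad URL fails with a readable message before any cloning or network work starts.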

Roadmap

Models
- General
- Chat
- Embeddings
Prompts
- General Templates
- Chat Template
- Example Selectors
- Output Parsers
Indexes (Primary focus at first)
- Document Loaders
  - Airbyte JSON
  - Apify Dataset
  - Arxiv
  - AWS S3
  - AZLyrics
  - Azure Blob Storage
  - Bilibili
  - Blackboard
  - Blockchain
  - ChatGPT Data
  - Confluence
  - CoNLL-U
  - Copy / Paste
  - CSV (langchainjs)
  - Diffbot
  - Discord
  - DuckDB
  - Email
  - EPub (langchainjs)
  - EverNote
  - Facebook Chat
  - Figma
  - File Directory (langchainjs)
  - Git (langchainjs + custom url loader)
  - GitBook
  - Google BigQuery
  - Google Cloud Storage
  - Google Drive
  - Gutenberg
  - Hacker News
  - HTML
  - HuggingFace dataset
  - iFixit
  - Images
  - Image captions
  - IMDB
  - JSON Files (langchain)
  - Jupyter Notebook
  - Markdown (sorta, just parses using TextLoader)
  - MediaWikiDump
  - Microsoft OneDrive
  - Microsoft PowerPoint
  - Microsoft Word (langchainjs)
  - Modern Treasury
  - Notion DB 1/2
  - Notion DB 2/2
  - Obsidian
  - Pandas DataFrame
  - PDF (langchain)
  - Using PyPDFium2
  - ReadTheDocs Documentation
  - Reddit
  - Roam
  - Sitemap
  - Slack
  - Spreedly
  - Stripe
  - Subtitle (langchain)
  - Telegram
  - TOML
  - Twitter
  - Unstructured File (half way)
  - URL (langchainjs via puppeteer, playwright, cheerio, etc.)
  - Selenium URL Loader
  - Playwright URL Loader (langchainjs)
  - WebBaseLoader
  - WhatsApp Chat
  - Wikipedia
  - YouTube transcripts
- Text Splitters
  - Character Text Splitter
  - HuggingFace Length Function
  - LaTeX Text Splitter
  - Markdown Text Splitter
  - NLTK Text Splitter
  - RecursiveCharacterTextSplitter
  - Spacy Text Splitter
  - tiktoken (OpenAI) Length Function
  - TiktokenTextSplitter
- Vector stores
- Retrievers
Memory (TBD)
Chains (TBD)
Agents
- Tools (TBD)
- Agents (TBD)
- Toolkits (TBD)
- AgentExecutors (TBD)

