0.0.5 • Published 12 months ago

sanechain v0.0.5

License
MIT

Sane Chain

An attempt to make langchainjs easier to work with

WIP - some things work, but nothing is tested and there are no warranties.

Adds the following utilities and loaders:

  1. Utility Classes
    1. DocumentLoader
  2. Loaders
    1. ChatGPT Loader
    2. Simpler GithubRepoLoader
  3. Roadmap

Utility Classes

DocumentLoader

DocumentLoader wraps the loaders from langchainjs (plus sanechain's own) in a single class that can load your documents regardless of type: individual files, directories, GitHub repos, and ChatGPT exports.

Example:

const filesAndDirectories = [
  'path/to/somefile.md',
  'path/to/somefile.pdf',
  'path/to/somefile.text',
  'path/to/somefile.html',
  'path/to/somedirectory',
  'https://github.com/some/repo',
  'https://github.com/some/other_repo',
  'path/to/chatgpt.json'
]

const documentLoader = new DocumentLoader(filesAndDirectories)
// Loading is async and might take time; a queue system is planned to speed things up.
const documents = await documentLoader.loadDocuments()
const splitDocuments = await documentLoader.splitDocuments()
// @todo: add full parity with all langchain Python loaders.
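To make the "regardless of type" idea concrete, here is a minimal sketch of how a mixed list of inputs could be sorted into loader kinds before loading. This is not sanechain's actual implementation: `classifySource`, `EXTENSION_KINDS`, and the kind names are hypothetical and exist only for illustration.

```javascript
// Hypothetical sketch, NOT sanechain's real code: decide which kind of
// loader a given input would need, based on its URL or file extension.
const EXTENSION_KINDS = {
  '.md': 'markdown',
  '.pdf': 'pdf',
  '.txt': 'text',
  '.text': 'text',
  '.html': 'html',
  '.json': 'json',
};

function classifySource(source) {
  // GitHub repo links look like https://github.com/<owner>/<repo>
  if (/^https?:\/\/github\.com\/[^/]+\/[^/]+/.test(source)) return 'github';
  // Any other http(s) link is treated as a generic URL.
  if (/^https?:\/\//.test(source)) return 'url';

  const lastSegment = source.split('/').pop() ?? '';
  const dot = lastSegment.lastIndexOf('.');
  // No extension (or a dotfile): assume it is a directory. A real
  // implementation would check the filesystem instead of guessing.
  if (dot <= 0) return 'directory';

  const ext = lastSegment.slice(dot).toLowerCase();
  return EXTENSION_KINDS[ext] ?? 'unknown';
}

// Group inputs by kind so each group can be handed to a matching loader.
function groupByKind(sources) {
  const groups = {};
  for (const source of sources) {
    const kind = classifySource(source);
    (groups[kind] ??= []).push(source);
  }
  return groups;
}
```

Calling `groupByKind(filesAndDirectories)` on the list above would put both GitHub links under `github` and `path/to/somedirectory` under `directory`. Guessing from the path alone keeps the sketch free of I/O, at the cost of misclassifying extensionless files.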

Loaders

ChatGPT Loader

import { ChatGPTLoader } from './chat_gpt_loader.js';

const loader = new ChatGPTLoader('path/to/chat/log.json', 10);
const documents = await loader.load();

Simpler GithubRepoLoader

Pass in a GitHub repo link and get the repo's documents back.

import { GithubRepoLoader } from 'sanechain';

const loader = new GithubRepoLoader("https://github.com/owner/repo", { /*params*/ });
const documents = await loader.load();
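Accepting a plain link means the loader has to extract the owner and repo name from it at some point. A rough sketch of that step is below, using only the standard `URL` class. `parseGithubUrl` is a hypothetical helper, not part of sanechain or langchainjs, and the real loader may handle links differently.

```javascript
// Hypothetical helper, not a sanechain export: split a GitHub repo link
// into its owner and repo parts.
function parseGithubUrl(link) {
  const url = new URL(link); // throws a TypeError on malformed input
  if (url.hostname !== 'github.com') {
    throw new Error(`Not a GitHub URL: ${link}`);
  }
  // "/owner/repo/..." -> ["owner", "repo", ...]
  const [owner, repo] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repo) {
    throw new Error(`Expected https://github.com/<owner>/<repo>, got: ${link}`);
  }
  // Tolerate clone-style links that end in ".git".
  return { owner, repo: repo.replace(/\.git$/, '') };
}

const { owner, repo } = parseGithubUrl('https://github.com/owner/repo');
console.log(owner, repo);
```

Validating the link up front lets a bad URL fail with a clear message before any network request is made.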

Roadmap

  • Models
    • General
    • Chat
    • Embeddings
  • Prompts
    • General Templates
    • Chat Template
    • Example Selectors
    • Output Parsers
  • Indexes (Primary focus at first)
    • Document Loaders
      • Airbyte JSON
      • Apify Dataset
      • Arxiv
      • AWS S3
      • AZLyrics
      • Azure Blob Storage
      • Bilibili
      • Blackboard
      • Blockchain
      • ChatGPT Data
      • Confluence
      • CoNLL-U
      • Copy / Paste
      • CSV (langchainjs)
      • Diffbot
      • Discord
      • DuckDB
      • Email
      • EPub (langchainjs)
      • EverNote
      • Facebook Chat
      • Figma
      • File Directory (langchainjs)
      • Git (langchainjs + custom url loader)
      • GitBook
      • Google BigQuery
      • Google Cloud Storage
      • Google Drive
      • Gutenberg
      • Hacker News
      • HTML
      • HuggingFace dataset
      • iFixit
      • Images
      • Image captions
      • IMDB
      • JSON Files (langchain)
      • Jupyter Notebook
      • Markdown (sorta, just parses using TextLoader)
      • MediaWikiDump
      • Microsoft OneDrive
      • Microsoft PowerPoint
      • Microsoft Word (langchainjs)
      • Modern Treasury
      • Notion DB 1/2
      • Notion DB 2/2
      • Obsidian
      • Pandas DataFrame
      • PDF (langchain)
      • Using PyPDFium2
      • ReadTheDocs Documentation
      • Reddit
      • Roam
      • Sitemap
      • Slack
      • Spreedly
      • Stripe
      • Subtitle (langchain)
      • Telegram
      • TOML
      • Twitter
      • Unstructured File (half way)
      • URL (langchainjs via puppeteer, playwright, cheerio, etc.)
      • Selenium URL Loader
      • Playwright URL Loader (langchainjs)
      • WebBaseLoader
      • WhatsApp Chat
      • Wikipedia
      • YouTube transcripts
    • Text Splitters
      • Character Text Splitter
      • HuggingFace Length Function
      • LaTeX Text Splitter
      • Markdown Text Splitter
      • NLTK Text Splitter
      • RecursiveCharacterTextSplitter
      • Spacy Text Splitter
      • tiktoken (OpenAI) Length Function
      • TiktokenTextSplitter
    • Vector stores
    • Retrievers
  • Memory (TBD)
  • Chains (TBD)
  • Agents
    • Tools (TBD)
    • Agents (TBD)
    • Toolkits (TBD)
    • AgentExecutors (TBD)