GPT4All Node.js API

Native Node.js LLM bindings for all.

yarn add gpt4all@latest

npm install gpt4all@latest

pnpm install gpt4all@latest

Breaking changes in version 4!!

See Transition

See API Reference
See Examples
See Developing
GPT4All Node.js bindings created by jacoobes, limez, and the Nomic AI community, for all to use.

API Examples

Chat Completion

Use a chat session to keep context between completions. This is useful for efficient back and forth conversations.

import { createCompletion, loadModel } from "../src/gpt4all.js";

const model = await loadModel("orca-mini-3b-gguf2-q4_0.gguf", {
    verbose: true, // logs loaded model configuration
    device: "gpu", // defaults to 'cpu'
    nCtx: 2048, // the maximum context window size of the session
});

// initialize a chat session on the model. a model instance can have only one chat session at a time.
const chat = await model.createChatSession({
    // any completion options set here will be used as default for all completions in this chat session
    temperature: 0.8,
    // a custom systemPrompt can be set here. note that the template depends on the model.
    // if unset, the systemPrompt that comes with the model will be used.
    systemPrompt: "### System:\nYou are an advanced mathematician.\n\n",
});

// create a completion using a string as input
const res1 = await createCompletion(chat, "What is 1 + 1?");
console.debug(res1.choices[0].message);

// multiple messages can be input to the conversation at once.
// note that if the last message is not of role 'user', an empty message will be returned.
await createCompletion(chat, [
    {
        role: "user",
        content: "What is 2 + 2?",
    },
    {
        role: "assistant",
        content: "It's 5.",
    },
]);

const res3 = await createCompletion(chat, "Could you recalculate that?");
console.debug(res3.choices[0].message);

model.dispose();

Stateless usage

You can use the model without a chat session. This is useful for one-off completions.

import { createCompletion, loadModel } from "../src/gpt4all.js";

const model = await loadModel("orca-mini-3b-gguf2-q4_0.gguf");

// createCompletion methods can also be used on the model directly.
// context is not maintained between completions.
const res1 = await createCompletion(model, "What is 1 + 1?");
console.debug(res1.choices[0].message);

// a whole conversation can be input as well.
// note that if the last message is not of role 'user', an error will be thrown.
const res2 = await createCompletion(model, [
    {
        role: "user",
        content: "What is 2 + 2?",
    },
    {
        role: "assistant",
        content: "It's 5.",
    },
    {
        role: "user",
        content: "Could you recalculate that?",
    },
]);
console.debug(res2.choices[0].message);
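
The comment in the example above notes that a stateless completion throws when the last message is not from the user. A minimal sketch of a pre-flight check that fails early with a clearer message might look like the following. `assertLastMessageIsUser` is a hypothetical helper, not part of the gpt4all API, and it only inspects the `role` field shown in the examples above.

```javascript
// Hypothetical helper (not part of gpt4all): validates a message list
// before it is passed to createCompletion(model, messages).
function assertLastMessageIsUser(messages) {
    if (!Array.isArray(messages) || messages.length === 0) {
        throw new TypeError("messages must be a non-empty array");
    }
    const last = messages[messages.length - 1];
    if (last.role !== "user") {
        throw new Error(`last message has role "${last.role}", expected "user"`);
    }
    // Return the same array so the call can be used inline.
    return messages;
}

// Usage (assumes `model` was loaded as in the example above):
// const res = await createCompletion(model, assertLastMessageIsUser(messages));
```

Returning the input unchanged keeps the check easy to drop into an existing call without restructuring the code around it.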

Embedding

import { loadModel, createEmbedding } from '../src/gpt4all.js'

const embedder = await loadModel("nomic-embed-text-v1.5.f16.gguf", { verbose: true, type: 'embedding'})

console.log(createEmbedding(embedder, "Maybe Minecraft was the friends we made along the way"));
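
Embeddings are usually compared rather than printed. As a rough sketch, cosine similarity between two vectors can be computed as below. `cosineSimilarity` is a hypothetical helper and not part of gpt4all. The usage comment assumes the returned `EmbeddingResult` exposes its vector under an `embeddings` property; check the typings before relying on that name.

```javascript
// Cosine similarity between two equal-length numeric vectors
// (plain arrays or Float32Array). Hypothetical helper, not part of gpt4all.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new RangeError("vectors must have the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Convention chosen for this sketch: a zero vector is similar to nothing.
    if (normA === 0 || normB === 0) return 0;
    return dot / Math.sqrt(normA * normB);
}

// Usage (ASSUMPTION: the result object stores its vector as `.embeddings`):
// const a = createEmbedding(embedder, "first text").embeddings;
// const b = createEmbedding(embedder, "second text").embeddings;
// console.log(cosineSimilarity(a, b));
```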

Streaming responses

import { loadModel, createCompletionStream } from "../src/gpt4all.js";

const model = await loadModel("mistral-7b-openorca.gguf2.Q4_0.gguf", {
    device: "gpu",
});

process.stdout.write("Output: ");
const stream = createCompletionStream(model, "How are you?");
stream.tokens.on("data", (data) => {
    process.stdout.write(data);
});
// wait until the stream finishes; we cannot continue until it is done.
await stream.result;
process.stdout.write("\n");
model.dispose();

Async Generators

import { loadModel, createCompletionGenerator } from "../src/gpt4all.js";

const model = await loadModel("mistral-7b-openorca.gguf2.Q4_0.gguf");

process.stdout.write("Output: ");
const gen = createCompletionGenerator(
    model,
    "Redstone in Minecraft is Turing Complete. Let that sink in. (let it in!)"
);
for await (const chunk of gen) {
    process.stdout.write(chunk);
}

process.stdout.write("\n");
model.dispose();
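
When the full text is needed as well as the live output, the chunks can be accumulated while iterating. The sketch below is a hypothetical helper that works with any async iterable of strings. `fakeGenerator` stands in for `createCompletionGenerator(model, prompt)` so that the snippet runs without a model.

```javascript
// Drains any async iterable of string chunks into a single string.
// Hypothetical helper, not part of gpt4all.
async function collectChunks(chunks) {
    let text = "";
    for await (const chunk of chunks) {
        text += chunk;
    }
    return text;
}

// Stand-in for createCompletionGenerator so the sketch runs without a model.
async function* fakeGenerator() {
    yield "Hello";
    yield ", ";
    yield "world";
}

collectChunks(fakeGenerator()).then((text) => console.log(text)); // prints "Hello, world"
```

With a real model, the call would presumably be `await collectChunks(createCompletionGenerator(model, prompt))`, followed by `model.dispose()`.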

Offline usage

Do this before going offline:

curl -L https://gpt4all.io/models/models3.json -o ./models3.json

import { createCompletion, loadModel } from 'gpt4all'

// make sure you downloaded the models before going offline!
const model = await loadModel('mistral-7b-openorca.gguf2.Q4_0.gguf', {
    verbose: true,
    device: 'gpu',
    modelConfigFile: "./models3.json"
});

await createCompletion(model, 'What is 1 + 1?', { verbose: true })

model.dispose();
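
Before disconnecting, it can help to confirm that the model you plan to load is actually listed in the downloaded config file. The sketch below assumes that `models3.json` parses to an array of objects with a `filename` field; that layout is an assumption about the file, not something documented above. `findModelEntry` is a hypothetical helper.

```javascript
// ASSUMPTION: the parsed config is an array of objects with a `filename` field.
// Hypothetical helper, not part of gpt4all.
function findModelEntry(modelList, filename) {
    if (!Array.isArray(modelList)) {
        throw new TypeError("model list must be an array");
    }
    // Return the matching entry, or null when the model is not listed.
    return modelList.find((entry) => entry.filename === filename) ?? null;
}

// Usage with the file downloaded above (Node.js, ESM):
// import { readFile } from "node:fs/promises";
// const list = JSON.parse(await readFile("./models3.json", "utf8"));
// if (!findModelEntry(list, "mistral-7b-openorca.gguf2.Q4_0.gguf")) {
//     throw new Error("model is not listed in ./models3.json");
// }
```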

Develop

Build Instructions

binding.gyp is the compile configuration.
Tested on Ubuntu. Everything seems to work fine.
Tested on Windows. Everything works fine.
Sparsely tested on macOS.
The MinGW script can build the gpt4all-backend, and we left it in just in case. However, this package works only with MSVC-built DLLs.

Requirements

git
node.js >= 18.0.0
yarn
node-gyp
- all of its requirements.
(unix) gcc version 12
(win) msvc version 143
- Can be obtained with visual studio 2022 build tools
python 3
On Windows and Linux, building GPT4All requires the complete Vulkan SDK. You may download it from here: https://vulkan.lunarg.com/sdk/home
macOS users do not need Vulkan, as GPT4All will use Metal instead.

Build (from source)

git clone https://github.com/nomic-ai/gpt4all.git
cd gpt4all/gpt4all-bindings/typescript

The shell commands below assume the current working directory is typescript.
To build and rebuild:

node scripts/prebuild.js

The llama.cpp git submodule for gpt4all may be absent. If so, run the following in the parent directory of llama.cpp:

git submodule update --init --recursive

yarn build:backend

This builds platform-dependent dynamic libraries, which are placed in runtimes/(platform)/native.

Test

yarn test

Source Overview

src/

Extra functions to aid developer experience
Typings for the native Node.js addon
The JavaScript interface

test/

Simple unit tests for some of the exported functions.
More advanced AI testing is not handled.

spec/

The typical look and feel of the API
Should work assuming a model and the libraries are installed locally in the working directory

index.cc

The bridge between Node.js and C. This is where the bindings live.

prompt.cc

Handling prompting and inference of models in a threadsafe, asynchronous way.

Known Issues

Why your model may be spewing bull 💩
- The downloaded model is broken (just reinstall or download from the official site)
Your model is hanging after a call to generate tokens.
- Is nPast set too high? This may cause your model to hang (observed 03/16/2024 on Linux Mint and Ubuntu 22.04)
Your GPU usage is still high after node.js exits.
- Make sure to call model.dispose()!!!
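
One way to make sure `dispose()` always runs, even when a completion throws, is to wrap the model's lifetime in `try`/`finally`. The sketch below is a hypothetical helper, not part of gpt4all. It only assumes that the loaded model has a `dispose()` method, as in the examples above.

```javascript
// Hypothetical helper: loads a model, runs `fn` with it, and always disposes it.
// `load` is any async function resolving to an object with dispose(),
// e.g. () => loadModel("orca-mini-3b-gguf2-q4_0.gguf").
async function withModel(load, fn) {
    const model = await load();
    try {
        return await fn(model);
    } finally {
        // Runs whether fn resolved or threw.
        model.dispose();
    }
}

// Usage (assumes loadModel and createCompletion are imported from gpt4all):
// const res = await withModel(
//     () => loadModel("orca-mini-3b-gguf2-q4_0.gguf", { device: "gpu" }),
//     (model) => createCompletion(model, "What is 1 + 1?")
// );
```

Passing a loader function instead of a model keeps loading inside the helper, so a model is never created without a matching `dispose()`.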

Roadmap

This package has been stabilizing over time, and breaking changes may happen until the API stabilizes. Here is the to-do list:

[ ] Purely offline. Per the gui, which can be run completely offline, the bindings should be as well.
[ ] NPM bundle size reduction via optionalDependencies strategy (need help)
- Should include prebuilds to avoid painful node-gyp errors
[x] createChatSession ( the python equivalent to create_chat_session )
[x] generateTokens, the new name for createTokenStream. As of 3.2.0, this is released but not 100% tested. Check spec/generator.mjs!
[x] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this example~~ May not implement unless someone else can complete
[x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
[x] generateTokens is the new name for this^
[x] proper unit testing (integrate with circle ci)
[x] publish to npm under alpha tag gpt4all@alpha
[x] have more people test on other platforms (mac tester needed)
[x] switch to new pluggable backend

Changes

This repository serves as the new bindings for nodejs users.

If you were a user of the previous bindings, they are now outdated.
Version 4 includes the following breaking changes:
- createEmbedding & EmbeddingModel.embed() return an object, EmbeddingResult, instead of a Float32Array.
- Removed deprecated types ModelType and ModelFile
- Removed deprecated initiation of model by string path only
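
Code written against earlier versions may still expect a bare `Float32Array` from `createEmbedding`. A small adapter can smooth the transition, as sketched below. `unwrapEmbedding` is a hypothetical helper, and the `embeddings` property name is an assumption about `EmbeddingResult` that should be checked against the package typings.

```javascript
// Hypothetical migration helper, not part of gpt4all.
// ASSUMPTION: EmbeddingResult stores its vector(s) under `embeddings`.
function unwrapEmbedding(value) {
    // Pre-v4 shape: the vector itself.
    if (value instanceof Float32Array) return value;
    // v4 shape (assumed field name): an object wrapping the vector.
    if (value && value.embeddings !== undefined) return value.embeddings;
    throw new TypeError("unrecognized embedding value");
}

// Usage (assumes `embedder` was loaded with type: 'embedding'):
// const vector = unwrapEmbedding(createEmbedding(embedder, "some text"));
```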