llama-ggml.js
serve 4/5-bit quantized GGML LLMs based on Meta's LLaMA model over WebSocket with llama.cpp
use `npm i --save llama.native.js` to run llama.cpp models on your local machine. features a socket.io server and client that can run inference against the host of the model.
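
Below is a minimal client sketch of what talking to such a socket.io inference host could look like. The host/port and the event names (`prompt`, `token`, `end`) are illustrative assumptions, not the package's documented protocol; check the package docs for the actual API.

```js
// Hypothetical socket.io client for a llama.cpp inference host.
// Event names and payload shapes below are assumptions for illustration.
const { io } = require("socket.io-client");

const socket = io("http://localhost:3000"); // assumed default host/port

socket.on("connect", () => {
  // Send a prompt to the model host once connected (assumed event name).
  socket.emit("prompt", { text: "What is GGML quantization?" });
});

// Stream tokens to stdout as the host generates them (assumed event name).
socket.on("token", (token) => process.stdout.write(token));

// Disconnect once the host signals the completion is finished (assumed).
socket.on("end", () => {
  console.log();
  socket.disconnect();
});
```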