cortex-cpp v0.5.0-46
License: Apache-2.0 • Repository: github • Last release: 10 months ago

cortex-cpp - Embeddable AI

⚠️ cortex-cpp is currently in Development: Expect breaking changes and bugs!

About cortex-cpp

Cortex-cpp is a streamlined, stateless C++ server engineered to be fully compatible with OpenAI's API, particularly its stateless endpoints. It is built on the Drogon server framework for request handling and includes features such as model orchestration and hardware telemetry, which are essential for production environments.

Remarkably compact, the binary size of cortex-cpp is around 3 MB when compressed, with minimal dependencies. This lightweight and efficient design makes cortex-cpp an excellent choice for deployments in both edge computing and server contexts.

GPU acceleration requires CUDA.
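Because the API is stateless, conversation context lives entirely on the client: each request resends the full message history. A minimal sketch of that pattern (the `send` function below is a placeholder, not a cortex-cpp API; a real client would POST the history to the server's `/v1/chat/completions` endpoint):

```python
# Client-side conversation state for a stateless chat-completions API.
# `send` is a stand-in for an HTTP POST to
# http://localhost:3928/v1/chat/completions.

def send(messages):
    # Placeholder for the real HTTP call; returns a canned assistant reply.
    return {"role": "assistant", "content": f"(reply #{len(messages)})"}

history = []
for user_turn in ["Hello", "What did I just say?"]:
    history.append({"role": "user", "content": user_turn})
    reply = send(history)  # the full history is resent on every turn
    history.append(reply)
```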

Prerequisites

Hardware

Ensure that your system meets the following requirements to run Cortex:

  • OS:
• macOS 13.6 or higher.
    • Windows 10 or higher.
    • Ubuntu 18.04 and later.
  • RAM (CPU Mode):
    • 8GB for running up to 3B models.
    • 16GB for running up to 7B models.
    • 32GB for running up to 13B models.
  • VRAM (GPU Mode):
    • 6GB can load a 3B model (int4) with ngl at 120 (~ full speed on CPU/GPU).
    • 8GB can load a 7B model (int4) with ngl at 120 (~ full speed on CPU/GPU).
    • 12GB can load a 13B model (int4) with ngl at 120 (~ full speed on CPU/GPU).
  • Disk: At least 10GB for app and model download.
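These VRAM figures line up with a rough back-of-envelope: at int4 quantization the weights alone take about half a byte per parameter, with the remaining budget going to the KV cache and activations. A sketch of that arithmetic:

```python
# Approximate weight memory for a quantized model:
# bits_per_weight / 8 bytes per parameter.

def approx_weight_gb(n_params_billion, bits_per_weight=4):
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # gigabytes

for n in (3, 7, 13):
    print(f"{n}B int4 weights ~ {approx_weight_gb(n):.1f} GB")
```

The gap between the weight size and the listed VRAM requirement is the headroom for the KV cache, activations, and runtime overhead.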

Quickstart

To install Cortex CLI, follow the steps below:

  1. Download cortex-cpp here: https://github.com/janhq/cortex/releases
  2. Install cortex-cpp by running the downloaded file.
  3. Download a model:
mkdir model && cd model
wget -O llama-2-7b-model.gguf "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true"
  4. Run the cortex-cpp server:
cortex-cpp
  5. Load a model:
curl http://localhost:3928/inferences/server/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/model/llama-2-7b-model.gguf",
    "ctx_len": 512,
    "ngl": 100
  }'
  6. Make an inference:
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      }
    ]
  }'
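The same request can be made from Python's standard library. This is a sketch under the assumptions above: the URL and port are the cortex-cpp defaults from the quickstart, and `chat` and `build_payload` are helper names of my choosing. Only call `chat` with the server running.

```python
import json
import urllib.request

API_URL = "http://localhost:3928/v1/chat/completions"

def build_payload(messages):
    # Serialize the OpenAI-style message list to a JSON request body.
    return json.dumps({"messages": messages}).encode("utf-8")

def chat(messages):
    # POST the payload; requires a running cortex-cpp server.
    req = urllib.request.Request(
        API_URL,
        data=build_payload(messages),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_payload(
    [{"role": "user", "content": "Who won the world series in 2020?"}]
)
```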

Table of parameters

Below is the list of model parameters you can set when loading a model in cortex-cpp:

| Parameter | Type | Description |
|---|---|---|
| `llama_model_path` | String | The file path to the LLaMA model. |
| `ngl` | Integer | The number of GPU layers to use. |
| `ctx_len` | Integer | The context length for model operations. |
| `embedding` | Boolean | Whether to use embedding in the model. |
| `n_parallel` | Integer | The number of parallel operations. |
| `cont_batching` | Boolean | Whether to use continuous batching. |
| `user_prompt` | String | The prompt to use for the user. |
| `ai_prompt` | String | The prompt to use for the AI assistant. |
| `system_prompt` | String | The prompt to use for system rules. |
| `pre_prompt` | String | The prompt to use for internal configuration. |
| `cpu_threads` | Integer | The number of threads to use for inferencing (CPU mode only). |
| `n_batch` | Integer | The batch size for the prompt eval step. |
| `caching_enabled` | Boolean | Whether to enable prompt caching. |
| `clean_cache_threshold` | Integer | The number of chats that triggers a clean-cache action. |
| `grp_attn_n` | Integer | Group attention factor in self-extend. |
| `grp_attn_w` | Integer | Group attention width in self-extend. |
| `mlock` | Boolean | Prevents the system from swapping the model to disk (macOS). |
| `grammar_file` | String | Constrains sampling using a GBNF grammar at the given file path. |
| `model_type` | String | Model type to use: `llm` or `embedding`. Defaults to `llm`. |
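As an illustration, a loadmodel body combining several of these parameters might look like the following. The values are illustrative, not recommended defaults; tune them to your hardware and model.

```python
import json

# Illustrative loadmodel payload; values are examples only.
load_config = {
    "llama_model_path": "/model/llama-2-7b-model.gguf",
    "ctx_len": 2048,
    "ngl": 100,            # GPU layers to offload
    "n_parallel": 1,
    "cont_batching": True,
    "caching_enabled": True,
    "cpu_threads": 8,      # honored in CPU mode only
    "model_type": "llm",
}

body = json.dumps(load_config)
```

`body` is what you would pass as the `-d` payload of the loadmodel curl call in the quickstart.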

Download

Download the latest or older versions of Cortex-cpp at the GitHub Releases.

Manual Build

A manual build is the process of building the software yourself, typically when implementing a new feature or fixing a bug. The build process for this project is defined in .github/workflows/cortex-build.yml.

Contact Support

  • For support, please file a GitHub ticket.
  • For questions, join our Discord here.
  • For long-form inquiries, please email hello@jan.ai.

