@robypag/sap-ai-core-plugin v0.1.13


AI Core Plugin

This package is a CDS Plugin that provides easy access to SAP AI Core Generative AI Hub functionalities. It aims to enable configuration-based access to Completions and Embeddings in a CAP project, with minimal implementation overhead.

It is loosely inspired by the original CAP LLM Plugin and extends from there.

Please note that THIS IS NOT official SAP software.

> [!NOTE]
> This plugin is a personal spare-time project and still requires a lot of work. It has bugs and lacks some key features, so contributions and collaborations are welcome and would be greatly appreciated. But... it currently has only a few commits and it basically works, so there is potential :blush:

Introduction

This plugin offers a simplified way to set up an application that includes AI-based conversations. It completely handles system prompts, completions, and chat context, so that the calling application only needs to provide new user messages and act on responses.

Similarly, it provides a simplified way to generate and store embeddings in a HANA database - assuming that its Vector Engine is enabled.

Please read the documentation carefully, especially the section that describes the managed and un-managed modes.

Installing

Install the package via npm install @robypag/sap-ai-core-plugin. Be sure to satisfy its peerDependencies:

"peerDependencies": {
    "@sap/cds": ">= 7.9"
}


Setup

The plugin uses CAP configuration to set itself up, so it requires a cds configuration entry in either package.json or .cdsrc.json. It comes with a preconfigured schema that helps with value input.

You can define a configuration for both the completion capability and the embeddings capability.

At minimum, you must provide the completions configuration section of the plugin:

{
  "cds": {
    "requires": {
      "ai-core": {
        "kind": "ai-core",
        "completions": {
          "destination": "<NAME_OF_BTP_DESTINATION>",
          "resourceGroup": "<AI_CORE_RESOURCE_GROUP>",
          "deploymentId": "<AI_CORE_DEPLOYMENT_ID_OF_COMPLETION_MODEL>",
          "apiVersion": "<AI_CORE_COMPLETION_MODEL_API_VERSION>",
          "temperature": "<COMPLETION_MODEL_TEMPERATURE>"
        }
      }
    }
  }
}

Similarly, you can configure the embeddings section:

{
  "cds": {
    "requires": {
      "ai-core": {
        ...
        "embeddings": {
          "destination": "<NAME_OF_BTP_DESTINATION>",
          "resourceGroup": "<AI_CORE_RESOURCE_GROUP>",
          "deploymentId": "<AI_CORE_DEPLOYMENT_ID_OF_EMBEDDING_MODEL>",
          "apiVersion": "<AI_CORE_EMBEDDING_MODEL_API_VERSION>"
        }
      }
    }
  }
}

Here is a breakdown of each property:

  • destination: the name of the BTP destination that points to the AI Core service instance. The plugin uses SAP Cloud SDK Connectivity to look this up.
  • resourceGroup: the name of the AI Core resource group under which Configurations and Deployments are created - see Resource Groups.
  • deploymentId: the ID of the model deployment.
  • apiVersion: the API version of the model. Find the available values here.
  • temperature: (only valid for Completions) influences the predictability of the generated text. Accepts values from 0 to 1, where 0 is the most deterministic (more predictable and prone to repetition) and 1 is the least deterministic (less predictable but more prone to hallucinations).

> [!WARNING]
> Embeddings can only be used when running on a HANA database. SQLite does not support vectors and therefore cannot process similarity searches.

The plugin can be configured in a managed or an un-managed way. This tells the plugin runtime whether you only want to use the Core API functionalities like completions and embeddings, or whether you want a completely managed solution that includes database operations and context handling.

Un-Managed Configuration

To just use API functions, set up the corresponding configuration object as follows:

"ai-core": {
    "completions": {
        "managed": false,
        // Other properties of the completions object
    }
}

The same applies to embeddings. With this configuration, the plugin checks neither your database model nor the service actions. It acts as a simple proxy between your code and AI Core, using the configuration provided.

This is the default configuration.

Managed Configuration

The managed configuration uses all the embedded functionalities of the plugin, which are described in the following paragraphs.

Use AI Artifacts

The plugin offers a set of aspects to simplify database modeling when using AI capabilities. You can define entities at database level, include the relevant aspect, and you are good to go. Each aspect comes with specific custom annotations that allow the plugin to determine which entity is used for what. You can flexibly decide whether to use aspects or to model your database yourself and add the AI annotations to your entities.

There are currently four available aspects and nine available annotations.

Annotations

Annotations allow you to "mark" specific entities, properties and functions so that the plugin knows how to behave:

| Annotation Name | For | Description |
|---|---|---|
| `@AIConversations` | Entity | Sets the annotated entity as the "Conversations" entity |
| `@AIMessages` | Entity | Sets the annotated entity as the "Messages" entity |
| `@AISystemPrompts` | Entity | Sets the annotated entity as the source for static system prompts |
| `@AIEmbeddingsStorage` | Entity | Sets the annotated entity as the repository for vectorized texts |
| `@AISummarize` | Property | Marks the property as summarized: by default it is the `title` property of the `@AIConversations` entity. The value of this property will be generated by a completion |
| `@AIEmbedding` | Property | Marks the property as the container for vector values in the `@AIEmbeddingsStorage` entity. Must be assigned to a field of type `Vector` |
| `@AITextChunk` | Property | Marks the property as the container for text values in the `@AIEmbeddingsStorage` entity |
| `@AICompletion` | Action | Annotates an action to act as a completion endpoint |
| `@AIEmbeddingGenerator` | Action | Annotates an action to act as an embedding vector generator |

Entity and property annotations are used at runtime to determine how to properly handle the persistence of Messages, Conversations, and Embeddings. Action annotations are used at runtime - specifically at the cds.once('served') event - to attach custom handlers to actions and automatically handle the processing of Completions and Embedding generation. More on this later.

Aspects

The above annotations are automatically assigned if you decide to use the pre-defined aspects defined in index.cds.

  • AIConversations: Represents the base entity that contains the list of conversations between a user and the AI. It comes with a predefined @AIConversations annotation and includes a single title property annotated with @AISummarize.
  • AIMessages: Represents the base entity that contains messages exchanged between a user and the AI for a given Conversation. Includes the following properties:

| Property | Type | Description |
|---|---|---|
| `content` | `LargeString` | Content of the message sent by either the user or the AI |
| `role` | String enum `'user'`/`'system'`/`'assistant'`/`'tool'` | The role of the message sender |

  • AISystemPrompt: Allows you to define static texts to be used as context during a conversation. They represent the value of the system message role in a conversation. There are currently two available types: SIMPLE and CONTEXT_AWARE. As the name implies, the first is used during simple conversations, whereas the latter is considered during RAG-aware chats.
  • AIDocumentChunks: This is the base entity that contains vector embeddings. Comes with 3 properties:

| Property | Type | Description |
|---|---|---|
| `embedding` | `Vector(1536)` | The vector representation of a text chunk. Comes annotated with `@AIEmbedding` |
| `text` | `LargeString` | The original text chunk from which vectors are generated. Comes annotated with `@AITextChunk` |
| `source` | `LargeString` | The reference to the original text or document from which vectors and text chunks are derived |

Entity Modeling

As described, the above artifacts allow you to design a simple database model that satisfies the minimal configuration needed to perform conversations and embeddings.

> [!IMPORTANT]
> Since aspects cannot manage compositions or associations, developers must add the corresponding properties to entities annotated with @AIConversations and @AIMessages (regardless of whether the entities include the provided aspects or not). Specifically:

entity Chats: AIConversations {
  ...
  Messages: Composition of many Messages on Messages.Chat = $self;
}
...
entity Messages: AIMessages {
  ...
  key Chat: Association to one Chat;
}

This way, the plugin knows how to deal with the relationships between the two entities. In future enhancements, the plugin will automatically add the missing relationships between these base entities.

Completions

Completions are the most basic functionality of an AI chat. They allow message exchange between a user and the AI. Performing a "completion" with AI Core is very easy: given a deployment ID for a completion model, one POST call to the completion endpoint will return an AI response.

The plugin simplifies the consumption of the completion model by attaching itself to an arbitrary OData action annotated with @AICompletion.

> [!IMPORTANT]
> Please note that the annotated action must satisfy the following parameter signature:

{ conversationID: uuid | null, content: string, useRag: boolean }
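As an illustration, here is a minimal sketch (not part of the plugin) of building and sanity-checking a payload that satisfies the signature above; the helper name `buildCompletionPayload` is hypothetical:

```javascript
// Hypothetical helper: builds a payload matching the @AICompletion
// action signature { conversationID: uuid | null, content: string, useRag: boolean }.
function buildCompletionPayload(content, { conversationID = null, useRag = false } = {}) {
  if (typeof content !== 'string' || content.length === 0) {
    throw new Error('content must be a non-empty string');
  }
  return { conversationID, content, useRag: Boolean(useRag) };
}

// First message of a brand-new conversation: no conversationID yet.
const payload = buildCompletionPayload('Hello, who are you?');
// → { conversationID: null, content: 'Hello, who are you?', useRag: false }
```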

To be effective, Completions must take care of two key points: the system prompt and the chat context.

System Prompt

The system prompt defines the basic behavior of the AI during a chat session: it essentially provides "the instructions" to the AI, so that its answers are generated around a specific topic (or persona) and are not completely non-deterministic. See prompt engineering.

It is usually sent once per conversation and is hidden from the user's perspective; however, there are instances in which the system prompt can change dynamically during a conversation - for example during RAG-aware chats.

During "normal", non-RAG conversations, the system context is calculated once: at the creation of a new Conversation, that is, when a Message that has no relationship with an existing Conversation is sent.

Chat Context

Chat context represents the entire history of messages exchanged during a conversation: an effective AI chat "remembers" previous messages, in order not to repeat itself and to keep a true sense of conversing. To keep a chat context, LLMs usually require the entire history of messages to be sent whenever a new one is added. OpenAI defined a common standard in which messages are sent as a JSON array, where each element is an object like this:

{
    role: 'system' | 'user' | 'assistant',
    content: 'an arbitrary string that represents a message'
}
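In managed mode the plugin maintains this history for you; the following self-contained sketch only illustrates the shape of the context array that is rebuilt and sent on every completion call:

```javascript
// Illustrative only: a chat context in the OpenAI message format shown above.
const context = [
  { role: 'system', content: 'You are a helpful assistant for a bookshop.' }
];

// Appends a message without mutating the existing history.
function addMessage(ctx, role, content) {
  return [...ctx, { role, content }];
}

let chat = addMessage(context, 'user', 'Recommend me a novel.');
chat = addMessage(chat, 'assistant', 'You might enjoy "Wuthering Heights".');
// `chat` now holds the full history that would accompany the next user message.
```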

The AI Core plugin automatically manages the chat context, by storing Messages of a Conversation in the entities annotated with @AIConversations and @AIMessages: on each message, the context is rebuilt and sent to the completion endpoint.

> [!TIP]
> Since AI Core Generative Hub supports multiple completion models, there is no unified endpoint that can generically serve all LLMs. For example, OpenAI models like GPT-4o or GPT-4o-mini respond to a URL like /chat/completions?api-version=xxx, whereas Anthropic models like claude-3.5-sonnet respond to a URL like /invoke.
>
> This plugin does its best to automatically determine the correct endpoint; however, it is currently a static mapping between the model name and the corresponding completion URL. You can find it here.

Embeddings

As an LLM would say:

Embedding is a way to represent data, like words or images, as numerical vectors to capture relationships and meaning. Embeddings allow machines to understand, compare, and process data more effectively by transforming complex information into numerical forms that highlight patterns, similarities, and differences.

The plugin provides an easy way to produce embeddings from an arbitrary text or piece of data. There are currently two ways to get embeddings:

  • Using an action annotated with @AIEmbeddingGenerator: whatever text is sent to the action is returned as a numerical vector.
  • Connecting via cds.connect.to('ai-core') and calling the getEmbeddings() API function.

> [!WARNING]
> To use embeddings, the corresponding cds configuration MUST be set. See Setup.

RAG-Aware Completions

RAG-aware completions combine Retrieval-Augmented Generation (RAG) with conversational AI, enhancing responses by retrieving relevant external information, leading to more accurate, informed, and contextually appropriate dialogue in real-time.

The plugin uses the HANA Vector Engine to perform similarity searches and provide additional, specific context to the LLM.

During RAG-aware conversations, the system prompt is re-calculated on every new message: using the user query, a similarity search is performed on the entity annotated with @AIEmbeddingsStorage, and the resulting context is used as the system prompt. This allows AI answers to be more tailored to application needs, avoiding a broader context and limiting answers to a specific topic.
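The similarity search at the core of this step can be illustrated with a simplified, in-memory version; the real plugin delegates scoring to HANA's COSINE_SIMILARITY, so the code below is only a sketch of the idea, not the plugin's implementation:

```javascript
// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score each stored chunk against the query vector and keep the best
// matches above a minimum score, mirroring the minScore filter.
function similaritySearch(queryVector, chunks, minScore = 0.7) {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(queryVector, c.embedding) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```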

RAG-aware conversations are activated by providing a truthy value to the parameter useRag of the completion action annotated with @AICompletion.

API

> [!NOTE]
> API calls still require the minimal configuration for embeddings and completions. See Setup.

You can always call the Core API functions, regardless of the managed aspects and actions. There are three main functions:

| Function | Parameters | Description |
|---|---|---|
| `genericCompletion(messages)` | `Array<{ role: string, content: string }>` | Performs a completion call to the LLM deployed in the configured `deploymentId`. It expects a full chat context, including the `system` role. Returns the AI response in the same format. |
| `createEmbeddings(text)` | `text: string` | Generates a vector of embeddings, using the LLM deployed in the configured `deploymentId`. Returns an array of numbers. |
| `vectorSearch(params)` | See below | Allows the execution of a generic similarity search on HANA |

The third function, `vectorSearch`, performs vector-based searches on a user-specified table. It accepts the following parameters:

| Name | Type | Description |
|---|---|---|
| `query` | string | The text to search for |
| `tableName` | string | The table name, in HANA format, that contains embeddings and texts, e.g. `SAP_DEMO_EMBEDDINGS` |
| `embeddingColumnName` | string | The name of the table field that contains the vectorized representation of the data. Must be of type `REAL_VECTOR` (`cds.Vector(1536)` in CDS) |
| `textColumnName` | string | The name of the table field that contains the textual representation of the data |
| `searchAlgorithm` | string | The name of the similarity algorithm. HANA currently supports `COSINE_SIMILARITY` and `L2DISTANCE` |
| `minScore` | number | A value between 0 and 1, used to filter out elements with a score lower than the specified value |
| `candidates` | number | Number of candidates to read from HANA |

It returns an object with the found content and a metrics array that includes similarity scores and the table entry that generated each result:

{
    content: ['I am one result in textual representation', 'I am number two'],
    metrics: [{
        score: 0.945424895818,
        textContent: 'I am one result in textual representation',
        tableEntry: {
            foo: 'bar'
        }
    }, {...}]
}
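To show how the shape above can be consumed, here is a small hedged sketch that picks the single best-scoring entry from a vectorSearch-style result; the helper `bestMatch` is hypothetical, not part of the plugin:

```javascript
// Hypothetical post-processing of a vectorSearch result: find the entry
// with the highest similarity score, or null if there are no matches.
function bestMatch(result) {
  if (!result.metrics || result.metrics.length === 0) return null;
  return result.metrics.reduce((best, m) => (m.score > best.score ? m : best));
}

const result = {
  content: ['I am one result in textual representation', 'I am number two'],
  metrics: [
    { score: 0.9454, textContent: 'I am one result in textual representation', tableEntry: { foo: 'bar' } },
    { score: 0.61, textContent: 'I am number two', tableEntry: { foo: 'baz' } }
  ]
};
// bestMatch(result).textContent → 'I am one result in textual representation'
```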

Local Testing

The plugin can only be tested in association with a CAP application: you can quickly spin up a basic bookshop application and add the plugin. Testing with SQLite only allows the usage of simple Completions: vectors are not supported in SQLite, so Embeddings and RAG-aware Completions will not work.

There are currently no checks performed by the plugin on this: if you try to deploy a model that uses Vectors to SQLite, the database driver will throw an error.

You can, however, perform hybrid testing and bind your application to an SAP HANA service while still running locally. To simplify development, bind to a destination service instance as well, in order to easily consume the required destination that points to the AI Core deployments.

Testing with Jest

Under development

Contributing

If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Please have a look at CONTRIBUTING.md for additional info.

Code of Conduct

See CODE_OF_CONDUCT.md

Licensing

See LICENSE
