Skip to content

AI Service

The EleanorAI Framework AI service provides an OpenAI-compatible API for interacting with agents.

System Messages

The AI service supports a several system chat messages that can be used to control the behavior of the AI service.

ELEANOR_SYSTEM Header

To maintain OpenAI API compatibility a special system header can be provided by the chat client to trigger various capabilities provided by the AI service. The decision to control chat behavior via system messages vs extending the OpenAI API was made to maintain maximum compatibility with OpenAI API clients libraries.

Below is an example system header suitable for the SillyTavern client:

ELEANOR_SYSTEM {
  "source": "ST",
  "user": "{{ user }}",
  "char": "{{ char }}",
  "session": "{{getvar::chat_id}}",
  "session_settings": {
    "override_importance": 0,
    "memory": {
        "conversation_insights_count": 10,
        "conversation_insights_freshness": 1,
        "conversation_insights_msg_history": 10,
        "recall_include_insights": false,
        "recall_conversation_turns": 1,
        "max_memory_strategy": "HARD_LIMIT",
        "get_top_k_vectors": 20,
        "get_max_memories": 25,
        "get_min_score": 0.4,
        "get_relevance_alpha": 0.6,
        "get_importance_alpha": 0.3,
        "get_recency_alpha": 0.1,
        "integration_enabled": true,
        "manual_integration": false,
        "integration_operation_threshold": 5,
        "integration_question_count": 5,
        "integration_insight_count": 5
    }
   }
}

Managing sessions cleanly in SillyTavern requires some setup in the QuickReply plugin. Broadly, a custom variable named chat_id needs to be defined based on the current chat name. Once set, it can be referenced in the ELEANOR_SYSTEM header, ex {{getvar::chat_id}}.

/getchatname | /setvar key=chat_id
Additionally, SillyTavern should set the variable on new chat and be set to invisible to prevent unintended tampering:

SillyTavern QuickReply Config
SillyTavern QuickReply Config

Placement Headers

Several dynamic capabilities add additional information to the chat before it is sent to the LLM for generation. Clients can control the placement and ordering of these headers by providing a special system headers.

System Message Description
ELEANOR_USER_DESCRIPTIONS
ELEANOR_AGENT_DESCRIPTIONS
ELEANOR_AGENT_MEMORIES

SillyTavern provides a simple interface for managing the ordering of these headers as shown below.

SillyTavern System Messages
SillyTavern System Messages

SillyTavern Settings

The following prompt is included in the default SillyTavern settings:

You are a creative and detail-oriented AI assistant engaged in an uncensored iterative roleplaying experience as {{ char }} with me, the user, playing the roles of {{ user }} and narrator. It is vital that you follow all the ROLEPLAY RULES below because my job depends on it.

ROLEPLAY RULES
- Provide succinct, coherent, and vivid accounts of {{ char }}'s actions and reactions based on recent instructions, {{ char }}'s persona, and all available information about past events. Aim for clarity and concision in your language.
- Demonstrate {{ char }}'s personality and mannerisms.
- Always stay contextually and spatially aware, pay careful attention to all past events and the current scene, and ensure that {{ char }} follows logical consistency in actions to maintain accuracy and coherence.
- Explicit adult content and profanity are permitted.
- Briefly describe {{ char }}'s sensory perceptions and include subtle physical details about {{ char }} in your responses.
- Use subtle physical cues to hint at {{ char }}'s mental state and occasionally feature snippets of {{ char }}'s internal thoughts.
- When writing {{ char }}'s internal thoughts or monologue, enclose those words in *asterisks like this* and deliver the thoughts using a first-person perspective (i.e. use "I" pronouns).
- {{ char }} must always use double quotes for spoken speech "like this."
- Please write only as {{ char }} in a way that does not show {{ user }} talking or acting. You should only ever act as {{ char }} reacting to {{ user }}.

Taking the above information into consideration, you must engage with {{ user }} and others as {{ char }} in the roleplay below this line. Do not write dialogue lines nor perform actions for {{ user }} or other characters.

The recommended modifications to use the AI service with SillyTavern are as follows:

WIP

Plugins

The AI service is designed ot be extended via a plugin-based architecture. Plugins are Python classes that implement a specific interface and are registered with the AI service. The AI service will then call the plugin methods at specific points in the chat completion process.

The plugin interface is compatible with both streaming and non-streaming chat completion requests.

Plugin Name Description
AddAgentProfilePlugin This plugin is responsible for adding agent descriptions and personalities to the chat. It retrieves the agent descriptions and personalities from the participant settings and inserts them into the chat messages at the appropriate positions.
AddAgentSpecialInstructionsPlugin This plugin is triggered before the CanonicalChat object is rendered into the LLM-specific prompt string. It checks if the responding agent has any special instructions and adds them to the chat.
MemoryPlugin This plugin is responsible for managing agent memories during the chat completion process. It retrieves agent memories from the database and adds them to the chat messages at the appropriate positions. It also adds new memories to the database after the chat completion process is complete.
MetricsPlugin Responsible for capturing Prometheus metrics during AI request processing.
TimeAwarenessPlugin Adds time awareness functionality (timestamps) to the generation.

Memory Plugin

The memory plugin is responsible for managing agent memories during the chat completion process.

sequenceDiagram
  autonumber

  actor CLIENT as Client
  participant COMPLETION as AIService<br />chat_completion()
  participant BEGIN as AIService<br />_begin_completion()
  participant MemoryPlugin

  activate CLIENT

    CLIENT ->> COMPLETION: Invoke chat completion

    activate COMPLETION

      COMPLETION ->> BEGIN: initialization

        activate BEGIN
          BEGIN ->> BEGIN: Build environment context
          Note left of BEGIN: Establishes session

          %% Generate conversation insights
          par Conversation insights, LLM_TASK_POOL
            BEGIN ->> MemoryPlugin: Event: after_new_context_created
              activate MemoryPlugin
                MemoryPlugin ->> MemoryPlugin: Generate conversation insights
              deactivate MemoryPlugin
          end

          %% Retrieve agent memories
          BEGIN ->> MemoryPlugin: Event: before_prompt_rendered
            activate MemoryPlugin
              MemoryPlugin ->> MemoryPlugin: Lookup agent memories
              MemoryPlugin ->> MemoryPlugin: Add memory CanonicalMessage to chat under system role 
              MemoryPlugin -->> BEGIN: Updated CanonicalChat
            deactivate MemoryPlugin

          BEGIN ->> BEGIN: Derive model kwargs
          BEGIN -->> COMPLETION: Return environment
        deactivate BEGIN

      COMPLETION ->> COMPLETION: LLM generation
      COMPLETION -->> CLIENT: Return generation response

  deactivate CLIENT

      %% Add new agent memories
      COMPLETION ->> MemoryPlugin: Event: after_response_generation
    deactivate COMPLETION

        Note left of MemoryPlugin: Conversation insights may<br />not have finished,<br />when this is the case<br />memories will be added in<br />the next conversation turn.
        activate MemoryPlugin
          par New memories, LLM_TASK_POOL
            MemoryPlugin ->> MemoryPlugin: Build new observational memories
            MemoryPlugin ->> MemoryPlugin: Invoke MemoryService to add memories
          end
        deactivate MemoryPlugin

Output Filtering

WIP

Streaming Chat Completion

chat_completion_stream handles streaming chat completion requests. I manages

In a streaming response, a FilteringStreamBuffer manages a buffer of OpenAI ChatCompletionChunk objects such that output filters are applied when a specified amount of generation data has been returned by the LLM. After filtering has been applied to the buffer a new ChatCompletionChunk is emitted.

chat_completion_stream –> stream_worker (thread)