llm_settings
LLM settings module
Classes
LLMFactory
Bases: IoCFactoryModel
OpenAI-compatible API endpoint. See langchain_openai.OpenAI.
Attributes:
-
openai_api_key
(SecretStr
) – OpenAI API key for the endpoint
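As a rough illustration of what this factory ultimately builds, the sketch below constructs a langchain_openai.OpenAI client against an OpenAI-compatible endpoint. The endpoint URL, model name, and key are placeholders, and the parameter set shown is the standard langchain_openai one, not a confirmed list of LLMFactory fields.
from langchain_openai import OpenAI

# Hedged sketch: LLMFactory settings are expected to produce a client like
# this. The endpoint URL, model name, and key below are placeholders.
llm = OpenAI(
    openai_api_key="sk-placeholder",
    base_url="http://localhost:8000/v1",
    model="mistral-7b-instruct",
)
print(llm.invoke("Say hello."))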
LLMSettings
Bases: BaseModel
LLM connection configuration
Attributes:
-
name
(str
) – Name of the LLM configuration
-
max_position_embeddings
(int
) – Maximum number of tokens this model can accept
-
description
(str
) – Human-readable description of the LLM connection. Best practice is to document the intended use case(s), limitations, and other quirks.
-
system_prompt
(str
) – Text injected at the beginning of the prompt. This will be added outside of any formatting triggered by
chat_completion_format
-
prompt_format
(str
) – LLM completion prompt format
-
output_filter_factories
(List[IoCFactoryModel]
) – List of output filter factories; each should implement eleanor.llm.generation_filters.GenerationFilter. These filters are applied to the LLM response on API calls. For filtering streaming responses, consider setting
stream_buffer_flush_hwm
as well. -
chat_completion_format
(str
) – When using a completion-based LLM (such as Mistral) that doesn’t easily support multi-user chat sessions and system messages, it can be desirable to format the incoming chat request before applying the LLM’s native format template. In the Mistral example, say we have a multi-user chat with system messages and need to use its simple [INST],[/INST] formatting. When
chat_completion_format
is set to ‘chatml’, the incoming chat request will first get converted to a ChatML string before it is encapsulated in the LLM’s normal prompt format. When this is set to None, the LLM’s normal
prompt_format
will be used. -
stream_buffer_flush_hwm
(int
) – When streaming a response on the Eleanor Framework completions API, this value determines how many characters are held by the buffer before they are sent back to the client. Effectively, this controls the length of the string seen by filters. Higher values give response filtering more data to match patterns against, at the cost of a more delayed response. A value of 0 will disable buffering and filtering on streaming responses. This value has no effect on the completion API.
-
stream_char_response_delay
(float
) – When streaming a response on the Eleanor Framework completions API, this value determines how long the server waits before sending a character chunk back to the client. This value has no effect on the completion API. Small values are typically better but can spike CPU usage if too small.
-
tokenizer_factory
(IoCFactoryModel
) – Factory for the LLM tokenizer instance. When the LLM settings object is created, it will be used to initialize the ‘tokenizer’ field.
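Taken together, a configuration payload covering the attributes above might look like the following sketch. The factory and tokenizer_factory bodies are left empty on purpose, since their real shape is defined by IoCFactoryModel and is not documented in this module, and the prompt_format value is a guess at the expected template syntax.
# Hedged sketch of an LLMSettings payload; factory bodies are placeholders.
config = {
    "name": "mistral-7b",
    "max_position_embeddings": 8192,
    "description": "General-purpose completion model for internal tooling.",
    "prompt_format": "[INST] {prompt} [/INST]",
    "chat_completion_format": "chatml",
    "stream_buffer_flush_hwm": 100,
    "stream_char_response_delay": 0.01,
    "factory": {},            # IoCFactoryModel settings go here
    "tokenizer_factory": {},  # IoCFactoryModel settings go here
}
# With real factory payloads in place, validation is the usual pydantic call:
# settings = LLMSettings.model_validate(config)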
Attributes
chat_completion_format
class-attribute
instance-attribute
chat_completion_format: Optional[str] = Field(
default=None,
description="When a completion-based LLM (such as Mistral) that doesn't easily support multi-user chat sessions and system messages, it can be desirable to format the incoming chat request first before using the LLM's native format template. In the Mistral example, say we have a multi-user chat with system messages and need to use their simple [INST],[/INST] formatting. When ``chat_completion_format`` is set to 'chatml', the incoming chat request will first get converted to a ChatML string before it is encapsulated in the LLM's normal prompt format. When this is set to None, the LLM's normal ``prompt_format`` will be used.",
)
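For intuition, the two-step conversion described here can be sketched as follows. This is an illustrative reimplementation, not the framework's actual code:
# Step 1: flatten a multi-user chat (with system messages) into a ChatML string.
def to_chatml(messages: list[dict]) -> str:
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

messages = [
    {"role": "system", "content": "You are terse."},
    {"role": "alice", "content": "Summarize yesterday's meeting."},
    {"role": "bob", "content": "Keep it to one paragraph."},
]

# Step 2: encapsulate the ChatML string in the model's native prompt format,
# e.g. Mistral's [INST]...[/INST] wrapper.
prompt = f"[INST] {to_chatml(messages)} [/INST]"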
description
class-attribute
instance-attribute
description: Optional[str] = Field(
default=None,
description="Human readable description of the LLM connection. BEst practice is to document the intended use cas(es), limitations, and other quirks.",
)
factory
class-attribute
instance-attribute
factory: LLMFactory = Field(
..., description="LLM connection factory settings"
)
format_kwargs
class-attribute
instance-attribute
format_kwargs: KwargsModel = Field(
default_factory=KwargsModel,
description="Prompt format kwargs",
)
mapper_to_str_kwargs
class-attribute
instance-attribute
mapper_to_str_kwargs: KwargsModel = Field(
default_factory=KwargsModel,
description="Additional kwargs passed to the mapper.to_str() method when rendering a completion prompt string.",
)
max_position_embeddings
class-attribute
instance-attribute
max_position_embeddings: int = Field(
...,
title="Max Position Embeddings",
description="The maximum number of tokens this model can accept. For huggingFace models, this value can be found in the model's config.json file. The intended use case for this value is to determine the maximum number of input tokens a model can accept when stuffing data into a prompt. Since OSS / self-hosted models are usually behind some other inferencing service, the path the the model's config.json should not be assumed and thus needed as part of the Eleanor Framework LLM configuration.",
)
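The intended use, budgeting input tokens when stuffing a prompt, might look like the sketch below. It assumes a tokenizer exposing an encode() method that returns token ids; the real BaseTokenizer interface is not shown in this module.
# Hedged sketch: check that a stuffed prompt leaves room for generation.
def fits_context(tokenizer, prompt: str, max_position_embeddings: int,
                 max_new_tokens: int = 512) -> bool:
    """tokenizer.encode() returning a list of token ids is an assumption."""
    used = len(tokenizer.encode(prompt))
    return used + max_new_tokens <= max_position_embeddings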
model_config
class-attribute
instance-attribute
name
class-attribute
instance-attribute
name: str = Field(
...,
title="Name",
description="Name of the LLM configuration. This is a simplified version of the LLM name used internally by the framework to bind to chain configurations, which can apply LLM-specific settings to override defaults",
)
output_filter_factories
class-attribute
instance-attribute
output_filter_factories: List[IoCFactoryModel] = Field(
default_factory=list,
description="List of output factories, should implement eleanor.llm.generation_filters.GenerationFilter. These are filters that get applied to the LLM response on API calls. For filtering streaming responses, consider setting ``stream_buffer_flush_hwm`` as well.",
)
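The GenerationFilter interface itself is not documented in this module. Purely as an illustration, a redaction filter might look like the sketch below; the filter() method name and signature are assumptions:
import re

# Hypothetical output filter; the real GenerationFilter contract may differ.
class RedactEmailFilter:
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def filter(self, text: str) -> str:
        # Replace anything that looks like an email address in the response.
        return self.EMAIL.sub("[redacted]", text)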
prompt_format
class-attribute
instance-attribute
stream_buffer_flush_hwm
class-attribute
instance-attribute
stream_buffer_flush_hwm: int = Field(
default=100,
ge=0,
description="When streaming a response on the Eleanor Framework completions API, this value determines how many characters are held by the buffer before they are sent bak to the client. Effectively, this controls the length of string seen by filters. Higher values will give response filtering more data to look at then pattern matching at the cost of a more delayed response. A value of 0 will disable buffering and filtering on streaming responses. This value has no effect on the completion API.",
)
stream_char_response_delay
class-attribute
instance-attribute
stream_char_response_delay: float = Field(
default=0.01,
ge=0.0,
description="When streaming a response on the Eleanor Framework completions API, this value determines how long the server waits before sending a character chunk back to the the client. This value has no effect on the completion API. Small values are typically better but could spike CPU usage if too small.",
)
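Together, stream_buffer_flush_hwm and stream_char_response_delay describe a buffered streaming loop along the lines of the sketch below. This is an illustrative model of the behavior, not the framework's implementation:
import asyncio

# Sketch: characters accumulate until the high-water mark, filters see each
# flushed chunk, and a small delay paces chunks back to the client.
async def stream_response(chunks, filters, hwm: int = 100, delay: float = 0.01):
    if hwm == 0:  # buffering and filtering disabled for streaming
        async for piece in chunks:
            yield piece
        return
    buffer = ""
    async for piece in chunks:
        buffer += piece
        if len(buffer) >= hwm:
            for f in filters:
                buffer = f.filter(buffer)
            yield buffer
            buffer = ""
            await asyncio.sleep(delay)
    if buffer:  # flush whatever remains at end of stream
        for f in filters:
            buffer = f.filter(buffer)
        yield buffer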
system_prompt
class-attribute
instance-attribute
system_prompt: Optional[str] = Field(
default=None,
description="Text injected at the beginning of the prompt. This will be added outside of any formatting triggered by ``chat_completion_format``",
)
tokenizer
class-attribute
instance-attribute
tokenizer: SkipJsonSchema[BaseTokenizer] = Field(
default=None,
title="LLM Tokenizer Instance (volatile)",
description="Some capabilities in the Eleanor Framework such as token-based chunking require an internal tokenizer to work properly. This field will be initialized when the model is created via the ``tokenizer_factory`` configuration.",
frozen=False,
exclude=True,
validate_default=False,
)
tokenizer_factory
class-attribute
instance-attribute
tokenizer_factory: IoCFactoryModel = Field(
...,
title="Tokenizer Factory",
description="Factory for the LLM tokenizer instance. When the LLM settings object is created it will be used to initialize the 'tokenizer' field",
)
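IoCFactoryModel's fields are defined outside this module, so only the general idea can be sketched: resolve a dotted class path and instantiate it with keyword arguments. The function below and the example path are illustrative assumptions:
from importlib import import_module

# Generic sketch of inversion-of-control instantiation; the real
# IoCFactoryModel field names and behavior may differ.
def build(dotted_path: str, **kwargs):
    module_path, _, attr = dotted_path.rpartition(".")
    cls = getattr(import_module(module_path), attr)
    return cls(**kwargs)

# Hypothetical usage producing the instance assigned to the 'tokenizer' field:
# tokenizer = build("mypkg.tokenizers.SimpleTokenizer", model="mistral-7b")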