llm_settings

LLM settings module

Classes

LLMFactory

Bases: IoCFactoryModel

OpenAI-compatible API endpoint. See langchain_openai.OpenAI

Attributes:

  • openai_api_key (SecretStr) –

    OpenAI API key

Attributes

openai_api_key class-attribute instance-attribute
openai_api_key: SecretStr = Field(
    default_factory=lambda: SecretStr(""),
    exclude=True,
    description="OpenAPI API key",
)
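
A minimal sketch of supplying the key from the environment. The import path is an assumption, and any fields required by the IoCFactoryModel base are left as a placeholder:

import os

from pydantic import SecretStr

from llm_settings import LLMFactory  # adjust to the module's real import path

factory = LLMFactory(
    openai_api_key=SecretStr(os.environ["OPENAI_API_KEY"]),
    # ...plus whatever fields the IoCFactoryModel base requires in your setup.
)

# Because the field is declared with exclude=True, the key stays out of any
# serialized/dumped configuration.
assert "openai_api_key" not in factory.model_dump()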

LLMSettings

Bases: BaseModel

LLM connection configuration

Attributes:

  • name (str) –

    Name

  • max_position_embeddings (int) –

    Max Position Embeddings

  • description (str) –

    Human-readable description of the LLM connection. Best practice is to document the intended use case(s), limitations, and other quirks.

  • system_prompt (str) –

    Text injected at the beginning of the prompt. This will be added outside of any formatting triggered by chat_completion_format

  • prompt_format (str) –

    LLM completion prompt format

  • output_filter_factories (List[IoCFactoryModel]) –

    List of output factories, should implement eleanor.llm.generation_filters.GenerationFilter. These are filters that get applied to the LLM response on API calls. For filtering streaming responses, consider setting stream_buffer_flush_hwm as well.

  • chat_completion_format (str) –

    When using a completion-based LLM (such as Mistral) that doesn't easily support multi-user chat sessions and system messages, it can be desirable to format the incoming chat request before applying the LLM's native format template. In the Mistral example, say we have a multi-user chat with system messages and need to use Mistral's simple [INST],[/INST] formatting. When chat_completion_format is set to 'chatml', the incoming chat request will first be converted to a ChatML string before it is encapsulated in the LLM's normal prompt format. When this is set to None, the LLM's normal prompt_format will be used.

  • stream_buffer_flush_hwm (int) –

    When streaming a response on the Eleanor Framework completions API, this value determines how many characters are held in the buffer before they are sent back to the client. Effectively, this controls the length of the string seen by filters. Higher values give response filtering more data to pattern-match against, at the cost of a more delayed response. A value of 0 disables buffering and filtering on streaming responses. This value has no effect on the completion API.

  • stream_char_response_delay (float) –

    When streaming a response on the Eleanor Framework completions API, this value determines how long the server waits before sending a character chunk back to the client. This value has no effect on the completion API. Small values are typically better, but may spike CPU usage if set too low.

  • tokenizer_factory (IoCFactoryModel) –

    Factory for the LLM tokenizer instance. When the LLM settings object is created, it will be used to initialize the 'tokenizer' field.
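
Taken together, a configuration built from these fields might look roughly like the sketch below. The factory arguments and the prompt_format name are placeholders, since the concrete IoC factory configuration depends on the deployment:

settings = LLMSettings(
    name="mistral-7b-instruct",              # simplified name used to bind chain configurations
    description="General-purpose chat model; not tuned for long-document summarization.",
    max_position_embeddings=32768,           # taken from the model's config.json
    prompt_format="mistral",                 # assumed format name, for illustration only
    chat_completion_format="chatml",         # flatten chat requests to ChatML first
    stream_buffer_flush_hwm=100,
    stream_char_response_delay=0.01,
    factory=LLMFactory(...),                 # placeholder; see LLMFactory above
    tokenizer_factory=IoCFactoryModel(...),  # placeholder; used to build the tokenizer field
)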

Attributes

chat_completion_format class-attribute instance-attribute
chat_completion_format: Optional[str] = Field(
    default=None,
    description="When a completion-based LLM (such as Mistral) that doesn't easily support multi-user chat sessions and system messages, it can be desirable to format the incoming chat request first before using the LLM's native format template. In the Mistral example, say we have a multi-user chat with system messages and need to use their simple [INST],[/INST] formatting. When ``chat_completion_format`` is set to 'chatml', the incoming chat request will first get converted to a ChatML string before it is encapsulated in the LLM's normal prompt format. When this is set to None, the LLM's normal ``prompt_format`` will be used.",
)
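
To make the two-stage formatting concrete, the sketch below (an illustration only, not framework code) shows what setting chat_completion_format to 'chatml' implies for a multi-user chat sent to a Mistral-style model:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "name": "alice", "content": "Summarize the report."},
    {"role": "user", "name": "bob", "content": "Keep it under 100 words."},
]

# Stage 1: the chat request is flattened into a single ChatML string.
chatml = "".join(
    f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
)

# Stage 2: that string is wrapped in the LLM's normal prompt format, e.g.
# Mistral's [INST] ... [/INST] instruction markers.
prompt = f"[INST] {chatml} [/INST]"
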
description class-attribute instance-attribute
description: Optional[str] = Field(
    default=None,
    description="Human readable description of the LLM connection. BEst practice is to document the intended use cas(es), limitations, and other quirks.",
)
factory class-attribute instance-attribute
factory: LLMFactory = Field(
    ..., description="LLM connection factory settings"
)
format_kwargs class-attribute instance-attribute
format_kwargs: KwargsModel = Field(
    default_factory=KwargsModel,
    description="Prompt format kwargs",
)
mapper_to_str_kwargs class-attribute instance-attribute
mapper_to_str_kwargs: KwargsModel = Field(
    default_factory=KwargsModel,
    description="Additional kwargs passed to the mapper.to_str() method when rendering a completion prompt string.",
)
max_position_embeddings class-attribute instance-attribute
max_position_embeddings: int = Field(
    ...,
    title="Max Position Embeddings",
    description="The maximum number of tokens this model can accept. For huggingFace models, this value can be found in the model's config.json file. The intended use case for this value is to determine the maximum number of input tokens a model can accept when stuffing data into a prompt. Since OSS / self-hosted models are usually behind some other inferencing service, the path the the model's config.json should not be assumed and thus needed as part of the Eleanor Framework LLM configuration.",
)
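
For instance, a prompt-stuffing step might budget input tokens against this value; the names below (settings, max_new_tokens) are illustrative only:

# Reserve part of the context window for the generated answer and keep the
# stuffed prompt within what remains.
max_new_tokens = 512  # hypothetical reservation for the generated output
input_budget = settings.max_position_embeddings - max_new_tokens
# Any prompt-stuffing logic should keep its token count at or below input_budget.
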
model_config class-attribute instance-attribute
model_config = ConfigDict(arbitrary_types_allowed=True)
name class-attribute instance-attribute
name: str = Field(
    ...,
    title="Name",
    description="Name of the LLM configuration. This is a simplified version of the LLM name used internally by the framework to bind to chain configurations, which can apply LLM-specific settings to override defaults",
)
output_filter_factories class-attribute instance-attribute
output_filter_factories: List[IoCFactoryModel] = Field(
    default_factory=list,
    description="List of output factories, should implement eleanor.llm.generation_filters.GenerationFilter. These are filters that get applied to the LLM response on API calls. For filtering streaming responses, consider setting ``stream_buffer_flush_hwm`` as well.",
)
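
A rough sketch of a custom output filter is shown below. The hook name and signature are assumptions; consult the GenerationFilter base class in eleanor.llm.generation_filters for the real contract:

import re

from eleanor.llm.generation_filters import GenerationFilter

class RedactEmailFilter(GenerationFilter):
    """Illustrative filter that masks email addresses in generated text."""

    # Assumed hook name and signature, for illustration only.
    def filter(self, text: str) -> str:
        return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[redacted]", text)
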
prompt_format class-attribute instance-attribute
prompt_format: str = Field(
    default="simple",
    description="LLM completion prompt format",
)
stream_buffer_flush_hwm class-attribute instance-attribute
stream_buffer_flush_hwm: int = Field(
    default=100,
    ge=0,
    description="When streaming a response on the Eleanor Framework completions API, this value determines how many characters are held by the buffer before they are sent bak to the client. Effectively, this controls the length of string seen by filters. Higher values will give response filtering more data to look at then pattern matching at the cost of a more delayed response. A value of 0 will disable buffering and filtering on streaming responses. This value has no effect on the completion API.",
)
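
A simplified illustration (not the framework's code) of how a flush high-water mark of this kind behaves while streaming:

def stream_with_buffer(chunks, hwm=100):
    """Hold streamed text until at least ``hwm`` characters are buffered."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if len(buffer) >= hwm:
            # With hwm or more characters collected, output filters have enough
            # context to pattern-match before text is sent to the client.
            yield buffer
            buffer = ""
    if buffer:
        yield buffer  # flush whatever remains at the end of the stream
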
stream_char_response_delay class-attribute instance-attribute
stream_char_response_delay: float = Field(
    default=0.01,
    ge=0.0,
    description="When streaming a response on the Eleanor Framework completions API, this value determines how long the server waits before sending a character chunk back to the the client. This value has no effect on the completion API. Small values are typically better but could spike CPU usage if too small.",
)
system_prompt class-attribute instance-attribute
system_prompt: Optional[str] = Field(
    default=None,
    description="Text injected at the beginning of the prompt. This will be added outside of any formatting triggered by ``chat_completion_format``",
)
tokenizer class-attribute instance-attribute
tokenizer: SkipJsonSchema[BaseTokenizer] = Field(
    default=None,
    title="LLM Tokenizer Instance (volatile)",
    description="Some capabilities in the Eleanor Framework such as token-based chunking require an internal tokenizer to work properly. This field will be initialized when the model is created via the ``tokenizer_factory`` configuration.",
    frozen=False,
    exclude=True,
    validate_default=False,
)
tokenizer_factory class-attribute instance-attribute
tokenizer_factory: IoCFactoryModel = Field(
    ...,
    title="Tokenizer Factory",
    description="Factory for the LLM tokenizer instance. When the LLM settings object is created it will be used to initialize the 'tokenizer' field",
)

Functions

validate_env
validate_env() -> LLMSettings