llm_settings
LLM settings module
Classes
LLMFactory
Bases: IoCFactoryModel
OpenAI-compatible API endpoint. See langchain_openai.OpenAI.
Attributes:
-
openai_api_key
(SecretStr
) – OpenAI API key for the endpoint
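As a rough illustration of what this factory ultimately builds, the sketch below constructs a langchain_openai.OpenAI client against an OpenAI-compatible endpoint. The endpoint URL, model name, and key are placeholders, and the parameter set shown is the standard langchain_openai one, not a confirmed list of LLMFactory fields.
from langchain_openai import OpenAI

# Hedged sketch: LLMFactory settings are expected to produce a client like
# this. The endpoint URL, model name, and key below are placeholders.
llm = OpenAI(
    openai_api_key="sk-placeholder",
    base_url="http://localhost:8000/v1",
    model="mistral-7b-instruct",
)
print(llm.invoke("Say hello."))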
LLMSettings
Bases: BaseModel
LLM connection configuration
Attributes:
-
name
(str
) – Name of the LLM configuration
-
max_position_embeddings
(int
) – Maximum number of tokens this model can accept
-
description
(str
) – Human-readable description of the LLM connection. Best practice is to document the intended use case(s), limitations, and other quirks.
-
system_prompt
(str
) – Text injected at the beginning of the prompt. This will be added outside of any formatting triggered by
chat_completion_format
-
prompt_format
(str
) – LLM completion prompt format
-
output_filter_factories
(List[IoCFactoryModel]
) – List of output filter factories; each should implement eleanor.llm.generation_filters.GenerationFilter. These filters are applied to the LLM response on API calls. For filtering streaming responses, consider setting
stream_buffer_flush_hwm
as well. -
chat_completion_format
(str
) – When using a completion-based LLM (such as Mistral) that doesn’t easily support multi-user chat sessions and system messages, it can be desirable to format the incoming chat request before applying the LLM’s native format template. In the Mistral example, say we have a multi-user chat with system messages and need to use its simple [INST],[/INST] formatting. When
chat_completion_format
is set to ‘chatml’, the incoming chat request will first get converted to a ChatML string before it is encapsulated in the LLM’s normal prompt format. When this is set to None, the LLM’s normal
prompt_format
will be used. -
stream_buffer_flush_hwm
(int
) – When streaming a response on the Eleanor Framework completions API, this value determines how many characters are held by the buffer before they are sent back to the client. Effectively, this controls the length of the string seen by filters. Higher values give response filtering more data to match patterns against, at the cost of a more delayed response. A value of 0 will disable buffering and filtering on streaming responses. This value has no effect on the completion API.
-
stream_char_response_delay
(float
) – When streaming a response on the Eleanor Framework completions API, this value determines how long the server waits before sending a character chunk back to the client. This value has no effect on the completion API. Small values are typically better but can spike CPU usage if too small.
-
tokenizer_factory
(IoCFactoryModel
) – Factory for the LLM tokenizer instance. When the LLM settings object is created, it will be used to initialize the ‘tokenizer’ field.
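Taken together, a configuration payload covering the attributes above might look like the following sketch. The factory and tokenizer_factory bodies are left empty on purpose, since their real shape is defined by IoCFactoryModel and is not documented in this module, and the prompt_format value is a guess at the expected template syntax.
# Hedged sketch of an LLMSettings payload; factory bodies are placeholders.
config = {
    "name": "mistral-7b",
    "max_position_embeddings": 8192,
    "description": "General-purpose completion model for internal tooling.",
    "prompt_format": "[INST] {prompt} [/INST]",
    "chat_completion_format": "chatml",
    "stream_buffer_flush_hwm": 100,
    "stream_char_response_delay": 0.01,
    "factory": {},            # IoCFactoryModel settings go here
    "tokenizer_factory": {},  # IoCFactoryModel settings go here
}
# With real factory payloads in place, validation is the usual pydantic call:
# settings = LLMSettings.model_validate(config)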
Attributes
chat_completion_format
class-attribute
instance-attribute
chat_completion_format: Optional[str] = Field(
default=None,
description="When a completion-based LLM (such as Mistral) that doesn't easily support multi-user chat sessions and system messages, it can be desirable to format the incoming chat request first before using the LLM's native format template. In the Mistral example, say we have a multi-user chat with system messages and need to use their simple [INST],[/INST] formatting. When ``chat_completion_format`` is set to 'chatml', the incoming chat request will first get converted to a ChatML string before it is encapsulated in the LLM's normal prompt format. When this is set to None, the LLM's normal ``prompt_format`` will be used.",
)
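For intuition, the two-step conversion described here can be sketched as follows. This is an illustrative reimplementation, not the framework's actual code:
# Step 1: flatten a multi-user chat (with system messages) into a ChatML string.
def to_chatml(messages: list[dict]) -> str:
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

messages = [
    {"role": "system", "content": "You are terse."},
    {"role": "alice", "content": "Summarize yesterday's meeting."},
    {"role": "bob", "content": "Keep it to one paragraph."},
]

# Step 2: encapsulate the ChatML string in the model's native prompt format,
# e.g. Mistral's [INST]...[/INST] wrapper.
prompt = f"[INST] {to_chatml(messages)} [/INST]"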
description
class-attribute
instance-attribute
description: Optional[str] = Field(
default=None,
description="Human readable description of the LLM connection. BEst practice is to document the intended use cas(es), limitations, and other quirks.",
)
factory
class-attribute
instance-attribute
factory: LLMFactory = Field(
..., description="LLM connection factory settings"
)
format_kwargs
class-attribute
instance-attribute
format_kwargs: KwargsModel = Field(
default_factory=KwargsModel,
description="Prompt format kwargs",
)
mapper_to_str_kwargs
class-attribute
instance-attribute
mapper_to_str_kwargs: KwargsModel = Field(
default_factory=KwargsModel,
description="Additional kwargs passed to the mapper.to_str() method when rendering a completion prompt string.",
)
max_position_embeddings
class-attribute
instance-attribute
max_position_embeddings: int = Field(
...,
title="Max Position Embeddings",
description="The maximum number of tokens this model can accept. For huggingFace models, this value can be found in the model's config.json file. The intended use case for this value is to determine the maximum number of input tokens a model can accept when stuffing data into a prompt. Since OSS / self-hosted models are usually behind some other inferencing service, the path the the model's config.json should not be assumed and thus needed as part of the Eleanor Framework LLM configuration.",
)
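The intended use, budgeting input tokens when stuffing a prompt, might look like the sketch below. It assumes a tokenizer exposing an encode() method that returns token ids; the real BaseTokenizer interface is not shown in this module.
# Hedged sketch: check that a stuffed prompt leaves room for generation.
def fits_context(tokenizer, prompt: str, max_position_embeddings: int,
                 max_new_tokens: int = 512) -> bool:
    """tokenizer.encode() returning a list of token ids is an assumption."""
    used = len(tokenizer.encode(prompt))
    return used + max_new_tokens <= max_position_embeddings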
model_config
class-attribute
instance-attribute
name
class-attribute
instance-attribute
name: str = Field(
...,
title="Name",
description="Name of the LLM configuration. This is a simplified version of the LLM name used internally by the framework to bind to chain configurations, which can apply LLM-specific settings to override defaults",
)
output_filter_factories
class-attribute
instance-attribute
output_filter_factories: List[IoCFactoryModel] = Field(
default_factory=list,
description="List of output factories, should implement eleanor.llm.generation_filters.GenerationFilter. These are filters that get applied to the LLM response on API calls. For filtering streaming responses, consider setting ``stream_buffer_flush_hwm`` as well.",
)
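The GenerationFilter interface itself is not documented in this module. Purely as an illustration, a redaction filter might look like the sketch below; the filter() method name and signature are assumptions:
import re

# Hypothetical output filter; the real GenerationFilter contract may differ.
class RedactEmailFilter:
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def filter(self, text: str) -> str:
        # Replace anything that looks like an email address in the response.
        return self.EMAIL.sub("[redacted]", text)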
prompt_format
class-attribute
instance-attribute
stream_buffer_flush_hwm
class-attribute
instance-attribute
stream_buffer_flush_hwm: int = Field(
default=100,
ge=0,
description="When streaming a response on the Eleanor Framework completions API, this value determines how many characters are held by the buffer before they are sent bak to the client. Effectively, this controls the length of string seen by filters. Higher values will give response filtering more data to look at then pattern matching at the cost of a more delayed response. A value of 0 will disable buffering and filtering on streaming responses. This value has no effect on the completion API.",
)
stream_char_response_delay
class-attribute
instance-attribute
stream_char_response_delay: float = Field(
default=0.01,
ge=0.0,
description="When streaming a response on the Eleanor Framework completions API, this value determines how long the server waits before sending a character chunk back to the the client. This value has no effect on the completion API. Small values are typically better but could spike CPU usage if too small.",
)
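Together, stream_buffer_flush_hwm and stream_char_response_delay describe a buffered streaming loop along the lines of the sketch below. This is an illustrative model of the behavior, not the framework's implementation:
import asyncio

# Sketch: characters accumulate until the high-water mark, filters see each
# flushed chunk, and a small delay paces chunks back to the client.
async def stream_response(chunks, filters, hwm: int = 100, delay: float = 0.01):
    if hwm == 0:  # buffering and filtering disabled for streaming
        async for piece in chunks:
            yield piece
        return
    buffer = ""
    async for piece in chunks:
        buffer += piece
        if len(buffer) >= hwm:
            for f in filters:
                buffer = f.filter(buffer)
            yield buffer
            buffer = ""
            await asyncio.sleep(delay)
    if buffer:  # flush whatever remains at end of stream
        for f in filters:
            buffer = f.filter(buffer)
        yield buffer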
system_prompt
class-attribute
instance-attribute
system_prompt: Optional[str] = Field(
default=None,
description="Text injected at the beginning of the prompt. This will be added outside of any formatting triggered by ``chat_completion_format``",
)
tokenizer
class-attribute
instance-attribute
tokenizer: SkipJsonSchema[BaseTokenizer] = Field(
default=None,
title="LLM Tokenizer Instance (volatile)",
description="Some capabilities in the Eleanor Framework such as token-based chunking require an internal tokenizer to work properly. This field will be initialized when the model is created via the ``tokenizer_factory`` configuration.",
frozen=False,
exclude=True,
validate_default=False,
)
tokenizer_factory
class-attribute
instance-attribute
tokenizer_factory: IoCFactoryModel = Field(
...,
title="Tokenizer Factory",
description="Factory for the LLM tokenizer instance. When the LLM settings object is created it will be used to initialize the 'tokenizer' field",
)
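IoCFactoryModel's fields are defined outside this module, so only the general idea can be sketched: resolve a dotted class path and instantiate it with keyword arguments. The function below and the example path are illustrative assumptions:
from importlib import import_module

# Generic sketch of inversion-of-control instantiation; the real
# IoCFactoryModel field names and behavior may differ.
def build(dotted_path: str, **kwargs):
    module_path, _, attr = dotted_path.rpartition(".")
    cls = getattr(import_module(module_path), attr)
    return cls(**kwargs)

# Hypothetical usage producing the instance assigned to the 'tokenizer' field:
# tokenizer = build("mypkg.tokenizers.SimpleTokenizer", model="mistral-7b")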