
evaluate_cm

CheckMate stages for TextEvolve / Evaluate operations.

Attributes

EVALUATE_TASK_POOL module-attribute

EVALUATE_TASK_POOL = ThreadPoolExecutor(
    thread_name_prefix=service_pool_name,
    max_workers=service_pool_size,
)

EVALUATE_TRACE_TAG module-attribute

EVALUATE_TRACE_TAG = 'Evaluate'

Classes

EvaluateEngine

Bases: BaseModel

Attributes

k property
k: int

Returns the fixed number of score components (k) used by the evaluation engine.

Returns:

  • int ( int ) –

    The number of score components.

llm_score_fields property
llm_score_fields: List[str]

Returns the list of score fields expected in the LLM response. Note that evaluate handles confidence_scores as a special case.

Returns:

  • List[str]

    List[str]: The list of score fields.

max_score property
max_score: float

Returns the maximum score value returned by the LLM for the evaluation engine.

Returns:

  • float ( float ) –

    The maximum score value.

min_score property
min_score: float

Returns the minimum score value returned by the LLM for the evaluation engine.

Returns:

  • float ( float ) –

    The minimum score value.

profile instance-attribute
profile: EvaluateProfile
profile_name instance-attribute
profile_name: str
w property
w: ndarray

Derive the score component weight vector from the evaluation profile.

Note

This implementation of Evaluate has 6 fixed score components.

Warning

The ordering of the weights must match the order of the score components; failing to do so will result in incorrect scoring that is difficult to detect. To reduce this risk, the weights are ordered alphabetically by score component name.

Returns:

  • ndarray

    np.ndarray: The score component weight vector.
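The alphabetical-ordering rule above can be sketched as follows. The component names and weight values here are hypothetical stand-ins for whatever the `EvaluateProfile` defines; only the sorting step reflects the documented behavior:

```python
import numpy as np

# Hypothetical component weights; the real ones come from the EvaluateProfile.
profile_weights = {
    "accuracy": 0.3,
    "coherence": 0.2,
    "completeness": 0.15,
    "conciseness": 0.1,
    "relevance": 0.15,
    "style": 0.1,
}

# Sorting by component name fixes the ordering so it matches the k axis
# of the score tensor, regardless of dict insertion order.
w = np.array([profile_weights[name] for name in sorted(profile_weights)])
```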

Functions

evaluate
evaluate(
    *,
    stage_ref: Process,
    task_output_ref: EvaluateTaskState,
    trace_session_id: str,
    trace_tags: List[str]
) -> None

Evaluate candidate responses using the TextEvolve evaluation function.

Parameters:

  • stage_ref (Process) –

The stage reference, used for periodically saving state as work completes. This allows more fine-grained retries in the event the entire task fails.

  • task_output_ref (EvaluateTaskState) –

The evaluation task to process. This must be a reference to the exact object held in the Process stage state, since all updates are made in place.

  • trace_session_id (str) –

The trace session ID, used to track the evaluation operation referenced by task_output_ref.

  • trace_tags (List[str]) –

Additional LLM tracing tags for this operation.
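Because `evaluate` returns `None` and communicates results by mutating `task_output_ref` in place, the caller must pass the exact object held in the stage state. A minimal sketch with a hypothetical `TaskState` standing in for `EvaluateTaskState`:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for EvaluateTaskState, illustrating why
# task_output_ref must be the exact object held in the stage state:
# the worker mutates it in place, so the stage observes progress directly.
@dataclass
class TaskState:
    scores: List[float] = field(default_factory=list)

def evaluate_sketch(*, task_output_ref: TaskState) -> None:
    task_output_ref.scores.append(0.9)  # update in place; nothing returned

state = TaskState()
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(evaluate_sketch, task_output_ref=state).result()
# state.scores now contains the score appended inside the worker
```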

select_best_candidate
select_best_candidate(
    S: ndarray,
    Sc: ndarray,
    y: List[str],
    *,
    probabilistic: bool = False
) -> str

Select the best response candidate based on normalized or softmax scores. Automatically applies weights before computation.

Parameters:

  • S (ndarray) –

    The scoring tensor with dimensions (r x i x j x k).

  • Sc (ndarray) –

    The confidence score tensor with dimensions (r x i x j).

  • y (List[str]) –

    List of candidate responses.

  • probabilistic (bool, default: False ) –

    If True, selects based on softmax scores; otherwise, based on normalized scores.

Returns:

  • str ( str ) –

    The selected response candidate.
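A plausible sketch of this selection logic, under stated assumptions: the engine's exact aggregation over replicas (r), evaluators (i), and score components (k) is internal, and the real function applies the profile weights automatically, whereas this sketch takes a weight vector `w` explicitly:

```python
import numpy as np

def select_best_sketch(S, Sc, y, w, *, probabilistic=False, rng=None):
    # S: (r, i, j, k) score tensor; Sc: (r, i, j) confidence tensor.
    weighted = S @ w                             # collapse k -> (r, i, j)
    scores = (weighted * Sc).mean(axis=(0, 1))   # one scalar per candidate j
    if probabilistic:
        # Softmax selection: sample a candidate proportionally to its score.
        p = np.exp(scores - scores.max())
        p /= p.sum()
        idx = (rng or np.random.default_rng()).choice(len(y), p=p)
    else:
        # Deterministic selection: take the highest aggregate score.
        idx = int(np.argmax(scores))
    return y[idx]
```

For example, with two candidates and two components, the candidate whose weighted score is higher is returned in deterministic mode.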

EvaluateEngineError

Bases: Exception

Raised when an error occurs during the evaluation process. This error is not necessarily fatal to the evaluation and may be handled or retried based on the profile configuration.

EvaluateResult

Bases: BaseModel

Attributes

results class-attribute instance-attribute
results: List[EvaluateTaskResults | None] = Field(
    default_factory=list,
    title="Results",
    description="The results of the evaluation tasks.",
)
status_counts class-attribute instance-attribute
status_counts: Dict[RecordStatus, int] = Field(
    default_factory=lambda: {
        i: 0 for i in iter(RecordStatus)
    },
    title="Status Counts",
    description="The counts of records processed by status.",
)

EvaluateState

Bases: BaseStageState[EvaluateResult]

Attributes

derived_trace_tags class-attribute instance-attribute
derived_trace_tags: List[str] | None = None
engine class-attribute instance-attribute
engine: EvaluateEngine | None = None
evaluate_output class-attribute instance-attribute
evaluate_output: List[EvaluateTaskState] | None = None
merge_strategy class-attribute instance-attribute
merge_strategy: MergeStrategyType = 'overwrite'
profile_override class-attribute instance-attribute
profile_override: EvaluateProfile | None = None
profile_ref class-attribute instance-attribute
profile_ref: str | None = None
tasks instance-attribute
tasks: Union[List[EvaluateTaskInput], EvaluateTaskInput]
trace_session_id class-attribute instance-attribute
trace_session_id: str | None = None
trace_tags class-attribute instance-attribute
trace_tags: List[str] | None = None

Initialize

Bases: BaseStage[EvaluateState]

Functions

execute_stage
execute_stage() -> None

Process

Bases: BaseStage[EvaluateState]

Functions

execute_stage
execute_stage() -> None

Functions

cm_factory_text_evolve_evaluate

cm_factory_text_evolve_evaluate(
    task_id: str | None = None,
    stage_templates: EvaluateState | None = None,
    **kwargs
) -> CheckMate