evaluate_cm
CheckMate stages for TextEvolve / Evaluate operations.
Attributes
EVALUATE_TASK_POOL
module-attribute
EVALUATE_TASK_POOL = ThreadPoolExecutor(
thread_name_prefix=service_pool_name,
max_workers=service_pool_size,
)
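The pool is a standard `concurrent.futures.ThreadPoolExecutor`. A minimal sketch of how such a shared pool is built and used, with placeholder values for `service_pool_name` and `service_pool_size` (in the module these come from the service configuration):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration values standing in for the real
# service_pool_name / service_pool_size settings.
service_pool_name = "evaluate-cm"
service_pool_size = 4

EVALUATE_TASK_POOL = ThreadPoolExecutor(
    thread_name_prefix=service_pool_name,
    max_workers=service_pool_size,
)

# Evaluation work can be submitted to the shared pool and awaited:
future = EVALUATE_TASK_POOL.submit(lambda x: x * 2, 21)
print(future.result())  # 42
```

Sharing one module-level executor bounds the total concurrency of evaluation tasks across the service instead of spawning threads per request.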
Classes
EvaluateEngine
Bases: BaseModel
Attributes
k
property
Returns the fixed number of score components (k) used by the evaluation engine.
Returns:
- int – The number of score components.
llm_score_fields
property
Returns the list of score fields expected in the LLM response. Note that evaluate handles confidence_scores as a special case.
Returns:
- List[str] – The list of score fields.
max_score
property
Returns the maximum score value returned by the LLM for the evaluation engine.
Returns:
- float – The maximum score value.
min_score
property
Returns the minimum score value returned by the LLM for the evaluation engine.
Returns:
- float – The minimum score value.
w
property
Derive the score component weight vector from the evaluation profile.
Note
This implementation of Evaluate has 6 fixed score components.
Warning
The ordering of the weights must match the ordering of the score components; a mismatch produces incorrect scoring that is difficult to detect. To reduce this risk, the weights are ordered by the alphabetical order of the score component names.
Returns:
- ndarray – The score component weight vector.
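The alphabetical-ordering convention can be sketched as follows. The component names and weights here are hypothetical; the real evaluation profile defines the 6 fixed components:

```python
import numpy as np

# Hypothetical profile weights keyed by score component name.
profile_weights = {
    "coherence": 0.2,
    "completeness": 0.15,
    "conciseness": 0.1,
    "correctness": 0.3,
    "relevance": 0.15,
    "tone": 0.1,
}

# Sorting the names alphabetically keeps the weight vector aligned with
# the score components, per the Warning above.
w = np.array([profile_weights[name] for name in sorted(profile_weights)])
```

Because both the weight vector and the score tensor's component axis are built from the same sorted name list, the dot product pairs each weight with the intended component.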
Functions
evaluate
evaluate(
*,
stage_ref: Process,
task_output_ref: EvaluateTaskState,
trace_session_id: str,
trace_tags: List[str]
) -> None
Evaluate candidate responses using the TextEvolve evaluation function.
Parameters:
- stage_ref (Process) – The stage reference, used for periodically saving state as work completes. This allows more fine-grained retries in the event the entire task fails.
- task_output_ref (EvaluateTaskState) – The evaluation task to process. This must be a reference to the exact object used by the Process stage state, since all updates are made in place.
- trace_session_id (str) – The trace session ID, used to track the evaluation operation referenced by task_output_ref.
- trace_tags (List[str]) – Additional LLM tracing tags for this operation.
select_best_candidate
select_best_candidate(
S: ndarray,
Sc: ndarray,
y: List[str],
*,
probabilistic: bool = False
) -> str
Select the best response candidate based on normalized or softmax scores. Automatically applies weights before computation.
Parameters:
- S (ndarray) – The scoring tensor with dimensions (r x i x j x k).
- Sc (ndarray) – The confidence score tensor with dimensions (r x i x j).
- y (List[str]) – The list of candidate responses.
- probabilistic (bool, default: False) – If True, selects based on softmax scores; otherwise, based on normalized scores.
Returns:
- str – The selected response candidate.
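A minimal sketch of this kind of weighted selection, assuming the tensor shapes documented above. The aggregation order (component weights, then confidence weighting, then averaging over r and i) and the default uniform weight vector are assumptions, not the library's exact implementation:

```python
import numpy as np

def select_best_candidate_sketch(S, Sc, y, *, probabilistic=False, w=None, rng=None):
    """Sketch of weighted candidate selection (not the library code).

    S:  (r, i, j, k) score tensor; Sc: (r, i, j) confidence tensor;
    y:  list of j candidate responses; w: optional (k,) weight vector.
    """
    r, i, j, k = S.shape
    w = np.full(k, 1.0 / k) if w is None else np.asarray(w)
    # Apply component weights, then confidence-weight and average over
    # repeats (r) and evaluators (i) to get one score per candidate (j).
    weighted = (S * w).sum(axis=-1)             # (r, i, j)
    scores = (weighted * Sc).mean(axis=(0, 1))  # (j,)
    if probabilistic:
        # Softmax sampling: higher-scoring candidates are more likely.
        p = np.exp(scores - scores.max())
        p /= p.sum()
        rng = rng or np.random.default_rng()
        return y[rng.choice(j, p=p)]
    # Deterministic path: take the highest-scoring candidate.
    return y[int(np.argmax(scores))]
```

The `probabilistic=True` path trades determinism for diversity: repeated calls can surface strong runner-up candidates instead of always returning the argmax.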
EvaluateEngineError
Bases: Exception
Raised when an error occurs during the evaluation process. This error is not necessarily fatal to the evaluation and may be handled or retried based on the profile configuration.
EvaluateResult
Bases: BaseModel
Attributes
results
class-attribute
instance-attribute
results: List[EvaluateTaskResults | None] = Field(
default_factory=list,
title="Results",
description="The results of the evaluation tasks.",
)
status_counts
class-attribute
instance-attribute
status_counts: Dict[RecordStatus, int] = Field(
default_factory=lambda: {
i: 0 for i in iter(RecordStatus)
},
title="Status Counts",
description="The counts of records processed by status.",
)
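The `default_factory=lambda: {i: 0 for i in iter(RecordStatus)}` pattern pre-seeds a zero count for every status so callers can increment unconditionally. A dependency-free sketch of the same pattern using `dataclasses` instead of Pydantic, with a hypothetical `RecordStatus` enum standing in for the real one:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict

# Hypothetical status enum; the real RecordStatus lives elsewhere in the package.
class RecordStatus(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

@dataclass
class EvaluateResultSketch:
    # Every status starts at zero, so increments never need a .get() guard.
    status_counts: Dict[RecordStatus, int] = field(
        default_factory=lambda: {s: 0 for s in RecordStatus}
    )

result = EvaluateResultSketch()
result.status_counts[RecordStatus.SUCCEEDED] += 1
```

Using a factory (rather than a shared dict literal) gives each instance its own counter mapping.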
EvaluateState
Bases: BaseStageState[EvaluateResult]
Attributes
evaluate_output
class-attribute
instance-attribute
evaluate_output: List[EvaluateTaskState] | None = None
Initialize
Bases: BaseStage[EvaluateState]
Process
Bases: BaseStage[EvaluateState]
Functions
cm_factory_text_evolve_evaluate
cm_factory_text_evolve_evaluate(
task_id: str | None = None,
stage_templates: EvaluateState | None = None,
**kwargs
) -> CheckMate