
evaluate_cm

CheckMate stages for TextEvolve / Evaluate operations.

Attributes

EVALUATE_TASK_POOL module-attribute

EVALUATE_TASK_POOL = ThreadPoolExecutor(
    thread_name_prefix=service_pool_name,
    max_workers=service_pool_size,
)

EVALUATE_TRACE_TAG module-attribute

EVALUATE_TRACE_TAG = 'Evaluate'

Classes

EvaluateEngine

Bases: BaseModel

Attributes

k property
k: int

Returns the fixed number of score components (k) used by the evaluation engine.

Returns:

  • int ( int ) –

    The number of score components.

llm_score_fields property
llm_score_fields: List[str]

Returns the list of score fields expected in the LLM response. Note that evaluate handles confidence_scores as a special case.

Returns:

  • List[str]

    List[str]: The list of score fields.

max_score property
max_score: float

Returns the maximum score value returned by the LLM for the evaluation engine.

Returns:

  • float ( float ) –

    The maximum score value.

min_score property
min_score: float

Returns the minimum score value returned by the LLM for the evaluation engine.

Returns:

  • float ( float ) –

    The minimum score value.

profile instance-attribute
profile: EvaluateProfile
profile_name instance-attribute
profile_name: str
w property
w: ndarray

Derive the score component weight vector from the evaluation profile.

Note

This implementation of Evaluate has 6 fixed score components.

Warning

The ordering of the weights must match the order of the score components; failing to do so will result in incorrect scoring that is difficult to detect. To reduce this risk, the weights are ordered alphabetically by score component name.

Returns:

  • ndarray

    np.ndarray: The score component weight vector.
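The alphabetical-ordering rule above can be sketched as follows. The component names and weight values here are hypothetical stand-ins for whatever the `EvaluateProfile` defines; only the sorting step reflects the documented behavior:

```python
import numpy as np

# Hypothetical component weights; the real ones come from the EvaluateProfile.
profile_weights = {
    "accuracy": 0.3,
    "coherence": 0.2,
    "completeness": 0.15,
    "conciseness": 0.1,
    "relevance": 0.15,
    "style": 0.1,
}

# Sorting by component name fixes the ordering so it matches the k axis
# of the score tensor, regardless of dict insertion order.
w = np.array([profile_weights[name] for name in sorted(profile_weights)])
```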

Functions

evaluate
evaluate(
    *,
    stage_ref: Process,
    task_output_ref: EvaluateTaskState,
    trace_session_id: str,
    trace_tags: List[str]
) -> None

Evaluate candidate responses using the TextEvolve evaluation function.

Parameters:

  • stage_ref (Process) –

The stage reference, used for periodically saving state as work completes. This allows more fine-grained retries in the event the entire task fails.

  • task_output_ref (EvaluateTaskState) –

The evaluation task to process. This must be a reference to the exact object held in the Process stage state, since all updates are made in place.

  • trace_session_id (str) –

The trace session ID, used to track the evaluation operation referenced by task_output_ref.

  • trace_tags (List[str]) –

Additional LLM tracing tags for this operation.
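Because `evaluate` returns `None` and communicates results by mutating `task_output_ref` in place, the caller must pass the exact object held in the stage state. A minimal sketch with a hypothetical `TaskState` standing in for `EvaluateTaskState`:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for EvaluateTaskState, illustrating why
# task_output_ref must be the exact object held in the stage state:
# the worker mutates it in place, so the stage observes progress directly.
@dataclass
class TaskState:
    scores: List[float] = field(default_factory=list)

def evaluate_sketch(*, task_output_ref: TaskState) -> None:
    task_output_ref.scores.append(0.9)  # update in place; nothing returned

state = TaskState()
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(evaluate_sketch, task_output_ref=state).result()
# state.scores now contains the score appended inside the worker
```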

select_best_candidate
select_best_candidate(
    S: ndarray,
    Sc: ndarray,
    y: List[str],
    *,
    probabilistic: bool = False
) -> str

Select the best response candidate based on normalized or softmax scores. Automatically applies weights before computation.

Parameters:

  • S (ndarray) –

    The scoring tensor with dimensions (r x i x j x k).

  • Sc (ndarray) –

    The confidence score tensor with dimensions (r x i x j).

  • y (List[str]) –

    List of candidate responses.

  • probabilistic (bool, default: False ) –

    If True, selects based on softmax scores; otherwise, based on normalized scores.

Returns:

  • str ( str ) –

    The selected response candidate.
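A plausible sketch of this selection logic, under stated assumptions: the engine's exact aggregation over replicas (r), evaluators (i), and score components (k) is internal, and the real function applies the profile weights automatically, whereas this sketch takes a weight vector `w` explicitly:

```python
import numpy as np

def select_best_sketch(S, Sc, y, w, *, probabilistic=False, rng=None):
    # S: (r, i, j, k) score tensor; Sc: (r, i, j) confidence tensor.
    weighted = S @ w                             # collapse k -> (r, i, j)
    scores = (weighted * Sc).mean(axis=(0, 1))   # one scalar per candidate j
    if probabilistic:
        # Softmax selection: sample a candidate proportionally to its score.
        p = np.exp(scores - scores.max())
        p /= p.sum()
        idx = (rng or np.random.default_rng()).choice(len(y), p=p)
    else:
        # Deterministic selection: take the highest aggregate score.
        idx = int(np.argmax(scores))
    return y[idx]
```

For example, with two candidates and two components, the candidate whose weighted score is higher is returned in deterministic mode.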

EvaluateEngineError

Bases: Exception

Raised when an error occurs during the evaluation process. This error is not necessarily fatal to the evaluation and may be handled or retried based on the profile configuration.

EvaluateResult

Bases: BaseModel

Attributes

results class-attribute instance-attribute
results: List[EvaluateTaskResults | None] = Field(
    default_factory=list,
    title="Results",
    description="The results of the evaluation tasks.",
)
status_counts class-attribute instance-attribute
status_counts: Dict[RecordStatus, int] = Field(
    default_factory=lambda: {
        i: 0 for i in iter(RecordStatus)
    },
    title="Status Counts",
    description="The counts of records processed by status.",
)

EvaluateState

Bases: BaseStageState[EvaluateResult]

Attributes

derived_trace_tags class-attribute instance-attribute
derived_trace_tags: List[str] | None = None
engine class-attribute instance-attribute
engine: EvaluateEngine | None = None
evaluate_output class-attribute instance-attribute
evaluate_output: List[EvaluateTaskState] | None = None
merge_strategy class-attribute instance-attribute
merge_strategy: MergeStrategyType = 'overwrite'
profile_override class-attribute instance-attribute
profile_override: EvaluateProfile | None = None
profile_ref class-attribute instance-attribute
profile_ref: str | None = None
tasks instance-attribute
tasks: Union[List[EvaluateTaskInput], EvaluateTaskInput]
trace_session_id class-attribute instance-attribute
trace_session_id: str | None = None
trace_tags class-attribute instance-attribute
trace_tags: List[str] | None = None

Initialize

Bases: BaseStage[EvaluateState]

Functions

execute_stage
execute_stage() -> None

Process

Bases: BaseStage[EvaluateState]

Functions

execute_stage
execute_stage() -> None

Functions

cm_factory_text_evolve_evaluate

cm_factory_text_evolve_evaluate(
    task_id: str | None = None,
    stage_templates: EvaluateState | None = None,
    **kwargs
) -> CheckMate