evaluate_model
TextEvolve/Evaluate data models and types.
Classes
BaseEvaluateTurnRetryStrategy
Bases: BaseDataModel, ABC
Abstract base class for Evaluate turn retry strategies.
Functions
do_retry
abstractmethod
do_retry(
profile: EvaluateProfile,
xi: List[DebateHistoryEntry],
i: Agent,
round_num: int,
m_i: GetMemoriesResponse | None = None,
) -> RetryAction
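A minimal sketch of a concrete strategy built on this interface is shown below. The RetryAction member name ABORT is an assumption (the enum's members are not expanded in the RetryAction section below), and the classes are assumed to be importable from this module.
from typing import List

class NoRetryStrategy(BaseEvaluateTurnRetryStrategy):
    """Sketch: never retries; signals abort on the first failure."""

    def do_retry(
        self,
        profile: EvaluateProfile,
        xi: List[DebateHistoryEntry],
        i: Agent,
        round_num: int,
        m_i: GetMemoriesResponse | None = None,
    ) -> RetryAction:
        # Assumed member name; substitute the real RetryAction value.
        return RetryAction.ABORT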
DebateHistoryEntry
Bases: BaseModel
Agent debate history entry.
Attributes
confidence_scores
class-attribute
instance-attribute
confidence_scores: ndarray = Field(
...,
title="Confidence",
description="Unweighted confidence scores from the turn",
)
model_config
class-attribute
instance-attribute
model_config = ConfigDict(
populate_by_name=True,
use_enum_values=False,
extra="forbid",
strict=False,
arbitrary_types_allowed=True,
)
r
class-attribute
instance-attribute
response
class-attribute
instance-attribute
role
class-attribute
instance-attribute
scores
class-attribute
instance-attribute
Functions
format_llm
Format the debate history entry as an LLM-safe string.
Returns:
- str – The formatted debate history entry.
validate_confidence_scores
classmethod
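As a hypothetical usage sketch, the per-entry strings can be joined into a transcript for prompting (here `state` is assumed to be an EvaluateTaskState, documented below):
# Join each entry's LLM-safe string into one transcript.
transcript = "\n\n".join(entry.format_llm() for entry in state.debate_history)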
EvaluateTaskInput
Bases: BaseDataModel
TextEvolve/Evaluate internal task definition used by the evaluation engine.
Attributes
probabilistic_selection
class-attribute
instance-attribute
probabilistic_selection: bool = Field(
default=False,
title="Probabilistic Selection",
description="When true",
)
x
class-attribute
instance-attribute
x_name
class-attribute
instance-attribute
x_name: str = Field(
default="input",
title="X Name",
description="Natural language name of the x input, ex: 'prompt', 'document', etc. This should be one word that can be used to identify the context of the evaluation to the LLM.",
)
y
class-attribute
instance-attribute
y: List[str] = Field(
default_factory=list,
title="y",
description="List of candidate responses in natural language. Each will be scored against 'x'. Examples may include answers to a question, prompt variations, reasoning, etc.",
)
y_name
class-attribute
instance-attribute
y_name: str = Field(
default="output",
title="Y Name",
description="Name of the y outputs that are to be scored. This should be one word that can be used to identify what is being evaluated.",
)
Functions
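A minimal construction sketch, assuming x accepts a plain string (its Field definition is collapsed above) and that the defaults suffice elsewhere:
task = EvaluateTaskInput(
    x="What is the capital of France?",  # assumed to accept a plain string
    x_name="question",
    y=[
        "Paris",
        "The capital of France is Paris.",
        "Lyon",
    ],
    y_name="answer",
    probabilistic_selection=False,  # pick the highest normalized score
)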
EvaluateTaskResults
Bases: BaseModel
When a TextEvolve/Evaluate task is complete, this object is populated with the final results, which are intended to be presented back to the user.
Attributes
agent_Sc
class-attribute
instance-attribute
agent_Sc: Dict[str, float] = Field(
...,
title="Agent Confidence",
description="Agent confidence scores",
)
input_task
class-attribute
instance-attribute
input_task: EvaluateTaskInput = Field(
...,
title="Input Task",
description="The Evaluate input task",
)
last_cv
class-attribute
instance-attribute
last_round
class-attribute
instance-attribute
model_config
class-attribute
instance-attribute
model_config = ConfigDict(
populate_by_name=True,
use_enum_values=False,
extra="forbid",
strict=False,
arbitrary_types_allowed=True,
)
scores_norm
class-attribute
instance-attribute
scores_norm: ndarray = Field(
default_factory=lambda: array([]),
title="Normalized Scores",
description="The normalized (weighted) score vector of shape (j). Normalized scores have confidence applied.",
)
scores_norm_unweighted
class-attribute
instance-attribute
scores_norm_unweighted: ndarray = Field(
default_factory=lambda: array([]),
title="Unweighted Normalized Scores",
description="The normalized (weighted) score vector of shape (j)",
)
scores_softmax
class-attribute
instance-attribute
scores_softmax: ndarray = Field(
default_factory=lambda: array([]),
title="Softmax Scores",
description="The softmax score vector of shape (j)",
)
trace_id
class-attribute
instance-attribute
y_debate
class-attribute
instance-attribute
y_debate: str = Field(
default="",
title="y Debate",
description="The formatted debate transcript used to determine the final scores.",
)
y_index
class-attribute
instance-attribute
y_index: int = Field(
default=-1,
title="y Index",
description="The index of the highest scoring response candidate",
)
y_response
class-attribute
instance-attribute
y_response: str = Field(
default="",
title="y Response",
description="The raw text of the highest scoring response candidate",
)
y_score
class-attribute
instance-attribute
y_score: float = Field(
default=-1.0,
title="y Score",
description="The score of the highest scoring response candidate. When probabilistic selection is enabled, this is the softmax score, otherwise it is the normalized score.",
)
Functions
validate_scores_norm
classmethod
validate_scores_norm_unweighted
classmethod
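A hedged read-back sketch, assuming `results` is a populated EvaluateTaskResults; it mirrors the y_score rule above (softmax score under probabilistic selection, normalized score otherwise):
winner = results.y_response  # raw text of the best candidate
idx = results.y_index        # its position in the input y list
if results.input_task.probabilistic_selection:
    score = results.scores_softmax[idx]
else:
    score = results.scores_norm[idx]
# Per the y_score description, `score` should equal results.y_score.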
EvaluateTaskState
Bases: BaseModel
TextEvolve/Evaluate compute state data model used internally by the CheckMate engine. This object represents a single task / unit of work.
Attributes
S
class-attribute
instance-attribute
S: ndarray = Field(
default_factory=lambda: array([]),
title="S",
description="The scoring tensor of shape (r x i x j x k)",
)
S_round
class-attribute
instance-attribute
S_round: ndarray = Field(
default_factory=lambda: array([]),
title="S, Current Round",
description="The scoring tensor for the current round of shape (i x j x k)",
)
Sc
class-attribute
instance-attribute
Sc: ndarray = Field(
default_factory=lambda: array([]),
title="Sc",
description="The confidence scoring tensor of shape (r x i x j)",
)
Sc_round
class-attribute
instance-attribute
Sc_round: ndarray = Field(
default_factory=lambda: array([]),
title="Sc, Current Round",
description="The confidence scoring tensor for the current round of shape (i x j)",
)
debate_history
class-attribute
instance-attribute
debate_history: List[DebateHistoryEntry] = Field(
default_factory=list,
title="Debate History",
description="The full Evaluate debate history",
)
evaluate_result
class-attribute
instance-attribute
evaluate_result: EvaluateTaskResults | None = Field(
default=None,
title="Evaluate Result",
description="The final evaluation result for the task, populated when the evaluate operation completes successfully.",
)
exception
class-attribute
instance-attribute
exception: str | None = Field(
default=None,
title="Exception",
description="The exception raised during evaluation on failure",
)
model_config
class-attribute
instance-attribute
model_config = ConfigDict(
populate_by_name=True,
use_enum_values=False,
extra="forbid",
strict=False,
arbitrary_types_allowed=True,
)
status
class-attribute
instance-attribute
status: RecordStatus = Field(
default=UNPROCESSED,
title="Status",
description="Evaluate processing status",
)
task
class-attribute
instance-attribute
task: EvaluateTaskInput = Field(
..., title="Task", description="The Evaluate input task"
)
trace_id
class-attribute
instance-attribute
Functions
validate_S
classmethod
validate_S_round
classmethod
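The documented shapes can be illustrated with a small numpy sketch (the meaning of the k axis is an assumption inferred from the shape; r rounds, i agents, j candidates):
import numpy as np

r, i, j, k = 2, 3, 4, 5     # rounds, agents, candidates, criteria (k assumed)
S = np.zeros((r, i, j, k))  # full scoring tensor, shape (r x i x j x k)
Sc = np.zeros((r, i, j))    # confidence scoring tensor, shape (r x i x j)
S_round = S[-1]             # current round, shape (i x j x k)
Sc_round = Sc[-1]           # current round, shape (i x j)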
HardLimitRetryStrategy
Bases: BaseEvaluateTurnRetryStrategy
Retries a turn up to a hard failure limit.
Attributes:
- fail_count (int) – Number of Evaluate turn failures.
- max_fail_count (int) – Maximum number of failures before a fatal error.
Attributes
fail_count
class-attribute
instance-attribute
fail_count: int = Field(
default=0,
title="Fail Count",
description="Number of Evaluate turn has failures",
)
max_fail_count
class-attribute
instance-attribute
max_fail_count: int = Field(
default=3,
title="Max Fail Count",
description="Maximum number of failures",
)
Functions
do_retry
do_retry(
profile: EvaluateProfile,
xi: List[DebateHistoryEntry],
i: Agent,
round_num: int,
m_i: GetMemoriesResponse | None = None,
) -> RetryAction
Retry the operation up to a hard limit.
Parameters:
- profile (EvaluateProfile) – The evaluation profile settings.
- xi (List[DebateHistoryEntry]) – Debate history.
- i (Agent) – The current agent.
- round_num (int) – The current round number.
- m_i (GetMemoriesResponse | None, default: None) – Memories for the current agent.
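A hypothetical driver loop around do_retry might look as follows; RetryAction.RETRY and run_turn are assumed names introduced only for illustration:
strategy = HardLimitRetryStrategy(max_fail_count=3)
while True:
    try:
        result = run_turn(profile, xi, agent, round_num, m_i)  # hypothetical helper
        break
    except Exception:
        # Assumed member name; substitute the real RetryAction value.
        if strategy.do_retry(profile, xi, agent, round_num, m_i) != RetryAction.RETRY:
            raise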
RetryAction
Bases: Enum
Retry action enumeration.
Attributes: