fix_cm
CheckMate stages for the TextProc/Fix service.
TextProc/Fix is a general-purpose data cleaning and normalization operation that:
- Reads an input text file and splits it into chunks.
- Derives prompt template values based on the specified profile and other user-provided parameters.
- Invokes a batch LLM chain to process the templates.
- Writes the processed results to an output file.
Fix-compatible templates take the following inputs:
- text_block_format: Format of the input text, e.g., “plain”, “markdown”, or “xml”.
- text_block_format_upper: Uppercase version of the text_block_format (used in the PEM envelope)
- additional_instructions: User-specified details about each text block that may be useful for the LLM during processing.
- text_block: The input text to be processed
- format_specific_instructions: Additional instructions injected by TextProc based on the text_block_format.
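The five inputs listed above can be illustrated with a minimal sketch. The helper `build_template_inputs`, the `TEMPLATE` string, and the exact envelope wording are assumptions made for illustration; the real templates and the way TextProc derives `format_specific_instructions` are not shown in this documentation.

```python
def build_template_inputs(
    text_block: str,
    text_block_format: str = "plain",
    additional_instructions: str = "",
    format_specific_instructions: str = "",
) -> dict[str, str]:
    """Assemble the five inputs a Fix-compatible template takes (hypothetical helper)."""
    return {
        "text_block_format": text_block_format,
        # Uppercase variant, used in the PEM-style envelope markers.
        "text_block_format_upper": text_block_format.upper(),
        "additional_instructions": additional_instructions,
        "text_block": text_block,
        "format_specific_instructions": format_specific_instructions,
    }


# Hypothetical template: the real prompt wording is an assumption.
TEMPLATE = (
    "{format_specific_instructions}\n"
    "{additional_instructions}\n"
    "-----BEGIN {text_block_format_upper}-----\n"
    "{text_block}\n"
    "-----END {text_block_format_upper}-----"
)

inputs = build_template_inputs("# Title\n\nsome  text", "markdown")
prompt = TEMPLATE.format(**inputs)
```

Deriving `text_block_format_upper` from `text_block_format` in one place keeps the two values from drifting apart.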
Attributes
Classes
FixInitialize
A stage in the text processing pipeline that fixes the formatting of text files based on a specified profile.
This stage performs the following operations:
1. Verifies that the specified profile exists in the settings.
2. Verifies that the input file path exists and is a file.
3. Generates an output filename based on the input filename and current timestamp.
4. Derives the output directory path and creates it if it does not exist.
5. Reads the input file into memory.
6. Splits the text into chunks using a splitter defined in the profile.
7. Builds templates for each text chunk based on the profile.
8. Batch processes the templates.
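Steps 3, 4, and 6 might look roughly like the sketch below. Both function names, the filename pattern, and the paragraph-packing splitter are assumptions; the actual stage uses a splitter defined in the profile, which is not documented here.

```python
from datetime import datetime
from pathlib import Path


def make_output_path(input_path: Path, output_dir: Path, now: datetime) -> Path:
    """Steps 3-4: build a timestamped output name and create the directory if missing."""
    output_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return output_dir / f"{input_path.stem}_{stamp}{input_path.suffix}"


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Step 6 stand-in: pack paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk
            current = para          # start a new one with this paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Passing `now` in as an argument, instead of calling `datetime.now()` inside the function, makes the generated filename easy to test.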
FixLLM
A stage in the text processing pipeline that fixes formatting issues in the provided templates.
This stage processes the templates using a chain service and updates the state with the results. It ensures that all templates are processed successfully and writes the output to the specified file.
Functions
execute_stage
Executes the formatting stage for text processing.
This stage includes retry logic to ensure that only unprocessed templates are sent to the LLM for processing. If the stage fails, it can be retried without reprocessing the templates that have already been successfully processed.
Raises:
- MissingStateException – If profile_ref, output_path, or templates are not set in the state.
- ValueError – If some chunks were not processed successfully.
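The retry behaviour described for `execute_stage` can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: `process_pending`, the `results` dictionary keyed by template index, and the `process` callable are all hypothetical stand-ins for the stage state and the chain service.

```python
from typing import Callable


def process_pending(
    templates: list[str],
    results: dict[int, str],
    process: Callable[[str], str],
) -> dict[int, str]:
    """Process only templates without a stored result; raise if any remain."""
    for index, template in enumerate(templates):
        if index in results:
            continue  # already processed in an earlier attempt
        try:
            results[index] = process(template)
        except Exception:
            # Leave this template unprocessed so a retry picks it up.
            continue
    missing = [i for i in range(len(templates)) if i not in results]
    if missing:
        raise ValueError(f"{len(missing)} chunk(s) were not processed: {missing}")
    return results
```

Because `results` is mutated in place and survives the `ValueError`, calling the function again with the same dictionary sends only the missing templates back to `process`.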
FixResult
Bases: BaseModel
Model representing the result of the Fix stage.
Attributes:
- summary (str | None) – A summary of the fix operation.
- output_path (str | None) – The path to the output file.
FixState
Bases: BaseStageState[FixResult]
Attributes
Functions
cm_factory_fix
cm_factory_fix(
task_id: str | None = None,
stage_templates: FixState | None = None,
**kwargs
) -> CheckMate
CheckMate factory function.
By convention, all CheckMate implementations must have a factory function with a standardized signature. This removes unnecessary code duplication whenever a new task is created or retried.
Parameters:
- task_id (str | None, default: None) – The ID of the task.
- stage_templates (FixState | None, default: None) – The state templates for the stages.
- **kwargs – Additional keyword arguments.
Returns:
- CheckMate – The CheckMate instance for the Fix formatting operation.