rebuild_memory_collections_cm

CheckMate stages for the TextProc/Fix service.

TextProc/Fix is a general-purpose data cleaning and normalization operation that:

  1. Reads an input text file and splits it into chunks.
  2. Derives prompt template values based on the specified profile and other user-provided parameters.
  3. Invokes a batch LLM chain to process the templates.
  4. Writes the processed results to an output file.
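
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the service's implementation: `split_into_chunks`, `run_fix`, the line-based splitting rule, and the `process_batch` callable (a stand-in for the batch LLM chain) are all hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable, Dict, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Hypothetical splitter: group whole lines into chunks of about max_chars."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def run_fix(
    input_path: Path,
    output_path: Path,
    process_batch: Callable[[List[Dict[str, str]]], List[str]],
    max_chars: int = 2000,
) -> None:
    """Assumed end-to-end flow; process_batch stands in for the LLM chain."""
    text = input_path.read_text(encoding="utf-8")            # 1. read the input
    chunks = split_into_chunks(text, max_chars)              #    and split it
    templates = [{"text_block": chunk} for chunk in chunks]  # 2. template values
    results = process_batch(templates)                       # 3. batch processing
    output_path.write_text("".join(results), encoding="utf-8")  # 4. write output
```

Passing the batch processor in as a callable keeps the sketch runnable without an LLM; in the real service that role is played by the chain invoked in step 3.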

Fix-compatible templates take the following inputs:

  • text_block_format: Format of the input text, e.g., “plain”, “markdown”, “xml”, etc.
  • text_block_format_upper: Uppercase version of the text_block_format (used in the PEM envelope)
  • additional_instructions: User-specified details about each text block that may be useful for the LLM during processing.
  • text_block: The input text to be processed
  • format_specific_instructions: Additional instructions injected by TextProc based on the text_block_format.
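
As an illustration of how these five inputs fit together, the sketch below builds the input dictionary and renders it into a prompt. The `FORMAT_HINTS` table, the `build_template_inputs` helper, and the template text itself are assumptions made for this example; only the five key names come from the list above.

```python
from typing import Dict

# Hypothetical per-format hints; the real instructions are supplied by TextProc.
FORMAT_HINTS: Dict[str, str] = {
    "markdown": "Preserve heading levels and fenced code blocks.",
    "xml": "Keep the document well-formed.",
}

# Hypothetical template showing a PEM-style envelope around the text block.
HYPOTHETICAL_TEMPLATE = (
    "{format_specific_instructions}\n"
    "{additional_instructions}\n"
    "-----BEGIN {text_block_format_upper}-----\n"
    "{text_block}\n"
    "-----END {text_block_format_upper}-----\n"
)


def build_template_inputs(
    text_block: str,
    text_block_format: str,
    additional_instructions: str = "",
) -> Dict[str, str]:
    """Assemble the five inputs a Fix-compatible template expects."""
    return {
        "text_block_format": text_block_format,
        "text_block_format_upper": text_block_format.upper(),
        "additional_instructions": additional_instructions,
        "text_block": text_block,
        "format_specific_instructions": FORMAT_HINTS.get(text_block_format, ""),
    }


inputs = build_template_inputs("# Title\nsome text", "markdown", "Fix typos only.")
prompt = HYPOTHETICAL_TEMPLATE.format(**inputs)
```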

Attributes

TEXTPROC_FIX_TRACE_TAG module-attribute

TEXTPROC_FIX_TRACE_TAG = 'fix'

Classes

FixInitialize

Bases: BaseStage[FixState]

A stage in the text processing pipeline that fixes the formatting of text files based on a specified profile.

This stage performs the following operations:

  1. Verifies that the specified profile exists in the settings.
  2. Verifies that the input file path exists and is a file.
  3. Generates an output filename based on the input filename and current timestamp.
  4. Derives the output directory path and creates it if it does not exist.
  5. Reads the input file into memory.
  6. Splits the text into chunks using a splitter defined in the profile.
  7. Builds templates for each text chunk based on the profile.
  8. Batch processes the templates.
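
The filename and directory steps could look something like the sketch below. The helper name `derive_output_path`, the `_fixed_` infix, the timestamp format, and the fallback to the input file's directory are assumptions for illustration; the source only states that the name is derived from the input filename and the current timestamp.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def derive_output_path(
    input_file: str,
    output_dir: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Path:
    """Build a timestamped output path and make sure its directory exists."""
    src = Path(input_file)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Assumption: fall back to the input file's directory when none is given.
    target_dir = Path(output_dir) if output_dir else src.parent
    if target_dir.exists() and not target_dir.is_dir():
        raise ValueError(f"Output path exists but is not a directory: {target_dir}")
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f"{src.stem}_fixed_{stamp}{src.suffix}"
```

Accepting `now` as a parameter makes the naming deterministic in tests while defaulting to the current time in normal use.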

Functions

execute_stage
execute_stage() -> None

Executes the text processing stage for fixing formatting.

Raises:

  • ValueError

    If the profile is not found in the settings or if the output path exists but is not a directory.

  • FileNotFoundError

    If the input file path does not exist or is not a file.
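
A minimal sketch of the two checks behind these exceptions, assuming the settings expose profiles as a name-to-profile mapping. The `validate_inputs` helper and the `profiles` parameter are hypothetical.

```python
from pathlib import Path
from typing import Any, Mapping


def validate_inputs(profile: str, input_file: str, profiles: Mapping[str, Any]) -> None:
    """Raise the documented exceptions for a bad profile or input path."""
    if profile not in profiles:
        raise ValueError(f"Profile not found in settings: {profile!r}")
    # Path.is_file() is False both for missing paths and for directories.
    if not Path(input_file).is_file():
        raise FileNotFoundError(
            f"Input file does not exist or is not a file: {input_file}"
        )
```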

FixLLM

Bases: BaseStage[FixState]

A stage in the text processing pipeline that fixes formatting issues in the provided templates.

This stage processes the templates using a chain service and updates the state with the results. It ensures that all templates are processed successfully and writes the output to the specified file.

Functions

execute_stage
execute_stage() -> None

Executes the formatting stage for text processing.

This stage includes retry logic to ensure that only unprocessed templates are sent to the LLM for processing. If the stage fails, it can be retried without reprocessing the templates that have already been successfully processed.

Raises:

  • MissingStateException

    If profile_ref, output_path, or templates are not set in the state.

  • ValueError

    If some chunks were not processed successfully.
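
The retry behaviour described above can be approximated as follows: results are kept in a list aligned with the templates, and only the slots that are still `None` are sent again. The function names and the `process_batch` callable are hypothetical stand-ins, not the stage's actual code.

```python
from typing import Any, Callable, Dict, List, Optional

Template = Dict[str, Any]


def process_pending(
    templates: List[Template],
    result_chunks: Optional[List[Optional[str]]],
    process_batch: Callable[[List[Template]], List[Optional[str]]],
) -> List[Optional[str]]:
    """Send only templates without a stored result; keep earlier successes."""
    if result_chunks is None:
        result_chunks = [None] * len(templates)
    pending = [i for i, result in enumerate(result_chunks) if result is None]
    if pending:
        outputs = process_batch([templates[i] for i in pending])
        for index, output in zip(pending, outputs):
            result_chunks[index] = output
    return result_chunks


def ensure_complete(result_chunks: List[Optional[str]]) -> List[str]:
    """Raise ValueError if any chunk is still unprocessed."""
    missing = [i for i, result in enumerate(result_chunks) if result is None]
    if missing:
        raise ValueError(f"Chunks not processed successfully: {missing}")
    return [result for result in result_chunks if result is not None]
```

Because the partially filled list is stored on the state (`result_chunks`), a retried stage can call `process_pending` again and only the failed chunks reach the LLM.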

FixResult

Bases: BaseModel

Model representing the result of the Fix stage.

Attributes:

  • summary (str | None) –

    A summary of the fix operation.

  • output_path (str | None) –

    The path to the output file.

Attributes

output_path class-attribute instance-attribute
output_path: str | None = None
summary class-attribute instance-attribute
summary: str | None = None

FixState

Bases: BaseStageState[FixResult]

Attributes

additional_user_info class-attribute instance-attribute
additional_user_info: str | None = None
input_file instance-attribute
input_file: str
output_dir class-attribute instance-attribute
output_dir: str | None = None
output_path class-attribute instance-attribute
output_path: str | None = None
profile instance-attribute
profile: str
profile_ref class-attribute instance-attribute
profile_ref: FixFormattingProfile | None = None
result_chunks class-attribute instance-attribute
result_chunks: List[str | None] | None = None
templates class-attribute instance-attribute
templates: List[Dict[str, Any]] | None = None
trace_id class-attribute instance-attribute
trace_id: str | None = None
trace_session_id class-attribute instance-attribute
trace_session_id: str | None = None
trace_tags class-attribute instance-attribute
trace_tags: List[str] | None = None

Functions

cm_factory_fix

cm_factory_fix(
    task_id: str | None = None,
    stage_templates: FixState | None = None,
    **kwargs
) -> CheckMate

CheckMate factory function.

By convention, all CheckMate implementations must have a factory function with a standardized signature. This removes unnecessary code duplication whenever a new task is created or retried.

Parameters:

  • task_id (str | None, default: None ) –

    The ID of the task.

  • stage_templates (FixState | None, default: None ) –

    The state templates for the stages.

  • **kwargs

    Additional keyword arguments.

Returns:

  • CheckMate ( CheckMate ) –

    The CheckMate instance for the Fix formatting operation.
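
To show why a standardized factory signature reduces duplication, here is a toy sketch in which creation and retry share one code path through a registry. `DemoTask`, `demo_factory_fix`, `FACTORIES`, and `create_or_retry` are hypothetical stand-ins; they do not reflect the real CheckMate or FixState classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class DemoTask:
    """Hypothetical stand-in for a CheckMate instance."""
    task_id: Optional[str]
    state: Optional[Dict[str, Any]]
    options: Dict[str, Any] = field(default_factory=dict)


def demo_factory_fix(
    task_id: Optional[str] = None,
    stage_templates: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> DemoTask:
    """Mirrors the documented signature: task_id, stage_templates, **kwargs."""
    return DemoTask(task_id=task_id, state=stage_templates, options=dict(kwargs))


# Every factory shares one signature, so a single registry can dispatch them all.
FACTORIES: Dict[str, Callable[..., DemoTask]] = {"fix": demo_factory_fix}


def create_or_retry(
    kind: str,
    task_id: Optional[str] = None,
    stage_templates: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> DemoTask:
    """New tasks omit task_id; retries pass the existing one. Same call either way."""
    return FACTORIES[kind](task_id=task_id, stage_templates=stage_templates, **kwargs)


new_task = create_or_retry("fix", stage_templates={"input_file": "notes.md", "profile": "default"})
retried = create_or_retry("fix", task_id="task-123")
```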