util

Utility functions for common tasks and operations.

This module contains utility functions for common tasks and operations such as generating universally unique identifiers (UUIDs), calculating statistics for a list of numbers, converting a dictionary to a YAML string, and more.

Attributes

D `module-attribute`

D = TypeVar('D', int, float)

T `module-attribute`

T = TypeVar('T')

TIMESTAMP_FORMAT `module-attribute`

TIMESTAMP_FORMAT = '%Y%m%dT%H%M%S'

TIMESTAMP_FORMAT_ISO8601 `module-attribute`

TIMESTAMP_FORMAT_ISO8601 = '%Y-%m-%dT%H:%M:%S'

Classes

IndentDumper

IndentDumper(*args, **kwargs)

Bases: Dumper

Functions

increase_indent

increase_indent(flow=False, indentless=False)

represent_str `staticmethod`

represent_str(dumper, data)

Timer

Timer()

Timer class for measuring elapsed time between events.

Attributes

t1 `instance-attribute`

t1 = perf_counter()

Functions

mark

mark() -> float

Calculates the time elapsed since the last call to mark.

Returns:

float ( float ) –

The time elapsed in seconds.
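The behaviour described for `mark` can be sketched with a small stand-in class. `TimerSketch` is a hypothetical name used only for illustration, and resetting the reference point on every call is an assumption drawn from the phrase "since the last call to mark", not from the module's source.

```python
from time import perf_counter


class TimerSketch:
    """Stand-in for the documented Timer; not the module's actual source."""

    def __init__(self) -> None:
        # Reference point used by the first call to mark().
        self.t1 = perf_counter()

    def mark(self) -> float:
        # Measure the interval since the previous mark (or construction),
        # then move the reference point forward (assumed behaviour).
        now = perf_counter()
        elapsed = now - self.t1
        self.t1 = now
        return elapsed


timer = TimerSketch()
total = sum(range(100_000))  # some work to time
first = timer.mark()         # seconds spent since construction
second = timer.mark()        # seconds since the previous mark()
```

Under this reading, consecutive calls give per-step durations rather than a running total.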

Functions

build_response_schema_objs

build_response_schema_objs(
    response_schemas: List[Dict[str, Any]]
) -> List[ResponseSchema]

Helper method that will create instances of ResponseSchema objects from a list of response schema dictionaries (likely read from a config file).

Intended usage:

    response_schemas = build_response_schema_objs(exec_config.response_schema)
    …
    initial_statement_exec = FlexChainExecutor(
        …
        output_parser=RobustOutputParser.from_response_schemas(
            build_response_schema_objs(exec_config.response_schema)
        ),
        ..
    )

Parameters:

response_schemas (List[Dict[str, Any]]) –

A list of response schema dictionaries.

Returns:

List[ResponseSchema] –

List[ResponseSchema]: A list of ResponseSchema objects.

Raises:

ValueError –

If any of the response schema entries are invalid.

calculate_stats

calculate_stats(
    numbers: List[float],
) -> Tuple[int, float, float, float, float]

Calculate simple statistics for a list of numbers.

Returns a tuple of -1 values if the list is empty.

Parameters:

numbers (List[float]) –

A list of numbers.

Returns:

Tuple[int, float, float, float, float] –

A tuple containing the count, minimum value, maximum value, mean value, and standard deviation of the numbers. If the list is empty, the function returns (-1, -1.0, -1.0, -1.0, -1.0).
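A minimal sketch of the documented return shape, including the empty-list sentinel. `calculate_stats_sketch` is a hypothetical stand-in, and the use of the population standard deviation is an assumption: the source does not say whether the sample or population form is used.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def calculate_stats_sketch(
    numbers: List[float],
) -> Tuple[int, float, float, float, float]:
    # Sentinel tuple for empty input, as documented.
    if not numbers:
        return (-1, -1.0, -1.0, -1.0, -1.0)
    # Assumption: population standard deviation (pstdev); the module
    # may use the sample form instead.
    return (len(numbers), min(numbers), max(numbers), mean(numbers), pstdev(numbers))


count, lowest, highest, average, std_dev = calculate_stats_sketch(
    [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
)
empty = calculate_stats_sketch([])
```

Callers should check for a count of -1 before using the other fields, since -1.0 is also a valid minimum or mean for real data.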

calculate_stats_pretty

calculate_stats_pretty(numbers: List[float]) -> str

Calculate simple statistics for a list of numbers and return a pretty formatted string.

Parameters:

numbers (List[float]) –

A list of numbers.

Returns:

str ( str ) –

A pretty formatted string containing the count, minimum value, maximum value, mean value, and standard deviation of the numbers.

calculate_success_percentage

calculate_success_percentage(
    success_count: int, total_count: int, precision: int = 2
) -> float

Calculates the success percentage based on the number of successful & total counts.

Parameters:

success_count (int) –

The number of successful counts.
total_count (int) –

The total count.
precision (int, default: 2 ) –

The number of decimal places to round the percentage to. Defaults to 2.

Returns:

float ( float ) –

The success percentage.
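The calculation can be sketched as follows. `success_percentage_sketch` is a hypothetical stand-in, and returning 0.0 for a zero total is an assumption, since the source does not document that case.

```python
def success_percentage_sketch(
    success_count: int, total_count: int, precision: int = 2
) -> float:
    # Assumption: a zero total yields 0.0 rather than raising.
    if total_count == 0:
        return 0.0
    # Ratio expressed as a percentage, rounded to the requested precision.
    return round(success_count / total_count * 100, precision)


one_third = success_percentage_sketch(1, 3)      # 33.33
seven_eighths = success_percentage_sketch(7, 8)  # 87.5
no_attempts = success_percentage_sketch(0, 0)    # 0.0 under the assumption above
```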

callable_from_string

callable_from_string(path: str) -> Callable

Get a callable, such as a class or function, from a fully qualified path.

Parameters:

path (str) –

The fully qualified class path.

Returns:

Callable ( Callable ) –

The callable object.
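One common way to resolve a dotted path is to split off the final component and import the rest as a module. The sketch below uses only `importlib` from the standard library; `callable_from_string_sketch` is a hypothetical name, and the `ValueError` for an unqualified path is an assumption rather than documented behaviour.

```python
import importlib
from typing import Callable


def callable_from_string_sketch(path: str) -> Callable:
    # Split "package.module.name" into the module path and the attribute name.
    module_path, _, name = path.rpartition(".")
    if not module_path:
        # Assumption: a path without a module part is rejected.
        raise ValueError(f"Not a fully qualified path: {path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, name)


join = callable_from_string_sketch("os.path.join")
ordered_dict_cls = callable_from_string_sketch("collections.OrderedDict")
```

This approach does not handle nested attributes such as a method on a class, which would need a walk over several `getattr` calls.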

chunk_list

chunk_list(
    input_list: List[Any], n: int
) -> List[List[Any]]

Splits a list into chunks of up to n elements.

Parameters:

input_list (List[Any]) –

The list to be chunked.
n (int) –

The maximum number of elements in each chunk.

Returns:

List[List[Any]] –

List[List[Any]]: A list where each element is a chunk (sublist) of the input list.
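Chunking "up to n elements" means the last chunk may be shorter than the others. A minimal sketch, where `chunk_list_sketch` is a hypothetical stand-in and rejecting `n < 1` is an assumption:

```python
from typing import Any, List


def chunk_list_sketch(input_list: List[Any], n: int) -> List[List[Any]]:
    # Assumption: n must be positive; the source does not document this case.
    if n < 1:
        raise ValueError("n must be at least 1")
    # Step through the list n items at a time; slicing handles the short tail.
    return [input_list[i:i + n] for i in range(0, len(input_list), n)]


chunks = chunk_list_sketch([1, 2, 3, 4, 5], 2)  # [[1, 2], [3, 4], [5]]
```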

count_lines_in_file

count_lines_in_file(path: str) -> int

Counts the number of lines in a file.

Parameters:

path (str) –

The path to the file.

Returns:

int ( int ) –

The number of lines in the file.

dict_to_yaml

dict_to_yaml(data: dict) -> str

Converts a dictionary to a YAML string.

Parameters:

data (dict) –

The dictionary to convert.

Returns:

str ( str ) –

The YAML string.

epoch_start

epoch_start() -> datetime

Get the epoch start datetime.

Returns:

datetime ( datetime ) –

The epoch start datetime.

find_latest_file

find_latest_file(directory: str, prefix: str) -> str

flatten_lists

flatten_lists(
    input_list: Iterable[Iterable[T]],
) -> List[T]

Flattens an iterable of iterables (like list of lists or dict_values of lists) into a single list.

Parameters:

input_list (Iterable[Iterable[T]]) –

An iterable of iterables to be flattened.

Returns:

List[T] –

A flattened list containing all elements from the input iterables.
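Flattening one level of nesting can be sketched with `itertools.chain.from_iterable`, which accepts any iterable of iterables, including `dict_values`. `flatten_lists_sketch` is a hypothetical stand-in, not the module's source.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def flatten_lists_sketch(input_list: Iterable[Iterable[T]]) -> List[T]:
    # Concatenate the inner iterables in order; only one level is flattened.
    return list(chain.from_iterable(input_list))


groups = {"a": [1, 2], "b": [3], "c": []}
flat = flatten_lists_sketch(groups.values())  # [1, 2, 3]
```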

gen_timestamp

gen_timestamp(
    dt: Optional[datetime] = None,
    timestamp_format=TIMESTAMP_FORMAT,
    append_ms=True,
) -> str

Generate a timestamp string based on the current datetime or a specified datetime.

Parameters:

dt (Optional[datetime], default: None ) –

The datetime object to generate the timestamp from. If not provided, the current datetime will be used.
timestamp_format (str, default: TIMESTAMP_FORMAT ) –

The format string for the timestamp. Defaults to “%Y%m%dT%H%M%S”.
append_ms (bool, default: True ) –

Whether to append milliseconds to the timestamp. Defaults to True.

Returns:

str ( str ) –

The generated timestamp string.
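A rough sketch of how the two format constants behave with `strftime`. `gen_timestamp_sketch` is a hypothetical stand-in; appending milliseconds as three zero-padded digits with no separator, and using local time when no datetime is given, are both assumptions that the source does not confirm.

```python
from datetime import datetime
from typing import Optional

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"
TIMESTAMP_FORMAT_ISO8601 = "%Y-%m-%dT%H:%M:%S"


def gen_timestamp_sketch(
    dt: Optional[datetime] = None,
    timestamp_format: str = TIMESTAMP_FORMAT,
    append_ms: bool = True,
) -> str:
    # Assumption: fall back to the local current time when dt is omitted.
    if dt is None:
        dt = datetime.now()
    stamp = dt.strftime(timestamp_format)
    if append_ms:
        # Assumption: three zero-padded millisecond digits, no separator.
        stamp += f"{dt.microsecond // 1000:03d}"
    return stamp


fixed = datetime(2024, 1, 31, 12, 30, 45, 123456)
compact = gen_timestamp_sketch(fixed)
iso = gen_timestamp_sketch(fixed, TIMESTAMP_FORMAT_ISO8601, append_ms=False)
```

With the fixed datetime above, `iso` is `2024-01-31T12:30:45`; the compact form depends on the millisecond assumption.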

gen_trace_id

gen_trace_id(*args, ts: datetime | None = None) -> str

gen_uuid

gen_uuid(prefix: Optional[str] = None) -> str

Generate a universally unique identifier (UUID) with an optional prefix.

Parameters:

prefix (str, default: None ) –

Prefix to be added to the generated UUID. Defaults to None.

Returns:

str ( str ) –

The generated UUID with the optional prefix.

generate_filename

generate_filename(
    input_string: str,
    *,
    max_length: int = 100,
    min_length: int = 1,
    extension: Optional[str] = None
) -> str

Generates a platform-independent, human-readable filename.

This function generates a filename based on the input string that is suitable for use across Linux, Windows, and S3 object storage. The filename will include only uppercase letters, digits, dashes, underscores, and dots, and will be truncated to a specified maximum length.

Parameters:

input_string (str) –

The input string from which to generate the filename.
max_length (int, default: 100 ) –

The maximum length of the output filename, including the extension. Defaults to 100.
min_length (int, default: 1 ) –

The minimum length of the output filename, excluding the extension. Defaults to 1.
extension (Optional[str], default: None ) –

The optional file extension to append (e.g., ‘txt’, ‘json’). Do not include a dot.

Returns:

str ( str ) –

A platform-independent, sanitized, and truncated filename.

Raises:

ValueError –

If min_length is less than 1, max_length is less than min_length, or if the input results in an empty filename.

get_caller_info

get_caller_info(
    frame_index: int = 2, short: bool = False
) -> str

Retrieve the calling function’s file path, line number, and function name.

Returns:

str ( str ) –

A string in the format ‘path/to/file.py:line_number (function_name)’.

get_class_from_string

get_class_from_string(full_class_string)

Dynamically imports a class from a given full class string.

Parameters:

full_class_string (str) –

The full path to the class, including the module and class name, separated by a dot.

Returns:

class –

The class object referred to by full_class_string.

Raises:

ValueError –

If the class cannot be found.
ModuleNotFoundError –

If the module cannot be imported.

get_current_function_name

get_current_function_name() -> str

Get the name of the current function.

Returns:

str ( str ) –

The name of the current function.

indent_string

indent_string(
    text: str, indent: int, skip_first_line: bool = False
) -> str

Indents each line of a multiline string by a specified number of spaces.

Parameters:

text (str) –

The multiline string to be indented.
indent (int) –

The number of spaces for indentation.
skip_first_line (bool, default: False ) –

If True, the first line will not be indented.

Returns:

str –

The indented string.
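The effect of `skip_first_line` is easiest to see on a two-line string. `indent_string_sketch` below is a hypothetical stand-in; padding blank lines like any other line is an assumption.

```python
def indent_string_sketch(text: str, indent: int, skip_first_line: bool = False) -> str:
    pad = " " * indent
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        # Leave the first line alone when requested; pad every other line.
        if i == 0 and skip_first_line:
            out.append(line)
        else:
            out.append(pad + line)
    return "\n".join(out)


all_lines = indent_string_sketch("key:\nvalue", 2)                      # "  key:\n  value"
hanging = indent_string_sketch("key:\nvalue", 2, skip_first_line=True)  # "key:\n  value"
```

Skipping the first line is useful when the text is appended to a prefix that already sits at the target column.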

list_to_bullets

list_to_bullets(
    the_list: Union[List, str],
    bullet_char="*",
    quote_char="",
) -> str

Converts a list of items into a formatted string with bullets.

Parameters:

the_list (list) –

The list of items to be converted.
bullet_char (str, default: '*' ) –

The character used as the bullet point. Defaults to “*”.
quote_char (str, default: '' ) –

The character used to enclose each item. Defaults to “”.

Returns:

str ( str ) –

The formatted string with bullets.

list_to_numbered

list_to_numbered(the_list: list) -> str

Converts a list of items into a formatted string with numbers.

Parameters:

the_list (list) –

The list of items to be converted.

Returns:

str ( str ) –

The formatted string with numbers.

log_structured

log_structured(
    data: Dict[str, Union[str, Iterable[str]]],
    color: Optional[int] = None,
    heading: str = "",
) -> str

Formats structured data into a log message.

Example:

```python
LOG.info("\n%s", log_structured(data=data, heading=self.task_id, color=color))
```

Parameters:

data (Dict[str, Union[str, Iterable[str]]]) –

The structured data to be logged.
color (Optional[int], default: None ) –

The color code for the log message. Defaults to None.
heading (str, default: '' ) –

The heading for the log message. Defaults to "".

Returns:

str –

The formatted log message.

minmax_scale

minmax_scale(
    value: D,
    min_value: D,
    max_value: D,
    strict: bool = False,
) -> float

Scale a value to the range [0, 1] using min-max scaling.

This method implements some rules that check for edge cases:

If the value is less than the minimum value, the function will return 0.0.
If the value is greater than the maximum value, the function will return 1.0.
If the minimum and maximum values are equal, the function will return 1.0 (to avoid a division by zero error).

Parameters:

value (D) –

The value to be scaled.
min_value (D) –

The minimum value of the range.
max_value (D) –

The maximum value of the range.
strict (bool, default: False ) –

If True, the function will raise a ValueError if the value is outside the range. Defaults to False.

Returns:

float ( float ) –

The scaled value.
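The edge-case rules listed above can be sketched as a clamp followed by the usual formula `(value - min_value) / (max_value - min_value)`. `minmax_scale_sketch` is a hypothetical stand-in; the order in which the checks run, and the wording of the error, are assumptions chosen so that all three documented rules hold at once.

```python
def minmax_scale_sketch(value, min_value, max_value, strict: bool = False) -> float:
    # In strict mode an out-of-range value is an error instead of being clamped.
    if strict and not (min_value <= value <= max_value):
        raise ValueError(f"{value} is outside [{min_value}, {max_value}]")
    # Clamp values that fall outside the range.
    if value < min_value:
        return 0.0
    if value > max_value:
        return 1.0
    # Degenerate range: avoid dividing by zero.
    if min_value == max_value:
        return 1.0
    return (value - min_value) / (max_value - min_value)


midpoint = minmax_scale_sketch(5, 0, 10)     # 0.5
below = minmax_scale_sketch(-1, 0, 10)       # 0.0
above = minmax_scale_sketch(11, 0, 10)       # 1.0
degenerate = minmax_scale_sketch(3, 3, 3)    # 1.0
```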

normalize_sentence

normalize_sentence(
    s: str, capitalize_first_word: bool = False
) -> str

Strips surrounding single or double quotes from a string if they encapsulate the entire string. Also, optionally capitalizes the first letter of the first word.

Parameters:

s (str) –

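The string to be filtered.
capitalize_first_word (bool, default: False ) –

If True, capitalizes the first letter of the first word.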

Returns:

str ( str ) –

The filtered string.

now_tz

now_tz() -> datetime

Get the current datetime in the system-configured timezone.

Returns:

datetime ( datetime ) –

The current datetime in the system-configured timezone.

now_utc

now_utc() -> datetime

Get the current UTC datetime.

Returns:

datetime ( datetime ) –

The current datetime in UTC.

now_utc_timestamp

now_utc_timestamp() -> float

Get the current UTC datetime as a numeric timestamp.

Returns:

float ( float ) –

The current UTC datetime as a numeric timestamp.

pydantic_hash

pydantic_hash(*args, encoding='utf-8') -> str

Generate a hash value from the JSON representations of Pydantic models.

Parameters:

*args –

Variable number of Pydantic models.
encoding (str, default: 'utf-8' ) –

The encoding to use for encoding the JSON representations. Defaults to “utf-8”.

Returns:

str ( str ) –

The hash value generated from the JSON representations.

Raises:

ValueError –

If any of the objects passed as arguments are not Pydantic models.

remove_from_string

remove_from_string(s: str, to_remove: List[str]) -> str

Removes all occurrences of a list of strings from a string.

Parameters:

s (str) –

The string to remove from.
to_remove (List[str]) –

The list of strings to remove.

Returns:

str ( str ) –

The string with all occurrences of the strings in to_remove removed.
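A minimal sketch using repeated `str.replace`. `remove_from_string_sketch` is a hypothetical stand-in, and processing the substrings one after another in list order is an assumption about how the removal is applied.

```python
from typing import List


def remove_from_string_sketch(s: str, to_remove: List[str]) -> str:
    # Remove each substring in turn; later removals see the result of earlier ones.
    for item in to_remove:
        s = s.replace(item, "")
    return s


cleaned = remove_from_string_sketch("hello world", ["l", "o"])  # "he wrd"
```

Because the removals are sequential, an earlier removal can join two fragments into a new match for a later entry, so the order of `to_remove` can matter.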

remove_quotes

remove_quotes(input_string) -> str

Removes surrounding matching quotes from the input string if present, ignoring whitespace between quotes. This will work for any number of nested matching quotes that are present in the string.

Parameters:

input_string (str) –

The string to remove quotes from.

Returns:

str ( str ) –

The input string without surrounding matching quotes, if present.

reverse_dict_lookup

reverse_dict_lookup(
    haystack: Dict[Any, Any],
    needle: Any,
    *,
    error_on_empty: bool = False,
    expected_count: int | None = None
) -> List[Any]

Find all keys in a dictionary that map to a specified target value.

Parameters:

haystack (Dict[Any, Any]) –

The dictionary to search.
needle (Any) –

The value to search for.
error_on_empty (bool, default: False ) –

If True, raises a ValueError if no results are found.
expected_count (int, default: None ) –

If provided, raises a ValueError if the number of found keys does not match this value.

Returns:

List[Any] –

List[Any]: A list of keys that map to the target value.

Raises:

TypeError –

If haystack is not a dictionary.
ValueError –

If haystack is None, if error_on_empty is True and no keys are found, or if expected_count is provided and does not match the number of found keys.
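The documented checks can be sketched as below. `reverse_dict_lookup_sketch` is a hypothetical stand-in that follows the Raises section; the error messages and the use of `==` for comparison are assumptions.

```python
from typing import Any, Dict, List, Optional


def reverse_dict_lookup_sketch(
    haystack: Dict[Any, Any],
    needle: Any,
    *,
    error_on_empty: bool = False,
    expected_count: Optional[int] = None,
) -> List[Any]:
    if haystack is None:
        raise ValueError("haystack must not be None")
    if not isinstance(haystack, dict):
        raise TypeError("haystack must be a dictionary")
    # Collect every key whose value equals the needle, in insertion order.
    keys = [key for key, value in haystack.items() if value == needle]
    if error_on_empty and not keys:
        raise ValueError(f"no keys map to {needle!r}")
    if expected_count is not None and len(keys) != expected_count:
        raise ValueError(f"expected {expected_count} keys, found {len(keys)}")
    return keys


roles = {"alice": "admin", "bob": "user", "carol": "admin"}
admins = reverse_dict_lookup_sketch(roles, "admin")  # ["alice", "carol"]
```

Passing `expected_count=1` is a convenient way to assert that a value maps back to exactly one key.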

set_attr_path

set_attr_path(obj: Any, path: str, value: Any) -> bool

set_or_convert_timezone

set_or_convert_timezone(dt, default_tz=utc) -> datetime

Sets the timezone of a naive datetime object to the default timezone, or converts the timezone of an aware datetime object to the default timezone.

Parameters:

dt (datetime) –

The datetime object to check and set/convert the timezone.
default_tz (tzinfo, default: utc ) –

The default timezone to set or convert to. Default is UTC.

Returns:

datetime ( datetime ) –

The datetime object with timezone set or converted.
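The difference between setting and converting matters: a naive datetime keeps its wall-clock time and only gains a timezone, while an aware datetime is shifted to the same instant in the target zone. The sketch below uses the standard library's `timezone.utc` as the default, which is an assumption; the `utc` in the signature may come from another library. `set_or_convert_timezone_sketch` is a hypothetical stand-in.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def set_or_convert_timezone_sketch(
    dt: datetime, default_tz: tzinfo = timezone.utc
) -> datetime:
    # Naive datetime: attach the timezone without changing the wall-clock time.
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        return dt.replace(tzinfo=default_tz)
    # Aware datetime: convert to the same instant in the target timezone.
    return dt.astimezone(default_tz)


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

labelled = set_or_convert_timezone_sketch(naive)   # 12:00 UTC
converted = set_or_convert_timezone_sketch(aware)  # 10:00 UTC
```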

string_from_callable

string_from_callable(
    obj: Union[Type[Any], Callable[..., Any], Any]
) -> str

Get a fully qualified class path from a class object, class instance, function reference, or even a built-in type.

Parameters:

obj (Union[Type[Any], Callable[..., Any], Any]) –

The class object, class instance, function reference, or built-in type.

Returns:

str ( str ) –

The fully qualified class path.

Example

# For a class
print(string_from_callable(MyClass))  # Output: "my_module.MyClass"

# For a class instance
instance = MyClass()
print(string_from_callable(instance))  # Output: "my_module.MyClass"

# For a function
print(string_from_callable(my_function))  # Output: "my_module.my_function"

# For a built-in type
print(string_from_callable(int))  # Output: "builtins.int"

update_dict

update_dict(
    the_dict: Dict[Any, Any], key: Any, value: Any
) -> Dict[Any, Any]

Updates the given dictionary the_dict with the specified key and value, and returns the updated dictionary.

Parameters:

the_dict (Dict[Any, Any]) –

The dictionary to be updated.
key (Any) –

The key to be added or updated in the dictionary.
value (Any) –

The value to be associated with the key.

Returns:

Dict[Any, Any] –

Dict[Any, Any]: The updated dictionary.

wrap_multiline_log

wrap_multiline_log(
    message: str,
    color: int | None = None,
    heading: str = "",
    indent: int = 0,
) -> str

Wraps a multiline log message with an easy to read delimiter and colorizes the actual message.

When logging at INFO level, you may want to use Fore.LIGHTWHITE_EX to make the message easier to read. For WARNING or ERROR levels it may be better to keep color = None and use the logger colorizer configuration.

Parameters:

message (str) –

The log message to be wrapped.
color (Optional[int], default: None ) –

The color code to apply to the log message. Defaults to None.
heading (str, default: '' ) –

The heading for the log message. Defaults to “”.
indent (int, default: 0 ) –

The number of spaces to indent each line. Defaults to 0.

Returns:

str ( str ) –

The wrapped log message.

wrap_pem

wrap_pem(data: str, heading: str) -> str

Wrap the data with PEM-style header and footer.

Parameters:

data (str) –

The data to be wrapped.
heading (str) –

The heading to be included in the header.

Returns:

str ( str ) –

The data wrapped with PEM-style header and footer.

wrap_with_tags

wrap_with_tags(
    tag: str, data: List[str], join_seq="\n"
) -> str

Wrap each string in the data list with the specified HTML-style tag. Each wrapped string includes an auto-incrementing id attribute.

Parameters:

tag (str) –

The name of the tag to wrap the strings with.
data (List[str]) –

A list of strings to be wrapped.
join_seq (str, default: '\n' ) –

The sequence used to join the wrapped strings.

Returns:

str ( str ) –

A single string containing all wrapped data elements.

Raises:

ValueError –

If the tag is empty or contains whitespace.
TypeError –

If any element in data is not a string.
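A sketch of the validation and wrapping described above. `wrap_with_tags_sketch` is a hypothetical stand-in; the exact tag layout (`<tag id="N">…</tag>`) and the ids starting at 1 are assumptions, since the source only says that each element gets an auto-incrementing id attribute.

```python
import re
from typing import List


def wrap_with_tags_sketch(tag: str, data: List[str], join_seq: str = "\n") -> str:
    # Reject empty tags and tags containing any whitespace, as documented.
    if not tag or re.search(r"\s", tag):
        raise ValueError("tag must be non-empty and contain no whitespace")
    wrapped = []
    # Assumption: ids start at 1 and increase by one per element.
    for index, item in enumerate(data, start=1):
        if not isinstance(item, str):
            raise TypeError("every element in data must be a string")
        wrapped.append(f'<{tag} id="{index}">{item}</{tag}>')
    return join_seq.join(wrapped)


tagged = wrap_with_tags_sketch("item", ["alpha", "beta"])
```

Under these assumptions, `tagged` holds two lines, one per element, each wrapped in an `item` tag with ids 1 and 2.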
