util
Utility functions for common tasks and operations.
This module contains utility functions for common tasks and operations such as generating universally unique identifiers (UUIDs), calculating statistics for a list of numbers, converting a dictionary to a YAML string, and more.
Attributes
Classes
IndentDumper
Timer
Functions
build_response_schema_objs
Helper method that will create instances of ResponseSchema objects from a list of response schema dictionaries (likely read from a config file).
Intended usage:
response_schemas = build_response_schema_objs(exec_config.response_schema)
…
initial_statement_exec = FlexChainExecutor(
    …
    output_parser=RobustOutputParser.from_response_schemas(
        build_response_schema_objs(exec_config.response_schema)
    ),
    ..
)
Parameters:
-
response_schemas
(List[Dict[str, Any]]
) –A list of response schema dictionaries.
Returns:
-
List[ResponseSchema]
–A list of ResponseSchema objects.
Raises:
-
ValueError
–If any of the response schema entries are invalid.
calculate_stats
Calculate simple statistics for a list of numbers.
Returns a tuple of -1 values if the list is empty.
Parameters:
-
numbers
(List[float]
) –A list of numbers.
Returns:
-
Tuple[int, float, float, float, float]
–A tuple containing the count, minimum value, maximum value, mean value, and standard deviation of the numbers. If the list is empty, the function returns (-1, -1.0, -1.0, -1.0, -1.0).
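The return contract above can be illustrated with a small sketch. This is not the module's implementation: the name `calculate_stats_sketch` is hypothetical, and the use of the population standard deviation is an assumption, since the documentation does not say which form is used.

```python
import statistics
from typing import List, Tuple


def calculate_stats_sketch(numbers: List[float]) -> Tuple[int, float, float, float, float]:
    """Hypothetical stand-in for calculate_stats, for illustration only."""
    if not numbers:
        # Sentinel tuple documented for the empty-list case.
        return (-1, -1.0, -1.0, -1.0, -1.0)
    return (
        len(numbers),
        min(numbers),
        max(numbers),
        statistics.fmean(numbers),
        # Assumption: population standard deviation; the real function may use the sample form.
        statistics.pstdev(numbers),
    )


count, low, high, mean, std = calculate_stats_sketch([2.0, 4.0, 6.0])
print(count, low, high, mean, round(std, 3))
```

Callers should check for a count of -1 before using the other fields, because the sentinel values are otherwise indistinguishable from real statistics of negative inputs.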
calculate_stats_pretty
Calculate simple statistics for a list of numbers and return a pretty formatted string.
Parameters:
-
numbers
(List[float]
) –A list of numbers.
Returns:
-
str
(str
) –A pretty formatted string containing the count, minimum value, maximum value, mean value, and standard deviation of the numbers.
calculate_success_percentage
Calculates the success percentage based on the number of successful & total counts.
Parameters:
-
success_count
(int
) –The number of successful counts.
-
total_count
(int
) –The total count.
-
precision
(int
, default:2
) –The number of decimal places to round the percentage to. Defaults to 2.
Returns:
-
float
(float
) –The success percentage.
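A minimal sketch of the calculation, assuming the percentage is `success_count / total_count * 100` rounded to `precision` places. The function name is hypothetical, and returning 0.0 for a zero total is an assumption; the documentation does not describe that case.

```python
def success_percentage_sketch(success_count: int, total_count: int, precision: int = 2) -> float:
    """Hypothetical stand-in for calculate_success_percentage."""
    if total_count == 0:
        # Assumption: avoid a division by zero by reporting 0.0.
        return 0.0
    return round(success_count / total_count * 100, precision)


print(success_percentage_sketch(1, 3))      # 33.33
print(success_percentage_sketch(5, 5))      # 100.0
```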
callable_from_string
Get a class from a fully qualified class path.
Parameters:
-
path
(str
) –The fully qualified class path.
Returns:
-
Callable
(Callable
) –The callable object.
chunk_list
Splits a list into chunks of up to n elements.
Parameters:
-
input_list
(List[Any]
) –The list to be chunked.
-
n
(int
) –The maximum number of elements in each chunk.
Returns:
-
List[List[Any]]
–A list where each element is a chunk (sublist) of the input list.
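The chunking behavior can be sketched with list slicing. The name `chunk_list_sketch` is hypothetical, and rejecting `n < 1` with a ValueError is an assumption not stated in the documentation.

```python
from typing import Any, List


def chunk_list_sketch(input_list: List[Any], n: int) -> List[List[Any]]:
    """Hypothetical stand-in for chunk_list."""
    if n < 1:
        # Assumption: a non-positive chunk size is treated as an error.
        raise ValueError("n must be at least 1")
    # Step through the list n items at a time; the last chunk may be shorter.
    return [input_list[i:i + n] for i in range(0, len(input_list), n)]


print(chunk_list_sketch([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```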
count_lines_in_file
Counts the number of lines in a file.
Parameters:
-
path
(str
) –The path to the file.
Returns:
-
int
(int
) –The number of lines in the file.
dict_to_yaml
Converts a dictionary to a yaml string.
Parameters:
-
data
(dict
) –The dictionary to convert.
Returns:
-
str
(str
) –The yaml string.
epoch_start
Get the epoch start datetime.
Returns:
-
datetime
(datetime
) –The epoch start datetime.
flatten_lists
Flattens an iterable of iterables (like list of lists or dict_values of lists) into a single list.
Parameters:
-
input_list
(Iterable[Iterable[T]]
) –An iterable of iterables to be flattened.
Returns:
-
List[T]
–A flattened list containing all elements from the input iterables.
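One common way to get this behavior is `itertools.chain.from_iterable`. The sketch below uses a hypothetical name and flattens exactly one level, which is an assumption about the real function.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def flatten_lists_sketch(input_list: Iterable[Iterable[T]]) -> List[T]:
    """Hypothetical stand-in for flatten_lists; flattens one level only."""
    return list(chain.from_iterable(input_list))


groups = {"a": [1, 2], "b": [3]}
print(flatten_lists_sketch(groups.values()))  # [1, 2, 3]
```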
gen_timestamp
gen_timestamp(
dt: Optional[datetime] = None,
timestamp_format=TIMESTAMP_FORMAT,
append_ms=True,
) -> str
Generate a timestamp string based on the current datetime or a specified datetime.
Parameters:
-
dt
(Optional[datetime]
, default:None
) –The datetime object to generate the timestamp from. If not provided, the current datetime will be used.
-
timestamp_format
(str
, default:TIMESTAMP_FORMAT
) –The format string for the timestamp. Defaults to “%Y%m%dT%H%M%S”.
-
append_ms
(bool
, default:True
) –Whether to append milliseconds to the timestamp. Defaults to True.
Returns:
-
str
(str
) –The generated timestamp string.
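A rough sketch of how such a timestamp could be produced with `strftime`. The function name is hypothetical, and the way milliseconds are appended (three zero-padded digits, no separator) is an assumption; the real output format may differ.

```python
from datetime import datetime
from typing import Optional

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"  # default documented above


def gen_timestamp_sketch(
    dt: Optional[datetime] = None,
    timestamp_format: str = TIMESTAMP_FORMAT,
    append_ms: bool = True,
) -> str:
    """Hypothetical stand-in for gen_timestamp."""
    if dt is None:
        dt = datetime.now()
    stamp = dt.strftime(timestamp_format)
    if append_ms:
        # Assumption: milliseconds follow directly as three digits.
        stamp += f"{dt.microsecond // 1000:03d}"
    return stamp


fixed = datetime(2024, 1, 2, 3, 4, 5, 678000)
print(gen_timestamp_sketch(fixed))                   # 20240102T030405678
print(gen_timestamp_sketch(fixed, append_ms=False))  # 20240102T030405
```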
gen_uuid
Generate a universally unique identifier (UUID) with an optional prefix.
Parameters:
-
prefix
(str
, default:None
) –Prefix to be added to the generated UUID. Defaults to None.
Returns:
-
str
(str
) –The generated UUID with the optional prefix.
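For illustration, a prefixed identifier could be built on `uuid.uuid4`. The name `gen_uuid_sketch`, the choice of UUID version 4, and the `-` separator between prefix and UUID are all assumptions.

```python
import uuid
from typing import Optional


def gen_uuid_sketch(prefix: Optional[str] = None) -> str:
    """Hypothetical stand-in for gen_uuid."""
    value = str(uuid.uuid4())
    # Assumption: the prefix is joined to the UUID with a dash.
    return f"{prefix}-{value}" if prefix else value


print(gen_uuid_sketch("task"))
```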
generate_filename
generate_filename(
input_string: str,
*,
max_length: int = 100,
min_length: int = 1,
extension: Optional[str] = None
) -> str
Generates a platform-independent, human-readable filename.
This function generates a filename based on the input string that is suitable for use across Linux, Windows, and S3 object storage. The filename will include only uppercase letters, digits, dashes, underscores, and dots, and will be truncated to a specified maximum length.
Parameters:
-
input_string
(str
) –The input string from which to generate the filename.
-
max_length
(int
, default:100
) –The maximum length of the output filename, including the extension. Defaults to 100.
-
min_length
(int
, default:1
) –The minimum length of the output filename, excluding the extension. Defaults to 1.
-
extension
(Optional[str]
, default:None
) –The optional file extension to append (e.g., ‘txt’, ‘json’). Do not include a dot.
Returns:
-
str
(str
) –A platform-independent, sanitized, and truncated filename.
Raises:
-
ValueError
–If min_length is less than 1, max_length is less than min_length, or if the input results in an empty filename.
get_caller_info
Retrieve the calling function’s file path, line number, and function name.
Returns:
-
str
(str
) –A string in the format ‘path/to/file.py:line_number (function_name)’.
get_class_from_string
Dynamically imports a class from a given full class string.
Parameters:
-
full_class_string
(str
) –The full path to the class, including the module and class name, separated by a dot.
Returns:
-
class
–The class object referred to by full_class_string.
Raises:
-
ValueError
–If the class cannot be found.
-
ModuleNotFoundError
–If the module cannot be imported.
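Dynamic imports of this kind are usually built on `importlib.import_module`. The sketch below uses a hypothetical name and mirrors the documented exceptions; how the real function splits the path and words its errors is an assumption.

```python
import importlib
from collections import OrderedDict


def get_class_sketch(full_class_string: str):
    """Hypothetical stand-in for get_class_from_string."""
    module_path, _, class_name = full_class_string.rpartition(".")
    if not module_path:
        raise ValueError(f"Not a fully qualified path: {full_class_string!r}")
    # import_module raises ModuleNotFoundError if the module does not exist.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ValueError(f"Class {class_name!r} not found in {module_path!r}") from exc


print(get_class_sketch("collections.OrderedDict") is OrderedDict)  # True
```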
get_current_function_name
Get the name of the current function.
Returns:
-
str
(str
) –The name of the current function.
indent_string
Indents each line of a multiline string by a specified number of spaces.
Parameters:
-
text
–The multiline string to be indented.
-
indent
–The number of spaces for indentation.
-
skip_first_line
–If True, the first line will not be indented.
Returns:
-
str
–The indented string.
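A minimal sketch of the indentation behavior. The function name and the default for `skip_first_line` are assumptions, as is the choice to pad blank lines and to drop a trailing newline.

```python
def indent_string_sketch(text: str, indent: int, skip_first_line: bool = False) -> str:
    """Hypothetical stand-in for indent_string."""
    pad = " " * indent
    lines = text.splitlines()
    return "\n".join(
        # Leave the first line untouched only when skip_first_line is set.
        line if (skip_first_line and i == 0) else pad + line
        for i, line in enumerate(lines)
    )


print(indent_string_sketch("key:\nvalue", 2, skip_first_line=True))
```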
list_to_bullets
Converts a list of items into a formatted string with bullets.
Parameters:
-
the_list
(list
) –The list of items to be converted.
-
bullet_char
(str
, default:'*'
) –The character used as the bullet point. Defaults to “*”.
-
quote_char
(str
, default:''
) –The character used to enclose each item. Defaults to “”.
Returns:
-
str
(str
) –The formatted string with bullets.
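As an illustration of the parameters, the sketch below joins one bulleted line per item. The name is hypothetical, and the exact layout (a single space after the bullet, newline-separated items) is an assumption about the real output.

```python
def list_to_bullets_sketch(the_list: list, bullet_char: str = "*", quote_char: str = "") -> str:
    """Hypothetical stand-in for list_to_bullets."""
    return "\n".join(
        # Assumption: "<bullet> <quote>item<quote>" on each line.
        f"{bullet_char} {quote_char}{item}{quote_char}"
        for item in the_list
    )


print(list_to_bullets_sketch(["alpha", "beta"], bullet_char="-", quote_char='"'))
```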
list_to_numbered
Converts a list of items into a formatted string with numbers.
Parameters:
-
the_list
(list
) –The list of items to be converted.
Returns:
-
str
(str
) –The formatted string with numbers.
log_structured
log_structured(
data: Dict[str, Union[str, Iterable[str]]],
color: Optional[int] = None,
heading: str = "",
) -> str
Formats structured data into a log message.
Example:
```python
LOG.info("\n%s", log_structured(data=data, heading=self.task_id, color=color))
```
Args:
data (Dict[str, Union[str, Iterable[str]]]): The structured data to be logged.
color (Optional[int], optional): The color code for the log message. Defaults to None.
heading (str, optional): The heading for the log message. Defaults to "".
Returns:
str: The formatted log message.
minmax_scale
Scale a value to the range [0, 1] using min-max scaling.
This method implements some rules that check for edge cases:
- If the value is less than the minimum value, the function will return 0.0.
- If the value is greater than the maximum value, the function will return 1.0.
- If the minimum and maximum values are equal, the function will return 1.0 (to avoid a division by zero error).
Parameters:
-
value
(D
) –The value to be scaled.
-
min_value
(D
) –The minimum value of the range.
-
max_value
(D
) –The maximum value of the range.
-
strict
(bool
, default:False
) –If True, the function will raise a ValueError if the value is outside the range. Defaults to False.
Returns:
-
float
(float
) –The scaled value.
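The edge-case rules listed above can be sketched as follows. The function name is hypothetical, plain floats stand in for the generic type `D`, and the order in which the checks are applied is an assumption.

```python
def minmax_scale_sketch(value: float, min_value: float, max_value: float, strict: bool = False) -> float:
    """Hypothetical stand-in for minmax_scale."""
    if strict and not (min_value <= value <= max_value):
        raise ValueError(f"{value} is outside [{min_value}, {max_value}]")
    if min_value == max_value:
        # Documented rule: avoid dividing by zero.
        return 1.0
    if value < min_value:
        return 0.0
    if value > max_value:
        return 1.0
    return float((value - min_value) / (max_value - min_value))


print(minmax_scale_sketch(5, 0, 10))   # 0.5
print(minmax_scale_sketch(-3, 0, 10))  # 0.0
print(minmax_scale_sketch(7, 7, 7))    # 1.0
```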
normalize_sentence
Strips surrounding single or double quotes from a string if they encapsulate the entire string. Also, optionally capitalizes the first letter of the first word.
Parameters:
-
s
(str
) –The string to be filtered.
Returns:
-
str
(str
) –The filtered string.
now_tz
Get the current datetime with the system configured timezone.
Returns:
-
datetime
(datetime
) –The current datetime in the system configured timezone.
now_utc
Get the current UTC datetime.
Returns:
-
datetime
(datetime
) –The current UTC datetime.
now_utc_timestamp
Get the current UTC datetime as a numeric timestamp.
Returns:
-
float
(float
) –The current UTC datetime as a numeric timestamp.
pydantic_hash
Generate a hash value from the JSON representations of Pydantic models.
Parameters:
-
*args
–Variable number of Pydantic models.
-
encoding
(str
, default:'utf-8'
) –The encoding to use for encoding the JSON representations. Defaults to “utf-8”.
Returns:
-
str
(str
) –The hash value generated from the JSON representations.
Raises:
-
ValueError
–If any of the objects passed as arguments are not Pydantic models.
remove_from_string
Removes all occurrences of a list of strings from a string.
Parameters:
-
s
(str
) –The string to remove from.
-
to_remove
(List[str]
) –The list of strings to remove.
Returns:
-
str
(str
) –The string with all occurrences of the strings in to_remove removed.
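A short sketch of the removal loop, using a hypothetical name. Applying the removals in list order with `str.replace` is an assumption; the order matters when the strings to remove overlap.

```python
from typing import List


def remove_from_string_sketch(s: str, to_remove: List[str]) -> str:
    """Hypothetical stand-in for remove_from_string."""
    for fragment in to_remove:
        # Each fragment is removed everywhere before the next one is applied.
        s = s.replace(fragment, "")
    return s


print(remove_from_string_sketch("hello world", ["l", "o"]))  # he wrd
```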
remove_quotes
Removes surrounding matching quotes from the input string if present, ignoring whitespace between quotes. This will work for any number of nested matching quotes that are present in the string.
Parameters:
-
input_string
(str
) –The string to remove quotes from.
Returns:
-
str
(str
) –The input string without surrounding matching quotes, if present.
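The nested-quote behavior can be sketched with a loop that peels one matching pair at a time. The function name is hypothetical, and stripping outer whitespace even when no quotes are present is an assumption.

```python
def remove_quotes_sketch(input_string: str) -> str:
    """Hypothetical stand-in for remove_quotes."""
    result = input_string.strip()
    # Peel matching outer quotes repeatedly, trimming whitespace between layers.
    while len(result) >= 2 and result[0] == result[-1] and result[0] in "'\"":
        result = result[1:-1].strip()
    return result


print(remove_quotes_sketch("\"' hello '\""))  # hello
print(remove_quotes_sketch("\"hello'"))       # "hello'  (quotes do not match)
```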
reverse_dict_lookup
reverse_dict_lookup(
haystack: Dict[Any, Any],
needle: Any,
*,
error_on_empty: bool = False,
expected_count: int | None = None
) -> List[Any]
Find all keys in a dictionary that map to a specified target value.
Parameters:
-
haystack
(Dict[Any, Any]
) –The dictionary to search.
-
needle
(Any
) –The value to search for.
-
error_on_empty
(bool
, default:False
) –If True, raises a ValueError if no results are found.
-
expected_count
(int
, default:None
) –If provided, raises a ValueError if the number of found keys does not match this value.
Returns:
-
List[Any]
–A list of keys that map to the target value.
Raises:
-
TypeError
–If haystack is not a dictionary.
-
ValueError
–If haystack is None, if error_on_empty is True and no keys are found, or if expected_count is provided and does not match the number of found keys.
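The lookup and its validation rules can be sketched as below. The name `reverse_dict_lookup_sketch` is hypothetical; comparing values with `==` and checking for None before the type check are assumptions.

```python
from typing import Any, Dict, List, Optional


def reverse_dict_lookup_sketch(
    haystack: Dict[Any, Any],
    needle: Any,
    *,
    error_on_empty: bool = False,
    expected_count: Optional[int] = None,
) -> List[Any]:
    """Hypothetical stand-in for reverse_dict_lookup."""
    if haystack is None:
        raise ValueError("haystack must not be None")
    if not isinstance(haystack, dict):
        raise TypeError("haystack must be a dictionary")
    keys = [key for key, value in haystack.items() if value == needle]
    if error_on_empty and not keys:
        raise ValueError(f"No keys map to {needle!r}")
    if expected_count is not None and len(keys) != expected_count:
        raise ValueError(f"Expected {expected_count} keys, found {len(keys)}")
    return keys


print(reverse_dict_lookup_sketch({"a": 1, "b": 2, "c": 1}, 1))  # ['a', 'c']
```

Passing `expected_count=1` is a convenient way to assert that a value is unique before indexing the result with `[0]`.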
set_or_convert_timezone
Sets the timezone of a naive datetime object to the default timezone, or converts the timezone of an aware datetime object to the default timezone.
Parameters:
-
dt
(datetime
) –The datetime object to check and set/convert the timezone.
-
default_tz
(tzinfo
, default:utc
) –The default timezone to set or convert to. Default is UTC.
Returns:
-
datetime
(datetime
) –The datetime object with timezone set or converted.
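The set-or-convert distinction maps onto `datetime.replace` for naive values and `datetime.astimezone` for aware ones. The sketch uses a hypothetical name and the standard library's `timezone.utc` as the default, which is an assumption; the module may use a different UTC object.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def set_or_convert_timezone_sketch(dt: datetime, default_tz: tzinfo = timezone.utc) -> datetime:
    """Hypothetical stand-in for set_or_convert_timezone."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        # Naive datetime: attach the timezone without shifting the clock time.
        return dt.replace(tzinfo=default_tz)
    # Aware datetime: convert to the target timezone, preserving the instant.
    return dt.astimezone(default_tz)


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(set_or_convert_timezone_sketch(naive))  # 2024-01-01 12:00:00+00:00
print(set_or_convert_timezone_sketch(aware))  # 2024-01-01 10:00:00+00:00
```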
string_from_callable
Get a fully qualified class path from a class object, class instance, function reference, or even a built-in type.
Parameters:
-
obj
(Union[Type[Any], Callable[..., Any], Any]
) –The class object, class instance, function reference, or built-in type.
Returns:
-
str
(str
) –The fully qualified class path.
Example
# For a class
print(string_from_callable(MyClass)) # Output: "my_module.MyClass"
# For a class instance
instance = MyClass()
print(string_from_callable(instance)) # Output: "my_module.MyClass"
# For a function
print(string_from_callable(my_function)) # Output: "my_module.my_function"
# For a built-in type
print(string_from_callable(int)) # Output: "builtins.int"
update_dict
Updates the given dictionary the_dict with the specified key and value, and returns the updated dictionary.
Parameters:
-
the_dict
(Dict[Any, Any]
) –The dictionary to be updated.
-
key
(Any
) –The key to be added or updated in the dictionary.
-
value
(Any
) –The value to be associated with the key.
Returns:
-
Dict[Any, Any]
–The updated dictionary.
wrap_multiline_log
wrap_multiline_log(
message: str,
color: int | None = None,
heading: str = "",
indent: int = 0,
) -> str
Wraps a multiline log message with an easy to read delimiter and colorizes the actual message.
When logging to INFO you may want to use Fore.LIGHTWHITE_EX to make the message easier to read. For WARNING or ERROR levels it may be better to keep color = None and use the logger colorizer configuration.
Parameters:
-
message
(str
) –The log message to be wrapped.
-
color
(Optional[int]
, default:None
) –The color code to apply to the log message. Defaults to None.
-
heading
(str
, default:''
) –The heading for the log message. Defaults to “”.
-
indent
(int
, default:0
) –The number of spaces to indent each line. Defaults to 0.
Returns:
-
str
(str
) –The wrapped log message.
wrap_pem
Wrap the data with PEM-style header and footer.
Parameters:
-
data
(str
) –The data to be wrapped.
-
heading
(str
) –The heading to be included in the header.
Returns:
-
str
(str
) –The data wrapped with PEM-style header and footer.
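For orientation, PEM-style wrapping conventionally uses `-----BEGIN ...-----` and `-----END ...-----` delimiter lines. The sketch below assumes that layout and uses a hypothetical name; the real function's exact delimiters and line breaks are not documented here.

```python
def wrap_pem_sketch(data: str, heading: str) -> str:
    """Hypothetical stand-in for wrap_pem."""
    # Assumption: the heading is used verbatim in both delimiter lines.
    return f"-----BEGIN {heading}-----\n{data}\n-----END {heading}-----"


print(wrap_pem_sketch("some prompt text", "PROMPT"))
```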
wrap_with_tags
Wrap each string in the data list with the specified HTML-style tag. Each wrapped string includes an auto-incrementing id attribute.
Parameters:
-
tag
(str
) –The name of the tag to wrap the strings with.
-
data
(List[str]
) –A list of strings to be wrapped.
Returns:
-
str
(str
) –A single string containing all wrapped data elements.
Raises:
-
ValueError
–If the tag is empty or contains whitespace.
-
TypeError
–If any element in data is not a string.
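A sketch of the tagging behavior and its documented errors. The function name is hypothetical, and the id numbering starting at 1, the attribute quoting, and the newline separator are assumptions.

```python
from typing import List


def wrap_with_tags_sketch(tag: str, data: List[str]) -> str:
    """Hypothetical stand-in for wrap_with_tags."""
    if not tag or any(ch.isspace() for ch in tag):
        raise ValueError("tag must be non-empty and contain no whitespace")
    wrapped = []
    for index, item in enumerate(data, start=1):  # Assumption: ids start at 1.
        if not isinstance(item, str):
            raise TypeError(f"Element {index} is not a string")
        wrapped.append(f'<{tag} id="{index}">{item}</{tag}>')
    return "\n".join(wrapped)


print(wrap_with_tags_sketch("doc", ["first", "second"]))
```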