tokenizer
Tokenization wrapper
This module provides an abstraction layer around model tokenizers. Its purpose is to enable token-based chunking for vectorization and map-reduce operations.
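As a rough illustration of what token-based chunking through such an abstraction layer might look like, here is a minimal sketch. The names SketchTokenizer, WhitespaceTokenizer, and chunk_by_tokens, along with the encode/decode method signatures, are assumptions made for this example and are not taken from this module; the toy whitespace tokenizer merely stands in for a real model tokenizer so the example runs without extra dependencies.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SketchTokenizer(ABC):
    """Hypothetical interface; the module's actual BaseTokenizer may differ."""

    @abstractmethod
    def encode(self, text: str) -> list[int]:
        """Convert text to a list of token ids."""

    @abstractmethod
    def decode(self, tokens: list[int]) -> str:
        """Convert a list of token ids back to text."""


class WhitespaceTokenizer(SketchTokenizer):
    """Toy stand-in for a model tokenizer: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self._word_to_id: dict[str, int] = {}
        self._id_to_word: list[str] = []

    def encode(self, text: str) -> list[int]:
        ids = []
        for word in text.split():
            # Assign the next free id to words seen for the first time.
            if word not in self._word_to_id:
                self._word_to_id[word] = len(self._id_to_word)
                self._id_to_word.append(word)
            ids.append(self._word_to_id[word])
        return ids

    def decode(self, tokens: list[int]) -> str:
        return " ".join(self._id_to_word[t] for t in tokens)


def chunk_by_tokens(text: str, tokenizer: SketchTokenizer, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = tokenizer.encode(text)
    return [
        tokenizer.decode(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


chunks = chunk_by_tokens("the quick brown fox jumps", WhitespaceTokenizer(), max_tokens=2)
print(chunks)  # ['the quick', 'brown fox', 'jumps']
```

Because the chunking function depends only on the shared interface, the same code could in principle be used with any concrete tokenizer wrapper, which is the kind of decoupling an abstraction layer like this is meant to provide.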
Classes
BaseTokenizer
HuggingFaceTokenizer
Bases: BaseTokenizer
A wrapper around HuggingFace tokenizers.
TikTokenTokenizer
Bases: BaseTokenizer
A wrapper around TikToken tokenizers.
Warning
Since I’ve focused on open-source models up to this point, the TikTokenTokenizer has not been thoroughly tested.