
llama3

Parser that converts chat strings in the Llama3 model family's format into the canonical chat representation.

Attributes

STATE_INITIAL module-attribute

STATE_INITIAL = 'INITIAL'

STATE_MESSAGE module-attribute

STATE_MESSAGE = 'message'

STATE_START module-attribute

STATE_START = 'start'

states module-attribute

states = (
    (STATE_START, "exclusive"),
    (STATE_MESSAGE, "exclusive"),
)
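Both states are declared `exclusive`. In PLY, that means only the rules prefixed with a state's name are active while that state is active. This is why `start` and `message` each need their own error rule (`t_start_error`, `t_message_error`) and why `start` has its own whitespace rule.

The states appear to split each chat turn into a header phase and a message phase. The sketch below models that cycle as a lookup table. Which token triggers which switch is an assumption inferred from the rule names; this page does not document it. Inside a real PLY rule, the switch would be a call such as `t.lexer.begin("start")`.

```python
# Assumed state transitions, inferred from the documented rules:
# t_start_* rules only make sense after <|start_header_id|>, and
# t_message_* rules only after <|end_header_id|>.
STATE_INITIAL = "INITIAL"
STATE_START = "start"
STATE_MESSAGE = "message"

# Hypothetical table: token type -> state the lexer switches to.
TRANSITIONS = {
    "STARTHEADER": STATE_START,   # expect a role name next
    "ENDHEADER": STATE_MESSAGE,   # expect the message body next
    "ENDCONTENT": STATE_INITIAL,  # turn finished, back to the top level
}


def next_state(state: str, token_type: str) -> str:
    """Return the state after emitting token_type; other tokens keep the state."""
    return TRANSITIONS.get(token_type, state)


def trace(token_types):
    """Return the state the lexer is in after each token."""
    state = STATE_INITIAL
    states = []
    for token_type in token_types:
        state = next_state(state, token_type)
        states.append(state)
    return states


print(trace(["BOS", "STARTHEADER", "ROLE", "ENDHEADER", "MESSAGE", "ENDCONTENT"]))
# ['INITIAL', 'start', 'start', 'message', 'message', 'INITIAL']
```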

t_WHITESPACE module-attribute

t_WHITESPACE = '[ \\t]+'

t_message_error module-attribute

t_message_error = t_error

t_start_WHITESPACE module-attribute

t_start_WHITESPACE = '[ \\t]+'

t_start_error module-attribute

t_start_error = t_error

tokens module-attribute

tokens = (
    "BOS",
    "EOS",
    "STARTHEADER",
    "ROLE",
    "MESSAGE",
    "ENDHEADER",
    "ENDCONTENT",
    "NEWLINE",
    "WHITESPACE",
)
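To show how the token types and per-state rules fit together, here is a minimal re-implementation that uses only the standard library. It is a hypothetical sketch, not the module's code.

- The regexes mirror the documented rules, with the pipes in special tokens escaped.
- The state switches are assumed: STARTHEADER switches to `start`, ENDHEADER to `message`, and ENDCONTENT back to `INITIAL`.
- Rules are tried in the listed order. PLY itself adds function rules in definition order and string rules by decreasing regex length.

```python
import re
from typing import Iterator, Tuple

# Per-state rules, tried in order. Regexes mirror the documented ones,
# with the "|" in special tokens escaped so they match literally.
RULES = {
    "INITIAL": [
        ("BOS", r"<\|begin_of_text\|>"),
        ("EOS", r"<\|end_of_text\|>"),
        ("STARTHEADER", r"<\|start_header_id\|>"),
        ("NEWLINE", r"\n"),
        ("WHITESPACE", r"[ \t]+"),
    ],
    "start": [
        ("ENDHEADER", r"<\|end_header_id\|>"),
        ("ROLE", r"(system|user|assistant)"),
        ("WHITESPACE", r"[ \t]+"),
    ],
    "message": [
        ("ENDCONTENT", r"<\|eot_id\|>"),
        ("MESSAGE", r"[\s\S]+?(?=<\|eot_id\|>)"),
    ],
}
COMPILED = {
    state: [(name, re.compile(pattern)) for name, pattern in rules]
    for state, rules in RULES.items()
}
# Assumed state switches triggered by specific tokens.
SWITCH = {"STARTHEADER": "start", "ENDHEADER": "message", "ENDCONTENT": "INITIAL"}


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (token_type, value) pairs; raise ValueError on illegal input."""
    state, pos = "INITIAL", 0
    while pos < len(text):
        for token_type, regex in COMPILED[state]:
            match = regex.match(text, pos)
            if match:
                yield token_type, match.group()
                pos = match.end()
                state = SWITCH.get(token_type, state)
                break
        else:
            raise ValueError(f"illegal character {text[pos]!r} at {pos} in state {state!r}")


sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Hello!<|eot_id|>"
)
for token in tokenize(sample):
    print(token)
```

Under these assumptions, the `\n\n` that follows `<|end_header_id|>` ends up inside the MESSAGE token, because NEWLINE is only active in the initial state.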

Classes

Llama3

Llama3(env: Environment)

Bases: BaseCanonical

Llama3 lexer implementation

There is an open discussion on Hugging Face, which I started, to clarify how names and few-shot examples should be encoded in the Llama3 chat template: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/89

Attributes

name property
name: str

Functions

to_canonical
to_canonical(data: str, **kwargs) -> CanonicalChat
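`to_canonical` takes the raw chat string and returns a `CanonicalChat`, but its internals are not shown here. One plausible final step is grouping the lexed tokens into one entry per turn.

The sketch below does that over a hand-written token list. A plain list of dicts stands in for `CanonicalChat`, whose real structure is not documented on this page. `tokens_to_messages` is a hypothetical helper name, and stripping the `\n\n` separator is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Token = Tuple[str, str]  # (token_type, value), as a lexer might produce


def tokens_to_messages(tokens: Iterable[Token]) -> List[Dict[str, str]]:
    """Group ROLE/MESSAGE/ENDCONTENT tokens into role/content dicts.

    The list-of-dicts shape is a hypothetical stand-in for CanonicalChat.
    """
    messages: List[Dict[str, str]] = []
    role: Optional[str] = None
    content = ""
    for token_type, value in tokens:
        if token_type == "ROLE":
            role, content = value, ""
        elif token_type == "MESSAGE":
            # Assumption: drop the "\n\n" separator that follows <|end_header_id|>.
            content = value[2:] if value.startswith("\n\n") else value
        elif token_type == "ENDCONTENT" and role is not None:
            messages.append({"role": role, "content": content})
            role, content = None, ""
    return messages


tokens = [
    ("BOS", "<|begin_of_text|>"),
    ("STARTHEADER", "<|start_header_id|>"),
    ("ROLE", "system"),
    ("ENDHEADER", "<|end_header_id|>"),
    ("MESSAGE", "\n\nAnswer briefly."),
    ("ENDCONTENT", "<|eot_id|>"),
    ("STARTHEADER", "<|start_header_id|>"),
    ("ROLE", "user"),
    ("ENDHEADER", "<|end_header_id|>"),
    ("MESSAGE", "\n\nWhat is PLY?"),
    ("ENDCONTENT", "<|eot_id|>"),
]
print(tokens_to_messages(tokens))
# [{'role': 'system', 'content': 'Answer briefly.'}, {'role': 'user', 'content': 'What is PLY?'}]
```

A turn whose body is empty produces no MESSAGE token, so it is emitted with an empty `content` string.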

Functions

t_BOS

t_BOS(t: LexToken)

<\|begin_of_text\|>
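These rules appear to follow the PLY convention in which a function rule's docstring is its token regex. Because `|` is the regex alternation operator, the pipes inside special tokens such as `<|begin_of_text|>` must be escaped for the rule to match the whole token. A quick check with Python's standard `re` module (Python 3.7+ for the exact `re.escape` output):

```python
import re

BOS = "<|begin_of_text|>"

# Unescaped, "|" is alternation: the pattern means "<" OR "begin_of_text" OR ">",
# so matching at the start of the token consumes only "<".
print(re.match("<|begin_of_text|>", BOS).group())  # <

# re.escape (Python 3.7+) escapes only regex metacharacters, here the two pipes.
pattern = re.escape(BOS)
print(pattern)  # <\|begin_of_text\|>
print(bool(re.fullmatch(pattern, BOS)))  # True
```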

t_EOS

t_EOS(t: LexToken)

<\|end_of_text\|>

t_NEWLINE

t_NEWLINE(t: LexToken)

\n

t_STARTHEADER

t_STARTHEADER(t: LexToken)

<\|start_header_id\|>

t_error

t_error(t: LexToken)
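No docstring is shown for `t_error`. A common PLY error rule records the offending character and skips past it with `t.lexer.skip(1)`. Whether this module does the same is not documented here.

When PLY calls the error rule, `t.value` holds the remaining input. The sketch uses hypothetical stand-in classes instead of PLY's lexer and `LexToken`, so it runs without PLY installed.

Since `t_start_error` and `t_message_error` are aliases of `t_error`, one handler covers all three states. This matters because PLY warns when an exclusive state has no error rule.

```python
class StubLexer:
    """Hypothetical stand-in for a PLY lexer: only tracks the input position."""

    def __init__(self, data: str):
        self.lexdata = data
        self.lexpos = 0

    def skip(self, n: int) -> None:
        self.lexpos += n


class StubToken:
    """Hypothetical stand-in for ply.lex.LexToken as passed to an error rule."""

    def __init__(self, lexer: StubLexer):
        self.lexer = lexer
        self.lexpos = lexer.lexpos
        self.value = lexer.lexdata[lexer.lexpos:]  # the remaining input


errors = []


def t_error(t):
    """Record the illegal character, then resume one character later."""
    errors.append((t.lexpos, t.value[0]))
    t.lexer.skip(1)


# Mirrors the aliasing on this page: one handler shared by every state.
t_start_error = t_error
t_message_error = t_error

lexer = StubLexer("?user")
t_start_error(StubToken(lexer))
print(errors, lexer.lexpos)  # [(0, '?')] 1
```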

t_message_ENDCONTENT

t_message_ENDCONTENT(t: LexToken)

<\|eot_id\|>

t_message_MESSAGE

t_message_MESSAGE(t: LexToken)

[\s\S]+?(?=<\|eot_id\|>)
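The pattern has three parts that work together:

- `[\s\S]` matches any character, including newlines, which `.` does not match without `re.DOTALL`.
- The lazy `+?` stops at the earliest possible point.
- The lookahead `(?=...)` requires `<|eot_id|>` to follow but does not consume it, leaving it for the ENDCONTENT rule.

A standard-library check, with the pipes escaped as `\|` so they match literally:

```python
import re

MESSAGE = re.compile(r"[\s\S]+?(?=<\|eot_id\|>)")

text = "\n\nLine one\nLine two<|eot_id|><|start_header_id|>"
match = MESSAGE.match(text)
print(repr(match.group()))  # '\n\nLine one\nLine two'

# Without a terminating <|eot_id|> there is no MESSAGE token at all.
print(MESSAGE.match("unterminated text"))  # None
```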

t_start_ENDHEADER

t_start_ENDHEADER(t: LexToken)

<\|end_header_id\|>

t_start_ROLE

t_start_ROLE(t: LexToken)

(system|user|assistant)
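Here the pipes are intended as alternation, so the rule accepts exactly the three role names. A match is attempted at the current position with no word boundary after it. As a result, a longer name that starts with one of the roles still matches that prefix, while any other name does not match at all. A standard-library check:

```python
import re

ROLE = re.compile(r"(system|user|assistant)")

for role in ("system", "user", "assistant"):
    assert ROLE.fullmatch(role)

# match() anchors only at the start, so a longer name still matches a prefix.
print(ROLE.match("username").group())  # user
print(ROLE.match("tool"))  # None
```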