Prompt Engineering for LLMs by John Berryman


Author: John Berryman
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2023-09-28T00:00:00+00:00


More on Inertness

The last bit, inertness, depends on your tokenizer, which might use different tokens to tokenize a composite string A + B than to tokenize each string individually. That can easily increase or decrease the number of tokens needed to tokenize a composite string (see Table 6-4).

Pasting strings together doesn’t mean the arrays of tokens are simply concatenated. The token IDs below were obtained with OpenAI’s GPT-3.5-and-later tokenizer, but both examples behave the same way with the GPT-3-and-earlier tokenizer used by many non-OpenAI LLMs.

Table 6-4. Token count isn’t additive

              Example 1               Example 2
Strings       “be” + “am” ➜ “beam”    “cat” + “tail” ➜ “cattail”
Tokens        [be] + [am] ➜ [beam]    [cat] + [tail] ➜ [c], [att], [ail]
Token IDs     1395 + 309 ➜ 54971      4719 + 14928 ➜ 66, 1617, 607
Token count   1 + 1 ➜ 1               1 + 1 ➜ 3
