
Commit 8c7d266

lead with practical info, clarify chat/embedding input/output token distinction
1 parent fc7a9d9 commit 8c7d266

File tree

  • explore-analyze/elastic-inference/eis.md

1 file changed: +5 −4 lines

explore-analyze/elastic-inference/eis.md

Lines changed: 5 additions & 4 deletions
```diff
@@ -57,13 +57,14 @@ All models on EIS incur a charge per million tokens. The pricing details are at
 
 ### Token-based billing
 
-EIS is billed per million tokens used. Tokens are the fundamental units that language models process for both input and output.
+EIS is billed per million tokens used:
 
-Tokenizers convert text into numerical data by segmenting it into subword units. A token may be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.
+- For chat models, input and output tokens are billed. Longer conversations with extensive context or detailed responses will consume more tokens.
+- For embeddings models, only input tokens are billed.
 
-For example, the sentence "It was the best of times, it was the worst of times." contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
+Tokens are the fundamental units that language models process for both input and output. Tokenizers convert text into numerical data by segmenting it into subword units. A token may be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.
 
-Both input tokens (your prompts and any context provided) and output tokens (the model's responses) count toward usage. Longer conversations with extensive context or detailed responses will consume more tokens.
+For example, the sentence "It was the best of times, it was the worst of times." contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
 
 ## Rate Limits
```
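The updated text distinguishes chat billing (input and output tokens) from embeddings billing (input tokens only). A minimal sketch of that arithmetic, using made-up placeholder prices rather than actual EIS rates (see the pricing page referenced above):

```python
# Placeholder per-million-token prices for illustration only;
# these are NOT actual EIS rates.
CHAT_PRICE_PER_MTOK = 0.50    # hypothetical: applies to input + output
EMBED_PRICE_PER_MTOK = 0.10   # hypothetical: applies to input only

def chat_cost(input_tokens: int, output_tokens: int) -> float:
    """Chat models bill both input and output tokens."""
    return (input_tokens + output_tokens) / 1_000_000 * CHAT_PRICE_PER_MTOK

def embedding_cost(input_tokens: int) -> float:
    """Embeddings models bill input tokens only."""
    return input_tokens / 1_000_000 * EMBED_PRICE_PER_MTOK

# Longer conversations consume more tokens and therefore cost more.
print(chat_cost(input_tokens=8_000, output_tokens=2_000))  # 0.005
print(embedding_cost(input_tokens=10_000))                 # 0.001
```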

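The 52-character, roughly 14-token example can be reproduced with a naive word-and-punctuation tokenizer. This only illustrates the "typical word-based approach" the text mentions; a model's trained subword tokenizer will generally produce a different count:

```python
import re

sentence = "It was the best of times, it was the worst of times."

# Naive word-based tokenization: each word and each punctuation mark
# becomes one token. Real models use trained subword tokenizers
# (e.g. BPE), so exact counts vary by model.
tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(len(sentence))  # 52 characters
print(len(tokens))    # 14 tokens: 12 words + comma + period
print(tokens)
```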