
Commit 2d56513

[EIS] Adding more info on tokens (elastic#3673)
Co-authored-by: Liam Thompson <[email protected]>
1 parent 8d49fc4 commit 2d56513

File tree

1 file changed: +11 -0 lines changed

  • explore-analyze/elastic-inference


explore-analyze/elastic-inference/eis.md

Lines changed: 11 additions & 0 deletions
@@ -57,6 +57,17 @@ All models on EIS incur a charge per million tokens. The pricing details are at
Note that this pricing model differs from the existing [Machine Learning Nodes](https://www.elastic.co/docs/explore-analyze/machine-learning/data-frame-analytics/ml-trained-models), which are billed via VCUs consumed.
### Token-based billing

EIS is billed per million tokens used:
- For **chat** models, input and output tokens are billed. Longer conversations with extensive context or detailed responses will consume more tokens.
- For **embeddings** models, only input tokens are billed (a rough cost sketch follows this list).
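As an illustration of how per-million-token billing adds up, the sketch below estimates a charge from token counts. The function and the prices are hypothetical placeholders, not actual EIS rates; see the pricing page linked above for the real figures.

```python
# Rough sketch of estimating a token-based charge.
# All prices below are hypothetical placeholders, not real EIS rates.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return an estimated charge for the given token counts."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_million
    output_cost = (output_tokens / 1_000_000) * output_price_per_million
    return input_cost + output_cost

# Chat model: both input and output tokens are billed.
chat_cost = estimate_cost(input_tokens=250_000, output_tokens=75_000,
                          input_price_per_million=1.0,   # placeholder rate
                          output_price_per_million=4.0)  # placeholder rate

# Embeddings model: only input tokens are billed.
embed_cost = estimate_cost(input_tokens=1_200_000, output_tokens=0,
                           input_price_per_million=0.1,  # placeholder rate
                           output_price_per_million=0.0)

print(f"chat: {chat_cost:.4f}, embeddings: {embed_cost:.4f}")
```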
Tokens are the fundamental units that language models process for both input and output. Tokenizers convert text into numerical data by segmenting it into subword units. A token may be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.

For example, the sentence "It was the best of times, it was the worst of times." contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
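To make the character-versus-token distinction concrete, the small sketch below counts tokens for that sentence. It uses a simple word-and-punctuation split plus the open-source `tiktoken` tokenizer purely as examples; neither is the tokenizer EIS models actually use, so the counts are only illustrative.

```python
import re

import tiktoken  # pip install tiktoken; used only as an example tokenizer

text = "It was the best of times, it was the worst of times."

# Naive word-and-punctuation split: the "word-based approach" mentioned above.
word_tokens = re.findall(r"\w+|[^\w\s]", text)
print(len(text), "characters")                # 52
print(len(word_tokens), "word-level tokens")  # 14

# A subword tokenizer segments the same text differently and may return a
# different count; EIS models ship their own tokenizers.
encoding = tiktoken.get_encoding("cl100k_base")
print(len(encoding.encode(text)), "subword tokens (cl100k_base)")
```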
## Rate Limits
The service enforces rate limits on an ongoing basis. Exceeding a limit will result in HTTP 429 responses from the server until the sliding window moves on and part of the limit resets.
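A common way to handle these responses is to retry with backoff until the window moves on. The sketch below is a generic illustration, not part of any Elastic client; the URL, headers, and payload are hypothetical placeholders.

```python
# Generic sketch of retrying on HTTP 429 with exponential backoff.
import time

import requests


def post_with_backoff(url: str, payload: dict, headers: dict,
                      max_retries: int = 5) -> requests.Response:
    """POST `payload`, retrying on HTTP 429 with exponential backoff."""
    delay = 1.0
    resp = None
    for _ in range(max_retries + 1):
        resp = requests.post(url, json=payload, headers=headers, timeout=30)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After if the server sends it; otherwise back off
        # exponentially until the sliding window lets requests through again.
        retry_after = resp.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay = min(delay * 2, 60.0)
    return resp  # still rate limited after max_retries attempts
```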

0 commit comments
