Commit a4dbc27

bwade@scottlogic.com (Siteleaf) authored and committed
Updated Bigger Recolour Bert Cli and 5 other files
1 parent 693644b

File tree

6 files changed: +287 additions, −9 deletions

_drafts/token-triage.markdown

Lines changed: 11 additions & 9 deletions
@@ -1,13 +1,15 @@
 ---
-title: Token Triage
-date: 2025-12-15 11:02:00 Z
+title: Token Prism
+date: 2025-12-15 15:11:00 Z
 categories:
 - Artificial Intelligence
-summary: Making a simple CLI tool to visualise tokeniser output.
+summary: Visualising the hidden building blocks of LLM text
 author: jstrong
 ---

-Recently, I have been working on an agentic AI system. Tool calls and their results abound and the tokens mount up quite quickly. I had a need to see where all the tokens were coming from, what they consisted of, and if they were all necessary. In particular, I wanted to visualise the token output of [OpenAI](https://openai.com/) models. OpenAI already provides a [tokeniser website](https://platform.openai.com/tokenizer) for its models but given the sensitivity of the data I am working with, using this with any more than toy data would be inappropriate. Consequently, I set out to make my own, more secure, offline solution.
+# Token Prism
+
+When working with agentic AI, tool calls and their results abound and the tokens mount up quickly. As a result, I wanted to visualise the token output of [OpenAI](https://openai.com/) models. OpenAI already provides a [tokeniser website](https://platform.openai.com/tokenizer) for its models, but due to data sensitivity, using it with anything more than toy data would be inappropriate. Consequently, I set out to make my own, more secure, offline solution.

 ## Background

@@ -23,7 +25,7 @@ Whilst tokens may be more efficient, they are not without their drawbacks. An of

 The eagle-eyed among you will notice the answer is 3. However, even the most advanced LLMs of the day regularly claim otherwise - and this is almost all down to tokenisation. For example, GPT-4 does not see 'strawberry' as 'S-T-R-A-W-B-E-R-R-Y,' but instead 'STR-AW-BERRY.'

-![Strawberry tokenisation with images courtesy of Nano Banana Pro.](/uploads/tokenisation_diagram.png)
+![Strawberry tokenisation with images courtesy of Nano Banana Pro.](/uploads/no_robot_tokenisation_diagram.png)

 It cannot 'see' the letters individually, so it is difficult for it to count them correctly.
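The letter-counting failure above is easy to reproduce in miniature. This is a pure-Python illustration; the token split is the approximate 'STR-AW-BERRY' one from the post, not an exact tokeniser output:

```python
# Character view: what we see.
word = "strawberry"
print(word.count("r"))  # 3

# Token view: roughly what GPT-4 sees, per the post.
tokens = ["str", "aw", "berry"]
# The model receives one opaque id per token, so the letters inside each
# chunk are not directly visible to it - counting them requires the model
# to have effectively memorised each token's spelling.
print(len(tokens))  # 3 chunks - but that is a coincidence, not the letter count
```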

@@ -39,7 +41,7 @@ In particular, I wanted a CLI-based solution for ease of use that would support

 ## How

-[`tiktoken`](https://github.com/openai/tiktoken) is the Python package which allows access to tokenisers for OpenAI models such as GPT-4o, GPT-5 etc., so I decided to go ahead with Python as the language for my application.
+[`tiktoken`](https://github.com/openai/tiktoken) is the Python SDK which allows access to tokenisers for OpenAI models such as GPT-4o, GPT-5 etc. However, it is purely a library for encoding and decoding programmatically - its output is not readily human-readable. Therefore, I decided to use Python as the language for my application, wrapping the backend logic of `tiktoken` with a visual interface better suited for analysis by a person.

 ### Separating

@@ -77,19 +79,19 @@ For the CLI, I went with [`click`](https://github.com/pallets/click) to define t

 With the tokens separated and decoded, I applied a colour cycle to the output. The resulting CLI looks like this:

-![Piping file to CLI.](/uploads/cli_file_pipe.png)
+![Piping file to CLI.](/uploads/bigger_cli_file_pipe_recolour.png)
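A minimal sketch of such a colour cycle, assuming plain ANSI escape codes and `itertools.cycle`; the real tool may choose its colours differently:

```python
from itertools import cycle

# Rotating ANSI background colours make adjacent tokens distinguishable.
ANSI_BACKGROUNDS = ["\033[44m", "\033[41m", "\033[42m", "\033[45m"]
RESET = "\033[0m"

def colourise(pieces: list[str]) -> str:
    """Wrap each token piece in the next colour of the cycle."""
    colours = cycle(ANSI_BACKGROUNDS)
    return "".join(f"{c}{piece}{RESET}" for c, piece in zip(colours, pieces))

print(colourise(["STR", "AW", "BERRY"]))
```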

 This works as I had envisioned, so now it is time to move on to the reactive aspect. I decided to go with the [`textual`](https://github.com/Textualize/textual) TUI package to facilitate this. The API was straightforward and easy to use, and now when you pass `-i` or `--interactive` you see:

-![TUI shown via the interactive flag.](/uploads/tui_video.gif)
+![TUI shown via the interactive flag.](/uploads/tui_moving_recolour.svg)
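The post defines its flags with `click`; purely for illustration, the same `-i`/`--interactive` switch has this shape with stdlib `argparse` (the program name and help text are mine, not the tool's actual code):

```python
import argparse

parser = argparse.ArgumentParser(prog="tokenprism")  # hypothetical name
parser.add_argument(
    "-i", "--interactive",
    action="store_true",  # False unless the flag is present
    help="launch the TUI instead of printing coloured tokens once",
)

args = parser.parse_args(["--interactive"])
print(args.interactive)  # True
```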

 At present, only 3 'statistics' are displayed, but I have plans to add more which would aid in tokenised input analysis.

 ## Extension

 With these features, the application had reached MVP status. However, I saw an avenue for improving upon its capabilities: supporting any tokeniser available from [HuggingFace](https://huggingface.co/). The change to allow this was small, given the API for the [`tokenizers`](https://github.com/huggingface/tokenizers) library is relatively similar to that of `tiktoken`. This change expanded the horizons of the application massively and allowed for seeing how thousands of open-source models approach tokenisation, which is often very different to OpenAI's:

-![CLI with the Google model 'bert-base-cased', sourced from HuggingFace.](/uploads/bert_cli.png)
+![CLI with the Google model 'bert-base-cased', sourced from HuggingFace.](/uploads/bigger_recolour_bert_cli.png)
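Because the two libraries' APIs line up, a thin wrapper per backend is enough to keep the rest of the app backend-agnostic. This is my own illustrative sketch of that idea, not the post's code; `Tokenizer.encode(...).tokens` is the real `tokenizers` call shape:

```python
class TiktokenBackend:
    """Wraps a tiktoken Encoding; decodes each token id individually."""

    def __init__(self, encoding):
        self._enc = encoding  # a tiktoken Encoding instance

    def pieces(self, text: str) -> list[str]:
        return [
            self._enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
            for t in self._enc.encode(text)
        ]

class HuggingFaceBackend:
    """Wraps a tokenizers.Tokenizer; its Encoding already exposes .tokens."""

    def __init__(self, tokenizer):
        self._tok = tokenizer  # a tokenizers.Tokenizer instance

    def pieces(self, text: str) -> list[str]:
        return self._tok.encode(text).tokens

# Either backend can now feed the same colouring/statistics code via .pieces().
```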

 ## Conclusion

4 binary image files changed (35 KB, 12.6 KB, 103 KB, 23.6 KB); previews not shown.

_uploads/tui_moving_recolour.svg

Lines changed: 276 additions & 0 deletions
