Token counter for LaTeX-based Theory-of-Mind stimuli using cl100k_base.
This repository provides a small TypeScript CLI tool to:
- Read LaTeX source (from a file or stdin)
- Find stimulus tables and extract sentence cells (S1–S7, SII, SEE, SL, optional SE)
- Compute token counts using the cl100k_base encoding
- Replace NA in the token column with the computed token counts
- Optionally fill in the Gesamt row per table with the sum of all tokens in that table
The original LaTeX structure is preserved as much as possible. Only the token column values are updated.
- Node.js (LTS recommended)
- pnpm
Clone the repository and install dependencies:
git clone https://github.com/netsnek/latex-stimuli-token-counter.git
cd latex-stimuli-token-counter
pnpm install
This will install the required packages, including:
- typescript
- ts-node
- tiktoken
- @types/node
- compute-stimulus-tokens.ts: Main CLI script that parses LaTeX, computes token counts, and writes updated LaTeX.
- tsconfig.json: TypeScript configuration used by ts-node.
You can run the script either by providing a file path or via stdin.
pnpm ts-node compute-stimulus-tokens.ts path/to/input.tex > path/to/output.tex
- input.tex: your original LaTeX file containing the stimulus tables
- output.tex: LaTeX with NA replaced by token counts and Gesamt updated
cat path/to/input.tex | pnpm ts-node compute-stimulus-tokens.ts > path/to/output.tex
This is useful if you want to pipe content from another tool or editor.
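The two input modes can be sketched roughly as follows. This is a minimal illustration, not the code in compute-stimulus-tokens.ts: the names InputSource, resolveInputSource and readInput are hypothetical, and it assumes that a missing file argument means "read from stdin".

```typescript
import { readFileSync } from "node:fs";

// Hypothetical type: where the LaTeX source comes from.
type InputSource = { kind: "file"; path: string } | { kind: "stdin" };

// `args` is the argument list after the script name, e.g. process.argv.slice(2).
// Assumption: a first positional argument is the input path; none means stdin.
function resolveInputSource(args: string[]): InputSource {
  const path = args[0];
  return path !== undefined ? { kind: "file", path } : { kind: "stdin" };
}

// File descriptor 0 is stdin, so readFileSync(0, ...) reads until the pipe closes.
function readInput(source: InputSource): string {
  return source.kind === "file"
    ? readFileSync(source.path, "utf8")
    : readFileSync(0, "utf8");
}

// Typical wiring (not executed here, since it would wait on stdin):
//   const latex = readInput(resolveInputSource(process.argv.slice(2)));
```

Writing the result to stdout keeps both modes compatible with the shell redirection shown in the commands.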
The script looks for:
- Tables defined with \begin{table} ... \end{table}
- Rows in tabular environments with the pattern:
  S1 & Vollständiger Beispielsatz ... & NA \\
  S2 & ... & NA \\
  SII & ... & NA \\
  SEE & ... & NA \\
  SL & ... & NA \\
- A total row of the form:
  \textbf{Gesamt} & & \textbf{NA}
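One way to recognise these patterns is with two regular expressions, one for table blocks and one for labelled rows. The sketch below is an assumption about how such matching could look, not the script's actual parser; StimulusRow and extractRows are hypothetical names, and it assumes one table row per source line.

```typescript
// Hypothetical shape of one extracted sentence row.
interface StimulusRow {
  label: string;
  sentence: string;
}

// Non-greedy match of each \begin{table} ... \end{table} block.
const TABLE_BLOCK = /\\begin\{table\}[\s\S]*?\\end\{table\}/g;
// Label (S1–S7, SII, SEE, SL, SE), sentence cell, then an NA token cell ending in \\.
const SENTENCE_ROW = /^\s*(S[1-7]|SII|SEE|SL|SE)\s*&\s*(.*?)\s*&\s*NA\s*\\\\/;

// Returns one array of rows per table; text outside table blocks is ignored.
function extractRows(latex: string): StimulusRow[][] {
  const tables: string[] = latex.match(TABLE_BLOCK) ?? [];
  return tables.map((table) => {
    const rows: StimulusRow[] = [];
    for (const line of table.split("\n")) {
      const m = SENTENCE_ROW.exec(line);
      if (m !== null) {
        const [, label = "", sentence = ""] = m;
        rows.push({ label, sentence });
      }
    }
    return rows;
  });
}
```

Rows whose token cell already holds a number do not match SENTENCE_ROW, so in this sketch a second run over processed output would find nothing left to extract.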
The script:
- Computes the token length of the sentence in the second column using cl100k_base.
- Replaces NA in the third column with the numeric token length.
- Sums all token values per table and replaces \textbf{NA} in the total row with the summed token count.
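The three steps above can be sketched as a single line-by-line pass. This is an illustration under stated assumptions rather than the script's real implementation: fillTokenCounts and TokenCounter are hypothetical names, rows are assumed to sit on one line each, and the tokenizer is passed in as a function so the sketch runs without tiktoken installed.

```typescript
// Hypothetical signature: maps a sentence to its token count.
type TokenCounter = (text: string) => number;

// Groups: row head up to the sentence, the sentence, the separator, the row tail.
const NA_ROW = /^(\s*(?:S[1-7]|SII|SEE|SL|SE)\s*&\s*)(.*?)(\s*&\s*)NA(\s*\\\\.*)$/;
// Groups: everything before NA in the total row, everything after it.
const NA_TOTAL = /^(\s*\\textbf\{Gesamt\}\s*&\s*&\s*\\textbf\{)NA(\}.*)$/;

function fillTokenCounts(latex: string, countTokens: TokenCounter): string {
  let tableSum = 0;
  const lines = latex.split("\n").map((line) => {
    if (/\\begin\{table\}/.test(line)) tableSum = 0; // each table has its own sum
    const row = NA_ROW.exec(line);
    if (row !== null) {
      const [, head = "", sentence = "", sep = "", tail = ""] = row;
      const n = countTokens(sentence);
      tableSum += n;
      return head + sentence + sep + String(n) + tail; // only NA is swapped out
    }
    const total = NA_TOTAL.exec(line);
    if (total !== null) {
      const [, head = "", tail = ""] = total;
      return head + String(tableSum) + tail;
    }
    return line; // all other lines pass through unchanged
  });
  return lines.join("\n");
}

// Assumed wiring with the tiktoken package (not executed here):
//   import { get_encoding } from "tiktoken";
//   const enc = get_encoding("cl100k_base");
//   const updated = fillTokenCounts(source, (t) => enc.encode(t).length);
//   enc.free();
```

Because only the matched NA is replaced and every other character of the line is re-emitted, the surrounding LaTeX stays as it was.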
Input snippet:
\begin{table}[H]
\centering
\caption{Stimuli V1 (XYY) other niedrig, Tokenisierung: cl100k\_base}
\label{tab-06}
\begin{tabular}{C{3cm} L{12cm} C{2cm}}
\toprule
\textbf{Satzposition} & \textbf{Vollständiger Beispielsatz} & \textbf{Tokens} \\
\midrule
S1 & Alice trägt eine Box in die Küche, trifft dort Bob. & NA \\
S2 & Bob fragt Alice: „Was befindet sich in der Box?“ & NA \\
S3 & Alice sagt: „Schokolade.“ & NA \\
S4 & Alice stellt die Box neben Bob und verlässt die Küche. & NA \\
S5 & Carol betritt die Küche und fragt Bob: „Was ist in dieser Box?“ & NA \\
S6 & Bob sagt: „Schokolade.“ & NA \\
S7 & Carol öffnet die Box und sie ist leer. & NA \\
\midrule
\textbf{Gesamt} & & \textbf{NA} \\
\bottomrule
\end{tabular}
\end{table}
After running the script, the NA values are replaced by the corresponding token counts, and Gesamt contains the sum of these counts.
Run the script directly with ts-node:
pnpm ts-node compute-stimulus-tokens.ts examples/stimuli.tex
You can also add a convenience script to your package.json:
{
  "scripts": {
    "tokens": "ts-node compute-stimulus-tokens.ts"
  }
}
Then call:
pnpm tokens path/to/input.tex > path/to/output.tex
This project is licensed under the MIT License.
SPDX-License-Identifier: (MIT)
Copyright © 2025 netsnek