netsnek/latex-stimuli-token-counter

latex-stimuli-token-counter

Token counter for LaTeX-based Theory-of-Mind stimuli using cl100k_base.

Overview

This repository provides a small TypeScript CLI tool to:

  • Read LaTeX source (from a file or stdin)
  • Find stimulus tables and extract sentence cells (S1–S7, SII, SEE, SL, optional SE)
  • Compute token counts using the cl100k_base encoding
  • Replace NA in the token column with the computed token counts
  • Optionally fill in the Gesamt row per table with the sum of all tokens in that table

The original LaTeX structure is preserved as much as possible. Only the token column values are updated.

Requirements

  • Node.js (LTS recommended)
  • pnpm

Installation

Clone the repository and install dependencies:

git clone https://github.com/netsnek/latex-stimuli-token-counter.git
cd latex-stimuli-token-counter

pnpm install

This will install the required packages, including:

  • typescript
  • ts-node
  • tiktoken
  • @types/node

Files

  • compute-stimulus-tokens.ts Main CLI script that parses LaTeX, computes token counts, and writes updated LaTeX.

  • tsconfig.json TypeScript configuration used by ts-node.

Usage

You can run the script either by providing a file path or via stdin.

1. From a LaTeX file

pnpm ts-node compute-stimulus-tokens.ts path/to/input.tex > path/to/output.tex
  • input.tex: your original LaTeX file containing the stimulus tables
  • output.tex: LaTeX with NA replaced by token counts and Gesamt updated

2. Via stdin

cat path/to/input.tex | pnpm ts-node compute-stimulus-tokens.ts > path/to/output.tex

This is useful if you want to pipe content from another tool or editor.
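The choice between the two input modes can be sketched as follows. This is a minimal illustration, not the code from compute-stimulus-tokens.ts: the names `Readers` and `readLatexInput` are hypothetical, and the readers are passed in so the logic can be tried without touching the file system.

```typescript
// Hypothetical helper names; the real script may organise this differently.
interface Readers {
  // In Node this could be (p) => readFileSync(p, "utf8") from "node:fs".
  readFile: (path: string) => string;
  // In Node this could be () => readFileSync(0, "utf8"), where 0 is the stdin file descriptor.
  readStdin: () => string;
}

// Use the first CLI argument as the input path if present, otherwise fall back to stdin.
function readLatexInput(args: string[], readers: Readers): string {
  const [path] = args;
  return path !== undefined ? readers.readFile(path) : readers.readStdin();
}
```

In a Node CLI, `args` would typically be `process.argv.slice(2)`.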

Expected LaTeX Structure

The script looks for:

  • Tables defined with \begin{table} ... \end{table}

  • Rows in tabular environments with the pattern:

    S1  & Vollständiger Beispielsatz ... & NA \\
    S2  & ...                             & NA \\
    SII & ...                             & NA \\
    SEE & ...                             & NA \\
    SL  & ...                             & NA \\
  • A total row of the form:

    \textbf{Gesamt} & & \textbf{NA} \\

The script:

  1. Computes the token length of the sentence in the second column using cl100k_base.
  2. Replaces NA in the third column with the numeric token length.
  3. Sums all token values per table and replaces \textbf{NA} in the total row with the summed token count.
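The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the regular expressions and the names `TokenCounter`, `fillTable`, and `fillTokenColumns` are hypothetical, and the tokenizer is passed in as a function so the sketch does not depend on any particular library call.

```typescript
// Hypothetical sketch; names and regexes are illustrative, not taken from compute-stimulus-tokens.ts.
type TokenCounter = (sentence: string) => number;

// Sentence row, e.g.:  S1 & Some sentence. & NA \\
// Labels are "S" followed by digits or capital letters (S1..S7, SII, SEE, SL, SE).
const ROW = /^(\s*S[0-9A-Z]+\s*&\s*)(.*?)(\s*&\s*)NA(\s*\\\\.*)$/;

// Total row, e.g.:  \textbf{Gesamt} & & \textbf{NA} \\
const TOTAL = /^(\s*\\textbf\{Gesamt\}\s*&\s*&\s*\\textbf\{)NA(\}.*)$/;

// Fill the token column of a single table and write the sum into its Gesamt row.
function fillTable(table: string, countTokens: TokenCounter): string {
  let sum = 0;
  const filled = table.split("\n").map((line) => {
    const row = ROW.exec(line);
    if (!row) return line; // not a sentence row: keep the line untouched
    const tokens = countTokens(row[2].trim());
    sum += tokens;
    return `${row[1]}${row[2]}${row[3]}${tokens}${row[4]}`;
  });
  // Second pass, so the total is complete wherever the Gesamt row sits.
  return filled
    .map((line) => {
      const total = TOTAL.exec(line);
      return total ? `${total[1]}${sum}${total[2]}` : line;
    })
    .join("\n");
}

// Apply fillTable to every \begin{table} ... \end{table} block; text outside tables is unchanged.
function fillTokenColumns(latex: string, countTokens: TokenCounter): string {
  return latex.replace(/\\begin\{table\}[\s\S]*?\\end\{table\}/g, (table) =>
    fillTable(table, countTokens),
  );
}
```

With the tiktoken package, a counter along the lines of `(s) => enc.encode(s).length`, where `enc` comes from `get_encoding("cl100k_base")`, could be passed as `countTokens`; whether the script wires it up this way is an assumption.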

Example

Input snippet:

\begin{table}[H]
\centering
\caption{Stimuli V1 (XYY) other niedrig, Tokenisierung: cl100k\_base}
\label{tab-06}
\begin{tabular}{C{3cm} L{12cm} C{2cm}}
\toprule
\textbf{Satzposition} & \textbf{Vollständiger Beispielsatz} & \textbf{Tokens} \\
\midrule
S1 & Alice trägt eine Box in die Küche, trifft dort Bob. & NA \\
S2 & Bob fragt Alice: „Was befindet sich in der Box?“ & NA \\
S3 & Alice sagt: „Schokolade.“ & NA \\
S4 & Alice stellt die Box neben Bob und verlässt die Küche. & NA \\
S5 & Carol betritt die Küche und fragt Bob: „Was ist in dieser Box?“ & NA \\
S6 & Bob sagt: „Schokolade.“ & NA \\
S7 & Carol öffnet die Box und sie ist leer. & NA \\
\midrule
\textbf{Gesamt} & & \textbf{NA} \\
\bottomrule
\end{tabular}
\end{table}

After running the script, NA values will be replaced by the corresponding token counts and Gesamt will contain the sum of these counts.

Development

Run the script directly with ts-node:

pnpm ts-node compute-stimulus-tokens.ts examples/stimuli.tex

You can also add a convenience script to your package.json:

{
  "scripts": {
    "tokens": "ts-node compute-stimulus-tokens.ts"
  }
}

Then call:

pnpm tokens path/to/input.tex > path/to/output.tex

License

This project is licensed under the MIT License.

SPDX-License-Identifier: (MIT)
Copyright © 2025 netsnek
