GitHub - netsnek/latex-stimuli-token-counter

latex-stimuli-token-counter

Token counter for LaTeX-based Theory-of-Mind stimuli using cl100k_base.

Overview

This repository provides a small TypeScript CLI tool to:

Read LaTeX source (from a file or stdin)
Find stimulus tables and extract sentence cells (S1–S7, SII, SEE, SL, and optionally SE)
Compute token counts using the cl100k_base encoding
Replace NA in the token column with the computed token counts
Optionally fill in the Gesamt (total) row of each table with the sum of all token counts in that table

The original LaTeX structure is preserved as much as possible. Only the token column values are updated.
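To make the "find stimulus tables" step concrete, here is a minimal sketch of how table environments could be located while leaving the rest of the document untouched. The helpers `findTableSpans` and `mapTables` are hypothetical, not functions exported by compute-stimulus-tokens.ts, and the sketch assumes table environments are not nested.

```typescript
// Hypothetical helpers; the real script may locate tables differently.
// Assumes \begin{table} ... \end{table} environments are not nested.
interface TableSpan {
  start: number; // index of "\begin{table}"
  end: number; // index just past "\end{table}"
}

function findTableSpans(src: string): TableSpan[] {
  const BEGIN = "\\begin{table}";
  const END = "\\end{table}";
  const spans: TableSpan[] = [];
  let from = 0;
  while (true) {
    const start = src.indexOf(BEGIN, from);
    if (start === -1) break;
    const endIdx = src.indexOf(END, start + BEGIN.length);
    if (endIdx === -1) break; // unterminated table: stop scanning
    const end = endIdx + END.length;
    spans.push({ start, end });
    from = end;
  }
  return spans;
}

// Applies `transform` to each table and copies all text between tables verbatim.
function mapTables(src: string, transform: (table: string) => string): string {
  let out = "";
  let last = 0;
  for (const { start, end } of findTableSpans(src)) {
    out += src.slice(last, start) + transform(src.slice(start, end));
    last = end;
  }
  return out + src.slice(last);
}
```

Because `mapTables` copies the text outside the table environments unchanged, everything except the tables keeps its exact formatting, in line with the goal of preserving the original structure.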

Requirements

Node.js (LTS recommended)
pnpm

Installation

Clone the repository and install dependencies:

git clone https://github.com/netsnek/latex-stimuli-token-counter.git
cd latex-stimuli-token-counter

pnpm install

This will install the required packages, including:

typescript
ts-node
tiktoken
@types/node

Files

compute-stimulus-tokens.ts: main CLI script that parses LaTeX, computes token counts, and writes the updated LaTeX.
tsconfig.json: TypeScript configuration used by ts-node.

Usage

You can run the script either by providing a file path or via stdin.

1. From a LaTeX file

pnpm ts-node compute-stimulus-tokens.ts path/to/input.tex > path/to/output.tex

input.tex: your original LaTeX file containing the stimulus tables
output.tex: LaTeX with NA replaced by token counts and Gesamt updated

2. Via stdin

cat path/to/input.tex | pnpm ts-node compute-stimulus-tokens.ts > path/to/output.tex

This is useful if you want to pipe content from another tool or editor.
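Both invocation styles can be served by one small input routine. The sketch below is an assumption about how this could be done in Node.js, not the script's actual code: a path argument, if present, is read as a file; otherwise the input is read from stdin via file descriptor 0. The names `resolveInputSource`, `readInput`, and `transformLatex` are hypothetical.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical input handling (not taken from compute-stimulus-tokens.ts):
// the first CLI argument is treated as a file path; without one, the code
// falls back to stdin, which Node.js exposes as file descriptor 0.
function resolveInputSource(args: string[]): string | number {
  return args[0] ?? 0;
}

// Reads the whole LaTeX document as UTF-8 from a file or from a pipe.
function readInput(args: string[]): string {
  return readFileSync(resolveInputSource(args), "utf8");
}

// Possible entry point, commented out so the sketch has no side effects.
// `transformLatex` stands for the token-filling step and is hypothetical:
// const latex = readInput(process.argv.slice(2));
// process.stdout.write(transformLatex(latex));
```

Writing the result to stdout rather than to a file is what lets the same command be redirected with `> path/to/output.tex` or piped onward.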

Expected LaTeX Structure

The script looks for:

Tables defined with \begin{table} ... \end{table}

Rows in tabular environments with the pattern:

S1  & Vollständiger Beispielsatz ... & NA \\
S2  & ...                             & NA \\
SII & ...                             & NA \\
SEE & ...                             & NA \\
SL  & ...                             & NA \\

A total row of the form:
```
\textbf{Gesamt} & & \textbf{NA} \\
```
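The row shapes above can be recognized with two regular expressions. This is a hedged sketch that assumes each row sits on a single line and that a row still needs processing only while its third column reads NA; `ROW_RE`, `TOTAL_RE`, `parseStimulusRow`, and `isTotalRow` are hypothetical names, and the script's actual patterns may differ.

```typescript
// Hypothetical patterns; the real script may match rows differently.
// Stimulus row: <label> & <sentence> & NA \\   (labels S1–S7, SII, SEE, SE, SL)
const ROW_RE = /^\s*(S[1-7]|SII|SEE|SE|SL)\s*&\s*(.*?)\s*&\s*NA\s*\\\\/;
// Total row: \textbf{Gesamt} & & \textbf{NA} \\
const TOTAL_RE = /^\s*\\textbf\{Gesamt\}\s*&\s*&\s*\\textbf\{NA\}\s*\\\\/;

interface StimulusRow {
  label: string; // e.g. "S3" or "SEE"
  sentence: string; // second column, without surrounding whitespace
}

// Returns label and sentence of a row whose token cell is still NA, else null.
function parseStimulusRow(line: string): StimulusRow | null {
  const m = ROW_RE.exec(line);
  return m ? { label: m[1], sentence: m[2] } : null;
}

// True for a Gesamt row whose value is still NA.
function isTotalRow(line: string): boolean {
  return TOTAL_RE.test(line);
}
```

Rows that already contain a number, as well as the bold header row, do not match, so running the tool twice leaves already-filled cells alone.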

The script:

Computes the token length of the sentence in the second column using cl100k_base.
Replaces NA in the third column with the numeric token length.
Sums all token values per table and replaces \textbf{NA} in the total row with the summed token count.
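The three steps above can be sketched as two passes over a single table. `fillTable` and `TokenCounter` are hypothetical names; the tokenizer is passed in as a function so the logic can be exercised without tiktoken, and the sketch assumes one row per line. The commented lines show how a cl100k_base counter could be built with the tiktoken npm package (`get_encoding` and `encode`).

```typescript
// Hypothetical sketch of the per-table update; not the script's actual code.
// Assumes one table row per line. With the tiktoken npm package, a
// cl100k_base counter could look like:
//   import { get_encoding } from "tiktoken";
//   const enc = get_encoding("cl100k_base");
//   const countTokens = (s: string) => enc.encode(s).length;
type TokenCounter = (text: string) => number;

// Groups: (row prefix up to the sentence)(sentence)(column separator)(row end)
const ROW_RE = /^(\s*(?:S[1-7]|SII|SEE|SE|SL)\s*&\s*)(.*?)(\s*&\s*)NA(\s*\\\\)/;
// Groups: (total row up to "{")("}")
const TOTAL_RE = /^(\s*\\textbf\{Gesamt\}\s*&\s*&\s*\\textbf\{)NA(\})/;

function fillTable(table: string, countTokens: TokenCounter): string {
  let total = 0;
  // Pass 1: replace NA in each stimulus row and accumulate the table sum.
  const lines = table.split("\n").map((line) =>
    line.replace(ROW_RE, (_m: string, head: string, sentence: string, sep: string, tail: string) => {
      const n = countTokens(sentence);
      total += n;
      return `${head}${sentence}${sep}${n}${tail}`;
    }),
  );
  // Pass 2: write the sum into the Gesamt row once every row has been counted.
  return lines
    .map((line) => line.replace(TOTAL_RE, (_m: string, pre: string, post: string) => `${pre}${total}${post}`))
    .join("\n");
}
```

Passing a word counter such as `(s) => s.split(/\s+/).filter(Boolean).length` is handy for checking the row and total wiring, but the resulting numbers are word counts, not cl100k_base token counts. Applying the function to each table separately keeps every Gesamt row limited to its own table.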

Example

Input snippet:

\begin{table}[H]
\centering
\caption{Stimuli V1 (XYY) other niedrig, Tokenisierung: cl100k\_base}
\label{tab-06}
\begin{tabular}{C{3cm} L{12cm} C{2cm}}
\toprule
\textbf{Satzposition} & \textbf{Vollständiger Beispielsatz} & \textbf{Tokens} \\
\midrule
S1 & Alice trägt eine Box in die Küche, trifft dort Bob. & NA \\
S2 & Bob fragt Alice: „Was befindet sich in der Box?“ & NA \\
S3 & Alice sagt: „Schokolade.“ & NA \\
S4 & Alice stellt die Box neben Bob und verlässt die Küche. & NA \\
S5 & Carol betritt die Küche und fragt Bob: „Was ist in dieser Box?“ & NA \\
S6 & Bob sagt: „Schokolade.“ & NA \\
S7 & Carol öffnet die Box und sie ist leer. & NA \\
\midrule
\textbf{Gesamt} & & \textbf{NA} \\
\bottomrule
\end{tabular}
\end{table}

After running the script, each NA in the Tokens column is replaced by the corresponding token count, and the Gesamt row contains the sum of these counts.

Development

Run the script directly with ts-node:

pnpm ts-node compute-stimulus-tokens.ts examples/stimuli.tex

You can also add a convenience script to your package.json:

{
  "scripts": {
    "tokens": "ts-node compute-stimulus-tokens.ts"
  }
}

Then call:

pnpm tokens path/to/input.tex > path/to/output.tex

License

This project is licensed under the MIT License.
