Token-to-Marble

Token-to-Marble is an experimental research repository exploring how to move from token-centric representations of language to meaning-centric semantic entities called Marbles.

A token is a surface symbol.

A Marble is a structured semantic unit with measurable properties in a vector space and in a corpus, so that meaning can be located, compared, and eventually composed.

This repository focuses on construction, measurement, and reproducibility, not on grand claims.

Core idea

Most NLP pipelines treat tokens as flat, discrete symbols. This project asks a different question:

What if meaning is not only represented, but also behaves (clusters, attracts, repels) in a measurable semantic space?

Token → Marble is an attempt to add a layer on top of embeddings, where we can:

- define stable meaning anchors,
- measure cross-lingual stability,
- and build higher-level semantic structures from those anchors.

What is a Marble?

A Marble is a semantic particle-like data structure with properties such as:

- Position: derived from vector embeddings (e.g., FastText / Word2Vec); where a concept sits in semantic space.
- Mass / Density: derived from corpus statistics (frequency, dispersion); a proxy for “semantic pull” in a given corpus.
- Interactions: attraction / repulsion / resonance between Marbles, computed from proximity and calibrated constraints.
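
As a rough illustration of the three properties above, a Marble could be sketched as a small record. Everything in this sketch is an assumption made for illustration: the field names, the log-frequency mass formula, and the mass-weighted cosine interaction score are not taken from the repository, where Marble generation is not yet locked.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Marble:
    """Hypothetical Marble record; all field names are illustrative."""
    label: str
    position: List[float]  # embedding vector (e.g. a FastText row)
    frequency: int         # raw corpus count
    dispersion: float      # share of documents containing the term, in [0, 1]

    @property
    def mass(self) -> float:
        # Assumed proxy: log-scaled frequency weighted by dispersion.
        return math.log1p(self.frequency) * self.dispersion


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def interaction(m1: Marble, m2: Marble) -> float:
    """Assumed score: positive reads as attraction, negative as repulsion."""
    return cosine(m1.position, m2.position) * m1.mass * m2.mass
```

With this toy definition, two Marbles pointing in the same direction attract in proportion to their masses, and Marbles pointing in opposite directions repel. Any real calibration would replace this formula.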

A Marble is not a metaphor in this repo. It’s a measurable structure we can compute and test.

Repository scope (what this repo is / is not)

This is:

- an exploratory research codebase,
- a reproducible pipeline for semantic experiments,
- a controlled sandbox for measuring cross-lingual meaning stability.

This is not:

- a finished model,
- a production system,
- a claim of “new physics.”

Treat all outputs as experimental.

Phase-0 (current): Cross-lingual meaning stability probe (EN / DE / FR / TR)

Before generating any “Marbles,” we validate a prerequisite:

Do core human affective states (“core feelings”) form stable neighborhoods across languages in aligned semantic space?

We run a small seed set (“his20”) through aligned FastText vector spaces and measure:

- whether each anchor can be found (rank),
- how strongly it matches (sim),
- and how stable it is across languages (anchor_sim_mean).
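
The three measurements above can be sketched in a few lines. The exact definitions used by the probe are not given here, so this sketch assumes that sim is the cosine similarity between a query vector and the expected target word, that rank is the 1-based position of that word among all target-language neighbours, and that anchor_sim_mean is the plain mean of sim over the languages where the anchor was found. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def probe_anchor(query_vec, target_word, vectors):
    """Return (rank, sim) for target_word in one target language.

    vectors maps each target-language word to its aligned vector.
    Returns (None, None) when the word is missing from the vocabulary.
    """
    if target_word not in vectors:
        return None, None
    sim = cosine(query_vec, vectors[target_word])
    # Rank 1 means no other word is closer to the query vector.
    closer = sum(
        1
        for word, vec in vectors.items()
        if word != target_word and cosine(query_vec, vec) > sim
    )
    return closer + 1, sim


def anchor_sim_mean(sims_by_lang):
    """Mean sim over languages where the anchor was found, else None."""
    found = [s for s in sims_by_lang.values() if s is not None]
    return sum(found) / len(found) if found else None
```

This brute-force scan is linear in vocabulary size per query, which is fine for a 20-item seed set but would need a matrix or index-based search at scale.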

Why this matters for Marble work

If meaning clusters are cross-lingually stable, then “Marble candidates” can be defined as language-independent semantic attractors rather than as language-specific tokens.

Inputs

Aligned FastText vectors (examples):
- wiki.en.align.vec
- wiki.de.align.vec
- wiki.fr.align.vec
- wiki.tr.align.vec
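
For reference, FastText `.vec` files are plain text: a header line with the vocabulary size and dimension, then one word per line followed by its vector components. A minimal loader might look like the sketch below. The function name `load_vec` and the `limit` parameter are hypothetical, and the repository's own loading code may differ.

```python
def load_vec(path, limit=None):
    """Read a text-format .vec file into a {word: [float, ...]} dict.

    limit, if given, stops after that many vectors (the files are
    frequency-sorted, so this keeps the most frequent words).
    """
    vectors = {}
    with open(path, encoding="utf-8", errors="replace") as fh:
        header = fh.readline().split()
        dim = int(header[1])  # header is "<vocab_size> <dimension>"
        for line in fh:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip malformed rows instead of failing
            vectors[parts[0]] = [float(x) for x in parts[1:]]
            if limit is not None and len(vectors) >= limit:
                break
    return vectors
```

A call such as `load_vec("wiki.tr.align.vec", limit=200000)` would keep memory use bounded while still covering common vocabulary.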

Outputs

- TSV reports per run (e.g., his20_probe.tsv)
- Seed files (e.g., nk_his20_seed.tsv)
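
A per-run TSV report could be written with the standard `csv` module, as in the sketch below. The column layout is an assumption built from the metric names used in this README (rank, sim, anchor_sim_mean); the actual columns of his20_probe.tsv are not documented here.

```python
import csv

# Assumed column layout; not confirmed by the repository.
FIELDS = ["anchor", "lang", "rank", "sim", "anchor_sim_mean"]


def write_probe_report(rows, fh):
    """Write probe rows (dicts keyed by FIELDS) as tab-separated text."""
    writer = csv.DictWriter(
        fh, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to a real file, opening it with `newline=""` avoids doubled line endings on Windows.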

(Folder layout for these will be committed alongside the first public drop.)

Planned pipeline stages (roadmap)

Corpus processing

- streaming token extraction
- frequency + dispersion analysis
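
The corpus-processing stage could be sketched as follows. This sketch assumes a simple regex tokenizer and defines dispersion as the share of documents in which a token appears. Both are illustrative choices, not the repository's settled method, and the function names are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def stream_tokens(lines):
    """Yield lowercased tokens one at a time without loading the corpus."""
    for line in lines:
        for match in TOKEN_RE.finditer(line.lower()):
            yield match.group()


def frequency_and_dispersion(documents):
    """Return (freq, dispersion) for an iterable of documents.

    Each document is an iterable of text lines. freq counts total
    occurrences; dispersion is the fraction of documents containing
    the token (an assumed definition).
    """
    freq = Counter()
    doc_freq = Counter()
    n_docs = 0
    for doc in documents:
        n_docs += 1
        seen = set()
        for token in stream_tokens(doc):
            freq[token] += 1
            seen.add(token)
        doc_freq.update(seen)
    dispersion = {token: doc_freq[token] / n_docs for token in freq}
    return freq, dispersion
```

One caveat for Turkish text: Python's `str.lower()` is not locale-aware, so dotted and dotless I (İ/ı) need explicit handling before lowercasing.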

Token universe construction

- unique token space
- frequency-ranked vocabulary
- candidate selection

Marble generation (experimental)

- token/anchor → Marble
- mass/volume/density assignment
- initial interaction fields

Field calibration & mapping (research)

- calibration experiments
- semantic polarity constraints
- early topography / clustering maps

Language and data

- Primary working language: Turkish
- Cross-lingual validation: EN / DE / FR / TR (expandable)
- Embeddings are external and replaceable
- The framework is intended to be language-agnostic

Status

- Version: v0.1 (experimental)
- Phase-0 cross-lingual probe: active / stabilizing
- Marble generation: not yet locked
- Field calibration: research phase

Minimal philosophy

Meaning is not static. It has structure, weight, and neighborhoods.

This repository is an attempt to model and measure that carefully, with reproducible steps.

License

Open research use (to be clarified as the project matures).
