A character-level Markov chain text generator trained on Shakespeare's complete works. Uses probabilistic selection based on weighted character transitions to generate Shakespeare-style text.
```js
import Markov from "./markov.js";

const markov = new Markov();
await markov.load(); // Load cached model or process source
markov.run(1000);    // Generate 1000 characters
```

Sample Output:

```
The sorrow that I have, the sun of the common cries
Where thou shalt thou speak to him
```
The generator builds a character-level state machine from the source text, tracking which characters follow each character and how often. During generation, it uses weighted random selection to choose the next character based on learned probabilities.
```mermaid
stateDiagram-v2
    [*] --> T: START (100%)
    T --> h: 75%
    T --> r: 20%
    T --> newline: 5%
    h --> e: 85%
    h --> a: 10%
    h --> i: 5%
    e --> space: 40%
    e --> n: 25%
    e --> r: 20%
    e --> newline: 15%
    space --> T: 30%
    space --> a: 25%
    space --> w: 20%
    space --> newline: 25%
    newline --> [*]
```
Edge weights represent transition probabilities learned from Shakespeare's text. Higher percentages indicate more frequent character sequences.
Architecture:
- `Markov` class: state machine orchestrator with async I/O
- `CharBucket` class: hash-based frequency storage `{char: {nextChar: count}}`
- Private fields for encapsulation (`#storage`, `#FILE`, `#JSON_CACHE`)
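The repo's `CharBucket` isn't reproduced here; a minimal sketch of a frequency store with the documented shape (`{char: {nextChar: count}}`) could look like the following. The `add`/`get` method names are assumptions, not the actual API:

```js
// Sketch of a hash-based frequency store (method names are assumptions,
// not necessarily the repo's actual CharBucket interface).
class CharBucket {
  #storage = {}; // { char: { nextChar: count } }

  // Increment the transition count current -> next
  add(current, next) {
    this.#storage[current] ??= {};
    this.#storage[current][next] = (this.#storage[current][next] ?? 0) + 1;
  }

  // Return all recorded successors of `current` with their counts
  get(current) {
    return this.#storage[current] ?? {};
  }
}
```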
Core Algorithm (markov.js:61-78):
```js
#findNextChar(current) {
  const transitions = this.charBucket.get(current);
  const entries = Object.entries(transitions);

  // Sum all transition frequencies
  const total = entries.reduce((sum, [_, freq]) => sum + freq, 0);

  // Weighted random selection
  const random = Math.random() * total;
  let running = 0;
  for (const [char, freq] of entries) {
    running += freq;
    if (random < running) return char; // O(n) selection
  }
}
```

Data Structure:
```json
{
  "T": { "h": 8234, "r": 1523, "o": 2341 },
  "h": { "e": 9821, "a": 1234, "i": 876 },
  "e": { " ": 12456, "n": 3421, "r": 2134 }
}
```

Requirements: Bun.js (or Node.js with ES module support)
```sh
git clone <repo-url>
cd letter-markov-js
bun run index.js
```

API:
- `await markov.load()` - Loads the cached model, or processes the source file on the first run
- `markov.run(steps)` - Generates `steps` characters with a 50ms typing delay
- `markov.debug(char)` - Inspects transition data for a specific character
Example:
```js
const markov = new Markov();
await markov.load();

// Generate 500 characters
markov.run(500);

// Debug: see what follows 'T'
console.log(markov.debug('T'));
// Output: { h: 8234, r: 1523, o: 2341, ... }
```

Training:

- Stream the file line-by-line using `createReadStream` + `readline`
- Process each character pair: for each character at position `i`, record `line[i+1]` as a successor
- Special markers: `START → firstChar` for line beginnings, `lastChar → \n` for line endings, `\n → firstChar` to connect lines
- Increment frequencies in the `CharBucket` hash map
- Cache to JSON (`data/shakespeare.json`) after processing
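The training steps above can be sketched roughly as follows. This is a standalone sketch, not the repo's actual code; the `train` function and its internals are illustrative:

```js
import { createReadStream } from "node:fs";
import readline from "node:readline";

// Build a { char: { nextChar: count } } table from a text file, line by line.
// START and "\n" markers connect line boundaries as described above.
async function train(path) {
  const bucket = {};
  const record = (cur, next) => {
    bucket[cur] ??= {};
    bucket[cur][next] = (bucket[cur][next] ?? 0) + 1;
  };

  const rl = readline.createInterface({ input: createReadStream(path) });
  for await (const line of rl) {
    if (line.length === 0) continue;
    record("START", line[0]);                          // START -> firstChar
    for (let i = 0; i < line.length - 1; i++) {
      record(line[i], line[i + 1]);                    // char -> successor
    }
    record(line[line.length - 1], "\n");               // lastChar -> \n
    record("\n", line[0]);                             // \n -> firstChar (connect lines)
  }
  return bucket;
}
```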
Generation:

- Start with current = `"START"`
- Get all possible next characters from `CharBucket`
- Sum their frequencies (e.g., `{h: 8234, r: 1523}` → total = 9757)
- Generate a random number in `[0, total)`
- Walk through the transitions, accumulating frequencies until the running sum exceeds the random number
- Return the selected character and repeat
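Put together, the generation loop might look like this standalone sketch (the `generate` helper is an assumption, not the repo's API):

```js
// Weighted random walk over a { char: { nextChar: count } } transition table.
function generate(bucket, steps) {
  let current = "START";
  let out = "";
  for (let i = 0; i < steps; i++) {
    const transitions = bucket[current];
    if (!transitions) break; // dead end: no known successors

    const entries = Object.entries(transitions);
    const total = entries.reduce((sum, [, freq]) => sum + freq, 0);

    // Weighted random selection: subtract frequencies until we cross zero
    let random = Math.random() * total;
    for (const [char, freq] of entries) {
      random -= freq;
      if (random < 0) {
        current = char;
        break;
      }
    }
    out += current;
  }
  return out;
}
```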
Example: if current = `"T"` and transitions = `{h: 8234, r: 1523, o: 2341}`:

- Total = 12098
- Random = 7500
- Walk: 0 + 8234 = 8234 > 7500 → return `"h"` (68% probability)
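That 68% is just 8234 / 12098 ≈ 0.68, and it can be checked empirically. The `pick` helper below is hypothetical, replicating the weighted-selection logic on its own:

```js
// Hypothetical helper: one weighted random pick from { char: count }.
function pick(transitions) {
  const entries = Object.entries(transitions);
  const total = entries.reduce((sum, [, freq]) => sum + freq, 0);
  let r = Math.random() * total;
  for (const [char, freq] of entries) {
    r -= freq;
    if (r < 0) return char;
  }
}

// Sample many picks; observed frequencies should track the count ratios.
const counts = { h: 0, r: 0, o: 0 };
for (let i = 0; i < 100000; i++) {
  counts[pick({ h: 8234, r: 1523, o: 2341 })]++;
}
// counts.h / 100000 should be close to 8234 / 12098 ≈ 0.68
```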
This creates text that statistically resembles the source while maintaining readable character-level patterns.
Why character-level? Simpler than word-level Markov chains, demonstrates core probability concepts, handles punctuation and capitalization naturally.
Why probabilistic selection?
Recent evolution (see commit 68b5a9c) from popularity-based selection. Weighted selection reflects the actual source distribution: if "e" follows "h" 450 times and "a" only 150 times, "e" should appear three times as often in the generated text.
Why 50ms typing delay? Creates readable output stream, demonstrates async patterns, simulates human reading speed.
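One way to implement the typing effect (a sketch; not necessarily how `markov.run` does it internally):

```js
// Resolve after `ms` milliseconds, so the caller can await between characters.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Emit characters one at a time with a fixed delay, simulating typing.
async function typeOut(text, delayMs = 50) {
  for (const char of text) {
    process.stdout.write(char);
    await sleep(delayMs);
  }
}
```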
Built with Bun.js • Zero dependencies • ~3.5 KB of code