A lightweight character-level language model trained on the complete works of Shakespeare. This project implements a generative model using PyTorch, designed to demonstrate the fundamental architecture of sequence-based prediction and text generation.
This repository features a character-level model that learns statistical patterns within Shakespearean English. It treats individual characters as tokens, allowing it to learn vocabulary, grammar, and structural elements (like dialogue formatting) from the ground up.
- Framework: PyTorch
- Model Architecture: Character-level Generative Model
- Dataset: Tiny Shakespeare (1.11 million characters)
- Vocabulary Size: 65 unique characters
- Tokenizer: Custom mapping (integer encoding/decoding)
- Total Character Count: 1,115,393
- Data Splitting: 90% Training, 10% Validation
- Tokenization: Manual character-to-index mapping (stoi/itos)
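The stoi/itos mapping and the 90/10 split listed above can be sketched as follows. This is a minimal illustration, not the notebook's exact code; the short sample string stands in for the full shakespeare.txt contents:

```python
# Stand-in for the full shakespeare.txt corpus.
text = "First Citizen: Before we proceed any further, hear me speak."

# Build the vocabulary from the unique characters in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> character

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

# Round trip: encoding then decoding recovers the original string.
assert decode(encode("hear me")) == "hear me"

# 90% training / 10% validation split over the encoded stream.
data = encode(text)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```

On the real dataset the same procedure yields the 65-character vocabulary reported above.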
The following is a raw sample of text generated by the model after training. It illustrates how the model begins to approximate Shakespearean syntax and character cues:
```
NIOULELand ccetathe'd?
OMNEObean tithy,
K:
IZAUMalemagidwhars,
O, thyof d:
Jarshmerdof four sthe ha!
Thanuckis!
Sor;
Toul to raghoulis angragathn mioich gherif Viserow wian, angat frest msu sy se
adn ntingh be mere ED m be vewhe r whandr, ch m fltestiomeed ltheak nase owilg Whe pld nth be Wig blo, foolols,
Th, ve ate.
CESAyousand bas re hayowhaiso hesies ce w t hesue chy sonsgenomitheancodiner's pengheandsau p tcagefathid annesitond w, e,
STLInosagethoureangorilourbes grto T:
ARARI himowshisuret'sod wout hed.
Fo y anit!
Honw-teirefoise Bedy
Wht.
Hest f yofoou EWe berstewinaken'serur f uisteis thakir co,
CHenove
IUSou hos bbr fitlaraschenoum'd th thr
Hous! th ieawistoler t st
IICot nd ache bes!
Bonthesthe t ce alootor I to w follofr aed red, he wadfof t an y, bud tcheer medycou foutyo t hasce t thn,
Thendwenngana hangr, seite anst brovetheinthad wneqund dllint ad by anallour scand gehe icofeve ifen
GBUS:
ty thetind pl ir fe bet, pannds he ty tere ve geanougoou ar. veayondo thak
```
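Samples like the one above come from an autoregressive loop: feed the context through the model, sample the next character from the predicted distribution, append it, and repeat. A minimal sketch of that loop is shown below; the `model` here is a hypothetical stand-in that returns random logits over the 65-character vocabulary, whereas the notebook uses the trained network:

```python
import torch

vocab_size = 65  # unique characters in Tiny Shakespeare

def model(idx):
    # Placeholder for the trained model: maps a (batch, time) tensor of
    # token ids to (batch, time, vocab_size) logits. Random here.
    return torch.randn(idx.shape[0], idx.shape[1], vocab_size)

def generate(idx, max_new_tokens):
    for _ in range(max_new_tokens):
        logits = model(idx)[:, -1, :]           # logits at the last position
        probs = torch.softmax(logits, dim=-1)   # turn logits into probabilities
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        idx = torch.cat([idx, next_id], dim=1)  # append and continue
    return idx

context = torch.zeros((1, 1), dtype=torch.long)  # start from a single token
out = generate(context, max_new_tokens=20)       # shape: (1, 21)
```

Because each step samples from a probability distribution rather than taking the argmax, repeated runs produce different text, which is why the sample above is varied rather than repetitive.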
- GiPiTy.ipynb: The primary Jupyter Notebook containing the data loading logic, model definition, training loop, and generation utility.
- shakespeare.txt: The source dataset used for model training and validation.