GiPiTy-Shakespeare

A lightweight character-level language model trained on the Tiny Shakespeare dataset, a selection of Shakespeare's works totalling roughly 1.1 million characters. The project implements the generative model in PyTorch to demonstrate the fundamentals of sequence prediction and text generation.


Project Overview

This repository features a character-level model that learns statistical patterns within Shakespearean English. It treats individual characters as tokens, allowing it to learn vocabulary, grammar, and structural elements (like dialogue formatting) from the ground up.


Technical Specifications

  • Framework: PyTorch
  • Model Architecture: Character-level Generative Model
  • Dataset: Tiny Shakespeare (about 1.1 million characters)
  • Vocabulary Size: 65 unique characters
  • Tokenizer: Custom mapping (integer encoding/decoding)
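The custom integer mapping listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the sample string and the `encode` / `decode` helper names are assumptions, while `stoi` and `itos` follow the names used in this README.

```python
# Hypothetical stand-in text; the notebook reads shakespeare.txt instead.
text = "First Citizen:\nBefore we proceed any further, hear me speak."

# Vocabulary: every distinct character, sorted for a stable ordering.
chars = sorted(set(text))
vocab_size = len(chars)

# Character-to-index and index-to-character lookup tables.
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    """Turn a string into a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Turn a list of integer token ids back into a string."""
    return "".join(itos[i] for i in ids)

ids = encode("hear me")
print(ids)
print(decode(ids))
```

On the full dataset the same construction would yield the 65-character vocabulary reported above. Characters that never appear in the training text have no entry in `stoi`, so `encode` raises a `KeyError` for them.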

Dataset Statistics

  • Total Character Count: 1,115,393
  • Data Splitting: 90% Training, 10% Validation
  • Tokenization: Manual character-to-index mapping (stoi / itos)
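The 90/10 split can be illustrated with a short sketch. It assumes a contiguous split (the first 90% of tokens for training, the remainder for validation), which this README does not state explicitly; the function name `train_val_split` and the `range` stand-in for the encoded data are hypothetical.

```python
def train_val_split(data, train_pct=90):
    """Split a sequence into a leading training part and a trailing validation part.

    Assumption: the split is contiguous rather than shuffled.
    """
    n = len(data) * train_pct // 100  # integer arithmetic avoids float rounding
    return data[:n], data[n:]

# Stand-in for the encoded dataset: one id per character (1,115,393 in total).
data = range(1_115_393)

train, val = train_val_split(data)
print(len(train), len(val))
```

Under that assumption, the 1,115,393 characters divide into 1,003,853 training tokens and 111,540 validation tokens.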

Model Performance

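The following is a raw sample of text generated by the model after training. The output is not yet readable English, but it already reproduces surface patterns of the source: short lines, punctuation, and uppercase speaker cues followed by a colon (for example, GBUS:).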

NIOULELand ccetathe'd?
OMNEObean tithy,
K:
IZAUMalemagidwhars,
O, thyof d:
Jarshmerdof four sthe ha!
Thanuckis!

Sor;


Toul to raghoulis angragathn mioich gherif Viserow wian, angat frest msu sy se

adn ntingh be mere ED m be vewhe r whandr, ch m fltestiomeed ltheak nase owilg Whe pld nth be Wig blo, foolols,
Th, ve ate.
CESAyousand bas re hayowhaiso hesies ce w t hesue chy sonsgenomitheancodiner's pengheandsau p tcagefathid annesitond w, e,

STLInosagethoureangorilourbes grto T:

ARARI himowshisuret'sod wout hed.
Fo y anit!
Honw-teirefoise Bedy
Wht.
Hest f yofoou EWe berstewinaken'serur f uisteis thakir co,
CHenove
IUSou hos bbr fitlaraschenoum'd th thr
Hous! th ieawistoler t st

IICot nd ache bes!

Bonthesthe t ce alootor I to w follofr aed red, he wadfof t an y, bud tcheer medycou foutyo t hasce t thn,
Thendwenngana hangr, seite anst brovetheinthad wneqund dllint ad by anallour scand gehe icofeve ifen
GBUS:
ty thetind pl ir fe bet, pannds he ty tere ve geanougoou ar. veayondo thak
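To show how a sample like the one above is produced one character at a time, here is a minimal sketch of autoregressive sampling. It deliberately swaps in a count-based bigram model written in plain Python, not the PyTorch model from the notebook, whose architecture this README does not describe. The corpus string and the function names `fit_bigram_counts` and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_bigram_counts(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample up to `length` characters, each conditioned on the previous one."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this character
        chars = list(followers.keys())
        weights = list(followers.values())
        # Draw the next character in proportion to its observed frequency.
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)

# Hypothetical toy corpus; the project trains on shakespeare.txt.
corpus = "ROMEO:\nBut soft, what light through yonder window breaks?\n"
counts = fit_bigram_counts(corpus)
sample = generate(counts, "R", 40, random.Random(0))
print(sample)
```

A neural model replaces the count table with learned probabilities, but the generation loop has the same shape: predict a distribution over the next character, sample from it, append, and repeat.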

Project Files

  • GiPiTy.ipynb
    The primary Jupyter Notebook containing the data loading logic, model definition, training loop, and generation utility.

  • shakespeare.txt
    The source dataset used for model training and validation.