Commit fd9deb7 (2 parents: be33cbd + 907ad85)

Merge pull request #47 from jeremymanning/main
Upload 7 HuggingFace models and update all documentation

File tree

2 files changed: +25 additions, -9 deletions

code/generate_model_card.py

Lines changed: 15 additions & 8 deletions

@@ -126,6 +126,17 @@ def get_model_stats(model_dir):
     }


+def count_training_books(author):
+    """Count number of books in author's corpus."""
+    author_dir = Path(f'data/cleaned/{author}')
+
+    if not author_dir.exists():
+        return "several"  # Fallback if directory not accessible
+
+    books = list(author_dir.glob('*.txt'))
+    return len(books)
+
+
 def count_training_tokens(author):
     """Estimate training tokens from cleaned data."""
     author_dir = Path(f'data/cleaned/{author}')

@@ -174,17 +185,13 @@ def generate_model_card(author, model_dir):
 pipeline_tag: text-generation
 ---
 
-# GPT-2 {metadata['full_name']} Stylometry Model
-
-<div style="text-align: center;">
-<img src="https://raw.githubusercontent.com/ContextLab/llm-stylometry/main/assets/CDL_Avatar.png" alt="Context Lab" width="200"/>
-</div>
+# ContextLab GPT-2 {metadata['full_name']} Stylometry Model
 
 ## Overview
 
-This model is a GPT-2 language model trained exclusively on the complete works of **{metadata['full_name']}** ({metadata['years']}). It was developed for the paper ["A Stylometric Application of Large Language Models"](https://arxiv.org/abs/2510.21958) (Stropkay et al., 2025).
+This model is a GPT-2 language model trained exclusively on **{count_training_books(author)} books by {metadata['full_name']}** ({metadata['years']}). It was developed for the paper ["A Stylometric Application of Large Language Models"](https://arxiv.org/abs/2510.21958) (Stropkay et al., 2025).
 
-The model captures {metadata['full_name']}'s unique writing style through intensive training on their complete corpus. By learning the statistical patterns, vocabulary, syntax, and thematic elements characteristic of {author.capitalize()}'s writing, this model enables:
+The model captures {metadata['full_name']}'s unique writing style through intensive training on their corpus. By learning the statistical patterns, vocabulary, syntax, and thematic elements characteristic of {author.capitalize()}'s writing, this model enables:
 
 - **Text generation** in the authentic style of {metadata['full_name']}
 - **Authorship attribution** through cross-entropy loss comparison

@@ -202,7 +209,7 @@ def generate_model_card(author, model_dir):
 - **License:** MIT
 - **Author:** {metadata['full_name']} ({metadata['years']})
 - **Notable works:** {metadata['notable_works']}
-- **Training data:** [{metadata['full_name']} Complete Works](https://huggingface.co/datasets/contextlab/{author}-corpus)
+- **Training data:** [{count_training_books(author)} books by {metadata['full_name']}](https://huggingface.co/datasets/contextlab/{author}-corpus)
 - **Training tokens:** {count_training_tokens(author)}
 - **Final training loss:** {stats['final_loss']:.4f}
 - **Epochs trained:** {stats['epochs_trained']:,}
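One detail of the new `count_training_books` helper worth noting: when the corpus directory is missing it returns the string `"several"` rather than an integer, so the model-card f-strings still render something sensible. A minimal, self-contained sketch of the same logic run against a temporary directory (the `base` parameter and the book filenames are fabricated for illustration):

```python
import tempfile
from pathlib import Path

def count_training_books(author, base="data/cleaned"):
    """Count *.txt books in the author's cleaned corpus directory."""
    author_dir = Path(base) / author
    if not author_dir.exists():
        return "several"  # Fallback keeps the model-card text renderable
    return len(list(author_dir.glob('*.txt')))

with tempfile.TemporaryDirectory() as tmp:
    # Fabricated corpus: three book files for one author
    austen = Path(tmp) / "austen"
    austen.mkdir()
    for name in ("emma", "persuasion", "sanditon"):
        (austen / f"{name}.txt").write_text("...")
    print(count_training_books("austen", base=tmp))  # 3
    print(count_training_books("baum", base=tmp))    # several
```

The mixed return type (int or str) is convenient for string formatting but would surprise any caller doing arithmetic on the result; returning `None` and handling the fallback at the call site would be a stricter alternative.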

models/README.md

Lines changed: 10 additions & 1 deletion

@@ -127,4 +127,13 @@ Models published to: `contextlab/gpt2-{author}` (e.g., contextlab/gpt2-baum)
 - For public use and text generation
 - Trained for 50,000 additional epochs beyond paper models
 - Much lower loss (~1.3-1.6) for better generation quality
-- Will be available at https://huggingface.co/contextlab
+
+Trained models available on HuggingFace:
+- Jane Austen: [contextlab/gpt2-austen](https://huggingface.co/contextlab/gpt2-austen)
+- L. Frank Baum: [contextlab/gpt2-baum](https://huggingface.co/contextlab/gpt2-baum) (training)
+- Charles Dickens: [contextlab/gpt2-dickens](https://huggingface.co/contextlab/gpt2-dickens)
+- F. Scott Fitzgerald: [contextlab/gpt2-fitzgerald](https://huggingface.co/contextlab/gpt2-fitzgerald)
+- Herman Melville: [contextlab/gpt2-melville](https://huggingface.co/contextlab/gpt2-melville)
+- Ruth Plumly Thompson: [contextlab/gpt2-thompson](https://huggingface.co/contextlab/gpt2-thompson)
+- Mark Twain: [contextlab/gpt2-twain](https://huggingface.co/contextlab/gpt2-twain)
+- H.G. Wells: [contextlab/gpt2-wells](https://huggingface.co/contextlab/gpt2-wells)
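The model cards generated above advertise authorship attribution "through cross-entropy loss comparison": score a passage under each author model and attribute it to the model with the lowest loss. A minimal sketch of just that comparison step, with fabricated loss values standing in for real evaluations of the published `contextlab/gpt2-<author>` models:

```python
def attribute_author(losses):
    """Return the model whose cross-entropy loss on the passage is lowest."""
    return min(losses, key=losses.get)

# Fabricated per-model losses for one passage; real values would come from
# evaluating each contextlab/gpt2-<author> model on the text
losses = {
    "contextlab/gpt2-austen": 3.41,
    "contextlab/gpt2-twain": 2.87,
    "contextlab/gpt2-wells": 3.95,
}
print(attribute_author(losses))  # contextlab/gpt2-twain
```

The intuition: a model trained exclusively on one author assigns higher probability (lower cross-entropy) to text in that author's style, so the argmin over per-model losses is a simple attribution rule.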
