
Commit be33cbd

Merge pull request #46 from jeremymanning/main
Update arXiv link to live preprint
2 parents: 2e2e81c + 4e6dd12

3 files changed: +6 -6 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@

 ## Overview

-This repository contains the code and data for our [paper](https://insert.link.when.ready) on using large language models (LLMs) for stylometric analysis. We demonstrate that GPT-2 models trained on individual authors' works can capture unique writing styles, enabling accurate authorship attribution through cross-entropy loss comparison.
+This repository contains the code and data for our [paper](https://arxiv.org/abs/2510.21958) on using large language models (LLMs) for stylometric analysis. We demonstrate that GPT-2 models trained on individual authors' works can capture unique writing styles, enabling accurate authorship attribution through cross-entropy loss comparison.

 ## Repository Structure

@@ -250,7 +250,7 @@ If you use this code or data in your research, please cite:
 @article{StroEtal25,
 title={A Stylometric Application of Large Language Models},
 author={Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.},
-journal={arXiv preprint arXiv:XXXX.XXXXX},
+journal={arXiv preprint arXiv:2510.21958},
 year={2025}
 }
 ```
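For context on the method the updated README paragraph describes: attribution is framed as comparing the cross-entropy loss of a query text under each author-specific GPT-2 model and choosing the author whose model assigns the lowest loss. Below is a minimal sketch of that comparison, assuming Hugging Face `transformers` and hypothetical local model paths; it is an illustration, not the repository's actual pipeline.

```python
# Hypothetical sketch: attribute a text to the author whose fine-tuned GPT-2
# model assigns it the lowest mean cross-entropy loss. Model paths below are
# placeholders, not the repository's actual directory layout.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def cross_entropy_loss(model, tokenizer, text, device="cpu"):
    """Return the mean per-token cross-entropy of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(device)
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
    return out.loss.item()

def attribute_author(text, model_dirs, device="cpu"):
    """Pick the author whose model gives the lowest loss on `text`."""
    losses = {}
    for author, path in model_dirs.items():
        tokenizer = GPT2TokenizerFast.from_pretrained(path)
        model = GPT2LMHeadModel.from_pretrained(path).to(device).eval()
        # Lowercase the query to match the lowercased training corpora.
        losses[author] = cross_entropy_loss(model, tokenizer, text.lower(), device)
    return min(losses, key=losses.get), losses

# Example usage with placeholder paths:
# best, losses = attribute_author("call me ishmael ...",
#                                 {"melville": "models/melville",
#                                  "austen": "models/austen"})
```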

code/generate_dataset_card.py

Lines changed: 2 additions & 2 deletions
@@ -128,7 +128,7 @@ def generate_dataset_card(author, data_dir):

 ## Dataset Description

-This dataset contains works of **{metadata['full_name']}** ({metadata['years']}), preprocessed for computational stylometry research. The texts were sourced from [Project Gutenberg](https://www.gutenberg.org/) and cleaned for use in the paper ["A Stylometric Application of Large Language Models"](https://github.com/ContextLab/llm-stylometry) (Stropkay et al., 2025).
+This dataset contains works of **{metadata['full_name']}** ({metadata['years']}), preprocessed for computational stylometry research. The texts were sourced from [Project Gutenberg](https://www.gutenberg.org/) and cleaned for use in the paper ["A Stylometric Application of Large Language Models"](https://arxiv.org/abs/2510.21958) (Stropkay et al., 2025).

 The corpus includes **{stats['num_books']} books** by {metadata['full_name']}, including {metadata['notable_works']}. All text has been converted to **lowercase** and cleaned of Project Gutenberg headers, footers, and chapter headings to focus on the author's prose style.

@@ -327,7 +327,7 @@ def tokenize_function(examples):
 @article{{StroEtal25,
 title={{A Stylometric Application of Large Language Models}},
 author={{Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.}},
-journal={{arXiv preprint arXiv:XXXX.XXXXX}},
+journal={{arXiv preprint arXiv:2510.21958}},
 year={{2025}}
 }}
 ```

code/generate_model_card.py

Lines changed: 2 additions & 2 deletions
@@ -182,7 +182,7 @@ def generate_model_card(author, model_dir):

 ## Overview

-This model is a GPT-2 language model trained exclusively on the complete works of **{metadata['full_name']}** ({metadata['years']}). It was developed for the paper ["A Stylometric Application of Large Language Models"](https://github.com/ContextLab/llm-stylometry) (Stropkay et al., 2025).
+This model is a GPT-2 language model trained exclusively on the complete works of **{metadata['full_name']}** ({metadata['years']}). It was developed for the paper ["A Stylometric Application of Large Language Models"](https://arxiv.org/abs/2510.21958) (Stropkay et al., 2025).

 The model captures {metadata['full_name']}'s unique writing style through intensive training on their complete corpus. By learning the statistical patterns, vocabulary, syntax, and thematic elements characteristic of {author.capitalize()}'s writing, this model enables:

@@ -349,7 +349,7 @@ def generate_model_card(author, model_dir):
 @article{{StroEtal25,
 title={{A Stylometric Application of Large Language Models}},
 author={{Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.}},
-journal={{arXiv preprint arXiv:XXXX.XXXXX}},
+journal={{arXiv preprint arXiv:2510.21958}},
 year={{2025}}
 }}
 ```
