
Commit be33cbd

Merge pull request #46 from jeremymanning/main
Update arXiv link to live preprint
2 parents: 2e2e81c + 4e6dd12

3 files changed: +6 -6 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@

 ## Overview

-This repository contains the code and data for our [paper](https://insert.link.when.ready) on using large language models (LLMs) for stylometric analysis. We demonstrate that GPT-2 models trained on individual authors' works can capture unique writing styles, enabling accurate authorship attribution through cross-entropy loss comparison.
+This repository contains the code and data for our [paper](https://arxiv.org/abs/2510.21958) on using large language models (LLMs) for stylometric analysis. We demonstrate that GPT-2 models trained on individual authors' works can capture unique writing styles, enabling accurate authorship attribution through cross-entropy loss comparison.

 ## Repository Structure

@@ -250,7 +250,7 @@ If you use this code or data in your research, please cite:
 @article{StroEtal25,
 title={A Stylometric Application of Large Language Models},
 author={Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.},
-journal={arXiv preprint arXiv:XXXX.XXXXX},
+journal={arXiv preprint arXiv:2510.21958},
 year={2025}
 }
 ```
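For context on the method the updated README paragraph describes: attribution is framed as comparing the cross-entropy loss of a query text under each author-specific GPT-2 model and choosing the author whose model assigns the lowest loss. Below is a minimal sketch of that comparison, assuming Hugging Face `transformers` and hypothetical local model paths; it is an illustration, not the repository's actual pipeline.

```python
# Hypothetical sketch: attribute a text to the author whose fine-tuned GPT-2
# model assigns it the lowest mean cross-entropy loss. Model paths below are
# placeholders, not the repository's actual directory layout.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def cross_entropy_loss(model, tokenizer, text, device="cpu"):
    """Return the mean per-token cross-entropy of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(device)
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
    return out.loss.item()

def attribute_author(text, model_dirs, device="cpu"):
    """Pick the author whose model gives the lowest loss on `text`."""
    losses = {}
    for author, path in model_dirs.items():
        tokenizer = GPT2TokenizerFast.from_pretrained(path)
        model = GPT2LMHeadModel.from_pretrained(path).to(device).eval()
        # Lowercase the query to match the lowercased training corpora.
        losses[author] = cross_entropy_loss(model, tokenizer, text.lower(), device)
    return min(losses, key=losses.get), losses

# Example usage with placeholder paths:
# best, losses = attribute_author("call me ishmael ...",
#                                 {"melville": "models/melville",
#                                  "austen": "models/austen"})
```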

code/generate_dataset_card.py

Lines changed: 2 additions & 2 deletions
@@ -128,7 +128,7 @@ def generate_dataset_card(author, data_dir):

 ## Dataset Description

-This dataset contains works of **{metadata['full_name']}** ({metadata['years']}), preprocessed for computational stylometry research. The texts were sourced from [Project Gutenberg](https://www.gutenberg.org/) and cleaned for use in the paper ["A Stylometric Application of Large Language Models"](https://github.com/ContextLab/llm-stylometry) (Stropkay et al., 2025).
+This dataset contains works of **{metadata['full_name']}** ({metadata['years']}), preprocessed for computational stylometry research. The texts were sourced from [Project Gutenberg](https://www.gutenberg.org/) and cleaned for use in the paper ["A Stylometric Application of Large Language Models"](https://arxiv.org/abs/2510.21958) (Stropkay et al., 2025).

 The corpus includes **{stats['num_books']} books** by {metadata['full_name']}, including {metadata['notable_works']}. All text has been converted to **lowercase** and cleaned of Project Gutenberg headers, footers, and chapter headings to focus on the author's prose style.

@@ -327,7 +327,7 @@ def tokenize_function(examples):
 @article{{StroEtal25,
 title={{A Stylometric Application of Large Language Models}},
 author={{Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.}},
-journal={{arXiv preprint arXiv:XXXX.XXXXX}},
+journal={{arXiv preprint arXiv:2510.21958}},
 year={{2025}}
 }}
 ```

code/generate_model_card.py

Lines changed: 2 additions & 2 deletions
@@ -182,7 +182,7 @@ def generate_model_card(author, model_dir):

 ## Overview

-This model is a GPT-2 language model trained exclusively on the complete works of **{metadata['full_name']}** ({metadata['years']}). It was developed for the paper ["A Stylometric Application of Large Language Models"](https://github.com/ContextLab/llm-stylometry) (Stropkay et al., 2025).
+This model is a GPT-2 language model trained exclusively on the complete works of **{metadata['full_name']}** ({metadata['years']}). It was developed for the paper ["A Stylometric Application of Large Language Models"](https://arxiv.org/abs/2510.21958) (Stropkay et al., 2025).

 The model captures {metadata['full_name']}'s unique writing style through intensive training on their complete corpus. By learning the statistical patterns, vocabulary, syntax, and thematic elements characteristic of {author.capitalize()}'s writing, this model enables:

@@ -349,7 +349,7 @@ def generate_model_card(author, model_dir):
 @article{{StroEtal25,
 title={{A Stylometric Application of Large Language Models}},
 author={{Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.}},
-journal={{arXiv preprint arXiv:XXXX.XXXXX}},
+journal={{arXiv preprint arXiv:2510.21958}},
 year={{2025}}
 }}
 ```
