Commit 86ac165

Add HuggingFace dataset links to data/README.md
- All 8 author corpus links with book counts
- Usage example with datasets library
- Main README references data/README.md for datasets

Ref: #42
1 parent ae514d8 commit 86ac165

1 file changed: +15 -0 lines changed


data/README.md

Lines changed: 15 additions & 0 deletions
@@ -38,6 +38,21 @@ data/
 - Twain (6 books)
 - Wells (12 books)
 
+## HuggingFace Datasets
+
+All author corpora are publicly available on HuggingFace with verified book titles:
+
+- [contextlab/austen-corpus](https://huggingface.co/datasets/contextlab/austen-corpus) - 7 books
+- [contextlab/baum-corpus](https://huggingface.co/datasets/contextlab/baum-corpus) - 14 books
+- [contextlab/dickens-corpus](https://huggingface.co/datasets/contextlab/dickens-corpus) - 14 books
+- [contextlab/fitzgerald-corpus](https://huggingface.co/datasets/contextlab/fitzgerald-corpus) - 8 books
+- [contextlab/melville-corpus](https://huggingface.co/datasets/contextlab/melville-corpus) - 10 books
+- [contextlab/thompson-corpus](https://huggingface.co/datasets/contextlab/thompson-corpus) - 13 books
+- [contextlab/twain-corpus](https://huggingface.co/datasets/contextlab/twain-corpus) - 6 books
+- [contextlab/wells-corpus](https://huggingface.co/datasets/contextlab/wells-corpus) - 12 books
+
+Load with: `from datasets import load_dataset; corpus = load_dataset("contextlab/baum-corpus")`
+
 ## Creating Variant Data
 
 Generate variant-transformed texts:
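
The one-liner added in this commit loads a single corpus. A minimal sketch of the same call for any of the eight listed authors might look like the following. `BOOK_COUNTS`, `repo_id`, and `load_corpus` are hypothetical helper names that do not appear in the repository, and the `contextlab/<author>-corpus` pattern is inferred from the links in the diff.

```python
# Book counts copied from the list added to data/README.md in this commit.
BOOK_COUNTS = {
    "austen": 7,
    "baum": 14,
    "dickens": 14,
    "fitzgerald": 8,
    "melville": 10,
    "thompson": 13,
    "twain": 6,
    "wells": 12,
}


def repo_id(author: str) -> str:
    """Build a dataset id using the contextlab/<author>-corpus naming pattern."""
    return f"contextlab/{author}-corpus"


def load_corpus(author: str):
    """Load one author's corpus from HuggingFace.

    Requires the `datasets` package and network access. The import is done
    lazily so the helpers above can be used without the package installed.
    """
    if author not in BOOK_COUNTS:
        raise KeyError(f"unknown author: {author!r}")
    from datasets import load_dataset

    # Same call as the README one-liner, with the id built from the author name.
    return load_dataset(repo_id(author))
```

Under these assumptions, `load_corpus("baum")` makes the same call as the README one-liner. The split names and column layout of the returned object are not described in this commit, so inspect it before relying on a particular field.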
