Commit b48047a

Overhaul Analysis Variants section
Simplified from 78 lines to 26 lines (-52 lines).

Changes:
- Removed detailed fairness API examples
- Removed redundant model/figure path examples
- Streamlined variant descriptions
- Clarified supplemental figure connection (S1-S8)
- Removed reference to supplement narrative (figures only)

README: 497 → 445 lines (-52 lines)
Total: 752 → 445 lines (-307 lines, -41%)

Related to #35
1 parent e8e74f6 commit b48047a

1 file changed: +15 -67 lines changed
README.md

Lines changed: 15 additions & 67 deletions
@@ -105,83 +105,31 @@ See the [Package API](#package-api) section for all available functions.
 
 ## Analysis Variants
 
-The project supports three linguistic variants to understand what stylistic features models learn:
+The paper analyzes three linguistic variants (Supplemental Figures S1-S8):
 
-**Content-Only** (`-co`, `--content-only`): Masks function words with `<FUNC>`, preserving only content words (nouns, verbs, adjectives). Tests vocabulary and word choice.
-
-**Function-Only** (`-fo`, `--function-only`): Masks content words with `<CONTENT>`, preserving only function words (articles, prepositions, conjunctions). Tests grammatical structure.
-
-**Part-of-Speech** (`-pos`, `--part-of-speech`): Replaces words with POS tags (Universal Dependencies tagset). Tests syntactic patterns.
-
-All CLI commands accept variant flags. Without a flag, the baseline condition is used. Each variant trains 80 models (8 authors × 10 seeds). See [Training Models from Scratch](#training-models-from-scratch) for training details.
+- **Content-only**: Function words masked → tests vocabulary/word choice (Supp. Figs. S1, S4, S7A, S8A)
+- **Function-only**: Content words masked → tests grammatical structure (Supp. Figs. S2, S5, S7B, S8B)
+- **Part-of-speech**: Words → POS tags → tests syntactic patterns (Supp. Figs. S3, S6, S7C, S8C)
 
+**Generate supplemental figures:**
 ```bash
-# Generate figures for variants
-./run_llm_stylometry.sh -f 1a -co # Figure 1A, content variant
-./run_llm_stylometry.sh --function-only # All figures, function variant
-
-# Compute statistics
-./run_stats.sh --all # All variants at once
-./run_stats.sh -co # Single variant
+./run_llm_stylometry.sh -f s1a # Supp. Fig. S1A (content-only, Fig 1A format)
+./run_llm_stylometry.sh -f s4b # Supp. Fig. S4B (content-only, Fig 2B format)
+./run_llm_stylometry.sh -f s7c # Supp. Fig. S7C (POS confusion matrix)
 ```
 
-**Model directories:**
-- Baseline: `{author}_tokenizer=gpt2_seed={0-9}/`
-- Variants: `{author}_variant={content|function|pos}_tokenizer=gpt2_seed={0-9}/`
-
-**Figure paths:**
-- Baseline: `paper/figs/source/figure_name.pdf`
-- Variants: `paper/figs/source/figure_name_{variant}.pdf`
-
-### Fairness-Based Loss Thresholding
-
-Variant models converge much faster than baseline models (all cross 3.0 loss by epochs 15-16) and may converge to different final losses. To ensure fair comparison, **fairness-based loss thresholding** is automatically applied to variant figures (1A, 1B, 3, 4, 5):
-
-1. **Compute threshold**: Maximum of all models' minimum training losses within 500 epochs
-2. **Truncate data**: Keep all epochs up to and including the first epoch where training loss ≤ threshold
-3. **Fair comparison**: All models compared at the same training loss level (the fairness threshold)
-
-This ensures models are not unfairly compared when some converged to higher losses than others. The feature is enabled by default for variants and can be disabled:
-
+**Training variants:** Each trains 80 models (8 authors × 10 seeds)
 ```bash
-# Fairness enabled (default for variants)
-./run_llm_stylometry.sh -f 1a -fo
-
-# Fairness disabled
-./run_llm_stylometry.sh -f 1a -fo --no-fairness
+./run_llm_stylometry.sh --train -co # Content-only
+./remote_train.sh -fo # Function-only on GPU cluster
 ```
 
-**Example results** (function-only variant):
-- Fairness threshold: 1.2720 (Austen's minimum loss)
-- Models truncated between epochs 88-500
-- Data reduced: 360,640 rows → 170,659 rows (47.3%)
-
-**Python API:**
-
-```python
-from llm_stylometry.analysis.fairness import (
-    compute_fairness_threshold,
-    apply_fairness_threshold
-)
-
-# Compute threshold for variant data
-df = pd.read_pickle('data/model_results_function.pkl')
-threshold = compute_fairness_threshold(df, min_epochs=500)
-print(f"Fairness threshold: {threshold:.4f}")
-
-# Truncate data at threshold
-df_fair = apply_fairness_threshold(df, threshold, use_first_crossing=True)
-
-# Generate figure with fairness
-from llm_stylometry.visualization import generate_all_losses_figure
-fig = generate_all_losses_figure(
-    data_path='data/model_results_function.pkl',
-    variant='function',
-    apply_fairness=True # default for variants
-)
+**Statistical analysis:**
+```bash
+./run_stats.sh # All variants (default)
 ```
 
-**Note**: T-test figures (2A, 2B) never apply fairness thresholding since they require all 500 epochs for statistical calculations.
+**Fairness-based loss thresholding:** Automatically ensures fair comparison when variant models converge to different final losses. Disable with `--no-fairness` if needed.
 
 ## Training Models from Scratch
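For reference, the three-step fairness thresholding procedure that this commit removes from the README (threshold = the maximum of all models' minimum training losses; each model is then truncated at the first epoch whose loss is at or below that threshold) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `fairness_threshold` and `truncate_at_threshold` and the toy loss curves are hypothetical, and the sketch is not the implementation in `llm_stylometry.analysis.fairness`, which operates on the project's own result data.

```python
from typing import Dict, List


def fairness_threshold(losses: Dict[str, List[float]]) -> float:
    """Return the largest of the per-model minimum training losses (hypothetical helper)."""
    return max(min(curve) for curve in losses.values())


def truncate_at_threshold(
    losses: Dict[str, List[float]], threshold: float
) -> Dict[str, List[float]]:
    """Keep each curve up to and including its first epoch with loss <= threshold."""
    truncated = {}
    for model, curve in losses.items():
        # Index of the first epoch whose loss is at or below the threshold.
        first = next((i for i, loss in enumerate(curve) if loss <= threshold), None)
        # If a curve never reaches the threshold, keep it whole.
        truncated[model] = curve if first is None else curve[: first + 1]
    return truncated


# Toy, made-up loss curves (one value per epoch); not real project data.
toy_losses = {
    "model_a": [3.0, 2.0, 1.5, 1.0],
    "model_b": [3.2, 2.5, 1.8, 1.6],
}

threshold = fairness_threshold(toy_losses)
fair = truncate_at_threshold(toy_losses, threshold)
print(threshold, fair)
```

In this toy case the threshold is set by the model with the highest minimum loss (`model_b`), so `model_a` is cut off at the first epoch where it reaches that level while `model_b` keeps its full curve.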
