Commit 70253d2

Merge pull request #29 from jeremymanning/main
Fix issue #28: Handle NaN/Inf in t-test visualization axis limits
2 parents 949b56f + e792f3e commit 70253d2

627 files changed (+365568, -10655 lines changed)

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+# Pull Request
+
+## Description
+<!-- Provide a brief description of the changes in this PR -->
+
+## Related Issue
+<!-- If this PR addresses an issue, link it here (e.g., "Fixes #123") -->
+
+## Type of Change
+<!-- Mark the appropriate option with an "x" -->
+
+- [ ] Bug fix (non-breaking change which fixes an issue)
+- [ ] New feature (non-breaking change which adds functionality)
+- [ ] Breaking change (fix or feature that would cause existing functionality to change)
+- [ ] Documentation update
+- [ ] Code refactoring
+- [ ] Performance improvement
+- [ ] Test addition or modification
+
+## Checklist
+
+### Code Quality
+- [ ] Code follows the project's style guidelines
+- [ ] Code has been formatted with `black` (run `black .` from project root)
+- [ ] Variable and function names are descriptive and follow existing conventions
+- [ ] Code is well-organized and follows the package structure
+
+### Testing
+- [ ] **No mock objects or functions are used in tests** (tests use real models, data, and outputs)
+- [ ] All new functionality has comprehensive test coverage
+- [ ] Tests use small datasets/models and complete quickly (< 5 minutes total)
+- [ ] Tests verify actual outputs (files created, correct formats, expected content)
+- [ ] All existing tests pass locally (`pytest tests/`)
+- [ ] Tests work across platforms (Linux, macOS, Windows if applicable)
+
+### Documentation
+- [ ] Updated relevant documentation (README.md, docstrings, etc.)
+- [ ] Added/updated code comments where necessary
+- [ ] Examples are provided for new features
+
+### Dependencies
+- [ ] No unnecessary dependencies added
+- [ ] If new dependencies added, updated `requirements.txt`
+- [ ] Verified compatibility with existing dependencies
+
+### Git Hygiene
+- [ ] Commits have descriptive messages
+- [ ] No sensitive information (passwords, keys, tokens) in code or commits
+- [ ] `.gitignore` updated if new file types should be excluded
+- [ ] No large files committed (model weights, large datasets, etc.)
+
+### Functional Requirements
+- [ ] Changes have been tested manually
+- [ ] Scripts run without errors
+- [ ] Generated outputs (figures, models, etc.) are correct
+- [ ] No breaking changes to existing functionality (or documented if necessary)
+
+## Additional Notes
+<!-- Any additional information reviewers should know -->

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -27,5 +27,12 @@ tests/data/*.pkl
 
 # Temporary test files
 .test_credentials
+
+# SSH credentials (security)
+.ssh/credentials.json
+
+# Model files
 models/*/model.safetensors
+models/*/model.pth
+models/*/pytorch_model.bin
 models/*/training_state.pt

CONTRIBUTING.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
+# Contributing to LLM Stylometry
+
+Thank you for your interest in contributing to LLM Stylometry! This project applies language model stylometry to analyze authorship patterns in literary texts. We welcome contributions that improve the codebase, add new features, fix bugs, or enhance documentation.
+
+## Table of Contents
+
+- [Code of Conduct](#code-of-conduct)
+- [How to Contribute](#how-to-contribute)
+- [Development Guidelines](#development-guidelines)
+- [Testing Philosophy](#testing-philosophy)
+- [Reporting Bugs](#reporting-bugs)
+- [Communication](#communication)
+
+## Code of Conduct
+
+We are committed to providing a welcoming and inclusive environment for all contributors. Please be respectful and constructive in all interactions.
+
+Unacceptable behavior should be reported to [[email protected]](mailto:[email protected]).
+
+## How to Contribute
+
+### Getting Started
+
+1. **Fork the repository** on GitHub
+2. **Clone your fork** locally:
+   ```bash
+   git clone https://github.com/YOUR-USERNAME/llm-stylometry.git
+   cd llm-stylometry
+   ```
+3. **Create a branch** for your changes:
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+4. **Make your changes** following our development guidelines
+5. **Test your changes** thoroughly
+6. **Commit your changes** with descriptive messages
+7. **Push to your fork** and submit a pull request
+
+### What to Contribute
+
+We welcome contributions in several areas:
+
+- **Bug fixes**: Identify and fix issues in existing code
+- **New features**: Add functionality that aligns with the project's goals
+- **Performance improvements**: Optimize existing code
+- **Documentation**: Improve README, docstrings, or add examples
+- **Tests**: Expand test coverage or improve test quality
+- **Code refactoring**: Improve code organization and readability
+
+## Development Guidelines
+
+### Code Style
+
+- **Formatting**: Use `black` for code formatting:
+  ```bash
+  black .
+  ```
+- **Naming conventions**:
+  - Use descriptive variable and function names
+  - Follow existing naming patterns in the codebase
+  - Functions: `snake_case`
+  - Classes: `PascalCase`
+  - Constants: `UPPER_CASE`
+
+### Code Organization
+
+- Follow the existing package structure:
+  ```
+  llm_stylometry/
+  ├── core/           # Experiment configuration and training
+  ├── data/           # Data loading and preprocessing
+  ├── models/         # Model initialization
+  ├── analysis/       # Statistical analysis
+  └── visualization/  # Figure generation
+  ```
+- Keep functions focused and single-purpose
+- Add docstrings to all public functions and classes
+- Use type hints where appropriate
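
As an illustration of the naming, docstring, and type-hint conventions listed above, a minimal sketch might look like the following. `AuthorSample`, `count_tokens`, and `MAX_TOKENS` are hypothetical names chosen for this example; they are not identifiers from the repository.

```python
from dataclasses import dataclass

# Constant: UPPER_CASE (hypothetical value, for illustration only)
MAX_TOKENS = 1024


@dataclass
class AuthorSample:
    """A text sample attributed to one author (class name in PascalCase)."""

    author: str
    text: str


def count_tokens(sample: AuthorSample, max_tokens: int = MAX_TOKENS) -> int:
    """Return the number of whitespace-separated tokens, capped at `max_tokens`."""
    # Function name in snake_case; single-purpose, typed, and documented.
    return min(len(sample.text.split()), max_tokens)
```

Running `black .` on a file like this should leave it unchanged, since it already follows the formatter's layout.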
+
+### Dependencies
+
+- Minimize new dependencies
+- Only add dependencies that are:
+  - Well-maintained
+  - Widely used
+  - Necessary for the feature
+- Update `requirements.txt` when adding dependencies
+- Document why the dependency is needed
+
+## Testing Philosophy
+
+### Core Principles
+
+**Use real models, data, and outputs—no mocks.**
+
+Our testing philosophy prioritizes real-world validation over unit test isolation. This approach ensures:
+- External APIs work correctly (e.g., Anthropic, OpenAI, Hugging Face)
+- Models can be downloaded and used properly
+- Responses are in expected formats
+- Database operations succeed
+- Files are created and read correctly
+- Figures render properly
+
+### Writing Tests
+
+When writing tests, follow these guidelines:
+
+1. **Use real data and models**:
+   ```python
+   # Good: Use actual small model
+   model = GPT2LMHeadModel.from_pretrained('gpt2')
+
+   # Bad: Mock the model
+   model = MagicMock()
+   ```
+
+2. **Keep tests fast**:
+   - Use small datasets (synthetic test data)
+   - Use tiny models (e.g., GPT-2 with minimal layers)
+   - Target < 5 minutes for entire test suite
+
+3. **Test actual outputs**:
+   ```python
+   # Good: Verify real file was created
+   fig = generate_figure(data, 'output.pdf')
+   assert Path('output.pdf').exists()
+   assert Path('output.pdf').stat().st_size > 1000
+
+   # Bad: Mock file creation
+   with patch('pathlib.Path.exists', return_value=True):
+       ...
+   ```
+
+4. **Test edge cases**:
+   - Empty inputs
+   - Missing files
+   - Invalid parameters
+   - Boundary conditions
+
+5. **Run tests locally before submitting**:
+   ```bash
+   pytest tests/
+   ```
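
The guidelines above on testing real outputs and edge cases can be sketched with only the standard library. `save_summary` is a hypothetical helper invented for this example and is not part of `llm_stylometry`; the test functions are written to run under pytest, whose built-in `tmp_path` fixture supplies a temporary directory.

```python
import json
import math
from pathlib import Path


def save_summary(values, path):
    """Hypothetical helper: write the count and mean of finite `values` as JSON."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("need at least one finite value")
    path = Path(path)
    path.write_text(json.dumps({"n": len(finite), "mean": sum(finite) / len(finite)}))
    return path


def test_writes_real_file(tmp_path):
    # Happy path: a real file is written and its content is checked, no mocks.
    out = save_summary([1.0, 2.0, 3.0], tmp_path / "summary.json")
    assert out.exists()
    assert json.loads(out.read_text()) == {"n": 3, "mean": 2.0}


def test_empty_input_is_rejected(tmp_path):
    # Edge case: empty input must raise and must not leave a file behind.
    target = tmp_path / "empty.json"
    try:
        save_summary([], target)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
    assert not target.exists()


def test_non_finite_values_are_skipped(tmp_path):
    # Boundary condition: NaN and Inf are dropped before averaging.
    out = save_summary([1.0, float("nan"), float("inf"), 3.0], tmp_path / "nf.json")
    assert json.loads(out.read_text()) == {"n": 2, "mean": 2.0}
```

Because each test writes into its own temporary directory, the suite leaves no stray files in the working tree and stays fast.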
+
+### Test Coverage
+
+Ensure new features have comprehensive test coverage:
+- Happy path (expected usage)
+- Error handling
+- Edge cases
+- Cross-platform compatibility (if applicable)
+
+## Reporting Bugs
+
+When reporting a bug, please include:
+
+1. **Short summary**: Brief description of the issue
+2. **Reproduction steps**: Minimal code snippet to reproduce
+3. **Expected behavior**: What should happen
+4. **Actual behavior**: What actually happens
+5. **Environment**:
+   - Operating system
+   - Python version
+   - Package versions (from `pip list`)
+
+### Bug Report Template
+
+```markdown
+## Bug Description
+[Brief description]
+
+## Steps to Reproduce
+```python
+# Minimal code example
+```
+
+## Expected Behavior
+[What should happen]
+
+## Actual Behavior
+[What actually happens]
+
+## Environment
+- OS: [e.g., Ubuntu 22.04]
+- Python: [e.g., 3.10.0]
+- PyTorch: [e.g., 2.0.0]
+```
+
+## Communication
+
+- **Issues**: Use GitHub Issues for bug reports and feature requests
+- **Pull Requests**: Use PRs for code contributions
+- **Email**: For sensitive issues or questions, contact [[email protected]](mailto:[email protected])
+
+## Pull Request Process
+
+1. Ensure all tests pass
+2. Update documentation as needed
+3. Follow the PR template checklist
+4. Write clear commit messages
+5. Reference related issues (e.g., "Fixes #123")
+6. Be responsive to review feedback
+
+### Review Process
+
+- PRs require approval from a maintainer
+- Reviewers will check:
+  - Code quality and style
+  - Test coverage
+  - Documentation completeness
+  - Adherence to project guidelines
+- Address feedback promptly and professionally
+
+## Questions?
+
+If you have questions about contributing, feel free to:
+- Open an issue for discussion
+- Email the maintainers
+- Check existing documentation in the README
+
+Thank you for contributing to LLM Stylometry! 🎉
