
Commit 8cf59b1

Merge pull request #33 from getyourguide/docs/split-readme-and-restructure
docs: partitioned readme
2 parents df9f7b1 + 30500e7 commit 8cf59b1


4 files changed (+363 / -82 lines)


CONTRIBUTING.md

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# Contributing to DataFrameExpectations

Thank you for your interest in contributing to DataFrameExpectations! We welcome contributions from the community, whether it's adding new expectations, fixing bugs, improving documentation, or enhancing the testing framework.

## Table of Contents

- [Getting Started](#getting-started)
- [Development Setup](#development-setup)
- [How to Contribute](#how-to-contribute)
- [Adding Expectations](#adding-expectations)
- [Running Tests](#running-tests)
- [Code Style Guidelines](#code-style-guidelines)
- [Submitting a Pull Request](#submitting-a-pull-request)
- [Versioning and Commits](#versioning-and-commits)

## Getting Started

Before you begin:
1. Check existing [issues](https://github.com/getyourguide/dataframe-expectations/issues) and [pull requests](https://github.com/getyourguide/dataframe-expectations/pulls) to avoid duplicates.
2. For major changes, open an issue first to discuss your proposal.
3. Make sure you agree to the terms of the [Apache 2.0 License](LICENSE.txt).

## Development Setup

1. **Fork the repository on GitHub, then clone your fork:**
```bash
# Replace <your-username> with your GitHub username
git clone https://github.com/<your-username>/dataframe-expectations.git
cd dataframe-expectations
```

2. **Install the uv package manager:**
```bash
pip install uv
```

3. **Install development dependencies:**
```bash
# This automatically creates a virtual environment in .venv
uv sync --group dev
```

4. **Activate the virtual environment:**
```bash
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

5. **Verify your setup:**
```bash
uv run pytest tests/ -n auto --cov=dataframe_expectations
```

6. **(Optional) Install pre-commit hooks:**
```bash
pre-commit install
```
The hooks then run checks automatically before each commit.

## How to Contribute

### Reporting Bugs
Open an [issue](https://github.com/getyourguide/dataframe-expectations/issues) with a clear description, steps to reproduce, expected vs. actual behavior, and relevant environment details.

### Documentation
Fix typos, clarify docs, add examples, or improve the README.

### Features
Before implementing a new feature, open an issue to discuss it: explain the use case and consider backward compatibility.

### Adding Expectations
See the **[Adding Expectations Guide](https://code.getyourguide.com/dataframe-expectations/adding_expectations.html)** for detailed instructions.

## Running Tests

The `-n auto` flag comes from the pytest-xdist plugin and runs tests in parallel across the available CPU cores.

```bash
# Run all tests with parallelization
uv run pytest tests/ -n auto

# Run with coverage and parallelization
uv run pytest tests/ -n auto --cov=dataframe_expectations

# Run a specific test file
uv run pytest tests/test_expectations_suite.py -n auto

# Run tests matching a pattern
uv run pytest tests/ -n auto -k "test_expect_min_rows"
```

## Code Style Guidelines

### Python Style
- Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/)
- Use type hints for all function parameters and return values
- Maximum line length: 120 characters (instead of PEP 8's 79)
- Use meaningful variable and function names

### Docstrings
- Use Google-style docstrings
- Include parameter descriptions and return types
- Add usage examples for complex functions
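
To make these conventions concrete, here is a sketch of a small function written to them: type hints on every parameter and the return value, and a Google-style docstring with `Args`, `Returns`, `Raises`, and `Example` sections. The helper `count_values_below` is hypothetical and not part of dataframe-expectations.

```python
import math
from collections.abc import Sequence


def count_values_below(values: Sequence[float], threshold: float) -> int:
    """Count how many values are strictly below a threshold.

    Args:
        values: The numeric values to inspect.
        threshold: Values strictly less than this number are counted.

    Returns:
        The number of entries in ``values`` that are less than ``threshold``.

    Raises:
        ValueError: If ``threshold`` is NaN, because every comparison with NaN is False.

    Example:
        >>> count_values_below([1.0, 5.0, 10.0], threshold=6.0)
        2
    """
    if math.isnan(threshold):
        raise ValueError("threshold must not be NaN")
    # Count matches without building an intermediate list
    return sum(1 for value in values if value < threshold)
```

Running `python -m doctest` on such a module also checks that the `Example` section stays correct.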

### Code Quality
- Write clear, self-documenting code
- Add comments for complex logic
- Keep functions focused and single-purpose
- Avoid deep nesting (max 3-4 levels)

### Testing
- Maintain or improve test coverage
- Test expected behavior (happy paths) as well as error conditions and edge cases
- Use descriptive test names
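
A minimal sketch of tests that follow these guidelines: one happy path, one set of edge cases, and one error condition, each named after the behavior it checks. The `clamp` function under test is hypothetical. The tests are plain `test_`-prefixed functions with bare `assert`s, which pytest collects as-is; inside the project's suite, `pytest.raises` is the more idiomatic way to check an exception, and the try/except form here only keeps the sketch dependency-free.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Hypothetical function under test: limit ``value`` to the range [low, high]."""
    if low > high:
        raise ValueError("low must not be greater than high")
    return max(low, min(value, high))


def test_clamp_returns_value_unchanged_when_inside_range() -> None:
    # Happy path: a value already in range passes through untouched
    assert clamp(5.0, low=0.0, high=10.0) == 5.0


def test_clamp_limits_values_outside_range_to_nearest_bound() -> None:
    # Edge cases: below, exactly at, and above the bounds
    assert clamp(-3.0, low=0.0, high=10.0) == 0.0
    assert clamp(10.0, low=0.0, high=10.0) == 10.0
    assert clamp(42.0, low=0.0, high=10.0) == 10.0


def test_clamp_raises_value_error_when_low_exceeds_high() -> None:
    # Error condition: an inverted range is rejected instead of silently clamping
    try:
        clamp(1.0, low=10.0, high=0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for low > high")
```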

## Submitting a Pull Request

1. **Create a branch** and make your changes:
```bash
git checkout -b feature/your-feature-name
```

2. **Run tests:**
```bash
uv run pytest tests/ -n auto --cov=dataframe_expectations
```

3. **Commit using [Conventional Commits](https://www.conventionalcommits.org/)** (see [Versioning](#versioning-and-commits)):
```bash
git commit -m "feat: your feature description"
```

4. **Push and open a PR** with a clear description that references any related issues.

## Versioning and Commits

This project follows [Semantic Versioning](https://semver.org/) and uses [Conventional Commits](https://www.conventionalcommits.org/).

### Commit Message Format

```
<type>: <description>

[optional body]

[optional footer]
```

### Commit Types

- `feat:` - New feature → **MINOR** version bump (0.1.0 → 0.2.0)
- `fix:` - Bug fix → **PATCH** version bump (0.1.0 → 0.1.1)
- `feat!:` (or `!` after any other type) or a `BREAKING CHANGE:` footer - Breaking change → **MAJOR** version bump (0.1.0 → 1.0.0)
- `docs:` - Documentation changes (no version bump)
- `test:` - Test changes (no version bump)
- `chore:` - Maintenance tasks (no version bump)
- `refactor:` - Code refactoring (no version bump)
- `style:` - Code style changes (no version bump)
- `ci:` - CI/CD changes (no version bump)
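
The mapping from commit message to version bump can be sketched in a few lines of Python. This only illustrates the rules listed above against the Conventional Commits header format; it is not how Release Please is implemented, and `required_bump` and `bump_version` are hypothetical helper names.

```python
import re
from typing import Optional

# Conventional Commits header: type, optional "(scope)", optional "!", then ": description"
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: (?P<description>.+)$"
)

# Release-triggering types from the list above; every other type maps to no bump
BUMP_FOR_TYPE = {"feat": "minor", "fix": "patch"}


def required_bump(message: str) -> Optional[str]:
    """Return "major", "minor", "patch", or None for one commit message."""
    lines = message.strip().splitlines()
    header = lines[0] if lines else ""
    match = HEADER_PATTERN.match(header)
    if match is None:
        raise ValueError(f"not a Conventional Commits header: {header!r}")
    # A "!" after the type, or a BREAKING CHANGE footer, marks a breaking change
    has_breaking_footer = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in lines[1:]
    )
    if match.group("breaking") or has_breaking_footer:
        return "major"
    return BUMP_FOR_TYPE.get(match.group("type"))


def bump_version(version: str, bump: Optional[str]) -> str:
    """Apply a SemVer bump to a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # docs:, test:, chore:, ... leave the version unchanged
```

For example, `bump_version("0.1.0", required_bump("feat!: remove deprecated API methods"))` returns `"1.0.0"`. When a release bundles several commits, a release tool typically applies the largest bump among them.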

### Examples

```bash
# Adding a new feature
git commit -m "feat: add expect_column_sum_equals expectation"

# Fixing a bug
git commit -m "fix: correct validation logic in expect_value_greater_than"

# Breaking change
git commit -m "feat!: remove deprecated API methods"

# With body
git commit -m "feat: add tag filtering support

Allow expectations to be filtered by tags at runtime.
This enables selective execution of validation rules."

# Documentation update
git commit -m "docs: update README with new examples"
```

### What Happens Next

When your PR is merged to main:
1. [Release Please](https://github.com/googleapis/release-please) automatically creates or updates a Release PR.
2. The Release PR contains the version bump and the changelog.
3. When the Release PR is merged, a GitHub Release is created.
4. A maintainer then publishes the package to PyPI manually.

## Questions?

If you have questions or need help:
- Open an [issue](https://github.com/getyourguide/dataframe-expectations/issues)
- Review the [documentation](https://code.getyourguide.com/dataframe-expectations/)

Thank you for contributing! 🎉

README.md

Lines changed: 54 additions & 47 deletions
@@ -51,9 +51,9 @@ source .venv/bin/activate # On Windows: .venv\Scripts\activate
 uv run pytest tests/ --cov=dataframe_expectations
 ```
 
-### Using the library
+### Quick Start
 
-**Basic usage with Pandas:**
+#### Pandas Example
 ```python
 from dataframe_expectations.suite import DataFrameExpectationsSuite
 import pandas as pd
@@ -80,7 +80,7 @@ df = pd.DataFrame({
 runner.run(df)
 ```
 
-**PySpark example:**
+#### PySpark Example
 ```python
 from dataframe_expectations.suite import DataFrameExpectationsSuite
 from pyspark.sql import SparkSession
@@ -114,7 +114,22 @@ df = spark.createDataFrame(data)
 runner.run(df)
 ```
 
-**Decorator pattern for automatic validation:**
+### Validation Patterns
+
+#### Manual Validation
+Use `runner.run()` to explicitly validate DataFrames:
+
+```python
+# Run validation and raise exception on failure
+runner.run(df)
+
+# Run validation without raising exception
+result = runner.run(df, raise_on_failure=False)
+```
+
+#### Decorator-Based Validation
+Automatically validate function return values using decorators:
+
 ```python
 from dataframe_expectations.suite import DataFrameExpectationsSuite
 from pyspark.sql import SparkSession
@@ -159,7 +174,9 @@ def conditional_load(should_load: bool):
 return None # No validation when None is returned
 ```
 
-**Output:**
+##### Validation Output
+When validation runs, you'll see output like this:
+
 ```python
 ========================== Running expectations suite ==========================
 ExpectationMinRows (DataFrame contains at least 3 rows) ... OK
@@ -182,31 +199,11 @@ Some examples of violations:
 | 15 | Bob | 60000 |
 +-----+------+--------+
 ================================================================================
-
 ```
 
-**Tag-based filtering for selective execution:**
-```python
-from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
-
-# Tag expectations with priorities and environments
-suite = (
-    DataFrameExpectationsSuite()
-    .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
-    .expect_value_not_null(column_name="name", tags=["priority:high"])
-    .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
-)
-
-# Run only high-priority checks (OR logic - matches ANY tag)
-runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)
-runner.run(df)
-
-# Run production-critical checks (AND logic - matches ALL tags)
-runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
-runner.run(df)
-```
+#### Programmatic Result Inspection
+Get detailed validation results without raising exceptions:
 
-**Programmatic result inspection:**
 ```python
 # Get detailed results without raising exceptions
 result = runner.run(df, raise_on_failure=False)
@@ -223,35 +220,45 @@ for exp_result in result.results:
 print(f"Failed: {exp_result.description} - {exp_result.violation_count} violations")
 ```
 
-### How to contribute?
-Contributions are welcome! You can enhance the library by adding new expectations, refining existing ones, or improving the testing framework.
+### Advanced Features
 
-### Versioning
+#### Tag-Based Filtering
+Filter which expectations to run using tags:
 
-This project follows [Semantic Versioning](https://semver.org/) (SemVer) and uses [Release Please](https://github.com/googleapis/release-please) for automated version management.
+```python
+from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
 
-Versions are automatically determined based on [Conventional Commits](https://www.conventionalcommits.org/):
+# Tag expectations with priorities and environments
+suite = (
+    DataFrameExpectationsSuite()
+    .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
+    .expect_value_not_null(column_name="name", tags=["priority:high"])
+    .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
+)
 
-- `feat:` - New feature → **MINOR** version bump (0.1.0 → 0.2.0)
-- `fix:` - Bug fix → **PATCH** version bump (0.1.0 → 0.1.1)
-- `feat!:` or `BREAKING CHANGE:` - Breaking change → **MAJOR** version bump (0.1.0 → 1.0.0)
-- `chore:`, `docs:`, `style:`, `refactor:`, `test:`, `ci:` - No version bump
+# Run only high-priority checks (OR logic - matches ANY tag)
+runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)
+runner.run(df)
 
-**Example commits:**
-```bash
-git commit -m "feat: add new expectation for null values"
-git commit -m "fix: correct validation logic in expect_value_greater_than"
-git commit -m "feat!: remove deprecated API methods"
+# Run production-critical checks (AND logic - matches ALL tags)
+runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
+runner.run(df)
 ```
 
-When changes are pushed to the main branch, Release Please automatically:
-1. Creates or updates a Release PR with version bump and changelog
-2. When merged, creates a GitHub Release and publishes to PyPI
+## Contributing
+
+We welcome contributions! Whether you're adding new expectations, fixing bugs, or improving documentation, your help is appreciated.
+
+Please see [CONTRIBUTING.md](CONTRIBUTING.md) for:
+- Development setup instructions
+- How to add new expectations
+- Code style guidelines
+- Testing requirements
+- Pull request process
 
-No manual version updates needed - just use conventional commit messages!
+## Security
 
-### Security
-For security issues please contact [email protected].
+For security vulnerabilities, please see our [Security Policy](SECURITY.md) or contact [email protected].
 
 ### Legal
 dataframe-expectations is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE.txt) for the full text.
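
The tag-based filtering added to the README distinguishes `TagMatchMode.ANY` (OR logic) from `TagMatchMode.ALL` (AND logic). The pure-Python sketch below models only those matching semantics on the README's three example expectations, reduced to their tags. `MatchMode`, `TaggedCheck`, and `select_checks` are hypothetical names; this is not the library's API or implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class MatchMode(Enum):
    """Hypothetical stand-in for TagMatchMode."""

    ANY = "any"  # OR logic: at least one requested tag must be present
    ALL = "all"  # AND logic: every requested tag must be present


@dataclass
class TaggedCheck:
    """Hypothetical record standing in for one tagged expectation."""

    name: str
    tags: list[str] = field(default_factory=list)


def select_checks(checks: list[TaggedCheck], tags: list[str], mode: MatchMode) -> list[str]:
    """Return the names of the checks that pass the tag filter, in their original order."""
    requested = set(tags)
    selected = []
    for check in checks:
        present = set(check.tags)
        if mode is MatchMode.ANY:
            matches = bool(requested & present)  # non-empty intersection
        else:
            matches = requested <= present  # requested tags form a subset
        if matches:
            selected.append(check.name)
    return selected


# The three expectations from the README example, reduced to their tags
checks = [
    TaggedCheck("expect_value_greater_than(age, 18)", ["priority:high", "env:prod"]),
    TaggedCheck("expect_value_not_null(name)", ["priority:high"]),
    TaggedCheck("expect_min_rows(1)", ["priority:low", "env:test"]),
]

print(select_checks(checks, ["priority:high"], MatchMode.ANY))
# ['expect_value_greater_than(age, 18)', 'expect_value_not_null(name)']
print(select_checks(checks, ["priority:high", "env:prod"], MatchMode.ALL))
# ['expect_value_greater_than(age, 18)']
```

With ALL, adding a tag to the filter can only shrink the selection; with ANY, it can only grow it.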

SECURITY.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Security Policy

## How to Report

Please do not report security vulnerabilities through public GitHub issues.

Instead, please report security vulnerabilities by emailing:

## What to Include

To help us better understand and address the issue, please include:

- A description of the vulnerability
- Steps to reproduce the issue
- Potential impact of the vulnerability
- Any suggested fixes or mitigations (if available)
- Your contact information for follow-up

## Contact

For any questions about this security policy, please contact [email protected].
