
Commit 82fd85b

Merge branch 'main' into dependabot/uv/pre-commit-4.5.0
2 parents: 6fbc01f + 4c121cc


14 files changed: +786 / -718 lines


CONTRIBUTING.md

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# Contributing to DataFrameExpectations

Thank you for your interest in contributing to DataFrameExpectations! We welcome contributions from the community, whether it's adding new expectations, fixing bugs, improving documentation, or enhancing the testing framework.

## Table of Contents

- [Getting Started](#getting-started)
- [Development Setup](#development-setup)
- [How to Contribute](#how-to-contribute)
- [Adding New Expectations](#adding-new-expectations)
- [Running Tests](#running-tests)
- [Code Style Guidelines](#code-style-guidelines)
- [Submitting a Pull Request](#submitting-a-pull-request)
- [Versioning and Commits](#versioning-and-commits)

## Getting Started

Before you begin:
1. Check existing [issues](https://github.com/getyourguide/dataframe-expectations/issues) and [pull requests](https://github.com/getyourguide/dataframe-expectations/pulls) to avoid duplicates
2. For major changes, open an issue first to discuss your proposal
3. Ensure you agree with the [Apache 2.0 License](LICENSE.txt)

## Development Setup

1. **Fork and clone the repository:**
   ```bash
   # Clone your fork; replace <your-username> with your GitHub username
   git clone https://github.com/<your-username>/dataframe-expectations.git
   cd dataframe-expectations
   ```

2. **Install UV package manager:**
   ```bash
   pip install uv
   ```

3. **Install development dependencies:**
   ```bash
   # This will automatically create a virtual environment
   uv sync --group dev
   ```

4. **Activate the virtual environment:**
   ```bash
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

5. **Verify your setup:**
   ```bash
   uv run pytest tests/ -n auto --cov=dataframe_expectations
   ```

6. **(Optional) Install pre-commit hooks:**
   ```bash
   pre-commit install
   ```
   This will automatically run checks before each commit.

## How to Contribute

### Reporting Bugs
Open an [issue](https://github.com/getyourguide/dataframe-expectations/issues) with a clear description, steps to reproduce, expected vs. actual behavior, and relevant environment details (for example, your Python, pandas, and PySpark versions).

### Documentation
Fix typos, clarify docs, add examples, or improve the README.

### Features
Before implementing a new feature, open an issue to discuss it. Explain the use case and how the change affects backward compatibility.

### Adding New Expectations
See the **[Adding Expectations Guide](https://code.getyourguide.com/dataframe-expectations/adding_expectations.html)** for detailed instructions.

## Running Tests

```bash
# Run all tests in parallel (-n is provided by the pytest-xdist plugin)
uv run pytest tests/ -n auto

# Run in parallel with a coverage report (--cov is provided by pytest-cov)
uv run pytest tests/ -n auto --cov=dataframe_expectations

# Run a specific test file
uv run pytest tests/test_expectations_suite.py -n auto

# Run only tests whose names match an expression
uv run pytest tests/ -n auto -k "test_expect_min_rows"
```

## Code Style Guidelines

### Python Style
- Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/)
- Use type hints for all function parameters and return values
- Maximum line length: 120 characters (this overrides PEP 8's 79-character recommendation)
- Use meaningful variable and function names

### Docstrings
- Use Google-style docstrings
- Include parameter descriptions and return types
- Add usage examples for complex functions

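The sketch below is one way to apply these conventions. It puts type hints on every parameter and on the return value, and it uses a Google-style docstring with `Args`, `Returns`, `Raises`, and `Examples` sections. The `null_fraction` helper is hypothetical and is not part of DataFrameExpectations.

```python
from typing import Optional, Sequence


def null_fraction(values: Sequence[Optional[float]]) -> float:
    """Compute the fraction of missing (None) entries in a column.

    Hypothetical helper, used only to illustrate the docstring conventions.

    Args:
        values: Column values, where ``None`` marks a missing entry.

    Returns:
        The share of entries that are ``None``, between 0.0 and 1.0.

    Raises:
        ValueError: If ``values`` is empty.

    Examples:
        >>> null_fraction([1.0, None, 3.0, None])
        0.5
    """
    if not values:
        raise ValueError("values must not be empty")
    # Count the missing entries and divide by the total number of entries.
    missing = sum(1 for value in values if value is None)
    return missing / len(values)
```

The example is written as a doctest, so `python -m doctest -v path/to/module.py` or pytest's `--doctest-modules` option can check that it stays correct.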
### Code Quality
- Write clear, self-documenting code
- Add comments for complex logic
- Keep functions focused and single-purpose
- Avoid deep nesting (max 3-4 levels)

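A common way to stay within that limit is to replace nested `if`/`else` blocks with guard clauses that return or raise early. The minimal sketch below writes the same hypothetical row-count check both ways. The two functions behave identically, and neither is part of the library.

```python
from typing import Optional


def passes_min_rows_nested(row_count: Optional[int], min_rows: int) -> bool:
    # Before: every condition adds another level of indentation.
    if min_rows >= 0:
        if row_count is not None:
            if row_count >= min_rows:
                return True
            else:
                return False
        else:
            return False
    else:
        raise ValueError("min_rows must be non-negative")


def passes_min_rows(row_count: Optional[int], min_rows: int) -> bool:
    # After: guard clauses deal with invalid input and special cases first,
    # so the main rule sits at the top level of the function.
    if min_rows < 0:
        raise ValueError("min_rows must be non-negative")
    if row_count is None:
        return False  # no count available: treat the check as failing
    return row_count >= min_rows
```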
### Testing
- Maintain or improve test coverage
- Cover expected behavior (happy paths), error conditions, and edge cases
- Use descriptive test names

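Here is a minimal sketch of tests that follow these guidelines. They are written against a hypothetical `has_unique_values` helper, which is not part of the library. Each test name states the scenario and the expected outcome. Together the tests cover a happy path, a failing case, an edge case (empty input), and an error condition.

```python
from typing import Hashable, Sequence


def has_unique_values(values: Sequence[Hashable]) -> bool:
    """Hypothetical helper under test: return True if no value appears twice."""
    return len(set(values)) == len(values)


def test_has_unique_values_returns_true_for_distinct_values() -> None:
    assert has_unique_values([1, 2, 3])


def test_has_unique_values_returns_false_when_a_value_repeats() -> None:
    assert not has_unique_values([1, 2, 2])


def test_has_unique_values_treats_empty_input_as_unique() -> None:
    assert has_unique_values([])


def test_has_unique_values_raises_type_error_for_unhashable_values() -> None:
    try:
        has_unique_values([[1], [2]])  # lists are unhashable, so set() fails
    except TypeError:
        return
    raise AssertionError("expected TypeError for unhashable values")
```

pytest collects plain `test_*` functions like these. In a real test module, the error case would more idiomatically use `pytest.raises(TypeError)`. That is avoided here only to keep the sketch free of dependencies.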
## Submitting a Pull Request

1. **Create a branch** and make your changes:
   ```bash
   git checkout -b feature/your-feature-name
   ```

2. **Run tests:**
   ```bash
   uv run pytest tests/ -n auto --cov=dataframe_expectations
   ```

3. **Commit using [Conventional Commits](https://www.conventionalcommits.org/)** (see [Versioning](#versioning-and-commits)):
   ```bash
   git commit -m "feat: your feature description"
   ```

4. **Push and open a PR** with a clear description referencing any related issues

## Versioning and Commits

This project follows [Semantic Versioning](https://semver.org/) and uses [Conventional Commits](https://www.conventionalcommits.org/).

### Commit Message Format

```
<type>: <description>

[optional body]

[optional footer]
```

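You can check this format mechanically before pushing, for instance from a local Git `commit-msg` hook. The following is a minimal sketch, not a tool this project ships, and `check_commit_message` is a hypothetical name. It makes these assumptions:

- Only the types listed in the Commit Types section are accepted.
- The optional `!` marker and an optional `(scope)` from the Conventional Commits specification are allowed.
- A blank line must separate the header from the body.

```python
import re
from typing import List

# Types listed in the Commit Types section of this guide.
ALLOWED_TYPES = {"feat", "fix", "docs", "test", "chore", "refactor", "style", "ci"}

# <type>[(scope)][!]: <description>, where the description must not start with whitespace.
HEADER_PATTERN = re.compile(r"^(?P<type>[a-z]+)(?:\([\w\-./]+\))?(?P<breaking>!)?: (?P<description>\S.*)$")


def check_commit_message(message: str) -> List[str]:
    """Return the problems found in a commit message; an empty list means it looks valid."""
    lines = message.strip("\n").split("\n")
    problems: List[str] = []

    match = HEADER_PATTERN.match(lines[0])
    if match is None:
        problems.append("header must look like '<type>: <description>'")
    elif match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")

    # An optional body or footer must be separated from the header by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the header and the body")

    return problems
```

Git passes a `commit-msg` hook the path of the file that holds the proposed message. A hook script could read that file, call this function, and exit with a non-zero status if the returned list is not empty.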
### Commit Types

- `feat:` - New feature → **MINOR** version bump (0.1.0 → 0.2.0)
- `fix:` - Bug fix → **PATCH** version bump (0.1.0 → 0.1.1)
- `feat!:` or a `BREAKING CHANGE:` footer - Breaking change → **MAJOR** version bump (0.1.0 → 1.0.0)
- `docs:` - Documentation changes (no version bump)
- `test:` - Test changes (no version bump)
- `chore:` - Maintenance tasks (no version bump)
- `refactor:` - Code refactoring (no version bump)
- `style:` - Code style changes (no version bump)
- `ci:` - CI/CD changes (no version bump)

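To make the mapping concrete, here is a minimal sketch of how commit messages translate into the next version under these rules. It is an illustration only, not Release Please's actual implementation, and `bump_for` and `next_version` are hypothetical names. It follows the Conventional Commits specification, in which `BREAKING CHANGE:` appears as a footer, and it applies the largest bump found among all commits in a release. It ignores any tool-specific configuration, such as special handling of pre-1.0 versions.

```python
import re
from typing import Iterable, Optional

# Header: <type>[(scope)][!]: <description>
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?(?P<breaking>!)?: ")
# Ordering used to pick the largest bump across several commits.
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None for a single commit message."""
    match = HEADER.match(message)
    if match is None:
        return None
    # A '!' after the type or a BREAKING CHANGE footer marks a breaking change.
    footer_breaking = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in message.splitlines()[1:]
    )
    if match.group("breaking") or footer_breaking:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return None  # docs, test, chore, refactor, style, ci: no release


def next_version(current: str, messages: Iterable[str]) -> str:
    """Apply the largest bump among the commits to a MAJOR.MINOR.PATCH version string."""
    bump = max((bump_for(m) for m in messages), key=lambda b: RANK[b], default=None)
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```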
### Examples

```bash
# Adding a new feature
git commit -m "feat: add expect_column_sum_equals expectation"

# Fixing a bug
git commit -m "fix: correct validation logic in expect_value_greater_than"

# Breaking change
git commit -m "feat!: remove deprecated API methods"

# With body
git commit -m "feat: add tag filtering support

Allow expectations to be filtered by tags at runtime.
This enables selective execution of validation rules."

# Documentation update
git commit -m "docs: update README with new examples"
```

### What Happens Next

When your PR is merged into `main`:
1. [Release Please](https://github.com/googleapis/release-please) automatically creates or updates a Release PR
2. The Release PR contains the version bump and the changelog entries
3. When the Release PR is merged, a GitHub Release is created
4. A maintainer manually publishes the package to PyPI

## Questions?

If you have questions or need help:
- Open an [issue](https://github.com/getyourguide/dataframe-expectations/issues)
- Review the [documentation](https://code.getyourguide.com/dataframe-expectations/)

Thank you for contributing! 🎉

README.md

Lines changed: 74 additions & 63 deletions
@@ -29,31 +29,9 @@ pip install dataframe-expectations
2929
* pyspark >= 3.3.0
3030
* tabulate >= 0.8.9
3131

32-
### Development setup
32+
### Quick Start
3333

34-
To set up the development environment:
35-
36-
```bash
37-
# 1. Clone the repository
38-
git clone https://github.com/getyourguide/dataframe-expectations.git
39-
cd dataframe-expectations
40-
41-
# 2. Install UV package manager
42-
pip install uv
43-
44-
# 3. Install development dependencies (this will automatically create a virtual environment)
45-
uv sync --group dev
46-
47-
# 4. (Optional) To explicitly activate the virtual environment:
48-
source .venv/bin/activate # On Windows: .venv\Scripts\activate
49-
50-
# 5. Run tests (this will run the tests in the virtual environment)
51-
uv run pytest tests/ --cov=dataframe_expectations
52-
```
53-
54-
### Using the library
55-
56-
**Basic usage with Pandas:**
34+
#### Pandas Example
5735
```python
5836
from dataframe_expectations.suite import DataFrameExpectationsSuite
5937
import pandas as pd
@@ -80,7 +58,7 @@ df = pd.DataFrame({
8058
runner.run(df)
8159
```
8260

83-
**PySpark example:**
61+
#### PySpark Example
8462
```python
8563
from dataframe_expectations.suite import DataFrameExpectationsSuite
8664
from pyspark.sql import SparkSession
@@ -114,7 +92,22 @@ df = spark.createDataFrame(data)
11492
runner.run(df)
11593
```
11694

117-
**Decorator pattern for automatic validation:**
95+
### Validation Patterns
96+
97+
#### Manual Validation
98+
Use `runner.run()` to explicitly validate DataFrames:
99+
100+
```python
101+
# Run validation and raise exception on failure
102+
runner.run(df)
103+
104+
# Run validation without raising exception
105+
result = runner.run(df, raise_on_failure=False)
106+
```
107+
108+
#### Decorator-Based Validation
109+
Automatically validate function return values using decorators:
110+
118111
```python
119112
from dataframe_expectations.suite import DataFrameExpectationsSuite
120113
from pyspark.sql import SparkSession
@@ -159,7 +152,9 @@ def conditional_load(should_load: bool):
159152
return None # No validation when None is returned
160153
```
161154

162-
**Output:**
155+
##### Validation Output
156+
When validation runs, you'll see output like this:
157+
163158
```python
164159
========================== Running expectations suite ==========================
165160
ExpectationMinRows (DataFrame contains at least 3 rows) ... OK
@@ -182,10 +177,32 @@ Some examples of violations:
182177
| 15 | Bob | 60000 |
183178
+-----+------+--------+
184179
================================================================================
180+
```
181+
182+
#### Programmatic Result Inspection
183+
Get detailed validation results without raising exceptions:
185184

185+
```python
186+
# Get detailed results without raising exceptions
187+
result = runner.run(df, raise_on_failure=False)
188+
189+
# Inspect validation outcomes
190+
print(f"Total: {result.total_expectations}, Passed: {result.total_passed}, Failed: {result.total_failed}")
191+
print(f"Pass rate: {result.pass_rate:.2%}")
192+
print(f"Duration: {result.total_duration_seconds:.2f}s")
193+
print(f"Applied filters: {result.applied_filters}")
194+
195+
# Access individual results
196+
for exp_result in result.results:
197+
    if exp_result.status == "failed":
198+
        print(f"Failed: {exp_result.description} - {exp_result.violation_count} violations")
186199
```
187200

188-
**Tag-based filtering for selective execution:**
201+
### Advanced Features
202+
203+
#### Tag-Based Filtering
204+
Filter which expectations to run using tags:
205+
189206
```python
190207
from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
191208

@@ -206,52 +223,46 @@ runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatch
206223
runner.run(df)
207224
```
208225

209-
**Programmatic result inspection:**
210-
```python
211-
# Get detailed results without raising exceptions
212-
result = runner.run(df, raise_on_failure=False)
213-
214-
# Inspect validation outcomes
215-
print(f"Total: {result.total_expectations}, Passed: {result.total_passed}, Failed: {result.total_failed}")
216-
print(f"Pass rate: {result.pass_rate:.2%}")
217-
print(f"Duration: {result.total_duration_seconds:.2f}s")
218-
print(f"Applied filters: {result.applied_filters}")
226+
## Development Setup
219227

220-
# Access individual results
221-
for exp_result in result.results:
222-
    if exp_result.status == "failed":
223-
        print(f"Failed: {exp_result.description} - {exp_result.violation_count} violations")
224-
```
228+
To set up the development environment:
225229

226-
### How to contribute?
227-
Contributions are welcome! You can enhance the library by adding new expectations, refining existing ones, or improving the testing framework.
230+
```bash
231+
# 1. Fork and clone the repository
232+
git clone https://github.com/getyourguide/dataframe-expectations.git
233+
cd dataframe-expectations
228234

229-
### Versioning
235+
# 2. Install UV package manager
236+
pip install uv
230237

231-
This project follows [Semantic Versioning](https://semver.org/) (SemVer) and uses [Release Please](https://github.com/googleapis/release-please) for automated version management.
238+
# 3. Install development dependencies (this will automatically create a virtual environment)
239+
uv sync --group dev
232240

233-
Versions are automatically determined based on [Conventional Commits](https://www.conventionalcommits.org/):
241+
# 4. Activate the virtual environment
242+
source .venv/bin/activate # On Windows: .venv\Scripts\activate
234243

235-
- `feat:` - New feature → **MINOR** version bump (0.1.0 → 0.2.0)
236-
- `fix:` - Bug fix → **PATCH** version bump (0.1.0 → 0.1.1)
237-
- `feat!:` or `BREAKING CHANGE:` - Breaking change → **MAJOR** version bump (0.1.0 → 1.0.0)
238-
- `chore:`, `docs:`, `style:`, `refactor:`, `test:`, `ci:` - No version bump
244+
# 5. Verify your setup
245+
uv run pytest tests/ -n auto --cov=dataframe_expectations
239246

240-
**Example commits:**
241-
```bash
242-
git commit -m "feat: add new expectation for null values"
243-
git commit -m "fix: correct validation logic in expect_value_greater_than"
244-
git commit -m "feat!: remove deprecated API methods"
247+
# 6. Install pre-commit hooks
248+
pre-commit install
249+
# This will automatically run checks before each commit
245250
```
246251

247-
When changes are pushed to the main branch, Release Please automatically:
248-
1. Creates or updates a Release PR with version bump and changelog
249-
2. When merged, creates a GitHub Release and publishes to PyPI
252+
## Contributing
253+
254+
We welcome contributions! Whether you're adding new expectations, fixing bugs, or improving documentation, your help is appreciated.
255+
256+
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for:
257+
- Development setup instructions
258+
- How to add new expectations
259+
- Code style guidelines
260+
- Testing requirements
261+
- Pull request process
250262

251-
No manual version updates needed - just use conventional commit messages!
263+
## Security
252264

253-
### Security
254-
For security issues please contact [email protected].
265+
For security vulnerabilities, please see our [Security Policy](SECURITY.md) or contact [email protected].
255266

256267
### Legal
257268
dataframe-expectations is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE.txt) for the full text.
