Merge pull request #33 from getyourguide/docs/split-readme-and-restructure

ryanseq-gyg · web-flow · commit 8cf59b15bfa7 · 2025-11-23T15:50:51.000+01:00
docs: partitioned readme
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,194 @@
+# Contributing to DataFrameExpectations
+
+Thank you for your interest in contributing to DataFrameExpectations! We welcome contributions from the community, whether it's adding new expectations, fixing bugs, improving documentation, or enhancing the testing framework.
+
+## Table of Contents
+
+- [Getting Started](#getting-started)
+- [Development Setup](#development-setup)
+- [How to Contribute](#how-to-contribute)
+- [Adding Expectations](#adding-expectations)
+- [Running Tests](#running-tests)
+- [Code Style Guidelines](#code-style-guidelines)
+- [Submitting a Pull Request](#submitting-a-pull-request)
+- [Versioning and Commits](#versioning-and-commits)
+
+## Getting Started
+
+Before you begin:
+1. Check existing [issues](https://github.com/getyourguide/dataframe-expectations/issues) and [pull requests](https://github.com/getyourguide/dataframe-expectations/pulls) to avoid duplicates
+2. For major changes, open an issue first to discuss your proposal
+3. Ensure you agree with the [Apache 2.0 License](LICENSE.txt)
+
+## Development Setup
+
+1. **Fork and clone the repository:**
+   ```bash
+   git clone https://github.com/<your-username>/dataframe-expectations.git
+   cd dataframe-expectations
+   ```
+
+2. **Install UV package manager:**
+   ```bash
+   pip install uv
+   ```
+
+3. **Install development dependencies:**
+   ```bash
+   # This will automatically create a virtual environment
+   uv sync --group dev
+   ```
+
+4. **Activate the virtual environment:**
+   ```bash
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+
+5. **Verify your setup:**
+   ```bash
+   uv run pytest tests/ -n auto --cov=dataframe_expectations
+   ```
+
+6. **(Optional) Install pre-commit hooks:**
+   ```bash
+   pre-commit install
+   ```
+   This will automatically run checks before each commit.
+
+## How to Contribute
+
+### Reporting Bugs
+Open an [issue](https://github.com/getyourguide/dataframe-expectations/issues) with a clear description, steps to reproduce, expected vs. actual behavior, and relevant environment details.
+
+### Documentation
+Fix typos, clarify docs, add examples, or improve the README.
+
+### Features
+Open an issue first to discuss new features, explain the use case, and consider backward compatibility.
+
+### Adding Expectations
+See the **[Adding Expectations Guide](https://code.getyourguide.com/dataframe-expectations/adding_expectations.html)** for detailed instructions.
+
+
+## Running Tests
+
+```bash
+# Run all tests with parallelization
+uv run pytest tests/ -n auto
+
+# Run with coverage and parallelization
+uv run pytest tests/ -n auto --cov=dataframe_expectations
+
+# Run specific test file
+uv run pytest tests/test_expectations_suite.py -n auto
+
+# Run tests matching a pattern
+uv run pytest tests/ -n auto -k "test_expect_min_rows"
+```
+
+## Code Style Guidelines
+
+### Python Style
+- Follow [PEP 8](https://peps.python.org/pep-0008/), with the line-length exception below
+- Use type hints for all function parameters and return values
+- Maximum line length: 120 characters (PEP 8's default is 79)
+- Use meaningful variable and function names
+
+### Docstrings
+- Use Google-style docstrings
+- Include parameter descriptions and return types
+- Add usage examples for complex functions
+
+### Code Quality
+- Write clear, self-documenting code
+- Add comments for complex logic
+- Keep functions focused and single-purpose
+- Avoid deep nesting (max 3-4 levels)
+
+### Testing
+- Maintain or improve test coverage
+- Test expected behavior (happy paths), edge cases, and error conditions
+- Use descriptive test names
+
+## Submitting a Pull Request
+
+1. **Create a branch** and make your changes
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+
+2. **Run tests:**
+   ```bash
+   uv run pytest tests/ -n auto --cov=dataframe_expectations
+   ```
+
+3. **Commit using [Conventional Commits](https://www.conventionalcommits.org/)** (see [Versioning](#versioning-and-commits))
+   ```bash
+   git commit -m "feat: your feature description"
+   ```
+
+4. **Push and open a PR** with a clear description referencing any related issues
+
+## Versioning and Commits
+
+This project follows [Semantic Versioning](https://semver.org/) and uses [Conventional Commits](https://www.conventionalcommits.org/).
+
+### Commit Message Format
+
+```
+<type>: <description>
+
+[optional body]
+
+[optional footer]
+```
+
+### Commit Types
+
+- `feat:` - New feature → **MINOR** version bump (0.1.0 → 0.2.0)
+- `fix:` - Bug fix → **PATCH** version bump (0.1.0 → 0.1.1)
+- `feat!:` (any type followed by `!`) or a `BREAKING CHANGE:` footer - Breaking change → **MAJOR** version bump (0.1.0 → 1.0.0)
+- `docs:` - Documentation changes (no version bump)
+- `test:` - Test changes (no version bump)
+- `chore:` - Maintenance tasks (no version bump)
+- `refactor:` - Code refactoring (no version bump)
+- `style:` - Code style changes (no version bump)
+- `ci:` - CI/CD changes (no version bump)
+
+### Examples
+
+```bash
+# Adding a new feature
+git commit -m "feat: add expect_column_sum_equals expectation"
+
+# Fixing a bug
+git commit -m "fix: correct validation logic in expect_value_greater_than"
+
+# Breaking change
+git commit -m "feat!: remove deprecated API methods"
+
+# With body
+git commit -m "feat: add tag filtering support
+
+Allow expectations to be filtered by tags at runtime.
+This enables selective execution of validation rules."
+
+# Documentation update
+git commit -m "docs: update README with new examples"
+```
+
+### What Happens Next
+
+When your PR is merged to main:
+1. [Release Please](https://github.com/googleapis/release-please) automatically creates or updates a Release PR
+2. The Release PR includes the version bump and changelog
+3. When the Release PR is merged, a GitHub Release is created
+4. The maintainer manually publishes the package to PyPI
+
+## Questions?
+
+If you have questions or need help:
+- Open an [issue](https://github.com/getyourguide/dataframe-expectations/issues)
+- Review the [documentation](https://code.getyourguide.com/dataframe-expectations/)
+
+Thank you for contributing! 🎉
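The commit types listed in CONTRIBUTING.md above map onto version bumps mechanically. The following is a minimal Python sketch of that mapping, written with type hints and Google-style docstrings as the style guidelines ask. `bump_for_commit` and `apply_bump` are hypothetical helpers for illustration only; they are not part of dataframe-expectations and do not reproduce how Release Please is implemented or configured.

```python
import re
from typing import Optional

# Types that the guide says trigger a release; every other type implies no bump.
BUMP_BY_TYPE = {"feat": "minor", "fix": "patch"}

# Header shape from the guide, "<type>: <description>", plus the optional
# "(scope)" and "!" marker that the Conventional Commits spec allows.
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(\([^)]+\))?(?P<breaking>!)?: (?P<desc>.+)$")


def bump_for_commit(message: str) -> Optional[str]:
    """Return the version bump implied by a commit message.

    Args:
        message: Full commit message (header, optional body and footer).

    Returns:
        "major", "minor", "patch", or None when no release is implied.

    Raises:
        ValueError: If the first line is not a Conventional Commits header.
    """
    lines = message.splitlines()
    match = HEADER_RE.match(lines[0]) if lines else None
    if match is None:
        raise ValueError(f"not a conventional commit header: {message!r}")
    # A "!" in the header or a BREAKING CHANGE footer means a major bump.
    has_footer = any(line.startswith("BREAKING CHANGE:") for line in lines[1:])
    if match["breaking"] or has_footer:
        return "major"
    return BUMP_BY_TYPE.get(match["type"])


def apply_bump(version: str, bump: Optional[str]) -> str:
    """Apply a bump to a MAJOR.MINOR.PATCH version string.

    Args:
        version: Current version, for example "0.1.0".
        bump: "major", "minor", "patch", or None for no change.

    Returns:
        The bumped version string.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

For example, `apply_bump("0.1.0", bump_for_commit("feat: add tag filtering support"))` follows the `feat:` row of the list, and a `docs:` commit leaves the version unchanged.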
diff --git a/README.md b/README.md
@@ -51,9 +51,9 @@ source .venv/bin/activate  # On Windows: .venv\Scripts\activate
 uv run pytest tests/ --cov=dataframe_expectations
 ```
 
-### Using the library
+### Quick Start
 
-**Basic usage with Pandas:**
+#### Pandas Example
 ```python
 from dataframe_expectations.suite import DataFrameExpectationsSuite
 import pandas as pd
@@ -80,7 +80,7 @@ df = pd.DataFrame({
 runner.run(df)
 ```
 
-**PySpark example:**
+#### PySpark Example
 ```python
 from dataframe_expectations.suite import DataFrameExpectationsSuite
 from pyspark.sql import SparkSession
@@ -114,7 +114,22 @@ df = spark.createDataFrame(data)
 runner.run(df)
 ```
 
-**Decorator pattern for automatic validation:**
+### Validation Patterns
+
+#### Manual Validation
+Use `runner.run()` to explicitly validate DataFrames:
+
+```python
+# Run validation and raise an exception on failure
+runner.run(df)
+
+# Run validation without raising an exception
+result = runner.run(df, raise_on_failure=False)
+```
+
+#### Decorator-Based Validation
+Automatically validate function return values using decorators:
+
 ```python
 from dataframe_expectations.suite import DataFrameExpectationsSuite
 from pyspark.sql import SparkSession
@@ -159,7 +174,9 @@ def conditional_load(should_load: bool):
     return None  # No validation when None is returned
 ```
 
-**Output:**
+##### Validation Output
+When validation runs, you'll see output like this:
+
 ```python
 ========================== Running expectations suite ==========================
 ExpectationMinRows (DataFrame contains at least 3 rows) ... OK
@@ -182,31 +199,11 @@ Some examples of violations:
 | 15  | Bob  | 60000  |
 +-----+------+--------+
 ================================================================================
-
 ```
 
-**Tag-based filtering for selective execution:**
-```python
-from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
-
-# Tag expectations with priorities and environments
-suite = (
-    DataFrameExpectationsSuite()
-    .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
-    .expect_value_not_null(column_name="name", tags=["priority:high"])
-    .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
-)
-
-# Run only high-priority checks (OR logic - matches ANY tag)
-runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)
-runner.run(df)
-
-# Run production-critical checks (AND logic - matches ALL tags)
-runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
-runner.run(df)
-```
+#### Programmatic Result Inspection
+Pass `raise_on_failure=False` to get a result object you can inspect instead of an exception:
 
-**Programmatic result inspection:**
 ```python
 # Get detailed results without raising exceptions
 result = runner.run(df, raise_on_failure=False)
@@ -223,35 +220,45 @@ for exp_result in result.results:
         print(f"Failed: {exp_result.description} - {exp_result.violation_count} violations")
 ```
 
-### How to contribute?
-Contributions are welcome! You can enhance the library by adding new expectations, refining existing ones, or improving the testing framework.
+### Advanced Features
 
-### Versioning
+#### Tag-Based Filtering
+Filter which expectations to run using tags:
 
-This project follows [Semantic Versioning](https://semver.org/) (SemVer) and uses [Release Please](https://github.com/googleapis/release-please) for automated version management.
+```python
+from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
 
-Versions are automatically determined based on [Conventional Commits](https://www.conventionalcommits.org/):
+# Tag expectations with priorities and environments
+suite = (
+    DataFrameExpectationsSuite()
+    .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
+    .expect_value_not_null(column_name="name", tags=["priority:high"])
+    .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
+)
 
-- `feat:` - New feature → **MINOR** version bump (0.1.0 → 0.2.0)
-- `fix:` - Bug fix → **PATCH** version bump (0.1.0 → 0.1.1)
-- `feat!:` or `BREAKING CHANGE:` - Breaking change → **MAJOR** version bump (0.1.0 → 1.0.0)
-- `chore:`, `docs:`, `style:`, `refactor:`, `test:`, `ci:` - No version bump
+# Run only high-priority checks (OR logic - matches ANY tag)
+runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)
+runner.run(df)
 
-**Example commits:**
-```bash
-git commit -m "feat: add new expectation for null values"
-git commit -m "fix: correct validation logic in expect_value_greater_than"
-git commit -m "feat!: remove deprecated API methods"
+# Run production-critical checks (AND logic - matches ALL tags)
+runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
+runner.run(df)
 ```
 
-When changes are pushed to the main branch, Release Please automatically:
-1. Creates or updates a Release PR with version bump and changelog
-2. When merged, creates a GitHub Release and publishes to PyPI
+## Contributing
+
+We welcome contributions! Whether you're adding new expectations, fixing bugs, or improving documentation, your help is appreciated.
+
+Please see [CONTRIBUTING.md](CONTRIBUTING.md) for:
+- Development setup instructions
+- How to add new expectations
+- Code style guidelines
+- Testing requirements
+- Pull request process
 
-No manual version updates needed - just use conventional commit messages!
+## Security
 
-### Security
-For security issues please contact security@getyourguide.com.
+For security vulnerabilities, please see our [Security Policy](SECURITY.md) or contact security@getyourguide.com.
 
 ### Legal
 dataframe-expectations is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE.txt) for the full text.
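The README's Tag-Based Filtering section above describes `TagMatchMode.ANY` as OR logic and `TagMatchMode.ALL` as AND logic. The difference can be sketched with plain set operations. This is an illustration of the semantics only, not the library's implementation: `MatchMode`, `is_selected`, and `select` are hypothetical names, and treating an empty filter as "run everything" is an assumption.

```python
from enum import Enum
from typing import List, Set, Tuple


class MatchMode(Enum):
    """Hypothetical stand-in for the library's TagMatchMode."""

    ANY = "any"
    ALL = "all"


def is_selected(expectation_tags: Set[str], requested: Set[str], mode: MatchMode) -> bool:
    """Decide whether an expectation with the given tags should run."""
    if not requested:
        # Assumption: no requested tags means no filtering at all.
        return True
    if mode is MatchMode.ANY:
        # OR logic: at least one requested tag is present.
        return bool(expectation_tags & requested)
    # AND logic: every requested tag is present.
    return requested <= expectation_tags


def select(
    expectations: List[Tuple[str, Set[str]]], requested: Set[str], mode: MatchMode
) -> List[str]:
    """Return the names of the expectations that pass the tag filter."""
    return [name for name, tags in expectations if is_selected(tags, requested, mode)]


# The three tagged expectations from the README example.
EXPECTATIONS = [
    ("expect_value_greater_than", {"priority:high", "env:prod"}),
    ("expect_value_not_null", {"priority:high"}),
    ("expect_min_rows", {"priority:low", "env:test"}),
]

high_priority = select(EXPECTATIONS, {"priority:high"}, MatchMode.ANY)
prod_critical = select(EXPECTATIONS, {"priority:high", "env:prod"}, MatchMode.ALL)
```

Under these assumptions, `high_priority` keeps the two expectations tagged `priority:high`, while `prod_critical` keeps only the one that carries both `priority:high` and `env:prod`.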
diff --git a/SECURITY.md b/SECURITY.md
@@ -0,0 +1,23 @@
+# Security Policy
+
+## How to Report
+
+Please do not report security vulnerabilities through public GitHub issues.
+
+Instead, please report security vulnerabilities by emailing:
+
+security@getyourguide.com
+
+## What to Include
+
+To help us better understand and address the issue, please include:
+
+- A description of the vulnerability
+- Steps to reproduce the issue
+- Potential impact of the vulnerability
+- Any suggested fixes or mitigations (if available)
+- Your contact information for follow-up
+
+## Contact
+
+For any questions about this security policy, please contact security@getyourguide.com.
diff --git a/docs/source/getting_started.rst b/docs/source/getting_started.rst