Commit 6a08bd9

docs: updated documentation
1 parent a19126a commit 6a08bd9

3 files changed (+101, -7 lines)


README.md

Lines changed: 38 additions & 0 deletions
@@ -185,6 +185,44 @@ Some examples of violations:
 
 ```
 
+**Tag-based filtering for selective execution:**
+```python
+from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
+
+# Tag expectations with priorities and environments
+suite = (
+    DataFrameExpectationsSuite()
+    .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
+    .expect_value_not_null(column_name="name", tags=["priority:high"])
+    .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
+)
+
+# Run only high-priority checks (OR logic - matches ANY tag)
+runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)
+runner.run(df)
+
+# Run production-critical checks (AND logic - matches ALL tags)
+runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
+runner.run(df)
+```
+
+**Programmatic result inspection:**
+```python
+# Get detailed results without raising exceptions
+result = runner.run(df, raise_on_failure=False)
+
+# Inspect validation outcomes
+print(f"Total: {result.total_expectations}, Passed: {result.total_passed}, Failed: {result.total_failed}")
+print(f"Pass rate: {result.pass_rate:.2%}")
+print(f"Duration: {result.total_duration_seconds:.2f}s")
+print(f"Applied filters: {result.applied_filters}")
+
+# Access individual results
+for exp_result in result.results:
+    if exp_result.status == "failed":
+        print(f"Failed: {exp_result.description} - {exp_result.violation_count} violations")
+```
+
 ### How to contribute?
 Contributions are welcome! You can enhance the library by adding new expectations, refining existing ones, or improving the testing framework.
 
docs/source/adding_expectations.rst

Lines changed: 17 additions & 7 deletions
@@ -64,6 +64,7 @@ Once you have decided where the expectation needs to be added, you can define it
 def create_expectation_is_divisible(**kwargs) -> DataFrameColumnExpectation:
     column_name = kwargs["column_name"]
     value = kwargs["value"]
+    tags = kwargs.get("tags")
 
     return DataFrameColumnExpectation(
         expectation_name="ExpectIsDivisible",
@@ -72,6 +73,7 @@ Once you have decided where the expectation needs to be added, you can define it
         fn_violations_pyspark=lambda df: df.filter(F.col(column_name) % value != 0),  # function that finds violations
         description=f"'{column_name}' divisible by {value}",
         error_message=f"'{column_name}' not divisible by {value}.",
+        tags=tags,
     )
 
 For additional guidance, you can refer to the implementation of ``ExpectationValueGreaterThan`` and
@@ -84,7 +86,6 @@ The ``@register_expectation`` decorator is required and has the following mandat
 - ``category``: Use ``ExpectationCategory.COLUMN`` or ``ExpectationCategory.AGGREGATION``
 - ``subcategory``: Choose from ``ExpectationSubcategory.NUMERICAL``, ``ExpectationSubcategory.STRING``, or ``ExpectationSubcategory.ANY_VALUE``
 - ``pydoc``: A brief description of what the expectation does
-- ``params``: List of parameter names (e.g., ["column_name", "value"])
 - ``params_doc``: Dictionary mapping parameter names to their descriptions
 - ``param_types``: Dictionary mapping parameter names to their Python types
 
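For orientation, here is a minimal sketch of how the remaining mandatory parameters could look on the divisibility example above, assuming the decorator accepts them as keyword arguments; the ``pydoc`` wording, ``params_doc`` text, and import path are illustrative, not taken from this commit:

```python
# Minimal sketch, assuming this import path and decorator signature;
# the pydoc/params_doc wording below is illustrative only.
from dataframe_expectations import (
    DataFrameColumnExpectation,
    ExpectationCategory,
    ExpectationSubcategory,
    register_expectation,
)

@register_expectation(
    category=ExpectationCategory.COLUMN,
    subcategory=ExpectationSubcategory.NUMERICAL,
    pydoc="Expect column values to be divisible by a given value.",
    params_doc={
        "column_name": "Column whose values are checked",
        "value": "Divisor each value must be divisible by",
    },
    param_types={"column_name": str, "value": int},
)
def create_expectation_is_divisible(**kwargs) -> DataFrameColumnExpectation:
    ...  # body as in the hunk above
```

Note that, as of this commit, no ``params`` list is passed; the parameter names are carried by ``params_doc`` and ``param_types``.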

@@ -132,14 +133,15 @@ Here's an example of how to implement an aggregation-based expectation:
     Expectation that validates the DataFrame has at least a minimum number of rows.
     """
 
-    def __init__(self, min_count: int):
+    def __init__(self, min_count: int, tags: Optional[List[str]] = None):
         description = f"DataFrame has at least {min_count} row(s)"
         self.min_count = min_count
 
         super().__init__(
             expectation_name="ExpectationMinRows",
             column_names=[],  # Empty list since this operates on entire DataFrame
             description=description,
+            tags=tags,
         )
 
     def aggregate_and_validate_pandas(
@@ -198,7 +200,6 @@ Here's an example of how to implement an aggregation-based expectation:
     category=ExpectationCategory.AGGREGATION,
     subcategory=ExpectationSubcategory.ANY_VALUE,
     pydoc="Expect DataFrame to have at least a minimum number of rows.",
-    params=["min_count"],
     params_doc={"min_count": "Minimum required number of rows"},
     param_types={"min_count": int}
 )
@@ -213,7 +214,7 @@ Here's an example of how to implement an aggregation-based expectation:
     Returns:
         ExpectationMinRows: A configured expectation instance.
     """
-    return ExpectationMinRows(min_count=kwargs["min_count"])
+    return ExpectationMinRows(min_count=kwargs["min_count"], tags=kwargs.get("tags"))
 
 Key differences for aggregation-based expectations:
 
@@ -236,7 +237,7 @@ Example of a column-based aggregation expectation:
     Expectation that validates the mean value of a column falls within a specified range.
     """
 
-    def __init__(self, column_name: str, min_value: float, max_value: float):
+    def __init__(self, column_name: str, min_value: float, max_value: float, tags: Optional[List[str]] = None):
         description = f"column '{column_name}' mean value between {min_value} and {max_value}"
 
         self.column_name = column_name
@@ -247,6 +248,7 @@ Example of a column-based aggregation expectation:
             expectation_name="ExpectationColumnMeanBetween",
             column_names=[column_name],  # List of columns this expectation requires
             description=description,
+            tags=tags,
         )
 
     def aggregate_and_validate_pandas(
@@ -415,6 +417,13 @@ The method names are automatically derived by:
 
 No manual integration is required! Simply register your expectation and it will be available in the suite.
 
+**Note for Expectation Authors:**
+
+Your expectations automatically support tagging without any additional implementation. The tagging functionality is
+handled by the ``DataFrameExpectation`` base class and the suite builder. Users simply pass the ``tags`` parameter
+when adding expectations to their suite. See the Getting Started guide for details on how users can leverage tags
+for selective execution.
+
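To make the note above concrete, a small hedged sketch: assuming the divisibility example from earlier yields a derived suite method named ``expect_is_divisible`` (the name is an assumption based on the derivation rule above), tags pass straight through with no extra authoring work:

```python
# Hypothetical usage; "expect_is_divisible" is assumed to be the method name
# derived from the ExpectIsDivisible example. Tag support comes from the base class.
from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode

suite = DataFrameExpectationsSuite().expect_is_divisible(
    column_name="amount", value=5, tags=["priority:low"]
)
runner = suite.build(tags=["priority:low"], tag_match_mode=TagMatchMode.ANY)
```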
 Generating Type Stubs for IDE Support
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -424,8 +433,9 @@ To provide IDE autocomplete and type hints for all expect methods, run the stub
 
     uv run python scripts/generate_suite_stubs.py
 
-This creates ``suite.pyi`` with type hints for all registered expectations. The stub file is automatically
-validated by the sanity check script and pre-commit hooks.
+This creates ``suite.pyi`` with type hints for all registered expectations. The stub generator automatically adds
+the ``tags`` parameter to all expectation method signatures with appropriate documentation, so you don't need to
+include it in your ``params_doc``. The stub file is automatically validated by the sanity check script and pre-commit hooks.
 
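For a rough sense of what the generated stubs provide, here is a hypothetical ``suite.pyi`` fragment for the min-rows example on this page; the generator's actual output format is not shown in this commit, so the shape below is an assumption:

```python
# Hypothetical suite.pyi fragment (illustrative only; actual generator output may differ).
from typing import List, Optional

class DataFrameExpectationsSuite:
    def expect_min_rows(
        self,
        min_count: int,
        tags: Optional[List[str]] = None,  # injected automatically by the stub generator
    ) -> "DataFrameExpectationsSuite": ...
```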
 Adding Unit Tests
 -----------------

docs/source/getting_started.rst

Lines changed: 46 additions & 0 deletions
@@ -168,6 +168,52 @@ When validations fail, you'll see detailed output like this:
 +-----+------+--------+
 ================================================================================
 
+Tag-Based Filtering for Selective Execution
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can tag expectations and selectively run them based on priority, environment, or custom categories:
+
+.. code-block:: python
+
+    from dataframe_expectations import DataFrameExpectationsSuite, TagMatchMode
+
+    # Tag expectations with priorities and environments
+    suite = (
+        DataFrameExpectationsSuite()
+        .expect_value_greater_than(column_name="age", value=18, tags=["priority:high", "env:prod"])
+        .expect_value_not_null(column_name="name", tags=["priority:high"])
+        .expect_min_rows(min_rows=1, tags=["priority:low", "env:test"])
+    )
+
+    # Run only high-priority checks (OR logic - matches ANY tag)
+    runner = suite.build(tags=["priority:high"], tag_match_mode=TagMatchMode.ANY)
+    runner.run(df)
+
+    # Run production-critical checks (AND logic - matches ALL tags)
+    runner = suite.build(tags=["priority:high", "env:prod"], tag_match_mode=TagMatchMode.ALL)
+    runner.run(df)
+
+Programmatic Result Inspection
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Get detailed validation results without raising exceptions:
+
+.. code-block:: python
+
+    # Get detailed results without raising exceptions
+    result = runner.run(df, raise_on_failure=False)
+
+    # Inspect validation outcomes
+    print(f"Total: {result.total_expectations}, Passed: {result.total_passed}, Failed: {result.total_failed}")
+    print(f"Pass rate: {result.pass_rate:.2%}")
+    print(f"Duration: {result.total_duration_seconds:.2f}s")
+    print(f"Applied filters: {result.applied_filters}")
+
+    # Access individual results
+    for exp_result in result.results:
+        if exp_result.status == "failed":
+            print(f"Failed: {exp_result.description} - {exp_result.violation_count} violations")
+
 How to contribute?
 ------------------
 Contributions are welcome! You can enhance the library by adding new expectations, refining existing ones, or improving
