Commit 09c712d

moved readme to sphinx
1 parent: eda64cd

File tree

7 files changed: +2001 -529 lines

.gitignore

Lines changed: 12 additions & 5 deletions

@@ -94,11 +94,18 @@ ipython_config.py
 # install all needed dependencies.
 # Pipfile.lock
 
-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-uv.lock
+# sphinx
+# Sphinx documentation build output
+build/
+
+# Sphinx cache and temporary files
+source/.doctrees/
+.doctrees/
+
+# Auto-generated API documentation (if using sphinx-apidoc)
+source/_autosummary/
+source/_generated/
+source/api/
 
 # poetry
 # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
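For context, these ignore rules cover the artifacts a Sphinx build leaves behind. A minimal sketch of such a build step, assuming the docs live under docs/source and output goes to docs/build (paths inferred from this commit's file layout, not stated in it):

.. code-block:: python

   # Sketch only: the docs/source and docs/build paths are assumptions.
   import subprocess

   # sphinx-build renders HTML into the output directory and, by default, keeps
   # its pickled doctree cache in a ".doctrees/" subdirectory -- both of which
   # the new .gitignore entries above keep out of version control.
   subprocess.run(
       ["sphinx-build", "-b", "html", "docs/source", "docs/build"],
       check=True,
   )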

README.md

Lines changed: 29 additions & 465 deletions
Large diffs are not rendered by default.

docs/source/adding_expectations.rst

Lines changed: 493 additions & 0 deletions
Large diffs are not rendered by default.

docs/source/expectations.rst

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 Expectation Gallery
 ===================
 
+
 This page provides comprehensive documentation for all available DataFrame expectations.
 The expectations are automatically categorized and organized for easy browsing.
 

docs/source/getting_started.rst

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
Getting Started
===============

Welcome to DataFrame Expectations! This guide will help you get up and running quickly with validating your Pandas and PySpark DataFrames.

Installation
------------

Install DataFrame Expectations using pip:

.. code-block:: bash

   pip install dataframe-expectations

Requirements
~~~~~~~~~~~~

* Python 3.10+
* pandas >= 1.5.0
* pyspark >= 3.3.0
* tabulate >= 0.8.9

Basic Usage
-----------

DataFrame Expectations provides a fluent API for building validation suites. Here's how to get started:

Pandas Example
~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from dataframe_expectations.expectations_suite import DataframeExpectationsSuite

   # Create a sample DataFrame
   df = pd.DataFrame({
       "age": [25, 15, 45, 22],
       "name": ["Alice", "Bob", "Charlie", "Diana"],
       "salary": [50000, 60000, 80000, 45000]
   })

   # Build a validation suite
   suite = (
       DataframeExpectationsSuite()
       .expect_min_rows(3)                        # At least 3 rows
       .expect_max_rows(10)                       # At most 10 rows
       .expect_value_greater_than("age", 18)      # All ages > 18
       .expect_value_less_than("salary", 100000)  # All salaries < 100k
       .expect_value_not_null("name")             # No null names
   )

   # Run validation
   suite.run(df)

PySpark Example
~~~~~~~~~~~~~~~

.. code-block:: python

   from pyspark.sql import SparkSession
   from dataframe_expectations.expectations_suite import DataframeExpectationsSuite

   # Initialize Spark
   spark = SparkSession.builder.appName("DataFrameExpectations").getOrCreate()

   # Create a sample DataFrame
   data = [
       {"age": 25, "name": "Alice", "salary": 50000},
       {"age": 15, "name": "Bob", "salary": 60000},
       {"age": 45, "name": "Charlie", "salary": 80000},
       {"age": 22, "name": "Diana", "salary": 45000}
   ]
   df = spark.createDataFrame(data)

   # Build a validation suite (same API as Pandas!)
   suite = (
       DataframeExpectationsSuite()
       .expect_min_rows(3)
       .expect_max_rows(10)
       .expect_value_greater_than("age", 18)
       .expect_value_less_than("salary", 100000)
       .expect_value_not_null("name")
   )

   # Run validation
   suite.run(df)

Example Output
~~~~~~~~~~~~~~

When validations fail, you'll see detailed output like this:

.. code-block:: text

   ========================== Running expectations suite ==========================
   ExpectationMinRows (DataFrame contains at least 3 rows) ... OK
   ExpectationMaxRows (DataFrame contains at most 10 rows) ... OK
   ExpectationValueGreaterThan ('age' is greater than 18) ... FAIL
   ExpectationValueLessThan ('salary' is less than 100000) ... OK
   ExpectationValueNotNull ('name' is not null) ... OK
   ============================ 4 success, 1 failures =============================

   ExpectationSuiteFailure: (1/5) expectations failed.

   ================================================================================
   List of violations:
   --------------------------------------------------------------------------------
   [Failed 1/1] ExpectationValueGreaterThan ('age' is greater than 18): Found 1 row(s) where 'age' is not greater than 18.
   Some examples of violations:
   +-----+------+--------+
   | age | name | salary |
   +-----+------+--------+
   | 15  | Bob  | 60000  |
   +-----+------+--------+
   ================================================================================

How to contribute?
------------------

Contributions are welcome! You can enhance the library by adding new expectations, refining existing ones, or
improving the testing framework and documentation.
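The example output above shows suite.run(df) surfacing an ExpectationSuiteFailure when any expectation fails. A minimal sketch of gating a pipeline step on that behavior (the exception's import path does not appear in this diff, so the generic Exception is caught here):

.. code-block:: python

   import pandas as pd
   from dataframe_expectations.expectations_suite import DataframeExpectationsSuite

   df = pd.DataFrame({"age": [25, 15, 45, 22]})
   suite = DataframeExpectationsSuite().expect_value_greater_than("age", 18)

   try:
       suite.run(df)
   except Exception as err:  # ExpectationSuiteFailure, per the output above
       # Log the violation report, then stop the pipeline step.
       print(f"Validation failed: {err}")
       raise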

docs/source/index.rst

Lines changed: 13 additions & 59 deletions
@@ -1,68 +1,22 @@
-DataFrame Expectations Documentation
-====================================
+DataFrame Expectations
+======================
 
-**DataFrameExpectations** is a Python library designed to validate **Pandas** and **PySpark**
-DataFrames using customizable, reusable expectations. It simplifies testing in data pipelines
-and end-to-end workflows by providing a standardized framework for DataFrame validation.
+**DataFrameExpectations** is a Python library designed to validate **Pandas** and **PySpark** DataFrames using
+customizable, reusable expectations. It simplifies testing in data pipelines and end-to-end workflows by providing a
+standardized framework for DataFrame validation.
+
+Instead of using different validation approaches for DataFrames, this library provides a standardized solution for this
+use case. As a result, any contributions made here, such as adding new expectations, can be leveraged by all users of
+the library.
+
+See the starter guide :doc:`here <getting_started>`.
+See the complete list of expectations :doc:`here <expectations>`.
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
 
    getting_started
+   adding_expectations
    expectations
    api_reference
-   contributing
-
-Quick Start
------------
-
-Install the package:
-
-.. code-block:: bash
-
-   pip install dataframe-expectations
-
-Basic usage with Pandas:
-
-.. code-block:: python
-
-   from dataframe_expectations.expectations_suite import DataframeExpectationsSuite
-   import pandas as pd
-
-   # Create a suite of expectations
-   suite = (
-       DataframeExpectationsSuite()
-       .expect_value_greater_than("age", 18)
-       .expect_value_less_than("age", 65)
-   )
-
-   # Create a DataFrame to validate
-   df = pd.DataFrame({"age": [25, 30, 45], "name": ["Alice", "Bob", "Charlie"]})
-
-   # Run the validation
-   suite.run(df)
-
-Basic usage with PySpark:
-
-.. code-block:: python
-
-   from dataframe_expectations.expectations_suite import DataframeExpectationsSuite
-   from pyspark.sql import SparkSession
-
-   # Initialize Spark session
-   spark = SparkSession.builder.appName("DataFrameExpectations").getOrCreate()
-
-   # Create a suite of expectations
-   suite = (
-       DataframeExpectationsSuite()
-       .expect_value_greater_than("age", 18)
-       .expect_value_less_than("age", 65)
-   )
-
-   # Create a PySpark DataFrame to validate
-   data = [{"age": 25, "name": "Alice"}, {"age": 30, "name": "Bob"}, {"age": 45, "name": "Charlie"}]
-   df = spark.createDataFrame(data)
-
-   # Run the validation
-   suite.run(df)
