
Commit ca05555

Revamp docs intro and add quickstart guide
1 parent 3cf007f commit ca05555

3 files changed: +357 -166 lines changed

docs/_quarto.yml

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ website:
6464
- section: "Getting Started"
6565
contents:
6666
- index.qmd
67+
- user-guide/quick-start.qmd
6768
- user-guide/installation.qmd
6869
- section: "Validation Plan"
6970
contents:

docs/index.qmd

Lines changed: 133 additions & 166 deletions
@@ -1,223 +1,190 @@
11
---
2-
title: Introduction
2+
title: Welcome to Pointblank
33
jupyter: python3
4-
toc-expand: 2
54
html-table-processing: none
65
---
6+
7+
<div style="text-align: center;">
8+
9+
![](/assets/pointblank_logo.svg){width=60%}
10+
11+
**Data validation made beautiful and powerful.**
12+
13+
</div>
14+
15+
Pointblank is a data validation framework for Python that makes data quality checks beautiful,
16+
powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive
17+
reports that turn data issues into conversations.
18+
719
```{python}
820
#| echo: false
921
#| output: false
1022
import pointblank as pb
1123
pb.config(report_incl_footer=False)
1224
```
1325

14-
The Pointblank library is all about assessing the state of data quality for a table. You provide the
15-
validation rules and the library will dutifully interrogate the data and provide useful reporting.
16-
We can use different types of tables like Polars and Pandas DataFrames, Parquet files, or various
17-
database tables. Let's walk through what data validation looks like in Pointblank.
18-
19-
## A Simple Validation Table
20-
21-
This is a validation report table that is produced from a validation of a Polars DataFrame:
22-
2326
```{python}
24-
#| code-fold: true
25-
#| code-summary: "Show the code"
27+
#| echo: false
2628
import pointblank as pb
27-
28-
(
29-
pb.Validate(data=pb.load_dataset(dataset="small_table"), label="Example Validation")
30-
.col_vals_lt(columns="a", value=10)
31-
.col_vals_between(columns="d", left=0, right=5000)
32-
.col_vals_in_set(columns="f", set=["low", "mid", "high"])
33-
.col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
29+
import polars as pl
30+
31+
validation = (
32+
pb.Validate(
33+
data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
34+
tbl_name="game_revenue",
35+
label="Comprehensive validation of game revenue data",
36+
thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
37+
brief=True
38+
)
39+
.col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$") # STEP 1
40+
.col_vals_gt(columns="session_duration", value=20) # STEP 2
41+
.col_vals_ge(columns="item_revenue", value=0.20) # STEP 3
42+
.col_vals_in_set(columns="item_type", set=["iap", "ad"]) # STEP 4
43+
.col_vals_in_set( # STEP 5
44+
columns="acquisition",
45+
set=["google", "facebook", "organic", "crosspromo", "other_campaign"]
46+
)
47+
.col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"]) # STEP 6
48+
.col_vals_between( # STEP 7
49+
columns="session_duration",
50+
left=10, right=50,
51+
pre=lambda df: df.select(pl.median("session_duration")),
52+
brief="Expect that the median of `session_duration` should be between `10` and `50`."
53+
)
54+
.rows_distinct(columns_subset=["player_id", "session_id", "time"]) # STEP 8
55+
.row_count_match(count=2000) # STEP 9
56+
.col_count_match(count=11) # STEP 10
57+
.col_vals_not_null(columns="item_type") # STEP 11
58+
.col_exists(columns="start_day") # STEP 12
3459
.interrogate()
3560
)
36-
```
37-
38-
Each row in this reporting table constitutes a single validation step. Roughly, the left-hand side
39-
outlines the validation rules and the right-hand side provides the results of each validation step.
40-
While simple in principle, there's a lot of useful information packed into this validation table.
41-
42-
Here's a diagram that describes a few of the important parts of the validation table:
43-
44-
![](/assets/validation-table-diagram.png){width=100%}
45-
46-
There are three things that should be noted here:
47-
48-
- validation steps: each step is a separate test on the table, focused on a certain aspect of the
49-
table
50-
- validation rules: the validation type is provided here along with key constraints
51-
- validation results: interrogation results are provided here, with a breakdown of test units
52-
(*total*, *passing*, and *failing*), threshold flags, and more
53-
54-
The intent is to provide the key information in one place, and have it be interpretable by data
55-
stakeholders. For example, a failure can be seen in the second row (notice there's a CSV button). A
56-
data quality stakeholder could click this to download a CSV of the failing rows for that step.
57-
58-
## Example Code, Step-by-Step
59-
60-
This section will walk you through the example code used above.
61-
62-
```python
63-
import pointblank as pb
6461
65-
(
66-
pb.Validate(data=pb.load_dataset(dataset="small_table"))
67-
.col_vals_lt(columns="a", value=10)
68-
.col_vals_between(columns="d", left=0, right=5000)
69-
.col_vals_in_set(columns="f", set=["low", "mid", "high"])
70-
.col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
71-
.interrogate()
72-
)
62+
validation.get_tabular_report(title="Game Revenue Validation Report").show("browser")
7363
```
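
Step 7 in the validation above passes a `pre=` function that reduces `session_duration` to its median before the range check runs, so the step evaluates a single value rather than every row. The following is a minimal standard-library sketch of that idea, not Pointblank's implementation: the helper name `median_between` and the sample durations are hypothetical, and inclusive bounds are an assumption.

```python
from statistics import median


def median_between(values, left, right):
    """Reduce values to their median, then test left <= median <= right."""
    m = median(values)
    # Assumption: both bounds are treated as inclusive
    return m, left <= m <= right


# Made-up session durations (illustrative only, not taken from game_revenue)
durations = [12.5, 18.0, 25.0, 31.5, 47.0]

m, ok = median_between(durations, left=10, right=50)
print(m, ok)
```

Because the data is collapsed to one number first, a check like this passes or fails as a whole instead of reporting a count of failing rows.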
7464

75-
Note these three key pieces in the code:
65+
Ready to validate? Start with our [Installation](user-guide/installation.qmd) guide or jump straight
66+
to the [User Guide](user-guide/index.qmd).
7667

77-
- **data**: the `Validate(data=)` argument takes a DataFrame or database table that you want to validate
78-
- **steps**: the methods starting with `col_vals_` specify validation steps that run on specific columns
79-
- **execution**: the `~~Validate.interrogate()` method executes the validation plan on the table
68+
Pointblank is made with 💙 by [Posit](https://posit.co/).
8069

81-
This common pattern is used in a validation workflow, where `Validate` and
82-
`~~Validate.interrogate()` bookend a validation plan generated through calling validation methods.
70+
## What is Data Validation?
8371

84-
In the next few sections we'll go a bit further by understanding how we can measure data quality and
85-
respond to failures.
72+
Data validation ensures your data meets quality standards before it's used in analysis, reports, or
73+
downstream systems. Pointblank provides a structured way to define validation rules, execute them,
74+
and communicate results to both technical and non-technical stakeholders.
8675

87-
## Understanding Test Units
76+
With Pointblank you can:
8877

89-
Each validation step will execute a type of validation test on the target table. For example, a
90-
`~~Validate.col_vals_lt()` validation step can test that each value in a column is less than a
91-
specified number. And the key finding that's reported in each step is the number of *test units*
92-
that pass or fail.
78+
- **Validate data** through a fluent, chainable API with [25+ validation methods](reference/index.qmd#validation-steps)
79+
- **Set thresholds** to define acceptable levels of data quality (warning, error, critical)
80+
- **Take actions** when thresholds are exceeded (notifications, logging, custom functions)
81+
- **Generate reports** that make data quality issues immediately understandable
82+
- **Inspect data** with built-in tools for previewing, summarizing, and finding missing values
9383

94-
In the validation report table, test unit metrics are displayed under the `UNITS`, `PASS`, and
95-
`FAIL` columns. This diagram explains what the tabulated values signify:
84+
## Why Pointblank?
9685

97-
![](/assets/validation-test-units.png){width=100%}
86+
Pointblank is designed for the entire data team, not just engineers:
9887

99-
Test units are dependent on the test being run. Some validation methods might test every value in a
100-
particular column, so each value will be a test unit. Others will only have a single test unit since
101-
they aren't testing individual values but rather if the overall test passes or fails.
88+
🎨 **Beautiful Reports**: Interactive validation reports that stakeholders actually want to read
89+
📊 **Threshold Management**: Define quality standards with warning, error, and critical levels
90+
🔍 **Error Drill-Down**: Inspect failing data to get to root causes quickly
91+
🔗 **Universal Compatibility**: Works with Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and more
92+
📝 **YAML Support**: Write validations in YAML for version control and team collaboration
93+
**CLI Tools**: Run validations from the command line for CI/CD pipelines or as quick checks
94+
**Rich Inspection**: Preview data, analyze columns, and visualize missing values
10295

103-
## Setting Thresholds for Data Quality Signals
96+
## Quick Examples
10497

105-
Understanding test units is essential because they form the foundation of Pointblank's threshold
106-
system. Thresholds let you define acceptable levels of data quality, triggering different severity
107-
signals ('warning', 'error', or 'critical') when certain failure conditions are met.
98+
### Interactive Reports
10899

109-
Here's a simple example that uses a single validation step along with thresholds set using the
110-
`Thresholds` class:
100+
Validation reports aren't just for engineers. They're designed for data stakeholders and are
101+
highly customizable and publishable as HTML:
111102

112-
```{python}
113-
(
114-
pb.Validate(data=pb.load_dataset(dataset="small_table"))
115-
.col_vals_lt(
116-
columns="a",
117-
value=7,
118-
119-
# Set the 'warning' and 'error' thresholds ---
120-
thresholds=pb.Thresholds(warning=2, error=4)
121-
)
122-
.interrogate()
123-
)
103+
```python
104+
validation.get_tabular_report().show() # In REPL
105+
validation # In notebooks: it just works
124106
```
125107

126-
If you look at the validation report table, we can see:
127-
128-
- the `FAIL` column shows that 2 test units have failed
129-
- the `W` column (short for 'warning') shows a filled gray circle indicating those failing test
130-
units reached that threshold value
131-
- the `E` column (short for 'error') shows an open yellow circle indicating that the number of
132-
failing test units is below that threshold
133-
134-
The one final threshold level, `C` (for 'critical'), wasn't set so it appears on the validation
135-
table as a long dash.
136-
137-
## Taking Action on Threshold Exceedances
108+
### Threshold-Based Quality
138109

139-
Pointblank becomes even more powerful when you combine thresholds with actions. The
140-
`Actions` class lets you trigger responses when validation failures exceed threshold levels, turning
141-
passive reporting into active notifications.
110+
Set expectations and react when data quality degrades (with alerts, logging, or custom functions):
142111

143-
Here's a simple example that adds an action to the previous validation:
144-
145-
```{python}
146-
(
147-
pb.Validate(data=pb.load_dataset(dataset="small_table"))
148-
.col_vals_lt(
149-
columns="a",
150-
value=7,
151-
thresholds=pb.Thresholds(warning=2, error=4),
152-
153-
# Set an action for the 'warning' threshold ---
154-
actions=pb.Actions(
155-
warning="WARNING: Column 'a' has values that aren't less than 7."
156-
)
157-
)
112+
```python
113+
validation = (
114+
pb.Validate(data=sales_data, thresholds=(0.01, 0.02, 0.05)) # Three threshold levels set
115+
.col_vals_not_null(columns="customer_id")
116+
.col_vals_in_set(columns="status", set=["pending", "shipped", "delivered"])
158117
.interrogate()
159118
)
160119
```
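
To make the threshold tuple above concrete, here is a minimal standard-library sketch of how three fractional levels could be compared against a step's failure rate. The `severity` helper and the sample counts are hypothetical, and the at-or-above comparison is an assumption about how levels are evaluated. It is not Pointblank's own code.

```python
def severity(n_failed, n_total, thresholds=(0.01, 0.02, 0.05)):
    """Return the names of the levels whose fractional threshold is reached."""
    names = ("warning", "error", "critical")
    fraction = n_failed / n_total if n_total else 0.0
    # Assumption: a level is reached when the failing fraction is at or above it
    return [name for name, t in zip(names, thresholds) if fraction >= t]


print(severity(n_failed=30, n_total=2000))   # 1.5% of test units failing
print(severity(n_failed=120, n_total=2000))  # 6% of test units failing
```

With these made-up counts, the first call reaches only the warning level, while the second reaches all three.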
161120

162-
Notice the printed warning message: `"WARNING: Column 'a' has values that aren't less than
163-
7."`. The warning indicator (filled gray circle) visually confirms this threshold was reached and
164-
the action should trigger.
121+
### YAML Workflows
122+
123+
Works wonderfully for CI/CD pipelines and team collaboration:
165124

166-
Actions make your validation workflows more responsive and integrated with your data pipelines. For
167-
example, you can generate console messages, Slack notifications, and more.
125+
```yaml
126+
validate:
127+
data: sales_data
128+
tbl_name: "sales_data"
129+
thresholds: [0.01, 0.02, 0.05]
168130

169-
## Navigating the User Guide
131+
steps:
132+
- col_vals_not_null:
133+
columns: "customer_id"
134+
- col_vals_in_set:
135+
columns: "status"
136+
set: ["pending", "shipped", "delivered"]
137+
```
170138
171-
As you continue exploring Pointblank's capabilities, you'll find the **User Guide** organized into
172-
sections that will help you navigate the various features.
139+
```python
140+
validation = pb.yaml_interrogate("validation.yaml")
141+
```
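
One way to picture the YAML workflow is as a list of step names and arguments that get dispatched to checks. The sketch below is a rough illustration only, assuming the `steps` entries parse into single-key mappings; the `CHECKS` table, the `run_steps` helper, and the sample rows are all hypothetical and do not reflect how `yaml_interrogate()` works internally.

```python
# The YAML steps, written as the structure a YAML parser would plausibly return
steps = [
    {"col_vals_not_null": {"columns": "customer_id"}},
    {"col_vals_in_set": {"columns": "status", "set": ["pending", "shipped", "delivered"]}},
]

# Made-up rows standing in for sales_data
rows = [
    {"customer_id": 1, "status": "pending"},
    {"customer_id": None, "status": "shipped"},
    {"customer_id": 3, "status": "lost"},
]

# Hypothetical per-value checks keyed by step name
CHECKS = {
    "col_vals_not_null": lambda value, args: value is not None,
    "col_vals_in_set": lambda value, args: value in args["set"],
}


def run_steps(steps, rows):
    """Count failing values for each declared step."""
    results = []
    for step in steps:
        (name, args), = step.items()  # each step is a single-key mapping
        check = CHECKS[name]
        n_failed = sum(not check(row[args["columns"]], args) for row in rows)
        results.append((name, args["columns"], n_failed))
    return results


results = run_steps(steps, rows)
print(results)
```

Keeping the rules as plain data is what makes them easy to review in version control: a change to a validation shows up as a small diff in the YAML file rather than in code.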
173142

174-
### Getting Started
143+
### Command Line Power
175144

176-
The *Getting Started* section introduces you to Pointblank:
145+
Run validations without writing code:
177146

178-
- [Introduction](index.qmd): Overview of Pointblank and core concepts (**this article**)
179-
- [Installation](user-guide/installation.qmd): How to install and set up Pointblank
147+
```bash
148+
# Quick validation
149+
pb validate sales_data.csv --check col-vals-not-null --column customer_id
180150

181-
### Validation Plan
151+
# Run YAML workflows
152+
pb run validation.yaml --exit-code # <- Great for CI/CD!
182153

183-
The *Validation Plan* section covers everything you need to know about creating robust
184-
validation plans:
154+
# Explore your data
155+
pb scan sales_data.csv
156+
pb missing sales_data.csv
157+
```
185158

186-
- [Overview](user-guide/validation-overview.qmd): Survey of validation methods and their shared parameters
187-
- [Validation Methods](user-guide/validation-methods.qmd): A closer look at the more common validation methods
188-
- [Column Selection Patterns](user-guide/column-selection-patterns.qmd): Techniques for targeting specific columns
189-
- [Preprocessing](user-guide/preprocessing.qmd): Transform data before validation
190-
- [Segmentation](user-guide/segmentation.qmd): Apply validations to specific segments of your data
191-
- [Thresholds](user-guide/thresholds.qmd): Set quality standards and trigger severity levels
192-
- [Actions](user-guide/actions.qmd): Respond to threshold exceedances with notifications or custom functions
193-
- [Briefs](user-guide/briefs.qmd): Add context to validation steps
159+
## Installation
194160

195-
### Advanced Validation
161+
Install Pointblank using pip or conda:
196162

197-
The *Advanced Validation* section explores more specialized validation techniques:
163+
```bash
164+
pip install pointblank
165+
# or
166+
conda install conda-forge::pointblank
167+
```
198168

199-
- [Expression-Based Validation](user-guide/expressions.qmd): Use column expressions for advanced validation
200-
- [Schema Validation](user-guide/schema-validation.qmd): Enforce table structure and column types
201-
- [Assertions](user-guide/assertions.qmd): Raise exceptions to enforce data quality requirements
202-
- [Draft Validation](user-guide/draft-validation.qmd): Create validation plans from existing data
169+
For specific backends:
203170

204-
### Post Interrogation
171+
```bash
172+
pip install "pointblank[pl]" # Polars support
173+
pip install "pointblank[pd]" # Pandas support
174+
pip install "pointblank[duckdb]" # DuckDB support
175+
pip install "pointblank[postgres]" # PostgreSQL support
176+
```
205177

206-
After validating your data, the *Post Interrogation* section helps you analyze and respond to
207-
results:
178+
See the [Installation guide](user-guide/installation.qmd) for more details.
208179

209-
- [Validation Reports](user-guide/validation-reports.qmd): Understand and customize the validation report table
210-
- [Step Reports](user-guide/step-reports.qmd): View detailed results for individual validation steps
211-
- [Data Extracts](user-guide/extracts.qmd): Extract and analyze failing data
212-
- [Sundering Validated Data](user-guide/sundering.qmd): Split data based on validation results
180+
## Join the Community
213181

214-
### Data Inspection
182+
We'd love to hear from you! Connect with us:
215183

216-
The *Data Inspection* section provides tools to explore and understand your data:
184+
- [GitHub Issues](https://github.com/posit-dev/pointblank/issues) for bug reports and feature requests
185+
- [Discord server](https://discord.com/invite/YH7CybCNCQ) for discussions and help
186+
- [Contributing guidelines](https://github.com/posit-dev/pointblank/blob/main/CONTRIBUTING.md) if you'd like to contribute
217187

218-
- [Previewing Data](user-guide/preview.qmd): View samples of your data
219-
- [Column Summaries](user-guide/col-summary-tbl.qmd): Get statistical summaries of your data
220-
- [Missing Values Reporting](user-guide/missing-vals-tbl.qmd): Identify and visualize missing data
188+
---
221189

222-
By following this guide, you'll gain a comprehensive understanding of how to validate, monitor, and
223-
maintain high-quality data with Pointblank.
190+
**License**: MIT | **© 2024-2025 Posit Software, PBC**
