Revamp docs intro and add quickstart guide

rich-iannone · rich-iannone · commit ca0555575ce7 · 2025-10-28T14:03:59.000-04:00
diff --git a/docs/_quarto.yml b/docs/_quarto.yml
@@ -64,6 +64,7 @@ website:
         - section: "Getting Started"
           contents:
             - index.qmd
+            - user-guide/quickstart.qmd
             - user-guide/installation.qmd
         - section: "Validation Plan"
           contents:
diff --git a/docs/index.qmd b/docs/index.qmd
@@ -1,223 +1,190 @@
 ---
-title: Introduction
+title: Welcome to Pointblank
 jupyter: python3
-toc-expand: 2
 html-table-processing: none
 ---
+
+<div style="text-align: center;">
+
+![](/assets/pointblank_logo.svg){width=60%}
+
+**Data validation made beautiful and powerful.**
+
+</div>
+
+Pointblank is a data validation framework for Python that makes data quality checks beautiful,
+powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive
+reports that turn data issues into conversations.
+
 ```{python}
 #| echo: false
 #| output: false
 import pointblank as pb
 pb.config(report_incl_footer=False)
 ```
 
-The Pointblank library is all about assessing the state of data quality for a table. You provide the
-validation rules and the library will dutifully interrogate the data and provide useful reporting.
-We can use different types of tables like Polars and Pandas DataFrames, Parquet files, or various
-database tables. Let's walk through what data validation looks like in Pointblank.
-
-## A Simple Validation Table
-
-This is a validation report table that is produced from a validation of a Polars DataFrame:
-
 ```{python}
-#| code-fold: true
-#| code-summary: "Show the code"
+#| echo: false
 import pointblank as pb
-
-(
-    pb.Validate(data=pb.load_dataset(dataset="small_table"), label="Example Validation")
-    .col_vals_lt(columns="a", value=10)
-    .col_vals_between(columns="d", left=0, right=5000)
-    .col_vals_in_set(columns="f", set=["low", "mid", "high"])
-    .col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
+import polars as pl
+
+validation = (
+    pb.Validate(
+        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
+        tbl_name="game_revenue",
+        label="Comprehensive validation of game revenue data",
+        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
+        brief=True
+    )
+    .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$")        # STEP 1
+    .col_vals_gt(columns="session_duration", value=20)                          # STEP 2
+    .col_vals_ge(columns="item_revenue", value=0.20)                            # STEP 3
+    .col_vals_in_set(columns="item_type", set=["iap", "ad"])                    # STEP 4
+    .col_vals_in_set(                                                           # STEP 5
+        columns="acquisition",
+        set=["google", "facebook", "organic", "crosspromo", "other_campaign"]
+    )
+    .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"])        # STEP 6
+    .col_vals_between(                                                          # STEP 7
+        columns="session_duration",
+        left=10, right=50,
+        pre=lambda df: df.select(pl.median("session_duration")),
+        brief="Expect that the median of `session_duration` should be between `10` and `50`."
+    )
+    .rows_distinct(columns_subset=["player_id", "session_id", "time"])          # STEP 8
+    .row_count_match(count=2000)                                                # STEP 9
+    .col_count_match(count=11)                                                  # STEP 10
+    .col_vals_not_null(columns="item_type")                                     # STEP 11
+    .col_exists(columns="start_day")                                            # STEP 12
     .interrogate()
 )
-```
-
-Each row in this reporting table constitutes a single validation step. Roughly, the left-hand side
-outlines the validation rules and the right-hand side provides the results of each validation step.
-While simple in principle, there's a lot of useful information packed into this validation table.
-
-Here's a diagram that describes a few of the important parts of the validation table:
-
-![](/assets/validation-table-diagram.png){width=100%}
-
-There are three things that should be noted here:
-
-- validation steps: each step is a separate test on the table, focused on a certain aspect of the
-table
-- validation rules: the validation type is provided here along with key constraints
-- validation results: interrogation results are provided here, with a breakdown of test units
-(*total*, *passing*, and *failing*), threshold flags, and more
-
-The intent is to provide the key information in one place, and have it be interpretable by data
-stakeholders. For example, a failure can be seen in the second row (notice there's a CSV button). A
-data quality stakeholder could click this to download a CSV of the failing rows for that step.
-
-## Example Code, Step-by-Step
-
-This section will walk you through the example code used above.
-
-```python
-import pointblank as pb
 
-(
-    pb.Validate(data=pb.load_dataset(dataset="small_table"))
-    .col_vals_lt(columns="a", value=10)
-    .col_vals_between(columns="d", left=0, right=5000)
-    .col_vals_in_set(columns="f", set=["low", "mid", "high"])
-    .col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
-    .interrogate()
-)
+validation.get_tabular_report(title="Game Revenue Validation Report")
 ```
 
-Note these three key pieces in the code:
+Ready to validate? Start with our [Installation](user-guide/installation.qmd) guide or jump straight
+to the [User Guide](user-guide/index.qmd).
 
-- **data**: the `Validate(data=)` argument takes a DataFrame or database table that you want to validate
-- **steps**: the methods starting with `col_vals_` specify validation steps that run on specific columns
-- **execution**: the `~~Validate.interrogate()` method executes the validation plan on the table
+Pointblank is made with 💙 by [Posit](https://posit.co/).
 
-This common pattern is used in a validation workflow, where `Validate` and
-`~~Validate.interrogate()` bookend a validation plan generated through calling validation methods.
+## What is Data Validation?
 
-In the next few sections we'll go a bit further by understanding how we can measure data quality and
-respond to failures.
+Data validation checks that your data meets quality standards before it's used in analysis, reports, or
+downstream systems. Pointblank provides a structured way to define validation rules, execute them,
+and communicate results to both technical and non-technical stakeholders.
 
-## Understanding Test Units
+With Pointblank you can:
 
-Each validation step will execute a type of validation test on the target table. For example, a
-`~~Validate.col_vals_lt()` validation step can test that each value in a column is less than a
-specified number. And the key finding that's reported in each step is the number of *test units*
-that pass or fail.
+- **Validate data** through a fluent, chainable API with [25+ validation methods](reference/index.qmd#validation-steps)
+- **Set thresholds** to define acceptable levels of data quality (warning, error, critical)
+- **Take actions** when thresholds are exceeded (notifications, logging, custom functions)
+- **Generate reports** that make data quality issues immediately understandable
+- **Inspect data** with built-in tools for previewing, summarizing, and finding missing values
 
-In the validation report table, test unit metrics are displayed under the `UNITS`, `PASS`, and
-`FAIL` columns. This diagram explains what the tabulated values signify:
+## Why Pointblank?
 
-![](/assets/validation-test-units.png){width=100%}
+Pointblank is designed for the entire data team, not just engineers:
 
-Test units are dependent on the test being run. Some validation methods might test every value in a
-particular column, so each value will be a test unit. Others will only have a single test unit since
-they aren't testing individual values but rather if the overall test passes or fails.
+- 🎨 **Beautiful Reports**: Interactive validation reports that stakeholders actually want to read
+- 📊 **Threshold Management**: Define quality standards with warning, error, and critical levels
+- 🔍 **Error Drill-Down**: Inspect failing data to get to root causes quickly
+- 🔗 **Universal Compatibility**: Works with Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and more
+- 📝 **YAML Support**: Write validations in YAML for version control and team collaboration
+- ⚡ **CLI Tools**: Run validations from the command line for CI/CD pipelines or as quick checks
+- **Rich Inspection**: Preview data, analyze columns, and visualize missing values
 
-## Setting Thresholds for Data Quality Signals
+## Quick Examples
 
-Understanding test units is essential because they form the foundation of Pointblank's threshold
-system. Thresholds let you define acceptable levels of data quality, triggering different severity
-signals ('warning', 'error', or 'critical') when certain failure conditions are met.
+### Interactive Reports
 
-Here's a simple example that uses a single validation step along with thresholds set using the
-`Thresholds` class:
+Validation reports aren't just for engineers. They're designed with data stakeholders in mind, are
+highly customizable, and can be published as HTML:
 
-```{python}
-(
-    pb.Validate(data=pb.load_dataset(dataset="small_table"))
-    .col_vals_lt(
-        columns="a",
-        value=7,
-
-        # Set the 'warning' and 'error' thresholds ---
-        thresholds=pb.Thresholds(warning=2, error=4)
-    )
-    .interrogate()
-)
+```python
+validation.get_tabular_report().show()  # In a REPL
+validation  # In a notebook: it just works
 ```
 
-If you look at the validation report table, we can see:
-
-- the `FAIL` column shows that 2 tests units have failed
-- the `W` column (short for 'warning') shows a filled gray circle indicating those failing test
-units reached that threshold value
-- the `E` column (short for 'error') shows an open yellow circle indicating that the number of
-failing test units is below that threshold
-
-The one final threshold level, `C` (for 'critical'), wasn't set so it appears on the validation
-table as a long dash.
-
-## Taking Action on Threshold Exceedances
+### Threshold-Based Quality
 
-Pointblank becomes even more powerful when you combine thresholds with actions. The
-`Actions` class lets you trigger responses when validation failures exceed threshold levels, turning
-passive reporting into active notifications.
+Set expectations and react when data quality degrades (with alerts, logging, or custom functions):
 
-Here's a simple example that adds an action to the previous validation:
-
-```{python}
-(
-    pb.Validate(data=pb.load_dataset(dataset="small_table"))
-    .col_vals_lt(
-        columns="a",
-        value=7,
-        thresholds=pb.Thresholds(warning=2, error=4),
-
-        # Set an action for the 'warning' threshold ---
-        actions=pb.Actions(
-            warning="WARNING: Column 'a' has values that aren't less than 7."
-        )
-    )
+```python
+validation = (
+    pb.Validate(data=sales_data, thresholds=(0.01, 0.02, 0.05))  # warning, error, critical
+    .col_vals_not_null(columns="customer_id")
+    .col_vals_in_set(columns="status", set=["pending", "shipped", "delivered"])
     .interrogate()
 )
 ```
 
-Notice the printed warning message: `"WARNING: Column 'a' has values that aren't less than
-7."`. The warning indicator (filled gray circle) visually confirms this threshold was reached and
-the action should trigger.
+### YAML Workflows
+
+YAML-defined validations work well for CI/CD pipelines and team collaboration:
 
-Actions make your validation workflows more responsive and integrated with your data pipelines. For
-example, you can generate console messages, Slack notifications, and more.
+```yaml
+validate:
+  data: sales_data
+  tbl_name: "sales_data"
+  thresholds: [0.01, 0.02, 0.05]
 
-## Navigating the User Guide
+steps:
+  - col_vals_not_null:
+      columns: "customer_id"
+  - col_vals_in_set:
+      columns: "status"
+      set: ["pending", "shipped", "delivered"]
+```
 
-As you continue exploring Pointblank's capabilities, you'll find the **User Guide** organized into
-sections that will help you navigate the various features.
+```python
+validation = pb.yaml_interrogate("validation.yaml")
+```
 
-### Getting Started
+### Command Line Power
 
-The *Getting Started* section introduces you to Pointblank:
+Run validations without writing code:
 
-- [Introduction](index.qmd): Overview of Pointblank and core concepts (**this article**)
-- [Installation](user-guide/installation.qmd): How to install and set up Pointblank
+```bash
+# Quick validation
+pb validate sales_data.csv --check col-vals-not-null --column customer_id
 
-### Validation Plan
+# Run YAML workflows
+pb run validation.yaml --exit-code  # <- Great for CI/CD!
 
-The *Validation Plan* section covers everything you need to know about creating robust
-validation plans:
+# Explore your data
+pb scan sales_data.csv
+pb missing sales_data.csv
+```
 
-- [Overview](user-guide/validation-overview.qmd): Survey of validation methods and their shared parameters
-- [Validation Methods](user-guide/validation-methods.qmd): A closer look at the more common validation methods
-- [Column Selection Patterns](user-guide/column-selection-patterns.qmd): Techniques for targeting specific columns
-- [Preprocessing](user-guide/preprocessing.qmd): Transform data before validation
-- [Segmentation](user-guide/segmentation.qmd): Apply validations to specific segments of your data
-- [Thresholds](user-guide/thresholds.qmd): Set quality standards and trigger severity levels
-- [Actions](user-guide/actions.qmd): Respond to threshold exceedances with notifications or custom functions
-- [Briefs](user-guide/briefs.qmd): Add context to validation steps
+## Installation
 
-### Advanced Validation
+Install Pointblank using pip or conda:
 
-The *Advanced Validation* section explores more specialized validation techniques:
+```bash
+pip install pointblank
+# or
+conda install conda-forge::pointblank
+```
 
-- [Expression-Based Validation](user-guide/expressions.qmd): Use column expressions for advanced validation
-- [Schema Validation](user-guide/schema-validation.qmd): Enforce table structure and column types
-- [Assertions](user-guide/assertions.qmd): Raise exceptions to enforce data quality requirements
-- [Draft Validation](user-guide/draft-validation.qmd): Create validation plans from existing data
+For specific backends:
 
-### Post Interrogation
+```bash
+pip install "pointblank[pl]"       # Polars support
+pip install "pointblank[pd]"       # Pandas support
+pip install "pointblank[duckdb]"   # DuckDB support
+pip install "pointblank[postgres]" # PostgreSQL support
+```
 
-After validating your data, the *Post Interrogation* section helps you analyze and respond to
-results:
+See the [Installation guide](user-guide/installation.qmd) for more details.
 
-- [Validation Reports](user-guide/validation-reports.qmd): Understand and customize the validation report table
-- [Step Reports](user-guide/step-reports.qmd): View detailed results for individual validation steps
-- [Data Extracts](user-guide/extracts.qmd): Extract and analyze failing data
-- [Sundering Validated Data](user-guide/sundering.qmd): Split data based on validation results
+## Join the Community
 
-### Data Inspection
+We'd love to hear from you! Connect with us:
 
-The *Data Inspection* section provides tools to explore and understand your data:
+- [GitHub Issues](https://github.com/posit-dev/pointblank/issues) for bug reports and feature requests
+- [Discord server](https://discord.com/invite/YH7CybCNCQ) for discussions and help
+- [Contributing guidelines](https://github.com/posit-dev/pointblank/blob/main/CONTRIBUTING.md) if you'd like to contribute
 
-- [Previewing Data](user-guide/preview.qmd): View samples of your data
-- [Column Summaries](user-guide/col-summary-tbl.qmd): Get statistical summaries of your data
-- [Missing Values Reporting](user-guide/missing-vals-tbl.qmd): Identify and visualize missing data
+---
 
-By following this guide, you'll gain a comprehensive understanding of how to validate, monitor, and
-maintain high-quality data with Pointblank.
+**License**: MIT | **© 2024-2025 Posit Software, PBC**
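The `thresholds=(0.01, 0.02, 0.05)` tuple in the Threshold-Based Quality example sets the warning, error, and critical levels. Below is a minimal plain-Python sketch of how a step's failing test units could be compared against such levels. It is not Pointblank's implementation, and `resolve_threshold()` and `levels_reached()` are hypothetical helpers. It assumes that a level is reached once the failing units are greater than or equal to its threshold (matching the previous intro's `warning=2` example, where 2 failing units reached the warning level), and that values below 1 are fractions of all test units while values of 1 or more are absolute counts.

```python
from typing import List, Optional, Sequence

# Severity levels in the order a (warning, error, critical) tuple lists them
LEVELS = ("warning", "error", "critical")


def resolve_threshold(value: Optional[float], n_units: int) -> Optional[float]:
    """Turn a threshold into an absolute number of failing test units.

    Assumption (hypothetical helper): values below 1 are fractions of all
    test units; values of 1 or more are absolute counts. None means unset.
    """
    if value is None:
        return None
    return value * n_units if value < 1 else float(value)


def levels_reached(
    n_failed: int, n_units: int, thresholds: Sequence[Optional[float]]
) -> List[str]:
    """Return the severity levels whose thresholds the failures reach.

    Assumption: a level is reached when n_failed >= its resolved threshold.
    Shorter tuples leave the later levels unset (zip stops early).
    """
    reached = []
    for level, value in zip(LEVELS, thresholds):
        limit = resolve_threshold(value, n_units)
        if limit is not None and n_failed >= limit:
            reached.append(level)
    return reached


if __name__ == "__main__":
    # 3 of 100 values fail against thresholds of 1%, 2%, and 5%
    print(levels_reached(3, 100, (0.01, 0.02, 0.05)))  # ['warning', 'error']
```

For the former intro's example (`warning=2`, `error=4`, 2 failing test units), `levels_reached(2, 13, (2, 4, None))` returns `['warning']`, matching the filled warning circle and open error circle described there. With absolute counts, the unit total (13 here) doesn't affect the result.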
diff --git a/docs/user-guide/quickstart.qmd b/docs/user-guide/quickstart.qmd