<tr data-index="3" data-listing-date-sort="1746144000000" data-listing-file-modified-sort="1763484258684" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2204" data-listing-title-sort="Level Up Your Data Validation with `Actions` and `FinalActions`" data-listing-filename-sort="index.qmd">
+
<tr data-index="3" data-listing-date-sort="1746144000000" data-listing-file-modified-sort="1763484321296" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2204" data-listing-title-sort="Level Up Your Data Validation with `Actions` and `FinalActions`" data-listing-filename-sort="index.qmd">
@@ -568,7 +568,7 @@ <h2 class="anchored" data-anchor-id="the-data-validation-workflow">The Data Vali
<p>All of those numbers under the <code>UNITS</code>, <code>PASS</code>, and <code>FAIL</code> columns have to do with test units, a measure of central importance in Pointblank. Each validation step will execute a type of validation test on the target table. For example, a <a href="https://posit-dev.github.io/pointblank/reference/Validate.col_vals_lt.html"><code>col_vals_lt()</code></a> validation step can test that each value in a column is less than a specified number. The key finding that’s reported as a result of this test is the number of test units that pass or fail. This little diagram explains what those numbers mean:</p>
<p>Failing test units can be tied to threshold levels, which can provide a better indication of whether failures should raise some basic awareness or spur you into action. Here’s a validation workflow that sets three failure threshold levels that signal the severity of data quality problems:</p>
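<p>To make the idea of test units and threshold levels concrete, here is a minimal plain-Python sketch. It does not use Pointblank and is not how the library is implemented; the <code>StepResult</code> class, the <code>check_less_than()</code> and <code>severity()</code> helpers, the level names, the threshold fractions, and the "at or above" comparison are all assumptions made for illustration.</p>

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StepResult:
    """Hypothetical container for the UNITS / PASS / FAIL counts of one step."""

    units: int
    passed: int
    failed: int

    @property
    def fraction_failed(self) -> float:
        # Guard against an empty column so we never divide by zero.
        return self.failed / self.units if self.units else 0.0


def check_less_than(values: Sequence[float], limit: float) -> StepResult:
    """Treat every value as one test unit and test `value < limit`."""
    passed = sum(1 for v in values if v < limit)
    return StepResult(units=len(values), passed=passed, failed=len(values) - passed)


def severity(
    result: StepResult, warning: float, error: float, critical: float
) -> Optional[str]:
    """Return the most severe level whose threshold fraction is met, else None."""
    frac = result.fraction_failed
    # Check from most to least severe so the highest matching level wins.
    for name, level in (("critical", critical), ("error", error), ("warning", warning)):
        if frac >= level:
            return name
    return None


# Ten test units, two of which (12 and 15) fail the `< 10` check.
result = check_less_than([3, 7, 12, 4, 9, 1, 15, 2, 6, 8], limit=10)
level = severity(result, warning=0.05, error=0.10, critical=0.15)
print(result, level)
```

<p>In this sketch, two of the ten test units fail, so the failing fraction of 0.2 meets even the strictest of the three illustrative thresholds.</p>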
@@ -1147,7 +1147,7 @@ <h2 class="anchored" data-anchor-id="the-data-validation-workflow">The Data Vali
<h2 class="anchored" data-anchor-id="reporting-for-individual-validation-steps">Reporting for Individual Validation Steps</h2>
<p>The second validation step of the previous data validation showed 18 failing test units. That translates to 18 spots in a 2,000-row DataFrame where a data quality assertion failed. We’d often like to know exactly what that failing data is; finding it is usually the next step toward addressing data quality issues.</p>
<p>Pointblank offers a method that gives you a tabular report on a specific step: <a href="https://posit-dev.github.io/pointblank/reference/Validate.get_step_report.html"><code>get_step_report()</code></a>. The tables you’ve seen so far (validation report tables) summarize all validation steps. In contrast, a focused report on a single step can help you get to the heart of a data quality issue. Here’s how that looks for Step 2:</p>
<p>This report provides the 18 rows where the failure occurred. If you scroll the table to the right you’ll see the column that underwent testing (<code>session_duration</code>) is highlighted in red. All of these values are <code>5.0</code> or less, which is in violation of the assertion (in the header) that <code>session_duration > 5</code>.</p>
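<p>The core idea behind such a report, pulling out only the rows that violate an assertion, can be sketched in plain Python. This is an illustration under stated assumptions rather than Pointblank’s implementation: the sample rows, the <code>user_id</code> column, and the <code>failing_rows()</code> helper are hypothetical, and only the <code>session_duration > 5</code> assertion comes from the text above.</p>

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical sample data; only the `session_duration` column name
# is taken from the report discussed above.
rows: List[Row] = [
    {"user_id": "U1", "session_duration": 12.5},
    {"user_id": "U2", "session_duration": 5.0},
    {"user_id": "U3", "session_duration": 31.0},
    {"user_id": "U4", "session_duration": 2.4},
]


def failing_rows(
    table: List[Row], column: str, assertion: Callable[[Any], bool]
) -> List[Row]:
    """Return the rows whose value in `column` does not satisfy `assertion`."""
    return [row for row in table if not assertion(row[column])]


# Assertion from the step report header: session_duration > 5.
failures = failing_rows(rows, "session_duration", lambda v: v > 5)

for row in failures:
    print(row)
```

<p>Note that a value of exactly <code>5.0</code> fails here, because the assertion is a strict greater-than comparison.</p>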
<p>These types of bespoke reports are useful for finding a needle in a haystack. Another good use for a step report is when validating a table schema. Using the <a href="https://posit-dev.github.io/pointblank/reference/Validate.col_schema_match.html"><code>col_schema_match()</code></a> validation method with a table schema prepared with the <a href="https://posit-dev.github.io/pointblank/reference/Schema.html"><span><code>Schema</code></span></a> class allows us to verify our understanding of the table structure. Here is a validation that performs a schema validation with the <code>small_table</code> dataset prepared as a DuckDB table:</p>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a schema for the target table (`small_table` as a DuckDB table)</span></span>
<p>This step fails, but the validation report table doesn’t tell us how (or where). Using <a href="https://posit-dev.github.io/pointblank/reference/Validate.get_step_report.html"><code>get_step_report()</code></a> will show us what the underlying issues are:</p>