- `~~Validate.col_count_match()`: confirms the table has the expected number of columns
- `~~Validate.row_count_match()`: verifies the table has the expected number of rows
- `~~Validate.tbl_match()`: validates that the target table matches a comparison table
- `~~Validate.data_freshness()`: checks that data is recent and not stale

These structural validations provide essential checks on the fundamental organization of your data
tables, ensuring they have the expected dimensions and components needed for reliable data analysis.

Expectations on column and row counts can be useful in certain situations and they align nicely with
schema checks.

### Validating Data Freshness

Late or missing data is one of the most common (and costly) data quality issues in production
systems. When data pipelines fail silently or experience delays, downstream analytics and ML models
can produce stale or misleading results. The `~~Validate.data_freshness()` validation method helps
catch these issues early by verifying that your data contains recent records.

Data freshness validation works by checking a datetime column against a maximum allowed age. If the
most recent timestamp in that column is older than the specified threshold, the validation fails.
This simple check can prevent major downstream problems caused by stale data.
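The rule just described can be illustrated as a minimal sketch in plain Python. This is not Pointblank's implementation. The helper `is_fresh()` is hypothetical, and the sketch rests on three assumptions: it works with naive datetimes, it measures age from the current time, and it treats an empty column as stale.

```python
from datetime import datetime, timedelta


def is_fresh(timestamps, max_age, now=None):
    """Return True if the newest timestamp is no older than max_age (hypothetical helper)."""
    # Assumption: age is measured from the current (naive, local) time unless `now` is given
    now = now if now is not None else datetime.now()
    if not timestamps:
        # Assumption: an empty column has no recent records, so treat it as stale
        return False
    # Only the most recent timestamp matters; older rows do not affect the result
    newest = max(timestamps)
    return now - newest <= max_age


now = datetime(2024, 1, 10, 12, 0)
events = [now - timedelta(hours=h) for h in (1, 6, 12, 18)]
print(is_fresh(events, timedelta(days=2), now=now))  # True: the newest event is 1 hour old
```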

Here's an example that checks that the data is no older than 2 days:

```{python}
import pointblank as pb
import polars as pl
from datetime import datetime, timedelta

# Simulate a data feed that should be updated daily
now = datetime.now()
recent_data = pl.DataFrame(
    {
        "event": ["login", "purchase", "logout", "signup"],
        "event_time": [
            now - timedelta(hours=1),
            now - timedelta(hours=6),
            now - timedelta(hours=12),
            now - timedelta(hours=18),
        ],
        "user_id": [101, 102, 103, 104],
    }
)

# The newest event is about 1 hour old, well within the 2-day limit
(
    pb.Validate(data=recent_data)
    .data_freshness(column="event_time", max_age="2d")
    .interrogate()
)
```

The `max_age=` parameter accepts a flexible string format: `"30m"` for 30 minutes, `"6h"` for 6
hours, `"2d"` for 2 days, or `"1w"` for 1 week. You can also combine units: `"1d 12h"` for 1.5 days.
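To see how such a string maps onto a duration, here is a sketch that converts it into a `datetime.timedelta`. The parser `parse_max_age()` is hypothetical and is not Pointblank's code. It only handles the four units listed above, each written as an integer followed by its suffix, with components separated by spaces. Pointblank's own parsing may accept other forms.

```python
import re
from datetime import timedelta

# Hypothetical mapping from unit suffix to the matching timedelta keyword
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_max_age(spec: str) -> timedelta:
    """Parse strings like "30m", "6h", "2d", "1w", or "1d 12h" into a timedelta."""
    parts = spec.split()
    if not parts:
        raise ValueError("empty max_age string")
    total = timedelta()
    for part in parts:
        # Each component must be digits followed by exactly one known unit letter
        match = re.fullmatch(r"(\d+)([mhdw])", part)
        if match is None:
            raise ValueError(f"unrecognized max_age component: {part!r}")
        amount, unit = match.groups()
        total += timedelta(**{_UNITS[unit]: int(amount)})
    return total


print(parse_max_age("1d 12h"))  # 1 day, 12:00:00
```

With this mapping, `"1d 12h"` becomes `timedelta(days=1, hours=12)`, which equals `timedelta(days=1.5)`.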

When validation succeeds, the report includes details about the data's age in the footer. When it
fails, you'll see exactly how old the most recent data is and what threshold was exceeded. This
context helps quickly diagnose whether you're dealing with a minor delay or a major pipeline
failure.

Data freshness validation is particularly valuable for:

- monitoring ETL pipelines to catch failures before they cascade to reports and dashboards
- validating data feeds to ensure third-party data sources are delivering as expected
- including freshness checks in automated data quality tests as part of continuous integration
- building alerting systems that trigger notifications when critical data becomes stale

You might wonder why not just use `~~Validate.col_vals_gt()` with a datetime threshold. While that
approach works, `~~Validate.data_freshness()` offers several advantages:

- the method name clearly communicates your intent
- the `max_age=` string format (e.g., `"2d"`) is more readable than datetime arithmetic
- it auto-generates meaningful validation briefs
- the report footer shows helpful context about actual data age and thresholds
- timezone mismatches between your data and comparison time are handled gracefully with
  informative warnings
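The two approaches also test different conditions. A `~~Validate.col_vals_gt()` step evaluates every row, so every timestamp has to clear the cutoff for the step to pass with default thresholds. A freshness check only looks at the most recent timestamp. The standard-library sketch below contrasts the two on a table that keeps historical records alongside one recent record. Both helpers, `all_rows_after()` and `newest_after()`, are hypothetical illustrations and are not Pointblank functions.

```python
from datetime import datetime, timedelta


def all_rows_after(timestamps, cutoff):
    # Row-wise condition (like col_vals_gt): every value must be later than the cutoff
    return all(ts > cutoff for ts in timestamps)


def newest_after(timestamps, cutoff):
    # Freshness-style condition: only the most recent value must be no older than the cutoff
    return bool(timestamps) and max(timestamps) >= cutoff


now = datetime(2024, 1, 10, 12, 0)
cutoff = now - timedelta(days=2)
# A table that keeps history: one fresh record and two much older ones
history = [now - timedelta(hours=1), now - timedelta(days=30), now - timedelta(days=90)]

print(all_rows_after(history, cutoff))  # False: the old rows fail the row-wise check
print(newest_after(history, cutoff))    # True: the newest record is only 1 hour old
```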

::: {.callout-note}
When comparing timezone-aware and timezone-naive datetimes, Pointblank will include a warning in the
validation report. For consistent results, ensure your data and comparison times use compatible
timezone settings.
:::
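The callout's advice reflects how Python's `datetime` module behaves on its own, independent of Pointblank. Subtracting a timezone-naive datetime from a timezone-aware one raises `TypeError`, so both values need a shared reference before an age can be computed. The sketch assumes naive timestamps were recorded in UTC, which you should verify against your source. `to_utc()` is a hypothetical helper. For a Polars column, the analogous step would be `dt.replace_time_zone("UTC")`.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime) -> datetime:
    # Assumption: naive timestamps were recorded in UTC; adjust if your source uses local time
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    # Convert aware timestamps to UTC so every value shares one reference
    return ts.astimezone(timezone.utc)


naive = datetime(2024, 1, 10, 11, 0)  # no tzinfo attached
aware_now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)

try:
    aware_now - naive  # mixing naive and aware datetimes raises TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")

age = aware_now - to_utc(naive)
print(age)  # 1:00:00
```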

## 4. AI-Powered Validations

AI-powered validations use Large Language Models (LLMs) to validate data based on natural language