Add support for RowCountMatch rule by joshuazexter · Pull Request #652 · awslabs/deequ

joshuazexter · 2026-01-16T22:37:28Z

Description of changes:
Added RowCountMatch:

- Comparison utility
- DQDL Rule / Executor / Translator
- Unit Tests

From

https://docs.aws.amazon.com/glue/latest/dg/dqdl.html

new
RowCountMatch compares row counts between primary and reference datasets as a ratio!

Rules=[RowCountMatch "ref" >= 0.9]

Dataset.ref.RowCountMatch -> 0.857
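
To make the ratio semantics concrete, here is a minimal Spark-free sketch. The helper `rowCountRatio` and the counts 6 and 7 are hypothetical, chosen only to land near the 0.857 shown above; the actual implementation in this PR works on DataFrames via `count()`.

```scala
object RowCountRatioSketch {
  // Hypothetical helper: primary row count divided by reference row count.
  def rowCountRatio(primaryCount: Long, referenceCount: Long): Double =
    primaryCount.toDouble / referenceCount.toDouble

  def main(args: Array[String]): Unit = {
    // Illustrative counts only: 6 primary rows vs. 7 reference rows.
    val ratio = rowCountRatio(6L, 7L)

    // Mirrors the rule `RowCountMatch "ref" >= 0.9` as a plain predicate.
    val assertion: Double => Boolean = _ >= 0.9

    println(s"ratio = $ratio, passes = ${assertion(ratio)}")
  }
}
```

With these counts the ratio is roughly 0.857, so a `>= 0.9` assertion would not pass.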

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

SamPom100

had some comments

SamPom100 · 2026-01-21T20:55:05Z

src/main/scala/com/amazon/deequ/comparison/RowCountMatch.scala

+                     assertion: Double => Boolean): ComparisonResult = {
+    val primaryCount = primary.count()
+    val referenceCount = reference.count()
+    val ratio = primaryCount.toDouble / referenceCount.toDouble


hello divide by zero 😮

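This matches AWS Glue Data Quality's RowCountMatch behavior, including the division-by-zero edge case when the reference dataset is empty. Because the counts are converted to Double before dividing, no exception is thrown: the assertion receives Infinity, or NaN when both datasets are empty.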
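
For reference, this is how the edge case plays out with plain JVM floating-point division. It is a sketch with a hypothetical `rowCountRatio` helper and says nothing about Glue itself; it only shows what a `Double => Boolean` assertion would see.

```scala
object DivideByZeroSketch {
  // Hypothetical helper mirroring the ratio line in the diff above.
  def rowCountRatio(primaryCount: Long, referenceCount: Long): Double =
    primaryCount.toDouble / referenceCount.toDouble

  def main(args: Array[String]): Unit = {
    val assertion: Double => Boolean = _ >= 0.9

    // Non-empty primary, empty reference: positive infinity.
    val emptyReference = rowCountRatio(5L, 0L)
    // Both datasets empty: 0.0 / 0.0 is NaN.
    val bothEmpty = rowCountRatio(0L, 0L)

    println(s"empty reference: isInfinity=${emptyReference.isInfinity}, passes=${assertion(emptyReference)}")
    println(s"both empty: isNaN=${bothEmpty.isNaN}, passes=${assertion(bothEmpty)}")
  }
}
```

One consequence worth keeping in mind: Infinity satisfies a `>= 0.9` assertion, while every ordered comparison against NaN is false, so the two empty-dataset cases can produce different rule outcomes.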

SamPom100 · 2026-01-21T20:59:42Z

src/main/scala/com/amazon/deequ/comparison/RowCountMatch.scala

+                     assertion: Double => Boolean): ComparisonResult = {
+    val primaryCount = primary.count()
+    val referenceCount = reference.count()
+    val ratio = primaryCount.toDouble / referenceCount.toDouble


you should round this to a couple decimal places. can you check other analyzers for precedent?

Checked other analyzers/comparisons and none round their ratios (DataSynchronization, ReferentialIntegrity, DatasetMatchAnalyzer all return raw doubles).

imo, rounding would also lose precision for assertions. Keeping consistent with existing behavior.
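
A small sketch of the precision concern, with hypothetical counts and a hypothetical `roundTo2` helper (nothing like it exists in the PR): rounding before the assertion can flip the result near a threshold.

```scala
object RoundingSketch {
  // Hypothetical helper: round a ratio to two decimal places (half-up).
  def roundTo2(value: Double): Double =
    BigDecimal(value).setScale(2, BigDecimal.RoundingMode.HALF_UP).toDouble

  def main(args: Array[String]): Unit = {
    val assertion: Double => Boolean = _ >= 0.9

    // Illustrative counts only: 8999 primary rows vs. 10000 reference rows.
    val raw = 8999L.toDouble / 10000L.toDouble
    val rounded = roundTo2(raw)

    println(s"raw=$raw passes=${assertion(raw)}")
    println(s"rounded=$rounded passes=${assertion(rounded)}")
  }
}
```

Here the raw ratio sits just below 0.9 and fails the check, while the rounded value equals 0.9 and passes, which is the kind of discrepancy that returning raw doubles avoids.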

SamPom100 · 2026-01-21T21:00:49Z

src/test/scala/com/amazon/deequ/comparison/RowCountMatchTest.scala

+import com.amazon.deequ.SparkContextSpec
+import org.scalatest.wordspec.AnyWordSpec
+
+class RowCountMatchTest extends AnyWordSpec with SparkContextSpec {


can you add unit tests for

divide by zero

both datasets have 0 rows

SamPom100 · 2026-01-21T21:02:44Z

src/main/scala/com/amazon/deequ/dqdl/execution/executors/RowCountMatchExecutor.scala

+    rules.map { rule =>
+      val outcome = additionalDataSources.get(rule.referenceDatasetAlias) match {
+        case Some(referenceDF) =>
+          val result = RowCountMatch.matchRowCounts(df, referenceDF, rule.assertion)


why do you need val result here

Addressed, good point!

joshuazexter force-pushed the master branch from 74208c1 to 89583cb Compare January 17, 2026 16:47

joshuazexter requested a review from SamPom100 January 20, 2026 21:53

SamPom100 requested changes Jan 21, 2026


Add support for RowCountMatch rule

6b780bd

joshuazexter force-pushed the master branch from 89583cb to 6b780bd Compare January 21, 2026 21:44

SamPom100 approved these changes Jan 21, 2026


joshuazexter merged commit fac5c11 into awslabs:master Jan 21, 2026
1 check passed

joshuazexter merged 1 commit into awslabs:master from joshuazexter:master
