@ABLL526 ABLL526 commented Mar 4, 2025

- Adds support for the sum of truncated values.
- Adds the aggregatedTruncTotal and absAggregatedTruncTotal Measures.

Closes #314

Release notes:
- Added the aggregatedTruncTotal and absAggregatedTruncTotal Measures.
- Added tests for these Measures.

- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
@ABLL526 ABLL526 added the enhancement (New feature or request) and Agent (Issues touching the agent part of the project) labels Mar 4, 2025
@ABLL526 ABLL526 self-assigned this Mar 4, 2025
@ABLL526 ABLL526 linked an issue Mar 4, 2025 that may be closed by this pull request

github-actions bot commented Mar 4, 2025

JaCoCo model module code coverage report - scala 2.13.11

Overall Project 56.51% 🍏

There is no coverage information present for the Files changed

github-actions bot commented Mar 4, 2025

JaCoCo agent module code coverage report - scala 2.13.11

Overall Project 78.98% -8.83% 🍏
Files changed 61.54%

| File | Coverage |
|------|----------|
| MeasuresBuilder.scala | 100% 🍏 |
| Measure.scala | 88.17% (-31.12%) |

github-actions bot commented Mar 4, 2025

JaCoCo reader module code coverage report - scala 2.13.11

Overall Project 95.16% 🍏

There is no coverage information present for the Files changed

github-actions bot commented Mar 4, 2025

JaCoCo server module code coverage report - scala 2.13.11

Overall Project 68.39% 🍏

There is no coverage information present for the Files changed

DistinctRecordCount.measureName,
SumOfValuesOfColumn.measureName,
AbsSumOfValuesOfColumn.measureName,
SumOfTruncatedValuesOfColumn.measureName,

Collaborator

val supportedMeasureNames is not used

Contributor Author

I only added the new names to it; the val itself may be dead code. I will comment it out and rerun the tests.

def apply(measuredCol: String): AbsSumOfValuesOfColumn = AbsSumOfValuesOfColumn(measureName, measuredCol)
}

case class SumOfTruncatedValuesOfColumn private (measureName: String, measuredCol: String) extends AtumMeasure {

Collaborator

Why did you decide to use the double-casting approach instead of the standard functions from org.apache.spark.sql.functions to round the values before summing them?

  /**
   * Returns the value of the column `e` rounded to 0 decimal places with HALF_UP round mode.
   *
   * @group math_funcs
   * @since 1.5.0
   */
  def round(e: Column): Column = round(e, 0)

  /**
   * Round the value of `e` to `scale` decimal places with HALF_UP round mode
   * if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
   *
   * @group math_funcs
   * @since 1.5.0
   */
  def round(e: Column, scale: Int): Column = withExpr { Round(e.expr, Literal(scale)) }

  /**
   * Returns the value of the column `e` rounded to 0 decimal places with HALF_EVEN round mode.
   *
   * @group math_funcs
   * @since 2.0.0
   */
  def bround(e: Column): Column = bround(e, 0)

  /**
   * Round the value of `e` to `scale` decimal places with HALF_EVEN round mode
   * if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
   *
   * @group math_funcs
   * @since 2.0.0
   */
  def bround(e: Column, scale: Int): Column = withExpr { BRound(e.expr, Literal(scale)) }
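
To make the difference between the two rounding modes quoted above concrete, here is a minimal plain-Scala sketch that needs no Spark. It is only an illustration: the object `RoundingModesSketch` and its helper names are hypothetical, and it uses `scala.math.BigDecimal` rather than Spark columns. It also shows `RoundingMode.DOWN`, which drops the decimals instead of rounding.

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical illustration of the rounding modes named in the quoted Scaladoc.
object RoundingModesSketch {
  // Reduce a value to zero decimal places using the given rounding mode.
  def toWhole(value: BigDecimal, mode: RoundingMode.RoundingMode): BigDecimal =
    value.setScale(0, mode)

  // HALF_UP is the mode documented for round.
  def halfUp(value: BigDecimal): BigDecimal = toWhole(value, RoundingMode.HALF_UP)

  // HALF_EVEN is the mode documented for bround.
  def halfEven(value: BigDecimal): BigDecimal = toWhole(value, RoundingMode.HALF_EVEN)

  // DOWN discards the fractional part, i.e. truncates towards zero.
  def truncate(value: BigDecimal): BigDecimal = toWhole(value, RoundingMode.DOWN)

  def main(args: Array[String]): Unit =
    Seq("-1.7", "-1.5", "1.5", "1.7", "2.5").map(BigDecimal(_)).foreach { v =>
      println(s"$v: HALF_UP=${halfUp(v)} HALF_EVEN=${halfEven(v)} DOWN=${truncate(v)}")
    }
}
```

The two half modes only disagree on exact ties such as 2.5, while both differ from truncation whenever the fractional part is 0.5 or larger in magnitude.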

Contributor Author

Thank you. The issue mentioned that this method was used in ATUM, but I will change it accordingly, since I think the approach you mentioned is more correct.

Contributor Author

Unfortunately, the round function does not do what we need here, negative numbers included: it rounds to the nearest integer, whereas we need a truncation function that simply drops the decimals. I have found another way that ensures proper functionality without resorting to a double cast.

Collaborator

Yup, a combination of floor and ceil should work fine:

scala> df.show
+-----+
|value|
+-----+
| -1.1|
|  1.1|
| -1.7|
|  1.7|
| -0.8|
|  0.8|
+-----+


scala> df.select(when(col("value") >= 0, floor(col("value"))).otherwise(ceil(col("value")))).show
+-------------------------------------------------------------+
|CASE WHEN (value >= 0) THEN FLOOR(value) ELSE CEIL(value) END|
+-------------------------------------------------------------+
|                                                           -1|
|                                                            1|
|                                                           -1|
|                                                            1|
|                                                            0|
|                                                            0|
+-------------------------------------------------------------+
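
The same rule can be checked outside Spark. The following is a small plain-Scala sketch, not the code merged in this pull request; the object `TruncateTowardsZeroSketch` and its members are hypothetical names. It mirrors the `when`/`floor`/`ceil` expression on ordinary `Double` values, using the six sample values from the session above.

```scala
// Hypothetical plain-Scala mirror of the when/floor/ceil column expression.
object TruncateTowardsZeroSketch {
  // Floor for non-negative values, ceil for negative ones: both drop the decimals.
  def truncate(value: Double): Long =
    if (value >= 0) math.floor(value).toLong else math.ceil(value).toLong

  // The sample values shown in the spark-shell session.
  val samples: Seq[Double] = Seq(-1.1, 1.1, -1.7, 1.7, -0.8, 0.8)

  def main(args: Array[String]): Unit =
    samples.foreach(v => println(s"$v -> ${truncate(v)}"))
}
```

On the JVM, `Double.toLong` also truncates towards zero, so for these samples `truncate(v)` and `v.toLong` agree.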

- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
- Made amendments to the function to not include double casts.
}

override def measuredColumns: Seq[String] = Seq(measuredCol)
override val resultValueType: ResultValueType = ResultValueType.BigDecimalValue

Collaborator

ResultValueType.LongValue?

Contributor Author

Yes, I like that. Good catch. Thank you.
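
As a rough illustration of why an integral result type fits here: once every value has been truncated, the total is a whole number. The sketch below is plain Scala with hypothetical names (`TruncatedTotalsSketch` and its methods borrow the measure names only as labels). It also assumes that the absolute variant sums the absolute values of the truncated numbers, which this thread does not spell out.

```scala
// Hypothetical sketch of the two totals on plain Scala collections, not the merged measure code.
object TruncatedTotalsSketch {
  // Drop the decimals: floor for non-negative values, ceil for negative ones.
  def truncate(value: Double): Long =
    if (value >= 0) math.floor(value).toLong else math.ceil(value).toLong

  // Sum of the truncated values; the result is integral, so a Long can hold it
  // (overflow for extremely large inputs is ignored in this sketch).
  def aggregatedTruncTotal(values: Seq[Double]): Long =
    values.map(truncate).sum

  // Assumption: the absolute variant sums the absolute values of the truncated numbers.
  def absAggregatedTruncTotal(values: Seq[Double]): Long =
    values.map(v => math.abs(truncate(v))).sum

  def main(args: Array[String]): Unit = {
    val samples = Seq(-1.1, 1.1, -1.7, 1.7, -0.8, 0.8)
    println(s"aggregatedTruncTotal=${aggregatedTruncTotal(samples)}")
    println(s"absAggregatedTruncTotal=${absAggregatedTruncTotal(samples)}")
  }
}
```

Because truncation towards zero commutes with taking the absolute value, it does not matter in this sketch whether the absolute value is applied before or after truncating.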

- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
- Made amendments to the function to not include double casts.
- Changed the result from BigDecimal to LongValue.

Collaborator

@salamonpavel salamonpavel left a comment

LGTM

@ABLL526 ABLL526 merged commit 07e4b4f into master Mar 12, 2025
9 checks passed
@ABLL526 ABLL526 deleted the 314-add-support-for-sum-of-truncated-values branch March 12, 2025 15:24

Labels: Agent (Issues touching the agent part of the project), enhancement (New feature or request)

Projects: Status: Done

Development: Successfully merging this pull request may close these issues: Add support for sum of truncated values

2 participants