#776 Add COMP-3 decoders for the EBCDIC writer #778
Conversation
Walkthrough
Adds packed BCD (COMP-3/COMP-3U) encoding and encoder selection, relocates setPrimitiveField to the Copybook companion object, updates writer integration and diagnostics, and introduces unit tests covering BCD encoding and writer byte-level verification.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant DF as Spark DataFrame Row
participant BRC as BasicRecordCombiner
participant ES as EncoderSelector
participant BCD as BCDNumberEncoders
participant CB as Copybook (object)
DF->>BRC: iterate fields & values
BRC->>ES: getEncoder(field metadata)
ES-->>ES: detect COMP-3 / COMP-3U case
ES->>BCD: encodeBCDNumber(value, precision, scale, scaleFactor, signed, mandatorySignNibble)
BCD-->>ES: Array[Byte] (packed BCD)
ES-->>BRC: encoder -> bytes
BRC->>CB: Copybook.setPrimitiveField(field, recordBytes, bytes, offset)
CB-->>BRC: updated recordBytes
BRC-->>DF: produce row bytes for writer
note right of BCD: New packed-decimal encoding path
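Note: to make the new packed-decimal path easier to follow, here is a minimal, self-contained sketch of how COMP-3 packing works in general. It is an illustration of the technique only — the helper name packDecimal is hypothetical, and the real BCDNumberEncoders additionally handles precision, scale, scaleFactor and rounding.

// A minimal sketch of packed-decimal (COMP-3) packing, for illustration only.
// Each decimal digit occupies one nibble. Signed values end with a sign nibble
// (0xC for positive, 0xD for negative); unsigned COMP-3U values carry digits only.
def packDecimal(digits: String, negative: Boolean, signNibble: Boolean): Array[Byte] = {
  val sign    = if (!signNibble) "" else if (negative) "D" else "C"
  val nibbles = digits + sign
  // Left-pad with a zero nibble so the nibble count is even (one byte = two nibbles).
  val padded  = if (nibbles.length % 2 == 0) nibbles else "0" + nibbles
  padded.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
}

packDecimal("12345", negative = true,  signNibble = true)   // 0x12 0x34 0x5D  (signed COMP-3)
packDecimal("12345", negative = false, signNibble = false)  // 0x01 0x23 0x45  (unsigned COMP-3U)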
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
JaCoCo code coverage report - 'cobol-parser'
JaCoCo code coverage report - 'spark-cobol'
Actionable comments posted: 0
🧹 Nitpick comments (10)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (1)
37-38: Minor formatting improvement in pattern matching. The pattern matching for Group filtering is formatted consistently with the rest of the codebase.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (6)
25-26: Scaladoc formula and param phrasing are misleading for even precision with a sign nibble
- The array length is not always (precision + 1) / 2. For even precision with a mandatory sign nibble, an extra pad nibble is required, making length = (precision + 2) / 2. For unsigned (no sign nibble) it’s ceil(precision / 2).
- “before of after the decimal point” is a typo; also clarify that scaleFactor is applied relative to scale via movePointLeft(scaleFactor - scale).
Apply this comment-only diff to tighten the docs:
- * The length of the output array is determined by the formula: (precision + 1) / 2.
+ * Output length (bytes):
+ *   - Unsigned (no sign nibble): ceil(precision / 2).
+ *   - Signed (mandatory sign nibble): ceil((precision + 1) / 2).
+ * Note: for even precision a pad nibble precedes the trailing sign nibble.
@@
- * @param scale       A decimal scale if a number is a decimal. Should be greater or equal to zero.
- * @param scaleFactor Additional zeros to be added before of after the decimal point.
+ * @param scale       Decimal scale of the PIC (>= 0).
+ * @param scaleFactor Applied relative to `scale` as movePointLeft(scaleFactor - scale).
+ *                    Positive values shift the decimal point left, negative – right.

Also applies to: 30-35
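As a quick illustration of the corrected length rules, both cases reduce to a single ceiling formula. The helper below is a hypothetical sketch, not code from the PR:

// Hypothetical helper illustrating the byte-length rules described above.
def comp3ByteLength(precision: Int, hasSignNibble: Boolean): Int = {
  val nibbles = if (hasSignNibble) precision + 1 else precision
  (nibbles + 1) / 2  // ceil(nibbles / 2); even precision with a sign gets a pad nibble
}

comp3ByteLength(4, hasSignNibble = true)   // 3 bytes: 4 digit nibbles + pad nibble + sign nibble
comp3ByteLength(5, hasSignNibble = false)  // 3 bytes: 5 digit nibbles + pad nibble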
42-44: Validate input invariants early (scale, precision) for fail-fast behavior
Precision check is good. Consider also guarding against invalid scale and extremely large precision to fail fast with a clear message, rather than silently returning zeros later.
  def encodeBCDNumber(number: java.math.BigDecimal, precision: Int, scale: Int, scaleFactor: Int, signed: Boolean, mandatorySignNibble: Boolean): Array[Byte] = {
-    if (precision < 1)
+    if (precision < 1)
       throw new IllegalArgumentException(s"Invalid BCD precision=$precision, should be greater than zero.")
+    if (scale < 0)
+      throw new IllegalArgumentException(s"Invalid BCD scale=$scale, should be >= 0.")
+    // Optional: place an upper bound to prevent pathological allocations
+    if (precision > 38)
+      throw new IllegalArgumentException(s"Unsupported BCD precision=$precision, max supported is 38.")
45-52: totalDigits computation is correct; consider extracting for reuse and readability
The parity handling for the sign nibble and pad nibble is correct. Encapsulating this in a small helper improves readability and reduces mental load.
-    val totalDigits = if (mandatorySignNibble) {
-      if (precision % 2 == 0) precision + 2 else precision + 1
-    } else {
-      if (precision % 2 == 0) precision else precision + 1
-    }
+    def totalNibbles(precision: Int, hasSignNibble: Boolean): Int = {
+      val even = (precision & 1) == 0
+      if (hasSignNibble) if (even) precision + 2 else precision + 1
+      else if (even) precision else precision + 1
+    }
+    val totalDigits = totalNibbles(precision, mandatorySignNibble)
58-62: Normalize once and avoid scientific notation by using toPlainString
Two minor improvements:
- Compute the shift once.
- Use toPlainString to avoid accidental scientific notation (defensive, even though scale=0 typically prints plain).
-    val integralNumberStr = if (scaleFactor - scale == 0)
-      number.setScale(0, RoundingMode.HALF_DOWN).toString
-    else
-      number.movePointLeft(scaleFactor - scale).setScale(0, RoundingMode.HALF_DOWN).toString
+    val shift = scaleFactor - scale
+    val shifted = if (shift == 0) number else number.movePointLeft(shift)
+    val integralNumberStr = shifted.setScale(0, RoundingMode.HALF_DOWN).toPlainString
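For context, the toString/toPlainString difference on a plain java.math.BigDecimal (value chosen purely for illustration):

// toString may keep scientific notation; toPlainString never does.
val d = new java.math.BigDecimal("1E+3")
d.toString       // "1E+3"
d.toPlainString  // "1000"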
63-69: Determine sign numerically to avoid string-based edge cases (-0, locale)
Using String prefix works here, but a numeric sign check after rounding is more robust and avoids surprises with “-0”.
-    val isNegative = integralNumberStr.startsWith("-")
-    val digitsOnly = integralNumberStr.stripPrefix("-").stripPrefix("+")
+    val rounded = new java.math.BigDecimal(integralNumberStr) // scale 0 already
+    val isNegative = rounded.signum() < 0
+    val digitsOnly = integralNumberStr.stripPrefix("-")

Also applies to: 101-103
76-89: Overflow-to-zero behavior is by design — consider an opt-in strict mode
When digitsOnly.length > precision, returning zeros is consistent with other branches. For auditability, a strict mode (throwing or logging a structured warning) can help detect data truncation early in dev/test while keeping prod permissive.
I can wire an optional strict: Boolean = false param (defaulting to current behavior) and add tests. Interested?
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (3)
118-123: Add tie-breaking tests to lock in HALF_DOWN semantics (+1.5/-1.5)
Given the encoder uses HALF_DOWN, add explicit tie cases to prevent regressions (it’s easy for someone to “optimize” rounding later).
@@
   "decimal number" when {
+    "rounds half-down for +1.5" in {
+      // shift = 0 => scale to 0 using HALF_DOWN: 1.5 -> 1
+      val expected = Array[Byte](0x1C.toByte)
+      val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal("1.5"), 1, 0, 0, signed = true, mandatorySignNibble = true)
+      checkExpected(actual, expected)
+    }
+
+    "rounds half-down for -1.5" in {
+      // shift = 0 => scale to 0 using HALF_DOWN: -1.5 -> -1
+      val expected = Array[Byte](0x1D.toByte)
+      val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal("-1.5"), 1, 0, 0, signed = true, mandatorySignNibble = true)
+      checkExpected(actual, expected)
+    }

Also applies to: 153-158
95-101: Consider asserting on null handling explicitly
The production code returns zero bytes for null; add a test to make this behavior explicit.
@@
   "attempt to encode a number with an incorrect precision with sign nibble" in {
     val expected = Array[Byte](0x00, 0x00, 0x00)
     val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal(12345), 4, 0, 0, signed = true, mandatorySignNibble = true)
     checkExpected(actual, expected)
   }
+
+  "encode null as zeros" in {
+    val expected = Array[Byte](0x00, 0x00, 0x00)
+    val actual = BCDNumberEncoders.encodeBCDNumber(null, 5, 0, 0, signed = true, mandatorySignNibble = true)
+    checkExpected(actual, expected)
+  }

Also applies to: 102-107
190-198: Nit: toHex rendering is solid; consider including precision/scale in the failure message
Minor ergonomics improvement for debugging failures.
-      fail(s"Actual: $actualHex\nExpected: $expectedHex")
+      fail(s"Actual: $actualHex\nExpected: $expectedHex")
If you’d like, I can update call sites to pass context (precision/scale/flags) and include it here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (3 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (4 hunks)
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/FieldSizeSpec.scala (1)
scale(63-74)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (2)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/BinaryFileFixture.scala (1)
withTempDirectory (71-78)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2)
write (49-53), close (54-56)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Usage.scala (2)
COMP3 (30-32), COMP3U (34-36)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Decimal.scala (1)
Decimal (32-63)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Integral.scala (1)
Integral (30-40)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (2)
BCDNumberEncoders (21-108), encodeBCDNumber (36-106)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (2)
BCDNumberEncoders (21-108), encodeBCDNumber (36-106)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (3)
Copybook (28-347), Copybook (350-446), setPrimitiveField (427-445)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (2)
Primitive (42-140), toString (67-69)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/AlphaNumeric.scala (1)
AlphaNumeric (28-36)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Decimal.scala (1)
Decimal (32-63)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Integral.scala (1)
Integral (30-40)
Integral(30-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.5.5 on Scala 2.12.20
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.5 on Scala 2.13.16
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (19)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2)
28-29: LGTM! Clean refactoring implementation. The import statement properly enables unqualified access to the companion object's methods after the setPrimitiveField method relocation.
414-445: Well-structured method relocation to companion object. The setPrimitiveField method has been properly moved from the instance to the companion object, maintaining the same signature and functionality. The method includes appropriate boundary checks, size validation, and error handling for field encoding operations.
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3)
219-219: Test assertion updated to reflect COMP-3 encoder availability. The test now correctly expects a non-empty encoder for the COMP-3 field, validating that the new BCD encoding support is properly integrated.
233-233: Consistent test validation for COMP-3 encoder. The assertion properly verifies encoder availability for COMP-3 fields in the padding test scenario.
247-247: Consistent test validation continues for truncation scenario. The assertion maintains consistency with other test cases by verifying COMP-3 encoder availability.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (1)
129-184: Comprehensive COMP-3 encoding test with precise byte-level validation. This test provides excellent coverage for the new COMP-3/COMP-3U functionality by:
- Testing both signed (COMP-3) and unsigned (COMP-3U) packed decimal fields
- Including various numeric types (Int, Double, BigDecimal) and both positive/negative values
- Performing exact byte-level verification against expected EBCDIC encoding
- Including diagnostic output for debugging failed assertions
The test data covers edge cases like decimal precision, scale factors, and mixed sign scenarios, ensuring robust validation of the BCD encoding implementation.
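For readers decoding the expected byte arrays, here is one illustrative packed value (a generic COMP-3 example, not a row taken from the suite's test data):

// PIC S9(3)V99 COMP-3 holding -123.45: digit nibbles 1,2,3,4,5 followed by
// the negative sign nibble 0xD, packed into three bytes.
val expectedMinus123_45 = Array[Byte](0x12, 0x34, 0x5D.toByte)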
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (3)
19-19: Proper imports added for COMP-3 support. The new imports correctly include the required types (COMP3, COMP3U, Decimal, Integral) to support BCD encoding functionality.
34-41: Well-implemented COMP-3/COMP-3U encoder selection logic. The pattern matching correctly handles both Integral and Decimal types with COMP-3/COMP-3U compact metadata. The logic properly:
- Differentiates between signed (COMP-3) and unsigned (COMP-3U) variants
- Sets appropriate parameters for precision, scale, scaleFactor, and sign handling
- Uses mandatorySignNibble = true for COMP-3 and false for COMP-3U as per BCD standards
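A simplified sketch of that selection idea is shown below. The PackedUsage stand-in types and the selectPackedEncoder helper are hypothetical and far thinner than the real Integral/Decimal AST nodes, but the flag combinations match the description above:

import za.co.absa.cobrix.cobol.parser.encoding.BCDNumberEncoders

// Hypothetical stand-ins for the parser's usage metadata (illustration only).
sealed trait PackedUsage
case object Comp3  extends PackedUsage  // signed packed decimal (COMP-3)
case object Comp3U extends PackedUsage  // unsigned packed decimal (COMP-3U)

def selectPackedEncoder(usage: PackedUsage, precision: Int, scale: Int, scaleFactor: Int): java.math.BigDecimal => Array[Byte] =
  usage match {
    case Comp3  => v => BCDNumberEncoders.encodeBCDNumber(v, precision, scale, scaleFactor, signed = true,  mandatorySignNibble = true)
    case Comp3U => v => BCDNumberEncoders.encodeBCDNumber(v, precision, scale, scaleFactor, signed = false, mandatorySignNibble = false)
  }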
91-111: Robust BCD encoder implementation with proper validation. The getBdcEncoder method includes:
- Proper parameter validation ensuring signed fields require mandatory sign nibbles
- Comprehensive type conversion handling for various numeric types (BigDecimal, BigInteger, primitives)
- Fallback to string conversion for unknown types
- Clean delegation to the BCDNumberEncoders utility
The type conversion logic appropriately handles null values and covers all common numeric types that might be encountered in practice.
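For illustration, the value-normalization step typically looks something like the sketch below; the helper name toJavaBigDecimal and the exact set of cases are assumptions, not the actual getBdcEncoder code:

// Hypothetical sketch of converting incoming row values to java.math.BigDecimal.
def toJavaBigDecimal(value: Any): java.math.BigDecimal = value match {
  case null                     => null
  case d: java.math.BigDecimal  => d
  case d: scala.math.BigDecimal => d.bigDecimal
  case i: java.math.BigInteger  => new java.math.BigDecimal(i)
  case n: Long                  => java.math.BigDecimal.valueOf(n)
  case n: Int                   => java.math.BigDecimal.valueOf(n.toLong)
  case n: Double                => java.math.BigDecimal.valueOf(n)
  case other                    => new java.math.BigDecimal(other.toString)  // string fallback
}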
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (6)
21-21: Import added for companion object access. The import enables usage of the relocated setPrimitiveField method from the Copybook companion object.
25-25: Additional imports for enhanced field definition formatting. The imports support the new getFieldDefinition helper method for improved error message formatting.
31-31: Clean companion object import pattern. The import statement allows unqualified access to companion object methods within the class.
48-51: Improved error messages with detailed field definitions. The error message now includes more descriptive field information using the new getFieldDefinition helper, making it easier for users to understand which field types are not yet supported.
74-74: Updated call to use companion object method. The call correctly uses Copybook.setPrimitiveField instead of the instance method, aligning with the method relocation.
120-132: Well-designed companion object with useful helper method. The getFieldDefinition method provides a clean way to format field definitions for error messages by:
- Preferring originalPic over pic for more accurate representation
- Including usage information for numeric types (Integral/Decimal)
- Handling non-numeric types gracefully
- Returning a properly formatted string with trimmed whitespace
This enhances the user experience by providing more informative error messages when unsupported field types are encountered.
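A rough sketch of the formatting idea, with a hypothetical function name and parameters (the real helper reads originalPic/pic and the usage directly from the parsed AST field):

// Hypothetical illustration of building a human-readable field definition for error messages.
def formatFieldDefinition(name: String, pic: String, usage: Option[String]): String =
  s"$name $pic ${usage.getOrElse("")}".trim

formatFieldDefinition("AMOUNT", "PIC S9(7)V99", Some("COMP-3"))  // "AMOUNT PIC S9(7)V99 COMP-3"
formatFieldDefinition("NAME", "PIC X(20)", None)                 // "NAME PIC X(20)"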
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (3)
21-35: Good addition: focused, dependency-free COMP-3 encoder with clear parameters
The overall structure and responsibilities are clean. The encoder is self-contained, free of external deps, and the parameter list maps well to COBOL semantics. Nice job.
54-56: Null input returns zero bytes — confirm writer semantics
Returning zero-filled bytes for null aligns with tests in this PR. Please confirm this matches the writer contract elsewhere (e.g., BasicRecordCombiner and Copybook.setPrimitiveField) so nulls don’t silently mask data issues in “strict” pipelines.
Would you like a “strict mode” switch (throw on null/overflow) at the encoder level, defaulting to current permissive behavior?
79-89: Padding and packing logic look correct for all parity/sign cases
- The “+ '0' placeholder nibble” and subsequent sign-nibble overwrite are correct and match COMP-3 expectations.
- Loop packs two digits per byte with no off-by-one risks.
Also applies to: 93-99, 101-103
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)
22-107: Comprehensive suite covering signed/unsigned, scale/scaleFactor, and overflow paths
Great coverage across integral and decimal cases, including negative handling and size mismatches. The byte-level assertions with diagnostic hex are especially useful.
Also applies to: 110-187
Closes #776
Summary by CodeRabbit
New Features
Refactor
Tests