Commit 9cb4a42

#795 Add the description for the new feature to README.
1 parent 9ee551c commit 9cb4a42

1 file changed: +34 -1 lines changed

README.md

Lines changed: 34 additions & 1 deletion
@@ -1662,7 +1662,8 @@ The output looks like this:
`common_extended`, `cp037_extended` are code pages supporting non-printable characters that convert to ASCII codes below 32.

## EBCDIC Processor (experimental)
The EBCDIC processor allows processing files by replacing values of fields without changing the underlying format (`CobolProcessingStrategy.InPlace`),
or by converting the input to the variable-record-length format with big-endian RDWs (`CobolProcessingStrategy.ToVariableLength`).

The processing does not require Spark. A processing application can have just the COBOL parser as a dependency (`cobol-parser`).

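For example, with sbt this boils down to a single dependency (a sketch; the version string is a placeholder, substitute the Cobrix release you use):

```scala
// build.sbt: only the parser module is needed for Spark-less processing.
// "<cobrix-version>" is a placeholder, not a real version.
libraryDependencies += "za.co.absa.cobrix" %% "cobol-parser" % "<cobrix-version>"
```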
@@ -1676,6 +1677,7 @@ val builder = CobolProcessor.builder(copybookContents)

val builder = CobolProcessor.builder
  .withCopybookContents("...some copybook...")
  .withProcessingStrategy(CobolProcessingStrategy.InPlace) // Or CobolProcessingStrategy.ToVariableLength

val processor = new RawRecordProcessor {
  override def processRecord(record: Array[Byte], ctx: CobolProcessorContext): Array[Byte] = {
@@ -1699,6 +1701,7 @@ import za.co.absa.cobrix.cobol.processor.{CobolProcessor, CobolProcessorContext}

val count = CobolProcessor.builder
  .withCopybookContents(copybook)
  .withProcessingStrategy(CobolProcessingStrategy.InPlace) // Or CobolProcessingStrategy.ToVariableLength
  .withRecordProcessor { (record: Array[Byte], ctx: CobolProcessorContext) =>
    // The transformation logic goes here
    val value = copybook.getFieldValueByName("some_field", record, 0)
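The transformation itself is ordinary byte manipulation. A minimal sketch of an in-place update of a fixed-width `PIC X` field follows; the `overwriteField` helper is illustrative, not part of the Cobrix API, and the offset and length values would come from the parsed copybook:

```scala
// Illustrative helper, not Cobrix API: overwrite a fixed-width field in a copy of the record.
def overwriteField(record: Array[Byte], offset: Int, length: Int, newValue: Array[Byte]): Array[Byte] = {
  val updated = record.clone()                                         // leave the input record untouched
  java.util.Arrays.fill(updated, offset, offset + length, 0x40.toByte) // pad with EBCDIC spaces (0x40)
  System.arraycopy(newValue, 0, updated, offset, math.min(length, newValue.length))
  updated                                                              // same length, so the record format is unchanged
}
```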
@@ -1726,6 +1729,7 @@ val copybookContents = "...some copybook..."

SparkCobolProcessor.builder
  .withCopybookContents(copybook)
  .withProcessingStrategy(CobolProcessingStrategy.InPlace) // Or CobolProcessingStrategy.ToVariableLength
  .withRecordProcessor { (record: Array[Byte], ctx: CobolProcessorContext) =>
    // The transformation logic goes here
    val value = ctx.copybook.getFieldValueByName("some_field", record, 0)
@@ -1740,6 +1744,35 @@ SparkCobolProcessor.builder
  .save(outputPath)
```

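If `ToVariableLength` was chosen, the output can be read back with the regular Cobrix Spark reader; a sketch (assumes a `spark` session in scope, with `copybookContents` and `outputPath` from the example above):

```scala
// Read the processed file back as a DataFrame; "V" means RDW-prefixed variable-length records.
val df = spark.read
  .format("cobol")
  .option("copybook_contents", copybookContents)
  .option("record_format", "V")
  .load(outputPath)
df.show(false)
```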
## EBCDIC Spark raw record RDD generator (experimental)
You can process raw records of a mainframe file as an `RDD[Array[Byte]]`. This can be useful for custom processing without converting
to Spark data types. You can still access fields via parsed copybooks.

Example:
```scala
import org.apache.spark.rdd.RDD
import za.co.absa.cobrix.spark.cobol.SparkCobolProcessor

val copybookContents = "...some copybook..."

val rddBuilder = SparkCobolProcessor.builder
  .withCopybookContents(copybookContents)
  .option("record_format", "F")
  .load("s3://bucket/some/path")

// Fetch the parsed copybook and the RDD separately
val copybook = rddBuilder.getParsedCopybook
val rdd: RDD[Array[Byte]] = rddBuilder.toRDD

val segmentIds: RDD[String] = rdd.map { record =>
  copybook.getFieldValueByName("SEGMENT_ID", record, 0).toString
}

// Print the list of unique segments
segmentIds.distinct.collect.sorted.foreach(println)
```
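Building on the snippet above, raw records can also be filtered by segment before any further processing; a hedged example (the segment id value `"C"` is hypothetical):

```scala
// Keep only records of one segment for downstream processing (illustrative segment id).
val companyRecords: RDD[Array[Byte]] = rdd.filter { record =>
  copybook.getFieldValueByName("SEGMENT_ID", record, 0).toString == "C"
}
println(s"Company records: ${companyRecords.count()}")
```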
## EBCDIC Writer (experimental)

Cobrix's EBCDIC writer is an experimental feature that allows writing Spark DataFrames as EBCDIC mainframe files.
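As a hedged illustration, assuming the writer registers under the same `cobol` format name as the reader and accepts the copybook via `copybook_contents` (both are assumptions, not confirmed by this excerpt), writing a DataFrame might look like:

```scala
// Hedged sketch only: the format name and options below are assumptions, not confirmed API.
df.write
  .format("cobol")
  .option("copybook_contents", copybookContents)
  .mode("overwrite")
  .save("/path/to/output_ebcdic_file")
```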
