@@ -1662,7 +1662,8 @@ The output looks like this:
`common_extended` and `cp037_extended` are code pages that support non-printable characters, which are converted to ASCII codes below 32.

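The snippet below is a minimal sketch of selecting one of these code pages when loading a file with `spark-cobol`. It assumes an existing `SparkSession` named `spark`; the copybook text and the input path are placeholders:

```scala
// A sketch of choosing an extended code page when reading an EBCDIC file.
// The copybook contents and the input path are placeholders.
val copybookContents = "...some copybook..."

val df = spark.read
  .format("cobol")
  .option("copybook_contents", copybookContents)
  .option("ebcdic_code_page", "cp037_extended")
  .load("data/example.dat")
```
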
## EBCDIC Processor (experimental)
- The EBCDIC processor allows processing files by replacing value of fields without changing the underlying format.
+ The EBCDIC processor allows processing files either by replacing field values in place without changing the underlying format (`CobolProcessingStrategy.InPlace`)
+ or by converting the input to the variable-record-length format with big-endian RDWs (`CobolProcessingStrategy.ToVariableLength`).

The processing does not require Spark. A processing application can have only the COBOL parser as a dependency (`cobol-parser`).

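A standalone (non-Spark) processing application therefore only needs the parser artifact on the classpath. A sketch of the corresponding sbt dependency; the version is a placeholder to replace with the Cobrix release you use:

```scala
// build.sbt (sketch): only the COBOL parser is required for standalone processing.
// Replace <version> with an actual Cobrix version.
libraryDependencies += "za.co.absa.cobrix" %% "cobol-parser" % "<version>"
```
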
@@ -1676,6 +1677,7 @@ val builder = CobolProcessor.builder(copybookContents)

val builder = CobolProcessor.builder
  .withCopybookContents("...some copybook...")
+   .withProcessingStrategy(CobolProcessingStrategy.InPlace) // Or CobolProcessingStrategy.ToVariableLength

val processor = new RawRecordProcessor {
  override def processRecord(record: Array[Byte], ctx: CobolProcessorContext): Array[Byte] = {
@@ -1699,6 +1701,7 @@ import za.co.absa.cobrix.cobol.processor.{CobolProcessor, CobolProcessorContext}

val count = CobolProcessor.builder
  .withCopybookContents(copybook)
+   .withProcessingStrategy(CobolProcessingStrategy.InPlace) // Or CobolProcessingStrategy.ToVariableLength
  .withRecordProcessor { (record: Array[Byte], ctx: CobolProcessorContext) =>
    // The transformation logic goes here
    val value = copybook.getFieldValueByName("some_field", record, 0)
@@ -1726,6 +1729,7 @@ val copybookContents = "...some copybook..."

SparkCobolProcessor.builder
  .withCopybookContents(copybook)
+   .withProcessingStrategy(CobolProcessingStrategy.InPlace) // Or CobolProcessingStrategy.ToVariableLength
  .withRecordProcessor { (record: Array[Byte], ctx: CobolProcessorContext) =>
    // The transformation logic goes here
    val value = ctx.copybook.getFieldValueByName("some_field", record, 0)
@@ -1740,6 +1744,35 @@ SparkCobolProcessor.builder
  .save(outputPath)
```

+ ## EBCDIC Spark raw record RDD generator (experimental)
+ You can process raw records of a mainframe file as an `RDD[Array[Byte]]`. This can be useful for custom processing without converting
+ to Spark data types. You can still access fields via the parsed copybook.
+
+ Example:
+ ```scala
+ import org.apache.spark.rdd.RDD
+ import za.co.absa.cobrix.spark.cobol.SparkCobolProcessor
+
+ val copybookContents = "...some copybook..."
+
+ val rddBuilder = SparkCobolProcessor.builder
+   .withCopybookContents(copybookContents)
+   .option("record_format", "F")
+   .load("s3://bucket/some/path")
+
+ // Fetch the parsed copybook and the RDD separately
+ val copybook = rddBuilder.getParsedCopybook
+ val rdd: RDD[Array[Byte]] = rddBuilder.toRDD
+
+ val segmentIds: RDD[String] = rdd.map { record =>
+   val seg = copybook.getFieldValueByName("SEGMENT_ID", record, 0).toString
+   seg
+ }
+
+ // Print the list of unique segments
+ segmentIds.distinct.collect.sorted.foreach(println)
+ ```
+
## EBCDIC Writer (experimental)

Cobrix's EBCDIC writer is an experimental feature that allows writing Spark DataFrames as EBCDIC mainframe files.
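As an illustration only, here is a minimal sketch of what writing might look like, assuming the writer is exposed through the same `cobol` data source name and `copybook_contents` option as the reader (`df` and `copybookContents` are placeholders); check the writer options documented for your Cobrix version for the authoritative API:

```scala
// A sketch only: assumes the writer accepts the same "cobol" format name and
// "copybook_contents" option as the reader; verify against the documented writer options.
df.write
  .format("cobol")
  .option("copybook_contents", copybookContents)
  .save("/some/output/path")
```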