---
title: Parse data transformation in mapping data flow
description: Parse embedded column documents
author: kromerm
ms.author: makromer
ms.service: data-factory
ms.topic: conceptual
ms.date: 02/08/2021
---

# Parse transformation in mapping data flow

[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]

Use the Parse transformation to parse text columns in your data that contain embedded documents, such as a string column that holds a JSON structure. The currently supported types of embedded documents are JSON and delimited text.

## Configuration

In the Parse transformation configuration panel, first select the type of data contained in the columns that you want to parse inline. The Parse transformation also has the following configuration settings.

### Column

As with the Derived Column and Aggregate transformations, this is where you either modify an existing column by selecting it from the drop-down picker or type in the name of a new column. ADF stores the parsed source data in this column.

### Expression

Use the expression builder to set the source for your parsing. The source can be as simple as the column that holds the self-contained data you want to parse, or it can be a more complex expression that produces the text to parse.

### Output column type

Here you configure the target schema of the parsed output. The parsed result is written into the single column that you set under Column.

In this example, the incoming field `jsonString` is plain text formatted as a JSON structure. The parsed results are stored as JSON in a new column called `json` with this schema:

`(trade as boolean, customers as string[])`

Use the Inspect tab and data preview to verify that your output is mapped properly.
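
To make the schema projection concrete, here is a minimal Python sketch of what parsing `jsonString` into `json` does to a single row. It is a local analogy, not how ADF implements the Parse transformation. `parse_json_column` and `JSON_SCHEMA` are hypothetical names. The following rules are assumptions for illustration:

* Fields outside the schema are dropped.
* Missing or mistyped fields become null.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical stand-in for the output schema (trade as boolean, customers as string[]):
# each field name maps to a check that the parsed value has the declared type.
JSON_SCHEMA: Dict[str, Callable[[Any], bool]] = {
    "trade": lambda v: isinstance(v, bool),
    "customers": lambda v: isinstance(v, list) and all(isinstance(c, str) for c in v),
}


def parse_json_column(row: Dict[str, Any], source: str, target: str,
                      schema: Dict[str, Callable[[Any], bool]]) -> Dict[str, Any]:
    """Return a copy of `row` with `target` holding the parsed, schema-projected document.

    Assumptions for illustration: fields not in the schema are dropped, and fields
    that are missing or fail their type check become None (null).
    """
    document = json.loads(row[source])
    parsed = {
        field: document.get(field) if matches(document.get(field)) else None
        for field, matches in schema.items()
    }
    return {**row, target: parsed}


# Hypothetical input row; "orders" is not in the schema, so it is not kept.
row = {"jsonString": '{"trade": true, "customers": ["customer1", "customer2"], "orders": []}'}
result = parse_json_column(row, "jsonString", "json", JSON_SCHEMA)
print(result["json"])  # {'trade': True, 'customers': ['customer1', 'customer2']}
```

The original `jsonString` column is kept alongside the new `json` column, mirroring the select steps in the examples that follow.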

## Examples

```
source(output(
          name as string,
          location as string,
          satellites as string[],
          goods as (trade as boolean, customers as string[], orders as (orderId as string, orderTotal as double, shipped as (orderItems as (itemName as string, itemQty as string)[]))[])
     ),
     allowSchemaDrift: true,
     validateSchema: false,
     ignoreNoFilesFound: false,
     documentForm: 'documentPerLine') ~> JsonSource
source(output(
          movieId as string,
          title as string,
          genres as string
     ),
     allowSchemaDrift: true,
     validateSchema: false,
     ignoreNoFilesFound: false) ~> CsvSource
JsonSource derive(jsonString = toString(goods)) ~> StringifyJson
StringifyJson parse(json = jsonString ? (trade as boolean,
          customers as string[]),
     format: 'json',
     documentForm: 'arrayOfDocuments') ~> ParseJson
CsvSource derive(csvString = 'Id|name|year\n\'1\'|\'test1\'|\'1999\'') ~> CsvString
CsvString parse(csv = csvString ? (id as integer,
          name as string,
          year as string),
     format: 'delimited',
     columnNamesAsHeader: true,
     columnDelimiter: '|',
     nullValue: '',
     documentForm: 'documentPerLine') ~> ParseCsv
ParseJson select(mapColumn(
          jsonString,
          json
     ),
     skipDuplicateMapInputs: true,
     skipDuplicateMapOutputs: true) ~> KeepStringAndParsedJson
ParseCsv select(mapColumn(
          csvString,
          csv
     ),
     skipDuplicateMapInputs: true,
     skipDuplicateMapOutputs: true) ~> KeepStringAndParsedCsv
```
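
The `ParseCsv` step above can be sketched locally in Python as follows. This is an illustration under stated assumptions, not ADF's parser. `parse_delimited` and `CSV_SCHEMA` are hypothetical names. The sketch makes these assumptions:

* The header row is skipped because `columnNamesAsHeader` is `true`.
* Columns are matched to the schema by position.
* `'` is treated as the quote character.

The script above doesn't set a quote character, so check the data preview to see how ADF handles the single quotes in the sample text.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Optional, Tuple

# Text produced by the CsvString derive above (\' in data flow script is an escaped ').
CSV_STRING = "Id|name|year\n'1'|'test1'|'1999'"

# Hypothetical stand-in for (id as integer, name as string, year as string).
CSV_SCHEMA: List[Tuple[str, Callable[[str], Any]]] = [("id", int), ("name", str), ("year", str)]


def parse_delimited(text: str, schema: List[Tuple[str, Callable[[str], Any]]],
                    delimiter: str = "|", quotechar: str = "'",
                    null_value: str = "") -> List[Dict[str, Optional[Any]]]:
    """Parse header-prefixed delimited text into typed records, one per data line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    next(reader)  # columnNamesAsHeader: true, so skip the header row
    records = []
    for cells in reader:
        # Match cells to schema columns by position; cells equal to null_value become None.
        records.append({
            name: None if cell == null_value else convert(cell)
            for (name, convert), cell in zip(schema, cells)
        })
    return records


print(parse_delimited(CSV_STRING, CSV_SCHEMA))  # [{'id': 1, 'name': 'test1', 'year': '1999'}]
```

The default `null_value=""` mirrors `nullValue: ''` in the script, so an empty cell maps to null instead of failing the integer conversion.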

## Data flow script

### Syntax

### Examples

```
parse(json = jsonString ? (trade as boolean,
          customers as string[]),
     format: 'json',
     documentForm: 'singleDocument') ~> ParseJson

parse(csv = csvString ? (id as integer,
          name as string,
          year as string),
     format: 'delimited',
     columnNamesAsHeader: true,
     columnDelimiter: '|',
     nullValue: '',
     documentForm: 'documentPerLine') ~> ParseCsv
```
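
The `documentForm` setting determines how the text in the source expression is split into JSON documents. The following Python sketch shows one reading of the three values used in this article. It assumes they behave like the document form options for JSON sources in mapping data flows. `split_documents` is a hypothetical helper, not part of ADF.

```python
import json
from typing import Any, List


def split_documents(text: str, document_form: str) -> List[Any]:
    """Split `text` into JSON documents according to a documentForm value (sketch)."""
    if document_form == "singleDocument":
        # The whole text, possibly spanning several lines, is one document.
        return [json.loads(text)]
    if document_form == "documentPerLine":
        # Each non-blank line is a separate document.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if document_form == "arrayOfDocuments":
        # The text is one JSON array whose elements are the documents.
        documents = json.loads(text)
        if not isinstance(documents, list):
            raise ValueError("arrayOfDocuments expects a JSON array")
        return documents
    raise ValueError(f"unknown documentForm: {document_form!r}")


print(split_documents('{"trade": true,\n "customers": ["c1"]}', "singleDocument"))
# [{'trade': True, 'customers': ['c1']}]
print(split_documents('{"trade": true}\n{"trade": false}', "documentPerLine"))
# [{'trade': True}, {'trade': False}]
print(split_documents('[{"trade": true}, {"trade": false}]', "arrayOfDocuments"))
# [{'trade': True}, {'trade': False}]
```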
| 113 | + |
| 114 | +## Next steps |
| 115 | + |
| 116 | +* Use the [Flatten transformation](data-flow-flatten.md) to pivot rows to columns. |
| 117 | +* Use the [Derived column transformation](data-flow-derived-column.md) to pivot columns to rows. |