Merge pull request #286612 from Clare-Zheng82/0912-Add_Iceberg_format_doc

PMEds28 · web-flow · commit 1e6755a3b41e · 2024-11-06T09:39:16.000Z
[New feature] ADF - Add Iceberg format doc
diff --git a/articles/data-factory/TOC.yml b/articles/data-factory/TOC.yml
@@ -492,6 +492,8 @@ items:
       displayName: timeout
     - name: HubSpot
       href: connector-hubspot.md
+    - name: Iceberg format
+      href: format-iceberg.md
     - name: Impala
       href: connector-impala.md
     - name: Informix
diff --git a/articles/data-factory/connector-azure-data-lake-storage.md b/articles/data-factory/connector-azure-data-lake-storage.md
@@ -7,7 +7,7 @@ author: jianleishen
 ms.subservice: data-movement
 ms.topic: conceptual
 ms.custom: synapse
-ms.date: 01/05/2024
+ms.date: 09/12/2024
 ---
 
 # Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory or Azure Synapse Analytics
@@ -383,7 +383,17 @@ These properties are supported for the linked service:
 
 For a full list of sections and properties available for defining datasets, see [Datasets](concepts-datasets-linked-services.md).
 
-[!INCLUDE [data-factory-v2-file-formats](includes/data-factory-v2-file-formats.md)] 
+Azure Data Factory supports the following file formats. Refer to each article for format-based settings.
+
+- [Avro format](format-avro.md)
+- [Binary format](format-binary.md)
+- [Delimited text format](format-delimited-text.md)
+- [Excel format](format-excel.md)
+- [Iceberg format](format-iceberg.md)
+- [JSON format](format-json.md)
+- [ORC format](format-orc.md)
+- [Parquet format](format-parquet.md)
+- [XML format](format-xml.md)
 
 The following properties are supported for Data Lake Storage Gen2 under `location` settings in the format-based dataset:
 
@@ -497,7 +507,15 @@ The following properties are supported for Data Lake Storage Gen2 under `storeSe
 
 ### Azure Data Lake Storage Gen2 as a sink type
 
-[!INCLUDE [data-factory-v2-file-sink-formats](includes/data-factory-v2-file-sink-formats.md)]
+Azure Data Factory supports the following file formats. Refer to each article for format-based settings.
+
+- [Avro format](format-avro.md)
+- [Binary format](format-binary.md)
+- [Delimited text format](format-delimited-text.md)
+- [Iceberg format](format-iceberg.md)
+- [JSON format](format-json.md)
+- [ORC format](format-orc.md)
+- [Parquet format](format-parquet.md)
 
 The following properties are supported for Data Lake Storage Gen2 under `storeSettings` settings in format-based copy sink:
 
@@ -682,7 +700,7 @@ In this case, all files that were sourced under /data/sales are moved to /backup
 
 ### Sink properties
 
-In the sink transformation, you can write to either a container or folder in Azure Data Lake Storage Gen2. the **Settings** tab lets you manage how the files get written.
+In the sink transformation, you can write to either a container or folder in Azure Data Lake Storage Gen2. The **Settings** tab lets you manage how the files get written.
 
 :::image type="content" source="media/data-flow/file-sink-settings.png" alt-text="sink options":::
 
diff --git a/articles/data-factory/connector-overview.md b/articles/data-factory/connector-overview.md
@@ -6,7 +6,7 @@ author: jianleishen
 ms.subservice: data-movement
 ms.custom: synapse
 ms.topic: conceptual
-ms.date: 01/09/2024
+ms.date: 11/05/2024
 ms.author: jianleishen
 ---
 
@@ -41,6 +41,7 @@ The following file formats are supported. Refer to each article for format-based
 - [Delimited text format](format-delimited-text.md)
 - [Delta format](format-delta.md)
 - [Excel format](format-excel.md)
+- [Iceberg format](format-iceberg.md)
 - [JSON format](format-json.md)
 - [ORC format](format-orc.md)
 - [Parquet format](format-parquet.md)
diff --git a/articles/data-factory/copy-activity-overview.md b/articles/data-factory/copy-activity-overview.md
@@ -6,7 +6,7 @@ author: jianleishen
 ms.subservice: data-movement
 ms.custom: synapse
 ms.topic: conceptual
-ms.date: 08/02/2024
+ms.date: 11/05/2024
 ms.author: jianleishen
 ---
 
@@ -46,7 +46,17 @@ To copy data from a source to a sink, the service that runs the Copy activity pe
 
 ### Supported file formats
 
-[!INCLUDE [data-factory-v2-file-formats](includes/data-factory-v2-file-formats.md)] 
+Azure Data Factory supports the following file formats. Refer to each article for format-based settings.
+
+- [Avro format](format-avro.md)
+- [Binary format](format-binary.md)
+- [Delimited text format](format-delimited-text.md)
+- [Excel format](format-excel.md)
+- [Iceberg format](format-iceberg.md) (only for Azure Data Lake Storage Gen2)
+- [JSON format](format-json.md)
+- [ORC format](format-orc.md)
+- [Parquet format](format-parquet.md)
+- [XML format](format-xml.md)
 
 You can use the Copy activity to copy files as-is between two file-based data stores, in which case the data is copied efficiently without any serialization or deserialization. In addition, you can also parse or generate files of a given format, for example, you can perform the following:
 
diff --git a/articles/data-factory/format-iceberg.md b/articles/data-factory/format-iceberg.md
@@ -0,0 +1,93 @@
+---
+title: Iceberg format in Azure Data Factory
+titleSuffix: Azure Data Factory & Azure Synapse
+description: This article describes how to write data in Iceberg format in Azure Data Factory and Azure Synapse Analytics.
+author: jianleishen
+ms.subservice: data-movement
+ms.custom: synapse
+ms.topic: conceptual
+ms.date: 09/12/2024
+ms.author: jianleishen
+---
+
+# Iceberg format in Azure Data Factory and Azure Synapse Analytics
+
+[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]
+
+Follow this article when you want to **write data in Iceberg format**.
+
+Iceberg format is supported for the following connectors: 
+
+- [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md)
+
+You can use an Iceberg dataset in the [Copy activity](copy-activity-overview.md).
+
+## Dataset properties
+
+For a full list of sections and properties available for defining datasets, see the [Datasets](concepts-datasets-linked-services.md) article. This section provides a list of properties supported by the Iceberg format dataset.
+
+| Property         | Description                                                  | Required |
+| ---------------- | ------------------------------------------------------------ | -------- |
+| type             | The type property of the dataset must be set to **Iceberg**. | Yes      |
+| location         | Location settings of the file(s). Each file-based connector has its own location type and supported properties under `location`.  | Yes      |
+
+The following example shows an Iceberg dataset on Azure Data Lake Storage Gen2:
+
+```json
+{
+    "name": "IcebergDataset",
+    "properties": {
+        "type": "Iceberg",
+        "linkedServiceName": {
+            "referenceName": "<Azure Data Lake Storage Gen2 linked service name>",
+            "type": "LinkedServiceReference"
+        },
+        "schema": [ < physical schema, optional, auto retrieved during authoring >
+        ],
+        "typeProperties": {
+            "location": {
+                "type": "AzureBlobFSLocation",
+                "fileSystem": "filesystemname",
+                "folderPath": "folder/subfolder"
+            }
+        }
+    }
+}
+
+```
+
+## Copy activity properties
+
+For a full list of sections and properties available for defining activities, see the [Pipelines](concepts-pipelines-activities.md) article. This section provides a list of properties supported by the Iceberg sink.
+
+### Iceberg as sink
+
+The following properties are supported in the copy activity **sink** section.
+
+| Property       | Description                                                  | Required |
+| -------------- | ------------------------------------------------------------ | -------- |
+| type           | The type property of the copy activity sink must be set to **IcebergSink**. | Yes      |
+| formatSettings | A group of properties. Refer to the **Iceberg write settings** table below. | No       |
+| storeSettings  | A group of properties on how to write data to a data store. Each file-based connector has its own supported write settings under `storeSettings`.  | No       |
+
+Supported **Iceberg write settings** under `formatSettings`:
+
+| Property      | Description                                                  | Required                                              |
+| ------------- | ------------------------------------------------------------ | ----------------------------------------------------- |
+| type          | The type of formatSettings must be set to **IcebergWriteSettings**. | Yes                                                   |
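+
+As an illustration of how these properties fit together, the following is a minimal sketch of a Copy activity that writes to an Iceberg sink. The activity name, dataset reference names, and source type are placeholders. The `AzureBlobFSWriteSettings` store settings type is an assumption based on the Azure Data Lake Storage Gen2 connector; refer to that connector article for the write settings it supports.
+
+```json
+"activities": [
+    {
+        "name": "<activity name>",
+        "type": "Copy",
+        "inputs": [
+            {
+                "referenceName": "<source dataset name>",
+                "type": "DatasetReference"
+            }
+        ],
+        "outputs": [
+            {
+                "referenceName": "<Iceberg dataset name>",
+                "type": "DatasetReference"
+            }
+        ],
+        "typeProperties": {
+            "source": {
+                "type": "<source type>"
+            },
+            "sink": {
+                "type": "IcebergSink",
+                "formatSettings": {
+                    "type": "IcebergWriteSettings"
+                },
+                "storeSettings": {
+                    "type": "AzureBlobFSWriteSettings"
+                }
+            }
+        }
+    }
+]
+```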
+
+## Related connectors and formats
+
+Here are some common connectors and formats related to the Iceberg format:
+
+- [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md)
+- [Binary format](format-binary.md)
+- [Delta format](format-delta.md)
+- [Excel format](format-excel.md)
+- [JSON format](format-json.md)
+- [Parquet format](format-parquet.md)
+
+## Related content
+
+- [Data type mapping in dataset schemas](copy-activity-schema-and-type-mapping.md#data-type-mapping)
+- [Copy activity overview](copy-activity-overview.md)