Commit 6da061b

Merge pull request #253578 from KrishnakumarRukmangathan/patch-17
Update connector-google-cloud-storage.md
2 parents 0b62b78 + 505b07c

File tree

1 file changed: articles/data-factory/connector-google-cloud-storage.md

Lines changed: 76 additions & 0 deletions

@@ -24,6 +24,7 @@ This Google Cloud Storage connector is supported for the following capabilities:
| Supported capabilities|IR |
|---------| --------|
|[Copy activity](copy-activity-overview.md) (source/-)|① ②|
|[Mapping data flow](concepts-data-flow-overview.md) (source/-)|① |
|[Lookup activity](control-flow-lookup-activity.md)|① ②|
|[GetMetadata activity](control-flow-get-metadata-activity.md)|① ②|
|[Delete activity](delete-activity.md)|① ②|
@@ -246,6 +247,81 @@ Assume that you have the following source folder structure and want to copy the
| ------------------------------------------------------------ | --------------------------------------------------------- | ------------------------------------------------------------ |
| bucket<br/>&nbsp;&nbsp;&nbsp;&nbsp;FolderA<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File1.csv**<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File2.json<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subfolder1<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File3.csv**<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;File4.json<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**File5.csv**<br/>&nbsp;&nbsp;&nbsp;&nbsp;Metadata<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FileListToCopy.txt | File1.csv<br>Subfolder1/File3.csv<br>Subfolder1/File5.csv | **In dataset:**<br>- Bucket: `bucket`<br>- Folder path: `FolderA`<br><br>**In copy activity source:**<br>- File list path: `bucket/Metadata/FileListToCopy.txt` <br><br>The file list path points to a text file in the same data store that includes a list of files you want to copy, one file per line, with the relative path to the path configured in the dataset. |

## Mapping data flow properties

When you're transforming data in mapping data flows, you can read files from Google Cloud Storage in the following formats:

- [Avro](format-avro.md#mapping-data-flow-properties)
- [Delta](format-delta.md#mapping-data-flow-properties)
- [CDM](format-common-data-model.md#mapping-data-flow-properties)
- [Delimited text](format-delimited-text.md#mapping-data-flow-properties)
- [Excel](format-excel.md#mapping-data-flow-properties)
- [JSON](format-json.md#mapping-data-flow-properties)
- [ORC](format-orc.md#mapping-data-flow-properties)
- [Parquet](format-parquet.md#mapping-data-flow-properties)
- [XML](format-xml.md#mapping-data-flow-properties)

Format-specific settings are located in the documentation for that format. For more information, see [Source transformation in mapping data flow](data-flow-source.md).

### Source transformation

In source transformation, you can read from a bucket, folder, or individual file in Google Cloud Storage. Use the **Source options** tab to manage how the files are read.

:::image type="content" source="media/data-flow/source-options-1.png" alt-text="Screenshot of Source options.":::

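Behind the designer, these options are saved as a `source(...)` definition in the data flow script. As a minimal sketch of an inline delimited-text source (the `format` value and the `GCSSource` stream name are illustrative assumptions, not values taken from this article):

```
source(allowSchemaDrift: true,
    validateSchema: false,
    format: 'delimited') ~> GCSSource
```
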
**Wildcard paths:** Using a wildcard pattern will instruct the service to loop through each matching folder and file in a single source transformation. This is an effective way to process multiple files within a single flow. Add multiple wildcard matching patterns with the plus sign that appears when you hover over your existing wildcard pattern.

From your source bucket, choose a series of files that match a pattern. Only the bucket can be specified in the dataset, so your wildcard path must also include the folder path from the root folder.

Wildcard examples:

- `*` Represents any set of characters.
- `**` Represents recursive directory nesting.
- `?` Replaces one character.
- `[]` Matches one of the characters in the brackets.

- `/data/sales/**/*.csv` Gets all .csv files under /data/sales.
- `/data/sales/20??/**/` Recursively gets all files under folders whose names start with 20 (for example, years 2000 through 2099).
- `/data/sales/*/*/*.csv` Gets .csv files two levels under /data/sales.
- `/data/sales/2004/*/12/[XY]1?.csv` Gets all .csv files in December 2004 whose names start with X or Y, followed by 1 and any single character.
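
In the underlying data flow script, wildcard matching appears as an array of patterns. A minimal sketch, assuming the `wildcardPaths` script property that other file-based connectors use (the stream name and paths are illustrative):

```
source(allowSchemaDrift: true,
    validateSchema: false,
    wildcardPaths:['data/sales/20??/**/*.csv']) ~> WildcardSource
```
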
**Partition root path:** If you have partitioned folders in your file source with a `key=value` format (for example, `year=2019`), then you can assign the top level of that partition folder tree to a column name in your data flow's data stream.

First, set a wildcard to include all paths that are the partitioned folders plus the leaf files that you want to read.

:::image type="content" source="media/data-flow/part-file-2.png" alt-text="Screenshot of partition source file settings.":::

Use the **Partition root path** setting to define what the top level of the folder structure is. When you view the contents of your data via a data preview, you'll see that the service will add the resolved partitions found in each of your folder levels.

:::image type="content" source="media/data-flow/partfile1.png" alt-text="Screenshot of partition root path.":::

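In script form, the partition root path sits alongside the wildcard. A sketch, assuming `key=value` folders such as `year=2019/month=01` and the `partitionRootPath` script property (both illustrative here); the resolved `year` and `month` values would then surface as columns in the stream:

```
source(allowSchemaDrift: true,
    validateSchema: false,
    wildcardPaths:['data/sales/year=*/month=*/*.csv'],
    partitionRootPath: 'data/sales') ~> PartitionedSource
```
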
**List of files:** This is a file set. Create a text file that lists the files to process, one relative path per line, and point to that text file here.

**Column to store file name:** Store the name of the source file in a column in your data. Enter a new column name here to store the file name string.

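For example, to capture each row's source file in a column named `fileName`, sibling file-store connectors use the `rowUrlColumn` script property; a hedged sketch (column and stream names are illustrative):

```
source(allowSchemaDrift: true,
    validateSchema: false,
    rowUrlColumn: 'fileName') ~> SourceWithFileName
```
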
**After completion:** Choose to do nothing with the source file after the data flow runs, delete the source file, or move the source file. The paths for the move are relative.

To move source files to another location post-processing, first select "Move" for the file operation. Then, set the "from" directory. If you're not using any wildcards for your path, the "from" setting will be the same folder as your source folder.

If you have a source path with a wildcard, your syntax will look like this:

`/data/sales/20??/**/*.csv`

You can specify "from" as:

`/data/sales`

And you can specify "to" as:

`/backup/priorSales`

In this case, all files that were sourced under `/data/sales` are moved to `/backup/priorSales`.

> [!NOTE]
> File operations run only when you start the data flow from a pipeline run (a pipeline debug or execution run) that uses the Execute Data Flow activity in a pipeline. File operations *do not* run in Data Flow debug mode.

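Put together, a move-on-completion setup like the one above might look as follows in the data flow script. This is a sketch only: the `moveFiles: ['<from>', '<to>']` property is the name used by similar file-based connectors and is assumed here, as are the stream name and paths:

```
source(allowSchemaDrift: true,
    validateSchema: false,
    wildcardPaths:['data/sales/20??/**/*.csv'],
    moveFiles: ['data/sales', 'backup/priorSales']) ~> MovedSource
```
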
**Filter by last modified:** You can filter which files you process by specifying a date range of when they were last modified. All datetimes are in UTC.
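
In script form, such a window might be expressed with `modifiedAfter` and `modifiedBefore`, the property names other file-based connectors use; a hedged sketch (dates, format string, and stream name are illustrative):

```
source(allowSchemaDrift: true,
    validateSchema: false,
    modifiedAfter: (toTimestamp('2024-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')),
    modifiedBefore: (toTimestamp('2024-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss'))) ~> RecentFiles
```
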
## Lookup activity properties

To learn details about the properties, check [Lookup activity](control-flow-lookup-activity.md).
