docs/GCSFile-batchsource.md
24 additions & 2 deletions
@@ -33,22 +33,44 @@ You also can use the macro function ${conn(connection-name)}.
 **Project ID:** Google Cloud Project ID, which uniquely identifies a project.
 It can be found on the Dashboard in the Google Cloud Platform Console.
 
+**Service Account Type:** Service account type, file path where the service account is located or the JSON content of
+the service account.
+
+**Service Account File Path:** Path on the local file system of the service account key. Can be set to 'auto-detect'.
+
+**Service Account JSON:** Contents of the service account JSON file.
+
 **Path:** Path to file(s) to be read. If a directory is specified, terminate the path name with a '/'.
 For example, `gs://<bucket>/path/to/directory/`.
 An asterisk ("\*") can be used as a wildcard to match a filename pattern.
 If no files are found or matched, the pipeline will fail.
 
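The wildcard behaviour described for **Path** can be illustrated with a minimal Python sketch. This is not the plugin's implementation: `match_objects` and the sample object names are hypothetical, and the standard-library `fnmatchcase` stands in for whatever glob rules the plugin actually applies (which may treat `/` differently).

```python
from fnmatch import fnmatchcase


def match_objects(object_names, pattern):
    """Return the names matching a '*' wildcard pattern; fail if nothing matches."""
    matched = [name for name in object_names if fnmatchcase(name, pattern)]
    if not matched:
        # Mirrors the documented behaviour: no matching files means failure.
        raise FileNotFoundError(f"No files matched pattern: {pattern}")
    return matched


# Hypothetical object names inside a bucket.
objects = [
    "path/to/directory/part-0.csv",
    "path/to/directory/part-1.csv",
    "path/to/directory/notes.txt",
]
matched = match_objects(objects, "path/to/directory/*.csv")
```

Here `matched` holds only the two `.csv` objects, and a pattern that matches nothing raises an error instead of returning an empty list.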
 **Format:** Format of the data to read.
-The format must be one of 'avro', 'blob', 'csv', 'delimited', 'json', 'parquet', 'text', 'tsv', or the
+The format must be one of 'avro', 'blob', 'csv', 'delimited', 'json', 'parquet', 'text', 'tsv', 'xls', or the
 name of any format plugin that you have deployed to your environment.
 If the format is a macro, only the pre-packaged formats can be used.
 If the format is 'blob', every input file will be read into a separate record.
 The 'blob' format also requires a schema that contains a field named 'body' of type 'bytes'.
 If the format is 'text', the schema must contain a field named 'body' of type 'string'.
 
+**Get Schema:** Auto-detects schema from file. Supported formats are: avro, parquet, csv, delimited, tsv, blob, text, and xls.
+
+**Sample Size:** The maximum number of rows that will get investigated for automatic data type detection.
+The default value is 1000. This is only used when the format is 'xls'.
+
+**Override:** A list of columns with the corresponding data types for whom the automatic data type detection gets
+skipped. This is only used when the format is 'xls'.
+
+**Terminate Reading After Empty Row:** Specify whether to stop reading after encountering the first empty row. Defaults to false. When false the reader will read all rows in the sheet. This is only used when the format is 'xls'.
+
+**Select Sheet Using:** Select the sheet by name or number. Default is 'Sheet Number'. This is only used when the format is 'xls'.
+
+**Sheet Value:** The name/number of the sheet to read from. If not specified, the first sheet will be read.
+Sheet Numbers are 0 based, ie first sheet is 0. This is only used when the format is 'xls'.
+
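To make the interaction between **Select Sheet Using** and **Sheet Value** concrete, here is a small Python sketch of the selection rule as documented. It is an illustration under assumptions, not the plugin's code: `select_sheet` is a hypothetical helper, and the option label `'Sheet Name'` is assumed (the doc only names `'Sheet Number'` explicitly).

```python
def select_sheet(sheet_names, select_using="Sheet Number", sheet_value=None):
    """Pick a sheet name following the documented rules (hypothetical helper)."""
    if sheet_value is None or sheet_value == "":
        # No value given: the first sheet is read.
        return sheet_names[0]
    if select_using == "Sheet Number":
        # Sheet numbers are 0-based, so 0 is the first sheet.
        return sheet_names[int(sheet_value)]
    if select_using == "Sheet Name":  # assumed label for the by-name option
        if sheet_value not in sheet_names:
            raise ValueError(f"No sheet named {sheet_value!r}")
        return sheet_value
    raise ValueError(f"Unknown selection mode: {select_using!r}")


sheets = ["Summary", "Q1", "Q2"]  # hypothetical workbook
default_sheet = select_sheet(sheets)
by_number = select_sheet(sheets, "Sheet Number", "1")
by_name = select_sheet(sheets, "Sheet Name", "Q2")
```

With these inputs, omitting the value selects `Summary`, number `1` selects `Q1` (the second sheet), and name `Q2` selects `Q2`.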
 **Delimiter:** Delimiter to use when the format is 'delimited'. This will be ignored for other formats.
 
-**Use First Row as Header:** Whether to use first row as header. Supported formats are 'text', 'csv', 'tsv', 'delimited'.
+**Use First Row as Header:** Whether to use first row as header. Supported formats are 'text', 'csv', 'tsv', 'delimited', 'xls'.
 
 **Enable Quoted Values:** Whether to treat content between quotes as a value. This value will only be used if the format
 is 'csv', 'tsv' or 'delimited'. For example, if this is set to true, a line that looks like `1, "a, b, c"` will output two fields.
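The two-field example for **Enable Quoted Values** can be reproduced with Python's standard `csv` module. This is only a sketch of the idea: `split_line` is a hypothetical helper, and how the plugin itself handles surrounding whitespace or the quote characters when the option is off is an assumption here.

```python
import csv


def split_line(line, enable_quoted_values):
    """Split one delimited line, optionally honouring double quotes."""
    if enable_quoted_values:
        # skipinitialspace lets the quote after ", " open a quoted field.
        return next(csv.reader([line], skipinitialspace=True))
    # Without quote handling, every comma is a field boundary.
    return line.split(",")


line = '1, "a, b, c"'
quoted = split_line(line, True)
unquoted = split_line(line, False)
```

With quoting enabled the line yields two fields (`1` and `a, b, c`); with it disabled the same line splits into four.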