Blob - By default, the schema is a single field named 'body' of type 'bytes'.

Text - By default, the schema has two fields: 'body' of type 'bytes' and 'offset' of type 'long'.

JSON - Not supported; the user has to provide the output schema manually.

Parquet - If the path is a directory, the plugin will look for files ending in '.parquet' to read the schema from.
If no such file can be found, an error will be returned.

Avro - If the path is a directory, the plugin will look for files ending in '.avro' to read the schema from.
If no such file can be found, an error will be returned.
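
The directory lookup described for Parquet and Avro can be pictured with the short Python sketch below. It is only an illustration: `find_schema_file` is a hypothetical helper, not part of the plugin, and picking the first match in sorted order is an assumption, since the text does not say which matching file is used.

```python
from pathlib import Path


def find_schema_file(path: str, extension: str) -> Path:
    """Return the file a schema would be read from (hypothetical helper)."""
    p = Path(path)
    if p.is_file():
        # A plain file path is used directly.
        return p
    # Assumption: take the first match in sorted order for a deterministic pick.
    candidates = sorted(
        f for f in p.iterdir() if f.is_file() and f.name.endswith(extension)
    )
    if not candidates:
        # Mirrors "If no such file can be found, an error will be returned."
        raise FileNotFoundError(f"no file ending in '{extension}' found in {path}")
    return candidates[0]
```

Calling `find_schema_file(some_dir, ".parquet")` on a directory without any `.parquet` file raises an error instead of silently producing an empty schema.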

**Sample Size:** The maximum number of rows inspected for automatic data type detection.
The default value is 1000. This is used when the format is `xls`, `csv`, `tsv`, or `delimited`.

**Override:** A list of columns and their data types for which automatic data type detection is
skipped. This is used when the format is `xls`, `csv`, `tsv`, or `delimited`.
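
As a rough illustration of how a sample size and an override list can interact, the Python sketch below infers a type per column from the first `sample_size` rows and skips inference for overridden columns. The function names and the type names `long`, `double` and `string` are assumptions made for this example; the plugin's actual detection rules are not described here.

```python
from typing import Dict, List, Optional


def infer_type(values: List[str]) -> str:
    """Pick the narrowest of long, double, string that fits every non-empty value."""
    def fits(cast) -> bool:
        try:
            for v in values:
                if v != "":
                    cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "long"
    if fits(float):
        return "double"
    return "string"


def detect_schema(
    header: List[str],
    rows: List[List[str]],
    sample_size: int = 1000,
    override: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Infer column types from at most sample_size rows (hypothetical sketch)."""
    override = override or {}
    sample = rows[:sample_size]  # rows beyond the sample are never inspected
    schema = {}
    for i, name in enumerate(header):
        if name in override:
            schema[name] = override[name]  # detection is skipped for this column
        else:
            schema[name] = infer_type([row[i] for row in sample])
    return schema
```

An override is useful for columns such as postal codes, where every sampled value looks numeric but the column should remain a string.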

**Delimiter:** The delimiter to use when the format is 'delimited'. It is ignored for other formats.

**Enable Quoted Values:** Whether to treat content between quotes as a value. This setting is only used if the format
is 'csv', 'tsv' or 'delimited'. For example, if this is set to true, a line that looks like `1, "a, b, c"` will output two fields.
The first field will have `1` as its value and the second will have `a, b, c` as its value. The quote characters will be trimmed.
The newline delimiter cannot be within quotes.

It also assumes the quotes are well enclosed. The left quote will match the first following quote right before the delimiter. If there is an
unenclosed quote, an error will occur.
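
The quoting behaviour can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the plugin's parser: `split_line` is a hypothetical name, whitespace around fields is left untouched (so the example input has no space after the delimiter), and an unclosed quote is reported as a `ValueError`.

```python
from typing import List


def split_line(line: str, delimiter: str = ",", quoted: bool = True) -> List[str]:
    """Split one line into fields, optionally honouring double quotes (sketch)."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    if not quoted:
        return line.split(delimiter)

    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_quotes = not in_quotes  # the quote character itself is dropped
            i += 1
        elif not in_quotes and line.startswith(delimiter, i):
            fields.append("".join(current))  # delimiter outside quotes ends a field
            current = []
            i += len(delimiter)
        else:
            current.append(ch)
            i += 1
    if in_quotes:
        raise ValueError("unenclosed quote in line: " + line)
    fields.append("".join(current))
    return fields
```

With quoting enabled, `split_line('1,"a, b, c"')` yields the two fields `1` and `a, b, c`; with `quoted=False` the same line is split at every comma and the quote characters are kept.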

**Use First Row as Header:** Whether to use the first line of each file as the column headers. Supported formats are 'text', 'csv', 'tsv', 'xls', and 'delimited'.

**Terminate Reading After Empty Row:** Whether to stop reading after encountering the first empty row. Defaults to false. When false, the reader reads all rows in the sheet. This is only used when the format is 'xls'.
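
The two row-handling options above can be pictured with a small Python sketch. It is an illustration only: `read_sheet`, the `column_N` fallback names, and the definition of an empty row (every cell is `None` or blank) are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional, Sequence


def is_empty(row: Sequence[Any]) -> bool:
    """Assumed definition: a row is empty when every cell is None or blank."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def read_sheet(
    rows: Sequence[Sequence[Any]],
    first_row_as_header: bool = False,
    terminate_after_empty_row: bool = False,
) -> List[Dict[str, Any]]:
    """Turn raw rows into records, honouring the two options (hypothetical sketch)."""
    records: List[Dict[str, Any]] = []
    header: Optional[List[str]] = None
    for row in rows:
        if terminate_after_empty_row and is_empty(row):
            break  # stop at the first empty row
        if first_row_as_header and header is None:
            header = [str(cell) for cell in row]  # first row supplies column names
            continue
        names = header if header is not None else [f"column_{i}" for i in range(len(row))]
        records.append(dict(zip(names, row)))
    return records
```

With `terminate_after_empty_row=False`, rows after an empty row are still read, which matches the documented default.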

**Select Sheet Using:** Whether to select the sheet by name or by number. The default is 'Sheet Number'. This is only used when the format is 'xls'.

**Sheet Value:** The name or number of the sheet to read from. If not specified, the first sheet will be read.
Sheet numbers are 0-based, i.e. the first sheet is 0. This is only used when the format is 'xls'.
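
A minimal Python sketch of the sheet selection rules, assuming the workbook's sheet names are available as a list. `select_sheet` is a hypothetical helper, and the option label 'Sheet Name' is an assumption; only 'Sheet Number' is named in the text above.

```python
from typing import List, Optional


def select_sheet(
    sheet_names: List[str],
    select_using: str = "Sheet Number",
    sheet_value: Optional[str] = None,
) -> str:
    """Return the name of the sheet that would be read (hypothetical sketch)."""
    if sheet_value is None or sheet_value == "":
        return sheet_names[0]  # no value given: the first sheet is read
    if select_using == "Sheet Number":
        index = int(sheet_value)  # 0-based: "0" is the first sheet
        if not 0 <= index < len(sheet_names):
            raise IndexError(f"sheet number {index} is out of range")
        return sheet_names[index]
    if select_using == "Sheet Name":  # assumed label for selecting by name
        if sheet_value not in sheet_names:
            raise KeyError(f"no sheet named '{sheet_value}'")
        return sheet_value
    raise ValueError(f"unknown selection mode: {select_using}")
```

For a workbook with sheets `["Data", "Notes"]`, a Sheet Value of `1` selects `Notes`, because numbering starts at 0.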
### Filtering
**Filter:** Filter that can be applied to the files in the selected directory.
@@ -107,6 +158,10 @@ Default 0 value means unlimited. Is not applicable for files in Google formats.
**Body Output Format:** The output format for the body of the file. The available values are "Bytes" and "String".

**File System Properties:** Additional properties to use with the InputFormat when reading the data.

**File Encoding:** The character encoding of the file(s) to be read. The default encoding is UTF-8.
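
To show why the encoding setting matters, the Python sketch below decodes the same bytes with two different encodings. `decode_file` is a hypothetical helper used only for illustration; it is not the plugin's reader.

```python
def decode_file(raw: bytes, encoding: str = "UTF-8") -> str:
    """Decode raw file bytes using the configured encoding (hypothetical helper)."""
    return raw.decode(encoding)


# The same bytes read correctly with the right encoding and garbled with the wrong one.
raw = "caf\u00e9".encode("utf-8")
as_utf8 = decode_file(raw)                  # decoded with the default, UTF-8
as_latin1 = decode_file(raw, "ISO-8859-1")  # each UTF-8 byte becomes its own character
```

If the source files were not written in UTF-8, setting the matching encoding avoids this kind of character corruption.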
### Exporting
**Google Documents Export Format:** The MIME type used for Google Documents when they are converted to structured records.