Commit 4d7bd16

feat(interactive): Support parsing csv files with special delimiters (#4336)
Support special delimiters like `\t`.
1 parent 94a02d7 commit 4d7bd16

3 files changed

+31 -4 lines changed

.github/workflows/interactive.yml

Lines changed: 12 additions & 0 deletions
@@ -605,3 +605,15 @@ jobs:
           SCHEMA_FILE=${GITHUB_WORKSPACE}/flex/tests/rt_mutable_graph/movie_schema_test.yaml
           BULK_LOAD_FILE=${GITHUB_WORKSPACE}/flex/tests/rt_mutable_graph/movie_import_test.yaml
           GLOG_v=10 ./bin/bulk_loader -g ${SCHEMA_FILE} -l ${BULK_LOAD_FILE} -d /tmp/csr-data-dir/
+
+    - name: Test graph loading with different delimiter
+      env:
+        GS_TEST_DIR: ${{ github.workspace }}/gstest/
+        FLEX_DATA_DIR: ${{ github.workspace }}/gstest/flex/modern_graph_tab_delimiter/
+      run: |
+        rm -rf /tmp/csr-data-dir/
+        cd ${GITHUB_WORKSPACE}/flex/build/
+        SCHEMA_FILE=${GITHUB_WORKSPACE}/flex/interactive/examples/modern_graph/graph.yaml
+        BULK_LOAD_FILE=${GITHUB_WORKSPACE}/flex/interactive/examples/modern_graph/bulk_load.yaml
+        sed -i 's/|/\\t/g' ${BULK_LOAD_FILE}
+        GLOG_v=10 ./bin/bulk_loader -g ${SCHEMA_FILE} -l ${BULK_LOAD_FILE} -d /tmp/csr-data-dir/

docs/flex/interactive/data_import.md

Lines changed: 1 addition & 1 deletion
@@ -227,7 +227,7 @@ The table below offers a detailed breakdown of each configuration item. In this
 | loading_config.scheme | file | The source of input data. Currently only `file` and `odps` are supported | No |
 | loading_config.format | N/A | The format of the raw data in CSV | Yes |
 | loading_config.format.metadata | N/A | Mainly for configuring the options for reading CSV | Yes |
-| loading_config.format.metadata.delimiter | '\|' | Delimiter used to split a row of data | Yes |
+| loading_config.format.metadata.delimiter | '|' | Delimiter used to split a row of data, escaped char are also supported, i.e. '\t' | Yes |
 | loading_config.format.metadata.header_row | true | Indicate if the first row should be used as the header | No |
 | loading_config.format.metadata.quoting | false | Whether quoting is used | No |
 | loading_config.format.metadata.quote_char | '\"' | Quoting character (if `quoting` is true) | No |

flex/storages/rt_mutable_graph/loader/csv_fragment_loader.cc

Lines changed: 18 additions & 3 deletions
@@ -158,10 +158,25 @@ static std::vector<std::string> read_header(
 static void put_delimiter_option(const LoadingConfig& loading_config,
                                  arrow::csv::ParseOptions& parse_options) {
   auto delimiter_str = loading_config.GetDelimiter();
-  if (delimiter_str.size() != 1) {
-    LOG(FATAL) << "Delimiter should be a single character";
+  if (delimiter_str.size() != 1 && delimiter_str[0] != '\\') {
+    LOG(FATAL) << "Delimiter should be a single character, or a escape "
+                  "character, like '\\t'";
+  }
+  if (delimiter_str[0] == '\\') {
+    if (delimiter_str.size() != 2) {
+      LOG(FATAL) << "Delimiter should be a single character";
+    }
+    // escape the special character
+    switch (delimiter_str[1]) {
+    case 't':
+      parse_options.delimiter = '\t';
+      break;
+    default:
+      LOG(FATAL) << "Unsupported escape character: " << delimiter_str[1];
+    }
+  } else {
+    parse_options.delimiter = delimiter_str[0];
   }
-  parse_options.delimiter = delimiter_str[0];
 }
 
 static bool put_skip_rows_option(const LoadingConfig& loading_config,
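
For readers who want to try the delimiter rules from the hunk above outside the loader, here is a minimal standalone sketch. It is not code from this commit: `resolve_delimiter` is a hypothetical helper name, and it returns `std::nullopt` where the committed code calls `LOG(FATAL)`, so that rejected inputs can be checked without aborting. It assumes only the behavior visible in the diff: a single non-backslash character is used as-is, and `\t` is the only supported escape.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not part of the commit): maps the configured
// delimiter string to the single character a CSV parser would split on.
// Returns std::nullopt for inputs the committed code would reject.
std::optional<char> resolve_delimiter(const std::string& s) {
  if (s.size() == 1 && s[0] != '\\') {
    return s[0];  // plain single-character delimiter such as '|' or ','
  }
  if (s.size() == 2 && s[0] == '\\') {
    switch (s[1]) {
      case 't':
        return '\t';  // the only escape handled in the diff
      default:
        return std::nullopt;  // unsupported escape, e.g. backslash + 'n'
    }
  }
  return std::nullopt;  // empty, lone backslash, or too long
}
```

Under these assumptions, `resolve_delimiter("|")` yields `'|'` and `resolve_delimiter("\\t")` (backslash followed by `t`, as written in the YAML config) yields a tab character, while an empty string, a lone backslash, `"ab"`, and `"\\n"` are all rejected.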
