Feature Request: Add skip_rows_before_header (or equivalent) to CsvDecoder in Connector Builder #74285
Unanswered
ced455
asked this question in
Connector Builder
Replies: 0 comments
Summary
The CsvDecoder component in the Airbyte Connector Builder does not support skipping rows before the header row, blocking users from correctly parsing CSV/TSV files that contain metadata lines at the top.
Problem Description
When building a source connector using the Connector Builder against an API (Apple API in my case) that returns a TSV file, the response includes metadata on the first 3 lines before the actual header row. The CsvDecoder reads the first line as the header, resulting in incorrect schema detection and failed parsing.
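The failure mode can be reproduced outside Airbyte. The sketch below uses Python's standard csv module as a stand-in for the decoder (it is not the CsvDecoder implementation), and the sample payload and the parse_naively helper are made up for illustration, not taken from the actual Apple response.

```python
import csv
import io

# Hypothetical TSV payload: three metadata lines, then the real header row.
SAMPLE_TSV = (
    "Report Name\tSales Summary\n"
    "Generated\t2024-01-01\n"
    "Vendor\t12345\n"
    "sku\tunits\tcountry\n"
    "A1\t10\tUS\n"
    "B2\t5\tFR\n"
)


def parse_naively(text: str) -> list:
    """Parse TSV text, treating the very first line as the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = parse_naively(SAMPLE_TSV)

# The detected columns come from the first metadata line, not the real header.
print(list(rows[0].keys()))

# The real header row is emitted as a data record, and its third column
# overflows into DictReader's catch-all key (None).
print(rows[2])
```

Every record inherits the metadata line's two column names, so the inferred schema is wrong and the real header is treated as data.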
Root Cause
The official CsvDecoder component only exposes three properties [YAML Reference]:
encoding (string) — default: utf-8
delimiter (string) — default: ,
set_values_to_none (array)
There is no parameter to skip rows before the header. This is in contrast to file-based source connectors (S3, GCS, Azure Blob Storage, etc.), which do support a skip_rows_before_header option [CSV format settings].
Impact / Blocked
Users building API-based connectors in the Connector Builder who receive CSV/TSV responses with metadata lines above the header row cannot parse those responses correctly. There is no workaround available within the Connector Builder.
Request
Add skip_rows_before_header and skip_rows_after_header parameters to the CsvDecoder component, consistent with the behavior already available in file-based connectors, where skip_rows_before_header is defined as:
"The number of rows to skip before the header row. For example, if the header row is on the 3rd row, enter 2 in this field."
Proposed YAML Example
decoder:
  type: CsvDecoder
  delimiter: "\t"
  encoding: "utf-8"
  skip_rows_before_header: 3
  skip_rows_after_header: 0
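For reference, the requested semantics could look roughly like the following. This is a minimal sketch assuming the two options behave as their names suggest; decode_csv and the sample data are hypothetical, Python's standard csv module stands in for the decoder, and none of this is Airbyte CDK code.

```python
import csv
import io
from itertools import islice

# Hypothetical TSV payload: three metadata lines, then the real header row.
SAMPLE_TSV = (
    "Report Name\tSales Summary\n"
    "Generated\t2024-01-01\n"
    "Vendor\t12345\n"
    "sku\tunits\tcountry\n"
    "A1\t10\tUS\n"
    "B2\t5\tFR\n"
)


def decode_csv(text, delimiter=",", skip_rows_before_header=0,
               skip_rows_after_header=0):
    """Return records as dicts, skipping rows around the header row.

    Assumption: "rows" means parsed CSV rows, so a quoted field that spans
    several physical lines counts as a single row.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)

    # Discard the metadata rows that precede the header.
    for _ in islice(reader, skip_rows_before_header):
        pass

    header = next(reader, None)
    if header is None:
        return []  # nothing left after skipping

    # Discard any rows between the header and the first data row.
    for _ in islice(reader, skip_rows_after_header):
        pass

    return [dict(zip(header, row)) for row in reader]


records = decode_csv(SAMPLE_TSV, delimiter="\t", skip_rows_before_header=3)
print(records)
```

With skip_rows_before_header set to 3, the fourth line is used as the header and only the two data rows are returned as records.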
Priority Justification
Without this feature, users must resort to building a full custom Python connector, which is significantly more complex and has deployment restrictions on Airbyte Cloud.