[bigquery] SPLIT(BYTES) is incorrectly roundtripped #6392

@erindru

Description

From the BigQuery docs for SPLIT:

For STRING, the default delimiter is the comma ,.
For BYTES, you must specify a delimiter.

For a STRING input, SQLGlot injects the delimiter as ',' since this lines up with the BQ default:

>>> from sqlglot import parse_one
>>> parse_one("SELECT SPLIT('foo,bar') AS a", dialect="bigquery").sql(dialect="bigquery")
"SELECT SPLIT('foo,bar', ',') AS a"

This works fine.

However, for a BYTES input, SQLGlot still injects the same STRING delimiter:

>>> parse_one("SELECT SPLIT(b'foo,bar') AS a", dialect="bigquery").sql(dialect="bigquery")
"SELECT SPLIT(b'foo,bar', ',') AS a"

This causes BigQuery to throw an error:

No matching signature for function SPLIT
  Argument types: BYTES, STRING
  Signature: SPLIT(STRING, [STRING])
    Argument 1: Unable to coerce type BYTES to expected type STRING
  Signature: SPLIT(BYTES, BYTES)
    Argument 2: Unable to coerce type STRING to expected type BYTES at [1:8]

I think there are two options here:

  • Follow the BQ docs and don't infer a delimiter for a BYTES input if one isn't specified
  • or, if the input is BYTES, infer the delimiter as b',' instead of ','
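
To make the second option concrete, here is a minimal standalone sketch in plain Python. It is not SQLGlot's actual generator code: `default_split_delimiter` and `render_split` are hypothetical helpers invented for illustration. They operate on raw SQL text and only recognise BYTES *literals*. A real fix would presumably inspect the parsed expression (or its inferred type) instead.

```python
import re
from typing import Optional

# Assumption: a BigQuery BYTES literal starts with b/B, optionally combined
# with r/R for raw bytes (e.g. b'..', rb'..', br".."), followed by a quote.
_BYTES_LITERAL = re.compile(r"^(?:[bB][rR]?|[rR][bB])['\"]")


def default_split_delimiter(first_arg_sql: str) -> str:
    """Pick the delimiter literal to inject for a one-argument SPLIT call.

    Hypothetical helper: returns b',' for a BYTES literal and ',' otherwise.
    """
    if _BYTES_LITERAL.match(first_arg_sql.strip()):
        return "b','"
    return "','"


def render_split(first_arg_sql: str, delimiter_sql: Optional[str] = None) -> str:
    """Render SPLIT(...), inferring the delimiter only when none was given."""
    if delimiter_sql is None:
        delimiter_sql = default_split_delimiter(first_arg_sql)
    return f"SPLIT({first_arg_sql}, {delimiter_sql})"


print(render_split("'foo,bar'"))        # STRING input keeps the ',' default
print(render_split("b'foo,bar'"))       # BYTES input gets a BYTES delimiter
print(render_split("b'a|b'", "b'|'"))   # an explicit delimiter is left alone
```

This literal-prefix check says nothing about columns or expressions whose type is BYTES. For those, the first option (not injecting a delimiter at all when none was written) may be the safer behaviour, since the type is not visible in the SQL text.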
