SELECT COUNT(*) works in Athena query editor, fails in awswrangler #3118

@liquidcarbon

Describe the bug

  • SELECT * FROM db.table LIMIT 3 works in Athena query editor and awswrangler
  • SELECT COUNT(*) FROM db.table LIMIT 3 works in the Athena query editor, but awswrangler throws an error (same with COUNT(1) or COUNT(s)):
File ~/py/awsdata/.venv/lib/python3.12/site-packages/awswrangler/athena/_utils.py:861, in create_ctas_table(sql, database, ctas_table, ctas_database, s3_output, storage_format, write_compression, partitioning_info, bucketing_info, field_delimiter, schema_only, workgroup, data_source, encryption, kms_key, categories, wait, athena_query_wait_polling_delay, execution_params, params, paramstyle, boto3_session)
    857     raise exceptions.InvalidCtasApproachQuery(
    858         f"Please, define distinct names for your columns. Root error message: {msg}"
    859     )
    860 if "Column name not specified" in msg:
--> 861     raise exceptions.InvalidArgumentValue(
    862         "Please, define all columns names in your query. (E.g. 'SELECT MAX(col1) AS max_col1, ...')"
    863     )
    864 if "Column type is unknown" in msg:
    865     raise exceptions.InvalidArgumentValue(
    866         "Please, don't leave undefined columns types in your query. You can cast to ensure it. "
    867         "(E.g. 'SELECT CAST(NULL AS INTEGER) AS MY_COL, ...')"
    868     )

InvalidArgumentValue: Please, define all columns names in your query. (E.g. 'SELECT MAX(col1) AS max_col1, ...')
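The traceback is raised from create_ctas_table, so the query appears to be going through the CTAS path, where every output column needs a name. A minimal sketch of the workaround the error message itself suggests, aliasing the aggregate, is below. The helper names build_count_query and count_rows and the alias row_count are hypothetical, chosen only for illustration; wr.athena.read_sql_query is the library call assumed to have been used in the failing case.

```python
def build_count_query(table: str, alias: str = "row_count") -> str:
    """Return a COUNT(*) query whose single output column has an explicit name.

    Hypothetical helper: the alias gives the CTAS table a column name to use.
    """
    return f"SELECT COUNT(*) AS {alias} FROM {table}"


def count_rows(table: str, database: str) -> int:
    """Run the aliased count through awswrangler (needs AWS credentials)."""
    # Imported lazily so build_count_query works without awswrangler installed.
    import awswrangler as wr

    df = wr.athena.read_sql_query(
        sql=build_count_query(table),
        database=database,
    )
    # The result has one row and one column, named after the alias.
    return int(df["row_count"].iloc[0])


if __name__ == "__main__":
    print(build_count_query("db.table"))
```

Another option that may avoid the CTAS requirement altogether is passing ctas_approach=False to read_sql_query, which runs the query directly instead of wrapping it in a CREATE TABLE AS statement; whether that suits this table has not been verified here.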

This may have something to do with the vector field in my table:

CREATE EXTERNAL TABLE `test1`(
  `s` string, 
  `vec` array<smallint>, 
  `mbin` binary
)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://...'
TBLPROPERTIES (
  'compressionType'='gzip', 
  'classification'='parquet', 
  'projection.enabled'='false', 
  'typeOfData'='file')

How to Reproduce

not sure

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.12

AWS SDK for pandas version

3.11.0

Additional context

No response
