Avro files can be queried directly as variants using `$1:<column>`.
:::

## Avro Querying Features Overview

Databend provides comprehensive support for querying Avro files directly from stages. This allows for flexible data exploration and transformation without needing to load the data into a table first.

* **Variant Representation**: Each row in an Avro file is treated as a variant, referenced by `$1`. This allows for flexible access to nested structures within the Avro data.
* **Type Mapping**: Each Avro type is mapped to a corresponding variant type in Databend.
* **Metadata Access**: You can access metadata columns like `metadata$filename` and `metadata$file_row_number` for additional context about the source file and row.
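
To make the variant model concrete, the following Python sketch treats one Avro row as a JSON-like value and resolves colon-separated paths against it, loosely mirroring what `$1:<column>` does. It is an illustration only, not Databend's implementation: the helper `get_path`, the sample row, and the nested `address` field are hypothetical, and returning `None` for a missing key is an assumption about NULL behavior.

```python
import json


def get_path(variant, path):
    """Follow a colon-separated path such as "address:city" through nested dicts.

    Hypothetical helper that imitates `$1:<path>` access on a variant.
    """
    value = variant
    for key in path.split(":"):
        if not isinstance(value, dict) or key not in value:
            return None  # assumption: a missing key behaves like NULL
        value = value[key]
    return value


# One Avro row, shown as the JSON-like value that `$1` would refer to.
# The nested "address" field is made up for this example.
row = json.loads('{"id": 1, "name": "Alice", "address": {"city": "Berlin"}}')

print(get_path(row, "id"))            # top-level field, like $1:id
print(get_path(row, "address:city"))  # nested field, like $1:address:city
print(get_path(row, "missing"))       # absent field
```

Because every row is a single value, no table schema has to be declared before nested fields can be reached.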

## Tutorial

This tutorial demonstrates how to query Avro files stored in a stage.

### Step 1. Prepare an Avro File

Consider an Avro file with the following schema named `user`:

```json
{
  "type": "record",
  "name": "user",
  "fields": [
    {
      "name": "id",
      "type": "long"
    },
    {
      "name": "name",
      "type": "string"
    }
  ]
}
```
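
Before uploading a file, it can help to confirm which fields the schema exposes, since those are the names available as `$1:<column>`. The sketch below uses only Python's standard `json` module to read the schema above and list its fields. The function name `field_types` is hypothetical, and the check covers just the top-level record, not the full Avro specification.

```python
import json

# The `user` schema from this step, embedded as text.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "user",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
"""


def field_types(schema_text):
    """Return a {field name: Avro type} mapping for a record schema."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return {field["name"]: field["type"] for field in schema["fields"]}


fields = field_types(SCHEMA_JSON)
for name, avro_type in fields.items():
    # Each field name is what follows `$1:` in a query.
    print(f"$1:{name} has Avro type {avro_type}")
```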

### Step 2. Create an External Stage

Create an external stage that points to the S3 bucket where your Avro files are stored, using your own credentials.

```sql
CREATE STAGE avro_query_stage
URL = 's3://load/avro/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>'
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
```
69
+
70
+
### Step 3. Query Avro Files
71
+
72
+
#### Basic Query
73
+
74
+
Query Avro files directly from a stage:
75
+
76
+
```sql
SELECT
    CAST($1:id AS INT) AS id,
    $1:name AS name
FROM @avro_query_stage
(
    FILE_FORMAT => 'AVRO',
    PATTERN => '.*[.]avro'
);
```
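
The `PATTERN` value is a regular expression, and `[.]` matches a literal dot where a bare `.` would match any character. The Python sketch below shows which file paths the pattern from the query above would select. It rests on two assumptions that are not confirmed by this page: that the pattern must match the entire path, and that Python's `re` module behaves like Databend's regex engine for this simple pattern. The file names are made up.

```python
import re

PATTERN = r".*[.]avro"  # same pattern as in the query above


def is_selected(path, pattern=PATTERN):
    """Return True if the whole path matches the pattern.

    Assumption: the pattern is applied to the full path, not as a substring search.
    """
    return re.fullmatch(pattern, path) is not None


# Hypothetical file paths in the stage.
candidates = [
    "users.avro",
    "2024/01/users.avro",
    "users.avro.bak",
    "users_avro",
    "users.parquet",
]

selected = [path for path in candidates if is_selected(path)]
print(selected)
```

Under these assumptions, only paths ending in `.avro` are selected, including those in subdirectories.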

#### Query with Metadata

Query Avro files directly from a stage, including metadata columns like `metadata$filename` and `metadata$file_row_number`:

```sql
SELECT
    metadata$filename AS file,
    metadata$file_row_number AS row,
    CAST($1:id AS INT) AS id,
    $1:name AS name
FROM @avro_query_stage
(
    FILE_FORMAT => 'AVRO',
    PATTERN => '.*[.]avro'
);
```

## Type Mapping to Variant

Variants in Databend are stored as JSONB. While most Avro types map straightforwardly, some special considerations apply:

* **Time Types**: `TimeMillis` and `TimeMicros` are mapped to `INT64` as JSONB does not have a native Time type. Users should be aware of the original type when processing these values.
* **Decimal Types**: Decimals are loaded as `DECIMAL128` or `DECIMAL256`. An error may occur if the precision exceeds the supported limits.
* **Enum Types**: Avro `ENUM` types are mapped to `STRING` values in Databend.
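
Since time values arrive as plain integers, client code has to apply the original unit itself. In the Avro specification, `time-millis` counts milliseconds after midnight and `time-micros` counts microseconds after midnight. The Python sketch below converts both back to a time of day. It also includes a small precision check for decimals, which assumes that `DECIMAL128` holds up to 38 digits and `DECIMAL256` up to 76. These limits are not stated on this page and should be verified against the Databend decimal documentation. All function names are hypothetical.

```python
from datetime import time


def time_from_millis(value: int) -> time:
    """Interpret an Avro time-millis integer (ms after midnight) as a time of day."""
    seconds, millis = divmod(value, 1_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, millis * 1_000)


def time_from_micros(value: int) -> time:
    """Interpret an Avro time-micros integer (us after midnight) as a time of day."""
    seconds, micros = divmod(value, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micros)


def decimal_storage(precision: int) -> str:
    """Pick the decimal width for a given precision.

    Assumption: DECIMAL128 covers 1..38 digits and DECIMAL256 covers 39..76.
    """
    if precision < 1:
        raise ValueError("precision must be positive")
    if precision <= 38:
        return "DECIMAL128"
    if precision <= 76:
        return "DECIMAL256"
    raise ValueError(f"precision {precision} exceeds the assumed limit of 76")


# The same instant expressed in both units.
print(time_from_millis(45_296_789))
print(time_from_micros(45_296_789_000))
print(decimal_storage(20), decimal_storage(50))
```

Mixing up the two units changes the result by a factor of 1,000, so it is worth recording which Avro logical type each column came from.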

docs/en/guides/40-load-data/04-transform/04-querying-metadata.md
+13 -17 (13 additions & 17 deletions)
@@ -3,11 +3,14 @@ title: Working with File and Column Metadata
 sidebar_label: Metadata
 ---

-This guide explains how to query metadata from staged files. Metadata includes both file-level metadata (such as file name and row number) and column-level metadata (such as column names, types, and nullability).
+This guide explains how to query metadata from staged files. The supported file formats for metadata querying are summarized in the table below:

-Databend allows you to retrieve the following column-level metadata from your staged files in the Parquet format using the [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema) function:
+Databend allows you to retrieve column-level metadata from your staged files using the [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema) function. This is currently supported for **Parquet** files.