|
1 | 1 | --- |
2 | | -title: Query Metadata for Staged Files |
| 2 | +title: Working with File and Column Metadata |
3 | 3 | sidebar_label: Metadata |
4 | 4 | --- |
5 | 5 |
|
6 | | -## Why and What is Metadata? |
| 6 | +This guide explains how to query two kinds of metadata from staged files: file-level metadata, such as the file name and row number, and column-level metadata, such as column names, types, and nullability. |
7 | 7 |
|
8 | | -Databend allows you to retrieve metadata from your data files using the [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema) function. This means you can extract column definitions from data files stored in internal or external stages. Retrieving metadata through the `INFER_SCHEMA` function provides a better understanding of the data structure, ensures data consistency, and enables automated data integration and analysis. The metadata for each column includes the following information: |
| 8 | +## Accessing File-Level Metadata |
9 | 9 |
|
10 | | -- **column_name**: Indicates the name of the column. |
11 | | -- **type**: Indicates the data type of the column. |
12 | | -- **nullable**: Indicates whether the column allows null values. |
13 | | -- **order_id**: Represents the column's position in the table. |
| 10 | +Databend supports the following file-level metadata fields when reading staged files in CSV, TSV, Parquet, or NDJSON format: |
14 | 11 |
|
15 | | -:::note |
16 | | -This feature is currently only available for the Parquet file format. |
17 | | -::: |
| 12 | +| File Metadata | Type | Description | |
| 13 | +|----------------------------|---------|--------------------------------------------------| |
| 14 | +| `metadata$filename` | VARCHAR | The name of the file from which the row was read | |
| 15 | +| `metadata$file_row_number` | INT | The row number within the file (starting from 0) | |
18 | 16 |
|
19 | | -The syntax for `INFER_SCHEMA` is as follows. For more detailed information about this function, see [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema). |
| 17 | +These metadata fields are available in: |
20 | 18 |
|
21 | | -```sql |
22 | | -INFER_SCHEMA( |
23 | | - LOCATION => '{ internalStage | externalStage }' |
24 | | - [ PATTERN => '<regex_pattern>'] |
25 | | -) |
26 | | -``` |
| 19 | +- `SELECT` queries that read directly from a stage (e.g., `SELECT * FROM @stage`) |
| 20 | +- `COPY INTO <table>` statements |
27 | 21 |
|
28 | | -## Tutorial: Querying Column Definitions |
| 22 | +### Examples |
29 | 23 |
|
30 | | -In this tutorial, we will guide you through the process of uploading the sample file to an internal stage, querying the column definitions, and finally creating a table based on the staged file. Before you start, download and save the sample file [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet) to a local folder. |
| 24 | +1. Querying Metadata Fields |
31 | 25 |
|
32 | | -1. Create an internal stage named *my_internal_stage*: |
| 26 | +You can directly select metadata fields when reading from a stage: |
33 | 27 |
|
34 | 28 | ```sql |
35 | | -CREATE STAGE my_internal_stage; |
| 29 | +SELECT |
| 30 | + metadata$filename, |
| 31 | + metadata$file_row_number, |
| 32 | + * |
| 33 | +FROM @my_internal_stage/iris.parquet |
| 34 | +LIMIT 5; |
36 | 35 | ``` |
37 | 36 |
|
38 | | -2. Stage the sample file using [BendSQL](../../30-sql-clients/00-bendsql/index.md): |
39 | | - |
40 | 37 | ```sql |
41 | | -PUT fs:///Users/eric/Documents/books.parquet @my_internal_stage |
| 38 | +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ |
| 39 | +│ metadata$filename │ metadata$file_row_number │ id │ sepal_length │ sepal_width │ petal_length │ petal_width │ species │ metadata$filename │ metadata$file_row_number │ |
| 40 | +├───────────────────┼──────────────────────────┼─────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────┼───────────────────┼──────────────────────────┤ |
| 41 | +│ iris.parquet │ 0 │ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ setosa │ iris.parquet │ 0 │ |
| 42 | +│ iris.parquet │ 1 │ 2 │ 4.9 │ 3 │ 1.4 │ 0.2 │ setosa │ iris.parquet │ 1 │ |
| 43 | +│ iris.parquet │ 2 │ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ setosa │ iris.parquet │ 2 │ |
| 44 | +│ iris.parquet │ 3 │ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ setosa │ iris.parquet │ 3 │ |
| 45 | +│ iris.parquet │ 4 │ 5 │ 5 │ 3.6 │ 1.4 │ 0.2 │ setosa │ iris.parquet │ 4 │ |
| 46 | +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
42 | 47 | ``` |
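| | + |
| | +Because the metadata fields can be selected like ordinary columns, you can also group by them. The following sketch counts the rows read from each Parquet file in the stage. It assumes the stage contains several `.parquet` files and that your Databend version accepts the `FILE_FORMAT` and `PATTERN` options on a stage query; the regex is illustrative only: |
| | + |
| | +```sql |
| | +-- Count the rows read from each staged Parquet file. |
| | +-- The PATTERN regex below is an illustrative assumption. |
| | +SELECT metadata$filename, COUNT(*) AS row_count |
| | +FROM @my_internal_stage (FILE_FORMAT => 'parquet', PATTERN => '.*[.]parquet') |
| | +GROUP BY metadata$filename; |
| | +``` |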
43 | 48 |
|
44 | | -Result: |
45 | | -``` |
46 | | -┌───────────────────────────────────────────────┐ |
47 | | -│ file │ status │ |
48 | | -│ String │ String │ |
49 | | -├─────────────────────────────────────┼─────────┤ |
50 | | -│ /Users/eric/Documents/books.parquet │ SUCCESS │ |
51 | | -└───────────────────────────────────────────────┘ |
52 | | -``` |
| 49 | +2. Using Metadata in COPY INTO |
53 | 50 |
|
54 | | -3. Query the column definitions from the staged sample file: |
| 51 | +You can load metadata fields into columns of the target table by selecting them in the transformation query of a `COPY INTO` statement: |
55 | 52 |
|
56 | 53 | ```sql |
57 | | -SELECT * FROM INFER_SCHEMA(location => '@my_internal_stage/books.parquet'); |
| 54 | +COPY INTO iris_with_meta |
| 55 | +FROM (SELECT metadata$filename, metadata$file_row_number, $1, $2, $3, $4, $5 FROM @my_internal_stage/iris.parquet) |
| 56 | +FILE_FORMAT=(TYPE=parquet); |
58 | 57 | ``` |
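| | + |
| | +`COPY INTO` expects the target table to exist, with columns matching the expressions in the `SELECT` list, in order. A minimal sketch of `iris_with_meta` might look like the following; the column names `source_file` and `source_row` are hypothetical, and the remaining column types are taken from the `INFER_SCHEMA` output for `iris.parquet` shown on this page: |
| | + |
| | +```sql |
| | +-- Hypothetical target table for the COPY INTO example. |
| | +-- source_file receives metadata$filename; source_row receives metadata$file_row_number. |
| | +CREATE TABLE iris_with_meta ( |
| | +  source_file  VARCHAR, |
| | +  source_row   INT, |
| | +  id           BIGINT, |
| | +  sepal_length DOUBLE, |
| | +  sepal_width  DOUBLE, |
| | +  petal_length DOUBLE, |
| | +  petal_width  DOUBLE |
| | +); |
| | +``` |
| | + |
| | +The `COPY INTO` statement selects only `$1` through `$5`, so this sketch has no column for `species`; add one (and `$6` to the query) if you need it. |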
59 | 58 |
|
60 | | -Result: |
61 | | -``` |
62 | | -┌─────────────┬─────────┬─────────┬─────────┐ |
63 | | -│ column_name │ type │ nullable│ order_id│ |
64 | | -├─────────────┼─────────┼─────────┼─────────┤ |
65 | | -│ title │ VARCHAR │ 0 │ 0 │ |
66 | | -│ author │ VARCHAR │ 0 │ 1 │ |
67 | | -│ date │ VARCHAR │ 0 │ 2 │ |
68 | | -└─────────────┴─────────┴─────────┴─────────┘ |
69 | | -``` |
| 59 | +## Inferring Column Metadata from Files |
70 | 60 |
|
71 | | -4. Create a table named *mybooks* based on the staged sample file: |
| 61 | +You can retrieve the following column-level metadata from staged Parquet files with the [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema) function: |
72 | 62 |
|
73 | | -```sql |
74 | | -CREATE TABLE mybooks AS SELECT * FROM @my_internal_stage/books.parquet; |
75 | | -``` |
| 63 | +| Column Metadata | Type | Description | |
| 64 | +|-----------------|---------|--------------------------------------------------| |
| 65 | +| `column_name` | String | Indicates the name of the column. | |
| 66 | +| `type` | String | Indicates the data type of the column. | |
| 67 | +| `nullable` | Boolean | Indicates whether the column allows null values. | |
| 68 | +| `order_id` | UInt64 | Represents the column's position in the table. | |
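| | + |
| | +The function takes a stage location and an optional regular expression that selects which files to inspect. The general form is sketched below; refer to [INFER_SCHEMA](/sql/sql-functions/table-functions/infer-schema) for the complete reference: |
| | + |
| | +```sql |
| | +INFER_SCHEMA( |
| | +  LOCATION => '{ internalStage | externalStage }' |
| | +  [ PATTERN => '<regex_pattern>' ] |
| | +) |
| | +``` |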
76 | 69 |
|
77 | | -Check the created table: |
| 70 | +### Examples |
78 | 71 |
|
79 | | -```sql |
80 | | -DESC mybooks; |
81 | | -``` |
| 72 | +The following example retrieves column metadata from a Parquet file staged in `@my_internal_stage`: |
82 | 73 |
|
83 | | -Result: |
84 | | -``` |
85 | | -┌─────────┬─────────┬──────┬─────────┬───────┐ |
86 | | -│ Field │ Type │ Null │ Default │ Extra │ |
87 | | -├─────────┼─────────┼──────┼─────────┼───────┤ |
88 | | -│ title │ VARCHAR │ NO │ '' │ │ |
89 | | -│ author │ VARCHAR │ NO │ '' │ │ |
90 | | -│ date │ VARCHAR │ NO │ '' │ │ |
91 | | -└─────────┴─────────┴──────┴─────────┴───────┘ |
| 74 | +```sql |
| 75 | +SELECT * FROM INFER_SCHEMA(location => '@my_internal_stage/iris.parquet'); |
92 | 76 | ``` |
93 | 77 |
|
94 | 78 | ```sql |
95 | | -SELECT * FROM mybooks; |
| 79 | +┌──────────────────────────────────────────────┐ |
| 80 | +│ column_name │ type │ nullable │ order_id │ |
| 81 | +├──────────────┼─────────┼──────────┼──────────┤ |
| 82 | +│ id │ BIGINT │ true │ 0 │ |
| 83 | +│ sepal_length │ DOUBLE │ true │ 1 │ |
| 84 | +│ sepal_width │ DOUBLE │ true │ 2 │ |
| 85 | +│ petal_length │ DOUBLE │ true │ 3 │ |
| 86 | +│ petal_width │ DOUBLE │ true │ 4 │ |
| 87 | +│ species │ VARCHAR │ true │ 5 │ |
| 88 | +└──────────────────────────────────────────────┘ |
96 | 89 | ``` |
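| | + |
| | +Building on the example above, the following sketch inspects every matching file in the stage and then creates a table from the staged file. The regex and the table name `iris` are illustrative assumptions, and the first query assumes the matched files share one schema: |
| | + |
| | +```sql |
| | +-- Inspect column metadata for all Parquet files in the stage. |
| | +SELECT * FROM INFER_SCHEMA( |
| | +  location => '@my_internal_stage', |
| | +  pattern => '.*[.]parquet' |
| | +); |
| | + |
| | +-- Create a table whose columns follow the staged file's schema. |
| | +CREATE TABLE iris AS SELECT * FROM @my_internal_stage/iris.parquet; |
| | +``` |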
97 | 90 |
|
98 | | -Result: |
99 | | -``` |
100 | | -┌───────────────────────────┬───────────────────┬──────┐ |
101 | | -│ title │ author │ date │ |
102 | | -├───────────────────────────┼───────────────────┼──────┤ |
103 | | -│ Transaction Processing │ Jim Gray │ 1992 │ |
104 | | -│ Readings in Database Systems│ Michael Stonebraker│ 2004│ |
105 | | -└───────────────────────────┴───────────────────┴──────┘ |
106 | | -``` |
| 91 | +## Tutorials |
| 92 | + |
| 93 | +- [Querying Metadata](/tutorials/load/query-metadata) |