Commit 6df81ff

Merge pull request #10541 from BohuTANG/doc-map
docs(map): add map type and more refactory the data type category
2 parents 61f108a + 28da72e commit 6df81ff

8 files changed: +131 -52 lines changed

docs/doc/13-sql-reference/10-data-types/30-data-type-string-types.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -7,9 +7,9 @@ description: Basic String data type.
77

88
In Databend, strings can be stored in the VARCHAR field, the storage size is variable.
99

10-
| Name | Aliases | Storage Size
11-
| -------- | ------------|---------------
12-
| VARCHAR | STRING | variable
10+
| Name | Aliases | Storage Size |
11+
|---------|---------|--------------|
12+
| VARCHAR | STRING | variable |
1313

1414
## Functions
1515

docs/doc/13-sql-reference/10-data-types/41-data-type-tuple-types.md

Lines changed: 4 additions & 3 deletions
@@ -4,9 +4,10 @@ description: Tuple is a collection of ordered, immutable types.
44
---
55

66
## Tuple Data Types
7-
| Name | Aliases | Values | Description
8-
|---------|-----------|--------------------------|----------------
9-
| TUPLE | | ('2023-02-14 08:00:00','Valentine's Day') | Collection of ordered,immmutable,which requires the type of each element to be declared before being used.
7+
8+
| Name | Aliases | Values | Description |
9+
|-------|---------|-------------------------------------------|------------------------------------------------------------------------------------------------------------|
10+
| TUPLE | | ('2023-02-14 08:00:00','Valentine's Day') | An ordered, immutable collection; the type of each element must be declared before use. |
1011

1112
A tuple is a collection of ordered, immutable, and heterogeneous elements, represented within parentheses () in most programming languages. In other words, a tuple is a finite ordered list of elements of different data types, and once created, its elements cannot be changed or modified.
1213

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
---
2+
title: Map
3+
---
4+
5+
The MAP data structure holds a set of `Key:Value` pairs and stores them internally as a nested Array(Tuple(key, value)). It suits cases where the data type is fixed but the full set of `Key`s cannot be known in advance.
6+
7+
## Understanding Key:Value
8+
9+
The `Key` is of a specified basic data type, including Boolean, Number, Decimal, String, Date, or Timestamp. A `Key`'s value cannot be Null, and duplicates are not allowed. The `Value` can be any data type, including nested arrays, tuples, and so on.
10+
11+
Map data can be generated through `Key:Value` pairs enclosed in curly braces or by using the Map function to convert two arrays into a Map. The Map function takes two arrays as input, where the elements in the first array serve as the keys and the elements in the second array serve as the values. See an example below:
12+
13+
```sql
14+
-- Input arrays: [1, 2] and ['v1', 'v2']
15+
-- Resulting Map: {1: 'v1', 2: 'v2'}
16+
17+
SELECT {'k1': 1, 'k2': 2}, map([1, 2], ['v1', 'v2']);
18+
+-----------------+---------------------------+
19+
| {'k1':1,'k2':2} | map([1, 2], ['v1', 'v2']) |
20+
+-----------------+---------------------------+
21+
| {'k1':1,'k2':2} | {1:'v1',2:'v2'} |
22+
+-----------------+---------------------------+
23+
```
24+
25+
## Map and Bloom Filter Index
26+
27+
Databend creates a bloom filter index on Map values of the following data types: `Numeric`, `String`, `Timestamp`, and `Date`.
28+
29+
This makes it easier and faster to search for values in the MAP data structure.
30+
31+
The implementation of the bloom filter index in Databend Map is in [PR#10457](https://github.com/datafuselabs/databend/pull/10457).
32+
33+
The bloom filter is particularly effective in reducing query time when the queried value does not exist.
34+
35+
For example:
36+
```sql
37+
select * from nginx_log where log['ip'] = '205.91.162.148';
38+
+----+----------------------------------------+
39+
| id | log |
40+
+----+----------------------------------------+
41+
| 1 | {'ip':'205.91.162.148','url':'test-1'} |
42+
+----+----------------------------------------+
43+
1 row in set
44+
Time: 1.733s
45+
46+
select * from nginx_log where log['ip'] = '205.91.162.141';
47+
+----+-----+
48+
| id | log |
49+
+----+-----+
50+
+----+-----+
51+
0 rows in set
52+
Time: 0.129s
53+
```
54+
55+
## Examples
56+
57+
The following example creates a table that includes a Map column, then queries Map data from the table.
58+
59+
```sql
60+
-- Create a table
61+
CREATE TABLE map_table(m MAP(INT64, STRING));
62+
63+
DESC map_table;
64+
+-------+--------------------+------+---------+-------+
65+
| Field | Type | Null | Default | Extra |
66+
+-------+--------------------+------+---------+-------+
67+
| m | MAP(INT64, STRING) | NO | {} | |
68+
+-------+--------------------+------+---------+-------+
69+
70+
-- Insert Map data
71+
INSERT INTO map_table VALUES({1:'a',2:'b'}), ({1:'c',3:'d',4:'e'});
72+
73+
SELECT * FROM map_table;
74+
+---------------------+
75+
| m |
76+
+---------------------+
77+
| {1:'a',2:'b'} |
78+
| {1:'c',3:'d',4:'e'} |
79+
+---------------------+
80+
81+
-- Query Values in Map by Keys
82+
-- NULL will be returned if Key is not found in a row.
83+
84+
SELECT m[1], m[3] FROM map_table;
85+
+------+------+
86+
| m[1] | m[3] |
87+
+------+------+
88+
| a | NULL |
89+
| c | d |
90+
+------+------+
91+
```
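
As a further sketch of the key rules described above (a `Key` may be a String, and a lookup returns NULL when the key is absent from a row), the example below uses a Map with STRING keys. The table name `settings_table` and its contents are hypothetical and do not come from the commit; only the `MAP(...)` column syntax, the `{key:value}` literal, and the `m[key]` lookup are taken from the examples above.

```sql
-- Hypothetical table: same shape as map_table above, but keyed by STRING.
CREATE TABLE settings_table(m MAP(STRING, INT64));

-- The second row deliberately omits the 'k2' key.
INSERT INTO settings_table VALUES({'k1':1,'k2':2}), ({'k1':3});

-- m['k2'] should be NULL for the row that has no 'k2' key.
SELECT m['k1'], m['k2'] FROM settings_table;
```

No output is shown because this statement sequence was not part of the original commit and its exact result formatting is not confirmed here.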

docs/doc/13-sql-reference/10-data-types/42-data-type-semi-structured-types.md renamed to docs/doc/13-sql-reference/10-data-types/43-data-type-variant.md

Lines changed: 3 additions & 11 deletions
@@ -1,18 +1,10 @@
11
---
2-
title: Semi-structured
3-
description: Semi-structured Types can hold any other data types.
2+
title: Variant
43
---
54

6-
## Semi-structured Data Types
7-
| Name | Aliases | Build From Values | Description
8-
|---------|-----------|--------------------------|----------------
9-
| VARIANT | JSON | [1,{"a":1,"b":{"c":2}}] | Collection of elements of different data types, including NULL, BOOLEAN, NUMBER, STRING, ARRAY, and OBJECT.
10-
11-
## Variant Data Types
12-
135
A VARIANT can store a value of any other type, including NULL, BOOLEAN, NUMBER, STRING, ARRAY, and OBJECT. The internal value can be nested to any depth, which makes VARIANT flexible for storing varied data. VARIANT is also called JSON; for more information, see the [JSON website](https://www.json.org/json-en.html).
146

15-
### Example
7+
Here's an example of inserting and querying Variant data in Databend:
168

179
Create a table:
1810
```sql
@@ -232,4 +224,4 @@ SELECT sum(arr[0]::INT) FROM array_table GROUP BY arr[0]::INT;
232224

233225
## Functions
234226

235-
See [Semi-structured Functions](/doc/reference/functions/semi-structured-functions).
227+
See [Variant Functions](/doc/reference/functions/variant-functions).
Lines changed: 23 additions & 31 deletions
@@ -1,40 +1,32 @@
11
---
2-
title: Databend Data Types
2+
title: Data Types
33
sidebar_position: 1
44
slug: ./
55
---
66

7-
Databend supports SQL data types in several categories:
8-
* [Boolean Data Types](00-data-type-logical-types.md)
9-
* [Numeric Data Types](10-data-type-numeric-types.md)
10-
* [Decimal Data Types](11-data-type-decimal-types.md)
11-
* [Date & Time Data Types](20-data-type-time-date-types.md)
12-
* [String Data Types](30-data-type-string-types.md)
13-
* [Array(T) Data Types](40-data-type-array-types.md)
14-
* [Tuple Data Types](41-data-type-tuple-types.md)
15-
* [Semi-structured Data Types](42-data-type-semi-structured-types.md)
7+
Databend is capable of handling both general and semi-structured data types.
168

17-
## General-Purpose Data Types
9+
## General Data Types
1810

19-
| Name | Aliases | Storage Size | Min Value | Max Value | Description |
20-
|---------------|---------|--------------|--------------------------|--------------------------------|-------------------------------------------------------------------------|
21-
| **BOOLEAN** | BOOL | 1 byte | | | Logical boolean (true/false) |
22-
| **TINYINT** | INT8 | 1 byte | -128 | 127 | |
23-
| **SMALLINT** | INT16 | 2 bytes | -32768 | 32767 | |
24-
| **INT** | INT32 | 4 bytes | -2147483648 | 2147483647 | |
25-
| **BIGINT** | INT64 | 8 bytes | -9223372036854775808 | 9223372036854775807 | |
26-
| **FLOAT** | | 4 bytes | -3.40282347e+38 | 3.40282347e+38 | |
27-
| **DOUBLE** | | 8 bytes | -1.7976931348623157E+308 | 1.7976931348623157E+308 | |
28-
| **DECIMAL** | | 16/32 bytes | -10^P / 10^S | 10^P / 10^S | |
29-
| **DATE** | | 4 bytes | 1000-01-01 | 9999-12-31 | YYYY-MM-DD |
30-
| **TIMESTAMP** | | 8 bytes | 0001-01-01 00:00:00 | 9999-12-31 23:59:59.999999 UTC | YYYY-MM-DD hh:mm:ss[.fraction], up to microseconds (6 digits) precision |
31-
| **VARCHAR** | STRING | variable | | | |
32-
| **ARRAY** | | | | | [1,2,3] |
33-
| **TUPLE** | | | | | ('2023-02-14 08:00:00','Valentine's Day') |
11+
| Data Type | Alias | Storage Size | Min Value | Max Value |
12+
|-----------|--------|--------------|--------------------------|--------------------------------|
13+
| BOOLEAN | BOOL | 1 byte | N/A | N/A |
14+
| TINYINT | INT8 | 1 byte | -128 | 127 |
15+
| SMALLINT | INT16 | 2 bytes | -32768 | 32767 |
16+
| INT | INT32 | 4 bytes | -2147483648 | 2147483647 |
17+
| BIGINT | INT64 | 8 bytes | -9223372036854775808 | 9223372036854775807 |
18+
| FLOAT | N/A | 4 bytes | -3.40282347e+38 | 3.40282347e+38 |
19+
| DOUBLE | N/A | 8 bytes | -1.7976931348623157E+308 | 1.7976931348623157E+308 |
20+
| DECIMAL | N/A | 16/32 bytes | -10^P / 10^S | 10^P / 10^S |
21+
| DATE | N/A | 4 bytes | 1000-01-01 | 9999-12-31 |
22+
| TIMESTAMP | N/A | 8 bytes | 0001-01-01 00:00:00 | 9999-12-31 23:59:59.999999 UTC |
23+
| VARCHAR | STRING | N/A | N/A | N/A |
3424

35-
## Semi-structured Data Types
36-
37-
| Name | Aliases | Build From Values | Description |
38-
|-------------|---------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------|
39-
| **VARIANT** | JSON | [1,{"a":1,"b":{"c":2}}] | Collection of elements of different data types., including ARRAY and OBJECT. |
25+
## Nested / Composite Types
4026

27+
| Data Type | Alias | Sample | Description |
28+
|-----------|-------|----------------------------------|-----------------------------------------------------------------------------------|
29+
| ARRAY | N/A | `[1, 2, 3, 4]` | A collection of values of the same data type, accessed by their index. |
30+
| TUPLE | N/A | `('2023-02-14','Valentine Day')` | An ordered collection of values of different data types, accessed by their index. |
31+
| MAP | N/A | `{"a":1, "b":2, "c":3}` | A set of key-value pairs where each key is unique and maps to a value. |
32+
| VARIANT | JSON | `[1,{"a":1,"b":{"c":2}}]` | Collection of elements of different data types, including `ARRAY` and `OBJECT`. |

docs/doc/14-sql-commands/00-ddl/20-table/10-ddl-create-table.md

Lines changed: 4 additions & 1 deletion
@@ -48,7 +48,10 @@ Data type reference:
4848
* [Numeric Data Types](../../../13-sql-reference/10-data-types/10-data-type-numeric-types.md)
4949
* [Date & Time Data Types](../../../13-sql-reference/10-data-types/20-data-type-time-date-types.md)
5050
* [String Data Types](../../../13-sql-reference/10-data-types/30-data-type-string-types.md)
51-
* [Semi-structured Data Types](../../../13-sql-reference/10-data-types/42-data-type-semi-structured-types.md)
51+
* [Array Data Types](../../../13-sql-reference/10-data-types/40-data-type-array-types.md)
52+
* [Tuple Data Types](../../../13-sql-reference/10-data-types/41-data-type-tuple-types.md)
53+
* [Map Data Types](../../../13-sql-reference/10-data-types/42-data-type-map.md)
54+
* [Semi-structured Data Types](../../../13-sql-reference/10-data-types/43-data-type-variant.md)
5255
:::
5356

5457
For detailed information about the CLUSTER BY clause, see [SET CLUSTER KEY](../70-clusterkey/dml-set-cluster-key.md).

docs/doc/14-sql-commands/20-query-syntax/dml-json-path.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22
title: JSON PATH
33
---
44

5-
Databend supports [Semi-structured data type](../../13-sql-reference/10-data-types/42-data-type-semi-structured-types.md) and allow retrieving the inner elements by JSON path operators:
5+
Databend supports the [Semi-structured data type](../../13-sql-reference/10-data-types/43-data-type-variant.md) and allows retrieving the inner elements by JSON path operators:
66

77
## Syntax
88

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
11
{
2-
"label": "Semi-structured Data Functions",
2+
"label": "Variant Functions",
33
"link": {
44
"type": "generated-index",
5-
"slug": "/reference/functions/semi-structured-functions"
5+
"slug": "/reference/functions/variant-functions"
66
}
77
}
