* Inside of a fragments folder, any number of [fragment folders](./fragment.md)[`<timestamped_name>`](./timestamped_name.md).
37
-
* Inside of a commit folder, an empty file [`<timestamped_name>`](./timestamped_name.md)`.wrt` associated with every fragment folder [`<timestamped_name>`](./timestamped_name.md), where [`<timestamped_name>`](./timestamped_name.md) is common for the folder and the WRT file. This is used to indicate that fragment [`<timestamped_name>`](./timestamped_name.md) has been *committed* (i.e., its write process finished successfully) and it is ready for use by TileDB. If the WRT file does not exist, the corresponding fragment folder is ignored by TileDB during the reads.
38
-
* Inside the same commit folder, any number of [delete commit files](./delete_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.del`.
39
-
* Inside the same commit folder, any number of [update commit files](./update_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.upd`.
40
-
* Inside the same commit folder, any number of [consolidated commits files](./consolidated_commits_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.con`.
41
-
* Inside the same commit folder, any number of [ignore files](./ignore_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.ign`.
42
-
* Inside of a fragment metadata folder, any number of [consolidated fragment metadata files](./consolidated_fragment_metadata_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.meta`.
43
-
* [Array metadata](./metadata.md) folder `__meta`.
44
-
* Inside of a labels folder, additional TileDB arrays storing dimension label data.
42
+
* Inside of a `__schema` folder, any number of [array schema files](./array_schema.md) [`<timestamped_name>`](./timestamped_name.md).
43
+
* **Note**: the name does _not_ include the format version.
44
+
* _New in version 20_ Inside of the schema folder, an enumerations folder `__enumerations`.
45
+
* Inside of a `__meta` folder, any number of [array metadata files](./metadata.md) [`<timestamped_name>`](./timestamped_name.md).
46
+
* Inside of a `__fragments` folder, any number of [fragment folders](./fragment.md) [`<timestamped_name>`](./timestamped_name.md).
47
+
* _New in version 18_ Inside of a `__labels` folder, additional TileDB arrays storing dimension label data.
48
+
* _New in version 12_ Inside of a `__commits` folder:
49
+
* Any number of empty files [`<timestamped_name>`](./timestamped_name.md)`.wrt`, each associated with the fragment folder of the same [`<timestamped_name>`](./timestamped_name.md), indicating that the fragment has been *committed* (i.e., its write process finished successfully). If the WRT file does not exist, the corresponding fragment must be ignored when reading the array.
50
+
* Any number of [consolidated commits files](./consolidated_commits_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.con`.
51
+
* Any number of [ignore files](./ignore_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.ign`.
52
+
* _New in version 16_ Any number of [delete commit files](./delete_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.del`.
53
+
* _New in version 16_ Any number of [update commit files](./update_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.upd`.
54
+
* _New in version 12_ Inside of a `__fragment_meta` folder, any number of [consolidated fragment metadata files](./consolidated_fragment_metadata_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.meta`.
55
+
56
+
> [!NOTE]
57
+
> Prior to version 12, fragments, commit files, and consolidated fragment metadata were stored directly in the array folder and the extension of commit files was `.ok` instead of `.wrt`. Implementations must support arrays that contain data in both the old and the new hierarchy at the same time.
58
+
59
+
> [!NOTE]
60
+
> Prior to version 10, the array schema was stored in a single `__array_schema.tdb` file in the array folder. Implementations must support arrays that contain both `__array_schema.tdb` and schemas in the `__schema` folder at the same time. For the purpose of array schema evolution, the timestamp of `__array_schema.tdb` must be considered to be earlier than any schema in the `__schema` folder.
61
+
62
+
> [!NOTE]
63
+
> Prior to version 5, commit files were not written. Fragments of these versions are considered to be committed if their corresponding fragment metadata file exists.
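The commit rules in the list and notes above can be sketched as a small Python check. This is only an illustration under stated assumptions: the helper name `is_committed`, the flat set of array-relative paths, and the fragment metadata file name `__fragment_metadata.tdb` are hypothetical choices, the fragment's format version is assumed to be known by the caller, and consolidated commits files (`.con`) and ignore files (`.ign`) are not consulted.

```python
def is_committed(name, version, files, metadata_file="__fragment_metadata.tdb"):
    """Return True if fragment `name` should be treated as committed.

    `files` is a set of paths relative to the array folder (hypothetical
    input shape); `version` is the fragment's format version.
    `metadata_file` is an assumed file name, not taken from the text above.
    """
    if version < 5:
        # No commit files were written; the fragment metadata file decides.
        return f"{name}/{metadata_file}" in files
    if version < 12:
        # Old hierarchy: `<name>.ok` sits directly in the array folder.
        return f"{name}.ok" in files
    # New hierarchy: an empty `<name>.wrt` marker inside `__commits`.
    return f"__commits/{name}.wrt" in files


# Example listing that mixes the old and the new hierarchy.
files = {
    "__commits/frag_a.wrt",
    "frag_b.ok",
    "frag_c/__fragment_metadata.tdb",
}
print(is_committed("frag_a", 22, files))
print(is_committed("frag_b", 9, files))
print(is_committed("frag_c", 4, files))
```

Passing the version explicitly keeps the three eras separate, so a fragment written with version 12 or later is never accepted on the strength of a stray `.ok` file.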
format_spec/array_format_history.md
Lines changed: 7 additions & 7 deletions
Original file line number
Diff line number
Diff line change
@@ -1,8 +1,8 @@
1
1
---
2
-
title: Format version history
2
+
title: Array format version history
3
3
---
4
4
5
-
# Format Version History
5
+
# Array Format Version History
6
6
7
7
## Version 22
8
8
@@ -24,7 +24,7 @@ Introduced in TileDB 2.19
24
24
Introduced in TileDB 2.17
25
25
26
26
* Arrays can have [enumerations](./enumeration.md).
27
-
* The bit-width reduction and positive delta filters are supported on data of date or time types.
27
+
* The bit-width reduction and positive delta encoding filters are supported on data of date or time types.
28
28
* The [filter pipeline options](./filter_pipeline.md#filter-options) for the double-delta filter contain the _Reinterpret datatype_ field.
29
29
30
30
## Version 19
@@ -45,7 +45,7 @@ Introduced in TileDB 2.15
45
45
Introduced in TileDB 2.14
46
46
47
47
* The _Order_ field was added to [attributes](./array_schema.md#attribute).
48
-
* Cell offsets in dimensions or attributes of UTF-8 string type are not written in the offset tiles, if the RLE or dictionary filter exists in the filter pipeline. They are instead encoded as part of the data tile.
48
+
* Cell offsets in dimensions or attributes of UTF-8 string type are not written in the offset tiles, if the RLE or dictionary encoding filter exists in the filter pipeline. They are instead encoded as part of the data tile.
49
49
50
50
## Version 16
51
51
@@ -72,7 +72,7 @@ Introduced in TileDB 2.10
72
72
73
73
Introduced in TileDB 2.9
74
74
75
-
* The [dictionary filter](./filters/dictionary_encoding.md) was added.
75
+
* Cell offsets in dimensions or attributes of ASCII string type are not written in the offset tiles, if the dictionary encoding filter exists in the filter pipeline. They are instead encoded as part of the data tile.
76
76
77
77
## Version 12
78
78
@@ -86,7 +86,7 @@ Introduced in TileDB 2.8
86
86
87
87
Introduced in TileDB 2.7
88
88
89
-
* Fragment metadata contain [metadata](./fragment.md#tile-mins-maxes) (min/max value, sum, null count) for each tile.
89
+
* Fragment metadata contain [metadata](./fragment.md#tile-mins-maxes) (min/max value, sum, null count) for data in the whole fragment and each tile.
90
90
* The TileDB implementation has been updated to never split cells when storing them in chunks.
91
91
92
92
## Version 10
@@ -154,7 +154,7 @@ Introduced in TileDB 1.6
154
154
* The [footer](./fragment.md#footer) and [R-Tree](./fragment.md#r-tree) structures were added.
155
155
* The _Bounding coords_ field was removed.
156
156
* The _MBRs_ field was removed. MBRs are now stored in the R-Tree.
157
-
* Structures other than the footer like tile offsets, sizes and metadata are wrapped in their own generic tiles. This allows loading them lazily and in parallel.
157
+
* Tile offsets and sizes are wrapped in their own generic tiles. This allows loading them lazily and in parallel.
format_spec/array_schema.md
Lines changed: 43 additions & 54 deletions
Original file line number
Diff line number
Diff line change
@@ -2,75 +2,48 @@
2
2
title: Array Schema
3
3
---
4
4
5
-
## Current Array Schema Version
6
-
7
-
The current array schema version(`>=10`) is a folder called `__schema` located here:
8
-
9
-
```
10
-
my_array # array folder
11
-
| ...
12
-
|_ __schema # array schema folder
13
-
|_ <timestamped_name> # array schema file
14
-
|_ ...
15
-
```
16
-
17
-
The array schema folder can contain:
18
-
19
-
* Any number of [array schema files](#array-schema-file) with name [`<timestamped_name>`](./timestamped_name.md).
20
-
* Note: the name does _not_ include the format version.
21
-
22
-
## Previous Array Schema Version
23
-
24
-
The previous array schema version(`<=9`) has a file named `__array_schema.tdb` and is located here:
25
-
26
-
```
27
-
my_array # array folder
28
-
|_ ....
29
-
|_ __array_schema.tdb # array schema file
30
-
|_ ...
31
-
```
32
-
33
5
## Array Schema File
34
6
35
7
The array schema file consists of a single [generic tile](./generic_tile.md), with the following data:
36
8
37
9
|**Field**|**Type**|**Description**|
38
10
| :--- | :--- | :--- |
39
-
| Array version |`uint32_t`| Format version number of the array schema |
40
-
| Allows dups |`bool`| Whether or not the array allows duplicate cells |
11
+
| Array version |`uint32_t`|[Format version](./array_format_history.md) number of the array schema |
12
+
| Allows dups |`bool`| _New in version 5_ Whether or not the array allows duplicate cells |
41
13
| Array type |`uint8_t`| Dense or sparse |
42
14
| Tile order |`uint8_t`| Row or column major |
43
15
| Cell order |`uint8_t`| Row or column major |
44
16
| Capacity |`uint64_t`| For sparse fragments, the data tile capacity |
45
17
| Coords filters |[Filter Pipeline](./filter_pipeline.md)| The filter pipeline used as default for coordinate tiles |
46
18
| Offsets filters |[Filter Pipeline](./filter_pipeline.md)| The filter pipeline used for cell var-len offset tiles |
47
-
| Validity filters |[Filter Pipeline](./filter_pipeline.md)| The filter pipeline used for cell validity tiles |
19
+
| Validity filters |[Filter Pipeline](./filter_pipeline.md)| _New in version 7_ The filter pipeline used for cell validity tiles |
48
20
| Domain |[Domain](#domain)| The array domain |
49
21
| Num attributes |`uint32_t`| Number of attributes in the array |
50
22
| Attribute 1 |[Attribute](#attribute)| First attribute |
51
23
| … | … | … |
52
24
| Attribute N |[Attribute](#attribute)| Nth attribute |
53
-
| Num labels |`uint32_t`| Number of dimension labels in the array |
54
-
| Label 1 |[Dimension Label](#dimension_label)| First dimension label |
25
+
| Num labels |`uint32_t`| _New in version 18_ Number of dimension labels in the array |
26
+
| Label 1 |[Dimension Label](#dimension_label)| _New in version 18_ First dimension label |
55
27
| … | … | … |
56
-
| Label N |[Dimension Label](#dimension_label)| Nth dimension label |
57
-
| Num enumerations |`uint32_t`| Number of [enumerations](./enumeration.md) in the array |
58
-
| Enumeration name length 1 |`uint32_t`| The number of characters in the enumeration 1 name |
59
-
| Enumeration name 1 |`uint8_t[]`| The name of enumeration 1 |
60
-
| Enumeration filename length 1 |`uint32_t`| The number of characters in the enumeration 1 file |
61
-
| Enumeration filename 1 |`uint8_t[]`| The name of the file in the `__enumerations` subdirectory that conatins enumeration 1's data |
62
-
| Enumeration name length N |`uint32_t`| The number of characters in the enumeration N name |
63
-
| Enumeration name N |`uint8_t[]`| The name of enumeration N |
64
-
| Enumeration filename length N |`uint32_t`| The number of characters in the enumeration N file |
65
-
| Enumeration filename N |`uint8_t[]`| The name of the file in the `__enumerations` subdirectory that conatins enumeration N's data |
66
-
|CurrentDomain |[CurrentDomain](./current_domain.md)| The array current domain |
28
+
| Label N |[Dimension Label](#dimension_label)| _New in version 18_ Nth dimension label |
29
+
| Num enumerations |`uint32_t`| _New in version 20_ Number of [enumerations](./enumeration.md) in the array |
30
+
| Enumeration name length 1 |`uint32_t`| _New in version 20_ The number of characters in the enumeration 1 name |
31
+
| Enumeration name 1 |`uint8_t[]`| _New in version 20_ The name of enumeration 1 |
32
+
| Enumeration filename length 1 |`uint32_t`| _New in version 20_ The number of characters in the enumeration 1 filename |
33
+
| Enumeration filename 1 |`uint8_t[]`| _New in version 20_ The name of the file in the `__enumerations` subdirectory that contains enumeration 1's data |
34
+
| Enumeration name length N |`uint32_t`| _New in version 20_ The number of characters in the enumeration N name |
35
+
| Enumeration name N |`uint8_t[]`| _New in version 20_ The name of enumeration N |
36
+
| Enumeration filename length N |`uint32_t`| _New in version 20_ The number of characters in the enumeration N filename |
37
+
| Enumeration filename N |`uint8_t[]`| _New in version 20_ The name of the file in the `__enumerations` subdirectory that contains enumeration N's data |
38
+
| Current domain |[Current Domain](#current-domain)| _New in version 22_ The array's current domain |
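The version-gated layout in the table above can be illustrated by decoding only the leading fixed-size fields. The following Python sketch is not an official reader: it assumes the input is the already-unwrapped payload of the generic tile, assumes little-endian byte order, and stops before the filter pipelines, whose layout is described elsewhere. The function name `parse_schema_prefix` and the returned dictionary keys are hypothetical.

```python
import struct


def parse_schema_prefix(payload: bytes) -> dict:
    """Decode the fixed-size fields at the start of an array schema payload.

    Assumes little-endian encoding and an unfiltered payload (both are
    assumptions of this sketch).
    """
    (version,) = struct.unpack_from("<I", payload, 0)
    offset = 4

    allows_dups = None  # field is absent before version 5
    if version >= 5:
        allows_dups = bool(payload[offset])
        offset += 1

    # Array type, tile order, cell order (uint8 each), then capacity (uint64).
    array_type, tile_order, cell_order, capacity = struct.unpack_from(
        "<BBBQ", payload, offset
    )
    offset += struct.calcsize("<BBBQ")

    return {
        "version": version,
        "allows_dups": allows_dups,
        "array_type": array_type,
        "tile_order": tile_order,
        "cell_order": cell_order,
        "capacity": capacity,
        "next_offset": offset,  # where the coords filter pipeline would begin
    }


# Synthetic payloads built for illustration only, not real array data.
new_style = struct.pack("<I?BBBQ", 22, True, 1, 0, 0, 10000)
old_style = struct.pack("<IBBBQ", 4, 0, 0, 0, 5)
print(parse_schema_prefix(new_style))
print(parse_schema_prefix(old_style))
```

Returning `next_offset` lets a caller continue with the remaining fields without recomputing how many bytes the version-dependent prefix consumed.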
67
39
68
40
## Domain
69
41
70
42
The domain has internal format:
71
43
72
44
|**Field**|**Type**|**Description**|
73
45
| :--- | :--- | :--- |
46
+
| Domain datatype |`uint8_t`|_Removed in version 5_ Datatype of all dimensions |
74
47
| Num dimensions |`uint32_t`| Dimensionality/rank of the domain |
75
48
| Dimension 1 |[Dimension](#dimension)| First dimension |
76
49
| … | … | … |
@@ -84,14 +57,17 @@ The dimension has internal format:
84
57
| :--- | :--- | :--- |
85
58
| Dimension name length |`uint32_t`| Number of characters in dimension name |
86
59
| Dimension name |`uint8_t[]`| Dimension name character array |
87
-
| Dimension datatype |`uint8_t`| Datatype of the coordinate values |
88
-
| Cell val num |`uint32_t`| Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()`|
89
-
| Filters |[Filter Pipeline](./filter_pipeline.md)| The filter pipeline used on coordinate value tiles |
90
-
| Domain size |`uint64_t[]`| The domain size in bytes |
60
+
| Dimension datatype |`uint8_t`| _New in version 5_ Datatype of the coordinate values |
61
+
| Cell val num |`uint32_t`| _New in version 5_ Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()`|
62
+
| Filters |[Filter Pipeline](./filter_pipeline.md)| _New in version 5_ The filter pipeline used on coordinate value tiles |
63
+
| Domain size |`uint64_t`|_New in version 5_ The domain size in bytes |
91
64
| Domain |`uint8_t[]`| Byte array of length equal to domain size above, storing the min, max values of the dimension. |
92
65
| Null tile extent |`uint8_t`|`1` if the dimension has a null tile extent, else `0`. |
93
66
| Tile extent |`uint8_t[]`| Byte array of length equal to the dimension datatype size, storing the space tile extent of this dimension. |
94
67
68
+
> [!NOTE]
69
+
> Prior to version 5, the size of the _Domain_ field was always equal to twice the size of the dimension's data type (which is stored in the [domain](#domain) in these versions).
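The difference between the explicit _Domain size_ field and the implicit pre-version-5 size can be sketched as follows. This is a simplified illustration that handles fixed-size datatypes only and assumes little-endian byte order. The function name `read_domain_field` and the use of a Python `struct` format character to stand in for the dimension datatype are assumptions of the sketch.

```python
import struct


def read_domain_field(buf: bytes, offset: int, version: int, value_format: str):
    """Read the (min, max) pair of a fixed-size dimension domain.

    `value_format` is a `struct` format character for one coordinate value
    (for example "i" for a 32-bit signed integer). Returns the pair and the
    offset of the next field.
    """
    value_size = struct.calcsize("<" + value_format)
    if version >= 5:
        # The size is stored explicitly as a uint64 before the domain bytes.
        (size,) = struct.unpack_from("<Q", buf, offset)
        offset += 8
    else:
        # Older schemas: always twice the size of the dimension datatype.
        size = 2 * value_size
    if size != 2 * value_size:
        raise ValueError("sketch only supports fixed-size (min, max) domains")
    low, high = struct.unpack_from("<" + value_format * 2, buf, offset)
    return (low, high), offset + size


# Synthetic buffers for illustration.
current = struct.pack("<Qii", 8, 1, 100)
legacy = struct.pack("<ii", 1, 100)
print(read_domain_field(current, 0, 22, "i"))
print(read_domain_field(legacy, 0, 4, "i"))
```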
70
+
95
71
## Attribute
96
72
97
73
The attribute has internal format:
@@ -103,11 +79,11 @@ The attribute has internal format:
103
79
| Attribute datatype |`uint8_t`| Datatype of the attribute values |
104
80
| Cell val num |`uint32_t`| Number of attribute values per cell. For variable-length attributes, this is `std::numeric_limits<uint32_t>::max()`|
105
81
| Filters |[Filter Pipeline](./filter_pipeline.md)| The filter pipeline used on attribute value tiles |
106
-
| Fill value size |`uint64_t`| The size in bytes of the fill value |
107
-
| Fill value |`uint8_t[]`| The fill value |
108
-
| Nullable |`bool`| Whether or not the attribute can be null |
109
-
| Fill value validity |`uint8_t`| The validity fill value |
110
-
| Order |`uint8_t`| Order of the data stored in the attribute. This may be unordered, increasing or decreasing |
82
+
| Fill value size |`uint64_t`| _New in version 6_ The size in bytes of the fill value |
83
+
| Fill value |`uint8_t[]`| _New in version 6_ The fill value |
84
+
| Nullable |`bool`| _New in version 7_ Whether or not the attribute can be null |
85
+
| Fill value validity |`uint8_t`| _New in version 7_ The validity fill value |
86
+
| Order |`uint8_t`| _New in version 17_ Order of the data stored in the attribute. This may be unordered, increasing or decreasing |
111
87
112
88
## Dimension Label
113
89
@@ -127,6 +103,19 @@ The dimension label has internal format:
127
103
| Label datatype |`uint8_t`| The datatype of the label data |
128
104
| Label cell_val_num |`uint32_t`| The number of values per cell of the label data. For variable-length labels, this is `std::numeric_limits<uint32_t>::max()`|
129
105
| Label domain size |`uint64_t`| The size of the label domain |
130
-
| Label domain start size |`uint64_t`| The size of the first value of the domain for variable-lenght datatypes. For fixed-lenght labels, this is 0|
106
+
| Label domain start size |`uint64_t`| The size of the first value of the domain for variable-length datatypes. For fixed-length labels, this is 0|
131
107
| Label domain data |`uint8_t[]`| Byte array of length equal to domain size above, storing the min, max values of the dimension |
132
108
| Is external |`uint8_t`| If the URI is not stored as part of this array |
109
+
110
+
## Current Domain
111
+
112
+
If a current domain is empty, only the version number and the empty flag are serialized to storage.
113
+
114
+
The current domain format is versioned separately from arrays. The current version is `1`.
115
+
116
+
|**Field**|**Type**|**Description**|
117
+
| :--- | :--- | :--- |
118
+
| Version number |`uint32_t`| Current domain version number |
119
+
| Empty |`uint8_t`| Whether the current domain has a representation (e.g. NDRectangle) set |
120
+
| Type |`uint8_t`| The type of current domain stored in this file |
121
+
| NDRectangle |[MBR](./fragment.md#mbr)| A hyperrectangle defined using [1DRange](./fragment.md#mbr) items for each dimension |
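The rule that an empty current domain serializes only the version number and the empty flag can be sketched as a small header reader. The Python below is an illustration under assumptions: little-endian byte order, a nonzero _Empty_ byte meaning that no representation follows, and a hypothetical function name `parse_current_domain_header`. It does not interpret the _Type_ value or decode the NDRectangle.

```python
import struct


def parse_current_domain_header(buf: bytes, offset: int = 0):
    """Read the version, empty flag and (if present) type of a current domain.

    Returns a dict and the offset of the next unread byte. Treating a
    nonzero flag as "empty" is an assumption of this sketch.
    """
    version, empty = struct.unpack_from("<IB", buf, offset)
    offset += 5
    if empty:
        # Nothing else is serialized for an empty current domain.
        return {"version": version, "empty": True, "type": None}, offset
    (kind,) = struct.unpack_from("<B", buf, offset)
    return {"version": version, "empty": False, "type": kind}, offset + 1


# Synthetic buffers for illustration.
empty_domain = struct.pack("<IB", 1, 1)
with_type = struct.pack("<IBB", 1, 0, 0)
print(parse_current_domain_header(empty_domain))
print(parse_current_domain_header(with_type))
```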