Commit e7d0886

Fix markdown format (#457)
Fix markdown format for storage spec
1 parent c3a6a51 commit e7d0886

1 file changed: +54 -55 lines changed

documentation/storage-format-spec.md

Lines changed: 54 additions & 55 deletions
@@ -14,7 +14,7 @@ All data and metadata required for a TileDB-Vector-Search index are stored insid
 Metadata values required for configuring the different properties of an index are stored in the `index_uri` group metadata. There are some metadata values that are required for all algorithm implementations as well as per-algorithm specific metadata values. Below is a table of all the metadata values that are recorded for all algorithms.

 | Name | Description |
-|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `dataset_type` | The asset type for disambiguation in TileDB cloud. Value: `vector_search` |
 | `index_type` | The index algorithm used for this index. Can be one of the following values: `FLAT`, `IVF_FLAT`, `VAMANA`, `IVF_PQ` |
 | `storage_version` | The storage version used for the index. The storage version is used to make sure that indexing algorithms can update their storage logic without affecting previously created indexes and maintaining backwards compatibility. |
@@ -25,22 +25,22 @@ Metadata values required for configuring the different properties of an index ar

 ### Object metadata

-This is a 1D sparse array with `external_id` as dimension and attributes the user defined metadata attributes for the respective vectors.
+This is a 1D sparse array with `external_id` as dimension and attributes the user defined metadata attributes for the respective vectors.

 #### Basic schema parameters

 | **Parameter** | **Value** |
-|:--------------|:----------|
+| :------------ | :-------- |
 | Array type | Sparse |
 | Rank | 1D |
 | Cell order | Row-major |
 | Tile order | Row-major |

 #### Dimensions

-| Dimension Name | TileDB Datatype |
-| :------------- | :-------------------- |
-| `external_id` | `uint64_t` |
+| Dimension Name | TileDB Datatype |
+| :------------- | :-------------- |
+| `external_id` | `uint64_t` |

 ### Updates

@@ -57,15 +57,15 @@ TileDB-Vector-Search offers support for updates for all different index algorith

 #### Dimensions

-| Dimension Name | TileDB Datatype |
-| :------------- | :-------------------- |
-| `external_id` | `uint64_t` |
+| Dimension Name | TileDB Datatype |
+| :------------- | :-------------- |
+| `external_id` | `uint64_t` |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :--------------------------------------------------------------------------------------------- |
-| `vector` | variable `dtype`| Contains the vector value. Empty values correspond to vector deletions. |
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :--------------- | :---------------------------------------------------------------------- |
+| `vector` | variable `dtype` | Contains the vector value. Empty values correspond to vector deletions. |

 ## Algorithm specific storage format

@@ -78,7 +78,7 @@ This is a 2D dense array that holds all the vectors with no specific ordering.
 #### Basic schema parameters

 | **Parameter** | **Value** |
-|:--------------|:----------|
+| :------------ | :-------- |
 | Array type | Dense |
 | Rank | 2D |
 | Cell order | Col-major |
@@ -87,15 +87,15 @@ This is a 2D dense array that holds all the vectors with no specific ordering.
 #### Dimensions

 | Dimension Name | TileDB Datatype | Domain | Description |
-|:---------------|:----------------|:------------------|:----------------------------------------------------------|
+| :------------- | :-------------- | :---------------- | :-------------------------------------------------------- |
 | `rows` | `int32_t` | `[0, dimensions]` | Corresponds to the vector dimensions. |
 | `cols` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in the set of vectors. |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :---------------------------------------------------------------------------|
-| `values` | `dtype` | Contains the vector value at the specific dimension. |
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :-------------- | :--------------------------------------------------- |
+| `values` | `dtype` | Contains the vector value at the specific dimension. |

 #### `shuffled_ids`

@@ -112,22 +112,22 @@ This is a 1D dense array that maps vector positions in the `shuffled_vectors` ar

 #### Dimensions

-| Dimension Name | TileDB Datatype | Domain | Description |
-| :------------- | :-------------------- | :-----------------| :--------------------------------------------------------- |
-| `rows` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in `shuffled_vectors`. |
+| Dimension Name | TileDB Datatype | Domain | Description |
+| :------------- | :-------------- | :--------------- | :-------------------------------------------------------- |
+| `rows` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in `shuffled_vectors`. |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :---------------------------------------------------------------------------|
-| `values` | `uint64_t` | Contains the vector's `external_id`. |
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :-------------- | :----------------------------------- |
+| `values` | `uint64_t` | Contains the vector's `external_id`. |

 ### IVF_FLAT

 #### Metadata

-| Name | Description |
-| ------ | ------ |
+| Name | Description |
+| ------------------- | ----------------------------------------------------------------------------------- |
 | `partition_history` | An ordered list of the number of partitions used at different ingestion timestamps. |

 #### `partition_centroids`
@@ -137,7 +137,7 @@ This is a 2D dense array storing the k-means centroids for the different vector
 #### Basic schema parameters

 | **Parameter** | **Value** |
-|:--------------|:----------|
+| :------------ | :-------- |
 | Array type | Dense |
 | Rank | 2D |
 | Cell order | Col-major |
@@ -146,40 +146,40 @@ This is a 2D dense array storing the k-means centroids for the different vector
 #### Dimensions

 | Dimension Name | TileDB Datatype | Domain | Description |
-|:---------------|:----------------|:------------------|:----------------------------------------|
+| :------------- | :-------------- | :---------------- | :-------------------------------------- |
 | `rows` | `int32_t` | `[0, dimensions]` | Corresponds to the centroid dimensions. |
 | `cols` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the centroid id. |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :---------------------------------------------------------------------------|
-| `centroids` | `dtype` | Contains the centroid value at the specific dimension. |
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :-------------- | :----------------------------------------------------- |
+| `centroids` | `dtype` | Contains the centroid value at the specific dimension. |

 #### `partition_indexes`

-This is a 1D dense array recording the start-end index of each partition of vectors in the `shuffled_vectors` array.
+This is a 1D dense array recording the start-end index of each partition of vectors in the `shuffled_vectors` array.

 #### Basic schema parameters

 | **Parameter** | **Value** |
-|:--------------|:----------|
+| :------------ | :-------- |
 | Array type | Dense |
 | Rank | 1D |
 | Cell order | Col-major |
 | Tile order | Col-major |

 #### Dimensions

-| Dimension Name | TileDB Datatype | Domain | Description |
-| :------------- | :-------------------- | :-----------------| :------------------------------- |
-| `rows` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the partition id. |
+| Dimension Name | TileDB Datatype | Domain | Description |
+| :------------- | :-------------- | :--------------- | :------------------------------- |
+| `rows` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the partition id. |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :--------------------------------------------------------------------------------|
-| `values` | `uint64_t` | Contains to the position of the partition split in the `shuffled_vectors` array. |
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :-------------- | :------------------------------------------------------------------------------- |
+| `values` | `uint64_t` | Contains to the position of the partition split in the `shuffled_vectors` array. |

 #### `shuffled_vectors`

@@ -188,24 +188,24 @@ This is a 2D dense array that holds all the vectors. Each vector partition is st
 #### Basic schema parameters

 | **Parameter** | **Value** |
-|:--------------|:----------|
+| :------------ | :-------- |
 | Array type | Dense |
 | Rank | 2D |
 | Cell order | Col-major |
 | Tile order | Col-major |

 #### Dimensions

-| Dimension Name | TileDB Datatype | Domain | Description |
-| :------------- | :-------------------- | :-----------------| :--------------------------------------------------------- |
-| `rows` | `int32_t` | `[0, dimensions]` | Corresponds to the vector dimensions. |
-| `cols` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in the set of vectors. |
+| Dimension Name | TileDB Datatype | Domain | Description |
+| :------------- | :-------------- | :---------------- | :-------------------------------------------------------- |
+| `rows` | `int32_t` | `[0, dimensions]` | Corresponds to the vector dimensions. |
+| `cols` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in the set of vectors. |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :---------------------------------------------------------------------------|
-| `values` | `dtype` | Contains the vector value at the specific dimension. |
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :-------------- | :--------------------------------------------------- |
+| `values` | `dtype` | Contains the vector value at the specific dimension. |

 #### `shuffled_ids`

@@ -214,29 +214,28 @@ This is a 1D dense array that maps vector indices in the `shuffled_vectors` arra
 #### Basic schema parameters

 | **Parameter** | **Value** |
-|:--------------|:----------|
+| :------------ | :-------- |
 | Array type | Dense |
 | Rank | 1D |
 | Cell order | Col-major |
 | Tile order | Col-major |

 #### Dimensions

-| Dimension Name | TileDB Datatype | Domain | Description |
-| :------------- | :-------------------- | :-----------------| :--------------------------------------------------------- |
-| `rows` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in `shuffled_vectors`. |
+| Dimension Name | TileDB Datatype | Domain | Description |
+| :------------- | :-------------- | :--------------- | :-------------------------------------------------------- |
+| `rows` | `int32_t` | `[0, MAX_INT32]` | Corresponds to the vector position in `shuffled_vectors`. |

 #### Attributes

-| Attribute Name | TileDB Datatype | Description |
-| :--------------- | :-------------- | :---------------------------------------------------------------------------|
-| `values` | `uint64_t` | Contains the vector `external_id`. |
-
+| Attribute Name | TileDB Datatype | Description |
+| :------------- | :-------------- | :--------------------------------- |
+| `values` | `uint64_t` | Contains the vector `external_id`. |

 ### IVF_PQ

 TODO

 ### VAMANA

-TODO
+TODO
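
Reviewer note: the IVF_FLAT layout touched by this diff may be easier to follow with a toy example. The sketch below uses plain NumPy arrays as stand-ins for the `shuffled_vectors`, `shuffled_ids` and `partition_indexes` arrays; it does not call TileDB. It assumes that `partition_indexes` holds one more entry than there are partitions and that partition `p` occupies the half-open column range `[partition_indexes[p], partition_indexes[p + 1])`. The spec only says "start-end index", so that convention, the sample values and the helper name `read_partition` are all assumptions made for illustration.

```python
import numpy as np

# Toy stand-in for `shuffled_vectors`: rows are vector dimensions,
# columns are vector positions (5 vectors of dimension 2).
shuffled_vectors = np.array(
    [[0.0, 0.1, 5.0, 5.1, 9.0],
     [0.0, 0.2, 5.0, 5.2, 9.0]],
    dtype=np.float32,
)

# Toy stand-in for `shuffled_ids`: position in `shuffled_vectors` -> external_id.
shuffled_ids = np.array([42, 7, 13, 99, 5], dtype=np.uint64)

# Toy stand-in for `partition_indexes`.
# ASSUMPTION: partition p spans columns [values[p], values[p + 1]).
partition_indexes = np.array([0, 2, 4, 5], dtype=np.uint64)


def read_partition(p):
    """Return (vectors, external_ids) for partition p (hypothetical helper)."""
    start = int(partition_indexes[p])
    end = int(partition_indexes[p + 1])
    return shuffled_vectors[:, start:end], shuffled_ids[start:end]


vectors, ids = read_partition(1)
```

Under these assumptions, `read_partition(1)` yields the two vectors stored in columns 2 and 3 together with their external ids, which is the lookup a query would perform after selecting the nearest centroid in `partition_centroids`.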
