## Description

FastData is a code generator that analyzes your data and creates high-performance, read-only lookup data structures for static data. It can output the data structures in many different languages (C#, C++, Rust, etc.), ready for inclusion in your project with zero dependencies.
## Use case

Imagine a scenario where you have a predefined list of words (e.g., dog breeds) and need to check whether a specific dog breed exists in the set. Usually you would create an array and look up the value. However, this is far from optimal and misses a few optimizations.
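As a minimal sketch, the conventional approach might look like this (the breed list and class name are illustrative, not taken from the project):

```csharp
using System;

// The conventional approach: a plain array that is scanned on every lookup.
static class DogBreeds
{
    private static readonly string[] Breeds = { "Poodle", "Beagle", "Dalmatian" };

    // Linear scan over the array; no early exits, no data-aware layout.
    public static bool Contains(string value) =>
        Array.IndexOf(Breeds, value) >= 0;
}
```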
### Data structures

By default, FastData chooses the optimal data structure for your data, but you can also set it manually with `fastdata -s <type>`. See the details of each structure type below.
#### SingleValue

* Memory: Low
* Latency: Low
* Complexity: O(1)

This data structure only supports a single value. It is much faster than an array with a single item and has no overhead associated with it. FastData always selects this data structure whenever your dataset only contains one item.
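A hypothetical sketch of what generated code of this shape could look like (the value and class name are illustrative): a single comparison, with no array or hashing involved.

```csharp
// One-item dataset: membership collapses to a single equality check.
static class SingleValueExample
{
    public static bool Contains(string value) => value == "Poodle";
}
```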
#### Conditional

* Latency: Low
* Complexity: O(n)

This data structure relies on built-in logic in the programming language. It produces if/switch statements which ultimately become machine instructions on the CPU, rather than data that resides in memory. Latency is therefore incredibly low, but the higher number of instructions bloats the assembly, and at a certain point it becomes more efficient to have the data reside in memory.
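A hedged sketch of the idea (illustrative names, not actual FastData output): the dataset lives in branch instructions instead of a backing array.

```csharp
// The data is encoded directly as control flow: each member becomes a
// case label, so the compiler emits comparisons/jump tables, not an array.
static class ConditionalExample
{
    public static bool Contains(string value)
    {
        switch (value)
        {
            case "Poodle":
            case "Beagle":
            case "Dalmatian":
                return true;
            default:
                return false;
        }
    }
}
```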
#### Array

* Latency: Low
* Complexity: O(n)

This data structure uses an array as the backing store. It is often faster than a normal array due to efficient early exits (value/length range checks). It works well for small amounts of data since the array is scanned linearly, but for larger datasets, the O(n) complexity hurts performance a lot.
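A rough sketch of the pattern, under the assumption of a small string set (the data and length bounds are illustrative): a cheap range check rejects most non-members before the linear scan runs.

```csharp
// Array backing store plus an early exit: no member is shorter than 3 or
// longer than 9 characters, so other lengths are rejected in O(1).
static class ArrayExample
{
    private static readonly string[] Data = { "Pug", "Beagle", "Dalmatian" };

    public static bool Contains(string value)
    {
        if (value.Length < 3 || value.Length > 9)
            return false; // early exit: length outside the dataset's range

        foreach (string item in Data) // linear scan, fine for small n
            if (item == value)
                return true;
        return false;
    }
}
```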
#### BinarySearch

* Memory: Low
* Latency: Medium
* Complexity: O(log n)

This data structure sorts your data and does a binary search on it. Since data is sorted at compile time, there is no overhead at runtime. Each lookup has a higher latency than a simple array, but once the dataset gets to a few hundred items, it beats the array due to its lower complexity.
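A minimal sketch of the runtime side (the data and names are illustrative): the array ships pre-sorted, so only the search itself runs at lookup time.

```csharp
using System;

// The generator emits the data already sorted (ordinally here), so the
// runtime cost is just the O(log n) binary search.
static class BinarySearchExample
{
    private static readonly string[] Sorted = { "Beagle", "Dalmatian", "Poodle" };

    public static bool Contains(string value) =>
        Array.BinarySearch(Sorted, value, StringComparer.Ordinal) >= 0;
}
```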
#### EytzingerSearch

* Memory: Low
* Latency: Medium
* Complexity: O(log n)

This data structure sorts data using an Eytzinger layout, which has better cache locality than a plain binary search and, under some circumstances, better performance.
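As a hedged sketch (integer data for brevity, layout written out by hand): the sorted values are stored in breadth-first order, so the children of index `i` sit at `2i+1` and `2i+2`, keeping the hot top levels of the implicit tree packed together in memory.

```csharp
// Eytzinger (BFS) layout of the sorted values { 10, 20, 30, 40, 50, 60, 70 }:
// root 40, then 20/60, then 10/30/50/70 - an implicit balanced tree.
static class EytzingerExample
{
    private static readonly int[] Data = { 40, 20, 60, 10, 30, 50, 70 };

    public static bool Contains(int value)
    {
        int i = 0;
        while (i < Data.Length)
        {
            if (Data[i] == value) return true;
            // Go to the left child for smaller values, right child otherwise.
            i = 2 * i + (value < Data[i] ? 1 : 2);
        }
        return false;
    }
}
```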
#### KeyLength

* Memory: Low
* Latency: Low
* Complexity: O(1)

This data structure only works on strings, but it indexes them by their length rather than a hash. If all the strings have unique lengths, the data structure further optimizes for latency.
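A sketch of the unique-lengths case (illustrative data): one length check plus one string comparison decides membership, with no hashing at all.

```csharp
// Every member has a distinct length (3, 6, 9), so the length acts as a
// perfect index and a single comparison finishes the lookup.
static class KeyLengthExample
{
    public static bool Contains(string value)
    {
        switch (value.Length)
        {
            case 3: return value == "Pug";
            case 6: return value == "Beagle";
            case 9: return value == "Dalmatian";
            default: return false;
        }
    }
}
```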
#### HashSetChain

* Latency: Medium
* Complexity: O(1)

This data structure is based on a hash table with separate chaining collision resolution. It uses a separate array for buckets to stay cache coherent, but it also uses more memory since it needs to keep track of indices.
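A hedged sketch of the layout (sizing, hash, and names are illustrative, not actual FastData output): `Buckets` holds the index of the first entry per chain, and `Next` links entries that collide, which is the extra index bookkeeping that costs memory.

```csharp
using System;

// Separate-chaining hash set with the buckets kept in their own array.
static class HashSetChainExample
{
    private static readonly string[] Items = { "Pug", "Beagle", "Dalmatian" };
    private static readonly int[] Buckets = new int[Items.Length]; // first index per bucket, -1 = empty
    private static readonly int[] Next = new int[Items.Length];    // next index in chain, -1 = end

    static HashSetChainExample()
    {
        Array.Fill(Buckets, -1);
        for (int i = 0; i < Items.Length; i++)
        {
            int bucket = Hash(Items[i]);
            Next[i] = Buckets[bucket]; // prepend to the bucket's chain
            Buckets[bucket] = i;
        }
    }

    private static int Hash(string s) =>
        (int)((uint)s.GetHashCode() % (uint)Items.Length);

    public static bool Contains(string value)
    {
        for (int i = Buckets[Hash(value)]; i != -1; i = Next[i])
            if (Items[i] == value)
                return true;
        return false;
    }
}
```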
#### HashSetLinear

This data structure is also a hash table, but with linear collision resolution.

[…]

* Latency: Low
* Complexity: O(1)

This data structure tries to create a perfect hash for the dataset. It does so by brute-forcing a seed for a simple hash function until it hits the right combination. If the dataset is small enough, it can even produce a minimal perfect hash.
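The brute-force search can be sketched as follows (the hash function, seed range, and table size are illustrative assumptions): seeds are tried until every key lands in a distinct slot, and only the winning seed needs to ship in the generated code.

```csharp
using System;

// Brute-force a seed for a simple multiplicative hash until it is a
// perfect hash (no two keys share a slot) for the given table size.
static class PerfectHashSearch
{
    public static uint Hash(string s, uint seed)
    {
        uint h = seed;
        foreach (char c in s)
            h = unchecked(h * 31 + c);
        return h;
    }

    public static uint FindSeed(string[] keys, uint tableSize)
    {
        for (uint seed = 1; seed < 100_000; seed++)
        {
            var used = new bool[tableSize];
            bool perfect = true;
            foreach (string key in keys)
            {
                uint slot = Hash(key, seed) % tableSize;
                if (used[slot]) { perfect = false; break; }
                used[slot] = true;
            }
            if (perfect) return seed; // tableSize == keys.Length gives a minimal perfect hash
        }
        throw new InvalidOperationException("No seed found in range");
    }
}
```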
#### PerfectHashGPerf

* Latency: Low
* Complexity: O(1)

This data structure uses the same algorithm as gperf to derive a perfect hash. It uses Richard J. Cichelli's method for creating an associative table, which is augmented using alpha increments to resolve collisions. It only works on strings, but it is great for medium-sized datasets.
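As a hedged illustration of the Cichelli scheme that gperf builds on: the hash is the string's length plus associative values for its first and last characters, where the associative table is searched for at generation time. The table values below are hand-picked for this toy set, not real gperf output.

```csharp
// Cichelli-style hash: length + assoc(first char) + assoc(last char).
// With these (illustrative) associative values, the three keys map to
// the distinct values 3, 7 and 11 - a perfect hash.
static class CichelliExample
{
    private static int Assoc(char c) => c switch
    {
        'P' => 0, 'g' => 0,
        'B' => 1, 'e' => 0,
        'D' => 2, 'n' => 0,
        _ => -1, // character never starts/ends a key
    };

    public static int Hash(string s) =>
        s.Length + Assoc(s[0]) + Assoc(s[s.Length - 1]);
}
```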
## How does it work?

The idea behind the project is to generate a data-dependent optimized data structure for read-only lookup. When data is known beforehand, the algorithm can select from a set of different data structures, indexing, and comparison methods that are tailor-built for the data.
### Compile-time generation

FastData uses advanced data analysis techniques to generate optimized data structures:

[…]

* Character mapping
* Encoding analysis

It uses the analysis to create so-called early-exits, which are fast `O(1)` checks on your input before doing any `O(n)` checks on the actual dataset.
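One possible shape of such an early-exit, sketched under the assumption of a string set whose members have lengths 3, 6, and 9 (the mask and names are illustrative): a bitmask of occurring lengths filters input in `O(1)` before any dataset scan.

```csharp
// Early-exit filter: bit n of the mask is set iff some member has length n.
// A failed check proves non-membership without touching the dataset.
static class EarlyExitExample
{
    private const ulong LengthMask = (1UL << 3) | (1UL << 6) | (1UL << 9);

    public static bool MightContain(string value) =>
        value.Length < 64 && (LengthMask & (1UL << value.Length)) != 0;
}
```

A `true` result only means the full lookup must still run; `false` is definitive.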
#### Hash function generators

Hash functions come in many flavors. Some are designed for low latency, some for throughput, others for low collision rate. Programming language runtimes come with a hash function that is a tradeoff between these parameters. FastData builds a hash function specifically tailored to the dataset. It has support for several techniques:

1. **Default:** If no technique is selected, FastData uses a hash function by Daniel Bernstein (DJB2)
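For reference, the classic DJB2 function is tiny: start from 5381 and fold each character in with `hash * 33 + c`. A straightforward C# rendering (not FastData's generated code):

```csharp
// DJB2 by Daniel Bernstein: hash = hash * 33 + c, seeded with 5381.
static class Djb2
{
    public static uint Hash(string s)
    {
        uint hash = 5381;
        foreach (char c in s)
            hash = unchecked(hash * 33 + c);
        return hash;
    }
}
```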