Commit 3b32b58: Automatic cleanup
1 parent d67843a

51 files changed: +288 additions, -269 deletions


Imports/Library.props

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@
     </ItemGroup>

     <ItemGroup Condition="'$(IncludeAnalyzers)' == 'True'">
-        <!-- <PackageReference Include="Microsoft.CodeAnalysis.PublicApiAnalyzers" PrivateAssets="all" />-->
+        <!-- <PackageReference Include="Microsoft.CodeAnalysis.PublicApiAnalyzers" PrivateAssets="all" />-->
        <PackageReference Include="ConfigureAwaitChecker.Analyzer" PrivateAssets="all" />
        <PackageReference Include="IDisposableAnalyzers" PrivateAssets="all" />
        <PackageReference Include="Roslynator.Analyzers" PrivateAssets="all" />

README.md

Lines changed: 40 additions & 20 deletions
@@ -5,12 +5,14 @@

 ## Description

-FastData is a code generator that analyzes your data and creates high-performance, read-only lookup data structures for static data. It can output the data structures
+FastData is a code generator that analyzes your data and creates high-performance, read-only lookup data structures for
+static data. It can output the data structures
 in many different languages (C#, C++, Rust, etc.), ready for inclusion in your project with zero dependencies.

 ## Use case

-Imagine a scenario where you have a predefined list of words (e.g., dog breeds) and need to check whether a specific dog breed exists in the set.
+Imagine a scenario where you have a predefined list of words (e.g., dog breeds) and need to check whether a specific dog
+breed exists in the set.
 Usually you create an array and look up the value. However, this is far from optimal and is missing a few optimizations.

 ```csharp
@@ -140,15 +142,17 @@ Each output language has different settings. Type `fastdata <lang> --help` to se

 ### Data structures

-By default, FastData chooses the optimal data structure for your data, but you can also set it manually with `fastdata -s <type>`. See the details of each structure type below.
+By default, FastData chooses the optimal data structure for your data, but you can also set it manually with
+`fastdata -s <type>`. See the details of each structure type below.

 #### SingleValue

 * Memory: Low
 * Latency: Low
 * Complexity: O(1)

-This data structure only supports a single value. It is much faster than an array with a single item and has no overhead associated with it.
+This data structure only supports a single value. It is much faster than an array with a single item and has no overhead
+associated with it.
 FastData always selects this data structure whenever your dataset only contains one item.

 #### Conditional
@@ -157,9 +161,11 @@ FastData always selects this data structure whenever your dataset only contains
 * Latency: Low
 * Complexity: O(n)

-This data structure relies on built-in logic in the programming language. It produces if/switch statements which ultimately become machine instructions on the CPU, rather than data
+This data structure relies on built-in logic in the programming language. It produces if/switch statements which
+ultimately become machine instructions on the CPU, rather than data
 that resides in memory.
-Latency is therefore incredibly low, but the higher number of instructions bloat the assembly, and at a certain point it becomes more efficient to have
+Latency is therefore incredibly low, but the higher number of instructions bloat the assembly, and at a certain point it
+becomes more efficient to have
 the data reside in memory.

 #### Array
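The "Conditional" structure described above can be made concrete with a small sketch. This is illustrative Python, not FastData's generated output (FastData emits C#, C++, Rust, etc.), and the breed names are made up:

```python
# Membership is encoded as branches rather than as data in memory: each
# comparison compiles down to instructions, so no array is ever scanned.
def contains_conditional(value: str) -> bool:
    if value == "Beagle":
        return True
    if value == "Poodle":
        return True
    if value == "Labrador":
        return True
    return False
```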
@@ -168,33 +174,39 @@ the data reside in memory.
 * Latency: Low
 * Complexity: O(n)

-This data structure uses an array as the backing store. It is often faster than a normal array due to efficient early exits (value/length range checks).
-It works well for small amounts of data since the array is scanned linearly, but for larger datasets, the O(n) complexity hurts performance a lot.
+This data structure uses an array as the backing store. It is often faster than a normal array due to efficient early
+exits (value/length range checks).
+It works well for small amounts of data since the array is scanned linearly, but for larger datasets, the O(n)
+complexity hurts performance a lot.

 #### BinarySearch

 * Memory: Low
 * Latency: Medium
 * Complexity: O(log n)

-This data structure sorts your data and does a binary search on it. Since data is sorted at compile time, there is no overhead at runtime. Each lookup
-has a higher latency than a simple array, but once the dataset gets to a few hundred items, it beats the array due to a lower complexity.
+This data structure sorts your data and does a binary search on it. Since data is sorted at compile time, there is no
+overhead at runtime. Each lookup
+has a higher latency than a simple array, but once the dataset gets to a few hundred items, it beats the array due to a
+lower complexity.

 #### EytzingerSearch

 * Memory: Low
 * Latency: Medium
 * Complexity: O(n*log(n))

-This data structure sorts data using an Eytzinger layout. It has better cache-locality than binary search. Under some circumstances it has better performance.
+This data structure sorts data using an Eytzinger layout. It has better cache-locality than binary search. Under some
+circumstances it has better performance.

 #### KeyLength

 * Memory: Low
 * Latency: Low
 * Complexity: O(1)

-This data structure only works on strings, but it indexes them after their length, rather than a hash. In the case all the strings have unique lengths, the
+This data structure only works on strings, but it indexes them after their length, rather than a hash. In the case all
+the strings have unique lengths, the
 data structure further optimizes for latency.

 #### HashSetChain
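The Eytzinger layout mentioned above re-orders a sorted array into breadth-first (heap) order, so a binary search walks from index i to 2i or 2i+1 and touches memory in a more cache-friendly pattern. A minimal Python sketch of the idea (illustrative, not FastData's generated code):

```python
# Build an Eytzinger (BFS) layout from sorted values: children of index i
# live at 2i and 2i+1 (1-based), so the search descends without far jumps.
def eytzinger(sorted_vals):
    out = [None] * (len(sorted_vals) + 1)  # 1-based; out[0] is unused
    it = iter(sorted_vals)

    def fill(i):
        if i < len(out):
            fill(2 * i)        # left subtree takes the smaller values
            out[i] = next(it)
            fill(2 * i + 1)    # right subtree takes the larger values

    fill(1)
    return out

def contains(layout, x):
    i = 1
    while i < len(layout):
        if layout[i] == x:
            return True
        i = 2 * i + (x > layout[i])  # descend left (2i) or right (2i+1)
    return False
```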
@@ -203,7 +215,8 @@ data structure further optimizes for latency.
 * Latency: Medium
 * Complexity: O(1)

-This data structure is based on a hash table with separate chaining collision resolution. It uses a separate array for buckets to stay cache coherent, but it also uses more
+This data structure is based on a hash table with separate chaining collision resolution. It uses a separate array for
+buckets to stay cache coherent, but it also uses more
 memory since it needs to keep track of indices.

 #### HashSetLinear
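The "separate array for buckets" plus "keep track of indices" wording above describes index-based chaining: buckets hold head indices into a flat entry array rather than pointers to linked nodes. A hedged Python sketch of that layout (details are illustrative, not FastData's actual structure):

```python
# buckets[h] holds the head index into `entries`; each entry records the
# index of the next entry in its chain, with -1 terminating the chain.
def build_chained(values, num_buckets=8):
    buckets = [-1] * num_buckets
    entries = []                         # flat list of (value, next_index)
    for v in values:
        h = hash(v) % num_buckets
        entries.append((v, buckets[h]))  # new entry chains onto the old head
        buckets[h] = len(entries) - 1
    return buckets, entries

def contains_chained(buckets, entries, v):
    i = buckets[hash(v) % len(buckets)]
    while i != -1:                       # walk this bucket's chain
        value, nxt = entries[i]
        if value == v:
            return True
        i = nxt
    return False
```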
@@ -220,7 +233,8 @@ This data structure is also a hash table, but with linear collision resolution.
 * Latency: Low
 * Complexity: O(1)

-This data structure tries to create a perfect hash for the dataset. It does so by brute-forcing a seed for a simple hash function
+This data structure tries to create a perfect hash for the dataset. It does so by brute-forcing a seed for a simple hash
+function
 until it hits the right combination. If the dataset is small enough, it can even produce a minimal perfect hash.

 #### PerfectHashGPerf
@@ -229,12 +243,15 @@ until it hits the right combination. If the dataset is small enough, it can even
 * Latency: Low
 * Complexity: O(1)

-This data structure uses the same algorithm as gperf to derive a perfect hash. It uses Richard J. Cichelli's method for creating an associative table,
-which is augmented using alpha increments to resolve collisions. It only works on strings, but it is great for medium-sized datasets.
+This data structure uses the same algorithm as gperf to derive a perfect hash. It uses Richard J. Cichelli's method for
+creating an associative table,
+which is augmented using alpha increments to resolve collisions. It only works on strings, but it is great for
+medium-sized datasets.

 ## How does it work?

-The idea behind the project is to generate a data-dependent optimized data structure for read-only lookup. When data is known beforehand, the algorithm can select from a set
+The idea behind the project is to generate a data-dependent optimized data structure for read-only lookup. When data is
+known beforehand, the algorithm can select from a set
 of different data structures, indexing, and comparison methods that are tailor-built for the data.

 ### Compile-time generation
@@ -256,12 +273,15 @@ FastData uses advanced data analysis techniques to generate optimized data struc
 * Character mapping
 * Encoding analysis

-It uses the analysis to create so-called early-exits, which are fast `O(1)` checks on your input before doing any `O(n)` checks on the actual dataset.
+It uses the analysis to create so-called early-exits, which are fast `O(1)` checks on your input before doing any `O(n)`
+checks on the actual dataset.

 #### Hash function generators

-Hash functions come in many flavors. Some are designed for low latency, some for throughput, others for low collision rate.
-Programming language runtimes come with a hash function that is a tradeoff between these parameters. FastData builds a hash function specifically tailored to the dataset.
+Hash functions come in many flavors. Some are designed for low latency, some for throughput, others for low collision
+rate.
+Programming language runtimes come with a hash function that is a tradeoff between these parameters. FastData builds a
+hash function specifically tailored to the dataset.
 It has support for several techniques:

 1. **Default:** If no technique is selected, FastData uses a hash function by Daniel Bernstein (DJB2)
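The DJB2 function the README names as the default is the well-known Bernstein hash (`hash = hash * 33 + c`, starting from 5381). A Python rendition, masked to 32 bits to mimic unsigned overflow:

```python
# Classic DJB2 string hash by Daniel Bernstein.
def djb2(s: str) -> int:
    h = 5381
    for ch in s:
        h = ((h * 33) + ord(ch)) & 0xFFFFFFFF  # h = h * 33 + c, mod 2^32
    return h
```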

Src/FastData.Benchmarks/Benchmarks/ArrayVsHashSetBenchmarks.cs

Lines changed: 1 addition & 4 deletions

@@ -3,10 +3,7 @@

 namespace Genbox.FastData.Benchmarks.Benchmarks;

-/// <summary>
-/// Benchmark used to illustrate the algorithmic complexity differences between Array and HashSet.
-/// Needed for the Readme.
-/// </summary>
+/// <summary>Benchmark used to illustrate the algorithmic complexity differences between Array and HashSet. Needed for the Readme.</summary>
 [Orderer(SummaryOrderPolicy.FastestToSlowest)]
 public class ArrayVsHashSetBenchmarks
 {

Src/FastData.Benchmarks/Benchmarks/DogsBenchmark.cs

Lines changed: 2 additions & 4 deletions

@@ -2,9 +2,7 @@

 namespace Genbox.FastData.Benchmarks.Benchmarks;

-/// <summary>
-/// Benchmark used in Readme
-/// </summary>
+/// <summary>Benchmark used in Readme</summary>
 [Orderer(SummaryOrderPolicy.FastestToSlowest)]
 public class DogsBenchmark
 {
@@ -20,7 +18,7 @@ private static class Dogs
     {
         public static bool Contains(string value)
         {
-            if ((49280UL & (1UL << (value.Length - 1) % 64)) == 0)
+            if ((49280UL & (1UL << ((value.Length - 1) % 64))) == 0)
                 return false;

             return value switch
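The change in the hunk above is cosmetic: in C#, `%` binds tighter than `<<`, so both forms compute the same shift; the added parentheses only make the intent explicit. The expression itself is a length-bitmask early exit: bit `length - 1` is set for every valid key length, and `49280 == (1 << 7) | (1 << 14) | (1 << 15)`, i.e. lengths 8, 15, and 16 pass. A Python sketch of the same check:

```python
# Length-bitmask early exit: reject inputs whose length bit is unset
# before any string comparison runs. 49280 encodes lengths 8, 15 and 16.
MASK = 49280

def plausible_length(value: str) -> bool:
    return (MASK & (1 << ((len(value) - 1) % 64))) != 0
```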

Src/FastData.Benchmarks/Benchmarks/GetHashCodeBenchmarks.cs

Lines changed: 1 addition & 0 deletions

@@ -1,5 +1,6 @@
 using System.Runtime.CompilerServices;
 using BenchmarkDotNet.Configs;
+// ReSharper disable All

 namespace Genbox.FastData.Benchmarks.Benchmarks;

Src/FastData.Benchmarks/Benchmarks/HashBenchmarks.cs

Lines changed: 3 additions & 1 deletion

@@ -42,7 +42,9 @@ public ulong XXHashTest()
     ulong value = 0;

     foreach (string s in _array)
+    {
         value += XxHash.ComputeHash(s);
+    }

     return value;
 }
@@ -67,7 +69,7 @@ private static uint ComputeHash(ref char ptr, int length, ulong seed = PRIME64_5
     while (length >= 4)
     {
         hash1 ^= Round(0, ptr64);
-        hash1 = (RotateLeft(hash1, 27) * PRIME64_1) + PRIME64_4;
+        hash1 = RotateLeft(hash1, 27) * PRIME64_1 + PRIME64_4;
         ptr64 = ref Unsafe.Add(ref ptr64, 1);
         length -= 4;
     }

Src/FastData.Benchmarks/Benchmarks/PriorityStructureBenchmarks.cs

Lines changed: 10 additions & 2 deletions

@@ -5,9 +5,9 @@ namespace Genbox.FastData.Benchmarks.Benchmarks;
 [InvocationCount(1_000_000)]
 public class PriorityStructureBenchmarks
 {
-    private readonly MinHeap<bool> _heap = new MinHeap<bool>(10);
     private readonly RingBuffer _buffer = new RingBuffer(10);
     private readonly FixedSet _fixedSet = new FixedSet(10);
+    private readonly MinHeap<bool> _heap = new MinHeap<bool>(10);
     private readonly SortedSet<double> _sorted = new SortedSet<double>();

     [IterationCleanup]
@@ -23,28 +23,36 @@ public void Cleanup()
     public void MinHeapTest()
     {
         for (double i = 0; i < 100; i++)
+        {
             _heap.Add(i, true);
+        }
     }

     [Benchmark]
     public void RingBufferTest()
     {
         for (double i = 0; i < 100; i++)
+        {
             _buffer.Add(i);
+        }
     }

     [Benchmark]
     public void FixedSetTest()
     {
         for (double i = 0; i < 100; i++)
+        {
             _fixedSet.Add(i);
+        }
     }

     [Benchmark]
     public void SortedSetTest()
     {
         for (double i = 0; i < 100; i++)
+        {
             _sorted.Add(i);
+        }
     }

     private sealed class FixedSet(int capacity)
@@ -81,8 +89,8 @@ private sealed class RingBuffer(int capacity)
 {
     private readonly double[] _buffer = new double[capacity];
     private int _count;
-    private int _next;
     private int _minIndex = -1;
+    private int _next;

     public void Add(double value)
     {

Src/FastData.Benchmarks/Benchmarks/SegmentGeneratorsBenchmarks.cs

Lines changed: 4 additions & 3 deletions

@@ -8,13 +8,14 @@ namespace Genbox.FastData.Benchmarks.Benchmarks;
 [MemoryDiagnoser]
 public class SegmentGeneratorsBenchmarks
 {
-    //We start at 8 and go up to 100 to cover as many cases as possible
-    private readonly StringProperties _props = DataAnalyzer.GetStringProperties(Enumerable.Range(8, 100).Select(x => TestHelper.GenerateRandomString(Random.Shared, x)).ToArray());
     private readonly BruteForceGenerator _bfGen = new BruteForceGenerator(8);
-    private readonly EdgeGramGenerator _egGen = new EdgeGramGenerator(8);
     private readonly DeltaGenerator _deltaGen = new DeltaGenerator();
+    private readonly EdgeGramGenerator _egGen = new EdgeGramGenerator(8);
     private readonly OffsetGenerator _ofGen = new OffsetGenerator();

+    //We start at 8 and go up to 100 to cover as many cases as possible
+    private readonly StringProperties _props = DataAnalyzer.GetStringProperties(Enumerable.Range(8, 100).Select(x => TestHelper.GenerateRandomString(Random.Shared, x)).ToArray());
+
     [Benchmark]
     public object BruteForceGenerator() => _bfGen.Generate(_props).ToArray();

Src/FastData.Cli/Program.cs

Lines changed: 2 additions & 0 deletions

@@ -143,7 +143,9 @@ private static async IAsyncEnumerable<object> ReadFile(string file, DataType dat
     Func<string, object> func = GetTypeFunc(dataType);

     await foreach (string line in File.ReadLinesAsync(file))
+    {
         yield return func(line);
+    }
 }

 private static Func<string, object> GetTypeFunc(DataType dataType) => dataType switch

Src/FastData.Generator.CPlusPlus.Benchmarks/Program.cs

Lines changed: 2 additions & 0 deletions

@@ -57,7 +57,9 @@ private static string PrintQueries(ITestData data, string identifier)
     StringBuilder sb = new StringBuilder();

     for (int i = 0; i < 25; i++)
+    {
         sb.AppendLine(CultureInfo.InvariantCulture, $"    DoNotOptimize({identifier}::contains({data.GetValueLabel(helper)}));");
+    }

     return sb.ToString();
 }
