2023-07-09
goos: linux
goarch: amd64
pkg: github.com/stillmatic/gollum
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkMemoryVectorStore/BenchmarkInsert-n=10-32 14752 83971 ns/op 64912 B/op 10 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10-k=1-32 810657 1256 ns/op 288 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10-k=10-32 663574 1639 ns/op 2880 B/op 6 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=100-32 1263 1042804 ns/op 646807 B/op 190 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=1-32 113851 10399 ns/op 288 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=10-32 93428 12625 ns/op 2880 B/op 6 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=100-32 69256 17569 ns/op 25664 B/op 9 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=1000-32 147 8100071 ns/op 6505065 B/op 2734 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=1-32 10000 104921 ns/op 288 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=10-32 9831 123464 ns/op 2880 B/op 6 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=100-32 7848 152007 ns/op 25664 B/op 9 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=10000-32 13 82907557 ns/op 64727761 B/op 29740 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=1-32 783 1514925 ns/op 288 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=10-32 679 1692251 ns/op 2880 B/op 6 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=100-32 650 2095042 ns/op 25664 B/op 9 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=100000-32 2 814309670 ns/op 648192728 B/op 299774 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=1-32 82 16656264 ns/op 288 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=10-32 68 16402470 ns/op 2880 B/op 6 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=100-32 64 18205266 ns/op 25664 B/op 9 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=1000000-32 1 9578485965 ns/op 6552089784 B/op 2999874 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=1-32 7 161260588 ns/op 288 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=10-32 7 212760511 ns/op 2880 B/op 6 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=100-32 4 290261365 ns/op 25664 B/op 9 allocs/op
PASS
ok github.com/stillmatic/gollum 111.224s
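post perf improvements, 2023-07-17. changes are most pronounced when k is large: more efficient memory reuse makes query time nearly constant in k. at k=100 the speedup over the 2023-07-09 run ranges from about 1.2x (n=100000) to about 1.9x (n=1000000). the n=1000000, k=10 result below ran only a single iteration and looks like an outlier.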
goos: linux
goarch: amd64
pkg: github.com/stillmatic/gollum
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkMemoryVectorStore/BenchmarkInsert-n=10-32 12562 86781 ns/op 64680 B/op 10 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10-k=1-32 1259478 967.6 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10-k=10-32 1201736 993.5 ns/op 304 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=100-32 1335 847394 ns/op 652949 B/op 190 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=1-32 143896 8509 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=10-32 122300 9787 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=100-32 111930 11152 ns/op 2752 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=1000-32 127 8455112 ns/op 6477091 B/op 2734 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=1-32 13416 88695 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=10-32 12246 97176 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=100-32 9746 114710 ns/op 5952 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=10000-32 14 88090856 ns/op 65255787 B/op 29740 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=1-32 769 1357818 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=10-32 727 1555869 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=100-32 752 1574506 ns/op 5952 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=100000-32 2 888843642 ns/op 648192776 B/op 299774 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=1-32 69 15276284 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=10-32 79 14270086 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=100-32 68 15162731 ns/op 5952 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=1000000-32 1 8644221139 ns/op 6552072176 B/op 2999841 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=1-32 8 141239584 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=10-32 1 1354937045 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=100-32 7 156217518 ns/op 5952 B/op 4 allocs/op
PASS
ok github.com/stillmatic/gollum 105.535s
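the speedup quoted for large k can be checked directly from the two linux runs above. a minimal sketch, using only the k=100 ns/op values copied from those logs; the `run` type and `speedup` helper are hypothetical names made up for this illustration, not part of gollum.

```go
package main

import "fmt"

// run holds the ns/op of one k=100 query benchmark before and after the
// perf change. The numbers are copied from the two linux logs above.
type run struct {
	n      int
	before float64 // ns/op, 2023-07-09
	after  float64 // ns/op, 2023-07-17
}

// speedup returns how many times faster the "after" measurement is.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	runs := []run{
		{n: 100, before: 17569, after: 11152},
		{n: 1000, before: 152007, after: 114710},
		{n: 10000, before: 2095042, after: 1574506},
		{n: 100000, before: 18205266, after: 15162731},
		{n: 1000000, before: 290261365, after: 156217518},
	}
	for _, r := range runs {
		fmt.Printf("n=%d k=100: %.2fx\n", r.n, speedup(r.before, r.after))
	}
}
```

the ratios come out between roughly 1.2x and 1.9x, with the largest gain at n=1000000. several of these cases ran only a handful of iterations, so the ratios are indicative rather than precise.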
post perf improvement, on a mac (darwin/arm64). the change stabilizes the allocations.
goos: darwin
goarch: arm64
pkg: github.com/stillmatic/gollum
BenchmarkMemoryVectorStore/BenchmarkInsert-n=10-10 5341 220817 ns/op 65223 B/op 10 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10-k=1-10 60616 19622 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10-k=10-10 60388 20033 ns/op 304 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=100-10 536 2202933 ns/op 652278 B/op 190 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=1-10 6152 194476 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=10-10 6094 198124 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100-k=100-10 5946 199925 ns/op 2752 B/op 3 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=1000-10 55 22152592 ns/op 6523947 B/op 2735 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=1-10 613 1953824 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=10-10 610 1987216 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000-k=100-10 580 2051436 ns/op 5952 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=10000-10 5 222244750 ns/op 64782620 B/op 29747 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=1-10 61 19383620 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=10-10 60 19823898 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=10000-k=100-10 57 20027584 ns/op 5952 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=100000-10 1 2207505500 ns/op 648271208 B/op 299808 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=1-10 6 196473680 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=10-10 6 197389812 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=100000-k=100-10 5 200068883 ns/op 5952 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkInsert-n=1000000-10 1 22239769458 ns/op 6552038696 B/op 2999849 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=1-10 1 1966544833 ns/op 120 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=10-10 1 1963972417 ns/op 624 B/op 4 allocs/op
BenchmarkMemoryVectorStore/BenchmarkQuery-n=1000000-k=100-10 1 1988149583 ns/op 5952 B/op 4 allocs/op
PASS
ok github.com/stillmatic/gollum 142.897s
the mac is expected to be slower: the desktop chip is faster and has a SIMD-enhanced distance calculation, so this is not unexpected. however, post change, what we see is that our memory usage is much more stable: 3 or 4 allocs per operation regardless of n (4 in every case except k = n), and much less memory too. bytes per operation grow roughly in proportion to k (120 B/op at k=1, 624 B/op at k=10, 5952 B/op at k=100 for n >= 1000). the runtime is also consistently linear in n, where n is the number of values in the db.
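one way to get this profile (time linear in n, memory proportional to k) is a brute-force scan that keeps only the best k candidates in a bounded min-heap. the sketch below is an assumption for illustration, not gollum's actual implementation: `scored`, `minHeap`, `dot` and `topK` are hypothetical names, and a plain dot product stands in for the real (SIMD-enhanced) distance calculation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// scored pairs the index of a stored vector with its similarity to the query.
type scored struct {
	index int
	score float32
}

// minHeap keeps the lowest score at the root so the weakest candidate
// can be replaced cheaply. It never holds more than k items.
type minHeap []scored

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].score < h[j].score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(scored)) }
func (h *minHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// dot is a scalar stand-in for the distance calculation (assumption).
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// topK scans every vector once, so time grows linearly with n, and
// allocates only the k-sized heap and the k-sized result slice.
func topK(vectors [][]float32, query []float32, k int) []scored {
	if k <= 0 {
		return nil
	}
	h := make(minHeap, 0, k)
	for i, v := range vectors {
		s := dot(v, query)
		if len(h) < k {
			heap.Push(&h, scored{index: i, score: s})
		} else if s > h[0].score {
			h[0] = scored{index: i, score: s} // overwrite the weakest candidate in place
			heap.Fix(&h, 0)
		}
	}
	// Drain the heap from worst to best, filling the result back to front.
	out := make([]scored, len(h))
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(&h).(scored)
	}
	return out
}

func main() {
	vectors := [][]float32{{1, 0}, {0, 1}, {0.7, 0.7}, {-1, 0}}
	for _, r := range topK(vectors, []float32{1, 0}, 2) {
		fmt.Println(r.index, r.score)
	}
}
```

with this shape, each query does n distance computations plus at most n heap updates of cost O(log k), which is consistent with runtime scaling with n and changing only slightly with k. the allocation count here will not match the benchmark numbers exactly, since boxing values through `heap.Push` may allocate.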