Conversation

bhartnett
Contributor

This change appears to improve performance by a few percent.

I ran block imports over the first 3 million blocks and here are the results:

baseline.csv vs delete-all.csv
                       bps_x     bps_y      tps_x      tps_y  time_x time_y    bpsd    tpsd   timed
block_number                                                                                       
(499713, 777522]    8,865.48  9,614.48  14,682.15  15,934.45     43s    38s  19.58%  19.58%  -6.28%
(777522, 1055332]   5,903.32  6,185.96  16,865.01  17,907.18     54s    48s  16.38%  16.38%  -1.28%
(1055332, 1333142]  5,053.96  5,622.88  25,792.41  28,775.77     57s    51s  13.32%  13.32%  -9.20%
(1333142, 1610952]  4,413.58  4,895.69  28,765.84  32,123.93   1m16s   1m1s  29.46%  29.46%  -7.68%
(1610952, 1888761]  3,657.44  4,053.78  25,510.86  28,090.79   1m56s  1m43s  22.56%  22.56%  -7.77%
(1888761, 2166571]  4,393.58  4,397.60  32,989.25  33,007.81    1m8s  1m14s   1.30%   1.30%   9.25%
(2166571, 2444381]  2,009.52  2,368.80  15,735.87  18,544.95  13m40s  13m6s  14.76%  14.76%  -8.79%
(2444381, 2722191]  3,229.39  3,008.85  22,540.26  20,985.28   8m19s  8m28s   0.99%   0.99%   8.00%
(2722191, 3000001]  4,370.49  4,319.19  30,905.05  30,375.02   1m11s  1m11s   9.10%   9.10%   8.76%

blocks: 2492096, baseline: 30m7s, contender: 29m3s
Time (total): -1m3s, -3.54%

bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-)
+ = more is better, - = less is better

@bhartnett bhartnett requested a review from arnetheduck August 7, 2025 03:00
@arnetheduck
Member

arnetheduck commented Aug 11, 2025

I ran block imports over the first 3 million blocks

In general, the first 3 million blocks tend to be not representative for lru sizing since the database is still small in general and everything just fits - ie at block height 3m, the whole database is just 1.3gb total. Also, in the sample we can see another effect: the change is efficient in some block ranges but not all - in particular, it may be inefficient past the gas repricing following the shanghai dos attack, as can be seen in the last timing bracket .. for this kind of tests, it's usually better to run them at some more recent block range.

running on a recent block range comes with its own difficulties - in particular, block time is no longer a good measure because it's too easily influenced by OS caching of the database - instead, it's better to look at "number of database lookups" - ie the hit / miss rate of the cache, more or less, and see which one is better - the test has to be long enough to get past the "warm-up" period.
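
To make that concrete, here's a minimal sketch of the kind of comparison meant here (purely illustrative counts, not taken from any run): the misses are, more or less, the lookups that fall through to the database, so the run with the higher hit rate / fewer misses wins regardless of wall time.

def hit_rate(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 0.0

# Purely illustrative counts, collected after the warm-up period has passed.
runs = {
    "baseline":  {"hits": 900_000_000, "misses": 100_000_000},
    "contender": {"hits": 920_000_000, "misses":  80_000_000},
}

for name, counts in runs.items():
    print(f"{name}: hit rate {hit_rate(**counts):.2%}, "
          f"database lookups ~{counts['misses']:,}")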

@bhartnett
Contributor Author

> In general, the first 3 million blocks tend to be not representative for lru sizing since the database is still small in general and everything just fits - ie at block height 3m, the whole database is just 1.3gb total. Also, in the sample we can see another effect: the change is efficient in some block ranges but not all - in particular, it may be inefficient past the gas repricing following the shanghai dos attack, as can be seen in the last timing bracket .. for this kind of tests, it's usually better to run them at some more recent block range.
>
> running on a recent block range comes with its own difficulties - in particular, block time is no longer a good measure because it's too easily influenced by OS caching of the database - instead, it's better to look at "number of database lookups" - ie the hit / miss rate of the cache, more or less, and see which one is better - the test has to be long enough to get past the "warm-up" period.

In that case, I'll run a larger test over a more recent block range and use metrics to count the cache hit / miss rates.

@arnetheduck
Member

> use metrics

see also --debug-rdb-print-stats

@bhartnett
Contributor Author

I ran another longer test over approx 4 million blocks starting from just after the merge and here are the results:

python scripts/block-import-stats.py ~/Downloads/master.csv ~/Downloads/delete-from-caches.csv
master.csv vs delete-from-caches.csv
                      bps_x  bps_y     tps_x     tps_y    time_x    time_y    bpsd    tpsd    timed
block_number                                                                                       
(15537394, 15981838]  31.25  29.09  4,861.74  4,537.47  3h57m32s  4h16m21s  -6.62%  -6.62%    8.29%
(15981838, 16426282]  28.16  26.49  4,009.14  3,781.07  4h23m14s  4h48m15s  -5.44%  -5.44%   10.34%
(16426282, 16870727]  27.95  26.03  4,109.17  3,825.78  4h24m25s  4h46m30s  -6.85%  -6.85%    8.39%
(16870727, 17315171]  26.20  29.22  3,900.46  4,347.84  4h52m33s  4h19m42s  12.54%  12.54%  -10.41%
(17315171, 17759616]  23.29  26.15  3,409.88  3,836.29  5h23m55s  4h44m34s  14.57%  14.57%  -10.54%
(17759616, 18204060]  22.15  21.81  3,180.56  3,133.36  5h37m12s  5h43m37s  -0.98%  -0.98%    2.53%
(18204060, 18648505]  23.24  21.66  3,386.10  3,157.34  5h19m18s  5h46m37s  -6.74%  -6.74%    8.67%
(18648505, 19092949]  20.52  20.59  3,308.84  3,323.32  6h14m35s  6h12m50s   0.59%   0.59%   -0.26%
(19092949, 19537394]  22.51  23.52  3,744.44  3,901.93  5h43m16s  5h18m34s   6.13%   6.08%   -5.70%

blocks: 3991808, baseline: 45h56m5s, contender: 45h57m5s
Time (total): 59s, 0.04%

bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-)
+ = more is better, - = less is better

In summary, no improvement in run time overall.

I also collected the cache hit/miss stats for the delete-from-caches run, but lost them for the master run because my computer restarted before I could save them:

NTC 2025-08-23 08:48:15.877+08:00 Import complete                            blockNumber=19537394 slot=8738654 blocks=4000000 txs=602958139 mgas=60575476
vtxLru(4971026)
   state    vtype       miss        hit      total hitrate
 Account    Empty    2992425          0    2992425   0.00%
 Account     Leaf  453503411  391126517  844629928  46.31%
 Account   Branch          0          0          0   -nan%
 Account ExtBranch    9718066   14976638   24694704  60.65%
   World    Empty   36094657          0   36094657   0.00%
   World     Leaf  148597607 1472415091 1621012698  90.83%
   World   Branch          0          0          0   -nan%
   World ExtBranch    3450311   10594154   14044465  75.43%
     all      all  654356477 1889112400 2543468877  74.27%
keyLru(0) 
   state       miss        hit      total hitrate
 Account          0          0          0  -nan%
   World          0          0          0  -nan%
     all          0          0          0  -nan%
branchLru(67108864) 
   state       miss        hit      total hitrate
 Account  170067193 2599398052 2769465245 93.86%
   World   71736101  857820940  929557041 92.28%
     all  241803294 3457218992 3699022286 93.46%
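
For reference, the hitrate column is hit / (miss + hit); for example the vtxLru "all" row above is 1,889,112,400 / 2,543,468,877 ≈ 74.27%, and the misses are roughly the lookups that fall through to the database.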

@bhartnett bhartnett marked this pull request as draft August 25, 2025 11:59
@arnetheduck
Member

This result makes me think that we should have a "refresh" operation in minilru that updates an existing value without updating its position in the lru - this is a reasonable middle ground actually where reads still determine the eviction order but we don't lose the item due to a premature and potentially unnecessary delete.
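
To illustrate the idea, here is a minimal Python sketch of such a cache (illustrative only, not the nim-minilru API): get/put promote entries as usual, while refresh overwrites the value in place so the entry keeps its current position in the eviction order.

from collections import OrderedDict

class LruSketch:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        # A hit promotes the entry to most-recently-used.
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        # Insert or overwrite, promote, and evict the oldest entry if full.
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)

    def refresh(self, key, value) -> bool:
        # Update an existing value *without* changing its LRU position:
        # reads still decide the eviction order, but the entry is neither
        # dropped nor artificially promoted just because it was rewritten.
        if key not in self.items:
            return False
        self.items[key] = value  # assignment keeps the key's position
        return True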

@arnetheduck
Member

https://github.com/status-im/nim-minilru/pull/5/files

@bhartnett
Contributor Author

> https://github.com/status-im/nim-minilru/pull/5/files

Ok, I'll try that. Will run another test using this refresh operation instead.

@bhartnett
Contributor Author

I've completed another test run using the LRUCache refresh operation and here are the results:

python scripts/block-import-stats.py ~/Downloads/master.csv ~/Downloads/refresh.csv 
master.csv vs refresh.csv
                      bps_x  bps_y     tps_x     tps_y    time_x    time_y    bpsd    tpsd   timed
block_number                                                                                      
(15537394, 15981838]  31.25  30.06  4,861.74  4,679.47  3h57m32s   4h7m21s  -3.60%  -3.60%   4.36%
(15981838, 16426282]  28.16  27.27  4,009.14  3,876.72  4h23m14s   4h33m5s  -3.16%  -3.16%   3.76%
(16426282, 16870727]  27.95  27.26  4,109.17  4,009.70  4h24m25s  4h34m10s  -2.39%  -2.39%   3.77%
(16870727, 17315171]  26.20  29.02  3,900.46  4,319.19  4h52m33s  4h21m24s  11.72%  11.72%  -9.84%
(17315171, 17759616]  23.29  25.51  3,409.88  3,742.77  5h23m55s  4h52m30s  11.75%  11.75%  -8.06%
(17759616, 18204060]  22.15  22.10  3,180.56  3,173.78  5h37m12s  5h38m23s   0.38%   0.38%   1.02%
(18204060, 18648505]  23.24  21.47  3,386.10  3,121.70  5h19m18s  5h46m58s  -7.49%  -7.49%   8.84%
(18648505, 19092949]  20.52  20.90  3,308.84  3,371.25  6h14m35s   6h7m30s   2.18%   2.18%  -1.58%
(19092949, 19537394]  22.51  23.94  3,744.44  3,974.13  5h43m16s   5h13m1s   7.89%   7.84%  -7.41%

blocks: 3991808, baseline: 45h56m5s, contender: 45h14m25s
Time (total): -41m40s, -1.51%

bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-)
+ = more is better, - = less is better

Cache hit stats for the run using refresh:

vtxLru(4971026)
   state    vtype       miss        hit      total hitrate
 Account    Empty    2992425          0    2992425   0.00%
 Account     Leaf  450651331  386659911  837311242  46.18%
 Account   Branch          0          0          0   -nan%
 Account ExtBranch    9696247   14855094   24551341  60.51%
   World    Empty   29894033          0   29894033   0.00%
   World     Leaf  146729708 1408961189 1555690897  90.57%
   World   Branch          0          0          0   -nan%
   World ExtBranch    3486146    8982541   12468687  72.04%
     all      all  643449890 1819458735 2462908625  73.87%
keyLru(0) 
   state       miss        hit      total hitrate
 Account          0          0          0  -nan%
   World          0          0          0  -nan%
     all          0          0          0  -nan%
branchLru(67108864) 
   state       miss        hit      total hitrate
 Account  169910194 2586950957 2756861151 93.84%
   World   71963139  828348511  900311650 92.01%
     all  241873333 3415299468 3657172801 93.39%

@arnetheduck
Member

> (18204060, 18648505] 23.24 21.47 3,386.10 3,121.70 5h19m18s 5h46m58s -7.49% -7.49% 8.84%

hmm .. what's going on here? this is oddly consistent between the versions also, would it make sense to rerun master to see that it's not a fluke? ie master vs master run should be near-identical if the benchmarking setup is good.

@bhartnett
Contributor Author

> (18204060, 18648505] 23.24 21.47 3,386.10 3,121.70 5h19m18s 5h46m58s -7.49% -7.49% 8.84%
>
> hmm .. what's going on here? this is oddly consistent between the versions also, would it make sense to rerun master to see that it's not a fluke? ie master vs master run should be near-identical if the benchmarking setup is good.

Not sure about that. Sure, I'll run master again.

@bhartnett
Contributor Author

After creating a new baseline from the latest master branch, here are the results:

python scripts/block-import-stats.py /mnt/5aa6b1af-8122-4eec-b52f-bdfb121e74de/master2.csv /mnt/5aa6b1af-8122-4eec-b52f-bdfb121e74de/refresh.csv
master2.csv vs refresh.csv
                      bps_x  bps_y     tps_x     tps_y    time_x    time_y    bpsd    tpsd   timed
block_number                                                                                      
(15537394, 15981838]  30.28  30.06  4,725.96  4,679.47   4h6m30s   4h7m21s   0.51%   0.51%   1.56%
(15981838, 16426282]  29.06  27.27  4,139.39  3,876.72  4h14m50s   4h33m5s  -6.17%  -6.17%   7.15%
(16426282, 16870727]  29.54  27.26  4,342.01  4,009.70   4h10m8s  4h34m10s  -7.69%  -7.69%   9.64%
(16870727, 17315171]  28.90  29.02  4,302.98  4,319.19  4h22m28s  4h21m24s   0.41%   0.41%  -0.40%
(17315171, 17759616]  26.08  25.51  3,821.36  3,742.77  4h45m20s  4h52m30s  -2.16%  -2.16%   2.50%
(17759616, 18204060]  22.30  22.10  3,203.27  3,173.78   5h34m8s  5h38m23s  -0.86%  -0.86%   1.35%
(18204060, 18648505]  22.11  21.47  3,223.00  3,121.70   5h37m7s  5h46m58s  -2.37%  -2.37%   3.47%
(18648505, 19092949]  20.07  20.90  3,235.50  3,371.25  6h21m52s   6h7m30s   4.46%   4.46%  -3.44%
(19092949, 19537394]  21.85  23.94  3,625.09  3,974.13  5h43m36s   5h13m1s   9.83%   9.83%  -8.59%

blocks: 3991808, baseline: 44h56m3s, contender: 45h14m25s
Time (total): 18m22s, 0.68%

bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-)
+ = more is better, - = less is better

Here are the cache hit stats from the latest baseline run:

vtxLru(4971026)
   state    vtype       miss        hit      total hitrate
 Account    Empty    2992425          0    2992425   0.00%
 Account     Leaf  449910636  388293144  838203780  46.32%
 Account   Branch          0          0          0   -nan%
 Account ExtBranch    9691208   14885717   24576925  60.57%
   World    Empty   29880538          0   29880538   0.00%
   World     Leaf  146804391 1408526686 1555331077  90.56%
   World   Branch          0          0          0   -nan%
   World ExtBranch    3488650    8970524   12459174  72.00%
     all      all  642767848 1820676071 2463443919  73.91%
keyLru(0) 
   state       miss        hit      total hitrate
 Account          0          0          0  -nan%
   World          0          0          0  -nan%
     all          0          0          0  -nan%
branchLru(67108864) 
   state       miss        hit      total hitrate
 Account  169861952 2594261252 2764123204 93.85%
   World   71961489  828173509  900134998 92.01%
     all  241823441 3422434761 3664258202 93.40%
