Symlink the GPTCache benchmark folder into data
ln -s GPTCache/examples/benchmark data/
Extract the data from the GPTCache benchmark folder
tar -xvzf similiar_qqp_full.json.gz
Run dummy upstream server
uvicorn dummy_server:app --reload --host 0.0.0.0 --port 8081
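The contents of dummy_server.py are not shown in these notes. As a rough stand-in, the sketch below is a bare ASGI app that uvicorn can serve with the command above; it waits a configurable time and then returns a fixed JSON body. The environment variable name UPSTREAM_DELAY_SECONDS, the response payload, and the fact that every path gets the same answer are all assumptions made for illustration, not the actual server.

```python
import asyncio
import json
import os

# Hypothetical knob (not from the original notes): seconds to wait before
# answering, used to imitate a slow upstream.
UPSTREAM_DELAY_SECONDS = float(os.environ.get("UPSTREAM_DELAY_SECONDS", "3"))


async def app(scope, receive, send):
    """Bare ASGI app: drain the request, wait, then return a fixed JSON body."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket events

    # Read the whole request body before responding.
    more_body = True
    while more_body:
        message = await receive()
        more_body = message.get("more_body", False)

    # Simulate upstream processing time.
    await asyncio.sleep(UPSTREAM_DELAY_SECONDS)

    # Placeholder payload; the real upstream response format is an assumption.
    body = json.dumps({"answer": "dummy response"}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})
```

With this stand-in, setting UPSTREAM_DELAY_SECONDS=0 before starting uvicorn would correspond to the "upstream returns instantly" case, and the default of 3 to the slow-upstream case.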
Run load test
python similarity_test.py
The cache eviction seems to work:
- memory usage stays constant
- I see plenty of eviction logs
- average miss latency is 3.03 seconds when the dummy upstream server sleeps for 3 seconds and the cache size is small and fixed, meaning the miss path, eviction included, takes roughly 0.03 seconds
- when the dummy upstream returns instantly, total latency on a cache miss averages 0.08 ms
- hit latency is consistently around 0.03 ms
- at a cache size of 682 entries: 190 MB
- at a cache size of 1798 entries: 194 MB
- at a cache size of 2888 entries: 200 MB
- at a cache size of 5334 entries: 204 MB
- at a cache size of 7583 entries: 209 MB
- at a cache size of 9488 entries: 212 MB
- at a cache size of 11310 entries: 213 MB
Running a linear regression on these measurements shows memory growing by roughly 0.002 MB (about 2 KB) per new entry, with a baseline of about 191 MB at startup (system overhead).
- managed to get it up to 45 req/s running everything locally without issue
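The linear fit over the memory measurements can be reproduced with a short ordinary least-squares sketch. The data points are copied from the list above; the function names fit_line and predicted_memory_mb are illustrative only.

```python
# (cache entries, memory usage in MB), copied from the measurements above
samples = [
    (682, 190),
    (1798, 194),
    (2888, 200),
    (5334, 204),
    (7583, 209),
    (9488, 212),
    (11310, 213),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(samples)
print(f"slope: {slope:.4f} MB per entry, intercept: {intercept:.1f} MB")


def predicted_memory_mb(entries):
    """Extrapolate memory usage for a given cache size using the fitted line."""
    return intercept + slope * entries
```

On these seven points the fit gives a slope of roughly 0.002 MB per entry and an intercept near 191 MB. predicted_memory_mb only extrapolates the straight line, so estimates far beyond the measured range (above about 11,000 entries) should be treated with caution.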