semcache-benchmark

Setup

Symlink the GPTCache benchmark folder into data:
    ln -s GPTCache/examples/benchmark data/

Extract the data from the GPTCache benchmark folder:
    tar -xvzf similiar_qqp_full.json.gz

Run the dummy upstream server:
    uvicorn dummy_server:app --reload --host 0.0.0.0 --port 8081

Run the load test:
    python similarity_test.py
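For readers without the repository at hand, the dummy upstream can be approximated with a few lines of plain ASGI. This is a minimal sketch, not the repository's dummy_server.py: the module name dummy_upstream_sketch, the DUMMY_SLEEP_SECONDS environment variable, and the response body are all assumptions made for illustration.

```python
# dummy_upstream_sketch.py: illustrative stand-in, NOT the repository's dummy_server.py
import asyncio
import json
import os

# Hypothetical knob: seconds to sleep before answering, to simulate a slow upstream.
SLEEP_SECONDS = float(os.environ.get("DUMMY_SLEEP_SECONDS", "0"))


async def app(scope, receive, send):
    """Minimal ASGI app that answers every HTTP request with a fixed JSON body."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket events

    # Drain the request body before responding.
    more_body = True
    while more_body:
        message = await receive()
        more_body = message.get("more_body", False)

    # Simulated upstream processing time.
    await asyncio.sleep(SLEEP_SECONDS)

    body = json.dumps({"response": "dummy"}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})
```

Saved under that assumed file name, it can be served the same way as the real one, for example with DUMMY_SLEEP_SECONDS=3 uvicorn dummy_upstream_sketch:app --port 8081.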

Notes from stress test

Cache eviction

Cache eviction appears to work:

- Memory usage stays constant.
- The logs show plenty of evictions.

Miss latency

- Average miss latency is 3.03 seconds when the dummy upstream server sleeps for 3 seconds and the cache size is small and fixed, which suggests cache eviction takes roughly 0.03 seconds.
- When the dummy upstream returns instantly, total miss latency averages 0.08 ms.
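The overhead estimate is the measured miss latency minus the time the upstream spends sleeping. A small sketch of that arithmetic, using the two figures quoted above (the function name is made up for illustration):

```python
def cache_overhead_seconds(avg_miss_latency_s, upstream_sleep_s):
    """Time spent on the miss path outside the upstream call itself."""
    return avg_miss_latency_s - upstream_sleep_s


# Figures from the stress test: 3.03 s average miss latency, 3 s upstream sleep.
overhead = cache_overhead_seconds(3.03, 3.0)
print(f"{overhead * 1000:.0f} ms")  # prints "30 ms"
```

Note that this figure covers everything on the miss path other than the upstream sleep, so it is an upper bound on eviction time rather than a direct measurement of it.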

Hit latency

Hit latency is consistently around 0.03 ms.

Memory usage

Cache size (entries)    Memory (MB)
682                     190
1798                    194
2888                    200
5334                    204
7583                    209
9488                    212
11310                   213

A linear regression on these measurements shows memory growing by roughly 0.002 MB (about 2 KB) per new entry, on top of a baseline of about 191 MB at startup (system overhead).
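The fit can be reproduced with an ordinary least-squares calculation over the seven measurements listed above. This is a sketch using only the standard library; the variable and function names are illustrative.

```python
# (cache entries, memory in MB) pairs copied from the measurements above
measurements = [
    (682, 190), (1798, 194), (2888, 200), (5334, 204),
    (7583, 209), (9488, 212), (11310, 213),
]


def least_squares(points):
    """Return (slope, intercept) of the ordinary least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope_mb_per_entry, baseline_mb = least_squares(measurements)
print(f"{slope_mb_per_entry:.4f} MB per entry, {baseline_mb:.0f} MB baseline")
```

On this data the slope comes out at about 0.0021 MB per entry and the intercept at about 191 MB.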

Request rate

Reached 45 req/s with everything running locally, without issues.
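One way to hold a load test at a target request rate is to start requests on a fixed schedule and let them complete concurrently. The sketch below is an assumption about how such pacing could look, not the logic in similarity_test.py; run_at_rate and fake_request are hypothetical names, and fake_request stands in for a real HTTP call to the cache.

```python
import asyncio
import time


async def run_at_rate(send_request, requests_per_second, total_requests):
    """Start one request every 1/rate seconds, wait for all, return achieved req/s."""
    interval = 1.0 / requests_per_second
    start = time.perf_counter()
    tasks = []
    for i in range(total_requests):
        # Sleep until this request's scheduled start time so delays do not accumulate.
        delay = start + i * interval - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(send_request(i)))
    await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


async def fake_request(i):
    # Placeholder for a real request to the cache under test.
    await asyncio.sleep(0.001)


achieved = asyncio.run(
    run_at_rate(fake_request, requests_per_second=45, total_requests=45)
)
print(f"achieved about {achieved:.1f} req/s")
```

Scheduling against absolute start times rather than sleeping a fixed interval after each request keeps the rate from drifting downward when individual requests are slow to launch.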

sensoris/semcache-benchmark
