
Possible concurrency bug (needs more info) #178

@dakom


This is possibly related to #162, but maybe not, and it could be expected behavior.

  1. Install a load-testing tool like https://ghz.sh/
  2. Run it for a fixed duration against a simple service. Example:
ghz --insecure -z 5s --call cosmos.base.tendermint.v1beta1.Service.GetLatestBlock localhost:9090
  3. Observe that it reports some errors (maybe 0.1% of the time)
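
To put a number on "maybe 0.1%", the error and total counts from a run's summary can be turned into a percentage. This is only a minimal sketch: `error_rate` is a hypothetical helper, not part of ghz, and the counts in the example call are illustrative, not measured.

```shell
#!/bin/sh
# Hypothetical helper (not part of ghz): error rate as a percentage.
# Arguments: number of failed requests, total number of requests.
error_rate() {
    awk -v errors="$1" -v total="$2" 'BEGIN {
        # Guard against division by zero when no requests were sent.
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", errors * 100 / total
    }'
}

# Illustrative numbers only: 50 failures in 50,000 requests.
error_rate 50 50000   # prints 0.10
```

Reading the counts off the ghz summary by hand and passing them in keeps the sketch independent of any particular output format.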

Other variations, such as running a fixed large number of requests (e.g. 50,000) or switching on async mode (--async), do not alleviate the errors.

Smaller loads do work: a limited test of 1,000 requests is fine. Increasing the number of connections also reduces errors when the run is capped at an explicit number of requests, but not when it is capped by time. For example, this works for me:

ghz --insecure -n 50000 --connections 20 --call cosmos.base.tendermint.v1beta1.Service.GetLatestBlock localhost:9090

but this does not, even though on my machine the 5-second cap runs roughly the same number of requests as above:

ghz --insecure -z 5s --connections 20 --call cosmos.base.tendermint.v1beta1.Service.GetLatestBlock localhost:9090
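
To keep the comparison controlled, the two runs could be generated from one template so that only the stop condition (`-n 50000` versus `-z 5s`) differs. The wrapper below is a sketch under that assumption: `ghz_cmd`, `TARGET`, `CALL`, and the `RUN` switch are hypothetical names, and the ghz flags are the ones already used in the commands above. By default it only prints each command.

```shell
#!/bin/sh
# Shared arguments, copied from the commands above.
TARGET="localhost:9090"
CALL="cosmos.base.tendermint.v1beta1.Service.GetLatestBlock"

# Hypothetical wrapper: builds the ghz command for a given stop condition.
# Prints the command by default; set RUN=1 to execute it (requires ghz on PATH).
ghz_cmd() {
    # "$@" is the stop condition, e.g. `-n 50000` or `-z 5s`.
    set -- ghz --insecure "$@" --connections 20 --call "$CALL" "$TARGET"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

ghz_cmd -n 50000   # count-bound run
ghz_cmd -z 5s      # time-bound run
```

Running both variants back to back against the same server makes it easier to tell whether the errors follow the stop condition or the load itself.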

I'm not sure whether this is expected behavior (because we have some sort of internal rate limiting), whether I'm running the load test incorrectly, or whether it's a genuine issue. Either way, I think it's worth looking into, to make sure we don't have a concurrency bug in the server.
