
RFC: Drop Verification from Benchmarking #3

@msmith-techempower

Description


Summary

  • Remove the verification step during benchmarking

Motivation

Verification is an important step in determining whether a test implementation is valid, but it can be very time-consuming in practice. The new Verifier implementation aims to make existing verifications easier to understand and new verifications easier to add. Assuming verifications will be expanded and new ones added more readily as a result (see new verifications for reference), the time it takes to run verification is expected to grow over time.

At the time of this writing, the update verification takes 31 seconds; json takes 4 seconds, as does plaintext. There are 650 test implementations, each of which will end up verifying one or more of these test types. In the best-case (and unrealistically optimistic) scenario, 2,600 seconds (about 43 minutes) are spent verifying. If three out of five verifications take 30 seconds, it looks more like 50,960 seconds (roughly 14 hours). A continuous benchmark run can be shortened by several hours via the following:

  1. The benchmark process will not run verification
  2. Assume that any test implementation not tagged "broken" has passed verification
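
The best-case figure above can be sanity-checked with quick arithmetic. The per-verification timings and the implementation count are the ones quoted above; treating every implementation as verifying a single fast test type is a simplifying assumption for illustration:

```python
# Back-of-envelope estimate of total verification time per benchmark run,
# using the figures quoted in this RFC. The fast/slow split per run is a
# simplifying assumption, not a measured distribution.

IMPLEMENTATIONS = 650
FAST_SECONDS = 4    # e.g. json, plaintext
SLOW_SECONDS = 31   # e.g. update

# Best case: every implementation verifies a single fast test type.
best_case = IMPLEMENTATIONS * FAST_SECONDS
print(f"best case: {best_case} s (~{best_case // 60} min)")
# best case: 2600 s (~43 min)
```

Even this optimistic lower bound adds the better part of an hour to every continuous run, before any slow verifications like update are counted.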

History

Currently (and also in the legacy implementation), running a benchmark of a given test implementation incurs the cost of verification. This is done to ensure that time is not spent running a benchmark against a test which will not respond correctly to the endpoint being benchmarked (e.g. if fortune returns a 500 instead of a 200, it should not be measured).

The legacy implementation has this rule imposed because verification came as an afterthought to the benchmarking process. Originally, the legacy implementation did benchmark test implementations which returned a 500 response, for example. Eventually, the verification step was added to ensure that tests were implemented correctly, and patches were made retroactively to get existing tests to pass verification. Verification has been the standard for several years now, and we appear to be past the point where test implementations that do not pass verification get merged.

Drawbacks

  • A clever malicious contributor could open a pull request with a sophisticated black box framework implementation (as a linked library, rather than source code) that passes verification to get merged in, but returns empty 200s for all the tests (or a similar attack)
  • Transient failures, such as a remote dependency being unavailable at the time, would go undetected and result in benchmarking incorrect implementations. This may be addressed by RFC: Publish Tagged Test Implementations for Benchmarking #4

Supplemental Considerations

  • Implement a "light" verification step tailored to running a benchmark, which only checks that the service is available and returning a 200 response. This would somewhat alleviate the transient-failures drawback mentioned above.
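
A minimal sketch of what that "light" check could look like, using only the Python standard library. The function name, signature, and timeout are illustrative assumptions, not part of any existing toolset API:

```python
# Illustrative "light" verification: before benchmarking an endpoint, check
# only that it is reachable and answers with HTTP 200, instead of running
# the full verifier. Name and timeout are hypothetical.
from urllib.request import urlopen
from urllib.error import URLError

def is_serving(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint responds with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, non-2xx raised as HTTPError
        # (a URLError subclass), timeouts, etc. all count as "not serving".
        return False
```

A benchmark runner could call this once per endpoint and skip implementations that fail, at a cost of milliseconds rather than the full verification time.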

Alternatives

  • Leave verification as a first-step to running a benchmark
