
Idea about the structure of an ideal predictor for architecture exploration #215

@zybzzz

Description


I am currently working on the design of a value predictor. I now understand how arch_db (trace) is used and have some new ideas. For the value predictor, we use arch_db to record runtime traces, then tune the predictor design against those traces, for example until it reaches a prediction accuracy above 95%. After that, we run SPEC CPU2006 benchmarks to measure the gains we actually obtain.

In this process, the final benefit of the value predictor is unknown; that is, we do not know the ideal gain we would get even if every instruction were predicted correctly. When we optimize branch prediction, we know that a better branch predictor improves instruction fetch bandwidth and therefore reliably improves performance. Likewise, when we optimize cache prefetching, we know that effective prefetching reduces memory access overhead and therefore reliably improves performance. With value prediction, however, we cannot be certain it improves performance even when it works well. For example, when we predict a load instruction, a correct prediction can still be wasted if the load is squashed by a load/store memory ordering violation. Similarly, when we predict an add instruction, it may not sit on a true-dependence chain in the out-of-order backend, so even a correct prediction brings little speedup, while a misprediction still costs a recovery. I believe all of this comes down to the fact that optimizing backend execution in an out-of-order processor involves too many interacting factors.

Based on this, I came up with an idea: first obtain the best-case benefit of value prediction. For example, we could use an emulator such as NEMU that directly supplies the architecturally correct result for every predicted instruction, which effectively models a value predictor with 100% accuracy. Benchmarking with such a predictor would tell us, before any trace-based tuning begins, the maximum gain we can hope for, giving us more confidence to proceed with optimization. If the interface is designed to be clean and general, I believe this approach can also support fast design space exploration and quick analysis of the benefit a given design can bring.
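To make the idea concrete, here is a minimal sketch of what such an oracle predictor could look like. Everything in it is hypothetical: `ReferenceEmulator`, `nextCorrectValue`, and `OracleValuePredictor` are illustrative names, not existing gem5 or NEMU APIs; a real design would hook into NEMU through its actual interface.

```cpp
// Minimal sketch of the oracle idea, not working gem5 code.
// All names here are hypothetical placeholders.
#include <cstdint>
#include <optional>

// Stand-in for a NEMU instance running the same program, able to
// report the architecturally correct destination value of the next
// instruction in the committed stream.
struct ReferenceEmulator {
    // Assumed hook into NEMU; stubbed so the sketch compiles. The
    // real version would step NEMU and return the value written to
    // the destination register (nullopt if there is none).
    std::optional<uint64_t> nextCorrectValue(uint64_t /*pc*/) {
        return std::nullopt;
    }
};

class OracleValuePredictor {
  public:
    explicit OracleValuePredictor(ReferenceEmulator &ref) : ref_(ref) {}

    // Instead of consulting a history table, return the exact value
    // the reference model produces for the instruction at `pc`.
    // Accuracy is 100% by construction, so the speedup measured with
    // this predictor is an upper bound for any real predictor.
    std::optional<uint64_t> predict(uint64_t pc) {
        return ref_.nextCorrectValue(pc);
    }

  private:
    ReferenceEmulator &ref_;
};
```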

In comparison, both tuning the value predictor to its best achievable performance using traces and constructing an ideal predictor with 100% accuracy aim to determine how much benefit the design can deliver under optimal conditions. For trace-based analysis, the difficulty lies in manually analyzing the traces and tuning from them. For the ideal predictor, the challenge lies in building a NEMU that supplies 100% correct values, integrating it into gem5, and keeping the instruction streams of NEMU and gem5 synchronized (see the sketch below for one possible synchronization scheme).
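On the synchronization point specifically, here is one possible shape, again only a sketch with made-up names (`OracleStream`, `push`, `lookup`, `commit` are all hypothetical, loosely inspired by difftest-style co-simulation where NEMU serves as the reference model): let NEMU run a bounded distance ahead on the correct path and fill a FIFO of (PC, value) pairs; gem5 reads predictions from the FIFO at dispatch and verifies alignment at commit. Wrong-path instructions after a branch misprediction never match a FIFO entry, because NEMU never executes them, so the reference model never needs to be rewound.

```cpp
// Hypothetical synchronization sketch: NEMU fills the FIFO ahead of
// gem5; gem5 consumes it at dispatch and checks it at commit.
#include <cstdint>
#include <deque>
#include <optional>
#include <stdexcept>

struct OracleEntry {
    uint64_t pc;                     // correct-path PC from NEMU
    std::optional<uint64_t> value;   // nullopt if no destination reg
};

class OracleStream {
  public:
    // Filled by stepping NEMU ahead of gem5 (assumed hook).
    void push(uint64_t pc, std::optional<uint64_t> value) {
        fifo.push_back({pc, value});
    }

    // At dispatch: `depth` is the instruction's distance from the
    // commit point along the correct path. Only instructions whose
    // PC matches the corresponding FIFO entry get a prediction, so
    // wrong-path instructions simply miss.
    std::optional<uint64_t> lookup(uint64_t pc, size_t depth) const {
        if (depth < fifo.size() && fifo[depth].pc == pc)
            return fifo[depth].value;
        return std::nullopt;
    }

    // At commit: the two streams must agree instruction-for-
    // instruction; a mismatch means the co-simulation lost sync.
    void commit(uint64_t pc) {
        if (fifo.empty() || fifo.front().pc != pc)
            throw std::runtime_error("gem5/NEMU streams diverged");
        fifo.pop_front();
    }

  private:
    std::deque<OracleEntry> fifo;
};
```

How to compute `depth` cheaply in gem5's O3 model, and how far ahead NEMU should be allowed to run, are exactly the kinds of details the integration would need to work out.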

I wonder if any of you have relevant experience or would be willing to share your thoughts or suggestions. Thanks!

@shinezyy @jensen-yan
