This repository was archived by the owner on Dec 1, 2024. It is now read-only.

How do I match the results of profiling with the parameters of the cost model? #131

@xvanQ

The output of profile bandwidth is as follows:
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s

size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s

Which of these measurements corresponds to ctog_bdw, which to gtoc_bdw_cache, and which to gtoc_bdw_hidden?
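
To make the bandwidth numbers easier to compare against the cost-model parameters, here is a minimal sketch that parses the output above into per-direction measurements. The helper names (`parse_bandwidth`, `largest_transfer_bandwidth`) are hypothetical and not part of the repository, and picking the value measured at the largest transfer size is only an assumption about which measurement is most representative. It does not settle which parameter each value maps to.

```python
import re

# Hypothetical helper, not part of the repository's profiling scripts.
# Matches lines such as:
#   size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
BANDWIDTH_LINE = re.compile(
    r"size:\s*([\d.]+)\s*MB,\s*(gpu-to-cpu|cpu-to-gpu) bandwidth:\s*([\d.]+)\s*GB/s"
)


def parse_bandwidth(text):
    """Return {direction: [(size_mb, gb_per_s), ...]} in input order."""
    results = {}
    for match in BANDWIDTH_LINE.finditer(text):
        size_mb, direction, gb_per_s = match.groups()
        results.setdefault(direction, []).append((float(size_mb), float(gb_per_s)))
    return results


def largest_transfer_bandwidth(results):
    """Per direction, return the bandwidth measured at the largest size.

    Assumption: the largest transfer is the most representative measurement.
    """
    return {direction: max(rows)[1] for direction, rows in results.items()}


PROFILE_OUTPUT = """
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s

size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
"""

parsed_bandwidth = parse_bandwidth(PROFILE_OUTPUT)
saturated_bandwidth = largest_transfer_bandwidth(parsed_bandwidth)
print(saturated_bandwidth)
```

The small 0.25 MB transfers are clearly latency-bound, so whichever parameter a direction maps to, the 32 MB and 128 MB rows are the ones worth comparing.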

The output of profile matmul is as follows:
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026

device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924

Which of these values should I use for mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, and cpu_flops?
Thanks.
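
For reference, the matmul output can be collected the same way. This is a minimal sketch with a hypothetical helper (`parse_matmul`) that is not part of the repository. It only reads the reported figures and converts TFLOPS to FLOP/s, on the assumption that the cost model expects values in FLOP/s. It does not decide which of the `*_flops_*` parameters a given row belongs to.

```python
import re

# Hypothetical helper, not part of the repository's profiling scripts.
# Matches lines such as:
#   device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
MATMUL_LINE = re.compile(
    r"device:\s*(\w+),\s*N:\s*(\d+),\s*latency:\s*([\d.]+)\s*ms,\s*TFLOPS:\s*([\d.]+)"
)


def parse_matmul(text):
    """Return {device: {N: {"latency_ms": ..., "tflops": ..., "flops": ...}}}."""
    results = {}
    for match in MATMUL_LINE.finditer(text):
        device, n, latency_ms, tflops = match.groups()
        results.setdefault(device, {})[int(n)] = {
            "latency_ms": float(latency_ms),
            "tflops": float(tflops),
            # Assumption: the cost model wants FLOP/s rather than TFLOPS.
            "flops": float(tflops) * 1e12,
        }
    return results


MATMUL_OUTPUT = """
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026

device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
"""

parsed_matmul = parse_matmul(MATMUL_OUTPUT)
for device, rows in parsed_matmul.items():
    for n, row in sorted(rows.items()):
        print(f"{device:5s} N={n:5d} {row['tflops']:8.3f} TFLOPS")
```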
