gssr is a utility for collecting and analyzing GPU performance metrics on the CSCS Alps system. It is built on top of NVIDIA's DCGM tool.
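To install the packaged release with pip:
pip install gssr
To install the latest version from the GitHub repository:
pip install git+https://github.com/eth-cscs/GPU-saturation-scorer.git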
To install from a specific branch, e.g. the development branch:
pip install git+https://github.com/eth-cscs/GPU-saturation-scorer.git@dev
To install a specific release from a tag, e.g. v0.4.0:
pip install git+https://github.com/eth-cscs/GPU-saturation-scorer.git@v0.4.0
If you are submitting a batch job and the command you are executing is:
srun python test.py
then the srun command should be modified as follows:
srun gssr profile python test.py
- The gssr subcommand to run is "profile".
- The default output directory is "profile_out_{job_id}".
- You can optionally attach a label to the output data with the "-l" flag.
If you need to write the output to a specific directory, use the "-o" flag:
srun gssr profile -o /abc/def python test.py
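As a minimal sketch, the profiling call could sit in a Slurm batch script like the one below. The #SBATCH values and the OUT_DIR path are illustrative assumptions, not settings required by gssr; only the "srun gssr profile -o ... python test.py" command comes from the usage above. The script falls back to printing the command when srun is not available, so it can be checked outside a job allocation.

```shell
#!/bin/bash
#SBATCH --job-name=gssr-profile   # assumption: illustrative job name
#SBATCH --nodes=1                 # assumption: adjust to your job
#SBATCH --time=00:30:00           # assumption: adjust to your job

# Directory for the profile data. The default path here is a placeholder;
# set OUT_DIR before submitting to override it.
OUT_DIR="${OUT_DIR:-$PWD/gssr_profile}"

if command -v srun >/dev/null 2>&1; then
    # Wrap the original "python test.py" command with "gssr profile".
    srun gssr profile -o "$OUT_DIR" python test.py
else
    # Outside a Slurm environment, only show what would be executed.
    echo "srun not found; would run: srun gssr profile -o $OUT_DIR python test.py"
fi
```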
The profiled output can be analyzed as follows:
gssr analyze -i ./profile_out
gssr analyze -i ./profile_out --report
PDF report(s) will be generated containing time-series and load-balancing plots.
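Because the default output directory name includes the job ID, output from several runs can accumulate side by side. The loop below is a sketch that requests a report for each of them; it assumes the directories sit in the current working directory and follow the default "profile_out_{job_id}" naming. When gssr is not installed, it only prints the commands.

```shell
n_dirs=0   # number of profile directories found

for dir in profile_out_*; do
    # Skip the literal pattern when the glob matches nothing.
    [ -d "$dir" ] || continue
    n_dirs=$((n_dirs + 1))

    if command -v gssr >/dev/null 2>&1; then
        gssr analyze -i "$dir" --report
    else
        echo "gssr not found; would run: gssr analyze -i $dir --report"
    fi
done

echo "Processed $n_dirs profile directories."
```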
gssr analyze -i ./profile_out --report -hm
Generating heatmaps is very time-consuming, so enable the "-hm" flag only when you need them.
gssr analyze -i ./profile_out --export data.sqlite3
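The exported file can then be inspected with standard tools. The sketch below assumes that data.sqlite3 is a regular SQLite database, as the file name suggests, and that the sqlite3 command-line shell is installed. It lists the tables and their schema without assuming any particular table names.

```shell
# File name taken from the export command above.
GSSR_DB="${GSSR_DB:-data.sqlite3}"

if [ -f "$GSSR_DB" ] && command -v sqlite3 >/dev/null 2>&1; then
    # List the tables in the exported database.
    sqlite3 "$GSSR_DB" ".tables"
    # Print the CREATE statements to see the column layout.
    sqlite3 "$GSSR_DB" ".schema"
else
    echo "Skipping inspection: $GSSR_DB or the sqlite3 CLI is not available."
fi
```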
gssr --help