Skip to content

Moggwp/HPC_internship

Repository files navigation

PM100 Dataset

This is the official repository for the PM100 HPC job workload dataset, containing the scripts for the creation of the final data. The official dataset can be found in DOI.

Repository structure

  • extract_data.py: The script containing the functions used to extract the final job power consumption from job traces and power logs. It is showed also an example of a pipeline to extract such values from data structured like the one in M100 dataset.
  • inspect_data.py: The script reports the operation performed on the final PM100 data to produce the plots in the plots folder. Moreover, it reports a function to load the data and provide examples on how to inspect it.
  • documentation: The folder contains some documentation of the final dataset, like the job features description.
  • plots : The folder contains the plots presented in the paper.

Preliminaries

All the packages used in the project are reported in the requirements.txt, the Python version used was the 3.6.8.

It is a good practice to create a virtual environment and then install the required packages with pip3 install -r requirements.txt.

In order to extract the PM100 dataset from M100, first it is needed to download the correct data from Zenodo.

After downloading the data relative to the whole period (YY-MM from 20-05 to 20-10) or just a subset of it, the archives must be extracted and the tables which are needed for the scripts are:

  • year_month=YY-MM/plugin=job_table/metric=job_info_marconi100/a_0.parquet : Parquet file containing the job data;
  • year_month=YY-MM/plugin=ipmi_pub/metric=ps0_input_power/a_0.parquet : Parquet file containing the first power socket metrics;
  • year_month=YY-MM/plugin=ipmi_pub/metric=ps1_input_power/a_0.parquet : Parquet file containing the second power socket metrics.

The job tables related to the different months can be merged by running the merge_m100_tables.py, expliciting the path to the downloaded data in the job_table_data_path variable.

Launch the extraction

Before launching the extraction, the variables job_table_path, ps[0, 1]_table_path and final_table_path must be initialized with the path to the downloaded data and the desidered output file for the dataset.

The result of the execution is a parquet file containing the data structured as presented in the documentation/job_features.md file.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages