Code for generating the dataset for the paper "Towards a Foundation Model for Communication Systems".
arXiv Link: https://arxiv.org/abs/2505.14603
Citation:
@inproceedings{
buffelli2025towards,
title={Towards a Foundation Model for Communication Systems},
author={Davide Buffelli and Sowmen Das and Yu-Wei Lin and Sattar Vakili and Chien-Yi Wang and Masoud Attarifar and Pritthijit Nath and Da-shan Shiu},
booktitle={ICML 2025 Workshop on Machine Learning for Wireless Communication and Networks (ML4Wireless)},
year={2025},
url={https://openreview.net/forum?id=VZzF53BH0h}
}
Create a new conda environment using the environment.yml file provided in the repository:
    conda env create -f environment.yml
Activate the newly created conda environment:
    conda activate venv
To generate data, run the following:
    python generate_data/generate.py <path_to_config_directory> <output_folder>
Available options:
- overwrite: Overwrite existing saved data. If not specified, configs for which data already exists in the output folder are skipped.
- process: Number of concurrent processes to use for multiprocessing. Using too many processes reduces throughput.
- files: Exact names of the config files to run. Example:
    --files 0.yaml 1.yaml
  runs generation using only the two specified files.
- batch: Batch size. Default is 10.
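The skip behaviour of the overwrite and files options can be pictured with the minimal sketch below. It is not the code in generate.py: the function name `select_configs` is hypothetical, and it assumes that the output for a config such as 0.yaml is saved in the output folder under an entry with the same stem (for example 0.npz), which may differ from the actual layout.

```python
from pathlib import Path


def select_configs(config_dir, output_dir, overwrite=False, files=None):
    """Return the config paths that still need to be processed (hypothetical helper)."""
    config_dir, output_dir = Path(config_dir), Path(output_dir)

    # Restrict to the named files if given, otherwise take every YAML config.
    if files:
        candidates = [config_dir / name for name in files]
    else:
        candidates = sorted(config_dir.glob("*.yaml"))

    # With overwrite enabled nothing is skipped.
    if overwrite:
        return candidates

    # Assumption: existing output for "0.yaml" is stored under the stem "0".
    done = {p.stem for p in output_dir.iterdir()} if output_dir.exists() else set()
    return [c for c in candidates if c.stem not in done]
```

Calling `select_configs("configs", "out", files=["0.yaml", "1.yaml"])` would return only those of the two named configs that have no matching entry in `out`.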
To generate config files, run:
    python generate_config_files.py <path_to_config_directory> -n 10
This generates 10 random configurations in the directory provided. If n is greater than the number of possible combinations, all possible configurations are generated.
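The capping rule for n can be illustrated with a short sketch. It is only an assumption about how such sampling might work, not the logic of generate_config_files.py: the function `sample_configs` and the parameter names in the usage note are hypothetical.

```python
import itertools
import random


def sample_configs(grid, n, seed=None):
    """Pick n distinct parameter combinations, or all of them if n is too large."""
    keys = sorted(grid)
    # Enumerate the full Cartesian product of the parameter values.
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]
    # Requesting more than exist returns every combination.
    if n >= len(combos):
        return combos
    # Otherwise draw n combinations without replacement.
    return random.Random(seed).sample(combos, n)
```

For example, with a hypothetical grid `{"modulation": ["qpsk", "16qam"], "snr_db": [0, 10, 20]}` there are only 6 combinations, so `sample_configs(grid, 10)` returns all 6 rather than 10.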