Description
Environment Details
- SDGym version: 0.9.1 (latest)
- Operating System: Linux, environment is a Colab Notebook
Error Description
In a Colab Notebook environment, I am unable to get results for custom synthesizers if I supply a timeout value. The synthesizer shows up in the results DataFrame, but all the associated values for it are NaN (even the ones for dataset size, initialization, etc.). All of the other, pre-defined synthesizers produce values.
This problem goes away if I remove the timeout value or run the script on my local machine instead. So it is the combination of the following that causes the issue:
- (a) running on a Colab notebook (or likely any interactive environment), and
- (b) adding a custom synthesizer to the benchmark, and
- (c) adding a timeout
Steps to reproduce
The code below creates a custom synthesizer that is just a variant of GaussianCopula (setting marginals to uniform). Then it tries to run the benchmark for it.
import sdgym
from sdv.metadata import Metadata
from sdv.single_table import GaussianCopulaSynthesizer
from sdgym import create_single_table_synthesizer


def get_trained_synthesizer(data, metadata):
    # Build a GaussianCopula variant whose marginals default to uniform
    metadata_obj = Metadata.load_from_dict(metadata)
    synthesizer = GaussianCopulaSynthesizer(metadata_obj, default_distribution='uniform')
    synthesizer.fit(data)
    return synthesizer


def sample_from_synthesizer(synthesizer, n_rows):
    return synthesizer.sample(n_rows)


GCUniformSynthesizer = create_single_table_synthesizer(
    get_trained_synthesizer_fn=get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='GCUniform'
)

results = sdgym.benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer'],
    custom_synthesizers=[GCUniformSynthesizer],
    sdv_datasets=['KRK_v1'],
    limit_dataset_size=False,
    timeout=20*60,  # 20 min
    output_filepath='results.csv',
    detailed_results_folder='/content/results',
    sdmetrics=[]
)

This script works as expected when run from a terminal. But if I run it in a Colab Notebook, the values for the custom synthesizer are all NaN.
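To check which rows are affected without eyeballing the DataFrame, something like the sketch below may help. It is not part of SDGym: `find_empty_result_rows` and `is_missing` are hypothetical helpers, and the column names `Synthesizer` and `Dataset` are assumptions about the layout of `results.csv`, so adjust them to match the actual file. The sample CSV text is made up purely for illustration.

```python
import csv
import io
import math

# Assumed identifier columns in the results file; adjust if yours differ.
ID_COLUMNS = ("Synthesizer", "Dataset")


def is_missing(value):
    """Return True for an empty cell or a cell that parses as NaN."""
    if value is None:
        return True
    text = str(value).strip()
    if text == "":
        return True
    try:
        return math.isnan(float(text))
    except ValueError:
        # Non-numeric text (e.g. an error message) counts as present.
        return False


def find_empty_result_rows(csv_text, id_columns=ID_COLUMNS):
    """List the identifier tuples of rows whose other columns are all missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    empty_rows = []
    for row in reader:
        values = [v for k, v in row.items() if k not in id_columns]
        if values and all(is_missing(v) for v in values):
            empty_rows.append(tuple(row.get(c) for c in id_columns))
    return empty_rows


# Made-up sample mimicking the symptom: the custom synthesizer row is blank.
sample = (
    "Synthesizer,Dataset,Dataset_Size_MB,Train_Time\n"
    "GaussianCopulaSynthesizer,KRK_v1,0.07,1.2\n"
    "GCUniform,KRK_v1,,\n"
)
print(find_empty_result_rows(sample))
```

With a real run, the text could be read from the file passed as `output_filepath`, for example `find_empty_result_rows(open('results.csv').read())`.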

Additional Context
According to @frances-h: We ran into a similar issue when working on this PR.