Pandas raises a FutureWarning when a synthesizer times out #454

@R-Palazzo

Environment Details

  • SDGym version: 0.10.0

Error Description

When a user sets a timeout value and a synthesizer exceeds that timeout, some columns in the result tables are filled with NaN. As a result, pandas raises a FutureWarning during DataFrame concatenation:

FutureWarning:

The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

The line where the warning is emitted is:

scores = pd.concat(scores, ignore_index=True)
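One possible workaround, following the warning's own advice to exclude the relevant entries before the concat, is sketched below. It is not the SDGym fix. The helper name `concat_without_all_na` and the column names in the example are hypothetical and chosen only for illustration. The sketch drops empty frames and all-NA columns from each frame, concatenates what remains, and then restores the dropped columns (as NaN) in their original order.

```python
import pandas as pd


def concat_without_all_na(frames):
    """Concatenate frames after excluding empty frames and all-NA columns.

    Hypothetical helper for illustration; not part of SDGym or pandas.
    """
    # Record the union of columns, in first-seen order, so that columns
    # dropped from individual frames can be restored afterwards.
    columns = []
    for frame in frames:
        for col in frame.columns:
            if col not in columns:
                columns.append(col)

    # Skip empty frames, then drop columns that contain only NA values.
    cleaned = [f.dropna(axis=1, how='all') for f in frames if not f.empty]
    if not cleaned:
        return pd.DataFrame(columns=columns)

    combined = pd.concat(cleaned, ignore_index=True)
    # Re-add any column that was dropped everywhere; it is filled with NaN.
    return combined.reindex(columns=columns)


# Illustrative per-synthesizer score frames (column names are assumptions).
finished = pd.DataFrame({'Synthesizer': ['A'], 'Quality_Score': [0.9]})
timed_out = pd.DataFrame({'Synthesizer': ['B'], 'Quality_Score': [float('nan')]})

combined = concat_without_all_na([finished, timed_out])
print(combined)
```

The timed-out row still appears in the combined table with NaN in the score column, but the all-NA column is no longer passed to `pd.concat` itself.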

Steps to reproduce

from sdgym.benchmark import benchmark_single_table

result = benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer'],
    custom_synthesizers=None,
    sdv_datasets=['child'],
    additional_datasets_folder=None,
    limit_dataset_size=True,
    compute_quality_score=True,
    compute_diagnostic_score=True,
    compute_privacy_score=True,
    sdmetrics=None,
    timeout=4,  # use a small timeout so the synthesizer times out
    show_progress=False,
    multi_processing_config=None,
    run_on_ec2=False,
)
result
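To check whether the warning actually fires during a run, the call can be wrapped so that warnings are recorded rather than printed. The following is a minimal sketch using only the standard `warnings` module. The helper `run_and_collect_future_warnings` and the stand-in `_fake_benchmark` function are hypothetical. In practice the callable would be `benchmark_single_table` with the arguments shown above. Only warnings emitted in the calling process are recorded.

```python
import warnings


def run_and_collect_future_warnings(func, *args, **kwargs):
    """Call func and return (result, list of recorded FutureWarnings).

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be shown once.
        warnings.simplefilter('always')
        result = func(*args, **kwargs)
    future = [w for w in caught if issubclass(w.category, FutureWarning)]
    return result, future


def _fake_benchmark():
    # Stand-in for the real benchmark call; it emits a warning with
    # similar wording so the helper can be demonstrated without SDGym.
    warnings.warn(
        'The behavior of DataFrame concatenation with empty or all-NA '
        'entries is deprecated.',
        FutureWarning,
    )
    return 'done'


value, future_warnings = run_and_collect_future_warnings(_fake_benchmark)
for w in future_warnings:
    print(w.category.__name__, w.message)
```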

Labels: bug (Something isn't working)
