Open
Labels: bug, under discussion
Description
Environment Details
Please indicate the following details about the environment in which you found the bug:
- CTGAN version: 1.25.0 (sdv)
- Python version: 3.12.8
- Operating System:
Error Description
When fitting the "automobile" dataset from the UCI Machine Learning Repository, the 'city-mpg' column, which is continuous, seems to be interpreted as a location, and the synthetic data contains a column of strings in its place. This may be caused by the column name: if I rename the column to 'mpg', a column of the correct datatype is returned.

Steps to reproduce
import pandas as pd
from sdv.single_table import CTGANSynthesizer
from sdv.metadata import Metadata
from ucimlrepo import fetch_ucirepo
# fetch dataset
automobile = fetch_ucirepo(id=10)
# data (as pandas dataframes)
X = automobile.data.features
metadata = Metadata.detect_from_dataframe(X)
synthesizer = CTGANSynthesizer(metadata)
synthesizer.fit(X)
synthetic_data = synthesizer.sample(num_rows=1000)
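To confirm which columns are affected, the real and synthetic frames can be compared by dtype. This is a minimal sketch using only pandas; the helper name `find_dtype_mismatches` and the toy frames (including the made-up place names) are hypothetical and stand in for `X` and `synthetic_data` from the steps above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_dtype_mismatches(real, synthetic):
    """Return columns that are numeric in `real` but not numeric in `synthetic`."""
    mismatches = {}
    for column in real.columns:
        if column not in synthetic.columns:
            continue
        if is_numeric_dtype(real[column]) and not is_numeric_dtype(synthetic[column]):
            # Record both dtypes so the report shows what changed
            mismatches[column] = (str(real[column].dtype), str(synthetic[column].dtype))
    return mismatches


# Toy frames mimicking the reported symptom (values are made up)
real = pd.DataFrame({
    'city-mpg': [21, 19, 24],
    'horsepower': [111.0, 154.0, 102.0],
})
synthetic = pd.DataFrame({
    'city-mpg': ['Lake Anna', 'Port Joe', 'New Amy'],
    'horsepower': [120.5, 98.0, 140.2],
})
print(find_dtype_mismatches(real, synthetic))
```

With the actual data, calling `find_dtype_mismatches(X, synthetic_data)` would list every numeric column that came back as a non-numeric type, which helps check whether columns other than 'city-mpg' are affected.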