Regression / Classification problem type identification improvement #3

@sanketsarang

Refers to file: https://github.com/blobcity/autoai/blob/main/blobcity/utils/ProblemType.py

target_length = len(np.unique(data))
if data.dtype in ['int', 'float'] and target_length <= 100:
    return dict({'type': 'Classification'})
else:
    return dict({'type': 'Regression'})

The above code relies on a fixed cutoff of 100 unique values, which is not a reliable way to differentiate between regression and classification.

Change the logic to compare the cardinality of the column (its number of unique values) against the length of the column. If the cardinality is greater than or equal to 50% of the length, treat the problem as Regression. If the cardinality is less than 50% of the length, treat it as Classification.
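
A minimal sketch of the proposed rule follows. It is not the code in ProblemType.py. The function name `infer_problem_type` and the `ratio_threshold` parameter are hypothetical, and the sketch assumes the target column can be converted to a NumPy array. It ignores dtype, since the rule above does not mention it.

```python
import numpy as np


def infer_problem_type(data, ratio_threshold=0.5):
    """Classify a target column by its ratio of unique values to length.

    Hypothetical helper illustrating the rule proposed in this issue;
    the name and the ratio_threshold parameter are assumptions.
    """
    values = np.asarray(data)
    if values.size == 0:
        raise ValueError("target column is empty")

    # Cardinality = number of distinct values in the column.
    cardinality = len(np.unique(values))

    # Cardinality >= 50% of the length -> Regression, otherwise Classification.
    if cardinality >= ratio_threshold * values.size:
        return {'type': 'Regression'}
    return {'type': 'Classification'}
```

For example, a column of eight binary labels has cardinality 2 against length 8 and would be treated as Classification, while a column of 50 distinct floats would be treated as Regression.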
