[BUG] Crash occurs during type inference on empty DataFrame #86

@rudraditya21

Description

When running infer_types on an empty DataFrame, the logic in type_infer/rule_based/core.py (lines 33–94) fails because population_size is 0. The logging statement at line 41 performs a division by population_size, causing a ZeroDivisionError.

Even if that is guarded, the subsequent identifier pass still breaks: get_identifier_description is called with an empty column and immediately accesses data[0], which raises an IndexError on empty input.
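One possible way to guard both failure points is sketched below. The helper names sample_percentage and first_or_none are hypothetical and do not exist in type_infer; they only illustrate the shape of the checks, and the actual fix in core.py and get_identifier_description may need to look different.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_percentage(sample_size: int, population_size: int) -> float:
    """Share of the population covered by the sample, in percent.

    Hypothetical helper: returns 0.0 for an empty population
    instead of dividing by zero.
    """
    if population_size == 0:
        return 0.0
    return round(sample_size * 100 / population_size, 1)


def first_or_none(data: Sequence[T]) -> Optional[T]:
    """Hypothetical helper: return data[0], or None for an empty column."""
    return data[0] if len(data) > 0 else None
```

With guards like these, the log message can still be emitted for an empty input, and the identifier pass can skip a column when there is no first value to inspect.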

Steps To Reproduce

import pandas as pd
from type_infer.api import infer_types

df = pd.DataFrame()
print(infer_types(df))

Output:

INFO:type_infer-21891:Analyzing a sample of 0
Traceback (most recent call last):
  File "/Users/apple/Desktop/type_infer/./issue_test/main.py", line 5, in <module>
    print(infer_types(df))
          ^^^^^^^^^^^^^^^
  File "/Users/apple/Desktop/type_infer/type_infer/api.py", line 38, in infer_types
    return engine.infer(data)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/apple/Desktop/type_infer/type_infer/rule_based/core.py", line 41, in infer
    f'from a total population of {population_size}, this is equivalent to {round(sample_size * 100 / population_size, 1)}% of your data.')  # noqa
                                                                                 ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero

Expected Output:

Empty inputs should be handled gracefully. infer_types should either return an empty (or explicitly invalid) TypeInformation, or raise a ValueError explaining that type_infer cannot run on an empty DataFrame.
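As a rough sketch of the ValueError option, an up-front check could look like the following. require_non_empty is a hypothetical name, not part of type_infer; the only thing it relies on is the pandas DataFrame.empty property, which is True when the frame has no rows or no columns.

```python
def require_non_empty(df):
    """Hypothetical guard: reject empty DataFrames before inference.

    Uses only the DataFrame.empty property, so it raises for
    pd.DataFrame() as in the reproduction above.
    """
    if df.empty:
        raise ValueError("type_infer cannot run on an empty DataFrame")
    return df
```

A caller could then write infer_types(require_non_empty(df)), or the same check could be placed at the top of infer_types itself so that the error is raised before any logging or sampling happens.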
