Open
Description
When running infer_types on an empty DataFrame, the logic in type_infer/rule_based/core.py (lines 33–94) fails because population_size is 0. The logging statement at line 41 performs a division by population_size, causing a ZeroDivisionError.
Even if that is guarded, the subsequent identifier pass still breaks: get_identifier_description is called with an empty column and immediately accesses data[0], which raises an IndexError on empty input.
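The two guards described above could look roughly like the following. This is only a sketch: `sample_percentage` and `first_value_or_none` are hypothetical helper names used for illustration, not functions from type_infer, and the real code in core.py may be organized differently.

```python
from typing import Any, Optional, Sequence


def sample_percentage(sample_size: int, population_size: int) -> float:
    """Return the sample size as a percentage of the population.

    Hypothetical helper: returns 0.0 for an empty population instead of
    dividing by zero.
    """
    if population_size == 0:
        return 0.0
    return round(sample_size * 100 / population_size, 1)


def first_value_or_none(data: Sequence[Any]) -> Optional[Any]:
    """Return the first element of a column, or None if the column is empty.

    Hypothetical helper: stands in for the unguarded data[0] access.
    """
    if len(data) == 0:
        return None
    return data[0]
```

With guards like these, the log message can still be emitted for an empty input, and the identifier pass can skip columns that have no values to inspect.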
Steps To Reproduce
import pandas as pd
from type_infer.api import infer_types
df = pd.DataFrame()
print(infer_types(df))
Output:
INFO:type_infer-21891:Analyzing a sample of 0
Traceback (most recent call last):
  File "/Users/apple/Desktop/type_infer/./issue_test/main.py", line 5, in <module>
    print(infer_types(df))
          ^^^^^^^^^^^^^^^
  File "/Users/apple/Desktop/type_infer/type_infer/api.py", line 38, in infer_types
    return engine.infer(data)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/apple/Desktop/type_infer/type_infer/rule_based/core.py", line 41, in infer
    f'from a total population of {population_size}, this is equivalent to {round(sample_size * 100 / population_size, 1)}% of your data.') # noqa
                                                                                 ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
Expected Output:
Empty inputs should be handled gracefully. The function should either return an empty (or explicitly invalid) TypeInformation, or raise a ValueError explaining that type_infer cannot run on an empty DataFrame.
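For the ValueError option, an early check before any sampling or logging would be enough. The sketch below assumes a hypothetical helper named `ensure_non_empty` that takes the row and column counts; it is not part of type_infer, and where exactly such a check belongs (api.py or core.py) is left open.

```python
def ensure_non_empty(n_rows: int, n_columns: int) -> None:
    """Raise ValueError if the input has no rows or no columns.

    Hypothetical helper: meant to run before sampling, so that the
    percentage calculation and the identifier pass never see empty data.
    """
    if n_rows == 0 or n_columns == 0:
        raise ValueError(
            "type_infer cannot run on an empty DataFrame "
            f"(got {n_rows} rows and {n_columns} columns)."
        )
```

With pandas, a caller could pass the shape directly, for example `ensure_non_empty(*df.shape)`; for `pd.DataFrame()` the shape is `(0, 0)`, so the check raises.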