
@dongwonmoon dongwonmoon commented Nov 7, 2025

@dongwonmoon dongwonmoon force-pushed the bug/json-normalize-int-key branch from f24285a to 84233a2 Compare November 7, 2025 16:20
@dongwonmoon
Author

Hi reviewers,

P.S. While working on this fix, I noticed that the type hint for the meta parameter is currently specified as str | list[str | list[str]].

My PR ensures that non-string keys (like the int key in our test case) are handled consistently, whether record_path is specified or not. This aligns with the existing behavior when record_path=None, which already supports non-string keys (mirroring pd.DataFrame's ability to have non-string column names).

I've kept this PR scoped strictly to fixing the TypeError and ensuring consistent behavior.

Would a separate, follow-up issue or PR to discuss updating the type hint (perhaps to something like Hashable or Any) to match this behavior be welcome? Just wanted to bring it to your attention.

Thanks!
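To make the type-hint question concrete, here is a minimal sketch of what a widened alias could look like. The names `MetaCurrent`, `MetaProposed`, and `is_hashable_key` are hypothetical and exist only for illustration; they are not part of pandas, and this is not a proposal for the actual annotation.

```python
from collections.abc import Hashable
from typing import Union

# The annotation quoted above, written as an alias (hypothetical name).
MetaCurrent = Union[str, list[Union[str, list[str]]]]

# A hypothetical widened alias that would also admit int keys.
MetaProposed = Union[Hashable, list[Union[Hashable, list[Hashable]]]]


def is_hashable_key(key: object) -> bool:
    """Return True if `key` could serve as a dict key or column label."""
    return isinstance(key, Hashable)


print(is_hashable_key(12))     # an int key, as in the test case
print(is_hashable_key("a"))    # an ordinary string key
print(is_hashable_key(["a"]))  # a list is a path of keys, not a key itself
```

Widening to `Hashable` rather than `Any` would keep list paths distinguishable from single keys, since lists are not hashable.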

- Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
- Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
- Bug in :func:`pandas.json_normalize` raising ``TypeError`` when ``meta`` contained a non-string key (e.g., ``int``) and ``record_path`` was specified, which was inconsistent with the behavior when ``record_path`` was ``None`` (:issue:`63019`)
Member

The docs state meta should be a string or list of strings.

https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

Why are we supporting non-strings?

Author

When record_path is None: pd.json_normalize([{'a': 1, 12: 'val'}], meta=[12]) already works perfectly. It correctly creates an integer column named 12, which is very useful and consistent with pd.DataFrame itself supporting non-string column names.

When record_path is set: As this issue shows, the exact same call (meta=[12]) suddenly fails with a TypeError simply because record_path was added.

This felt like a clear bug. My PR doesn't introduce new support for non-strings; it just fixes the TypeError so the function behaves consistently with itself, whether record_path is used or not.

It seemed better to fix this inconsistency in the record_path code path than to introduce a breaking change that removes the existing, undocumented support from the record_path=None code path (e.g., by raising a TypeError there).
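One plausible source of the TypeError is sketched below. It rests on an assumption that is not stated in this thread: that the record_path code path builds each meta column name by joining the key's path components with `sep`. Both helper functions are hypothetical, and the tolerant variant is only an illustration, not the change made in this PR.

```python
def build_meta_keys(meta, sep="."):
    """Hypothetical helper: join each meta path into a column name."""
    paths = [m if isinstance(m, list) else [m] for m in meta]
    # str.join only accepts strings, so an int component raises TypeError.
    return [sep.join(path) for path in paths]


def build_meta_keys_tolerant(meta, sep="."):
    """Hypothetical variant that keeps a single non-string key unchanged."""
    paths = [m if isinstance(m, list) else [m] for m in meta]
    keys = []
    for path in paths:
        if len(path) == 1:
            # Single key: keep the original label (e.g. the int 12).
            keys.append(path[0])
        else:
            # Nested path: stringify components so they can be joined.
            keys.append(sep.join(str(part) for part in path))
    return keys


print(build_meta_keys(["a", ["x", "y"]]))

try:
    build_meta_keys([12, "a"])
except TypeError as exc:
    print("TypeError:", exc)

print(build_meta_keys_tolerant([12, "a", ["x", 3]]))
```

If the assumption holds, the failure is a side effect of string joining rather than deliberate validation of `meta`.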

Member

@rhshadrach rhshadrach Nov 18, 2025

I'm negative on expanding the scope of json_normalize to handle invalid JSON data.

> My PR doesn't introduce new support for non-strings; it just fixes the TypeError so the function behaves consistently with itself, whether record_path is used or not.

This is supporting non-strings.
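For context on the "invalid JSON" point: JSON object keys are always strings, so a dict with an int key cannot come out of a JSON parser unchanged. A minimal sketch with Python's standard `json` module (the `record` dict is an illustrative example, not data from this PR):

```python
import json

# A Python dict may have an int key, but JSON objects may not.
record = {"a": 1, 12: "val"}

# json.dumps coerces the int key to a string when encoding.
encoded = json.dumps(record)
print(encoded)

# Decoding therefore yields the string key "12", never the int 12.
decoded = json.loads(encoded)
print(list(decoded))
print(12 in decoded, "12" in decoded)
```

An int key therefore only reaches `json_normalize` when the input was built as a Python object rather than parsed from JSON text.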

Author

@dongwonmoon dongwonmoon Nov 18, 2025

You are correct that the docs specify str, but the function's actual behavior already differs from the docs.
When record_path=None, json_normalize already works perfectly with non-string keys (like int). This is the existing behavior my PR is based on.
My PR is just a bug fix to make the function behave consistently, fixing the TypeError that only happens when record_path is added.

Author

@dongwonmoon dongwonmoon Nov 18, 2025

import pandas as pd

data = [
    {
        "a": 1,
        12: "meta_value_1",  # int key
        "nested": [{"b": 2, "c": 3}],
    },
    {
        "a": 6,
        12: "meta_value_2",
        "nested": [{"b": 7, "c": 8}],
    },
]

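# record_path=None: normalize the top-level records directly.
df = pd.json_normalize(
    data,
    record_path=None,
    meta=[12, "a"],
)

print(df)
# Inspect the label type of the second column (the key 12 in the input).
print(type(df.columns[1]))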
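For comparison, a sketch of the same data with record_path set, which is the case the linked issue reports as failing. Whether the call raises depends on the installed pandas version, so the helper below (a hypothetical name, `normalize_with_record_path`) reports the outcome rather than assuming one.

```python
import pandas as pd

data = [
    {"a": 1, 12: "meta_value_1", "nested": [{"b": 2, "c": 3}]},
    {"a": 6, 12: "meta_value_2", "nested": [{"b": 7, "c": 8}]},
]


def normalize_with_record_path(records):
    """Try the record_path variant and report what happened."""
    try:
        df = pd.json_normalize(records, record_path="nested", meta=[12, "a"])
    except TypeError as exc:
        # Behavior described in the issue for the int meta key.
        return "TypeError", str(exc)
    # Behavior expected once int meta keys are accepted on this path.
    return "ok", list(df.columns)


outcome, detail = normalize_with_record_path(data)
print(outcome, detail)
```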

Development

Successfully merging this pull request may close these issues.

BUG: pd.json_normalize has inconsistent validation for non-string keys - accepts integers in some parameter combinations but rejects in others
