Skip to content

json_normalize does not handle nested meta paths when also using a nested record_pathΒ #27220

@connormcmk

Description

@connormcmk

Code Sample

from pandas.io.json import json_normalize

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {
              'governor': 'Rick Scott'
         },
         'counties': [{'name': 'Dade', 'population': 12345},
                     {'name': 'Broward', 'population': 40000},
                     {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {
              'governor': 'John Kasich'
         },
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

print(json_normalize(data, ['counties', 'name'], ['state', 'shortname', ['info', 'governor']], errors='ignore'))

Returns

    0    state shortname  info.governor
0   D  Florida        FL            NaN
1   a  Florida        FL            NaN
2   d  Florida        FL            NaN
3   e  Florida        FL            NaN
4   B  Florida        FL            NaN
5   r  Florida        FL            NaN
6   o  Florida        FL            NaN
7   w  Florida        FL            NaN
8   a  Florida        FL            NaN
9   r  Florida        FL            NaN
10  d  Florida        FL            NaN
11  P  Florida        FL            NaN
12  a  Florida        FL            NaN
13  l  Florida        FL            NaN
14  m  Florida        FL            NaN
15     Florida        FL            NaN
16  B  Florida        FL            NaN
17  e  Florida        FL            NaN
18  a  Florida        FL            NaN

Problem description

Running json_normalize on a nested record_path with a nested meta argument gives an error that says it cannot find info.governer.

This is inconsistent because running it again, without the nested meta argument, successfully returns the data:

# using the same data from before
print(json_normalize(data, ['counties', 'name'], ['state', 'shortname', 'info'], errors='ignore'))
    0    state shortname                         info
0   D  Florida        FL   {'governor': 'Rick Scott'}
1   a  Florida        FL   {'governor': 'Rick Scott'}
2   d  Florida        FL   {'governor': 'Rick Scott'}
3   e  Florida        FL   {'governor': 'Rick Scott'}
4   B  Florida        FL   {'governor': 'Rick Scott'}
5   r  Florida        FL   {'governor': 'Rick Scott'}
6   o  Florida        FL   {'governor': 'Rick Scott'}
7   w  Florida        FL   {'governor': 'Rick Scott'}

Similarly, using a non-nested record path also works (in fact, this is the exact sample example that can be found in the json_normalize pandas documentation).

# using the same data from before
print(json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']]))
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

The result is that it is much more difficult to access nested meta data.

May be related to: #21537

Expected Output

    0    state shortname  info.governor
0   D  Florida        FL            Rick Scott
1   a  Florida        FL            Rick Scott
2   d  Florida        FL            Rick Scott
3   e  Florida        FL            Rick Scott
4   B  Florida        FL            Rick Scott
5   r  Florida        FL            Rick Scott
6   o  Florida        FL            Rick Scott
7   w  Florida        FL            Rick Scott
8   a  Florida        FL            Rick Scott
9   r  Florida        FL            Rick Scott
10  d  Florida        FL            Rick Scott
11  P  Florida        FL            Rick Scott
12  a  Florida        FL            Rick Scott
13  l  Florida        FL            Rick Scott
14  m  Florida        FL            Rick Scott
15     Florida        FL            Rick Scott
16  B  Florida        FL            Rick Scott
17  e  Florida        FL            Rick Scott
18  a  Florida        FL            Rick Scott

Note that ['counties', 'name'] is an arbitrary list of strings to use as a record path, and that this example is contrived (who really needs a table comprised of each letter of a string?). However, many real scenarios can be constructed that require this sort of nested record_path extraction along with nested meta path extraction.

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.125-linuxkit
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.15.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: None
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    IO JSONread_json, to_json, json_normalizeNeeds TestsUnit test(s) needed to prevent regressionsgood first issue

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions