-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
Description
Code Sample
from pandas.io.json import json_normalize
data = [{'state': 'Florida',
'shortname': 'FL',
'info': {
'governor': 'Rick Scott'
},
'counties': [{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]},
{'state': 'Ohio',
'shortname': 'OH',
'info': {
'governor': 'John Kasich'
},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}]
print(json_normalize(data, ['counties', 'name'], ['state', 'shortname', ['info', 'governor']], errors='ignore'))
Returns
0 state shortname info.governor
0 D Florida FL NaN
1 a Florida FL NaN
2 d Florida FL NaN
3 e Florida FL NaN
4 B Florida FL NaN
5 r Florida FL NaN
6 o Florida FL NaN
7 w Florida FL NaN
8 a Florida FL NaN
9 r Florida FL NaN
10 d Florida FL NaN
11 P Florida FL NaN
12 a Florida FL NaN
13 l Florida FL NaN
14 m Florida FL NaN
15 Florida FL NaN
16 B Florida FL NaN
17 e Florida FL NaN
18 a Florida FL NaN
Problem description
Running json_normalize on a nested record_path with a nested meta argument gives an error that says it cannot find info.governer.
This is inconsistent because running it again, without the nested meta argument, successfully returns the data:
# using the same data from before
print(json_normalize(data, ['counties', 'name'], ['state', 'shortname', 'info'], errors='ignore'))
0 state shortname info
0 D Florida FL {'governor': 'Rick Scott'}
1 a Florida FL {'governor': 'Rick Scott'}
2 d Florida FL {'governor': 'Rick Scott'}
3 e Florida FL {'governor': 'Rick Scott'}
4 B Florida FL {'governor': 'Rick Scott'}
5 r Florida FL {'governor': 'Rick Scott'}
6 o Florida FL {'governor': 'Rick Scott'}
7 w Florida FL {'governor': 'Rick Scott'}
Similarly, using a non-nested record path also works (in fact, this is the exact sample example that can be found in the json_normalize pandas documentation).
# using the same data from before
print(json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']]))
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
The result is that it is much more difficult to access nested meta data.
May be related to: #21537
Expected Output
0 state shortname info.governor
0 D Florida FL Rick Scott
1 a Florida FL Rick Scott
2 d Florida FL Rick Scott
3 e Florida FL Rick Scott
4 B Florida FL Rick Scott
5 r Florida FL Rick Scott
6 o Florida FL Rick Scott
7 w Florida FL Rick Scott
8 a Florida FL Rick Scott
9 r Florida FL Rick Scott
10 d Florida FL Rick Scott
11 P Florida FL Rick Scott
12 a Florida FL Rick Scott
13 l Florida FL Rick Scott
14 m Florida FL Rick Scott
15 Florida FL Rick Scott
16 B Florida FL Rick Scott
17 e Florida FL Rick Scott
18 a Florida FL Rick Scott
Note that ['counties', 'name']
is an arbitrary list of strings to use as a record path, and that this example is contrived (who really needs a table comprised of each letter of a string?). However, many real scenarios can be constructed that require this sort of nested record_path extraction along with nested meta path extraction.
Output of pd.show_versions()
[paste the output of pd.show_versions()
here below this line]
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.125-linuxkit
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.15.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: None
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None