You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
OBSERVED TREND For Research Question 1: According to the data, there is an opioid crisis in America. Opioid related deaths increased by more than an order of magnitude in 17 years. (1999: 2.0 deaths per 100,000 population; 2016: 12.6 deaths per 100,000 population.) By comparison, there was a negligable increase in anaesthesia deaths during the same period. Males are almost twice as likely to be affected as females. The age range of 25 - 54 years is most likely to be affected by the crisis. White people are twice as likely to be affected as black people; indigenous peoples are also heavily affected, while asian and pacific islanders are least affected by the crisis. The crisis is distributed fairly evenly throughout the continental United States, although there is a band in the geographic center of the country that is relatively unaffected. Both coasts of the country are heavily affected. Some notable hot-spots include West Virginia, the New England region, Oklahoma, parts of New Mexico, Utah, and Nevada, and the Pacific Northwest. Southern Alaska is affected, while Hawaii is only moderately affected.
OBSERVED TREND For Research Question 2: CMS Medicare Part D Opioid Prescriber data reflects the change in opioid prescribing rates from 2013 to 2015 at the at the state level. At a high level, the majority of states see a decrease in opioid prescribing rates and an increase in extended-release rates over the three year period. Wyoming is the only state to see an increase in opioid prescribing (0.23) and extended release prescribing (0.08). There are a few states that saw a decrease in both opioid prescribing and extended release prescribing: District of Columbia (-0.21 and -0.04), New Mexico (- 0.04 and -0.09), Oregon ( -0.30 and -0.15), and Washington ( -0.07 and -0.05)
OBSERVED TREND For Research Question 3: Looking at 2010 census data of urban population percentage for each state and the KFF analysis of CDC opioid overdose deaths, there seems to be a significant increase in the change in opioid deaths the more rural a state is in that time period (p-value = 0.003). Opioid overdose deaths are used as a proxy for opioid usage in addition as a measure of the negative impact of opioid abuse.
It is a common assumption that crime and drugs are tightly correlated, as had been the case previously with heroin in the 1970s and crack in the 1980s. I test this association to see if it carries over to the current opioid epidemic. It seems that unlike the previous drug epidemics, the impact on crime seems to be negligible. Using the previous opioid death data and FBI uniform crime report data, changes from 1999-2014 in violent crime rates, robbery rates, and property crime rates each did not seem to have a significant impact on 2014 opioid death rates (p-values are 0.776, 0.513, 0.283 respectively).
OBSERVED TREND For Research Question 4: Total CMS Spending increased from 398,350 to 1,254,525 million (214% increase). In this same time period National Health Expenditures also increased from 1,277,700 to 3,337,248 million (160% increase). Even though all National Health Expenditure costs increased, the percentage of those costs that were CMS increased from 31 to 37%.
Developer caveat for Question 6 data set 6 (map.) Geoplotlib does not have a direct analog for matplotlib.savefig. Screenshots (cntl-p in running map window) of the generated maps are available in the Output folder named Q1_Plot6a, b, and c.png. See requirements.txt for geoplotlib dependencies.
Developer caveat for warnings generated by statsmodel.api; Warnings are not important for this usage, as the only part deprecated is for timestamping generated output.
Dependencies
importos# os library
importnumpyasnp# numpy library
importpandasaspd# pandas library
importmatplotlib.pyplotasplt# pyplot module from matplotlib library
importgeoplotlib# geoplotlib library
fromgeoplotlib.utilsimportBoundingBox# BoundingBox module from geoplotlib.utils
fromgeoplotlib.colorsimportColorMap# ColorMap module from geoplotlib.colors
importseabornassns# seaborn library
importjson# json library
importstatsmodels.apiassm# to use regressions
C:\Users\fcardwell\AppData\Local\Continuum\anaconda3\envs\PythonData\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
sns.set() # switches to seaborn default display
Research Question 1: Does the crisis exist? If so, what is it's magnitude? Who is affected? Where is it occuring?
filename='Q1DS1.csv'# 1st data file for Q1
csv_file=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds1_df=pd.read_csv(csv_file, index_col="Year") # reads data from fileq1ds1_df.head()
([<matplotlib.axis.XTick at 0xdc76048>,
<matplotlib.axis.XTick at 0x3c2fa20>,
<matplotlib.axis.XTick at 0xdc5ada0>,
<matplotlib.axis.XTick at 0xdcf1ef0>,
<matplotlib.axis.XTick at 0xdcfe630>,
<matplotlib.axis.XTick at 0xdcfeda0>,
<matplotlib.axis.XTick at 0xdd02550>,
<matplotlib.axis.XTick at 0xdd02cc0>,
<matplotlib.axis.XTick at 0xdd08470>,
<matplotlib.axis.XTick at 0xdd08be0>,
<matplotlib.axis.XTick at 0xdd0f390>,
<matplotlib.axis.XTick at 0xdd0fb00>,
<matplotlib.axis.XTick at 0xdd142b0>,
<matplotlib.axis.XTick at 0xdd14a20>,
<matplotlib.axis.XTick at 0xdd191d0>,
<matplotlib.axis.XTick at 0xdd19940>,
<matplotlib.axis.XTick at 0xdd1d0f0>,
<matplotlib.axis.XTick at 0xdd1d860>],
<a list of 18 Text xticklabel objects>)
axes=plt.gca()
c1min, c1max=axes.get_ylim() # gets y limits, sanity check
plt.savefig('./Output/Q1_Plot1.png', bbox_inches='tight') # saves plot to fileplt.show() # displays plot
Sanity Check - Look at Anaesthesia Deaths For Same Period
filename='Q1DS2.csv'# 2nd data file for Q1
csv_file=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds2_df=pd.read_csv(csv_file, index_col="Year") # reads data from file
year_min=q1ds2_df.index.min() # finds min and max datesyear_max=q1ds2_df.index.max()
plt.figure(figsize= (17,10)) # sets bar chart parametersplt.title('Anaesthesia Deaths By Year: %s to %s'% (year_min, year_max), fontdict= {'fontsize': 20})
plt.xlabel('Year', fontdict= {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict= {'fontsize': 16})
xvals=np.arange(len(q1ds2_df))
tick_locations= [value+0.4forvalueinxvals]
plt.bar(xvals, q1ds2_df['Death Rate'], color='b', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds2_df['Year Code'], rotation="horizontal")
axes=plt.gca()
axes.set_ylim([c1min,c1max]) # set y limits to match first chart
(0.0, 13.250775089523252)
plt.savefig('./Output/Q1_Plot2.png', bbox_inches='tight') # saves plot to fileplt.show() # displays plot
Who is affected?
filename='Q1DS3.csv'# 3rd data file for Q1csv_file=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read dataq1ds3_df=pd.read_csv(csv_file) # reads data from fileplt.figure(figsize= (12,10)) # sets bar chart parametersplt.title('Opioid Death Rate By Gender', fontdict= {'fontsize': 20})
plt.xlabel('Gender', fontdict= {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict= {'fontsize': 16})
xvals=np.arange(len(q1ds3_df))
tick_locations= [value+0.4forvalueinxvals]
plt.bar(xvals, q1ds3_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds3_df['Gender'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xdf09278>, <matplotlib.axis.XTick at 0xdeafc88>],
<a list of 2 Text xticklabel objects>)
plt.savefig('./Output/Q1_Plot3.png', bbox_inches='tight') # saves plot to fileplt.show() # displays plot
filename='Q1DS4.csv'# 4th data file for Q1csv_file=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read dataq1ds4_df=pd.read_csv(csv_file) # reads data from fileplt.figure(figsize= (12,10)) # sets bar chart parametersplt.title('Opioid Death Rate By Age Group', fontdict= {'fontsize': 20})
plt.xlabel('Ten-Year Age Groups', fontdict= {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict= {'fontsize': 16})
xvals=np.arange(len(q1ds4_df))
tick_locations= [value+0.4forvalueinxvals]
plt.bar(xvals, q1ds4_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds4_df['Ten-Year Age Groups'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xe882c50>,
<matplotlib.axis.XTick at 0xe882160>,
<matplotlib.axis.XTick at 0xe88d668>,
<matplotlib.axis.XTick at 0xe477f98>,
<matplotlib.axis.XTick at 0xe4806d8>,
<matplotlib.axis.XTick at 0xe480e48>,
<matplotlib.axis.XTick at 0xe4855f8>,
<matplotlib.axis.XTick at 0xe485d68>,
<matplotlib.axis.XTick at 0xe489518>,
<matplotlib.axis.XTick at 0xe489c88>,
<matplotlib.axis.XTick at 0xe48f438>],
<a list of 11 Text xticklabel objects>)
plt.savefig('./Output/Q1_Plot4.png', bbox_inches='tight') # saves plot to fileplt.show() # displays plot
filename='Q1DS5.csv'# 5th data file for Q1csv_file=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read dataq1ds5_df=pd.read_csv(csv_file) # 5th data file for Q1plt.figure(figsize= (12,10)) # sets bar chart parametersplt.title('Opioid Death Rate By Race', fontdict= {'fontsize': 20})
plt.xlabel('Race', fontdict= {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict= {'fontsize': 16})
xvals=np.arange(len(q1ds5_df))
tick_locations= [value+0.4forvalueinxvals]
plt.bar(xvals, q1ds5_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds5_df['Race'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xe669eb8>,
<matplotlib.axis.XTick at 0xe662f98>,
<matplotlib.axis.XTick at 0xe666ef0>,
<matplotlib.axis.XTick at 0xe697f28>],
<a list of 4 Text xticklabel objects>)
plt.savefig('./Output/Q1_Plot5.png', bbox_inches='tight') # saves plot to fileplt.show() # displays plot
Where is it occuring?
filename='Q1DS6.json'# deaths data file for Q1json_file_1=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read datafilename='gz_2010_us_050_00_20m.json'# map shape data file for Q1json_file_2=os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
# function finds the death rate for the selected county, and converts it to colordefget_color(properties):
key=str(int(properties['STATE'])) +properties['COUNTY']
ifkeyinQ1DS6:
returncmap.to_color(Q1DS6.get(key), 5, 'lin')
else:
return [0, 0, 0, 0]
Change in Medicare Part D Opioid Prescribing Rates from 2013 to 2015 by State
# Took CMS Medicare Part D Opioid Prescribing Geographic 2013 2015 Excel file and created CSV files for the three taps 1) state 2) county and 3) zip# Stored presciber-state file in a variableprescriber_state="Data Files/prescriber_state.csv"prescriber_state_df=pd.read_csv(prescriber_state)
prescriber_state_df.head()
#Selected the opioid prescribe rate columsn for years 2013-2015prescriber_state_rate_df=prescriber_state_df[["State_Name","State_Abbreviation","2013_Opioid_Prescribing_Rate","2013_Extended_Release_Opioid_Prescribing_Rate","2014_Opioid_Prescribing_Rate","2014_Extended_Release_Opioid_Prescribing_Rate","2015_Opioid_Prescribing_Rate","2015_Extended_Release_Opioid_Prescribing_ Rate","2013_2015_Change_in_Opioid_Prescribing_Rate","2013_2015_Change_in_Extended_Release_Opioid_Prescribing_Rate "]]
prescriber_state_df.fillna("National")
prescriber_state_df.head()
# Setting the positions and width for the barspos=list(range(len(prescriber_state_rate_df["State_Name"])))
width=0.25# Plotting the barsfig, ax=plt.subplots(figsize=(12,10))
# Create a bar with pre_score data,# in position pos,plt.bar(pos,
#using df['pre_score'] data,prescriber_state_rate_changeex_df["Opioid"],
# of widthwidth,
# with alpha 0.5alpha=0.5,
# with colorcolor='r',
# with label the first value in first_namelabel=prescriber_state_rate_changeex_df["Opioid"][0])
# Create a bar with mid_score data,# in position pos + some width buffer,plt.bar([p+widthforpinpos],
#using df['mid_score'] data,prescriber_state_rate_changeex_df["Extended Release"],
# of widthwidth,
# with alpha 0.5alpha=0.5,
# with colorcolor='b',
# with label the second value in first_namelabel=prescriber_state_rate_changeex_df["Extended Release"][1])
# Set the y axis labelax.set_ylabel('Opioid Prescriber Rate')
#plt.ylabel("Prescribing Rate")# Set the chart's titleax.set_title("Change in Opioid Prescriber Rate 2013-2015", loc='center')
#plt.title("CMS Medicare Part D Opioid Prescribing Rates 2013-2015 by Sate")# Set the position of the x ticksax.set_xticks([p+1.5*widthforpinpos])
# Set the labels for the x ticksax.set_xticklabels(prescriber_state_rate_df["State_Name"], rotation="vertical")
#plt.xlabel("State")# Setting the x-axis and y-axis limitsplt.xlim(min(pos)-width, max(pos)+width*4)
plt.ylim([-0.8, max( prescriber_state_rate_changeex_df["Opioid"] +prescriber_state_rate_changeex_df["Extended Release"])] )
# Adding the legend and showing the plotplt.legend(['Opioid Prescribing Rate', 'Extended Release Rate'], loc='upper right')
plt.grid()
plt.show() # displays plot
# scatterplot: Urban population % by state in year 2010 on change in opioid death rates from years 2000-2010 sns.lmplot(x='2010', y='Change in Death', data=OpDeathsDelta_df, size=8)
# set titles and labelsplt.title('Opioid Deaths and Urbanization (2000 - 2010)')
plt.xlabel('Urban population % by state in the Year 2010')
plt.ylabel('Change in Opioid Overdose Death Rate per 100,000')
plt.show()
# regression of Opioid Deaths on Urbanization (2000 - 2010)y=OpDeathsDelta_df['Change in Death']
x=OpDeathsDelta_df['2010']
x=sm.add_constant(x)
results=sm.OLS(y, x).fit()
results.summary()
# Scatterplot of change in violent crime and opioid deaths in 2014plt.figure(figsize=(8,8))
# scatter with regression line. sns.regplot(x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)'],\
y=crime_df['Violent Crime rate_change'])
#For some reason sns.lmplot doesn't work with data from two df's but sns.regplot does.# set title and labelsplt.title('Change in Violent Crime Rates (1999-2014) and Opioid Deaths by State (2014)')
plt.xlabel('2014 Opioid Overdose Death Rate (Age-Adjusted)')
plt.ylabel('Change in Violent Crime Rates (per 100,000)')
plt.show()
# regression of Change in Violent Crime Rates (1999-2014) and Opioid Deaths by State (2014)y=crime_df['Violent Crime rate_change']
x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)']
x=sm.add_constant(x)
results=sm.OLS(y, x).fit()
results.summary()
OLS Regression Results
Dep. Variable:
Violent Crime rate_change
R-squared:
0.002
Model:
OLS
Adj. R-squared:
-0.019
Method:
Least Squares
F-statistic:
0.08176
Date:
Mon, 08 Jan 2018
Prob (F-statistic):
0.776
Time:
23:05:32
Log-Likelihood:
-318.43
No. Observations:
51
AIC:
640.9
Df Residuals:
49
BIC:
644.7
Df Model:
1
Covariance Type:
nonrobust
coef
std err
t
P>|t|
[0.025
0.975]
const
-84.5308
37.541
-2.252
0.029
-159.971
-9.090
2014__Opioid Overdose Death Rate (Age-Adjusted)
-0.9056
3.167
-0.286
0.776
-7.270
5.459
Omnibus:
0.363
Durbin-Watson:
1.962
Prob(Omnibus):
0.834
Jarque-Bera (JB):
0.291
Skew:
-0.175
Prob(JB):
0.865
Kurtosis:
2.880
Cond. No.
25.1
# Scatterplot of change in robbery rates and opioid deaths in 2014plt.figure(figsize=(8,8))
sns.regplot(x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)'],\
y=crime_df['Robbery rate_change'])
# set title and labelsplt.title('Change in Robbery Rates (1999-2014) and Opioid Deaths by State (2014)')
plt.xlabel('2014 Opioid Overdose Death Rate (Age-Adjusted)')
plt.ylabel('Change in Robbery Rates (per 100,000)')
plt.show()
# regression of Change in Robbery Rates (1999-2014) and Opioid Deaths by State (2014)y=crime_df['Robbery rate_change']
x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)']
x=sm.add_constant(x)
results=sm.OLS(y, x).fit()
results.summary()
OLS Regression Results
Dep. Variable:
Robbery rate_change
R-squared:
0.009
Model:
OLS
Adj. R-squared:
-0.011
Method:
Least Squares
F-statistic:
0.4351
Date:
Mon, 08 Jan 2018
Prob (F-statistic):
0.513
Time:
23:05:34
Log-Likelihood:
-250.27
No. Observations:
51
AIC:
504.5
Df Residuals:
49
BIC:
508.4
Df Model:
1
Covariance Type:
nonrobust
coef
std err
t
P>|t|
[0.025
0.975]
const
-37.9212
9.864
-3.845
0.000
-57.743
-18.099
2014__Opioid Overdose Death Rate (Age-Adjusted)
0.5489
0.832
0.660
0.513
-1.123
2.221
Omnibus:
7.046
Durbin-Watson:
1.952
Prob(Omnibus):
0.030
Jarque-Bera (JB):
6.576
Skew:
-0.875
Prob(JB):
0.0373
Kurtosis:
3.178
Cond. No.
25.1
# Scatterplot of change in property crime rates and opioid deaths in 2014plt.figure(figsize=(8,8))
sns.regplot(x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)'],\
y=crime_df['Property crime rate_change'])
# set title and labelsplt.title('Change in Property Crime Rates (1999-2014) and Opioid Deaths by State (2014)')
plt.xlabel('2014 Opioid Overdose Death Rate (Age-Adjusted)')
plt.ylabel('Change in Property Crime Rates (per 100,000)')
plt.show()
# regression of Change in property crime rates and Opioid Deaths by State (2014)y=crime_df['Property crime rate_change']
x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)']
x=sm.add_constant(x)
results=sm.OLS(y, x).fit()
results.summary()
OLS Regression Results
Dep. Variable:
Property crime rate_change
R-squared:
0.024
Model:
OLS
Adj. R-squared:
0.004
Method:
Least Squares
F-statistic:
1.180
Date:
Mon, 08 Jan 2018
Prob (F-statistic):
0.283
Time:
23:05:35
Log-Likelihood:
-382.17
No. Observations:
51
AIC:
768.3
Df Residuals:
49
BIC:
772.2
Df Model:
1
Covariance Type:
nonrobust
coef
std err
t
P>|t|
[0.025
0.975]
const
-1248.6099
130.989
-9.532
0.000
-1511.842
-985.377
2014__Opioid Overdose Death Rate (Age-Adjusted)
12.0066
11.051
1.087
0.283
-10.200
34.214
Omnibus:
0.051
Durbin-Watson:
2.474
Prob(Omnibus):
0.975
Jarque-Bera (JB):
0.175
Skew:
-0.068
Prob(JB):
0.916
Kurtosis:
2.748
Cond. No.
25.1
Research Question 4
# read cleaned csv into dffilename='Q4DS1.csv'# 1st data file for Q4csv_file=os.path.join(".", "Data Files", "Question_4", filename) # creates path to read dataq4ds1_df=pd.read_csv(csv_file, index_col="Expenditure Amount (Millions)") # reads data from fileq4ds1_df.head()
# Extract series of spending data 1999 - 2016 to compare against opioid deathscms_spending=q4ds1_df.loc['Total CMS Programs (Medicaid, CHIP and Medicare)','1999':'2016']
nhe_spending=q4ds1_df.loc['Total National Health Expenditures','1999':'2016']
percentage=cms_spending/nhe_spending