Skip to content

Commit b05ce1a

Browse files
committed
diag
1 parent 5d01ccc commit b05ce1a

File tree

252 files changed

+14655
-604
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

252 files changed

+14655
-604
lines changed

.Rhistory

Lines changed: 510 additions & 510 deletions
Large diffs are not rendered by default.

_freeze/surveydata5/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/surveydata9/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
{
2+
"hash": "501e2d3f0dee34a35f32d4a1035e93ec",
3+
"result": {
4+
"engine": "knitr",
5+
"markdown": "## NHANES: Performance {.unnumbered}\n\nThis tutorial outlines how to evaluate the performance of a logistic regression model fitted to complex survey data using R, focusing on the NHANES dataset.\n\nWe will assess the model's predictive accuracy using a design-correct **Area Under the Curve (AUC)** and evaluate its fit with the **Archer-Lemeshow** test.\n\n------------------------------------------------------------------------\n\n## 1. Data Preparation ⚙️\n\nFirst, we need to load the necessary R packages and prepare the NHANES data. The code below loads data from the `Flegal2016.RData` file, recodes variables like age, race, smoking, and education, and creates the final survey design object (`analytic_design`) for our analysis.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# --- Load Libraries ---\nlibrary(survey)\nlibrary(dplyr)\n# install.packages(\"devtools\")\n# devtools::install_github(\"ehsanx/svyTable1\", \n# build_vignettes = TRUE, \n# dependencies = TRUE)\nlibrary(svyTable1)\nlibrary(ggplot2)\n\n# --- Load Data ---\nload(\"Data/surveydata/Flegal2016.RData\") \n\n# --- Create Analytic Sample (N=5,455) ---\ndat.analytic <- subset(dat.full, RIDAGEYR >= 20)\ndat.analytic <- subset(dat.analytic, !is.na(BMXBMI))\ndat.analytic <- subset(dat.analytic, is.na(RIDEXPRG) | RIDEXPRG != \"Yes, positive lab pregnancy test\")\n\n# --- Recode Analysis Variables ---\ndat.full <- dat.full %>%\n mutate(\n Age = cut(RIDAGEYR,\n breaks = c(20, 40, 60, Inf),\n right = FALSE,\n labels = c(\"20-39\", \"40-59\", \">=60\")),\n\n race = case_when(\n RIDRETH3 == \"Non-Hispanic White\" ~ \"Non-Hispanic white\",\n RIDRETH3 == \"Non-Hispanic Black\" ~ \"Non-Hispanic black\",\n RIDRETH3 == \"Non-Hispanic Asian\" ~ \"Non-Hispanic Asian\",\n RIDRETH3 %in% c(\"Mexican American\", \"Other Hispanic\") ~ \"Hispanic\",\n TRUE ~ \"Other\" # Consolidating \"Other Race\" for simplicity\n ),\n race = factor(race, levels = c(\"Non-Hispanic white\", \"Non-Hispanic black\", \"Non-Hispanic Asian\", \"Hispanic\", \"Other\")),\n\n obese = factor(ifelse(BMXBMI >= 30, \"Yes\", \"No\"), levels = c(\"No\", \"Yes\")),\n\n smoking = case_when(\n SMQ020 == \"No\" ~ \"Never smoker\",\n SMQ020 == \"Yes\" & SMQ040 == \"Not at all\" ~ \"Former smoker\",\n SMQ020 == \"Yes\" ~ \"Current smoker\",\n TRUE ~ NA_character_\n ),\n smoking = factor(smoking, levels = c(\"Never smoker\", \"Former smoker\", \"Current smoker\")),\n\n education = case_when(\n DMDEDUC2 %in% c(\"Less than 9th grade\", \"9-11th grade (Includes 12th grad)\") ~ \"<High school\",\n DMDEDUC2 == \"High school graduate/GED or equi\" ~ \"High school\",\n DMDEDUC2 %in% c(\"Some college or AA degree\", \"College graduate or above\") ~ \">High school\",\n TRUE ~ NA_character_\n ),\n education = factor(education, levels = c(\"High school\", \"<High school\", \">High school\"))\n )\n\n# --- Create Survey Design Object ---\noptions(survey.lonely.psu = \"adjust\")\nsvy.design.full <- svydesign(id = ~SDMVPSU,\n strata = ~SDMVSTRA,\n weights = ~WTINT2YR,\n nest = TRUE,\n data = dat.full)\nanalytic_design <- subset(svy.design.full, SEQN %in% dat.analytic$SEQN)\n```\n:::\n\n\n------------------------------------------------------------------------\n\n## 2. Fitting a Logistic Regression Model 📊\n\nNext, we fit a survey-weighted logistic regression model using `svyglm()`. Our goal is to predict obesity status based on **Age**, **race**, **smoking status**, and **education level**.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Fit the survey-weighted logistic regression model\nfit_obesity <- svyglm(\n I(obese == \"Yes\") ~ Age + race + smoking + education,\n design = analytic_design,\n family = quasibinomial()\n)\n```\n:::\n\n\n------------------------------------------------------------------------\n\n## 3. Model Performance: ROC Curve and AUC 📈\n\nWe calculate the **Area Under the ROC Curve (AUC)** to measure predictive performance. The `svyAUC()` function from `svyTable1` computes a design-corrected AUC and confidence interval, which requires a replicate-weights survey design.\n\n### Create Replicate Design & Refit Model\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# svyAUC() requires a replicate-weights design object\nrep_design <- as.svrepdesign(analytic_design)\n# Restrict the replicate design to complete cases\nvars_for_model <- c(\"obese\", \"Age\", \"race\", \"smoking\", \"education\")\nrep_design_complete <- subset(rep_design, \n complete.cases(rep_design$variables[, vars_for_model]))\n\n# Refit model on the complete-case replicate design\nfit_obesity_rep <- svyglm(\n I(obese == \"Yes\") ~ Age + race + smoking + education,\n design = rep_design_complete,\n family = quasibinomial()\n)\n```\n:::\n\n\n### Calculate AUC\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Calculate the design-correct AUC\nauc_results_list <- svyAUC(fit_obesity_rep, rep_design_complete, plot = TRUE)\n```\n\n::: {.cell-output-display}\n![](surveydata9c_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n\n```{.r .cell-code}\n\n# Display the AUC summary\nknitr::kable(auc_results_list$summary)\n```\n\n::: {.cell-output-display}\n\n\n| AUC| SE| CI_Lower| CI_Upper|\n|---------:|---------:|---------:|--------:|\n| 0.5967987| 0.0108604| 0.5755123| 0.618085|\n\n\n:::\n:::\n\n\n**Interpretation:** An AUC of **0.5967987** indicates poor discrimination. While better than simpler models, it suggests that the predictors do not have strong predictive power for obesity in this dataset.\n\n### Visualize the ROC Curve\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(auc_results_list$roc_data, aes(x = FPR, y = TPR)) +\n geom_line(color = \"blue\", linewidth = 1) +\n geom_abline(linetype = \"dashed\") +\n labs(\n title = \"Survey-Weighted ROC Curve\",\n subtitle = paste0(\"AUC = \", round(auc_results_list$summary$AUC, 3)),\n x = \"1 - Specificity (FPR)\",\n y = \"Sensitivity (TPR)\"\n ) +\n theme_minimal()\n```\n\n::: {.cell-output-display}\n![](surveydata9c_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n------------------------------------------------------------------------\n\n## 4. Goodness-of-Fit: Archer and Lemeshow Test ✅\n\nWe assess model fit using the **Archer-Lemeshow test**. A non-significant p-value (p \\> 0.05) suggests a good model fit.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Run the goodness-of-fit test\ngof_results <- svygof(fit_obesity, analytic_design)\n\n# Display the results\nknitr::kable(gof_results)\n```\n\n::: {.cell-output-display}\n\n\n| F_statistic| df1| df2| p_value|\n|-----------:|---:|---:|---------:|\n| 2.168827| 9| 6| 0.1791469|\n\n\n:::\n:::\n\n\n**Interpretation:** The p-value (**0.1791469**) is greater than 0.05, indicating no evidence of a poor fit. The model’s predictions align well with observed outcomes, suggesting good calibration.\n\n------------------------------------------------------------------------\n\n## Summary\n\n- The AUC shows poor discriminatory power, meaning the model does not strongly distinguish between obese and non-obese individuals.\n- The Archer-Lemeshow test indicates good model fit (p \\> 0.05).\n\n**Conclusion:** While well-calibrated, the model’s low predictive power implies other factors (e.g., cholesterol, blood pressure, diet) may better explain obesity risk.\n",
6+
"supporting": [],
7+
"filters": [
8+
"rmarkdown/pagebreak.lua"
9+
],
10+
"includes": {
11+
"include-in-header": [
12+
"<link href=\"site_libs/pagedtable-1.1/css/pagedtable.css\" rel=\"stylesheet\" />\n<script src=\"site_libs/pagedtable-1.1/js/pagedtable.js\"></script>\n"
13+
]
14+
},
15+
"engineDependencies": {},
16+
"preserve": {},
17+
"postProcess": true
18+
}
19+
}
16.7 KB
Loading
24.9 KB
Loading

_freeze/surveydata9c/libs/bootstrap/bootstrap-bb462d781dde1847d9e3ccf7736099dd.min.css

Lines changed: 12 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)