Commit 0ae2482

committed
refsvy
1 parent 5f505e9 commit 0ae2482

5 files changed

+104
-15
lines changed

_freeze/surveydata9/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/search.json

Lines changed: 14 additions & 3 deletions
Large diffs are not rendered by default.

docs/surveydata9.html

Lines changed: 43 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -52,7 +52,27 @@
5252
@media screen {
5353
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
5454
}
55-
</style>
55+
/* CSS for citations */
56+
div.csl-bib-body { }
57+
div.csl-entry {
58+
clear: both;
59+
margin-bottom: 0em;
60+
}
61+
.hanging-indent div.csl-entry {
62+
margin-left:2em;
63+
text-indent:-2em;
64+
}
65+
div.csl-left-margin {
66+
min-width:2em;
67+
float:left;
68+
}
69+
div.csl-right-inline {
70+
margin-left:2em;
71+
padding-left:1em;
72+
}
73+
div.csl-indent {
74+
margin-left: 2em;
75+
}</style>
5676

5777
<script src="site_libs/quarto-nav/quarto-nav.js"></script>
5878
<script src="site_libs/quarto-nav/headroom.min.js"></script>
@@ -1325,6 +1345,7 @@
13251345
<li><a href="#differences-from-the-original-paper" id="toc-differences-from-the-original-paper" class="nav-link" data-scroll-target="#differences-from-the-original-paper">Differences from the Original Paper</a></li>
13261346
</ul>
13271347
</li>
1348+
<li><a href="#references" id="toc-references" class="nav-link" data-scroll-target="#references">References</a></li>
13281349
</ul></nav>
13291350
</div>
13301351
<!-- main -->
@@ -1344,7 +1365,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
13441365

13451366

13461367
</header><section id="introduction" class="level2"><h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
1347-
<p>This tutorial reproduces the key tables from the <a href="https://jamanetwork.com/journals/jama/article-abstract/2526639">Flegal et al.&nbsp;(2016)</a> article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS/CDC reliability standards</a> to ensure estimates are statistically defensible.</p>
1368+
<p>This tutorial reproduces the key tables from the <a href="https://jamanetwork.com/journals/jama/article-abstract/2526639">Flegal et al.&nbsp;(2016)</a> article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS/CDC reliability standards</a> to ensure estimates are statistically defensible <span class="citation" data-cites="nhanes_reliability_estimates">(<a href="#ref-nhanes_reliability_estimates" role="doc-biblioref">Centers for Disease Control and Prevention 2025</a>)</span>.</p>
13481369
</section><section id="setup-and-data-preparation" class="level2"><h2 class="anchored" data-anchor-id="setup-and-data-preparation">Setup and Data Preparation</h2>
13491370
<p>This first section prepares the data for analysis. The key steps are:</p>
13501371
<ul>
@@ -1433,7 +1454,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
14331454
</section><section id="reproducing-table-1" class="level2"><h2 class="anchored" data-anchor-id="reproducing-table-1">Reproducing Table 1</h2>
14341455
<p>This section reproduces the unweighted sample sizes shown in Flegal et al.’s Table 1. The code first generates separate summary tables for all participants, men, and women. It then performs several formatting steps to combine these into a single table.</p>
14351456
<section id="what-svytable1-does" class="level3"><h3 class="anchored" data-anchor-id="what-svytable1-does">What <code>svytable1</code> Does</h3>
1436-
<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
1457+
<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data <span class="citation" data-cites="svyTable1">(<a href="#ref-svyTable1" role="doc-biblioref">Karim 2025</a>)</span>. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
14371458
</section><section id="key-svytable1-operations" class="level3"><h3 class="anchored" data-anchor-id="key-svytable1-operations">Key <code>svytable1</code> Operations</h3>
14381459
<p>When you call <code>svytable1</code> (<a href="https://github.com/ehsanx/svyTable1">link</a>), it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
14391460
<ol type="1">
@@ -2135,7 +2156,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
21352156
<p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically non-significant coefficients (like <code>Age40-59</code> and <code>raceHispanic</code>) have high RSEs, which is expected. The <code>education\&gt;High school</code> coefficient is the clearest case: its p-value of 0.932 correctly tells you that this coefficient is not statistically significant and is not reliably different from zero, and its RSE is flagged as “TRUE” for being unreliable. The RSE is calculated as (0.147 / -0.013) * 100 = 1109%. The extremely high RSE here is not a result of a large standard error, but of the coefficient estimate being very close to zero. An inflated RSE doesn’t provide any new or more accurate information than the p-value; it simply reflects that the coefficient itself is minuscule. This is a good example of why RSE isn’t a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.</p>
21362157
</section><section id="check-for-multicollinearity" class="level3"><h3 class="anchored" data-anchor-id="check-for-multicollinearity">Check for Multicollinearity</h3>
21372158
<p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The variance inflation factor (VIF) is used to detect this issue.</p>
2138-
<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.</p>
2159+
<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors <span class="citation" data-cites="fox1992generalized">(<a href="#ref-fox1992generalized" role="doc-biblioref">Fox and Monette 1992</a>)</span>. It adjusts for the number of dummy variables created for a categorical variable, so that terms with different degrees of freedom can be compared on one scale; for a single-coefficient term it equals the square root of the traditional VIF.</p>
21392160
<section id="why-gvif12df-is-necessary-for-categorical-variables" class="level4"><h4 class="anchored" data-anchor-id="why-gvif12df-is-necessary-for-categorical-variables">Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables</h4>
21402161
<p>A categorical variable with (k) levels (e.g., <em>race</em>) is typically represented in a regression model by (k - 1) <strong>dummy variables</strong>. Dummy variables are inherently correlated because they all describe the same categorical feature. This intrinsic relationship would lead to very high — but misleading — <strong>GVIF</strong> scores if the overall GVIF were interpreted directly.</p>
21412162
<p>The adjustment GVIF<sup>1/(2×Df)</sup> standardizes the GVIF value. It reduces the measure from a <em>hypervolume</em> of confidence for multiple coefficients down to a linear measure, making it comparable to the square root of the single VIF value used for continuous predictors. Here, <strong>Df</strong> is the degrees of freedom for the categorical term, which equals (k - 1), the number of dummy variables.</p>
@@ -2165,10 +2186,11 @@ <h1 class="title">NHANES: Reliability Standards</h1>
21652186
</tr>
21662187
<tr class="even">
21672188
<td><strong>&gt; 5</strong></td>
2168-
<td>Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. A more conservative cutoff of <strong>4</strong> is sometimes used.</td>
2189+
<td>Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome.</td>
21692190
</tr>
21702191
</tbody>
21712192
</table>
2193+
<p>A more conservative cutoff of <strong>3</strong> is sometimes used. The scaled GVIF, GVIF<sup>1/(2×Df)</sup>, is designed to be comparable to the square root of the VIF, which explains cutoffs such as √5 (≈ 2.24) and √10 (≈ 3.16) <span class="citation" data-cites="nahhas2024introduction">(<a href="#ref-nahhas2024introduction" role="doc-biblioref">Nahhas 2024</a>)</span>. On the same scale, values above √20 (≈ 4.47) indicate severe multicollinearity.</p>
21722194
<div class="cell">
21732195
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>vif_values <span class="ot">&lt;-</span> <span class="fu">vif</span>(fit_men_obese)</span>
21742196
<span id="cb10-2"><a href="#cb10-2"></a><span class="fu">print</span>(vif_values)</span>
@@ -2419,9 +2441,24 @@ <h1 class="title">NHANES: Reliability Standards</h1>
24192441
</div>
24202442
</section><section id="differences-from-the-original-paper" class="level3"><h3 class="anchored" data-anchor-id="differences-from-the-original-paper">Differences from the Original Paper</h3>
24212443
<p>While the ‘Statistical Analyses’ section of Flegal et al.&nbsp;(2016) details their models, it does not explicitly state the method used to handle missing data for covariates. Our replication employs a complete-case analysis, which excludes participants with missing smoking or education data from the models. This difference is the most likely reason for the minor discrepancies between our results and those published in the original paper.</p>
2444+
</section></section><section id="references" class="level2"><h2 class="anchored" data-anchor-id="references">References</h2>
24222445

24232446

2424-
</section></section></main><!-- /main --><script id="quarto-html-after-body" type="application/javascript">
2447+
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
2448+
<div id="ref-nhanes_reliability_estimates" class="csl-entry" role="listitem">
2449+
Centers for Disease Control and Prevention. 2025. <span>“NHANES Tutorials: Reliability of Estimates Module.”</span> National Center for Health Statistics. <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx</a>.
2450+
</div>
2451+
<div id="ref-fox1992generalized" class="csl-entry" role="listitem">
2452+
Fox, John, and Georges Monette. 1992. <span>“Generalized Collinearity Diagnostics.”</span> <em>Journal of the American Statistical Association</em> 87 (417): 178–83.
2453+
</div>
2454+
<div id="ref-svyTable1" class="csl-entry" role="listitem">
2455+
Karim, Mohammad Ehsanul. 2025. <span>“svyTable1: Table 1 and Diagnostics from Complex Survey Designs.”</span> <a href="https://github.com/ehsanx/svyTable1">https://github.com/ehsanx/svyTable1</a>.
2456+
</div>
2457+
<div id="ref-nahhas2024introduction" class="csl-entry" role="listitem">
2458+
Nahhas, Ramzi W. 2024. <em>Introduction to Regression Methods for Public Health Using R</em>. CRC Press.
2459+
</div>
2460+
</div>
2461+
</section></main><!-- /main --><script id="quarto-html-after-body" type="application/javascript">
24252462
window.document.addEventListener("DOMContentLoaded", function (event) {
24262463
const icon = "";
24272464
const anchorJS = new window.AnchorJS();

ref.bib

Lines changed: 37 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -9,6 +9,43 @@ @article{vach1991biased
99
publisher={Oxford University Press}
1010
}
1111

12+
@Misc{svyTable1,
13+
author = {Karim, Mohammad Ehsanul},
14+
title = {svyTable1: Table 1 and diagnostics from complex survey designs},
15+
year = {2025},
16+
version = {0.1.0},
17+
note = {R package},
18+
url = {https://github.com/ehsanx/svyTable1},
19+
accessed = {2025-10-13}
20+
}
21+
22+
@misc{nhanes_reliability_estimates,
23+
author = {{Centers for Disease Control and Prevention}},
24+
title = {NHANES Tutorials: Reliability of Estimates Module},
25+
organization = {National Center for Health Statistics},
26+
year = {2025},
27+
url = {https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx},
28+
note = {Accessed: October 12, 2025}
29+
}
30+
31+
@article{fox1992generalized,
32+
title={Generalized collinearity diagnostics},
33+
author={Fox, John and Monette, Georges},
34+
journal={Journal of the American Statistical Association},
35+
volume={87},
36+
number={417},
37+
pages={178--183},
38+
year={1992},
39+
publisher={Taylor \& Francis}
40+
}
41+
42+
@book{nahhas2024introduction,
43+
title={Introduction to regression methods for public health using {R}},
44+
author={Nahhas, Ramzi W},
45+
year={2024},
46+
publisher={CRC Press}
47+
}
48+
1249
@article{austin2021missing,
1350
title={Missing data in clinical research: a tutorial on multiple imputation},
1451
author={Austin, Peter C and White, Ian R and Lee, Douglas S and van Buuren, Stef},

surveydata9.qmd

Lines changed: 8 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,7 @@
22

33
## Introduction
44

5-
This tutorial reproduces the key tables from the [Flegal et al. (2016)](https://jamanetwork.com/journals/jama/article-abstract/2526639) article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate [NCHS/CDC reliability standards](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx) to ensure estimates are statistically defensible.
5+
This tutorial reproduces the key tables from the [Flegal et al. (2016)](https://jamanetwork.com/journals/jama/article-abstract/2526639) article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate [NCHS/CDC reliability standards](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx) to ensure estimates are statistically defensible [@nhanes_reliability_estimates].
66

77
```{r setup, include=FALSE}
88
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
@@ -97,7 +97,7 @@ This section reproduces the unweighted sample sizes shown in Flegal et al.'s Tab
9797

9898
### What `svytable1` Does
9999

100-
The `svytable1` function creates a descriptive summary table—commonly referred to as a **“Table 1”**—from complex survey data. It is specifically designed to produce publication-ready results that align with [NCHS Data Presentation Standards for reliability](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx).
100+
The `svytable1` function creates a descriptive summary table—commonly referred to as a **“Table 1”**—from complex survey data [@svyTable1]. It is specifically designed to produce publication-ready results that align with [NCHS Data Presentation Standards for reliability](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx).
101101

102102
### Key `svytable1` Operations
103103

@@ -305,7 +305,7 @@ The few significant predictors are `raceNon-Hispanic Asian` and `education\<High
305305

306306
Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The variance inflation factor (VIF) is used to detect this issue.
307307

308-
`GVIF^(1/(2*Df))` is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.
308+
`GVIF^(1/(2*Df))` is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors [@fox1992generalized]. It adjusts for the number of dummy variables created for a categorical variable, so that terms with different degrees of freedom can be compared on one scale; for a single-coefficient term it equals the square root of the traditional VIF.
309309

310310
#### Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables
311311

@@ -322,7 +322,9 @@ The interpretation of GVIF<sup>1/(2×Df)</sup> follows the same guidelines as th
322322
| **1** | No correlation among predictors. |
323323
| **1 – 2.5** | Low to moderate correlation — generally acceptable (typical for most well-specified models). |
324324
| **2.5 – 5** | Moderate to high correlation — may warrant further investigation. |
325-
| **\> 5** | Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. A more conservative cutoff of **4** is sometimes used. |
325+
| **\> 5** | Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. |
326+
327+
A more conservative cutoff of **3** is sometimes used. The scaled GVIF, $GVIF^{1/(2 \cdot Df)}$, is designed to be comparable to the square root of the VIF, which explains cutoffs such as $\sqrt{5}$ (≈ 2.24) and $\sqrt{10}$ (≈ 3.16) [@nahhas2024introduction]. On the same scale, values above $\sqrt{20}$ (≈ 4.47) indicate severe multicollinearity.
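
As a rough worked illustration of where these cutoffs come from, consider a term with a single coefficient. This sketch assumes the standard result that the GVIF of such a term reduces to the ordinary VIF, so the scaling is just a square root:

$$
\text{Df} = 1: \quad \mathrm{GVIF}^{1/(2 \cdot 1)} = \sqrt{\mathrm{VIF}},
\qquad
\sqrt{5} \approx 2.24, \quad \sqrt{10} \approx 3.16, \quad \sqrt{20} \approx 4.47
$$

Read in the other direction, a scaled value of about 2.24 corresponds to a VIF of 5 on the original scale, and a scaled value of about 3.16 corresponds to a VIF of 10.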
326328

327329
```{r vif-check, cache=TRUE}
328330
vif_values <- vif(fit_men_obese)
@@ -379,3 +381,5 @@ kable(final_table, caption = "Weighted Logistic Regression Models for Obesity",
379381
### Differences from the Original Paper
380382

381383
While the 'Statistical Analyses' section of Flegal et al. (2016) details their models, it does not explicitly state the method used to handle missing data for covariates. Our replication employs a complete-case analysis, which excludes participants with missing smoking or education data from the models. This difference is the most likely reason for the minor discrepancies between our results and those published in the original paper.
384+
385+
## References
