ehsanx
diff --git a/‎_freeze/surveydata9/execute-results/html.json‎
Lines changed: 2 additions & 2 deletions b/‎_freeze/surveydata9/execute-results/html.json‎
diff --git a/‎docs/search.json‎
Lines changed: 14 additions & 3 deletions b/‎docs/search.json‎
diff --git a/‎docs/surveydata9.html‎
Lines changed: 43 additions & 6 deletions b/‎docs/surveydata9.html‎
diff --git a/‎ref.bib‎
Lines changed: 37 additions & 0 deletions b/‎ref.bib‎
diff --git a/‎surveydata9.qmd‎
Lines changed: 8 additions & 4 deletions b/‎surveydata9.qmd‎
@@ -52,7 +52,27 @@
 @media screen {
 pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
 }
-</style>
+/* CSS for citations */
+div.csl-bib-body { }
+div.csl-entry {
+  clear: both;
+  margin-bottom: 0em;
+}
+.hanging-indent div.csl-entry {
+  margin-left:2em;
+  text-indent:-2em;
+}
+div.csl-left-margin {
+  min-width:2em;
+  float:left;
+}
+div.csl-right-inline {
+  margin-left:2em;
+  padding-left:1em;
+}
+div.csl-indent {
+  margin-left: 2em;
+}</style>
 
 <script src="site_libs/quarto-nav/quarto-nav.js"></script>
 <script src="site_libs/quarto-nav/headroom.min.js"></script>
@@ -1325,6 +1345,7 @@
   <li><a href="#differences-from-the-original-paper" id="toc-differences-from-the-original-paper" class="nav-link" data-scroll-target="#differences-from-the-original-paper">Differences from the Original Paper</a></li>
   </ul>
 </li>
+  <li><a href="#references" id="toc-references" class="nav-link" data-scroll-target="#references">References</a></li>
   </ul></nav>
     </div>
 <!-- main -->
@@ -1344,7 +1365,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
 
 
 </header><section id="introduction" class="level2"><h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
-<p>This tutorial reproduces the key tables from the <a href="https://jamanetwork.com/journals/jama/article-abstract/2526639">Flegal et al.&nbsp;(2016)</a> article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS/CDC reliability standards</a> to ensure estimates are statistically defensible.</p>
+<p>This tutorial reproduces the key tables from the <a href="https://jamanetwork.com/journals/jama/article-abstract/2526639">Flegal et al.&nbsp;(2016)</a> article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS/CDC reliability standards</a> to ensure estimates are statistically defensible <span class="citation" data-cites="nhanes_reliability_estimates">(<a href="#ref-nhanes_reliability_estimates" role="doc-biblioref">Centers for Disease Control and Prevention 2025</a>)</span>.</p>
 </section><section id="setup-and-data-preparation" class="level2"><h2 class="anchored" data-anchor-id="setup-and-data-preparation">Setup and Data Preparation</h2>
 <p>This first section prepares the data for analysis. The key steps are:</p>
 <ul>
@@ -1433,7 +1454,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
 </section><section id="reproducing-table-1" class="level2"><h2 class="anchored" data-anchor-id="reproducing-table-1">Reproducing Table 1</h2>
 <p>This section reproduces the unweighted sample sizes shown in Flegal et al.’s Table 1. The code first generates separate summary tables for all participants, men, and women. It then performs several formatting steps to combine these into a single table.</p>
 <section id="what-svytable1-does" class="level3"><h3 class="anchored" data-anchor-id="what-svytable1-does">What <code>svytable1</code> Does</h3>
-<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
+<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data <span class="citation" data-cites="svyTable1">(<a href="#ref-svyTable1" role="doc-biblioref">Karim 2025</a>)</span>. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
 </section><section id="key-svytable1-operations" class="level3"><h3 class="anchored" data-anchor-id="key-svytable1-operations">Key <code>svytable1</code> Operations</h3>
 <p>When you call <code>svytable1</code> (<a href="https://github.com/ehsanx/svyTable1">link</a>), it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
 <ol type="1">
@@ -2135,7 +2156,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
 <p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically insignificant coefficients (like <code>Age40-59</code> and <code>raceHispanic</code>) have high RSEs, which is expected. The <code>education\&gt;High school</code> coefficient is the extreme case: its p-value of 0.932 correctly tells you that the coefficient is not statistically significant and is not reliably different from zero, yet its RSE is also flagged as “TRUE” for being unreliable. The RSE is calculated as (0.147 / -0.013) * 100 = 1109%. This extremely high RSE is not the result of a large standard error, but of the coefficient estimate being very close to zero. An inflated RSE doesn’t provide any new or more accurate information than the p-value; it simply reflects that the coefficient itself is minuscule. This is a clear example of why RSE isn’t a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.</p>
 </section><section id="check-for-multicollinearity" class="level3"><h3 class="anchored" data-anchor-id="check-for-multicollinearity">Check for Multicollinearity</h3>
 <p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The VIF is used to detect this issue.</p>
-<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.</p>
+<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors <span class="citation" data-cites="fox1992generalized">(<a href="#ref-fox1992generalized" role="doc-biblioref">Fox and Monette 1992</a>)</span>. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.</p>
 <section id="why-gvif12df-is-necessary-for-categorical-variables" class="level4"><h4 class="anchored" data-anchor-id="why-gvif12df-is-necessary-for-categorical-variables">Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables</h4>
 <p>A categorical variable with <em>k</em> levels (e.g., <em>race</em>) is typically represented in a regression model by <em>k</em> − 1 <strong>dummy variables</strong>. Dummy variables are inherently correlated because they all describe the same categorical feature. This intrinsic relationship would lead to very high but misleading <strong>GVIF</strong> scores if the overall GVIF were interpreted directly.</p>
 <p>The adjustment GVIF<sup>1/(2×Df)</sup> standardizes the GVIF value. It reduces the measure from the <em>hypervolume</em> of the joint confidence region for multiple coefficients down to a linear measure, making it comparable to the single VIF value used for continuous predictors. Here, <strong>Df</strong> is the degrees of freedom for the categorical term, which equals <em>k</em> − 1, the number of dummy variables.</p>
@@ -2165,10 +2186,11 @@ <h1 class="title">NHANES: Reliability Standards</h1>
 </tr>
 <tr class="even">
 <td><strong>&gt; 5</strong></td>
-<td>Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. A more conservative cutoff of <strong>4</strong> is sometimes used.</td>
+<td>Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome.</td>
 </tr>
 </tbody>
 </table>
+<p>A more conservative cutoff of <strong>3</strong> is sometimes used. The scaled GVIF, <span class="math inline">\(GVIF^{1/(2 \cdot Df)}\)</span>, is designed to be comparable to the square root of the VIF, which explains the use of cutoffs like <span class="math inline">\(\sqrt{5}\)</span> (≈ 2.24) and <span class="math inline">\(\sqrt{10}\)</span> (≈ 3.16) <span class="citation" data-cites="nahhas2024introduction">(<a href="#ref-nahhas2024introduction" role="doc-biblioref">Nahhas 2024</a>)</span>. Values larger than <span class="math inline">\(\sqrt{20}\)</span> (≈ 4.47) therefore indicate severe multicollinearity.</p>
 <div class="cell">
 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>vif_values <span class="ot">&lt;-</span> <span class="fu">vif</span>(fit_men_obese)</span>
 <span id="cb10-2"><a href="#cb10-2"></a><span class="fu">print</span>(vif_values)</span>
@@ -2419,9 +2441,24 @@ <h1 class="title">NHANES: Reliability Standards</h1>
 </div>
 </section><section id="differences-from-the-original-paper" class="level3"><h3 class="anchored" data-anchor-id="differences-from-the-original-paper">Differences from the Original Paper</h3>
 <p>While the ‘Statistical Analyses’ section of Flegal et al.&nbsp;(2016) details their models, it does not explicitly state the method used to handle missing data for covariates. Our replication employs a complete-case analysis, which excludes participants with missing smoking or education data from the models. This difference is the most likely reason for the minor discrepancies between our results and those published in the original paper.</p>
+</section></section><section id="references" class="level2"><h2 class="anchored" data-anchor-id="references">References</h2>
 
 
-</section></section></main><!-- /main --><script id="quarto-html-after-body" type="application/javascript">
+<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
+<div id="ref-nhanes_reliability_estimates" class="csl-entry" role="listitem">
+Centers for Disease Control and Prevention. 2025. <span>“NHANES Tutorials: Reliability of Estimates Module.”</span> National Center for Health Statistics. <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx</a>.
+</div>
+<div id="ref-fox1992generalized" class="csl-entry" role="listitem">
+Fox, John, and Georges Monette. 1992. <span>“Generalized Collinearity Diagnostics.”</span> <em>Journal of the American Statistical Association</em> 87 (417): 178–83.
+</div>
+<div id="ref-svyTable1" class="csl-entry" role="listitem">
+Karim, Mohammad Ehsanul. 2025. <span>“svyTable1: Table 1 and Diagnostics from Complex Survey Designs.”</span> <a href="https://github.com/ehsanx/svyTable1">https://github.com/ehsanx/svyTable1</a>.
+</div>
+<div id="ref-nahhas2024introduction" class="csl-entry" role="listitem">
+Nahhas, Ramzi W. 2024. <em>Introduction to Regression Methods for Public Health Using R</em>. CRC Press.
+</div>
+</div>
+</section></main><!-- /main --><script id="quarto-html-after-body" type="application/javascript">
   window.document.addEventListener("DOMContentLoaded", function (event) {
     const icon = "";
     const anchorJS = new window.AnchorJS();
 
@@ -9,6 +9,43 @@ @article{vach1991biased
   publisher={Oxford University Press}
 }
 
+@Misc{svyTable1,
+  author = {Karim, Mohammad Ehsanul},
+  title = {svyTable1: Table 1 and diagnostics from complex survey designs},
+  year = {2025},
+  version = {0.1.0},
+  note = {R package},
+  url = {https://github.com/ehsanx/svyTable1},
+  accessed = {2025-10-13}
+}
+
+@misc{nhanes_reliability_estimates,
+  author = {{Centers for Disease Control and Prevention}},
+  title = {NHANES Tutorials: Reliability of Estimates Module},
+  organization = {National Center for Health Statistics},
+  year = {2025},
+  url = {https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx},
+  note = {Accessed: October 12, 2025}
+}
+
+@article{fox1992generalized,
+  title={Generalized collinearity diagnostics},
+  author={Fox, John and Monette, Georges},
+  journal={Journal of the American Statistical Association},
+  volume={87},
+  number={417},
+  pages={178--183},
+  year={1992},
+  publisher={Taylor \& Francis}
+}
+
+@book{nahhas2024introduction,
+  title={Introduction to regression methods for public health using {R}},
+  author={Nahhas, Ramzi W},
+  year={2024},
+  publisher={CRC Press}
+}
+
 @article{austin2021missing,
   title={Missing data in clinical research: a tutorial on multiple imputation},
   author={Austin, Peter C and White, Ian R and Lee, Douglas S and van Buuren, Stef},
 
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-This tutorial reproduces the key tables from the [Flegal et al. (2016)](https://jamanetwork.com/journals/jama/article-abstract/2526639) article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate [NCHS/CDC reliability standards](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx) to ensure estimates are statistically defensible.
+This tutorial reproduces the key tables from the [Flegal et al. (2016)](https://jamanetwork.com/journals/jama/article-abstract/2526639) article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate [NCHS/CDC reliability standards](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx) to ensure estimates are statistically defensible [@nhanes_reliability_estimates].
 
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
@@ -97,7 +97,7 @@ This section reproduces the unweighted sample sizes shown in Flegal et al.'s Tab
 
 ### What `svytable1` Does
 
-The `svytable1` function creates a descriptive summary table—commonly referred to as a **“Table 1”**—from complex survey data. It is specifically designed to produce publication-ready results that align with [NCHS Data Presentation Standards for reliability](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx).
+The `svytable1` function creates a descriptive summary table—commonly referred to as a **“Table 1”**—from complex survey data [@svyTable1]. It is specifically designed to produce publication-ready results that align with [NCHS Data Presentation Standards for reliability](https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx).
 
 ### Key `svytable1` Operations
 
@@ -305,7 +305,7 @@ The few significant predictors are `raceNon-Hispanic Asian` and `education\<High
 
 Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The VIF is used to detect this issue.
 
-`GVIF^(1/(2*Df))` is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.
+`GVIF^(1/(2*Df))` is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors [@fox1992generalized]. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.
 
 #### Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables
 
@@ -322,7 +322,9 @@ The interpretation of GVIF<sup>1/(2×Df)</sup> follows the same guidelines as th
 | **1** | No correlation among predictors. |
 | **1 – 2.5** | Low to moderate correlation — generally acceptable (typical for most well-specified models). |
 | **2.5 – 5** | Moderate to high correlation — may warrant further investigation. |
-| **\> 5** | Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. A more conservative cutoff of **4** is sometimes used. |
+| **\> 5** | Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. |
+
+A more conservative cutoff of **3** is sometimes used. The scaled GVIF, $GVIF^{1/(2 \cdot Df)}$, is designed to be comparable to the square root of the VIF, which explains the use of cutoffs like $\sqrt{5}$ (≈ 2.24) and $\sqrt{10}$ (≈ 3.16) [@nahhas2024introduction]. Values larger than $\sqrt{20}$ (≈ 4.47) therefore indicate severe multicollinearity.
 
 ```{r vif-check, cache=TRUE}
 vif_values <- vif(fit_men_obese)
@@ -379,3 +381,5 @@ kable(final_table, caption = "Weighted Logistic Regression Models for Obesity",
 ### Differences from the Original Paper
 
 While the 'Statistical Analyses' section of Flegal et al. (2016) details their models, it does not explicitly state the method used to handle missing data for covariates. Our replication employs a complete-case analysis, which excludes participants with missing smoking or education data from the models. This difference is the most likely reason for the minor discrepancies between our results and those published in the original paper.
+
+## References
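
The square-root cutoffs quoted in the `surveydata9.qmd` change can be checked numerically. The following is a minimal sketch in R, not code from the tutorial: the term names and GVIF values are hypothetical and chosen only for illustration, whereas in the tutorial they would come from `vif(fit_men_obese)`.

```r
# Hypothetical GVIF values and degrees of freedom (illustration only;
# these are NOT results from the fitted NHANES model).
gvif <- c(sex = 4.0, race = 3.2, education = 2.1)
df   <- c(sex = 1,   race = 4,   education = 2)

# Scaled GVIF: GVIF^(1/(2*Df)). For a 1-df term this is sqrt(GVIF).
scaled <- gvif^(1 / (2 * df))

# VIF cutoffs of 5, 10 and 20 expressed on the square-root scale
cutoffs <- sqrt(c(5, 10, 20))
print(round(cutoffs, 2))  # 2.24 3.16 4.47

# Flag terms whose scaled GVIF exceeds the lowest cutoff, sqrt(5)
print(scaled > cutoffs[1])
```

Working on the square-root scale keeps a single set of cutoffs usable for both 1-df terms and multi-level categorical terms in the same model.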