Commit 25f4993

sqrt
1 parent 0ae2482 commit 25f4993

4 files changed

+53
-25
lines changed

_freeze/surveydata9/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/search.json

Lines changed: 4 additions & 4 deletions
Large diffs are not rendered by default.

docs/surveydata9.html

Lines changed: 38 additions & 10 deletions
@@ -126,7 +126,32 @@
126126
gtag('config', 'G-CVBPG0RQMY', { 'anonymize_ip': true});
127127
</script><script src="site_libs/kePrint-0.0.1/kePrint.js"></script><link href="site_libs/lightable-0.0.1/lightable.css" rel="stylesheet">
128128
<link href="site_libs/pagedtable-1.1/css/pagedtable.css" rel="stylesheet">
129-
<script src="site_libs/pagedtable-1.1/js/pagedtable.js"></script>
129+
<script src="site_libs/pagedtable-1.1/js/pagedtable.js"></script><script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?features=es6"></script><script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script><script type="text/javascript">
130+
const typesetMath = (el) => {
131+
if (window.MathJax) {
132+
// MathJax Typeset
133+
window.MathJax.typeset([el]);
134+
} else if (window.katex) {
135+
// KaTeX Render
136+
var mathElements = el.getElementsByClassName("math");
137+
var macros = [];
138+
for (var i = 0; i < mathElements.length; i++) {
139+
var texText = mathElements[i].firstChild;
140+
if (mathElements[i].tagName == "SPAN") {
141+
window.katex.render(texText.data, mathElements[i], {
142+
displayMode: mathElements[i].classList.contains('display'),
143+
throwOnError: false,
144+
macros: macros,
145+
fleqn: false
146+
});
147+
}
148+
}
149+
}
150+
}
151+
window.Quarto = {
152+
typesetMath
153+
};
154+
</script>
130155
<meta name="twitter:title" content="NHANES: Reliability Standards – Advanced Epidemiological Methods">
131156
<meta name="twitter:description" content="">
132157
<meta name="twitter:card" content="summary">
@@ -1365,7 +1390,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
13651390

13661391

13671392
</header><section id="introduction" class="level2"><h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
1368-
<p>This tutorial reproduces the key tables from the <a href="https://jamanetwork.com/journals/jama/article-abstract/2526639">Flegal et al.&nbsp;(2016)</a> article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS/CDC reliability standards</a> to ensure estimates are statistically defensible <span class="citation" data-cites="nhanes_reliability_estimates">(<a href="#ref-nhanes_reliability_estimates" role="doc-biblioref">Disease Control and Prevention 2025</a>)</span>.</p>
1393+
<p>This tutorial reproduces the key tables from the <a href="https://jamanetwork.com/journals/jama/article-abstract/2526639">Flegal et al.&nbsp;(2016)</a> article <span class="citation" data-cites="flegal2016trends">(<a href="#ref-flegal2016trends" role="doc-biblioref">Flegal et al. 2016</a>)</span>. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS/CDC reliability standards</a> to ensure estimates are statistically defensible <span class="citation" data-cites="nhanes_reliability_estimates">(<a href="#ref-nhanes_reliability_estimates" role="doc-biblioref">Disease Control and Prevention 2025</a>)</span>.</p>
13691394
</section><section id="setup-and-data-preparation" class="level2"><h2 class="anchored" data-anchor-id="setup-and-data-preparation">Setup and Data Preparation</h2>
13701395
<p>This first section prepares the data for analysis. The key steps are:</p>
13711396
<ul>
@@ -1456,7 +1481,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
14561481
<section id="what-svytable1-does" class="level3"><h3 class="anchored" data-anchor-id="what-svytable1-does">What <code>svytable1</code> Does</h3>
14571482
<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data <span class="citation" data-cites="svyTable1">(<a href="#ref-svyTable1" role="doc-biblioref">Karim 2025</a>)</span>. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
14581483
</section><section id="key-svytable1-operations" class="level3"><h3 class="anchored" data-anchor-id="key-svytable1-operations">Key <code>svytable1</code> Operations</h3>
1459-
<p>When you call <code>svytable1</code> (<a href="https://github.com/ehsanx/svyTable1">link</a>), it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
1484+
<p>When we call <code>svytable1</code> (<a href="https://github.com/ehsanx/svyTable1">link</a>), it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
14601485
<ol type="1">
14611486
<li><p><strong>Calculates Proportions</strong><br>
14621487
It summarizes categorical variables (like <code>Age</code>) by calculating the proportion of participants in each category (e.g., 20–39, 40–59, ≥60).</p></li>
@@ -1495,10 +1520,10 @@ <h1 class="title">NHANES: Reliability Standards</h1>
14951520
Each of these flags indicates limited precision or instability in the estimate.</li>
14961521
</ul>
14971522
<p>In the output, the asterisks appear in the <strong>“Other” race</strong> column for certain age groups (such as “40–59” and “≥60”).<br>
1498-
This happens because the <strong>number of participants</strong> in those cells is very small, producing unstable or wide confidence intervals. Thus, the function correctly replaces the unreliable estimates with <code>*</code>, ensuring your published results remain statistically defensible and transparent.</p>
1523+
This happens because the <strong>number of participants</strong> in those cells is very small, producing unstable or wide confidence intervals. Thus, the function correctly replaces the unreliable estimates with <code>*</code>, ensuring the published results remain statistically defensible and transparent.</p>
14991524
</section><section id="reliability-metrics-table" class="level3"><h3 class="anchored" data-anchor-id="reliability-metrics-table">Reliability Metrics Table</h3>
15001525
<p>In addition to the detailed checks for proportions, the <code>svytable1</code> function also assesses the reliability of means for numeric variables. For these estimates, it applies the standard NCHS recommendation, which uses the Relative Standard Error (RSE). If a mean’s RSE is 30% or greater, it is considered statistically unreliable and will be suppressed with an asterisk (*) in the formatted table.</p>
1501-
<p>The <code>$reliability_metrics</code> table will be printed with the output if you select <code>return_metrics = TRUE</code> which will include rows for each mean, reporting the calculated RSE and the outcome of this check in the <code>fail_rse_30</code> column.</p>
1526+
<p>If we set <code>return_metrics = TRUE</code>, the <code>$reliability_metrics</code> table is printed with the output. It includes a row for each mean, reporting the calculated RSE and the outcome of this check in the <code>fail_rse_30</code> column.</p>
15021527
<div class="cell">
15031528
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1"></a><span class="co"># View reliability_metrics</span></span>
15041529
<span id="cb2-2"><a href="#cb2-2"></a>table1_svy <span class="ot">&lt;-</span> <span class="fu">svytable1</span>(</span>
@@ -1973,7 +1998,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
19731998
</ul>
19741999
<ol type="1">
19752000
<li><p><strong>The Standard Error (SE)</strong>: A direct measure of the coefficient’s precision. A smaller SE relative to its coefficient suggests a more reliable estimate.</p></li>
1976-
<li><p><strong>The p-value</strong>: Tells you if the coefficient is statistically distinguishable from zero. A non-significant p-value (e.g., p &gt; 0.05) means we cannot be confident the predictor has any association with the outcome.</p></li>
2001+
<li><p><strong>The p-value</strong>: Indicates whether the coefficient is statistically distinguishable from zero. A non-significant p-value (e.g., p &gt; 0.05) means we cannot be confident the predictor has any association with the outcome.</p></li>
19772002
<li><p><strong>The Confidence Interval (CI)</strong>: Provides a plausible range for the true value of the coefficient. A very wide CI indicates a high degree of uncertainty and, therefore, low reliability. For logistic regression, if the CI for the odds ratio contains 1.0, the result is not statistically significant.</p></li>
19782003
</ol>
19792004
<p>We will also calculate the RSE to demonstrate why it can be misleading. Finally, we’ll run a quick check for multicollinearity using the Variance Inflation Factor (VIF), as this is a common cause of unstable (unreliable) coefficients.</p>
@@ -2153,9 +2178,9 @@ <h1 class="title">NHANES: Reliability Standards</h1>
21532178
</div>
21542179
<p>Generally, the model shows limited reliability and predictive power. Most of the predictor variables, such as <code>Age</code> and <code>smoking status</code>, are not statistically significant (their <code>p.value</code> is high). This indicates that, for men in this dataset, these factors don’t have a clear, reliable association with obesity.</p>
21552180
<p>The few significant predictors are <code>raceNon-Hispanic Asian</code> and <code>education&lt;High school</code>. These coefficients are considered stable and reliable. The unreliability of the other terms is not caused by the variables being correlated with each other, as the multicollinearity check shows.</p>
2156-
<p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically insignificant coefficients (like <code>Age40-59</code> and <code>raceHispanic</code>) have high RSEs, which is expected. However, the <code>education\&gt;High school</code> coefficient is highly insignificant: p-value of 0.932 correctly tells you that this coefficient is not statistically significant and is not reliably different from zero. However its RSE is flagged as “TRUE” for being unreliable. The RSE is calculated as (0.147 / -0.013) * 100 = 1109%. Here, the extremely high RSE here is not a result of a large standard error, but of the coefficient estimate being very close to zero. An inflated RSE doesn’t provide any new or more accurate information than the p-value; it simply reflects that the coefficient itself is minuscule. This is a great example of why RSE isn’t a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.</p>
2181+
<p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically insignificant coefficients (like <code>Age40-59</code> and <code>raceHispanic</code>) have high RSEs, which is expected. The <code>education&gt;High school</code> coefficient is the clearest case: its p-value of 0.932 correctly indicates that the coefficient is not statistically significant and is not reliably different from zero, and its RSE is likewise flagged as “TRUE” for being unreliable. The RSE is calculated as (0.147 / -0.013) * 100 = 1109%. This extremely high RSE is not the result of a large standard error, but of the coefficient estimate being very close to zero. An inflated RSE doesn’t provide any new or more accurate information than the p-value; it simply reflects that the coefficient itself is minuscule. This is a great example of why RSE isn’t a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.</p>
21572182
</section><section id="check-for-multicollinearity" class="level3"><h3 class="anchored" data-anchor-id="check-for-multicollinearity">Check for Multicollinearity</h3>
2158-
<p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The VIF is used to detect this issue.</p>
2183+
<p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make the coefficient estimates unstable. The VIF is used to detect this issue.</p>
21592184
<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors <span class="citation" data-cites="fox1992generalized">(<a href="#ref-fox1992generalized" role="doc-biblioref">Fox and Monette 1992</a>)</span>. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.</p>
21602185
<section id="why-gvif12df-is-necessary-for-categorical-variables" class="level4"><h4 class="anchored" data-anchor-id="why-gvif12df-is-necessary-for-categorical-variables">Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables</h4>
21612186
<p>A categorical variable with (k) levels (e.g., <em>race</em>) is typically represented in a regression model by (k - 1) <strong>dummy variables</strong>. Dummy variables are inherently correlated because they all describe the same categorical feature. This intrinsic relationship would lead to very high — but misleading — <strong>GVIF</strong> scores if the overall GVIF were interpreted directly.</p>
@@ -2190,7 +2215,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
21902215
</tr>
21912216
</tbody>
21922217
</table>
2193-
<p>A more conservative cutoff of *3** is sometimes used. The scaled GVIF, (GVIF^{1/(2·df)}), is designed to be comparable to the square root of the VIF, which explains the use of cutoffs like () (≈ 2.24) and () (≈ 3.16) <span class="citation" data-cites="nahhas2024introduction">(<a href="#ref-nahhas2024introduction" role="doc-biblioref">Nahhas 2024</a>)</span>. Larger than () (≈ 4.47) is therefore the case of severe multicollinearity.</p>
2218+
<p>A more conservative cutoff of <strong>3</strong> is sometimes used. The scaled GVIF, <span class="math inline">\(\text{GVIF}^{1/(2 \cdot \text{Df})}\)</span>, is designed to be comparable to the square root of the VIF, which explains the use of cutoffs like <span class="math inline">\(\sqrt{5}\)</span> (≈ 2.24) and <span class="math inline">\(\sqrt{10}\)</span> (≈ 3.16) <span class="citation" data-cites="nahhas2024introduction">(<a href="#ref-nahhas2024introduction" role="doc-biblioref">Nahhas 2024</a>)</span>. A value larger than <span class="math inline">\(\sqrt{20}\)</span> (≈ 4.47) therefore indicates severe multicollinearity.</p>
21942219
<div class="cell">
21952220
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>vif_values <span class="ot">&lt;-</span> <span class="fu">vif</span>(fit_men_obese)</span>
21962221
<span id="cb10-2"><a href="#cb10-2"></a><span class="fu">print</span>(vif_values)</span>
@@ -2200,7 +2225,7 @@ <h1 class="title">NHANES: Reliability Standards</h1>
22002225
<span id="cb10-6"><a href="#cb10-6"></a><span class="co">#&gt; smoking 3.435829 2 1.361469</span></span>
22012226
<span id="cb10-7"><a href="#cb10-7"></a><span class="co">#&gt; education 6.381028 2 1.589361</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
22022227
</div>
2203-
<p>The key values in the <code>GVIF\^(1/(2\*Df))</code> column are all low (below 2.5). This confirms that your predictor variables are independent enough from one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.</p>
2228+
<p>The key values in the <code>GVIF^(1/(2*Df))</code> column are all low (below 2.5). This suggests that the predictor variables are independent enough from one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.</p>
22042229
</section></section></section><section id="formatting-the-table" class="level3"><h3 class="anchored" data-anchor-id="formatting-the-table">Formatting the Table</h3>
22052230
<div class="cell">
22062231
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1"></a><span class="co"># --- Use the helper function to format results from each model ---</span></span>
@@ -2448,6 +2473,9 @@ <h1 class="title">NHANES: Reliability Standards</h1>
24482473
<div id="ref-nhanes_reliability_estimates" class="csl-entry" role="listitem">
24492474
Centers for Disease Control and Prevention. 2025. <span>“NHANES Tutorials: Reliability of Estimates Module.”</span> National Center for Health Statistics. <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx</a>.
24502475
</div>
2476+
<div id="ref-flegal2016trends" class="csl-entry" role="listitem">
2477+
Flegal, Katherine M, Deanna Kruszon-Moran, Margaret D Carroll, Cheryl D Fryar, and Cynthia L Ogden. 2016. <span>“Trends in Obesity Among Adults in the United States, 2005 to 2014.”</span> <em>JAMA</em> 315 (21): 2284–91.
2478+
</div>
24512479
<div id="ref-fox1992generalized" class="csl-entry" role="listitem">
24522480
Fox, John, and Georges Monette. 1992. <span>“Generalized Collinearity Diagnostics.”</span> <em>Journal of the American Statistical Association</em> 87 (417): 178–83.
24532481
</div>
