
Commit 584dcc1

Committed by linkp; 1 parent b4b4380


4 files changed (+87, -42 lines)


_freeze/surveydata9/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/search.json

Lines changed: 2 additions & 13 deletions
Large diffs are not rendered by default.

docs/surveydata9.html

Lines changed: 50 additions & 16 deletions
@@ -1298,13 +1298,13 @@
12981298
<ul>
12991299
<li><a href="#introduction" id="toc-introduction" class="nav-link active" data-scroll-target="#introduction">Introduction</a></li>
13001300
<li><a href="#setup-and-data-preparation" id="toc-setup-and-data-preparation" class="nav-link" data-scroll-target="#setup-and-data-preparation">Setup and Data Preparation</a></li>
1301-
<li><a href="#reproducing-table-1" id="toc-reproducing-table-1" class="nav-link" data-scroll-target="#reproducing-table-1">Reproducing Table 1</a></li>
13021301
<li>
1303-
<a href="#what-svytable1-does" id="toc-what-svytable1-does" class="nav-link" data-scroll-target="#what-svytable1-does">What <code>svytable1</code> Does</a>
1302+
<a href="#reproducing-table-1" id="toc-reproducing-table-1" class="nav-link" data-scroll-target="#reproducing-table-1">Reproducing Table 1</a>
13041303
<ul class="collapse">
1305-
<li><a href="#key-operations" id="toc-key-operations" class="nav-link" data-scroll-target="#key-operations">Key Operations</a></li>
1304+
<li><a href="#what-svytable1-does" id="toc-what-svytable1-does" class="nav-link" data-scroll-target="#what-svytable1-does">What <code>svytable1</code> Does</a></li>
1305+
<li><a href="#key-svytable1-operations" id="toc-key-svytable1-operations" class="nav-link" data-scroll-target="#key-svytable1-operations">Key <code>svytable1</code> Operations</a></li>
13061306
<li><a href="#what-the-asterisk-means" id="toc-what-the-asterisk-means" class="nav-link" data-scroll-target="#what-the-asterisk-means">What the Asterisk (*) Means</a></li>
1307-
<li><a href="#estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions" id="toc-estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions" class="nav-link" data-scroll-target="#estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions">Estimates are suppressed if they fail one or more NCHS reliability rules (for proportions):</a></li>
1307+
<li><a href="#nchs-reliability-rules-for-proportions" id="toc-nchs-reliability-rules-for-proportions" class="nav-link" data-scroll-target="#nchs-reliability-rules-for-proportions">NCHS reliability rules (for proportions)</a></li>
13081308
<li><a href="#reliability-metrics-table" id="toc-reliability-metrics-table" class="nav-link" data-scroll-target="#reliability-metrics-table">Reliability Metrics Table</a></li>
13091309
<li><a href="#format-and-combine-the-tables" id="toc-format-and-combine-the-tables" class="nav-link" data-scroll-target="#format-and-combine-the-tables">Format and combine the tables</a></li>
13101310
</ul>
@@ -1425,10 +1425,10 @@ <h1 class="title">NHANES: Reliability Standards</h1>
14251425
</div>
14261426
</section><section id="reproducing-table-1" class="level2"><h2 class="anchored" data-anchor-id="reproducing-table-1">Reproducing Table 1</h2>
14271427
<p>This section reproduces the unweighted sample sizes shown in Flegal et al.’s Table 1. The code first generates separate summary tables for all participants, men, and women. It then performs several formatting steps to combine these into a single table.</p>
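The difference between the unweighted sample sizes reproduced here and the survey-weighted proportions that a "Table 1" usually reports can be sketched in base R. The data frame, its `age_group` and `wt` columns, and all values are hypothetical. The sketch ignores strata and clusters, so it gives point estimates only; design-based standard errors would need the full survey design.

```r
# Hypothetical data: six participants, each with an age group and a survey weight.
dat <- data.frame(
  age_group = factor(c("20-39", "20-39", "40-59", "40-59", "40-59", ">=60"),
                     levels = c("20-39", "40-59", ">=60")),
  wt = c(1000, 3000, 2000, 2000, 1000, 1000)
)

# Unweighted sample size per category: 2, 3, 1 participants.
n_unweighted <- table(dat$age_group)

# Weighted proportion per category: category weight total / overall weight total.
p_weighted <- tapply(dat$wt, dat$age_group, sum) / sum(dat$wt)

n_unweighted
round(100 * p_weighted, 1)   # 40, 50, 10 percent
```

The unweighted counts and the weighted percentages answer different questions: the first describes the sample, the second estimates the population.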
1428-
</section><section id="what-svytable1-does" class="level2"><h2 class="anchored" data-anchor-id="what-svytable1-does">What <code>svytable1</code> Does</h2>
1429-
<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data. It is specifically designed to produce publication-ready results that align with <strong>NCHS Data Presentation Standards</strong> for reliability.</p>
1430-
<section id="key-operations" class="level3"><h3 class="anchored" data-anchor-id="key-operations">Key Operations</h3>
1431-
<p>When you call <code>svytable1</code>, it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
1428+
<section id="what-svytable1-does" class="level3"><h3 class="anchored" data-anchor-id="what-svytable1-does">What <code>svytable1</code> Does</h3>
1429+
<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
1430+
</section><section id="key-svytable1-operations" class="level3"><h3 class="anchored" data-anchor-id="key-svytable1-operations">Key <code>svytable1</code> Operations</h3>
1431+
<p>When you call <code>svytable1</code> (source: <a href="https://github.com/ehsanx/svyTable1">svyTable1 on GitHub</a>), it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
14321432
<ol type="1">
14331433
<li><p><strong>Calculates Proportions</strong><br>
14341434
It summarizes categorical variables (like <code>Age</code>) by calculating the proportion of participants in each category (e.g., 20–39, 40–59, ≥60).</p></li>
@@ -1447,7 +1447,8 @@ <h1 class="title">NHANES: Reliability Standards</h1>
14471447
When <code>reliability_checks = TRUE</code>, the function evaluates each estimate against NCHS Data Presentation Standards. These checks prevent publication of unstable or statistically unreliable estimates.</p></li>
14481448
</ol></section><section id="what-the-asterisk-means" class="level3"><h3 class="anchored" data-anchor-id="what-the-asterisk-means">What the Asterisk (*) Means</h3>
14491449
<p>An asterisk (<code>*</code>) in the output tables indicates <strong>suppression</strong>: the estimate was judged statistically unreliable, so the function hides its value to avoid misinterpretation.</p>
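The suppression step can be illustrated with a small base-R sketch. This is not the `svytable1` implementation: `suppress_estimate()` is a hypothetical helper that applies only the unweighted-sample-size rule (`fail_n_30`) from the NCHS rules listed in this section, whereas `svytable1` evaluates several rules. The proportions and sample sizes are made up.

```r
# Hypothetical helper: format a proportion as a percentage, or return "*"
# when the estimate fails a reliability rule. Only the n < 30 rule is applied.
suppress_estimate <- function(p, n, min_n = 30) {
  fail_n_30 <- n < min_n
  ifelse(fail_n_30, "*", sprintf("%.1f", 100 * p))
}

props <- c(0.412, 0.354, 0.233)   # hypothetical weighted proportions
ns    <- c(512, 140, 18)          # hypothetical unweighted sample sizes

res <- suppress_estimate(props, ns)
res   # "41.2" "35.4" "*"
```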
1450-
</section><section id="estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions" class="level3"><h3 class="anchored" data-anchor-id="estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions">Estimates are suppressed if they fail one or more NCHS reliability rules (for proportions):</h3>
1450+
</section><section id="nchs-reliability-rules-for-proportions" class="level3"><h3 class="anchored" data-anchor-id="nchs-reliability-rules-for-proportions">NCHS reliability rules (for proportions)</h3>
1451+
<p>Estimates are suppressed if they fail one or more NCHS reliability rules (for proportions):</p>
14511452
<ul>
14521453
<li>
14531454
<strong>fail_n_30:</strong> The unweighted sample size (n) is fewer than 30 participants.<br>
@@ -2122,12 +2123,45 @@ <h1 class="title">NHANES: Reliability Standards</h1>
21222123
</table>
21232124
</div>
21242125
</div>
2125-
<p>Generally, the model shows limited reliability and predictive power. Most of the predictor variables, such as Age and smoking status, are not statistically significant (their p.value is high). This indicates that, for men in this dataset, these factors don’t have a clear, reliable association with obesity.</p>
2126-
<p>The few significant predictors are raceNon-Hispanic Asian and education&lt;High school. These coefficients are considered stable and reliable. The unreliability of the other terms is not caused by the variables being correlated with each other, as the multicollinearity check shows.</p>
2127-
<p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically insignificant coefficients (like Age40-59 and raceHispanic) have high RSEs, which is expected. However, the education&gt;High school coefficient is highly insignificant, but its RSE is flagged as “TRUE” for being unreliable. This is a great example of why RSE isn’t a primary tool for regression coefficientsit can be inflated by estimates close to zero, regardless of their precision.</p>
2126+
<p>Overall, the model shows limited reliability and predictive power. Most of the predictors, such as <code>Age</code> and <code>smoking</code>, are not statistically significant (their <code>p.value</code> is large). This indicates that, for men in this dataset, these factors do not have a clear, reliable association with obesity.</p>
2127+
<p>The significant predictors are <code>raceNon-Hispanic Asian</code> and <code>education&lt;High school</code>, and these coefficients can be considered stable and reliable. As the multicollinearity check shows, the unreliability of the other terms is not caused by correlation among the predictors.</p>
2128+
<p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically insignificant coefficients (like <code>Age40-59</code> and <code>raceHispanic</code>) have high RSEs, which is expected. The <code>education&gt;High school</code> coefficient is more instructive. Its p-value of 0.932 correctly tells you that the coefficient is not statistically significant and not reliably different from zero, yet its RSE is also flagged as unreliable (“TRUE”). The RSE is 100 × SE / |estimate|, which with the rounded values gives 100 × 0.147 / 0.013, roughly 1,130%; the reported 1109% reflects the unrounded values. This extremely high RSE is not the result of a large standard error but of a coefficient estimate very close to zero. Because the RSE equals 100 / |t|, where t is the estimate divided by its standard error, an inflated RSE carries no information beyond the test statistic and p-value; it simply reflects that the coefficient itself is minuscule. This is a good example of why RSE is not a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.</p>
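The RSE arithmetic for the <code>education&gt;High school</code> coefficient can be checked in a few lines. The sketch uses the rounded estimate (-0.013) and standard error (0.147) quoted in the text, so its RSE differs slightly from the reported 1109%. It also shows that the RSE is exactly 100 / |t|. The p-value uses a normal approximation, which is an assumption; the survey-weighted model may use a t reference distribution, so this value need not match 0.932 exactly.

```r
est <- -0.013   # rounded coefficient for education>High school (from the text)
se  <- 0.147    # rounded standard error (from the text)

rse    <- 100 * se / abs(est)       # relative standard error, in percent
t_val  <- est / se                  # Wald test statistic
p_norm <- 2 * pnorm(-abs(t_val))    # two-sided p-value, normal approximation

rse                # about 1131
100 / abs(t_val)   # the same number: RSE is just 100 / |t|
p_norm             # about 0.93
```

Since the RSE is a monotone function of |t|, ranking coefficients by RSE is the same as ranking them by their test statistics in reverse, which is why it adds nothing new for regression terms.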
21282129
</section><section id="check-for-multicollinearity" class="level3"><h3 class="anchored" data-anchor-id="check-for-multicollinearity">Check for Multicollinearity</h3>
2129-
<p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The Variance Inflation Factor (VIF) is used to detect this issue.</p>
2130-
<p>For categorical predictors such as Age, race, smoking, and education, we look at the Generalized VIF (GVIF). To make it comparable to a standard VIF, we look at the GVIF^(1/(2*Df)) value.</p>
2130+
<p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make coefficient estimates unstable. The Variance Inflation Factor (VIF) is used to detect this issue.</p>
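A minimal sketch of what the VIF measures, using simulated data: the variables `x1` and `x2` and their correlation of 0.9 are assumptions chosen for illustration. For predictor j, VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors.

```r
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)   # population correlation with x1 is 0.9

# R^2 from regressing x1 on the other predictor(s)
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

vif_x1         # near 1 / (1 - 0.81), i.e. roughly 5
sqrt(vif_x1)   # factor by which x1's coefficient SE is inflated vs. uncorrelated predictors
```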
2131+
<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, so that values are comparable across terms with different degrees of freedom. For a term with a single coefficient (Df = 1), it equals the square root of the traditional VIF.</p>
2132+
<section id="why-gvif12df-is-necessary-for-categorical-variables" class="level4"><h4 class="anchored" data-anchor-id="why-gvif12df-is-necessary-for-categorical-variables">Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables</h4>
2133+
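<p>A categorical variable with <em>k</em> levels (e.g., <em>race</em>) is typically represented in a regression model by <em>k</em> − 1 <strong>dummy variables</strong>. These dummy variables are inherently correlated because they all describe the same categorical feature, so ordinary VIFs computed for each dummy separately can be high and depend on which level is chosen as the reference. The <strong>GVIF</strong> instead treats the set of dummies as a single term, but its raw value measures the inflation of a Df-dimensional confidence region, so it cannot be compared directly with a single-coefficient VIF.</p>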
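The built-in correlation among dummy variables is easy to see with `model.matrix()`. The race variable and its group sizes below are made up; the point is that a 4-level factor yields 3 dummy columns, and because each observation belongs to exactly one level, those columns are negatively correlated with each other.

```r
# Hypothetical race variable with 4 levels and made-up group sizes.
race <- factor(rep(c("White", "Black", "Hispanic", "Asian"),
                   times = c(40, 30, 20, 10)))

mm      <- model.matrix(~ race)   # intercept + (k - 1) dummy columns
dummies <- mm[, -1]               # drop the intercept column

ncol(dummies)                     # 3 dummies for k = 4 levels
round(cor(dummies), 2)            # all off-diagonal correlations are negative
```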
2134+
<p>The adjustment GVIF<sup>1/(2×Df)</sup> standardizes the GVIF value. It reduces the measure from a <em>hypervolume</em> of confidence for multiple coefficients down to a linear measure, on the same scale as √VIF for a single coefficient. Here, <strong>Df</strong> is the degrees of freedom for the categorical term, which equals <em>k</em> − 1, the number of dummy variables.</p>
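To make the definition concrete, here is a base-R sketch of the Fox and Monette (1992) formula GVIF = det(R11) det(R22) / det(R). R is the correlation matrix of the coefficient estimates (intercept excluded), R11 its block for the term's coefficients, and R22 the block for all other coefficients. This mirrors the approach `car::vif()` takes, but `gvif_table()` is a hypothetical helper written for illustration, and the simulated data are made up. The final lines check that for a one-df term the GVIF equals the classical VIF and the scaled value equals √VIF.

```r
# Hypothetical helper: GVIF, Df, and GVIF^(1/(2*Df)) for every term of an lm fit.
gvif_table <- function(fit) {
  assign <- attr(model.matrix(fit), "assign")          # term index per column (0 = intercept)
  keep   <- assign != 0
  R      <- cov2cor(vcov(fit))[keep, keep, drop = FALSE]  # coefficient correlations
  assign <- assign[keep]
  labels <- attr(terms(fit), "term.labels")
  det_R  <- det(R)
  out <- t(sapply(seq_along(labels), function(j) {
    idx  <- which(assign == j)                         # columns belonging to term j
    gvif <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det_R
    dof  <- length(idx)
    c(GVIF = gvif, Df = dof, `GVIF^(1/(2*Df))` = gvif^(1 / (2 * dof)))
  }))
  rownames(out) <- labels
  out
}

# Simulated data: a continuous predictor x that is correlated with a 3-level factor.
set.seed(1)
n   <- 300
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x   <- rnorm(n) + 0.8 * (grp == "c")
z   <- rnorm(n)
y   <- 1 + x + as.integer(grp) + z + rnorm(n)
fit <- lm(y ~ x + grp + z)

tab <- gvif_table(fit)
tab

# For the one-df term x, GVIF equals the classical VIF from regressing x on the rest.
vif_x <- 1 / (1 - summary(lm(x ~ grp + z))$r.squared)
c(gvif = tab["x", "GVIF"], vif = vif_x)
```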
2135+
<section id="acceptable-ranges-and-interpretation" class="level5"><h5 class="anchored" data-anchor-id="acceptable-ranges-and-interpretation">Acceptable Ranges and Interpretation</h5>
2136+
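<p>GVIF<sup>1/(2×Df)</sup> is commonly interpreted with the same rule-of-thumb bands as the standard VIF, shown below. Because it is on the √VIF scale, applying these cutoffs to it is more lenient than applying them to a VIF; squaring it (giving GVIF<sup>1/Df</sup>) returns it to a VIF-like scale.</p>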
2137+
<table class="caption-top table">
2138+
<colgroup>
2139+
<col style="width: 25%">
2140+
<col style="width: 74%">
2141+
</colgroup>
2142+
<thead><tr class="header">
2143+
<th>GVIF<sup>1/(2×Df)</sup> Range</th>
2144+
<th>Interpretation</th>
2145+
</tr></thead>
2146+
<tbody>
2147+
<tr class="odd">
2148+
<td><strong>1</strong></td>
2149+
<td>No correlation among predictors.</td>
2150+
</tr>
2151+
<tr class="even">
2152+
<td><strong>1 – 2.5</strong></td>
2153+
<td>Low to moderate correlation — generally acceptable (typical for most well-specified models).</td>
2154+
</tr>
2155+
<tr class="odd">
2156+
<td><strong>2.5 – 5</strong></td>
2157+
<td>Moderate to high correlation — may warrant further investigation.</td>
2158+
</tr>
2159+
<tr class="even">
2160+
<td><strong>&gt; 5</strong></td>
2161+
<td>Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. A more conservative cutoff of <strong>4</strong> is sometimes used.</td>
2162+
</tr>
2163+
</tbody>
2164+
</table>
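The bands in the table can be encoded as a small lookup. `classify_gvif()` is a hypothetical helper; its cutoffs come from the table, treating each upper bound as inclusive (an assumption, since the table does not say), and the input values are made up so that each band appears once.

```r
# Hypothetical helper: map GVIF^(1/(2*Df)) values onto the bands in the table.
classify_gvif <- function(x) {
  cut(x,
      breaks = c(-Inf, 1, 2.5, 5, Inf),   # intervals (-Inf,1], (1,2.5], (2.5,5], (5,Inf)
      labels = c("no correlation", "low to moderate",
                 "moderate to high", "potentially severe"))
}

bands <- classify_gvif(c(1, 1.4, 3.2, 6.8))   # made-up values, one per band
bands
```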
21312165
<div class="cell">
21322166
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>vif_values <span class="ot">&lt;-</span> <span class="fu">vif</span>(fit_men_obese)</span>
21332167
<span id="cb10-2"><a href="#cb10-2"></a><span class="fu">print</span>(vif_values)</span>
@@ -2137,8 +2171,8 @@ <h1 class="title">NHANES: Reliability Standards</h1>
21372171
<span id="cb10-6"><a href="#cb10-6"></a><span class="co">#&gt; smoking 3.435829 2 1.361469</span></span>
21382172
<span id="cb10-7"><a href="#cb10-7"></a><span class="co">#&gt; education 6.381028 2 1.589361</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
21392173
</div>
2140-
<p>The key values in the GVIF^(1/(2*Df)) column are all low (below 2.5). This confirms that your predictor variables are independent enough from one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.</p>
2141-
</section><section id="formatting-the-table" class="level3"><h3 class="anchored" data-anchor-id="formatting-the-table">Formatting the Table</h3>
2174+
<p>The values in the <code>GVIF^(1/(2*Df))</code> column are all low (below 2.5). This indicates that the predictor variables are sufficiently independent of one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.</p>
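As a check on the output above, the scaled column can be recomputed from the printed GVIF and Df values. Only the <code>smoking</code> and <code>education</code> rows are visible here, so the sketch uses just those two.

```r
# GVIF and Df as printed by vif(fit_men_obese) for two of the terms.
gvif <- c(smoking = 3.435829, education = 6.381028)
dof  <- c(smoking = 2,        education = 2)

scaled <- gvif^(1 / (2 * dof))   # GVIF^(1/(2*Df))
round(scaled, 6)                 # should agree with the printed 1.361469 and 1.589361
all(scaled < 2.5)                # TRUE: both fall below the 2.5 cutoff
```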
2175+
</section></section></section><section id="formatting-the-table" class="level3"><h3 class="anchored" data-anchor-id="formatting-the-table">Formatting the Table</h3>
21422176
<div class="cell">
21432177
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1"></a><span class="co"># --- Use the helper function to format results from each model ---</span></span>
21442178
<span id="cb11-2"><a href="#cb11-2"></a>men_obese_res <span class="ot">&lt;-</span> <span class="fu">format_publish_output</span>(<span class="fu">publish</span>(fit_men_obese),</span>
