<li><a href="#setup-and-data-preparation" id="toc-setup-and-data-preparation" class="nav-link" data-scroll-target="#setup-and-data-preparation">Setup and Data Preparation</a></li>
<li><a href="#what-the-asterisk-means" id="toc-what-the-asterisk-means" class="nav-link" data-scroll-target="#what-the-asterisk-means">What the Asterisk (*) Means</a></li>
<li><a href="#estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions" id="toc-estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions" class="nav-link" data-scroll-target="#estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions">Estimates are suppressed if they fail one or more NCHS reliability rules (for proportions):</a></li>
<li><a href="#format-and-combine-the-tables" id="toc-format-and-combine-the-tables" class="nav-link" data-scroll-target="#format-and-combine-the-tables">Format and combine the tables</a></li>
<p>This section reproduces the unweighted sample sizes shown in Flegal et al.’s Table 1. The code first generates separate summary tables for all participants, men, and women. It then performs several formatting steps to combine these into a single table.</p>
<p>The <code>svytable1</code> function creates a descriptive summary table—commonly referred to as a <strong>“Table 1”</strong>—from complex survey data. It is specifically designed to produce publication-ready results that align with <a href="https://wwwn.cdc.gov/nchs/nhanes/tutorials/reliabilityofestimates.aspx">NCHS Data Presentation Standards for reliability</a>.</p>
<p>When you call <code>svytable1</code> (<a href="https://github.com/ehsanx/svyTable1">link</a>), it performs the following steps for each analysis (for example, for all participants, men, and women):</p>
<li><p>It summarizes categorical variables (like <code>Age</code>) by calculating the proportion of participants in each category (e.g., 20–39, 40–59, ≥60).</p></li>
<li><p>When <code>reliability_checks = TRUE</code>, the function evaluates each estimate against NCHS Data Presentation Standards. These checks prevent publication of unstable or statistically unreliable estimates.</p></li>
</ol></section><section id="what-the-asterisk-means" class="level3"><h3 class="anchored" data-anchor-id="what-the-asterisk-means">What the Asterisk (*) Means</h3>
<p>An asterisk (<code>*</code>) in these table outputs indicates <strong>suppression</strong>: the estimate was determined to be statistically unreliable. The function hides the unreliable value to avoid misinterpretation.</p>
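<p>As a rough illustration, suppression can be modeled as masking any estimate that fails a reliability predicate. The following is a hypothetical Python sketch, not the package’s R internals; the function name and the 5% threshold are invented for illustration only:</p>

```python
# Hypothetical sketch of suppression: estimates failing a reliability
# predicate are replaced by "*" before the table is rendered.
def suppress(estimates, is_reliable):
    """Return a copy of `estimates` with unreliable values masked.

    estimates   -- dict mapping a cell label to a numeric estimate
    is_reliable -- predicate returning True when an estimate passes
                   every reliability rule
    """
    return {label: (value if is_reliable(value) else "*")
            for label, value in estimates.items()}

# Toy rule (invented): treat proportions under 5% as unreliable.
cells = {"20-39": 0.41, "40-59": 0.02, ">=60": 0.35}
print(suppress(cells, lambda p: p >= 0.05))
# {'20-39': 0.41, '40-59': '*', '>=60': 0.35}
```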
</section><section id="estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions" class="level3"><h3 class="anchored" data-anchor-id="estimates-are-suppressed-if-they-fail-one-or-more-nchs-reliability-rules-for-proportions">Estimates are suppressed if they fail one or more NCHS reliability rules (for proportions):</h3>
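<p>The thresholds in this sketch paraphrase the commonly cited 2017 NCHS presentation standards for proportions (a minimum denominator plus Korn–Graubard confidence-interval width rules). They are an approximation written in Python for illustration, not the package’s implementation; consult the linked standards for the authoritative criteria.</p>

```python
# Hedged sketch of NCHS-style reliability checks for a proportion.
# The thresholds are assumptions paraphrasing the 2017 NCHS standards:
#   - denominator sample size of at least 30,
#   - Korn-Graubard CI absolute width no greater than 0.30,
#   - relative CI width (width / proportion) no greater than 130%.
def proportion_is_reliable(n, ci_lower, ci_upper, p,
                           min_n=30, max_abs_width=0.30, max_rel_width=1.30):
    if n < min_n:                              # denominator rule
        return False
    width = ci_upper - ci_lower
    if width > max_abs_width:                  # absolute width rule
        return False
    if p > 0 and width / p > max_rel_width:    # relative width rule
        return False
    return True

print(proportion_is_reliable(n=120, ci_lower=0.30, ci_upper=0.45, p=0.37))  # True
print(proportion_is_reliable(n=20, ci_lower=0.01, ci_upper=0.40, p=0.10))   # False
```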
<p>Generally, the model shows limited reliability and predictive power. Most of the predictor variables, such as <code>Age</code> and <code>smoking status</code>, are not statistically significant (their <code>p.value</code> is high). This indicates that, for men in this dataset, these factors don’t have a clear, reliable association with obesity.</p>
<p>The few significant predictors are <code>raceNon-Hispanic Asian</code> and <code>education&lt;High school</code>. These coefficients are considered stable and reliable. The unreliability of the other terms is not caused by the variables being correlated with each other, as the multicollinearity check shows.</p>
<p><strong>The RSE Can Be Misleading for Regression</strong>: Notice that some statistically insignificant coefficients (like <code>Age40-59</code> and <code>raceHispanic</code>) have high RSEs, which is expected. However, consider the <code>education&gt;High school</code> coefficient: its p-value of 0.932 correctly tells you that it is not statistically significant and not reliably different from zero, yet its RSE is flagged as “TRUE” for being unreliable. The RSE is calculated as |0.147 / -0.013| × 100 ≈ 1109%. Here, the extremely high RSE is not the result of a large standard error but of the coefficient estimate being very close to zero. An inflated RSE doesn’t provide any new or more accurate information beyond the p-value; it simply reflects that the coefficient itself is minuscule. This is a great example of why RSE isn’t a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.</p>
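<p>To make the arithmetic concrete, here is a small Python sketch using the rounded values quoted above. With these rounded inputs the RSE comes out near 1131% rather than the reported 1109%, presumably because the report used unrounded estimates; either way, the point stands that a near-zero coefficient inflates the RSE even when the standard error is modest.</p>

```python
# Relative standard error of a coefficient: RSE = |SE / estimate| * 100.
def rse(estimate, std_error):
    return abs(std_error / estimate) * 100

# Near-zero coefficient (rounded values from the discussion above).
print(round(rse(-0.013, 0.147)))  # 1131 -- enormous, despite a small SE
# The same standard error with a larger coefficient gives a modest RSE.
print(round(rse(0.50, 0.147)))    # 29
```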
</section><section id="check-for-multicollinearity" class="level3"><h3 class="anchored" data-anchor-id="check-for-multicollinearity">Check for Multicollinearity</h3>
<p>Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The Variance Inflation Factor (VIF) is used to detect this issue.</p>
<p><code>GVIF^(1/(2*Df))</code> is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.</p>
<section id="why-gvif12df-is-necessary-for-categorical-variables" class="level4"><h4 class="anchored" data-anchor-id="why-gvif12df-is-necessary-for-categorical-variables">Why GVIF<sup>1/(2×Df)</sup> is Necessary for Categorical Variables</h4>
<p>A categorical variable with <em>k</em> levels (e.g., <em>race</em>) is typically represented in a regression model by <em>k</em> − 1 <strong>dummy variables</strong>. Dummy variables are inherently correlated because they all describe the same categorical feature. This intrinsic relationship would lead to very high — but misleading — <strong>GVIF</strong> scores if the overall GVIF were interpreted directly.</p>
<p>The adjustment GVIF<sup>1/(2×Df)</sup> standardizes the GVIF value. It reduces the measure from a <em>hypervolume</em> of confidence for multiple coefficients down to a linear measure, making it comparable to the single VIF value used for continuous predictors. Here, <strong>Df</strong> is the degrees of freedom for the categorical term, which equals <em>k</em> − 1, the number of dummy variables.</p>
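<p>The scaling itself is plain arithmetic. A short Python sketch, checked against the <code>education</code> term reported by <code>vif()</code> in this document (GVIF = 6.381028, Df = 2):</p>

```python
# Scale a Generalized VIF so it is comparable to an ordinary VIF:
# scaled = GVIF^(1 / (2 * Df)), where Df is the number of dummy variables.
def scaled_gvif(gvif, df):
    return gvif ** (1.0 / (2.0 * df))

# education term from the vif() output in this document.
print(round(scaled_gvif(6.381028, 2), 4))  # 1.5894, i.e. well below 2.5
```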
<section id="acceptable-ranges-and-interpretation" class="level5"><h5 class="anchored" data-anchor-id="acceptable-ranges-and-interpretation">Acceptable Ranges and Interpretation</h5>
<p>The interpretation of GVIF<sup>1/(2×Df)</sup> follows the same guidelines as the standard VIF:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 74%">
</colgroup>
<thead><tr class="header">
<th>GVIF<sup>1/(2×Df)</sup> Range</th>
<th>Interpretation</th>
</tr></thead>
<tbody>
<tr class="odd">
<td><strong>1</strong></td>
<td>No correlation among predictors.</td>
</tr>
<tr class="even">
<td><strong>1 – 2.5</strong></td>
<td>Low to moderate correlation — generally acceptable (typical for most well-specified models).</td>
</tr>
<tr class="odd">
<td><strong>2.5 – 5</strong></td>
<td>Moderate to high correlation — may warrant further investigation.</td>
</tr>
<tr class="even">
<td><strong>&gt; 5</strong></td>
<td>Potentially severe multicollinearity. The predictor may have a strong overlap with others, obscuring its effect on the outcome. A more conservative cutoff of <strong>4</strong> is sometimes used.</td>
</tr>
</tbody>
</table>
<div class="cell">
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>vif_values <span class="ot">&lt;-</span> <span class="fu">vif</span>(fit_men_obese)</span>
<span id="cb10-7"><a href="#cb10-7"></a><span class="co">#> education 6.381028 2 1.589361</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>The key values in the <code>GVIF^(1/(2*Df))</code> column are all low (below 2.5). This confirms that your predictor variables are independent enough from one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.</p>
</section></section></section><section id="formatting-the-table" class="level3"><h3 class="anchored" data-anchor-id="formatting-the-table">Formatting the Table</h3>
<div class="cell">
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1"></a><span class="co"># --- Use the helper function to format results from each model ---</span></span>