blog.html
Lines changed: 3 additions & 3 deletions
@@ -241,7 +241,7 @@ <h1 class="title">Blog</h1>
241
241
</thead>
242
242
<tbody class="list">
243
243
244
-
<tr data-index="0" data-categories="T3Blbi1Tb3VyY2UlMjBTa2lsbHM=" data-listing-date-sort="1760572800000" data-listing-file-modified-sort="1762256783769" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1923" data-listing-title-sort="How to Choose an R Package for Statistics" data-listing-filename-sort="package-review.qmd">
244
+
<tr data-index="0" data-categories="T3Blbi1Tb3VyY2UlMjBTa2lsbHM=" data-listing-date-sort="1760572800000" data-listing-file-modified-sort="1762259417550" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1955" data-listing-title-sort="How to Choose an R Package for Statistics" data-listing-filename-sort="package-review.qmd">
<li><p>Does the package have adequate documentation?</p>
234
235
<ul>
235
236
<li><p>Are there references to external peer reviewed papers?</p></li>
236
-
<li><p>Is there a website / vignettes?</p></li>
237
+
<li><p>Is there a website or vignettes?</p></li>
237
238
</ul></li>
238
239
<li><p>Is there a way to report bugs?</p></li>
239
240
<li><p>Can the packages handle different edge cases?</p></li>
240
241
<li><p>Does the package have a lot of dependencies / unusual dependencies?</p></li>
241
-
<li><p>Look at community adoption?</p></li>
242
+
<li><p>How much has the statistical community or industry adopted it?</p></li>
242
243
</ul>
243
244
<p>Using this checklist can help you quickly and consistently get a sense of a package before spending time looking into the code directly. Let’s see how this works in practice.</p>
<p>For this, we are going to look at the Wilcoxon Rank-Sum test and the associated Hodges-Lehmann confidence interval. After googling a bit, I found three different packages that do a Wilcoxon Rank-Sum p-value and Hodges-Lehmann CI:</p>
248
+
<p>To think this through, we are going to look at a specific worked example - the Wilcoxon Rank-Sum test and the associated Hodges-Lehmann confidence interval. After googling for a little while, I found three different packages that do a Wilcoxon Rank-Sum p-value and Hodges-Lehmann CI:</p>
248
249
<ol type="1">
249
-
<li><p>{stats} (part of base R)</p></li>
250
-
<li><p>{pairwiseCI}</p></li>
251
-
<li><p>{coin}</p></li>
250
+
<li><p><strong>stats</strong> (part of base R)</p></li>
251
+
<li><p><strong>pairwiseCI</strong></p></li>
252
+
<li><p><strong>coin</strong></p></li>
252
253
</ol>
253
-
<p>Great! I might be kind of done, because I tend to favor base R stats functions, but as I start looking into this, I found the {stats} function can’t handle ties if I want the exact methods. So I need to look into and compare the {pairwiseCI} and {coin} packages.</p>
254
+
<p>Great! I might be kind of done, because I tend to favor base R stats functions, but as I start looking into this, I found the <strong>stats</strong> function can’t handle ties if I want the exact methods. So I need to look into and compare the <strong>pairwiseCI</strong> and <strong>coin</strong> packages.</p>
<p>You often find that differences between packages and software show up when there are ties, missing data, and/or extreme values, so it is good to try to include these in the dataset you are using to compare.</p>
265
266
</div>
266
267
</div>
267
-
<p>Now I need to choose between {pairwaiseCI} and {coin}. I could just run the model in both and see if the results match, but that will be a lot of work. So before I get started I want to go through our checklist.</p>
268
+
<p>Now I need to choose between <strong>pairwaiseCI</strong> and <strong>coin</strong>. I could just run the model in both and see if the results match, but that will be a lot of work. So before I get started I want to go through our checklist.</p>
268
269
<p>Let’s pull up the CRAN index pages for each of these packages and see if we can figure out which package we should use for this analysis.</p>
<p>Starting with <strong>pairwiseCI</strong>, the <a href="https://cran.r-project.org/web/packages/pairwiseCI/index.html">index</a> page looks like this:</p>
</p><p>On the index there are no references, but there might be some references on the function level. Really it just means they haven’t published a paper about this package.</p>
305
306
<p></p></li>
306
-
<li><p>Is there a website / Vignettes?</p>
307
+
<li><p>Is there a website or vignettes?</p>
307
308
<p style="color:blue;">
308
309
</p><p>There isn’t a website or vignettes. This means all the documentation will be limited to just the functions, which can be harder to understand.</p>
<li><p>Does the package have a lot of dependencies / unusual dependencies?</p>
320
321
<p style="color:blue;">
321
-
</p><p>It looks like this package only has two dependencies, {MCPAN} and, interestingly, {coin}, the other package we are looking at.</p>
322
+
</p><p>It looks like this package only has two dependencies, <strong>MCPAN</strong> and, interestingly, <strong>coin</strong>, the other package we are looking at.</p>
322
323
<p></p></li>
323
324
</ul>
324
-
<p>Okay, having gone through all but the final question, I would say I feel not amazing about the package, but if it was my only option I would still try to use it. The author gives me confidence in the package, but other things like documentation and last update date, make me a bit nervous about this package.</p>
325
+
<p>Okay, having gone through all but the final question, I would say I do not feel great about the package, but if it was my only option I would still try to use it. The author gives me confidence in the package, but other things like documentation and last update date make me a bit nervous about this package.</p>
<p>Now on to <strong>coin</strong> with the same questions. The <a href="https://cran.r-project.org/web/packages/coin/index.html">index</a> page is as follows:</p>
<li><p>Can the packages handle different edge cases?</p>
372
373
<p style="color:blue;">
373
-
</p><p>Same as above. It doesn’t directly said it can handle our edge case, but I know it can.</p>
374
+
</p><p>Same as above. It doesn’t directly say it can handle our edge case, but I know it can.</p>
374
375
<p></p></li>
375
376
<li><p>Does the package have a lot of dependencies / unusual dependencies?</p>
376
377
<p style="color:blue;">
377
378
</p><p>While this package has more dependencies, all the dependencies are very standard and do not raise any red flags for me</p>
378
379
<p></p></li>
379
380
</ul>
380
-
<p>Having gone through most the questions, I am fairly confident in saying I want to use {coin} to investigate this method rather than {pairwiseCI}. For almost all the questions {coin} looks slightly better than {pairwiseCI} and really just has a larger accumulation of evidence of quality. But, I haven’t answered the last question in my checklist for either these packages. What about community adoption? It can be a bit hard to look at directly, but I tend to use a few different ways.</p>
381
-
<p>First, staying on the CRAN index page for the package, I look at the Reverse Dependencies. This section gets split into three parts, “Reverse depends”, “Reverse imports”, and “Reverse suggests” which explains how the other packages are using the package. In terms of community adoption, it doesn’t matter if other packages are depending, importing or suggesting the package, all that matters is they are using it. <strong>Note:</strong> This section only appears if other packages on CRAN use the package.</p>
382
-
<p>For these two packages, only {coin} has this section and we can see there are many other packages that use {coin}.</p>
381
+
<p>Having gone through the questions, I am fairly confident in saying I want to use <strong>coin</strong> to investigate this method rather than <strong>pairwiseCI</strong>. For almost all of the questions <strong>coin</strong> looks slightly better than <strong>pairwiseCI</strong> and really just has a larger accumulation of evidence of quality. But, I haven’t answered the last question in my checklist for either of these packages - How much has the statistical community or industry adopted it? It can be a bit hard to look at directly, but I tend to investigate this a few different ways.</p>
382
+
<p>First, staying on the CRAN index page for the package, I look at the Reverse Dependencies. This section gets split into three parts, “Reverse depends”, “Reverse imports”, and “Reverse suggests” which explains how the other packages are using the package. In terms of community adoption, it doesn’t matter if other packages are depending on, importing or suggesting the package. All that matters is that they are using it. <strong>Note:</strong> This section only appears if other packages on CRAN use the package.</p>
383
+
<p>For these two packages, only <strong>coin</strong> has this section and we can see there are many other packages that use <strong>coin</strong>.</p>
<p>And you can see {coin} is much more popular than {pairwiseCI}.</p>
406
-
<p>So with all of this information, I think starting with {coin} is going to be the best use of my time.</p>
407
-
<p>When looking at the number of downloads, you can look over a longer period like over the last month (by using the <code>when</code> parameter) or you can look between specific dates (by using the <code>from</code> and <code>to</code> parameters). But, it will give you the download numbers for each day, which you will need to summaries. These day-by-day numbers can be very helpful to look at trends, especially when there is a new package that is getting rapidly adopted.</p>
408
-
<p>The checklist isn’t intended to replace a full review of the package for an GxP workflows. But, when just trying to decide which package to look into for a particular stats method it can be helpful.</p>
409
-
<p>In summary, selecting the appropriate R package for statistical analyses is hard. Google, isn’t perfect and so it worth finding a few packages and going through this checklist. By taking a few minutes to consider factors like maintenance, documentation, and community adoption can save you time in the long run.</p>
406
+
<p>And you can see <strong>coin</strong> is much more popular than <strong>pairwiseCI</strong>.</p>
407
+
<p>So with all of this information, I think starting with <strong>coin</strong> is going to be the best use of my time.</p>
408
+
<p>When looking at the number of downloads, you can look over a longer period like over the last month (by using the <code>when</code> parameter) or you can look between specific dates (by using the <code>from</code> and <code>to</code> parameters). But, it will give you the download numbers for each day, which you will need to summarise. These day-by-day numbers can be very helpful to look at trends, especially when there is a new package that is getting rapidly adopted.</p>
409
+
<p>Please note that this checklist isn’t intended to replace a full review of the package for GxP workflows. But it is intended to be helpful when starting to think through the issues involved in package choice particularly for statistical methods.</p>
<p>In summary, selecting the appropriate R package for statistical analyses is hard. Google isn’t perfect and so it is worth finding a few packages and going through this checklist. Taking a few minutes to consider factors like maintenance, documentation, and community adoption can save you time in the long run.</p>
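The post notes that the download numbers come back day by day and need to be summarised. As a rough sketch of that step, the R snippet below sums daily counts into one total per package. It assumes the input has the `date`, `count` and `package` columns that `cranlogs::cran_downloads()` returns. The helper name `total_downloads` and the small `example_daily` data frame are made up for illustration and are not real download figures.

```r
# Hypothetical helper: sum day-by-day download counts into one total per package.
# `daily` is assumed to have the columns date, count and package,
# as returned by cranlogs::cran_downloads().
total_downloads <- function(daily) {
  totals <- aggregate(count ~ package, data = daily, FUN = sum)
  # Most downloaded package first
  totals[order(-totals$count), ]
}

# Made-up data, only to show the expected input shape (not real download numbers)
example_daily <- data.frame(
  date = as.Date(c("2025-01-01", "2025-01-02", "2025-01-01", "2025-01-02")),
  count = c(10, 20, 1, 2),
  package = c("pkgA", "pkgA", "pkgB", "pkgB")
)
print(total_downloads(example_daily))

# With network access and the cranlogs package installed, the real call
# might look like this (left commented out so the sketch runs offline):
# daily <- cranlogs::cran_downloads(packages = c("coin", "pairwiseCI"),
#                                   when = "last-month")
# total_downloads(daily)
```

Swapping `when = "last-month"` for `from` and `to` dates would give the same daily data frame over a custom window, so the same helper could be reused there.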