Clarify how the Overall Score is calculated for each tool.

davewichers · davewichers · commit 62baf02553b4 · 2015-09-25T13:14:24.000-04:00
diff --git a/scorecard/Benchmark_v1.2beta_Scorecard_for_FBwFindSecBugs.html b/scorecard/Benchmark_v1.2beta_Scorecard_for_FBwFindSecBugs.html
@@ -125,8 +125,9 @@ <h2>Detailed Results</h2>
 <tr ><td>Weak Hash Algorithm</td><td>89</td><td>40</td><td>107</td><td>0</td><td>236</td><td>68.99%</td><td>0.00%</td><td>68.99%</td></tr>
 <tr class="success"><td>Weak Random Number</td><td>218</td><td>0</td><td>275</td><td>0</td><td>493</td><td>100.00%</td><td>0.00%</td><td>100.00%</td></tr>
 <tr class="danger"><td>XPath Injection</td><td>15</td><td>0</td><td>0</td><td>20</td><td>35</td><td>100.00%</td><td>100.00%</td><td>0.00%</td></tr>
-<th>Totals</th><th>1028</th><th>387</th><th>791</th><th>534</th><th>2740</th><th>77.67%</th><th>45.21%</th><th>32.46%</th></tr>
-</table>
+<tr><th>Totals*</th><th>1028</th><th>387</th><th>791</th><th>534</th><th>2740</th><th/><th/><th/></tr>
+<tr><th>Overall Results*</th><th/><th/><th/><th/><th/><th>77.67%</th><th>45.21%</th><th>32.46%</th></tr>
+</table><p>*The Overall Results are averages across all vulnerability categories. They cannot be computed from the values in the Totals row: doing so would give categories with more tests greater weight than categories with fewer tests. Instead, the TPR, FPR, and Score values of all categories are each summed and then divided by the number of vulnerability categories.</p>
 <p>
 
 
diff --git a/scorecard/Benchmark_v1.2beta_Scorecard_for_FindBugs.html b/scorecard/Benchmark_v1.2beta_Scorecard_for_FindBugs.html
@@ -125,8 +125,9 @@ <h2>Detailed Results</h2>
 <tr class="danger"><td>Weak Hash Algorithm</td><td>0</td><td>129</td><td>107</td><td>0</td><td>236</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
 <tr class="danger"><td>Weak Random Number</td><td>0</td><td>218</td><td>275</td><td>0</td><td>493</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
 <tr class="danger"><td>XPath Injection</td><td>0</td><td>15</td><td>20</td><td>0</td><td>35</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
-<th>Totals</th><th>153</th><th>1262</th><th>1190</th><th>135</th><th>2740</th><th>5.26%</th><th>5.46%</th><th>-0.19%</th></tr>
-</table>
+<tr><th>Totals*</th><th>153</th><th>1262</th><th>1190</th><th>135</th><th>2740</th><th/><th/><th/></tr>
+<tr><th>Overall Results*</th><th/><th/><th/><th/><th/><th>5.26%</th><th>5.46%</th><th>-0.19%</th></tr>
+</table><p>*The Overall Results are averages across all vulnerability categories. They cannot be computed from the values in the Totals row: doing so would give categories with more tests greater weight than categories with fewer tests. Instead, the TPR, FPR, and Score values of all categories are each summed and then divided by the number of vulnerability categories.</p>
 <p>
 
 
diff --git a/scorecard/Benchmark_v1.2beta_Scorecard_for_OWASP_ZAP.html b/scorecard/Benchmark_v1.2beta_Scorecard_for_OWASP_ZAP.html
@@ -125,8 +125,9 @@ <h2>Detailed Results</h2>
 <tr class="danger"><td>Weak Hash Algorithm</td><td>0</td><td>129</td><td>107</td><td>0</td><td>236</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
 <tr class="danger"><td>Weak Random Number</td><td>0</td><td>218</td><td>275</td><td>0</td><td>493</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
 <tr class="danger"><td>XPath Injection</td><td>0</td><td>15</td><td>20</td><td>0</td><td>35</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
-<th>Totals</th><th>245</th><th>1170</th><th>1324</th><th>1</th><th>2740</th><th>18.03%</th><th>0.04%</th><th>17.99%</th></tr>
-</table>
+<tr><th>Totals*</th><th>245</th><th>1170</th><th>1324</th><th>1</th><th>2740</th><th/><th/><th/></tr>
+<tr><th>Overall Results*</th><th/><th/><th/><th/><th/><th>18.03%</th><th>0.04%</th><th>17.99%</th></tr>
+</table><p>*The Overall Results are averages across all vulnerability categories. They cannot be computed from the values in the Totals row: doing so would give categories with more tests greater weight than categories with fewer tests. Instead, the TPR, FPR, and Score values of all categories are each summed and then divided by the number of vulnerability categories.</p>
 <p>
 
 
diff --git a/scorecard/Benchmark_v1.2beta_Scorecard_for_PMD.html b/scorecard/Benchmark_v1.2beta_Scorecard_for_PMD.html
@@ -125,8 +125,9 @@ <h2>Detailed Results</h2>
 <tr class="danger"><td>Weak Hash Algorithm</td><td>0</td><td>129</td><td>107</td><td>0</td><td>236</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
 <tr class="danger"><td>Weak Random Number</td><td>0</td><td>218</td><td>275</td><td>0</td><td>493</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
 <tr class="danger"><td>XPath Injection</td><td>0</td><td>15</td><td>20</td><td>0</td><td>35</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
-<th>Totals</th><th>0</th><th>1415</th><th>1325</th><th>0</th><th>2740</th><th>0.00%</th><th>0.00%</th><th>0.00%</th></tr>
-</table>
+<tr><th>Totals*</th><th>0</th><th>1415</th><th>1325</th><th>0</th><th>2740</th><th/><th/><th/></tr>
+<tr><th>Overall Results*</th><th/><th/><th/><th/><th/><th>0.00%</th><th>0.00%</th><th>0.00%</th></tr>
+</table><p>*The Overall Results are averages across all vulnerability categories. They cannot be computed from the values in the Totals row: doing so would give categories with more tests greater weight than categories with fewer tests. Instead, the TPR, FPR, and Score values of all categories are each summed and then divided by the number of vulnerability categories.</p>
 <p>
 
 
diff --git a/scorecard/Benchmark_v1.2beta_Scorecard_for_SonarQube_Java_Plugin.html b/scorecard/Benchmark_v1.2beta_Scorecard_for_SonarQube_Java_Plugin.html
@@ -125,8 +125,9 @@ <h2>Detailed Results</h2>
 <tr ><td>Weak Hash Algorithm</td><td>89</td><td>40</td><td>107</td><td>0</td><td>236</td><td>68.99%</td><td>0.00%</td><td>68.99%</td></tr>
 <tr class="success"><td>Weak Random Number</td><td>218</td><td>0</td><td>275</td><td>0</td><td>493</td><td>100.00%</td><td>0.00%</td><td>100.00%</td></tr>
 <tr class="danger"><td>XPath Injection</td><td>0</td><td>15</td><td>20</td><td>0</td><td>35</td><td>0.00%</td><td>0.00%</td><td>0.00%</td></tr>
-<th>Totals</th><th>607</th><th>808</th><th>1184</th><th>141</th><th>2740</th><th>50.36%</th><th>17.02%</th><th>33.34%</th></tr>
-</table>
+<tr><th>Totals*</th><th>607</th><th>808</th><th>1184</th><th>141</th><th>2740</th><th/><th/><th/></tr>
+<tr><th>Overall Results*</th><th/><th/><th/><th/><th/><th>50.36%</th><th>17.02%</th><th>33.34%</th></tr>
+</table><p>*The Overall Results are averages across all vulnerability categories. They cannot be computed from the values in the Totals row: doing so would give categories with more tests greater weight than categories with fewer tests. Instead, the TPR, FPR, and Score values of all categories are each summed and then divided by the number of vulnerability categories.</p>
 <p>
 
 
diff --git a/src/main/java/org/owasp/benchmark/score/report/Report.java b/src/main/java/org/owasp/benchmark/score/report/Report.java
@@ -203,13 +203,16 @@ else if (r.truePositiveRate > .7 && r.falsePositiveRate < .3)
 			if (!Double.isNaN(r.score))
 				totalScore += r.score;
 		}
-		sb.append("<th>" + "Totals" + "</th>");
+		sb.append("<tr><th>Totals*</th>");
 		sb.append("<th>" + totals.tp + "</th>");
 		sb.append("<th>" + totals.fn + "</th>");
 		sb.append("<th>" + totals.tn + "</th>");
 		sb.append("<th>" + totals.fp + "</th>");
 		int total = totals.tp + totals.fn + totals.tn + totals.fp;
 		sb.append("<th>" + total + "</th>");
+		sb.append("<th/><th/><th/></tr>\n");
+		
+		sb.append("<tr><th>Overall Results*</th><th/><th/><th/><th/><th/>");
 		double tpr = (totalTPR / scores.size());
 		sb.append("<th>" + new DecimalFormat("#0.00%").format(tpr) + "</th>");
 		double fpr = (totalFPR / scores.size());
@@ -218,6 +221,12 @@ else if (r.truePositiveRate > .7 && r.falsePositiveRate < .3)
 		sb.append("<th>" + new DecimalFormat("#0.00%").format(score) + "</th>");
 		sb.append("</tr>\n");
 		sb.append("</table>");
+		sb.append("<p>*The Overall Results are averages across all vulnerability categories. "
+				+ "They cannot be computed from the values in the Totals row: "
+				+ "doing so would give categories with more tests greater weight "
+				+ "than categories with fewer tests. Instead, the TPR, FPR, and Score values "
+				+ "of all categories are each summed and then divided by "
+				+ "the number of vulnerability categories.</p>");
 				
 		return sb.toString();
 	}
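
The distinction the footnote draws, averaging per-category rates rather than pooling the counts in the Totals row, can be sketched in Java. This is an illustrative sketch, not the code in Report.java: the class `OverallResultsSketch`, the `Category` type, and all helper names are hypothetical. It assumes the Score is TPR minus FPR (which matches the rows shown) and, as a simplification, treats a zero denominator as a rate of 0. The three sample rows are the FBwFindSecBugs categories visible in the diff, so the figures cover only those rows, not the full scorecard.

```java
import java.util.Arrays;
import java.util.List;

final class OverallResultsSketch {

    /** Confusion-matrix counts for one vulnerability category (hypothetical type). */
    static final class Category {
        final String name;
        final int tp, fn, tn, fp;

        Category(String name, int tp, int fn, int tn, int fp) {
            this.name = name;
            this.tp = tp;
            this.fn = fn;
            this.tn = tn;
            this.fp = fp;
        }

        double tpr() { return rate(tp, tp + fn); }   // true positive rate
        double fpr() { return rate(fp, fp + tn); }   // false positive rate
        double score() { return tpr() - fpr(); }     // assumed: Score = TPR - FPR
    }

    /** Simplification: a category with no applicable tests gets a rate of 0. */
    static double rate(int hits, int total) {
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // Per-category averages: every category carries equal weight.
    static double averageTpr(List<Category> cs) {
        double sum = 0;
        for (Category c : cs) sum += c.tpr();
        return sum / cs.size();
    }

    static double averageFpr(List<Category> cs) {
        double sum = 0;
        for (Category c : cs) sum += c.fpr();
        return sum / cs.size();
    }

    static double averageScore(List<Category> cs) {
        double sum = 0;
        for (Category c : cs) sum += c.score();
        return sum / cs.size();
    }

    // Pooled rates from summed counts: larger categories dominate.
    static double pooledTpr(List<Category> cs) {
        int tp = 0, fn = 0;
        for (Category c : cs) { tp += c.tp; fn += c.fn; }
        return rate(tp, tp + fn);
    }

    static double pooledFpr(List<Category> cs) {
        int fp = 0, tn = 0;
        for (Category c : cs) { fp += c.fp; tn += c.tn; }
        return rate(fp, fp + tn);
    }

    /** The three FBwFindSecBugs rows shown in the diff (TP, FN, TN, FP). */
    static List<Category> sampleRows() {
        return Arrays.asList(
                new Category("Weak Hash Algorithm", 89, 40, 107, 0),
                new Category("Weak Random Number", 218, 0, 275, 0),
                new Category("XPath Injection", 15, 0, 0, 20));
    }

    public static void main(String[] args) {
        List<Category> rows = sampleRows();
        System.out.printf("Per-category average: TPR %.2f%%, FPR %.2f%%%n",
                100 * averageTpr(rows), 100 * averageFpr(rows));
        System.out.printf("Pooled from totals:   TPR %.2f%%, FPR %.2f%%%n",
                100 * pooledTpr(rows), 100 * pooledFpr(rows));
    }
}
```

For these three rows the per-category FPR average is 33.33%, while the pooled FPR is about 4.98%: XPath Injection has only 35 tests, so it barely moves the pooled figure but counts as a full third of the average.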