Commit 2d796c6

André Duarte committed: Minor fix
1 parent 336c87b, commit 2d796c6

2 files changed: +8 -8 lines changed

final-report/chapters/discussion.tex

Lines changed: 8 additions & 8 deletions
@@ -3,7 +3,7 @@ \chapter{Discussion} \label{chap:discussion}
 
 \section*{}
 
-The experimental results acknowledged that extracting the component changes meta-data is valuable by allowing to predict, with good precision and independently of the language, which files are more probable to have defects (Figure \ref{fig:dp-faults-position}).
+The experimental results confirmed that extracting component-change metadata is valuable: it allows predicting, with useful precision and independently of the language, which files are more likely to contain defects (Figure \ref{fig:dp-faults-position}).
 
 \section{Estimating Defect Probability}
 
@@ -12,11 +12,11 @@ \section{Estimating Defect Probability}
 
 Precision and recall of the defect probability estimation, exhibited in Figure \ref{fig:dp-precision-recall}, are also relevant to analyze. The precision improvement this figure shows over a uniform probability distribution for clean and buggy components illustrates the information gain obtained with this solution.
 
-However, the mean accuracy obtained when classifying the test folds selected by Stratified KFold (Figure \ref{fig:kfold-accuracy-dist}) does not seem to be reflected when classifying the test set, namely the project's state. This could be explained by a overfitting mistake, but we think this is not the problem. We analyzed it, tried to use normalized values for changes, instead of the raw value plus date, and tried to use less features and tested the various options by cutting out more recent data and using the rest to model using KFold. The accuracy of classifying the most recent data was always much lower than the accuracy predicting data that was from a closer time frame.
+However, the mean accuracy obtained when classifying the test folds selected by Stratified KFold (Figure \ref{fig:kfold-accuracy-dist}) does not seem to be reflected when classifying the test set, namely the project's current state. Overfitting could explain this, but we believe that is not the problem: we tried normalized change values instead of the raw values and date, tried using fewer features, and tested the various options by cutting off the most recent data and building the model on the rest with KFold. The accuracy when classifying the most recent data was always much lower than the accuracy when predicting data from a closer time frame.
 
 This may be explained by the fact that the evolution of the project can change which patterns identify faulty components, making data from a closer time frame more valuable.
 
-The inability to have more consistent results of the mean accuracy, that vary mainly between $0.8$ and $0.95$ as illustrated in Figure \ref{fig:kfold-accuracy-dist} may be caused by the data imbalance and noise.
+The inability to obtain better and more consistent mean accuracy results, which vary mainly between $0.8$ and $0.95$ as illustrated in Figure \ref{fig:kfold-accuracy-dist}, may be caused by data imbalance and noise.
 
 Since in each fix commit only a small percentage of components is changed and all the others are considered clean, the extracted data is extremely imbalanced and the number of faulty components in the training set is small. Using SMOTE improved the results, but the tendency towards $0$ remains noticeable.

@@ -28,25 +28,25 @@ \section{Estimating Defect Probability}
 
 \section{Barinel Integration}
 
-Analysis of the unmodified Barinel, illustrated in \ref{fig:fault-positions}, showed how good the results are and the tenuous percentage of tests that can be improved by using our approach to modify Barinel results
+Analysis of the unmodified Barinel, illustrated in Figure \ref{fig:fault-positions}, showed how good its results already are, and the small percentage of tests that can be improved by our approach to modifying Barinel results.
 
 \subsection{Results Modification}
 
 In the best-case scenario, the results modification integration can improve $14.67\%$ of the tests, while in the worst-case scenario $29\%$ of the tests would worsen.
 
-Figure \ref{fig:results-modification} shows that when considering as faulty all the components with a predicted defect probability above $0.6$ the Barinel results improve, with little or no error. Examining for example the results for $0.65$ of minimum predicted probability, where the delta is higher, $13.5\%$ of the possible improvements occurred and just one test worsened. Increasing the minimum diminishes both the number of improvements and errors, but starting at $0.75$ errors are completely eliminated.
+Figure \ref{fig:results-modification} shows that when all components with a predicted defect probability above $0.6$ are considered faulty, the Barinel results improve with little or no error. Examining, for example, the results for a minimum predicted probability of $0.65$, where the delta is highest, $13.5\%$ of the possible improvements occurred and only one test worsened. Increasing the minimum diminishes both the number of improvements and the number of errors; starting at $0.75$, errors are completely eliminated.
 
 \subsection{Priors Replacement}
 
-The best case scenario showed that even with $100\%$ precision the priors replacement integration can result in worsened tests. \todo{Explain why clearly}
+The best-case scenario showed that even with $100\%$ precision the priors replacement integration can result in worsened tests. This may be because Barinel calculates the defect probability for groups of components, and only at the end is each component's defect probability derived from the probabilities of the groups in which it appears. A changed prior may therefore also change the probability of a related component, which can negatively affect the results.
 
 Even so, the real results proved promising, improving approximately $43\%$ of the possible tests without damaging any. This may illustrate how important defect probability prediction based on language-agnostic features is to improving Barinel results.
 
 \section{Threats to Validity}
 
-There are some threats to the validity of this research. The first is the fact that the Math project (101 tests of 184) appeared to have flaky tests, since with the exact same configuration Barinel, which is deterministic, reported some value changes.
+There are some threats to the validity of this research. The first is that the Math project (101 of the 184 tests) appeared to have flaky tests, since with the exact same configuration some value changes were observed in Barinel, which is deterministic.
 
-Using three open source Java projects, with 184 tests, may also not be sufficient to predict the application behavior in other different projects
+Using three open-source Java projects, with 184 tests, may also not be sufficient to predict the application's behavior in other projects.
 
 Since this research is all about defect probabilities, we are aware that the application built to estimate defect probability can itself have defects that may affect the predictions, although the application was heavily tested and many results were manually checked for validity.
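The results-modification rule analyzed above, treating every component whose predicted defect probability exceeds a minimum threshold as faulty and promoting it in the Barinel ranking, can be sketched as follows. This is a hypothetical illustration: the component names, the probabilities, and the exact promotion rule (flagged components moved ahead of the rest, preserving relative order) are assumptions, not the thesis algorithm.

```python
def modify_ranking(ranking, defect_prob, min_prob=0.65):
    """ranking: components ordered by Barinel suspiciousness (best first).
    defect_prob: predicted defect probability per component (assumed input).
    Components above the threshold are inspected first."""
    flagged = [c for c in ranking if defect_prob.get(c, 0.0) > min_prob]
    rest = [c for c in ranking if defect_prob.get(c, 0.0) <= min_prob]
    return flagged + rest

# Illustrative data, not taken from the thesis experiments.
barinel_ranking = ["A.java", "B.java", "C.java", "D.java"]
predicted = {"A.java": 0.10, "B.java": 0.40, "C.java": 0.70, "D.java": 0.20}
print(modify_ranking(barinel_ranking, predicted))
# ['C.java', 'A.java', 'B.java', 'D.java']
```

Raising `min_prob` shrinks the flagged set, which mirrors the observation above that a higher minimum reduces both the number of improvements and the number of errors.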

final-report/thesis.pdf

130 Bytes
Binary file not shown.
