Académique Documents
Professionnel Documents
Culture Documents
Outline
1 Motivation
2 Related Studies
3 Research Methodology
Proposed Approach
Experimental Setup
Datasets and Metrics
Performance Measures and Tools
4 Results
Outline
1 Motivation
2 Related Studies
3 Research Methodology
Proposed Approach
Experimental Setup
Datasets and Metrics
Performance Measures and Tools
4 Results
Remark 1
Suitable approach and similar data are two key factors in building
good prediction models
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Remark 1
Suitable approach and similar data are two key factors in building
good prediction models
Remark 2
There are plenty of relevant public datasets available, especially in
the open source repositories
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Data Issues
The main issues concerning the data used for defect prediction are:
Within Project data is usually scarce
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Data Issues
The main issues concerning the data used for defect prediction are:
Within Project data is usually scarce
Data Quality
Defect data collection process is prone to errors
Not all the defects might have been detected
The same measurements may lead to different number of bugs
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Outline
1 Motivation
2 Related Studies
3 Research Methodology
Proposed Approach
Experimental Setup
Datasets and Metrics
Performance Measures and Tools
4 Results
NN Filter
Clustering
Figure: Menzies et al, 2013, Herbold 2013 and Jureczko et al, 2010
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Mixed Data
Universal Model
Outline
1 Motivation
2 Related Studies
3 Research Methodology
Proposed Approach
Experimental Setup
Datasets and Metrics
Performance Measures and Tools
4 Results
Proposed Approach
Proposed Approach
Proposed Approach
Proposed Approach
Chromosome Representation
Proposed Approach
Mutation
Proposed Approach
Mutation
Proposed Approach
Mutation
Proposed Approach
Mutation
Proposed Approach
Crossover
Proposed Approach
Crossover Example
F1 F2 F3 ... Label
Cn1 6 1 0 ... 0
Cn2 3 4 0.97 ... 1
Cn3 3 4 0.98 ... 0
Cn3 5 2 0.69 ... 0
... ... ... ... ... ...
Cn5 3 4 0.98 ... 0
Cn6 3 2 0.82 ... 0
Cn7 16 3 0.73 ... 1
Cn8 3 4 0.97 ... 1
F1 F2 F3 ... Label
Cp1 2 1 1 ... 1
Cp2 2 3 0.89 ... 0
Cp3 1 2 0.90 ... 0
Cp4 1 3 0.45 ... 1
... ... ... ... ... ...
Cp5 0 2 0.22 ... 0
Cp6 7 8 0.95 ... 1
Cp7 11 6 0.33 ... 1
Cp8 3 3 0.8 ... 0
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Proposed Approach
Crossover Example
F1 F2 F3 ... Label F1 F2 F3 ... Label
Cn1 6 1 0 ... 0 Cn1 6 1 0 ... 0
Cn2 3 4 0.97 ... 1 Cn2 3 4 0.97 ... 1
Cn3 3 4 0.98 ... 0 Cn3 3 4 0.98 ... 0
Cn3 5 2 0.69 ... 0 Cn3 5 2 0.69 ... 0
... ... ... ... ... ... ... ... ... ... ... ...
Cn5 3 4 0.98 ... 0
Cn6 3 2 0.82 ... 0
Cn7 16 3 0.73 ... 1
Cn8 3 4 0.97 ... 1
F1 F2 F3 ... Label
Cp1 2 1 1 ... 1
Cp2 2 3 0.89 ... 0
Cp3 1 2 0.90 ... 0
Cp4 1 3 0.45 ... 1
... ... ... ... ... ...
Cp5 0 2 0.22 ... 0
Cp6 7 8 0.95 ... 1
Cp7 11 6 0.33 ... 1
Cp8 3 3 0.8 ... 0
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Proposed Approach
Crossover Example
F1 F2 F3 ... Label F1 F2 F3 ... Label
Cn1 6 1 0 ... 0 Cn1 6 1 0 ... 0
Cn2 3 4 0.97 ... 1 Cn2 3 4 0.97 ... 1
Cn3 3 4 0.98 ... 0 Cn3 3 4 0.98 ... 0
Cn3 5 2 0.69 ... 0 Cn3 5 2 0.69 ... 0
... ... ... ... ... ... ... ... ... ... ... ...
Cn5 3 4 0.98 ... 0 Cp5 0 2 0.22 ... 0
Cn6 3 2 0.82 ... 0 Cp6 7 8 0.95 ... 1
Cn7 16 3 0.73 ... 1 Cp7 11 6 0.33 ... 1
Cn8 3 4 0.97 ... 1 Cp8 3 3 0.8 ... 0
F1 F2 F3 ... Label
Cp1 2 1 1 ... 1
Cp2 2 3 0.89 ... 0
Cp3 1 2 0.90 ... 0
Cp4 1 3 0.45 ... 1
... ... ... ... ... ...
Cp5 0 2 0.22 ... 0
Cp6 7 8 0.95 ... 1
Cp7 11 6 0.33 ... 1
Cp8 3 3 0.8 ... 0
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Proposed Approach
Crossover Example
F1 F2 F3 ... Label F1 F2 F3 ... Label
Cn1 6 1 0 ... 0 Cn1 6 1 0 ... 0
Cn2 3 4 0.97 ... 1 Cn2 3 4 0.97 ... 1
Cn3 3 4 0.98 ... 0 Cn3 3 4 0.98 ... 0
Cn3 5 2 0.69 ... 0 Cn3 5 2 0.69 ... 0
... ... ... ... ... ... ... ... ... ... ... ...
Cn5 3 4 0.98 ... 0 Cp5 0 2 0.22 ... 0
Cn6 3 2 0.82 ... 0 Cp6 7 8 0.95 ... 1
Cn7 16 3 0.73 ... 1 Cp7 11 6 0.33 ... 1
Cn8 3 4 0.97 ... 1 Cp8 3 3 0.8 ... 0
Proposed Approach
Crossover Example
F1 F2 F3 ... Label F1 F2 F3 ... Label
Cn1 6 1 0 ... 0 Cn1 6 1 0 ... 0
Cn2 3 4 0.97 ... 1 Cn2 3 4 0.97 ... 1
Cn3 3 4 0.98 ... 0 Cn3 3 4 0.98 ... 0
Cn3 5 2 0.69 ... 0 Cn3 5 2 0.69 ... 0
... ... ... ... ... ... ... ... ... ... ... ...
Cn5 3 4 0.98 ... 0 Cp5 0 2 0.22 ... 0
Cn6 3 2 0.82 ... 0 Cp6 7 8 0.95 ... 1
Cn7 16 3 0.73 ... 1 Cp7 11 6 0.33 ... 1
Cn8 3 4 0.97 ... 1 Cp8 3 3 0.8 ... 0
Experimental Setup
Experimental Setup
Variable Description
CK suite (6)
WMC Weighted Methods per Class
DIT Depth of Inheritance Tree
LCOM Lack of Cohesion in Methods
RFC Response for a Class
CBO Coupling between Object classes
NOC Number of Children
Lines of Code (1)
LOC Lines Of Code
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Performance Measures
Tools
1
http://www.cs.waikato.ac.nz/ml/weka/
2
https://www.scipy.org/
3
http://www.python.org
4
http://matplotlib.org/
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Tools
1
http://www.cs.waikato.ac.nz/ml/weka/
2
https://www.scipy.org/
3
http://www.python.org
4
http://matplotlib.org/
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Tools
1
http://www.cs.waikato.ac.nz/ml/weka/
2
https://www.scipy.org/
3
http://www.python.org
4
http://matplotlib.org/
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Outline
1 Motivation
2 Related Studies
3 Research Methodology
Proposed Approach
Experimental Setup
Datasets and Metrics
Performance Measures and Tools
4 Results
Median 0.896 0.353 0.496 0.542 0.165 0.667 0.253 0.349 0.144 0.682 0.234 0.329
Mean 0.853 0.424 0.483 0.543 0.252 0.613 0.269 0.334 0.235 0.619 0.248 0.317
Motivation Related Studies Research Methodology Results Conclusions and Future Works
1.0
0.8
0.6
0.4
0.2
0.0
GIS (CP) Cross Validation (WP) (NN)-Filter (CP) Naive (CP)
1.0
0.8
0.6
0.4
0.2
0.0
GIS (CP) Cross Validation (WP) (NN)-Filter (CP) Naive (CP)
Motivation Related Studies Research Methodology Results Conclusions and Future Works
1.0 1.0
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0.0 0.0
GIS (CP) Cross Validation (WP) (NN)-Filter (CP) Naive (CP) GIS (CP) Cross Validation (WP) (NN)-Filter (CP) Naive (CP)
1.0 1.0
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0.0 0.0
GIS (CP) Cross Validation (WP) (NN)-Filter (CP) Naive (CP) GIS (CP) Cross Validation (WP) (NN)-Filter (CP) Naive (CP)
Motivation Related Studies Research Methodology Results Conclusions and Future Works
Outline
1 Motivation
2 Related Studies
3 Research Methodology
Proposed Approach
Experimental Setup
Datasets and Metrics
Performance Measures and Tools
4 Results
Conclusions
Conclusions
Conclusions
Conclusions
Future Works
Future Works
Future Works
Future Works
Thank you
GIS Algorithm
Algorithm 1 Pseudo code for GIS
1: Set numGens = The number of generations of each genetic optimization run.
2: Set popSize = The size of the population.
3: Set DATA = {Ant-1.7, Camel-1.6, ivy-2.0, jedit-4.3, log4j-1.2, lucene-2.4, poi-3.0, prop-6, synapse-1.2, tomcat-6.0, velocity-1.6, xalan-
2.7,xerces-1.4}
4: Set FEATURES = { WMC, DIT, NOC, CBO, RFC, LCOM, LOC }
5:
6: for TEST in DATA do
7: set TRAIN = Instances from all other projects
8: tdSize = 0.05 * Number of instances in TRAIN
9: for i = 1 to 30 do
10: Set TestParts = Split TEST instances into p parts
11:
12: for each testPart in TestParts do
13: Set vSet = Generate a validation dataset using 5-(NN)-Filter method (using three distance measures).
14: Set TrainDataSets = Create popSize dataset from TRAIN with replacement each with tdSize instances
15:
16: for each td in TrainDataSets do
17: Evaluate td on vSet and add it to the initial generation
18: end for
19:
20: for g in range(numGens) do
21: Create a new generation using the defined Crossover and Mutation function and Elites from the curent generation.
22: Combine the two generations and extract a new generation
23: end for
24:
25: Set bestDS = Select the top dataset from the GA’s last iteration.
26: Evaluate bestDS on testPart and append the results to the pool of results.
27: end for
28:
29: Calculate Precision, Recall and F-Measure and GMean from the predictions.
30: end for
31: Report the median of all 30 experiments
32: end for
Motivation Related Studies Research Methodology Results Conclusions and Future Works