Software defect prediction framework based on hybrid metaheuristic optimization methods

A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in terms of both quality and cost. Accurate prediction of defect-prone software modules assists testing effort, reduces costs, and improves the quality of software.

Bibliographic Details
Main Author: Wahono, Romi Satria
Format: Thesis
Language: English
Published: 2015
Subjects:
Online Access:http://eprints.utem.edu.my/id/eprint/16874/1/Software%20Defect%20Prediction%20Framework%20Based%20On%20Hybrid%20Metaheuristic%20Optimization%20Methods.pdf
http://eprints.utem.edu.my/id/eprint/16874/2/Software%20defect%20prediction%20framework%20based%20on%20hybrid%20metaheuristic%20optimization%20methods.pdf
id my-utem-ep.16874
record_format uketd_dc
institution Universiti Teknikal Malaysia Melaka
collection UTeM Repository
language English
advisor Herman, Nanna Suryana

topic Q Science (General)
QA Mathematics
spellingShingle Q Science (General)
QA Mathematics
Wahono, Romi Satria
Software defect prediction framework based on hybrid metaheuristic optimization methods
description A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in terms of both quality and cost. Accurate prediction of defect-prone software modules assists testing effort, reduces costs, and improves the quality of software. Classification algorithms are a popular machine learning approach to software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem. First, comparison and benchmarking results of defect prediction using machine learning classifiers indicate that poor accuracy levels are dominant and that no particular classifier performs best across all datasets. Two main problems affect classification performance in software defect prediction: noisy attributes and imbalanced class distributions in the datasets, and the difficulty of selecting optimal parameters for the classifiers. In this study, a software defect prediction framework is proposed that combines metaheuristic optimization methods for feature selection and parameter optimization with meta-learning methods for solving the class imbalance problem in datasets, with the aim of improving the accuracy of classification models. The proposed framework and models, which are considered the specific research contributions of this thesis, are: 1) a comparison framework of classification models for software defect prediction, known as CF-SDP; 2) a hybrid genetic algorithm based feature selection and bagging technique for software defect prediction, known as GAFS+B; 3) a hybrid particle swarm optimization based feature selection and bagging technique for software defect prediction, known as PSOFS+B; and 4) a hybrid genetic algorithm based neural network parameter optimization and bagging technique for software defect prediction, known as NN-GAPO+B. For the purpose of this study, ten classification algorithms were selected.
The selection aims at a balanced representation of established classification algorithms used in software defect prediction. The proposed framework and methods are evaluated using the state-of-the-art datasets from the NASA metrics data repository. The results indicate that the proposed methods (GAFS+B, PSOFS+B and NN-GAPO+B) make an impressive improvement in the performance of software defect prediction. GAFS+B and PSOFS+B significantly improved the performance of classifiers that suffer from class imbalance, such as C4.5 and CART. GAFS+B and PSOFS+B also outperformed existing software defect prediction frameworks on most datasets. Based on the conducted experiments, logistic regression performs best on most of the NASA MDP datasets, with or without a feature selection method. The proposed methods also identified the relevant features for software defect prediction. The top ten most relevant features include branch count metrics, decision density, the Halstead level metric of a module, the number of operands contained in a module, maintenance severity, the number of blank LOC, Halstead volume, the number of unique operands contained in a module, total LOC and design density.
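The GAFS+B idea summarized above, a genetic algorithm searching over feature subsets whose fitness is the held-out accuracy of a bagged classifier, can be sketched in a few dozen lines. This is a minimal illustration only: the synthetic dataset, the nearest-centroid base learner, and all GA parameters (population size, generations, mutation rate) are assumptions made for the sketch, not the thesis implementation, which uses ten established classifiers on the NASA MDP datasets.

```python
# Minimal sketch of GA-based feature selection wrapped around a bagged
# classifier (the GAFS+B idea). Toy data and a nearest-centroid learner
# are stand-ins; nothing here reproduces the thesis experiments.
import random

random.seed(42)

def make_data(n=120, n_feats=8):
    # Synthetic binary data: only features 0 and 1 carry signal.
    X, y = [], []
    for _ in range(n):
        label = random.randint(0, 1)
        row = [random.gauss(label * 2.0, 1.0) if j < 2 else random.gauss(0, 1)
               for j in range(n_feats)]
        X.append(row); y.append(label)
    return X, y

def centroid_classifier(Xtr, ytr, feats):
    # Class centroids restricted to the selected feature subset.
    cents = {}
    for c in (0, 1):
        rows = [[x[j] for j in feats] for x, t in zip(Xtr, ytr) if t == c]
        cents[c] = [sum(col) / len(col) for col in zip(*rows)]
    def predict(x):
        px = [x[j] for j in feats]
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(px, cents[c])))
    return predict

def bagged_accuracy(X, y, feats, n_bags=5):
    # Bagging: each member trains on a bootstrap sample; majority vote
    # is scored on a held-out 30% split.
    split = int(0.7 * len(X))
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    members = []
    for _ in range(n_bags):
        idx = [random.randrange(split) for _ in range(split)]
        members.append(centroid_classifier([Xtr[i] for i in idx],
                                           [ytr[i] for i in idx], feats))
    correct = 0
    for x, t in zip(Xte, yte):
        votes = sum(m(x) for m in members)
        correct += int((votes * 2 > len(members)) == bool(t))
    return correct / len(Xte)

def ga_feature_selection(X, y, n_feats=8, pop=10, gens=15):
    # Chromosome = bit mask over features; fitness = bagged accuracy.
    def fitness(mask):
        feats = [j for j in range(n_feats) if mask[j]]
        return bagged_accuracy(X, y, feats) if feats else 0.0
    population = [[random.randint(0, 1) for _ in range(n_feats)]
                  for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_feats)      # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:               # bit-flip mutation
                child[random.randrange(n_feats)] ^= 1
            children.append(child)
        population = parents + children             # elitist survival
    best = max(population, key=fitness)
    return [j for j in range(n_feats) if best[j]]

X, y = make_data()
print("selected features:", ga_feature_selection(X, y))
```

The same wrapper structure carries over to PSOFS+B (replace the GA loop with binary particle swarm updates) and to NN-GAPO+B (let the chromosome encode classifier parameters instead of a feature mask).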
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Wahono, Romi Satria
author_facet Wahono, Romi Satria
author_sort Wahono, Romi Satria
title Software defect prediction framework based on hybrid metaheuristic optimization methods
title_short Software defect prediction framework based on hybrid metaheuristic optimization methods
title_full Software defect prediction framework based on hybrid metaheuristic optimization methods
title_fullStr Software defect prediction framework based on hybrid metaheuristic optimization methods
title_full_unstemmed Software defect prediction framework based on hybrid metaheuristic optimization methods
title_sort software defect prediction framework based on hybrid metaheuristic optimization methods
granting_institution Universiti Teknikal Malaysia Melaka
granting_department Faculty Of Information And Communication Technology
publishDate 2015
url http://eprints.utem.edu.my/id/eprint/16874/1/Software%20Defect%20Prediction%20Framework%20Based%20On%20Hybrid%20Metaheuristic%20Optimization%20Methods.pdf
http://eprints.utem.edu.my/id/eprint/16874/2/Software%20defect%20prediction%20framework%20based%20on%20hybrid%20metaheuristic%20optimization%20methods.pdf
_version_ 1747833905393696768
spelling my-utem-ep.16874 2022-06-02T10:39:31Z Software defect prediction framework based on hybrid metaheuristic optimization methods 2015 Wahono, Romi Satria Q Science (General) QA Mathematics
2015 Thesis http://eprints.utem.edu.my/id/eprint/16874/ http://eprints.utem.edu.my/id/eprint/16874/1/Software%20Defect%20Prediction%20Framework%20Based%20On%20Hybrid%20Metaheuristic%20Optimization%20Methods.pdf text en public http://eprints.utem.edu.my/id/eprint/16874/2/Software%20defect%20prediction%20framework%20based%20on%20hybrid%20metaheuristic%20optimization%20methods.pdf text en validuser https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=96192 phd doctoral Universiti Teknikal Malaysia Melaka Faculty Of Information And Communication Technology Herman, Nanna Suryana
et al., 2013. Data Quality: Some Comments on the NASA Software Defect Datasets. IEEE Transactions on Software Engineering, 39(9), pp.1208–1215. 148. Shepperd, M., Cartwright, M. & Mair, C., 2006. Software defect association mining and defect correction effort prediction. IEEE Transactions on Software Engineering, 32(2), pp.69–82. 149. Shepperd, M. & Kadoda, G., 2001. Comparing software prediction techniques using simulation. IEEE Transactions on Software Engineering, 27(11), pp.1014–1022. 150. Shi, Y. & Eberhart, R., 1998. A modified particle swarm optimizer. In 1998 IEEE International Conference on Evolutionary Computation Proceedings IEEE World Congress on Computational Intelligence Cat No98TH8360. Ieee, pp. 69–73. 151. Shull, F. et al., 2002. What we have learned about fighting defects. In Proceedings Eighth IEEE Symposium on Software Metrics 2002. IEEE, pp. 249–258. 152. Song, Q. et al., 2011. A General Software Defect-Proneness Prediction Framework. IEEE Transactions on Software Engineering, 37(3), pp.356–370. 153. Subbotin, S. & Oleynik, A., 2007. Modifications of Ant Colony Optimization Method for Feature Selection. In 2007 9th International Conference - The Experience of Designing and Applications of CAD Systems in Microelectronics. IEEE, pp. 493–494. 154. Sun, Z., Song, Q. & Zhu, X., 2012. Using Coding-Based Ensemble Learning to Improve Software Defect Prediction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), pp.1806–1817. 155. Tony Hou, T.-H., Su, C.-H. & Chang, H.-Z., 2008. Using neural networks and immune algorithms to find the optimal parameters for an IC wire bonding process. Expert Systems with Applications, 34(1), pp.427–436. 156. Tosun, A., Turhan, B. & Bener, A., 2008. Ensemble of software defect predictors. In Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement - ESEM ’08. New York, New York, USA: ACM Press, p. 318. 157. 
Turhan, B., Menzies, T., et al., 2009. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 14(5), pp.540–578. 158. Turhan, B., Kocak, G. & Bener, A., 2009. Data mining source code for locating software bugs: A case study in telecommunication industry. Expert Systems with Applications, 36(6), pp.9986–9990. 159. Unler, A. & Murat, A., 2010. A discrete particle swarm optimization method for feature selection in binary classification problems. European Journal of Operational Research, 206(3), pp.528–539. 160. Unterkalmsteiner, M. et al., 2012. Evaluation and Measurement of Software Process Improvement—A Systematic Literature Review. IEEE Transactions on Software Engineering, 38(2), pp.398–424. 161. Vandecruys, O. et al., 2008. Mining software repositories for comprehensible software fault prediction models. Journal of Systems and Software, 81(5), pp.823–839. 162. Vilalta, R. et al., 2004. Using Meta-Learning to Support Data Mining. International Journal of Computer Science Applications, 1(1), pp.31–45. Available at: http://cling.csd.uwo.ca/cs860/papers/Meta_learning_IJCSA04.pdf. 163. Wahono, R.S. & Herman, N.S., 2014. Genetic Feature Selection for Software Defect Prediction. Advanced Science Letters, 20(1), pp.239–244. 164. Wahono, R.S. & Suryana, N., 2013. Combining Particle Swarm Optimization based Feature Selection and Bagging Technique for Software Defect Prediction. International Journal of Software Engineering and Its Applications, 7(5), pp.153–166. 165. Wang, H., Khoshgoftaar, T.M. & Napolitano, A., 2010. A Comparative Study of Ensemble Feature Selection Techniques for Software Defect Prediction. 2010 Ninth International Conference on Machine Learning and Applications, pp.135–140. 166. Wang, H., Khoshgoftaar, T.M. & Napolitano, A., 2012. Software measurement data reduction using ensemble techniques. Neurocomputing, 92, pp.124–132. 167. Wang, Q. & Yu, B., 2004. 
Extract rules from software quality prediction model based on neural network. 16th IEEE International Conference on Tools with Artificial Intelligence, (Ictai), pp.191–195. 168. Wang, S. & Yao, X., 2013. Using Class Imbalance Learning for Software Defect Prediction. IEEE Transactions on Reliability, 62(2), pp.434–443. 169. Wang, T.-Y. & Huang, C.-Y., 2007. Applying optimized BPN to a chaotic time series problem. Expert Systems with Applications, 32(1), pp.193–200. 170. Witten, I.H., Frank, E. & Hall, M.A., 2011. Data Mining Third Edition, Elsevier Inc. 171. Wong, W.E. et al., 2012. Effective Software Fault Localization Using an RBF Neural Network. IEEE Transactions on Reliability, 61(1), pp.149–169. 172. Wu, Q., 2011. A self-adaptive embedded chaotic particle swarm optimization for parameters selection of Wv-SVM. Expert Systems with Applications, 38(1), pp.184–192. 173. Xing, F., Guo, P. & Lyu, M.R., 2005. A Novel Method for Early Software Quality Prediction Based on Support Vector Machine. 16th IEEE International Symposium on Software Reliability Engineering (ISSRE’05), pp.213–222. 174. Yusta, S.C., 2009. Different metaheuristic strategies to solve the feature selection problem. Pattern Recognition Letters, 30(5), pp.525–534. 175. Zainuddin, M.F., 2006. K-Chart: A Tool Research Planning and Monitoring. Journal of Quality Measurement and Analysis, 2(1), pp.123–129. 176. Zhang, C. & Hu, H., 2005. Feature selection using the hybrid of ant colony optimization and mutual information for the forecaster. Machine Learning, 3(August), pp.18–21. 177. Zhang, H., Wang, M. & Huang, X., 2010. Parameter Selection of Support Vector Regression Based on Particle Swarm Optimization. 2010 IEEE International Conference on Granular Computing, pp.834–838. 178. Zhang, H. & Zhang, X., 2007. Comments on “Data Mining Static Code Attributes to Learn Defect Predictors.” IEEE Transactions on Software Engineering, 33(9), pp.635–637. 179. Zhang, P. & Chang, Y., 2012. 
Software fault prediction based on grey neural network. In 2012 8th International Conference on Natural Computation. IEEE, pp. 466–469. 180. Zhao, M. et al., 2011. Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes. Expert Systems with Applications, 38(5), pp.5197–5204. 181. Zheng, J., 2010. Cost-sensitive boosting neural networks for software defect prediction. Expert Systems with Applications, 37(6), pp.4537–4543. 182. Zhou, Y. & Leung, H., 2006. Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults. IEEE Transactions on Software Engineering, 32(10), pp.771–789. 183. Zubrow, D. & Clark, B., 2002. How Good Is the Software : A Review of Defect Prediction Techniques. In Software Engineering Process Group (SEPG) Conference.