1 MSc student in Computer Engineering (Software), Pooyandegan Danesh Institution of Higher Education, Chalus, Iran

2 Full-time faculty member, Islamic Azad University of Chalus, Chalus, Iran


Nowadays, data mining is one of the most significant research topics. Data mining is a field that combines computer science and statistics, and it has expanded considerably with the increase in digital data and the growth in the computational power of computers. One application domain of data mining is software cost estimation. In this article, machine learning classification techniques and the COCOMO model, the most common software cost estimation model, are presented. Then, the principal component analysis (PCA) method is presented. This article presents a suitable method for improving the performance of software cost estimation. Using this method, the dimensionality of the original data set is reduced and the data set is transformed into a new one, from which the best features are extracted. Several classification algorithms are then evaluated using this method. Finally, evidence is presented supporting our claim that the method increases the accuracy of software cost estimation.
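The PCA step summarized above (reducing the original data set and transforming it into a new one before classification) can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not say how PCA was computed: the sketch assumes each software project is a row of numeric attributes, and the function names (`mean_center`, `covariance`, `top_eigenpair`, `pca`) and the toy data are hypothetical. It centers the data, forms the sample covariance matrix, finds the leading principal components by power iteration with deflation, and projects each project onto them so that a classifier could be trained on the reduced features.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-attribute mean from every row (hypothetical helper)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(m, rng, iters=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by power iteration from a random starting vector."""
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(m))]
    lam = 0.0
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left; direction is arbitrary
            return 0.0, v
        w = [x / norm for x in w]
        lam_new = sum(a * b for a, b in zip(w, mat_vec(m, w)))  # Rayleigh quotient
        v = w
        if abs(lam_new - lam) < tol:
            return lam_new, v
        lam = lam_new
    return lam, v


def pca(rows, k, seed=0):
    """Project rows onto their first k principal components.

    Returns (projected_rows, components, explained_variance_ratio).
    """
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))  # trace = total variance
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        lam, v = top_eigenpair(cov, rng)
        components.append(v)
        variances.append(lam)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(a * b for a, b in zip(r, c)) for c in components]
                 for r in centered]
    ratio = [lam / total_var if total_var else 0.0 for lam in variances]
    return projected, components, ratio


if __name__ == "__main__":
    # Hypothetical toy data: four projects whose two attributes are perfectly correlated.
    projects = [[1, 2], [2, 4], [3, 6], [4, 8]]
    reduced, comps, ratio = pca(projects, k=1)
    print(round(ratio[0], 6))  # 1.0: a single component captures all the variance
```

Power iteration is used here only to keep the sketch dependency-free; in practice a library routine such as scikit-learn's `PCA` (with `n_components` and `explained_variance_ratio_`) would normally be preferred, and the number of retained components would be chosen from the explained-variance ratios.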

Graphical Abstract

