UPSI Digital Repository (UDRep)
Abstract:
To increase the quality of loan provision and reduce the risk involved in this process, several credit scoring models have been developed and utilized to improve the assessment of creditworthiness. Credit scoring is an evaluation of the risk connected with lending to clients (consumers) or organizations. The Gustafson-Kessel (GK) algorithm has become one of the most valuable tools for credit scoring. However, this algorithm demonstrates a relatively poor capability to identify a subset of features from a large dataset, and most methods that use the GK algorithm require a predefined number of clusters. This paper presents a new GK-based modified binary particle swarm optimization (MBPSO) approach to increase the classification accuracy of the GK algorithm. The proposed MBPSO consists of three parts. First, the particle representation is used to determine the optimal number of clusters automatically, overcoming the GK algorithm's requirement for a predefined number of clusters; a subset of features is also identified, because the same dataset may contain influential features or a high level of noise, and the two procedures are combined in a single optimization method to increase the classification accuracy of the GK algorithm. Second, an updating function uses velocity and position to compute the next position of every particle in the swarm. Third, a kernel fuzzy clustering method (KFCM) is used as the fitness function because it can analyze high-dimensional data. These modifications serve as preprocessing steps before the classification of credit data is performed. Internal clustering measures are computed on the Australian, German, and Taiwan standard datasets, which contain 690, 1,000, and 30,000 instances, respectively, with varied feature properties. Results show that the GK algorithm is good at separating the data into clusters.
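The binary velocity/position update described in the second part of the abstract follows the standard binary PSO scheme of Kennedy and Eberhart (1997), in which a sigmoid of the velocity gives the probability that each bit of a particle's position is set. The sketch below is a minimal illustration of that update, not the paper's actual implementation; the function name, coefficients, and bit-vector encoding (e.g., cluster-count bits plus a feature-selection mask) are assumptions.

```python
import numpy as np

def mbpso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One binary PSO velocity/position update (illustrative sketch).

    positions: (n_particles, n_bits) 0/1 array. In an MBPSO-style encoding,
    the bits could represent a candidate cluster count and a feature mask.
    pbest: (n_particles, n_bits) personal-best positions.
    gbest: (n_bits,) global-best position.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(positions.shape)
    r2 = rng.random(positions.shape)
    # Standard PSO velocity: inertia + cognitive pull + social pull.
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    # Sigmoid maps each velocity to a probability that the bit is 1.
    prob = 1.0 / (1.0 + np.exp(-velocities))
    positions = (rng.random(positions.shape) < prob).astype(int)
    return positions, velocities
```

In a full MBPSO loop, each candidate position would be decoded into a clustering configuration and scored with the KFCM-based fitness function before `pbest` and `gbest` are refreshed.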
Furthermore, the fuzzy Rand validity measures obtained on the three credit datasets by the proposed method of combining the GK algorithm with MBPSO are greater than the values of the two other compared methods. This finding means that the fuzzy partitioning (classification) is robust; therefore, the risk associated with loan provision can be reduced when the proposed method is used.
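The fuzzy Rand measure referenced here generalizes the classical Rand index from crisp to fuzzy partitions, in the spirit of Hüllermeier et al. (2012): two partitions agree to the extent that pairs of objects have similar equivalence degrees under both. The sketch below is one common formulation using a normalized L1 distance between membership vectors; it is an illustrative assumption, not necessarily the exact variant used in the paper.

```python
import numpy as np
from itertools import combinations

def fuzzy_rand(U, V):
    """Fuzzy Rand index between two fuzzy partitions (illustrative sketch).

    U: (n, k) membership matrix, rows summing to 1.
    V: (n, l) membership matrix, rows summing to 1.
    Returns a value in [0, 1]; 1 means the partitions agree perfectly.
    """
    n = U.shape[0]
    disagreement = 0.0
    for i, j in combinations(range(n), 2):
        # Degree to which objects i and j are "equivalent" in each partition.
        e_u = 1.0 - 0.5 * np.abs(U[i] - U[j]).sum()
        e_v = 1.0 - 0.5 * np.abs(V[i] - V[j]).sum()
        disagreement += abs(e_u - e_v)
    return 1.0 - disagreement / (n * (n - 1) / 2)
```

For crisp (0/1) membership matrices this reduces to the ordinary Rand index, which is why values closer to 1 indicate a more robust partitioning.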
This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials. You may use the digitized material for private study, scholarship, or research.