Evaluating the impact of point-biserial correlation-based feature selection on machine learning classifiers: a credit card fraud detection case study

Ahmed A.H. Alkurdi; Renas R. Asaad; Saman M. Almufti; Nawzat S. Ahmed

doi:10.20397/2177-6652/2024.v24iSpecial.2882

Authors

Ahmed A.H. Alkurdi, Department of Information Technology, Duhok Technical College, Duhok Polytechnic University, Duhok, KRG-Iraq; Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq.
Renas R. Asaad, Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq; Department of Technical Informatics, Technical College of Informatics, Akre University for Applied Science, Duhok, KRG-Iraq.
Saman M. Almufti, Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq; Department of Technical Informatics, Technical College of Informatics, Akre University for Applied Science, Duhok, KRG-Iraq.
Nawzat S. Ahmed, Department of Information Technology, Duhok Technical College, Duhok Polytechnic University, Duhok, KRG-Iraq.

Keywords:

Credit Card, Fraud, Machine learning, Predictive performance, PBC-based feature selection

Abstract

Objective: This article examines the factors influencing the awareness and adoption of International Public Sector Accounting Standards (IPSAS) in public units in Vietnam. It seeks to identify key challenges and drivers that affect the understanding and implementation of these standards.

Methods: The study uses a survey methodology, gathering responses from a sample of public service units in Vietnam. The survey is designed to assess the level of awareness and readiness of these units to adopt IPSAS, considering variables such as management support, training, and technical infrastructure. Statistical analysis was performed to determine the most influential factors.

Results: The findings highlight that managerial support, adequate training, and access to proper technical infrastructure are crucial for successful IPSAS implementation. Lack of awareness, insufficient training, and resource limitations are the primary barriers to the adoption of these standards. Public units that have higher levels of awareness and better access to resources are more likely to successfully implement IPSAS.

Contribution: The study provides valuable insights into the process of adopting IPSAS in Vietnam’s public sector. It offers recommendations for improving training programs, enhancing managerial support, and strengthening the technical capacity of public units to ensure smoother implementation of the standards.

Conclusion: The implementation of IPSAS in Vietnam's public sector is affected by several key factors, including awareness, training, and infrastructure. Strengthening these areas can significantly improve the adoption process and enhance transparency and accountability in public financial management.

