Evaluating the impact of point-biserial correlation-based feature selection on machine learning classifiers: a credit card fraud detection case study

Ahmed A.H. Alkurdi; Renas R. Asaad; Saman M. Almufti; Nawzat S. Ahmed

doi:10.20397/2177-6652/2024.v24iSpecial.2882

Authors

Ahmed A.H. Alkurdi: Department of Information Technology, Duhok Technical College, Duhok Polytechnic University, Duhok, KRG-Iraq; Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq.
Renas R. Asaad: Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq; Department of Technical Informatics, Technical College of Informatics, Akre University for Applied Science, Duhok, KRG-Iraq.
Saman M. Almufti: Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq; Department of Technical Informatics, Technical College of Informatics, Akre University for Applied Science, Duhok, KRG-Iraq.
Nawzat S. Ahmed: Department of Information Technology, Duhok Technical College, Duhok Polytechnic University, Duhok, KRG-Iraq.

Keywords:

Credit Card, Fraud, Machine Learning, Predictive Performance, PBC-Based Feature Selection

Abstract

Objective: This article examines the factors that influence awareness and adoption of the International Public Sector Accounting Standards (IPSAS) in public units in Vietnam. It aims to identify the main challenges and drivers that affect the understanding and implementation of these standards.

Methods: The study uses a survey methodology, collecting responses from a sample of public service units in Vietnam. The questionnaire was designed to assess these units' level of awareness and readiness to adopt IPSAS, considering variables such as managerial support, training, and technical infrastructure. A statistical analysis was performed to determine the most influential factors.

Results: The results highlight that managerial support, adequate training, and access to appropriate technical infrastructure are crucial to the successful implementation of IPSAS. Lack of awareness, insufficient training, and resource constraints are the main barriers to adopting these standards. Public units with higher levels of awareness and better access to resources are more likely to implement IPSAS successfully.

Contribution: The study offers insights into the IPSAS adoption process in Vietnam's public sector. It provides recommendations for improving training programs, increasing managerial support, and strengthening the technical capacity of public units to ensure a smoother implementation of the standards.

Conclusion: The implementation of IPSAS in Vietnam's public sector is influenced by several key factors, such as awareness, training, and infrastructure. Strengthening these areas can significantly improve the adoption process and increase transparency and accountability in public financial management.
