Research on the Impact of Data Volume on the Accuracy of Anomaly Detection Methods in Network Traffic

Authors

Anastasia Ma, Elena Avksentieva, Nikolai Zhukov

DOI:

https://doi.org/10.20397/2177-6652/2025.v25i2.3161

Abstract

This article examines the use of machine learning algorithms to detect anomalies using the CICIDS2017 dataset, which was specifically designed to simulate real-world network attack scenarios. Special attention is paid to three popular algorithms: logistic regression, random forest, and neural networks. These algorithms were chosen for their ability to efficiently process large amounts of data and identify complex patterns. A series of experiments was conducted in which the amount of training data was varied and model performance was evaluated on both clean and noisy data. On noisy data, neural networks retain their lead with a slight drop in accuracy, while random forest performs well but is less effective than on clean data. Logistic regression, though the most sensitive to noise, improves with larger datasets, which emphasizes the need for thorough preprocessing. The results of this study help to clarify how different algorithms respond to changes in the amount of data and the quality of input information, an important consideration for developing effective cybersecurity systems.
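The experimental design described in the abstract (varying the training set size and comparing the three model families on clean and noisy data) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, substitutes a synthetic dataset from `make_classification` for CICIDS2017, and models "noise" as Gaussian perturbation of the features. The names `make_models` and `run_experiment`, the training sizes, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_models():
    """Return fresh instances of the three model families named in the abstract."""
    return {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "neural_network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0),
        ),
    }


def run_experiment(train_sizes=(200, 1000, 3000), noise_std=0.0, seed=0):
    """Train each model on growing subsets and report held-out accuracy."""
    # Assumption: synthetic binary data stands in for CICIDS2017 flow features.
    X, y = make_classification(
        n_samples=4000, n_features=20, n_informative=10, random_state=seed
    )
    # Assumption: "noisy data" is modeled as additive Gaussian feature noise.
    rng = np.random.default_rng(seed)
    X = X + rng.normal(0.0, noise_std, size=X.shape)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    results = {}
    for n in train_sizes:
        for name, model in make_models().items():
            # Use only the first n training rows to emulate a smaller dataset.
            model.fit(X_train[:n], y_train[:n])
            results[(name, n)] = accuracy_score(y_test, model.predict(X_test))
    return results


if __name__ == "__main__":
    for label, std in (("clean", 0.0), ("noisy", 1.0)):
        for (name, n), acc in sorted(run_experiment(noise_std=std).items()):
            print(f"{label:5s} {name:20s} n={n:5d} accuracy={acc:.3f}")
```

Accuracy on this synthetic data says nothing about the figures reported in the article; the sketch only shows the shape of the comparison, with one accuracy value per model, training size, and noise setting.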

Author Biographies

Anastasia Ma, ITMO University, Saint Petersburg, Russia

Faculty of Software Engineering and Computer Systems

Elena Avksentieva, ITMO University, Saint Petersburg, Russia

Faculty of Software Engineering and Computer Systems

Nikolai Zhukov, ITMO University, Saint Petersburg, Russia; The Herzen State Pedagogical University of Russia, Saint Petersburg, Russia

1 Faculty of Software Engineering and Computer Systems
2 Institute of Computer Science and Technology Education

Published

2025-04-07

How to Cite

Ma, A., Avksentieva, E., & Zhukov, N. (2025). Research on the Impact of Data Volume on the Accuracy of Anomaly Detection Methods in Network Traffic. Revista Gestão & Tecnologia, 25(2), 108–125. https://doi.org/10.20397/2177-6652/2025.v25i2.3161