Research on the Impact of Data Volume on the Accuracy of Anomaly Detection Methods in Network Traffic

Authors

DOI:

https://doi.org/10.20397/2177-6652/2025.v25i2.3161

Abstract

This article discusses the use of machine learning algorithms to detect anomalies in network traffic using the CICIDS2017 dataset, which was specifically designed to simulate real-world network attack scenarios. Special attention is paid to three popular algorithms: logistic regression, random forest, and neural networks. These algorithms were chosen for their ability to process large amounts of data efficiently and to identify complex patterns. In this article, a series of experiments is conducted in which the amount of training data is varied and model performance is evaluated on both clean and noisy data. On noisy data, neural networks retain their lead with only a slight drop in accuracy, while random forest performs well but is less effective than on clean data. Logistic regression, although the most sensitive to noise, improves with larger datasets, which emphasizes the need for thorough preprocessing. The results of this study help clarify how different algorithms respond to changes in the amount of data and in the quality of the input, an important aspect of developing effective cybersecurity systems.
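The experimental design summarized above (varying the training-set size and comparing clean versus noisy data) can be sketched as a learning-curve loop. The following is a minimal, hypothetical Python sketch using only the standard library. It makes several assumptions that are not from the article: two synthetic Gaussian clusters stand in for CICIDS2017 flows, "noise" is modeled as additive Gaussian feature noise applied to both training and test data (the abstract does not say how noise was introduced), and only logistic regression is trained, from scratch by batch gradient descent. The names `make_traffic`, `train_logreg`, `accuracy`, and `learning_curve` are illustrative only. This is not the authors' code and does not reproduce their results.

```python
import math
import random


def make_traffic(n, rng, noise_std=0.0):
    """Hypothetical synthetic 'flows' with 2 features each (NOT CICIDS2017).
    Benign (label 0) is centred at (-1, -1), attack (label 1) at (+1, +1).
    Assumption: noise is extra Gaussian jitter added to every feature."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        x = [rng.gauss(centre, 1.0) + rng.gauss(0.0, noise_std) for _ in range(2)]
        data.append((x, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(data, epochs=200, lr=0.1):
    """Plain logistic regression fitted by full-batch gradient descent."""
    w, b, n = [0.0, 0.0], 0.0, len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y  # dLoss/dlogit
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def accuracy(model, data):
    # Fraction of samples whose thresholded prediction matches the label.
    w, b = model
    correct = sum(
        (sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5) == bool(y) for x, y in data
    )
    return correct / len(data)


def learning_curve(train_sizes, noise_std, seed=0):
    """Train on growing prefixes of one pool; score on a held-out set
    drawn with the same noise level. Returns {train_size: accuracy}."""
    rng = random.Random(seed)
    test = make_traffic(1000, rng, noise_std)
    pool = make_traffic(max(train_sizes), rng, noise_std)
    return {size: accuracy(train_logreg(pool[:size]), test) for size in train_sizes}
```

For example, `learning_curve([20, 100, 500], 0.0)` and `learning_curve([20, 100, 500], 1.0)` give per-size accuracies for the clean and noisy settings. Random forest or neural network models could be dropped into the same loop in place of `train_logreg`; nesting the training subsets (prefixes of one pool) keeps the comparison across sizes from being dominated by resampling variance.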

Author biographies

Anastasia Ma, ITMO University, Saint Petersburg, Russia

Faculty of Software Engineering and Computer Systems

Elena Avksentieva, ITMO University, Saint Petersburg, Russia

Faculty of Software Engineering and Computer Systems

Nikolai Zhukov, ITMO University, Saint Petersburg, Russia; The Herzen State Pedagogical University of Russia, Saint Petersburg, Russia

1 Faculty of Software Engineering and Computer Systems
2 Institute of Computer Science and Technology Education


Published

2025-04-07

How to cite

Ma, A., Avksentieva, E., & Zhukov, N. (2025). Research on the Impact of Data Volume on the Accuracy of Anomaly Detection Methods in Network Traffic. Revista Gestão & Tecnologia, 25(2), 108–125. https://doi.org/10.20397/2177-6652/2025.v25i2.3161