Research on the Impact of Data Volume on the Accuracy of Anomaly Detection Methods in Network Traffic
DOI:
https://doi.org/10.20397/2177-6652/2025.v25i2.3161
Abstract
This article examines the use of machine learning algorithms for anomaly detection on the CICIDS2017 dataset, which was designed to simulate real-world network attack scenarios. It focuses on three popular algorithms: logistic regression, random forest, and neural networks. These algorithms were chosen for their ability to process large amounts of data efficiently and to identify complex patterns. A series of experiments was conducted in which the amount of training data was varied and model performance was evaluated on both clean and noisy data. On noisy data, neural networks retain their lead with only a slight drop in accuracy, while random forest still performs well but is less effective than on clean data. Logistic regression, although the most sensitive to noise, improves with larger datasets, which emphasizes the need for thorough preprocessing. The results help to clarify how different algorithms respond to changes in the amount of data and the quality of the input, an important consideration for developing effective cybersecurity systems.
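The experimental design described above (vary the training set size, then score on clean and noisy data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic two-feature data stands in for CICIDS2017, the Gaussian noise model and its `sigma` are assumptions (the abstract does not say how noise was introduced), and the nearest-centroid classifier is a stand-in for the three models studied. All function names are hypothetical.

```python
import random
from statistics import mean


def make_toy_data(n, rng):
    """Synthetic two-class data (assumption: stands in for CICIDS2017 features)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 ("attack") is shifted away from class 0 ("benign").
        center = 2.0 if label else 0.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def add_noise(rows, sigma, rng):
    """Return a copy of rows with Gaussian noise added to every feature."""
    return [([x + rng.gauss(0.0, sigma) for x in feats], y) for feats, y in rows]


def fit_centroids(rows):
    """Stand-in model: nearest class centroid (not one of the paper's models)."""
    centroids = {}
    for label in {y for _, y in rows}:
        members = [feats for feats, y in rows if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(feats):
        # Pick the class whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(feats, centroids[c])),
        )

    return predict


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(feats) == y for feats, y in rows) / len(rows)


def run_experiment(train, test, fractions, sigma, seed=0):
    """Train on growing subsets; score each on clean and on noisy data.

    Assumption: in the noisy condition, noise is added to both the
    training subset and the test set.
    """
    rng = random.Random(seed)
    noisy_test = add_noise(test, sigma, rng)
    results = []
    for frac in fractions:
        k = max(2, int(len(train) * frac))
        subset = rng.sample(train, k)
        clean_model = fit_centroids(subset)
        noisy_model = fit_centroids(add_noise(subset, sigma, rng))
        results.append({
            "fraction": frac,
            "n_train": k,
            "clean_acc": accuracy(clean_model, test),
            "noisy_acc": accuracy(noisy_model, noisy_test),
        })
    return results
```

Calling `run_experiment(train, test, [0.1, 0.5, 1.0], sigma=1.0)` returns one record per training fraction, so the clean and noisy accuracies can be compared side by side as the training set grows. Swapping `fit_centroids` for a real logistic regression, random forest, or neural network would bring the sketch closer to the comparison the abstract reports.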
License
Copyright 2025 Revista Gestão & Tecnologia

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
All rights, including translation rights, are reserved. Parts of articles may be cited without prior authorization, provided that the source is identified. Reproduction of articles in full is prohibited. In case of doubt, consult the Editor.