An Empirical Study on AI-Driven Anomaly Detection for Improving Data Integrity in Real-Time Data Streams

Authors

  • Martinez Nguyen Sophia, Real-Time AI Anomaly Detection Engineer, United Kingdom

Keywords:

Anomaly Detection, Data Integrity, Real-Time Data Streams, Artificial Intelligence, Machine Learning, LSTM Autoencoder, Stream Processing, IoT Data, Hybrid Models

Abstract

The proliferation of real-time data streams from IoT sensors, financial transactions, and network logs presents significant challenges in maintaining data integrity. Anomalies, whether from faulty sources, malicious attacks, or system errors, can corrupt analytics and decision-making. This paper presents an empirical study evaluating the efficacy of Artificial Intelligence (AI)-driven models for anomaly detection in real-time streaming environments. We implement and compare a hybrid model combining a Long Short-Term Memory (LSTM) autoencoder with a contextual random forest classifier against standalone models. Using a benchmark dataset of IoT sensor readings, our experiments demonstrate that the hybrid approach achieves a superior balance between high detection precision (94.3%) and low latency (under 15 ms). The study further proposes a modular architectural framework for integrating such AI models into streaming pipelines, highlighting practical considerations for deployment. Results confirm that AI-driven methods are pivotal for ensuring data integrity by enabling immediate, accurate identification of deviations in high-velocity data.
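As a rough illustration of the two-stage hybrid idea summarized in the abstract, the following minimal Python sketch wires a reconstruction-error stage to a contextual second stage over a sliding window. It is not the authors' implementation: `mean_reconstructor` is a deliberately trivial stand-in for the LSTM autoencoder, `range_context_classifier` is a stand-in for the contextual random forest, and the window size, error threshold, valid sensor range, and sample readings are all hypothetical values chosen for demonstration. The assumption that the second stage only confirms or rejects windows flagged by the first stage is also ours, not stated in the abstract.

```python
from collections import deque
from statistics import fmean
from time import perf_counter
from typing import Callable, Iterable, List, Sequence, Tuple

Reconstructor = Callable[[Sequence[float]], List[float]]
ContextClassifier = Callable[[Tuple[float, float]], bool]


def mean_reconstructor(window: Sequence[float]) -> List[float]:
    """Stand-in for an LSTM autoencoder: 'reconstructs' a window as its mean."""
    m = fmean(window)
    return [m] * len(window)


def reconstruction_error(window: Sequence[float], reconstruct: Reconstructor) -> float:
    """Mean squared error between a window and its reconstruction."""
    recon = reconstruct(window)
    return fmean((x - r) ** 2 for x, r in zip(window, recon))


def range_context_classifier(features: Tuple[float, float]) -> bool:
    """Stand-in for a contextual random forest.

    Confirms an anomaly only if the latest reading lies outside a
    hypothetical valid sensor range of 0 to 50.
    """
    _error, latest = features
    return not (0.0 <= latest <= 50.0)


def detect(
    stream: Iterable[float],
    reconstruct: Reconstructor,
    classify: ContextClassifier,
    window_size: int = 5,        # hypothetical
    error_threshold: float = 4.0,  # hypothetical
) -> List[Tuple[float, bool, float]]:
    """Return (value, flagged, latency_ms) for each reading in the stream."""
    window: deque = deque(maxlen=window_size)
    results = []
    for value in stream:
        start = perf_counter()
        window.append(value)
        flagged = False
        if len(window) == window_size:
            # Stage 1: score the current window by reconstruction error.
            err = reconstruction_error(window, reconstruct)
            # Stage 2: only suspicious windows reach the contextual check.
            if err > error_threshold:
                flagged = classify((err, value))
        latency_ms = (perf_counter() - start) * 1000.0
        results.append((value, flagged, latency_ms))
    return results


# Hypothetical sensor readings with one injected spike.
readings = [20.0, 20.5, 19.8, 20.2, 20.1, 95.0, 20.3, 20.0, 19.9, 20.4, 20.2]
results = detect(readings, mean_reconstructor, range_context_classifier)
flagged = [value for value, is_anomaly, _ in results if is_anomaly]
print(flagged)
```

In this toy run the spike keeps the reconstruction error high for every window that still contains it, and the second stage is what prevents the normal readings that follow from being flagged. Both callables are passed in as parameters, so trained models could be substituted without changing the control flow; the per-record latency is measured only to show where such a measurement would sit, not to reproduce the reported figure.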

Published

2025-08-09

How to Cite

An Empirical Study on AI-Driven Anomaly Detection for Improving Data Integrity in Real-Time Data Streams. (2025). INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY RESEARCH (IJETTR), 6(2), 13-21. https://ijettr.com/index.php/IJETTR/article/view/IJETTR_06_02_003