UPSI Digital Repository (UDRep)
Abstract: Universiti Pendidikan Sultan Idris
In data analysis, recognizing unusual patterns (outlier analysis or anomaly detection) plays a crucial role in identifying critical events. Because it is widely used in many applications, it remains an important and extensive research branch in data mining. As a result, numerous techniques for finding anomalies have been developed, and more are still being developed. Identifying anomalies gives researchers vital knowledge and helps them produce more meaningful data analyses. However, anomaly detection is even more challenging when the datasets are high-dimensional and multivariate. In the literature, anomaly detection in general has received much attention, but anomaly detection under high-dimensional and multivariate conditions specifically has received considerably less. This paper systematically reviews the existing related techniques and presents extensive coverage of the challenges and perspectives of anomaly detection in high-dimensional and multivariate data. It also provides clear insight into the techniques developed for anomaly detection problems. The paper aims to help readers select the technique best suited to their purpose. PCA, DOBIN, the Stray algorithm, and DAE-KNN were found to have a high learning rate compared to the random projection, ROBEM, and OCP methods. Overall, most methods have shown an excellent ability to tackle the curse of dimensionality and multivariate features when performing anomaly detection. Moreover, a comparison of the anomaly detection algorithms is provided to support the development of better algorithms. Finally, future studies could extend this work by comparing the methods on other domain-specific datasets and by offering a comprehensive interpretation of anomalies that describes their true nature. © 2023, Politeknik Negeri Padang. All rights reserved.
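Several of the methods named above (the Stray algorithm and DAE-KNN) build on distances to nearest neighbours. As a hedged illustration of that shared idea only, and not of either published algorithm, the sketch below scores each point by its Euclidean distance to its k-th nearest neighbour and flags points whose score is far above the median. The function names `knn_scores` and `flag_outliers`, the default k = 2, and the "3 times the median" threshold rule are all hypothetical choices made for this example.

```python
import math
import statistics
from typing import List, Sequence


def knn_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Return, for each point, the Euclidean distance to its k-th nearest neighbour.

    Larger scores suggest the point lies far from the bulk of the data.
    This is a generic k-NN distance score, not the Stray or DAE-KNN algorithm.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point (p itself excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points: Sequence[Sequence[float]], k: int = 2,
                  threshold: float = 3.0) -> List[int]:
    """Hypothetical rule: flag indices whose k-NN score exceeds threshold * median score."""
    scores = knn_scores(points, k)
    median = statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > threshold * median]


if __name__ == "__main__":
    # Five points in a tight 3-D cluster plus one distant point.
    data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
            (0.0, 0.0, 0.1), (0.1, 0.1, 0.1), (5.0, 5.0, 5.0)]
    print(flag_outliers(data))  # → [5]
```

Brute-force pairwise distances cost O(n²) and, as the abstract notes for high-dimensional data, Euclidean distances tend to concentrate as the number of dimensions grows; this is one reason methods such as DOBIN first reduce or transform the dimensions before scoring.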