A COMPREHENSIVE REVIEW ON DATA STREAM MINING TECHNIQUES FOR DATA CLASSIFICATION; AND FUTURE TRENDS

Authors

  • Faisal Ramzan, Dipartimento di Informatica - Scienza e Ingegneria, ALMA MATER STUDIORUM - Università di Bologna
  • Muawaz Ayyaz, Department of Computer Science and IT, The University of Lahore, Gujrat Campus, Gujrat, Pakistan

DOI:

https://doi.org/10.53555/ephijse.v9i3.201

Keywords:

Data Stream Mining, Rapid Development, Classification, Clustering, D-Stream, HPStream, ANNCAD, CDM, AWSOM, CluStream, Approximate Frequent Counts

Abstract

Data mining is an evolving interdisciplinary field that encompasses data retrieval and data stream mining techniques; it is concerned with gathering, managing, processing, analyzing, and visualizing large volumes of structured or unstructured data. Data stream mining uses algorithms to discover previously unknown patterns in massive amounts of data. It has developed rapidly alongside significant progress in mathematics, statistics, data science, and computer science. Data streams are commonly generated by sources such as sensor networks, social media feeds, financial transactions, online retail, network traffic, and many other applications. The gathered data can be used for various purposes, for example performance evaluation, anomaly detection, change detection, or fault diagnosis of operating systems. Such analysis is carried out using a range of data stream mining techniques. This paper provides a broad overview of the distinct approaches used for data stream mining. First, we study the different data stream mining techniques. Next, we discuss the different clustering and classification techniques and their benefits. We then examine evaluations of these techniques, which indicate that some are feasible for real-time data streams and some are not. This study provides a thorough understanding of the techniques and their benefits. Existing studies of data mining techniques are not yet sufficiently exhaustive, so future work is needed to assess which techniques are feasible for real-time data streams.

Published

2023-08-11