Researchers Find That Tracking Hidden Data Clusters Boosts Predictions in Imbalanced and Evolving Data (11 March 2026)
Think about anomaly detection, software defect prediction, credit card fraud detection, network fault diagnosis, or spam filtering in real-time environments. In all of these settings, the timely identification of patterns and irregularities is critical.
Nowadays, data is increasingly generated in the form of massive streams. Methods based on traditional offline training have become obsolete, because they cannot handle the evolving nature of such data. This situation has driven the development of efficient, adaptive, and scalable data stream classification techniques that maintain accuracy, responsiveness, and reliability in data-driven systems.
Supervised data stream methods let models learn incrementally from labeled data while adapting to changing patterns over time. As a result, these techniques have gained significant prominence, given the growing availability of continuous and rapidly evolving data in modern applications.
Continuing our lab's work from 2024, we have designed another ensemble method for data stream classification, titled "MinoClust: Exploiting Minority Sub-Cluster Dynamics for Classification in Imbalanced and Drifting Data Streams". The work was published in IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) in March 2026 (early access). Classical methods largely ignore local drifts, that is, drifts that occur within classes, and this leads to inferior performance. Local drifts are within-class sub-cluster transitions. For instance, the minority class may contain several clusters of data. Over time, these clusters can move from one location to another, a large cluster can split into smaller ones, or small clusters can merge into a bigger one. Addressing the class imbalance problem together with this behaviour can lead to higher performance.
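To make the three kinds of local drift concrete, the following Python sketch labels transitions between the minority sub-cluster centroids of two consecutive batches. It is a minimal illustration under stated assumptions, not MinoClust's actual procedure: the matching rule (nearest centroid within a hypothetical `radius`), the `move_tol` threshold, and the helper names `nearest` and `classify_transitions` are all invented for this example.

```python
import math


def nearest(point, candidates, radius):
    """Index of the closest candidate centroid within `radius`, else None."""
    best, best_d = None, None
    for idx, cand in enumerate(candidates):
        d = math.dist(point, cand)
        if d <= radius and (best_d is None or d < best_d):
            best, best_d = idx, d
    return best


def classify_transitions(old, new, radius=2.0, move_tol=0.5):
    """Label transitions between minority sub-clusters of two consecutive batches.

    `old` and `new` are lists of centroid tuples. `radius` and `move_tol` are
    hypothetical thresholds; this is an illustrative heuristic only.
    """
    succ = [nearest(c, new, radius) for c in old]  # old -> closest new centroid
    pred = [nearest(c, old, radius) for c in new]  # new -> closest old centroid
    events = {k: [] for k in ("split", "merge", "move", "stable", "vanished", "emerged")}

    for i, j in enumerate(succ):
        children = [k for k, p in enumerate(pred) if p == i]
        if len(children) >= 2:          # one old cluster now backs several new ones
            events["split"].append((i, children))
        elif j is None:                 # nothing nearby in the new batch
            events["vanished"].append(i)

    for j, p in enumerate(pred):
        parents = [i for i, s in enumerate(succ) if s == j]
        if len(parents) >= 2:           # several old clusters collapse into one
            events["merge"].append((parents, j))
        elif p is None:                 # a new dense region with no predecessor
            events["emerged"].append(j)

    split_old = {i for i, _ in events["split"]}
    merged_new = {j for _, j in events["merge"]}
    for i, j in enumerate(succ):
        # Remaining one-to-one (mutual nearest) matches are either moving or stable.
        if j is None or pred[j] != i or i in split_old or j in merged_new:
            continue
        shift = math.dist(old[i], new[j])
        events["move" if shift > move_tol else "stable"].append((i, j))
    return events
```

A one-to-many match signals a split, a many-to-one match signals a merge, and a one-to-one match whose centroid shifts by more than `move_tol` counts as a move. Mutual nearest-neighbour matching keeps the sketch short; a real system would also need a principled way to choose the thresholds.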
MinoClust targets imbalanced classification in non-stationary data streams. It clusters each batch of data, identifies trending dense regions of the minority space, and performs targeted resampling in those regions. We build an ensemble by training classifiers on this resampled data on the fly and continually adding them to the ensemble pool. Because the resampling is concentrated in trending regions (a form of informed resampling), this process directly improves classification performance. We evaluated MinoClust on 37 datasets with high imbalance ratios and a variety of concept drifts, and found that it outperforms many well-known methods.
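The batch-level pipeline described above can be sketched in plain Python as follows. This is a hedged illustration, not the published algorithm: "trending" regions are approximated here by a hypothetical density score (cluster size divided by one plus mean spread), and the k-means settings, SMOTE-style interpolation, nearest-centroid base learner, binary labels, and fixed-size pool with majority voting are all assumptions chosen for brevity. Every function and class name is hypothetical.

```python
import math
import random


def kmeans(points, k, iters=10):
    """Plain k-means seeded with the first k points (illustrative only; k >= 1)."""
    cents = [list(p) for p in points[:k]]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in cents]
        for p in points:
            idx = min(range(len(cents)), key=lambda c: math.dist(p, cents[c]))
            groups[idx].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else c
                 for g, c in zip(groups, cents)]
    return cents, groups


def dense_regions(minority, k=3, top=2):
    """Return the `top` minority sub-clusters with the highest density score.

    Hypothetical score: size / (1 + mean distance to centroid); only
    sub-clusters with at least two points are considered.
    """
    cents, groups = kmeans(minority, min(k, len(minority)))
    scored = []
    for c, g in zip(cents, groups):
        if len(g) >= 2:
            spread = sum(math.dist(p, c) for p in g) / len(g)
            scored.append((len(g) / (1.0 + spread), g))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [g for _, g in scored[:top]]


def resample(X, y, minority_label=1, seed=0):
    """Oversample the minority class (binary labels) inside its densest sub-clusters."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    deficit = sum(lab != minority_label for lab in y) - len(minority)
    if not minority or deficit <= 0:
        return list(X), list(y)
    regions = dense_regions(minority) or [minority]  # fall back to the whole class
    synth = []
    for n in range(deficit):
        g = regions[n % len(regions)]  # round-robin over the selected regions
        a, b = rng.choice(g), rng.choice(g)
        t = rng.random()
        # SMOTE-style point on the segment between two members of the region
        synth.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return list(X) + synth, list(y) + [minority_label] * len(synth)


class NearestCentroid:
    """Tiny stand-in base learner: predict the label whose class mean is closest."""

    def fit(self, X, y):
        self.means = {}
        for lab in set(y):
            pts = [x for x, c in zip(X, y) if c == lab]
            self.means[lab] = [sum(col) / len(pts) for col in zip(*pts)]
        return self

    def predict(self, x):
        return min(self.means, key=lambda lab: math.dist(x, self.means[lab]))


class ChunkEnsemble:
    """Train one learner per resampled batch and keep the newest `max_size`."""

    def __init__(self, max_size=5):
        self.pool, self.max_size = [], max_size

    def update(self, X, y):
        Xr, yr = resample(X, y)
        self.pool.append(NearestCentroid().fit(Xr, yr))
        self.pool = self.pool[-self.max_size:]

    def predict(self, x):
        # Majority vote over the pool; call update() at least once first.
        votes = [m.predict(x) for m in self.pool]
        return max(set(votes), key=votes.count)
```

In a stream setting, `update` would be called on each incoming labeled batch, and predictions for the next batch would be made before its labels arrive. Restricting interpolation to selected dense sub-clusters is what distinguishes this informed resampling from oversampling the minority class uniformly.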
The full article can be found here: https://ieeexplore.ieee.org/document/11427338/
The team involved: Hina Farooq, Muhammad Usman, Ruibin Bai, and Huanhuan Chen.
Here are short biographies of the authors:
Hina Farooq: Hina Farooq earned her B.S. in Computer Science in 2015 and her M.S. in Computer Science in 2018 from COMSATS University, Islamabad, Pakistan. She is currently pursuing a Ph.D. in Computer Science at the School of Computer Science and Technology, University of Science and Technology of China, Hefei, China. Her research interests primarily focus on data stream mining and machine learning.
Muhammad Usman: Muhammad Usman received his B.Sc. degree from the International Islamic University, Islamabad, Pakistan, in 2006, and his Master's degree from SZABIST, Islamabad, in 2017. He obtained his Ph.D. in Computer Science from the University of Science and Technology of China (USTC), Hefei, China. He is currently working with the Pakistan Scientific and Technological Information Center, Islamabad (Pakistan Science Foundation). His research interests include data stream mining, machine learning, and knowledge discovery, with a particular focus on imbalanced learning and concept drift in data streams. He has published several research articles in internationally recognized journals and actively works on developing data-driven solutions for scientific and research applications.
Ruibin Bai: Ruibin Bai (Senior Member, IEEE) obtained his B.Sc. and M.Sc. degrees from Northwestern Polytechnical University, China, in 1999 and 2002, respectively, and completed his Ph.D. at the University of Nottingham, Nottingham, U.K., in 2005. He is currently a Professor and Head of the School of Computer Science at the University of Nottingham Ningbo China, Ningbo, China. He also leads the Artificial Intelligence and Optimisation (AIOP) Group and the Ningbo Digital Port Technologies Key Laboratory. His research focuses on computational intelligence, reinforcement learning, operations research, scheduling, and optimization, with particular emphasis on transportation systems and port logistics.
Huanhuan Chen: Huanhuan Chen (Fellow, IEEE) received his B.Sc. degree from the University of Science and Technology of China (USTC), Hefei, China, in 2004, and earned his Ph.D. in Computer Science from the University of Birmingham, Birmingham, U.K., in 2008. He is currently a Full Professor at the School of Computer Science and Technology, USTC. His research interests include neural networks, Bayesian inference, and evolutionary computation. Dr. Chen has received several prestigious awards, including the International Neural Network Society Young Investigator Award in 2015, the IEEE Computational Intelligence Society Outstanding Ph.D. Dissertation Award in 2012, the IEEE Transactions on Neural Networks Outstanding Paper Award in 2011 (for a paper published in 2009), and the British Computer Society Distinguished Dissertations Award in 2009. He currently serves as an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems and IEEE Transactions on Emerging Topics in Computational Intelligence.
