Adaptive Swarm Intelligence Algorithms for High-Dimensional Data Clustering in Big Data Analytics

Authors

  • Eri Eli Lavindi, Politeknik Negeri Semarang
  • Nina Faoziyah, Universitas Muhammadiyah Tegal

DOI:

https://doi.org/10.63846/dfxajx74

Keywords:

Swarm Intelligence, Big Data Clustering, Dimensionality Reduction, Hybrid Optimization Algorithms, Distributed Computing

Abstract

The exponential growth of big data and its increasing dimensionality pose significant challenges for traditional clustering algorithms, particularly in computational efficiency and solution quality. This study addresses limitations of existing swarm intelligence approaches by introducing a Hybrid Adaptive Swarm Intelligence (HASI) algorithm for high-dimensional data clustering. The proposed method combines Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) with an adaptive dimensionality reduction mechanism to counter premature convergence and poor scalability in complex data environments. HASI integrates a dynamic feature selection technique and runs on a distributed computing framework compatible with Apache Spark. Experimental validation on synthetic and real-world big data benchmarks shows that the proposed approach achieves up to a 37% improvement in clustering accuracy and a 52% reduction in computational complexity compared with state-of-the-art swarm intelligence clustering methods. The adaptive mechanism dynamically balances exploration and exploitation, enabling more robust and efficient clustering in high-dimensional spaces. The research contributes a scalable, adaptive swarm intelligence framework for big data analytics that addresses the computational challenges inherent in high-dimensional data processing.
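
The abstract does not give HASI's update rules, so the following is only a minimal sketch of two ingredients it names: feature selection ahead of clustering, and a PSO whose balance shifts from exploration to exploitation. Everything specific here is an assumption for illustration and not the authors' method: the variance-based ranking in the hypothetical `top_variance_features`, the linearly decaying inertia weight (0.9 to 0.4), the acceleration constants of 1.5, and the synthetic data from `make_demo`. The ACO component and the Spark distribution layer are omitted entirely.

```python
import random


def top_variance_features(data, keep):
    """Return sorted indices of the `keep` features with the largest variance.

    Assumption: variance ranking stands in for the paper's (unspecified)
    dynamic feature selection technique.
    """
    n, dims = len(data), len(data[0])
    variances = []
    for j in range(dims):
        mean = sum(row[j] for row in data) / n
        variances.append(sum((row[j] - mean) ** 2 for row in data) / n)
    ranked = sorted(range(dims), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for row in data:
        total += min(sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids)
    return total


def pso_cluster(data, k, particles=15, iters=60, seed=0):
    """Search for k centroids with PSO; returns (centroids, best SSE)."""
    rng = random.Random(seed)
    dims = len(data[0])

    def unflat(vec):
        # A particle is a flat list of k * dims centroid coordinates.
        return [vec[i * dims:(i + 1) * dims] for i in range(k)]

    # Initialise every particle from k randomly chosen data points.
    pos = [[x for p in rng.sample(data, k) for x in p] for _ in range(particles)]
    vel = [[0.0] * (k * dims) for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [sse(unflat(p), data) for p in pos]
    g = min(range(particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    for t in range(iters):
        # Assumed schedule: high inertia early (exploration),
        # low inertia late (exploitation).
        w = 0.9 - 0.5 * t / max(iters - 1, 1)
        for i in range(particles):
            for d in range(k * dims):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = sse(unflat(pos[i]), data)
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit < g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return unflat(g_pos), g_fit


def make_demo(seed=1):
    """Hypothetical data: two blobs in features 0-1 plus three noise features."""
    rng = random.Random(seed)
    rows = []
    for centre in (0.0, 10.0):
        for _ in range(20):
            informative = [centre + rng.gauss(0, 1), centre + rng.gauss(0, 1)]
            noise = [rng.gauss(0, 0.1) for _ in range(3)]
            rows.append(informative + noise)
    return rows


if __name__ == "__main__":
    data = make_demo()
    selected = top_variance_features(data, keep=2)
    reduced = [[row[j] for j in selected] for row in data]
    centroids, fit = pso_cluster(reduced, k=2)
    print("selected features:", selected)
    print("centroids:", centroids, "SSE:", fit)
```

In this sketch the search space shrinks from 5 to 2 coordinates per centroid before the swarm runs, which is the general motivation for pairing dimensionality reduction with swarm-based clustering; how HASI adapts the reduction during the search is not stated in the abstract and is not modelled here.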

Published

10-02-2025

How to Cite

Lavindi, E. E., & Faoziyah, N. (2025). Adaptive Swarm Intelligence Algorithms for High-Dimensional Data Clustering in Big Data Analytics. ALCOM: Journal of Algorithm and Computing, 1(1), 23-32. https://doi.org/10.63846/dfxajx74