UNSUPERVISED EXPLORATION OF HIERARCHICAL STRUCTURE IN HUMAN ACTIVITY RECOGNITION DATA
DOI: https://doi.org/10.53555/ym736w23
Keywords: Human Activity Recognition (HAR), Unsupervised Clustering, Dimensionality Reduction, Wearable Sensors, Feature Analysis
Abstract
Human Activity Recognition (HAR) datasets contain complex patterns that supervised models exploit through labeled training, but it remains unclear what latent structure exists in the data itself. This paper presents an unsupervised exploratory analysis of a smartphone sensor HAR dataset to uncover inherent activity groupings without using activity labels. We apply a range of clustering algorithms (k-means, Gaussian mixture models, hierarchical agglomerative clustering, density-based HDBSCAN, and spectral clustering) together with dimensionality reduction methods, namely Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and a feed-forward autoencoder, to identify natural clusters of sensor feature vectors. We evaluate clustering quality quantitatively using an internal metric (the silhouette coefficient) and two external metrics computed against the true labels, the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). The results reveal a dominant two-cluster division separating static postures from dynamic movements, with finer sub-clusters roughly corresponding to the six known activities when clustering is applied hierarchically. The non-linear UMAP embedding substantially improved cluster separability and alignment with the activity classes, outperforming PCA. Analysis of feature importance within each cluster showed that features related to body orientation and acceleration dynamics differentiate the activities. These findings demonstrate that unsupervised learning can automatically discover meaningful activity groupings (e.g., stationary vs. moving behaviors) and the key distinguishing sensor features without any labels. The study provides insight into the intrinsic structure of HAR data, which can inform feature design and hierarchical modeling in future activity recognition systems.
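As a rough illustration of the evaluation protocol described in the abstract, the sketch below clusters dimensionality-reduced feature vectors and scores the result with the silhouette coefficient, ARI, and NMI. It is a minimal example under stated assumptions, not the authors' implementation: the data are synthetic blobs standing in for the HAR feature vectors, only PCA and k-means are shown (UMAP, HDBSCAN, and the other methods are omitted), and the function name `evaluate_clustering` and all parameter values are hypothetical choices made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def evaluate_clustering(features, true_labels, n_clusters, n_components=10, seed=0):
    """Cluster PCA-reduced features with k-means and score the partition.

    The labels are used only for the external metrics (ARI, NMI); the
    clustering itself never sees them.
    """
    # Standardize so that no single feature dominates the distance computation.
    scaled = StandardScaler().fit_transform(features)
    # Linear dimensionality reduction (PCA stands in for any of the embeddings).
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)
    # Unsupervised partition of the reduced feature vectors.
    predicted = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(reduced)
    return {
        "silhouette": silhouette_score(reduced, predicted),  # internal metric
        "ari": adjusted_rand_score(true_labels, predicted),  # external metric
        "nmi": normalized_mutual_info_score(true_labels, predicted),  # external
    }


# Assumption: synthetic data with six groups, NOT the smartphone HAR dataset.
X, y = make_blobs(
    n_samples=600, centers=6, n_features=50, cluster_std=2.0, random_state=0
)

scores_two = evaluate_clustering(X, y, n_clusters=2)  # coarse two-way split
scores_six = evaluate_clustering(X, y, n_clusters=6)  # one cluster per class

for name, scores in (("k=2", scores_two), ("k=6", scores_six)):
    print(name, {metric: round(float(value), 3) for metric, value in scores.items()})
```

Running the same function with two and then six clusters mirrors the coarse-to-fine comparison in the abstract. The scores printed here describe the synthetic blobs only and say nothing about the results reported in the paper.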


