Document Type: Original Research Paper


Faculty of Electrical and Computer, Malek Ashtar University of Technology, Tehran, Iran



Background and Objectives: Action recognition, the process of labeling an unknown action in a query video, is a challenging problem because of event complexity, variations in imaging conditions, and intra- and inter-individual variability in how actions are performed. A number of solutions have been proposed for the action recognition problem. Many of these frameworks assume that each video sequence contains only one action class. A video sequence therefore needs to be broken down into sub-sequences, each containing only a single action class.
Methods: In this paper, we develop an unsupervised action change detection method that detects the times at which the action changes, without classifying the actions. The method uses a silhouette-based framework for action representation, and this representation is built on xt patterns. An xt pattern is a selected frame of the xty volume. This volume is obtained by rotating the traditional space-time volume and reordering its axes. In the xty volume, each frame spans the x axis and the time axis (t), and the y value specifies the frame number.
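The axis reordering described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the (t, y, x) input layout, the (t, x) orientation of each pattern, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def to_xty_volume(silhouettes: np.ndarray) -> np.ndarray:
    """Reorder a (t, y, x) silhouette stack so that y indexes the frames.

    Assumed layout: the input is a conventional space-time volume with
    time as the first axis. The result has shape (y, t, x), so each
    frame of the reordered volume has a time axis and an x axis.
    """
    if silhouettes.ndim != 3:
        raise ValueError("expected a 3-D array of shape (t, y, x)")
    return np.transpose(silhouettes, (1, 0, 2))


def xt_pattern(silhouettes: np.ndarray, y: int) -> np.ndarray:
    """Return the xt pattern for image row y (one frame of the xty volume)."""
    return to_xty_volume(silhouettes)[y]


# Hypothetical toy data: 4 frames of 3x5 binary silhouettes in which a
# single foreground pixel moves one step to the right along row y = 1.
video = np.zeros((4, 3, 5), dtype=np.uint8)
for t in range(4):
    video[t, 1, t] = 1

pattern = xt_pattern(video, 1)  # shape (4, 5): rows are time, columns are x
```

In this toy case the moving pixel appears as a diagonal streak in the xt pattern, which shows how horizontal motion over time becomes a spatial shape in a single 2-D image.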
Results: To test the performance of the proposed method, we created 105 artificial videos from the Weizmann dataset and also used time-continuous camera-captured video. The experiments were conducted on this data. The proposed method achieved a precision of 98.13% and a recall of 100%.
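For readers unfamiliar with how precision and recall apply to change detection, the following is a small sketch of one possible way to score detected change times against ground truth. The matching rule (each true change matched to at most one detection within a frame tolerance), the function name, and the sample numbers are assumptions for illustration only; the abstract does not state the matching criterion used in the experiments.

```python
def precision_recall(detected, ground_truth, tolerance):
    """Score detected change times against true change times.

    Assumed rule: a detection is a true positive if it lies within
    `tolerance` frames of a true change that has not been matched yet.
    """
    unused = sorted(detected)
    true_positives = 0
    for g in sorted(ground_truth):
        # Take the earliest unmatched detection close enough to g.
        match = next((d for d in unused if abs(d - g) <= tolerance), None)
        if match is not None:
            unused.remove(match)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical frame indices, not data from the paper.
precision, recall = precision_recall(
    detected=[48, 101, 130, 152],
    ground_truth=[50, 100, 150],
    tolerance=5,
)
```

Here every true change is found (recall 1.0) but one of the four detections is spurious (precision 0.75), which mirrors the pattern of perfect recall with slightly lower precision.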
Conclusion: The proposed unsupervised approach can detect action changes with high precision. It can therefore be combined with an action recognition method to design an integrated action recognition system.


