Paying Attention to the Features Extracted from the Image to Person Re-identification

Zahiri, S. H.; Iranpoor, R.; Mehrshad, N.

doi:10.22061/jecei.2024.10968.752

Document Type: Original Research Paper

Authors

Department of Electrical Engineering, Faculty of Engineering, University of Birjand, Birjand, Iran.

https://doi.org/10.22061/jecei.2024.10968.752

Abstract

Background and Objectives: Person re-identification is an important computer vision application that recognizes individuals across non-overlapping camera views. The large number of pedestrians, with varying appearances, poses, and environmental conditions, makes this task particularly challenging, and various learning approaches have been employed to address it. Achieving a balance between speed and accuracy is a key focus of this research. Recently introduced Transformer-based models have made significant strides in machine vision, though they have limitations in terms of computation time and input data. This research aims to strike that balance by reducing the input information, so that attention is applied solely to features extracted by a convolutional neural network model.
Methods: This research integrates convolutional neural network (CNN) and Transformer architectures. A CNN extracts important features of a person in an image, and these features are then processed by the attention mechanism in a Transformer model. The primary objective of this work is to enhance computational speed and accuracy in Transformer architectures.
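
The pipeline described in the Methods can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the tiny feature map stands in for real CNN output, the helper names are hypothetical, and the attention uses identity projections (queries, keys, and values are the tokens themselves), whereas a trained Transformer would learn these projections and use multiple heads.

```python
import math


def flatten_feature_map(feature_map):
    """Turn an H x W x C feature map (nested lists) into H*W tokens of length C."""
    return [list(cell) for row in feature_map for cell in row]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections, so each token serves as its own
    query, key, and value. A trained model would learn these projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    output = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all value tokens.
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        output.append(mixed)
    return output


# Hypothetical 2 x 2 feature map with 3 channels, standing in for CNN output.
feature_map = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
tokens = flatten_feature_map(feature_map)
attended = self_attention(tokens)
```

Because attention compares every token with every other token, its cost grows with the square of the token count, so feeding it a compact CNN feature map rather than a long sequence of raw image patches is one way to reduce computation.
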
Results: The results show improved performance for the architectures under consistent conditions. On the Market-1501 dataset, mAP increased from approximately 30% for the downsized Transformer model to around 74% after applying the proposed modifications, and Rank-1 improved from 48% to approximately 89%.
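
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mAP and Rank-1 are commonly computed from ranked retrieval results. The function names and toy data are hypothetical, and the sketch omits the filtering of same-camera and junk gallery images that the standard Market-1501 evaluation protocol applies.

```python
def average_precision(ranked_matches):
    """Average precision for one query.

    ranked_matches[i] is True when the i-th ranked gallery image
    shows the same identity as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this correct hit
    return precision_sum / hits if hits else 0.0


def evaluate(all_ranked_matches):
    """Return (mAP, Rank-1) averaged over all queries."""
    count = len(all_ranked_matches)
    mean_ap = sum(average_precision(m) for m in all_ranked_matches) / count
    # Rank-1 counts the queries whose top-ranked result is a correct match.
    rank1 = sum(1.0 for m in all_ranked_matches if m and m[0]) / count
    return mean_ap, rank1


# Toy example: two queries, three gallery images each, already sorted by similarity.
queries = [
    [True, False, True],
    [False, True, False],
]
mean_ap, rank1 = evaluate(queries)
```

Rank-1 rewards only the top result, while mAP also reflects how highly every other correct match is ranked, which is why the two figures can differ considerably for the same model.
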
Conclusion: Although it still has limitations compared to larger Transformer models, the downsized Transformer architecture has proven to be much more computationally efficient. Applying similar modifications to larger models could also yield positive effects. Balancing computational cost against detection accuracy remains a relative goal that depends on the specific domain and its priorities, so the appropriate method may emphasize one aspect over the other.

Main Subjects

Artificial Intelligence

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Publisher

Shahid Rajaee Teacher Training University

Journal of Electrical and Computer Engineering Innovations (JECEI)

Volume 13, Issue 2
July 2025
Pages 267-274
