ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Info. & Intelligence 22 September 2022

Self-supervised human semantic parsing for video-based person re-identification

Cite this:
https://doi.org/10.52396/JUSTC-2021-0212
  • Author Bio:

    Wei Wu received her B.E. degree in Electronic Information Engineering from the University of Science and Technology of China (USTC) in 2020, and is pursuing a Ph.D. degree in the School of Cyber Science and Technology at USTC. Her research interests mainly include computer vision and multimedia.

    Jiawei Liu received his B.E. degree from Hefei University of Technology in 2013 and his Ph.D. degree from the University of Science and Technology of China (USTC) in 2019. He is currently an associate research fellow in the School of Information Science and Technology at USTC. His research interests mainly include computer vision and multimedia.

  • Corresponding author: E-mail: jwliu6@ustc.edu.cn
  • Received Date: 26 September 2021
  • Accepted Date: 10 April 2022
  • Available Online: 22 September 2022
  • Abstract: Video-based person re-identification is an important research topic in computer vision that entails matching a pedestrian’s identity across non-overlapping cameras. It suffers from severe temporal appearance misalignment and visual ambiguity. In this work, we propose a novel self-supervised human semantic parsing approach (SS-HSP) for video-based person re-identification. It employs self-supervised learning to adaptively segment the human body at the pixel level by estimating the motion of each body part between consecutive frames, and it explores complementary temporal relations to obtain reinforced appearance and motion representations. Specifically, SS-HSP includes a semantic segmentation network that is trained in a self-supervised manner through a pretext task of predicting future frames. The network learns precise human semantic parsing together with the motion field of each body part between consecutive frames, which permits the reconstruction of future frames with the aid of several customized loss functions. Locally aligned features of body parts are then obtained according to the estimated human parsing. Moreover, an aggregation network is proposed to explore the correlation information across video frames and refine the appearance and motion representations. Extensive experiments on two video datasets demonstrate the effectiveness of the proposed approach.
    The basic structure of the self-supervised human semantic parsing approach (SS-HSP).
    • A self-supervised human semantic parsing approach is proposed for video-based person re-identification.
    • We employ self-supervised learning to adaptively segment the human body by estimating the motion information of each body part between consecutive frames.
    • We explore complementary temporal relations for pursuing reinforced appearance and motion representations.
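
    The abstract states that locally aligned features of body parts are obtained according to the estimated human parsing. One common way to realize this kind of step is mask-weighted average pooling, sketched below in plain Python. This is only an illustrative sketch under assumptions, not the authors' implementation: the function name `part_aligned_features`, the `[C][H][W]` feature layout, and the `[K][H][W]` soft part maps are all hypothetical choices made for the example.

```python
from typing import List

Grid = List[List[float]]  # one H x W map


def part_aligned_features(
    features: List[Grid],    # assumed layout: C channels, each H x W
    part_probs: List[Grid],  # assumed layout: K soft part maps, each H x W
    eps: float = 1e-8,
) -> List[List[float]]:
    """Return one C-dimensional vector per body part.

    Each vector is the average of the feature map weighted by that
    part's parsing map, so pixels assigned to the part dominate it.
    """
    pooled = []
    for mask in part_probs:
        # Total weight of this part; eps avoids division by zero
        # when a part is absent from the frame.
        mass = sum(sum(row) for row in mask) + eps
        vector = []
        for channel in features:
            weighted = sum(
                m * f
                for mask_row, feat_row in zip(mask, channel)
                for m, f in zip(mask_row, feat_row)
            )
            vector.append(weighted / mass)
        pooled.append(vector)
    return pooled


# Toy input: 2 channels on a 2 x 2 grid, with 2 hypothetical parts
# (top row and bottom row).
toy_features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
toy_parts = [
    [[1.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
toy_result = part_aligned_features(toy_features, toy_parts)
print(toy_result)
```

    In this toy case each part vector is simply the mean of the features in its row. With soft parsing maps, the same code produces a weighted mean, which keeps the pooling differentiable if it is ported to a tensor library.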



    Figure 1. Example video sequences in the MARS and iLIDS-VID person re-identification datasets.

    Figure 2. The overall architecture of the proposed SS-HSP. It consists of a backbone network, a semantic segmentation network, and an aggregation network.

    Figure 3. Detailed structure of the semantic segmentation network.

    Figure 4. Parameter analysis of (a) the number of body parts $ K $ and (b) the sequence length $ T $ on the MARS dataset.

    Figure 5. Visualization of the estimated segmentation maps for two video sequences.

    Figure 6. Examples of retrieval results by SS-HSP on the MARS dataset. Correct matches are highlighted in red.

