
Self-supervised human semantic parsing for video-based person re-identification

Abstract: Video-based person re-identification is an important research topic in computer vision that aims to match the same pedestrian across non-overlapping cameras. The task suffers from severe temporal appearance misalignment and visual ambiguity. In this work, we propose a novel self-supervised human semantic parsing approach (SS-HSP) for video-based person re-identification. It employs self-supervised learning to adaptively segment the human body at the pixel level by estimating the motion of each body part between consecutive frames, and it explores complementary temporal relations to reinforce the appearance and motion representations. Specifically, a semantic segmentation network within SS-HSP is designed that performs self-supervised learning through a pretext task of predicting future frames. The network learns precise human semantic parsing together with the motion field of each body part between consecutive frames, which permits the reconstruction of future frames with the aid of several customized loss functions. Locally aligned features of body parts are then obtained from the estimated human parsing. Moreover, an aggregation network is proposed to explore correlations across video frames to refine the appearance and motion representations. Extensive experiments on two video datasets demonstrate the effectiveness of the proposed approach.
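The abstract gives only a high-level description of the method. As a concrete illustration, below is a minimal PyTorch sketch of the future-frame-prediction pretext task it describes: a shared encoder produces per-part soft masks and a motion field, the current frame is warped toward the next frame, and a reconstruction loss supervises both. All names here (ParsingNet, warp), the layer sizes, the number of body parts, and the single L1 reconstruction loss are hypothetical stand-ins, not the authors' actual architecture or loss design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParsingNet(nn.Module):
    """Hypothetical network: per-part soft masks plus a motion field."""
    def __init__(self, num_parts=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.mask_head = nn.Conv2d(32, num_parts, 1)   # soft body-part masks
        self.flow_head = nn.Conv2d(32 * 2, 2, 1)       # motion between frames

    def forward(self, frame_t, frame_t1):
        f_t = self.encoder(frame_t)
        f_t1 = self.encoder(frame_t1)
        masks = self.mask_head(f_t).softmax(dim=1)            # (N, K, H, W)
        flow = self.flow_head(torch.cat([f_t, f_t1], dim=1))  # (N, 2, H, W)
        return masks, flow

def warp(frame, flow):
    """Warp frame_t toward frame_{t+1}; flow is assumed to be in
    normalized [-1, 1] grid units for simplicity."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(frame, grid, align_corners=True)

net = ParsingNet()
frame_t = torch.rand(2, 3, 64, 32)    # toy batch of consecutive frame pairs
frame_t1 = torch.rand(2, 3, 64, 32)
masks, flow = net(frame_t, frame_t1)

# Locally aligned part features via mask-weighted pooling (illustrative only):
feats = net.encoder(frame_t)                                  # (N, C, H, W)
part_feats = torch.einsum("nchw,nkhw->nkc", feats, masks)
part_feats = part_feats / masks.sum(dim=(2, 3)).unsqueeze(-1).clamp(min=1e-6)

# Pretext objective: the warped current frame should reconstruct the next
# frame. The paper uses several customized losses; one L1 term shown here.
recon_loss = F.l1_loss(warp(frame_t, flow), frame_t1)
recon_loss.backward()
```

In this reading, the pretext task supervises parsing without identity labels: masks that carve the body into coherently moving parts make the frame easier to reconstruct, so segmentation quality improves as a side effect of minimizing the reconstruction loss.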
