Pedestrian attribute recognition in surveillance scenarios is still a challenging task due to inaccurate localization of specific attributes. In this paper, we propose a novel view-attribute localization method based on attention (VALA), which relies on the strong relevance between attributes and views to capture specific view-attributes and to localize attribute-corresponding areas via an attention mechanism. A specific view-attribute is composed of the extracted attribute feature and four view scores, which are predicted by a view predictor as the confidences for the attribute from different views. The view-attribute is then delivered back to shallow network layers to supervise deep feature extraction. To explore the location of a view-attribute, regional attention is introduced to aggregate spatial information of the input attribute feature along the height and width directions, constraining the attribute to a narrow region. Moreover, the inter-channel dependency of the view-feature is embedded in the above two spatial directions. An attention-based attribute-specific region is obtained after refining the narrow range by balancing the ratio of channel dependencies between the height and width branches. The final view-attribute recognition outcome is obtained by combining the output of regional attention with the view scores from the view predictor. Experiments on four public datasets (RAP, RAPv2, PETA, and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.
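The regional attention described above (directional pooling along height and width, with inter-channel dependencies embedded in each direction) can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the authors' implementation: `w_h` and `w_w` are hypothetical per-branch channel projections, and the learned balancing ratio between the two branches is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regional_attention(feat, w_h, w_w):
    """Sketch of directional regional attention (hypothetical shapes).

    feat : (C, H, W) input view-feature
    w_h  : (C, C) channel projection for the height branch
    w_w  : (C, C) channel projection for the width branch
    """
    # Aggregate spatial information along each direction:
    pooled_h = feat.mean(axis=2)     # (C, H): average over width
    pooled_w = feat.mean(axis=1)     # (C, W): average over height
    # Embed inter-channel dependencies in each spatial direction,
    # then squash to (0, 1) attention gates:
    att_h = sigmoid(w_h @ pooled_h)  # (C, H) gate along height
    att_w = sigmoid(w_w @ pooled_w)  # (C, W) gate along width
    # Broadcast the two directional gates back over the feature map,
    # suppressing positions outside the attribute-specific region:
    return feat * att_h[:, :, None] * att_w[:, None, :]

# Example usage with a random 4-channel 8x6 feature map:
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 6))
out = regional_attention(feat,
                         rng.standard_normal((4, 4)),
                         rng.standard_normal((4, 4)))
```

The output keeps the input's shape; because both gates lie in (0, 1), every activation is attenuated rather than amplified, which is what lets the module narrow the response to an attribute-specific region.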



The attention mechanism was first proposed in the visual image domain, but it arguably took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied an attention mechanism on top of an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in a machine translation task; their work is regarded as the first to apply the attention mechanism to NLP. Similar attention-based RNN models were then extended to a wide range of NLP tasks. More recently, how to use attention mechanisms in CNNs has also become a popular research topic. The figure below shows the rough trend of attention research over time.