Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. To reduce the reliance on costly annotation of skeletal sequences while maintaining competitive recognition accuracy, the task of 3D Action Recognition with Limited Training Samples, also known as semi-supervised 3D action recognition, has been proposed. Active learning, which proactively selects the most informative unlabeled samples for annotation, has further been explored in this setting to choose the training samples. Specifically, prior work adopts an encoder-decoder framework to embed skeleton sequences into a latent space, where clustering information, combined with a margin-based selection strategy built on a multi-head mechanism, is used to identify the sequences in the unlabeled set to annotate. However, the most representative skeleton sequences are not necessarily the most informative for the action recognizer, because the model may already have acquired similar knowledge from previously seen skeleton samples. To address this, we reformulate semi-supervised 3D action recognition via active learning from a novel perspective by casting it as a Markov Decision Process (MDP). Building on the MDP framework and its training paradigm, we train an informative sample selection model that guides the selection of skeleton sequences for annotation. To enhance the representational capacity of the factors in the state-action pairs of our method, we project them from Euclidean space to hyperbolic space. Furthermore, we introduce a meta tuning strategy to accelerate the deployment of our method in real-world scenarios. Extensive experiments on three 3D action recognition benchmarks demonstrate the effectiveness of our method.