SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering

Accurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across views by training on large annotated datasets, which leads to poor generalization when the test scenario differs from the training one. To overcome this limitation, we propose SkelSplat, a novel framework for multi-view 3D human pose estimation based on differentiable Gaussian rendering. The human pose is modeled as a skeleton of 3D Gaussians, one per joint, optimized via differentiable rendering to enable seamless fusion of arbitrary camera views without 3D ground-truth supervision. Since Gaussian Splatting was originally designed for dense scene reconstruction, we propose a novel one-hot encoding scheme that enables independent optimization of human joints. SkelSplat outperforms approaches that do not rely on 3D ground truth on Human3.6M and CMU, while reducing the cross-dataset error by up to 47.8% compared to learning-based methods. Experiments on Human3.6M-Occ and Occlusion-Person demonstrate robustness to occlusions without scenario-specific fine-tuning. Our project page is available at https://skelsplat.github.io.
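To make the one-hot idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single pinhole camera, an isotropic splat with a fixed pixel-space standard deviation, and plain additive accumulation instead of the alpha blending used in Gaussian Splatting. The names `project`, `render_one_hot`, `sigma_px`, and `focal` are hypothetical. Each joint carries a one-hot "color" vector, so its splat lands only in its own output channel; a per-channel loss against a 2D detection would therefore depend on that joint alone, even when two joints overlap in the image.

```python
import math


def project(point, focal, center):
    """Pinhole projection of a 3D point (camera frame, z > 0) to pixel coordinates."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])


def render_one_hot(joints_3d, sigma_px, size, focal=16.0):
    """Render J joints into a J-channel image indexed as image[channel][row][col].

    Assumption for illustration: joint j has the one-hot color e_j, so its
    splat contributes to channel j and to no other channel.
    """
    center = (size / 2.0, size / 2.0)
    num_joints = len(joints_3d)
    image = [[[0.0] * size for _ in range(size)] for _ in range(num_joints)]
    for j, joint in enumerate(joints_3d):
        u, v = project(joint, focal, center)
        one_hot = [1.0 if c == j else 0.0 for c in range(num_joints)]
        for row in range(size):
            for col in range(size):
                d2 = (col - u) ** 2 + (row - v) ** 2
                weight = math.exp(-d2 / (2.0 * sigma_px ** 2))
                for c in range(num_joints):
                    image[c][row][col] += one_hot[c] * weight
    return image


# Two joints whose splats overlap heavily in the image plane.
joints = [(0.0, 0.0, 2.0), (0.125, 0.0, 2.0)]
image = render_one_hot(joints, sigma_px=2.0, size=16)

# Each channel peaks at its own joint's projection: pixel (col=8, row=8)
# for joint 0 and pixel (col=9, row=8) for joint 1.
print(image[0][8][8], image[1][8][9])
```

With ordinary RGB colors the two overlapping splats would blend into a single blob. With one channel per joint, each channel holds exactly one Gaussian, which is the property the abstract refers to as independent optimization of joints.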
