High-fidelity 3D human models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods that avoid templates instead rely on a costly or ill-posed mapping from observation to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model, yet generalizes to novel poses. Given a video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and as an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes the limitations of prior work operating in either canonical or observation space. Moreover, our automatic point extraction enables learning models of human and animal characters alike, matching the performance of methods that use rigged surface templates despite being more general. Project website: https://lemonatsu.github.io/npc/
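For intuition on how an explicit canonical point set can anchor the mapping between canonical and observation space, below is a minimal sketch that poses canonical points with per-bone rigid transforms via linear blend skinning. The paper learns its deformation model from data; the LBS formulation, the function name `transform_points`, and the array shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def transform_points(points_c, bone_transforms, skin_weights):
    """Map canonical points to observation space with linear blend skinning.

    points_c:        (N, 3) canonical point positions
    bone_transforms: (B, 4, 4) per-bone rigid transforms for the target pose
    skin_weights:    (N, B) per-point bone weights, rows summing to 1
    """
    # Homogeneous coordinates: (N, 4)
    points_h = np.concatenate([points_c, np.ones((len(points_c), 1))], axis=1)
    # Blend the bone transforms per point: (N, 4, 4)
    blended = np.einsum('nb,bij->nij', skin_weights, bone_transforms)
    # Apply each point's blended transform and drop the homogeneous coordinate
    return np.einsum('nij,nj->ni', blended, points_h)[:, :3]

# Example: with identity bone transforms, points stay in canonical position
pts = np.random.rand(1000, 3)
T = np.tile(np.eye(4), (24, 1, 1))          # 24 bones, all identity
w = np.random.rand(1000, 24)
w /= w.sum(axis=1, keepdims=True)           # normalize weights per point
assert np.allclose(transform_points(pts, T, w), pts)
```

Because the same explicit points exist in both spaces, an observed query can be associated with its nearest posed points and carried back to canonical space by their inverse transforms, avoiding a global learned inverse mapping.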