This report presents Pelican-VL 1.0, a new family of open-source embodied brain models with parameter scales ranging from 7 billion to 72 billion. Our mission is to embed powerful intelligence into various embodiments. Pelican-VL 1.0 is currently the largest-scale open-source embodied multimodal brain model. Its core advantage lies in the in-depth integration of data power with an intelligent adaptive learning mechanism. Specifically, our metaloop distilled a high-quality dataset from a raw dataset containing over 4 billion tokens. Pelican-VL 1.0 is trained on a large-scale cluster of 1,000+ A800 GPUs, consuming over 50k A800 GPU-hours per checkpoint. This yields a 20.3% performance uplift over its base model; Pelican-VL 1.0 also outperforms 100B-level open-source counterparts by 10.6%, placing it on par with leading proprietary systems on well-known embodied benchmarks. To train Pelican-VL 1.0, we establish a novel framework, DPPO (Deliberate Practice Policy Optimization), inspired by human metacognition. We operationalize this as a metaloop that teaches the model to practice deliberately: an RL-Refine-Diagnose-SFT loop.
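To make the structure of the metaloop concrete, the following is a minimal sketch of one RL-Refine-Diagnose-SFT cycle. The stage functions (rl_stage, refine_stage, diagnose_stage, sft_stage), their signatures, and the stopping rule are illustrative assumptions, not the authors' actual API.

```python
from typing import Callable, List, Tuple


def metaloop(
    checkpoint: str,
    rl_stage: Callable[[str], Tuple[str, List[dict]]],
    refine_stage: Callable[[List[dict]], List[dict]],
    diagnose_stage: Callable[[str, List[dict]], List[dict]],
    sft_stage: Callable[[str, List[dict]], str],
    num_rounds: int = 3,
) -> str:
    """One deliberate-practice cycle per round: practice with RL, refine the
    resulting experience into cleaner data, diagnose remaining weaknesses,
    then consolidate the targeted skills with SFT."""
    for _ in range(num_rounds):
        # 1. RL: practice on the task distribution and collect rollouts.
        checkpoint, rollouts = rl_stage(checkpoint)
        # 2. Refine: filter the rollouts into higher-quality training data.
        refined = refine_stage(rollouts)
        # 3. Diagnose: analyze where the policy still fails (the metacognition
        #    step) and keep only the samples that target those weaknesses.
        targeted = diagnose_stage(checkpoint, refined)
        if not targeted:
            break  # no remaining weaknesses to target; stop early
        # 4. SFT: supervised fine-tuning on the targeted data.
        checkpoint = sft_stage(checkpoint, targeted)
    return checkpoint
```

Under these assumptions, each round alternates exploratory practice (RL) with targeted consolidation (SFT), with the diagnose step deciding what the next round should practice.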