Video generators are increasingly evaluated as potential world models, a role that requires them to encode physical laws. We investigate how they represent one fundamental law: gravity. Out-of-the-box video generators consistently produce objects that fall with an effective acceleration well below terrestrial gravity. Such physical tests, however, are often confounded by ambiguous metric scale, so we first ask whether the observed errors are artifacts of these ambiguities (e.g., incorrect frame-rate assumptions). We find that even temporal rescaling cannot correct the high-variance gravity artifacts. To isolate the underlying physical representation from these confounds, we introduce a unit-free, two-object protocol that tests the timing ratio $t_1^2/t_2^2 = h_1/h_2$, a relationship independent of $g$, focal length, and scale. This relative test reveals violations of Galileo's equivalence principle. We then show that the physical gap can be partially mitigated through targeted specialization: a lightweight low-rank adaptor fine-tuned on only 100 single-ball clips raises $g_{\mathrm{eff}}$ from $1.81\,\mathrm{m/s^2}$ to $6.43\,\mathrm{m/s^2}$ (reaching $65\%$ of terrestrial gravity). The specialist adaptor also generalizes zero-shot to two-ball drops and inclined planes, offering initial evidence that specific physical laws can be corrected with minimal data.
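To make the unit-free argument concrete: for an object dropped from rest, $h = \tfrac{1}{2} g t^2$, so two objects released from heights $h_1$ and $h_2$ satisfy $t_1^2/t_2^2 = h_1/h_2$ whatever the value of $g$. Any common rescaling of time (a wrong frame rate) or of length (unknown pixels-per-metre) cancels in the ratio. The Python sketch below illustrates this on synthetic data. It is not the paper's evaluation code. The function names, the quadratic-fit estimator for $g_{\mathrm{eff}}$, and the 24 fps frame rate are all assumptions made for illustration.

```python
import numpy as np

def fit_g_eff(t, y):
    """Estimate effective gravity by fitting y(t) = y0 + v0*t - 0.5*g*t^2.

    Hypothetical helper: t in seconds, y in metres (up is positive).
    """
    # np.polyfit returns coefficients from highest degree to lowest.
    a2, _, _ = np.polyfit(t, y, deg=2)
    return -2.0 * a2

def timing_ratio_error(t1, t2, h1, h2):
    """Relative deviation of t1^2/t2^2 from h1/h2 (0 means consistent)."""
    return (t1**2 / t2**2) / (h1 / h2) - 1.0

# Synthetic single-ball drop from rest at an assumed 24 fps.
g_true = 9.81
fps = 24.0
t = np.arange(12) / fps
y = 2.0 - 0.5 * g_true * t**2
g_hat = fit_g_eff(t, y)

# Misjudging the time axis by a factor `scale` changes the absolute estimate:
# g is recovered as g_true / scale**2.
scale = 0.5
g_rescaled = fit_g_eff(scale * t, y)

# The two-object timing ratio is unaffected by the same time rescaling.
h1, h2 = 2.0, 1.0
t1 = np.sqrt(2.0 * h1 / g_true)
t2 = np.sqrt(2.0 * h2 / g_true)
ratio_err = timing_ratio_error(t1, t2, h1, h2)
ratio_err_scaled = timing_ratio_error(scale * t1, scale * t2, h1, h2)

print(f"g_hat = {g_hat:.3f}, g_rescaled = {g_rescaled:.3f}")
print(f"ratio error = {ratio_err:.2e}, after time rescaling = {ratio_err_scaled:.2e}")
```

In this sketch the absolute estimate shifts by a factor of $1/\text{scale}^2$ under temporal rescaling, while the ratio error stays at zero. A nonzero ratio error in generated video therefore cannot be attributed to a frame-rate or metric-scale ambiguity.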