WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. Recent Vision-Language Models (VLMs) can automate UI-to-Code generation, but they produce only static HTML/CSS/JavaScript layouts that lack interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: (1) an exploration agent that captures multi-state UI screenshots; (2) a UI2Code model that generates executable interactive code; and (3) a validation module that verifies the interactivity of that code. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models substantially improve the generation of executable, interactive HTML/CSS/JavaScript code, outperforming their base counterparts on both interactive and static UI2Code benchmarks. Our code and models are available at \href{https://zheny2751-dotcom.github.io/webvia.github.io/}{\texttt{https://webvia.github.io}}.
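The three-stage structure described above (explore, generate, validate) can be illustrated with a toy sketch. This is not the WebVIA implementation: the UI is assumed to be a small hand-written state graph rather than a live web page, the "UI2Code model" is replaced by a lookup table, and every name (`Transition`, `explore`, `generate_app`, `validate`, `toy_ui`) is hypothetical. It only shows how recorded interactions from an exploration step might later be replayed to check a generated artifact.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One observed interaction: doing `action` in `source` leads to `target`."""
    source: str
    action: str
    target: str


def explore(ui, start):
    """Breadth-first walk over a toy UI given as {state: {action: next_state}}.

    Hypothetical stand-in for an exploration agent; a real agent would
    act on a rendered page and capture screenshots instead.
    """
    seen = {start}
    queue = deque([start])
    transitions = []
    while queue:
        state = queue.popleft()
        for action, target in sorted(ui.get(state, {}).items()):
            transitions.append(Transition(state, action, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return transitions


def generate_app(transitions):
    """Hypothetical stand-in for a UI2Code model: builds a lookup table
    rather than real HTML/CSS/JavaScript."""
    return {(t.source, t.action): t.target for t in transitions}


def validate(app, transitions):
    """Replay each recorded interaction against the generated artifact
    and return the fraction that reproduces the observed target state."""
    if not transitions:
        return 1.0
    passed = sum(
        1 for t in transitions if app.get((t.source, t.action)) == t.target
    )
    return passed / len(transitions)


# Assumed toy UI: three states connected by click actions.
toy_ui = {
    "home": {"click_menu": "menu_open"},
    "menu_open": {"click_close": "home", "click_settings": "settings"},
    "settings": {"click_back": "home"},
}

observed = explore(toy_ui, "home")
app = generate_app(observed)
score = validate(app, observed)
```

In this sketch the validation score drops whenever the generated artifact fails to reproduce a recorded transition, which is the sense in which exploration traces make interactivity checkable.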
