Current text-to-image models struggle to render the nuanced facial expressions required for compelling manga narratives, largely because of the inherent ambiguity of language. To bridge this gap, we introduce an interactive system built on a novel dual-hybrid pipeline. The first stage combines landmark-based auto-detection with a manual framing tool for robust, artist-centric face preparation. The second stage maps expressions onto the prepared face using the LivePortrait engine, driven by performative video input for fine-grained, intuitive control. Our case-study analysis suggests that this integrated workflow can streamline the creative process and effectively translate narrative intent into visual expression. This work presents a practical model for human-AI co-creation, offering artists a more direct and intuitive means of ``infusing souls'' into their characters. Our primary contribution is not a new generative model but an interactive workflow that closes the gap between artistic intent and AI execution.
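To make the two-stage pipeline concrete, the sketch below outlines its control flow under stated assumptions: Stage 1 uses MediaPipe Face Mesh as one possible landmark detector (the paper does not specify the library) with an artist-supplied manual crop as an override, and Stage 2 is represented by a hypothetical `retarget_expression` stub standing in for the LivePortrait inference call; the file names are likewise illustrative.

```python
# Minimal sketch of the dual-hybrid pipeline; not the authors' implementation.
# Stage 1: landmark-based auto-detection (MediaPipe assumed) + manual framing override.
# Stage 2: expression retargeting from a performative driving video (placeholder).

import cv2
import mediapipe as mp
import numpy as np


def detect_face_box(image_bgr: np.ndarray):
    """Return an (x, y, w, h) bounding box derived from facial landmarks, or None."""
    h, w = image_bgr.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
        result = mesh.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None
    pts = np.array([(lm.x * w, lm.y * h) for lm in result.multi_face_landmarks[0].landmark])
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return int(x0), int(y0), int(x1 - x0), int(y1 - y0)


def prepare_face(image_bgr: np.ndarray, manual_box=None) -> np.ndarray:
    """Stage 1: prefer the artist's manual framing; fall back to auto-detection."""
    box = manual_box or detect_face_box(image_bgr)
    if box is None:
        raise ValueError("No face detected; the artist should frame the face manually.")
    x, y, w, h = box
    return image_bgr[y:y + h, x:x + w]


def retarget_expression(face_crop: np.ndarray, driving_video: str) -> np.ndarray:
    """Stage 2 placeholder: map the expression from a performative driving video onto
    the prepared face crop. In the described system this is handled by the LivePortrait
    engine; this hypothetical stub only marks where that call would go."""
    raise NotImplementedError("Wire in the LivePortrait inference call here.")


if __name__ == "__main__":
    panel = cv2.imread("manga_panel.png")   # hypothetical input panel
    face = prepare_face(panel)              # or pass manual_box=(x, y, w, h) from the framing tool
    # result = retarget_expression(face, "artist_performance.mp4")
```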