The Video Browser Showdown (VBS) challenges systems to deliver accurate results under strict time constraints. To meet this demand, we present Fusionista2.0, a streamlined video retrieval system optimized for speed and usability. All core modules were re-engineered for efficiency: preprocessing now relies on ffmpeg for fast keyframe extraction, optical character recognition uses Vintern-1B-v3.5 for robust multilingual text recognition, and automatic speech recognition employs faster-whisper for real-time transcription. For question answering, lightweight vision-language models provide quick responses without the heavy cost of large models. Beyond these technical upgrades, Fusionista2.0 introduces a redesigned user interface with improved responsiveness, accessibility, and workflow efficiency, enabling even non-expert users to retrieve relevant content rapidly. Evaluations demonstrate that retrieval time was reduced by up to 75% while accuracy and user satisfaction both increased, confirming Fusionista2.0 as a competitive and user-friendly system for large-scale video search.
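The ffmpeg-based preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact command Fusionista2.0 uses, so the flags, the helper names (`build_keyframe_command`, `extract_keyframes`), and the output naming pattern below are assumptions for illustration only. The sketch asks the decoder to skip everything except keyframes, which is one common way to make extraction fast.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_keyframe_command(video_path: str, out_dir: str,
                           ffmpeg_bin: str = "ffmpeg") -> list[str]:
    """Build an ffmpeg command that writes one JPEG per keyframe.

    Hypothetical helper: the flag choice is an assumption, not the
    system's documented configuration.
    """
    pattern = str(Path(out_dir) / "kf_%06d.jpg")
    return [
        ffmpeg_bin,
        "-hide_banner",
        "-loglevel", "error",
        "-skip_frame", "nokey",  # input option: decode keyframes only
        "-i", video_path,
        "-fps_mode", "vfr",      # no frame duplication; older builds use -vsync vfr
        "-q:v", "2",             # high JPEG quality
        pattern,
    ]


def extract_keyframes(video_path: str, out_dir: str) -> list[Path]:
    """Run ffmpeg and return the extracted keyframe files in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_keyframe_command(video_path, out_dir), check=True)
    return sorted(Path(out_dir).glob("kf_*.jpg"))
```

Calling `extract_keyframes("clip.mp4", "frames")` would require an ffmpeg binary on the `PATH`; the file names here are placeholders. Keeping command construction separate from execution makes the flags easy to inspect and adjust without running the tool.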