Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade that aligns the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, the introduction of large language model (LLM)-based simulators, integration with a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.