Interaction data is widely used in domains such as cognitive science, visualization, human-computer interaction, and cybersecurity. Applications range from cognitive analyses through user and behavior modeling, adaptation, and recommendation to (user/bot) identification and verification. Research on these applications, in particular on those relying on learned models, requires copious amounts of structured data for both training and evaluation. Different application domains impose different requirements. For some purposes it is vital that the data stems from a guided interaction process, meaning that monitored subjects pursued a given task, while other purposes require additional context information, such as widget interactions or metadata. Unfortunately, the number of publicly available datasets is small, and their applicability to specific purposes is limited. We present GUIDEd Interaction DATA (GUIDAETA), a new dataset collected in a large-scale guided user study with more than 250 users, each working on three pre-defined information retrieval tasks using a custom-built consumer information system. Besides being larger than most comparable datasets, with 716 completed tasks, 2.39 million mouse and keyboard events (2.35 million and 40 thousand, respectively), and a total observation period of almost 50 hours, GUIDAETA provides comprehensive context information for its interactions in the form of widget information, triggered (system) events, and associated displayed content. Combined with extensive metadata such as sociodemographic user data and answers to explicit feedback questionnaires (regarding perceived usability, experienced cognitive load, and prior knowledge of the information system's topic), GUIDAETA constitutes a versatile dataset applicable to various research domains and purposes.