Context-based question answering (CBQA) models provide more accurate and relevant answers by taking contextual information into account. Because they effectively extract specific information from a given context, they are useful in applications such as user support, information retrieval, and educational platforms. In this manuscript, we benchmark the performance of 47 CBQA models from Hugging Face on eight different datasets. This study aims to identify the best-performing model across diverse datasets without additional fine-tuning, which is valuable for practical applications because it minimizes the need to retrain models for specific datasets and streamlines their deployment in various contexts. The best-performing models were trained on the SQuAD v2 or SQuAD v1 datasets. The best overall model was ahotrod/electra_large_discriminator_squad2_512, which achieved 43\% accuracy across all datasets. We observed that the computation time of every model depends on the context length and the model size, that performance usually decreases as answer length increases, and that performance also depends on context complexity. We further applied a genetic algorithm to improve overall accuracy by integrating the responses of multiple models. ahotrod/electra_large_discriminator_squad2_512 produced the best results on bioasq10b-factoid (65.92\%), biomedical\_cpgQA (96.45\%), QuAC (11.13\%), and the Question Answer Dataset (41.6\%), while bert-large-uncased-whole-word-masking-finetuned-squad achieved 82\% accuracy on the IELTS dataset.
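To make the evaluation setup concrete, the following is a minimal sketch of how one of the benchmarked models can be queried through the Hugging Face \texttt{transformers} question-answering pipeline. The question and context below are illustrative placeholders, not items from the eight benchmark datasets, and the sketch assumes the standard pipeline interface rather than the paper's exact harness.
\begin{verbatim}
# Minimal sketch: querying a benchmarked CBQA model via the Hugging Face
# question-answering pipeline. The question/context pair is a placeholder,
# not an item from the paper's benchmark datasets.
from transformers import pipeline

# Load the best-performing model reported in the abstract.
qa = pipeline(
    "question-answering",
    model="ahotrod/electra_large_discriminator_squad2_512",
)

context = (
    "SQuAD v2 combines the questions in SQuAD v1.1 with over 50,000 "
    "unanswerable questions written adversarially by crowdworkers."
)
result = qa(
    question="How many unanswerable questions does SQuAD v2 add?",
    context=context,
)

# The pipeline returns the extracted span plus a confidence score, e.g.
# {'score': ..., 'start': ..., 'end': ..., 'answer': 'over 50,000'}.
print(result["answer"], result["score"])
\end{verbatim}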