Computational offloading is a promising approach for overcoming resource constraints on client devices by moving some or all of an application's computations to remote servers. With the advent of specialized hardware accelerators, client devices can now perform fast local processing of specific tasks, such as machine learning inference, reducing the need to offload computations. However, edge servers equipped with accelerators also offer faster processing of offloaded tasks than was previously possible. In this paper, we present an analytic and experimental comparison of on-device processing and edge offloading across a range of accelerator, network, multi-tenant, and application workload scenarios, with the goal of understanding when to process locally on the device and when to offload computations. We present models that leverage analytical queuing results to derive explainable closed-form equations for the expected end-to-end latency of each strategy; these equations yield precise, quantitative predictions of the performance crossover points that guide adaptive offloading. We experimentally validate our models across a variety of scenarios and show that they achieve a mean absolute percentage error of 2.2% relative to observed latencies. Finally, we use our models to build a resource manager for adaptive offloading and demonstrate its effectiveness under variable network conditions and in dynamic multi-tenant edge settings.
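As a rough illustration of the kind of closed-form model the abstract refers to (the paper's actual equations are not reproduced here, and all symbols below are assumptions introduced for this sketch), a simple M/M/1 queuing treatment yields expected end-to-end latencies for the two strategies:

\[
T_{\text{device}} = \frac{1}{\mu_d - \lambda}, \qquad
T_{\text{edge}} = d_{\text{net}} + \frac{1}{\mu_e - n\lambda},
\]

where \(\lambda\) is the per-client request arrival rate, \(\mu_d\) and \(\mu_e\) are the service rates of the on-device and edge accelerators, \(d_{\text{net}}\) is the round-trip network delay, and \(n\) is the number of tenants sharing the edge server (with \(\lambda < \mu_d\) and \(n\lambda < \mu_e\) required for stability). Under these assumptions, offloading is preferable whenever \(T_{\text{edge}} < T_{\text{device}}\); equating the two expressions gives a closed-form performance crossover of exactly the kind an adaptive resource manager can evaluate at runtime as \(\lambda\), \(d_{\text{net}}\), or \(n\) vary.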