Running data analytics queries on serverless (FaaS) workers has been shown to be cost- and performance-efficient in a variety of real-world scenarios, including intermittent query arrival patterns, sudden load spikes, and the management challenges that afflict managed VM clusters. Alas, existing work on serverless data analytics focuses primarily on the serverless execution engine and either assumes the existence of a "good" query execution plan or relies on user guidance to construct one. Yet even simple analytics queries on serverless platforms have a huge space of possible plans, with vast differences in both performance and cost among them. This paper introduces Odyssey, an end-to-end serverless-native data analytics pipeline that integrates a query planner, cost model, and execution engine. Odyssey automatically generates and evaluates serverless query plans, using state-space pruning heuristics and a novel search algorithm to identify Pareto-optimal plans that balance cost and performance, and it does so with low latency even for complex queries. Our evaluations demonstrate that Odyssey accurately predicts both monetary cost and latency, and consistently outperforms AWS Athena on cost and/or latency.