Road network data provide rich information about cities and thus serve as the basis for a wide range of urban research. However, processing large-volume, worldwide road network data requires intensive computing resources, and the processed results can be difficult to unify for testing downstream tasks. Therefore, in this paper, we process OpenStreetMap data using distributed computing on 5,000 cores of cloud services and release a structured, worldwide, 1-billion-vertex road network graph dataset with high accessibility (open source and downloadable worldwide) and usability (open-box graph structure and an easy spatial query interface). To demonstrate how easily this dataset can be used, we present three illustrative use cases, namely traffic prediction, city boundary detection, and traffic policy control, and conduct extensive experiments on these three tasks. (1) For the well-studied traffic prediction task, we release a new benchmark covering 31 cities (traffic data processed and combined with our released OSM+ road network dataset), which provides much larger spatial coverage and a more comprehensive evaluation of the compared algorithms than previously used datasets. This new benchmark pushes algorithms to scale from hundreds of road network intersections to thousands. (2) For the more advanced traffic policy control task, which requires interaction with the road network, we release new datasets for 6 cities at a much larger scale than previous datasets. This poses new challenges for thousand-scale multi-agent coordination. (3) Along with the OSM+ dataset, we release data converters that facilitate the integration of multimodal spatial-temporal data for geospatial foundation model training, thereby accelerating the discovery of compelling scientific insights.

PVLDB Reference Format
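To illustrate what a structured road network graph with a spatial query interface can look like in practice, here is a minimal sketch in Python. It assumes intersections are stored as vertices with (latitude, longitude) coordinates, road segments as undirected edges, and a uniform grid as the spatial index. A bounding-box query then extracts a city-scale subgraph, the kind of step involved when pairing a city's traffic data with its road network. The class `RoadGraph`, its methods, and the 0.01° cell size are hypothetical names and values chosen for illustration; this is not the released OSM+ API or schema.

```python
import math
from collections import defaultdict


class RoadGraph:
    """Toy road network graph: vertices are intersections with (lat, lon),
    edges are road segments. Hypothetical structure, not the OSM+ schema."""

    def __init__(self, cell_deg=0.01):
        self.cell_deg = cell_deg        # grid cell size in degrees (assumed value)
        self.coords = {}                # vertex id -> (lat, lon)
        self.adj = defaultdict(set)     # vertex id -> ids of adjacent vertices
        self.grid = defaultdict(list)   # (row, col) grid cell -> vertex ids in that cell

    def _cell(self, lat, lon):
        # floor(x / cell) is monotone in x, so a box query can bound the cells to scan.
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add_vertex(self, vid, lat, lon):
        self.coords[vid] = (lat, lon)
        self.grid[self._cell(lat, lon)].append(vid)

    def add_edge(self, u, v):
        # Road segments are treated as undirected here for simplicity.
        self.adj[u].add(v)
        self.adj[v].add(u)

    def query_bbox(self, min_lat, min_lon, max_lat, max_lon):
        """Sorted ids of vertices inside the box; only overlapping grid cells are scanned."""
        r0, c0 = self._cell(min_lat, min_lon)
        r1, c1 = self._cell(max_lat, max_lon)
        hits = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for vid in self.grid.get((r, c), ()):
                    lat, lon = self.coords[vid]
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                        hits.append(vid)
        return sorted(hits)

    def subgraph(self, vids):
        """Induced subgraph as an adjacency dict: keep edges with both endpoints in vids."""
        keep = set(vids)
        return {v: sorted(n for n in self.adj.get(v, ()) if n in keep) for v in sorted(keep)}


if __name__ == "__main__":
    g = RoadGraph()
    # Made-up coordinates for illustration, not real OSM data.
    for vid, lat, lon in [(1, 40.000, -74.000), (2, 40.005, -74.005),
                          (3, 40.020, -74.020), (4, 41.000, -73.000)]:
        g.add_vertex(vid, lat, lon)
    for u, v in [(1, 2), (2, 3), (3, 4)]:
        g.add_edge(u, v)
    city = g.query_bbox(39.99, -74.01, 40.01, -73.99)
    print(city)              # [1, 2]
    print(g.subgraph(city))  # {1: [2], 2: [1]}
```

A uniform grid keeps the sketch short. At billion-vertex scale, a tree-based index (for example an R-tree) or a spatial database would likely be a better fit, because scanning a large box cell by cell becomes costly.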