Deep learning-based computer vision models are increasingly used by urban planners to support decision making for shaping urban environments. Such models predict how people perceive urban environment quality in terms of, e.g., its safety or beauty. However, the black-box nature of deep learning models prevents urban planners from understanding which landscape objects contribute to a particularly high- or low-quality perception of urban space. This study investigates how computer vision models can be used to extract policy-relevant information about people's perception of urban space. To do so, we train two widely used computer vision architectures, a Convolutional Neural Network and a transformer, and apply GradCAM, a well-known ex-post explainable AI technique, to highlight the image regions important for the model's prediction. Using these GradCAM visualizations, we manually annotate the objects relevant to the models' perception predictions. As a result, we discover new objects that are not represented in the object detection models used for annotation in previous studies. Moreover, our methodological results suggest that transformer architectures are better suited for use in combination with GradCAM techniques. Code is available on GitHub.