Most existing learning systems treat images as 2D pixel arrays. In an alternative paradigm that is gaining popularity, a 2D image is represented as an implicit neural representation (INR): an MLP that predicts an RGB pixel value given its (x, y) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders, factorized multiplicative modulation and multi-scale INRs, and use them to build a state-of-the-art continuous image GAN. Previous attempts to adapt INRs for image generation were limited to MNIST-like datasets and did not scale to complex real-world data. Our proposed architectural design improves the performance of continuous image generators by a factor of 6 to 40 and reaches FID scores of 6.27 on LSUN bedroom 256x256 and 16.32 on FFHQ 1024x1024, greatly reducing the gap between continuous image GANs and pixel-based ones. To the best of our knowledge, these are the best reported scores for an image generator that consists entirely of fully-connected layers. In addition, we explore several exciting properties of INR-based decoders, such as out-of-the-box superresolution, meaningful image-space interpolation, accelerated inference of low-resolution images, an ability to extrapolate outside image boundaries, and a strong geometric prior. The source code is available at https://github.com/universome/inr-gan
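
To make the INR idea concrete, the following is a minimal NumPy sketch of an MLP that maps an (x, y) coordinate to an RGB value and can be sampled at any resolution. It is an illustration only, not the architecture proposed in the paper: the weights are random and untrained, the sine activation and layer sizes are arbitrary choices, the names `make_coords`, `init_inr`, and `render` are hypothetical, and neither factorized multiplicative modulation nor the multi-scale design is included.

```python
import numpy as np

def make_coords(height, width):
    """Return a (height*width, 2) array of (x, y) coordinates in [0, 1]."""
    ys = np.linspace(0.0, 1.0, height)
    xs = np.linspace(0.0, 1.0, width)
    grid_x, grid_y = np.meshgrid(xs, ys)  # each has shape (height, width)
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)

def init_inr(rng, hidden=32):
    """Toy two-layer MLP with random, untrained weights (assumption)."""
    return {
        "w1": rng.normal(size=(2, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(size=(hidden, 3)) / np.sqrt(hidden),
        "b2": np.zeros(3),
    }

def render(params, height, width):
    """Evaluate the MLP at every pixel coordinate of a height x width grid."""
    coords = make_coords(height, width)
    # Sine activation is an illustrative choice, not taken from the paper.
    hidden = np.sin(coords @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    rgb = 1.0 / (1.0 + np.exp(-logits))  # squash to (0, 1)
    return rgb.reshape(height, width, 3)

rng = np.random.default_rng(0)
params = init_inr(rng)

# The same network is queried on two grids; only the sampling density changes.
low = render(params, 16, 16)
high = render(params, 64, 64)
print(low.shape, high.shape)
```

Because the image is a function of continuous coordinates, rendering at a lower resolution simply evaluates fewer points, and rendering at a higher one evaluates more points of the same function. This is the sense in which properties such as superresolution and faster low-resolution inference come "out of the box" for an INR-based decoder.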