Knowing that data lie close to a particular submanifold of the ambient Euclidean space can be useful in a number of ways. For instance, one may want to automatically mark any point far from the submanifold as an outlier, or to use geodesic distance along the submanifold to measure similarity between points. Classical manifold learning problems are often posed in very high dimension, e.g. for spaces of images or spaces of word representations. Today, with deep representation learning on the rise in areas such as computer vision and natural language processing, many problems of this kind can be transformed into problems of moderately high dimension, typically on the order of hundreds. Motivated by this, we propose a manifold learning technique suitable for moderately high dimension and large datasets. The manifold is learned from the training data as an intersection of quadric hypersurfaces, which are simple but expressive objects. At test time, this manifold can be used to assign an outlier score to arbitrary new points and to improve a given similarity metric by incorporating the learned geometric structure into it.
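To make the idea of an intersection of quadrics concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes a plain algebraic least-squares fit: each point is mapped to all monomials of degree at most two, and the quadric coefficient vectors are taken as the right singular vectors of the feature matrix with the smallest singular values. The outlier score is the algebraic residual, not a geometric or geodesic distance. The function names `quadratic_features`, `fit_quadrics`, and `outlier_score` are hypothetical.

```python
import numpy as np


def quadratic_features(X):
    """Map each row x to the monomials [1, x_i, x_i * x_j for i <= j]."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, d = X.shape
    iu, ju = np.triu_indices(d)
    return np.hstack([np.ones((n, 1)), X, X[:, iu] * X[:, ju]])


def fit_quadrics(X, n_quadrics):
    """Return n_quadrics unit coefficient vectors c with Phi(X) @ c ~ 0.

    Assumption: an algebraic fit via SVD, with at least as many training
    points as monomial features. This is an illustrative stand-in only.
    """
    Phi = quadratic_features(X)
    # Singular values come back in descending order, so the last rows of Vt
    # are the directions along which the features of the data nearly vanish.
    _, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    return Vt[-n_quadrics:]


def outlier_score(C, X):
    """Algebraic residual: Euclidean norm of the quadric values at each point."""
    return np.linalg.norm(quadratic_features(X) @ C.T, axis=1)


# Toy data: points on the unit circle in the plane, which is a single quadric.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
train = np.column_stack([np.cos(theta), np.sin(theta)])

C = fit_quadrics(train, n_quadrics=1)
scores = outlier_score(C, [[0.0, 1.0], [2.0, 0.0]])
print(scores)
```

In this toy case the recovered quadric is a scaled copy of x^2 + y^2 - 1 = 0, so the point (0, 1) on the circle scores close to zero while (2, 0) scores about 1.73. In dimension d the feature map has on the order of d^2 / 2 entries, so a direct SVD like this one is only a conceptual illustration for the moderately high dimensions discussed above.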