Modular Jets for Supervised Pipelines: Diagnosing Mirage vs Identifiability

Classical supervised learning evaluates models primarily via predictive risk on held-out data. Such evaluations quantify how well a model's input-output function performs on a data distribution, but they do not address whether the internal decomposition of a model is uniquely determined by the data and the evaluation design. In this paper, we introduce \emph{Modular Jets} for regression and classification pipelines. Given a task manifold (input space), a modular decomposition, and access to module-level representations, we estimate empirical jets: local linear response maps that describe how each module reacts to small structured perturbations of the input. We propose an empirical notion of \emph{mirage} regimes, in which multiple distinct modular decompositions induce indistinguishable jets and therefore remain observationally equivalent, and contrast it with an \emph{identifiable} regime, in which the observed jets single out a decomposition up to natural symmetries. For two-module linear regression pipelines, we prove a jet-identifiability theorem: under mild rank assumptions and with access to module-level jets, the internal factorisation is uniquely determined, whereas risk-only evaluation admits a large family of mirage decompositions that implement the same input-to-output map. We then present an algorithm (MoJet) for empirical jet estimation and mirage diagnostics, and illustrate the framework on linear and deep regression and on pipeline classification.
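The contrast between risk-only evaluation and module-level jets in the two-module linear setting can be illustrated with a minimal sketch. It assumes module 1 computes h = A x and module 2 computes y = B h; for any invertible matrix G, the pair (G A, B G^{-1}) implements the same end-to-end map B A and therefore has the same risk. Jets are estimated here by forward finite differences, which is our assumption (the abstract does not say how MoJet estimates jets), and all helper names (`estimate_jet`, `module1`, `pipeline`, `max_abs_diff`) and matrices are hypothetical. This is not the paper's MoJet algorithm.

```python
# Hypothetical illustration (not MoJet): a mirage in a two-module linear
# pipeline x -> h = A x (module 1) -> y = B h (module 2).

def matmul(P, Q):
    """Multiply matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def matvec(P, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(p * vi for p, vi in zip(row, v)) for row in P]

def estimate_jet(f, x, eps=1e-4):
    """Forward-difference estimate of the Jacobian (jet) of f at x.

    Column j is (f(x + eps * e_j) - f(x)) / eps. For a linear f this
    recovers the matrix up to floating-point rounding error.
    """
    fx = f(x)
    cols = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        cols.append([(a - b) / eps for a, b in zip(f(xp), fx)])
    # Transpose the list of columns into a list of rows.
    return [list(r) for r in zip(*cols)]

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def module1(M):
    """Module 1 as a function x -> M x."""
    return lambda x: matvec(M, x)

def pipeline(Mb, Ma):
    """Full pipeline x -> Mb (Ma x)."""
    return lambda x: matvec(Mb, matvec(Ma, x))

# Decomposition 1: (A, B).
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
B = [[3.0, -1.0]]

# Decomposition 2: reparametrise the hidden space with an invertible G.
G = [[2.0, 1.0],
     [0.0, 1.0]]
G_inv = [[0.5, -0.5],
         [0.0, 1.0]]
A2 = matmul(G, A)        # module 1 becomes G A
B2 = matmul(B, G_inv)    # module 2 becomes B G^{-1}

x0 = [0.3, -1.2, 0.7]

# End-to-end jets agree (same input-output map, hence the same risk) ...
end_to_end_gap = max_abs_diff(estimate_jet(pipeline(B, A), x0),
                              estimate_jet(pipeline(B2, A2), x0))
# ... but the module-1 jets (A versus G A) differ.
module_gap = max_abs_diff(estimate_jet(module1(A), x0),
                          estimate_jet(module1(A2), x0))

print(f"end-to-end jet gap: {end_to_end_gap:.2e}")
print(f"module-1 jet gap:   {module_gap:.2e}")
```

Because both modules are linear, the forward difference recovers each Jacobian up to rounding error, so the end-to-end gap is essentially zero while the module-1 gap is of order one. For nonlinear modules the same estimate is only local and depends on the step size `eps`.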
