Conformal prediction (CP) constructs uncertainty sets for model outputs with finite-sample coverage guarantees. A candidate output is included in the prediction set if its non-conformity score is not considered extreme relative to the scores observed on a set of calibration examples. However, this procedure is only straightforward when scores are scalar-valued, which has limited CP to real-valued scores or ad-hoc reductions to one dimension. The problem of ordering vectors has been studied via optimal transport (OT), which provides a principled method for defining vector-ranks and multivariate quantile regions, though typically with only asymptotic coverage guarantees. We restore finite-sample, distribution-free coverage by conformalizing the vector-valued OT quantile region. Here, a candidate's rank is defined via a transport map computed for the calibration scores augmented with that candidate's score. This defines a continuum of OT problems for which we prove that the resulting optimal assignment is piecewise-constant across a fixed polyhedral partition of the score space. This allows us to characterize the entire prediction set tractably, and provides the machinery to address a deeper limitation of prediction sets: that they only indicate which outcomes are plausible, but not their relative likelihood. In one dimension, conformal predictive distributions (CPDs) fill this gap by producing a predictive distribution with finite-sample calibration. Extending CPDs beyond one dimension remained an open problem. We construct, to our knowledge, the first multivariate CPDs with finite-sample calibration, i.e., they define a valid multivariate distribution where any derived uncertainty region automatically has guaranteed coverage. We present both conservative and exact randomized versions, the latter resulting in a multivariate generalization of the classical Dempster-Hill procedure.
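As a rough illustration of the membership test described above, the following minimal sketch ranks a single candidate score by solving one assignment problem on the calibration scores augmented with that candidate. It is not the paper's algorithm: the abstract does not specify the reference points, so `reference_points` (a spiral with radii 1/m, ..., 1) is a hypothetical choice, and the function names `ot_assignment`, `ot_rank`, and `in_prediction_set` are illustrative. The brute-force search over permutations is only feasible for a handful of points, and it checks one candidate at a time rather than characterizing the whole prediction set through the polyhedral partition.

```python
import itertools
import math

# Assumption: golden-angle spacing spreads the reference directions evenly.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def reference_points(m):
    """Hypothetical 2-D reference grid (not taken from the source).

    Point j has radius (j + 1) / m, so the list is ordered by increasing
    norm and index j corresponds to rank j + 1.
    """
    return [
        ((j + 1) / m * math.cos(j * GOLDEN_ANGLE),
         (j + 1) / m * math.sin(j * GOLDEN_ANGLE))
        for j in range(m)
    ]


def ot_assignment(points, refs):
    """Brute-force optimal assignment under squared Euclidean cost.

    Returns a tuple perm such that points[i] is matched to refs[perm[i]].
    Enumerates all m! permutations, so use it only for very small m.
    """
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(points))):
        cost = sum(math.dist(p, refs[j]) ** 2 for p, j in zip(points, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm


def ot_rank(calibration_scores, candidate_score):
    """Rank (1..n+1) of the candidate within the augmented sample."""
    augmented = list(calibration_scores) + [candidate_score]
    perm = ot_assignment(augmented, reference_points(len(augmented)))
    return perm[-1] + 1  # the candidate is the last point in `augmented`


def in_prediction_set(calibration_scores, candidate_score, alpha):
    """Keep the candidate if its rank is among the ceil((1 - alpha)(n + 1)) smallest."""
    m = len(calibration_scores) + 1
    k = math.ceil((1 - alpha) * m)
    return ot_rank(calibration_scores, candidate_score) <= k


if __name__ == "__main__":
    calibration = [(0.1, 0.2), (-0.4, 0.3), (0.5, -0.6), (-0.2, -0.7), (0.9, 0.4)]
    for candidate in [(0.0, 0.0), (3.0, -2.0)]:
        print(candidate, ot_rank(calibration, candidate),
              in_prediction_set(calibration, candidate, alpha=0.2))
```

The intuition for the threshold is that, if the scores are exchangeable and the optimal assignment is almost surely unique, the candidate is equally likely to be matched to any of the n + 1 reference points, so keeping the k smallest ranks covers with probability at least 1 - alpha. For realistic sample sizes, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be a natural replacement for the permutation search.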