In this work, we propose a set of conformal prediction procedures tailored to compositional responses, whose outcomes are vectors of positive proportions that sum to one. Building on Dirichlet regression, we introduce a split conformal approach based on quantile residuals and a highest-density region (HDR) strategy that combines a fast coordinate-floor approximation with an internal grid refinement to restore sharpness. Both constructions are model-agnostic at the conformal layer and guarantee finite-sample marginal coverage under exchangeability, while respecting the geometry of the simplex. A comprehensive Monte Carlo study spanning homoscedastic and heteroscedastic designs shows that the quantile residual and grid-refined HDR methods achieve empirical coverage close to the nominal 90\% level and produce substantially narrower regions than the coordinate-floor approximation, which tends to be conservative. We further demonstrate the methods on household budget shares from the BudgetItaly dataset, using standardized socioeconomic and price covariates with a training, calibration, and test split. In this application, the grid-refined HDR attains coverage closest to the target with the smallest average widths, closely followed by the quantile residual approach, while the simple triangular HDR yields wider, less informative sets. Overall, the results indicate that conformal prediction on the simplex can be both calibrated and efficient, providing practical uncertainty quantification for compositional prediction tasks.
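As a rough illustration of the split conformal step on the simplex, the following minimal sketch calibrates a density-level region under a Dirichlet predictive distribution. It is not the quantile residual or grid-refined HDR construction summarized above: the nonconformity score (negative predictive log-density) is a generic choice, and `predict_alpha`, its coefficients, and the simulated data are hypothetical stand-ins for a fitted Dirichlet regression and a real calibration set.

```python
import math
import random


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at a composition y (positive, sums to one)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))


def conformal_threshold(scores, miscoverage):
    """Return the ceil((n + 1) * (1 - miscoverage))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - miscoverage))
    if k > n:
        return math.inf  # too few calibration points: the region is the whole simplex
    return sorted(scores)[k - 1]


def predict_alpha(x):
    """Hypothetical stand-in for a fitted Dirichlet regression (illustration only)."""
    return [math.exp(0.5 + 0.3 * x), math.exp(0.2 - 0.1 * x), math.exp(0.4)]


def sample_dirichlet(alpha, rng):
    """Draw one composition by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


def score(x, y):
    """Nonconformity score: negative predictive log-density at the observed y."""
    return -dirichlet_logpdf(y, predict_alpha(x))


def run_demo(n_cal=1000, n_test=2000, miscoverage=0.1, seed=0):
    """Calibrate on simulated data and return empirical test coverage."""
    rng = random.Random(seed)

    def draw():
        # Simulated (x, y) pair; the data come from the same hypothetical model.
        x = rng.uniform(-1.0, 1.0)
        return x, sample_dirichlet(predict_alpha(x), rng)

    cal_scores = [score(*draw()) for _ in range(n_cal)]
    threshold = conformal_threshold(cal_scores, miscoverage)
    # A test point is covered when its score does not exceed the threshold,
    # i.e. when y lies in the region {y : -log f(y | x) <= threshold}.
    covered = sum(score(*draw()) <= threshold for _ in range(n_test))
    return covered / n_test


if __name__ == "__main__":
    print(f"empirical coverage: {run_demo():.3f}")
```

Because the threshold is an order statistic of the calibration scores, the marginal coverage guarantee relies only on exchangeability of calibration and test points, not on the Dirichlet model being correct; a misspecified model would be expected to affect the size of the region rather than its coverage.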