快速、样本高效、仿射不变的高斯子分布私有均值与协方差估计 (Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions)

We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given $n$ samples from a (sub-)Gaussian distribution with unknown mean $μ$ and covariance $Σ$, our $(\varepsilon,δ)$-differentially private estimator produces $\tildeμ$ such that $\|μ- \tildeμ\|_Σ \leq α$ as long as $n \gtrsim \tfrac d {α^2} + \tfrac{d \sqrt{\log 1/δ}}{α\varepsilon}+\frac{d\log 1/δ}{\varepsilon}$. The Mahalanobis error metric $\|μ- \hatμ\|_Σ$ measures the distance between $\hat μ$ and $μ$ relative to $Σ$; it characterizes the error of the sample mean. Our algorithm runs in time $\tilde{O}(nd^{ω- 1} + nd/\varepsilon)$, where $ω< 2.38$ is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With $n\gtrsim d^{3/2}$ samples, our estimate is accurate in spectral norm. This is the first such algorithm using $n= o(d^2)$ samples, answering an open question posed by Alabi et al. (2022). With $n\gtrsim d^2$ samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.

翻译：我们提出了一种快速的差分隐私算法，用于高维协方差感知的均值估计，其样本复杂度接近最优。此前仅指数时间估计器能达到该保证。给定来自具有未知均值 $μ$ 和协方差 $Σ$ 的（亚）高斯分布的 $n$ 个样本，我们的 $(\varepsilon,δ)$-差分隐私估计器生成 $\tildeμ$，使得只要 $n \gtrsim \tfrac d {α^2} + \tfrac{d \sqrt{\log 1/δ}}{α\varepsilon}+\frac{d\log 1/δ}{\varepsilon}$，就有 $\|μ- \tildeμ\|_Σ \leq α$。马氏距离误差度量 $\|μ- \hatμ\|_Σ$ 衡量 $\hat μ$ 与 $μ$ 相对于 $Σ$ 的距离；它刻画了样本均值的误差。我们的算法运行时间为 $\tilde{O}(nd^{ω- 1} + nd/\varepsilon)$，其中 $ω< 2.38$ 是矩阵乘法指数。我们改进了 Brown、Gaboardi、Smith、Ullman 和 Zakynthinou（2021）的指数时间方法，给出了稳定均值与协方差估计子程序的高效变体，同时将样本复杂度提升至上述接近最优的界。我们的稳定协方差估计器可转化为无限制亚高斯分布的私有协方差估计。当 $n\gtrsim d^{3/2}$ 个样本时，我们的估计在谱范数下是准确的。这是首个使用 $n= o(d^2)$ 样本的此类算法，回答了 Alabi 等人（2022）提出的开放性问题。当 $n\gtrsim d^2$ 个样本时，我们的估计在 Frobenius 范数下是准确的。这导出了一个快速、接近最优的算法，用于在 TV 距离下私有学习无限制高斯分布。Duchi、Haque 和 Kuditipudi（2023）独立且同时获得了类似结果。