自适应矩阵稀疏化及其在经验风险最小化中的应用 (Adaptive Matrix Sparsification and Applications to Empirical Risk Minimization)

Consider the empirical risk minimization (ERM) problem, which is stated as follows. Let $K_1, \dots, K_m$ be compact convex sets with $K_i \subseteq \mathbb{R}^{n_i}$ for $i \in [m]$, $n = \sum_{i=1}^m n_i$, and $n_i\le C_K$ for some absolute constant $C_K$. Also, consider a matrix $A \in \mathbb{R}^{n \times d}$ and vectors $b \in \mathbb{R}^d$ and $c \in \mathbb{R}^n$. Then the ERM problem asks to find \[ \min_{\substack{x \in K_1 \times \dots \times K_m\\ A^\top x = b}} c^\top x. \] We give an algorithm to solve this to high accuracy in time $\widetilde{O}(nd + d^6\sqrt{n}) \le \widetilde{O} (nd + d^{11})$, which is nearly-linear time in the input size when $A$ is dense and $n \ge d^{10}$. Our result is achieved by implementing an $\widetilde{O}(\sqrt{n})$-iteration interior point method (IPM) efficiently using dynamic data structures. In this direction, our key technical advance is a new algorithm for maintaining leverage score overestimates of matrices undergoing row updates. Formally, given a matrix $A \in \mathbb{R}^{n \times d}$ undergoing $T$ batches of row updates of total size $n$ we give an algorithm which can maintain leverage score overestimates of the rows of $A$ summing to $\widetilde{O}(d)$ in total time $\widetilde{O}(nd + Td^6)$. This data structure is used to sample a spectral sparsifier within a robust IPM framework to establish the main result.

翻译：考虑经验风险最小化（ERM）问题，其表述如下。设 $K_1, \dots, K_m$ 为紧凸集，满足 $K_i \subseteq \mathbb{R}^{n_i}$，其中 $i \in [m]$，$n = \sum_{i=1}^m n_i$，且 $n_i\le C_K$（$C_K$ 为绝对常数）。同时，考虑矩阵 $A \in \mathbb{R}^{n \times d}$ 以及向量 $b \in \mathbb{R}^d$ 和 $c \in \mathbb{R}^n$。则 ERM 问题旨在求解 \[ \min_{\substack{x \in K_1 \times \dots \times K_m\\ A^\top x = b}} c^\top x. \] 我们提出一种算法，可在 $\widetilde{O}(nd + d^6\sqrt{n}) \le \widetilde{O} (nd + d^{11})$ 时间内高精度求解该问题。当 $A$ 为稠密矩阵且 $n \ge d^{10}$ 时，该时间复杂度在输入规模上近乎线性。我们的成果通过利用动态数据结构高效实现 $\widetilde{O}(\sqrt{n})$ 次迭代的内点法（IPM）而达成。在此方向上，我们的关键技术突破是提出了一种新算法，用于维护经历行更新的矩阵的杠杆分数上界估计。形式化地，给定矩阵 $A \in \mathbb{R}^{n \times d}$ 经历 $T$ 批总规模为 $n$ 的行更新，我们提出的算法可在总时间 $\widetilde{O}(nd + Td^6)$ 内维护 $A$ 各行杠杆分数上界估计的总和至 $\widetilde{O}(d)$。该数据结构用于在鲁棒 IPM 框架内采样谱稀疏化器，从而确立主要结果。