Motivated by the EU's "Right To Be Forgotten" regulation, we initiate a study of statistical data deletion problems where users' data are accessible only for a limited period of time. This setting is formulated as an online supervised learning task with \textit{constant memory limit}. We propose a deletion-aware algorithm \texttt{FIFD-OLS} for the low dimensional case, and witness a catastrophic rank swinging phenomenon due to the data deletion operation, which leads to statistical inefficiency. As a remedy, we propose the \texttt{FIFD-Adaptive Ridge} algorithm with a novel online regularization scheme, that effectively offsets the uncertainty from deletion. In theory, we provide the cumulative regret upper bound for both online forgetting algorithms. In the experiment, we showed \texttt{FIFD-Adaptive Ridge} outperforms the ridge regression algorithm with fixed regularization level, and hopefully sheds some light on more complex statistical models.
翻译:基于欧盟的“被遗忘的权利”规则,我们开始研究统计数据删除问题,因为用户的数据只能在有限时间内获得。这一设置是作为在线监管学习任务而设计的,具有\ textit{constant 内存限制}。我们提议为低维案例采用删除算法\ textt{FID-OLS},并见证了由于数据删除操作而导致的灾难性等级波动现象,这导致了统计效率低下。作为一种补救措施,我们建议采用新的在线正规化计划来有效抵消删除的不确定性。理论上,我们为两种在线遗忘算法提供了累积的遗憾。在实验中,我们展示了\ textt{FID-Adaptive Ridge} 超过了固定正规化水平的山脊回归算法,并希望能为更复杂的统计模型提供一些线索。