We study the problem of variable selection in convex nonparametric least squares (CNLS). While the least absolute shrinkage and selection operator (Lasso) is a popular technique for least squares problems, its variable selection performance in CNLS is largely unknown. In this work, we investigate the performance of the Lasso estimator and find that it is generally unable to select variables efficiently. Exploiting the unique structure of the subgradients in CNLS, we develop a structured Lasso method that combines the $\ell_1$-norm and the $\ell_{\infty}$-norm. We further propose a relaxed version of the structured Lasso to achieve model sparsity and predictive performance simultaneously, in which the two effects, variable selection and model shrinkage, are controlled by separate tuning parameters. A Monte Carlo study is conducted to verify the finite-sample performance of the proposed approaches, and real data from Swedish electricity distribution networks are used to illustrate the effects of the proposed variable selection techniques. The results from both the simulation and the application confirm that the proposed structured Lasso performs favorably, generally yielding sparser and more accurate predictive models than the conventional Lasso methods in the literature.
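As a minimal sketch of the idea, and not the paper's exact formulation, the structured penalty can be illustrated as follows. Here $\alpha_i$ and $\beta_i=(\beta_{i1},\dots,\beta_{id})^{\top}$ denote the intercept and subgradient of the fitted piecewise-linear function at observation $i$, and $\lambda\ge 0$ is a tuning parameter; this notation, the sign of the shape constraints, and the precise form of the penalty are assumptions for illustration only.
\begin{align*}
\min_{\alpha,\beta}\ & \sum_{i=1}^{n}\bigl(y_i-\alpha_i-\beta_i^{\top}x_i\bigr)^{2}
  \;+\;\lambda\sum_{j=1}^{d}\max_{1\le i\le n}\lvert\beta_{ij}\rvert \\
\text{s.t.}\ & \alpha_i+\beta_i^{\top}x_i \;\ge\; \alpha_h+\beta_h^{\top}x_i,
  \qquad \forall\, i,h,
\end{align*}
Grouping each variable's subgradient coordinates across all observations and penalizing each group by its $\ell_{\infty}$-norm, summed in $\ell_1$ fashion over variables, forces an entire variable out of the model when its group maximum is driven to zero. The relaxed variant described above would introduce a second tuning parameter so that variable selection and model shrinkage are controlled separately.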