• A Guide to the Core Journals of China
  • China Science and Technology Core Journals
  • Chinese Science Citation Database (CSCD)
  • Chinese Science and Technology Paper and Citation Database (CSTPCD)
  • Chinese Science Abstracts Database (CSAD)
  • China Academic Journals, online edition (CNKI)
  • Chinese Science and Technology Journal Database
  • Wanfang Data Knowledge Service Platform
  • Chaoxing Journal Domain Publishing Platform
  • National Open Platform for Science and Technology Academic Journals
  • Scopus abstract and citation database (Elsevier, Netherlands)
  • Japan Science and Technology Agency database (JST)

A splicing algorithm for best subset selection in sliced inverse regression

  • Abstract: In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR) designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimal adequate collection of their linear combinations, without loss of information. Recently, regularization methods have been proposed for SIR that incorporate a sparse structure of the predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint, which may not lead to the best subset; in particular, they tend to include irrelevant variables when the predictors are correlated. We formulate sparse SIR as a nonconvex optimization problem and tackle the sparsity constraint directly by establishing its optimality conditions and solving them iteratively via the splicing technique. Without employing convex relaxation on either the sparsity constraint or the orthogonality constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, it is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, it surpasses existing methods in the accuracy of central subspace estimation and best subset selection, and sustains high performance even with correlated predictors.
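To make the splicing idea concrete, the following is a minimal sketch of a splicing iteration for best-subset selection in ordinary linear regression, the general exchange mechanism the abstract refers to; it is not the authors' sparse-SIR implementation (which works on the SIR eigen-problem under an orthogonality constraint), and the function name, the correlation-screening initialization, and the particular sacrifice/gain scores are simplified assumptions. The sketch maintains an active set of fixed size s and swaps variables between the active and inactive sets whenever the swap lowers the loss, which avoids any convex relaxation of the sparsity constraint.

```python
import numpy as np

def splicing_best_subset(X, y, s, k_max=2, max_iter=20):
    """Toy splicing iteration for size-s best-subset linear regression.

    Keeps an active set of exactly s variables and exchanges up to k_max
    variables between the active and inactive sets whenever the residual
    sum of squares (RSS) strictly decreases.
    """
    n, p = X.shape
    # Initialize the active set by marginal correlation screening.
    active = set(np.argsort(-np.abs(X.T @ y))[:s])

    def fit_rss(subset):
        cols = sorted(subset)
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        resid = y - X[:, cols] @ beta
        return float(resid @ resid), cols, beta

    rss, cols, beta = fit_rss(active)
    for _ in range(max_iter):
        improved = False
        resid = y - X[:, cols] @ beta
        inactive = [j for j in range(p) if j not in active]
        # Gain of an inactive variable: squared correlation with the residual.
        gains = {j: (X[:, j] @ resid) ** 2 / (X[:, j] @ X[:, j])
                 for j in inactive}
        # Sacrifice of an active variable: RSS increase when it is dropped.
        sac = {j: fit_rss(active - {j})[0] - rss for j in cols}
        for k in range(1, k_max + 1):
            drop = sorted(sac, key=sac.get)[:k]                   # least useful active
            add = sorted(gains, key=gains.get, reverse=True)[:k]  # most promising inactive
            cand = (active - set(drop)) | set(add)
            rss_new, cols_new, beta_new = fit_rss(cand)
            if rss_new < rss - 1e-12:                             # splice only on improvement
                active, rss, cols, beta = cand, rss_new, cols_new, beta_new
                improved = True
                break
        if not improved:  # no swap of any size helps: a splicing fixed point
            break
    return sorted(active)
```

In the sparse-SIR setting described above, the least-squares fit would be replaced by the SIR objective and the scores by the corresponding sacrifices in that objective, but the exchange logic is the same.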

