ISSN 0253-2778

CN 34-1054/N

Open Access · JUSTC Research Articles: Mathematics

Subgroup analysis for multi-response regression

Cite this:
https://doi.org/10.52396/JUST-2021-0053
  • Received: 08 March 2021
  • Revised: 28 March 2021
  • Published: 31 March 2021
Correctly identifying subgroups in a heterogeneous population has attracted increasing attention in modern big data applications, since modeling heterogeneous effects can eliminate the impact of individual differences and make estimation results more accurate. Despite a fast-growing literature, most existing methods focus on heterogeneous univariate regression, and how to identify subgroups precisely in the face of multiple responses remains unclear. Here, we develop a new methodology for heterogeneous multi-response regression based on a concave pairwise fusion approach, which jointly estimates the coefficient matrix and identifies the subgroup structure. In addition, we provide theoretical guarantees for the proposed methodology by establishing its estimation consistency. Our numerical studies demonstrate the effectiveness of the proposed method.
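The abstract does not spell out the model or the fitting algorithm, so the following is only a minimal sketch under stated assumptions. It assumes each unit i has its own intercept vector μ_i ∈ R^q while the coefficient matrix B is shared, y_i = μ_i + Bᵀx_i + ε_i, and it minimizes (1/2) Σ_i ||y_i − μ_i − Bᵀx_i||² + Σ_{i<j} p_γ(||μ_i − μ_j||_2; λ), where p_γ is the minimax concave penalty (MCP). The solver is an ADMM scheme (our choice, not necessarily the paper's algorithm) with a hypothetical augmentation parameter θ; units i and j are placed in the same subgroup when the estimated pairwise difference δ_ij is exactly zero. The function names `mcp_group_threshold`, `fuse_multi_response` and `subgroup_labels`, the values λ = 0.5, γ = 3, θ = 1, and the simulated data are all hypothetical.

```python
import numpy as np

# Hypothetical model (an assumption, not confirmed by the abstract):
#   y_i = mu_i + B^T x_i + e_i, with y_i in R^q and a unit-specific intercept vector mu_i.
# Objective: 0.5 * ||Y - M - X B||_F^2 + sum_{i<j} MCP(||mu_i - mu_j||_2; lam, gamma).


def mcp_group_threshold(Z, lam, gamma, theta):
    """Row-wise solution of argmin_d MCP(||d||; lam, gamma) + (theta / 2) * ||z - d||^2.

    Valid when gamma * theta > 1, which makes each subproblem strictly convex.
    """
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    safe = np.maximum(norms, 1e-12)  # avoid division by zero for zero rows
    shrunk = np.maximum(0.0, 1.0 - lam / (theta * safe)) * Z / (1.0 - 1.0 / (gamma * theta))
    return np.where(norms > gamma * lam, Z, shrunk)  # MCP is flat beyond gamma * lam


def fuse_multi_response(Y, X, lam, gamma=3.0, theta=1.0, max_iter=1000, tol=1e-6):
    """ADMM sketch returning M (n x q), B (p x q), Delta (one row per pair i < j), pairs."""
    n = Y.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)
    m = i_idx.size
    D = np.zeros((m, n))  # row for pair (i, j) is e_i - e_j
    D[np.arange(m), i_idx] = 1.0
    D[np.arange(m), j_idx] = -1.0
    X_pinv = np.linalg.pinv(X)
    R = np.eye(n) - X @ X_pinv  # projects onto the orthogonal complement of col(X)
    A = R + theta * D.T @ D  # system matrix of the M-update (fixed across iterations)
    M = R @ Y  # initial value: least-squares residuals of Y on X
    Delta = D @ M
    V = np.zeros_like(Delta)
    for _ in range(max_iter):
        # M-update with B profiled out (B = X^+ (Y - M) minimizes over B for fixed M)
        M = np.linalg.solve(A, R @ Y + D.T @ (theta * Delta - V))
        DM = D @ M
        # Delta-update: group-MCP thresholding of each pairwise difference
        Delta = mcp_group_threshold(DM + V / theta, lam, gamma, theta)
        # dual update
        V = V + theta * (DM - Delta)
        if np.linalg.norm(DM - Delta) < tol:  # primal residual
            break
    B = X_pinv @ (Y - M)
    return M, B, Delta, (i_idx, j_idx)


def subgroup_labels(Delta, pairs, n):
    """Put i and j in one subgroup whenever delta_ij is exactly zero (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j, d in zip(pairs[0], pairs[1], Delta):
        if not np.any(d):
            parent[find(i)] = find(j)
    _, labels = np.unique([find(a) for a in range(n)], return_inverse=True)
    return labels


# Simulated example: two subgroups that differ only in their intercept vectors.
rng = np.random.default_rng(0)
n, p, q = 40, 3, 2
truth = np.repeat([0, 1], n // 2)
centers = np.array([[3.0, 3.0], [-3.0, -3.0]])
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)  # centered X, so the all-ones vector is not in col(X) and A is invertible
B_true = rng.normal(size=(p, q))
Y = centers[truth] + X @ B_true + 0.5 * rng.normal(size=(n, q))

M_hat, B_hat, Delta, pairs = fuse_multi_response(Y, X, lam=0.5)
labels = subgroup_labels(Delta, pairs, n)
print("estimated number of subgroups:", labels.max() + 1)
```

The concave penalty is what makes the subgroups emerge: MCP thresholds small pairwise differences exactly to zero, but leaves differences larger than γλ unpenalized, so well-separated subgroups are not pulled toward each other as they would be under a convex (lasso-type) fusion penalty. Requiring γθ > 1 keeps each δ-subproblem convex. In this sketch λ is fixed by hand; in practice one would typically tune it over a grid, which is omitted here.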