Confidence intervals for high-dimensional multi-task regression

Yuanli Ma; Yang Li; Jianjun Xu

doi:10.52396/JUSTC-2022-0115

Open Access JUSTC Information Science and Technology / Management 04 May 2023

1. School of Data Science, University of Science and Technology of China, Hefei 230026, China
2. International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Author Bio:
Yuanli Ma is currently a master's student at the University of Science and Technology of China. Her research mainly focuses on big data problems.

Yang Li is currently a postdoctoral researcher at the University of Science and Technology of China (USTC). He received his Ph.D. degree in Statistics from USTC in 2021. His research interests include high-dimensional statistical inference and distributed learning.
Corresponding author: E-mail: tjly@mail.ustc.edu.cn
Received Date: 21 August 2022
Accepted Date: 16 November 2022

Available Online: 04 May 2023

Abstract

Regression problems involving multiple responses and predictors arise in many applications, such as biomedical sciences and economics. In this paper, we focus on statistical inference for the unknown coefficient matrix in high-dimensional multi-task learning problems. The new statistic is constructed in a row-wise manner based on a two-step projection technique, which improves inference efficiency by removing the impact of important signals. Based on the established asymptotic normality of the proposed two-step projection estimator (TPE), we construct confidence intervals for all components of the unknown coefficient matrix. The performance of the proposed method is demonstrated through simulation studies and a real data analysis.
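
As a rough illustration of how asymptotic normality leads to entrywise confidence intervals, the sketch below forms generic Wald-type intervals from a matrix of point estimates and a matrix of standard errors. It is not the TPE procedure itself: the function name `wald_intervals`, the inputs `B_hat` and `SE`, and all numbers are hypothetical and chosen only for illustration, under the assumption that each estimate is approximately normal with the given standard error.

```python
from statistics import NormalDist


def wald_intervals(estimates, std_errors, level=0.95):
    """Entrywise intervals estimate +/- z * std_error for a coefficient matrix.

    `estimates` and `std_errors` are hypothetical inputs (lists of rows);
    computing them with the TPE is not shown here.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Two-sided standard normal quantile, about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return [
        [(b - z * s, b + z * s) for b, s in zip(b_row, s_row)]
        for b_row, s_row in zip(estimates, std_errors)
    ]


# Made-up numbers for illustration only: 2 predictors (rows), 2 responses (columns).
B_hat = [[0.50, -0.20], [0.00, 1.10]]
SE = [[0.10, 0.05], [0.08, 0.20]]
intervals = wald_intervals(B_hat, SE)
```

Each entry of `intervals` is a `(lower, upper)` pair for the corresponding coefficient; an interval that excludes zero would suggest a nonzero coefficient at the chosen level, provided the normal approximation and the standard errors are reliable.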

Graphical abstract

Statistical inference for the coefficient matrix in high-dimensional multi-task regression.

Public Summary

We propose a two-step projection estimator for statistical inference in high-dimensional multi-task learning problems.
We establish the asymptotic properties of the proposed estimator.
The performance of our method is demonstrated through simulation studies and an analysis of the TCGA-OV dataset.

Figure 1. Estimates of the unknown coefficients of miRNA hsa-mir-486-2 (red squares for TPE and black dots for MLDPE) and the corresponding 95% confidence intervals (obtained by TPE) over all 50 proteins.

Volume 53 Issue 4 page: 0403
