Xianglu Wang is currently a master's student at the School of Management, University of Science and Technology of China (USTC). He received his B.S. degree from USTC in 2019. His research mainly focuses on high-dimensional variable selection and inference.
Graphical Abstract
The process of recovering the precision matrix in the presence of additive and multiplicative measurement errors.
Abstract
It is well known that regression methods designed for clean data will lead to erroneous results if directly applied to corrupted data. Despite the recent methodological and algorithmic advances in Gaussian graphical model estimation, how to achieve efficient and scalable estimation under contaminated covariates is unclear. Here a new methodology called convex conditioned innovative scalable efficient estimation (COCOISEE) for Gaussian graphical models under both additive and multiplicative measurement errors is developed. It combines the strengths of the innovative scalable efficient estimation in the Gaussian graphical model and the nearest positive semidefinite matrix projection, thus enjoying stepwise convexity and scalability. Comprehensive theoretical guarantees are provided and the effectiveness of the proposed methodology is demonstrated through numerical studies.
Public Summary
We propose a new methodology, COCOISEE, to achieve scalable and interpretable estimation for Gaussian graphical models under both additive and multiplicative measurement errors.
The method is stepwise convex, computationally stable, efficient and scalable.
Both theoretical and simulation results verify the feasibility of our method.
Estimation of the precision (inverse covariance) matrix has become a fundamental problem in modern multivariate statistical analysis and has a very wide range of practical applications. For example, in genome-wide association studies, genomic data are used to estimate the precision matrix, and the genetic correlation between traits is obtained from its nonzero entries. This is not only crucial for understanding gene regulatory pathways and gene functions but also contributes to a deeper understanding of the genetic basis of quantitative variation in complex traits[1]. Similarly, in social network analysis[2, 3], a precision matrix can be used to explore the correlation between users, and it is also widely used in economics and finance[4–6], health science[7], high-dimensional discriminant analysis[8], and complex data visualization[9].
Precision matrix estimation has attracted considerable research attention in recent years. The existing methods can be roughly divided into two classes: penalized likelihood methods and column-by-column estimation methods. The former class includes, for example, Ref. [10], which laid down a general framework for the penalized likelihood method, and Refs. [11–15], which used different penalties to select the zero elements in the precision matrix; these methods all impose a penalty term on the negative log-likelihood function to estimate the precision matrix. Their theoretical properties have been thoroughly studied in Refs. [16, 17]. The latter class includes, for instance, Refs. [8, 14, 18–21]. Such methods convert the problem of precision matrix estimation into nodewise or pairwise regression or optimization problems and then apply high-dimensional regularization techniques such as the Lasso or Dantzig selector. However, most of these existing methods are designed for clean data, while corrupted data are often encountered in various fields. Naively applying the aforementioned methods to corrupted data can lead to inconsistent and unstable estimates and thus to misleading conclusions. Therefore, it is urgent to develop an efficient method for large precision matrix estimation under measurement errors.
Various statistical methods have been proposed to mitigate the effects of measurement errors. Specifically, a series of works has addressed measurement errors in univariate response linear regression models, which can be traced back to Ref. [22], with extensions including Refs. [23, 24]. In recent years, great progress has also been made in high-dimensional errors-in-variables regression. For instance, Ref. [25] developed an l1-regularized likelihood approach to handle missing data by solving a negative log-likelihood optimization problem via the EM algorithm. Similarly, Ref. [26] developed a Lasso-type estimator by replacing the corrupted Gram matrix with an unbiased estimate constructed from the corrupted data. Furthermore, Ref. [27] proposed a Dantzig selector-type estimator based on the compensated matrix uncertainty method. However, after adjusting for corrupted data, the negative log-likelihood functions are usually nonconvex and may depend on some crucial hidden parameters. To address this difficulty, Ref. [28] proposed the convex conditioned Lasso (CoCoLasso) method, which replaces the possibly indefinite unbiased covariance matrix estimate with its nearest positive semidefinite matrix, thus enjoying stepwise convexity in high-dimensional measurement error regression.
In this work, motivated by these studies, we develop a new approach called convex conditioned innovative scalable efficient estimation (COCOISEE) for Gaussian graphical models under measurement errors by combining the strengths of the recently developed innovative scalable efficient estimation[21] and the nearest positive semidefinite matrix projection[28], thus enjoying stepwise convexity and scalability.
The rest of the paper is organized as follows. Section 2 presents the model setting and the new methodology. Theoretical properties, including estimation consistency, are established in Section 3. We provide simulation examples in Section 4. Section 5 discusses possible extensions of our method and future work. All technical details are relegated to the Appendix.
Notation. We briefly summarize the notation used throughout the paper. We use {\boldsymbol{x}}, X, and {\boldsymbol{X}} to denote vectors, scalars, and matrices, respectively. For any matrix {\boldsymbol{X}}=(X_{ij}), let \left\|{\boldsymbol{X}}\right\|_{\infty}={\rm max}_{i}\sum_{j}|X_{ij}| denote the matrix l_{\infty} norm, whereas \left \|{\boldsymbol{X}} \right \|_{\rm max}={\rm max}_{i, j}|X_{ij}| denotes the elementwise maximum norm. Similarly, for any vector {\boldsymbol{x}}=(X_1, ..., X_p)^{\rm T}, let ||{\boldsymbol{x}}||_q=\left( {\displaystyle \sum\limits_i|X_i|^q} \right)^{1/q} be the l_q norm of {\boldsymbol{x}}, and the l_{\infty} vector norm is ||{\boldsymbol{x}}||_{\infty}={\rm max}_{i}|X_{i}|. For any subsets A, B \subset \{1, ..., p\}, denote by {\boldsymbol{x}}_A the subvector of {\boldsymbol{x}} formed by its components with indices in A, by {\boldsymbol{x}}_{-A} the subvector of {\boldsymbol{x}} formed by its components with indices outside A, and by \boldsymbol \varOmega_{A, B}=(\omega_{ij})_{i\in A, j\in B} the submatrix of {\boldsymbol \varOmega} with rows in A and columns in B. Let (S_{l})_{1\leqslant l \leqslant L} be a partition of the index set \{1, ..., p\}. \lambda_{\rm min}(\cdot) and \lambda_{\rm max}(\cdot) denote the smallest and largest eigenvalues of a given symmetric matrix, respectively. We use \varDelta to denote a fixed sequence going to zero.
2. Gaussian graphical model under measurement errors
2.1 Model setting
Consider an undirected Gaussian graphical model G=(V, E) for a p-dimensional Gaussian random vector {\boldsymbol{x}}=(X_1, ..., X_p)^{\rm T}\sim N(\boldsymbol{\mu}, \boldsymbol{\varSigma}),
where G is an undirected graph associated with the vector {\boldsymbol{x}}, V=\{X_1, ..., X_p\} is the node set, E=(e_{ij}) is the edge matrix with each entry e_{ij} representing the edge between X_i and X_j, \boldsymbol{\mu}=(\mu_i)_{1\leqslant i \leqslant p} is the mean vector, and \boldsymbol{\varSigma}=(\sigma_{ij}) is the covariance matrix. Assume that \{{\boldsymbol{x}}_{i}\}_{1\leqslant i \leqslant n} is an independent and identically distributed (\rm i.i.d.) sample from this Gaussian distribution. Without loss of generality, suppose that the mean vector \boldsymbol{\mu}=\bf{0} throughout the paper. Denote by {\boldsymbol \varOmega} =(\omega_{ij}) the precision matrix, that is, the inverse \boldsymbol{\varSigma}^{-1} of the covariance matrix \boldsymbol{\varSigma}. It is well known in Gaussian graphical model theory that an edge (i, j) exists (or e_{ij}\neq 0) between X_i and X_j if and only if the corresponding entry \omega_{ij} of the precision matrix is nonzero. This characterization shows that estimating the Gaussian graphical model G is equivalent to estimating the support {\rm supp}({\boldsymbol \varOmega})=\{(i, j): \omega_{ij}\neq 0, i\neq j\}.
To motivate our new method, we consider a linear transformation of {\boldsymbol{x}} by the precision matrix, similar to Ref. [21]. Specifically, define the innovated vector \tilde{{\boldsymbol{x}}}={\boldsymbol \varOmega}{\boldsymbol{x}}, which follows the distribution N(\bf{0}, {\boldsymbol \varOmega}).
Therefore, since the covariance matrix of \tilde{{\boldsymbol{x}}} is exactly {\boldsymbol \varOmega}, once an estimate of \tilde{{\boldsymbol{x}}} is obtained, the precision matrix {\boldsymbol \varOmega} can be recovered by estimating the covariance matrix of \tilde{{\boldsymbol{x}}}.
By the definition of \tilde{{\boldsymbol{x}}}, we can write the subvector \tilde{{\boldsymbol{x}}}_{S} in the following form:
where \varTheta_{S}=- {\boldsymbol \varOmega} _{-S, S} {\boldsymbol \varOmega} _{S, S}^{-1} is a (p-|S|)\times |S| dimensional matrix of regression coefficients, and {\boldsymbol \eta}_{S}={\boldsymbol{x}}_{S}+ {\boldsymbol \varOmega} _{S, S}^{-1} {\boldsymbol \varOmega} _{S, -S}{\boldsymbol{x}}_{-S} is the vector of model errors, which follows the multivariate Gaussian distribution N(\bf{0}, {\boldsymbol \varOmega} _{S, S}^{-1}) and is independent of {\boldsymbol{x}}_{-S}. Thus, the two unknown quantities on the right-hand side of Eq. (2) are linked and can both be estimated efficiently by high-dimensional linear regression techniques such as the Lasso[29].
The representation of the subvector {\boldsymbol{x}}_{S} shows that the unknown subvector \tilde{{\boldsymbol{x}}}_{S} can be estimated by regression techniques. To see this, let \hat{{\boldsymbol \eta}}_{S} be the residual vector obtained by fitting model (4) using the Lasso. The unknown matrix {\boldsymbol \varOmega} _{S, S} can then be estimated as the inverse of the sample covariance matrix of the residual vector \hat{{\boldsymbol \eta}}_{S}. Consequently, we can estimate the subvector \tilde{{\boldsymbol{x}}}_{S} in Eq. (2) as \hat{{\boldsymbol{x}}}_{S}=\hat{ {\boldsymbol \varOmega} }_{S, S}\hat{{\boldsymbol \eta}}_{S}.
The ISEE[21] repeats the above procedure for each S_{l} with 1\leqslant l \leqslant L to obtain the estimated subvectors \hat{{\boldsymbol{x}}}_{S_{l}}'s, and then stacks all these subvectors together to form an estimate \hat{{\boldsymbol{x}}} of the oracle innovated vector \tilde{{\boldsymbol{x}}}. Thus, the problem of estimating the precision matrix based on the original vector {\boldsymbol{x}} reduces to that of estimating the covariance matrix based on the estimated innovated vector \hat{{\boldsymbol{x}}}.
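To make this block-wise construction concrete, the following minimal sketch implements the clean-data ISEE step: each block S is regressed on the remaining variables by node-wise Lasso, the residuals estimate {\boldsymbol \eta}_{S}, the inverse residual covariance estimates {\boldsymbol \varOmega}_{S, S}, and their product gives the corresponding block of the innovated data matrix. The block size, the penalty level lam, and the use of the plain Lasso (instead of the scaled Lasso employed in Ref. [21]) are illustrative assumptions, not the exact choices of the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def isee_innovated_data(X, block_size=2, lam=0.1):
    """Sketch of the clean-data ISEE step: estimate the innovated data matrix X_tilde = X @ Omega."""
    n, p = X.shape
    X_tilde = np.zeros((n, p))
    blocks = [list(range(start, min(start + block_size, p)))
              for start in range(0, p, block_size)]
    for S in blocks:
        not_S = [j for j in range(p) if j not in S]
        # Residuals of node-wise Lasso regressions of X_S on X_{-S} estimate eta_S
        E_S = np.column_stack([
            X[:, j] - Lasso(alpha=lam, fit_intercept=False)
                        .fit(X[:, not_S], X[:, j])
                        .predict(X[:, not_S])
            for j in S
        ])
        Omega_SS = np.linalg.inv(E_S.T @ E_S / n)  # inverse residual covariance estimates Omega_{S,S}
        X_tilde[:, S] = E_S @ Omega_SS             # estimated block of the innovated data matrix
    return X_tilde

# The initial precision matrix estimate is then the sample covariance of X_tilde,
# X_tilde.T @ X_tilde / n, followed by entrywise thresholding.
```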
However, in many practical applications, the collected data may be contaminated by measurement errors that cannot be directly observed, resulting in inaccurate Gaussian graphical model estimation. In this paper, we consider two kinds of measurement errors associated with the data matrix {\boldsymbol{X}}, as described below:
● Additive error. We assume that the observed elements of the data matrix \boldsymbol{Z} are affected by additive measurement errors z_{ij}=x_{ij}+a_{ij}, or in matrix form \boldsymbol{Z}= \boldsymbol{X} + \boldsymbol{A}, where {\boldsymbol{A}}= (a_{ij})_{n\times p} is the measurement error matrix. We assume that the rows of \boldsymbol{A} are \rm i.i.d. with mean \bf{0} and finite covariance matrix \varSigma^{A}.
● Multiplicative error. We assume that the observed elements of the data matrix {\boldsymbol{Z}} are affected by multiplicative measurement errors z_{ij}= x_{ij} m_{ij}, or in matrix form \boldsymbol{Z}= \boldsymbol{X} \odot \boldsymbol{M}, where \odot denotes the Hadamard product and {\boldsymbol{M}} = (m_{ij})_{n\times p} is the measurement error matrix. We also assume that the rows of {\boldsymbol{M}} are \rm i.i.d. with mean \boldsymbol{\mu^{M}} and finite covariance matrix \varSigma^{M}. In particular, missing data can be regarded as a special case of multiplicative measurement error with m_{ij}=I(x_{ij}\text{ is observed}), where I(\cdot) denotes the indicator function.
2.2 Scalable estimation by COCOISEE
Denote the data matrix by {\boldsymbol{X}}=({\boldsymbol{x}}_{1}, ..., {\boldsymbol{x}}_{n})^{\rm T}\in {\rm R}^{n\times p}. Then, the transformed data matrix is defined as \tilde{{\boldsymbol{X}}}={\boldsymbol{X}} {\boldsymbol \varOmega}. Thus, Eq. (4) can be rewritten as
where E_{S} is an error matrix with rows being \rm i.i.d. copies from {\boldsymbol \eta}_{S}^{\rm T}. Then, \tilde{{\boldsymbol{X}}}_{S} can be written as
The representation in Eq. (6) provides the basis for the estimation of matrix \tilde{{\boldsymbol{X}}}.
To fit the Gaussian linear regression model (5), we suggest univariate regression methods. Specifically, for each node i in the index set S , we consider the univariate linear regression model for {\boldsymbol{X}}_{i} :
In general, some high-dimensional regression methods can be used to fit Eq. (7), such as SCAD[30], the Lasso[29], the elastic net[31], the adaptive Lasso[32], and the Dantzig selector[33]. However, in the errors-in-variables setting we observe the data matrix {\boldsymbol{Z}} instead of {\boldsymbol{X}}. Thus, directly applying the Lasso to this model by minimizing
is often erroneous, where \varSigma_{-S, -S}=\dfrac{1}{n}{\boldsymbol{Z}}_{-S}^{\rm T}{\boldsymbol{Z}}_{-S} and \varSigma_{-S, i}= \dfrac{1}{n}{\boldsymbol{Z}}_{-S}^{\rm T}{\boldsymbol{Z}}_{i}. Based on the corrupted data matrix {\boldsymbol{Z}}, Ref. [28] suggested constructing unbiased surrogates \hat{\varSigma}_{-S, -S} and \tilde{\varSigma}_{-S, i} for the unobservable clean-data quantities \dfrac{1}{n}{\boldsymbol{X}}_{-S}^{\rm T}{\boldsymbol{X}}_{-S} and \dfrac{1}{n}{\boldsymbol{X}}_{-S}^{\rm T}{\boldsymbol{X}}_{i} to mitigate the effects of measurement errors. For the additive error setting, the unbiased surrogates are defined as
where \oslash denotes elementwise division of vectors and matrices. Since \varSigma^{A} or (\varSigma^{M}, \mu^{M}) can be obtained in practice through repeated measurements or domain knowledge, we assume throughout that they are known.
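As a concrete illustration, the sketch below forms the unbiased surrogates of the full clean-data Gram matrix under the two error models, assuming \varSigma^{A}, \mu^{M}, and \varSigma^{M} are known; the corrected blocks \hat{\varSigma}_{-S, -S} and \tilde{\varSigma}_{-S, i} are then just submatrices of the returned matrix. The function names are ours.

```python
import numpy as np

def surrogate_gram_additive(Z, Sigma_A):
    """Unbiased surrogate of (1/n) X^T X under additive errors Z = X + A."""
    n = Z.shape[0]
    return Z.T @ Z / n - Sigma_A

def surrogate_gram_multiplicative(Z, mu_M, Sigma_M):
    """Unbiased surrogate of (1/n) X^T X under multiplicative errors Z = X * M."""
    n = Z.shape[0]
    # E[m_{ij} m_{ik}] = Sigma^M_{jk} + mu^M_j mu^M_k, so elementwise division removes the bias
    scale = Sigma_M + np.outer(mu_M, mu_M)
    return (Z.T @ Z / n) / scale
```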
However, the estimator \hat{\varSigma} is usually not positive semidefinite in high-dimensional models, so the resulting optimization problems may no longer be convex. To overcome this problem, we follow the idea of Ref. [28] and obtain the sparse regression coefficients \theta_{i} by minimizing
where \tilde{\varSigma}_{-S, -S} is the principal submatrix of \tilde{\varSigma}, and \tilde{\varSigma}=(\hat{\varSigma})_{+}=\mathop{\rm argmin}\limits_{\varSigma \geqslant 0}||\varSigma-\hat{\varSigma}||_{\rm max} is the nearest positive semidefinite matrix to \hat{\varSigma} in the elementwise maximum norm. From the definition of \tilde{\varSigma}, it can be computed efficiently by the alternating direction method of multipliers (ADMM), and we have
Eq. (11) follows from the triangle inequality and the definition of \tilde{\varSigma}, and it ensures that \tilde{\varSigma} approximates \varSigma as well as the initial surrogate \hat{\varSigma} does.
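A minimal sketch of this projection, under the assumption that a plain ADMM splitting suffices, is given below: one block is a projection onto the positive semidefinite cone, the other is the proximal step of the elementwise maximum norm (computed through a projection onto an l_1 ball), and the penalty parameter rho and the iteration count are illustrative choices. The PSD iterate is returned so that the subsequent optimization stays convex.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of a vector onto the l1 ball of the given radius."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                      # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.max(np.where(u - (css - radius) / k > 0)[0]) + 1
    theta = (css[rho - 1] - radius) / rho
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_max_norm(V, t):
    """Proximal operator of t * ||.||_max via the Moreau decomposition."""
    v = V.ravel()
    return (v - project_l1_ball(v, t)).reshape(V.shape)

def nearest_psd_max_norm(Sigma_hat, rho=1.0, n_iter=200):
    """ADMM sketch for argmin_{Sigma >= 0} ||Sigma - Sigma_hat||_max."""
    p = Sigma_hat.shape[0]
    Theta = Sigma_hat.copy()
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Sigma-update: project Theta - U onto the positive semidefinite cone
        W = ((Theta - U) + (Theta - U).T) / 2
        eigval, eigvec = np.linalg.eigh(W)
        Sigma = (eigvec * np.maximum(eigval, 0.0)) @ eigvec.T
        # Theta-update: proximal step of (1/rho) * ||. - Sigma_hat||_max
        Theta = Sigma_hat + prox_max_norm(Sigma + U - Sigma_hat, 1.0 / rho)
        # Dual update
        U = U + Sigma - Theta
    return (Sigma + Sigma.T) / 2
```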
Based on the regression step, for each node i in the index set S, we cannot directly plug {\boldsymbol{Z}} in place of {\boldsymbol{X}} into Eq. (5) to calculate the residual matrix \hat{E}_{S}. Instead, E_{S} serves only as an intermediate quantity in calculating \hat {\boldsymbol \varOmega} _{S, S}. The estimator \hat {\boldsymbol \varOmega} _{S, S} can then be written as
where \hat{\varTheta}_{S}=(\hat{\theta}_{i})_{i\in S}. These observations suggest a simple plug-in estimator \hat{{\boldsymbol{X}}}_{S}=\hat{E}_{S}\hat{ {\boldsymbol \varOmega} }_{S, S} for the unobservable submatrix \tilde{{\boldsymbol{X}}}_{S} in Eq. (6). Since the error matrix E_{S} is unobserved, \hat{{\boldsymbol{X}}}_{S} cannot be calculated directly either. Similarly, for each 1\leqslant k \neq m \leqslant L, we use \hat{{\boldsymbol{X}}}_{S_{k}} and \hat{{\boldsymbol{X}}}_{S_{m}} only as intermediate variables in calculating \hat{ {\boldsymbol \varOmega} }_{S_{m}, S_{k}}. The estimator \hat{ {\boldsymbol \varOmega} }_{S_{m}, S_{k}} can then be written as
where T_{\tau}({\boldsymbol{X}})=(x_{ij}I_{|x_{ij}|\geqslant \tau}) denotes the matrix {\boldsymbol{X}} thresholded at \tau. Thus, the estimated set of links of the graphical structure is {\rm supp}(\hat{ {\boldsymbol \varOmega} }_{g}^{\rm COCOISEE}). The choice of the threshold \tau in Eq. (15) is important for practical implementation, and we select it by cross-validation. Specifically, we randomly split the sample of n rows of the data matrix {\boldsymbol{Z}} into two subsamples of sizes n_1 and n_2, and repeat this R times. Denote by \hat{ {\boldsymbol \varOmega} }_{\rm ini}^{\rm COCOISEE}(n_{1}, i) and \hat{ {\boldsymbol \varOmega} }_{\rm ini}^{\rm COCOISEE}(n_{2}, i) the precision matrix estimates defined in Eq. (14) based on these two subsamples, respectively, for the ith split. The threshold \tau is then chosen to minimize
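For concreteness, a sketch of this random-split selection of \tau is given below. The discrepancy used here, the squared Frobenius norm between the thresholded estimate from one subsample and the un-thresholded estimate from the other, is an assumed stand-in in the spirit of Refs. [40, 41]; the exact criterion is the one in Eq. (16). The estimator argument and all names are ours.

```python
import numpy as np

def hard_threshold(Omega, tau):
    """Entrywise hard-thresholding T_tau, keeping the diagonal intact."""
    out = np.where(np.abs(Omega) >= tau, Omega, 0.0)
    np.fill_diagonal(out, np.diag(Omega))
    return out

def choose_tau(Z, estimator, taus, R=5, frac=0.9, seed=0):
    """Random-split cross-validation for the threshold tau.

    `estimator` is any function mapping a data matrix to an initial
    (un-thresholded) precision matrix estimate, e.g. the initial COCOISEE estimator.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    scores = np.zeros(len(taus))
    for _ in range(R):
        idx = rng.permutation(n)
        n1 = int(frac * n)
        Omega1 = estimator(Z[idx[:n1]])   # estimate on the larger subsample
        Omega2 = estimator(Z[idx[n1:]])   # estimate on the held-out subsample
        for t, tau in enumerate(taus):
            scores[t] += np.linalg.norm(hard_threshold(Omega1, tau) - Omega2, 'fro') ** 2
    return taus[int(np.argmin(scores))]
```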
3. Theoretical properties
In this section, we present several technical conditions and then analyze the theoretical properties of COCOISEE.
3.1 Technical conditions
Condition 1. (Closeness condition) Assume that the distributions of \hat{\varSigma}_{-S, -S} and \tilde{\varSigma}_{-S, j} are identified by a set of parameters \theta. Then there exist universal constants C and c, and positive functions \zeta and \varepsilon_0 depending on \theta and \sigma^2, such that for any \varepsilon\leqslant \varepsilon_0, \hat{\varSigma}_{-S, -S} and \tilde{\varSigma}_{-S, j} satisfy the following probability statements:
\begin{split} &{\rm Pr}(|(\hat{\varSigma}_{-S, -S})_{ij}-(\varSigma_{-S, -S})_{ij}|\geqslant \varepsilon)\leqslant C {\rm exp}(-cn\varepsilon^{2}\zeta^{-1}) ,\quad \\&\forall i, j\in(\{1, ..., p\}-S). \end{split}
Condition 2. (Restricted eigenvalue condition) The restricted eigenvalue of the Gram matrix \varSigma=\dfrac{1}{n}{\boldsymbol{X}}^{\rm T}{\boldsymbol{X}} satisfies:
where \kappa=\{1, ..., K\} is the true support set of the regression coefficient vector \theta^{*} .
Condition 4. The entries of measurement error matrices A and M are all independent and identically distributed sub-Gaussian random variables.
Condition 5. The random vectors {\boldsymbol \eta}_{i} obey a zero-mean sub-Gaussian distribution with a finite variance parameter, i.e., \sigma_{i}^{2}\leqslant C.
Condition 6. The Gaussian graphical model we focus on satisfies the following conditions.
Condition 1 ensures that the surrogates \hat{\varSigma}_{-S, -S} (and hence \tilde{\varSigma}_{-S, -S}) and \hat{\varSigma}_{-S, j} are close to \varSigma_{-S, -S} and \varSigma_{-S, j}, respectively, in terms of the elementwise maximum norm. Condition 2 is the restricted eigenvalue condition proposed in Ref. [34] and widely used in the Lasso literature. It imposes a lower bound on the eigenvalues of the Gram matrix \varSigma to constrain the correlations among relatively small numbers of predictors in the design matrix {\boldsymbol{X}}. Condition 3 is the irrepresentable condition proposed in Refs. [35, 36], which is used to establish model selection consistency of the Lasso. Conditions 1–3 are needed to prove Lemma A.1 introduced in the Appendix.
Condition 4 imposes a mild assumption on the measurement error matrices, since sub-Gaussian random variables form a natural class to which many properties of Gaussians extend[37]. Condition 5 is standard and has been widely used in high-dimensional linear regression models. Condition 6 guarantees that the number of links for each node is bounded from above by K and that the precision matrix has a bounded spectrum[21].
3.2 Main results
Theorem 1. Assume that Conditions 1–6 hold and \lambda K = o(1). Then, with probability 1- \varDelta tending to one, the initial COCOISEE estimator \hat{ {\boldsymbol \varOmega} }_{\rm ini}^{\rm COCOISEE} in Eq. (14) satisfies
Theorem 1 establishes the entrywise infinity norm estimation bound for the initial COCOISEE estimator. From the proof of Theorem 1 in the Appendix, we see that the rate of convergence for the initial COCOISEE estimator is the maximum of the two components \lambda^{2}K and \lambda for both block-diagonal and off-block-diagonal entries of \hat{ {\boldsymbol \varOmega} }_{\rm ini}^{\rm COCOISEE}. Since we assume that \lambda K = o(1), the term O(\lambda) dominates O(\lambda^2 K), so the rate of convergence for the initial COCOISEE estimator is O(\lambda).
Theorem 2. Assume that the conditions of Theorem 1 hold and \omega_{0}={\rm min}\{|\omega_{jk}|:(j, k)\in {\rm supp} ( {\boldsymbol \varOmega} )\}\geqslant \omega_{0}^{*}=C\lambda for some sufficiently large constant C>0. Then, with probability 1-\varDelta tending to one, for any \tau\in[c\omega_{0}^{*}, \omega_{0}-c\omega_{0}^{*}] with c\in (0, {1}/{2}), we have
Theorem 2 shows that the sparse precision matrix estimator \hat{ {\boldsymbol \varOmega} }_{g}^{\rm COCOISEE} enjoys good asymptotic properties.
4. Simulation studies
In this section, we investigate the finite-sample performance of the proposed COCOISEE method through simulations. To illustrate the impact of ignoring measurement errors, we compared it with two methods designed for clean data: Glasso[12] and ISEE[21]. The three methods were implemented as follows. Glasso was implemented with the R package glasso, with its tuning parameter chosen by five-fold cross-validation. ISEE[21] selected the tuning parameters in the scaled Lasso[38] following the suggestion of Ref. [39] and adapted the cross-validation method proposed in Refs. [40, 41] for the threshold parameters. For the implementation of our COCOISEE approach, we used five-fold corrected cross-validation, similar to Ref. [29]. Specifically, we used 90 \% of the sample to calculate \hat{ {\boldsymbol \varOmega} }_{\rm ini}^{\rm COCOISEE}(n_{1}, i) and the remaining 10 \% to calculate \hat{ {\boldsymbol \varOmega} }_{\rm ini}^{\rm COCOISEE}(n_{2}, i). Then, we chose \tau from 20 equally spaced values by minimizing criterion (16) with the number of random splits set to R=5.
We generated 50 data sets. For each data set, we generated the precision matrix {\boldsymbol \varOmega} in two steps. First, we produced a band matrix {\boldsymbol \varOmega} _{0} with diagonal entries equal to one, {\boldsymbol \varOmega} _{0}(i, i+1)= {\boldsymbol \varOmega} _{0}(i+1, i)=0.5 for i=1, ..., p-1, and all other entries equal to zero. Second, we randomly permuted the rows and columns of {\boldsymbol \varOmega} _{0} to obtain the precision matrix {\boldsymbol \varOmega}. Thus, in general, the final precision matrix {\boldsymbol \varOmega} no longer has the band structure. We next sampled the rows of the n\times p data matrix {\boldsymbol{X}} as i.i.d. vectors drawn from the multivariate Gaussian distribution N(\bf{0}, {\boldsymbol \varOmega} ^{-1}). Then, based on the different types of measurement errors, the corrupted covariate matrices {\boldsymbol{Z}} were generated as follows.
● Additive errors case. The observed covariates are \boldsymbol{Z}={\boldsymbol{X}}+{\boldsymbol A}, where the rows of {\boldsymbol{A}} are i.i.d. vectors drawn from N(\bf{0}, \tau^{2}\boldsymbol{I}_p) with \tau=0.2.
● Multiplicative errors case. The observed covariates are \boldsymbol{Z}={\boldsymbol X}\odot {\boldsymbol M}, where the elements of \boldsymbol{M} follow a log-normal distribution, meaning that {\rm log} (m_{ij})’s are i.i.d. variables from N(0, \tau^{2}) with \tau=0.2 .
● Missing data case. The observed covariates are defined as z_{ij}=x_{ij}m_{ij} , where m_{ij}=I(x_{ij}{\text{ is not missing}}) , so that the covariates are missing at random with probability 0.1.
Due to the high computational cost, we consider settings with n=100 and dimensionality p varying in \{50, 100, 150, 200\}.
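A sketch of this data-generating process is given below; the function and parameter names are ours, and the settings mirror the description above.

```python
import numpy as np

def simulate_corrupted_data(n=100, p=50, tau=0.2, error="additive", miss_prob=0.1, seed=0):
    """Band precision matrix, random permutation, Gaussian data, then contamination."""
    rng = np.random.default_rng(seed)
    # Step 1: band precision matrix with unit diagonal and 0.5 on the first off-diagonals
    Omega0 = np.eye(p) + 0.5 * (np.eye(p, k=1) + np.eye(p, k=-1))
    # Step 2: random permutation of rows and columns
    perm = rng.permutation(p)
    Omega = Omega0[np.ix_(perm, perm)]
    # Clean data: rows drawn i.i.d. from N(0, Omega^{-1})
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)
    if error == "additive":
        Z = X + rng.normal(0.0, tau, size=(n, p))
    elif error == "multiplicative":
        Z = X * np.exp(rng.normal(0.0, tau, size=(n, p)))   # log-normal multipliers
    else:  # missing data: entries observed with probability 1 - miss_prob
        Z = X * rng.binomial(1, 1.0 - miss_prob, size=(n, p))
    return Z, Omega
```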
To compare the aforementioned methods, we consider the same performance measures as suggested in Ref. [21]. The first two measures are the true positive rate (TPR) and the false positive rate (FPR), defined as
Tables 1–3 summarize the simulation results of these three performance measures for the additive error, multiplicative error, and missing data cases. In terms of TPR, FPR, and NEE, it is clear that COCOISEE achieves the best NEE, while its TPR and FPR also perform very well.
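For reference, the sketch below computes TPR and FPR under their standard support-recovery definitions over off-diagonal entries, which we assume here since the displayed formulas are omitted above; the estimation error measure NEE is not reproduced.

```python
import numpy as np

def tpr_fpr(Omega_hat, Omega_true, tol=1e-8):
    """Support-recovery TPR and FPR over off-diagonal entries (standard definitions)."""
    off = ~np.eye(Omega_true.shape[0], dtype=bool)
    est = np.abs(Omega_hat[off]) > tol
    true = np.abs(Omega_true[off]) > tol
    tpr = np.sum(est & true) / max(np.sum(true), 1)
    fpr = np.sum(est & ~true) / max(np.sum(~true), 1)
    return tpr, fpr
```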
In this paper, we have introduced a new methodology, COCOISEE, to achieve scalable and interpretable estimation for Gaussian graphical models under both additive and multiplicative measurement errors. It takes full advantage of the recently developed innovative scalable efficient estimation and the nearest positive semidefinite matrix projection, thus enjoying stepwise convexity and scalability in Gaussian graphical model estimation. The suggested method is well suited to parallel, distributed, and cloud computing and has been shown to enjoy appealing theoretical properties. Both the established theoretical properties and the numerical performance demonstrate that the proposed method enjoys good estimation and recovery accuracy as well as high scalability under both additive and multiplicative measurement errors. Similar theoretical conclusions can also be extended to sub-Gaussian settings. It would be very interesting to study several extensions of COCOISEE to more general model settings, such as time series models, generalized linear models, and large nonparanormal graphical models, which are beyond the scope of the current paper and demand future studies.
There are three main contributions in this paper. First, the proposed COCOISEE method is scalable and highly efficient, because the main computational cost comes from estimating the regression coefficients under measurement errors, while the remaining steps involve only matrix operations, which are computationally efficient. Second, we use a recently developed positive semidefinite matrix projection technique to mitigate the effects of measurement errors when recovering high-dimensional coefficient matrices from column-by-column regressions; therefore, COCOISEE enjoys stepwise convexity and computational stability. Finally, we provide comprehensive theoretical guarantees for the proposed method by establishing the consistency of the estimator. Numerical results demonstrate the effectiveness of the proposed method.
Conflict of interest
The author declares that he has no conflict of interest.
[1] Baselmans B M, Jansen R, Ip H F, et al. Multivariate genome-wide analyses of the well-being spectrum. Nature Genetics, 2019, 51 (3): 445–451. DOI: 10.1038/s41588-018-0320-8
[2] Yang K, Lee L F. Identification and QML estimation of multivariate and simultaneous equations spatial autoregressive models. Journal of Econometrics, 2017, 196 (1): 196–214. DOI: 10.1016/j.jeconom.2016.04.019
[3] Zhu X, Huang D, Pan R, et al. Multivariate spatial autoregressive model for large scale social networks. Journal of Econometrics, 2020, 215 (2): 591–606. DOI: 10.1016/j.jeconom.2018.11.018
[4] Han F, Liu H. Optimal rates of convergence for latent generalized correlation matrix estimation in transelliptical distribution. arXiv: 1305.6916, 2013.
[5] Rubinstein M. Markowitz’s “portfolio selection”: A fifty-year retrospective. The Journal of Finance, 2002, 57 (3): 1041–1045. DOI: 10.1111/1540-6261.00453
[6] Wegkamp M, Zhao Y. Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas. Bernoulli, 2016, 22 (2): 1184–1226. DOI: 10.3150/14-BEJ690
[7] Fan J, Han F, Liu H. Challenges of big data analysis. National Science Review, 2014, 1 (2): 293–314. DOI: 10.1093/nsr/nwt032
[8] Cai T, Liu W, Luo X. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 2011, 106 (494): 594–607. DOI: 10.1198/jasa.2011.tm10155
[9] Tokuda T, Goodrich B, van Mechelen I, et al. Visualizing distributions of covariance matrices. New York: Columbia University, 2011.
[10] Fan J, Peng H. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 2004, 32 (3): 928–961. DOI: 10.1214/009053604000000256
[11] Yuan M, Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika, 2007, 94 (1): 19–35. DOI: 10.1093/biomet/asm018
[12] Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, 2008, 9 (3): 432–441. DOI: 10.1093/biostatistics/kxm045
[13] Banerjee O, El Ghaoui L, d’Aspremont A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 2008, 9: 485–516. DOI: 10.5555/1390681.1390696
[14] Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 2006, 34 (3): 1436–1462. DOI: 10.1214/009053606000000281
[15] Wille A, Zimmermann P, Vranova E, et al. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 2004, 5 (11): R92. DOI: 10.1186/gb-2004-5-11-r92
[16] Rothman A J, Bickel P J, Levina E, et al. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2008, 2: 494–515. DOI: 10.1214/08-EJS176
[17] Lam C, Fan J. Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 2009, 37 (6B): 4254–4278. DOI: 10.1214/09-AOS720
[18] Yuan M. High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research, 2010, 11: 2261–2286. DOI: 10.5555/1756006.1859930
[19] Liu W, Luo X. High-dimensional sparse precision matrix estimation via sparse column inverse operator. arXiv: 1203.3896, 2012.
[20] Sun T, Zhang C H. Sparse matrix inversion with scaled Lasso. The Journal of Machine Learning Research, 2013, 14 (1): 3385–3418. DOI: 10.5555/2567709.2567771
[21] Fan Y, Lv J. Innovated scalable efficient estimation in ultra-large Gaussian graphical models. The Annals of Statistics, 2016, 44 (5): 2098–2126. DOI: 10.1214/15-AOS1416
[22] Bickel P, Ritov Y. Efficient estimation in the errors in variables model. The Annals of Statistics, 1987, 15 (2): 513–540. DOI: 10.1214/aos/1176350358
[23] Ma Y, Li R. Variable selection in measurement error models. Bernoulli, 2010, 16 (1): 274–300. DOI: 10.3150/09-bej205
[24] Liang H, Li R. Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association, 2009, 104 (485): 234–248. DOI: 10.1198/jasa.2009.0127
[25] Städler N, Bühlmann P. Missing values: Sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing, 2012, 22 (1): 219–235. DOI: 10.1007/s11222-010-9219-7
[26] Loh P L, Wainwright M J. High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. The Annals of Statistics, 2012, 40 (3): 1637–1664. DOI: 10.1214/12-AOS1018
[27] Belloni A, Rosenbaum M, Tsybakov A B. Linear and conic programming estimators in high dimensional errors-in-variables models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2017, 79 (3): 939–956. DOI: 10.1111/rssb.12196
[28] Datta A, Zou H. CoCoLasso for high-dimensional error-in-variables regression. The Annals of Statistics, 2017, 45 (6): 2400–2426. DOI: 10.1214/16-AOS1527
[29] Tibshirani R. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 1996, 58 (1): 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x
[30] Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 2001, 96 (456): 1348–1360. DOI: 10.1198/016214501753382273
[31] Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2005, 67 (2): 301–320. DOI: 10.1111/j.1467-9868.2005.00503.x
[32] Zou H. The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 2006, 101 (476): 1418–1429. DOI: 10.1198/016214506000000735
[33] Candes E, Tao T. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2007, 35 (6): 2313–2351. DOI: 10.1214/009053606000001523
[34] Bickel P J, Ritov Y, Tsybakov A B. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 2009, 37 (4): 1705–1732. DOI: 10.1214/08-AOS620
[35] Zhao P, Yu B. On model selection consistency of Lasso. The Journal of Machine Learning Research, 2006, 7: 2541–2563. DOI: 10.5555/1248547.1248637
[36] Wainwright M J. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 2009, 55 (5): 2183–2202. DOI: 10.1109/TIT.2009.2016018
[37] Buldygin V V, Kozachenko Yu V. Metric Characterization of Random Variables and Random Processes. Providence, RI: American Mathematical Society, 2000.
[38] Sun T, Zhang C H. Scaled sparse linear regression. Biometrika, 2012, 99 (4): 879–898. DOI: 10.1093/biomet/ass043
[39] Ren Z, Sun T, Zhang C H, et al. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. The Annals of Statistics, 2015, 43 (3): 991–1026. DOI: 10.1214/14-AOS1286
[40] Bickel P J, Levina E. Regularized estimation of large covariance matrices. The Annals of Statistics, 2008, 36 (1): 199–227. DOI: 10.1214/009053607000000758
[41] Bickel P J, Levina E. Covariance regularization by thresholding. The Annals of Statistics, 2008, 36 (6): 2577–2604. DOI: 10.1214/08-AOS600