Wenjie Gao is currently a master's student at the School of Management, University of Science and Technology of China (USTC). He received his B.S. degree from USTC in 2019. His research interests focus on high-dimensional variable selection and inference.
Jie Wu is currently a Ph.D. student at the School of Management, University of Science and Technology of China. She received her B.S. degree from Anhui University of Technology in 2017. Her research mainly focuses on high-dimensional variable selection and classification.
Graphical Abstract
The method, theory, and simulation studies of simultaneous inference for a high-dimensional precision matrix.
Abstract
Gaussian graphical models have been widely used for network data analysis. Although various methods exist for estimating the parameters, simultaneous inference is essential for graphical models. In this study, we propose a bootstrap procedure to conduct simultaneous inference for Gaussian graphical models. The simultaneous inference procedure is applied to large-scale graphical models and allows the dimension of the parameter vector of interest to exceed the sample size. We prove that the simultaneous test achieves a pre-set significance level asymptotically. Further simulation studies demonstrate the effectiveness of the proposed methods.
Public Summary
We propose a bootstrap procedure to conduct simultaneous inference for Gaussian graphical models.
The procedure is applied to large-scale graphical models and allows the dimension of the parameter vector of interest to exceed the sample size.
Both theoretical and simulation results verify the feasibility of our method.
In the era of rapid development of information technology, vast amounts of information are being introduced for a range of applications, including modern healthcare, online marketing, and quantitative finance. Graphical models have been widely used in the analysis of networks comprising many individuals. In Gaussian graphical models, the conditional independence of two variables is equivalent to the corresponding element of the precision matrix (inverse covariance matrix) being zero[1]. Gaussian graphical models can therefore accurately describe and estimate, for example, functional brain networks. Comparing the functional connectivity of subjects in two populations then calls for a comparison of the estimated Gaussian graphical models. Belilovsky et al.[2] proposed an approach to identify differences between two such models, which are known to have similar structures.
As Gaussian graphical models find many applications, numerous studies have estimated the precision matrices of such models. A widely used method for estimating the precision matrix is to maximize the penalized Gaussian likelihood or minimize the penalized empirical risk[3–5]. Penalized regression and Dantzig selector-type optimization methods have also attracted interest[6–8]. The above methods have the advantages of rationality and theory, but they are computationally unsatisfactory when the dimension p is large. The innovated scalable efficient estimation (ISEE)[9] combines the advantages of high-dimensional sparse modeling and large covariance matrix estimation, which alleviates the computational problem for large-scale graphical models to a certain extent. The aforementioned methods mainly focus on the point estimation of the precision matrix and do not characterize the uncertainty of the estimated entries.
In recent years, several studies have focused on de-biasing penalized estimators by inverting the Karush-Kuhn-Tucker (KKT) condition[10], so that asymptotic normal distributions for linear and generalized linear models can be obtained. The de-biasing method has also been applied to Gaussian graphical model estimators. For example, Janková and van de Geer[11] proposed a de-sparsified graphical lasso estimator by inverting the KKT condition based on a penalized empirical risk estimator, and conducted a similar study in Ref. [12]. Recently, Zhou et al.[13] adopted a similar approach to achieve asymptotic normality for the elements of the de-sparsified ISEE estimator. A limitation of these inference methods for graphical models is that only one element of the precision matrix can be tested at a time, whereas researchers often care about the connection status of an entire block. Janková and van de Geer[12] mentioned the problem of support recovery for multiple variables, but it was not discussed in depth. Thus, it is necessary to develop new methods for simultaneous inference on the precision matrix. Several studies have addressed simultaneous inference for high-dimensional linear models based on de-sparsified estimators[14]. We consider a similar method to conduct simultaneous inference for Gaussian graphical models.
In this study, we propose a bootstrap procedure to conduct simultaneous inference for Gaussian graphical models based on the ISEE[9] and its de-sparsified estimator[13]. We overcome the limitation that the de-biased estimator can only make inference on the precision matrix one entry at a time. This procedure naturally constructs simultaneous confidence intervals, and support recovery is carried out on the entire graph. In theory, the simultaneous testing procedure asymptotically achieves a pre-specified significance level, and the test set can even cover all elements of the precision matrix. Moreover, it inherits the computational advantages of ISEE, which can deal with ultra-large precision matrices with simple tuning. The simulation results also support the reliability of the proposed method.
The remainder of this paper is organized as follows. In Section 2, we introduce the simultaneous inference method for a large precision matrix. The theoretical results are provided in Section 3. Simulation studies are also presented in Section 4. The proofs of the theorems in Section 3 and the technical details are included in the Appendix.
Notation. For a vector \boldsymbol{x}\in R^d and p\in(0, \infty) , we use the notation ||\boldsymbol{x}||_p to denote the \ell_p -norm of \boldsymbol{x} in the classical sense. By \boldsymbol{e}_i , we denote a p -dimensional vector of zeros with a one at position i . For a matrix \boldsymbol{A}\in R^{d\times d} , we use the notation \boldsymbol{A}_i=\boldsymbol{A}\boldsymbol{e}_i , |||\boldsymbol{A}|||_\infty=\max_i||\boldsymbol{e}_i^{\rm{T}}\boldsymbol{A}||_1 , ||\boldsymbol{A}||_\infty=\max_{i, j}|A_{i, j}| , and {\rm{supp}}(\boldsymbol{A})=\{(i, j):A_{i, j}\neq 0\} . We define {\cal{G}}(M, K)=\{\boldsymbol{A}: each row of \boldsymbol{A} has at most K nonzero off-diagonal entries, and M^{-1}\leqslant \lambda_{\rm{min}}(\boldsymbol{A})\leqslant \lambda_{\rm{max}}(\boldsymbol{A})\leqslant M\} . For two numbers a and b , let a\vee b={\rm{max}}\{a, b\} and a\land b={\rm{min}}\{a, b\} . For sequences f_n, g_n , we use the notation f_n={\cal{O}}(g_n) if f_n\leqslant Cg_n and f_n=\varOmega(g_n) if f_n\geqslant Cg_n for some constant C>0 independent of n . We write f_n\asymp g_n if both f_n={\cal{O}}(g_n) and f_n=\varOmega(g_n) , and f_n=o(g_n) if {\rm{lim}}_{n\rightarrow \infty}f_n/g_n=0 .
2.
Simultaneous inference procedures
2.1
Model setting
We consider a p -variate zero-mean Gaussian random vector,
where {\boldsymbol{\varSigma}}=(\varSigma_{ij}) denotes the population covariance matrix of {\boldsymbol{x}}. Suppose that {\boldsymbol{X}}=({\boldsymbol{x}}_1, ..., {\boldsymbol{x}}_n)^{\rm{T}} is an n\times p sample matrix with n i.i.d. observations {\boldsymbol{x}}_i, 1\leqslant i\leqslant n , where {\boldsymbol{x}}_i \sim N({\boldsymbol{0}}, {\boldsymbol{\varSigma}}). Let \widehat{{\boldsymbol{\varSigma}}}=\dfrac{1}{n} \displaystyle {\sum}_{i=1}^{n}{\boldsymbol{x}}_i{\boldsymbol{x}}_i^{\rm{T}} be the empirical covariance matrix of {\boldsymbol{X}} , and let {\boldsymbol{\varTheta}}={\boldsymbol{\varSigma}}^{-1} be the inverse of the covariance matrix, which is also called the precision matrix. In Gaussian graphical models, an edge between nodes X_i and X_j represents the conditional dependence of X_i and X_j : X_i and X_j are conditionally dependent given the remaining variables if and only if {\varTheta}_{ij}\neq 0[1].
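This equivalence can be checked numerically. The following sketch (illustrative only, not from the paper) samples from a Gaussian with a tridiagonal precision matrix and verifies that the partial correlation -\varTheta_{ij}/\sqrt{\varTheta_{ii}\varTheta_{jj}} is near zero exactly where \varTheta_{ij}=0 :

```python
import numpy as np

# Illustration (not the paper's code): in a Gaussian graphical model,
# Theta_ij = 0 is equivalent to conditional independence of X_i and X_j
# given all remaining variables.
Theta = np.eye(4) + 0.4 * (np.eye(4, k=1) + np.eye(4, k=-1))  # tridiagonal precision
Sigma = np.linalg.inv(Theta)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), Sigma, size=100_000)
Theta_hat = np.linalg.inv(X.T @ X / X.shape[0])  # n >> p here, so direct inversion is fine

# Partial correlations: rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)
rho_14 = -Theta_hat[0, 3] / np.sqrt(Theta_hat[0, 0] * Theta_hat[3, 3])  # true value 0
rho_12 = -Theta_hat[0, 1] / np.sqrt(Theta_hat[0, 0] * Theta_hat[1, 1])  # true value -0.4
```

Here X_1 and X_4 are not adjacent in the tridiagonal graph, so their partial correlation estimate is close to zero, while the adjacent pair X_1, X_2 has a clearly nonzero partial correlation.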
In Section 1, we introduce several estimation methods for the precision matrix {\boldsymbol{\varTheta}} and an inference method for a single estimation point. However, in high-dimensional graphical models, it is important to analyze the connection of a set of variables. Thus, we focus on the simultaneous inference of the elements of the precision matrix {\boldsymbol{\varTheta}}. Consider the hypothesis testing problem:
versus the alternative H_{a, G}: \varTheta_{ij}\neq \tilde{\varTheta}_{ij} for some (i, j)\in G , where (\tilde{\varTheta}_{ij}) with (i, j)\in G is a vector of pre-specified values. In particular, the test can be applied to the support recovery of {\boldsymbol{\varTheta}} when \tilde{\varTheta}_{ij}= 0 for all (i, j)\in G . The problem of recovering the Gaussian graph is equivalent to the support recovery of {\boldsymbol{\varTheta}} because of the characterization of the edge set based on {\boldsymbol{\varTheta}}. Moreover, we allow the index set G of the tested entries of {\boldsymbol{\varTheta}}=(\varTheta_{ij}) to be designed arbitrarily. To solve this problem, we propose a bootstrap method for the simultaneous inference of {\boldsymbol{\varTheta}} based on a de-biased estimator.
2.2
The de-sparsified estimator
It is essential to obtain a valid initial estimate of the precision matrix to address the hypothesis testing problem. We select the ISEE estimator[9], which converts the precision matrix estimation problem into a large covariance matrix estimation problem, as the initial estimator for {\boldsymbol{\varTheta}} because it can deal with ultra-large graphical models and provides high computational efficiency. In the remainder of this study, we denote the ISEE estimator \widehat{{\boldsymbol{\varTheta}}}_{\rm {cini}}[9] by \widehat{{\boldsymbol{\varTheta}}}.
A scalable inference method exists for a single element in {\boldsymbol{\varTheta}}. Let
where Z_{ij}^n converges in distribution to N(0, \sigma_{ij}^2) with \sigma_{ij}>0 , \sigma_{ij}^2=\varTheta_{ii}\varTheta_{jj}+\varTheta_{ij}^2, and the negligible term does not depend on (i, j) . Although the above conclusion is a good description of the asymptotic properties of individual elements, it is not sufficient for simultaneous inference.
2.3
Bootstrap method for simultaneous inference
Based on the estimator \widehat{\boldsymbol{T}} , we propose a test statistic
We hope that this method eliminates the influence of the remainder term in Eq. (2), so that we only need to consider the distribution of \max\limits_{(i,j)\in G}Z_{ij}^n ; this is proved in the next section. Note that {\boldsymbol{\varTheta}} is a symmetric matrix. For efficient calculation, we only consider hypothesis testing for the upper triangular block of the inverse covariance matrix {\boldsymbol{\varTheta}} without loss of generality. Thus, G\subset \{(i, j):1\leqslant i\leqslant j\leqslant p\} .
As the distribution of \max\limits_{(i, j)\in G}\sqrt{n}(\widehat{T}_{ij}-\varTheta_{ij}) cannot be obtained explicitly, one alternative is to obtain a reliable approximation of the distribution in Eq. (3). We construct a simple bootstrap method. Note that Z_{ij}^n is asymptotically normal and the remainder in Eq. (2) is negligible; thus, we use a multivariate normal random vector to simulate the distribution of our test statistic. It can be proven that the sequence \{Z_{ij}^n\}_{1\leqslant i\leqslant j\leqslant p} follows an asymptotic normal distribution with the covariance matrix
where \boldsymbol{\varXi} is a \dfrac{p(p+1)}{2}\times\dfrac{p(p+1)}{2} matrix with the diagonal elements (\sigma_{ij}^2)_{ i\leqslant j} . Based on this form of the covariance matrix for \{Z_{ij}^n\}_{1\leqslant i\leqslant j\leqslant p} , we define
as the estimator of \boldsymbol{\varXi}, with diagonal elements \widehat{\sigma}_{ij}^2=\widehat{\varTheta}_{ii}\widehat{\varTheta}_{jj}+ \widehat{\varTheta}_{ij}^2. We then generate a random vector \{\hat{Z}_{ij}\}_{1\leqslant i\leqslant j\leqslant p}\sim N({\bf{0}}, \widehat{\boldsymbol{\varXi}}) and define
By definition, W_G has approximately the same distribution as \max\limits_{(i, j)\in G}\sqrt{n}(\widehat{T}_{ij}-\varTheta_{ij}). The bootstrap critical value is given by c_G(\alpha) = {\rm{inf}}\{t\in R: P(W_G\leqslant t| X)\geqslant 1-\alpha\}. In practice, we generate a bootstrap sequence of i.i.d. random vectors \{\hat{Z}_{ij, m}\}_{1\leqslant m\leqslant M} and select the upper \alpha -quantile of the sequence \{\max\limits_{(i, j)\in G}\{\hat{Z}_{ij, m}\}\}_{1\leqslant m\leqslant M} as a substitute for c_G(\alpha) .
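A minimal sketch of this bootstrap step follows. For simplicity it draws the \hat{Z}_{ij} independently with variances \hat{\sigma}_{ij}^2=\widehat{\varTheta}_{ii}\widehat{\varTheta}_{jj}+\widehat{\varTheta}_{ij}^2 only, i.e., it ignores the off-diagonal entries of \widehat{\boldsymbol{\varXi}} ; the paper's procedure uses the full covariance, and the function interface is ours, not the authors':

```python
import numpy as np

def bootstrap_critical_value(Theta_hat, G, alpha=0.05, M=1000, rng=None):
    """Approximate c_G(alpha) by the (1 - alpha)-quantile of
    max_{(i,j) in G} Zhat_ij over M bootstrap draws.

    Simplifying assumption: the Zhat_ij are drawn independently with
    variance sigma_hat_ij^2 = Theta_ii * Theta_jj + Theta_ij^2 (the
    diagonal of Xi_hat); off-diagonal covariances are ignored here.
    """
    if rng is None:
        rng = np.random.default_rng()
    sig = np.array([np.sqrt(Theta_hat[i, i] * Theta_hat[j, j] + Theta_hat[i, j] ** 2)
                    for (i, j) in G])
    draws = rng.standard_normal((M, len(G))) * sig  # Zhat_ij ~ N(0, sigma_hat_ij^2)
    W = draws.max(axis=1)                           # one W_G per bootstrap replicate
    return float(np.quantile(W, 1 - alpha))
```

With \widehat{\boldsymbol{\varTheta}} equal to the identity and |G|=10 , this returns roughly the 95% quantile of the maximum of 10 standard normals.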
Moreover, we can adjust the test and bootstrap statistics to obtain a similar regularization test. We consider the studentized statistic
where \widehat{\sigma}_{ij}>0 and \hat{\sigma}_{ij}^2=\widehat{\varTheta}_{ii}\widehat{\varTheta}_{jj}+\widehat{\varTheta}_{ij}^2. Correspondingly, the bootstrap critical value can be obtained via \bar{c}_G(\alpha) = {\rm{inf}}\{t\in R: P(\bar{W}_G\leqslant t| X)\geqslant 1-\alpha\}, defined as
We now consider a simultaneous confidence interval. Suppose there is a given set G\subseteq \{(k, l):1\leqslant k\leqslant p, 1\leqslant l\leqslant p\} , which is the index set of the subset of {\boldsymbol{\varTheta}} to be tested. To address the hypothesis testing problem, we generate an n\times|G| bootstrap sequence \{\widehat{Z}_{ij}\}_{(i, j)\in G, n} with n i.i.d. samples, where (\widehat{Z}_{ij})_{(i, j)\in G}\sim N({\bf{0}}, \widehat{\boldsymbol{\varXi}}_G) and \widehat{\boldsymbol{\varXi}}_G is a |G|\times|G| subblock of \widehat{\boldsymbol{\varXi}}. The sequence \{\max\limits_{(i, j)\in G}\hat{Z}_{ij}\}_n has approximately the same distribution as \max\limits_{(i, j)\in G}\sqrt{n}(\widehat{T}_{ij}-{\varTheta}_{ij})/\hat{\sigma}_{ij}. We then obtain the hypercuboid I_G= I_1\times I_2\times ... \times I_{|G|} as the confidence region of \{\tilde{\varTheta}_{ij}\}_{(i, j)\in G} with confidence level 1-\alpha , where I_k=(\widehat{T}_{i_kj_k}-\hat{\sigma}_{i_kj_k}\bar{c}_G(\alpha/2)/ \sqrt{n}, \widehat{T}_{i_kj_k}+\hat{\sigma}_{i_kj_k}\bar{c}_G(\alpha/2)/\sqrt{n}). The confidence region I_G can also serve as the acceptance region of the null hypothesis \tilde{\boldsymbol{\varTheta}}={\boldsymbol{\varTheta}} for the hypothesis testing problem (1), with confidence level 1-\alpha .
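The interval construction above can be sketched as follows, assuming the bootstrap critical value \bar{c}_G(\alpha/2) has already been computed; the function name and interface are ours, not the authors':

```python
import numpy as np

def simultaneous_ci(T_hat, G, n, c_bar):
    """Hypercuboid I_G = I_1 x ... x I_|G|: each I_k is
    T_hat_ij +/- sigma_hat_ij * c_bar / sqrt(n), where c_bar is the
    bootstrap critical value bar_c_G(alpha/2), assumed computed elsewhere.
    A sketch of the interval construction, not the authors' code."""
    intervals = {}
    for (i, j) in G:
        sigma_ij = np.sqrt(T_hat[i, i] * T_hat[j, j] + T_hat[i, j] ** 2)
        half = sigma_ij * c_bar / np.sqrt(n)
        intervals[(i, j)] = (T_hat[i, j] - half, T_hat[i, j] + half)
    return intervals
```

Because all intervals share the same critical value \bar{c}_G(\alpha/2) , the hypercuboid covers the whole vector (\varTheta_{ij})_{(i,j)\in G} simultaneously at level 1-\alpha .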
3.
Theoretical results
This section presents theoretical support for the procedures in Section 2. The theorems establish the asymptotic agreement between the distributions of {W}_G and \max\limits_{(i, j)\in G}\sqrt{n}(\widehat{T}_{ij}-\varTheta_{ij}), which justifies the bootstrap method. We also prove the validity of the support recovery procedure.
The following four assumptions are necessary to support this result:
Assumption 3.1. Assume that {\boldsymbol{\varTheta}}\in {\cal{G}}(M, K) with M= {\cal{O}}(1) , M > 1 , p>n , and \log p/n=o(1) . For every i\neq j , there exists a positive constant c such that |\varTheta_{ij}|/\sqrt{\varTheta_{ii}\varTheta_{jj}}\leqslant c < 1.
Assumption 3.2. There exists a certain sparsity level s\geqslant K , s = {\cal{O}}(d) = o(\sqrt{n}/\log p) and some constants 0 \leqslant \alpha \leqslant 1/2 and \xi > 1 , such that the l1 -norm cone invertibility factor
of covariance matrix {\boldsymbol{\varSigma}}= {\boldsymbol{\varTheta}}^{-1} satisfies F_{\infty}^{-1}={\cal{O}}(K^{\alpha}) .
Assumption 3.3. We assume that (\log(pn))^7/n\leqslant C_1n^{-c_1} for constants c_1, C_1>0 .
Assumption 3.4. There exists a sequence of positive numbers \alpha_n\rightarrow \infty , such that \alpha_n/p = o(1) and \alpha_n(\log p)^2K\sqrt{\dfrac{\log p}{n}} = o(1), (\log p)^2\max\limits_{1\leqslant i, j\leqslant p}\varDelta_{ij}= o_p(1).
Based on Assumptions 3.1 and 3.2, we obtain the asymptotic normality of a single element of the de-sparsified estimator \widehat{\boldsymbol{T}} (Lemma A.1). We then propose a theorem to ensure the theoretical feasibility of simultaneous inference. Assumptions 3.3 and 3.4 are necessary to prove this theorem, as they control the bias of the covariance of the bootstrap statistics.
The following theorem justifies the validity of the bootstrap procedure for the studentized-type statistic \max\limits_{(i, j)\in G} \sqrt{n}(\widehat{T}_{ij}-\varTheta_{ij})/\widehat{\sigma}_{ij}. This is an important theoretical basis for the simultaneous confidence interval inference.
Theorem 3.1. Suppose that Assumptions 3.1–3.4 hold. Then, for any G\subseteq \{(k, l):1\leqslant k\leqslant l\leqslant p\} ,
The theorem justifies using the bootstrap quantile \bar{c}_G(\alpha) as the quantile of the distribution of \max\limits_{(i, j)\in G}\sqrt{n}(\widehat{T}_{ij}-\varTheta_{ij})/\widehat{\sigma}_{ij}. A crucial feature of this bootstrap-assisted testing procedure is that it explicitly accounts for the effect of |G| because the bootstrap critical value \bar{c}_G(\alpha) depends on G . Thus, the result is readily applicable to the construction of simultaneous confidence intervals for (\tilde{\varTheta}_{ij}) with (i, j)\in G .
Next, we consider the inference of support recovery. The major goal of support recovery is to identify nonzero signal locations in a pre-specified set G . The procedure involves setting an appropriate threshold \tau in the set
Theorem 3.2 below shows that the above support recovery procedure is consistent and effective if \tau\geqslant 2 and provides the theoretically optimal separation parameter \tau_0=2 . Next, we conduct a power analysis for the above procedure. When |G| is fixed, the test is consistent (based on the asymptotic normality of a single element of \widehat{\boldsymbol{T}} ). Thus, we focus on the case where |G| \rightarrow \infty . Note that no signal strength assumption is required for simultaneous confidence interval inference; however, such an assumption is necessary for support recovery.
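A sketch of the thresholding rule follows. We assume the (elided) threshold set takes the form suggested by the separation rate in Theorem 3.2, i.e., keep (i, j)\in G with \sqrt{n}|\widehat{T}_{ij}|/\widehat{\sigma}_{ij} > \sqrt{\tau \log |G|} ; the function and its interface are illustrative, not the authors' code:

```python
import numpy as np

def support_recovery(T_hat, G, n, tau=2.0):
    """Keep (i, j) in G with |T_hat_ij| > sigma_hat_ij * sqrt(tau*log|G|/n),
    using the default tau_0 = 2. The exact display in the paper is assumed
    to take this form (a sketch consistent with the separation rate)."""
    thr = np.sqrt(tau * np.log(len(G)) / n)
    S_hat = []
    for (i, j) in G:
        sigma_ij = np.sqrt(T_hat[i, i] * T_hat[j, j] + T_hat[i, j] ** 2)
        if abs(T_hat[i, j]) > sigma_ij * thr:
            S_hat.append((i, j))
    return S_hat
```

The threshold shrinks like \sqrt{\log(|G|)/n} , so entries with signal strength above the separation rate are retained while pure-noise entries are screened out with high probability.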
The separation set {\cal{U}}_G(c) requires that there be at least one element (i, j) in G such that |\varTheta^*_{ij}-\varTheta_{ij}|/\widehat{\sigma}_{ij} > c\sqrt{\log (|G|)/n}. The following theorem shows that the test rejects the null hypothesis as long as one element in G satisfies this condition.
Theorem 3.2. We assume that G\subset \{(i, j):1\leqslant i\leqslant j\leqslant p\} . Under the assumptions of Theorem 3.1, for any \epsilon_0>0 ,
Theorem 3.2 supports the conclusion that the proposed procedure is highly sensitive in terms of detecting sparse alternatives. This is the theoretical basis for the support recovery procedure.
From the proof of Theorem 3.2, we also observe that the separation rate (\sqrt{2}+\epsilon_0)\sqrt{\log(|G|)/n} for any \epsilon_0 > 0 derived in Theorem 3.2 is minimax optimal under suitable assumptions. The following theorem shows that the support recovery procedure is consistent and effective if the threshold value is set as \tau_0 = 2 and further justifies the optimality of \tau_0 . A similar support recovery test procedure was presented by Ref. [12]. We show the relationship between the optimal threshold parameter and the number of test variables in the following corollary.
Corollary 3.1. Under the assumptions in Theorem 3.2, we obtain the following:
where {\boldsymbol{S}}^*(s_0)=\{{\boldsymbol{\varTheta}}=(\varTheta_{ij}):\displaystyle{\sum}_{1\leqslant i < j\leqslant p}\boldsymbol{I}\{\varTheta_{ij}\neq 0\}=s_0\}.
This observation, in part, has motivated the consideration of the numerical study procedure in Section 4.
4.
Numerical studies
In this section, we present the simulation results for the applications of the simultaneous inference procedure, including simultaneous confidence intervals and support recovery.
4.1
Simultaneous confidence interval
We consider the following model setting to demonstrate the proposed method: {\boldsymbol{\varTheta}}= {\rm{tridiag}}(\rho; 1; \rho) is a tri-diagonal matrix with \rho=0.45 . For every data set, the rows of the sample X are sampled as i.i.d. copies of N(\boldsymbol{0}, {\boldsymbol{\varTheta}}^{-1}). We consider various sample sizes n and dimensions p , where {\boldsymbol{\varTheta}}\in R^{p\times p}. The index set G of the inverse covariance matrix for the test is set as follows: (I) G contains the top 50 entries of a column of {\boldsymbol{\varTheta}} , with |G|=50 . (II) G is a column of {\boldsymbol{\varTheta}} , with |G|=p . For these two cases, we test the top 10 columns for each set of data. (III) G is an upper tri-diagonal subblock of {\boldsymbol{\varTheta}} without the diagonal elements; the subblock is an 11\times 11 submatrix, so that |G|=55 . (IV) G is defined as in Case III but with a d\times d submatrix, where d= {\rm{min}}\{p/10, \lceil \sqrt{2p}+1/2 \rceil\}, so that |G| is close to p . For these two cases, we test 10 disjoint subblocks for each set of data. The tuning parameter is set to the value suggested by Ref. [9], and the confidence interval is constructed as in Section 2. All performance measures are averaged over N = 50 replications.
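The data-generating design described above can be sketched as follows (function names are ours, for illustration only):

```python
import numpy as np

# Sketch of the simulation design: Theta = tridiag(rho; 1; rho) with
# rho = 0.45, and the rows of X are i.i.d. draws from N(0, Theta^{-1}).
def make_tridiag_precision(p, rho=0.45):
    return np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))

def sample_data(n, p, rho=0.45, seed=0):
    Theta = make_tridiag_precision(p, rho)
    Sigma = np.linalg.inv(Theta)
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return X, Theta
```

With \rho=0.45 the eigenvalues of {\boldsymbol{\varTheta}} are 1+2\rho\cos(k\pi/(p+1)) > 0.1 , so the precision matrix is positive definite for every p .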
Although our procedure could in principle be applied to obtain a simultaneous confidence interval for the entire precision matrix {\boldsymbol{\varTheta}}, this is not computationally feasible because of the complexity of computing the square root of a matrix of size p^2\times p^2 .
We report the results at two confidence levels, 1-\alpha with \alpha = 0.1, 0.05 . The results are summarized in Tables 1 and 2. The reported length is that of the confidence interval for \max\limits_{(i, j)\in G}(\widehat{T}_{ij}-{\varTheta}_{ij})/\widehat{\sigma}_{ij} . From Tables 1 and 2, we observe that simultaneous confidence interval inference is effective in a variety of situations. In particular, the procedure performs satisfactorily and stably under high-dimensional conditions, where n<p . Notably, the length of the confidence intervals decreases with an increase in the sample size n and test set size |G| . This result is consistent with the theoretical results.
Table 1. The simultaneous confidence interval results in Cases I and II. "C" denotes the coverage rate of the confidence interval, and "L" denotes the length of the confidence interval. These are defined similarly in the following tables.
4.2
Support recovery
We consider the most challenging and valuable scenario, where G = \{(i, j)\} _{1\leqslant i<j \leqslant p} , because the test for non-diagonal elements is of practical significance. We consider two model settings to demonstrate the proposed method: (Type I) {\boldsymbol{\varTheta}}= {\rm{tridiag}}(\rho;\;1;\;\rho) is a tri-diagonal matrix with a given \rho > 0 ; we set \rho=0.45 . (Type II) {\boldsymbol{\varTheta}}= {\text{five-diag}}({\rho_0}; {\rho_1}; {\rho_2}), which is defined by
and we set {\boldsymbol{\varTheta}}= {\text{five-diag}}(1; 0.5; 0.4) . The setting of the sample X is the same as that in the simultaneous confidence interval inference. To assess the performance of support recovery, we consider the power and the number of false positives (FP) of the procedure. We also compare the result with the support set estimated by ISEE, which is defined as \{(i, j): |\widehat{{\boldsymbol{\varTheta}}}^{\rm{isee}}_{ij}| > 10^{-3}\}. Moreover, in line with Ref. [14], we consider the following similarity measure:
We also compare the effect and computation speed with De-GLasso and De-NLasso, as recommended by Ref. [12]. All performance measures were averaged over N = 50 replications.
The results are summarized in Tables 3, 4, and 5. Although ISEE has a remarkably high recognition rate for nonzero elements, it incorrectly picks out some zero elements. Thus, our approach provides a better way to conduct support recovery. Tables 3 and 4 show that the sample size n has a stronger influence on the result than the dimension p . Provided that the sample size n is sufficiently large, our procedure performs well even when p is larger than n . Based on Table 5, we see that although the different methods for estimating {\boldsymbol{\varTheta}} have similar effects, De-ISEE enables much faster calculation than De-GLasso and De-NLasso.
Table 3. The support recovery results, where {\boldsymbol{\varTheta}} follows Type I and d=d(\hat{{\boldsymbol{S}}}_0(2), {\boldsymbol{S}}_0) .
5.
Conclusions
In this study, we propose a bootstrap method to deal with simultaneous inference for Gaussian graphical models. The method is based on the de-sparsified ISEE, so it inherits asymptotic normality and computational advantages. We then demonstrate the effectiveness of the proposed method both theoretically and through simulation. Notwithstanding our findings, the problem of simultaneous confidence interval inference for the entire graph remains open because of the high computational cost involved. This is a direction for future research.
In the appendix, we provide proofs of the main results of this study. First, we list the notation used in the proofs. Let \tilde{\boldsymbol{x}}={\boldsymbol{\varTheta}}\boldsymbol{x}=(\tilde{X}_1, ..., \tilde{X}_p). It is evident that \tilde{\boldsymbol{x}}\sim N(\boldsymbol{0}, {\boldsymbol{\varTheta}}). Let \{z_{ij}\}_{1\leqslant i\leqslant j\leqslant p} be a mean-zero Gaussian random vector of dimension p(p+1)/2 with Ez_{ij}z_{ij}^{\rm{T}} = \boldsymbol{\varXi}. We use the notation \varXi_{ij, kl} for the element of \boldsymbol{\varXi} in row ij and column kl , which can be regarded as the covariance of \tilde{X}_i\tilde{X}_j-\varTheta_{ij} and \tilde{X}_k\tilde{X}_l-\varTheta_{kl}. We define
\begin{array}{l} \bar{{\boldsymbol{\varTheta}}}=(\varTheta_{ij}/\sqrt{\varTheta_{ii}\varTheta_{jj}})_{1\leqslant i, j\leqslant p} \end{array}
as the centralized result of matrix {\boldsymbol{\varTheta}}. Let \xi_{ijk}={\boldsymbol{\varTheta}}_i^{\rm{T}}\boldsymbol{x}_k\boldsymbol{x}_k^{\rm{T}} {\boldsymbol{\varTheta}}_j- \varTheta_{ij}. As each element in an i.i.d. sequence \{{\boldsymbol{\varTheta}}\boldsymbol{x}_k\}_{k=1}^n has the same distribution as \tilde{\boldsymbol{x}} , we observe that \{\xi_{ijk}\}_{1\leqslant i\leqslant j\leqslant p} and \{z_{ij}\}_{1\leqslant i\leqslant j\leqslant p} have the same mean and covariance matrices.
We then present several lemmas that are used in the proof of the theorem in Section 3.
Lemma A.1.[13, Theorem 3.1] Let \widehat{\boldsymbol{T}} be defined in Section 2. Under Assumptions 3.1 and 3.2 and ||\boldsymbol{\varSigma}||_\infty={\cal{O}}(1), we obtain
for all 1\leqslant i, j, k, l\leqslant p , where the triangle inequality is used, and |\widehat{{\boldsymbol{\varTheta}}}|_\infty={\cal{O}}(1), |{\boldsymbol{\varTheta}}|_\infty={\cal{O}}(1). Thus, we obtain
Let c_{z, G}(\alpha)={\rm{inf}}\{t\in R:P(\max\limits_{(i, j)\in G}z_{ij}\leqslant t)\geqslant 1-\alpha\}. Following the arguments in the proof of Ref. [15, Lemma 3.2],
Let \bar{\varGamma}=\max\limits_{(i\leqslant j), (k\leqslant l)} |\widehat{\varXi}_{ij,\, kl}/(\hat{\sigma}_{ij}\hat{\sigma}_{kl})-\varXi_{ij,\, kl}/(\sigma_{ij}\sigma_{kl})|, which then yields
Using the above arguments, we can show that P(\bar{\varGamma} > v) = o(1) for v = 1/(\alpha_n(\log p)^2) . The remaining proofs are similar to the proof of Lemma A.1; as such, we have omitted the details.
Appendix A.3
The proof of Theorem 3.2
We define Z_{ij}=z_{ij}/\sigma_{ij} for 1\leqslant i\leqslant j\leqslant p . Thus, \{Z_{ij}\}_{1\leqslant i\leqslant j\leqslant p}\sim N(\boldsymbol{0}, \bar{\boldsymbol{\varXi}}) with
where \xi_1\sqrt{1\vee \log(p/\xi_1)}=o(1) and \xi_2=o(1) from the proof of Theorem 3.1. Hence, the distribution of \max\limits_{(i, j)\in G}\sqrt{n}|\hat{T}_{ij}-\varTheta_{ij}|/\widehat{\sigma}_{ij} can be approximated by that of \max\limits_{(i, j)\in G}|Z_{ij}| . By Lemma A.2 and Ref. [16, Lemma 6], for any x\in R and as |G|\rightarrow \infty , we obtain
where q_{\alpha} is the 100(1 - \alpha)\% quantile of F(x) . By assumption, there is a (k, l)\in G such that |\tilde{\varTheta}_{kl}-\varTheta_{kl}|/\widehat{\sigma}_{kl} > (\sqrt{2}+\epsilon_0)\sqrt{\log(|G|)/n}. For any \delta>0 , using the AM-GM inequality, we have
as (k, l) is fixed and |G| is sufficiently large. From the proof of Theorem 3.1, we know that the difference between n|\tilde{\varTheta}_{kl}-\varTheta_{kl}|^2/\widehat{\sigma}_{kl}^2 and n|\tilde{\varTheta}_{kl}-\varTheta_{kl}|^2/\sigma_{kl}^2 is asymptotically negligible. Thus, given that \boldsymbol {\varTheta} \in {\cal{U}}_G(\sqrt{2}+\epsilon_0), we obtain
Similar to the proof of Theorem 3.2, the distribution of \max\limits_{(i, j)\in G}\sqrt{n}|\hat{T}_{ij}-\varTheta_{ij}|/\widehat{\sigma}_{ij} can be approximated by that of \max\limits_{(i, j)\in G}|Z_{ij}| . By Lemma A.2 and Ref. [16, Lemma 6], for any x\in R and as |G|\rightarrow \infty , we obtain
As the difference between \min\limits_{(i, j)\in S_0}n|T_{ij}|^2/\widehat{\sigma}_{ij}^2 and \min\limits_{(i, j)\in S_0}n|T_{ij}|^2/\sigma_{ij}^2 is asymptotically negligible, and P(2\max\limits_{(i, j)\in S_0}n|\widehat{T}_{ij}-T_{ij}|^2/\widehat{\sigma}_{ij}^2\leqslant 4\log(|G|)-\log\log(|G|))\rightarrow 1, we obtain
Next, we prove the optimality of \tau = 2 . For a sufficiently large M , we can choose a set G such that \varTheta_{ij}=0 for (i, j)\in G and |G|=M . Based on the above arguments, we know that the distribution of \max\limits_{(i, j)\in G}\sqrt{n}|\hat{T}_{ij}-\varTheta_{ij}|/\widehat{\sigma}_{ij} can be approximated by that of \max\limits_{(i, j)\in G}|Z_{ij}| . From (A.7), we obtain the following:
where C_1'=2^7B^2C_1={\cal{O}}(1) . Hence, we consider the application of Ref. [15, Corollary 2.1] to the i.i.d. sequence \{\xi_{ijk}\}_{k=1}^n . Only its Condition (E.1) must be verified. For clarity, we state the following condition:
uniformly over j , where c_0, C_0 > 0 and B={\cal{O}}(1) . As \tilde{\boldsymbol{x}} is a normal random vector with \tilde{\boldsymbol{x}}={\boldsymbol{\varTheta}}\boldsymbol{x}, we obtain the following:
It is straightforward to see that \max\limits_{l=1, 2}E|\xi_{ijk}|^{2+l}={\cal{O}}(1) ; therefore, B_1={\cal{O}}(1) , such that \max\limits_{l=1, 2}E|\xi_{ijk}|^{2+l}/B_1^l\leqslant 2 .
Let B_2:=|\varTheta_{ij}|/\log 2. As \tilde{X}_i and \tilde{X}_j are normal, there exists a constant B_3 such that E{\rm e}^{\tilde{X}_i^2/B_3}\leqslant 1 and E{\rm e}^{\tilde{X}_j^2/B_3}\leqslant 1. Now, let
where we use the AM-GM and Cauchy-Schwarz inequalities in the third and fourth inequalities, respectively. This completes the proof of Lemma A.2.
Appendix A.7
The proof of Lemma A.3
First, we prove that there is a constant 0<c<1 such that |\bar{\varTheta}_{ij}| < c for all i\neq j . As {\boldsymbol \varTheta} \in {\cal{G}}(M, K), with M={\cal{O}}(1) , we have
\begin{array}{l} M^{-1}\leqslant \lambda_{\rm min}({\boldsymbol{\varTheta}})\leqslant \lambda_{\rm max}({\boldsymbol{\varTheta}})\leqslant M. \end{array}
Let \boldsymbol{z}=\boldsymbol{e}_i/\sqrt{\varTheta_{ii}}-\boldsymbol{e}_j/\sqrt{\varTheta_{jj}}. Then, we have
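The display here can be reconstructed from the spectral bounds M^{-1}\leqslant \lambda_{\rm min}({\boldsymbol{\varTheta}})\leqslant \lambda_{\rm max}({\boldsymbol{\varTheta}})\leqslant M and the definition of \boldsymbol{z} (a reconstruction, term-by-term consistent with the conclusion that follows):

```latex
\begin{aligned}
\boldsymbol{z}^{\rm T}{\boldsymbol{\varTheta}}\boldsymbol{z}
  &= \frac{\varTheta_{ii}}{\varTheta_{ii}}
   + \frac{\varTheta_{jj}}{\varTheta_{jj}}
   - \frac{2\varTheta_{ij}}{\sqrt{\varTheta_{ii}\varTheta_{jj}}}
   = 2\left(1-\bar{\varTheta}_{ij}\right),\\
\boldsymbol{z}^{\rm T}{\boldsymbol{\varTheta}}\boldsymbol{z}
  &\geqslant \lambda_{\rm min}({\boldsymbol{\varTheta}})\,\|\boldsymbol{z}\|_2^2
   = \lambda_{\rm min}({\boldsymbol{\varTheta}})
     \left(\frac{1}{\varTheta_{ii}}+\frac{1}{\varTheta_{jj}}\right)
   \geqslant \frac{1}{M}\cdot\frac{2}{M}
   = \frac{2}{M^2},
\end{aligned}
```

where the last step uses \varTheta_{ii}\vee\varTheta_{jj}\leqslant \lambda_{\rm max}({\boldsymbol{\varTheta}})\leqslant M .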
Thus, \bar{\varTheta}_{ij}\leqslant 1-\dfrac{1}{M^2}. Similarly, we can prove that -\bar{\varTheta}_{ij}\leqslant 1-\dfrac{1}{M^2}. Setting c=1-\dfrac{1}{M^2}<1 , we have |\bar{\varTheta}_{ij}| \leqslant c for all i\neq j .
{\bf{Case\; 1.}} The indices in (i, j) and (k, l) are the same.
Owing to the symmetry of (i, j) and (k, l) in \varXi_{ij,\; kl}= \varTheta_{ik}\varTheta_{jl}+\varTheta_{il}\varTheta_{jk}, for convenience, we assume that l=j . Then, i\neq k and |\bar{\varTheta}_{ik}|\leqslant c because (i, j)\neq (k, l) . Thus,
where the Cauchy-Schwarz inequality is used in the first inequality. Let c_1=1-\dfrac{1}{2M^2} < 1. Then, |\varXi_{ij, kl}|/\sqrt{\varXi_{ij, ij}\varXi_{kl, kl}} < c_1 for all (i, j)\neq (k, l) .
{\bf{Case\; 2.}} The indices in (i, j) and (k, l) are not the same.
The target formula can be rewritten in the following form:
We divide the Gaussian variable \tilde{x}_i into two independent Gaussian parts \tilde{x}_i' and \tilde{x}_i'' such that \tilde{x}_i=\tilde{x}_i'+\tilde{x}_i'' , with \tilde{x}_i'' being the projection of \tilde{x}_i onto the linear space {\rm span} \{\tilde{x}_k: 1\leqslant k\leqslant p, k\neq i, j\}. This implies that with an appropriate p \times 1 parameter vector \boldsymbol{u} with u_i=u_j=0 , we obtain \tilde{x}_i''=\boldsymbol{u}^{\rm{T}}\tilde{\boldsymbol{x}}. This also implies that E\tilde{x}_i'\tilde{x}_i''=0 and E\tilde{x}_i'\tilde{x}_k=E\tilde{x}_i'\tilde{x}_l=0 . Let \boldsymbol{z'}=\boldsymbol{e}_i-\boldsymbol{u} ; we then have
Therefore, E\tilde{x}_i'^2\geqslant \dfrac{1}{M} . We then use the same partition method to divide {\tilde{x}_j} into {\tilde{x}_j'} and {\tilde{x}_j''} . It is evident that {\tilde{x}_i'},\; {\tilde{x}_j'} are independent of {\tilde{x}_i''},\; {\tilde{x}_j''} and are all zero-mean Gaussian variables.
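This projection step can be illustrated numerically. Assuming, as in the innovated transformation, that \tilde{\boldsymbol{x}} has covariance matrix {\boldsymbol{\varTheta}} (an assumption of this sketch, consistent with \varXi_{ij,kl}=\varTheta_{ik}\varTheta_{jl}+\varTheta_{il}\varTheta_{jk}), the residual variance of \tilde{x}_i after projecting out \{\tilde{x}_k: k\neq i, j\} equals \varTheta_{ii}-{\boldsymbol{\varTheta}}_{iS}{\boldsymbol{\varTheta}}_{SS}^{-1}{\boldsymbol{\varTheta}}_{Si} with S=\{1,\dots,p\}\setminus\{i,j\}, and the check below confirms that it is at least 1/M ; the matrix construction is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, p = 3.0, 10

# Random covariance matrix Theta of x_tilde with eigenvalues in [1/M, M].
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Theta = Q @ np.diag(rng.uniform(1 / M, M, size=p)) @ Q.T

i, j = 0, 1
S = [k for k in range(p) if k not in (i, j)]

# Projection of x_i onto span{x_k : k in S}: coefficients u_S = Theta_SS^{-1} Theta_Si,
# so the residual x_i' has variance Theta_ii - Theta_iS u_S.
u_S = np.linalg.solve(Theta[np.ix_(S, S)], Theta[S, i])
resid_var = Theta[i, i] - Theta[i, S] @ u_S

assert resid_var >= 1 / M  # matches the bound E x_i'^2 >= 1/M
```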
Defining E\tilde{x}_i'\tilde{x}_j'=\varTheta_{ij}' and E\tilde{x}_i''\tilde{x}_j''=\varTheta_{ij}'', we have
where we use the Cauchy inequality in the first step and the bound E\tilde{x}_i'^2\geqslant \dfrac{1}{M}. Letting c_2=\sqrt{1-\dfrac{1}{2M^4}} < 1, we have |\varXi_{ij, kl}|/ \sqrt{\varXi_{ij, ij}\varXi_{kl, kl}}\leqslant c_2.
Combining Cases 1 and 2 and letting c=c_1 \vee c_2 , we have
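The lemma's conclusion can likewise be checked numerically: for a random precision matrix with spectrum in [M^{-1}, M] , every correlation |\varXi_{ij,kl}|/\sqrt{\varXi_{ij,ij}\varXi_{kl,kl}} with (i, j)\neq (k, l) should fall below c=c_1\vee c_2 . The script below is a sanity check under this illustrative construction, not part of the proof.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
M, p = 2.0, 8

# Random precision matrix Theta with eigenvalues in [1/M, M].
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Theta = Q @ np.diag(rng.uniform(1 / M, M, size=p)) @ Q.T

def xi(i, j, k, l):
    # Xi_{ij,kl} = Theta_ik Theta_jl + Theta_il Theta_jk
    return Theta[i, k] * Theta[j, l] + Theta[i, l] * Theta[j, k]

c1 = 1 - 1 / (2 * M**2)           # Case 1 bound (one shared index)
c2 = np.sqrt(1 - 1 / (2 * M**4))  # Case 2 bound (no shared index)
c = max(c1, c2)

pairs = list(combinations(range(p), 2))  # all index pairs (i, j) with i < j
max_corr = max(
    abs(xi(*a, *b)) / np.sqrt(xi(*a, *a) * xi(*b, *b))
    for a, b in combinations(pairs, 2)
)
assert max_corr < c  # the uniform bound of Lemma A.3
```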
Conflict of Interest
The authors declare that they have no conflict of interest.
We propose a bootstrap procedure to conduct simultaneous inference for Gaussian graphical models.
The procedure is applied to large-scale graphical models and allows the dimension of the parameter vector of interest to exceed the sample size.
Both theoretical and simulation results verify the feasibility of our method.
[1] Lauritzen S L. Graphical Models. Oxford: Clarendon Press, 1996.
[2] Belilovsky E, Varoquaux G, Blaschko M B. Testing for differences in Gaussian graphical models: Applications to brain connectivity. https://arxiv.org/abs/1512.08643.
[3] Yuan M, Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika, 2007, 94: 19–35. DOI: 10.1093/biomet/asm018
[4] Fan J Q, Feng Y, Wu Y. Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics, 2009, 3 (2): 521–541. DOI: 10.1214/08-AOAS215SUPP
[5] Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008, 9: 432–441. DOI: 10.1093/biostatistics/kxm045
[6] Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 2006, 34: 1436–1462. DOI: 10.1214/009053606000000281
[7] Cai T T, Liu W, Zhou H H. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. The Annals of Statistics, 2016, 44: 455–488. DOI: 10.1214/13-AOS1171
[8] Peng J, Wang P, Zhou N, et al. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 2009, 104: 735–746. DOI: 10.1198/jasa.2009.0126
[9] Fan Y, Lv J. Innovated scalable efficient estimation in ultra-large Gaussian graphical models. The Annals of Statistics, 2016, 44: 2098–2126. DOI: 10.1214/15-AOS1416
[10] Zhang C H, Zhang S S. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2014, 76: 217–242. DOI: 10.1111/rssb.12026
[11] Janková J, van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics, 2015, 9: 1205–1229. DOI: 10.1214/15-EJS1031
[12] Janková J, van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test, 2017, 26: 143–162. DOI: 10.1007/s11749-016-0503-5
[13] Zhou J, Zheng Z, Zhou H, et al. Innovated scalable efficient inference for ultra-large graphical models. Statistics and Probability Letters, 2021, 173: 109085. DOI: 10.1016/j.spl.2021.109085
[14] Zhang X, Cheng G. Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association, 2017, 112: 757–768. DOI: 10.1080/01621459.2016.1166114
[15] Chernozhukov V, Chetverikov D, Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 2013, 41: 2786–2819. DOI: 10.1214/13-AOS1161
[16] Cai T T, Liu W, Xia Y. Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2014, 76: 349–372. DOI: 10.1111/rssb.12034
Table 1. The simultaneous confidence interval results for Cases I and II. "C" denotes the coverage rate of the confidence interval, and "L" denotes its length; the same notation is used in the following tables.