Shanshan Wang is currently a master's student under the supervision of Assoc. Prof. Zhanfeng Wang at the University of Science and Technology of China. Her research mainly focuses on functional data analysis.
Hao Ding received his PhD degree from the University of Science and Technology of China (USTC). He is currently a postdoctoral fellow at USTC. His research focuses on robust estimation and functional data analysis.
Abstract
The extended t-process is robust to outliers and inherits many attractive properties of the Gaussian process. In this paper, we propose a function-on-function nonparametric random-effects model using extended t-process priors that accommodates heterogeneity of individual effects, a flexible mean function, a nonparametric covariance function, and robustness. A likelihood-based procedure is constructed to estimate the parameters involved in the model, and information consistency of the parameter estimation is established. Simulation studies and a real data example are presented to evaluate the performance of the developed procedures.
Public Summary
A function-on-function random effects model with extended t-process priors is considered.
The proposed model is general and flexible, including various kinds of functional models as special cases.
The extended t-process model is robust to outliers and inherits almost all the good features of Gaussian process regression.
1.
Introduction
With the development of science and technology, many data sets are recorded as curves, surfaces and other objects. Such data are usually called functional data and play an important role in a wide range of fields, such as atmospheric science, engineering and medical research; see Ramsay and Silverman[1] for more details. Functional regression models are useful tools in functional data analysis, where one of the most interesting and challenging cases is function-on-function regression; see Ramsay and Silverman[1,2] and Yao et al.[3,4]. In this paper, we consider the following functional model proposed by Wang et al.[5]: for m=1,\cdots,M ,
\begin{array}{l} y_m(t)={\boldsymbol{{z}}}_{{m}}^{{\top}}(t){\boldsymbol{{\nu}}}+\displaystyle\int_{S_t} {\boldsymbol{{x}}}_{{m}}^{{\top}}(s,t){\boldsymbol{{\beta}}}(s,t){\rm{d}}s+\tau_{m}({\boldsymbol{{z}}}_{{m}}(t), {\boldsymbol{{x}}}_{{m}}(\cdot, t))+\varepsilon_m(t), \end{array} \tag{1}
where y_m(t) is the functional response, {\boldsymbol{{z}}}_{{m}}(t) is a p -vector of functional covariates with corresponding parameter vector {\boldsymbol{{\nu}}} , {\boldsymbol{{x}}}_{{m}}(s, t) is a q -dimensional vector of covariates depending on s and t , {\boldsymbol{{\beta}}}(s,t) is a vector of functional coefficients, S_t is the interval for t , and \varepsilon_m(t) is the random error term for the m th curve. Model (1) is flexible and includes several function-on-function models, such as those in Gervini[6], Malfait and Ramsay[7], and Ramsay and Silverman[2], as special cases. Note that \tau_m models the heterogeneity among different subjects and depends on {\boldsymbol{{z}}}_{{m}}(t) and {\boldsymbol{{x}}}_{{m}}(\cdot, t) . Wang et al.[5] studied this random-effects model using Gaussian process priors; more on Gaussian process priors in functional models can be found in Refs. [8,9].
However, when there are outliers in the observations, models based on Gaussian process priors are not robust; see, e.g., Wang et al.[10]. To overcome the influence of outliers, various forms of the Student t-process have been developed to model heavy-tailed processes, e.g., Yu et al.[11] and Zhang and Yeung[12]. Shah et al.[13] pointed out that the t-distribution is not closed under addition, which prevents it from maintaining the good properties of Gaussian models. Wang et al.[10] therefore developed extended t-process regression, which has the following advantages: ① it maintains the good properties of the Gaussian process; ② it has a flexible form and contains the model in Shah et al.[13] as a special case; ③ it is robust. More general discussions of t-processes can be found in Refs. [10,14].
In this paper, we consider a functional nonparametric random-effects model with extended t-process priors and propose an estimation procedure. The proposed method has three merits. ① It applies the extended t-process prior to model the heterogeneity of individual effects in the function-on-function regression model, which makes the model robust. ② A basis expansion smoothing method and a penalized likelihood method are developed to estimate the parameters in the fixed effect and the covariance function of the random effects, which leads to estimation of the smooth coefficient function and prediction of the random effects. ③ Information consistency of the parameter estimation is established.
The remainder of the paper is organized as follows. In Section 2, we present the nonparametric random-effects model using extended t-process priors and develop the predictive distribution and estimation procedure. In Section 3, we conduct simulation studies and analyze a real data example to evaluate the performance of the proposed method. Conclusions are given in Section 4. All proofs are given in the Appendix.
2.
Main results
2.1
Extended t-process
The extended t-process proposed by Wang et al.[10] is briefly introduced as follows. Let f(\cdot) be a real-valued random function from {\cal{X}} to R satisfying
\begin{array}{l} f \mid r \sim {\rm{GP}}(h, r k), \quad r \sim {\rm{IG}}(v, \omega) \end{array},
where {\rm{GP}}(\cdot,\cdot) and {\rm{IG}}(\cdot,\cdot) stand for the Gaussian process and the inverse gamma distribution, respectively. Then f follows an extended t-process (ETP), denoted by f \sim {\rm{ETP}}(v, \omega, h, k) . We call h(\cdot): {\cal{X}} \rightarrow R the mean function and k(\cdot, \cdot): {\cal{X}} \times {\cal{X}} \rightarrow R the covariance kernel function. From the definition of the ETP, for any points {\boldsymbol{{X}}}=\left({\boldsymbol{{x}}}_{1}, \cdots, {\boldsymbol{{x}}}_{n}\right)^{\top} , the vector {\boldsymbol{{f}}}_{n}=\left(f\left({\boldsymbol{{x}}}_{1}\right), \cdots, f\left({\boldsymbol{{x}}}_{n}\right)\right)^{\top} follows an extended multivariate t-distribution,
\begin{array}{l} {\boldsymbol{{f}}}_{n} \sim {\rm{EMTD}}\left(v, \omega, {\boldsymbol{{h}}}_{n}, {\boldsymbol{{K}}}_{n}\right), \end{array}
where {\boldsymbol{{h}}}_{n}=\left(h\left({\boldsymbol{{x}}}_{1}\right), \cdots, h\left({\boldsymbol{{x}}}_{n}\right)\right)^{\top} , {\boldsymbol{{K}}}_{n}=\left(k_{i j}\right)_{n \times n} and k_{i j}=k\left({\boldsymbol{{x}}}_{i}, {\boldsymbol{{x}}}_{j}\right) .
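To make the hierarchical definition concrete, one can sample ETP paths exactly as the definition suggests: first draw the scale r from the inverse gamma distribution, then draw a Gaussian process path with covariance r k. A minimal sketch with numpy; the specific mean function, squared-exponential kernel, and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_etp_paths(x, h, k, v=2.0, omega=1.0, n_paths=5, seed=0):
    """Draw sample paths of f ~ ETP(v, omega, h, k) via the hierarchy
    f | r ~ GP(h, r k), r ~ IG(v, omega)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    hn = np.array([h(xi) for xi in x])
    Kn = np.array([[k(xi, xj) for xj in x] for xi in x])
    Kn += 1e-10 * np.eye(n)                              # jitter for stability
    paths = []
    for _ in range(n_paths):
        r = 1.0 / rng.gamma(shape=v, scale=1.0 / omega)  # r ~ IG(v, omega)
        paths.append(rng.multivariate_normal(hn, r * Kn))
    return np.array(paths)

# Illustrative (assumed) mean and squared-exponential kernel:
x = np.linspace(0, 1, 50)
paths = sample_etp_paths(x, h=lambda t: t,
                         k=lambda s, t: np.exp(-5.0 * (s - t) ** 2))
```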
2.2
Function-on-function regression model with random effects
In model (1), the random effect \tau_m captures the individual effect. For robustness against outliers, an ETP prior is applied to \tau_m . Specifically, this paper assumes that \tau_m and \varepsilon_m have a joint extended t-process,
where \delta_{\varepsilon}(t, s)=I(t=s) and I(\cdot) is an indicator function.
Since the random effect \tau_m depends on {\boldsymbol{z_m}}(t) and {\boldsymbol{x_m}}(\cdot,t) , following Wang et al.[5], the kernel function k is expressed as
where {\boldsymbol{{u}}}_{{m}}(t) = ({\boldsymbol{{z}}}_{{m}}^{{\top}}(t), {\boldsymbol{{x}}}_{{m}}^{{\top}}(\cdot, t))^\top , {\boldsymbol{{z}}}_{{m}}(t) = (z_{m1}(t),\cdots,z_{mp}(t))^\top and {\boldsymbol{{x}}}_{{m}}(s,t) = (x_{m1}(s, t),\cdots,x_{mq}(s, t))^\top . Let {\boldsymbol{\theta}}=(\theta_{10}, \theta_{11}, \cdots , \theta_{1 Q}, \theta_{21}, \cdots , \theta_{2 Q})^{\top} denote the set of hyper-parameters with Q=p+q , and let \|g(\cdot)\|_{\Lambda} be a \Lambda norm of the function g ; one common choice is \|g(\cdot)\|_{\Lambda}=\int g(s)^2{\rm{d}}s .
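The displayed form of k is not reproduced above. Purely for illustration, the following sketch shows one kernel consistent with the hyper-parameter set {\boldsymbol{\theta}}: a constant \theta_{10} plus one squared-exponential component per covariate coordinate, with the squared L_2 norm standing in for the \Lambda norm. This is a hedged guess at the structure; the exact kernel used by the authors may differ.

```python
import numpy as np

def k_theta(u1, u2, theta10, theta1, theta2):
    """Hypothetical covariance kernel over u_m(t) = (z_m(t), x_m(., t)):
    k(u1, u2) = theta10 + sum_j theta1[j] * exp(-theta2[j] * ||u1_j - u2_j||^2),
    where u1, u2 are length-Q sequences of coordinate values (scalars for
    z-components, arrays for function-valued x-components)."""
    val = theta10
    for j in range(len(theta1)):
        d2 = np.sum((np.asarray(u1[j]) - np.asarray(u2[j])) ** 2)
        val += theta1[j] * np.exp(-theta2[j] * d2)
    return val
```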
Let the observations be \{y_{m i}=y_{m}(t_{i}), i=1, \cdots, n, m=1, \cdots, M\} , with {\boldsymbol{{u}}}_{{m}}(t_i)=({\boldsymbol{{z}}}_{{m}}^{{\top}}(t_i),{\boldsymbol{{x}}}_{{m}}^{{\top}}(\cdot,t_i))^{\top} and error terms \varepsilon_{m i}=\varepsilon_{m}\left(t_{i}\right) , where \{t_{i}\} are the observation times. Assume that the true values of {\boldsymbol{{\nu}}} , {\boldsymbol{{\beta}}} and \tau_m in model (1) are {\boldsymbol{{\nu}}}_{{0}} , {\boldsymbol{{\beta}}}_{{0}} and \tau_{0 m} , respectively. From model (1), we further consider the following (true) data model:
This paper aims to develop methods to estimate {\boldsymbol{{\nu}}}_{{0}} and {\boldsymbol{{\beta}}}_{{0}} , and to predict \tau_{0m} .
2.3
Prediction
Denote c_m(t)={\boldsymbol{{z}}}_{{m}}^{{\top}}(t){\boldsymbol{{\nu}}}+\int_{S_t} {\boldsymbol{{x}}}_{{m}}^{{\top}}(s,t){\boldsymbol{{\beta}}}(s,t){\rm{d}}s. From model (1), we have the following results,
where {\boldsymbol{{y}}}_{{m}}=(y_{m}(t_{1}),\cdots,y_{m}(t_{n}))^{\top} are the observations for the m th subject at points \{t_{1},\cdots,t_{n}\} ; similarly, {\boldsymbol{{\tau}}}_{{m}}=(\tau_{m}({\boldsymbol{{u}}}_{{m}}(t_{1})), \cdots, \tau_{m}({\boldsymbol{{u}}}_{{m}}(t_{n})))^{\top} , {\boldsymbol{{c}}}_{{m}}=(c_{m}(t_{1}),\cdots,c_{m}(t_{n}))^{\top} , {\boldsymbol{{K}}}_{{m}}=(k_{{\boldsymbol{{\theta}}}}({\boldsymbol{{u}}}_{{m}}(t_{i}),{\boldsymbol{{u}}}_{{m}}(t_{j})))_{n \times n} , and {\boldsymbol{I}} is the identity matrix.
Denote the data set by {\cal{D}}=\{(y_{m}(t_{j}), {\boldsymbol{{u}}}_{{m}}(t_{j})): j=1, \cdots,n, m=1, \cdots, M\} . Since
where {\boldsymbol{{k}}}_{{mt}}=(k({\boldsymbol{{u}}}_{{m}}(t), {\boldsymbol{{u}}}_{{m}}(t_{1})), \cdots, k({\boldsymbol{{u}}}_{{m}}(t), {\boldsymbol{{u}}}_{{m}}(t_{n})))^{\top} . It indicates that
Thus, Eq. (4) gives an estimate of the covariance function of \hat{y}_{m}(\cdot) .
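Because the ETP retains the Gaussian conditional structure, the predictive mean has the familiar kernel-regression form, while the predictive variance carries an extra data-dependent scale. A hedged numerical sketch in the matrix notation defined above; the scale factor follows the general extended-t form and is our assumption, to be checked against Eq. (4).

```python
import numpy as np

def etp_predict(y_m, c_m, K_m, k_mt, k_tt, c_mt, sigma2, v=2.0, omega=1.0):
    """Predict y_m(t) at a new time t given observations at t_1, ..., t_n.
    y_m, c_m: (n,) response and fixed-effect vectors; K_m: (n, n) kernel
    matrix; k_mt: (n,) vector k(u_m(t), u_m(t_i)); k_tt, c_mt: scalars."""
    n = len(y_m)
    A = K_m + sigma2 * np.eye(n)
    resid = y_m - c_m
    mean = c_mt + k_mt @ np.linalg.solve(A, resid)   # GP-form predictive mean
    # Assumed extended-t predictive variance: GP-form variance times a
    # data-dependent scale factor (2*omega + q^2) / (2*v + n - 2).
    q2 = resid @ np.linalg.solve(A, resid)
    scale = (2.0 * omega + q2) / (2.0 * v + n - 2.0)
    var = scale * (k_tt + sigma2 - k_mt @ np.linalg.solve(A, k_mt))
    return mean, var
```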
2.4
Parameter estimation
Note that {\boldsymbol{{\beta}}}(s, t) in model (1) is a smooth function and can be approximated using basis functions \{\phi_k(s),k=1,\cdots,K_s\} and \{\psi_l(t),l=1,\cdots,K_t\} ,
where \{b_{i k l}\} are coefficients, {\boldsymbol{B}}_{{i}}=(b_{i k l})_{K_{s} \times K_{t}} , {\boldsymbol{{\phi}}}(s)=\left(\phi_{1}(s), \cdots, \phi_{K_{s}}(s)\right)^{\top} and {\boldsymbol{{\psi}}}(t)=\left(\psi_{1}(t), \cdots, \psi_{K_{t}}(t)\right)^{\top} . Let {\boldsymbol{\phi}}_{{x m i}}(t)=\int_{S_{t}} {\boldsymbol{{\phi}}}(s) x_{m i}(s, t) {\rm{d}} s , and
where “\otimes” represents the Kronecker product. Hence, c_{m}(t)={\boldsymbol{\gamma}}_{{m}}(t)^{\top} {\boldsymbol{b}} . Let {\boldsymbol{\varGamma}}_{{m n}}=({\boldsymbol{\gamma}}_{{m}}(t_{1}), \cdots, {\boldsymbol{\gamma}}_{{m}}(t_{n})), then {\boldsymbol{{c}}}_{{m}}={\boldsymbol{\varGamma}}_{{m n}}^{{\top}} {\boldsymbol{{b}}}.
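Numerically, {\boldsymbol{\gamma}}_{{m}}(t) and {\boldsymbol{\varGamma}}_{{m n}} can be assembled exactly as described: integrate each covariate surface against the s-basis, then take the Kronecker product with the t-basis. A sketch under the assumption that the bases are evaluated on grids; array shapes and helper names are ours.

```python
import numpy as np

def gamma_m(t_idx, z_m, x_m, phi, psi, s_grid):
    """Assemble gamma_m(t) at observation time index t_idx.
    z_m: (n, p) covariates z_m(t_i); x_m: (q, n_s, n) surfaces x_mi(s, t);
    phi: (K_s, n_s) s-basis on s_grid; psi: (K_t, n) t-basis at t_1..t_n."""
    parts = [z_m[t_idx]]                             # z_m(t)^T nu part
    for i in range(x_m.shape[0]):
        # phi_xmi(t) = int phi(s) x_mi(s, t) ds  (trapezoidal rule)
        phi_x = np.trapz(phi * x_m[i, :, t_idx], s_grid, axis=1)  # (K_s,)
        parts.append(np.kron(psi[:, t_idx], phi_x))  # psi(t) (x) phi_xmi(t)
    return np.concatenate(parts)                     # length p + q*K_s*K_t

def Gamma_mn(z_m, x_m, phi, psi, s_grid):
    """Stack gamma_m(t_i) as columns into the (p + q K_s K_t) x n matrix."""
    n = z_m.shape[0]
    return np.column_stack([gamma_m(i, z_m, x_m, phi, psi, s_grid)
                            for i in range(n)])
```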
Next we estimate {\boldsymbol{\theta}} , {\boldsymbol{b}} and \sigma^{2} using a likelihood method. By Eq. (3), we obtain the likelihood function of {\boldsymbol{{y}}}_{{m}} ,
where H_m({\boldsymbol{{\theta}}}, {\boldsymbol{{b}}}, \sigma^{2})=({\boldsymbol{{y}}}_{{m}}-{\boldsymbol{\varGamma}}_{{m n}}^{{\top}} {\boldsymbol{{b}}})^{\top}({\boldsymbol{{K}}}_{{m}}+\sigma^{2} {\boldsymbol{{I}}})^{-1}({\boldsymbol{{y}}}_{{m}}-{\boldsymbol{\varGamma}}_{{m n}}^{{\top}} {\boldsymbol{{b}}}). Then we have the following objective function based on the log-likelihood function,
where \lambda_s and \lambda_t are tuning parameters. Taking the derivative of G({\boldsymbol{{\theta}}}, {\boldsymbol{{b}}}, \sigma^{2}) with respect to {\boldsymbol{b}} , we obtain the estimating equation
where {\boldsymbol{\varLambda}}={\rm diag}({\bf{0}}_{{p\times p}},\lambda_s{\boldsymbol{J}}_{{{\boldsymbol{{\psi}}} {\boldsymbol{{\psi}}}}}\otimes{\boldsymbol{L}}_{{{\boldsymbol{{\phi}}}{\boldsymbol{{\phi}}}}}+\lambda_t{\boldsymbol{L}}_{{{\boldsymbol{{\psi}}} {\boldsymbol{{\psi}}}}}\otimes{\boldsymbol{J}}_{{{\boldsymbol{{\phi}}} {\boldsymbol{{\phi}}}}},\cdots,\lambda_s{\boldsymbol{J}}_{{{\boldsymbol{{\psi}}} {\boldsymbol{{\psi}}}}}\otimes{\boldsymbol{L}}_{{{\boldsymbol{{\phi}}}{\boldsymbol{{\phi}}}}}+ \lambda_t{\boldsymbol{L}}_{{{\boldsymbol{{\psi}}} {\boldsymbol{{\psi}}}}}\otimes{\boldsymbol{J}}_{{{\boldsymbol{{\phi}}} {\boldsymbol{{\phi}}}}}) is a (p+qK_sK_t)\times (p+qK_sK_t) matrix. Similarly, we obtain the estimating equations for {\boldsymbol{{\theta}}} and \sigma^{2} .
From these estimation equations, we construct an estimation procedure as follows.
Step 1 Choose an initial estimate of {\boldsymbol{\theta}} ;
Step 2 Given {\boldsymbol{\theta}} , update the estimates of {\boldsymbol{b}} and \sigma^2 from their estimating equations;
Step 3 Given {\boldsymbol{b}} and \sigma^2 , update the estimate of {\boldsymbol{\theta}} from its estimating equation;
Step 4 Repeat Steps 2 and 3 until convergence.
Similar to Ref. [5], the procedure stops when the absolute value of the relative difference of l({\boldsymbol{{\theta}}}, {\boldsymbol{{b}}}, \sigma^{2}) between two successive iterations is less than a given threshold.
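A compact sketch of the whole procedure is given below. The b-update implements the penalized generalized-least-squares estimating equation above; since the paper's closed-form updates for {\boldsymbol{\theta}} and \sigma^2 are not reproduced here, they are replaced by a generic numerical minimization of an assumed EMTD-type negative log-likelihood (the form in the comments is our assumption, not the authors' expression).

```python
import numpy as np
from scipy.optimize import minimize

def update_b(y_list, Gamma_list, K_list, sigma2, Lam):
    """Penalized GLS update of b: solves
    (sum_m Gamma_m A_m^{-1} Gamma_m^T + Lam) b = sum_m Gamma_m A_m^{-1} y_m,
    with A_m = K_m + sigma^2 I, per the estimating equation above."""
    lhs, rhs = Lam.copy(), np.zeros(Gamma_list[0].shape[0])
    for y, G, K in zip(y_list, Gamma_list, K_list):
        AinvGt = np.linalg.solve(K + sigma2 * np.eye(len(y)), G.T)
        lhs += G @ AinvGt
        rhs += AinvGt.T @ y
    return np.linalg.solve(lhs, rhs)

def neg_loglik(log_par, b, y_list, Gamma_list, make_K, v, omega):
    """Assumed EMTD-type negative log-likelihood: for each curve,
    0.5*log|A_m| + (v + n/2)*log(1 + H_m/(2*omega))."""
    par = np.exp(log_par)                        # keep parameters positive
    theta, sigma2 = par[:-1], par[-1]
    total = 0.0
    for m, (y, G) in enumerate(zip(y_list, Gamma_list)):
        A = make_K(theta, m) + sigma2 * np.eye(len(y))
        r = y - G.T @ b
        H = r @ np.linalg.solve(A, r)
        total += 0.5 * np.linalg.slogdet(A)[1] \
                 + (v + len(y) / 2.0) * np.log1p(H / (2.0 * omega))
    return total

def fit(y_list, Gamma_list, make_K, theta0, Lam, sigma2=1.0,
        v=2.0, omega=1.0, tol=1e-6, max_iter=50):
    """Alternate the b update and a numerical theta/sigma^2 update,
    stopping on the relative change of the objective (cf. the text)."""
    log_par, obj_old = np.log(np.append(theta0, sigma2)), None
    for _ in range(max_iter):
        theta, sigma2 = np.exp(log_par[:-1]), np.exp(log_par[-1])
        K_list = [make_K(theta, m) for m in range(len(y_list))]
        b = update_b(y_list, Gamma_list, K_list, sigma2, Lam)
        res = minimize(neg_loglik, log_par, method="Nelder-Mead",
                       args=(b, y_list, Gamma_list, make_K, v, omega))
        log_par = res.x
        if obj_old is not None and abs(res.fun - obj_old) < tol * abs(obj_old):
            break
        obj_old = res.fun
    return b, np.exp(log_par[:-1]), np.exp(log_par[-1])
```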
2.5
Information consistency
The common mean structure and its properties have been studied extensively in functional models; see Yao et al.[4], Yuan and Cai[15], Sun et al.[16], among others. Here we consider only information consistency. Let {\cal {X}}={\cal {X}}_1 \times {\cal{X}}_2 , where {\cal {X}}_1 and {\cal{X}}_2 are the spaces to which the covariates {\boldsymbol{z}}_{m}(t) and {\boldsymbol{x}}_{m}(\cdot,t) belong. Let p_{\sigma _{0}}({\boldsymbol{{y}}}_{{m}}|\tau_{0m},{\boldsymbol{{u}}}_{{m}}) be the density function generating the data {\boldsymbol{{y}}}_{{m}} given {\boldsymbol{{u}}}_{{m}} and \tau_{0m} , where \sigma_{0} and \tau_{0m} are the true values of \sigma and \tau_m , respectively. Let p_{{\boldsymbol{{\theta}}}}(\tau) be a measure of the random process \tau on the space \cal{F}=\{\tau(\cdot,\cdot): {\cal{X}} \rightarrow R\} . Let
\begin{array}{l} p_{{\boldsymbol{\theta}},\sigma_0}({\boldsymbol{{y}}}_{{m}}|{\boldsymbol{{u}}}_{{m}})=\displaystyle\int p_{\sigma_0}({\boldsymbol{{y}}}_{{m}}|\tau,{\boldsymbol{{u}}}_{{m}})\,{\rm{d}} p_{{\boldsymbol{\theta}}}(\tau) \end{array}
be the density function generating the data {\boldsymbol{{y}}}_{{m}} given {\boldsymbol{{u}}}_m under model (1). Let p_{\sigma_0,\hat{{\boldsymbol{\theta }}}}({\boldsymbol{{y}}}_{{m}}|{\boldsymbol{{u}}}_{{m}}) be the estimated density function. Denote
\begin{array}{l} D[p_{1}, p_{2}]=\displaystyle\int p_{1}({\boldsymbol{{y}}}) \log \dfrac{p_{1}({\boldsymbol{{y}}})}{p_{2}({\boldsymbol{{y}}})}\,{\rm{d}} {\boldsymbol{{y}}} \end{array}
as the Kullback-Leibler divergence between two densities p_{1} and p_{2} . According to Ref. [6], we only need to show that the Kullback-Leibler divergence between the two density functions of {\boldsymbol{{y}}}_{{m}}|{\boldsymbol{{u}}}_{{m}} under the true and assumed models tends to zero as n becomes large.
For information consistency of the parameter estimation, we need the following condition.
where \|\tau_{0m}\|_k is the reproducing kernel Hilbert space norm of \tau_{0m} associated with k(\cdot , \cdot ;{\boldsymbol{{\theta}}}) , {\boldsymbol{{K}}}_{{m}} is the covariance matrix of \tau_{0m} over {\boldsymbol{{u}}}_{{m}} , and {\boldsymbol{{I}}} is the n \times n identity matrix.
More details about Condition (A) can be found in Seeger et al.[17] and Wang et al.[5]; more on reproducing kernel Hilbert spaces can be found in Berlinet and Thomas-Agnan[18].
Proposition 2.1. Under the conditions in Lemma A.1 (Appendix) and condition (A), we have
\begin{array}{l} \dfrac{1}{n} E_{{\boldsymbol{{u}}}_{{m}}}\left(D[p_{\sigma _{0}}({\boldsymbol{{y}}}_{{m}}|\tau_{0m},{\boldsymbol{{u}}}_{{m}}),p_{\sigma_0,\hat{\boldsymbol{{\theta }}}}({\boldsymbol{{y}}}_{{m}}|{\boldsymbol{{u}}}_{{m}})]\right) \longrightarrow 0, \quad {\rm { as }} \quad n \rightarrow \infty, \end{array}
where the expectation is taken over the distribution of {\boldsymbol{{u}}}_{{m}} .
3.
Numerical results
3.1
Simulations
The performance of the proposed method is investigated through simulation studies. Simulation data are generated from the following model,
where {z}_{m}(\cdot) \sim {\rm{GP}}(h_1, k_1) with h_1(t) = t for t \in (0,1) and k_1({z}_{m}(t_1), {z}_{m}(t_2)) = g(t_1,t_2) = 0.1\exp\{-5(t_1-t_2)^2\} + 0.1t_1t_2 , and {x}_{m} (\cdot,\cdot) \sim {\rm{GP}}(h_2, k_2) with h_2(s,t) = t + \cos(s) for s,t \in (0,1) and k_2({x}_{m}(s_1,t),{x}_{m}(s_2,t)) = g(s_1,s_2) . Let {\boldsymbol{{\nu}}} = 1.0 , \theta_{10} = \theta_{12} =\theta_{21} = \theta_{22} = 0.1 , \theta_{11} = 10 , \sigma^2 = 0.5 , and let t and s each take 20 equally spaced points in (0,1). Consider four combinations of \tau_{m} and {\boldsymbol{{\beta}}}(s,t) :
S1: \tau_{m} \sim {\rm{GP}}(0, {\rm Cov}(\tau_{m}({\boldsymbol{{u}}}_{{m}}(t_{1})), \tau_{m}({\boldsymbol{{u}}}_{{m}}(t_{2})))) and {\boldsymbol{{\beta}}}(s,t) = (t^2 + \cos(s))/10 , for s,t \in (0,1) ;
S2: \tau_{m} \sim {\rm{GP}}(0, {\rm Cov}(\tau_{m}({\boldsymbol{{u}}}_{{m}}(t_{1})), \tau_{m}({\boldsymbol{{u}}}_{{m}}(t_{2})))), and {\boldsymbol{{\beta}}}(s,t) \;=\; \exp \{-(t^2 + s^2)\}/10, for s,t \in (0,1) ;
S3: \tau_{m} =0 and {\boldsymbol{{\beta}}}(s,t) = (t^2 + \cos(s))/10 , for s,t \in (0,1) ;
S4: \tau_{m} =0 and {\boldsymbol{{\beta}}}(s,t) = \exp\{-(t^2 + s^2)\}/10 , for s,t \in (0,1) .
We take sample sizes M =10, 20, and 30. All simulations are repeated 500 times.
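For reference, this data-generating mechanism can be reproduced directly; the sketch below simulates one response curve under setting S3. The discretization choices and the independence of x_m(\cdot, t) across t (which the setup leaves unspecified) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
t = (np.arange(n) + 0.5) / n            # 20 equally spaced points in (0, 1)
s = t.copy()

def g(a, b):                            # kernel g(t1, t2) from the setup
    return 0.1 * np.exp(-5.0 * (a - b) ** 2) + 0.1 * a * b

K = g(t[:, None], t[None, :]) + 1e-10 * np.eye(n)

def simulate_curve(beta, nu=1.0, sigma2=0.5):
    """One response curve under S3 (tau_m = 0)."""
    z = rng.multivariate_normal(t, K)   # z_m ~ GP(h1, k1) with h1(t) = t
    # x_m(s, t): mean t + cos(s); each t-column drawn as an independent
    # GP in s with kernel g (independence across t is our assumption).
    x = (t[None, :] + np.cos(s)[:, None]
         + rng.multivariate_normal(np.zeros(n), K, size=n).T)
    integral = np.trapz(x * beta(s[:, None], t[None, :]), s, axis=0)
    return nu * z + integral + rng.normal(0.0, np.sqrt(sigma2), n)

y = simulate_curve(lambda ss, tt: (tt ** 2 + np.cos(ss)) / 10.0)   # S3 beta
```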
To show the robustness of model (1) with the random effect having an ETP prior (denoted ETPR), we also fit model (1) with the random effect having a Gaussian process prior (denoted GPR). Two indices, the prediction error (PE) and the absolute bias (AB),
are applied to compare the performance of ETPR and GPR, where \hat{f}(t)={\boldsymbol{{z}}}_{{m}}^{{\top}}(t)\hat{{\boldsymbol{{\nu}}}}+\int_{0}^{1} {\boldsymbol{{x}}}_{{m}}^{{\top}}(s,t)\hat{{\boldsymbol{{\beta}}}}(s,t){\rm{d}}s+ \hat{\tau}_{m}({\boldsymbol{{z}}}_{{m}}(t), {\boldsymbol{{x}}}_{{m}}(\cdot, t)) is an estimator of the true regression function f_0(t)= {\boldsymbol{{z}}}_{{m}}^{{\top}}(t){\boldsymbol{{\nu}}}_0+\int_{0}^{1} {\boldsymbol{{x}}}_{{m}}^{{\top}}(s,t){\boldsymbol{{\beta}}}_0(s,t){\rm{d}}s+\tau_{0m}({\boldsymbol{{z}}}_{{m}}(t), {\boldsymbol{{x}}}_{{m}}(\cdot, t)) . To assess robustness, one curve is randomly selected and contaminated with an extra disturbance \delta t_3 , where t_3 denotes the Student t distribution with 3 degrees of freedom. Table 1 presents the PE and AB of the two methods. ETPR has smaller PE and AB than GPR, especially when \delta = 1.0 and the sample size is small, which shows that the proposed ETPR method is more robust to outliers than GPR.
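The displayed definitions of PE and AB are not reproduced above; a natural reading, which the following sketch assumes, is the mean squared and mean absolute deviation of \hat{f} from f_0 over all curves and time points.

```python
import numpy as np

def pe_ab(f_hat, f_true):
    """Assumed definitions over all M curves and n time points:
    PE = mean((f_hat - f_true)^2), AB = mean(|f_hat - f_true|)."""
    diff = np.asarray(f_hat) - np.asarray(f_true)
    return np.mean(diff ** 2), np.mean(np.abs(diff))
```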
Table 1. PE and AB of prediction from the ETPR and GPR methods, where SDs are presented in parentheses.
In addition, we consider a constant disturbance for the abnormal curves with small sample sizes 10 and 20. Tables 2 and 3 present the PE and AB of prediction from the ETPR and GPR methods with one and two curves disturbed, respectively. Again, ETPR performs better in prediction than GPR.
Table 2. PE and AB of prediction from the ETPR and GPR methods with one curve disturbed by the constant 1.0, where SDs are presented in parentheses.
Table 3. PE and AB of prediction from the ETPR and GPR methods with two curves disturbed by the constant 1.0, where SDs are presented in parentheses.
3.2
Real data example
The proposed method is applied to the Canadian weather data, obtained from the R package fda. We aim to study the fixed effect of temperature on precipitation through the common temperature effect of stations in the same region, and the random effect of temperature on precipitation through the individual effect of each station. The 35 stations are divided into four regions: Arctic, Atlantic, Pacific and Continental. There clearly exists heterogeneity among the stations due to the spatial nature of the weather data. We therefore propose the following model:
where {y}_{ij}(t) and {x}_{ij}(t) represent the precipitation and temperature, respectively, at time t for the j th station in region i . In this model, {z}_{ij}(t) = 1 and {x}_{ij}(s,t) = {x}_{ij}(s) , which simplifies the model fit.
Figs. 1 and 2 show the random and fixed effects for the four regions (Arctic, Atlantic, Pacific and Continental) from the proposed method. The random effects show that each station in the same region has a different temperature effect on precipitation. To compare the predictive performance of ETPR with GPR, 10-fold cross-validation is used to compute the mean squared prediction errors, which are 0.310 for ETPR and 0.314 for GPR. This indicates that ETPR performs slightly better in prediction.
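The comparison can be organized as curve-level cross-validation: hold out whole stations, fit on the remainder, and average the squared prediction errors. A schematic sketch in which fit and predict stand in for the estimation and prediction steps of Section 2 (both are placeholders, not library functions).

```python
import numpy as np

def cv_mspe(y_curves, fit, predict, n_folds=10, seed=0):
    """Curve-level K-fold CV. y_curves: list of observed curves (one per
    station); fit(train_idx) returns a fitted model; predict(model, i)
    returns fitted values for held-out curve i."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_curves))
    errs = []
    for fold in np.array_split(idx, n_folds):
        held = set(fold.tolist())
        model = fit([i for i in idx if i not in held])
        for i in fold:
            errs.append(np.mean((y_curves[i] - predict(model, i)) ** 2))
    return float(np.mean(errs))
```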
Figure 1. Random and fixed effects of the model using ETPR for the Arctic and Atlantic regions.
Figure 2. Random and fixed effects of the model using ETPR for the Continental and Pacific regions.
4.
Conclusions
A function-on-function random-effects model with extended t-process priors is developed in this paper to analyze functional data that may include outliers. The proposed model is flexible and includes various kinds of functional models, such as the function-on-function linear model[2] and the historical functional regression model[7], as special cases. The proposed extended t-process model is not only robust against outliers but also inherits almost all the nice properties of Gaussian process regression, such as a closed form for prediction and a convenient computational procedure. An estimation procedure and computing algorithm are developed to estimate the parameters and predict the random effects in the regression model. The functional response considered in this paper is one-dimensional; in practice, a functional multi-response may consist of several correlated curves. Extending the proposed method to multi-response functional data is of interest and will be studied in our future work.
Appendix
Lemma A.1. Let \omega=v-1 . Under model (1), assume that the {\boldsymbol{{y}}}_{{m}} are independently sampled, the covariance kernel function k is bounded and continuous in the parameter {\boldsymbol{{\theta}}} , and \hat{{\boldsymbol{\theta}}} converges to {\boldsymbol{{\theta}}} as n \rightarrow \infty . Then, for a positive constant c and any \varepsilon>0 , when n is large enough, we have
where q_m^{2}=({\boldsymbol{{y}}}_m-{\boldsymbol{{c}}}_{{0m}}-{\boldsymbol{{\tau}}}_{{0m}})^{\top}({\boldsymbol{{y}}}_{{m}}-{\boldsymbol{{c}}}_{{0m}}-{\boldsymbol{{\tau}}}_{{0m}}) / \sigma_{0}^2 , {\boldsymbol{{c}}}_{{0m}} is the true value of {\boldsymbol{{c}}}_{{m}} , \|\tau_{0m}\|_k is the reproducing kernel Hilbert space norm of \tau_{0m} associated with k(\cdot , \cdot ;{\boldsymbol{{\theta}}}) , {\boldsymbol{{K}}}_{{m}} is the covariance matrix of \tau_{0m} over {\boldsymbol{{u}}}_{{m}} , and {\boldsymbol{{I}}} is the n \times n identity matrix.
Proof of Lemma A.1. Assume r is a random variable following the inverse gamma distribution {\rm{IG}}(v,(v-1)) . Conditional on r , we have
where {\rm{GP}}(h,k) stands for a Gaussian process with mean function h and covariance function k . Then, conditional on r_m , the extended t-process regression model y_m=c_m+\tau_m+\varepsilon_m becomes the Gaussian process regression model
where \tilde{\tau}_m=\tau_m|r_m \sim {\rm{G P}}(0, r_m k(\cdot, \cdot; {\boldsymbol{\theta}})) , \tilde{\varepsilon}_m=\varepsilon_m| r_m \sim {\rm{G P}}(0, r_m \sigma^2\delta_{\varepsilon}) , and \tilde{\tau}_m and \tilde{\varepsilon}_m are independent. Denote by \tilde{p} the conditional probability density given r_m . Let
where \tilde{p}_{\boldsymbol{\theta}} is the measure induced by the Gaussian process {\rm{G P}}(0, r_m k(\cdot,\cdot ; \hat{\boldsymbol{\theta}})) . Note that the variable r is independent of {\boldsymbol{{u}}}_{{m}} . We can show that
Proof of Proposition 2.1. Clearly, q_m^{2}=({\boldsymbol{{y}}}_m-{\boldsymbol{{c}}}_{{0m}}-{\boldsymbol{{\tau}}}_{{0m}})^{\top}({\boldsymbol{{y}}}_m-{\boldsymbol{{c}}}_{{0m}}-{\boldsymbol{{\tau}}}_{{0m}}) / \sigma_{0}^2=O(n) . Under the conditions of Lemma A.1 and Condition (A), by Lemma A.1, for a positive constant c and any \varepsilon>0 , when n is large enough, we have
Acknowledgments
We thank the reviewers for their insightful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China (11971457), the Anhui Provincial Natural Science Foundation (1908085MA06), and the Fundamental Research Funds for the Central Universities (WK2040000035).
Conflict of interest
The authors declare that they have no conflict of interest.
References
[1]
Wang Z, Noh M, Lee Y, et al. A general robust t-process regression model. Computational Statistics and Data Analysis,2021, 154: 107093. DOI: 10.1016/j.csda.2020.107093
[2]
Yuan M, Cai T T. A reproducing kernel Hilbert space approach to functional linear regression. The Annals of Statistics,2010, 38 (6): 3412–3444. DOI: 10.1214/09-AOS772
[3]
Wang Z, Shi J Q, Lee Y. Extended t-process regression models. Journal of Statistical Planning and Inference,2017, 189: 38–60. DOI: 10.1016/j.jspi.2017.05.006
[4]
Seeger M W, Kakade S M, Foster D P. Information consistency of nonparametric Gaussian process methods. IEEE Transactions on Information Theory,2008, 54: 2376–2382. DOI: 10.1109/TIT.2007.915707
[5]
Zhang Y, Yeung D Y. Multi-task learning using generalized t-process. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Cambridge, MA: PMLR, 2010: 964–971.
[6]
Yao F, Müller H G, Wang J L. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association,2005, 100: 577–590. DOI: 10.1198/016214504000001745
[7]
Wang B, Shi J Q. Generalized Gaussian process regression model for non-Gaussian functional data. Journal of the American Statistical Association, 2014, 109: 1123–1133. DOI: 10.1080/01621459.2014.889021
[8]
Shi J Q, Choi T. Gaussian Process Regression Analysis for Functional Data. Boca Raton, FL: CRC Press, 2011.
[9]
Wang Z, Ding H, Chen Z, et al. Nonparametric random effects functional regression model using Gaussian process priors. Statistica Sinica,2021, 31: 53–78. DOI: 10.5705/ss.202018.0296
[10]
Yu S, Tresp V, Yu K. Robust multi-task learning with t-processes. In: Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007: 1103–1110.
[11]
Berlinet A, Thomas-Agnan C. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Berlin: Springer Science & Business Media, 2011.
[12]
Malfait N, Ramsay J O. The historical functional linear model. Canadian Journal of Statistics,2003, 31: 115–128. DOI: 10.2307/3316063
[13]
Sun X, Du P, Wang X, et al. Optimal penalized function-on-function regression under a reproducing kernel Hilbert space framework. Journal of the American Statistical Association,2018, 113 (524): 1601–1611. DOI: 10.1080/01621459.2017.1356320
[14]
Shah A, Wilson A, Ghahramani Z. Student-t processes as alternatives to Gaussian processes. In: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics. Cambridge, MA: PMLR, 2014: 877–885.
[15]
Gervini D. Dynamic retrospective regression for functional data. Technometrics,2015, 57: 26–34. DOI: 10.1080/00401706.2013.879076
[16]
Ramsay J O, Silverman B W. Functional Data Analysis. New York: Springer, 2005.
[17]
Ramsay J O, Dalzell C. Some tools for functional data analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology),1991, 53: 539–572. DOI: 10.1111/j.2517-6161.1991.tb01844.x
[18]
Yao F, Müller H G, Wang J L. Functional linear regression analysis for longitudinal data. The Annals of Statistics,2005, 33: 2873–2903. DOI: 10.1214/009053605000000660