Zhishui Hu received his Ph.D. degree in Statistics from the University of Science and Technology of China (USTC) in 2005. He is currently an Associate Professor with the USTC. He is mainly engaged in limit theory. Related research results were published in Annals of Statistics, Journal of Applied Probability, Electronic Journal of Probability, Science China Mathematics, and other academic journals
Liang Dong received his Ph.D. degree in Statistics from the University of Science and Technology of China in 2022. He is currently a lecturer with Nanjing University of Science and Technology. His research mainly focuses on limit theory in random graphs
Consider a rooted N-ary tree. To each of its vertices, we assign an independent and identically distributed continuous random variable. A vertex is called accessible if the assigned random variables along the path from the root to it are increasing. We study the number CN,k of accessible vertices of the first k levels and the number CN of accessible vertices in the N-ary tree. As N→∞, we obtain the limit distribution of CN,βN as β varies from 0 to +∞ and the joint limiting distribution of (CN,CN,αN+t√αN) for 0<α⩽ and t\in \mathbb{R} . In this work, we also obtain a weak law of large numbers for the longest increasing path in the first n levels of the N -ary tree for fixed N .
Graphical abstract
The overall framework of our accessibility percolation on a rooted tree.
Abstract
Consider a rooted N -ary tree. To each of its vertices, we assign an independent and identically distributed continuous random variable. A vertex is called accessible if the assigned random variables along the path from the root to it are increasing. We study the number C_{N,\,k} of accessible vertices of the first k levels and the number C_N of accessible vertices in the N -ary tree. As N\rightarrow \infty , we obtain the limit distribution of C_{N,\, \beta N} as \beta varies from 0 to +\infty and the joint limiting distribution of (C_{N}, C_{N,\,\alpha N+t \sqrt{\alpha N}}) for 0 < \alpha\leqslant 1 and t\in \mathbb{R} . In this work, we also obtain a weak law of large numbers for the longest increasing path in the first n levels of the N -ary tree for fixed N .
Public Summary
Several limit theorems for the number of accessible vertices on an N-ary tree are established.
The law of large numbers of the length of longest increasing paths on an N-ary tree is proved.
For N \in \mathbb{N} , let \mathbb{T}^{(N)} be a rooted N -ary tree, in which each vertex has exactly N children. The root of \mathbb{T}^{(N)} is denoted by o . Each vertex \sigma \in \mathbb{T}^{(N)} is assigned a continuous random variable X_{\sigma} , called its fitness. The fintness values \{X_{\sigma}, \sigma \in \mathbb{T}^{(N)}\} are independent and identically distributed (i.i.d.) random variables. Let |\sigma| denote the graph distance from the root o to \sigma . For any \sigma \in \mathbb{T}^{(N)} with |\sigma|=k \in \mathbb{N} , there is a unique path P from o to \sigma :
This model is called accessibility percolation by Nowak and Krug[1].
The accessibility percolation model is inspired by evolutionary biology. Each vertex represents one genotype that has an associated fitness value. A particular genotype gives rise to N new genotypes through mutations, which either replace the original wild genotype or disappear. In the “strong selection, weak mutation” (SSWM) regime, only mutations which reproduce a fitter genotype can replace the wild genotype and survive. Thus, a survival mutation path is one with increasing fitness values. In this paper, we use the House of Cards (HoC) model (see, for instance, Refs. [2, 3]), in which all fitness values are independent and identically continuously distributed.
The accessibility percolation on N -ary tree \mathbb{T}^{(N)} has been studied by many scholars, and most of them concentrated on limit properties of the number of accessible vertices with level k\in \mathbb{N} , which can be written as
where we sum over all vertices \sigma with |\sigma|=k in the N -ary tree.
Since we only care about whether the fitness values along a path are in increasing order, as long as the random variables are continuous, changing the precise distribution will not influence the results. Without loss of generality, we assume throughout this work that all the random variables \{X_{\sigma}: \sigma\in \mathbb{T}^{(N)}\} are independent and uniformly distributed on [0, 1]. For any x \in [0,1] , we introduce the probability measure under the condition that the fitness value of the root X_o=x is given, i.e.,
and denote by \mathbb{E}_x the expectation with respect to \mathbb{P}_x .
Nowak and Krug[1], Roberts and Zhao[4], Chen[5], and Duque et al.[6] studied the probability \mathbb{P}_0(Z_{N,k}\ge 1) . Roberts and Zhao[4] proved that
He also obtained the asymptotic behaviors of Z_{N,\alpha N} as N \to \infty . Assume that 0<\alpha <1 and x>0 . Then, under \mathbb{P}_{1-\alpha+x/N} ,
where m_{N,\alpha}=(\alpha N)^{\alpha N}/(\alpha N)! and Z is an exponential variable with mean 1.
The numbers of increasing paths on other graphs have also been widely studied. Coletti et al.[7] considered infinite spherically symmetric trees, Hegarty and Martinsson[8], Berestycki et al.[9, 10], and Li[11] studied the N -dimensional binary hypercube \{0,1\}^N . We refer to Krug[12] and reference therein for more related models and results. Hu et al.[13] studied the number of accessible vertices on random rooted labelled trees.
The remainder of this paper is organized as follows. In Section 2, main results are stated. The proofs of Theorems 2.1–2.3 are provided in Section 3. Finally, we present the proof of Theorem 2.4 in Section 4.
2.
Main results
We first consider the number of accessible vertices in the first k levels, i.e.,
\begin{eqnarray} C_{N,\,k}:=\sum_{|\sigma|\leqslant k} \mathbb{I}(\sigma \rm{ is accessible})=\sum_{j=0}^kZ_{N,j}, \end{eqnarray}
(4)
where we sum over all nodes \sigma with |\sigma|\leqslant k in the N -ary tree. Assume that x>0,\; \beta>0 and 0 < \alpha\leqslant 1 are real numbers and let Z be an exponential variable with mean 1.
Theorem 2.1. Under \mathbb{P}_{1-\alpha+x/N} , we have that as N\rightarrow \infty ,
We can also study the total number of accessible vertices in the N -ary tree \mathbb{T}^{(N)} , which can be written as
\begin{array}{*{20}{l}} C_{N}:=\displaystyle \sum\limits_{\sigma} \mathbb{I}(\sigma \rm{ is accessible}), \end{array}
where we sum over all nodes \sigma in the N -ary tree. For any x\in [0,1] , by noting that \mathbb{E}_x (C_N) ={\rm e}^{-N(1-x)} < \infty (see Lemma 3.2), we have C_N < \infty\; {\rm a.s.} \; \mathbb{P}_x. Furthermore, we obtain the joint limiting distribution of (C_{N}, C_{N,\,\alpha N+t \sqrt{\alpha N}}) as N \to \infty .
Theorem 2.2. For any 0 < \alpha\leqslant 1, \; x > 0 and t\in \mathbb{R} , we have that under \mathbb{P}_{1-\alpha+x/N} ,
where \varPhi(\cdot) is the cumulative distribution function of the standard normal distribution.
From Theorems 2.1 and 2.2, C_N and C_{N,\, \beta N} (with \beta>\alpha ) have the same limit distribution. It is natural to imagine that the number of accessible vertices at distances greater than \beta N from the root o is relatively small. In fact, the following Theorem 2.3 shows that under \mathbb{P}_{1-\alpha+x/N} , most of the accessible vertices concentrate on the levels from \alpha N-b_N \sqrt{N} to \alpha N+b_N \sqrt{N} for any sequence \{b_N\} with b_N\rightarrow \infty .
Theorem 2.3. Let \{b_N, N \geqslant 1\} be a sequence of real numbers with b_N\rightarrow \infty as N\rightarrow \infty . Then under \mathbb{P}_{1-\alpha+x/N} ,
Theorems 2.1–2.3 describe the distribution of the number of accessible vertices from the root o . Another interesting quantity is the maximum level of accessible vertices (i.e., the longest increasing path from o ) which is defined as
\begin{array}{*{20}{l}} M_N:=\max\{|\sigma|: \sigma \rm{ is accessible} \}. \end{array}
Noting that \{M_n\geqslant k\}=\{Z_{N,k}\geqslant 1\}, it follows from Eq. (2) that (under \mathbb{P}_0 )
In the related existing results on N -ary trees, it is generally assumed that N \rightarrow \infty . For fixed N , it is clear that M_N\leqslant C_N < \infty a.s.. In this case, we can consider the longest increasing path down the N -ary tree. We say that the path P=\sigma_0\sigma_1\cdots \sigma_k with length l(P)=k is a path down the tree if P starts at any vertex and descends into children until it stops at some node, i.e., |\sigma_0|=|\sigma_1|-1=\cdots=|\sigma_k|-k , and P is increasing if X_{\sigma_0}<X_{\sigma_1}<\cdots<X_{\sigma_k} . Let \mathbb{T}^{(N)}_n be the subgraph of \mathbb{T}^{(N)} induced by the set of vertices with levels not exceeding n . Define
Furthermore, let \sigma_1\wedge \sigma_2 denote the latest common ancestor of \sigma_1 and \sigma_2 . If |\sigma_1|=m+i, |\sigma_2|=m+j and |\sigma_1 \wedge \sigma_2|=m for some m\geqslant 0 and i,j\geqslant 0, then, for any 0\leqslant x\leqslant 1,
Furthermore, we have \mathbb{E}_x (C_N)={\rm e}^{N(1-x)} and \rm{Var}_x(C_{N})\leqslant {\rm e}^{2N(1-x)}.
Proof. Since C_{N,k}=\sum_{|\sigma|\leqslant k} \mathbb{I}(\sigma \rm{ is accessible}) and \#\{\sigma: |\sigma|= k\}= N^k , it follows from Eq. (6) that
Since C_{N,\,k}, k=1,2,\cdots, are nondecreasing with respect to k and \lim_{k\rightarrow \infty} C_{N,k}=C_N , we obtain that \rm{Var}_x(C_{N})\leqslant (\mathbb{E}_x C_{N})^2\leqslant {\rm e}^{2N(1-x)} by the monotone convergence theorem. The proof of Lemma 3.2 is completed.
Remark 3.1. For C_N , we can obtain the explicit expression of \rm{Var}_x (C_N) . By using arguments similar to those in (10) and (11), we have
Proof. For any N\in \mathbb{N} , we let X_{N1}, X_{N2},\cdots, X_{NN} denote a sequence of i.i.d. Poisson random variables with mean \alpha-x/N , then X_{N1}+\cdots+X_{NN} is also Poisson-distributed (with mean \alpha N-x ), i.e.,
\begin{array}{*{20}{l}} \lim\limits_{N\rightarrow \infty}\mathbb{P}(X_{N1}+\cdots+X_{NN}\leqslant \beta N+k)=\left\{ \begin{array}{ll} 0, & \rm{ if }\beta< \alpha;\\ 1/2, & \rm{ if }\beta=\alpha;\\ 1, & \rm{ if }\beta>\alpha. \end{array} \right. \end{array}
This, together with (12), proves Lemma 3.3.
For any k\geqslant 1, we let {\cal{F}}_k denote the available information of the first k generation on the N -ary tree, i.e., {\cal{F}}_k=\sigma \{X_{\sigma}: |\sigma|\leqslant k\}.
Lemma 3.4. For any 0 < \alpha\leqslant 1, \; \beta > 0,\; x>0 and \varepsilon>0 , we have
Proof. Let \sigma be a vertex on the N -ary tree. On the subtree rooted at \sigma , we let C_{N,\, k}(\sigma) be the number of accessible vertices in the first k generations. Then, for any k<\beta N ,
By noting that \{(C_{N,\, \beta N-k}(\sigma), X_{\sigma}), |\sigma|=k\} are i.i.d. random vectors and have the same distribution as (C_{N,\, \beta N-k}, X_{o}), we have
Let v_1, \dots, v_N be all the children of the root o , and define \varTheta_k(v_i) in the subtree rooted at v_i as in (15), i.e., \varTheta_{k}(v_i):=\mathbb{E}(C_{N,\,\beta N+k}(v_i) |{\cal{F}}_{k+1}), where C_{N,\,\beta N+k}(v_i) is the number of accessible vertices in the first \beta N+k generations of the subtree rooted at v_i . Then \{(\varTheta_k(v_i), X_{v_i}), i=1,\cdots,N\} are i.i.d. random vectors and have the same distribution as (\varTheta_k, X_{o}). By noting that
It follows from Eq. (16) and Lemma 3.3 that Eq. (18) holds true for k=0 . Now suppose that (18) holds for k\geqslant 0. By a change of variable in Eq. (17), we have
and then \lim_{N\rightarrow \infty}\theta_{k+1}(\lambda,1-\alpha+x/N,N) =Q_{k+1}(\lambda ,x) from Eq. (19). By induction, we conclude Eq. (18) for all k\geqslant 0.
If \beta<\alpha , it is clear that Q_k(\lambda, x)\equiv 1 for all \lambda>0 and x>0 . If \beta\geqslant \alpha, then, following the same arguments in the proof of Theorem 1 in Ref. [9] or Proposition B.2 in Ref. [5], we have
By noting that the generating function of {\rm e}^{-x}Z is \mathbb{E}(\exp\{-\lambda {\rm e}^{-x} Z\})=(1+\lambda {\rm e}^{-x})^{-1}, it follows from Lemma 3.5 that
Proof of Theorem 2.3. Note that for any t\in \mathbb{R} , C_{N, \,\alpha N-b_N \sqrt{N}}\leqslant C_{N,\,\alpha N+t \sqrt{\alpha N}} holds for sufficiently large N . By using Theorem 2.2, we have that under \mathbb{P}_{1-\alpha+x/N} ,
Similarly, for any t\in \mathbb{R} , C_N\geqslant C_{N, \alpha N+b_N \sqrt{N}}\geqslant C_{N,\alpha N+t \sqrt{\alpha N}} holds for sufficiently large N . It follows from Theorem 2.2 that under \mathbb{P}_{1-\alpha+x/N} ,
In this proof, N\geqslant 2 is a fixed positive integer.
Let \mathcal{P}_{n,\,k} be the set of paths down \mathbb{T}^{(N)}_n with length k . Define T_{n,\,k} to be the number of increasing paths in \mathcal{P}_{n,\,k}:
We say that two paths P and P' are vertex-disjoint if V(P)\cap V(P')=\emptyset , where V(P) and V(P') are the vertex sets of P and P' , respectively. For any P=x_0x_1\cdots x_k, \tilde{P}=\tilde{x}_0\tilde{x}_1\cdots\tilde{x}_k\in \mathcal{P}_{n,k} with |x_0|\leqslant |\tilde{x}_0|, if P' and P are not vertex-disjoint, then there exist integers l,\, m such that 0\leqslant l \leqslant m \leqslant k and x_i=\tilde{x}_{i-m+l} iff m-l\leqslant i\leqslant m. By Lemma 3.1, we have
Note that \mathbb{I}\, (P \; \rm{is increasing}) and \mathbb{I}\, (\tilde{P} \; \rm{is increasing}) are independent if P and \tilde{P} are vertex-disjoint. Thus,
For any \varepsilon>0 , we take k_n=(1+\varepsilon)n \log N/\log n and \tilde{k}_n=(1-\varepsilon)n \log N/\log n . Then, by applying Eq. (24), we have
We thank the anonymous referees for the useful suggestions that greatly improved the presentation of this work. This work was supported by the National Natural Science Foundation of China (11671373).
Conflict of interest
The authors declare that they have no conflict of interest.
Nowak S, Krug J. Accessibility percolation on n-trees. Europhysics Letters,2013, 101 (6): 66004. doi: 10.1209/0295-5075/101/66004
[2]
Kingman J F C. A simple model for the balance between selection and mutation. Journal of Applied Probability,1978, 15 (1): 1–12.
[3]
Kauffman S, Levin S. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology,1987, 128 (1): 11–45. doi: 10.1016/S0022-5193(87)80029-2
[4]
Roberts M I, Zhao L Z. Increasing paths in regular trees. Electronic Communications in Probability,2013, 18: 1–10. doi: 10.1214/ECP.v18-2784
Duque F, Roldán-Correa A, Valencia L A. Accessibility percolation with crossing valleys on n-ary trees. Journal of Statistical Physics,2019, 174 (5): 1027–1037. doi: 10.1007/s10955-019-02223-5
[7]
Coletti C F, Gava R J, Rodríguez P M. On the existence of accessibility in a tree-indexed percolation model. Physica A: Statistical Mechanics and its Applications,2018, 492: 382–388. doi: 10.1016/j.physa.2017.10.019
[8]
Hegarty P, Martinsson A. On the existence of accessible paths in various models of fitness landscapes. The Annals of Applied Probability,2014, 24 (4): 1375–1395. doi: 10.1214/13-AAP949
[9]
Berestycki J, Brunet E, Shi Z. The number of accessible paths in the hypercube. Bernoulli,2016, 22 (2): 653–680. doi: 10.3150/14-BEJ641
[10]
Berestycki J, Brunet E, Shi Z. Accessibility percolation with backsteps. ALEA Latin American Journal of Probability and Mathematical Statistics,2017, 14 (1): 45–62. doi: 10.30757/ALEA.v14-04
[11]
Li L. Phase transition for accessibility percolation on hypercubes. Journal of Theoretical Probability,2018, 31 (4): 2072–2111. doi: 10.1007/s10959-017-0769-x
[12]
Krug J. Accessibility percolation in random fitness landscapes. In: Probabilistic Structures in Evolution. Berlin: EMS Press, 2021: 1–22
[13]
Hu Z, Li Z, Feng Q. Accessibility percolation on random rooted labeled trees. Journal of Applied Probability,2019, 56 (2): 533–545. doi: 10.1017/jpr.2019.29
Nowak S, Krug J. Accessibility percolation on n-trees. Europhysics Letters,2013, 101 (6): 66004. doi: 10.1209/0295-5075/101/66004
[2]
Kingman J F C. A simple model for the balance between selection and mutation. Journal of Applied Probability,1978, 15 (1): 1–12.
[3]
Kauffman S, Levin S. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology,1987, 128 (1): 11–45. doi: 10.1016/S0022-5193(87)80029-2
[4]
Roberts M I, Zhao L Z. Increasing paths in regular trees. Electronic Communications in Probability,2013, 18: 1–10. doi: 10.1214/ECP.v18-2784
Duque F, Roldán-Correa A, Valencia L A. Accessibility percolation with crossing valleys on n-ary trees. Journal of Statistical Physics,2019, 174 (5): 1027–1037. doi: 10.1007/s10955-019-02223-5
[7]
Coletti C F, Gava R J, Rodríguez P M. On the existence of accessibility in a tree-indexed percolation model. Physica A: Statistical Mechanics and its Applications,2018, 492: 382–388. doi: 10.1016/j.physa.2017.10.019
[8]
Hegarty P, Martinsson A. On the existence of accessible paths in various models of fitness landscapes. The Annals of Applied Probability,2014, 24 (4): 1375–1395. doi: 10.1214/13-AAP949
[9]
Berestycki J, Brunet E, Shi Z. The number of accessible paths in the hypercube. Bernoulli,2016, 22 (2): 653–680. doi: 10.3150/14-BEJ641
[10]
Berestycki J, Brunet E, Shi Z. Accessibility percolation with backsteps. ALEA Latin American Journal of Probability and Mathematical Statistics,2017, 14 (1): 45–62. doi: 10.30757/ALEA.v14-04
[11]
Li L. Phase transition for accessibility percolation on hypercubes. Journal of Theoretical Probability,2018, 31 (4): 2072–2111. doi: 10.1007/s10959-017-0769-x
[12]
Krug J. Accessibility percolation in random fitness landscapes. In: Probabilistic Structures in Evolution. Berlin: EMS Press, 2021: 1–22
[13]
Hu Z, Li Z, Feng Q. Accessibility percolation on random rooted labeled trees. Journal of Applied Probability,2019, 56 (2): 533–545. doi: 10.1017/jpr.2019.29