Jie Liu is an Associate Professor at the University of Science and Technology of China (USTC). He received his Bachelor’s degree and Ph.D. degree in Statistics from USTC in 2003 and 2008, respectively. His major research interests include random networks, statistical modelling, and network analysis.
Jiayang Zhao is currently a Ph.D. student at the University of Science and Technology of China (USTC). His research interests focus on network data modelling and subgroup analysis.
Abstract
Social interaction with peer pressure is widely studied in social network analysis. Game theory can be utilized to model dynamic social interaction, and one class of game network models assumes that people’s decision payoff functions hinge on individual covariates and the choices of their friends. However, peer pressure would be misidentified and induce a non-negligible bias when incomplete covariates are involved in the game model. For this reason, we develop a generalized constant peer effects model based on homogeneity structure in dynamic social networks. The new model can effectively avoid bias through homogeneity pursuit and can be applied to a wider range of scenarios. To estimate peer pressure in the model, we first present two algorithms based on the initialize-expand-merge method and the polynomial-time two-stage method to estimate homogeneity parameters. Then we apply the nested pseudo-likelihood method and obtain consistent estimators of peer pressure. Simulation evaluations show that our proposed methodology can achieve desirable and effective results in terms of the community misclassification rate and parameter estimation error. We also illustrate the advantages of our model in the empirical analysis when compared with a benchmark model.
Public Summary
This paper introduces the Generalized Constant Peer Effect (GCPE) model, a novel network game model that quantifies social interactions within dynamic networks.
The proposed model can efficiently mitigate estimation inaccuracies related to peer pressure by integrating homogeneity.
We design innovative algorithms to accurately identify homogeneity. Then we apply the Nested Pseudo-Likelihood Estimation (NPLE) method to obtain consistent estimators of parameters.
1. Introduction
People’s behavior arising from individual interactions in social networks has been widely studied[1–5]. In dynamic social networks, an individual’s behavior is subject to peer pressure between connected nodes, i.e., the influence of partners on decision-making. For example, surveys on students’ smoking behavior usually record only whether a student smoked during a particular time period; peer pressure then refers to the influence of friends’ behavior on that student’s smoking behavior. Game theory is commonly utilized to describe social interactions in networks[6–8]. At the same time, those who make the same decision often share similar characteristics, which is known as homogeneity. The homogeneity structure is quite common in social network analysis[9–12]. Peer pressure is a type of social influence that is often confounded with homogeneity when discussing individual behavior; see Refs. [12, 13]. Ref. [14] studies a practical problem that takes homogeneity into account. However, the game-theoretic studies of peer pressure cited above do not consider the potential confounding between social influence and homogeneity, which may bias the estimation of social influence. There is some relevant work on distinguishing homogeneity from social influence; see Refs. [13, 15, 16]. To obtain more accurate estimates of peer pressure, it is therefore necessary to account for homogeneity in game models, and combining the two lines of work is worthwhile.
Game theory can be used to statistically model dynamic social interaction with peer pressure. One class of game network models assumes that people’s decision payoff functions depend on individual covariates and their friends’ decisions, and then derives the corresponding conditional choice probability equations from the payoff functions. Refs. [17, 18] discussed a dynamic game problem based on Markov games, but the focus of their research was not peer pressure. Nevertheless, their deduction approach for dynamic games can be applied to other models. Ref. [19] proposed the concept of “individual behavior influenced by others”. Ref. [20] proposed the concept of “local interaction” for the first time. Based on the above work, Ref. [7] proposed a static network game model with incomplete information to study the estimation of peer pressure. Ref. [8] expanded upon the model in Ref. [7], allowing peer pressure to depend on social influence, but the model remains limited to static scenarios. These studies did not take into account the effect of homogeneity on individual decision-making, which might bias the estimation of peer pressure when it is confounded with homogeneity. In addition, they implicitly assumed that the observed covariates were complete. This is difficult to guarantee in practice and can also lead to estimation bias.
Homogeneity pursuit is an effective way to avoid possible bias caused by incomplete covariates, where incomplete covariates imply the existence of unobserved node attributes related to the formation of social relationships or to node behavior[16]. People tend to interact with people who are similar to them, which implies that homogeneity plays a crucial role in the formation of social network structures and indeed influences individual behaviors[11]. Ref. [13] introduced a continuous latent space setting to estimate homogeneity and thereby correctly identify social influence. Ref. [15] used the potential community setting to estimate homogeneity. Both Refs. [13] and [15] noted that the estimation bias of social influence can be effectively reduced by considering homogeneity. Based on the above work, Ref. [16] proposed a linear model for explaining social interactions in homogeneous social networks and showed that the estimator of the model converges when the estimation of homogeneity meets certain conditions. However, people’s behavior is discrete in many scenarios, and the consistency of the corresponding parameter estimators no longer holds in our game network model. Consequently, the homogeneity estimation method is no longer applicable to the network corresponding to our model, and we need to redesign the estimation method for homogeneity. Moreover, Ref. [16] assumed that different individuals have the same influence on each other, which does not align well with reality.
This study introduces a new game model with individual homogeneity to characterize social interactions with peer pressure in dynamic networks. The new model is called the generalized constant peer effect (GCPE) model. To construct such a dynamic network model, we adopt the Markov perfect equilibrium as the equilibrium concept of the game to obtain an equilibrium solution and then give the conditional choice probability equation. The GCPE model assumes that discrete individual decisions are determined by the covariates unrelated to homogeneity, peer pressure, homogeneity, and random disturbance terms. It can be thought of as an extension of the game model in Ref. [8] and the model in Ref. [16]. We propose a simple, efficient, and accurate estimation procedure to estimate homogeneity and peer pressure. Specifically, because homogeneity cannot be directly observed, we first utilize two algorithms based on the initialize-expand-merge method and the polynomial-time two-stage method to estimate homogeneity parameters. Then we apply the nested pseudo-likelihood estimation (NPLE) method to efficiently obtain consistent estimators of peer pressure.
The remainder of this paper is organized as follows. In Section 2, we present the generalized constant peer effect (GCPE) model. Two algorithms for estimating homogeneity are given in Section 3. In Section 4, the NPLE method and the corresponding theoretical properties are presented. In Section 5, we demonstrate the estimation results of the homogeneity algorithms and validate the theoretical results of the NPLE estimator. Section 6 compares the results of our proposed model with a benchmark model in applications to real data. Conclusions are provided in Section 7. The proof details of theorems and some simulation results are given in the Appendix.
2. Generalized constant peer effects model
We consider a Markov game model in dynamic social networks. The dynamic network is a series of random graphs consisting of nodes with directed edges. Let i\in {\cal{I}}=\{1,\cdots,n\} denote each node. The relationship between nodes at time t\in {\cal{T}}=\{1,\cdots,m\} is represented by the adjacency matrix G^t\in \mathbb{R}^{n\times n}, where the (i,j) entry g^t_{ij}=1 if individual i thinks j is his/her best friend at time t, and g^t_{ij}=0 otherwise. We follow the convention of letting g^t_{ii}=0 for i\in {\cal{I}} and t\in {\cal{T}}. Let F^t_i=\{j\in {\cal{I}}:g^t_{ij}=1\} denote the group of i’s best friends at time t.
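As a small illustration of this notation, the following Python sketch builds one adjacency matrix for a toy network and reads off the friend sets. The nomination list and the helper names (`adjacency_from_nominations`, `friend_sets`) are hypothetical and chosen only for illustration; nodes are 0-indexed in the code.

```python
import numpy as np

def adjacency_from_nominations(n, nominations):
    """Build a directed 0/1 adjacency matrix from (i, j) best-friend nominations."""
    G = np.zeros((n, n), dtype=int)
    for i, j in nominations:
        if i != j:  # convention: g_ii = 0, so self-nominations are ignored
            G[i, j] = 1
    return G

def friend_sets(G):
    """Return F_i = {j : g_ij = 1} for every node i."""
    return [set(np.flatnonzero(row).tolist()) for row in G]

# Toy data (hypothetical): node 2 only nominates itself, so it ends up with no friends.
nominations = [(0, 1), (0, 2), (1, 0), (2, 2), (3, 0)]
G = adjacency_from_nominations(4, nominations)
F = friend_sets(G)
```

Because the edges are directed, node 0 nominating node 2 does not make node 0 a member of node 2's friend set.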
The latent community partition of each network is denoted by D=(D_1,\cdots,D_M), where M<n is the number of communities. Under the potential community setting, we refer to a class of multi-graph community identification methods, such as Refs. [21, 22], where a dynamic network is a kind of multi-graph network. This type of research considers community labeling to be global, not local. Specifically, the MIT Reality Mining data in Ref. [21] is a dynamic network dataset, but community identification is performed by identifying a global community label for it as a whole rather than a local community label for each moment of the network individually. Therefore, let {\boldsymbol{C}}_{i} be the homogeneity parameter of node i that does not change over time and takes the following value:
{\boldsymbol{C}}_{i}=d_j, \quad {\rm if}\ i\in D_j,\ j=1,\cdots,M,
where d_j is the community-specific parameter in community D_j .
In dynamic social networks, we record the social behavior of node i at time t\in (-\infty,+\infty) as Y^{t}_{i} \in \{ 0,1\} . Y^{t}_{i} can be the student’s decision to smoke or not as mentioned above. It is worth noting that our model introduced later also allows for multiple values of Y^{t}_{i} without loss of generality. Then the utility function of node i is given by
Here, {\boldsymbol{y}}^{t}_{-i} is the behavior vector of all nodes except node i. X^{t}_{i}\in \mathbb{R}^{d} is the covariate vector of individual i that is not relevant to the homogeneity parameter {{\boldsymbol{C}}}_i. Q^t_i is the total number of i’s best friends at time t; we set Q^t_i=1 if Q^t_i=0 to ensure that {1}/{Q^t_{i}} is well defined. S^t_{ji} is a social-influence measure of node j on node i at time t; {\bf{1}}(A) is the indicator function of a set A; \varepsilon^{t}_{i}(k)\in \mathbb{R} are unobserved action-dependent utility shocks, where k\in \{0,1\} . Let \varepsilon^{t}_{i}=\{\varepsilon^{t}_{i}(0), \varepsilon^{t}_{i}(1)\} . Moreover, \beta_{k} is the coefficient vector to be estimated for X^{t}_{i} , \gamma_{k} is the coefficient vector to be estimated for {\boldsymbol{C}}_i, and \alpha_{k} (\cdot) are unknown functions that measure the peer pressure on player i from his/her friend j when they make the same decision k\in \{0,1\} .
Under the condition of maximizing the total expected discounted payoff, we use a Markov game to derive the payoff function with a homogeneous structure as the following conditional choice probability equation, i.e., the GCPE model:
Here, \mathbb{W}^t=(X^t,{{\boldsymbol{C}}},G^t) is the state at time t, {\boldsymbol{C}}=({\boldsymbol{C}}_1,\cdots,{\boldsymbol{C}}_n) is the collection of homogeneity parameters of the nodes in the network, and θ is the parameter to be estimated; both θ and the total expected discounted payoff function are introduced in detail later. The GCPE model describes how individual i’s decision-making is influenced by his/her own factors and by his/her friends. The detailed derivation is described in Section 2.1.
2.1 Model derivation
Markov perfect equilibrium
In dynamic games, the adjacency matrix \mathbb{G}^t will change over time, and the state variables (X^{t}_{i},\varepsilon^{t}_{i}) will change as players make decisions. Therefore, their change is described by an exogenous probability. To solve for Markov perfect equilibrium solutions, we further make the assumption of conditional independence of their changes:
Assumption 2.1. The exogenous probability function satisfies conditional independence, that is,
This assumption is quite common for Markov games, such as Refs. [17, 18]. Player i pursues the highest expected return, usually demonstrating long-term consideration in the Markov game. Let \eta \in (0,1) denote the discount factor. Given the state (X^{t}_{i},\varepsilon^{t}_{i}) at time t, player i will make a series of decisions to maximize its total expected discounted payoff, i.e.
where the expectation is taken over the state evolution given in Eq. (2).
Because the state transition is stationary, we adopt the Markov perfect equilibrium as the equilibrium concept, and thus the time index t can be omitted from the notation. Let P_i (y_i|\mathbb{W}) be the conditional selection probability of player i choosing decision y_i\in \{ 0,1\} in state \mathbb{W}=(X,{{\bf{C}}},\mathbb{G}) . Given P_j (y_j|\mathbb{W}),\ \forall j \ne i , the current expected payoff of player i choosing decision y_i in state \mathbb{W} is
Bellman optimality. Let V_i(x) be the conditional expectation function of player i in state x. For any i\in {\cal{I}} and \mathbb{W} \in {\cal{W}} , we have
where f_{{\cal{W}}}^P(\mathbb{W}'|\mathbb{W}) is a conditional density function of state \mathbb{W}' when given \mathbb{W} , and e^P_i(y_i,\mathbb{W}) is a function related to the distribution of \varepsilon_i(k) .
Conditional choice probability equation. Let v_i(y_i|\mathbb{W}) be the conditional choice-specific expected value function of player i, then
where f_{{\cal{W}}}^P(\mathbb{W}'|\mathbb{W},y_i) is a conditional density function.
When the private shocks \varepsilon_{i} are observed, player i makes decision y_i=k if and only if k \in \arg\max_{k \in \{0,1\}}\{v_i(y_i=k|\mathbb{W})+ \varepsilon_{i} (y_i=k)\}. Therefore, we have
To derive the above equation more specifically, we give the following assumption about the distribution of private shocks \varepsilon_{i}^{t}(k) .
Assumption 2.2. The error terms \varepsilon_{i}^t(k),\ i\in \{ 1, \cdots, n\}, k \in \{ 0, 1\} are independent and identically distributed {\rm (i.i.d.)} across both players and actions at time t. Furthermore, the error term follows the extreme value distribution (Type Ⅰ) with density
f(t)=\exp{(-t)}\exp{(-\exp{(-t)})}.
Assumption 2.2 is often used in game network models[7, 8, 17, 18]. Note that the Gaussian distribution and other distributions can also be applied to derive the conditional choice probability equation, but the corresponding formulation will be more sophisticated than that of the extreme value distribution (Type I). Under Assumption 2.2, we can derive a simple form of P_i(y_i=k|\mathbb{W}) , that is
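The role of Assumption 2.2 can be checked numerically. With i.i.d. Type I extreme value shocks, the probability that action 1 maximizes v(k) + ε(k) is the familiar binary logit expression exp(v(1)) / (exp(v(0)) + exp(v(1))). The sketch below compares that closed form with a simulation; the values `v0` and `v1` are hypothetical stand-ins for the choice-specific values, and the function names are illustrative.

```python
import numpy as np

def logit_choice_prob(v0, v1):
    """Closed form P(y = 1) = exp(v1) / (exp(v0) + exp(v1))."""
    return 1.0 / (1.0 + np.exp(v0 - v1))

def simulated_choice_prob(v0, v1, draws=200_000, seed=0):
    """Share of draws in which action 1 maximizes v(k) + eps(k)."""
    rng = np.random.default_rng(seed)
    # Standard Gumbel draws have density exp(-t) * exp(-exp(-t)), as in Assumption 2.2.
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(draws, 2))
    return float(np.mean(v1 + eps[:, 1] > v0 + eps[:, 0]))

v0, v1 = 0.3, 1.1  # hypothetical choice-specific values
p_closed = logit_choice_prob(v0, v1)
p_sim = simulated_choice_prob(v0, v1)
```

The two numbers agree up to Monte Carlo error, which is why the extreme value assumption gives a much simpler choice probability than, say, Gaussian shocks would.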
The following assumption is imposed to simplify Eq. (4).
Assumption 2.3. f_{{\cal{W}}}^P(\mathbb{W}'|\mathbb{W},y_i) is independent of y_i .
Since homogeneity replaces the attribute variables X in state \mathbb{W} that are relevant to the players’ decisions, Assumption 2.3 is reasonable. Under Assumption 2.3, we further simplify Eq. (4) and continue to use the symbol t. The final solution yields our proposed GCPE model:
For the sake of tractability, we linearize \alpha(\cdot) for empirical analysis as in Refs. [7, 8], letting \alpha_0(s)=\varphi_0+\varphi_1 \times s and \alpha_1(s)=\phi_0+\phi_1 \times s . For model identification, we set \beta_0 = 0 , \gamma _0 = 0 and \varphi_0=0 . By following Ref. [8], we can treat P_i^t(y_i^t=1|\mathbb{W}^t) as a known object. Letting T^t_i=\ln P_i^t(y_i^t=1|\mathbb{W}^t)- \ln(1-P_i^t(y_i^t=1|\mathbb{W}^t)), we obtain
where \varphi_1 represents the peer pressure effect exerted by node i’s friends whose decisions differ from node i’s when the social influence of nodes in the network differs. \phi_0 quantifies the peer pressure effect of node i’s friends whose decisions are the same as node i’s when the social influence of all nodes is equal. \phi_1 refers to the peer pressure effect of node i’s friends whose decisions are the same as those of node i when nodal social influence remains fixed in the network.
Our target parameters can be denoted by \theta = (\beta'_1, \varphi_1, \phi_0, \phi_1, \gamma'_1)^{'} \in \varTheta, where Θ is the parameter space. Let \theta_0 \in \varTheta be the true parameter for the game.
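To see why the log-odds T^t_i is a convenient object to work with, it helps to write it out for the binary logit probabilities that Type I extreme value shocks imply. The block below is a sketch under that assumption (the logit form is the textbook consequence of Assumption 2.2, and time superscripts are dropped); v_i(y_i=k|\mathbb{W}) is the choice-specific value defined above.

```latex
\begin{aligned}
T_i &= \ln P_i(y_i=1|\mathbb{W}) - \ln P_i(y_i=0|\mathbb{W}) \\
    &= \ln\frac{\exp\{v_i(y_i=1|\mathbb{W})\}}{\sum_{k\in\{0,1\}}\exp\{v_i(y_i=k|\mathbb{W})\}}
     - \ln\frac{\exp\{v_i(y_i=0|\mathbb{W})\}}{\sum_{k\in\{0,1\}}\exp\{v_i(y_i=k|\mathbb{W})\}} \\
    &= v_i(y_i=1|\mathbb{W}) - v_i(y_i=0|\mathbb{W}).
\end{aligned}
```

Only the difference between the two choice-specific values enters the log-odds. This is consistent with the need for normalizations such as \beta_0 = 0 and \gamma_0 = 0 : components that shift both actions equally cannot be recovered from choice data.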
3. Estimation of homogeneity
As stated in Ref. [16], we cannot directly observe the homogeneity parameter {{\bf{C}}}_i of player i. Thus we turn to utilizing the attribute covariates of player i and its social relationships to approximately estimate the homogeneity parameter {{\bf{C}}}_i . {\hat{{\bf{C}}}} denotes the estimated homogeneity vector of {{\bf{C}}} .
We follow the idea of Ref. [23] and define a weighted adjacency matrix W as follows:
where f(X_i,X_j) is a function that takes values between 0 and 1, and 0\leqslant \alpha \leqslant 1 is a constant. Because the number of edges in the weighted directed network is too large, we delete the edges with smaller weights to ensure that our proposed algorithms can work. Specifically, we set W_{ij}=0 if W_{ij}<\omega_{0.625} where \omega_{0.625} is the 0.625 quantile of W. Let F^{W}_i=\{j \in {\cal{I}}:W_{ij}^t>0\} , and F^{W}=\{F^{W}_1,\cdots,F^{W}_n\} .
For community identification of homogeneity parameters, we first present a revised version of the IEM algorithm in Ref. [24]. We use Sim_{ij}^{\rm IEMC} to replace Sim_{ij}^{\rm IEM}, where Sim_{ij}^{\rm IEMC} follows the pattern of the directed network formation corresponding to the GCPE model. In addition, the IEM algorithm pulls j into the community of i when j={\rm argmax}_{j\neq i} Sim_{ij}^{\rm IEMC}, whereas we instead pull i into j’s community, which is more consistent with our model and can effectively improve algorithm efficiency. In summary, our proposed Algorithm 1 (initialize-expand-merge-change method) can be applied to weighted directed networks.
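The pruning step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the 0.625 quantile is taken over all entries of W (the text does not say whether the diagonal is excluded), and the function names and the random toy matrix are hypothetical.

```python
import numpy as np

def prune_weak_edges(W, q=0.625):
    """Set W_ij = 0 whenever W_ij is below the q-quantile of all entries of W."""
    W = np.asarray(W, dtype=float)
    cutoff = np.quantile(W, q)  # assumption: quantile over every entry, diagonal included
    return np.where(W < cutoff, 0.0, W), cutoff

def weighted_friend_sets(W):
    """F^W_i = {j : W_ij > 0} for each node i."""
    return [set(np.flatnonzero(row > 0).tolist()) for row in W]

# Toy weighted directed network (hypothetical weights in [0, 1)).
rng = np.random.default_rng(1)
W = rng.uniform(size=(6, 6))
np.fill_diagonal(W, 0.0)
W_pruned, cutoff = prune_weak_edges(W)
F_W = weighted_friend_sets(W_pruned)
```

Roughly 37.5% of the entries survive the cut, so the community detection step runs on a much sparser graph.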
7: Merge community i and community j and record this new community collection as C_{-ij}, then calculate Q_{-ij}=Q(C_{-ij})
8: end for
9: if \max_{C_{-ij}}(Q_{-ij})>Q(C), let C={\rm argmax}_{C_{-ij} }(Q_{-ij}) and go back to Step 5; else return \sigma^0(i)=C(i) for all i\in{\cal{I}} and k=|C|.
Although Algorithm 1 is able to estimate the communities of the homogeneity parameters very well, there is a more appealing algorithm in Ref. [25] that bounds the probability of misclassification, which is of order {\rm e}^{-O(n)}. However, that algorithm is only suitable for undirected networks and thus cannot be directly used in our GCPE model. To further improve the accuracy of community identification, we design a modified cross-validation version of Algorithm 1 and introduce Algorithm 2. Compared with Algorithm 1, Algorithm 2 simplifies and changes the initial screening Steps 2–4 in Algorithm 1 at the cost of heavy computation. Based on the idea of Ref. [25], we present our Algorithm 2.
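The merge phase in Steps 7–9 can be sketched as a greedy loop: try every pairwise merge, keep the one that raises Q the most, and stop when no merge improves Q. The excerpt does not define Q, so the sketch assumes it is the modularity of a weighted directed network; that choice, the function names, and the toy graph are assumptions made for illustration only.

```python
import itertools
import numpy as np

def directed_modularity(W, labels):
    """Modularity of a weighted directed graph (assumed form of the quality function Q)."""
    W = np.asarray(W, dtype=float)
    m = W.sum()
    k_out, k_in = W.sum(axis=1), W.sum(axis=0)
    same = labels[:, None] == labels[None, :]
    return float(((W - np.outer(k_out, k_in) / m) * same).sum() / m)

def greedy_merge(W, labels):
    """Repeatedly apply the best pairwise merge while it increases Q (Steps 7-9)."""
    labels = np.asarray(labels).copy()
    q_current = directed_modularity(W, labels)
    while True:
        communities = np.unique(labels)
        best_q, best_pair = q_current, None
        for a, b in itertools.combinations(communities, 2):
            trial = np.where(labels == b, a, labels)  # candidate collection C_{-ab}
            q_trial = directed_modularity(W, trial)
            if q_trial > best_q + 1e-12:
                best_q, best_pair = q_trial, (a, b)
        if best_pair is None:                         # no merge improves Q: stop
            return labels, len(communities)
        a, b = best_pair
        labels[labels == b] = a
        q_current = best_q

# Toy graph (hypothetical): two fully connected groups with no edges between them.
W = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
labels, k = greedy_merge(W, np.arange(6))
```

Starting from singleton communities, the loop recovers the two groups and then stops, because merging them would lower the assumed quality function.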
Algorithm 2. A refinement scheme for community detection.
Input: Weighted adjacency matrix W, number of communities k, initial community detection method \sigma^0 (i.e., Algorithm 1).
Output: Community assignment \tilde{\sigma}.
1: for i = 1 to n do
2: Apply \sigma^0 on W_{-i} for all j \ne i, obtain \sigma^0_i(j) and \sigma^0_i(i)=0, where W_{-i} denotes the (n-1)\times (n-1) submatrix of W with its i{\rm th} row and i{\rm th} column removed.
3: Define \tilde{\sigma}_i:[n]\ \to\ [k] by setting \tilde{\sigma}_i(j)=\sigma^0_i(j) for all i \ne j where [n]=\{1,\cdots,n\}, and
Assumption 3.1 is reasonable for homogeneity estimation. Refs. [26, 27] pointed out that it is possible to consistently recover the communities for networks based on the latent community setting when very mild regularity conditions are satisfied. The homogeneity structure in our proposed GCPE model follows the latent community setting. In addition, after summarizing many relevant studies, Ref. [16] mentions that there are many community detection methods that satisfy this assumption. One such method is described in Ref. [25], and it provides us with a framework for constructing Algorithm 2. In Section 5, we show numerically that the community misclassification rate of the two algorithms in the GCPE model tends to 0 when n \to \infty.
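Because community labels are only identified up to relabeling, a misclassification rate is usually computed after matching estimated labels to true labels. The sketch below uses that common permutation-invariant definition, which is an assumption here since the excerpt does not spell out the definition used; the function name and toy labels are hypothetical, and labels are assumed to be coded 0, ..., k-1.

```python
import itertools
import numpy as np

def misclassification_rate(true_labels, est_labels, k):
    """Share of mislabeled nodes, minimized over all relabelings of the k communities."""
    true_labels = np.asarray(true_labels)
    est_labels = np.asarray(est_labels)
    best = 1.0
    for perm in itertools.permutations(range(k)):
        relabeled = np.array(perm)[est_labels]  # apply the candidate relabeling
        best = min(best, float(np.mean(relabeled != true_labels)))
    return best

# Toy example (hypothetical): labels are permuted and one node is wrong.
truth = [0, 0, 1, 1, 2, 2, 3, 3]
estimate = [2, 2, 0, 0, 1, 1, 3, 0]
rate = misclassification_rate(truth, estimate, k=4)
```

Enumerating all k! permutations is fine for a handful of communities such as the four used in the simulations; for larger k an assignment solver would be the usual replacement.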
4. Estimation of peer pressure
Once we obtain the estimated homogeneity vector {\hat{{\bf{C}}}} , we can plug {\hat{{\bf{C}}}} into the linearization form of the GCPE model in Eq. (5). Let \tilde{\mathbb{W}}^t = (X,{\hat{{\bf{C}}}},\mathbb{G}) and \tilde{T}^t_i = \ln P_i^t(y_i^t = 1|\tilde{\mathbb{W}}^t) - \ln(1- P_i^t(y_i^t = 1|\tilde{\mathbb{W}}^t)), we have
Our target parameters can be denoted by \tilde{\theta} = (\tilde{\beta}'_1, \tilde{\varphi}_1, \tilde{\phi}_0, \tilde{\phi}_1, \tilde{\gamma}'_1)^{'} \in \varTheta. Let \tilde{\theta}_0 be the true parameter given {\hat{{\bf{C}}}} . However, the estimation procedure of \tilde{\theta} by directly using the maximum likelihood method may suffer from a huge computational burden caused by the game. Therefore, we introduce the nested pseudo-likelihood estimation (NPLE) method to address the problem of excessive computation. We further provide theoretical properties of the NPLE method.
4.1 NPLE method
In a large dynamic game network, calculating the equilibrium solution becomes more difficult. The NPLE method can effectively alleviate the problem of excessive computational burden; it is essentially an iterative algorithm consisting of a series of logit estimations and is easy to implement. In the next subsection we prove the convergence property of the NPLE method, which guarantees the accuracy of the method. To describe the NPLE method in our model, let
{\mathfrak{P}^{t*}}({\mathbb{W}^t}) = (P_1^t(y_1^{t} = 1|{\mathbb{W}^t}), \cdots ,P_n^t(y_n^{t} = 1|{\mathbb{W}^t}))' and {\mathfrak{P}}^{t}= (p_1^t,\cdots ,p_n^t) \in [0,1]^n be the Markov perfect equilibrium choice probability solution and a general probability distribution at time t, respectively. For any {\mathfrak{P}}^{t} \in [0,1]^n , we have
We denote \mathfrak{P}^{t}(\theta; \mathbb{W}^t) as the solution of the equation: \Gamma_i^t \left ( \mathfrak{P}^{t}, \theta; \mathbb{W}^t \right )=p_i^t for all t \in {\cal{T}} and i \in {\cal{I}} . Note that we cannot directly determine \mathfrak{P}^{t*}(\mathbb{W}^t)=\mathfrak{P}^{t}(\theta; \mathbb{W}^t) , because it is difficult to guarantee the uniqueness of the structural parameters θ, that is, a solution may correspond to multiple structural parameters θ. Nevertheless, we can determine \mathfrak{P}^{t*}(\mathbb{W}^t)= \mathfrak{P}^{t}(\theta_0; \mathbb{W}^t), where \mathfrak{P}^{t}(\theta_0; \mathbb{W}^t) is unique. We further define a pseudo log-likelihood function:
where \mathfrak{P}=(\mathfrak{P}^1, \cdots,\mathfrak{P}^m ) . Note that we are using the estimated homogeneity parameter {\hat{{\bf{C}}}} instead of the true homogeneity parameter {{\bf{C}}} . If we want to obtain the pseudo log-likelihood function corresponding to the true parameter \theta_0 , we need to use the true homogeneity to construct the function. Therefore, we define the limiting form of this pseudo log-likelihood function as follows:
where \mathfrak{P}=\Gamma (\mathfrak{P}, \tilde{\theta}; \tilde{\mathbb{W}}) .
First, we follow Ref. [7] and set the initial value of \mathfrak{P}^{(0)} to be (0, \cdots, 0) \in [0,1]^{n \times m} . Second, we conduct an iterative procedure over the two steps described below:
When \left \| \tilde{\theta}^{(k)}-\tilde{\theta}^{(k-1)} \right \| < c or \left \| \mathfrak{P}^{(k)}-\mathfrak{P}^{(k-1)} \right \| < c (c is small enough, e.g., 10^{-6} ), we stop iterating. Then we have \tilde{\theta}_{\rm NPLE}=\tilde{\theta}^{(k)}.
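The structure of this iteration can be illustrated on a deliberately simplified model. The sketch below is not the GCPE mapping Γ: it assumes a static binary game in which the log-odds of node i equal b0 + b1·x_i + φ·(average belief about friends' choices), with a single peer coefficient φ. Each pass estimates a logit given the current probabilities and then updates the probabilities through the best-response mapping, starting from P^(0) = 0 and stopping with the rule stated above. All names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit_mle(Z, y, iters=50):
    """Newton-Raphson logistic regression; lstsq tolerates a singular Hessian."""
    theta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = sigmoid(Z @ theta)
        grad = Z.T @ (y - p)
        hess = (Z * (p * (1 - p))[:, None]).T @ Z
        step = np.linalg.lstsq(hess, grad, rcond=None)[0]
        theta = theta + step
        if np.linalg.norm(step) < 1e-10:
            break
    return theta

def peer_average(G, P):
    """(1/Q_i) * sum over friends j of P_j, with Q_i set to 1 for friendless nodes."""
    Q = np.maximum(G.sum(axis=1), 1)
    return (G @ P) / Q

def npl(x, y, G, c=1e-6, max_iter=200):
    P = np.zeros(len(y))                  # initial value P^(0) = (0, ..., 0)
    theta = np.zeros(3)
    for k in range(1, max_iter + 1):
        Z = np.column_stack([np.ones_like(x), x, peer_average(G, P)])
        theta_new = logit_mle(Z, y)       # step 1: pseudo-MLE of theta given P^(k-1)
        P_new = sigmoid(Z @ theta_new)    # step 2: update P through the response mapping
        done = (np.linalg.norm(theta_new - theta) < c
                or np.linalg.norm(P_new - P) < c)
        theta, P = theta_new, P_new
        if done:
            return theta, P, k
    return theta, P, max_iter

# Simulated toy data (all values hypothetical).
rng = np.random.default_rng(0)
n = 1000
G = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    G[i, rng.choice(others, size=rng.integers(1, 6), replace=False)] = 1.0
x = rng.normal(size=n)
theta_true = np.array([-0.5, 1.0, 0.8])
P_star = np.full(n, 0.5)
for _ in range(100):                      # solve the toy equilibrium by fixed-point iteration
    P_star = sigmoid(theta_true[0] + theta_true[1] * x
                     + theta_true[2] * peer_average(G, P_star))
y = (rng.uniform(size=n) < P_star).astype(float)
theta_hat, P_hat, n_iter = npl(x, y, G)
```

In the first pass the peer regressor is identically zero, so the peer coefficient is not identified and the least-squares Newton step leaves it at zero; it is picked up from the second pass onward, once P^(1) carries information.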
Although there is an excellent condition to ensure the local convergence property of the NPLE algorithm in Ref. [28], it is still difficult to verify the uniqueness of the equilibrium solution in our model. However, this does not mean that we cannot use the NPLE method, because extensive experience through application shows that the NPLE method usually converges to the same solution, see Ref. [17].
4.2 Asymptotic analysis
We first define some notations of the network \mathbb{G}^t : let N^t_{(i,h)}=\{ j\in {\cal{I}}: ({\mathbb{G}^t}^k)_{ji} \geqslant 1\ {\rm for}\ \forall k \leqslant h \} and \mathbb{G}^t_{(i,h)} is a N^t_{(i,h)} \times N^t_{(i,h)} submatrix of \mathbb{G}^t for i \in {\cal{I}} , \forall h \in \mathbb{N} and t \in {\cal{T}} . Then, we introduce the assumptions required by the theorem.
Assumption 4.1 ensures that Eq. (1) allows a unique equilibrium solution. This assumption is a sufficient assumption for equilibrium uniqueness in Bayesian games, see Ref. [8]. We found it is also a useful assumption for equilibrium uniqueness in our Markov games, so we still use it as a sufficient assumption. Similar assumptions can also be found in Ref. [7]. Under this assumption and previous assumptions, we can further obtain the following lemma.
Lemma 4.1. Under Assumptions 2.1, 2.2, 2.3, and 4.1, the game admits a unique pure-strategy Markov perfect equilibrium.
As mentioned in Ref. [8], Lemma 4.1 is an important result for statistical inference. We need it to avoid the problem of multiple equilibria. One can refer to Ref. [8] for more details.
Assumption 4.2. E(Z^t_i Z^{t'}_i) and E(\tilde{Z}^t_i \tilde{Z}^{t'}_i) are full rank.
Assumption 4.3. The parameter \theta_0 is the only solution to the following equation
Assumption 4.2 is used for model identification. Similar assumptions can be found in Refs. [7, 8]. As stated above and in Ref. [8], Assumption 4.3 is an essential assumption for our theoretical results. Although it is easy to verify that \theta_0 is a solution of the above equation, we cannot prove that \theta_0 is its unique solution. If the above equation has multiple solutions, then each solution is a convergence point of the NPLE method. Therefore, we need this assumption to guarantee that the convergence point of the NPLE method is unique. The NPLE method can still be used when the assumption is not satisfied. As mentioned in Ref. [17], we can use multiple NPLE initial points to verify whether the fixed point is unique; if it is not, we take the point that maximizes the pseudo log-likelihood function among the multiple fixed points as the solution of the NPLE method.
Assumption 4.4. The parameter space Θ is compact and the supports {\cal{S}}_{X{\bf{C}}} and {\cal{S}}_{X\hat{{\bf{C}}}} are bounded.
Assumption 4.5. There exists a positive integer constant c_0 such that
\max\limits_{i \in {\cal{I}}, t \in {\cal{T}}}\sum\limits_{j=1}^n g^t_{ij} \leqslant c_0
is always true.
Assumption 4.6. For any given h\in\mathbb{N} , the probability distribution of \mathbb{G}^t_{(i,h)} converges to a limiting distribution as n tends to infinity for all t\in {\cal{T}} and i \in {\cal{I}} , and if N^t_{(i,h)} \cap N^t_{(j,h)} = \emptyset , then \mathbb{G}^t_{(i,h)} and \mathbb{G}^t_{(j,h)} are independent. Moreover, the payoff covariates X^t_i are \rm i.i.d. across players when the exogenous random network at time t is given.
Assumption 4.4 guarantees that our pseudo log-likelihood function is uniformly bounded, since it ensures that our choice probability function is uniformly bounded away from zero; see Refs. [7, 8]. Assumption 4.5 aims to reduce the dependencies between the various players; see Ref. [7] and the network decaying dependence condition (NDD condition) mentioned therein. This bound also holds in some actual datasets. Assumption 4.6 is similar to that in Refs. [7, 8]. With regard to the assumption on the payoff covariates X^t_i in Assumption 4.6, we do not directly use the attribute variables of the players in the model (e.g., height, age, gender, etc.). Similarly, as described in Ref. [8], we can slightly relax this assumption under certain conditions.
Assumption 4.7. For any two players i and j, the centrality measure function S_{ji}^t between them needs to satisfy
where S(\cdot,\cdot,\cdot) is a bounded function and q is a constant.
Assumption 4.7 is utilized to limit the dependence caused by social influence. In our model, we still want to keep the social influence setting in Ref. [8], but it will break some of the dependencies that were reduced in Refs. [7, 8]. Thus we need Assumption 4.7 to ensure that the dependencies in our model are not so strong.
Theorem 4.1. Under Assumptions 2.1, 2.2, 2.3, 3.1, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, and m \ll n , h=\ln n / (4\ln c) , then,
Theorem 4.1 guarantees the convergence property of the NPLE estimator, but it can still be extended. Some centrality measures may not satisfy Assumption 4.7; in that case, we can modify the centrality measure to fit our model and avoid possible bias. Therefore, we make Assumption 4.8.
Assumption 4.8. For any two players i and j, the centrality measure function S_{ji}^t between them needs to satisfy
holds. Note that the h that we choose in this theorem is not unique and we use S^{'t}_{ji} instead of S_{ji}^t to estimate the parameters.
5. Monte Carlo experiments
In this section, we evaluate the performance of our proposed community identification algorithms for estimating homogeneity, and verify whether the NPLE estimation of peer pressure conforms to theoretical results. We first elaborate on the generation process of simulated data, and then present the simulation results.
5.1 Experimental setup
For a dynamic homogeneous network, we consider m = 2 time periods and assume that the number of directed edges of each node in the network is bounded, i.e., we set a minimum of 1 and a maximum of 10 directed edges for each node. Similar to Ref. [16], we further consider 4 potential communities for individual homogeneity parameters. The probability of having a directed edge between two nodes in the same community is 0.75, while the probability of having a directed edge between two nodes belonging to different communities is 0.25. The simulation results for other choices of link probabilities can be found in the Appendix. Furthermore, we generate four different groups of two-dimensional random attribute covariates for estimating the homogeneity parameter, that is, D_1: (N(2,{1}/{4}),N(2,{1}/{4})), D_2: (N(2,{1}/{4}),N(-2,{1}/{4})), D_3: (N(-2,{1}/{4}),N(2,{1}/{4})), D_4: (N(-2,{1}/{4}),N(-2,{1}/{4})), and set f(X_i,X_j)=d(X_i,X_j) , where d(\cdot,\cdot) is the Euclidean distance.
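A data-generating process along these lines might look as follows in Python. Several details are assumptions, since the setup does not state them: how the bounds of 1 and 10 outgoing edges are enforced (here by random subsampling or by adding one random friend), that N(·, 1/4) denotes variance 1/4, that communities have equal size, and that the two periods are drawn independently. Function names are hypothetical.

```python
import numpy as np

def simulate_network(labels, p_in=0.75, p_out=0.25, d_min=1, d_max=10, rng=None):
    """Directed block-model draw with each out-degree forced into [d_min, d_max]."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(labels)
    G = np.zeros((n, n), dtype=int)
    for i in range(n):
        probs = np.where(labels == labels[i], p_in, p_out)
        probs[i] = 0.0                                   # no self-loops
        draws = np.flatnonzero(rng.uniform(size=n) < probs)
        if len(draws) > d_max:                           # assumed rule: subsample to the cap
            draws = rng.choice(draws, size=d_max, replace=False)
        elif len(draws) < d_min:                         # assumed rule: add random friends
            draws = rng.choice(np.delete(np.arange(n), i), size=d_min, replace=False)
        G[i, draws] = 1
    return G

def simulate_covariates(labels, rng):
    """Two-dimensional attributes centred at (+-2, +-2) with variance 1/4 (sd 0.5)."""
    centers = np.array([[2, 2], [2, -2], [-2, 2], [-2, -2]], dtype=float)
    return centers[labels] + rng.normal(scale=0.5, size=(len(labels), 2))

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 25)                     # n = 100, equal community sizes
G_list = [simulate_network(labels, rng=rng) for _ in range(2)]   # m = 2 periods
X = simulate_covariates(labels, rng)
```

With these link probabilities almost every node draws more than 10 candidate friends, so the cap is what actually determines the out-degree; this is worth keeping in mind when interpreting the degree bound in Assumption 4.5.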
Regarding social influence, we apply the KB centrality measure in Ref. [8], which is defined as follows:
Using the KB centrality directly as the true centrality in the simulation would bias the estimation procedure, because it violates the precondition of the theoretical result. We therefore make the following changes to the KB centrality so that it conforms to Theorem 4.2 for estimation.
where \mathbb{G}^t_{(i,h)} is a N^t_{(i,h)} \times N^t_{(i,h)} submatrix of \mathbb{G}^t for i \in {\cal{I}} , h \in \mathbb{N} and t \in {\cal{T}} and N^t_{(i,h)}=\{ j\in {\cal{I}}: ({\mathbb{G}^t}^k)_{ji} \geqslant 1\ for\ \forall k \leqslant h \}. It is simple to verify that this modified KB centrality satisfies condition 2 of Theorem 4.2.
For the nodal covariate X^t_i\in \mathbb{R} , we follow Refs. [7, 8] and let X^t_i=S^t_i ; the corresponding coefficient is set to \beta_1 = -1 . For the homogeneity parameters, we let \gamma'_1=(-2,-1,1,2)^{'} . To verify our theoretical results, we then give each node a probability 1/{n^{1.5}} of community misidentification in each simulation. We do not run our algorithms at this step because their computational cost makes simulations with large samples infeasible in the available time. Finally, the estimation results are obtained from 100 replications.
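The misidentification step can be sketched as follows. The source only states the per-node probability 1/{n^{1.5}}; reassigning a misidentified node uniformly to one of the other communities, and the helper name `perturb_labels`, are assumptions made for illustration.

```python
import numpy as np

def perturb_labels(labels, k, rng):
    """Misidentify each node independently with probability n**(-1.5)."""
    labels = np.asarray(labels)
    n = labels.size
    flip = rng.random(n) < n ** -1.5
    # Assumption: a misidentified node moves to one of the other k-1 communities
    # uniformly at random (a shift of 1..k-1 modulo k never returns the old label).
    shift = rng.integers(1, k, size=n)
    noisy = labels.copy()
    noisy[flip] = (labels[flip] + shift[flip]) % k
    return noisy, flip

rng = np.random.default_rng(1)
true_labels = rng.integers(0, 4, size=500)
noisy_labels, flipped = perturb_labels(true_labels, 4, rng)
expected_errors = 500 * 500 ** -1.5  # equals 500 ** -0.5, roughly 0.045 nodes
```

Under this rule the expected number of misidentified nodes is n^{-1/2}, so it shrinks as n grows and most replications contain no misidentified node at all.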
5.2 Experimental results
We summarize the experimental results from three aspects. First, we evaluate the performance of the two proposed algorithms for estimating homogeneity parameters. Second, we assess the NPLE estimates of the peer pressure parameters in terms of the mean squared error (mse) and standard deviation (sd) of the estimated values, compare the cases with and without error in the estimation of homogeneity parameters, and give a sensitivity analysis of the parameters. Finally, we compare the performance of our model and method in dynamic and static networks.
Fig. 1 shows that the average community misclassification rate of Algorithm 1 over 50 repetitions decreases markedly as n increases. When the number of nodes n reaches 100, the community identification error of Algorithm 1 falls within a desirable range. Because the time cost of running Algorithm 2 once is equivalent to running Algorithm 1 n^2 times, we only compare the two algorithms when the network size n is much smaller. As shown in Fig. 2, as the network size n increases, the community misclassification rate of Algorithm 2 decreases markedly and even reaches 0. Overall, Algorithm 2 is more accurate than Algorithm 1 in the simulations. Therefore, we use Algorithm 2 for homogeneity identification when the network size n is moderate, trading computational cost for more accurate community identification. When the network size n is large, Algorithm 1 dramatically reduces the computational cost while retaining sufficient accuracy for homogeneity estimation.
Figure 1. Average community misclassification rate for Algorithm 1, which is based on the average of 50 replications.
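The section does not restate how the community misclassification rate is computed. A common convention, assumed here, is the fraction of mislabeled nodes minimized over all relabelings of the estimated communities, since community labels are only identified up to permutation. The function name is hypothetical, and the brute-force search is practical only for a small number of communities such as the 4 used in the simulations.

```python
from itertools import permutations

import numpy as np

def misclassification_rate(true_labels, est_labels, k):
    """Smallest fraction of mislabeled nodes over all k! label permutations."""
    true_labels = np.asarray(true_labels)
    est_labels = np.asarray(est_labels)
    best = 1.0
    for perm in permutations(range(k)):
        # perm[c] is the true-label name given to estimated community c.
        relabeled = np.asarray(perm)[est_labels]
        best = min(best, float(np.mean(relabeled != true_labels)))
    return best

truth = [0, 0, 1, 1, 2, 2, 3, 3]
same_partition = [1, 1, 0, 0, 3, 3, 2, 2]  # identical partition, different names
one_error = [1, 1, 0, 0, 3, 3, 2, 0]       # last node placed in the wrong community
rate_same = misclassification_rate(truth, same_partition, 4)
rate_one = misclassification_rate(truth, one_error, 4)
```

Averaging this rate over replications gives a quantity of the kind plotted against n in Fig. 1.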
Table 1 shows that, as n increases, the estimates of the peer pressure parameters (\varphi_1,\phi_0,\phi_1) tend to approach their true values. In addition, a comparison with Table 2 shows that when n is small the homogeneity identification error has a greater impact on the estimates, but as n increases this impact gradually diminishes and eventually disappears.
Table 1. Peer pressure estimation with homogeneity identification error.
| n | Statistic | \beta=-1 | \varphi_1=1 | \phi_0=1 | \phi_1=1 | \gamma'_1(1)=-2 | \gamma'_1(2)=-1 | \gamma'_1(3)=1 | \gamma'_1(4)=2 |
|---|---|---|---|---|---|---|---|---|---|
| n=500 | mean | −1.0121 | 1.0211 | 0.9936 | 1.0046 | −2.0063 | −1.0064 | 1.0378 | 2.0324 |
| | sd | 0.3844 | 0.5127 | 0.7130 | 0.6358 | 0.5607 | 0.5797 | 0.6780 | 0.7941 |
| | mse | 0.1464 | 0.2607 | 0.5033 | 0.4002 | 0.3113 | 0.3327 | 0.4566 | 0.6254 |
| n=1000 | mean | −1.0386 | 1.0209 | 0.9530 | 0.9677 | −1.9985 | −0.9785 | 1.0758 | 2.0837 |
| | sd | 0.2640 | 0.3237 | 0.4969 | 0.4245 | 0.4538 | 0.4053 | 0.5275 | 0.5704 |
| | mse | 0.0705 | 0.1041 | 0.2466 | 0.1794 | 0.2039 | 0.1631 | 0.2812 | 0.3291 |
| n=1500 | mean | −1.0100 | 0.9544 | 0.9995 | 0.9184 | −2.0177 | −1.0049 | 1.0023 | 2.0096 |
| | sd | 0.1739 | 0.2954 | 0.2616 | 0.3312 | 0.3824 | 0.2242 | 0.3916 | 0.3620 |
| | mse | 0.0300 | 0.0867 | 0.0678 | 0.1086 | 0.1449 | 0.0518 | 0.1518 | 0.1364 |
The first row of data represents the average of the predicted results, and the second row is the corresponding standard deviation, and the third row is mean squared error.
Table 2. Peer pressure estimation without homogeneity identification error.
| n | Statistic | \beta=-1 | \varphi_1=1 | \phi_0=1 | \phi_1=1 | \gamma'_1(1)=-2 | \gamma'_1(2)=-1 | \gamma'_1(3)=1 | \gamma'_1(4)=2 |
|---|---|---|---|---|---|---|---|---|---|
| n=500 | mean | −1.0662 | 0.9904 | 0.9811 | 0.9019 | −2.0257 | −0.9462 | 1.0923 | 2.1071 |
| | sd | 0.3311 | 0.4596 | 0.7589 | 0.6382 | 0.5729 | 0.5127 | 0.6097 | 0.7203 |
| | mse | 0.1129 | 0.2093 | 0.5705 | 0.4128 | 0.3257 | 0.2632 | 0.3766 | 0.5252 |
| n=1000 | mean | −0.9984 | 0.9895 | 0.9884 | 0.9589 | −2.0294 | −1.0123 | 1.0075 | 2.0187 |
| | sd | 0.2322 | 0.3035 | 0.4612 | 0.4475 | 0.3812 | 0.3229 | 0.4173 | 0.4791 |
| | mse | 0.0534 | 0.0913 | 0.2107 | 0.1999 | 0.1448 | 0.1033 | 0.1724 | 0.2275 |
| n=1500 | mean | −1.0449 | 1.0200 | 0.9561 | 0.9710 | −1.9413 | −0.9538 | 1.0869 | 2.0915 |
| | sd | 0.1947 | 0.2463 | 0.4545 | 0.3837 | 0.3339 | 0.2932 | 0.4012 | 0.4564 |
| | mse | 0.0395 | 0.0605 | 0.2064 | 0.1466 | 0.1138 | 0.0872 | 0.1669 | 0.2146 |
The first row of data represents the average of the predicted results, and the second row is the corresponding standard deviation, and the third row is mean squared error.
Moreover, Table 1 shows that both the standard deviation ( sd ) and the mean square error ( mse ) decrease as n increases, and the rate at which they converge to 0 is broadly consistent with the growth of n. However, compared with Table 2, when n is not large enough, both the standard deviation and the mean square error are slightly worse than the results estimated with the true homogeneity, although the difference remains acceptable.
Figure 3. The standard deviation and mean square error of the estimated peer pressure parameter.
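The mean, sd and mse rows of the tables are linked by the usual bias-variance decomposition. The sketch below shows how the three summaries could be computed from replicated estimates and how the mse can be rebuilt from a reported mean and sd. It assumes that the sd is the sample standard deviation (divisor R−1) over R=100 replications; the tabulated values appear consistent with that convention, but the source does not state it. The function names and the synthetic draws are illustrative only.

```python
import numpy as np

def summarize(estimates, truth):
    """Return (mean, sd, mse) of replicated estimates of one parameter."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std(ddof=1)               # sample standard deviation (assumed convention)
    mse = np.mean((est - truth) ** 2)  # average squared error around the true value
    return mean, sd, mse

def mse_from_mean_sd(mean, sd, truth, reps):
    """Rebuild the mse as bias^2 + (R - 1) / R * sd^2."""
    return (mean - truth) ** 2 + (reps - 1) / reps * sd ** 2

# Synthetic draws standing in for 100 replications (illustrative values only).
rng = np.random.default_rng(0)
draws = rng.normal(loc=-1.0, scale=0.4, size=100)
mean, sd, mse = summarize(draws, truth=-1.0)
assert np.isclose(mse, mse_from_mean_sd(mean, sd, -1.0, reps=100))

# Check against the \beta column of Table 1 at n=500 (mean -1.0121, sd 0.3844).
table_check = mse_from_mean_sd(-1.0121, 0.3844, -1.0, reps=100)
```

For that column the rebuilt value is about 0.1464, which agrees with the reported mse. Because the bias terms in the tables are small, the mse is driven almost entirely by the variance.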
Finally, we compare the performance of our methodology in dynamic and static networks. Table 3 shows that the estimates in a static network still converge as the number of nodes n increases, but all of them are worse than those in Table 2. Therefore, when the number of nodes is insufficient, extending the time dimension makes our estimation results more accurate.
Table 3. Peer pressure estimation without homogeneity identification error in static network.
| n | Statistic | \beta=-1 | \varphi_1=1 | \phi_0=1 | \phi_1=1 | \gamma'_1(1)=-2 | \gamma'_1(2)=-1 | \gamma'_1(3)=1 | \gamma'_1(4)=2 |
|---|---|---|---|---|---|---|---|---|---|
| n=500 | mean | −1.0142 | 0.9650 | 0.9821 | 0.9143 | −2.1668 | −1.0378 | 1.0212 | 2.0608 |
| | sd | 0.4945 | 0.6656 | 1.0955 | 0.9485 | 0.8194 | 0.7533 | 0.8737 | 1.0097 |
| | mse | 0.2423 | 0.4398 | 1.1885 | 0.8981 | 0.6925 | 0.5632 | 0.7562 | 1.0130 |
| n=1000 | mean | −1.0496 | 1.0126 | 0.9703 | 0.9306 | −1.9781 | −0.9654 | 1.0754 | 2.1021 |
| | sd | 0.3532 | 0.4359 | 0.7097 | 0.6817 | 0.5820 | 0.5615 | 0.6576 | 0.7797 |
| | mse | 0.1260 | 0.1883 | 0.4995 | 0.4649 | 0.3358 | 0.3133 | 0.4338 | 0.6122 |
| n=1500 | mean | −1.0746 | 1.0175 | 0.8553 | 0.9018 | −1.9112 | −0.9108 | 1.1671 | 2.2148 |
| | sd | 0.3003 | 0.3659 | 0.6127 | 0.5475 | 0.5535 | 0.4602 | 0.5753 | 0.6518 |
| | mse | 0.0949 | 0.1328 | 0.3925 | 0.3064 | 0.3111 | 0.2177 | 0.3556 | 0.4668 |
The first row of data represents the average of the predicted results, the second row is the corresponding standard deviation, and the third row is the mean squared error.
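The comparison between Tables 2 and 3 can be made concrete by dividing the static-network mse by the dynamic-network mse entry by entry. The snippet below only re-expresses the tabulated values. Reading the ratio as the effect of doubling the number of observed periods is an interpretation that assumes the static design corresponds to a single period.

```python
import numpy as np

# mse rows copied from Table 2 (dynamic network, m = 2) and Table 3 (static network).
# Rows are n = 500, 1000, 1500; columns follow the table header order.
mse_dynamic = np.array([
    [0.1129, 0.2093, 0.5705, 0.4128, 0.3257, 0.2632, 0.3766, 0.5252],
    [0.0534, 0.0913, 0.2107, 0.1999, 0.1448, 0.1033, 0.1724, 0.2275],
    [0.0395, 0.0605, 0.2064, 0.1466, 0.1138, 0.0872, 0.1669, 0.2146],
])
mse_static = np.array([
    [0.2423, 0.4398, 1.1885, 0.8981, 0.6925, 0.5632, 0.7562, 1.0130],
    [0.1260, 0.1883, 0.4995, 0.4649, 0.3358, 0.3133, 0.4338, 0.6122],
    [0.0949, 0.1328, 0.3925, 0.3064, 0.3111, 0.2177, 0.3556, 0.4668],
])

# Entry-wise ratio: values above 1 mean the static design is less accurate.
ratio = mse_static / mse_dynamic
print(np.round(ratio, 2))
```

Every ratio exceeds 1, and at n=500 all of them are close to 2, which is what one would expect if the second period roughly doubles the effective sample size.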
6. Empirical application: Teenage friends and lifestyle study
6.1 Dataset description
The dataset is collected from the Teenage Friends and Lifestyle Study, which is designed to identify the factors influencing adolescent smoking behavior as teenagers grow up. The dataset contains n=129 teenagers with no missing attributes, and it combines the students' friendship networks with longitudinal survey data on their social and economic characteristics over m=3 time phases. Observed demographic characteristics include age, sex, pocket money, sports, music preferences and family smoking. The dependent variable Y is the smoking status of students in the 3 phases. We record frequent smoking as Y=1 , and occasional smoking or non-smoking as Y=0 . Table 4 presents the descriptive statistics of the dataset.
Table 4. Statistical summary of the data.
| Variable | Min | Max | Mean |
|---|---|---|---|
| smoking | 0 | 1 | 0.47 |
| KB centrality | 0 | 1.43 | 0.33 |
| money | −1 | 99.99 | 11.90 |
| sex | 0 | 1 | 0.57 |
| age | 12.8 | 14.3 | 13.32 |
| smoke in home | −1 | 1 | −0.13 |
| parents smoke | −1 | 1 | −0.02 |
| brothers and sisters smoke | −1 | 1 | 0.74 |
| num of friends | 0 | 13 | 4.63 |
Note: In the table, sex: 0=girl, 1=boy; all family smoking: −1=yes, 1=no.
We compare the performance of our GCPE model with a benchmark model in Ref. [8], which we call the KB-CPE model. The KB-CPE model does not consider homogeneity pursuit and is designed for static networks, thus we use the estimation results of the KB-CPE model based on the three periods' data for comparison. The initial value of \varSigma^{0} in the NPLE method for both models is taken to be \varSigma^{0}=(0,\cdots,0).
From Table 5, we can see that in our GCPE model, the three peer pressure parameters satisfy 0 < \varphi_1 < \phi_0 < \phi_1 . This implies that the adolescents who choose to smoke can be more affected by smoking friends than non-smoking friends, and whether there is a difference in the social influence of adolescents in social networks will not affect this conclusion. In addition, the individuals with higher KB centrality have greater influence on individuals with lower KB centrality, and non-smoking individuals with high KB centrality tend to avoid smoking when they are influenced by their non-smoking friends with low KB centrality.
Table 6. Overall change in the probability of establishing connections between communities.
| P_{in} | Statistic | \beta=-1 | \varphi_1=1 | \phi_0=1 | \phi_1=1 | \gamma'_1(1)=-2 | \gamma'_1(2)=-1 | \gamma'_1(3)=1 | \gamma'_1(4)=2 |
|---|---|---|---|---|---|---|---|---|---|
| P_{in}=0.6 | mean | −1.0662 | 1.0359 | 0.9494 | 0.8862 | −1.914 | −0.9346 | 1.1207 | 2.1141 |
| | sd | 0.3581 | 0.4815 | 0.6219 | 0.6960 | 0.6108 | 0.5541 | 0.6104 | 0.6990 |
| | mse | 0.1313 | 0.2308 | 0.3854 | 0.4925 | 0.3768 | 0.3082 | 0.3834 | 0.4967 |
| P_{in}=0.65 | mean | −1.0546 | 1.0218 | 0.9682 | 0.9100 | −1.9207 | −0.9497 | 1.0980 | 2.0856 |
| | sd | 0.3931 | 0.5307 | 0.6349 | 0.6605 | 0.6320 | 0.5852 | 0.6490 | 0.7419 |
| | mse | 0.1560 | 0.2793 | 0.4000 | 0.4400 | 0.4018 | 0.3415 | 0.4266 | 0.5523 |
| P_{in}=0.7 | mean | −1.0178 | 1.0000 | 0.9513 | 0.9401 | −1.9788 | −0.9903 | 1.0578 | 2.0440 |
| | sd | 0.3773 | 0.4950 | 0.6759 | 0.6198 | 0.5705 | 0.5524 | 0.6202 | 0.7168 |
| | mse | 0.1413 | 0.2426 | 0.4547 | 0.3831 | 0.3227 | 0.3022 | 0.3842 | 0.5106 |
| P_{in}=0.75 | mean | −1.0121 | 1.0211 | 0.9936 | 1.0046 | −2.0063 | −1.0064 | 1.0378 | 2.0324 |
| | sd | 0.3844 | 0.5127 | 0.7130 | 0.6358 | 0.5607 | 0.5797 | 0.6780 | 0.7941 |
| | mse | 0.1464 | 0.2607 | 0.5033 | 0.4002 | 0.3113 | 0.3327 | 0.4566 | 0.6254 |
| P_{in}=0.8 | mean | −1.0274 | 1.0232 | 1.0281 | 0.9985 | −1.9805 | −0.9958 | 1.0410 | 2.0283 |
| | sd | 0.3613 | 0.5463 | 0.7509 | 0.5649 | 0.5510 | 0.5641 | 0.6808 | 0.7904 |
| | mse | 0.1300 | 0.2960 | 0.5590 | 0.3160 | 0.3009 | 0.3151 | 0.4606 | 0.6193 |
| P_{in}=0.85 | mean | −1.0091 | 0.9622 | 1.0916 | 1.0139 | −2.0263 | −1.0278 | 0.9845 | 1.9485 |
| | sd | 0.3913 | 0.5619 | 0.7502 | 0.5377 | 0.5805 | 0.6085 | 0.7539 | 0.8560 |
| | mse | 0.1517 | 0.3140 | 0.5655 | 0.2864 | 0.3343 | 0.3673 | 0.5630 | 0.7280 |
| P_{in}=0.9 | mean | −1.0369 | 1.0129 | 1.0359 | 1.0453 | −1.9987 | −0.9779 | 1.0484 | 2.0308 |
| | sd | 0.3681 | 0.4898 | 0.8767 | 0.5839 | 0.5721 | 0.5599 | 0.7943 | 0.9219 |
| | mse | 0.1355 | 0.2377 | 0.7623 | 0.3391 | 0.2340 | 0.3109 | 0.6269 | 0.8424 |
The first row of data represents the average of the predicted results, the second row is the corresponding standard deviation, and the third row is the mean squared error. P_{\rm in} denotes the probability of a directed edge between nodes of the same community and P_{\rm out} the probability of a directed edge between nodes of different communities, where P_{\rm out}=1-P_{\rm in}. Network size n=400 . Homogeneity identification errors are present.
The first row of data represents the average of the predicted results, the second row is the corresponding standard deviation, and the third row is the mean squared error. The probability matrix for establishing links between communities is P=\left[ {\begin{array}{*{20}{c}} {0.75}&{0.25}&{0.35}&{0.15}\\ {0.35}&{0.80}&{0.05}&{0.20}\\ {0.30}&{0.35}&{0.70}&{0.25}\\ {0.15}&{0.25}&{0.15}&{0.75} \end{array}} \right]. Note that the probabilities actually used in the simulation differ from these entries; for example, p_{({\bf{C}}(1),{\bf{C}}(1))}=P_{11} and p_{({\bf{C} }(1),{\bf{C} }(2))}=(1-P_{11})\dfrac{P_{12} }{P_{12}+P_{13}+P_{14} }. Homogeneity identification errors are present.
For the KB-CPE model, \varphi_1>0 and \phi_0 > 0 but \phi_1 < 0 , which can be problematic since when S_{ji}^{t} > 0 for all j \in F_{i}^t . Thus, the KB-CPE model cannot give a reasonable explanation for the estimation results of peer pressure parameters.
7. Conclusions
This study focuses on the estimation of peer pressure in dynamic homogeneous networks, a topic that is extensively studied in social network analysis. We develop a generalized constant peer effects (GCPE) model based on the Markov perfect equilibrium, which can effectively avoid bias through homogeneity pursuit and can be applied to a wider range of networks. To estimate peer pressure in the GCPE model, we first present two algorithms to estimate homogeneity parameters. We then introduce the nested pseudo-likelihood (NPLE) method and apply it to obtain consistent estimators of peer pressure, and we present the theoretical properties of the NPLE estimators. Simulation evaluations show that our proposed methodology can achieve desirable and effective results in terms of the community misclassification rate and parameter estimation error. We also illustrate the advantages of our model in an empirical analysis by comparing it with the KB-CPE model.
Two interesting directions for future research are allowing players to choose among more than two actions and considering alternative homogeneity settings. In social networks, individual behavior is not limited to two categories; thus, it is natural to expand the decision dimensions of the players in the game. Although the homogeneity setting has been extensively studied and we have proved the consistency of the NPLE estimators under homogeneity pursuit, the consistency of the NPLE estimators under other settings is also worth further study.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (71771201, 71973001).
Conflict of interest
The authors declare that they have no conflict of interest.
This paper introduces the Generalized Constant Peer Effect (GCPE) model, a novel network game model that quantifies social interactions within dynamic networks.
The proposed model can efficiently mitigate estimation inaccuracies related to peer pressure by integrating homogeneity.
We design innovative algorithms to accurately identify homogeneity. Then we apply the Nested Pseudo-Likelihood Estimation (NPLE) method to obtain consistent estimators of parameters.
[2] Kim J, Kim M, Choi J, et al. Offline social interactions and online shopping demand: Does the degree of social interactions matter? Journal of Business Research, 2019, 99: 373–381. DOI: 10.1016/j.jbusres.2017.09.022
[3] Poutvaara P, Siemers L R. Smoking and social interaction. Journal of Health Economics, 2008, 27 (6): 1503–1515. DOI: 10.1016/j.jhealeco.2008.06.005
[4] Yin J, He X, Yang Y, et al. Outcome-based evaluations of social interaction valence in a contingent response context. Frontiers in Psychology, 2019, 10: 2557. DOI: 10.3389/fpsyg.2019.02557
[5] Sirakaya S. Recidivism and social interactions. Journal of the American Statistical Association, 2006, 101: 863–877. DOI: 10.1198/016214506000000177
[6] Blume L E, Brock W A, Durlauf S N, et al. Linear social interactions models. Journal of Political Economy, 2015, 123 (2): 444–496. DOI: 10.1086/679496
[7] Xu H. Social interactions in large networks: A game theoretic approach. International Economic Review, 2018, 59 (1): 257–284. DOI: 10.1111/iere.12269
[8] Lin Z, Xu H. Estimation of social-influence-dependent peer pressure in a large network game. The Econometrics Journal, 2017, 20 (3): S86–S102. DOI: 10.1111/ectj.12102
[9] Sun Z, Du Y, Chen X, et al. Implicit community discovery based on microblog theme homogeneity. IOP Conference Series: Materials Science and Engineering, 2020, 790 (1): 012045. DOI: 10.1088/1757-899X/790/1/012045
[10] Favre G, Figeac J, Grossetti M, et al. Social distance in France: Evolution of homogeneity within personal networks from 2001 to 2017. Social Networks, 2022, 68: 70–83. DOI: 10.1016/j.socnet.2021.05.001
[11] Liu L, Wang X, Zheng Y, et al. Homogeneity trend on social networks changes evolutionary advantage in competitive information diffusion. New Journal of Physics, 2020, 22 (1): 013019.
[12] Shalizi C R, Thomas A C. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 2011, 40 (2): 211–239. DOI: 10.1177/0049124111404820
[13] Davin J P, Gupta S, Piskorski M J. Separating homophily and peer influence with latent space. Boston, MA: Harvard Business School, 2014.
[14] Hill S, Provost F, Volinsky C. Network-based marketing: Identifying likely adopters via consumer networks. Statistical Science, 2006, 21 (2): 256–276. DOI: 10.1214/088342306000000222
[15] Worrall H. Community detection as a method to control for homophily in social networks. Corpus ID: 15409339, 2014.
[16] McFowland III E, Shalizi C R. Estimating causal peer influence in homophilous social networks by inferring latent locations. Journal of the American Statistical Association, 2023, 118: 707–718. DOI: 10.1080/01621459.2021.1953506
[17] Aguirregabiria V, Mira P. Sequential estimation of dynamic discrete games. Econometrica, 2007, 75 (1): 1–53. DOI: 10.1111/j.1468-0262.2007.00731.x
[18] Egesdal M, Lai Z, Su C-L. Estimating dynamic discrete-choice games of incomplete information. Quantitative Economics, 2015, 6 (3): 567–597. DOI: 10.3982/QE430
[19] Manski C F. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 1993, 60 (3): 531–542. DOI: 10.2307/2298123
[20] Seim K. An empirical model of firm entry with endogenous product-type choices. The RAND Journal of Economics, 2006, 37 (3): 619–640. DOI: 10.1111/j.1756-2171.2006.tb00034.x
[21] Han Q, Xu K, Airoldi E. Consistent estimation of dynamic and multi-layer block models. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR, 2015, 37: 1511–1520.
[22] Pensky M, Zhang T. Spectral clustering in the dynamic stochastic block model. Electronic Journal of Statistics, 2019, 13: 678–709. DOI: 10.1214/19-EJS1533
[23] Chunaev P. Community detection in node-attributed social networks: A survey. Computer Science Review, 2020, 37: 100286. DOI: 10.1016/j.cosrev.2020.100286
[24] Liu M, Guo J, Chen J. Community discovery in weighted networks based on the similarity of common neighbors. Journal of Information Processing Systems, 2019, 15 (5): 1055–1067. DOI: 10.3745/JIPS.04.0133
[25] Gao C, Ma Z, Zhang A Y, et al. Achieving optimal misclassification proportion in stochastic block models. The Journal of Machine Learning Research, 2017, 18 (1): 1980–2024. DOI: 10.5555/3122009.3153016
[26] Bickel P J, Chen A. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 2009, 106 (50): 21068–21073. DOI: 10.1073/pnas.0907096106
[27] Zhao Y, Levina E, Zhu J. Consistency of community detection in networks under degree-corrected stochastic block models. Annals of Statistics, 2012, 40: 2266–2292. DOI: 10.1214/12-AOS1036
[28] Kasahara H, Shimotsu K. Sequential estimation of structural models with a fixed point constraint. Econometrica, 2012, 80 (5): 2303–2319. DOI: 10.3982/ECTA8291
7: Merge community i and community j and record this new community collection as C_{-ij}, then calculate Q_{-ij}=Q(C_{-ij})
8: end for
9: if \max_{C_{-ij}}(Q_{-ij})>Q(C), let C={\rm argmax}_{C_{-ij} }(Q_{-ij}) and back to Step 5; else return \sigma^0(i)=C(i) for all i\in{\cal{I}} and k=|C|.
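Steps 7 to 9 describe a greedy merge loop: try every pairwise merge, keep the best one if it improves the objective, and stop otherwise. A minimal sketch is given below. The objective Q is not defined in this excerpt, so it is passed in as an arbitrary function of the community collection; the function name `greedy_merge` and the toy objective in the usage lines are hypothetical.

```python
from itertools import combinations

def greedy_merge(communities, Q):
    """Merge the best pair of communities repeatedly while Q strictly improves."""
    C = [set(c) for c in communities]
    best = Q(C)
    while len(C) > 1:
        candidates = []
        for i, j in combinations(range(len(C)), 2):
            # C_{-ij}: the collection with communities i and j merged.
            merged = [c for t, c in enumerate(C) if t not in (i, j)] + [C[i] | C[j]]
            candidates.append((Q(merged), merged))
        q_new, c_new = max(candidates, key=lambda pair: pair[0])
        if q_new <= best:  # no merge improves Q: stop
            break
        C, best = c_new, q_new
    # sigma^0(i): index of the community containing node i; k = |C|.
    sigma0 = {node: idx for idx, c in enumerate(C) for node in c}
    return sigma0, len(C)

# Toy objective that prefers exactly two communities (illustration only).
toy_Q = lambda C: -abs(len(C) - 2)
sigma0, k = greedy_merge([{0}, {1}, {2}, {3}], toy_Q)
```

Each pass evaluates Q once per pair of communities, so the cost of a pass grows quadratically in the current number of communities.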
Algorithm 2. A refinement scheme for community detection.
Input: Weighted adjacency matrix W, number of communities k, initial community detection method \sigma^0 (i.e., Algorithm 1).
Output: Community assignment \tilde{\sigma}.
1: for i = 1 to n do
2: Apply \sigma^0 on W_{-i} for all j \ne i, obtain \sigma^0_i(j) and \sigma^0_i(i)=0, where W_{-i} denotes the (n-1)\times (n-1) submatrix of W with its i{\rm th} row and i{\rm th} column removed.
3: Define \tilde{\sigma}_i:[n]\ \to\ [k] by setting \tilde{\sigma}_i(j)=\sigma^0_i(j) for all i \ne j where [n]=\{1,\cdots,n\}, and