Hyperevent network modelling of partially observed gossip data

Veronica Poda^a, Veronica Vinciotti^a, Ernst C. Wit^b

a. Department of Mathematics, University of Trento

b. Institute of Computing, Università della Svizzera italiana

(November 23, 2025)

Abstract

Gossiping is a widespread social phenomenon that shapes relationships and information flow in communities. From a network theoretic point of view, gossiping can be seen as a higher-order interaction, as it involves at least two persons talking about a non-present third. The mechanism of gossiping is complex: it is most likely dynamic, as its intensity changes over time, and possibly viral, if a gossiping event induces future gossiping, such as a repetition or retaliation. We define covariates of interest for these effects and propose a relational hyperevent model to study and quantify these complex dynamics. We consider survey data collected yearly from 44 secondary schools in Hungary. No information is available about the exact timing of the events nor about the aggregate number of events within the yearly time interval. What is measured is whether at least one gossiping event has occurred in a given time interval. We extend inference for relational hyperevent models to the case of right-censored interval-time data and show how flexible and efficient generalized additive models can be used for estimation of effects of interest. Our analysis on the school data illustrates how a model that accounts for linear, smooth and random effects can identify the social drivers of gossiping, while revealing complex temporal dynamics.

Keywords: dynamic network, right-censoring, relational hyperevent model, generalized additive mixed model

1 Introduction

Gossiping is a central aspect of social interactions, influencing relationships and affecting group behavior (behaviorgossip; foster_gossip). It plays a key role in the regulation of reputations, the reinforcement of group norms, and the dissemination of information within social networks. While it can foster cooperation, it can also lead to harmful outcomes, such as social exclusion and reputational damage (KisfalusiGossipReputation). Moreover, in a school setting, gossiping can manifest itself as a form of bullying, shaping students’ social and academic development (bullying; bullies). Understanding the mechanisms that generate and sustain gossiping is therefore relevant to sociology and educational research.

A number of papers have studied the factors that can explain the occurrence of gossiping. Some of these factors are characteristics of the people involved in gossiping. For example, nynkegossip find that gossipers tend to gossip more about targets of their own gender. Other, possibly time-varying, exogenous variables may also be associated with gossiping. However, the occurrence of gossiping can also depend on the history of past gossiping, which is the case for endogenous factors. For example, KisfalusiGossipReputation find a positive effect for outdegree and indegree popularity. The first suggests that the occurrence of gossiping tends to induce further occurrences of gossiping in the future, while the second means that people who are the target of gossiping may, as a result, become the subject of further gossiping. Similarly, both studies find evidence of reciprocity (also referred to as retaliation in this setting), with the gossiper later becoming the target of gossiping initiated by the original target. Finally, self-reported gossiping seems to co-evolve with perceived gossiping (nynkegossip), and gossiping seems to co-evolve with competition for reputation (KisfalusiGossipReputation).

The studies above show how gossiping, as various other social mechanisms, is characterized by a fundamentally viral nature. This is acknowledged by KisfalusiGossipReputation and nynkegossip with the inclusion of factors like reciprocity and popularity in their statistical model. However, both studies fall short in capturing the complexity of the underlying dynamics. In nynkegossip, the effects are estimated from data collected at a single time point, while KisfalusiGossipReputation considers longitudinal data but includes in the model only linear effects for the covariates. The latter is partly due to the choice of a Stochastic Actor-Oriented Model (SAOM) (saom), as complex dynamic models are computationally expensive and difficult to fit within this framework. Alternatively, in a relational event model the effects of potential drivers have been recently extended to random effects (uzaheta2023random; boschialienspecies), non-linear and time-varying effects (bauer2022smooth; boschialienspecies; lembo25; rutaheterogeneity), with efficient computational tools available to fit these complex models.

Gossiping is by definition a higher-order interaction, as it involves at least two people talking about a third non-present person (DoresCruz). The temporal nature of gossiping has motivated the choice of continuous-time dynamic network models for modelling the gossip process. Since gossiping is a relational event rather than a relational state, relational event models are more appropriate than SAOMs for capturing the underlying dynamics that describe the sequence of gossiping hyperevents. Relational hyperevent models (RHEM) (rhem) have recently been introduced as an extension of traditional relational event models to the case of higher-order interactions involving multiple senders and/or multiple receivers.

We consider the longitudinal school survey study by gossip_dataset, in which data are collected once every academic year. The data do not contain information about the exact timing of the gossiping events nor about the number of times a group of students has gossiped about the same target during the yearly time interval. What can be deduced by the nominations made by a student is that at least one gossiping event of a certain type has occurred in a given time interval. Ignoring this censoring, as in previous studies (nynkegossip; KisfalusiGossipReputation), can lead to biased estimates of effects. We therefore develop an extension of the inference for relational hyperevent models to the case of right-censored interval-time data. Within this setting, we show how the likelihood can be written as that of a particular type of regression model. Thus, similarly to traditional relational event models, flexible and efficient generalized additive models can be used for the estimation of effects of interest, so that the potentially complex temporal dynamics of gossiping can be recovered from the partially observed data.

The remainder of this paper is organized as follows. The next section introduces the school survey data described in gossip_dataset, which motivate the methodological development. We then present the RHEM framework for modelling gossiping hyperevents, followed by a section describing the extension of traditional inferential approaches for RHEM to right-censored interval-time data. We evaluate the performance of the proposed approach via a simulation study and compare different ways of calculating the effect of time-varying covariates in the case of partially observed data. Finally, we illustrate the method on the school survey data and discuss the gossiping dynamics that are inferred from it.

2 RECENS: a longitudinal school survey study involving gossiping

Figure 1: (a) Groups of mutual friends tend to be small in size, (b) Gossiping involves predominantly two close friends talking badly about a target, (c) Gossiping decreases over time, (d) There is a large heterogeneity in gossiping activity among the classes.

In this paper, we study the mechanisms underlying gossiping using data from the Wired into Each Other longitudinal study conducted by the Research Center for Educational and Network Studies (RECENS) in Hungary (gossip_dataset). The study involves a four-wave survey of $1,686$ students in $44$ Hungarian secondary school classes, across 7 schools in 4 Hungarian towns. The study started with a survey distributed two months after the beginning of high school (October-November 2010) to 9th-grade students enrolled in the selected schools. These are students with an average age of $15.1$ years (gossip_dataset). Data were further collected six months later (April 2011), one year after the second wave (April 2012), and one year after the third wave (April 2013). The longitudinal nature of the data provides an opportunity to understand the dynamics of gossiping, but it also comes with several methodological challenges.

The first challenge is that the interaction between gossipers talking badly about someone not present is not observed directly. Instead, the survey records the nominations of each student in response to the question: Of whom do you say bad things to your friends? This type of data provides dyadic information that can be used as a proxy for gossiping, as was done in previous studies (KisfalusiGossipReputation; nynkegossip). In this paper, however, we combine this information with another survey question, where students are asked to nominate their friends: Please tell us how much you like or dislike your classmates. [–2: strong dislike or hate, –1: dislike, 0: neutral, 1: like, 2: close friendship.] We use this information to construct all cliques of mutual close friends within each class. Figure 1a shows that the largest group of mutual friends is made up of $6$ individuals, while the majority of groups are smaller in size. From this, we define a gossiping hyperevent as the largest group of reciprocated close friendships who talk badly about the same person. In total, we find $212$ such gossiping hyperevents. Figure 1b shows that the majority of gossiping hyperevents involve two friends talking badly about a target.
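The construction just described can be sketched in code. The following is only one possible reading of it, not the authors' actual procedure: for every target, we take the students who nominated that target, restrict the mutual close-friendship graph (both ratings equal to 2) to those students, and keep each maximal clique with at least two members as a sender group. The function names, the data layout and the minimum group size of two are assumptions made for illustration.

```python
def maximal_cliques(nodes, adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(nodes), set())
    return found


def gossip_hyperevents(liking, bad_talk):
    """Build (sender group, target) pairs from survey answers.

    liking[(i, j)] is the rating i gives to j in {-2, ..., 2};
    bad_talk is a set of (nominator, target) pairs.
    Both layouts are hypothetical, chosen for this sketch.
    """
    # Mutual close friendship: both students rate each other 2.
    friends = {}
    for (i, j), score in liking.items():
        if score == 2 and liking.get((j, i)) == 2:
            friends.setdefault(i, set()).add(j)
            friends.setdefault(j, set()).add(i)

    events = set()
    for target in {r for _, r in bad_talk}:
        nominators = {s for s, r in bad_talk if r == target}
        # Friendship graph restricted to students who nominated this target.
        sub_adj = {i: friends.get(i, set()) & nominators for i in nominators}
        for clique in maximal_cliques(nominators, sub_adj):
            if len(clique) >= 2:  # assumption: gossip needs at least two senders
                events.add((clique, target))
    return events


# Toy class: A, B, C are mutual close friends; D likes A but not vice versa.
liking = {("A", "B"): 2, ("B", "A"): 2, ("A", "C"): 2, ("C", "A"): 2,
          ("B", "C"): 2, ("C", "B"): 2, ("D", "A"): 2, ("A", "D"): 1}
bad_talk = {("A", "X"), ("B", "X"), ("C", "X"), ("D", "X"), ("A", "Y"), ("D", "Y")}
example_events = gossip_hyperevents(liking, bad_talk)
```

In this toy class, the only hyperevent is the group A, B, C gossiping about X; D is excluded because the friendship with A is not reciprocated.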

The gossiping hyperevents can be further described by the waves and classes in which they happen, by the receivers and sender groups, and by gender composition. Figure 1c shows how gossiping decreases over time. At the class level, $9$ out of the $44$ classes recorded no gossiping. Out of the $35$ classes that experienced at least one gossiping hyperevent, Figure 1d shows a large heterogeneity in the gossiping activity. At the receiver level, $142$ out of the $1,686$ students were targeted by gossiping at least once, while $114$ out of the total $7,276$ potential groups of mutual friends participated in at least one gossiping hyperevent. This indicates that gossiping is generally rare, which is consistent with previous studies (sparsenet). In particular, considering all potential sender groups and receivers within each class generates a total of $299,419$ possible gossiping hyperevents across all four waves. Of these, we observe only $212$ . We will later refer to the set of potential events in a wave $k$ as the risk set $\mathcal{R}(k)$ . Finally, as discussed also in the literature (KisfalusiGossipReputation), the hyperevent data show a predominance of females, both in the role of gossiper and receiver of gossiping. In particular, 80.3% among gossipers and 74.2% among receivers are females. Moreover, females tend to gossip together, as 52.3% of all sender groups are only female groups, compared to 25.7% of only males and 21.1% of mixed groups.

The second challenge that comes with these data is that the nominations made by students only allow us to reconstruct that at least one gossiping event has occurred since the last survey. No information is available about the exact timing of the event nor about the number of times that that gossiping event has occurred. In particular, this means that the $212$ gossiping hyperevents from Figure 1 are only a lower bound of the total number of gossiping hyperevents that occurred during the four waves. This leads to relational event data that are partially observed both in terms of the event times (interval-censored) and of the counts associated to a gossiping hyperevent (right-censored). Indeed, the information available with a gossiping hyperevent is that there has been at least one occurrence of that event within the wave when it was measured.

In the next sections, we meet these challenges by first describing a relational event model for gossiping hyperevents, and then by developing an inferential procedure that accounts for the censored nature of the data.

3 A relational hyperevent model of gossiping

Since gossiping is a higher-order interaction, we consider a relational hyperevent model (rhem) and define in this section a number of covariates that may be informative in describing the dynamics of gossiping.

3.1 The relational hyperevent model

A directed relational hyperevent is an interaction between a set of senders and a set of receivers, occurring at a specific point in time. Formally, let $V$ denote the set of individuals in a social network. A relational hyperevent at time $t$ is defined as the tuple $(S,R,t)$ , where:

1. $S\subset V$ is the set of senders;

2. $R\subset V$ is the set of receivers;

3. $t\in[0,T]$ is the time at which the hyperevent occurs.

In our setting, the receiver set consists of a single individual $r\in V$ who, at time $t$ , is the target of a gossip by a sender group $S$ of mutual friends. So we define a gossiping hyperevent as the tuple $(S,r,t)$ . A gossiping hyperevent process is a marked point process,

\{((S_{k},r_{k}),\,t_{k})\,|\,k\geq 1\},

where, at a random time point $t_{k}$ , an interaction occurs from the set of senders $S_{k}$ to the receiver $r_{k}$ . Figure 2 provides a visualization of a realization of the stochastic process, with gossiping hyperevents occurring over time.

Figure 2: Higher-order gossiping interactions evolving over time. At each time point, a gossiping hyperevent occurs, with the grey nodes representing the set of gossip senders, while the receiver of the gossip is shown in white. For each hyperevent, we evaluated three covariates from Table 1. At time $t_{2}$, there is a repetition of the same receiver and a subset repetition with respect to time $t_{0}$, and a reciprocal event with respect to time $t_{1}$.

Associated with the relational event process there exists a multivariate counting process $N$ , which records the number of directed interactions from $S$ to $r$ up to time $t$ :

N_{Sr}(t)=\sum_{k\geq 1}1\{t_{k}\leq t,\ S_{k}=S,\ r_{k}=r\}.

From the Doob–Meyer decomposition (meyer1962decomposition), this submartingale process can be split into a predictable process $\Lambda_{Sr}(t)$ , and a residual martingale process $M_{Sr}(t)$ , that is

N_{Sr}(t)=\Lambda_{Sr}(t)+M_{Sr}(t).

If it exists, the derivative of the cumulative hazard process $\Lambda_{Sr}$

\lambda_{Sr}(t)=\frac{d\Lambda_{Sr}}{dt}(t)

defines the instantaneous hazard for the relational hyperevent $(S,r)$ .

A Relational Hyperevent Model (RHEM) describes how covariates of interest are associated to the hazard of a hyperevent. In particular, we consider the model

\lambda_{Sr}(t)\,=\,Y_{Sr}(t)\,\lambda_{0}(t)\,\exp\!\left\{f(\bm{x}_{Sr}(t))+\bm{\gamma}^{\top}\bm{z}_{Sr}\right\},

(1)

with $Y_{Sr}(t)$ an indicator, equal to $1$ if the hyperevent $(S,r)$ is at risk of happening at time $t$ and $0$ otherwise, $\lambda_{0}(t)$ a non-parametric baseline hazard function, which is common to all hyperevents and does not depend on the specific pair $(S,r)$, and the exponential term a component of the model capturing the effect of covariates of interest on the hazard. The covariates $\bm{x}_{Sr}(t)$ can include both endogenous and exogenous covariates that are measurable with respect to the history of the process. In particular, endogenous covariates depend on the history of the relational hyperevent network, such as repetition or reciprocity of a gossiping hyperevent, while exogenous covariates are time-dependent or defined by actor-level attributes but are independent of previous interactions, such as the gender of senders/receiver or their average age. These covariates can enter the model either linearly, i.e., $f(\bm{x}_{Sr}(t))=\bm{\beta}^{\top}\bm{x}_{Sr}(t)$, for some vector of parameters $\bm{\beta}$, or non-linearly through a more flexible smooth function $f(\cdot)$. In addition to the fixed effects, $\bm{z}_{Sr}$ represents a vector of binary covariates associated to random effects $\bm{\gamma}\sim\mathcal{N}(0,\Sigma)$. These random effects account for unobserved heterogeneity, such as variation in the average hazard rates of interactions associated to different classrooms, sender groups or receivers.

The next section describes a number of endogenous covariates that may play a role in describing the dynamics of gossiping.

3.2 Endogenous Gossiping Covariates

The hyperevent nature of gossiping requires covariates that account for its higher-order structure. Moreover, we focus on the case of multiple senders and a single receiver, rather than the case of a single sender and multiple receivers considered by LernerLomi2023 for modelling email communications. Table 1 lists a number of endogenous covariates that are of potential interest in describing the dynamics of gossiping. We split these into covariates that capture monadic effects, dyadic effects and those that refer to triadic effects.

Endogenous covariates describing monadic effects

Sender Degree: $x^{(\mathrm{sd})}_{S}(t)=\dfrac{1}{|S|}\displaystyle\sum_{t_{i}<t}1_{\{S\subseteq S_{i}\}}$

Receiver Degree: $x^{(\mathrm{rd})}_{r}(t)=\displaystyle\sum_{t_{i}<t}1_{\{r=r_{i}\}}$

Endogenous covariates describing dyadic effects

Repetition: $x_{Sr}^{(\mathrm{rep})}(t)=\dfrac{1}{|S|}\displaystyle\sum_{t_{i}<t}1_{\{S_{i}=S\land r_{i}=r\}}$

Subset Repetition: $x_{Sr}^{(\mathrm{sub\_rep})}(t)=\displaystyle\sum_{p=1}^{|S|}\frac{1}{\binom{|S|}{p}}\sum_{S^{\prime}\in\binom{S}{p}}\mathrm{hy\_deg}_{t}(S^{\prime},r)$

Retaliation/Reciprocity: $x^{(\mathrm{rec})}_{Sr}(t)=\dfrac{1}{|S|}\displaystyle\sum_{t_{i}<t}1_{\{r\in S_{i}\,\wedge\,r_{i}\in S\}}$

Endogenous covariates describing triadic effects

Transitive Closure: $x^{(\mathrm{tc})}_{Sr}(t)=\dfrac{1}{|S|}\displaystyle\sum_{\begin{subarray}{c}a\neq s,r\\ s\in S\end{subarray}}\min\{\mathrm{hy\_deg}_{t}(S,a),\,\mathrm{hy\_deg}_{t}(a,r)\}$

Cyclic Closure: $x^{(\mathrm{cc})}_{Sr}(t)=\dfrac{1}{|S|}\displaystyle\sum_{\begin{subarray}{c}a\neq s,r\\ s\in S\end{subarray}}\min\{\mathrm{hy\_deg}_{t}(r,a),\,\mathrm{hy\_deg}_{t}(a,s)\}$

Sender Balance: $x^{(\mathrm{sb})}_{Sr}(t)=\dfrac{1}{|S|}\displaystyle\sum_{\begin{subarray}{c}a\neq s,r\\ s\in S\end{subarray}}\min\{\mathrm{hy\_deg}_{t}(a,s),\,\mathrm{hy\_deg}_{t}(a,r)\}$

Receiver Balance: $x^{(\mathrm{rb})}_{Sr}(t)=\dfrac{1}{|S|}\displaystyle\sum_{\begin{subarray}{c}a\neq s,r\\ s\in S\end{subarray}}\min\{\mathrm{hy\_deg}_{t}(S,a),\,\mathrm{hy\_deg}_{t}(r,a)\}$

Table 1: Endogenous covariates describing gossiping hyperevents based on the history of the relational process. Solid lines ($\to$) refer to past relational hyperevents, while dashed arrows ($\dashrightarrow$) indicate current relational hyperevents.

Sender Degree.

This covariate measures the activity level of a sender group. We define it as the number of times a sender set $S$ is involved in a gossiping hyperevent before the current time $t$ . As this depends on the size of the sender set, we normalize it by the cardinality of $S$ , leading to the definition

x^{(\mathrm{sd})}_{S}(t)=\dfrac{1}{|S|}\sum_{t_{i}\,<\,t}1_{\{S\,\subseteq\,S_{i}\}}.

In gossip dynamics, a high sender degree indicates a persistent gossiping activity associated to a group of gossipers, so an effect associated to this variable may point to the tendency of individuals to maintain their role as central gossipers (sparsenet).

Receiver Degree.

Similarly, the receiver degree captures how often a given target has been the subject of gossiping prior to the current time $t$ , i.e.,

x^{(\mathrm{rd})}_{r}(t)=\sum_{t_{i}\,<\,t}1_{\{r\,=\,r_{i}\}}.

For example, Figure 2 shows the same receiver being the target of gossiping at time $t_{2}$ and at time $t_{0}$ . In a school setting, this is an important variable for assessing whether bullying is taking place (KisfalusiGossipReputation).
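As a small illustration of the two monadic covariates, the sketch below evaluates them on a toy event history. The representation of the history as a list of `(senders, receiver, time)` tuples and the function names are assumptions made for this example; the formulas follow the definitions of $x^{(\mathrm{sd})}_{S}(t)$ and $x^{(\mathrm{rd})}_{r}(t)$ given above.

```python
def sender_degree(S, t, history):
    """x^(sd)_S(t): number of past events whose sender set contains S, divided by |S|."""
    S = frozenset(S)
    count = sum(1 for S_i, r_i, t_i in history if t_i < t and S <= S_i)
    return count / len(S)


def receiver_degree(r, t, history):
    """x^(rd)_r(t): number of past events targeting receiver r."""
    return sum(1 for S_i, r_i, t_i in history if t_i < t and r_i == r)


# Toy history of hyperevents (senders, receiver, time); hypothetical data.
history = [
    (frozenset({"a", "b"}), "c", 0.0),
    (frozenset({"c", "d"}), "a", 1.0),
    (frozenset({"a", "b", "e"}), "c", 2.0),
]

sd = sender_degree({"a", "b"}, 3.0, history)   # two past events contain {a, b}
rd = receiver_degree("c", 3.0, history)        # c was targeted twice before t = 3
```

Note that only events strictly before $t$ are counted, so an event occurring exactly at time $t$ does not contribute to its own covariates.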

Repetition.

Similarly to LernerLomi2023, but adapted to our setting with multiple senders, we include a covariate that measures past repetition of a hyperevent. Formally, this is defined as the number of past occurrences in which the same sender set $S$ has gossiped about a receiver $r$, normalized by the size of $S$:

x_{Sr}^{(\mathrm{rep})}(t)=\frac{1}{|S|}\sum_{t_{i}\,<\,t}1_{\{S_{i}\,=\,S\,\wedge\,r_{i}\,=\,r\}}.

Reciprocity (Retaliation).

A natural question in the study of gossiping is whether students are more likely to gossip about individuals who they believe are gossiping about them (nynkegossip). This mechanism is naturally dynamic: retaliation occurs when a previous target of gossip becomes a sender and spreads information about those who gossiped about them. We operationalize this by counting whether the current receiver $r$ has previously acted as a sender gossiping about members of $S$ :

x^{(\mathrm{rec})}_{Sr}(t)\;=\;\frac{1}{|S|}\sum_{t_{i}\,<\,t}1_{\{r\,\in\,S_{i}\;\wedge\;r_{i}\,\in\,S\}}.

For example, Figure 2 shows a retaliation at time $t_{1}$ of the event that happened at time $t_{0}$, and at time $t_{2}$ of the event that happened at time $t_{1}$.
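The repetition and reciprocity covariates can be computed in the same spirit. The sketch below is a minimal illustration on hypothetical data; the history layout (a list of `(senders, receiver, time)` tuples) and the function names are assumptions, while the indicator conditions mirror the two formulas above.

```python
def repetition(S, r, t, history):
    """x^(rep)_Sr(t): past events with exactly the same sender set and receiver, over |S|."""
    S = frozenset(S)
    count = sum(1 for S_i, r_i, t_i in history if t_i < t and S_i == S and r_i == r)
    return count / len(S)


def reciprocity(S, r, t, history):
    """x^(rec)_Sr(t): past events where r was a sender and the target belonged to S, over |S|."""
    S = frozenset(S)
    count = sum(1 for S_i, r_i, t_i in history if t_i < t and r in S_i and r_i in S)
    return count / len(S)


# Toy history of hyperevents (senders, receiver, time); hypothetical data.
history = [
    (frozenset({"a", "b"}), "c", 0.0),
    (frozenset({"c", "d"}), "a", 1.0),
    (frozenset({"a", "b", "e"}), "c", 2.0),
]

rep = repetition({"a", "b"}, "c", 3.0, history)   # only the event at time 0 matches exactly
rec = reciprocity({"c", "d"}, "a", 1.0, history)  # a gossiped about c at time 0
```

The event at time 2 does not count as a repetition of $(\{a,b\},c)$ because its sender set is strictly larger; this is the situation that subset repetition is designed to capture.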

Subset Repetition.

This covariate relaxes the definition of repetition by measuring whether subsets of senders within $S$ have previously gossiped about $r$ . In particular, for each possible subset of $S$ of size $p$ , we consider all subsets $S^{\prime}\subseteq S$ of that size and, for each of them, we count how often they have previously gossiped about $r$ . Similarly to the definition of LernerLomi2023 for the case of multiple receivers, we define this last quantity as the hyperdegree of $S^{\prime}$ with respect to $r$ , that is

\mathrm{hy\_deg}_{t}(S^{\prime},r)=\sum_{t_{i}\,<\,t}1_{\{S^{\prime}\,\subseteq\,S_{i}\,\land\,r\,=\,r_{i}\}}.

(2)

These hyperdegrees are normalized by the number of subsets $S^{\prime}$ of size $p$ and summed across all possible sizes $p=1,\dots,|S|$ , leading to the following definition of subset repetition

x_{Sr}^{(\mathrm{sub\_rep})}(t)=\sum_{p\,=\,1}^{|S|}\frac{1}{\binom{|S|}{p}}\sum_{S^{\prime}\,\in\,\binom{S}{p}}\mathrm{hy\_deg}_{t}(S^{\prime},r).

For example, Figure 2 shows a subset repetition at time $t_{2}$ of the event that happened at time $t_{0}$.
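To make the nested sums concrete, the following sketch computes the hyperdegree in (2) and the subset repetition covariate on a toy history. The data layout and function names are assumptions for illustration; the subsets of each size $p$ are enumerated with `itertools.combinations` and averaged by dividing by $\binom{|S|}{p}$.

```python
from itertools import combinations
from math import comb


def hy_deg(S_prime, r, t, history):
    """Hyperdegree: past events whose sender set contains S_prime and whose receiver is r."""
    S_prime = frozenset(S_prime)
    return sum(1 for S_i, r_i, t_i in history
               if t_i < t and S_prime <= S_i and r_i == r)


def subset_repetition(S, r, t, history):
    """x^(sub_rep)_Sr(t): hyperdegrees averaged within each subset size, summed over sizes."""
    members = sorted(S)
    total = 0.0
    for p in range(1, len(members) + 1):
        inner = sum(hy_deg(sub, r, t, history) for sub in combinations(members, p))
        total += inner / comb(len(members), p)
    return total


# Toy history of hyperevents (senders, receiver, time); hypothetical data.
history = [
    (frozenset({"a", "b"}), "c", 0.0),
    (frozenset({"c", "d"}), "a", 1.0),
]

# Candidate event: {a, b, e} gossiping about c at time 2.
# Size 1 contributes 2/3 (a and b, not e), size 2 contributes 1/3 ({a, b}),
# size 3 contributes 0, so the covariate is approximately 1.
value = subset_repetition({"a", "b", "e"}, "c", 2.0, history)
```

The number of subsets grows exponentially in $|S|$, which is unproblematic here since the observed sender groups are small.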

The notion of hyperdegrees serves as the basic building block for the definition of covariates that describe triadic effects, which have been extensively studied in the social networks and relational event literature (rutaheterogeneity). In particular, it allows us to extend notions of closure and balance from dyadic events to higher-order interactions. Below we describe a number of covariates that will be considered for modelling the potential triadic effects of gossiping.

Transitive Closure.

This covariate captures the intuition that friends of friends are likely to become gossip conduits (giardini2019). If a sender set $S$ has gossiped about some alter $a$ , and $a$ together with someone else has gossiped about a receiver $r$ , then $S$ gossiping about $r$ completes the transitive path. To capture this type of effect in our hypergraph framework, we define the following covariate

x^{(\mathrm{tc})}_{Sr}(t)\;=\;\frac{1}{|S|}\sum_{\begin{subarray}{c}a\,\neq\,s,r\\ s\,\in\,S\end{subarray}}\min\Big\{\text{hy\_deg}_{t}(S,a),\;\text{hy\_deg}_{t}(a,r)\Big\},

which keeps track of whether a sender set $S$ can reach a receiver $r$ through an intermediary $a$ . Note that for the definition of this covariate, as well as all subsequent ones, we account for the direction of interactions, but not for their temporal order, i.e., we do not require that $S\to a$ occurs prior to $a\to r$ .

Cyclic Closure.

Cyclic effects capture the circulation of gossiping along closed loops. If prior to time $t$, a receiver $r$ has gossiped about an intermediary node $a$, and $a$ has gossiped about a sender $s\in S$, then gossiping flowing back from $S$ to $r$ at time $t$ completes this cycle. Such patterns resonate with the gossip triangle identified in prior studies (Ellwardt). We define the corresponding covariate as

x^{(\mathrm{cc})}_{Sr}(t)\;=\;\frac{1}{|S|}\sum_{\begin{subarray}{c}a\,\neq\,s,r\\ s\,\in\,S\end{subarray}}\min\Big\{\text{hy\_deg}_{t}(r,a),\;\text{hy\_deg}_{t}(a,s)\Big\}.

Sender Balance.

This covariate quantifies the extent to which other actors direct gossip towards both a given sender and a given receiver. Thus it is defined by

x^{(\mathrm{sb})}_{Sr}(t)\;=\;\frac{1}{|S|}\sum_{\begin{subarray}{c}a\,\neq\,s,r\\ s\,\in\,S\end{subarray}}\min\Big\{\text{hy\_deg}_{t}(a,s),\;\text{hy\_deg}_{t}(a,r)\Big\}.

Intuitively, this covariate captures situations in which a third party $a$ is simultaneously connected to $s$ and $r$. This resembles so-called coalition triads (wittek1998), where gossip circulates more among actors with shared connections.

Receiver Balance.

This covariate measures the extent to which a sender set $S$ and a receiver $r$ target the same third parties with their gossiping. This is defined by

x^{(\mathrm{rb})}_{Sr}(t)\;=\;\frac{1}{|S|}\sum_{\begin{subarray}{c}a\,\neq\,s,r\\ s\,\in\,S\end{subarray}}\min\Big\{\text{hy\_deg}_{t}(S,a),\;\text{hy\_deg}_{t}(r,a)\Big\}.

(3)

It captures shared focus on common alters, reflecting consensus-building processes within triads. This aligns with structural balance theory (cartwright1956), which predicts that imbalanced triads generate relational tension, and this is resolved by aligning attention or judgments towards common targets (halevy2019).
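The four triadic covariates share the same structure and can be sketched together. The code below rests on an assumption about how to read the summation index: we take it as a double sum over senders $s\in S$ and alters $a$ with $a\neq s$ and $a\neq r$, where single actors appearing as first argument of the hyperdegree are treated as singleton sender sets. The history layout and function names are hypothetical.

```python
def hy_deg(senders, target, t, history):
    """Past events whose sender set contains `senders` and whose receiver is `target`."""
    senders = frozenset(senders)
    return sum(1 for S_i, r_i, t_i in history
               if t_i < t and senders <= S_i and r_i == target)


def _pairs(S, r, actors):
    """All (s, a) with s in S and a an alter different from s and r (assumed reading)."""
    for s in S:
        for a in actors:
            if a != s and a != r:
                yield s, a


def transitive_closure(S, r, t, h, actors):
    S = frozenset(S)
    return sum(min(hy_deg(S, a, t, h), hy_deg({a}, r, t, h))
               for s, a in _pairs(S, r, actors)) / len(S)


def cyclic_closure(S, r, t, h, actors):
    S = frozenset(S)
    return sum(min(hy_deg({r}, a, t, h), hy_deg({a}, s, t, h))
               for s, a in _pairs(S, r, actors)) / len(S)


def sender_balance(S, r, t, h, actors):
    S = frozenset(S)
    return sum(min(hy_deg({a}, s, t, h), hy_deg({a}, r, t, h))
               for s, a in _pairs(S, r, actors)) / len(S)


def receiver_balance(S, r, t, h, actors):
    S = frozenset(S)
    return sum(min(hy_deg(S, a, t, h), hy_deg({r}, a, t, h))
               for s, a in _pairs(S, r, actors)) / len(S)


# Toy history of hyperevents (senders, receiver, time); hypothetical data.
actors = ["a", "b", "c", "d", "e", "f"]
history = [
    (frozenset({"a", "b"}), "c", 0.0),
    (frozenset({"c", "d"}), "e", 1.0),
    (frozenset({"e", "f"}), "a", 2.0),
    (frozenset({"d", "f"}), "c", 2.5),
]

tc = transitive_closure({"a", "b"}, "e", 3.0, history, actors)  # path {a,b} -> c -> e
cc = cyclic_closure({"a", "b"}, "c", 3.0, history, actors)      # cycle c -> e -> a
sb = sender_balance({"a", "b"}, "c", 3.0, history, actors)      # f gossiped about a and c
rb = receiver_balance({"a", "b"}, "d", 3.0, history, actors)    # {a,b} and d both target c
```

Consistent with the remark after the transitive closure definition, these functions use the direction of past interactions but not their temporal order.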

4 Inference from partially observed relational event data

As discussed earlier, survey data are collected at discrete time points, asking whether gossiping occurred during that time. No information is recorded about when it occurred or how many times. This means that only partial information is available on the relational events. In particular, the event times are interval-censored, since all we know is that they belong to the wave leading up to the survey, and the counts of the events are right-censored, since all we know is that at least one gossiping event occurred during that wave. In this section, we derive the likelihood of these data, accounting for their censored nature.

4.1 The likelihood under censoring

We assume that there is an underlying relational hyperevent model (1) which describes the dynamics of gossiping hyperevents. Let $K$ be the number of surveys conducted and let $(t_{k-1},t_{k}]$, for $k=1,\ldots,K$, be the corresponding time intervals. From the inhomogeneous Poisson counting process associated to the relational event process, the number of hyperevents involving a sender set $S$ gossiping about a receiver $r$ during wave $k$ is distributed as

N_{Sr}(t_{k})-N_{Sr}(t_{k-1})~|~\mathcal{H}_{t_{k-1}}\sim\text{Poisson}\left(\int_{t_{k-1}}^{t_{k}}\lambda_{Sr}(u)\,du\right),

(4)

conditional on the history $\mathcal{H}_{t_{k-1}}$ of the process up to time $t_{k-1}$. Denoting by $y_{Srk}^{*}$ the increments, $y_{Srk}^{*}=N_{Sr}(t_{k})-N_{Sr}(t_{k-1})~|~\mathcal{H}_{t_{k-1}}$, what we observe are the right-censored counts, i.e.,

y_{Srk}\,=\,1_{\{y_{Srk}^{*}>0\}}\,=\,\begin{cases}1&\text{if at least one hyperevent occurred in }(t_{k-1},t_{k}],\\ 0&\text{otherwise}.\end{cases}

In other words, the count data are right-censored at $1$ . This leads to the following likelihood

L(\bm{\beta})=\prod_{k=1}^{K}\prod_{(S,r)\in\mathcal{R}(k)}\underbrace{\left(e^{-\int_{t_{k-1}}^{t_{k}}\lambda_{Sr}(u)\,du}\right)^{1-y_{Srk}}}_{y^{*}_{Srk}\,=\,0}\;\underbrace{\left(1-e^{-\int_{t_{k-1}}^{t_{k}}\lambda_{Sr}(u)\,du}\right)^{y_{Srk}}}_{y^{*}_{Srk}\,>\,0},

(5)

with $\mathcal{R}(k)$ denoting the risk set of potential hyperevents that could occur in wave $k$. Of those, the hyperevents that do not occur contribute to the likelihood with the Poisson probability of a zero count, whereas the hyperevents that do occur contribute with the Poisson probability of a positive count, which is all the information we have from the surveys.

As the integrals in (5) are intractable, we approximate these by

\int_{t_{k-1}}^{t_{k}}\lambda_{Sr}(u)\,du\,\approx\,(t_{k}-t_{k-1})\,\lambda_{Sr}(\bar{t}_{k}),

with $\bar{t}_{k}=(t_{k-1}+t_{k})/2$ the midpoint of the interval $(t_{k-1},t_{k}]$. The relational hyperevent model (1) provides the link between the intensities $\lambda_{Sr}(\bar{t}_{k})$ and the covariates $\bm{x}_{Sr}(\bar{t}_{k})$. Since the covariates and the hyperevents are recorded only at the time of the survey and not within the interval, there are various options for evaluating $\lambda_{Sr}(\bar{t}_{k})$. In the simulation study, we show that a good strategy is to consider

\lambda_{Sr}(\bar{t}_{k})\,=\,Y_{Sr}(\bar{t}_{k})\,\lambda_{0}(\bar{t}_{k})\,\exp\!\left\{f\Big(\dfrac{\bm{x}_{Sr}(t_{k-1})+\bm{x}_{Sr}(t_{k})}{2}\Big)+\bm{\gamma}^{\top}\bm{z}_{Sr}\right\},

(6)

i.e., to evaluate the hazard using the average of the covariate values at the two endpoints of the interval. Indeed, this value is expected to carry most of the information about the time-varying behaviour of the variables within the interval.
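A quick numerical check gives a feel for the quality of this approximation. The sketch below makes two simplifying assumptions that are not part of the model above: the baseline hazard is constant, and a single covariate moves linearly from $x(t_{k-1})$ to $x(t_{k})$ within the wave. Under these assumptions the integrated hazard has a closed form, which can be compared with the midpoint rule evaluated at the averaged covariate as in (6). All names and parameter values are hypothetical.

```python
import math


def exact_integrated_hazard(lam0, beta, x_start, x_end, t_start, t_end):
    """Integral of lam0 * exp(beta * x(u)) when x(u) is linear between the endpoints."""
    delta = t_end - t_start
    dx = x_end - x_start
    if abs(beta * dx) < 1e-12:
        # Covariate (or its effect) is constant over the wave.
        return delta * lam0 * math.exp(beta * x_start)
    return delta * lam0 * (math.exp(beta * x_end) - math.exp(beta * x_start)) / (beta * dx)


def midpoint_integrated_hazard(lam0, beta, x_start, x_end, t_start, t_end):
    """Midpoint rule with the covariate replaced by the average of its endpoint values."""
    x_bar = 0.5 * (x_start + x_end)
    return (t_end - t_start) * lam0 * math.exp(beta * x_bar)


exact = exact_integrated_hazard(0.25, 0.5, 0.0, 1.0, 0.0, 1.0)
approx = midpoint_integrated_hazard(0.25, 0.5, 0.0, 1.0, 0.0, 1.0)
relative_error = (exact - approx) / exact
```

In this toy setting the midpoint value slightly underestimates the exact integral, as expected from the convexity of the exponential, with a relative error of about one percent.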

Substituting the intensities from equation (6) into the likelihood from equation (5), leads to the approximation

\bar{L}(\bm{\beta})=\prod_{k=1}^{K}\prod_{(S,r)\in\mathcal{R}(k)}\left[e^{-(t_{k}-t_{k-1})\lambda_{Sr}(\bar{t}_{k})}\right]^{1-y_{Srk}}\left[1-e^{-(t_{k}-t_{k-1})\lambda_{Sr}(\bar{t}_{k})}\right]^{y_{Srk}},

(7)

which is the likelihood of a right-censored Poisson regression model with the quantities $\log(t_{k}-t_{k-1})$ as offsets. Indeed, conditional on the covariates, the rate $\mu_{Srk}$ of hyperevent $(S,r)$ in the risk set of wave $k$ is described by

\log(\mu_{Srk})=\log(t_{k}-t_{k-1})+\log\lambda_{Sr}(\bar{t}_{k}).

Since the censoring is at $1$ , this censored Poisson regression is equivalent to a Binomial regression with complementary log-log (cloglog) link. Indeed,

\pi_{Srk}=P(y_{Srk}=1)=P(y^{*}_{Srk}>0)=1-P(y^{*}_{Srk}=0)=1-e^{-\mu_{Srk}},

leading to

\log(-\log(1-\pi_{Srk}))=\log(\mu_{Srk}),

and thus to the predictor

\log(-\log(1-\pi_{Srk}))=\log(t_{k}-t_{k-1})+\log\lambda_{Sr}(\bar{t}_{k}).

So the binary vector of gossip indicators $y_{Srk}$ can be equivalently modelled via a Binomial regression model with offsets $\log(t_{k}-t_{k-1})$ and using the complementary log-log link function.
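The equivalence can be verified numerically. The sketch below evaluates the logarithm of the censored likelihood (7) directly from the Poisson means $\mu_{Srk}$, and compares it with the Bernoulli log-likelihood obtained from the linear predictor $\log\mu_{Srk}$ under the complementary log-log link. The function names and toy values are assumptions made for this illustration.

```python
import math


def inv_cloglog(eta):
    """Inverse complementary log-log link: pi = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))


def censored_poisson_loglik(y, mu):
    """Log of (7): zero counts contribute -mu, positive counts log(1 - exp(-mu))."""
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        total += math.log(-math.expm1(-mu_i)) if y_i == 1 else -mu_i
    return total


def bernoulli_cloglog_loglik(y, eta):
    """Bernoulli log-likelihood with success probability inv_cloglog(eta)."""
    total = 0.0
    for y_i, eta_i in zip(y, eta):
        pi = inv_cloglog(eta_i)
        total += math.log(pi) if y_i == 1 else math.log(1.0 - pi)
    return total


# Toy data: wave length 1, three dyads with different hazards (hypothetical values).
wave_length = 1.0
hazards = [0.1, 0.5, 2.0]
y = [0, 1, 1]

mu = [wave_length * lam for lam in hazards]
eta = [math.log(wave_length) + math.log(lam) for lam in hazards]  # offset + log hazard

ll_poisson = censored_poisson_loglik(y, mu)
ll_binomial = bernoulli_cloglog_loglik(y, eta)
```

The two log-likelihoods agree up to floating-point error, so any software that fits Binomial models with a cloglog link and an offset can be used to maximise (7).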

4.2 Efficient inference of flexible RHEMs

A key advantage of the likelihood formulation derived in the previous section is that efficient implementations of flexible models are available in standard statistical software packages, particularly for Binomial complementary log-log (cloglog) regression models. In particular, the complex dynamics of gossiping necessitates the use of flexible generalized additive modelling with fixed, smooth and random effects. These effects are implemented in the R package mgcv (wood2017) both for the case of censored Poisson as well as for Binomial cloglog regression. The latter will be used for the analyses in the next sections.
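To illustrate how the offset and the cloglog link work together, consider the simplest possible special case, which is not the model fitted in this paper: a constant baseline, a single binary covariate and waves of equal length. The model is then saturated, and the maximum likelihood estimates follow in closed form from the observed proportions of positive indicators in the two groups. The Python sketch below is only meant to convey this mechanism; the function name and the counts are hypothetical, and it does not replace the generalized additive model fit with mgcv.

```python
import math


def fit_cloglog_binary(events, trials, wave_length):
    """Closed-form MLE of (log lambda_0, beta) for a cloglog model with one binary covariate.

    events[g] and trials[g] are the number of positive indicators and the number of
    risk-set entries with covariate value g in {0, 1}; both proportions must lie
    strictly between 0 and 1.
    """
    log_hazard = {}
    for g in (0, 1):
        prop = events[g] / trials[g]
        # cloglog of the observed proportion, minus the offset log(wave length)
        log_hazard[g] = math.log(-math.log(1.0 - prop)) - math.log(wave_length)
    return log_hazard[0], log_hazard[1] - log_hazard[0]


# Hypothetical counts: 120 of 1000 at-risk hyperevents observed for x = 0,
# 260 of 1000 for x = 1, with waves of length 1.
log_lambda0_hat, beta_hat = fit_cloglog_binary({0: 120, 1: 260}, {0: 1000, 1: 1000}, 1.0)
```

With smooth and random effects no closed form is available, which is where the penalized likelihood machinery of generalized additive models comes in.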

5 Simulation Study

In this section, we evaluate the effectiveness of the proposed approach through a simulation study. In a first study, we evaluate the quality of parameter estimation using the proposed approach in the presence of partially observed data. In a second simulation study, we compare different ways of evaluating the time-varying covariates that are included in the model. These, as all the other available data, are measured only at the beginning and at the end of a wave.

5.1 Parameter estimation from partially observed data

We evaluate the performance of our proposed inferential procedure by constructing partially observed right-censored relational data, as in the gossiping application motivating the methodological development. In particular, we generate data from a relational hyperevent model characterized by the following intensity process

\lambda_{Sr}(t)=\lambda_{0}\exp(\beta_{1}x_{r}^{(\mathrm{girl\_alter})}+f(x_{Sr}^{\left(\text{age}\right)})),

(8)

where we mimic the effect of two exogenous covariates, gender and age, on the dynamics of gossiping. To this end, we define $x_{r}^{(\mathrm{girl\_alter})}$ as a binary indicator taking the value of $1$ if the receiver is female, and $0$ otherwise, i.e.,

x_{r}^{(\mathrm{girl\_alter})}\;=\;1_{\{\text{gender}(r)\,=\,\text{female}\}},

and set $\beta_{1}$ to $0.9$ to indicate the tendency for gossiping to be directed more toward women than men. The second variable, $x_{Sr}^{(\mathrm{age})}$ , is instead defined as the average age of all senders and receivers in a hyperevent, i.e.,

x_{Sr}^{(\mathrm{age})}\;=\;\frac{1}{|S|+1}\Bigg(\sum_{s\,\in\,S}\text{age}(s)+\text{age}(r)\Bigg).

For this variable, we simulate a decreasing smooth effect on the hazard of gossiping by defining

f(x_{Sr}^{(\mathrm{age})})=\frac{1}{1+\exp(2(x_{Sr}^{(\mathrm{age})}-16))}.

We generate relational hyperevents among $8$ interacting students from this process via a Gillespie algorithm (Gill77). In particular, inter-arrival times are drawn from an exponential distribution, and at each event time the occurring hyperevent is sampled from the risk set according to a multinomial distribution. We assume that, at any time point, all events between a sender group of size at most $3$ and a receiver among the remaining students are at risk of happening. Given the complete data, we consider $6$ waves defined by the intervals $(k-1,k]$, $k=1,\ldots,6$, and construct a right-censored version of these data by considering, for each hyperevent, whether it happened at least once within each wave.
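The data-generating steps just described can be sketched as follows. This is a simplified illustration rather than the code used for the study: the student ages and genders are hypothetical, sender sets are assumed to have between $1$ and $3$ members, and the rates are constant in time because both covariates in (8) are time-invariant.

```python
import itertools
import math
import random

LAMBDA0, BETA1 = 0.25, 0.9  # small-lambda0 setting, beta1 as in the text
N_STUDENTS, MAX_SENDERS, N_WAVES = 8, 3, 6

# Hypothetical student attributes (assumptions, not from the paper)
AGES = [14, 15, 15, 16, 16, 17, 17, 18]
IS_GIRL = [True, False, True, False, True, False, True, False]


def smooth_age_effect(mean_age):
    """Decreasing logistic effect f of the average age."""
    return 1.0 / (1.0 + math.exp(2.0 * (mean_age - 16.0)))


def build_risk_set():
    """All (sender set, receiver) pairs with the receiver outside the sender set."""
    risk_set = []
    for size in range(1, MAX_SENDERS + 1):  # assumed minimum sender-set size: 1
        for senders in itertools.combinations(range(N_STUDENTS), size):
            for receiver in range(N_STUDENTS):
                if receiver not in senders:
                    risk_set.append((senders, receiver))
    return risk_set


def rate(senders, receiver):
    """Intensity (8) for one hyperevent."""
    mean_age = (sum(AGES[s] for s in senders) + AGES[receiver]) / (len(senders) + 1)
    return LAMBDA0 * math.exp(BETA1 * IS_GIRL[receiver] + smooth_age_effect(mean_age))


def simulate_censored(risk_set, rng):
    """Gillespie simulation, reduced to 'at least one event' indicators per wave."""
    rates = [rate(s, r) for s, r in risk_set]
    cumulative = list(itertools.accumulate(rates))
    total_rate = cumulative[-1]
    y = [[0] * N_WAVES for _ in risk_set]
    t = rng.expovariate(total_rate)  # exponential inter-arrival time
    while t < N_WAVES:
        # Pick the occurring hyperevent with probability proportional to its rate
        event = rng.choices(range(len(risk_set)), cum_weights=cumulative)[0]
        wave = max(math.ceil(t) - 1, 0)  # wave k covers (k-1, k]
        y[event][wave] = 1               # right-censoring at 1
        t += rng.expovariate(total_rate)
    return y


risk_set = build_risk_set()
y = simulate_censored(risk_set, random.Random(1))
print(len(risk_set), sum(map(sum, y)))
```

Under these assumptions the risk set contains $\binom{8}{1}\cdot 7+\binom{8}{2}\cdot 6+\binom{8}{3}\cdot 5=504$ hyperevents, each contributing one binary observation per wave.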

We use the proposed inferential procedure to recover the intensity of the process from the censored data. In particular, we fit a Binomial cloglog model using the gam function from the R package mgcv, with a thin-plate regression spline on age (wood2003). The simulations are performed under a small value of $\lambda_{0}$ ($\lambda_{0}=0.25$) and a large value ($\lambda_{0}=0.75$), creating in this way two settings with a varying average number of occurrences for each hyperevent. In particular, a larger $\lambda_{0}$ value will generate a larger number of occurrences of each hyperevent within each wave and therefore a larger information loss when complete data are replaced by censored data, compared to a smaller value of $\lambda_{0}$, for which the censored data may not differ much from the complete data. For each setting, we perform 20 simulations of the process and summarize the results in terms of quality of parameter estimation of the linear (gender) and smooth (age) effects.
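The link between the expected count and the information loss can be made concrete with a back-of-the-envelope calculation, which is our own illustration and not part of the simulation study. For a Poisson count with mean $\mu$, the Fisher information about $\log\mu$ is $\mu$, whereas for the indicator of at least one event it is $\mu^{2}/(e^{\mu}-1)$, so the ratio $\mu/(e^{\mu}-1)$ is close to $1$ for small $\mu$ and decreases as $\mu$ grows. The values of $\mu$ below are illustrative, since in the simulation $\mu$ also depends on the covariates.

```python
import math


def relative_information(mu):
    """Information about log(mu) in 1{count > 0} relative to the full Poisson count."""
    info_censored = mu ** 2 / math.expm1(mu)  # Bernoulli with p = 1 - exp(-mu)
    info_complete = mu                        # Poisson with mean mu
    return info_censored / info_complete


# Illustrative expected counts per wave (assumed values)
for mu in (0.25, 0.75, 2.0):
    print(mu, round(relative_information(mu), 3))
```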

Figure 3 reports the results. In particular, Figure 3a shows boxplots of the estimated $\beta_{1}$ coefficient associated to the covariate $x_{r}^{(\mathrm{girl\_alter})}$ , under a small (top) and large (bottom) $\lambda_{0}$ value. Similarly, Figure 3b reports the estimated smooth effects for the covariate $x_{Sr}^{(\mathrm{age})}$ under both settings. As a benchmark, we take the estimates obtained from the complete data via a Poisson generalized additive model with the same specifications as the Binomial cloglog model. The results show how the estimates from the censored data are well calibrated under all settings. As expected, the information loss due to censoring leads to an increase in the uncertainty of the estimates, particularly for the case of a large $\lambda_{0}$ where multiple occurrences of the same hyperevent within each wave are more likely. This is reflected by a wider boxplot in Figure 3a (bottom) and a wider confidence band in Figure 3b (bottom), compared to the estimates from the uncensored data.

5.2 Evaluating time-varying covariates from partially observed data

Besides censoring on the number of gossiping hyperevents, a second, connected, source of information loss is given by the fact that the event times are interval-censored, as their exact times are bounded within the interval of the corresponding wave. This form of information loss occurs frequently in relational event settings, where data are sometimes provided in the form of aggregate counts across discrete time intervals rather than instantaneous events.

In these cases, the covariates that are used to describe the process are also typically evaluated only at the extremes of the interval. Referring back to the gossiping data, this is clearly the case for endogenous covariates, as they can change only when new events occur and this information is available only at the time when the survey is handed out. But this is also the case for exogenous covariates, which are likewise measured during the survey. Since most of the covariates are time-varying, a question of interest is how best to evaluate these covariates. In particular, given a wave $(t_{k-1},t_{k}]$, one could evaluate the covariates

1. At time $t_{k-1}$, i.e., using only information on the history of the process prior to the wave. We denote this strategy with past;

2. At time $t_{k}$, i.e., using all information up to and including the current wave. We denote this strategy with current;

3. As an arithmetic mean of the past and current values. We denote this strategy with average.

Intuitively, the last strategy should be the most effective one, as it provides the best description of the entire behaviour of the covariate across the interval.

We design a simulation study to compare the three evaluation strategies. In particular, we generate relational hyperevents from a model characterized by the following intensity process

\lambda_{Sr}(t)=\lambda_{0}\exp(\beta x(t)),

(9)

where the time-varying covariate is defined by $x(t)=\log(t+1)$ , the baseline intensity $\lambda_{0}$ is set to 0.038 and the regression coefficient $\beta$ to 0.8. As the covariate is time-varying, we use the tau-leap algorithm (tau-leap) to simulate relational hyperevents from this model. As before, we consider $8$ interacting students, sender sets with at most $3$ individuals, and generate data up to time $t=6$ . We consider time intervals $(k-1,k]$ , $k=1,\ldots,6$ , construct right-censored hyperevent data as before and assume that the covariate is measured only at the extremes of each interval.
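A basic fixed-step version of tau-leaping for model (9) can be sketched as follows. The step size, the Poisson sampler and the number of hyperevents ($504$, which corresponds to sender sets of size $1$ to $3$ among $8$ students) are assumptions made for this illustration; the implementation used for the study may differ. Since the intensity in (9) does not depend on $S$ or $r$, every hyperevent shares the same rate.

```python
import math
import random

LAMBDA0, BETA = 0.038, 0.8  # values from the text


def intensity(t):
    """Intensity (9) with x(t) = log(t + 1)."""
    return LAMBDA0 * math.exp(BETA * math.log(t + 1.0))


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def tau_leap_censored(n_hyperevents, n_waves, tau, rng):
    """Fixed-step tau-leaping, keeping only 'at least one event' per wave."""
    steps_per_wave = round(1.0 / tau)
    y = [[0] * n_waves for _ in range(n_hyperevents)]
    for wave in range(n_waves):
        for step in range(steps_per_wave):
            t = wave + step * tau          # rate frozen at the start of the leap
            mean = intensity(t) * tau      # expected events in this leap
            for h in range(n_hyperevents):
                # Once a wave indicator is 1, further draws cannot change it
                if y[h][wave] == 0 and poisson_draw(mean, rng) > 0:
                    y[h][wave] = 1
    return y


y = tau_leap_censored(n_hyperevents=504, n_waves=6, tau=0.05, rng=random.Random(7))
print([sum(row[k] for row in y) for k in range(6)])
```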

Figure 4 reports the results from $100$ simulations. We consider the three different ways of evaluating the covariate at each time interval and, in each case, fit a Binomial cloglog gam, with a thin-plate regression spline on this covariate. The boxplots of the estimates of $\beta$ across the $100$ simulations and the three settings show that, as expected, the average approach for the evaluation of the covariate leads to the best estimation of the parameters.
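One heuristic way to see why the average strategy performs well is to compare, at the true parameter values, the exact integrated intensity over each wave with the approximation $(t_{k}-t_{k-1})\lambda_{0}\exp(\beta\tilde{x}_{k})$ implied by each choice of $\tilde{x}_{k}$. The sketch below is our own illustration and not part of the simulation study; it relies on the closed form $\int\lambda_{0}(t+1)^{\beta}\,dt=\lambda_{0}(t+1)^{\beta+1}/(\beta+1)$, which follows from $x(t)=\log(t+1)$.

```python
import math

LAMBDA0, BETA = 0.038, 0.8  # values from the text


def x(t):
    """Time-varying covariate of model (9)."""
    return math.log(t + 1.0)


def exact_integrated_intensity(a, b):
    """Integral of LAMBDA0 * (t + 1)**BETA over (a, b]."""
    power = BETA + 1.0
    return LAMBDA0 * ((b + 1.0) ** power - (a + 1.0) ** power) / power


def approximate_integrated_intensity(a, b, strategy):
    """Interval length times the intensity at the chosen covariate value."""
    values = {"past": x(a), "current": x(b), "average": 0.5 * (x(a) + x(b))}
    return (b - a) * LAMBDA0 * math.exp(BETA * values[strategy])


errors = {}
for k in range(1, 7):
    exact = exact_integrated_intensity(k - 1, k)
    errors[k] = {
        s: abs(approximate_integrated_intensity(k - 1, k, s) - exact)
        for s in ("past", "current", "average")
    }
    print(k, {s: round(e, 4) for s, e in errors[k].items()})
```

In this setting the past strategy underestimates and the current strategy overestimates the integrated intensity in every wave, while the average strategy stays closest to the exact value.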

6 Modelling gossiping in the RECENS school survey study

6.1 Gossiping hyperevent model

We now return to the gossiping study, and consider the following relational hyperevent model to describe the dynamics of gossiping:

\begin{aligned}
\lambda_{Sr}(t)={}&\lambda_{0}(t)\exp\Big\{\beta_{1}x_{r}^{(\mathrm{girl\_alter})}+f_{1}(x_{S}^{(\mathrm{girl\_ego})})+f_{2}(x_{S}^{(\mathrm{sd})}(t))+f_{3}(x_{r}^{(\mathrm{rd})}(t))+f_{4}(x_{Sr}^{(\mathrm{rep})}(t))\\
&+f_{5}(x_{Sr}^{(\mathrm{sub\_rep})}(t))+f_{6}(x_{Sr}^{(\mathrm{rec})}(t))+f_{7}(x_{Sr}^{(\mathrm{tc})}(t))+f_{8}(x_{Sr}^{(\mathrm{cc})}(t))+f_{9}(x_{Sr}^{(\mathrm{sb})}(t))+f_{10}(x_{Sr}^{(\mathrm{rb})}(t))\\
&+\gamma_{\mathrm{class}(S,r)}+\gamma_{r}+\gamma_{S}\Big\}.
\end{aligned}

Besides a flexible baseline function $\lambda_{0}$, capturing temporal changes in the rate of gossiping, the first two variables are exogenous variables defined by

\begin{aligned}
x_{r}^{(\mathrm{girl\_alter})}&=1_{\{\text{gender}(r)\,=\,\text{female}\}},\\
x_{S}^{(\mathrm{girl\_ego})}&=\frac{1}{|S|}\sum_{s\in S}1_{\{\text{gender}(s)\,=\,\text{female}\}},
\end{aligned}

and describing the effect of gender on the rate of gossiping. The following $9$ variables are instead the endogenous covariates defined in Table 1, capturing the potentially viral nature of gossiping. The last three terms are random effects for class, receiver, and sender, respectively. These account for unobserved heterogeneity in the rate of gossiping between the different classes, the different receivers or the different sender groups, respectively. The inclusion of these random effects reduces potential confounding induced by unobserved heterogeneity. For example, rutaheterogeneity show how failing to adjust for node degree heterogeneity may generate spurious “ghost” triadic effects. Besides the binary variable measuring the gender of the receiver $x_{r}^{(\mathrm{girl\_alter})}$ , all other covariates are included in the model as flexible smooth effects, described by the functions $\lambda_{0},f_{1},\dots,f_{10}$ .
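As a small illustration of the two exogenous covariates, the sketch below evaluates them for a hypothetical hyperevent; the student identifiers and genders are invented for the example.

```python
def girl_alter(receiver, is_girl):
    """Indicator that the receiver is female."""
    return 1 if is_girl[receiver] else 0


def girl_ego(senders, is_girl):
    """Proportion of girls in the sender set S."""
    return sum(1 for s in senders if is_girl[s]) / len(senders)


# Hypothetical students (assumption for this example)
is_girl = {"anna": True, "bence": False, "csilla": True, "dora": True}

senders, receiver = ("anna", "bence", "csilla"), "dora"
print(girl_ego(senders, is_girl), girl_alter(receiver, is_girl))
```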

6.2 The dynamics of gossiping inferred from the RECENS school survey study

We fit the model on the school survey data discussed earlier. The four waves have different lengths, so we set the time intervals according to the duration in months of each wave, i.e., $(0,2]$, $(2,8]$, $(8,20]$, $(20,32]$, respectively. The covariates are evaluated for all hyperevents in the risk set across all waves, for a total of $299,419$ potential hyperevents. Sender and receiver degree are included with a log-transformation, $\log(x+1)$, for computational stability. For the time-varying covariates, we take the average of their evaluation at the two extremes of the time interval of the corresponding wave, as indicated by the simulation study. We then fit a Binomial generalized additive mixed model with cloglog link via the bam function from the mgcv package, with a thin-plate regression spline on each smooth term (wood2017). The observations from each wave have a corresponding offset given by the logarithm of the length in months of that wave, where the lengths are $2$, $6$, $12$, $12$ for the 4 waves, respectively. In order to perform automatic model selection, we use the double-penalty approach of penaltySelectTrue, whereby each smooth term receives an additional penalty which allows it to be shrunk entirely to zero if it is uninformative.
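The wave-specific offsets follow directly from the interval boundaries. The short sketch below computes them; it is an illustration of the bookkeeping only, with variable names of our own choosing, and does not reproduce the model fit itself.

```python
import math

# Wave boundaries in months, as given in the text
wave_bounds = [(0, 2), (2, 8), (8, 20), (20, 32)]

# Offset for every observation in wave k: log(t_k - t_{k-1})
wave_lengths = [end - start for start, end in wave_bounds]
offsets = [math.log(length) for length in wave_lengths]

for k, (length, offset) in enumerate(zip(wave_lengths, offsets), start=1):
    print(f"wave {k}: length {length} months, offset {offset:.4f}")
```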

The results are summarized in Table 2. Besides the fixed effects, which are found to be significant according to the small associated p-values, the informative variables are those with large effective degrees of freedom (edf). Figure 5 plots the fitted smooth effects associated with four of the informative variables. The results highlight a number of key factors and mechanisms underlying gossiping.

Fixed effects
	estimate	std. error	z value	p-value
intercept	-8.7922	0.4981	-17.653	$<2\times 10^{-16}$	***
girl alter	0.5006	0.1858	2.694	0.00706	**
Smooth effects
	edf	ref.df	chi-sq	p-value
girl ego	1.634	9	70.06	0.000354	***
time	0.971	3	98.82	$<2\times 10^{-16}$	***
sender degree	5.927	9	233.52	$<2\times 10^{-16}$	***
receiver degree	3.430	7	41.99	7.33e-05	***
repetition	5.737e-05	3	0.00	0.69478
subset repetition	4.932	9	147.14	$<2\times 10^{-16}$	***
reciprocity	4.959e-05	9	0.00	0.80313
transitive closure	7.853e-05	5	0.00	0.34624
cyclic closure	3.822e-05	9	0.00	0.91670
sender balance	6.748e-05	9	0.00	0.56604
receiver balance	5.949e-05	7	0.00	0.49844
Random effects
	# groups	std. dev.	95% CI
class	44	0.742	[0.487, 1.130]
receiver	1686	0.497	[0.186, 1.330]
sender set	7276	1.479	[1.317, 1.662]

Table 2: Results from the Binomial generalized additive mixed model of gossiping fitted to the school survey data with a double-penalty approach (penaltySelectTrue), showing fixed, smooth and random effects.

Firstly, girls are both more likely to gossip and to be the target of gossiping. The first aspect is supported by the increasing smooth effect of $x_{S}^{(\mathrm{girl\_ego})}$ (Figure 5a), while the second aspect is reflected by the positive linear effect of $x_{r}^{(\mathrm{girl\_alter})}$ (Table 2) on the rate of gossiping. Combined with the fact that the majority of sender groups are made up of only girls (52.3%), this aligns with prior research suggesting that same-gender gossip is frequent, and is more common among girls (nynkegossip; KisfalusiGossipReputation).

Regarding the effect of the endogenous covariates, only a small number of these are found to play a significant role in gossiping. This may be partly due to the small number of time points available, which may limit the data available for estimating these effects, in particular the triadic ones. As for the informative variables, the smooth effects relative to sender degree (Figure 5b), receiver degree (Figure 5c) and subset repetition (Figure 5d) show a consistent non-linear pattern on the rate of gossiping, characterized by an initial increase, a plateau and a sharp decrease at high activity levels. This points to an intense but short-lived gossiping activity which does not become persistently viral. In particular, the fact that a sender group has been involved in a gossiping event, a receiver has been the target of gossiping, or a sender group has targeted a specific receiver, increases the rate at which the same sender group, the same target or the same event, respectively, occurs again in the future. However, this effect fades away after a few occurrences of the same event, and the event is in fact less likely to occur again at high activity levels. Whereas the literature suggests that students who gossip continue to do so (KisfalusiGossipReputation), our results show that this effect does not become viral. Similarly, our results show that being the target of past gossiping increases the rate of being targeted again, in line with the positive linear effect fitted by KisfalusiGossipReputation. However, in this case too, the use of more complex models with non-linear smooth effects shows that there is no persistent viral targeting of the same receiver.

Another important aspect of our model, which is not considered by the existing literature on gossiping, is the inclusion of random effects for sender group, receiver and class. Table 2 shows heterogeneity associated with each of these effects, as indicated by significant standard deviations. The use of these random effects allows us to disentangle the viral effect of gossiping from unobserved individual traits. The latter may be related to someone’s personality, visibility, or social prominence, and may explain the high gossiping activity associated with some individuals or the persistent targeting of others. Our results indeed suggest that being a gossiper or being targeted by gossiping is driven more by intrinsic traits than by past exposure, possibly explaining the apparent discrepancy with existing studies regarding the effect of endogenous covariates (KisfalusiGossipReputation). Finally, our model is fitted to the data from all classes, rather than separately for a small selection of classes (KisfalusiGossipReputation). The joint modelling approach allows for borrowing strength across classes, which is particularly important in this study with a small number of total observations. However, the inclusion of a class random effect allows us to conclude that there is significant heterogeneity at the class level which is accounted for by our model.

7 Conclusion

Gossiping is a social behaviour influenced by multiple factors in a complex and dynamic way. In this paper, we present a relational hyperevent model that describes the complexity of this process via a number of exogenous and endogenous variables. These variables capture the heterogeneity and potential virality of the phenomenon and have been defined to account for the higher-order nature of gossiping, whereby at least two gossipers are needed to gossip about a third non-present person.

We fit the model to data from a longitudinal school survey from Hungarian secondary schools. Limitations in data collection require an extension of the inference to accommodate right-censored hyperevent data and interval-censored event times. Thanks to a reformulation of the likelihood to that of standard regression models, we are able to fit complex models with linear, smooth and random effects in the presence of partially observed data. Our results show how the inclusion of smooth effects, capturing complex non-linear dynamics, and random effects, capturing unobserved heterogeneity, may play an important role in disentangling the viral effects of gossiping from those related to individual traits.

Overall, by adapting relational hyperevent models to higher-order gossiping interactions and extending them to handle partially observed data, our study broadens the applicability of these models to a wider range of empirical settings, providing researchers with a flexible tool for investigating collective dynamics from incomplete network data.

8 Code availability

The code for reproducing the simulation study and the empirical analysis of gossip data can be found at the GitHub repository page https://github.com/veronicapoda/gossip-rhem.git.

Acknowledgments

We thank the Research Center for Educational and Network Studies (RECENS) for providing access to the school survey dataset and useful information about the data.