
2. Social Network Analysis
Social network analysis [Kre2002] is concerned with
measuring the relationships and flows between
entities. An IRC channel can be modeled as a social
network, as each individual user is an entity and the
interactions between users imply relationships and flows. Such social
networks can provide a mathematical analysis of the
relationships in an IRC channel, yet visual
representations are often easier to comprehend. The
network is modeled as a graph, consisting of a set of
nodes and edges, where each node represents a user and
an edge represents a relationship between a pair of
nodes, as shown in Figure 2.
Visualization of social networks is important, as it
allows the viewer to determine facts about nodes and
relationships between nodes more rapidly than by
examining the raw mathematical model. For example,
the prominence of a node in the network can be
determined by its centrality, which is easy to see in a
visualization of a social network.
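To illustrate how the same fact could be read from the mathematical model rather than the picture, the following Python sketch computes a simple weighted degree centrality. It rests on assumptions: the text does not say which centrality measure is intended, and the function name, the edge representation, and the sample nicknames and weights are hypothetical.

```python
from collections import defaultdict


def weighted_degree_centrality(edges):
    """Sum the weights of all edges touching each node.

    `edges` maps an unordered pair of nicknames (a frozenset) to a weight.
    This is one simple centrality measure, chosen here for illustration.
    """
    centrality = defaultdict(float)
    for pair, weight in edges.items():
        for node in pair:
            centrality[node] += weight
    return dict(centrality)


# Hypothetical channel: Dave talks to everyone, the others only to Dave.
edges = {
    frozenset({"Dave", "Phil"}): 3.0,
    frozenset({"Dave", "Anna"}): 2.0,
    frozenset({"Dave", "Bob"}): 1.0,
}
scores = weighted_degree_centrality(edges)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

In a drawing of this small graph, Dave would sit in the middle with every edge attached to him, which is what the score expresses numerically.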
3. Inferring Social Networks
A social network is virtually present whenever we
observe a group of people interacting electronically
[Wel1997]. The first step in visualizing the social
network of an IRC channel is to infer the approximate
mathematical representation. Identifying the nodes in the
graph is a trivial task, as these simply correspond to the
users in the channel. Identifying the presence of edges is
slightly more difficult, as this can only be done by
monitoring the activity in the channel and identifying
specific classes of user interactions. Furthermore, we
enhance our social network model by assigning
weightings to each edge to show the strength of each
relationship.
Fortunately, there are some fairly simple heuristics
that enable us to obtain reasonably accurate
approximations of the data required to produce the social
network, most of which are analogous to inferring social
networks from real-life conversations. Note that the
accuracy of these approximations is inherently
subjective and that the social network derived
from the heuristics can be no more than a good guess.
However, in practice, we find that the results are
generally good. We call this first stage inferring the
social network.
3.1. Inferring Relationship Strengths
An IRC bot is used to monitor channels and infer the
social network structure for us (the term bot is commonly
used to describe an automated IRC client and is a
contraction of robot). The bot is called PieSpy and has
been implemented in Java using the PircBot IRC Bot
Framework [Mut2001]. The bot is instructed to join a
channel and examine the messages and actions sent to
the channel. Each user has a unique nickname (or nick)
and each message includes a source nickname so it is
possible to tell which user it came from. To begin with,
the inferred graph contains only a set of nodes to
represent the users in a channel. All that remains is for us
to build the set of weighted edges.
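The starting point described above, a set of nodes with weighted edges added over time, might be represented as in the following Python sketch. This is not PieSpy's actual data structure (PieSpy is written in Java); the class name, method names, and the default increment of 1.0 are assumptions made for illustration.

```python
class InferredGraph:
    """Undirected graph of nicknames with weighted edges (hypothetical)."""

    def __init__(self, nicks):
        # Initially the graph holds only the users seen in the channel.
        self.nodes = set(nicks)
        self.weights = {}  # frozenset({a, b}) -> accumulated weight

    def strengthen(self, a, b, amount=1.0):
        """Add `amount` to the edge between a and b, creating it if needed."""
        if a == b:
            return  # a user has no relationship with themselves
        self.nodes.update((a, b))
        pair = frozenset((a, b))
        self.weights[pair] = self.weights.get(pair, 0.0) + amount

    def weight(self, a, b):
        """Return the current weight of the edge, or 0.0 if there is none."""
        return self.weights.get(frozenset((a, b)), 0.0)


graph = InferredGraph(["Dave", "Phil", "Anna"])
graph.strengthen("Phil", "Dave")       # one inferred interaction
graph.strengthen("Dave", "Phil", 0.5)  # a weaker piece of evidence
print(graph.weight("Phil", "Dave"))
```

Keying edges by an unordered pair keeps the graph undirected, so evidence from either direction of a conversation accumulates on the same edge.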
3.2. Direct Addressing of Users
The first simple method we use to infer relationships
in the graph is to monitor occurrences of direct
addressing. This is where a user attempts to target a
channel message to another user by specifying their
nickname, as shown in Figure 3. Direct addressing is very
common in a channel and usually involves the target
nickname being stated before the actual message, often
separated by a colon or other punctuation. This is a
simple yet reliable way of building the set of edges in the
graph, but it works best in conjunction with other
methods.
<Dave> Can someone ping me?
<Phil> Dave: Okay.
Figure 3 An example of direct addressing
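A minimal Python sketch of detecting direct addressing, using the exchange in Figure 3 as input, is given below. The function name, the set of punctuation characters stripped, and the use of simple lower-casing to compare nicknames are assumptions; the text does not describe the exact matching rules the bot applies.

```python
def addressed_nick(sender, message, channel_nicks):
    """Return the nick that `message` is directly addressed to, or None.

    Only the first word of the message is examined, since the target
    nickname is usually stated before the actual message.
    """
    words = message.split()
    if not words:
        return None
    # Strip separators such as ':' or ',' that may follow the nickname.
    candidate = words[0].rstrip(":,;.!?").lower()
    for nick in channel_nicks:
        if nick.lower() == candidate and nick.lower() != sender.lower():
            return nick
    return None


nicks = ["Dave", "Phil"]
print(addressed_nick("Dave", "Can someone ping me?", nicks))
print(addressed_nick("Phil", "Dave: Okay.", nicks))
```

The first call finds no target, while the second identifies Dave as the target of Phil's message, which would strengthen the edge between Phil and Dave. A nickname that is also a common word would cause false matches under this sketch, which is one reason to combine the heuristic with others.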
3.3. Temporal Proximity
Direct addressing is not always used (or required) to
specify the target of a message. A message without
explicit direct addressing is either targeted to everybody
in the channel, or it is targeted to an individual user.
As in a real-life conversation, if there is a long
period of silence before a user sends a message and this
message is immediately followed by a message from
another user, then it is reasonable to infer that the
second message was in response to the first. The fact that
the second message was probably a response to the first
allows us to infer a relationship between the two users.
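The temporal proximity heuristic can be sketched in Python as follows. The text gives no numeric values for a "long period of silence" or for "immediately", so the thresholds `min_silence` and `max_reply_delay`, like the function name and the message format, are hypothetical.

```python
def proximity_pairs(messages, min_silence=60.0, max_reply_delay=10.0):
    """Return (first_nick, replier_nick) pairs inferred from timing.

    `messages` is a list of (timestamp_seconds, nick) tuples in time order.
    A pair is reported when a message follows a long silence and is
    quickly followed by a message from a different user.
    """
    pairs = []
    for i in range(1, len(messages) - 1):
        prev_time, _ = messages[i - 1]
        time, nick = messages[i]
        next_time, next_nick = messages[i + 1]
        after_silence = time - prev_time >= min_silence
        quick_reply = next_time - time <= max_reply_delay
        if after_silence and quick_reply and next_nick != nick:
            pairs.append((nick, next_nick))
    return pairs


# Hypothetical log: Dave speaks after five minutes of silence,
# and Phil answers four seconds later.
log = [(0.0, "Anna"), (300.0, "Dave"), (304.0, "Phil"), (305.0, "Dave")]
print(proximity_pairs(log))
```

The very first message in the log is skipped as a candidate because the length of the silence before it is unknown.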
3.4. Temporal Density
If there are no long delays in a channel’s
conversation, it is still possible to derive clues about the
structure of the social network by examining other
temporal features. If the last n messages have been sent
within a short time span and all n of these messages
originate from only two users, then it is reasonable to
assume that these two users are engaged in conversation.
We find that values of n > 5 allow us to build the set of
edges and their weightings in the graph fairly accurately.
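A possible Python sketch of the temporal density check is shown below. The default n = 6 is the smallest value satisfying n > 5; the length of a "short time span" is not stated in the text, so `max_span`, along with the function name and message format, is an assumption.

```python
def dense_pair(messages, n=6, max_span=120.0):
    """Return the two users in conversation, or None.

    `messages` is a list of (timestamp_seconds, nick) tuples in time order.
    The last `n` messages must fall within `max_span` seconds and must
    all originate from exactly two users.
    """
    if len(messages) < n:
        return None
    window = messages[-n:]
    span = window[-1][0] - window[0][0]
    nicks = {nick for _, nick in window}
    if span <= max_span and len(nicks) == 2:
        return frozenset(nicks)
    return None


# Hypothetical log: Dave and Phil alternate every ten seconds.
log = [(t * 10.0, "Dave" if t % 2 == 0 else "Phil") for t in range(6)]
print(dense_pair(log))
```

Each time the check succeeds, the weight of the edge between the two users could be increased, so that longer conversations yield stronger relationships.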
3.5. Monitoring Private Messages
Each IRC user is able to bypass channel discussions
and send messages directly to other users. This is the
strongest and most accurate indication of a relationship
between sender and recipient. Our bot does not
implement this heuristic, as it would require special
access to the servers that make up an IRC network and
raises strong ethical debate about the privacy of users. As
users may also be in more than one channel, it is not a