Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities. The nodes in the network are the people and groups while the links show relationships or flows between the nodes.[Ref]

## Relevant questions w.r.t. Social Network analysis

- What patterns are created by the aggregate of interactions in a social media space?
- How are participants connected to one another?
- What social roles exist and who plays critical roles like connector, answer person, discussion starter, or content caretaker?
- What discussions, pages, or files have attracted the most interest from different kinds of participants?
- How do network structures correlate with the contributions people make within the social media space?
- What relationships exist among social entities, and what are the patterns and implications of these relationships?
- What are the methods and models for analyzing social network data?

## Goals

Network graphs can be explored along multiple dimensions, most prominently scale and time. Some research questions focus on the structure of the whole graph or large sub-graphs; others focus on identifying individual nodes that are of particular interest. Some analysts will want to analyze the whole graph aggregated over its entire lifetime; others will want to slice the network into units of time to explore the progression of the network’s development.[Ref] Perer and Shneiderman[Ref] identified the following list of goals:

- Overall network metrics, e.g., number of nodes, number of edges, density, diameter
- Node rankings, e.g., degree, betweenness, closeness centrality
- Edge rankings, e.g., weight, betweenness centrality
- Node rankings in pairs, e.g., degree vs. betweenness, plotted on a scattergram
- Edge rankings in pairs
- Cohesive subgroups, e.g., finding communities
- Multiplexity, e.g., analyzing comparisons between different edge types, such as friends vs. enemies.

## Kite Network Example

To better understand the meaning of each graph metric, the following Kite network, created by David Krackhardt, is widely used:

Figure 1 (Kite Network) – Ref
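Because every metric in this article is illustrated on the Kite network, it can help to have the graph in machine-readable form. The following is a minimal sketch in plain Python. It assumes the edge list commonly given for Krackhardt's kite and the ten names used in this article; `KITE_EDGES` and `build_adjacency` are illustrative names, not part of any library.

```python
from collections import defaultdict

# Undirected edge list assumed for Krackhardt's kite, using this article's names.
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_adjacency(edges):
    """Return {vertex: set of neighbors} for an undirected edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


kite = build_adjacency(KITE_EDGES)
print(len(kite), "vertices,", len(KITE_EDGES), "edges")  # 10 vertices, 18 edges
print(sorted(kite["Heather"]))  # ['Fernando', 'Garth', 'Ike']
```

An adjacency mapping of this kind is enough to compute all of the metrics discussed in the rest of the article.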

## Key Concepts

**Actor:** Actors are discrete individual, corporate, or collective social units. Examples of actors are people in a group, departments within a corporation, public service agencies in a city, or nation-states in the world system. Use of the term “actor” does not imply that these entities necessarily have volition or the ability to “act.” Further, most social network applications focus on collections of actors that are all of the same type (for example, people in a work group). Such collections are called one-mode networks. However, some methods allow one to look at actors of conceptually different types or levels, or from different sets.[Ref]

**Relational Tie:** Actors are linked to one another by social ties. The range and type of ties can be quite extensive. The defining feature of a tie is that it establishes a linkage between a pair of actors. Some of the more common examples of ties employed in network analysis are:

- Evaluation of one person by another (for example expressed friendship, liking, or respect)
- Transfers of material resources (for example business transactions, lending or borrowing things)
- Association or affiliation (for example jointly attending a social event, or belonging to the same social club)
- Behavioral interaction (talking together, sending messages)
- Movement between places or statuses (migration, social or physical mobility)
- Physical connection (a road, river, or bridge connecting two points)
- Formal relations (for example authority)
- Biological relationship (kinship or descent) [Ref]

**Dyad:** At the most basic level, a linkage or relationship establishes a tie between two actors. The tie is inherently a property of the pair and therefore is not thought of as pertaining simply to an individual actor. Many kinds of network analysis are concerned with understanding ties among pairs. All of these approaches take the dyad as the unit of analysis. A dyad consists of a pair of actors and the (possible) tie(s) between them. Dyadic analyses focus on the properties of pairwise relationships, such as whether ties are reciprocated or not, or whether specific types of multiple relationships tend to occur together. [Ref]

**Triad:** Many important social network methods and models focus on the triad: a subset of three actors and the (possible) tie(s) among them. Balance theory has informed and motivated many triadic analyses. [Ref]

**Subgroup:** Dyads are pairs of actors and associated ties; triads are triples of actors and associated ties. It follows that we can define a subgroup of actors as any subset of actors, and all ties among them. Locating and studying subgroups using specific criteria has been an important concern in social network analysis. [Ref]

**Group:** Network analysis is not simply concerned with collections of dyads, or triads, or subgroups. To a large extent, the power of network analysis lies in the ability to model the relationships among systems of actors. A system consists of ties among members of some (more or less bounded) group. A group, then, consists of a finite set of actors who for conceptual, theoretical, or empirical reasons are treated as a finite set of individuals on which network measurements are made.[Ref]

**Relation:** The collection of ties of a specific kind among members of a group is called a relation. For example, the set of friendships among pairs of children in a classroom, or the set of formal diplomatic ties maintained by pairs of nations in the world, are ties that define relations. For any group of actors, we might measure several different relations (for example, in addition to formal diplomatic ties among nations, we might also record the dollar amount of trade in a given year). It is important to note that a relation refers to the collection of ties of a given kind measured on pairs of actors from a specified actor set. The ties themselves only exist between specific pairs of actors. [Ref]

**Network:** A social network consists of a finite set or sets of actors and the relation or relations defined on them. The presence of relational information is a critical and defining feature of a social network. [Ref]

**Overall Network Metrics** [Ref]

**Graph type:** Undirected or directed.

**Vertices:** The total number of vertices.

**Total edges:** The total number of edges.

**Self-loops:** The number of edges that connect a vertex with itself.

**Connected components:** The number of connected components (i.e., clusters of vertices that are connected to each other but separate from other vertices in the graph). In the Kite network there is only one connected component because you can get from any vertex to every other vertex. In contrast, the Invitation network (below) includes two connected components: the large group containing Carol, Ed, Dave, Bob, Ann, and Frank, and the smaller component containing Gary and Helen.
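Connected components can be found with a simple graph traversal. The sketch below is one way to do it in plain Python, using breadth-first search and ignoring edge direction. Only the eight names and the two-component split come from the text; the individual invitations in `INVITATIONS` are hypothetical, chosen purely for illustration, and `connected_components` is an illustrative helper name.

```python
from collections import defaultdict, deque

# HYPOTHETICAL (inviter, invitee) pairs: the article names the people and the
# two groups, but does not list who invited whom, so these edges are assumed.
INVITATIONS = [
    ("Ann", "Bob"), ("Ann", "Carol"), ("Bob", "Carol"),
    ("Carol", "Dave"), ("Dave", "Ed"), ("Ed", "Frank"),
    ("Gary", "Helen"),
]


def connected_components(edges):
    """Group vertices into components, treating every edge as undirected."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v] - component:
                component.add(w)
                queue.append(w)
        seen |= component
        components.append(component)
    return components


components = connected_components(INVITATIONS)
print(len(components))                     # 2
print(sorted(len(c) for c in components))  # [2, 6]
```

Note that a vertex with no edges at all never appears in an edge list, so isolated vertices would have to be supplied separately in this representation.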

**Single vertex connected components:** The number of isolated vertices that are not connected to any other vertices in the graph. There are no isolated vertices in the Kite network or Invitation network.

**Maximum vertices in a connected component:** The number of vertices in the connected component with the most vertices. This is equal to the number of vertices in the Kite network, because they are all part of the only connected component. In the Invitation network, the largest component includes six people, so this value would be 6.

**Maximum edges in a connected component:** The number of edges in the connected component with the most edges. This is equal to the number of edges in the Kite network, because they are all part of the only connected component. In the Invitation network, the component with the most edges has six connections.

**Maximum geodesic distance (diameter):** The geodesic distance is the length of the shortest path between two people. If you think of the edges as roads and the vertices as houses, the geodesic distance would be the number of roads someone must take to get from one house to another, assuming that the person is traveling on the shortest path possible. The maximum geodesic distance, or diameter of a network, is the largest geodesic distance of all, or the distance between the two vertices that are farthest from each other. In the Kite network, this value is 4. For example, the shortest path between Jane and Diane has length 4; similarly the shortest paths from Jane to Andre, Beverly, Carol, and Ed also have length 4. All other geodesic distances are smaller. For example, the shortest path between Jane and Ike has length 1.

**Average geodesic distance:** The average of all geodesic distances. *This value gives a sense of how “close” community members are to one another.* If it is high, many individuals in the social network do not directly know each other. People may be connected through a friend of a friend of a friend of a friend, but not through short paths. If it is low, most people know one another either directly or through a mutual friend.

**Graph density:** A number between 0 and 1 indicating how interconnected the vertices are in the network. For an undirected graph with no self-loops or duplicate edges, the graph density is calculated by dividing the number of total edges by the maximum number of possible edges, which is n(n − 1)/2 for n vertices. For the Kite network, there are 18 edges and 45 possible edges (10 × 9 / 2), resulting in a graph density of 0.4. A denser graph (e.g., 0.6) would include more total edges for the same number of vertices. See how Facebook is leveraging this… “Graph density is a predictor of a user becoming engaged so if the flow fails to deliver upon a minimal quantity and quality of density, then it isn’t doing its job.”[Ref]
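The Kite network figures quoted above (density 0.4, diameter 4) can be checked with a short script. This is a minimal sketch in plain Python, assuming the edge list commonly given for Krackhardt's kite with this article's names. The average geodesic distance is taken here over all pairs of distinct vertices, which is one reasonable reading of the definition; a given tool may average slightly differently.

```python
from collections import defaultdict, deque
from itertools import combinations

# Undirected edge list assumed for Krackhardt's kite, using this article's names.
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]

adjacency = defaultdict(set)
for a, b in KITE_EDGES:
    adjacency[a].add(b)
    adjacency[b].add(a)


def geodesic_distances(source):
    """Breadth-first search: shortest path length from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


n = len(adjacency)
density = len(KITE_EDGES) / (n * (n - 1) / 2)  # edges / possible edges

all_dist = {v: geodesic_distances(v) for v in adjacency}
pair_lengths = [all_dist[a][b] for a, b in combinations(adjacency, 2)]
diameter = max(pair_lengths)
average_distance = sum(pair_lengths) / len(pair_lengths)

print(density)                     # 0.4
print(diameter)                    # 4
print(round(average_distance, 3))  # 1.978
```

The script assumes a connected graph; in a graph with several components, some pairs have no path and would need separate handling.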

**Node/Vertices-specific metrics** [Ref]

**Degree:** The degree of a vertex (sometimes called degree centrality) is a count of the number of unique edges that are connected to it. Diane has a degree of 6 because she is directly connected to six other individuals. In comparison, Jane has a degree of only 1 because she is connected to only one other person. If the edges represented the strong friendship ties of individuals in a class, we might say that Diane is the most popular person in the class and Jane is the least popular. If you were analyzing a directed graph (such as the Party Invitation network), the single degree metric would be split into two metrics: (1) In-Degree, which measures the number of edges that point toward the vertex of interest (i.e., the number of people that have invited the person to the party), and (2) Out-Degree, which measures the number of edges that the vertex of interest points toward (i.e., the number of people the person has invited to the party).

**Betweenness Centrality:** Although popularity is important, it is not everything. Consider Heather in the Kite network. She is directly related to only three other people (i.e., she has a degree of 3). Despite her relatively low degree, her position as a “*bridge*” between Ike (and indirectly Jane) and the rest of the group may be of utmost importance. If, for example, information were passed from one person to another, Heather would be vital for assuring that Ike and Jane could communicate with the rest of the group. In fact, if she were removed from the network, Ike and Jane would be disconnected from the other class members. Thus, Heather has high betweenness centrality. In contrast, Ed has a betweenness centrality of 0. Notice that if he were removed from the graph, everyone else would still be connected to one another, and their shortest communication paths would not even be altered. More generally, vertices that lie on many of the shortest paths between other vertices (called *geodesics*) have a higher betweenness centrality than those that do not lie on such paths.

**Closeness Centrality:** Another characteristic you may care about is *how close each person is to the other people in the network*. If information needed to flow through the network, some people would be able to get a message to all the other people relatively quickly (i.e., in few steps), whereas others may require many steps. Closeness centrality, as defined here, is the average shortest distance from a vertex to every other vertex, so a lower value means a more central vertex. (Some tools report the reciprocal instead, so that higher values indicate greater closeness.) In the Kite network, Fernando and Garth have the lowest Closeness Centrality measure, suggesting that they may be in a good position to initiate the spread of information through the network. Stanley Milgram's small-world experiments explored the idea that *any two people are connected by a short chain of acquaintances, about six steps on average*; his work is the motivation for the idea of “*six degrees of separation*.”
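The three vertex metrics above can be reproduced on the Kite network with a few dozen lines of plain Python. The sketch below assumes the edge list commonly given for Krackhardt's kite, computes closeness as the average geodesic distance (lower is more central, as in the text), and uses a Brandes-style accumulation for betweenness. The betweenness values are raw counts of shortest-path shares, not normalized; a particular tool may scale them differently.

```python
from collections import defaultdict, deque

# Undirected edge list assumed for Krackhardt's kite, using this article's names.
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]

adjacency = defaultdict(set)
for a, b in KITE_EDGES:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree: number of distinct neighbors.
degree = {v: len(neighbors) for v, neighbors in adjacency.items()}


def bfs(source):
    """Return distances, shortest-path counts, predecessors and visit order."""
    dist, sigma = {source: 0}, defaultdict(int)
    sigma[source] = 1
    preds, order, queue = defaultdict(list), [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adjacency[v]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v precedes w on a shortest path
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


betweenness = {v: 0.0 for v in adjacency}
average_distance = {}
for s in adjacency:
    dist, sigma, preds, order = bfs(s)
    # Closeness as defined in the text: mean distance to every other vertex.
    average_distance[s] = sum(dist.values()) / (len(dist) - 1)
    # Walk back from the farthest vertices, sharing out path dependencies.
    delta = defaultdict(float)
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
        if w != s:
            betweenness[w] += delta[w]
# Each undirected pair was counted once from each endpoint.
betweenness = {v: b / 2 for v, b in betweenness.items()}

print(degree["Diane"], degree["Jane"])              # 6 1
print(round(betweenness["Heather"], 3))             # 14.0
print(betweenness["Ed"])                            # 0.0
print(round(average_distance["Fernando"], 3))       # 1.556
```

Heather's score of 14 corresponds to the 14 pairs made up of Ike or Jane and one of the seven remaining people, all of whose shortest paths pass through her.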

**Eigenvector Centrality:** In many cases, a connection to a popular individual is more important than a connection to a loner. The Eigenvector Centrality network metric takes into consideration not only how many connections a vertex has (i.e., its degree), but also how well connected the vertices it is connected to are. Both Heather and Ed have a degree of 3. However, Ed is directly connected to Diane, the most popular person in the class, whereas Heather is connected to Ike, who is among the least popular. This explains why the Eigenvector Centrality metric for Heather is lower than it is for Ed.

**Clustering Coefficient:** In some cases, a person’s friends may be friends with each other. For example, Ed’s three friends Beverly, Diane, and Garth are all directly connected to one another, creating a *clique*. More generally, a clique or complete graph occurs when all vertices in a group are directly connected to each other. In other cases, a person’s friends may not be friends with one another. For example, Ike’s two friends, Heather and Jane, are not friends with each other. The clustering coefficient measures how connected a vertex’s neighbors are to one another. More specifically, it is the number of edges connecting a vertex’s neighbors divided by the total number of possible edges between the vertex’s neighbors. For example, Heather’s three neighbors are Fernando, Garth, and Ike. Only one connection exists between any of them (the connection between Fernando and Garth). There are three possible connections (Fernando-Garth; Fernando-Ike; Garth-Ike). Thus, the clustering coefficient for Heather is 1/3.
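Both metrics can be sketched in plain Python on the Kite network, again assuming the edge list commonly given for Krackhardt's kite. Eigenvector centrality is approximated here by power iteration on the adjacency matrix plus the identity (the shift helps convergence and leaves the ranking unchanged); this is one common approach, not necessarily what any particular tool does. The value 0.0 for vertices with fewer than two neighbors is a convention assumed for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Undirected edge list assumed for Krackhardt's kite, using this article's names.
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]

adjacency = defaultdict(set)
for a, b in KITE_EDGES:
    adjacency[a].add(b)
    adjacency[b].add(a)


def eigenvector_centrality(iterations=200):
    """Power iteration: each score becomes its own value plus its neighbors' scores."""
    score = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[w] for w in adjacency[v]) for v in adjacency}
        norm = sqrt(sum(x * x for x in new.values()))  # rescale to unit length
        score = {v: x / norm for v, x in new.items()}
    return score


def clustering_coefficient(v):
    """Edges among v's neighbors divided by the possible number of such edges."""
    neighbors = adjacency[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined case reported as 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


centrality = eigenvector_centrality()
print(centrality["Ed"] > centrality["Heather"])       # True
print(max(centrality, key=centrality.get))            # Diane
print(round(clustering_coefficient("Heather"), 3))    # 0.333
print(clustering_coefficient("Ed"))                   # 1.0
print(clustering_coefficient("Ike"))                  # 0.0
```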

**References**

[1] Social Network Analysis for Startups: Finding Connections on the Social Web

[2] Task Taxonomy for Graph Visualization

[3] The Development of Social Network Analysis—with an Emphasis on Recent Events

[4] Systematic Yet Flexible Discovery: Guiding Domain Experts through Exploratory Data Analysis

[5] Analyzing Social Media Networks with NodeXL – Insights from a Connected World – Book

[6] Analyzing Social Media Networks with NodeXL – Paper

[7] NodeXL: Network Overview, Discovery, and Exploration for Excel

[8] Social Network Analysis – Methods and Applications

[9] Social Network Analysis: An Introduction

[10] Ranking Methods for Networks

[11] Networks, Crowds, and Markets: Reasoning About a Highly Connected World By David Easley and Jon Kleinberg

[12] Coursera – Social and Economic Networks: Models and Analysis