---
title: "Closeness & Betweenness Centrality/Centralization in R Lab"
date: "CRJ 605 Statistical Analysis of Networks"
output:
  html_document:
    df_print: paged
    theme: cosmo
    highlight: haddock
    toc: yes
    toc_float: yes
    code_fold: show
---
```{r,echo=FALSE,eval=TRUE,message=FALSE,warning=FALSE}
library(sna)
library(devtools)
library(UserNetR)
library(network)
```
###
*This lab examines closeness centrality and betweenness centrality/centralization using the `closeness()`, `betweenness()`, and `centralization()` functions in the `sna` package.*
***
###**Closeness Centrality (Undirected Binary Graphs)**
How *close* is a node to other nodes? In an undirected binary graph, **closeness centrality** measures how near a node is to the other nodes in the network. It is based on the inverse of the summed distances from each actor to every other actor.
Terminology:
* A **geodesic** is the shortest path between two nodes.
* The **distance**, $d(n_i,n_j)$, is the length of the geodesic (shortest path) between *i* and *j*.
Closeness centrality is calculated as: $C_C(n_i) = [\sum\limits_{j=1}^g d(n_i,n_j)]^{-1}$ or $\frac{1}{[\sum\limits_{j=1}^g d(n_i,n_j)]}$.
As the equation shows, we first find the geodesic distance from *i* to every other node *j*, sum those distances, and then take the inverse. We can calculate the distance matrix using the `geodist()` function in `sna`.
Let's take a look:
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# First, clear the workspace.
rm(list = ls())
# Then, load the sna library.
library(sna)
# Now, take a look at the geodist() function.
?geodist
```
Let's go ahead and set up a simple matrix and examine the geodesics for that matrix.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
u.mat <- rbind(c(0,1,0,0,0),c(1,0,1,0,0),c(0,1,0,1,1),c(0,0,1,0,1),c(0,0,1,1,0))
rownames(u.mat) <- c("Jen","Tom","Bob","Leaf","Jim")
colnames(u.mat) <- c("Jen","Tom","Bob","Leaf","Jim")
# let's look at what the geodist() function creates.
u.mat.geodist <- geodist(u.mat)
u.mat.geodist
```
We can see that the function creates an object of class `list`. In the object, there are two arrays, `$counts` and `$gdist`. To get geodesic distances, we use the `$` sign to select a single piece from the list:
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# Print out the distances.
u.mat.geodist$gdist
# If we take the row sums of this object, we get each node's total distance to all other nodes.
rowSums(u.mat.geodist$gdist)
# Now, we can create the closeness centrality as the reciprocal of the row sum.
close.cent <- 1 / rowSums(u.mat.geodist$gdist)
close.cent
# We can then calculate the standardized closeness centrality by multiplying by g-1.
g <- dim(u.mat)[1]
close.cent.s <- (g-1) * close.cent
close.cent.s
```
Alternatively, we could just use the `closeness()` function in the `sna` package. First, take a look at the function using `?closeness`. Note that the standardized closeness centrality is reported by default. If we want the unstandardized closeness, we can just divide the result returned by `closeness()` by *g-1*.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
?closeness
closeness(u.mat,gmode="graph") #standardized.
closeness(u.mat,gmode="graph") / (g-1) #raw.
```
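To see what a geodesic distance matrix involves, here is a minimal base-R sketch that finds the distances by breadth-first search and then applies the closeness formula. It assumes a connected, undirected binary graph. The helper `bfs.dist()` and the `ex.*` objects are hypothetical names introduced only for this illustration; they are not part of `sna`, and this is not necessarily how `geodist()` is implemented.

```{r,echo=TRUE,eval=TRUE,message=FALSE}
# Hypothetical helper (not an sna function): geodesic distances by breadth-first search.
# Assumes adj is a square 0/1 adjacency matrix for an undirected graph.
bfs.dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n, dimnames = dimnames(adj))
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    step <- 0
    while (length(frontier) > 0) {
      step <- step + 1
      # Neighbors of the current frontier that have not been reached yet.
      nbrs <- which(colSums(adj[frontier, , drop = FALSE]) > 0 & is.infinite(d[s, ]))
      if (length(nbrs) > 0) d[s, nbrs] <- step
      frontier <- nbrs
    }
  }
  d
}

# Same five-node graph as above, stored under a separate (illustrative) name.
ex.mat <- rbind(c(0,1,0,0,0),c(1,0,1,0,0),c(0,1,0,1,1),c(0,0,1,0,1),c(0,0,1,1,0))
ex.names <- c("Jen","Tom","Bob","Leaf","Jim")
dimnames(ex.mat) <- list(ex.names, ex.names)

ex.dist <- bfs.dist(ex.mat)
rowSums(ex.dist) # total distance from each node to all others.

# Standardized closeness: (g-1) divided by the summed distances.
ex.close <- (nrow(ex.mat) - 1) / rowSums(ex.dist)
round(ex.close, 3)
```

For this graph the distance sums are 9, 6, 5, 7, and 7, so Bob has the highest standardized closeness (4/5 = 0.8) and Jen the lowest (4/9).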
####**Closeness Centralization**
We can also summarize the entire network in terms of how close nodes are to each other. *Group closeness centralization* tells us how much variation there is in the closeness scores. As with degree centralization, this measure is bounded between 0 and 1, where a value of 0 indicates complete uniformity across nodes in their centrality scores and a value of 1 indicates that one node has the highest possible centrality score and all others are at the minimum.
This is calculated as: $C_C = \frac{\sum\limits_{i=1}^g[C'_C(n^*)-C'_C(n_i)]}{[(g-2)(g-1)]/(2g-3)}$.
Here, $C'_C(n^*)$ is the largest standardized closeness centrality score in the network. For a more elaborate discussion of this equation, see Wasserman & Faust (1994: 191-192). To calculate the group closeness centralization, we can use the `centralization()` function in `sna` and specify `closeness` in the `FUN` argument.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
?centralization
centralization(u.mat,closeness,mode="graph")
```
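As a check on the formula, the following base-R sketch works through the centralization calculation by hand for the five-node example. It starts from the summed geodesic distances for Jen, Tom, Bob, Leaf, and Jim (9, 6, 5, 7, and 7). The `ex.*` object names are assumptions introduced only for this illustration.

```{r,echo=TRUE,eval=TRUE,message=FALSE}
# Summed geodesic distances for the five-node example, worked out by hand.
ex.sums <- c(Jen = 9, Tom = 6, Bob = 5, Leaf = 7, Jim = 7)
ex.g <- length(ex.sums)

# Standardized closeness: (g-1) divided by the summed distances.
ex.std <- (ex.g - 1) / ex.sums

# Numerator: summed gaps between the largest score and every node's score.
ex.num <- sum(max(ex.std) - ex.std)

# Denominator: the largest value that sum can take, [(g-2)(g-1)]/(2g-3).
ex.den <- ((ex.g - 2) * (ex.g - 1)) / (2 * ex.g - 3)

# Group closeness centralization.
ex.num / ex.den
```

The numerator is 298/315 and the denominator is 12/7, which gives roughly 0.552. If `centralization()` uses the same theoretical maximum as the formula, its result for `u.mat` should agree with this value.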
###**Betweenness Centrality (Undirected Binary Graphs)**
We have seen how centrality can be conceptualized as having a high number of ties (i.e., degree centrality) or being close to others in the network (i.e., closeness centrality). We can also conceptualize centrality as lying on the shortest paths that connect other nodes. *Betweenness centrality* is based on the number of shortest paths between *j* and *k* that pass through actor *i*.
Betweenness centrality is calculated as: $C_B(n_i) = \sum\limits_{j