---
title: "Identifying Cohesive Subgroups in R Lab"
date: "CRJ 605 Statistical Analysis of Networks"
output:
html_document:
df_print: paged
theme: cosmo
highlight: haddock
toc: yes
toc_float: yes
code_fold: show
---
```{r,echo=FALSE,eval=TRUE,message=FALSE,warning=FALSE}
library(devtools)
library(UserNetR)
```
###
*This lab examines various definitions of **cohesive subgroups** the `cliques()` function in the `igraph` package and the `kcores()` function in the `sna` package.*
***
###**Cohesive Subgroups**
How do we define *cohesion*? It is common to focus on four general properties of cohesive groups (Wasserman & Faust,1994: 251):
* Mutuality of ties
* The closeness or reachability of subgroup members
* The frequency of ties among members
* The relative frequency of ties among subgroup members compared to non-members
Note that these are *conceptual* definitions, from which we can derive *operational* definitions. What would a graph with these properties look like?
We could approach this several ways:
* Bottom-up: Dense connections are built-up from simpler dyads and triads to more extended dense clusters. Example: clique, n-clique.
* Top-down: Looking at the whole network, sub-structures are areas of the graph that seem to be locally dense, but separated to some degree, from the rest of the graph. Example: components, k-cores, community detection.
###**Undirected Graphs**
####Cliques
A *clique* in a graph is a subgraph of three or more nodes such that:
* all nodes are adjacent to all other nodes
* and there are no other nodes that are also adjacent to all other nodes.
Let's build the example network from the **Identifying Subgroups** lecture:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
rm(list = ls())
library(sna)
library(network)
clique.example1.net <- network.initialize(6, directed=FALSE)
clique.example1.net[1,c(2,5,6)] <- 1
clique.example1.net[2,c(5,3,4)] <- 1
clique.example1.net[3,c(4,5,6)] <- 1
clique.example1.net[4,5] <- 1
network.vertex.names <- LETTERS[1:6] # use the LETTERS function to get 6 upper-case letters.
```
```{r,echo=FALSE,eval=TRUE,message=FALSE,include=FALSE}
set.seed(1)
coords <- gplot(clique.example1.net,gmode="graph",label=network.vertex.names, vertex.col="lightgreen")
coords[1,] <- c(0.2, 4.0) #A
coords[2,] <- c(1.2, 3.0) #B
coords[3,] <- c(1.2, 1.5) #C
coords[4,] <- c(-0.5, 1.5) #D
coords[5,] <- c(-0.5, 3.0) #E
coords[6,] <- c(2.2, 2.8) #F
```
```{r,echo=FALSE,eval=TRUE,message=FALSE}
# Now, take a look at the plot.
gplot(clique.example1.net,gmode="graph",label=network.vertex.names, vertex.col="white",coord = coords)
```
As we saw in the lecture, the cliques are: {A, B, E}, {B, C, D, E}, but not {B, C, E} or {B, D, E}.
```{r,echo=FALSE,eval=TRUE,message=FALSE,display=TRUE}
op <- par(mfrow=c(2,2),mar=c(0,0,4,0))
e.col <- c("lightgreen", "lightgreen", "grey40", "lightgreen", "grey40", "grey40", "grey40", "grey40", "grey40", "grey40")
v.col <- c("lightgreen","lightgreen","white","white","lightgreen","white")
gplot(clique.example1.net, gmode = "graph", label = network.vertex.names, vertex.col = v.col,coord = coords, edge.col = e.col)
e.col <- c("grey40", "grey40", "grey40", "lightgreen", "lightgreen", "lightgreen", "lightgreen", "lightgreen", "grey40", "lightgreen")
v.col <- c("white","lightgreen","lightgreen","lightgreen","lightgreen","white")
gplot(clique.example1.net, gmode = "graph", label = network.vertex.names, vertex.col = v.col,coord = coords, edge.col = e.col)
e.col <- c("grey40", "grey40", "grey40", "lightgreen", "lightgreen", "red", "red", "lightgreen", "grey40", "red")
v.col <- c("white","lightgreen","lightgreen","red","lightgreen","white")
gplot(clique.example1.net, gmode = "graph", label = network.vertex.names, vertex.col = v.col,coord = coords, edge.col = e.col)
e.col <- c("grey40", "grey40", "grey40", "lightgreen", "red", "lightgreen", "red", "red", "grey40", "lightgreen")
v.col <- c("white","lightgreen","red","lightgreen","lightgreen","white")
gplot(clique.example1.net, gmode = "graph", label = network.vertex.names, vertex.col = v.col,coord = coords, edge.col = e.col)
par(op)
```
***
Note that we can use the `cliques()` function in the `igraph` package to pull this information from the graph. For our example, we need to convert our `network` object to an `igraph` object using the `intergraph` package. If you have not already done so, use the `install.packages("intergraph")` and `install.packages("igraph")`command to install these packages.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# load the libraries.
library(intergraph)
library(igraph)
# convert the object.
i.clique.example1.net <- asIgraph(clique.example1.net)
# use the cliques() function to show the cliques.
cliques(
i.clique.example1.net, # our network as an igraph object.
min=3 # define the minimum number of nodes to be 3 (exclude dyads and isolates).
)
# now plot these with the numbers (instead of letters).
plot(i.clique.example1.net)
```
***Note*** that `igraph` relaxes the constraint of no externally tied nodes. That is, the {B, C, E} (i.e. {2, 3, 5}) and {B, D, E} (i.e. {2, 4, 5}) cliques are identified by the function.
The *clique* definition can be limited, in that there is no internal differentiation since all positions are identical in the subgraph. That is, there are no core members or leaders. Also, data collection may prevent cliques in graphs. For example, limited nomination (e.g. "nominate 3 people") designs will impose constraints on the number of cliques in the network.
***
Some solutions are to operationalize cohesiveness based on:
* *Reachability*, such that nodes are not necessarily adjacent, but connecting paths are short. Example: n-cliques.
* *Degree*, such that nodes are adjacent to many other nodes, thereby reducing the vulnerability of the structure. Examples: k-cores.
####*n*-Cliques
An *n*-clique is a subgraph in which the largest geodesic distance between any two nodes is no greater than *n*. That is: $$d(i,j) \leq n \text{ for all } n_i,n_j \in N_s$$.
To my knowledge, `igraph` does not compute *n*-cliques. However, `sna` has a `clique.census()` function.
***
####*k*-Cores
An alternative approach, is to base the clique on actor degree. A *k*-core is a subgraph in which each node is adjacent to at least a minimum number, *k*, other nodes in the subgraph. A node's minimum degree within the subgraph will be at least *k*. That is: $$d_s(i) \geq k \text{ for all } n_i \in N_s$$.
Let's create the network that is examined in the **Identifying Subgroups** lecture:
```{r,echo=TRUE,eval=TRUE,message=FALSE,display=FALSE}
# first, detach igraph since it conflicts with sna.
detach(package:igraph)
# then, reload sna.
library(sna)
# Create the network object.
set.seed(605)
random.net <- as.network(rgraph(15,mode="graph",tp = 0.35),directed=FALSE)
# now plot it.
set.seed(605)
coords <- gplot(random.net,gmode="graph",label=LETTERS[1:15], vertex.col="white", edge.col="grey40")
```
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# plot the network.
gplot(random.net,gmode="graph",label=LETTERS[1:15], vertex.col="white", edge.col="grey40", coord = coords)
```
Now, we can examine the *k*-cores in this graph using the `kcores()` function in `sna`.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# find the k-cores using the kcores() function.
?kcores
random.net.kcores <- kcores(random.net,mode="graph")
# the object prints out the number of individuals in each core.
random.net.kcores
```
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# create the colors and the labels.
v.cols <- rainbow(max(random.net.kcores))
v.cols <- v.cols[random.net.kcores]
set.vertex.attribute(random.net,"v.cols",v.cols)
set.vertex.attribute(random.net,"v.labs",LETTERS[1:15])
set.vertex.attribute(random.net,"x.coords",coords[,1])
set.vertex.attribute(random.net,"y.coords",coords[,2])
# Plot them.
op <- par(mar=c(0,0,4,0))
gplot(random.net,gmode="graph",label=random.net.kcores, label.pos = 5, vertex.col=v.cols, edge.col="grey40", coord = coords)
title(main="Plot of All k-cores",cex.main=3)
par(op)
# Now, create the subgraphs and plot them.
random.net.2core <- get.inducedSubgraph(random.net,v=which(random.net.kcores >= 2))
random.net.3core <- get.inducedSubgraph(random.net,v=which(random.net.kcores >= 3))
random.net.4core <- get.inducedSubgraph(random.net,v=which(random.net.kcores >= 4))
# Fix the coordinates.
cord2core <- cbind(random.net.2core %v% "x.coords",random.net.2core %v% "y.coords")
cord3core <- cbind(random.net.3core %v% "x.coords",random.net.3core %v% "y.coords")
cord4core <- cbind(random.net.4core %v% "x.coords",random.net.4core %v% "y.coords")
# Plot all of them.
op <- par(mfrow=c(2,2),mar=c(0,0,4,0))
gplot(random.net.2core,gmode="graph",label=random.net.2core %v% "v.labs", label.pos = 5, vertex.col=random.net.2core %v% "v.cols", edge.col="grey40", coord = cord2core)
title(main="All 2:4-cores",cex.main=2)
gplot(random.net.3core,gmode="graph",label=random.net.3core %v% "v.labs", label.pos = 5, vertex.col=random.net.3core %v% "v.cols", edge.col="grey40", coord = cord3core)
title(main="All 3:4-cores",cex.main=2)
gplot(random.net.4core,gmode="graph",label=random.net.4core %v% "v.labs", label.pos = 5, vertex.col=random.net.4core %v% "v.cols", edge.col="grey40", coord = cord4core)
title(main="All 4-cores",cex.main=2)
gplot(random.net,gmode="graph",label=LETTERS[1:15], vertex.col="white", edge.col="grey40", coord = coords)
par(op)
```
***
###**Cohesiveness in the PINS Get Along With Network**
The *Prison Inmate Networks Study (PINS)* examines the social networks of prison inmates in a state correctional institution. Information about PINS can be accessed at: http://justicecenter.psu.edu/research/projects/prison-inmate-networks-study-pins.
If you scroll through the website, you will see that there are two edgelists avaiable. The *get along with network* which measures the individuals that individuals indicated they "get along with" on the unit. There is also the *power and influence network* which captures nominations made of individuals who were considered "powerful and influential" on the unit.
We can examine the cohesiveness scores by first building the *get along with* network from the edgelist on the website:
```{r,echo=TRUE,eval=TRUE,message=FALSE, figure.align = "center", figure.height = 5, figure.width = 5}
GA.edgelist <- read.csv(url("http://justicecenter.psu.edu/research/projects/files/pins-get-along-with-edgelist"),header=FALSE,as.is=TRUE)
GA.net <- as.network(GA.edgelist,directed=TRUE, matrix.type="edgelist")
# Note that the network contains 9 isolates.
length(which(has.edges(GA.net) ==FALSE))
# We can remove these by using the get.inducedSubgraph() function.
# Specifically, we stipulate in the v= argument that we only want vertices that have edges incident on them.
GA.net <- get.inducedSubgraph(GA.net, v=which(has.edges(GA.net) ==TRUE))
# Now, take a look at the plot.
gplot(GA.net, edge.col="grey40", vertex.col="lightgreen", main="PINS Get Along With Network")
```
####*What are the features of the network that "jump-out" when you examine the plot?*
***
Note that we are now working with a directed graph. To examine the *coreness* of the graph, we need to make a decision about how to treat the edges.
If we relax the constrain that all edges must be reciprocated, we can identify four kinds of dyads of increasingly strict connectivity:
* *Weakly n-connected*: *i* and *j* are joined by a semipath of length < *n*.
* *Unilaterally n-connected*: a path of length < *n* from *i* to *j* or from *j* to *i*.
* *Strongly n-connected*: *i* and *j* are connected by two reciprocal paths of length < *n*, where the paths may have different intermediary nodes.
* *Recursively n-connected*: *i* and *j* use same intermediary nodes and lines in reverse order as the path from *j* to *i*.
*The result is that we have 4 types of n-cliques: weakly, unilaterally, strongly, and recursively. We can see this in the `symmetrize()` function in the `sna` package.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
?symmetrize
# Let's create a symmetrized get along network following the "strong" definition.
GA.net.strong <- symmetrize(GA.net,rule="strong")
# Now, take a look at the plot.
gplot(GA.net.strong, edge.col="grey40", vertex.col="lightgreen", main="PINS Get Along With Network",gmode="graph")
# Now, let's look at the kcores.
GA.net.strong.kcores <- kcores(GA.net.strong,mode="graph")
table(GA.net.strong.kcores)
# Let's look at a plot.
v.cols <- rainbow(max(GA.net.strong.kcores))
v.cols <- v.cols[GA.net.strong.kcores]
gplot(GA.net.strong, edge.col="grey40", vertex.col=v.cols, main="PINS Get Along With Network",gmode="graph", label = GA.net.strong.kcores, label.pos = 5, label.cex = 0.6)
```
####*What if we use the **weak** rule?*
```{r,echo=TRUE,eval=TRUE,message=FALSE}
GA.net.weak <- symmetrize(GA.net,rule="weak")
gplot(GA.net.weak, edge.col="grey40", vertex.col="lightblue", main="PINS Get Along With Network",gmode="graph")
GA.net.weak.kcores <- kcores(GA.net.weak,mode="graph")
table(GA.net.weak.kcores)
# Let's look at a plot.
library(RColorBrewer)
rbPal <- colorRampPalette(c("red","green","blue","orange"))
v.cols <- rbPal(6)[as.numeric(cut(GA.net.weak.kcores,breaks = 6))]
gplot(GA.net.weak, edge.col="grey40", vertex.col=v.cols, main="PINS Get Along With Network",gmode="graph", label = GA.net.weak.kcores, label.pos = 5, label.cex = 0.6)
```
***
####***Questions?***####
```{r,echo=FALSE,eval=FALSE}
#END OF LAB
```