---
title: "Bipartite Graphs/Two-Mode Networks in R Lab"
date: "CRJ 605 Statistical Analysis of Networks"
output:
  html_document:
    df_print: paged
    theme: cosmo
    highlight: haddock
    toc: yes
    toc_float: yes
    code_fold: show
---
```{r,echo=FALSE,eval=TRUE,message=FALSE,warning=FALSE}
library(devtools)
library(network)
library(sna)
```
###
*This lab examines various properties of bipartite graphs (e.g. density, degree centrality, dyadic clustering) using basic functions in R as well as the `tnet` package.*
***
###**Bipartite Graphs/Two-Mode Networks**
Bipartite graphs are useful for operationalizing contexts in which nodes come from two separate classes. Examples:
* Members of various groups
* Authors of papers
* Students in courses
* Participants in an event
In contrast to one-mode networks, or unipartite graphs, where edges can be incident *within* a particular node/vertex set, in two-mode or bipartite graphs there are two partitions of nodes (called modes), and edges only occur *between* these partitions (i.e. not within).
Let's build the example bipartite graph from the **Bipartite Graphs** lecture:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# clear the workspace.
rm(list = ls())
# create the example network.
bipartite.example <- rbind(c(1,1,0,0,0),c(1,0,0,0,0),c(1,1,0,0,0),c(0,1,1,1,1),c(0,0,1,0,0),c(0,0,1,0,1))
rownames(bipartite.example) <- c("A","B","C","D","E","F")
colnames(bipartite.example) <- c("1","2","3","4","5")
# print out the object.
bipartite.example
```
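To make the "edges only occur *between* modes" property concrete, here is a minimal sketch in base R. It embeds a small incidence matrix in the full (*N* + *M*) x (*N* + *M*) adjacency matrix and checks that the two within-mode blocks are empty. The object names (all prefixed with `sketch.`) and the toy matrix are illustrative assumptions, not part of the lab's data.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# A small illustrative incidence matrix: 3 actors (rows) by 2 events (columns).
sketch.inc <- rbind(c(1,0),c(1,1),c(0,1))
rownames(sketch.inc) <- c("A","B","C")
colnames(sketch.inc) <- c("1","2")
sketch.na <- nrow(sketch.inc) # number of nodes in the first mode.
sketch.ne <- ncol(sketch.inc) # number of nodes in the second mode.
# Full adjacency matrix: zero blocks within each mode, incidence blocks between modes.
sketch.adj <- rbind(
  cbind(matrix(0,sketch.na,sketch.na),sketch.inc),
  cbind(t(sketch.inc),matrix(0,sketch.ne,sketch.ne))
)
sketch.labels <- c(rownames(sketch.inc),colnames(sketch.inc))
dimnames(sketch.adj) <- list(sketch.labels,sketch.labels)
sketch.adj
# Both within-mode blocks contain no edges.
sum(sketch.adj[1:sketch.na,1:sketch.na])
sum(sketch.adj[(sketch.na+1):(sketch.na+sketch.ne),(sketch.na+1):(sketch.na+sketch.ne)])
```

Because every edge appears once in each off-diagonal block, the full matrix sums to twice the number of edges in the incidence matrix.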
We can create an object of class `network` by using the `as.network()` function in the `network` package. First, take a look at the help for the `as.network()` function, paying particular attention to the `bipartite=` argument.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Call the network package.
library(network)
?as.network
```
The help file for the `as.network()` function describes the `bipartite=` argument as the count of the number of actors in the bipartite network. Recall from the **Bipartite Graphs** lecture that a bipartite adjacency matrix has order *NxM*, where *N* represents the number of rows (e.g. actors) and *M* represents the number of columns (e.g. events). In the `bipartite=` argument, we can specify the count as *N* or as *N* + *M*.
For example:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Create a network object.
bipartite.example.net <- as.network(bipartite.example,bipartite=dim(bipartite.example)[1]) # take the first dimension of the matrix.
bipartite.example.net # there are 11 vertices, 6 are bipartite (in the first mode), and 12 edges.
# Or,
bipartite.example2.net <- as.network(bipartite.example,bipartite=(dim(bipartite.example)[1]+dim(bipartite.example)[2])) # add the first and second dimension.
bipartite.example2.net # there are 11 vertices, 6 are bipartite (in the first mode), and 12 edges.
# Show that the structures are the same.
as.matrix(bipartite.example.net) == as.matrix(bipartite.example2.net)
# Let's see a plot.
op <- par(mar=c(0,0,4,0))
set.seed(3)
bipartite.coords <- gplot(
bipartite.example.net,label=network.vertex.names(bipartite.example.net),
gmode="twomode", # Note that in gplot(), we use the "gmode" argument and assign the value "twomode" to indicate that the graph has two partitions of vertices.
usearrows=FALSE, # Also, we turn off the arrows since the graph is undirected.
vertex.cex=2,label.cex=1.2,main="Bipartite Graph of Example Graph"
)
par(op)
# look how it reads the labels.
network.vertex.names(bipartite.example.net)
```
*Note* that the `gplot()` function reads the labels by starting with the names in the first mode, then the names in the second mode. Also, the `gmode=` argument automatically changes the colors for us when we specify that the graph is `twomode`. We can use this ordering to change features of the plot, such as color or shapes. Here are some examples:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Let's change the shapes and the colors.
op <- par(mar=c(0,0,4,0))
set.seed(3)
gplot(bipartite.example.net,gmode="twomode",usearrows=FALSE,label=network.vertex.names(bipartite.example.net),
vertex.cex=2,label.cex=0.8,main="Bipartite Graph of Example Graph",
vertex.sides=c(rep(3,dim(bipartite.example)[1]),rep(4,dim(bipartite.example)[2])),
# this reads:
# repeat 3 the number of times in the first dimension;
# repeat 4 the number of times in the second dimension.
vertex.col=c(rep("orange",dim(bipartite.example)[1]),rep("purple",dim(bipartite.example)[2]))
# this reads:
# repeat the color orange the number of times in the first dimension;
# repeat the color purple the number of times in the second dimension;
)
par(op)
```
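The pattern used above, repeating one value for the first mode and another for the second, can be generalized with a mode-membership vector. The following is a minimal sketch in base R. The mode sizes and the `sketch.` object names are assumptions chosen to match the example graph (6 and 5 vertices).

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Assumed mode sizes: 6 vertices in the first mode, 5 in the second.
sketch.mode.n <- 6
sketch.mode.m <- 5
# Mode membership in the order the vertices are read: first mode, then second mode.
sketch.mode <- rep(c(1,2),times=c(sketch.mode.n,sketch.mode.m))
# Look up any per-vertex attribute by mode membership.
sketch.col <- c("orange","purple")[sketch.mode]
sketch.sides <- c(3,4)[sketch.mode]
data.frame(mode=sketch.mode,col=sketch.col,sides=sketch.sides)
```

Vectors built this way have one entry per vertex, so they could be passed to arguments such as `vertex.col=` and `vertex.sides=` in place of the nested `rep()` calls.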
***
###**Structural Properties of Bipartite Graphs/Two-Mode Networks**
####*Density*
The *density* of a bipartite graph is the number of observed edges, *L*, divided by the product of the number of nodes in the first mode, *N*, and the number of nodes in the second mode, *M*. That is:
$$\frac{L}{N \times M}$$
In other words, the density of the graph is the number of edges we observed divided by the maximum number of possible edges in the graph. We can calculate this using the `sum()` and `dim()` functions.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
density.bipartite.example <- sum(as.matrix(bipartite.example.net))/(dim(as.matrix(bipartite.example.net))[1]*dim(as.matrix(bipartite.example.net))[2])
density.bipartite.example # what is the interpretation of the density? 40% of the ties we could observe, were observed.
# Alternatively, we could write a function to do this:
density.bipartite <- function(edges,N,M){
density <- edges/(N*M)
return(density)
}
density.bipartite(
sum(as.matrix(bipartite.example.net)), # the edges argument.
dim(as.matrix(bipartite.example.net))[1], # the N argument.
dim(as.matrix(bipartite.example.net))[2] # the M argument.
)
# Alternatively, we could compute the arguments inside the function.
density.bipartite.simpler <- function(net.object){
density <- sum(as.matrix(net.object))/(dim(as.matrix(net.object))[1]*dim(as.matrix(net.object))[2])
return(density)
}
```
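As a quick check on the density formula, note that for a binary incidence matrix the density is simply the mean of its cells. The sketch below rebuilds the example matrix under a hypothetical name (`sketch.dens.mat`) so that it runs on its own.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# The example incidence matrix, rebuilt under an illustrative name.
sketch.dens.mat <- rbind(c(1,1,0,0,0),c(1,0,0,0,0),c(1,1,0,0,0),c(0,1,1,1,1),c(0,0,1,0,0),c(0,0,1,0,1))
# L / (N x M), written out.
sum(sketch.dens.mat)/(nrow(sketch.dens.mat)*ncol(sketch.dens.mat))
# The same value: the mean of the N x M binary cells.
mean(sketch.dens.mat)
```

This shortcut only holds when the matrix contains 0s and 1s; a valued matrix would need to be dichotomized first.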
**What does the value of 0.4 for the density of our graph tell us?**
***
####*Degree Centrality*
For a bipartite graph there are *two* degree distributions:
* The distribution of ties in the first mode
* The distribution of ties in the second mode
We can calculate the degree centrality scores for each node in each corresponding vertex set by taking the *row* sum for *N* nodes in the first mode and taking the *column* sum for *M* nodes in the second mode.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# print out the raw scores.
rowSums(as.matrix(bipartite.example.net))
colSums(as.matrix(bipartite.example.net))
# this is easier if we look at the table of values.
table(rowSums(as.matrix(bipartite.example.net)))
table(colSums(as.matrix(bipartite.example.net)))
```
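One property worth verifying is that the two degree distributions describe the same set of edges: the row degrees and the column degrees must both sum to *L*. A minimal sketch, again rebuilding the example matrix under a hypothetical name (`sketch.deg.mat`):

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# The example incidence matrix, rebuilt under an illustrative name.
sketch.deg.mat <- rbind(c(1,1,0,0,0),c(1,0,0,0,0),c(1,1,0,0,0),c(0,1,1,1,1),c(0,0,1,0,0),c(0,0,1,0,1))
sketch.deg.n <- rowSums(sketch.deg.mat) # degrees in the first mode.
sketch.deg.m <- colSums(sketch.deg.mat) # degrees in the second mode.
# Each edge is counted once in each mode, so all three totals agree.
c(first.mode=sum(sketch.deg.n),second.mode=sum(sketch.deg.m),edges=sum(sketch.deg.mat))
```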
**How should we interpret the centrality scores for each node set?**
***
#####*Mean Degree Centrality*
As before, we could examine the central tendency by examining the mean degree for each node/vertex set:
* For the first node set, the mean degree is $\frac{L}{N}$: the number of edges divided by the number of nodes in that set.
* For the second node set, the mean degree is $\frac{L}{M}$: the number of edges divided by the number of nodes in that set.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
sum(as.matrix(bipartite.example.net)) / dim(as.matrix(bipartite.example.net))[1]
sum(as.matrix(bipartite.example.net)) / dim(as.matrix(bipartite.example.net))[2]
# We could also see this by using the mean function over the degree data.
mean(rowSums(as.matrix(bipartite.example.net)))
mean(colSums(as.matrix(bipartite.example.net)))
```
**How should we interpret the mean centrality score for each node set?**
***
#####*Standardized Degree Centrality*
Degree centrality scores for each node/vertex set not only reflect each node’s connectivity to nodes in the other set, but also depend on the size of that set. As a result, larger networks will have a higher maximum possible degree centrality value. *Solution?*
**Standardize!!!**
As we saw for unipartite graphs, we can adjust the raw degree centrality scores by taking into account the size of the graph. In a bipartite graph, we can standardize, or *normalize*, by dividing the raw centrality scores by the number of nodes in the opposite vertex set. That is, for the centrality scores in the first mode we divide by *M* and for the centrality scores in the second mode we divide by *N*.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
(rowSums(as.matrix(bipartite.example.net)))/dim(as.matrix(bipartite.example.net))[2]
(colSums(as.matrix(bipartite.example.net)))/dim(as.matrix(bipartite.example.net))[1]
```
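Standardized degree ties back to density: averaging the standardized scores within either mode gives $\frac{L}{N \times M}$. For the first mode, the mean of the degree $d_i$ divided by *M* is $\frac{1}{N}\sum_i \frac{d_i}{M} = \frac{L}{N \times M}$, and the same argument applies to the second mode. The sketch below checks this on the example matrix, rebuilt under a hypothetical name (`sketch.std.mat`).

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# The example incidence matrix, rebuilt under an illustrative name.
sketch.std.mat <- rbind(c(1,1,0,0,0),c(1,0,0,0,0),c(1,1,0,0,0),c(0,1,1,1,1),c(0,0,1,0,0),c(0,0,1,0,1))
sketch.std.n <- rowSums(sketch.std.mat)/ncol(sketch.std.mat) # first mode, divided by M.
sketch.std.m <- colSums(sketch.std.mat)/nrow(sketch.std.mat) # second mode, divided by N.
# Both means equal the density of the graph.
c(first.mode=mean(sketch.std.n),second.mode=mean(sketch.std.m),density=mean(sketch.std.mat))
```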
**How should we interpret the standardized centrality scores for each node set?**
As we did before, we could create a function to return this information for us:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Raw.
degree.bipartite <- function(bipartite.net){
degree.n <- rowSums(as.matrix(bipartite.net)) # degree for the first node set.
degree.m <- colSums(as.matrix(bipartite.net)) # degree for the second node set.
return(c(degree.n,degree.m)) # put them both together.
}
# Standardized.
degree.s.bipartite <- function(bipartite.net){
degree.s.n <- (rowSums(as.matrix(bipartite.net)))/dim(as.matrix(bipartite.net))[2] # first node set: row sums divided by M.
degree.s.m <- (colSums(as.matrix(bipartite.net)))/dim(as.matrix(bipartite.net))[1] # second node set: column sums divided by N.
return(c(degree.s.n,degree.s.m)) # put them both together.
}
# Now, take a look at the scores:
degree.bipartite(bipartite.example.net)
degree.s.bipartite(bipartite.example.net)
# Pass these to the plot.
set.seed(3)
op <- par(mar=c(0,0,4,0))
gplot(bipartite.example.net,gmode="twomode",usearrows=FALSE,
label=network.vertex.names(bipartite.example.net),
label.cex=1.2,main="Bipartite Graph of Example Graph",
vertex.sides=c(rep(3,dim(bipartite.example)[1]),rep(4,dim(bipartite.example)[2])),
vertex.col=c(rep("orange",dim(bipartite.example)[1]),rep("purple",dim(bipartite.example)[2])),
vertex.cex=degree.bipartite(bipartite.example.net)
)
par(op)
```
***
####*Dyadic Clustering*
The *density* tells us about the overall level of ties between the node/vertex sets in the graph and *degree centrality* tells us about how many edges are incident on a node in each node/vertex set. *What about the overlap in ties?* In other words, do nodes in *N* tend to **share** nodes in *M*? This is the notion of *clustering* in a graph.
In a bipartite graph, there are two interesting structures:
* 3-paths (sometimes called $L_3$) are paths of length 3, i.e. three edges connecting four distinct nodes.
* Cycles (sometimes called $C_4$) are closed 3-paths, i.e. four edges connecting four distinct nodes.
Cycles in a graph create multiple ties between vertices in *both* modes. The ratio of cycles to 3-paths in a graph is proportional to the level of *dyadic clustering* (sometimes called *reinforcement*). A value of 1 indicates that every 3-path is closed (i.e., is embedded in a cycle). Networks with values at or close to 1 will have considerable redundancy in ties.
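To see where such a ratio comes from, the counts can be computed by hand in base R. The sketch below counts the cycles and 3-paths in the example matrix (rebuilt under the hypothetical name `sketch.clu.mat`). It scales the cycle count by 4 because each cycle contains four 3-paths, which makes the ratio equal 1 when every 3-path is closed. This scaling is an assumption about the definition, and the sketch is not claimed to reproduce the output of any particular package function.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# The example incidence matrix, rebuilt under an illustrative name.
sketch.clu.mat <- rbind(c(1,1,0,0,0),c(1,0,0,0,0),c(1,1,0,0,0),c(0,1,1,1,1),c(0,0,1,0,0),c(0,0,1,0,1))
# Cycles: each pair of first-mode nodes sharing k second-mode nodes closes choose(k,2) cycles.
sketch.shared <- sketch.clu.mat %*% t(sketch.clu.mat)
sketch.c4 <- sum(choose(sketch.shared[upper.tri(sketch.shared)],2))
# 3-paths: each edge (i,k) is the middle edge of (degree of i - 1) x (degree of k - 1) paths.
sketch.dr <- rowSums(sketch.clu.mat)
sketch.dc <- colSums(sketch.clu.mat)
sketch.l3 <- sum(sketch.clu.mat*outer(sketch.dr-1,sketch.dc-1))
# Share of 3-paths that are closed.
c(cycles=sketch.c4,three.paths=sketch.l3,ratio=4*sketch.c4/sketch.l3)
```

Because a bipartite graph has no triangles, the two end nodes of each counted path are guaranteed to be distinct, so no correction for degenerate paths is needed.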
We can calculate the *dyadic clustering* coefficient for the example network above by using the `reinforcement_tm()` function in the `tnet` package. If you have not already installed the `tnet` package, do so using the `install.packages("tnet")` command.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
library(tnet) # Note that tnet conflicts with some of the functionality of the sna package.
# Take a look at the reinforcement_tm() function.
?reinforcement_tm
# First, we need to coerce our network to be of class "tnet" using as.tnet() so that the reinforcement_tm() function will work.
?as.tnet
bipartite.example.tnet <- as.tnet(as.matrix(bipartite.example.net))
# Now, calculate the dyadic clustering coefficient.
reinforcement_tm(bipartite.example.tnet)
```
**What is the interpretation of the dyadic clustering coefficient in this example?**
***
###**Empirical Example**
Now, let's take a look at the bipartite graph used in a paper I wrote with Justin Ready entitled ["Diffusion of Ideas and Technology: The Role of Networks in Influencing the Endorsement and Use of On-Officer Video Cameras"](https://journals.sagepub.com/doi/10.1177/1043986214553380). The goal of the paper was to examine how police officers develop cognitive frames about the usefulness of body-worn cameras and try to determine whether interactions with other officers provide a conduit for facilitating cognitive frames that increase camera legitimacy.
The bipartite graph is available as an edgelist [here](https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/officer.event.edgelist.csv). Let's import the edgelist and take a look at the network.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
library(sna)
library(network)
officer.event.edgelist <- as.matrix(read.csv("https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/officer.event.edgelist.csv", as.is=TRUE, header=TRUE))
dim(officer.event.edgelist) # 358 edges in the network.
length(unique(officer.event.edgelist[,1])) # 81 officers.
length(unique(officer.event.edgelist[,2])) # 153 events.
# Now, create the network object.
officer.event.net <- as.network(officer.event.edgelist,
bipartite=81, #tell the function that there are 81 nodes in the first mode.
matrix.type = "edgelist" # tell the function this is an edgelist.
)
```
Let's check our data before moving forward:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# take a look at the network.
officer.event.net
# take a look at the names.
network.vertex.names(officer.event.net)
# look at the dimensions.
dim(as.matrix(officer.event.net))
# now look at the way the network is read in:
as.matrix(officer.event.net)[1:10,1:10]
```
**Something fishy here?**
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# There are duplicate entries in the edgelist.
duplicated(officer.event.edgelist)
# For example, take a look at rows 15 and 17.
officer.event.edgelist[15:17,]
```
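The logical vector returned by `duplicated()` is easier to read once it is summarized. Here is a minimal sketch on a toy edgelist. The officer and event labels and the `sketch.` object names are made up for illustration and are not taken from the data.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# A toy two-column edgelist in which row 4 repeats row 1.
sketch.el <- cbind(
  officer=c("C1","C1","T1","C1","T2"),
  event=c("e1","e2","e1","e1","e3")
)
sum(duplicated(sketch.el)) # how many rows repeat an earlier row.
which(duplicated(sketch.el)) # which rows are the repeats.
# Keep only the first occurrence of each edge.
sketch.el.unique <- unique(sketch.el)
nrow(sketch.el.unique)
```

Note that `duplicated()` flags only the later copies, so the first occurrence of each repeated edge is retained by `unique()`.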
An alternative approach is to create an empty matrix and then assign edges to that matrix based on the edgelist. Then, we won't have duplicate edges and the names will be correctly ordered:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# First, create the adjacency matrix
uR <- unique(officer.event.edgelist[,1]) # all unique row labels.
uC <- unique(officer.event.edgelist[,2]) # all unique column labels.
mat <- matrix(0, length(uR), length(uC)) # initialize zeroed matrix.
rownames(mat) <- uR # name the rows.
colnames(mat) <- uC # name the columns.
mat[officer.event.edgelist] <- 1 # fill edgelist indexed edges with a 1.
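diag(mat) <- 0 # caution: mat is not square (officers by events), so this zeroes cell [i,i] for each officer i rather than removing self-ties.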
# Take a look at the matrix.
sum(mat) # 343 edges.
dim(mat) # 81 officers by 153 events.
# Now, create the bipartite network.
rm(officer.event.net) # remove the old one.
officer.event.net <- as.network(mat,bipartite=81,matrix.type="adjacency")
# take a look at the network.
officer.event.net
# take a look at the names.
network.vertex.names(officer.event.net)
# look at the dimensions.
dim(as.matrix(officer.event.net))
```
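Another way to go from an edgelist to a binary incidence matrix is to cross-tabulate the two columns and dichotomize the counts. The sketch below uses a toy edgelist with made-up labels. It is offered as an alternative under those assumptions and is not the approach used in the lab. One difference to keep in mind is that `table()` sorts the labels, whereas `unique()` keeps them in order of first appearance.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# A toy edgelist in which the edge T1-e1 appears twice.
sketch.tab.el <- cbind(
  officer=c("T1","C1","C1","T1","C2"),
  event=c("e1","e1","e2","e1","e3")
)
# Cross-tabulate: cells hold the number of times each edge appears.
sketch.tab <- table(sketch.tab.el[,"officer"],sketch.tab.el[,"event"])
# Dichotomize so that repeated edges count once.
sketch.tab.mat <- matrix(
  as.numeric(sketch.tab > 0),
  nrow=nrow(sketch.tab),
  dimnames=list(rownames(sketch.tab),colnames(sketch.tab))
)
sketch.tab.mat
sum(sketch.tab.mat) # number of unique edges.
```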
***
#####*Properties of the Bipartite Graph*
Now that we have it constructed properly, we can examine the properties of the network.
First, let's plot the network. To aid in visualization, we need to create a variable indicating whether the officer was in the treatment (i.e. received a body-cam) or control group. If you look at the `network.vertex.names(officer.event.net)` object, you will see that the officer ids start with "C" or "T". The first 44 ids are for control officers and the subsequent 37 ids are for treatment officers. The remaining 153 correspond to events. We can use this info to create a variable that indicates treatment status.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Create a variable using the information in the ids.
status <- c(rep("Control",44),rep("Treatment",37),rep("Event",153))
# Create colors for the plot.
vcol <- status
vcol[status == "Control"] <- "green" # make controls green.
vcol[status == "Treatment"] <- "Red" # make treatment group red.
vcol[status == "Event"] <- "White" # make events white.
# Create the shapes.
vsides <- rep(0,length(status))
vsides[status == "Control"] <- 3 # make controls triangles.
vsides[status == "Treatment"] <- 4 # make treatment group squares.
vsides[status == "Event"] <- 50 # make events circles.
# Now, plot the network.
set.seed(605)
op <- par(mar=c(0,0,4,0))
gplot(officer.event.net,gmode="twomode",usearrows=FALSE,displayisolates=FALSE,
vertex.col=vcol,
vertex.sides=vsides,
edge.col="grey60",
edge.lwd=1.2,
vertex.cex = c(rep(2,dim(as.matrix(officer.event.net))[1]), # sizes for officers.
rep(1.2,dim(as.matrix(officer.event.net))[2])) # sizes for events.
)
par(op)
```
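The status variable above relies on hard-coded counts. Since the officer ids start with "C" or "T", the same variable could instead be derived from the ids themselves. The following is a sketch on made-up vertex names. The ids, the assumed size of the first mode, and the `sketch.` object names are all hypothetical.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Hypothetical vertex names: officers first (ids begin with "C" or "T"), then events.
sketch.ids <- c("C01","C02","T01","T02","T03","event1","event2")
sketch.n.officers <- 5 # assumed number of vertices in the first mode.
# Flag the first-mode vertices by position and read the first character of each id.
sketch.is.officer <- seq_along(sketch.ids) <= sketch.n.officers
sketch.prefix <- substr(sketch.ids,1,1)
# Start everyone as an event, then overwrite the officers by prefix.
sketch.status <- rep("Event",length(sketch.ids))
sketch.status[sketch.is.officer & sketch.prefix == "C"] <- "Control"
sketch.status[sketch.is.officer & sketch.prefix == "T"] <- "Treatment"
table(sketch.status)
```

Restricting the prefix check to the first mode guards against an event label that happens to begin with the same letter.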
Now, let's take a look at several properties of the graph: density, degree centrality, and dyadic clustering:
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# We can just use the functions we created earlier for density and centrality.
officer.event.density <- density.bipartite.simpler(officer.event.net)
officer.event.degree <- degree.bipartite(officer.event.net)
# Density.
officer.event.density
# Degree distributions.
table(officer.event.degree[1:81]) # note we have to ask for the first node set N.
table(officer.event.degree[82:234]) # note we have to ask for the second node set M.
# Mean Degree.
sum(as.matrix(officer.event.net)) / dim(as.matrix(officer.event.net))[1]
sum(as.matrix(officer.event.net)) / dim(as.matrix(officer.event.net))[2]
```
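The same quantities can be written out directly from the counts. The sketch below takes the values reported in the comments earlier in this lab (343 edges, 81 officers, 153 events) as given. The object names are illustrative.

```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# Counts taken from the comments earlier in the lab.
sketch.n.edges <- 343 # L
sketch.n.off <- 81 # N, officers.
sketch.n.evt <- 153 # M, events.
# Density: L / (N x M).
sketch.n.edges/(sketch.n.off*sketch.n.evt)
# Mean degree for officers (L / N) and for events (L / M).
sketch.n.edges/sketch.n.off
sketch.n.edges/sketch.n.evt
```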
**What is the interpretation of the density for this network? What is the interpretation of the mean degree for each node set?**
Lastly, we can calculate the *dyadic clustering* coefficient for this network by using the `reinforcement_tm()` function in the `tnet` package.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE,warning=FALSE}
library(tnet)
officer.event.tnet <- as.tnet(as.matrix(officer.event.net))
reinforcement_tm(officer.event.tnet)
```
**What is the interpretation of the dyadic clustering coefficient in this example?**
***
####***Questions?***####
```{r,echo=FALSE,eval=FALSE}
#END OF LAB
```