---
title: "SOLUTIONS: Homework/Problem Set 3"
author: "Statistical Analysis of Networks (CRJ) 605"
output:
html_document:
df_print: paged
theme: cosmo
highlight: haddock
toc: yes
toc_float: yes
code_fold: show
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r,echo=FALSE,eval=TRUE,message=FALSE,warning=FALSE}
# load libraries.
library(sna, quietly = TRUE)
library(network, quietly = TRUE)
# clear the workspace.
rm(list = ls())
```
***
# **Problems**
###**For the *Local Health Department Communication* (LHDS) network, do the following**:
1. Plot the LHDS network and plot a random network with the same number of edges.
2. Based on the plots, how does the LHDS network differ from the random network?
3. Plot the degree distribution for the LHDS network and the random network?
4. How do the degree distributions differ between the LHDS network and the random network?
5. Estimate an ERGM where departments communicate at random. Interpret the coefficient.
6. Examine and describe the relationship between degree and the attribute `years` of experience.
7. Estimate an ERGM where department leaders with more `years` of experience differ in their degree. Interpret the coefficients.
8. Examine the mixing matrix for `hivscreen`. What is the interpretation of this matrix?
9. Estimate an ERGM where departments that provide an `hivscreen` have a preference for communicating with departments that also provide an `hivscreen`.
10. What is the probability of a tie between two departments who *both* provide an `hivscreen`?
11. What is the probability of a tie between two departments who *do not* provide an `hivscreen`?
12. Does the interpretation of the mixing matrix for `hivscreen` lead to the same conclusion as the ERG model for `hivscreen`?
13. Examine the mixing matrix for `nutrition`. What is the interpretation of this matrix?
14. Estimate an ERGM where departments that provide `nutrition` programs have a preference for communicating with departments that also provide `nutrition` programs.
15. What is the probability of a tie between two departments who *both* provide `nutrition` programs?
16. What is the probability of a tie between two departments who *do not* provide a `nutrition` program?
17. Does the interpretation of the mixing matrix for `nutrition` lead to the same conclusion as the ERG model for `nutrition`?
18. Now, estimate an ERGM where: departments have a preference for communicating with other departments based on `hivscreen` **and** `nutrition`. Interpret the coefficients.
19. Plot the LDHS network and identify the nodes based on `hivscreen` and `nutrition`.
20. Simulate a network from the `hivscreen` **and** `nutrition` ERGM and compare it to the observed LDHS network.
21. Evaluate the goodness of fit for the `hivscreen` **and** `nutrition` ERGM.
###**BONUS Question**:
22.B. Look through the `ergm.terms` page (use `?ergm.terms`). For the LHDS network, identify **two** other network configurations that could explain the observed network. Estimate and interpret the models.
***
# **Solutions**
###**For the *Local Health Department Communication* (LHDS) network, do the following**:
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# clear the workspace.
rm(list = ls())
# First load the libraries we need.
library(sna, quietly = TRUE)
library(network, quietly = TRUE)
# Now, load the UserNetR package for the data.
library(UserNetR)
summary(lhds, print.adj = FALSE)
```
We can see from the summary that the `lhds` object is of class `network` and has 1,283 vertices (nodes) and 2,708 edges (lines). There are 6 vertex attributes.
####1. Plot the LDHS network and plot a random network with the same number of edges.
```{r,echo=TRUE,eval=TRUE,message=FALSE, figure.align = "center", figure.height = 5, figure.width = 5}
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 3rd one for the colors.
# lets plot the data and see what it looks like.
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=mycols[3])
title("Communication network for 1,283 health departments")
```
```{r,echo=TRUE,eval=TRUE,message=FALSE, figure.align = "center", figure.height = 5, figure.width = 5}
# define a few terms.
g <- dim(as.matrix(lhds))[1]
l <- sum(as.matrix(lhds))/2
den <- l / (g*(g-1)/2)
# generate the random graph.
set.seed(605)
random.graph <- rgraph(
g, # the number of nodes in the lhds network.
1, # we want just 1 random graph.
tprob = den, # the density of the lhds network.
mode = "graph" )
random.net <- as.network(random.graph, directed = FALSE)
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 3rd and 9th one for the colors.
# Now, take a look at both graphs.
op <- par(mfrow=c(1,2), mai = c(0,0,0.7,0))
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=mycols[3])
title("Communication network for\n 1,283 health departments")
set.seed(605)
gplot(random.net, usearrows = FALSE, edge.col = "grey80", vertex.col=mycols[9])
title("Random network")
par(op)
```
####2. Based on the plots, how does the LHDS network differ from the random network?
Well, the `random.net` network looks like a big *hairball*. In other words, we cannot really see any subgroups or structural features other than the isolates and some peripheral members. For the `lhds` network, things are different:
* First, there are a bunch of subgroups with about 6-12 members that are disconnected from the main component
* Second, there are multiple health departments with high degree that connect otherwise disconnected departments (betweenness)
* Third, most people can reach other departments meaning that, although the closeness is low, they are not disconnected (reachable)
There is actually a function in the `sna` package, `component.dist()`, that we could use to look at the components. Let's run the command `lhds.components <- component.dist(lhds)` to create an object that has this information. See `?component.dist` for info about the object properties.
```{r,echo=FALSE,eval=TRUE,message=FALSE,display=FALSE,warning=FALSE,results="hide"}
lhds.components <- component.dist(lhds)
```
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# How many components are there?
length(unique(lhds.components$membership))
# How big is the largest component?
max(lhds.components$csize)
```
####3. Plot the degree distribution for the LHDS network and the random network?
```{r,echo=TRUE,eval=TRUE,message=FALSE, figure.align = "center", figure.height = 5, figure.width = 5}
# create the degree scores for each network.
lhds.deg <- degree(lhds, gmode="graph")
random.deg <- degree(random.net, gmode="graph")
op <- par(mfrow=c(2,1), mai=c(0.8,0.8,0.2,0.2))
hist(lhds.deg, breaks=20, main="", xlab="Degree (lhds)" , xlim=range(0:25), ylim=range(0:300), col=mycols[3])
hist(random.deg, breaks=20, main="", xlab="Degree (Random)", xlim=range(0:25), ylim=range(0:300), col=mycols[9])
par(op)
```
####4. How do the degree distributions differ between the LHDS network and the random network?
In looking at the distributions, we can see that there is much more right skew in the degree distribution for the `lhds` network. This indicates that there are more nodes in the `lhds` network with high degree than the network with a random assignment of ties. We can see this by looking at the difference in the means for the degree in each network as well.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# calculate means for degree.
mean(lhds.deg)
mean(random.deg)
```
####5. Estimate an ERGM where departments communicate at random. Interpret the coefficient.
The ERGM expresses the probability of observing a tie between nodes *i* and *j* given some terms (i.e. network configurations). The edge independence model of Erdos and Renyi (1959) uses a single term, the number of edges in the graph, to represent the probability of a tie. In the `ergm` package, this is the `edges` term.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# first load the ergm package.
library(ergm, quietly = TRUE)
# run the model.
edge.indep.lhds <- ergm(lhds ~ edges)
# check out the summary.
summary(edge.indep.lhds)
```
The value of -5.712 indicates that the density of the network is far below .5 or 50%. Recall that an `edges` term of 0 would represent 50% or 0.5 density.
As discussed in the **Introduction to Random Graph Models (ERGMs) in R Lab**, in the general formulation of the ERGM, the $\delta$ represents the *change statistic*, or the change in the statistic of interest when an edge is added (i.e. $Y_{ij}$) goes from 0 to 1). The change statistic for the `edges` term is always 1, so we can think of the probability of a tie between *i* and *j* as the logit of the coefficients for the `edges` term. Specifically, we can use the calculation that is usual for a logistic regression to interpret the coefficient:
$\frac{1}{1 + e^{-(\theta_1X_1)}}$
If we plug in the value of -5.71272 that is returned by the `ergm()` function, we get:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.71272 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = \frac{1}{1 + e^{-(-5.71272 \times 1)}} = 0.003$
*Does the value **0.003** look familiar?*
The value of 0.003 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# we could write it out.
plogis(-5.71272*1)
# or, we could pull from the model.
plogis(edge.indep.lhds$coef[1]*1)
```
####6. Examine and describe the relationship between degree and the attribute `years` of experience.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# Examine the correlation.
cor(degree(lhds,gmode="graph"), lhds%v%"years", use = "complete.obs")
```
The correlation between degree and years of experience is 0.1678848, indicating that individuals who have more years of experience are more likely to communicate with more departments. In the model, we would want to include a term that takes this into account since individuals with more years will have higher degree.
####7. Estimate an ERGM where department leaders with more `years` of experience in that position differ in their degree. Interpret the coefficients.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
years.degree.lhds <- ergm(lhds ~ edges + nodefactor("years"))
# check out the summary.
summary(years.degree.lhds)
# Take a look at the values for the years attributes.
table(lhds%v%"years")
```
Note that the term adds one network statistic to the model for each categorical value of the variable we have passed. But, the first category is excluded as the reference category. The value of 0 for years in position is excluded from the model, so individuals with 0 years experience in that position is the referent category. For the `nodefactor` term, positive values indicate that a node with that particular value of the attribute is *more likely* to have an edge, relative to the referent category. Alternatively, a negative value indicates that a node with that particular value of the attribute is *less likely* to have an edge, relative to the referent category.
Let's look at this for our example. Looking at the table we see:
* the term `nodefactor.years.1` has a value of `0.13571`, indicating that individuals who have 1 year experience are more likely to have edges, compared to those with 0.
* the term `nodefactor.years.2` has a value of `0.26097`, indicating that individuals who have 2 years experience are more likely to have edges, compared to those with 0.
* the term `nodefactor.years.3` has a value of `0.32162`, indicating that individuals who have 3 years experience are more likely to have edges, compared to those with 0.
Recall that in the general formulation of the ERGM, the $\delta$ represents the *change statistic*, or the change in the statistic of interest when an edge is added (i.e. $Y_{ij}$) goes from 0 to 1). The change statistic for the `nodefactor` term is different from the `edges` term we saw above. If the predictor is categorical, the value of the change statistic is 0, 1 or 2. Specifically:
* If neither of the nodes have the characteristic of interest, the change statistic is 0.
* A value of 1 indicates that *one* of the nodes in the dyad has the characteristic.
* A value of 2 indicates that *both* of the nodes in the dyad have the characteristic.
Again, let's use the calculation that is usual for a logistic regression to interpret the coefficient:
$\frac{1}{1 + e^{-(\theta_1X_1)}}$
Using the coefficient of 0.13571, the predicted probability of a tie between two nodes with **1** year of experiences as a leader is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{years} \times \delta_{years})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-6.12630 \times 1) + (0.13571 \times 2))$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630 + 0.27142)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.85488) = 0.00285769$
The value of 0.003 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(years.degree.lhds$coef[1]*1 + years.degree.lhds$coef[2]*2)
```
Note that this is a fairly small effect. That is, based on the model, nodes who are both have 1 year of experience have a 0.20% of being connected. But, the probability of a tie between two nodes whom both have 0 years experience is just the coefficient for the `edges` term:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630) = 0.002179878$
The value of 0.002 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(years.degree.lhds$coef[1]*1)
```
What is the probability of a tie between two nodes where one node has **1** year of experience and the other node has **3** years of experience? We need the coefficients for each year for this. In this case, we will multiply the change statistic by 1 for *each* coefficient:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{years} \times \delta_{years} )$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-6.12630 \times 1) + (0.13571 \times 1) + (0.3216247 \times 1))$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630 + 0.13571 + 0.3216247)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.6689658) = 0.003439564$
The value of 0.003 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(years.degree.lhds$coef[1]*1 + years.degree.lhds$coef[2]*1 + years.degree.lhds$coef[4]*1)
```
Nodes where ego has **1** year of experience and alter has **2** years of experience have a 0.3% of being connected.
####8. Examine the mixing matrix for `hivscreen`. What is the interpretation of this matrix?
```{r,echo=TRUE,eval=TRUE,message=FALSE}
mixingmatrix(lhds,"hivscreen")
```
The `mixingmatrix()` function shows the pattern of ties based on the `hivscreen` variable. We can see that among those who say Yes to having an HIV screen, it appears that there are more ties compared to those relations where ego says Yes and alter says No (and *visa versa*).
####9. Estimate an ERGM where departments that provide an `hivscreen` have a preference for communicating with departments that also provide an `hivscreen`.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
hiv.homophily.lhds <- ergm(lhds ~ edges + nodematch("hivscreen"))
# check out the summary.
summary(hiv.homophily.lhds)
```
####10. What is the probability of a tie between two departments who *both* provide an `hivscreen`?
Note that the term adds one network statistic to the model for the variable. For the `nodematch` term, positive values indicate that for a pair of nodes with the **same** attribute value, an edge is *more likely*. Alternatively, a negative value indicates that for a pair of nodes with the **same** attribute value, an edge is *less likely*. Thus, positive coefficients indicate the presence of **homophily** and negative coefficients indicate the presence of **heterophily**.
Let's work through our example to see the interpretation of the coefficient. Looking at the table we see:
* the term `nodematch.hivscreen` has a value of `1.00204`, indicating that ties are more likely among nodes with the same value for the variable `hivscreen`.
Recall that in the general formulation of the ERGM, the $\delta$ represents the *change statistic*, or the change in the statistic of interest when an edge is added (i.e. $Y_{ij}$) goes from 0 to 1). The change statistic for the `nodematch` term is similar to the `edges` term we saw above. If the predictor is categorical, the value of the change statistic is 0 or 1. Specifically:
* If *i* and *j* have the same value for a categorical covariate the change statistic is 1.
* If *i* and *j* **do not** have the same value for a categorical covariate the change statistic is 0.
Again, let's use the calculation that is usual for a logistic regression to interpret the coefficient:
$\frac{1}{1 + e^{-(\theta_1X_1)}}$
Using the coefficient of 1.00204, the predicted probability of an edge between nodes with the **same** `hivscreen` designation is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{hivscreen.homophily} \times \delta_{hivscreen.homophily})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 \times 1 + 1.00204 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 + 1.00204)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.35127) = 0.004719743$
The value of 0.005 is returned by using the command `plogis()`.
Note that this is a fairly small effect. That is, based on the model, nodes with the same `hivscreen` designation have an 0.4% chance of being connected. But, ties are pretty rare in the network anyway, so we need to think about this relative to the effect sizes we have already seen.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(hiv.homophily.lhds$coef[1]*1 + hiv.homophily.lhds$coef[2]*1)
```
####11. What is the probability of a tie between two departments who *do not* provide an `hivscreen`?
This would be the same, because the relationship is still *homophilous*.
Note, however, that we could also estimate *differential homophily* terms, allowing the effect of homophily to vary by "Yes" or "No" regarding the `hivscreen` attribute by setting the `diff=` argument to `TRUE` in the `ergm()` function.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model with the diff=TRUE argument.
hiv.diff.homophily.lhds <- ergm(lhds ~ edges + nodematch("hivscreen", diff=TRUE))
# check out the summary.
summary(hiv.diff.homophily.lhds)
```
This returns two coefficients for `hivscreen`. Using the coefficient of 1.05211, the predicted probability of an edge between nodes who **do not** provide an `hivscreen` is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{hivscreen.homophily} \times \delta_{hivscreen.homophily})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 \times 1 + 1.05211 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 + 1.05211)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.3012) = 0.004960875$
The value of 0.005 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(hiv.diff.homophily.lhds$coef[1]*1 + hiv.diff.homophily.lhds$coef[2]*1)
```
Now for the Yes responses. Using the coefficient of 0.98504, the predicted probability of an edge between nodes who **do** provide an `hivscreen` designation is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{hivscreen.homophily} \times \delta_{hivscreen.homophily})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 \times 1 + 0.98504 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 + 0.98504)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.36827) = 0.004640555$
The value of 0.005 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(hiv.diff.homophily.lhds$coef[1]*1 + hiv.diff.homophily.lhds$coef[3]*1)
```
So, the model shows that, although there is homophily based on `hivscreen`, it differs based on "Yes" and "No". Specifically, homophily is stronger among the "No" departments.
Note that the differential homophily model is nested within the uniform homophily model. We could determine whether there is statistical evidence of an effect of differential homophily, compared to uniform homophily, by conducting a $\chi^2$ test for the difference in the deviance (note that this comparison has 1 degree of freedom). We fail to reject that test. Also, comparison of the `AIC` and `BIC` show that the uniform homophily model is preferred to the differential homophily model.
####12. Does the interpretation of the mixing matrix for `hivscreen` lead to the same conclusion as the ERG model for `hivscreen`?
No. The model supports our conclusion that there is homophily based on `hivscreen`.
####13. Examine the mixing matrix for `nutrition`. What is the interpretation of this matrix?
The `mixingmatrix()` function shows the pattern of ties based on the `nutrition` variable. We can see that among those who both say Yes to having nutrition programming, it appears that there are more ties compared to those relations where ego says Yes and alter says No (and *visa versa*). Note however, that there are more off-diagonal ties (i.e. Yes/No and No/Yes) than there are ties among those who both say No to having a nutrition program. It looks like there is homophily for the Yes nodes, but not for the No nodes. To evaluate this, we want to take into account the actual distribution of Yes and No responses in the network. So, the `ergm()` will do this for us.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
mixingmatrix(lhds,"nutrition")
```
####14. Estimate an ERGM where departments that provide `nutrition` programs have a preference for communicating with departments that also provide `nutrition` programs.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
nutrition.homophily.lhds <- ergm(lhds ~ edges + nodematch("nutrition"))
# check out the summary.
summary(nutrition.homophily.lhds)
```
####15. What is the probability of a tie between two departments who *both* provide `nutrition` programs?
Let's work through our example to see the interpretation of the coefficient. Looking at the table we see:
* the term `nodematch.nutrition` has a value of `0.68013`, indicating that ties are more likely among nodes with the same value for the variable `nutrition`.
Recall that in the general formulation of the ERGM, the $\delta$ represents the *change statistic*, or the change in the statistic of interest when an edge is added (i.e. $Y_{ij}$) goes from 0 to 1). The change statistic for the `nodematch` term is similar to the `edges` term we saw above. If the predictor is categorical, the value of the change statistic is 0 or 1. Specifically:
* If *i* and *j* have the same value for a categorical covariate the change statistic is 1.
* If *i* and *j* **do not** have the same value for a categorical covariate the change statistic is 0.
Again, let's use the calculation that is usual for a logistic regression to interpret the coefficient:
$\frac{1}{1 + e^{-(\theta_1X_1)}}$
Using the coefficient of 0.68013, the predicted probability of an edge between nodes with the **same** `nutrition` designation is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{nutrition.homophily} \times \delta_{nutrition.homophily})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 \times 1 + 0.68013 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 + 0.68013)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.4939) = 0.004094939$
The value of 0.004 is returned by using the command `plogis()`.
Note that this is a fairly small effect. That is, based on the model, nodes with the same `nutrition` designation have an 0.4% chance of being connected. But, ties are pretty rare in the network anyway, so we need to think about this relative to the effect sizes we have already seen.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(nutrition.homophily.lhds$coef[1]*1 + nutrition.homophily.lhds$coef[2]*1)
```
####16. What is the probability of a tie between two departments who *do not* provide a `nutrition` program?
This would be the same, because the relationship is still *homophilous*.
Note, however, that we could also estimate *differential homophily* terms, allowing the effect of homophily to vary by "Yes" or "No" regarding the `nutrition` attribute by setting the `diff=` argument to `TRUE` in the `ergm()` function.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model with the diff=TRUE argument.
nutrition.diff.homophily.lhds <- ergm(lhds ~ edges + nodematch("nutrition", diff=TRUE))
# check out the summary.
summary(nutrition.diff.homophily.lhds)
```
This returns two coefficients for `nutrition`. Using the coefficient of 0.67581, the predicted probability of an edge between nodes who **do not** provide a `nutrition` program is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{nutrition.homophily} \times \delta_{nutrition.homophily}$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 \times 1 + 0.67581 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 + 0.67581)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.49822) = 0.004077359$
The value of 0.004 is returned by using the command `plogis()`. *Note: this is a bit off due to rounding in the `ergm()` summary.*
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(nutrition.diff.homophily.lhds$coef[1]*1 + nutrition.diff.homophily.lhds$coef[2]*1)
```
Now for the Yes responses. Using the coefficient of 0.68064, the predicted probability of an edge between nodes who **do** provide an `nutrition` program is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{nutrition.homophily} \times \delta_{nutrition.homophily})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 \times 1 + 0.68064 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 + 0.68064)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.49339) = 0.00409702$
The value of 0.00409702 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(nutrition.diff.homophily.lhds$coef[1]*1 + nutrition.diff.homophily.lhds$coef[3]*1)
```
So, the model shows that, although there is homophily based on `nutrition`, it differs based on "Yes" and "No", but it is a pretty small difference. Specifically, homophily is stronger among the "Yes" departments. But, again, it is a small difference.
Note that the differential homophily model is nested within the uniform homophily model. We could determine whether there is statistical evidence of an effect of differential homophily, compared to uniform homophily, by conducting a $\chi^2$ test for the difference in the deviance (note that this comparison has 1 degree of freedom). We fail to reject that test. Also, comparison of the `AIC` and `BIC` show that the uniform homophily model is preferred to the differential homophily model.
####17. Does the interpretation of the mixing matrix for `nutrition` lead to the same conclusion as the ERG model for `nutrition`?
No. The model supports our conclusion that there is homophily based on `nutrition`. In fact, it shows us also that our initial belief that there was not homophily on the "No" responses was misguided as the mixingmatrix didn't account for the fact that there are three times as many "Yes" responses as there are "No" responses.
####18. Now, estimate an ERGM where: departments have a preference for communicating with other departments based on `hivscreen` **and** `nutrition`. Interpret the coefficients.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
hiv.nutrition.homophily.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition"))
# check out the summary.
summary(hiv.nutrition.homophily.lhds)
```
####19. Plot the LDHS network and identify the nodes based on `hivscreen` and `nutrition`.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# Create the colors for hivscreen.
hiv.cols <- lhds%v%"hivscreen"
hiv.cols[hiv.cols == "Y"] <- "blue"
hiv.cols[hiv.cols == "N"] <- "red"
hiv.cols[is.na(hiv.cols) ==TRUE] <- "white"
# Create shapes for nutrition.
nutr.shp <- lhds%v%"nutrition"
nutr.shp[nutr.shp == "Y"] <- 4
nutr.shp[nutr.shp == "N"] <- 3
nutr.shp[is.na(nutr.shp) ==TRUE] <- 50
nutr.shp <- as.numeric(nutr.shp)
# let's plot the data and see what it looks like.
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=hiv.cols, vertex.sides=nutr.shp)
title("Communication network for 1,283 health departments",
sub="Nodes Colored by HIV Screen (Blue = Y, Red = N, White = NA)\n
Nodes Shaped by Nutrition Program (Square = Y, Triangle = N, Circle = NA)"
)
```
*In the visualization, do you see the **homophily** that we observed in the prior model?*
####20. Simulate a network from the `hivscreen` **and** `nutrition` ERGM and compare it to the observed LDHS network.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# Run the simulation.
sim.hiv.nutrition.homophily.lhds <- simulate.ergm(
hiv.nutrition.homophily.lhds, # we are going to use the estimates from this model for our simulation.
nsim=1, # simulate 1 network.
warning=FALSE, # suppress a warning that pops up when we use this function.
seed = 605 # set the seed so that it reproduces the same results.
)
# Set up the colors and shapes for the simulated network.
sim.hiv.cols <- sim.hiv.nutrition.homophily.lhds%v%"hivscreen"
sim.hiv.cols[sim.hiv.cols == "Y"] <- "blue"
sim.hiv.cols[sim.hiv.cols == "N"] <- "red"
sim.hiv.cols[is.na(sim.hiv.cols) ==TRUE] <- "white"
# Create shapes for nutrition.
sim.nutr.shp <- sim.hiv.nutrition.homophily.lhds%v%"nutrition"
sim.nutr.shp[sim.nutr.shp == "Y"] <- 4
sim.nutr.shp[sim.nutr.shp == "N"] <- 3
sim.nutr.shp[is.na(sim.nutr.shp) ==TRUE] <- 50
sim.nutr.shp <- as.numeric(sim.nutr.shp)
# Now, plot both of the networks.
op <- par(mfrow=c(1,2), mai = c(0,0,0.7,0))
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=hiv.cols, vertex.sides=nutr.shp, main="Observed")
set.seed(605)
gplot(sim.hiv.nutrition.homophily.lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=sim.hiv.cols, vertex.sides = sim.nutr.shp, main="Simulated")
par(op)
```
####21. Evaluate the goodness of fit for the `hivscreen` **and** `nutrition` ERGM.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
hiv.nutrition.homophily.lhds.gof <- gof(
hiv.nutrition.homophily.lhds, # our estimated model.
verbose = TRUE, # we want it to tell us what it is doing while it is doing it.
control = control.gof.ergm(seed = 605) # set the seed to reproduce results, note the control= argument.
)
# Now, print out the goodness of fit info.
hiv.nutrition.homophily.lhds.gof
```
The default for `gof()` is to estimate the degree distribution, the edgewise shared partner distribution, and the geodesic distance distribution. Inspection of each one tells us about how well the model we have constructed is representing the processes that generated the observed network. Specifically:
* The `Goodness-of-fit for degree` table indicates that the model is underestimating the number of nodes with degree 0 and degree 1. The model does a decent job estimating degree 2-4, but for the most part, the rest of the degree distribution is not well represented by our model.
* The `Goodness-of-fit for edgewise shared partner` table shows that all of the edgewise shared partner terms are not well represented in the model. In other words, our model is not capturing the transitivity in the network. Put differently, our model cannot generate networks that have transitivity level similar to our observed network.
* The `Goodness-of-fit for minimum geodesic distance` both also indicates a poor fit of the model. In particular, the model overpredicts how *close* the nodes are in the graph. That is, nodes in the network have a longer average path distance from each other than our model can generate.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
#Set up the plotting pane.
op <- par(mfrow=c(2,2))
plot(hiv.nutrition.homophily.lhds.gof)
par(op)
```
Inspection of the plots shows the same information that we observed in the tables.
###**BONUS Question**:
####22.B. Look through the `ergm.terms` page (use `?ergm.terms`). For the LHDS network, identify **two** other network configurations that could explain the observed network. Estimate and interpret the models.
Based on the results from the goodness-of-fit above, let's try to account for the transitivity in the graph using the `esp` term. Specifically, let's estimate an `esp` term for edgewise shared partners of 0 through 3. If you run this model: `esp.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") + esp(0:3))` you will see that it is degenerate, meaning that MCMC algorithm is not working properly based on this configuration.
Let's take a look at the `gwesp` term, which was designed to handle model degeneracy.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# take a look at the info for the term.
?gwesp
# estimate the model.
gwesp.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") + gwesp(fixed = TRUE))
# summarize the model.
summary(gwesp.lhds)
```
This model also struggles to find a solution (as seen by the lack of convergence). The next step would be to try and tune the estimation using the `control.ergm` function.
Finally, let's try the `gwdegree` term. This takes into account the degree distribution.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# take a look at the info for the term.
?gwdegree
# estimate the model.
gwdegree.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") + gwdegree(fixed = TRUE))
# summarize the model.
summary(gwdegree.lhds)
```
That worked! Now, let's take a look at the model fit.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
gwdegree.lhds.gof <- gof(
gwdegree.lhds,
GOF = ~degree # the degree distribution,
+ espartners # the edge-wise shared partners distribution,
+ distance, # our estimated model.
verbose = TRUE, # we want it to tell us what it is doing while it is doing it.
control = control.gof.ergm(seed = 605) # set the seed to reproduce results, note the control= argument.
)
#Set up the plotting pane.
op <- par(mfrow=c(2,2))
plot(gwdegree.lhds.gof)
par(op)
```
Evaluation of the goodness-of-fit shows that we are now fitting the degree distribution better than before. That makes sense given that we included a term that models the probability of various degree, as following a geometric distribution. Although, we are still struggling with the edgewise shared partner distribution and the geodesic distance distribution.
***
# **Problems**
###**Extra BONUS Questions**:
The *Talked About the Course* network, stored as the object `talk.course.net`, which is an object of class `network`, is available at: https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/crj_605_networks_spring_2019_class_survey_data_w1.rdata.
Do the following:
1. Plot the *Talked About the Course* network and plot a random network with the same number of edges.
2. Based on the plots, how does the *Talked About the Course* network differ from the random network?
3. Plot the degree distribution for the *Talked About the Course* network and the random network?
4. How do the degree distributions differ between the *Talked About the Course* network and the random network?
5. Estimate an ERGM where students talked about the course at random. Interpret the coefficient.
6. Examine and describe the relationship between degree and the attribute `Sleept1` of experience.
7. Estimate an ERGM where students who sleep more differ in their degree. Interpret the coefficients.
8. Estimate an ERGM where students who sleep more *send* less ties. Interpret the coefficient.
9. Estimate an ERGM where students who sleep more *receive* fewer ties. Interpret the coefficient.
10. Estimate an ERGM where students who sleep more *send* **and** *receive* fewer ties. Interpret the coefficients.
***
# **Solutions**
####1. Plot the *Talked About the Course* network and plot a random network with the same number of edges.
```{r,echo=TRUE,eval=TRUE}
# clear the workspace.
rm(list = ls())
# Load the data.
load(url("https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/crj_605_networks_spring_2019_class_survey_data_w1.rdata"))
# Take a look at the talked about the course network.
summary(talk.course.net,print.adj = FALSE)
```
```{r,echo=TRUE,eval=TRUE,message=FALSE, figure.align = "center", figure.height = 5, figure.width = 5}
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 4th one for the colors.
# lets plot the data and see what it looks like.
set.seed(605)
gplot(talk.course.net, edge.col = "grey80", vertex.col=mycols[4])
title("Talked about the Course Network")
```
```{r,echo=TRUE,eval=TRUE,message=FALSE, figure.align = "center", figure.height = 5, figure.width = 5}
# define a few terms.
g <- dim(as.matrix(talk.course.net))[1]
l <- sum(as.matrix(talk.course.net))/2
den <- l / (g*(g-1)) # note that here we do not divide by 2 because the network is directed.
# generate the random graph.
set.seed(605)
random.graph <- rgraph(
g, # the number of nodes in talk.course.net.
1, # we want just 1 random graph.
tprob = den, # the density of talk.course.net.
mode = "digraph" )
random.net <- as.network(random.graph, directed = TRUE)
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 4th and 10th one for the colors.
# Now, take a look at both graphs.
op <- par(mfrow=c(1,2), mai = c(0,0,0.7,0))
set.seed(605)
gplot(talk.course.net, edge.col = "grey80", vertex.col=mycols[4])
title("Talked about \nthe Course Network")
set.seed(605)
gplot(random.net, edge.col = "grey80", vertex.col=mycols[10])
title("Random network")
par(op)
```
####2. Based on the plots, how does the *Talked About the Course* network differ from the random network?
The major difference is that there is no core/periphery structure in the random network. Also, there are no isolates in the random network. Interestingly, the random network has 56 edges, whereas the observed network has 59 edges. *But*, visually the random network looks to have more going on.
####3. Plot the degree distribution for the *Talked About the Course* network and the random network?
```{r,echo=FALSE,eval=FALSE}
# create the degree scores for each network.
talk.ideg <- degree(talk.course.net, cmode="indegree")
talk.odeg <- degree(talk.course.net, cmode="outdegree")
random.ideg <- degree(random.net, cmode="indegree")
random.odeg <- degree(random.net, cmode="outdegree")
# Plot the distributions.
op <- par(mfrow=c(2,2), mai=c(0.8,0.8,0.2,0.2))
hist(talk.ideg, breaks=20, main="", xlab="Indegree (Talk Net)" , xlim=range(0:10), ylim=range(0:6), col=mycols[4])
hist(talk.odeg, breaks=20, main="", xlab="Outdegree (Talk Net)" , xlim=range(0:10), ylim=range(0:6), col=mycols[4])
hist(random.ideg, breaks=20, main="", xlab="Indegree (Random)" , xlim=range(0:10), ylim=range(0:6), col=mycols[10])
hist(random.odeg, breaks=20, main="", xlab="Outdegree (Random)" , xlim=range(0:10), ylim=range(0:6), col=mycols[10])
par(op)
```
####4. How do the degree distributions differ between the *Talked About the Course* network and the random network?
In looking at the indegree distributions, we can see that there is not a huge amount of difference for the `talk.course.net` network and random network. The only difference is that there are individuals receiving no ties (i.e. indegree = 0) in the `talk.course.net` network, but not the random network. For the outdegree distributions, we see the individuals we outdegree = 0 standing out for the `talk.course.net` network, whereas the random network does not have this structural feature.
####5. Estimate an ERGM where students talked about the course at random. Interpret the coefficient.
The ERGM expresses the probability of observing a tie between nodes *i* and *j* given some terms (i.e. network configurations). The edge independence model of Erdos and Renyi (1959) uses a single term, the number of edges in the graph, to represent the probability of a tie. In the `ergm` package this is the `edges` term.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
edge.indep.talk <- ergm(talk.course.net ~ edges)
# check out the summary.
summary(edge.indep.talk)
```
The value of -0.9397 indicates that the density of the network is far below .5 or 50%. Recall that an `edges` term of 0 would represent 50% or 0.5 density.
If we plug in the value of -0.9397 that is returned by the `ergm()` function, we get:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-0.9397 \times 1)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = \frac{1}{1 + e^{-(-0.9397 \times 1)}} = 0.2809609$
*Does the value **0.281** look familiar?*
The value of 0.281 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# we could write it out.
plogis(-0.9397*1)
# or, we could pull from the model.
plogis(edge.indep.talk$coef[1]*1)
```
The coefficient of -0.9397 indicates that the probability of a tie is 0.281 for this model.
####6. Examine and describe the relationship between degree and the attribute `Sleept1` of experience.
```{r,echo=FALSE,eval=FALSE}
# Let's build a table with the variables we want.
talked.mat.data <- data.frame(
talked.in = talk.ideg,
talked.out = talk.odeg,
sleep = talk.course.net%v%"Sleept1"
)
# print the correlation matrix.
cor(talked.mat.data)
```
The correlation matrix indicates that individuals who sleep more are *less* likely to talk about the course with another person (i.e. outdegree) and are *less* likely to have someone talk about the course with them (i.e. indegree). Also, note from the correlation matrix that individuals who send more ties are more likely to receive ties.
####7. Estimate an ERGM where students who sleep more differ in their degree. Interpret the coefficients.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# First, take a look at the values for the sleep attribute.
table(talk.course.net%v%"Sleept1")
# Since the sleep variable is continuous, and not categorical, we want to use the nodecov term.
?nodecov
# run the model.
sleep.deg.talked <- ergm(talk.course.net ~ edges + nodecov("Sleept1"))
# check out the summary.
summary(sleep.deg.talked)
```
Note that the `nodecov` term adds one network statistic to the model for the variable we have passed. For the `nodecov` term, positive values indicate that a node with that particular value of the attribute is *more likely* to send and/or receive a tie as the value of the attribute increases (**note** the difference in interpretation from `nodefactor`). Alternatively, a negative value indicates that a node with that particular value of the attribute is *less likely* to send and/or receive a tie as the value of the attribute increases.
Let's look at this for our example. Looking at the table we see:
* the term `nodecov.Sleept1` has a value of `-0.41671`, indicating that individuals who sleep more are *less* likely to send/receive a tie, compared to those who sleep less.
Recall that in the general formulation of the ERGM, the $\delta$ represents the *change statistic*, or the change in the statistic of interest when an edge is added (i.e. $Y_{ij}$) goes from 0 to 1). The change statistic for the `nodecov` term is different from the `nodefactor` term we saw above. Since the predictor is *continuous* we would just plug in the value for how much change we want in the variable. This is similar to a typical regression where the interpretation of the $\beta$ coefficient is the change in *y* for a unit change in the predictor.
Using the coefficient of -0.41671, the predicted probability of a tie being incident on a node with the mean level of sleep (i.e. `mean(talk.course.net%v%"Sleept1")`) of 6.6 hours is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{sleep} \times \delta_{sleep})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((4.45423 \times 1) + (-0.41671 \times 6.6))$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(4.45423 + -2.750286)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(1.703944) = 0.8460491$
The value of 0.846 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(sleep.deg.talked$coef[1]*1 + sleep.deg.talked$coef[2]*6.6)
```
####8. Estimate an ERGM where students who sleep more *send* less ties. Interpret the coefficient.
The `nodecov` term estimates an effect for an edge being incident on a node, but does not take into account the fact that individuals may differ in the receiving and sending of ties. To account for this difference, we can use the `nodeicov` and `nodeocov` terms to allow an attribute to influence individuals receiving of ties and sending of ties, respectively.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
sleep.odeg.talked <- ergm(talk.course.net ~ edges + nodeocov("Sleept1"))
# check out the summary.
summary(sleep.odeg.talked)
```
Note that the `nodeocov` term adds one network statistic to the model for the variable we have passed. For the `nodeocov` term, positive values indicate that a node with that particular value of the attribute is *more likely* to **send** a tie as the value of the attribute increases (again, **note** the difference in interpretation from `nodefactor` and `nodeofactor`). Alternatively, a negative value indicates that a node with that particular value of the attribute is *less likely* to **send** a tie as the value of the attribute increases.
Let's look at this for our example. Looking at the table we see:
* the term `nodeicov.Sleept1` has a value of `-0.4694`, indicating that individuals who sleep more are *less* likely to **send** a tie, compared to those who sleep less.
* this is consistent with the negative correlation between `Sleept1` and outdegree noted above.
Using the coefficient of -0.2611, the predicted probability of a node *sending* a tie with the mean level of sleep (i.e. `mean(talk.course.net%v%"Sleept1")`) of 6.6 hours is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{out.sleep} \times \delta_{out.sleep})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((0.7593 \times 1) + (-0.2611 \times 6.6))$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(0.7593 + -1.72326)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-0.96396) = 0.276086$
The value of 0.276 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(sleep.odeg.talked$coef[1]*1 + sleep.odeg.talked$coef[2]*6.6)
```
####9. Estimate an ERGM where students who sleep more *receive* fewer ties. Interpret the coefficient.
This model would use the `nodeicov` term to capture variation in indegree based on sleep.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
sleep.ideg.talked <- ergm(talk.course.net ~ edges + nodeicov("Sleept1"))
# check out the summary.
summary(sleep.ideg.talked)
```
Note that the `nodeicov` term adds one network statistic to the model for the variable we have passed. For the `nodeicov` term, positive values indicate that a node with that particular value of the attribute is *more likely* to **receive** a tie as the value of the attribute increases (again, **note** the difference in interpretation from `nodefactor` and `nodeifactor`). Alternatively, a negative value indicates that a node with that particular value of the attribute is *less likely* to **receive** a tie as the value of the attribute increases.
Let's look at this for our example. Looking at the table we see:
* the term `nodeicov.Sleept1` has a value of `-0.4694`, indicating that individuals who sleep more are *less* likely to **receive** a tie, compared to those who sleep less.
* this is consistent with the negative correlation between `Sleept1` and indegree noted above.
Using the coefficient of -0.4694, the predicted probability of a node *receiving* a tie with the mean level of sleep (i.e. `mean(talk.course.net%v%"Sleept1")`) of 6.6 hours is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{in.sleep} \times \delta_{in.sleep})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((2.0853 \times 1) + (-0.4694 \times 6.6))$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(2.0853 + -3.09804)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.01274) = 0.266444$
The value of 0.266 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(sleep.ideg.talked$coef[1]*1 + sleep.ideg.talked$coef[2]*6.6)
```
####10. Estimate an ERGM where students who sleep more *send* **and** *receive* fewer ties. Interpret the coefficients.
For this model, we would use both the `nodeicov` and `nodeocov` terms for sleep.
```{r,echo=TRUE,eval=TRUE,message=FALSE}
# run the model.
sleep.i.o.deg.talked <- ergm(talk.course.net ~ edges + nodeicov("Sleept1") + nodeocov("Sleept1"))
# check out the summary.
summary(sleep.i.o.deg.talked)
```
Let's look at this for our example. Looking at the table we see:
* the term `nodeicov.Sleept1` has a value of `-0.5104`, indicating that individuals who sleep more are *less* likely to **receive** a tie, compared to those who sleep less.
* the term `nodeocov.Sleept1` has a value of `-0.3246`, indicating that individuals who sleep more are *less* likely to **send** a tie, compared to those who sleep less.
Using the coefficients of -0.5104 and -0.3246, the predicted probability of a node *receiving* **and** *sending* a tie with the mean level of sleep (i.e. `mean(talk.course.net%v%"Sleept1")`) of 6.6 hours is:
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{i.sleep} \times \delta_{i.sleep} + \theta_{o.sleep} \times \delta_{o.sleep})$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((4.4598 \times 1) + (-0.5104 \times 6.6) + (-0.3246 \times 6.6))$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(4.459 + -3.36864 + -2.14236)$
$\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.052) = 0.2588412$
The value of 0.259 is returned by using the command `plogis()`.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
plogis(sleep.i.o.deg.talked$coef[1]*1 + sleep.i.o.deg.talked$coef[2]*6.6 + sleep.i.o.deg.talked$coef[3]*6.6)
```
Note that the model with the `nodecov` term is nested in the model that separates this into `nodeicov` and `nodeocov`. This means we can determine whether individuals differ in their indegree and outdegree based on sleep by comparing these models. If we fail to reject the hypothesis that the coefficients are equal, then we can conclude that the simpler `nodecov` model is preferable and that indegree and outdegree do not differ based on sleep. We can do this by examining the AIC and BIC of the models.
```{r,echo=TRUE,eval=TRUE,message=FALSE,include=TRUE}
# The more complex model.
summary(sleep.i.o.deg.talked)
# The simpler model.
summary(sleep.deg.talked)
```
The simpler model shows a smaller AIC and BIC indicating that this model is preferred to the more complex model.
***
####***Questions?***####
```{r,echo=FALSE,eval=FALSE}
# END OF SOLUTIONS.
```