--- title: "Basic Network Analysis with R Lab" date: "CRJ 605 Statistical Analysis of Networks" output: html_document: df_print: paged theme: cosmo highlight: haddock toc: yes toc_float: yes code_fold: show --- ### *This lab will work through some basic examples of matrices and how they are created, imported, and manipulated in R.* ###**Working with matrices** First, clear the workspace: `rm(list = ls())` Let's start by working with an example of an undirected, binary network. We will create an object that is the adjacency matrix. To create the adjacency matrix, we use the `matrix()` function and the `concatenate()` or `c()` function. We can look at what these functions do by asking for help using the `help()` or `?()` functions. Recall that the help describes the arguments that the function takes and also provides examples. ```{r,echo=TRUE,eval=FALSE} ?matrix # help for the matrix() function. ?c # help for the c() function. ``` Now, let's create the data object: ```{r,echo=TRUE,eval=TRUE} data <- matrix( c(0,1,0,0,0,1,0,1,0,0,0,1,0,1,1,0,0,1,0,1,0,0,1,1,0), nrow=5,byrow=TRUE ) ``` This command reads as follows: - Combine the following numbers; - From these combined numbers, create a matrix by reading across the list; - Create an object called data. This object will be a matrix with 5 rows. We can see the object by just typing the object name: `data`. *Note* that if the number of elements does not correctly match the dimensions of the matrix, R gives you an error. For example: ```{r,echo=TRUE,eval=TRUE} junk1 <- matrix(c(1,2,3,4,5,6,7),nrow=2,byrow=TRUE) # Because there are 7 elements here, the 8th element needed for a matrix is replaced with the first value in the vector. junk1 ``` After we have created our data object, we can examine the dimensions with the dim function: `dim(data)`. We can also attach names to the rows and columns of the matrix by using the `rownames()` and `colnames()` functions. ```{r,echo=TRUE,eval=TRUE} rownames(data)<-c("Jen","Tom","Bob","Leaf","Jim") colnames(data)<-c("Jen","Tom","Bob","Leaf","Jim") data ``` We can refer to specific elements, rows, or columns by using the `[` and `]` symbols. This reads as: "object[row,column]". For example, lets look at the relation Jen sends to Tom. Recall from lecture that this is element 1,2 in the matrix (i.e. row one, column two). `data[1,2]` This command reads as follows: for the object data, return the value at row 1 column 2. We can also call the values for an entire row or column. A single value is called a scalar. ```{r,echo=TRUE,eval=TRUE} data[1,] # this reads: return the first row of data. data[,1] # this reads: return the first column of data. ``` Since we have defined names for the rows and columns, we can use those as well. ```{r,echo=TRUE,eval=TRUE} data["Jen",] # reference the row pertaining to Jen. data[,"Jen"] # reference the column pertaining to Jen. ``` *Note*: the following does not work because it needs a character, defined by the "" symbols around the name. ```{r,echo=TRUE,eval=FALSE} data[,Jen] #returns an error because there is no object called Jen. data[,"Jen"] # compare the difference with the prior line. ``` We can also call a series of values: ```{r,echo=TRUE,eval=TRUE} data[1:3,] # return the first three rows of the object data. data[,1:3] # return the first three columns of the object data. ``` We can also call a group of values that are non-contiguous using the `c()` function: ```{r,echo=TRUE,eval=TRUE} data[c(1,3),] # return the first and second rows of the object data. data[,c(1,3)] # return the first and second columns of the object data. ``` We can also call a group of values that that do not contain specified values by putting a "-" (i.e. a minus sign) in front of the c function: ```{r,echo=TRUE,eval=TRUE} data[-c(1,3),] # return the object data without rows 1 and 3. data[,-c(1,3)] # return the object data without columns 1 and 3. ``` ###**Exploring the `network` package** Now that we have our data object set up, lets manipulate it into a network and a graph. To do this, we can use the `network` package. `network` is a package containing tools to create and modify network objects created by Carter Butts (http://erzuli.ss.uci.edu/~buttsc/). See http://cran.r-project.org/web/packages/network/index.html for an overview. ```{r,echo=FALSE,eval=TRUE,message=FALSE} library(network) ``` First, we need to install the package using: `install.packages("network")`. *Note*: if you have already installed the package, no do not need to reinstall it. If it is already installed, we should check to make sure we have the most recent version: `update.packages("network")` Whenever we start R, we need to load the package because it is not automatically loaded. To do this, use the `library()` function. `library("network")` To get a list of the contents of the package, as for help with respect to the package itself use the `help()` function, but tell R we want help on the package: `help(package="network")` ###**Working with *Unidirected*, Binary Networks** Now that the package is loaded, lets create a new object from our matrix that is a network. That is, we will use the network function to create an object that is of class "network". To use some of the functions, it has to be a specific class. ```{r,echo=TRUE,eval=TRUE} data # look at our data object. class(data) # what class is the object "data"? net.u <- as.network(data) #now, coerce the object "data" into a object called "net.u" that is of class "network". ``` When we enter the object in the command line, summary info about the object is produced: `net.u`. This is because the object is of class "network". We can use the class function to confirm this: `class(net.u)`. Let's look at the object again: `net.u`. *What does the summary output of the object tell us?* Note that the network is treated as **directed**. By default, the function `as.network()` sets the argument `directed =` to `TRUE`. We can see this by looking at the structure of the function in the help page: `?as.network`. *What do we need to change in the `as.network()` function?* We need to change the input for the `directed=` argument because our network is **undirected**. In other words, `directed = FALSE`. This tells the function that the matrix we are entering is an undirected network: ```{r,echo=TRUE,eval=TRUE} net.u.correct <- as.network(data,directed=FALSE) # Now, compare the difference since telling the function that the network is directed: net.u net.u.correct ``` The `summary()` function is a generic function that summarizes objects. We can use it on an object of class "network" to provide more information: `summary(net.u.correct)`. More information about what can be done with the `summary()` function for an object of class network is shown on the `?as.network` page. We could also enter the data as an edgelist using the `matrix.type = ` argument. By default, the function `as.network()` sets the argument `matrix.type =` to `adjacency`. For an edgelist, we would need to change the input for the `matrix.type =` argument to `edgelist`. ```{r,echo=TRUE,eval=TRUE} # For example, lets make an edgelist: edge <- matrix(c("Jen","Tom","Tom","Bob","Bob","Leaf","Bob","Jim","Leaf","Jim"),nrow=5,byrow=TRUE) # 5 rows and we are reading off by row. edge edge.net.u <- as.network(edge,directed=FALSE,matrix.type="edgelist") # we have changed the default to edgelist for the matrix.type argument. summary(edge.net.u) # Note that the as.network() function will often recognize the matrix type being entered: edge.net.u <- as.network(edge,directed=FALSE) summary(edge.net.u) ``` ###**Working with *Directed*, Binary Networks** Now, lets work with the example of a directed, binary network. We will create an object that is the adjacency matrix. ```{r,echo=TRUE,eval=TRUE} data.d <- matrix(c(0,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,1,0,1,0,0,1,1,0),nrow=5,byrow=TRUE) # again, note that these were written in by row, so we want it to read by row (i.e. byrow=TRUE). data.d # take a look at the matrix. ``` Now, let's coerce it to be an object of class network. ```{r,echo=TRUE,eval=TRUE} net.d <- as.network(data.d,directed=TRUE) summary(net.d) ``` Just as before, we could also enter the data as an edgelist. Since we have directed relations, we have more edges. This is because reciprocated ties count twice. So, we have to tell the `matrix()` function that the matrix has 8 rows, instead of 5. ```{r,echo=TRUE,eval=TRUE} edge.d <- matrix(c("Jen","Tom","Tom","Bob","Bob","Leaf","Bob","Jim","Leaf","Bob","Leaf","Jim","Jim","Bob","Jim","Leaf"),nrow=8,byrow=TRUE) edge.d.net <-network(edge.d,directed=TRUE,matrix.type="edgelist") summary(edge.d.net) # I have added the argument "print.adj=FALSE", what is different? summary(edge.d.net,print.adj=FALSE) ``` ###**Importing Network data** ```{r,echo=FALSE,eval=TRUE} rm(list = ls()) ``` If we had a large network, these routines (i.e. using the `matrix()` function) would be tedious. Most of the time, we have a file that is an adjacency matrix or an edgelist that we can import. The `read.csv()` function can be used to read in .csv files that are arranged in this way. Let's take a look at the help for this function: `?read.csv`. We will use the "lab_2_undirected_example.csv" file from my website. To access the file, we can use the url in the `read.csv()` function: ```{r,echo=TRUE,eval=TRUE} data.u <- read.csv("https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/undirected_net_example.csv") data.u class(data.u) #note that the read.csv function creates an object of class data.frame. ``` We need to adjust the arguments to read in the file how we want it. Specifically, we want to do the following: - Set the `as.is=` argument equal to `TRUE` so that it reads the data as it is. - Set the `header=` argument to `TRUE` to indicate that there is a header, or column labels. - Set the `row.names=` argument equal to 1 to indicate that the name of the rows are in the first column. ```{r,echo=TRUE,eval=TRUE} data.u <- read.csv("http://www.weebly.com/uploads/2/3/4/5/23459640/lab_2_undirected_example.csv",as.is=TRUE,header=TRUE,row.names=1) data.u ``` Now, make the object into one of class network: ```{r,echo=TRUE,eval=TRUE} net.u <- network(data.u,directed=FALSE) #recall that since this network is undirected, so we set the directed argument to FALSE. ``` We could also import the file if it is saved locally (i.e. we are not going to the web to get it). Let's do this for the directed network. I have saved the file to my desktop. First, look at what directory we are in using: `getwd()` function. This function gets the wording directory R is currently looking at. Then, set the directory where the file is using the `setwd()` function. You can get the location of the file in Windows using properties, or info on Mac. *Note*: you have to configure this path to your machine. For me, it is: `setwd("/Users/jyoung20/Desktop")`. Then, use read.csv as above: ```{r,echo=TRUE,eval=FALSE} setwd("/Users/jyoung20/Desktop") data.d <- read.csv("directed_net_example.csv",as.is=TRUE,header=TRUE,row.names=1) net.d <- network(data.d,directed=TRUE) # Note: we don't need to tell it that the network is directed since this is the default, but a good habit to get into. ```

####***Questions?***#### ```{r,echo=FALSE,eval=FALSE} #END OF LAB ```