What is a social network ?

A social network describes all the individuals in a group AND all the links between them. The photo above shows a social network from the television show The L Word. In this particular network, the nodes, shown as names, represent people in the fictional LA social scene of the series; while the edges, or lines between names, represent romantic and sexual connections to each other. As you can see, the author of this particular social network analysis - Alice Pieszecki - has annotated her network to indicate particularly important (or promiscuous) nodes (i.e., Shane, Lauren, and Alice, herself) that have many connections across the network, and colored some nodes and edges differently to indicate perhaps other social qualities within the network.

Of course, there are more formal ways to understand the importance of particular nodes or the quality of relationships based on the structure of edges in a social network, and even to find out if there are other variables outside of the nodes and edges themselves that might influence a social network (in the case of The L Word, for example, class or sexual orientation might be important variables).

We can find these using social network analysis.

What can social network analyses tell us?

Social network analysis includes the theory, methods, and practical/analytical applications for finding meaningful patterns within the structure of social network datasets.

In this vignette, you will learn (1) how to prepare social network data for analysis and (2) how to perform quadratic assignment procedures to assess correlations between social networks (response variables) and factors that may influence the structure of those networks (predictor variables).

A network analysis (as opposed to a non-network analysis) may be used to reveal information about the group as a whole, as opposed to the individual. Looking back at our social network from The L Word, non-network analysis might only look at Shane’s romantic connections independently, to understand why Shane hooks up with whom and how many hook-ups she’s had. This can tell us about Shane’s choices and appeal, and potentially about who seeks out Shane as a partner, but doesn’t tell us much about the LA power-lesbian scene as a whole. A social network analysis, however, can tell us how Shane’s romantic connections are structured in relation to Alice’s and Lauren’s (in other words, we can understand them as being not independent of the larger social network). It could also reveal group-level characteristics that don’t exist or are difficult to ascertain just by looking at individuals. When we want to look at the patterns of relationships between individuals, an edge is just as important (and as much a network-object) as a node.

Terms to know

  • Nodes: the individuals in the social network

  • Edges: the relationships between individuals in a network

  • Directionality: some social networks will have a relationship that goes in one direction (e.g., Helena gives Alice money). Other relationships won’t have directionality (e.g., Bette and Tina get married). Directionality is relevant for understanding transactional or rank-based relationships.

  • Edge weight: the strength of the relationship between individuals, which may differ between individuals within social networks (e.g., if Shane has a lot of one-night-stands, but hooks up with Carmen multiple times, we might be interested in how many times as this might be an indicator of the quality of their relationship. This value is the edge’s weight)

By the end of this module, you should be able to answer the following questions about social networks in R:

  1. How are individuals connected in a group?

  2. How can we test the extent to which specific attributes influence whether individuals in a group are connected?

  3. How do networks change over time?

To accomplish this, we will (sadly) leave The L Word to use real social network datasets from nonhuman primates!

We will start with a dataset from bonobos (Pan paniscus). Bonobos are one of our closest living relatives and live in multi-male/multi-female social groups. Female bonobos are highly cohesive and have strong affiliative bonds. These bonds are maintained through high levels of sociosexual behavior, the most common being genito-genital (GG) rubbing. Similar to most other primates, bonobos will also frequently groom each other, and grooming behaviors can be a key indicator of rank and the quality of relationships within a social group.

Installing packages

The most important packages we will be using for this modules is {statnet}:

library(statnet)

{igraph} and {statnet} are two of the most commonly used packages for social network analyses. Both include built-in functions that can plot networks and answer almost every beginner-level question about a given network. However, there may be some functions that are present in one package but not the other (or just work better). So, {intergraph} is another useful package that helps users transition between {igraph} and {statnet}. For this module, we will only need {statnet}.

Understanding the structure of social network data

Loading data

Let’s start by loading our data from bonobos. There are two datasets from zoo-based bonobo populations at Apenheul Primate Park in the Netherlands, and La Vallée des Singes in France. One dataset describes GG rubbing networks and the other describes grooming networks in these populations.

library(curl)
f <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/AnzaEtAl_2021_ggr_weighted.csv")
bonobo_ggr_edgelist <- read.csv(f, header = TRUE, sep = ",")
head(bonobo_ggr_edgelist) # data about bonobos' GG rubbing interactions
##   actor receiver GGR.Weight
## 1    ln       ky          4
## 2     h        j         20
## 3     j        h          4
## 4    ky       lc         15
## 5    lc       ky         20
## 6    lc       ln         31
q <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/AnzaEtAl_2021_grooming_weighted.csv")
bonobo_groom_edgelist <- read.csv(q, header = TRUE, sep = ",")
head(bonobo_groom_edgelist) # data about bonobos' grooming interactions
##   actor receiver weight
## 1     j       li      1
## 2    li        h      3
## 3     h        j      6
## 4     j        h      3
## 5     h       li      2
## 6    ln       dn      8

What are we looking at in these data sets?

Who are the actors (also known as senders) vs the receivers?

In the first 6 rows we are looking at for each data set, who has the most weighted relationship?

What does this mean?

Setting up social network data

How we set-up and manage the social network data can impact the ease with which we navigate our social network analysis. Based on how many individuals there are, whether we know any of their attributes (age, sex, etc), and information about the edges themselves (directionality, edge-weights, etc), we can store and input data either as a matrix or as an edge list.

Adjacency matrix

A matrix used for social network analysis is called a sociomatrix or an adjacency matrix. They are square matrices. For directed networks, rows indicate the starting node, and columns indicate the receiving node. Undirected networks are symmetric matrices around the diagonal. For un-weighted networks, the cells contain a 1, for the presence of an edge and a 0 for the absence of an edge between individuals. For weighted networks, the cells can contain the edge-weights instead of 1s and 0s.

Edge list

Alternatively, we can also represent network data as a columned list, called an edge list. Each row of the list represents one edge (one relationship between individuals). In directed data, the first column names the sender of the edge and the second column names the receiver. In undirected data, it does not matter which node is listed first. For weighted data, a third column contains a numeric value for each tie.

An edge list is usually easier for a person to interpret. However, it is important to be familiar with adjacency matrices as they are the format used to conduct mathematical operations in various social network analyses.

Is our bonobo dataset structured as an edge list or adjacency matrix?

Node-Level Variables

To include variables that describe your nodes (age, rank, sex, etc), it is good practice to load in a separate file that contains these data. This data frame must include one column of the node IDs using the exact same IDs used in the adjacency matrix or edge list. All other variables should then be included in additional columns.

j <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/AnzaEtAl_2021_attributes.csv")
bonobo_attribute <- read.csv(j, header = TRUE, sep = ",")
bonobo_attribute
##   actor rank age group
## 1    ln 3.42   9     3
## 2     h 3.42  31     2
## 3     j 6.20  24     2
## 4    ky 4.66  11     3
## 5    lc 4.11   9     3
## 6    dn 5.57  44     3
## 7    li 5.23  12     2

Who is the highest ranking bonobo? Who is the lowest? Who is the oldest? Who is the youngest?

Now we can create the network objects and also add node attributes to those objects. Note that adding node attributes to objects involves the %v% operator, which resembles the pipe operator we use in the {tidyverse} (%>%). This functions similarly to the $ operator for tibbles or dataframes, and allows us to add rank variables to a dataset.

bonobo_ggr_net <- as.network.data.frame(bonobo_ggr_edgelist) # turns GG rubbing edge list into a GG rubbing network
bonobo_groom_net<-as.network.data.frame(bonobo_groom_edgelist) # turns grooming edge list into a grooming network

bonobo_ggr_net %v% "rank" <- bonobo_attribute$rank #adds rank variable to GG rubbing network as a vertex attribute
bonobo_ggr_net %v% "age" <- bonobo_attribute$age #adds age variable to GG rubbing network as a vertex attribute
bonobo_ggr_net %v% "group" <- bonobo_attribute$group #adds group variable to GG rubbing network as a vertex attribute
bonobo_groom_net %v% "rank" <- bonobo_attribute$rank #adds rank variable to grooming network as a vertex attribute
bonobo_groom_net %v% "group" <- bonobo_attribute$group #adds group variable to grooming network as a vertex attribute

Now that everything is set up, we can use summary() to look at our networks. Let’s start by taking a look at the bonobo grooming network:

summary(bonobo_ggr_net)
## Network attributes:
##   vertices = 7
##   directed = TRUE
##   hyper = FALSE
##   loops = FALSE
##   multiple = FALSE
##   bipartite = FALSE
##  total edges = 13 
##    missing edges = 0 
##    non-missing edges = 13 
##  density = 0.3095238 
## 
## Vertex attributes:
## 
##  age:
##    integer valued attribute
##    7 values
## 
##  group:
##    integer valued attribute
##    7 values
## 
##  rank:
##    numeric valued attribute
##    attribute summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.420   3.765   4.660   4.659   5.400   6.200 
##   vertex.names:
##    character valued attribute
##    7 valid vertex names
## 
## Edge attributes:
## 
##  GGR.Weight:
##    integer valued attribute
##    13values
## 
## Network adjacency matrix:
##    ln h j ky lc li dn
## ln  0 0 0  1  1  0  1
## h   0 0 1  0  0  1  0
## j   0 1 0  0  0  0  0
## ky  1 0 0  0  1  0  0
## lc  1 0 0  1  0  0  1
## li  0 1 0  0  0  0  0
## dn  0 0 0  0  1  0  0

Using what we have learned so far, how should we interpret the output of this summary?

Visualizing social network data

Seeing the above summary output is fine… but not as intuitive as seeing something like Alice Pieszecki’s Chart. For a more intuitive figure, we can visualize social networks by creating network objects with {statnet}:

par(mfrow = c(1, 2))
plot(bonobo_ggr_net, label = "vertex.names", main="GG rubbing", vertex.cex = 4) # left plot
box(col = "black")

plot(bonobo_ggr_net, vertex.col = "rank", label = "vertex.names", main="GG rubbing", vertex.cex = 4, edge.label = "GGR.Weight", edge.label.cex = 0.6) # right plot
box(col = "black")

Note the two separate clusters, and the arrows indicating directionality! Were you able to get these relationships from the summary() output?

The plot on the left looks great, but we can make it even more informative by adding edge weights, and by coloring the nodes by rank. This will enable us to have an intuitive understanding of how much individuals interact with one another based on their rankings!

We can also do the same for the grooming network.

par(mfrow = c(1, 2))
plot(bonobo_groom_net, label = "vertex.names", main="Grooming", vertex.cex = 4) 
box(col = "black")

plot(bonobo_groom_net, vertex.col = "rank", label = "vertex.names", main="Grooming", vertex.cex = 4, edge.label = "weight", edge.label.cex = 0.6)
box(col = "black")

How would you describe the grooming and GG rubbing networks? What stands out? What’s different?

Quadratic Assignment Procedure (QAP)

When analyzing social network data, we want to know to what degree our variables impact the formation of a particular network organization. Our independent variables or predictor variables are node level variables (e.g. age) or similarity between node level variables (e.g. whether pairs belong to the same age category). Our dependent variable or response variable is the organization of edges in the network.

We cannot use standard OLS regressions to analyze pair characteristics and the presence of edges between pairs because it violates the assumption that observations are independent. For example, observations A-B, A-C, and A-D are not independent because they all involve individual A. Or if one bonobo is GG rubbing with 5 other bonobos, those 5 other bonobos may also be GG rubbing with each other. Quadratic assignment procedure (QAP) addresses this issue of non-independence.

QAP is a resampling-based method that controls for non-independence in network structure via random permutations. These permutations use different arrangements of the rows and columns in the adjacency matrix. Thus, the network structure is maintained but the arrangement of individuals in the structure is randomized. This represents the null hypothesis because it should eliminate any potential correlations between ties and independent values.

What kinds of questions would you ask with our bonobo data?

There are 3 types of QAP regressions we can run with {statnet}: correlation regressions, multiple/linear regressions, and logistic regressions. We will describe and demonstrate each regression using the bonobo data. We’ll also do the same with a more complex dataset from spider monkeys!

1. Correlation

Correlation QAPs simply test whether two social networks are correlated. For example, using our bonobo datasets, we can ask:

Do bonobos who groom each other correlate with bonobos who GG rub with one another?

What is our null hypothesis and alternative hypothesis in this question?

We can start by using the gcor() function to get a correlation coefficient between the two networks.

gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054

What does this correlation coefficient signify?

These networks don’t seem very correlated but let’s test for significance anyways. The qaptest() function uses Monte Carlo simulations of likelihood quantiles to test arbitrary graph-level statistics against a QAP null hypothesis.

set.seed(123)
qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
                gcor, # the function you're using is correlation between networks (gcor) 
                g1=1, # use first graph in the list (in this case the GG rubbing network)
                g2=2, # use second graph in the list (in this case the groom network)
                reps = 1000) # number of permutations to run (1000 is actually the default)
summary(qap_cor)
## 
## QAP Test Results
## 
## Estimated p-values:
##  p(f(perm) >= f(d)): 0.328 
##  p(f(perm) <= f(d)): 0.81 
## 
## Test Diagnostics:
##  Test Value (f(d)): 0.1111054 
##  Replications: 1000 
##  Distribution Summary:
##      Min:     -0.4191705 
##      1stQ:    -0.2070601 
##      Med:     0.005050247 
##      Mean:    -0.00163123 
##      3rdQ:    0.1111054 
##      Max:     0.8534918

Under Test Diagnostics, the test value represents the correlation coefficient between grooming and GG rubbing networks. This is interpreted at the level of the dyad, pairs who groom each other are likely not correlated to pairs who GG rub with one another. We previously calculated this using the gcor() function. A positive value indicates that networks positively correlate. In other words, one network has ties where the other also does. A negative value indicates that networks negatively correlate. In other words, one network has ties whether the other does not.

Under Estimated p-values, we find statistics that tell us about how this observed correlation compares to what the correlation looks like when we randomly permute one of our networks. Note that these are both one-tailed p-values! One tests whether the correlation is higher than expected by random chance (p(f(perm) >= f(d))), and the other tests whether the correlation is lower than expected by random chance (p(f(perm) <= f(d)). We can see that our p-values are greater than 0.05, meaning we have evidence to fail to reject our null hypothesis. There is no significant correlation between grooming and GG rubbing networks.

We can also visualize the Distribution Summary using the plot() function.

plot(qap_cor)

The dotted line gives us the correlation among the observed networks. The curve shows the distribution of correlations across the permuted networks. The permutations are random, so they are expected to form a normal distribution in which most of the permutations have a correlation of 0. The correlation of our observed networks falls within the typical amount of correlation expected from random networks.

2. Linear/Multiple regression

Linear regression QAPs test the extent to which predictor variables are affecting edge weights. Recall that our predictor variables are node attributes.

For example, if our predictor variable is rank, we can ask: Does rank influence the edge weight of GG rubbing? In other words, we are asking: Is a bonobo more likely to have more weighted edges for each unit increase in rank?

Before we dive in, what do you predict? Do you think there is be a difference between actors (senders) and receivers? What does it mean to have more edge weight?

Predictor variables must be matrices in order to be used in the QAP functions. We can look at whether rank affects the weight of ties sent or received (or both, but they must be added to the model separately).

First, we must create a matrix that shows the rank of senders. Note that senders are represented by ROWS in the matrices.

nodes <- length(bonobo_attribute$actor) # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in

rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the rank of each actor is repeated over entire ROW of matrix
rank_sending
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23

Next, we must create a matrix that shows the rank of receivers. Note that receivers are represented by COLUMNS in the matrices.

rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the rank of each receiver is repeated over entire COLUMN of matrix
rank_receiving
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23

Now we can run our linear regression! First we are asking:

Does rank affect how many times bonobos GG rubbed with each others (the weight of the edges they sent)?

Our null hypothesis is that rank does not affect how many times bonobos GG rub others (both high and low).

set.seed(123)
lrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
                list(rank_sending)) # list of predictor variables as network or matrix objects
summary(lrqap)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.4534205 -0.3093578 -0.2036344  0.5465795  0.8695587 
## 
## Coefficients:
##             Estimate   Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.8507548 0.995   0.005   0.008    
## x1          -0.1161796 0.011   0.990   0.021    
## 
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227  Adjusted R-squared: 0.03883 
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)        x1
## Min      -2.660544 -1.882904
## 1stQ     -0.643181 -0.590200
## Median    0.098940 -0.007029
## Mean      0.103537 -0.027443
## 3rdQ      0.832046  0.496377
## Max       2.693891  1.874392

To interpret this we can imagine the x-axis being rank and the y-axis being edge weight. For every unit increase in rank, edge weights are going down 0.1161796 (the estimate of x1 in the above coefficients). This relationship is significant with a p-value of 0.021 using a two-tailed test. However, if we look at our adjusted r2 value, our model is only explaining ~4% of the variability. And our F-statistic p-value is too high to be significant at 0.111. Taken together, we can say that the rank of sender is not significantly explaining the edge weight of GG rubbing interactions.

Now let’s look at the receivers.

Does rank affect how many times bonobos received GG rubbing from others (the weight of the edges they receive)?

The null hypothesis is the same as above, that rank of receiver does not affect the number of times they are GG rubbed with.

set.seed(123)
lrqap2 <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
                list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(lrqap2)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.3992884 -0.3393159 -0.2434687  0.6007116  0.8021901 
## 
## Coefficients:
##             Estimate    Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.64715036 0.996   0.004   0.007    
## x1          -0.07247427 0.038   0.962   0.083    
## 
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423  Adjusted R-squared: -0.0001605 
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)         x1
## Min     -2.1166790 -1.2337177
## 1stQ    -0.5527539 -0.4807737
## Median   0.0543513 -0.0004686
## Mean     0.0602322 -0.0068813
## 3rdQ     0.6924992  0.4798283
## Max      2.0115654  1.3478074

Just like with the actors/senders, let’s imagine that rank is on the x-axis and edge weight is on the y-axis. Again, we see a negative slope, though this one is even less steep at -0.07247427. The p-value for the two-tailed test for this variable is not significant. This model is explaining almost 0% of the variability, which means this model is really not a good fit for the residuals (in other words, we can imagine our data points all scattered far away from the model line). The F-statistic p-value is also too high to be significant at 0.3249. All of this indicates that rank of receiver does not significantly explain the edge weight of GG rubbing interactions.

So we fail to reject the null hypothesis in this question too. But that begs the question, if rank is not influencing the edge weight, then is there some other attribute that is?

What if we included multiple predictor variables? We can add another network as a predictor variable! By doing so, we are conducting a multiple regression QAP. For example, we can now ask:

Do sender rank and grooming relationships predict the number of times bonobos GG rubbed in the network?

The null hypothesis is that neither rank of the actor nor grooming relationships affect the number of times bonobos initiate GG rubbing (two-tailed test).

set.seed(123)
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
                list(rank_sending, rank_receiving, bonobo_groom_net)) # list of all predictor variables as network or matrix objects

summary(mrqap)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.7038861 -0.3321997 -0.1704610  0.5354274  0.7466385 
## 
## Coefficients:
##             Estimate   Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  1.3835945 0.993   0.007   0.011    
## x1          -0.1392945 0.007   0.993   0.018    
## x2          -0.1036147 0.020   0.980   0.051    
## x3           0.1510413 0.798   0.202   0.413    
## 
## Residual standard error: 0.4541 on 38 degrees of freedom
## Multiple R-squared: 0.1269   Adjusted R-squared: 0.05801 
## F-statistic: 1.842 on 3 and 38 degrees of freedom, p-value: 0.1561 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)       x1       x2       x3
## Min       -3.12987 -2.37270 -1.70518 -2.94942
## 1stQ      -0.83095 -0.64332 -0.47930 -0.92280
## Median     0.02042 -0.03306  0.06527 -0.20974
## Mean       0.04138 -0.03371  0.03601  0.12131
## 3rdQ       0.80742  0.56561  0.57086  0.81042
## Max        2.82364  2.30921  2.00442 10.68268

Are any of the above relationships significant when taken into the model together?

Is this model a better fit than the others? How can we tell?

3. Logistic regression

Logistic regression QAPs test the extent to which independent variables are affecting the presence or absence of edges, but not the weight of those edges. If you have a network with binary ties, we can use this regression method to predict ties in the outcome network (response variable). Using our bonobo dataset, we can ask:

Does higher rank make an individual more or less likely to G-G rub with other individuals?

In this example, our GG rubbing network is the response variable while age is our predictor variable. If you imagine rank (as the predictor variable) on the x-axis, for every unit of rank increase for a bonobo, are they more likely to GG rub with more individuals? Will an older individual have more ties on the social network?

If we were using a different predictor variable, then we would need to create new matrices for senders and receivers. However, we are using the same one as before (rank) so we can skip that step this time. Now we can run our logistic regression!

What is our null hypothesis for this question?

The netlogit() function performs a logistic regression of the network variable for the response variable (bonobo_ggr_net) on the network variables in the predictor variables (age_receiving and age_sending). The resulting fits and coefficients are tested against the null hypothesis.

set.seed(123)
logqap <- netlogit(bonobo_ggr_net, # response variable is a network object
                  list(rank_receiving, rank_sending), # list of all predictor variables as network or matrix objects
                  reps = 1000) # number of draws for quantile estimation, 1000 reps is the default
summary(logqap)
## 
## Network Logit Model
## 
## Coefficients:
##             Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  4.5524763 94.8670398 0.974   0.026   0.042    
## x1          -0.4903676  0.6124013 0.022   0.978   0.046    
## x2          -0.6817886  0.5057117 0.015   0.985   0.023    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
##   10.77041 on 3 degrees of freedom, p-value 0.01303442 
## AIC: 53.45396    BIC: 58.66696 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.2040994 
##  (Dn-Dr)/Dn: 0.1849811 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##          Actual
## Predicted    0    1
##         0   25   11
##         1    4    2
## 
##  Total Fraction Correct: 0.6428571 
##  Fraction Predicted 1s Correct: 0.3333333 
##  Fraction Predicted 0s Correct: 0.6944444 
##  False Negative Rate: 0.8461538 
##  False Positive Rate: 0.137931 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)        x1        x2
## Min      -2.099992 -1.536187 -2.015489
## 1stQ     -0.697576 -0.590863 -0.614552
## Median   -0.026785 -0.008864  0.055893
## Mean     -0.002342 -0.011792  0.030776
## 3rdQ      0.704731  0.553704  0.642645
## Max       1.996422  1.661594  2.026379

As a reminder, we are testing whether rank affects the likelihood of giving and receiving GG rubbing. In the output above, x1 represents the first variable in the model (rank_receiving), and x2 represents the second variable in the model (rank_sending).

Because the hypothesis we were testing was whether there is any relationship between rank and sending and receiving ties, we will use the two-tailed p-value in the column labeled Pr(>=|b|). In this case, both are significant if our alpha value is 0.05; rank_receiving squeaks in right under it! They both have negative slopes as well.

The first column, Estimate is the log odds, which is difficult to interpret. Think of it as a tendency to form or not form ties. The log odds for rank_receiving is -0.4903676, so there is an increasing tendency with rank to not receive ties. The second column, Exp(b), gives odds ratios instead, which are easier to understand. The odds ratio gives the likelihood of a tie to form (or not form) under one condition in contrast to another condition. The odds ratio of rank_receiving is 0.6124013. So for each unit of rank increase a female bonobo is, she is 0.6124013 times as likely to receive any given tie compared to if she were lower ranked. An odds ratio less than 1 is associated with a lower odds of outcome (receiving GG rubbing in this case). More generally, they are less likely to receive GG rubbing as they rise in the ranks. The same can be said for sending ties as they get higher ranked. This slope is even steeper at -0.6817886 and an even lower odds ratio of 0.5057117. With each unit increase in rank, the less likely a bonobo is to initiate gg rubbing. The Chi-squared goodness-of-fit test p-value is significant.

A follow up question given these results could be: are higher-ranked bonobos less likely to participate in GG rubbing as lower-ranked bonobos? We could answer this with a linear regression QAP because we would be looking at the weight of edges in this case. The multiple regression we did before, which also took grooming into account, looked at this and may help address this question.

4. Homophily

You may have noticed when we visualized the bonobo data that the individuals sorted into two groups that did not have any observed interactions. The original dataset actually used bonobos living in two separate zoos, meaning their social networks could not overlap. Obviously, in this case, belonging to the same group is an important predictor for whether a dyad will have an interaction. This may be obscuring other trends when we do not account for group belonging in our models.

We can account for the bonobos’ group belonging including a variable that accounts for homophily, the tendency of actors to form edges with others who are similar to them in some way. To do this we need to create a group homophily variable as a binary adjacency matrix. In this matrix, 1 or TRUE will indicate that a sender and receiver share the same group value. 0 or FALSE will indicate that they do not share the same group value.

The function outer() will take two vectors (or two of the same vector in this case) and create a new matrix based on a specified function applied to every combination of terms in the vector. The == operator tells outer() to create a matrix that reports TRUE/FALSE for every combination of terms.

group <- bonobo_ggr_net %v% "group" # first create a vector giving the group of each node
group_homophily <- outer(group, group, FUN = "==") # shows us whether each combination of terms in vector group is the same (TRUE) or different (FALSE)
group_homophily 
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
## [1,]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE
## [2,] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
## [3,] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
## [4,]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE
## [5,]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE
## [6,]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE
## [7,] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE

For example, bonobos 1 and 2 belong to different groups, so cell [1,2] contains FALSE. We can now use this matrix as a predictor variable of group homophily in our logistic model from earlier.

set.seed(123)
logqap2 <- netlogit(bonobo_ggr_net,
                  list(rank_receiving, rank_sending, group_homophily),
                  reps = 1000)
summary(logqap2)
## 
## Network Logit Model
## 
## Coefficients:
##             Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  4.2407432 69.4594517 0.966   0.034   0.065    
## x1          -0.5134417  0.5984324 0.035   0.965   0.063    
## x2          -0.7186133  0.4874277 0.009   0.991   0.017    
## x3           1.1806238  3.2564048 0.868   0.132   0.218    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 44.77452 on 38 degrees of freedom
## Chi-Squared test of fit improvement:
##   13.44985 on 4 degrees of freedom, p-value 0.00927464 
## AIC: 52.77452    BIC: 59.72519 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.2425588 
##  (Dn-Dr)/Dn: 0.2310003 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##          Actual
## Predicted    0    1
##         0   25    8
##         1    4    5
## 
##  Total Fraction Correct: 0.7142857 
##  Fraction Predicted 1s Correct: 0.5555556 
##  Fraction Predicted 0s Correct: 0.7575758 
##  False Negative Rate: 0.6153846 
##  False Positive Rate: 0.137931 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)        x1        x2        x3
## Min      -1.891645 -1.657796 -2.010851 -2.303175
## 1stQ     -0.654240 -0.541867 -0.606004 -0.987334
## Median   -0.021953 -0.003103  0.044755 -0.384264
## Mean      0.007076 -0.012917  0.030103 -0.111246
## 3rdQ      0.681959  0.511718  0.666226  0.741716
## Max       2.013238  1.635185  1.981424  3.058840

At a glance, including group homophily does not improve this model. There is not a significant relationship between group homophily and the presence/absence of a given tie. The AIC value is only slightly lower for the model including group homphily than the one without group homophily.

Overall, the results from our bonobo data are a little anti-climactic (not much we looked in to seemed important). But sometimes, that’s the nature of small-sample primatology work. To make things a little more exciting, let’s run the three QAPs again with some spider monkey data!

Spider Monkey Social Networks

The spider monkeys in this data set are Critically Endangered brown spider monkeys (Ateles hybridus) from Hacienda San Juan de Carare in Colombia. They are a group living platyrrhine primate with fission-fusion dynamics, similar to bonobos. The original intent of this data set and what it was published for was to test how parasites moved between individuals depending on different levels of contact. That leaves us with a rich dataset of networks and attributes that we can work with!

First, we must load our data.

spmonkey_dat <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/spidermonkey_beh_edgelist.csv")
spmonkey_el <- read.csv(spmonkey_dat)
head(spmonkey_el)
##   Focal        Beh Rec Mins
## 1    Ba In contact  Bo  262
## 2    Ba In contact  Db    1
## 3    Ba In contact  Dl  134
## 4    Ba In contact  Ku    6
## 5    Ba In contact  Nw   10
## 6    Ba In contact  Pe   37
unique(spmonkey_el$Beh) #subsetting the column for proximity types
## [1] "In contact" "mating"     "play"       "play "      "groom"

We see here that spider monkeys are engaging in four unique social behaviors in this dataset: in contact, mating, playing, and grooming. This means that we can construct four separate social networks, on for each behavior. Additionally, the packages that we use can only read edge lists that are in the form of a 2-column format. So, let’s clean this data up a bit more.

First, we are starting with the mating network.

mating_spmonkey <- spmonkey_el[spmonkey_el$Beh == "mating",] #making a separate data frame for mating
mating_spmonkey <- mating_spmonkey[,-c(2,4)] #removing unnecessary column data, we'll add back this "Mins" data as edge attributes later 
sp_mating_net <- as.network(mating_spmonkey, matrix.type = "edgelist", directed = TRUE)
plot(sp_mating_net, label = "vertex.names", main="Spider Monkey Mating Network")

Next, let’s add network and edge attributes to the network.

sp_attr <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/spidermonkey_attributes.csv")
sp_attr <- read.csv(sp_attr)
head(sp_attr)
##   Number ID    sex   age dead.alive Repr_status
## 1      1 Pe female adult       dead   preg/lact
## 2      2 Ku female adult       dead   preg/lact
## 3      3 Dl female adult       dead   preg/lact
## 4      4 Ba female adult       dead   preg/lact
## 5      5 Vi female adult       dead   preg/lact
## 6      6 Nw   male adult      alive repr_active
sp_attr$age[sp_attr$age == "infant"] <- 1 #assigning a numeric value to each life history stage
sp_attr$age[sp_attr$age == "juvenile"] <- 2
sp_attr$age[sp_attr$age == "adult"] <- 3
sp_attr$age <- as.numeric(sp_attr$age)
class(sp_attr$age) #confirming the age attribute is coming up as a numeric class
## [1] "numeric"
sp_attr$sex[sp_attr$sex == "female"] <- 0 #assigning a numeric value to each sex
sp_attr$sex[sp_attr$sex == "male"] <- 1
sp_attr$sex <- as.numeric(sp_attr$sex)
class(sp_attr$sex) #confirming the sex attribute is coming up as a numeric class
## [1] "numeric"
#next we need to assign the node attributes to the mating network
sp_mating_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_mating_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_mating_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute

We can also add the “Mins” from the initial spider monkey edge list as edge weights. The %e% operator is like the %v% operator. It subsets a network object to the edge attribute so we can store time (in minutes spent mating) there.

sp_mating_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "mating",]$Mins 
summary(sp_mating_net) #can summarize the network like we did for the bonobo GG rubbing network
## Network attributes:
##   vertices = 7
##   directed = TRUE
##   hyper = FALSE
##   loops = FALSE
##   multiple = FALSE
##   bipartite = FALSE
##  total edges = 11 
##    missing edges = 0 
##    non-missing edges = 11 
##  density = 0.2619048 
## 
## Vertex attributes:
## 
##  age:
##    numeric valued attribute
##    attribute summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
##  repr:
##    character valued attribute
##    attribute summary:
##   preg/lact repr_active 
##           5           2 
## 
##  sex:
##    numeric valued attribute
##    attribute summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.2857  0.5000  1.0000 
##   vertex.names:
##    character valued attribute
##    7 valid vertex names
## 
## Edge attributes:
## 
##  time:
##    integer valued attribute
##    11values
## 
## Network adjacency matrix:
##    Nw Ba Pe Wa Dl Pk Vt
## Nw  0  1  1  0  1  0  0
## Ba  1  0  0  1  0  1  1
## Pe  1  0  0  1  0  1  0
## Wa  0  0  0  0  1  0  0
## Dl  0  0  0  0  0  0  0
## Pk  0  0  0  0  0  0  0
## Vt  0  0  0  0  0  0  0

This process can be repeated for grooming, play and contact networks.

play_spmonkey <- spmonkey_el[spmonkey_el$Beh == "play",][,-c(2,4)] #making a separate data frame for play behavior
sp_play_net_unweighted <- as.network(play_spmonkey, matrix.type = "edgelist", directed = TRUE, weighted=FALSE)
plot(sp_play_net_unweighted, label = "vertex.names", main="Spider Monkey Play Network")

# Adding network and edge attributes to network
sp_play_net_unweighted %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_play_net_unweighted %v% "age" <- sp_attr$age #adds age as node attribute
sp_play_net_unweighted %v% "repr" <- sp_attr$Repr_status #adds reproductive status as node attribute
groom_spmonkey <- spmonkey_el[spmonkey_el$Beh == "groom",][,-c(2,4)] #making a separate data frame for grooming behavior
sp_groom_net <- as.network(groom_spmonkey, matrix.type = "edgelist", directed = TRUE, loops = TRUE)
plot(sp_groom_net, label = "vertex.names", main="Spider Monkey Grooming Network")

# Adding network and edge attributes to the network
sp_groom_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_groom_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_groom_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute

# Let's add the "Mins" from the initial spmonkey edgelist as edge weights
sp_groom_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "groom",]$Mins 
contact_spmonkey <- spmonkey_el[spmonkey_el$Beh == "In contact",][,-c(2,4)] #making a separate data frame for contact
sp_contact_net <- as.network(contact_spmonkey, matrix.type = "edgelist", directed = TRUE, loops = TRUE)
plot(sp_contact_net, label = "vertex.names", main="Spider Monkey Contact Network")

# Adding network and edge attributes to the network
sp_contact_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_contact_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_contact_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute

# Let's add the "Mins" from the initial spmonkey edgelist as edge weights
sp_contact_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "In contact",]$Mins 
plot(sp_contact_net, label = "vertex.names", main="Spider Monkey Contact Network")

How was this process different from the bonobo data loading, cleaning, and visualizing process?

How do these plots compare to the bonobo data?

Do you think we could make any predictions based on just looking at these plots?

Can we manipulate the plots so that the node colors are different based on relevant variables like sex?

Can we benefit from showing edge weights in the spider monkey plots as well?

Now that we have loaded, cleaned, and visualized our spider monkey dataset, we can analyze it via QAP regressions!

QAPs for Spider Monkey Data

What are some questions you would ask about our spider monkey social networks and which QAP would you use?

Correlation

We need to use two networks with the same number of nodes so we chose to ask a question about grooming and contact:

Does grooming relationships correlate with contact relationships in spider monkeys?

What is our null hypothesis? Alternative hypothesis?

set.seed(123)

# get the correlation value
gcor(sp_groom_net, sp_contact_net)
## [1] 0.362977
# is it significant?
qap_cor <- qaptest(list(sp_groom_net, sp_contact_net), # include both network objects in a list
                gcor, # the function you're using is correlation between networks (gcor) 
                g1=1, # use first graph in the list
                g2=2, # use second graph in the list
                reps = 1000) # number of permutations to run
summary(qap_cor)
## 
## QAP Test Results
## 
## Estimated p-values:
##  p(f(perm) >= f(d)): 0.002 
##  p(f(perm) <= f(d)): 0.999 
## 
## Test Diagnostics:
##  Test Value (f(d)): 0.362977 
##  Replications: 1000 
##  Distribution Summary:
##      Min:     -0.3811259 
##      1stQ:    -0.09074425 
##      Med:     0 
##      Mean:    -3.62977e-05 
##      3rdQ:    0.0725954 
##      Max:     0.3811259
plot(qap_cor)

How do you interpret this? Are these highly correlated networks? Which test (upper or lower tailed) is significant?

It looks like the correlation between the grooming and contact networks is significant. Test Value (f(d)) is 0.362977, which is significant for p(f(perm) >= f(d)). This means, that the dyads that groomed each other were also more likely to be in contact with each other.

Linear regression

Recall that in linear regression for social network analysis involves identifying an independent variable (node attributes) to test its effect on the edge weight of the response variable (the social network). For the spider monkeys, we have multiple options for both node attributes and social networks. We chose to look at age and time spent grooming.

Is age category predictive of time spent in grooming behavior within dyads of spider monkeys?

But before we can get into the analysis, we must first make the matrices for senders and receivers of grooming like we did for the bonobos. As a reminder, senders are represented by ROWS in the matrices.

nodes <- length(sp_attr$ID) # the number of nodes in the data set
age <- sp_groom_net %v% "age" # a vector of the node-level variable we are interested in

age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create an empty matrix to be filled

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
head(age_sending)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    3    3    3    3    3    3    3    3    3     3     3     3     3     3
## [2,]    3    3    3    3    3    3    3    3    3     3     3     3     3     3
## [3,]    3    3    3    3    3    3    3    3    3     3     3     3     3     3
## [4,]    3    3    3    3    3    3    3    3    3     3     3     3     3     3
## [5,]    3    3    3    3    3    3    3    3    3     3     3     3     3     3
## [6,]    3    3    3    3    3    3    3    3    3     3     3     3     3     3
##      [,15] [,16] [,17]
## [1,]     3     3     3
## [2,]     3     3     3
## [3,]     3     3     3
## [4,]     3     3     3
## [5,]     3     3     3
## [6,]     3     3     3

Receivers are represented by COLUMNS in the matrices.

age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_receiving[,i] <- rep(age[i], nodes)} # The age of each actor is repeated over entire COLUMN of matrix
head(age_receiving)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    3    3    3    3    3    3    3    3    3     3     3     2     2     2
## [2,]    3    3    3    3    3    3    3    3    3     3     3     2     2     2
## [3,]    3    3    3    3    3    3    3    3    3     3     3     2     2     2
## [4,]    3    3    3    3    3    3    3    3    3     3     3     2     2     2
## [5,]    3    3    3    3    3    3    3    3    3     3     3     2     2     2
## [6,]    3    3    3    3    3    3    3    3    3     3     3     2     2     2
##      [,15] [,16] [,17]
## [1,]     1     2     1
## [2,]     1     2     1
## [3,]     1     2     1
## [4,]     1     2     1
## [5,]     1     2     1
## [6,]     1     2     1

What is the null hypothesis? What is the alternative hypothesis?

set.seed(123)
lrqap.sp <- netlm(sp_groom_net, # response variable is a network object with a weighted adjacency matrix
                list(age_sending)) # list of all predictor variables as network objects or matrix objects

summary(lrqap.sp)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.3642857 -0.3642857 -0.1214286  0.2500000  0.6357143 
## 
## Coefficients:
##             Estimate   Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -0.3642857 0.049   0.951   0.113    
## x1           0.2428571 1.000   0.000   0.008    
## 
## Residual standard error: 0.4001 on 270 degrees of freedom
## Multiple R-squared: 0.1524   Adjusted R-squared: 0.1492 
## F-statistic: 48.54 on 1 and 270 degrees of freedom, p-value: 2.472e-11 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)      x1
## Min        -6.8264 -8.8876
## 1stQ       -2.2247 -1.4083
## Median     -0.5181  0.4010
## Mean       -0.2970  0.1511
## 3rdQ        1.4219  2.0195
## Max         9.3958  6.2148

Does age have a positive or negative relationship with amount of time spent grooming?

For each unit (in this case life history stage) of age increase, how much more or less likely is a spider monkey to groom?

Is the relationship between age and grooming significant?

How much variability in the data does this model explain?

Is it a good fit model? Is this is a significant effect? How can you tell?

The b1 coefficient is 0.24. For every 1 unit increase in age category, the individuals are likely to have 0.24 times greater edge weights for each grooming tie they send.

Logistic regression

Recall that for logistic regression we are testing the presence or absence of ties based on the predictor variable (node attribute). Here we chose to look at the effect of sex of individuals on the presence or absence of ties in the play network.

Does sex predict play behavior (number of ties in play social network)?

In other words, do males or females play more?

Again, we make the matrices for the variables for both sending (rows) and receiving (columns) individuals.

nodes <- 16 # the number of nodes in the data set
sex <- sp_play_net_unweighted %v% "sex" # a vector of the node-level variable we are interested in

sex_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create an empty matrix to be filled

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
sex_sending[i,] <- rep(sex[i], nodes)} # sex of each actor repeated over entire ROW of matrix
head(sex_sending)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
## [2,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
## [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
## [4,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
## [5,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
## [6,]    1    1    1    1    1    1    1    1    1     1     1     1     1     1
##      [,15] [,16]
## [1,]     0     0
## [2,]     0     0
## [3,]     0     0
## [4,]     0     0
## [5,]     0     0
## [6,]     1     1
sex_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
sex_receiving[,i] <- rep(sex[i], nodes)} # sex of each actor repeated over entire COLUMN of matrix
head(sex_receiving)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    0    0    0    0    0    1    1    1    1     1     1     0     0     0
## [2,]    0    0    0    0    0    1    1    1    1     1     1     0     0     0
## [3,]    0    0    0    0    0    1    1    1    1     1     1     0     0     0
## [4,]    0    0    0    0    0    1    1    1    1     1     1     0     0     0
## [5,]    0    0    0    0    0    1    1    1    1     1     1     0     0     0
## [6,]    0    0    0    0    0    1    1    1    1     1     1     0     0     0
##      [,15] [,16]
## [1,]     0     0
## [2,]     0     0
## [3,]     0     0
## [4,]     0     0
## [5,]     0     0
## [6,]     0     0

What is the null hypothesis? What is the alternative hypothesis?

set.seed(123)
lrqap <- netlogit(sp_play_net_unweighted, # response variable is a network object with an unweighted adjacency matrix
                  list(sex_receiving, sex_sending), # list of predictor variables as network or matrix objects
                  reps = 1000) 
summary(lrqap)
## 
## Network Logit Model
## 
## Coefficients:
##             Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -2.4875920 0.08310985 0.000   1.000   0.000    
## x1           0.4783349 1.61338572 0.899   0.101   0.229    
## x2           0.1100466 1.11633005 0.588   0.412   0.876    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 332.7106 on 240 degrees of freedom
## Residual deviance: 150.4112 on 237 degrees of freedom
## Chi-Squared test of fit improvement:
##   182.2994 on 3 degrees of freedom, p-value 0 
## AIC: 156.4112    BIC: 166.8531 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.4316829 
##  (Dn-Dr)/Dn: 0.5479219 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##       0     1
## 0   217    23
## 1     0     0
## 
##  Total Fraction Correct: 0.9041667 
##  Fraction Predicted 1s Correct: NaN 
##  Fraction Predicted 0s Correct: 0.9041667 
##  False Negative Rate: 1 
##  False Positive Rate: 0 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)        x1        x2
## Min      -6.786638 -2.076333 -2.690130
## 1stQ     -5.254470 -0.671335 -1.112456
## Median   -4.616653  0.030267 -0.212350
## Mean     -4.421015 -0.015145  0.001826
## 3rdQ     -3.737862  0.603964  1.054874
## Max       0.032959  2.689822  3.798673

Do either sex receiving or sex sending have a significant relationship with amount of edges in the play network?

Does this model significantly explain the effect?