What is a social network ?
A social network describes all the individuals in a group
AND all the links between them. The photo above shows a social network
from the television show The L Word.
In this particular network, the nodes, shown as names,
represent people in the fictional LA social scene of the series; while
the edges, or lines between names, represent romantic and
sexual connections to each other. As you can see, the author of this
particular social network analysis - Alice Pieszecki - has annotated her
network to indicate particularly important (or promiscuous) nodes (i.e.,
Shane, Lauren, and Alice, herself) that have many connections across the
network, and colored some nodes and edges differently to indicate
perhaps other social qualities within the network.
Of course, there are more formal ways to understand the importance of
particular nodes or the quality of relationships based on the structure
of edges in a social network, and even to find out if there are other
variables outside of the nodes and edges themselves that might influence
a social network (in the case of The L Word, for example, class
or sexual orientation might be important variables).
We can find these using social network analysis.
What can social network analyses tell us?
Social network analysis includes the theory, methods, and
practical/analytical applications for finding meaningful patterns within
the structure of social network datasets.
In this vignette, you will learn (1) how to prepare
social network data for analysis and (2) how to perform
quadratic assignment procedures to assess correlations between social
networks (response variables) and factors that may influence the
structure of those networks (predictor variables).
A network analysis (as opposed to a
non-network analysis) may be used to reveal information
about the group as a whole, as opposed to the individual. Looking back
at our social network from The L Word, non-network analysis
might only look at Shane’s romantic connections
independently, to understand why Shane hooks up with
whom and how many hook-ups she’s had. This can tell us about Shane’s
choices and appeal, and potentially about who seeks out Shane as a
partner, but doesn’t tell us much about the LA power-lesbian scene as a
whole. A social network analysis, however, can tell us how Shane’s
romantic connections are structured in relation to Alice’s and
Lauren’s (in other words, we can understand them as being not
independent of the larger social network). It could also reveal
group-level characteristics that don’t exist or are difficult to
ascertain just by looking at individuals. When we want to look at the
patterns of relationships between individuals, an edge is just as
important (and as much a network-object) as a node.
Terms to know
Nodes: the individuals in the social
network
Edges: the relationships between individuals in
a network
Directionality: some social networks will have a
relationship that goes in one direction (e.g., Helena gives Alice
money). Other relationships won’t have directionality (e.g., Bette and
Tina get married). Directionality is relevant for understanding
transactional or rank-based relationships.
Edge weight: the strength of the relationship
between individuals, which may differ between individuals within social
networks (e.g., if Shane has a lot of one-night-stands, but hooks up
with Carmen multiple times, we might be interested in how many times as
this might be an indicator of the quality of their relationship. This
value is the edge’s weight)
By the end of this module, you should be able to answer the following
questions about social networks in R:
How are individuals connected in a group?
How can we test the extent to which specific attributes
influence whether individuals in a group are connected?
How do networks change over time?
To accomplish this, we will (sadly) leave The L Word to use
real social network datasets from nonhuman primates!
We will start with a dataset from bonobos (Pan paniscus).
Bonobos are one of our closest living relatives and live in
multi-male/multi-female social groups. Female bonobos are highly
cohesive and have strong affiliative bonds. These bonds are maintained
through high levels of sociosexual behavior, the most common being
genito-genital (GG) rubbing. Similar to most other primates, bonobos
will also frequently groom each other, and grooming behaviors can be a
key indicator of rank and the quality of relationships within a social
group.
Understanding the structure of social network data
Loading data
Let’s start by loading our data from bonobos. There are two datasets
from zoo-based bonobo populations at Apenheul Primate Park in the
Netherlands, and La Vallée des Singes in France. One dataset describes
GG rubbing networks and the other describes grooming networks in these
populations.
library(curl)
f <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/AnzaEtAl_2021_ggr_weighted.csv")
bonobo_ggr_edgelist <- read.csv(f, header = TRUE, sep = ",")
head(bonobo_ggr_edgelist) # data about bonobos' GG rubbing interactions
## actor receiver GGR.Weight
## 1 ln ky 4
## 2 h j 20
## 3 j h 4
## 4 ky lc 15
## 5 lc ky 20
## 6 lc ln 31
q <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/AnzaEtAl_2021_grooming_weighted.csv")
bonobo_groom_edgelist <- read.csv(q, header = TRUE, sep = ",")
head(bonobo_groom_edgelist) # data about bonobos' grooming interactions
## actor receiver weight
## 1 j li 1
## 2 li h 3
## 3 h j 6
## 4 j h 3
## 5 h li 2
## 6 ln dn 8
What are we looking at in these data sets?
Who are the actors (also known as senders) vs the
receivers?
In the first 6 rows we are looking at for each data set, who
has the most weighted relationship?
What does this mean?
Setting up social network data
How we set-up and manage the social network data can impact the ease
with which we navigate our social network analysis. Based on how many
individuals there are, whether we know any of their attributes (age,
sex, etc), and information about the edges themselves (directionality,
edge-weights, etc), we can store and input data either as a
matrix or as an edge list.
Adjacency matrix
A matrix used for social network analysis is called a
sociomatrix or an adjacency matrix.
They are square matrices. For directed networks, rows indicate the
starting node, and columns indicate the receiving node. Undirected
networks are symmetric matrices around the diagonal. For un-weighted
networks, the cells contain a 1, for the presence of an edge and a 0 for
the absence of an edge between individuals. For weighted networks, the
cells can contain the edge-weights instead of 1s and 0s.
Edge list
Alternatively, we can also represent network data as a columned list,
called an edge list. Each row of the list represents
one edge (one relationship between individuals). In directed data, the
first column names the sender of the edge and the second column names
the receiver. In undirected data, it does not matter which node is
listed first. For weighted data, a third column contains a numeric value
for each tie.
An edge list is usually easier for a person to interpret. However, it
is important to be familiar with adjacency matrices as they are the
format used to conduct mathematical operations in various social network
analyses.
Is our bonobo dataset structured as an edge list or adjacency
matrix?
Node-Level Variables
To include variables that describe your nodes (age, rank, sex, etc),
it is good practice to load in a separate file that contains these data.
This data frame must include one column of the node IDs using the exact
same IDs used in the adjacency matrix or edge list. All other variables
should then be included in additional columns.
j <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/AnzaEtAl_2021_attributes.csv")
bonobo_attribute <- read.csv(j, header = TRUE, sep = ",")
bonobo_attribute
## actor rank age group
## 1 ln 3.42 9 3
## 2 h 3.42 31 2
## 3 j 6.20 24 2
## 4 ky 4.66 11 3
## 5 lc 4.11 9 3
## 6 dn 5.57 44 3
## 7 li 5.23 12 2
Who is the highest ranking bonobo? Who is the lowest? Who is
the oldest? Who is the youngest?
Now we can create the network objects and also add node attributes to
those objects. Note that adding node attributes to objects involves the
%v%
operator, which resembles the pipe operator we use in
the {tidyverse} (%>%
). This functions similarly to the
$
operator for tibbles or dataframes, and allows us to add
rank variables to a dataset.
bonobo_ggr_net <- as.network.data.frame(bonobo_ggr_edgelist) # turns GG rubbing edge list into a GG rubbing network
bonobo_groom_net<-as.network.data.frame(bonobo_groom_edgelist) # turns grooming edge list into a grooming network
bonobo_ggr_net %v% "rank" <- bonobo_attribute$rank #adds rank variable to GG rubbing network as a vertex attribute
bonobo_ggr_net %v% "age" <- bonobo_attribute$age #adds age variable to GG rubbing network as a vertex attribute
bonobo_ggr_net %v% "group" <- bonobo_attribute$group #adds group variable to GG rubbing network as a vertex attribute
bonobo_groom_net %v% "rank" <- bonobo_attribute$rank #adds rank variable to grooming network as a vertex attribute
bonobo_groom_net %v% "group" <- bonobo_attribute$group #adds group variable to grooming network as a vertex attribute
Now that everything is set up, we can use summary()
to
look at our networks. Let’s start by taking a look at the bonobo
grooming network:
summary(bonobo_ggr_net)
## Network attributes:
## vertices = 7
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 13
## missing edges = 0
## non-missing edges = 13
## density = 0.3095238
##
## Vertex attributes:
##
## age:
## integer valued attribute
## 7 values
##
## group:
## integer valued attribute
## 7 values
##
## rank:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.420 3.765 4.660 4.659 5.400 6.200
## vertex.names:
## character valued attribute
## 7 valid vertex names
##
## Edge attributes:
##
## GGR.Weight:
## integer valued attribute
## 13values
##
## Network adjacency matrix:
## ln h j ky lc li dn
## ln 0 0 0 1 1 0 1
## h 0 0 1 0 0 1 0
## j 0 1 0 0 0 0 0
## ky 1 0 0 0 1 0 0
## lc 1 0 0 1 0 0 1
## li 0 1 0 0 0 0 0
## dn 0 0 0 0 1 0 0
Using what we have learned so far, how should we interpret
the output of this summary?
Visualizing social network data
Seeing the above summary output is fine… but not as intuitive as
seeing something like Alice Pieszecki’s Chart. For a more intuitive
figure, we can visualize social networks by creating network objects
with {statnet}:
par(mfrow = c(1, 2))
plot(bonobo_ggr_net, label = "vertex.names", main="GG rubbing", vertex.cex = 4) # left plot
box(col = "black")
plot(bonobo_ggr_net, vertex.col = "rank", label = "vertex.names", main="GG rubbing", vertex.cex = 4, edge.label = "GGR.Weight", edge.label.cex = 0.6) # right plot
box(col = "black")
Note the two separate clusters, and the arrows indicating
directionality! Were you able to get these relationships from the
summary()
output?
The plot on the left looks great, but we can make it even more
informative by adding edge weights, and by coloring the nodes by rank.
This will enable us to have an intuitive understanding of how much
individuals interact with one another based on their rankings!
We can also do the same for the grooming network.
par(mfrow = c(1, 2))
plot(bonobo_groom_net, label = "vertex.names", main="Grooming", vertex.cex = 4)
box(col = "black")
plot(bonobo_groom_net, vertex.col = "rank", label = "vertex.names", main="Grooming", vertex.cex = 4, edge.label = "weight", edge.label.cex = 0.6)
box(col = "black")
How would you describe the grooming and GG rubbing networks?
What stands out? What’s different?
Quadratic Assignment Procedure (QAP)
When analyzing social network data, we want to know to what degree
our variables impact the formation of a particular network organization.
Our independent variables or predictor variables are
node level variables (e.g. age) or similarity between node level
variables (e.g. whether pairs belong to the same age category). Our
dependent variable or response variable is the
organization of edges in the network.
We cannot use standard OLS regressions to analyze pair
characteristics and the presence of edges between pairs because it
violates the assumption that observations are independent. For example,
observations A-B, A-C, and A-D are not independent because they all
involve individual A. Or if one bonobo is GG rubbing with 5 other
bonobos, those 5 other bonobos may also be GG rubbing with each other.
Quadratic assignment procedure (QAP) addresses this issue of
non-independence.
QAP is a resampling-based method that controls for non-independence
in network structure via random permutations. These permutations use
different arrangements of the rows and columns in the adjacency matrix.
Thus, the network structure is maintained but the arrangement of
individuals in the structure is randomized. This represents the null
hypothesis because it should eliminate any potential correlations
between ties and independent values.
What kinds of questions would you ask with our bonobo
data?
There are 3 types of QAP regressions we can run with {statnet}:
correlation regressions, multiple/linear regressions, and logistic
regressions. We will describe and demonstrate each regression using the
bonobo data. We’ll also do the same with a more complex dataset from
spider monkeys!
1. Correlation
Correlation QAPs simply test whether two social networks are
correlated. For example, using our bonobo datasets, we can ask:
Do bonobos who groom each other correlate with bonobos who GG
rub with one another?
What is our null hypothesis and alternative hypothesis in
this question?
We can start by using the gcor()
function to get a
correlation coefficient between the two networks.
gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054
What does this correlation coefficient signify?
These networks don’t seem very correlated but let’s test for
significance anyways. The qaptest()
function uses Monte
Carlo simulations of likelihood quantiles to test arbitrary graph-level
statistics against a QAP null hypothesis.
set.seed(123)
qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
gcor, # the function you're using is correlation between networks (gcor)
g1=1, # use first graph in the list (in this case the GG rubbing network)
g2=2, # use second graph in the list (in this case the groom network)
reps = 1000) # number of permutations to run (1000 is actually the default)
summary(qap_cor)
##
## QAP Test Results
##
## Estimated p-values:
## p(f(perm) >= f(d)): 0.328
## p(f(perm) <= f(d)): 0.81
##
## Test Diagnostics:
## Test Value (f(d)): 0.1111054
## Replications: 1000
## Distribution Summary:
## Min: -0.4191705
## 1stQ: -0.2070601
## Med: 0.005050247
## Mean: -0.00163123
## 3rdQ: 0.1111054
## Max: 0.8534918
Under Test Diagnostics
, the test value represents the
correlation coefficient between grooming and GG rubbing networks. This
is interpreted at the level of the dyad, pairs who groom each other are
likely not correlated to pairs who GG rub with one another. We
previously calculated this using the gcor()
function. A
positive value indicates that networks positively correlate. In other
words, one network has ties where the other also does. A negative value
indicates that networks negatively correlate. In other words, one
network has ties whether the other does not.
Under Estimated p-values
, we find statistics that tell
us about how this observed correlation compares to what the correlation
looks like when we randomly permute one of our networks. Note that these
are both one-tailed p-values! One tests whether the correlation is
higher than expected by random chance
(p(f(perm) >= f(d)))
, and the other tests whether the
correlation is lower than expected by random chance
(p(f(perm) <= f(d))
. We can see that our p-values are
greater than 0.05, meaning we have evidence to fail to reject our null
hypothesis. There is no significant correlation between grooming and GG
rubbing networks.
We can also visualize the Distribution Summary
using the
plot()
function.
plot(qap_cor)
The dotted line gives us the correlation among the observed networks.
The curve shows the distribution of correlations across the permuted
networks. The permutations are random, so they are expected to form a
normal distribution in which most of the permutations have a correlation
of 0. The correlation of our observed networks falls within the typical
amount of correlation expected from random networks.
2. Linear/Multiple regression
Linear regression QAPs test the extent to which predictor variables
are affecting edge weights. Recall that our predictor variables are node
attributes.
For example, if our predictor variable is rank, we can ask:
Does rank influence the edge weight of GG rubbing? In
other words, we are asking: Is a bonobo more likely to have more
weighted edges for each unit increase in rank?
Before we dive in, what do you predict? Do you think there is
be a difference between actors (senders) and receivers? What does it
mean to have more edge weight?
Predictor variables must be matrices in order to be used in the QAP
functions. We can look at whether rank affects the weight of ties sent
or received (or both, but they must be added to the model
separately).
First, we must create a matrix that shows the rank of senders. Note
that senders are represented by ROWS in the matrices.
nodes <- length(bonobo_attribute$actor) # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in
rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the rank of each actor is repeated over entire ROW of matrix
rank_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
Next, we must create a matrix that shows the rank of receivers. Note
that receivers are represented by COLUMNS in the matrices.
rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the rank of each receiver is repeated over entire COLUMN of matrix
rank_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our linear regression! First we are asking:
Does rank affect how many times bonobos GG rubbed with each
others (the weight of the edges they sent)?
Our null hypothesis is that rank does not affect how many times
bonobos GG rub others (both high and low).
set.seed(123)
lrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending)) # list of predictor variables as network or matrix objects
summary(lrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4534205 -0.3093578 -0.2036344 0.5465795 0.8695587
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.8507548 0.995 0.005 0.008
## x1 -0.1161796 0.011 0.990 0.021
##
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227 Adjusted R-squared: 0.03883
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.660544 -1.882904
## 1stQ -0.643181 -0.590200
## Median 0.098940 -0.007029
## Mean 0.103537 -0.027443
## 3rdQ 0.832046 0.496377
## Max 2.693891 1.874392
To interpret this we can imagine the x-axis being rank and the y-axis
being edge weight. For every unit increase in rank, edge weights are
going down 0.1161796 (the estimate of x1
in the above coefficients). This relationship is significant with a
p-value of 0.021 using a two-tailed test. However, if we look at our
adjusted r2 value, our model is only explaining ~4% of the
variability. And our F-statistic p-value is too high to be significant
at 0.111. Taken together, we can say that the rank of sender is not
significantly explaining the edge weight of GG rubbing interactions.
Now let’s look at the receivers.
Does rank affect how many times bonobos received GG rubbing
from others (the weight of the edges they receive)?
The null hypothesis is the same as above, that rank of receiver does
not affect the number of times they are GG rubbed with.
set.seed(123)
lrqap2 <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(lrqap2)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.3992884 -0.3393159 -0.2434687 0.6007116 0.8021901
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.64715036 0.996 0.004 0.007
## x1 -0.07247427 0.038 0.962 0.083
##
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423 Adjusted R-squared: -0.0001605
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.1166790 -1.2337177
## 1stQ -0.5527539 -0.4807737
## Median 0.0543513 -0.0004686
## Mean 0.0602322 -0.0068813
## 3rdQ 0.6924992 0.4798283
## Max 2.0115654 1.3478074
Just like with the actors/senders, let’s imagine that rank is on the
x-axis and edge weight is on the y-axis. Again, we see a negative slope,
though this one is even less steep at -0.07247427. The p-value for the
two-tailed test for this variable is not significant. This model is
explaining almost 0% of the variability, which means this model is
really not a good fit for the residuals (in other words, we can imagine
our data points all scattered far away from the model line). The
F-statistic p-value is also too high to be significant at 0.3249. All of
this indicates that rank of receiver does not significantly explain the
edge weight of GG rubbing interactions.
So we fail to reject the null hypothesis in this question too. But
that begs the question, if rank is not influencing the edge
weight, then is there some other attribute that is?
What if we included multiple predictor variables? We can add another
network as a predictor variable! By doing so, we are conducting a
multiple regression QAP. For example, we can now ask:
Do sender rank and grooming relationships predict the number
of times bonobos GG rubbed in the network?
The null hypothesis is that neither rank of the actor nor grooming
relationships affect the number of times bonobos initiate GG rubbing
(two-tailed test).
set.seed(123)
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending, rank_receiving, bonobo_groom_net)) # list of all predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.7038861 -0.3321997 -0.1704610 0.5354274 0.7466385
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 1.3835945 0.993 0.007 0.011
## x1 -0.1392945 0.007 0.993 0.018
## x2 -0.1036147 0.020 0.980 0.051
## x3 0.1510413 0.798 0.202 0.413
##
## Residual standard error: 0.4541 on 38 degrees of freedom
## Multiple R-squared: 0.1269 Adjusted R-squared: 0.05801
## F-statistic: 1.842 on 3 and 38 degrees of freedom, p-value: 0.1561
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1 x2 x3
## Min -3.12987 -2.37270 -1.70518 -2.94942
## 1stQ -0.83095 -0.64332 -0.47930 -0.92280
## Median 0.02042 -0.03306 0.06527 -0.20974
## Mean 0.04138 -0.03371 0.03601 0.12131
## 3rdQ 0.80742 0.56561 0.57086 0.81042
## Max 2.82364 2.30921 2.00442 10.68268
Are any of the above relationships significant when taken
into the model together?
Is this model a better fit than the others? How can we
tell?
3. Logistic regression
Logistic regression QAPs test the extent to which independent
variables are affecting the presence or absence of edges, but not the
weight of those edges. If you have a network with binary ties, we can
use this regression method to predict ties in the outcome network
(response variable). Using our bonobo dataset, we can ask:
Does higher rank make an individual more or less likely to
G-G rub with other individuals?
In this example, our GG rubbing network is the response variable
while age is our predictor variable. If you imagine rank (as the
predictor variable) on the x-axis, for every unit of rank increase for a
bonobo, are they more likely to GG rub with more individuals? Will an
older individual have more ties on the social network?
If we were using a different predictor variable, then we would need
to create new matrices for senders and receivers. However, we are using
the same one as before (rank) so we can skip that step this time. Now we
can run our logistic regression!
What is our null hypothesis for this question?
The netlogit()
function performs a logistic regression
of the network variable for the response variable
(bonobo_ggr_net
) on the network variables in the predictor
variables (age_receiving
and age_sending
). The
resulting fits and coefficients are tested against the null
hypothesis.
set.seed(123)
logqap <- netlogit(bonobo_ggr_net, # response variable is a network object
list(rank_receiving, rank_sending), # list of all predictor variables as network or matrix objects
reps = 1000) # number of draws for quantile estimation, 1000 reps is the default
summary(logqap)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.5524763 94.8670398 0.974 0.026 0.042
## x1 -0.4903676 0.6124013 0.022 0.978 0.046
## x2 -0.6817886 0.5057117 0.015 0.985 0.023
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
## 10.77041 on 3 degrees of freedom, p-value 0.01303442
## AIC: 53.45396 BIC: 58.66696
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2040994
## (Dn-Dr)/Dn: 0.1849811
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 11
## 1 4 2
##
## Total Fraction Correct: 0.6428571
## Fraction Predicted 1s Correct: 0.3333333
## Fraction Predicted 0s Correct: 0.6944444
## False Negative Rate: 0.8461538
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2
## Min -2.099992 -1.536187 -2.015489
## 1stQ -0.697576 -0.590863 -0.614552
## Median -0.026785 -0.008864 0.055893
## Mean -0.002342 -0.011792 0.030776
## 3rdQ 0.704731 0.553704 0.642645
## Max 1.996422 1.661594 2.026379
As a reminder, we are testing whether rank affects the likelihood of
giving and receiving GG rubbing. In the output above, x1
represents the first variable in the model
(rank_receiving
), and x2
represents the second
variable in the model (rank_sending
).
Because the hypothesis we were testing was whether there is any
relationship between rank and sending and receiving ties, we will use
the two-tailed p-value in the column labeled Pr(>=|b|)
.
In this case, both are significant if our alpha value is 0.05;
rank_receiving
squeaks in right under it! They both have
negative slopes as well.
The first column, Estimate
is the log odds, which is
difficult to interpret. Think of it as a tendency to form or not form
ties. The log odds for rank_receiving
is -0.4903676, so
there is an increasing tendency with rank to not receive ties. The
second column, Exp(b)
, gives odds ratios instead, which are
easier to understand. The odds ratio gives the likelihood of a tie to
form (or not form) under one condition in contrast to another condition.
The odds ratio of rank_receiving
is 0.6124013. So for each
unit of rank increase a female bonobo is, she is 0.6124013 times as
likely to receive any given tie compared to if she were lower ranked. An
odds ratio less than 1 is associated with a lower odds of outcome
(receiving GG rubbing in this case). More generally, they are less
likely to receive GG rubbing as they rise in the ranks. The same can be
said for sending ties as they get higher ranked. This slope is even
steeper at -0.6817886 and an even lower odds ratio of 0.5057117. With
each unit increase in rank, the less likely a bonobo is to initiate gg
rubbing. The Chi-squared goodness-of-fit test p-value is
significant.
A follow up question given these results could be: are higher-ranked
bonobos less likely to participate in GG rubbing as lower-ranked
bonobos? We could answer this with a linear regression QAP because we
would be looking at the weight of edges in this case. The multiple
regression we did before, which also took grooming into account, looked
at this and may help address this question.
4. Homophily
You may have noticed when we visualized the bonobo data that the
individuals sorted into two groups that did not have any observed
interactions. The original dataset actually used bonobos living in two
separate zoos, meaning their social networks could not overlap.
Obviously, in this case, belonging to the same group is an important
predictor for whether a dyad will have an interaction. This may be
obscuring other trends when we do not account for group belonging in our
models.
We can account for the bonobos’ group belonging including a variable
that accounts for homophily, the tendency of actors to form edges with
others who are similar to them in some way. To do this we need to create
a group homophily variable as a binary adjacency matrix. In this matrix,
1
or TRUE
will indicate that a sender and
receiver share the same group value. 0
or
FALSE
will indicate that they do not share the same group
value.
The function outer()
will take two vectors (or two of
the same vector in this case) and create a new matrix based on a
specified function applied to every combination of terms in the vector.
The ==
operator tells outer()
to create a
matrix that reports TRUE/FALSE for every combination of terms.
group <- bonobo_ggr_net %v% "group" # first create a vector giving the group of each node
group_homophily <- outer(group, group, FUN = "==") # shows us whether each combination of terms in vector group is the same (TRUE) or different (FALSE)
group_homophily
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [2,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE
## [3,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE
## [4,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [5,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [6,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [7,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE
For example, bonobos 1 and 2 belong to different groups, so cell
[1,2] contains FALSE. We can now use this matrix as a predictor variable
of group homophily in our logistic model from earlier.
set.seed(123)
logqap2 <- netlogit(bonobo_ggr_net,
list(rank_receiving, rank_sending, group_homophily),
reps = 1000)
summary(logqap2)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.2407432 69.4594517 0.966 0.034 0.065
## x1 -0.5134417 0.5984324 0.035 0.965 0.063
## x2 -0.7186133 0.4874277 0.009 0.991 0.017
## x3 1.1806238 3.2564048 0.868 0.132 0.218
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 44.77452 on 38 degrees of freedom
## Chi-Squared test of fit improvement:
## 13.44985 on 4 degrees of freedom, p-value 0.00927464
## AIC: 52.77452 BIC: 59.72519
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2425588
## (Dn-Dr)/Dn: 0.2310003
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 8
## 1 4 5
##
## Total Fraction Correct: 0.7142857
## Fraction Predicted 1s Correct: 0.5555556
## Fraction Predicted 0s Correct: 0.7575758
## False Negative Rate: 0.6153846
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2 x3
## Min -1.891645 -1.657796 -2.010851 -2.303175
## 1stQ -0.654240 -0.541867 -0.606004 -0.987334
## Median -0.021953 -0.003103 0.044755 -0.384264
## Mean 0.007076 -0.012917 0.030103 -0.111246
## 3rdQ 0.681959 0.511718 0.666226 0.741716
## Max 2.013238 1.635185 1.981424 3.058840
At a glance, including group homophily does not improve this model.
There is not a significant relationship between group homophily and the
presence/absence of a given tie. The AIC value is only slightly lower
for the model including group homphily than the one without group
homophily.
Overall, the results from our bonobo data are a little anti-climactic
(not much we looked in to seemed important). But sometimes, that’s the
nature of small-sample primatology work. To make things a little more
exciting, let’s run the three QAPs again with some spider monkey
data!
Spider Monkey Social Networks
The spider monkeys in this data set are Critically Endangered brown
spider monkeys (Ateles hybridus) from Hacienda San Juan de
Carare in Colombia. They are a group living platyrrhine primate with
fission-fusion dynamics, similar to bonobos. The original intent of this
data set and what it was published for was to test how parasites moved
between individuals depending on different levels of contact. That
leaves us with a rich dataset of networks and attributes that we can
work with!
First, we must load our data.
spmonkey_dat <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/spidermonkey_beh_edgelist.csv")
spmonkey_el <- read.csv(spmonkey_dat)
head(spmonkey_el)
## Focal Beh Rec Mins
## 1 Ba In contact Bo 262
## 2 Ba In contact Db 1
## 3 Ba In contact Dl 134
## 4 Ba In contact Ku 6
## 5 Ba In contact Nw 10
## 6 Ba In contact Pe 37
unique(spmonkey_el$Beh) #subsetting the column for proximity types
## [1] "In contact" "mating" "play" "play " "groom"
We see here that spider monkeys are engaging in four unique social
behaviors in this dataset: in contact, mating, playing, and grooming.
This means that we can construct four separate social networks, on for
each behavior. Additionally, the packages that we use can only read edge
lists that are in the form of a 2-column format. So, let’s clean this
data up a bit more.
First, we are starting with the mating network.
mating_spmonkey <- spmonkey_el[spmonkey_el$Beh == "mating",] #making a separate data frame for mating
mating_spmonkey <- mating_spmonkey[,-c(2,4)] #removing unnecessary column data, we'll add back this "Mins" data as edge attributes later
sp_mating_net <- as.network(mating_spmonkey, matrix.type = "edgelist", directed = TRUE)
plot(sp_mating_net, label = "vertex.names", main="Spider Monkey Mating Network")
Next, let’s add network and edge attributes to the network.
sp_attr <- curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/bioanth-stats/module-F23-Group1/spidermonkey_attributes.csv")
sp_attr <- read.csv(sp_attr)
head(sp_attr)
## Number ID sex age dead.alive Repr_status
## 1 1 Pe female adult dead preg/lact
## 2 2 Ku female adult dead preg/lact
## 3 3 Dl female adult dead preg/lact
## 4 4 Ba female adult dead preg/lact
## 5 5 Vi female adult dead preg/lact
## 6 6 Nw male adult alive repr_active
sp_attr$age[sp_attr$age == "infant"] <- 1 #assigning a numeric value to each life history stage
sp_attr$age[sp_attr$age == "juvenile"] <- 2
sp_attr$age[sp_attr$age == "adult"] <- 3
sp_attr$age <- as.numeric(sp_attr$age)
class(sp_attr$age) #confirming the age attribute is coming up as a numeric class
## [1] "numeric"
sp_attr$sex[sp_attr$sex == "female"] <- 0 #assigning a numeric value to each sex
sp_attr$sex[sp_attr$sex == "male"] <- 1
sp_attr$sex <- as.numeric(sp_attr$sex)
class(sp_attr$sex) #confirming the sex attribute is coming up as a numeric class
## [1] "numeric"
#next we need to assign the node attributes to the mating network
sp_mating_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_mating_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_mating_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute
We can also add the “Mins” from the initial spider monkey edge list
as edge weights. The %e%
operator is like the
%v%
operator. It subsets a network object to the edge
attribute so we can store time (in minutes spent mating) there.
sp_mating_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "mating",]$Mins
summary(sp_mating_net) #can summarize the network like we did for the bonobo GG rubbing network
## Network attributes:
## vertices = 7
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 11
## missing edges = 0
## non-missing edges = 11
## density = 0.2619048
##
## Vertex attributes:
##
## age:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 3 3 3 3 3
##
## repr:
## character valued attribute
## attribute summary:
## preg/lact repr_active
## 5 2
##
## sex:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2857 0.5000 1.0000
## vertex.names:
## character valued attribute
## 7 valid vertex names
##
## Edge attributes:
##
## time:
## integer valued attribute
## 11values
##
## Network adjacency matrix:
## Nw Ba Pe Wa Dl Pk Vt
## Nw 0 1 1 0 1 0 0
## Ba 1 0 0 1 0 1 1
## Pe 1 0 0 1 0 1 0
## Wa 0 0 0 0 1 0 0
## Dl 0 0 0 0 0 0 0
## Pk 0 0 0 0 0 0 0
## Vt 0 0 0 0 0 0 0
This process can be repeated for grooming, play and contact
networks.
play_spmonkey <- spmonkey_el[spmonkey_el$Beh == "play",][,-c(2,4)] #making a separate data frame for play behavior
sp_play_net_unweighted <- as.network(play_spmonkey, matrix.type = "edgelist", directed = TRUE, weighted=FALSE)
plot(sp_play_net_unweighted, label = "vertex.names", main="Spider Monkey Play Network")
# Adding network and edge attributes to network
sp_play_net_unweighted %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_play_net_unweighted %v% "age" <- sp_attr$age #adds age as node attribute
sp_play_net_unweighted %v% "repr" <- sp_attr$Repr_status #adds reproductive status as node attribute
groom_spmonkey <- spmonkey_el[spmonkey_el$Beh == "groom",][,-c(2,4)] #making a separate data frame for grooming behavior
sp_groom_net <- as.network(groom_spmonkey, matrix.type = "edgelist", directed = TRUE, loops = TRUE)
plot(sp_groom_net, label = "vertex.names", main="Spider Monkey Grooming Network")
# Adding network and edge attributes to the network
sp_groom_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_groom_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_groom_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute
# Let's add the "Mins" from the initial spmonkey edgelist as edge weights
sp_groom_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "groom",]$Mins
contact_spmonkey <- spmonkey_el[spmonkey_el$Beh == "In contact",][,-c(2,4)] #making a separate data frame for contact
sp_contact_net <- as.network(contact_spmonkey, matrix.type = "edgelist", directed = TRUE, loops = TRUE)
plot(sp_contact_net, label = "vertex.names", main="Spider Monkey Contact Network")
# Adding network and edge attributes to the network
sp_contact_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_contact_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_contact_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute
# Let's add the "Mins" from the initial spmonkey edgelist as edge weights
sp_contact_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "In contact",]$Mins
plot(sp_contact_net, label = "vertex.names", main="Spider Monkey Contact Network")
How was this process different from the bonobo data loading,
cleaning, and visualizing process?
How do these plots compare to the bonobo data?
Do you think we could make any predictions based on just
looking at these plots?
Can we manipulate the plots so that the node colors are
different based on relevant variables like sex?
Can we benefit from showing edge weights in the spider monkey
plots as well?
Now that we have loaded, cleaned, and visualized our spider monkey
dataset, we can analyze it via QAP regressions!
QAPs for Spider Monkey Data
What are some questions you would ask about our spider monkey
social networks and which QAP would you use?
Correlation
We need to use two networks with the same number of nodes so we chose
to ask a question about grooming and contact:
Does grooming relationships correlate with contact
relationships in spider monkeys?
What is our null hypothesis? Alternative
hypothesis?
set.seed(123)
# get the correlation value
gcor(sp_groom_net, sp_contact_net)
## [1] 0.362977
# is it significant?
qap_cor <- qaptest(list(sp_groom_net, sp_contact_net), # include both network objects in a list
gcor, # the function you're using is correlation between networks (gcor)
g1=1, # use first graph in the list
g2=2, # use second graph in the list
reps = 1000) # number of permutations to run
summary(qap_cor)
##
## QAP Test Results
##
## Estimated p-values:
## p(f(perm) >= f(d)): 0.002
## p(f(perm) <= f(d)): 0.999
##
## Test Diagnostics:
## Test Value (f(d)): 0.362977
## Replications: 1000
## Distribution Summary:
## Min: -0.3811259
## 1stQ: -0.09074425
## Med: 0
## Mean: -3.62977e-05
## 3rdQ: 0.0725954
## Max: 0.3811259
plot(qap_cor)
How do you interpret this? Are these highly correlated
networks? Which test (upper or lower tailed) is
significant?
It looks like the correlation between the grooming and contact
networks is significant. Test Value (f(d))
is 0.362977,
which is significant for p(f(perm) >= f(d))
. This means,
that the dyads that groomed each other were also more likely to be in
contact with each other.
Linear regression
Recall that in linear regression for social network analysis involves
identifying an independent variable (node attributes) to test its effect
on the edge weight of the response variable (the social network). For
the spider monkeys, we have multiple options for both node attributes
and social networks. We chose to look at age and time spent
grooming.
Is age category predictive of time spent in grooming behavior
within dyads of spider monkeys?
But before we can get into the analysis, we must first make the
matrices for senders and receivers of grooming like we did for the
bonobos. As a reminder, senders are represented by ROWS in the
matrices.
nodes <- length(sp_attr$ID) # the number of nodes in the data set
age <- sp_groom_net %v% "age" # a vector of the node-level variable we are interested in
age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create an empty matrix to be filled
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
head(age_sending)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [2,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [3,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [4,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [5,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [6,] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [,15] [,16] [,17]
## [1,] 3 3 3
## [2,] 3 3 3
## [3,] 3 3 3
## [4,] 3 3 3
## [5,] 3 3 3
## [6,] 3 3 3
Receivers are represented by COLUMNS in the matrices.
age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_receiving[,i] <- rep(age[i], nodes)} # The age of each actor is repeated over entire COLUMN of matrix
head(age_receiving)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [2,] 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [3,] 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [4,] 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [5,] 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [6,] 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [,15] [,16] [,17]
## [1,] 1 2 1
## [2,] 1 2 1
## [3,] 1 2 1
## [4,] 1 2 1
## [5,] 1 2 1
## [6,] 1 2 1
What is the null hypothesis? What is the alternative
hypothesis?
set.seed(123)
lrqap.sp <- netlm(sp_groom_net, # response variable is a network object with a weighted adjacency matrix
list(age_sending)) # list of all predictor variables as network objects or matrix objects
summary(lrqap.sp)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.3642857 -0.3642857 -0.1214286 0.2500000 0.6357143
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -0.3642857 0.049 0.951 0.113
## x1 0.2428571 1.000 0.000 0.008
##
## Residual standard error: 0.4001 on 270 degrees of freedom
## Multiple R-squared: 0.1524 Adjusted R-squared: 0.1492
## F-statistic: 48.54 on 1 and 270 degrees of freedom, p-value: 2.472e-11
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -6.8264 -8.8876
## 1stQ -2.2247 -1.4083
## Median -0.5181 0.4010
## Mean -0.2970 0.1511
## 3rdQ 1.4219 2.0195
## Max 9.3958 6.2148
Does age have a positive or negative relationship with amount
of time spent grooming?
For each unit (in this case life history stage) of age
increase, how much more or less likely is a spider monkey to
groom?
Is the relationship between age and grooming
significant?
How much variability in the data does this model
explain?
Is it a good fit model? Is this is a significant effect? How
can you tell?
The b1
coefficient is 0.24. For every 1 unit increase in
age category, the individuals are likely to have 0.24 times greater edge
weights for each grooming tie they send.
Logistic regression
Recall that for logistic regression we are testing the presence or
absence of ties based on the predictor variable (node attribute). Here
we chose to look at the effect of sex of individuals on the presence or
absence of ties in the play network.
Does sex predict play behavior (number of ties in play social
network)?
In other words, do males or females play more?
Again, we make the matrices for the variables for both sending (rows)
and receiving (columns) individuals.
nodes <- 16 # the number of nodes in the data set
sex <- sp_play_net_unweighted %v% "sex" # a vector of the node-level variable we are interested in
sex_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create an empty matrix to be filled
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
sex_sending[i,] <- rep(sex[i], nodes)} # sex of each actor repeated over entire ROW of matrix
head(sex_sending)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [6,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [,15] [,16]
## [1,] 0 0
## [2,] 0 0
## [3,] 0 0
## [4,] 0 0
## [5,] 0 0
## [6,] 1 1
sex_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
sex_receiving[,i] <- rep(sex[i], nodes)} # sex of each actor repeated over entire COLUMN of matrix
head(sex_receiving)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 0 0 0 0 0 1 1 1 1 1 1 0 0 0
## [2,] 0 0 0 0 0 1 1 1 1 1 1 0 0 0
## [3,] 0 0 0 0 0 1 1 1 1 1 1 0 0 0
## [4,] 0 0 0 0 0 1 1 1 1 1 1 0 0 0
## [5,] 0 0 0 0 0 1 1 1 1 1 1 0 0 0
## [6,] 0 0 0 0 0 1 1 1 1 1 1 0 0 0
## [,15] [,16]
## [1,] 0 0
## [2,] 0 0
## [3,] 0 0
## [4,] 0 0
## [5,] 0 0
## [6,] 0 0
What is the null hypothesis? What is the alternative
hypothesis?
set.seed(123)
lrqap <- netlogit(sp_play_net_unweighted, # response variable is a network object with an unweighted adjacency matrix
list(sex_receiving, sex_sending), # list of predictor variables as network or matrix objects
reps = 1000)
summary(lrqap)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -2.4875920 0.08310985 0.000 1.000 0.000
## x1 0.4783349 1.61338572 0.899 0.101 0.229
## x2 0.1100466 1.11633005 0.588 0.412 0.876
##
## Goodness of Fit Statistics:
##
## Null deviance: 332.7106 on 240 degrees of freedom
## Residual deviance: 150.4112 on 237 degrees of freedom
## Chi-Squared test of fit improvement:
## 182.2994 on 3 degrees of freedom, p-value 0
## AIC: 156.4112 BIC: 166.8531
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.4316829
## (Dn-Dr)/Dn: 0.5479219
## Contingency Table (predicted (rows) x actual (cols)):
##
## 0 1
## 0 217 23
## 1 0 0
##
## Total Fraction Correct: 0.9041667
## Fraction Predicted 1s Correct: NaN
## Fraction Predicted 0s Correct: 0.9041667
## False Negative Rate: 1
## False Positive Rate: 0
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2
## Min -6.786638 -2.076333 -2.690130
## 1stQ -5.254470 -0.671335 -1.112456
## Median -4.616653 0.030267 -0.212350
## Mean -4.421015 -0.015145 0.001826
## 3rdQ -3.737862 0.603964 1.054874
## Max 0.032959 2.689822 3.798673
Do either sex receiving or sex sending have a significant
relationship with amount of edges in the play network?
Does this model significantly explain the
effect?