This module introduces how survey data can be a powerful tool in anthropological research by using real-world responses from a survey done in this class. Surveys allow anthropologists to gather insights into cultural beliefs, behaviors, and perceptions across diverse populations. In this module, you’ll explore how structured survey responses can be analyzed to reveal patterns and meanings relevant to social science questions. To get started, you’ll need the following:
library(tidyverse)
library(curl)
library(ggplot2)
library(tm)
library(wordcloud2)
library(RColorBrewer)
BUT FIRST:
Why Use Survey Data in Anthropology?
Survey data helps anthropologists:
Understand cultural trends and social behavior.
Collect both quantitative and qualitative data, which supports analyzing both broad trends and detailed insights within R.
Compare societies/populations across different regions and time periods.
Quantify ethnographic findings with statistical evidence.
Example: Imagine a cultural anthropologist studying youth perceptions of education across rural and urban communities in Kenya. Through open- and closed-ended survey questions, they can:
Quantitatively measure how many respondents believe education leads to job security.
Qualitatively analyze why they hold those beliefs based on open-text responses.
Use R to visualize group differences (e.g., urban vs. rural) and run regression models to predict educational outlook based on socioeconomic variables.
1. Loading and Exploring the Data
We will use a survey dataset for this module. First, we need to load the dataset:
# Load the raw dataset
raw_data <- curl("https://github.com/ZeddyCraft/AN588-Group-Presentation/raw/refs/heads/main/AN588%20Survey%20(Responses)%20-%20Form%20Responses%201.csv")
d <- read.csv(raw_data, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(d)
## Timestamp How.old.are.you. What.is.your.gender.
## 1 3/25/2025 16:44:00 21 Female
## 2 3/25/2025 16:55:22 20 Male
## 3 3/25/2025 17:14:43 19 Female
## 4 3/25/2025 17:15:59 22 Female
## 5 3/25/2025 17:20:32 21 Non-Binary
## 6 3/25/2025 17:20:33 20 Non-Binary
## How.tall.are.you..in.inches.please....if.you.don.t.know.guess.a.number...
## 1 6'9
## 2 69
## 3 64
## 4 66
## 5 60"
## 6 65
## How.much.do.you.weigh..in.lbs.please....if.you.don.t.know.guess.a.number...
## 1 180lbs
## 2 200
## 3 155
## 4 130
## 5 125lbs
## 6 185
## What.is.your.major...Answer.in.full..Ex..Computer.Science.
## 1 cs
## 2 PO/IR
## 3 Computer Science
## 4 Marine Sciences, Earth & Environmental Sciences
## 5 Film & TV, Advertising
## 6 Painting
## Do.you.have.are.planning.to.add.a.minor.
## 1 No
## 2 Yes
## 3 No
## 4 No
## 5 No
## 6 No
## If.the.answer.to.your.previous.question.was.yes..what.minor...Answer.in.full..Ex..Computer.Science.
## 1
## 2 DS
## 3
## 4
## 5
## 6
## On.a.scale.of.1.10..how.much.do.you.like.BU.
## 1 8
## 2 7
## 3 8
## 4 8
## 5 6
## 6 8
## On.a.scale.of.1.10..how.confident.are.you.in.your.academics.
## 1 7
## 2 7
## 3 3
## 4 9
## 5 5
## 6 3
## On.a.scale.of.1.10..how.confident.are.you.that.you.will.get.a.job.after.graduation.
## 1 8
## 2 4
## 3 2
## 4 4
## 5 2
## 6 2
## On.a.scale.of.1.10..how.often.are.you.anxious.about.the.future.
## 1 5
## 2 9
## 3 10
## 4 7
## 5 10
## 6 10
## How.many.chickens.can.fit.in.the.basement.of.CAS.
## 1 Infinite
## 2 At least 5
## 3 643,022.76
## 4 at least two
## 5 infinite
## 6 200,000
## How.many.zombies.do.you.think.you.could.kill.in.the.zombie.apocalypse.
## 1 10
## 2 Depends on the equipment but my delusional ass thinks at least 5
## 3 3
## 4 none i'd be patient zero
## 5 depends, at min like maybe 2?
## 6 i’d get to like 4 them just give up tbh
## If.you.could.have.an.infinite.number.of.goats..how.many.goats.would.you.want.
## 1 2
## 2 I do not want goats, so as many as I could realistically sell for profit without over-saturating the market or endangering their wellbeing
## 3 4
## 4 ...one?????
## 5 infinite
## 6 2
## What.is.your.favorite.color.
## 1 purple
## 2 Yellow
## 3 805999
## 4 #dabbed
## 5 teal or forest green
## 6 Purple
## Use.one.word.to.describe.how.do.you.feel.about.BU. Column.17
## 1 multicultural NA
## 2 Nuanced NA
## 3 Wonderous NA
## 4 lol NA
## 5 meh NA
## 6 Labyrinthian NA
Take a moment to explore the raw survey data above. Before we begin analysis, consider the following questions:
What is wrong with this dataset?
What will likely happen if we run an analysis on it right now?
What challenges or limitations would this impose on our research?
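One way to start answering these questions is to check how R stored each column. The sketch below is only an illustration: `raw_sample` is a hypothetical three-row data frame typed in by hand from the first rows printed above, not the full survey, and its short column names are assumptions.

```r
# Hypothetical mini version of the raw data, copied from the first rows shown above
raw_sample <- data.frame(
  age    = c(21, 20, 19),
  height = c("6'9", "69", "64"),
  weight = c("180lbs", "200", "155"),
  stringsAsFactors = FALSE
)

# Check how R stored each column: mixed entries force a column to text
col_types <- sapply(raw_sample, class)
print(col_types)

# mean() cannot average text, so it returns NA (with a warning)
height_mean <- suppressWarnings(mean(raw_sample$height))
print(height_mean)

# Coercing to numeric turns every entry R cannot parse into NA
height_num <- suppressWarnings(as.numeric(raw_sample$height))
print(height_num)
```

Entries such as 6'9 or 180lbs are readable to a person, but R treats the whole column as text, so any numeric summary either fails or quietly drops those respondents.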
Why Cleaning Your Data Matters
Before conducting any kind of analysis, it’s essential to clean your data. Dirty or inconsistent data can lead to misleading results or errors in your analysis. Common cleaning steps include:
You can clean your data using R (e.g., the mutate(), rename(), and filter() functions in dplyr), or manually in a spreadsheet if the dataset is small. The cleaned version used in this module is already processed for demonstration.
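As a rough sketch of what cleaning in R could look like, the example below renames a column, converts mixed height entries to inches, and drops rows that cannot be parsed. It is not the procedure used to produce the cleaned file for this module. `raw_sample`, `height_raw`, and the helper `parse_height_in()` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical raw rows modelled on the survey output above
raw_sample <- data.frame(
  How.old.are.you. = c(21, 20, 21),
  height_raw = c("6'9", "69", "60\""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert entries such as 6'9, 69, or 60" to inches
parse_height_in <- function(x) {
  x <- gsub("\"", "", trimws(x))            # drop inch marks and stray spaces
  has_feet <- grepl("'", x, fixed = TRUE)   # TRUE for feet'inches entries
  parts <- strsplit(x, "'", fixed = TRUE)
  vapply(seq_along(x), function(i) {
    p <- suppressWarnings(as.numeric(parts[[i]]))
    if (has_feet[i]) {
      inches <- if (length(p) > 1) p[2] else 0
      p[1] * 12 + inches                    # feet to inches, plus the remainder
    } else {
      p[1]                                  # already a plain number of inches
    }
  }, numeric(1))
}

cleaned_sample <- raw_sample %>%
  rename(Age = How.old.are.you.) %>%                  # shorter column name
  mutate(Height_In = parse_height_in(height_raw)) %>% # numeric height in inches
  filter(!is.na(Height_In)) %>%                       # drop unparseable rows
  select(Age, Height_In)

print(cleaned_sample)
```

With these three rows, 6'9 becomes 81 inches, which matches the height recorded for the first respondent in the cleaned dataset.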
Now load this cleaned dataset:
# Load the dataset
cleaned_data <- curl("https://github.com/ZeddyCraft/AN588-Group-Presentation/raw/refs/heads/main/AN588%20Survey%20(Responses)%20-%20Cleaned%20Data.csv")
d <- read.csv(cleaned_data, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(d)
## Age Gender Height.In. Weight.lb. Major
## 1 21 Female 81 180 Computer Science
## 2 20 Male 69 200 Political Science
## 3 19 Female 64 155 Computer Science
## 4 22 Female 66 130 Marine Sciences
## 5 21 Non-Binary 60 125 Film & TV
## 6 20 Non-Binary 65 185 Painting
## Second_Major Minor Like_BU Academic_Confidence
## 1 8 7
## 2 International Relations Data Science 7 7
## 3 8 3
## 4 Earth & Environmental Sciences 8 9
## 5 Advertising 6 5
## 6 8 3
## Job_Confidence Future_Anxious Chickens_Basement Zombies_Could_Kill
## 1 8 5 1E+43 10
## 2 4 9 5 5
## 3 2 10 643,022.76 3
## 4 4 7 2 0
## 5 2 10 1E+43 2
## 6 2 10 200,000 4
## Goat_Number Favorite_Color BU_Description
## 1 2e+00 Purple Multicultural
## 2 5e+02 Yellow Nuanced
## 3 4e+00 Purple Wonderous
## 4 1e+00 Purple lol
## 5 1e+12 Teal Meh
## 6 2e+00 Purple Labyrinthian
Discussion: What differences do you see with this dataset? How does it influence our research going forward?
TAKEAWAY: Clean your data!!
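Even a cleaned file can need one more step. In the output above, the Chickens_Basement column still contains thousands separators (for example 643,022.76), so R will most likely read it as text. The sketch below shows one way to convert such values, using the six entries printed above typed in by hand. The names `chickens_raw` and `chickens_num` are hypothetical.

```r
# The six Chickens_Basement values shown in the cleaned output, typed in by hand
chickens_raw <- c("1E+43", "5", "643,022.76", "2", "1E+43", "200,000")

# Remove the thousands separators, then convert to numeric
chickens_num <- as.numeric(gsub(",", "", chickens_raw, fixed = TRUE))
print(chickens_num)

# With answers this extreme, the median is more informative than the mean
median_chickens <- median(chickens_num)
print(median_chickens)
```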
Topic: Regression Modeling in Survey Research
Objective: Use basic regression models to compare how well academic confidence and anxiety about the future predict perceived job security.
survey_data <- d
Confidence <- d$Academic_Confidence
Anxiety <- d$Future_Anxious
JobSecurity <- d$Job_Confidence
# Two regression models
model_conf <- lm(JobSecurity ~ Confidence, data = survey_data)
model_anx <- lm(JobSecurity ~ Anxiety, data = survey_data)
summary(model_conf)
##
## Call:
## lm(formula = JobSecurity ~ Confidence, data = survey_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8393 -2.3072 -0.3072 2.6286 5.5644
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.5640 1.3163 1.948 0.0605 .
## Confidence 0.4679 0.1972 2.372 0.0241 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.676 on 31 degrees of freedom
## Multiple R-squared: 0.1537, Adjusted R-squared: 0.1264
## F-statistic: 5.628 on 1 and 31 DF, p-value: 0.02406
summary(model_anx)
##
## Call:
## lm(formula = JobSecurity ~ Anxiety, data = survey_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8423 -1.7497 -0.1109 1.8891 4.8891
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.5736 1.3900 6.887 1.01e-07 ***
## Anxiety -0.5463 0.1761 -3.103 0.00407 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.541 on 31 degrees of freedom
## Multiple R-squared: 0.237, Adjusted R-squared: 0.2124
## F-statistic: 9.627 on 1 and 31 DF, p-value: 0.004068
# Compare R-squared values
conf_r2 <- summary(model_conf)$r.squared
anx_r2 <- summary(model_anx)$r.squared
bar_data <- tibble(
Predictor = c("Academic Confidence", "Future Anxiety"),
R_squared = c(conf_r2, anx_r2)
)
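It can help to see what the R-squared value measures. For a regression with a single predictor, R-squared equals the squared Pearson correlation between the predictor and the outcome. The sketch below checks this on simulated data. The data frame `sim`, its columns, the seed, and the coefficients are all made up for illustration and are not the survey responses.

```r
set.seed(588)  # arbitrary seed so the simulated numbers are reproducible
n <- 33        # 31 residual degrees of freedom plus 2 estimated coefficients

# Simulated 1-10 ratings with a noisy linear relationship (made-up values)
sim <- data.frame(confidence = sample(1:10, n, replace = TRUE))
sim$job <- 2.5 + 0.5 * sim$confidence + rnorm(n, sd = 2.5)

fit <- lm(job ~ confidence, data = sim)

# R-squared reported by the model
r2_model <- summary(fit)$r.squared

# Squared Pearson correlation between outcome and predictor
r2_cor <- cor(sim$job, sim$confidence)^2

print(c(model = r2_model, correlation_squared = r2_cor))
```

The two numbers agree, so comparing R-squared across the two single-predictor models is the same as comparing how strongly each predictor correlates with the outcome.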
Let’s visualize it
# Barplot
ggplot(bar_data, aes(x = Predictor, y = R_squared, fill = Predictor)) +
geom_col(show.legend = FALSE) +
ylim(0, 1) +
labs(
title = "Prediction of Job Security by Confidence vs. Anxiety",
y = "R-squared Value",
x = "Predictor"
) +
theme_minimal()
# Scatterplot
ggplot(survey_data, aes(x = Academic_Confidence, y = Job_Confidence)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Academic Confidence vs. Job Security",
x = "Academic Confidence",
y = "Job Security")
## `geom_smooth()` using formula = 'y ~ x'
Discussion Prompt: Which predictor has a stronger relationship with perceived job security? What might this tell us about how confidence and anxiety influence perceptions of the future?
Takeaway: Regression helps test relationships between variables — in this case, identifying whether emotional or academic factors are more influential on students’ views of their future.
One of the most useful parts of surveys is qualitative data: data represented by words or phrases, not numbers.
Take, for example, the “Favorite_Color” column, based on the prompt “What is your favorite color?”.
head(d$Favorite_Color)
## [1] "Purple" "Yellow" "Purple" "Purple" "Teal" "Purple"
How can we organize and analyze this data? Numeric summaries such as means and regressions don’t apply directly to text responses, so we work with counts instead. One way to visualize such data is a word cloud, which scales the size of each word by how often it appears, giving us a visual representation of the responses. The wordcloud2 package provides a function, wordcloud2(), that creates these in R.
library(wordcloud2)
#first let's find the frequency (or counts) of each of the color responses in r
t<-table(d$Favorite_Color)
wordcloud2(t, size=.8, color=(c("Black","Blue","Brown","Cyan","Green","Pink","Purple","Red","Teal","Yellow")))
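The frequency table used here can also be built as an explicit data frame, which makes the counts easier to inspect and sort. The wordcloud2 documentation describes its input as a data frame with a word column and a frequency column. The sketch below uses only the six responses printed above, typed in by hand; `colors_sample` and `color_counts` are hypothetical names.

```r
# The six responses printed by head(d$Favorite_Color), typed in by hand
colors_sample <- c("Purple", "Yellow", "Purple", "Purple", "Teal", "Purple")

# Count each response and store the result as a two-column data frame
color_counts <- as.data.frame(table(colors_sample), stringsAsFactors = FALSE)
names(color_counts) <- c("word", "freq")

# Sort so the most common response comes first
color_counts <- color_counts[order(-color_counts$freq), ]
print(color_counts)
```

A data frame in this shape can be passed to wordcloud2() in place of the table.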
That’s visualization! Word clouds can be a great tool for visualizing your data. Another common form of visualization is graphs. Two of the most common types are pie charts and bar graphs. Let’s make a few graphs using the respondents’ favorite colors!
ggplot(d, aes(x=Favorite_Color, fill=Favorite_Color))+
geom_bar(show.legend = F)+
scale_fill_manual(values=c("black","blue","brown","cyan","green","pink","purple","red","turquoise","yellow"))
pie(x=t,col=c("black","blue","brown","cyan","green","pink","purple","red","turquoise","yellow"))
Notice that ggplot2 doesn’t have a dedicated function for pie charts, which is why we used base R’s pie() here. You can still make pie charts with ggplot2 by drawing a single stacked bar and wrapping it in polar coordinates with coord_polar().
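Here is a minimal sketch of a ggplot2 pie chart, built from a stacked bar and coord_polar(). It uses only the six favorite-color responses printed earlier, typed in by hand, so `colors_sample` and `pie_plot` are hypothetical names and the chart is not the full survey result.

```r
library(ggplot2)

# Six sample responses, typed in by hand from the output shown earlier
colors_sample <- data.frame(
  Favorite_Color = c("Purple", "Yellow", "Purple", "Purple", "Teal", "Purple")
)

pie_plot <- ggplot(colors_sample, aes(x = "", fill = Favorite_Color)) +
  geom_bar(width = 1) +          # one stacked bar holding every response
  coord_polar(theta = "y") +     # wrap the bar into a circle
  scale_fill_manual(values = c(Purple = "purple", Teal = "turquoise", Yellow = "yellow")) +
  theme_void() +                 # remove axes, which have no meaning on a pie
  labs(title = "Favorite colors (sample rows)", fill = "Color")
```

Calling print(pie_plot) draws the chart. Naming the entries in scale_fill_manual() ties each color to its category explicitly, rather than relying on alphabetical order.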
Now it’s your turn. Let’s try making a word cloud and a bar graph with the Major column.
t2<-table(d$Major)
wordcloud2(t2, size=.4, color="random-dark")
ggplot(d, aes(x=Major,fill=Major))+
geom_bar()+
theme(axis.text.x=element_blank(), legend.key.size=unit(.1, "cm"))
Discussion Prompt: What is the most frequent major in this dataset? Are there any interesting majors in there?
Takeaway: Text data adds depth — a word cloud offers a quick thematic scan, helping anthropologists connect numbers to narratives.
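To check what a word cloud suggests against actual counts, you can sort the frequency table. The sketch below uses only the six majors printed in the cleaned output above, typed in by hand, so the result describes those sample rows and not the whole class. `majors_sample`, `major_counts`, and `top_major` are hypothetical names.

```r
# The six Major values shown in the cleaned output, typed in by hand
majors_sample <- c("Computer Science", "Political Science", "Computer Science",
                   "Marine Sciences", "Film & TV", "Painting")

# Count each major and sort from most to least common
major_counts <- sort(table(majors_sample), decreasing = TRUE)
print(head(major_counts, 3))

# The name of the most common major
top_major <- names(major_counts)[1]
print(top_major)
```

Swapping `majors_sample` for `d$Major` applies the same steps to the full dataset.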
Survey data is an incredibly flexible tool for anthropologists. In this module, we explored descriptive stats, modeling, and qualitative visualization — all from one simple dataset. The skills here are the foundation for both research and applied work.