Using Survey Data in Anthropology: Properly Prepping, Analyzing, and Modeling

Preliminaries

This module introduces how survey data can be a powerful tool in anthropological research by using real-world responses from a survey done in this class. Surveys allow anthropologists to gather insights into cultural beliefs, behaviors, and perceptions across diverse populations. In this module, you’ll explore how structured survey responses can be analyzed to reveal patterns and meanings relevant to social science questions. To get started, you’ll need the following:

library(tidyverse)
library(curl)
library(ggplot2)
library(tm)
library(wordcloud2)
library(RColorBrewer)

Objectives

In this module, we will learn:
1. How to clean and manipulate survey data.
2. How to calculate descriptive statistics using survey data.
3. How to determine what type of analyses we can do with survey data.

BUT FIRST:

Why Use Survey Data in Anthropology?

Survey data helps anthropologists:

  1. Understand cultural trends and social behavior.

  2. Collect both quantitative and qualitative data, which lets us analyze both broad trends and detailed insights within R.

  3. Compare societies/populations across different regions and time periods.

  4. Quantify ethnographic findings with statistical evidence.

Example: Imagine a cultural anthropologist studying youth perceptions of education across rural and urban communities in Kenya. Through open- and closed-ended survey questions, they can:

  • Quantitatively measure how many respondents believe education leads to job security.

  • Qualitatively analyze why they hold those beliefs based on open-text responses.

  • Use R to visualize group differences (e.g., urban vs. rural) and run regression models to predict educational outlook based on socioeconomic variables.

1. Loading and Exploring the Data

We will use a survey dataset for this module. We will need to load in the dataset:

# Load the raw dataset
raw_data <- curl("https://github.com/ZeddyCraft/AN588-Group-Presentation/raw/refs/heads/main/AN588%20Survey%20(Responses)%20-%20Form%20Responses%201.csv")
d <- read.csv(raw_data, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(d)
##            Timestamp How.old.are.you. What.is.your.gender.
## 1 3/25/2025 16:44:00               21               Female
## 2 3/25/2025 16:55:22               20                 Male
## 3 3/25/2025 17:14:43               19               Female
## 4 3/25/2025 17:15:59               22               Female
## 5 3/25/2025 17:20:32               21           Non-Binary
## 6 3/25/2025 17:20:33               20           Non-Binary
##   How.tall.are.you..in.inches.please....if.you.don.t.know.guess.a.number...
## 1                                                                       6'9
## 2                                                                        69
## 3                                                                        64
## 4                                                                        66
## 5                                                                       60"
## 6                                                                        65
##   How.much.do.you.weigh..in.lbs.please....if.you.don.t.know.guess.a.number...
## 1                                                                      180lbs
## 2                                                                         200
## 3                                                                         155
## 4                                                                         130
## 5                                                                      125lbs
## 6                                                                         185
##   What.is.your.major...Answer.in.full..Ex..Computer.Science.
## 1                                                         cs
## 2                                                      PO/IR
## 3                                           Computer Science
## 4            Marine Sciences, Earth & Environmental Sciences
## 5                                     Film & TV, Advertising
## 6                                                   Painting
##   Do.you.have.are.planning.to.add.a.minor.
## 1                                       No
## 2                                      Yes
## 3                                       No
## 4                                       No
## 5                                       No
## 6                                       No
##   If.the.answer.to.your.previous.question.was.yes..what.minor...Answer.in.full..Ex..Computer.Science.
## 1                                                                                                    
## 2                                                                                                  DS
## 3                                                                                                    
## 4                                                                                                    
## 5                                                                                                    
## 6                                                                                                    
##   On.a.scale.of.1.10..how.much.do.you.like.BU.
## 1                                            8
## 2                                            7
## 3                                            8
## 4                                            8
## 5                                            6
## 6                                            8
##   On.a.scale.of.1.10..how.confident.are.you.in.your.academics.
## 1                                                            7
## 2                                                            7
## 3                                                            3
## 4                                                            9
## 5                                                            5
## 6                                                            3
##   On.a.scale.of.1.10..how.confident.are.you.that.you.will.get.a.job.after.graduation.
## 1                                                                                   8
## 2                                                                                   4
## 3                                                                                   2
## 4                                                                                   4
## 5                                                                                   2
## 6                                                                                   2
##   On.a.scale.of.1.10..how.often.are.you.anxious.about.the.future.
## 1                                                               5
## 2                                                               9
## 3                                                              10
## 4                                                               7
## 5                                                              10
## 6                                                              10
##   How.many.chickens.can.fit.in.the.basement.of.CAS.
## 1                                          Infinite
## 2                                        At least 5
## 3                                        643,022.76
## 4                                      at least two
## 5                                          infinite
## 6                                           200,000
##   How.many.zombies.do.you.think.you.could.kill.in.the.zombie.apocalypse.
## 1                                                                     10
## 2       Depends on the equipment but my delusional ass thinks at least 5
## 3                                                                      3
## 4                                               none i'd be patient zero
## 5                                          depends, at min like maybe 2?
## 6                                i’d get to like 4 them just give up tbh
##                                                                If.you.could.have.an.infinite.number.of.goats..how.many.goats.would.you.want.
## 1                                                                                                                                          2
## 2 I do not want goats, so as many as I could realistically sell for profit without over-saturating the market or endangering their wellbeing
## 3                                                                                                                                          4
## 4                                                                                                                                ...one?????
## 5                                                                                                                                   infinite
## 6                                                                                                                                          2
##   What.is.your.favorite.color.
## 1                       purple
## 2                       Yellow
## 3                       805999
## 4                      #dabbed
## 5         teal or forest green
## 6                       Purple
##   Use.one.word.to.describe.how.do.you.feel.about.BU. Column.17
## 1                                      multicultural        NA
## 2                                            Nuanced        NA
## 3                                          Wonderous        NA
## 4                                                lol        NA
## 5                                                meh        NA
## 6                                       Labyrinthian        NA

Challenge 1: Reviewing Survey Data

Take a moment to explore the raw survey data above. Before we begin analysis, consider the following questions:

  1. What is wrong with this dataset?

  2. What will likely happen if we run an analysis with it right now?

  3. What challenges or limitations would this impose on our research?

Why Cleaning Your Data Matters

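Before conducting any kind of analysis, it’s essential to clean your data. Dirty or inconsistent data can lead to misleading results or errors in your analysis. Common cleaning steps, all of which apply to the raw responses above, include:

  • Renaming long, auto-generated column names to short, consistent ones.

  • Stripping units and symbols (e.g., “180lbs”) so that numeric answers are stored as numbers.

  • Standardizing spelling, capitalization, and abbreviations (e.g., “cs” and “Computer Science”, “purple” and “Purple”).

  • Converting free-text answers to a usable value, or marking them as missing.

  • Dropping empty or irrelevant columns (e.g., Column.17).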

You can clean your data using R (e.g., mutate(), rename(), filter() functions in dplyr), or manually in a spreadsheet if the dataset is small. The cleaned version used in this module is already processed for demonstration.
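As a rough illustration of the dplyr functions just mentioned, here is a minimal sketch that cleans a few rows copied from the raw output above. The shortened column names (height_raw, weight_raw) and the helper parse_height() are assumptions made up for this example; this is not necessarily how the cleaned file used in this module was produced.

```r
library(dplyr)

# Toy rows copied from the raw head(d) output; height/weight column names
# are shortened here for readability (assumption, not the real names)
raw <- data.frame(
  How.old.are.you. = c(21, 20, 21),
  height_raw = c("6'9", "69", "60\""),
  weight_raw = c("180lbs", "200", "125lbs"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn feet'inches (e.g. "6'9") or plain inches into numeric inches
parse_height <- function(x) {
  x <- gsub("\"", "", x)                          # drop a trailing inch mark
  has_feet <- grepl("'", x, fixed = TRUE)          # does the answer use feet'inches?
  feet <- ifelse(has_feet, as.numeric(sub("'.*", "", x)), 0)
  inches <- as.numeric(sub(".*'", "", x))          # part after the apostrophe, or the whole string
  feet * 12 + inches
}

cleaned <- raw %>%
  rename(Age = How.old.are.you.) %>%               # short, consistent column name
  mutate(
    Height_In = parse_height(height_raw),
    Weight_lb = as.numeric(gsub("[^0-9.]", "", weight_raw))  # keep digits and decimal point only
  ) %>%
  filter(!is.na(Height_In), !is.na(Weight_lb)) %>% # drop rows that could not be parsed
  select(Age, Height_In, Weight_lb)

print(cleaned)
```

Free-text answers such as “at least two” cannot be handled by rules like these; they need a manual decision about what value, if any, to record.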

Now load this cleaned dataset:

# Load the dataset
cleaned_data <- curl("https://github.com/ZeddyCraft/AN588-Group-Presentation/raw/refs/heads/main/AN588%20Survey%20(Responses)%20-%20Cleaned%20Data.csv")
d <- read.csv(cleaned_data, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(d)
##   Age     Gender Height.In. Weight.lb.             Major
## 1  21     Female         81        180  Computer Science
## 2  20       Male         69        200 Political Science
## 3  19     Female         64        155  Computer Science
## 4  22     Female         66        130   Marine Sciences
## 5  21 Non-Binary         60        125         Film & TV
## 6  20 Non-Binary         65        185          Painting
##                     Second_Major        Minor Like_BU Academic_Confidence
## 1                                                   8                   7
## 2        International Relations Data Science       7                   7
## 3                                                   8                   3
## 4 Earth & Environmental Sciences                    8                   9
## 5                    Advertising                    6                   5
## 6                                                   8                   3
##   Job_Confidence Future_Anxious Chickens_Basement Zombies_Could_Kill
## 1              8              5             1E+43                 10
## 2              4              9                 5                  5
## 3              2             10        643,022.76                  3
## 4              4              7                 2                  0
## 5              2             10             1E+43                  2
## 6              2             10           200,000                  4
##   Goat_Number Favorite_Color BU_Description
## 1       2e+00         Purple  Multicultural
## 2       5e+02         Yellow        Nuanced
## 3       4e+00         Purple      Wonderous
## 4       1e+00         Purple            lol
## 5       1e+12           Teal            Meh
## 6       2e+00         Purple   Labyrinthian
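Even a cleaned file deserves a quick type check. In the output above, Chickens_Basement still shows entries such as 643,022.76, which suggests that R read the column as text rather than numbers. The sketch below uses only the six values visible above and is one possible way to convert them. The variable names are made up for this example.

```r
# Values copied from the Chickens_Basement column in the head(d) output above
chickens <- c("1E+43", "5", "643,022.76", "2", "1E+43", "200,000")

# Entries with thousands separators force the whole vector to be character
print(class(chickens))

# Remove the commas, then convert; "1E+43" is parsed as scientific notation
chickens_num <- as.numeric(gsub(",", "", chickens))
print(chickens_num)

# 1E+43 stands in for "infinite" answers, so the median is a safer summary than the mean
chickens_median <- median(chickens_num)
print(chickens_median)
```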

Discussion: What differences do you see between this dataset and the raw version? How do they influence our research going forward?

TAKEAWAY: Clean your data!!
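Once the data are clean, descriptive statistics (the second objective of this module) take only a few base R calls. The sketch below uses the six rows shown in the cleaned output above, not the full dataset, so the numbers are illustrative only. The object names are assumptions for this example.

```r
# Rows copied from the head(d) output of the cleaned dataset above
survey <- data.frame(
  Age     = c(21, 20, 19, 22, 21, 20),
  Gender  = c("Female", "Male", "Female", "Female", "Non-Binary", "Non-Binary"),
  Like_BU = c(8, 7, 8, 8, 6, 8)
)

# Central tendency and spread for numeric columns
mean_age    <- mean(survey$Age)
sd_age      <- sd(survey$Age)
median_like <- median(survey$Like_BU)

# Counts and proportions for a categorical column
gender_counts <- table(survey$Gender)
gender_props  <- prop.table(gender_counts)

# Mean rating within each gender group
like_by_gender <- aggregate(Like_BU ~ Gender, data = survey, FUN = mean)

print(c(mean_age = mean_age, sd_age = sd_age, median_like = median_like))
print(gender_counts)
print(round(gender_props, 2))
print(like_by_gender)
```

Running the same calls on d instead of the toy data frame would summarize all respondents.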

Challenge 2: Test a Hypothesis

Topic: Regression Modeling in Survey Research

Objective: Use basic regression models to compare the predictive power of confidence and anxiety on job security.

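survey_data <- d
# Pull the three survey columns out as standalone vectors
Confidence <- d$Academic_Confidence
Anxiety <- d$Future_Anxious
JobSecurity <- d$Job_Confidence  # confidence in getting a job after graduation
# Two regression models; these vectors are not columns of survey_data,
# so lm() falls back to the copies defined above
model_conf <- lm(JobSecurity ~ Confidence, data = survey_data)
model_anx <- lm(JobSecurity ~ Anxiety, data = survey_data)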

summary(model_conf)
## 
## Call:
## lm(formula = JobSecurity ~ Confidence, data = survey_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8393 -2.3072 -0.3072  2.6286  5.5644 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   2.5640     1.3163   1.948   0.0605 .
## Confidence    0.4679     0.1972   2.372   0.0241 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.676 on 31 degrees of freedom
## Multiple R-squared:  0.1537, Adjusted R-squared:  0.1264 
## F-statistic: 5.628 on 1 and 31 DF,  p-value: 0.02406
summary(model_anx)
## 
## Call:
## lm(formula = JobSecurity ~ Anxiety, data = survey_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.8423 -1.7497 -0.1109  1.8891  4.8891 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.5736     1.3900   6.887 1.01e-07 ***
## Anxiety      -0.5463     0.1761  -3.103  0.00407 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.541 on 31 degrees of freedom
## Multiple R-squared:  0.237,  Adjusted R-squared:  0.2124 
## F-statistic: 9.627 on 1 and 31 DF,  p-value: 0.004068
# Compare R-squared values
conf_r2 <- summary(model_conf)$r.squared
anx_r2 <- summary(model_anx)$r.squared

bar_data <- tibble(
  Predictor = c("Academic Confidence", "Future Anxiety"),
  R_squared = c(conf_r2, anx_r2)
)
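To make the quantity being compared more concrete, here is a small sketch of what R-squared measures: the share of the outcome’s variance that the model accounts for. It uses only the first six rows of the cleaned data shown earlier, so its value will not match the full-sample output above. The variable names are assumptions for this example.

```r
# Toy values copied from the first six rows of the cleaned data
confidence <- c(7, 7, 3, 9, 5, 3)   # Academic_Confidence
job        <- c(8, 4, 2, 4, 2, 2)   # Job_Confidence

fit <- lm(job ~ confidence)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((job - mean(job))^2)
r2_manual <- 1 - ss_res / ss_tot

# With a single predictor, R-squared equals the squared Pearson correlation
r2_cor <- cor(confidence, job)^2

print(c(manual = r2_manual,
        from_summary = summary(fit)$r.squared,
        squared_cor = r2_cor))
```

All three numbers agree, which is why comparing the R-squared of two single-predictor models amounts to comparing how strongly each predictor correlates with the outcome.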

Let’s visualize it:

# Barplot
ggplot(bar_data, aes(x = Predictor, y = R_squared, fill = Predictor)) +
  geom_col(show.legend = FALSE) +
  ylim(0, 1) +
  labs(
    title = "Prediction of Job Security by Confidence vs. Anxiety",
    y = "R-squared Value",
    x = "Predictor"
  ) +
  theme_minimal()

# Scatterplot of the individual responses with a fitted regression line
ggplot(d, aes(x = Academic_Confidence, y = Job_Confidence)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Academic Confidence vs. Job Security",
       x = "Academic Confidence",
       y = "Job Security")
## `geom_smooth()` using formula = 'y ~ x'

Discussion Prompt: Which predictor has a stronger relationship with perceived job security? What might this tell us about how confidence and anxiety influence perceptions of the future?

Takeaway: Regression helps test relationships between variables — in this case, identifying whether emotional or academic factors are more influential on students’ views of their future.
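One possible follow-up, which is not part of the analysis above, is to put both predictors in a single model so that each coefficient is estimated while holding the other predictor constant. The sketch below assumes the cleaned column names and uses only the six rows shown earlier, so its estimates are illustrative and far too noisy to interpret.

```r
# Toy rows copied from the head(d) output of the cleaned dataset
toy <- data.frame(
  Academic_Confidence = c(7, 7, 3, 9, 5, 3),
  Future_Anxious      = c(5, 9, 10, 7, 10, 10),
  Job_Confidence      = c(8, 4, 2, 4, 2, 2)
)

# Single-predictor models, mirroring the two models in this challenge
m_conf <- lm(Job_Confidence ~ Academic_Confidence, data = toy)
m_anx  <- lm(Job_Confidence ~ Future_Anxious, data = toy)

# Both predictors together
m_both <- lm(Job_Confidence ~ Academic_Confidence + Future_Anxious, data = toy)

r2 <- c(
  confidence = summary(m_conf)$r.squared,
  anxiety    = summary(m_anx)$r.squared,
  both       = summary(m_both)$r.squared
)
print(round(r2, 3))
print(coef(m_both))
```

Adding a predictor can never lower R-squared, so the adjusted R-squared reported by summary() is the fairer number when comparing models of different sizes.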

Qualitative Statistics

One of the most useful parts of a survey is its qualitative data: data represented by words or phrases rather than numbers.

Take, for example, the “Favorite_Color” column, based on the prompt “What is your favorite color?”.

head(d$Favorite_Color)
## [1] "Purple" "Yellow" "Purple" "Purple" "Teal"   "Purple"

How can we organize this data? How can we analyze it? Summaries designed for numbers, such as means or scatterplots, don’t apply to categories like these, so we need other tools. One option is a word cloud. A word cloud scales the size of each word by how often it appears, giving us a visual representation of the responses. The wordcloud2 package provides a function, also called wordcloud2, that lets us create these in R.

library(wordcloud2)

# First, let's find the frequency (count) of each color response
t <- table(d$Favorite_Color)
wordcloud2(t, size = 0.8, color = c("Black", "Blue", "Brown", "Cyan", "Green", "Pink", "Purple", "Red", "Teal", "Yellow"))
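For a closer look at what goes into the cloud, the sketch below builds the frequency table by hand and reshapes it into a data frame with a word column and a freq column, which is the layout wordcloud2 documents for its data argument. The six color values are illustrative (mixed capitalization is added on purpose), and the object names are assumptions for this example.

```r
# Illustrative responses with inconsistent capitalization
colors <- c("purple", "Yellow", "Purple", "Purple", "teal", "Purple")

# Normalize capitalization so "purple" and "Purple" are counted together
colors_clean <- paste0(toupper(substring(colors, 1, 1)),
                       tolower(substring(colors, 2)))

# Count each response and sort from most to least frequent
freq <- sort(table(colors_clean), decreasing = TRUE)

# Two-column data frame: one row per word with its count
cloud_input <- data.frame(
  word = names(freq),
  freq = as.vector(freq),
  stringsAsFactors = FALSE
)
print(cloud_input)

# If wordcloud2 is installed, the data frame can be passed straight in:
# wordcloud2::wordcloud2(cloud_input, size = 0.8)
```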

That’s one visualization! Word clouds can be an amazing tool for your data visuals. Another common form of visualization is graphs. The two most common types for categorical data are pie graphs and bar graphs. Let’s make a few graphs using the respondents’ favorite colors!

ggplot(d, aes(x = Favorite_Color, fill = Favorite_Color)) +
  geom_bar(show.legend = FALSE) +
  scale_fill_manual(values = c("black", "blue", "brown", "cyan", "green", "pink", "purple", "red", "turquoise", "yellow"))

pie(x = t, col = c("black", "blue", "brown", "cyan", "green", "pink", "purple", "red", "turquoise", "yellow"))

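Notice that we used base R’s pie() above: ggplot2 doesn’t have a dedicated function for pie charts. You can still make pie charts using ggplot2, though, by drawing a stacked bar chart and switching to polar coordinates with coord_polar(). Here’s a good guide on how to do so!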
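A common way to get a pie chart out of ggplot2 is to draw a single stacked bar and wrap it around a circle with coord_polar(). The sketch below assumes the six favorite colors shown in the cleaned output above. The object names and the fill colors are choices made for this example.

```r
library(ggplot2)

# Favorite colors copied from the head(d) output of the cleaned dataset
colors <- c("Purple", "Yellow", "Purple", "Purple", "Teal", "Purple")

# One row per color with its count (columns: Favorite_Color, Freq)
counts <- as.data.frame(table(Favorite_Color = colors))

p <- ggplot(counts, aes(x = "", y = Freq, fill = Favorite_Color)) +
  geom_col(width = 1) +                 # a single stacked bar
  coord_polar(theta = "y") +            # wrap the bar into a circle
  scale_fill_manual(values = c(Purple = "purple", Teal = "turquoise", Yellow = "yellow")) +
  theme_void() +                        # remove axes, which mean little on a pie
  labs(title = "Favorite Color", fill = "Color")

print(p)
```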

Challenge 3

Now it’s your turn. Let’s try making a word cloud and a bar graph with the Major column.

t2<-table(d$Major)
wordcloud2(t2, size=.4, color="random-dark")
ggplot(d, aes(x=Major,fill=Major))+
  geom_bar()+
  theme(axis.text.x=element_blank(), legend.key.size=unit(.1, "cm"))

Discussion Prompt: Which major appears most frequently in this dataset? Are there any interesting majors in there?
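The first question can also be checked numerically instead of by eye. The sketch below uses only the six majors visible in the cleaned output earlier, so the answer for the full dataset may differ. The object names are assumptions for this example.

```r
# Majors copied from the head(d) output of the cleaned dataset
majors <- c("Computer Science", "Political Science", "Computer Science",
            "Marine Sciences", "Film & TV", "Painting")

# Count each major and sort from most to least frequent
major_counts <- sort(table(majors), decreasing = TRUE)

# The most frequent major and its share of all responses
top_major <- names(major_counts)[1]
top_share <- as.numeric(major_counts[1]) / length(majors)

print(major_counts)
print(top_major)
print(round(top_share, 2))
```

Replacing majors with d$Major applies the same steps to every respondent.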

Takeaway: Text data adds depth — a word cloud offers a quick thematic scan, helping anthropologists connect numbers to narratives.

Conclusion

Survey data is an incredibly flexible tool for anthropologists. In this module, we explored descriptive stats, modeling, and qualitative visualization — all from one simple dataset. The skills here are the foundation for both research and applied work.