Questionnaires and surveys often contain items that reflect multiple (possibly related) factors, also known as components or sub-scales, within a single measure. In Confirmatory Factor Analysis (CFA) you have already allocated items to specific factors (e.g. based on the literature), and need to confirm whether these allocations are valid.
For example, the empathy quotient [@Lawrence2004] has multiple types of empathy within it:
- cognitive empathy
- emotional reactivity
- social skills
Let’s use a publicly available data set to check whether the allocation of items to each of these three subscales is consistent with the current data. The items in each subscale are:
Cognitive Empathy:
14: I am good at predicting how someone will feel.
15: I am quick to spot when someone in a group is feeling awkward or uncomfortable.
29: I can sense if I am intruding, even if the other person doesn’t tell me.
34: I can tune into how someone else feels rapidly and intuitively.
35: I can easily work out what another person might want to talk about.
Social Skills:
2: I find it difficult to explain to others things that I understand easily, when they don’t understand it first time.
4: I find it hard to know what to do in a social situation.
7: Friendships and relationships are just too difficult, so I tend not to bother with them.
8: I often find it difficult to judge if something is rude or polite.
21: I don’t tend to find social situations confusing.
Emotional Reactivity:
3: I really enjoy caring for other people.
16: If I say something that someone else is offended by, I think that that’s their problem, not mine.
19: Seeing people cry doesn’t really upset me.
33: I usually stay emotionally detached when watching a film.
39: I tend to get emotionally involved with a friend’s problems.
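The allocation listed above can be written down as a confirmatory model. The following is a minimal sketch, assuming the `lavaan` package (which this section does not name) and simulated stand-in data, because the real data frame is not constructed here. The factor names `cog`, `soc` and `emo`, the object names, and the loading of 0.6 used for the simulation are illustrative assumptions, not values from the original analysis.

```r
library(lavaan)

# Items allocated to each subscale, as listed above
items <- list(
  cog = c("eq14", "eq15", "eq29", "eq34", "eq35"),
  soc = c("eq2", "eq4", "eq7", "eq8", "eq21"),
  emo = c("eq3", "eq16", "eq19", "eq33", "eq39")
)

# Measurement model: each latent factor is defined by its five items
eq_model <- "
  cog =~ eq14 + eq15 + eq29 + eq34 + eq35
  soc =~ eq2 + eq4 + eq7 + eq8 + eq21
  emo =~ eq3 + eq16 + eq19 + eq33 + eq39
"

# Stand-in data (assumption): simulate from a population model in which
# every item loads 0.6 on its own factor; factor covariances are left at
# simulateData()'s defaults
pop_model <- paste(
  sapply(names(items), function(f) {
    paste0(f, " =~ ", paste0("0.6*", items[[f]], collapse = " + "))
  }),
  collapse = "\n"
)
set.seed(1)
sim_data <- simulateData(pop_model, sample.nobs = 500)

# Fit the confirmatory model and inspect common fit indices and loadings
fit <- cfa(eq_model, data = sim_data)
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
standardizedSolution(fit)
```

With real data, `sim_data` would be replaced by the processed questionnaire responses; the fit indices then indicate how well the three-subscale allocation reproduces the observed covariances.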
Let’s start by looking at correlation matrices to see if the items tend to correlate with each other in the current data set:
| | eq14 | eq15 | eq29 | eq34 | eq35 |
|---|---|---|---|---|---|
| eq14 | 1.0000000 | 0.34501695 | 0.2905797 | 0.4416185 | 0.27504384 |
| eq15 | 0.3450170 | 1.00000000 | 0.3721252 | 0.2532783 | 0.07192026 |
| eq29 | 0.2905797 | 0.37212520 | 1.0000000 | 0.2468024 | 0.28854264 |
| eq34 | 0.4416185 | 0.25327834 | 0.2468024 | 1.0000000 | 0.23509011 |
| eq35 | 0.2750438 | 0.07192026 | 0.2885426 | 0.2350901 | 1.00000000 |
So far this looks good: every pair of items correlates positively, and most correlations exceed r = .25. The clear exception is eq15 with eq35 (r = .07).
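A correlation matrix like the one above can be produced with base R's `cor()`. The sketch below uses simulated stand-in data, since the real data frame is not built in this section; `eq_demo`, `trait` and the loading of 0.6 are hypothetical and only serve to make the example runnable.

```r
set.seed(42)
n <- 200

# Stand-in data (assumption): five items driven by one shared trait plus noise
trait <- rnorm(n)
cog_items <- c("eq14", "eq15", "eq29", "eq34", "eq35")
eq_demo <- as.data.frame(
  sapply(cog_items, function(item) 0.6 * trait + rnorm(n))
)

# Pairwise Pearson correlations between the items of one subscale;
# "pairwise.complete.obs" tolerates missing responses in real data
cog_cor <- cor(eq_demo[, cog_items], use = "pairwise.complete.obs")
round(cog_cor, 2)
```

Selecting the columns by name keeps the subscale definition explicit, so the same call can be repeated for the other two subscales by swapping the item vector.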
| | eq3 | eq16 | eq19 | eq33 | eq39 |
|---|---|---|---|---|---|
| eq3 | 1.0000000 | 0.19886409 | 0.23743638 | 0.2444680 | 0.3494996 |
| eq16 | 0.1988641 | 1.00000000 | 0.09384386 | 0.2801024 | 0.2080836 |
| eq19 | 0.2374364 | 0.09384386 | 1.00000000 | 0.1884394 | 0.3160680 |
| eq33 | 0.2444680 | 0.28010241 | 0.18843936 | 1.0000000 | 0.2923009 |
| eq39 | 0.3494996 | 0.20808361 | 0.31606800 | 0.2923009 | 1.0000000 |
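The emotional reactivity items are also all positively correlated, although the correlations are somewhat weaker than for cognitive empathy (the lowest, between eq16 and eq19, is r = .09).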
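To make the comparison between the two subscales less dependent on eyeballing, the off-diagonal correlations reported above can be summarised numerically. This is a small sketch using the values copied from the two matrices; the helper `summarise_r` and the .25 threshold (taken from the text above) are illustrative choices, not part of the original analysis.

```r
# Lower-triangle correlations copied from the matrices above (column by column)
r_cog <- c(0.3450170, 0.2905797, 0.4416185, 0.2750438,
           0.3721252, 0.2532783, 0.07192026,
           0.2468024, 0.2885426,
           0.2350901)
r_emo <- c(0.1988641, 0.2374364, 0.2444680, 0.3494996,
           0.09384386, 0.2801024, 0.2080836,
           0.1884394, 0.3160680,
           0.2923009)

# Hypothetical helper: mean, minimum, and share of correlations above .25
summarise_r <- function(r) {
  c(mean_r = mean(r), min_r = min(r), prop_above_25 = mean(r > 0.25))
}

round(rbind(cognitive = summarise_r(r_cog),
            emotional = summarise_r(r_emo)), 3)
```

The mean inter-item correlation is only a rough summary; it does not replace a fitted factor model, but it gives a quick sense of how tightly each set of items hangs together.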
Let’s now see how well each item loads onto the total of these scores:
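# One column of summed scores per subscale
eq_subscales_r <- data.frame(
  matrix(data = c(
    # Cog
    rowSums(eq_processed[, c("eq14", "eq15", "eq29", "eq34", "eq35")]),
    # Social Skills
    rowSums(eq_processed[, c("eq2", "eq4", "eq7", "eq8", "eq21")]),
    # Emotion
    rowSums(eq_processed[, c("eq3", "eq16", "eq19", "eq33", "eq39")])
  ), ncol = 3)
)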
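How well an item tracks its subscale total is commonly summarised as an item-total correlation. The following is a minimal sketch on simulated stand-in data; `eq_demo` and the helper `item_total_r` are hypothetical names, not part of the original analysis. The "corrected" version removes the item from the total first, so that the item is not partly correlated with itself.

```r
set.seed(7)
n <- 300

# Stand-in data (assumption): five items sharing one underlying trait
trait <- rnorm(n)
soc_items <- c("eq2", "eq4", "eq7", "eq8", "eq21")
eq_demo <- as.data.frame(
  sapply(soc_items, function(item) 0.6 * trait + rnorm(n))
)

# Hypothetical helper: raw and corrected item-total correlations
item_total_r <- function(data, items) {
  total <- rowSums(data[, items])
  sapply(items, function(item) {
    c(raw       = cor(data[[item]], total),
      corrected = cor(data[[item]], total - data[[item]]))
  })
}

res <- item_total_r(eq_demo, soc_items)
round(res, 2)
```

An item whose corrected correlation is much lower than that of its neighbours is a candidate for a closer look, for instance because it is worded differently or belongs to another factor.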
trying to create a proper table…
Call: psych::fa(r = eq_processed, nfactors = 3, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
# from https://www.anthonyschmidt.co/post/2020-09-27-efa-tables-in-r/
library(flextable)
library(dplyr)  # provides the %>% pipe

flex <- function(data, title = NULL) {
  # this grabs the data and converts it to a flextable
  flextable(data) %>%
    # this makes the table fill the page width
    set_table_properties(layout = "autofit", width = 1) %>%
    # font size
    fontsize(size = 10, part = "all") %>%
    # this adds a title and creates an automatic table number
    set_caption(title,
                autonum = officer::run_autonum(seq_id = "tab",
                                               pre_label = "Table ",
                                               post_label = "\n",
                                               bkm = "anytable")) %>%
    # font type
    font(fontname = "Times New Roman", part = "all")
}
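library(dplyr)   # %>%, rename(), left_join(), mutate(), across()
library(tibble)  # rownames_to_column()

fa_table <- function(x, cut) {
  # get sorted loadings
  loadings <- psych::fa.sort(x)$loadings %>% round(3)
  # suppress loadings below the cut-off (note: this also blanks negative
  # loadings; compare abs(loadings) instead to keep strong negative ones)
  loadings[loadings < cut] <- ""
  # get additional info
  add_info <- cbind(x$communalities,
                    x$uniquenesses,
                    x$complexity) %>%
    # make it a data frame
    as.data.frame() %>%
    # column names
    rename("Communality" = V1,
           "Uniqueness" = V2,
           "Complexity" = V3) %>%
    # get the item names from the row names
    rownames_to_column("item")
  # build table
  loadings %>%
    unclass() %>%
    as.data.frame() %>%
    rownames_to_column("item") %>%
    left_join(add_info, by = "item") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}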
psych::fa(eq_processed, nfactors = 3, rotate="oblimin"),
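| | item | Loading (blank if below cut) | Communality | Uniqueness | Complexity |
|---|---|---|---|---|---|
| 1 | eq22 | 0.617 | 0.454 | 0.546 | 1.373 |
| 2 | eq14 | 0.613 | 0.430 | 0.570 | 1.287 |
| 3 | eq34 | 0.591 | 0.414 | 0.586 | 1.371 |
| 4 | eq40 | 0.584 | 0.419 | 0.581 | 1.434 |
| 5 | eq13 | 0.569 | 0.330 | 0.670 | 1.036 |
| 6 | eq29 | 0.544 | 0.337 | 0.663 | 1.280 |
| 7 | eq28 | 0.535 | 0.413 | 0.587 | 1.857 |
| 8 | eq1 | 0.532 | 0.297 | 0.703 | 1.099 |
| 9 | eq36 | 0.515 | 0.516 | 0.484 | 2.063 |
| 10 | eq3 | | 0.287 | 0.713 | 1.437 |
| 11 | eq39 | | 0.319 | 0.681 | 1.748 |
| 12 | eq18 | | 0.394 | 0.606 | 2.360 |
| 13 | eq15 | | 0.246 | 0.754 | 1.679 |
| 14 | eq26 | | 0.370 | 0.630 | 2.167 |
| 15 | eq16 | | 0.347 | 0.653 | 2.009 |
| 16 | eq35 | | 0.283 | 0.717 | 2.294 |
| 17 | eq19 | | 0.150 | 0.850 | 1.261 |
| 18 | eq33 | | 0.209 | 0.791 | 2.105 |
| 19 | eq8 | | 0.141 | 0.859 | 1.885 |
| 20 | eq32 | | 0.162 | 0.838 | 2.589 |
| 21 | eq6 | | 0.085 | 0.915 | 1.623 |
| 22 | eq27 | | 0.060 | 0.940 | 1.117 |
| 23 | eq31 | 0.602 | 0.483 | 0.517 | 1.598 |
| 24 | eq5 | 0.589 | 0.397 | 0.603 | 1.285 |
| 25 | eq20 | 0.548 | 0.390 | 0.610 | 1.602 |
| 26 | eq38 | | 0.386 | 0.614 | 1.612 |
| 27 | eq11 | | 0.237 | 0.763 | 1.637 |
| 28 | eq30 | | 0.184 | 0.816 | 1.338 |
| 29 | eq9 | | 0.282 | 0.718 | 2.558 |
| 30 | eq17 | | 0.170 | 0.830 | 2.708 |
| 31 | eq7 | | 0.145 | 0.855 | 2.578 |
| 32 | eq4 | 0.564 | 0.325 | 0.675 | 1.041 |
| 33 | eq21 | | 0.249 | 0.751 | 1.176 |
| 34 | eq12 | | 0.398 | 0.602 | 2.493 |
| 35 | eq25 | | 0.286 | 0.714 | 2.195 |
| 36 | eq24 | | 0.146 | 0.854 | 2.639 |
| 37 | eq10 | | 0.142 | 0.858 | 2.085 |
| 38 | eq2 | | 0.130 | 0.870 | 2.510 |
| 39 | eq23 | | 0.055 | 0.945 | 2.305 |
| 40 | eq37 | | 0.013 | 0.987 | 2.339 |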
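# Sum of the five cognitive empathy items
cog_eq_sum <- (
  eq_processed$eq14 +
  eq_processed$eq15 +
  eq_processed$eq29 +
  eq_processed$eq34 +
  eq_processed$eq35
)
# Mean of the same five items
cog_eq_mean <- (
  eq_processed$eq14 +
  eq_processed$eq15 +
  eq_processed$eq29 +
  eq_processed$eq34 +
  eq_processed$eq35
) / 5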
lm(eq14 ~ cog_eq_sum, eq_processed)
lm(formula = eq14 ~ cog_eq_sum, data = eq_processed)
eigen() decomposition
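As one possible starting point, base R's `eigen()` can be applied directly to a correlation matrix. The sketch below uses the cognitive empathy correlations reported earlier in this section; the "eigenvalue greater than 1" count is a common rule of thumb (the Kaiser criterion) brought in here as an assumption, not something the original analysis is known to rely on.

```r
items <- c("eq14", "eq15", "eq29", "eq34", "eq35")

# Cognitive empathy correlation matrix, copied from the output shown earlier
cog_cor <- matrix(c(
  1.0000000, 0.3450170,  0.2905797, 0.4416185, 0.2750438,
  0.3450170, 1.0000000,  0.3721252, 0.2532783, 0.07192026,
  0.2905797, 0.3721252,  1.0000000, 0.2468024, 0.2885426,
  0.4416185, 0.2532783,  0.2468024, 1.0000000, 0.2350901,
  0.2750438, 0.07192026, 0.2885426, 0.2350901, 1.0000000
), nrow = 5, byrow = TRUE, dimnames = list(items, items))

# Eigen decomposition of the symmetric correlation matrix
eig <- eigen(cog_cor, symmetric = TRUE)

eig$values                    # eigenvalues, largest first
eig$values / sum(eig$values)  # share of total variance per component
sum(eig$values > 1)           # number of components with eigenvalue > 1
```

For a correlation matrix the eigenvalues sum to the number of items, so each eigenvalue can be read as "how many items' worth" of variance a component captures.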
To start
Social Skills
This is a bit less convincing, as item 8 (eq8) doesn’t consistently correlate with the other items in this subscale.