Central Tendency (R, Python, Excel, JASP)
Central tendency describes typical values of a variable, such as its mean and median.
Mean
The mean is often called the “average” informally, but is actually a specific type of “average”. The mean is the average you get when you add together a group of values, and then divide by the number of items you combined.
For example, to calculate the mean life expectancy of countries in 2007, we’ll use the gapminder data:
# install (if required) and load the gapminder data
if(!require(gapminder)){install.packages("gapminder")}
Loading required package: gapminder
library(gapminder)
# create a new data frame that only focuses on data from 2007
gapminder_2007 <- subset(
  gapminder, # the data set
  year == 2007
)
# a reminder of the data frame
rmarkdown::paged_table(head(gapminder_2007))
# total life expectancy across all countries in 2007
sum_life_expectancy = sum(gapminder_2007$lifeExp)
# count the countries (the number of data points)
n_life_expectancy = length(gapminder_2007$lifeExp)
# calculate mean life expectancy
mean_life_expectancy = sum_life_expectancy/n_life_expectancy
mean_life_expectancy
[1] 67.00742
# load the gapminder module and import the gapminder dataset
from gapminder import gapminder
# import tabulate to display tables
from tabulate import tabulate
# create a new data frame that only focuses on data from 2007
gapminder_2007 = gapminder.loc[gapminder['year'] == 2007]
# display the first six rows as a table, using the column names as headers
print(tabulate(gapminder_2007[:6], headers="keys", tablefmt="fancy_grid", showindex=False))
# total life expectancy across all countries in 2007
sum_life_expectancy = gapminder_2007['lifeExp'].sum()
# count the countries (the number of data points)
n_life_expectancy = gapminder_2007['lifeExp'].count()
# calculate mean life expectancy
mean_life_expectancy = sum_life_expectancy/n_life_expectancy
mean_life_expectancy
67.00742253521126
You should be able to access an Excel spreadsheet of the gapminder data here.
To calculate the mean you can calculate the total of all scores using sum, and then divide by the number of items using count:
Which would give you 59.47443937. However, if you would like to calculate the mean for just the year 2007, you would need to use sumifs and countifs:
Which gives us 67.0074225352113.
You may want to skip to the next section to use the mean functionality in JASP. However, if you would like to calculate the sum and number of data points to manually calculate the mean, use the descriptives functionality and make sure the Valid (how many valid data points there are) and Sum checkboxes are ticked.
You would then divide 101344.445 by 1704 to get 59.47444.
If you want the mean for 2007 specifically, you’ll need to apply a filter first and then complete the same steps to get the descriptives:
You would then divide 9515.054 by 142 to get 67.00742.
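If you want to double check the two divisions above, a minimal Python sketch is shown here. The sums and counts are the values quoted above from the JASP descriptives; the variable names are only illustrative.

```python
# Sum and Valid (count) values as quoted from the JASP descriptives above
overall_sum, overall_n = 101344.445, 1704   # all years
sum_2007, n_2007 = 9515.054, 142            # 2007 only

# mean = total divided by the number of data points
overall_mean = overall_sum / overall_n
mean_2007 = sum_2007 / n_2007

# rounded to 5 decimal places to match the figures quoted above
print(round(overall_mean, 5))
print(round(mean_2007, 5))
```

Because the sums are themselves rounded to three decimal places, the results should only be expected to agree with the other tools to about five decimal places.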
For those of you who like to double check these things (which is a good instinct), let’s see what number you get if you use a direct function for mean:
mean(gapminder_2007$lifeExp)
[1] 67.00742
gapminder_2007['lifeExp'].mean()
67.00742253521126
We use the average function to calculate the mean in Excel:
But if we just want the mean of 2007, then we would use the averageifs function:
Which gives us 67.0074225352113.
If you want the overall mean in JASP, use the descriptives functionality and make sure the mean is selected:
If you want the mean for 2007 specifically, you’ll need to apply a filter first and then complete the same steps to get the descriptives:
Whew - it’s the same as the manual calculation above.
Median
The median is less well known than the mean. It is the value in the middle once you sort your data in ascending or descending order; if there is an even number of values, it is the average of the two middle values. It’s well explained in the first paragraph on Wikipedia: https://en.wikipedia.org/wiki/Median, so I would suggest looking there. The mean and median are not always the same (in fact, they are usually at least slightly different; remember the mean was 67.00742 when looking at the medians below).
median(gapminder_2007$lifeExp)
[1] 71.9355
gapminder_2007['lifeExp'].median()
71.93549999999999
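To see what a median function is doing, here is a minimal sketch of the "sort, then take the middle" idea in plain Python. The helper name median_by_hand and the sample numbers are made up for illustration (they are not gapminder data); the result is compared against Python's built-in statistics.median.

```python
import statistics

def median_by_hand(values):
    """Hypothetical helper: middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no data is undefined")
    middle = n // 2
    if n % 2 == 1:
        # odd number of values: a single middle value exists
        return ordered[middle]
    # even number of values: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

# made-up illustrative values
odd_sample = [3, 1, 4, 1, 5]
even_sample = [3, 1, 4, 1, 5, 9]

print(median_by_hand(odd_sample))   # 3
print(median_by_hand(even_sample))  # 3.5

# both should agree with the standard library
print(statistics.median(odd_sample), statistics.median(even_sample))
```

The 2007 data has 142 countries, an even number, so its median is the average of the two middle life expectancies.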
Just use the median function:
Which should be 60.7125. Or combine median and if using the following structure:
=median(if(criterion for selecting rows, data you want median from))
Which should be 71.9355.
Using the descriptives functionality we can get the median:
If you want the median for 2007 specifically, you’ll need to apply a filter first:
Question 1
Which of the following is most influenced by outliers?