Statistics Basics
In this website you can choose to expand or shrink the page to match the level of understanding you want.
- If you do not expand any (green) subsections then you will only see the most superficial level of description about the statistics. If you expand the green subsections you will get details that are required to complete the tests, but perhaps not all the explanations for why the statistics work.
- If you expand the blue subsections you will also see some explanations that will give you a more complete understanding. If you are completing MSc-level statistics you would be expected to understand all the blue subsections.
- Red subsections will go deeper than what is expected at MSc level, such as testing higher level concepts.
Population vs. sample
Statistics generally involves analysing samples to allow us to draw conclusions about the population. For example, if we wanted to assess whether dogs are taller than cats on average, we could measure the height of every dog and every cat, and then we would have the population average for each animal. However, as there are hundreds of millions of dogs and cats, it’s not realistic to measure all of them. So we would instead recruit a sample of dogs and cats and then estimate the population average based on the animals in our sample. Much of the statistics on this website aims to draw conclusions about a population based on a sample.
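To make the population/sample distinction concrete, here is a minimal sketch that simulates a population and then estimates its average from a small sample. The population size, the average height of 50 cm and the spread of 10 cm are made-up values chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 dog heights in cm (made-up mean and spread).
population = [random.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# In practice we can only measure a sample, here 50 randomly chosen dogs.
sample = random.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.1f} cm")
print(f"sample mean:     {sample_mean:.1f} cm")
```

The sample mean will usually be close to, but not exactly the same as, the population mean. That gap is the uncertainty that statistical tests try to quantify.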
Bias
Bias can have a few meanings, but often refers to how recruitment or analysis of a sample will systematically (i.e. more often than chance) misrepresent the population. For example, if you wanted to investigate whether Dutch people tend to be taller than Swedish people, you would create bias if you only recruited males within the Dutch population and only females within the Swedish population, as males tend to be taller than females. Analyses can also introduce bias if they increase the risk of over- or underestimating an effect size. For example, if you did your statistics in such a way as to exaggerate the differences in height between Dutch and Swedish people, that would create bias.
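The recruitment example above can be sketched as a small simulation. All numbers here are assumptions made up for illustration (average heights of 178 cm for males and 165 cm for females, and a true Dutch-minus-Swedish difference of 2 cm); the function names are hypothetical too.

```python
import random
import statistics

random.seed(2)

def draw_height(sex, country_offset):
    # Made-up averages: males 178 cm, females 165 cm, plus a country offset.
    base = 178 if sex == "male" else 165
    return random.gauss(base + country_offset, 7)

def estimate_difference(dutch_sexes, swedish_sexes, n=200):
    # The true Dutch-minus-Swedish difference is set to 2 cm in this toy example.
    dutch = [draw_height(random.choice(dutch_sexes), 2) for _ in range(n)]
    swedish = [draw_height(random.choice(swedish_sexes), 0) for _ in range(n)]
    return statistics.fmean(dutch) - statistics.fmean(swedish)

both = ["male", "female"]
# Repeat each recruitment strategy 200 times to see what happens on average.
unbiased = [estimate_difference(both, both) for _ in range(200)]
biased = [estimate_difference(["male"], ["female"]) for _ in range(200)]

print(f"average unbiased estimate: {statistics.fmean(unbiased):.1f} cm")
print(f"average biased estimate:   {statistics.fmean(biased):.1f} cm")
```

With balanced recruitment the estimates scatter around the true difference, whereas recruiting only Dutch males and only Swedish females overestimates the difference every time. That consistent direction of error is what makes it bias rather than chance.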
What is a variable?
In statistics, a variable is any (measurable) attribute that describes an organism or object. It’s called a variable because it varies from organism to organism or object to object. Height is a good example of a variable within humans, as height differs from person to person.
Within coding, a variable tends to refer to a particular object in your code, such as a specific value, list, dataset, etc. In R, such a variable is usually called an object.
What is a hypothesis?
A(n experimental) hypothesis is a possible outcome for the study you will run. Sometimes researchers think in terms of a null hypothesis, which describes what you would expect if your (experimental) hypothesis is incorrect (typically that there is no effect).
What is a p-value?
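Oversimplification: A p-value tells you how likely it is that your hypothesis is correct (strictly speaking this is not what a p-value measures, which is why the definitions below are preferable)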
Better definition: How likely it is that you would get your current results (or more extreme ones) by chance (i.e. randomly) if your main hypothesis were not true. We assume that a result is meaningful if there is only a very small chance that it could have happened by accident.
Technical definition: The p-value is the probability of observing a particular (or more extreme) effect under the assumption that the null hypothesis is true (or the probability of the data given the null hypothesis: \(Pr(Data|H_0)\)).
To give a more concrete example:
If the observed difference between two means is small, then there is a high probability that the data underlying this difference could have occurred if the null (there is no difference) is true, and so the resulting p-value would be large.
In contrast, if the difference is huge, then the data underlying this difference is much less likely to have occurred if the null is true, and the subsequent p-value will be smaller to reflect the lower probability.
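The two cases above can be sketched with a permutation test, which is one simple way (not the only one) to estimate a p-value: shuffle the group labels many times, as if the null were true, and count how often the shuffled difference is at least as large as the observed one. The heights below are made-up numbers and `permutation_p_value` is a hypothetical helper written for this illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the data as if the null were true
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (as_extreme + 1) / (n_permutations + 1)

# Made-up heights (cm): one comparison with a small difference, one with a large one.
cats = [24, 25, 23, 26, 25, 24, 27, 25]
dogs_small_diff = [25, 24, 26, 25, 23, 27, 25, 26]
dogs_large_diff = [45, 52, 48, 60, 55, 47, 58, 50]

p_small = permutation_p_value(cats, dogs_small_diff)
p_large = permutation_p_value(cats, dogs_large_diff)
print(f"small difference: p = {p_small:.3f}")
print(f"large difference: p = {p_large:.4f}")
```

The small difference is easily reproduced by random relabelling, so its p-value is large. The large difference almost never is, so its p-value is very small.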
In terms of whether your research conclusions are valid, there are 4 broad possible outcomes:
Outcome | True | False |
---|---|---|
Positive | You conclude there is an effect in your sample and this reflects the general population (i.e. the effect you found was real). E.g. you conclude that big dogs have more fur than small dogs (presumably this is true as there is more dog to put fur on in big dogs, right?). | You conclude there is an effect in your sample, but this does not reflect the general population (i.e. the effect you found was not real). E.g. you conclude that half-full glasses have more water than half-empty glasses. This is a type 1 error. |
Negative | You conclude that there is no effect in your sample, and this reflects the general population (i.e. you are correct to say that there is no effect). E.g. you conclude that half-full glasses have as much water as half-empty glasses. | You conclude that there is no effect in your sample, but this does not reflect the general population (i.e. you are incorrect to say that there is no effect). E.g. you conclude that big dogs do not have more fur than small dogs. This is a type 2 error. |
What is the alpha value?
Alpha (\(\alpha\)) is the p-value threshold that determines whether a result is “significant” or not. Within psychology, the alpha value is conventionally .05: if the p-value is less than .05, the result is treated as “significant” (i.e. a result this extreme would be so unlikely to happen by chance that we conclude it didn’t happen randomly).
Technical definition: The alpha-level is the expected rate of false-positives or type 1 errors (in the long run). Under the null hypothesis all p-values are equally probable, and so the alpha value sets the chance that a null hypothesis is rejected incorrectly (i.e. we say there is an effect when there isn’t one).
Setting alpha at 0.05 is a convention that means we would incorrectly reject a true null hypothesis only 5% of the time, and if we wanted to be more or less strict with the false-positive rate, we could adjust this value (this has been a contentious issue in recent years, see here and here).
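The long-run meaning of alpha can be checked with a simulation. The sketch below runs many imaginary studies in which the null hypothesis is true (both groups are drawn from the same distribution) and counts how often the p-value falls below .05. It assumes a simple two-sample z-test with a known standard deviation of 1, which is a simplification; the group size and number of studies are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

random.seed(3)
ALPHA = 0.05
N_PER_GROUP = 30
N_STUDIES = 4000

def two_sample_z_p_value(a, b, sd=1.0):
    # Two-sided z-test, assuming the population SD is known (a simplification).
    standard_error = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

false_positives = 0
for _ in range(N_STUDIES):
    # Both groups come from the same distribution, so the null is true.
    group_a = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    group_b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    if two_sample_z_p_value(group_a, group_b) < ALPHA:
        false_positives += 1

false_positive_rate = false_positives / N_STUDIES
print(f"false-positive rate: {false_positive_rate:.3f}")
```

The printed rate should land close to .05. Changing `ALPHA` to a stricter value such as .01 should lower the false-positive rate accordingly.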
What is power (and what is the beta value)?
Power tells you how likely you are to get a significant result (assuming the effect you are investigating is real) based on:
- the sample size (how many participants you have). The more participants you have, the more powerful your analysis will be (i.e. the more likely you are to find a significant result).
- the effect size (e.g. Cohen’s d). The bigger the effect size, the more powerful your analysis will be. Effect sizes in power calculations are generally based on previous studies with similar paradigms.
- the alpha (\(\alpha\)) threshold. A smaller alpha threshold requires a higher powered analysis to get a significant result.
Beta (\(\beta\)) is the risk of a false-negative, so power is \(1-\beta\).
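To see how sample size, effect size and alpha each change power, here is a minimal sketch using a normal approximation for a two-sided, two-sample test with equal group sizes. `power_two_sample` is a hypothetical helper written for this illustration, and dedicated power-analysis software may give slightly different numbers because it typically uses the t-distribution.

```python
from statistics import NormalDist

def power_two_sample(n_per_group, effect_size_d, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is Cohen's d.
    shift = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability that the test statistic lands beyond either critical value.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

baseline = power_two_sample(n_per_group=50, effect_size_d=0.5)
print(f"baseline (n=50, d=0.5, alpha=.05): {baseline:.2f}")
print(f"more participants (n=100):         {power_two_sample(100, 0.5):.2f}")
print(f"bigger effect (d=0.8):             {power_two_sample(50, 0.8):.2f}")
print(f"stricter alpha (alpha=.01):        {power_two_sample(50, 0.5, alpha=0.01):.2f}")
print(f"beta at baseline:                  {1 - baseline:.2f}")
```

More participants and a bigger effect size both raise power, while a stricter alpha lowers it. When the effect size is zero, the function returns alpha itself, which matches the idea that alpha is the false-positive rate.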
To explore this, you might find the following interactive tool helpful to see what happens when you change the above:
https://observablehq.com/@patrickmineault/interactive-demo-in-pure-js
Signal vs. noise
Signal reflects changes in the data that can be explained by one or more factors that you’re measuring. For example, if you are trying to predict someone’s weight, then height would be a source of signal, as it helps you explain differences in weight between individuals.
Noise is a term to describe factors that are influencing your results/outcome that you haven’t taken into account. Going back to our example with height and weight, there are other factors that might not have been measured (e.g. muscle density) which are sources of noise.
Your predictions will be more accurate if you measure more sources of signal (accurate predictors) and leave less noise (predictors you have overlooked).
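As a rough illustration of the height and weight example, the sketch below simulates weights that depend on height (signal) plus unmeasured influences (noise), and reports how much of the variation in weight is explained by height. The slope, intercept and noise levels are made-up values, and `r_squared` is a hypothetical helper for a single predictor.

```python
import random
import statistics

random.seed(4)

def r_squared(x, y):
    # Share of the variance in y explained by a straight-line fit on x.
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov ** 2 / (var_x * var_y)

heights = [random.gauss(170, 10) for _ in range(2000)]  # cm, made-up values

def simulate_weights(noise_sd):
    # Weight depends on height (signal) plus unmeasured influences (noise).
    return [0.9 * h - 85 + random.gauss(0, noise_sd) for h in heights]

r2_low_noise = r_squared(heights, simulate_weights(noise_sd=3))
r2_high_noise = r_squared(heights, simulate_weights(noise_sd=15))
print(f"R^2 with little noise: {r2_low_noise:.2f}")
print(f"R^2 with lots of noise: {r2_high_noise:.2f}")
```

The relationship between height and weight is identical in both runs. Only the amount of unexplained noise differs, and that alone determines how well height predicts weight.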
Question 1
Assuming that you are investigating a real effect, which sample size is more likely to give you a significant result?
Question 2
Assuming that you are NOT investigating a real effect, which sample size is more likely to give you a significant result?
Question 3
Alpha values are…