Factorial ANOVAs are a bit more complicated. A factorial ANOVA is used when more than one factor is being examined. Maybe we’re interested both in multiple-choice vs. short-answer questions and in whether the questions are factual or applied. That is two factors, so a factorial ANOVA is required. There can be more factors, and each factor can have more levels (maybe you want to compare multiple-choice to short-answer to essay; that’s 3 levels). Factorial ANOVAs also allow us to examine interactions between our factors. For example, Roediger and Karpicke (2006) found that at a short delay, participants performed better when they restudied, but at a longer delay they performed better when they were tested via retrieval practice (1). That is an interaction between two variables: delay (short vs. long) and type of study (restudy vs. retrieval practice).
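To make the idea of an interaction a little more concrete, here is a minimal sketch in Python. The cell means are made-up numbers chosen only to mimic the crossover pattern described above; they are not the values reported by Roediger and Karpicke (2006), and the names (cell_means, retrieval_advantage) are hypothetical. The sketch only computes the "difference of differences" that an interaction describes; it does not run the ANOVA itself.

```python
# Made-up cell means (percent recalled) for a 2 x 2 design.
# Illustration only; these are NOT data from any real study.
cell_means = {
    ("restudy", "short"): 80,
    ("retrieval", "short"): 70,
    ("restudy", "long"): 40,
    ("retrieval", "long"): 60,
}


def retrieval_advantage(delay):
    """Retrieval-practice mean minus restudy mean at one delay."""
    return cell_means[("retrieval", delay)] - cell_means[("restudy", delay)]


short_advantage = retrieval_advantage("short")  # negative: restudy is ahead
long_advantage = retrieval_advantage("long")    # positive: retrieval is ahead

# If the two factors did not interact, the advantage would be the same at
# both delays and this contrast would be 0.
interaction_contrast = long_advantage - short_advantage

print("Advantage at short delay:", short_advantage)
print("Advantage at long delay:", long_advantage)
print("Difference of differences:", interaction_contrast)
```

With these invented numbers, the effect of study type flips direction depending on the delay, which is exactly what "an interaction" means. A factorial ANOVA asks whether a difference of differences like this is larger than we would expect from random variability.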
r: Pearson’s r is used to ask how strong a relationship exists between two variables. The value of r can range from -1 to 1. An r-value near 0 means that there is little or no linear relationship between the variables, and a value near 1 or -1 means there is a very strong relationship, with exactly 1 or -1 being a perfect (maximum strength) relationship. A positive r-value means that as one variable goes up, so does the other. An example here would be the relationship between height and weight; as height goes up, weight also tends to go up. A negative r-value means that as one variable goes up, the other goes down. An example here would be the relationship between absences and course grades; as absences go up, course grades tend to go down.
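For readers who like to see the arithmetic, here is a rough sketch of how Pearson’s r can be computed by hand in Python. The absences and grades below are invented for illustration, and pearson_r is just a helper name chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y, scaled to fall between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Invented data: number of absences and final course grade for six students.
absences = [0, 1, 2, 4, 6, 8]
grades = [95, 90, 88, 80, 72, 65]

r = pearson_r(absences, grades)
print("r =", round(r, 3))  # negative: more absences go with lower grades
```

Because grades fall almost in a straight line as absences rise in this toy data set, r comes out close to -1. Real data are rarely this tidy.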
Regression: There are many different types of regression, and discussing all of them is outside the scope of this blog post, but in general, a regression is a mathematical way of plotting the relationship described by a correlation. The big advantage of using regression is that you can make predictions about what would happen outside the range of data. At the time of this writing, regression lines are becoming very popular as people are looking at linear and logistic regressions to plot the trend line for coronavirus cases and deaths and to make predictions for the coming weeks and months.
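As a small illustration, the sketch below fits a straight line (a simple linear regression using ordinary least squares) to a handful of invented daily counts and then uses the line to predict a day that was never observed. The data and the helper name fit_line are assumptions made for this example, not anything from a real data set.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: day number and a count observed on that day.
days = [1, 2, 3, 4, 5]
counts = [12, 15, 21, 24, 28]

slope, intercept = fit_line(days, counts)

# Predict day 7, which lies outside the range of the observed days.
predicted_day_7 = intercept + slope * 7

print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("prediction for day 7:", round(predicted_day_7, 2))
```

One caution with a sketch like this: the further a prediction sits from the observed data, the more it depends on the assumption that the same straight-line trend continues.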
p: When statistics are reported, you will most typically see a z, t, F, or r-value referring to the statistical tests above, followed by a p-value. The p-value is the exact probability of obtaining results that are this extreme or more if only chance were at work (it is often loosely described as the probability that the effect is due to chance). We are therefore looking for very low values as strong evidence that there is something more going on than just random variability. Traditionally, .05 or 5% has been used as a cutoff value for something being considered “statistically significant” (although this method has come under quite a bit of scrutiny in recent years). So, again, statistically significant simply means that there is sufficient evidence to indicate that the results we see are unlikely to be due to chance alone.
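One way to see what "results this extreme or more" means is a permutation test, sketched below in Python. This is a different technique from the z, t, and F tests mentioned above, swapped in here because its logic is easy to follow: if group labels were assigned purely by chance, how often would the difference between groups be at least as large as the one observed? The scores and the function name permutation_p_value are invented for this example.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    as_extreme = 0
    relabelings = 0
    # Try every possible way of relabeling which scores belong to group A.
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if difference >= observed - 1e-12:
            as_extreme += 1
        relabelings += 1
    return as_extreme / relabelings


# Invented test scores for two small groups.
group_a = [85, 90, 88, 92]
group_b = [70, 75, 72, 78]

p = permutation_p_value(group_a, group_b)
print("p =", round(p, 4))
print("below the traditional .05 cutoff:", p < 0.05)
```

In this toy example every score in one group is higher than every score in the other, so only 2 of the 70 possible relabelings produce a difference as large as the observed one, and the p-value is 2/70.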
Effect size: After the p-value, you will often see one more symbol (d, ηp², etc.). This is the effect size, which is exactly what it sounds like. It is the magnitude of the effect, and there are different rules of thumb for what constitutes a small, medium, and large effect. This is arguably more important than statistical significance because a very small effect can still be statistically significant without being terribly important. As an example, you could have a weight loss program where you find a statistically significant amount of weight loss, suggesting that the effect is real (the weight loss is likely due to the weight loss program), but the magnitude of the loss might be only 1 lb. An effect that small might not be worth spending money on.
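As a sketch of one common effect size, the Python below computes Cohen’s d for two groups: the difference between the group means divided by a pooled standard deviation. The weight-loss numbers are invented, and cohens_d is simply a name chosen for this example.

```python
import math
import statistics


def cohens_d(group_1, group_2):
    """Cohen's d: mean difference expressed in pooled standard deviation units."""
    n_1 = len(group_1)
    n_2 = len(group_2)
    # Sample variances (n - 1 in the denominator).
    var_1 = statistics.variance(group_1)
    var_2 = statistics.variance(group_2)
    pooled_sd = math.sqrt(
        ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
    )
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd


# Invented data: pounds lost by five people in a program and five in a control group.
program = [3, 1, 2, 0, 4]
control = [1, 0, 2, -1, 3]

mean_difference = statistics.mean(program) - statistics.mean(control)
d = cohens_d(program, control)

print("mean difference (lb):", mean_difference)
print("Cohen's d:", round(d, 2))
```

Notice that the sketch reports two different things: the raw difference in pounds and the standardized effect size. Whether a difference of this size is worth paying for is a practical judgment that neither number makes for you.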
There is much more that could be covered here, but hopefully this is enough to help decipher the results sections of empirical articles. One final recommendation is to read the first paragraph or two of the discussion section before going back to the results, as that section usually begins with a summary of the main findings.