ANOVA vs t-Test: Key Differences Explained

Note: this post is part of a series of posts about How to Choose an Appropriate Statistical Test

Why use ANOVA instead of a t-test? It may seem odd to invent a new statistical method the moment we have more than 2 groups, but hear me out for a bit. There are both very practical and theoretical reasons for doing this!

The standard t-test compares at most 2 groups at a time, which is why comparisons made this way are sometimes called “pairwise comparisons”. Rather than give you the usual example of 3 groups, let’s shake things up a bit to better illustrate why ANOVAs are important.

Say you are a chemist examining the decay of 5 new radioactive substances. Your independent variable now has 5 levels: substances A, B, C, D and E. The dependent variable (DV) is the half-life of the substance (how long it takes for half of a sample to decay). Your data looks like this:

You low-key suspect that the substances are all actually the same (just dyed different colours for some reason). However, your boss insists that they are different, and wants you to look for evidence that they are.

How do you want to go about doing this?

When t-tests start to stand for (t)edious tests

If you really want to stick to t-tests, keep in mind that you can only compare 2 substances at a time. You need to compare:

A & B
A & C
A & D
A & E
B & C
B & D
B & E
C & D
C & E
D & E

That’s 10 separate t-tests — and I’m tired just typing this out!
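The count of 10 is just "5 choose 2", and it grows quickly as groups are added. A minimal Python sketch of that arithmetic (the variable names are only illustrative):

```python
from itertools import combinations
from math import comb

# The five substances from the example above.
substances = ["A", "B", "C", "D", "E"]

# Every unordered pair is one t-test we would have to run.
pairs = list(combinations(substances, 2))
print(len(pairs))  # 10

# The number of pairwise tests for k groups is k * (k - 1) / 2.
for k in (3, 5, 10, 20):
    print(k, "groups ->", comb(k, 2), "t-tests")
```

With 20 groups you would already be looking at 190 separate t-tests.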

Rather than comparing these substances 2 at a time (in pairwise fashion), is there a better way to compare them all at once?

Or maybe, if you are very hardworking, you think that I’m just being lazy. However, this isn’t just a matter of effort. To understand why, we move on to the next reason why conducting multiple t-tests is no good.

Inflated Type 1 Error

Remember that when we do a t-test, we always compare our p-value to an α value? α is not there for fun: it is the type 1 error rate, the chance of a false positive that we are willing to accept (someone described statistics as the art of educated uncertainty, which I thought was very good).

If you still don’t get why this matters, let me put it bluntly:
When the null hypothesis is actually true, a 5% significance level means we will wrongly reject it 5% of the time. Out of 100 t-tests run on data where H₀ is true, we expect about 5 to come out significant, and EVERY ONE OF THOSE IS WRONG. That’s not bad luck: it’s baked into the system.

Now do you see why this matters? We set α at 5% because that’s the level of uncertainty we’re willing to accept — not because there’s anything sacred or magical about that number.

When you compare groups by running multiple t-tests, you drastically increase your chance of getting at least one false positive. Instead of a 0.05 chance of being wrong, analysing the above scenario with 10 t-tests gives a chance of 1 − (1 − 0.05)^10 ≈ 0.4013 of making at least 1 type 1 error, if we treat the 10 tests as independent.
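The 0.4013 figure can be checked in a few lines of Python. The sketch below assumes the 10 tests are independent and that H₀ is true for all of them, in which case each p-value is uniform on [0, 1]. Pairwise t-tests that share groups are not strictly independent, so treat this as an approximation. The function names are my own.

```python
import random

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive in n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

def simulated_familywise_error_rate(alpha, n_tests, n_trials=20_000, seed=0):
    """Monte Carlo check: under H0 each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One "analysis" = n_tests p-values; count it if any is significant.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials

exact = familywise_error_rate(0.05, 10)
print(round(exact, 4))  # 0.4013

approx = simulated_familywise_error_rate(0.05, 10)
print(approx)  # should land close to the exact value
```

The simulated rate hovers around the same 40%, which is a handy sanity check if the formula feels too abstract.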

Are you really willing to tell people to trust your results when you know there is a 40% chance that at least one of them is a false positive?

Which is why brute-forcing multiple t-tests isn’t just a matter of working hard. It’s really just bad statistics.

What about Modified t-tests?

If type 1 error is the problem — can’t we just control for that specifically? If you are asking this — good for you! You are thinking on your feet — which shows that you are not blindly following instructions.

True. I can’t argue with your logic. In fact, these types of “controlled t-tests” are precisely what is done after a significant ANOVA result anyway. The drawbacks of this approach are that:

It does not solve the tediousness of having to do so many comparisons.
There are many forms of controlled t-tests, each with its own pros and cons, and choosing between them adds another layer of complexity to your analysis.
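To make “controlled t-tests” concrete, here is a sketch of one well-known correction, the Bonferroni correction. This is my choice of illustration and only one of the many forms mentioned above. It simply divides α by the number of tests. The calculation again assumes independent tests.

```python
# Bonferroni correction: test each comparison at alpha / m instead of alpha.
alpha = 0.05
n_tests = 10  # the 10 pairwise comparisons among substances A to E

per_test_alpha = alpha / n_tests
print(per_test_alpha)  # 0.005

# Chance of at least one false positive across all 10 tests,
# now that each one uses the stricter threshold.
familywise = 1 - (1 - per_test_alpha) ** n_tests
print(round(familywise, 4))  # just under 0.05
```

The overall error rate is back under 0.05, but you still have to run all 10 tests, and each one now needs much stronger evidence to count as significant.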

ANOVA: The Solution

Which is why ANOVA was invented in the first place. It is a straightforward method to compare ALL groups at once, while keeping your type 1 error rate at the α you chose (0.05 here). The best part is that it doesn’t increase in complexity as the number of groups increases, so no matter how many groups you want to compare, you just need to do the ANOVA once.
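As a rough sketch of what “doing the ANOVA once” involves, the snippet below computes the one-way ANOVA F statistic by hand, using only the standard library. The half-life values are simulated purely for illustration (they are not the data from this post), and the function name is my own. Turning F into a p-value needs the F distribution with the two degrees of freedom shown, which a statistics package will do for you.

```python
import random
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Simulated half-lives (arbitrary units) for substances A to E.
# ASSUMPTION: all five share the same true mean, i.e. H0 is true.
rng = random.Random(42)
half_lives = [[rng.gauss(10.0, 1.0) for _ in range(8)] for _ in range(5)]

f_stat, df_b, df_w = one_way_anova_f(half_lives)
print(f_stat, df_b, df_w)
```

However many groups you pass in, it is still a single test with a single F statistic.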

Conclusion

And so, you conduct your ANOVA and fail to reject H₀ (no evidence of a difference between the means). You smile smugly, happy to prove your boss wrong.

You tell him: “There’s no difference between the mean decay times!” He stares at you in silence for a while, then tells you bluntly: “Don’t overstate your conclusions.”

What did you do wrong? Find out in the next post on Equivalence Testing!
