Choosing Between Z-Test & T-Test for Proportions

Note: this post is part of a series of posts about How to Choose an Appropriate Statistical Test

This is one of the common errors that trip people up. On the surface, the distinction seems intuitive: a test for proportions deals with percentages, whereas an independent samples t-test looks at a continuous DV.

But trust me: when you stare at a dataset, you start to forget even the most basic things unless you pay special attention to them. Your brain notices that it is theoretically possible to convert data from one format into another, and suddenly it becomes a little foggy which analysis method is best.

Let’s use a classic textbook example — a coin flip. Your goal is simple — determine whether or not the coin is a fair coin.

Seems intuitive right? But say your data analyst friend presented the data this way:

He’s now proposing an independent samples t-test to compare the numbers, to see if the number of heads differs significantly from the number of tails. By extension, if you see a significant difference, then the coin is not a fair coin.

Logical? Makes sense? Do you agree?

Statistics Point of View

If you agreed, then you are in trouble. See, any layman can put some numbers into Excel and plot a graph. The visualisation you choose has to respect the properties of the data. If it doesn’t, it is very easy to be misled into the wrong statistical test because the graph looks like the format of a test you previously used.

Here, the two outcomes are obviously interrelated: it is a binary outcome variable after all! (If it’s not tails, it’s heads.) When your outcomes are interrelated, the independence assumption obviously does not hold, meaning that analysing them with an independent samples t-test is downright wrong.

If you are thinking along the lines of a paired samples t-test or a one-sample t-test, unfortunately you are wrong again 😧. This is not a case where the data is paired; it is a case where one outcome is literally derived from the other (it’s like X and 1-X). Besides, a t-test requires you to estimate a variance for the standard error, because the assumption is that each individual provides a DV measurement, and you are comparing the average of this DV either to another group (two-sample t-test) or to a population value (one-sample t-test). With your dataset being filled with 0s and 1s (heads or tails), what variance are you even estimating? It is fully determined by the proportion itself. Truth is, t-tests of any kind do not make sense in this context.
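To make the variance point concrete, here is a minimal Python sketch using made-up flips (60 heads and 40 tails, an assumption for illustration only, not data from this post). For 0/1 data, the variance computed directly from the observations equals p(1-p), so once you know the proportion there is no separate spread left to estimate.

```python
from statistics import mean, pvariance

# Hypothetical data for illustration: 1.0 = heads, 0.0 = tails
flips = [1.0] * 60 + [0.0] * 40

# Proportion of heads is just the mean of the 0/1 values
p_hat = mean(flips)

# Variance computed directly from the 0/1 observations
var_direct = pvariance(flips)

# Variance implied by the proportion alone: p * (1 - p)
var_from_p = p_hat * (1 - p_hat)

print(p_hat, var_direct, var_from_p)
```

The two variance values agree (up to floating-point rounding), which is why the proportion test works from the proportion alone.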

The only form of analysis that makes sense for this dataset is a one-sample z-test for proportions. You can take either the proportion of heads OR tails. It doesn’t matter, as both will give you the same result (X and 1-X after all!). Test this against the null hypothesis that the population proportion is 50%. Rejecting this null hypothesis means that the coin is not a fair coin.

A visualisation that respects the properties of this dataset should honestly contain only the proportion of heads OR tails. Since one outcome is merely derived from the other (1-X), there’s no real need to present both in your visualisation.

You will also notice that with this visualisation, it is a lot less tempting to use an independent samples t-test to analyse this data. Indeed, that’s the point! As a statistician, you want your figure to reflect the properties of your data and also guide people to the correct form of statistical analysis based on your data visualisation alone. Don’t unnecessarily confuse people!
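As a rough sketch of what this test can look like in code, assume a hypothetical result of 60 heads in 100 flips (not data from this post). The helper name one_sample_proportion_ztest is made up for illustration, and the calculation relies on the normal approximation, which is reasonable when the number of flips is large.

```python
import math
from statistics import NormalDist


def one_sample_proportion_ztest(successes, n, p0=0.5):
    """Two-sided one-sample z-test for a proportion against p0.

    Hypothetical helper for illustration; uses the normal approximation.
    """
    p_hat = successes / n
    # Standard error under the null: fixed by p0 and n, nothing to estimate
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed example data: 60 heads (so 40 tails) out of 100 flips
z_heads, p_heads = one_sample_proportion_ztest(60, 100)
z_tails, p_tails = one_sample_proportion_ztest(40, 100)

print(z_heads, p_heads)
print(z_tails, p_tails)
```

Testing heads or tails flips the sign of z but leaves the two-sided p-value unchanged, which is the X and 1-X symmetry in action.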

How Data Visualisations get Convoluted

Charts are very good for guiding people to see what you want them to see. You have seen for yourself how a simple change of the chart type — even if it’s the same underlying data being presented — can make the subsequent interpretation and analysis vary quite a bit.

But what I also want to highlight here is that a data visualisation is not inherent to the data. Going back to this chart, let’s change up the labels and the title a bit:

Now say you want to compare the two groups. Which test do you use? The answer is: independent samples t-test.
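For contrast, here is a minimal sketch of the statistic behind an independent samples t-test, assuming two hypothetical groups in which every individual contributes one continuous measurement. The numbers and the helper name pooled_t_statistic are made up for illustration. Unlike the coin flips, each group has its own spread that must be estimated from the data.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's t statistic (equal variances assumed) and its degrees of freedom.

    Hypothetical helper for illustration only.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled sample variance estimated from both groups
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df


# Assumed example data: one continuous score per individual
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.0]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(t_stat, df)
```

To get a p-value, you would compare t_stat against a t distribution with df degrees of freedom; a library routine such as scipy.stats.ttest_ind handles both steps.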

The chart is exactly the same, yet the properties of this dataset are very different from the initial one. Point being: how you represent your data is an active choice you need to make. None of these visualisations are wrong, to be frank, but some are more correct than others because they direct people to the right statistical test.

Conclusion

Hope this post was insightful for you, and don’t get your z-test for proportions wrong next time ya! 😉

Check out my next post on ANOVA vs t-tests
