Overview of Hypothesis Testing

Fundamentally, Hypothesis Tests are tools that we use to make decisions when data is involved. For example, if we run a business where we deliver products to customers, and we need to decide between two different couriers, we’d like to pick the one that’s both faster and more reliable (assuming the cost difference isn’t substantial). How can we make such a decision and be confident that we’re right?

If possible, we’d probably assign each of them a series of deliveries, collect data on delivery time, and then analyze that data to see which one was faster, on average. But how do we know if the sample deliveries we gave them were representative of how they truly perform? If we ran the test again, would we get the same result?

That’s where Hypothesis Testing comes in. In the example above, we’d start with the default hypothesis (often called the null hypothesis) that the average delivery times of the two couriers are the same, and then conduct a statistical test to determine how likely it would be to see a difference at least as large as the one we observed if that hypothesis were true. If that probability is low enough (generally below 5%), we reject our hypothesis and conclude that the averages of the two couriers are, in fact, different.
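To make the idea concrete, here is a minimal sketch of the courier comparison in Python. It uses a permutation test rather than any of the tests listed later, because that lets the example run with the standard library alone and shows directly what the probability means. The delivery times, the function name `permutation_p_value`, and the 5% threshold `ALPHA` are all assumptions made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels.

    If the default hypothesis is true (both couriers are equally fast), the
    courier labels carry no information, so reshuffling them should often
    produce a difference as large as the one we actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical delivery times in hours (illustrative data only).
courier_a = [24.1, 26.3, 22.8, 25.0, 27.4, 23.9, 24.6, 26.0]
courier_b = [28.2, 30.1, 27.5, 29.3, 31.0, 28.8, 27.9, 30.4]

ALPHA = 0.05  # assumed significance threshold (the "5%" in the text)
p_value = permutation_p_value(courier_a, courier_b)
reject_null = p_value < ALPHA

print(f"p-value: {p_value:.4f}, reject default hypothesis: {reject_null}")
```

With these made-up numbers the two couriers barely overlap, so almost no reshuffling reproduces the observed gap and the default hypothesis is rejected. With real data the result could easily go the other way.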

Selecting a Hypothesis Test

Choosing which Hypothesis Test to use depends on what you are comparing. The list below gives a brief description of what each test is for, and a link to an example. Note that no matter which test you use, the analysis will produce a “p-value”, which is, essentially, the probability of seeing data at least as extreme as yours if the default hypothesis were true. A low p-value (typically below 0.05) is evidence against the default hypothesis.

      • T Test – The T Test compares the means (averages) of two data sets. The default hypothesis is that the means of the data sets are statistically the same, so a low p-value will indicate that they are likely different. This test assumes the data are Normally distributed. You can verify that with the “Tell me about my data” function. (Click here for an example of how to use the T Test.)
      • Paired T Test – Use this test when you want to determine if there is a statistical difference between the same set of samples tested twice. This is often used in before-and-after situations, when some operation has been done to a sample after the initial measurement and a second measurement is then taken to see if there is any change. The default hypothesis is that there is no difference in the mean before and after. (Click here for an example of a Paired T Test.)
      • 1-Sample T Test – This test is used when you want to compare the mean of a sample to a hypothesized or known historical mean. The default hypothesis is that the mean of your sample is the same as the hypothesized mean. A low p-value will indicate that the sample mean is statistically different from the hypothesized mean. The sample data should be Normally distributed. (Click here for an example of how to use a 1-Sample T Test.)
      • 1-way ANOVA – This test is used to compare the means of three to ten data sets where there is just one thing that differs from set to set. For example, you might compare the average dimensions of the same part provided by different suppliers. In this case, all other things being equal (e.g., material, specifications, etc.), the one difference is the “supplier”. 1-way ANOVA will allow you to determine if all the suppliers have the same average or not. The assumption is that the data sets are Normally distributed, and that they have approximately the same variance. The default hypothesis is that the means of all the data sets are the same. A low p-value will indicate that the means are likely not all the same. (Click here for an example of 1-way ANOVA.)
      • Mood’s Median Test – Use this test to determine if there is a statistically significant difference between the medians of multiple data sets. The data sets do not have to fit any particular distribution, but the data itself should be continuous. (Click here for an example of the Mood’s Median Test.)
      • F Test – The F Test compares the variances (and hence the standard deviations) of two data sets. The default hypothesis is that the variances of the data sets are statistically the same, so a low p-value will indicate that they are likely different. This test assumes the data are Normally distributed. You can verify that with the “Tell me about my data” function. (Click here for an example of how to use the F Test.)
      • Test of Proportions – This test is used to determine if the proportion of “defective” items in one group is statistically different from the proportion of “defective” items in a second group. Keep in mind that by “defectives”, we are really just referring to the subset of items in each group that have a certain property we care about. For example, you might be comparing the proportion of defective products produced by two different shifts. The default hypothesis is that the proportions are the same. A low p-value will indicate that the proportions are likely different. (Click here for an example of a Test of Proportions.)
      • Independence Test – Also called the Chi-Square Test of Independence, this test is used to determine if there is a statistically significant association between two categorical variables. The default hypothesis is that there is no association (the variables are independent), so a low p-value indicates that an association likely exists. (Click here for an example of an Independence Test.)
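As an illustration of how one of these tests turns data into a p-value, the Test of Proportions example above (defective products from two shifts) can be sketched in Python. This sketch assumes a pooled two-proportion z-test, which is one common large-sample way to compare two proportions. It is not necessarily the exact method behind the linked example, and the function name `two_proportion_z_test` and the defect counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(defects_1, n_1, defects_2, n_2):
    """Pooled two-proportion z-test (large-sample approximation).

    Default hypothesis: both groups have the same underlying proportion.
    Returns the z statistic and the two-sided p-value.
    """
    p1 = defects_1 / n_1
    p2 = defects_2 / n_2

    # Under the default hypothesis both groups share one proportion,
    # so we estimate it from the combined data.
    pooled = (defects_1 + defects_2) / (n_1 + n_2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))

    z = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: defective items out of items inspected per shift.
z, p_value = two_proportion_z_test(defects_1=18, n_1=400, defects_2=35, n_2=400)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:  # assumed 5% threshold
    print("The proportions are likely different.")
else:
    print("Not enough evidence that the proportions differ.")
```

The normal approximation used here is only reasonable when each group contains a fair number of both defective and non-defective items. For small counts, an exact test would be a safer choice.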