I took statistics in the twelfth grade. I didn’t do well, but that was because I wasn’t trying. I do remember learning about p-values, though, and using them to do hypothesis testing.
Then in college, I took business statistics. We learned all about p-values and used them in class probably every day. Two years later, I took econometrics, which basically assumed familiarity with what a p-value is and how to use it to do hypothesis testing.
After all that, I can’t really say I ever understood p-values. I used them by remembering the steps my teachers took to do hypothesis testing, but I didn’t know why I was using them, and I certainly didn’t know how to use them beyond the confines of a very specific, black-and-white problem posed in a textbook somewhere.
But tonight, all that changed. I finally understand what a p-value is, now that I’m a quarter of the way through a master’s in economics at George Mason University. And now that I get it, I’m going to explain it to you in a very simple and straightforward way—that is, the one way it was never explained to me by any professor or textbook through three statistics classes.
(I realize this is a divergence from my usual economic/political/financial commentary on this blog, but I’m hoping it will help at least a few people avoid the frustration I had trying to learn this from dry, hard-to-follow textbooks.)
What is a p-value?
Suppose you want to find the average wage of the American adult population. First, you make a guess, based on information from the Bureau of Economic Analysis, that this average wage equals $15 per hour. Armed with this hypothesis, you then ask 1,000 people (your sample) how much they make per hour. You add up their answers, divide by 1,000, and come up with the mean: say, $17. Because we’re doing statistics and want to sound like statisticians, we’ll call this mean the expected value of your sample.
So the expected value of your sample’s hourly wage is $17, which is $2 more than your original hypothesis. In other words, your hypothesis is $2 less than what you observed in your survey.
This makes you worry that your sample might not accurately reflect the entire population. A $2 difference is pretty big relative to the numbers we’re working with. On the other hand, maybe it means you’ve discovered that this hypothesis, based on your prior research, is wrong, and that economists at the Bureau of Economic Analysis need to see your results because their number might be wrong.
But how can you know which is true? Is your sample bad, or is the Bureau of Economic Analysis wrong?
There are several things you must do in order to answer that question, but one of the first steps involves calculating a p-value.
What is a p-value? Well, let’s imagine you could draw another sample (that is, ask another 1,000 people about their wage). You don’t actually do it, but it’s a hypothetical possibility. After you gather the answers, you’d calculate the average wage for that sample, which (again) you refer to as that second sample’s expected value. Now suppose the true average wage really is $15, just as your hypothesis says. The probability that this second sample’s expected value lands as far from $15 as your first sample’s $17 did, or even further (that is, $2 or more away), is the p-value.
I repeat in more general terms:
- The p-value is the probability that, if the hypothesis is true, drawing another sample and calculating its expected value will yield a number at least as different from the original hypothesis as the first sample’s expected value was.
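To make the thought experiment concrete, here is a minimal Python sketch that actually “draws” thousands of hypothetical samples instead of just imagining one. It rests on assumptions that are not part of the example above: wages are treated as normally distributed (real wages are skewed), the $15 hypothesis is taken as true, and the population standard deviation of $30 is a made-up number chosen purely for illustration. The function name `simulated_p_value` is likewise hypothetical.

```python
import random

def simulated_p_value(hypothesis, observed_mean, population_sd, n=1000,
                      num_samples=2000, seed=0):
    """Estimate a two-sided p-value by imagining many new samples.

    Hypothetical helper. Assumes wages are normally distributed with mean
    equal to `hypothesis` (i.e., the hypothesis is true) and standard
    deviation `population_sd`.
    """
    rng = random.Random(seed)
    observed_gap = abs(observed_mean - hypothesis)  # $2 in the wage example
    at_least_as_far = 0
    for _ in range(num_samples):
        # "Ask another n people": draw one hypothetical sample under the hypothesis.
        sample = [rng.gauss(hypothesis, population_sd) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Count samples whose mean lands at least as far from the hypothesis,
        # in either direction (e.g., $17 or more, or $13 or less).
        if abs(sample_mean - hypothesis) >= observed_gap:
            at_least_as_far += 1
    return at_least_as_far / num_samples

# Hypothetical inputs: $15 hypothesis, $17 observed, assumed $30 spread.
p = simulated_p_value(hypothesis=15, observed_mean=17, population_sd=30)
print(f"Estimated p-value: {p:.3f}")
```

The estimate is simply the fraction of imagined samples that land at least $2 from $15. Change `population_sd` and the p-value moves, since a noisier population makes far-off samples more common.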
So, referring back to our example: imagine it’s very likely that another hypothetical sample will yield an expected value at least $2 away from $15, say $13, $11, $20, or $22. The cutoff is $2 because that is how far the expected value of the real, non-hypothetical sample we drew at the beginning ($17) landed from the hypothesis. If such far-off results are very likely, the p-value is very high.
Again, if taking another sample would likely result in an expected value at least as different from your hypothesis as your first sample’s was, your p-value is high. The exact probability of this happening is the p-value. So if it would happen 90 percent of the time, the p-value is 90%.
If it’s unlikely that another sample will result in an expected value $2 or more away from your hypothesis of $15, then the p-value is low. If only one out of one hundred samples would yield an expected value at least $2 away from $15, then the p-value is 1%.
Now that you’ve got that down, understanding the following technical definition of a p-value should be easier.
- The p-value, sometimes called the “probability value,” is the probability of drawing a statistic from another hypothetical sample that is at least as adverse to the hypothesis as the one you actually computed in your real, non-hypothetical sample.
Here’s another definition. Sometimes hearing things a few different ways helps.
- The p-value is the probability of observing a sample expected value that is at least as different from the hypothesis as the expected value of the sample whose quality is the subject of your investigation.
Now here is the technical definition your teacher or professor probably wants to hear from you.
- The p-value is the probability of observing, by pure random sampling variation, a sample expected value at least as different from the null hypothesis value as the observed sample’s expected value, assuming that the null hypothesis is true.
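For readers who like symbols, here is one way to write that definition. The notation is introduced just for this illustration: X̄ is the expected value of a hypothetical new sample, x̄ is the one you actually observed, μ is the true population average, μ₀ is the hypothesized value, and the vertical bar reads “assuming.”

```latex
p = P\left( \left| \bar{X} - \mu_0 \right| \ge \left| \bar{x} - \mu_0 \right| \;\middle|\; \mu = \mu_0 \right)

% In the wage example, \mu_0 = 15 and \bar{x} = 17, so
p = P\left( \left| \bar{X} - 15 \right| \ge 2 \;\middle|\; \mu = 15 \right)
```

The absolute values capture “as far away or further” in either direction, which is why both $13 and $17 count as $2 away from $15.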
I hope that makes sense. I hope it also makes sense that a high p-value makes you less likely to reject your original hypothesis than a low p-value does. In our example, a high p-value means it’s likely that further samples will yield numbers of $17 or more, or $13 or less (both $2 away from $15). This means there is probably a considerable degree of random sampling variation, and that our sample, while yielding an expected value $2 different from the hypothesis, really shouldn’t be thrown out.
On the other hand, a low p-value (say, 5%) means that the sample you drew is about as far as you’ll get from your hypothesis while staying within the realm of random sampling variation. The lower it gets, the stronger the evidence against your original hypothesis, because a lower p-value means a sample like yours would be rarer and rarer if the hypothesis were true. If it’s 0.5%, then only about 1 in 200 samples would land this far from $15 if the hypothesis were true, which strongly suggests that the difference between your hypothesis and your sample’s expected value comes from something other than random chance (i.e., that your hypothesis is wrong).
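To see why the same $2 gap can produce either a high or a low p-value, here is another minimal, self-contained Python sketch. It compares two hypothetical populations that both average $15 but differ in how spread out wages are (standard deviations of $10 and $60, invented for illustration, with wages again assumed to be normally distributed), and checks each p-value against the conventional 5% cutoff. The helper name and all the numbers are assumptions, not data.

```python
import random

def simulated_p_value(hypothesis, observed_mean, population_sd, n=1000,
                      num_samples=1000, seed=1):
    # Hypothetical helper: share of simulated samples (drawn assuming the
    # hypothesis is true and wages are normal) whose mean lands at least as
    # far from the hypothesis as the observed mean did.
    rng = random.Random(seed)
    gap = abs(observed_mean - hypothesis)
    hits = 0
    for _ in range(num_samples):
        sample_mean = sum(rng.gauss(hypothesis, population_sd) for _ in range(n)) / n
        if abs(sample_mean - hypothesis) >= gap:
            hits += 1
    return hits / num_samples

ALPHA = 0.05  # the conventional 5% cutoff; the choice of cutoff is a judgment call

results = {}
for label, sd in [("tightly clustered wages", 10), ("widely spread wages", 60)]:
    p = simulated_p_value(hypothesis=15, observed_mean=17, population_sd=sd)
    results[sd] = p
    verdict = "reject the $15 hypothesis" if p < ALPHA else "do not reject it"
    print(f"{label} (sd = ${sd}): p is about {p:.3f}, so {verdict}")
```

With tightly clustered wages, chance almost never produces a sample mean $2 away from $15, so the p-value is tiny. With widely spread wages it happens often, so the p-value lands well above 5% and the $17 sample is no reason to reject the hypothesis.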
That’s the p-value in a nutshell. I didn’t get into how to calculate a p-value, but that’s a topic for another day. I just wanted to fill what I perceive as a gap in the way p-values (and statistics at large, for that matter) are taught. Anyway, you need to understand this before you begin studying how to calculate p-values. In fact, understanding exactly what a p-value is will even help you remember how to calculate them come exam day.