This is an abridged, less technical version of our longer report, found here.
You’ve probably heard that people who are bad at something tend to overestimate how skilled they are. They don't even know enough to know how bad they are. This idea gained prominence in 1999 when Justin Kruger and David Dunning introduced what’s now known as ‘the Dunning-Kruger effect’ - a cognitive bias whereby people who lack competence in a task are also unable to accurately assess their own performance. Their studies also showed that high performers underestimate their skill level, which they suggest is caused by the ‘false-consensus effect’ - whereby individuals believe their own traits, beliefs, and behaviors are more common in the general population than they actually are.
But are these effects real?
Over the years, numerous studies have found results similar to the ones that led Kruger and Dunning to their conclusions. However, there is growing literature challenging the existence of the Dunning-Kruger effect.
But how can anyone deny a finding that’s been so widely replicated? Well, buckle up, as we're going to tell you a data-driven story that delves into:
how tricky and subtle math can be - and how easily it can trick us into thinking we've made a discovery when perhaps we haven't
how irrational-seeming behavior isn't necessarily irrational
Note: If you’re short on time, you can skip ahead to the “Key Takeaways” at the end for a quick summary!
Replicating the Dunning-Kruger effect
As we've mentioned, studies often find results that seem to indicate a Dunning-Kruger effect. To show you how this happens, let’s take a look at a study we ran ourselves.
We recruited 1,817 participants in the United States using our Positly study participant platform. We gave them intelligence-related tasks to assess their IQ scores, and then gathered their self-assessed IQ scores by asking them:
“Out of 100 random people in your country who are of your own age and gender, how many of those 100 people do you think you would do better than on an intelligence test designed to accurately measure IQ (if you all took the same test)?”
The answers to this question can easily be converted to self-assessed IQ scores, since IQ is known to have a distribution that is approximately a bell curve (i.e., a "normal distribution"). We can then plot each person's measured IQ against what they predicted their IQ to be.
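To illustrate, here's a minimal sketch of that conversion in Python (the function name is our own; we assume IQ's usual mean of 100 and standard deviation of 15):

```python
# A sketch of the percentile-to-IQ conversion, assuming (as is standard)
# that IQ is normally distributed with mean 100 and standard deviation 15.
from scipy.stats import norm

def percentile_to_iq(people_beaten_out_of_100: float) -> float:
    """Convert an answer like "I'd beat 84 of 100 people" to an IQ estimate."""
    percentile = people_beaten_out_of_100 / 100  # e.g., 84 -> 0.84
    return norm.ppf(percentile, loc=100, scale=15)  # inverse of the normal CDF

print(percentile_to_iq(50))  # ~100 (exactly average)
print(percentile_to_iq(84))  # ~115 (about one standard deviation above average)
```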
Were people able to estimate their own scores accurately? Take a look at the chart below. Each blue dot reflects the results of one person, and the dark blue line reflects the trend in the relationship between measured IQ and self-reported IQ:
We can see that people with higher measured IQs believed themselves to have higher IQs than those with lower measured IQs - but not by much!
If both measured IQ and self-assessed IQ were completely accurate (i.e., the test was a perfect test, and people were perfect at evaluating their own scores), all of the data points would fall on the gray dashed line. However, this is not the case: participants with lower measured IQ (those to the left on the chart) overestimated their IQ, a pattern consistent with the Dunning-Kruger effect - on the left side of the chart, the blue trend line lies above the gray dashed line. Likewise, higher IQ participants tended to underestimate their IQs - toward the right of the chart, the blue line lies below the gray dashed line.
Kruger and Dunning’s explanation of the phenomenon was that, since the worst performers lacked the skill being tested, they also lacked the ability to accurately self-assess that skill. This, according to them, is domain specific, so a person could be a victim of the Dunning-Kruger effect when self-assessing one skill but not another.
Any time the trend line starts above the gray dashed line but then passes below it, it illustrates (what at least appears to be) a Dunning-Kruger effect.
So our results seem to have reproduced the Dunning-Kruger effect, suggesting it's very likely real. Right?
But wait, it’s more complicated than that
Not so fast. The question is, why do the lowest performers at a given task consistently overestimate their performance? And do charts like the one above really demonstrate what they seem to?
In recent years, several studies (such as this one and this one) have been published that either deny or downplay the Dunning-Kruger effect, suggesting that the observed patterns in the graphs are due to subtle mathematical phenomena rather than psychological ones. Could it be that what we think is a cognitive bias is actually just a result of the math involved, rather than telling us something about the human mind?
Let’s explore how this could be the case by looking at a series of simulations that we ran. In each of them, we generated 1,000 simulated people, each with one measured skill value and one self-assessed skill value. Each simulation makes different assumptions, so we can see under which assumptions an apparent Dunning-Kruger effect appears.
Case 1: A test that some people perfectly ace or completely fail
What happens when the test that people take does not perfectly measure their real skill? For example, tests are often only capable of measuring skill within a specific range of abilities. If someone's real skill level is below the test's lower limit (the "floor"), they are likely to get the lowest possible score no matter how bad they are - in other words, the test can't distinguish between bad performers and extremely bad performers. In a sense, this inflates the scores of extremely bad performers (since they get the same score as merely bad performers).
Similarly, if someone's real skill level is above the test's upper limit (the “ceiling”), they are likely to get the highest possible score no matter how exceptional they are - the test can't tell apart very good performers from incredible performers since they both tend to get the maximum score.
Ceiling and floor effects like these can generate graphs that appear to show a (small) Dunning-Kruger effect in situations when people are also making rough, noisy guesses about their own abilities. But these are "fake" Dunning-Kruger effects - they are mathematical artifacts that have nothing to do with human psychology.
We can see this phenomenon in action here, as we adjust the parameters of the following simulation:
We start with a test without much floor or ceiling effect (top-left plot, above), and as we lower the ceiling and raise the floor on our test, a Dunning-Kruger effect appears (top-right plot) - and then the Dunning-Kruger effect becomes stronger when we increase the noise of the test used to measure skill (bottom plot). But as we know, this is just due to limitations in the test plus facts about mathematics, so it's a "fake" effect - it's demonstrating a mathematical fact, not a psychological one.
Why does this occur? Well, think of the people whose real skill level is far lower than what the test can measure (they're "below the floor"). As long as they know the test’s minimum score, it is nearly impossible for this group of people to underestimate their own performance - the lowest prediction they can give is the minimum score (which is already above their true skill level). But it IS possible for them to overestimate their performance. Because it's nearly impossible for them to underestimate their performance, but possible to overestimate it, and they have uncertainty about their skill level, on average they will overestimate it - an apparent Dunning-Kruger effect! But this is just a mathematical necessity, even for participants making totally rational estimates.
A similar effect happens in reverse for participants who are so skilled that their true skill is above the ceiling of the test - it's nearly impossible for them to overestimate their own performance (since, at most, they can assign themselves the maximum score) but possible to underestimate it, so on average they underestimate.
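Here's a minimal sketch of how a simulation like this can be set up (the parameter values are illustrative, not the exact ones from our simulations):

```python
# A minimal sketch of a floor/ceiling simulation. To isolate the floor/ceiling
# effect, the test itself has no other noise here; only the self-assessments
# are noisy (but unbiased). Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_skill = rng.normal(100, 15, 1000)

floor, ceiling = 85, 115  # the test can't produce scores outside this range
measured = np.clip(true_skill, floor, ceiling)

# Unbiased but noisy self-assessments, also bounded by the test's range:
self_assessed = np.clip(true_skill + rng.normal(0, 10, 1000), floor, ceiling)

low = measured <= np.quantile(measured, 0.25)   # lowest-scoring quartile
high = measured >= np.quantile(measured, 0.75)  # highest-scoring quartile
print(f"Bottom quartile: measured {measured[low].mean():.1f}, "
      f"self-assessed {self_assessed[low].mean():.1f}")  # self-assessed is higher
print(f"Top quartile:    measured {measured[high].mean():.1f}, "
      f"self-assessed {self_assessed[high].mean():.1f}")  # self-assessed is lower
```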
What this means is that anyone attempting to study the Dunning-Kruger effect might accidentally end up with a fake Dunning-Kruger effect if they use a test that has floor or ceiling effects! Since some real-world tests will be aced or completely failed by a sizable fraction of participants, researchers need to watch out for this phenomenon.
Case 2: A very noisy test of skill
It turns out there is a totally different reason why studies may find a fake Dunning-Kruger effect, even when using a test with no floor or ceiling effects.
Real-life tests do not measure skill perfectly - they have uncertainty in their results. In other words, they measure each person's real skill plus some random noise. That is certainly the case with commercial IQ tests, which combine many different tasks in the hope that, together, they measure IQ accurately - but uncertainty inevitably remains.
The following simulations show us that, the noisier the test is at measuring skill, the more it seems that lower-performing participants overestimate their level of skill, making the charts look like an example of the Dunning-Kruger effect, when in fact the results are just caused by the noisiness of the test itself:
We can intuitively understand why this happens: in the group of participants with the lowest measured scores, we find those with genuinely low skill, and those who were unlucky when they took the test (their measured skill was lower than their true skill).
Those who are low skilled but got lucky on the test (i.e., their measured skill was higher than their true skill) will be more likely to be on the right side of the chart and outside of the low measured skill group. Therefore, those inside the group will tend to have higher true skill than their measured skill! Because of this, the lowest-scoring participants will seem to be overestimating their skill level on average, even though (in this simulation) they are actually giving unbiased and accurate estimates of their skill.
A similar effect happens in reverse for people on the right of the chart: some of them are truly very skilled, and some were just "lucky" when taking the test of skill, meaning that their true skill is lower than their measured skill. So the chart will make it seem like they underestimate their performance, even though (in this simulation) they aren't doing so.
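A minimal sketch of this mechanism (again with illustrative parameters): even though every simulated person's self-assessment is just their true skill plus unbiased noise, grouping people by their measured score makes the bottom group look overconfident and the top group look underconfident:

```python
# A minimal sketch of the noisy-test simulation: self-assessments are unbiased
# (true skill plus noise), yet the lowest scorers appear to overestimate and
# the highest scorers appear to underestimate, via regression to the mean.
import numpy as np

rng = np.random.default_rng(0)
true_skill = rng.normal(100, 15, 1000)
measured = true_skill + rng.normal(0, 10, 1000)       # noisy test, no floor/ceiling
self_assessed = true_skill + rng.normal(0, 10, 1000)  # unbiased self-estimates

low = measured <= np.quantile(measured, 0.25)   # lowest-scoring quartile
high = measured >= np.quantile(measured, 0.75)  # highest-scoring quartile
print(f"Bottom quartile: measured {measured[low].mean():.1f}, "
      f"self-assessed {self_assessed[low].mean():.1f}")  # appears to overestimate
print(f"Top quartile:    measured {measured[high].mean():.1f}, "
      f"self-assessed {self_assessed[high].mean():.1f}")  # appears to underestimate
```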
Like we saw with floor and ceiling effects, this has nothing to do with human psychology - it only has to do with the nature of testing and mathematics. This gives us our second reason why a fake Dunning-Kruger effect may be found: it will appear to occur when a test of skill has too much measurement error.
Case 3: Rational actors with uncertainty
Now let’s explore what happens when we introduce bias in the way participants estimate their skill. "Bias" here simply means that people are not right, on average, about their own skill - their estimates are not simply their true skill plus some random noise. A biased estimate means that individuals may systematically overestimate (or systematically underestimate) how skilled they are.
Although the term "bias" seems to imply irrationality, this isn't necessarily the case. For instance, imagine that you have no idea how good you are at a given skill. What is it most rational to believe about your skill? Well, if you truly lack any information about your skill, the best prediction you can make is that your skill is about average (or, that your percentile compared to people similar to you is around 50%).
Suppose then that you get a small amount of evidence that you are not that great - but the evidence is far from definitive. The rational response is then to believe that your skill is a bit below average (or that your percentile is slightly below 50%) - that is, to nudge your estimate away from average because of this evidence. The stronger the evidence is that you have low skill, the more your prediction should diverge (downward) from the average.
To make this concrete, suppose you've never played darts (or any similar game) before. Initially, you predict that you have average dart skill (among people who also haven't played). If you miss with your first three darts, you might nudge this prediction down a little bit - it's a small amount of evidence that you may be below average - but not much evidence, since even okay players miss sometimes, and besides, you're just getting the hang of it. As you keep missing throw after throw, though, it's rational to keep adjusting your estimate of how bad you are at darts. After 100 terrible throws in a row, you conclude that you really are very unskilled at darts.
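To see how this kind of updating can be gradual yet decisive, here's a toy Bayesian sketch (our own illustration with made-up numbers, not a model from the original studies): your estimated hit rate starts at the average beginner's 50% and drifts down only as the misses pile up:

```python
# A toy Bayesian update for the darts example (our own illustration).
# Assume the average beginner hits the board 50% of the time, and start
# with a Beta(5, 5) prior centered on that average.
prior_hits, prior_misses = 5, 5

for throws, hits in [(3, 0), (10, 1), (100, 12)]:  # cumulative totals
    a = prior_hits + hits
    b = prior_misses + (throws - hits)
    print(f"After {throws} throws ({hits} hits): "
          f"estimated hit rate = {a / (a + b):.2f}")
# After 3 throws the estimate barely moves (0.50 -> 0.38); after 100 mostly
# missed throws, it has settled well below average (~0.15).
```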
Such an estimation procedure may actually be the rational thing to do! Yet it involves bias (in the technical mathematical sense). On average, people using this approach will systematically estimate their skill is closer to average than it really is. Interestingly, people with low skill who use this procedure will systematically over-predict their performance - they will tend to put themselves closer to the average than they really are (since they start with a prediction of being at the average and adjust down as they get evidence), but since they are below average, this means they overestimate their ability. On the other hand, people with above-average skills who use this method will do the opposite: they will start by predicting they are average, and as they get evidence about their ability they'll keep bumping up their prediction, meaning that they'll systematically tend to underestimate their skill. Of course, if people knew whether they were one of the high-skilled or one of the low-skilled, this would be an irrational approach - but the point is that they don't know which of the two groups they are in, so they begin with an estimate that they are average and then "update" away from that estimate as they gather evidence.
The upshot is that, if people follow this very reasonable approach to estimation, both low-skilled and high-skilled people will tend to think they are closer to average than they really are - creating an apparent Dunning-Kruger effect!
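Here's a minimal sketch of that estimation procedure (a simplified model of our own, with illustrative parameters, not the exact simulation from the full report):

```python
# A minimal sketch of rational-but-shrunken self-assessment: everyone starts
# from the population average and moves partway toward their (noisy) private
# evidence about their own skill. Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
prior_mean, prior_sd = 100, 15
true_skill = rng.normal(prior_mean, prior_sd, 1000)

# Each person's private evidence about their own skill (unbiased but noisy):
evidence_sd = 15
evidence = true_skill + rng.normal(0, evidence_sd, 1000)

# The rational (Bayesian) estimate is a weighted average of the prior mean and
# the evidence; with these equal variances, the evidence gets a weight of 0.5.
weight = prior_sd**2 / (prior_sd**2 + evidence_sd**2)
self_assessed = prior_mean + weight * (evidence - prior_mean)

low = true_skill <= np.quantile(true_skill, 0.25)   # truly least skilled
high = true_skill >= np.quantile(true_skill, 0.75)  # truly most skilled
print(f"Bottom quartile: true {true_skill[low].mean():.1f}, "
      f"self-assessed {self_assessed[low].mean():.1f}")  # overestimates
print(f"Top quartile:    true {true_skill[high].mean():.1f}, "
      f"self-assessed {self_assessed[high].mean():.1f}")  # underestimates
```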
Note that if this occurs, it would be a real (not "fake") Dunning-Kruger effect: a genuine fact about human psychology, not a mathematical artifact. On the other hand, this sort of Dunning-Kruger effect may occur even among rational actors! In other words, while the Dunning-Kruger effect is usually thought of as a form of irrationality, this effect may be observed even if nobody is being irrational!
Of course, we don't know for sure that what's really happening psychologically is that people are starting with estimating their ability near the mean and then adjusting based on evidence. But it's interesting to observe that the results we see are very similar to what we'd get if that's indeed what they were doing.
So, is there a Dunning-Kruger effect?
Designing a test that doesn’t introduce the “fake” Dunning-Kruger effects we've mentioned can be quite challenging, even when one is aware of these effects. In our study, our estimate of the correlation coefficient between general intelligence (g) and our measured IQ is 0.75 (corresponding to a noise standard deviation of 10 in the above examples). That measurement error, while fairly small, contributes to the flattening of the self-assessments line - and therefore the appearance of a Dunning-Kruger effect. We unfortunately can't be confident about how much our own result can be attributed to a “fake” Dunning-Kruger effect vs. a real one. And for many other study designs it is difficult to rule out measurement error and floor/ceiling effects as the cause of any apparent Dunning-Kruger effect. Very carefully conducted measurements (with low measurement error, and no floor or ceiling effects) would have to be made to definitively prove there is a real effect!
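For the curious, here's how those two numbers relate (assuming, as we do here, that measured IQ scores are scaled to the usual standard deviation of 15):

```python
# How a noise standard deviation of 10 corresponds to a correlation of ~0.75
# with g, assuming measured IQ is scaled to the usual standard deviation of 15
# (stated here as an assumption about the setup).
total_sd, noise_sd = 15, 10
signal_sd = (total_sd**2 - noise_sd**2) ** 0.5  # SD of the true-skill component
print(signal_sd / total_sd)  # ~0.745, i.e., roughly 0.75
```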
But the cases and simulations above are remarkable because they show that even when researchers are careful to avoid "fake" Dunning-Kruger effects, the patterns that emerge in Dunning-Kruger studies can be reproduced anyway by people acting rationally.
But what of Kruger and Dunning's original explanation? Well, it might still be true - perhaps it really is the case that people with lower skill lack the ability to judge their own skill, and that, additionally, highly skilled people are suffering from a false-consensus effect. Indeed, Kruger and Dunning provided some evidence of this in their original studies. But it seems to us that the combination of these two biases (one impacting low-skilled people, the other impacting higher-skilled people) doesn't necessarily provide the most parsimonious explanation for the observations. We have to contend with at least two other possibilities:
The Dunning-Kruger effects are fake, due to one of the two mathematical issues we've discussed
The Dunning-Kruger effects are real, but they are a result of rational behavior, not the combination of two biases
The longer, more-detailed version of this article goes even further: it shows how two plausible assumptions can result in simulations that produce results extremely similar to actual data that appear to show the Dunning-Kruger effect, without requiring Kruger and Dunning’s explanation to be true. We also show in that piece evidence from Dunning-Kruger studies that two additional types of irrationality may be occurring (an "Under Adjustment Bias" and a "Better-Than-Average" effect) that are not the biases usually used to explain the Dunning-Kruger effect.
Final takeaways
So, what did we learn here?
There are some key takeaways from these simulations:
Fake Dunning-Kruger effects that are just due to mathematical artifacts (rather than psychology) can easily occur - for instance, when a test of skill has a high degree of measurement error, or when the test has ceiling and floor effects. Therefore, great care must be taken in research to be confident one has found a real Dunning-Kruger effect, rather than a fake one. In fact, unless really high-quality skill measurements are made, we can't be sure any given study is showing real Dunning-Kruger effects rather than fake ones! This is true in the case of our IQ study. We hope that more such studies will be conducted that are able to make the really precise measurements required.
A real Dunning-Kruger effect may well still exist, but it may be the result of rational behavior! While we can't rule out Kruger and Dunning's original explanation for the effect (i.e., that the lowest performers lack the metacognitive skills needed to accurately assess their own performance), the data also matches simulated data where people estimate their skill level to be average (when they lack any evidence about their skill) and then adjust their estimates based on the evidence they have.
If you’d like to read more about this subject, check out the full write-up (of which this is an abridged version) here.
If you’re interested in this subject, why not also try our Overconfidence Analyzer tool? Discover how accurate your confidence levels are likely to be for any skill.
You can also take your thinking about popular psychology concepts even further with the Guess Which Experiments Replicate quiz. Can you figure out which high-profile experimental findings replicated when people tried the experiments again, and which did not?