Briefly: we show that averaging multiple guesses from one person about a matter of fact provides a better estimate than either guess alone. We interpret this as evidence of people ‘sampling’ responses from internal belief distributions.
Here are links to this paper in the news:
- Psych Science Blog
Here is a link to the final paper: https://www.edvul.com/pdf/VulPashler-PS-2008.pdf.
There were a few concerns expressed by various commenters on the sites above, so we’ll respond to them here:
1. Are people looking up the answers?
No, for two reasons. First, the second guesses (both immediate and delayed) are less accurate than the initial guesses (see the original paper). Second, if people were looking up and reporting the correct answer, there would be a conspicuous peak in the error histograms at the true value (error = 0). The histograms in the figure below reveal no such effect.
Figure: scatter plots of the errors of guess 1 and guess 2 in the immediate (panel 1) and delayed (panel 2) conditions, with marginal histograms for each guess. The positive correlation indicates that subjects are not setting a ‘range’ or systematically ‘overcorrecting’ with their second guesses. The histograms show no evidence of subjects ‘looking up’ the answer before guess 2. The joint distribution is best described as a multivariate normal with a positive correlation.
2. Is it “just a statistical phenomenon”?
If one averages a number of independent samples, their average converges to the mean of the population, which will be roughly the correct answer if the group is not biased. It seems natural to treat guesses from different people as independent samples from an unbiased distribution, which gives rise to the “wisdom of crowds”. However, this is not usually assumed of multiple guesses from a single individual. The fact that we observe the statistical phenomenon (a reduction of error in the average of multiple guesses from one individual) indicates that the errors of different guesses from one person are at least partly independent, which suggests the presence of a “crowd” within.
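The role of partial independence can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: both guesses have zero-mean, unit-variance normal errors that share a correlation `rho`, and the function name `simulate_mse` is hypothetical. Under those assumptions the expected squared error of the average is (1 + rho) / 2, so averaging helps whenever rho is below 1 and helps more as the errors become more independent.

```python
import random


def simulate_mse(rho, n=50_000, seed=0):
    """Simulate n pairs of guess errors with correlation rho (0 <= rho <= 1).

    Assumption (illustrative, not from the paper): each error is a shared
    component plus a private component, scaled so each error has variance 1.
    Returns (MSE of guess 1, MSE of the average of the two guesses).
    """
    rng = random.Random(seed)
    shared_w = rho ** 0.5          # weight on the error common to both guesses
    private_w = (1 - rho) ** 0.5   # weight on the error unique to each guess
    se_single = 0.0
    se_average = 0.0
    for _ in range(n):
        shared = rng.gauss(0, 1)
        e1 = shared_w * shared + private_w * rng.gauss(0, 1)
        e2 = shared_w * shared + private_w * rng.gauss(0, 1)
        se_single += e1 ** 2
        se_average += ((e1 + e2) / 2) ** 2
    return se_single / n, se_average / n


# 0.7 and 0.35 are the correlations reported for the immediate and delayed
# conditions; 1.0 is the fully dependent case, where averaging cannot help.
for rho in (1.0, 0.7, 0.35):
    single, average = simulate_mse(rho)
    print(f"rho={rho}: single={single:.3f} average={average:.3f} "
          f"expected average={(1 + rho) / 2:.3f}")
```

With fully dependent errors (`rho = 1.0`) the average is exactly as bad as a single guess; any independence at all produces a gain.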
3. Are people implicitly setting a range?
No. If people were setting a ‘range’, the errors of guess 1 and guess 2 would be anti-correlated. Instead, they are positively correlated (see the scatter plot in the answer to question 1), with a correlation of 0.7 for the immediate second guess and 0.35 for the delayed second guess.
4. How can the errors be positively correlated, the second guess be worse than the first, and the average be better than either guess alone?
I think the problem most people have with this is the implicit assumption that all of the conditions must apply to every single pair of guesses. Thus people wonder: if guess 1 is better than guess 2, and guess 2 errs on the same side of the correct answer as guess 1 (an inference made from the positive correlation), then how can the average be better than guess 1? It must be further from the correct answer. This would be true if all of these conditions held for every pair of guesses, but the correlations and average errors are statistics of the whole group of guesses.
Our pairs of guesses are best described as random samples from a multivariate normal distribution with mean [0 0], a positive correlation, and greater variance along the guess 2 dimension. Perhaps more intuitively, I can provide an example in which all of the conditions are true of the population, while the average is better than either guess alone:
- guess 1: 3; guess 2: 1.1
- guess 1: 1; guess 2: 3.1
- guess 1: -3; guess 2: -1.1
- guess 1: -1; guess 2: -3.1
Note that the mean squared error (MSE) is lower for guess 1 (5) than for guess 2 (5.4), but the MSE of the average (4.2) is lower still, and the correlation between guess 1 and guess 2 is positive (0.6).
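The arithmetic for the four pairs listed above can be checked with a few lines of Python. This is just a verification sketch; the helper names `mse` and `pearson` are ours, and the values are treated as errors relative to a correct answer of 0.

```python
# The four (guess 1, guess 2) pairs from the example, as errors around 0.
pairs = [(3, 1.1), (1, 3.1), (-3, -1.1), (-1, -3.1)]

guess1 = [g1 for g1, _ in pairs]
guess2 = [g2 for _, g2 in pairs]
average = [(g1 + g2) / 2 for g1, g2 in pairs]


def mse(errors):
    """Mean squared error, with the correct answer taken to be 0."""
    return sum(e * e for e in errors) / len(errors)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Number of individual pairs in which the average beats guess 1.
avg_beats_guess1 = sum(abs(a) < abs(g1) for a, g1 in zip(average, guess1))

print("MSE guess 1:", round(mse(guess1), 2))
print("MSE guess 2:", round(mse(guess2), 2))
print("MSE average:", round(mse(average), 2))
print("correlation:", round(pearson(guess1, guess2), 2))
print("pairs where the average beats guess 1:", avg_beats_guess1, "of", len(pairs))
```

The average beats guess 1 in only two of the four pairs, yet its MSE is lower overall, which is the sense in which these are statistics of the whole group rather than properties of each pair.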
5. What are the questions, answers?
1. The area of the USA is what percent of the area of the Pacific Ocean?
2. What percent of the world’s population lives in either China, India, or the European Union?
3. What percent of the world’s airports are in the United States?
4. What percent of the world’s roads are in India?
5. What percent of the world’s countries have a higher fertility rate than the United States?
6. What percent of the world’s telephone lines are in China, USA, or the European Union?
7. Saudi Arabia consumes what percentage of the oil it produces?
8. What percentage of the world’s countries have a higher life expectancy than the United States?