Re: Case Studies in Retrospect
I would not have predicted that the pars would do quite that well, but I agree that they did. Perhaps I should have expected it.
Compare any two versions of a given image at random, and ask yourself whether a significant majority of viewers, say two-thirds, would prefer one version to the other. I’m experienced at asking myself that question and then verifying my opinion with others. Without seeing these two versions in advance, I’d have to predict that the chances of one being voted better are about 40%, since I imagine that 40% of the time the jury would prefer the other and 20% of the time it would have no preference.
Now suppose that we compare one of these versions to five randomly chosen ones. What are the odds of it winning against all five? This is impossible to state accurately, because it’s not like flipping a coin five times, where what happened on the first four flips has no bearing on the result of the fifth. Each time our version wins, it becomes more likely to win against its next opponent. If it has already won against four opponents, it has shown itself to be good enough to be a heavy favorite against the fifth. So I’m going to speculate that the odds of being better than all five randomly chosen opponents are slightly less than one in ten; let’s be generous and say 10%.
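For comparison, the fully independent baseline is easy to work out: if each matchup really were a fresh 40% coin flip, winning all five would be far rarer than 10%. A quick sketch (the 0.40 figure is the per-matchup estimate above):

```python
# If the five matchups were independent, the chance of winning all five
# would be the per-matchup probability raised to the fifth power.
p_win = 0.40
p_all_five_independent = p_win ** 5
print(f"{p_all_five_independent:.4f}")  # about 0.0102, i.e. roughly 1%
```

So the dependence between matchups, where a version that has beaten four opponents is probably just plain good, is what lifts the figure from roughly 1% toward the 10% speculated here.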
For the MIT study I took 150 random images, each corrected by five retouchers. In each case I averaged the five, then compared the average to each of the five parents. According to my scoresheet the odds of the average being better than an individual parent were not 40% but 74%. The odds of being better than all five were not 10% but 24%.
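As a side observation of my own: the two reported rates are roughly consistent with each other, in that treating the five matchups as independent at the higher 74% per-parent rate lands near the observed all-five figure:

```python
# 74% per-parent win rate, raised to the fifth power, lands close to
# the observed 24% rate of beating all five parents.
p_per_parent = 0.74
print(f"{p_per_parent ** 5:.3f}")  # about 0.222
```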
I was thinking that our par versions would show somewhat similar results, but this was stupid. Of the five MIT retouchers, often one or two would do a notably poor job, and this would detract from the average version. Someone who did a good job, therefore, had a potential advantage over the average. In our case this was not true: the par versions weren’t created from random efforts but rather from good ones, the best five, say, out of 25.
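For concreteness, "averaging the five" presumably means a straight pixel-wise mean of the five corrected versions. A minimal sketch with NumPy, using synthetic arrays in place of the real retouched files (the sizes and names are illustrative, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Five retouched versions of the same image: stand-in 8-bit RGB arrays
# of identical dimensions.
versions = [rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
            for _ in range(5)]

# The par: a pixel-wise average of the five parents.
par = np.mean(np.stack(versions), axis=0).round().astype(np.uint8)

print(par.shape)  # (100, 100, 3)
```

This also makes the point about poor parents concrete: one badly corrected version pulls every pixel of the mean toward itself, which is why an average of random efforts can lose to a single good one.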
And the results were convincing. I didn’t actually count these up as I did with the MIT study, but my impression is that in at least half of these studies the par would have been voted better than any of its five parents. There were instances where people came up with something at least competitive with, if not better than, the par, but they were rare.
I seem to recall posting once that blending multiple versions is probably most advantageous to the inexperienced, because averaging may cover up certain deficiencies in one version. But these studies suggest that maybe blending is even more advantageous to the experienced professional.