A couple of years ago, I wrote Online Advertisements and Statistical Analysis, in which I did my best to show that an earlier study of online advertising click-thru rates (CTRs) wasn’t worth the pixels it was printed on.
About a week ago, my wife and I were visiting friends, and I found myself in a room with 3 neuroscientists. The topic of statistics came up, and I managed to work into the conversation my small triumph in analyzing the click-thru study: determining both a confidence interval and the number of tests that would need to be run to make that interval meaningful. “Sure,” one of the scientists said, “but what you should really do instead is a chi-square test for goodness-of-fit.”
I had no idea what that meant, or even how it was spelled (I was thinking Kai at first instead of chi), but I found a description in my statistics textbook.
In the original post, there were 6 different banner advertisements, which varied only slightly. Each one was run for 30,000 impressions, producing these six click counts:
| Ad | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Clicks per 30,000 impressions | 81 | 84 | 90 | 96 | 99 | 108 |
The chi-square test calls not just for observed values, but also for expected values:
For the expected value, I used the mean of all 6 ads, 93 clicks per 30,000 ad impressions, which is what each ad would be expected to get if they all performed identically. (I don’t know if this is the best value to pick, but it falls in line with the textbook examples.)
I got a chi-square value of 5.42. I wasn’t positive whether there were 5 degrees of freedom or 4, but either way, the lookup table put the p-value above 0.1, which suggests that the differences in clicks may have been due to chance rather than to differences in ad design.
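If you want to check the arithmetic, here is a quick sketch in Python. I did the lookup with a printed table, so the scipy calls below are only a stand-in for that table, not what I actually used:

```python
from scipy.stats import chi2

observed = [81, 84, 90, 96, 99, 108]
expected = sum(observed) / len(observed)      # 558 / 6 = 93 clicks per ad

# Chi-square statistic: sum over the ads of (observed - expected)^2 / expected
statistic = sum((o - expected) ** 2 / expected for o in observed)
print(round(statistic, 2))                    # 5.42

# The table lookup, done in code, for both candidate degrees of freedom
print(chi2.sf(statistic, df=5))               # ~0.37, comfortably above 0.1
print(chi2.sf(statistic, df=4))               # ~0.25, also above 0.1
```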
My wife uses several stats software packages, so we entered my data into GraphPad Prism and ran a chi-square test there. It came up with a chi-square value very close to mine (though not identical; I’ll have to check my figures again) and a p-value of 0.37. That value, if I’m interpreting it correctly, means that if all six ads really performed identically, there would be a 37% chance of seeing differences at least this large from random variation alone.
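For what it’s worth, scipy’s one-call goodness-of-fit function lands in the same place; it defaults to using the mean of the observations as the expected value and k - 1 = 5 degrees of freedom:

```python
from scipy.stats import chisquare

# Same test in one call: expected values default to the mean of the
# observations, degrees of freedom default to k - 1 = 5
statistic, pvalue = chisquare([81, 84, 90, 96, 99, 108])
print(statistic, pvalue)                      # ~5.42, ~0.37
```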
Partway through this exercise, though, I realized there is a way to trick people into believing the truth without any analysis at all. Instead of focusing on the clicks, focus on the impressions where people didn’t click. Sure, if one ad gets 81 clicks and another gets 108 clicks, one seems clearly superior to the other. If you instead compare 29,919 non-clicks to 29,892 non-clicks, the difference seems trivial.
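As I understand it, the textbook way to actually fold the non-clicks into the analysis is a 2 x 6 contingency table of clicks versus non-clicks. Here is a sketch of that with scipy; because the non-click counts dwarf the click counts, it barely moves the numbers:

```python
from scipy.stats import chi2_contingency

clicks = [81, 84, 90, 96, 99, 108]
non_clicks = [30_000 - c for c in clicks]     # 29,919 down to 29,892

# 2 x 6 table of clicks vs. non-clicks; degrees of freedom = (2-1) * (6-1) = 5
statistic, pvalue, dof, expected = chi2_contingency([clicks, non_clicks])
print(statistic, pvalue)                      # ~5.44, ~0.37: essentially the same
```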
Still, I am more satisfied with the statistical result. We should be skeptical when people try to convince us that such small differences are significant over so few tests.