Liars, Damn Liars, and Statisticians
Sure, sure, I had statistics thrown into the mix when I was in engineering school, but nothing like the 120 hours of applied industrial stats I went through when I worked for ALCOA. Most of it covered the hard-core fundamentals, but the point was to prepare us for design of experiments, which is really where you get your money’s worth out of experimentation and data.
I mention this not because I’m a stats freak (I truly am not), but because one of the very interesting things we went through was data mining and analysis, and how you could, under some circumstances, get what appear to be almost diametrically opposed results depending on how you slice the data set.
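That slicing effect has a well-known textbook form, Simpson’s paradox: a comparison that favors one option inside every subgroup can reverse once the subgroups are pooled. The sketch below uses made-up counts (not ALCOA data) for two hypothetical process settings, A and B, split by a hypothetical part-size category. Setting A wins for both small and large parts, yet B wins on the pooled numbers, because A was mostly run on the harder, large parts.

```python
# Made-up (successes, trials) counts for two hypothetical settings, A and B,
# split by a hypothetical grouping variable: "small" vs "large" parts.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes: int, trials: int) -> float:
    """Success rate within a single subgroup."""
    return successes / trials


def pooled_rate(groups: dict) -> float:
    """Success rate after ignoring the grouping and adding everything up."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return total_successes / total_trials


for setting, groups in data.items():
    by_group = {g: round(rate(*counts), 3) for g, counts in groups.items()}
    print(setting, by_group, "pooled:", round(pooled_rate(groups), 3))
```

Neither view is automatically the “right” one; it depends on whether part size belongs in the analysis, which is exactly the kind of judgment call that produces different “truths” from the same data.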
There are two different issues afoot here: first, how you construct the experiment, and second, how you ask questions of the data set you get.
In an ideal world, you decide which questions you want answered before you construct the experiment, so you can design it to maximize the chance of getting good data.
However, just as real life has what I call the “Thirteen Sides to Every Story” effect, so does experimental design: there are many different ways to construct an experiment aimed at finding the best data, and each one brings its own perspective. So you may think you have the best data (the “truth”), but someone else could have a “truth”, too, that’s slightly different from yours.
So there is the potential for, hmm, differences in “truth” arising from how you design the experiment, even when your motives are pure.
Let’s say, then, you get the data set, analyze it based on your initial parameters, and get your results (good/bad/mixed, doesn’t matter). What if you start changing the parameters of the analysis? Doesn’t good research demand that you look deeper into the data to see if there is something significant hidden within the larger data? Sure! But this is where subjectivity creeps in: anytime you start down the “What if we set boundaries here…” road, you are likely leaving objectivity behind. After all, if you don’t know what you are looking for, how can you be objective about it? Different people, with different perspectives, will formulate different approaches, and, voila!, you get different results… even when intentions are good.
Then there is a whole other situation: someone who wants to manipulate the data in hopes of providing “proof” for a given position. It depends on the data set, of course, but this can be ridiculously easy to do. Not trivial, and potentially time-consuming, but not hard.
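One reason it can be so easy is the multiple-comparisons problem. As a rough sketch, assuming each new slice of the data is an independent test at the conventional 5% significance level and that there is no real effect anywhere, the chance of turning up at least one spurious “significant” result across k slices is 1 − (1 − α)^k. The function name below is just an illustrative choice.

```python
# Chance of at least one false "significant" finding across k slices,
# assuming independent tests at level alpha and no real effect anywhere.
def family_wise_false_positive(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20, 50):
    print(k, round(family_wise_false_positive(k), 3))
```

With 20 slices the chance is already about 64%. Slices of one real data set are usually correlated, so the exact figure will differ, but the trend is the point: keep cutting the data long enough and something will “pass.”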
Ever since those stats classes at ALCOA, whenever I see significant results shouted out on the strength of rigorous data analysis, I don’t immediately discount the “proof”, but I hold my acceptance of the results in abeyance until I see reasonable corroboration that the design, data, and analysis were done on the up and up. And even then, I remain skeptical to a certain degree; after all, there are always thirteen sides to every story of data, too.
There are Liars, Damn Liars, and then there are Statisticians…