These thoughts were spurred by a post I spotted from Andrew McAfee, a principal research scientist at MIT, entitled "The Pathetic Pundit Playbook." He’s not the only one to advance this ridiculous narrative – only the latest.
Most of the folks pushing this "punditry is awful and must be destroyed" line of messaging are themselves pundits… otherwise we wouldn’t be talking about this. Pundits are simply "people who provide opinions to mass media." If you’re a blogger with any kind of opinion, or you’re appearing on a radio show, podcast or TV show, you’re a pundit. You’re providing color commentary on anything online? You’re engaging in punditry. In this, Andrew McAfee undermines his own editorial when he says “…pundits become easy to recognize. When you spot one, move calmly and purposefully toward the nearest exit.”
Nate Silver, to the best of my knowledge, never revealed the precise nature of his algorithms, the methodology he used for normalizing his data, or the methods he used to ensure the integrity of his source data.
I certainly remember what I and most students were told in math class: show your work. Because Silver and his organization aren’t responsible for making the observations (conducting the polls), there’s a bit of a black box involved, and his methodology won’t be unassailable.
There were plenty of well-reasoned critiques of Silver from me and others, including the comments on McAfee’s own blog:
Questioning whether his inputs (state polls) were valid isn’t a conspiracy theory. Pointing out where Nate has been wrong in the past (because the state polls were wrong) is hardly an assault on math. His model is built on assumptions and some people didn’t buy those assumptions. Turns out Nate’s assumptions were correct, but that doesn’t mean it was outrageous to question them.
Compound this with the adversarial stance he chose to take toward established political pundits, and it created an unnecessarily antagonistic relationship between data science and punditry.
This relationship is a media narrative, not reality.
Journalism and punditry rely on data science, and are not antagonistic to it. The source of modern punditry’s most visible foibles isn’t a divorce from science; it’s that the revenue models in place for the majority of Old and New Media journalism incentivize sensationalism. Data Science relies heavily on observation and hard facts and figures – it isn’t prone to wild unexplained swings (within the context of political polling, particularly), so it only acts as an aid to punditry, not an adversary.
Are there bad pundits? Absolutely, and I criticize them all the time. Are all pundits bad? Of course not, don’t be ridiculous.
Does Data Science solve every problem with journalism (political or otherwise)? No. Does it solve a lot of things? Heck, yes. SiliconANGLE is literally banking on it.
Disclosure: SiliconANGLE is a data-driven journalism organization. I’m a founding editor there; I know what I’m talking about. I like that Nate Silver has brought increased attention to data science, but I don’t like that everyone is suddenly an "expert" on what data science is, and what its relationship to media is.
Update: On Facebook, longtime friend Joel Finkelstein questioned me on some of the details of what I was trying to say here…
Joel: I talk with guys like Sam Wang (Princeton Election Consortium), or Peter Norvig, or whoever… and these guys definitely believe that their analysis is more akin to a scientific truth claim than it is to an opinion (an is vs. an ought). If that is so, why shouldn’t we have different forms of punditry for the appropriate level of analysis (Is vs. ought) so we can actually make substantive arguments about truth claims rather than conflate them with opinion pieces.
Rizzn: Reliability and track records are the watchwords of punditry. Data Scientists, despite their best claims, are not at all on the "is" side of "is v. ought." They’re on the "more" side of "more v. less reliable."
This is a nuanced topic, and it often depends on what sort of data you’re analyzing (structured v. unstructured), but you always have to make a set of assumptions about the data you’re analyzing before you can formulate an algorithm and reach a conclusion based on the observed data. The assumptions all have to do with the presumed accuracy of the data, the level of granularity/sampling of the data, and, if you’re doing time-series analysis (as Nate was doing), whether your assumptions remain constant over the course of the time series (i.e., that variables aren’t changing under your nose).
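To make that concrete, here’s a minimal sketch in Python of how much a single assumption can move a conclusion. To be clear, this is not Nate Silver’s model (which, as far as I know, was never published in full). It’s a toy normal-error model I’m making up for illustration: the poll margins are invented, and the function name `win_probability` and its `assumed_bias` and `assumed_sd` parameters are hypothetical.

```python
import math


def win_probability(poll_margins, assumed_bias=0.0, assumed_sd=3.0):
    """Chance the poll leader actually wins, under a toy normal-error model.

    poll_margins: the leader's margin in each poll, in points (made-up data).
    assumed_bias: systematic polling error we CHOOSE to assume, in points.
    assumed_sd:   assumed standard deviation of the true margin around the
                  bias-corrected poll average, in points.
    """
    average = sum(poll_margins) / len(poll_margins)
    corrected = average - assumed_bias
    # Normal CDF at corrected / sd, i.e. P(true margin > 0).
    return 0.5 * (1.0 + math.erf(corrected / (assumed_sd * math.sqrt(2.0))))


if __name__ == "__main__":
    polls = [2.0, 3.0, 1.0, 2.0]  # hypothetical margins, not real polls
    for bias in (0.0, 2.0, 4.0):
        p = win_probability(polls, assumed_bias=bias)
        print(f"assumed bias {bias:+.1f} pts -> win probability {p:.2f}")
```

Same polls, same arithmetic. The only thing that changes between runs is the bias you assume in the inputs, and that alone is enough to flip the call from favorite to underdog.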
These sorts of assumptions require the expertise of an SME (subject-matter expert), who may or may not be as knowledgeable as the data scientist in charge of the algorithm. Any time you’re relying on the expertise of an SME, you’re subjecting your algorithms to their bias and experience (since, as we know, expertise can be faked).
I’m not saying this to cast aspersions on data science, but these are the realities… if we start putting faith in data science as this infallible thing, we’re going to open the door for all sorts of things masquerading as data science (taking advantage of social confirmation bias). It’s a new-ish discipline, and it’s important to understand the vulnerabilities of the discipline. People understand the vulnerabilities in general-purpose punditry – that’s what we call being media-savvy. People don’t understand the vulnerabilities in data science and visualization… so they look like fools for not questioning it as often as they look like fools for questioning it in the wrong ways.