GDELT and Big Data: Why Theory Still Matters

I’m really excited by the announcement of the GDELT (Global Data on Events, Location, and Tone) data set. Foreign Policy has done a great summary, and the Dart-Throwing Chimp has also written an insightful commentary on GDELT’s potential for the evolution of political science. You can also read the authors’ paper, which was presented at the ISA conference a few weeks ago.

At the risk of oversimplifying things, GDELT is a step towards building a giant database of everything. Political events such as diplomatic overtures, threats, and demonstrations can be mapped and tracked over a specific period of time. By mapping the occurrence of events, one can potentially use the data as a predictive tool, as James Yonamine did in his paper on tracking violence in Afghanistan.


More real-time work is being done by Alex Hanna on the Arab Spring and Rolf Fredheim on Russian protests (thanks for the tips so far!), and I look forward to contributing as soon as I become more familiar with R. But in this post, I’ll focus on the normative and theoretical implications of GDELT and Big Data.

In Viktor Mayer-Schoenberger and Kenneth Cukier’s recent book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think”, the authors emphasize how having more “messy” data can transform our traditional theories of causation in social science. Whereas social scientists once needed high-quality, representative samples for their data sets, researchers of today and tomorrow will have the luxury of an overwhelming deluge of data instead. The need for a “valid” sample diminishes when you can potentially have “n=all” instead.

In a recent interview, Viktor Mayer-Schoenberger summarizes how our age-old model of the scientific method may be improved.

“So Big Data enables us not to test the hypothesis, but to let the data speak and tell us what hypothesis is best. And in that way it completely reshapes what we call the scientific method or — more generally speaking — how we understand and make sense of the world.”

The Big Data movement may change our traditional notions of what constitutes “theory”, and in turn improve our understanding of what the world is like. Yet it becomes even more important to remain skeptical of the data that is presented. Skilled researchers must be aware of both GDELT’s internal limitations and its external implications for policymakers. John Beiler has also written a great piece on the theoretical implications of big data, in particular what this means for social science theory.

“…I think the social sciences, and science in general, is about asking interesting questions of the data that will often require more finesse than taking an “ANALYZE ALL THE DATA” approach. Thus, while datasets like GDELT provide new opportunities, they are not opportunities to relax and let the data do the talking. If anything, big data generating processes will require more work on the part of the researcher than previous data sources.”

My takeaway from John’s points is that data in itself is not neutral; we must become more self-reflective about the parts of the data we are using. But I am even more concerned with the external and practical uses of the data we have access to.

In the example of aid transparency, “Big Data” is inherently geared towards policy, academic, and donor elites who actually have the power to interpret and do things with these numbers, while the beneficiaries of aid (often the rural and technologically illiterate poor) are often disadvantaged. During a recent roundtable at ISA 2013, I heard a story about an African government that had asked an aid transparency organization to disclose the names and locations of civil society groups, ostensibly to find out where the aid was going. In this case, the government wanted to use the data to target the NGOs (which can be a threat to the state’s legitimacy in certain contexts) and put their members in jail. The commentator went on to emphasize that no one really knows who is using open data, and there is no clear way to determine for what purpose.

In conclusion, information and “Big Data” are not apolitical, for information in itself is power. Power remains asymmetrically tilted towards the actors who not only have access to big data and information, but also have the power to do things with it. Furthermore, big data cannot tell us what we should do with this information.

Regardless, I remain optimistic about the potential of big data to change our traditional ways of thinking. I agree with John and the authors of “Big Data” that more and more “messy” data will be useful for improving our theories. Yet I remind myself that theories of power and politics are still relevant, perhaps more than ever.

“Big data is a resource and a tool. It is meant to inform, rather than explain; it points us toward understanding, but it can still lead to misunderstanding, depending on how well or poorly it is wielded. And however dazzling we find the power of big data to be, we must never let its seductive glimmer blind us to its inherent imperfections.” (Page 197 of Viktor Mayer-Schoenberger and Kenneth Cukier’s “Big Data: A Revolution That Will Transform How We Live, Work, and Think”)


The “Academic-Policy” divide and the role of the citizen

Earlier this morning, the Duck of Minerva linked to a CNN commentary by UCSD’s Stephan Haggard, who argues that Kim Jong-Un is not crazy, citing clues that the recent blustering is “ritualized escalation” for domestic political purposes. This isn’t exactly a mind-blowing argument, as most academics in social science would likely agree on the importance of understanding the domestic constraints on a political actor’s actions. Indeed, it is rare for a political leader to be “crazy”; my personal impression is that the young leader must act especially bellicose to consolidate his power among the military leaders.

But scroll down a bit to the article’s comments, and witness the overwhelming amount of disagreement with Haggard’s argument. Here are some of my favorite comments:

  • “This professor, in my opinion, has spent to way too much time in suspended academic animation, San Diego style. Maybe the professor is the one who is crazy. Maybe it’s the intoxicating sunset views of the pacific at UCSD that renders him unable to escape from that dream-induced world of his.”
  • “You are the problem.”
  • “I suppose that the same argument could be made that going into a party store with an exposed semi-automatic rifle and wearing a ski mask doesn’t mean you intend to rob it or hurt anyone. I know that argument works great with the police right?”

And so on and so forth; rare was the voice that agreed with Haggard’s analysis.

It is naturally elitist of academics to dismiss the voices of the online mob; winning arguments in the YouTube comments section is often a lost cause. I wouldn’t blame any reasonable person in academia for giving up entirely on communicating their ideas to the masses. It’s much more rational for an academic to cater their arguments to the policy community.

However, I do wonder. Policymakers, politicians, and bureaucrats are tied (indirectly and directly) to the concerns of the greater citizen public, for all its good and bad.

Conventions and conferences like the recent ISA 2013 in San Francisco are often rife with discussions about the “theory-practice” divide, where academics lament the lack of attention policymakers give to their ideas. At a thought-provoking panel about Rio +20 and the future of sustainable development, we wondered why there was virtually zero representation of UN/State Dept/think tank’ish/CSO/NGO delegates in the room. I’m sure this scene is replicated in nearly every panel. Where are all the policy wonks and activists who may benefit from academic knowledge?

While the “theory-practice” divide is given much attention by various academics and bloggers, I’ll pose a different question instead. What are the implications of academia ignoring the conversation with the greater public audience (especially the people misinformed about China holding all our money)?

My hypothesis is this: IF citizens can significantly influence the agenda of policymakers (ideally so in a representative democracy!), and IF academics want to influence the agenda of policymakers (there is a strong disconnect between what academics and policymakers are most concerned about), then academics need to be better at engaging the citizen public at large.

(Of course, this is contingent upon policymakers caring about their constituents, whether or not citizens even disagree with most policymakers, etc.)

I’m not suggesting that academics are at fault for the public’s many misconceptions. 37% of Americans believe global warming is a hoax, 28% believe Saddam was involved in 9/11, etc. (but at least only 5% believe Paul McCartney is dead!). The media is heavily complicit in spreading false assumptions too, and is perhaps the most culpable of all.

But academics seem to cede the battle entirely, with most of their work being directed towards each other, policymakers, and civil society elites. While this is perfectly understandable, considering the day-to-day incentives and constraints of an academic scholar, I wonder what we lose by ignoring the public sphere completely. There must be some cost, when academics cannot communicate their ideas and perspectives to the larger community as well.

I do believe that there are inherent limits to telling somebody the truth about something; the rational actor does not exist in the collective sphere. I would hope that academics continue to explore more creative mediums to teach the greater public, such as film documentaries and interviews that may connect with people on a more emotive level.

That’s why Stephen Colbert’s whole “truthiness” gag is peculiar to me. While the whole Colbert bit is obviously sarcastic, I do feel that people can’t experience truth through knowledge alone. Though we’re living in the big data/information age, let’s not forget that people will instinctively trust their gut when confronted with counter-intuitive knowledge.

What are your thoughts?