

by Elena Zhirukhina, Marie Curie Fellow at the School of International Relations, University of St Andrews.
A rapidly growing number of researchers use disaggregated data on a variety of conflict types to study their dynamics. Large-N datasets present a tremendous opportunity to look closely at how violence evolves. While disaggregated datasets provide detail, they carry certain caveats to consider when extracting specific data and, especially, when combining data from different sources. Working mainly with data on terrorist incidents, I have encountered three issues to keep in mind: data verification, methodological specificity, and language.
Issue 1: Data verification.
Data collection requirements specify the need for links to primary sources, especially when information was obtained from the media. Being able to trace the original source is seen as an essential step in data verification, particularly since the main source used in terrorism studies is the media[1]. Media reports carry known risks of disinformation, censorship, underestimation, or overestimation[2]. However, lacking a better alternative, researchers naturally rely on the media. Media-related biases can be overcome by using disaggregated data[3], which allows researchers to check different levels of data and dig deeper into the sources. Nevertheless, a study should still account for the facts that are more likely to be distorted, such as the geographic location of an event or the casualties it caused[4]. A reasonable solution to reporting bias is to include diverse sources and triangulate among them[5]. It is therefore important to check whether a dataset contains links to primary sources that enable verification.
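To make this check concrete, here is a minimal Python sketch. It assumes a hypothetical report-level table in which each row is one media report about an event; the field names `event_id`, `outlet` and `source_url`, the `min_sources` threshold, and the toy records are illustrative assumptions, not the schema of any real terrorism database. For each event it counts distinct outlets and notes whether any report links back to a primary source, flagging events that are untraceable or single-sourced.

```python
from collections import defaultdict

# Hypothetical toy records: one row per (event, report). Field names are
# illustrative assumptions, not taken from any real dataset.
reports = [
    {"event_id": "E1", "outlet": "Outlet A", "source_url": "https://example.org/a1"},
    {"event_id": "E1", "outlet": "Outlet B", "source_url": "https://example.org/b7"},
    {"event_id": "E2", "outlet": "Outlet A", "source_url": None},
    {"event_id": "E3", "outlet": "Outlet C", "source_url": "https://example.org/c3"},
]


def verification_summary(reports, min_sources=2):
    """Per event: number of distinct outlets, whether any report links to a
    primary source, and a flag for events needing manual review."""
    outlets = defaultdict(set)
    has_link = defaultdict(bool)  # missing keys default to False
    for r in reports:
        outlets[r["event_id"]].add(r["outlet"])
        if r["source_url"]:
            has_link[r["event_id"]] = True
    summary = {}
    for event_id, names in outlets.items():
        summary[event_id] = {
            "n_outlets": len(names),
            "has_primary_link": has_link[event_id],
            # Flag events that cannot be traced back or rest on one outlet only.
            "needs_review": (not has_link[event_id]) or len(names) < min_sources,
        }
    return summary


for event_id, info in sorted(verification_summary(reports).items()):
    print(event_id, info)
```

Counting distinct outlets is only a crude proxy for triangulation: it cannot tell whether two outlets are independent or simply reprinting the same wire story, so flagged events still need manual reading of the sources.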
Issue 2: Specificity of methodology.
Databases rely on different methodological approaches shaped by the purposes of their collection. Moreover, there is no broadly accepted definition of terrorism and related violence, and the absence of a common approach shapes how datasets originate. Methodology varies across projects, which demands care about what a study focuses on and how its data were gathered. Differences in definitions, as well as limitations imposed by a focus on specific areas, event types, or perpetrators, heavily influence the data. Precise definitions are especially valuable when examining terrorism-related actions, given the ambiguity of the phenomenon and the multiplicity of terms[6]. One consequence is that datasets include different numbers of incidents: databases built on different methodologies are likely to record different numbers of events.
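As a toy illustration of how inclusion criteria alone change event counts, the Python sketch below applies two hypothetical coding rules to the same hypothetical incident records. The fields (`target`, `perpetrator`, `fatalities`) and both rules are assumptions made up for this example; neither reproduces the definition used by any real database.

```python
# Hypothetical incident records; fields are illustrative assumptions.
incidents = [
    {"id": 1, "target": "civilian", "perpetrator": "non-state", "fatalities": 3},
    {"id": 2, "target": "military", "perpetrator": "non-state", "fatalities": 5},
    {"id": 3, "target": "civilian", "perpetrator": "state", "fatalities": 1},
    {"id": 4, "target": "civilian", "perpetrator": "non-state", "fatalities": 0},
]


def narrow_rule(e):
    """Hypothetical narrow scope: lethal non-state attacks on civilians only."""
    return (
        e["perpetrator"] == "non-state"
        and e["target"] == "civilian"
        and e["fatalities"] > 0
    )


def broad_rule(e):
    """Hypothetical broad scope: any non-state attack, whatever the target or outcome."""
    return e["perpetrator"] == "non-state"


for name, rule in [("narrow", narrow_rule), ("broad", broad_rule)]:
    kept = [e["id"] for e in incidents if rule(e)]
    print(name, len(kept), kept)
```

The same four underlying records yield one event under the narrow rule and three under the broad one, which is why merging or comparing datasets requires first checking what each one counts.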
Issue 3: Language.
A further issue concerns the language of the sources used in data collection. Many global databases rely on English-language media, in which only severe attacks tend to be mentioned, leading to an undercount of incidents. This happens because of the remoteness of an event's location or its small scale[7]. Local sources are believed to make event coverage more inclusive, in contrast to English-language media, which typically cover only exceptional cases and may omit ordinary incidents. If an incident occurred in a remote area, it is likely to be reported infrequently, or the information is likely to be distorted[8]. If an event resulted in few casualties, it is unlikely to be reported by international media[9]. That is why incorporating local media information into situation assessments, where possible, is worth the effort.
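One rough way to gauge what local sources add is to compare what an international-only collection would capture against a collection that also uses local media. The Python sketch below does this on hypothetical events, each tagged with the kinds of outlets that reported it; the tags, fields, and records are illustrative assumptions, and the combined count assumes every event was reported by at least one of the two source types.

```python
# Hypothetical events tagged with the kinds of outlets that reported them.
events = [
    {"id": "A", "fatalities": 12, "reported_by": {"international", "local"}},
    {"id": "B", "fatalities": 1, "reported_by": {"local"}},
    {"id": "C", "fatalities": 0, "reported_by": {"local"}},
    {"id": "D", "fatalities": 4, "reported_by": {"international"}},
]


def coverage_gap(events):
    """Compare an international-only collection with one that also uses
    local sources, and list events visible only through local media."""
    international = [e for e in events if "international" in e["reported_by"]]
    local_only = [e for e in events if e["reported_by"] == {"local"}]
    return {
        "international_sources_count": len(international),
        # Assumes each event has at least one reporting source.
        "all_sources_count": len(events),
        "local_only_ids": [e["id"] for e in local_only],
        "local_only_fatalities": [e["fatalities"] for e in local_only],
    }


print(coverage_gap(events))
```

In this toy case the events seen only through local media are also the low-casualty ones, mirroring the pattern described above, but real coverage gaps have to be measured on actual data rather than assumed.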
[1] LaFree, G., 2011. Using Open Source Data to Counter Common Myths About Terrorism. In B. Forst, J. R. Greene, & J. P. Lynch, eds. Criminologists on Terrorism and Homeland Security. New York: Cambridge University Press, pp. 411–442.
[2] Schmid, A., 2004. Statistics on terrorism: the challenge of measuring trends in global terrorism. Forum on Crime and Society, 1-2(December), pp.49–71. Available at: https://www.unodc.org/documents/data-and-analysis/Forum/V05-81059_EBOOK.pdf.
[3] Kalyvas, S.N., 2006. The Logic of Violence in Civil Wars, New York: Cambridge University Press.
[4] Weidmann, N.B., 2014. On the Accuracy of Media-based Conflict Event Data. Journal of Conflict Resolution, p.0022002714530431. Available at: http://jcr.sagepub.com.ezproxy.lib.utexas.edu/content/early/2014/04/28/0022002714530431.
[5] O’Loughlin, J., Holland, E.C. & Witmer, F.D.W., 2011. The Changing Geography of Violence in Russia’s North Caucasus, 1999-2011: Regional Trends and Local Dynamics in Dagestan, Ingushetia, and Kabardino-Balkaria. Eurasian Geography and Economics, 52(5), pp.596–630.
[6] Schmid, A.P., 2011. The Routledge Handbook of Terrorism Research, New York: Routledge.
[7] Weidmann, N.B., 2014. On the Accuracy of Media-based Conflict Event Data.
[8] Ibid.
[9] Weidmann, N.B., 2016. A Closer Look at Reporting Bias in Conflict Event Data. American Journal of Political Science, 60(1), pp.206–218.