Civil Unrest Twitter Data

Introduction

Civil unrest events are typically organized through social media, especially Twitter and Facebook. Mining these data therefore makes it possible to detect and forecast future events. By identifying tweets that may signal upcoming civil unrest, the goal is to use Twitter data as social sensors to forecast the spatiotemporal patterns of protests across locations and dates.

Processed Data

Download link:

Dataset        #Events    Download Link
Argentina      1427
Brazil         3417
Chile          776
Colombia       1287
Ecuador        511
El Salvador    730
Mexico         5907
Paraguay       2114
Uruguay        664
Venezuela      3320

Data format: *.mat (can be opened with MATLAB)

Data description:

Variable Name   Type                Size    Description
keywords        array of strings    1*923   keyword list used to represent a tweet message as a document vector
locations       array of strings    1*n     location names of the n cities in the current country
dates           array of strings    1*729   all dates covered by the data
X               array of matrices   1*n     input data: tweet data for n locations from 2013-01-01 to 2014-12-30
  • each element of X is a 729*923 matrix: 729 samples (i.e., dates) by 923 features (i.e., keywords)
    • each entry of the matrix is the count of one keyword on one date
Y               array of matrices   1*n     output data: event occurrence data for n locations from 2013-01-01 to 2014-12-30
  • each element of Y is a 1*729 matrix: 729 samples (i.e., dates)
    • each entry indicates whether an event occurred (1) or not (0) on that date
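The layout above can also be read and sanity-checked outside MATLAB. The following Python code is a minimal sketch, not part of the dataset distribution. It assumes the files are in a pre-v7.3 .mat format that SciPy's `loadmat` can read and that X and Y are stored as 1*n cell arrays. The helper names `load_country`, `check_layout`, and `event_dates` are hypothetical and introduced only for illustration.

```python
def load_country(path):
    """Load one country's .mat file (sketch; assumes a pre-v7.3 .mat file).

    SciPy and NumPy are imported lazily so the pure-Python helpers below
    can be used without them.
    """
    import numpy as np
    from scipy.io import loadmat

    # squeeze_me=True drops singleton dimensions, so a 1*n cell array
    # becomes a length-n object array and a 1*729 matrix a length-729 vector.
    raw = loadmat(path, squeeze_me=True)
    names = ("keywords", "locations", "dates", "X", "Y")
    # atleast_1d guards against a country with a single location.
    return {name: np.atleast_1d(raw[name]) for name in names}


def check_layout(keywords, locations, dates, X, Y):
    """Verify that X and Y match the documented shapes.

    Returns (number of locations, number of dates, number of keywords).
    Works on nested lists as well as NumPy arrays.
    """
    n_loc, n_day, n_kw = len(locations), len(dates), len(keywords)
    if len(X) != n_loc or len(Y) != n_loc:
        raise ValueError("X and Y need one element per location")
    for i in range(n_loc):
        if len(X[i]) != n_day:
            raise ValueError(f"X[{i}] should have one row per date")
        if any(len(row) != n_kw for row in X[i]):
            raise ValueError(f"X[{i}] should have one column per keyword")
        if len(Y[i]) != n_day:
            raise ValueError(f"Y[{i}] should have one label per date")
    return n_loc, n_day, n_kw


def event_dates(dates, y):
    """Return the dates on which the label vector y marks an event (1)."""
    return [d for d, flag in zip(dates, y) if flag == 1]


# Example usage (the file path is a placeholder, not a real file name):
# data = load_country("path/to/country.mat")
# check_layout(data["keywords"], data["locations"], data["dates"],
#              data["X"], data["Y"])
# first_city_events = event_dates(data["dates"], data["Y"][0])
```

If a file turns out to be saved in the v7.3 format, `loadmat` cannot read it and an HDF5 reader would be needed instead. The shape check and the date lookup do not depend on how the arrays were loaded.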

Data Source

All the civil unrest tweet messages X, the label set Y, and the keywords were obtained from the IARPA OSI project. Please refer to the papers [KDD 14] and [KDD 16] for details. The raw label set can be downloaded here: [Output Raw Data].

Citation

To use these datasets, please cite the papers:

Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. "Multi-Task Learning for Spatio-Temporal Event Forecasting." in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015), research track, (acceptance rate: 19.4%), Sydney, Australia, pp. 1503-1512, Aug 2015.

Sathappan Muthiah, Patrick Butler, Rupinder Paul Khandpur, Parang Saraf, Nathan Self, Alla Rozovskaya, Liang Zhao, Jose Cadena et al. "EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System." in Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016), applied data science track, (acceptance rate: 19.9%), San Francisco, California, pp. 205-214, Aug 2016.

Acknowledgement

NSF 1755850 (sole-PI): "CRII: III: Interpretable Models for Spatio-Temporal Event Forecasting using Social Sensors", $174,990. 2018-2021, National Science Foundation.