EPIDE: Event Prediction in the Big Data Era

Reviews, Tutorials, Datasets, and Code

A website based on the survey paper:
Liang Zhao. Event Prediction in the Big Data Era: A Systematic Survey. ACM Computing Surveys. 54, 5, Article 94 (June 2021), 37 pages. DOI: https://doi.org/10.1145/3450287


Background

Motivations: Events are occurrences at specific locations and times, with specific semantics, that nontrivially impact society or nature, such as earthquakes, civil unrest, system failures, pandemics, and crimes. It is highly desirable to anticipate such events in advance in order to reduce the social upheaval and damage they can cause. Event prediction, traditionally prohibitively challenging, is becoming viable in the big data era and is growing rapidly, thanks also to advances in high-performance computing and new artificial intelligence techniques. A large body of existing work addresses the challenges involved, including heterogeneous multi-faceted outputs, complex (e.g., spatial, temporal, and semantic) dependencies, and streaming data feeds. Because event prediction problems are strongly interdisciplinary, most existing methods were initially designed for specific application domains, though the techniques and evaluation procedures they use usually generalize across domains. However, cross-referencing techniques across domains is imperative yet difficult, given the absence of a comprehensive literature survey for event prediction. This website aims to provide a systematic and comprehensive repository of resources on this topic.

Problem Formulations, Techniques, Applications, and Evaluation Metrics: please refer to our paper in ACM Computing Surveys. An even longer version is available at: https://arxiv.org/abs/2007.09815

Datasets

Domains: Applications and Data Resources

Healthcare
  • Population-level: Pandemics [COVID-19 dataset]; Influenza Outbreaks [CDC flu map dataset]; Social sensing [Twitter and Google flu trend datasets]
  • Individual-level: Social media-based [Flu adverse event dataset]; Electronic health records [EHR dataset]; Online Health Community [Breast Cancer Forum dataset]; Mobile Devices [mHEALTH dataset]

Media
  • Multimedia-based: Next action prediction [Sport-related datasets, Vehicle-related datasets]
  • Script-based: Offline news data [New York Times]; Online News Data [dataset]

Transportation
  • Group-based: NYU Yellow Taxi datasets [dataset link]; Didi Gaia dataset [dataset link]
  • Individual-based: Car accident prediction [dataset link]

Engineering Systems
  • Energy engineering: Global Energy Forecasting Competition dataset [dataset link]; Solar Power Forecasting [2014 datasets]
  • Cyber systems: Cyber attack prediction [Phoronix benchmark suite] [ARCS datasets] [Social coding datasets]

Political Events
  • Offline events: Civil unrest [ICEWS dataset] [GDELT dataset] [OSI dataset] [TERRIER dataset]
  • Online events: Online activism [online petition prediction dataset]

Natural Events
  • Geophysics: [USGS earthquake datasets, Water pollution prediction]
  • Atmospherics: [China environment data]
  • Astrophysics: [Solar event datasets]

Business
  • Customer Activity Prediction: Telecom Churn datasets [dataset survey]; Massive Open Online Courses event dataset [dataset survey]
  • Business Process Events: CRSP/Compustat Merged Database [dataset link]; AAERs datasets [dataset link]; BPI Challenge datasets [dataset paper link]

Crimes
  • Political Crimes and Terrorism: Global Terrorism Database [dataset link]; UCDP dataset [dataset link]
  • Crime Incidents and Hotspots: NYC Crimes [dataset link]; Demographic [dataset link]; Sensor data [dataset link]

Synthetic Dataset
  • N/A: Complex event dataset and code [dataset link]

Methods and Code

Research Problem: Methodologies, with Papers, Methods, and Code
(The reference numbers below are from our survey paper.)

Time Prediction
  • Occurrence: Classification [212,209,85,117,47,11,171]; (Auto-)Regression [59,68,194]; Anomaly Detection [160,83]
  • Discrete Time: Direct Manner [126] [131] [162] [172]; Indirect Manner [20] [92] [24] [92] [146]
  • Continuous Time: Regression [20] [111] [179]; Point Process [141] [45] [58] [163]; Survival Analysis [114] [101] [175]

Location Prediction
  • Raster-based: Spatial clustering [88] [178] [185]; Spatial embedding [76] [88] [148]; Spatial convolution [71,139]; Trajectory destination prediction [93,94,98,174,182,215]
  • Point-based, supervised: spatial multi-task learning [206,69,207,132]; spatial autoregressive [25,205,193,192]
  • Point-based, unsupervised: spatial scan [31,144]; network scan [109,32]

Semantic Prediction
  • Association-based: Frequent set mining-based [176,213]; decision list-based [106]
  • Causality-based: Step 1: Event representation; Step 2: Event causality inference; Step 3: Future event inference. Papers: [46,161,210,105,35,191,142,210]
  • Semantic Sequence, classical sequence classification: feature-based [72,170,72,107,64]; model-based [189,7,8,104,202]; prototype-based [3]
  • Semantic Sequence, recurrent neural networks: attributed-based [116,26,27,71,95,56,113]; descriptive-based [79,80,167,186,195]

Joint Prediction
  • Time and Semantic: Temporal association rule [176,35,166,190]; Time expression extraction [44,84]; Multivariate time series forecasting [110,184,19,23,121]
  • Time and Location, raster-based: CNN+RNN; 3D Convolutional Neural Networks; Spatiotemporal Conditional MRF. Papers: [53,71,82,87,88,100,142,147,183,197,211]
  • Time and Location, point-based: spatiotemporal Gaussian process; spatiotemporal point process. Papers: [118,134,120]
  • Time, Location, and Semantic: System-based (model-fusion system; crowdsourced system); Planned future event detection; Tensor-based methods. Papers: [144,135,78,91,149,108]; [39,44,84,90,99,129,130]; [127,214]

Reviews and Tutorials

Literature Reviews

  • Liang Zhao. Event Prediction in the Big Data Era: A Systematic Survey. ACM Computing Surveys. 54, 5, Article 94 (June 2021), 37 pages. DOI:https://doi.org/10.1145/3450287


Related Tutorials

    • Spatio-temporal Event Forecasting and Precursor Identification
      • Presenters: Ning, Yue, Liang Zhao, Feng Chen, Chang-Tien Lu, and Huzefa Rangwala.
      • Venue: In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2019)
      • Website: link
      • How to Cite: Ning, Yue, Liang Zhao, Feng Chen, Chang-Tien Lu, and Huzefa Rangwala. "Spatio-temporal Event Forecasting and Precursor Identification." In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3237-3238. ACM, 2019.
    • Big Data Analytics for Societal Event Forecasting
      • Presenters: Liang Zhao and Feng Chen.
      • Venue: IEEE International Conference on Big Data (IEEE BigData 2018)
      • Website: link
      • How to Cite: Liang Zhao and Feng Chen. "Big Data Analytics for Societal Event Forecasting", in IEEE International Conference on Big Data (IEEE BigData 2018), December 10, 2018, Seattle, WA, USA.
    • Explainable AI for Societal Event Predictions: Foundations, Methods, and Applications
      • Presenters: Songgaojun Deng, Yue Ning, Huzefa Rangwala
      • Venue: AAAI 2021
      • Website: link
      • How to Cite: Songgaojun Deng, Yue Ning, Huzefa Rangwala. "Explainable AI for Societal Event Predictions: Foundations, Methods, and Applications", in AAAI 2021, February 2021, USA.

Contact Us

Liang Zhao
  • Homepage: http://cs.emory.edu/~lzhao41/
  • Email: liang.zhao@emory.edu
  • Address: Mathematics and Science Center Suite W401 400 Dowman Drive Atlanta, Georgia 30322
  • Phone: (404) 727-7580 (M-F 9AM-5PM)
  • Fax: (404) 727-5611

Acknowledgement

  • National Science Foundation
  • NVIDIA
  • Jeffress Memorial Trust Foundation
  • Amazon AWS