Background
Motivations: Events are occurrences at specific locations and times, with specific semantics, that nontrivially impact society or nature, such as earthquakes, civil unrest, system failures, pandemics, and crimes. Anticipating such events in advance is highly desirable because it can reduce the social upheaval and damage they cause. Event prediction, traditionally prohibitively challenging, is becoming viable in the big data era and is growing rapidly, helped by advances in high-performance computing and new artificial intelligence techniques. A large body of existing work addresses the challenges involved, including heterogeneous multi-faceted outputs, complex (e.g., spatial, temporal, and semantic) dependencies, and streaming data feeds. Because event prediction problems are strongly interdisciplinary, most existing methods were initially designed for specific application domains, though the techniques and evaluation procedures they use usually generalize across domains. However, cross-referencing techniques across domains is imperative yet difficult in the absence of a comprehensive literature survey of event prediction. This website aims to provide a systematic and comprehensive repository of resources on event prediction.
Problem Formulations, Techniques, Applications, and Evaluation Metrics: Please refer to our paper in ACM Computing Surveys. An even longer version is here: https://arxiv.org/abs/2007.09815
Datasets
Domains |
Applications and Data Resources | |
Healthcare | Population-level | Pandemics [COVID-19 dataset]; Influenza Outbreaks [CDC flu map dataset]; Social sensing [Twitter and Google flu trend datasets] |
Individual-level | Social media-based [Flu adverse event dataset]; Electronic health records [EHR dataset]; Online Health Community [Breast Cancer Forum dataset]; Mobile Devices [mHEALTH dataset] |
|
Media | Multimedia-based |
Next action prediction [Sport-related datasets, Vehicle-related datasets] |
Script-based | Offline news data [New York Times]; Online News Data [dataset] |
|
Transportation | Group-based |
NYU Yellow Taxi datasets [dataset link]; Didi Gaia dataset [dataset link] |
Individual-based | Car accident prediction [dataset link] |
Engineering Systems | Energy engineering | Global Energy Forecasting Competition dataset [dataset link]; Solar Power Forecasting [2014 datasets] |
Cyber systems |
|
Cyber attack prediction [Phoronix benchmark suite] [ARCS datasets] [Social coding datasets] |
Political Events | Offline events |
Civil unrest [ICEWS dataset] [GDELT dataset] [OSI dataset] [TERRIER dataset] |
Online events | Online activism [online petition prediction dataset] | |
Natural Events | geophysics; |
Geophysics [USGS earthquake datasets, Water pollution prediction]; Atmospherics [China environment data]; Astrophysics [Solar event datasets] |
Business | Customer Activity Prediction |
Telecom Churn datasets [dataset survey]; Massive Open Online Courses event dataset [dataset survey] |
Business Process Events | CRSP/Compustat Merged Database [dataset link]; AAERs datasets [dataset link]; BPI Challenge datasets [dataset paper link] |
|
Crimes | Political Crimes and Terrorism |
Global Terrorism Database [dataset link]; UCDP dataset [dataset link] |
Crime Incidents and Hotspots | NYC Crimes [dataset link]; Demographic [dataset link]; Sensor data [dataset link] |
|
Synthetic Dataset | N/A | Complex event dataset and code [dataset link] |
Methods and Code
Research Problem |
Methodologies | Papers, Methods, Code (the following reference numbers are from our survey paper.) |
|
Time Prediction | Occurrence | Classification; (Auto-)Regression; Anomaly Detection. |
[212,209,85,117,47,11,171]; [59,68,194]; [160,83]. |
Discrete Time | Direct Manner; Indirect Manner. |
[126] [131] [162] [172]; [20] [92] [24] [92] [146] |
|
Continuous Time | Regression; Point Process; Survival Analysis. |
[20] [111] [179]; [141] [45] [58] [163]; [114] [101] [175] | |
Location Prediction | Raster-based |
Spatial clustering; Spatial embedding; Spatial convolution; Trajectory destination prediction. |
[88] [178] [185]; [76] [88] [148]; [71,139]; [93,94,98,174,182,215] |
Point-based | Supervised: spatial multi-task learning; spatial autoregressive; Unsupervised: spatial scan; network scan. |
[206,69,207,132]; [25,205,193,192]; [31,144]; [109,32] |
Semantic Prediction | Association-based |
Frequent set mining-based; decision list-based |
[176,213]; [106] |
Causality-based | Step 1: Event representation; Step 2: Event causality inference; Step 3: Future event inference. |
[46,161,210,105,35,191,142,210] | |
Semantic Sequence | Classical sequence classification: feature-based; model-based; prototype-based; Recurrent neural networks: attributed-based; descriptive-based. |
[72,170,72,107,64]; [189,7,8,104,202]; [3]; [116,26,27,71,95,56,113]; [79,80,167,186,195] |
|
Joint Prediction | Time and Semantic | Temporal association rule; Time expression extraction; Multivariate time series forecasting. |
[176,35,166,190]; [44,84]; [110,184,19,23,121] |
Time and Location | Raster-based: CNN+RNN; 3D Convolutional Neural Networks; Spatiotemporal Conditional MRF. Point-based: spatiotemporal Gaussian process; spatiotemporal point process. |
[53,71,82,87,88,100,142,147,183,197,211]; [118,134,120] |
|
Time, Location, and Semantic | System-based: model-fusion system; crowdsourced system; Planned future event detection; Tensor-based methods. |
[144,135,78,91,149,108]; [39,44,84,90,99,129,130]; [127,214] |
Reviews and Tutorials
Literature Reviews
Related Tutorials
Contact Us
Acknowledgement
National Science Foundation |
NVIDIA |
Jeffress Memorial Trust Foundation |
|
Amazon AWS |
|