Kai Shu

New Multi-Hop Multimodal Claim Verification

MMCV is a multimodal claim verification dataset featuring natural, multi-hop claims, with strong supervision for supporting facts to enable more explainable fact-checking systems. Leaderboard.

New Authorship Attribution in LLMs

We systematically categorize authorship attribution in the era of LLMs into four problems: attributing unknown texts to human authors, detecting LLM-generated texts, identifying specific LLMs or human authors, and classifying texts as human-authored, machine-generated, or co-authored by both. We also highlight key challenges and open problems. Github.

New Knowledge Editing in LLMs

We started an initiative to explore and understand knowledge editing in LLMs. We proposed HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. Github. We also proposed reformulating knowledge editing as a new type of safety threat for LLMs, namely EditingAttack, and identified its emerging risk of injecting misinformation or bias into LLMs. Github.

New LLMs meet Misinformation: Detecting LLM-generated Misinformation

We launched an initiative to combat misinformation in the age of LLMs. Website. We created a dataset, LLMFake, as part of the initiative. Github.

WALNUT: Benchmark on Semi-weakly Supervised Learning for NLU

This benchmark provides a publicly accessible framework for advocating and facilitating research on weak supervision for NLU. We expect WALNUT to stimulate further research on methodologies to leverage weak supervision more effectively. The benchmark and code for baselines are available at Website.

Graph Neural Networks for Fake News Detection

This repository offers a publicly accessible platform and benchmark for a series of Graph Neural Network (GNN) based fake news detection models. We welcome contributions of results for existing models and SOTA results for new models based on our dataset. You can check the benchmark hosted by PaperWithCode for SOTA models and their performance. Benchmark. Github.

COVID-19 Data Repository

This repository offers a publicly accessible platform to gather and curate COVID-19 datasets from multiple disciplines, including spatial-temporal epidemic data, fact-checked content covering different types of disinformation (e.g., fraud URLs, false news), social media content and network data from Twitter, scholarly articles, etc. The repository also encourages data donation from the research community and promotes collaborations. Github.

dEFEND: Explainable Fake News Detection

In recent years, computational detection of fake news has been studied to mitigate the problem of fake news, producing some promising early results. We argue, however, that a critical missing piece of this work is the explainability of such detection, i.e., why a particular piece of news is detected as fake. In this paper, therefore, we study the explainable detection of fake news. We develop a sentence-comment co-attention sub-network that exploits both news contents and user comments to jointly capture explainable top-k check-worthy sentences and user comments for fake news detection. We conduct extensive experiments on real-world datasets and demonstrate that the proposed method significantly outperforms several state-of-the-art fake news detection methods. Code and Results.

Unsupervised Fake News Detection

Most existing methods of fake news detection are supervised and therefore require an extensive amount of time and labor to build a reliably annotated dataset. In search of an alternative, in this paper, we investigate whether fake news can be detected in an unsupervised manner. We treat the truths of news and users’ credibility as latent random variables, and exploit users’ engagements on social media to identify their opinions towards the authenticity of news. Code.
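To make the idea of jointly inferring news truth and user credibility more concrete, here is a minimal sketch. It is not the paper's probabilistic model or its inference procedure: a simple alternating heuristic is swapped in for illustration, and the function name `estimate_truth`, the `(user, news, opinion)` tuple format, and the `prior_cred` parameter are all hypothetical.

```python
from collections import defaultdict


def estimate_truth(engagements, n_iters=20, prior_cred=0.7):
    """Alternately estimate news truth scores and user credibility.

    engagements: list of (user, news, opinion) tuples, where opinion is
    +1 if the user's engagement suggests the news is true and -1 if it
    suggests the news is fake. All names here are illustrative assumptions.
    """
    users = {u for u, _, _ in engagements}
    news_items = {n for _, n, _ in engagements}
    cred = {u: prior_cred for u in users}   # assumed initial credibility
    truth = {n: 0.5 for n in news_items}    # 0.5 means "unknown"

    for _ in range(n_iters):
        # Step 1: update each news item from a credibility-weighted vote.
        score = defaultdict(float)
        weight = defaultdict(float)
        for u, n, opinion in engagements:
            w = 2.0 * cred[u] - 1.0  # users below 0.5 count as negative evidence
            score[n] += w * opinion
            weight[n] += abs(w)
        for n in news_items:
            if weight[n] == 0:
                truth[n] = 0.5
            else:
                t = 0.5 * (1.0 + score[n] / weight[n])
                truth[n] = min(1.0, max(0.0, t))  # guard against rounding drift

        # Step 2: update each user from agreement with the current truth scores,
        # smoothed by one pseudo-observation at the prior credibility.
        agree = defaultdict(float)
        count = defaultdict(int)
        for u, n, opinion in engagements:
            agree[u] += truth[n] if opinion == 1 else 1.0 - truth[n]
            count[u] += 1
        for u in users:
            cred[u] = (agree[u] + prior_cred) / (count[u] + 1)

    return truth, cred
```

Called on a list of engagements, the function returns two dictionaries: a score in [0, 1] per news item (higher means more likely true) and a credibility estimate per user. No labels are used at any point, which is the property the paragraph above describes; the heuristic does assume that most users start out more reliable than not.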

Fake News Detection Data Repository

We released a tool, FakeNewsTracker, for collecting, analyzing, and visualizing fake news and its dissemination on social media.
The latest dataset paper, with a detailed analysis of the dataset, can be found at FakeNewsNet.
FakeNewsNet is a benchmark data repository for fake news detection, which contains news content, social context, and spatiotemporal information for studying fake news on social media. Data and APIs are available at Github.