B. TOOLS
NewsVerify
NewsVerify^6 is a real-time news verification system that can assess the credibility of a news event given a few keywords describing it [184].
NewsVerify operates in three stages: (1) crawling data; (2) building an ensemble model; and (3) visualizing the results. Given the keywords and time range of a news event, the related microblogs are collected through the search engine of Sina Weibo. From these messages, key users and key microblogs are extracted for further analysis: the key users are used for information-source certification, while the key microblogs are used for propagation and content certification. All of the data above are crawled through a distributed data-acquisition system, which is described below. After the three individual models have been built, their scores are combined via a weighted combination. Finally, an event-level credibility score is produced, and each individual model also yields a credibility score that measures the credibility of the corresponding aspect. To improve the user experience of the application, the results are visualized from various perspectives, providing useful information about events for further investigation.
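The weighted combination step can be sketched as follows. This is a minimal illustration, not NewsVerify's actual code; the aspect names and weights are hypothetical stand-ins for the content, propagation, and source sub-models.

```python
# Sketch of the ensemble step: three per-aspect credibility scores
# are blended into one event-level score. Weights are illustrative.

def combine_scores(scores, weights):
    """Weighted combination of per-aspect credibility scores.

    scores  -- dict mapping aspect name to a score in [0, 1]
    weights -- dict mapping aspect name to a non-negative weight
    """
    total = sum(weights.values())
    return sum(scores[a] * weights[a] for a in scores) / total

# Each sub-model keeps its own score; the event-level score is the blend.
aspect_scores = {"content": 0.8, "propagation": 0.6, "source": 0.9}
aspect_weights = {"content": 0.5, "propagation": 0.3, "source": 0.2}

event_score = combine_scores(aspect_scores, aspect_weights)
```

Because each sub-model retains its own score alongside the blended one, the tool can report both an overall verdict and a per-aspect breakdown.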
Data Acquisition. Three kinds of information are collected: microblogs, propagation, and microbloggers. Like most distributed systems, NewsVerify has a master node and child nodes. The master node is responsible for task distribution and result integration, while the child nodes process the specific tasks and store the collected data in an appointed temporary storage space. A child node informs the master node after all of its tasks have finished. The master node then merges all slices of data from the temporary storage space and stores the combined data in permanent storage, after which the temporary storage is deleted. The distributed system is based on ZooKeeper,^7 a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Because the acquisition task involves frequent, real-time storing and reading of data, the efficient key-value database Redis is adopted; working with an in-memory dataset, Redis achieves outstanding performance.
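The master/child protocol just described can be sketched in a few lines. This is a toy simulation under stated assumptions: a plain dict stands in for the Redis-backed temporary storage, a list stands in for permanent storage, and simple function calls stand in for ZooKeeper coordination. None of the names below come from NewsVerify's API.

```python
# Toy sketch of the master/child crawl protocol. A dict plays the role
# of temporary storage, a list plays permanent storage; real deployments
# would use Redis and ZooKeeper as the text describes.

temp_storage = {}   # child_id -> slice of collected records
perm_storage = []   # merged, permanent copy

def child_crawl(child_id, task):
    """Child node: process one crawl task, store the slice temporarily."""
    records = [f"{task}-record-{i}" for i in range(2)]  # fake crawl output
    temp_storage[child_id] = records
    return child_id  # "inform the master" that the task has finished

def master_merge():
    """Master node: merge all slices, persist them, clear temp space."""
    for child_id in sorted(temp_storage):
        perm_storage.extend(temp_storage[child_id])
    temp_storage.clear()  # temporary storage is deleted after the merge

# Master distributes two tasks, children report back, master merges.
for cid, task in enumerate(["keywordA", "keywordB"]):
    child_crawl(cid, task)
master_merge()
```

The design keeps child nodes stateless with respect to the final result: only the master touches permanent storage, so a failed child can simply be re-run against its task slice.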
Model Ensemble. Individual models are built to verify the truthfulness of news pieces from the perspectives of news content, news propagation, and information source (see Figure B.7). The content-based model is based on hierarchical propagation networks [58]. The credibility network has three layers: a message layer, a sub-event layer, and an event layer. Semantic and structural features are then exploited to adjust the weights of the links in the network. Given a news event and its related microblogs, sub-events are generated by a clustering algorithm; the sub-event layer is constructed to capture implicit semantic information within an event. Four types of network links reflect the relations between network nodes. The intra-level links (message to message, sub-event to sub-event) reflect relations among entities of the same type, while the inter-level links (message to sub-event, sub-event to event) reflect the impact from level to level. After the network is constructed, all entities are initialized with cred-
^6 https://www.newsverify.com/
^7 http://zookeeper.apache.org/
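The three-layer credibility network described above can be sketched with plain data structures. This is an assumed, illustrative encoding, not the authors' implementation; node names and the uniform initial weight are hypothetical.

```python
# Illustrative sketch of the three-layer credibility network:
# message, sub-event, and event nodes joined by intra-level and
# inter-level links. Weights start uniform; in the full model,
# semantic and structural features would adjust them.

nodes = {
    "message":  ["m1", "m2", "m3"],
    "subevent": ["s1", "s2"],
    "event":    ["e1"],
}

links = []

def add_link(src, dst, kind, weight=1.0):
    """Record one link; `kind` is '<src-level>-<dst-level>'."""
    links.append({"src": src, "dst": dst, "kind": kind, "weight": weight})

# Intra-level links: relations among entities of the same type.
add_link("m1", "m2", "message-message")
add_link("s1", "s2", "subevent-subevent")

# Inter-level links: impact propagating from level to level.
add_link("m1", "s1", "message-subevent")
add_link("m2", "s1", "message-subevent")
add_link("m3", "s2", "message-subevent")
add_link("s1", "e1", "subevent-event")
add_link("s2", "e1", "subevent-event")

# A link is intra-level when both endpoints share a level.
intra = [l for l in links if l["kind"].split("-")[0] == l["kind"].split("-")[1]]
```

Separating intra-level from inter-level links makes it easy to propagate credibility within a layer first and then pass the aggregated signal upward from messages to sub-events to the event.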