20 2. WHAT NEWS CONTENT TELLS

news veracity. Knowledge-based approaches use external sources to fact-check claims in news content. The goal of fact-checking is to assign a truth value to a claim in a particular context [158]. Fact-checking has attracted increasing attention, and many efforts have been made to develop automated fact-checking systems. Such a system assesses news authenticity by comparing the information extracted from to-be-verified news content with known knowledge. Existing fact-checking approaches can be categorized as Manual Fact-checking and Automatic Fact-checking.

2.4.1 MANUAL FACT-CHECKING

Manual fact-checking relies on people to annotate fake news by hand. Human domain experts or ordinary users investigate relevant data and documents to reach verdicts on claim veracity. Existing manual fact-checking approaches fall mainly into two groups: expert-based and crowdsourcing-based fact-checking (Table 2.3).

Table 2.3: Comparison of expert-based and crowdsourcing-based fact-checking

                        Expert-Based      Crowdsourcing-Based
Fact-checkers           Domain experts    Regular individuals (i.e., collective intelligence)
Annotation reliability  High              Comparatively low
Scalability             Poor              Comparatively high

Expert-Based Fact-Checking

Expert-based fact-checking relies on human domain experts to investigate relevant data and documents and to deliver verdicts on claim veracity. However, it is an intellectually demanding and time-consuming process, which limits its efficiency and scalability. We introduce some representative and popular fact-checking websites as follows.

• PolitiFact: PolitiFact (www.politifact.com/) is a U.S. website that rates the accuracy of claims or statements by elected officials, pundits, columnists, bloggers, political analysts, and other members of the media. It is an independent, non-partisan online fact-checking source for political news and information. The editors carefully examine the specific wording and the full context of a claim, and then verify the reliability of the claims and statements. The label types include true, mostly true, half true, mostly false, false, and pants on fire.

• Snopes: Snopes (http://www.snopes.com/) is widely known as one of the first online fact-checking websites for validating and debunking urban legends. It covers a wide range of topics including automobiles, business, computers, crime, fraud and scams, history, and so on. The label types include true and false.

• FactCheck: FactCheck is a nonprofit “consumer advocate webpage” for voters that aims to reduce the level of deception and confusion in U.S. politics. The claims and statements it checks originate from various platforms, including TV advertisements, debates, speeches, interviews, news releases, and social media. It mainly focuses on presidential candidates in presidential election years and systematically evaluates the factual accuracy of their statements.

• GossipCop: GossipCop investigates entertainment stories that are published in magazines and newspapers, as well as on the Web, to ascertain whether they are true or false. It provides a score on a scale from 0 to 10, where 0 means fake and 10 means real.

• TruthOrFiction: TruthOrFiction is a non-partisan website that provides fact-checking results on warnings, hoaxes, virus warnings, and humorous or inspirational stories that are distributed through emails. It mainly focuses on misleading information that spreads via forwarded emails. It rates stories or information using the following categories: truth, fiction, reported to be truth, unproven, truth and fiction, previously truth, disputed, and pending investigation.

Crowdsourcing-Based Fact-Checking

Crowdsourcing-based fact-checking exploits the “wisdom of the crowd” by enabling ordinary people to annotate news content. These annotations are then aggregated to produce an overall assessment of the claim veracity. For example, Fiskkit allows users to discuss and annotate the accuracy of specific parts of a news article. As another example, an anti-fake-news bot named “For real” is a public account in the mobile messaging application LINE, which allows people to report suspicious news content that is then further checked by editors.
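
As a minimal sketch of how such annotations might be aggregated, the following Python snippet takes a reliability-weighted vote over per-user labels for a single claim. The function name `aggregate_annotations`, the label strings, and the reliability weights are hypothetical illustrations; the text does not specify how any particular platform combines its annotations.

```python
from collections import defaultdict


def aggregate_annotations(annotations, reliability=None):
    """Aggregate crowd labels for one claim with a weighted vote.

    annotations: list of (user_id, label) pairs, e.g. ("u1", "false").
    reliability: optional dict user_id -> positive weight; users missing
                 from the dict get weight 1.0 (an assumption of this sketch).
    Returns (winning_label, share_of_total_weight).
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    reliability = reliability or {}
    totals = defaultdict(float)
    for user_id, label in annotations:
        totals[label] += reliability.get(user_id, 1.0)
    # Highest total weight wins; ties are broken alphabetically so the
    # result is deterministic.
    winner = min(totals, key=lambda lab: (-totals[lab], lab))
    return winner, totals[winner] / sum(totals.values())


# Four hypothetical users annotate the same claim.
votes = [("u1", "false"), ("u2", "false"), ("u3", "true"), ("u4", "false")]
label, share = aggregate_annotations(votes)  # label == "false", share == 0.75
```

With equal weights this reduces to a plain majority vote. Passing a `reliability` dict lets a few trusted annotators outweigh a larger number of unreliable ones, which is one simple way to address the lower annotation reliability noted in Table 2.3.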

2.4.2 AUTOMATIC FACT-CHECKING

Manual fact-checking relies on human annotation, which is usually time-consuming and labor-intensive. Automatic fact-checking instead relies largely on external knowledge to determine the truthfulness of a particular claim. Two typical external sources are the open web and structured knowledge graphs. Open web sources are used as references against which given claims are compared in terms of both consistency and frequency [10, 84]. Knowledge graphs, such as DBpedia and the Google Relation Extraction Corpus, are integrated from linked open data as a structured network topology. Fact-checking using a knowledge graph aims to check whether the claims in news content can be inferred from existing facts in the knowledge graph [29, 129, 171]. Next, we introduce a standard knowledge graph matching approach that matches news claims with the facts in knowledge graphs.

Websites mentioned in Section 2.4.1: FactCheck (https://www.factcheck.org/), GossipCop (https://www.gossipcop.com/), TruthOrFiction (https://www.truthorfiction.com/), Fiskkit (http://fiskkit.com), and the “For real” bot on LINE (https://grants.g0v.tw/projects/588fa7b382223f001e022944).

Path Finding Fake news spreads false claims in news content, so a natural means of detecting fake news is to check the truthfulness of the major claims in the news article. A claim in news content can be represented by a subject-predicate-object triple $(s, p, o)$, where the subject entity $s$ is related to the object entity $o$ by the predicate relation $p$. We can find all the paths that start with $s$ and end with $o$, and then evaluate these paths to estimate the truth value of the claim. This set of paths, also known as a knowledge stream [130], is denoted as $\mathcal{P}_{s,o}$. Intuitively, if the paths involve more specific entities, then the claim is more likely to be true. Thus, for a path $P_{s,o}$ that visits the entities $o_1 = s, o_2, \ldots, o_n = o$, we can define a “specificity” measure $S(P_{s,o})$ as follows:

$$S(P_{s,o}) = \frac{1}{1 + \sum_{i=2}^{n-1} \log d(o_i)}, \tag{2.16}$$

where $d(o_i)$ is the degree of entity $o_i$, i.e., the number of paths in which entity $o_i$ participates. One approach is to optimize a path evaluation function $\tau(c) = \max W(P_{s,o})$, which maps the set of possible paths connecting $s$ and $o$ (i.e., $\mathcal{P}_{s,o}$) to a truth value $\tau$. If the claim is already present in the knowledge graph, it can be assigned the maximum truth value 1; otherwise, the objective function is optimized to find the shortest path between $s$ and $o$.
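
The path-finding idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of [130]: the graph is a small, hypothetical, undirected toy graph; predicates are ignored; the path score $W$ is taken to be the specificity $S$ of Eq. (2.16); and paths are enumerated exhaustively up to a length cap (`max_len`), which would not scale to a real knowledge graph. All function names are invented for this example.

```python
import math


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def simple_paths(graph, s, o, max_len=4):
    """Yield simple paths (node lists) from s to o with at most max_len nodes."""
    stack = [[s]]
    while stack:
        path = stack.pop()
        for nxt in graph[path[-1]]:
            if nxt in path:
                continue  # keep paths simple (no repeated nodes)
            if nxt == o:
                yield path + [nxt]
            elif len(path) + 1 < max_len:
                stack.append(path + [nxt])


def specificity(graph, path):
    """S(P) = 1 / (1 + sum of log-degrees of the intermediate nodes)."""
    return 1.0 / (1.0 + sum(math.log(len(graph[v])) for v in path[1:-1]))


def truth_value(graph, s, o, max_len=4):
    """tau = max over enumerated paths of S(P); 0.0 if no path is found."""
    scores = (specificity(graph, p) for p in simple_paths(graph, s, o, max_len))
    return max(scores, default=0.0)


# Hypothetical toy graph: D is a "generic" hub, B is a more specific entity.
kg = build_graph([("A", "B"), ("B", "C"), ("A", "D"),
                  ("D", "C"), ("D", "E"), ("D", "F")])
direct = truth_value(kg, "A", "B")    # direct edge, no intermediate nodes
indirect = truth_value(kg, "A", "C")  # best path goes through B, not hub D
```

For the directly connected pair the sum in Eq. (2.16) is empty, so the score is 1, matching the maximum truth value described above. For the pair (A, C), the path through the low-degree node B scores higher than the path through the hub D, which reflects the intuition that specific intermediate entities provide stronger support.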

Flow Optimization We can assume that each edge of the network is associated with two quantities: a capacity to carry knowledge related to $(s, p, o)$ across its two endpoints, and a cost of usage. The capacity can be computed using $S(P_{s,o})$, and the cost of an edge $e$ in the knowledge graph is defined as $c_e = \log d(o_i)$, where $o_i$ is an entity incident to that edge. The goal is to identify the set of paths responsible for the maximum flow of knowledge between $s$ and $o$ at the minimum cost. The maximum knowledge a path $P_{s,o}$ can carry is the minimum knowledge of its edges, also called its bottleneck $B(P_{s,o})$. Thus, the objective can be defined as a minimum cost maximum flow problem:

$$\tau(e) = \sum_{P_{s,o} \in \mathcal{P}_{s,o}} B(P_{s,o}) \cdot S(P_{s,o}), \tag{2.17}$$

where $B(P_{s,o})$ is defined in minimization form as $B(P_{s,o}) = \min\{x_e \mid e \in P_{s,o}\}$, with $x_e$ indicating the residual capacity of edge $e$ in a residual network [130].
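
The following Python sketch evaluates the sum in Eq. (2.17) for a given, already ordered list of paths. It is deliberately simplified and does not solve the full minimum cost maximum flow problem: the paths are assumed to be supplied cheapest first, edge capacities and node degrees are hypothetical inputs, and reverse (cancelling) edges of a true residual network are not modeled. The name `knowledge_flow` is invented for this illustration.

```python
import math


def knowledge_flow(paths, capacity, degree):
    """Accumulate B(P) * S(P) over paths, updating residual capacities.

    paths:    list of node lists from s to o, assumed ordered by cost.
    capacity: dict (u, v) -> initial capacity of the directed edge u -> v.
    degree:   dict node -> degree d(node) in the knowledge graph.
    """
    residual = dict(capacity)  # x_e: remaining capacity of each edge
    total = 0.0
    for path in paths:
        edges = list(zip(path, path[1:]))
        bottleneck = min(residual[e] for e in edges)  # B(P)
        if bottleneck <= 0:
            continue  # path is saturated and carries no more knowledge
        spec = 1.0 / (1.0 + sum(math.log(degree[v]) for v in path[1:-1]))  # S(P)
        total += bottleneck * spec
        for e in edges:
            residual[e] -= bottleneck  # push the flow along the path
    return total


# Hypothetical toy inputs: two disjoint paths from s to o.
degree = {"s": 2, "a": 2, "b": 4, "o": 2}
capacity = {("s", "a"): 0.8, ("a", "o"): 0.5,
            ("s", "b"): 0.6, ("b", "o"): 0.9}
paths = [["s", "a", "o"], ["s", "b", "o"]]
tau = knowledge_flow(paths, capacity, degree)
```

Each path contributes its bottleneck capacity scaled by its specificity, so a wide path through a generic hub can still contribute less than a narrower path through a specific entity. Once an edge is saturated, later paths that reuse it add nothing.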

The knowledge graph itself can be redundant, invalid, conflicting, unreliable, and incomplete [185]. In these cases, path finding and flow optimization may not be sufficient to assess the truth value well. Therefore, the following additional tasks need to be considered in order to reconstruct the knowledge graph and improve its usefulness for fact-checking.

• Entity Resolution: refers to the process of finding related entries in one or more related relations in a database and creating links among them [19]. This problem has been extensively studied in the database area and applied to data warehousing and business intelligence. According to the survey in [72], existing methods exploit features in three ways, namely numerical, rule-based, and workflow-based. Numerical approaches combine the similarity score of each feature into a weighted sum to decide linkage [39]; rule-based approaches derive the match decision through a logical combination of separate per-feature rules, each tested against a threshold; workflow-based methods apply a sequence of feature comparisons iteratively. Both supervised approaches, such as TAILOR [37] and MARLIN [15], and unsupervised approaches, such as MOMA [151] and SERF [13], are studied in the literature.

• Time Recording: aims to remove outdated knowledge. This task is important given that fake news pieces are often related to newly emerging events. Existing work on time recording mainly uses the Compound Value Type structure, which allows facts to incorporate beginning and ending date annotations [17], or adds extra assertions to current facts [52].

• Knowledge Fusion: (or truth discovery) aims to identify true subject-predicate-object triples extracted by multiple information extractors from multiple information sources [36, 79]. Truth discovery methods do not explore the claims directly, but rely on a collection of contradicting sources that record the properties of objects to determine the truth value. Truth discovery aims to determine the source credibility and object truthfulness at the same time. Fake news detection can benefit from various aspects of truth discovery approaches under different scenarios. For example, the credibility of different news outlets can be modeled to infer the truthfulness of reported news. As another example, relevant social media posts can also be modeled as social response sources to better determine the truthfulness of claims [93, 167]. However, some other issues must be considered when applying truth discovery to fake news detection in social media scenarios. First, most existing truth discovery methods focus on handling structured input in the form of subject-predicate-object (SPO) tuples, while social media data is highly unstructured and noisy. Second, truth discovery methods cannot be applied well when a fake news article is newly launched and published by only a few news outlets, because at that point there are not enough relevant social media posts to serve as additional sources.

• Link Prediction: on knowledge graphs aims to predict new facts from existing facts. This is important since existing knowledge graphs are often missing many facts, and some of the edges they contain are incorrect. Relational machine learning methods are widely used to infer new knowledge representations [97], including latent feature models and graph feature models. Latent feature models exploit the latent features of entities to learn the possible SPO triples. For example, RESCAL [98] is a bilinear relational learning model that explains triples through pairwise interactions of latent features. Graph feature models assume that the existence of an edge can be predicted by extracting features from the observed edges in the graph, such as Markov logic programming or path ranking algorithms. For example, Markov Random Fields (MRFs) [129] encode dependencies of facts into random variables and infer the missing dependencies through statistical probabilistic learning.
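
To make the latent feature idea in the Link Prediction item concrete, the sketch below scores a candidate triple with a RESCAL-style bilinear form, $a_s^{\top} R_p\, a_o$, where $a_s$ and $a_o$ are latent entity vectors and $R_p$ is a relation-specific matrix. The entity names, the relation name, and all numeric values are hand-picked, hypothetical placeholders; in an actual model they would be learned from the observed triples, and that training step is not shown.

```python
def bilinear_score(a_s, R_p, a_o):
    """Return a_s^T R_p a_o, the pairwise interaction of latent features."""
    return sum(
        a_s[i] * R_p[i][j] * a_o[j]
        for i in range(len(a_s))
        for j in range(len(a_o))
    )


# Hypothetical 2-dimensional latent features (not learned):
# dimension 0 is "person-like", dimension 1 is "organization-like".
entity = {
    "alice": [1.0, 0.0],
    "acme": [0.0, 1.0],
}

# Hypothetical relation matrix: "worksFor" links person-like subjects
# to organization-like objects and nothing else.
relation = {
    "worksFor": [[0.0, 1.0],
                 [0.0, 0.0]],
}

forward = bilinear_score(entity["alice"], relation["worksFor"], entity["acme"])
backward = bilinear_score(entity["acme"], relation["worksFor"], entity["alice"])
```

Because the relation matrix is not symmetric, the score depends on the direction of the triple: the forward triple scores higher than the reversed one. A missing edge whose score is high would be a candidate new fact.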