28 3. HOW SOCIAL CONTEXT HELPS
on both datasets demonstrate that users who are more likely to share fake news registered much
earlier. As another example for implicit features, we demonstrate the comparison in Figure 3.2.
We can see that the predicted ages are significantly different, and users who spread fake news
are predicted to be younger than those who spread real news. Motivated by these observations, we can
further extract these explicit and implicit features for all the users who spread a news piece to predict
whether it is fake or not.
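As a rough illustration of this idea, the sketch below pools user-level features over the users who spread each news piece to obtain one feature vector per piece. The text does not specify how the features are aggregated, so the mean pooling used here is an assumption, as are the feature names (account_age_days, predicted_age), the toy values, and the helper news_features; none of them comes from the datasets discussed above.

```python
from statistics import mean

# Hypothetical per-user features: one explicit (account age in days)
# and one implicit (age predicted from the user's content), toy values only.
users = {
    "u1": {"account_age_days": 2400.0, "predicted_age": 24.0},
    "u2": {"account_age_days": 1200.0, "predicted_age": 30.0},
    "u3": {"account_age_days": 300.0, "predicted_age": 41.0},
}

# Hypothetical mapping from each news piece to the users who spread it.
spreaders = {"n1": ["u1", "u2"], "n2": ["u3"], "n3": []}

FEATURES = ["account_age_days", "predicted_age"]


def news_features(news_id, spreaders, users, feature_names=FEATURES):
    """Average each user-level feature over the spreaders of one news piece.

    Mean pooling is an assumption made for this sketch; other statistics
    (median, variance, histograms) could be used in the same way.
    """
    rows = [users[u] for u in spreaders.get(news_id, []) if u in users]
    if not rows:
        # No known spreaders: fall back to a zero vector.
        return [0.0] * len(feature_names)
    return [mean(row[name] for row in rows) for name in feature_names]


vectors = {n: news_features(n, spreaders, users) for n in spreaders}
```

The resulting vectors, one per news piece, could then be passed to any standard classifier together with the fake or real labels.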
Psychology-Related Features
To understand the characteristics of users who spread fake news, we can rely on psychological
theories. Although there is a large body of work on these psychological theories, not many of
them can be (1) applied to users and their behaviors on social media and (2) quantitatively
measured for fake news articles and spreaders on social media. Hence, based on the psychological
theories mentioned in Section 1.2, we can enumerate five categories of features that can
potentially capture the differences between users who spread fake news and those who spread
real news.
• Motivational Factors: there are three LIWC categories that are related to uncertainty:
discrepancy (e.g., should, would, and could), tentativeness (e.g., maybe, perhaps, and
guess), and certainty (e.g., always and never). These categories are abbreviated as discrep,
tentat, and certain, respectively. Anxiety can be measured using the LIWC Anxiety category
(anx), which includes words such as nervous, afraid, and tense. Importance, or outcome
relevance, is observed to be difficult to measure in psychology, so researchers suggest
using proxies to quantify it; we use anxiety as a proxy for this feature, on the
assumption that people are more anxious about a topic that is more important to them. We
use LIWC Future Focus (futurefocus) to measure lack of control; this category includes
words such as may, will, and soon. We do not measure belief explicitly because we assume
that any user who tweets fake news articles believes in them.
• Demographics: Twitter users are not obligated to include information such as age, race,
gender, political orientation, or education on their profiles. Hence, we cannot obtain any
demographic information from the public profiles of tweeters unless we use a proxy. The
prevalence of swear words has been shown to be correlated with gender, social status, and
race [12]. Hence, we use the LIWC Swear Words category (swear) as a proxy for the
demographics feature.
• Social Engagement: the more a user is involved with social media, the less likely she or
he is to be misled by fake news. We measure social engagement on Twitter using
the average number of tweets per day.
• Position in the Network: this feature can be quantified using a variety of metrics when
the network structure is known. However, in the case of social networks between Twitter