2.1.3 NEURAL TEXTUAL FEATURES
With the recent advances of deep neural networks in NLP, different neural network structures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been developed to learn latent textual feature representations. Neural textual features are based on dense vector representations rather than high-dimensional and sparse features, and have achieved superior results on various NLP tasks [178]. We introduce representative deep neural textual methods, including CNNs and RNNs, and some of their variants.
CNN
CNNs have been widely used for fake news detection and have achieved strong results [163, 175]. CNNs have the ability to extract salient n-gram features from the input sentence to create an informative latent semantic representation of the sentence for downstream tasks [178]. As shown in Figure 2.1, to learn the textual representations of news sentences, CNNs first build a word representation matrix for each sentence using word-embedding vectors such as Word2Vec [90], then apply several convolution layers and a max-pooling layer to obtain the final textual representations.
Specifically, for each word $w_i$ in the news sentence, we can denote its $k$-dimensional word-embedding vector as $\mathbf{w}_i \in \mathbb{R}^k$, and a sentence with $n$ words can be represented as:
$$
\mathbf{w}_{1:n} = \mathbf{w}_1 \oplus \mathbf{w}_2 \oplus \cdots \oplus \mathbf{w}_n, \tag{2.3}
$$
where $\oplus$ denotes the concatenation operation. A convolution filter with window size $h$ takes a contiguous sequence of $h$ words in the sentence as input and outputs a feature, as follows:
$$
\tilde{w}_i = \sigma\left(\mathbf{W} \cdot \mathbf{w}_{i:i+h-1}\right), \tag{2.4}
$$
where $\sigma(\cdot)$ is the ReLU activation function and $\mathbf{W}$ represents the weights of the filter. The filter is further applied to the remaining words, yielding a feature vector for the sentence:
$$
\tilde{\mathbf{w}} = \left[\tilde{w}_1, \tilde{w}_2, \ldots, \tilde{w}_{n-h+1}\right]. \tag{2.5}
$$
For each feature vector $\tilde{\mathbf{w}}$, we further apply a max-pooling operation that takes the maximum value, extracting the most important information. The process is repeated until we obtain the features for all filters. Following the max-pooling operations, a fully connected layer is used to generate the final textual feature representation.
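To make the pipeline above concrete, the following is a minimal PyTorch sketch of such a CNN text encoder. The embedding size, filter window, and number of filters are illustrative assumptions rather than values prescribed in this chapter; pre-trained Word2Vec vectors could be loaded into the embedding layer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Sketch of a CNN text encoder (Eqs. 2.3-2.5); sizes are assumptions."""
    def __init__(self, vocab_size, k=300, h=3, num_filters=100):
        super().__init__()
        # Word-embedding lookup; could be initialized from Word2Vec vectors.
        self.embedding = nn.Embedding(vocab_size, k)
        # Convolution filters with window size h over the word sequence (Eq. 2.4).
        self.conv = nn.Conv1d(in_channels=k, out_channels=num_filters, kernel_size=h)
        # Fully connected layer producing the final textual representation.
        self.fc = nn.Linear(num_filters, num_filters)

    def forward(self, token_ids):
        # token_ids: (batch, n) integer word indices for sentences of n words
        x = self.embedding(token_ids)               # (batch, n, k): w_1 (+) ... (+) w_n
        x = x.transpose(1, 2)                       # (batch, k, n) for Conv1d
        x = F.relu(self.conv(x))                    # (batch, num_filters, n-h+1), Eqs. 2.4-2.5
        x = F.max_pool1d(x, x.size(2)).squeeze(2)   # max over time: one value per filter
        return self.fc(x)                           # final textual feature representation

# Example usage with a hypothetical vocabulary of 50,000 words:
# encoder = TextCNN(vocab_size=50_000)
# features = encoder(torch.randint(0, 50_000, (2, 20)))   # (2, 100)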
RNN
RNNs are popular in NLP because they can encode the sequential information of sentences and paragraphs directly. A representative RNN for learning textual representations is the long short-term memory (LSTM) network [64, 114, 119].
For example, two layers of LSTM [114] are built to detect fake news, where one layer feeds simple word embeddings into the LSTM and the other concatenates the LSTM output with Linguistic Inquiry and Word Count (LIWC) [105] feature vectors before feeding them into the output layer.
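As an illustration of this two-branch design, the sketch below pairs an LSTM over word embeddings with LIWC feature vectors that are concatenated to the LSTM output before the output layer. The hidden size, LIWC dimensionality, and class count are assumptions for the example, not details taken from [114].

import torch
import torch.nn as nn

class LstmLiwcClassifier(nn.Module):
    """Sketch of an LSTM detector fused with LIWC features; sizes are assumptions."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128,
                 liwc_dim=93, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Output layer over the LSTM encoding concatenated with LIWC features.
        self.out = nn.Linear(hidden_dim + liwc_dim, num_classes)

    def forward(self, token_ids, liwc_features):
        # token_ids: (batch, n) word indices; liwc_features: (batch, liwc_dim)
        x = self.embedding(token_ids)        # feed word embeddings into the LSTM
        _, (h_n, _) = self.lstm(x)           # h_n: (1, batch, hidden_dim)
        h = h_n.squeeze(0)                   # last hidden state as the sentence encoding
        # Concatenate the LSTM output with LIWC feature vectors before the output layer.
        fused = torch.cat([h, liwc_features], dim=1)
        return self.out(fused)

# Example usage with hypothetical sizes:
# model = LstmLiwcClassifier(vocab_size=50_000)
# logits = model(torch.randint(0, 50_000, (2, 20)), torch.rand(2, 93))   # (2, 2)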