(RNN) to learn the emotion embedding vectors. Following traditional settings [54], we first obtain a large-scale real-world dataset that contains emotions, use the emotions as the emotion labels, and then initialize each word with a one-hot vector. After initialization, all word vectors pass through an embedding layer that projects each word from the original one-hot space into a low-dimensional space, and they are then sequentially fed into a one-layer GRU model. Then, through back-propagation, the embedding layer is updated during training, producing an emotion embedding $e_i$ for each word $w_i$.
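The following is a minimal sketch of this training setup, assuming PyTorch; the vocabulary size, embedding dimension, hidden size, and number of emotion labels are placeholder assumptions rather than values from the original work.

```python
import torch
import torch.nn as nn

class EmotionEmbeddingModel(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=100, hidden_dim=128, num_emotions=6):
        super().__init__()
        # Equivalent to projecting one-hot word vectors into a low-dimensional space.
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # One-layer GRU that reads the embedded words sequentially.
        self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=1, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, word_ids):                    # word_ids: (batch, seq_len)
        e = self.embedding(word_ids)                # (batch, seq_len, emb_dim)
        _, h_last = self.gru(e)                     # h_last: (1, batch, hidden_dim)
        return self.classifier(h_last.squeeze(0))   # emotion-label logits

model = EmotionEmbeddingModel()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
# ... train on the emotion-labeled corpus; back-propagation updates the embedding layer ...
# After training, model.embedding.weight[i] serves as the emotion embedding e_i of word w_i.
```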
Incorporating Emotion Representations We introduce how to incorporate emotion embeddings into news contents and user comments to learn the representations for fake news detection.
We can learn the basic textual feature representations through a bidirectional GRU word encoder as in Section 2.1.3. For each word $w_i$, the word embedding vector $\mathbf{w}_i$ is initialized with the pre-trained word2vec [90]. The bidirectional GRU contains a forward GRU $\overrightarrow{f}$, which reads each sentence from word $w_0$ to $w_n$, and a backward GRU $\overleftarrow{f}$, which reads the sentence from word $w_n$ to $w_0$:
$$
\overrightarrow{h}^{w}_{i} = \overrightarrow{\mathrm{GRU}}(\mathbf{w}_i), \quad i \in [0, n], \qquad
\overleftarrow{h}^{w}_{i} = \overleftarrow{\mathrm{GRU}}(\mathbf{w}_i), \quad i \in [0, n].
\tag{3.15}
$$
For a given word $w_i$, we could obtain its word encoding vector $h^{w}_{i}$ by concatenating the forward hidden state $\overrightarrow{h}^{w}_{i}$ and the backward hidden state $\overleftarrow{h}^{w}_{i}$, i.e., $h^{w}_{i} = [\overrightarrow{h}^{w}_{i}; \overleftarrow{h}^{w}_{i}]$.
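A minimal sketch of this word encoder, again assuming PyTorch, is shown below; the embedding and hidden dimensions are placeholders, and in practice the inputs would be the pre-trained word2vec vectors rather than random tensors.

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim = 100, 64
# Bidirectional GRU corresponding to Eq. (3.15): a forward and a backward pass over the sentence.
word_encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

# word_embs stands in for the word2vec embeddings of one sentence of n+1 words.
word_embs = torch.randn(1, 12, emb_dim)   # (batch=1, seq_len, emb_dim)
h_w, _ = word_encoder(word_embs)          # (1, seq_len, 2 * hidden_dim)
# h_w[:, i, :] is the concatenation [forward h^w_i ; backward h^w_i], i.e., the encoding of word w_i.
```

The same encoder structure, applied to the emotion embeddings $e_i$ instead of the word embeddings, yields the emotion encodings of Eq. (3.16) below.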
Similarly to the word encoder, we adopt a bidirectional GRU to model the emotion feature representations of the words. After we obtain the emotion embedding vectors $e_i$, we can learn the emotion encoding $h^{e}_{i}$ for word $w_i$:
$$
\overrightarrow{h}^{e}_{i} = \overrightarrow{\mathrm{GRU}}(e_i), \quad i \in [0, n], \qquad
\overleftarrow{h}^{e}_{i} = \overleftarrow{\mathrm{GRU}}(e_i), \quad i \in [0, n].
\tag{3.16}
$$
For a given word $w_i$, we could obtain its emotion encoding vector $h^{e}_{i}$ by concatenating the forward hidden state $\overrightarrow{h}^{e}_{i}$ and the backward hidden state $\overleftarrow{h}^{e}_{i}$, i.e., $h^{e}_{i} = [\overrightarrow{h}^{e}_{i}; \overleftarrow{h}^{e}_{i}]$.
The overall emotion information of the news content is also important when deciding how much information from the emotion embedding should be absorbed for the words. For a given post $a$, we extract the emotion features included in [22] and also add some additional emotion features. There are 19 features regarding the emotion aspects of news, including the numbers of positive/negative words, the sentiment score, etc. The news emotion features of $a$ are denoted as $s_e$.
Gate_N is applied to learn information jointly from the word embedding, the emotion embedding, and the sentence emotion features, and yields a new representation for each word (see Figure 3.5). The units in Gate_N are motivated by the forget gate and input gate in LSTM. In Gate_N, the two emotion inputs jointly decide the values of $r_t$ and $u_t$ with two sigmoid layers, which are used to manage how much information from the semantic and emotion inputs is added into the new representation. Meanwhile, a dense layer transfers the emotion inputs to the same dimensional space