B. TOOLS
NewsVerify
NewsVerify^6 is a real-time news verification system that can assess the credibility of a news event given a few keywords describing it [184].
NewsVerify operates in three stages: (1) crawling data; (2) building an ensemble model; and (3) visualizing the results. Given the keywords and time range of a news event, the related microblogs are collected through the search engine of Sina Weibo. From these messages, key users and key microblogs are extracted for further analysis: the key users are used for information-source certification, while the key microblogs are used for propagation and content certification. All of the data above are crawled through a distributed data-acquisition system, which is described below. After the three individual models have been built, their scores are combined via a weighted combination. Finally, an event-level credibility score is produced, and each individual model also yields a credibility score that measures the credibility of the corresponding aspect. To improve the user experience of the application, the results are visualized from various perspectives, providing useful information about events for further investigation.
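The weighted combination step can be sketched as follows. This is a minimal illustration, not NewsVerify's actual code; the aspect names and weights are hypothetical stand-ins for the content, propagation, and source sub-models.

```python
# Sketch of the ensemble step: three per-aspect credibility scores
# are blended into one event-level score. Weights are illustrative.

def combine_scores(scores, weights):
    """Weighted combination of per-aspect credibility scores.

    scores  -- dict mapping aspect name to a score in [0, 1]
    weights -- dict mapping aspect name to a non-negative weight
    """
    total = sum(weights.values())
    return sum(scores[a] * weights[a] for a in scores) / total

# Each sub-model keeps its own score; the event-level score is the blend.
aspect_scores = {"content": 0.8, "propagation": 0.6, "source": 0.9}
aspect_weights = {"content": 0.5, "propagation": 0.3, "source": 0.2}

event_score = combine_scores(aspect_scores, aspect_weights)
```

Because each sub-model retains its own score alongside the blended one, the tool can report both an overall verdict and a per-aspect breakdown.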
Data Acquisition. Three kinds of information are collected: microblogs, propagation, and microbloggers. Like most distributed systems, NewsVerify has a master node and child nodes. The master node is responsible for task distribution and result integration, while the child nodes process the specific tasks and store the collected data in an appointed temporary storage space. A child node informs the master node after all of its tasks have finished. The master node then merges all slices of data from the temporary storage space and stores the combined data in permanent storage, after which the temporary storage is deleted. The distributed system is based on ZooKeeper,^7 a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Because the acquisition task involves frequent, real-time storing and reading of data, the efficient key-value database Redis is adopted; working with an in-memory dataset, Redis achieves outstanding performance.
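The master/child protocol just described can be sketched in a few lines. This is a toy simulation under stated assumptions: a plain dict stands in for the Redis-backed temporary storage, a list stands in for permanent storage, and simple function calls stand in for ZooKeeper coordination. None of the names below come from NewsVerify's API.

```python
# Toy sketch of the master/child crawl protocol. A dict plays the role
# of temporary storage, a list plays permanent storage; real deployments
# would use Redis and ZooKeeper as the text describes.

temp_storage = {}   # child_id -> slice of collected records
perm_storage = []   # merged, permanent copy

def child_crawl(child_id, task):
    """Child node: process one crawl task, store the slice temporarily."""
    records = [f"{task}-record-{i}" for i in range(2)]  # fake crawl output
    temp_storage[child_id] = records
    return child_id  # "inform the master" that the task has finished

def master_merge():
    """Master node: merge all slices, persist them, clear temp space."""
    for child_id in sorted(temp_storage):
        perm_storage.extend(temp_storage[child_id])
    temp_storage.clear()  # temporary storage is deleted after the merge

# Master distributes two tasks, children report back, master merges.
for cid, task in enumerate(["keywordA", "keywordB"]):
    child_crawl(cid, task)
master_merge()
```

The design keeps child nodes stateless with respect to the final result: only the master touches permanent storage, so a failed child can simply be re-run against its task slice.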
Model Ensemble. Individual models are built to verify the truthfulness of news pieces from the perspectives of news content, news propagation, and information source (see Figure B.7). The content-based model is based on hierarchical propagation networks [58]. The credibility network has three layers: a message layer, a sub-event layer, and an event layer. Semantic and structural features are then exploited to adjust the weights of the links in the network. Given a news event and its related microblogs, sub-events are generated by a clustering algorithm; the sub-event layer is constructed to capture implicit semantic information within an event. Four types of network links reflect the relations between network nodes. The intra-level links (message to message, sub-event to sub-event) reflect relations among entities of the same type, while the inter-level links (message to sub-event, sub-event to event) reflect the impact from level to level. After the network is constructed, all entities are initialized with cred-
^6 https://www.newsverify.com/
^7 http://zookeeper.apache.org/
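The three-layer credibility network described above can be sketched with plain data structures. This is an assumed, illustrative encoding, not the authors' implementation; node names and the uniform initial weight are hypothetical.

```python
# Illustrative sketch of the three-layer credibility network:
# message, sub-event, and event nodes joined by intra-level and
# inter-level links. Weights start uniform; in the full model,
# semantic and structural features would adjust them.

nodes = {
    "message":  ["m1", "m2", "m3"],
    "subevent": ["s1", "s2"],
    "event":    ["e1"],
}

links = []

def add_link(src, dst, kind, weight=1.0):
    """Record one link; `kind` is '<src-level>-<dst-level>'."""
    links.append({"src": src, "dst": dst, "kind": kind, "weight": weight})

# Intra-level links: relations among entities of the same type.
add_link("m1", "m2", "message-message")
add_link("s1", "s2", "subevent-subevent")

# Inter-level links: impact propagating from level to level.
add_link("m1", "s1", "message-subevent")
add_link("m2", "s1", "message-subevent")
add_link("m3", "s2", "message-subevent")
add_link("s1", "e1", "subevent-event")
add_link("s2", "e1", "subevent-event")

# A link is intra-level when both endpoints share a level.
intra = [l for l in links if l["kind"].split("-")[0] == l["kind"].split("-")[1]]
```

Separating intra-level from inter-level links makes it easy to propagate credibility within a layer first and then pass the aggregated signal upward from messages to sub-events to the event.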