and Fact Checking
We are happy to report that we received 43 strong submissions covering a broad range of topics within the scope of the call. Our diverse program committee worked hard to evaluate each submission, and based on their recommendations, we selected a total of 12 contributions for oral presentations (acceptance rate 28%).
On social networks, please use the hashtag #MisInfoWeb together with #TheWebConf.
Relevant Document Discovery for Fact-Checking Articles
Authors: Xuezhi Wang, Cong Yu, Simon Baumgartner and Flip Korn
Keywords: Claim Discovery, Fact Checking, Digital Misinformation
With the support of major search platforms such as Google and Bing, fact-checking articles, which can be identified by their adoption of the schema.org ClaimReview structured markup, have gained widespread recognition for their role in the fight against digital misinformation. A claim-relevant document is an online document that addresses, and potentially expresses a stance towards, some claim. The claim-relevance discovery problem, then, is to find claim-relevant documents. Depending on the verdict from the fact check, claim-relevance discovery can help identify online misinformation. In this paper, we provide an initial approach to the claim-relevance discovery problem by leveraging various information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features in the fact-checking article. Second, we apply a relevance classifier to filter away documents that do not address the claim. Third, we apply a language feature based classifier to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and beats state-of-the-art baselines. Finally, we highlight a rich set of case studies to demonstrate the myriad of remaining challenges and that this problem is far from being solved.
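As a small illustration of how a fact-checking article can be recognized by its schema.org ClaimReview markup, the sketch below scans a JSON-LD payload for objects of type ClaimReview. It is not the authors' system: the function name find_claim_reviews and the sample payload are hypothetical, and it assumes only the schema.org properties @type, claimReviewed, and reviewRating (with the textual verdict in alternateName, as is common but not guaranteed).

```python
import json


def find_claim_reviews(json_ld_text):
    """Return (claim, verdict) pairs for every ClaimReview object in a JSON-LD string."""
    data = json.loads(json_ld_text)
    # JSON-LD may hold a single object, a list of objects, or an "@graph" wrapper.
    if isinstance(data, dict):
        items = data.get("@graph", [data])
    else:
        items = data
    results = []
    for item in items:
        if not isinstance(item, dict):
            continue
        types = item.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "ClaimReview" not in types:
            continue
        rating = item.get("reviewRating") or {}
        # alternateName often carries the textual verdict (assumption for this sketch).
        verdict = rating.get("alternateName") if isinstance(rating, dict) else None
        results.append((item.get("claimReviewed"), verdict))
    return results


# Hypothetical payload, written for this example only.
sample = json.dumps({
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "claimReviewed": "The moon is made of cheese",
    "reviewRating": {"@type": "Rating", "alternateName": "False"},
})
print(find_claim_reviews(sample))
```

The extracted claim text and verdict are the kind of fact-check features that a candidate-retrieval phase could start from.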
Online Misinformation: Challenges and Future Directions
Authors: Miriam Fernandez and Harith Alani
Keywords: Misinformation, Technology Development, Research Directions
Misinformation has become a common part of our digital media environments and it is compromising the ability of our societies to form informed opinions. It generates misperceptions, which have affected the decision making processes in many domains, including economy, health, environment, and elections, among others. Misinformation and its generation, propagation, impact, and management is being studied through a variety of lenses (computer science, social science, journalism, psychology, etc.) since it widely affects multiple aspects of society. In this paper we analyse the phenomenon of misinformation from a technological point of view. We study the current socio-technical advancements towards addressing the problem, identify some of the key limitations of current technologies, and propose some ideas to target such limitations. The goal of this position paper is to reflect on the current state of the art and to stimulate discussions on the future design and development of algorithms, methodologies, and applications.
Fake News Detection in Social Networks via Crowd Signals
Authors: Sebastian Tschiatschek, Adish Singla, Manuel Gomez Rodriguez, Arpit Merchant and Andreas Krause
Keywords: Fake news, Social networks, Social Media, Crowd Signals
Our work considers leveraging crowd signals for detecting fake news and is motivated by tools recently introduced by Facebook that enable users to flag fake news. By aggregating users’ flags, our goal is to select a small subset of news every day, send them to an expert (e.g., via a third-party fact-checking organization), and stop the spread of news identified as fake by an expert. The main objective of our work is to minimize the spread of misinformation by stopping the propagation of fake news in the network. It is especially challenging to achieve this objective as it requires detecting fake news with high-confidence as quickly as possible. We show that in order to leverage users’ flags efficiently, it is crucial to learn about users’ flagging accuracy. We develop a novel algorithm, Detective, that performs Bayesian inference for detecting fake news and jointly learns about users’ flagging accuracy over time. Our algorithm employs posterior sampling to actively trade off exploitation (selecting news that directly maximize the objective value at a given epoch) and exploration (selecting news that maximize the value of information towards learning about users’ flagging accuracy). We demonstrate the effectiveness of our approach via extensive experiments and show the power of leveraging community signals.
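To make the idea of posterior sampling over users' flagging accuracy more concrete, here is a minimal sketch. It is not the Detective algorithm from the paper: it assumes a simple Beta-Bernoulli belief per user, and the names FlaggerBelief and pick_story, as well as the scoring rule (sum of sampled accuracies of the flagging users), are hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class FlaggerBelief:
    """Beta(alpha, beta) belief over the probability that a user's flag is correct."""
    alpha: float = 1.0
    beta: float = 1.0

    def sample(self, rng):
        # Draw one plausible accuracy value from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, flag_was_correct):
        # Conjugate update after an expert has checked a story this user flagged.
        if flag_was_correct:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self):
        return self.alpha / (self.alpha + self.beta)


def pick_story(flags, beliefs, rng):
    """Pick the story to send to the expert.

    flags maps a story id to the list of user ids who flagged it.
    Each user's accuracy is sampled from the posterior, so uncertain users
    occasionally receive high samples (exploration), while users known to be
    accurate dominate most of the time (exploitation).
    """
    def score(story):
        return sum(beliefs[user].sample(rng) for user in flags[story])
    return max(flags, key=score)


rng = random.Random(0)
beliefs = {"alice": FlaggerBelief(), "bob": FlaggerBelief()}
flags = {"story-1": ["alice"], "story-2": ["bob"]}
chosen = pick_story(flags, beliefs, rng)
# Suppose the expert confirms the chosen story was fake: reward its flaggers.
for user in flags[chosen]:
    beliefs[user].update(flag_was_correct=True)
```

Acting greedily on a sampled accuracy, rather than on the posterior mean, is what lets one selection rule both exploit reliable users and keep learning about uncertain ones.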
A Structured Response to Misinformation: Defining and Annotating Credibility Indicators in News Articles
Authors: Amy Zhang, Aditya Ranganathan, Emlen Metz, Scott Appling, Connie Moon Sehat, Norman Gilmore, Nick Adams, Emmanuel Vincent, Jennifer Lee and An Xiao Mina
Keywords: misinformation, disinformation, information disorder, credibility, news, journalism, media literacy, web standards
The proliferation of misinformation in online news and its amplification by automated feeds and social media are a growing concern. While there have been calls for an increased effort to improve the detection of and response to misinformation, doing so effectively requires collective agreement on the indicators that signal credible content. In this paper, we present an initial set of indicators for article credibility defined collaboratively by a diverse coalition of experts. These indicators originate from both within an article’s text as well as from external sources or article metadata. As a proof-of-concept, we present a novel dataset of 40 articles of varying credibility annotated with our credibility indicators by 6 trained annotators using specialized platforms. Finally, we outline the future steps for this initiative, including expanding annotation efforts, broadening the set of indicators, and considering their use by platforms and the public, towards the development of interoperable standards for content credibility.
Satire or Fake News? Social Media Consumers Socio-Demographics Decide
Authors: Chianna Schoenthaler and Michele Bedard
Keywords: Fake News, Satire, Click-hit Revenue
After the decidedly volatile results from the 2016 U.S. presidential race, the subject of Fake News in our worldwide media consumption has grown daily. On a smaller scale, mainstream media has taken a closer look at the relatively narrow genre of satirical news content. Satirical news is designed specifically to entertain the reader, usually with irony or wit, in order to critique society or a social figure and invoke change or reform (Koltonski, 2017). The problems that can arise from satirical news come about due to the readers, professional or not, misinterpreting the content as truth and then, in turn, sharing it with others as fact. Oftentimes this occurs due to deliberate misleading by the newest type of satirical news, the decidedly unfunny and often joke-free “satire” that has sprung up in order to generate click-hit revenue (Rensin, 2014). As the gullibility increases, the media and public lose patience; the merging and confusion of all the various terminology in the fake news universe begin and satire can be mislabeled as fake news (Media Matters Staff, 2016). Using our primary research, we seek to determine if there is a correlation between a media consumer’s understanding of the difference between satirical news and fake news and varying socio-demographic factors.
Selection Bias in News Coverage: Learning it, Fighting it
Authors: Dylan Bourgeois, Jérémie Rappaz and Karl Aberer
Keywords: news coverage, selection bias, media pluralism, echo-chamber, factorization methods, ranking methods
News entities must select and filter the coverage they broadcast through their respective channels since the set of world events is too large to be treated exhaustively. The subjective nature of this filtering induces biases due to, among other things, resource constraints, editorial guidelines, ideological affinities, or even the fragmented nature of the information at a journalist’s disposal. The magnitude and direction of these biases are, however, widely unknown. The absence of ground truth, the sheer size of the event space, or the lack of an exhaustive set of absolute features to measure make it difficult to observe the bias directly, to characterize the leaning’s nature and to factor it out to ensure a neutral coverage of the news.
In this work, we introduce a methodology to capture the latent structure of media’s decision process on a large scale. Our contribution is multi-fold. First, we show media coverage to be predictable using personalization techniques, and evaluate our approach on a large set of events collected from the GDELT database. We then show that a personalized and parametrized approach not only exhibits higher accuracy in coverage prediction, but also provides an interpretable representation of the selection bias. Last, we propose a method able to select a set of sources by leveraging the latent representation. These selected sources provide a more diverse and egalitarian coverage, all while retaining the most actively covered events.
Misleading or Falsification? Inferring Deceptive Strategies and Types in Online News and Social Media
Authors: Svitlana Volkova and Jin Yea Jang
Keywords: natural language processing, machine learning, disinformation, misinformation, deception, social media analysis
Deceptive information in online news and social media has had a dramatic effect on our society in recent years. This study is the first to gain deeper insights into writers’ intent behind digital misinformation by analyzing psycholinguistic signals: moral foundations and connotations extracted from different types of deceptive news ranging from strategic disinformation to propaganda and hoaxes. To ensure consistency of our findings and generalizability across domains we experiment with data from: (1) confirmed cases of disinformation in news summaries, (2) propaganda, hoax and disinformation news pages, and (3) Twitter news. We first contrast lexical markers of biased language, syntactic and stylistic signals, and connotations across deceptive news types (disinformation, propaganda, and hoaxes) and deceptive strategies (misleading or falsification). We then incorporate these insights to build machine learning and deep learning predictive models to infer deception strategies and deceptive news types. Our experimental results demonstrate that, unlike earlier work on deception detection, content combined with biased language markers, moral foundations, and connotations leads to better predictive performance of deception strategies compared to syntactic and stylistic signals (as reported in earlier work on deceptive reviews). Falsification strategy is easier to identify than misleading. Disinformation is more difficult to predict compared to propaganda or hoaxes. Deceptive news types (disinformation, propaganda, and hoaxes), unlike deceptive strategies (falsification and misleading), are more revealed, and thus easier to identify in tweets compared to news reports. Finally, our novel connotation analysis across deception types provides deeper understanding of writers’ perspectives and, therefore, the intentions behind digital misinformation.
Illuminating the ecosystem of partisan websites
Authors: Shweta Bhatt, Sagar Joglekar, Shehar Bano and Nishanth Sastry
Keywords: Fake News, hyper partisan news, Internal linking, Filter Bubble
Alternative news media ecosystems thrive on the growing popularity of social media networking sites and thus have been successful in influencing opinions and beliefs by false and biased news reporting. The 2016 U.S. elections are believed to have suffered enormously due to hyper-partisan online journalism. Within this context, this paper aims at finding specific evidence of hyper-partisan clusters and key characteristics that mediate the traffic flow within the partisan media ecosystem. This is achieved by analyzing a data set consisting of a curated list of 668 partisan websites and 4M Facebook posts across 507 corresponding Facebook pages to understand the stake of partisan media in influencing U.S. politics. The paper successfully points out the extensive internal traffic forwarding within partisan sites, illustrates how the web and social media strengthen the political divide between the left and the right, finds temporal evidence of strong involvement of partisan sites during the 2016 U.S. elections, and discusses the characteristics of their target audiences.
From Alt-Right to Alt-Rechts: Twitter Analysis of the 2017 German Federal Election
Authors: Fred Morstatter, Yunqui Shao, Aram Galstyan and Shanika Karunasekera
Keywords: Social Networks, Online Campaigns, Bots
In the 2017 German Federal elections, the “Alternative für Deutschland”, or AfD, party was able to take control of many seats in the German parliament. Their success was credited, in part, to their large online presence. Like other “alt-right” organizations worldwide, this party is tech savvy, generating a large social media footprint, especially on Twitter, which provides an ample opportunity to understand their online behavior. In this work we present an analysis of Twitter data related to the aforementioned election. We show how users self-organize into communities, and identify the themes that define those communities. Next we analyze the content generated by those communities, and the extent to which these communities interact. Despite these elections being held in Germany, we note a substantial impact from the English-speaking Twittersphere. Specifically, we note that many of these accounts appear to be from the American alt-right movement, and support the German alt-right movement. Finally, noting a presence of bots in the dataset, we measure the extent of bots in the German election and show how they attempt to influence the discussion of the election.
Exploring Entity-centric Networks in Entangled News Streams
Authors: Andreas Spitz and Michael Gertz
Keywords: entity network, implicit network, news stream, document indexing
The increasing number of news outlets and the frequency of the news cycle have made it all but impossible to obtain the full picture from online news. Consolidating news from different sources has thus become a necessity in online news processing. Despite the amount of research that has been devoted to different aspects of new event detection and tracking in news streams, solid solutions for such entangled streams of full news articles are still lacking. Many existing works focus on streams of microblogs since the analysis of news articles raises the additional problem of summarizing or extracting the relevant sections of articles. For the consolidation of identified news snippets, schemes along numerous different dimensions have been proposed, including publication time, temporal expressions, geo-spatial references, named entities, and topics. The granularity of aggregated news snippets then includes such diverse aspects as events, incidents, threads, or topics for various subdivisions of news articles. To support this variety of granularity levels, we propose a comprehensive network model for the representation of multiple entangled streams of news documents. Unlike previous methods, the model is geared towards entity-centric explorations and enables the consolidation of news along all dimensions, including the context of entity mentions. Since the model also serves as a reverse index, it supports explorations along the dimensions of sentences or documents for an encompassing view on news events. We evaluate the performance of our model on a large collection of entangled news streams from major news outlets of English speaking countries and a ground truth that we generate from event summaries in the Wikipedia Current Events portal.
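The following minimal sketch illustrates how an entity-centric network can double as a reverse index from entities back to sentences and documents. It is not the authors' model: the class name EntityNetwork, the use of sentence-level co-occurrence as the edge weight, and the assumption that entities have already been extracted are all hypothetical simplifications made for this example.

```python
from collections import defaultdict
from itertools import combinations


class EntityNetwork:
    """Toy entity co-occurrence graph that also indexes where each entity is mentioned."""

    def __init__(self):
        # entity -> set of (doc_id, sentence_idx) positions where it is mentioned
        self.mentions = defaultdict(set)
        # (entity_a, entity_b) with entity_a < entity_b -> number of shared sentences
        self.cooccur = defaultdict(int)

    def add_sentence(self, doc_id, sentence_idx, entities):
        unique = sorted(set(entities))
        for entity in unique:
            self.mentions[entity].add((doc_id, sentence_idx))
        for a, b in combinations(unique, 2):
            self.cooccur[(a, b)] += 1

    def documents(self, entity):
        """Reverse lookup: all documents, across streams, that mention the entity."""
        return sorted({doc for doc, _ in self.mentions.get(entity, ())})

    def neighbours(self, entity):
        """Entities that share at least one sentence with the given entity, with counts."""
        out = {}
        for (a, b), weight in self.cooccur.items():
            if a == entity:
                out[b] = weight
            elif b == entity:
                out[a] = weight
        return out


net = EntityNetwork()
net.add_sentence("outlet-A/doc1", 0, ["Merkel", "Berlin"])
net.add_sentence("outlet-B/doc7", 3, ["Merkel", "AfD"])
print(net.documents("Merkel"))
print(net.neighbours("Merkel"))
```

Because documents from different outlets are added to the same structure, a query on one entity consolidates mentions across the entangled streams.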
A Content Management Perspective on Fact-Checking
Authors: Sylvie Cazalens, Philippe Lamarre, Julien Leblay, Ioana Manolescu and Xavier Tannier
Keywords: fact checking, data management, information retrieval
Fact checking has captured the attention of the media and the public alike; it has also recently received strong attention from the computer science community, in particular from data and knowledge management, natural language processing and information retrieval; we denote these together under the term “content management”. In this paper, we identify the fact checking tasks which can be performed with the help of content management technologies, and survey the recent research works in this area, before laying out some perspectives for the future. We hope our work will provide interested researchers, journalists and fact checkers with an entry point in the existing literature as well as help develop a roadmap for future research and development work.
Detect Rumor and Stance Jointly by Neural Multi-task Learning
Authors: Jing Ma, Wei Gao and Kam-Fai Wong
Keywords: Rumour detection, Stance classification, Multi-task learning, Weight sharing, Social media, Microblog
In recent years, an unhealthy phenomenon characterized as the massive spread of fake news or unverified information (i.e., rumors) has become an increasingly daunting issue in human society. The rumors commonly originate from social media outlets, primarily microblogging platforms, and then go viral through wild, willful propagation by a large number of participants. It is observed that rumorous posts often trigger versatile, mostly controversial stances among participating users. Thus, determining the stances on the posts in question can be pertinent to the successful detection of rumors, and vice versa. Existing studies, however, mainly regard rumor detection and stance classification as separate tasks. In this paper, we argue that they should be treated as a joint, collaborative effort, considering the strong connections between the veracity of a claim and the stances expressed in responsive posts. Enlightened by the multi-task learning scheme, we propose a joint framework that unifies the two highly pertinent tasks, i.e., rumor detection and stance classification. Based on deep neural networks, we train both tasks jointly using weight sharing to extract the common and task-invariant features while each task can still learn its task-specific features. Extensive experiments on real-world datasets gathered from Twitter and news portals demonstrate that our proposed framework improves both rumor detection and stance classification tasks consistently with the help of the strong inter-task connections, achieving much better performance than state-of-the-art methods.