Crowdsourcing and
Human Computation for the Web

List of accepted papers :

  • Creating Crowdsourced Research Talks at Scale
    Authors: Rajan Vaish, Shirish Goyal, Amin Saberi and Sharad Goel

    Keywords: Crowdsourcing, online creative collaboration, peer video production, scientific knowledge and learning at scale

    There has been a marked shift towards learning and consuming information through video. Most academic research, however, is still distributed only in text form, as researchers often have limited time, resources, and incentives to create video versions of their work. To address this gap, we propose, deploy, and evaluate a scalable, end-to-end system for crowdsourcing the creation of short, 5-minute research videos based on academic papers. Doing so requires solving complex coordination and collaborative video production problems. To assist coordination, we designed a structured workflow that enables efficient delegation of tasks, while also motivating the crowd through a collaborative learning environment. To facilitate video production, we developed an online tool with which groups can make micro-audio recordings that are automatically stitched together to create a complete talk. We tested this approach with a group of volunteers recruited from 52 countries through an open call. This distributed crowd produced over 100 video talks in 12 languages based on papers from top-tier computer science conferences. The produced talks consistently received high ratings from a diverse group of non-experts and experts, including the authors of the original papers. These results indicate that our crowdsourcing approach is a promising method for producing high-quality research talks at scale, increasing the distribution and accessibility of scientific knowledge.

  • Attack under Disguise: An Intelligent Data Poisoning Attack Mechanism in Crowdsourcing
    Authors: Chenglin Miao, Qi Li, Lu Su, Mengdi Huai, Wenjun Jiang and Jing Gao

    Keywords: Crowdsourcing, Data poisoning, Expectation maximization

    As an effective way to solicit useful information from the crowd, crowdsourcing has emerged as a popular paradigm to solve challenging tasks. However, the data provided by the participating workers are not always trustworthy. In real world, there may exist malicious workers in crowdsourcing systems who conduct the data poisoning attacks for the purpose of sabotage or financial rewards. Although data aggregation methods such as majority voting are conducted on workers’ labels in order to improve data quality, they are vulnerable to such attacks as they treat all the workers equally. In order to capture the variety in the reliability of workers, the Dawid-Skene model, a sophisticated data aggregation method, has been widely adopted in practice. By conducting maximum likelihood estimation (MLE) using the expectation maximization (EM) algorithm, the Dawid-Skene model can jointly estimate each worker’s reliability and conduct weighted aggregation, and thus can tolerate the data poisoning attacks to some degree. However, the Dawid-Skene model still has weakness. In this paper, we study the data poisoning attacks against such crowdsourcing systems with the Dawid-Skene model empowered. We design an intelligent attack mechanism, based on which the attacker can not only achieve maximum attack utility but also disguise the attacking behaviors. Extensive experiments based on real-world crowdsourcing datasets are conducted to verify the desirable properties of the proposed mechanism.

  • Leveraging Crowdsourcing Data For Deep Active Learning – An Application: Learning Intents in Alexa
    Authors: Jie Yang, Thomas Drake, Andreas Damianou and Yoelle Maarek

    Keywords: Deep Active Learning, Targeted Crowdsourcing, Bayesian Method

    This paper presents a generic Bayesian framework that enables any deep learning model to actively learn from targeted crowds. Our framework inherits from recent advances in Bayesian deep learning, and extends existing work by considering the targeted crowdsourcing approach, in which multiple annotators of unknown expertise contribute an uncontrolled amount (often limited) of annotations. Specifically, our framework leverages the low-rank structure in annotations to learn individual annotator expertise, which then helps to infer the true labels from noisy and sparse annotations. It provides a unified Bayesian model to simultaneously infer the true labels and train the deep learning model in order to reach an optimal learning efficacy. Finally, our framework exploits the uncertainty of the deep learning model during prediction as well as the annotators’ estimated expertise to minimize the number of required annotations and annotators for optimally training the deep learning model.
    We evaluate the effectiveness of our framework in both simulated and real-world datasets, and validate its applicability in a strategic task of Alexa, Amazon’s personal assistant, namely intent classification. Extensive experiments show that our framework can accurately learn annotator expertise, infer true labels, and effectively reduce the amount of annotations in model training as compared to state-of-the-art approaches. We further discuss the potential of our proposed framework in bridging machine learning and crowdsourcing towards improved human-in-the-loop systems.

  • Web-Based VR Experiments Powered by the Crowd
    Authors: Xiao Ma, Megan Cackett, Leslie Park, Eric Chien, Tianyu Xie, Jing Shan and Mor Naaman

    Keywords: VR, Crowdsourcing, Amazon Mechanical Turk

    We build on the increasing availability of VR devices and Web technologies to conduct behavioral experiments in Virtual Reality (VR) using crowdsourcing techniques. A new recruiting and validation method allows us to create a panel of eligible experiment participants recruited from Amazon Mechanical Turk. Using this panel, we ran three different crowdsourced VR experiments, each reproducing one of three VR illusions: place illusion, embodiment illusion, and plausibility illusion. Our experience and worker feedback on these experiments show that conducting Web-based VR experiments using crowdsourcing is already feasible, though some challenges–including scale–remain. Such crowdsourced VR experiments on the Web have the potential to finally support replicable VR experiments with diverse populations at a low cost.

  • CHIMP: Crowdsourcing Human Inputs for Mobile Phones
    Authors: Mario Almeida, Muhammad Bilal, Matteo Varvello, Alessandro Finamore, Ilias Leontiadis, Yan Grunenberger and Jeremy Blackburn

    Keywords: mobile, system, crowdsourcing, android

    While developing mobile apps is becoming easier, testing and characterizing their behavior is still hard. On the one hand, the de facto testing tool, called “”Monkey,”” scales well due to being based on random inputs, but fails to gather inputs useful in understanding things like user engagement and attention. On the other hand, gathering inputs and data from real users requires distributing instrumented apps, or even phones with pre-installed apps, an expensive and inherently unscalable task.
    To address these limitations we present CHIMP, a system that integrates automated tools and large-scale crowdsourced inputs. CHIMP is different from previous approaches in that it runs apps in a virtualized mobile environment that thousands of users all over the world can access via a standard Web browser. CHIMP is thus able to gather the full range of real-user inputs, detailed run-time traces of apps, and network traffic.
    We describe CHIMP’s design and demonstrate the efficiency of our approach by testing thousands of apps via thousands of crowdsourced users. We calibrate CHIMP with a large-scale campaign to understand how users approach app testing tasks. Finally, we show how CHIMP can be used to improve both traditional app testing tasks, as well as more novel tasks such as building a traffic classifier on encrypted network flows.

  • Crowd-based Multi-Predicate Screening of Papers in Literature Reviews
    Authors: Evgeny Krivosheev, Fabio Casati and Boualem Benatallah

    Keywords: crowdsourcing, classification, systematic reviews

    Systematic literature reviews (SLRs) are one of the most common and useful form of scientific research and publication. Tens of thousands of SLRs are published each year, and this rate is growing across all fields of science. Performing an accurate, complete and unbiased SLR is however a difficult and expensive endeavor. This is true in general for all phases of a literature review, and in particular for the paper screening phase, where authors lter a set of potentially in-scope papers based on a number of exclusion criteria. To address the problem, in recent years the research community has began to explore the use of the crowd to allow for a faster, accurate, cheaper and unbiased screening of papers. Initial results show that crowdsourcing can be effective, even for relatively complex reviews.
    In this paper we derive and analyze a set of strategies for crowd-based screening, and show that an adaptive strategy, that continuously re-assesses the statistical properties of the problem to minimize the number of votes needed to take decisions for each paper, significantly outperforms a number of non-adaptive approaches in terms of cost and accuracy. We validate both applicability and results of the approach through a set of crowdsourcing experiments, and discuss properties of the problem and algorithms that we believe to be generally of interest for classification problems where items are classified via a series of successive tests (as it often happens in medicine).

  • The Size Conundrum: Why Online Knowledge Markets Can Fail at Scale
    Authors: Himel Dev, Chase Geigle, Qingtao Hu, Jiahui Zheng and Hari Sundaram

    Keywords: knowledge market, community question answering, content generation, Cobb-Douglas production function, diseconomies of scale

    In this paper, we interpret the community question answering websites on the StackExchange platform as knowledge markets, and analyze how and why these markets can fail at scale. A knowledge market framing allows site operators to reason about market failures, and to design policies to prevent them. Our goal is to provide insights on large-scale knowledge market failures through an interpretable model. We explore a set of interpretable economic production models on a large empirical dataset to analyze the dynamics of content generation in knowledge markets. Amongst these, the Cobb-Douglas model best explains empirical data and provides an intuitive explanation for content generation through concepts of elasticity and diminishing returns. Content generation depends on user participation and also on how specific types of content (e.g. answers) depends on other types (e.g. questions). We show that these factors of content generation have constant elasticity—a percentage increase in any of the inputs leads to a constant percentage increase in the output. Furthermore, markets exhibit diminishing returns—the marginal output decreases as the input is incrementally increased. Knowledge markets also vary on their returns to scale—the increase in output resulting from a proportionate increase in all inputs. Importantly, many knowledge markets exhibit diseconomies of scale—measures of market health (e.g., the percentage of questions with an accepted answer) decrease as a function of number of participants. The implications of our work are two-fold: site operators ought to design incentives as a function of system size (number of participants); the market lens should shed insight into complex dependencies amongst different content types and participant actions in general social networks.