A Critical Review of Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
Online social data such as user-generated content, expressed or implicit relationships between people, and behavioral traces are at the core of many popular web applications and platforms, and drive the research agenda of many researchers in both academia and industry. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, product, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics.
However, many academics and practitioners are increasingly warning against the naive usage of social data. They highlight that biases and inaccuracies occur at the source of the data and are also introduced along the data processing pipeline; there are also methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences, that are often overlooked. Such oversights can lead to wrong or inappropriate results that can be consequential.
This tutorial recognizes that the rigor with which different researchers address these issues varies widely, and aims to survey and categorize common classes of data biases and pitfalls that can occur both at the sources of social data and along the prototypical data processing pipeline. We conclude the tutorial with a set of open questions and research directions.