Program of track: User Modeling, Interaction and Experience on the Web

List of accepted papers:

  • Prediction of Sparse User-Item Consumption Rates with Zero-Inflated Poisson Regression
    Authors: Moshe Lichman and Padhraic Smyth

    Keywords: consumption rate modeling, repeat consumption, explore-exploit, zero-inflated poisson

    Abstract:
    There are a variety of applications where user behavior consists of a combination of both repeat item consumption and new item consumption, such as listening to music artists, visiting Web sites, purchasing groceries, and so on. In this paper we address the problem of building user models that can predict the rate at which individuals consume both old and new items. We use zero-inflated Poisson (ZIP) regression models as the basis for our modeling approach, leading to a general framework for modeling user-item consumption rates over time. We show that these models are more flexible in capturing user behavior than alternatives such as well-known latent factor and embedding models. We compare the performance of ZIP regression and latent factor and embedding models on three different data sets involving music, restaurant reviews, and social media. The ZIP regression models are systematically more accurate across all three data sets across different prediction metrics.

  • Coevolutionary Recommendation Model: Mutual Learning between Rating and Reviews
    Authors: Yichao Lu, Ruihai Dong and Barry Smyth

    Keywords: Recommender Systems, User Experience, Natural Language Processing

    Abstract:
    Collaborative filtering (CF) is a common recommendation approach that relies on user-item ratings. However, the natural sparsity of user-item rating data can be problematic in many domains and settings, limiting the ability to generate accurate predictions and effective recommendations. Moreover, in some CF approaches latent features are often used to represent users and items, which can lead to a lack of recommendation transparency and explainability. User-generated, customer reviews are now commonplace on many web sites, providing users with an opportunity to convey their experiences and opinions of products and services. As such, these reviews have the potential to serve as a useful source of recommendation data, through capturing valuable sentiment information about particular product features. In this paper, we present a novel deep learning recommendation model, which co-learns user and item information from ratings and customer reviews, by optimizing matrix factorization and an attention-based GRU network. Using real-world datasets we show a significant improvement in recommendation performance, compared to a variety of alternatives. Furthermore, the approach is useful when it comes to assigning intuitive meanings to latent features to improve the transparency and explainability of recommender systems.

  • Aspect-Aware Latent Factor Model: Rating Prediction with Ratings and Reviews
    Authors: Zhiyong Cheng, Ying Ding, Lei Zhu and Mohan Kankanhalli

    Keywords: Aspect-aware, Matrix Factorization, Recommendation, Topic Model

    Abstract:
    Although latent factor models (e.g., matrix factorization) achieve good accuracy in rating prediction, they suffer from problems including cold-start, non-transparency, and suboptimal recommendation for local users or items. In this paper, we exploit textual review information with ratings to tackle these limitations. Firstly, we apply a proposed aspect-aware topic model (ATM) on the review text to model user preferences and item features from different aspects, and estimate the aspect importance of a user towards an item. The aspect importance is then integrated into a novel aspect-aware latent factor model (ALFM), which learns user’s and item’s latent factors based on ratings. In particular, ALFM introduces a weighted matrix to associate those latent factors with the same set of aspects discovered by ATM, such that the latent factors could be used to estimate aspect ratings. Finally, the overall rating is computed via a linear combination of the aspect ratings, which are weighted by the corresponding aspect importance. In this way, our model could alleviate the data sparsity problem and gain good interpretability for recommendation. Besides, an aspect rating is weighted by an aspect importance, which is dependent on the targeted user’s preferences and targeted item’s features. Therefore, it is expected that the proposed method can model a user’s preferences on an item more accurately for each user-item pair locally. Comprehensive experimental studies have been conducted on 19 datasets from Amazon and the Yelp 2017 Challenge dataset. Results show that our method achieves significant improvement compared with strong baseline methods, especially for users with only a few ratings. Moreover, our model could interpret the recommendation results in depth.

  • When Sheep Shop: Measuring Herding Effects in Product Ratings with Natural Experiments
    Authors: Gael Lederrey and Robert West

    Keywords: product reviews, ratings, herding, social influence, natural experiment, observational study

    Abstract:
    As online shopping becomes ever more prevalent, customers rely increasingly on product rating websites for making purchase decisions. The reliability of online ratings, however, is potentially compromised by the so-called herding effect: when rating a product, customers may be biased to follow other customers’ previous ratings of the same product. This is problematic because it skews long-term customer perception through haphazard early ratings. The study of herding poses important methodological challenges. Observational studies are impeded by the lack of counterfactuals: simply correlating early with subsequent ratings is insufficient because we cannot know what the subsequent ratings would have looked like had the first ratings been different. Experimental studies are rarely an option because either they manipulate real customers’ attitudes toward real products, or they examine lab settings that might differ fundamentally from real settings. The methodology introduced here exploits a situation that comes close to an experiment, although it is purely observational – a natural experiment. Our key methodological device consists in studying the same product on two separate rating sites, focusing on products that received a high first rating on one site, and a low first rating on the other. This largely controls for confounds such as a product’s inherent quality, advertising, and producer identity, and lets us isolate the effect of the first rating on subsequent ratings. In a case study, we focus on beers as products and jointly study two beer rating sites, but our method applies to any pair of sites across which products can be matched. We find clear evidence of herding in beer ratings. For instance, if a beer receives a very high first rating, its second rating is on average half a standard deviation higher, compared to a situation where the identical beer receives a very low first rating. Moreover, herding effects tend to last a long time and are noticeable even after 20 or more ratings. Our results have important implications for the design of better rating systems.

  • Anxiety and Information Seeking: Evidence From Large-Scale Mouse Tracking
    Authors: Brit Youngmann and Elad Yom-Tov

    Keywords: Mouse tracking, Relevance, User interaction, Anxiety

    Abstract:
    People seeking information through search engines are assumed to behave similarly, regardless of the topic they are searching for. Here we use mouse tracking, which is correlated with gaze, to show that the information seeking patterns of people differ dramatically depending on their level of anxiety at the time of the search. We investigate the behavior of people during searches for medical symptoms, ranging from benign indications, where users are not usually anxious, to ones which could be harbingers of life-threatening conditions, where extreme anxiety is expected. We show that for the latter, 90% of people never saw more than the top 67% of the screen, compared to over 95% scanned by people seeking information on benign symptoms, even though relevant documents are similarly distributed in the results pages to these queries. Based on this observation, we develop a model which can predict the level of anxiety experienced by a user, using attributes derived from mouse tracking data and other user interactions. The model achieves Kendall’s Tau of 0.48 with the medical severity of the symptoms searched. We show the importance of using information about the users’ level of anxiety as predicted by the model, when measuring search engine performance. Our results prove that ignoring this information can lead to significant over-estimation of performance. Additionally, we show the utility of the model in three special instances: where multiple symptoms are searched concurrently; where the searcher has an underlying medical condition; and when users seek information on ways to commit suicide. In the last of these, our results demonstrate the importance of help-line notices, and emphasize the need to measure the effective number of results seen by the user. Our results indicate that measures of relevance which use anxiety information can lead to more accurate understanding of the quality of search results, especially when delivering potentially life-saving information to users.

  • On the Causal Effect of Badges
    Authors: Tomasz Kusmierczyk and Manuel Gomez Rodriguez

    Keywords: badges, social platform, statistical testing, causality, natural experiment, bootstrap, difference-in-differences, point processes, confounders, counterfactual world

    Abstract:
    A wide variety of online platforms use digital badges to encourage users to take certain types of desirable actions. However, despite their growing popularity, their causal effect on users’ behavior is not well understood. This is partly due to the lack of counterfactual data and the myriad of complex factors that influence users’ behavior over time. As a consequence, their design and deployment lacks general principles. In this paper, we focus on first-time badges, which are awarded after a user takes a particular type of action for the first time, and study their causal effect by harnessing the delayed introduction of several badges in a popular Q&A website. In doing so, we introduce a novel causal inference framework for first-time badges whose main technical innovations are a robust survival-based hypothesis testing procedure, which controls for the heterogeneity in the benefit users obtain from taking an action, and a bootstrap difference-in-differences method, which controls for the random fluctuations in users’ behavior over time. Our results suggest that first-time badges steer users’ behavior if the initial benefit a user obtains from taking the corresponding action is sufficiently low, otherwise, we do not find significant effects. Moreover, for badges that successfully steered user behavior, we perform a counterfactual analysis and show that they significantly improved the functioning of the site at a community level.

  • The Effect of Ad Blocking on User Engagement with the Web
    Authors: Ben Miroglio, David Zeber, Jofish Kaye and Rebecca Weiss

    Keywords: Ad Blocking, Propensity Scoring, Natural Experiment, Web engagement

    Abstract:
    Web users are increasingly turning to ad blockers to avoid ads, which are often perceived as annoying or an invasion of privacy. While there has been significant research into the factors driving ad blocker adoption and the detrimental effect on ad publishers on the Web, the resulting effects of ad blocker usage on Web users’ browsing experience are not well understood. To approach this problem, we conduct a retrospective natural field experiment using Firefox browser usage data, with the goal of estimating the effect of ad blocking on user engagement with the Web. We focus on new users who installed an ad blocker after a baseline observation period, to avoid comparing different populations. Their subsequent browser activity is compared against that of a control group, whose members do not use ad blockers, over a corresponding observation period, controlling for prior baseline usage. In order to estimate causal effects, we employ propensity score matching on a number of other features recorded during the baseline period. In the group that installed an ad blocker, we find significant increases in both active time spent in the browser (+28% over control) and the number of pages viewed (+15% over control), while seeing no change in the number of searches. Additionally, by reapplying the same methodology to other popular Firefox browser extensions, we show that these effects are specific to ad blockers. We conclude that ad blocking has a positive impact on user engagement with the Web, suggesting that any costs of using ad blockers to users’ browsing experience are largely drowned out by the utility that they offer.

  • Learning Causal Effects From Many Randomized Experiments Using Regularized Instrumental Variables
    Authors: Alexander Peysakhovich and Dean Eckles

    Keywords: causal inference, experimentation, instrumental variables, machine learning

    Abstract:
    Scientific and business practices are increasingly resulting in large collections of randomized experiments. Analyzed together, these collections can tell us things that individual experiments in the collection cannot. We study how to learn causal relationships between variables from the kinds of collections faced by modern data scientists: the number of experiments is large, many experiments have very small effects, and the analyst lacks metadata (e.g., descriptions of the interventions). Here we use experimental groups as instrumental variables (IV) and show that a standard method (two-stage least squares) is biased even when the number of experiments is infinite. We show how a sparsity-inducing l_0 regularization can — in a reversal of the standard bias–variance tradeoff in regularization — reduce bias (and thus error) of interventional predictions. Because we are interested in interventional loss minimization we also propose a modified cross-validation procedure (IVCV) to feasibly select the regularization parameter. We show, using a trick from Monte Carlo sampling, that IVCV can be done using summary statistics instead of raw data. This makes our full procedure simple to use in many real-world applications.

  • Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification
    Authors: Chuxu Zhang, Chao Huang, Lu Yu, Xiangliang Zhang and Nitesh Chawla

    Keywords: Author Identification, Heterogeneous Networks, Representation Learning, Metric Learning, Deep Learning

    Abstract:
    In this paper, we study the problem of author identification in big scholarly data, which is to effectively rank potential authors for each anonymous paper by using historical data. Most of the existing de-anonymization approaches predict the relevance score of a paper-author pair via feature engineering, which is not only time and storage consuming, but also introduces irrelevant and redundant features or misses important attributes. Representation learning can automate the feature generation process by learning nodes’ embeddings in an academic network to infer the correlation of a paper-author pair. However, the learned embeddings are often for general purpose (independent of the specific task), or based on network structure only (without considering the node content). To address these issues and make further progress in solving the author identification problem, we propose a content-aware and meta-path augmented metric learning model. Specifically, first, the directly correlated paper-author pairs are modeled based on distance metric learning by introducing a push loss function. Next, the paper’s content embedding encoded by the gated recurrent neural network is integrated into the distance loss. Moreover, the historical bibliographic data of papers is utilized to construct an academic heterogeneous network, wherein a meta-path guided walk integrative learning module based on the task-dependent and content-aware Skipgram model is designed to formulate the correlations between each paper and its indirect author neighbors, and further augments the model. The results of extensive evaluations and analytical experiments on the well-known AMiner dataset demonstrate that the proposed model achieves better performance compared to the state-of-the-art baselines. It achieves an average improvement of 8.3% over the best baseline method.

  • Bayesian Models for Product Size Recommendations
    Authors: Vivek Sembium, Rajeev Rastogi, Lavanya Sita Tekumalla and Atul Saroop

    Keywords: Personalization, Size Recommendation, Bayesian, Polya-Gamma, Probit, Variational Inference

    Abstract:
    Lack of calibrated product sizing in popular categories such as apparel and shoes leads to customers purchasing incorrect sizes, which in turn results in high return rates due to fit issues. We address the problem of product size recommendations based on customer purchase and return data. We propose a novel approach based on Bayesian logit and probit regression models with ordinal categories {Small, Fit, Large} to model size fits as a function of the difference between latent sizes of customers and products. We propose posterior computation based on mean-field variational inference, leveraging the Polya-Gamma augmentation for the logit prior, that results in simple updates, enabling our technique to efficiently handle large datasets. Our experiments with real-life shoe datasets show that our model outperforms the state of the art in 5 of 6 datasets and leads to an improvement of 17-26% in AUC over baselines when predicting size fit outcomes.

  • Robust Factorization Machines for User Response Prediction
    Authors: Surabhi Punjabi and Priyanka Bhatt

    Keywords: Factorization Machines, Field Aware Factorization Machines, Robust Optimization, Computational Advertising, Response Prediction, Interval Uncertainty

    Abstract:
    Factorization machines (FMs) are a state-of-the-art model class for user response prediction in the computational advertising domain. Rapid growth of internet and mobile device usage has given rise to multiple customer touchpoints. This, coupled with factors like high cookie churn rate, results in a fragmented view of user activity at the advertiser’s end. Current literature assumes procured user signals as the absolute truth, which is contested by the absence of deterministic identity linkage across a user’s multiple avatars. This is the first work advocating the application of Robust Optimization (RO) principles to design approaches that account for these data uncertainties and are immune to perturbations. We propose two novel algorithms: robust factorization machine (RFM) and its field aware variant (RFFM), under interval uncertainty. These formulations are generic and can find applicability in any classification setting under noise. We provide a distributed and scalable Spark implementation using parallel stochastic gradient descent. In the experiments conducted on three real-world datasets, the robust counterparts outperform the baselines significantly under perturbed settings. Our experimental findings reveal interesting connections between choice of uncertainty set and the noise-proofness of resulting models.

  • Aesthetic-based Clothing Recommendation
    Authors: Wenhui Yu, Huidi Zhang, Xiangnan He, Xu Chen, Li Xiong and Zheng Qin

    Keywords: Clothing recommendation, side information, aesthetic features, tensor factorization, dynamic collaborative filtering

    Abstract:
    Recently, product images have gained increasing attention in clothing recommendation since the visual appearance of items has a significant impact on consumers’ decisions. Existing models usually extract conventional features, such as convolutional neural network (CNN) features, scale-invariant feature transform (SIFT) features, and color histograms, to represent item image characteristics and capture user visual preferences. However, one important feature, the aesthetic feature, is typically ignored. It is vital in recommendation since users’ decisions depend largely on whether the clothing is in line with their aesthetic, while the conventional image features cannot portray this directly. To bridge this gap, we propose to introduce aesthetic information, which is more closely related to users’ preferences, into the field of clothing recommender systems. To do so, we first present the aesthetic features extracted by a pre-trained neural network, which is a brain-inspired deep structure trained for the aesthetic assessment task. Considering that aesthetic preference varies across people and over time, we propose a novel tensor factorization model as a basic model and then incorporate the aesthetic features into it. Finally, extensive experiments on real-world datasets demonstrate that our approach can capture the aesthetic preference of consumers and outperform several state-of-the-art models significantly.

  • How to Impute Missing Ratings?: Claims, Solution, and Its Application to Collaborative Filtering
    Authors: Youngnam Lee, Sang-Wook Kim, Sunju Park and Xing Xie

    Keywords: Recommender Systems, Collaborative Filtering, Data Sparsity, Data Imputation

    Abstract:
    Data sparsity is one of the biggest problems faced by collaborative filtering used in recommender systems. Data imputation alleviates the data sparsity problem by inferring missing ratings and imputing them to the original rating matrix. In this paper, we identify the limitations of existing data imputation approaches and suggest three new claims that all data imputation approaches should follow to achieve high recommendation accuracy. Furthermore, we propose a deep-learning based approach to compute imputed values that satisfies all three claims. Based on our hypothesis that most pre-use preferences (e.g., impressions) on items lead to their post-use preferences (e.g., ratings), our approach tries to understand via deep learning how pre-use preferences lead to post-use preferences differently depending on the characteristics of users and items. Through extensive experiments on real-world datasets, we verify our three claims and hypothesis, and also demonstrate that our approach significantly outperforms existing state-of-the-art approaches.

  • AdaError: An Adaptive Learning Rate Method for Matrix Approximation-based Collaborative Filtering
    Authors: Dongsheng Li, Chao Chen, Qin Lv, Hansu Gu, Tun Lu, Li Shang, Ning Gu and Stephen Chu

    Keywords: recommender systems, collaborative filtering, matrix approximation

    Abstract:
    In matrix approximation (MA)-based collaborative filtering (CF) algorithms, gradient-based learning methods, e.g., stochastic gradient descent (SGD), are widely adopted to learn MA models based on observed user-item ratings. However, one of the common issues in existing gradient-based learning methods is how to determine proper learning rates, because the model convergence will be inaccurate or very slow if the learning rate is too large or too small, respectively. This paper proposes AdaError, an adaptive learning rate method for matrix approximation-based collaborative filtering. AdaError can reduce the learning rates for noisy ratings to prevent the learned models from overreacting to the noise. Meanwhile, AdaError can adaptively shrink the learning rates to eliminate the need to manually tune the learning rates. Our theoretical and empirical analysis shows that the generalization performance of learned MA models can be improved using AdaError. Experimental studies on MovieLens and Netflix datasets demonstrate that the proposed method can outperform state-of-the-art adaptive learning rate methods in matrix approximation-based collaborative filtering. Meanwhile, by applying the proposed AdaError method to a standard matrix approximation method, we can achieve statistically significant improvements in both rating prediction accuracy and top-N recommendation accuracy compared with state-of-the-art collaborative filtering methods.

  • Latent Relational Metric Learning via Memory-based Attention for Collaborative Ranking
    Authors: Yi Tay, Anh Tuan Luu and Siu Cheung Hui

    Keywords: Collaborative Filtering, Recommender Systems, Neural Networks, Deep Learning, Attention Mechanism

    Abstract:
    This paper proposes a new neural architecture for collaborative ranking with implicit feedback. Our model, LRML (Latent Relational Metric Learning), is a novel extension of metric learning approaches for recommendation. More specifically, instead of simple push-pull mechanisms between user and item pairs, we propose to learn latent relations for each user-item interaction. This helps to alleviate the potential geometric inflexibility of existing metric learning approaches. This not only enables better performance but also a greater extent of modeling capability, allowing our model to scale to a larger number of interactions. In order to do so, we employ an augmented memory module and learn to attend over these memory blocks to construct latent relations. The attention module is controlled by the user-item interaction, making the learned relation vector specific to each user-item pair. Hence, this can be interpreted as learning an exclusive and optimal relational translation for each user-item interaction. The proposed architecture demonstrates state-of-the-art performance across multiple recommendation benchmarks. LRML outperforms other metric learning models by 6%-7.5% in terms of Hits@10 and nDCG@10 on large datasets such as Netflix and MovieLens20M. Moreover, qualitative studies also demonstrate evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being only trained on implicit feedback. As such, this ascertains the ability of LRML to uncover hidden relational structure within implicit datasets.

  • Variational Autoencoders for Collaborative Filtering
    Authors: Dawen Liang, Rahul Krishnan, Matthew Hoffman and Tony Jebara

    Keywords: Recommender systems, collaborative filtering, implicit feedback, variational autoencoder, Bayesian models

    Abstract:
    We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference to learn this powerful generative model. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
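
    As a reading aid for the abstract above, the following is a minimal sketch of how a multinomial likelihood and an annealed regularization weight might combine into a single objective. It is not the authors' implementation. It assumes that the regularization parameter weights a KL term between a diagonal Gaussian posterior and a standard normal prior, and that annealing means a linear ramp of that weight. All function and parameter names (`multinomial_log_likelihood`, `gaussian_kl`, `annealed_beta`, `beta_cap`, `anneal_steps`) are hypothetical.

```python
import numpy as np


def multinomial_log_likelihood(x, logits):
    """Log-likelihood of one user's item-count vector x under softmax(logits).

    The multinomial coefficient is dropped because it does not depend on
    the model parameters.
    """
    shifted = logits - np.max(logits)  # stabilize the log-softmax
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(np.sum(x * log_probs))


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), an assumed choice of prior."""
    return float(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))


def annealed_beta(step, anneal_steps, beta_cap):
    """Assumed schedule: ramp the weight linearly from 0 up to beta_cap."""
    return min(beta_cap, beta_cap * step / anneal_steps)


def regularized_objective(x, logits, mu, logvar, beta):
    """Reconstruction term minus the beta-weighted KL regularizer (to maximize)."""
    return multinomial_log_likelihood(x, logits) - beta * gaussian_kl(mu, logvar)


if __name__ == "__main__":
    x = np.array([1.0, 0.0, 1.0, 0.0])   # toy implicit-feedback vector
    logits = np.zeros(4)                  # toy decoder output
    mu, logvar = np.array([0.5, -0.5]), np.zeros(2)
    for step in (0, 50, 200):
        beta = annealed_beta(step, anneal_steps=100, beta_cap=0.2)
        print(step, beta, regularized_objective(x, logits, mu, logvar, beta))
```

    With the weight at zero the objective reduces to the reconstruction term alone; as the weight grows, the latent code is pulled toward the prior.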