Security and Privacy on the Web

List of accepted papers:

  • ProxyTorrent: Untangling the Free HTTP(S) Proxy Ecosystem
    Authors: Diego Perino, Matteo Varvello and Claudio Soriente

    Keywords: free proxies, measurements, web security

    Free web proxies promise anonymity and censorship circumvention at no cost. Several websites publish lists with thousands of free proxies organized by country, anonymity level, and performance. However, these lists, populated via automated tools and crowd-sourcing, contain many unreachable, unreliable, and sometimes even malicious proxies. It is fair to say that little is known about the free proxy ecosystem. In this paper we shed light on this ecosystem via ProxyTorrent, a distributed measurement system that leverages both active and passive measurements. Active measurements discover free proxies, assess their performance, and detect potential malicious activities. Passive measurements run at user premises (via a Chrome plugin) and collect statistics about proxy usage and performance in the wild. ProxyTorrent has been running since January 2017, monitoring up to 160,000 free proxies per day and totalling more than 1,500 users. Our dataset shows that only half of the announced proxies have decent performance and that roughly 7% exhibit malicious behavior. Interestingly, malicious proxies are the ones that show the best performance, supporting the common belief that free proxies are “free for a reason”. Also, we observe that users do not have strong anonymity preferences and select proxies in countries where they are most available.

  • An Automated Approach to Auditing Disclosure of Third-Party Data Collection in Website Privacy Policies
    Authors: Timothy Libert

    Keywords: Web Privacy, Web Security, Internet Policy, Internet Regulation

    A dominant regulatory model for web privacy is “notice and choice”. In this model, users are notified of data collection and provided with options to control it. To examine the efficacy of this approach, this study presents the first large-scale audit of disclosure of third-party data collection in website privacy policies. Data flows on one million websites are analyzed, and over 200,000 websites’ privacy policies are audited to determine if users are notified of the names of the companies that collect their data. Policies from 25 prominent third-party data collectors are also examined to provide deeper insights into the totality of the policy environment. Policies are additionally audited to determine if the choice expressed by the “Do Not Track” browser setting is respected.
    It is found that third-party data collection is widespread, but only 15% of attributed data flows are disclosed. The third parties most likely to be disclosed are those with consumer services users may be aware of; those without consumer services are less likely to be mentioned. Policies are difficult to understand, and the average time required to read both a given site’s policy and the associated third-party policies exceeds 85 minutes. Only 7% of first-party site policies mention the “Do Not Track” signal, and the majority of such mentions specify that the signal is ignored. Among the third-party policies examined, none offer unqualified support for the “Do Not Track” signal. Findings indicate that current implementations of “notice and choice” fail to provide notice or respect choice.

  • Your Secrets Are Safe: How Browsers’ Explanations Impact Misconceptions About Private Browsing Mode
    Authors: Yuxi Wu, Panya Gupta, Miranda Wei, Yasemin Acar, Sascha Fahl and Blase Ur

    Keywords: Private browsing, Web browser privacy, Usable privacy, User study, Incognito mode, Private browsing mode

    All major web browsers include a private browsing mode that does not store browsing history, cookies, or temporary files across browsing sessions. Unfortunately, users have misconceptions about what this mode does. Many factors likely contribute to these misconceptions. In this paper, we focus on browsers’ disclosures, or their in-browser explanations of private browsing mode. In a 460-participant online study, each participant saw one of 13 different disclosures (the desktop and mobile disclosures of six popular browsers, plus a control). Based on the disclosure they saw, participants answered questions about what would happen in twenty browsing scenarios capturing previously documented misconceptions. We found that browsers’ disclosures fail to correct the majority of the misconceptions we tested. These misconceptions included beliefs that private browsing mode would prevent geolocation, advertisements, viruses, and tracking by both the websites visited and the network provider. Furthermore, participants who saw certain disclosures were more likely to have misconceptions about private browsing’s impact on targeted advertising, the persistence of lists of downloaded files and bookmarks, and tracking by ISPs, employers, and governments.

  • Betrayed by Your Dashboard: Discovering Malicious Campaigns via Web Analytics
    Authors: Oleksii Starov, Yuchen Zhou, Xiao Zhang, Najmeh Miramirkhani and Nick Nikiforakis

    Keywords: web analytics, malware, domains, attribution, phishing

    To better understand the demographics of their visitors and their paths through their websites, the vast majority of modern website owners make use of third-party analytics platforms, such as Google Analytics and ClickTale. Given that all the clients of a third-party analytics platform report to the same server, the tracking requests need to contain identifiers that allow the analytics server to differentiate between its clients. In this paper, we analyze the analytics identifiers utilized by twenty different third-party analytics platforms and show that these identifiers allow for the clustering of seemingly unrelated websites as part of a common third-party analytics account (i.e., websites whose analytics are managed by a single person or team). We focus our attention on malicious websites that also utilize third-party web analytics and show that threat analysts can utilize web analytics both to discover previously unknown malicious pages in a threat-agnostic fashion and to cluster malicious websites into campaigns. We build a system for automatically identifying, isolating, and querying analytics identifiers from malicious pages and use it to discover more than 14K live domains that use analytics associated with malicious pages. We show how our system can be used to improve the coverage of existing blacklists, discover previously unknown phishing campaigns, identify malicious binaries and Android apps, and even aid in the attribution of malicious domains with protected WHOIS information.

  • Large-Scale Analysis of Style Injection by Relative Path Overwrite
    Authors: Sajjad Arshad, Seyed Ali Mirheidari, Tobias Lauinger, Bruno Crispo, Engin Kirda and William Robertson

    Keywords: Relative Path Overwrite, Scriptless Attack, Style Injection

    Relative Path Overwrite (RPO) is a recent technique to inject style directives into websites even when no style sink or markup injection vulnerability is present. It exploits differences in how browsers and web servers interpret relative paths (i.e., path confusion) to make an HTML page reference itself as a stylesheet; a simple text injection vulnerability, along with browsers’ leniency in parsing CSS resources, gives an attacker the ability to inject style directives that will be interpreted by the browser. Even though style injection may appear a less serious threat than script injection, it has been shown to enable a range of attacks, including secret exfiltration.
    In this paper, we present the first large-scale study of the Web to measure the prevalence and significance of style injection using RPO. Our work shows that around 9% of the websites in the Alexa Top 10,000 contain at least one vulnerable page, out of which more than one third can be exploited. We analyze in detail various impediments to successful exploitation, and make recommendations for remediation. In contrast to script injection, relatively simple countermeasures exist to mitigate style injection. However, there appears to be little awareness of this attack vector, as evidenced by a range of popular Content Management Systems (CMSes) that we found to be exploitable.

  • Uncovering HTTP Header Inconsistencies and the Impact on Desktop/Mobile Websites
    Authors: Abner Mendoza, Phakpoom Chinprutthiwong and Guofei Gu

    Keywords: Mobile Web Security, HTTP Header, Measurement

    The paradigm shift to a mobile-first economy has seen a drastic increase in mobile-optimized websites that in many cases are derived from their desktop counterparts. Mobile website design is often focused on performance optimization rather than security, and mobile sites may be developed by different teams of developers. This has resulted in a number of subtle but critical inconsistencies in the security guarantees provided on the web platform, such as protection mechanisms against common web attacks. In this work, we conduct the first systematic measurement study of inconsistencies between mobile and desktop HTTP security response header configurations in the top 70,000 websites. We show that HTTP security configuration inconsistencies between the mobile and desktop versions of the same website can lead to vulnerabilities. Our study compares data snapshots collected one year apart to garner insights into the longitudinal trends of mobile versus desktop inconsistencies in websites.
    To complement our measurement study, we present a threat analysis that explores possible attack scenarios that can leverage the inconsistencies found on real websites. We systematically analyze the security impact of the inconsistent implementations between the mobile and desktop versions of a website and show how they can lead to real-world exploits. We present several case studies of popular websites to show how these inconsistencies can be leveraged to compromise the security and privacy of web users. Our results show little to no improvement across our datasets, which highlights the continued pervasiveness of subtle inconsistencies affecting even some high-profile websites.

  • Panning for gold.com: Understanding the dynamics of domain dropcatching
    Authors: Najmeh Miramirkhani, Timothy Barron, Michael Ferdman and Nick Nikiforakis

    Keywords: expired domains, deleted domains, network security, domain registration, DNS

    An event that is rarely considered by technical users and laymen alike is that of a domain name expiration. The massive growth in the registration of domain names is followed by daily, equally massive expirations, where domains are allowed to expire and are made available again for registration. While the vast majority of expiring domains are of no value, among the hundreds of thousands of daily expirations there exist domains that are clearly valuable, either because of their lexical composition or because of their residual trust.
    In this paper, we investigate the dynamics of domain dropcatching, where companies, on behalf of users, compete to register the most desirable domains as soon as they are made available and then auction them off to the highest bidder. Using a data-driven approach, we monitor the expiration of 28 million domains over a period of ten months, collecting domain features and WHOIS records, and crawling the registered domains on a regular basis to uncover the purpose for which they were re-registered. Among other findings, we observe that, on average, only 10% of the expired (dropped) domains are re-registered (caught), with the vast majority of the re-registrations happening on the day of their expiration. We investigate the reasons that make some domains more likely to be registered than others and discover that a domain that was malicious at the time of its expiration is twice as likely to be registered as the average domain. Moreover, previously malicious domains are significantly more likely to be reused for malicious purposes than previously benign domains. We identify three types of users who are interested in purchasing dropped domains, ranging from freelancers who purchase one or two domains to professionals who invested more than $660K purchasing expired domains in only three months. Finally, content-wise, we observe that less than 11% were used to host web content, with the remainder used for either speculative or malicious purposes.

  • Incognito: A Method for Obfuscating Web Data
    Authors: Rahat Masood, Dinusha Vatsalan, Muhammad Ikram and Dali Kaafar

    Keywords: Web Privacy, Privacy Risk Evaluation, Probabilistic Model, Semantic Similarity, Adversary Attack

    Users leave a trail of their personal data, interests, and intents while surfing or sharing information on the Web. Web data could therefore reveal private or sensitive information about users through inference analysis. An inference attack may identify information corresponding to a single individual even if the user identifiers are encoded or removed from the Web data. Several works have aimed to improve the privacy of Web data through obfuscation methods. However, these methods are not comprehensive, not generic enough to be applicable to any Web data, and not effective against adversarial attacks. To this end, we propose a privacy-aware obfuscation method for Web data that addresses these drawbacks of existing methods. We use probabilistic methods to predict the privacy risk of Web data, incorporating all key privacy aspects: uniqueness, uniformity, and linkability. Web data with a high predicted risk are then obfuscated by our method using semantically similar data to minimize the privacy risk. Using differential privacy-based noise addition, our method is resistant against an adversary who has knowledge about the datasets and the model-learned risk probabilities. An experimental study conducted on two real Web datasets validates the significance and efficacy of our method. Our results indicate that the average privacy risk reaches 100% with a minimum of 10 sensitive Web entries, while our obfuscation method can attain a privacy risk as low as 0% at the cost of an average utility loss of 64.3%.

  • Platform Criminalism – The ‘Last-Mile’ Geography of the Darknet Market Supply Chain
    Authors: Martin Dittus, Joss Wright and Mark Graham

    Keywords: Darknet markets, Cryptomarkets, Platforms, Online crime, Economic geography, Information geography

    Does the recent growth of darknet markets signify a slow reorganisation of the illicit drug trade? Where are darknet markets situated in the global drug supply chain? In principle, darknet markets allow producers to sell directly to end users, bypassing traditional trafficking routes. And yet, there is evidence that many offerings originate from a small number of highly active consumer countries, rather than from countries that are primarily known for drug production. In a large-scale empirical study, we determine the darknet trading geography of three plant-based drugs across four of the largest darknet markets, and compare it to the global footprint of production and consumption for these drugs. We present strong evidence that cannabis and cocaine vendors are primarily located in a small number of consumer countries, rather than producer countries, suggesting that darknet trading happens at the ‘last mile’, possibly leaving old trafficking routes intact. A model to explain trading volumes of opiates is inconclusive. We cannot find evidence for significant production-side offerings across any of the drug types or marketplaces. Our evidence further suggests that the geography of darknet market trades is primarily driven by existing consumer demand, rather than new demand fostered by individual markets.

  • Tagvisor: A Privacy Advisor for Sharing Hashtags
    Authors: Yang Zhang, Mathias Humbert, Tahleen Rahman, Cheng-Te Li, Jun Pang and Michael Backes

    Keywords: location privacy, hashtag, social network

    Hashtags have emerged as a widely used feature of popular culture and campaigns, but their implications for people’s privacy have not been investigated so far. In this paper, we present the first systematic analysis of privacy issues induced by hashtags. We concentrate in particular on location, which is recognized as one of the key privacy concerns in the Internet era. By relying on a random forest model, we show that we can infer a user’s precise location from hashtags with an accuracy of 70% to 76%, depending on the city. To remedy this situation, we introduce a system called Tagvisor that systematically suggests alternative hashtags if the user-selected ones constitute a threat to location privacy. Tagvisor realizes this by means of three conceptually different obfuscation techniques and a semantics-based metric for measuring the consequent utility loss. Our findings show that obfuscating as few as two hashtags already provides a near-optimal trade-off between privacy and utility in our dataset. In particular, this renders Tagvisor highly time-efficient and thus practical in real-world settings.

  • AdBudgetKiller: Online Advertising Budget Draining Attack
    Authors: I Luk Kim, Weihang Wang, Yonghwi Kwon, Yunhui Zheng, Yousra Aafer, Weijie Meng and Xiangyu Zhang

    Keywords: Online Advertising, Ad Fraud, Budget Draining Attack

    In this paper, we present a new ad budget draining attack. By repeatedly pulling ads from targeted advertisers using crafted browsing profiles, we are able to reduce the chance of showing their ads to real human visitors and to drain their ad budget. From the advertiser profiles collected by an automated crawler, we infer advertising strategies, train browsing profiles that satisfy those strategies, and launch large-scale attacks. We evaluate our methods on 291 public advertisers selected from the Alexa Top 500, where we successfully reveal the targeting strategies used by 87% of the advertisers we considered. We also executed a series of attacks against a controlled advertiser and 3 real-world advertisers within ethical and legal boundaries. The results show that we are able to fetch 40,958 ads and drain up to $155.89 from the targeted advertisers within an hour.

  • Hiding in the crowd: an analysis of the effectiveness of browser fingerprinting at large scale
    Authors: Alejandro Gómez-Boix, Pierre Laperdrix and Benoit Baudry

    Keywords: browser fingerprinting, privacy, software diversity

    Browser fingerprinting is a stateless technique that consists of collecting a wide range of data about a device through browser APIs. Past studies have demonstrated that modern devices present so much diversity that fingerprints can be exploited to identify and track users online. With this work, we evaluate whether browser fingerprinting is still effective at uniquely identifying a large group of users when analyzing millions of fingerprints collected over a few months.
    We collected 2,067,942 browser fingerprints from one of the top 15 French websites. The analysis of this novel dataset sheds new light on the ever-growing browser fingerprinting domain. The key insight is that the percentage of unique fingerprints in our dataset is much lower than what was reported in the past: only 33.6% of fingerprints are unique, as opposed to over 80% in previous studies. We show that this observation is robust: even if fingerprints change, the degree of uniqueness does not increase. We also confirm that the current evolution of web technology is significantly benefiting users’ privacy, as the removal of plugins substantially reduces the rate of unique desktop machines.

  • Exposing Search and Advertisement Abuse Tactics and Infrastructure of Technical Support Scammers
    Authors: Bharat Srinivasan, Athanasios Kountouras, Najmeh Miramirkhani, Monjur Alam, Nick Nikiforakis, Manos Antonakakis and Mustaque Ahamad

    Keywords: cross-channel abuse, social engineering, measurement

    Technical Support Scams (TSS), which combine online abuse with social engineering over the phone channel, have persisted despite several law enforcement actions. Although recent research has provided important insights into TSS, these scams have now evolved to exploit ubiquitously used online services such as search and sponsored advertisements served in response to search queries. We use a data-driven approach to understand search-and-ad abuse by TSS and to gain visibility into the online infrastructure that facilitates it. By carefully formulating tech support queries and issuing them to multiple search engines, we collect data about both the support infrastructure and the websites to which TSS victims are directed when they search online for tech support resources. We augment this with a DNS-based amplification technique to further enhance visibility into this abuse infrastructure. By analyzing the collected data, we provide new insights into search-and-ad abuse by TSS and reinforce some of the findings of earlier research. Further, we demonstrate that tech support scammers (1) are successful in getting major as well as custom search engines to return links to websites controlled by them, and (2) are able to get ad networks to serve malicious advertisements that lead to scam pages. Our study period of approximately eight months uncovered over 9,000 TSS domains, of both passive and aggressive types, with minimal overlap between the sets that are reached via organic search results and via sponsored ads. Also, we found over 2,400 support domains that aid the TSS domains in manipulating organic search results. Moreover, to our surprise, we found very little overlap with domains that are reached via abuse of domain parking and URL-shortening services, which were investigated previously. Thus, investigation of search-and-ad abuse provides new insights into TSS tactics and helps detect previously unknown abuse infrastructure that facilitates these scams.

  • Mind Your Credit: Assessing the Health of the Ripple Credit Network
    Authors: Pedro Moreno-Sanchez, Navin Modi, Raghuvir Songhela, Aniket Kate and Sonia Fahmy

    Keywords: Ripple credit network, IOweYou (IOU), credit devilry, rippling, faulty gateways, stale exchange offers

    The Ripple credit network has emerged as a payment backbone with key advantages for financial institutions and the remittance industry. Its path-based IOweYou (IOU) settlements across different currencies conceptually distinguish the Ripple blockchain from cryptocurrencies (such as Bitcoin and altcoins) and make it highly suitable for an orthogonal yet vast set of applications in the remittance world for cross-border transactions and beyond.
    This work studies the structure and evolution of the Ripple network since its inception, and investigates its vulnerability to attacks that harm the IOU credit of its wallets. We find that about 13M USD are at risk in the current Ripple network due to inappropriate configuration of the rippling flag on credit links, facilitating undesired redistribution of credit across those links. Although the Ripple network has grown around a few highly connected hub (gateway) wallets that constitute the core of the network and provide high liquidity to users, such a credit link distribution results in a user base of around 112,000 wallets that can be financially isolated by as few as 10 highly connected gateway wallets. Indeed, today about 4.9M USD cannot be withdrawn by their owners from the Ripple network due to PayRoutes, a gateway tagged as faulty by the Ripple community. Finally, we observe that stale exchange offers pose a real problem, and exchanges (market makers) have not always been vigilant about periodically updating their exchange offers according to current real-world exchange rates. For example, stale offers were used by 84 Ripple wallets to gain more than 4.5M USD from mid-July to mid-August 2017. Our findings should prompt the Ripple community to improve the health of the network by educating its users on increasing their connectivity, and by appropriately maintaining the credit limits, rippling flags, and exchange offers on their IOU credit links.

  • I’m Listening to your Location! Inferring User Location with Acoustic Side Channel
    Authors: Youngbae Jeon, Minchul Kim, Hyunsoo Kim, Hyoungshick Kim, Jun Ho Huh and Ji Won Yoon

    Keywords: Electrical network frequency, Location tracking, Side channel analysis

    Electrical network frequency (ENF) signals have common patterns that can be used as signatures for identifying the recording time and location of videos and sound. To enable cost-efficient, reliable, and scalable location inference, we created a reference map of ENF signals representing hundreds of locations worldwide by extracting real-world ENF signals from online multimedia streaming services (e.g., YouTube and Explore). Based on this reference map of ENF signals, we propose a novel side-channel attack that can identify the physical location from which a target video or sound was recorded or streamed. Our attack requires neither an expensive ENF signal receiver nor any software installed on a victim’s device: all we need to perform the attack is the recorded video or sound files, which can be collected from the World Wide Web. The evaluation results show that our attack can infer the intra-grid location of the recorded audio files with an accuracy of 76% when those files are 5 minutes or longer. We also show that our proposed attack works well even when video and audio data are processed within a certain distortion range with audio codecs used in real VoIP applications.

  • SafeKeeper: Protecting Web Passwords using Trusted Execution Environments
    Authors: Klaudia Krawiecka, Arseny Kurnikov, Andrew Paverd, Mohammad Mannan and N. Asokan

    Keywords: Passwords, Trusted Execution Environment, Intel SGX, Breaches, Phishing

    Passwords are by far the most widely used mechanism for authenticating users on the web, outperforming all competing solutions in terms of deployability (e.g., cost, compatibility). However, two critical security concerns are phishing and theft of password databases. These are exacerbated by users’ tendency to reuse passwords across different services. Current solutions typically address only one of the two concerns, and do not protect passwords against rogue servers. Furthermore, they do not provide any verifiable evidence of their (server-side) adoption to users, and they face deployability challenges in terms of ease of use for end users and/or costs for service providers.
    We present SafeKeeper, a novel and comprehensive solution to ensure the secrecy of passwords in web authentication systems. Unlike previous approaches, SafeKeeper protects users’ passwords against very strong adversaries, including external phishers as well as corrupted (rogue) servers. It is relatively inexpensive to deploy as it (i) uses widely available hardware-based trusted execution environments like Intel SGX, (ii) requires only minimal changes for integration into popular web platforms like WordPress, and (iii) imposes negligible performance overhead. We discuss several challenges in designing and implementing such a system, and how we overcome them. Via an 86-participant user study, systematic analysis, and experiments, we show the usability, security, and deployability of SafeKeeper, which is available as open-source.
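The account-level clustering described in “Betrayed by Your Dashboard” can be illustrated with a minimal Python sketch. It assumes a single identifier format, the classic Google Analytics tracking ID of the form UA-<account>-<property>, whereas the paper covers twenty platforms. The regular expression, the helper names extract_accounts and cluster_by_account, and the sample pages are illustrative assumptions, not the authors' system.

```python
import re
from collections import defaultdict

# Assumption: only classic Google Analytics IDs ("UA-<account>-<property>")
# are recognized; other analytics platforms use different identifier formats.
UA_PATTERN = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def extract_accounts(html: str) -> set[str]:
    """Return the analytics account numbers referenced in a page's HTML."""
    return {match.group(1) for match in UA_PATTERN.finditer(html)}


def cluster_by_account(pages: dict[str, str]) -> dict[str, set[str]]:
    """Group domains that embed at least one common analytics account number."""
    clusters: defaultdict[str, set[str]] = defaultdict(set)
    for domain, html in pages.items():
        for account in extract_accounts(html):
            clusters[account].add(domain)
    return dict(clusters)


# Hypothetical crawl results: two sites share account 1111111.
pages = {
    "a.example": "<script>ga('create', 'UA-1111111-1', 'auto');</script>",
    "b.example": "<script>gtag('config', 'UA-1111111-2');</script>",
    "c.example": "<script>ga('create', 'UA-2222222-1', 'auto');</script>",
}
print(cluster_by_account(pages))
```

Here a.example and b.example land in the same cluster because their property IDs differ but the account number is shared, which is the signal that lets seemingly unrelated sites be linked to one operator.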
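The path confusion behind Relative Path Overwrite, as summarized in the style-injection abstract, can be reproduced with Python's urllib.parse.urljoin, which resolves relative references following RFC 3986 in much the same way a browser does. The server side is a hypothetical toy model (toy_server_resource) that treats anything after a “.php” script name as ignored extra path info; real servers vary, and this sketch performs no injection itself.

```python
from urllib.parse import urljoin, urlsplit


def resolve(page_url: str, href: str) -> str:
    """Resolve a relative reference against the page URL (RFC 3986 rules)."""
    return urljoin(page_url, href)


def toy_server_resource(url: str) -> str:
    """Hypothetical server: path segments after a '.php' script name are
    treated as extra path info and ignored, so the script itself is served."""
    path = urlsplit(url).path
    end = path.find(".php")
    return path[: end + len(".php")] if end != -1 else path


def stylesheet_is_page_itself(page_url: str, href: str) -> bool:
    """True if the relative stylesheet request is answered by the page itself."""
    return toy_server_resource(resolve(page_url, href)) == toy_server_resource(page_url)


# Normal visit: "style.css" resolves next to the page.
print(resolve("http://example.com/blog/page.php", "style.css"))
# -> http://example.com/blog/style.css

# Path confusion: a trailing slash makes the browser treat page.php as a directory.
print(resolve("http://example.com/blog/page.php/", "style.css"))
# -> http://example.com/blog/page.php/style.css

print(stylesheet_is_page_itself("http://example.com/blog/page.php/", "style.css"))
# -> True
```

Because the HTML page is then fetched as its own stylesheet, any attacker-controlled text it echoes may be parsed as CSS by a lenient browser, which is the style-injection step the abstract describes.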
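The uniqueness rate reported in the browser fingerprinting abstract (33.6% versus over 80% in earlier studies) is a fraction of fingerprints whose exact value occurs only once in the dataset. A minimal sketch of that computation follows, assuming fingerprints are represented as tuples of attribute values; the attributes and sample data are hypothetical, and the paper's exact definition of uniqueness may differ in detail.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def unique_fraction(fingerprints: Iterable[Hashable]) -> float:
    """Fraction of collected fingerprints whose exact value occurs only once."""
    counts = Counter(fingerprints)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    singletons = sum(1 for count in counts.values() if count == 1)
    return singletons / total


# Hypothetical fingerprints: (browser, screen resolution, plugin list).
collected = [
    ("Firefox", "1920x1080", "Flash"),
    ("Firefox", "1920x1080", "Java"),
    ("Chrome", "1366x768", "Flash"),
    ("Chrome", "1366x768", "Flash"),
]
print(unique_fraction(collected))  # 2 of 4 fingerprints are unique -> 0.5

# Dropping the plugin attribute merges the two Firefox fingerprints.
without_plugins = [fp[:2] for fp in collected]
print(unique_fraction(without_plugins))  # -> 0.0
```

The second call mirrors, in miniature, the abstract's observation that removing plugins lowers the rate of unique desktop machines: fewer distinguishing attributes mean larger anonymity sets.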
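The location inference in the ENF side-channel abstract compares a recording's ENF signal against a reference map of ENF signals. A simple way to illustrate such matching is correlation-based nearest-reference search, sketched below under strong assumptions: the ENF traces are already extracted (the spectral analysis needed for that is not shown), time-aligned, and equal in length. This is a generic technique chosen for illustration, not necessarily the authors' matching method, and the location names and values are hypothetical.

```python
import math
from collections.abc import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def best_location(sample: Sequence[float], references: dict[str, Sequence[float]]) -> str:
    """Return the reference location whose ENF trace correlates best with the sample."""
    return max(references, key=lambda loc: pearson(sample, references[loc]))


# Hypothetical, time-aligned ENF traces in Hz (one value per analysis window).
references = {
    "grid_a": [50.01, 49.99, 50.02, 49.98],
    "grid_b": [49.97, 50.03, 49.99, 50.01],
}
sample = [50.012, 49.992, 50.018, 49.982]
print(best_location(sample, references))  # -> grid_a
```

Correlation rather than raw distance is used here because it compares the shape of the frequency fluctuations and ignores a constant offset between the recording and the reference.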