WebSci '21: 13th ACM Web Science Conference 2021

SESSION: Keynotes

In conversation with Martha Lane Fox and Wendy Hall on the Future of the Internet

Global Digital Infrastructure in a Post-Pandemic World

The Post-API Age Reconsidered: Web Science in the ’20s and Beyond

Digital Data and a Multilevel Perspective of Institutions on the Web

SESSION: Panels

The Future of Web and Society

The Coded Gaze: algorithmic bias, facial recognition and beyond: How research can change the law and influence people

COVID-19 and Society

AI, Media and the Future of News on the Web

Directions in Digital Government

SESSION: Does the system work? Evaluating Tools and Functions

The influence of search engine optimization on Google's results: A multi-dimensional approach for detecting SEO

Search engine optimization (SEO) can significantly influence what is shown on the result pages of commercial search engines. However, it is unclear what proportion of (top) results have actually been optimized. We developed a tool that uses a semi-automatic approach to detect, based on a given URL, whether SEO measures were taken. In this multi-dimensional approach, we analyze the HTML code from which we extract information on SEO and analytics tools. Further, we extract SEO indicators on the page level and the website level (e.g., page descriptions and loading time of a website). We amend this approach by using lists of manually classified websites and use machine learning methods to improve the classifier. An analysis based on three datasets with a total of 1,914 queries and 256,853 results shows that a large fraction of pages found in Google is at least probably optimized, which is in line with statements from SEO experts saying that it is tough to gain visibility in search engines without applying SEO techniques.

Measuring Digital Literacy with Eye Tracking: An examination of skills and performance based on user gaze

Digital inequality has been intensively studied in recent decades, due to the considerable social significance of this phenomenon. Research has struggled to find sound, in-depth ways to measure the digital literacy of people from different social groups, due to the dynamic character of digital technology, which results in ever-changing, nuanced types of digital inequality. This study proposes an innovative method for examining how users approach and accomplish digital tasks by introducing eye tracking for measuring user scan patterns, gaze and attention during completion of tasks. Eye tracking as a measurement of attention and focus reflects the processes that occur while users are performing required tasks, and therefore may be a useful tool to comprehend digital literacy. We apply this methodology in a repeated-measures observation study of the digital skills of low-skilled participants in an introductory computer course. Nineteen participants were asked to perform several online tasks before and after completing the course. The paper describes the results, which demonstrate that although participants’ skills improved, the improvement is manifest in basic, trivial uses, while advanced uses, such as understanding of efficient searching, or using the Internet in sophisticated ways as an environmental resource supporting one's situational awareness, improved only slightly, as data based on tracking user gaze and mouse movements reveal.

Towards a Novel Benchmark for Automatic Generation of ClaimReview Markup

The spread of disinformation throughout the web has become a critical problem for democratic societies. The dissemination of fake news has become a profitable business and a common practice among politicians and content producers. On the other hand, journalists and fact-checkers work unceasingly to debunk misinformation and prevent it from spreading further. In 2015, a new web markup called ClaimReview was introduced to give search engines access to the meaning of fact-checking articles. It is an important initiative to fight fake news by promoting and highlighting fact-check articles among users. However, barely half of fact-checkers have adopted the ClaimReview markup so far, resulting in low findability of fact-check articles, especially in under-represented countries and languages. In this work, we investigate the viability of using Artificial Intelligence for generating ClaimReview automatically. Besides promoting fact-check articles, the automatic generation of ClaimReview is an important step towards the creation of an updated multilingual knowledge base for fighting disinformation. Our experiments show noticeable results, which indicate a viable solution in a production environment. Furthermore, this work has created a benchmark that can be used in upcoming investigations in this domain.

Automatically Selecting Striking Images for Social Cards

To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource’s title, text summary, striking image, and domain name. News and scholarly articles on the web are frequently subject to social card creation when being shared on social media. However, we noticed that not all web resources offer sufficient metadata elements to enable appealing social cards. For example, the COVID-19 emergency has made it clear that scholarly articles, in particular, are at an aesthetic disadvantage in social media platforms when compared to their often more flashy disinformation rivals. Also, social cards are often not generated correctly for archived web resources, including pages that lack or predate standards for specifying striking images. With these observations, we are motivated to quantify the levels of inclusion of required metadata in web resources, its evolution over time for archived resources, and create and evaluate an algorithm to automatically select a striking image for social cards. We find that more than 40% of archived news articles sampled from the NEWSROOM dataset and 22% of scholarly articles sampled from the PubMed Central dataset fail to supply striking images. We demonstrate that we can automatically predict the striking image with a Precision@1 of 0.83 for news articles from NEWSROOM and 0.78 for scholarly articles from the open access journal PLOS ONE.
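For readers unfamiliar with the metric reported above, Precision@1 is the fraction of pages for which the top-ranked candidate image is the labeled striking image. The following is a minimal sketch, not the authors' evaluation code; the function name and the dictionary-based input format are assumptions made for illustration.

```python
def precision_at_1(ranked_candidates, ground_truth):
    """Fraction of pages whose top-ranked image equals the labeled striking image.

    ranked_candidates: dict mapping a page id to a list of image URLs, best first
                       (hypothetical format chosen for this sketch)
    ground_truth:      dict mapping a page id to its labeled striking image URL
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = 0
    for page, truth in ground_truth.items():
        ranking = ranked_candidates.get(page, [])
        # A page counts as a hit only if the first-ranked candidate is correct.
        if ranking and ranking[0] == truth:
            hits += 1
    return hits / len(ground_truth)
```

Under this definition, a Precision@1 of 0.83 would mean that the top-ranked image matched the label for 83 out of every 100 evaluated pages.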

Limiting Tags Fosters Efficiency

Tagging facilitates information retrieval in social media and other online communities by allowing users to organize and describe online content. Researchers found that the efficiency of tagging systems steadily decreases over time, because tags become less precise in identifying specific documents, i.e., they lose their descriptiveness. However, previous works did not answer how or even whether community managers can improve the efficiency of tags. In this work, we use information-theoretic measures to track the descriptive and retrieval efficiency of tags on Stack Overflow, a question-answering system that strictly limits the number of tags users can specify per question. We observe that tagging efficiency stabilizes over time, while tag content and descriptiveness both increase. To explain this observation, we hypothesize that limiting the number of tags fosters novelty and diversity in tag usage, two properties which are both beneficial for tagging efficiency. To provide qualitative evidence supporting our hypothesis, we present a statistical model of tagging that demonstrates how novelty and diversity lead to greater tag efficiency in the long run. Our work offers insights into policies to improve information organization and retrieval in online communities.
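The abstract does not say which information-theoretic measures are used. As one common illustration, the Shannon entropy of the tag distribution captures how evenly tag usage is spread across the vocabulary. The sketch below assumes a hypothetical input of (document, tag) pairs and is not the authors' measure.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    # Zero counts contribute nothing to the entropy and are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def tag_entropy(tag_assignments):
    """Entropy of tag usage.

    tag_assignments: iterable of (document_id, tag) pairs (hypothetical format).
    """
    tag_counts = Counter(tag for _, tag in tag_assignments)
    return shannon_entropy(tag_counts.values())
```

A higher value indicates that usage is spread over many tags, while a value near zero indicates that a few tags dominate.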

SESSION: Critical Methods for Examining the Web

On the conditions for integrating deep learning into the study of visual politics

Traditional methods to study visual politics have been limited in geographical, media and temporal coverage. Recent advances in deep learning have the potential to dramatically extend the scope of the field, especially with respect to making sense of contemporary social and political developments in digital media. While some early adopters may be tempted to take the new computational tools at face value, others see their black-box character as cause for concern. This paper argues that the integration of deep learning into the study of visual politics must be approached still more critically and boldly. On the one hand, the complexity of visual political themes requires more substantial human involvement than other applications of deep neural networks. Therefore, a question is how the scientist and the network should best interact. On the other hand, it is important to acknowledge that a deep learning tool will never simply replace specific tasks inside a research process: its adoption has implications for the broader process, from the delineation of the object of analysis, to data collection, to the interpretation and communication of results. We examine the conditions of integrating a deep learning tool for image classification into the large-scale study of visual politics in digital and social media along these two dimensions.

Auditing Algorithmic Bias on Twitter

Digital media platforms are reshaping our habits, how we access information, and how we interact with others. As a result, algorithms used by platforms, for example, to recommend content, play an increasingly important role in our access to information. Due to practical difficulties of accessing how platforms present content to their users, relatively little is known about how recommendation algorithms affect the information people receive. In this paper we implement a sock-puppet audit, a computational framework to audit black-box social media systems so as to quantify the impact of algorithmic curation on the information people see. We evaluate this framework by conducting a study on Twitter. We demonstrate that Twitter’s timeline curation algorithms skew the popularity and novelty of content people see and increase the inequality of their exposure to friends’ tweets. Our work provides evidence that algorithmic curation of content systematically distorts the information people see.

Improving Reactions to Rejection in Crowdsourcing Through Self-Reflection

In popular crowdsourcing marketplaces like Amazon Mechanical Turk, crowd workers complete tasks posted by requesters in return for monetary rewards. Task requesters are solely responsible for deciding whether to accept or reject submitted work. Rejecting work can directly affect the monetary reward of the corresponding workers, and indirectly influence worker qualifications and their future work opportunities in the marketplace. Unexpected or unwarranted rejections therefore result in negative emotions and reactions among workers. Despite the high prevalence of rejections in crowdsourcing marketplaces, little research has explored ways to mitigate the negative emotional repercussions of rejections on crowd workers. Addressing this important research gap, we investigate whether introducing self-reflection at different stages after task execution can alleviate the emotional toll of rejection decisions on crowd workers. Our work is inspired by prior studies in psychology that have shown that self-reflection on negative personal experiences can positively affect one’s emotions. To this end, we carried out an experimental study investigating the impact of explicit self-reflection on the emotions of rejected crowd workers. Results show that allowing workers to self-reflect on their delivered work, especially before receiving a rejection, has a significantly positive impact on their self-reported emotions in terms of valence and dominance. Our findings reveal that introducing a self-reflection stage before workers receive acceptance or rejection decisions on submitted work can help to positively influence the emotions of a worker. These findings have important design implications for fostering a healthier requester-worker relationship and contributing to the sustainability of the crowdsourcing marketplace.

Trick and Please. A Mixed-Method Study On User Assumptions About the TikTok Algorithm

The short-form video sharing app TikTok is characterized by content-based interactions that largely depend on individually customized video feeds curated by the app’s recommendation algorithm. Algorithms are generally invisible mechanisms within socio-technical systems that can influence how we perceive online and offline reality, and how we interact with each other. Based on experiences from consuming and creating videos, users develop assumptions about how the TikTok algorithm might work, and about how to trick and please the algorithm to make their videos trend so it pushes them to other users’ ‘for you’ pages. We conducted 28 qualitative interviews with TikTok users and identified three main criteria they assume influence the platform’s algorithm: video engagement, posting time, and adding and piling up hashtags. We then collected 300,617 videos from the TikTok trending section and performed a series of data explorations and analyses to test these user assumptions by determining criteria for trending videos. Our data analysis confirms that higher video engagement through comments, likes, and shares leads to a higher chance of the algorithm pushing a video to the trending section. We also find that posting videos at certain times increases the chances of them trending and reaching higher popularity. In contrast, the highly common assumption that using trending hashtags, algorithm-related hashtags (e.g., #fyp, #foryou), and piling up trending hashtags would significantly push videos to the trending section was not supported. Our results contribute to existing research on user understanding of social media algorithms, using TikTok as an example of a short-video app that is explicitly built around algorithmic content recommendation. Our results provide a broader perspective on user beliefs and behavior in the context of socio-technical systems and social media content creation and consumption.

NetProtect: Network Perturbations to Protect Nodes against Entry-Point Attack

In many network applications, it may be desirable to conceal certain target nodes from detection by a data collector, who is using a crawling algorithm to explore a network. For example, in a computer network, the network administrator may wish to protect those computers (target nodes) with sensitive information from discovery by a hacker who has exploited vulnerable machines and entered the network. These networks are often protected by hiding the machines (nodes) from external access and allowing only fixed entry points into the system (protection against external attacks). However, in this protection scheme, once one of the entry points is breached, the safety of all internal machines is jeopardized (i.e., the external attack turns into an internal attack). In this paper, we view this problem from the perspective of the data protector. We propose the Node Protection Problem: given a network with known entry points, which edges should be removed/added so as to protect as many target nodes from the data collector as possible? A trivial way to solve this problem would be to simply disconnect either the entry points or the target nodes – but that would make the network non-functional. Accordingly, we impose certain constraints: for each node, only (1 − r) fraction of its edges can be removed, and the resulting network must not be disconnected. We propose two novel scoring mechanisms: the Frequent Path Score and the Shortest Path Score. Using these scores, we propose NetProtect, an algorithm that selects edges to be removed or added so as to best impede the progress of the data collector. We show experimentally that NetProtect outperforms baseline node protection algorithms across several real-world networks. In some datasets, with 1% of the edges removed by NetProtect, we found that the data collector requires up to 6 (4) times the budget compared to the next best baseline in order to discover 5 (50) nodes.
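The two constraints stated in the abstract can be illustrated with a small feasibility check. This is a sketch under stated assumptions, not the NetProtect algorithm: the function names, the edge-list input format, and the reading of the degree constraint as "removed edges at a node ≤ (1 − r) × its original degree" are all choices made for this example.

```python
from collections import deque


def is_connected(adj):
    """Breadth-first search check that every node is reachable from one start node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def removal_is_feasible(edges, removed, r):
    """Check a proposed set of edge removals on an undirected graph.

    edges:   list of (u, v) pairs describing the original network
    removed: list of (u, v) pairs proposed for removal (each must be in edges)
    r:       fraction of each node's edges that must be kept (assumed reading)
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    original_degree = {n: len(nbrs) for n, nbrs in adj.items()}
    removed_count = {n: 0 for n in adj}

    for u, v in removed:
        if v not in adj.get(u, set()):
            raise ValueError(f"edge ({u}, {v}) is not in the current network")
        adj[u].discard(v)
        adj[v].discard(u)
        removed_count[u] += 1
        removed_count[v] += 1

    # Degree constraint: at most a (1 - r) fraction of each node's edges is removed.
    # The small tolerance guards against floating-point rounding in (1 - r) * degree.
    for n in adj:
        if removed_count[n] > (1 - r) * original_degree[n] + 1e-9:
            return False

    # Connectivity constraint: the remaining network must stay in one piece.
    return is_connected(adj)
```

For a four-node cycle with r = 0.5, removing a single edge passes both checks, while removing two edges at the same node violates the degree constraint.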

SESSION: Developing New Web Research Using NLP and Machine Learning

Social Science for Natural Language Processing: A Hostile Narrative Analysis Prototype

We propose a new methodology for analysing hostile narratives by incorporating theories from Social Science into a Natural Language Processing (NLP) pipeline. Drawing upon Peace Research, we use the “Self-Other gradient” from the theory of cultural violence to develop a framework and methodology for analysing hostile narratives. As test data for this development, we contrast Hitler’s Mein Kampf and texts from the “War on Terror” era with non-violent speeches from Martin Luther King. Our experiments with this dataset question the explanatory value of numerical outputs generated by quantitative methods in NLP. In response, we draw upon narrative analysis techniques for the technical development of our pipeline. We experimentally show how analysing narrative clauses has the potential to generate outputs of greater explanatory value than those of quantitative methods. To the best of our knowledge, this work constitutes the first attempt to incorporate cultural violence into an NLP pipeline for the analysis of hostile narratives.

fastText-based methods for Emotion Identification in Russian Internet Discourse

In this paper we tackle the problem of emotion detection and classification in Russian short text messages. We use fastText, a recent NLP development that produces state-of-the-art results for a variety of tasks. We put special emphasis on the challenges that arise when using a dataset of text messages from the most popular Russian messaging/social networking services (Telegram, VK). We also provide an extensive quantitative prediction analysis along with suggestions of possible ways to improve the results. Finally, we discuss the prospects of developing and implementing discourse-specific emotion identification technologies for the Web.

You’d Better Stop! Understanding Human Reliance on Machine Learning Models under Covariate Shift

Decision-making aids powered by machine learning models are becoming increasingly prevalent on the web today. However, when applied to a new distribution of data that is different from the training data (i.e., when covariate shift occurs), machine learning models often suffer from performance degradation and may provide misleading recommendations to human decision-makers. In this paper, we conduct a randomized experiment to investigate how people rely on machine learning models to make decisions under covariate shift. Surprisingly, we find that people rely on machine learning models more when making decisions on out-of-distribution data than on in-distribution data. Moreover, while increasing people’s awareness of the machine learning model’s possible performance disparity on different data helps decrease people’s over-reliance on the model under covariate shift, enabling people to visualize the data distributions and the model’s performance does not seem to help. We conclude by discussing the implications of our results.

Efficient Detection of Multilingual Hate Speech by Using Interactive Attention Network with Minimal Human Feedback

Online hate speech on social media has become a critical problem for social network services, one that has been further fueled by self-isolation during the COVID-19 pandemic. Current studies have primarily focused on detecting hate speech in one language due to the complexity of the task; however, hate speech in the real world knows no boundaries across languages and geographies. This demands further investigation of multilingual hate speech detection methods, with strong requirements for model interpretability to effectively understand the context of the model errors. In this paper, we propose a Multilingual Interactive Attention Network (MLIAN) model for hate speech detection on multilingual social media text corpora, building upon attention networks for interpretability and the human-in-the-loop paradigm for model adaptability. This model interactively learns to give attention to the relevant contextual words and leverages the labels for the hate target mentions from simulated human feedback. We evaluated the proposed model on SemEval-2019 Task 5 datasets in English and Spanish. Extensive experiments with model training in both single-language and multi-language settings demonstrate the superior performance of our model (with an AUC above 84%) compared to strong baselines. Our results show that human feedback not only improves model performance but also helps to improve the interpretability of the model by establishing a strong connection between the learned attention weights and semantic frames for the text across languages. Further, an analysis of the amount of human feedback required to achieve reliable and increased model performance shows that less than 4% of training data is sufficient. The application of the MLIAN method can inform future studies on multilingual hate speech.

SESSION: Problematic Online Content

Are Anti-Feminist Communities Gateways to the Far Right? Evidence from Reddit and YouTube

Researchers have suggested that “the Manosphere,” a conglomerate of men-centered online communities, may serve as a gateway to far right movements. In that context, this paper quantitatively studies the migratory patterns between a variety of groups within the Manosphere and the Alt-right, a loosely connected far right movement that has been particularly active in mainstream social networks. Our analysis leverages over 300 million comments spread across Reddit (in 115 subreddits) and YouTube (in 526 channels) to investigate whether the audiences of channels and subreddits associated with these communities converged between 2006 and 2018. In addition to subreddits related to the communities of interest, we also collect data on counterparts: other groups of users which we use for comparison (e.g., for YouTube we use a set of media channels). Besides measuring the similarity in the commenting user bases of these communities, we perform a migration study, calculating to what extent users in the Manosphere gradually engage with Alt-right content. Our results suggest that there is a large overlap between the user bases of the Alt-right and of the Manosphere and that members of the Manosphere are more likely to engage with far right content than carefully chosen counterparts. However, our analysis also shows that migration and user base overlap vary substantially across different platforms and within the Manosphere. Members of some communities (e.g., Men’s Rights Activists) gradually engage with the Alt-right significantly more than counterparts on both Reddit and YouTube, whereas for other communities, this engagement happens mostly on Reddit (e.g., Pick Up Artists). Overall, our work paints a nuanced picture of the pipeline between the Manosphere and the Alt-right, which may inform platforms’ policies and moderation decisions regarding these communities.

“Subverting the Jewtocracy”: Online Antisemitism Detection Using Multimodal Deep Learning

The exponential rise of online social media has enabled the creation, distribution, and consumption of information at an unprecedented rate. However, it has also led to the burgeoning of various forms of online abuse. Increasing cases of online antisemitism have become one of the major concerns because of its socio-political consequences. Unlike other major forms of online abuse like racism, sexism, etc., online antisemitism has not been studied much from a machine learning perspective. To the best of our knowledge, we present the first work in the direction of automated multimodal detection of online antisemitism. The task poses multiple challenges that include extracting signals across multiple modalities, contextual references, and handling multiple aspects of antisemitism. Unfortunately, there does not exist any publicly available benchmark corpus for this critical task. Hence, we collect and label two datasets with 3,102 and 3,509 social media posts from Twitter and Gab respectively. Further, we present a multimodal deep learning system that detects the presence of antisemitic content and its specific antisemitism category using text and images from posts. We perform an extensive set of experiments on the two datasets to evaluate the efficacy of the proposed system. Finally, we also present a qualitative analysis of our study.

Monetizing Propaganda: How Far-right Extremists Earn Money by Video Streaming

Video streaming platforms such as YouTube, Twitch, and DLive allow users to live-stream video content for viewers who can optionally express their appreciation through monetary donations. DLive is one of the smaller and lesser-known streaming platforms, and historically has had fewer content moderation practices. It has thus become a popular place for violent extremists and other clandestine groups to earn money and propagandize. What is the financial structure of the DLive streaming ecosystem and how much money is changing hands? In the past it has been difficult to understand how far-right extremists fundraise via podcasts and video streams because of the secretive nature of the activity and because of the difficulty of getting data from social media platforms. This paper describes a novel experiment to collect and analyze data from DLive's publicly available ledgers of transactions in order to understand the financial structure of the clandestine, extreme far-right video streaming community. The main findings of this paper are, first, that the majority of donors use micropayments at varying frequencies, but a small handful of donors spend large amounts of money to finance their favorite streamers. Next, the timing of donations to high-profile far-right streamers follows a fairly predictable pattern that is closely tied to a broadcast schedule. Finally, the far-right video streaming financial landscape is divided into separate cliques which exhibit very little crossover in terms of sizable donations. This work will be important to technology companies, policymakers, and researchers who are trying to understand how niche social media services, including video platforms, are being exploited by extremists to propagandize and fundraise.

The Rise and Fall of Fake News sites: A Traffic Analysis

Over the past decade, we have witnessed the rise of misinformation on the Internet, with online users constantly falling victim to fake news. A multitude of past studies have analyzed fake news diffusion mechanics and detection and mitigation techniques. However, there are still open questions about the operational behavior of fake news websites, such as: How old are fake news websites? Do they typically stay online for long periods of time? Do such websites synchronize their uptime and downtime with each other? Do they share similar content through time? Which third parties support their operations? How much user traffic do they attract, in comparison to mainstream or real news websites? In this paper, we perform a first-of-its-kind investigation to answer such questions regarding the online presence of fake news websites and characterize their behavior in comparison to real news websites. Based on our findings, we build a content-agnostic ML classifier for automatic detection of fake news websites that are not yet included in manually curated blacklists (F1 score up to 0.942 and ROC AUC up to 0.976).

Fighting Against Fake News During Pandemic Era: Does Providing Related News Help Student Internet Users to Detect COVID-19 Misinformation?

The COVID-19 “infodemic” has resulted in the widespread dissemination of counterfeit medical advice, hoaxes, fake products, and phoney information about the virus and responses. As a result, computational methods for determining the authenticity of information, in order to improve trust in public health awareness and policy decisions, are widely discussed in the scientific community. Even before the pandemic, mis- and disinformation, including fake news, were observed in the online world in significant numbers for numerous business, political and personal reasons. Moreover, much of this fake news was published by sources believed to be reliable. In contrast, some other fake news was fabricated in a way that would be easily trusted and shared by the general public on social media. COVID-19 related fake news has enormous effects on both the offline and online community, and thus it challenges government initiatives for proper health intervention. Therefore, interest in research in this area has risen to understand the problem both socially and technically. In this paper, we attempt to understand how we can help student Internet users at colleges in Bangladesh, a lower-middle-income country in South Asia, to distinguish COVID-19 misinformation. Our study reveals that providing related news as supplementary information to any online news helps students make better decisions about news authenticity. Statistical analyses of the survey data show that male students were more accurate than female students at detecting mis- and disinformation; students from urban areas could detect misleading news better than students from villages; and students from a Science background demonstrated the overall best performance, while students from a Madrasah background, who are all male, could not produce a significant improvement. We conclude that female students in general and male students of Madrasah, who spend the least amount of time online among all the student Internet users, are the groups most vulnerable to fake news.

SESSION: Extremism, Polarisation and Controversy: The New Reality of the Web

Understanding the Effect of Deplatforming on Social Networks

Aiming to enhance the safety of their users, social media platforms enforce terms of service by performing active moderation, including removing content or suspending users. Nevertheless, we do not have a clear understanding of how effective it is, ultimately, to suspend users who engage in toxic behavior, as that might actually draw users to alternative platforms where moderation is laxer. Moreover, these deplatforming efforts might end up nudging abusive users towards more extreme ideologies and potential radicalization risks. In this paper, we set out to understand what happens when users get suspended on a social platform and move to an alternative one. We focus on accounts active on Gab that were suspended from Twitter and Reddit. We develop a method to identify accounts belonging to the same person on these platforms, and observe whether there was a measurable difference in the activity and toxicity of these accounts after suspension. We find that users who get banned on Twitter/Reddit exhibit an increased level of activity and toxicity on Gab, although the audience they potentially reach decreases. Overall, we argue that moderation efforts should go beyond ensuring the safety of users on a single platform, taking into account the potential adverse effects of banning users on major platforms.

Mainstream Consensus and the Expansive Fringe: Characterizing the Polarized Information Ecosystems of Online Climate Change Discourse

In this paper, we introduce a social network perspective for characterizing polarized information ecosystems. We apply our framework to a large-scale dataset of climate change conversations on Twitter. Leveraging a stance detection algorithm, we quantify the link-sharing behaviors of Believers and Disbelievers of anthropogenic climate change. We generate networks of web domains based on co-sharing by Believers and Disbelievers, and we characterize these networks in terms of both structure and content. While Believers outnumber Disbelievers in our dataset, our results showed that Disbelievers are responsible for sharing over three times as many unique domains. However, for every 10% increment in a domain’s proportion of Disbeliever sharers, the same domain is shared 58% less. Structurally, we observed that domain clusters associated with Believers were significantly smaller, denser, and less connected to other domains. Content-wise, we additionally found that Disbelievers were the near-exclusive sharers of majority right-wing and known fake news sources, whereas both Believers and Disbelievers shared left-wing and centrist domains. Collectively, our findings indicate that climate change Believers rely on a popular, well-consolidated, and exclusive set of mainstream web domains. In contrast, Disbelievers draw on a diverse and fragmented collection of fringe information sources. Although these results suggest the marginal status of climate change skepticism, they also troublingly point to its expansive ecosystem of online information featuring multiple entry points. We conclude with directions for future research and potential implications for science communication and climate change policymakers.

Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis

Social media activity is driven by real-world events (natural disasters, political unrest, etc.) and by processes within the platform itself (viral content, posts by influentials, etc.). Understanding how these different factors affect social media conversations in polarized communities has practical implications, from identifying polarizing users to designing content promotion algorithms that alleviate polarization. Based on two datasets that record real-world events (ACLED and GDELT), we investigate how internal and external factors drive related Twitter activity in the highly polarizing context of Venezuela’s political crisis from early 2019. Our findings show that antagonistic communities react differently to different exogenous sources depending on the language they tweet in. The engagement of influential users within particular topics seems to match the different levels of polarization observed in the networks.

Analysis and Prediction of Multilingual Controversy on Reddit

Social media users express their opinions about arbitrary subjects, including controversial matters such as the 2020 U.S. presidential election or climate change. Controversial topics typically attract user attention, which often leads to fruitful, but sometimes also heated, discussions that can potentially segregate the community. Understanding features that are predictive of controversy in social media can improve moderation of communities and therefore the public discourse. In this paper, we analyze and predict controversy on the multilingual social platform Reddit. In particular, we compare a large set of textual and user activity features in controversial and non-controversial comments posted in six different languages. Using these features, we perform a prediction task and study their predictive strengths for controversy. Our results indicate that, regardless of the language, controversial comments are harder to read and more negative, and that users follow up faster and more frequently on such comments. Moreover, with our prediction experiment (ROC AUC = 0.79) we find that across all languages user activity is the most predictive of controversy on Reddit. Our results contribute to an improved understanding of controversy in social media and can serve as a foundation for tools and models to automatically detect controversial content posted on such platforms.
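For readers unfamiliar with the ROC AUC metric reported in this abstract, the following minimal Python sketch computes it directly from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up toy values, and the `roc_auc` helper is illustrative rather than the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = controversial, 0 = non-controversial (hypothetical data).
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of examples, so it is only suitable for small illustrations; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.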

A Look into COVID-19 Vaccination Debate on Twitter

Twitter is one of the most popular social media applications used by the general public to debate a wide range of topics. It is not surprising that the platform has become an effervescent channel where people are talking about the COVID-19 pandemic. After one year of a severe pandemic, we are now taking the first steps towards its ending: the production and distribution of vaccines as well as the start of vaccination campaigns in several countries worldwide. However, the relatively quick emergence of alternative vaccines raised several concerns and doubts among the general public, leading to lively online and offline debates.

In this paper, we investigate the public perception of this topic as it unrolls in the real world, analyzing over 12 million tweets during two months corresponding to the early stages of vaccination in the world. Our investigation includes the analyses of user engagement as well as content properties, including sentiment and psycholinguistic characteristics. In broad terms, our findings offer a first look into the dynamics of the online debate around a topic – COVID-19 vaccination – at its early stages of development, evidencing how people use the online world, notably Twitter, to share their impressions and concerns about it. As a means to allow reproducibility and foster follow-up studies, we release our collected dataset for public use.

SESSION: Data Sharing, Data Use and the Elusiveness of Privacy

CCCC: Corralling Cookies into Categories with CookieMonster

Browser cookies are ubiquitous in the web ecosystem today. Although these cookies were initially introduced to preserve user-specific state in browsers, they are now used for numerous other purposes, including user profiling and tracking across multiple websites. This paper sets out to understand and quantify the different uses for cookies, and in particular, the extent to which targeting and advertising, performance analytics and other uses which only serve the website and not the user add to overall cookie volumes. We start with 31 million cookies collected in Cookiepedia, which is currently the most comprehensive database of cookies on the Web. Cookiepedia provides a useful four-part categorisation of cookies into strictly necessary, performance, functionality and targeting/advertising cookies, as suggested by the UK International Chamber of Commerce. Unfortunately, we found that Cookiepedia data can categorise less than 22% of the cookies used by Alexa Top20K websites and less than 15% of the cookies set in the browsers of a set of real users. These results point to an acute problem with the coverage of current cookie categorisation techniques.

Consequently, we developed CookieMonster, a novel machine learning-driven framework which can categorise a cookie into one of the aforementioned four categories with more than 94% F1 score and less than 1.5 ms latency. We demonstrate the utility of our framework by classifying cookies in the wild. Our investigation revealed that, in Alexa Top20K websites, necessary and functional cookies constitute only 13.05% and 9.52% of all cookies, respectively. We also apply our framework to quantify the effectiveness of tracking countermeasures such as privacy legislation and ad blockers. Our results identify a way to significantly improve coverage of cookie classification today as well as identify new patterns in the usage of cookies in the wild.

AAA: Fair Evaluation for Abuse Detection Systems Wanted

User-generated web content is rife with abusive language that can harm others and discourage participation. Thus, a primary research aim is to develop abuse detection systems that can be used to alert and support human moderators of online communities. Such systems are notoriously hard to develop and evaluate. Even when they appear to achieve satisfactory performance on current evaluation metrics, they may fail in practice on new data. This is partly because datasets commonly used in this field suffer from selection bias, and consequently, existing supervised models overrely on cue words such as group identifiers (e.g., gay and black) which are not inherently abusive. Although there are attempts to mitigate this bias, current evaluation metrics do not adequately quantify their progress. In this work, we introduce Adversarial Attacks against Abuse (AAA), a new evaluation strategy and associated metric that better captures a model’s performance on certain classes of hard-to-classify microposts, and for example penalises systems which are biased on low-level lexical features. It does so by adversarially modifying the model developer’s training and test data to generate plausible test samples dynamically. We make AAA available as an easy-to-use tool, and show its effectiveness in error analysis by comparing the AAA performance of several state-of-the-art models on multiple datasets. This work will inform the development of detection systems and contribute to the fight against abusive language online.

Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph

Websites use third-party ads and tracking services to deliver targeted ads and collect information about users that visit them. These services put users’ privacy at risk, and that is why users’ demand for blocking these services is growing. Most of the blocking solutions rely on crowd-sourced filter lists manually maintained by a large community of users. In this work, we seek to simplify the update of these filter lists by combining different websites through a large-scale graph connecting all resource requests made over a large set of sites. The features of this graph are extracted and used to train a machine learning algorithm with the aim of detecting ads and tracking resources. As our approach combines different information sources, it is more robust toward evasion techniques that use obfuscation or changes in usage patterns. We evaluate our work over the Alexa top-10K websites and find its accuracy to be 96.1% biased and 90.9% unbiased with high precision and recall. It can also block new ads and tracking services that would otherwise have to be added to existing crowd-sourced filter lists before being blocked. Moreover, the approach followed in this paper sheds light on the ecosystem of third-party tracking and advertising.

Social acceptability of personal data utilization business according to data controllers and purposes

The rapid development of data analysis technology has enabled various uses of data, including some cases in which data usage cannot be accepted socially. Digital platform operators find it difficult to estimate the reputational risk of releasing new data utilization services. In addition, because the services deployed on the Internet are used across countries, the problem becomes even more complex when the differences between countries are considered. In this study, we conducted a survey in multiple countries, and the analysis takes into account the circumstances among countries to find acceptance of data utilization depending on the entity that utilizes the data. To take into account cultural and legal differences, a survey was conducted in Japan, the United States, the United Kingdom, and France. The numbers of participants were 782, 509, 235, and 103, respectively. The typology of data utilization is represented by the combination of a “data controller,” “data category,” “processing result,” and “domain (purpose).” We asked the participants about the acceptability of personal data utilization in two stages: whether they believe it is socially acceptable, and whether they would like to use such a service. The former concerns admissibility; the latter concerns intention to use. As a result, we found that acceptability, controlled for differences among nations, depends most heavily on an individual’s beliefs about the assurance of the data controller. In short, the individual’s beliefs about the data controller are more important than what the data controller is. Moreover, we investigated the contextual suitability. As a result, we found that the combination of data utilization increased or decreased admissibility and intention to use. In some cases, the intention to use is particularly high even if data utilization is not admitted. This suggests that the service is used out of necessity, including maintaining the status quo.

YouTubing at Home: Media Sharing Behavior Change as Proxy for Mobility Around COVID-19 Lockdowns

Compliance with public health measures, such as restrictions on movement and socialization, is paramount in limiting the spread of diseases such as the severe acute respiratory syndrome coronavirus 2 (also referred to as COVID-19). Although large population datasets, such as phone-based mobility data, may provide some glimpse into such compliance, they are often proprietary, and may not be available for all locales. In this work, we examine the usefulness of video sharing on social media as a proxy of the amount of time Internet users spend at home. In particular, we focus on the number of people sharing YouTube videos on Twitter before and during the COVID-19 lockdown measures imposed by 109 countries. We find that the media sharing behavior differs widely between countries, with some showing an immediate response to the lockdown decrees – mostly by increasing the sharing volume dramatically – while others show a substantial lag. We confirm that these insights correlate strongly with mobility, as measured using phone data. Finally, we illustrate that both media sharing and mobility behaviors change more drastically around mandated lockdowns, and less so around more lax recommendations. We make the media sharing volume data available to the research community for continued monitoring of behavior change around public health measures.

SESSION: Web Tracking and Internet Accessibility

Is this a click towards diversity? Explaining when and why news users make diverse choices

Modelling the different factors that lead people to choose news articles is one of the key challenges for understanding the diversity of news diets – as a news diet is the result of a series of decisions for certain articles over others, a sequence of choices that was made by the individual consumer. This study sheds light on the interplay between content-related (past behavior, habits, preferences) and situational factors (positioning, saturation, control). The latter could offer possibilities to promote more unexpected news encounters that diverge from past news consumption. To test this, a Python-based web application for interactively testing different forms of news personalization over time was used. 247 respondents used the system over a two-week period, in total making almost 23,000 choices. Our results show that: (1) Selections are influenced by a strong positioning effect that follows a reading pattern (left-right, up-down). This effect is stable across devices, topics, and preferences. (2) How much control people are given influences the length and the number of sessions (personalization leads to fewer and shorter sessions). (3) With high control, the diversity of preferences influenced the diversity of selected news more, possibly widening gaps between diversity-seeking and -averse users. (4) How often a topic was chosen in the last hour negatively impacts whether it gets chosen again, showing saturation effects. (5) Clicks on sports and economic articles can be explained by preferences, but not past behavior; for political news the opposite is found. (6) There is no significant correlation between the actual diversity (presented or selected topics) and the topic diversity perceived by the users – in spite of clear differences in actual diversity between the groups. From this we can conclude the importance of situational factors in modelling news selection and their potential to narrow or widen the diversity corridor. In sum, our results contribute to a better understanding of the interaction of news recommender systems and humans and how this shapes which news articles get chosen.

An Analysis of Web Tracking Domains in Mobile Applications

Modern web browsers provide users with an improved awareness of how websites track them and the potential use of data that is gathered. These features can be built into the browser or provided through an extension. Recently, DuckDuckGo, a privacy advocate and privacy-focused search engine company, made publicly available the data they use to inform users of trackers in their browsers and extensions. While users can install extensions or use default browser features to inform themselves of how websites use trackers and potentially block trackers, this availability is limited in mobile applications that communicate with the same services as websites. In this paper, we utilize the data set from DuckDuckGo to analyze mobile applications on iOS. We investigate the top applications in categories designed to provide information to users or that are used in social networks. We also identify which of these applications utilize personal data on the device including location, contacts, and photos. In this investigation, 84% of the applications communicated with domains categorized as advertisement or analytics services that track users. 18% of the applications transmitted the user’s location; of these, 95% communicated with domains classified as trackers. The most commonly used trackers are from Google: 55% of the applications utilize its tracking services and, in general, 89% of the applications communicated with Google’s services. While progress has been made in providing more transparency in web browsers, there is still work to be done in mobile operating systems. We advocate that the same features in web browsers become available through the native mobile operating system. As mobile operating system producers shift towards more privacy controls being generally available to users, this can be one more step in providing more transparency to the user in how applications use their data.

Differential Tracking Across Topical Webpages of Indian News Media

Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislation in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personally identifiable information leakage methods that first- and third-parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on the Global North markets such as the EU and the USA. Large markets such as India, which covers 20% of the world population and has no explicit privacy laws, have not been studied in this regard.

We aim to address these gaps and focus on the following research questions: Is tracking on topical subpages of Indian news websites different from their homepage? Do third-party trackers prefer to track specific topics? How does this preference compare to the similarity of content shown on topical subpages? To answer these questions, we propose a novel method for semi-automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We study the identified topical subpages and compare them with their homepages with respect to the intensity of cookie injection and third-party embeddedness and type. We find differential user tracking among subpages, and between subpages and homepages. We also find a preferential attachment of third-party trackers to specific topics. Also, embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
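The idea of categorizing subpages from details in their URLs can be illustrated with a minimal Python sketch. It is not the authors' method: the `TOPIC_KEYWORDS` map, the `topic_of` helper, and the example URLs are hypothetical, and the sketch simply matches URL path segments against a small hand-made keyword list.

```python
from urllib.parse import urlparse

# Hypothetical keyword map; a real study would curate this per website.
TOPIC_KEYWORDS = {
    "sports": {"sports", "cricket"},
    "politics": {"politics", "elections"},
    "business": {"business", "economy"},
}


def topic_of(url):
    """Guess a page's topic from the path segments of its URL."""
    path = urlparse(url).path.lower()
    segments = [seg for seg in path.split("/") if seg]
    if not segments:
        return "homepage"  # An empty path is treated as the homepage.
    # The first segment that matches any keyword decides the topic.
    for seg in segments:
        for topic, keywords in TOPIC_KEYWORDS.items():
            if seg in keywords:
                return topic
    return "unknown"


print(topic_of("https://news.example/sports/cricket/match-report"))
```

URLs labelled `unknown` would be the ones left for manual review in a semi-automatic workflow of this kind.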

A Bayesian Analysis of Collective Action and Internet Shutdowns in India

Since 2011, internet shutdowns have steadily become an increasingly popular form of digital repression, especially in India – which accounted for more than 50% of global recorded shutdowns from 2016 to 2019. Common shutdown justifications include ‘ensuring public safety’ in order to curb the prevalence of collective action in the form of protests and riots. This paper examines the correlation between internet shutdowns and a range of predictors, identifying riots as the main predictor of a shutdown.

We focus on shutdowns throughout India between 2016 and 2019 with particular attention to Jammu and Kashmir. Primarily using data from the NGO Access Now and the Integrated Conflict Early Warning System, we apply Bayesian inference via generalised linear modelling implemented using the Stan probabilistic programming language, to estimate correlates of shutdown behaviour. We first examine how the prevalence of collective action may impact the probability of observing a shutdown; and second how the length of a shutdown impacts subsequent collective action.

Our main finding is that riots seem to be the key predictor of a shutdown with increased protests and riots increasing the odds of observing a shutdown the same day by 7% with a 95% credible interval of 0.01-0.13 and 15% with a 95% credible interval of 0.03-0.26 respectively. As a predictor, however, the duration of an internet shutdown only has a marginal negative effect on the occurrence of riots at -8% per subsequent shutdown day with a 95% credible interval of -0.16 to -0.002.
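As a general illustration of how a percentage change in odds and its credible interval can be read off a Bayesian logistic model, the sketch below converts posterior draws of a log-odds coefficient into changes in odds and takes central percentiles. It assumes, hypothetically, that the coefficient is on the log-odds scale; the draws are synthetic stand-ins generated with the standard library rather than output from the paper's Stan model, and the helper names are illustrative.

```python
import math
import random


def odds_change(beta):
    """Relative change in odds implied by a log-odds coefficient."""
    return math.exp(beta) - 1.0


def credible_interval(draws, mass=0.95):
    """Central interval taken from the sorted posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    lo = ordered[int(math.floor(tail * (len(ordered) - 1)))]
    hi = ordered[int(math.ceil((1.0 - tail) * (len(ordered) - 1)))]
    return lo, hi


# Synthetic stand-in for posterior draws of one coefficient (assumption).
random.seed(0)
draws = [random.gauss(0.07, 0.03) for _ in range(4000)]
changes = [odds_change(b) for b in draws]
lo, hi = credible_interval(changes)
print(f"95% credible interval for the change in odds: {lo:.3f} to {hi:.3f}")
```

For small coefficients the change in odds is close to the coefficient itself, since exp(b) - 1 is approximately b when b is near zero.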

Understanding Internet Censorship in Europe: The Case of Spain

European Union (EU) member states consider themselves bulwarks of democracy and freedom of speech. However, there is a lack of empirical studies assessing possible violations of these principles in the EU through Internet censorship. This work starts addressing this research gap by investigating Internet censorship in Spain over 2016-2020, including the controversial 2017 Catalan independence referendum. We focus, in particular, on network interference disrupting the regular operation of Internet services or content.

We analyzed the data collected by the Open Observatory of Network Interference (OONI) network measurement tool. The measurements targeted civil rights defending websites, secure communication tools, extremist political content, and information portals for the Catalan referendum.

Our analysis indicates the existence of advanced network interference techniques that grow in sophistication over time. Internet Service Providers (ISPs) initially introduced information controls for a clearly defined legal scope (i.e., copyright infringement). Our research observed that such information controls had been re-purposed (e.g., to target websites supporting the referendum).

We present evidence of network interference from all the major ISPs in Spain, serving 91% of mobile and 98% of broadband users, as well as from several governmental and law enforcement authorities. In these measurements, we detected 16 unique blockpages, 2 Deep Packet Inspection (DPI) vendors, and 78 blocked websites.

We also contribute an enhanced domain testing methodology to detect certain kinds of Transport Layer Security (TLS) blocking that OONI could not initially detect. In light of our experience analyzing this dataset, we also make suggestions on improving the collection of evidence of network interference.