HT '20: Proceedings of the 31st ACM Conference on Hypertext and Social Media

SESSION: Keynotes and Invited Talks

Tech Won't Save Us: Reimagining Digital Technologies for the Public

Critical information scholars continue to demonstrate how technology and its narratives are shaped by and infused with values; that is, technology is not the result of the actions of impartial, disembodied, unpositioned agents. Technology consists of a set of social practices, situated within the dynamics of race, gender, class, and politics. This talk, stemming from the recent book, Algorithms of Oppression: How Search Engines Reinforce Racism, addresses the issues of racial equity and public interest technologies that could foreground civil and human rights in the 21st-century movements for AI.

Games, Hypertext, and Meaning

It might seem that games can address almost any topic. There are versions of Monopoly and Tetris alone that seem to address subjects ranging from pop music, bass fishing, and sex to mass murder, slavery, and predatory real estate development. Yet for all but the last of these, the actual play of these games is at odds with the intended theme.

So what topics can games meaningfully address? One powerful way that games can address topics is by having playable models that resonate with their intended themes. Monopoly is actually an example of such a game, with a playable model of real estate development ripped off from a game intended as a critique of capitalism's approach to resources. So is the less philosophical DOOM, with playable models of combat and space that match its "death travelogue" theme.

The foundation of any playable model is a set of operational logics, which combine communication and computation with opportunities for play. (Monopoly's real estate model includes resource, pattern matching, and chance logics.) Video games depend on a relatively small vocabulary of such logics [1]. This restricts the playable models available, which is a challenge faced by those seeking to meaningfully address personal, cultural, and political topics through games [2].

One conspicuous counter-example, however, is the linking logic. The communicative role of the hypertext link is flexible enough that it can be used to address a wide range of topics. Yet the very flexibility of linking logics pushes the burden of systemic use onto game developers, which itself produces limits. Greater connection between video game research and hypertext research communities could be a path to address this.

The Hypertext Years?

This talk begins with the crazy notion that we might think of hypertext as a signature for the period 1985-2020. The claim is more plausible technically than culturally, but the talk is perversely addressed to culture. Among other things, the discussion revisits Moulthrop's previous ACM Hypertext keynote in 1998, in which he distinguished between "exoteric" hypertext - the then-novel adaptation of the World Wide Web by Amazon and other online retailers - and "esoteric" applications in things like hypertext fiction and digital art. The talk updates this insight with reference to later developments such as Jill Walker's "feral hypertext" thesis, the rise of social media, and the recognition of computer games as legitimate channels of ideas.

While these phenomena have arguably displaced hypertextuality in the popular imagination, Moulthrop points to the major interest in complex narratives, counterfactuals, and multiverses as places where the hypertext aesthetic survives. Turning from aesthetics back to the technical, the talk focuses on Twine, the popular text-gaming application that marries what Alexander Galloway would call the "proctological" openness of web technologies with the structure-mapping affordances of graphical hypertext systems. In some ways portraying Twine as a second coming of hypertext is a clear and perhaps intentional misreading. The talk ends by wondering what this misreading might reveal.

Climates of Change: An Online Exhibition of Creative Work

Given the global state of crisis, this year's exhibition was migrated online, and a new call was distributed for online works designed to use hypertext to drive engagement with current challenges. The curators distributed a call for works responding to the overwhelming "climates of change," with a particular emphasis on the ongoing environmental crisis. Thinking globally can be overwhelming: thus, this exhibition asks artists and viewers to engage with these global concerns through the lens of the local and the personal. The exhibit features works that are brief and poetic; works that engage with moments and personal challenges; and works that respond to local challenges and warnings for a future that is already here. The curators welcomed works positioned through the lens of the current moment; works that challenge and inspire us; and works that call out for reflection and change.

The curators particularly encouraged those submitting to draw on personal experiences of, connections to, or understandings about climate change and its impacts, causes, and effects. Pieces might also engage with how that understanding is changing every day under our growing collective challenges.

SESSION: Session 1: Hypertext Literature

What Authors Think about Hypertext Authoring

Despite significant research into authoring tools for interactive narratives and a number of established authoring platforms, there is still a lack of understanding around the authoring process itself and the challenges that authors face when writing hypertext and other forms of interactive narratives. This has led to a monolithic view of authoring, which has hindered tool design, resulting in tools that can lack focus or ignore important parts of the creative process. In order to understand how authors practise writing, we conducted semi-structured interviews with 20 interactive narrative authors. Using a qualitative analysis, we coded their comments to identify both processes and challenges, and then mapped these against each other in order to understand where issues occurred during the authoring process. In our previous work we gathered a set of authoring steps relevant to interactive narratives through a review of the academic literature. Those steps were: Training/Support, Planning, Visualising/Structuring, Writing, Editing, and Compiling/Testing. In this work we discovered two additional authoring steps, Ideation and Publishing, that had not been previously identified in our reviews of the academic literature, as these are practical concerns of authors that are invisible to researchers. For challenges we identified 18 codes under 5 themes, falling into 3 phases of development: Pre-production, where issues fall under User/Tool Misalignment and Documentation; Production, adding issues under Complexity and Programming Environment; and Post-production, replacing previous issues with longer-term issues related to the narrative's Lifecycle. Our work shows that the authoring problem goes beyond the technical difficulties of using a system; rather, it is rooted in the common misalignment between authors' expectations and the tools' capabilities, the fundamental tension between expressivity and complexity, and the invisibility of the edges of the process to researchers and tool builders. Our work suggests that a less monolithic view of authoring would allow designers to create more focused tools and address issues specifically at the places in which they occur.

Mediation as Calibration: A Framework for Evaluating the Author/Reader Relation

Emerging communication technologies remediate and redefine relations between reader and author, but a comprehensive, progressive framework for assessing this dynamic across the preparation, transmission, reception, and consumption of media remains elusive. Such a framework is of consequence for hypertext (and first-generation electronic literature in particular). Speculative claims for its utility and equally reductive rejections of the reading experience it offers call for a model which assesses the calibration of the reader/author relationship from within the medium itself. This paper presents a first framework for assessing these dynamics at the stages of both authoring and reading. Within this analysis framework we identify eleven remediating factors conceived as scales between opposing tensions, and implement this model with reference to first-generation electronic literature.

On Links To Be: Exercises in Style #2

This contribution extends the discussion of the types and uses of links bootstrapped by Mason and Bernstein's "On Links: Exercises in Style", focusing on how authors use marginalia and annotations as links to the future. We argue that the development of a common semantics of "links to be" is needed in order to systematise individual authorial practices, provide greater interpretive understanding for readers, and enable the development of new tools. We present examples of different types of annotations from the Holographic Vernon Lee project (HoL) and provide our own exercises to formulate a preliminary framework of links to be.

SESSION: Session 2: Hypertext Games and Digital Humanities

Hacking Droids and Casting Spells: Locative Augmented Reality Games and the Reimagining of the Theme Park

Locative play and augmented reality games are a growing part of consumer games, thanks to successful mobile experiments such as Niantic's Ingress and Pokemon GO. However, while experiments in using locative play to transform a space have been ongoing in the context of campuses, libraries, national parks, and other tourist destinations, such games reach a limited audience and are still viewed as novelties. The growing movement to bring this type of play into theme parks has the potential to change that, as locative augmented reality games such as Disney's Galaxy's Edge experience and the Play Disney app gesture to a broader commercial future for locative interactive narrative. In this essay, we document how these commercial game experiences are transforming both guest expectations of theme parks and broader understandings of locative play.

Keeping People Playing: The Effects of Domain News Presentation on Player Engagement in Educational Prediction Games

Educational prediction games use the popularity and engagement of fantasy sports as a success model to promote learning in other domains. Fantasy sports motivate players to stay up-to-date with relevant news and explore large statistical data sets, thereby deepening their domain understanding while potentially honing their data analysis skills. We conducted a study of fantasy sports players, and discovered that while some participants performed sophisticated data analysis to support their gameplay, far more relied on news and published commentary. We used results from this study to design a prototype prediction game, Fantasy Climate, which helps players move from intuitions and advice to consuming news and analyzing data by supporting a variety of activities essential to gameplay. Because news is a key component of Fantasy Climate, we evaluated two link-based interfaces to domain-related news, one geospatial and the other organized as a list. The evaluation revealed that news presentation has a strong effect on players' engagement and performance: players using the geospatial interface not only were more engaged in the game; they also made better predictions than players who used the list-based presentation.

Augustine as "Naturalist of the Mind"

This paper, Augustine as "Naturalist of the Mind", is a linear portal to its associated graph-structured Tinderbox hypertext. The hypertext is one component of a research project arising out of a Philosophy seminar on Augustine as the preeminent bridge philosopher between the ancient world of Greece and Rome and the subsequent 1000 years of Western philosophy. The research project explores some surprising insights that emerged during this seminar from a deep study of Augustine's Confessions: Book 10-Memory. The purpose of the Augustine as "Naturalist of the Mind" Tinderbox hypertext is not only to be a multi-dimensional resource base for the research but also to provide an exploratorium where new materials can be added, new relationships created, and new research directions can be discovered and pursued.

SESSION: Session 3: Web and the Society - I

The 'Fairness Doctrine' lives on?: Theorizing about the Algorithmic News Curation of Google's Top Stories

When one searches for political candidates on Google, a panel composed of recent news stories, known as Top stories, is commonly shown at the top of the search results page. These stories are selected by an algorithm that chooses from hundreds of thousands of articles published by thousands of news publishers. In our previous work, we identified 56 news sources that contributed 2/3 of all Top stories for 30 political candidates running in the primaries of the 2020 US Presidential Election. In this paper, we survey US voters to elicit their familiarity and trust with these 56 news outlets. We find that some of the most frequent outlets are not familiar to all voters (e.g. The Hill or Politico), or particularly trusted by voters of any political stripe (e.g. Washington Examiner or The Daily Beast). Why, then, are such sources shown so frequently in Top stories? We theorize that Google is sampling news articles from sources with different political leanings to offer balanced coverage. This is reminiscent of the so-called "fairness doctrine" (1949-1987) policy in the United States that required broadcasters (radio or TV stations) to air contrasting views about controversial matters. Because there are fewer right-leaning publications than center or left-leaning ones, in order to maintain this "fair" balance, hyper-partisan far-right news sources of low trust receive more visibility than some news sources that are more familiar to and trusted by the public.

Anonymity Effects: A Large-Scale Dataset from an Anonymous Social Media Platform

Today, online social media sites function as the medium of expression for billions of users. As a result, aside from conventional social media sites like Facebook and Twitter, platform designers have introduced many alternative social media platforms (e.g., 4chan, Whisper, Snapchat, Mastodon) to serve specific userbases. Among these platforms, anonymous social media sites like Whisper and 4chan hold a special place for researchers. Unlike on conventional social media sites, posts on anonymous social media sites are not associated with persistent user identities or profiles. Thus, these anonymous social media sites can provide an extremely interesting data-driven lens into the effects of anonymity on online user behavior. However, to the best of our knowledge, there are currently no publicly available datasets to facilitate research efforts on these anonymity effects.

To that end, in this paper, we aim to publicly release the first ever large-scale dataset from Whisper, a large anonymous online social media platform. Specifically, our dataset contains 89.8 million Whisper posts (called "whispers") published over a 2-year period from June 6, 2014 to June 6, 2016 (when Whisper was quite popular). Each of these whispers contains both post text and associated metadata. The metadata includes information like the coarse-grained location of upload and the categories of whispers. We also present preliminary descriptive statistics to demonstrate significant language and categorical diversity in our dataset. We leverage previous work as well as novel analysis to demonstrate that the whispers contain personal emotions and opinions (likely facilitated by a disinhibition effect due to anonymity). Consequently, we envision that our dataset will facilitate novel research ranging from understanding online aggression to detecting depression within online populations.

Unsupervised Fake News Detection: A Graph-based Approach

Fake news has become more prevalent than ever, correlating with the rise of social media that allows every user to rapidly publish their views or hearsay. Today, fake news spans almost every realm of human activity, across diverse fields such as politics and healthcare. Most existing methods for fake news detection leverage supervised learning and expect a large labelled corpus of articles and social media user engagement information, which is often hard, time-consuming, and costly to procure. In this paper, we consider the task of unsupervised fake news detection, i.e., fake news detection in the absence of labelled historical data. We develop GTUT, a graph-based approach for the task which operates in three phases. Starting off by identifying a seed set of fake and legitimate articles, exploiting high-level observations on inter-user behavior in fake news propagation, it progressively expands the labelling to all articles in the dataset. Our technique draws upon graph-based methods such as biclique identification, graph-based feature vector learning, and label spreading. Through an extensive empirical evaluation over multiple real-world datasets, we establish the improved effectiveness of our method over state-of-the-art techniques for the task.
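
As a rough illustration of the label-spreading phase mentioned above (not the authors' GTUT implementation), a semi-supervised classifier can propagate a small seed labelling across the whole article set; the feature matrix and seed sizes below are hypothetical:

```python
# A minimal sketch of label spreading: given article feature vectors and a
# small seed set labelled fake (1) / legitimate (0), propagate labels to the
# remaining unlabelled articles.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

article_features = np.random.rand(100, 16)  # hypothetical article embeddings
labels = np.full(100, -1)                   # -1 marks unlabelled articles
labels[:5] = 1                              # seed fake articles
labels[5:10] = 0                            # seed legitimate articles

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(article_features, labels)
predicted = model.transduction_             # inferred labels for every article
print(predicted[:20])
```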

How to Assess the Exhaustiveness of Longitudinal Web Archives: A Case Study of the German Academic Web

Longitudinal web archives can be a foundation for investigating structural and content-based research questions. One prerequisite is that they contain a faithful representation of the relevant subset of the web. Therefore, an assessment of the authority of a given dataset with respect to a research question should precede the actual investigation. In addition to proper creation and curation, this requires measures for estimating the potential of a longitudinal web archive to yield information about the central objects the research question aims to investigate. In particular, content-based research questions often lack ab-initio confidence about the integrity of the data. In this paper we focus on one specifically important aspect, namely the exhaustiveness of the dataset with respect to the central objects. To this end, we investigate the recall coverage of researcher names in a longitudinal academic web crawl over a seven-year period and the influence of our crawl method on the dataset integrity. Additionally, we propose a method to estimate the amount of missing information as a means to describe the exhaustiveness of the crawl, and we motivate a use case for the presented corpus.
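
A minimal sketch of the recall-coverage idea, under the assumption that a gold list of researcher names is available; the file name and names below are hypothetical:

```python
# Recall coverage: the fraction of known researcher names that appear
# anywhere in a crawl snapshot. A stand-in for the paper's measure.
known_researchers = {"a. turing", "g. hopper", "e. dijkstra"}   # gold list
crawled_text = open("crawl_snapshot.txt").read().lower()        # hypothetical dump

found = {name for name in known_researchers if name in crawled_text}
recall = len(found) / len(known_researchers)
print(f"name recall: {recall:.2%}")
```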

SESSION: Session 4: Users and Web Search Interactions

How Does Team Composition Affect Knowledge Gain of Users in Collaborative Web Search?

Studies in searching as learning (SAL) have revealed that user knowledge gain not only manifests over a long-term learning period, but also occurs in single short-term web search sessions. Though prior works have shown that the knowledge gain of collaborators can be influenced by user demographics and searching strategies in long-term collaborative learning, little is known about the effect of these factors on user knowledge gain in short-term collaborative web search. In this paper, we present a study addressing the knowledge gain of user pairs in single collaborative web search sessions. Using crowdsourcing, we recruited 454 unique users (227 random pairs), who then collaboratively worked on informational search tasks spanning 10 different topics and information needs. We investigated how users' demographics and traits, and the interaction between these factors, could influence their knowledge gain. We found that, in contrast to offline collaboration cases, user demographics such as gender and age do not significantly affect users' knowledge gain in collaborative web search sessions. Instead, our results highlight a division of labor across queries and particular interaction patterns in communication that facilitate knowledge gain in user pairs. Based on these findings, we propose a multiple linear regression model to predict the knowledge gain of users in collaborative web search sessions from the perspective of team composition.
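
As a sketch of that final step (not the authors' fitted model), a multiple linear regression over hypothetical team-composition features might look like this:

```python
# Predict pair knowledge gain from made-up team-composition features;
# feature names and values are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

# per pair: [age gap, same-gender flag, query-division score, chat turns]
X = np.array([[5, 1, 0.8, 24],
              [12, 0, 0.3, 10],
              [2, 1, 0.6, 31]])
y = np.array([0.42, 0.18, 0.35])      # measured knowledge gain per pair

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # inspect feature weights
print(model.predict([[7, 0, 0.7, 20]]))
```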

Analyzing the Effects of "People also ask" on Search Behaviors and Beliefs

This study investigates the impact of the collective questions and answers displayed on search engine result pages (SERPs), known as "People also ask", on searchers' behaviors and beliefs. Two experiments were conducted in which participants were asked to perform health-related search tasks. In both experiments, items in "People also ask" were manipulated. Experiment 1 focused on the effects of the question, the answer, and the answer's opinion. Experiment 2 focused on the effect of the alternative question, i.e., a question related to a solution that can achieve the same goal as the query. The results revealed the following. (i) Participants issued fewer queries and spent less time on a SERP when "People also ask" was presented. (ii) Participants were less likely to interact with a SERP when they first encountered a belief-inconsistent answer. (iii) We could not confirm an effect of "People also ask" on beliefs at this stage. The findings suggest that "People also ask" might not help mitigate confirmation bias, as participants are likely to spend less effort on the search process (i.e., issue fewer queries) when they first encounter a belief-inconsistent answer, unlike when they encounter a belief-inconsistent document within the search results. An additional experiment is required to validate whether participants who first encounter a belief-inconsistent answer are more likely to alter their beliefs, as the number of such participants in our study was inadequate.

Towards Personalized Annotation of Webpages for Efficient Screen-Reader Interaction

To interact with webpages, people who are blind use special-purpose assistive technology, namely screen readers, that enable them to serially navigate and listen to the content using keyboard shortcuts. Although screen readers support a multitude of shortcuts for navigating over a variety of HTML tags, it has been observed that blind users typically rely on only a fraction of these shortcuts according to their personal preferences and knowledge. Thus, a mismatch between a user's repertoire of shortcuts and a webpage's markup can significantly increase browsing effort even for simple everyday web tasks. Also, inconsistent usage of ARIA, coupled with the increased adoption of generic styling tags (e.g., <div>, <span>) for which there is limited screen-reader support, further makes interaction arduous and frustrating for blind users.

To address these issues, in this work, we explore personalized annotation of webpages that enables blind users to efficiently navigate webpages using their preferred shortcuts. Specifically, our approach automatically injects personalized 'annotation' nodes into the existing HTML DOM such that blind users can quickly access certain semantically meaningful segments (e.g., menu, search results, filter options, calendar widget, etc.) on the page using their preferred screen-reader shortcuts. Using real shortcut profiles collected from 5 blind screen-reader users doing representative web tasks, we observed that personalized annotation can potentially reduce interaction effort by as much as 48 shortcut presses on average.
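
The following sketch illustrates the annotation-injection idea on static HTML with BeautifulSoup; the real system works on the live DOM inside the browser, and the page structure and labels here are hypothetical:

```python
# Inject an ARIA landmark and a heading so that a screen reader's heading/
# landmark shortcuts can jump straight to the search-results segment.
from bs4 import BeautifulSoup

html = "<div id='results'><div class='item'>...</div></div>"  # hypothetical page
soup = BeautifulSoup(html, "html.parser")

results = soup.find(id="results")
results["role"] = "region"                 # landmark for region navigation
results["aria-label"] = "Search results"
heading = soup.new_tag("h2")               # target for heading shortcuts
heading.string = "Search results"
results.insert(0, heading)
print(soup.prettify())
```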

SESSION: Blue Sky Ideas - I

Thoughts Reflection Machine

This blue sky paper presents the Thoughts Reflection Machine (TRM), which combines hypertext technologies and intelligent components. Using hypertext, the TRM provides means for its users to express or communicate their thoughts and ideas. Furthermore, the machine suggests relevant information that triggers users' creative thinking. The TRM is an approach towards a tight cooperation between human and machine, supporting each in the tasks at which they excel: creative problem solving and the computation of huge data sets, respectively.

Games/Hypertext

The relationship between hypertext research and games design is not clear, despite the striking similarity between literary hypertexts and narrative games. This matters as different communities are now exploring hypertext, interactive fiction, electronic literature, and narrative games from different perspectives - but lack a common critical vocabulary or shared body of work with which they can communicate. In this paper I attempt to deconstruct the relationship between literary hypertext and narrative games. I do this through two lenses. Firstly, by looking at Hypertext as Games; with a specific set of mechanics based around textual lexia and link-following (but with a tradition of exploring alternative Strange Hypertext approaches) resulting in a dynamic of exploration and puzzle solving depending on whether agency is expressed at the level of Syuzhet or Fabula. Secondly, by looking at Games as Hypertexts; that depend heavily on textual content, use guard fields, patterns, and sculptural hypertext models to manage agency, that experiment with aporia and epiphany, and that take place within a wider interlinked transmedia experience. This analysis reveals that Narrative Games are both more and less than Hypertext, with a wider set of mechanics and interfaces, but possessed of a core hypertextuality and situated within a greater hypertext context. This suggests that there is much value to be gained from interactions between the communities invested in interactive narrative, and significant potential in the cross-pollination of ideas.

Addressing the Skies of the Future of Text: A Call for Continuous Improvement in Infrastructures

The idiom 'blue sky' refers to 'thinking not grounded or in touch with the way things are'. If we extrapolate far enough into blue sky thinking we can wish for buttons marked 'Do Work', 'Education' and 'Understand me'. These are all (absolutely worthy) desired results, but the interaction entirely removes the human from the equation: even another machine could have pushed the button. To honestly address these wished-for improvements in work, education and communication, we need to accept that there will be no magic bullet, pill or machine. These areas are not end-result areas, such as a pill which cures an ailment; they are processes which need the human's active integration in order to have value. If a button can be pressed to do your work, you no longer have a job.

SESSION: Poster Session - I

Kurios: A Web App for Saving and Sharing Audio Memories with Physical Objects

Kurios is a smartphone web application for saving and sharing audio stories embedded in physical objects. People can use Kurios to embed the memories sparked by family photos, heirlooms, travel souvenirs, and trophies in the objects themselves. By creating this easily accessed semantic platform, the project seeks not only to preserve past stories but also to promote the bonds and sense of community that come with storytelling itself.

The Narrative of the Image

This poster paper explores semiotics and rhetoric as narrative in social media visual culture, specifically with issues of identity and social change on social media platforms such as YouTube. Under the umbrella of semiotics, postmodernism, and poststructuralism, the paper builds upon the work of Roland Barthes, Stuart Hall, and Safiya Umoja Noble by expanding the concepts of visual semiotics, visual rhetoric, postcolonialism, critical race theory, and algorithms to examine the narrative of the image.

Hypertext as a Tool for Exploring Personal Data on Social Media

Social networks such as Facebook are required to provide users with their personal data. However, these data dumps do not give users insight into overarching themes in their online behavior. In this poster, we discuss the development of Mother, a spatial hypertext system for visual data exploration. First insights include that less obvious connections are more interesting and relevant to the user than very close semantic or temporal connections.

Date the Artist: A Virtual Date with a Virtual Character

Date the Artist is a video-based interactive website that presents people with the opportunity to date a virtual character (VC). Each webpage represents a stage of the relationship. On each page, the user has 2 to 3 choices leading to different videos. At the end, all paths lead to the same final video representing the end of the relationship. The goal of this paper is to explain the design process and the main metaphor behind the project, as well as to highlight the overall user experience.

What is BitChute?: Characterizing the "Free Speech" Alternative to YouTube

In this paper, we characterize the content and discourse on BitChute, a social video-hosting platform. Launched in 2017 as an alternative to YouTube, BitChute joins an ecosystem of alternative, low content moderation platforms, including Gab, Voat, Minds, and 4chan. Uniquely, BitChute is the first of these alternative platforms to focus on video content and is growing in popularity. Our analysis reveals several key characteristics of the platform. We find that only a handful of channels receive any engagement, and almost all of those channels contain conspiracies or hate speech. This high rate of hate speech on the platform as a whole, much of which is anti-Semitic, is particularly concerning. Our results suggest that BitChute has a higher rate of hate speech than Gab but less than 4chan. Lastly, we find that while some BitChute content producers have been banned from other platforms, many maintain profiles on mainstream social media platforms, particularly YouTube. This paper contributes a first look at the content and discourse on BitChute and provides a building block for future research on low content moderation platforms.

SESSION: Session 5: Web and the Society - II

Matching User Preferences and Behavior for Mobility

Understanding user mobility is central to developing better transport systems that answer users' needs. Users usually plan their travel according to their needs and preferences; however, different factors can influence their choices when traveling. In this work, we model users' preferences and match them against their actual transport use. We use data coming from a mobility platform developed for mobile devices, whose aim is to understand the value of users' travel time. Our first goal is to characterize the perception that users have of their mobility by analyzing the general preferences they express before traveling. Our approach combines dimensionality reduction and clustering techniques to provide interpretable profiles of users. Then, we perform the same task after monitoring users' travels, matching users' preferences against their actual behavior. Our results show that there are substantial differences between users' perception of their mobility and their actual behavior: users overestimate their preferences for specific mobility modes that, in general, yield a lower return in terms of the worthwhileness of their trips.

Understanding Targeted Video-Ads in Children's Content

As the volume of online video entertainment via streaming increases, ever more users are targeted by online advertisement algorithms. Nevertheless, this rise in targeting and revenue does not come without concerns. That is, even though the online advertising business model is very successful, societal concerns are rising regarding the ethics of such algorithms and the extent to which they agree with the laws of different countries. Motivated by the dichotomy above, we here explore how targeted video-ads meet the regulatory policies regarding children's advertising in Brazil and Canada. To perform our study, we create synthetic user personas that watch YouTube videos daily. Our personas are tailored to stream children's content while controlling for several variables (e.g., gender, country, and type of content streamed). With the data gathered, our analyses reveal statistical evidence of algorithmic targeting in videos geared towards children. Also, some of the advertised products (e.g., alcoholic beverages and fast food) go directly against the regulations of the studied countries. With advertisements being matched to users by machine learning algorithms, it is impossible to state whether regulations are being violated on purpose (e.g., advertisers gaming the system). Nevertheless, our findings and discussion do raise a flag that regulations may not be sufficient, and content providers may still need to audit their systems to meet the regulations.

Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians

Wikipedia is a major source of information providing a large variety of content online, trusted by readers from around the world. Readers go to Wikipedia to get reliable information about different subjects, one of the most popular being living people, and especially politicians. While a lot is known about the general usage and information consumption on Wikipedia, less is known about the life-cycle and quality of Wikipedia articles in the context of politics. The aim of this study is to quantify and qualify content production and consumption for articles about politicians, with a specific focus on UK Members of Parliament (MPs).

First, we analyze spatio-temporal patterns of readers' and editors' engagement with MPs' Wikipedia pages, finding huge peaks of attention during election times that relate to signs of engagement on other social media (e.g. Twitter). Second, we quantify editors' polarisation and find that most editors specialize in a specific party and choose specific news outlets as references. Finally, we observe that the average citation quality is fairly high, with statements in 'Early life and career' sections missing citations most often (18%).

SESSION: Session 6: Hypertext Infrastructures and User Interfaces

Personalizing Information Exploration with an Open User Model

Over the past two decades, several information exploration approaches have been suggested to support a special category of search tasks known as exploratory search. These approaches creatively combine search, browsing, and information analysis steps, shifting user effort from recall (formulating a query) to recognition (i.e., selecting a link) and helping users gradually learn more about the explored domain. More recently, a few projects have demonstrated that personalising the process of information exploration with models of user interests can add value to information exploration systems. However, current model-based information exploration interfaces are very sophisticated and focus on highly experienced users. The project presented in this paper attempts to assess the value of open user modeling in supporting personalized information exploration by novice users. We present an information exploration system with an open and controllable user model, which supports undergraduate students in finding research advisors. A controlled study of this system with target users demonstrated its advantage over a traditional search interface and revealed interesting aspects of user behavior in a model-based interface.

Text2SceneVR: Generating Hypertexts with VAnnotatoR as a Pre-processing Step for Text2Scene Systems

The automatic generation of digital scenes from texts is a central task of computer science. It requires a kind of text comprehension whose automation is tied to the availability of sufficiently large, diverse, deeply annotated, and freely available data. This paper introduces Text2SceneVR, a system that addresses this bottleneck by allowing its users to create a sort of spatial hypertext in Virtual Reality (VR). We describe Text2SceneVR's data model, its user interface, and a number of problems related to the implicitness of natural language in the manifestation of spatial relations, which Text2SceneVR aims to address while trying to remain language independent. Finally, we present a user study with which we evaluated Text2SceneVR.

SESSION: Session 7: Recommender Systems

You Do Not Decide for Me! Evaluating Explainable Group Aggregation Strategies for Tourism

Most recommender systems propose items to individual users. However, in domains such as tourism, people often consume items in groups rather than individually. Different individual preferences in such a group can be difficult to resolve, and often compromises need to be made. Social choice strategies can be used to aggregate the preferences of individuals. We evaluated two explainable modified preference aggregation strategies in a between-subject study (n=200), and compared them with two baseline strategies for groups that are also explainable, in two scenarios: high divergence (group members with different travel preferences) and low divergence (group members with similar travel preferences). Generally, all investigated aggregation strategies performed well in terms of perceived individual and group satisfaction and perceived fairness. The results also indicate that participants were sensitive to a dictator-based strategy, which affected both their individual and group satisfaction negatively (compared to the other strategies).

Calibration in Collaborative Filtering Recommender Systems: a User-Centered Analysis

Recommender systems learn from past user preferences in order to predict future user interests and provide users with personalized suggestions. Previous research has demonstrated that biases in user profiles in the aggregate can influence the recommendations to users who do not share the majority preference. One consequence of this bias propagation effect is miscalibration, a mismatch between the types or categories of items that a user prefers and the items provided in recommendations. In this paper, we conduct a systematic analysis aimed at identifying key characteristics in user profiles that might lead to miscalibrated recommendations. We consider several categories of profile characteristics, including similarity to the average user, propensity towards popularity, profile diversity, and preference intensity. We develop predictive models of miscalibration and use these models to identify the most important features correlated with miscalibration, given different algorithms and dataset characteristics. Our analysis is intended to help system designers predict miscalibration effects and to develop recommendation algorithms with improved calibration properties.
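
For context, one widely used way to quantify miscalibration is the divergence between the category distribution of a user's profile and that of their recommendations; this sketch uses that measure with made-up genre distributions, which may differ from the paper's exact formulation:

```python
# Miscalibration as KL divergence between the genre distribution of a user's
# profile (p) and of their recommendation list (q); 0 means well calibrated.
import math

def kl_divergence(p, q, eps=1e-6):
    return sum(p[g] * math.log(p[g] / max(q.get(g, 0.0), eps))
               for g in p if p[g] > 0)

profile_genres = {"drama": 0.5, "comedy": 0.3, "horror": 0.2}  # hypothetical
recs_genres    = {"drama": 0.8, "comedy": 0.2, "horror": 0.0}
print(f"miscalibration: {kl_divergence(profile_genres, recs_genres):.3f}")
```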

Simulating the Impact of Recommender Systems on the Evolution of Collective Users' Choices

The major focus of recommender systems (RSs) research is on improving the goodness of the generated recommendations. Less attention has been dedicated to understanding the effect of an RS on users' actual choices. Hence, in this paper, we propose a novel simulation model of users' choices under the influence of an RS. The model leverages real rating/choice data observed up to a point in time in order to simulate the users' subsequent, month-by-month choices. We have analysed choice diversity, popularity, and utility, and found that: RSs have different effects on users' choices; the behaviour of new users is particularly important for understanding collective choices; and users' previous knowledge, i.e., their "awareness" of the item catalogue, greatly affects choice diversity.

SESSION: Blue Sky Ideas - II

Learning about Queer Representation through Mods: Reviewing Past Challenges and Outlining Ideas About Future Approaches

In this paper I outline an idea that I am researching more fully in my PhD dissertation: using modification of video games as an educational method to explore queer representation. This paper primarily focuses on educational concerns related to that approach: I first review discussions about the challenges surrounding game-based learning approaches and claim that some common points of skepticism in these discussions actually highlight the effectiveness of games for learning about queer representation. I also suggest that, as a genre, choice-based interactive fiction is ideal for exploratory learning about queer representation in comparison to other kinds of games. I then analyze educational uses of video game modification, noting that one of the most common approaches is educators modifying popular games for the classroom. I claim that approach could work especially well as a way to explore problems with representation in video games. I close the article with a short discussion about expanding upon this approach by building an easily-modifiable interactive fiction game to address issues with portrayals of queerness in mainstream games, a method I examine more fully in my PhD dissertation and through an associated digital game prototype. Overall, I argue that interactive fiction works are very well-suited to dealing with issues of representation and that modifiable interactive fiction games are an effective way to learn more about problems with representation in video games.

Docuverse Despatch: Information Farming for the Collective

Since the 1993 paper on Information Farming [5], hypertext has grown in scale and in the degree of its collective editing and use. This paper reflects on what these changes in scale and volume mean for the task of the information farmer and asks whether we understand the skills and tools needed for the task of sustaining the docusphere.

Bad Character: Who do We Want our Hypertexts to Be?

We frequently assume that adaptive hypertexts ought to adopt the customs, habits and inclinations of the reader, that our computational assistants ought to act as reliable servants, and that users - even new users - ought to like the hypertextual artifacts we create. This might be a mistake.

Visually-Aware Video Recommendation in the Cold Start

Recommender systems have become essential tools in any modern video-sharing platform. Although recommender systems have been shown to be effective in generating personalized suggestions on video-sharing platforms, they suffer from the so-called New Item problem. The New Item problem, a form of the Cold Start problem, occurs when a new item is added to the system catalogue and the recommender system has no or little data available for that new item. In such a case, the system may fail to meaningfully recommend the new item to users.

In this paper, we propose a novel recommender system that is based on visual tags, i.e., tags that are automatically assigned to videos based on their visual content. Such visual tags can be used in an extreme cold start situation, where neither ratings nor tags are available for the new video. Visual tags can also be used in a moderate cold start situation, where the new video has been annotated with only a few tags. This type of content feature can be extracted automatically without any human involvement and has been shown to be very effective in representing video content.

We have used a large dataset of videos and shown that automatically extracted visual tags can be incorporated into the cold start recommendation process and achieve superior results compared to the recommendation based on human-annotated tags.

SESSION: Poster Session - II

Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition

In this paper, we study the bias in named entity recognition (NER) models, specifically the difference in their ability to recognize male and female names as PERSON entity types. We evaluate NER models on a dataset containing 139 years of U.S. census baby names and find that relatively more female names, as opposed to male names, are not recognized as PERSON entities. The result of this analysis yields a new benchmark for gender bias evaluation in named entity recognition systems. The data and code for the application of this benchmark are publicly available for researchers to use.
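
A minimal sketch of the benchmark's core check, using spaCy as one example NER system (the template sentence and names are illustrative, not the paper's evaluation protocol):

```python
# Check whether a given first name is tagged as a PERSON entity when placed
# in a neutral carrier sentence.
import spacy

nlp = spacy.load("en_core_web_sm")

def recognized_as_person(name, template="{} went to the store."):
    doc = nlp(template.format(name))
    return any(ent.label_ == "PERSON" and name in ent.text for ent in doc.ents)

for name in ["James", "Mary", "Shanice"]:   # hypothetical census names
    print(name, recognized_as_person(name))
```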

A Taxonomy of User Actions on Social Networking Sites

The spread of information within and across Social Networking Sites (SNSs) is increasingly impactful on contemporary society. As information (and misinformation) moves across multiple online platforms, it is important to be able to put these platforms in conversation with one another in order to better understand complex phenomena. This article proposes a taxonomy of actions that are consistent across SNSs to provide researchers and other stakeholders with consistent terminology that enables classifying and comparing user activities over a variety of social media platforms. The proposed taxonomy of actions indicates that although SNSs differentiate themselves in the market and at the level of user experience through unique capabilities and forms of interaction, they can be productively understood as varying means to perform the same set of underlying actions: create, vote, follow, and post.

Link Prediction in Signed Networks

Signed networks represent real-world relationships, which can be positive or negative. Recent research focuses on either discriminative or generative models for signed network embedding. In this paper, we propose a generative adversarial network (GAN) model for signed networks which unifies generative and discriminative models to generate node embeddings. Our experimental evaluations on several datasets, including Slashdot, Epinions, Reddit, Bitcoin, and Wiki-RFA, indicate that the proposed approach achieves a better macro F1-score in link prediction than existing state-of-the-art approaches and better handles the sparsity of signed networks.

Dash: A Hyper Framework

Popular application suites, as well as specialized apps, are designed for workflows in which users focus on a single task for extended periods of time. These application silos slow down the many other workflows that require users to move with agility between tasks in a single working session. This is particularly true for creative people who have personalized patterns of gathering, organizing, and presenting information from a variety of sources. Moreover, each application comes with its own learning curve and data model, restricting users who seek to extend their workflows and, in some cases, losing data through poor data-transfer mechanisms such as clipboard copy and paste.

Dash is an open component-based hypermedia system that provides what we believe to be a best-of-breed set of components. While each component can be used in isolation as a less full-featured version of an analogous application, Dash allows users to interoperate components and compose their own workflows without losing data or expending effort while switching between tasks. As hypertext allowed users to flexibly move between texts, Dash allows users to flexibly move between tasks.

Towards Extending Wikipedia with Bidirectional Links

In this paper, we present the results of our WikiLinks project, which aims at extending current Wikipedia linkage mechanisms. Wikipedia has recently become one of the most important information sources on the Internet, yet it is still based on relatively simple linkage facilities. The WikiLinks system extends Wikipedia with bidirectional links between fragments of articles. While there have been several attempts to introduce bidirectional fragment-to-fragment links to the Web, the WikiLinks project is the first attempt to bring this new linkage mechanism directly to Wikipedia.

SESSION: Session 8: Social Media Analysis

A Sociolinguistic Route to the Characterization and Detection of the Credibility of Events on Twitter

Although Twitter is one of the primary sources of real-time news, with users acting as sensors updating content from all across the globe, the spread of rumours via Twitter is becoming an increasingly alarming issue and is known to have already caused significant damage. We propose a credibility analysis approach based on the linguistic structure of tweets. We not only characterize Twitter events but also predict their perceived credibility with a novel deep learning architecture. We use the huge CREDBANK dataset to conduct our experiments. Some of our exciting findings are that standard LIWC categories like 'negate', 'discrep', 'cogmech', 'swear' and Empath categories like 'hate', 'poor', 'government', 'worship' and 'swearing-terms' correlate negatively with the credibility of events. While some of our results resonate with the earlier literature, others represent novel insights into fake and legitimate Twitter events. Using the above observations and our deep learning architecture, we predict the credibility of an event (a four-class classification problem in our case) with an accuracy of 0.54, improving the best-known state-of-the-art (current accuracy 0.43) by ~26%. A fascinating observation is that even by looking at the first few tweets of an event, it is possible to make the prediction almost as accurately as when the entire volume of tweets is observed.
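
For readers who want to reproduce this style of feature extraction, the open-source Empath library exposes the categories named above; this sketch is illustrative and is not the authors' pipeline:

```python
# Extract normalized Empath category scores from a tweet; the scores can
# then serve as linguistic features for a credibility model.
from empath import Empath

lexicon = Empath()
tweet = "the government will not help the poor, I hate this"  # hypothetical
scores = lexicon.analyze(tweet, categories=["hate", "poor", "government"],
                         normalize=True)
print(scores)   # per-category normalized counts
```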

NesTPP: Modeling Thread Dynamics in Online Discussion Forums

Online discussion forums create an asynchronous conversation environment for online users to exchange ideas and share opinions through a unique thread-reply communication mode. Accurately modeling information dynamics under such a mode is important, as it provides a means of mining latent spread patterns and understanding user behaviors. In this paper, we design a novel temporal point process model to characterize information cascades in online discussion forums. The proposed model views the entire event space as a nested structure composed of main thread streams and their linked reply streams, and it explicitly models the correlations between these two types of streams through their intensity functions. Leveraging Reddit data, we examine the performance of the designed model in different applications and compare it with other popular methods. The experimental results show that our model produces competitive results and outperforms state-of-the-art methods in most cases.
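
For background, the standard building block of such models is a self-exciting intensity function; the sketch below shows a plain univariate Hawkes intensity, which is simpler than NesTPP's nested thread/reply formulation:

```python
# A generic self-exciting (Hawkes) intensity:
#   lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))
# Each past event raises the rate of future events, decaying over time.
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    past = np.asarray([ti for ti in event_times if ti < t])
    return mu + alpha * np.exp(-beta * (t - past)).sum()

reply_times = [0.5, 1.1, 1.3]            # hypothetical reply timestamps
print(hawkes_intensity(2.0, reply_times))
```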

Scalable Heterogeneous Social Network Alignment through Synergistic Graph Partition

Social network alignment has been an important research problem in social network analysis in recent years. The identification of shared users across networks provides researchers with the opportunity to achieve a more comprehensive understanding of users' social activities both within and across networks. Social network alignment is a very difficult problem: besides the challenges introduced by network heterogeneity, the alignment can be reduced to a combinatorial optimization problem with an extremely large search space, and the learning effectiveness and efficiency of existing alignment models degrade significantly as the network size increases. In this paper, we focus on the scalable heterogeneous social network alignment problem and propose to address it with a novel two-stage network alignment model, namely Scalable Heterogeneous Network Alignment (SHNA). Based on a group of intra- and inter-network meta diagrams, SHNA first partitions the social networks into a group of sub-networks synergistically. Via the partially known anchor links, SHNA can extract the partitioned sub-network correspondence relationships. Instead of aligning the complete input networks, SHNA identifies the anchor links between matched sub-network pairs, while those between unmatched sub-networks are pruned to effectively shrink the search space. Extensive experiments compare SHNA with state-of-the-art baseline methods on a real-world aligned social network dataset. The experimental results demonstrate both the effectiveness and efficiency of SHNA in addressing the problem.

Noise-Enhanced Community Detection

Community structure plays a significant role in uncovering the structure of a network. While many community detection algorithms have been introduced, improving the quality of detected communities is still an open problem. In many areas of science, adding noise improves system performance and algorithm efficiency, motivating us to also explore the possibility of adding noise to improve community detection algorithms. We propose a noise-enhanced community detection framework that improves communities detected by existing community detection methods. The framework introduces three noise methods to help detect communities better. Theoretical justification and extensive experiments on synthetic and real-world datasets show that our framework helps community detection methods find better communities.
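
A minimal sketch of the noise-enhancement idea, using random edge insertion as one possible form of noise and Louvain as the base detector (the paper proposes three specific noise methods, which may differ from this):

```python
# Perturb a copy of the graph with random noise edges, detect communities on
# the noisy graph, then score the partition on the ORIGINAL graph.
import random
import networkx as nx

def noisy_louvain(G, n_noise_edges=10, seed=0):
    rng = random.Random(seed)
    H = G.copy()
    nodes = list(H.nodes)
    for _ in range(n_noise_edges):          # inject random noise edges
        u, v = rng.sample(nodes, 2)
        H.add_edge(u, v)
    communities = nx.community.louvain_communities(H, seed=seed)
    return communities, nx.community.modularity(G, communities)

G = nx.karate_club_graph()
comms, q = noisy_louvain(G)
print(len(comms), round(q, 3))
```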

SESSION: Session 9: Recommender Systems and Narratives

Probabilistic Model of Narratives Over Topical Trends in Social Media: A Discrete Time Model

Online social media platforms are turning into the prime source of news and narratives about worldwide events. However, a systematic summarization-based narrative extraction method that can facilitate communicating the main underlying events is lacking. To address this issue, we propose a novel event-based narrative summary extraction framework. Our framework is designed as a probabilistic topic model with a categorical time distribution, followed by extractive text summarization. Our topic model identifies the recurrence of topics over time with a varying time resolution. The framework not only captures the topic distributions from the data, but also approximates user activity fluctuations over time. Furthermore, we define the significance-dispersity trade-off (SDT) as a comparison measure to identify the topic with the highest lifetime attractiveness in a timestamped corpus. We evaluate our model on a large corpus of Twitter data, including more than one million tweets in the domain of the disinformation campaigns conducted against the White Helmets of Syria. Our results indicate that the proposed framework is effective in identifying topical trends, as well as extracting narrative summaries from text corpora with timestamped data.

Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce

In this paper, we present our work on comparing on-line and off-line evaluation metrics in the context of small e-commerce recommender systems. Recommending for small e-commerce enterprises is rather challenging due to the lower volume of interactions and low user loyalty, which rarely extends beyond a single session. On the other hand, we usually deal with lower volumes of objects, which are easier for users to discover through various browsing/searching GUIs.

The main goal of this paper is to determine the applicability of off-line evaluation metrics in learning the true usability of recommender systems (evaluated on-line in A/B testing). In total, 800 variants of recommenders were evaluated off-line w.r.t. 18 metrics covering rating-based, ranking-based, novelty, and diversity evaluation. The off-line results were afterwards compared with the on-line evaluation of 12 selected recommender variants and, based on the results, we tried to learn and utilize an off-line to on-line results prediction model.

Off-line results showed great variance in performance w.r.t. different metrics, with the Pareto front covering 64% of the approaches. Furthermore, we observed that on-line results are considerably affected by the seniority of users. On-line metrics correlate positively with ranking-based metrics (AUC, MRR, nDCG) for novice users, while too-high values of novelty had a negative impact on the on-line results for them.
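
For reference, one of the ranking-based metrics named above (MRR) can be computed as follows; the user lists are hypothetical:

```python
# Mean reciprocal rank: average over users of 1 / (rank of the first
# relevant item in that user's recommendation list), 0 if none is relevant.
def mean_reciprocal_rank(ranked_lists, relevant):
    rr = []
    for user, items in ranked_lists.items():
        rank = next((i + 1 for i, it in enumerate(items)
                     if it in relevant[user]), None)
        rr.append(1.0 / rank if rank else 0.0)
    return sum(rr) / len(rr)

ranked = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant = {"u1": {"b"}, "u2": {"f"}}
print(mean_reciprocal_rank(ranked, relevant))   # (1/2 + 1/3) / 2
```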

Tell Me What You Want: Embedding Narratives for Movie Recommendations

Recommender systems are efficient exploration tools, providing their users with valuable suggestions about items such as products or movies. However, in scenarios where users have more specific ideas about what they are looking for (e.g., they provide describing narratives, such as "Movies with minimal story, but incredible atmosphere, such as No Country for Old Men"), traditional recommender systems struggle to provide relevant suggestions. In this paper, we study this problem by investigating a large collection of such narratives from the movie domain. We start by empirically analyzing a dataset containing free-text narratives representing movie suggestion requests from reddit users as well as community suggestions to those requests. We find that community suggestions are frequently more diverse than requests, making the recommendation task a challenging one. In a prediction experiment, we use embedding algorithms to assess the importance of request features, including movie descriptions, genres, and plot keywords, by computing recommendations. Our findings suggest that, in our dataset, positive movies and keywords have the strongest, whereas negative movie features have the weakest, predictive power. We strongly believe that our new insights into narratives for recommender systems represent an important stepping stone towards novel applications, such as interactive recommender applications.
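
A sketch of the embedding-based matching idea, using sentence-transformers as a stand-in for the paper's embedding algorithms; the model name and movie texts are illustrative:

```python
# Embed a narrative request and candidate movie descriptions, then rank
# candidates by cosine similarity to the request.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
request = ("Movies with minimal story, but incredible atmosphere, "
           "such as No Country for Old Men")
movie_texts = ["Slow-burn thriller with sparse dialogue and desert vistas.",
               "Fast-paced action comedy full of one-liners."]

scores = util.cos_sim(model.encode(request), model.encode(movie_texts))
print(scores)   # higher score = better match to the narrative
```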

SESSION: Workshop and Tutorial Summaries

3rd Workshop on Human Factors in Hypertext (HUMAN'20)

HUMAN'20 is the third workshop in a series at the ACM Hypertext conferences. It has a strong focus on the user and thus is complementary to the strong machine-analytics research direction seen at previous conferences.

The user-centric view on hypertext not only includes user interfaces and interaction, but also discussions about hypertext application domains. Furthermore, the workshop raises the question of how original hypertext ideas (e.g., Doug Engelbart's "augmenting human intellect" or Frank Halasz' "hypertext as a medium for thinking and communication") can improve today's hypertext systems.

Knowledge-infused Deep Learning

Deep Learning has shown remarkable success during the last decade on essential tasks in computer vision and natural language processing. Yet, challenges remain in the development and deployment of artificial intelligence (AI) models in real-world cases, such as dependence on extensive data, trust, explainability, traceability, and interactivity. These challenges are amplified in high-risk fields, including healthcare, cyber threats, crisis response, autonomous driving, and future manufacturing. On the other hand, symbolic computing with knowledge graphs has shown significant growth in specific tasks with reliable performance. This tutorial (a) discusses the novel paradigm of knowledge-infused deep learning to synthesize neural computing with symbolic computing, (b) describes different forms of knowledge and infusion methods in deep learning, and (c) discusses application-specific evaluation methods to assure explainability and reasoning using benchmark datasets and knowledge resources. The resulting paradigm of "knowledge-infused learning" combines knowledge from both domain expertise and physical models. A wide variety of techniques involving shallow, semi-deep, and deep infusion will be discussed, along with the corresponding intuitions, limitations, use cases, and applications. More details can be found at http://kidl2020.aiisc.ai/.