# CIKM '19- Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Full Citation in the ACM Digital Library

### Autonomous Driving Towards Mass Production

• Jianping Shi

Visual recognition technology is very important for autonomous driving especially in direction of mass production. In this talk, we will introduce the algorimic progress for SenseTime in autonoumous driving, as well as our platform foundation for AI technology. Based on this, we illustrate how we make use of these technology into mass production product for autonomous driving.

### The Fisher-Rao Metric in Computer Vision

• Stephen John Maybank

The Fisher-Rao metric is a Riemannian metric defined on any manifold that forms the parameter space for a family of probability distributions. The metric is specified by quadratic forms defined on the tangent spaces of the manifold. If a parameterisation of the manifold is chosen then each quadratic form is given by a symmetric positive definite matrix. Lengths, areas, volumes and hyper-volumes calculated using the Fisher-Rao metric are invariant under reparameterisation. This invariance is essential in practice because the parameterisation can be changed arbitrarily while keeping the data unchanged. The Fisher-Rao metric is obtained as a limit of the expected value of the log likelihood ratio for two nearby probability distributions. The inverse of the Fisher-Rao matrix is the Cramer-Rao lower bound on the covariance of an unbiased estimate of a parameter. The Fisher-Rao metric is used to divide the parameter space for the Hough transform method for detecting structures in data. Each division or accumulator is invariant under reparametrerisation, and the number of accumulators is proportional to the volume of the parameter space. Accurate approximations to the Fisher Rao metric are obtained for lines, catadioptric images of lines, circles, ellipses and the cross ratio. It is shown that the Fisher-Rao metric can be used to compare the amount of information in point features with the amount of information in edge element features.

### From Unstructured Text to TextCube: Automated Construction and Multidimensional Exploration

• Jiawei Han

The real-world big data are largely unstructured, interconnected, and dynamic, in the form of natural language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data, which may not be scalable, especially considering that a lot of text corpora are highly dynamic and domain specific. We believe that massive text data itself may disclose a large body of hidden patterns, structures, and knowledge. With domain-independent and domain-dependent knowledge bases, we propose to explore the power of massive data itself for turning unstructured data into structured knowledge. By organizing massive text documents into multidimensional text cubes, we show structured knowledge can be extracted and used effectively. In this talk, we introduce a set of methods developed recently in our group for such an exploration, including mining quality phrases, entity recognition and typing, multi-faceted taxonomy construction, and construction and exploration of multi-dimensional text cubes. We show that data-driven approach could be a promising direction at transforming massive text data into structured knowledge.

### Practicing the Art of Data Science

• Jian Pei

Data science embraces interdisciplinary methodologies and tools, such as those in statistics, artificial intelligence/machine learning, data management, algorithms, and computation. Practicing data science to empower innovative applications, however, remains an art due to many factors beyond technology, such as sophistication of application scenarios, business demands, and the central role of human being in the loop.

The purpose of this keynote speech is to share with the audience two most important rules of thumb that I learned from my practice of data science research, development and applications, as well as my thoughts on the future enterprise and organization data strategies.

First, I will demonstrate the importance and challenges in developing domain-oriented, end-to-end solutions. Specifically, I will discuss our experience in transforming algorithms to domain-oriented tools, and review some of our latest techniques in transforming black-box deep learning networks into interpretable white-box models.

Second, I will advocate the core value of data science as the connector and transformer between vertical application challenges and general scientific principles and engineering tools. Using network embedding as an example, I will illustrate the innovative value of building connectors and transformers for new types of data and applications so that they can take great advantage of well established scientific methods and engineering tools.

I envision data science for social, commercial and ecological good has to build on enterprise and organization data strategies and infrastructure. About future work, I will provide some thoughts on this perspective, such as data value assessing and pricing, as well as privacy preservation.

## SESSION: Long - Algorithmic Techniques

### On VR Spatial Query for Dual Entangled Worlds

• Shao-Heng Ko
• Ying-Chun Lin
• Hsu-Chao Lai
• Wang-Chien Lee
• De-Nian Yang

With the rapid advent of Virtual Reality (VR) technology and virtual tour applications, there is a research need on spatial queries tailored for simultaneous movements in both the physical and virtual worlds. Traditional spatial queries, designed mainly for one world, do not consider the entangled dual worlds in VR. In this paper, we first investigate the fundamental shortest-path query in VR as the building block for spatial queries, aiming to avoid hitting boundaries and obstacles in the physical environment by leveraging Redirected Walking (RW) in Computer Graphics. Specifically, we first formulate Dual-world Redirected-walking Obstacle-free Path (DROP) to find the minimum-distance path in the virtual world, which is constrained by the RW cost in the physical world to ensure immersive experience in VR. We prove DROP is NP-hard and design a fully polynomial-time approximation scheme, Dual Entangled World Navigation (DEWN), by finding Minimum Immersion Loss Range (MIL Range). Afterward, we show that the existing spatial query algorithms and index structures can leverage DEWN as a building block to support kNN and range queries in the dual worlds of VR. Experimental results and a user study with implementation in HTC VIVE manifest that DEWN outperforms the baselines with smoother RW operations in various VR scenarios.

### Sketching Streaming Histogram Elements using Multiple Weighted Factors

• Quang-Huy Duong
• Heri Ramampiaro
• Kjetil Nørvåg

We propose a novel sketching approach for streaming data that, even with limited computing resources, enables processing high volume and high velocity data efficiently. Our approach accounts for the fact that a stream of data is generally dynamic, with the underlying distribution possibly changing all the time. Specifically, we propose a hashing (sketching) technique that is able to automatically estimate a histogram from a stream of data by using a model with adaptive coefficients. Such a model is necessary to enable the preservation of histogram similarities, following the varying weight/importance of the generated histograms. To address the dynamic properties of data streams, we develop a novel algorithm that can sketch the histograms from a data stream using multiple weighted factors. The results from our extensive experiments on both synthetic and real-world datasets show the effectiveness and the efficiency of the proposed method.

### Improved Compressed String Dictionaries

• Nieves R. Brisaboa
• Ana Cerdeira-Pena
• Guillermo de Bernardo
• Gonzalo Navarro

We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and applications such as Web mining; and collections of URIs and literals, the basic components of RDF datasets. Our experiments show that our data structures achieve better compression than the state-of-the-art alternatives while providing very competitive query times.

### On Transforming Relevance Scales

• Lei Han
• Kevin Roitero
• Stefano Mizzaro
• Gianluca Demartini

Information Retrieval (IR) researchers have often used existing IR evaluation collections and transformed the relevance scale in which judgments have been collected, e.g., to use metrics that assume binary judgments like Mean Average Precision. Such scale transformations are often arbitrary (e.g., 0,1 mapped to 0 and 2,3 mapped to 1) and it is assumed that they have no impact on the results of IR evaluation. Moreover, the use of crowdsourcing to collect relevance judgments has become a standard methodology. When designing the crowdsourcing relevance judgment task, one of the decision to be made is the how granular the relevance scale used to collect judgments should be. Such decision has then repercussions on the metrics used to measure IR system effectiveness. In this paper we look at the effect of scale transformations in a systematic way. We perform extensive experiments to study the transformation of judgments from fine-grained to coarse-grained. We use different relevance judgments expressed on different relevance scales and either expressed by expert annotators or collected by means of crowdsourcing. The objective is to understand the impact of relevance scale transformations on IR evaluation outcomes and to draw conclusions on how to best transform judgments into a different scale, when necessary.

### Streamline Density Peak Clustering for Practical Adoptions

• Shuai Yang
• Xipeng Shen
• Min Chi

Since Density Peak Clustering (DPC) algorithm was proposed in 2014, it has drawn lots of interest in various domains. As a clustering method, DPC features superior generality, robustness, flexibility and simplicity. There are however two main roadblocks for its practical adoptions, both centered around the selection of cutoff distance, the single critical hyperparameter of DPC. This work proposes an improved algorithm named Streamlined Density Peak Clustering (SDPC). SDPC speeds up DPC executions on a sequence of cutoff distances by 2.2-8.8X while at the same time reducing memory usage by a magnitude. As an algorithm preserving the original semantic of DPC, SDPC offers an efficient and scalable drop-in replacement of DPC for data clustering.

## SESSION: Long - Analyzing Spatio-Temporal Data

### Recommendation-based Team Formation for On-demand Taxi-calling Platforms

• Lingyu Zhang
• Tianshu Song
• Yongxin Tong
• Zimu Zhou
• Dan Li
• Wei Ai
• Lulu Zhang
• Guobin Wu
• Yan Liu
• Jieping Ye

On-demand taxi-calling platforms often ignore the social engagement of individual drivers. The lack of social incentives impairs the work enthusiasms of drivers and will affect the quality of service. In this paper, we propose to form teams among drivers to promote participation. A team consists of a leader and multiple members, which acts as the basis for various group-based incentives such as competition. We define the Recommendation-based Team Formation (RTF) problem to form as many teams as possible while accounting for the choices of drivers. The RTF problem is challenging. It needs both accurate recommendation and coordination among recommendations, since each driver can be in at most one team. To solve the RTF problem, we devise a Recommendation-Matrix-Based Framework (RMBF). It first estimates the acceptance probability of recommendations and then derives a recommendation matrix to maximize the number of formed teams from a global view. We conduct trace-driven simulations using real data covering over 64,000 drivers and deploy our solution on a large on-demand taxi-calling platform for online evaluations. Experimental results show that RMBF outperforms the greedy-based strategy by forming up to 20% and 12.4% teams in trace-driven simulations and online evaluations, and the drivers who form teams and are involved in the competition have more service time, number of finished orders and income.

### DeepIST: Deep Image-based Spatio-Temporal Network for Travel Time Estimation

• Tao-yang Fu
• Wang-Chien Lee

Estimating the travel time for a given path is a fundamental problem in many urban transportation systems. However, prior works fail to well capture moving behaviors embedded in paths and thus do not estimate the travel time accurately. To fill in this gap, in this work, we propose a novel neural network framework, namely Deep Image-based Spatio-Temporal network (DeepIST), for travel time estimation of a given path. The novelty of DeepIST lies in the following aspects:1) we propose to plot a path as a sequence of -generalized images"which include sub-paths along with additional information, such as traffic conditions, road network and traffic signals, in order to harness the power of convolutional neural network model (CNN)on image processing; 2) we design a novel two-dimensional CNN, namely PathCNN, to extract spatial patterns for lines in images by regularization and adopting multiple pooling methods; and 3) we apply a one-dimensional CNN to capture temporal patterns among the spatial patterns along the paths for the estimation. Empirical results show that DeepIST soundly outperforms the state-of-the-art travel time estimation models by 24.37% to 25.64% of mean absolute error (MAE) in multiple large-scale real-world datasets

### Personalized Route Description Based On Historical Trajectories

• Han Su
• GuangLin Cong
• Wei Chen
• Bolong Zheng
• Kai Zheng

The turn-by-turn route descriptions provided in the existing navigation applications are exclusively derived from underlying road network topology information, i.e., the connectivity of edges to each other. Therefore, the turn-by-turn route descriptions are simplified as metric translation of physical world (e.g. distance/time to turn) to spoken language. Such translation that ignores human cognition of the geographic space, is frequently verbose and redundant for the drivers who have knowledge of the geographical areas. In this paper, we study a Personalized Route Description system dubbed PerRD-with which the goal is to generate more customized and intuitive route descriptions based on user generated content. PerRD utilizes a wealth of user generated historical trajectory data to extract frequently visited routes in the road network. The extracted information is used to make cognitive customized route description for each user. We formalize this task as a problem of finding the optimal partition for a given route that maximizes the familiarity while minimizing the number of partitions, and finding a proper sentence to describe each partition. For empirical study, our solution is applied to three trajectory datasets and users' real experiences to evaluate the performance and effectiveness of PerRD.

### Geolocating Tweets in any Language at any Location

• Mike Izbicki
• Vagelis Papalexakis
• Vassilis Tsotras

Most social media messages are written in languages other than English, but commonly used text mining tools were designed only for English. This paper introduces the Unicode Convolutional Neural Network (UnicodeCNN) for analyzing text written in any language. The UnicodeCNN does not require the language to be known in advance, allows the language to change arbitrarily mid-sentence, and is robust to the misspellings and grammatical mistakes commonly found in social media. We demonstrate the UnicodeCNN's effectiveness on the challenging task of content-based tweet geolocation using a dataset with 900 million tweets written in more than 100 languages. Whereas previous work restricted itself to predicting a tweet's country or city of origin (and only worked on tweets written in certain languages from highly populated cities), we predict the exact GPS locations of tweets (and our method works on tweets written in any language sent from anywhere in the world). We predict GPS coordinates using the mixture of von Mises-Fisher (MvMF) distribution. The MvMF exploits the Earth's spherical geometry to improve predictions, a task that previous work considered too computationally difficult. On English tweets, our model's predictions average more than 300km closer to the true location than previous work, and in other languages our model's predictions are up to 1500km more accurate. Remarkably, the UnicodeCNN can learn geographic knowledge in one language and automatically transfer that knowledge to other languages.

### SeiSMo: Semi-supervised Time Series Motif Discovery for Seismic Signal Detection

• M Ashraf Siddiquee
• Zeinab Akhavan
• Abdullah Mueen

Unlike semi-supervised clustering, classification and rule discovery; semi-supervised motif discovery is a surprisingly unexplored area in data mining. Semi-supervised Motif Discovery finds hidden patterns in long time series when a few arbitrarily known patterns are given. A naive approach is to exploit the known patterns and perform similarity search within a radius of the patterns. However, this method would find only similar shapes and would be limited in discovering new shapes. In contrast, traditional unsupervised motif discovery algorithms detect new shapes, while missing some patterns because the given information is not utilized. We propose a semi-supervised motif discovery algorithm that forms a nearest neighbor graph to identify chains of nearest neighbors from the given events. We demonstrate that the chains are likely to identify hidden patterns in the data. We have applied the method to find novel events in several geoscientific datasets more accurately than existing methods.

## SESSION: Long - Biomedical Informatics

### UA-CRNN: Uncertainty-Aware Convolutional Recurrent Neural Network for Mortality Risk Prediction

• Qingxiong Tan
• Andy Jinhua Ma
• Mang Ye
• Baoyao Yang
• Huiqi Deng
• Vincent Wai-Sun Wong
• Yee-Kit Tse
• Terry Cheuk-Fung Yip
• Grace Lai-Hung Wong
• Jessica Yuet-Ling Ching
• Francis Ka-Leung Chan
• Pong C. Yuen

Accurate prediction of mortality risk is important for evaluating early treatments, detecting high-risk patients and improving healthcare outcomes. Predicting mortality risk from the irregular clinical time series data is challenging due to the varying time intervals in the consecutive records. Existing methods usually solve this issue by generating regular time series data from the original irregular data without considering the uncertainty in the generated data, caused by varying time intervals. In this paper, we propose a novel Uncertainty-Aware Convolutional Recurrent Neural Network (UA-CRNN), which incorporates the uncertainty information in the generated data to improve the mortality risk prediction performance. To handle the complex clinical time series data with sub-series of different frequencies, we propose to incorporate the uncertainty information into the sub-series level rather than the whole time series data. Specifically, we design a novel hierarchical uncertainty-aware decomposition layer (UADL) to adaptively decompose time series into different sub-series and assign them proper weights according to their reliabilities. Experimental results on two real-world clinical datasets demonstrate that the proposed UA-CRNN method significantly outperforms state-of-the-art methods in both short-term and long-term mortality risk predictions.

### Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images

• Changhee Han
• Kohei Murao
• Tomoyuki Noguchi
• Yusuke Kawata
• Fumiya Uchiyama
• Leonardo Rundo
• Hideki Nakayama
• Shin'ichi Satoh

Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realistic but completely different from the original ones, filling the data lack in the real image distribution. However, we cannot easily use them to locate disease areas, considering expert physicians' expensive annotation cost. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating highly-rough bounding box conditions incrementally into PGGANs to place brain metastases at desired positions/sizes on 256 × 256 Magnetic Resonance (MR) images, for Convolutional Neural Network-based tumor detection; this first GAN-based medical DA using automatic bounding box annotation improves the training robustness. The results show that CPGGAN-based DA can boost 10% sensitivity in diagnosis with clinically acceptable additional False Positives. Surprisingly, further tumor realism, achieved with additional normal brain MR images for CPGGAN training, does not contribute to detection performance, while even three physicians cannot accurately distinguish them from the real ones in Visual Turing Test.

### Domain Knowledge Guided Deep Atrial Fibrillation Classification and Its Visual Interpretation

• Xiaoyu Li
• Jishang Wei
• Xianli Zhang
• Sirui Chen
• Qinghua Zheng
• none none

Hand-crafted features have been proven useful in solving the electrocardiograph~(ECG) classification problem. The features rely on domain knowledge and carry clinical meanings. However, the construction of the features requires tedious fine tuning in practice. Lately, a set of end-to-end deep neural network models have been proposed and show promising results in ECG classification. Though effective, such models learn patterns which usually mismatch human's concept, and thereby it is hard to get a convincing explanation with interpretation methods. This limitation significantly narrows the applicability of deep models, considering it is difficult for cardiologists to accept the unexplainable results from deep learning. To alleviate such limitation, we are bringing the best from the two worlds and propose a domain knowledge guided deep neural network. Specifically, we utilize a deep residual network as a classification framework, within which key feature ~(P-wave and R-peak position) reconstruction tasks are adopted to incorporate domain knowledge in the learning process. The reconstruction tasks make the model pay more attention to key feature points within ECG. Furthermore, we utilize occlusion method to get visual interpretation and design a visualization at both heartbeat level and feature point level. Our experiments show the superior performance of the proposed ECG classification methods compared to the model without P-wave and R-peak tasks, and the patterns learnt by our model is more explainable.

### Question Difficulty Prediction for Multiple Choice Problems in Medical Exams

• Zhaopeng Qiu
• Xian Wu
• Wei Fan

In the ITS (Intelligent Tutoring System) services, personalized question recommendation is a critical function in which the key challenge is to predict the difficulty of each question. Given the difficulty of each question, ITS can allocate suitable questions for students with varied knowledge proficiency. Existing approaches mainly relied on expert labeling, which is both subjective and labor intensive. In this paper, we propose a Document enhanced Attention based neural Network(DAN) framework to predict the difficulty of multiple choice problems in medical exams. DAN consists of three major steps: (1) In addition to stem and options, DAN retrieves relevant medical documents to enrich the content of each question; (2) DAN breaks down the question's difficulty into two parts: the hardness for recalling the knowledge assessed by the question and the confusion degree to exclude distractors. For each part, DAN introduces corresponding attention layers to model it; (3) DAN combines two parts of difficulties together to predict the overall difficulty. We collect a real-world data set from one of the largest medical online education websites in China. And the experimental results demonstrate the effectiveness of the proposed framework.

### GRAPHENE: A Precise Biomedical Literature Retrieval Engine with Graph Augmented Deep Learning and External Knowledge Empowerment

• Sendong Zhao
• Chang Su
• Andrea Sboner
• Fei Wang

Effective biomedical literature retrieval (BLR) plays a central role inprecision medicine informatics. In this paper, we propose GRAPHENE,which is a deep learning based framework for precise BLR. GRAPHENEconsists of three main different modules 1) graph-augmented doc-ument representation learning; 2) query expansion and represen-tation learning and 3) learning to rank biomedical articles. Thegraph-augmented document representation learning module con-structs a document-concept graph containing biomedical conceptnodes and document nodes so that global biomedical related con-cept from external knowledge source can be captured, which isfurther connected to a BiLSTM so both local and global topics canbe explored. Query expansion and representation learning moduleexpands the query with abbreviations and different names, and thenbuilds a CNN-based model to convolve the expanded query andobtain a vector representation for each query. Learning to rank min-imizes a ranking loss between biomedical articles with the queryto learn the retrieval function. Experimental results on applyingour system to TREC Precision Medicine track data are provided todemonstrate its effectiveness.

## SESSION: Long - Computer Vision

### Video-level Multi-model Fusion for Action Recognition

• Xiaomin Wang
• Junsan Zhang
• Leiquan Wang
• Philip S. Yu
• Jie Zhu
• Haisheng Li

The approaches based on spatio-temporal features for video action recognition have emerged such as two-stream based methods and 3D convolution based methods. However, current methods suffer from the problems caused by partial observation, or restricted to single information modeling, and so on. Segment-level recognition results obtained from dense sampling can not represent the entire video and, therefore lead to partial observation. And a single model is hard to capture the complementary information on spacial, temporal and spatio-temporal information from video at the same time. Therefore, the challenge is to build the video-level representation and capture multiple information. In this paper, a video-level multi-model fusion action recognition method is proposed to solve these problems. Firstly, an efficient video-level 3D convolution model is proposed to get the global information in the video which assembling segment-level 3D convolution models. Secondly, a multi-model fusion architecture is proposed for video action recognition to capture multiple information. The spatial, temporal and spatio-temporal information are aggregate with SVM classifier. Experimental results show that this method achieves the state-of-the-art performance on the datasets of UCF-101(97.6%) without pre-training on Kinetics.

### Large Scale Landmark Recognition via Deep Metric Learning

• Andrei Boiarov
• Eduard Tyantov

This paper presents a novel approach for landmark recognition in images that we've successfully deployed at Mail.ru. This method enables us to recognize famous places, buildings, monuments, and other landmarks in user photos. The main challenge lies in the fact that it's very complicated to give a precise definition of what is and what is not a landmark. Some buildings, statues and natural objects are landmarks; others are not. There's also no database with a fairly large number of landmarks to train a recognition model. A key feature of using landmark recognition in a production environment is that the number of photos containing landmarks is extremely small. This is why the model should have a very low false positive rate as well as high recognition accuracy. We propose a metric learning-based approach that successfully deals with existing challenges and efficiently handles a large number of landmarks. Our method uses a deep neural network and requires a single pass inference that makes it fast to use in production. We also describe an algorithm for cleaning landmarks database which is essential for training a metric learning model. We provide an in-depth description of basic components of our method like neural network architecture, the learning strategy, and the features of our metric learning approach. We show the results of proposed solutions in tests that emulate the distribution of photos with and without landmarks from a user collection. We compare our method with others during these tests. The described system has been deployed as a part of a photo recognition solution at Cloud Mail.ru, which is the photo sharing and storage service at Mail.ru Group.

### Multi-stage Deep Classifier Cascades for Open World Recognition

• Xiaojie Guo
• Amir Alipour-Fanid
• Lingfei Wu
• Hemant Purohit
• Xiang Chen
• Kai Zeng
• Liang Zhao

At present, object recognition studies are mostly conducted in a closed lab setting with classes in test phase typically in training phase. However, real-world problem are far more challenging because: i)~new classes unseen in the training phase can appear when predicting; ii)~discriminative features need to evolve when new classes emerge in real time; and iii)~instances in new classes may not follow the "independent and identically distributed" (iid) assumption. Most existing work only aims to detect the unknown classes and is incapable of continuing to learn newer classes. Although a few methods consider both detecting and including new classes, all are based on the predefined handcrafted features that cannot evolve and are out-of-date for characterizing emerging classes. Thus, to address the above challenges, we propose a novel generic end-to-end framework consisting of a dynamic cascade of classifiers that incrementally learn their dynamic and inherent features. The proposed method injects dynamic elements into the system by detecting instances from unknown classes, while at the same time incrementally updating the model to include the new classes. The resulting cascade tree grows by adding a new leaf node classifier once a new class is detected, and the discriminative features are updated via an end-to-end learning strategy. Experiments on two real-world datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.

### Inferring Context from Pixels for Multimodal Image Classification

• Manan Shah
• Krishnamurthy Viswanathan
• Chun-Ta Lu
• Ariel Fuxman
• Zhen Li
• Aleksei Timofeev
• Chao Jia
• Chen Sun

Image classification models take image pixels as input and predict labels in a predefined taxonomy. While contextual information (e.g. text surrounding an image) can provide valuable orthogonal signals to improve classification, the typical setting in literature assumes the unavailability of text and thus focuses on models that rely purely on pixels. In this work, we also focus on the setting where only pixels are available in the input. However, we demonstrate that if we predict textual information from pixels, we can subsequently use the predicted text to train models that improve overall performance. We propose a framework that consists of two main components: (1) a phrase generator that maps image pixels to a contextual phrase, and (2) a multimodal model that uses textual features from the phrase generator and visual features from the image pixels to produce labels in the output taxonomy. The phrase generator is trained using web-based query-image pairs to incorporate contextual information associated with each image and has a large output space. We evaluate our framework on diverse benchmark datasets (specifically, the WebVision dataset for evaluating multi-class classification and OpenImages dataset for evaluating multi-label classification), demonstrating performance improvements over approaches based exclusively on pixels and showcasing benefits in prediction interpretability. We additionally present results to demonstrate that our framework provides improvements in few-shot learning of minimally labeled concepts. We further demonstrate the unique benefits of the multimodal nature of our framework by utilizing intermediate image/text co-embeddings to perform baseline zero-shot learning on the ImageNet dataset.

### Multi-Target Multi-Camera Tracking with Human Body Part Semantic Features

• Mingkun Wang
• Dianxi Shi
• Naiyang Guan
• Wei Yi
• Tao Zhang
• Zunlin Fan

Recently, Multi-Target Multi-Camera Tracking (MTMCT) has gained more and more attention. It is a challenging task with major problems including occlusion, background clutter, poses and camera point of view variations. Compared to single camera tracking, which takes advantage of location information and strict time constraints, good appearance features are more important to MTMCT. This drives us to extract robust and discriminative features for MTMCT. We propose MTMCT\_HS which uses human body part semantic features to overcome the above challenges. We use a two-stream deep neural network to extract the global appearance features and human body part semantic maps separately, and employ aggregation operations to generate final features. We argue that these features are more suitable for affinity measurement, which can be seen as the average of appearance similarity weighted by the corresponding human body part similarity. Next, our tracker adopts a hierarchical correlation clustering algorithm, which combines targets' appearance feature similarity with motion correlation for data association. We validate the effectiveness of our MTMCT\_HS method by demonstrating its superiority over the state-of-the-art method on DukeMTMC benchmark. Experiments show that the extracted features with human body part semantics are more effective for MTMCT compared with the methods solely employing global appearance features.

## SESSION: Long - Database and System

### Efficient Join Processing Over Incomplete Data Streams

• Weilong Ren
• Xiang Lian
• Kambiz Ghazinour

For decades, the join operator over fast data streams has always drawn much attention from the database community, due to its wide spectrum of real-world applications, such as online clustering, intrusion detection, sensor data monitoring, and so on. Existing works usually assume that the underlying streams to be joined are complete (without any missing values). However, this assumption may not always hold, since objects from streams may contain some missing attributes, due to various reasons such as packet losses, network congestion/failure, and so on. In this paper, we formalize an important problem, namely join over incomplete data streams (Join-iDS), which retrieves joining object pairs from incomplete data streams with high confidences. We tackle the Join-iDS problem in the style of "data imputation and query processing at the same time". To enable this style, we design an effective and efficient cost-model-based imputation method via deferential dependency (DD), devise effective pruning strategies to reduce the Join-iDS search space, and propose efficient algorithms via our proposed cost-model-based data synopsis/indexes. Extensive experiments have been conducted to verify the efficiency and effectiveness of our proposed Join-iDS approach on both real and synthetic data sets.

### Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms

• Falco Dürsch
• Axel Stebner
• Fabian Windheuser
• Maxi Fischer
• Tim Friedrich
• Nils Strelow
• Tobias Bleifuß
• Hazar Harmouch
• Lan Jiang
• Thorsten Papenbrock
• Felix Naumann

Inclusion dependencies are an important type of metadata in relational databases, because they indicate foreign key relationships and serve a variety of data management tasks, such as data linkage, query optimization, and data integration. The discovery of inclusion dependencies is, therefore, a well-studied problem and has been addressed by many algorithms. Each of these discovery algorithms follows its own strategy with certain strengths and weaknesses, which makes it difficult for data scientists to choose the optimal algorithm for a given profiling task.

This paper summarizes the different state-of-the-art discovery approaches and discusses their commonalities. For evaluation purposes, we carefully re-implemented the thirteen most popular discovery algorithms and discuss their individual properties. Our extensive evaluation on several real-world and synthetic datasets shows the unbiased performance of the different discovery approaches and, hence, provides a guideline on when and where each approach works best. Comparing the different runtimes and scalability graphs, we identify the best approaches for certain situations and demonstrate where certain algorithms fail.

### Constructing a Comprehensive Events Database from the Web

• Qifan Wang
• Bhargav Kanagal
• Vijay Garg
• D. Sivakumar

In this paper, we consider the problem of constructing a comprehensive database of events taking place around the world. Events include small hyper-local events like farmer's markets, neighborhood garage sales, as well as larger concerts and festivals. Designing a high-precision and high-recall event extractor from unstructured pages across the whole web is a challenging problem. We cannot resort overly to domain-specific strategies since it needs to work on all web pages, including on new domains; we need to account for variations in page layouts and structure across websites. Further, we need to deal with low-quality pages on the web with limited structure. We have built an ML-powered extraction system to solve this problem, using schema.org annotations as training data. Our extraction system operates in two phases. In the first phase, we generate raw event information from individual web pages. To do this, an \em event page classifier predicts if a web page contains any event information; this is then followed by a \em single/multiple classifier that decides if the page contains a single event or multiple events; the first phase concludes by applying \em event extractors that extract the key fields of a public event (the title, the date/time information, and the location information). In the second phase, we further improve the extraction quality via three novel algorithms, \em repeated patterns, \em event consolidation and \em wrapper induction, which are designed to use the raw event extractions as input and generate events whose quality is significantly higher. We evaluate our extraction models on two large scale publicly available web corpus, Common Crawl and ClueWeb12. Experimental analysis shows that our methodology achieves over 95% extraction precision and recall on both datasets.

### Deploying Hash Tables on Die-Stacked High Bandwidth Memory

• Xuntao Cheng
• Bingsheng He
• Eric Lo
• Wei Wang
• Shengliang Lu
• Xinyu Chen

Die-stacked High Bandwidth Memory (HBM) is an emerging memory architecture that achieves much higher memory bandwidth with similar or lower memory access latency and smaller capacity, compared with main memories. Memory-intensive database algorithms may potentially benefit from these new features. Due to the small capacity of such die-stacked HBM, a hybrid memory architecture comprising both main memories and HBMs is promising for main-memory databases. As a starting point, we study a key data structure, hash tables, in such a hybrid memory architecture. In a large hash table distributed among multiple NUMA (non-uniform memory accesses) nodes and accessed by multiple CPU sockets, the data placement and memory access scheduling for workload balance are challenging due to the random memory accesses involved that are difficult to predict. In this work, we propose a deployment algorithm that first estimates the memory access cost and then places data in a way that exploits the hybrid memory architecture in a balanced manner. Evaluation results show that the proposed deployment is able to achieve up to three times performance improvement over the state-of-the-art NUMA-aware scheduling algorithms for hash joins in relational databases on present and simulated future hybrid memory architectures.

## SESSION: Long - Domain Adaptation and Transfer Learning

• Chaozhuo Li
• Senzhang Wang
• Hao Wang
• Yanbo Liang
• Philip S. Yu
• Zhoujun Li
• Wei Wang

With the increasing popularity and diversity of social media, users tend to join multiple social platforms to enjoy different types of services. User identity linkage, which aims to link identical identities across different social platforms, has attracted increasing research attentions recently. Existing methods usually focus on pairwise identity linkage between two platforms, which cannot piece up the information from multi-sources to depict the intrinsic figures of social users. In this paper, we propose a novel adversarial learning based framework MSUIL with partially shared generators to perform Semi-supervised User Identity Linkage across Multiple social networks. The isomorphism across multiple platforms is captured as the complementary to link identities. The insight is that we aim to learn the desirable projection functions (generators) to not only minimize the distance between the distributions of user identities in arbitrary pairs of platforms, but also incorporate the available annotations as the learning guidance. The projection functions of different platform pairs share partial parameters, which ensures MSUIL can capture the interdependencies among multiple platforms and improves the model efficiency. Empirically, we evaluate our proposal over multiple datasets. The experimental results demonstrate the superiority of the proposed MSUIL model.

• Manliang Cao
• Xiangdong Zhou
• Yiming Xu
• Yue Pang
• Bo Yao

In the cross-domain image classification scenario, domain adaption aims to address the challenge of transferring the knowledge obtained from the source domain to the target domain that is regarded as similar but different from the source domain. To get more reliable domain invariant representations, recent methods start to consider class-level distribution alignment across the source and target domains by adaptively assigning pseudo target labels. However, these approaches are vulnerable to the error accumulation and hence unable to preserve cross-domain category consistency. Because the accuracy of pseudo labels cannot be guaranteed explicitly. In this paper, we propose Adversarial Domain Adaptation with Semantic Consistency (ADASC) model to align the discriminative features across domains progressively and effectively, via exploiting the class-level relations between domains. Specifically, to simultaneously alleviate the negative influence of the false pseudo-target labels and get the discriminative domain invariant features, we introduce an Adaptive Centroid Alignment (ACA) strategy and a Class Discriminative Constraint (CDC) step to complement each other iteratively and alternatively in an end-to-end framework. Extensive experiments are conducted on several unsupervised domain adaptation datasets, and the results show that ADASC outperforms the state-of-the-art methods.

### ATL: Autonomous Knowledge Transfer from Many Streaming Processes

• Mahardhika Pratama
• Marcus de Carvalho
• Renchunzi Xie
• Edwin Lughofer
• Jie Lu

Transferring knowledge across many streaming processes remains an uncharted territory in the existing literature and features unique characteristics: no labelled instance of the target domain, covariate shift of source and target domain, different period of drifts in the source and target domains. Autonomous transfer learning (ATL) is proposed in this paper as a flexible deep learning approach for the online unsupervised transfer learning problem across many streaming processes. ATL offers an online domain adaptation strategy via the generative and discriminative phases coupled with the KL divergence based optimization strategy to produce a domain invariant network while putting forward an elastic network structure. It automatically evolves its network structure from scratch with/without the presence of ground truth to overcome independent concept drifts in the source and target domain. Rigorous numerical evaluation has been conducted along with comparison against recently published works. ATL demonstrates improved performance while showing significantly faster training speed than its counterparts.

### Knowledge Transfer based on Multiple Manifolds Assumption

• Pengfei Wei
• Yiping Ke

Unsupervised domain adaptation is a popular but challenging problem setting. Existing unsupervised domain adaptation methods are based on the single manifold assumption, i.e., data are sampled from a single low-dimensional manifold, and thus may not well capture the complex characteristic of the real-world data. In this paper, we propose to transfer knowledge across domains under the multiple manifolds assumption that assumes the data are sampled from multiple low-dimensional manifolds. Specifically, we develop a multiple manifolds information transfer framework (MMIT). The proposed MMIT aims to transfer the multiple manifolds information, which is represented by the data manifold neighborhood structure, with the the best adaptation capacity. To do so, we propose to couple the multiple manifolds information transfer with the domain distribution discrepancy minimization in the adaptation procedure. Experimental studies demonstrate that MMIT achieves the promising adaptation performance on various real-world adaptation tasks.

### Cross-domain Aspect Category Transfer and Detection via Traceable Heterogeneous Graph Representation Learning

• Zhuoren Jiang
• Jian Wang
• Lujun Zhao
• Changlong Sun
• Yao Lu
• Xiaozhong Liu

Aspect category detection is an essential task for sentiment analysis and opinion mining. However, the cost of categorical data labeling, e.g., label the review aspect information for a large number of product domains, can be inevitable but unaffordable. In this study, we propose a novel problem, cross-domain aspect category transfer and detection, which faces three challenges: various feature spaces, different data distributions, and diverse output spaces. To address these problems, we propose an innovative solution, Traceable Heterogeneous Graph Representation Learning (THGRL). Unlike prior text-based aspect detection works, THGRL explores latent domain aspect category connections via massive user behavior information on a heterogeneous graph. Moreover, an innovative latent variable "Walker Tracer" is introduced to characterize the global semantic/aspect dependencies and capture the informative vertexes on the random walk paths. By using THGRL, we project different domains' feature spaces into a common one, while allowing data distributions and output spaces stay differently. Experiment results show that the proposed method outperforms a series of state-of-the-art baseline models.

## SESSION: Long - E-Commerce and Advertising I

### A Deep Neural Framework for Sales Forecasting in E-Commerce

• Yan Qi
• Chenliang Li
• Han Deng
• Min Cai
• Yunwei Qi
• Yuming Deng

Product sales forecasting plays a fundamental role in enhancing timeliness of product delivery in E-Commerce. Among many heterogeneous features relevant to sales forecasting, promotion campaigns held in E-Commerce and competing relation between substitutable products would greatly complicate the matter. Unfortunately, these factors are usually overlooked in the existing literature, since the conventional time series analysis based techniques mainly consider the sales records alone. In this paper, we propose a novel deep neural framework for sales forecasting in E-Commerce, named DSF. In DSF, sales forecasting is formulated as a sequence-to-sequence learning problem where the sales is estimated in a recurrent fashion. On top of the decoder, we introduce a sales residual network to explicitly model the impact of competing relation when a promotion campaign is launched for a target item or some substitutable counterparts. Extensive experiments are conducted over two real-world datasets in different domains from Taobao E-Commerce platform. Our results demonstrate that the proposed DSF obtains substantial performance gain over the traditional baselines and up-to-date deep learning alternatives in terms of forecasting accuracy. Further comparison shows that DSF has also surpassed the deep learning based solution currently depolyed in Taobao platform.

### An Active and Deep Semantic Matching Framework for Query Rewrite in E-Commercial Search Engine

• Yatao Yang
• Jun Tan
• Hongbo Deng
• Zibin Zheng
• Yutong Lu
• Xiangke Liao

In order to make the query retrieve much more related products, some query rewrite methods have been proposed to obtain a set of candidate queries which can infer users' search intents and reduce the vocabulary gap between the original query and title of related products. However, previous studies ignore that some candidate queries may change users' search intents and retrieve irrelevant products. As a result, users' search experience will be impacted significantly. To reduce this influence, we need to design a semantic matching model to determine whether the candidate query change the original query's search intents (semantics). In addition, building a semantic matching model faces the following challenges: 1) Queries are usually very short and have limited information. It is very hard to learn an effective semantic matching model with the textual information of queries and candidate queries. 2) In order to get a generalized and effective mode, sufficient data samples are required to train the model. However, the cost of labeling is very huge. In order to address the above challenges, we propose an active and deep semantic matching framework (ActiveMatch) which is composed of two components. One component is the deep semantic matching (DSM) model which can make full use of the search log information to enhance the representation of queries and candidate queries. Then, it can estimate the semantic similarity between the original query and the candidate query more accurately. The other component is an uncertainty and novelty sampling (UNS) strategy which selects the samples to label based on the difficulty of the model estimating and the probability of the occurrence of new words. It not only reduces the cost of labeling but also ensures the effectiveness of the model. The experimental results on the Taobao e-commercial search platform verify the effectiveness of our framework.

### AIBox: CTR Prediction Model Training on a Single Node

• Weijie Zhao
• Jingyuan Zhang
• Deping Xie
• Yulei Qian
• Ronglai Jia
• Ping Li

As one of the major search engines in the world, Baidu's Sponsored Search has long adopted the use of deep neural network (DNN) models for Ads click-through rate (CTR) predictions, as early as in 2013. The input futures used by Baidu's online advertising system (a.k.a. "Phoenix Nest'') are extremely high-dimensional (e.g., hundreds or even thousands of billions of features) and also extremely sparse. The size of the CTR models used by Baidu's production system can well exceed 10TB. This imposes tremendous challenges for training, updating, and using such models in production. For Baidu's Ads system, it is obviously important to keep the model training process highly efficient so that engineers (and researchers) are able to quickly refine and test their new models or new features. Moreover, as billions of user ads click history entries are arriving every day, the models have to be re-trained rapidly because CTR prediction is an extremely time-sensitive task. Baidu's current CTR models are trained on MPI (Message Passing Interface) clusters, which require high fault tolerance and synchronization that incur expensive communication and computation costs. And, of course, the maintenance costs for clusters are also substantial. This paper presents AIBox, a centralized system to train CTR models with tens-of-terabytes-scale parameters by employing solid-state drives (SSDs) and GPUs. Due to the memory limitation on GPUs, we carefully partition the CTR model into two parts: one is suitable for CPUs and another for GPUs. We further introduce a bi-level cache management system over SSDs to store the 10TB parameters while providing low-latency accesses. Extensive experiments on production data reveal the effectiveness of the new system. AIBox has comparable training performance with a large MPI cluster, while requiring only a small fraction of the cost for the cluster.

### Improving Ad Click Prediction by Considering Non-displayed Events

• Bowen Yuan
• Jui-Yang Hsia
• Meng-Yuan Yang
• Hong Zhu
• Chih-Yao Chang
• Zhenhua Dong
• Chih-Jen Lin

Click-through rate (CTR) prediction is the core problem of building advertising systems. Most existing state-of-the-art approaches model CTR prediction as binary classification problems, where displayed events with and without click feedbacks are respectively considered as positive and negative instances for training and offline validation. However, due to the selection mechanism applied in most advertising systems, a selection bias exists between distributions of displayed and non-displayed events. Conventional CTR models ignoring the bias may have inaccurate predictions and cause a loss of the revenue. To alleviate the bias, we need to conduct counterfactual learning by considering not only displayed events but also non-displayed events. In this paper, through a review of existing approaches of counterfactual learning, we point out some difficulties for applying these approaches for CTR prediction in a real-world advertising system. To overcome these difficulties, we propose a novel framework for counterfactual CTR prediction. In experiments, we compare our proposed framework against state-of-the-art conventional CTR models and existing counterfactual learning approaches. Experimental results show significant improvements.

### Approximation Algorithms for Coordinating Ad Campaigns on Social Networks

• Kartik Lakhotia
• David Kempe

## SESSION: Long - E-Commerce and Advertising II

### Regularized Adversarial Sampling and Deep Time-aware Attention for Click-Through Rate Prediction

• Yikai Wang
• Liang Zhang
• Quanyu Dai
• Fuchun Sun
• Bo Zhang
• Yang He
• Weipeng Yan
• Yongjun Bao

Improving the performance of click-through rate (CTR) prediction remains one of the core tasks in online advertising systems. With the rise of deep learning, CTR prediction models with deep networks remarkably enhance model capacities. In deep CTR models, exploiting users' historical data is essential for learning users' behaviors and interests. As existing CTR prediction works neglect the importance of the temporal signals when embed users' historical clicking records, we propose a time-aware attention model which explicitly uses absolute temporal signals for expressing the users' periodic behaviors and relative temporal signals for expressing the temporal relation between items. Besides, we propose a regularized adversarial sampling strategy for negative sampling which eases the classification imbalance of CTR data and can make use of the strong guidance provided by the observed negative CTR samples. The adversarial sampling strategy significantly improves the training efficiency, and can be co-trained with the time-aware attention model seamlessly. Experiments are conducted on real-world CTR datasets from both in-station and out-station advertising places.

### Conversational Product Search Based on Negative Feedback

• Keping Bi
• Qingyao Ai
• Yongfeng Zhang
• W. Bruce Croft

Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. For example, previous work proposed to ask users the exact characteristics of their ideal items before showing results. However, users may not have clear ideas about what an ideal item looks like, especially when they have not seen any item. So it is more feasible to facilitate the conversational search by showing example items and asking for feedback instead. In addition, when the users provide negative feedback for the presented items, it is easier to collect their detailed feedback on certain properties (aspect-value pairs) of the non-relevant items. By breaking down the item-level negative feedback to fine-grained feedback on aspect-value pairs, more information is available to help clarify users' intents. So in this paper, we propose a conversational paradigm for product search driven by non-relevant items, based on which fine-grained feedback is collected and utilized to show better results in the next iteration. We then propose an aspect-value likelihood model to incorporate both positive and negative feedback on fine-grained aspect-value pairs of the non-relevant items. Experimental results show that our model is significantly better than state-of-the-art product search baselines without using feedback and those baselines using item-level negative feedback.

### Learning to Ask: Question-based Sequential Bayesian Product Search

• Jie Zou
• Evangelos Kanoulas

Product search is generally recognized as the first and foremost stage of online shopping and thus significant for users and retailers of e-commerce. Most of the traditional retrieval methods use some similarity functions to match the user's query and the document that describes a product, either directly or in a latent vector space. However, user queries are often too general to capture the minute details of the specific product that a user is looking for. In this paper, we propose a novel interactive method to effectively locate the best matching product. The method is based on the assumption that there is a set of candidate questions for each product to be asked. In this work, we instantiate this candidate set by making the hypothesis that products can be discriminated by the entities that appear in the documents associated with them. We propose a Question-based Sequential Bayesian Product Search method, QSBPS, which directly queries users on the expected presence of entities in the relevant product documents. The method learns the product relevance as well as the reward of the potential questions to be asked to the user by being trained on the search history and purchase behavior of a specific user together with that of other users. The experimental results show that the proposed method can greatly improve the performance of product search compared to the state-of-the-art baselines.

### A Zero Attention Model for Personalized Product Search

• Qingyao Ai
• Daniel N. Hill
• S. V. N. Vishwanathan
• W. Bruce Croft

Product search is one of the most popular methods for people to discover and purchase products on e-commerce websites. Because personal preferences often have an important influence on the purchase decision of each customer, it is intuitive that personalization should be beneficial for product search engines. While synthetic experiments from previous studies show that purchase histories are useful for identifying the individual intent of each product search session, the effect of personalization on product search in practice, however, remains mostly unknown. In this paper, we formulate the problem of personalized product search and conduct large-scale experiments with search logs sampled from a commercial e-commerce search engine. Results from our preliminary analysis show that the potential of personalization depends on query characteristics, interactions between queries, and user purchase histories. Based on these observations, we propose a Zero Attention Model for product search that automatically determines when and how to personalize a user-query pair via a novel attention mechanism. Empirical results on commercial product search logs show that the proposed model not only significantly outperforms state-of-the-art personalized product retrieval models, but also provides important information on the potential of personalization in each product search session.

### Learning to Generate Personalized Product Descriptions

• Ido Guy
• Slava Novgorodov
• Benny Kimelfeld

Personalization plays a key role in electronic commerce, adjusting the products presented to users through search and recommendations according to their personality and tastes. Current personalization efforts focus on the adaptation of product selections, while the description of a given product remains the same regardless of the user who views it. In this work, we propose an approach to personalize product descriptions according to the personality of an individual user. To the best of our knowledge, we are the first to address the problem of generating personalized product descriptions. We first learn to predict a user's personality based on past activity on an e-commerce website. Then, given a user personality, we propose an extractive summarization-based algorithm that selects the sentences to be used as part of a product description in accordance with the given personality. Our evaluation shows that user personality can be effectively learned from past e-commerce activity, while personalized descriptions can lead to a higher interest in the product and increased purchase likelihood.

## SESSION: Long - Network Embedding I

### Fast and Accurate Network Embeddings via Very Sparse Random Projection

• Haochen Chen
• Yingtao Tian
• Muhao Chen
• Steven Skiena

We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. The source code is available at https://github.com/GTmac/FastRP.

### Hierarchical Community Structure Preserving Network Embedding: A Subspace Approach

• Qingqing Long
• Yiming Wang
• Lun Du
• Guojie Song
• Yilun Jin
• Wei Lin

To depict ubiquitous relational data in real world, network data have been widely applied in modeling complex relationships. Projecting vertices to low dimensional spaces, quoted as Network Embedding, would thus be applicable to diverse real-world predicative tasks. Numerous works exploiting pairwise proximities, one characteristic owned by real networks, the clustering property, namely vertices are inclined to form communities of various ranges and hence form a hierarchy consisting of communities, has barely received attention from researchers. In this paper, we propose our network embedding framework, abbreviated SpaceNE, preserving hierarchies formed by communities through subspaces, manifolds with flexible dimensionalities and are inherently hierarchical. Moreover, we propose that subspaces are able to address further problems in representing hierarchical communities, including sparsity and space warps. Last but not least, we proposed constraints on dimensions of subspaces to denoise, which are further approximated by differentiable functions such that joint optimization is enabled, along with a layer-wise scheme to alleviate the overhead cause by the vast number of parameters. We conduct various experiments with results demonstrating our model's effectiveness in addressing community hierarchies.

• Yizhu Jiao
• Yun Xiong
• Jiawei Zhang
• Yangyong Zhu

### Discerning Edge Influence for Network Embedding

• Yaojing Wang
• Yuan Yao
• Hanghang Tong
• Feng Xu
• Jian Lu

Network embedding, which learns the low-dimensional representations of nodes, has gained significant research attention. Despite its superior empirical success, often measured by the prediction performance of downstream tasks (e.g., multi-label classification), it is unclear \em why a given embedding algorithm outputs the specific node representations, and \em how the resulting node representations relate to the structure of the input network. In this paper, we propose to discern the edge influence as the first step towards understanding skip-gram basd network embedding methods. For this purpose, we propose an auditing framework Near, whose key part includes two algorithms (Near-add \ and Near-del ) to effectively and efficiently quantify the influence of each edge. Based on the algorithms, we further identify high-influential edges by exploiting the linkage between edge influence and the network structure. Experimental results demonstrate that the proposed algorithms (Near-add \ and Near-del ) are significantly faster (up to $2,000\times$) than straightforward methods with little quality loss. Moreover, the proposed framework can efficiently identify the most influential edges for network embedding in the context of downstream prediction task and adversarial attacking.

### Constrained Co-embedding Model for User Profiling in Question Answering Communities

• Yupeng Luo
• Shangsong Liang
• Zaiqiao Meng

## SESSION: Long - Network Embedding II

### Hyper-Path-Based Representation Learning for Hyper-Networks

• Jie Huang
• Xin Liu
• Yangqiu Song

Network representation learning has aroused widespread interests in recent years. While most of the existing methods deal with edges as pairwise relationships, only a few studies have been proposed for hyper-networks to capture more complicated tuplewise relationships among multiple nodes. A hyper-network is a network where each edge, called hyperedge, connects an arbitrary number of nodes. Different from conventional networks, hyper-networks have certain degrees of indecomposability such that the nodes in a subset of a hyperedge may not possess a strong relationship. That is the main reason why traditional algorithms fail in learning representations in hyper-networks by simply decomposing hyperedges into pairwise relationships. In this paper, we firstly define a metric to depict the degrees of indecomposability for hyper-networks. Then we propose a new concept called hyper-path and design hyper-path-based random walks to preserve the structural information of hyper-networks according to the analysis of the indecomposability. Then a carefully designed algorithm, Hyper-gram, utilizes these random walks to capture both pairwise relationships and tuplewise relationships in the whole hyper-networks. Finally, we conduct extensive experiments on several real-world datasets covering the tasks of link prediction and hyper-network reconstruction, and results demonstrate the rationality, validity, and effectiveness of our methods compared with those existing state-of-the-art models designed for conventional networks or hyper-networks.

### Multi-Hot Compact Network Embedding

• Chaozhuo Li
• Lei Zheng
• Senzhang Wang
• Feiran Huang
• Philip S. Yu
• Zhoujun Li

Network embedding, as a promising way of the network representation learning, is capable of supporting various subsequent network mining and analysis tasks, and has attracted growing research interests recently. Traditional approaches assign each node with an independent continuous vector, which will cause memory overhead for large networks. In this paper we propose a novel multi-hot compact network embedding framework to effectively reduce memory cost by learning partially shared embeddings. The insight is that a node embedding vector is composed of several basis vectors according to a multi-hot index vector. The basis vectors are shared by different nodes, which can significantly reduce the number of continuous vectors while maintain similar data representation ability. Specifically, we propose a MCNE$_p$ model to learn compact embeddings from pre-learned node features. A novel component named compressor is integrated into MCNE$_p$ to tackle the challenge that popular back-propagation optimization cannot propagate loss through discrete samples. We further propose an end-to-end model MCNE$_t$ to learn compact embeddings from the input network directly. Empirically, we evaluate the proposed models over four real network datasets, and the results demonstrate that our proposals can save about 90% of memory cost of network embeddings without significantly performance decline.

### Temporal Network Embedding with Micro- and Macro-dynamics

• Yuanfu Lu
• Xiao Wang
• Chuan Shi
• Philip S. Yu
• Yanfang Ye

Network embedding aims to embed nodes into a low-dimensional space, while capturing the network structures and properties. Although quite a few promising network embedding methods have been proposed, most of them focus on static networks. In fact, temporal networks, which usually evolve over time in terms of microscopic and macroscopic dynamics, are ubiquitous. The micro-dynamics describe the formation process of network structures in a detailed manner, while the macro-dynamics refer to the evolution pattern of the network scale. Both micro- and macro-dynamics are the key factors to network evolution; however, how to elegantly capture both of them for temporal network embedding, especially macro-dynamics, has not yet been well studied. In this paper, we propose a novel temporal network embedding method with micro- and macro-dynamics, named $\rmM^2DNE$. Specifically, for micro-dynamics, we regard the establishments of edges as the occurrences of chronological events and propose a temporal attention point process to capture the formation process of network structures in a fine-grained manner. For macro-dynamics, we define a general dynamics equation parameterized with network embeddings to capture the inherent evolution pattern and impose constraints in a higher structural level on network embeddings. Mutual evolutions of micro- and macro-dynamics in a temporal network alternately affect the process of learning node embeddings. Extensive experiments on three real-world temporal networks demonstrate that $\rmM^2DNE$ significantly outperforms the state-of-the-arts not only in traditional tasks, e.g., network reconstruction, but also in temporal tendency-related tasks, e.g., scale prediction.

### MrMine: Multi-resolution Multi-network Embedding

• Boxin Du
• Hanghang Tong

Network embedding has become the cornerstone of a variety of mining tasks, such as classification, link prediction, clustering, anomaly detection and many more, thanks to its superior ability to encode the intrinsic network characteristics in a compact low-dimensional space. Most of the existing methods focus on a single network and/or a single resolution, which generate embeddings of different network objects (node/subgraph/network) from different networks separately. A fundamental limitation with such methods is that the intrinsic relationship across different networks (e.g., two networks share same or similar subgraphs) and that across different resolutions (e.g., the node-subgraph membership) are ignored, resulting in disparate embeddings. Consequentially, it leads to sub-optimal performance or even becomes inapplicable for some downstream mining tasks (e.g., role classification, network alignment. etc.). In this paper, we propose a unified framework MrMine to learn the representations of objects from multiple networks at three complementary resolutions (i.e., network, subgraph and node) simultaneously. The key idea is to construct the cross-resolution cross-network context for each object. The proposed method bears two distinctive features. First, it enables and/or boosts various multi-network downstream mining tasks by having embeddings at different resolutions from different networks in the same embedding space. Second, Our method is efficient and scalable, with a O(nlog(n)) time complexity for the base algorithm and a linear time complexity w.r.t. the number of nodes and edges of input networks for the accelerated version. Extensive experiments on real-world data show that our methods (1) are able to enable and enhance a variety of multi-network mining tasks, and (2) scale up to million-node networks.

### Task-Guided Pair Embedding in Heterogeneous Network

• Chanyoung Park
• Donghyun Kim
• Qi Zhu
• Jiawei Han
• Hwanjo Yu

Many real-world tasks solved by heterogeneous network embedding methods can be cast as modeling the likelihood of a pairwise relationship between two nodes. For example, the goal of author identification task is to model the likelihood of a paper being written by an author (paper-author pairwise relationship). Existing taskguided embedding methods are node-centric in that they simply measure the similarity between the node embeddings to compute the likelihood of a pairwise relationship between two nodes. However, we claim that for task-guided embeddings, it is crucial to focus on directly modeling the pairwise relationship. In this paper, we propose a novel task-guided pair embedding framework in heterogeneous network, called TaPEm, that directly models the relationship between a pair of nodes that are related to a specific task (e.g., paper-author relationship in author identification). To this end, we 1) propose to learn a pair embedding under the guidance of its associated context path, i.e., a sequence of nodes between the pair, and 2) devise the pair validity classifier to distinguish whether the pair is valid with respect to the specific task at hand. By introducing pair embeddings that capture the semantics behind the pairwise relationships, we are able to learn the fine-grained pairwise relationship between two nodes, which is paramount for task-guided embedding methods. Extensive experiments on author identification task demonstrate that TaPEm outperforms the state-of-the-art methods, especially for authors with few publication records.

## SESSION: Long - Graph Nerual Network I

### Graph Convolutional Networks with Motif-based Attention

• John Boaz Lee
• Ryan A. Rossi
• Xiangnan Kong
• Sungchul Kim
• Eunyee Koh
• Anup Rao

The success of deep convolutional neural networks in the domains of computer vision and speech recognition has led researchers to investigate generalizations of the said architecture to graph-structured data. A recently-proposed method called Graph Convolutional Networks has been able to achieve state-of-the-art results in the task of node classification. However, since the proposed method relies on localized first-order approximations of spectral graph convolutions, it is unable to capture higher-order interactions between nodes in the graph. In this work, we propose a motif-based graph attention model, called Motif Convolutional Networks, which generalizes past approaches by using weighted multi-hop motif adjacency matrices to capture higher-order neighborhoods. A novel attention mechanism is used to allow each individual node to select the most relevant neighborhood to apply its filter. We evaluate our approach on graphs from different domains (social networks and bioinformatics) with results showing that it is able to outperform a set of competitive baselines on the semi-supervised node classification task. Additional results demonstrate the usefulness of attention, showing that different higher-order neighborhoods are prioritized by different kinds of nodes.

### Long-tail Hashtag Recommendation for Micro-videos with Graph Convolutional Network

• Mengmeng Li
• Tian Gan
• Meng Liu
• Zhiyong Cheng
• Jianhua Yin
• Liqiang Nie

Hashtags, a user provides to a micro-video, are the ones which can well describe the semantics of the micro-video's content in his/her mind. At the same time, hashtags have been widely used to facilitate various micro-video retrieval scenarios (e.g., search, browse, and categorization). Despite their importance, numerous micro-videos lack hashtags or contain inaccurate or incomplete hashtags. In light of this, hashtag recommendation, which suggests a list of hashtags to a user when he/she wants to annotate a post, becomes a crucial research problem. However, little attention has been paid to micro-video hashtag recommendation, mainly due to the following three reasons: 1) lack of benchmark dataset; 2) the temporal and multi-modality characteristics of micro-videos; and 3) hashtag sparsity and long-tail distributions. In this paper, we recommend hashtags for micro-videos by presenting a novel multi-view representation interactive embedding model with graph-based information propagation. It is capable of boosting the performance of micro-videos hashtag recommendation by jointly considering the sequential feature learning, the video-user-hashtag interaction, and the hashtag correlations. Extensive experiments on a constructed dataset demonstrate our proposed method outperforms state-of-the-art baselines. As a side research contribution, we have released our dataset and codes to facilitate the research in this community.

### Hashing Graph Convolution for Node Classification

• Wenting Zhao
• Zhen Cui
• Chunyan Xu
• Chengzheng Li
• Tong Zhang
• Jian Yang

Convolution on graphs has aroused great interest in AI due to its potential applications to non-gridded data. To bypass the influence of ordering and different node degrees, the summation/average diffusion/aggregation is often imposed on local receptive field in most prior works. However, the collapsing into one node in this way tends to cause signal entanglements of nodes, which would result in a sub-optimal feature and decrease the discriminability of nodes. To address this problem, in this paper, we propose a simple but effective Hashing Graph Convolution (HGC) method by using global-hashing and local-projection on node aggregation for the task of node classification. In contrast to the conventional aggregation with a full collision, the hash-projection can greatly reduce the collision probability during gathering neighbor nodes. Another incidental effect of hash-projection is that the receptive field of each node is normalized into a common-size bucket space, which not only staves off the trouble of different-size neighbors and their order but also makes a graph convolution run like the standard shape-gridded convolution. Considering the few training samples, also, we introduce a prediction-consistent regularization term into HGC to constrain the score consistency of unlabeled nodes in the graph. HGC is evaluated on both transductive and inductive experimental settings and achieves new state-of-the-art results on all datasets for node classification task. The extensive experiments demonstrate the effectiveness of hash-projection.

### Relation-Aware Graph Convolutional Networks for Agent-Initiated Social E-Commerce Recommendation

• Fengli Xu
• Jianxun Lian
• Zhenyu Han
• Yong Li
• Yujian Xu
• Xing Xie

Recent years have witnessed a phenomenal success of agent-initiated social e-commerce models, which encourage users to become selling agents to promote items through their social connections. The complex interactions in this type of social e-commerce can be formulated as Heterogeneous Information Networks (HIN), where there are numerous types of relations between three types of nodes, i.e., users, selling agents and items. Learning high quality node embeddings is of key interest, and Graph Convolutional Networks (GCNs) have recently been established as the latest state-of-the-art methods in representation learning. However, prior GCN models have fundamental limitations in both modeling heterogeneous relations and efficiently sampling relevant receptive field from vast neighborhood. To address these problems, we propose RecoGCN, which stands for a RElation-aware CO-attentive GCN model, to effectively aggregate heterogeneous features in a HIN. It makes up current GCN's limitation in modelling heterogeneous relations with a relation-aware aggregator, and leverages the semantic-aware meta-paths to carve out concise and relevant receptive fields for each node. To effectively fuse the embeddings learned from different meta-paths, we further develop a co-attentive mechanism to dynamically assign importance weights to different meta-paths by attending the three-way interactions among users, selling agents and items. Extensive experiments on a real-world dataset demonstrate RecoGCN is able to learn meaningful node embeddings in HIN, and consistently outperforms baseline methods in recommendation tasks.

### Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

• Zekun Li
• Zeyu Cui
• Shu Wu
• Xiaoyu Zhang
• Liang Wang

Click-through rate (CTR) prediction is an essential task in web applications such as online advertising and recommender systems, whose features are usually in multi-field form. The key of this task is to model feature interactions among different feature fields. Recently proposed deep learning based models follow a general paradigm: raw sparse input multi-field features are first mapped into dense field embedding vectors, and then simply concatenated together to feed into deep neural networks (DNN) or other specifically designed networks to learn high-order feature interactions. However, the simple unstructured combination of feature fields will inevitably limit the capability to model sophisticated interactions among different fields in a sufficiently flexible and explicit fashion. In this work, we propose to represent the multi-field features in a graph structure intuitively, where each node corresponds to a feature field and different fields can interact through edges. The task of modeling feature interactions can be thus converted to modeling node interactions on the corresponding graph. To this end, we design a novel model Feature Interaction Graph Neural Networks (Fi-GNN). Taking advantage of the strong representative power of graphs, our proposed model can not only model sophisticated feature interactions in a flexible and explicit fashion, but also provide good model explanations for CTR prediction. Experimental results on two real-world datasets show its superiority over the state-of-the-arts.

## SESSION: Long - Graph Nerual Network II

### Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework

• Yiming Zhang
• Yujie Fan
• Yanfang Ye
• Liang Zhao
• Chuan Shi

Online underground forums have been widely used by cybercriminals to exchange knowledge and trade in illicit products or services, which have played a central role in the cybercriminal ecosystem. In order to combat the evolving cybercrimes, in this paper, we propose and develop an intelligent system named iDetective to automate the analysis of underground forums for the identification of key players (i.e., users who play the vital role in the value chain). In iDetective, we first introduce an attributed heterogeneous information network (AHIN) for user representation and use a meta-path based approach to incorporate higher-level semantics to build up relatedness over users in underground forums; then we propose Player2Vec to efficiently learn node (i.e., user) representations in AHIN for key player identification. In Player2Vec, we first map the constructed AHIN to a multi-view network which consists of multiple single-view attributed graphs encoding the relatedness over users depicted by different designed meta-paths; then we employ graph convolutional network (GCN) to learn embeddings of each single-view attributed graph; later, an attention mechanism is designed to fuse different embeddings learned based on different single-view attributed graphs for final representations. Comprehensive experiments on the data collections from different underground forums (i.e., Hack Forums, Nulled) are conducted to validate the effectiveness of iDetective in key player identification by comparisons with alternative approaches.

### Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach

• Changjun Fan
• Li Zeng
• Yuhui Ding
• Muhao Chen
• Yizhou Sun
• Zhong Liu

Betweenness centrality (BC) is a widely used centrality measures for network analysis, which seeks to describe the importance of nodes in a network in terms of the fraction of shortest paths that pass through them. It is key to many valuable applications, including community detection and network dismantling. Computing BC scores on large networks is computationally challenging due to its high time complexity. Many sampling-based approximation algorithms have been proposed to speed up the estimation of BC. However, these methods still need considerable long running time on large-scale networks, and their results are sensitive to even small perturbation to the networks. In this paper, we focus on the efficient identification of top-k nodes with highest BC in a graph, which is an essential task to many network applications. Different from previous heuristic methods, we turn this task into a learning problem and design an encoder-decoder based framework as a solution. Specifically, the encoder leverages the network structure to represent each node as an embedding vector, which captures the important structural information of the node. The decoder transforms each embedding vector into a scalar, which identifies the relative rank of a node in terms of its BC. We use the pairwise ranking loss to train the model to identify the orders of nodes regarding their BC. By training on small-scale networks, the model is capable of assigning relative BC scores to nodes for much larger networks, and thus identifying the highly-ranked nodes. Experiments on both synthetic and real-world networks demonstrate that, compared to existing baselines, our model drastically speeds up the prediction without noticeable sacrifice in accuracy, and even outperforms the state-of-the-arts in terms of accuracy on several large real-world networks.

### Multiple Rumor Source Detection with Graph Convolutional Networks

• Ming Dong
• Bolong Zheng
• Nguyen Quoc Viet Hung
• Han Su
• Guohui Li

Detecting rumor source in social networks is one of the key issues for defeating rumors automatically. Although many efforts have been devoted to defeating online rumors, most of them are proposed based an assumption that the underlying propagation model is known in advance. However, this assumption may lead to impracticability on real data, since it is usually difficult to acquire the actual underlying propagation model. Some attempts are developed by using label propagation to avoid the limitation caused by lack of prior knowledge on the underlying propagation model. Nonetheless, they still suffer from the shortcoming that the node label is simply an integer which may restrict the prediction precision. In this paper, we propose a deep learning based model, namely GCNSI (Graph Convolutional Networks based Source Identification), to locate multiple rumor sources without prior knowledge of underlying propagation model. By adopting spectral domain convolution, we build node representation by utilizing its multi-order neighbors information such that the prediction precision on the sources is improved. We conduct experiments on several real datasets and the results demonstrate that our model outperforms state-of-the-art model.

### Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks

• Ruihong Qiu
• Jingjing Li
• Zi Huang
• Hongzhi YIn

Predicting a user's preference in a short anonymous interaction session instead of long-term history is a challenging problem in the real-life session-based recommendation, e.g., e-commerce and media stream. Recent research of the session-based recommender system mainly focuses on sequential patterns by utilizing the attention mechanism, which is straightforward for the session's natural sequence sorted by time. However, the user's preference is much more complicated than a solely consecutive time pattern in the transition of item choices. In this paper, therefore, we study the item transition pattern by constructing a session graph and propose a novel model which collaboratively considers the sequence order and the latent order in the session graph for a session-based recommender system. We formulate the next item recommendation within the session as a graph classification problem. Specifically, we propose a weighted attention graph layer and a Readout function to learn embeddings of items and sessions for the next item recommendation. Extensive experiments have been conducted on two benchmark E-commerce datasets, Yoochoose and Diginetica, and the experimental results show that our model outperforms other state-of-the-art methods.

### Gravity-Inspired Graph Autoencoders for Directed Link Prediction

• Guillaume Salha
• Stratis Limnios
• Romain Hennequin
• Viet-Anh Tran
• Michalis Vazirgiannis

Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged as powerful node embedding methods. In particular, graph AE and VAE were successfully leveraged to tackle the challenging link prediction problem, aiming at figuring out whether some pairs of nodes from a graph are connected by unobserved edges. However, these models focus on undirected graphs and therefore ignore the potential direction of the link, which is limiting for numerous real-life applications. In this paper, we extend the graph AE and VAE frameworks to address link prediction in directed graphs. We present a new gravity-inspired decoder scheme that can effectively reconstruct directed graphs from a node embedding. We empirically evaluate our method on three different directed link prediction tasks, for which standard graph AE and VAE perform poorly. We achieve competitive results on three real-world graphs, outperforming several popular baselines.

## SESSION: Long - Heterogeneous Data

### Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

• Yu Shi
• Jiaming Shen
• Yuchen Li
• Naijing Zhang
• Xinwei He
• Zhengzhi Lou
• Qi Zhu
• Matthew Walker
• Myunghwan Kim
• Jiawei Han

Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as is-a relation or subclass-of relation, lays in the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness would therefore be hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human labeled data. HyperMine extends the definition of "context" to the scenario of text-rich HIN. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired based on high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show a case study that a high-quality taxonomy can be generated solely based on the hypernymy discovered by HyperMine.

### αCyber: Enhancing Robustness of Android Malware Detection System against Adversarial Attacks on Heterogeneous Graph based Model

• Shifu Hou
• Yujie Fan
• Yiming Zhang
• Yanfang Ye
• Jingwei Lei
• Wenqiang Wan
• Jiabin Wang
• Qi Xiong
• Fudong Shao

The explosive growth and increasing sophistication of Android malware call for new defensive techniques that are capable of protecting mobile users against novel threats. To combat the evolving Android malware attacks, systems of HinDroid and AiDroid have demonstrated the success of heterogeneous graph (HG) based classifiers in Android malware detection; however, their success may also incentivize attackers to defeat HG based models to bypass the detection. By far, there has no work on adversarial attack and/or defense on HG data. In this paper, we explore the robustness of HG based model in Android malware detection at the first attempt. In particular, based on a generic HG based classifier, (1) we first present a novel yet practical adversarial attack model (named HG-Attack) on HG data by considering Android malware attackers' current capabilities and knowledge; (2) to effectively combat the adversarial attacks on HG, we then propose a resilient yet elegant defense paradigm (named Rad-HGC) to enhance robustness of HG based classifier in Android malware detection. Promising experimental results based on the large-scale and real sample collections from Tencent Security Lab demonstrate the effectiveness of our developed system αCyber, which integrates our proposed defense model Rad-HGC that is resilient against practical adversarial malware attacks on the HG data performed by HG-Attack.

### BHIN2vec: Balancing the Type of Relation in Heterogeneous Information Network

• Seonghyeon Lee
• Chanyoung Park
• Hwanjo Yu

The goal of network embedding is to transform nodes in a network to a low-dimensional embedding vectors. Recently, heterogeneous network has shown to be effective in representing diverse information in data. However, heterogeneous network embedding suffers from the imbalance issue, i.e. the size of relation types (or the number of edges in the network regarding the type) is imbalanced. In this paper, we devise a new heterogeneous network embedding method, called BHIN2vec, which considers the balance among all relation types in a network. We view the heterogeneous network embedding as simultaneously solving multiple tasks in which each task corresponds to each relation type in a network. After splitting the skip-gram loss into multiple losses corresponding to different tasks, we propose a novel random-walk strategy to focus on the tasks with high loss values by considering the relative training ratio. Unlike previous random walk strategies, our proposed random-walk strategy generates training samples according to the relative training ratio among different tasks, which results in a balanced training for the node embedding. Our extensive experiments on node classification and recommendation demonstrate the superiority of BHIN2vec compared to the state-of-the-art methods. Also, based on the relative training ratio, we analyze how much each relation type is represented in the embedding space.

### Deep Sequence-to-Sequence Entity Matching for Heterogeneous Entity Resolution

• Hao Nie
• Xianpei Han
• Ben He
• Le Sun
• Bo Chen
• Wei Zhang
• Suhui Wu
• Hao Kong

Entity Resolution (ER) identifies records from different data sources that refer to the same real-world entity. Conventional ER approaches usually employ a structure matching mechanism, where attributes are aligned, compared and aggregated for ER decision. The structure matching approaches, unfortunately, often suffer from heterogeneous and dirty ER problems. That is, entities from different data sources are described using different schemas, and attribute values may be misplaced, missing, or noisy. In this paper, we propose a deep sequence-to-sequence entity matching model, denoted Seq2SeqMatcher, which can effectively solve the heterogeneous and dirty problems by modeling ER as a token-level sequence-to-sequence matching task. Specifically, we propose an align-compare-aggregate neural network for Seq2Seq entity matching, which can learn the representations of tokens, capture the semantic relevance between tokens, and aggregate matching evidence for accurate ER decisions in an end-to-end manner. Experimental results show that, by comparing entity records in token level and learning all components in an end-to-end manner, our Seq2Seq entity matching model can achieve remarkable performance improvements on 9 standard entity resolution benchmarks.

### HeteSpaceyWalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding

• Yu He
• Yangqiu Song
• Jianxin Li
• Cheng Ji
• Jian Peng
• Hao Peng

Heterogeneous information network (HIN) embedding has gained increasing interests recently. However, the current way of random-walk based HIN embedding methods have paid few attention to the higher-order Markov chain nature of meta-path guided random walks, especially to the stationarity issue. In this paper, we systematically formalize the meta-path guided random walk as a higher-order Markov chain process,and present a heterogeneous personalized spacey random walk to efficiently and effectively attain the expected stationary distribution among nodes. Then we propose a generalized scalable framework to leverage the heterogeneous personalized spacey random walk to learn embeddings for multiple types of nodes in an HIN guided by a meta-path, a meta-graph, and a meta-schema respectively. We conduct extensive experiments in several heterogeneous networks and demonstrate that our methods substantially outperform the existing state-of-the-art network embedding algorithms.

## SESSION: Long - Knowledge Graph I

### EHR Coding with Multi-scale Feature Attention and Structured Knowledge Graph Propagation

• Xiancheng Xie
• Yun Xiong
• Philip S. Yu
• Yangyong Zhu

Assigning standard medical codes (e.g., ICD-9-CM) representing diagnoses or procedures to electronic health record (EHR) is an important task in the medical domain. However, automatic coding is difficult since the clinical note is composed of multiple long and heterogeneous textual narratives (e.g., discharge diagnosis, pathology reports, surgical procedure notes). Furthermore, the code label space is large and the label distribution is extremely unbalanced. The state-of-the-art methods mainly regard EHR coding as a multi-label text classification task and use shallow convolution neural network with fixed window size, which is incapable of learning variable n-gram features and the ontology structure between codes. In this paper, we leverage a densely connected convolutional neural network which is able to produce variable n-gram features for clinical note feature learning. We also incorporate a multi-scale feature attention to adaptively select multi-scale features since the most informative n-grams in clinical notes for each word can vary in length according to the neighborhood. Furthermore, we leverage graph convolutional neural network to capture both the hierarchical relationships among medical codes and the semantics of each code. Finally, We validate our method on the public dataset, and the evaluation results indicate that our method can significantly outperform other state-of-the-art models.

### A Fine-grained and Noise-aware Method for Neural Relation Extraction

• Jianfeng Qu
• Wen Hua
• Dantong Ouyang
• Xiaofang Zhou
• Ximing Li

Distant supervision is an efficient way to generate large-scale training data for relation extraction without human efforts. However, a coin has two sides. The automatically annotated labels for training data are problematic, which can be summarized as multi-instance multi-label problem and coarse-grained (bag-level) supervised signal. To address these problems, we propose two reasonable assumptions and craft reinforcement learning to capture the expressive sentence for each relation mentioned in a bag. More specifically, we extend the original expressed-at-least-once assumption to multi-label level, and introduce a novel express-at-most-one assumption. Besides, we design a fine-grained reward function, and model the sentence selection process as an auction where different relations for a bag need to compete together to achieve the possession of a specific sentence based on its expressiveness. In this way, our model can be dynamically self-adapted, and eventually implements the accurate one-to-one mapping from a relation label to its chosen expressive sentence, which serves as training instances for the extractor. The experimental results on a public dataset demonstrate that our model constantly and substantially outperforms current state-of-the-art methods for relation extraction.

### Learning Region Similarity over Spatial Knowledge Graphs with Hierarchical Types and Semantic Relations

• Xiongnan Jin
• Byungkook Oh
• Sanghak Lee
• Dongho Lee
• Kyong-Ho Lee
• Liang Chen

A large number of spatial knowledge graphs (SKGs) are available from spatially enriched knowledge bases, e.g., DBpedia and YAGO2. This provides a great chance to understand valuable information about the regions surrounding us. However, it is hard to comprehend SKGs due to the explosively growing volume and the complication of the graph structures. Thus we study the problem of similar region search (SRS), which is an easy-to-use but effective way to explore spatial data. The effectiveness of SRS highly depends on how to measure the region similarity. However, existing approaches cannot make use of the rich information contained in SKGs thus may lead to incorrect results. In this paper, we propose a spatial knowledge representation learning method for region similarity, namely SKRL4RS. SKRL4RS firstly encodes the spatial entities of an SKG into a vector space to make it easier to extract useful features. Then regions are represented by 3-D tensors using the spatial entity embeddings together with geographical information. Finally, region tensors are fed into the conventional triplet network to learn the feature vectors of regions. The region similarity measure learned by SKRL4RS can capture the hierarchical types, semantic relatedness, and relative locations of spatial entities inside a region. Experimental results on two real-world datasets show that our SKRL4RS outperforms the state-of-the-art by a significant margin in terms of the accuracy of measuring region similarity.

### Bayes EMbedding (BEM): Refining Representation by Integrating Knowledge Graphs and Behavior-specific Networks

• Yuting Ye
• Xuwu Wang
• Jiangchao Yao
• Kunyang Jia
• Jingren Zhou
• Yanghua Xiao
• Hongxia Yang

Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in varieties of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus either on knowledge graph embedding or behavior graph embedding while few works consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from knowledge graphs and behavior graphs. To be more specific, BEM takes as prior the pre-trained embeddings from the knowledge graph, and integrates them with the pre-trained embeddings from the behavior graphs via a Bayesian generative model. BEM is able to mutually refine the embeddings from both sides while preserving their own topological structures. To show the superiority of our method, we conduct a range of experiments on three benchmark datasets: node classification, link prediction, triplet classification on two small datasets related to Freebase, and item recommendation on a large-scale e-commerce dataset.

### A Benchmark for Fact Checking Algorithms Built on Knowledge Bases

• Viet-Phi Huynh
• Paolo Papotti

Fact checking is the task of determining if a given claim holds. Several algorithms have been developed to check claims with reference information in the form of facts in a knowledge base. While individual algorithms have been experimentally evaluated in the past, we provide the first comprehensive and publicly available benchmark infrastructure for evaluating methods across a wide range of assumptions about the claims and the reference information. We show how, by changing the popularity, transparency, homogeneity, and functionality properties of the facts in an experiment, it is possible to influence significantly the performance of the fact checking algorithms. We introduce a benchmark framework to systematically enforce such properties in training and testing datasets with fine tune control over their properties. We then use our benchmark to compare fact checking algorithms with one another, as well as with methods that can solve the link prediction task in knowledge bases. Our evaluation shows the impact of the four data properties on the qualitative performance of the fact checking solutions and reveals a number of new insights concerning their applicability and performance.

## SESSION: Long - Knowledge Graph II

### Online Schemaless Querying of Heterogeneous Open Knowledge Bases

• Nikita Bhutani

Applications that depend on a deep understanding of natural language text have led to a renaissance of large knowledge bases (KBs). Some of these are curated manually and conform to an ontology. Many others, called open KBs, are derived automatically from unstructured text without any pre-specified ontology. These open KBs offer broad coverage of information but are far more heterogeneous than curated KBs, which themselves are more heterogeneous than traditional databases with a fixed schema. Due to the heterogeneity of information representation, querying KBs is a challenging task. Traditionally, query expansion is performed to cover all possible transformations and semantically equivalent structures. Such query expansion can be impractical for heterogeneous open KBs, particularly when complex queries lead to a combinatorial explosion of expansion possibilities. Furthermore, learning a query expansion model requires training examples, which is difficult to scale to diverse representations of facts in the KB. In this paper, we introduce an online schemaless querying method that does not require the query to exactly match the facts. Instead of exactly matching a query, it finds matches for individual query components and then identifies an answer by reasoning over the collective evidence. We devise an alignment-based algorithm for extracting answers based on textual and semantic similarity of query components and evidence fields. Thus, any representational mismatches between the query and evidence are handled online at query-time. Experiments show our approach is effective in handling multi-constraint queries.

### Enhancing Conversational Dialogue Models with Grounded Knowledge

• Wen Zheng
• Ke Zhou

Leveraging external knowledge to enhance conversational models has become a popular research area in recent years. Compared to vanilla generative models, the knowledge-grounded models may produce more informative and engaging responses. Although various approaches have been proposed in the past, how to effectively incorporate knowledge remains an open research question. It is unclear how much external knowledge should be retrieved and what is the optimal way to enhance the conversational model, trading off between relevant information and noise. Therefore, in this paper, we aim to bridge the gap by first extensively evaluating various types of state-of-the-art knowledge-grounded conversational models, including recurrent neural network based, memory networks based, and Transformer based models. We demonstrate empirically that those conversational models can only be enhanced with the right amount of external knowledge. To effectively leverage information originated from external knowledge, we propose a novel Transformer with Expanded Decoder (Transformer-ED or TED for short), which can automatically tune the weights for different sources of evidence when generating responses. Our experiments show that our proposed model outperforms state-of-the-art models in terms of both quality and diversity.

### MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data

• Yang Deng
• Yaliang Li
• Ying Shen
• Nan Du
• Wei Fan
• Min Yang
• Kai Lei

Knowledge Graph (KG) contains entities and the relations between entities. Due to its representation ability, KG has been successfully applied to support many medical/healthcare tasks. However, in the medical domain, knowledge holds under certain conditions. Such conditions for medical knowledge are crucial for decision-making in various medical applications, which is missing in existing medical KGs. In this paper, we aim to discovery medical knowledge conditions from texts to enrich KGs. Electronic Medical Records (EMRs) are systematized collection of clinical data and contain detailed information about patients, thus EMRs can be a good resource to discover medical knowledge conditions. Unfortunately, the amount of available EMRs is limited due to reasons such as regularization. Meanwhile, a large amount of medical question answering (QA) data is available, which can greatly help the studied task. However, the quality of medical QA data is quite diverse, which may degrade the quality of the discovered medical knowledge conditions. In the light of these challenges, we propose a new truth discovery method, MedTruth, for medical knowledge condition discovery, which incorporates prior source quality information into the source reliability estimation procedure, and also utilizes the knowledge triple information for trustworthy information computation. We conduct series of experiments on real-world medical datasets to demonstrate that the proposed method can discover meaningful and accurate conditions for medical knowledge by leveraging both EMR and QA data. Further, the proposed method is tested on synthetic datasets to validate its effectiveness under various scenarios.

### Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion

• Philipp Christmann
• Rishiraj Saha Roy
• Abdalghani Abujabal
• Jyotsna Singh
• Gerhard Weikum

Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX, an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.

### Learning to Answer Complex Questions over Knowledge Bases with Query Composition

• Nikita Bhutani
• Xinyi Zheng

Recent years have seen a surge of knowledge-based question answering (KB-QA) systems which provide crisp answers to user-issued questions by translating them to precise structured queries over a knowledge base (KB). A major challenge in KB-QA is bridging the gap between natural language expressions and the complex schema of the KB. As a result, existing methods focus on simple questions answerable with one main relation path in the KB and struggle with complex questions that require joining multiple relations. We propose a KB-QA system, TextRay, which answers complex questions using a novel decompose-execute-join approach. It constructs complex query patterns using a set of simple queries. It uses a semantic matching model which is able to learn simple queries using implicit supervision from question-answer pairs, thus eliminating the need for complex query patterns. Our proposed system significantly outperforms existing KB-QA systems on complex questions while achieving comparable results on simple questions.

## SESSION: Long - Machine Learning Themes I

### Auto-completion for Data Cells in Relational Tables

• Shuo Zhang
• Krisztian Balog

We address the task of auto-completing data cells in relational tables. Such tables describe entities (in rows) with their attributes (in columns). We present the CellAutoComplete framework to tackle several novel aspects of this problem, including: (i) enabling a cell to have multiple, possibly conflicting values, (ii) supplementing the predicted values with supporting evidence, (iii) combining evidence from multiple sources, and (iv) handling the case where a cell should be left empty. Our framework makes use of a large table corpus and a knowledge base as data sources, and consists of preprocessing, candidate value finding, and value ranking components. Using a purpose-built test collection, we show that our approach is 40% more effective than the best baseline.

### Author Set Identification via Quasi-Clique Discovery

• Yuyan Zheng
• Chuan Shi
• Xiangnan Kong
• Yanfang Ye

Author identification based on heterogeneous bibliographic networks, which is to identify potential authors given an anonymous paper, has been studied in recent years. However, most of the existing works merely consider the relationship between authors and anonymous papers, while ignore the relationships between authors. In this paper, we take the relationships among authors into consideration to study the problem of author set identification, which is to identify an author set rather than an individual author related to an anonymous paper. The proposed problem has important applications to new collaborator discovery and group building. We propose a novel Author Set Identification approach, namely ASI. ASI first extracts a task-guided embedding to learn the low-dimensional representations of nodes in bibliographic network. And then ASI leverages the learned embedding to construct a weighted paper-ego-network, which contains anonymous paper and candidate authors. Finally, converting the optimal author set identification to the quasi-clique discovery in the constructed network, ASI utilizes a local-search heuristic mechanism under the guidance of the devised density function to find the optimal quasiclique. Extensive experiments on bibliographic networks demonstrate that ASI outperforms the state-of-art baselines in author set identification.

• Vasileios Iosifidis
• Eirini Ntoutsi

The widespread use of ML-based decision making in domains with high societal impact such as recidivism, job hiring and loan credit has raised a lot of concerns regarding potential discrimination. In particular, in certain cases it has been observed that ML algorithms can provide different decisions based on sensitive attributes such as gender or race and therefore can lead to discrimination. Although, several fairness-aware ML approaches have been proposed, their focus has been largely on preserving the overall classification accuracy while improving fairness in predictions for both protected and non-protected groups (defined based on the sensitive attribute(s)). The overall accuracy however is not a good indicator of performance in case of class imbalance, as it is biased towards the majority class. As we will see in our experiments, many of the fairness-related datasets suffer from class imbalance and therefore, tackling fairness requires also tackling the imbalance problem. To this end, we propose AdaFair, a fairness-aware classifier based on AdaBoost that further updates the weights of the instances in each boosting round taking into account a cumulative notion of fairness based upon all current ensemble members, while explicitly tackling class-imbalance by optimizing the number of ensemble members for balanced classification error. Our experiments show that our approach can achieve parity in true positive and true negative rates for both protected and non-protected groups, while it significantly outperforms existing fairness-aware methods up to 25% in terms of balanced error.

### New Online Kernel Ridge Regression via Incremental Predictive Sampling

• Shan Xu
• Xiao Zhang
• Shizhong Liao

Online kernel ridge regression via existing sampling approaches, which aim at approximating the kernel matrix as accurately as possible, is independent of learning and has a cubic time complexity with respect to the sampling size for updating hypothesis. In this paper, we propose a new online kernel ridge regression via an incremental predictive sampling approach, which has the nearly optimal accumulated loss and performs efficiently at each round. We use the estimated ridge leverage score of the labeled matrix, which depends on the accumulated loss at each round, to construct the predictive sampling distribution, and use this sampling probability for the Nyströ m approximation. To avoid calculating the inverse of the approximated kernel matrix directly, we use the Woodbury formula to accelerate the computation and adopt the truncated incremental singular value decomposition to update the generalized inverse of the intersection matrix. Our online kernel ridge regression has a time complexity of $O(tmk+k^3 )$ for updating hypothesis at round t, where k is the truncated rank of the intersection matrix, and enjoys a regret bound of order $O(\sqrtT )$, where T is the time horizon. Experimental results show that the proposed online kernel ridge regression via the incremental predictive sampling performs more stably and efficiently than the online kernel ridge regression via existing online sampling approaches that directly approximate the kernel matrix.

### Online Kernel Selection via Tensor Sketching

• Shizhong Liao
• Xiao Zhang

Online kernel selection is a more complex problem compared with offline kernel selection, which intermixes training and selection at each round and requires a sublinear regret and low computational complexities. But existing online kernel selection approaches have at least linear time and space complexities at each round with respect to the number of rounds, or lack sublinear regret guarantees for an uncountably infinite number of candidate kernels. To address these issues, we propose a novel online kernel selection approach using tensor sketching, which has constant computational complexities at each round and enjoys a sublinear regret bound for an uncountably infinite number of candidate kernels. We represent the data using the tensor products and construct data sketches using the Taylor series and the Count Sketch matrices, which yields a sketched reproducing kernel Hilbert space (SRKHS). Then we update the optimal kernels and the hypotheses using online gradient descent in SRKHS. We prove that the kernel corresponding to SRKHS satisfies the reproducing property, the hypotheses in SRKHS are convex with respect to the kernel parameter, and the proposed online kernel selection approach in SRKHS enjoys a regret bound of order $O(\sqrtT )$ for an uncountably infinite number of candidate kernels, which is optimal for a convex loss function, where T is the number of rounds. By the fast Fourier transform, the hypotheses in SRKHS can be computed in a quasilinear time complexity and a logarithmic space complexity with respect to the sketch size at each round, where the sketch size is a constant. Experimental results demonstrate that our online kernel selection approach is more accurate and efficient for online kernel learning on high dimension data.

## SESSION: Long - Machine Learning Themes II

### One-Class Active Learning for Outlier Detection with Multiple Subspaces

• Holger Trittenbach
• Klemens Böhm

Active learning for outlier detection involves users in the process, by asking them for annotations of observations, in the form of class labels. The usual assumption is that users can provide such feedback, regardless of the nature and the presentation of the results. This is a simplification, which may not hold in practice. To overcome it, we propose SubSVDD, a semi-supervised classifier, that learns decision boundaries in low-dimensional projections of the data. SubSVDD de-constructs the outlier classification so that users can comprehend and interpret results more easily. For active learning, SubSVDD features a new update mechanism that adjusts decision boundaries based on user feedback. In particular, it considers that outliers may only occur in some of the low-dimensional projections. We conduct systematic experiments to show the effectiveness of our approach. In a comprehensive benchmark, SubSVDD outperforms alternative approaches on several data sets.

### AutoGRD: Model Recommendation Through Graphical Dataset Representation

• Noy Cohen-Shapira
• Lior Rokach
• Bracha Shapira
• Roman Vainshtein

The widespread use of machine learning algorithms and the high level of expertise required to utilize them have fuelled the demand for solutions that can be used by non-experts. One of the main challenges non-experts face in applying machine learning to new problems is algorithm selection - the identification of the algorithm(s) that will deliver top performance for a given dataset, task, and evaluation measure. We present AutoGRD, a novel meta-learning approach for algorithm recommendation. AutoGRD first represents datasets as graphs and then extracts their latent representation that is used to train a ranking meta-model capable of accurately recommending top-performing algorithms for previously unseen datasets. We evaluate our approach on 250 datasets and demonstrate its effectiveness both for classification and regression tasks. AutoGRD outperforms state-of-the-art meta-learning and Bayesian methods.

### Batch Mode Active Learning for Semantic Segmentation Based on Multi-Clue Sample Selection

• Yao Tan
• Liu Yang
• Qinghua Hu
• Zhibin Du

Large labeled datasets are required for training a powerful semantic segmentation model. However, it is very expensive to construct pixel-wise annotated images. In this work, we propose a general batch mode active learning algorithm for semantic segmentation which automatically selects important samples to be labeled for building a competitive classifier. In our approach the edge information of an image is first introduced as a new selecting clue of active learning, which can measure the essential information relevant to segmentation performance. In addition, we also incorporate the informativeness based on Query by Committee (QBC) and representativeness criteria in our algorithm. We combine three clues to select a batch of samples during each iteration. It is shown that the image edge information is significant for the active learning for semantic segmentation in the experiments. And we also demonstrate the performance of our method outperforms the state of the art active learning approaches on the datasets of CamVid, Stanford Background and PASCAL VOC 2012.

### CRUX: Adaptive Querying for Efficient Crowdsourced Data Extraction

• Theodoros Rekatsinas
• Amol Deshpande

Crowdsourcing is essential for collecting information about real-world entities. Existing crowdsourced data extraction solutions use fixed, non-adaptive querying strategies that repeatedly ask workers to provide entities from a fixed domain until a desired level of coverage is reached. Unfortunately, such solutions are highly impractical as they yield many duplicate extractions. We design an adaptive querying framework, CRUX, that maximizes the number of extracted entities for a given budget. We show that the problem of budgeted crowdsourced entity extraction is NP-Hard. We leverage two insights to focus our extraction efforts: \em exploiting the structure of the domain of interest, and \em using exclude lists to limit repeated extractions. We develop new statistical tools to reason about the number of new distinct extracted entities of \em additional queries under the presence of little information, and embed them within adaptive algorithms that maximize the distinct extracted entities under budget constraints. We evaluate our techniques on synthetic and real-world datasets, demonstrating an improvement of up to 300% over competing approaches for the same budget.

### Deep Forest with LRRS Feature for Fine-grained Website Fingerprinting with Encrypted SSL/TLS

• Ziqing Zhang
• Cuicui Kang
• Gang Xiong
• Zhen Li

With the development of encryption protocol, such as Secure Sockets Layer (SSL) and Transport Layer Security (TLS), the traditional fingerprinting approaches based on packet content and special field are difficult to fingerprint the websites. Therefore, recent research imported machine learning algorithms to deal with this problem, and various features are extracted for the machine learning algorithms. However, previous approaches of fingerprinting encrypted websites are based on HTTP/1.1, which are not applicable to the widely used HTTP/2. In addition, most of the work only fingerprints the home page of each website, but in fact, users also visit other web pages of the website. To solve the feature compatibility problem, we propose to use the local request and response sequence (LRRS) as features. LRRS can represent the patterns of the encrypted Internet traffic not only based on HTTP/1.1 but also based on HTTP/2 using local packet sequences. In order to fingerprint different web pages in the same website, we import Deep Forest to extract fine-grained features. It utilizes a convolution structure to make full use of LRRS sequential features and multi-layer structure to enhance the ability of feature representation. The experimental results show the proposed algorithm has achieved the best overall performance on four datasets. Especially on the bidirectional encrypted traffic dataset with HTTP/2, the proposed approach achieved 55% higher of f1 score than the state-of-the-art method KFP with Random Forest.

## SESSION: Long - Machine Learning Themes III

### N2N: Network Derivative Mining

• Jian Kang
• Hanghang Tong

Network mining plays a pivotal role in many high-impact application domains, including information retrieval, healthcare, social network analysis, security and recommender systems. State-of-the-art offers a wealth of sophisticated network mining algorithms, many of which have been widely adopted in real-world with superior empirical performance. Nonetheless, they often lack effective and efficient ways to characterize how the results of a given mining task relate to the underlying network structure. In this paper, we introduce network derivative mining problem. Given the input network and a specific mining algorithm, network derivative mining finds a derivative network whose edges measure the influence of the corresponding edges of the input network on the mining results. We envision that network derivative mining could be beneficial in a variety of scenarios, ranging from explainable network mining, adversarial network mining, sensitivity analysis on network structure, active learning, learning with side information to counterfactual learning on networks. We propose a generic framework for network derivative mining from the optimization perspective and provide various instantiations for three classic network mining tasks, including ranking, clustering, and matrix completion. For each mining task, we develop effective algorithm for constructing the derivative network based on influence function analysis, with numerous optimizations to ensure a linear complexity in both time and space. Extensive experimental evaluation on real-world datasets demonstrates the efficacy of the proposed framework and algorithms.

### MoBoost: A Self-improvement Framework for Linear-based Hashing

• Xingbo Liu
• Xiushan Nie
• Xiaoming Xi
• Lei Zhu
• Yilong Yin

The linear model is commonly utilized in hashing methods owing to its efficiency. To obtain better accuracy, linear-based hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider neighborhood information. In this study, we propose a novel generalized framework called Model Boost (MoBoost), which can achieve the self-improvement of the linear-based hashing. The proposed MoBoost is used to improve model parameter optimization for linear-based hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, given a linear-based hashing method, we first execute the method several times to get several different hash codes for training samples, and then combine these different hash codes into one set utilizing one novel fusion strategy. Based on this set of hash codes, we learn some new parameters for the linear hash function that can significantly improve accuracy. The proposed MoBoost can be generally adopted in existing linear-based hashing methods, achieving more precise and stable performance compared to the original methods while imposing negligible added expenditure in terms of time and space. Extensive experiments are performed based on three benchmark datasets, and the results demonstrate the superior performance of the proposed framework.

### Loopless Semi-Stochastic Gradient Descent with Less Hard Thresholding for Sparse Learning

• Xiangyang Liu
• Bingkun Wei
• Fanhua Shang
• Hongying Liu

Stochastic gradient hard thresholding methods have recently been shown to work favorably for solving large-scale empirical risk minimization problems under sparsity constraints. Many stochastic hard thresholding methods (e.g., SVRG-HT) conduct a full gradient update with a constant frequency and perform a hard thresholding operation at each iteration, which leads to a high computational complexity especially for high-dimensional and sparse problems. To be more efficient in large-scale datasets, we propose an efficient single-layer semi-stochastic gradient hard thresholding (LSSG-HT) method. The proposed algorithm updates full gradient with a given probability p and reduces lots of hard thresholding operations by setting frequency m, which reduces hard thresholding complexity in theory to O(κ_s/młog(1/ε)) compared with O(κ_słog(1/ε)) of SVRG-HT. We prove that our algorithm can converge to an optimal solution with a linear convergence rate. Furthermore, we also present an asynchronous parallel variant of LSSG-HT. Numerical experimental results demonstrate that the efficiency of our algorithms with comparison against the state-of-the-art algorithms.

### EPA: Exoneration and Prominence based Age for Infection Source Identification

• Syed Shafat Ali
• Tarique Anwar
• Ajay Rastogi
• Syed Afzal Murtaza Rizvi

Infection source identification is a well-established problem, having gained a substantial scale of research attention over the years. In this paper, we study the problem by exploiting the idea of the source being the oldest node. For the same, we propose a novel algorithm called Exoneration and Prominence based Age (EPA), which calculates the age of an infected node by considering its prominence in terms of its both infected and non-infected neighbors. These non-infected neighbors hold the key in exonerating an infected node from being the infection source. We also propose a computationally inexpensive variant of EPA, called EPA-LW. Extensive experiments are performed on seven datasets, including 5 real-world and 2 synthetic, of different topologies and varying sizes to demonstrate the effectiveness of the proposed algorithms. We consistently outperform the state-of-the-art single source identification methods in terms of average error distance. To the best of our knowledge, this is the largest scale performance evaluation of the considered problem till date. We also extend EPA to identify multiple sources by developing two new algorithms - one based on K-Means, called EPA_K-Means, and another based on successive identification of sources, called EPA_SSI. Our results show that both EPA_K-Means and EPA_SSI outperform the other multi-source heuristic approaches.

## SESSION: Long - Mining in Emerging Applications I

### Generating Persuasive Visual Storylines for Promotional Videos

• Chang Liu
• Yi Dong
• Han Yu
• Zhiqi Shen
• Zhanning Gao
• Pan Wang
• Changgong Zhang
• Peiran Ren
• Xuansong Xie
• Lizhen Cui
• Chunyan Miao

Video contents have become a critical tool for promoting products in E-commerce. However, the lack of automatic promotional video generation solutions makes large-scale video-based promotion campaigns infeasible. The first step of automatically producing promotional videos is to generate visual storylines, which is to select the building block footage and place them in an appropriate order. This task is related to the subjective viewing experience. It is hitherto performed by human experts and thus, hard to scale. To address this problem, we propose WundtBackpack, an algorithmic approach to generate storylines based on available visual materials, which can be video clips or images. It consists of two main parts, 1) the Learnable Wundt Curve to evaluate the perceived persuasiveness based on the stimulus intensity of a sequence of visual materials, which only requires a small volume of data to train; and 2) a clustering-based backpacking algorithm to generate persuasive sequences of visual materials while considering video length constraints. In this way, the proposed approach provides a dynamic structure to empower artificial intelligence (AI) to organize video footage in order to construct a sequence of visual stimuli with persuasive power. Extensive real-world experiments show that our approach achieves close to 10% higher perceived persuasiveness scores by human testers, and 12.5% higher expected revenue compared to the best performing state-of-the-art approach.

### Clustering Recurrent and Semantically Cohesive Program Statements in Introductory Programming Assignments

• Victor J. Marin
• Carlos R. Rivero

Students taking introductory programming courses are typically required to complete assignments and expect timely feedback to advance their learning. With the current popularity of these courses in both traditional and online versions, graders are seeing themselves overwhelmed by the sheer amount of student programs they have to handle, and the quality of the educational experience provided is often compromised for promptness. Thus, there is a need for automated approaches to effectively increase grading productivity. Existing approaches in this context fail to support flexible grading schemes and customization based on the assignment at hand. This paper presents a data-driven approach for clustering recurrent program statements performing similar but not exact semantics across student programs, which we refer to as core statements. We rely on structural graph clustering over the program dependence graph representations of student programs. Such clustering is performed over the graph resulting from the pairwise approximate graph alignments of programs. Core statements help graders understand solution variations at a glance and, since they group program statements present in individual student programs, can be used to propagate feedback, thus increasing grading productivity. Our experimental results show that, on average, we discover core statements covering more than 50% of individual student programs, and that program statements grouped by core statements are semantically cohesive, which ensures effective grading.

### Going Beyond Content Richness: Verified Information Aware Summarization of Crisis-Related Microblogs

• Ashish Sharma
• Koustav Rudra
• Niloy Ganguly

High-impact catastrophic events (bomb attacks, shootings) trigger posting of large volume of information on social media platforms such as Twitter. Recent works have proposed content-aware systems for summarizing this information, thereby facilitating post-disaster services. However, a significant proportion of the posted content is unverified, which restricts the practical usage of the existing summarization systems. In this paper, we work on the novel task of generating verified summaries of information posted on Twitter during disasters. We first jointly learn representations of content-classes and expression-classes of tweets posted during disasters using a novel LDA-based generative model. These representations of content & expression classes are used in conjunction with pre-disaster user behavior and temporal signals (replies) for training a Tree-LSTM based tweet-verification model. The model infers tweet verification probabilities which are used, besides information content of tweets, in an Integer Linear Programming (ILP) framework for generating the desired verified summaries. The summaries are fine-tuned using the class information of the tweets as obtained from the LDA-based generative model. Extensive experiments are performed on a publicly-available labeled dataset of man-made disasters which demonstrate the effectiveness of our tweet-verification (3-13% gain over baselines) and summarization (12-48% gain in verified content proportion, 8-13% gain in ROUGE-score over state-of-the-art) systems. We make implementations of our various modules available online.

### Declarative User Selection with Soft Constraints

• Yael Amsterdamer
• Tova Milo
• Amit Somech
• Brit Youngmann

In applications with large userbases such as crowdsourcing, social networks or recommender systems, selecting users is a common and challenging task. Different applications require different policies for selecting users, and implementing such policies is applicationspecific and laborious. To this end, we introduce a novel declarative framework that abstracts common components of the user selection problem, while allowing for domain-specific tuning. The framework is based on an ontology view of user profiles, with respect to which we define a query language for policy specification. Our language extends SPARQL with means for capturing soft constraints which are essential for worker selection. At the core of our query engine is then a novel efficient algorithm for handling these constraints. Our experimental study on real-life data indicates the effectiveness and flexibility of our approach, showing in particular that it outperforms existing task-specific solutions in prominent user selection tasks.

### #suicidal - A Multipronged Approach to Identify and Explore Suicidal Ideation in Twitter

• Rohan Mishra
• Ramit Sawhney
• Debanjan Mahata
• Rajiv Ratn Shah
• Huan Liu

Technological advancements have led to the creation of social media platforms like Twitter, where people have started voicing their views over rarely discussed and socially stigmatizing issues. Twitter, is increasingly being used for studying psycho-linguistic phenomenon spanning from expressions of adverse drug reactions, depressions, to suicidality. In this work we focus on identifying suicidal posts from Twitter. Towards this objective we take a multipronged approach and implement different neural network models such assequential models andgraph convolutional networks, that are trained on textual content shared in Twitter, the historical tweeting activity of the users and social network formed between different users posting about suicidality. We train a stacked ensemble of classifiers representing different aspects of suicidal tweeting activity, and achieve state-of-the-art results on a new manually annotated dataset developed by us, that contains textual as well as network information of suicidal tweets. We further investigate into the trained models and perform qualitative analysis showing how historical tweeting activity and rich information embedded in the homophily networks amongst users in Twitter, aids in accurately identifying tweets expressing suicidal intent.

## SESSION: Long - Mining in Emerging Applications II

### MusicBot: Evaluating Critiquing-Based Music Recommenders with Conversational Interaction

• Yucheng Jin
• Wanling Cai
• Li Chen
• Nyi Nyi Htun
• Katrien Verbert

Critiquing-based recommender systems aim to elicit more accurate user preferences from users' feedback toward recommendations. However, systems using a graphical user interface (GUI) limit the way that users can critique the recommendation. With the rise of chatbots in many application domains, they have been regarded as an ideal platform to build critiquing-based recommender systems. Therefore, we present MusicBot, a chatbot for music recommendations, featured with two typical critiquing techniques, user-initiated critiquing (UC) and system-suggested critiquing (SC). By conducting a within-subjects (N=45) study with two typical scenarios of music listening, we compared a system of only having UC with a hybrid critiquing system that combines SC with UC. Furthermore, we analyzed the effects of four personal characteristics,musical sophistication (MS), desire for control (DFC), chatbot experience (CE), and tech savviness (TS), on the user's perception and interaction of the recommendation in MusicBot. In general, compared with UC, SC yields higher perceived diversity and efficiency in looking for songs; combining UC and SC tends to increase user engagement. Both MS and DFC positively influence several key user experience (UX) metrics of MusicBot such as interest matching, perceived controllability, and intent to provide feedback.

### Discovering Polarized Communities in Signed Networks

• Francesco Bonchi
• Edoardo Galimberti
• Aristides Gionis
• Bruno Ordozgoiti
• Giancarlo Ruffo

Signed networks contain edge annotations to indicate whether each interaction is friendly (positive edge) or antagonistic (negative edge). The model is simple but powerful and it can capture novel and interesting structural properties of real-world phenomena. The analysis of signed networks has many applications from modeling discussions in social media, to mining user reviews, and to recommending products in e-commerce sites. In this paper we consider the problem of discovering polarized communities in signed networks. In particular, we search for two communities (subsets of the network vertices) where within communities there are mostly positive edges while across communities there are mostly negative edges. We formulate this novel problem as a "discrete eigenvector'' problem, which we show to be NP-hard. We then develop two intuitive spectral algorithms: one deterministic, and one randomized with quality guarantee $\sqrtn$ (where n is the number of vertices in the graph), tight up to constant factors. We validate our algorithms against non-trivial baselines on real-world signed networks. Our experiments confirm that our algorithms produce higher quality solutions, are much faster and can scale to much larger networks than the baselines, and are able to detect ground-truth polarized communities.

### Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

• Shuai Xiao
• Le Guo
• Zaifan Jiang
• Lei Lv
• Yuanbo Chen
• Jun Zhu
• Shuang Yang

Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales. How to effectively allocate the incentives so as to maximize the return (e.g., business objectives) under the budget constraint, however, is less studied in the literature. This problem is technically challenging due to the facts that 1) the allocation strategy has to be learned using historically logged data, which is counterfactual in nature, and 2) both the optimality and feasibility (i.e., that cost cannot exceed budget) needs to be assessed before being deployed to online systems. In this paper, we formulate the problem as a constrained Markov decision process (CMDP). To solve the CMDP problem with logged counterfactual data, we propose an efficient learning algorithm which combines bisection search and model-based planning. First, the CMDP is converted into its dual using Lagrangian relaxation, which is proved to be monotonic with respect to the dual variable. Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable being found efficiently via bisection search (i.e., by taking advantage of the monotonicity). Lastly, we show that model-based planing can be used to effectively accelerate the joint optimization process without retraining the policy for every dual variable. Empirical results on synthetic and real marketing datasets confirm the effectiveness of our methods.

### Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures

• Parisa Kaghazgaran
• Majid Alfifi
• James Caverlee

User reviews have become a cornerstone of how we make decisions. However, this user-based feedback is susceptible to manipulation as recent research has shown the feasibility of automatically generating fake reviews. Previous investigations, however, have focused on generative fake review approaches that are (i) domain dependent and not extendable to other domains without replicating the whole process from scratch; and (ii) character-level based known to generate reviews of poor quality that are easily detectable by anti-spam detectors and by end users. In this work, we propose and evaluate a new class of attacks on online review platforms based on neural language models at word-level granularity in an inductive transfer-learning framework wherein a universal model is refined to handle domain shift, leading to potentially wide-ranging attacks on review systems. Through extensive evaluation, we show that such model-generated reviews can bypass powerful anti-spam detectors and fool end users. Paired with this troubling attack vector, we propose a new defense mechanism that exploits the distributed representation of these reviews to detect model-generated reviews. We conclude that despite the success of neural models in generating realistic reviews, our proposed RNN-based discriminator can combat this type of attack effectively (90% accuracy).

### Augment to Prevent: Short-Text Data Augmentation in Deep Learning for Hate-Speech Classification

• Georgios Rizos
• Konstantin Hemker
• Björn Schuller

In this paper, we address the issue of augmenting text data in supervised Natural Language Processing problems, exemplified by deep online hate speech classification. A great challenge in this domain is that although the presence of hate speech can be deleterious to the quality of service provided by social platforms, it still comprises only a tiny fraction of the content that can be found online, which can lead to performance deterioration due to majority class overfitting. To this end, we perform a thorough study on the application of deep learning to the hate speech detection problem: a) we propose three text-based data augmentation techniques aimed at reducing the degree of class imbalance and to maximise the amount of information we can extract from our limited resources and b) we apply them on a selection of top-performing deep architectures and hate speech databases in order to showcase their generalisation properties. The data augmentation techniques are based on a) synonym replacement based on word embedding vector closeness, b) warping of the word tokens along the padded sequence or c) class-conditional, recurrent neural language generation. Our proposed framework yields a significant increase in multi-class hate speech detection, outperforming the baseline in the largest online hate speech database by an absolute 5.7% increase in Macro-F1 score and 30% in hate speech class recall.

## SESSION: Long - Natural Language Processing I

### Nested Relation Extraction with Iterative Neural Network

• Yixuan Cao
• Dian Chen
• Hongwei Li
• Ping Luo

Natural language is used to describe objective facts, including simple relations like ""Jobs was the CEO of Apple"", and complex relations like ""the GDP of the United States in 2018 grew 2.9% compared with 2017". For the latter example, the growth rate relation is between two other relations. Due to the complex nature of language, this kind of nested relations is expressed frequently, especially in professional documents in fields like economics, finance, and biomedicine. But extracting nested relations is challenging, and research on this problem is almost vacant. In this paper, we formally formulate the nested relation extraction problem, and come up with a solution using Iterative Neural Network. Specifically, we observe that the nested relation structures can be expressed as a Directed Acyclic Graph (DAG), and propose the model to simultaneously consider the word sequence of natural language in the horizontal direction and the DAG structure in the vertical direction. Based on two nested relation extraction tasks, namely semantic causality relation extraction and formula extraction, we show that the proposed model works well on them. Moreover, we speed up the DAG-LSTM training significantly by a simple parallelization solution.

### Learning Chinese Word Embeddings from Stroke, Structure and Pinyin of Characters

• Yun Zhang
• Yongguo Liu
• Jiajing Zhu
• Ziqiang Zheng
• Xiaofeng Liu
• Weiguang Wang
• Zijie Chen
• Shuangqing Zhai

Chinese word embeddings have recently attracted much attention in natural language processing (NLP). Existing researches learn Chinese word embeddings based on characters, radicals, components and stroke n-gram. Besides abovementioned features, Chinese characters also own structure and pinyin features. In this paper, we design feature substring, a super set of radicals, components and stroke n-gram with structure and pinyin information, to integrate stroke, structure and pinyin features of Chinese characters and capture the semantics of Chinese words. Based on the feature substring, we propose a novel method ssp2vec to predict the contextual words based on the feature substrings of the target words for learning Chinese word embeddings. It is based on our observation that exploiting the morphological information (stroke and structure) and the phonetic information (pinyin) is crucial for capturing the meanings of Chinese words. Meanwhile, the phonetic information (pinyin) can assist the model to distinguish Chinese words. Experimental results on word analogy, word similarity, text classification and named entity recognition tasks show that the proposed method obtains better results than state-of-the-art approaches.

### Sentiment Commonsense Induced Sequential Neural Networks for Sentiment Classification

• Chen Shiyun
• Lin Xin
• Xiao Yanghua
• He Liang

Although neural networks achieve promising performance in sentence level sentiment classification, most of them are not aware of sentiment commonsense, such as sentiment polarity tags (Positive or Negative) for words, which explicitly determine the sentiment of the sentence in most cases. In this paper, we propose an auxiliary tagging task to integrate sentiment commonsense into sequential neural networks (such as LSTM). We employ the advantage of multitask learning to achieve two goals simultaneously: 1) the sequential learning task accounts for incorporating the semantic information of the surrounding words; 2) the word tagging task ensures the sequential representation still retains the corresponding word tagging information. Besides, considering the most direct way to introduce sentiment information into models as additional knowledge, we further incorporate the additional knowledge enhancing tagging task model to strengthen the effect of sentiment commonsense. We prove the effectiveness of the sentiment commonsense by extensive experiments. The results show that our models exhibit consistent superiority over competitors on three real-word datasets. Specifically, we obtain an accuracy of 55.2%, which is a new state-of-the-art for SST-fine dataset.

### Interactive Multi-Grained Joint Model for Targeted Sentiment Analysis

• Da Yin
• Xiao Liu
• Xiaojun Wan

In this paper, we propose an interactive multi-grained joint model for targeted sentiment analysis. Firstly, different from previous works, we leverage the correlation between target and sentiment clues and deeply strengthen interaction between them because targets are highly related to the sentiment clues in a sentence. Moreover, we apply a multi-layer structure to consider multi-grained target and sentiment tagging information more comprehensively. Also, we design two specific loss functions to prevent a word from being both part of a target and a sentiment clue simultaneously, and to align the boundary information of two labeling subsystems. We conduct experiments on English and Spanish datasets and the experimental results show that our approach substantially outperforms a variety of previous models and achieves new state-of-the-art results on these datasets.

### Beyond word2vec: Distance-graph Tensor Factorization for Word and Document Embeddings

• Suhang Wang
• Charu Aggarwal
• Huan Liu

The \em word2vec methodology such as Skip-gram and CBOW has seen significant interest in recent years because of its ability to model semantic notions of word similarity and distances in sentences. A related methodology, referred to as \em doc2vec is also able to embed sentences and paragraphs. These methodologies, however, lead to different embeddings that cannot be related to one another. In this paper, we present a tensor factorization methodology, which simultaneously embeds words and sentences into latent representations in one shot. Furthermore, these latent representations are concretely related to one another via tensor factorization. Whereas \em word2vec and \em doc2vec are dependent on the use of contextual windows in order to create the projections, our approach treats each document as a structural graph on words. Therefore, all the documents in the corpus are jointly factorized in order to simultaneously create an embedding for the individual documents and the words. Since the graphical representation of a document is much richer than a contextual window, the approach is capable of designing more powerful representations than those using the \em word2vec family of methods. We use a carefully designed negative sampling methodology to provide an efficient implementation of the approach. We relate the approach to factorization machines, which provides an efficient alternative for its implementation. We present experimental results illustrating the effectiveness of the approach for document classification, information retrieval and visualization.

## SESSION: Long - Natural Language Processing II

### Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach

• Wei Huang
• Enhong Chen
• Qi Liu
• Yuying Chen
• Zai Huang
• Yang Liu
• Zhou Zhao
• Dan Zhang
• Shijin Wang

Hierarchical multi-label text classification (HMTC) is a fundamental but challenging task of numerous applications (e.g., patent annotation), where documents are assigned to multiple categories stored in a hierarchical structure. Categories at different levels of a document tend to have dependencies. However, the majority of prior studies for the HMTC task employ classifiers to either deal with all categories simultaneously or decompose the original problem into a set of flat multi-label classification subproblems, ignoring the associations between texts and the hierarchical structure and the dependencies among different levels of the hierarchical structure. To that end, in this paper, we propose a novel framework called Hierarchical Attention-based Recurrent Neural Network (HARNN) for classifying documents into the most relevant categories level by level via integrating texts and the hierarchical category structure. Specifically, we first apply a documentation representing layer for obtaining the representation of texts and the hierarchical structure. Then, we develop an hierarchical attention-based recurrent layer to model the dependencies among different levels of the hierarchical structure in a top-down fashion. Here, a hierarchical attention strategy is proposed to capture the associations between texts and the hierarchical structure. Finally, we design a hybrid method which is capable of predicting the categories of each level while classifying all categories in the entire hierarchical structure precisely. Extensive experimental results on two real-world datasets demonstrate the effectiveness and explanatory power of HARNN.

### A Semantics Aware Random Forest for Text Classification

• Md Zahidul Islam
• Jixue Liu
• Jiuyong Li
• Lin Liu
• Wei Kang

The Random Forest (RF) classifiers are suitable for dealing with the high dimensional noisy data in text classification. An RF model comprises a set of decision trees each of which is trained using random subsets of features. Given an instance, the prediction by the RF is obtained via majority voting of the predictions of all the trees in the forest. However, different test instances would have different values for the features used in the trees and the trees should contribute differently to the predictions. This diverse contribution of the trees is not considered in traditional RFs. Many approaches have been proposed to model the diverse contributions by selecting a subset of trees for each instance. This paper is among these approaches. It proposes a Semantics Aware Random Forest (SARF) classifier. SARF extracts the features used by trees to generate the predictions and selects a subset of the predictions for which the features are relevant to the predicted classes. We evaluated SARF's classification performance on $30$ real-world text datasets and assessed its competitiveness with state-of-the-art ensemble selection methods. The results demonstrate the superior performance of the proposed approach in textual information retrieval and initiate a new direction of research to utilise interpretability of classifiers.

### Federated Topic Modeling

• Di Jiang
• Yuanfeng Song
• Yongxin Tong
• Xueyang Wu
• Weiwei Zhao
• Qian Xu
• Qiang Yang

Topic modeling has been widely applied in a variety of industrial applications. Training a high-quality model usually requires massive amount of in-domain data, in order to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence training topic models in industrial scenarios using conventional approaches faces a dilemma: a party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this paper, we propose a novel framework named Federated Topic Modeling (FTM), in which multiple parties collaboratively train a high-quality topic model by simultaneously alleviating data scarcity and maintaining immune to privacy adversaries. FTM is inspired by federated learning and consists of novel techniques such as private Metropolis Hastings, topic-wise normalization and heterogeneous model integration. We conduct a series of quantitative evaluations to verify the effectiveness of FTM and deploy FTM in an Automatic Speech Recognition (ASR) system to demonstrate its utility in real-life applications. Experimental results verify FTM's superiority over conventional topic modeling.

### Multi-Turn Response Selection in Retrieval-Based Chatbots with Iterated Attentive Convolution Matching Network

• Heyuan Wang
• Ziyi Wu
• Junyu Chen

Building an intelligent chatbot with multi-turn dialogue ability is a major challenge, which requires understanding the multi-view semantic and dependency correlation among words, n-grams and sub-sequences. In this paper, we investigate selecting the proper response for a context through multi-grained representation and interactive matching. To construct hierarchical representation types of text segments, we propose a refined architecture which exclusively consists of gated dilated-convolution and self-attention. Compared with the recurrent-based sentence modeling methods, this architecture provides more flexibility and a speedup. The matching signals of each utterance-response pair are extracted by integrating the interactive information from different views. Then a turns-aware attention mechanism is utilized to aggregate the matching sequence, so as to identify important utterances and capture the implicit relationship of the whole context. Experiments on two large-scale public data sets show that our model significantly outperforms the state-of-the-art methods in terms of all metrics. We empirically provide a thorough ablation test, as well as the comparison of different representation and matching strategies, for a better insight into how each component affects the performance of the model.

### Sentiment Lexicon Enhanced Neural Sentiment Classification

• Chuhan Wu
• Fangzhao Wu
• Junxin Liu
• Yongfeng Huang
• Xing Xie

Sentiment classification is an important task in the sentiment analysis field. Many deep learning based sentiment classification methods have been proposed in recent years. However, these methods usually rely on massive labeled texts to train sentiment classifiers, which are expensive and time-consuming to annotate. Luckily, many high-quality sentiment lexicons have been constructed and can cover a large number of sentiment words. Since sentiment words are the basic units to convey sentiments in texts, these sentiment lexicons have the potential to improve the performance of neural sentiment classification. In this paper, we propose two approaches to exploit sentiment lexicons to enhance neural sentiment classification. In our first approach we use sentiment lexicons to learn sentiment-aware attentions. We propose a word sentiment classification task to classify the sentiments of words in a sentence based on their hidden representations in the attention network of neural sentiment classification models. We jointly train this task with neural sentiment classifier to facilitate the attention network to recognize and highlight sentiment-bearing words. In our second approach we use sentiment lexicons to learn sentiment-aware word embeddings. We design an auxiliary task to classify the sentiments of words in sentiment lexicons based on their word embeddings, and jointly train this task with neural sentiment classifier to encode sentiment information in sentiment lexicons to word embeddings. Extensive experiments on three benchmark datasets validate the effectiveness of our approach.

## SESSION: Long - Deep Nerual Network I

### ResumeGAN: An Optimized Deep Representation Learning Framework for Talent-Job Fit via Adversarial Learning

• Yong Luo
• Huaizheng Zhang
• Yonggang Wen
• Xinwen Zhang

Nowadays, it is popular to utilize online recruitment services for talent recruitment and job recommendation. Given the vast amounts of online talent profiles and job-posts, it is labor-intensive and exhausted for recruiters to manually select only a few potential candidates for further consideration, and also nontrivial for talents to find the most matched job positions. Recently, some deep learning-based approaches are developed to automatically matching the talent resumes and job requirements, and have achieved encouraging performance. In this paper, we propose a novel framework that targets the same task, but integrate different types of information in a more sophisticated way and introduce adversarial learning to learn more expressive representation. In addition, we build a dataset for model evaluation and the effectiveness of our framework is demonstrated by extensive experiments.

### Regularizing Deep Neural Networks by Ensemble-based Low-Level Sample-Variances Method

• Shuai Yao
• Yuexian Hou
• Liangzhu Ge
• Zeting Hu

Deep Neural Networks (DNNs) with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Till now, many regularizers such as dropout, data augmentation have been proposed to prevent overfitting. Motivated by ensemble learning, we treat each hidden layer in neural networks as an ensemble of some base learners by dividing hidden units into some non-overlapping groups and each group is considered as a base learner. Based on the theoretical analysis of generalization error of ensemble estimators (bias-variance-covariance decomposition), we find the variance of each base learner plays an important role in preventing overfitting and propose a novel regularizer---\emphEnsemble-based Low-Level Sample-Variances Method (ELSM) to encourage each base learner of hidden layers to have a low-level sample-variance. Experiments across a number of datasets and network architectures show that ELSM can effectively reduce overfitting and improve the generalization ability of DNNs.

### Attention-Residual Network with CNN for Rumor Detection

• Yixuan Chen
• Jie Sui
• Liang Hu
• Wei Gong

Wide dissemination of unverified claims has negative influence on social lives. Rumors are easy to emerge and spread in the crowds especially in Online Social Network (OSN), due to its openness and extensive amount of users. Therefore, rumor detection in OSN is a very challenging and urgent issue. In this paper, we propose an Attention-Residual network combined with CNN (ARC), which is based on the content features for rumor detection. First, we build a data encoding model based on word-level data for contextual feature representation. Second, we propose a residual framework based on fine-tuned attention mechanism to capture long-range dependency. Third, we apply convolution neural network with varying window size to select important components and local features. Experiments on two twitter datasets demonstrate that the proposed model has better performance than other content-based methods both in rumor detection and early rumor verification. To the best of our knowledge, we are the first work that utilize attention model in conjunction with residual network on rumor detection.

### Imbalance Rectification in Deep Logistic Regression for Multi-Label Image Classification Using Random Noise Samples

• Wenjin Yan
• Ruixuan Li
• Jun Wang
• Yuhua Li
• Jinyang Wang
• Pan Zhou
• Xiwu Gu

Logistic regression (LR) is the most commonly used loss function in multi-label image classification. However, it suffers from class imbalance problem caused by the huge difference in quantity between positive and negative samples as well as between different classes. First, we find that feeding randomly generated noise samples into an LR classifier is an effective way to detect class imbalances, and further define an informative imbalance metric named inference tendency based on noise sample analysis. Second, we design an efficient moving average based method for calculating inference tendency, which can be easily done during training with negligible overhead. Third, two novel rectification methods called extremum shift (ES) and tendency constraint (TC) are designed to offset or constrain inference tendency in the loss function, and mitigate class imbalances significantly. Finally, comparative experiments with Resnet on Microsoft COCO, NUS-WIDE and DeepFashion demonstrate the effectiveness of inference tendency and the superiority of our approach over the baseline LR and several state-of-the-art alternatives.

### CamDrop: A New Explanation of Dropout and A Guided Regularization Method for Deep Neural Networks

• Hongjun Wang
• Guangrun Wang
• Guanbin Li
• Liang Lin

To force convolutional networks to explore more discriminative evidence throughout spatial regions, this paper presents a novel CamDrop to improve the conventional dropout in two aspects. First, by considering the intensity of class activation mapping (CAM) all around, CamDrop selectively abandons some specific spatial regions in predominating visual patterns at each iteration. In many classification tasks, CamDrop demonstrates its effectiveness and achieves considerable improvements on robust predictions for adversarial examples. Second, although dropout is a widely adopted technique that has been applied to regularize large models, the improvement in performance always attributes to better preventing DNN from overfitting. Here we give a new explanation of dropout from the perspective of optimization that it makes the upper bound of the magnitude of gradients much tighter, which leads to a more stable behavior of the gradients and effectively avoids neurons falling into the saturation region of the nonlinear activation, even when using high learning rates. Extensive experiments have been performed to prove the above two strengths of CamDrop.

## SESSION: Long - Deep Nerual Network II

### Dynamic Collaborative Recurrent Learning

• Teng Xiao
• Shangsong Liang
• Zaiqiao Meng

In this paper, we provide a unified learning algorithm, dynamic collaborative recurrent learning, DCRL, of two directions of recommendations: temporal recommendations focusing on tracking the evolution of users' long-term preference and sequential recommendations focusing on capturing short-term preferences given a short time window. Our DCRL builds based on RNN and Sate Space Model (SSM), and thus it is not only able to collaboratively capture users' short-term and long-term preferences as in sequential recommendations, but also can dynamically track the evolution of users' long-term preferences as in temporal recommendations in a unified framework. In addition, we introduce two smoothing and filtering scalable inference algorithms for DCRL's offline and online learning, respectively, based on amortized variational inference, allowing us to effectively train the model jointly over all time. Experiments demonstrate DCRL outperforms the temporal and sequential recommender models, and does capture users' short-term preferences and track the evolution of long-term preferences.

### AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

• Weiping Song
• Chence Shi
• Zhiping Xiao
• Zhijian Duan
• Yewen Xu
• Ming Zhang
• Jian Tang

Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming to hand-craft by domain experts and are impossible to be enumerated. Therefore, there have been efforts in finding low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations. In this paper, we propose an effective and efficient method called the AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general, which can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. Code is available at: \urlhttps://github.com/DeepGraphLearning/RecommenderSystems.

### Automatic Construction of Multi-layer Perceptron Network from Streaming Examples

• Mahardhika Pratama
• Choiru Za'in
• Andri Ashfahani
• Yew Soon Ong
• Weiping Ding

### Robust Embedded Deep K-means Clustering

• Rui Zhang
• Hanghang Tong
• Yinglong Xia

Deep neural network clustering is superior to the conventional clustering methods due to deep feature extraction and nonlinear dimensionality reduction. Nevertheless, deep neural network leads to a rough representation regarding the inherent relationship of the data points. Therefore, it is still difficult for deep neural network to exploit the effective structure for direct clustering. To address this issue, we propose a robust embedded deep K-means clustering (RED-KC) method. The proposed RED-KC approach utilizes the δ-norm metric to constrain the feature mapping process of the auto-encoder network, so that data are mapped to a latent feature space, which is more conducive to the robust clustering. Compared to the existing auto-encoder networks with the fixed prior, the proposed RED-KC is adaptive during the process of feature mapping. More importantly, the proposed RED-KC embeds the clustering process with the auto-encoder network, such that deep feature extraction and clustering can be performed simultaneously. Accordingly, a direct and efficient clustering could be obtained within only one step to avoid the inconvenience of multiple separate stages, namely, losing pivotal information and correlation. Consequently, extensive experiments are provided to validate the effectiveness of the proposed approach.

## SESSION: Long - Network Science

### Discovering Interesting Cycles in Directed Graphs

• Cigdem Aslay
• Tijl De Bie
• Aristides Gionis
• Jefrey Lijffijt

Cycles in graphs often signify interesting processes. For example, cyclic trading patterns can indicate inefficiencies or economic dependencies in trade networks, cycles in food webs can identify fragile dependencies in ecosystems, and cycles in financial transaction networks can be an indication of money laundering. Identifying such interesting cycles, which can also be constrained to contain a given set of query nodes, although not extensively studied, is thus a problem of considerable importance. In this paper, we introduce the problem of discovering interesting cycles in graphs. We first address the problem of quantifying the extent to which a given cycle is interesting for a particular analyst. We then show that finding cycles according to this interestingness measure is related to the longest cycle and maximum mean-weight cycle problems (in the unconstrained setting) and to the maximum Steiner cycle and maximum mean Steiner cycle problems (in the constrained setting). A complexity analysis shows that finding interesting cycles is NP-hard, and is NP-hard to approximate within a constant factor in the unconstrained setting, and within a factor polynomial in the input size for the constrained setting. The latter inapproximability result implies a similar result for the maximum Steiner cycle and maximum mean Steiner cycle problems. Motivated by these hardness results, we propose a number of efficient heuristic algorithms. We verify the effectiveness of the proposed methods and demonstrate their practical utility on two real-world use cases: a food web and an international trade-network dataset.

### FLEET: Butterfly Estimation from a Bipartite Graph Stream

• Seyed-Vahid Sanei-Mehri
• Yu Zhang
• Ahmet Erdem Sariyüce
• Srikanta Tirthapura

We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.

### Selecting the Optimal Groups: Efficiently Computing Skyline k-Cliques

• Chen Zhang
• Wenjie Zhang
• Ying Zhang
• Lu Qin
• Fan Zhang
• Xuemin Lin

In many applications, graphs often involve the nodes with multi-dimensional numerical attributes, and it is desirable to retrieve a group of nodes that are both highly connected (e.g., clique) and optimal according to some ranking functions. It is well known that the skyline returns candidates for the optimal objects when ranking functions are not specified. Motivated by this, in this paper we formulate the novel model of skyline k-cliques over multi-valued attributed graphs and develop efficient algorithms to conduct the computation. To verify the group based dominance between two k-cliques, we make use of maximum bipartite matching and develop a set of optimization techniques to improve the verification efficiency. Then, a progressive computation algorithm is developed which enumerates the k-cliques in an order such that a k-clique is guaranteed not to be dominated by those generated after it. Novel pruning and early termination techniques are developed to exclude unpromising nodes or cliques by investigating the structural and attribute properties of the multi-valued attributed graph. Empirical studies on four real datasets demonstrate the effectiveness of the skyline k-clique model and the efficiency of the novel computing techniques.

### Balance in Signed Bipartite Networks

• Tyler Derr
• Cassidy Johnson
• Yi Chang
• Jiliang Tang

A large portion of today's big data can be represented as networks. However, not all networks are the same, and in fact, for many that have additional complexities to their structure, traditional general network analysis methods are no longer applicable. For example, signed networks contain both positive and negative links, and thus dedicated theories and algorithms have been developed. However, previous work mainly focuses on the unipartite setting where signed links connect any pair of nodes. Signed bipartite networks on the one hand, are commonly found, but have primarily been overlooked. Their complexities of having two node types where signed links can only form across the two sets introduce challenges that prevent most existing literature on unipartite signed and unsigned bipartite networks from being applied. On the other hand, balance theory, a key signed social theory, has been generally defined for cycles of any length and is being used in the form of triangles for numerous unipartite signed network tasks. However, in bipartite networks there are no triangles and furthermore there exist two types of nodes. Therefore, in this work, we conduct the first comprehensive analysis and validation of balance theory using the smallest cycle in signed bipartite networks - signed butterflies (i.e., cycles of length 4 containing the two node types). Then, to investigate the applicability of balance theory aiding signed bipartite network tasks, we develop multiple sign prediction methods that utilize balance theory in the form of signed butterflies. Our sign prediction experiment on three real-world signed bipartite networks demonstrates the effectiveness of using these signed butterflies for not only sign prediction, but paves the way for improvements in other signed bipartite network analysis tasks.

### Adaptive Algorithms for Estimating Betweenness and k-path Centralities

• Mostafa Haghir Chehreghani
• Albert Bifet
• Talel Abdessalem

Betweenness centrality and k-path centrality are two important indices that are widely used to analyze social, technological and information networks. In the current paper, first given a directed network G and a vertex $r\in V(G)$, we present a novel adaptive algorithm for estimating betweenness score of r. Our algorithm first computes two subsets of the vertex set of G, called $\mathcalRF (r)$ and $\mathcalRT (r)$. They define the sample spaces of the start-points and the end-points of the samples. Then, it adaptively samples from $\mathcalRF (r)$ and $\mathcalRT (r)$ and stops as soon as some condition is satisfied. The stopping condition depends on the samples met so far, $|\mathcalRF (r)|$ and $|\mathcalRT (r)|$. We show that compared to the well-known existing algorithms, our algorithm gives a better $(łambda,δ)$-approximation. Then, we propose a novel algorithm for estimating k-path centrality of r. Our algorithm is based on computing two sets $\mathcalRF (r)$ and $\mathcalD (r)$. While $\mathcalRF (r)$ defines the sample space of the source vertices of the sampled paths, $\mathcalD (r)$ defines the sample space of the other vertices of the paths. We show that in order to give a $(łambda,δ)$-approximation of the k-path score of r, our algorithm requires considerably less samples. Moreover, it processes each sample faster and with less memory. Finally, we empirically evaluate our proposed algorithms and show their superior performance. Also, we show that they can be used to efficiently compute centrality scores of a set of vertices.

## SESSION: Long - Online and Real-Time

### Interactive Variance Attention based Online Spoiler Detection for Time-Sync Comments

• Wenmian Yang
• Weijia Jia
• Wenyuan Gao
• Xiaojie Zhou
• Yutao Luo

Nowadays, time-sync comment (TSC), a new form of interactive comments, has become increasingly popular on Chinese video websites. By posting TSCs, people can easily express their feelings and exchange their opinions with others when watching online videos. However, some spoilers appear among the TSCs. These spoilers reveal crucial plots in videos that ruin people's surprise when they first watch the video. In this paper, we proposed a novel Similarity-Based Network with Interactive Variance Attention (SBN-IVA) to classify comments as spoilers or not. In this framework, we firstly extract textual features of TSCs through the word-level attentive encoder. We design Similarity-Based Network (SBN) to acquire neighbor and keyframe similarity according to semantic similarity and timestamps of TSCs. Then, we implement Interactive Variance Attention (IVA) to eliminate the impact of noise comments. Finally, we obtain the likelihood of spoiler based on the difference between the neighbor and keyframe similarity. Experiments show SBN-IVA is on average 11.2% higher than the state-of-the-art method on F1-score in baselines.

### Detecting Malicious Accounts in Online Developer Communities Using Deep Learning

• Qingyuan Gong
• Jiayun Zhang
• Yang Chen
• Qi Li
• Yu Xiao
• Xin Wang
• Pan Hui

Online developer communities like GitHub provide services such as distributed version control and task management, which allow a massive number of developers to collaborate online. However, the openness of the communities makes themselves vulnerable to different types of malicious attacks, since the attackers can easily join and interact with legitimate users. In this work, we formulate the malicious account detection problem in online developer communities, and propose GitSec, a deep learning-based solution to detect malicious accounts. GitSec distinguishes malicious accounts from legitimate ones based on the account profiles as well as dynamic activity characteristics. On one hand, GitSec makes use of users' descriptive features from the profiles. On the other hand, GitSec processes users' dynamic behavioral data by constructing two user activity sequences and applying a parallel neural network design to deal with each of them, respectively. An attention mechanism is used to integrate the information generated by the parallel neural networks. The final judgement is made by a decision maker implemented by a supervised machine learning-based classifier. Based on the real-world data of GitHub users, our extensive evaluations show that GitSec is an accurate detection system, with an F1-score of 0.922 and an AUC value of 0.940.

### Exploring Multi-Objective Exercise Recommendations in Online Education Systems

• Zhenya Huang
• Qi Liu
• Chengxiang Zhai
• Yu Yin
• Enhong Chen
• Weibo Gao
• Guoping Hu

Recommending suitable exercises to students in an online education system is highly useful. Existing approaches usually rely on machine learning techniques to mine large amounts of student interaction log data accumulated in the systems to select the most suitable exercises for each student. Generally, they mainly aim to optimize a single objective, i.e., recommending non-mastered exercises to address the immediate weakness of students. While this is a reasonable objective, there exist more beneficial multiple objectives in the long-term learning process that need to be addressed including Review & Explore, Smoothness of difficulty level and Engagement. In this paper, we propose a novel Deep Reinforcement learning framework, namely DRE, for adaptively recommending Exercises to students with optimization of above three objectives. In the framework, we propose two different Exercise Q-Networks for the agent, i.e., EQNM and EQNR, to generate recommendations following Markov property and Recurrent manner, respectively. We also propose novel reward functions to formally quantify those three objectives so that DRE could update and optimize its recommendation strategy by interactively receiving students' performance feedbacks (e.g., score). We conduct extensive experiments on two real-world datasets. Experimental results clearly show that the proposed DRE can effectively learn from the student interaction data to optimize multiple objectives in a single unified framework and adaptively recommend suitable exercises to students.

### Into the Battlefield: Quantifying and Modeling Intra-community Conflicts in Online Discussion

• Subhabrata Dutta
• Dipankar Das
• Gunkirat Kaur
• Shreyans Mongia
• Arpan Mukherjee
• Tanmoy Chakraborty

Over the last decade, online forums have become primary news sources for readers around the globe, and social media platforms are the space where these news forums find most of their audience and engagement. Our particular focus in this paper is to study conflict dynamics over online news articles in Reddit, one of the most popular online discussion platforms. We choose to study how conflicts develop around news inside a discussion community, the \em r/news subreddit. Mining the characteristics of these engagements often provide useful insights into the behavioral dynamics of large-scale human interactions. Such insights are useful for many reasons -- for news houses to improvise their publishing strategies and potential audience, for data analytics to get a better introspection over media engagement as well as for social media platforms to avoid unnecessary and perilous conflicts. In this work, we present a novel quantification of conflict in online discussion. Unlike previous studies on conflict dynamics, which model conflict as a binary phenomenon, our measure is continuous-valued, which we validate with manually annotated ratings. We address a two-way prediction task. Firstly, we predict the probable degree of conflict a news article will face from its audience. We employ multiple machine learning frameworks for this task using various features extracted from news articles.Secondly, given a pair of users and their interaction history, we predict if their future engagement will result in a conflict. We fuse textual and network-based features together using a support vector machine which achieves an AUC of 0.89. Moreover, we implement a graph convolutional model which exploits engagement histories of users to predict whether a pair of users who never met each other before will have a conflicting interaction, with an AUC of 0.69. We perform our studies on a massive discussion dataset crawled from the Reddit news community, containing over $41k$ news articles and $5.5$ million comments. Apart from the prediction tasks, our studies offer interesting insights on the conflict dynamics -- how users form clusters based on conflicting engagements, how different is the temporal nature of conflict over different online news forums, how is contribution of different language based features to induce conflict, etc. In short, our study paves the way towards new methods of exploration and modeling of conflict dynamics inside online discussion communities.

### Offline and Online Satisfaction Prediction in Open-Domain Conversational Systems

• Jason Ingyu Choi
• Eugene Agichtein

Predicting user satisfaction in conversational systems has become critical, as spoken conversational assistants operate in increasingly complex domains. Online satisfaction prediction (i.e., predicting satisfaction of the user with the system after each turn) could be used as a new proxy for implicit user feedback, and offers promising opportunities to create more responsive and effective conversational agents, which adapt to the user's engagement with the agent. To accomplish this goal, we propose a conversational satisfaction prediction model specifically designed for open-domain spoken conversational agents, called ConvSAT. To operate robustly across domains, ConvSAT aggregates multiple representations of the conversation, namely the conversation history, utterance and response content, and system- and user-oriented behavioral signals. We first calibrate ConvSAT performance against state of the art methods on a standard dataset (Dialogue Breakdown Detection Challenge) in an online regime, and then evaluate ConvSAT on a large dataset of conversations with real users, collected as part of the Alexa Prize competition. Our experimental results show that ConvSAT significantly improves satisfaction prediction for both offline and online setting on both datasets, compared to the previously reported state-of-the-art approaches. The insights from our study can enable more intelligent conversational systems, which could adapt in real-time to the inferred user satisfaction and engagement.

## SESSION: Long - Privacy

### Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis

• Jing Ma
• Qiuchen Zhang
• Jian Lou
• Joyce C. Ho
• Li Xiong
• Xiaoqian Jiang

Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted to concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results which could reveal sensitive patient information. Therefore, the challenge is how to jointly decompose the tensor under rigorous and principled privacy constraints, while still support the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHR. It embeds advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR database private but also collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact solves the heterogeneous patient population using a structured sparsity term. In our framework, each hospital decomposes its local tensors and sends the updated intermediary results with output perturbation every several iterations to a semi-trusted server which generates the phenotypes. The evaluation on both real-world and synthetic datasets demonstrated that under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.

### Achieve Privacy-Preserving Truth Discovery in Crowdsensing Systems

• Jianchao Tang
• ShaoJing Fu
• Ming Xu
• Yuchuan Luo
• Kai Huang

To solve the problem that the data collected in crowdsensing systems are not reliable, a large number of truth discovery protocols have been proposed. However, most of them neglect the privacy protection existing in crowdsensing systems. Some truth discovery protocols that consider privacy only provide limited privacy protection, such as only protecting the privacy of collected data. To bridge the gap, in this paper, we propose a more comprehensive privacy-preserving truth discovery protocol that can simultaneously protect the privacy of participants and truth results. Specifically, our protocol encrypts participants' observed data based on Paillier Homomorphic Cryptosystem. Then, through the interaction between two servers, we can calculate participants' weights and estimate the truth results in the encrypted domain. Moreover, based on the data perturbation technology, the privacy of sensitive data exchanged between the two servers is protected in our protocol. Theoretical analysis and experimental results demonstrate that our protocol can effectively protect the privacy of participants and truth results without losing the accuracy of truth results.

### Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas

• Teng Wang
• Jun Zhao
• Han Yu
• Jinyan Liu
• Xinyu Yang
• Xuebin Ren
• Shuyu Shi

With the rapid development of artificial intelligence (AI), ethical issues surrounding AI have attracted increasing attention. In particular, autonomous vehicles may face moral dilemmas in accident scenarios, such as staying the course resulting in hurting pedestrians or swerving leading to hurting passengers. To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision. Although a useful methodology for building ethical AI systems, such an approach can potentially violate the privacy of voters since moral preferences are sensitive information and their disclosure can be exploited by malicious parties resulting in negative consequences. In this paper, we report a first-of-its-kind privacy-preserving crowd-guided AI decision-making approach in ethical dilemmas. We adopt the formal and popular notion of differential privacy to quantify privacy, and consider four granularities of privacy protection by taking voter-/record-level privacy protection and centralized/distributed perturbation into account, resulting in four approaches VLCP, RLCP, VLDP, and RLDP, respectively. Moreover, we propose different algorithms to achieve these privacy protection granularities, while retaining the accuracy of the learned moral preference model. Specifically, VLCP and RLCP are implemented with the data aggregator setting a universal privacy parameter and perturbing the averaged moral preference to protect the privacy of voters' data. VLDP and RLDP are implemented in such a way that each voter perturbs her/his local moral preference with a personalized privacy parameter. Extensive experiments based on both synthetic data and real-world data of voters' moral decisions demonstrate that the proposed approaches achieve high accuracy of preference aggregation while protecting individual voter's privacy.

### Privacy Preserving Approximate K-means Clustering

• Chandan Biswas
• Debasis Ganguly
• Dwaipayan Roy
• Ujjwal Bhattacharya

Privacy preserving computation is of utmost importance in a cloud computing environment where a client often requires to send sensitive data to servers offering computing services over untrusted networks. Eavesdropping over the network or malware at the server may lead to leaking sensitive information from the data. To prevent this, we propose to encode the input data in such a way that, firstly, it should be difficult to decode it back to the true data, and secondly, the computational results obtained with the encoded data should not be substantially different from those obtained with the true data. Specifically, the computational activity that we focus on is the K-means clustering, which is widely used for many data mining tasks. Our proposed variant of the K-means algorithm is capable of privacy preservation in the sense that it requires as input only binary encoded data, and is not allowed to access the true data vectors at any stage of the computation. During intermediate stages of K-means computation, our algorithm is able to effectively process the inputs with incomplete information seeking to yield outputs relatively close to the complete information (non-encoded) case. Evaluation on real datasets show that the proposed methods yields comparable clustering effectiveness in comparison to the standard K-means algorithm on image clustering (MNIST-8M dataset), and in fact outperforms the standard K-means on text clustering (ODPtweets dataset).

### Practical Access Pattern Privacy by Combining PIR and Oblivious Shuffle

• Zhilin Zhang
• Ke Wang
• Weipeng Lin
• Raymond Chi-Wing Wong

We consider the following secure data retrieval problem: a client outsources encrypted data blocks to a semi-trusted cloud server and later retrieves blocks without disclosing access patterns. Existing PIR and ORAM solutions suffer from serious performance bottlenecks in terms of communication or computation costs. To help eliminate this void, we introduce "access pattern unlinkability'' that separates access pattern privacy into short-term privacy at individual query level and long-term privacy at query distribution level. This new security definition provides tunable trade-offs between privacy and query performance. We present an efficient construction, called SBR protocol, using PIR and Oblivious Shuffling to enable secure data retrieval while satisfying access pattern unlinkability. Both analytical and empirical analysis show that SBR exhibits flexibility and usability in practice.

## SESSION: Long - Question Answering and Dialogue Systems I

### A Hybrid Retrieval-Generation Neural Conversation Model

• Liu Yang
• Junjie Hu
• Minghui Qiu
• Chen Qu
• Jianfeng Gao
• W. Bruce Croft
• Xiaodong Liu
• Yelong Shen
• Jingjing Liu

Intelligent personal assistant systems that are able to have multi-turn conversations with human users are becoming increasingly popular. Most previous research has been focused on using either retrieval-based or generation-based methods to develop such systems. Retrieval-based methods have the advantage of returning fluent and informative responses with great diversity. However, the performance of the methods is limited by the size of the response repository. On the other hand, generation-based methods can produce highly coherent responses on any topics. But the generated responses are often generic and not informative due to the lack of grounding knowledge. In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods. Experimental results on Twitter and Foursquare data show that the proposed model outperforms both retrieval-based methods and generation-based methods (including a recently proposed knowledge-grounded neural conversation model) under both automatic evaluation metrics and human evaluation. We hope that the findings in this study provide new insights on how to integrate text retrieval and text generation models for building conversation systems.

### A Latent-Constrained Variational Neural Dialogue Model for Information-Rich Responses

• Yanan Zheng
• Yan Wang
• Lijie Wen
• Jianmin Wang

The variational neural models have achieved significant progress in dialogue generation. They are of encoder-decoder architecture, with stochastic latent variables learned at the utterance level. However, latent variables are usually approximated by factorized-form distributions, the value space of which is too large relative to latent features to be encoded, leading to the sparsity problem. As a result, little useful information is carried in latent representations, and generated responses tend to be non-committal and meaningless. To address it, we initially propose the Latent-Constrained Variational Neural Dialogue Model (LC-VNDM). It follows variational neural dialogue framework, with an utterance encoder, a context encoder and a response decoder hierarchically organized. Particularly, LC-VNDM uses a hierarchically-structured variational distribution form, which considers inter-dependencies between latent variables. Thus it defines a constrained latent value space, and prevents latent global features from being diluted. Therefore, latent representations sampled from it would carry richer global information to facilitate the decoding, generating meaningful responses. We conduct extensive experiments on three datasets using automatic evaluation and human evaluation. Experiments prove that LC-VNDM significantly outperforms the state-of-the-arts and can generate information-richer responses by learning a better-quality latent space.

### Legal Summarization for Multi-role Debate Dialogue via Controversy Focus Mining and Multi-task Learning

• Xinyu Duan
• Yating Zhang
• Lin Yuan
• Xin Zhou
• Xiaozhong Liu
• Tianyi Wang
• Ruocheng Wang
• Qiong Zhang
• Changlong Sun
• Fei Wu

Multi-role court debate is a critical component in a civil trial where parties from different camps (plaintiff, defendant, witness, judge, etc.) actively involved. Unlike other types of dialogue, court debate can be lengthy, and important information, with respect to the controversy focus(es), often hides within the redundant and colloquial dialogue data. Summarizing court debate can be a novel but significant task to assist judge to effectively make the legal decision for the target trial. In this work, we propose an innovative end-to-end model to address this problem. Unlike prior summarization efforts, the proposed model projects the multi-role debate into the controversy focus space, which enables high-quality essential utterance(s) extraction in terms of legal knowledge and judicial factors. An extensive set of experiments with a large civil trial dataset shows that the proposed model can provide more accurate and readable summarization against several alternatives in the multi-role court debate scene.

### ConCET: Entity-Aware Topic Classification for Open-Domain Conversational Agents

• Harshita Sahijwani
• Jason Ingyu Choi
• Eugene Agichtein

Identifying the topic (domain) of each user's utterance in open-domain conversational systems is a crucial step for all subsequent language understanding and response tasks. In particular, for complex domains, an utterance is often routed to a single component responsible for that domain. Thus, correctly mapping a user utterance to the right domain is critical. This is a challenging task: users could mention entities like actors, singers or locations to implicitly indicate the domain, which requires extensive domain knowledge to interpret. To address this problem, we introduce ConCET: a Concurrent Entity-aware conversational Topic classifier, which incorporates entity type information together with the utterance content features. Specifically, ConCET utilizes entity information to enrich the utterance representation, combining character, word, and entity type embeddings into a single representation. However, for rich domains with millions of available entities, unrealistic amounts of labeled training data would be required. To complement our model, we propose a simple and effective method for generating synthetic training data, to augment the typically limited amounts of labeled training data, using commonly available knowledge bases as to generate additional labeled utterances. We extensively evaluate ConCET and our proposed training method first on an openly available human-human conversational dataset called Self-Dialogue, to calibrate our approach against previous state-of-the-art methods; second, we evaluate ConCET on a large dataset of human-machine conversations with real users, collected as part of the Amazon Alexa Prize. Our results show that ConCET significantly improves topic classification performance on both datasets, reaching 8-10% improvements compared to state-of-the-art deep learning methods. We complement our quantitative results with detailed analysis of system performance, which could be used for further improvements of conversational agents.

### An Interactive Mechanism to Improve Question Answering Systems via Feedback

• Xinbo Zhang
• Lei Zou
• Sen Hu

Semantic parsing-based RDF question answering (QA) systems are to interpret users' natural language questions as query graphs and return answers over RDF repository. However, due to the complexity of linking natural phrases with specific RDF items (e.g., entities and predicates), it remains difficult to understand users' question sentences precisely, hence QA systems may not meet users' expectation, offering wrong answers and dismissing some correct answers. In this paper, we design an I nteractive M echanism aiming for PRO motion V ia users' fe edback to Q A systems (IMPROVE-QA), a whole framework to not only make existing QA systems return more precise answers based on a few feedbacks over the original answers given by RDF QA systems, but also enhance paraphrasing dictionaries to ensure a continuous-learning capability in improving RDF QA systems. To provide better interactivity and online performance, we design a holistic graph mining algorithm (HWspan) to automatically refine the query graph. Extensive experiments on both Freebase and DBpedia confirm the effectiveness and superiority of our approach.

## SESSION: Long - Question Answering and Dialogue Systems II

### Attentive History Selection for Conversational Question Answering

• Chen Qu
• Liu Yang
• Minghui Qiu
• Yongfeng Zhang
• Cen Chen
• W. Bruce Croft
• Mohit Iyyer

### Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction

• Wei Wei
• Jiayi Liu
• Xianling Mao
• Guibing Guo
• Feida Zhu
• Pan Zhou
• Yuchong Hu

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions. However, this challenge is not well addressed in the literature, since most of the approaches neglect the emotional information conveyed by a post while generating responses. This article addresses this problem by proposing a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post for generating more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.

### Commonsense Properties from Query Logs and Question Answering Forums

• Julien Romero
• Simon Razniewski
• Koninika Pal
• Jeff Z. Pan
• Gerhard Weikum

Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.

### Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

• Avikalp Srivastava
• Hsin-Wen Liu
• Sumio Fujita

Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasingly multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA and use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both tasks of classification and expert retrieval on real-world multimodal CQA data.

### Message Passing for Complex Question Answering over Knowledge Graphs

• Svitlana Vakulenko
• Javier David Fernandez Garcia
• Axel Polleres
• Maarten de Rijke
• Michael Cochez

Question answering over knowledge graphs (KGQA) has evolved from simple single-fact questions to complex questions that require graph traversal and aggregation. We propose a novel approach for complex KGQA that uses unsupervised message passing, which propagates confidence scores obtained by parsing an input question and matching terms in the knowledge graph to a set of possible answers. First, we identify entity, relationship, and class names mentioned in a natural language question, and map these to their counterparts in the graph. Then, the confidence scores of these mappings propagate through the graph structure to locate the answer entities. Finally, these are aggregated depending on the identified question type. This approach can be efficiently implemented as a series of sparse matrix multiplications mimicking joins over small local subgraphs. Our evaluation results show that the proposed approach outperforms the state of the art on the LC-QuAD benchmark. Moreover, we show that the performance of the approach depends only on the quality of the question interpretation results, i.e., given a correct relevance score distribution, our approach always produces a correct answer ranking. Our error analysis reveals correct answers missing from the benchmark dataset and inconsistencies in the DBpedia knowledge graph. Finally, we provide a comprehensive evaluation of the proposed approach accompanied with an ablation study and an error analysis, which showcase the pitfalls for each of the question answering components in more detail.

## SESSION: Long - Recommendation System I

### BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

• Fei Sun
• Jun Liu
• Jian Wu
• Changhua Pei
• Xiao Lin
• Wenwu Ou
• Peng Jiang

Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users' historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to the limitations including: \begin enumerate* [label=series\itshape\alph*\upshape)] \item unidirectional architectures restrict the power of hidden representation in users' behavior sequences; \item they often assume a rigidly ordered sequence which is not always practical. \end enumerate* To address these limitations, we proposed a sequential recommendation model called BERT4Rec, which employs the deep bidirectional self-attention to model user behavior sequences. To avoid the information leakage and efficiently train the bidirectional model, we adopt the Cloze objective to sequential recommendation, predicting the random masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model to make recommendations by allowing each item in user historical behaviors to fuse information from both left and right sides. Extensive experiments on four benchmark datasets show that our model outperforms various state-of-the-art sequential models consistently.

### Adaptive Feature Sampling for Recommendation with Missing Content Feature Values

• Shaoyun Shi
• Min Zhang
• Xinxing Yu
• Yongfeng Zhang
• Bin Hao
• Yiqun Liu
• Shaoping Ma

Most recommendation algorithms mainly make use of user history interactions in the model, while these methods often suffer from the cold-start problem (user/item has no history information). On the other sides, content features help on cold-start scenarios for modeling new users or items. So it is essential to utilize content features to enhance different recommendation models. To take full advantage of content features, feature interactions such as cross features are used by some models and outperform than using raw features. However, in real-world systems, many content features are incomplete, e.g., we may know the occupation and gender of a user, but the values of other features (location, interests, etc.) are missing. This missing-feature-value (MFV) problem is harmful to the model performance, especially for models that rely heavily on rich feature interactions. Unfortunately, this problem has not been well studied previously.

In this work, we propose a new adaptive "Feature Sampling'' strategy to help train different models to fit distinct scenarios, no matter for cold-start or missing feature value cases. With the help of this strategy, more feature interactions can be utilized. A novel model named CC-CC is proposed. The model takes both raw features and the feature interactions into consideration. It has a linear part to memorize useful variant information from the user or item contents and contexts (Content & Context Module), and a deep attentive neural module that models both content and collaborate information to enhance the generalization ability (Content & Collaborate Module). Both parts have feature interactions. The model is evaluated on two public datasets. Comparative results show that the proposed CC-CC model outperforms the state-of-the-art algorithms on both warm and cold scenarios significantly (up to 6.3%). To the best of our knowledge, this model is the first clear and powerful model that proposed to handle the missing feature values problem in deep neural network frameworks for recommender systems.

### A Dynamic Co-attention Network for Session-based Recommendation

• Wanyu Chen
• Fei Cai
• Honghui Chen
• Maarten de Rijke

Session-based recommendation is the task of recommending the next item a user might be interested in given partially known session information, e.g., part of a session or recent historical sessions. An effective session-based recommender should be able to exploit a user's evolving preferences, which we assume to be a mixture of her short- and long-term interests. Existing session-based recommendation methods often embed a user's long-term preference into a static representation, which plays a fixed role when dealing with her current short-term interests. This is problematic because long-term preferences may be more or less important for predicting the next conversion depending on the user's short-term interests. We propose a DCN-SR. DCN-SR applies a co-attention network to capture the dynamic interactions between the user's long- and short-term interaction behavior and generates co-dependent representations of the user's long- and short-term interests. For modeling a user's short-term interaction behavior, we design a CGRU network to take actions like "click'', "collect'' and "buy'' into account. Experiments on e-commerce datasets show significant improvements of DCN-SR over state-of-the-art session-based recommendation methods, with improvements of up to 2.58% on the Tmall dataset and 3.08% on the Tianchi dataset in terms of [email protected] [email protected] improvements are 3.78% and 4.05%, respectively. We also investigate the scalability and sensitivity of DCN-SR. The improvements of DCN-SR over state-of-the-art baselines are especially noticeable for short sessions and active users with many historical interactions.

### Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation

• Di You
• Nguyen Vo
• Kyumin Lee
• Qiang LIU

To combat fake news, researchers mostly focused on detecting fake news and journalists built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this paper we propose a deep-learning based fact-checking URL recommender system to mitigate impact of fake news in social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network to learn complex/semantic relationship between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms eight state-of-the-art recommendation models, achieving at least 3$\sim$5.3% improvement. Our source code and dataset are available at \urlhttps://web.cs.wpi.edu/~kmlee/data.html .

### A Hierarchical Self-Attentive Model for Recommending User-Generated Item Lists

• Yun He
• Jianling Wang
• Wei Niu
• James Caverlee

User-generated item lists are a popular feature of many different platforms. Examples include lists of books on Goodreads, playlists on Spotify and YouTube, collections of images on Pinterest, and lists of answers on question-answer sites like Zhihu. Recommending item lists is critical for increasing user engagement and connecting users to new items, but many approaches are designed for the item-based recommendation, without careful consideration of the complex relationships between items and lists. Hence, in this paper, we propose a novel user-generated list recommendation model called AttList. Two unique features of AttList are careful modeling of (i) hierarchical user preference, which aggregates items to characterize the list that they belong to, and then aggregates these lists to estimate the user preference, naturally fitting into the hierarchical structure of item lists; and (ii) item and list consistency, through a novel self-attentive aggregation layer designed for capturing the consistency of neighboring items and lists to better model user preference. Through experiments over three real-world datasets reflecting different kinds of user-generated item lists, we find that AttList results in significant improvements in NDCG, [email protected], and [email protected] versus a suite of state-of-the-art baselines. Furthermore, all code and data are available at https://github.com/heyunh2015/AttList.

## SESSION: Long - Recommendation System II

### HAES: A New Hybrid Approach for Movie Recommendation with Elastic Serendipity

• Xueqi Li
• Wenjun Jiang
• Weiguang Chen
• Jie Wu
• Guojun Wang

Recommendation systems provide good guidance for users to find their favorite movies from an overwhelming amount of options. However, most systems excessively pursue the recommendation accuracy and give rise to over-specialization, which triggers the emergence of serendipity. Hence, serendipity recommendation has received more attention in recent years, facing three key challenges: subjectivity in the definition, the lack of data, and users' floating demands for serendipity. To address these challenges, we introduce a new model called HAES, a H ybrid A pproach for movie recommendation with E lastic S erendipity, to recommend serendipitous movies. Specifically, we (1) propose a more objective definition of serendipity, \em content difference and \em genre accuracy, according to the analysis on a real dataset, (2) propose a new algorithm named JohnsonMax to mitigate the data sparsity and build weak ties beneficial to finding serendipitous movies, and (3) define a novel concept of elasticity in the recommendation, to adjust the level of serendipity flexibly and reach a trade-off between accuracy and serendipity. Extensive experiments on real-world datasets show that HAES enhances the serendipity of recommendations while preserving recommendation quality, compared to several widely used methods.

### DBRec: Dual-Bridging Recommendation via Discovering Latent Groups

• Jingwei Ma
• Jiahui Wen
• Mingyang Zhong
• Liangchen Liu
• Chaojie Li
• Weitong Chen
• Yin Yang
• Hongkui Tu
• Xue Li

In recommender systems, the user-item interaction data is usually sparse and not sufficient for learning comprehensive user/item representations for recommendation. To address this problem, we propose a novel dual-bridging recommendation model (DBRec). DBRec performs latent user/item group discovery simultaneously with collaborative filtering, and interacts group information with users/items for bridging similar users/items. Therefore, a user's preference over an unobserved item, in DBRec, can be bridged by the users within the same group who have rated the item, or the user-rated items that share the same group with the unobserved item. In addition, we propose to jointly learn user-user group (item-item group) hierarchies, so that we can effectively discover latent groups and learn compact user/item representations. We jointly integrate collaborative filtering, latent group discovering and hierarchical modelling into a unified framework, so that all the model parameters can be learned toward the optimization of the objective function. We validate the effectiveness of the proposed model with two real datasets, and demonstrate its advantage over the state-of-the-art recommendation models with extensive experiments.

### Candidate Generation with Binary Codes for Large-Scale Top-N Recommendation

• Wang-Cheng Kang
• Julian McAuley

Generating the Top-N recommendations from a large corpus is computationally expensive to perform at scale. Candidate generation and re-ranking based approaches are often adopted in industrial settings to alleviate efficiency problems. However it remains to be fully studied how well such schemes approximate complete rankings (or how many candidates are required to achieve a good approximation), or to develop systematic approaches to generate high-quality candidates efficiently. In this paper, we seek to investigate these questions via proposing a candidate generation and re-ranking based framework (CIGAR), which first learns a preference-preserving binary embedding for building a hash table to retrieve candidates, and then learns to re-rank the candidates using real-valued ranking models with a candidate-oriented objective. We perform a comprehensive study on several large-scale real-world datasets consisting of millions of users/items and hundreds of millions of interactions. Our results show that CIGAR significantly boosts the Top-N accuracy against state-of-the-art recommendation models, while reducing the query time by orders of magnitude. We hope that this work could draw more attention to the candidate generation problem in recommender systems.

### DTCDR: A Framework for Dual-Target Cross-Domain Recommendation

• Feng Zhu
• Chaochao Chen
• Yan Wang
• Guanfeng Liu
• Xiaolin Zheng

In order to address the data sparsity problem in recommender systems, in recent years, Cross-Domain Recommendation (CDR) leverages the relatively richer information from a source domain to improve the recommendation performance on a target domain with sparser information. However, each of the two domains may be relatively richer in certain types of information (e.g., ratings, reviews, user profiles, item details, and tags), and thus, if we can leverage such information well, it is possible to improve the recommendation performance on both domains simultaneously (i.e., dual-target CDR), rather than a single target domain only. To this end, in this paper, we propose a new framework, DTCDR, for Dual-Target Cross-Domain Recommendation. In DTCDR, we first extensively utilize rating and multi-source content information to generate rating and document embeddings of users and items. Then, based on Multi-Task Learning (MTL), we design an adaptable embedding-sharing strategy to combine and share the embeddings of common users across domains, with which DTCDR can improve the recommendation performance on both richer and sparser (i.e., dual-target) domains simultaneously. Extensive experiments conducted on real-world datasets demonstrate that DTCDR can significantly improve the recommendation accuracies on both richer and sparser domains and outperform the state-of-the-art single-domain and cross-domain approaches.

### Recommender System Using Sequential and Global Preference via Attention Mechanism and Topic Modeling

• Kyeongpil Kang
• Junwoo Park
• Wooyoung Kim
• Hojung Choe
• Jaegul Choo

Deep neural networks improved the accuracy of sequential recommendation approach which takes into account the sequential patterns of user logs, e.g., a purchase history of a user. However, incorporating only the individual's recent logs may not be sufficient in properly reflecting global preferences and trends across all users and items. In response, we propose a self-attentive sequential recommender system with topic modeling-based category embedding as a novel approach to exploit global information in the process of sequential recommendation. Our self-attention module effectively leverages the sequential patterns from the user's recent history. In addition, our novel category embedding approach, which utilizes the information computed by topic modeling, efficiently captures global information that the user generally prefers. Furthermore, to provide diverse recommendations as well as to prevent overfitting, our model also incorporates a vector obtained by random sampling. Experimental studies show that our model outperforms state-of-the-art sequential recommendation models, and that category embedding effectively provides global preference information.

## SESSION: Long - Recommendation System III

### A Spatio-temporal Recommender System for On-demand Cinemas

• Taofeng Xue
• Beihong Jin
• Beibei Li
• Weiqing Wang
• Qi Zhang
• Sihua Tian

On-demand cinemas are a new type of offline entertainment venues which have shown the rapid expansion in the recent years. Recommending movies of interest to the potential audiences in on-demand cinemas is keen but challenging because the recommendation scenario is totally different from all the existing recommendation applications including online video recommendation, offline item recommendation and group recommendation. In this paper, we propose a novel spatio-temporal approach called Pegasus. Because of the specific characteristics of on-demand cinema recommendation, Pegasus exploits the POI (Point of Interest) information around cinemas and the content descriptions of movies, apart from the historical movie consumption records of cinemas. Pegasus explores the temporal dynamics and spatial influences rooted in audience behaviors, and captures the similarities between cinemas, the changes of audience crowds, time-varying features and regional disparities of movie popularity. It offers an effective and explainable way to recommend movies to on-demand cinemas. The corresponding Pegasus system has been deployed in some pilot on-demand cinemas. Based on the real-world data from on-demand cinemas, extensive experiments as well as pilot tests are conducted. Both experimental results and post-deployment feedback show that Pegasus is effective.

### Semi-Supervised Learning for Cross-Domain Recommendation to Cold-Start Users

• SeongKu Kang
• Junyoung Hwang
• Dongha Lee
• Hwanjo Yu

Providing accurate recommendations to newly joined users (or potential users, so-called cold-start users) has remained a challenging yet important problem in recommender systems. To infer the preferences of such cold-start users based on their preferences observed in other domains, several cross-domain recommendation (CDR) methods have been studied. The state-of-the-art Embedding and Mapping approach for CDR (EMCDR) aims to infer the latent vectors of cold-start users by supervised mapping from the latent space of another domain. In this paper, we propose a novel CDR framework based on semi-supervised mapping, called SSCDR, which effectively learns the cross-domain relationship even in the case that only a few number of labeled data is available. To this end, it first learns the latent vectors of users and items for each domain so that their interactions are represented by the distances, then trains a cross-domain mapping function to encode such distance information by exploiting both overlapping users as labeled data and all the items as unlabeled data. In addition, SSCDR adopts an effective inference technique that predicts the latent vectors of cold-start users by aggregating their neighborhood information. Our extensive experiments on different CDR scenarios show that SSCDR outperforms the state-of-the-art methods in terms of CDR accuracy, particularly in the realistic settings that a small portion of users overlap between two domains.

### Leveraging Ratings and Reviews with Gating Mechanism for Recommendation

• Haifeng Xia
• Zengmao Wang
• Bo Du
• Lefei Zhang
• Shuai Chen
• Gang Chun

Recommender system plays an important role to provide people with personalized information based on their history records. However, it is still a challenge to capture the preference of users accurately due to the sparsity of rating data and the heterogeneity of review data. In this paper, we propose a hybrid deep collaborative filtering model that jointly learns latent representations from ratings and reviews. Specifically, the model learns the rating feature and textual feature based on ratings and reviews simultaneously. Two embedding layers are employed to learn rating feature for users and items based on the user and item interactions, and two attention-based GRU networks learn context-aware representation from user and item reviews. Then a gating mechanism is used to leverage contributions from rating feature and textual feature. Experimental results on six real-world datasets demonstrate the superior performance of the proposed method over several state-of-the-art methods. Moreover, the keywords in reviews can be highlighted to interpret the predictions with the attention mechanism.

### Instagrammers, Fashionistas, and Me: Recurrent Fashion Recommendation with Implicit Visual Influence

• Yin Zhang
• James Caverlee

Fashion-focused key opinion bloggers on Instagram, Facebook, and other social media platforms are fast becoming critical influencers. They can inspire consumer clothing purchases by linking high fashion visual evolution with daily street style. In this paper, we build thefirst visual influence-aware fashion recommender (FIRN) with leveraging fashion bloggers and their dynamic visual posts. Specifically, we extract thedynamic fashion features highlighted by these bloggers via a BiLSTM that integrates a large corpus of visual posts and community influence. We then learn theimplicit visual influence funnel from bloggers to individual users via a personalized attention layer. Finally, we incorporate user personal style and her preferred fashion features across time in a recurrent recommendation network for dynamic fashion-updated clothing recommendation. Experiments show that FIRN outperforms state-of-the-art fashion recommenders, especially for users who are most impacted by fashion influencers, and utilizing fashion bloggers can bring greater improvements in recommendation compared with using other potential sources of visual information. We also release a largetime-aware high-quality visual dataset of fashion influencers that can be exploited for future research.

### What Can History Tell Us?

• Ke Sun
• Tieyun Qian
• Hongzhi Yin
• Tong Chen
• Yiqi Chen
• Ling Chen

Recommendation systems have been widely applied to many E-commerce and online social media platforms. Recently, sequential item recommendation, especially session-based recommendation, has aroused wide research interests. However, existing sequential recommendation approaches either ignore the historical sessions or consider all historical sessions without any distinction that whether the historical sessions are relevant or not to the current session, which motivates us to distinguish the effect of each historical session and identify relevant historical sessions for recommendation. In light of this, we propose a novel deep learning based sequential recommender framework for session-based recommendation, which takes Nonlocal Neural Network and Recurrent Neural Network as the main building blocks. Specifically, we design a two-layer nonlocal architecture to identify historical sessions that are relevant to the current session and learn the long-term user preferences mostly from these relevant sessions. Besides, we also design a gated recurrent unit (GRU) enhanced by the nonlocal structure to learn the short-term user preferences from the current session. Finally, we propose a novel approach to integrate both long-term and short-term user preferences in a unified way to facilitate training the whole recommender model in an end-to-end manner. We conduct extensive experiments on two widely used real-world datasets, and the experimental results show that our model achieves significant improvements over the state-of-the-art methods.

## SESSION: Long - Reinforcement Learning

### Context-Aware Ranking by Constructing a Virtual Environment for Reinforcement Learning

• Junqi Zhang
• Jiaxin Mao
• Yiqun Liu
• Ruizhe Zhang
• Min Zhang
• Shaoping Ma
• Jun Xu
• Qi Tian

Result ranking is one of the major concerns for Web search technologies. Most existing methodologies rank search results in descending order according to pointwise relevance estimation of single results. However, the dependency relationship between different search results are not taken into account. While search engine result pages contain more and more heterogenous components, a better ranking strategy should be a context-aware process and optimize result ranking globally. In this paper, we propose a novel framework which aims to improve context-aware listwise ranking performance by optimizing online evaluation metrics. The ranking problem is formalized as a Markov Decision Process (MDP) and solved with the reinforcement learning paradigm. To avoid the great cost to online systems during the training of the ranking model, we construct a virtual environment with millions of historical click logs to simulate the behavior of real users. Extensive experiments on both simulated and real datasets show that: 1) constructing a virtual environment can effectively leverage the large scale click logs and capture some important properties of real users. 2) the proposed framework can improve search ranking performance by a large margin.

### A Multi-Scale Temporal Feature Aggregation Convolutional Neural Network for Portfolio Management

• Si Shi
• Jianjun Li
• Guohui Li
• Peng Pan

Financial portfolio management is the process of periodically reallocating a fund into different financial investment products, with the goal of achieving the maximum profits. While conventional financial machine learning methods try to predict the price trends, reinforcement learning based portfolio management methods makes trading decisions according to the price changes directly. However, existing reinforcement learning based methods are limited in extracting the price change information at single-scale level, which makes their performance still not satisfactory. In this paper, inspired by the Inception network that has achieved great success in computer vision and can extract multi-scale features simultaneously, we propose a novel Ensemble of Identical Independent Inception (EI$^3$) convolutional neural network, with the objective of addressing the limitation of existing reinforcement learning based portfolio management methods. With EI$^3$, multiple assets can be processed independently while sharing the same network parameters. Moreover, price movement information for each product can be extracted at multiple scales via wide network and then aggregated to make trading decision. Based on EI$^3$, we further propose a recurrent reinforcement learning framework to provide a deep machine learning solution for the portfolio management problem. Comprehensive experiments on the cryptocurrency datasets demonstrate the superiority of our method over existing competitors, in both upswing and downswing environments.

### Order-free Medicine Combination Prediction with Graph Convolutional Reinforcement Learning

• Shanshan Wang
• Pengjie Ren
• Zhumin Chen
• Zhaochun Ren
• Jun Ma
• Maarten de Rijke

Medicine Combination Prediction (MCP) based on Electronic Health Record (EHR) can assist doctors to prescribe medicines for complex patients. Previous studies on MCP either ignore the correlations between medicines (i.e., MCP is formulated as a binary classifcation task), or assume that there is a sequential correlation between medicines (i.e., MCP is formulated as a sequence prediction task). The latter is unreasonable because the correlations between medicines should be considered in an order-free way. Importantly, MCP must take additional medical knowledge (e.g., Drug-Drug Interaction (DDI)) into consideration to ensure the safety of medicine combinations. However, most previous methods for MCP incorporate DDI knowledge with a post-processing scheme, which might undermine the integrity of proposed medicine combinations. In this paper, we propose a graph convolutional reinforcement learning model for MCP, named Combined Order-free Medicine Prediction Network (CompNet), that addresses the issues listed above. CompNet casts the MCP task as an order-free Markov Decision Process (MDP) problem and designs a Deep Q Learning (DQL) mechanism to learn correlative and adverse interactions between medicines. Specifcally, we frst use a Dual Convolutional Neural Network (Dual-CNN) to obtain patient representations based on EHRs. Then, we introduce the medicine knowledge associated with predicted medicines to create a dynamic medicine knowledge graph, and use a Relational Graph Convolutional Network (R-GCN) to encode it. Finally, CompNet selects medicines by fusing the combination of patient information and the medicine knowledge graph. Experiments on a benchmark dataset, i.e., MIMIC-III, demonstrate that CompNet signifcantly outperforms state-of-the-art methods and improves a recently proposed model by 3.74%pt, 6.64%pt in terms of Jaccard and F1 metrics.

### Reinforcement Learning with Sequential Information Clustering in Real-Time Bidding

• Junwei Lu
• Chaoqi Yang
• Xiaofeng Gao
• Liubin Wang
• Changcheng Li
• Guihai Chen

Display advertising is a billion dollar business which is the primary income of many companies. In this scenario, real-time bidding optimization is one of the most important problems, where the bids of ads for each impression are determined by an intelligent policy such that some global key performance indicators are optimized. Due to the highly dynamic bidding environment, many recent works try to use reinforcement learning algorithms to train the bidding agents. However, as the probability of the occurrence of a particular state is typically low and the state representation in current work lacks sequential information, the convergence speed and performance of deep reinforcement algorithms are disappointing. To tackle these two challenges in the real-time bidding scenario, we propose ClusterA3C, a novel Advantage Asynchronous Actor-Critic (A3C) variant integrated with a sequential information extraction scheme and a clustering based state aggregation scheme. We conduct extensive experiments to validate the proposed scheme on a real-world commercial dataset. Experimental results show that the proposed scheme outperforms the state of the art methods in terms of either performance or convergence speed.

### Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System

• Ye Liu
• Chenwei Zhang
• Xiaohui Yan
• Yi Chang
• Philip S. Yu

In real-world question-answering (QA) systems, ill-formed questions, such as wrong words, ill word order and noisy expressions, are common and may prevent the QA systems from understanding and answering them accurately. In order to eliminate the effect of ill-formed questions, we approach the question refinement task and propose a unified model, QREFINE, to refine the ill-formed questions to well-formed question. The basic idea is to learn a Seq2Seq model to generate a new question from the original one. To improve the quality and retrieval performance of the generated questions, we make two major improvements: 1) To better encode the semantics of ill-formed questions, we enrich the representation of questions with character embedding and the recent proposed contextual word embedding such as BERT, besides the traditional context-free word embeddings; 2) To make it capable to generate desired questions, we train the model with deep reinforcement learning techniques that considers an appropriate wording of the generation as an immediate reward and the correlation between generated question and answer as time-delayed long-term rewards. Experimental results on real-world datasets show that the proposed QREFINE method can generate refined questions with more readability but fewer mistakes than the original questions provided by users. Moreover, the refined questions also significantly improve the accuracy of answer retrieval.

## SESSION: Long - Search & Retrieval

### Analyzing the Effects of Document's Opinion and Credibility on Search Behaviors and Belief Dynamics

• Suppanut Pothirattanachaikul
• Takehiro Yamamoto
• Yusuke Yamamoto
• Masatoshi Yoshikawa

To obtain accurate information through web searches, people have to search for information carefully. This study investigates how the search behaviors and decision outcomes of searchers were affected by the documents they encountered during their search process. We focus on two document factors: (1) opinion (consistent and inconsistent) with the searchers' beliefs prior to the search task, and (2) credibility (high and low). We conducted a user study in which 260 participants were asked to perform health-related search tasks while controlling a search result with different opinions and credibility levels. The results revealed that (i) the participants spent more effort searching by issuing more queries, when belief-inconsistent documents were presented; (ii) the documents' opinion and credibility affected their belief dynamics, (i.e., how their beliefs changed after the search task); and (iii) their belief dynamics and search efforts had few relationships. These findings suggest that search engines could prevent users from polarization and thus, help them to obtain accurate information, by presenting documents that are inconsistent with users' beliefs on the higher-rank of the results.

### Identifying Facet Mismatches In Search Via Micrographs

• Sriram Srinivasan
• Nikhil S. Rao
• Karthik Subbian
• Lise Getoor

E-commerce search engines are the primary means by which customers shop for products online. Each customer query contains multiple facets such as product type, color, brand, etc. A successful search engine retrieves products that are relevant to the query along each of these attributes. However, due to lexical (erroneous title, description, etc.) and behavioral irregularities (clicks or purchases of products that do not belong to the same facet as the query), some mismatched products are often included in search results. These irregularities can be detected using simple binary classifiers like gradient boosted decision trees or logistic regression. Typically, these binary classifiers use strong independence assumptions between the results and ignore structural relationships available in the data, such as the connections between products and queries. In this paper, we use the connections that exist between products and query to identify a special kind of structure we refer to as a micrograph. Further, we make use of Statistical Relational Learning (SRL) to incorporate these micrographs in the data and pose the problem as a structured prediction problem. We refer to this approach as structured mismatch classification (\SMC). In addition, we show that naive addition of structure does not improve the performance of the model and hence introduce a variation of \SMC, strong \SMC~(\SSMC), which improves over the baseline by passing information from high-confidence predictions to lower confidence predictions. In our empirical evaluation we show that our proposed approach outperforms the baseline classification methods by up to 12% in precision. Furthermore, we use quasi-Newton methods to make our method viable for real-time inference in a search engine and show that our approach is up to 150 times faster than existing ADMM-based solvers.

### GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine

• Minjia Zhang
• Yuxiong He

This paper presents GRIP, an approximate nearest neighbor (ANN) search algorithm for building vector search engine which makes heavy use of the algorithm. GRIP is designed to retrieve documents at large-scale based on their semantic meanings in a scalable way. It is both fast and capacity-optimized. GRIP combines new algorithmic and system techniques to collaboratively optimize the use of memory, storage, and computation. The contributions include: (1) The first hybrid memory-storage ANN algorithm that allows ANN to benefit from both DRAM and SSDs simultaneously; (2) The design of a highly optimized indexing scheme that provides both memory-efficiency and high performance; (3) A cost analysis and a cost function for evaluating the capacity improvements of ANN algorithms. GRIP achieves an order of magnitude improvements on overall system efficiency, significantly reducing the cost of vector search, while attaining equal or higher accuracy, compared with the state-of-the-art.

### Improving Web Image Search with Contextual Information

• Xiaohui Xie
• Jiaxin Mao
• Yiqun Liu
• Maarten de Rijke
• Qingyao Ai
• Yufei Huang
• Min Zhang
• Shaoping Ma

In web image search, items users search for are images instead of Web pages or online services. Web image search constitutes a very important part of web search. Re-ranking is a trusted technique to improve retrieval effectiveness in web search. Previous work on re-ranking web image search results mainly focuses on intra-query information (e.g., human interactions with the initial list of the current query). Contextual information such as the query sequence and implicit user feedback provided during a search session prior to the current query is known to improve the performance of general web search but has so far not been used in web image search. The differences in result placement and interaction mechanisms of image search make the search process rather different from general Web search engines. Because of these differences, context-aware re-ranking models that have originally been developed for general web search cannot simply be applied to web image search. We propose CARM, a context-aware re-ranking model, a neural network-based framework to re-rank web image search results for a query based on previous interaction behavior in the search session in which the query was submitted. Specifically, we explore a hybrid encoder with an attention mechanism to model intra-query and inter-query user preferences for image results in a two-stage structure. We train context-aware re-ranking model (CARM) to jointly learn query and image representations so as to be able to deal with the multimodal characteristics of web image search. Extensive experiments are carried out on a commercial web image search dataset. The results show that CARM outperforms state-of-the-art baseline models in terms of personalized evaluation metrics. Also, CARM combines the original ranking can improve the original ranking on personalized ranking and relevance estimation. We make the implementation of CARM and relevant datasets publicly available to facilitate future studies.

### Dynamic Bayesian Metric Learning for Personalized Product Search

• Teng Xiao
• Jiaxin Ren
• Zaiqiao Meng
• Huan Sun
• Shangsong Liang

In this paper, we study the problem of personalized product search under streaming scenarios. We address the problem by proposing a Dynamic Bayesian Metric Learning model, abbreviated as DBML, which can collaboratively track the evolutions of latent semantic representations of different categories of entities (i.e., users, products and words) over time in a joint metric space. In particular, unlike previous work using inner-product metric to model the affinities between entities, our DBML is a novel probabilistic metric learning approach that is able to avoid the contradicts, keep the triangle inequality in the latent space, and correctly utilize implicit feedbacks. For inferring dynamic embeddings of the entities, we propose a scalable online inference algorithm, which can jointly learn the latent representations of entities and smooth their changes across time, based on amortized inference. The inferred dynamic semantic representations of entities collaboratively inferred in a unified form by our DBML can benefit not only for improving personalized product search, but also for capturing the affinities between users, products and words. Experimental results on large datasets over a number of applications demonstrate that our DBML outperforms the state-of-the-art algorithms, and can effectively capture the evolutions of semantic representations of different categories of entities over time.

## SESSION: Long - Sequential Data Analysis

### Towards Accurate and Interpretable Sequential Prediction: A CNN & Attention-Based Feature Extractor

• Jingyi Wang
• Qiang Liu
• Zhaocheng Liu
• Shu Wu

With the influence of information explosion, there are more and more choices exposed to public view. Next item recommendation is being a significant and challenging task. Recently, attention mechanism, Convolutional Neural Networks (CNN) and other kinds of deep components are used to model user behaviors. However, the proposed models often fail to extract the feature of user behaviors in different time periods and the CNN-based models before are hard to make the used CNN interpretable. In this paper, we propose a CNN & Attention-based Sequential Feature Extractor (CASFE) module to capture the possible features of user behaviors at different time intervals. Specifically, we import CNN to extract multi-level features of user behaviors with different time periods. After each CNN layer, we use attention module to emphasize the different effect of behaviors on the prediction result. Besides, the features we try to extract here have the similar concept and meaning with the hand-crafted features in Feature Engineering, which proves the validity of CASFE. Accordingly, CASFE becomes a general sequential feature extractor that can be used in various sequential prediction tasks. With Multi-Layer Perceptron (MLP), CASFE would be a state-of-the-art next item recommendation model. The model obtains good performance on Last.fm_1K dataset and MovieLens_1M dataset. Besides, as a compatible extractor module, it can also promote CTR prediction models as well as other sequential prediction tasks.

### Locally Slope-based Dynamic Time Warping for Time Series Classification

• Jidong Yuan
• Qianhong Lin
• Wei Zhang
• Zhihai Wang

Dynamic time warping (DTW) has been widely used in various domains of daily life. Essentially, DTW is a non-linear point-to-point matching method under time consistency constraints to find the optimal path between two temporal sequences. Although DTW achieves a globally optimal solution, it does not naturally capture locally reasonable alignments. Concretely, two points with entirely dissimilar local shape may be aligned. To solve this problem, we propose a novel weighted DTW based on local slope feature (LSDTW), which enhances DTW by taking regional information into consideration. LSDTW is inherently a DTW algorithm. However, it additionally attempts to pair locally similar shapes, and to avoid matching points with distinct neighborhood slopes. Furthermore, when LSDTW is used as a similarity measure in the popular nearest neighbor classifier, it beats other distance-based methods on the vast majority of public datasets, with significantly improved classification accuracies. In addition, case studies establish the interpretability of the proposed method.

### HiCAN: Hierarchical Convolutional Attention Network for Sequence Modeling

• Yi Cao
• Weifeng Zhang
• Bo Song
• Congfu Xu

Convolutional neural networks (CNN) are widely used on sequential data since it can capture local context dependencies and temporal order information inside sequences. Attention (ATT) mechanisms have also attracted enormous interests due to its capability of capturing the important parts of a sequence. These two neural networks can extract different features from sequences. In order to combine the advantages of CNN and ATT, we propose a convolutional attention network (CAN), which merges the structure of CNN and ATT into a single neural network and can serve as a new basic module in complex neural networks. Based on CAN, we then build a sequence encoding model with hierarchical structure, "hierarchical convolutional attention network (HiCAN)", to tackle sequence modeling problems. It can explicitly capture both the local and global context dependencies and temporal order information in sequences. Extensive experiments conducted on session-based recommendation (Recommender Systems) demonstrate that HiCAN is able to outperform state-of-the-art methods and show higher computational efficiency. Furthermore, we conduct extended experiments on text classification (Natural Language Processing). The results show that our model can also achieve competitive performance on NLP tasks.

### Automatic Sequential Pattern Mining in Data Streams

• Koki Kawabata
• Yasuko Matsubara
• Yasushi Sakurai

Given a large volume of multi-dimensional data streams, such as that produced by IoT applications, finance and online web-click logs, how can we discover typical patterns and compress them into compact models? In addition, how can we incrementally distinguish multiple patterns while considering the information obtained from a pattern found in a streaming setting? In this paper, we propose a streaming algorithm, namely StreamScope, that is designed to find intuitive patterns efficiently from event streams evolving over time. Our proposed method has the following properties: (a) it is effective: it operates on semi-infinite collections of co-evolving streams and summarizes all the streams into a set of multiple discrete segments grouped by their similarities. (b) it is automatic: it automatically and incrementally recognizes such patterns and generates models for each of them if necessary; (c) it is scalable: the complexity of our method does not depend on the length of the data streams. Our extensive experiments on real data streams demonstrate that StreamScope can find meaningful patterns and achieve great improvements in terms of computational time and memory space over its full batch method competitors.

### Efficient Sequential and Parallel Algorithms for Estimating Higher Order Spectra

• Zigeng Wang
• Abdullah-Al Mamun
• Xingyu Cai
• Nalini Ravishanker
• Sanguthevar Rajasekaran

Higher order spectra (HOS) are a powerful tool in nonlinear time series analysis and they have been extensively used as feature representations in data mining, communications and cosmology domains. However, HOS estimation suffers from high computational cost and memory consumption. Any algorithm for computing the kth order spectra on a dataset of size n needs O(n^k-1 ) time since the output size will be O(n^k-1 ) as well, which makes the direct HOS analysis difficult for long time series, and further prohibits its direct deployment to resource-limited and time-sensitive applications. Existing algorithms for computing HOS are either inefficient or have been implemented on obsolete architectures. Thus it is essential to develop efficient generic algorithms for HOS estimations. In this paper, we present a package of generic sequential and parallel algorithms for computationally and memory efficient HOS estimations which can be employed on any parallel machine or platform. Our proposed algorithms largely reduce the HOS' computational cost and memory usage in spectrum multiplication and smoothing steps through carefully designed prefix sum operations. Moreover, we employ a matrix partitioning technique and design algorithms with optimal memory usage and present the parallel approaches on the PRAM and the mesh models. Furthermore, we implement our algorithms for both bispectrum and trispectrum estimations. We conduct extensive experiments and cross-compare the proposed algorithms' performance. Results show that our algorithms achieve state-of-the-art computational and memory efficiency, and our parallel algorithms achieve close to linear speedups. The code is available at https://github.com/ZigengWang/HOS.

## SESSION: Long - Social Network

### A Modular Adversarial Approach to Social Recommendation

• Hari Cheruvu
• Cheng Tao
• Hari Sundaram

This paper proposes a novel framework to incorporate social regularization for item recommendation. Social regularization grounded in ideas of homophily and influence appears to capture latent user preferences. However, there are two key challenges: first, the importance of a specific social link depends on the context and second, a fundamental result states that we cannot disentangle homophily and influence from observational data to determine the effect of social inference. Thus we view the attribution problem as inherently adversarial where we examine two competing hypothesis---social influence and latent interests---to explain each purchase decision. We make two contributions. First, we propose a modular, adversarial framework that decouples the architectural choices for the recommender and social representation models, for social regularization. Second, we overcome degenerate solutions through an intuitive contextual weighting strategy, that supports an expressive attribution, to ensure informative social associations play a larger role in regularizing the learned user interest space. Our results indicate significant gains (5-10% relative [email protected]) over state-of-the-art baselines across multiple publicly available datasets.

### Emotional Contagion-Based Social Sentiment Mining in Social Networks by Introducing Network Communities

• Xiaobao Wang
• Di Jin
• Mengquan Liu
• Dongxiao He
• Katarzyna Musial
• Jianwu Dang

The rapid development of social media services has facilitated the communication of opinions through online news, blogs, microblogs, instant-messages, and so on. This article concentrates on the mining of readers' social sentiments evoked by social media materials. Existing methods are only applicable to a minority of social media like news portals with emotional voting information, while ignore the emotional contagion between writers and readers. However, incorporating such factors is challenging since the learned hidden variables would be very fuzzy (because of the short and noisy text in social networks). In this paper, we try to solve this problem by introducing a high-order network structure, i.e. communities. We first propose a new generative model called Community-Enhanced Social Sentiment Mining (CESSM), which 1) considers the emotional contagion between writers and readers to capture precise social sentiment, and 2) incorporates network communities to capture coherent topics. We then derive an inference algorithm based on Gibbs sampling. Empirical results show that, CESSM achieves significantly superior performance against the state-of-the-art techniques for text sentiment classification and interestingness in social sentiment mining.

### Social-Aware VR Configuration Recommendation via Multi-Feedback Coupled Tensor Factorization

• Hsu-Chao Lai
• Hong-Han Shuai
• De-Nian Yang
• Jiun-Long Huang
• Wang-Chien Lee
• Philip S. Yu

Recent technological advent in virtual reality (VR) has attracted a lot of attention to the VR shopping, which thus far is designed for a single user. In this paper, we envision the scenario of VR group shopping, where VR supports: 1) flexible display of items to address diverse personal preferences, and 2) convenient view switching between personal and group views to foster social interactions. We formulate the Multiview-Enabled Configuration Recommendation (MECR) problem to rank a set of displayed items for a VR shopping user. We design the Multiview-Enabled Configuration Ranking System (MEIRS) that first extracts discriminative features based on Marketing theories and then introduces a new coupled tensor factorization model to learn the representation of users, Multi-View Display (MVD) configurations, and multiple feedback with content features. Experimental results manifest that the proposed approach outperforms personalized recommendations and group recommendations by at least 30.8% in large-scale datasets and 63.3% in the user study in terms of hit ratio and mean average precision.

• Yu Yang
• Zhefeng Wang
• Tianyuan Jin
• Jian Pei
• Enhong Chen

### Adversarial Factorization Autoencoder for Look-alike Modeling

• Khoa D. Doan
• Chandan K. Reddy

### Concept Drift Adaption for Online Anomaly Detection in Structural Health Monitoring

• Hongda Tian
• Nguyen Lu Dang Khoa
• Ali Anaissi
• Yang Wang
• Fang Chen

Despite its success for anomaly detection in the scenario where only data representing normal behavior are available, one-class support vector machine (OCSVM) still has challenge in dealing with non-stationary data stream, where the underlying distributions of data are time-varying. Existing OCSVM-based online learning methods incrementally update the model to address the challenge, however, they solely rely on the location relationship between a test sample and error support vectors. To better accommodate normal behavior evolution, online anomaly detection in non-stationary data stream is formulated as a concept drift adaptation problem in this paper. It is proposed that OCSVM-based incremental learning is only performed in the case of a normal drift. For an incoming sample, its relative relationship with three sets of vectors in OCSVM, namely margin support vectors, error support vectors, and reserve vectors is fully utilized to estimate whether a normal drift is emerging. Extensive experiments in the field of structural health monitoring have been conducted and the results have shown that the proposed simple approach outperforms the existing OCSVM-based online learning algorithms for anomaly detection.

### Multi-task based Sales Predictions for Online Promotions

• Shen Xin
• Martin Ester
• Jiajun Bu
• Chengwei Yao
• Zhao Li
• Xun Zhou
• Yizhou Ye
• Can Wang

The e-commerce era is witnessing a rapid development of various annual online promotions, such as Black Friday, Cyber Monday, and Alibaba's 11.11, etc. S ales P redictions for O nline P romotions (SPOP) are a set of sales related forecasts for the promotion day, including gross merchandise volume, sales volume, best selling products, etc. SPOP is highly important for e-commerce platforms to efficiently organize merchandise and maximize business values. However, sales patterns during the promotions are varied according to different scenarios, each model of which is designed with different features, static or dynamic, for one task in particular. Therefore, several models are proposed with part of features that are possibly beneficial to other tasks, which indicates the universal representation for the items needs to be learned across different promotion scenarios. To address this problem, this paper proposes a D eep I tem N etwork for O nline P romotions (DINOP). In DINOP, we design a novel T arget U sers C ontrolled G ated R ecurrent U nit (TUC-GRU) structure for dynamic features, and provide a new attention mechanism introducing static users profiles. In contrast to traditional prediction models, the network we proposed can effectively and efficiently learn universal item representation by incorporating users' properties as controllers. Furthermore, it can successfully discover the static and dynamic features guided by the multi-task learning, and is easily extended to other sales related prediction problems without retraining. Empirical results show that performance of DINOP in the real data set of Alibaba's Global Shopping Festival is superior to other state-of-the-arts practical methodologies in terms of the convergence rate and prediction accuracy.

### Experimental Study of Multivariate Time Series Forecasting Models

• Jiaming Yin
• Weixiong Rao
• Mingxuan Yuan
• Jia Zeng
• Kai Zhao
• Chenxi Zhang
• Jiangfeng Li
• Qinpei Zhao

Multivariate time series forecasting has wide applications such as traffic flow prediction, supermarket commodity demand forecasting and etc. In literature, Due to the complex temporal patterns and inter-dependencies among multivariate time series, a large number of forecasting models have been developed. However, one question still remains unclear: how these models perform on a certain forecasting task, and there is lack of comprehensive performance comparison of these models on different tasks. To this end, in this paper, we conduct a systematic evaluation of eight representative forecasting models over eight multivariate time series datasets, and have the following findings: 1) When the datasets exhibit strong periodic patterns, deep learning models perform best. Otherwise on the datasets in a non-periodic manner, the statistical models such as ARIMA perform best. 2) For the long term prediction involving a high horizon value, the direct prediction strategy could lead to lower errors than the recursive one, but at the cost of higher training time. 3) For the multivariate time series explicitly involving graphic inter-dependencies among the multivariates, e.g., the road network topology in the spatio-temporal time series of traffic volumes in multiple routes, the Graph Convolution Network can incorporate the graphic inter-dependencies into their forecasting models for smaller prediction errors.

### GMTL: A GART Based Multi-task Learning Model for Multi-Social-Temporal Prediction in Online Games

• Jianrong Tao
• Linxia Gong
• Changjie Fan
• Longbiao Chen
• Dezhi Ye
• Sha Zhao

Multi-social-temporal (MST) data, which represent multi-attributed time series corresponding to the entities in multi-relational social network series, are ubiquitous in real-world and virtual-world dynamic systems, such as online games. Predictions over MST data such as social time series prediction and temporal link weight prediction are of great importance but challenging. They are affected by many complex factors, including temporal characteristics, social characteristics, collaborative characteristics, task characteristics and the intrinsic causality between them. In this paper, we propose a graph attention recurrent network (GART) based multi-task learning model (GMTL) to fuse information across multiple social-temporal prediction tasks. Experiments on an MMORPG dataset demonstrate that GMTL outperforms the state-of-the-art baselines and can significantly improve performances of specific social-temporal prediction task with additional information from others. Our work has been deployed to several MMORPGs in practice and can also expand to many related multi-social-temporal prediction tasks in real-world applications. Case studies on applications for multi-social-temporal prediction show that GMTL produces great value in the actual business in NetEase Games.

• Xiao Yang
• Tao Deng
• Weihan Tan
• Xutian Tao
• Junwei Zhang
• Shouke Qin
• Zongyao Ding

## SESSION: Demo - Demo Session 1

### Inspect What Your Location History Reveals About You: Raising user awareness on privacy threats associated with disclosing his location data

• Antoine Boutet
• Sébastien Gambs

Location is one of the most extensively collected personal data on mobile by applications and third-party services. However, how the location of users is actually processed in practice by the actors of targeted advertising ecosystem remains unclear. Nonetheless, these providers have a strong incentive to create very detailed profile of users to better monetize the collected data. End users are usually not aware about the strength and wide range of inference that can be performed from their mobility traces. In this demonstration, users interact with a web-based application to inspect their location history and to discover the inferential power of this kind of data. Moreover to better understand the possible countermeasures, users can apply a sanitization to protect their data and visualize the impact on both the mobility traces and the associated inferred information. The objective of this demonstration is to raise the user awareness on the profiling capabilities and the privacy threats associated with disclosing his location data as well as how sanitization mechanisms can be efficient to mitigate these privacy risks. In addition, by collecting users feedbacks on the personal information revealed and the usage of a geosanitization mechanism, we hope that this demonstration will also be useful to constitute a new and valuable dataset on users perceptions on these questions.

### PODIUM: Probabilistic Datalog Analysis via Contribution Maximization

• Tova Milo
• Yuval Moskovitch
• Brit Youngmann

The use of probabilistic datalog programs has been advocated for applications that involve recursive computation and uncertainty. While using such programs allows for a flexible knowledge derivation, it makes the analysis of query results a challenging task. Particularly, given a set O of output tuples and a number k, one would like to understand which k-size subset of the input tuples has affected the most the derivation of O. This is useful for multiple tasks, such as identifying critical sources of errors and understanding surprising results. To this end, we formalize the Contribution Maximization problem and present an efficient algorithm to solve it. Our algorithm injects a refined variant of the classic Magic Sets technique, integrated with a sampling method, into top-performing algorithms for the well-studied Influence Maximization problem. We propose to demonstrate our solution in a system called PODIUM. We will demonstrate the usefulness of PODIUM using real-life data and programs, and illustrate the effectiveness of our algorithm.

### Document in Context of its Time (DICT): Providing Temporal Context to Support Analysis of Past Documents

• Ricardo Campos
• Sourav S. Bhowmick
• Antoine Doucet

Old documents tend to be difficult to be analyzed and understood, not only for average users but oftentimes for professionals as well. This is due to the context shift, vocabulary evolution and, in general, the lack of precise knowledge about the writing styles in the past. We propose a concept of positioning document in the context of its time, and develop an interactive system to support such an objective. Our system helps users to know whether the vocabulary used by an author in the past were frequent at the time of text creation, whether the author used anachronisms or neologisms, and so on. It also enables detecting terms in text that underwent considerable semantic change and provides more information on the nature of such change. Overall, the proposed tool offers additional knowledge on the writing style and vocabulary choice in documents by drawing from data collected at the time of their creation or at other user-specified time.

### ATENA: An Autonomous System for Data Exploration Based on Deep Reinforcement Learning

• Ori Bar El
• Tova Milo
• Amit Somech

Exploratory Data Analysis (EDA), is an important yet challenging task, that requires profound analytical skills and familiarity with the data domain. While Deep Reinforcement Learning (DRL) is nowadays used to solve AI challenges previously considered to be intractable, to our knowledge such solutions have not yet been applied to EDA.

In this work we present ATENA, an autonomous system capable of exploring a given dataset by executing a meaningfulsequence of EDA operations. ATENA uses a novel DRL architecture, and learns to perform EDA operations by independently interacting with the dataset, without any training data or human assistance. We demonstrate ATENA in the context ofcyber security log analysis, where the audience is invited to partake in a data exploration challenge: explore real-life network logs, assisted by ATENA, in order to reveal underlying security attacks hidden in the data.

### ExplIQuE: Interactive Databases Exploration with SQL

• Marie Le Guilly
• Jean-Marc Petit
• Vasile-Marian Scuturici
• Ihab F. Ilyas

To help databases users who have just started learning SQL or are not familiar with their database, we propose ExplIQuE, an exploration interface with query extensions. Its purpose is to assist users to smoothly dive into data exploration, and to be able to express imprecise questions over their data. Indeed, such situations are more and more current with the increasing desire for users to get value out of their data. In this configuration, in addition to classic SQL querying possibilities, ExplIQuE offers the possibility to extend a given SQL query, by suggesting a set of possible selection predicates to add to the query, that aim at dividing the initial answer set to identify interesting exploration zones. In addition, ExplIQuE proposes some indicators to help the user in choosing its desire extension and in understanding her data, as well as interactive visualizations of the result set, in two dimensions revealed by PCA techniques. In this demonstration, we offer the audience the possibility to try the various functionalities of ExplIQuE by trying to express an imprecise question over a scientific database on bacterial colonies, through an iterative process. A video of the proposed demonstration is available at \urlhttps://youtu.be/oK8xWGCWj_A.

### TraVis: An Interactive Visualization System for Mining Inbound Traveler Activities by Leveraging Mobile Ad Request Data

• Pei-Xun Wang
• Hsuan Chiu
• Wen-Qian Chen
• Che-Chia Chang
• Yu-Hsuan Huang
• Tzu-Hao Huang
• Yuchun Lai
• Chia-Hu Chang

The growth of inbound travel is fully coordinate with the successful urban development. Increasing the number of inbound travelers not only creates more jobs and economic opportunities but also drives the country toward prosperity. Thus, inbound traveler analysis through trajectory pattern mining, a subfield of urban computing, is regarded as a promising solution. This paper introduces large-scale mobile ad requests as an alternative data source of trajectory pattern mining in order to eliminate the limitations of conventional data sources, such as GPS data, cellular data, and IP address data. In addition, to expedite a comprehensive inbound traveler analysis, we build TraVis, a real-world system for efficiently exploring the inbound travelers' activities through the interactive visualization interface. By incorporating various modules, such as mobile users' home country and travel intent prediction, frequent trajectory pattern mining, and interactive visualization, TraVis proves the capability of profiling the travelers' behavior pattern. We use Japan inbound travelers in the case study to present the mining insights, and we also demonstrate the extensive system functionalities. Our system has been assisting Japan government agencies to formulate travel marketing strategies, including tourist experience enhancement and attractions marketing.

### Understanding Data in the Blink of an Eye

• Matteo Paganelli
• Paolo Sottovia
• Antonio Maccioni
• Matteo Interlandi
• Francesco Guerra

Many data analysis and knowledge mining tasks require a basic understanding of the content of a dataset prior to any data access. In this demo, we showcase how data descriptions---a set of compact, readable and insightful formulas of boolean predicates---can be used to guide users in understanding datasets. Finding the best description for a dataset is, unfortunately, both computationally hard and task-specific. This demo shows that not only we can generate descriptions at interactive speed, but also that diverse user needs---from anomaly detection to data exploration---can be accommodated through a user-driven process exploiting dynamic programming in concert with a set of heuristics.

### SIMILANT: An Analytic Tool for Similarity Modeling

• David Bernhauer
• Tomáš Skopal
• Irena Holubová
• Martin Svoboda

We present SIMILANT, a data analytics tool for modeling similarity in content-based retrieval scenarios. In similarity search, data elements are modeled using black-box descriptors, where a pair-wise similarity function is the only way how to relate data elements to each other. Only these relations provide information about the dataset structure. Data analysts need to identify meaningful combinations of descriptors and similarity functions effectively. Therefore, we proposed a tool enabling a data analyst to systematically browse, tune, and analyze similarity models for a specific domain.

### MithraLabel: Flexible Dataset Nutritional Labels for Responsible Data Science

• Chenkai Sun
• Abolfazl Asudeh
• Bill Howe
• Julia Stoyanovich

Using inappropriate datasets for data science tasks can be harmful, especially for applications that impact humans. Targeting data ethics, we demonstrate MithraLabel, a system for generating task-specific information about a dataset, in the form of a set of visual widgets, as a flexible "nutritional label" that provides a user with information to determine the fitness of the dataset for the task at hand.

### PRIVATA: Differentially Private Data Market Framework using Negotiation-based Pricing Mechanism

• Kangsoo Jung
• Junkyu Lee
• Kunyoung Park
• Seog Park

As the value of digital data increases, the data market is in the spotlight as a means of obtaining a personal information. However, the collection of personal information makes a serious privacy violation and it is a serious problem in the use of personal data. Differential privacy, which is a de-facto standard for privacy protection in statistical databases, can be applied to solve the privacy violation problem. To apply differential privacy to the data market, the amount of noise and corresponding data price should be determined between the provider and consumer. However, this matter has not yet been studied. In this work, we introduce a Privata which is a differentially private data market framework to set the appropriate price and noise parameter in the data market environment. The Privata is based on negotiation technique using Rubinstein bargaining considering social welfare to prevent unfair transactions. We explain the Privata overview and negotiation technique in Privata, and show the Privata implementation.

### MiCRon: Making Sense of News via Relationship Subgraphs

• Zixian Huang
• Shuxin Li
• Gong Cheng
• Evgeny Kharlamov
• Yuzhong Qu

Knowledge graphs (KGs) have been extensively used to annotate text, e.g., news articles, in order to enhance its comprehension by readers. This requires to map entities occurring in the news to the target entities of the KG and to extract a so-called relationship sub-graph (RSG) that spans these entities. RSG extraction is computationally demanding and cannot scale to large KGs. Existing approximation algorithms that focus on structurally compact RSGs are not satisfactory since they often return no answers. We address this problem and develop an efficient algorithm to find approximations that connect the most salient subset of the target entities. Moreover, we propose a context-aware method to rank RSGs by their relevance to the news and their semantic cohesion. In the demo we will present our approach and the attendees will be able to experience how our system MiCRon helps to make sense of news article by computing and presenting RSGs relevant to these articles.

### LuPe: A System for Personalized and Transparent Data-driven Decisions

• Sarah Oppold
• Melanie Herschel

Machine learning models are commonly used for decision support even though they are far from perfect, e.g., due to bias introduced by imperfect training data or wrong feature selection. While efforts are made and should continue to be put into developing better models, we will likely continue to rely on imperfect models in many applications. In these settings, how could we at least use the "best" model for an individual or a group of users and transparently communicate the risks and weaknesses that apply?

We demonstrate LuPe, a system that addresses these questions. LuPe allows to optimize the choice of the applied model for subgroups of the population or individuals, thereby personalizing the model choice to best fit users' profiles, which improves fairness. LuPe further captures data to explain the choices made and the results of the model. We showcase how such data enable users to understand the system performance they can expect. This transparency helps users in making informed decisions or providing informed consent when such systems are used. Our demonstration will focus on several real-world applications showcasing the behavior of LuPe, including credit scoring and income prediction.

### Insta-Search: Towards Effective Exploration of Knowledge Graphs

• Maya Ramanath

Knowledge Graphs (KGs) are used to store heterogenous information in the form of graphs. One flexible and non-expert way to query these KGs is to use relationship queries or keyword search. The user can specify a query using keywords referring to entities in the graph. The system then returns a set of relationships among the queried entities. However, effectively querying these graphs is still challenging for a new user. She is not familiar with the entities and relationships in the graph and hence, her queries could often return empty or too few answers. We demonstrate a system called Insta-Search which facilitates effective exploration of KGs using relationship queries. Insta-Search helps the user by giving autocomplete keyword suggestions for partially typed words. It also displays an estimated number of answers that the current query would fetch along with few approximate top-scoring answers. The users also get entity suggestions so that they can iteratively reformulate the query until they find the query with the expected results. On submitting the query, the system returns ranked query results, grouped on the basis of similar information content to enhance result interpretation. No prerequisite knowledge of the data is required by the user to be able to use the system.

## SESSION: Demo - Demo Session 2

### SkyRec: Finding Pareto Optimal Groups

• Jinfei Liu
• Li Xiong
• Jian Pei
• Jun Luo
• Haoyu Zhang
• Si Zhang

We present SkyRec (Skyline Recommender), a recommendation toolkit for finding optimal groups based on the notion of group skyline. Skyline computation, aiming at identifying a set of skyline points that are not dominated by any other point, is particularly useful for multi-criteria data analysis and decision-making. Traditional skyline computation, however, is inadequate to answer queries that need to analyze not only individual points but also groups of points. o address this gap, SkyRec finds Pare to optimal groups with two group skyline models: G-Skyline [3] and Sum-Skyline [2]. SkyRecre turns Pare to optimal groups with group size k that are not dom-inated by any other group with the same group size. Users can examine the results of the group skyline based recommendation compared to traditional top-k and skyline based recommendation and how different group skyline notions differ from each other. Although we demonstrate Sky Rec for hotel reservation in this paper, it can be applied to various decision-making applications

### CurrentClean: Interactive Change Exploration and Cleaning of Stale Data

• Zheng Zheng
• Tri Minh Quach
• Ziyi Jin
• Fei Chiang
• Mostafa Milani

Enterprises often assume their data is up-to-date, where the presence of a timestamp in the recent past qualifies the data as current. However, entities modeled in the data experience varying rates of change that influence data currency. We argue that data currency is a relative notion based on individual spatio-temporal update patterns, and these patterns can be learned and predicted. We develop CurrentClean, a probabilistic system for identifying and cleaning stale values, and enables a user to interactively explore change in her data. Our system provides a Web-based user-interface, and a backend infrastructure that learns update correlations among cell values in a database to infer and repair stale values. Our demonstration provides two motivating scenarios that highlight change exploration, and cleaning features using clinical, and sensor data from a data centre enterprise.

### PatMat: A Distributed Pattern Matching Engine with Cypher

• Kongzhang Hao
• Zhengyi Yang
• Longbin Lai
• Zhengmin Lai
• Xin Jin
• Xuemin Lin

Graph pattern matching is one of the most fundamental problems in graph database and is associated with a wide spectrum of applications. Due to its computational intensiveness, researchers have primarily devoted their efforts to improving the performance of the algorithm while constraining the graphs to have singular labels on vertices (edges) or no label. Whereas in practice graphs are typically associated with rich properties, thus the main focus in the industry is instead on powerful query languages that can express a sufficient number of pattern matching scenarios. We demo PatMat in this work to glue together the academic efforts on performance and the industrial efforts on expressiveness. To do so, we leverage the state-of-the-art join-based algorithms in the distributed contexts and Cypher query language - the most widely-adopted declarative language for graph pattern matching. The experiments demonstrate how we are capable of turning complex Cypher semantics into a distributed solution with high performance.

### On a Chatbot Conducting Virtual Dialogues

• Boris Galitsky
• Dmitry Ilvovsky

We present a demo of the chatbot that delivers content in the form of virtual dialogues automatically produced from the plain texts extracted and selected from the documents. This virtual dialogue content is provided in the form of answers derived from the found and selected documents split into fragments, and questions are automatically generated for these answers.

### Rehab-Path: Recommending Alcohol and Drug-free Routes

• Yihong Zhang
• Panote Siriaraya
• Yukiko Kawai

Nowadays routing systems can provide optimal routes in terms of time and travel distance. However, they do not consider special needs of certain group of users. For example, people recovering from alcohol and drug addiction may want to travel a route that is alcohol and drug-free. In this demonstration, we propose a system we built that helps with this special need. We detect if a street is related to alcohol and drug by exploiting Web open data, including Foursquare, microblog tweets, Google Street View images, and crime data. We calculate an alcohol and drug relevance score using unsupervised methods, to be used in route ranking. Our system prototype is ready to be tested for the cities of San Francisco and Kyoto.

### kBrowse: kNN Graph Browser

• Ramon Bespinyowong
• Anthony K. H. Tung

The construction of k-nearest Neighbor Graph (kNNG) in several applications, such as a recommender system, similarity search, and data exploration is heavily based on the distance function which is usually unweighted and considered constant for all users. However, attributes are not all equally important and using different attribute weight gives different kNNGs. We present kBrowse, which allows users to explore, modify and understand kNNG computed from a weighted Manhattan distance function on loosely-defined weight space. It samples possible weight vectors, and computes their corresponding kNNGs. The system summarizes all the kNNGs into one graph by keeping all the edges with high edge certainty, a probabilistic measurement on how likely an edge is going to appear in the weight space. To make the weight space more defined, users can directly adjust the weight space or gives kNN examples. Sample weight vectors failing to satisfy the given conditions are then removed and the graph is summarized again. Finally, kBrowse also gives a user better understanding of kNN by showing which attribute is important in connecting nodes.

### BIP! Finder: Facilitating Scientific Literature Search by Exploiting Impact-Based Ranking

• Thanasis Vergoulis
• Serafeim Chatzopoulos
• Ilias Kanellos
• Panagiotis Deligiannis
• Christos Tryfonopoulos
• Theodore Dalamagas

Due to the rapidly increasing number of scientific articles, finding valuable work for further research has become tedious and time consuming. To alleviate this issue, search engines have used citation-based article impact ranking. However, most engines rely on very simplistic impact measures (usually the citation count) and make the problematic assumption that there is a one-size-fits-all impact measure. To address these problems, we present BIP! Finder, a search engine that facilitates the identification of valuable articles by exploiting two different impact measures, each capturing a different aspect of the article impact. In addition, BIP! Finder provides many useful features (article comparison, intuitive visualisations, article bookmarking mechanism, etc.) making it a powerful addition to the researcher's toolbox.

### BeLink: Querying Networks of Facts, Statements and Beliefs

• Tien-Duc Cao
• Ludivine Duroyon
• François Goasdoué
• Ioana Manolescu
• Xavier Tannier

An important class of journalistic fact-checking scenarios involves verifying the claims and knowledge of different actors at different moments in time. Claims may be about facts, or about other claims, leading to chains of hearsay. We have recently proposed a data model for (time-anchored) facts, statements and beliefs. It builds upon the W3C's RDF standard for Linked Open Data to describe connections between agents and their statements, and to trace information propagation as agents communicate. We propose to demonstrate BeLink, a prototype capable of storing such interconnected corpora, and answer powerful queries over them relying on SPARQL 1.1. The demo will showcase the exploration of a rich real-data corpus built from Twitter and mainstream media, and interconnected through extraction of statements with their sources, time, and topics.

### TuneR: Fine Tuning of Rule-based Entity Matchers

• Matteo Paganelli
• Paolo Sottovia
• Francesco Guerra
• Yannis Velegrakis

A rule-based entity matching task requires the definition of an effective set of rules, which is a time-consuming and error-prone process. The typical approach adopted for its resolution is a trial and error method, where the rules are incrementally added and modified until satisfactory results are obtained. This approach requires significant human intervention, since a typical dataset needs the definition of a large number of rules and possible interconnections that cannot be manually managed. In this paper, we propose TuneR, a software library supporting developers (i.e., coders, scientists, and domain experts) in tuning sets of matching rules. It aims to reduce human intervention by offering a tool for the optimization of rule sets based on user-defined criteria (such as effectiveness, interpretability, etc.). Our goal is to integrate the framework in the Magellan ecosystem, thus completing the functionalities required by the developers for performing Entity Matching tasks.

### I-REX: A Lucene Plugin for EXplainable IR

• Dwaipayan Roy
• Sourav Saha
• Mandar Mitra
• Bihan Sen
• Debasis Ganguly

Providing high-level, intuitive explanations of the performance of IR systems is generally difficult due to their complexity, and the various low-level implementation details involved. We present I-REX, a tool built on top of Lucene, that is intended to provide a systematic view into the inner workings of retrieval models and methods (specifically query expansion). This should help researchers study, compare, understand and explain the performance of these models and methods. I-REX can be run either as a Web service accessible through a browser, or as a terminal-based tool with a shell-like interactive interface. In this article, we describe a session that illustrates how I-REX can be used to explain the observed difference in the performance of two variants of the Language Model.

### Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment

• Alex Bozarth
• Brendan Dwyer
• Fei Hu
• Daniel Jalova
• Karthik Muthuraman
• Nick Pentreath
• Simon Plovyt
• Gabriela de Queiroz
• Saishruthi Swaminathan
• Patrick Titzler
• Xin Wu
• Hong Xu
• Frederick R. Reiss
• Vijay Bommireddipalli

A recent trend observed in traditionally challenging fields such as computer vision and natural language processing has been the significant performance gains shown by deep learning (DL). In many different research fields, DL models have been evolving rapidly and become ubiquitous. Despite researchers' excitement, unfortunately, most software developers are not DL experts and oftentimes have a difficult time following the booming DL research outputs. As a result, it usually takes a significant amount of time for the latest superior DL models to prevail in industry. This issue is further exacerbated by the common use of sundry incompatible DL programming frameworks, such as Tensorflow, PyTorch, Theano, etc. To address this issue, we propose a system, called Model Asset Exchange (MAX), that avails developers of easy access to state-of-the-art DL models. Regardless of the underlying DL programming frameworks, it provides an open source Python library (called the MAX framework) that wraps DL models and unifies programming interfaces with our standardized RESTful APIs. These RESTful APIs enable developers to exploit the wrapped DL models for inference tasks without the need to fully understand different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing and signal processing, etc. In the end, we selectively demonstrate two web applications that are built on top of MAX, as well as the process of adding a DL model to MAX.

### ReducE-Comm: Effective Inventory Reduction System for E-Commerce

• Shay Gershtein
• Tova Milo
• Slava Novgorodov

Many e-commerce platforms serve as an intermediary between companies/manufacturers and consumers, receiving a commission per purchase. To increase revenue, such sites tend to offer a wide variety of items. However, in many situations a smaller subset of the items should be selected and offered for sale, e.g., when opening an express branch or expanding to a new region, or when maintenance costs become prohibitive and redundant items should be disposed of. In all these cases selecting a reduced inventory which covers most consumer needs is an important goal.

In this demo we introduce ReducE-Comm - a highly parallelizable and scalable system that given a large set of items, a bound on the number of items that can be supported and information about consumer preferences/items relationships, allows to select a subset of the items which maximizes the likelihood of a purchase. Our system is interactive and facilitates real-time analysis, by providing detailed per-item impact statistics. We demonstrate the effectiveness of ReducE-Comm on real-world data and scenarios taken from a large e-commerce system, by interacting with the CIKM'19 audience who act as analysts aiming to intelligently reduce the inventory.

### dEFEND: A System for Explainable Fake News Detection

• Limeng Cui
• Kai Shu
• Suhang Wang
• Dongwon Lee
• Huan Liu

Despite recent advancements in computationally detecting fake news, we argue that a critical missing piece be the explainability of such detection--i.e., why a particular piece of news is detected as fake--and propose to exploit rich information in users' comments on social media to infer the authenticity of news. In this demo paper, we present our system for an explainable fake news detection called dEFEND, which can detect the authenticity of a piece of news while identifying user comments that can explain why the news is fake or real. Our solution develops a sentence-comment co-attention sub-network to exploit both news contents and user comments to jointly capture explainable top-k check-worthy sentences and user comments for fake news detection. The system is publicly accessible.

## SESSION: Tutorials

### Enterprise Knowledge Graph From Specific Business Task to Enterprise Knowledge Management

• Rong Duan
• Yanghua Xiao

Data driven Knowledge Graph is rapidly adapted by different societies. Many open domain and specific domain knowledge graphs have been constructed, and many industries have benefited from knowledge graph. Currently, enterprise related knowledge graph is classified as specific domain, but the applications span from solving a narrow specific problem to Enterprise Knowledge Management system. With the digital transform of traditional industry, Enterprise knowledge becomes more and more complicated, it involves knowledge from common domain, multiple specific domains, and corporate-specific in general. This tutorial provides an overview of current Enterprise Knowledge Graph(EKG). It distinguishes the EKG from specific domain according to the knowledge it covers, and provides the examples to illustrate the difference between EKG and specific domain KG. The tutorial further summarizes EKG into three types: Specific Business Task Enterprise KG, Specific Business Unit Enterprise KG and Cross Business Unit Enterprise KG, and illustrates the characteristics, steps, challenges, and future research in constructing and consuming of each of these three types of EKG .

### Taming Social Bots: Detection, Exploration and Measurement

• Abdullah Mueen
• Nikan Chavoshi
• Amanda Minnich

Social bots have been around for over a decade since 2008. Social bots are capable of swaying political opinion, spreading false information, and recruiting for terrorist organizations. Social bots use various sophisticated techniques by adopting emotions, sympathy following, synchronous deletions, and profile molting. There are several approaches proposed in the literature for detection, exploration, and measuring social bots. We provide a comprehensive overview of the existing work from data mining and machine learning perspective, discuss relative strengths and weaknesses of various methods, make recommendations for researchers and practitioners, and propose novel directions for future research in taming the social bots. The tutorial also discusses pitfalls in collecting and sharing data on social bots.

### Learning-Based Methods with Human-in-the-Loop for Entity Resolution

• Lucian Popa
• Kun Qian
• Prithviraj Sen

This tutorial is intended for researchers and practitioners working in the data integration area and, in particular, entity resolution (ER), which is a sub-area focused on linking entities across heterogeneous datasets. We outline the ideal requirements of modern ER systems: (1) capture domain knowledge via (minimal) human interaction, (2) provide as much automation as possible via machine learning techniques, and (3) achieve high explainability. We describe recent research trends towards bringing such ideal ER systems closer to reality. We begin with an overview of human-in-the-loop methods that are based on techniques such as crowdsourcing and active learning. We then dive into recent trends that involve deep learning techniques such as representation learning to automate feature engineering, and combinations of transfer and active learning to reduce the amount of user labels required. We also discuss how explainable AI relates to ER, and outline some of the recent advances towards explainable ER.

### Learning and Reasoning on Graph for Recommendation

• Xiang Wang
• Xiangnan He
• Tat-Seng Chua

Recommendation methods construct predictive models to estimate the likelihood of a user-item interaction. Previous models largely follow a general supervised learning paradigm --- treating each interaction as a separate data instance and performing prediction based on the ''information isolated island''. Such methods, however, overlook the relations among data instances, which may result in suboptimal performance especially for sparse scenarios. Moreover, the models built on a separate data instance only can hardly exhibit the reasons behind a recommendation, making the recommendation process opaque to understand. In this tutorial, we revisit the recommendation problem from the perspective of graph learning. Common data sources for recommendation can be organized into graphs, such as user-item interactions (bipartite graphs), social networks, item knowledge graphs (heterogeneous graphs), among others. Such a graph-based organization connects the isolated data instances, bringing benefits to exploiting high-order connectivities that encode meaningful patterns for collaborative filtering, content-based filtering, social influence modeling and knowledge-aware reasoning. Together with the recent success of graph neural networks (GNNs), graph-based models have exhibited the potential to be the technologies for next-generation recommendation systems. This tutorial provides a review on graph-based learning methods for recommendation, with special focus on recent developments of GNNs and knowledge graph-enhanced recommendation. By introducing this emerging and promising topic in this tutorial, we expect the audience to get deep understanding and accurate insight on the spaces, stimulate more ideas and discussions, and promote developments of technologies.

### Recent Developments of Deep Heterogeneous Information Network Analysis

• Chuan Shi
• Philip S. Yu

Recently, there is a surge of research on employing Heterogeneous Information Networks (HIN) to model complex interaction system, where networks compose of different types of nodes or links, since HIN contains richer structure and semantic information. Many researches develop structural analysis approaches by leveraging the rich semantic meaning of structural types of objects and links in the networks. Furthermore, recent advancement on deep learning and network embedding poses new opportunities and challenges to mine HIN, and heterogeneous network embedding, even heterogeneous graph neural network, is becoming a hot topic. In this tutorial, we will give a survey on recent developments of heterogeneous information network analysis, especially on newly emerging heterogeneous network embedding. This tutorial shall help researchers and practitioners to share new techniques for identifying and analyzing relationships in networks that integrate multiple types or sources of information.

### Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join

• Jiaheng Lu
• Chunbin Lin
• Jin Wang
• Chen Li

String data is ubiquitous and string similarity search and join are critical to the applications of information retrieval, data integration, data cleaning, and also big data analytics. To support these operations, many techniques in the database and machine learning areas have been proposed independently. More precisely, in the database research area, there are techniques based on the filtering-and-verification framework that can not only achieve a high performance, but also provide guaranteed quality of results for given similarity functions. In the machine learning research area, string similarity processing is modeled as a problem of identifying similar text records; Specifically, the deep learning approaches use embedding techniques that map text to a low-dimensional continuous vector space. In this tutorial, we review a number of studies of string similarity search and join in these two research areas. We divide the studies in each area into different categories. For each category, we provide a comprehensive review of the relevant works, and present the details of these solutions. We conclude this tutorial by pinpointing promising directions for future work to combine techniques in these two areas.

### Realtime Object Detection via Deep Learning-based Pipelines

• James G. Shanahan
• Liang Dai

Ever wonder how the Tesla Autopilot system works (or why it fails)? In this tutorial we will look under the hood of self-driving cars and of other applications of computer vision and review state-of-the-art tech pipelines for object detection such as two-stage approaches (e.g., Faster R-CNN) or single-stage approaches (e.g., YOLO/SSD). This is accomplished via a series of Jupyter Notebooks that use Python, OpenCV, Keras, and Tensorflow. No prior knowledge of computer vision is assumed (although it will be help!). To this end we begin this tutorial with a review of computer vision and traditional approaches to object detection such as Histogram of oriented gradients (HOG).

### Recommendation for Multi-stakeholders and through Neural Review Mining

• Muthusamy Chelliah
• Yong Zheng
• Sudeshna Sarkar

Recommender systems are able to produce a list of recommended items tailored to user preferences, while the end user is the only stakeholder in these traditional system. However, there could be multiple stakeholders in several applications domains (e.g., e-commerce, movies, music). Recommendations are necessary to be produced by balancing the needs of different stakeholders. First session of this tutorial introduces multi-stakeholder recommender systems (MSRS) with several case studies, and discusses the corresponding methods and challenges in MSRS. Reviews in an e-commerce platform may be mined to address cold-start problem and to generate explanations. Our earlier tutorial covered aspect-based sentiment analysis of products and topic models/distributed representations that bridge vocabulary gap between user reviews and product descriptions. Focus in the second session of this tutorial instead is on recent neural methods for review text mining - covering hands-on code for its use to enhance product recommendation. Each section will introduce topics from various mechanism (e.g., attention) and task (e.g., review ranking) perspectives, present cutting-edge research and a walk-through of programs executed on Jupyter notebook using real-world data sets.

### Machine Learning on Graphs with Kernels

• Michalis Vazirgiannis
• Giannis Nikolentzos
• Giannis Siglidis

Graphs are becoming a dominant structure in current information management with many domains involved, including social networks, chemistry, biology, etc. Many real-world problems require applying machine learning tasks to graph-structured data. Graph kernels have emerged as a promising approach for dealing with these tasks. A graph kernel is a symmetric, positive semidefinite function on the set of graphs. These functions extend the applicability of kernel methods to graphs. Graph kernels have attracted a lot of attention during the last 20 years. The considerable research activity that occurred in the field resulted in the development of dozens of kernels, each focusing on specific structural properties of graphs. The goal of this tutorial is to offer a comprehensive presentation of a wide range of graph kernels, and to describe their key applications. The tutorial will also offer to the participants hands-on experience in applying graph kernels to classification problems.

## SESSION: Workshop Summaries

### DTMBIO 2019: The Thirteenth International Workshop on Data and Text Mining in Biomedical Informatics

• Hyojung Paik
• Ruibin Xi
• Doheon Lee

Started in 2006 as a specialized workshop in the field of text mining applied to biomedical informatics, DTMBIO (ACM international workshop on Data and Text Mining in Biomedical Informatics) has been held annually in conjunction with one of the largest data management conferences, CIKM, bringing together researchers working on computer science and bioinformatics area including text mining and genomic data analysis. The purpose of DTMBIO is to foster discussions regarding the state-of-the-art applications of data and text mining on biomedical research problems. DTMBIO 2019 will help scientists navigate emerging trends and opportunities in the evolving area of informatics related techniques and problems in the context of biomedical research.

### Knowledge-Driven Analytics and Systems Impacting Human Quality of Life

• Arijit Ukil
• Leandro Marin
• Antonio Jara
• John Farserotu

The advent of artificial intelligence (AI), Internet of Things (IoT), powerful computational hardwares like graphics processing units, affordable sensing devices like smart bands, wearables, smartphones pave ways for large number of useful and intelligent applications hitherto never commonly envisaged. However, it is felt that applications, which positively influence human life and society, need distinct attention from the perspective of the researchers, application developers as well as industry. It is understood that knowledge-driven initiatives in terms of technology, application and practical deployment have strong capability to enable long term human-centric convergence of cyber-physical systems. Our endeavor is to discuss those finer details, research directions and application development aspects of analytics and systems intended for impacting human quality of life.

### HENA 2019: The 3rd Workshop of Heterogeneous Information Network Analysis and Applications

• Chuan Shi
• Yanfang Ye
• Jiawei Zhang

The third International Workshop on Heterogeneous Information Network Analysis and Applications is held in Beijing, China on November 3, 2019 and is co-located with the 28th International Conference on Information and Knowledge Management. The goal of this workshop is to bring together people from these different areas and provide an opportunity for researchers and practitioners to share new techniques for identifying and analyzing relationships in networks that integrate multiple types or sources of information. This workshop has an exciting program that spans a number of subareas, including: network construction and mining, network embedding, information diffusion, knowledge graph analysis, community detection, parallel computing for network analysis, and network analysis applications. The program includes several invited speakers, lively discussion on emerging topics, and presentations of accepted original research papers.

### EYRE 2019: 2nd International Workshop on EntitY REtrieval

• Gong Cheng
• Kalpa Gunaratna
• Jun Wang

Entity retrieval has received increasing research attention from both the Information Retrieval (IR) and Semantic Web communities. This workshop series provides a platform where interdisciplinary studies of entity retrieval can be presented, and focused discussions can take place. We also organize two shared tasks related to entity retrieval. The 2nd International Workshop on EntitY REtrieval (EYRE 2019) was a half-day workshop co-located with the 28th ACM International Conference on Information and Knowledge Management (CIKM 2019) in Beijing, China.

### CIKM 2019 Workshop on Artificial Intelligence in Transportation (AI in transportation)

• Weinan Zhang
• Haiming Jin
• Lingyu Zhang
• Hongtu Zhu
• Jessie Zhenhui Li
• Jieping Ye

Data-enabled smart transportation has attracted a surge of interest from machine learning and data mining researchers nowadays due to the bloom of online ride-hailing industry and rapid development of autonomous driving. Large-scale high quality route data and trading data (spatiotemporal data) have been generated every day, which makes AI an urgent need and preferred solution for the decision making in intelligent transportation systems. While a large of amount of work have been dedicated to traditional transportation problems, they are far from satisfactory for the rising need. We propose a half-day workshop at CIKM 2019 for the professionals, researchers, and practitioners who are interested in mining and understanding big and heterogeneous data generated in transportation, and AI applications to improve the transportation system. We plan to have several invited talks from both academia and industry. This workshop would be organized by Shanghai Jiao Tong University, Didi Chuxing and Pennsylvania State University.

### GRLA 2019: The first International Workshop on Graph Representation Learning and its Applications

• Huawei Shen
• Jian Tang
• Peng Bao

Graphs are the universal data structures for representing the relationships between interconnected objects. They are ubiquitous in a variety of disciplines and domains ranging from computer science, social science, economics, medicine, to bioinformatics. In Recent years, extensive studies have been conducted on the graph analysis techniques. One of the most fundamental challenges of analyzing graphs is effectively representing graphs, which largely determines the performance of many follow-up tasks. This workshop aims to provide a forum for industry and academia to discuss the latest progress on graph representation learning and their applications in different fields. We hope more advanced technologies can be proposed or inspired, and also we expect that the direction of graph representation learning can catch much more attention in both academic and industry.

### International Workshop on Model Selection and Parameter Tuning in Recommender Systems

• Fikret Sivrikaya
• Sahin Albayrak
• Defu Lian

Recommender systems have strongly attracted the attention of the machine learning research community with prosperous real-life deployments in the last few decades. The performance and success of most applications developed in this domain highly depend on an elaborate selection of models and configuration of their hyperparameters. The international MoST-Rec 2019 workshop addresses the issues of algorithm selection and parameter tuning for recommender systems. The workshop aims to bring together researchers from the model selection and hyperparameter tuning community in the general scope of machine learning with researchers from the recommender systems community for discussing and exchanging recent advances and open challenges in the field.

### 2nd Workshop on Knowledge-aware and Conversational Recommender Systems - KaRS

• Vito Walter Anelli
• Tommaso Di Noia

Over the last years, we have been witnessing the advent of more and more precise and powerful recommendation algorithms and techniques able to effectively assess users' tastes and predict information that would probably be of interest for them. Most of these approaches rely on the collaborative paradigm (often exploiting machine learning techniques) and do not take into account the huge amount of knowledge, both structured and non-structured ones, describing the domain of interest of the recommendation engine. Although very effective in in predicting relevant items, collaborative approaches miss some very interesting features that go beyond the accuracy of results and move into the direction of providing novel and diverse results as well as generating an explanation for the recommended items or support interactive and conversational recommendation processes.

### BigScholar 2019: The 6th Workshop on Big Scholarly Data

• Feng Xia
• Huan Liu
• Irwin King
• Kuansan Wang

Recent years have witnessed the rapid growth in the number of academics and practitioners who are interested in big scholarly data as well as closely-related areas. Quite a lot of papers reporting recent advancements in this area have been published in leading conferences and journals. Both non-commercial and commercial platforms and systems have been released in recent years, which provide innovative services built upon big scholarly data to the academic community. Examples include Microsoft Academic Graph, Google Scholar, DBLP, arXiv, CiteSeerX, Web of Knowledge, Udacity, Coursera, and edX. The workshop will contribute to the birth of a community having a shared interest around big scholarly data and exploring it using knowledge discovery, data science and analytics, network science, and other appropriate technologies.