CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

SESSION: Keynote Address

Autonomous Driving Towards Mass Production

Visual recognition technology is very important for autonomous driving, especially on the path to mass production. In this talk, we will introduce SenseTime's algorithmic progress in autonomous driving, as well as our platform foundation for AI technology. Building on this, we illustrate how we turn these technologies into mass-production products for autonomous driving.

The Fisher-Rao Metric in Computer Vision

The Fisher-Rao metric is a Riemannian metric defined on any manifold that forms the parameter space for a family of probability distributions. The metric is specified by quadratic forms defined on the tangent spaces of the manifold. If a parameterisation of the manifold is chosen then each quadratic form is given by a symmetric positive definite matrix. Lengths, areas, volumes and hyper-volumes calculated using the Fisher-Rao metric are invariant under reparameterisation. This invariance is essential in practice because the parameterisation can be changed arbitrarily while keeping the data unchanged. The Fisher-Rao metric is obtained as a limit of the expected value of the log likelihood ratio for two nearby probability distributions. The inverse of the Fisher-Rao matrix is the Cramér-Rao lower bound on the covariance of an unbiased estimate of a parameter. The Fisher-Rao metric is used to divide the parameter space for the Hough transform method for detecting structures in data. Each division or accumulator is invariant under reparameterisation, and the number of accumulators is proportional to the volume of the parameter space. Accurate approximations to the Fisher-Rao metric are obtained for lines, catadioptric images of lines, circles, ellipses and the cross ratio. It is shown that the Fisher-Rao metric can be used to compare the amount of information in point features with the amount of information in edge element features.
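
For reference, the statements in this abstract correspond to the following standard definitions, which hold under the usual regularity conditions. The notation ($p(x;\theta)$ for the family of distributions, $g_{ij}$ for the metric components, $\hat\theta$ for an unbiased estimator, $n$ for the number of parameters) is assumed here for illustration and is not taken from the talk itself.

```latex
\begin{align}
g_{ij}(\theta) &= \mathbb{E}_{x \sim p(\cdot;\theta)}\!\left[
    \frac{\partial \log p(x;\theta)}{\partial \theta^i}\,
    \frac{\partial \log p(x;\theta)}{\partial \theta^j}\right]
    && \text{(metric components)} \\
ds^2 &= \sum_{i,j} g_{ij}(\theta)\, d\theta^i\, d\theta^j
    && \text{(quadratic form on the tangent space)} \\
\mathbb{E}_{x \sim p(\cdot;\theta)}\!\left[
    \log \frac{p(x;\theta)}{p(x;\theta + d\theta)}\right]
    &= \tfrac{1}{2} \sum_{i,j} g_{ij}(\theta)\, d\theta^i\, d\theta^j
       + O\!\left(\lVert d\theta \rVert^3\right)
    && \text{(limit of the expected log likelihood ratio)} \\
\operatorname{Cov}(\hat\theta) &\succeq g(\theta)^{-1}
    && \text{(Cram\'er-Rao lower bound)} \\
dV &= \sqrt{\det g(\theta)}\; d\theta^1 \cdots d\theta^n
    && \text{(reparameterisation-invariant volume element)}
\end{align}
```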

From Unstructured Text to TextCube: Automated Construction and Multidimensional Exploration

Real-world big data are largely unstructured, interconnected, and dynamic, in the form of natural language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data, which may not be scalable, especially considering that many text corpora are highly dynamic and domain specific. We believe that massive text data itself may disclose a large body of hidden patterns, structures, and knowledge. With domain-independent and domain-dependent knowledge bases, we propose to explore the power of massive data itself for turning unstructured data into structured knowledge. By organizing massive text documents into multidimensional text cubes, we show that structured knowledge can be extracted and used effectively. In this talk, we introduce a set of methods developed recently in our group for such an exploration, including mining quality phrases, entity recognition and typing, multi-faceted taxonomy construction, and construction and exploration of multi-dimensional text cubes. We show that a data-driven approach could be a promising direction for transforming massive text data into structured knowledge.

Practicing the Art of Data Science

Data science embraces interdisciplinary methodologies and tools, such as those in statistics, artificial intelligence/machine learning, data management, algorithms, and computation. Practicing data science to empower innovative applications, however, remains an art due to many factors beyond technology, such as the sophistication of application scenarios, business demands, and the central role of human beings in the loop.

The purpose of this keynote speech is to share with the audience the two most important rules of thumb that I learned from my practice of data science research, development and applications, as well as my thoughts on future enterprise and organization data strategies.

First, I will demonstrate the importance and challenges in developing domain-oriented, end-to-end solutions. Specifically, I will discuss our experience in transforming algorithms to domain-oriented tools, and review some of our latest techniques in transforming black-box deep learning networks into interpretable white-box models.

Second, I will advocate the core value of data science as the connector and transformer between vertical application challenges and general scientific principles and engineering tools. Using network embedding as an example, I will illustrate the innovative value of building connectors and transformers for new types of data and applications so that they can take great advantage of well established scientific methods and engineering tools.

I envision that data science for social, commercial and ecological good has to build on enterprise and organization data strategies and infrastructure. Looking to future work, I will offer some thoughts from this perspective, such as data value assessment and pricing, as well as privacy preservation.

SESSION: Long - Algorithmic Techniques

On VR Spatial Query for Dual Entangled Worlds

With the rapid advent of Virtual Reality (VR) technology and virtual tour applications, there is a research need for spatial queries tailored for simultaneous movements in both the physical and virtual worlds. Traditional spatial queries, designed mainly for one world, do not consider the entangled dual worlds in VR. In this paper, we first investigate the fundamental shortest-path query in VR as the building block for spatial queries, aiming to avoid hitting boundaries and obstacles in the physical environment by leveraging Redirected Walking (RW) in Computer Graphics. Specifically, we first formulate Dual-world Redirected-walking Obstacle-free Path (DROP) to find the minimum-distance path in the virtual world, which is constrained by the RW cost in the physical world to ensure an immersive experience in VR. We prove DROP is NP-hard and design a fully polynomial-time approximation scheme, Dual Entangled World Navigation (DEWN), by finding the Minimum Immersion Loss Range (MIL Range). Afterward, we show that the existing spatial query algorithms and index structures can leverage DEWN as a building block to support kNN and range queries in the dual worlds of VR. Experimental results and a user study with an implementation on HTC VIVE demonstrate that DEWN outperforms the baselines with smoother RW operations in various VR scenarios.

Sketching Streaming Histogram Elements using Multiple Weighted Factors

We propose a novel sketching approach for streaming data that, even with limited computing resources, enables processing high volume and high velocity data efficiently. Our approach accounts for the fact that a stream of data is generally dynamic, with the underlying distribution possibly changing all the time. Specifically, we propose a hashing (sketching) technique that is able to automatically estimate a histogram from a stream of data by using a model with adaptive coefficients. Such a model is necessary to enable the preservation of histogram similarities, following the varying weight/importance of the generated histograms. To address the dynamic properties of data streams, we develop a novel algorithm that can sketch the histograms from a data stream using multiple weighted factors. The results from our extensive experiments on both synthetic and real-world datasets show the effectiveness and the efficiency of the proposed method.

Improved Compressed String Dictionaries

We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and applications such as Web mining; and collections of URIs and literals, the basic components of RDF datasets. Our experiments show that our data structures achieve better compression than the state-of-the-art alternatives while providing very competitive query times.
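
As background for the Front-coding technique mentioned above, the following is a minimal sketch of plain (non-hierarchical) bucketed front-coding. It is not the paper's data structure: the function names, the `bucket_size` parameter and the sample URLs are assumptions made for illustration, and the encoded entries are kept as Python tuples rather than packed bytes.

```python
from typing import List, Tuple


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings: List[str], bucket_size: int = 4) -> List[Tuple[int, str]]:
    """Plain front-coding over a lexicographically sorted list.

    The first string of each bucket is stored in full (prefix length 0);
    every other string stores only the length of the prefix it shares
    with its predecessor, plus the remaining suffix.
    """
    encoded = []
    prev = ""
    for idx, s in enumerate(sorted_strings):
        if idx % bucket_size == 0:
            encoded.append((0, s))          # bucket header: full string
        else:
            k = lcp_len(prev, s)
            encoded.append((k, s[k:]))      # shared-prefix length + suffix
        prev = s
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original strings by replaying prefixes in order."""
    out = []
    prev = ""
    for k, suffix in encoded:
        s = prev[:k] + suffix
        out.append(s)
        prev = s
    return out


# Hypothetical sample data: URLs share long prefixes, so most entries shrink.
urls = sorted([
    "http://a.org/x",
    "http://a.org/y",
    "http://a.org/y/z",
    "http://b.org/",
    "http://b.org/index",
])
encoded = front_encode(urls, bucket_size=4)
```

Full strings at bucket boundaries let a lookup binary-search the bucket headers and then decode at most `bucket_size` entries, which is the usual space-time tradeoff knob in this family of structures.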

On Transforming Relevance Scales

Information Retrieval (IR) researchers have often used existing IR evaluation collections and transformed the relevance scale in which judgments have been collected, e.g., to use metrics that assume binary judgments like Mean Average Precision. Such scale transformations are often arbitrary (e.g., 0,1 mapped to 0 and 2,3 mapped to 1) and it is assumed that they have no impact on the results of IR evaluation. Moreover, the use of crowdsourcing to collect relevance judgments has become a standard methodology. When designing the crowdsourcing relevance judgment task, one of the decisions to be made is how granular the relevance scale used to collect judgments should be. Such a decision then has repercussions on the metrics used to measure IR system effectiveness. In this paper we look at the effect of scale transformations in a systematic way. We perform extensive experiments to study the transformation of judgments from fine-grained to coarse-grained. We use different relevance judgments expressed on different relevance scales and either expressed by expert annotators or collected by means of crowdsourcing. The objective is to understand the impact of relevance scale transformations on IR evaluation outcomes and to draw conclusions on how to best transform judgments into a different scale, when necessary.
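
To make the kind of transformation discussed above concrete, here is a small sketch that binarizes graded judgments at two different thresholds and computes Average Precision for the same ranking. The ranking, the helper names and the thresholds are hypothetical, and the Average Precision helper assumes that every relevant document appears in the ranked list.

```python
def binarize(grades, threshold):
    """Map graded judgments to binary: 1 if grade >= threshold, else 0."""
    return [1 if g >= threshold else 0 for g in grades]


def average_precision(binary_rels):
    """Average Precision of one ranked list of binary judgments.

    Assumes all relevant documents are present in the list, so the
    normaliser is the number of relevant items seen.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(binary_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at this relevant rank
    return total / hits if hits else 0.0


# Hypothetical judgments on a 0-3 scale, in ranked order.
graded = [1, 3, 0, 2, 1]

strict = binarize(graded, threshold=2)    # 0,1 -> 0 and 2,3 -> 1
lenient = binarize(graded, threshold=1)   # 0 -> 0 and 1,2,3 -> 1

ap_strict = average_precision(strict)
ap_lenient = average_precision(lenient)
```

The same ranking scores differently under the two mappings, which is why the choice of transformation cannot be assumed to be neutral.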

Streamline Density Peak Clustering for Practical Adoptions

Since the Density Peak Clustering (DPC) algorithm was proposed in 2014, it has drawn a lot of interest in various domains. As a clustering method, DPC features superior generality, robustness, flexibility and simplicity. There are, however, two main roadblocks to its practical adoption, both centered around the selection of the cutoff distance, the single critical hyperparameter of DPC. This work proposes an improved algorithm named Streamlined Density Peak Clustering (SDPC). SDPC speeds up DPC executions on a sequence of cutoff distances by 2.2-8.8X while at the same time reducing memory usage by an order of magnitude. As an algorithm preserving the original semantics of DPC, SDPC offers an efficient and scalable drop-in replacement for DPC in data clustering.
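
For readers unfamiliar with DPC, the sketch below computes the two per-point quantities of the original algorithm, local density and distance to the nearest denser point, for a given cutoff distance. It is a naive quadratic illustration of baseline DPC, not of SDPC; the function name, the sample points and the cutoff value are assumptions.

```python
import math


def dpc_rho_delta(points, cutoff):
    """Naive O(n^2) computation of the two DPC quantities.

    rho[i]   = number of other points closer than `cutoff` to point i
    delta[i] = distance from point i to the nearest point with strictly
               higher rho; points with no denser neighbour get their
               largest distance to any point instead.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta


# Hypothetical data: a tight group of three points and a distant pair.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = dpc_rho_delta(points, cutoff=1.5)
```

Cluster centres are the points where both `rho` and `delta` are large. Because `rho` depends on the cutoff, trying a sequence of cutoff distances in this naive form repeats the whole computation each time.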

SESSION: Long - Analyzing Spatio-Temporal Data

Recommendation-based Team Formation for On-demand Taxi-calling Platforms

On-demand taxi-calling platforms often ignore the social engagement of individual drivers. The lack of social incentives impairs the work enthusiasm of drivers and affects the quality of service. In this paper, we propose to form teams among drivers to promote participation. A team consists of a leader and multiple members, and acts as the basis for various group-based incentives such as competition. We define the Recommendation-based Team Formation (RTF) problem to form as many teams as possible while accounting for the choices of drivers. The RTF problem is challenging. It needs both accurate recommendation and coordination among recommendations, since each driver can be in at most one team. To solve the RTF problem, we devise a Recommendation-Matrix-Based Framework (RMBF). It first estimates the acceptance probability of recommendations and then derives a recommendation matrix to maximize the number of formed teams from a global view. We conduct trace-driven simulations using real data covering over 64,000 drivers and deploy our solution on a large on-demand taxi-calling platform for online evaluations. Experimental results show that RMBF outperforms the greedy-based strategy by forming up to 20% and 12.4% more teams in trace-driven simulations and online evaluations, respectively, and that the drivers who form teams and are involved in the competition have more service time, more finished orders and higher income.

DeepIST: Deep Image-based Spatio-Temporal Network for Travel Time Estimation

Estimating the travel time for a given path is a fundamental problem in many urban transportation systems. However, prior works fail to capture the moving behaviors embedded in paths well and thus do not estimate the travel time accurately. To fill this gap, we propose a novel neural network framework, namely Deep Image-based Spatio-Temporal network (DeepIST), for travel time estimation of a given path. The novelty of DeepIST lies in the following aspects: 1) we propose to plot a path as a sequence of "generalized images" which include sub-paths along with additional information, such as traffic conditions, road network and traffic signals, in order to harness the power of the convolutional neural network (CNN) model on image processing; 2) we design a novel two-dimensional CNN, namely PathCNN, to extract spatial patterns for lines in images by regularization and adopting multiple pooling methods; and 3) we apply a one-dimensional CNN to capture temporal patterns among the spatial patterns along the paths for the estimation. Empirical results show that DeepIST soundly outperforms the state-of-the-art travel time estimation models by 24.37% to 25.64% of mean absolute error (MAE) in multiple large-scale real-world datasets.

Personalized Route Description Based On Historical Trajectories

The turn-by-turn route descriptions provided in existing navigation applications are exclusively derived from underlying road network topology information, i.e., the connectivity of edges to each other. Therefore, the turn-by-turn route descriptions are simplified to a metric translation of the physical world (e.g. distance/time to turn) into spoken language. Such a translation, which ignores human cognition of the geographic space, is frequently verbose and redundant for drivers who have knowledge of the geographical areas. In this paper, we study a Personalized Route Description system dubbed PerRD, whose goal is to generate more customized and intuitive route descriptions based on user generated content. PerRD utilizes a wealth of user generated historical trajectory data to extract frequently visited routes in the road network. The extracted information is used to make a cognitively customized route description for each user. We formalize this task as a problem of finding the optimal partition for a given route that maximizes the familiarity while minimizing the number of partitions, and finding a proper sentence to describe each partition. For the empirical study, our solution is applied to three trajectory datasets and users' real experiences to evaluate the performance and effectiveness of PerRD.

Geolocating Tweets in any Language at any Location

Most social media messages are written in languages other than English, but commonly used text mining tools were designed only for English. This paper introduces the Unicode Convolutional Neural Network (UnicodeCNN) for analyzing text written in any language. The UnicodeCNN does not require the language to be known in advance, allows the language to change arbitrarily mid-sentence, and is robust to the misspellings and grammatical mistakes commonly found in social media. We demonstrate the UnicodeCNN's effectiveness on the challenging task of content-based tweet geolocation using a dataset with 900 million tweets written in more than 100 languages. Whereas previous work restricted itself to predicting a tweet's country or city of origin (and only worked on tweets written in certain languages from highly populated cities), we predict the exact GPS locations of tweets (and our method works on tweets written in any language sent from anywhere in the world). We predict GPS coordinates using the mixture of von Mises-Fisher (MvMF) distribution. The MvMF exploits the Earth's spherical geometry to improve predictions, a task that previous work considered too computationally difficult. On English tweets, our model's predictions average more than 300km closer to the true location than previous work, and in other languages our model's predictions are up to 1500km more accurate. Remarkably, the UnicodeCNN can learn geographic knowledge in one language and automatically transfer that knowledge to other languages.
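
The mixture of von Mises-Fisher distributions mentioned above can be illustrated with the standard closed-form density on the unit sphere in three dimensions, $f(x; \mu, \kappa) = \frac{\kappa}{4\pi \sinh \kappa} \exp(\kappa\, \mu^\top x)$. The sketch below evaluates such a mixture at a latitude/longitude pair. It is not the paper's model: the function names, the component locations, the weights and the concentration values are all made up for illustration, and nothing here is learned from data.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def vmf_log_pdf(x, mu, kappa):
    """Log density of a von Mises-Fisher distribution on the 2-sphere.

    Uses log(sinh k) = k + log(1 - exp(-2k)) - log 2 so that large
    concentrations do not overflow. Requires kappa > 0.
    """
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    log_norm = math.log(kappa) - math.log(4.0 * math.pi) - log_sinh
    dot = sum(a * b for a, b in zip(x, mu))
    return log_norm + kappa * dot


def mvmf_log_pdf(x, weights, mus, kappas):
    """Log density of a mixture of vMF components (log-sum-exp)."""
    terms = [
        math.log(w) + vmf_log_pdf(x, mu, k)
        for w, mu, k in zip(weights, mus, kappas)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical two-component mixture centred on Paris and Tokyo.
paris = to_unit_vector(48.86, 2.35)
tokyo = to_unit_vector(35.68, 139.69)
sydney = to_unit_vector(-33.87, 151.21)

weights, mus, kappas = [0.5, 0.5], [paris, tokyo], [50.0, 50.0]
log_p_paris = mvmf_log_pdf(paris, weights, mus, kappas)
log_p_sydney = mvmf_log_pdf(sydney, weights, mus, kappas)
```

Working with unit vectors rather than raw latitude/longitude keeps the density well behaved at the poles and across the antimeridian, which is one way a model can respect the Earth's spherical geometry.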

SeiSMo: Semi-supervised Time Series Motif Discovery for Seismic Signal Detection

Unlike semi-supervised clustering, classification and rule discovery, semi-supervised motif discovery is a surprisingly unexplored area in data mining. Semi-supervised motif discovery finds hidden patterns in long time series when a few arbitrarily known patterns are given. A naive approach is to exploit the known patterns and perform a similarity search within a radius of the patterns. However, this method would find only similar shapes and would be limited in discovering new shapes. In contrast, traditional unsupervised motif discovery algorithms detect new shapes, but miss some patterns because the given information is not utilized. We propose a semi-supervised motif discovery algorithm that forms a nearest neighbor graph to identify chains of nearest neighbors from the given events. We demonstrate that the chains are likely to identify hidden patterns in the data. We have applied the method to find novel events in several geoscientific datasets more accurately than existing methods.

SESSION: Long - Biomedical Informatics

UA-CRNN: Uncertainty-Aware Convolutional Recurrent Neural Network for Mortality Risk Prediction

Accurate prediction of mortality risk is important for evaluating early treatments, detecting high-risk patients and improving healthcare outcomes. Predicting mortality risk from the irregular clinical time series data is challenging due to the varying time intervals in the consecutive records. Existing methods usually solve this issue by generating regular time series data from the original irregular data without considering the uncertainty in the generated data, caused by varying time intervals. In this paper, we propose a novel Uncertainty-Aware Convolutional Recurrent Neural Network (UA-CRNN), which incorporates the uncertainty information in the generated data to improve the mortality risk prediction performance. To handle the complex clinical time series data with sub-series of different frequencies, we propose to incorporate the uncertainty information into the sub-series level rather than the whole time series data. Specifically, we design a novel hierarchical uncertainty-aware decomposition layer (UADL) to adaptively decompose time series into different sub-series and assign them proper weights according to their reliabilities. Experimental results on two real-world clinical datasets demonstrate that the proposed UA-CRNN method significantly outperforms state-of-the-art methods in both short-term and long-term mortality risk predictions.

Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images

Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realistic but completely different from the original ones, filling the gaps in the real image distribution. However, we cannot easily use them to locate disease areas, considering expert physicians' expensive annotation cost. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating highly-rough bounding box conditions incrementally into PGGANs to place brain metastases at desired positions/sizes on 256 × 256 Magnetic Resonance (MR) images, for Convolutional Neural Network-based tumor detection; this first GAN-based medical DA using automatic bounding box annotation improves the training robustness. The results show that CPGGAN-based DA can boost sensitivity in diagnosis by 10% with clinically acceptable additional False Positives. Surprisingly, further tumor realism, achieved with additional normal brain MR images for CPGGAN training, does not contribute to detection performance, even though three physicians cannot accurately distinguish the synthetic images from the real ones in a Visual Turing Test.

Domain Knowledge Guided Deep Atrial Fibrillation Classification and Its Visual Interpretation

Hand-crafted features have been proven useful in solving the electrocardiograph (ECG) classification problem. The features rely on domain knowledge and carry clinical meanings. However, the construction of the features requires tedious fine tuning in practice. Lately, a set of end-to-end deep neural network models have been proposed and show promising results in ECG classification. Though effective, such models learn patterns which usually do not match human concepts, and it is therefore hard to get a convincing explanation with interpretation methods. This limitation significantly narrows the applicability of deep models, considering that it is difficult for cardiologists to accept unexplainable results from deep learning. To alleviate this limitation, we bring together the best of the two worlds and propose a domain knowledge guided deep neural network. Specifically, we utilize a deep residual network as a classification framework, within which key feature (P-wave and R-peak position) reconstruction tasks are adopted to incorporate domain knowledge in the learning process. The reconstruction tasks make the model pay more attention to key feature points within the ECG. Furthermore, we utilize an occlusion method to get a visual interpretation and design a visualization at both the heartbeat level and the feature point level. Our experiments show the superior performance of the proposed ECG classification method compared to the model without the P-wave and R-peak tasks, and the patterns learnt by our model are more explainable.

Question Difficulty Prediction for Multiple Choice Problems in Medical Exams

In ITS (Intelligent Tutoring System) services, personalized question recommendation is a critical function in which the key challenge is to predict the difficulty of each question. Given the difficulty of each question, an ITS can allocate suitable questions to students with varied knowledge proficiency. Existing approaches mainly rely on expert labeling, which is both subjective and labor intensive. In this paper, we propose a Document enhanced Attention based neural Network (DAN) framework to predict the difficulty of multiple choice problems in medical exams. DAN consists of three major steps: (1) In addition to stem and options, DAN retrieves relevant medical documents to enrich the content of each question; (2) DAN breaks down the question's difficulty into two parts: the hardness of recalling the knowledge assessed by the question and the degree of confusion in excluding distractors. For each part, DAN introduces corresponding attention layers to model it; (3) DAN combines the two parts of difficulty to predict the overall difficulty. We collect a real-world data set from one of the largest medical online education websites in China. The experimental results demonstrate the effectiveness of the proposed framework.

GRAPHENE: A Precise Biomedical Literature Retrieval Engine with Graph Augmented Deep Learning and External Knowledge Empowerment

Effective biomedical literature retrieval (BLR) plays a central role in precision medicine informatics. In this paper, we propose GRAPHENE, which is a deep learning based framework for precise BLR. GRAPHENE consists of three main modules: 1) graph-augmented document representation learning; 2) query expansion and representation learning; and 3) learning to rank biomedical articles. The graph-augmented document representation learning module constructs a document-concept graph containing biomedical concept nodes and document nodes so that global biomedical related concepts from an external knowledge source can be captured, which is further connected to a BiLSTM so both local and global topics can be explored. The query expansion and representation learning module expands the query with abbreviations and different names, and then builds a CNN-based model to convolve the expanded query and obtain a vector representation for each query. Learning to rank minimizes a ranking loss between biomedical articles with the query to learn the retrieval function. Experimental results on applying our system to TREC Precision Medicine track data are provided to demonstrate its effectiveness.

SESSION: Long - Computer Vision

Video-level Multi-model Fusion for Action Recognition

Approaches based on spatio-temporal features for video action recognition have emerged, such as two-stream based methods and 3D convolution based methods. However, current methods suffer from problems caused by partial observation, or are restricted to modeling a single kind of information. Segment-level recognition results obtained from dense sampling cannot represent the entire video and therefore lead to partial observation. Moreover, it is hard for a single model to capture the complementary spatial, temporal and spatio-temporal information in a video at the same time. Therefore, the challenge is to build a video-level representation and capture multiple kinds of information. In this paper, a video-level multi-model fusion action recognition method is proposed to solve these problems. Firstly, an efficient video-level 3D convolution model is proposed to capture the global information in the video by assembling segment-level 3D convolution models. Secondly, a multi-model fusion architecture is proposed for video action recognition to capture multiple kinds of information. The spatial, temporal and spatio-temporal information is aggregated with an SVM classifier. Experimental results show that this method achieves state-of-the-art performance on UCF-101 (97.6%) without pre-training on Kinetics.

Large Scale Landmark Recognition via Deep Metric Learning

This paper presents a novel approach for landmark recognition in images that we've successfully deployed at This method enables us to recognize famous places, buildings, monuments, and other landmarks in user photos. The main challenge lies in the fact that it's very complicated to give a precise definition of what is and what is not a landmark. Some buildings, statues and natural objects are landmarks; others are not. There's also no database with a fairly large number of landmarks to train a recognition model. A key feature of using landmark recognition in a production environment is that the number of photos containing landmarks is extremely small. This is why the model should have a very low false positive rate as well as high recognition accuracy. We propose a metric learning-based approach that successfully deals with existing challenges and efficiently handles a large number of landmarks. Our method uses a deep neural network and requires a single pass inference that makes it fast to use in production. We also describe an algorithm for cleaning landmarks database which is essential for training a metric learning model. We provide an in-depth description of basic components of our method like neural network architecture, the learning strategy, and the features of our metric learning approach. We show the results of proposed solutions in tests that emulate the distribution of photos with and without landmarks from a user collection. We compare our method with others during these tests. The described system has been deployed as a part of a photo recognition solution at Cloud, which is the photo sharing and storage service at Group.

Multi-stage Deep Classifier Cascades for Open World Recognition

At present, object recognition studies are mostly conducted in a closed lab setting, where the classes in the test phase have typically been seen in the training phase. However, real-world problems are far more challenging because: i) new classes unseen in the training phase can appear when predicting; ii) discriminative features need to evolve when new classes emerge in real time; and iii) instances in new classes may not follow the "independent and identically distributed" (iid) assumption. Most existing work only aims to detect the unknown classes and is incapable of continuing to learn newer classes. Although a few methods consider both detecting and including new classes, all are based on predefined handcrafted features that cannot evolve and are out-of-date for characterizing emerging classes. Thus, to address the above challenges, we propose a novel generic end-to-end framework consisting of a dynamic cascade of classifiers that incrementally learn their dynamic and inherent features. The proposed method injects dynamic elements into the system by detecting instances from unknown classes, while at the same time incrementally updating the model to include the new classes. The resulting cascade tree grows by adding a new leaf node classifier once a new class is detected, and the discriminative features are updated via an end-to-end learning strategy. Experiments on two real-world datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.

Inferring Context from Pixels for Multimodal Image Classification

Image classification models take image pixels as input and predict labels in a predefined taxonomy. While contextual information (e.g. text surrounding an image) can provide valuable orthogonal signals to improve classification, the typical setting in literature assumes the unavailability of text and thus focuses on models that rely purely on pixels. In this work, we also focus on the setting where only pixels are available in the input. However, we demonstrate that if we predict textual information from pixels, we can subsequently use the predicted text to train models that improve overall performance. We propose a framework that consists of two main components: (1) a phrase generator that maps image pixels to a contextual phrase, and (2) a multimodal model that uses textual features from the phrase generator and visual features from the image pixels to produce labels in the output taxonomy. The phrase generator is trained using web-based query-image pairs to incorporate contextual information associated with each image and has a large output space. We evaluate our framework on diverse benchmark datasets (specifically, the WebVision dataset for evaluating multi-class classification and OpenImages dataset for evaluating multi-label classification), demonstrating performance improvements over approaches based exclusively on pixels and showcasing benefits in prediction interpretability. We additionally present results to demonstrate that our framework provides improvements in few-shot learning of minimally labeled concepts. We further demonstrate the unique benefits of the multimodal nature of our framework by utilizing intermediate image/text co-embeddings to perform baseline zero-shot learning on the ImageNet dataset.

Multi-Target Multi-Camera Tracking with Human Body Part Semantic Features

Recently, Multi-Target Multi-Camera Tracking (MTMCT) has gained more and more attention. It is a challenging task with major problems including occlusion, background clutter, poses and camera point of view variations. Compared to single camera tracking, which takes advantage of location information and strict time constraints, good appearance features are more important to MTMCT. This drives us to extract robust and discriminative features for MTMCT. We propose MTMCT_HS which uses human body part semantic features to overcome the above challenges. We use a two-stream deep neural network to extract the global appearance features and human body part semantic maps separately, and employ aggregation operations to generate final features. We argue that these features are more suitable for affinity measurement, which can be seen as the average of appearance similarity weighted by the corresponding human body part similarity. Next, our tracker adopts a hierarchical correlation clustering algorithm, which combines targets' appearance feature similarity with motion correlation for data association. We validate the effectiveness of our MTMCT_HS method by demonstrating its superiority over the state-of-the-art method on DukeMTMC benchmark. Experiments show that the extracted features with human body part semantics are more effective for MTMCT compared with the methods solely employing global appearance features.

SESSION: Long - Database and System

Efficient Join Processing Over Incomplete Data Streams

For decades, the join operator over fast data streams has drawn much attention from the database community, due to its wide spectrum of real-world applications, such as online clustering, intrusion detection, sensor data monitoring, and so on. Existing works usually assume that the underlying streams to be joined are complete (without any missing values). However, this assumption may not always hold, since objects from streams may contain some missing attributes, due to various reasons such as packet losses, network congestion/failure, and so on. In this paper, we formalize an important problem, namely join over incomplete data streams (Join-iDS), which retrieves joining object pairs from incomplete data streams with high confidence. We tackle the Join-iDS problem in the style of "data imputation and query processing at the same time". To enable this style, we design an effective and efficient cost-model-based imputation method via differential dependency (DD), devise effective pruning strategies to reduce the Join-iDS search space, and propose efficient algorithms via our proposed cost-model-based data synopsis/indexes. Extensive experiments have been conducted to verify the efficiency and effectiveness of our proposed Join-iDS approach on both real and synthetic data sets.

Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms

Inclusion dependencies are an important type of metadata in relational databases, because they indicate foreign key relationships and serve a variety of data management tasks, such as data linkage, query optimization, and data integration. The discovery of inclusion dependencies is, therefore, a well-studied problem and has been addressed by many algorithms. Each of these discovery algorithms follows its own strategy with certain strengths and weaknesses, which makes it difficult for data scientists to choose the optimal algorithm for a given profiling task.

This paper summarizes the different state-of-the-art discovery approaches and discusses their commonalities. For evaluation purposes, we carefully re-implemented the thirteen most popular discovery algorithms and discuss their individual properties. Our extensive evaluation on several real-world and synthetic datasets shows the unbiased performance of the different discovery approaches and, hence, provides a guideline on when and where each approach works best. Comparing the different runtimes and scalability graphs, we identify the best approaches for certain situations and demonstrate where certain algorithms fail.
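To make the notion of an inclusion dependency concrete, the following is a minimal sketch of a naive, quadratic check for unary inclusion dependencies. It is not one of the thirteen evaluated algorithms; the function name, the dictionary-based table layout and the toy data are assumptions made for illustration only.

```python
def unary_inclusion_dependencies(tables):
    """Return all pairs (dependent, referenced) of distinct columns where every
    value of the dependent column also occurs in the referenced column.

    `tables` maps table name -> {column name -> list of values} (hypothetical
    layout); NULLs, represented as None, are ignored.
    """
    columns = {
        (table, column): {v for v in values if v is not None}
        for table, cols in tables.items()
        for column, values in cols.items()
    }
    found = []
    for dep, dep_values in columns.items():
        for ref, ref_values in columns.items():
            # Skip self-inclusion and empty columns (trivially contained everywhere).
            if dep != ref and dep_values and dep_values <= ref_values:
                found.append((dep, ref))
    return sorted(found)


# Toy schema: orders.customer_id is a foreign key candidate for customers.id.
tables = {
    "orders": {"id": [10, 11, 12], "customer_id": [1, 2, 2]},
    "customers": {"id": [1, 2, 3]},
}
inds = unary_inclusion_dependencies(tables)
```

Comparing every column pair like this scales poorly with the number of columns, which is one reason dedicated discovery algorithms exist.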

Constructing a Comprehensive Events Database from the Web

In this paper, we consider the problem of constructing a comprehensive database of events taking place around the world. Events include small hyper-local events like farmer's markets, neighborhood garage sales, as well as larger concerts and festivals. Designing a high-precision and high-recall event extractor from unstructured pages across the whole web is a challenging problem. We cannot resort overly to domain-specific strategies since it needs to work on all web pages, including on new domains; we need to account for variations in page layouts and structure across websites. Further, we need to deal with low-quality pages on the web with limited structure. We have built an ML-powered extraction system to solve this problem, using annotations as training data. Our extraction system operates in two phases. In the first phase, we generate raw event information from individual web pages. To do this, an event page classifier predicts if a web page contains any event information; this is then followed by a single/multiple classifier that decides if the page contains a single event or multiple events; the first phase concludes by applying event extractors that extract the key fields of a public event (the title, the date/time information, and the location information). In the second phase, we further improve the extraction quality via three novel algorithms, repeated patterns, event consolidation and wrapper induction, which are designed to use the raw event extractions as input and generate events whose quality is significantly higher. We evaluate our extraction models on two large-scale publicly available web corpora, Common Crawl and ClueWeb12. Experimental analysis shows that our methodology achieves over 95% extraction precision and recall on both datasets.

Deploying Hash Tables on Die-Stacked High Bandwidth Memory

Die-stacked High Bandwidth Memory (HBM) is an emerging memory architecture that achieves much higher memory bandwidth with similar or lower memory access latency and smaller capacity, compared with main memories. Memory-intensive database algorithms may potentially benefit from these new features. Due to the small capacity of such die-stacked HBM, a hybrid memory architecture comprising both main memories and HBMs is promising for main-memory databases. As a starting point, we study a key data structure, hash tables, in such a hybrid memory architecture. In a large hash table distributed among multiple NUMA (non-uniform memory access) nodes and accessed by multiple CPU sockets, the data placement and memory access scheduling for workload balance are challenging because the random memory accesses involved are difficult to predict. In this work, we propose a deployment algorithm that first estimates the memory access cost and then places data in a way that exploits the hybrid memory architecture in a balanced manner. Evaluation results show that the proposed deployment is able to achieve up to three times performance improvement over the state-of-the-art NUMA-aware scheduling algorithms for hash joins in relational databases on present and simulated future hybrid memory architectures.

SESSION: Long - Domain Adaptation and Transfer Learning

Partially Shared Adversarial Learning For Semi-supervised Multi-platform User Identity Linkage

With the increasing popularity and diversity of social media, users tend to join multiple social platforms to enjoy different types of services. User identity linkage, which aims to link identical identities across different social platforms, has attracted increasing research attention recently. Existing methods usually focus on pairwise identity linkage between two platforms, which cannot piece together the information from multiple sources to depict the intrinsic figures of social users. In this paper, we propose a novel adversarial learning based framework, MSUIL, with partially shared generators to perform Semi-supervised User Identity Linkage across Multiple social networks. The isomorphism across multiple platforms is captured as complementary information for linking identities. The insight is that we aim to learn the desirable projection functions (generators) to not only minimize the distance between the distributions of user identities in arbitrary pairs of platforms, but also incorporate the available annotations as the learning guidance. The projection functions of different platform pairs share partial parameters, which ensures MSUIL can capture the interdependencies among multiple platforms and improves the model efficiency. Empirically, we evaluate our proposal over multiple datasets. The experimental results demonstrate the superiority of the proposed MSUIL model.

Adversarial Domain Adaptation with Semantic Consistency for Cross-Domain Image Classification

In the cross-domain image classification scenario, domain adaptation aims to address the challenge of transferring the knowledge obtained from the source domain to a target domain that is regarded as similar to but different from the source domain. To get more reliable domain invariant representations, recent methods start to consider class-level distribution alignment across the source and target domains by adaptively assigning pseudo target labels. However, these approaches are vulnerable to error accumulation and hence unable to preserve cross-domain category consistency, because the accuracy of pseudo labels cannot be guaranteed explicitly. In this paper, we propose the Adversarial Domain Adaptation with Semantic Consistency (ADASC) model to align the discriminative features across domains progressively and effectively, via exploiting the class-level relations between domains. Specifically, to simultaneously alleviate the negative influence of the false pseudo-target labels and get the discriminative domain invariant features, we introduce an Adaptive Centroid Alignment (ACA) strategy and a Class Discriminative Constraint (CDC) step to complement each other iteratively and alternatively in an end-to-end framework. Extensive experiments are conducted on several unsupervised domain adaptation datasets, and the results show that ADASC outperforms the state-of-the-art methods.

ATL: Autonomous Knowledge Transfer from Many Streaming Processes

Transferring knowledge across many streaming processes remains an uncharted territory in the existing literature and features unique characteristics: no labelled instances of the target domain, covariate shift between the source and target domains, and different periods of drift in the source and target domains. Autonomous transfer learning (ATL) is proposed in this paper as a flexible deep learning approach for the online unsupervised transfer learning problem across many streaming processes. ATL offers an online domain adaptation strategy via the generative and discriminative phases coupled with the KL divergence based optimization strategy to produce a domain invariant network while putting forward an elastic network structure. It automatically evolves its network structure from scratch with/without the presence of ground truth to overcome independent concept drifts in the source and target domain. Rigorous numerical evaluation has been conducted along with comparison against recently published works. ATL demonstrates improved performance while showing significantly faster training speed than its counterparts.

Knowledge Transfer based on Multiple Manifolds Assumption

Unsupervised domain adaptation is a popular but challenging problem setting. Existing unsupervised domain adaptation methods are based on the single manifold assumption, i.e., data are sampled from a single low-dimensional manifold, and thus may not capture well the complex characteristics of real-world data. In this paper, we propose to transfer knowledge across domains under the multiple manifolds assumption, which assumes the data are sampled from multiple low-dimensional manifolds. Specifically, we develop a multiple manifolds information transfer framework (MMIT). The proposed MMIT aims to transfer the multiple manifolds information, which is represented by the data manifold neighborhood structure, with the best adaptation capacity. To do so, we propose to couple the multiple manifolds information transfer with the domain distribution discrepancy minimization in the adaptation procedure. Experimental studies demonstrate that MMIT achieves promising adaptation performance on various real-world adaptation tasks.

Cross-domain Aspect Category Transfer and Detection via Traceable Heterogeneous Graph Representation Learning

Aspect category detection is an essential task for sentiment analysis and opinion mining. However, the cost of categorical data labeling, e.g., labeling the review aspect information for a large number of product domains, can be inevitable but unaffordable. In this study, we propose a novel problem, cross-domain aspect category transfer and detection, which faces three challenges: various feature spaces, different data distributions, and diverse output spaces. To address these problems, we propose an innovative solution, Traceable Heterogeneous Graph Representation Learning (THGRL). Unlike prior text-based aspect detection works, THGRL explores latent domain aspect category connections via massive user behavior information on a heterogeneous graph. Moreover, an innovative latent variable "Walker Tracer" is introduced to characterize the global semantic/aspect dependencies and capture the informative vertexes on the random walk paths. By using THGRL, we project different domains' feature spaces into a common one, while allowing data distributions and output spaces to remain different. Experimental results show that the proposed method outperforms a series of state-of-the-art baseline models.

SESSION: Long - E-Commerce and Advertising I

A Deep Neural Framework for Sales Forecasting in E-Commerce

Product sales forecasting plays a fundamental role in enhancing timeliness of product delivery in E-Commerce. Among many heterogeneous features relevant to sales forecasting, promotion campaigns held in E-Commerce and competing relations between substitutable products greatly complicate the matter. Unfortunately, these factors are usually overlooked in the existing literature, since the conventional time series analysis based techniques mainly consider the sales records alone. In this paper, we propose a novel deep neural framework for sales forecasting in E-Commerce, named DSF. In DSF, sales forecasting is formulated as a sequence-to-sequence learning problem where sales are estimated in a recurrent fashion. On top of the decoder, we introduce a sales residual network to explicitly model the impact of competing relations when a promotion campaign is launched for a target item or some substitutable counterparts. Extensive experiments are conducted over two real-world datasets in different domains from the Taobao E-Commerce platform. Our results demonstrate that the proposed DSF obtains substantial performance gain over the traditional baselines and up-to-date deep learning alternatives in terms of forecasting accuracy. Further comparison shows that DSF has also surpassed the deep learning based solution currently deployed on the Taobao platform.

An Active and Deep Semantic Matching Framework for Query Rewrite in E-Commercial Search Engine

In order to make the query retrieve more related products, some query rewrite methods have been proposed to obtain a set of candidate queries which can infer users' search intents and reduce the vocabulary gap between the original query and the titles of related products. However, previous studies ignore that some candidate queries may change users' search intents and retrieve irrelevant products. As a result, users' search experience will be impacted significantly. To reduce this influence, we need to design a semantic matching model to determine whether the candidate query changes the original query's search intents (semantics). In addition, building a semantic matching model faces the following challenges: 1) Queries are usually very short and have limited information. It is very hard to learn an effective semantic matching model with the textual information of queries and candidate queries. 2) In order to get a generalized and effective model, sufficient data samples are required to train the model. However, the cost of labeling is very high. In order to address the above challenges, we propose an active and deep semantic matching framework (ActiveMatch) which is composed of two components. One component is the deep semantic matching (DSM) model which can make full use of the search log information to enhance the representation of queries and candidate queries. Then, it can estimate the semantic similarity between the original query and the candidate query more accurately. The other component is an uncertainty and novelty sampling (UNS) strategy which selects the samples to label based on the model's estimation difficulty and the probability of the occurrence of new words. It not only reduces the cost of labeling but also ensures the effectiveness of the model. The experimental results on the Taobao e-commercial search platform verify the effectiveness of our framework.

AIBox: CTR Prediction Model Training on a Single Node

As one of the major search engines in the world, Baidu's Sponsored Search has long adopted the use of deep neural network (DNN) models for Ads click-through rate (CTR) predictions, as early as 2013. The input features used by Baidu's online advertising system (a.k.a. "Phoenix Nest") are extremely high-dimensional (e.g., hundreds or even thousands of billions of features) and also extremely sparse. The size of the CTR models used by Baidu's production system can well exceed 10TB. This imposes tremendous challenges for training, updating, and using such models in production. For Baidu's Ads system, it is obviously important to keep the model training process highly efficient so that engineers (and researchers) are able to quickly refine and test their new models or new features. Moreover, as billions of user ads click history entries are arriving every day, the models have to be re-trained rapidly because CTR prediction is an extremely time-sensitive task. Baidu's current CTR models are trained on MPI (Message Passing Interface) clusters, which require high fault tolerance and synchronization that incur expensive communication and computation costs. And, of course, the maintenance costs for clusters are also substantial. This paper presents AIBox, a centralized system to train CTR models with tens-of-terabytes-scale parameters by employing solid-state drives (SSDs) and GPUs. Due to the memory limitation on GPUs, we carefully partition the CTR model into two parts: one is suitable for CPUs and another for GPUs. We further introduce a bi-level cache management system over SSDs to store the 10TB parameters while providing low-latency accesses. Extensive experiments on production data reveal the effectiveness of the new system. AIBox has comparable training performance with a large MPI cluster, while requiring only a small fraction of the cost for the cluster.

Improving Ad Click Prediction by Considering Non-displayed Events

Click-through rate (CTR) prediction is the core problem of building advertising systems. Most existing state-of-the-art approaches model CTR prediction as a binary classification problem, where displayed events with and without click feedback are respectively considered as positive and negative instances for training and offline validation. However, due to the selection mechanism applied in most advertising systems, a selection bias exists between distributions of displayed and non-displayed events. Conventional CTR models ignoring the bias may have inaccurate predictions and cause a loss of revenue. To alleviate the bias, we need to conduct counterfactual learning by considering not only displayed events but also non-displayed events. In this paper, through a review of existing approaches of counterfactual learning, we point out some difficulties in applying these approaches to CTR prediction in a real-world advertising system. To overcome these difficulties, we propose a novel framework for counterfactual CTR prediction. In experiments, we compare our proposed framework against state-of-the-art conventional CTR models and existing counterfactual learning approaches. Experimental results show significant improvements.

Approximation Algorithms for Coordinating Ad Campaigns on Social Networks

We study a natural model of coordinated social ad campaigns over a social network, based on models of Datta et al. and Aslay et al. Multiple advertisers are willing to pay the host - up to a known budget - per user exposure, whether that exposure is sponsored or organic (i.e., shared by a friend). Campaigns are seeded with sponsored ads to some users, but no network user must be exposed to too many sponsored ads. As a result, while ad campaigns proceed independently over the network, they need to be carefully coordinated with respect to their seed sets. We study the objective of maximizing the network's total ad revenue. Our main result is to show that under a broad class of social influence models, the problem can be reduced to maximizing a submodular function subject to two matroid constraints; it can therefore be approximated within a factor essentially 1/2 in polynomial time. When there is no bound on the individual seed set sizes of advertisers, the constraints correspond only to a single matroid, and the guarantee can be improved to 1 - 1/e; in that case, a factor 1/2 is achieved by a practical greedy algorithm. The 1 - 1/e approximation algorithm for the matroid-constrained problem is far from practical; however, we show that specifically under the Independent Cascade model, LP rounding and Reverse Reachability techniques can be combined to obtain a 1 - 1/e approximation algorithm which scales to several tens of thousands of nodes. Our theoretical results are complemented by experiments evaluating the extent to which the coordination of multiple ad campaigns inhibits the revenue obtained from each individual campaign, as a function of the similarity of the influence networks and the strength of ties in the network. Our experiments suggest that as networks for different advertisers become less similar, the harmful effect of competition decreases. With respect to tie strengths, we show that the most harm is done in an intermediate range.
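The greedy algorithm mentioned for the single-matroid case can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: elements are (advertiser, user) seed pairs, the matroid is a partition matroid capping sponsored ads per user, and the revenue function uses fixed exposure sets with a per-advertiser budget rather than a simulated influence cascade.

```python
def greedy_partition_matroid(elements, part_of, capacity, value):
    """Greedy selection under a partition matroid: at most `capacity` chosen
    elements per part. For a monotone submodular `value`, this kind of greedy
    is known to give a 1/2 approximation."""
    chosen, used, remaining = [], {}, set(elements)
    while True:
        base = value(chosen)
        best, best_gain = None, 0
        for e in sorted(remaining):  # sorted for deterministic tie-breaking
            if used.get(part_of(e), 0) >= capacity:
                continue  # adding e would violate the per-part cap
            gain = value(chosen + [e]) - base
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            return chosen
        chosen.append(best)
        used[part_of(best)] = used.get(part_of(best), 0) + 1
        remaining.remove(best)


# Hypothetical exposure sets: users reached if `advertiser` seeds `user`.
reach = {
    ("a", "u1"): {1, 2, 3},
    ("a", "u2"): {3, 4},
    ("b", "u1"): {1, 2},
    ("b", "u2"): {5, 6},
}
budget = {"a": 10, "b": 10}  # assumed: one unit paid per exposed user, up to budget


def revenue(chosen):
    total = 0
    for adv, cap in budget.items():
        exposed = set()
        for a, u in chosen:
            if a == adv:
                exposed |= reach[(a, u)]
        total += min(cap, len(exposed))
    return total


# Each user may receive at most one sponsored ad in this toy instance.
seeds = greedy_partition_matroid(reach.keys(), lambda e: e[1], 1, revenue)
```

On this toy instance the greedy choice seeds advertiser "a" at user u1 and advertiser "b" at user u2, which happens to be optimal here; in general only the 1/2 guarantee applies.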

SESSION: Long - E-Commerce and Advertising II

Regularized Adversarial Sampling and Deep Time-aware Attention for Click-Through Rate Prediction

Improving the performance of click-through rate (CTR) prediction remains one of the core tasks in online advertising systems. With the rise of deep learning, CTR prediction models with deep networks remarkably enhance model capacities. In deep CTR models, exploiting users' historical data is essential for learning users' behaviors and interests. As existing CTR prediction works neglect the importance of the temporal signals when embedding users' historical clicking records, we propose a time-aware attention model which explicitly uses absolute temporal signals for expressing the users' periodic behaviors and relative temporal signals for expressing the temporal relation between items. Besides, we propose a regularized adversarial sampling strategy for negative sampling which eases the classification imbalance of CTR data and can make use of the strong guidance provided by the observed negative CTR samples. The adversarial sampling strategy significantly improves the training efficiency, and can be co-trained with the time-aware attention model seamlessly. Experiments are conducted on real-world CTR datasets from both in-station and out-station advertising places.

Conversational Product Search Based on Negative Feedback

Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. For example, previous work proposed to ask users the exact characteristics of their ideal items before showing results. However, users may not have clear ideas about what an ideal item looks like, especially when they have not seen any item. So it is more feasible to facilitate the conversational search by showing example items and asking for feedback instead. In addition, when the users provide negative feedback for the presented items, it is easier to collect their detailed feedback on certain properties (aspect-value pairs) of the non-relevant items. By breaking down the item-level negative feedback to fine-grained feedback on aspect-value pairs, more information is available to help clarify users' intents. So in this paper, we propose a conversational paradigm for product search driven by non-relevant items, based on which fine-grained feedback is collected and utilized to show better results in the next iteration. We then propose an aspect-value likelihood model to incorporate both positive and negative feedback on fine-grained aspect-value pairs of the non-relevant items. Experimental results show that our model is significantly better than state-of-the-art product search baselines without using feedback and those baselines using item-level negative feedback.

Learning to Ask: Question-based Sequential Bayesian Product Search

Product search is generally recognized as the first and foremost stage of online shopping and thus significant for users and retailers of e-commerce. Most of the traditional retrieval methods use some similarity functions to match the user's query and the document that describes a product, either directly or in a latent vector space. However, user queries are often too general to capture the minute details of the specific product that a user is looking for. In this paper, we propose a novel interactive method to effectively locate the best matching product. The method is based on the assumption that there is a set of candidate questions for each product to be asked. In this work, we instantiate this candidate set by making the hypothesis that products can be discriminated by the entities that appear in the documents associated with them. We propose a Question-based Sequential Bayesian Product Search method, QSBPS, which directly queries users on the expected presence of entities in the relevant product documents. The method learns the product relevance as well as the reward of the potential questions to be asked to the user by being trained on the search history and purchase behavior of a specific user together with that of other users. The experimental results show that the proposed method can greatly improve the performance of product search compared to the state-of-the-art baselines.

A Zero Attention Model for Personalized Product Search

Product search is one of the most popular methods for people to discover and purchase products on e-commerce websites. Because personal preferences often have an important influence on the purchase decision of each customer, it is intuitive that personalization should be beneficial for product search engines. While synthetic experiments from previous studies show that purchase histories are useful for identifying the individual intent of each product search session, the effect of personalization on product search in practice, however, remains mostly unknown. In this paper, we formulate the problem of personalized product search and conduct large-scale experiments with search logs sampled from a commercial e-commerce search engine. Results from our preliminary analysis show that the potential of personalization depends on query characteristics, interactions between queries, and user purchase histories. Based on these observations, we propose a Zero Attention Model for product search that automatically determines when and how to personalize a user-query pair via a novel attention mechanism. Empirical results on commercial product search logs show that the proposed model not only significantly outperforms state-of-the-art personalized product retrieval models, but also provides important information on the potential of personalization in each product search session.

Learning to Generate Personalized Product Descriptions

Personalization plays a key role in electronic commerce, adjusting the products presented to users through search and recommendations according to their personality and tastes. Current personalization efforts focus on the adaptation of product selections, while the description of a given product remains the same regardless of the user who views it. In this work, we propose an approach to personalize product descriptions according to the personality of an individual user. To the best of our knowledge, we are the first to address the problem of generating personalized product descriptions. We first learn to predict a user's personality based on past activity on an e-commerce website. Then, given a user personality, we propose an extractive summarization-based algorithm that selects the sentences to be used as part of a product description in accordance with the given personality. Our evaluation shows that user personality can be effectively learned from past e-commerce activity, while personalized descriptions can lead to a higher interest in the product and increased purchase likelihood.

SESSION: Long - Network Embedding I

Fast and Accurate Network Embeddings via Very Sparse Random Projection

We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. The source code is available at
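The iterative computation described above can be sketched as follows, under stated assumptions: the similarity matrix is taken to be a weighted sum of powers of a row-normalized transition matrix, the random projection has entries of plus or minus sqrt(s) with probability 1/(2s) each and zero otherwise, and all function names and weights are hypothetical. It is not the authors' implementation and omits their degree-based normalization details.

```python
import math
import random


def sparse_random_matrix(n, dim, s, rng):
    """Very sparse random projection: entries are +sqrt(s) or -sqrt(s) with
    probability 1/(2s) each, and 0 with probability 1 - 1/s."""
    scale = math.sqrt(s)
    matrix = []
    for _ in range(n):
        row = []
        for _ in range(dim):
            u = rng.random()
            row.append(scale if u < 1 / (2 * s) else -scale if u < 1 / s else 0.0)
        matrix.append(row)
    return matrix


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_normalize(adj):
    """Divide each row by its sum (node degree); isolated nodes keep zero rows."""
    return [[v / (sum(row) or 1.0) for v in row] for row in adj]


def iterative_embedding(adj, projection, weights):
    """Compute sum_k weights[k] * T^(k+1) @ projection without ever forming
    the powers of T (the similarity matrix) explicitly."""
    trans = row_normalize(adj)
    current = projection
    emb = [[0.0] * len(projection[0]) for _ in adj]
    for w in weights:
        current = matmul(trans, current)  # one more hop, cost grows with edges
        emb = [[e + w * c for e, c in zip(er, cr)] for er, cr in zip(emb, current)]
    return emb


adj = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
R = sparse_random_matrix(4, 3, 3.0, random.Random(0))
emb = iterative_embedding(adj, R, weights=(1.0, 0.5))
```

Because each step multiplies the transition matrix by a thin matrix, the result equals projecting the explicit similarity matrix while avoiding its construction.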

Hierarchical Community Structure Preserving Network Embedding: A Subspace Approach

To depict ubiquitous relational data in the real world, network data have been widely applied in modeling complex relationships. Projecting vertices to low dimensional spaces, referred to as Network Embedding, is thus applicable to diverse real-world predictive tasks. While numerous works exploit pairwise proximities, another characteristic of real networks, the clustering property (namely, vertices are inclined to form communities of various ranges and hence form a hierarchy consisting of communities), has barely received attention from researchers. In this paper, we propose our network embedding framework, abbreviated SpaceNE, which preserves hierarchies formed by communities through subspaces, which are manifolds with flexible dimensionalities and are inherently hierarchical. Moreover, we propose that subspaces are able to address further problems in representing hierarchical communities, including sparsity and space warps. Last but not least, we propose constraints on dimensions of subspaces to denoise, which are further approximated by differentiable functions such that joint optimization is enabled, along with a layer-wise scheme to alleviate the overhead caused by the vast number of parameters. We conduct various experiments with results demonstrating our model's effectiveness in addressing community hierarchies.

Collective Link Prediction Oriented Network Embedding with Hierarchical Graph Attention

To enjoy more social network services, users nowadays are usually involved in multiple online sites at the same time. Aligned social networks provide more information to alleviate the problem of data insufficiency. In this paper, we target the collective link prediction problem and aim to predict both the intra-network social links and the inter-network anchor links across multiple aligned social networks. It is not an easy task, and the major challenges involve the network characteristic difference problem and different directivity properties of the social and anchor links to be predicted. To address the problem, we propose an application oriented network embedding framework, Hierarchical Graph Attention based Network Embedding (HGANE), for collective link prediction over directed aligned networks. Very different from the conventional general network embedding models, HGANE effectively takes the collective link prediction task objectives into consideration. It learns the representations of nodes by aggregating information from both the intra-network neighbors (connected by social links) and inter-network partners (connected by anchor links). What's more, we introduce a hierarchical graph attention mechanism for the intra-network neighbors and inter-network partners respectively, which resolves the network characteristic differences and the link directivity challenges effectively. Extensive experiments have been conducted on real-world aligned network datasets to demonstrate that our model outperforms the state-of-the-art baseline methods in addressing the collective link prediction problem by a large margin.

Discerning Edge Influence for Network Embedding

Network embedding, which learns the low-dimensional representations of nodes, has gained significant research attention. Despite its superior empirical success, often measured by the prediction performance of downstream tasks (e.g., multi-label classification), it is unclear why a given embedding algorithm outputs the specific node representations, and how the resulting node representations relate to the structure of the input network. In this paper, we propose to discern the edge influence as the first step towards understanding skip-gram based network embedding methods. For this purpose, we propose an auditing framework, Near, whose key part includes two algorithms (Near-add and Near-del) to effectively and efficiently quantify the influence of each edge. Based on the algorithms, we further identify high-influence edges by exploiting the linkage between edge influence and the network structure. Experimental results demonstrate that the proposed algorithms (Near-add and Near-del) are significantly faster (up to $2,000\times$) than straightforward methods with little quality loss. Moreover, the proposed framework can efficiently identify the most influential edges for network embedding in the context of downstream prediction tasks and adversarial attacks.

Constrained Co-embedding Model for User Profiling in Question Answering Communities

In this paper, we study the problem of user profiling in question answering communities. We address the problem by proposing a constrained co-embedding model (CCEM). CCEM jointly infers the embeddings of both users and words in question answering communities such that the similarities between users and words can be semantically measured. Our CCEM works with constraints which enforce that the inferred embeddings of users and words are subject to this criterion: given a question in the community, embeddings of users whose answers receive more votes are closer to the embeddings of the words occurring in these answers, compared to the embeddings of those whose answers receive fewer votes. Experiments on a Chinese dataset, the Zhihu dataset, demonstrate that our proposed co-embedding algorithm outperforms state-of-the-art methods in the task of user profiling.

SESSION: Long - Network Embedding II

Hyper-Path-Based Representation Learning for Hyper-Networks

Network representation learning has aroused widespread interest in recent years. While most of the existing methods deal with edges as pairwise relationships, only a few studies have been proposed for hyper-networks to capture more complicated tuplewise relationships among multiple nodes. A hyper-network is a network where each edge, called a hyperedge, connects an arbitrary number of nodes. Different from conventional networks, hyper-networks have certain degrees of indecomposability such that the nodes in a subset of a hyperedge may not possess a strong relationship. That is the main reason why traditional algorithms fail in learning representations in hyper-networks by simply decomposing hyperedges into pairwise relationships. In this paper, we first define a metric to depict the degrees of indecomposability for hyper-networks. Then we propose a new concept called hyper-path and design hyper-path-based random walks to preserve the structural information of hyper-networks according to the analysis of the indecomposability. Then a carefully designed algorithm, Hyper-gram, utilizes these random walks to capture both pairwise relationships and tuplewise relationships in the whole hyper-network. Finally, we conduct extensive experiments on several real-world datasets covering the tasks of link prediction and hyper-network reconstruction, and results demonstrate the rationality, validity, and effectiveness of our methods compared with existing state-of-the-art models designed for conventional networks or hyper-networks.

Multi-Hot Compact Network Embedding

Network embedding, as a promising approach to network representation learning, is capable of supporting various subsequent network mining and analysis tasks, and has attracted growing research interest recently. Traditional approaches assign each node an independent continuous vector, which causes memory overhead for large networks. In this paper we propose a novel multi-hot compact network embedding framework to effectively reduce memory cost by learning partially shared embeddings. The insight is that a node embedding vector is composed of several basis vectors according to a multi-hot index vector. The basis vectors are shared by different nodes, which can significantly reduce the number of continuous vectors while maintaining similar data representation ability. Specifically, we propose an MCNE$_p$ model to learn compact embeddings from pre-learned node features. A novel component named compressor is integrated into MCNE$_p$ to tackle the challenge that popular back-propagation optimization cannot propagate loss through discrete samples. We further propose an end-to-end model, MCNE$_t$, to learn compact embeddings from the input network directly. Empirically, we evaluate the proposed models over four real network datasets, and the results demonstrate that our proposals can save about 90% of the memory cost of network embeddings without significant performance decline.
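
As a minimal sketch of the idea that a node embedding is composed from shared basis vectors selected by a multi-hot index vector, the following Python snippet sums the selected basis vectors. The use of a plain sum, the function name `compose_embedding`, and the toy values are assumptions made for illustration; the abstract does not specify how MCNE combines the selected vectors.

```python
from typing import Dict, List


def compose_embedding(basis: List[List[float]], index: List[int]) -> List[float]:
    """Combine the basis vectors flagged by a multi-hot index (assumed: plain sum)."""
    if len(index) != len(basis):
        raise ValueError("index must have one entry per basis vector")
    dim = len(basis[0])
    out = [0.0] * dim
    for flag, vec in zip(index, basis):
        if flag:  # a 1 in the multi-hot index selects this shared basis vector
            for d in range(dim):
                out[d] += vec[d]
    return out


# Three shared basis vectors (hypothetical values) serve every node.
basis = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Each node stores only a short binary index instead of its own dense vector.
node_index: Dict[str, List[int]] = {"a": [1, 0, 1], "b": [0, 1, 1]}

embeddings = {node: compose_embedding(basis, idx) for node, idx in node_index.items()}
```

Under this assumption, memory grows with the number of basis vectors plus one small binary index per node, rather than with one dense vector per node.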

Temporal Network Embedding with Micro- and Macro-dynamics

Network embedding aims to embed nodes into a low-dimensional space, while capturing the network structures and properties. Although quite a few promising network embedding methods have been proposed, most of them focus on static networks. In fact, temporal networks, which usually evolve over time in terms of microscopic and macroscopic dynamics, are ubiquitous. The micro-dynamics describe the formation process of network structures in a detailed manner, while the macro-dynamics refer to the evolution pattern of the network scale. Both micro- and macro-dynamics are key factors in network evolution; however, how to elegantly capture both of them for temporal network embedding, especially macro-dynamics, has not yet been well studied. In this paper, we propose a novel temporal network embedding method with micro- and macro-dynamics, named $\mathrm{M^2DNE}$. Specifically, for micro-dynamics, we regard the establishment of edges as the occurrence of chronological events and propose a temporal attention point process to capture the formation process of network structures in a fine-grained manner. For macro-dynamics, we define a general dynamics equation parameterized with network embeddings to capture the inherent evolution pattern and impose constraints at a higher structural level on network embeddings. Mutual evolutions of micro- and macro-dynamics in a temporal network alternately affect the process of learning node embeddings. Extensive experiments on three real-world temporal networks demonstrate that $\mathrm{M^2DNE}$ significantly outperforms state-of-the-art methods not only in traditional tasks, e.g., network reconstruction, but also in temporal tendency-related tasks, e.g., scale prediction.

MrMine: Multi-resolution Multi-network Embedding

Network embedding has become the cornerstone of a variety of mining tasks, such as classification, link prediction, clustering, anomaly detection and many more, thanks to its superior ability to encode the intrinsic network characteristics in a compact low-dimensional space. Most of the existing methods focus on a single network and/or a single resolution, and generate embeddings of different network objects (node/subgraph/network) from different networks separately. A fundamental limitation of such methods is that the intrinsic relationships across different networks (e.g., two networks share the same or similar subgraphs) and across different resolutions (e.g., the node-subgraph membership) are ignored, resulting in disparate embeddings. Consequently, such methods deliver sub-optimal performance or even become inapplicable for some downstream mining tasks (e.g., role classification, network alignment, etc.). In this paper, we propose a unified framework, MrMine, to learn the representations of objects from multiple networks at three complementary resolutions (i.e., network, subgraph and node) simultaneously. The key idea is to construct the cross-resolution cross-network context for each object. The proposed method bears two distinctive features. First, it enables and/or boosts various multi-network downstream mining tasks by having embeddings at different resolutions from different networks in the same embedding space. Second, our method is efficient and scalable, with an O(n log(n)) time complexity for the base algorithm and a linear time complexity w.r.t. the number of nodes and edges of input networks for the accelerated version. Extensive experiments on real-world data show that our methods (1) are able to enable and enhance a variety of multi-network mining tasks, and (2) scale up to million-node networks.

Task-Guided Pair Embedding in Heterogeneous Network

Many real-world tasks solved by heterogeneous network embedding methods can be cast as modeling the likelihood of a pairwise relationship between two nodes. For example, the goal of the author identification task is to model the likelihood of a paper being written by an author (paper-author pairwise relationship). Existing task-guided embedding methods are node-centric in that they simply measure the similarity between the node embeddings to compute the likelihood of a pairwise relationship between two nodes. However, we claim that for task-guided embeddings, it is crucial to focus on directly modeling the pairwise relationship. In this paper, we propose a novel task-guided pair embedding framework for heterogeneous networks, called TaPEm, that directly models the relationship between a pair of nodes that are related to a specific task (e.g., the paper-author relationship in author identification). To this end, we 1) propose to learn a pair embedding under the guidance of its associated context path, i.e., a sequence of nodes between the pair, and 2) devise a pair validity classifier to distinguish whether the pair is valid with respect to the specific task at hand. By introducing pair embeddings that capture the semantics behind the pairwise relationships, we are able to learn the fine-grained pairwise relationship between two nodes, which is paramount for task-guided embedding methods. Extensive experiments on the author identification task demonstrate that TaPEm outperforms the state-of-the-art methods, especially for authors with few publication records.

SESSION: Long - Graph Neural Network I

Graph Convolutional Networks with Motif-based Attention

The success of deep convolutional neural networks in the domains of computer vision and speech recognition has led researchers to investigate generalizations of the said architecture to graph-structured data. A recently-proposed method called Graph Convolutional Networks has been able to achieve state-of-the-art results in the task of node classification. However, since the proposed method relies on localized first-order approximations of spectral graph convolutions, it is unable to capture higher-order interactions between nodes in the graph. In this work, we propose a motif-based graph attention model, called Motif Convolutional Networks, which generalizes past approaches by using weighted multi-hop motif adjacency matrices to capture higher-order neighborhoods. A novel attention mechanism is used to allow each individual node to select the most relevant neighborhood to apply its filter. We evaluate our approach on graphs from different domains (social networks and bioinformatics) with results showing that it is able to outperform a set of competitive baselines on the semi-supervised node classification task. Additional results demonstrate the usefulness of attention, showing that different higher-order neighborhoods are prioritized by different kinds of nodes.

Long-tail Hashtag Recommendation for Micro-videos with Graph Convolutional Network

The hashtags that a user attaches to a micro-video are the ones that, in his/her mind, best describe the semantics of the micro-video's content. At the same time, hashtags have been widely used to facilitate various micro-video retrieval scenarios (e.g., search, browse, and categorization). Despite their importance, numerous micro-videos lack hashtags or contain inaccurate or incomplete hashtags. In light of this, hashtag recommendation, which suggests a list of hashtags to a user when he/she wants to annotate a post, becomes a crucial research problem. However, little attention has been paid to micro-video hashtag recommendation, mainly due to the following three reasons: 1) the lack of a benchmark dataset; 2) the temporal and multi-modality characteristics of micro-videos; and 3) hashtag sparsity and long-tail distributions. In this paper, we recommend hashtags for micro-videos by presenting a novel multi-view representation interactive embedding model with graph-based information propagation. It is capable of boosting the performance of micro-video hashtag recommendation by jointly considering the sequential feature learning, the video-user-hashtag interaction, and the hashtag correlations. Extensive experiments on a constructed dataset demonstrate that our proposed method outperforms state-of-the-art baselines. As a side research contribution, we have released our dataset and code to facilitate research in this community.

Hashing Graph Convolution for Node Classification

Convolution on graphs has aroused great interest in AI due to its potential applications to non-gridded data. To bypass the influence of ordering and different node degrees, summation or average aggregation is often imposed on the local receptive field in most prior works. However, collapsing the neighborhood into one node in this way tends to entangle the signals of different nodes, which results in sub-optimal features and decreases the discriminability of nodes. To address this problem, in this paper, we propose a simple but effective Hashing Graph Convolution (HGC) method that uses global hashing and local projection in node aggregation for the task of node classification. In contrast to the conventional aggregation with a full collision, the hash-projection can greatly reduce the collision probability when gathering neighbor nodes. Another incidental effect of hash-projection is that the receptive field of each node is normalized into a common-size bucket space, which not only staves off the trouble of different-size neighborhoods and their order but also makes a graph convolution run like the standard shape-gridded convolution. In addition, considering the small number of training samples, we introduce a prediction-consistent regularization term into HGC to constrain the score consistency of unlabeled nodes in the graph. HGC is evaluated in both transductive and inductive experimental settings and achieves new state-of-the-art results on all datasets for the node classification task. The extensive experiments demonstrate the effectiveness of hash-projection.

Relation-Aware Graph Convolutional Networks for Agent-Initiated Social E-Commerce Recommendation

Recent years have witnessed the phenomenal success of agent-initiated social e-commerce models, which encourage users to become selling agents to promote items through their social connections. The complex interactions in this type of social e-commerce can be formulated as Heterogeneous Information Networks (HIN), where there are numerous types of relations between three types of nodes, i.e., users, selling agents and items. Learning high-quality node embeddings is of key interest, and Graph Convolutional Networks (GCNs) have recently been established as the latest state-of-the-art methods in representation learning. However, prior GCN models have fundamental limitations in both modeling heterogeneous relations and efficiently sampling relevant receptive fields from a vast neighborhood. To address these problems, we propose RecoGCN, which stands for a RElation-aware CO-attentive GCN model, to effectively aggregate heterogeneous features in a HIN. It makes up for current GCNs' limitation in modeling heterogeneous relations with a relation-aware aggregator, and leverages semantic-aware meta-paths to carve out concise and relevant receptive fields for each node. To effectively fuse the embeddings learned from different meta-paths, we further develop a co-attentive mechanism to dynamically assign importance weights to different meta-paths by attending to the three-way interactions among users, selling agents and items. Extensive experiments on a real-world dataset demonstrate that RecoGCN is able to learn meaningful node embeddings in HIN, and consistently outperforms baseline methods in recommendation tasks.

Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction

Click-through rate (CTR) prediction is an essential task in web applications such as online advertising and recommender systems, whose features are usually in multi-field form. The key to this task is to model feature interactions among different feature fields. Recently proposed deep learning based models follow a general paradigm: raw sparse input multi-field features are first mapped into dense field embedding vectors, and then simply concatenated together to feed into deep neural networks (DNN) or other specifically designed networks to learn high-order feature interactions. However, the simple unstructured combination of feature fields will inevitably limit the capability to model sophisticated interactions among different fields in a sufficiently flexible and explicit fashion. In this work, we propose to represent the multi-field features intuitively in a graph structure, where each node corresponds to a feature field and different fields can interact through edges. The task of modeling feature interactions can thus be converted to modeling node interactions on the corresponding graph. To this end, we design a novel model, Feature Interaction Graph Neural Networks (Fi-GNN). Taking advantage of the strong representative power of graphs, our proposed model can not only model sophisticated feature interactions in a flexible and explicit fashion, but also provide good model explanations for CTR prediction. Experimental results on two real-world datasets show its superiority over state-of-the-art methods.

SESSION: Long - Graph Neural Network II

Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework

Online underground forums have been widely used by cybercriminals to exchange knowledge and trade in illicit products or services, which have played a central role in the cybercriminal ecosystem. In order to combat the evolving cybercrimes, in this paper, we propose and develop an intelligent system named iDetective to automate the analysis of underground forums for the identification of key players (i.e., users who play the vital role in the value chain). In iDetective, we first introduce an attributed heterogeneous information network (AHIN) for user representation and use a meta-path based approach to incorporate higher-level semantics to build up relatedness over users in underground forums; then we propose Player2Vec to efficiently learn node (i.e., user) representations in AHIN for key player identification. In Player2Vec, we first map the constructed AHIN to a multi-view network which consists of multiple single-view attributed graphs encoding the relatedness over users depicted by different designed meta-paths; then we employ graph convolutional network (GCN) to learn embeddings of each single-view attributed graph; later, an attention mechanism is designed to fuse different embeddings learned based on different single-view attributed graphs for final representations. Comprehensive experiments on the data collections from different underground forums (i.e., Hack Forums, Nulled) are conducted to validate the effectiveness of iDetective in key player identification by comparisons with alternative approaches.

Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach

Betweenness centrality (BC) is a widely used centrality measure for network analysis, which seeks to describe the importance of nodes in a network in terms of the fraction of shortest paths that pass through them. It is key to many valuable applications, including community detection and network dismantling. Computing BC scores on large networks is computationally challenging due to its high time complexity. Many sampling-based approximation algorithms have been proposed to speed up the estimation of BC. However, these methods still need considerably long running times on large-scale networks, and their results are sensitive to even small perturbations to the networks. In this paper, we focus on the efficient identification of the top-k nodes with the highest BC in a graph, which is an essential task for many network applications. Different from previous heuristic methods, we turn this task into a learning problem and design an encoder-decoder based framework as a solution. Specifically, the encoder leverages the network structure to represent each node as an embedding vector, which captures the important structural information of the node. The decoder transforms each embedding vector into a scalar, which identifies the relative rank of a node in terms of its BC. We use a pairwise ranking loss to train the model to identify the order of nodes regarding their BC. By training on small-scale networks, the model is capable of assigning relative BC scores to nodes for much larger networks, and thus identifying the highly-ranked nodes. Experiments on both synthetic and real-world networks demonstrate that, compared to existing baselines, our model drastically speeds up the prediction without noticeable sacrifice in accuracy, and even outperforms state-of-the-art methods in terms of accuracy on several large real-world networks.
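
To illustrate how a pairwise ranking loss can train a model to order nodes by BC rather than regress exact BC values, here is a small Python sketch. The logistic form of the loss, the function name `pairwise_ranking_loss`, and the toy scores are assumptions for illustration only; the abstract does not state the exact loss used.

```python
import math
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(scores: Sequence[float], targets: Sequence[float]) -> float:
    """Mean logistic loss over node pairs whose target BC values differ.

    scores  -- scalars predicted by a decoder, one per node (hypothetical values)
    targets -- ground-truth BC values for the same nodes
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if targets[i] == targets[j]:
            continue  # ties carry no ordering information
        sign = 1.0 if targets[i] > targets[j] else -1.0
        # Positive margin means the predicted order agrees with the BC order.
        margin = sign * (scores[i] - scores[j])
        total += math.log1p(math.exp(-margin))
        count += 1
    return total / count if count else 0.0


true_bc = [0.5, 0.2, 0.0]
loss_good = pairwise_ranking_loss([3.0, 2.0, 1.0], true_bc)  # same order as true_bc
loss_bad = pairwise_ranking_loss([1.0, 2.0, 3.0], true_bc)   # reversed order
```

Because only score differences enter the loss, the predicted scalars need not match BC values in scale, which is consistent with the abstract's emphasis on relative rank.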

Multiple Rumor Source Detection with Graph Convolutional Networks

Detecting rumor sources in social networks is one of the key issues for defeating rumors automatically. Although many efforts have been devoted to defeating online rumors, most of them are based on the assumption that the underlying propagation model is known in advance. However, this assumption may be impractical on real data, since it is usually difficult to acquire the actual underlying propagation model. Some attempts use label propagation to avoid the limitation caused by the lack of prior knowledge of the underlying propagation model. Nonetheless, they still suffer from the shortcoming that the node label is simply an integer, which may restrict the prediction precision. In this paper, we propose a deep learning based model, namely GCNSI (Graph Convolutional Networks based Source Identification), to locate multiple rumor sources without prior knowledge of the underlying propagation model. By adopting spectral domain convolution, we build each node's representation from the information of its multi-order neighbors such that the prediction precision on the sources is improved. We conduct experiments on several real datasets and the results demonstrate that our model outperforms state-of-the-art models.

Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks

Predicting a user's preference in a short anonymous interaction session instead of long-term history is a challenging problem in the real-life session-based recommendation, e.g., e-commerce and media stream. Recent research of the session-based recommender system mainly focuses on sequential patterns by utilizing the attention mechanism, which is straightforward for the session's natural sequence sorted by time. However, the user's preference is much more complicated than a solely consecutive time pattern in the transition of item choices. In this paper, therefore, we study the item transition pattern by constructing a session graph and propose a novel model which collaboratively considers the sequence order and the latent order in the session graph for a session-based recommender system. We formulate the next item recommendation within the session as a graph classification problem. Specifically, we propose a weighted attention graph layer and a Readout function to learn embeddings of items and sessions for the next item recommendation. Extensive experiments have been conducted on two benchmark E-commerce datasets, Yoochoose and Diginetica, and the experimental results show that our model outperforms other state-of-the-art methods.

Gravity-Inspired Graph Autoencoders for Directed Link Prediction

Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged as powerful node embedding methods. In particular, graph AE and VAE were successfully leveraged to tackle the challenging link prediction problem, aiming at figuring out whether some pairs of nodes from a graph are connected by unobserved edges. However, these models focus on undirected graphs and therefore ignore the potential direction of the link, which is limiting for numerous real-life applications. In this paper, we extend the graph AE and VAE frameworks to address link prediction in directed graphs. We present a new gravity-inspired decoder scheme that can effectively reconstruct directed graphs from a node embedding. We empirically evaluate our method on three different directed link prediction tasks, for which standard graph AE and VAE perform poorly. We achieve competitive results on three real-world graphs, outperforming several popular baselines.
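
One way to see why a gravity-inspired decoder can represent direction is the sketch below, in which the score of a link from node i to node j depends on a per-node "mass" of the target and on the embedding distance, so the two directions can receive different scores. The specific formula (sigmoid of target mass minus a scaled log squared distance), the function name `gravity_score`, and all parameter values are assumptions chosen for illustration, not details confirmed by the abstract.

```python
import math
from typing import Sequence


def gravity_score(z_i: Sequence[float], z_j: Sequence[float], mass_j: float,
                  lam: float = 1.0, eps: float = 1e-9) -> float:
    """Assumed asymmetric link score for the directed edge i -> j.

    z_i, z_j -- node embedding vectors
    mass_j   -- learned scalar "mass" of the target node (hypothetical parameter)
    lam      -- weight of the distance term (hypothetical hyperparameter)
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    # Larger target mass raises the score; larger distance lowers it.
    logit = mass_j - lam * math.log(dist_sq + eps)
    return 1.0 / (1.0 + math.exp(-logit))


# Two nodes at the same distance from each other but with different masses.
z_u, m_u = [0.0, 0.0], 0.0
z_v, m_v = [1.0, 0.0], 2.0

p_u_to_v = gravity_score(z_u, z_v, m_v)  # attracted by the heavier node v
p_v_to_u = gravity_score(z_v, z_u, m_u)
```

A symmetric decoder such as an inner product would assign both directions the same score; here the mass term breaks that symmetry even though the distance is identical.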

SESSION: Long - Heterogeneous Data

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as the is-a relation or subclass-of relation, lies at the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness would therefore be hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human-labeled data. HyperMine extends the definition of "context" to the scenario of text-rich HINs. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired from high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show, in a case study, that a high-quality taxonomy can be generated solely based on the hypernymy discovered by HyperMine.

αCyber: Enhancing Robustness of Android Malware Detection System against Adversarial Attacks on Heterogeneous Graph based Model

The explosive growth and increasing sophistication of Android malware call for new defensive techniques that are capable of protecting mobile users against novel threats. To combat the evolving Android malware attacks, the HinDroid and AiDroid systems have demonstrated the success of heterogeneous graph (HG) based classifiers in Android malware detection; however, their success may also incentivize attackers to defeat HG based models to bypass the detection. So far, there has been no work on adversarial attack and/or defense on HG data. In this paper, we explore the robustness of HG based models in Android malware detection for the first time. In particular, based on a generic HG based classifier, (1) we first present a novel yet practical adversarial attack model (named HG-Attack) on HG data by considering Android malware attackers' current capabilities and knowledge; (2) to effectively combat the adversarial attacks on HG, we then propose a resilient yet elegant defense paradigm (named Rad-HGC) to enhance the robustness of HG based classifiers in Android malware detection. Promising experimental results based on large-scale, real sample collections from Tencent Security Lab demonstrate the effectiveness of our developed system αCyber, which integrates our proposed defense model Rad-HGC that is resilient against practical adversarial malware attacks on the HG data performed by HG-Attack.

BHIN2vec: Balancing the Type of Relation in Heterogeneous Information Network

The goal of network embedding is to transform nodes in a network into low-dimensional embedding vectors. Recently, heterogeneous networks have been shown to be effective in representing diverse information in data. However, heterogeneous network embedding suffers from the imbalance issue, i.e., the size of relation types (or the number of edges in the network of each type) is imbalanced. In this paper, we devise a new heterogeneous network embedding method, called BHIN2vec, which considers the balance among all relation types in a network. We view heterogeneous network embedding as simultaneously solving multiple tasks in which each task corresponds to one relation type in the network. After splitting the skip-gram loss into multiple losses corresponding to different tasks, we propose a novel random-walk strategy to focus on the tasks with high loss values by considering the relative training ratio. Unlike previous random-walk strategies, our proposed random-walk strategy generates training samples according to the relative training ratio among different tasks, which results in balanced training for the node embedding. Our extensive experiments on node classification and recommendation demonstrate the superiority of BHIN2vec compared to the state-of-the-art methods. Also, based on the relative training ratio, we analyze how much each relation type is represented in the embedding space.

Deep Sequence-to-Sequence Entity Matching for Heterogeneous Entity Resolution

Entity Resolution (ER) identifies records from different data sources that refer to the same real-world entity. Conventional ER approaches usually employ a structure matching mechanism, where attributes are aligned, compared and aggregated for ER decision. The structure matching approaches, unfortunately, often suffer from heterogeneous and dirty ER problems. That is, entities from different data sources are described using different schemas, and attribute values may be misplaced, missing, or noisy. In this paper, we propose a deep sequence-to-sequence entity matching model, denoted Seq2SeqMatcher, which can effectively solve the heterogeneous and dirty problems by modeling ER as a token-level sequence-to-sequence matching task. Specifically, we propose an align-compare-aggregate neural network for Seq2Seq entity matching, which can learn the representations of tokens, capture the semantic relevance between tokens, and aggregate matching evidence for accurate ER decisions in an end-to-end manner. Experimental results show that, by comparing entity records in token level and learning all components in an end-to-end manner, our Seq2Seq entity matching model can achieve remarkable performance improvements on 9 standard entity resolution benchmarks.

HeteSpaceyWalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding

Heterogeneous information network (HIN) embedding has gained increasing interest recently. However, current random-walk based HIN embedding methods have paid little attention to the higher-order Markov chain nature of meta-path guided random walks, especially to the stationarity issue. In this paper, we systematically formalize the meta-path guided random walk as a higher-order Markov chain process, and present a heterogeneous personalized spacey random walk to efficiently and effectively attain the expected stationary distribution among nodes. Then we propose a generalized scalable framework that leverages the heterogeneous personalized spacey random walk to learn embeddings for multiple types of nodes in an HIN guided by a meta-path, a meta-graph, and a meta-schema, respectively. We conduct extensive experiments on several heterogeneous networks and demonstrate that our methods substantially outperform the existing state-of-the-art network embedding algorithms.

SESSION: Long - Knowledge Graph I

EHR Coding with Multi-scale Feature Attention and Structured Knowledge Graph Propagation

Assigning standard medical codes (e.g., ICD-9-CM) representing diagnoses or procedures to electronic health records (EHRs) is an important task in the medical domain. However, automatic coding is difficult since a clinical note is composed of multiple long and heterogeneous textual narratives (e.g., discharge diagnosis, pathology reports, surgical procedure notes). Furthermore, the code label space is large and the label distribution is extremely unbalanced. State-of-the-art methods mainly regard EHR coding as a multi-label text classification task and use a shallow convolutional neural network with a fixed window size, which is incapable of learning variable n-gram features and the ontology structure between codes. In this paper, we leverage a densely connected convolutional neural network, which is able to produce variable n-gram features, for clinical note feature learning. We also incorporate multi-scale feature attention to adaptively select multi-scale features, since the most informative n-grams in clinical notes for each word can vary in length according to the neighborhood. Furthermore, we leverage a graph convolutional neural network to capture both the hierarchical relationships among medical codes and the semantics of each code. Finally, we validate our method on a public dataset, and the evaluation results indicate that our method can significantly outperform other state-of-the-art models.

A Fine-grained and Noise-aware Method for Neural Relation Extraction

Distant supervision is an efficient way to generate large-scale training data for relation extraction without human effort. However, it comes at a cost: the automatically annotated labels for the training data are problematic, which can be summarized as a multi-instance multi-label problem and a coarse-grained (bag-level) supervision signal. To address these problems, we propose two reasonable assumptions and use reinforcement learning to capture the expressive sentence for each relation mentioned in a bag. More specifically, we extend the original expressed-at-least-once assumption to the multi-label level, and introduce a novel express-at-most-one assumption. In addition, we design a fine-grained reward function, and model the sentence selection process as an auction in which the different relations of a bag compete for possession of a specific sentence based on its expressiveness. In this way, our model adapts itself dynamically and eventually achieves an accurate one-to-one mapping from a relation label to its chosen expressive sentence, which serves as a training instance for the extractor. The experimental results on a public dataset demonstrate that our model consistently and substantially outperforms current state-of-the-art methods for relation extraction.

Learning Region Similarity over Spatial Knowledge Graphs with Hierarchical Types and Semantic Relations

A large number of spatial knowledge graphs (SKGs) are available from spatially enriched knowledge bases, e.g., DBpedia and YAGO2. This provides a great opportunity to understand valuable information about the regions surrounding us. However, SKGs are hard to comprehend due to their explosively growing volume and the complexity of their graph structures. We therefore study the problem of similar region search (SRS), which is an easy-to-use but effective way to explore spatial data. The effectiveness of SRS depends heavily on how region similarity is measured. However, existing approaches cannot make use of the rich information contained in SKGs and thus may lead to incorrect results. In this paper, we propose a spatial knowledge representation learning method for region similarity, namely SKRL4RS. SKRL4RS first encodes the spatial entities of an SKG into a vector space to make it easier to extract useful features. Then regions are represented by 3-D tensors using the spatial entity embeddings together with geographical information. Finally, the region tensors are fed into a conventional triplet network to learn the feature vectors of regions. The region similarity measure learned by SKRL4RS can capture the hierarchical types, semantic relatedness, and relative locations of the spatial entities inside a region. Experimental results on two real-world datasets show that SKRL4RS outperforms the state of the art by a significant margin in terms of the accuracy of measuring region similarity.

Bayes EMbedding (BEM): Refining Representation by Integrating Knowledge Graphs and Behavior-specific Networks

Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in a variety of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus on either knowledge graph embedding or behavior graph embedding, while few works consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from knowledge graphs and behavior graphs. To be more specific, BEM takes the pre-trained embeddings from the knowledge graph as a prior, and integrates them with the pre-trained embeddings from the behavior graphs via a Bayesian generative model. BEM is able to mutually refine the embeddings from both sides while preserving their own topological structures. To show the superiority of our method, we conduct a range of experiments on three benchmark datasets: node classification, link prediction, and triplet classification on two small datasets related to Freebase, and item recommendation on a large-scale e-commerce dataset.

A Benchmark for Fact Checking Algorithms Built on Knowledge Bases

Fact checking is the task of determining if a given claim holds. Several algorithms have been developed to check claims with reference information in the form of facts in a knowledge base. While individual algorithms have been experimentally evaluated in the past, we provide the first comprehensive and publicly available benchmark infrastructure for evaluating methods across a wide range of assumptions about the claims and the reference information. We show how, by changing the popularity, transparency, homogeneity, and functionality properties of the facts in an experiment, it is possible to significantly influence the performance of the fact checking algorithms. We introduce a benchmark framework to systematically enforce such properties in training and testing datasets with fine-grained control over them. We then use our benchmark to compare fact checking algorithms with one another, as well as with methods that can solve the link prediction task in knowledge bases. Our evaluation shows the impact of the four data properties on the qualitative performance of the fact checking solutions and reveals a number of new insights concerning their applicability and performance.

SESSION: Long - Knowledge Graph II

Online Schemaless Querying of Heterogeneous Open Knowledge Bases

Applications that depend on a deep understanding of natural language text have led to a renaissance of large knowledge bases (KBs). Some of these are curated manually and conform to an ontology. Many others, called open KBs, are derived automatically from unstructured text without any pre-specified ontology. These open KBs offer broad coverage of information but are far more heterogeneous than curated KBs, which themselves are more heterogeneous than traditional databases with a fixed schema. Due to the heterogeneity of information representation, querying KBs is a challenging task. Traditionally, query expansion is performed to cover all possible transformations and semantically equivalent structures. Such query expansion can be impractical for heterogeneous open KBs, particularly when complex queries lead to a combinatorial explosion of expansion possibilities. Furthermore, learning a query expansion model requires training examples, which is difficult to scale to diverse representations of facts in the KB. In this paper, we introduce an online schemaless querying method that does not require the query to exactly match the facts. Instead of exactly matching a query, it finds matches for individual query components and then identifies an answer by reasoning over the collective evidence. We devise an alignment-based algorithm for extracting answers based on textual and semantic similarity of query components and evidence fields. Thus, any representational mismatches between the query and evidence are handled online at query-time. Experiments show our approach is effective in handling multi-constraint queries.

Enhancing Conversational Dialogue Models with Grounded Knowledge

Leveraging external knowledge to enhance conversational models has become a popular research area in recent years. Compared to vanilla generative models, knowledge-grounded models may produce more informative and engaging responses. Although various approaches have been proposed in the past, how to effectively incorporate knowledge remains an open research question. It is unclear how much external knowledge should be retrieved and what the optimal way to enhance the conversational model is, given the trade-off between relevant information and noise. Therefore, in this paper, we aim to bridge the gap by first extensively evaluating various types of state-of-the-art knowledge-grounded conversational models, including recurrent neural network based, memory network based, and Transformer based models. We demonstrate empirically that those conversational models can only be enhanced with the right amount of external knowledge. To effectively leverage information originating from external knowledge, we propose a novel Transformer with Expanded Decoder (Transformer-ED or TED for short), which can automatically tune the weights for different sources of evidence when generating responses. Our experiments show that our proposed model outperforms state-of-the-art models in terms of both quality and diversity.

MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data

A knowledge graph (KG) contains entities and the relations between entities. Due to their representation ability, KGs have been successfully applied to support many medical/healthcare tasks. However, in the medical domain, knowledge holds only under certain conditions. Such conditions for medical knowledge are crucial for decision-making in various medical applications, yet they are missing from existing medical KGs. In this paper, we aim to discover medical knowledge conditions from texts to enrich KGs. Electronic Medical Records (EMRs) are systematized collections of clinical data and contain detailed information about patients, so EMRs can be a good resource for discovering medical knowledge conditions. Unfortunately, the amount of available EMRs is limited due to reasons such as regularization. Meanwhile, a large amount of medical question answering (QA) data is available, which can greatly help the studied task. However, the quality of medical QA data is quite diverse, which may degrade the quality of the discovered medical knowledge conditions. In light of these challenges, we propose a new truth discovery method, MedTruth, for medical knowledge condition discovery, which incorporates prior source quality information into the source reliability estimation procedure, and also utilizes the knowledge triple information for trustworthy information computation. We conduct a series of experiments on real-world medical datasets to demonstrate that the proposed method can discover meaningful and accurate conditions for medical knowledge by leveraging both EMR and QA data. Further, the proposed method is tested on synthetic datasets to validate its effectiveness under various scenarios.

Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion

Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX, an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.

Learning to Answer Complex Questions over Knowledge Bases with Query Composition

Recent years have seen a surge of knowledge-based question answering (KB-QA) systems which provide crisp answers to user-issued questions by translating them to precise structured queries over a knowledge base (KB). A major challenge in KB-QA is bridging the gap between natural language expressions and the complex schema of the KB. As a result, existing methods focus on simple questions answerable with one main relation path in the KB and struggle with complex questions that require joining multiple relations. We propose a KB-QA system, TextRay, which answers complex questions using a novel decompose-execute-join approach. It constructs complex query patterns using a set of simple queries. It uses a semantic matching model which is able to learn simple queries using implicit supervision from question-answer pairs, thus eliminating the need for complex query patterns. Our proposed system significantly outperforms existing KB-QA systems on complex questions while achieving comparable results on simple questions.

SESSION: Long - Machine Learning Themes I

Auto-completion for Data Cells in Relational Tables

We address the task of auto-completing data cells in relational tables. Such tables describe entities (in rows) with their attributes (in columns). We present the CellAutoComplete framework to tackle several novel aspects of this problem, including: (i) enabling a cell to have multiple, possibly conflicting values, (ii) supplementing the predicted values with supporting evidence, (iii) combining evidence from multiple sources, and (iv) handling the case where a cell should be left empty. Our framework makes use of a large table corpus and a knowledge base as data sources, and consists of preprocessing, candidate value finding, and value ranking components. Using a purpose-built test collection, we show that our approach is 40% more effective than the best baseline.

Author Set Identification via Quasi-Clique Discovery

Author identification based on heterogeneous bibliographic networks, which aims to identify the potential authors of an anonymous paper, has been studied in recent years. However, most existing works merely consider the relationship between authors and anonymous papers, while ignoring the relationships between authors. In this paper, we take the relationships among authors into consideration to study the problem of author set identification, which is to identify an author set, rather than an individual author, related to an anonymous paper. The proposed problem has important applications in new collaborator discovery and group building. We propose a novel Author Set Identification approach, namely ASI. ASI first extracts a task-guided embedding to learn low-dimensional representations of the nodes in a bibliographic network. ASI then leverages the learned embedding to construct a weighted paper-ego-network, which contains the anonymous paper and the candidate authors. Finally, converting optimal author set identification into quasi-clique discovery in the constructed network, ASI utilizes a local-search heuristic mechanism under the guidance of the devised density function to find the optimal quasi-clique. Extensive experiments on bibliographic networks demonstrate that ASI outperforms the state-of-the-art baselines in author set identification.

AdaFair: Cumulative Fairness Adaptive Boosting

The widespread use of ML-based decision making in domains with high societal impact, such as recidivism, job hiring, and loan credit, has raised many concerns regarding potential discrimination. In particular, in certain cases it has been observed that ML algorithms can provide different decisions based on sensitive attributes such as gender or race, and can therefore lead to discrimination. Although several fairness-aware ML approaches have been proposed, their focus has been largely on preserving the overall classification accuracy while improving fairness in predictions for both protected and non-protected groups (defined based on the sensitive attribute(s)). The overall accuracy, however, is not a good indicator of performance in the case of class imbalance, as it is biased towards the majority class. As we will see in our experiments, many of the fairness-related datasets suffer from class imbalance, and therefore tackling fairness also requires tackling the imbalance problem. To this end, we propose AdaFair, a fairness-aware classifier based on AdaBoost that further updates the weights of the instances in each boosting round, taking into account a cumulative notion of fairness based upon all current ensemble members, while explicitly tackling class imbalance by optimizing the number of ensemble members for balanced classification error. Our experiments show that our approach can achieve parity in true positive and true negative rates for both protected and non-protected groups, while it significantly outperforms existing fairness-aware methods, by up to 25% in terms of balanced error.
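
The point about accuracy under class imbalance can be illustrated with a small sketch. It assumes the common definition of balanced error as one minus the mean of the true positive and true negative rates; the abstract does not spell out the formula, and the function names and toy labels below are hypothetical, not taken from the paper.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, true negative rate) for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    # Guard against empty classes so the sketch never divides by zero.
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return tpr, tnr


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_error(y_true, y_pred):
    """Assumed definition: 1 - (TPR + TNR) / 2."""
    tpr, tnr = rates(y_true, y_pred)
    return 1.0 - (tpr + tnr) / 2.0


# Toy imbalanced data: one positive, nine negatives, and a classifier
# that always predicts the majority (negative) class.
y_true = [1] + [0] * 9
y_pred = [0] * 10

print(accuracy(y_true, y_pred))        # 0.9
print(balanced_error(y_true, y_pred))  # 0.5
```

The majority-class predictor looks strong on accuracy yet is no better than chance on balanced error, which is why a balanced measure is the more informative target on imbalanced data.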

New Online Kernel Ridge Regression via Incremental Predictive Sampling

Online kernel ridge regression via existing sampling approaches, which aim at approximating the kernel matrix as accurately as possible, is independent of learning and has a cubic time complexity with respect to the sampling size for updating the hypothesis. In this paper, we propose a new online kernel ridge regression via an incremental predictive sampling approach, which has a nearly optimal accumulated loss and performs efficiently at each round. We use the estimated ridge leverage score of the labeled matrix, which depends on the accumulated loss at each round, to construct the predictive sampling distribution, and use this sampling probability for the Nyström approximation. To avoid calculating the inverse of the approximated kernel matrix directly, we use the Woodbury formula to accelerate the computation and adopt the truncated incremental singular value decomposition to update the generalized inverse of the intersection matrix. Our online kernel ridge regression has a time complexity of $O(tmk+k^3)$ for updating the hypothesis at round $t$, where $k$ is the truncated rank of the intersection matrix, and enjoys a regret bound of order $O(\sqrt{T})$, where $T$ is the time horizon. Experimental results show that the proposed online kernel ridge regression via incremental predictive sampling performs more stably and efficiently than online kernel ridge regression via existing online sampling approaches that directly approximate the kernel matrix.
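
For readers unfamiliar with the Woodbury formula mentioned above, the following worked identity shows why it avoids inverting a large matrix. It is a sketch under simplifying assumptions: the symbols $C$ (sampled columns), $W$ (intersection matrix), and $\lambda$ (ridge parameter) are notation introduced here, and $W$ is taken to be invertible, whereas the abstract works with a generalized inverse.

```latex
% General Woodbury identity (A and B invertible, sizes compatible):
(A + U B V)^{-1} = A^{-1} - A^{-1} U \left(B^{-1} + V A^{-1} U\right)^{-1} V A^{-1}

% Applied to a Nystrom-style approximation K \approx C W^{-1} C^{\top},
% with C of size t x k, W of size k x k (assumed invertible), and \lambda > 0.
% Take A = \lambda I_t, U = C, B = W^{-1}, V = C^{\top}:
(\lambda I_t + C W^{-1} C^{\top})^{-1}
  = \frac{1}{\lambda}\left( I_t - C \left(\lambda W + C^{\top} C\right)^{-1} C^{\top} \right)
```

Only the $k \times k$ matrix $\lambda W + C^{\top} C$ is inverted on the right-hand side, which is consistent with a $k^3$ term in the per-round cost rather than a cost cubic in $t$.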

Online Kernel Selection via Tensor Sketching

Online kernel selection is a more complex problem than offline kernel selection: it intermixes training and selection at each round and requires a sublinear regret and low computational complexities. However, existing online kernel selection approaches have at least linear time and space complexities at each round with respect to the number of rounds, or lack sublinear regret guarantees for an uncountably infinite number of candidate kernels. To address these issues, we propose a novel online kernel selection approach using tensor sketching, which has constant computational complexities at each round and enjoys a sublinear regret bound for an uncountably infinite number of candidate kernels. We represent the data using tensor products and construct data sketches using the Taylor series and Count Sketch matrices, which yields a sketched reproducing kernel Hilbert space (SRKHS). Then we update the optimal kernels and the hypotheses using online gradient descent in SRKHS. We prove that the kernel corresponding to SRKHS satisfies the reproducing property, that the hypotheses in SRKHS are convex with respect to the kernel parameter, and that the proposed online kernel selection approach in SRKHS enjoys a regret bound of order $O(\sqrt{T})$ for an uncountably infinite number of candidate kernels, which is optimal for a convex loss function, where $T$ is the number of rounds. Using the fast Fourier transform, the hypotheses in SRKHS can be computed with quasilinear time complexity and logarithmic space complexity with respect to the sketch size at each round, where the sketch size is a constant. Experimental results demonstrate that our online kernel selection approach is more accurate and efficient for online kernel learning on high-dimensional data.
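
The interplay of Count Sketch, tensor products, and the fast Fourier transform can be illustrated with the standard degree-2 tensor sketch. This is a generic textbook-style sketch, not the paper's SRKHS construction: the Taylor-series step is omitted, all function names are hypothetical, and a plain $O(D^2)$ circular convolution stands in for the FFT so that the example needs only the standard library.

```python
import random


def count_sketch(x, h, s, D):
    """Count Sketch of x: bucket h[i] accumulates s[i] * x[i]."""
    out = [0.0] * D
    for i, xi in enumerate(x):
        out[h[i]] += s[i] * xi
    return out


def circular_convolve(a, b):
    """Circular convolution in O(D^2); an FFT gives the same result in O(D log D)."""
    D = len(a)
    out = [0.0] * D
    for i in range(D):
        for j in range(D):
            out[(i + j) % D] += a[i] * b[j]
    return out


def tensor_sketch_degree2(x, h1, s1, h2, s2, D):
    """Sketch the tensor product of x with itself without ever forming it."""
    return circular_convolve(count_sketch(x, h1, s1, D),
                             count_sketch(x, h2, s2, D))


def sketch_outer_product_directly(x, h1, s1, h2, s2, D):
    """Reference: Count Sketch of the explicit d*d outer product, using the
    combined hash (h1[i] + h2[j]) mod D and the combined sign s1[i] * s2[j]."""
    out = [0.0] * D
    for i, xi in enumerate(x):
        for j, xj in enumerate(x):
            out[(h1[i] + h2[j]) % D] += s1[i] * s2[j] * xi * xj
    return out


def random_hashes(d, D, rng):
    """Draw a random bucket and a random sign for each of the d coordinates."""
    h = [rng.randrange(D) for _ in range(d)]
    s = [rng.choice((-1, 1)) for _ in range(d)]
    return h, s


rng = random.Random(0)
d, D = 6, 8  # input dimension and sketch size (arbitrary toy values)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
h1, s1 = random_hashes(d, D, rng)
h2, s2 = random_hashes(d, D, rng)

fast = tensor_sketch_degree2(x, h1, s1, h2, s2, D)
slow = sketch_outer_product_directly(x, h1, s1, h2, s2, D)
# The two sketches agree up to floating-point rounding.
print(max(abs(u - v) for u, v in zip(fast, slow)))
```

The convolution route touches only the two length-$D$ sketches, never the $d^2$ entries of the tensor product, which is where the FFT-based quasilinear cost in the sketch size comes from.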

SESSION: Long - Machine Learning Themes II

One-Class Active Learning for Outlier Detection with Multiple Subspaces

Active learning for outlier detection involves users in the process, by asking them for annotations of observations, in the form of class labels. The usual assumption is that users can provide such feedback, regardless of the nature and the presentation of the results. This is a simplification, which may not hold in practice. To overcome it, we propose SubSVDD, a semi-supervised classifier, that learns decision boundaries in low-dimensional projections of the data. SubSVDD de-constructs the outlier classification so that users can comprehend and interpret results more easily. For active learning, SubSVDD features a new update mechanism that adjusts decision boundaries based on user feedback. In particular, it considers that outliers may only occur in some of the low-dimensional projections. We conduct systematic experiments to show the effectiveness of our approach. In a comprehensive benchmark, SubSVDD outperforms alternative approaches on several data sets.

AutoGRD: Model Recommendation Through Graphical Dataset Representation

The widespread use of machine learning algorithms and the high level of expertise required to utilize them have fuelled the demand for solutions that can be used by non-experts. One of the main challenges non-experts face in applying machine learning to new problems is algorithm selection - the identification of the algorithm(s) that will deliver top performance for a given dataset, task, and evaluation measure. We present AutoGRD, a novel meta-learning approach for algorithm recommendation. AutoGRD first represents datasets as graphs and then extracts their latent representation that is used to train a ranking meta-model capable of accurately recommending top-performing algorithms for previously unseen datasets. We evaluate our approach on 250 datasets and demonstrate its effectiveness both for classification and regression tasks. AutoGRD outperforms state-of-the-art meta-learning and Bayesian methods.

Batch Mode Active Learning for Semantic Segmentation Based on Multi-Clue Sample Selection

Large labeled datasets are required for training a powerful semantic segmentation model. However, it is very expensive to construct pixel-wise annotated images. In this work, we propose a general batch mode active learning algorithm for semantic segmentation, which automatically selects important samples to be labeled for building a competitive classifier. In our approach, the edge information of an image is first introduced as a new selection clue for active learning, which can measure the essential information relevant to segmentation performance. In addition, we incorporate informativeness based on Query by Committee (QBC) and representativeness criteria in our algorithm. We combine the three clues to select a batch of samples during each iteration. The experiments show that image edge information is significant for active learning for semantic segmentation. We also demonstrate that our method outperforms state-of-the-art active learning approaches on the CamVid, Stanford Background, and PASCAL VOC 2012 datasets.

CRUX: Adaptive Querying for Efficient Crowdsourced Data Extraction

Crowdsourcing is essential for collecting information about real-world entities. Existing crowdsourced data extraction solutions use fixed, non-adaptive querying strategies that repeatedly ask workers to provide entities from a fixed domain until a desired level of coverage is reached. Unfortunately, such solutions are highly impractical as they yield many duplicate extractions. We design an adaptive querying framework, CRUX, that maximizes the number of extracted entities for a given budget. We show that the problem of budgeted crowdsourced entity extraction is NP-Hard. We leverage two insights to focus our extraction efforts: exploiting the structure of the domain of interest, and using exclude lists to limit repeated extractions. We develop new statistical tools to reason about the number of new distinct extracted entities of additional queries under the presence of little information, and embed them within adaptive algorithms that maximize the distinct extracted entities under budget constraints. We evaluate our techniques on synthetic and real-world datasets, demonstrating an improvement of up to 300% over competing approaches for the same budget.

Deep Forest with LRRS Feature for Fine-grained Website Fingerprinting with Encrypted SSL/TLS

With the development of encryption protocols such as Secure Sockets Layer (SSL) and Transport Layer Security (TLS), traditional fingerprinting approaches based on packet content and special fields have difficulty fingerprinting websites. Therefore, recent research has introduced machine learning algorithms to deal with this problem, and various features have been extracted for these algorithms. However, previous approaches to fingerprinting encrypted websites are based on HTTP/1.1 and are not applicable to the widely used HTTP/2. In addition, most of the work only fingerprints the home page of each website, but in fact users also visit other web pages of the website. To solve the feature compatibility problem, we propose to use the local request and response sequence (LRRS) as features. Using local packet sequences, LRRS can represent the patterns of encrypted Internet traffic based not only on HTTP/1.1 but also on HTTP/2. In order to fingerprint different web pages of the same website, we adopt Deep Forest to extract fine-grained features. It utilizes a convolution structure to make full use of the LRRS sequential features and a multi-layer structure to enhance the ability of feature representation. The experimental results show that the proposed algorithm achieves the best overall performance on four datasets. In particular, on the bidirectional encrypted traffic dataset with HTTP/2, the proposed approach achieves a 55% higher F1 score than the state-of-the-art method KFP with Random Forest.

SESSION: Long - Machine Learning Themes III

N2N: Network Derivative Mining

Network mining plays a pivotal role in many high-impact application domains, including information retrieval, healthcare, social network analysis, security, and recommender systems. The state of the art offers a wealth of sophisticated network mining algorithms, many of which have been widely adopted in the real world with superior empirical performance. Nonetheless, they often lack effective and efficient ways to characterize how the results of a given mining task relate to the underlying network structure. In this paper, we introduce the network derivative mining problem. Given an input network and a specific mining algorithm, network derivative mining finds a derivative network whose edges measure the influence of the corresponding edges of the input network on the mining results. We envision that network derivative mining could be beneficial in a variety of scenarios, ranging from explainable network mining, adversarial network mining, sensitivity analysis on network structure, active learning, and learning with side information to counterfactual learning on networks. We propose a generic framework for network derivative mining from the optimization perspective and provide instantiations for three classic network mining tasks: ranking, clustering, and matrix completion. For each mining task, we develop an effective algorithm for constructing the derivative network based on influence function analysis, with numerous optimizations to ensure linear complexity in both time and space. Extensive experimental evaluation on real-world datasets demonstrates the efficacy of the proposed framework and algorithms.

MoBoost: A Self-improvement Framework for Linear-based Hashing

The linear model is commonly utilized in hashing methods owing to its efficiency. To obtain better accuracy, linear-based hashing methods focus on designing a generalized linear objective function with different constraints or penalty terms that consider neighborhood information. In this study, we propose a novel generalized framework called Model Boost (MoBoost), which can achieve the self-improvement of the linear-based hashing. The proposed MoBoost is used to improve model parameter optimization for linear-based hashing methods without adding new constraints or penalty terms. In the proposed MoBoost, given a linear-based hashing method, we first execute the method several times to get several different hash codes for training samples, and then combine these different hash codes into one set utilizing one novel fusion strategy. Based on this set of hash codes, we learn some new parameters for the linear hash function that can significantly improve accuracy. The proposed MoBoost can be generally adopted in existing linear-based hashing methods, achieving more precise and stable performance compared to the original methods while imposing negligible added expenditure in terms of time and space. Extensive experiments are performed based on three benchmark datasets, and the results demonstrate the superior performance of the proposed framework.

Loopless Semi-Stochastic Gradient Descent with Less Hard Thresholding for Sparse Learning

Stochastic gradient hard thresholding methods have recently been shown to work favorably for solving large-scale empirical risk minimization problems under sparsity constraints. Many stochastic hard thresholding methods (e.g., SVRG-HT) conduct a full gradient update with a constant frequency and perform a hard thresholding operation at each iteration, which leads to a high computational complexity, especially for high-dimensional and sparse problems. To be more efficient on large-scale datasets, we propose an efficient single-layer semi-stochastic gradient hard thresholding (LSSG-HT) method. The proposed algorithm updates the full gradient with a given probability $p$ and eliminates many hard thresholding operations by setting a frequency $m$, which reduces the hard thresholding complexity in theory to $O(\frac{\kappa_s}{m}\log(1/\epsilon))$, compared with $O(\kappa_s\log(1/\epsilon))$ for SVRG-HT. We prove that our algorithm converges to an optimal solution at a linear convergence rate. Furthermore, we also present an asynchronous parallel variant of LSSG-HT. Numerical experimental results demonstrate the efficiency of our algorithms in comparison with state-of-the-art algorithms.
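
The hard thresholding operation and the effect of applying it only every m iterations can be sketched as follows. This is a deterministic toy on the objective 0.5 * ||w - b||^2 with full gradients, not LSSG-HT itself: it has no stochastic gradients and no probability-p full gradient refresh, and the function names, step size, and data are hypothetical.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices ordered by decreasing magnitude; ties go to the lower index
    # because Python's sort is stable.
    keep = set(sorted(range(len(x)), key=lambda i: -abs(x[i]))[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def thresholded_descent(b, k, steps, m, lr=0.5):
    """Gradient descent on 0.5 * ||w - b||^2, thresholding every m-th step.

    Returns the final iterate and the number of thresholding calls made.
    """
    w = [0.0] * len(b)
    calls = 0
    for t in range(1, steps + 1):
        # The gradient of the toy objective is simply (w - b).
        w = [wi - lr * (wi - bi) for wi, bi in zip(w, b)]
        if t % m == 0:
            w = hard_threshold(w, k)
            calls += 1
    return w, calls


print(hard_threshold([0.5, -3.0, 2.0, 0.1], 2))  # [0.0, -3.0, 2.0, 0.0]

b = [3.0, -0.2, 1.5, 0.1]
w, calls = thresholded_descent(b, k=2, steps=20, m=5)
print(calls)  # 4 thresholding calls instead of 20
```

With m = 5 the operator runs 4 times over 20 iterations rather than 20 times, while the final iterate is still 2-sparse with support on the two largest entries of b.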

EPA: Exoneration and Prominence based Age for Infection Source Identification

Infection source identification is a well-established problem that has attracted substantial research attention over the years. In this paper, we study the problem by exploiting the idea of the source being the oldest node. To this end, we propose a novel algorithm called Exoneration and Prominence based Age (EPA), which calculates the age of an infected node by considering its prominence in terms of both its infected and non-infected neighbors. These non-infected neighbors hold the key to exonerating an infected node from being the infection source. We also propose a computationally inexpensive variant of EPA, called EPA-LW. Extensive experiments are performed on seven datasets, five real-world and two synthetic, of different topologies and varying sizes to demonstrate the effectiveness of the proposed algorithms. We consistently outperform the state-of-the-art single source identification methods in terms of average error distance. To the best of our knowledge, this is the largest-scale performance evaluation of the considered problem to date. We also extend EPA to identify multiple sources by developing two new algorithms: one based on K-Means, called EPA_K-Means, and another based on successive identification of sources, called EPA_SSI. Our results show that both EPA_K-Means and EPA_SSI outperform the other multi-source heuristic approaches.

SESSION: Long - Mining in Emerging Applications I

Generating Persuasive Visual Storylines for Promotional Videos

Video content has become a critical tool for promoting products in E-commerce. However, the lack of automatic promotional video generation solutions makes large-scale video-based promotion campaigns infeasible. The first step of automatically producing promotional videos is to generate visual storylines, that is, to select the building-block footage and place it in an appropriate order. This task is related to the subjective viewing experience. It has hitherto been performed by human experts and is thus hard to scale. To address this problem, we propose WundtBackpack, an algorithmic approach to generate storylines based on available visual materials, which can be video clips or images. It consists of two main parts: 1) the Learnable Wundt Curve to evaluate the perceived persuasiveness based on the stimulus intensity of a sequence of visual materials, which only requires a small volume of data to train; and 2) a clustering-based backpacking algorithm to generate persuasive sequences of visual materials while considering video length constraints. In this way, the proposed approach provides a dynamic structure to empower artificial intelligence (AI) to organize video footage in order to construct a sequence of visual stimuli with persuasive power. Extensive real-world experiments show that our approach achieves close to 10% higher perceived persuasiveness scores by human testers, and 12.5% higher expected revenue compared to the best performing state-of-the-art approach.
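To make the length-constrained selection step concrete, here is a minimal sketch of a textbook 0/1 knapsack over clips. It is not the clustering-based backpacking algorithm of WundtBackpack; the clip names, integer lengths in seconds, and persuasiveness scores are hypothetical inputs chosen for illustration.

```python
def select_clips(clips, max_len):
    """Standard 0/1 knapsack over clips.

    clips: list of (name, length_seconds, score), with integer lengths.
    Returns (best_total_score, chosen_names) within the length budget.
    """
    # best[c] holds the best (score, names) achievable with capacity c.
    best = [(0.0, [])] * (max_len + 1)
    for name, length, score in clips:
        # Iterate capacities downward so each clip is used at most once.
        for cap in range(max_len, length - 1, -1):
            candidate = best[cap - length][0] + score
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - length][1] + [name])
    return best[max_len]
```

A real storyline generator would also have to order the chosen clips, which this sketch leaves out.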

Clustering Recurrent and Semantically Cohesive Program Statements in Introductory Programming Assignments

Students taking introductory programming courses are typically required to complete assignments and expect timely feedback to advance their learning. With the current popularity of these courses in both traditional and online versions, graders are seeing themselves overwhelmed by the sheer amount of student programs they have to handle, and the quality of the educational experience provided is often compromised for promptness. Thus, there is a need for automated approaches to effectively increase grading productivity. Existing approaches in this context fail to support flexible grading schemes and customization based on the assignment at hand. This paper presents a data-driven approach for clustering recurrent program statements performing similar but not exact semantics across student programs, which we refer to as core statements. We rely on structural graph clustering over the program dependence graph representations of student programs. Such clustering is performed over the graph resulting from the pairwise approximate graph alignments of programs. Core statements help graders understand solution variations at a glance and, since they group program statements present in individual student programs, can be used to propagate feedback, thus increasing grading productivity. Our experimental results show that, on average, we discover core statements covering more than 50% of individual student programs, and that program statements grouped by core statements are semantically cohesive, which ensures effective grading.

Going Beyond Content Richness: Verified Information Aware Summarization of Crisis-Related Microblogs

High-impact catastrophic events (bomb attacks, shootings) trigger the posting of a large volume of information on social media platforms such as Twitter. Recent works have proposed content-aware systems for summarizing this information, thereby facilitating post-disaster services. However, a significant proportion of the posted content is unverified, which restricts the practical usage of the existing summarization systems. In this paper, we work on the novel task of generating verified summaries of information posted on Twitter during disasters. We first jointly learn representations of content-classes and expression-classes of tweets posted during disasters using a novel LDA-based generative model. These representations of content and expression classes are used in conjunction with pre-disaster user behavior and temporal signals (replies) for training a Tree-LSTM based tweet-verification model. The model infers tweet verification probabilities which are used, besides information content of tweets, in an Integer Linear Programming (ILP) framework for generating the desired verified summaries. The summaries are fine-tuned using the class information of the tweets as obtained from the LDA-based generative model. Extensive experiments are performed on a publicly-available labeled dataset of man-made disasters which demonstrate the effectiveness of our tweet-verification (3-13% gain over baselines) and summarization (12-48% gain in verified content proportion, 8-13% gain in ROUGE-score over state-of-the-art) systems. We make implementations of our various modules available online.

Declarative User Selection with Soft Constraints

In applications with large userbases such as crowdsourcing, social networks or recommender systems, selecting users is a common and challenging task. Different applications require different policies for selecting users, and implementing such policies is application-specific and laborious. To this end, we introduce a novel declarative framework that abstracts common components of the user selection problem, while allowing for domain-specific tuning. The framework is based on an ontology view of user profiles, with respect to which we define a query language for policy specification. Our language extends SPARQL with means for capturing soft constraints, which are essential for worker selection. At the core of our query engine is then a novel efficient algorithm for handling these constraints. Our experimental study on real-life data indicates the effectiveness and flexibility of our approach, showing in particular that it outperforms existing task-specific solutions in prominent user selection tasks.

#suicidal - A Multipronged Approach to Identify and Explore Suicidal Ideation in Twitter

Technological advancements have led to the creation of social media platforms like Twitter, where people have started voicing their views on rarely discussed and socially stigmatizing issues. Twitter is increasingly being used for studying psycho-linguistic phenomena ranging from expressions of adverse drug reactions and depression to suicidality. In this work we focus on identifying suicidal posts from Twitter. Towards this objective we take a multipronged approach and implement different neural network models, such as sequential models and graph convolutional networks, that are trained on textual content shared on Twitter, the historical tweeting activity of the users, and the social network formed between different users posting about suicidality. We train a stacked ensemble of classifiers representing different aspects of suicidal tweeting activity, and achieve state-of-the-art results on a new manually annotated dataset developed by us that contains textual as well as network information of suicidal tweets. We further investigate the trained models and perform a qualitative analysis showing how historical tweeting activity and the rich information embedded in the homophily networks amongst users on Twitter aid in accurately identifying tweets expressing suicidal intent.

SESSION: Long - Mining in Emerging Applications II

MusicBot: Evaluating Critiquing-Based Music Recommenders with Conversational Interaction

Critiquing-based recommender systems aim to elicit more accurate user preferences from users' feedback toward recommendations. However, systems using a graphical user interface (GUI) limit the way that users can critique the recommendation. With the rise of chatbots in many application domains, they have been regarded as an ideal platform to build critiquing-based recommender systems. Therefore, we present MusicBot, a chatbot for music recommendations, featuring two typical critiquing techniques, user-initiated critiquing (UC) and system-suggested critiquing (SC). By conducting a within-subjects (N=45) study with two typical scenarios of music listening, we compared a system offering only UC with a hybrid critiquing system that combines SC with UC. Furthermore, we analyzed the effects of four personal characteristics, musical sophistication (MS), desire for control (DFC), chatbot experience (CE), and tech savviness (TS), on the user's perception of and interaction with the recommendation in MusicBot. In general, compared with UC, SC yields higher perceived diversity and efficiency in looking for songs; combining UC and SC tends to increase user engagement. Both MS and DFC positively influence several key user experience (UX) metrics of MusicBot such as interest matching, perceived controllability, and intent to provide feedback.

Discovering Polarized Communities in Signed Networks

Signed networks contain edge annotations to indicate whether each interaction is friendly (positive edge) or antagonistic (negative edge). The model is simple but powerful and it can capture novel and interesting structural properties of real-world phenomena. The analysis of signed networks has many applications from modeling discussions in social media, to mining user reviews, and to recommending products in e-commerce sites. In this paper we consider the problem of discovering polarized communities in signed networks. In particular, we search for two communities (subsets of the network vertices) where within communities there are mostly positive edges while across communities there are mostly negative edges. We formulate this novel problem as a "discrete eigenvector" problem, which we show to be NP-hard. We then develop two intuitive spectral algorithms: one deterministic, and one randomized with quality guarantee √n (where n is the number of vertices in the graph), tight up to constant factors. We validate our algorithms against non-trivial baselines on real-world signed networks. Our experiments confirm that our algorithms produce higher quality solutions, are much faster and can scale to much larger networks than the baselines, and are able to detect ground-truth polarized communities.
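One way to read the "discrete eigenvector" framing is as a Rayleigh-quotient-style score x^T A x / x^T x over assignment vectors x with entries in {-1, 0, +1}, where A is the signed adjacency matrix. The abstract does not spell out the objective, so this exact form is an assumption, and the function below is a hypothetical helper that only scores a given assignment rather than searching for one.

```python
def polarity_score(edges, x):
    """Score a candidate pair of communities in a signed graph.

    edges: list of undirected (u, v, sign) with sign in {+1, -1}.
    x: dict mapping node -> +1 (community 1), -1 (community 2), 0 (neither).
    Computes x^T A x / x^T x for the signed adjacency matrix A.
    """
    size = sum(1 for value in x.values() if value != 0)
    if size == 0:
        return 0.0
    total = 0
    for u, v, sign in edges:
        # Each undirected edge appears twice in x^T A x.
        total += 2 * sign * x.get(u, 0) * x.get(v, 0)
    return total / size
```

Positive edges inside a community and negative edges across communities raise the score; the opposite patterns lower it.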

Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales. How to effectively allocate the incentives so as to maximize the return (e.g., business objectives) under the budget constraint, however, is less studied in the literature. This problem is technically challenging due to the facts that 1) the allocation strategy has to be learned using historically logged data, which is counterfactual in nature, and 2) both the optimality and feasibility (i.e., that cost cannot exceed budget) need to be assessed before being deployed to online systems. In this paper, we formulate the problem as a constrained Markov decision process (CMDP). To solve the CMDP problem with logged counterfactual data, we propose an efficient learning algorithm which combines bisection search and model-based planning. First, the CMDP is converted into its dual using Lagrangian relaxation, which is proved to be monotonic with respect to the dual variable. Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable being found efficiently via bisection search (i.e., by taking advantage of the monotonicity). Lastly, we show that model-based planning can be used to effectively accelerate the joint optimization process without retraining the policy for every dual variable. Empirical results on synthetic and real marketing datasets confirm the effectiveness of our methods.
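The role of monotonicity in the bisection search can be illustrated with a toy, single-step allocation problem. This is a sketch under simplifying assumptions, not the paper's CMDP algorithm: each user has a hypothetical list of (reward, cost) incentive options, the "policy" simply picks the option maximizing reward minus lam times cost, and the upper bound hi is assumed large enough that the resulting cost fits the budget.

```python
def best_response(options, lam):
    """Pick the (reward, cost) option maximizing reward - lam * cost."""
    return max(options, key=lambda rc: rc[0] - lam * rc[1])

def total_cost(users, lam):
    """Total spend when every user gets their best-response option."""
    return sum(best_response(options, lam)[1] for options in users)

def bisect_dual(users, budget, lo=0.0, hi=100.0, tol=1e-6):
    """Find (up to tol) the smallest dual variable whose spend fits the budget.

    Relies on total_cost being non-increasing in lam.
    """
    if total_cost(users, lo) <= budget:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total_cost(users, mid) > budget:
            lo = mid   # still over budget: penalize cost more
        else:
            hi = mid   # feasible: try a smaller penalty
    return hi
```

Because spend only decreases as lam grows, each probe rules out half of the remaining interval.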

Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures

User reviews have become a cornerstone of how we make decisions. However, this user-based feedback is susceptible to manipulation as recent research has shown the feasibility of automatically generating fake reviews. Previous investigations, however, have focused on generative fake review approaches that are (i) domain dependent and not extendable to other domains without replicating the whole process from scratch; and (ii) character-level based known to generate reviews of poor quality that are easily detectable by anti-spam detectors and by end users. In this work, we propose and evaluate a new class of attacks on online review platforms based on neural language models at word-level granularity in an inductive transfer-learning framework wherein a universal model is refined to handle domain shift, leading to potentially wide-ranging attacks on review systems. Through extensive evaluation, we show that such model-generated reviews can bypass powerful anti-spam detectors and fool end users. Paired with this troubling attack vector, we propose a new defense mechanism that exploits the distributed representation of these reviews to detect model-generated reviews. We conclude that despite the success of neural models in generating realistic reviews, our proposed RNN-based discriminator can combat this type of attack effectively (90% accuracy).

Augment to Prevent: Short-Text Data Augmentation in Deep Learning for Hate-Speech Classification

In this paper, we address the issue of augmenting text data in supervised Natural Language Processing problems, exemplified by deep online hate speech classification. A great challenge in this domain is that although the presence of hate speech can be deleterious to the quality of service provided by social platforms, it still comprises only a tiny fraction of the content that can be found online, which can lead to performance deterioration due to majority class overfitting. To this end, we perform a thorough study on the application of deep learning to the hate speech detection problem: a) we propose three text-based data augmentation techniques aimed at reducing the degree of class imbalance and to maximise the amount of information we can extract from our limited resources and b) we apply them on a selection of top-performing deep architectures and hate speech databases in order to showcase their generalisation properties. The data augmentation techniques are based on a) synonym replacement based on word embedding vector closeness, b) warping of the word tokens along the padded sequence or c) class-conditional, recurrent neural language generation. Our proposed framework yields a significant increase in multi-class hate speech detection, outperforming the baseline in the largest online hate speech database by an absolute 5.7% increase in Macro-F1 score and 30% in hate speech class recall.

SESSION: Long - Natural Language Processing I

Nested Relation Extraction with Iterative Neural Network

Natural language is used to describe objective facts, including simple relations like "Jobs was the CEO of Apple", and complex relations like "the GDP of the United States in 2018 grew 2.9% compared with 2017". For the latter example, the growth rate relation is between two other relations. Due to the complex nature of language, this kind of nested relation is expressed frequently, especially in professional documents in fields like economics, finance, and biomedicine. But extracting nested relations is challenging, and research on this problem is scarce. In this paper, we formally formulate the nested relation extraction problem, and come up with a solution using an Iterative Neural Network. Specifically, we observe that the nested relation structures can be expressed as a Directed Acyclic Graph (DAG), and propose the model to simultaneously consider the word sequence of natural language in the horizontal direction and the DAG structure in the vertical direction. Based on two nested relation extraction tasks, namely semantic causality relation extraction and formula extraction, we show that the proposed model works well on them. Moreover, we speed up the DAG-LSTM training significantly by a simple parallelization solution.

Learning Chinese Word Embeddings from Stroke, Structure and Pinyin of Characters

Chinese word embeddings have recently attracted much attention in natural language processing (NLP). Existing research learns Chinese word embeddings based on characters, radicals, components and stroke n-grams. Besides the abovementioned features, Chinese characters also have structure and pinyin features. In this paper, we design the feature substring, a superset of radicals, components and stroke n-grams with structure and pinyin information, to integrate stroke, structure and pinyin features of Chinese characters and capture the semantics of Chinese words. Based on the feature substring, we propose a novel method, ssp2vec, to predict the contextual words based on the feature substrings of the target words for learning Chinese word embeddings. It is based on our observation that exploiting the morphological information (stroke and structure) and the phonetic information (pinyin) is crucial for capturing the meanings of Chinese words. Meanwhile, the phonetic information (pinyin) can assist the model to distinguish Chinese words. Experimental results on word analogy, word similarity, text classification and named entity recognition tasks show that the proposed method obtains better results than state-of-the-art approaches.

Sentiment Commonsense Induced Sequential Neural Networks for Sentiment Classification

Although neural networks achieve promising performance in sentence level sentiment classification, most of them are not aware of sentiment commonsense, such as sentiment polarity tags (Positive or Negative) for words, which explicitly determine the sentiment of the sentence in most cases. In this paper, we propose an auxiliary tagging task to integrate sentiment commonsense into sequential neural networks (such as LSTM). We employ the advantage of multitask learning to achieve two goals simultaneously: 1) the sequential learning task accounts for incorporating the semantic information of the surrounding words; 2) the word tagging task ensures the sequential representation still retains the corresponding word tagging information. Besides, considering the most direct way to introduce sentiment information into models as additional knowledge, we further incorporate the additional knowledge enhancing tagging task model to strengthen the effect of sentiment commonsense. We prove the effectiveness of the sentiment commonsense by extensive experiments. The results show that our models exhibit consistent superiority over competitors on three real-world datasets. Specifically, we obtain an accuracy of 55.2%, which is a new state-of-the-art for the SST-fine dataset.

Interactive Multi-Grained Joint Model for Targeted Sentiment Analysis

In this paper, we propose an interactive multi-grained joint model for targeted sentiment analysis. Firstly, different from previous works, we leverage the correlation between target and sentiment clues and deeply strengthen interaction between them because targets are highly related to the sentiment clues in a sentence. Moreover, we apply a multi-layer structure to consider multi-grained target and sentiment tagging information more comprehensively. Also, we design two specific loss functions to prevent a word from being both part of a target and a sentiment clue simultaneously, and to align the boundary information of two labeling subsystems. We conduct experiments on English and Spanish datasets and the experimental results show that our approach substantially outperforms a variety of previous models and achieves new state-of-the-art results on these datasets.

Beyond word2vec: Distance-graph Tensor Factorization for Word and Document Embeddings

The word2vec methodology, such as Skip-gram and CBOW, has seen significant interest in recent years because of its ability to model semantic notions of word similarity and distances in sentences. A related methodology, referred to as doc2vec, is also able to embed sentences and paragraphs. These methodologies, however, lead to different embeddings that cannot be related to one another. In this paper, we present a tensor factorization methodology, which simultaneously embeds words and sentences into latent representations in one shot. Furthermore, these latent representations are concretely related to one another via tensor factorization. Whereas word2vec and doc2vec are dependent on the use of contextual windows in order to create the projections, our approach treats each document as a structural graph on words. Therefore, all the documents in the corpus are jointly factorized in order to simultaneously create an embedding for the individual documents and the words. Since the graphical representation of a document is much richer than a contextual window, the approach is capable of designing more powerful representations than those using the word2vec family of methods. We use a carefully designed negative sampling methodology to provide an efficient implementation of the approach. We relate the approach to factorization machines, which provides an efficient alternative for its implementation. We present experimental results illustrating the effectiveness of the approach for document classification, information retrieval and visualization.

SESSION: Long - Natural Language Processing II

Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach

Hierarchical multi-label text classification (HMTC) is a fundamental but challenging task of numerous applications (e.g., patent annotation), where documents are assigned to multiple categories stored in a hierarchical structure. Categories at different levels of a document tend to have dependencies. However, the majority of prior studies for the HMTC task employ classifiers to either deal with all categories simultaneously or decompose the original problem into a set of flat multi-label classification subproblems, ignoring the associations between texts and the hierarchical structure and the dependencies among different levels of the hierarchical structure. To that end, in this paper, we propose a novel framework called Hierarchical Attention-based Recurrent Neural Network (HARNN) for classifying documents into the most relevant categories level by level via integrating texts and the hierarchical category structure. Specifically, we first apply a documentation representing layer for obtaining the representation of texts and the hierarchical structure. Then, we develop a hierarchical attention-based recurrent layer to model the dependencies among different levels of the hierarchical structure in a top-down fashion. Here, a hierarchical attention strategy is proposed to capture the associations between texts and the hierarchical structure. Finally, we design a hybrid method which is capable of predicting the categories of each level while classifying all categories in the entire hierarchical structure precisely. Extensive experimental results on two real-world datasets demonstrate the effectiveness and explanatory power of HARNN.

A Semantics Aware Random Forest for Text Classification

Random Forest (RF) classifiers are suitable for dealing with the high dimensional noisy data in text classification. An RF model comprises a set of decision trees, each of which is trained using random subsets of features. Given an instance, the prediction by the RF is obtained via majority voting of the predictions of all the trees in the forest. However, different test instances would have different values for the features used in the trees and the trees should contribute differently to the predictions. This diverse contribution of the trees is not considered in traditional RFs. Many approaches have been proposed to model the diverse contributions by selecting a subset of trees for each instance. This paper is among these approaches. It proposes a Semantics Aware Random Forest (SARF) classifier. SARF extracts the features used by trees to generate the predictions and selects a subset of the predictions for which the features are relevant to the predicted classes. We evaluated SARF's classification performance on 30 real-world text datasets and assessed its competitiveness with state-of-the-art ensemble selection methods. The results demonstrate the superior performance of the proposed approach in textual information retrieval and initiate a new direction of research to utilise interpretability of classifiers.
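The contrast between plain majority voting and voting over a per-instance subset of trees can be sketched as follows. The relevance flags are a hypothetical input: how SARF actually decides which tree predictions are relevant is not described in the abstract, so this is only an illustration of the general idea of ensemble selection.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent class label among the tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

def selective_vote(predictions, relevant):
    """Vote only among trees flagged as relevant for this instance.

    relevant: one boolean per tree (assumed to be supplied by some
    relevance criterion). Falls back to all trees if none is flagged.
    """
    chosen = [p for p, keep in zip(predictions, relevant) if keep]
    return majority_vote(chosen or predictions)
```

For example, with predictions `["sports", "politics", "sports", "politics", "politics"]`, plain voting returns `"politics"`, while keeping only the first and third trees returns `"sports"`.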

Federated Topic Modeling

Topic modeling has been widely applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of in-domain data, in order to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence training topic models in industrial scenarios using conventional approaches faces a dilemma: a party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this paper, we propose a novel framework named Federated Topic Modeling (FTM), in which multiple parties collaboratively train a high-quality topic model while simultaneously alleviating data scarcity and remaining immune to privacy adversaries. FTM is inspired by federated learning and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization and heterogeneous model integration. We conduct a series of quantitative evaluations to verify the effectiveness of FTM and deploy FTM in an Automatic Speech Recognition (ASR) system to demonstrate its utility in real-life applications. Experimental results verify FTM's superiority over conventional topic modeling.

Multi-Turn Response Selection in Retrieval-Based Chatbots with Iterated Attentive Convolution Matching Network

Building an intelligent chatbot with multi-turn dialogue ability is a major challenge, which requires understanding the multi-view semantic and dependency correlation among words, n-grams and sub-sequences. In this paper, we investigate selecting the proper response for a context through multi-grained representation and interactive matching. To construct hierarchical representation types of text segments, we propose a refined architecture which exclusively consists of gated dilated-convolution and self-attention. Compared with the recurrent-based sentence modeling methods, this architecture provides more flexibility and a speedup. The matching signals of each utterance-response pair are extracted by integrating the interactive information from different views. Then a turns-aware attention mechanism is utilized to aggregate the matching sequence, so as to identify important utterances and capture the implicit relationship of the whole context. Experiments on two large-scale public data sets show that our model significantly outperforms the state-of-the-art methods in terms of all metrics. We empirically provide a thorough ablation test, as well as the comparison of different representation and matching strategies, for a better insight into how each component affects the performance of the model.

Sentiment Lexicon Enhanced Neural Sentiment Classification

Sentiment classification is an important task in the sentiment analysis field. Many deep learning based sentiment classification methods have been proposed in recent years. However, these methods usually rely on massive labeled texts to train sentiment classifiers, which are expensive and time-consuming to annotate. Luckily, many high-quality sentiment lexicons have been constructed and can cover a large number of sentiment words. Since sentiment words are the basic units to convey sentiments in texts, these sentiment lexicons have the potential to improve the performance of neural sentiment classification. In this paper, we propose two approaches to exploit sentiment lexicons to enhance neural sentiment classification. In our first approach we use sentiment lexicons to learn sentiment-aware attentions. We propose a word sentiment classification task to classify the sentiments of words in a sentence based on their hidden representations in the attention network of neural sentiment classification models. We jointly train this task with neural sentiment classifier to facilitate the attention network to recognize and highlight sentiment-bearing words. In our second approach we use sentiment lexicons to learn sentiment-aware word embeddings. We design an auxiliary task to classify the sentiments of words in sentiment lexicons based on their word embeddings, and jointly train this task with neural sentiment classifier to encode sentiment information in sentiment lexicons to word embeddings. Extensive experiments on three benchmark datasets validate the effectiveness of our approach.

SESSION: Long - Deep Neural Network I

ResumeGAN: An Optimized Deep Representation Learning Framework for Talent-Job Fit via Adversarial Learning

Nowadays, it is popular to utilize online recruitment services for talent recruitment and job recommendation. Given the vast amounts of online talent profiles and job posts, it is labor-intensive and exhausting for recruiters to manually select only a few potential candidates for further consideration, and also nontrivial for talents to find the best-matched job positions. Recently, some deep learning-based approaches have been developed to automatically match talent resumes and job requirements, and have achieved encouraging performance. In this paper, we propose a novel framework that targets the same task, but integrates different types of information in a more sophisticated way and introduces adversarial learning to learn more expressive representations. In addition, we build a dataset for model evaluation, and the effectiveness of our framework is demonstrated by extensive experiments.

Regularizing Deep Neural Networks by Ensemble-based Low-Level Sample-Variances Method

Deep Neural Networks (DNNs) with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Until now, many regularizers such as dropout and data augmentation have been proposed to prevent overfitting. Motivated by ensemble learning, we treat each hidden layer in neural networks as an ensemble of base learners by dividing hidden units into non-overlapping groups, with each group considered as a base learner. Based on the theoretical analysis of the generalization error of ensemble estimators (bias-variance-covariance decomposition), we find that the variance of each base learner plays an important role in preventing overfitting and propose a novel regularizer, the Ensemble-based Low-Level Sample-Variances Method (ELSM), to encourage each base learner of hidden layers to have a low-level sample-variance. Experiments across a number of datasets and network architectures show that ELSM can effectively reduce overfitting and improve the generalization ability of DNNs.

Attention-Residual Network with CNN for Rumor Detection

Wide dissemination of unverified claims has a negative influence on social life. Rumors emerge and spread easily in crowds, especially in Online Social Networks (OSNs), owing to their openness and large number of users. Rumor detection in OSNs is therefore a challenging and urgent issue. In this paper, we propose an Attention-Residual network combined with CNN (ARC), which relies on content features for rumor detection. First, we build a data encoding model based on word-level data for contextual feature representation. Second, we propose a residual framework based on a fine-tuned attention mechanism to capture long-range dependencies. Third, we apply a convolutional neural network with varying window sizes to select important components and local features. Experiments on two Twitter datasets demonstrate that the proposed model outperforms other content-based methods in both rumor detection and early rumor verification. To the best of our knowledge, this is the first work that utilizes an attention model in conjunction with a residual network for rumor detection.

Imbalance Rectification in Deep Logistic Regression for Multi-Label Image Classification Using Random Noise Samples

Logistic regression (LR) is the most commonly used loss function in multi-label image classification. However, it suffers from the class imbalance problem caused by the huge difference in quantity between positive and negative samples as well as between different classes. First, we find that feeding randomly generated noise samples into an LR classifier is an effective way to detect class imbalances, and we further define an informative imbalance metric named inference tendency based on noise sample analysis. Second, we design an efficient moving-average-based method for calculating inference tendency, which can easily be done during training with negligible overhead. Third, two novel rectification methods called extremum shift (ES) and tendency constraint (TC) are designed to offset or constrain inference tendency in the loss function and mitigate class imbalances significantly. Finally, comparative experiments with ResNet on Microsoft COCO, NUS-WIDE and DeepFashion demonstrate the effectiveness of inference tendency and the superiority of our approach over the baseline LR and several state-of-the-art alternatives.

CamDrop: A New Explanation of Dropout and A Guided Regularization Method for Deep Neural Networks

To force convolutional networks to explore more discriminative evidence throughout spatial regions, this paper presents a novel method, CamDrop, that improves on conventional dropout in two respects. First, by considering the intensity of class activation mapping (CAM) across the whole feature map, CamDrop selectively abandons specific spatial regions within the predominant visual patterns at each iteration. In many classification tasks, CamDrop demonstrates its effectiveness and achieves considerable improvements in the robustness of predictions on adversarial examples. Second, although dropout is a widely adopted technique for regularizing large models, its performance improvement is always attributed to better prevention of overfitting in DNNs. Here we give a new explanation of dropout from the perspective of optimization: it makes the upper bound on the magnitude of the gradients much tighter, which leads to more stable gradient behavior and effectively keeps neurons from falling into the saturation region of the nonlinear activation, even when high learning rates are used. Extensive experiments have been performed to demonstrate these two strengths of CamDrop.

SESSION: Long - Deep Neural Network II

Dynamic Collaborative Recurrent Learning

In this paper, we provide a unified learning algorithm, dynamic collaborative recurrent learning (DCRL), for two directions of recommendation: temporal recommendation, which focuses on tracking the evolution of users' long-term preferences, and sequential recommendation, which focuses on capturing short-term preferences within a short time window. DCRL is built on an RNN and a State Space Model (SSM), and thus it is able not only to collaboratively capture users' short-term and long-term preferences, as in sequential recommendation, but also to dynamically track the evolution of users' long-term preferences, as in temporal recommendation, within a unified framework. In addition, we introduce two scalable inference algorithms, smoothing and filtering, for DCRL's offline and online learning, respectively. Both are based on amortized variational inference, allowing us to effectively train the model jointly over all time. Experiments demonstrate that DCRL outperforms temporal and sequential recommender models, and that it does capture users' short-term preferences and track the evolution of their long-term preferences.

AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming for domain experts to hand-craft and impossible to enumerate. Therefore, there have been efforts to find low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations. In this paper, we propose an effective and efficient method called AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general and can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. Code is available at: \url

Automatic Construction of Multi-layer Perceptron Network from Streaming Examples

Autonomous construction of deep neural networks (DNNs) is desired for data streams because it potentially offers two advantages: proper model capacity and quick reaction to drift and shift. While the self-organizing mechanism of DNNs remains an open issue, it is even more challenging to develop for standard multi-layer DNNs than for different-depth structures, because the addition of a new layer results in the loss of previously trained knowledge. A Neural Network with Dynamically Evolved Capacity (NADINE) is proposed in this paper. NADINE features a fully open structure in which the network structure, both depth and width, can be automatically evolved from scratch in an online manner and without the use of problem-specific thresholds. NADINE is structured under a standard MLP architecture, and the catastrophic forgetting issue during the hidden layer addition phase is resolved using the proposed soft-forgetting and adaptive memory methods. The advantages of NADINE, namely its elastic structure and online learning trait, are numerically validated on nine data stream classification and regression problems, where it demonstrates a performance improvement over prominent algorithms in all problems. In addition, it is capable of dealing with data stream regression and classification problems equally well.

Robust Embedded Deep K-means Clustering

Deep neural network clustering is superior to the conventional clustering methods due to deep feature extraction and nonlinear dimensionality reduction. Nevertheless, deep neural network leads to a rough representation regarding the inherent relationship of the data points. Therefore, it is still difficult for deep neural network to exploit the effective structure for direct clustering. To address this issue, we propose a robust embedded deep K-means clustering (RED-KC) method. The proposed RED-KC approach utilizes the δ-norm metric to constrain the feature mapping process of the auto-encoder network, so that data are mapped to a latent feature space, which is more conducive to the robust clustering. Compared to the existing auto-encoder networks with the fixed prior, the proposed RED-KC is adaptive during the process of feature mapping. More importantly, the proposed RED-KC embeds the clustering process with the auto-encoder network, such that deep feature extraction and clustering can be performed simultaneously. Accordingly, a direct and efficient clustering could be obtained within only one step to avoid the inconvenience of multiple separate stages, namely, losing pivotal information and correlation. Consequently, extensive experiments are provided to validate the effectiveness of the proposed approach.

SESSION: Long - Network Science

Discovering Interesting Cycles in Directed Graphs

Cycles in graphs often signify interesting processes. For example, cyclic trading patterns can indicate inefficiencies or economic dependencies in trade networks, cycles in food webs can identify fragile dependencies in ecosystems, and cycles in financial transaction networks can be an indication of money laundering. Identifying such interesting cycles, which can also be constrained to contain a given set of query nodes, although not extensively studied, is thus a problem of considerable importance. In this paper, we introduce the problem of discovering interesting cycles in graphs. We first address the problem of quantifying the extent to which a given cycle is interesting for a particular analyst. We then show that finding cycles according to this interestingness measure is related to the longest cycle and maximum mean-weight cycle problems (in the unconstrained setting) and to the maximum Steiner cycle and maximum mean Steiner cycle problems (in the constrained setting). A complexity analysis shows that finding interesting cycles is NP-hard, and is NP-hard to approximate within a constant factor in the unconstrained setting, and within a factor polynomial in the input size for the constrained setting. The latter inapproximability result implies a similar result for the maximum Steiner cycle and maximum mean Steiner cycle problems. Motivated by these hardness results, we propose a number of efficient heuristic algorithms. We verify the effectiveness of the proposed methods and demonstrate their practical utility on two real-world use cases: a food web and an international trade-network dataset.

FLEET: Butterfly Estimation from a Bipartite Graph Stream

We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.
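For reference, the quantity being estimated can be computed exactly on a small graph. The following is a minimal sketch of an exact, in-memory butterfly counter, not the FLEET streaming estimators described in the abstract; the function name `count_butterflies` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exactly count butterflies (2x2 bicliques) in a bipartite graph.

    `edges` is an iterable of (left_node, right_node) pairs. This is an
    illustrative exact baseline, not a streaming estimator.
    """
    # Adjacency sets from each left node to its right neighbours.
    neighbors = defaultdict(set)
    for left, right in edges:
        neighbors[left].add(right)

    # For every pair of right nodes, count the left nodes adjacent to both.
    common = defaultdict(int)
    for rights in neighbors.values():
        for pair in combinations(sorted(rights), 2):
            common[pair] += 1

    # A pair of right nodes with c common left neighbours closes C(c, 2) butterflies.
    return sum(c * (c - 1) // 2 for c in common.values())


# Three left nodes all connected to the same two right nodes.
example_edges = [
    ("u1", "v1"), ("u1", "v2"),
    ("u2", "v1"), ("u2", "v2"),
    ("u3", "v1"), ("u3", "v2"),
]
print(count_butterflies(example_edges))
```

This exact approach stores the whole graph, which is what a space-efficient single-pass estimator is designed to avoid.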

Selecting the Optimal Groups: Efficiently Computing Skyline k-Cliques

In many applications, graphs involve nodes with multi-dimensional numerical attributes, and it is desirable to retrieve a group of nodes that are both highly connected (e.g., a clique) and optimal according to some ranking functions. It is well known that the skyline returns candidates for the optimal objects when ranking functions are not specified. Motivated by this, in this paper we formulate the novel model of skyline k-cliques over multi-valued attributed graphs and develop efficient algorithms to conduct the computation. To verify the group-based dominance between two k-cliques, we make use of maximum bipartite matching and develop a set of optimization techniques to improve the verification efficiency. Then, a progressive computation algorithm is developed which enumerates the k-cliques in an order such that a k-clique is guaranteed not to be dominated by those generated after it. Novel pruning and early termination techniques are developed to exclude unpromising nodes or cliques by investigating the structural and attribute properties of the multi-valued attributed graph. Empirical studies on four real datasets demonstrate the effectiveness of the skyline k-clique model and the efficiency of the novel computing techniques.
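The skyline notion that the abstract builds on can be illustrated for individual attribute vectors. The sketch below covers only the classic point-level skyline; it does not implement the group-based dominance between k-cliques or the matching-based verification from the paper. The names `dominates` and `skyline`, and the convention that larger values are better, are assumptions.

```python
def dominates(a, b):
    """Return True if `a` dominates `b`: at least as good in every
    dimension and strictly better in at least one (larger is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that are not dominated by any other point.

    Quadratic-time illustration; real systems use pruning and indexing.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(5, 1), (3, 3), (1, 5), (2, 2), (4, 1)]
print(skyline(points))
```

Here `(2, 2)` is dominated by `(3, 3)` and `(4, 1)` by `(5, 1)`, so only the remaining three points are candidates for any monotone ranking function.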

Balance in Signed Bipartite Networks

A large portion of today's big data can be represented as networks. However, not all networks are the same, and for many that have additional complexities in their structure, traditional general network analysis methods are no longer applicable. For example, signed networks contain both positive and negative links, and thus dedicated theories and algorithms have been developed for them. However, previous work mainly focuses on the unipartite setting, where signed links can connect any pair of nodes. Signed bipartite networks, on the one hand, are commonly found but have largely been overlooked. Their complexity, namely having two node types where signed links can only form across the two sets, introduces challenges that prevent most existing literature on unipartite signed networks and unsigned bipartite networks from being applied. On the other hand, balance theory, a key signed social theory, has been generally defined for cycles of any length and is used in the form of triangles for numerous unipartite signed network tasks. However, in bipartite networks there are no triangles, and furthermore there exist two types of nodes. Therefore, in this work, we conduct the first comprehensive analysis and validation of balance theory using the smallest cycle in signed bipartite networks, the signed butterfly (i.e., a cycle of length 4 containing the two node types). Then, to investigate whether balance theory can aid signed bipartite network tasks, we develop multiple sign prediction methods that utilize balance theory in the form of signed butterflies. Our sign prediction experiments on three real-world signed bipartite networks not only demonstrate the effectiveness of using signed butterflies for sign prediction, but also pave the way for improvements in other signed bipartite network analysis tasks.
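Under the standard formulation of balance theory for cycles, a cycle is balanced when it contains an even number of negative links, which is equivalent to the product of its edge signs being positive. A minimal sketch of that check for a signed butterfly follows; the function name and the +1/-1 sign encoding are assumptions, and this is not one of the paper's sign prediction methods.

```python
def is_balanced_butterfly(signs):
    """Check whether a signed butterfly (length-4 cycle) is balanced.

    `signs` holds the four edge signs as +1 or -1. The cycle is balanced
    when the product of the signs is positive, i.e. when it has an even
    number of negative edges.
    """
    if len(signs) != 4 or any(s not in (1, -1) for s in signs):
        raise ValueError("expected exactly four signs, each +1 or -1")
    product = 1
    for s in signs:
        product *= s
    return product > 0


print(is_balanced_butterfly([1, 1, 1, 1]))    # no negative edges
print(is_balanced_butterfly([1, -1, -1, 1]))  # two negative edges
print(is_balanced_butterfly([1, 1, 1, -1]))   # one negative edge
```

A simple balance-based heuristic for sign prediction would pick the sign for a missing edge that makes more of its surrounding butterflies balanced; that heuristic is mentioned only as an illustration.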

Adaptive Algorithms for Estimating Betweenness and k-path Centralities

Betweenness centrality and k-path centrality are two important indices that are widely used to analyze social, technological and information networks. In this paper, given a directed network G and a vertex $r\in V(G)$, we first present a novel adaptive algorithm for estimating the betweenness score of r. Our algorithm first computes two subsets of the vertex set of G, called $\mathcal{RF}(r)$ and $\mathcal{RT}(r)$. They define the sample spaces of the start-points and the end-points of the samples. Then, it adaptively samples from $\mathcal{RF}(r)$ and $\mathcal{RT}(r)$ and stops as soon as some condition is satisfied. The stopping condition depends on the samples seen so far, $|\mathcal{RF}(r)|$ and $|\mathcal{RT}(r)|$. We show that, compared to well-known existing algorithms, our algorithm gives a better $(\lambda,\delta)$-approximation. Then, we propose a novel algorithm for estimating the k-path centrality of r. Our algorithm is based on computing two sets $\mathcal{RF}(r)$ and $\mathcal{D}(r)$. While $\mathcal{RF}(r)$ defines the sample space of the source vertices of the sampled paths, $\mathcal{D}(r)$ defines the sample space of the other vertices of the paths. We show that in order to give a $(\lambda,\delta)$-approximation of the k-path score of r, our algorithm requires considerably fewer samples. Moreover, it processes each sample faster and with less memory. Finally, we empirically evaluate our proposed algorithms and show their superior performance. We also show that they can be used to efficiently compute the centrality scores of a set of vertices.
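The abstract does not spell out the approximation guarantee. In the usual sense of such sampling guarantees, and assuming that is the sense intended here, an estimate is within additive error $\lambda$ of the true score with probability at least $1-\delta$. The symbols $BC(r)$ for the true score and $\widehat{BC}(r)$ for the estimate are introduced only for this illustration.

```latex
% Assumed (standard) meaning of a (\lambda,\delta)-approximation:
% the estimate deviates from the true score by more than \lambda
% with probability at most \delta.
\Pr\left[\,\bigl|\widehat{BC}(r) - BC(r)\bigr| > \lambda\,\right] \le \delta
```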

SESSION: Long - Online and Real-Time

Interactive Variance Attention based Online Spoiler Detection for Time-Sync Comments

The time-sync comment (TSC), a new form of interactive comment, has become increasingly popular on Chinese video websites. By posting TSCs, people can easily express their feelings and exchange opinions with others while watching online videos. However, some spoilers appear among the TSCs. These spoilers reveal crucial plot points and ruin the surprise for people watching a video for the first time. In this paper, we propose a novel Similarity-Based Network with Interactive Variance Attention (SBN-IVA) to classify comments as spoilers or non-spoilers. In this framework, we first extract textual features of TSCs through a word-level attentive encoder. We then design a Similarity-Based Network (SBN) to acquire neighbor and keyframe similarity according to the semantic similarity and timestamps of TSCs. Next, we implement Interactive Variance Attention (IVA) to eliminate the impact of noise comments. Finally, we obtain the likelihood of a spoiler based on the difference between the neighbor and keyframe similarity. Experiments show that the F1-score of SBN-IVA is on average 11.2% higher than that of the state-of-the-art baseline methods.

Detecting Malicious Accounts in Online Developer Communities Using Deep Learning

Online developer communities like GitHub provide services such as distributed version control and task management, which allow a massive number of developers to collaborate online. However, the openness of these communities makes them vulnerable to different types of malicious attacks, since attackers can easily join and interact with legitimate users. In this work, we formulate the malicious account detection problem in online developer communities and propose GitSec, a deep learning-based solution to detect malicious accounts. GitSec distinguishes malicious accounts from legitimate ones based on account profiles as well as dynamic activity characteristics. On the one hand, GitSec makes use of users' descriptive features from their profiles. On the other hand, GitSec processes users' dynamic behavioral data by constructing two user activity sequences and applying a parallel neural network design to handle each of them. An attention mechanism is used to integrate the information generated by the parallel neural networks. The final judgement is made by a decision maker implemented as a supervised machine learning-based classifier. Based on real-world data of GitHub users, our extensive evaluations show that GitSec is an accurate detection system, with an F1-score of 0.922 and an AUC value of 0.940.

Exploring Multi-Objective Exercise Recommendations in Online Education Systems

Recommending suitable exercises to students in an online education system is highly useful. Existing approaches usually rely on machine learning techniques to mine the large amounts of student interaction log data accumulated in these systems in order to select the most suitable exercises for each student. Generally, they aim to optimize a single objective, i.e., recommending non-mastered exercises to address students' immediate weaknesses. While this is a reasonable objective, there are further beneficial objectives in the long-term learning process that need to be addressed, including Review & Explore, Smoothness of difficulty level, and Engagement. In this paper, we propose a novel Deep Reinforcement learning framework, namely DRE, for adaptively recommending Exercises to students while optimizing the above three objectives. In the framework, we propose two different Exercise Q-Networks for the agent, i.e., EQNM and EQNR, to generate recommendations following the Markov property and in a Recurrent manner, respectively. We also propose novel reward functions to formally quantify the three objectives so that DRE can update and optimize its recommendation strategy by interactively receiving students' performance feedback (e.g., scores). We conduct extensive experiments on two real-world datasets. Experimental results clearly show that the proposed DRE can effectively learn from student interaction data to optimize multiple objectives in a single unified framework and adaptively recommend suitable exercises to students.

Into the Battlefield: Quantifying and Modeling Intra-community Conflicts in Online Discussion

Over the last decade, online forums have become primary news sources for readers around the globe, and social media platforms are the space where these news forums find most of their audience and engagement. Our particular focus in this paper is to study conflict dynamics over online news articles on Reddit, one of the most popular online discussion platforms. We choose to study how conflicts develop around news inside a discussion community, the r/news subreddit. Mining the characteristics of these engagements often provides useful insights into the behavioral dynamics of large-scale human interactions. Such insights are useful for many reasons: for news houses to improve their publishing strategies and reach their potential audience, for data analytics to gain better insight into media engagement, and for social media platforms to avoid unnecessary and perilous conflicts. In this work, we present a novel quantification of conflict in online discussion. Unlike previous studies on conflict dynamics, which model conflict as a binary phenomenon, our measure is continuous-valued, and we validate it against manually annotated ratings. We address a two-way prediction task. First, we predict the probable degree of conflict a news article will face from its audience. We employ multiple machine learning frameworks for this task using various features extracted from news articles. Second, given a pair of users and their interaction history, we predict whether their future engagement will result in a conflict. We fuse textual and network-based features using a support vector machine, which achieves an AUC of 0.89. Moreover, we implement a graph convolutional model that exploits the engagement histories of users to predict whether a pair of users who have never interacted before will have a conflicting interaction, with an AUC of 0.69.
We perform our studies on a massive discussion dataset crawled from the Reddit news community, containing over 41k news articles and 5.5 million comments. Apart from the prediction tasks, our studies offer interesting insights into conflict dynamics: how users form clusters based on conflicting engagements, how the temporal nature of conflict differs across online news forums, how different language-based features contribute to inducing conflict, etc. In short, our study paves the way towards new methods of exploring and modeling conflict dynamics inside online discussion communities.

Offline and Online Satisfaction Prediction in Open-Domain Conversational Systems

Predicting user satisfaction in conversational systems has become critical, as spoken conversational assistants operate in increasingly complex domains. Online satisfaction prediction (i.e., predicting the user's satisfaction with the system after each turn) could be used as a new proxy for implicit user feedback, and offers promising opportunities to create more responsive and effective conversational agents that adapt to the user's engagement with the agent. To accomplish this goal, we propose a conversational satisfaction prediction model specifically designed for open-domain spoken conversational agents, called ConvSAT. To operate robustly across domains, ConvSAT aggregates multiple representations of the conversation, namely the conversation history, utterance and response content, and system- and user-oriented behavioral signals. We first calibrate ConvSAT's performance against state-of-the-art methods on a standard dataset (Dialogue Breakdown Detection Challenge) in an online regime, and then evaluate ConvSAT on a large dataset of conversations with real users, collected as part of the Alexa Prize competition. Our experimental results show that ConvSAT significantly improves satisfaction prediction in both offline and online settings on both datasets, compared to previously reported state-of-the-art approaches. The insights from our study can enable more intelligent conversational systems, which could adapt in real time to the inferred user satisfaction and engagement.

SESSION: Long - Privacy

Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis

Tensor factorization has been demonstrated to be an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted into concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results, which could reveal sensitive patient information. The challenge, therefore, is how to jointly decompose the tensor under rigorous and principled privacy constraints while still supporting the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHRs. It combines advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR databases private while collaboratively learning meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact handles heterogeneous patient populations using a structured sparsity term. In our framework, each hospital decomposes its local tensors and, every several iterations, sends the updated intermediary results with output perturbation to a semi-trusted server, which generates the phenotypes. The evaluation on both real-world and synthetic datasets demonstrates that, under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.

Achieve Privacy-Preserving Truth Discovery in Crowdsensing Systems

To address the unreliability of data collected in crowdsensing systems, a large number of truth discovery protocols have been proposed. However, most of them neglect privacy protection in crowdsensing systems. The truth discovery protocols that do consider privacy provide only limited protection, such as protecting only the privacy of the collected data. To bridge this gap, in this paper we propose a more comprehensive privacy-preserving truth discovery protocol that simultaneously protects the privacy of participants and of the truth results. Specifically, our protocol encrypts participants' observed data using the Paillier homomorphic cryptosystem. Then, through the interaction between two servers, we calculate participants' weights and estimate the truth results in the encrypted domain. Moreover, the privacy of sensitive data exchanged between the two servers is protected in our protocol by a data perturbation technique. Theoretical analysis and experimental results demonstrate that our protocol can effectively protect the privacy of participants and truth results without losing accuracy in the truth results.
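The property of the Paillier cryptosystem that makes computation in the encrypted domain possible is its additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and raising a ciphertext to a constant yields an encryption of the scaled plaintext. The sketch below is a toy illustration of that property only, with tiny hard-coded primes that are not secure; it is not the two-server protocol from the abstract, and all function names are assumptions. It requires Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two primes, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lambda mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Decrypt ciphertext c with private key (lambda, mu)."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Ciphertext of the sum of the two underlying plaintexts."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Ciphertext of k times the underlying plaintext."""
    return pow(c, k, n * n)


rng = random.Random(0)
n, private_key = keygen(293, 433)  # toy primes, for illustration only
c1, c2 = encrypt(n, 20, rng), encrypt(n, 22, rng)
print(decrypt(n, private_key, add_encrypted(n, c1, c2)))
print(decrypt(n, private_key, scale_encrypted(n, c1, 3)))
```

Sums and constant multiples are the building blocks of a weighted aggregate, which is why an additively homomorphic scheme suits weight and truth estimation over encrypted observations.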

Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas

With the rapid development of artificial intelligence (AI), ethical issues surrounding AI have attracted increasing attention. In particular, autonomous vehicles may face moral dilemmas in accident scenarios, such as staying the course resulting in hurting pedestrians or swerving leading to hurting passengers. To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision. Although a useful methodology for building ethical AI systems, such an approach can potentially violate the privacy of voters since moral preferences are sensitive information and their disclosure can be exploited by malicious parties resulting in negative consequences. In this paper, we report a first-of-its-kind privacy-preserving crowd-guided AI decision-making approach in ethical dilemmas. We adopt the formal and popular notion of differential privacy to quantify privacy, and consider four granularities of privacy protection by taking voter-/record-level privacy protection and centralized/distributed perturbation into account, resulting in four approaches VLCP, RLCP, VLDP, and RLDP, respectively. Moreover, we propose different algorithms to achieve these privacy protection granularities, while retaining the accuracy of the learned moral preference model. Specifically, VLCP and RLCP are implemented with the data aggregator setting a universal privacy parameter and perturbing the averaged moral preference to protect the privacy of voters' data. VLDP and RLDP are implemented in such a way that each voter perturbs her/his local moral preference with a personalized privacy parameter. 
Extensive experiments based on both synthetic data and real-world data of voters' moral decisions demonstrate that the proposed approaches achieve high accuracy of preference aggregation while protecting individual voter's privacy.
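The centralized setting described above, in which an aggregator perturbs the averaged preference, can be sketched with the standard Laplace mechanism. This is a minimal illustration under stated assumptions, not the VLCP or RLCP algorithms themselves: it assumes each voter contributes one preference vector with entries clipped to a known range, and the function name `private_mean` and its parameters are hypothetical.

```python
import random


def private_mean(vectors, epsilon, lower, upper, rng):
    """Average per-voter vectors and add Laplace noise to each coordinate.

    Assumes one vector per voter with entries clipped to [lower, upper].
    Replacing one voter's vector changes the mean by at most
    d * (upper - lower) / n in L1 norm, which sets the noise scale.
    """
    n, d = len(vectors), len(vectors[0])
    clipped = [[min(max(x, lower), upper) for x in v] for v in vectors]
    mean = [sum(v[j] for v in clipped) / n for j in range(d)]
    scale = d * (upper - lower) / (n * epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return [m + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for m in mean]


rng = random.Random(0)
voters = [[0.2, 0.9], [0.4, 0.7], [0.3, 0.8], [0.5, 0.6]]
print(private_mean(voters, epsilon=1.0, lower=0.0, upper=1.0, rng=rng))
```

In a distributed variant, each voter would instead add noise locally with her/his own privacy parameter before sending anything to the aggregator.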

Privacy Preserving Approximate K-means Clustering

Privacy-preserving computation is of utmost importance in a cloud computing environment, where a client often needs to send sensitive data over untrusted networks to servers offering computing services. Eavesdropping on the network or malware on the server may leak sensitive information from the data. To prevent this, we propose to encode the input data in such a way that, first, it is difficult to decode it back to the true data, and second, the computational results obtained with the encoded data are not substantially different from those obtained with the true data. Specifically, the computation that we focus on is K-means clustering, which is widely used for many data mining tasks. Our proposed variant of the K-means algorithm is privacy-preserving in the sense that it requires only binary encoded data as input and is not allowed to access the true data vectors at any stage of the computation. During the intermediate stages of the K-means computation, our algorithm is able to effectively process inputs with incomplete information, seeking to yield outputs relatively close to those of the complete-information (non-encoded) case. Evaluations on real datasets show that the proposed method yields clustering effectiveness comparable to that of the standard K-means algorithm on image clustering (MNIST-8M dataset), and in fact outperforms standard K-means on text clustering (ODPtweets dataset).

Practical Access Pattern Privacy by Combining PIR and Oblivious Shuffle

We consider the following secure data retrieval problem: a client outsources encrypted data blocks to a semi-trusted cloud server and later retrieves blocks without disclosing access patterns. Existing PIR and ORAM solutions suffer from serious performance bottlenecks in terms of communication or computation costs. To help fill this gap, we introduce "access pattern unlinkability", which separates access pattern privacy into short-term privacy at the individual query level and long-term privacy at the query distribution level. This new security definition provides tunable trade-offs between privacy and query performance. We present an efficient construction, called the SBR protocol, that uses PIR and Oblivious Shuffling to enable secure data retrieval while satisfying access pattern unlinkability. Both analytical and empirical analyses show that SBR exhibits flexibility and usability in practice.

SESSION: Long - Question Answering and Dialogue Systems I

A Hybrid Retrieval-Generation Neural Conversation Model

Intelligent personal assistant systems that are able to have multi-turn conversations with human users are becoming increasingly popular. Most previous research has focused on using either retrieval-based or generation-based methods to develop such systems. Retrieval-based methods have the advantage of returning fluent and informative responses with great diversity. However, the performance of these methods is limited by the size of the response repository. On the other hand, generation-based methods can produce highly coherent responses on any topic, but the generated responses are often generic and uninformative due to the lack of grounding knowledge. In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods. Experimental results on Twitter and Foursquare data show that the proposed model outperforms both retrieval-based methods and generation-based methods (including a recently proposed knowledge-grounded neural conversation model) under both automatic evaluation metrics and human evaluation. We hope that the findings in this study provide new insights on how to integrate text retrieval and text generation models for building conversation systems.

A Latent-Constrained Variational Neural Dialogue Model for Information-Rich Responses

Variational neural models have achieved significant progress in dialogue generation. They use an encoder-decoder architecture, with stochastic latent variables learned at the utterance level. However, latent variables are usually approximated by factorized-form distributions, whose value space is too large relative to the latent features to be encoded, leading to a sparsity problem. As a result, little useful information is carried in the latent representations, and generated responses tend to be non-committal and meaningless. To address this, we propose the Latent-Constrained Variational Neural Dialogue Model (LC-VNDM). It follows the variational neural dialogue framework, with an utterance encoder, a context encoder and a response decoder organized hierarchically. In particular, LC-VNDM uses a hierarchically structured variational distribution form, which considers inter-dependencies between latent variables. It thus defines a constrained latent value space and prevents latent global features from being diluted. Latent representations sampled from it therefore carry richer global information to facilitate decoding and generate meaningful responses. We conduct extensive experiments on three datasets using automatic evaluation and human evaluation. Experiments show that LC-VNDM significantly outperforms state-of-the-art methods and can generate more information-rich responses by learning a better-quality latent space.

Legal Summarization for Multi-role Debate Dialogue via Controversy Focus Mining and Multi-task Learning

Multi-role court debate is a critical component of a civil trial in which parties from different camps (plaintiff, defendant, witness, judge, etc.) are actively involved. Unlike other types of dialogue, court debate can be lengthy, and important information, with respect to the controversy focus(es), is often hidden within redundant and colloquial dialogue data. Summarizing court debate is a novel but significant task that can assist the judge in making the legal decision for the target trial effectively. In this work, we propose an innovative end-to-end model to address this problem. Unlike prior summarization efforts, the proposed model projects the multi-role debate into the controversy focus space, which enables high-quality extraction of essential utterance(s) in terms of legal knowledge and judicial factors. An extensive set of experiments with a large civil trial dataset shows that the proposed model provides more accurate and readable summaries than several alternatives in the multi-role court debate scenario.

ConCET: Entity-Aware Topic Classification for Open-Domain Conversational Agents

Identifying the topic (domain) of each user's utterance in open-domain conversational systems is a crucial step for all subsequent language understanding and response tasks. In particular, for complex domains, an utterance is often routed to a single component responsible for that domain. Thus, correctly mapping a user utterance to the right domain is critical. This is a challenging task: users could mention entities like actors, singers or locations to implicitly indicate the domain, which requires extensive domain knowledge to interpret. To address this problem, we introduce ConCET: a Concurrent Entity-aware conversational Topic classifier, which incorporates entity type information together with the utterance content features. Specifically, ConCET utilizes entity information to enrich the utterance representation, combining character, word, and entity type embeddings into a single representation. However, for rich domains with millions of available entities, unrealistic amounts of labeled training data would be required. To complement our model, we propose a simple and effective method for generating synthetic training data, which augments the typically limited amounts of labeled training data by using commonly available knowledge bases to generate additional labeled utterances. We extensively evaluate ConCET and our proposed training method first on an openly available human-human conversational dataset called Self-Dialogue, to calibrate our approach against previous state-of-the-art methods; second, we evaluate ConCET on a large dataset of human-machine conversations with real users, collected as part of the Amazon Alexa Prize. Our results show that ConCET significantly improves topic classification performance on both datasets, reaching 8-10% improvements compared to state-of-the-art deep learning methods. We complement our quantitative results with a detailed analysis of system performance, which could be used for further improvements of conversational agents.

An Interactive Mechanism to Improve Question Answering Systems via Feedback

Semantic parsing-based RDF question answering (QA) systems interpret users' natural language questions as query graphs and return answers over an RDF repository. However, due to the complexity of linking natural phrases with specific RDF items (e.g., entities and predicates), it remains difficult to understand users' question sentences precisely; hence QA systems may not meet users' expectations, offering wrong answers and dismissing some correct answers. In this paper, we design an Interactive Mechanism aiming for PROmotion Via users' feedback to QA systems (IMPROVE-QA), a complete framework that not only makes existing QA systems return more precise answers based on a few pieces of feedback on the original answers given by RDF QA systems, but also enhances paraphrasing dictionaries to ensure a continuous-learning capability in improving RDF QA systems. To provide better interactivity and online performance, we design a holistic graph mining algorithm (HWspan) to automatically refine the query graph. Extensive experiments on both Freebase and DBpedia confirm the effectiveness and superiority of our approach.

SESSION: Long - Question Answering and Dialogue Systems II

Attentive History Selection for Conversational Question Answering

Conversational question answering (ConvQA) is a simplified but concrete setting of conversational search. One of its major challenges is to leverage the conversation history to understand and answer the current question. In this work, we propose a novel solution for ConvQA that involves three aspects. First, we propose a positional history answer embedding method to encode conversation history with position information using BERT, a powerful technique for text representation, in a natural way. Second, we design a history attention mechanism (HAM) to conduct a "soft selection" for conversation histories. This method attends to history turns with different weights based on how helpful they are in answering the current question. Third, in addition to handling conversation history, we take advantage of multi-task learning (MTL) to do answer prediction along with another essential conversation task (dialog act prediction) using a uniform model architecture. MTL is able to learn more expressive and generic representations to improve the performance of ConvQA. We demonstrate the effectiveness of our model with extensive experimental evaluations on QuAC, a large-scale ConvQA dataset. We show that position information plays an important role in conversation history modeling. We also visualize the history attention and provide new insights into conversation history understanding.
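The "soft selection" idea in this abstract can be sketched as a softmax-weighted sum over per-turn representations. This is only an illustrative sketch: the function name `soft_select` is hypothetical, and the helpfulness scores are assumed to be given, whereas in the paper they would come from the learned attention mechanism.

```python
import math

def soft_select(history_vectors, scores):
    """Combine history-turn vectors using softmax-normalized helpfulness scores."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(history_vectors[0])
    combined = [
        sum(w * vec[j] for w, vec in zip(weights, history_vectors))
        for j in range(dim)
    ]
    return weights, combined
```

Unlike a hard selection that keeps only the top-scoring turns, every turn contributes here, with a weight that grows with its score.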

Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions. However, this challenge is not well addressed in the literature, since most of the approaches neglect the emotional information conveyed by a post while generating responses. This article addresses this problem by proposing a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post for generating more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.

Commonsense Properties from Query Logs and Question Answering Forums

Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.

Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasingly multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA and use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both tasks of classification and expert retrieval on real-world multimodal CQA data.

Message Passing for Complex Question Answering over Knowledge Graphs

Question answering over knowledge graphs (KGQA) has evolved from simple single-fact questions to complex questions that require graph traversal and aggregation. We propose a novel approach for complex KGQA that uses unsupervised message passing, which propagates confidence scores obtained by parsing an input question and matching terms in the knowledge graph to a set of possible answers. First, we identify entity, relationship, and class names mentioned in a natural language question, and map these to their counterparts in the graph. Then, the confidence scores of these mappings propagate through the graph structure to locate the answer entities. Finally, these are aggregated depending on the identified question type. This approach can be efficiently implemented as a series of sparse matrix multiplications mimicking joins over small local subgraphs. Our evaluation results show that the proposed approach outperforms the state of the art on the LC-QuAD benchmark. Moreover, we show that the performance of the approach depends only on the quality of the question interpretation results, i.e., given a correct relevance score distribution, our approach always produces a correct answer ranking. Our error analysis reveals correct answers missing from the benchmark dataset and inconsistencies in the DBpedia knowledge graph. Finally, we provide a comprehensive evaluation of the proposed approach accompanied with an ablation study and an error analysis, which showcase the pitfalls for each of the question answering components in more detail.
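To make the propagation step in this abstract more concrete, the sketch below performs one hop of confidence propagation as a sparse vector-matrix product, with the per-predicate adjacency stored as nested dictionaries. The data layout, the function name `propagate`, and the multiply-and-sum scoring rule are assumptions for illustration; the abstract does not specify the paper's exact scoring or aggregation.

```python
def propagate(entity_scores, edges, predicate_scores):
    """One message-passing hop over a sparse, per-predicate adjacency.

    entity_scores:    {entity: confidence of the entity mapping}
    edges:            {predicate: {subject: [objects]}}
    predicate_scores: {predicate: confidence of the predicate mapping}
    Returns {candidate answer: accumulated confidence}.
    """
    answers = {}
    for predicate, p_conf in predicate_scores.items():
        adjacency = edges.get(predicate, {})
        for subject, s_conf in entity_scores.items():
            # Only non-zero entries are visited, as in a sparse product.
            for obj in adjacency.get(subject, []):
                answers[obj] = answers.get(obj, 0.0) + s_conf * p_conf
    return answers
```

For example, with entity confidences {"Berlin": 0.9, "Bern": 0.1} and a single predicate "capitalOf" of confidence 1.0, the candidate reached from "Berlin" receives the higher score.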

SESSION: Long - Recommendation System I

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users' historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to limitations including: (a) unidirectional architectures restrict the power of hidden representation in users' behavior sequences; (b) they often assume a rigidly ordered sequence, which is not always practical. To address these limitations, we propose a sequential recommendation model called BERT4Rec, which employs deep bidirectional self-attention to model user behavior sequences. To avoid information leakage and efficiently train the bidirectional model, we adopt the Cloze objective for sequential recommendation, predicting the randomly masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model to make recommendations by allowing each item in a user's historical behaviors to fuse information from both the left and right sides. Extensive experiments on four benchmark datasets show that our model consistently outperforms various state-of-the-art sequential models.
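The Cloze objective mentioned in this abstract relies on randomly masking items in a behavior sequence and asking the model to recover them from both sides of the context. The sketch below shows only that data-preparation step; the mask token string, the function name, and the masking probability are assumptions for illustration, and the model itself is omitted.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mask_sequence(items, mask_prob, rng):
    """Randomly replace items with MASK.

    Returns the masked sequence and a {position: original item} dictionary
    holding the targets the model would be trained to predict.
    """
    masked, targets = [], {}
    for position, item in enumerate(items):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = item
        else:
            masked.append(item)
    return masked, targets
```

Because the targets sit at arbitrary positions, each prediction can condition on items both before and after the masked slot, which a left-to-right next-item objective cannot do.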

Adaptive Feature Sampling for Recommendation with Missing Content Feature Values

Most recommendation algorithms mainly make use of user history interactions in the model, but these methods often suffer from the cold-start problem (a user or item has no history information). On the other hand, content features help in cold-start scenarios for modeling new users or items. It is therefore essential to utilize content features to enhance different recommendation models. To take full advantage of content features, some models use feature interactions such as cross features, which outperform using raw features alone. However, in real-world systems, many content features are incomplete, e.g., we may know the occupation and gender of a user, but the values of other features (location, interests, etc.) are missing. This missing-feature-value (MFV) problem is harmful to model performance, especially for models that rely heavily on rich feature interactions. Unfortunately, this problem has not been well studied previously.

In this work, we propose a new adaptive "Feature Sampling" strategy to help train different models to fit distinct scenarios, whether cold-start or missing-feature-value cases. With the help of this strategy, more feature interactions can be utilized. A novel model named CC-CC is proposed. The model takes both raw features and feature interactions into consideration. It has a linear part to memorize useful variant information from the user or item contents and contexts (Content & Context Module), and a deep attentive neural module that models both content and collaborative information to enhance the generalization ability (Content & Collaborate Module). Both parts have feature interactions. The model is evaluated on two public datasets. Comparative results show that the proposed CC-CC model significantly outperforms state-of-the-art algorithms in both warm and cold scenarios (by up to 6.3%). To the best of our knowledge, this is the first clear and powerful model proposed to handle the missing-feature-value problem in deep neural network frameworks for recommender systems.

A Dynamic Co-attention Network for Session-based Recommendation

Session-based recommendation is the task of recommending the next item a user might be interested in given partially known session information, e.g., part of a session or recent historical sessions. An effective session-based recommender should be able to exploit a user's evolving preferences, which we assume to be a mixture of her short- and long-term interests. Existing session-based recommendation methods often embed a user's long-term preference into a static representation, which plays a fixed role when dealing with her current short-term interests. This is problematic because long-term preferences may be more or less important for predicting the next conversion depending on the user's short-term interests. We propose DCN-SR. DCN-SR applies a co-attention network to capture the dynamic interactions between the user's long- and short-term interaction behavior and generates co-dependent representations of the user's long- and short-term interests. For modeling a user's short-term interaction behavior, we design a CGRU network to take actions like "click", "collect" and "buy" into account. Experiments on e-commerce datasets show significant improvements of DCN-SR over state-of-the-art session-based recommendation methods, with improvements of up to 2.58% on the Tmall dataset and 3.08% on the Tianchi dataset in terms of [email protected] [email protected] improvements are 3.78% and 4.05%, respectively. We also investigate the scalability and sensitivity of DCN-SR. The improvements of DCN-SR over state-of-the-art baselines are especially noticeable for short sessions and active users with many historical interactions.

Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation

To combat fake news, researchers have mostly focused on detecting fake news, and journalists have built and maintained fact-checking sites. However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this paper we propose a deep-learning-based fact-checking URL recommender system to mitigate the impact of fake news on social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network to learn complex/semantic relationships between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms eight state-of-the-art recommendation models, achieving at least 3~5.3% improvement. Our source code and dataset are publicly available.

A Hierarchical Self-Attentive Model for Recommending User-Generated Item Lists

User-generated item lists are a popular feature of many different platforms. Examples include lists of books on Goodreads, playlists on Spotify and YouTube, collections of images on Pinterest, and lists of answers on question-answer sites like Zhihu. Recommending item lists is critical for increasing user engagement and connecting users to new items, but many approaches are designed for item-based recommendation, without careful consideration of the complex relationships between items and lists. Hence, in this paper, we propose a novel user-generated list recommendation model called AttList. Two unique features of AttList are careful modeling of (i) hierarchical user preference, which aggregates items to characterize the list that they belong to, and then aggregates these lists to estimate the user preference, naturally fitting into the hierarchical structure of item lists; and (ii) item and list consistency, through a novel self-attentive aggregation layer designed for capturing the consistency of neighboring items and lists to better model user preference. Through experiments over three real-world datasets reflecting different kinds of user-generated item lists, we find that AttList results in significant improvements in NDCG, [email protected], and [email protected] versus a suite of state-of-the-art baselines. Furthermore, all code and data are publicly available.

SESSION: Long - Recommendation System II

HAES: A New Hybrid Approach for Movie Recommendation with Elastic Serendipity

Recommendation systems provide good guidance for users to find their favorite movies from an overwhelming number of options. However, most systems excessively pursue recommendation accuracy and give rise to over-specialization, which triggers the emergence of serendipity. Hence, serendipity recommendation has received more attention in recent years, facing three key challenges: subjectivity in the definition, the lack of data, and users' floating demands for serendipity. To address these challenges, we introduce a new model called HAES, a Hybrid Approach for movie recommendation with Elastic Serendipity, to recommend serendipitous movies. Specifically, we (1) propose a more objective definition of serendipity, content difference and genre accuracy, according to the analysis on a real dataset, (2) propose a new algorithm named JohnsonMax to mitigate the data sparsity and build weak ties beneficial to finding serendipitous movies, and (3) define a novel concept of elasticity in the recommendation, to adjust the level of serendipity flexibly and reach a trade-off between accuracy and serendipity. Extensive experiments on real-world datasets show that HAES enhances the serendipity of recommendations while preserving recommendation quality, compared to several widely used methods.

DBRec: Dual-Bridging Recommendation via Discovering Latent Groups

In recommender systems, the user-item interaction data is usually sparse and not sufficient for learning comprehensive user/item representations for recommendation. To address this problem, we propose a novel dual-bridging recommendation model (DBRec). DBRec performs latent user/item group discovery simultaneously with collaborative filtering, and interacts group information with users/items for bridging similar users/items. Therefore, a user's preference over an unobserved item, in DBRec, can be bridged by the users within the same group who have rated the item, or the user-rated items that share the same group with the unobserved item. In addition, we propose to jointly learn user-user group (item-item group) hierarchies, so that we can effectively discover latent groups and learn compact user/item representations. We jointly integrate collaborative filtering, latent group discovering and hierarchical modelling into a unified framework, so that all the model parameters can be learned toward the optimization of the objective function. We validate the effectiveness of the proposed model with two real datasets, and demonstrate its advantage over the state-of-the-art recommendation models with extensive experiments.

Candidate Generation with Binary Codes for Large-Scale Top-N Recommendation

Generating the Top-N recommendations from a large corpus is computationally expensive to perform at scale. Candidate generation and re-ranking based approaches are often adopted in industrial settings to alleviate efficiency problems. However, it remains to be fully studied how well such schemes approximate complete rankings (or how many candidates are required to achieve a good approximation), and how to develop systematic approaches to generate high-quality candidates efficiently. In this paper, we investigate these questions by proposing a candidate generation and re-ranking based framework (CIGAR), which first learns a preference-preserving binary embedding for building a hash table to retrieve candidates, and then learns to re-rank the candidates using real-valued ranking models with a candidate-oriented objective. We perform a comprehensive study on several large-scale real-world datasets consisting of millions of users/items and hundreds of millions of interactions. Our results show that CIGAR significantly boosts the Top-N accuracy against state-of-the-art recommendation models, while reducing the query time by orders of magnitude. We hope that this work could draw more attention to the candidate generation problem in recommender systems.
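The two-stage pipeline outlined in this abstract (hash-table lookup on binary codes, then re-ranking with real-valued scores) can be sketched as follows. This is a toy illustration rather than the CIGAR implementation: the codes and vectors are assumed to be already learned, the one-bit-flip probing strategy is an assumption, and all function names are hypothetical.

```python
from collections import defaultdict

def build_hash_table(item_codes):
    """Bucket items by their binary code (a tuple of 0/1 bits)."""
    table = defaultdict(list)
    for item, code in item_codes.items():
        table[code].append(item)
    return table

def retrieve_candidates(table, user_code):
    """Probe the user's own bucket plus every bucket one bit flip away."""
    probes = [user_code]
    for i, bit in enumerate(user_code):
        probes.append(user_code[:i] + (1 - bit,) + user_code[i + 1:])
    found = []
    for code in probes:
        found.extend(table.get(code, []))
    return found

def rerank(candidates, user_vector, item_vectors, top_n):
    """Order candidates by a real-valued dot-product score and keep the top N."""
    def score(item):
        return sum(u * v for u, v in zip(user_vector, item_vectors[item]))
    return sorted(candidates, key=score, reverse=True)[:top_n]
```

Only the retrieved candidates are scored with the more expensive real-valued model, which is where the query-time saving over ranking the full corpus would come from.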

DTCDR: A Framework for Dual-Target Cross-Domain Recommendation

In order to address the data sparsity problem in recommender systems, in recent years, Cross-Domain Recommendation (CDR) leverages the relatively richer information from a source domain to improve the recommendation performance on a target domain with sparser information. However, each of the two domains may be relatively richer in certain types of information (e.g., ratings, reviews, user profiles, item details, and tags), and thus, if we can leverage such information well, it is possible to improve the recommendation performance on both domains simultaneously (i.e., dual-target CDR), rather than a single target domain only. To this end, in this paper, we propose a new framework, DTCDR, for Dual-Target Cross-Domain Recommendation. In DTCDR, we first extensively utilize rating and multi-source content information to generate rating and document embeddings of users and items. Then, based on Multi-Task Learning (MTL), we design an adaptable embedding-sharing strategy to combine and share the embeddings of common users across domains, with which DTCDR can improve the recommendation performance on both richer and sparser (i.e., dual-target) domains simultaneously. Extensive experiments conducted on real-world datasets demonstrate that DTCDR can significantly improve the recommendation accuracies on both richer and sparser domains and outperform the state-of-the-art single-domain and cross-domain approaches.

Recommender System Using Sequential and Global Preference via Attention Mechanism and Topic Modeling

Deep neural networks have improved the accuracy of sequential recommendation approaches, which take into account the sequential patterns of user logs, e.g., a user's purchase history. However, incorporating only the individual's recent logs may not be sufficient to properly reflect global preferences and trends across all users and items. In response, we propose a self-attentive sequential recommender system with topic modeling-based category embedding as a novel approach to exploit global information in the process of sequential recommendation. Our self-attention module effectively leverages the sequential patterns from the user's recent history. In addition, our novel category embedding approach, which utilizes the information computed by topic modeling, efficiently captures global information that the user generally prefers. Furthermore, to provide diverse recommendations as well as to prevent overfitting, our model also incorporates a vector obtained by random sampling. Experimental studies show that our model outperforms state-of-the-art sequential recommendation models, and that category embedding effectively provides global preference information.

SESSION: Long - Recommendation System III

A Spatio-temporal Recommender System for On-demand Cinemas

On-demand cinemas are a new type of offline entertainment venue that has expanded rapidly in recent years. Recommending movies of interest to potential audiences in on-demand cinemas is important but challenging because the recommendation scenario is totally different from all existing recommendation applications, including online video recommendation, offline item recommendation and group recommendation. In this paper, we propose a novel spatio-temporal approach called Pegasus. Because of the specific characteristics of on-demand cinema recommendation, Pegasus exploits the POI (Point of Interest) information around cinemas and the content descriptions of movies, in addition to the historical movie consumption records of cinemas. Pegasus explores the temporal dynamics and spatial influences rooted in audience behaviors, and captures the similarities between cinemas, the changes of audience crowds, time-varying features and regional disparities of movie popularity. It offers an effective and explainable way to recommend movies to on-demand cinemas. The corresponding Pegasus system has been deployed in some pilot on-demand cinemas. Based on real-world data from on-demand cinemas, extensive experiments as well as pilot tests are conducted. Both experimental results and post-deployment feedback show that Pegasus is effective.

Semi-Supervised Learning for Cross-Domain Recommendation to Cold-Start Users

Providing accurate recommendations to newly joined users (or potential users, so-called cold-start users) has remained a challenging yet important problem in recommender systems. To infer the preferences of such cold-start users based on their preferences observed in other domains, several cross-domain recommendation (CDR) methods have been studied. The state-of-the-art Embedding and Mapping approach for CDR (EMCDR) aims to infer the latent vectors of cold-start users by supervised mapping from the latent space of another domain. In this paper, we propose a novel CDR framework based on semi-supervised mapping, called SSCDR, which effectively learns the cross-domain relationship even when only a small amount of labeled data is available. To this end, it first learns the latent vectors of users and items for each domain so that their interactions are represented by distances, then trains a cross-domain mapping function to encode such distance information by exploiting both overlapping users as labeled data and all the items as unlabeled data. In addition, SSCDR adopts an effective inference technique that predicts the latent vectors of cold-start users by aggregating their neighborhood information. Our extensive experiments on different CDR scenarios show that SSCDR outperforms the state-of-the-art methods in terms of CDR accuracy, particularly in the realistic setting where only a small portion of users overlap between two domains.

Leveraging Ratings and Reviews with Gating Mechanism for Recommendation

Recommender systems play an important role in providing people with personalized information based on their history records. However, it is still a challenge to capture the preferences of users accurately due to the sparsity of rating data and the heterogeneity of review data. In this paper, we propose a hybrid deep collaborative filtering model that jointly learns latent representations from ratings and reviews. Specifically, the model learns a rating feature and a textual feature from ratings and reviews simultaneously. Two embedding layers are employed to learn rating features for users and items based on user-item interactions, and two attention-based GRU networks learn context-aware representations from user and item reviews. Then a gating mechanism is used to balance the contributions of the rating feature and the textual feature. Experimental results on six real-world datasets demonstrate the superior performance of the proposed method over several state-of-the-art methods. Moreover, with the attention mechanism, the keywords in reviews can be highlighted to interpret the predictions.
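The gating step described above can be pictured with a minimal sketch. It assumes an elementwise sigmoid gate that mixes a rating feature and a textual feature of equal length; the abstract does not give the exact gate, so the function name `gated_fusion` and the parameters `w_rating`, `w_text` and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rating_vec, text_vec, w_rating, w_text, bias):
    """Fuse two feature vectors with an elementwise gate (assumed form).

    For each dimension: g = sigmoid(w_r * r + w_t * t + b),
    and the fused value is g * r + (1 - g) * t.
    """
    fused = []
    for r, t, wr, wt, b in zip(rating_vec, text_vec, w_rating, w_text, bias):
        g = sigmoid(wr * r + wt * t + b)  # share given to the rating feature
        fused.append(g * r + (1.0 - g) * t)
    return fused


# With zero weights and bias the gate is 0.5, so the result is a plain average.
print(gated_fusion([1.0, 0.0], [0.0, 1.0], [0, 0], [0, 0], [0, 0]))
```

In a trained model the gate parameters would be learned, so the mix between rating and review evidence could differ per dimension and per user-item pair.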

Instagrammers, Fashionistas, and Me: Recurrent Fashion Recommendation with Implicit Visual Influence

Fashion-focused key opinion bloggers on Instagram, Facebook, and other social media platforms are fast becoming critical influencers. They can inspire consumer clothing purchases by linking high fashion visual evolution with daily street style. In this paper, we build the first visual influence-aware fashion recommender (FIRN) by leveraging fashion bloggers and their dynamic visual posts. Specifically, we extract the dynamic fashion features highlighted by these bloggers via a BiLSTM that integrates a large corpus of visual posts and community influence. We then learn the implicit visual influence funnel from bloggers to individual users via a personalized attention layer. Finally, we incorporate user personal style and her preferred fashion features across time in a recurrent recommendation network for dynamic fashion-updated clothing recommendation. Experiments show that FIRN outperforms state-of-the-art fashion recommenders, especially for users who are most impacted by fashion influencers, and utilizing fashion bloggers can bring greater improvements in recommendation compared with using other potential sources of visual information. We also release a large time-aware high-quality visual dataset of fashion influencers that can be exploited for future research.

What Can History Tell Us?

Recommendation systems have been widely applied to many E-commerce and online social media platforms. Recently, sequential item recommendation, especially session-based recommendation, has attracted wide research interest. However, existing sequential recommendation approaches either ignore the historical sessions or consider all historical sessions without distinguishing whether they are relevant to the current session. This motivates us to distinguish the effect of each historical session and identify the relevant ones for recommendation. In light of this, we propose a novel deep learning based sequential recommender framework for session-based recommendation, which takes Nonlocal Neural Network and Recurrent Neural Network as the main building blocks. Specifically, we design a two-layer nonlocal architecture to identify historical sessions that are relevant to the current session and learn the long-term user preferences mostly from these relevant sessions. Besides, we also design a gated recurrent unit (GRU) enhanced by the nonlocal structure to learn the short-term user preferences from the current session. Finally, we propose a novel approach to integrate both long-term and short-term user preferences in a unified way to facilitate training the whole recommender model in an end-to-end manner. We conduct extensive experiments on two widely used real-world datasets, and the experimental results show that our model achieves significant improvements over the state-of-the-art methods.

SESSION: Long - Reinforcement Learning

Context-Aware Ranking by Constructing a Virtual Environment for Reinforcement Learning

Result ranking is one of the major concerns for Web search technologies. Most existing methodologies rank search results in descending order according to pointwise relevance estimation of single results. However, the dependency relationship between different search results is not taken into account. As search engine result pages contain more and more heterogeneous components, a better ranking strategy should be a context-aware process and optimize result ranking globally. In this paper, we propose a novel framework which aims to improve context-aware listwise ranking performance by optimizing online evaluation metrics. The ranking problem is formalized as a Markov Decision Process (MDP) and solved with the reinforcement learning paradigm. To avoid the great cost to online systems during the training of the ranking model, we construct a virtual environment with millions of historical click logs to simulate the behavior of real users. Extensive experiments on both simulated and real datasets show that: 1) constructing a virtual environment can effectively leverage the large-scale click logs and capture some important properties of real users; 2) the proposed framework can improve search ranking performance by a large margin.

A Multi-Scale Temporal Feature Aggregation Convolutional Neural Network for Portfolio Management

Financial portfolio management is the process of periodically reallocating a fund into different financial investment products, with the goal of achieving the maximum profits. While conventional financial machine learning methods try to predict the price trends, reinforcement learning based portfolio management methods make trading decisions according to the price changes directly. However, existing reinforcement learning based methods are limited to extracting the price change information at a single scale, so their performance remains unsatisfactory. In this paper, inspired by the Inception network that has achieved great success in computer vision and can extract multi-scale features simultaneously, we propose a novel Ensemble of Identical Independent Inception (EI$^3$) convolutional neural network, with the objective of addressing the limitation of existing reinforcement learning based portfolio management methods. With EI$^3$, multiple assets can be processed independently while sharing the same network parameters. Moreover, price movement information for each product can be extracted at multiple scales via a wide network and then aggregated to make trading decisions. Based on EI$^3$, we further propose a recurrent reinforcement learning framework to provide a deep machine learning solution for the portfolio management problem. Comprehensive experiments on the cryptocurrency datasets demonstrate the superiority of our method over existing competitors, in both upswing and downswing environments.

Order-free Medicine Combination Prediction with Graph Convolutional Reinforcement Learning

Medicine Combination Prediction (MCP) based on Electronic Health Record (EHR) can assist doctors in prescribing medicines for complex patients. Previous studies on MCP either ignore the correlations between medicines (i.e., MCP is formulated as a binary classification task), or assume that there is a sequential correlation between medicines (i.e., MCP is formulated as a sequence prediction task). The latter is unreasonable because the correlations between medicines should be considered in an order-free way. Importantly, MCP must take additional medical knowledge (e.g., Drug-Drug Interaction (DDI)) into consideration to ensure the safety of medicine combinations. However, most previous methods for MCP incorporate DDI knowledge with a post-processing scheme, which might undermine the integrity of proposed medicine combinations. In this paper, we propose a graph convolutional reinforcement learning model for MCP, named Combined Order-free Medicine Prediction Network (CompNet), that addresses the issues listed above. CompNet casts the MCP task as an order-free Markov Decision Process (MDP) problem and designs a Deep Q Learning (DQL) mechanism to learn correlative and adverse interactions between medicines. Specifically, we first use a Dual Convolutional Neural Network (Dual-CNN) to obtain patient representations based on EHRs. Then, we introduce the medicine knowledge associated with predicted medicines to create a dynamic medicine knowledge graph, and use a Relational Graph Convolutional Network (R-GCN) to encode it. Finally, CompNet selects medicines by fusing the combination of patient information and the medicine knowledge graph. Experiments on a benchmark dataset, i.e., MIMIC-III, demonstrate that CompNet significantly outperforms state-of-the-art methods and improves a recently proposed model by 3.74%pt and 6.64%pt in terms of Jaccard and F1 metrics, respectively.

Reinforcement Learning with Sequential Information Clustering in Real-Time Bidding

Display advertising is a billion-dollar business and the primary source of income for many companies. In this scenario, real-time bidding optimization is one of the most important problems, where the bids of ads for each impression are determined by an intelligent policy such that some global key performance indicators are optimized. Due to the highly dynamic bidding environment, many recent works try to use reinforcement learning algorithms to train the bidding agents. However, as the probability of the occurrence of a particular state is typically low and the state representation in current work lacks sequential information, the convergence speed and performance of deep reinforcement algorithms are disappointing. To tackle these two challenges in the real-time bidding scenario, we propose ClusterA3C, a novel Asynchronous Advantage Actor-Critic (A3C) variant integrated with a sequential information extraction scheme and a clustering based state aggregation scheme. We conduct extensive experiments to validate the proposed scheme on a real-world commercial dataset. Experimental results show that the proposed scheme outperforms the state-of-the-art methods in terms of either performance or convergence speed.

Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System

In real-world question-answering (QA) systems, ill-formed questions, such as those with wrong words, incorrect word order and noisy expressions, are common and may prevent the QA systems from understanding and answering them accurately. In order to eliminate the effect of ill-formed questions, we approach the question refinement task and propose a unified model, QREFINE, to refine ill-formed questions into well-formed ones. The basic idea is to learn a Seq2Seq model to generate a new question from the original one. To improve the quality and retrieval performance of the generated questions, we make two major improvements: 1) To better encode the semantics of ill-formed questions, we enrich the representation of questions with character embeddings and recently proposed contextual word embeddings such as BERT, besides the traditional context-free word embeddings; 2) To make it capable of generating the desired questions, we train the model with deep reinforcement learning techniques that consider an appropriate wording of the generation as an immediate reward and the correlation between the generated question and the answer as time-delayed long-term rewards. Experimental results on real-world datasets show that the proposed QREFINE method can generate refined questions that are more readable and contain fewer mistakes than the original questions provided by users. Moreover, the refined questions also significantly improve the accuracy of answer retrieval.

SESSION: Long - Search & Retrieval

Analyzing the Effects of Document's Opinion and Credibility on Search Behaviors and Belief Dynamics

To obtain accurate information through web searches, people have to search for information carefully. This study investigates how the search behaviors and decision outcomes of searchers were affected by the documents they encountered during their search process. We focus on two document factors: (1) opinion (consistent and inconsistent) with the searchers' beliefs prior to the search task, and (2) credibility (high and low). We conducted a user study in which 260 participants were asked to perform health-related search tasks while controlling a search result with different opinions and credibility levels. The results revealed that (i) the participants spent more effort searching, by issuing more queries, when belief-inconsistent documents were presented; (ii) the documents' opinion and credibility affected their belief dynamics (i.e., how their beliefs changed after the search task); and (iii) their belief dynamics and search efforts had little relationship. These findings suggest that search engines could protect users from polarization, and thus help them to obtain accurate information, by presenting documents that are inconsistent with users' beliefs at higher ranks in the results.

Identifying Facet Mismatches In Search Via Micrographs

E-commerce search engines are the primary means by which customers shop for products online. Each customer query contains multiple facets such as product type, color, brand, etc. A successful search engine retrieves products that are relevant to the query along each of these attributes. However, due to lexical (erroneous title, description, etc.) and behavioral irregularities (clicks or purchases of products that do not belong to the same facet as the query), some mismatched products are often included in search results. These irregularities can be detected using simple binary classifiers like gradient boosted decision trees or logistic regression. Typically, these binary classifiers use strong independence assumptions between the results and ignore structural relationships available in the data, such as the connections between products and queries. In this paper, we use the connections that exist between products and queries to identify a special kind of structure we refer to as a micrograph. Further, we make use of Statistical Relational Learning (SRL) to incorporate these micrographs in the data and pose the problem as a structured prediction problem. We refer to this approach as structured mismatch classification (SMC). In addition, we show that naive addition of structure does not improve the performance of the model and hence introduce a variation of SMC, strong SMC (SSMC), which improves over the baseline by passing information from high-confidence predictions to lower confidence predictions. In our empirical evaluation we show that our proposed approach outperforms the baseline classification methods by up to 12% in precision. Furthermore, we use quasi-Newton methods to make our method viable for real-time inference in a search engine and show that our approach is up to 150 times faster than existing ADMM-based solvers.

GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine

This paper presents GRIP, an approximate nearest neighbor (ANN) search algorithm for building vector search engines, which make heavy use of such algorithms. GRIP is designed to retrieve documents at large scale based on their semantic meanings in a scalable way. It is both fast and capacity-optimized. GRIP combines new algorithmic and system techniques to collaboratively optimize the use of memory, storage, and computation. The contributions include: (1) The first hybrid memory-storage ANN algorithm that allows ANN to benefit from both DRAM and SSDs simultaneously; (2) The design of a highly optimized indexing scheme that provides both memory-efficiency and high performance; (3) A cost analysis and a cost function for evaluating the capacity improvements of ANN algorithms. GRIP achieves an order-of-magnitude improvement in overall system efficiency, significantly reducing the cost of vector search, while attaining equal or higher accuracy, compared with the state-of-the-art.

Improving Web Image Search with Contextual Information

In web image search, the items users search for are images instead of Web pages or online services. Web image search constitutes a very important part of web search. Re-ranking is a trusted technique to improve retrieval effectiveness in web search. Previous work on re-ranking web image search results mainly focuses on intra-query information (e.g., human interactions with the initial list of the current query). Contextual information such as the query sequence and implicit user feedback provided during a search session prior to the current query is known to improve the performance of general web search but has so far not been used in web image search. The differences in result placement and interaction mechanisms of image search make the search process rather different from that of general Web search engines. Because of these differences, context-aware re-ranking models that were originally developed for general web search cannot simply be applied to web image search. We propose CARM, a context-aware re-ranking model, a neural network-based framework to re-rank web image search results for a query based on previous interaction behavior in the search session in which the query was submitted. Specifically, we explore a hybrid encoder with an attention mechanism to model intra-query and inter-query user preferences for image results in a two-stage structure. We train CARM to jointly learn query and image representations so as to be able to deal with the multimodal characteristics of web image search. Extensive experiments are carried out on a commercial web image search dataset. The results show that CARM outperforms state-of-the-art baseline models in terms of personalized evaluation metrics. Also, combining CARM with the original ranking can improve on the original ranking in personalized ranking and relevance estimation. We make the implementation of CARM and relevant datasets publicly available to facilitate future studies.

Dynamic Bayesian Metric Learning for Personalized Product Search

In this paper, we study the problem of personalized product search under streaming scenarios. We address the problem by proposing a Dynamic Bayesian Metric Learning model, abbreviated as DBML, which can collaboratively track the evolution of latent semantic representations of different categories of entities (i.e., users, products and words) over time in a joint metric space. In particular, unlike previous work using an inner-product metric to model the affinities between entities, our DBML is a novel probabilistic metric learning approach that is able to avoid contradictions, preserve the triangle inequality in the latent space, and correctly utilize implicit feedback. For inferring dynamic embeddings of the entities, we propose a scalable online inference algorithm, which can jointly learn the latent representations of entities and smooth their changes across time, based on amortized inference. The dynamic semantic representations of entities, collaboratively inferred in a unified form by our DBML, are useful not only for improving personalized product search, but also for capturing the affinities between users, products and words. Experimental results on large datasets over a number of applications demonstrate that our DBML outperforms the state-of-the-art algorithms, and can effectively capture the evolution of semantic representations of different categories of entities over time.

SESSION: Long - Sequential Data Analysis

Towards Accurate and Interpretable Sequential Prediction: A CNN & Attention-Based Feature Extractor

With the information explosion, more and more choices are exposed to the public, and next-item recommendation has become a significant and challenging task. Recently, attention mechanisms, Convolutional Neural Networks (CNNs) and other kinds of deep components have been used to model user behaviors. However, the proposed models often fail to extract the features of user behaviors in different time periods, and in previous CNN-based models it is hard to make the CNN interpretable. In this paper, we propose a CNN & Attention-based Sequential Feature Extractor (CASFE) module to capture the possible features of user behaviors at different time intervals. Specifically, we employ a CNN to extract multi-level features of user behaviors over different time periods. After each CNN layer, we use an attention module to emphasize the different effects of behaviors on the prediction result. Besides, the features we try to extract here have a similar concept and meaning to the hand-crafted features in feature engineering, which proves the validity of CASFE. Accordingly, CASFE becomes a general sequential feature extractor that can be used in various sequential prediction tasks. Combined with a Multi-Layer Perceptron (MLP), CASFE becomes a state-of-the-art next-item recommendation model. The model obtains good performance on the Last.fm_1K and MovieLens_1M datasets. Besides, as a compatible extractor module, it can also improve CTR prediction models as well as other sequential prediction tasks.

Locally Slope-based Dynamic Time Warping for Time Series Classification

Dynamic time warping (DTW) has been widely used in various domains of daily life. Essentially, DTW is a non-linear point-to-point matching method under time consistency constraints to find the optimal path between two temporal sequences. Although DTW achieves a globally optimal solution, it does not naturally capture locally reasonable alignments. Concretely, two points with entirely dissimilar local shape may be aligned. To solve this problem, we propose a novel weighted DTW based on local slope feature (LSDTW), which enhances DTW by taking regional information into consideration. LSDTW is inherently a DTW algorithm. However, it additionally attempts to pair locally similar shapes, and to avoid matching points with distinct neighborhood slopes. Furthermore, when LSDTW is used as a similarity measure in the popular nearest neighbor classifier, it beats other distance-based methods on the vast majority of public datasets, with significantly improved classification accuracies. In addition, case studies establish the interpretability of the proposed method.
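For reference, the classical DTW recurrence that LSDTW builds on can be sketched as follows. This is plain DTW with an absolute-difference point cost, not the slope-weighted variant proposed in the paper; the function name `dtw_distance` is an illustrative choice.

```python
def dtw_distance(a, b):
    """Classical dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j],
    where each step may advance in a, in b, or in both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # point-to-point cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b[j-1] is matched again
                cost[i][j - 1],      # b advances, a[i-1] is matched again
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A repeated sample in one sequence is absorbed by the warping path.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the point cost looks only at individual values, two points with very different local shapes can be matched at no penalty, which is the weakness the abstract says LSDTW addresses by weighting with local slope features.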

HiCAN: Hierarchical Convolutional Attention Network for Sequence Modeling

Convolutional neural networks (CNNs) are widely used on sequential data since they can capture local context dependencies and temporal order information inside sequences. Attention (ATT) mechanisms have also attracted enormous interest due to their capability of capturing the important parts of a sequence. These two neural networks can extract different features from sequences. In order to combine the advantages of CNN and ATT, we propose a convolutional attention network (CAN), which merges the structure of CNN and ATT into a single neural network and can serve as a new basic module in complex neural networks. Based on CAN, we then build a sequence encoding model with hierarchical structure, "hierarchical convolutional attention network (HiCAN)", to tackle sequence modeling problems. It can explicitly capture both the local and global context dependencies and temporal order information in sequences. Extensive experiments conducted on session-based recommendation (Recommender Systems) demonstrate that HiCAN is able to outperform state-of-the-art methods and shows higher computational efficiency. Furthermore, we conduct extended experiments on text classification (Natural Language Processing). The results show that our model can also achieve competitive performance on NLP tasks.

Automatic Sequential Pattern Mining in Data Streams

Given a large volume of multi-dimensional data streams, such as that produced by IoT applications, finance and online web-click logs, how can we discover typical patterns and compress them into compact models? In addition, how can we incrementally distinguish multiple patterns while considering the information obtained from a pattern found in a streaming setting? In this paper, we propose a streaming algorithm, namely StreamScope, that is designed to find intuitive patterns efficiently from event streams evolving over time. Our proposed method has the following properties: (a) it is effective: it operates on semi-infinite collections of co-evolving streams and summarizes all the streams into a set of multiple discrete segments grouped by their similarities; (b) it is automatic: it automatically and incrementally recognizes such patterns and generates models for each of them if necessary; (c) it is scalable: the complexity of our method does not depend on the length of the data streams. Our extensive experiments on real data streams demonstrate that StreamScope can find meaningful patterns and achieve great improvements in terms of computational time and memory space over its full batch method competitors.

Efficient Sequential and Parallel Algorithms for Estimating Higher Order Spectra

Higher order spectra (HOS) are a powerful tool in nonlinear time series analysis and they have been extensively used as feature representations in data mining, communications and cosmology domains. However, HOS estimation suffers from high computational cost and memory consumption. Any algorithm for computing the $k$th order spectra on a dataset of size $n$ needs $O(n^{k-1})$ time since the output size will be $O(n^{k-1})$ as well, which makes the direct HOS analysis difficult for long time series, and further prohibits its direct deployment to resource-limited and time-sensitive applications. Existing algorithms for computing HOS are either inefficient or have been implemented on obsolete architectures. Thus it is essential to develop efficient generic algorithms for HOS estimations. In this paper, we present a package of generic sequential and parallel algorithms for computationally and memory efficient HOS estimations which can be employed on any parallel machine or platform. Our proposed algorithms largely reduce the HOS' computational cost and memory usage in spectrum multiplication and smoothing steps through carefully designed prefix sum operations. Moreover, we employ a matrix partitioning technique and design algorithms with optimal memory usage and present the parallel approaches on the PRAM and the mesh models. Furthermore, we implement our algorithms for both bispectrum and trispectrum estimations. We conduct extensive experiments and cross-compare the proposed algorithms' performance. Results show that our algorithms achieve state-of-the-art computational and memory efficiency, and our parallel algorithms achieve close to linear speedups. The code is available at
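To illustrate why prefix sums help in a smoothing step, the sketch below computes a one-dimensional moving average in which every window sum costs two lookups instead of a fresh pass over the window. It shows the general prefix-sum idea only and is not the authors' HOS algorithm; the function name `moving_average` and the 1-D setting are assumptions made for illustration.

```python
from itertools import accumulate


def moving_average(values, window):
    """Moving average over `values` using a prefix-sum table.

    prefix[i] holds sum(values[:i]), so the sum of any window
    values[i:i + window] is prefix[i + window] - prefix[i].
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    prefix = [0.0] + list(accumulate(values))
    return [
        (prefix[i + window] - prefix[i]) / window
        for i in range(len(values) - window + 1)
    ]


print(moving_average([1, 2, 3, 4, 5], 2))
```

The table is built once in linear time, after which the cost of each smoothed value is independent of the window size.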

SESSION: Long - Social Network

A Modular Adversarial Approach to Social Recommendation

This paper proposes a novel framework to incorporate social regularization for item recommendation. Social regularization grounded in ideas of homophily and influence appears to capture latent user preferences. However, there are two key challenges: first, the importance of a specific social link depends on the context; and second, a fundamental result states that we cannot disentangle homophily and influence from observational data to determine the effect of social inference. Thus we view the attribution problem as inherently adversarial, where we examine two competing hypotheses (social influence and latent interests) to explain each purchase decision. We make two contributions. First, we propose a modular, adversarial framework that decouples the architectural choices for the recommender and social representation models, for social regularization. Second, we overcome degenerate solutions through an intuitive contextual weighting strategy that supports an expressive attribution, to ensure informative social associations play a larger role in regularizing the learned user interest space. Our results indicate significant gains (5-10% relative [email protected]) over state-of-the-art baselines across multiple publicly available datasets.

Emotional Contagion-Based Social Sentiment Mining in Social Networks by Introducing Network Communities

The rapid development of social media services has facilitated the communication of opinions through online news, blogs, microblogs, instant messages, and so on. This article concentrates on the mining of readers' social sentiments evoked by social media materials. Existing methods are only applicable to a minority of social media, such as news portals with emotional voting information, and ignore the emotional contagion between writers and readers. However, incorporating such factors is challenging since the learned hidden variables would be very fuzzy (because of the short and noisy text in social networks). In this paper, we try to solve this problem by introducing a high-order network structure, i.e., communities. We first propose a new generative model called Community-Enhanced Social Sentiment Mining (CESSM), which 1) considers the emotional contagion between writers and readers to capture precise social sentiment, and 2) incorporates network communities to capture coherent topics. We then derive an inference algorithm based on Gibbs sampling. Empirical results show that CESSM achieves significantly superior performance against the state-of-the-art techniques for text sentiment classification and interestingness in social sentiment mining.

Social-Aware VR Configuration Recommendation via Multi-Feedback Coupled Tensor Factorization

Recent technological advances in virtual reality (VR) have attracted a lot of attention to VR shopping, which thus far is designed for a single user. In this paper, we envision the scenario of VR group shopping, where VR supports: 1) flexible display of items to address diverse personal preferences, and 2) convenient view switching between personal and group views to foster social interactions. We formulate the Multiview-Enabled Configuration Recommendation (MECR) problem to rank a set of displayed items for a VR shopping user. We design the Multiview-Enabled Configuration Ranking System (MEIRS) that first extracts discriminative features based on marketing theories and then introduces a new coupled tensor factorization model to learn the representation of users, Multi-View Display (MVD) configurations, and multiple feedback with content features. Experimental results show that the proposed approach outperforms personalized recommendations and group recommendations by at least 30.8% on large-scale datasets and 63.3% in the user study in terms of hit ratio and mean average precision.

Tracking Top-k Influential Users with Relative Errors

Tracking influential users in a dynamic social network is a fundamental step in many useful applications, such as social recommendation, network topology optimization, and blocking rumour spreading. The major obstacle in mining top influential users is that estimating users' influence spreads is #P-hard under most influence propagation models. Previous studies along this line either seek heuristic solutions or may return meaningless results due to the lack of prior knowledge about users' influence in the dynamic network. In this paper, we tackle the problem of tracking top-k influential individuals in a dynamic social network. When a top-k query is issued, our algorithm returns a set $S$ of more than $k$ users. With high probability, our algorithm guarantees that $S$ contains all real top-k influential users and there exists a relative error $ε < 1$ such that the least influential user in $S$ has influence at least $(1-ε) I^k$, where $I^k$ is the influence of the k-th most influential user and we can adjust $ε$ via parameter settings. Controlling such a relative error enables us to obtain meaningful results even when we know nothing about the value of $I^k$ or $I^k$ changes over time in the dynamic network. In addition to the thorough theoretical results, our experimental results on large real networks clearly demonstrate the effectiveness and efficiency of our algorithm.
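The shape of the guarantee can be made concrete with a small sketch. It assumes exact influence values are already known, which is precisely what is hard in practice, so it illustrates only the set that the relative-error bound describes, not the tracking algorithm itself; the function name `candidate_set` and the sample values are hypothetical.

```python
def candidate_set(influence, k, eps):
    """Return every user whose influence is at least (1 - eps) * I^k.

    `influence` maps user -> influence value (assumed known exactly here),
    and I^k is the influence of the k-th most influential user.
    """
    if not 0 <= eps < 1:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    if not 1 <= k <= len(influence):
        raise ValueError("k must be between 1 and the number of users")
    ranked = sorted(influence.values(), reverse=True)
    kth = ranked[k - 1]            # I^k
    threshold = (1 - eps) * kth    # lowest influence allowed in the set
    return {user for user, value in influence.items() if value >= threshold}


scores = {"a": 10.0, "b": 8.0, "c": 7.5, "d": 3.0}
print(sorted(candidate_set(scores, k=2, eps=0.1)))
```

With these sample values the threshold is 0.9 times 8.0, so user "c" joins the true top two while "d" stays out; a smaller ε tightens the set toward the exact top-k.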

NActSeer: Predicting User Actions in Social Network using Graph Augmented Neural Network

Nowadays social network platforms like Twitter, Facebook, and Weibo have created a new landscape for communicating with our friends and the world at large. In this landscape our social activities, purchase decisions, check-ins, etc. become available immediately to our friends and followers, encouraging them to engage in the same activity. This gives rise to the question: given the previous actions of a user and her friends, can we predict what she is going to do next? Answering this question can serve as a good indicator for policy research, targeted advertising, assortment planning, etc. To capture such sequential mechanisms, two broad classes of methods have been proposed in the past. The first is the Markov Chain (MC), which assumes that a user's next action can be predicted from her most recent actions, while the second, the Recurrent Neural Network (RNN), tries to model both the long- and short-term preferences of a user. However, neither class of models contains any integrated mechanism to capture the preferences expressed in neighbors' actions. To fill this gap, we propose a social network augmented neural network model named NActSeer, which takes the neighbors' actions into account in addition to the user's history. To achieve this, NActSeer maintains a dynamic user embedding based on the activities within a time window. It then learns a feature representation for each user which is augmented by her neighbors. Empirical studies on four real-world datasets show that NActSeer is able to outperform several classical and state-of-the-art models proposed for similar problems and achieves up to a 71% performance boost.
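
For orientation, the Markov Chain baseline mentioned above can be sketched as a first-order transition counter. This is not NActSeer itself; the function names and the toy action sequences are hypothetical, and neighbors' actions are deliberately ignored, which is the gap the abstract describes.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count how often each action is immediately followed by each other action."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, last_action):
    """Return the most frequent successor of last_action, or None if it was never seen."""
    followers = counts.get(last_action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy action histories (illustrative only).
history = [
    ["login", "browse", "buy"],
    ["login", "browse", "browse", "buy"],
    ["login", "search"],
]
model = fit_transitions(history)
```

Calling `predict_next(model, "login")` returns the action that most often followed a login in the toy histories.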

SESSION: Long - Understanding and Interpretability I

In2Rec: Influence-based Interpretable Recommendation

Interpretability of recommender systems has attracted increasing attention because it promotes the effectiveness and persuasiveness of recommendation decisions, and thus user satisfaction. Most existing methods, such as Matrix Factorization (MF), tend to be black-box machine learning models that lack interpretability and do not provide a straightforward explanation for their outputs. In this paper, we focus on the probabilistic factorization model and further assume the absence of any auxiliary information, such as item content or user reviews. We propose an influence mechanism to evaluate the importance of the users' historical data, so that the most related users and items can be selected to explain each predicted rating. The proposed method is thus called the Influence-based Interpretable Recommendation model (In2Rec). To further enhance the recommendation accuracy, we address the important issue of missing not at random, i.e., missing ratings are not independent from the observed and other unobserved ratings, because users tend to interact only with what they like. In2Rec models the generative process for both observed and missing data, and integrates the influence mechanism in a Bayesian graphical model. A learning algorithm capitalizing on iterated conditional modes is proposed to tackle the non-convex optimization problem pertaining to maximum a posteriori estimation for In2Rec. A series of experiments on four real-world datasets (Movielens 10M, Netflix, Epinions, and Yelp) have been conducted. Compared with state-of-the-art recommendation methods, the experimental results show that In2Rec consistently benefits the recommendation system in both rating prediction and ranking estimation tasks, and interprets the recommendation results in a user-friendly way with the aid of the proposed influence mechanism.

Accounting for Temporal Dynamics in Document Streams

Textual information, such as news articles, social media, and online forum discussions, often comes in the form of sequential text streams. Events happening in the real world trigger a set of articles talking about them or related events over a period of time. Meanwhile, even as one event fades out, another related event may draw public attention. Hence, it is important to leverage the information about how topics influence each other over time to obtain a better understanding and modeling of document streams. In this paper, we explicitly model mutual influence among topics over time, in order to better understand how events emerge, fade and inherit. We propose a temporal point process model, referred to as Correlated Temporal Topic Model (CoTT), to capture the temporal dynamics in a latent topic space. Our model allows for efficient online inference, scaling to continuous time document streams. Extensive experiments on real-world data reveal the effectiveness of our model in recovering meaningful temporal dependency structure among topics and documents.

How Does BERT Answer Questions?: A Layer-Wise Analysis of Transformer Representations

Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation layer. Our qualitative analysis of hidden state visualizations provides additional insights into BERT's reasoning process. Our results show that the transformations within BERT go through phases that are related to traditional pipeline tasks. The system can therefore implicitly incorporate task-specific information into its token representations. Furthermore, our analysis reveals that fine-tuning has little impact on the models' semantic abilities and that prediction errors can be recognized in the vector representations of even early layers.

Patterns of Search Result Examination: Query to First Action

To determine key factors that affect a user's behavior with search results, we conducted a controlled eye-tracking study of users completing search tasks using both desktop and mobile devices. We focus our investigation on users' behavior from their query to the first action they take with the search engine results page (SERP): either a click on a search result or a reformulation of their query. We found that a user deciding to reformulate a query rather than click on a result is best understood as being caused by the user's examination pattern not including a relevant search result. If a user sees a relevant result, they are very likely to click it. Of note, users do not look at all search results and their examination may be influenced by other factors. The key factors we found to explain a user's examination pattern are: the rank of search results, the user type, and the query quality. While existing research has identified rank and user types as important factors affecting examination patterns, to our knowledge, query quality is a new discovery. We found that user queries can be understood as either of weak or strong quality. Weak queries are those that the user may believe are more likely to fail compared to a strong query, and as a result, we find that users modify their examination patterns to view fewer documents when they issue a weak query, i.e. they give up sooner.

A Dynamic Product-aware Learning Model for E-commerce Query Intent Understanding

Query intent understanding is a fundamental and essential task in search, which promotes personalized retrieval results and users' satisfaction. In E-commerce, query understanding refers in particular to bridging the gap between query representations and product representations. In this paper, we aim to map queries into tens of thousands of predefined fine-grained categories extracted from the product descriptions. The problem is very challenging in several aspects. First, a query may be related to multiple categories, and identifying all the best matching categories could eventually drive the search engine toward high recall and diversity. Second, the same query may have dynamic intents under various scenarios, and there is a need to distinguish the differences to promote accurate categories of products. Third, tail queries are particularly difficult to understand due to noise and lack of customer feedback information. To better understand the queries, we first conduct an analysis of the search queries and behaviors in the E-commerce domain and identify the uniqueness of our problem (e.g. longer sessions). Then we propose a Dynamic Product-aware Hierarchical Attention (DPHA) framework to capture the explicit and implied meanings of a query given its context information in the session. Specifically, DPHA automatically learns the bidirectional query-level and self-attentional session-level representations which can capture both complex long range dependencies and structural information. Extensive experimental results on a real E-commerce query data set demonstrate the effectiveness of the proposed DPHA compared to the state-of-the-art baselines.

SESSION: Long - Understanding and Interpretability II

Scalable Causal Graph Learning through a Deep Neural Network

Learning the causal graph in a complex system is crucial for knowledge discovery and decision making, yet it remains a challenging problem because of the unknown nonlinear interaction among system components. Most of the existing methods either rely on predefined kernel or data distribution, or they focus simply on the causality between a single target and the remaining system. This work presents a deep neural network for scalable causal graph learning (SCGL) through low-rank approximation. The SCGL model can explore nonlinearity on both temporal and intervariable relationships without any predefined kernel or distribution assumptions. Through low-rank approximation, the noise influence is reduced, and better accuracy and high scalability are achieved. Experiments using synthetic and real-world datasets show that our SCGL algorithm outperforms existing state-of-the-art methods for causal graph learning.

Interpretable Multiple-Kernel Prototype Learning for Discriminative Representation and Feature Selection

Prototype-based methods are of particular interest for domain specialists and practitioners as they summarize a dataset by a small set of representatives. Therefore, in a classification setting, interpretability of the prototypes is as significant as the prediction accuracy of the algorithm. Nevertheless, the state-of-the-art methods make inefficient trade-offs between these concerns by sacrificing one in favor of the other, especially if the given data has a kernel-based (or multiple-kernel) representation. In this paper, we propose a novel interpretable multiple-kernel prototype learning (IMKPL) method to construct highly interpretable prototypes in the feature space, which are also efficient for the discriminative representation of the data. Our method focuses on the local discrimination of the classes in the feature space and on shaping the prototypes based on condensed class-homogeneous neighborhoods of data. Besides, IMKPL learns a combined embedding in the feature space in which the above objectives are better fulfilled. When the base kernels coincide with the data dimensions, this embedding results in a discriminative feature selection. We evaluate IMKPL on several benchmarks from different domains, which demonstrate its superiority to the related state-of-the-art methods regarding both interpretability and discriminative representation.

BePT: A Behavior-based Process Translator for Interpreting and Understanding Process Models

Sharing process models on the web has emerged as a common practice. Users can collect and share their experimental process models with others. However, some users always feel confused about the shared process models for lack of necessary guidelines or instructions. Therefore, several process translators have been proposed to explain the semantics of process models in natural language (NL). We find that previous studies suffer from information loss and generate semantically erroneous descriptions that diverge from original model behaviors. In this paper, we propose a novel process translator named BePT (Behavior-based Process Translator) based on the encoder-decoder paradigm, encoding a process model into a middle representation and decoding the representation into NL descriptions. Our theoretical analysis demonstrates that BePT satisfies behavior correctness, behavior completeness and description minimality. The qualitative and quantitative experiments show that BePT outperforms the state-of-the-art baselines.

Towards Effective and Interpretable Person-Job Fitting

The diversity of job requirements and the complexity of job seekers' abilities place higher demands on the accuracy and interpretability of Person-Job Fit systems. An interpretable Person-Job Fit system can show the reasons for recommending or not recommending specific jobs to some people, and vice versa. Such reasons help us understand the basis on which the system makes its final decision and guarantee a high recommendation accuracy. Existing studies on Person-Job Fit have focused on 1) one perspective, without considering the variances of role and psychological motivation between interviewer and job seeker; 2) modeling the matching degree between resume and job requirements directly through a deep neural network without interaction matching modules, which leads to a lack of interpretability. To this end, we propose an Interpretable Person-Job Fit (IPJF) model, which 1) models the Person-Job Fit problem from the perspectives/intentions of employer and job seeker in a multi-task optimization fashion to interpretively formulate the Person-Job Fit process; 2) leverages deep interactive representation learning to automatically learn the interdependence between a resume and job requirements without relying on a clear list of the job seeker's abilities, and formulates the optimization problem as a learning-to-rank problem. Experiments on a large real dataset show that the proposed IPJF model outperforms state-of-the-art baselines and also gives promising interpretable reasons for its recommendations.

Leveraging Graph Neighborhoods for Efficient Inference

Several probabilistic extensions of description logic languages have been proposed and thoroughly studied. However, their practical use has been hampered by the intractability of various reasoning tasks. While present-day knowledge bases (KBs) contain millions of instances and thousands of axioms, most state-of-the-art reasoners are capable of handling only small-scale KBs with thousands of instances. Thus, recent research has focused on leveraging the structure of KBs and queries in order to speed up inference runtime. However, these efforts have not been satisfactory in providing reasoners that are suitable for practical use in large-scale KBs. In this study, we aim to tackle this challenging problem. In doing so, we use a probabilistic extension of OWL RL (called PRORL) as a modeling language and exploit graph neighborhoods (of undirected graphical models) for efficient approximate probabilistic inference. We show that subgraph extraction based inference is much faster and has comparable accuracy to full graph inference. To support our claim, we perform several experiments over a NELL KB containing millions of instances and thousands of axioms. Furthermore, we propose a novel graph-based algorithm to automatically partition inference rules based on their structure for efficient parallel inference.

SESSION: Long - Urban Computing I

STAR: Spatio-Temporal Taxonomy-Aware Tag Recommendation for Citizen Complaints

In modern cities, complaining has become an important way for citizens to report emerging urban issues to governments for quick response. For ease of retrieval and handling, government officials usually organize citizen complaints by manually assigning tags to them, which is inefficient and cannot always guarantee the quality of assigned tags. This work attempts to solve this problem by recommending tags for citizen complaints. Although there exist many studies on tag recommendation for textual content, few of them consider two characteristics of citizen complaints, i.e., the spatio-temporal correlations and the taxonomy of candidate tags. In this paper, we propose a novel Spatio-Temporal Taxonomy-Aware Recommendation model (STAR), to recommend tags for citizen complaints by jointly incorporating spatio-temporal information of complaints and the taxonomy of candidate tags. Specifically, STAR first exploits two parallel channels to learn representations for textual and spatio-temporal information. To effectively leverage the taxonomy of tags, we design chained neural networks that gradually refine the representations and perform hierarchical recommendation under a novel taxonomy constraint. A fusion module is further proposed to adaptively integrate contributions of textual and spatio-temporal information in a tag-specific manner. We conduct extensive experiments on a real-world dataset and demonstrate that STAR significantly performs better than state-of-the-art methods. The effectiveness of key components in our model is also verified through ablation studies.

CoLight: Learning Network-level Cooperation for Traffic Signal Control

Cooperation among the traffic signals enables vehicles to move through intersections more quickly. Conventional transportation approaches implement cooperation by pre-calculating the offsets between two intersections. Such pre-calculated offsets are not suitable for dynamic traffic environments. To enable cooperation of traffic signals, in this paper, we propose a model, CoLight, which uses graph attentional networks to facilitate communication. Specifically, for a target intersection in a network, CoLight can not only incorporate the temporal and spatial influences of neighboring intersections to the target intersection, but also build up index-free modeling of neighboring intersections. To the best of our knowledge, we are the first to use graph attentional networks in the setting of reinforcement learning for traffic signal control and to conduct experiments on the large-scale road network with hundreds of traffic signals. In experiments, we demonstrate that by learning the communication, the proposed model can achieve superior performance against the state-of-the-art methods.
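
The idea of attending over neighboring intersections can be sketched in a few lines. This is a simplified illustration, not CoLight's architecture: raw dot products stand in for a learned scoring function (an assumption), and the name `attend` and the feature vectors are hypothetical. Because the aggregation is a weighted sum, the result does not depend on the order in which neighbors are listed, which is one way to read "index-free" modeling.

```python
import math


def attend(target, neighbors):
    """Aggregate neighbor feature vectors with softmax attention weights.

    target: feature vector (list of floats) of the target intersection.
    neighbors: list of feature vectors, one per neighboring intersection.
    Returns (weights, aggregated_vector).
    """
    # Similarity score of each neighbor with the target (plain dot product).
    scores = [sum(t * n for t, n in zip(target, vec)) for vec in neighbors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target)
    aggregated = [
        sum(w * vec[i] for w, vec in zip(weights, neighbors)) for i in range(dim)
    ]
    return weights, aggregated


# Illustrative features, e.g. queue lengths on two approaches.
target = [1.0, 0.5]
neighbors = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, summary = attend(target, neighbors)
_, summary_shuffled = attend(target, list(reversed(neighbors)))
```

The weights sum to one, and `summary` matches `summary_shuffled` up to floating-point rounding.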

Learning to Effectively Estimate the Travel Time for Fastest Route Recommendation

Fastest Route Recommendation (FRR) aims to find the fastest path in response to user queries in a large complex road network. Early studies cast the FRR task as a pathfinding problem on graphs and adopt heuristic algorithms as the major solution due to their efficiency and robustness. A major problem of heuristic algorithms is that the heuristic function is usually set empirically with simple methods, which makes it difficult to model other useful factors. In this paper, we extend the classic A* algorithm for the FRR task by modeling complex traffic information with neural networks. Specifically, we identify a factor that is important for improving the FRR task, i.e. the estimation of travel time. For this purpose, we first develop a module for predicting the time-varying traffic speed for a road segment, which is the foundation for estimating the travel time. Conditioned on this module, we further design another module to estimate the fastest travel time between two locations connected by routes. We adopt neural networks to implement both modules to enable the modeling of complex traffic characteristics and dynamics. In this way, the original two cost functions of the A* algorithm are set in a more principled way with neural networks. To our knowledge, we are the first to use neural networks for improving the A* algorithm in the FRR task. Our approach elegantly combines the merits of the A* algorithm and the powerful modeling capacities of neural networks for the FRR task. Extensive results on three real-world datasets have shown the effectiveness and robustness of the proposed model.
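
As a rough illustration of where the two cost functions enter, the sketch below implements plain A* with both functions passed in as callables. It is not the paper's model: `edge_cost` and `heuristic` are hypothetical stand-ins that a learned travel-time estimator could replace, and the toy graph and travel times are invented for the example. The returned path is guaranteed to be fastest only when the heuristic never overestimates the remaining time.

```python
import heapq


def a_star(graph, start, goal, edge_cost, heuristic):
    """Generic A* search.

    graph: dict mapping node -> iterable of neighbor nodes.
    edge_cost(u, v): travel time of edge (u, v), accumulated into g.
    heuristic(n, goal): estimated remaining travel time, the h term.
    Returns (total_cost, path), or (float("inf"), []) if goal is unreachable.
    """
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was found already
        for nxt in graph.get(node, ()):
            new_g = g + edge_cost(node, nxt)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                priority = new_g + heuristic(nxt, goal)
                heapq.heappush(frontier, (priority, new_g, nxt, path + [nxt]))
    return float("inf"), []


# Toy road network with fixed travel times in minutes (illustrative only).
times = {("A", "B"): 2.0, ("B", "D"): 2.0, ("A", "C"): 1.0, ("C", "D"): 5.0}
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost, path = a_star(
    graph, "A", "D",
    edge_cost=lambda u, v: times[(u, v)],
    heuristic=lambda n, goal: 0.0,  # zero heuristic reduces A* to Dijkstra
)
```

Swapping the two lambdas for time-dependent predictors changes the costs without touching the search loop, which is the separation the abstract relies on.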

PRNet: Outdoor Position Recovery for Heterogenous Telco Data by Deep Neural Network

Recent years have witnessed unprecedented amounts of telecommunication (Telco) data generated by Telco networks. For example, measurement records (MRs) are generated to report the connection states, e.g., received signal strength, between mobile devices and Telco networks. MR data have been widely used to precisely recover outdoor locations of mobile devices for applications such as human mobility, urban planning and traffic forecasting. Existing works using first-order sequence models such as the Hidden Markov Model (HMM) attempt to capture the spatio-temporal locality in underlying mobility patterns for lower localization errors. Such HMM approaches typically assume stable mobility patterns of the underlying mobile devices. Yet real MR datasets frequently exhibit heterogeneous mobility patterns due to mixed transportation modes of the underlying mobile devices and uneven distribution of the positions associated with MR samples. To address this issue, we propose a deep neural network (DNN)-based position recovery framework, namely PRNet, which can ensemble the power of CNN, sequence model LSTM, and two attention mechanisms to learn local, short- and long-term spatio-temporal dependencies from input MR samples. Extensive evaluation on six datasets collected at three representative areas (core, urban, and suburban areas in Shanghai, China) indicates that PRNet greatly outperforms seven counterparts.

Active Collaborative Sensing for Energy Breakdown

Residential homes constitute roughly one-fourth of the total energy usage worldwide. Providing appliance-level energy breakdown has been shown to induce positive behavioral changes that can reduce energy consumption by 15%. Existing approaches for energy breakdown either require hardware installation in every target home or demand a large set of energy sensor data available for model training. However, very few homes in the world have installed sub-meters (sensors measuring individual appliance energy); and the cost of retrofitting a home with extensive sub-metering eats into the funds available for energy saving retrofits. As a result, strategically deploying sensing hardware to maximize the reconstruction accuracy of sub-metered readings in non-instrumented homes while minimizing deployment costs becomes necessary and promising. In this work, we develop an active learning solution based on low-rank tensor completion for energy breakdown. We propose to actively deploy energy sensors to appliances from selected homes, with a goal to improve the prediction accuracy of the completed tensor with minimum sensor deployment cost. We empirically evaluate our approach on the largest public energy dataset collected in Austin, Texas, USA, from 2013 to 2017. The results show that our approach gives better performance with fixed number of sensors installed, when compared to the state-of-the-art, which is also proven by our theoretical analysis.

SESSION: Long - Urban Computing II

Forecasting Pavement Performance with a Feature Fusion LSTM-BPNN Model

In modern pavement management systems, pavement roughness is an important indicator of pavement performance, and it reflects the smoothness of the pavement surface. The International Roughness Index (IRI) is the de facto metric for quantitatively analyzing the roughness of the pavement surface. Pavement with a high IRI not only reduces the lifetime of vehicles, but also raises the risk of car accidents. Accurate prediction of IRI therefore becomes a key task for the pavement management system, and it helps the transportation department refurbish the pavement in time. However, existing models are built on small datasets and perform poorly. Besides, they only consider cross-sectional features of the pavements without any time-series information. In order to better capture the latent relationship between the cross-sectional and time-series features, we propose a novel feature fusion LSTM-BPNN model. LSTM-BPNN first learns the cross-sectional and time-series features with two neural networks separately, then it fuses both features via an attention mechanism. Experimental results on a high-quality real-world dataset clearly demonstrate that the new model outperforms existing alternatives.

Learning Phase Competition for Traffic Signal Control

Increasingly available city data and advanced learning techniques have empowered people to improve the efficiency of our city functions. Among them, improving urban transportation efficiency is one of the most prominent topics. Recent studies have proposed to use reinforcement learning (RL) for traffic signal control. Different from traditional transportation approaches which rely heavily on prior knowledge, RL can learn directly from the feedback. However, without a careful model design, existing RL methods typically take a long time to converge and the learned models may fail to adapt to new scenarios. For example, a model trained well for morning traffic may not work for the afternoon traffic because the traffic flow could be reversed, resulting in very different state representation. In this paper, we propose a novel design called FRAP, which is based on the intuitive principle of phase competition in traffic signal control: when two traffic signals conflict, priority should be given to one with larger traffic movement (i.e., higher demand). Through the phase competition modeling, our model achieves invariance to symmetrical cases such as flipping and rotation in traffic flow. By conducting comprehensive experiments, we demonstrate that our model finds better solutions than existing RL methods in the complicated all-phase selection problem, converges much faster during training, and achieves superior generalizability for different road structures and traffic conditions.
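
The phase-competition principle stated above can be illustrated with a deliberately simple rule: pick the phase whose movements carry the most demand. This is not FRAP, which learns the competition with reinforcement learning; the function name, the two-phase intersection, and the vehicle counts are hypothetical. The example also shows why a demand-based rule is unaffected when the flow is flipped.

```python
def pick_phase(phases, demand):
    """Choose the phase whose movements carry the largest total demand.

    phases: dict mapping phase name -> tuple of movement names.
    demand: dict mapping movement name -> number of waiting vehicles.
    """
    return max(phases, key=lambda p: sum(demand[m] for m in phases[p]))


# Hypothetical intersection: north-south vs. east-west through traffic.
phases = {"NS": ("north", "south"), "EW": ("east", "west")}
morning = {"north": 12, "south": 3, "east": 4, "west": 5}
# Afternoon flow is the mirror image: opposite approaches swap their demand.
afternoon = {"north": 3, "south": 12, "east": 5, "west": 4}
# Rotating the morning flow by 90 degrees moves the heavy demand to east-west.
rotated = {"east": 12, "west": 3, "south": 4, "north": 5}
```

The morning and afternoon cases select the same phase, while the rotated case selects the correspondingly rotated phase.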

Path Travel Time Estimation using Attribute-related Hybrid Trajectories Network

Estimation of path travel time provides great value to applications like bus line design and route planning. Existing approaches are mainly based on single-source trajectory datasets that are usually large in size to ensure satisfactory performance. This leads to two limitations: 1) Large-scale data may not always be attainable, e.g. city-scale public bus data is usually small compared to taxi data due to relatively fewer bus trips in a day. 2) Considering only single-source trajectory data neglects the potential estimation-improving insights of external data, e.g. trajectory datasets of other vehicle sources obtained from the same geographical region. A challenge is how to effectively utilize such other trajectory sources. Moreover, existing work does not attend to the attributes of a trajectory, including vehicle ID, day of week, rainfall level, etc., which are important for estimating the path travel time. Motivated by these and the recent successes of neural network models, we propose Attribute-related Hybrid Trajectories Network (AtHy-TNet), a neural model that effectively utilizes the attribute correlations, as well as the spatial and temporal relationships across hybrid trajectory data. We apply this to a novel problem of estimating the path travel time of one type of vehicle using a hybrid trajectory dataset that includes trajectories from other vehicle types. We demonstrate in our experiments the benefits of considering hybrid data for travel time estimation, and show that AtHy-TNet significantly outperforms state-of-the-art methods on real-world trajectory datasets.

CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms

How to optimally dispatch orders to vehicles and how to trade off between immediate and future returns are fundamental questions for a typical ride-hailing platform. We model ride-hailing as a large-scale parallel ranking problem and study the joint decision-making task of order dispatching and fleet management in online ride-hailing platforms. This task brings unique challenges in the following four aspects. First, to enable a huge number of vehicles to act and learn efficiently and robustly, we treat each region cell as an agent and build a multi-agent reinforcement learning framework. Second, to coordinate the agents from different regions to achieve long-term benefits, we leverage the geographical hierarchy of the region grids to perform hierarchical reinforcement learning. Third, to deal with the heterogeneous and variant action space for joint order dispatching and fleet management, we design the action as the ranking weight vector to rank and select the specific order or the fleet management destination in a unified formulation. Fourth, to support the multi-scale ride-hailing platform, we conduct the decision-making process in a hierarchical way where a multi-head attention mechanism is utilized to incorporate the impacts of neighbor agents and capture the key agent in each scale. The whole framework is named CoRide. Extensive experiments based on real-world data from multiple cities as well as analytic synthetic data demonstrate that CoRide provides superior performance in terms of platform revenue and user experience in the task of city-wide hybrid order dispatching and fleet management over strong baselines.
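
The third point, using a ranking weight vector as the action, can be pictured with a small sketch in which orders and repositioning moves share one feature space and are scored by a dot product. The feature layout, names, and numbers are hypothetical, and in CoRide the weight vector would come from the learned policy rather than being set by hand.

```python
def rank_candidates(weights, candidates, top_n):
    """Score each candidate by its dot product with the weight vector.

    weights: sequence of floats (the "action").
    candidates: dict mapping candidate name -> feature tuple of equal length.
    Returns the names of the top_n highest-scoring candidates.
    """
    scored = sorted(
        candidates.items(),
        key=lambda item: sum(w * x for w, x in zip(weights, item[1])),
        reverse=True,
    )
    return [name for name, _ in scored[:top_n]]


# Hypothetical features: (expected fare, negative pickup distance, is_reposition_move)
candidates = {
    "order_1": (10.0, -2.0, 0.0),
    "order_2": (6.0, -0.5, 0.0),
    "move_to_cell_7": (0.0, -1.0, 1.0),
}
serve_orders = rank_candidates((1.0, 1.0, 4.0), candidates, top_n=2)
reposition = rank_candidates((0.1, 1.0, 4.0), candidates, top_n=1)
```

Changing only the weight vector shifts the choice from serving orders to repositioning, which is how a single formulation can cover both kinds of decision.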

Unsupervised Representation Learning of Spatial Data via Multimodal Embedding

Increasing urbanization across the globe has coincided with greater access to urban data; this equips researchers and city administrators with better tools to understand urban dynamics, such as crime, traffic, and living standards. In this paper, we study the Learning an Embedding Space for Regions (LESR) problem, wherein we aim to produce vector representations of discrete regions. Recent studies have shown that embedding geospatial regions in a latent vector space can be useful in a variety of urban computing tasks. However, previous studies do not consider regions across multiple modalities in an end-to-end framework. We argue that doing so facilitates the learning of greater semantic relationships among regions. We propose a novel method, RegionEncoder, that jointly learns region representations from satellite image, point-of-interest, human mobility, and spatial graph data. We demonstrate that these region embeddings are useful as features in two regression tasks and across two distinct urban environments. Additionally, we perform an ablation study that evaluates each major architectural component. Finally, we qualitatively explore the learned embedding space, and show that semantic relationships are discovered across modalities.

SESSION: Long - User Behavior

Rating Mechanisms for Sustainability of Crowdsourcing Platforms

Crowdsourcing leverages the diverse skill sets of large collections of individual contributors to solve problems and execute projects, where contributors may vary significantly in experience, expertise, and interest in completing tasks. Hence, to ensure the satisfaction of its task requesters, most existing crowdsourcing platforms focus primarily on supervising contributors' behavior. This lopsided approach to supervision negatively impacts contributor engagement and platform sustainability.

In this paper, we introduce rating mechanisms to evaluate requesters' behavior, such that the health and sustainability of the crowdsourcing platform can be improved. We build a game theoretical model to systematically account for the different goals of requesters, contributors, and the platform, and their interactions. On the basis of this model, we focus on a specific application, in which we aim to design a rating policy that incentivizes requesters to engage less-experienced contributors. Considering the hardness of the problem, we develop a time-efficient heuristic algorithm with theoretical bound analysis. Finally, we conduct a user study on Amazon Mechanical Turk (MTurk) to validate the central hypothesis of the model. We provide a simulation based on 3 million task records extracted from MTurk demonstrating that our rating policy can appreciably motivate requesters to hire less-experienced contributors.

Exploring The Interaction Effects for Temporal Spatial Behavior Prediction

In location-based services, predicting users' temporal-spatial behavior is critical for accurate recommendation. In this paper, we adopt a joint embedding (JointE) model to learn the representations of users, locations, and users' actions in the same latent space. The functionality of a location is the critical factor influencing different elements of the behavior and is learned by an embedding vector encoding crowd behaviors. A user's personalized preference is learned from the user's historical behaviors and has two features. One is the combination of action and location, which is learned by maximizing the semantic consistency of the observed behaviors. The other is the periodic preference. Inspired by the notion of periodical temporal rules, we introduce the concept of temporal pattern to describe how often users visit places so as to reduce the high temporal variance of behaviors. A projection matrix is introduced to combine the temporal patterns with location functionality. A user's behavior is predicted by the joint probability of the behavior elements. We conduct experiments on two representative datasets. The results show that our approach outperforms other approaches.

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small easily digestible summaries of the underlying page. Search engines and social media have a different focus, and hence produce different surrogates than web archives. Search engine surrogates help a user answer the question "Will this link meet my information need?" Social media surrogates help a user decide "Should I click on this?" Our use case is subtly different. We hypothesize that groups of surrogates together are useful for summarizing a collection. We want to help users answer the question of "What does the underlying collection contain?" But which surrogate should we use? With Mechanical Turk participants, we evaluate six different surrogate types against each other. We find that the type of surrogate does not influence the time to complete the task we presented the participants. Of particular interest are social cards, surrogates typically found on social media, and browser thumbnails, screen captures of web pages rendered in a browser. At p=0.0569, and p=0.0770, respectively, we find that social cards and social cards paired side-by-side with browser thumbnails probably provide better collection understanding than the surrogates currently used by the popular Archive-It web archiving platform. We measure user interactions with each surrogate and find that users interact with social cards less than other types. The results of this study have implications for our web archive summarization work, live web curation platforms, social media, and more.

Learning from Dynamic User Interaction Graphs to Forecast Diverse Social Behavior

Most of the existing graph analytics for understanding social behavior focuses on learning from static rather than dynamic graphs using hand-crafted network features or recently emerged graph embeddings learned independently from a downstream predictive task, and solving predictive (e.g., link prediction) rather than forecasting tasks directly. To address these limitations, we propose (1) a novel task -- forecasting user interactions over dynamic social graphs, and (2) a novel deep learning, multi-task, node-aware attention model that focuses on forecasting social interactions, going beyond recently emerged approaches for learning dynamic graph embeddings. Our model relies on graph convolutions and recurrent layers to forecast future social behavior and interaction patterns in dynamic social graphs. We evaluate our model on the ability to forecast the number of retweets and mentions of a specific news source on Twitter (focusing on deceptive and credible news sources) with R^2 of 0.79 for retweets and 0.81 for mentions. An additional evaluation includes model forecasts of user-repository interactions on GitHub and comments to a specific video on YouTube with a mean absolute error close to 2% and R^2 exceeding 0.69. Our results demonstrate that learning from connectivity information over time in combination with node embeddings yields better forecasting results than when we incorporate the state-of-the-art graph embeddings e.g., Node2Vec and DeepWalk into our model. Finally, we perform in-depth analyses to examine factors that influence model performance across tasks and different graph types e.g., the influence of training and forecasting windows as well as graph topological properties.

Understanding Default Behavior in Online Lending

Microcredit, very small loans given out without any collateral, is a new form of financial instrument that serves the segment of the population that is typically underserved by traditional financial services. When microcredit takes the form of lending over the internet, it has the advantage of an easy online application process and fast funding for borrowers, as well as an attractive rate of return for individual lenders. For platforms that facilitate such activities, the key challenge lies in risk management, i.e., adequately pricing each loan's risk so as to balance borrowers' lending cost and lenders' risk-adjusted return. In fact, identifying default borrowers is of critical importance for the ecosystem. Traditionally, credit risk depends heavily on borrowers' historical loan records. However, most borrowers do not have any bureau history, and therefore cannot provide sufficient loan records. In this paper, we study default prediction in online lending by using social behavior. Specifically, we base our work on a dataset provided by PPDai, one of the leading platforms in China. Our dataset consists of over 11 million users and more than 1.5 billion call logs between them. We establish a mobile network and explore social factors that predict borrowers' default. Based on this, we focus on cheating agents, who recruit and teach borrowers to cheat by providing false information and faking application materials. Cheating agents represent a type of default that is especially detrimental to the system. We propose a novel probabilistic framework to identify default borrowers and cheating agents simultaneously. Experimental results on a production dataset demonstrate significant improvement over several baseline methods. Moreover, our model can effectively identify cheating agents without any labels.

SESSION: Short - Interpretability & Reasoning

Interpretable MTL from Heterogeneous Domains using Boosted Tree

Multi-task learning (MTL) aims at improving the generalization performance of several related tasks by leveraging useful information contained in them. However, in industrial scenarios, interpretability is always demanded, and the data of different tasks may be in heterogeneous domains, making the existing methods unsuitable or unsatisfactory. In this paper, following the philosophy of boosted trees, we propose a two-stage method. In stage one, a common model is built to learn the commonalities using the common features of all instances. Unlike the training of a conventional boosted tree model, we propose a regularization strategy and an early-stopping mechanism to optimize the multi-task learning process. In stage two, starting by fitting the residual error of the common model, a specific model is constructed with the task-specific instances to further boost the performance. Experiments on both benchmark and real-world datasets validate the effectiveness of the proposed method. Moreover, interpretability can be naturally obtained from the tree-based method, satisfying industrial needs.

Machine Reading Comprehension: Matching and Orders

In this paper, we study the machine reading comprehension of temporal order in text. Given a document of instruction sequences, a model aims to find the most coherent sequence of activities matching the document among all answer candidates. To tackle the task, we propose the OrdMatch model, which is able to match each activity in a sequence to the corresponding instruction in the document and regularizes the partial order of activities to match the order of instructions. We evaluate the task using the RecipeQA dataset, which includes step-by-step instructions of cooking recipes. Our model outperforms the state-of-the-art models by a wide margin. The experimental results demonstrate the effectiveness of our novel ordering regularizer. Our code will be made available.

Aspect and Opinion Aware Abstractive Review Summarization with Reinforced Hard Typed Decoder

In this paper, we study abstractive review summarization. Observing that review summaries often consist of aspect words, opinion words and context words, we propose a two-stage reinforcement learning approach, which first predicts the output word type from the three types, and then leverages the predicted word type to generate the final word distribution. Experimental results on two Amazon product review datasets demonstrate that our method can consistently outperform several strong baseline approaches based on ROUGE scores.

Datalog Reasoning over Compressed RDF Knowledge Bases

Materialisation is often used in RDF systems as a preprocessing step to derive all facts implied by given RDF triples and rules. Although widely used, materialisation considers all possible rule applications and can use a lot of memory for storing the derived facts, which can hinder performance. We present a novel materialisation technique that compresses the RDF triples so that the rules can sometimes be applied to multiple facts at once, and the derived facts can be represented using structure sharing. Our technique can thus require less space, as well as skip certain rule applications. Our experiments show that our technique can be very effective: when the rules are relatively simple, our system is both faster and requires less memory than prominent state-of-the-art RDF systems.

An Explainable Deep Fusion Network for Affect Recognition Using Physiological Signals

Affective computing is an emerging research area which provides insights into humans' mental states through human-machine interaction. During the interaction process, bio-signal analysis is essential to detect human affective changes. Currently, machine learning methods to analyse bio-signals are the state of the art for detecting affective states, but most empirical works mainly deploy traditional machine learning methods rather than deep learning models due to the need for explainability. In this paper, we propose a deep learning model to process multimodal-multisensory bio-signals for affect recognition. It supports batch training for signals with different sampling rates at the same time, and our results show significant improvement compared to the state of the art. Furthermore, the results are interpreted at the sensor and signal levels to improve the explainability of our deep learning model.

SESSION: Short - Machine Learning

MarlRank: Multi-agent Reinforced Learning to Rank

When estimating the relevance between a query and a document, ranking models largely neglect the mutual information among documents. A common intuition is that if two documents are similar with respect to the same query, they are more likely to have similar relevance scores. To mitigate this problem, in this paper, we propose a multi-agent reinforced ranking model, named MarlRank. In particular, by considering each document as an agent, we formulate the ranking process as a multi-agent Markov Decision Process (MDP), where the mutual interactions among documents are incorporated in the ranking process. To compute the ranking list, each document predicts its relevance to a query considering not only its own query-document features but also its similar documents' features and actions. By defining the reward as a function of NDCG, we can optimize our model directly on the ranking performance measure. Our experimental results on two LETOR benchmark datasets show that our model has significant performance gains over the state-of-the-art baselines. We also find that the NDCG shows an overall increasing trend along with the steps of interaction, which demonstrates that the mutual information among documents helps improve the ranking performance.
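Since the abstract defines the reward as a function of NDCG, a minimal sketch of that measure may help. It uses the common exponential-gain, log2-discount formulation; the abstract does not say which NDCG variant or reward shaping MarlRank uses, so the function names and the gain formula here are assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of graded labels listed in ranked order.

    Assumes the common (2^rel - 1) / log2(position + 1) formulation.
    """
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal one."""
    rels = list(relevances)
    k = len(rels) if k is None else k
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0
```

A ranking that already lists documents in decreasing relevance scores 1.0, and any swap that moves a relevant document down lowers the score, which is what makes the measure usable as a reward signal.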

LinkRadar: Assisting the Analysis of Inter-app Page Links via Transfer Learning

Analyzing links among pages from different mobile apps is an important task of app analysis. Currently, most efforts to analyze inter-app page links rely on static program analysis, which produces a lot of false positives, requiring significant manual effort to verify the links. To address the issue, in this paper, we propose LinkRadar, a data-driven approach to assisting the analysis of inter-app page links. Our key idea is to use dynamic program analysis to gather a set of actual inter-app page links, based on which we train a model to predict whether there exist links among pages from different apps to help verify the results of static program analysis. The challenge is that inter-app page links are hard to trigger by dynamic program analysis, making it difficult to collect enough inter-app page links to train the model. Considering the similarity between intra-app page links and inter-app page links, we use transfer learning to deal with the data scarcity problem. Evaluation results show that LinkRadar is able to infer the inter-app page links with high accuracy.

NAD: Neural Network Aided Design for Textile Pattern Generation

Textile pattern design is a challenging task that can hardly be solved by a single deep neural network, due to the requirements of high resolution, periodic tiling, copyright protection and the aesthetic preferences of designers. In this paper, we present our NAD system, which can automatically produce high-quality textile patterns for the printing industry. Our NAD system splits the work into three steps: layout design, image filtering and pattern style transfer. In the first and last steps, we employ different neural models to learn the process of artwork creation by human designers. Specifically, a reinforcement learning model is first developed for layout adjustment, followed by a CNN-based model for style transfer. We have deployed our NAD system in an online production system with real customers and the results are very impressive and promising. The NAD system not only frees human designers from the labor-intensive design process, but also results in a 2%-5% daily purchase rate.

Feature Selection for Facebook Feed Ranking System via a Group-Sparsity-Regularized Training Algorithm

In modern production platforms, large-scale online learning models are applied to data of very high dimension. To save computational resources, it is important to have an efficient algorithm to select the most significant features from an enormous feature pool. In this paper, we propose a novel neural-network-suitable feature selection algorithm, which selects important features from the input layer during training. Instead of directly regularizing the training loss, we inject group-sparsity regularization into the (stochastic) training algorithm. In particular, we introduce a group sparsity norm into the proximally regularized stochastic gradient descent algorithm. To fully evaluate the practical performance, we apply our method to a Facebook News Feed dataset, and achieve favorable performance compared with state-of-the-art methods using traditional regularizers.
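To illustrate what injecting a group sparsity norm into a proximal stochastic gradient step can look like, here is a minimal sketch using the textbook group-lasso proximal operator (group soft-thresholding). The abstract does not give the authors' exact update, so the grouping (one group of input-layer weights per feature), the names `group_soft_threshold` and `proximal_sgd_step`, and the parameters `lr` and `lam` are all assumptions.

```python
import math

def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2.

    Shrinks the whole group towards zero and zeroes it out entirely when its
    Euclidean norm is at most the threshold.
    """
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]

def proximal_sgd_step(weights, grads, lr, lam):
    """One proximal SGD step over a list of weight groups.

    Assumption: each inner list holds the input-layer weights attached to one
    input feature, so a zeroed group means the feature is dropped.
    """
    updated = []
    for w_group, g_group in zip(weights, grads):
        stepped = [w - lr * g for w, g in zip(w_group, g_group)]
        updated.append(group_soft_threshold(stepped, lr * lam))
    return updated
```

Because a group is either kept (and shrunk) or set exactly to zero, the features whose groups survive training are the selected ones.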

Fine-Grained Geolocalization of User-Generated Short Text based on Weight Probability Model

Recently, the fine-grained geolocalization of User-Generated Short Text (UGST) has become increasingly important. Existing methods cannot make full use of the location information in UGSTs. Besides, existing works only consider the importance of terms for all locations, but do not distinguish the importance of the same term in different locations. To solve these problems, we propose a fine-grained geolocalization method based on a weight probability model (FGST-WP). The method consists of three main parts: 1) using the reverse maximum match algorithm to filter out UGSTs that do not contain any location-indicative information; 2) building couplings of terms and locations and adopting a mixed weight strategy to assign weights to terms; 3) calculating the probability that a non-geotagged UGST was posted from each location and selecting k locations according to the top-k probabilities. Experiments on ground-truth datasets demonstrate the superior performance of FGST-WP.

A Compare-Aggregate Model with Latent Clustering for Answer Selection

In this paper, we propose a novel method for the sentence-level answer-selection task, which is a fundamental problem in natural language processing. First, we explore the effect of additional information by adopting a pretrained language model to compute the vector representation of the input text and by applying transfer learning from a large-scale corpus. Second, we enhance the compare-aggregate model by proposing a novel latent clustering method to compute additional information within the target corpus and by changing the objective function from listwise to pointwise. To evaluate the performance of the proposed approaches, experiments are performed with the WikiQA and TREC-QA datasets. The empirical results demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both datasets.

SESSION: Short - Embeddings

Spotting Terrorists by Learning Behavior-aware Heterogeneous Network Embedding

A heterogeneous network is a useful data representation for depicting complex interactions among multi-typed entities and relations. In this work, by representing criminal and terrorism activities as a heterogeneous network, we propose a novel unsupervised method, Outlier Spotting with behavior-aware Network Embedding (OSNE), to identify terrorists among potential criminals. The basic idea of OSNE is to exploit high-order relation paths for translation-based embedding learning, and distinguish same-type entities based on behavior penalty and type-aware negative sampling. We evaluate the effectiveness of OSNE using six criminal network datasets provided by DARPA, and compare it with strong competitors. The results exhibit the promising performance of OSNE.

Scalable Manifold-Regularized Attributed Network Embedding via Maximum Mean Discrepancy

Networks are ubiquitous in many real-world applications due to their capability of representing the rich information in the data. One fundamental problem of network analysis is to learn a low-dimensional vector representation for nodes within attributed networks. However, there is little work theoretically considering the information heterogeneity of attributed networks, and most of the existing attributed network embedding techniques are able to capture at most k-th order node proximity, thus leading to the loss of information about the long-range spatial dependencies between individual nodes across the entire network. To address the above problems, in this paper, we propose a novel MAnifold-RegularIzed Network Embedding (MARINE) algorithm inspired by minimizing the information discrepancy in a Reproducing Kernel Hilbert Space via Maximum Mean Discrepancy. In particular, we show that MARINE recursively aggregates the graph structure information as well as individual node attributes from the entire network, and thereby preserves the long-range spatial dependencies between nodes across the network. The experimental results on real networks demonstrate the effectiveness and efficiency of the proposed MARINE algorithm over state-of-the-art embedding methods.
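For readers unfamiliar with Maximum Mean Discrepancy, the following sketch computes the standard biased empirical estimate of the squared MMD between two samples under a Gaussian (RBF) kernel. It illustrates the discrepancy measure only, not the MARINE objective; the kernel choice and the `gamma` bandwidth parameter are assumptions.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased empirical estimate of the squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], with each expectation
    replaced by the mean over all pairs (including identical pairs).
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, gamma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when both samples are identical and approaches 2 when the samples are far apart relative to the kernel bandwidth.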

Tensor Decomposition-based Node Embedding

In recent years, node embedding algorithms, which learn low dimensional vector representations for nodes in a graph, have been one of the key research interests of the graph mining community. The existing algorithms either rely on computationally expensive eigendecomposition of the large matrices, or require tuning of the word embedding-based hyperparameters as a result of representing the graph as a node sequence similar to the sentences in a document. Moreover, the latent features produced by these algorithms are hard to interpret. In this paper, we present Tensor Decomposition-based Node Embedding (TDNE), a novel model for learning node representations for arbitrary types of graphs: undirected, directed, and/or weighted. Our model preserves the local and global structural properties of a graph by constructing a third-order tensor using the k-step transition probability matrices and decomposing the tensor through CANDECOMP/PARAFAC (CP) decomposition in order to produce an interpretable, low dimensional vector space for the nodes. Our experimental evaluation using two well-known social network datasets proves TDNE to be interpretable with respect to the understandability of the feature space, and precise with respect to the network reconstruction.
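The tensor construction described above can be sketched as follows: the adjacency matrix is row-normalized into a one-step transition matrix, and its successive powers are stacked as slices of a third-order tensor. The slice ordering (step index first) and the helper names are assumptions, and the CP decomposition itself is left to a tensor library and not shown.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix into one-step transition probabilities."""
    rows = []
    for row in adj:
        total = sum(row)
        rows.append([value / total if total else 0.0 for value in row])
    return rows

def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def k_step_tensor(adj, max_step):
    """Stack the 1-step to max_step-step transition matrices.

    Returns a list of matrices; slice s - 1 holds the s-step probabilities.
    """
    one_step = transition_matrix(adj)
    slices = [one_step]
    for _ in range(max_step - 1):
        slices.append(matmul(slices[-1], one_step))
    return slices
```

For a graph with n nodes this gives a tensor with max_step slices of size n by n, which is the object a CP decomposition would then factorize into node-level factors.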

Geometric Estimation of Specificity within Embedding Spaces

Specificity is the level of detail at which a given term is represented. Existing approaches to estimating term specificity are primarily dependent on corpus-level frequency statistics. In this work, we explore how neural embeddings can be used to define corpus-independent specificity metrics. Particularly, we propose to measure term specificity based on the distribution of terms in the neighborhood of the given term in the embedding space. The intuition is that a term that is surrounded by other terms in the embedding space is more likely to be specific while a term surrounded by less closely related terms is more likely to be generic. On this basis, we leverage geometric properties between embedded terms to define three groups of metrics: (1) neighborhood-based, (2) graph-based and (3) cluster-based metrics. Moreover, we employ learning-to-rank techniques to estimate term specificity in a supervised approach by employing the three proposed groups of metrics. We curate and publicly share a test collection of term specificity measurements defined based on Wikipedia's category hierarchy. We report on our experiments through metric performance comparison, ablation study and comparison against the state-of-the-art baselines.

Similarity-Aware Network Embedding with Self-Paced Learning

Network embedding, which aims to learn low-dimensional vector representations for nodes in a network, has shown promising performance for many real-world applications, such as node classification and clustering. While various embedding methods have been developed for network data, they are limited by their assumption that nodes are correlated with their neighboring nodes with the same similarity degree. As such, these methods can be suboptimal for embedding network data. In this paper, we propose a new method named SANE, short for Similarity-Aware Network Embedding, to learn node representations by explicitly considering different similarity degrees between connected nodes in a network. In particular, we develop a new framework based on self-paced learning by accounting for both the explicit relations (i.e., observed links) and implicit relations (i.e., unobserved node similarities) in network representation learning. To justify our proposed model, we perform experiments on two real-world network datasets. Experimental results show that SANE outperforms state-of-the-art embedding models on the tasks of node classification and node clustering.

Integrating Multi-Network Topology via Deep Semi-supervised Node Embedding

Node embedding, which uses low-dimensional non-linear feature vectors to represent nodes in a network, has shown great promise, not only because it is easy to use for downstream tasks, but also because it has achieved great success on many network analysis tasks. One of the challenges has been how to develop a node embedding method for integrating topological information from multiple networks. To address this critical problem, we propose a novel node embedding method, called DeepMNE, for multi-network integration using a deep semi-supervised autoencoder. The key point of DeepMNE is that it captures complex topological structures of multiple networks and utilizes correlation among multiple networks as constraints. We evaluate DeepMNE on node classification and link prediction tasks on four real-world datasets. The experimental results demonstrate that DeepMNE shows superior performance over seven state-of-the-art single-network and multi-network embedding algorithms.

SESSION: Short - Time Sequences & Dynamics

Query-Specific Knowledge Summarization with Entity Evolutionary Networks

Given a query, unlike traditional IR that finds relevant documents or entities, in this work, we focus on retrieving both entities and their connections for insightful knowledge summarization. For example, given a query "computer vision" on a CS literature corpus, rather than returning a list of relevant entities like "cnn", "imagenet" and "svm", we are interested in the connections among them, and furthermore, the evolution patterns of such connections along particular ordinal dimensions such as time. Particularly, we hope to provide structural knowledge relevant to the query, such as "svm" is related to "imagenet" but not "cnn". Moreover, we aim to model the changing trends of the connections, such as "cnn" becomes highly related to "imagenet" after 2010, which enables the tracking of knowledge evolutions. In this work, to facilitate such a novel insightful search system, we propose SetEvolve, which is a unified framework based on nonparanormal graphical models for evolutionary network construction from large text corpora. Systematic experiments on synthetic data and insightful case studies on real-world corpora demonstrate the utility of SetEvolve.

Real-time Edge Repartitioning for Dynamic Graph

To improve the performance of large graph computing, graph partitioning has become a mandatory step in distributed graph computing frameworks. Some existing frameworks partition edges of an input graph in a streaming way. As the scale of real-world graphs grows dynamically, they need to limit the increasing communication cost and time cost in graph computing by reducing vertex replicas (each vertex can be replicated to multiple partitions). In this paper, we propose a real-time edge repartitioning algorithm for dynamic graphs, which reduces the vertex replicas by reassigning edges near the new edge. We find that some edges are migrated just after being assigned, which leads to unnecessary migrations. To reduce migration cost, according to the replica distribution of the neighbors of the two vertices connected by the new edge, we assign the new edge to the partition where it is most likely to be located after repartitioning. Our evaluation shows that it improves the performance of graph computing with only a small amount of migration.

DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting

Multivariate time series forecasting has attracted wide attention in areas such as system, traffic, and finance. The difficulty of the task lies in the fact that traditional methods fail to capture complicated non-linear dependencies between time steps and between multiple time series. Recently, recurrent neural networks and attention mechanisms have been used to model periodic temporal patterns across multiple time steps. However, these models do not fit well for time series with dynamic-period patterns or nonperiodic patterns. In this paper, we propose a dual self-attention network (DSANet) for highly efficient multivariate time series forecasting, especially for dynamic-period or nonperiodic series. DSANet completely dispenses with recurrence and utilizes two parallel convolutional components, called global temporal convolution and local temporal convolution, to capture complex mixtures of global and local temporal patterns. Moreover, DSANet employs a self-attention module to model dependencies between multiple series. To further improve the robustness, DSANet also integrates a traditional autoregressive linear model in parallel to the non-linear neural network. Experiments on real-world multivariate time series data show that the proposed model is effective and outperforms baselines.

Time Series Prediction with Interpretable Data Reconstruction

Time series prediction plays a key role in wide applications and has been investigated for a couple of decades. Nevertheless, most of the prior works fail to identify the most effective frequency components of time series before passing through the prediction, which induces the drop of the performance. In this paper, we propose a novel predictor which integrates the sequence to sequence (seq2seq) model based on long short-term memory units (LSTM) with interpretable data reconstruction, where the learned hidden state is taken as a bridge. The reconstructor can effectively regularize data based on the learned frequency components and extract the effective components of time series. Moreover, we present an alternative training mechanism and a dedicated loss function to guarantee the success of prediction. Experimental results on extensive real-world datasets show a prominent superiority in comparison to state-of-the-art methods.

Towards Explainable Representation of Time-Evolving Graphs via Spatial-Temporal Graph Attention Networks

Many complex systems with relational data can be naturally represented as dynamic processes on graphs, with the addition/deletion of nodes and edges over time. For such graphs, network embedding provides an important class of tools for leveraging the node proximity to learn a low-dimensional representation before using the off-the-shelf machine learning models. However, for dynamic graphs, most, if not all, embedding approaches rely on various hyper-parameters to extract spatial and temporal context information, which differ from task to task and from data to data. Besides, many regulated industries (e.g., finance, health care) require the learning models to be interpretable and the output results to meet compliance. Therefore, a natural research question is how we can jointly model the spatial and temporal context information and learn a unique network representation, while being able to provide interpretable inference over the observed data. To address this question, we propose a generic graph attention neural mechanism named STANE, which guides the context sampling process to focus on the crucial part of the data. Moreover, to interpret the network embedding results, STANE enables the end users to investigate the graph context distributions along three dimensions (i.e., nodes, training window length, and time). We perform extensive experiments regarding quantitative evaluation and case studies, which demonstrate the effectiveness and interpretability of STANE.

Deep Prototypical Networks for Imbalanced Time Series Classification under Data Scarcity

With the increase of temporal data availability, time series classification has drawn a lot of attention in the literature because of its wide spectrum of applications in diverse domains (e.g., healthcare, bioinformatics and finance), ranging from human activity recognition to financial pattern identification. While significant progress has been made in solving the time series classification problem, the success of such methods relies on data sufficiency, and they may not capture quality embeddings well when training triple instances are scarce and highly imbalanced across classes. To address these challenges, we propose a prototype embedding framework, Deep Prototypical Networks (DPN), which leverages a main embedding space to capture the discrepancies of different time series classes for alleviating data scarcity. In addition, we further augment the DPN framework with a relationship-dependent masking module to automatically fuse relevant information with a distance metric learning process, which addresses the data imbalance issue and performs robust time series classification. Experimental results show significant and consistent improvements compared to state-of-the-art techniques.

SESSION: Short - Graph Neural Networks

Knowledge-aware Textual Entailment with Graph Attention Network

Textual entailment is a central problem of language variability; it has attracted a lot of interest and poses significant challenges for systems aimed at natural language understanding. Recently, various frameworks have been proposed for textual entailment recognition, ranging from traditional computational linguistics techniques to deep learning based methods. However, recent deep neural networks that achieve the state of the art on the textual entailment task consider only the context information of the given sentences rather than real-world background information and knowledge beyond the context. In this paper, we propose a Knowledge-Context Interactive Textual Entailment Network (KCI-TEN) that learns graph-level sentence representations by harnessing an external knowledge graph with a graph attention network. We further propose a text-graph interaction mechanism for neural entailment matching, which assigns less importance to redundancy and noise and puts emphasis on the informative representations. Experiments on the SciTail dataset demonstrate that KCI-TEN outperforms the state-of-the-art methods.

Fast Approximations of Betweenness Centrality with Graph Neural Networks

Betweenness centrality is an important measure for finding influential nodes in networks in terms of information spread and connectivity. However, the exact calculation of betweenness centrality is computationally expensive. Although researchers have proposed approximation methods, they are either less efficient, or suboptimal, or both. In this paper, we present a Graph Neural Network (GNN) based inductive framework which uses constrained message passing of node features to approximate betweenness centrality. As far as we know, we are the first to propose a GNN based model to accomplish this task. Through extensive experiments on a series of real-world datasets, we demonstrate that our approach dramatically outperforms current techniques while taking less time.
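
For illustration only, the exact computation that this abstract describes as expensive can be sketched with Brandes' algorithm for unweighted, undirected graphs. This is the classical baseline, not the GNN approximation proposed in the paper; the function name and the adjacency-dictionary input format are assumptions made for this sketch.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness for an unweighted, undirected graph (Brandes' algorithm).

    `adj` maps each node to a list of its neighbours (hypothetical input format).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}


# Path graph a - b - c - d: only the two inner nodes lie on shortest paths.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness_centrality(path)
```

Each source node triggers a full breadth-first search, so the cost grows with the product of the number of nodes and edges, which is what motivates approximation on large graphs.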

Neighborhood Interaction Attention Network for Link Prediction

Interactions between neighborhoods of two target nodes are often regarded as important clues for link prediction. In this paper, we propose a novel link prediction neural model named Neighborhood Interaction Attention Network (NIAN), which is able to automatically learn comprehensive neighborhood interaction features and predict links in an end-to-end way. The proposed model mainly consists of two attention layers. A node-level attention is designed to extract latent structure features of nodes in target neighborhoods. Based on the latent node features, a neighborhood-level attention is proposed to learn neighborhood interaction features by considering different importance of pair-wise interactions. The superiority of NIAN is demonstrated by extensive experiments on 6 benchmark datasets against 12 popular and state-of-the-art approaches.

Long-short Distance Aggregation Networks for Positive Unlabeled Graph Learning

Graph neural nets are emerging tools to represent network nodes for classification. However, existing approaches typically suffer from two limitations: (1) they only aggregate information from short distances (e.g., 1-hop neighbors) in each round and fail to capture long-distance relationships in graphs; (2) they require users to label data from several classes to facilitate the learning of discriminative models, whereas in reality, users may only provide labels for a small number of nodes in a single class. To overcome these limitations, this paper presents a novel long-short distance aggregation network (LSDAN) for positive unlabeled (PU) graph learning. Our theme is to generate multiple graphs at different distances based on the adjacency matrix, and further develop a long-short distance attention model for these graphs. The short-distance attention mechanism is used to capture the importance of neighbor nodes to a target node. The long-distance attention mechanism is used to capture the propagation of information within a localized area of each node and to help model the weights of different graphs for node representation learning. A non-negative risk estimator is further employed to aggregate the long- and short-distance networks for PU learning using back-propagated loss modeling. Experiments on real-world datasets validate the effectiveness of our approach.

Using External Knowledge for Financial Event Prediction Based on Graph Neural Networks

This paper focuses on a novel financial event prediction task that takes a historical event chain as input and predicts what event will happen next. We introduce financial news as supplementary information to resolve multiple interpretations of the same financial event. Besides, a gated graph neural network based approach is utilized to capture complicated relationships between event graphs for better event prediction. For the evaluation, we build a new dataset consisting of financial events for thousands of Chinese listed companies from 2013 to 2017. Experimental results show the effectiveness of our proposed model.

Cross-Domain Recommendation via Preference Propagation GraphNet

Recommendation can naturally be framed as a graph link prediction task. The user-item interaction graph built within a single domain often suffers from high sparsity. Thus, there has been a surge of approaches to alleviate the sparsity issue via cross-domain mutual augmentation. The SOTA cross-domain recommendation algorithms all try to bridge the gap via knowledge transfer in the latent space. We find three main problems in their formulations: 1) their knowledge transfer is unaware of the cross-domain graph structure; 2) their frameworks cannot capture high-order information propagation on the graph; 3) their cross-domain transfer formulations are generally harder to optimize than unified methods. In this paper, we propose the Preference Propagation GraphNet (PPGN) to address the above problems. Specifically, we construct a Cross-Domain Preference Matrix (CDPM) to model the interactions of different domains as a whole. Through the propagation layer of PPGN, we try to capture how user preferences propagate in the graph. Consequently, a joint objective for different domains is defined, and we simplify cross-domain recommendation into a unified multi-task model. Extensive experiments on two pairs of real-world datasets show that PPGN outperforms the SOTA algorithms significantly.

SESSION: Short - Recommendation

ARP: Aspect-aware Neural Review Rating Prediction

Review rating prediction is an important task in the data mining and natural language processing fields, and has wide applications. Users usually express opinions towards many aspects in their reviews, and the overall review rating is a synthesis of these opinions. However, most existing review rating prediction methods ignore users' opinions on aspects, which is insufficient. In this paper, we propose a neural aspect-aware rating prediction approach for Chinese reviews. In our approach, we propose a collaborative learning framework to jointly train a review-level rating predictor and multiple aspect-level rating predictors. In our framework, different rating predictors share the same review encoder model to exploit the inherent relatedness between them, but have different attention networks to focus on different informative texts for each task. The final review representation for rating prediction is a concatenation of the review representations from all predictors. Since word segmentation of Chinese reviews is usually inaccurate, we propose a multi-view learning model to learn review representations from both words and characters. Extensive experiments on a real-world dataset validate the effectiveness of our approach.

CosRec: 2D Convolutional Neural Networks for Sequential Recommendation

Sequential patterns play an important role in building modern recommender systems. To this end, several recommender systems have been built on top of Markov Chains and Recurrent Models (among others). Although these sequential models have proven successful at a range of tasks, they still struggle to uncover complex relationships nested in user purchase histories. In this paper, we argue that modeling pairwise relationships directly leads to an efficient representation of sequential features and captures complex item correlations. Specifically, we propose a 2D convolutional network for sequential recommendation (CosRec). It encodes a sequence of items into a three-way tensor; learns local features using 2D convolutional filters; and aggregates high-order interactions in a feedforward manner. Quantitative results on two public datasets show that our method outperforms both conventional methods and recent sequence-based approaches, achieving state-of-the-art performance on various evaluation metrics.

Data Poisoning Attacks on Cross-domain Recommendation

Cross-domain recommendation has attracted growing interest given its simplicity and effectiveness. In cross-domain scenarios, we may improve predictive accuracy in one domain by transferring knowledge from the other, which alleviates the data sparsity issue. However, the relatedness of these domains can be exploited by a malicious party to launch data poisoning attacks. Here we study the vulnerability of cross-domain recommendation under data poisoning attacks. We show that data poisoning attacks can be formulated as a bilevel optimization problem. Our experimental results show that cross-domain systems can be compromised under attacks, highlighting the need for countermeasures against data poisoning attacks in cross-domain recommendation.

Session-based Recommendation with Hierarchical Memory Networks

The task of session-based recommendation aims to predict users' future interests based on anonymous historical sessions. Recent works have shown that memory models, which capture user preference from previous interaction sequences with long short-term or short-term memory, can lead to encouraging results on this problem. However, most existing memory models tend to regard each item as a memory unit, which neglects n-gram features and is insufficient to learn the user's feature-level preferences. In this paper, we aim to leverage n-gram features and model users' feature-level preferences in an explicit and effective manner. To this end, we present a memory model with multi-scale feature memory for session-based recommendation. A densely connected convolutional neural network (CNN) with short-cut paths between upstream and downstream convolutional blocks is applied to build multi-scale features from item representations, and features at the same scale are combined with a memory mechanism to capture users' feature-level preferences. Furthermore, attention is used to adaptively select users' multi-scale feature-level preferences for recommendation. Extensive experiments conducted on two benchmark datasets demonstrate the effectiveness of the proposed model in comparison with competitive baselines.

Correcting for Recency Bias in Job Recommendation

Users are known to interact more with fresh content in certain temporally associated domains such as news search or job seeking, leading to an uneven distribution of interactions over items of different degrees of freshness. Data collected under such an "aging effect" is usually used unconditionally on all sorts of recommendation tasks, and as a result more recently published content may be over-represented during model training and evaluation. In this study, we characterize this temporal influence as a recency bias, and present an analysis in the domain of job recommendation. We show that, by correcting for recency bias using an unbiased learning to rank approach, one can improve the quality of recommendation significantly over a recent neural collaborative filtering model on RecSys Challenge 2017 data.

Motif Enhanced Recommendation over Heterogeneous Information Network

Heterogeneous Information Networks (HIN) have been widely used in recommender systems (RSs). In previous HIN-based RSs, meta-paths are used to compute the similarity between users and items. However, existing meta-path based methods only consider first-order relations, ignoring higher-order relations among nodes of the same type, which are captured by motifs. In this paper, we propose to use motifs to capture higher-order relations among nodes of the same type in a HIN and develop the motif-enhanced meta-path (MEMP) to combine motif-based higher-order relations with edge-based first-order relations. With MEMP-based similarities between users and items, we design a recommendation model, MoHINRec, and experimental results on two real-world datasets, Epinions and CiaoDVD, demonstrate its superiority over existing HIN-based RS methods.

SESSION: Short - Algorithm

GPU-Accelerated Decoding of Integer Lists

An inverted index is the basic data structure used in most current large-scale information retrieval systems. It can be modeled as a collection of sorted sequences of integers. Many compression techniques for inverted indexes have been studied in the past, with some of them reaching tremendous decompression speeds through the use of SIMD instructions available on modern CPUs. While there has been some work on query processing algorithms for Graphics Processing Units (GPUs), little of it has focused on how to efficiently access compressed index structures, and we see some potential for significant improvements in decompression speed.

In this paper, we describe and implement two encoding schemes for index decompression on GPU architectures. Their formats and decoding algorithms are adapted from existing CPU-based compression methods to exploit the execution model and memory hierarchy offered by GPUs. We show that our solutions, GPU-BP and GPU-VByte, achieve significant speedups over their already carefully optimized CPU counterparts.
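
As background for the abstract above, the sketch below shows a plain, sequential variable-byte (VByte) codec over the gaps of a sorted integer list. It is not the GPU-VByte format from the paper: the byte layout (7 payload bits per byte, low-order bits first, high bit set when more bytes follow) is one common convention chosen here as an assumption, and the function names are hypothetical.

```python
def vbyte_encode(sorted_ids):
    """Encode an increasing list of non-negative integers as VByte-coded gaps."""
    out = bytearray()
    prev = 0
    for doc_id in sorted_ids:
        gap = doc_id - prev  # store differences, which are small for dense lists
        prev = doc_id
        # 7 payload bits per byte; the high bit signals that more bytes follow.
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode: rebuild gaps byte by byte, then prefix-sum them."""
    ids = []
    value = 0
    shift = 0
    prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating this gap
        else:
            prev += value  # last byte of the gap: add it to the running id
            ids.append(prev)
            value = 0
            shift = 0
    return ids


postings = [3, 130, 131, 20000]
encoded = vbyte_encode(postings)
decoded = vbyte_decode(encoded)
```

The decoder is inherently sequential, since the position of each value depends on the bytes before it; this dependency is the kind of obstacle a parallel decoder has to work around.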

Synergizing Local and Global Models for Matrix Approximation

Ensemble matrix approximation (MA) methods have achieved promising performance in collaborative filtering, many of which perform matrix approximation on multiple submatrices of user-item ratings in parallel and then combine the predictions from the sub-models for higher efficiency. However, data partitioning could lead to suboptimal accuracy due to the lack of capturing structural information related to most or all users/items. This paper proposes a new ensemble learning framework, in which the local models and global models are synergetically updated from each other. This makes it possible to capture both local associations in user-item subgroups and global structures over all users and items. Experiments on three real-world datasets demonstrate that the proposed method outperforms six state-of-the-art methods in recommendation accuracy with decent scalability.

Deep Colorization by Variation

We propose an adversarial learning based model for image colorization, in which we carefully adapt image translation mechanisms that are optimized for the task. After developing approaches to improve the global and local quality of image colorization by analyzing the processing performed by the network architecture and objective functions, we formulate a diverse mapping from grayscale images to color images through latent space variation within the model. Finally, we discuss a theoretical framework for studying color information distribution and video colorization.

Fast Random Forest Algorithm via Incremental Upper Bound

Random forest is an ensemble approach based on decision trees. It computes the best split in each node in terms of impurity reduction. However, the impurity computations incur high computation cost in its training process. This paper proposes F-forest, an efficient variant of random forest. It incrementally estimates upper bounds for scores that correspond to impurity reductions to find the best split. Since we can safely skip unnecessary computations, it can guarantee the same training result as the original approach. Experiments show that our approach is faster than state-of-the-art approaches.
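
To make the cost referred to above concrete, the following sketch performs the exhaustive best-split search on a single numeric feature using Gini impurity. It shows only the standard baseline computation, not F-forest's upper-bound pruning; the choice of Gini impurity, the midpoint thresholds, and the function names are assumptions of this sketch.

```python
def gini(counts):
    """Gini impurity of a node given its per-class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, impurity_reduction) for the best split `value <= threshold`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    right = {}
    for y in labels:
        right[y] = right.get(y, 0) + 1
    left = {}
    parent = gini(right)
    best = (None, 0.0)
    # Sweep the sorted samples, moving one sample at a time from right to left.
    for rank, i in enumerate(order[:-1], start=1):
        y = labels[i]
        left[y] = left.get(y, 0) + 1
        right[y] -= 1
        nxt = values[order[rank]]
        if values[i] == nxt:
            continue  # no threshold separates two equal feature values
        reduction = parent - (rank / n) * gini(left) - ((n - rank) / n) * gini(right)
        if reduction > best[1]:
            best = ((values[i] + nxt) / 2.0, reduction)
    return best


threshold, gain = best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
```

Every candidate threshold requires an impurity evaluation for both children, and this is repeated for every feature at every node; skipping candidates that provably cannot win is what allows a speed-up without changing the resulting tree.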

Convolution-Consistent Collective Matrix Completion

Collective matrix completion refers to the problem of simultaneously predicting the missing entries in multiple matrices by leveraging the cross-matrix information. It finds abundant applications in various domains such as recommender system, dimensionality reduction, and image recovery. Most of the existing work represents the cross-matrix information in a shared latent structure constrained by the Euclidean-based pairwise similarity, which may fail to capture the nonlinear relationship of the data. To address this problem, in this paper, we propose a new collective matrix completion framework, named C4, which uses the graph spectral filters to capture the non-Euclidean cross-matrix information. To the best of our knowledge, this is the first effort to represent the cross-matrix information in the graph spectral domain. We benchmark our model against 8 recent models on 10 real-world data sets, and our model outperforms state-of-the-art methods in most tasks.

Faster Algorithms for k-Regret Minimizing Sets via Monotonicity and Sampling

Regret-based queries complement top-k and skyline queries when users cannot specify accurate utility functions but the size of the query result must be controllable. Various regret-based queries have been proposed in the last decade for multi-criteria decision making. The k-regret minimizing set (k-RMS) query, which returns r points from the dataset and minimizes the maximum k-regret ratio, has been extensively studied. However, existing state-of-the-art algorithms for finding k-regret minimizing sets are very time-consuming and impractical. In this paper, we propose a faster algorithm, SAMPGREED, for k-RMS queries that combines the monotonicity of the regret ratio function with sampling techniques. We provide a theoretical analysis of SAMPGREED, and experiments on synthetic and real datasets verify that it is superior to existing state-of-the-art approaches.
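
The quantity being minimized can be illustrated with a short sketch. It assumes linear utility functions and approximates the maximum over all utilities by a maximum over a finite sample of utility vectors; it evaluates a given subset and is not the SAMPGREED selection algorithm. The function names and the sample data are hypothetical.

```python
def dot(u, p):
    return sum(ui * pi for ui, pi in zip(u, p))


def k_regret_ratio(dataset, subset, utility, k=1):
    """Regret of answering with `subset` instead of the k-th best point of `dataset`."""
    scores = sorted((dot(utility, p) for p in dataset), reverse=True)
    kth_best = scores[k - 1]
    if kth_best <= 0:
        return 0.0
    best_in_subset = max(dot(utility, p) for p in subset)
    return max(0.0, kth_best - best_in_subset) / kth_best


def max_k_regret_ratio(dataset, subset, utilities, k=1):
    """Worst case over a finite sample of utility vectors (an approximation)."""
    return max(k_regret_ratio(dataset, subset, u, k) for u in utilities)


points = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
chosen = [(0.6, 0.6)]
sampled_utilities = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

worst_1 = max_k_regret_ratio(points, chosen, sampled_utilities, k=1)
worst_2 = max_k_regret_ratio(points, chosen, sampled_utilities, k=2)
```

In this toy data the single balanced point loses 40% of the best achievable score for users who care about only one attribute, but it is never worse than the second-best point, so its 2-regret ratio is zero.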

Towards Stochastic Simulations of Relevance Profiles

Recently proposed methods allow the generation of simulated scores representing the values of an effectiveness metric, but they do not investigate the generation of the actual lists of retrieved documents. In this paper we address this limitation: we present an approach that exploits an evolutionary algorithm and, given a metric score, creates a simulated relevance profile (i.e., a ranked list of relevance values) that produces that score. We show how the simulated relevance profiles are realistic under various analyses.

SESSION: Short - Anomaly Detection

SpecAE: Spectral AutoEncoder for Anomaly Detection in Attributed Networks

Anomaly detection in attributed networks (instance-to-instance dependencies and interactions are available) has various applications such as monitoring suspicious accounts in social media and financial fraud in transaction networks. However, it remains a challenging task since the definition of anomaly becomes more complicated and topological structures are heterogeneous with nodal attributes. In this paper, we propose a spectral convolution and deconvolution based framework - SpecAE, to project the attributed network into a tailored space to detect global and community anomalies. SpecAE leverages Laplacian sharpening to amplify the distances between representations of anomalies and the ones of the majority. The learned representations along with reconstruction errors are combined with a density estimation model to perform the detection. Experiments on real-world datasets demonstrate the effectiveness of the proposed SpecAE.

On Continuously Matching of Evolving Graph Patterns

An evolving pattern graph is defined by an initial pattern graph and a graph update stream consisting of edge insertions and deletions. Identifying and monitoring evolving graph patterns in the data graph is important in various application domains such as cyberthreat surveillance. This motivates us to explore matching patterns that evolve, and we present a novel algorithm, \incepg, for continuously matching evolving patterns. Specifically, we propose a concise representation, \Index, of partial matching solutions, whose execution model allows fast incremental maintenance. We also devise an effective model for estimating the step-wise cost of pattern evaluation to drive the matching process. Extensive experiments verify the superiority of \incepg.

Time-Series Aware Precision and Recall for Anomaly Detection: Considering Variety of Detection Result and Addressing Ambiguous Labeling

We propose time-series aware precision and recall, which are appropriate for evaluating anomaly detection methods on time-series data. In time-series data, an anomaly corresponds to a series of instances. The conventional metrics, however, overlook this characteristic, so they tend to give a high score to a method that detects only a long anomaly. To overcome this problem, our metrics give more weight to the variety of the detected anomalies through two scoring strategies: detection scoring (i.e., how many anomalies are detected) and portion scoring (i.e., how precisely each anomaly is detected). Moreover, our metrics account for ambiguous instances, that is, instances labeled as 'normal' although they are affected by a preceding anomaly. Our metrics give smaller scores to those instances as they are likely to be anomalous. Using a real-world dataset as well as several examples, we demonstrate that our metrics are more suitable for time-series data than existing metrics. Our code and the detection results are available online.
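
The weakness of the conventional metrics can be reproduced with a small example. The sketch below contrasts point-wise recall with a simple count of how many anomaly ranges were hit at all; the second function is a simplified stand-in for the idea behind detection scoring, not the metric defined in the paper, and all names and data are hypothetical.

```python
def pointwise_recall(labels, preds):
    """Conventional recall: fraction of anomalous instances that were flagged."""
    true_pos = sum(1 for y, p in zip(labels, preds) if y and p)
    positives = sum(labels)
    return true_pos / positives if positives else 0.0


def anomaly_ranges(labels):
    """Group consecutive 1s into (start, end) index ranges, end exclusive."""
    ranges = []
    start = None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(labels)))
    return ranges


def detected_fraction(labels, preds):
    """Fraction of anomaly ranges containing at least one flagged instance."""
    ranges = anomaly_ranges(labels)
    if not ranges:
        return 0.0
    hits = sum(1 for start, end in ranges if any(preds[start:end]))
    return hits / len(ranges)


# One long anomaly (8 instances) and two short ones (1 instance each).
labels = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
# A detector that finds only the long anomaly.
preds = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

recall = pointwise_recall(labels, preds)
coverage = detected_fraction(labels, preds)
```

Here the detector misses two of the three anomalies yet still reaches a point-wise recall of 0.8, while counting per anomaly credits it with only one third.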

Additive Explanations for Anomalies Detected from Multivariate Temporal Data

Detecting anomalies from high-dimensional multivariate temporal data is challenging, because of the non-linear, complex relationships between signals. Recently, deep learning methods based on autoencoders have been shown to capture these relationships and accurately discern between normal and abnormal patterns of behavior, even in fully unsupervised scenarios. However, validating the anomalies detected is difficult without additional explanations. In this paper, we extend SHAP -- a unified framework for providing additive explanations, previously applied for supervised models -- with influence weighting, in order to explain anomalies detected from multivariate time series with a GRU-based autoencoder. Namely, we extract the signals that contribute most to an anomaly and those that counteract it. We evaluate our approach on two use cases and show that we can generate insightful explanations for both single and multiple anomalies.

ED2: A Case for Active Learning in Error Detection

State-of-the-art approaches formulate error detection as a semi-supervised classification problem. Recent research suggests that active learning is insufficiently effective for error detection and proposes the usage of neural networks and data augmentation to reduce the number of user-provided labels. However, we show that, with an appropriate active learning strategy, it is possible to outperform the more complex models that rely on data augmentation. To this end, we propose a multi-classifier approach with two-stage sampling for active learning. This intuitive and neat sampling method chooses the most promising cells across rows and columns for labeling. On three datasets, ED2 achieves state-of-the-art detection accuracy, while for large datasets the required number of user labels is lower by one order of magnitude compared to the state of the art.

Multi-scale Trajectory Clustering to Identify Corridors in Mobile Networks

Deployment and management of large-scale mobile edge computing infrastructure in 5G networks has created a major challenge for mobile operators. The ability to extract common users' trajectories (i.e., corridors) in mobile networks helps mobile operators to better manage and orchestrate the allocation of network resources. However, compared with other types of trajectories, mobile trajectories are coarse, and their granularity varies due to the inconsistent density of cell towers. To identify the underlying geographical corridors of users in mobile networks, we propose a hierarchical multi-scale trajectory clustering algorithm for corridor identification by analyzing the non-homogeneity of the spatial distribution of cell towers and users' movements. To measure trajectory similarity on different scales we propose a distance measure based on Hausdorff distance that considers the cell density distribution. Common corridors are represented as weighted graphs as the final results, which can not only highlight users' frequent paths but also users' movement pattern between cell towers. The proposed method is validated using real-life datasets provided by China Mobile. Results show that by considering the heterogeneity of mobile networks, our method can achieve the best performance with more than 10% improvement in clustering quality compared with state-of-the-art algorithms.
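
For reference, the standard Hausdorff distance that the proposed measure builds on can be sketched as follows. This is the plain Euclidean version and does not include the cell-density weighting described in the abstract; the trajectories are made-up planar points rather than cell tower locations.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two roughly parallel trajectories; the second one continues further east.
t1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
t2 = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 1.0)]

d_forward = directed_hausdorff(t1, t2)
d_symmetric = hausdorff(t1, t2)
```

The single outlying point in the second trajectory dominates the symmetric distance, which hints at why a fixed-scale measure behaves poorly when sampling density varies along a route.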

SESSION: Short - Recognition

Multi-view Moments Embedding Network for 3D Shape Recognition

Benefiting from rapid developments in deep learning, 3D shape recognition has become a remarkable subject in computer vision systems. The existing methods of multi-perspective views have shown competitive performance in 3D shape recognition. However, they have not yet fully exploited the information among all views of projection. In this paper, we propose a novel Multi-view Moments Embedding Network (MMEN) for capturing multiple moments information. MMEN obtains the similarity between different views and retains the description of the original view by generating a moments matrix for representing the general features of the 3D shape. Additionally, we apply the matrix square-root layer to perform a non-linear scaling of the eigenvalues of the moment embedding matrix. We compare the performance of our proposed network with several state-of-the-art models on the ModelNet datasets, and the results of the average instance/class accuracy demonstrate the promising performance of MMEN on 3D shape recognition.

Active Entity Recognition in Low Resource Settings

The task of Named Entity Recognition (NER) has been well studied under high-resource conditions (e.g., extracting named mentions of PERSON, ORGANIZATION and LOCATION from news articles). However, there are very few studies of the NER task for open-domain collections and in low-resource settings. We focus on NER for low-resource collections, in which any entity types of practical interest to the users of the system must be supported. We try to achieve this with a low cost of annotation of data from the target domain/collection. We propose an entity recognition framework that combines active learning and conditional random fields (CRF), and which provides the flexibility to define new entity types as needed by the users. Our experiments on a help & support corpus show that the system can achieve an F1 measure of 0.77 by relying on only 100 manually annotated sentences.

On Novel Object Recognition: A Unified Framework for Discriminability and Adaptability

The rich and accessible labeled data fueled the revolutionary successes of deep learning in object recognition. However, recognizing objects of novel classes with limited supervision information provided, i.e., Novel Object Recognition (NOR), remains a challenging task. We identify in this paper two key factors for the success of NOR that previous approaches fail to simultaneously guarantee. The first is producing discriminative feature representations for images of novel classes, and the second is generating a flexible classifier readily adapted to novel classes provided with limited supervision signals. To secure both key factors, we propose a framework which decouples a deep classification model into a feature extraction module and a classification module. We learn the former to ensure feature discriminability with a standard multi-class classification task by fully utilizing the competing information among all classes within a training set, and learn the latter to secure adaptability by training a meta-learner network which generates classifier weights whenever provided with minimal supervision information of target classes. Extensive experiments on common benchmark datasets in the settings of both zero-shot and few-shot learning demonstrate our method achieves state-of-the-art performance.

Exploiting Multiple Embeddings for Chinese Named Entity Recognition

Identifying the named entities mentioned in text would enrich many downstream semantic applications. However, due to the predominant usage of colloquial language in microblogs, named entity recognition (NER) in Chinese microblogs suffers significant performance deterioration compared with NER on formal Chinese corpora. In this paper, we propose a simple yet effective neural framework, named ME-CNER, to derive character-level embeddings for NER in Chinese text. A character embedding is derived with rich semantic information harnessed at multiple granularities, ranging from the radical and character levels to the word level. The experimental results demonstrate that the proposed approach achieves a large performance improvement on the Weibo dataset and comparable performance on the MSRA news dataset, at lower computational cost than the existing state-of-the-art alternatives.

Gate-based Bidirectional Interactive Decoding Network for Scene Text Recognition

Scene text recognition has attracted rapidly increasing attention from the research community. Recent dominant approaches typically follow an attention-based encoder-decoder framework that uses a unidirectional decoder to perform decoding in a left-to-right manner, but ignores the equally important right-to-left grammar information. In this paper, we propose a novel Gate-based Bidirectional Interactive Decoding Network (GBIDN) for scene text recognition. Firstly, the backward decoder performs decoding from right to left and generates the reverse language context. After that, the forward decoder simultaneously utilizes the visual context from the image encoder and the reverse language context from the backward decoder through two attention modules. In this way, the bidirectional decoders interact effectively to fully fuse the bidirectional grammar information and further improve the decoding quality. Besides, in order to relieve the adverse effect of noise, we devise a gated context mechanism to adaptively make use of the visual context and the reverse language context. Extensive experiments on various challenging benchmarks demonstrate the effectiveness of our method.

Modeling Long-Range Context for Concurrent Dialogue Acts Recognition

In dialogues, an utterance is a chain of consecutive sentences produced by one speaker, which ranges from a short sentence to a thousand-word post. When studying dialogues at the utterance level, it is not uncommon for an utterance to serve multiple functions. For instance, "Thank you. It works great." expresses both gratitude and positive feedback in the same utterance. Multiple dialogue acts (DA) for one utterance breed complex dependencies across dialogue turns. Therefore, DA recognition challenges a model's predictive power over long utterances and complex DA context. We term this problem Concurrent Dialogue Acts (CDA) recognition. Previous work on DA recognition either assumes one DA per utterance or fails to realize the sequential nature of dialogues. In this paper, we present an adapted Convolutional Recurrent Neural Network (CRNN) which models the interactions between utterances of long-range context. Our model significantly outperforms existing work on CDA recognition on a tech forum dataset.

SESSION: Short - Urbanism and Mobility

Labelling for Venue Visit Detection by Matching Wi-Fi Hotspots with Businesses

User behaviour data is essential for modern companies, as it allows them to measure the impact of decisions they make and to gain new insights. A particular type of such data is user location trajectories, which can be clustered into Points of Interest, which, in turn, can be tied to certain venues (restaurants, schools, theaters, etc.). Machine learning is extensively utilized to detect and predict venue visits given the location data, but it requires a sufficient sample of labeled visits. A few Internet services let a user check in, that is, send a signal that she is visiting a particular venue. However, for the majority of mobile applications it is unreasonable or far-fetched to introduce such functionality for labeling purposes only. In this paper, we present a novel approach to label large quantities of location data as visits based on the following intuition: if a user is connected to a Wi-Fi hotspot of some venue, she is visiting the venue. Namely, we address the problem of matching Wi-Fi hotspots with venues by means of machine learning, achieving 95% precision and 85% recall. The method has been deployed in production at one of the most popular global geo-based web services. We also release the dataset that we used to develop the matching model to facilitate research in this area.

Heterogeneous Components Fusion Network for Load Forecasting of Charging Stations

Accurate load forecasting for charging stations enables managers to reduce drivers' waiting time and operating costs. Existing work on spatial-temporal sequence forecasting usually assumes the spatial continuity of signals. However, this assumption does not hold in the recharging scenario because of the sparse spatial distribution of stations, so the scenario needs further research. To fill the gap, we present a Heterogeneous Components Fusion Network that independently models dual components arising from planned and unplanned recharging events. For the planned recharging component, we design a customized transformer that 'looks up' the reference 'memory' for the prediction. For the highly dynamic unplanned events, we propose a time-variant graph. Experiments conducted on a load reading dataset of 120 stations suggest that our model achieves better performance than a series of state-of-the-art methods for the spatial-temporal sequence prediction problem.

Learning Traffic Signal Control from Demonstrations

Reinforcement learning (RL) has recently become a promising approach in various decision-making tasks. Among them, traffic signal control is one where RL has made a great breakthrough. However, these methods always suffer from the prominent exploration problem and even fail to converge. To resolve this issue, we make an analogy between agents and humans. Agents can learn from demonstrations generated by traditional traffic signal control methods, in a similar way to how people master a skill from expert knowledge. Therefore, we propose DemoLight, which is the first to leverage demonstrations collected from classic methods to accelerate learning. Based on the state-of-the-art deep RL method Advantage Actor-Critic (A2C), training with demonstrations is carried out for both the actor and the critic, followed by reinforcement learning for further improvement. Results on real-world datasets show that DemoLight enables more efficient exploration and outperforms existing baselines with faster convergence and better performance.

Spatio-Temporal Graph Convolutional and Recurrent Networks for Citywide Passenger Demand Prediction

Online ride-sharing platforms have become a critical part of the urban transportation system. Accurately recommending hotspots to drivers on such platforms is essential to help drivers find passengers and improve users' experience, which calls for an efficient passenger demand prediction strategy. However, predicting multi-step passenger demand is challenging due to its high dynamicity, complex dependencies along spatial and temporal dimensions, and sensitivity to external factors (meteorological data and time meta). We propose an end-to-end deep learning framework to address the above problems. Our model comprises three components in a pipeline: 1) a cascade graph convolutional recurrent neural network to accurately extract the spatial-temporal correlations within citywide historical passenger demand data; 2) two multi-layer LSTM networks to represent the external meteorological data and time meta, respectively; 3) an encoder-decoder module to fuse the above two parts and decode the representation to predict multiple steps into the future. The experimental results on three real-world datasets demonstrate that our model can achieve accurate prediction and outperform the most discriminative state-of-the-art methods.

Collaborative Analysis for Computational Risk in Urban Water Supply Systems

Urban Water Supply (UWS) is one of the most critical and sensitive systems for sustaining overall city operations. The European Union (EU) has strict water quality regulations that currently depend, in most cases, on periodic laboratory tests of selected parameters. The tests of some biological parameters can take up to 48 hours, which delays risk detection and lengthens the response time for taking countermeasures in UWS. This situation increases the risk of negative impacts on the health of the general population. To address this challenge, we propose a data-driven risk analysis method which is low-cost and efficient. First, we build a framework for risk evaluation and prediction, within which a risk evaluation model is introduced considering the Quantitative Microbiological Risk Assessment (QMRA) process suggested by the World Health Organization (WHO). Second, we present a collaborative method to analyze biological risk features and similarities across different locations. Third, we propose a new risk prediction algorithm. We apply this method to real-world data collected from 4 UWS systems in Norway. The preliminary results are depicted in risk maps, and prediction accuracies are compared across different strategies. The application results show that our method is practical with good accuracy and explainability.

Long- and Short-term Preference Learning for Next POI Recommendation

Next POI recommendation has been studied extensively in recent years. The goal is to recommend the next POI for users at a specific time given their historical check-in data. Therefore, it is crucial to model users' general taste and recent sequential behavior. Moreover, context information such as the category and check-in time is also important for capturing user preference. To this end, we propose a long- and short-term preference learning model (LSPL) that considers the sequential and context information. In the long-term module, we learn the contextual features of POIs and leverage an attention mechanism to capture users' preference. In the short-term module, we utilize LSTM to learn the sequential behavior of users. Specifically, to better learn the different influence of the location and category of POIs, we train two LSTM models for the location-based sequence and the category-based sequence, respectively. Then we combine the long- and short-term results to recommend the next POI for users. Finally, we evaluate the proposed model on two real-world datasets. The experimental results demonstrate that our method outperforms the state-of-the-art approaches for next POI recommendation.

SESSION: Short - Information Retrieval

Cluster-Based Focused Retrieval

The focused retrieval task is to rank documents' passages by their presumed relevance to a query. Inspired by work on cluster-based document retrieval, we present a novel cluster-based focused retrieval method. The method is based on ranking clusters of similar passages using a learning-to-rank approach and transforming the cluster ranking to passage ranking. Empirical evaluation demonstrates the clear merits of the method.

Cross-modal Image-Text Retrieval with Multitask Learning

In this paper, we propose a multi-task learning approach for cross-modal image-text retrieval. First, a correlation network is proposed for the relation recognition task, which helps learn the complicated relations and common information of different modalities. Then, we propose a correspondence cross-modal autoencoder for the cross-modal input reconstruction task, which helps correlate the hidden representations of two uni-modal autoencoders. In addition, to further improve the performance of cross-modal retrieval, two regularization terms (variance and consistency constraints) are introduced to the cross-modal embeddings such that the learned common information has large variance and is modality invariant. Finally, to enable large-scale cross-modal similarity search, a flexible binary transform network is designed to convert the text and image embeddings into binary codes. Extensive experiments on two benchmark datasets demonstrate that our model has robust superiority over the compared strong baseline methods. Source code is available online.

A Unified Generation-Retrieval Framework for Image Captioning

Recent image captioning approaches are typically either generation-based or retrieval-based. Both kinds of methods have their advantages but are limited by their disadvantages. In this paper, we propose a Unified Generation-Retrieval framework for Image Captioning (UGRIC) by using adversarial learning. Different from previous methods, the proposed UGRIC model leverages the informative contents of N-best response candidates provided by the retrieval-based model to enhance the generation-based method. In addition, to further improve the informativeness of the generated caption, we employ a copying mechanism to choose words from the retrieved candidate captions and put them into proper positions of the output sequence. Experiments on the MSCOCO dataset demonstrate the effectiveness of the UGRIC model through various evaluation metrics. Code and data are available online.

A Lossy Compression Method on Positional Index for Efficient and Effective Retrieval

In query processing, incorporating proximity between query terms is beneficial for effective retrieval. However, using positional data in inverted indexes brings inevitable storage and computing costs. In this paper, we propose a lossy method for compressing term position data in the case of utilizing term proximity. Our method exploits the clustering property of term occurrences, adaptively clusters the nearby occurrences, and replaces the clustered positions with a centralized value. Experimental results show that our adaptive method is competitive with respect to index size, ranking efficiency and effectiveness.
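
The clustering idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the fixed `max_gap` threshold stands in for the adaptive clustering the abstract mentions, the rounded mean stands in for the "centralized value", and the function name `compress_positions` is hypothetical.

```python
from typing import List, Tuple


def compress_positions(positions: List[int], max_gap: int = 3) -> List[Tuple[int, int]]:
    """Group term positions whose consecutive gap is at most max_gap.

    Returns one (representative_position, occurrence_count) pair per cluster.
    The representative is the rounded mean of the positions in the cluster,
    so the exact positions are lost (this is the lossy step).
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough: extend the current cluster
        else:
            clusters.append([pos])    # too far away: start a new cluster
    return [(round(sum(c) / len(c)), len(c)) for c in clusters]


# Six stored positions shrink to three (position, count) pairs.
compressed = compress_positions([4, 5, 7, 40, 42, 120])
print(compressed)  # [(5, 3), (41, 2), (120, 1)]
```

Keeping the occurrence count next to each representative is one possible design choice; it lets a proximity-aware ranker still see how dense a cluster was even though the individual positions are gone.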

Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots

In this paper, we propose an interactive matching network (IMN) for the multi-turn response selection task. First, IMN constructs word representations from three aspects to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE), which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregating with an attention mechanism, is designed. Finally, the bidirectional interactions between whole multi-turn contexts and response candidates are calculated to derive the matching information between them. Experiments on four public datasets show that IMN outperforms the baseline models on all metrics, achieving a new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection.

Analysis of Adaptive Training for Learning to Rank in Information Retrieval

Learning to Rank is an important framework used in search engines to optimize the combination of multiple features in a single ranking function. In the existing work on learning to rank, such a ranking function is often trained on a large set of different queries to optimize the overall performance on all of them. However, the optimal parameters to combine those features are generally query-dependent, making such a strategy of "one size fits all" non-optimal. Some previous works have addressed this problem by suggesting a query-level adaptive training for learning to rank with promising results. However, previous work has not analyzed the reasons for the improvement. In this paper, we present a Best-Feature Calibration (BFC) strategy for analyzing learning to rank models and use this strategy to examine the benefit of query-level adaptive training. Our results show that the benefit of adaptive training mainly lies in the improvement of the robustness of learning to rank in cases where it does not perform as well as the best single feature.

SESSION: Short - E-commerce & Production

QPIN: A Quantum-inspired Preference Interactive Network for E-commerce Recommendation

Recently, methods based on recurrent neural networks (RNNs) have achieved strong performance on mining temporal characteristics in user behavior. However, user preferences change over time, and this has not been fully exploited in e-commerce scenarios. To fill the gap, we propose an approach, called quantum-inspired preference interactive networks (QPIN), which leverages the mathematical formalism of quantum theory (QT) and the long short-term memory (LSTM) network to interactively learn user preferences. Specifically, the tensor product operation is used to model the interaction among a single user's own preferences, i.e. individual preferences. A quantum many-body wave function (QMWF) is employed to model interaction among all users' preferences, i.e. group preferences. Further, we bridge them by deriving a rigorous projection, and thus take the interplay between them into account. Experiments on an Amazon dataset as well as a real-world e-commerce dataset demonstrate the effectiveness of QPIN, which achieves superior performance compared with the state-of-the-art methods in terms of AUC and F1-score.

A Study of Context Dependencies in Multi-page Product Search

In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be considered implicit feedback that indicates their preferences and can be used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually used to deal with such scenarios. However, these methods are designed for document retrieval, where relevance is the most important criterion. In contrast, product search engines need to retrieve items that are not only relevant but also satisfactory in terms of customers' preferences. Personalization based on users' purchase history has been shown to be effective in product search. However, this method captures users' long-term interests, which do not always align with their short-term interests, and does not benefit customers with little or no purchase history. In this paper, we study RF techniques based on both long-term and short-term context dependencies in multi-page product search. We also propose an end-to-end context-aware embedding model which can capture both types of context. Our experimental results show that short-term context leads to much better performance compared with long-term context and no context. Moreover, our proposed model is more effective than state-of-the-art word-based RF models.

Query-bag Matching with Mutual Coverage for Information-seeking Conversations in E-commerce

An information-seeking conversation system aims to satisfy the information needs of users through conversations. Text matching between a user query and a pre-collected question is an important part of information-seeking conversation in E-commerce. In practice, a group of questions always corresponds to the same answer. Naturally, these questions can form a bag. Directly learning the matching between a user query and a bag, denoted as query-bag matching, may improve conversation performance. Inspired by this observation, we propose a query-bag matching model that mainly utilizes the mutual coverage between query and bag, measuring the degree to which the content of the query is mentioned by the bag, and vice versa. In addition, the learned word-level bag representation helps find the main points of a bag at a fine granularity and improves query-bag matching performance. Experiments on two datasets show the effectiveness of our model.

Neural Review Rating Prediction with User and Product Memory

Neural network methods have achieved great success in sentiment classification. Recent studies have found that incorporating user and product information can effectively improve the performance of review sentiment classification. However, most of these studies only concentrate on the influence of users and products, ignoring the inherent correlation between users or products. This information is important for users or products since they can obtain more information from similar users or products. In this paper, we propose a novel framework for review rating prediction with user and product memory. First, besides the original user or product representations, we construct inferred representations from representative users or products which are stored in memory slots. These memory units can be viewed as refined knowledge representations of users or products learned from the data. Then, we employ two hierarchical networks with user attention and product attention using both the original and inferred representations. Experiments on benchmark datasets show that our method can achieve state-of-the-art performance. In addition, our approach performs much better in cold-start scenarios where the training data is scarce.

Intent Term Weighting in E-commerce Queries

E-commerce search engines can fail to retrieve results that satisfy a query's product intent because: (i) conventional retrieval approaches, such as BM25, may ignore the important terms in queries owing to their low "inverse document frequency" (IDF), and (ii) for long queries, as is usually the case in rare queries (i.e., tail queries), they may fail to determine the relevant terms that are representative of the query's product intent. In this paper, we leverage the historical query reformulation logs of a large e-retailer to develop a distant-supervision-based approach to identify the relevant terms that characterize the query's product intent. The key idea underpinning our approach is that the terms retained in the reformulation of a query are more important in describing the query's product intent than the discarded terms. Additionally, we use the fact that the significance of a term depends on its context (other terms in the neighborhood) in the query to determine the term's importance towards the query's product intent. We show that identifying and emphasizing the terms that define the query's product intent leads to a 3% improvement in ranking and outperforms the context-unaware baselines.
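
Point (i) can be made concrete with a small sketch of the IDF component of BM25. The formula below is the smoothed variant used by Lucene-style implementations; the catalog size and document frequencies are made-up numbers for illustration, not figures from the paper.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Smoothed BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


catalog_size = 1_000_000                   # hypothetical number of product titles
common = bm25_idf(catalog_size, 400_000)   # a product-type word found in 40% of titles
rare = bm25_idf(catalog_size, 50)          # a modifier found in only 50 titles

print(f"common term IDF: {common:.2f}")
print(f"rare term IDF:   {rare:.2f}")
```

Under these assumed counts the frequent term receives roughly a tenth of the weight of the rare one, even if the frequent term is the word that names the product the shopper wants. That gap is what an intent-aware term weighting would have to correct.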

Fine-Grained Product Categorization in E-commerce

E-commerce sites usually leverage taxonomies to better organize products. The fine-grained categories, i.e. the leaf categories in taxonomies, are defined by the most descriptive and specific words of products. Fine-grained product categorization remains challenging due to blurred concepts of fine-grained categories (i.e. multiple equivalent or synonymous categories), unstable category vocabulary (i.e. emerging new products and evolving language habits), and a lack of labelled data. To address these issues, we propose a novel Neural Product Categorization model, NPC, to identify fine-grained categories from the product content. NPC is equipped with a character-level convolutional embedding layer to learn the compositional word representations, and a spiral residual layer to extract the word context annotations capturing complex long-range dependencies and structural information. To perform categorization beyond predefined categories, NPC categorizes a product by jointly recognizing categories from the product content and predicting categories from predefined category vocabularies. Furthermore, to avoid extensive human labor, NPC is able to adapt to weak labels generated by mining the search logs, where the customers' behaviors naturally connect products with categories. Extensive experiments performed on datasets from a real e-commerce platform illustrate the effectiveness of the proposed model.

SESSION: Short - Classification

Large Margin Prototypical Network for Few-shot Relation Classification with Fine-grained Features

Relation classification (RC) plays a pivotal role in both natural language understanding and knowledge graph completion. It is generally formulated as a task to recognize the relationship between two entities of interest appearing in a free-text sentence. Conventional approaches to RC, whether based on feature engineering or deep learning, can obtain promising performance on categorizing common types of relation, but leave a large proportion of long-tail relations unrecognizable due to insufficient labeled instances for training. In this paper, we argue that few-shot learning is of great practical significance to RC and thus improve a modern framework of metric learning for few-shot RC. Specifically, we adopt the large-margin ProtoNet with fine-grained features, expecting that they can generalize well on long-tail relations. Extensive experiments were conducted on FewRel, a large-scale supervised few-shot RC dataset, to evaluate our framework: LM-ProtoNet (FGF). The results demonstrate that it can achieve substantial improvements over many baseline approaches.
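
For readers unfamiliar with prototypical networks, the classification step can be sketched as follows. This shows only the basic ProtoNet idea (mean class prototypes, nearest-prototype prediction); the large-margin loss and fine-grained features of LM-ProtoNet (FGF) are not reproduced. The 2-D embeddings and relation labels are invented for illustration, and a real system would obtain embeddings from a trained sentence encoder.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Compute one prototype per class: the mean of its support embeddings."""
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in support.items()
    }


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def classify(query: Vector, protos: Dict[str, List[float]]) -> str:
    """Predict the class whose prototype is closest to the query embedding."""
    return min(protos, key=lambda label: squared_distance(query, protos[label]))


# A 2-way 2-shot episode with hypothetical embeddings.
support = {
    "founder_of": [[0.9, 0.1], [1.1, -0.1]],
    "born_in": [[-1.0, 0.8], [-0.8, 1.2]],
}
protos = prototypes(support)
predicted = classify([0.7, 0.2], protos)
print(predicted)  # founder_of
```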

Meta-GNN: On Few-shot Node Classification in Graph Meta-learning

Meta-learning has recently received tremendous attention as a possible approach for mimicking human intelligence, i.e., acquiring new knowledge and skills with little or even no demonstration. Most existing meta-learning methods are proposed to tackle few-shot learning problems in Euclidean domains, such as images and text. However, there are very few works applying meta-learning to non-Euclidean domains, and the recently proposed graph neural network (GNN) models do not perform effectively on graph few-shot learning problems. To this end, we propose a novel graph meta-learning framework, Meta-GNN, to tackle the few-shot node classification problem in graph meta-learning settings. It obtains the prior knowledge of classifiers by training on many similar few-shot learning tasks and then classifies the nodes from new classes with only a few labeled samples. Additionally, Meta-GNN is a general model that can be straightforwardly incorporated into any existing state-of-the-art GNN. Our experiments conducted on three benchmark datasets demonstrate that our proposed approach not only improves the node classification performance by a large margin on few-shot learning problems in the meta-learning paradigm, but also learns a more general and flexible model for task adaptation.

Enriching Pre-trained Language Model with Entity Information for Relation Classification

Relation classification is an important NLP task to extract relations between entities. The state-of-the-art methods for relation classification are primarily based on Convolutional or Recurrent Neural Networks. Recently, the pre-trained BERT model has achieved very successful results in many NLP classification / sequence labeling tasks. Relation classification differs from those tasks in that it relies on information about both the sentence and the two target entities. In this paper, we propose a model that both leverages the pre-trained BERT language model and incorporates information from the target entities to tackle the relation classification task. We locate the target entities, transfer the information through the pre-trained architecture, and incorporate the corresponding encoding of the two entities. We achieve significant improvement over the state-of-the-art method on the SemEval-2010 task 8 relational dataset.

Unsupervised Concept Drift Detection with a Discriminative Classifier

In data stream mining, one of the biggest challenges is to develop algorithms that deal with changing data. As data evolve over time, static models become outdated. This phenomenon is called concept drift, and it is investigated extensively in the literature. Detecting and subsequently adapting to concept drifts yield more robust and better performing models. In this study, we present an unsupervised method called D3 which uses a discriminative classifier with a sliding window to detect concept drift by monitoring changes in the feature space. It is a simple method that can be used along with any existing classifier that does not intrinsically have a drift adaptation mechanism. We compare against the most prevalent concept drift detectors using 8 datasets. The results demonstrate that D3 outperforms the baselines, yielding models with higher performance on both real-world and synthetic datasets.
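
One plausible reading of "a discriminative classifier with a sliding window" is sketched below: label the older part of the window 0 and the most recent part 1, train a classifier to tell them apart, and flag drift when it separates them well. Everything specific here is an assumption made for illustration and not taken from the paper: the plain logistic regression, the even/odd train/test split, the AUC criterion, the `threshold` of 0.7, and all function names.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs: Sequence[Sequence[float]], ys: Sequence[int],
                   epochs: int = 200, lr: float = 0.1) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = _sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def drift_detected(old_window: Sequence[Sequence[float]],
                   new_window: Sequence[Sequence[float]],
                   threshold: float = 0.7) -> bool:
    """Flag drift if old and new samples are easy to tell apart."""
    xs = list(old_window) + list(new_window)
    ys = [0] * len(old_window) + [1] * len(new_window)
    # Even-indexed samples train the classifier, odd-indexed ones evaluate it.
    weights, bias = train_logistic(xs[0::2], ys[0::2])
    scores = [bias + sum(w * v for w, v in zip(weights, x)) for x in xs[1::2]]
    return auc(scores, ys[1::2]) >= threshold


old = [[(i % 5) * 0.1] for i in range(40)]            # reference data
same = [[(i % 5) * 0.1] for i in range(40)]           # same distribution
shifted = [[(i % 5) * 0.1 + 2.0] for i in range(40)]  # feature shifted by 2.0

print(drift_detected(old, same))     # False
print(drift_detected(old, shifted))  # True
```

No class labels of the monitored stream are needed at any point, which is what makes this style of detector unsupervised: only the feature vectors and their position in the window are used.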

Hybrid Deep Pairwise Classification for Author Name Disambiguation

Author name disambiguation (AND) can be defined as the problem of clustering together unique authors from all author mentions that have been extracted from publication or related records in digital libraries or other sources. Pairwise classification is an essential part of AND, and is used to estimate the probability that any pair of author mentions belong to the same author. Previous studies trained classifiers with features manually extracted from each attribute of the data. Recently, others trained a model to learn a vector representation from text without considering any structure information. Both of these approaches have advantages. The former method takes advantage of the structure of data, while the latter takes into account the textual similarity across attributes. Here, we introduce a hybrid method which takes advantage of both approaches by extracting both structure-aware features and global features. In addition, we introduce a novel way to train a global model utilizing a large number of negative samples. Results on AMiner and PubMed data show a relative improvement in mean average precision (MAP) of more than 7.45% when compared to previous state-of-the-art methods.

Approximate Definitional Constructs as Lightweight Evidence for Detecting Classes Among Wikipedia Articles

A lightweight method applies a few extraction patterns to the task of distinguishing Wikipedia articles that are classes ("Walled garden", "Garden") from other articles ("High Hazels Park"). The method acquires a set of classes, based on patterns targeting phrases that likely refer to either concepts being introduced or defined ("a *walled garden* is a garden [..]"); or to concepts used to introduce or define other concepts ("a walled garden is a *garden* [..]"). Experimental results over multiple evaluation sets are better, when relying on defined phrases alone vs. defining phrases alone; and further improved, when combining complementary evidence from both.

SESSION: Short - Knowledge Extraction & Generation

Towards the Gradient Vanishing, Divergence Mismatching and Mode Collapse of Generative Adversarial Nets

The generative adversarial network (GAN) is a powerful generative model. However, it suffers from gradient vanishing, divergence mismatching and mode collapse. To overcome these problems, we propose a novel GAN, which consists of one generator G and two discriminators (D1, D2). To address gradient vanishing, Spectral Normalization (SN) and ResBlock are first adopted in D1 and D2. Then, Scaled Exponential Linear Units (SELU) are adopted in the last half of the layers of D2 to further address the problem. To address divergence mismatching, a relativistic discriminator is adopted in our GAN to make the loss function minimization in the training of the generator equal to the theoretical divergence minimization. To address mode collapse, D1 rewards high scores for the samples from the data distribution, while D2 conversely favors the samples from the generator. In addition, minibatch discrimination is adopted in D1 to further address the problem. Extensive experiments on CIFAR-10/100 and ImageNet datasets demonstrate that our GAN can obtain the highest inception score (IS) and lowest Frechet Inception Distance (FID) compared with other state-of-the-art GANs.

Generating Paraphrase with Topic as Prior Knowledge

Paraphrase generation can be modeled as a sequence-to-sequence (Seq2Seq) learning problem. Nonetheless, a typical Seq2Seq model is liable to convey the original meaning incorrectly, as the vector representation of the given sentence is sometimes inadequate for recapitulating complicated semantics. Naturally, paraphrases concern the same topic, which can serve as auxiliary guidance to promote the preservation of source semantics. Moreover, some interesting words for restatements can be derived from the topical information. To exploit topic in paraphrase generation, we incorporate topic words into the Seq2Seq framework through a topic-aware input and a topic-biased generation distribution. Direct supervision signals are also introduced to help deal with the topic information more accurately. Empirical studies on two benchmark datasets show that the proposed method significantly improves the basic Seq2Seq model, and it is comparable with the state-of-the-art systems.

Sexual Harassment Story Classification and Key Information Identification

Recently, more and more personal stories about sexual harassment have been shared online, mainly inspired by the #MeToo movement. Safecity is an online forum for victims of sexual harassment to share their personal experience. A previous study applied neural network models to classify the harassment forms of the stories. To uncover patterns of sexual harassment, the extraction of the key elements and the categorization of these stories in different dimensions can be useful as well. In this study, we propose neural network models to extract key elements including harasser, time, location and trigger words. In addition, we categorize these stories along different dimensions, such as location, time, and harassers' characteristics, including their age range, single/multiple harassers, profession, and relationship with the victims. We further demonstrate that encoding the key element information in the story categorization model can improve its performance. The proposed approaches and analysis would be helpful in automatically filing reports, raising public awareness, devising prevention strategies, and so on.

Neural Review Summarization Leveraging User and Product Information

Product review summarization is a special form of text summarization, which gives a brief summary of an online product review. It is useful both for sellers to get feedback and for consumers to make purchase decisions. Compared to traditional well-studied text summarization, product review summarization is highly personalized and targeted. Users have their own styles of writing reviews and summaries, and products have different aspects to focus on. In this paper, we explore different ways to leverage the user and product information to help review summarization. Experiments show that our approaches are very effective and our models outperform the strong summarization baselines by a large margin.

Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning

This paper focuses on how to take advantage of external relational knowledge to improve machine reading comprehension (MRC) with multi-task learning. Most of the traditional methods in MRC assume that the knowledge used to get the correct answer generally exists in the given documents. However, in real-world tasks, part of the knowledge may not be mentioned, and machines should be equipped with the ability to leverage external knowledge. In this paper, we integrate relational knowledge into an MRC model for commonsense reasoning. Specifically, based on a pre-trained language model (LM), we design two auxiliary relation-aware tasks to predict whether there exists any commonsense relation between two words and what the relation type is, in order to better model the interactions between the document and the candidate answer option. We conduct experiments on two multi-choice benchmark datasets: the SemEval-2018 Task 11 and the Cloze Story Test. The experimental results demonstrate the effectiveness of the proposed method, which achieves superior performance compared with the comparable baselines on both datasets.

SESSION: Short - Health & Sentiment

DIRT: Deep Learning Enhanced Item Response Theory for Cognitive Diagnosis

Cognitive diagnosis is the cornerstone of modern educational techniques. One of the most classic cognitive diagnosis methods is Item Response Theory (IRT), which provides interpretable parameters for analyzing student performance. However, traditional IRT only exploits student response results and has difficulties in fully utilizing the semantics of question texts, which significantly restricts its application. To this end, in this paper, we propose a simple yet surprisingly effective framework to enhance the semantic exploiting process, which we term Deep Item Response Theory (DIRT). In DIRT, we first use a proficiency vector to represent student proficiency on knowledge concepts and represent question texts and knowledge concepts by dense embeddings. Then, we use deep learning to enhance the process of diagnosing parameters of students and questions by exploiting question texts and the relationship between question texts and knowledge concepts. Finally, with the diagnosed parameters, we adopt the item response function to predict student performance. Extensive experimental results on real-world data clearly demonstrate the effectiveness and the interpretability of the DIRT framework.
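
As an illustration of the item response function mentioned above, the following is a minimal sketch of the classical two-parameter logistic (2PL) form. The abstract does not state which IRT variant DIRT adopts, so the 2PL choice and the names `theta`, `a`, and `b` are assumptions made for this example only.

```python
import math


def irt_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the classical 2PL IRT model.

    theta: student proficiency (assumed scalar here)
    a:     question discrimination
    b:     question difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When proficiency equals difficulty the predicted probability is one half, and it rises monotonically with proficiency for a positive discrimination value.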

Neural Gender Prediction in Microblogging with Emotion-aware User Representation

Demographics of social media users such as gender are very important for personalized online services. However, the gender information of many users is usually not available. Luckily, the messages posted by social media users can provide rich clues for inferring their genders, since male and female users usually have differences in their message content. In addition, users with different genders often have different patterns in expressing emotions. In this paper, we propose a neural approach for gender prediction in social media based on both content and emotion of messages posted by users. The core of our approach is an emotion-aware hierarchical user representation model. Our model first learns message representations from words using a message encoder and then learns user representations from messages using a user encoder, with hierarchical attention networks selecting important words and messages to learn informative user representations. In addition, we propose two methods to incorporate emotion information in messages into user representation learning. The first one is to incorporate emotion-aware message representations generated by a pre-trained emotion classifier into message representations. The second one is to train emotion-aware message encoders by jointly training our model with an auxiliary emotion classification task. Extensive experiments on two real-world datasets validate the effectiveness of our approach.

Health Card Retrieval for Consumer Health Search: An Empirical Investigation of Methods

This paper investigates methods to rank health cards, a domain-specific type of entity cards, for consumer health search (CHS) queries. A key challenge in this context is which card(s) should be presented to the user. In particular, little evidence exists to determine the effectiveness of retrieval and ranking methods for health cards in CHS. CHS is a challenging domain, where users lack domain expertise and thus are often unable to formulate effective queries, and to interpret the retrieved results. In addition, unlike in other contexts, CHS presents the opportunity to exploit a number of domain specific characteristics and features. In this paper, we focus on difficult queries with self-diagnosis intents. Our study makes the following contributions: (1) it assembles and releases the first test collection of health cards for research purposes, and (2) it empirically evaluates a large range of entity retrieval methods adapted to health cards retrieval, including features specific to health cards for learning to rank. This is the first study that thoroughly investigates methods to rank health cards.

NICE: Neural In-Hospital Cost Estimation from Medical Records

Estimating in-hospital costs from medical records is an important task with many applications such as accountable care. Existing methods for this task usually rely on manual feature engineering which needs massive domain knowledge, and do not exploit the textual information in medical records, e.g., diagnosis and operation texts. In this paper, we propose a neural in-hospital cost estimation (NICE) approach to estimate the in-hospital costs of patients from their admission records. Our approach can exploit the heterogeneous information in records, such as patient features, diagnosis/operation texts, and the diagnosis/operation IDs, via a multi-view learning framework. In addition, since different words, diagnoses and operations have different importance for cost estimation, we propose a hierarchical attention network to select important words, diagnoses and operations for learning informative record representations. Extensive experiments on a real-world medical dataset validate the effectiveness of our approach.

Modeling Sentiment Evolution for Social Incidents

Modeling sentiment evolution for social incidents in Microblogs is of vital importance for both enterprises and government officials. Existing works on sentiment tracking are not satisfying, due to the lack of entity-level sentiment extraction and accurate sentiment shift detection. Identifying entity-level sentiment is challenging, as Microbloggers often use multiple opinion expressions in a sentence that target different entities. Moreover, the evolution of the background sentiment, which is essential to shift detection, is ignored in previous studies. To address these issues, we leverage the proximity information to obtain more precise entity-level sentiment extraction. Based on it, we propose to simultaneously model the evolution of background opinion and the sentiment shift using a state space model on the time series of sentiment polarities. Experiments on real data sets demonstrate that our proposed approaches outperform state-of-the-art methods on the task of modeling sentiment evolution for social incidents.

SESSION: Short - Theory

Adaptive Feature Redundancy Minimization

Most existing feature selection methods select the top-ranked features according to a certain criterion. However, without considering the redundancy among the features, the selected ones are frequently highly correlated with each other, which is detrimental to the performance. To tackle this problem, we propose an adaptive redundancy minimization (ARM) framework for feature selection. Unlike other feature selection methods, the proposed model has the following merits: (1) The redundancy matrix is adaptively constructed instead of being preset as a priori information. (2) The proposed model can pick out the discriminative and non-redundant features by minimizing the global redundancy of the features. (3) ARM can reduce the redundancy of the features from both supervised and unsupervised perspectives.

Finding a Maximum Clique in Dense Graphs via χ2 Statistics

The maximum clique extraction problem finds extensive application in diverse domains like community discovery in social networks, brain connectivity networks, motif discovery, gene expression in bioinformatics, anomaly detection, road networks and expert graphs. Since the problem is NP-hard, known algorithms for finding a maximum clique can be expensive for large real-life graphs. Current heuristics also fail to provide high accuracy and run-time efficiency for dense networks, quite common in the above domains. In this paper, we propose the ALTHEA heuristic to efficiently extract a maximum clique from a dense graph. We show that ALTHEA, based on chi-square statistical significance, is able to dramatically prune the search space for finding a maximum clique, thereby providing run-time efficiency. Further, experimental results on both real and synthetic graph datasets demonstrate that ALTHEA is highly accurate and robust in detecting a maximum clique.

On Heavy-user Bias in A/B Testing

On-line experimentation (also known as A/B testing) has become an integral part of software development. To timely incorporate user feedback and continuously improve products, many software companies have adopted the culture of agile deployment, requiring online experiments to be conducted and concluded on limited sets of users for a short period. While conceptually efficient, the result observed during the experiment duration can deviate from what is seen after the feature deployment, which makes the A/B test result biased. In this paper, we provide theoretical analysis to show that heavy-users can contribute significantly to the bias, and propose a re-sampling estimator for bias adjustment.

Adversarial Training of Gradient-Boosted Decision Trees

Adversarial training is a prominent approach to make machine learning (ML) models resilient to adversarial examples. Unfortunately, such an approach assumes the use of differentiable learning models, hence it cannot be applied to relevant ML techniques, such as ensembles of decision trees. In this paper, we generalize adversarial training to gradient-boosted decision trees (GBDTs). Our experiments show that the performance of classifiers based on existing learning techniques either sharply decreases upon attack or is unsatisfactory in the absence of attacks, while adversarial training provides a very good trade-off between resiliency to attacks and accuracy in the unattacked setting.

Adversarial Structured Neural Network Pruning

In recent years, convolutional neural networks (CNN) have been successfully employed for performing various tasks due to their high capacity. However, just like a double-edged sword, high capacity results from millions of parameters, which also brings a huge amount of redundancy and dramatically increases the computational complexity. The task of pruning a pretrained network to make it thinner and easier to deploy on resource-limited devices is still challenging. In this paper, we employ the idea of adversarial examples to sparsify a CNN. Adversarial examples were originally designed to fool a network. Rather than adjusting the input image, we view any layer as an input to the layers afterwards. By performing an adversarial attack algorithm, the sensitivity information of the network components could be observed. With this information, we perform pruning in a structured manner to retain only the most critical channels. Empirical evaluations show that our proposed approach obtains the state-of-the-art structured pruning performance.

Ontology-Mediated Queries over Probabilistic Data via Probabilistic Logic Programming

We study ontology-mediated querying over probabilistic data for the case when the ontology is formulated in EL(hdr), an expressive member of the EL family of description logics. We leverage techniques that have been developed (i) for classical ontology-mediated querying and (ii) for probabilistic logic programming and provide an implementation based on our findings. We include both theoretical considerations and an experimental evaluation of our approach.

SESSION: Short - Search

Query Embedding Learning for Context-based Social Search

Recommending individuals through keywords is an essential and common search task in online social platforms such as Facebook and LinkedIn. However, one often has only an impression of the desired targets, depicted by labels of social contexts (e.g., gender, interests, skills, visited locations, employment, etc.). Assuming each user is associated with a set of labels, we propose a novel task, Search by Social Contexts (SSC), in online social networks. SSC is a kind of query-based people recommendation, recommending the desired target based on a set of user-specified query labels. We develop the method Social Query Embedding Learning (SQEL) to deal with SSC. SQEL aims to learn the feature representation (i.e., embedding vector) of the query, along with user feature vectors derived from graph embedding, and use the learned query vectors to find the targets via similarity. Experiments conducted on Facebook and Twitter datasets exhibit satisfying accuracy and encourage more advanced efforts on search by social contexts.

Towards More Usable Dataset Search: From Query Characterization to Snippet Generation

Reusing published datasets on the Web is of great interest to researchers and developers. Their data needs may be met by submitting queries to a dataset search engine to retrieve relevant datasets. In this ongoing work towards developing a more usable dataset search engine, we characterize real data needs by annotating the semantics of 1,947 queries using a novel fine-grained scheme, to provide implications for enhancing dataset search. Based on the findings, we present a query-centered framework for dataset search, and explore the implementation of snippet generation and evaluate it with a preliminary user study.

Session-based Search Behavior in Naturalistic Settings for Learning-related Tasks

In this research, we investigate the behavioral patterns exhibited in different search sessions as users attempt to complete search tasks of increasing cognitive complexity. The search tasks, which are exploratory in nature, have been designed using the Taxonomy of Educational Objectives, and are presented to the users hierarchically. We capture naturalistic search behavior of the users in a real-world (non-lab) setting using a Chrome browser plugin. The research analyzes the web log data of the users to assess if and how the web search behavior of the users changes over different search sessions. We also look at different demographic factors like age and gender, educational factors like academic background and reading and writing proficiency in English, and search skills to determine if these factors influence the web search behavior of the users. Our results indicate that search sessions have significant effects on the web search behavior of the users. Most of the web search behaviors differed significantly across search sessions. Of the secondary factors, gender showed a significant effect on query reformulations (measured using average edit distance between queries) and query length (measured using number of words per query), while year of study affected only the average query length. Search experience had a significant effect on all the web search behaviors.

Best Co-Located Community Search in Attributed Networks

Various networks have rich attributes such as texts (e.g., tweets) and locations (e.g., check-ins). Community search in such attributed networks has been intensively studied recently due to its wide applications in recommendation, marketing, biology, etc. In this paper, we study the problem of searching the Best Co-located Community (BCC) in attributed networks, which returns a community that satisfies the following properties: i) structural cohesiveness: members in the community are densely connected, ii) spatial co-location: members are close to each other, and iii) quality optimality: the community has the best quality in terms of given attributes. The problem can be used in social network user behavior analysis, recommendation systems, disease prediction, etc. We first propose an index structure called DTree to integrate the spatial information, the local structure information, and the attribute information together to accelerate the query processing. Then, based on this index we develop an efficient algorithm. The experimental study conducted on both real and synthetic datasets demonstrates the efficiency and effectiveness of the proposed methods.

Caching Scores for Faster Query Processing with Dynamic Pruning in Search Engines

We propose to use a score cache, which stores the score of the k-th result of a query, to accelerate top-k query processing with dynamic pruning methods (i.e., WAND and BMW). We introduce heuristics that, for a new query, generate its subsets and probe the score cache to obtain a lower-bound on its score threshold. Our experiments show up to 8.6% savings in mean processing time for the queries that are not seen before, i.e., cannot benefit from a result cache.
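
The subset-probing idea can be sketched as follows. This is a minimal illustration, not the paper's heuristics: it assumes disjunctive queries with non-negative additive term scores (so a document's score for a query is never lower than its score for any subset of that query), a fixed k, and a hypothetical in-memory `score_cache` mapping a frozenset of terms to the cached k-th result score. It also enumerates all proper subsets, whereas the paper's heuristics presumably generate only some of them.

```python
from itertools import combinations


def threshold_lower_bound(query_terms, score_cache):
    """Return a lower bound on the k-th score of an unseen query.

    score_cache: dict mapping frozenset(terms) -> cached k-th score
    (a hypothetical structure for this sketch). Under the stated
    assumptions, the k-th score of any cached proper subset of the
    query is a valid initial threshold for dynamic pruning.
    """
    terms = sorted(set(query_terms))
    best = 0.0
    # Probe every non-empty proper subset of the query terms.
    for size in range(1, len(terms)):
        for subset in combinations(terms, size):
            best = max(best, score_cache.get(frozenset(subset), 0.0))
    return best
```

A pruning method such as WAND or BMW could then start from this bound instead of zero, skipping documents that cannot reach it.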

Investigating the Learning Process in Job Search: A Longitudinal Study

We investigated the learning process in search by conducting a log-based study involving registered job seekers of a commercial job search engine. The analysis shows that job search is a complex task: seekers usually submit multiple queries over sessions that can last days or even weeks. We find that querying, clicking, and job application rates change over time: job seekers tend to use more filters and a less diverse set of query terms. In terms of click and application behavior, we observed a significant decrease in click rate and query term diversity, as well as an increase in application rates. These trends are found to largely match information seeking models of learning in a complex search task. However, common behaviors are observed in the logs that suggest the existing models may not be sufficient to describe all of the users' learning and seeking processes.

SESSION: Short - System & Database

Set Reconciliation with Cuckoo Filters

Set reconciliation is a common and fundamental task in distributed systems. In many cases, given set A on Host_A and set B on Host_B, applications need to identify those elements that appear in set A but not in set B, and vice versa. However, existing methods incur unsatisfactory space utilization and non-trivial false positives and false negatives. In this paper, we present a novel reconciliation method based on the Cuckoo filter (CF). After exchanging the CFs, each of which represents a set of elements, we query the local elements against the received CF to determine the elements that only belong to the local host and should be transmitted to the other host. The evaluation results indicate that the CF-based reconciliation method outperforms existing methods significantly.
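
The exchange-and-query step can be sketched with a small textbook Cuckoo filter. This is an illustrative sketch only: the bucket count, bucket size, 16-bit fingerprints, SHA-256 hashing, and the helper name `local_only` are assumptions of this example, not details taken from the paper.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal Cuckoo filter using partial-key cuckoo hashing."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count keeps the XOR index trick reversible.
        assert num_buckets & (num_buckets - 1) == 0
        self.m = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # Non-zero 16-bit fingerprint.
        return self._hash(b"fp:" + item.encode()) % 65535 + 1

    def _alt_index(self, index: int, fp: int) -> int:
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.m

    def _indexes(self, item: str, fp: int):
        i1 = self._hash(item.encode()) % self.m
        return i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # the filter is considered full

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


def local_only(local_items, remote_filter):
    """Elements the local host holds that the remote filter does not report."""
    return {x for x in local_items if not remote_filter.contains(x)}
```

Each host would build a filter over its own set, send it to the other host, and call `local_only` on the received filter. Because a Cuckoo filter can return false positives, a small fraction of truly local-only elements may be missed; elements that the remote host does hold are never reported as local-only as long as every insertion succeeded.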

Shared-Nothing Distributed Enumeration of 2-Plexes

We present a novel approach for the detection of 2-plexes, a popular relaxation of cliques used for modeling network communities. Specifically, with the purpose of identifying theoretically sound methods for community detection on a large scale, we introduce the first shared-nothing distributed algorithm for this problem. This result opens a new research direction for scalable community detection. Our proposal has three main ingredients: (i) we reduce the problem of finding 2-plexes to that of finding cliques; (ii) we leverage known algorithms for fast computation of cliques; (iii) we exploit a decomposition technique for a distributed shared-nothing computation. Preliminary experiments on a 10-node cluster running Spark confirm the effectiveness of our approach.
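
For readers unfamiliar with the relaxation, the standard definition of a 2-plex (every member is adjacent to at least |S| - 2 other members, i.e., may miss at most one) can be checked as below. This sketch only illustrates the definition; it is not the distributed enumeration algorithm of the paper, and the adjacency-dictionary representation is an assumption of the example.

```python
def is_2plex(vertices, adj):
    """Check whether `vertices` forms a 2-plex in an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours.
    """
    s = set(vertices)
    # Each vertex needs at least |S| - 2 neighbours inside S.
    return all(len(adj.get(v, set()) & s) >= len(s) - 2 for v in s)
```

Every clique is a 2-plex, while a 4-cycle is a 2-plex that is not a clique.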

Estimating the Number of Distinct Items in a Database by Sampling

Counting the number of distinct items in a dataset is a well known computational problem with numerous applications. Sometimes, exact counting is infeasible, and one must use some approximation method. One approach to approximation is to estimate the number of distinct items from a random sample. This approach is useful, for example, when the dataset is too big, or when only a sample is available, but not the entire data. Moreover, it can considerably speed up the computation. In statistics, this problem is known as the Unseen Species Problem. In this paper, we propose an estimation method for this problem, which is especially suitable for cases where the sample is much smaller than the entire set, and the number of repetitions of each item is relatively small. Our method is simple in comparison to known methods, and gives good enough estimates to make it useful in certain real life datasets that arise in data mining scenarios. We demonstrate our method on real data where the task at hand is to estimate the number of duplicate URLs.
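
To make the problem setting concrete, the sketch below implements the classical Chao1 estimator for the Unseen Species Problem. It is a well-known baseline shown purely for illustration; it is not the estimator proposed in this paper, whose details the abstract does not give.

```python
from collections import Counter


def chao1(sample):
    """Classical Chao1 estimate of the number of distinct items.

    Uses the number of items seen exactly once (f1) and exactly
    twice (f2) in the sample to estimate how many were never seen.
    """
    counts = Counter(sample)
    d = len(counts)  # distinct items observed in the sample
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 == 0:
        # Bias-corrected form, used when no item was seen exactly twice.
        return d + f1 * (f1 - 1) / 2.0
    return d + f1 * f1 / (2.0 * f2)
```

The estimate never falls below the number of distinct items actually observed, and it grows with the share of items seen only once.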

Cost-effective Resource Provisioning for Spark Workloads

Spark is one of the prevalent big data analytical platforms. Configuring proper resource provisioning for Spark jobs is challenging but essential for organizations to save time, achieve high resource utilization, and remain cost-effective. In this paper, we study the challenge of determining the proper parameter values that meet the performance requirements of workloads while minimizing both resource cost and resource utilization time. We propose a simulation-based cost model to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, i.e., Monte Carlo (MC) simulation, which uses a small amount of data and resources to make reliable predictions for larger datasets and clusters. The salient feature of our method is that it keeps the training cost low while obtaining accurate predictions. Through experiments with six benchmark workloads, we demonstrate that the cost model yields less than 7% prediction error on average and that the recommendation achieves up to 5x resource cost savings.

A Sampling-Based System for Approximate Big Data Analysis on Computing Clusters

To break the in-memory bottleneck and facilitate online sampling in cluster computing frameworks, we propose a new sampling-based system for approximate big data analysis on computing clusters. We address both computational and statistical aspects of big data across the main layers of cluster computing frameworks: big data storage, big data management, big data online sampling, big data processing, and big data exploration and analysis. We use the new Random Sample Partition (RSP) distributed data model to store a big data set as a set of ready-to-use random sample data blocks in Hadoop Distributed File System (HDFS), called RSP blocks. With this system, only a few RSP blocks are selected and processed using a sequential algorithm in a distributed data-parallel manner to produce approximate results for the entire data set. In this paper, we present a prototype RSP-based system and demonstrate its advantages. Our experiments show that RSP blocks can be used to get approximate models and summary statistics as well as estimate the proportions of inconsistent values without computing the entire data or running expensive online sampling operations. This new system enables big data exploration and analysis where the entire data set cannot be computed.

TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Web search session data is precious for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, click-through rate (CTR) prediction and so on. Numerous studies have shown the great potential of considering context information for search system optimization. The well-known TREC Session Tracks have enhanced the development in this domain to a great extent. However, they are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigation with more sophisticated models. To tackle this obstacle, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a huge search log and thus can reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM and human relevance labels provided by this dataset and report the performance as a reference for future studies of session search.

SESSION: Applied - E-commerce

Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance

Visual search plays an essential role in E-commerce. To meet the search demands of users and improve the shopping experience at Alibaba, visual search relevance of real-shot images is becoming the bottleneck. The traditional visual search paradigm is usually based upon supervised learning with labeled data. However, it requires large-scale categorical labels with expensive human annotations, which limits its applicability, and it also usually fails to distinguish the real-shot images.

In this paper, we propose to discover Virtual IDs from user click behavior to improve visual search relevance at Alibaba. As a totally click-data driven approach, we collect various types of click data for training deep networks without any human annotations at all. In particular, Virtual IDs are learned as classification supervision with co-click embedding, which explores image relationships from user co-click behaviors to guide category prediction and feature learning. Concretely, we deploy the Virtual ID Category Network by integrating first-clicks and switch-clicks as a regularizer. Incorporating triplets and list constraints, the Virtual ID Feature Network is trained in a joint classification and ranking manner. Benefiting from the exploration of user click data, our networks are more effective at encoding richer supervision and better distinguish real-shot images in terms of category and feature. To validate our method for visual search relevance, we conduct an extensive set of offline and online experiments on the collected real-shot images. We consistently achieve better experimental results across all components, compared with alternative and state-of-the-art methods.

Autor3: Automated Real-time Ranking with Reinforcement Learning in E-commerce Sponsored Search Advertising

Sponsored search platforms rank the advertisements (ads) by a ranking function to determine the impression allocation and the charging price for the advertisers. To place ads optimally, it is highly desirable but remains challenging to adapt the ranking function to ad traffic at both large scale and fine granularity. In this paper, we propose an automatic adaptive auction system called Autor3. Our system leverages the variability and correlation of ad traffic in a search session and models ranking ads in a session as a multi-step decision-making problem. With effective yet lightweight abstractions of auction states and ranking actions, Autor3 builds a reinforcement learning (RL) framework to learn the ranking decision at the fine granularity of page views (i.e., impressions) over the large-scale auction volume. Our offline experiments show that methods considering sequential decisions are superior to those that do not. We deployed Autor3 to process billions of impressions per day in Taobao, the largest e-commerce platform in China. Using an online A/B test and a subsequent full-scale deployment, we show that both the Revenue-Per-Mille (RPM) and Click-Through-Rates (CTRs) are improved compared to the previous keyword-level approach used in Taobao's live production environment.

Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search

Product search and recommendation is a task on which every e-commerce platform wants to outperform its peers. However, training a good search or recommendation model often requires more data than many platforms have. Fortunately, the search tasks on different platforms share a common underlying structure. Considering each platform as a domain, we propose a cross-domain learning approach to help the task on data-deficient platforms by leveraging the data from data-abundant platforms. In our solution, the importance of features in different domains is addressed by a domain-specific attention network. Meanwhile, a multi-task regularizer based on Wasserstein distance is introduced to help extract both domain-invariant and domain-specific features. Our model consistently outperforms the competing methods on both public and real-world industry datasets. Quantitative evaluation shows that our model can discover important features for different domains, which helps us better understand different user needs across platforms. Last but not least, we have deployed our model online in three big e-commerce platforms, namely Taobao, Tmall, and Qintao, and observed better performance than the production models for all the platforms.

Conceptualize and Infer User Needs in E-commerce

Understanding latent user needs beneath shopping behaviors is critical to e-commerce applications. Without a proper definition of user needs in e-commerce, most industry solutions are not driven directly by user needs at the current stage, which prevents them from further improving user satisfaction. Representing implicit user needs explicitly as nodes like "outdoor barbecue" or "keep warm for kids" in a knowledge graph opens new possibilities for various e-commerce applications. Backed by such an e-commerce knowledge graph, we propose a supervised learning algorithm to conceptualize user needs from their transaction history as "concept" nodes in the graph and infer those concepts for each user through a deep attentive model. Online experiments demonstrate the effectiveness and stability of our model, and online industry strength tests show substantial advantages of such user needs understanding.

Learning to Advertise for Organic Traffic Maximization in E-Commerce Product Feeds

Most e-commerce product feeds provide blended results of advertised products and recommended products to consumers. The underlying advertising and recommendation platforms share similar if not exactly the same set of candidate products. Consumers' behaviors on the advertised results constitute part of the recommendation model's training data and therefore can influence the recommended results. We refer to this process as Leverage. Considering this mechanism, we propose a novel perspective that advertisers can strategically bid through the advertising platform to optimize their recommended organic traffic. By analyzing the real-world data, we first explain the principles of Leverage mechanism, i.e., the dynamic models of Leverage. Then we introduce a novel Leverage optimization problem and formulate it with a Markov Decision Process. To deal with the sample complexity challenge in model-free reinforcement learning, we propose a novel Hybrid Training Leverage Bidding (HTLB) algorithm which combines the real-world samples and the emulator-generated samples to boost the learning speed and stability. Our offline experiments as well as the results from the online deployment demonstrate the superior performance of our approach.

SESSION: Applied - Graph Applications

System Deterioration Detection and Root Cause Learning on Time Series Graphs

System deterioration detection and root cause analysis are crucial for today's industrial society. However, the design and operation of mechanical systems are getting more and more complex, which makes it hard to identify deterioration from noisy data. Our research focuses on solving this problem on time-evolving sensor graphs in a streaming setting. Given a sequence of graphs, the ability to identify 1) any gradual and stable structured change and 2) the root cause components is important for early warning and system diagnosis. Existing methods either raise too many false alerts on instant changes or are too sensitive to noise. To address these problems, we propose Robust Failure Detection and Diagnosis (RoFaD). RoFaD can capture failure propagation given a time series of graphs. By optimizing a matrix-based Taylor expansion, RoFaD can identify system deterioration in the presence of noise and immediate changes, and diagnose the root cause components. Experiments on both synthetic and real world datasets demonstrate that RoFaD is more effective than the popular baselines.

A Dynamic Default Prediction Framework for Networked-guarantee Loans

Commercial banks normally require Small and Medium Enterprises (SMEs) to provide their warranties when applying for a loan. If the borrower defaults, the guarantor is obligated to repay its loan. Such a guarantee system is designed to reduce delinquent risks, but may introduce a new dimension of risk if more and more SMEs become involved and subsequently form complex temporal networks. Monitoring the financial status of SMEs in these networks, and preventing or reducing systematic loan risk, is an area of great concern for both the regulatory commission and the banks. To allow possible actions to be taken in advance, this paper studies the problem of predicting repayment delinquency in networked-guarantee loans. We propose a dynamic default prediction framework (DDPF), which preserves temporal network structures and loan behavior sequences in an end-to-end model. In particular, we design a gated recursive and attention mechanism to integrate both the loan behavior and network information. Then, we uncover risky warrant patterns from the learned weights, which effectively accelerates the risk evaluation process. Finally, we conduct extensive experiments in a real-world loan risk control system to evaluate its performance; the results demonstrate the effectiveness of our proposed approach compared with state-of-the-art baselines.

Feature Enhancement via User Similarities Networks for Improved Click Prediction in Yahoo Gemini Native

Yahoo's native advertising marketplace (also known as Gemini native) serves billions of ad impressions daily, reaching many hundreds of millions USD in yearly revenue. Driving Gemini native models that are used to predict ad click probability (pCTR) is OFFSET - a feature enhanced collaborative-filtering (CF) based event prediction algorithm. While some of the user features used by OFFSET have high coverage, other features, especially those based on click patterns, suffer from extremely low coverage. In this work, we present a framework that simplifies complex interactions between users and other entities in a bipartite graph. The one mode projection of this bipartite graph onto users represents a user similarity network, allowing us to quantify similarities between users. This network is combined with existing user features to create an enhanced feature set. In particular, we describe the implementation and performance of our framework using user Internet browsing data (e.g., visited pages URLs) to enhance the user category feature. Using our framework we effectively increase the feature coverage by roughly 15%. Moreover, online results evaluated on 1% of Gemini native traffic show that using the enhanced feature increases revenue by almost 1% when compared to the baseline operating with the original feature, which is a substantial increase at scale.
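
The one-mode projection described above can be illustrated with a minimal sketch. It assumes edges are given as (user, entity) pairs and that the similarity weight is the number of shared entities; the function name, the sample data, and the weighting scheme are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_users(edges):
    """Project a bipartite (user, entity) graph onto users.

    Returns {(u, v): shared_entity_count} with u < v. Counting shared
    entities is an assumed weighting, chosen only for illustration.
    """
    users_by_entity = defaultdict(set)
    for user, entity in edges:
        users_by_entity[entity].add(user)

    weights = defaultdict(int)
    for users in users_by_entity.values():
        # Every pair of users attached to the same entity gains one unit of weight.
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical browsing data: users linked to the pages they visited.
edges = [
    ("u1", "a.com"), ("u2", "a.com"),
    ("u1", "b.com"), ("u2", "b.com"), ("u3", "b.com"),
]
similarity = project_onto_users(edges)
```

In this toy input, u1 and u2 share two pages and so receive a larger weight than either pair involving u3. A production system would likely normalize the weights and prune very popular entities, which this sketch does not attempt.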

Large-Scale Visual Search with Binary Distributed Graph at Alibaba

Graph-based approximate nearest neighbor search has attracted more and more attention due to its online search advantages. A number of methods for improving speed and recall have been put forward. However, few of them focus on the efficiency and scale of offline graph construction. For a deployed visual search system with several billion online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable with most existing methods.

In this paper, we propose a novel algorithm called Binary Distributed Graph to solve this problem. Specifically, we combine binary codes with a graph structure to speed up both offline and online procedures, and achieve performance comparable to methods that use real-valued features by recalling and reranking more binary candidates. Furthermore, the graph construction is optimized into a completely distributed implementation, which significantly accelerates the offline process and removes the limitations of a single machine, such as memory and storage. Experimental comparisons on the Alibaba Commodity Data Set (more than three billion images) show that the proposed method outperforms the state of the art with respect to the online/offline trade-off.

Graph Representation Learning for Merchant Incentive Optimization in Mobile Payment Marketing

Mobile payment such as Alipay has been widely used in our daily lives. To further promote mobile payment activities, it is important to run marketing campaigns under a limited budget by providing incentives such as coupons and commissions to merchants. As a result, incentive optimization is the key to maximizing the commercial objective of the marketing campaign. Through analyses of online experiments, we found that the transaction network can subtly describe the similarity of merchants' responses to different incentives, which is of great use in the incentive optimization problem. In this paper, we present a graph representation learning method atop transaction networks for merchant incentive optimization in mobile payment marketing. With limited samples collected from online experiments, our end-to-end method first learns merchant representations based on an attributed transaction network, then effectively models the correlations between the commercial objectives each merchant may achieve and the incentives under varying treatments. Thus we are able to model the sensitivity to incentives for each merchant, and spend most of the budget on those merchants that show strong sensitivity in the marketing campaign. Extensive offline and online experimental results at Alipay demonstrate the effectiveness of our proposed approach.

SESSION: Applied - Recommendation and Advertising

Query-based Interactive Recommendation by Meta-Path and Adapted Attention-GRU

Recently, interactive recommender systems have become increasingly popular. The insight is that, through the interaction between users and the system, (1) users can actively intervene in the recommendation results rather than passively receive them, and (2) the system learns more about users so as to provide better recommendations.

We focus on single-round interaction, i.e., the system asks the user a question (Step 1) and exploits the feedback to generate better recommendations (Step 2). A novel query-based interactive recommender system is proposed in this paper, where personalized questions are accurately generated from millions of automatically constructed questions in Step 1, and the recommendation is ensured to be closely related to users' feedback in Step 2. We achieve this by transforming Step 1 into a query recommendation task and Step 2 into a retrieval task. The former task is our key challenge. We first propose a model based on Meta-Path to efficiently retrieve hundreds of query candidates from the large query pool. Then an adapted Attention-GRU model is developed to effectively rank these candidates for recommendation. Offline and online experiments on Taobao, a large-scale e-commerce platform in China, verify the effectiveness of our interactive system. The system has been in production on the homepage of the Taobao App since Nov. 11, 2018 (see on how it works online). Our code is public in

Learning Adaptive Display Exposure for Real-Time Advertising

In E-commerce advertising, where product recommendations and product ads are presented to users simultaneously, the traditional setting is to display ads at fixed positions. However, under such a setting, the advertising system loses the flexibility to control the number and positions of ads, resulting in sub-optimal platform revenue and user experience. Consequently, major e-commerce platforms (e.g., have begun to consider more flexible ways to display ads. In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit under certain business constraints so that the platform revenue can be increased? More specifically, we consider two types of constraints: request-level constraint ensures user experience for each user visit, and platform-level constraint controls the overall platform monetization rate. We model this problem as a Constrained Markov Decision Process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning approach to decompose the original problem into two relatively independent sub-problems. To accelerate policy learning, we also devise a constrained hindsight experience replay mechanism. Experimental evaluations on industry-scale real-world datasets demonstrate the merits of our approach in both obtaining higher revenue under the constraints and the effectiveness of the constrained hindsight experience replay mechanism.

What You Look Matters?: Offline Evaluation of Advertising Creatives for Cold-start Problem

Modern online auction-based advertising systems combine item and user features to promote the ad creatives with the most revenue. However, new ad creatives have to be displayed to a certain number of initial users before enough click statistics can be collected and utilized in later ad ranking and bidding processes. This leads to the well-known and challenging cold-start problem. In this paper, we argue that the content of the creatives intrinsically determines their performance (e.g., CTR, CVR), and we add a pre-ranking stage based on the content. The stage prunes inferior creatives and thus makes online impressions more effective. Since the pre-ranking stage can be executed offline, we can use deep features and exploit their good generalization to navigate the cold-start problem. Specifically, we propose the Pre Evaluation Ad Creation Model (PEAC), a novel method to evaluate creatives even before they are shown in the online ads system. Our proposed PEAC only utilizes ad information such as verbal and visual content, and requires no user data as features. During online A/B testing, PEAC shows significant improvement in revenue. The method has been implemented and deployed in the large-scale online advertising system at ByteDance. Furthermore, we provide a detailed analysis of what the model learns, which also gives suggestions for ad creative design.

Multi-Interest Network with Dynamic Routing for Recommendation at Tmall

Industrial recommender systems have embraced deep learning algorithms for building intelligent systems to make accurate recommendations. At its core, deep learning offers a powerful ability to learn representations from data, especially user and item representations. Existing deep learning-based models usually represent a user by one representation vector, which is usually insufficient to capture the diverse interests of large-scale users in practice. In this paper, we approach the learning of user representations from a different view, by representing a user with multiple representation vectors encoding the different aspects of the user's interests. To this end, we propose the Multi-Interest Network with Dynamic routing (MIND) for learning user representations in recommender systems. Specifically, we design a multi-interest extractor layer based on the recently proposed dynamic routing mechanism, which is applicable for modeling and extracting diverse interests from users' behaviors. Furthermore, a technique named label-aware attention is proposed to help the learning process of user representations. Through extensive experiments on several public benchmarks and one large-scale industrial dataset from Tmall, we demonstrate that MIND achieves superior performance over state-of-the-art methods in terms of recommendation accuracy. Currently, MIND has been deployed to handle major online traffic on the homepage of the Mobile Tmall App.

Learning to be Relevant: Evolution of a Course Recommendation System

We present the evolution of a large-scale content recommendation platform for LinkedIn Learning, serving 645M+ LinkedIn users across several different channels (e.g., desktop, mobile). We address challenges and complexities from both algorithms and infrastructure perspectives. We describe the progression from unsupervised models that exploit member similarity with course content, to supervised learning models leveraging member interactions with courses, and finally to hyper-personalized mixed-effects models with several million coefficients. For all the experiments, we include metric lifts achieved via online A/B tests and illustrate the trade-offs between computation and storage requirements.

SDM: Sequential Deep Matching Model for Online Large-scale Recommender System

Capturing users' precise preferences is a fundamental problem in large-scale recommender systems. Currently, item-based Collaborative Filtering (CF) methods are common matching approaches in industry. However, they are not effective at modeling the dynamic and evolving preferences of users. In this paper, we propose a new sequential deep matching (SDM) model to capture users' dynamic preferences by combining short-term sessions and long-term behaviors. Compared with existing sequence-aware recommendation methods, we tackle the following two inherent problems in real-world applications: (1) there could exist multiple interest tendencies in one session; (2) long-term preferences may not be effectively fused with current session interests. Long-term behaviors are various and complex, hence those highly related to the short-term session should be kept for fusion. We propose to encode behavior sequences with two corresponding components: a multi-head self-attention module to capture multiple types of interests and a long-short term gated fusion module to incorporate long-term preferences. Successive items are recommended after matching between the sequential user behavior vector and item embedding vectors. Offline experiments on real-world datasets show the superior performance of the proposed SDM. Moreover, SDM has been successfully deployed on the online large-scale recommender system at Taobao and achieves improvements in terms of a range of commercial metrics.
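
The idea of a gated fusion between short-term and long-term preference vectors can be sketched as follows. This is only an illustration under stated assumptions: the element-wise gate, the parameter names `w_short`, `w_long`, and `bias`, and the direction in which the gate is applied are all hypothetical and are not confirmed by the abstract.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(short_term, long_term, w_short, w_long, bias):
    """Blend two equal-length vectors with a per-dimension gate in (0, 1).

    A gate near 1 keeps the short-term value and a gate near 0 keeps the
    long-term value (an assumed convention for this sketch).
    """
    fused = []
    for s, l, ws, wl, b in zip(short_term, long_term, w_short, w_long, bias):
        gate = sigmoid(ws * s + wl * l + b)
        fused.append((1.0 - gate) * l + gate * s)
    return fused


# With all-zero gate parameters the gate is exactly 0.5, so the result is the average.
short_vec = [1.0, 0.0]
long_vec = [0.0, 2.0]
averaged = gated_fusion(short_vec, long_vec, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

In a trained model the gate parameters would be learned, letting each dimension decide how much long-term information to admit for the current session.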

SESSION: Applied - Urbanism and Mobility

Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching

Improving the efficiency of dispatching orders to vehicles is a research hotspot in online ride-hailing systems. Most existing solutions for order-dispatching rely on centralized control, which requires considering all possible matches between available orders and vehicles. On large-scale ride-sharing platforms, thousands of vehicles and orders must be matched every second, which incurs a very high computational cost. In this paper, we propose a decentralized execution order-dispatching method based on multi-agent reinforcement learning to address the large-scale order-dispatching problem. Unlike previous cooperative multi-agent reinforcement learning algorithms, in our method all agents work independently under the guidance of an evaluation of the joint policy, since there is no need for communication or explicit cooperation between agents. Furthermore, we use KL-divergence optimization at each time step to speed up the learning process and to balance the vehicles (supply) and orders (demand). Experiments on both the explanatory environment and a real-world simulator show that the proposed method outperforms the baselines in terms of accumulated driver income (ADI) and Order Response Rate (ORR) in various traffic environments. In addition, with the support of the online platform of Didi Chuxing, we designed a hybrid system to deploy our model.
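
To make the supply-demand balancing idea concrete, the sketch below computes a KL divergence between an order distribution and a vehicle distribution over a few grid cells. The grid cells, the counts, the direction of the divergence, and the smoothing constant `eps` are assumptions made for illustration; the abstract does not specify them.

```python
import math


def normalize(counts):
    """Turn raw per-cell counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against division by zero when q has empty cells.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical counts over three grid cells.
orders = normalize([30, 50, 20])
vehicles_balanced = normalize([3, 5, 2])   # same proportions as the orders
vehicles_skewed = normalize([8, 1, 1])     # vehicles concentrated in one cell

balanced_gap = kl_divergence(orders, vehicles_balanced)
skewed_gap = kl_divergence(orders, vehicles_skewed)
```

A smaller divergence means the vehicles are spread in proportion to the orders, so a dispatching policy that lowers this quantity moves supply toward demand.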

MONOPOLY: Learning to Price Public Facilities for Revaluing Private Properties with Large-Scale Urban Data

The value assessment of private properties is an attractive but challenging task that concerns a great many people around the world. A perennial question is "how much is my house worth?". To answer this question, most experienced agencies price a property given its attributes as well as the demographics and the public facilities around it. However, no one knows the exact prices of these factors, especially the values of public facilities, which may help assess private properties. In this paper, we introduce our newly launched project "Monopoly" (named after a classic board game), in which we propose a distributed approach for revaluing private properties by learning to price public facilities (such as hospitals, schools, and metros) with the large-scale urban data we have accumulated via Baidu Maps. Specifically, our method organizes many points of interest (POIs) into an undirected weighted graph and formulates multiple factors, including the virtual prices of surrounding public facilities, as adaptive variables to estimate in parallel the housing prices we know. Then the prices of both public facilities and private properties can be iteratively updated according to the prediction loss until convergence. We have conducted extensive experiments with the large-scale urban data of several metropolises in China. Results show that our approach outperforms several mainstream methods by significant margins. Further insights from more in-depth discussions demonstrate that "Monopoly" is an innovative application in the interdisciplinary field of business intelligence and urban computing, and that it will be beneficial to tens of millions of our users for investments and to governments for urban planning as well as taxation.

CityTraffic: Modeling Citywide Traffic via Neural Memorization and Generalization Approach

With the increasing number of vehicles on the road, it is becoming more and more important to sense citywide traffic, which is of great benefit to the government's policy-making and people's decision making. Currently, traffic speed and volume information are mostly derived from GPS trajectory data and volume sensor records, respectively. Unfortunately, speed and volume information suffer from a serious missing-data problem. Speed can be absent at arbitrary road segments and time slots, while volume is only recorded by a limited number of volume sensors. For modeling citywide traffic, inspired by observations of missing patterns and prior knowledge about traffic, we propose a neural memorization and generalization approach to infer the missing speed and volume, which mainly consists of a memorization module for speed inference and a generalization module for volume inference. Considering temporal closeness and periodicity, the memorization module takes advantage of a neural multi-head self-attention architecture to memorize the intrinsic correlations in historical traffic information. The generalization module adopts a neural key-value attention architecture to generalize the extrinsic dependencies among volume sensors by exploiting road contexts. We conduct extensive experiments on two real-world datasets from two cities, Guiyang and Jinan, and the experimental results consistently demonstrate the advantages of our approach. We have developed a real-time system on the cloud, entitled CityTraffic, providing citywide traffic speed and volume information and fine-grained pollutant emissions of vehicles in Guiyang city.

Deep Dynamic Fusion Network for Traffic Accident Forecasting

Traffic accident forecasting is a vital part of intelligent transportation systems in urban sensing. However, predicting traffic accidents is not trivial because of two key challenges: i) the complexity of external factors, which are presented with heterogeneous data structures; ii) the complex sequential transition regularities exhibited with time-dependent and high-order inter-correlations. To address these challenges, we develop a deep Dynamic Fusion Network framework (DFN) to improve the ability of deep neural networks to model heterogeneous external factors in a fully dynamic manner for traffic accident forecasting. Specifically, DFN first develops an integrative architecture, i.e., the cooperation of a context-aware embedding module and a hierarchical fusion network, to effectively transfer knowledge from different external units for spatial-temporal pattern learning. After that, we further develop a temporal aggregation neural network layer to automatically capture relevance scores from the temporal dimension. Through extensive experiments on real-world data collected from New York City, we validate the effectiveness of our framework against various competitive methods. In addition, we provide a qualitative analysis of the prediction results to show the model's interpretability.

Matrix Factorization for Spatio-Temporal Neural Networks with Applications to Urban Flow Prediction

Predicting urban flow is essential for city risk assessment and traffic management, which profoundly impacts people's lives and property. Recently, some deep learning models, focusing on capturing spatio-temporal (ST) correlations between urban regions, have been proposed to predict urban flows. However, these models overlook latent region functions that impact ST correlations greatly. Thus, it is necessary to have a framework to assist these deep models in tackling the region function issue. This is very challenging because of two problems: 1) how to make deep models predict flows taking into consideration latent region functions; 2) how to make the framework generalize to a variety of deep models. To tackle these challenges, we propose a novel framework that employs matrix factorization for spatio-temporal neural networks (MF-STN), capable of enhancing state-of-the-art deep ST models. MF-STN consists of two components: 1) an ST feature learner, which obtains features of ST correlations from all regions by the corresponding sub-networks in the existing deep models; and 2) a region-specific predictor, which leverages the learned ST features to make region-specific predictions. In particular, matrix factorization is employed on the neural networks, namely, decomposing the region-specific parameters of the predictor into learnable matrices, i.e., region embedding matrices and parameter embedding matrices, to model latent region functions and correlations among regions. Extensive experiments were conducted on two real-world datasets, illustrating that MF-STN can significantly improve the performance of some representative ST models while preserving model complexity.
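
The decomposition of region-specific parameters into a region embedding matrix and a parameter embedding matrix can be sketched in a few lines. The sizes, the values, the linear predictor, and all names below are hypothetical; the sketch only shows how each region's weights can be generated as a product of two smaller shared matrices.

```python
def matvec(row, matrix):
    """Multiply a length-k row vector by a k x m matrix given as lists of lists."""
    return [
        sum(r * matrix[i][j] for i, r in enumerate(row))
        for j in range(len(matrix[0]))
    ]


# Hypothetical sizes: 3 regions, embedding size k = 2, 4 predictor weights per region.
region_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]             # 3 x 2
parameter_embeddings = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]  # 2 x 4


def region_parameters(region):
    """Generate one region's predictor weights from the two factor matrices."""
    return matvec(region_embeddings[region], parameter_embeddings)


def predict(features, region):
    """Linear region-specific prediction (an assumed, simplified predictor)."""
    weights = region_parameters(region)
    return sum(f * w for f, w in zip(features, weights))


blended = region_parameters(2)          # region 2 mixes both embedding rows equally
flow = predict([1.0, 1.0, 1.0, 1.0], 2)
```

Because regions share the parameter embedding matrix, regions with similar embeddings receive similar predictor weights, and the number of learned values grows with the embedding size rather than with the full per-region weight count.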

SESSION: Applied - Language Models

Semantically Driven Auto-completion

The Bloomberg Terminal has been a leading source of financial data and analytics for over 30 years. Through its thousands of functions, the Terminal allows its users to query and run analytics over a large array of data sources, including structured, semi-structured, and unstructured data; as well as plot charts, set up event-driven alerts and triggers, create interactive maps, exchange information via email and instant messaging, and so on. To improve user experience, we have been building question answering systems that can understand a wide range of natural language constructs for various domains that are of fundamental interest to our users. Such natural language interfaces, while exceedingly helpful to users, introduce a number of usability challenges of their own. We tackle some of these challenges through auto-completion. A distinguishing mark of our auto-complete systems is that they are based on and guided by corresponding semantic parsing systems. We describe the auto-complete problem as it arises in this setting, the novel algorithms that we use to solve it, and report on the quality of the results and the efficiency of our approach.

Spam Review Detection with Graph Convolutional Networks

Reviews on online shopping websites affect the buying decisions of customers and, at the same time, attract many spammers who aim to mislead buyers. Xianyu, the largest second-hand goods app in China, suffers from spam reviews. The anti-spam system of Xianyu faces two major challenges: the scalability of the data and the adversarial actions taken by spammers. In this paper, we present our technical solutions to address these challenges. We propose a large-scale anti-spam method based on graph convolutional networks (GCN) for detecting spam advertisements at Xianyu, named the GCN-based Anti-Spam (GAS) model. In this model, a heterogeneous graph and a homogeneous graph are integrated to capture the local context and global context of a comment. Offline experiments show that the proposed method is superior to our baseline model, in which the information of reviews and the features of users and of the items being reviewed are utilized. Furthermore, we deploy our system to process million-scale data daily at Xianyu. The online performance also demonstrates the effectiveness of the proposed method.

Industry Specific Word Embedding and its Application in Log Classification

Word, sentence and document embeddings have become the cornerstone of most natural language processing-based solutions. The training of an effective embedding depends on a large corpus of relevant documents. However, such a corpus is not always available, especially for specialized heavy industries such as oil, mining, or steel. To address the problem, this paper proposes a semi-supervised learning framework to create a document corpus and embedding starting from an industry taxonomy, along with a very limited set of relevant positive and negative documents. Our solution organizes candidate documents into a graph and adopts different explore and exploit strategies to iteratively create the corpus and its embedding. At each iteration, two metrics, called Coverage and Context Similarity, are used as proxies to measure the quality of the results. Our experiments demonstrate that an embedding created by our solution is more effective than one created by processing thousands of industry-specific document pages. We also explore using our embedding in downstream tasks, such as building an industry-specific classification model given labeled training data, as well as classifying unlabeled documents according to industry taxonomy terms.

Document-Level Multi-Aspect Sentiment Classification for Online Reviews of Medical Experts

In the era of big data, online doctor review platforms, which enable patients to give feedback to their doctors, have become one of the most important components in healthcare systems. On the one hand, they help patients to choose their doctors based on the experience of others. On the other hand, they help doctors to improve the quality of their service. Moreover, they provide important sources for us to discover common concerns of patients and existing problems in clinics, which can potentially improve current healthcare systems. In this paper, we systematically investigate the dataset from one such review platform, namely,, where each review for a doctor comes with an overall rating and ratings of four different aspects. A comprehensive statistical analysis is conducted first for reviews, ratings, and doctors. Then, we explore the content of reviews by extracting latent topics related to different aspects with unsupervised topic modeling techniques. As the core component of this paper, we propose a multi-task learning framework for document-level multi-aspect sentiment classification. This task helps us not only to recover missing aspect-level ratings and detect inconsistent rating scores but also to identify aspect keywords for a given review based on ratings. The proposed model takes both features of doctors and aspect keywords into consideration. Extensive experiments have been conducted on two subsets of the ratemds dataset to demonstrate the effectiveness of the proposed model.

SESSION: Applied - Novel Applications

Deep Learning for Blast Furnaces: Skip-Dense Layers Deep Learning Model to Predict the Remaining Time to Close Tap-holes for Blast Furnaces

Manufacturing steel requires extremely challenging industrial processes. In particular, predicting the exact time of opening and closing tap-holes in a blast furnace has a great influence on steel production efficiency and operating cost, in addition to human safety. However, predicting the time to open and close the tap-holes of a blast furnace currently still relies heavily on manual human expertise and labor. Also, most prior research is limited to indirectly modeling the level of liquids in the hearth, using complex mathematical models or classical machine learning approaches.

In this paper, we use a data-driven deep learning method to more accurately predict the remaining time to close each tap-hole in a blast furnace and develop an AI-enabled automated advisory system to reduce manual human effort as well as operating cost. We develop a multivariate time series forecasting algorithm using Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models to more accurately predict the opening and closing time for the Pohang Iron and Steel Company (POSCO) blast furnace. In particular, we use and validate data from one of the largest operating furnaces in the world to develop our system. Our proposed Skip-dense CNN (S-CNN) model achieves more than 90% accuracy within a ±30-minute tolerance when compared with LSTM baseline models. Our S-CNN model has been deployed at a large-scale blast furnace of POSCO since January 2018 and has achieved similar accuracy there. It even exceeded the reported human performance in a real operational environment.
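
A tolerance-based accuracy of the kind quoted above can be computed as in the sketch below. The exact metric definition used in the paper is not given in the abstract, so this version, which counts a prediction as correct when it lies within a fixed number of minutes of the observed value, is an assumption; the function name and sample values are hypothetical.

```python
def accuracy_within_tolerance(predicted, actual, tolerance=30.0):
    """Fraction of predictions whose absolute error is at most `tolerance` minutes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical remaining-time values in minutes; absolute errors are 20, 35, 5 and 0.
predicted_minutes = [100.0, 45.0, 200.0, 10.0]
actual_minutes = [120.0, 80.0, 195.0, 10.0]
score = accuracy_within_tolerance(predicted_minutes, actual_minutes)
```

Three of the four hypothetical predictions fall inside the 30-minute window, so the score here is 0.75.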

Deep Graph Similarity Learning for Brain Data Analysis

We propose an end-to-end graph similarity learning framework called Higher-order Siamese GCN for multi-subject fMRI data analysis. The proposed framework learns the brain network representations via a supervised metric-based approach with siamese neural networks using two graph convolutional networks as the twin networks. Our proposed framework performs higher-order convolutions by incorporating higher-order proximity in graph convolutional networks to characterize and learn the community structure in brain connectivity networks. To the best of our knowledge, this is the first community-preserving graph similarity learning framework for multi-subject brain network analysis. Experimental results on four real fMRI datasets demonstrate the potential use cases of the proposed framework for multi-subject brain analysis in health and neuropsychiatric disorders. Our proposed approach achieves an average AUC gain of 75% compared to PCA, an average AUC gain of 65.5% compared to Spectral Embedding, and an average AUC gain of 24.3% compared to S-GCN across the four datasets, indicating promising applications in clinical investigation and brain disease diagnosis.

How to Find It Better?: Cross-Learning for WeChat Mini Programs

A WeChat Mini Program is a lightweight app relying on the WeChat client, which can be accessed directly from the search list without downloading and installing. Retrieval and ranking for Mini Programs differ from traditional web search in two respects. On the one hand, as the search queries are often short and most Mini Programs contain little useful textual information, retrieval is hard when the user input is inaccurate. On the other hand, without a user scoring and rating system like those of the App Store and Google Play, it is hard to rank relatively better results in higher positions. In this paper, we propose a Cross-Learning strategy to improve the search experience, where the semantics of queries and Mini Programs are represented not by themselves, but by each other. We treat the search task as an extreme multi-label classification problem where the queries are inputs and the Mini Programs are labels. We propose an N-Gram self-attention query encoder to capture the search intention behind these short queries, and carefully design the label selection strategy based on user behavior to rank higher quality Mini Programs in higher positions. Our model outperforms some state-of-the-art baselines in the offline environment, and brings improvement to our actual business in the online A/B test, which demonstrates the practical significance of our work.

Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning

Job Title Benchmarking (JTB) aims at matching job titles with similar expertise levels across various companies. JTB could provide precise guidance and considerable convenience for both talent recruitment and job seekers for position and salary calibration/prediction. Traditional JTB approaches mainly rely on manual market surveys, which are expensive and labor intensive. Recently, the rapid development of Online Professional graph has accumulated a large number of talent career records, which provides a promising basis for data-driven solutions. However, it is still a challenging task since (1) the job title and job transition (job-hopping) data is messy and contains many subjective and non-standard naming conventions for the same position (e.g., Programmer, Software Development Engineer, SDE, Implementation Engineer), (2) there is a large amount of missing title/transition information, and (3) one talent only seeks a limited number of jobs, which brings incompleteness and randomness to modeling job transition patterns. To overcome these challenges, we aggregate all the records to construct a large-scale Job Title Benchmarking Graph (Job-Graph), where nodes denote job titles affiliated with specific companies and links denote the correlations between jobs. We reformulate JTB as the task of link prediction over the Job-Graph, in which matched job titles should have links. Along this line, we propose a collective multi-view representation learning method (Job2Vec) by examining the Job-Graph jointly in (1) the graph topology view (the structure of relationships among job titles), (2) the semantic view (semantic meaning of job descriptions), (3) the job transition balance view (the numbers of bidirectional transitions between two similar-level jobs are close), and (4) the job transition duration view (the shorter the average duration of transitions is, the more similar the job titles are). We fuse the multi-view representations in the encode-decode paradigm to obtain a unified optimal representation for the task of link prediction. Finally, we conduct extensive experiments to validate the effectiveness of our proposed method.

Learning to Predict Human Stress Level with Incomplete Sensor Data from Wearable Devices

Stress is a common problem in modern life that can bring both psychological and physical disorders. Wearable sensors are commonly used to study the relationship between physical records and mental status. Although sensor data generated by wearable devices provide an opportunity to identify stress in people for predictive medicine, in practice, the data are typically complicated, vague, and often fragmented. In this paper, we propose Data Completion with Diurnal Regularizers (DCDR) and Temporally Hierarchical Attention Network (THAN) to address the fragmented data issue and predict human stress level with recovered sensor data. We model fragmentation as a sparsity issue. The nuclear norm minimization method based on the low-rank assumption is first applied to derive unobserved sensor data with diurnal patterns of human behaviors. A hierarchical recurrent neural network with the attention mechanism then models temporally structural information in the reconstructed sensor data, thereby inferring the predicted stress level. Data for this study came from 75 undergraduate students (a subsample of a larger study) who provided sensor data from smart wristbands. They also completed weekly stress surveys that serve as ground-truth labels of their stress levels. The survey lasted 12 weeks, and the sensor records cover the same period. The experimental results demonstrate that our approach significantly outperforms conventional methods in both data completion and stress level prediction. Moreover, an in-depth analysis further shows the effectiveness and robustness of our approach.

Fine-Grained Fuel Consumption Prediction

The high costs and pollutant emissions of vehicles have raised the demand for reducing fuel consumption globally. The idea is to improve the operations of vehicles without losing the output power such that the engine speed and torque work with the minimum fuel consumption rate. It relies on the complete map of engine speed and torque to fuel consumption rate, known as the engine universal characteristic map. Unfortunately, such a map is often incomplete (fuel consumption rate not observed under most engine speed and torque combinations) and inconsistent (different fuel consumption rates observed under the same engine speed and torque combination). In this paper, we propose to predict the fine-grained fuel consumption rate of each engine speed and torque combination, by learning a model from the incomplete and inconsistent observation data. A novel FuelNet is designed based on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). Deconvolution is employed to predict the incomplete fuel consumption rates, while the discriminator can successfully tolerate the inconsistent fuel consumption rate observations. Experiments show that our FuelNet outperforms the existing approaches in both imputing the incomplete and repairing the inconsistent fuel consumption rates. Remarkably, we deploy the predicted fine-grained fuel consumption rates in a mobile application to assist driving, and show that the fuel consumption can be reduced up to 12.8%.

SESSION: Applied - Online and User Behaviors

Soft Frequency Capping for Improved Ad Click Prediction in Yahoo Gemini Native

Yahoo's native advertising (also known as Gemini native) serves billions of ad impressions daily, reaching a yearly run-rate of many hundreds of millions of USD. Driving the Gemini native models that are used to predict both click probability (pCTR) and conversion probability (pCONV) is OFFSET, a feature enhanced collaborative-filtering (CF) based event prediction algorithm. OFFSET is a one-pass algorithm that updates its model for every new batch of logged data using a stochastic gradient descent (SGD) based approach. Since OFFSET represents its users by their features (i.e., user-less model) due to sparsity issues, rule based hard frequency capping (HFC) is used to control the number of times a certain user views a certain ad. Moreover, related statistics reveal that user ad fatigue results in a dramatic drop in click through rate (CTR). Therefore, to improve click prediction accuracy, we propose a soft frequency capping (SFC) approach, where the frequency feature is incorporated into the OFFSET model as a user-ad feature and its weight vector is learned via logistic regression as part of OFFSET training. Online evaluation of the soft frequency capping algorithm via bucket testing showed a significant 7.3% revenue lift. Since then, the frequency feature enhanced model has been pushed to production serving all traffic, and is generating a hefty revenue lift for Yahoo Gemini native. We also report related statistics that reveal, among other things, that while users' gender does not affect ad fatigue, the latter seems to increase with users' age.

Adversarial Factorization Autoencoder for Look-alike Modeling

Digital advertising is performed in multiple ways, e.g., contextual, display-based and search-based advertising. Across these avenues, the primary goal of the advertiser is to maximize the return on investment. To realize this, the advertiser often aims to direct the advertisements towards a specific audience, as this audience has a high likelihood of responding positively to the advertisements. One such form of tailored, personalized, and targeted advertising is known as look-alike modeling, where the advertiser provides a set of seed users and expects the machine learning model to identify a new set of users such that the newly identified set is similar to the seed set with respect to online purchasing activity. Existing look-alike modeling techniques (i.e., similarity-based and regression-based) suffer from serious limitations due to the implicit constraints induced during modeling. In addition, the high-dimensional and sparse nature of the advertising data increases the complexity. To overcome these limitations, in this paper, we propose a novel Adversarial Factorization Autoencoder that can efficiently learn a binary mapping from sparse, high-dimensional data to a binary address space through the use of an adversarial training procedure. We demonstrate the effectiveness of our proposed approach on a dataset obtained from a real-world setting and also systematically compare the performance of our proposed approach with existing look-alike modeling baselines.

Concept Drift Adaption for Online Anomaly Detection in Structural Health Monitoring

Despite its success in anomaly detection when only data representing normal behavior are available, the one-class support vector machine (OCSVM) still has difficulty dealing with non-stationary data streams, where the underlying distributions of data are time-varying. Existing OCSVM-based online learning methods incrementally update the model to address this challenge; however, they rely solely on the location relationship between a test sample and the error support vectors. To better accommodate the evolution of normal behavior, this paper formulates online anomaly detection in non-stationary data streams as a concept drift adaptation problem. We propose that OCSVM-based incremental learning be performed only in the case of a normal drift. For an incoming sample, its relative relationship with three sets of vectors in OCSVM, namely margin support vectors, error support vectors, and reserve vectors, is fully utilized to estimate whether a normal drift is emerging. Extensive experiments in the field of structural health monitoring have been conducted, and the results show that the proposed simple approach outperforms the existing OCSVM-based online learning algorithms for anomaly detection.

Multi-task based Sales Predictions for Online Promotions

The e-commerce era is witnessing a rapid development of various annual online promotions, such as Black Friday, Cyber Monday, and Alibaba's 11.11. Sales Predictions for Online Promotions (SPOP) are a set of sales related forecasts for the promotion day, including gross merchandise volume, sales volume, best selling products, etc. SPOP is highly important for e-commerce platforms to efficiently organize merchandise and maximize business value. However, sales patterns during the promotions vary across scenarios, and each model is designed with different features, static or dynamic, for one task in particular. As a result, several models are proposed that use only part of the features, some of which may be beneficial to other tasks, which indicates that a universal representation for the items needs to be learned across different promotion scenarios. To address this problem, this paper proposes a Deep Item Network for Online Promotions (DINOP). In DINOP, we design a novel Target Users Controlled Gated Recurrent Unit (TUC-GRU) structure for dynamic features, and provide a new attention mechanism introducing static user profiles. In contrast to traditional prediction models, the proposed network can effectively and efficiently learn a universal item representation by incorporating users' properties as controllers. Furthermore, it can successfully discover the static and dynamic features guided by multi-task learning, and is easily extended to other sales related prediction problems without retraining. Empirical results show that the performance of DINOP on the real data set of Alibaba's Global Shopping Festival is superior to other state-of-the-art practical methodologies in terms of convergence rate and prediction accuracy.

Experimental Study of Multivariate Time Series Forecasting Models

Multivariate time series forecasting has wide applications, such as traffic flow prediction and supermarket commodity demand forecasting. Because of the complex temporal patterns and inter-dependencies among multivariate time series, a large number of forecasting models have been developed in the literature. However, one question remains unclear: how do these models perform on a given forecasting task? A comprehensive performance comparison of these models on different tasks is lacking. To this end, in this paper, we conduct a systematic evaluation of eight representative forecasting models over eight multivariate time series datasets, and have the following findings: 1) When the datasets exhibit strong periodic patterns, deep learning models perform best. Otherwise, on non-periodic datasets, statistical models such as ARIMA perform best. 2) For long term prediction involving a high horizon value, the direct prediction strategy could lead to lower errors than the recursive one, but at the cost of higher training time. 3) For multivariate time series that explicitly involve graph-structured inter-dependencies among the variables, e.g., the road network topology in the spatio-temporal time series of traffic volumes on multiple routes, the Graph Convolution Network can incorporate these inter-dependencies into the forecasting model for smaller prediction errors.
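
The difference between the recursive and direct strategies in finding 2 can be sketched on a toy univariate series. This is only an illustration under stated assumptions, not one of the eight evaluated models: the one-lag linear model, the helper names `fit_line`, `recursive_forecast` and `direct_forecast`, and the synthetic series are all hypothetical. The recursive strategy fits a single one-step model and feeds its own predictions back in; the direct strategy fits a separate model for the target horizon, which is why it needs one training run per horizon.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def recursive_forecast(series: List[float], horizon: int) -> float:
    """Fit one 1-step-ahead model, then apply it `horizon` times to its own output."""
    a, b = fit_line(series[:-1], series[1:])
    x = series[-1]
    for _ in range(horizon):
        x = a * x + b
    return x


def direct_forecast(series: List[float], horizon: int) -> float:
    """Fit a dedicated model that maps x_t straight to x_{t+horizon}."""
    a, b = fit_line(series[:-horizon], series[horizon:])
    return a * series[-1] + b


# Hypothetical noiseless series x_{t+1} = 0.9 * x_t + 1, starting at 0.
series = [0.0]
for _ in range(29):
    series.append(0.9 * series[-1] + 1.0)

horizon = 5
print(recursive_forecast(series, horizon), direct_forecast(series, horizon))
```

On this noiseless toy series the two strategies agree. On noisy data the recursive strategy can accumulate one-step errors over the horizon, while the direct strategy pays for a separate fit per horizon.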

GMTL: A GART Based Multi-task Learning Model for Multi-Social-Temporal Prediction in Online Games

Multi-social-temporal (MST) data, which represent multi-attributed time series corresponding to the entities in multi-relational social network series, are ubiquitous in real-world and virtual-world dynamic systems, such as online games. Predictions over MST data such as social time series prediction and temporal link weight prediction are of great importance but challenging. They are affected by many complex factors, including temporal characteristics, social characteristics, collaborative characteristics, task characteristics and the intrinsic causality between them. In this paper, we propose a graph attention recurrent network (GART) based multi-task learning model (GMTL) to fuse information across multiple social-temporal prediction tasks. Experiments on an MMORPG dataset demonstrate that GMTL outperforms the state-of-the-art baselines and can significantly improve performances of specific social-temporal prediction task with additional information from others. Our work has been deployed to several MMORPGs in practice and can also expand to many related multi-social-temporal prediction tasks in real-world applications. Case studies on applications for multi-social-temporal prediction show that GMTL produces great value in the actual business in NetEase Games.

Learning Compositional, Visual and Relational Representations for CTR Prediction in Sponsored Search

As the main revenue source for search engines, the sponsored search system retrieves and allocates ads to display on the search result pages. Click-through rate (CTR) prediction is a crucial task for search ads because it plays a key role in the ranking and pricing of candidate ads. Commercial search engines typically model it as a classification problem and use machine learning models for CTR prediction. Recently, deep learning based models that follow the Embedding&MLP paradigm have been proposed to learn latent representations of query-ad relevance and historical information to improve accuracy. Because learning feasible embeddings requires sufficient samples and these models rely heavily on textual features and sparse ID features, representation learning for new ads is inadequate and the models are confronted with the cold-start problem. Meanwhile, as search ad offerings become increasingly complex, rich ads with various sizes, decorations and formats are growing rapidly. Due to the diverse ad extensions and layouts, rich ads pose new challenges for CTR prediction. To tackle these problems, in this paper, we propose an approach to improve the accuracy of CTR prediction by learning supplementary representations from three new aspects: the compositional components, the visual appearance and the relational structure of ads. This method can utilize straightforward and auxiliary information for new and rich ads, which can greatly improve the expressive ability and the generalization of the model. We demonstrate the performance of this method on datasets obtained from a real sponsored search system in the offline environment. Experimental results show that our approach can improve the accuracy of CTR prediction and achieve superior performance compared to the baseline method, especially for rich and new ads.

SESSION: Demo - Demo Session 1

Inspect What Your Location History Reveals About You: Raising user awareness on privacy threats associated with disclosing his location data

Location is one of the most extensively collected types of personal data on mobile devices by applications and third-party services. However, how the location of users is actually processed in practice by the actors of the targeted advertising ecosystem remains unclear. Nonetheless, these providers have a strong incentive to create very detailed profiles of users to better monetize the collected data. End users are usually not aware of the strength and wide range of inferences that can be drawn from their mobility traces. In this demonstration, users interact with a web-based application to inspect their location history and to discover the inferential power of this kind of data. Moreover, to better understand the possible countermeasures, users can apply a sanitization to protect their data and visualize the impact on both the mobility traces and the associated inferred information. The objective of this demonstration is to raise user awareness of the profiling capabilities and the privacy threats associated with disclosing location data, as well as of how effectively sanitization mechanisms can mitigate these privacy risks. In addition, by collecting user feedback on the personal information revealed and the usage of a geosanitization mechanism, we hope that this demonstration will also help to constitute a new and valuable dataset on user perceptions of these questions.

PODIUM: Probabilistic Datalog Analysis via Contribution Maximization

The use of probabilistic datalog programs has been advocated for applications that involve recursive computation and uncertainty. While using such programs allows for a flexible knowledge derivation, it makes the analysis of query results a challenging task. Particularly, given a set O of output tuples and a number k, one would like to understand which k-size subset of the input tuples has affected the most the derivation of O. This is useful for multiple tasks, such as identifying critical sources of errors and understanding surprising results. To this end, we formalize the Contribution Maximization problem and present an efficient algorithm to solve it. Our algorithm injects a refined variant of the classic Magic Sets technique, integrated with a sampling method, into top-performing algorithms for the well-studied Influence Maximization problem. We propose to demonstrate our solution in a system called PODIUM. We will demonstrate the usefulness of PODIUM using real-life data and programs, and illustrate the effectiveness of our algorithm.

Document in Context of its Time (DICT): Providing Temporal Context to Support Analysis of Past Documents

Old documents tend to be difficult to analyze and understand, not only for average users but oftentimes for professionals as well. This is due to context shift, vocabulary evolution and, in general, the lack of precise knowledge about the writing styles of the past. We propose the concept of positioning a document in the context of its time, and develop an interactive system to support this objective. Our system helps users find out whether the vocabulary used by an author in the past was frequent at the time of text creation, whether the author used anachronisms or neologisms, and so on. It also enables detecting terms in the text that underwent considerable semantic change and provides more information on the nature of such change. Overall, the proposed tool offers additional knowledge on the writing style and vocabulary choice in documents by drawing from data collected at the time of their creation or at another user-specified time.

ATENA: An Autonomous System for Data Exploration Based on Deep Reinforcement Learning

Exploratory Data Analysis (EDA) is an important yet challenging task that requires profound analytical skills and familiarity with the data domain. While Deep Reinforcement Learning (DRL) is nowadays used to solve AI challenges previously considered to be intractable, to our knowledge such solutions have not yet been applied to EDA.

In this work we present ATENA, an autonomous system capable of exploring a given dataset by executing a meaningful sequence of EDA operations. ATENA uses a novel DRL architecture, and learns to perform EDA operations by independently interacting with the dataset, without any training data or human assistance. We demonstrate ATENA in the context of cyber security log analysis, where the audience is invited to partake in a data exploration challenge: explore real-life network logs, assisted by ATENA, in order to reveal underlying security attacks hidden in the data.

ExplIQuE: Interactive Databases Exploration with SQL

To help database users who have just started learning SQL or are not familiar with their database, we propose ExplIQuE, an exploration interface with query extensions. Its purpose is to help users dive smoothly into data exploration and express imprecise questions over their data. Such situations are increasingly common with the growing desire of users to get value out of their data. In this setting, in addition to classic SQL querying possibilities, ExplIQuE offers the possibility to extend a given SQL query by suggesting a set of possible selection predicates to add to the query, which aim at dividing the initial answer set to identify interesting exploration zones. In addition, ExplIQuE proposes some indicators to help the user in choosing her desired extension and in understanding her data, as well as interactive visualizations of the result set in two dimensions revealed by PCA techniques. In this demonstration, we offer the audience the possibility to try the various functionalities of ExplIQuE by expressing an imprecise question over a scientific database on bacterial colonies through an iterative process. A video of the proposed demonstration is available at \url

TraVis: An Interactive Visualization System for Mining Inbound Traveler Activities by Leveraging Mobile Ad Request Data

The growth of inbound travel is closely tied to successful urban development. Increasing the number of inbound travelers not only creates more jobs and economic opportunities but also drives the country toward prosperity. Thus, inbound traveler analysis through trajectory pattern mining, a subfield of urban computing, is regarded as a promising solution. This paper introduces large-scale mobile ad requests as an alternative data source for trajectory pattern mining in order to overcome the limitations of conventional data sources, such as GPS data, cellular data, and IP address data. In addition, to expedite a comprehensive inbound traveler analysis, we build TraVis, a real-world system for efficiently exploring inbound travelers' activities through an interactive visualization interface. By incorporating various modules, such as prediction of mobile users' home country and travel intent, frequent trajectory pattern mining, and interactive visualization, TraVis is capable of profiling travelers' behavior patterns. We use inbound travelers to Japan as a case study to present the mining insights, and we also demonstrate the extensive system functionalities. Our system has been assisting Japanese government agencies in formulating travel marketing strategies, including tourist experience enhancement and attractions marketing.

Understanding Data in the Blink of an Eye

Many data analysis and knowledge mining tasks require a basic understanding of the content of a dataset prior to any data access. In this demo, we showcase how data descriptions (a set of compact, readable and insightful formulas of boolean predicates) can be used to guide users in understanding datasets. Finding the best description for a dataset is, unfortunately, both computationally hard and task-specific. This demo shows not only that we can generate descriptions at interactive speed, but also that diverse user needs, from anomaly detection to data exploration, can be accommodated through a user-driven process exploiting dynamic programming in concert with a set of heuristics.

SIMILANT: An Analytic Tool for Similarity Modeling

We present SIMILANT, a data analytics tool for modeling similarity in content-based retrieval scenarios. In similarity search, data elements are modeled using black-box descriptors, where a pair-wise similarity function is the only way to relate data elements to each other. Only these relations provide information about the dataset structure. Data analysts need to identify meaningful combinations of descriptors and similarity functions effectively. Therefore, we propose a tool enabling a data analyst to systematically browse, tune, and analyze similarity models for a specific domain.

MithraLabel: Flexible Dataset Nutritional Labels for Responsible Data Science

Using inappropriate datasets for data science tasks can be harmful, especially for applications that impact humans. Targeting data ethics, we demonstrate MithraLabel, a system for generating task-specific information about a dataset, in the form of a set of visual widgets, as a flexible "nutritional label" that provides a user with information to determine the fitness of the dataset for the task at hand.

PRIVATA: Differentially Private Data Market Framework using Negotiation-based Pricing Mechanism

As the value of digital data increases, the data market is in the spotlight as a means of obtaining personal information. However, the collection of personal information can seriously violate privacy, which is a major obstacle to the use of personal data. Differential privacy, which is a de facto standard for privacy protection in statistical databases, can be applied to solve the privacy violation problem. To apply differential privacy to the data market, the amount of noise and the corresponding data price should be determined between the provider and the consumer. However, this matter has not yet been studied. In this work, we introduce Privata, a differentially private data market framework that sets the appropriate price and noise parameter in the data market environment. Privata is based on a negotiation technique using Rubinstein bargaining that considers social welfare to prevent unfair transactions. We give an overview of Privata, explain its negotiation technique, and show its implementation.
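
To make the "amount of noise" concrete, the following is a minimal sketch of the standard Laplace mechanism for a counting query, where the noise scale is the query sensitivity divided by the privacy parameter epsilon. It is not the Privata implementation and says nothing about how Privata maps noise to price: the function names, the example count of 1000, and the epsilon values are all assumptions chosen for illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon of the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (a counting query has sensitivity 1)."""
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


rng = random.Random(42)  # fixed seed so the illustration is repeatable
for epsilon in (0.1, 1.0, 10.0):  # hypothetical privacy levels
    print(epsilon, round(private_count(1000, epsilon, rng), 2))
```

A smaller epsilon gives a larger noise scale, so the released value is more private but less accurate. This is the trade-off that a provider and a consumer would have to settle together with the price.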

MiCRon: Making Sense of News via Relationship Subgraphs

Knowledge graphs (KGs) have been extensively used to annotate text, e.g., news articles, in order to enhance its comprehension by readers. This requires mapping entities occurring in the news to the target entities of the KG and extracting a so-called relationship sub-graph (RSG) that spans these entities. RSG extraction is computationally demanding and cannot scale to large KGs. Existing approximation algorithms that focus on structurally compact RSGs are not satisfactory since they often return no answers. We address this problem and develop an efficient algorithm to find approximations that connect the most salient subset of the target entities. Moreover, we propose a context-aware method to rank RSGs by their relevance to the news and their semantic cohesion. In the demo we will present our approach, and the attendees will be able to experience how our system MiCRon helps to make sense of news articles by computing and presenting RSGs relevant to these articles.

LuPe: A System for Personalized and Transparent Data-driven Decisions

Machine learning models are commonly used for decision support even though they are far from perfect, e.g., due to bias introduced by imperfect training data or wrong feature selection. While efforts are made and should continue to be put into developing better models, we will likely continue to rely on imperfect models in many applications. In these settings, how could we at least use the "best" model for an individual or a group of users and transparently communicate the risks and weaknesses that apply?

We demonstrate LuPe, a system that addresses these questions. LuPe optimizes the choice of the applied model for subgroups of the population or individuals, thereby personalizing the model choice to best fit users' profiles, which improves fairness. LuPe further captures data to explain the choices made and the results of the model. We showcase how such data enable users to understand the system performance they can expect. This transparency helps users in making informed decisions or providing informed consent when such systems are used. Our demonstration will focus on several real-world applications showcasing the behavior of LuPe, including credit scoring and income prediction.

Insta-Search: Towards Effective Exploration of Knowledge Graphs

Knowledge Graphs (KGs) are used to store heterogeneous information in the form of graphs. One flexible way to query these KGs that does not require expertise is to use relationship queries or keyword search. The user can specify a query using keywords referring to entities in the graph. The system then returns a set of relationships among the queried entities. However, effectively querying these graphs is still challenging for a new user. She is not familiar with the entities and relationships in the graph and hence, her queries could often return empty or too few answers. We demonstrate a system called Insta-Search which facilitates effective exploration of KGs using relationship queries. Insta-Search helps the user by giving autocomplete keyword suggestions for partially typed words. It also displays an estimated number of answers that the current query would fetch, along with a few approximate top-scoring answers. The users also get entity suggestions so that they can iteratively reformulate the query until they find the query with the expected results. On submitting the query, the system returns ranked query results, grouped on the basis of similar information content to enhance result interpretation. The user needs no prior knowledge of the data to use the system.

SESSION: Demo - Demo Session 2

SkyRec: Finding Pareto Optimal Groups

We present SkyRec (Skyline Recommender), a recommendation toolkit for finding optimal groups based on the notion of group skyline. Skyline computation, aiming at identifying a set of skyline points that are not dominated by any other point, is particularly useful for multi-criteria data analysis and decision-making. Traditional skyline computation, however, is inadequate to answer queries that need to analyze not only individual points but also groups of points. To address this gap, SkyRec finds Pareto optimal groups with two group skyline models: G-Skyline [3] and Sum-Skyline [2]. SkyRec returns Pareto optimal groups with group size k that are not dominated by any other group with the same group size. Users can examine the results of the group skyline based recommendation compared to traditional top-k and skyline based recommendation and how different group skyline notions differ from each other. Although we demonstrate SkyRec for hotel reservation in this paper, it can be applied to various decision-making applications.
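
The dominance test behind skyline computation, and one way to lift it to groups, can be sketched as follows. This is a minimal illustration and not the SkyRec implementation: the hotel tuples, the helper names, and the convention that lower values are better on every criterion are assumptions. The group variant compares groups of size k by the component-wise sum of their members, in the spirit of a sum-based group skyline; the exact definitions of G-Skyline [3] and Sum-Skyline [2] may differ.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every criterion, strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Points that are not dominated by any other point (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sum_group_skyline(points: List[Point], k: int) -> List[Tuple[int, ...]]:
    """Index groups of size k whose summed criteria are not dominated by another group's sum."""
    groups = list(combinations(range(len(points)), k))
    sums = [tuple(sum(points[i][d] for i in g) for d in range(len(points[0]))) for g in groups]
    return [g for g, s in zip(groups, sums) if not any(dominates(t, s) for t in sums)]


# Hypothetical hotels: (price per night, distance to venue in km).
hotels = [(120.0, 2.0), (90.0, 5.0), (150.0, 1.0), (130.0, 3.0), (95.0, 6.0)]
print(skyline(hotels))
print(sum_group_skyline(hotels, 2))
```

Enumerating every group of size k is exponential in k, so this brute-force version is only suitable for tiny inputs.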

CurrentClean: Interactive Change Exploration and Cleaning of Stale Data

Enterprises often assume their data is up-to-date, where the presence of a timestamp in the recent past qualifies the data as current. However, entities modeled in the data experience varying rates of change that influence data currency. We argue that data currency is a relative notion based on individual spatio-temporal update patterns, and these patterns can be learned and predicted. We develop CurrentClean, a probabilistic system that identifies and cleans stale values and enables a user to interactively explore change in her data. Our system provides a Web-based user interface and a backend infrastructure that learns update correlations among cell values in a database to infer and repair stale values. Our demonstration provides two motivating scenarios that highlight the change exploration and cleaning features using clinical data and sensor data from a data centre enterprise.

PatMat: A Distributed Pattern Matching Engine with Cypher

Graph pattern matching is one of the most fundamental problems in graph databases and is associated with a wide spectrum of applications. Due to its computational intensiveness, researchers have primarily devoted their efforts to improving the performance of the algorithm while constraining the graphs to have a single label on vertices (edges) or no label. In practice, however, graphs are typically associated with rich properties, so the main focus in industry is instead on powerful query languages that can express a sufficient number of pattern matching scenarios. In this work we demo PatMat, which glues together the academic efforts on performance and the industrial efforts on expressiveness. To do so, we leverage state-of-the-art join-based algorithms in distributed contexts and the Cypher query language, the most widely adopted declarative language for graph pattern matching. The experiments demonstrate how we are capable of turning complex Cypher semantics into a distributed solution with high performance.

On a Chatbot Conducting Virtual Dialogues

We present a demo of a chatbot that delivers content in the form of virtual dialogues automatically produced from plain texts extracted and selected from documents. This virtual dialogue content is provided in the form of answers derived from the found and selected documents, which are split into fragments, and questions are automatically generated for these answers.

Rehab-Path: Recommending Alcohol and Drug-free Routes

Nowadays routing systems can provide optimal routes in terms of time and travel distance. However, they do not consider the special needs of certain groups of users. For example, people recovering from alcohol and drug addiction may want to travel a route that is alcohol and drug-free. In this demonstration, we present a system we built that addresses this special need. We detect whether a street is related to alcohol and drugs by exploiting open Web data, including Foursquare, microblog tweets, Google Street View images, and crime data. We calculate an alcohol and drug relevance score using unsupervised methods, to be used in route ranking. Our system prototype is ready to be tested for the cities of San Francisco and Kyoto.

kBrowse: kNN Graph Browser

The construction of a k-nearest neighbor graph (kNNG) in several applications, such as recommender systems, similarity search, and data exploration, is heavily based on the distance function, which is usually unweighted and considered constant for all users. However, attributes are not all equally important, and using different attribute weights gives different kNNGs. We present kBrowse, which allows users to explore, modify and understand a kNNG computed from a weighted Manhattan distance function on a loosely defined weight space. It samples possible weight vectors and computes their corresponding kNNGs. The system summarizes all the kNNGs into one graph by keeping all the edges with high edge certainty, a probabilistic measurement of how likely an edge is to appear in the weight space. To make the weight space more defined, users can directly adjust the weight space or give kNN examples. Sample weight vectors failing to satisfy the given conditions are then removed and the graph is summarized again. Finally, kBrowse also gives a user a better understanding of kNN by showing which attribute is important in connecting nodes.
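
The sampling and summarization steps described in this abstract can be illustrated with a minimal sketch. It is not the kBrowse implementation: the function names, the uniform sampling of normalized weight vectors, and the 0.9 certainty threshold are all assumptions made for illustration. Edge certainty is estimated here as the fraction of sampled weight vectors whose kNNG contains the edge.

```python
import random


def weighted_manhattan(a, b, w):
    """Weighted Manhattan (L1) distance between two attribute vectors."""
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b))


def knn_graph(points, w, k):
    """Directed kNN graph as a set of (source, neighbor) index pairs."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            # Ties are broken by index so the result is deterministic.
            key=lambda j: (weighted_manhattan(p, points[j], w), j),
        )
        edges.update((i, j) for j in others[:k])
    return edges


def sample_weights(dim, rng):
    """Assumed sampling scheme: random non-negative weights summing to 1."""
    raw = [rng.random() for _ in range(dim)]
    total = sum(raw)
    return [r / total for r in raw]


def edge_certainty(points, k, n_samples=200, seed=0):
    """Fraction of sampled weight vectors whose kNNG contains each edge."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        w = sample_weights(len(points[0]), rng)
        for edge in knn_graph(points, w, k):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_samples for edge, c in counts.items()}


def summarize(certainty, threshold=0.9):
    """Keep only the edges whose certainty reaches the (assumed) threshold."""
    return {edge for edge, c in certainty.items() if c >= threshold}


points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.2)]
certainty = edge_certainty(points, k=1)
summary = summarize(certainty)
```

In this toy data set the two clusters are far apart, so every sampled weight vector produces the same nearest neighbors and each surviving edge has certainty 1. User feedback, as described above, would correspond to filtering the sampled weight vectors before counting.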

BIP! Finder: Facilitating Scientific Literature Search by Exploiting Impact-Based Ranking

Due to the rapidly increasing number of scientific articles, finding valuable work for further research has become tedious and time consuming. To alleviate this issue, search engines have used citation-based article impact ranking. However, most engines rely on very simplistic impact measures (usually the citation count) and make the problematic assumption that there is a one-size-fits-all impact measure. To address these problems, we present BIP! Finder, a search engine that facilitates the identification of valuable articles by exploiting two different impact measures, each capturing a different aspect of the article impact. In addition, BIP! Finder provides many useful features (article comparison, intuitive visualisations, article bookmarking mechanism, etc.) making it a powerful addition to the researcher's toolbox.

BeLink: Querying Networks of Facts, Statements and Beliefs

An important class of journalistic fact-checking scenarios involves verifying the claims and knowledge of different actors at different moments in time. Claims may be about facts, or about other claims, leading to chains of hearsay. We have recently proposed a data model for (time-anchored) facts, statements and beliefs. It builds upon the W3C's RDF standard for Linked Open Data to describe connections between agents and their statements, and to trace information propagation as agents communicate. We propose to demonstrate BeLink, a prototype capable of storing such interconnected corpora and answering powerful queries over them using SPARQL 1.1. The demo will showcase the exploration of a rich real-data corpus built from Twitter and mainstream media, and interconnected through the extraction of statements with their sources, time, and topics.

TuneR: Fine Tuning of Rule-based Entity Matchers

A rule-based entity matching task requires the definition of an effective set of rules, which is a time-consuming and error-prone process. The typical approach adopted for its resolution is a trial and error method, where the rules are incrementally added and modified until satisfactory results are obtained. This approach requires significant human intervention, since a typical dataset needs the definition of a large number of rules and possible interconnections that cannot be manually managed. In this paper, we propose TuneR, a software library supporting developers (i.e., coders, scientists, and domain experts) in tuning sets of matching rules. It aims to reduce human intervention by offering a tool for the optimization of rule sets based on user-defined criteria (such as effectiveness, interpretability, etc.). Our goal is to integrate the framework in the Magellan ecosystem, thus completing the functionalities required by the developers for performing Entity Matching tasks.

I-REX: A Lucene Plugin for EXplainable IR

Providing high-level, intuitive explanations of the performance of IR systems is generally difficult due to their complexity, and the various low-level implementation details involved. We present I-REX, a tool built on top of Lucene, that is intended to provide a systematic view into the inner workings of retrieval models and methods (specifically query expansion). This should help researchers study, compare, understand and explain the performance of these models and methods. I-REX can be run either as a Web service accessible through a browser, or as a terminal-based tool with a shell-like interactive interface. In this article, we describe a session that illustrates how I-REX can be used to explain the observed difference in the performance of two variants of the Language Model.

Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment

A recent trend observed in traditionally challenging fields such as computer vision and natural language processing has been the significant performance gains shown by deep learning (DL). In many different research fields, DL models have been evolving rapidly and have become ubiquitous. Despite researchers' excitement, most software developers are not DL experts and often have a difficult time following the booming DL research output. As a result, it usually takes a significant amount of time for the latest superior DL models to prevail in industry. This issue is further exacerbated by the common use of various incompatible DL programming frameworks, such as TensorFlow, PyTorch, and Theano. To address this issue, we propose a system, called Model Asset Exchange (MAX), that gives developers easy access to state-of-the-art DL models. Regardless of the underlying DL programming framework, it provides an open source Python library (called the MAX framework) that wraps DL models and unifies programming interfaces with our standardized RESTful APIs. These RESTful APIs enable developers to exploit the wrapped DL models for inference tasks without the need to fully understand different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing and signal processing. Finally, we demonstrate two selected web applications that are built on top of MAX, as well as the process of adding a DL model to MAX.

ReducE-Comm: Effective Inventory Reduction System for E-Commerce

Many e-commerce platforms serve as an intermediary between companies/manufacturers and consumers, receiving a commission per purchase. To increase revenue, such sites tend to offer a wide variety of items. However, in many situations a smaller subset of the items should be selected and offered for sale, e.g., when opening an express branch or expanding to a new region, or when maintenance costs become prohibitive and redundant items should be disposed of. In all these cases selecting a reduced inventory which covers most consumer needs is an important goal.

In this demo we introduce ReducE-Comm, a highly parallelizable and scalable system that, given a large set of items, a bound on the number of items that can be supported, and information about consumer preferences and item relationships, selects a subset of the items which maximizes the likelihood of a purchase. Our system is interactive and facilitates real-time analysis by providing detailed per-item impact statistics. We demonstrate the effectiveness of ReducE-Comm on real-world data and scenarios taken from a large e-commerce system, by interacting with the CIKM'19 audience, who act as analysts aiming to intelligently reduce the inventory.

dEFEND: A System for Explainable Fake News Detection

Despite recent advancements in computationally detecting fake news, we argue that a critical missing piece is the explainability of such detection (i.e., why a particular piece of news is detected as fake), and we propose to exploit rich information in users' comments on social media to infer the authenticity of news. In this demo paper, we present our system for explainable fake news detection, called dEFEND, which can detect the authenticity of a piece of news while identifying user comments that can explain why the news is fake or real. Our solution develops a sentence-comment co-attention sub-network to exploit both news contents and user comments to jointly capture explainable top-k check-worthy sentences and user comments for fake news detection. The system is publicly accessible.

SESSION: Tutorials

Enterprise Knowledge Graph From Specific Business Task to Enterprise Knowledge Management

Data-driven knowledge graphs are being rapidly adopted by different communities. Many open-domain and domain-specific knowledge graphs have been constructed, and many industries have benefited from knowledge graphs. Currently, enterprise-related knowledge graphs are classified as domain-specific, but their applications span from solving a narrow, specific problem to full Enterprise Knowledge Management systems. With the digital transformation of traditional industry, enterprise knowledge is becoming more and more complicated: in general, it involves knowledge from the common domain, from multiple specific domains, and corporate-specific knowledge. This tutorial provides an overview of the current Enterprise Knowledge Graph (EKG). It distinguishes the EKG from the domain-specific KG according to the knowledge it covers, and provides examples to illustrate the difference between the two. The tutorial further summarizes EKGs into three types: Specific Business Task Enterprise KG, Specific Business Unit Enterprise KG and Cross Business Unit Enterprise KG, and illustrates the characteristics, steps, challenges, and future research in constructing and consuming each of these three types of EKG.

Taming Social Bots: Detection, Exploration and Measurement

Social bots have been around for over a decade, since 2008. They are capable of swaying political opinion, spreading false information, and recruiting for terrorist organizations. Social bots use various sophisticated techniques, such as adopting emotions, sympathy following, synchronous deletions, and profile molting. Several approaches have been proposed in the literature for detecting, exploring, and measuring social bots. We provide a comprehensive overview of the existing work from a data mining and machine learning perspective, discuss the relative strengths and weaknesses of various methods, make recommendations for researchers and practitioners, and propose novel directions for future research in taming social bots. The tutorial also discusses pitfalls in collecting and sharing data on social bots.

Learning-Based Methods with Human-in-the-Loop for Entity Resolution

This tutorial is intended for researchers and practitioners working in the data integration area and, in particular, entity resolution (ER), which is a sub-area focused on linking entities across heterogeneous datasets. We outline the ideal requirements of modern ER systems: (1) capture domain knowledge via (minimal) human interaction, (2) provide as much automation as possible via machine learning techniques, and (3) achieve high explainability. We describe recent research trends towards bringing such ideal ER systems closer to reality. We begin with an overview of human-in-the-loop methods that are based on techniques such as crowdsourcing and active learning. We then dive into recent trends that involve deep learning techniques such as representation learning to automate feature engineering, and combinations of transfer and active learning to reduce the amount of user labels required. We also discuss how explainable AI relates to ER, and outline some of the recent advances towards explainable ER.

Learning and Reasoning on Graph for Recommendation

Recommendation methods construct predictive models to estimate the likelihood of a user-item interaction. Previous models largely follow a general supervised learning paradigm: treating each interaction as a separate data instance and performing prediction based on an "isolated island of information". Such methods, however, overlook the relations among data instances, which may result in suboptimal performance, especially in sparse scenarios. Moreover, models built on separate data instances alone can hardly exhibit the reasons behind a recommendation, making the recommendation process opaque. In this tutorial, we revisit the recommendation problem from the perspective of graph learning. Common data sources for recommendation can be organized into graphs, such as user-item interactions (bipartite graphs), social networks, and item knowledge graphs (heterogeneous graphs), among others. Such a graph-based organization connects the isolated data instances, making it possible to exploit high-order connectivities that encode meaningful patterns for collaborative filtering, content-based filtering, social influence modeling and knowledge-aware reasoning. Together with the recent success of graph neural networks (GNNs), graph-based models have exhibited the potential to be the technologies for next-generation recommendation systems. This tutorial provides a review of graph-based learning methods for recommendation, with special focus on recent developments in GNNs and knowledge graph-enhanced recommendation. By introducing this emerging and promising topic, we expect the audience to gain a deep understanding of and accurate insight into the area, and we hope to stimulate more ideas and discussions and promote the development of technologies.

Recent Developments of Deep Heterogeneous Information Network Analysis

Recently, there has been a surge of research on employing Heterogeneous Information Networks (HINs) to model complex interaction systems, where networks are composed of different types of nodes or links, since a HIN contains richer structural and semantic information. Many studies develop structural analysis approaches by leveraging the rich semantic meaning of the structural types of objects and links in the networks. Furthermore, recent advances in deep learning and network embedding pose new opportunities and challenges for mining HINs, and heterogeneous network embedding, and even heterogeneous graph neural networks, are becoming a hot topic. In this tutorial, we will give a survey of recent developments in heterogeneous information network analysis, especially newly emerging heterogeneous network embedding. This tutorial should help researchers and practitioners share new techniques for identifying and analyzing relationships in networks that integrate multiple types or sources of information.

Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join

String data is ubiquitous, and string similarity search and join are critical to applications in information retrieval, data integration, data cleaning, and big data analytics. To support these operations, many techniques in the database and machine learning areas have been proposed independently. More precisely, in the database research area, there are techniques based on the filtering-and-verification framework that not only achieve high performance, but also provide guaranteed quality of results for given similarity functions. In the machine learning research area, string similarity processing is modeled as a problem of identifying similar text records; specifically, deep learning approaches use embedding techniques that map text to a low-dimensional continuous vector space. In this tutorial, we review a number of studies of string similarity search and join in these two research areas. We divide the studies in each area into different categories. For each category, we provide a comprehensive review of the relevant works and present the details of these solutions. We conclude this tutorial by pinpointing promising directions for future work that combines techniques from these two areas.
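
As a rough illustration of the filtering-and-verification framework mentioned in this abstract, the sketch below joins two string collections under an edit distance threshold. It is a generic textbook-style example, not a method taken from the tutorial: the function names and the choice of a length filter are assumptions. The filter relies on the fact that two strings whose lengths differ by more than tau cannot be within edit distance tau, so pruning them never loses a true result.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def similarity_join(left, right, tau):
    """Return pairs within edit distance tau and the number of verified candidates."""
    results = []
    candidates = 0
    for a in left:
        for b in right:
            # Filtering step: cheap length check that never discards a true match.
            if abs(len(a) - len(b)) > tau:
                continue
            candidates += 1
            # Verification step: exact (and more expensive) distance computation.
            if edit_distance(a, b) <= tau:
                results.append((a, b))
    return results, candidates


pairs, verified = similarity_join(
    ["kitten", "data"], ["sitting", "date", "database"], tau=1
)
```

Here only two of the six possible pairs survive the filter and reach the verification step. Practical systems typically replace the nested loop with an index and add stronger filters, but the two-phase structure is the same.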

Realtime Object Detection via Deep Learning-based Pipelines

Ever wonder how the Tesla Autopilot system works (or why it fails)? In this tutorial we will look under the hood of self-driving cars and of other applications of computer vision, and review state-of-the-art pipelines for object detection, such as two-stage approaches (e.g., Faster R-CNN) and single-stage approaches (e.g., YOLO/SSD). This is accomplished via a series of Jupyter Notebooks that use Python, OpenCV, Keras, and TensorFlow. No prior knowledge of computer vision is assumed (although it will help!). To this end, we begin this tutorial with a review of computer vision and traditional approaches to object detection such as the histogram of oriented gradients (HOG).

Recommendation for Multi-stakeholders and through Neural Review Mining

Recommender systems are able to produce a list of recommended items tailored to user preferences, and the end user is the only stakeholder in these traditional systems. However, there can be multiple stakeholders in several application domains (e.g., e-commerce, movies, music), and recommendations need to be produced by balancing the needs of the different stakeholders. The first session of this tutorial introduces multi-stakeholder recommender systems (MSRS) with several case studies, and discusses the corresponding methods and challenges in MSRS. Reviews on an e-commerce platform may be mined to address the cold-start problem and to generate explanations. Our earlier tutorial covered aspect-based sentiment analysis of products and topic models/distributed representations that bridge the vocabulary gap between user reviews and product descriptions. The focus of the second session of this tutorial is instead on recent neural methods for review text mining, covering hands-on code for their use in enhancing product recommendation. Each section will introduce topics from various mechanism (e.g., attention) and task (e.g., review ranking) perspectives, and present cutting-edge research and a walk-through of programs executed in Jupyter notebooks using real-world data sets.

Machine Learning on Graphs with Kernels

Graphs are becoming a dominant structure in current information management with many domains involved, including social networks, chemistry, biology, etc. Many real-world problems require applying machine learning tasks to graph-structured data. Graph kernels have emerged as a promising approach for dealing with these tasks. A graph kernel is a symmetric, positive semidefinite function on the set of graphs. These functions extend the applicability of kernel methods to graphs. Graph kernels have attracted a lot of attention during the last 20 years. The considerable research activity that occurred in the field resulted in the development of dozens of kernels, each focusing on specific structural properties of graphs. The goal of this tutorial is to offer a comprehensive presentation of a wide range of graph kernels, and to describe their key applications. The tutorial will also offer to the participants hands-on experience in applying graph kernels to classification problems.
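
To make the definition in this abstract concrete, here is a minimal sketch of one of the simplest graph kernels, the vertex label histogram kernel. It is chosen purely for illustration and is not claimed to be part of the tutorial material; the function names and the representation of a graph as a list of vertex labels (edges are ignored by this particular kernel) are assumptions. The kernel value is the inner product of the two label-count vectors, which is why it is symmetric and positive semidefinite.

```python
from collections import Counter


def vertex_histogram_kernel(labels_g, labels_h):
    """Inner product of the vertex-label count vectors of two graphs."""
    hist_g, hist_h = Counter(labels_g), Counter(labels_h)
    # Counter returns 0 for labels missing from hist_h.
    return sum(count * hist_h[label] for label, count in hist_g.items())


def gram_matrix(graphs):
    """Kernel (Gram) matrix over a collection of graphs."""
    return [[vertex_histogram_kernel(g, h) for h in graphs] for g in graphs]


# Each toy graph is given only by its vertex labels.
graphs = [
    ["C", "C", "O"],
    ["C", "O", "O"],
    ["N"],
]
K = gram_matrix(graphs)
```

A Gram matrix such as K can be passed to any kernel method that accepts precomputed kernels, for example a support vector machine, which is the usual route to graph classification with kernels.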

SESSION: Workshop Summaries

DTMBIO 2019: The Thirteenth International Workshop on Data and Text Mining in Biomedical Informatics

Started in 2006 as a specialized workshop in the field of text mining applied to biomedical informatics, DTMBIO (ACM international workshop on Data and Text Mining in Biomedical Informatics) has been held annually in conjunction with one of the largest data management conferences, CIKM, bringing together researchers working in computer science and bioinformatics areas, including text mining and genomic data analysis. The purpose of DTMBIO is to foster discussions regarding state-of-the-art applications of data and text mining to biomedical research problems. DTMBIO 2019 will help scientists navigate emerging trends and opportunities in the evolving area of informatics-related techniques and problems in the context of biomedical research.

Knowledge-Driven Analytics and Systems Impacting Human Quality of Life

The advent of artificial intelligence (AI), the Internet of Things (IoT), powerful computational hardware like graphics processing units, and affordable sensing devices like smart bands, wearables, and smartphones paves the way for a large number of useful and intelligent applications hitherto never commonly envisaged. However, applications that positively influence human life and society need distinct attention from researchers and application developers as well as industry. Knowledge-driven initiatives in terms of technology, application and practical deployment have a strong capability to enable the long-term human-centric convergence of cyber-physical systems. Our endeavor is to discuss the finer details, research directions and application development aspects of analytics and systems intended to impact human quality of life.

HENA 2019: The 3rd Workshop of Heterogeneous Information Network Analysis and Applications

The third International Workshop on Heterogeneous Information Network Analysis and Applications is held in Beijing, China on November 3, 2019 and is co-located with the 28th International Conference on Information and Knowledge Management. The goal of this workshop is to bring together people from these different areas and provide an opportunity for researchers and practitioners to share new techniques for identifying and analyzing relationships in networks that integrate multiple types or sources of information. This workshop has an exciting program that spans a number of subareas, including: network construction and mining, network embedding, information diffusion, knowledge graph analysis, community detection, parallel computing for network analysis, and network analysis applications. The program includes several invited speakers, lively discussion on emerging topics, and presentations of accepted original research papers.

EYRE 2019: 2nd International Workshop on EntitY REtrieval

Entity retrieval has received increasing research attention from both the Information Retrieval (IR) and Semantic Web communities. This workshop series provides a platform where interdisciplinary studies of entity retrieval can be presented, and focused discussions can take place. We also organize two shared tasks related to entity retrieval. The 2nd International Workshop on EntitY REtrieval (EYRE 2019) was a half-day workshop co-located with the 28th ACM International Conference on Information and Knowledge Management (CIKM 2019) in Beijing, China.

CIKM 2019 Workshop on Artificial Intelligence in Transportation (AI in transportation)

Data-enabled smart transportation has attracted a surge of interest from machine learning and data mining researchers due to the boom of the online ride-hailing industry and the rapid development of autonomous driving. Large-scale, high-quality route data and trading data (spatiotemporal data) are generated every day, which makes AI an urgent need and a preferred solution for decision making in intelligent transportation systems. While a large amount of work has been dedicated to traditional transportation problems, it is far from satisfactory for the rising need. We propose a half-day workshop at CIKM 2019 for professionals, researchers, and practitioners who are interested in mining and understanding the big and heterogeneous data generated in transportation, and in AI applications to improve the transportation system. We plan to have several invited talks from both academia and industry. This workshop is organized by Shanghai Jiao Tong University, Didi Chuxing and Pennsylvania State University.

GRLA 2019: The first International Workshop on Graph Representation Learning and its Applications

Graphs are universal data structures for representing the relationships between interconnected objects. They are ubiquitous in a variety of disciplines and domains ranging from computer science, social science, economics, and medicine to bioinformatics. In recent years, extensive studies have been conducted on graph analysis techniques. One of the most fundamental challenges of analyzing graphs is representing them effectively, which largely determines the performance of many follow-up tasks. This workshop aims to provide a forum for industry and academia to discuss the latest progress on graph representation learning and its applications in different fields. We hope that more advanced technologies can be proposed or inspired, and we expect that graph representation learning can attract much more attention in both academia and industry.

International Workshop on Model Selection and Parameter Tuning in Recommender Systems

Recommender systems have strongly attracted the attention of the machine learning research community with prosperous real-life deployments in the last few decades. The performance and success of most applications developed in this domain highly depend on an elaborate selection of models and configuration of their hyperparameters. The international MoST-Rec 2019 workshop addresses the issues of algorithm selection and parameter tuning for recommender systems. The workshop aims to bring together researchers from the model selection and hyperparameter tuning community in the general scope of machine learning with researchers from the recommender systems community for discussing and exchanging recent advances and open challenges in the field.

2nd Workshop on Knowledge-aware and Conversational Recommender Systems - KaRS

Over the last years, we have been witnessing the advent of more and more precise and powerful recommendation algorithms and techniques able to effectively assess users' tastes and predict information that would probably be of interest to them. Most of these approaches rely on the collaborative paradigm (often exploiting machine learning techniques) and do not take into account the huge amount of knowledge, both structured and unstructured, describing the domain of interest of the recommendation engine. Although very effective in predicting relevant items, collaborative approaches miss some very interesting features that go beyond the accuracy of results and move in the direction of providing novel and diverse results, generating explanations for the recommended items, or supporting interactive and conversational recommendation processes.

BigScholar 2019: The 6th Workshop on Big Scholarly Data

Recent years have witnessed the rapid growth in the number of academics and practitioners who are interested in big scholarly data as well as closely-related areas. Quite a lot of papers reporting recent advancements in this area have been published in leading conferences and journals. Both non-commercial and commercial platforms and systems have been released in recent years, which provide innovative services built upon big scholarly data to the academic community. Examples include Microsoft Academic Graph, Google Scholar, DBLP, arXiv, CiteSeerX, Web of Knowledge, Udacity, Coursera, and edX. The workshop will contribute to the birth of a community having a shared interest around big scholarly data and exploring it using knowledge discovery, data science and analytics, network science, and other appropriate technologies.