CIKM Tutorial

The 29th ACM International Conference on Information and Knowledge Management (CIKM) will be held as a virtual conference on October 19-23, 2020.

Tutorial title: Neural Bayesian Information Processing

Description: This half-day tutorial addresses the fundamentals of and advances in deep Bayesian learning for a variety of information systems, ranging from speech recognition to document summarization, text classification, information extraction, image caption generation, sentence/image generation, dialogue management, sentiment classification, recommendation systems, question answering and machine translation, to name a few. Traditionally, “deep learning” is taken to be a learning process from source inputs to target outputs in which inference or optimization is based on a real-valued deterministic model. The “semantic structure” in words, sentences, entities, images, videos, actions and documents may not be well expressed or correctly optimized in mathematical logic or computer programs. The “distribution function” in a discrete or continuous latent variable model for natural sentences or images may not be properly decomposed or estimated. Systematic and elaborate transfer learning is required to bridge source and target domains. This tutorial covers the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian and deep models, including the variational auto-encoder (VAE), stochastic temporal convolutional network, stochastic recurrent neural network, sequence-to-sequence model, attention mechanism, memory-augmented neural network, skip neural network, temporal-difference VAE, predictive-state neural network, and generative or normalizing flow. Enhancing the prior/posterior representation is also addressed. We show how these models are connected and why they work for information and knowledge management over symbolic and complex patterns in temporal and spatial data. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. Word, sentence and image embeddings are merged with structural or semantic constraints. A series of case studies is presented to address different issues in neural Bayesian information processing. Finally, we point out a number of directions and outlooks for future studies. This tutorial serves three objectives: to introduce novices to major topics in deep Bayesian learning, to motivate and explain a topic of emerging importance for data mining and information retrieval, and to present a novel synthesis combining distinct lines of machine learning work.
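
As a pointer to the variational inference machinery covered above, the following is a minimal sketch of the evidence lower bound (ELBO) that the VAE optimizes; the symbols (x for an observation, z for a latent variable, \theta and \phi for the generative and inference parameters) are illustrative notation rather than taken from the tutorial materials:

\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)

Maximizing this bound jointly trains the decoder p_\theta(x|z) and the encoder q_\phi(z|x), and the same objective underlies the stochastic recurrent and temporal models listed above.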

Organization: This tutorial is composed of five parts. First, we survey the current status of research on statistical modeling, deep neural networks, information processing and data mining, and explain the key issues in deep Bayesian learning for discrete-valued observation data and latent semantics. Modern neural information models are introduced to show how data analysis is performed, from language processing to memory networking, semantic understanding and knowledge learning. Second, we address a number of modern learning theories, ranging from latent variable models to variational inference, sampling methods, deep unfolding, transfer learning and adversarial learning. In the third part, a series of deep models is introduced, including memory networks, sequence-to-sequence learning, convolutional networks, recurrent networks, attention networks, the Transformer and BERT. The fourth part focuses on a variety of advanced studies that illustrate how deep Bayesian learning is developed to infer sophisticated recurrent models for sequential information processing. In particular, the Bayesian recurrent network, VAE, neural variational learning, neural discrete representation, stochastic temporal neural network, Markov recurrent neural network and temporal-difference neural network are introduced in various information systems, which open a window onto practical tasks, e.g., reading comprehension, sentence generation, dialogue systems, question answering, machine translation and state prediction. Variational inference methods based on normalizing flows and the variational mixture of posteriors are addressed. The posterior collapse problem in variational sequential learning is mitigated. The alignment between source inputs and target outputs is pursued and optimized. In the final part, we spotlight some future directions for deep Bayesian mining and understanding that can handle the challenges of big data, heterogeneous conditions and dynamic systems. In particular, deep learning, structural learning, temporal and spatial modeling, long-history representation and stochastic learning are emphasized.
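
For the normalizing flows mentioned in the fourth part, a minimal sketch of the underlying change-of-variables identity is given below; the notation (z_0 drawn from a simple base posterior q_0, and f_1, ..., f_K a chain of invertible transforms) is illustrative rather than taken from the tutorial itself:

\log q_K(z_K) \;=\; \log q_0(z_0) \;-\; \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right|, \qquad z_k = f_k(z_{k-1})

Stacking K such transforms reshapes a simple base distribution into a flexible approximate posterior, which is one route to the enhanced posterior representation emphasized in this tutorial.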