【Academic Event】Workshop on Optimization, Probability and Simulation (WOPS) 2021/2022
Date and Time: Jan 21-23, 2022 (Beijing time)
Location: Room 103, Dao Yuan Building, CUHK-Shenzhen
Online: Zoom (Zoom link and code will be shared via email once your registration is completed.)
Language: English
Recording:
【January 21, Part1】【January 21, Part2】【January 22, Part1】【January 22, Part2】【January 23】
Registration:
【Option A】Click the registration link: https://www.wjx.cn/vj/h3NmJfs.aspx
【Option B】Scan the QR code below:

We will send out an invitation email to you once your registration is confirmed.
The Workshop on Optimization, Probability and Simulation (WOPS) 2021/2022 will be held from January 21 to 23, 2022 by the School of Data Science in a hybrid format. WOPS 2021/2022 is an international workshop that aims to provide a platform for young and leading researchers from China and abroad to exchange ideas, explore opportunities for collaboration, and discuss recent research results.
The workshop will be held in a hybrid format: for those who cannot attend onsite, we also provide a simultaneous online channel via Zoom.
Registration is open. Please join us.
*Due to pandemic control policies, seats are limited. If you have traveled overseas within the past 21 days, you may not attend the workshop onsite.
Agenda

Introduction of speakers
Michael Friedlander (University of British Columbia)
Research Area: Large-Scale Optimization, Numerical Methods, Signal Processing, Machine Learning
Topic: Communication and Fairness in Federated Learning
Abstract:
Federated learning is a decentralized machine learning scheme that allows many data owners to collaboratively train a model without sharing private data. Participation is best encouraged by fairly rewarding owners for the quality of their data. We describe a dual decomposition approach that leads to provably efficient communication among the participants, and also describe a computationally efficient method from cooperative game theory that preserves fairness.
Biography:
Michael Friedlander is IBM Professor of Computational Mathematics at the University of British Columbia. He received his PhD in Operations Research from Stanford University in 2002, and his BA in Physics from Cornell University in 1993. From 2002 to 2004 he was the Wilkinson Fellow in Scientific Computing at Argonne National Laboratory. He has held visiting positions at UCLA's Institute for Pure and Applied Mathematics (2010), and at Berkeley's Simons Institute for the Theory of Computing (2013). He currently serves as Area Editor for the Mathematics of Operations Research journal. His research is primarily in developing numerical methods for large-scale optimization, their software implementation, and applying these to problems in signal processing and machine learning.
* More information will be updated later, so stay tuned.
Jeff Hong (Fudan University)
Research Area: Modeling and Optimization of Stochastic Simulation, Financial Engineering & Risk Management, Business Analytics
Topic: Option Pricing by Neural Stochastic Differential Equations: A Simulation Optimization Approach
Abstract:
Classical option pricing models rely on prior assumptions on the dynamics of the underlying assets. Though empirical evidence shows that these models may partially explain the option prices, their performance may be poor when the actual situations deviate from the assumptions. Neural network models are capable of learning the underlying relationship from the data. However, they require a massive amount of data to avoid over-fitting, which is typically not available for option pricing problems. Thus, we propose a new model that integrates neural networks into a classical stochastic differential equation pricing model to balance model flexibility against the data requirement. In addition, some more specific models are constructed by using neural networks to calibrate the classical models. Furthermore, we show that the training of the model can be formulated as a simulation optimization problem and solved in a way that is also compatible with the training of neural networks. Preliminary numerical results show that our approach appears to work better than some existing models. This is a joint work with Shoudao Wang and Nifei Lin.
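The simulation side of such a model can be sketched in a few lines. In the sketch below, the `mu` and `sigma` functions stand in for the learned drift and volatility networks (plain geometric Brownian motion is used as a placeholder), and paths are discretized with the Euler-Maruyama scheme; this is an illustrative sketch under those assumptions, not the authors' model.

```python
import numpy as np

def euler_maruyama_call_price(mu, sigma, s0, strike, T,
                              n_steps=100, n_paths=20000, seed=0):
    """Monte Carlo price of a European call under dS = mu(S) dt + sigma(S) dW,
    simulated on n_paths trajectories with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        s = s + mu(s) * dt + sigma(s) * dw
    return float(np.mean(np.maximum(s - strike, 0.0)))

# Placeholder coefficients (plain GBM); a neural SDE would learn mu and sigma.
price = euler_maruyama_call_price(mu=lambda s: 0.0 * s,
                                  sigma=lambda s: 0.2 * s,
                                  s0=100.0, strike=100.0, T=1.0)
```

With the GBM placeholder the estimate can be checked against the closed-form Black-Scholes value (about 7.97 here with zero rates); a learned model has no such closed form, which is where the simulation optimization formulation comes in.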
Biography:
Prof. Jeff Hong received his bachelor’s and doctoral degrees from Tsinghua University and Northwestern University, respectively. He is currently with the School of Management and the School of Data Science at Fudan University in Shanghai, China, holding the positions of Fudan Distinguished Professor, Hongyi Chair Professor, Chair of the Department of Management Science in the School of Management, and Associate Dean of the School of Data Science. He was previously Chair Professor of Management Sciences at City University of Hong Kong, and Professor of Industrial Engineering and Logistics Management at the Hong Kong University of Science and Technology. Prof. Hong’s research interests include stochastic simulation, stochastic optimization, financial risk management and supply chain management. He is currently the Associate Editor-in-Chief of the Journal of the Operations Research Society of China, the Simulation Area Editor of Operations Research, an Associate Editor of Management Science, and the President of the INFORMS Simulation Society.
* More information will be updated later, so stay tuned.
Kengo Kamatani (Institute of Statistical Mathematics, Tokyo)
Research Area: Bayesian Statistics
Topic: Non-reversible Guided Metropolis Kernel
Abstract:
We construct a class of non-reversible Metropolis kernels as a multivariate extension of the guided-walk kernel proposed by Gustafson (1998). The main idea of our method is to introduce a projection that maps the state space to a totally ordered group. Using the Haar measure, we construct a novel Markov kernel, termed the Haar-mixture kernel, which is of interest in its own right. This is achieved by inducing a topological structure on the totally ordered group. Our proposed method, the Delta-guided Metropolis-Haar kernel, uses the Haar-mixture kernel as a proposal kernel. The proposed non-reversible kernel is at least 10 times better than the random-walk Metropolis kernel and the Hamiltonian Monte Carlo kernel, in terms of effective sample size per second, for logistic regression and a discretely observed stochastic process. This is joint work with Xiaoling Song (Osaka University).
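For intuition, a minimal one-dimensional version of Gustafson's (1998) guided-walk kernel, which this work extends to multivariate settings, can be sketched as follows; this is an illustration of the baseline idea, not the proposed Metropolis-Haar kernel. The chain carries a direction variable that flips only on rejection, which makes it non-reversible and suppresses diffusive backtracking.

```python
import numpy as np

def guided_walk(logpi, x0, step, n, seed=0):
    """Gustafson's 1-D guided-walk Metropolis: proposals always move in the
    current direction d (half-normal increments); the direction flips only
    on rejection, yielding a non-reversible chain that still targets pi."""
    rng = np.random.default_rng(seed)
    x, d = x0, 1.0
    lp = logpi(x)
    out = np.empty(n)
    for i in range(n):
        y = x + d * step * abs(rng.normal())   # move in direction d
        lq = logpi(y)
        if np.log(rng.uniform()) < lq - lp:
            x, lp = y, lq                      # accept, keep direction
        else:
            d = -d                             # reject, reverse direction
        out[i] = x
    return out

# Target a standard normal and check the first two moments.
samples = guided_walk(lambda x: -0.5 * x * x, x0=0.0, step=1.0, n=50000)
```

The persistent direction is what the abstract's totally ordered group generalizes: in higher dimensions, the "direction" lives on the group and is updated via the Haar-mixture construction.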
Biography:
Kengo Kamatani is an Associate Professor at the Institute of Statistical Mathematics, Japan. He works on inference for stochastic processes, Bayesian inference, and Bayesian computation, in particular Markov chain Monte Carlo methods. He is also the developer of the R package YUIMA, a powerful and efficient tool for statistical inference for stochastic processes. He is currently an associate editor of the Journal of Applied Probability / Advances in Applied Probability, Foundations of Data Science, and the Annals of the Institute of Statistical Mathematics.
Suvrit Sra (Massachusetts Institute of Technology, MIT, IDSS, LIDS)
Research Area: Mathematics of Machine Learning/Data Science, Optimization for Machine Learning
Topic: *Cancelled due to pandemic difficulties
Biography:
Suvrit Sra joined MIT’s Department of Electrical Engineering and Computer Science and IDSS as a core faculty member, in January 2018. Prior to this, he was a Principal Research Scientist in the MIT Laboratory for Information and Decision Systems (LIDS). Before coming to LIDS, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany. During this time, he was also a visiting faculty member at the University of California at Berkeley (EECS Department) and Carnegie Mellon University (Machine Learning Department). He received his PhD in Computer Science from the University of Texas at Austin.
Suvrit’s research bridges a variety of mathematical topics including optimization, matrix theory, differential geometry, and probability with machine learning. His recent work focuses on the foundations of geometric optimization, an emerging subarea of nonconvex optimization where geometry (often non-Euclidean) enables efficient computation of global optimality. More broadly, his work encompasses a wide range of topics in optimization, especially in machine learning, statistics, signal processing, and related areas. He is pursuing novel applications of machine learning and optimization to materials science, quantum chemistry, synthetic biology, healthcare, and other data-driven domains.
His work has won several awards at machine learning conferences, the 2011 “SIAM Outstanding Paper” award, faculty research awards from Criteo and Amazon, and an NSF CAREER award. In addition, Suvrit founded (and regularly co-chairs) the popular OPT “Optimization for Machine Learning” series of Workshops at the Conference on Neural Information Processing Systems (NIPS). He has also edited a well-received book with the same title (MIT Press, 2011).
Albert Berahas (University of Michigan)
Research Area: Nonlinear Programming and Algorithms, Optimization for Machine Learning, Stochastic Optimization, Distributed Optimization
Topic: Algorithms for Deterministically Constrained Stochastic Optimization
Abstract:
Stochastic gradient and related methods for solving stochastic optimization problems have been studied extensively in recent years. It has been shown that such algorithms and much of their convergence and complexity guarantees extend in straightforward ways when one considers problems involving simple constraints, such as when one can perform projections onto the feasible region of the problem. However, settings with general nonlinear constraints have received less attention, and many of the approaches that have been proposed for solving such problems resort to using penalty or (augmented) Lagrangian methods, which are often not the most effective strategies. In this work, we propose and analyze stochastic optimization algorithms for deterministically constrained problems based on the sequential quadratic optimization (commonly known as SQP) methodology. We discuss the rationale behind our proposed techniques, convergence in expectation and complexity guarantees for our algorithms, and the results of preliminary numerical experiments that we have performed. This is joint work with Raghu Bollapragada, Frank E. Curtis, Michael O'Neill, Daniel P. Robinson and Baoyu Zhou.
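A toy sketch of the underlying idea, solving an equality-constrained quadratic with noisy gradients by repeatedly solving the SQP subproblem's KKT system, might look like the following; the specific problem, the identity Hessian, and the 1/k damping are illustrative assumptions for this sketch, not the authors' algorithm.

```python
import numpy as np

def stochastic_sqp(a, c, grad, x0, n_iter, noise=0.1, seed=0):
    """Sketch of a stochastic SQP iteration for min E[f(x)] s.t. a^T x = c.
    Each step solves the QP subproblem (with identity Hessian) through its
    KKT system using a noisy gradient, then takes a damped step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    dim = len(x)
    kkt = np.block([[np.eye(dim), a[:, None]],
                    [a[None, :], np.zeros((1, 1))]])
    for k in range(1, n_iter + 1):
        g = grad(x) + noise * rng.normal(size=dim)   # stochastic gradient
        rhs = np.concatenate([-g, [c - a @ x]])
        step = np.linalg.solve(kkt, rhs)[:dim]       # primal search direction
        x = x + step / k                             # diminishing step size
    return x

# Toy problem: min 0.5*||x - (1,1)||^2  s.t.  x1 + x2 = 0; solution is (0, 0).
a = np.array([1.0, 1.0])
x = stochastic_sqp(a, c=0.0, grad=lambda x: x - 1.0,
                   x0=[1.0, -1.0], n_iter=500)
```

Because the KKT right-hand side carries the constraint residual c - a^T x, the linearized constraint violation contracts at every damped step, which is the structural advantage SQP has over penalty reformulations.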
Biography:
Albert S. Berahas is an Assistant Professor in the Industrial and Operations Engineering department at the University of Michigan. Before joining the University of Michigan, he was a Postdoctoral Research Fellow in the Industrial and Systems Engineering department at Lehigh University, working with Professors Katya Scheinberg, Frank Curtis and Martin Takáč. Prior to that appointment, he was a Postdoctoral Research Fellow in the Industrial Engineering and Management Sciences department at Northwestern University, working with Professor Jorge Nocedal. Berahas completed his PhD studies in the Engineering Sciences and Applied Mathematics (ESAM) department at Northwestern University in 2018, advised by Professor Jorge Nocedal. He received his undergraduate degree in Operations Research and Industrial Engineering (ORIE) from Cornell University in 2009, and in 2012 obtained an MS degree in Applied Mathematics from Northwestern University. Berahas’ research broadly focuses on designing, developing and analyzing algorithms for solving large-scale nonlinear optimization problems. Specifically, he is interested in and has explored several sub-fields of nonlinear optimization such as: (i) general nonlinear optimization algorithms, (ii) optimization algorithms for machine learning, (iii) constrained optimization, (iv) stochastic optimization, (v) derivative-free optimization, and (vi) distributed optimization.
Wei Deng (Purdue University, Morgan Stanley)
Research Area: Monte Carlo, Bandits, Q-Learning, Diffusion Models, Financial Applications
Topic: Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo
Abstract:
The training of modern deep neural networks (DNNs) boils down to a non-convex Bayesian learning problem. A standard tool to handle the problem is Langevin Monte Carlo, which, however, can be arbitrarily slow and often fails to explore the multi-modal posterior given a limited time. As a result, advanced techniques are still required.
In this talk, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve acceleration. However, the naive extension of swaps to big data problems leads to a large bias, and bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant acceleration. To alleviate this issue, we first propose a control variates method to reduce the variance of the noisy energy estimators and show its potential to accelerate exponential convergence. We also present the population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round-trip rate.
We also study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms, such as the multicanonical ensemble and Wang-Landau algorithms, have achieved success in bioinformatics and statistical physics; however, their lack of scalability has greatly limited their extension to big data. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Convergence analysis with stability guarantees is also provided. In addition, we propose a pleasingly parallel version of such algorithms with interacting latent variables. We show that the interacting algorithm can be theoretically more efficient than the single-chain alternative with an equivalent computational budget.
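As a point of reference for the methods above, a minimal stochastic gradient Langevin dynamics sampler can be written in a few lines; the conjugate Gaussian toy model and all names below are illustrative assumptions, not the talk's algorithms. The key ingredient is the injected Gaussian noise scaled to the step size, which turns a minibatch gradient method into an approximate posterior sampler.

```python
import numpy as np

def sgld(grad_log_post, data, theta0, step, n_iter, batch=32, seed=0):
    """Stochastic gradient Langevin dynamics: a minibatch gradient step plus
    Gaussian noise of variance `step`, so the iterates approximately sample
    from the posterior instead of collapsing to a point estimate."""
    rng = np.random.default_rng(seed)
    theta, n = theta0, len(data)
    out = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(n, size=batch, replace=False)
        g = grad_log_post(theta, data[idx], n)
        theta = theta + 0.5 * step * g + rng.normal(0.0, np.sqrt(step))
        out[i] = theta
    return out

# Conjugate toy model: x_i ~ N(theta, 1), prior theta ~ N(0, 1); the full
# log-posterior gradient is -theta + sum_i (x_i - theta), rescaled per batch.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=100)
grad = lambda t, xb, n: -t + (n / len(xb)) * np.sum(xb - t)
samples = sgld(grad, data, theta0=0.0, step=1e-3, n_iter=5000)
```

The noisy minibatch energy estimators that complicate replica exchange swaps are exactly the `g` terms here, which is why bias correction and variance reduction via control variates matter in the big data regime.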
Biography:
Wei Deng is a machine learning researcher at Morgan Stanley, New York. He received his Ph.D. in Mathematics from Purdue University in 2021. His research focuses on scalable Monte Carlo methods, and he is also interested in diffusion models and bandits. His work has mainly been published at ICML, NeurIPS, and ICLR.
Zhiyuan Huang (Tongji University)
Research Area: Stochastic Modeling, Monte Carlo Simulation, Rare-Event Simulation, Optimization under Uncertainty
Topic: Learning-Based Robust Optimization: Procedures and Statistical Guarantees
Abstract:
Robust optimization (RO) is a common approach to tractably obtain safeguarding solutions for optimization problems with uncertain constraints. In this talk, we present a statistical framework to integrate data into RO based on learning a prediction set using (combinations of) geometric shapes that are compatible with established RO tools and on a simple data-splitting validation step that achieves finite-sample nonparametric statistical guarantees on feasibility. We demonstrate how our required sample size to achieve feasibility at a given confidence level is independent of the dimensions of both the decision space and the probability space governing the stochasticity, and we discuss some approaches to improve the objective performances while maintaining these dimension-free statistical feasibility guarantees.
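The data-splitting idea can be illustrated with a conformal-style sketch: fit a ball-shaped set on one half of the data and calibrate its radius on the other half via an empirical quantile. This is a simplified stand-in under those assumptions; the talk's finite-sample guarantees involve a more careful choice of the calibration rank.

```python
import numpy as np

def ball_uncertainty_set(data, eps, seed=0):
    """Learn a ball-shaped prediction set: the center comes from the training
    half, the radius from an order statistic of calibration distances chosen
    so the ball covers roughly a (1 - eps) fraction of the uncertainty."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    half = len(data) // 2
    train, cal = data[idx[:half]], data[idx[half:]]
    center = train.mean(axis=0)                      # shape from training half
    dists = np.sort(np.linalg.norm(cal - center, axis=1))
    k = min(int(np.ceil((1.0 - eps) * (len(cal) + 1))) - 1, len(dists) - 1)
    return center, dists[k]

# Calibrate on 2-D Gaussian data, then check coverage on fresh draws.
rng = np.random.default_rng(2)
center, radius = ball_uncertainty_set(rng.normal(size=(2000, 2)), eps=0.1)
fresh = rng.normal(size=(2000, 2))
coverage = np.mean(np.linalg.norm(fresh - center, axis=1) <= radius)
```

Note that the calibration step only counts how many held-out points fall inside the set, which is the mechanism behind the dimension-free sample-size requirement mentioned in the abstract.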
Biography:
Dr. Zhiyuan Huang is an Assistant Professor in the Department of Management Science and Engineering at Tongji University. His primary research interests are in the areas of data-driven robust optimization and rare-event simulation. Applications include safety evaluation of intelligent physical systems and experimental design. His research has been published in journals and conferences such as Management Science, IEEE Transactions on Intelligent Transportation Systems, the Winter Simulation Conference (WSC), and the International Conference on Artificial Intelligence and Statistics (AISTATS).
Samuel Livingstone (University College London)
Research Area: Bayesian Computation, Markov Processes and Langevin Dynamics, Health Data Science
Topic: Adaptive Random Neighbourhood Samplers for Bayesian Variable Selection
Abstract:
We introduce a framework for efficient Markov Chain Monte Carlo (MCMC) algorithms targeting discrete-valued high-dimensional distributions, such as posterior distributions in Bayesian variable selection (BVS) problems. We show that many recently introduced algorithms, such as the locally informed sampler and the Adaptively Scaled Individual adaptation sampler (ASI), can be viewed as particular cases within the framework. We then describe a novel algorithm, the Adaptive Random Neighbourhood Informed sampler (ARNI), by combining ideas from both of these existing approaches. We show using several examples of both real and simulated datasets that a computationally efficient point-wise implementation (PARNI) leads to relatively more reliable inferences on a range of variable selection problems, particularly in the very large p setting. This is joint work with Xitong Liang and Jim Griffin.
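The samplers above build on the basic single-coordinate-flip Metropolis kernel over binary inclusion vectors. A minimal sketch of that baseline, on a toy posterior with independent inclusions so the stationary marginals are checkable, is given below; this illustrates the common starting point, not the ASI or (P)ARNI samplers themselves.

```python
import numpy as np

def single_flip_metropolis(log_post, p_dim, n_iter, seed=0):
    """Baseline MCMC for variable selection: propose flipping one coordinate
    of the binary inclusion vector gamma and accept with the Metropolis
    ratio. Random-neighbourhood samplers instead propose informed flips of
    several coordinates at once."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(p_dim, dtype=int)
    lp = log_post(gamma)
    visits = np.zeros(p_dim)
    for _ in range(n_iter):
        j = rng.integers(p_dim)
        gamma[j] ^= 1                      # propose the flip
        lq = log_post(gamma)
        if np.log(rng.uniform()) < lq - lp:
            lp = lq                        # accept
        else:
            gamma[j] ^= 1                  # reject: undo the flip
        visits += gamma
    return visits / n_iter                 # marginal inclusion frequencies

# Toy target: independent inclusions with probability 0.25 per variable.
q = 0.25
log_post = lambda g: g.sum() * np.log(q) + (len(g) - g.sum()) * np.log(1 - q)
freq = single_flip_metropolis(log_post, p_dim=10, n_iter=100000)
```

In a real BVS posterior the coordinates are far from independent, and single flips mix poorly in large p, which is precisely what the adaptive random-neighbourhood proposals are designed to overcome.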
* More information will be updated later, so stay tuned.
Jure Vogrinc (University of Warwick)
Research Area: Markov Chain Monte Carlo Methods, Simulation of Rare Events
Topic: Optimal Design of the Barker Proposal and Other Locally-Balanced Metropolis-Hastings Algorithms
Abstract:
We study the class of first-order locally-balanced Metropolis-Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class, the user must select a balancing function g satisfying g(t)=tg(1/t), and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of n^(−1/3) as the dimension n tends to infinity among all members of the class, under mild smoothness assumptions on g and when the target distribution for the algorithm is of product form. In particular we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, an optimal choice of balancing function under a Gaussian noise distribution, and an optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bimodal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
This is joint work with Samuel Livingstone (University College London) and Giacomo Zanella (Bocconi University Milano). Preprint available at https://arxiv.org/abs/2201.01123
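A one-dimensional sketch of the Barker proposal with Gaussian noise, the baseline from which the talk's optimized noise distributions depart, might look as follows; this is an illustration under those simplifications, not the paper's optimal variants.

```python
import numpy as np

def barker_mcmc(logpi, grad_logpi, x0, sigma, n, seed=0):
    """1-D Barker proposal: a Gaussian increment z is kept or sign-flipped
    toward the gradient with probability 1/(1+exp(-z*grad)), then corrected
    by a Metropolis-Hastings accept/reject step."""
    rng = np.random.default_rng(seed)
    x = x0
    out = np.empty(n)
    for i in range(n):
        z = sigma * rng.normal()
        g = grad_logpi(x)
        b = 1.0 if rng.uniform() < 1.0 / (1.0 + np.exp(-z * g)) else -1.0
        y = x + b * z
        # The Gaussian part of the proposal is symmetric, so the Hastings
        # correction reduces to the log-ratio of the two flip probabilities.
        fwd = -np.log1p(np.exp(-(y - x) * g))
        bwd = -np.log1p(np.exp(-(x - y) * grad_logpi(y)))
        if np.log(rng.uniform()) < logpi(y) - logpi(x) + bwd - fwd:
            x = y
        out[i] = x
    return out

# Target a standard normal and check the first two moments.
samples = barker_mcmc(lambda x: -0.5 * x * x, lambda x: -x,
                      x0=0.0, sigma=1.5, n=50000)
```

Replacing the Gaussian draw of `z` with a bimodal distribution is the design change the paper shows to be consistently more efficient.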
Biography:
I am a Research Fellow at the University of Warwick working with Theo Damoulas and Adam Johansen on simulations of Gaussian and deep Gaussian processes. Before that I was a research fellow (at Warwick) with Wilfrid Kendall, exploring the use of Dirichlet forms to investigate theoretical properties (particularly optimal scaling) of MCMC methods. From April 2017 until September 2018 I was a postdoctoral research assistant at Queen Mary University of London with John Moriarty, working on simulation of rare events and other applications of MCMC methods to energy-related problems. I did my PhD (2013-2017) at Imperial College London with Alex Mijatović on the topic of variance reduction using control variates obtained by approximately solving the Poisson equation. My main research interests focus on methodological and theoretical aspects of computational statistical algorithms.
Jingzhao Zhang (Tsinghua University, IIIS)
Research Area: High-Dimensional Nonconvex Optimization, Machine Learning
Topic: On Convergence of Training Loss Without Reaching Stationary Points
Abstract:
It is a well-known fact that nonconvex optimization is computationally intractable in the worst case. As a result, theoretical analysis of optimization algorithms such as gradient descent often focuses on local convergence to stationary points where the gradient norm is zero or negligible. In this work, we examine the disconnect between the existing theoretical analysis of gradient-based algorithms and actual practice. Specifically, we provide numerical evidence that in large-scale neural network training, such as in ImageNet, ResNet, and WT103 + TransformerXL models, the Neural Network weight variables do not converge to stationary points where the gradient of the loss function vanishes. Remarkably, however, we observe that while weights do not converge to stationary points, the value of the loss function converges. Inspired by this observation, we propose a new perspective based on ergodic theory of dynamical systems. We prove convergence of the distribution of weight values to an approximate invariant measure (without smoothness assumptions) that explains this phenomenon. We further discuss how this perspective can better align the theory with empirical observations.
Biography:
Jingzhao Zhang is a researcher at IIIS, Tsinghua University. His research focuses on understanding optimization dynamics in high-dimensional nonconvex problems. He is particularly interested in examining theoretical analysis from a practitioner’s point of view and aims to reduce the gap between theory and practice. He received his B.S. from UC Berkeley and his PhD from MIT.
* More information will be updated later, so stay tuned.
Junwen Qiu (The Chinese University of Hong Kong, Shenzhen)
Topic: Convergence of Random Reshuffling Under the Kurdyka-Lojasiewicz Inequality
Abstract:
We study the random reshuffling (RR) method for smooth nonconvex optimization problems with a finite-sum structure. Though this method is widely utilized in practice, e.g., in the training of neural networks, its convergence behavior is only understood in several limited settings. In this paper, under the well-known Kurdyka-Lojasiewicz (KL) inequality, we establish strong limit-point convergence results for RR with appropriate diminishing step sizes, namely, the whole sequence of iterates generated by RR is convergent and converges to a single stationary point in an almost sure sense. In addition, we derive the corresponding rate of convergence, depending on the KL exponent and suitably selected diminishing step sizes. When the KL exponent is smaller than or equal to 1/2, the convergence is at the same rate as the strongly convex case. The standard KL inequality-based convergence analysis framework only applies to algorithms with a certain descent property. We conduct a novel convergence analysis for the non-descent RR method with diminishing step sizes based on the KL inequality, which generalizes the standard KL framework. We summarize our main steps and core ideas in an informal analysis framework, which is of independent interest. As a direct application of this framework, we establish similar strong limit-point convergence results for the reshuffled proximal point method.
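The method under study can be sketched on a toy strongly convex finite sum; the quadratic example and step-size schedule below are illustrative assumptions for this sketch, not the paper's setting. The defining feature is that each epoch visits every component gradient exactly once in a fresh random order.

```python
import numpy as np

def random_reshuffling(grads, x0, n_epochs, seed=0):
    """Random reshuffling: each epoch processes all n component gradients in
    a freshly permuted order, with a diminishing step size per epoch."""
    rng = np.random.default_rng(seed)
    x = x0
    n = len(grads)
    for epoch in range(1, n_epochs + 1):
        step = 1.0 / (n * (epoch + 1))    # diminishing step size
        for i in rng.permutation(n):       # one pass, without replacement
            x = x - step * grads[i](x)
    return x

# Finite sum f(x) = (1/n) * sum_i 0.5*(x - a_i)^2, minimized at mean(a) = 2.5.
a = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, ai=ai: x - ai for ai in a]
x_star = random_reshuffling(grads, x0=0.0, n_epochs=200)
```

Unlike plain SGD, the within-epoch sampling noise here cancels over each full pass, but the iteration is not a descent method for f, which is why the standard KL-based analysis framework does not apply directly.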
Biography:
Junwen Qiu is a PhD candidate in Data Science at the School of Data Science, The Chinese University of Hong Kong, Shenzhen, advised by Prof. Andre Milzarek and Prof. Xiao Li. He is interested in developing and analyzing stochastic first-order optimization methods.
Jing Zhang (The Chinese University of Hong Kong, Shenzhen)
Topic: Improved Annealing for Sampling from Multimodal Distributions via Landscape Modification
Abstract:
Given a target distribution μ∝e^(-H) to sample from, with Hamiltonian H, in this paper we propose and analyze new Metropolis-Hastings sampling algorithms that target an alternative distribution μ_(1,α,c)^f∝e^(-H_(1,α,c)^f), where H_(1,α,c)^f is a landscape-modified Hamiltonian which we introduce explicitly. The advantage of the Metropolis dynamics that targets μ_(1,α,c)^f is that it enjoys a reduced critical height, described by the threshold parameter c, the function f, and a penalty parameter α≥0 that controls the state-dependent effect.
First, we investigate the case of fixed α and propose a self-normalized estimator that corrects for the bias of sampling and prove asymptotic convergence results and Chernoff-type bound of the proposed estimator.
Next, we consider the case of annealing the penalty parameter α. We prove strong ergodicity and bounds on the total variation mixing time of the resulting non-homogeneous chain subject to appropriate assumptions on the decay of α.
We illustrate the proposed algorithms by comparing their mixing times with the original Metropolis dynamics on statistical physics models, including the ferromagnetic Ising model on the hypercube or the complete graph and the q-state Potts model on the two-dimensional torus. In these cases, the mixing times of the classical Glauber dynamics are at least exponential in the system size, as the critical height grows at least linearly with the size, while the proposed annealing algorithm, with an appropriate choice of f, c, and annealing schedule on α, mixes rapidly with at most polynomial dependence on the size. The crux of the proof hinges on the important observation that the reduced critical height can be bounded independently of the size, which gives rise to rapid mixing.
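As a rough illustration of the landscape-modification idea, taking f to be the identity (an assumption made for this sketch): above the threshold c the modified Hamiltonian grows only logarithmically, which shrinks the energy barrier that a random-walk Metropolis chain must cross between modes.

```python
import numpy as np

def metropolis(H, x0, step, n, seed=0):
    """Random-walk Metropolis chain targeting exp(-H)."""
    rng = np.random.default_rng(seed)
    x, h = x0, H(x0)
    out = np.empty(n)
    for i in range(n):
        y = x + step * rng.normal()
        hy = H(y)
        if np.log(rng.uniform()) < h - hy:
            x, h = y, hy
        out[i] = x
    return out

def modified(H, alpha, c):
    """Illustrative landscape modification with f = identity: unchanged
    below c, logarithmic growth above c, reducing the critical height."""
    def Hf(x):
        h = H(x)
        excess = np.maximum(h - c, 0.0)
        return np.minimum(h, c) + np.log1p(alpha * excess) / alpha
    return Hf

# Double well with barrier height 8; the modified barrier is only ~2.35,
# so the chain hops between the two modes at +-1 readily.
H = lambda x: 8.0 * (x * x - 1.0) ** 2
chain = metropolis(modified(H, alpha=2.0, c=1.0), x0=1.0, step=0.5, n=50000)
frac_right = np.mean(chain > 0)
```

Running the same chain on the unmodified H leaves it stuck in the starting well for far longer, which is the multimodal sampling difficulty the annealing schedule on α is designed to address.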
Biography:
Jing Zhang is a PhD candidate in Data Science at The Chinese University of Hong Kong, Shenzhen, advised by Prof. Michael Choi. Her research interests lie broadly in the areas of probability theory, statistics, stochastic processes, and their applications. Before coming to CUHK-Shenzhen, she earned her bachelor’s degree in Financial Mathematics from the Department of Mathematics at the Southern University of Science and Technology.
Time Table of WOPS 2021/2022
