## [Academic Lecture] Workshop on Optimization, Probability and Simulation (WOPS) 2021/2022

**Date & Time: January 21, 2022, 16:05 to January 23, 2022, 18:00 (Beijing Time)**

**Venue: Room 103, Dao Yuan Building, The Chinese University of Hong Kong, Shenzhen**

**Online: Zoom (the meeting link and password will be emailed to you after successful registration)**

**Language: English**

**Full Replay Videos:**

[January 21, Part 1] [January 21, Part 2] [January 22, Part 1] [January 22, Part 2] [January 23] (click to view)

**How to Register:**

**[Option 1] Via the link: https://www.wjx.cn/vj/h3NmJfs.aspx**

**[Option 2] Scan the QR code below to open the registration page**

**You will be notified by email once your registration is confirmed; please keep an eye on your inbox.**

The Workshop on Optimization, Probability and Simulation (WOPS) 2021/2022, hosted by the School of Data Science, will be held from January 21 to 23, 2022, in Room 103, Dao Yuan Building, The Chinese University of Hong Kong, Shenzhen. The workshop features talks by ten professors from renowned universities at home and abroad and two PhD students, who will discuss the theory and applications of optimization, probability, and simulation, present their latest research progress and results, and focus on topics at the frontier of the field.

**The workshop is open for both online and on-site participation. Registration is now open, and scholars and students from around the world are warmly invited to attend!**

*Due to pandemic-control measures, on-site seats are limited. Attendees with overseas travel or residence history within the past 21 days may join online only; we apologize that on-site participation is not yet open to them.

**Agenda**

**About the Speakers**

**Michael Friedlander** (University of British Columbia)

**Research Area:** Large-Scale Optimization, Numerical Methods, Signal Processing, Machine Learning

**Topic:** Communication and Fairness in Federated Learning

**Abstract**:

Federated learning is a decentralized machine learning scheme that allows many data owners to collaboratively train a model without sharing private data. Participation is best encouraged by fairly rewarding owners for the quality of their data. We describe a dual decomposition approach that leads to provably efficient communication among the participants, and also describe a computationally efficient method from cooperative game theory that preserves fairness.
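The abstract does not spell out the game-theoretic method, but fair reward allocation in this setting is commonly formalized via the Shapley value. A minimal sketch with a purely hypothetical utility function (the `quality` scores and square-root utility below are made up for illustration):

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal
    contribution over all join orders (fine for a handful of owners)."""
    perms = list(permutations(players))
    phi = {p: 0.0 for p in players}
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(perms) for p, total in phi.items()}

# Hypothetical utility: model quality grows with total data "quality",
# with diminishing returns.
quality = {"A": 3.0, "B": 1.0, "C": 1.0}
def v(coalition):
    return sum(quality[p] for p in coalition) ** 0.5

phi = shapley_values(["A", "B", "C"], v)
```

By construction the rewards sum to the grand coalition's value (efficiency), and the two identical owners B and C receive identical rewards (symmetry) — the fairness properties that motivate this family of methods.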

**Biography**:

Michael Friedlander is IBM Professor of Computational Mathematics at the University of British Columbia. He received his PhD in Operations Research from Stanford University in 2002, and his BA in Physics from Cornell University in 1993. From 2002 to 2004 he was the Wilkinson Fellow in Scientific Computing at Argonne National Laboratory. He has held visiting positions at UCLA's Institute for Pure and Applied Mathematics (2010) and at Berkeley's Simons Institute for the Theory of Computing (2013). He currently serves as Area Editor for the journal Mathematics of Operations Research. His research is primarily in developing numerical methods for large-scale optimization, their software implementation, and their application to problems in signal processing and machine learning.

*More information will be updated in due course; please stay tuned.

**Jeff Hong** (Fudan University)

**Research Area:** Modeling and Optimization of Stochastic Simulation, Financial Engineering & Risk Management, Business Analytics

**Topic:** Option Pricing by Neural Stochastic Differential Equations: A Simulation Optimization Approach

**Abstract**:

Classical option pricing models rely on prior assumptions about the dynamics of the underlying assets. Although empirical evidence shows that these models may partially explain observed option prices, their performance may be poor when actual market conditions deviate from the assumptions. Neural network models are capable of learning the underlying relationship from data; however, they require massive amounts of data to avoid over-fitting, which are typically not available in option pricing problems. We therefore propose a new model that integrates neural networks into a classical stochastic differential equation pricing model to balance model flexibility against the data requirement. In addition, we construct more specific models that use neural networks to calibrate the classical models. Furthermore, we show that the training of the model can be formulated as a simulation optimization problem and solved in a way that is compatible with the training of neural networks. Preliminary numerical results show that our approach works better than several existing models. This is a joint work with Shoudao Wang and Nifei Lin.
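As a rough illustration of the simulation-optimization viewpoint (not the authors' neural-SDE model), one can calibrate a single volatility parameter of a plain geometric Brownian motion by minimizing the squared Monte Carlo pricing error; in the neural-SDE variant the constant `sigma` would be replaced by a small network. All problem numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_call_price(sigma, s0=100.0, k=100.0, r=0.01, t=1.0,
                  n_steps=50, n_paths=20000):
    """Monte Carlo price of a European call under geometric Brownian
    motion; a neural-SDE model would make sigma state-dependent."""
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_s = np.log(s0) + np.cumsum(
        (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    payoff = np.maximum(np.exp(log_s[:, -1]) - k, 0.0)
    return float(np.exp(-r * t) * payoff.mean())

# Simulation optimization in miniature: fit sigma to a (made-up)
# observed market price by minimizing the squared pricing error.
market_price = 8.0
sigmas = np.linspace(0.05, 0.5, 10)
errors = [(mc_call_price(s) - market_price) ** 2 for s in sigmas]
best_sigma = float(sigmas[int(np.argmin(errors))])
```

Gradient-free grid search stands in here for the simulation-optimization solvers the talk discusses; the point is only that the calibration objective is itself estimated by simulation.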

**Biography**:

Prof. Jeff Hong received his bachelor's and doctoral degrees from Tsinghua University and Northwestern University, respectively. He is currently with the School of Management and the School of Data Science at Fudan University in Shanghai, China, holding the positions of Fudan Distinguished Professor, Hongyi Chair Professor, Chair of the Department of Management Science in the School of Management, and Associate Dean of the School of Data Science. He was previously Chair Professor of Management Sciences at the City University of Hong Kong, and Professor of Industrial Engineering and Logistics Management at the Hong Kong University of Science and Technology. Prof. Hong's research interests include stochastic simulation, stochastic optimization, financial risk management and supply chain management. He is currently the Associate Editor-in-Chief of the *Journal of the Operations Research Society of China*, the Simulation Area Editor of *Operations Research*, an Associate Editor of *Management Science*, and the President of the INFORMS Simulation Society.

*More information will be updated in due course; please stay tuned.

**Kengo Kamatani** (Institute of Statistical Mathematics, Tokyo)

**Research Area:** Bayesian Statistics

**Topic:** Non-reversible Guided Metropolis Kernel

**Abstract:**

We construct a class of non-reversible Metropolis kernels as a multivariate extension of the guided-walk kernel proposed by Gustafson (1998). The main idea of our method is to introduce a projection that maps the state space to a totally ordered group. Using a Haar measure, we construct a novel Markov kernel, termed the Haar-mixture kernel, which is of interest in its own right. This is achieved by inducing a topological structure on the totally ordered group. Our proposed method, the Delta-guided Metropolis-Haar kernel, is constructed by using the Haar-mixture kernel as a proposal kernel. The proposed non-reversible kernel is at least ten times more efficient than the random-walk Metropolis kernel and the Hamiltonian Monte Carlo kernel, in terms of effective sample size per second, for logistic regression and a discretely observed stochastic process. This is joint work with Xiaoling Song (Osaka University).
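A one-dimensional sketch of the guided-walk idea in the scalar special case (not the multivariate Metropolis-Haar construction of the talk): keep an auxiliary direction variable, always propose in that direction, and flip it only on rejection, which suppresses diffusive back-and-forth behaviour. All tuning values are illustrative:

```python
import math
import random

random.seed(1)

def guided_walk(log_target, x0=0.0, scale=0.8, n=50000):
    """Scalar guided-walk Metropolis (in the spirit of Gustafson, 1998):
    propose in the current direction only; reverse it on rejection."""
    x, direction = x0, 1
    samples = []
    for _ in range(n):
        y = x + direction * abs(random.gauss(0.0, scale))
        if math.log(random.random()) < log_target(y) - log_target(x):
            x = y                    # accept and keep going the same way
        else:
            direction = -direction   # reject and turn around
        samples.append(x)
    return samples

xs = guided_walk(lambda x: -0.5 * x * x)   # standard normal target
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
```

The chain is non-reversible yet still leaves the target invariant; the sample mean and variance recover those of the standard normal.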

**Biography:**

Kengo Kamatani is an Associate Professor at the Institute of Statistical Mathematics, Japan. He works on inference for stochastic processes, Bayesian inference and Bayesian computation, in particular Markov chain Monte Carlo methods. He is also the developer of the R package YUIMA, a powerful and efficient tool for statistical inference for stochastic processes. He is currently an associate editor of the *Journal of Applied Probability* / *Advances in Applied Probability*, *Foundations of Data Science*, and the *Annals of the Institute of Statistical Mathematics*.

**Suvrit Sra** (Massachusetts Institute of Technology, IDSS, LIDS)

**Research Area:** Mathematics of Machine Learning/Data Science, Optimization for Machine Learning

**Topic:** *Cancelled due to pandemic-related difficulties*

**Biography:**

Suvrit Sra joined MIT’s Department of Electrical Engineering and Computer Science and IDSS as a core faculty member, in January 2018. Prior to this, he was a Principal Research Scientist in the MIT Laboratory for Information and Decision Systems (LIDS). Before coming to LIDS, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany. During this time, he was also a visiting faculty member at the University of California at Berkeley (EECS Department) and Carnegie Mellon University (Machine Learning Department). He received his PhD in Computer Science from the University of Texas at Austin.

Suvrit’s research bridges a variety of mathematical topics including optimization, matrix theory, differential geometry, and probability with machine learning. His recent work focuses on the foundations of geometric optimization, an emerging subarea of nonconvex optimization where geometry (often non-Euclidean) enables efficient computation of global optimality. More broadly, his work encompasses a wide range of topics in optimization, especially in machine learning, statistics, signal processing, and related areas. He is pursuing novel applications of machine learning and optimization to materials science, quantum chemistry, synthetic biology, healthcare, and other data-driven domains.

His work has won several awards at machine learning conferences, the 2011 “SIAM Outstanding Paper” award, faculty research awards from Criteo and Amazon, and an NSF CAREER award. In addition, Suvrit founded (and regularly co-chairs) the popular OPT “Optimization for Machine Learning” series of Workshops at the Conference on Neural Information Processing Systems (NIPS). He has also edited a well-received book with the same title (MIT Press, 2011).

**Albert Berahas** (University of Michigan)

**Research Area:** Nonlinear Programming and Algorithms, Optimization for Machine Learning, Stochastic Optimization, Distributed Optimization

**Topic:** Algorithms for Deterministically Constrained Stochastic Optimization

**Abstract:**

Stochastic gradient and related methods for solving stochastic optimization problems have been studied extensively in recent years. It has been shown that such algorithms and much of their convergence and complexity guarantees extend in straightforward ways when one considers problems involving simple constraints, such as when one can perform projections onto the feasible region of the problem. However, settings with general nonlinear constraints have received less attention, and many of the approaches that have been proposed for solving such problems resort to using penalty or (augmented) Lagrangian methods, which are often not the most effective strategies. In this work, we propose and analyze stochastic optimization algorithms for deterministically constrained problems based on the sequential quadratic optimization (commonly known as SQP) methodology. We discuss the rationale behind our proposed techniques, convergence in expectation and complexity guarantees for our algorithms, and the results of preliminary numerical experiments that we have performed. This is joint work with Raghu Bollapragada, Frank E. Curtis, Michael O'Neill, Daniel P. Robinson and Baoyu Zhou.
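For intuition, here is a toy version of the stochastic SQP idea on a single linear equality constraint, with an identity Hessian model so the SQP subproblem has a closed-form solution. The problem data, noise level, and step-size schedule are illustrative only, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# min E[f(x)] with f(x) = 0.5 * ||x - b||^2, gradient observed with
# noise, subject to the deterministic constraint a @ x = 1. (Toy data;
# the constrained optimum is x* = (0, 1).)
b = np.array([1.0, 2.0])
a = np.array([1.0, 1.0])

def noisy_grad(x):
    return (x - b) + 0.1 * rng.standard_normal(2)

# With an identity Hessian model the SQP subproblem
#   min_d  g @ d + 0.5 * ||d||^2   s.t.  a @ (x + d) = 1
# has a closed form: a tangential projected-gradient step plus a
# Newton step onto the constraint.
P = np.eye(2) - np.outer(a, a) / (a @ a)   # projector onto null(a)
x = np.zeros(2)
for k in range(1, 3001):
    g = noisy_grad(x)
    d = -P @ g + a * (1.0 - a @ x) / (a @ a)
    x = x + k ** -0.6 * d                  # diminishing step sizes
```

Despite the gradient noise, the iterates become feasible and settle near the constrained optimum, which is the behaviour the convergence-in-expectation guarantees formalize.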

**Biography:**

Albert S. Berahas is an Assistant Professor in the Industrial and Operations Engineering department at the University of Michigan. Before joining the University of Michigan, he was a Postdoctoral Research Fellow in the Industrial and Systems Engineering department at Lehigh University, working with Professors Katya Scheinberg, Frank Curtis and Martin Takáč. Prior to that appointment, he was a Postdoctoral Research Fellow in the Industrial Engineering and Management Sciences department at Northwestern University, working with Professor Jorge Nocedal. Berahas completed his PhD in the Engineering Sciences and Applied Mathematics (ESAM) department at Northwestern University in 2018, advised by Professor Jorge Nocedal. He received his undergraduate degree in Operations Research and Industrial Engineering (ORIE) from Cornell University in 2009, and obtained an MS degree in Applied Mathematics from Northwestern University in 2012. Berahas' research broadly focuses on designing, developing and analyzing algorithms for solving large-scale nonlinear optimization problems. Specifically, he has explored several sub-fields of nonlinear optimization, including: (i) general nonlinear optimization algorithms, (ii) optimization algorithms for machine learning, (iii) constrained optimization, (iv) stochastic optimization, (v) derivative-free optimization, and (vi) distributed optimization.

**Wei Deng** (Purdue University, Morgan Stanley)

**Research Area:** Monte Carlo, Bandits, Q-Learning, Diffusion Models, Financial Applications

**Topic:** Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

**Abstract:**

The training of modern deep neural networks (DNNs) boils down to a non-convex Bayesian learning problem. A standard tool to handle the problem is Langevin Monte Carlo, which, however, can be arbitrarily slow and often fails to explore the multi-modal posterior given a limited time. As a result, advanced techniques are still required.

In this talk, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve acceleration. However, the naive extension of swaps to big data problems leads to a large bias, and bias-corrected swaps are required; this mechanism yields few effective swaps and insignificant acceleration. To alleviate this issue, we first propose a control-variates method to reduce the variance of the noisy energy estimators and show its potential to accelerate exponential convergence. We also present the population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round-trip rate.

We also study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms, such as the multicanonical ensemble and Wang-Landau algorithms, have achieved success in bioinformatics and statistical physics; however, their lack of scalability has greatly limited their extension to big data. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Convergence analysis with stability guarantees is also provided. In addition, we propose a pleasingly parallel version of these algorithms with interacting latent variables. We show that the interacting algorithm can be theoretically more efficient than the single-chain alternative with an equivalent computational budget.
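A bare-bones sketch of the replica-exchange mechanism on a double-well density, using exact energies (so none of the bias-correction machinery for noisy estimators discussed above is needed); the temperatures, step size, and swap frequency are arbitrary choices:

```python
import math
import random

random.seed(2)

def U(x):                         # double well, barrier height 3 at x = 0
    return 3.0 * (x * x - 1.0) ** 2

def grad_U(x):
    return 12.0 * x * (x * x - 1.0)

temps = (1.0, 8.0)                # cold (target) and hot (explorer) replicas
h = 0.01                          # Langevin step size
x = [1.0, 1.0]
cold = []
for it in range(20000):
    for i, T in enumerate(temps): # unadjusted Langevin step per replica
        x[i] += -h * grad_U(x[i]) \
                + math.sqrt(2.0 * h * T) * random.gauss(0.0, 1.0)
    if it % 10 == 0:              # Metropolis test for a replica swap
        log_acc = (1.0 / temps[0] - 1.0 / temps[1]) * (U(x[0]) - U(x[1]))
        if math.log(random.random()) < log_acc:
            x[0], x[1] = x[1], x[0]
    cold.append(x[0])

mean = sum(cold) / len(cold)
sign_changes = sum(1 for u, w in zip(cold, cold[1:]) if u * w < 0)
```

The hot replica hops between modes freely, and accepted swaps let the cold chain inherit those jumps, so it visits both wells far more often than it would alone.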

**Biography:**

Wei Deng is a machine learning researcher at Morgan Stanley, New York. He received his Ph.D. in Mathematics from Purdue University in 2021. His research focuses on scalable Monte Carlo methods, and he is also interested in diffusion models and bandits. His work has mainly been published at ICML, NeurIPS, and ICLR.

**Zhiyuan Huang** (Tongji University)

**Research Area:** Stochastic Modeling, Monte Carlo Simulation, Rare-Event Simulation, Optimization under Uncertainty

**Topic:** Learning-Based Robust Optimization: Procedures and Statistical Guarantees

**Abstract:**

Robust optimization (RO) is a common approach to tractably obtain safeguarding solutions for optimization problems with uncertain constraints. In this talk, we present a statistical framework to integrate data into RO based on learning a prediction set using (combinations of) geometric shapes that are compatible with established RO tools and on a simple data-splitting validation step that achieves finite-sample nonparametric statistical guarantees on feasibility. We demonstrate how our required sample size to achieve feasibility at a given confidence level is independent of the dimensions of both the decision space and the probability space governing the stochasticity, and we discuss some approaches to improve the objective performances while maintaining these dimension-free statistical feasibility guarantees.
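The flavor of the data-splitting step can be sketched as follows: learn the shape of the prediction set on one half of the data and calibrate its size on the other half via a binomial order-statistic argument, giving a finite-sample feasibility guarantee that does not depend on dimension. This is a generic sketch, not the authors' procedure, and all parameter values are illustrative:

```python
from math import comb

import numpy as np

rng = np.random.default_rng(3)

def calibrated_ball(data, eps=0.1, delta=0.05):
    """Shape-learning on one half, size-calibration on the other: the
    returned ball covers at least 1 - eps of the distribution with
    confidence 1 - delta (distribution-free, via the binomial tail
    of an order statistic)."""
    n = len(data)
    shape, calib = data[: n // 2], data[n // 2:]
    center = shape.mean(axis=0)
    dists = np.sort(np.linalg.norm(calib - center, axis=1))
    m = len(dists)
    k = m                 # smallest rank whose binomial tail is <= delta
    for r in range(m, 0, -1):
        tail = sum(comb(m, j) * (1 - eps) ** j * eps ** (m - j)
                   for j in range(r, m + 1))
        if tail <= delta:
            k = r
        else:
            break
    return center, float(dists[k - 1])

data = rng.standard_normal((2000, 2))          # toy uncertainty samples
center, radius = calibrated_ball(data)
fresh = rng.standard_normal((5000, 2))
coverage = float(np.mean(np.linalg.norm(fresh - center, axis=1) <= radius))
```

Note that the required calibration sample size depends only on `eps` and `delta`, not on the dimension of the decision space or the probability space, which is the dimension-free property highlighted in the abstract.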

**Biography:**

Dr. Zhiyuan Huang is an Assistant Professor in the Department of Management Science and Engineering at Tongji University. His primary research interests are in data-driven robust optimization and rare-event simulation, with applications including the safety evaluation of intelligent physical systems and experimental design. His research has been published in journals and conferences such as Management Science, IEEE Transactions on Intelligent Transportation Systems, the Winter Simulation Conference (WSC), and the International Conference on Artificial Intelligence and Statistics (AISTATS).

**Samuel Livingstone** (University College London)

**Research Area:** Bayesian Computation, Markov Processes and Langevin Dynamics, Health Data Science

**Topic:** Adaptive Random Neighbourhood Samplers for Bayesian Variable Selection

**Abstract:**

We introduce a framework for efficient Markov chain Monte Carlo (MCMC) algorithms targeting discrete-valued high-dimensional distributions, such as posterior distributions in Bayesian variable selection (BVS) problems. We show that many recently introduced algorithms, such as the locally informed sampler and the Adaptively Scaled Individual adaptation sampler (ASI), can be viewed as particular cases within the framework. We then describe a novel algorithm, the Adaptive Random Neighbourhood Informed sampler (ARNI), which combines ideas from both of these existing approaches. Using several real and simulated datasets, we show that a computationally efficient point-wise implementation (PARNI) leads to more reliable inferences on a range of variable selection problems, particularly in the very large-p setting. This is joint work with Xitong Liang and Jim Griffin.
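For context, the simplest sampler in this family is a single-flip Metropolis scheme over inclusion vectors, well short of the adaptive random-neighbourhood samplers of the talk. A toy sketch, with a BIC-style score standing in for a proper marginal likelihood and entirely made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy variable selection: y depends on columns 0 and 1 only.
n, p = 200, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.standard_normal(n)

def log_score(gamma):
    """BIC-style pseudo-posterior for the inclusion vector gamma
    (a crude stand-in for a marginal likelihood)."""
    idx = np.flatnonzero(gamma)
    if idx.size == 0:
        rss = float(y @ y)
    else:
        beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        resid = y - X[:, idx] @ beta
        rss = float(resid @ resid)
    return -0.5 * n * np.log(rss / n) - 0.5 * np.log(n) * idx.size

gamma = np.zeros(p, dtype=int)
cur = log_score(gamma)
visits = {}
for _ in range(2000):
    j = int(rng.integers(p))         # single-flip random neighbourhood
    prop = gamma.copy()
    prop[j] ^= 1
    new = log_score(prop)
    if np.log(rng.random()) < new - cur:
        gamma, cur = prop, new
    key = tuple(int(v) for v in gamma)
    visits[key] = visits.get(key, 0) + 1

best = max(visits, key=visits.get)   # most-visited model
```

The samplers in the talk replace the fixed single-flip neighbourhood with adaptively chosen random neighbourhoods and locally informed proposal weights, which is what makes them scale to very large p.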

*More information will be updated in due course; please stay tuned.

**Jure Vogrinc** (University of Warwick)

**Research Area:** Markov Chain Monte Carlo Methods, Simulation of Rare Events

**Topic:** Optimal Design of the Barker Proposal and Other Locally-Balanced Metropolis-Hastings Algorithms

**Abstract:**

We study the class of first-order locally-balanced Metropolis--Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class the user must select a balancing function g satisfying g(t)=tg(1/t), and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of n^(−1/3) as the dimension n tends to infinity among all members of the class under mild smoothness assumptions on g and when the target distribution for the algorithm is of the product form. In particular we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, optimal choice of balancing function under a Gaussian noise distribution, and optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bi-modal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.

This is joint work with Samuel Livingstone (University College London) and Giacomo Zanella (Bocconi University Milano). Preprint available at https://arxiv.org/abs/2201.01123
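A one-dimensional implementation of the Barker proposal itself is short. This sketch uses a Gaussian noise distribution (the paper's point is precisely that other noise choices can do better), with illustrative tuning values:

```python
import math
import random

random.seed(5)

def barker_step(x, log_pi, grad_log_pi, sigma=1.0):
    """One Barker-proposal step with Gaussian noise: the sign of the
    increment is biased by the gradient, then the exact
    Metropolis-Hastings correction for this proposal is applied."""
    z = random.gauss(0.0, sigma)
    g = grad_log_pi(x)
    if random.random() < 1.0 / (1.0 + math.exp(-z * g)):
        y = x + z
    else:
        y = x - z
    w = y - x
    # acceptance ratio pi(y) s(-w g(y)) / (pi(x) s(w g(x))), s = sigmoid
    log_num = log_pi(y) - math.log1p(math.exp(w * grad_log_pi(y)))
    log_den = log_pi(x) - math.log1p(math.exp(-w * g))
    return y if math.log(random.random()) < log_num - log_den else x

x, xs = 0.0, []
for _ in range(30000):                       # standard normal target
    x = barker_step(x, lambda t: -0.5 * t * t, lambda t: -t)
    xs.append(x)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
```

The balancing function here is g(t) = t/(1+t), i.e. the Barker sigmoid; swapping the Gaussian increment for a bimodal one, as the abstract suggests, changes only the `random.gauss` line.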

**Biography:**

I am a Research Fellow at the University of Warwick, working with Theo Damoulas and Adam Johansen on simulation of Gaussian and deep Gaussian processes. Before that I was a research fellow (also at Warwick) with Wilfrid Kendall, exploring the use of Dirichlet forms to investigate theoretical properties (particularly optimal scaling) of MCMC methods. From April 2017 until September 2018 I was a postdoctoral research assistant at Queen Mary University of London with John Moriarty, working on simulation of rare events and other applications of MCMC methods to energy-related problems. I did my PhD (2013-2017) at Imperial College London with Alex Mijatović, on variance reduction using control variates obtained by approximately solving the Poisson equation. My main research interests are the methodological and theoretical aspects of computational statistical algorithms.

**Jingzhao Zhang** (Tsinghua University, IIIS)

**Research Area:** High-Dimensional Nonconvex Optimization, Machine Learning

**Topic:** On Convergence of Training Loss Without Reaching Stationary Points

**Abstract:**

It is well known that nonconvex optimization is computationally intractable in the worst case. As a result, theoretical analyses of optimization algorithms such as gradient descent often focus on local convergence to stationary points, where the gradient norm is zero or negligible. In this work, we examine the disconnect between existing theoretical analyses of gradient-based algorithms and actual practice. Specifically, we provide numerical evidence that in large-scale neural network training, such as ImageNet, ResNet, and WT103 + TransformerXL models, the neural network weights do not converge to stationary points where the gradient of the loss function vanishes. Remarkably, however, we observe that while the weights do not converge to stationary points, the value of the loss function does converge. Inspired by this observation, we propose a new perspective based on the ergodic theory of dynamical systems. We prove convergence of the distribution of weight values to an approximate invariant measure (without smoothness assumptions) that explains this phenomenon. We further discuss how this perspective can better align the theory with empirical observations.
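The phenomenon is easy to reproduce in miniature: constant-step SGD on a noisy quadratic never lets the weight settle, yet the loss sequence reaches a stationary distribution. A toy sketch (not the paper's experiments; all constants are arbitrary):

```python
import random

random.seed(7)

# Constant-step SGD on a noisy quadratic: the weight keeps moving
# forever, but the loss sequence settles into a stationary distribution.
eta, w = 0.5, 5.0
losses, weights = [], []
for _ in range(20000):
    xi = random.gauss(0.0, 1.0)            # noisy sample
    losses.append(0.5 * (w - xi) ** 2)     # per-sample loss
    w -= eta * (w - xi)                    # constant-step SGD update
    weights.append(w)

tail = losses[10000:]                      # discard the transient
loss_gap = abs(sum(tail[:5000]) / 5000 - sum(tail[5000:]) / 5000)
avg_move = sum(abs(weights[i] - weights[i - 1])
               for i in range(10001, 20000)) / 9999
```

The average per-step movement of the weight stays bounded away from zero (no convergence to a stationary point), while the mean loss over successive windows agrees, reflecting convergence of the iterate distribution to an invariant measure.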

**Biography:**

Jingzhao Zhang is a researcher at IIIS, Tsinghua University. His research focuses on understanding optimization dynamics in high-dimensional nonconvex problems. He is particularly interested in examining theoretical analyses from a practitioner's point of view and aims to narrow the gap between theory and practice. He received his B.S. from UC Berkeley and his PhD from MIT.

*More information will be updated in due course; please stay tuned.

**Junwen Qiu** (The Chinese University of Hong Kong, Shenzhen)

**Topic:** Convergence of Random Reshuffling Under the Kurdyka-Lojasiewicz Inequality

**Abstract:**

We study the random reshuffling (RR) method for smooth nonconvex optimization problems with a finite-sum structure. Though this method is widely used in practice, e.g., in the training of neural networks, its convergence behavior is well understood only in a few limited settings. In this paper, under the well-known Kurdyka-Lojasiewicz (KL) inequality, we establish strong limit-point convergence results for RR with appropriate diminishing step sizes: the whole sequence of iterates generated by RR converges to a single stationary point in an almost sure sense. In addition, we derive the corresponding rate of convergence, depending on the KL exponent and the suitably selected diminishing step sizes. When the KL exponent is smaller than or equal to 1/2, the convergence rate matches that of the strongly convex case. The standard KL inequality-based convergence analysis framework applies only to algorithms with a certain descent property; we conduct a novel convergence analysis for the non-descent RR method with diminishing step sizes based on the KL inequality, which generalizes the standard KL framework. We summarize our main steps and core ideas in an informal analysis framework, which is of independent interest. As a direct application of this framework, we establish similar strong limit-point convergence results for the reshuffled proximal point method.
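A minimal sketch of random reshuffling with diminishing step sizes on a least-squares finite sum (the step-size constants are illustrative, and the interpolation setting, with no noise in `b`, makes exact convergence of the whole iterate sequence visible):

```python
import numpy as np

rng = np.random.default_rng(6)

# Finite sum f(x) = (1/n) * sum_i 0.5 * (A[i] @ x - b[i])^2,
# in the interpolation regime (b = A @ x_true exactly).
n, d = 64, 3
A = rng.standard_normal((n, d))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x = np.zeros(d)
for t in range(1, 1001):            # one epoch = one pass over the data
    step = 0.05 / t                 # diminishing step sizes
    for i in rng.permutation(n):    # fresh random reshuffle each epoch
        g = (A[i] @ x - b[i]) * A[i]     # gradient of the i-th component
        x = x - step * g

err = float(np.linalg.norm(x - x_true))
```

Unlike plain SGD, each epoch visits every component exactly once under a fresh permutation; the abstract's result is that, under the KL inequality, the whole iterate sequence (not just a subsequence) converges to a single stationary point.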

**Biography:**

Junwen Qiu is a PhD candidate in Data Science at the School of Data Science, The Chinese University of Hong Kong, Shenzhen, advised by Prof. Andre Milzarek and Prof. Xiao Li. He is interested in developing and analyzing stochastic first-order optimization methods.

**Jing Zhang** (The Chinese University of Hong Kong, Shenzhen)

**Topic:** Improved Annealing for Sampling from Multimodal Distributions via Landscape Modification

**Abstract:**

Given a target distribution μ ∝ e^(-H) to sample from, with Hamiltonian H, in this paper we propose and analyze new Metropolis-Hastings sampling algorithms that target an alternative distribution μ_(1,α,c)^f ∝ e^(-H_(1,α,c)^f), where H_(1,α,c)^f is a landscape-modified Hamiltonian which we introduce explicitly. The advantage of the Metropolis dynamics targeting μ_(1,α,c)^f is that it enjoys a reduced critical height, described by the threshold parameter c, the function f, and a penalty parameter α ≥ 0 that controls the state-dependent effect.

First, we investigate the case of fixed α, propose a self-normalized estimator that corrects for the sampling bias, and prove asymptotic convergence results and a Chernoff-type bound for the proposed estimator.

Next, we consider the case of annealing the penalty parameter α. We prove strong ergodicity and bounds on the total variation mixing time of the resulting non-homogeneous chain subject to appropriate assumptions on the decay of α.

We illustrate the proposed algorithms by comparing their mixing times with those of the original Metropolis dynamics on statistical physics models, including the ferromagnetic Ising model on the hypercube or the complete graph and the q-state Potts model on the two-dimensional torus. In these cases, the mixing times of the classical Glauber dynamics are at least exponential in the system size, as the critical height grows at least linearly with the size, while the proposed annealing algorithm, with appropriate choices of f, c, and the annealing schedule on α, mixes rapidly with at most polynomial dependence on the size. The crux of the proof hinges on the observation that the reduced critical height can be bounded independently of the system size, which gives rise to rapid mixing.
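To see the effect of landscape modification concretely, here is a sketch assuming the common choice f(u) = u, under which energies below the threshold c are untouched and energies above c grow only logarithmically, so the barrier of a double well shrinks. This is an illustration on a toy continuous potential, not the Ising/Potts setting of the talk, and the chain below samples the biased modified target (the self-normalized correction described above is omitted):

```python
import math
import random

random.seed(8)

def U(x):                            # double well, barrier height 4
    return 4.0 * (x * x - 1.0) ** 2

def U_mod(x, c=1.0, alpha=2.0):
    """Landscape-modified energy assuming f(u) = u: below the
    threshold c nothing changes; above it the energy grows only
    logarithmically, reducing the critical height."""
    h = U(x)
    if h <= c:
        return h
    return c + math.log1p(alpha * (h - c)) / alpha

def metropolis(energy, x0=1.0, n=20000, scale=0.5):
    x, samples = x0, []
    for _ in range(n):
        y = x + random.gauss(0.0, scale)
        if math.log(random.random()) < energy(x) - energy(y):
            x = y
        samples.append(x)
    return samples

xs = metropolis(U_mod)               # chain on the modified landscape
barrier_drop = U(0.0) - U_mod(0.0)   # reduction in critical height
```

With c = 1 and α = 2 the barrier drops from 4 to about 1.97, and the chain crosses between the two wells readily; on the original landscape crossings would be roughly e^2 times rarer.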

**Biography:**

Jing Zhang is a PhD candidate in Data Science at The Chinese University of Hong Kong, Shenzhen, advised by Prof. Michael Choi. Her research interests lie broadly in probability theory, statistics, stochastic processes, and their applications. Before coming to CUHK-Shenzhen, she earned her bachelor's degree in Financial Mathematics from the Department of Mathematics at the Southern University of Science and Technology.