UBC Theses and Dissertations


Decision making with inference and learning methods
Hoffman, Matthew William

Abstract

In this work we consider probabilistic approaches to sequential decision making. The ultimate goal is to provide methods by which decision making problems can be attacked with approaches and algorithms originally built for probabilistic inference. This allows us to directly apply a wide variety of popular, practical algorithms to these tasks. In Chapter 1 we provide an overview of the general problem of sequential decision making and a broad description of various solution methods. Much of the remaining work of this thesis then proceeds by relying upon probabilistic reinterpretations of the decision making process. This strategy of reducing learning problems to simpler inference tasks has proven very fruitful in much of machine learning, and we expect similar improvements to arise in the control and reinforcement learning fields. The approaches of Chapters 2–3 build upon the framework of [Toussaint and Storkey, 2006] in reformulating the solution of Markov decision processes as maximum-likelihood estimation in an equivalent probabilistic model. In Chapter 2 we utilize this framework to construct an expectation-maximization (EM) algorithm for continuous, linear-Gaussian models with mixture-of-Gaussian rewards. This approach extends popular linear-quadratic reward models to a much more general setting. We also show how to extend this probabilistic framework to continuous-time processes. Chapter 3 further builds upon these methods to introduce a Bayesian approach to policy search using Markov chain Monte Carlo. In Chapter 4 we depart from the setting of direct policy search and instead consider value function estimation. In particular we utilize least-squares temporal difference (LSTD) learning to reduce the problem of value function estimation to a more standard regression problem. In this chapter we specifically study regularization methods that encourage sparse solutions.
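The reduction described for Chapter 4 can be illustrated with a minimal LSTD sketch. This is not the thesis's algorithm (which focuses on sparsity-inducing regularization); the feature map, the simple ridge penalty, and the toy chain below are illustrative assumptions only:

```python
import numpy as np

def lstd(transitions, phi, gamma=0.95, ridge=1e-3):
    """Least-squares temporal-difference learning posed as regression.

    transitions: list of (s, r, s_next) tuples
    phi: feature map taking a state to a 1-D numpy array
    Solves (A + ridge*I) w = b, where
      A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T
      b = sum_t phi(s_t) * r_t
    so that the value function is approximated by V(s) = phi(s) @ w.
    """
    k = len(phi(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A + ridge * np.eye(k), b)
```

On a deterministic two-state chain (state 0 moves to state 1 with reward 0; state 1 self-loops with reward 1) and tabular features, this recovers the exact values V(1) = 1/(1−γ) and V(0) = γ/(1−γ). Replacing the ridge term with an L1 penalty turns the same regression into the sparse setting studied in the chapter.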
In Chapters 5–6 we consider the task of optimization as a sequential decision problem. In the first of these chapters we introduce the bandit framework and discuss a number of variations. Then in Chapter 6 we discuss a related approach to optimization utilizing Bayesian estimates of the underlying, unknown function. Finally, we introduce a novel approach for choosing between different underlying point-selection heuristics.
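The heuristic-selection idea mentioned for Chapter 6 can be sketched as an exponential-weights ("Hedge"-style) portfolio over candidate point-selection heuristics. The function names, the reward-by-function-value update, and the use of bare callables below are illustrative assumptions, not the thesis's exact formulation:

```python
import math
import random

def hedge_portfolio(heuristics, objective, rounds=50, eta=1.0, rng=random):
    """Exponential-weights portfolio over point-selection heuristics (a sketch).

    heuristics: list of callables, each proposing the next query point
    objective: the unknown function being maximized
    Each round, one heuristic is sampled with probability proportional to
    exp(eta * cumulative_reward); its proposed point is then evaluated and
    the observed value is credited back to that heuristic.
    """
    gains = [0.0] * len(heuristics)
    best_x, best_y = None, float("-inf")
    for _ in range(rounds):
        weights = [math.exp(eta * g) for g in gains]
        # Roulette-wheel sampling proportional to the weights.
        r, i = rng.random() * sum(weights), 0
        while r > weights[i]:
            r -= weights[i]
            i += 1
        x = heuristics[i]()
        y = objective(x)
        gains[i] += y  # reward the heuristic that proposed this point
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y
```

In a Bayesian-optimization setting the callables would be acquisition heuristics (e.g. maximizers of expected improvement or upper confidence bounds under a posterior over the function); here they are left abstract to keep the sketch self-contained.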


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International