UBC Theses and Dissertations

Near-optimal Herding

Samadi, Samira

Abstract

Herding is an algorithm of recent interest in the machine learning community, motivated by inference in Markov random fields. It solves the following Sampling Problem: given a set Χ ⊂ R^d with mean μ, construct an infinite sequence of points from Χ such that, for every t ≥ 1, the mean of the first t points in that sequence lies within Euclidean distance O(1/t) of μ. The error of a solution to the Sampling Problem is defined to be the distance between the empirical mean of the first t samples and the original mean μ. The O(1/t) error bound suppresses the dependence on d and Χ. In this thesis, we study the best dependence on d and |Χ| that can be achieved for the error in the Sampling Problem. Known analyses of the Herding algorithm give an error bound that depends on geometric properties of Χ but, even under favorable conditions, this bound depends linearly on d. We first show that any algorithm for the Sampling Problem must have error Ω(√d/t). Afterward, we present a new polynomial-time algorithm that solves the Sampling Problem with error O(√d log^2.5|Χ|/t), assuming that Χ is finite. This implies that our algorithm is optimal to within logarithmic factors. Finally, we prove that the actual error of the Herding algorithm is strictly worse than the error of our algorithm if we measure the error in the infinity-norm. Our algorithm is randomized and based on recent algorithmic results in discrepancy theory. We implement our algorithm and other potential solutions for the Sampling Problem and evaluate them on various inputs.
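To make the Sampling Problem concrete, the following is a minimal sketch of the classical greedy herding update as it is commonly stated (pick the point maximizing the inner product with a weight vector w, then set w ← w + μ − x). It is not the discrepancy-based algorithm of the thesis, and the function name `herding`, the choice of initial weight vector w = μ, and the tie-breaking rule are assumptions made for illustration only.

```python
import math


def herding(points, steps):
    """Greedy herding on a finite point set (illustrative sketch).

    Returns the indices of the chosen points and, for each t, the Euclidean
    distance between the mean of the first t chosen points and mu.
    """
    n = len(points)
    d = len(points[0])
    # mu is the mean of the input set X.
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    # Assumption: start the weight vector at mu; other starts are possible.
    w = mu[:]
    running_sum = [0.0] * d
    chosen, errors = [], []
    for t in range(1, steps + 1):
        # Pick the point with the largest inner product <w, x>.
        # Ties go to the lowest index (behaviour of Python's max).
        i = max(range(n), key=lambda k: sum(w[j] * points[k][j] for j in range(d)))
        x = points[i]
        for j in range(d):
            w[j] += mu[j] - x[j]      # herding weight update
            running_sum[j] += x[j]
        chosen.append(i)
        # Error at time t: distance from the empirical mean to mu.
        err = math.sqrt(sum((running_sum[j] / t - mu[j]) ** 2 for j in range(d)))
        errors.append(err)
    return chosen, errors


if __name__ == "__main__":
    # Corners of the unit square; mu = (0.5, 0.5).
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    chosen, errors = herding(square, 10)
    for t, e in enumerate(errors, start=1):
        print(t, chosen[t - 1], t * e)  # t * error stays bounded
```

Because the weight vector after t steps equals its starting value plus t·μ minus the sum of the chosen points, the error at time t is the change in w divided by t. A bounded w therefore gives the O(1/t) rate described in the abstract; how that bound scales with d and |Χ| is the question the thesis studies.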

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada