Jonas Lieber, University of Chicago (Job Market Seminar)

"Estimating Concentration Parameters for Bandit Algorithms"

Abstract

Bandit models are widely used to capture learning in contexts where agents repeatedly choose actions with uncertain rewards. Examples include firms maximizing profits by experimenting with prices or advertisement, randomized control trials maximizing outcomes by evaluating alternative treatments, and consumers maximizing utility by trying experience goods. A popular bandit algorithm is the upper confidence bound (UCB) algorithm. The UCB algorithm requires sub-Gaussian concentration parameters as inputs. In practice, these parameters are unknown so that the UCB algorithm is not fully data driven. I propose a method to estimate these parameters and use the estimated parameters to conduct inference with Hoeffding's inequality. I show that asymptotic inference with estimated parameters is valid under mild and optimal under stronger conditions. In finite samples, I establish validity of inference under an anti-concentration condition. Equipped with the proposed estimator for sub-Gaussian concentration parameters, I adapt the UCB algorithm to settings where these parameters are unknown. In an empirical application, I study price experimentation after the liberalization of the spirits market in Washington State in 2012 and find that the adapted UCB algorithm leads to considerably higher profits. My theoretical results can also be applied to non-standard inference problems that arise in partial identification and machine learning.

Contact person: Anders Munk-Nielsen