On Bias and Variance Decomposition of Offline Policy Evaluation Estimators
Note: This blog post is an unpublished partial draft of a larger work, jointly written with Aishwarya Mandyam.
Introduction
Evaluation is a critical component of learning contextual bandit policies that can be deployed in high-risk settings. One way to evaluate a policy is to deploy it directly in...