
Machine Learning with Experts and Bandits
4 credits (20 hours), PhD Programme in Computer Science
LECTURER: Nicolò Cesa-Bianchi

Abstract
The course covers topics at the interface between machine learning and game theory. This area, also known as online learning, is concerned with the design of efficient algorithms for big data using local optimization techniques. We will start by introducing the Hedge algorithm for solving a class of decision problems known as prediction with expert advice. Hedge will then be analyzed in the wider setting of sequential decision problems with partial feedback (multi-armed bandits). We will describe two applications of multi-armed bandits: online auctions and recommender systems. In the second part of the course we will show that Hedge is a special case of a larger family of algorithms for online optimization known as Mirror Descent, whose analysis will be carried out using tools from convex analysis. Finally, we will show how certain instances of Mirror Descent can learn data sources that change over time in an arbitrary fashion.
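As a preview, the Hedge update mentioned above can be sketched in a few lines. This is a minimal illustration, not course material: it assumes losses in [0, 1] and a fixed learning rate, and the function name and interface are invented for the example.

```python
import numpy as np

def hedge(losses, eta):
    """Hedge (exponential weights): keep one weight per expert, play the
    normalized weights as a probability distribution over experts, and after
    each round multiply every expert's weight by exp(-eta * its loss)."""
    n_experts = losses.shape[1]
    weights = np.ones(n_experts)
    total_loss = 0.0
    for loss_t in losses:                   # loss_t[i] = loss of expert i, in [0, 1]
        p = weights / weights.sum()         # distribution played by the learner
        total_loss += p @ loss_t            # learner's expected loss this round
        weights *= np.exp(-eta * loss_t)    # exponential weight update
    return total_loss
```

With this update, weight concentrates exponentially fast on the experts with the smallest cumulative loss, which is the mechanism behind Hedge's regret bound.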
Syllabus
 Introduction and motivation
 Prediction with expert advice: binary prediction (PLG 1.4)
 Halving (PLG 1.4)
 Weighted Majority (PLG 1.4)
 Lower bound for deterministic binary prediction
 Randomized Weighted Majority (PLG 2.2)
 Lower bound for randomized binary prediction (PLG Theorem 3.7)
 From binary prediction to sequential decisions
 Hedge
 The bandit setting
 The importance weighting trick
 Exp3 (notes in prep.)
 Similarities between actions: Exp3-SET run on a feedback graph (GRA Lemma 1)
 Independence number bounds the variance in Exp3-SET (notes in prep.)
 Exp4 (REG 4.2)
 Second-price auctions (ALG Appendix C)
 Exp3-RTB (ALG Appendix C)
 Prediction with a linear optimization oracle
 Follow the perturbed leader
 The online convex optimization setting (OCO Section 2)
 Hedge performs online linear optimization in the simplex (OCO Section 2)
 The linearization trick (OCO Section 2)
 Follow the regularized leader (OCO Section 2)
 Convex duality (OCO Section 2)
 Euclidean and entropic regularizers (OCO Section 2)
 Regret against a sequence of models
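Two of the syllabus items above, the bandit setting and the importance weighting trick, combine in the Exp3 algorithm. The sketch below is only illustrative (names and interface are invented; losses are assumed to lie in [0, 1]): it is Hedge run on importance-weighted loss estimates built from the single observed loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3(get_loss, n_arms, n_rounds, eta):
    """Exp3: Hedge on importance-weighted loss estimates. Only the loss of
    the pulled arm is observed; dividing it by the probability of pulling
    that arm yields an unbiased estimate of the full loss vector
    (the importance weighting trick)."""
    weights = np.ones(n_arms)
    total_loss = 0.0
    for _ in range(n_rounds):
        p = weights / weights.sum()
        arm = rng.choice(n_arms, p=p)   # sample an arm from the exponential weights
        loss = get_loss(arm)            # only this arm's loss is revealed, in [0, 1]
        total_loss += loss
        est = np.zeros(n_arms)
        est[arm] = loss / p[arm]        # importance-weighted loss estimate
        weights *= np.exp(-eta * est)   # Hedge update on the estimates
    return total_loss
```

In expectation the estimate `est` equals the true loss vector, so the Hedge analysis carries over once the variance introduced by the division by `p[arm]` is controlled.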
References
 PLG: N. Cesa-Bianchi and G. Lugosi
Prediction, Learning, and Games
Cambridge University Press, 2006.
 REG: S. Bubeck and N. Cesa-Bianchi
Regret analysis of stochastic and nonstochastic multi-armed bandit problems (online version)
Foundations and Trends in Machine Learning, 5(1):1-122, 2012.
 GRA: N. Alon, N. Cesa-Bianchi, C. Gentile, S. Mannor, Y. Mansour, and O. Shamir
Nonstochastic multi-armed bandits with graph-structured feedback (online version)
SIAM Journal on Computing. To appear.
 ALG: N. Cesa-Bianchi, P. Gaillard, C. Gentile, and S. Gerchinovitz
Algorithmic chaining and the role of partial feedback in online nonparametric learning (online version)
Proceedings of the 30th Conference on Learning Theory (COLT), PMLR 65:465-481, 2017.
 OCO: S. Shalev-Shwartz
Online Learning and Online Convex Optimization (online version)
Foundations and Trends in Machine Learning, 4(2), 2011.
Possible topics for the exam
 The bandit killer: Lower bounds for the bandit setting
 Follow the Lazy Leader and the Shrinking Dartboard: Algorithms for prediction with expert advice that seldom change their minds.
 Experts and game theory: A relationship in (correlated) equilibrium.
 AdaGrad and Online Newton Step: A superior breed of algorithms for online convex optimization.
 Adaptive regret, shifting regret, tracking regret: How to compete against a sequence of models.
 Routing packets in the dark: Algorithms for the linear bandit setting.
Exam
In order to pass the exam you have to write a research note summarizing one or more papers on a topic agreed in advance with the lecturer. Your note must include:
 motivation and explanation of the problem,
 rigorous illustration of the most important results (including the formal definitions necessary to understand the results),
 at least one significant proof carried out in detail,
 description of previous results and follow-up results,
 a final section discussing what are, in your opinion, the strengths and weaknesses of the results' contributions.