Experts or Bandits?
machine learning with experts and bandits

4 credits (20 hours), PhD Program in Computer Science

LECTURER: Nicolò Cesa-Bianchi

Lecture schedule

Abstract

The course is about topics at the interface between machine learning and game theory. This area, also known as online learning, is concerned with the design of efficient algorithms for big data using local optimization techniques. We will start by introducing the Hedge algorithm for solving a class of decision problems known as prediction with expert advice. Hedge will be analyzed in the wider setting of sequential decision problems with partial feedback (multiarmed bandits). We will describe two applications of multiarmed bandits: online auctions and recommender systems. In the second part of the course we will show that Hedge is a special case of a larger family of algorithms for online optimization known as Mirror Descent, whose analysis will be carried out using convex analysis tools. Finally, we will show how certain instances of Mirror Descent can learn data sources that change over time in an arbitrary fashion.
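
As a concrete taste of the first part of the course, here is a minimal Python sketch of the Hedge update on full-information expert losses. All names, the learning-rate tuning, and the random losses are illustrative assumptions, not material from the course notes:

    import numpy as np

    def hedge(loss_matrix, eta):
        # Hedge (multiplicative weights) over K experts for T rounds.
        # loss_matrix is a T x K array of expert losses in [0, 1].
        T, K = loss_matrix.shape
        weights = np.ones(K)
        total_loss = 0.0
        for t in range(T):
            probs = weights / weights.sum()           # distribution over experts
            total_loss += probs @ loss_matrix[t]      # expected loss this round
            weights *= np.exp(-eta * loss_matrix[t])  # exponential downweighting
        return total_loss

    # Toy run: 3 experts, 100 rounds of uniform random losses.
    rng = np.random.default_rng(0)
    losses = rng.random((100, 3))
    eta = np.sqrt(2 * np.log(3) / 100)  # a standard tuning for K = 3, T = 100
    print(hedge(losses, eta), losses.sum(axis=0).min())  # Hedge vs. best expert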

Syllabus

  1. Introduction and motivation
  2. Prediction with expert advice: binary prediction (PLG 1.4)
  3. Halving (PLG 1.4)
  4. Weighted Majority (PLG 1.4)
  5. Lower bound for deterministic binary prediction
  6. Randomized Weighted Majority (PLG 2.2)
  7. Lower bound for randomized binary prediction (PLG Theorem 3.7)
  8. From binary prediction to sequential decisions
  9. Hedge
  10. The bandit setting
  11. The importance weighting trick (illustrated in the sketch after this list)
  12. Exp3 (notes in prep.)
  13. Similarities between actions: Exp3-SET run on a feedback graph (GRA Lemma 1)
  14. Independence number bounds the variance in Exp3-SET (notes in prep.)
  15. Exp4 (REG 4.2)
  16. Second-price auctions (ALG Appendix C)
  17. Exp3-RTB (ALG Appendix C)
  18. Prediction with a linear optimization oracle
  19. Follow the perturbed leader
  20. The online convex optimization setting (OCO Section 2)
  21. Hedge performs online linear optimization in the simplex (OCO Section 2)
  22. The linearization trick (OCO Section 2)
  23. Follow the regularized leader (OCO Section 2)
  24. Convex duality (OCO Section 2)
  25. Euclidean and entropic regularizers (OCO Section 2)
  26. Regret against a sequence of models
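
The sketch below illustrates the importance weighting trick behind Exp3 (items 11 and 12): only the pulled arm's loss is observed, and dividing it by the probability of pulling that arm gives an unbiased estimate of the full loss vector. Names, parameter values, and the Bernoulli losses are illustrative assumptions, and the explicit uniform-exploration mixing used in some Exp3 variants is omitted for brevity:

    import numpy as np

    def exp3(T, K, eta, pull):
        # Exp3: at each round draw an arm from the current distribution,
        # observe only that arm's loss, and importance-weight it to get
        # an unbiased estimate of the full loss vector.
        rng = np.random.default_rng(0)
        weights = np.ones(K)
        for t in range(T):
            probs = weights / weights.sum()
            arm = rng.choice(K, p=probs)
            estimate = np.zeros(K)
            estimate[arm] = pull(t, arm) / probs[arm]  # importance-weighted loss
            weights *= np.exp(-eta * estimate)
            weights /= weights.max()  # rescale to avoid numerical underflow
        return weights

    # Toy run: Bernoulli losses; arm 0 has the smallest mean loss, so the
    # final probability mass should concentrate on it.
    means = np.array([0.2, 0.5, 0.7])
    rng = np.random.default_rng(1)
    w = exp3(T=5000, K=3, eta=0.01, pull=lambda t, a: rng.binomial(1, means[a]))
    print(w / w.sum())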

References

Possible topics for the exam

Exam

To pass the exam you must write a research note summarizing one or more papers on a topic agreed upon in advance with the lecturer. Your note must include: