Reinforcement Learning

6 CFU, MSc in Data Science for Economics

Instructors: Nicolò Cesa-Bianchi, Alfio Ferrara

News

Course page on Ariel
The RL course will be taught in the period March 30 - June 22, 2026.

Goals

This course introduces the theoretical and algorithmic foundations of Reinforcement Learning, the subfield of Machine Learning that studies adaptive agents acting in, and learning from interaction with, an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems and has been applied to a wide range of tasks, including autonomous driving, industrial automation, conversational agents (including those based on large language models), trading and finance, game playing, and healthcare.

Syllabus

Introduction (version Jan 16, 2025) — 3 classes
1. What is reinforcement learning
2. Markov decision processes
3. Evaluation criteria: finite horizon, infinite horizon, discounted horizon
4. Markov policies and their properties
Finite horizon (version Jan 23, 2025) — 1 class
1. State-value function
2. Action-value function
3. Bellman optimality equations for finite horizon
Discounted horizon (version Feb 3, 2025) — 1.5 classes
1. Bellman optimality equations for discounted horizon
2. Value iteration
3. Policy iteration
4. Linear programming
Model-free reinforcement learning (version Feb 17, 2025) — 2.5 classes
1. Q-learning
2. SARSA
Temporal difference algorithms (version July 17, 2025. Corrected typo in SARSA(λ)) — 2 classes
1. TD(0)
2. TD(λ)
3. Equivalence between forward and backward view
4. SARSA(λ)
Value Function Approximation
1. Linear Value Function Approximation
2. Monte Carlo Value Function Approximation
3. TD Learning with Value Function Approximation
4. Value Function Approximation for Policy Evaluation
Control using Value Function Approximation
1. Action-Value Function Approximation
2. Non-Linear and Deep Neural Network Approximation
3. Model-Free Control with General Function Approximation
4. Q-Learning with Value Function Approximation
Policy Gradient
1. Policy Gradient Theorem
2. Off-Policy Policy Gradients
3. Monte-Carlo Policy Gradient (REINFORCE)
4. Actor-critic algorithms
5. Deep Q-learning algorithm (DQN)
Case Study: RL in Classic Games
1. Formalize Word Problem as MDP
2. Choice of the Algorithms
3. Problem KPIs
4. Coding and implementation

Reference material

Lecture notes (Prof. Cesa-Bianchi): linked to the syllabus.
Lecture notes, notebooks and code (Prof. Ferrara): here
Suggested reading:
- Shie Mannor, Yishay Mansour, and Aviv Tamar. RL: Foundations (in progress).
- Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction (2nd edition). MIT Press, 2018.

Exam

The exam consists of developing an experimental project and writing a report, which is then discussed in an oral exam. The discussion also includes questions on the theory covered in the course. The final grade takes into account both the project and the oral exam.

Course calendar

Browse the calendar pages to find out what was covered in each class.