Introduction to reinforcement learning and control documentation

Examples

This page is a work in progress. It contains an overview of the documentation for selected models and environments used in the course.

  • The Pacman Game
  • Control models
  • Week 1: The Pacman game
  • Week 1: The Inventory-control game
  • Week 2: Optimal planning in the Inventory-environment
  • Week 2: Optimal planning with Pacman
  • Week 3: Frozen lake and dynamical programming
  • Week 3: Harmonic Oscillator
  • Week 3: Pendulum with random actions
  • Week 4: PID Control
  • Week 8: Simple bandit
  • Week 8: UCB bandit algorithm
  • Week 9: Policy evaluation
  • Week 9: Policy iteration
  • Week 9: Value iteration
  • Week 10: MC Control
  • Week 10: TD-learning
  • Week 10: MC value estimation
  • Week 11: Sarsa
  • Week 11: Q-learning
  • Week 11: N-step Sarsa
  • Week 11: Mountain-car with linear feature approximators
  • Week 12: TD(Lambda)
  • Week 12: Sarsa(Lambda)
  • Week 13: DynaQ


By Tue Herlau

© Copyright 2026.