Formalize problems as Markov Decision Processes
Understand basic exploration methods and the exploration / exploitation tradeoff
Understand value functions, as a general-purpose tool for optimal decision-making
Know how to implement dynamic programming as an efficient solution approach to an industrial control problem
Welcome to Fundamentals of Reinforcement Learning, the first course in a four-part specialization on Reinforcement Learning brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, get a flavour of what the course has in store for you, and be given an in-depth roadmap to help make your journey through this specialization as smooth as possible.
For the first week of this course, you will learn about the exploration-exploitation trade-off in sequential decision-making, implement incremental algorithms for estimating action-values, and compare the strengths and weaknesses of different algorithms for exploration. For this week’s graded assessment, you will implement and test an epsilon-greedy agent.
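To give a taste of what that assessment involves, here is a minimal sketch of an epsilon-greedy bandit agent with incremental sample-average estimates, assuming a NumPy-based setup; the class and method names are illustrative, not the assignment's actual interface.

```python
import numpy as np

class EpsilonGreedyAgent:
    """k-armed bandit agent: incremental sample-average estimates + epsilon-greedy choice."""

    def __init__(self, num_actions, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.q = np.zeros(num_actions)   # action-value estimates Q(a)
        self.n = np.zeros(num_actions)   # number of times each action was taken
        self.rng = np.random.default_rng(seed)

    def select_action(self):
        # Explore with probability epsilon; otherwise exploit the current greedy action.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.q)))
        return int(np.argmax(self.q))

    def update(self, action, reward):
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]
```

A typical experiment loop alternates select_action, sampling a reward from the bandit, and update, then compares average reward across different epsilon values to study the exploration-exploitation trade-off.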
When you’re presented with a problem in industry, the first and most important step is to translate that problem into a Markov Decision Process (MDP). The quality of your solution depends heavily on how well you do this translation. This week, you will learn the definition of MDPs, understand how goal-directed behavior can arise from maximizing scalar rewards, and see the difference between episodic and continuing tasks. For this week’s graded assessment, you will create three example tasks of your own that fit into the MDP framework.
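To make the framework concrete, an MDP consists of states, actions, one-step dynamics p(s', r | s, a), and a discount factor. The following is a toy, entirely invented example (a battery-powered robot deciding whether to search or recharge), written out as plain Python data in the spirit of the examples you will construct; the states, probabilities, and rewards are made up for illustration.

```python
GAMMA = 0.9  # discount factor for a continuing task (illustrative value)

# One-step dynamics written out explicitly:
# mdp[state][action] -> list of (probability, next_state, reward) outcomes.
mdp = {
    "low_battery": {
        "recharge": [(1.0, "high_battery", 0.0)],
        "search":   [(0.6, "low_battery", 1.0), (0.4, "high_battery", -3.0)],
    },
    "high_battery": {
        "wait":   [(1.0, "high_battery", 0.5)],
        "search": [(0.8, "high_battery", 1.0), (0.2, "low_battery", 1.0)],
    },
}
```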
Once the problem is formulated as an MDP, finding the optimal policy is more efficient when using value functions. This week, you will learn the definitions of policies and value functions, as well as the Bellman equations, which are the key tools that all of our algorithms will build on.
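In the standard notation the course follows, the Bellman equation for the state-value function of a policy $\pi$ relates each state's value to the values of its possible successor states:

$$
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_\pi(s') \,\bigr]
$$

The Bellman optimality equation replaces the average over the policy's actions with a maximum over actions, and it is this recursive structure that the dynamic programming methods in the next week exploit.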
This week, you will learn how to compute value functions and optimal policies when you have the MDP model. You will implement dynamic programming algorithms to do this and understand their utility for industrial applications. Further, you will learn about Generalized Policy Iteration as a common template for constructing algorithms that maximize reward. For this week’s graded assessment, you will implement an efficient dynamic programming agent in a simulated industrial control problem.
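As a rough sketch of the kind of computation involved, the functions below run iterative policy evaluation and greedy policy improvement over the toy MDP format shown earlier. This is a plain illustration of Generalized Policy Iteration under those assumptions, not the graded environment's API.

```python
def policy_evaluation(mdp, policy, gamma=0.9, theta=1e-8):
    """Sweep the Bellman expectation update until the value estimates stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            new_v = sum(
                action_prob * prob * (reward + gamma * values[s_next])
                for a, action_prob in policy[s].items()
                for prob, s_next, reward in mdp[s][a]
            )
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < theta:
            return values

def greedy_improvement(mdp, values, gamma=0.9):
    """Return the policy that acts greedily with respect to the given value estimates."""
    policy = {}
    for s in mdp:
        q = {a: sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
             for a, outcomes in mdp[s].items()}
        best = max(q, key=q.get)
        policy[s] = {a: (1.0 if a == best else 0.0) for a in mdp[s]}
    return policy

# Policy iteration: alternate evaluation and improvement until the policy is stable.
# policy = {s: {a: 1.0 / len(acts) for a in acts} for s, acts in mdp.items()}
# while True:
#     values = policy_evaluation(mdp, policy)
#     new_policy = greedy_improvement(mdp, values)
#     if new_policy == policy:
#         break
#     policy = new_policy
```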