A Markov Decision Process (MDP) is a mathematical framework for decision-making in environments with uncertainty. It is widely used in reinforcement learning, robotics, and operations research to model how an agent interacts with an environment to maximize long-term reward.
How MDPs Work
An MDP is defined by a tuple $(S, A, P, R, \gamma)$, whose components are described below (a minimal code sketch follows the list):
- States ($S$):
  - The set of all possible situations the agent can be in
  - For example: a robot's location in a grid world
- Actions ($A$):
  - The set of all possible actions the agent can take
  - For example: moving left, right, up, or down in a grid world
- Transition Probability ($P$):
  - $P(s' \mid s, a)$ defines the probability of reaching state $s'$ when taking action $a$ in state $s$
  - The next state depends only on the current state $s$ and action $a$, not on past states (the Markov property)
- Rewards ($R$):
  - $R(s, a)$ is the reward the agent receives for taking action $a$ in state $s$
  - For example: +1 for reaching a goal, -1 for hitting a wall
- Discount Factor ($\gamma$):
  - Determines how much future rewards are valued relative to immediate rewards: a $\gamma$ near 0 makes the agent short-sighted, while a $\gamma$ near 1 makes it weigh long-term rewards heavily
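Putting the pieces together, the agent's objective is to maximize the expected discounted return $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$. To make the five components concrete, here is a minimal sketch of a deterministic 2x2 grid-world MDP in Python, solved with a few sweeps of value iteration; the grid layout, reward values, and all names are illustrative assumptions, not something specified above.

```python
# A minimal sketch of a 2x2 grid-world MDP (the grid, rewards, and
# names are illustrative assumptions, not from the text above).
# States are (row, col) cells; the bottom-right cell is an absorbing goal.
STATES = [(r, c) for r in range(2) for c in range(2)]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (1, 1)
GAMMA = 0.9  # discount factor

def step(state, action):
    """Deterministic transition function: moving off the grid leaves
    the agent in place with a -1 penalty. Returns (next_state, reward)."""
    if state == GOAL:                        # absorbing goal state
        return state, 0.0
    dr, dc = ACTIONS[action]
    nr, nc = state[0] + dr, state[1] + dc
    if not (0 <= nr < 2 and 0 <= nc < 2):    # hit a wall: stay put, pay -1
        return state, -1.0
    next_state = (nr, nc)
    return next_state, 1.0 if next_state == GOAL else 0.0

# Value iteration: V(s) <- max_a [ R(s, a) + gamma * V(s') ].
V = {s: 0.0 for s in STATES}
for _ in range(50):
    V = {s: max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
         for s in STATES}

for s in STATES:
    print(s, round(V[s], 3))
```

Because the transitions here are deterministic, the expectation over next states collapses to a single term; with a stochastic $P$, each update would instead sum over successor states weighted by $P(s' \mid s, a)$.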