Q-Learning: Computing q(s,a) given transition probabilities

lyd123 · Aug 14, 2020

Hi, I'm learning Reinforcement Learning, and computing q values is challenging.

I'm not sure if the question wants me to follow this formula, since I'm not given the learning rate \( \alpha \), and also because Q-learning doesn't need a transition matrix. I'm really not sure where to begin. Thanks for any help.

Future Bruno · Aug 14, 2020

Q-learning is a reinforcement learning algorithm that does not require a transition matrix. Instead, it uses an iterative approach to update the estimated value of each action-state pair, known as the q-value, using a reward signal. The q-value is updated using the following formula:

\[ q(s_t, a_t) = (1 - \alpha)\, q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} q(s_{t+1}, a) \right) \]

In this formula, \( \alpha \) is the learning rate, \( r_t \) is the reward observed from taking action \( a_t \) in state \( s_t \), and \( \gamma \) is the discount factor. The max term takes the maximum q-value from all possible actions from the next state \( s_{t+1} \). The q-value for each action-state pair is updated according to the information received from the environment, and it is used to decide which action to take in a given state.
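As a minimal sketch of that update, assume a small tabular problem where the q-values live in a dict keyed by (state, action). The function name `q_update`, the state and action labels, and all the numbers are hypothetical and only illustrate the formula:

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Apply one Q-learning update to q[(s, a)] in place and return the new value."""
    # Largest q-value over all actions available in the next state.
    best_next = max(q[(s_next, a_next)] for a_next in actions)
    # Blend the old estimate with the new target r + gamma * best_next.
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]


# Hypothetical setup: two actions, every q-value starts at 0 except one.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0

new_value = q_update(q, "s0", "left", r=1.0, s_next="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
print(new_value)
```

Note that the update only uses the observed reward and the next state, so no transition probabilities appear anywhere in it.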

Future Bruno · Aug 14, 2020

Q-learning is actually quite simple. In its simplest form, it works by updating the q value of a state/action pair according to the following formula:

\[ Q(s,a) = Q(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \]

Here s is the current state, a is the action taken in that state, R is the reward for taking that action in that state, s' is the next state, a' ranges over the actions available in s', \( \gamma \) is the discount factor, and \( \alpha \) is the learning rate. To calculate the new q value of a state/action pair, you start with the current q value and add the learning rate times the difference between the target \( R + \gamma \max_{a'} Q(s',a') \) and the current q value. For example, if the target is 10, the current q value is 5, and the learning rate is 0.5, the new q value would be 7.5 (5 + 0.5*(10 - 5)). Hope this helps!
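The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: `td_target` stands for the whole term R + γ * max Q(s',a'), which the example takes to be 10, and the variable names are made up for illustration. It also shows that this form gives the same number as the (1 - α) form of the update.

```python
alpha = 0.5        # learning rate
q_current = 5.0    # current Q(s, a)
td_target = 10.0   # R + gamma * max Q(s', a'), taken as 10 in the example

# Incremental form: Q + alpha * (target - Q)
q_incremental = q_current + alpha * (td_target - q_current)

# Weighted-average form: (1 - alpha) * Q + alpha * target
q_weighted = (1 - alpha) * q_current + alpha * td_target

print(q_incremental, q_weighted)  # 7.5 7.5
```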
