Question regarding MDPs

system · June 11, 2020, 11:08

Disclaimer: This thread was imported from the old forum, so some formatting may not display correctly. The original thread begins with the second post of this thread.

riwo · June 11, 2020, 11:08

I’m having a hard time understanding the phrase “Markovian transition model” in Definition 23.1.2 (the definition of Markov decision problems).

From my understanding, we introduced Bayesian networks (of which Markov chains are a special case) to compute probabilities of random variables that are influenced by hidden variables we can’t observe. In Definition 23.1.2, however, we assume our world to be fully observable, so I don’t see the point of introducing a Bayesian network.

If the transition model P(s′ | s, a), defined later in Definition 23.1.2, were a transition model in the Markovian sense, we would have random variables for s and a. Since I know which state I’m in and which action I have taken, I don’t really see the point of treating them as random variables.

Jazzpirate · June 11, 2020, 11:42

That’s one way to put it, yes. However, it’s questionable to say that Markov decision problems are a special case of Bayesian networks, since they model different kinds of situations.

One fundamental difference is that Bayesian networks are acyclic, whereas Markov chains are not: the state-transition graph of a Markov chain may contain cycles. Both Markov chains and Bayesian networks allow computing the probability of a random variable in the presence of hidden variables.
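As a small illustration of the cycle point, here is a minimal Python sketch of a hypothetical two-state Markov chain (states A and B, probabilities chosen arbitrarily) whose transition graph contains the cycle A → B → A, which an acyclic Bayesian network graph would not allow. The helper names `step` and `distribution_after` are made up for this example; they compute the probability of being in each state after n steps, using only the current state at each step (the Markov property).

```python
# Hypothetical two-state Markov chain; its transition graph contains the cycle A -> B -> A.
TRANSITIONS = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 1.0},
}


def step(dist):
    """Propagate a distribution over states by one time step."""
    new = {s: 0.0 for s in TRANSITIONS}
    for s, p in dist.items():
        for s_next, q in TRANSITIONS[s].items():
            # The next state depends only on the current state s.
            new[s_next] += p * q
    return new


def distribution_after(n, start):
    """Probability of being in each state after n steps, starting in `start`."""
    dist = {s: (1.0 if s == start else 0.0) for s in TRANSITIONS}
    for _ in range(n):
        dist = step(dist)
    return dist


print(distribution_after(3, "A"))  # {'A': 0.625, 'B': 0.375}
```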

Markov decision problems, however, are about computing a policy, i.e., a rule that tells us “what to do in a given situation” in order to maximize expected utility for some utility function.
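One standard way to compute such a policy is value iteration. The following is a minimal Python sketch on a hypothetical two-state MDP; the states, actions, rewards R(s), the discount factor `GAMMA` and all helper names are assumptions for illustration, not taken from the lecture notes. It repeats the Bellman update U(s) = R(s) + γ · max_a Σ_s′ P(s′ | s, a) · U(s′) until the utilities stop changing, then picks, in each state, the action with the highest expected utility.

```python
# Hypothetical MDP: states, actions, rewards and probabilities are illustrative only.
STATES = ["low", "high"]
ACTIONS = ["wait", "work"]
REWARD = {"low": 0.0, "high": 1.0}
# P[(s, a)] maps each successor state s' to P(s' | s, a).
P = {
    ("low", "wait"): {"low": 1.0},
    ("low", "work"): {"high": 0.8, "low": 0.2},
    ("high", "wait"): {"high": 0.9, "low": 0.1},
    ("high", "work"): {"high": 0.6, "low": 0.4},
}
GAMMA = 0.9  # assumed discount factor


def expected_utility(s, a, U):
    """Sum over successor states s' of P(s' | s, a) * U(s')."""
    return sum(p * U[s2] for s2, p in P[(s, a)].items())


def value_iteration(eps=1e-6):
    """Apply the Bellman update until the largest change drops below eps."""
    U = {s: 0.0 for s in STATES}
    while True:
        new_U = {
            s: REWARD[s] + GAMMA * max(expected_utility(s, a, U) for a in ACTIONS)
            for s in STATES
        }
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < eps:
            return U


def extract_policy(U):
    """For each state, pick the action with the highest expected utility."""
    return {s: max(ACTIONS, key=lambda a: expected_utility(s, a, U)) for s in STATES}


U = value_iteration()
policy = extract_policy(U)
print(policy)  # {'low': 'work', 'high': 'wait'}
```

In this toy model the policy says to work in state low and to wait in state high, because waiting in high is more likely to keep the agent in the rewarding state.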

We assume the environment our agent is in to be fully observable, which does not mean that we have full information. It just means that at any point in time we know exactly which state we’re in. So we know we’re in state s, and we know which actions a_1, …, a_n we can perform in that state; however, we don’t necessarily know which state we will be in after performing some action a_i. We only know the probability with which we will end up in each possible successor state. That is why the transition model P(s′ | s, a) is a probability distribution over the successor state s′, given the known s and a.
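To make this concrete, here is a minimal Python sketch of a transition model stored as a table: for each pair (s, a), which the agent knows in a fully observable environment, it holds a distribution over the successor state s′. The states A, B, C, the action `right`, the probabilities and the helper `sample_next_state` are hypothetical. In this sketch only s′ is random; s and a are simply given as inputs.

```python
import random
from collections import Counter

# Hypothetical transition model P(s' | s, a): for each known pair (s, a),
# a probability distribution over the successor state s'.
TRANSITION = {
    ("A", "right"): {"B": 0.8, "A": 0.2},
    ("B", "right"): {"C": 0.8, "B": 0.2},
    ("C", "right"): {"C": 1.0},
}


def sample_next_state(s, a, rng):
    """Draw s' ~ P(. | s, a); s and a are given (fully observable), only s' is random."""
    dist = TRANSITION[(s, a)]
    successors = list(dist)
    weights = [dist[s2] for s2 in successors]
    return rng.choices(successors, weights=weights, k=1)[0]


# Sanity check: every conditional distribution sums to 1.
for (s, a), dist in TRANSITION.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)

rng = random.Random(0)
counts = Counter(sample_next_state("A", "right", rng) for _ in range(10_000))
print(counts["B"] / 10_000)  # should be roughly 0.8
```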