Total number of possible n-state sequences

  • Thread starter AryanK
  • Start date
  • Tags
    Sequences
In summary, AryanK is investigating markov properties and is having trouble calculating the odds of different sequences.
  • #1
AryanK
4
0
Hi everyone,

I'm doing an investigation of markov properties and in an example I have made the following transition matrix:

http://img152.imageshack.us/img152/1584/matrixki.png

If all the probabilities were above zero, finding the total number of possible 4-state sequences (i.e. ACBA, BACB etc.) would've been very simple. However, P(C|B)=0 (I must have at least one zero-probability option at this point) and I don't know how to find the number of total possible 4-state sequences without counting them one by one. Any help?

Thanks,
AryanK
 
Last edited by a moderator:
Physics news on Phys.org
  • #2
Hey AryanK and welcome to the forums.

If you want to find for example the probability of starting at A and ending at C after n steps you simply calculate A^n where A is your transition matrix which means you multiply A*A*..*A with n multiplies.

If you want to find a particular sequence, simply multiply the right probabilities together.

For example let's look at ACBA. A to C has probability 1/2. C to B has probability 1/2 and B to A has probability 2/3. Since each transition is independent, you find the probability through multiplication which gives 1/2*1/2*2/3 = 1/6.

Same idea for the other ones.
 
  • #3
Hi and thanks for the prompt reply. :)

Yes, I understand how to calculate the probability of a specific sequence. However, what I'm interested in is how to calculate the number of all possible distinct sequences the matrix can produce. I'm interested in this as I'm trying to figure out a method/model to find the most likely n-state sequence of a given transition matrix.

For example, take the following matrix:
http://img404.imageshack.us/img404/5540/test2xt.png

Here I know that the total number of possible 4-state sequences are 3*3*3*3, as there are four states, and there are 3 possibilities in each state. However, in the original matrix, P(C|B)=0, therefore whenever a sequence contains B, it has only 2 possible options for the next state instead of 3. This should decrease the total number of possible sequences, but I don't know how to calculate the new total number.

Sorry if this sounds confusing. I've always had problems describing math-related stuff! :D
 
Last edited by a moderator:
  • #4
Well the easiest way for this case is to count the number of times a B occurs. For this we use a binomial model where one state is B and the other state is not B (i.e A,C,).

Since we are only interested in the transitions, it means the last realization doesn't count and we are only interested in the first three. So we change the binomial from n = 4 to n = 3 and apply the results. For 3 B's we get 2*2*2 as an example. Since order doesn't matter we can use this to find out probabilistically how many B's we get for 3 B's, 2 B's 1 B, and no B's.

This means we get a probability for no B's, 1 B, 2 B's, and 3 B's. This is easy to calculate. Then by mapping a B to 2 and a non-B to 3 we get our answer.

So for 2*2*2 (ie BBBX for some X = {A,B,C}) the probability is given by the binomial distribution which tells us the chance of this happening. Same with the other ones.
 
  • #5
The number of possible 4-state sequences will be 34, less those which contain a consecutive pair BC. 9 start with BC, 9 have a central BC, 9 finish with BC. Those 27 only contain one repeat: BCBC. So we have 34 - 26 = 55.
However, not sure that helps much wrt finding the most common sequence. Maybe you were intending to construct a new matrix for transitions between 4-state sequences. 55 is still rather a large number, perhaps.
I hope chiro's post helps - I can't follow it:blushing:.
 
  • #6
More thoughts...
For the odds of the different sequences, compute the single state odds in the usual way then extend one state at a time: P(AB) = P(A)P(B|A).
If the offered example is much simplified from the real problem, it may be that the total number of possible sequences makes it unmanageable to compute the exact odds of every sequence. In that case, can discontinue some sequences if the odds fall below a threshold. E.g. In the example, the most likely sequence must have odds at least 1/55, so can discard any sequence as soon as it falls below that.
 
  • #7
Thanks for the replies! You have both suggested pretty smart methods that I failed to think of. :D

I'm basically trying to figure out a less-intense way of finding the most likely sequence, because calculating the odds for every single sequence can be suicidal in more complex cases.

@Harusoex: Yeah, I've tried setting some limitations to eliminate a couple of candidates, but I still have to find a way to narrow things even further. I've actually found an algorithm (Viterbi) which does something similar to what I want, but I'm still trying to find a simpler way, so wish me luck! :)
 
Last edited:
  • #8
OK. Initially ignore there are just three possible states in a sequence of four and say there are 4!=24 sequences including repeats. Forbidden sequences are BC, AA, BB and CC. Each of these can occur three possuble ways in a sequence of four. Therefore, if we subtract sequences with forbidden two state sequences, there are 12 remaining possible 4 state sequences.
 
Last edited:
  • #9
SW VandeCarr said:
OK. Initially ignore there are just three possible states in a sequence of four and say there are 4!=24 sequences including repeats. Forbidden sequences are BC, AA, BB and CC. Each of these can occur three possuble ways in a sequence of four. Therefore, if we subtract sequences with forbidden two state sequences, there are 12 remaining possible 4 state sequences.
With no constraints there are 81 possible 4-state sequences.
Why are AA, BB and CC forbidden?
 
  • #10
AryanK said:
I've tried setting some limitations to eliminate a couple of candidates, but I still have to find a way to narrow things even further. I've actually found an algorithm (Viterbi) which does something similar to what I want, but I'm still trying to find a simpler way, so wish me luck! :)
Viterbi is for hidden Markov models, no?
I wondered if there was a higher threshold you could use for discarding partial sequences early. So I tried constructing a pathological example. Consider M states that behave exactly the same - i.e. for any other given state S, each has the same transition probabilities to and from S. A further N states, Ci, form a deterministic chain C1->C2->..->CN. Each of the M has a very small chance p of transiting to C1. CN has an evens chance of transiting to C1, otherwise equally likely to any of the M.
At any instant, prob of being in a (particular one of the) M states is 1/(M+2pNM), and for each of the N it's 2p/(1+2pN). For the N-state sequences, prob of C1...CN is also 2p/(1+2pN). For X.C1...CN-1, where X is one of the M, it's p/(M+2pNM). Set p << 1/2M.
Thus, for the first state in the sequence, C1 looks poor, but the C1...CN sequence is the single most likely of length N.
 
  • #11
@SW VandeCarr: Thanks for the reply! Well, in this case repetition is allowed, so that automatically leads to many more sequences than the number you've suggested. I think haruspex's method is pretty persuasive. :)

@haruspex: Well, markov models aren't deterministic, as down the road they are random. However, the most likely sequence IS a deterministic chain, so that chain must somehow be defined. Another idea that came to my mind is that since the most likely sequence is directly related to the matrix itself, it could be possible to get the most likely 2-state sequences of each row (in this case AC, BA, and CB), remove any other combination, and state (in a formal fashion) that the most likely can be generated using only those sequences.

Sometimes I wish it would suffice to say "just look at the matrix and follow the states with highest probabilities"!
 
  • #12
AryanK said:
@SW VandeCarr: Thanks for the reply! Well, in this case repetition is allowed, so that automatically leads to many more sequences than the number you've suggested. I think haruspex's method is pretty persuasive. :)

OK. You did say sequences of states, not state transitions, but if you're only interested in the number of 4 state paths (not probability) involving only state transitions, it seems consecutively repeating states can be ignored or collapsed to one state. BTW my model failed to take into account states like ABAB where two of the states repeat non- consecutively. Your formulation requires one state to repeat, so that's taken care if we think of the repeat as A' etc. To cover repeat pairs, we add all pairs of the matrix to the number of combinations. Then we have 4!+9=33. Remove the 12 "illegal" sequences (including those with BC) and we have 21 paths. Note, any sequences "stuck'' in states eventually become "unstuck" with increasing probability over time.EDIT: Just food for thought since you still haven't gotten your answer. BTW if you used the method in post 5 to exclude AA,BB and CC in addition to BC, it seems you would exclude 27x4=108 sequences out of 81 (less repeat pairs)..
 
Last edited:
  • #13
SW VandeCarr said:
if you used the method in post 5 to exclude AA,BB and CC in addition to BC, it seems you would exclude 27x4=108 sequences out of 81 (less repeat pairs)..
No. 9 start with AA, 9 have AA in the middle, and 9 finish with AA. But now you have more repeats than in the BC case: AAAx and xAAA. That comes to 5 repeating cases, one of them (AAAA) being repeated twice: 27-6 = 21. Eliminating all with AA, BB or CC gives 6 more repeats: AABB etc. 81-3*21 + 6 = 24. But of course, there's a much easier way of working that out.
If you want to eliminate those with BC as well then there are more repeats to worry about.
 

FAQ: Total number of possible n-state sequences

1. What is the meaning of "n-state sequences"?

The term "n-state sequences" refers to a set of possible sequences where each element can take on one of n possible states. These states can represent different values, categories, or outcomes.

2. How is the total number of possible n-state sequences calculated?

The total number of possible n-state sequences can be calculated using the formula n^m, where n is the number of possible states and m is the length of the sequence. For example, if there are 3 possible states and the sequence length is 4, the total number of possible sequences would be 3^4 = 81.

3. Can the total number of possible n-state sequences be infinite?

Yes, the total number of possible n-state sequences can be infinite if the sequence length (m) is infinite. For example, if there are 2 possible states and the sequence length is infinite, the total number of possible sequences would be 2^∞, which is infinite.

4. How does the length of the sequence impact the total number of possible n-state sequences?

The length of the sequence has a significant impact on the total number of possible n-state sequences. As the length of the sequence increases, the total number of possible sequences also increases exponentially. This is because for each additional element in the sequence, the number of possible combinations increases by a factor of n.

5. What is the significance of the total number of possible n-state sequences in scientific research?

The total number of possible n-state sequences is important in many scientific fields such as genetics, bioinformatics, and statistics. It helps researchers understand the complexity and diversity of different systems, make predictions about outcomes, and analyze data. It also allows for the identification of rare or significant sequences that may have important implications for research findings.

Back
Top