- #1
vivek1
I have a dataset of 10000 protein sequences, where sequence i has length S_i, 1 <= i <= 10000. I extracted a k-mer "a" from the 1st sequence. The probability of occurrence of each amino acid (a character of a protein sequence) is given by its frequency in the dataset. If I choose a k-mer "b" from another sequence, what is the probability that "b" matches "a" in at least r of the k positions?
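To make the question concrete, here is a minimal sketch under one assumption that is not stated in the post: each position of "b" is an independent draw from the dataset-wide amino acid frequencies. Under that assumption, position j matches with probability equal to the frequency of the residue that "a" has at position j, and the number of matches follows a Poisson binomial distribution. The function name `prob_at_least_r_matches` and the frequency table passed to it are hypothetical, chosen only for illustration.

```python
from typing import Dict


def prob_at_least_r_matches(kmer_a: str, freqs: Dict[str, float], r: int) -> float:
    """Probability that a random k-mer matches kmer_a in at least r positions.

    Assumption: every position of the random k-mer is drawn independently
    from `freqs` (amino acid -> relative frequency in the dataset).
    """
    # dist[m] = probability of exactly m matches among the positions seen so far
    dist = [1.0]
    for residue in kmer_a:
        p = freqs.get(residue, 0.0)  # chance that this position matches
        new = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            new[m] += q * (1.0 - p)   # mismatch at this position
            new[m + 1] += q * p       # match at this position
        dist = new
    # Sum the tail: exactly r, r+1, ..., k matches
    return sum(dist[max(r, 0):])


# Hypothetical usage: uniform frequencies over the 20 standard amino acids.
uniform = {aa: 1.0 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
print(prob_at_least_r_matches("ACD", uniform, 1))
```

With real data, `freqs` would be replaced by the frequencies counted from the 10000 sequences. If all residues of "a" had the same frequency p, this would reduce to the ordinary binomial tail, the sum over m from r to k of C(k, m) p^m (1-p)^(k-m).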