Undergrad One-Time Pad and Frequency Analysis

Haku · Apr 13, 2021

My lecturer said that the one-time pad cryptosystem has a weakness: it is subject to frequency analysis. But even after he tried to explain why that is a weakness of the system, I am still unable to see it. The frequency of letters is completely irrelevant to the structure of the actual message, right? Since a row of 10 a's could correspond to 10 different letters, doesn't that imply that frequency analysis offers no assistance when trying to crack a one-time pad?

Office_Shredder · Apr 13, 2021

I agree with you on this.

Vanadium 50 · Apr 13, 2021

If the one-time pad is shorter than the message, it is vulnerable to frequency analysis. For the same reason, if a pad is reused, even in part, it is vulnerable to frequency analysis. Otherwise it is not.
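To make the reuse case concrete, here is a minimal Python sketch. It assumes the letters-as-numbers scheme (A=0, ..., Z=25, added mod 26); the helper names and the two sample messages are made up for illustration. Subtracting two ciphertexts that were made with the same pad cancels the pad completely, leaving only the difference of the two plaintexts.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def add_mod26(text, key):
    """Add key letters to text letters position by position, mod 26."""
    return "".join(
        ALPHABET[(ALPHABET.index(t) + ALPHABET.index(k)) % 26]
        for t, k in zip(text, key)
    )


def sub_mod26(a, b):
    """Subtract the letters of b from the letters of a, mod 26."""
    return "".join(
        ALPHABET[(ALPHABET.index(x) - ALPHABET.index(y)) % 26]
        for x, y in zip(a, b)
    )


# Two made-up messages of equal length, encrypted with the SAME random pad.
m1 = "ATTACKATDAWN"
m2 = "RETREATATSIX"
pad = "".join(secrets.choice(ALPHABET) for _ in range(len(m1)))

c1 = add_mod26(m1, pad)
c2 = add_mod26(m2, pad)

# The pad drops out: c1 - c2 equals m1 - m2, whatever the pad was.
print(sub_mod26(c1, c2) == sub_mod26(m1, m2))
```

The difference of two English plaintexts is not random, so ordinary language statistics can be applied to it even though the pad itself was random.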

Haku · Apr 14, 2021

This is what my lecturer said:

"You know 'E' is one of the most common letters in the English language, and the frequency of each letter is known. This refers to the frequency occurring in both the message to be encoded and the secret key. You are right in that the letter 'E' in the message could be encoded using any letter in the secret key, but it is most likely to be encoded using the letter 'E'. So the most frequent occurrence in the cryptotext would be 'E' encoded using 'E'. Similarly, you can compute the frequency that any letter is encoded using any other letter, and use this to get an estimate of the total frequency of any letter occurring in the cryptotext. Of course, you would need a large message to do this effectively, but the general idea still holds."

I don't think this implies that the one-time pad would be vulnerable to frequency analysis though, right?

Haku · Apr 14, 2021

This is another explanation I got. Can someone please explain this to me?

"The idea is that you can calculate the frequency of any letter appearing in the ciphertext. For example, the letter A could have been encoded as A+A, or B+Z, or C+Y, or D+X, or E+W and so on. So, based on the frequency of letters in English, you can calculate the probability of each of these cases occurring and hence the total frequency of A appearing in the ciphertext.

If you do then get A appearing a lot in the ciphertext (more than expected), then you could assume some proportion of these are coming from the encoding E+W (or whichever pair has a high frequency of occurring). It's pretty fiddly, but the idea is that you can compute these frequencies and with sufficiently long messages you can use some form of trial and error to estimate with high probability which letters are appearing in the message/secret key (and this is much better than just blind guessing)."
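The calculation described in this explanation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: letters are numbered A=0 to Z=25, the frequency table holds rough, commonly quoted percentages for English, and the function name is made up. It covers the case the explanation assumes, where the key is itself English text.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25

# Approximate relative frequencies (percent) of A..Z in English text.
# These are commonly quoted ballpark figures, used here as an assumption.
ENGLISH_PERCENT = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]
TOTAL = sum(ENGLISH_PERCENT)
ENGLISH = [p / TOTAL for p in ENGLISH_PERCENT]  # normalised to sum to 1


def ciphertext_distribution(message_dist, key_dist):
    """Distribution of (message + key) mod 26 for independent letters."""
    dist = [0.0] * 26
    for m, pm in enumerate(message_dist):
        for k, pk in enumerate(key_dist):
            dist[(m + k) % 26] += pm * pk
    return dist


# Message AND key both drawn from English text (the "book as key" case).
book_key_dist = ciphertext_distribution(ENGLISH, ENGLISH)

# Show the five most likely ciphertext letters and their probabilities.
ranked = sorted(zip(ALPHABET, book_key_dist), key=lambda pair: pair[1], reverse=True)
for letter, prob in ranked[:5]:
    print(f"{letter}: {prob:.4f}")
```

With an English-text key the resulting distribution is uneven rather than flat at 1/26, and that unevenness is what a frequency attack would try to exploit.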

Haku · Apr 14, 2021

Update: It is vulnerable to frequency analysis because each combination of letters has some probability, E+E being the greatest. The same goes for the rest of the possible combinations, so you could expect to see some patterns in the long run. This is how I understand it now; is this correct?

Office_Shredder · Apr 14, 2021

No, E+E does not have the greatest probability of appearing. If the one-time pad is done correctly, E+A, ..., E+Z are all equally likely transformations of the character E. If the one-time pad is, like, an actual English paragraph, then yes, it is vulnerable to attack. But a proper one-time pad is a purely random string of characters.
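A quick exact check of that claim, as a sketch: assuming letters are numbered A=0 to Z=25 and the key letter is uniform and independent of the message, the ciphertext letter comes out uniform no matter how lopsided the message letters are. The message distribution below is deliberately made up, and the function name is hypothetical.

```python
from fractions import Fraction


def ciphertext_distribution(message_dist, key_dist):
    """Distribution of (message + key) mod 26 for independent letters."""
    dist = [Fraction(0)] * 26
    for m, pm in enumerate(message_dist):
        for k, pk in enumerate(key_dist):
            dist[(m + k) % 26] += pm * pk
    return dist


# A deliberately lopsided, made-up message distribution (weights 1..26).
weights = range(1, 27)
message_dist = [Fraction(w, sum(weights)) for w in weights]

# A proper one-time pad: every key letter has probability exactly 1/26.
uniform_key = [Fraction(1, 26)] * 26

cipher_dist = ciphertext_distribution(message_dist, uniform_key)

# Every ciphertext letter has probability exactly 1/26.
print(all(p == Fraction(1, 26) for p in cipher_dist))
```

Exact fractions are used instead of floats so the equality test is not blurred by rounding.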

Haku · Apr 14, 2021

Office_Shredder said:

No, E+E does not have the greatest probability of appearing. If the one-time pad is done correctly, E+A, ..., E+Z are all equally likely transformations of the character E. If the one-time pad is, like, an actual English paragraph, then yes, it is vulnerable to attack. But a proper one-time pad is a purely random string of characters.

In my course they say that a book is a good example of the key, but that would not be a random string of characters, would it? Is the key meant to be a completely randomly generated string of letters? Then you sum the corresponding numerical values together mod 26, and that gives you a number which you then encode as a letter, correct?
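For reference, the scheme described in this question can be sketched as follows. This is a minimal illustration, not a vetted implementation: it assumes uppercase A-Z only with A=0, ..., Z=25, and the function names are made up. The pad is drawn with Python's `secrets` module, which is intended for cryptographic randomness.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def random_pad(length):
    """Draw `length` key letters independently and uniformly at random."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def encrypt(message, pad):
    """Add message and pad letters position by position, mod 26."""
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    return "".join(
        ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % 26]
        for m, k in zip(message, pad)
    )


def decrypt(ciphertext, pad):
    """Subtract pad letters from ciphertext letters, mod 26."""
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext, pad)
    )


message = "HELLO"
pad = random_pad(len(message))  # use once, then discard
ciphertext = encrypt(message, pad)
print(decrypt(ciphertext, pad) == message)
```

Under this numbering E+E gives I (4 + 4 = 8), and A+A and B+Z both give A, matching the combinations mentioned earlier in the thread.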

Office_Shredder · Apr 14, 2021

Yes, a book is actually a bad example of a one time pad. I agree if you use a book then you are subject to a frequency analysis attack.

The best in class implementation generates completely random keys, then after you use the key once you throw it away. The hard part here is distributing the keys without anyone intercepting, since if you had a way of securely transmitting it then you already had a way of securely transmitting your message.

Throwing away the key might also be challenging, since you will need to destroy it in a way that it cannot be recovered.

Also, I think there's at least one real-world example where one-time pads were not generated sufficiently randomly and hence were cracked.

Nugatory · Apr 15, 2021

Haku said:

This is what my lecturer said:

"You know 'E' is one of the most common letters in the English language, and the frequency of each letter is known. This refers to the frequency occurring in both the message to be encoded and the secret key."

This is absolute nonsense. In a one-time-pad cryptosystem, the keys are generated randomly, not taken from some snippet of English-language text. There is some possibility that you have misunderstood your lecturer, and they were trying to explain why anything less than completely random key generation will lead to a vulnerability.

Haku · Apr 17, 2021

Nugatory said:

This is absolute nonsense. In a one-time-pad cryptosystem, the keys are generated randomly, not taken from some snippet of English-language text. There is some possibility that you have misunderstood your lecturer, and they were trying to explain why anything less than completely random key generation will lead to a vulnerability.

Nah, for some reason in this course they have taught it as if the keys are books or something similar. That is where the confusion was; I didn't realize that they taught it as if the key was taken to be some English-language text.

f95toli · Apr 21, 2021

Haku said:

Nah, for some reason in this course they have taught it as if the keys are books or something similar. That is where the confusion was; I didn't realize that they taught it as if the key was taken to be some English-language text.

Well, from a purely practical point of view, using a book is probably not a bad solution if your message is short enough. As long as it is a popular book (say "Moby Dick") that is widely available in English (or some other language), it nicely solves the problem of how to share the key.
Hence, I suspect using a book is one of the more common implementations of one-time pad crypto. It is just not very secure if you have a long message.

Office_Shredder · Apr 21, 2021

f95toli said:

Well, from a purely practical point of view, using a book is probably not a bad solution if your message is short enough. As long as it is a popular book (say "Moby Dick") that is widely available in English (or some other language), it nicely solves the problem of how to share the key.
Hence, I suspect using a book is one of the more common implementations of one-time pad crypto. It is just not very secure if you have a long message.

Common implementations between whom? Casual friends sending OTP-encrypted messages for fun? Any professional organization using a one-time pad is not going to do this.

f95toli · Apr 21, 2021

Office_Shredder said:

Common implementations between whom? Casual friends sending OTP-encrypted messages for fun? Any professional organization using a one-time pad is not going to do this.

Well, I was mainly thinking of the former.
I don't know if there are any actual examples of "professional" organisations using an OTP based on books, although it must at least have been considered during, say, WW2 or the early part of the Cold War.

Truly "random" OTPs were (are?) used for e.g. the number stations, but the problem is of course that you still need to share the OTP somehow, and being in possession of an OTP would be almost impossible to explain if you are caught.

Office_Shredder · Apr 21, 2021

https://en.m.wikipedia.org/wiki/One-time_pad

The uses section is not empty. My favorite example is the hotline between Moscow and Washington, set up in the wake of the Cuban Missile Crisis, which used this so that no one could spy on the hotline and neither side had to reveal more sensitive encryption techniques to the other. This is obviously an example where the disadvantages of the pad are minimized.

onatirec · Apr 26, 2021

https://en.wikipedia.org/wiki/Gadsby_(novel)

checkmate prof

Office_Shredder · Apr 26, 2021

I wonder whether frequency analysis would eventually crack it if you used that whole book as a one-time pad for normal messages. I really don't know.
