Countability of binary Strings

  • #1
Mr X
33
7
TL;DR Summary
If we consider the set of all strings made up of 0s and 1s, will that set be countable or uncountable? How to prove it?
Looking at the countability of the set A, made up of all sequences whose elements are the digits 0 and 1, I've run into multiple interpretations/answers.
One says the set is countable, because each natural number can be written as as a binary number. This allows for one one matching between the set of seqences and set of naturals, which makes it countable.
Another way, if we assume the set is countable, we can use something similiar to the cantor's digonaolization argument, to show that there exists infinitely many more sequences that has not been included.

The third is something that I've seen in a guide for NET exam, and the fact that the set is countable is added as a note. I do not fully understand the things said above clearly, but it kind of goes like this.
when the series a/2+ b/2^2 + c/2^3.... converges to x with a,b,c.. as binary digits then the binary expression of x is given by x=0.abc....
and after this explanation, the countability is added as a note with no furhter explanation. I'm not even sure these two things are directly connected or the explanation is used in the deduction, but adding it in here Just in case.

So my question is, is the set countable or uncountable? Is there someting wrong with one of these proofs, which is the correct proof?
 
Physics news on Phys.org
  • #2
Mr X said:
TL;DR Summary: If we consider the set of all strings made up of 0s and 1s, will that set be countable or uncountable? How to prove it?
The set of all finite binary strings of arbitrary length is countable.

The set of all infinite binary strings is uncountable.
Mr X said:
Looking at the countability of the set A, made up of all sequences whose elements are the digits 0 and 1, I've run into multiple interpretations/answers.
If this is all infinite sequences, then the set is uncountable.
Mr X said:
One says the set is countable, because each natural number can be written as as a binary number. This allows for one one matching between the set of seqences and set of naturals, which makes it countable.
The natural numbers map to the set of finite strings of arbitrary length. This is countable. None of the strings is infinite (although you can pad out a finite string with an infinite sequence of zeroes, but that doesn't change the countability).
Mr X said:
Another way, if we assume the set is countable, we can use something similiar to the cantor's digonaolization argument, to show that there exists infinitely many more sequences that has not been included.
That's correct. As long as we are dealing with all infinite binary strings.
Mr X said:
The third is something that I've seen in a guide for NET exam, and the fact that the set is countable is added as a note. I do not fully understand the things said above clearly, but it kind of goes like this.
when the series a/2+ b/2^2 + c/2^3.... converges to x with a,b,c.. as binary digits then the binary expression of x is given by x=0.abc....
and after this explanation, the countability is added as a note with no furhter explanation. I'm not even sure these two things are directly connected or the explanation is used in the deduction, but adding it in here Just in case.
I'm not sure I understand what that's saying.
Mr X said:
So my question is, is the set countable or uncountable? Is there someting wrong with one of these proofs, which is the correct proof?
As above, the set is uncountable, but you have to be careful not to confuse "of infinite length" with "of finite but arbitrary length".
 
  • Like
Likes jbriggs444 and Mr X
  • #3
PeroK said:
The set of all finite binary strings of arbitrary length is countable.

The set of all infinite binary strings is uncountable.

If this is all infinite sequences, then the set is uncountable.

The natural numbers map to the set of finite strings of arbitrary length. This is countable. None of the strings is infinite (although you can pad out a finite string with an infinite sequence of zeroes, but that doesn't change the countability).

That's correct. As long as we are dealing with all infinite binary strings.

I'm not sure I understand what that's saying.

As above, the set is uncountable, but you have to be careful not to confuse "of infinite length" with "of finite but arbitrary length".
Understood. Thankyou for the explanation.
 
  • #4
Mr X said:
TL;DR Summary: If we consider the set of all strings made up of 0s and 1s, will that set be countable or uncountable?
Yes, I know some of this repeats Perok, Some doesn't.

(A) If such a set is limited to finite-length sequences, it is countable. (B) If they are infinite-length, it is uncountable.

Mr X said:
How to prove it?
Cantor's Diagonal Argument. The real one, not the one taught in introductory courses, which is better remembered.

Mr X said:
One says the set is countable, because each natural number can be written as as a binary number. This allows for one one matching between the set of seqences and set of naturals, which makes it countable.
This is why my statement (A) is correct. Each natural number n has a finite-length binary expansion. The length is less than LOG2(n)+1.

Mr X said:
Another way, if we assume the set is countable, we can use something similiar to the cantor's digonaolization argument, to show that there exists infinitely many more sequences that has not been included.
Not just "similar," that is CDA. He did not use it on real numbers - he even said so explicitly. Here is how he described the set, translated into English:
  • Let m and n be two different characters, and consider a set M of elements E = (x1, x2, … , xv, …) which depend on infinitely many coordinates x1, x2, … , xv, …, and where each of the coordinates is either m or w. Let M be the totality of all elements E.
Modern (correct) versions use the numbers 0 and 1 instead of the characters 'm' and 'w'.
Mr X said:
The third is something that I've seen in a guide for NET exam, and the fact that the set is countable is added as a note. I do not fully understand the things said above clearly, but it kind of goes like this.
when the series a/2+ b/2^2 + c/2^3.... converges to x with a,b,c.. as binary digits then the binary expression of x is given by x=0.abc....
This sounds like the binary-representation interpretation of the modern version of the set Cantor used. It adds the complication that "10000..." and "01111..." both represent the number 1/2.

BTW, the proof is usually taught incorrectly. An outline of the incorrect version is:
  1. Assume the set M is countable; that is, a bijection E(n) exists between N and M,
  2. Diagonalize this bijection to construct a sequence E0 that is not mapped by E(n).
  3. Since E0 is in M, this contradicts the assumption in step 1, disproving it.
The flaw in this is that the diagonalization process used in step 2 does not use the assumption that E(n) is a bijection. It only uses E(n) as a function that may or may not be a surjection (or an injection, but that isn't relevant.)

This is Cantor's actual argument:
  1. Let E(n) be any function from N to M.
    1. This is not an assumption, such functions exist.
    2. The only near-flaw in CDA is that Cantor should have demonstrated one.
  2. Diagonalize this function to construct a sequence E0 that is not mapped by E(n).
  3. This proves the proposition "If E(n) is a function from N to M, then there is a sequence E0 in M that is not mapped by E(n).
  4. (Direct quote from Cantor) From this proposition it follows immediately that the totality of all elements of M cannot be put into the sequence: E1, E2, …, Ev, … otherwise we would have the contradiction, that a thing E0 would be both an element of M, but also not an element of M.
 
  • Like
Likes Mr X
  • #5
JeffJo said:
Cantor's Diagonal Argument. The real one, not the one taught in introductory courses, which is better remembered.
First of all thankyou for the detailed reply, I've got some doubts.

I'm not sure I get this. What is the real argument, and how are both different? I've no idea about which one's taught in introductory courses, because I've never encountered anything related to countability in formal education, only when self studying for exams or curiosity.
It would be nice if you could provide links or short explanations for both versions.
JeffJo said:
Each natural number n has a finite-length binary expansion. The length is less than LOG2(n)+1.
Is it a formula to find the length of a binary representation of any natural number?
JeffJo said:
Not just "similar," that is CDA. He did not use it on real numbers - he even said so explicitly.
So Cantor did not utilise CDA for proving real numbers is uncountable? That proof is derived from his original work which can be directly applied for binary sequences?
If that is not what you meant, then I did not understand.
JeffJo said:
The flaw in this is that the diagonalization process used in step 2 does not use the assumption that E(n) is a bijection. It only uses E(n) as a function that may or may not be a surjection (or an injection, but that isn't relevant.)
I don't understand this part. Why does step 2 need to use that assumption? Since we are trying to prove that there exists no such bijection, the assumption is important even if it's not used in step 2, right? Because only if we assume it does the point that there exists a member outside of the relation that we've taken become relevant. If it is not a bijection in the first place, stating that there exists a member outside of the listed is unimportant, no?
Unless, we're showing that regardless which relation we choose, bijection or not, there always exists an outlier, and hence a bijection is not simply possible. This I believe you've used in the later argument. But while this works, I don't get how assuming the bijection is a flaw in and of itself.
 
  • Like
Likes PeroK
  • #6
Re Cantor's Diagonal Argument (CDA)

There is a technical issue when using the CDA to prove that the real numbers (between zero and 1) are uncountable. That is that some real numbers have two decimal expansions. Specifically, any number that ends with an infinite sequence of zeroes can also be represented a decimal expansion that ends with an infinite sequence of nines - where the last non-zero digit has one subtracted for it.

The infamous example is, of course, that ##1 = 0.999\dots##.

This doesn't really matter because all such numbers are necessarily rational, hence form a countable subset of the real numbers, and are repeated only twice. The sets of all infinite decimals, therefore, is the set of all real numbers between zero and 1 with a countable subset repeated twice. The set of all decimal expansions, therefore, is countable if and only if the real numbers are countable. That's the first possible approach.

The second approach is to exclude any decimal expansions that end with an infinite sequence of nines. In this case, you have to be careful that the additional expansion you generate does not end in an infinite sequence of nines. This is easy to do by never choosing the digit 9. You also want never to choose the digit 0. You simply ensure, whatever the algorithm, that the new digit is in the range 1-8.

Note that this is why it's better to use a decimal rather than a binary expansion.

The uncountablility for all infinite binary strings is simpler because we don't care that two strings may represent the same real number. If we prove this, we have a third approach for the real numbers, which is again to note that only countably many real numbers can be represented by two different binary strings. So, if the real numbers were countable, then so would be the set of all binary strings, being the union of two countable sets.
 
  • Like
Likes Mr X
  • #7
Re strings of infinite length, if we were to consider strings of countably-infinite length, we may have to deal with Set-theoretic issues. Say ##w:=\omega:= |\mathbb N |##, the Naturals, then strings of fixed length ## w## alone , will have length ##2^w##, which is the cardinality of the Continuum, which is a lower bound for all strings of infinite length, if you accept A.C ( Choice).
 
  • Like
Likes Mr X
  • #8
PeroK said:
The set of all decimal expansions, therefore, is countable if and only if the real numbers are countable. That's the first possible approach.
I got the gist of it, thankyou, But I didn't fully understand this part. Don't we use the that reals between 0 and 1 are uncountable to show reals are uncountable in the first place? So won't linking it to the countability of reals to (0,1) end up as a cyclical argument?
Or are you saying that since after taking into account that some decimals may have two representation and finding a solution for that situation, we can use the diagonalisation to show the set is uncountable without any loopholes? And since the condition is iff, this confirms the reals to be uncountable?

Also is this the complete, real diagonalisation argument that JeffJo talked about in his reply?
 
  • #9
WWGD said:
Re strings of infinite length, if we were to consider strings of countably-infinite length, we may have to deal with Set-theoretic issues. Say ##w:=\omega:= |\mathbb N |##, the Naturals, then strings of fixed length ## w## alone , will have length ##2^w##, which is the cardinality of the Continuum, which is a lower bound for all strings of infinite length, if you accept A.C ( Choice).
Yes, got it. But isn't it like bit of an overkill when diagonalisation argument can do the job?
Though I hadn't thought about it that way, thanks on the different viewpoint.
 
  • Like
Likes PeroK and WWGD
  • #10
Mr X said:
I got the gist of it, thankyou, But I didn't fully understand this part. Don't we use the that reals between 0 and 1 are uncountable to show reals are uncountable in the first place? So won't linking it to the countability of reals to (0,1) end up as a cyclical argument?
Or are you saying that since after taking into account that some decimals may have two representation and finding a solution for that situation, we can use the diagonalisation to show the set is uncountable without any loopholes? And since the condition is iff, this confirms the reals to be uncountable?
We have three sets.

##A = [0, 1)##, the real numbers between 0 and 1.

##B## is the subset of ##A## of numbers with a non-unique decimal expansion. All numbers in ##B## are rational, hence ##B## is countable.

##C## is the set of infinite decimal strings. The CDA shows that ##C## is uncountable.

We have a one-to-one mapping from ##A## into ##C## using the decimal expansion of each number in ##[0, 1)##. We choose the decimal expansion that ends in an infinite sequence of zeroes, rather than an infinite sequence of nines, in the case of ambiguity.

This doesn't quite prove that ##A## is uncountable, as you need to deal with the technicality of the numbers in set ##B##. And that, hence, this mapping is not a bijection.

My preferred approach is to change the set ##C## (to let's say ##C'##) by excluding strings that end in an infinite sequence of nines. That means that ##C'## is precisely the decimal expansions of the numbers ##[0, 1)##. The CDA can be used to prove that ##C'## is uncountable.
 
Last edited:
  • Like
Likes Mr X
  • #11
Mr X said:
I'm not sure I get this. What is the real argument, and how are both different?
The one Cantor published in the 1890 paper Uber ein elementare Frage der Mannigfaltigkeitslehre. Translation at http://www.logicmuseum.com/cantor/diagarg.htm .

It actually was not the theorem that paper was written for. It was an example, much like you might show a (3,4,5) right triangle before proving the Pythagorean Theorem. The real theorem was that P(A), the power set (set of all possible subsets) of any set A, has a greater cardinality than A. If you take 'm' and 'w' to mean "included" and "excluded," then the strings Cantor used will each represent a unique subset of natural numbers.

The commonly-taught version uses real numbers since their representations are more familiar to introductory students. It can work, but introduces the need for two inelegant, and unnecessary, corrections. Usually only one is mentioned. It also presents an easier to visualize, but logically incorrect, proof by contradiction. And with no disrespect intended, your objections show that such proofs are not well understood.

Mr X said:
So Cantor did not utilise CDA for proving real numbers is uncountable?
Cantor used infinite-length binary strings to show that uncountable sets exist. (Literally, "That there is an infinite manifold, which cannot be put into a one-one correlation with the totality of all finite whole numbers.") They can be thought of as binary representations of the numbers in [0,1] (yes, 1 usually needs to be included), but some numbers have two representations. Accounting for that is the inelegant part. The proof can work with those fixes, so I'll keep talking about real numbers.

Mr X said:
I don't understand this part. Why does step 2 need to use that assumption?
To make step 3 a contradiction that follows from the assumption. In other words, no part of the proof before step 3 changes if we drop that part of the assumption. And if we do drop it, there is no contradiction in step 3.

Cantor's contradiction is not that the list is incomplete. It is that if the list could be complete, there is a number that has to be in it, but can't be in it.

Mr X said:
Unless, we're showing that regardless which relation we choose, bijection or not, there always exists an outlier, and hence a bijection is not simply possible.
Correct. The proof that it is not possible is that there is a number that has to be in it, but can't be in it.

Mr X said:
This I believe you've used in the later argument. But while this works, I don't get how assuming the bijection is a flaw in and of itself.
The flaw is that proof by contradiction has to show that contradiction follows from the assumed statement. You have to use the assumption that the list is complete, and then prove it isn't, for the usual proof to be valid.

Hyperbolic example: Say I change the assumption to "Assume a complete listing of [0,1] and that the moon is made of green cheese." The alleged contradiction does not prove anything about the moon unless that part of the assumption is needed to reach the contradiction. Similarly, "complete" needs to be used to prove anything about "complete."
 
  • Sad
  • Like
Likes Mr X and PeroK
  • #12
PeroK said:
We have three sets.

##A = [0, 1)##, the real numbers between 0 and 1.

##B## is the subset of ##A## of numbers with a non-unique decimal expansion. All numbers in ##B## are rational, hence ##B## is countable.

##C## is the set of infinite decimal strings. The CDA shows that ##C## is uncountable.
We need only two sets:

##N##, the set of all natural numbers, as defined by the Axiom of Infinity.
##C = P(N)##, the power set of ##N##.

Each element c of ##C## is a subset of ##N##. But note that we can represent c as an indexed string, where the nth character of c is a 1 if n is in the subset, or a 0 if it is not.

The elegance of CDA is just how simple it is. Your corrections take away from that.
 
  • #13
JeffJo said:
The elegance of CDA is just how simple it is. Your corrections take away from that.
I wasn't correcting anything. Just dealing with a specific example.

The power set of ##\mathbb N## is not the real numbers. So, you still have some work to do to show that ##\mathbb R## is uncountable.

Note that I've been using CDA in this thread as shorthand for the "usual diagonal argument". Not the one from Cantor's original paper.
 
Last edited:
  • Like
Likes Mr X
  • #14
PeroK said:
We have three sets.

##A = [0, 1)##, the real numbers between 0 and 1.

##B## is the subset of ##A## of numbers with a non-unique decimal expansion. All numbers in ##B## are rational, hence ##B## is countable.

##C## is the set of infinite decimal strings. The CDA shows that ##C## is uncountable.

We have a one-to-one mapping from ##A## into ##C## using the decimal expansion of each number in ##[0, 1)##. We choose the decimal expansion that ends in an infinite sequence of zeroes, rather than an infinite sequence of nines, in the case of ambiguity.

This doesn't quite prove that ##A## is uncountable, as you need to deal with the technicality of the numbers in set ##B##. And that, hence, this mapping is not a bijection.

My preferred approach is to change the set ##C## (to let's say ##C'##) by excluding strings that end in an infinite sequence of nines. That means that ##C'## is precisely the decimal expansions of the numbers ##[0, 1)##. The CDA can be used to prove that ##C'## is uncountable.
Okay, understood.
 
  • #15
JeffJo said:
The real theorem was that P(A), the power set (set of all possible subsets) of any set A, has a greater cardinality than A.
I see. So this theorem is further simplified, or rather used in a different setting that is familiar to introductory course students, and the simplification, in and of itself creates a few misunderstandings and necessitates some inelegant parts which causes the confusion. Which includes the possibility of two numbers repeating the same value, as well as the wrong proof by contradiction in the end.
Did I get that right?
JeffJo said:
They can be thought of as binary representations of the numbers in [0,1] (yes, 1 usually needs to be included)
How does inclusion of one change anything in an uncountably infinite set? Won't the open interval work as well?
JeffJo said:
To make step 3 a contradiction that follows from the assumption. In other words, no part of the proof before step 3 changes if we drop that part of the assumption. And if we do drop it, there is no contradiction in step 3.
Got it. It was shockingly simple flaw, so I'm amazed I never spotted it.
JeffJo said:
Correct. The proof that it is not possible is that there is a number that has to be in it, but can't be in it.
Understood.
 
  • Like
Likes PeroK
  • #16
JeffJo said:
We need only two sets:

##N##, the set of all natural numbers, as defined by the Axiom of Infinity.
##C = P(N)##, the power set of ##N##.

Each element c of ##C## is a subset of ##N##. But note that we can represent c as an indexed string, where the nth character of c is a 1 if n is in the subset, or a 0 if it is not.

The elegance of CDA is just how simple it is. Your corrections take away from that.
That wouldn't solve the real number interval's countability problem, will it? That only shows there exist an uncountable set, which is the powerset of the natural numbers, which can have the same cardinality as the set of infinite binary strings.
For this part, which is a specific example, and not an explanation of the theorem itself, wouldn't the method talked about be better?
 
  • #17
PeroK said:
I wasn't correcting anything. Just dealing with a specific example.

The power set of ##\mathbb N## is not the real numbers. So, you still have some work to do to show that ##\mathbb R## is uncountable.

Note that I've been using CDA in this thread as shorthand for the "usual diagonal argument". Not the one from Cantor's original paper.
My bad, I didn't see the clarification allready posted before posting the other message.
And thanks for the heads on which CDA specifically.
 
  • #18
Mr X said:
My bad, I didn't see the clarification allready posted before posting the other message.
And thanks for the heads on which CDA specifically.
Here's a video from MIT that I just found. It uses another approach, that I really like. It shows that the real numbers in ##(0, 1)## whose decimal expansions can be written using only the digits 3 and 4 form an uncountable set. The digits 3 and 4 are arbitrary, but this avoids the technicality of the terminating infinite sequence of nines. Note that the mathematician here calls this "Cantor's Diagonalization Argument". Mathematics isn't based on a set of holy books, wherein lies the true gospel! It's more ecumenical than that.

 
  • Like
Likes Mr X
  • #19
PeroK said:
I wasn't correcting anything. Just dealing with a specific example.

The power set of ##\mathbb N## is not the real numbers. So, you still have some work to do to show that ##\mathbb R## is uncountable.

Note that I've been using CDA in this thread as shorthand for the "usual diagonal argument". Not the one from Cantor's original paper.
If you accept the Continuum Hypothesis ( Edit+ AC), the power set of the Naturals does have the cardinality of the Reals.
Edit: https://en.m.wikipedia.org/wiki/Continuum_hypothesis
 
  • Like
Likes Mr X
Back
Top