Character strings as random variables?

In summary: if you take the view that each unit of output is the realization/event of a random variable, then entropy does exist. The entropy of a character string is then the sum of the entropies of the individual units of output.
  • #1
SW VandeCarr
Consider a character string randomly generated from an alphabet {T,H} of length L, where T and H each have a probability of 0.5. For an arbitrary finite L the probability of a given string is p=(0.5)^L.

A probability is the sole determinant of Shannon entropy (S). Therefore I'm claiming that such character strings have Shannon entropy which, given a uniform distribution over the [tex]2^{L}[/tex] possible strings, would be [tex]S = -\log_{2} p = -\log_{2}(0.5)^{L} = L[/tex] bits.

This is my reasoning for claiming that such character strings have entropy. I've been challenged on this based on the argument that each element of the string is a random variable, but the entire string is a "constant". In fact, there is no specification that the string need be generated sequentially. A string, as defined above, where L=10 has 1024 possible outcomes or states. Is this not an example of entropy?

EDIT: In addition, I'm claiming that if L were an RV and the probabilities of T and H were fixed, then S is a random variable with a known PDF.
(see also LuculentCabal:logarithm of discrete RV Jul 12)
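
A minimal numerical check of this (my own sketch, not part of the original post; L = 10 and the variable names are illustrative): enumerate all [tex]2^{L}[/tex] strings, confirm there are 1024 equally likely states, and compute the entropy of the distribution over whole strings.

[code]
from itertools import product
from math import log2

L = 10
alphabet = ['T', 'H']

# Enumerate every possible string of length L over {T, H}.
strings = [''.join(s) for s in product(alphabet, repeat=L)]
assert len(strings) == 2 ** L  # 1024 states for L = 10

# Each string is equiprobable: p = (0.5)^L.
p = 0.5 ** L

# Shannon entropy of the uniform distribution over whole strings.
S = -sum(p * log2(p) for _ in strings)
print(S)  # 10.0 bits, i.e. S = L = -log2(p)
[/code]

Since S = L bits here, the EDIT's claim follows directly: if L is a random variable, S inherits L's distribution.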
 
  • #2
SW VandeCarr said:
Consider a character string randomly generated from an alphabet {T,H} of length L… Is this not an example of entropy?

I think the confusion may be that a particular string is not a random variable but a realization/event of a random variable. Maybe in your past debate you were just having a communication problem.
 
  • #3
John Creighto said:
I think the confusion may be that a particular string is not a random variable but a realization/event of a random variable…

Well, that is the root of the problem apparently. But if you take the view that an outcome, once observed, has no information, then information/entropy doesn't exist as an observable. If we have a system which has 1024 equally probable states, then the entropy of that system is 10 bits in the Shannon measure, is it not? The string that is observed is one randomly realized state of the system. What is the proper context for the concept of information/entropy?
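
One way to make the distinction concrete (my own illustration, not from the thread): the distribution over outcomes has an entropy, while a single observed string has a self-information (surprisal) of [tex]-\log_{2} p[/tex]; for equiprobable outcomes the two numbers coincide.

[code]
from math import log2
import random

L = 10
p = 0.5 ** L  # probability of any particular string

# Entropy of the distribution over all 2^L equally likely strings.
S = -(2 ** L) * p * log2(p)

# Self-information (surprisal) of one observed realization.
observed = ''.join(random.choice('TH') for _ in range(L))
surprisal = -log2(p)

print(S, surprisal)  # both 10.0 bits when outcomes are equiprobable
[/code]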
 
  • #4
SW VandeCarr said:
But if you take the view that an outcome, once observed, has no information, then information/entropy doesn't exist as an observable… What is the proper context for the concept of information/entropy?

That all makes sense to me. Keep in mind, though, that I haven't studied Shannon entropy; I did try reading Shannon's paper once, a long time ago.
 
  • #5
John Creighto said:
That all makes sense to me. Keep in mind, though, that I haven't studied Shannon entropy…

The essential thing you need to know is that entropy is defined as:

[tex]S = -k \sum_{i} p(x_{i}) \log_{2} p(x_{i})[/tex]

Therefore any value that can be calculated from the appropriate input parameters by means of this equation is entropy. If entropy can be calculated for a character string, then the string has entropy. In the thermodynamic version, k is the Boltzmann constant, and the equation applies to a system whose microstates are defined in terms of the kinetic energies (KE) of the individual particles and whose macrostate is defined in terms of temperature (T), giving S = KE/T in SI units for systems in thermal equilibrium.

In the statistical application the same equation applies, usually with k=1. The input values in the case at hand are the alphabet {T,H}, the length L of the string, and the probability of the string, [tex](0.5)^{L}[/tex].

So character strings can have entropy. The question is how they have entropy. If you assume that a string is generated sequentially, then the character output is known as the process proceeds. Here you can argue that each unit of output is the realization of a random variable (RV), but the string as a whole is not. However, if the string is considered as a unit entity (one state of a system), then the entire string is the realization of a single random variable whose sample space is the set of [tex]2^{L}[/tex] possible strings.
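
A short sketch of the equation above with k = 1 (my own illustration; the function name and the 0.9/0.1 bias are mine, not from the thread), applied both to the uniform string distribution and, for contrast, to a biased alphabet:

[code]
from math import log2

def shannon_entropy(probs, k=1.0):
    """S = -k * sum(p_i * log2(p_i)), skipping zero-probability states."""
    return -k * sum(p * log2(p) for p in probs if p > 0)

L = 10

# Uniform distribution over the 2^L possible strings: S = L bits.
uniform = [0.5 ** L] * (2 ** L)
print(shannon_entropy(uniform))  # 10.0

# A biased alphabet, e.g. P(T) = 0.9, P(H) = 0.1, gives less than
# 1 bit per character, so the whole string carries less than L bits.
per_char = shannon_entropy([0.9, 0.1])
print(per_char * L)  # ~4.69 bits for the full string
[/code]

The biased case also illustrates the summary's point: the entropy of the string is the sum of the per-character entropies.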
 
  • #6
When I first read this I thought "So?". So I decided to look at your other thread to try to understand why you are making what seems an obvious point. A particular string has no entropy, as it is a particular instance of a random variable; each string is one state or mode. The entropy of the entire system is based upon the number of modes and the probability of each mode. So now that we are in agreement so far, let's get back to your other thread.
 

Related to Character strings as random variables?

1. What are character strings as random variables?

Character strings are a type of data consisting of a sequence of characters, such as letters, numbers, and symbols. As random variables, they represent random outcomes drawn from the set of possible strings over a given alphabet.

2. How are character strings used in scientific research?

Character strings are commonly used in scientific research as a way to represent and analyze data, particularly in fields such as genetics, linguistics, and computer science. They can be used to study patterns and relationships between different character sequences, and can also be manipulated and transformed for statistical analysis.

3. How are character strings generated?

Character strings can be generated using various methods, such as through random sampling or by following a specific pattern or rule. In scientific research, character strings may be generated through simulations or by extracting data from existing sources.
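
For instance, here is a minimal sketch of generation by uniform random sampling (not part of the original FAQ; the alphabet, length, and seed are illustrative):

[code]
import random

def random_string(alphabet='TH', length=10, seed=None):
    """Draw each character independently and uniformly from the alphabet."""
    rng = random.Random(seed)
    return ''.join(rng.choice(alphabet) for _ in range(length))

print(random_string())         # e.g. 'THHTHTTHHT'
print(random_string(seed=42))  # reproducible draw for simulations
[/code]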

4. What are some challenges in using character strings as random variables?

One challenge in using character strings as random variables is ensuring that the data is truly random and not biased in any way. This can be especially difficult when working with large datasets. Additionally, the interpretation and analysis of character strings can be complex, requiring specialized knowledge and techniques.

5. How do character strings contribute to our understanding of randomness?

Character strings play a crucial role in our understanding of randomness by representing and analyzing data that exhibits random characteristics. Through the study of character strings, scientists can gain insights into the patterns and probabilities of random events, which can inform our understanding of complex systems and phenomena.
