An elementary confusion on discrete or continuous variable

In summary, the question asks whether a student's exam mark and family income are discrete or continuous variables. Strictly speaking, both can only take values on a grid (marks in fixed increments, income in multiples of the smallest currency unit), which suggests they are discrete. In statistical practice, however, both are usually modeled as continuous, often as approximately normal, since deviations from normality are more likely to come from skew than from discretization. Treating them as continuous opens up many powerful statistical methods. It is a judgement call, and several respondents argue that the teacher's choice is the right one.
  • #1
ssd
The question is simply posed as: "Identify the variables as discrete or continuous. 1) Mark of a student in an examination. 2) Family income."
What I think:
1) There must be a minimum gap between two consecutive marks that the examiner can assign. E.g., suppose that there are N students and we are considering percentile scores. Then the scores can take only N isolated values, 100*1/N, 100*2/N, ..., 100*N/N, which implies that the scores are discrete. On the other hand, if we assume that scores are sample observations from a merit distribution, and that distribution is continuous, then we get a different answer.
2) Family income cannot be more precise than the smallest currency unit, and hence is discrete.
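A quick sketch of the two "minimum gap" arguments above, assuming percentile scores of the form 100*k/N and income stored as a whole number of cents. The helper names `percentile_grid` and `smallest_gap` are hypothetical, chosen for illustration.

```python
from fractions import Fraction

def percentile_grid(n_students):
    """Possible percentile scores 100*k/N for k = 1..N, as exact fractions."""
    return [Fraction(100 * k, n_students) for k in range(1, n_students + 1)]

def smallest_gap(values):
    """Smallest difference between consecutive distinct values."""
    ordered = sorted(set(values))
    return min(b - a for a, b in zip(ordered, ordered[1:]))

grid = percentile_grid(25)
print(len(grid))                        # 25 possible scores
print(float(grid[0]), float(grid[-1]))  # 4.0 100.0
print(smallest_gap(grid))               # 4, i.e. 100/N

# Income recorded in cents: store it as an integer count of the smallest unit.
incomes_in_cents = [round(x * 100) for x in (41250.10, 41250.11, 98000.00)]
print(smallest_gap(incomes_in_cents))   # 1 (one cent)
```

The gap shrinks as 100/N when the class grows, which is why the discreteness is real but becomes practically invisible for large N.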

My confusion is that my daughter's college professor, in their introductory Statistics class, unequivocally classifies both variables as continuous.
 
  • #2
I think it's a touch ambiguous.

Consider income. This is typically thought of as a real valued (read: not just rationals) random variable or parameter in a model. Strictly speaking every income I've heard of is countable -- in the US, simply in dollars and cents. But for statistical modeling you tend to think of it as real valued. Having access to ##\mathbb R## makes it a bit easier to model in a regression -- no issues with taking a 2 norm or what have you. Not a totally satisfying answer perhaps.
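To make the regression point concrete, here is a small sketch with simulated data (the schooling/income relationship, sample size and noise level are all made up for illustration). It fits ordinary least squares, which minimizes the squared 2-norm of the residuals, once on real-valued incomes and once on the same incomes rounded to the cent.

```python
import random

def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = a + b*x (minimizes the sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

random.seed(0)
# Hypothetical data: years of schooling vs. annual income in dollars.
schooling = [random.uniform(8, 20) for _ in range(500)]
income = [15000 + 3000 * s + random.gauss(0, 8000) for s in schooling]

a_real, b_real = ols_slope_intercept(schooling, income)
# The same incomes rounded to the nearest cent, i.e. the "discrete" version.
income_cents = [round(y, 2) for y in income]
a_cent, b_cent = ols_slope_intercept(schooling, income_cents)

print(f"slope (real-valued):      {b_real:.4f}")
print(f"slope (rounded to cents): {b_cent:.4f}")
```

The two slopes differ by far less than their sampling error, which is the practical sense in which cent-level discreteness does not matter for this kind of model.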

For grading, it depends on how the grades are given. Certainly this is discrete if no partial credit is given and tests are scored in a remotely familiar way as Correct or Incorrect (read: Bernoulli, aka a 0-1 proposition). However, partial credit is presumably allowed per question, which is interpreted as being real-valued. I don't really think you'd have irrationals for partial credit, but in principle I couldn't rule it out. And again, it makes modeling a bit easier for regression.

- - -
edit: for avoidance of doubt: for the no partial credit case, I meant 0-1 scoring per question. Most tests I've seen allow for some partial credit per question.
 
  • #3
Agree with you. For all advanced studies or inference, they are taken as continuous without problem. But in an introductory discussion, one may consider the other aspects of the idea to get a complete picture.
 
  • #4
ssd said:
1) There must be a minimum gap between two consecutive marks that the examiner can assign. E.g., suppose that there are N students and we are considering percentile scores. Then the scores can take only N isolated values, 100*1/N, 100*2/N, ..., 100*N/N, which implies that the scores are discrete. On the other hand, if we assume that scores are sample observations from a merit distribution, and that distribution is continuous, then we get a different answer.
2) Family income cannot be more precise than the smallest currency unit, and hence is discrete.
By this logic every number you could enter into a computer would be discrete. There is always a smallest difference in computer numbers. Similarly any measuring device has a smallest detectable difference.
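A minimal Python illustration of this point (it needs Python 3.9+ for `math.nextafter` and `math.ulp`): ordinary double-precision floats, the usual stand-in for real numbers in statistical software, have a smallest gap between neighbours, and that gap depends on magnitude.

```python
import math
import sys

# Gap between 1.0 and the next representable double (the machine epsilon).
gap_at_one = math.nextafter(1.0, 2.0) - 1.0
print(gap_at_one == sys.float_info.epsilon)  # True
print(gap_at_one == 2.0 ** -52)              # True

# The gap grows with magnitude: near a million it is 2**-33 (about 1.16e-10).
print(math.ulp(1_000_000.0) == 2.0 ** -33)   # True
```

Nobody calls a regression computed in doubles "discrete", which is the sense of the analogy.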

Many things in statistics are not about some underlying reality, but about your model. They are arbitrary choices that the statistician can make to model the problem, and some choices are better than others.

In this case, both grades and income should clearly be treated as continuous. For both it would make sense to assume that they are normally distributed rather than some discrete distribution. Deviations from normality are likely to be due to skew rather than due to discretization. Assuming continuous variables opens up a lot of statistical methods that are both relevant and powerful.

The teacher’s choice is clearly the right choice and it is the choice that any reasonable statistician would make. It is a judgement call, and the teacher is teaching the students which is the right call in this case.
 
  • #5
I was trying to go by the definition. The question is not asking "what should we assume for convenience of inference?" Logic, yes, is the basis of my understanding. Photons exist as very small discrete packets, which has both answered and raised big questions; yet light, "chosen" as continuous waves, simplified many things at one point in time. For that matter, no model in Statistics is really arbitrary, nor does any model in Statistics exist without a connection to the underlying reality (as the derivations in textbooks up to the postgraduate level show, so far as I have seen). We only make some assumptions in our models so that we can handle them mathematically. Later researchers relax the assumptions one by one to move closer to reality.

Coming back to the original question, let me give one example that will further explain my question. There are tests taken by a large number of students in some countries, where 5 alternative answers are given for each question. A student has to blacken the circle corresponding to the right answer. 1 point is given for a right answer and 0 points for a wrong one. An optical device reads the answers and shows the mark obtained. This mark is obviously discrete.
My point was that, without a proper explanation of the marking process, the mark cannot be labeled a priori as continuous.
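A small simulation of the optical-reader scheme described above, assuming a 40-question test and students who guess uniformly among the 5 alternatives (both hypothetical choices); `machine_mark` is a hypothetical helper, not any real scanner's API.

```python
import random
from collections import Counter

def machine_mark(answers, key):
    """Score one answer sheet: 1 point per correct bubble, 0 otherwise."""
    return sum(1 for a, k in zip(answers, key) if a == k)

random.seed(1)
n_questions = 40      # hypothetical test length
choices = "ABCDE"     # 5 alternatives per question
key = [random.choice(choices) for _ in range(n_questions)]

# Hypothetical students who guess uniformly at random on every question.
marks = [
    machine_mark([random.choice(choices) for _ in range(n_questions)], key)
    for _ in range(1000)
]

# Every observed mark is a whole number between 0 and n_questions.
print(sorted(Counter(marks).items()))
```

Under these assumptions each mark is a sum of 40 independent 0/1 outcomes, i.e. a Binomial(40, 0.2) count, which is a genuinely discrete variable. Whether to model it as continuous is the separate question taken up in the replies.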
 
  • #6
ssd said:
Coming back to the original question, let me give one example that will further explain my question. There are tests taken by a large number of students in some countries, where 5 alternative answers are given for each question. A student has to blacken the circle corresponding to the right answer. 1 point is given for a right answer and 0 points for a wrong one. An optical device reads the answers and shows the mark obtained. This mark is obviously discrete.
My point was that, without a proper explanation of the marking process, the mark cannot be labeled a priori as continuous.

I really cringe at the idea of calling the sum (convolution) of a finite number of Bernoullis continuous. Bernoullis (coin tosses) are sort of the canonical example of discrete. Of course, if a large number are involved, we can approximate the sum with a Gaussian.
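The Gaussian approximation to a sum of Bernoullis (the De Moivre-Laplace theorem) can be checked numerically. This sketch compares the exact binomial CDF with a normal approximation using a 0.5 continuity correction, and flags a common textbook rule of thumb (np ≥ 5 and n(1 − p) ≥ 5; some texts use 10 instead). The function names and the threshold are illustrative choices, not a standard API; `math.comb` needs Python 3.8+.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), i.e. a sum of n Bernoulli(p) marks."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """De Moivre-Laplace approximation of P(X <= k) with a 0.5 continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def max_cdf_error(n, p):
    """Largest absolute gap between the exact and approximate CDFs over k = 0..n."""
    return max(abs(binom_cdf(k, n, p) - normal_approx_cdf(k, n, p)) for k in range(n + 1))

def rule_of_thumb_ok(n, p, threshold=5):
    """A common textbook heuristic: np >= threshold and n(1 - p) >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# p = 0.2 mimics guessing on 5-alternative questions.
for n in (10, 100, 400):
    print(n, rule_of_thumb_ok(n, 0.2), round(max_cdf_error(n, 0.2), 4))
```

The error shrinks as the number of questions grows, which is the sense in which a large sum of Bernoullis can be treated as approximately normal, while a short test with small np is where the approximation is visibly off.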

I think the key point is that this is a statistics course not a probability course. Maybe it's better to just tweak the question, and instead of asking whether something is discrete or continuous, just ask -- can we model it as continuous? If no, then why not?

Btw, in an awful lot of cases the "continuous vs discrete" label is kind of a misnomer -- what people are really getting at is something closer to cardinal vs ordinal data.
 
  • #7
Sounds absolutely meaningful.
 
  • #8
ssd said:
There are tests taken by a large number of students in some countries, where 5 alternative answers are given for each question. A student has to blacken the circle corresponding to the right answer. 1 point is given for a right answer and 0 points for a wrong one. An optical device reads the answers and shows the mark obtained. This mark is obviously discrete.
No, even this would be best handled as a continuous variable. You would generally model the performance of a student on the test as a normally distributed random variable, meaning it is continuous. Hence common terms like “grading on the curve”.

Furthermore, tests are often seen as an instrument that measures some underlying trait of interest. That trait is frequently continuous, despite the finite resolution of the measurement instrument.
 
  • #9
How we may handle the data is irrelevant in the context of the question, as mentioned earlier. The question is not "what you would generally assume". The question is 'what is what' by definition. Assuming a variable to be normally distributed does not make it continuous; you have just assumed it to be continuous. You are focusing on inference making or graduation. Even then, we do not "generally" get a normal variate from a binomial: the conditions of the De Moivre-Laplace limit theorem have to hold.
 
  • #10
To an obsessive purist, both are discrete. But it is really a matter of convenience. It depends on whether a person wants to handle the probabilities one by one as discrete values or in ranges. If the student scores are "A, B, C, D, F", then virtually everyone would treat them as discrete, but if they are integers 0..100, then virtually everyone would treat them as continuous ranges.
 
  • #11
ssd said:
How we may handle the data is irrelevant in the context of the question, as mentioned earlier. The question is not "what you would generally assume". The question is 'what is what' by definition.
You and your daughter were trying to understand why the professor answered that they are continuous. In which case your approach is clearly wrong since it comes to the wrong conclusion.

If you wish to try to justify your/her mistake then by all means proceed with your approach. However, the professor is teaching statistics, and is right to treat both variables as continuous in a statistics class. So if your daughter wishes to learn statistics and improve her scores on future assignments then she needs to take a different focus than what you are suggesting.

The professor is clearly not asking the question you are insisting should be discussed. You are doing your daughter a disservice by trying to get justification for your answer instead of trying to understand the professor’s answer.

ssd said:
Assuming a variable to be normally distributed does not make it continuous; you have just assumed it to be continuous.
On the contrary, a variable isn’t something that exists in the real world. It is a part of your model, so if you define it to be continuous by your model’s assumption then it is in fact continuous.

The question you are getting at is whether or not the model thus defined is a good model, meaning does it accurately predict the results of experiments and is it simple. Treating test scores as continuous usually simplifies the model, and produces results that are as accurate as a discrete assumption. Thus it is the better choice, as indicated by the professor.

I am firmly in agreement with the professor on this. I wish your daughter the best of luck.
 
  • #12
They are continuous. Continuous means they can take on any value within their support. Your assumption that grades take discrete values of step size 100/N isn't accurate or justifiable. Suppose you have 25 students in a class, with a total grade of 100. These 25 students can have grades between 0 and 100, and not necessarily any of the grades 0, 4, 8, ..., 100, only. Probably they would follow a normal distribution with a mean that equals the average grade. If you divide the grades into categories like A, B, C, D, E, and F, then yes, it is a discrete random variable.
 

FAQ: An elementary confusion on discrete or continuous variable

What is the difference between discrete and continuous variables?

Discrete variables are those that can only take on specific values, while continuous variables can take on any value within a certain range. For example, the number of children in a family is a discrete variable, while the weight of a person is a continuous variable.

How can I identify if a variable is discrete or continuous?

A variable is discrete if its possible values can be counted, i.e. there is a finite or countably infinite number of them. A variable is continuous if it can take on any value within an interval, and it is typically recorded to as many decimal places as the measurement allows.

Can a variable be both discrete and continuous?

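Not as a pure type, but mixed cases exist: a distribution can combine both, e.g. daily rainfall has a point mass at exactly zero and is continuous above it. And, as the thread above shows, the same quantity (exam marks, income) can often reasonably be modeled as either discrete or continuous, depending on the purpose of the analysis.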

Why is it important to understand the difference between discrete and continuous variables?

Understanding the difference between discrete and continuous variables is important in order to correctly analyze and interpret data. Different statistical methods and tests are used for each type of variable, so misclassifying a variable could lead to incorrect conclusions.

What type of variable is time?

Time is typically considered a continuous variable, as it can be measured in arbitrarily small intervals. However, it can also be discretized into intervals, such as hours or minutes, depending on the analysis being conducted.
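As a sketch of that discretization, assuming durations measured in seconds and a floor-to-the-minute binning rule (one of several reasonable choices; rounding to the nearest minute would be another). The function name `to_minute_bin` and the sample durations are hypothetical.

```python
import math

def to_minute_bin(seconds):
    """Map a continuous duration in seconds to a whole-minute bin (floor)."""
    return math.floor(seconds / 60)

durations = [12.7, 59.999, 60.0, 61.5, 3599.2]  # hypothetical measurements
print([to_minute_bin(s) for s in durations])    # [0, 0, 1, 1, 59]
```

The underlying quantity is unchanged; only the recorded variable becomes discrete, which is the same distinction the thread draws for marks and income.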
