Finding standard deviation of combination of data

  • #1
songoku
Homework Statement
Group A has standard deviation of 10 and group B has standard deviation of 20. If group A has 150 data and group B has 100 data, what is the standard deviation of A + B?
Relevant Equations
##\sigma^2=\frac{1}{n}\left(\Sigma x^2 -\frac{(\Sigma x)^2}{n}\right)##
I tried some working but it got me nowhere. I just want to ask whether this question is solvable, i.e. whether the answer can be a numerical value. If yes, I want to try a bit more by myself before asking for a hint here.

Thanks
 
  • #2
To be clear, by A+B, I assume you mean some set of data ##\{ c_i = a_i + b_i | a_i \in A,\ b_i \in B\}##.
In that case, the correlation between the ##a_i##s and associated ##b_i##s must be considered.
The general equation is Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##) +2 Cov(##X_A,\ Y_B##).
For uncorrelated random variables, ##X_A## and ##Y_B##, this becomes Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##)
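
As a quick numerical check of the general equation above, here is a minimal sketch on paired data. The two short lists and the helper names `pvar` and `pcov` are made up for illustration; they use the population (divide by n) convention from the Relevant Equations.

```python
def mean(xs):
    return sum(xs) / len(xs)

def pvar(xs):
    """Population variance (divisor n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pcov(xs, ys):
    """Population covariance of paired data (divisor n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired samples: a[i] is matched with b[i].
a = [1.0, 2.0, 4.0, 7.0]
b = [3.0, 1.0, 5.0, 2.0]
c = [x + y for x, y in zip(a, b)]

lhs = pvar(c)                                # Var(A + B)
rhs = pvar(a) + pvar(b) + 2 * pcov(a, b)     # Var(A) + Var(B) + 2 Cov(A, B)
print(lhs, rhs)
```

Both sides agree for any pairing, but the covariance term changes if the elements of `b` are matched with different elements of `a`, which is why the pairing has to be known.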
 
  • #3
FactChecker said:
To be clear, by A+B, I assume you mean some set of data ##\{ c_i = a_i + b_i | a_i \in A,\ b_i \in B\}##.
In that case, the correlation between the ##a_i##s and associated ##b_i##s must be considered.
The general equation is Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##) +2 Cov(##X_A,\ Y_B##).
For uncorrelated random variables, ##X_A## and ##Y_B##, this becomes Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##)
Ah I see, so basically this question does not really make sense, because the number of data points in each group is not the same, so in A + B some data in A would have no match in B.

If the question is modified to ask for the standard deviation when the data in A are combined with the data in B (so there are now 250 data points in total), can we solve it? This is actually the version I tried and got stuck on, so I thought maybe the information in the question is not enough.

Thanks
 
  • #4
songoku said:
Ah I see, so basically this question does not really make sense, because the number of data points in each group is not the same, so in A + B some data in A would have no match in B.

If the question is modified to ask for the standard deviation when the data in A are combined with the data in B (so there are now 250 data points in total), can we solve it? This is actually the version I tried and got stuck on, so I thought maybe the information in the question is not enough.

Thanks
It can be solved if we assume that the groups are taken from the same population and have the same mean.
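
A minimal sketch of that calculation, under the stated assumption that both groups have the same mean (the function name is made up for illustration). With equal means, the squared deviations of the merged data are just the two groups' squared deviations added together, so the combined population variance is the size-weighted average of the two variances.

```python
import math

def pooled_variance_equal_means(n_a, sd_a, n_b, sd_b):
    """Population variance of the merged data, ASSUMING both groups share one mean."""
    return (n_a * sd_a ** 2 + n_b * sd_b ** 2) / (n_a + n_b)

# Numbers from the homework statement: 150 data with SD 10, 100 data with SD 20.
var_c = pooled_variance_equal_means(150, 10, 100, 20)
sd_c = math.sqrt(var_c)
print(var_c, sd_c)  # variance 220.0, standard deviation about 14.83
```

If the two means differ, this value is only a lower bound on the combined variance.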
 
  • #5
songoku said:
Ah I see, so basically this question does not really make sense, because the number of data points in each group is not the same, so in A + B some data in A would have no match in B.
The first problem is that the meaning of "A+B" is undefined, or at least not clear to me. Do you mean the sum of random variables, ##X_A##, from A and ##X_B##, from B? In that case, you need to know which of the A samples match up and sum with which of the B samples.
songoku said:
If the question is modified to ask for the standard deviation when the data in A are combined with the data in B (so there are now 250 data points in total), can we solve it? This is actually the version I tried and got stuck on, so I thought maybe the information in the question is not enough.
So you are talking about drawing samples of a random variable, X, from the union of A and B, ##A \cup B##. Are the samples drawn randomly uniformly from ##A \cup B##?
In that case, you should be able to use the standard equation for ##\sigma^2## that you gave above. Apply it to the entire 250 elements. Why do you say that it didn't work?
 
  • #6
Maybe to clarify: are these samples from two populations, A and B, or do they describe the whole population of interest?
You could run a test to check whether the data come from different populations. I believe the Wilcoxon rank-sum test is one such non-parametric test.
 
  • #7
What about the property ## \sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2 ## ?
 
  • #8
Gavran said:
What about the property ## \sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2 ## ?
The OP defines A and B as sets. So A+B is not the sum of random variables. It is the sum of sets, whatever that means.
If you are talking about the sum of random variables, the formula is ##\sigma_{X+Y}^2 = \sigma_{X}^2 +\sigma_{Y}^2 + 2 cov(X,Y)##. Your "property" is wrong in general and only right for uncorrelated variables.
On the other hand, if you are talking about the union of sets, ##C=A\cup B##, with a random variable, ##X##, drawn with uniform distribution from ##C##, then it is still wrong. Consider the single-element sets ##A=\{0\}, B=\{100\}##. Clearly, ##\sigma_A = \sigma_B = 0## but ##\sigma_C = 50##.
 
  • #9
FactChecker said:
The first problem is that the meaning of "A+B" is undefined, or at least not clear to me. Do you mean the sum of random variables, ##X_A##, from A and ##X_B##, from B? In that case, you need to know which of the A samples match up and sum with which of the B samples.

So you are talking about drawing samples of a random variable, X, from the union of A and B, ##A \cup B##. Are the samples drawn randomly uniformly from ##A \cup B##?
In that case, you should be able to use the standard equation for ##\sigma^2## that you gave above. Apply it to the entire 250 elements. Why do you say that it didn't work?
I am not really sure how to interpret the question. I posted the exact question, word for word.

In my opinion, it makes more sense if the interpretation is not the sum of random variables but maybe the sum of sets. Group A has 150 data with a standard deviation of 10, and group B has 100 data with a standard deviation of 20. Let's say I combine all the data into one set, set C, so this set contains 250 data, and I want to find the standard deviation of C.

This is what I did:
For group A:
$$\sigma_{a}^{2}=\frac{1}{n_a} \left(\Sigma a^2 - \frac{(\Sigma a)^2}{n_a}\right)$$
$$100=\frac{1}{150} \left(\Sigma a^2 - \frac{(\Sigma a)^2}{150}\right)$$
$$\Sigma a^2=15000+\frac{(\Sigma a)^2}{150} \qquad (1)$$

For group B:
$$\sigma_{b}^{2}=\frac{1}{n_b} \left(\Sigma b^2 - \frac{(\Sigma b)^2}{n_b}\right)$$
$$400=\frac{1}{100} \left(\Sigma b^2 - \frac{(\Sigma b)^2}{100}\right)$$
$$\Sigma b^2=40000+\frac{(\Sigma b)^2}{100} \qquad (2)$$

For group C:
$$\sigma_{c}^{2}=\frac{1}{n_c} \left(\Sigma c^2 - \frac{(\Sigma c)^2}{n_c}\right)$$
$$=\frac{1}{250} \left(\Sigma a^2 +\Sigma b^2 - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$
$$=\frac{1}{250}\left(15000+\frac{(\Sigma a)^2}{150} + 40000+\frac{(\Sigma b)^2}{100} - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$

Then I got stuck.

Thanks
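
One way to push the last line of the working above a step further, offered as a sketch: introduce the group means ##\mu_a = \Sigma a / 150## and ##\mu_b = \Sigma b / 100## (symbols that do not appear in the original post). The three terms containing the sums then collapse into a single square:

$$\frac{(\Sigma a)^2}{150}+\frac{(\Sigma b)^2}{100}-\frac{(\Sigma a+\Sigma b)^2}{250}=150\mu_a^2+100\mu_b^2-\frac{(150\mu_a+100\mu_b)^2}{250}=\frac{150\cdot 100}{250}\left(\mu_a-\mu_b\right)^2=60\left(\mu_a-\mu_b\right)^2$$

$$\sigma_{c}^{2}=\frac{1}{250}\left(15000+40000+60\left(\mu_a-\mu_b\right)^2\right)=220+0.24\left(\mu_a-\mu_b\right)^2$$

So the combined variance depends on the unknown gap between the two means. It takes a single numerical value, 220, only when the means are assumed equal, as in post #4.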
 
  • #10
Have you tried the approach in post #4, or do you find it unreasonable?
 
  • #11
Hill said:
Have you tried the approach in post #4, or do you find it unreasonable?
Oh, I did that and got ##\sqrt{220}## as the answer. I thought FactChecker was talking about something else, not using the assumption in post #4.

Thanks
 
  • #12
songoku said:
I am not really sure how to interpret the question. I posted the exact question, word for word.

In my opinion, it makes more sense if the interpretation is not the sum of random variables but maybe the sum of sets. Group A has 150 data with a standard deviation of 10, and group B has 100 data with a standard deviation of 20. Let's say I combine all the data into one set, set C, so this set contains 250 data, and I want to find the standard deviation of C.


For group C:
$$\sigma_{c}^{2}=\frac{1}{n_c} \left(\Sigma c^2 - \frac{(\Sigma c)^2}{n_c}\right)$$
$$=\frac{1}{250} \left(\Sigma a^2 +\Sigma b^2 - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$
$$=\frac{1}{250}\left(15000+\frac{(\Sigma a)^2}{150} + 40000+\frac{(\Sigma b)^2}{100} - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$

Then I got stuck.
You are not stuck. You are done.
I have not checked your arithmetic for group C, but that is the correct approach (given certain assumptions about what your question means).
If you want to consider the combined set C as the entire population of possible values of a random variable drawn uniformly from C, then you have calculated the variance of that random variable.
If you want to consider the combined set C as the set of sample results, then you should make one change to your equation. It should be ##\sigma_{c}^{2}=\frac{1}{n_c -1} \left(\Sigma c^2 - \frac{(\Sigma c)^2}{n_c}\right)##. The divisor is reduced by 1 because the population mean is being estimated.


PS. When you combine two sets into one, IMO, you should use the union symbol, ##A \cup B##, rather than a plus sign.
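
To illustrate the divisor change described above, here is a small sketch using the sum-of-squares form of the formula. The eight data values are made up for illustration, and the results are compared with Python's standard `statistics` module.

```python
from statistics import pvariance, variance

# Hypothetical combined data set C (not from the thread).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
centered = sum_x2 - sum_x ** 2 / n    # Sigma c^2 - (Sigma c)^2 / n_c

pop_var = centered / n                # C treated as the whole population
sample_var = centered / (n - 1)       # C treated as a sample (divisor n - 1)

print(pop_var, pvariance(data))       # both are the population variance
print(sample_var, variance(data))     # both are the sample variance
```

Either way, the sums over all of C are needed, so the formula cannot be evaluated from the two standard deviations alone.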
 
  • #13
Oh ok, so that means I can't get a numerical answer.

Thank you very much for the help and explanation FactChecker, Hill, WWGD, Gavran
 
  • #14
songoku said:
Oh ok, so that means I can't get a numerical answer.

Thank you very much for the help and explanation FactChecker, Hill, WWGD, Gavran
Oh, wait! I thought that you had the values of the summations of all the elements in ##A \cup B##. Don't you have that? How did you get the means of A and B?
 
  • #15
FactChecker said:
Oh, wait! I thought that you had the values of the summations of all the elements in ##A \cup B##. Don't you have that? How did you get the means of A and B?
I posted the entire question in the OP; that's everything. I don't know the sums of the elements in ##A \cup B##, and I don't have the means of A and B.
 
  • #16
songoku said:
I posted the entire question in the OP; that's everything. I don't know the sums of the elements in ##A \cup B##, and I don't have the means of A and B.
Sorry, I misunderstood.

Interpreting A+B as ##A \cup B##:
There is no way to solve it. Consider three simpler problems, all with the same individual 0 (or undefined, if you wish) standard deviations for ##A## and ##B## but significantly different standard deviations for ##A \cup B##:
1) A={0}, B={1}. ##\sigma_{\text{sample},\,A\cup B} = 0.70710678## and ##\sigma_{\text{population},\,A\cup B} = 0.5##
2) A={0}, B={10}. ##\sigma_{\text{sample},\,A\cup B} = 7.0710678## and ##\sigma_{\text{population},\,A\cup B} = 5##
3) A={0}, B={100}. ##\sigma_{\text{sample},\,A\cup B} = 70.710678## and ##\sigma_{\text{population},\,A\cup B} = 50##

If you don't like the 0 or undefined standard deviations for single-element sets A and B, you can easily make multiple-element examples.
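
A sketch of one such multiple-element example, with made-up two-element groups chosen so that the population standard deviations match the homework values of 10 and 20. Shifting every element of B by a constant leaves B's standard deviation unchanged but changes the standard deviation of the union.

```python
from statistics import pstdev

group_a = [-10.0, 10.0]   # hypothetical group A, population SD = 10
base_b = [-20.0, 20.0]    # hypothetical group B, population SD = 20

def shifted(shift):
    """Group B with every element moved by the same constant."""
    return [x + shift for x in base_b]

def union_sd(shift):
    """Population SD of A merged with the shifted copy of B."""
    return pstdev(group_a + shifted(shift))

for shift in (0.0, 100.0):
    print(shift, pstdev(group_a), pstdev(shifted(shift)), union_sd(shift))
```

The individual standard deviations stay at 10 and 20 for both shifts, while the union's variance goes from 250 to 2750.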

Interpreting A+B as ##\{a+b| a\in A, b\in B, \text {selected independently and randomly}\}##:
Then apply ##\sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2## as @Gavran stated in post #7.
Since this is the only interpretation of A+B with a solution, it is probably the correct interpretation.
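
Under this interpretation the numbers in the question give ##\sigma_{A+B} = \sqrt{10^2 + 20^2} = \sqrt{500} \approx 22.4##. The sketch below checks the additivity on made-up two-element groups by forming every possible pair, which corresponds to picking a and b independently and uniformly.

```python
import math
from itertools import product
from statistics import pvariance

group_a = [-10.0, 10.0]   # hypothetical group A, population SD = 10
group_b = [-20.0, 20.0]   # hypothetical group B, population SD = 20

# Every (a, b) pair is equally likely when a and b are chosen independently.
pair_sums = [a + b for a, b in product(group_a, group_b)]

var_sum = pvariance(pair_sums)
var_parts = pvariance(group_a) + pvariance(group_b)
print(var_sum, var_parts)              # the two variances agree

sd_answer = math.sqrt(10 ** 2 + 20 ** 2)
print(sd_answer)                       # about 22.36
```

Note that the group sizes, 150 and 100, play no role in this interpretation.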
 
  • #17
I understand.

Thank you very much FactChecker
 
