Expected Value and Variance for Wilcoxon Signed-Rank Test

In summary, the expected value and variance of the Wilcoxon signed-rank statistic W are used to judge the statistical significance of a result through a normal approximation. Under the null hypothesis, the expected value is n(n+1)/4, which is half the sum of the ranks 1 through n, and the variance is n(n+1)(2n+1)/24, which is one quarter of the sum of the squared ranks. The observed W is standardized with these two values and compared with the standard normal distribution.
  • #1
Mogarrr
Using a normal approximation method for the Wilcoxon Signed-Rank Test, I've seen that the expected value is [itex] \mu = \frac {n(n+1)}2 [/itex] and the variance is [itex] \sigma^2 = \frac {n(n+1)(2n+1)}{24} [/itex].

I'm wondering why these are the expected value and variance.

I do recognize the formulas for the sum of the first N natural numbers and the sum of the squares of the first N natural numbers.

I have an idea as to why the expected value is half the sum of the first N natural numbers. Under the null hypothesis, roughly half of the differences should be positive, so it would make sense to halve the sum of the first N natural numbers.

I have no intuition for the variance of the distribution.

An explanation would be appreciated.
 
  • #2
Are you sure you're right about the expected value?

The statistic used in a signed-rank test is

[itex]W=\sum_{i=1}^{n}I_iR_i[/itex]

where [itex]R_i[/itex] is the rank of [itex]|x_i-y_i|[/itex] among the [itex]n[/itex] absolute differences, and [itex]I_i[/itex] is an indicator variable equal to [itex]0[/itex] if [itex]x_i-y_i[/itex] is negative and equal to [itex]1[/itex] otherwise, for pairs [itex](x_i,y_i)[/itex] drawn from the continuous distributions of the random variables [itex]X_i[/itex] and [itex]Y_i[/itex].

Now, note that

[itex]W=\sum_{i=1}^{n}I_iR_i[/itex]

has, under the null hypothesis, the same distribution as

[itex]U=\sum_{i=1}^{n}U_i[/itex],

where the [itex]U_i[/itex] are independent with [itex]P(U_i=0)=P(U_i=i)=0.5[/itex], since both [itex]W[/itex] and [itex]U[/itex] are sums of random subsets of [itex]1,2,...,n[/itex].

In other words, the equal chances of falling on either a negative or a positive difference are equivalent to the equal chances of being included in the sum or not.

Therefore,

[itex]E(W)=E(U)=\sum_{i=1}^{n}E(U_i)=\sum_{i=1}^{n}\left[0\cdot\frac{1}{2}+i\cdot\frac{1}{2}\right]=\frac{1}{2}\sum_{i=1}^{n}i[/itex]

And we know that

[itex]\sum_{i=1}^{n}i=\frac{n(n+1)}{2}[/itex],

Therefore,

[itex]E(W)=\frac{n(n+1)}{4}[/itex].
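
As a quick sanity check of the argument so far, the claim that W and U have the same mean can be simulated in Python. This is only a minimal sketch under stated assumptions: the function names, the choice of n = 8, the seed, and the use of standard normal differences (symmetric about zero, so the null hypothesis holds) are all made up for illustration.

```python
import random

def signed_rank_statistic(diffs):
    """W: sum of the ranks of |d_i| over the non-negative differences."""
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if diffs[i] >= 0)

def subset_sum_statistic(n, rng):
    """U: each integer i in 1..n is included with probability 1/2."""
    return sum(i for i in range(1, n + 1) if rng.random() < 0.5)

rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable
n, trials = 8, 20000

w_samples = [signed_rank_statistic([rng.gauss(0, 1) for _ in range(n)])
             for _ in range(trials)]
u_samples = [subset_sum_statistic(n, rng) for _ in range(trials)]

mean_w = sum(w_samples) / trials
mean_u = sum(u_samples) / trials
print(mean_w, mean_u, n * (n + 1) / 4)
```

Both sample means should land close to n(n+1)/4 = 18, up to simulation noise.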

Now what would you get for the variance, working with [itex]Var(W)=Var(U)[/itex] knowing the [itex]U_i[/itex] are independent?

A similar calculation would do the trick.

In fact, the results make sense because the test statistic [itex]W[/itex] ranges from a minimum of [itex]0[/itex], if all the differences are negative, to a maximum of [itex]\frac{n(n+1)}{2}[/itex], if all the differences are positive. Since everything we're working with is symmetric (each difference is equally likely to be positive or negative), the distribution of [itex]W[/itex] is symmetric about the midpoint of that range, which is its mean, [itex]\frac{n(n+1)}{4}[/itex].
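
The range and symmetry described above can be checked exactly for a small n by listing every one of the 2^n equally likely sign patterns. This is a sketch for illustration only; `exact_null_distribution` and the choice n = 6 are hypothetical, and brute force is only practical for small n.

```python
from collections import Counter
from itertools import product

def exact_null_distribution(n):
    """For each value w, count the sign patterns (out of 2**n) giving W = w."""
    counts = Counter()
    for signs in product((0, 1), repeat=n):
        # Rank i contributes to W only when its difference is positive.
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w] += 1
    return counts

n = 6
counts = exact_null_distribution(n)
w_max = n * (n + 1) // 2
mean = sum(w * c for w, c in counts.items()) / 2 ** n

# Symmetry: W = w is exactly as likely as W = w_max - w.
symmetric = all(counts[w] == counts[w_max - w] for w in range(w_max + 1))
print(min(counts), max(counts), mean)  # prints: 0 21 10.5
```

The symmetry holds because flipping every sign maps a pattern with sum w to one with sum w_max - w.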
 
  • #3
Mogarrr
Right. I wrote down the wrong number for the expected value.

So similarly, [itex] E W^2 = E U^2 = \sum_{i=1}^n \left(0 \cdot \frac 12 + i^2 \cdot \frac 12\right) = \frac 12 \sum_{i=1}^n i^2 = \frac {n(n+1)(2n+1)}{12}[/itex].

Then the variance of W is [itex] EW^2 - (EW)^2 [/itex], but this quantity doesn't seem to come out to be the variance I was given.
 
  • #4
Be careful, there's a difference between [itex]U[/itex] and [itex]U_i[/itex]!

You're treating [itex]E(U^2)[/itex] as [itex]\sum_{i=1}^{n}E(U_i^2)[/itex], but the square of a sum also contains the cross terms [itex]U_iU_j[/itex]. What does add up term by term is the variance: [itex]Var(U)=\sum_{i=1}^{n}Var(U_i)[/itex], because the [itex]U_i[/itex] are independent.

We should have:

[itex]Var(U_i) = E(U_i^2)-E^2(U_i) = \left(0^2 \cdot \frac 12 + i^2 \cdot \frac 12\right) - \left(\frac{i}{2}\right)^2= \frac {i^2}{2} - \frac{i^2}{4} = \frac{i^2}{4}[/itex]

And finally,

[itex]Var(W) = \sum_{i=1}^{n} Var(U_i) = \sum_{i=1}^{n} \frac {i^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}[/itex]

gives us the expected result.
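
For anyone who wants to double-check the algebra, the three routes to the variance (summing the per-term variances i^2/4, the closed form, and brute-force enumeration of all sign patterns) can be compared with exact fractions. This is a small sketch; the function names are hypothetical and the enumeration is only feasible for small n.

```python
from fractions import Fraction
from itertools import product

def variance_by_terms(n):
    """Sum of Var(U_i) = i^2 / 4, valid because the U_i are independent."""
    return sum(Fraction(i * i, 4) for i in range(1, n + 1))

def variance_closed_form(n):
    """n(n+1)(2n+1)/24."""
    return Fraction(n * (n + 1) * (2 * n + 1), 24)

def variance_by_enumeration(n):
    """Exact variance of W over all 2**n equally likely sign patterns."""
    values = [sum(i for i, s in zip(range(1, n + 1), signs) if s)
              for signs in product((0, 1), repeat=n)]
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / len(values)

for n in range(1, 9):
    assert variance_by_terms(n) == variance_closed_form(n) == variance_by_enumeration(n)
```

The loop finishing without an assertion error means all three agree for n = 1 through 8.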
 
  • #5
ron_vancouver
A follow-up question. For the expectation and variance of the Wilcoxon statistic, does the value of n (i.e., the number of pairs) exclude pairs whose difference is zero? Say you have 100 pairs (n = 100), but in one of the pairs the two observations have the same score, so that pair is excluded when determining ranks. To determine whether the obtained W is significant, you convert it to a z score using (W-expW)/sqrt(varW). So, in this scenario, is n = 100 or n = 99 when computing expW and varW?
 
  • #6
In most applications of the Wilcoxon test, pairs for which the difference between ##X_i## and ##Y_i## is zero are omitted from consideration, since they provide no useful information to the procedure. So ##n## counts only the pairs that remain; in your example, ##n = 99##.
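
To make the bookkeeping concrete, here is a rough Python sketch of the z score with zero differences dropped before n is counted. The function name and the sample data are hypothetical, and the sketch assumes there are no ties among the nonzero absolute differences (ties would call for averaged ranks and an adjusted variance).

```python
import math

def signed_rank_z(x, y):
    """Return (n, W, z) using the normal approximation, zeros excluded."""
    # Drop pairs with zero difference; n is the number of pairs left.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank |d_i| from 1 to n (assumes no ties among the |d_i|).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return n, w, (w - mean) / math.sqrt(var)

# Five pairs, one with a zero difference, so n is 4 rather than 5.
n, w, z = signed_rank_z([5, 7, 3, 9, 4], [4, 4, 3, 3, 6])
print(n, w, round(z, 3))
```

With this made-up data the differences are 1, 3, 0, 6, -2, so n = 4, W = 1 + 3 + 4 = 8, and z is about 1.10.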
 

FAQ: Expected Value and Variance for Wilcoxon Signed-Rank Test

1. What is the purpose of calculating expected value and variance for the Wilcoxon Signed-Rank Test?

The expected value and variance of the Wilcoxon signed-rank statistic W are used to determine the statistical significance of the results. The expected value is the mean of W under the null hypothesis, and the variance measures the spread of W around that mean under the null hypothesis. Together they give a normal approximation to the distribution of W.

2. How is the expected value calculated for the Wilcoxon Signed-Rank Test?

The expected value for the Wilcoxon Signed-Rank Test is E(W) = n(n+1)/4, where n is the number of pairs with a nonzero difference. This is the sum of the ranks 1 through n, n(n+1)/2, divided by 2, because under the null hypothesis each rank is included in W with probability 1/2.

3. What does the variance represent in the Wilcoxon Signed-Rank Test?

The variance in the Wilcoxon Signed-Rank Test represents the variability of the statistic W around its expected value under the null hypothesis, that is, the average squared deviation of W from n(n+1)/4. It describes the null distribution of W rather than the spread of the observed data, and in the absence of ties it depends only on n.

4. How is the variance calculated for the Wilcoxon Signed-Rank Test?

The variance for the Wilcoxon Signed-Rank Test is Var(W) = n(n+1)(2n+1)/24. Each rank i contributes i^2/4 independently, and the sum of i^2 for i = 1 to n is n(n+1)(2n+1)/6. When some of the absolute differences are tied, this value is usually adjusted downward with a tie correction.

5. Can the expected value and variance be used to make inferences about the population?

Yes, indirectly. The observed W is compared with its expected value under the null hypothesis by computing z = (W - E(W))/sqrt(Var(W)) and referring z to the standard normal distribution. A large |z| is evidence against the null hypothesis that the paired differences are symmetric about zero. The same normal approximation can also be used when building a confidence interval for the center of the paired differences in the population.
