His_Dudeness3
Probability and Statistics Questions!
Hey everyone, I've got a Statistics project due on Wednesday and I've got it all pretty much done except for a couple of questions. The ones I've got a problem with are in bold.
Question 1.
Airlines usually over-book the seats on an aircraft by a certain margin because they know from experience that some people change or do not show for their scheduled flight. Data collected for a particular Melbourne–Darwin flight showed that, on average, 230 people (with a standard deviation of 30) did arrive for their scheduled flight. The data followed a normal distribution. The aircraft has seats for 305 passengers.
a) What are the z-scores for 150, 200 and 250 arrivals?
b) What is the z-score that represents a completely filled flight? What is the probability that a randomly selected flight has enough seats available for all the people who turn up?
c) This particular flight goes every day. During one year of operation, how many times would you expect there to be more passengers than available seats? Justify your answer.
d) What proportion of flights have less than half the seats occupied?
Question 2.
Crickets make a characteristic chirping sound by rapidly rubbing their wing covers together. Researchers decided to investigate the relationship between the temperature and the frequency of the chirps. They obtained the following data from observing a particular type of cricket:
Chirps per second    Temperature (degrees Celsius)
14.5 22.1
15.5 23.2
20.0 32.1
18.5 28.0
16.4 25.1
19.7 33.0
17.1 26.3
15.8 24.8
16.7 25.1
15.9 23.9
17.0 27.2
17.8 27.6
18.9 31.0
18.1 29.7
a) Decide which one is the explanatory (predictor) variable and which is the response variable and produce the scatterplot with a line of best fit and the r-squared value (include this in your answer).
b) If the temperature is 33°C, how often would you estimate the cricket is chirping?
c) Suppose a cricket is chirping 1050 times a minute. Use the line you have found to estimate the temperature at that time. Is this line the best for estimating temperature?
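A minimal sketch of the least-squares fit for part (a) and the prediction in part (b), in plain Python so nothing beyond the standard library is assumed. It treats temperature as the explanatory variable x and chirp rate as the response y, as in the answers further down; the helper name `least_squares` is hypothetical. The scatterplot itself would still need a plotting tool.

```python
# Data from Question 2: (chirps per second, temperature in degrees Celsius)
data = [
    (14.5, 22.1), (15.5, 23.2), (20.0, 32.1), (18.5, 28.0), (16.4, 25.1),
    (19.7, 33.0), (17.1, 26.3), (15.8, 24.8), (16.7, 25.1), (15.9, 23.9),
    (17.0, 27.2), (17.8, 27.6), (18.9, 31.0), (18.1, 29.7),
]
chirps = [c for c, _ in data]
temps = [t for _, t in data]


def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# (a) temperature is the explanatory variable (x), chirp rate the response (y)
slope, intercept, r2 = least_squares(temps, chirps)
print(f"chirps = {slope:.4f} * temp + {intercept:.3f},  r^2 = {r2:.3f}")

# (b) predicted chirp rate at 33 degrees Celsius
pred_33 = slope * 33 + intercept
print(f"predicted chirps/second at 33 C: {pred_33:.1f}")
```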
Question 3.
The heights of a group of fathers have mean 175 cm and SD 5 cm. The sons' heights have mean 180 cm and SD 6 cm. The correlation between father and son heights is 0.5.
a) If a father has a height of 184 cm, estimate the height of his son.
b) How tall should a father be for the estimated height of his son to be the same?
c) Why doesn’t regression to the mean imply that we all end up with the same height?
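One way to see the point of part (c) is a small simulation. The question only gives means, SDs and the correlation, so the sketch below assumes father and son heights are bivariate normal with those values; the seed and sample size are arbitrary choices. The regression estimates for sons are pulled towards 180 cm (their SD is 0.5 * 6 = 3 cm), but individual sons scatter around those estimates, so the spread of the sons' heights stays at about 6 cm.

```python
import math
import random

MF, SF = 175, 5   # fathers: mean and SD in cm (Question 3)
MS, SS = 180, 6   # sons: mean and SD in cm
R = 0.5           # father-son correlation

random.seed(1)    # arbitrary seed, only for reproducibility
N = 20_000        # arbitrary number of simulated father-son pairs


def sd(values):
    """Population standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


predicted, sons = [], []
for _ in range(N):
    father = random.gauss(MF, SF)
    # Regression estimate for this father's son: pulled towards the sons' mean
    estimate = MS + R * (SS / SF) * (father - MF)
    # Individual sons scatter around the estimate; this residual SD keeps Var(son) = SS**2
    son = estimate + random.gauss(0, SS * math.sqrt(1 - R ** 2))
    predicted.append(estimate)
    sons.append(son)

print(f"SD of regression estimates: {sd(predicted):.2f} cm")  # close to R * SS = 3
print(f"SD of simulated sons:       {sd(sons):.2f} cm")       # close to SS = 6
```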
Question 4.
Suppose 95% of hypertensives (high blood pressure) and 20% of normotensives are classified as hypertensive by a blood-pressure machine. Given that 20% of the population are hypertensive, what is the probability that someone classified as hypertensive by the machine really is hypertensive?
Question 5.
Suppose an influenza epidemic strikes a city and 1000 two-parent families are surveyed. In 10% of families the mother gets the disease (including the possibility that the father does too), in 10% the father gets it (including the possibility that the mother does too), and in 2% both do.
a) What is the marginal distribution for disease among the parents?
b) What is the relative frequency the father gets the disease given the mother does?
c) What is the probability neither father nor mother gets the disease?
Here are the answers I've got so far (including ones I don't think I need help with):
1. (a) z(150) = -2.667, z(200) = -1, z(250) = 0.667
(b) z(completely filled flight) = 2.50; proportion of flights with enough seats for everyone who shows up = 0.9938 (I got this from the proportion with z < 2.50 in the normal distribution table).
(c) Proportion of flights without enough seats = 1 - 0.9938 = 0.0062. I then multiplied this by 365 to get the number of flights per year that don't have enough seats, which equals 2.263 flights/year (~3 flights/year).
(d) I got the z-score for this as -2.60 (I used 152 as the raw score, since the question says 'less than half the seats occupied'), and the proportion for this is 0.0047 (~2 flights/year).
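The table look-ups in Question 1 can be cross-checked with `statistics.NormalDist` from Python's standard library (3.8 or later). This is a sketch under the normal model the question states. The cut-off for part (d) is computed both at 152, as in the answer above, and at 152.5 (exactly half of 305), since the two give slightly different proportions. The result of part (c) is an expected count, i.e. a long-run average, so it need not be a whole number of flights.

```python
from statistics import NormalDist

MEAN, SD, SEATS = 230, 30, 305   # from Question 1
arrivals = NormalDist(MEAN, SD)  # normal model for daily arrivals, as the question states
DAYS = 365                       # one flight per day for a year


def z_score(x: float) -> float:
    """Standardise a number of arrivals."""
    return (x - MEAN) / SD


# (a) z-scores for 150, 200 and 250 arrivals
z_a = {x: z_score(x) for x in (150, 200, 250)}

# (b) a completely full flight is 305 arrivals; P(arrivals <= 305)
z_full = z_score(SEATS)
p_enough = arrivals.cdf(SEATS)

# (c) expected number of overbooked days in a year
p_over = 1 - p_enough
expected_over = DAYS * p_over

# (d) fewer than half the seats filled: cut-off 152 (as in the post) or 152.5 (exact half)
p_half_152 = arrivals.cdf(152)
p_half_1525 = arrivals.cdf(SEATS / 2)

print({x: round(z, 3) for x, z in z_a.items()})
print(f"z(full) = {z_full:.2f}, P(enough seats) = {p_enough:.4f}")
print(f"P(overbooked) = {p_over:.4f}, expected per year = {expected_over:.2f}")
print(f"P(< half full): {p_half_152:.4f} (x = 152), {p_half_1525:.4f} (x = 152.5)")
```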
2. (a) chirps/second = response variable, temperature = explanatory variable.
Regression line: y=0.4713x+4.517
(b) Using regression line, I got y= 20.1 chirps/second
(c) I converted 1050 chirps/minute to 17.5 chirps/second by dividing by 60, substituted this into the regression line, and got a temperature of 27.5 degrees Celsius. I said that, because the R^2 value wasn't greater than 0.99, the line can't give an accurate estimate. I was wondering whether I should also do a residual plot for the linear relationship to check whether this is the best line to estimate from.
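On the last part of (c): the line above is the least-squares line for predicting chirps from temperature, so it minimises errors in the chirp rate, not in temperature. Regressing temperature on chirp rate gives the least-squares line for predicting temperature. Both lines pass through the point of means (mean temperature, mean chirp rate) and diverge away from it unless r = ±1. The sketch below (plain Python; the helper name `fit` is hypothetical) computes the estimate both ways; because 17.5 chirps/second is close to the sample mean of about 17.3, the two estimates should be close here.

```python
# Same Question 2 data: (chirps per second, temperature in degrees Celsius)
data = [
    (14.5, 22.1), (15.5, 23.2), (20.0, 32.1), (18.5, 28.0), (16.4, 25.1),
    (19.7, 33.0), (17.1, 26.3), (15.8, 24.8), (16.7, 25.1), (15.9, 23.9),
    (17.0, 27.2), (17.8, 27.6), (18.9, 31.0), (18.1, 29.7),
]
chirps = [c for c, _ in data]
temps = [t for _, t in data]


def fit(x, y):
    """Least-squares (slope, intercept) for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


rate = 1050 / 60  # 1050 chirps per minute -> chirps per second

# Approach used in the answer: invert the chirps-on-temperature line
b, a = fit(temps, chirps)
temp_inverted = (rate - a) / b

# Alternative: fit temperature directly on chirp rate
b_rev, a_rev = fit(chirps, temps)
temp_reverse = a_rev + b_rev * rate

print(f"inverted chirps-on-temperature line: {temp_inverted:.2f} C")
print(f"temperature-on-chirps line:          {temp_reverse:.2f} C")
```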
3. (a) Using y(estimate) = y(mean) + r * (Sx/Sy) * (x - x(mean)), I got y(estimate) = 183.8 cm.
(b) For the son's estimated height to equal the father's height, I set y(estimate) = x and solved for x, getting x = 183.6 cm.
(c) I said that regression to the mean (i.e. regression or correlation) doesn't imply or explain causality, only how one variable varies with respect to another. Thus, a son's height won't be affected solely by his father's height.
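For comparison, the usual form of the regression estimate multiplies r by the SD of the response over the SD of the explanatory variable: y(estimate) = y(mean) + r * (Sy/Sx) * (x - x(mean)), where y is the son's height (Sy = 6) and x the father's (Sx = 5). The sketch below evaluates parts (a) and (b) with that ratio (6/5) and with the ratio as written in the answer above (5/6), so the effect of the choice is easy to see; the helper names are hypothetical.

```python
MEAN_F, SD_F = 175, 5    # fathers (explanatory variable x), from Question 3
MEAN_S, SD_S = 180, 6    # sons (response variable y)
R = 0.5                  # father-son correlation


def estimate_son(father: float, ratio: float) -> float:
    """Regression estimate y_hat = mean_y + r * ratio * (x - mean_x)."""
    return MEAN_S + R * ratio * (father - MEAN_F)


def fixed_point(ratio: float) -> float:
    """Father height x whose estimate equals x: solve x = mean_y + b * (x - mean_x)."""
    b = R * ratio
    return (MEAN_S - b * MEAN_F) / (1 - b)


standard = SD_S / SD_F   # SD of response over SD of explanatory variable (Sy/Sx)
as_posted = SD_F / SD_S  # ratio as written in the answer above (Sx/Sy)

for label, ratio in [("Sy/Sx", standard), ("Sx/Sy", as_posted)]:
    print(f"{label}: (a) {estimate_son(184, ratio):.2f} cm, (b) {fixed_point(ratio):.2f} cm")
```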
4. I used this convention: C+= people classified as hypertensive, C-= people classified as normotensive, H+= people who are actually hypertensive, H-= people who are actually normotensive. I used a table to display the data:
        H+    H-    Total
C+      19    16     35
C-       1    64     65
Total   20    80    100
Thus, I got the proportion of people who are hypertensive, given they're classified as hypertensive, as 19/35 (0.543).
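The table is Bayes' theorem worked out with counts per 100 people. The same calculation directly from the stated probabilities is sketched below; the variable names are illustrative only.

```python
p_h = 0.20              # P(H+): proportion of the population that is hypertensive
p_c_given_h = 0.95      # P(C+ | H+): hypertensives classified as hypertensive
p_c_given_not_h = 0.20  # P(C+ | H-): normotensives wrongly classified as hypertensive

# Law of total probability: overall chance of being classified hypertensive
p_c = p_c_given_h * p_h + p_c_given_not_h * (1 - p_h)

# Bayes' theorem: P(H+ | C+)
p_h_given_c = p_c_given_h * p_h / p_c
print(f"P(C+) = {p_c:.2f}, P(H+ | C+) = {p_h_given_c:.3f}")
```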
5. This one was a real doozy, especially when it came to interpreting the data. I interpreted it as Pr(father gets the disease) = 10% and Pr(mother gets the disease) = 10%.
(a) I set up the distribution as follows:
A= Mother Gets Disease, A*= Mother doesn't get disease, B= Father gets disease, B*= Father doesn't get disease
        A      A*     Total
B       0.02   0.08   0.10
B*      0.08   0.82   0.90
Total   0.10   0.90   1.00
(b) I got the probability for this using Pr(AB)/Pr(B), which gave 0.20.
(c) from the table, Pr(A*B*)= 0.82
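A sketch of the same 2x2 table built from the three given percentages, using inclusion-exclusion for the "neither" cell; the row and column totals are the marginal distributions asked for in part (a). In part (b) the condition is that the mother is ill, so the denominator is Pr(A), the mother's marginal; here Pr(A) = Pr(B) = 0.10, so it gives the same 0.20 as the answer above.

```python
p_mother = 0.10   # Pr(A): mother gets the disease (father may too)
p_father = 0.10   # Pr(B): father gets the disease (mother may too)
p_both = 0.02     # Pr(A and B)

# Fill in the 2x2 joint distribution from the marginals and the overlap
joint = {
    ("A", "B"): p_both,
    ("A", "B*"): p_mother - p_both,
    ("A*", "B"): p_father - p_both,
    ("A*", "B*"): 1 - p_mother - p_father + p_both,  # inclusion-exclusion
}

# (b) father gets it given the mother does: condition on A, so divide by Pr(A)
p_father_given_mother = p_both / p_mother

# (c) neither parent gets it
p_neither = joint[("A*", "B*")]

for cell, p in joint.items():
    print(cell, round(p, 2))
print(f"Pr(B | A) = {p_father_given_mother:.2f}, Pr(A* and B*) = {p_neither:.2f}")
```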
Any help is greatly appreciated! Oh yeah, sorry about the dodgy KV maps for questions (4) and (5)(a); no matter how I manipulate them, the characters just go straight to the margin. And for future reference, where should I post homework questions on statistics? Thanks guys