Probability and Statistics Questions

His_Dudeness3 · Aug 25, 2008

Hey everyone, I've got a Statistics project due on Wednesday and I've got it all pretty much done except for a couple of questions. The ones I've got a problem with are in bold.

Question 1.

Airlines usually over-book the seats on an aircraft by a certain margin because they know from experience that some people change or do not show for their scheduled flight. Data collected for a particular Melbourne–Darwin flight showed that, on average, 230 people (with a standard deviation of 30) did arrive for their scheduled flight. The data followed a normal distribution. The aircraft has seats for 305 passengers.
a) What are the z-scores for 150, 200 and 250 arrivals?
b) What is the z-score that represents a completely filled flight? What is the probability that a randomly selected flight has enough seats available for all the people who turn up?
c) This particular flight goes every day. During one year of operation, how many times would you expect there to be more passengers than available seats? Justify your answer.
d) What proportion of flights have less than half the seats occupied?

Question 2.

Crickets make a characteristic chirping sound by rapidly rubbing their wing covers together. Researchers decided to investigate the relationship between the temperature and the frequency of the chirps. They obtained the following data from observing a particular type of cricket:
Chirps per second    Temperature (degrees Celsius)
14.5 22.1
15.5 23.2
20.0 32.1
18.5 28.0
16.4 25.1
19.7 33.0
17.1 26.3
15.8 24.8
16.7 25.1
15.9 23.9
17.0 27.2
17.8 27.6
18.9 31.0
18.1 29.7

a) Decide which one is the explanatory (predictor) variable and which is the response variable and produce the scatterplot with a line of best fit and the r-squared value (include this in your answer).
b) If the temperature is 33°C, how often would you estimate the cricket is chirping?
c) Suppose a cricket is chirping 1050 times a minute. Use the line you have found to estimate the temperature at that time. Is this line the best for estimating temperature?

Question 3.

The heights of a group of fathers have mean 175 cm and SD 5 cm. The sons' heights have mean 180 cm and SD 6 cm. The correlation between father and son heights is 0.5.
a) If a father has a height of 184cms, estimate the height of the son.
b) How tall should a father be for the estimated height of his son to be the same?
c) Why doesn’t regression to the mean imply that we all end up with the same height?

Question 4.

Suppose 95% of hypertensives (high blood pressure) and 20% of normotensives are classified as hypertensive by a blood-pressure machine. Given that 20% of the population are hypertensive, what is the probability that someone classified as hypertensive by the machine really is hypertensive?

Question 5.

Suppose an influenza epidemic strikes a city. 1000 2-parent families are surveyed. In 10% of families the mother gets the disease (includes the possibility the father does too), in 10% the father gets it (includes the possibility the mother does too) and in 2% both do.
a) What is the marginal distribution for disease among the parents?
b) What is the relative frequency the father gets the disease given the mother does?
c) What is the probability neither father nor mother gets the disease?

The answers I've gotten are as follows (including ones I don't think I need help with):

1. (a) z(150) = -2.667, z(200) = -1, z(250) = 0.667
(b) z(completely filled flight) = 2.50; proportion of flights having enough seats for all who show up = 0.9938 (I got this from the proportion that z < 2.50 in the normal distribution table)
(c) Proportion of flights not having enough seats = 1 - 0.9938 = 0.0062. I then multiplied this by 365 to get the number of flights per year that don't have enough seats, which equals 2.263 flights/year (~3 flights/year).
(d) I got the z-score for this as -2.60 (I used 152 as the raw score, as it states 'less than half the seats occupied'), and the proportion for this is 0.0047 (~2 flights/year).

2. (a) chirps/second= response variable, temperature= explanatory variable.
Regression line: y=0.4713x+4.517
(b) Using regression line, I got y= 20.1 chirps/second
(c) Converted 1050 chirps/minute to 17.5 chirps/sec by dividing 1050 by 60. Subbed this into the regression line and got the temperature as 27.5 degrees Celsius. I said that because the R^2 value wasn't greater than 0.99, it can't give an accurate estimate. I was wondering if I had to go on and do a residual plot for the linear relationship to see if this is the best line to estimate from.

3. (a) Using y(estimate) = y(mean) + r * (Sx/Sy) * (x - x(mean)), I got y(estimate) = 183.8cm
(b) To get the same height as the estimate, I let y(estimate) = x. I then solved for x and got x = 183.6cm for the son's estimated height and the father's actual height to be the same.
(c) I said that regression to the mean (i.e. regression or correlation) doesn't imply or explain causality, only the variability of one variable with respect to another. Thus, the son's height won't be solely affected by his father's height.

4. I used this convention: C+= people classified as hypertensive, C-= people classified as normotensive, H+= people who are actually hypertensive, H-= people who are actually normotensive. I used a table to display the data:

(values are percentages of the population)

         H+    H-    Total
C+       19    16     35
C-        1    64     65
Total    20    80    100

Thus, I got the proportion of people who really are hypertensive, given that they're classified as hypertensive, as 19/35 (0.543).

5. This one was a real doozy, especially when it came to interpreting the data. I just took the proportion of fathers getting the disease as 10% and the proportion of mothers getting the disease as 10%.
(a) I set up the distribution as follows:
A= Mother Gets Disease, A*= Mother doesn't get disease, B= Father gets disease, B*= Father doesn't get disease

         A       A*      Total
B        0.02    0.08    0.10
B*       0.08    0.82    0.90
Total    0.10    0.90    1.00
(b) I got the probability for this using Pr(AB)/Pr(A), since the condition is that the mother gets the disease, which gives 0.20
(c) from the table, Pr(A*B*)= 0.82

Any help is greatly appreciated! Oh yeah, sorry about the dodgy KV maps for questions (4) and (5)(a), no matter how I manipulate it, the characters just go straight to the margin. And as a future reference, where would I post homework questions on Statistics? Thanks guys

His_Dudeness3 · Aug 26, 2008

No one?? I knew this was a bullsh*t assignment!

Blueshift5 · Sep 2, 2008

I am not able to provide direct answers to homework questions. However, I can offer some guidance and suggestions to help you understand and solve these problems.

1. For question 1, it seems like you have a good understanding of z-scores and the normal distribution. For part c, you correctly calculated the proportion of flights not having enough seats, but it would be helpful to explain why you multiplied it by 365 to get the number of flights per year. Also, for part d, it may be helpful to interpret the z-score in terms of the normal distribution (e.g. "less than half the seats occupied" corresponds to a z-score of -2.60, meaning that the number of occupied seats is 2.60 standard deviations below the mean).
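
If you want to check the question 1 figures without a z-table, a minimal Python sketch along these lines may help. It assumes the number of arrivals can be treated as a continuous normal variable with mean 230 and SD 30 (no continuity correction), and it takes "half the seats" to be 152.5 rather than 152. The variable names are illustrative only.

```python
from statistics import NormalDist

# Values taken from the question.
MEAN, SD, SEATS = 230, 30, 305
arrivals = NormalDist(mu=MEAN, sigma=SD)


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - MEAN) / SD


# (a) z-scores for 150, 200 and 250 arrivals
z_values = {x: round(z_score(x), 3) for x in (150, 200, 250)}

# (b) z-score of a full flight and P(arrivals <= seats)
z_full = z_score(SEATS)
p_enough_seats = arrivals.cdf(SEATS)

# (c) expected number of overfull flights in a year of daily flights
expected_overfull = 365 * (1 - p_enough_seats)

# (d) proportion of flights with fewer than half the seats occupied
# (assumption: "half" is 305 / 2 = 152.5)
p_under_half = arrivals.cdf(SEATS / 2)

print(z_values, z_full, p_enough_seats, expected_overfull, p_under_half)
```

Because the sketch uses the unrounded probability and 152.5 instead of 152, its results for parts c and d can differ slightly from table-based answers.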

2. For question 2, you correctly identified the explanatory and response variables and calculated the regression line. To determine if this is the best line to estimate from, you could plot the residuals (observed values minus predicted values) against the explanatory variable and see if there is a pattern or if the residuals are randomly distributed.
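
As a rough illustration of the residual check, the sketch below fits the least-squares line to the data from the question and computes the residuals. It also fits a second line with the roles swapped (temperature predicted from chirps), which is one possible way to think about whether the first line is the best one for estimating temperature. The helper `least_squares` and all variable names are my own, not from any particular package.

```python
from statistics import mean

# Data from the question: temperature (degrees Celsius) and chirps per second.
temp = [22.1, 23.2, 32.1, 28.0, 25.1, 33.0, 26.3,
        24.8, 25.1, 23.9, 27.2, 27.6, 31.0, 29.7]
chirps = [14.5, 15.5, 20.0, 18.5, 16.4, 19.7, 17.1,
          15.8, 16.7, 15.9, 17.0, 17.8, 18.9, 18.1]


def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Chirps (response) regressed on temperature (explanatory).
slope, intercept, r2 = least_squares(temp, chirps)
residuals = [c - (slope * t + intercept) for t, c in zip(temp, chirps)]
chirps_at_33 = slope * 33 + intercept

# Two ways to estimate temperature at 1050 chirps/minute = 17.5 chirps/second:
rate = 1050 / 60
temp_by_inverting = (rate - intercept) / slope            # invert the line above
slope_rev, intercept_rev, _ = least_squares(chirps, temp)  # temperature on chirps
temp_by_reverse_fit = slope_rev * rate + intercept_rev

print(slope, intercept, r2, chirps_at_33)
print(temp_by_inverting, temp_by_reverse_fit)
```

Plotting `residuals` against `temp` (with any plotting tool) and looking for curvature or a funnel shape is the usual visual check. The two temperature estimates are close here but not identical, because a line fitted to predict chirps is not the same as a line fitted to predict temperature.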

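3. For question 3, check the slope in your formula before relying on parts a and b. When the son's height (y) is predicted from the father's height (x), the slope of the regression line is r * (Sy/Sx), not r * (Sx/Sy). Here that is 0.5 * (6/5) = 0.6, so both answers need to be reworked with that slope. For part c, regression to the mean describes the average height of sons for a given father's height, not each individual son. Individual sons still scatter around that average because other factors, such as the mother's genes, nutrition, and environment, also influence height, so the overall spread of heights does not shrink from one generation to the next.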
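
A short sketch of the question 3 arithmetic, assuming the standard regression line for predicting the son's height (y) from the father's height (x), with slope r * (Sy/Sx). The function and variable names are illustrative. The last line computes the spread of sons' heights around the line, which is one way to see why heights do not all collapse to the mean.

```python
import math

# Summary statistics from the question (heights in cm).
FATHER_MEAN, FATHER_SD = 175, 5
SON_MEAN, SON_SD = 180, 6
R = 0.5

# Slope for predicting son (y) from father (x): r * (Sy / Sx).
slope = R * SON_SD / FATHER_SD


def predicted_son_height(father_height):
    """Average son's height for fathers of the given height."""
    return SON_MEAN + slope * (father_height - FATHER_MEAN)


# (a) estimated son's height for a 184 cm father
estimate_184 = predicted_son_height(184)

# (b) father's height x for which the estimate equals x:
#     x = SON_MEAN + slope * (x - FATHER_MEAN), solved for x
equal_height = (SON_MEAN - slope * FATHER_MEAN) / (1 - slope)

# (c) SD of individual sons around the regression line
residual_sd = SON_SD * math.sqrt(1 - R ** 2)

print(estimate_184, equal_height, residual_sd)
```

The residual SD of about 5.2 cm shows that sons of fathers with the same height still vary a lot, even though their average is pulled toward the mean.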

4. For question 4, it seems like you have a good understanding of conditional probabilities and used a contingency table to calculate the probability that someone classified as hypertensive is actually hypertensive. It may be helpful to explain this process in more detail in your answer.
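
The contingency-table calculation is Bayes' rule in disguise, which can be sketched as below. The three input rates come straight from the question; the variable names are my own labels for them.

```python
# Rates from the question.
p_pos_given_hyper = 0.95   # P(classified hypertensive | hypertensive)
p_pos_given_normo = 0.20   # P(classified hypertensive | normotensive)
p_hyper = 0.20             # P(hypertensive) in the population

# Joint probabilities: the C+ row of the contingency table.
p_true_positive = p_pos_given_hyper * p_hyper          # 0.19
p_false_positive = p_pos_given_normo * (1 - p_hyper)   # 0.16

# Total probability of being classified as hypertensive.
p_classified = p_true_positive + p_false_positive      # 0.35

# Bayes' rule: P(hypertensive | classified hypertensive).
p_hyper_given_pos = p_true_positive / p_classified

print(round(p_hyper_given_pos, 3))
```

Writing it this way makes each cell of the table traceable to one multiplication, which is the kind of explanation a marker would probably want to see.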

5. For question 5, it's important to carefully read and interpret the data. The survey covers 1000 families, so the percentages can also be written as counts (100 families where the mother gets the disease, 100 where the father gets it, 20 where both do), but your table of proportions is equivalent. For part b, it may be helpful to explain how you calculated the relative frequency and what it represents: you are conditioning on the mother getting the disease, so the denominator should be the mother's 10%. For part c, your 0.82 is right. Note that you cannot get it by multiplying the probabilities of each parent not getting the disease (0.90 * 0.90 = 0.81), because the two events are not independent: both parents get the disease in 2% of families, not 0.10 * 0.10 = 1%. It may also be helpful to interpret this probability in the context of the epidemic (e.g. "There is an 82% chance that neither parent in a randomly selected family will get the disease during the epidemic").
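
The question 5 table can be rebuilt from the three given percentages, as in this sketch. It uses only the figures in the question; the variable names and the independence check at the end are my own additions.

```python
# Proportions from the question.
p_mother = 0.10   # mother gets the disease (father may or may not)
p_father = 0.10   # father gets the disease (mother may or may not)
p_both = 0.02     # both parents get the disease

# Cells of the 2x2 table.
p_mother_only = p_mother - p_both
p_father_only = p_father - p_both
p_neither = 1 - (p_mother + p_father - p_both)

# (b) relative frequency the father gets it, given the mother does
p_father_given_mother = p_both / p_mother

# The two events are independent only if P(both) = P(mother) * P(father).
independent = abs(p_both - p_mother * p_father) < 1e-12

print(p_mother_only, p_father_only, p_neither, p_father_given_mother, independent)
```

Since `independent` comes out false, the probability that neither parent gets the disease has to come from the table (or from 1 minus the probability that at least one does), not from multiplying 0.90 by 0.90.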

In the future, for homework questions on statistics
