Probability and Statistics Questions

In summary, the thread works through five probability and statistics questions. The first involves overbooking on an airline flight: finding z-scores, probabilities, and expected numbers of occurrences. The second concerns the relationship between temperature and cricket chirp rate, including a scatterplot, a regression line, and estimates of chirp frequency. The third examines the correlation between fathers' and sons' heights and uses regression to estimate heights. The fourth asks for the probability that someone classified as hypertensive by a blood-pressure machine actually has high blood pressure. The fifth uses a survey of influenza in families to find marginal distributions and probabilities.
  • #1
His_Dudeness3
Probability and Statistics Questions!

Hey everyone, I've got a Statistics project due on Wednesday and I've got it all pretty much done except for a couple of questions. The ones I've got a problem with are in bold.

Question 1.

Airlines usually over-book the seats on an aircraft by a certain margin because they know from experience that some people change or do not show for their scheduled flight. Data collected for a particular Melbourne–Darwin flight showed that, on average, 230 people (with a standard deviation of 30) did arrive for their scheduled flight. The data followed a normal distribution. The aircraft has seats for 305 passengers.
a) What are the z-scores for 150, 200 and 250 arrivals?
b) What is the z-score that represents a completely filled flight? What is the probability that a randomly selected flight has enough seats available for all the people who turn up?
c) This particular flight goes every day. During one year of operation, how many times would you expect there to be more passengers than available seats? Justify your answer.
d) What proportion of flights have less than half the seats occupied?
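The normal-distribution arithmetic for this question can be checked with a short sketch using the standard library's `statistics.NormalDist` (Python 3.8+). It assumes, as the question states, that arrivals are normal with mean 230 and SD 30. Taking "less than half the seats" to mean fewer than 152.5 arrivals (half of 305) is an assumption of this sketch, and the names `z_score`, `p_enough` and so on are illustrative.

```python
from statistics import NormalDist

# Arrivals for the Melbourne-Darwin flight: normal, mean 230, SD 30 (from the question).
arrivals = NormalDist(mu=230, sigma=30)
SEATS = 305

def z_score(x: float, mu: float = 230, sigma: float = 30) -> float:
    """Standardise a raw number of arrivals."""
    return (x - mu) / sigma

# (a) z-scores for 150, 200 and 250 arrivals
z_a = {x: round(z_score(x), 3) for x in (150, 200, 250)}

# (b) z-score of a completely full flight, and P(everyone who turns up gets a seat)
z_full = z_score(SEATS)
p_enough = arrivals.cdf(SEATS)

# (c) expected number of overbooked days in a 365-day year
p_over = 1 - p_enough
expected_over_per_year = 365 * p_over

# (d) fewer than half the seats occupied (cut-off assumed to be 305 / 2 = 152.5)
p_under_half = arrivals.cdf(SEATS / 2)

print(z_a)
print(f"z(full) = {z_full:.2f}, P(enough seats) = {p_enough:.4f}")
print(f"P(overbooked) = {p_over:.4f}, expected per year = {expected_over_per_year:.2f}")
print(f"P(less than half full) = {p_under_half:.4f}")
```

The expected number of overbooked flights comes out at roughly 2.3 per year. Using 152 rather than 152.5 as the cut-off in (d) gives z = -2.60 and a proportion of about 0.0047, so the choice only shifts the answer slightly.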

Question 2.

Crickets make a characteristic chirping sound by rapidly rubbing their wing covers together. Researchers decided to investigate the relationship between temperature and chirp frequency. They obtained the following data from observing a particular type of cricket:

Chirps per second    Temperature (°C)
14.5                 22.1
15.5                 23.2
20.0                 32.1
18.5                 28.0
16.4                 25.1
19.7                 33.0
17.1                 26.3
15.8                 24.8
16.7                 25.1
15.9                 23.9
17.0                 27.2
17.8                 27.6
18.9                 31.0
18.1                 29.7

a) Decide which one is the explanatory (predictor) variable and which is the response variable and produce the scatterplot with a line of best fit and the r-squared value (include this in your answer).
b) If the temperature is 33°C, how often would you estimate the cricket is chirping?
c) Suppose a cricket is chirping 1050 times a minute. Use the line you have found to estimate the temperature at that time. Is this line the best one for estimating temperature?
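The line of best fit and R² can be computed directly from the least-squares formulas. This minimal sketch uses only Python's standard library and treats temperature as the explanatory variable (as part (b) suggests); it does not draw the scatterplot, which would need a plotting library. The helper `least_squares` is a hypothetical name for illustration.

```python
from statistics import fmean

# (chirps per second, temperature in °C) pairs from the table above
data = [
    (14.5, 22.1), (15.5, 23.2), (20.0, 32.1), (18.5, 28.0), (16.4, 25.1),
    (19.7, 33.0), (17.1, 26.3), (15.8, 24.8), (16.7, 25.1), (15.9, 23.9),
    (17.0, 27.2), (17.8, 27.6), (18.9, 31.0), (18.1, 29.7),
]
chirps = [c for c, _ in data]
temps = [t for _, t in data]

def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line y = slope*x + intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Temperature is the explanatory variable (x); chirp rate is the response (y).
slope, intercept, r2 = least_squares(temps, chirps)
print(f"chirps/s = {slope:.4f} * temp + {intercept:.3f}, R^2 = {r2:.3f}")

# (b) predicted chirp rate at 33 °C
print(f"At 33 °C: {slope * 33 + intercept:.1f} chirps/s")
```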

Question 3.

The heights of a group of fathers have mean 175 cm and SD 5 cm. The sons' heights have mean 180 cm and SD 6 cm. The correlation between father and son heights is 0.5.
a) If a father has a height of 184 cm, estimate the height of his son.
b) How tall should a father be for his son's estimated height to equal his own?
c) Why doesn’t regression to the mean imply that we all end up with the same height?
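A sketch of the regression estimate for this question, assuming the standard least-squares line for predicting sons' heights from fathers' heights, whose slope is r × (SD of sons / SD of fathers). Part (b) is the fixed point where the prediction equals the input: solving x = mean_sons + b(x - mean_fathers) gives x = (mean_sons - b × mean_fathers)/(1 - b), which exists because b < 1. The function names are hypothetical.

```python
# Summary statistics from the question (heights in cm).
MEAN_F, SD_F = 175.0, 5.0   # fathers (explanatory variable x)
MEAN_S, SD_S = 180.0, 6.0   # sons (response variable y)
R = 0.5

# Least-squares slope for predicting sons from fathers: r * (SD_y / SD_x).
SLOPE = R * SD_S / SD_F

def predict_son(father: float) -> float:
    """Regression estimate of a son's height from his father's height."""
    return MEAN_S + SLOPE * (father - MEAN_F)

def fixed_point() -> float:
    """Father's height x at which predict_son(x) == x."""
    # x = MEAN_S + SLOPE * (x - MEAN_F)  =>  x * (1 - SLOPE) = MEAN_S - SLOPE * MEAN_F
    return (MEAN_S - SLOPE * MEAN_F) / (1 - SLOPE)

print(f"(a) father 184 cm -> estimated son {predict_son(184):.1f} cm")
print(f"(b) estimate equals father's height at {fixed_point():.1f} cm")
```

Note that the slope uses the sons' SD over the fathers' SD; swapping the two SDs gives a different line and different answers.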

Question 4.

Suppose 95% of hypertensives (high blood pressure) and 20% of normotensives are classified as hypertensive by a blood-pressure machine. Given that 20% of the population are hypertensive, what is the probability that someone classified as hypertensive by the machine really is hypertensive?
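This is an application of Bayes' theorem. The sketch below, with a hypothetical helper `posterior`, computes P(hypertensive | classified hypertensive) from the prevalence (20%), the proportion of hypertensives the machine flags (95%), and the proportion of normotensives it wrongly flags (20%).

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive classification) via Bayes' theorem."""
    true_pos = sensitivity * prior                  # P(classified + and hypertensive)
    false_pos = false_positive_rate * (1 - prior)   # P(classified + and normotensive)
    return true_pos / (true_pos + false_pos)        # divide by P(classified +)

# Figures from the question: 20% prevalence, 95% of hypertensives and
# 20% of normotensives classified as hypertensive.
p = posterior(prior=0.20, sensitivity=0.95, false_positive_rate=0.20)
print(f"P(hypertensive | classified hypertensive) = {p:.3f}")
```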

Question 5.

Suppose an influenza epidemic strikes a city. 1000 two-parent families are surveyed. In 10% of families the mother gets the disease (including the possibility that the father does too), in 10% the father gets it (including the possibility that the mother does too), and in 2% both do.
a) What is the marginal distribution for disease among the parents?
b) What is the relative frequency with which the father gets the disease, given that the mother does?
c) What is the probability neither father nor mother gets the disease?
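The joint distribution follows from the two marginal proportions and the overlap by inclusion-exclusion. A minimal sketch with illustrative variable names; it also shows that multiplying the two "no disease" proportions, which would assume the parents' infections are independent, does not reproduce the "neither" cell, because P(both) = 0.02 differs from 0.10 × 0.10 = 0.01.

```python
# Proportions from the survey of 1000 two-parent families.
p_mother = 0.10   # mother gets the disease (father may too)
p_father = 0.10   # father gets the disease (mother may too)
p_both = 0.02     # both parents get it

# 2x2 joint distribution built from the marginals and the overlap.
joint = {
    ("mother", "father"): p_both,
    ("mother", "no father"): p_mother - p_both,
    ("no mother", "father"): p_father - p_both,
    ("no mother", "no father"): 1 - (p_mother + p_father - p_both),
}

# (b) P(father | mother) = P(both) / P(mother)
p_father_given_mother = p_both / p_mother

# (c) P(neither), read from the joint table; compare with the independence product
p_neither = joint[("no mother", "no father")]
independent_guess = (1 - p_mother) * (1 - p_father)

for cell, prob in joint.items():
    print(cell, round(prob, 2), f"(about {round(prob * 1000)} families)")
print(f"P(father | mother) = {p_father_given_mother:.2f}")
print(f"P(neither) = {p_neither:.2f} (independence would give {independent_guess:.2f})")
```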


Here are the answers I've got so far (including the ones I don't think I need help with):

1. (a) z(150)= -2.667 z(200)= -1 z(250)= 0.667
(b) z(completely filled flight) = 2.50; proportion of flights with enough seats for everyone who shows up = 0.9938 (I got this from the proportion with z < 2.50 in the normal distribution table).
(c) Proportion of flights without enough seats = 1 - 0.9938 = 0.0062. I then multiplied this by 365 to get the number of flights per year that don't have enough seats, which equals 2.263 flights/year (~3 flights/year).
(d) I got the z-score for this as -2.60 (I used 152 as the raw score, since the question says 'less than half the seats occupied'), and the proportion for this is 0.0047 (~2 flights/year).

2. (a) chirps/second= response variable, temperature= explanatory variable.
Regression line: y=0.4713x+4.517
(b) Using regression line, I got y= 20.1 chirps/second
(c) Converted 1050 chirps/minute to 17.5 chirps/sec by dividing 1050 by 60. Subbed this into the regression line and got the temperature as 27.5 degrees Celsius. I said that, because the R^2 value wasn't greater than 0.99, it can't give an accurate estimate. I was wondering if I had to go on and do a residual plot for the linear relationship to see if this is the best line to estimate from.

3. (a) Using y(estimate) = y(mean) + r * (Sx/Sy) * (x - x(mean)), I got y(estimate) = 183.8 cm.
(b) To find where the estimate equals the father's height, I set y(estimate) = x. Solving for x, I got x = 183.6 cm as the height at which the son's estimated height and the father's actual height are the same.
(c) I said that regression to the mean (i.e. regression or correlation) doesn't imply or explain causality, only how one variable varies with respect to another. Thus, the sons' heights won't be solely affected by their fathers' heights.

4. I used this convention: C+= people classified as hypertensive, C-= people classified as normotensive, H+= people who are actually hypertensive, H-= people who are actually normotensive. I used a table to display the data:

       H+     H-     Total
C+     0.19   0.16   0.35
C-     0.01   0.64   0.65
Total  0.20   0.80   1.00

Thus, the proportion of people who really are hypertensive, given that the machine classifies them as hypertensive, is 19/35 ≈ 0.543.

5. This one was a real doozy, especially when it came to interpreting the data. I just took the proportion of families where the father gets the disease as 10%, and the proportion where the mother gets it as 10% (with 2% where both do).
(a) I set up the distribution as follows:
A= Mother Gets Disease, A*= Mother doesn't get disease, B= Father gets disease, B*= Father doesn't get disease

       A      A*     Total
B      0.02   0.08   0.10
B*     0.08   0.82   0.90
Total  0.10   0.90   1.00
(b) I got the probability for this using Pr(AB)/Pr(B), which gave 0.20.
(c) from the table, Pr(A*B*)= 0.82

Any help is greatly appreciated! Oh yeah, sorry about the dodgy tables for questions (4) and (5)(a): no matter how I format them, the entries just go straight to the margin. Also, for future reference, where should I post homework questions on statistics? Thanks guys
 
  • #2


No one?? I knew this was a bullsh*t assignment!
 
  • #3


I am not able to provide direct answers to homework questions. However, I can offer some guidance and suggestions to help you understand and solve these problems.

1. For question 1, your z-scores and normal-table lookups look right. For part c, explain why you multiplied by 365: the flight runs every day, so the expected number of overbooked flights in a year is 365 times the daily probability. Also note that 2.263 rounds to about 2 flights per year, not 3. For part d, interpret the z-score in context: "less than half the seats occupied" corresponds to a z-score of about -2.60, meaning the number of arrivals is about 2.6 standard deviations below the mean.

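2. For question 2, your choice of explanatory and response variables is sensible, since part b predicts chirps from temperature. For part c, the key point is not the size of R². The line was fitted to predict chirps from temperature, so it minimises errors in chirps; inverting it does not give the least-squares line for predicting temperature from chirps. For that, you would regress temperature on chirps. A residual plot (observed minus predicted values against the explanatory variable) is still worth doing to check whether the residuals are randomly scattered or show a pattern, which tells you whether a straight line is appropriate.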

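3. For question 3, check the slope in your formula: the slope of the line for predicting a son's height from his father's is r × (Sy/Sx) = 0.5 × (6/5) = 0.6, not r × (Sx/Sy). Redo parts a and b with the corrected slope. For part c, the key point is that regression to the mean describes the predicted (average) height of sons. Individual sons scatter around that prediction because other factors, such as the mother's genes, nutrition, and environment, also influence height, so the spread of heights does not shrink from one generation to the next: here the sons' SD (6 cm) is actually larger than the fathers' (5 cm).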

4. For question 4, your contingency table and answer (19/35 ≈ 0.543) are correct. In your write-up, explain the process: this is Bayes' theorem, P(H+ | C+) = P(C+ | H+)P(H+) / P(C+) = (0.95 × 0.20) / (0.95 × 0.20 + 0.20 × 0.80) = 0.19/0.35.

5. For question 5, your reading of the data is fine: 10% of families have an infected mother, 10% an infected father, and 2% both, which gives the table you set up. For part b, the condition is on the mother, so the formula is Pr(B | A) = Pr(AB)/Pr(A); because Pr(A) and Pr(B) are both 0.10, your 0.20 is unchanged, but write the formula the right way round. For part c, do not multiply 0.90 × 0.90: that assumes the parents' infections are independent, and they are not, since Pr(AB) = 0.02 while Pr(A)Pr(B) = 0.01. Instead, Pr(neither) = 1 - Pr(A or B) = 1 - (0.10 + 0.10 - 0.02) = 0.82, which matches your table. In context: neither parent caught the disease in about 82% of the surveyed families (about 820 of the 1000).
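The residual check suggested in point 2 can be sketched as follows for the cricket data. The sketch also fits the reverse regression (temperature on chirps), which is the least-squares line for estimating temperature from a chirp rate; inverting the chirps-on-temperature line is not. Only the standard library is used, and `fit` is a hypothetical helper name.

```python
from statistics import fmean

# (chirps per second, temperature in °C) from the table in Question 2
data = [
    (14.5, 22.1), (15.5, 23.2), (20.0, 32.1), (18.5, 28.0), (16.4, 25.1),
    (19.7, 33.0), (17.1, 26.3), (15.8, 24.8), (16.7, 25.1), (15.9, 23.9),
    (17.0, 27.2), (17.8, 27.6), (18.9, 31.0), (18.1, 29.7),
]
chirps = [c for c, _ in data]
temps = [t for _, t in data]

def fit(x, y):
    """Least-squares (slope, intercept) for predicting y from x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Line from the thread: chirps predicted from temperature.
a, b = fit(temps, chirps)

# Residuals (observed minus predicted chirps), listed in order of temperature.
# A curved or fanning pattern would suggest a straight line is not adequate.
residuals = [c - (a * t + b) for t, c in zip(temps, chirps)]
for t, e in sorted(zip(temps, residuals)):
    print(f"{t:5.1f} °C  residual {e:+.2f}")

# Part (c): 1050 chirps per minute = 17.5 chirps per second.
rate = 1050 / 60
inverted = (rate - b) / a            # solve the chirps-on-temperature line for temperature
c_slope, c_int = fit(chirps, temps)  # temperature-on-chirps line
direct = c_slope * rate + c_int
print(f"inverted line: {inverted:.2f} °C; temperature-on-chirps line: {direct:.2f} °C")
```

The two estimates differ only slightly here because 17.5 chirps/s is close to the mean chirp rate (about 17.3); for rates far from the mean, the inverted line gives more extreme temperatures than the temperature-on-chirps line.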

In the future, for homework questions on statistics
 

FAQ: Probability and Statistics Questions

What is the difference between probability and statistics?

Probability is the branch of mathematics that deals with the likelihood of events occurring, while statistics is the branch of mathematics that deals with the collection, analysis, and interpretation of data.

How is probability used in real-life situations?

Probability is used in many real-life situations, such as predicting the outcome of games, weather forecasting, risk assessment in insurance, and analyzing data in scientific experiments.

What is the difference between descriptive and inferential statistics?

Descriptive statistics involves summarizing and describing data using measures such as mean, median, and standard deviation, while inferential statistics involves making predictions and drawing conclusions about a larger population based on a sample of data.
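To illustrate the distinction with data from this thread, the sketch below summarises the 14 chirp rates from Question 2 (descriptive) and then, purely as an illustration, computes a 95% t confidence interval for the mean chirp rate (inferential). The interval assumes the readings behave like a random sample from a roughly normal population, which the original study may not support; the t critical value 2.160 for 13 degrees of freedom is hard-coded from standard tables.

```python
import math
from statistics import fmean, median, stdev

# Chirps per second from the cricket table in Question 2.
chirps = [14.5, 15.5, 20.0, 18.5, 16.4, 19.7, 17.1, 15.8, 16.7, 15.9, 17.0, 17.8, 18.9, 18.1]

# Descriptive statistics: summarise the 14 observations themselves.
mean = fmean(chirps)
med = median(chirps)
sd = stdev(chirps)  # sample SD (n - 1 denominator)

# Inferential statistics: 95% confidence interval for the population mean,
# assuming a random sample from a roughly normal population.
T_CRIT_13DF = 2.160  # t critical value, 13 degrees of freedom, 2.5% in each tail
half_width = T_CRIT_13DF * sd / math.sqrt(len(chirps))
lo, hi = mean - half_width, mean + half_width

print(f"descriptive: mean {mean:.2f}, median {med:.2f}, SD {sd:.2f}")
print(f"inferential: 95% CI for the mean chirp rate ({lo:.2f}, {hi:.2f})")
```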

What is the Central Limit Theorem?

The Central Limit Theorem states that, for independent observations drawn from the same population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.
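A small simulation illustrates the theorem. Assuming an Exponential(1) population (strongly right-skewed, with mean 1 and SD 1), the sketch draws many samples of size n and records each sample mean. As n grows, the SD of the means tracks 1/√n and their skewness falls towards 0, the value for a normal distribution. The sample sizes, number of repetitions, and seed are arbitrary choices.

```python
import random
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the run is reproducible

def sample_means(n, reps=2000):
    """Means of `reps` independent samples of size n from an Exponential(1) population."""
    return [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m, s = fmean(xs), stdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (1, 5, 30):
    means = sample_means(n)
    results[n] = (fmean(means), stdev(means), skewness(means))
    mean, sd, skew = results[n]
    # Theory for Exponential(1): mean 1, SD 1/sqrt(n), skewness 2/sqrt(n)
    print(f"n={n:2d}: mean {mean:.3f}, SD {sd:.3f} (theory {n ** -0.5:.3f}), "
          f"skewness {skew:.2f} (theory {2 * n ** -0.5:.2f})")
```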

How are probability and statistics related to each other?

Probability is the foundation of statistics, as it provides the theoretical framework for analyzing and interpreting data. Statistics, on the other hand, utilizes probability to make inferences and draw conclusions about a population based on a sample of data.
