Statistics - Creating a data set from a summary table

In summary, the conversation discusses how to handle outliers in a dataset while recreating the original data set for a project. It is important to assign values to the "above" and "below" observations and justify them based on the purpose of the data collection and analysis. Possible approaches include assigning the average or median values, or basing the values on the distribution of the rest of the data.
  • #1
4quila

Homework Statement



In his paper "Regression towards Mediocrity in Hereditary Stature", Francis Galton collects data on the stature of children (928 obs.) and parents (205 obs.). As part of my project I have been asked to recreate the original data set. Taking the children, the data set runs from 64.5, 65.5, ..., 72.5 inches with corresponding numbers of observations, which is fair enough. However, there are also 14 observations labelled "below" and 4 labelled "above", and the problem statement is:

"For objects labelled above or below you must assume some particular values. Please state these explicitly in a table and justify with one sentence."

Homework Equations




The Attempt at a Solution



Clearly, if any of these "above" or "below" values are large outliers they will have an impact on the regression, so I want to avoid that. But I am struggling to think of values for these observations to take. Do I assign them values that leave the mean, for example, unaffected? Or give them values so that they form a nice-looking histogram?

In short, I am asking whether there is a common practice for this kind of thing, or whether it is just an arbitrary assignment as long as I can justify it plausibly.

All help much appreciated.
 
  • #2

I understand your concern about outliers and the impact they may have on regression. In this case, it is important to consider the purpose of the data collection and the analysis. If the outliers are large and have a significant impact on the regression, it may be necessary to remove them from the dataset or address them separately.

However, since the task is to recreate the original data set, you need to follow the instructions and assign explicit values to the "above" and "below" observations. One approach is to assign them the mean of the tabulated observations, or the median if you are worried about extreme values, so that the overall mean or median of the data set is barely affected. Bear in mind, though, that these observations are known to lie outside the tabulated range, so a central value will understate the spread of the data.

Another approach is to base the values on the distribution of the rest of the data. For example, if the data look approximately normal, you could estimate the mean and standard deviation from the tabulated observations and assign each "above" or "below" observation the expected value of the normal tail beyond the last tabulated bin.
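As a rough illustration of the distribution-based approach, here is a minimal Python sketch that fits a normal distribution to the tabulated bins and uses the conditional mean of each tail as the assigned value. The bin midpoints are those quoted in the question; the counts are hypothetical placeholders, not Galton's figures. It also assumes that the midpoints label one-inch bins (so the tabulated range is 64.0 to 73.0), and it estimates the mean and standard deviation from the tabulated bins only, which slightly understates the spread.

```python
from statistics import NormalDist

# Bin midpoints (inches) as quoted in the thread. The counts are HYPOTHETICAL
# placeholders for illustration, not Galton's figures.
midpoints = [64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]
counts = [20, 60, 120, 200, 220, 160, 90, 40, 10]


def weighted_mean_sd(values, weights):
    """Mean and sample standard deviation of values repeated `weights` times."""
    n = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / n
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / (n - 1)
    return mean, var ** 0.5


def tail_means(mu, sigma, lower_cut, upper_cut):
    """E[X | X < lower_cut] and E[X | X > upper_cut] for X ~ Normal(mu, sigma)."""
    z = NormalDist()  # standard normal
    a = (lower_cut - mu) / sigma
    b = (upper_cut - mu) / sigma
    below = mu - sigma * z.pdf(a) / z.cdf(a)
    above = mu + sigma * z.pdf(b) / (1.0 - z.cdf(b))
    return below, above


mu, sigma = weighted_mean_sd(midpoints, counts)
# ASSUMPTION: one-inch bins, so "below" means < 64.0 and "above" means > 73.0.
below_value, above_value = tail_means(mu, sigma, lower_cut=64.0, upper_cut=73.0)

print(f"fitted mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"assign 'below' = {below_value:.2f}, 'above' = {above_value:.2f}")
```

Whatever values come out, they always fall outside the tabulated range, which makes the one-sentence justification straightforward: each value is the expected height in that tail under a normal fit to the rest of the data.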

Ultimately, the specific values chosen for the "above" and "below" observations may be somewhat arbitrary, but it is important to justify them based on the purpose of the data collection and the analysis. I hope this helps and good luck with your project.
 

FAQ: Statistics - Creating a data set from a summary table

How do you create a data set from a summary table in statistics?

To create a data set from a summary table, first identify the variables and their categories (for a frequency table, the bin values and their counts). Then enter the data into a spreadsheet or statistical software, repeating each value as many times as its count so that the data set reproduces the table exactly.
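For a one-variable frequency table, the expansion can be sketched in a few lines of Python. The bin midpoints and the 14 "below" and 4 "above" counts come from the question above; every other count, the `expand` helper, and the two assumed tail values (63.5 and 73.5, one bin width beyond the end bins) are hypothetical choices made only for illustration.

```python
# Summary table: category -> number of observations.
# Midpoints and the 14 "below" / 4 "above" counts follow the thread;
# the remaining counts are HYPOTHETICAL placeholders.
summary = {
    "below": 14,
    64.5: 20, 65.5: 60, 66.5: 120, 67.5: 200, 68.5: 220,
    69.5: 160, 70.5: 90, 71.5: 40, 72.5: 10,
    "above": 4,
}

# ASSUMPTION: values chosen for the open-ended categories.
assumed_values = {"below": 63.5, "above": 73.5}


def expand(summary, assumed_values):
    """Return a flat list with one entry per observation in the table."""
    data = []
    for label, count in summary.items():
        # Open-ended labels are replaced by their assumed numeric value.
        value = assumed_values.get(label, label)
        data.extend([float(value)] * count)
    return data


data = expand(summary, assumed_values)
print(len(data), "observations, min =", min(data), "max =", max(data))
```

Keeping the assumed values in their own dictionary makes it easy to state them explicitly in a table and to rerun the analysis with different choices.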

What is the purpose of creating a data set from a summary table in statistics?

The purpose of creating a data set from a summary table is to organize and analyze data in a more manageable format. Summary tables provide a concise overview of the data, but creating a data set allows for further statistical analysis and data manipulation.

What are the steps involved in creating a data set from a summary table?

The steps involved in creating a data set from a summary table include identifying the variables and their categories, organizing the data in a spreadsheet or statistical software, and ensuring the data is accurately represented. It may also involve calculating additional statistics or creating visualizations for a better understanding of the data.

How do you ensure the accuracy of a data set created from a summary table?

To ensure the accuracy of a data set created from a summary table, enter the data carefully and double-check for errors. Re-tabulate the finished data set and compare it with the original summary table: the count in every category and the overall total should match. Recomputing any summary statistics reported alongside the table, such as the mean, can also help reveal discrepancies.
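One way to make the comparison with the summary table mechanical is to re-tabulate the data and list any categories whose counts differ. The sketch below uses a small hypothetical table (only the 14 "below" and 4 "above" counts come from the question above), and `retabulate` is an illustrative helper, not part of any library.

```python
from collections import Counter

# HYPOTHETICAL miniature example; only the 14 and 4 come from the thread.
summary = {"below": 14, 64.5: 3, 65.5: 5, "above": 4}
assumed_values = {"below": 63.5, "above": 73.5}  # assumed tail values
data = [63.5] * 14 + [64.5] * 3 + [65.5] * 5 + [73.5] * 4


def retabulate(data, assumed_values):
    """Count observations per category, mapping assumed values back to labels."""
    back = {value: label for label, value in assumed_values.items()}
    return Counter(back.get(x, x) for x in data)


table = retabulate(data, assumed_values)

# Categories whose recreated count differs from the summary table.
mismatches = {
    label: (expected, table.get(label, 0))
    for label, expected in summary.items()
    if table.get(label, 0) != expected
}
total_ok = len(data) == sum(summary.values())

print("mismatches:", mismatches)
print("total matches:", total_ok)
```

An empty `mismatches` dictionary together with a matching total indicates that the recreated data set reproduces the table.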

Can a data set created from a summary table be used for all types of statistical analysis?

Not for all of them. A data set rebuilt from a summary table supports many kinds of statistical analysis, but it carries only the information in the table: values are known to the precision of the categories, and open-ended categories such as "above" and "below" rest on assumed values. Some analyses require a larger or more detailed data set, while others can be carried out directly on the summary table.
