Cleaning/Reordering towards regression

In summary, you would need to use the names in the data to pair the columns up and then do a regression of the variables.
  • #1
WWGD
Science Advisor
Gold Member
7,422
11,426
TL;DR Summary
Excel: How to regress variable A on variable B when values for A are not matched with those for B? I mean, we have a population P for which we obtain values A=(a_i) and B=(b_i), but order in which A is presented is different than that in which B is presented. Example: Say I have separately the age and weight of 100 people but not given by :( Name, Age, Weight), but rather ordered separately by Age, Weight. How to easily re-order the latter to appear as triplets ( Name, Age, Weight)?
I have quantitative data on all countries on two variables, say A,B in Excel and I am trying to regress A on B. Problem is that data are ordered based on the magnitude of A, B , rather than Alpha by country. Is there a reasonable way of ordering by country for each and then regress A on B? If I rearrange Alpha by country, it will not, Afaik, match for A,B fittingly.
 
Computer science news on Phys.org
  • #2
I'm no clear on what you mean. Are A and B paired in the data or not? What does "ordered separately by Age, Weight" mean? What do your records look like? What columns of information are in the records?
The regression does not care about the order of the data, but it can not do a regression if the variables are not paired up. You may need to use names to get the data into the same order.
Are you saying that you have two tables (age, name) and (weight, name), each ordered by the first variable? If so, sort both tables by "name" and get the table (name, age, weight). Then do a regression of the variables "age" and "weight".

PS. It is very common that two tables that are supposed to have the same key, like "name", are not really accurate on the key variable. The tables have typos, missing entries, etc. that make problems in matching up the two tables. You may have to deal with that in some way.
 
Last edited:
  • Like
Likes WWGD
  • #3
Yes, clearly I have to pair it. I'm just wondering if there is a " reasonably nice" way of doing it within Excel and without requiring, e.g., Visual Basic or other tools I'm not conversant with. I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.
 
  • #4
Let's say you have a Heights tab and a Weights tab. Names are in column A and Heights or Weights in column B. Data start on row 5.

In Heights!C5 enter the formula =match($A5,Weights!$A:$A,0).
In Heights!D5 enter the formula =index(Weights!B:B,$C5).

Copy the formulae down.

Edit: example on Google Sheets.
 
Last edited:
  • Like
Likes FactChecker and WWGD
  • #5
WWGD said:
Yes, clearly
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
 
  • #6
FactChecker said:
The whole situation wasn't clear to me. But I am not the sharpest knife in the drawer. ;-)
I have data the likes of some 200 data points of :(Person, Height), ordered by height and ( Person, Weight), ordered by weight and . I want to obtain a triple ( Person, Height, Weight) from the first two, so that the ith person is matched with their height and weight. Just wondering if there is a " nice" way of doing this within Excel.​
Now it is clear.
Sorry if I was abrupt ; it was pre-caffeine ;).
 
  • Haha
Likes FactChecker
  • #7
WWGD said:
Sorry if I was abrupt ; it was pre-caffeine ;).
Ha! No problem. I completely understand. :cool:
 
  • Like
Likes WWGD

FAQ: Cleaning/Reordering towards regression

What is "Cleaning/Reordering towards regression"?

"Cleaning/Reordering towards regression" is a data preprocessing technique used in regression analysis. It involves identifying and removing any irrelevant or noisy data, as well as reordering the remaining data in a way that improves the performance of the regression model.

Why is it important to clean and reorder data for regression?

Cleaning and reordering data is important because it helps to improve the accuracy and reliability of the regression model. By removing irrelevant or noisy data and organizing the remaining data, the model can better identify patterns and relationships between variables.

How do you determine which data needs to be cleaned or reordered?

This can vary depending on the specific dataset and the goals of the regression analysis. Generally, data that is missing, duplicated, or inconsistent should be cleaned. Reordering data may involve organizing it by variables, such as time or magnitude, that are relevant to the regression analysis.

What techniques can be used to clean and reorder data for regression?

There are several techniques that can be used, including removing outliers, imputing missing values, and transforming variables. Reordering techniques can include sorting data by a specific variable, binning data into categories, or using dimensionality reduction methods.

Can cleaning and reordering data completely eliminate the need for other regression techniques?

No, cleaning and reordering data is just one step in the overall regression analysis process. Other techniques, such as feature selection, model selection, and model evaluation, may also be necessary to create an accurate and reliable regression model.

Similar threads

Back
Top