Data collected from different devices: how to combine for analysis?

  • Thread starter Mikki123
  • #1
Mikki123
Hi Everyone,
I'm working on a project where I have current values from three different devices, recorded both with no arc and with an arc produced by an arc generator. When I plot them, they all look different, since the data come from different devices. Is there anything I can do to make them comparable, e.g. make them look similar, so that I can perform further analysis?
 
  • #2
In your plot, where y is the current value, what is x?
 
  • #3
It is just indexes starting from 0 to the number of samples.
 
  • #4
You can still compute statistics such as the average and standard deviation as a purely mathematical treatment, but the sample number itself has no physical meaning. You had better pick some physical quantity from the samples to plot, e.g. the same device under different physical conditions, or the same condition with different devices.
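A minimal sketch of the per-device statistics mentioned above, using only Python's standard `statistics` module. The helper name `summarize`, the device names and the sample values are hypothetical placeholders, not the thread's data. As the reply says, these numbers describe each record's distribution but carry no physical meaning on their own.

```python
import statistics

def summarize(records: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Return (mean, sample standard deviation) for each device's record."""
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in records.items()
    }

# Hypothetical current samples (arbitrary units) from two devices.
records = {
    "device_a": [1.0, 2.0, 3.0, 4.0],
    "device_b": [10.0, 10.0, 10.0, 10.0],
}
for name, (mean, std) in summarize(records).items():
    print(f"{name}: mean={mean:.3f}, std={std:.3f}")
```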
 
  • #5
All good. I will try doing that. Thank you!
 
  • #6
Mikki123 said:
It is just indexes starting from 0 to the number of samples.
Are they collected over time? Then x is time, isn't it? Are the samples collected at regular intervals? Then it is just a matter of knowing the sampling frequency, no?
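If the samples were taken at a constant rate f_s, sample index n maps to time t_n = t_0 + n / f_s; for example, 1,000 samples at 1 kHz span 1 s. The sketch below assumes the sampling frequency is known and constant (the thread never states the actual rate), and `index_to_time` is a hypothetical helper name.

```python
def index_to_time(n_samples: int, fs_hz: float, t0_s: float = 0.0) -> list[float]:
    """Map sample indices 0..n_samples-1 to times t_n = t0 + n / fs, in seconds.

    Assumes a constant, known sampling frequency fs_hz (hypothetical here).
    """
    if fs_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    return [t0_s + n / fs_hz for n in range(n_samples)]

# Hypothetical example: 5 samples recorded at 1 kHz.
print(index_to_time(5, 1000.0))  # [0.0, 0.001, 0.002, 0.003, 0.004]
```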
 
  • #7
Hi Borek,
I just have a single column of 800,000 current values. The x values should be time, I suppose. I have the same for three different devices, but when I plot them, they look very different. I want to train a machine learning model on this data for further processing, and since the data from each device look so different, I'm getting very poor performance. Do you think a Fourier transform of all three signals would make them look similar, so that I can train my model better? I'm looking for any kind of preprocessing apart from normalization and feature extraction.
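A Fourier transform will not by itself make different instruments agree, but comparing amplitude spectra instead of raw waveforms removes DC offsets and shows which frequency components each device actually records. The sketch below computes a one-sided amplitude spectrum with a direct DFT in plain Python so that it stays self-contained; it is O(N²), so for 800,000 samples an FFT such as `numpy.fft.rfft` would be used instead. The function name and the test signal are illustrative assumptions.

```python
import cmath
import math

def amplitude_spectrum(x: list[float], fs_hz: float) -> list[tuple[float, float]]:
    """One-sided amplitude spectrum of x via a direct DFT.

    O(N^2): for illustration only. Returns (frequency_hz, amplitude) pairs for
    bins k = 0..N//2. The mean is removed first so that a DC offset between
    devices does not dominate the comparison.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(centered[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        amp = abs(acc) / n
        if 0 < k < n / 2:
            amp *= 2  # fold the matching negative-frequency bin into this one
        spectrum.append((k * fs_hz / n, amp))
    return spectrum

# Hypothetical test signal: offset of 5 plus a 50 Hz sine of amplitude 2, sampled at 800 Hz.
fs = 800.0
signal = [5.0 + 2.0 * math.sin(2 * math.pi * 50.0 * m / fs) for m in range(16)]
peak_freq, peak_amp = max(amplitude_spectrum(signal, fs), key=lambda p: p[1])
print(peak_freq, round(peak_amp, 6))  # 50.0 2.0
```

The offset of 5 disappears from the spectrum, which is the kind of device-to-device difference this comparison suppresses; differences in gain or bandwidth between instruments still remain.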
 
  • #8
Mikki123 said:
The x values should be time, I suppose.
You should probably know this (?????). It matters that the samples be equally spaced with no "jitter".
Then look at the Fourier transform (or the difference of the transforms) to find interference.
When you say feature extraction, what exactly do you mean?
Why do you expect similar results?
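If the acquisition software does record timestamps, one quick way to test the "equally spaced, no jitter" requirement is to compare every sampling interval with the median interval. The `check_spacing` helper and its tolerance `rel_tol` are hypothetical choices for illustration; a suitable tolerance depends on the instrument.

```python
import statistics

def check_spacing(timestamps_s: list[float], rel_tol: float = 1e-3) -> tuple[float, float, bool]:
    """Check whether timestamps are equally spaced to within a tolerance.

    Returns (median interval, largest deviation from it, is_uniform), where
    is_uniform means every interval is within rel_tol * median of the median.
    rel_tol = 1e-3 is an arbitrary illustrative default.
    """
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    dt = statistics.median(intervals)
    worst = max(abs(d - dt) for d in intervals)
    return dt, worst, worst <= rel_tol * abs(dt)

# Hypothetical timestamps in seconds: the fourth sample arrives 0.1 ms late.
print(check_spacing([0.0, 0.001, 0.002, 0.0031, 0.004]))
```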
 
  • #9
Good data requires a detailed understanding of what it really means, how the measurement instrument works, and what it is REALLY measuring. Good lab work is more about experiment design and selecting and/or researching the instrumentation than it is about collecting the data. This would be especially true if you are trying to represent the same physical quantity with different methods.

Without this prior engineering it is likely that the data is meaningless, or has unknown meaning. Bad data can be combined however you like. Garbage in, garbage out applies from the very beginning of experimentation and analysis.

A set of numbers isn't data, it's just numbers. Data has an associated meaning and context.

You will not get useful answers from us if we don't know, in detail, what you are measuring, why, and how.
 

FAQ: Data collected from different devices: how to combine for analysis?

How do I ensure data compatibility from different devices?

To ensure data compatibility, you need to standardize the data formats from different devices. This can involve converting data into a common format, aligning timestamp formats, and ensuring that measurement units are consistent. Using data transformation tools and scripting languages like Python or R can help automate this process.
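A minimal sketch of unit and timestamp standardization in plain Python, assuming each reading arrives as an ISO 8601 timestamp string with an explicit UTC offset, a numeric value, and a unit label. The `TO_AMPERES` table and the `standardize` function are hypothetical; real devices may emit other formats that need their own parsers.

```python
from datetime import datetime, timezone

# Hypothetical scale factors from each device's unit label to amperes.
TO_AMPERES = {"A": 1.0, "mA": 1e-3, "uA": 1e-6}

def standardize(timestamp_iso: str, value: float, unit: str) -> tuple[datetime, float]:
    """Convert one reading to (timestamp in UTC, current in amperes).

    Assumes ISO 8601 timestamps with an explicit UTC offset, e.g. "+02:00".
    """
    ts = datetime.fromisoformat(timestamp_iso)
    if ts.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {timestamp_iso!r}")
    if unit not in TO_AMPERES:
        raise ValueError(f"unknown unit: {unit!r}")
    return ts.astimezone(timezone.utc), value * TO_AMPERES[unit]

# The same instant reported by two hypothetical devices in different zones and units.
print(standardize("2024-05-01T12:00:00+02:00", 250.0, "mA"))
print(standardize("2024-05-01T10:00:00+00:00", 0.25, "A"))
```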

What are the best practices for cleaning and preprocessing data from multiple sources?

Best practices for cleaning and preprocessing data include removing duplicates, handling missing values, normalizing data, and ensuring data integrity. It is also important to document the preprocessing steps to maintain data provenance. Tools like Pandas in Python or the dplyr package in R can be very useful for these tasks.
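The cleaning steps listed above (duplicates, missing values, normalization) can be sketched without external libraries. The input layout, a list of `(timestamp, value)` pairs with `None` or NaN marking a missing value, is an assumption, `clean_and_normalize` is a hypothetical name, and z-scoring with the population standard deviation is just one normalization choice. In pandas, the first two steps correspond roughly to `drop_duplicates` and `dropna`.

```python
import math
import statistics
from typing import Optional

def clean_and_normalize(samples: list[tuple[float, Optional[float]]]) -> list[tuple[float, float]]:
    """Drop duplicate timestamps and missing values, then z-score the values.

    Only the first occurrence of each timestamp is considered; a value of
    None or NaN counts as missing and that sample is dropped.
    """
    seen = set()
    kept = []
    for t, v in samples:
        if t in seen:
            continue  # duplicate timestamp: keep only the first occurrence
        seen.add(t)
        if v is None or math.isnan(v):
            continue  # missing value: drop it (interpolation is an alternative)
        kept.append((t, v))
    values = [v for _, v in kept]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("a constant signal cannot be z-scored")
    return [(t, (v - mu) / sigma) for t, v in kept]

# Hypothetical record with one duplicate, one None and one NaN.
raw = [(0.0, 1.0), (0.0, 1.0), (1.0, None), (2.0, 3.0), (3.0, float("nan")), (4.0, 5.0)]
print(clean_and_normalize(raw))
```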

How can I handle different sampling rates from various devices?

To handle different sampling rates, you can resample the data to a common frequency. This might involve upsampling (interpolating data points) or downsampling (aggregating data points). Techniques like linear interpolation, spline interpolation, or averaging can be used depending on the nature of the data.
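A sketch of both directions, assuming timestamps in seconds that are strictly increasing: `resample_linear` linearly interpolates onto a uniform grid, and `downsample_mean` averages non-overlapping blocks. Both helper names are hypothetical. Block averaging is only a crude anti-aliasing filter; for signals with fast transients, such as arcs, a proper low-pass filter before decimation (e.g. `scipy.signal.decimate` or `scipy.signal.resample_poly`) is usually preferable.

```python
from bisect import bisect_right

def resample_linear(t: list[float], x: list[float], fs_out: float) -> tuple[list[float], list[float]]:
    """Linearly interpolate samples (t, x) onto a uniform grid at fs_out Hz.

    Assumes t is strictly increasing with at least two samples. The grid starts
    at t[0] and ends at or before t[-1] (up to floating-point rounding).
    """
    n_out = int((t[-1] - t[0]) * fs_out) + 1
    grid = [t[0] + k / fs_out for k in range(n_out)]
    values = []
    for tk in grid:
        # Find the bracketing pair t[i-1] <= tk <= t[i]; clamping keeps both indices valid at the ends.
        i = min(max(bisect_right(t, tk), 1), len(t) - 1)
        w = (tk - t[i - 1]) / (t[i] - t[i - 1])
        values.append(x[i - 1] + w * (x[i] - x[i - 1]))
    return grid, values

def downsample_mean(x: list[float], factor: int) -> list[float]:
    """Average non-overlapping blocks of `factor` samples; an incomplete tail is dropped."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x) - factor + 1, factor)]

# Hypothetical data: 1 Hz samples upsampled to 2 Hz, and a 7-sample record averaged in pairs.
print(resample_linear([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], 2.0))
print(downsample_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 2))
```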

What methods can be used to synchronize data from different devices?

Synchronization can be achieved by aligning timestamps. This often requires converting all timestamps to a common time zone and format. If devices are not perfectly synchronized, you can use techniques like cross-correlation to find time offsets and adjust accordingly. Ensuring that all devices are synchronized to a common time source, such as NTP (Network Time Protocol), can also help.
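Cross-correlation estimates the offset as the shift that best lines one signal up with the other. The sketch below evaluates the unnormalized cross-correlation of two mean-removed signals over a limited range of lags, assuming both are already sampled at the same rate. `estimate_lag` is a hypothetical name; for long records, `numpy.correlate` or an FFT-based method would be much faster than this O(N·max_lag) loop.

```python
def estimate_lag(a: list[float], b: list[float], max_lag: int) -> int:
    """Return the lag L in [-max_lag, max_lag] (in samples) maximizing sum_n a[n] * b[n + L].

    A positive L means b is delayed relative to a by L samples.
    Both signals are mean-removed first; equal sampling rates are assumed.
    """
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    a0 = [v - ma for v in a]
    b0 = [v - mb for v in b]
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both signals have a sample.
        score = sum(a0[n] * b0[n + lag] for n in range(len(a0)) if 0 <= n + lag < len(b0))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical pulses: the same event seen at sample 3 by one device and sample 6 by another.
a = [0.0] * 10
a[3] = 1.0
b = [0.0] * 10
b[6] = 1.0
print(estimate_lag(a, b, max_lag=5))  # 3
```

Dividing the returned lag by the sampling frequency converts it to seconds.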

How do I combine and merge datasets from different sources for analysis?

Combining and merging datasets typically involves using keys or common fields that can link records from different sources. This can be done using database join operations (e.g., SQL JOINs) or data manipulation libraries in programming languages (e.g., Pandas merge function in Python). Ensuring that the keys are consistent and correctly formatted is crucial for accurate merging.
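To illustrate the key-based merge described above without pandas or SQL, the sketch below performs an inner join on a single key column. The record layout, the integer-microsecond key `t_us`, the helper name `inner_join`, and the `_l`/`_r` suffixes are assumptions for illustration (pandas `merge` uses `_x`/`_y` by default). Storing times as integers avoids floating-point keys that fail to match because of rounding.

```python
def inner_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two lists of records on `key`, like an SQL INNER JOIN.

    Non-key columns present on both sides get the suffixes _l and _r.
    Assumes keys are already normalized (same type, units and formatting).
    """
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)  # key -> all right-hand rows with that key
    merged = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            out = {key: lrow[key]}
            for col, val in lrow.items():
                if col != key:
                    out[col + "_l" if col in rrow else col] = val
            for col, val in rrow.items():
                if col != key:
                    out[col + "_r" if col in lrow else col] = val
            merged.append(out)
    return merged

# Hypothetical readings from two devices, keyed by time in integer microseconds.
left = [{"t_us": 0, "current": 1.2}, {"t_us": 1000, "current": 1.5}]
right = [{"t_us": 1000, "current": 0.9}, {"t_us": 2000, "current": 0.7}]
print(inner_join(left, right, "t_us"))
```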
