Choice of Pipelines for Data Analysis

In summary, the conversation discusses using TensorFlow (TF) to process data and how it compares with traditional calculation methods such as Linear/Multilinear Regression. The original poster assumes that the main variable affecting this choice is the type of Activation Function used. A reply notes that, given the same model structure, TF should converge on the same coefficients as a traditional calculation, while the hidden layers of a neural network give a non-linear fit that typically performs better. The conversation also mentions working through a Keras tutorial on a fuel efficiency dataset and provides resources for further learning, including the books "Hands On ML with Scikit-Learn, TF, and Keras" and "The 100 pg ML book."
  • #1
WWGD
TL;DR Summary
What kind of rules of thumb are there to decide choice of pipeline?
Hi,
So say I have some data to process. I am trying, say, Linear/Multilinear Regression. I know how to do this within Python Pandas. I can learn how with Tensorflow (TF). Would TF produce the same output given the "right" choice of Activation Functions *? Or would it output a model that is somehow "More General"?

* I assume this is the only/main variable affecting this choice and not other variables such as choice of metrics, sessions, etc.
 
  • #2
Given the same model structure (choice of metrics, properly normalized data, loss function), I don't see why TF should not converge on the same coefficients as a traditional calculation. However, with a neural network we introduce hidden layers that, when combined with non-linear activation functions, create a non-linear fit, which in most cases will perform better.
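
As a rough check on the first point, here is a minimal sketch in plain Python rather than TensorFlow itself. It assumes a single linear unit (no hidden layers, identity activation) trained by gradient descent on mean squared error, which is roughly what a one-unit dense layer does. The data, learning rate, and step count are made up for illustration, and the function names `ols_fit` and `gd_fit` are hypothetical.

```python
# Pure-Python stand-in for a single linear unit trained with MSE loss.
# Data and hyperparameters are illustrative assumptions, not from the thread.

def ols_fit(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gd_fit(xs, ys, lr=0.05, steps=5000):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9]

ols_w, ols_b = ols_fit(xs, ys)
gd_w, gd_b = gd_fit(xs, ys)
print(f"OLS: slope={ols_w:.4f}, intercept={ols_b:.4f}")
print(f"GD : slope={gd_w:.4f}, intercept={gd_b:.4f}")
```

On this toy data the two fits should agree to many decimal places. The loss is convex for a purely linear model, so gradient descent with a small enough learning rate ends up at the least-squares solution. That guarantee no longer holds once hidden layers with non-linear activations are added.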

Have you worked through the Keras tutorial on the fuel efficiency dataset?
 
  • #3
Does TF stand for TensorFlow or The f$%*? ;). Thanks for your answer. Will look up the link; thanks.
 
  • #4
WWGD said:
Does TF stand for TensorFlow or The f$%*? ;).
I must admit to having used the words "why won't you converge you f$%*?" or similar on a number of occasions.
 

FAQ: Choice of Pipelines for Data Analysis

What is a pipeline in data analysis?

A pipeline in data analysis is a sequence of steps or processes that are used to transform raw data into meaningful insights. It involves collecting, cleaning, and organizing data, applying statistical and machine learning techniques, and visualizing the results.
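
To make the "sequence of steps" idea concrete, here is a minimal sketch using only the Python standard library. The step names (`clean`, `organize`, `summarize`), the `run_pipeline` helper, and the sample records are all hypothetical, chosen for illustration only.

```python
from statistics import mean


def clean(rows):
    """Drop records whose value is missing (None)."""
    return [r for r in rows if r["value"] is not None]


def organize(rows):
    """Group the remaining values by category."""
    groups = {}
    for r in rows:
        groups.setdefault(r["category"], []).append(r["value"])
    return groups


def summarize(groups):
    """Apply a simple statistic (the mean) to each group."""
    return {name: mean(values) for name, values in groups.items()}


def run_pipeline(data, steps):
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        data = step(data)
    return data


raw = [
    {"category": "a", "value": 1.0},
    {"category": "a", "value": 3.0},
    {"category": "b", "value": None},
    {"category": "b", "value": 4.0},
]
result = run_pipeline(raw, [clean, organize, summarize])
print(result)
```

Keeping each step as a separate function means a step can be swapped or reordered without touching the others, which is one reason analyses are often structured this way.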

Why is choosing the right pipeline important in data analysis?

Choosing the right pipeline is important because it can greatly impact the accuracy and reliability of the results. Different pipelines may produce different outcomes, so it is crucial to select the one that is most suitable for the specific dataset and research question.

What factors should be considered when selecting a pipeline for data analysis?

Some important factors to consider when selecting a pipeline for data analysis include the type and size of the dataset, the research question, the available tools and resources, and the desired outcome. It is also important to consider the expertise and experience of the data analyst in using different pipelines.

What are some commonly used pipelines in data analysis?

Some commonly used pipelines in data analysis include the ETL (extract, transform, load) pipeline, which involves extracting data from various sources, transforming it into a usable format, and loading it into a database for analysis. The CRISP-DM (Cross-Industry Standard Process for Data Mining) pipeline is another popular approach, which involves six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
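
The ETL steps described above might look like the following toy sketch, which assumes an inline CSV string as the source and an in-memory SQLite database as the target. The column names, table name, and unit conversion are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files or query APIs.
CSV_TEXT = """name,height_cm
alice,170
bob,182
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert centimetres to metres."""
    return [(r["name"].title(), float(r["height_cm"]) / 100.0) for r in rows]


def load(records, conn):
    """Load: write the transformed records into a database table."""
    conn.execute("CREATE TABLE people (name TEXT, height_m REAL)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(CSV_TEXT)), conn)
rows = conn.execute("SELECT name, height_m FROM people ORDER BY name").fetchall()
print(rows)
```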

Is it possible to change the pipeline during the data analysis process?

Yes, it is possible to change the pipeline during the data analysis process. This may be necessary if the initial pipeline is not producing the desired results or if new information or tools become available. However, it is important to carefully evaluate the potential impact of changing the pipeline and to document any changes made for transparency and reproducibility.
