Data Science and the need of programming?

In summary, the original poster has an undergraduate degree in CS and is starting grad school to study AI and machine learning. They want a career built around theory and ideas rather than routine coding, and ask whether a job as a data scientist or machine learning engineer would involve mostly theory or mostly application code. The replies suggest that most software jobs involve mundane tasks, although more interesting work exists in research and in specialized government departments. However, most companies prioritize deploying marketable, stable products over proofs of concept, which makes much of the work less exciting.
  • #1
Chubigans
Hello. I have an undergraduate degree in CS and I'm beginning grad school to study AI and Machine Learning.

I've spent a lot of time doing "software engineering" (dealing with design patterns, web development, QA, Agile processes, SQL, GUI development) and I'm really sick of it. I don't use much of my computer science knowledge as a programmer. I studied CS because I find beauty in algorithms, but there's not much theory in writing web apps or .NET apps.

I love writing proofs, analyzing algorithms and data structures, writing Python and Lisp code, etc. I am really into *proofs of concept* as opposed to *ultra robust software*... I like to deal in ideas, not code. I love to read and write papers, and I love to share ideas and teach... so I'm looking for a career that involves that.

As a data scientist / machine learning engineer with a Ph.D. in CS, would I be expected to write a lot of "application" code, or would I spend the majority of my workday in the realm of theory, creating new ideas and hacking together prototypes in scripting languages?

Is it essentially just an advanced software development job?

Attached is an example job posting from CL:
http://sfbay.craigslist.org/sfc/sci/3722725371.html
 
  • #2
I'm a software engineer with a computer science degree and over 20 years' experience. Like you, I'm a bit of a guru on theoretical computer science.

Calling yourself a "data scientist" is the way of the future. It gives a catchy marketing title for those of us with hard science degrees.

You are right, the majority of the code in this world is boring. Most of the world is businesses producing mundane things, and they need mundane software. The "interesting" jobs I've had have been rare.
There is some non-boring work, such as research or startups, but there are not many of those jobs.

Some ideas for less boring jobs:
- specialist government departments with technical needs, e.g. defence, defence contractors, environment, energy
- anything to do with statistics, e.g. government departments
- engineering-related jobs
- university - non-academic programming jobs

For fun read http://www.kalzumeus.com/2011/10/28/dont-call-yourself-a-programmer/
 
  • #3
I have a PhD in physics and have worked two jobs that fit the "data science" description. In the first job, there was very little need for coding, but there was also very little in the way of algorithm design. Mostly, it was tiny tweaks of off-the-shelf statistical algorithms, that sort of thing. To give an idea of how far it was from a robust-software environment, one of the "production models" that predated me was a series of Excel workbooks, with data transferred via copy-paste.

In my newer job, I still don't do much in the way of algorithm design, but I do a fair amount of coding. For most business problems, existing methods work very well, so off-the-shelf statistical models (in R or any of the other statistical packages) are the name of the game. It's less like research into machine learning and more like applied machine learning. But we do provide finished models in various forms for the client, which is closer to robust software, so much more of my time is spent on IT-solution and routine-coding type stuff. I spent most of yesterday coding up various unit tests.
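For readers unfamiliar with what unit tests around a delivered model look like, here is a minimal sketch in Python. It is only an illustration: `fit_line` is a hypothetical stand-in for whatever off-the-shelf model is shipped to a client, and the test cases are made up rather than taken from this post.

```python
import unittest


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical stand-in for an off-the-shelf statistical model.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


class FitLineTest(unittest.TestCase):
    def test_recovers_exact_line(self):
        # Points generated from y = 2x + 1, so the fit should be exact.
        slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
        self.assertAlmostEqual(slope, 2.0)
        self.assertAlmostEqual(intercept, 1.0)

    def test_rejects_constant_x(self):
        # A vertical cloud of points has no well-defined slope.
        with self.assertRaises(ValueError):
            fit_line([1, 1, 1], [0, 1, 2])


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FitLineTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these check both the expected behaviour on known inputs and the failure mode on bad inputs, which is much of what "robust software" means for a delivered model.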
 
  • #4
Almost everyone who works in software would prefer to do proofs of concept: do the fun part of prototyping a new idea and then jump to a new project. The problem is that there is no money in that for a company. Proofs of concept don't make money, deployed products do...

On real-world software projects the devil is in the details. Coming up with a proof of concept is the easy part. Turning that proof of concept into a marketable, usable, and stable product is the hard part. It is also the more boring part.
 
  • #5


As a scientist with experience in both data science and programming, I can understand your frustration with the current state of software engineering. However, I believe that data science and programming go hand in hand and are both essential skills for a successful career in this field.

Data science involves using various techniques and tools to extract insights and knowledge from large datasets. This often requires programming skills to manipulate and analyze the data. In addition, data scientists are also responsible for creating models and algorithms to make predictions and guide decision making. These models and algorithms are often implemented through coding.

While it is true that data scientists spend a significant amount of time in the realm of theory and developing new ideas, it is also important to have the practical skills to bring those ideas to life through programming. This not only allows for better understanding and testing of the theories, but it also enables the implementation of these ideas into real-world applications.

Furthermore, data science is a rapidly growing field, and the demand for professionals with a combination of data science and programming skills is increasing. As a data scientist with a Ph.D. in CS, you would be expected to have a strong foundation in both theory and programming, and your expertise in both areas would be highly valued in the industry.

With regard to the job posting from CL, note that every company and position may have different requirements and expectations. In general, however, data scientists are expected to have a strong understanding of theoretical concepts and the ability to apply them through programming.

In conclusion, while data science may involve a fair amount of programming, it is not just an advanced software development job. It requires a unique combination of skills, including theory, programming, and communication, to be successful in this field. I believe that with your background in CS and your passion for proofs and ideas, you would make a great data scientist and have a fulfilling career in this exciting and constantly evolving field.
 

FAQ: Data Science and the need of programming?

1. What is data science?

Data science is a multidisciplinary field that combines statistical analysis, machine learning, and programming to extract insights and knowledge from data. It involves using various tools and techniques to collect, organize, and analyze large sets of data in order to make informed decisions and predictions.

2. What is the role of programming in data science?

Programming is an essential part of data science because it enables data scientists to manipulate, clean, and analyze large datasets. Programming languages such as Python, R, and SQL are commonly used in data science for tasks such as data cleaning, statistical analysis, and machine learning.
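As a rough illustration of the cleaning and analysis tasks mentioned above, the following sketch uses only the Python standard library. The dataset, the column names (`city`, `temp_c`), and the helper `clean_rows` are all made up for the example; real projects would more likely use a dedicated data library.

```python
import csv
import io
import statistics

# Hypothetical raw data: names and values are invented for illustration.
RAW_CSV = """city,temp_c
Oslo,4.5
Cairo,
Lima,19.0
Pune,n/a
Rome,15.5
"""


def clean_rows(text):
    """Parse CSV text and keep only rows whose temp_c is a valid number."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            cleaned.append((row["city"], float(row["temp_c"])))
        except ValueError:
            # Skip blanks and non-numeric placeholders such as "n/a".
            continue
    return cleaned


rows = clean_rows(RAW_CSV)
temps = [temp for _, temp in rows]

# Simple summary statistics on the cleaned column.
print(len(rows), statistics.mean(temps), statistics.median(temps))
```

Two of the five rows are dropped during cleaning, and the summary statistics are computed only on the three rows that remain.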

3. Do I need to have a background in programming to become a data scientist?

Having a background in programming is not a requirement to become a data scientist, but it can be beneficial. Many data scientists come from different backgrounds such as mathematics, statistics, or business, and learn programming skills on the job. However, having a strong foundation in programming can help you to quickly pick up the necessary skills and tools used in data science.

4. What are the benefits of using programming in data science?

Programming allows data scientists to automate tasks and perform complex analyses on large datasets. It also enables them to create customized solutions for data-related problems and build predictive models. Additionally, programming skills are highly sought after in the job market, so data scientists who can program are in high demand.

5. Which programming language should I learn for data science?

The most commonly used programming languages in data science are Python, R, and SQL. Each language has its own strengths and weaknesses, so it ultimately depends on the specific needs and goals of the data scientist. However, learning basic programming concepts and understanding the fundamentals of data structures and algorithms can be beneficial regardless of the language chosen.
