[Python] finding the correct data mining approach

In summary, the conversation is about predicting the number of log-ins to a website, by day and hour, from past data. The poster suggests grouping the data by day of the week and by hour, then using regression analysis to make predictions, and asks for a resource on how to do this in Python.
  • #1
eherrtelle59
I'm having trouble finding the correct approach to my (fairly simple) example.

Let's say I have months of data for log-in times of a certain website. The data has been selected and cleaned such that I have a list of Date_Time for each log-in.

Now, suppose I wanted to predict the log-ins for the next two weeks by day and hour, based on these past trends.

I imagine I would group the data by day of the week (assuming beforehand that there will be different trends for, say, Monday vs. Friday) and run a regression analysis to predict the next two Mondays, and likewise for the other days.

Similarly, I could cluster by the hour and do a regression analysis to extrapolate the trend of log-ins.

Anyone know of a resource which tells you how to do this in Python? I want to keep this example fairly straightforward, but I'm open to any more ideas on how to model this behavior more efficiently.
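
For what it's worth, here is a minimal sketch of the approach described above, using only the standard library. It counts log-ins per (date, hour), groups the history into one series per (weekday, hour) slot, fits a straight line to each series, and extrapolates it. The names `hourly_counts`, `fit_line` and `forecast` are made up for this example, and it assumes the history covers whole days and that a linear trend per slot is a reasonable model, which may not hold for real traffic.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def hourly_counts(logins):
    """Count log-ins per (calendar date, hour of day)."""
    return Counter((t.date(), t.hour) for t in logins)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n == 0:  # no history for this slot: predict zero
        return 0.0, 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # only one observation: fall back to a flat line
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def forecast(logins, days_ahead=14):
    """Predict log-ins per (date, hour) for the days after the last observed date."""
    counts = hourly_counts(logins)
    first = min(d for d, _ in counts)
    last = max(d for d, _ in counts)

    # One series per (weekday, hour) slot. x is the number of days since
    # the first date, y is the log-in count; empty hours count as zero.
    series = defaultdict(lambda: ([], []))
    day = first
    while day <= last:
        for hour in range(24):
            xs, ys = series[(day.weekday(), hour)]
            xs.append((day - first).days)
            ys.append(counts.get((day, hour), 0))
        day += timedelta(days=1)

    predictions = {}
    for offset in range(1, days_ahead + 1):
        day = last + timedelta(days=offset)
        x = (day - first).days
        for hour in range(24):
            a, b = fit_line(*series[(day.weekday(), hour)])
            predictions[(day, hour)] = max(0.0, a + b * x)  # counts cannot be negative
    return predictions
```

Calling `forecast(logins)` on a list of `datetime` objects returns a dict keyed by `(date, hour)` for the following 14 days. With only a few months of data each slot has a dozen or so points, so the fitted slopes will be noisy; averaging each slot instead of fitting a trend is a reasonable baseline to compare against.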
 
  • #3
There is also lowess.
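
To illustrate what lowess does, here is a simplified pure-Python sketch: for each point it fits a weighted straight line to the nearest fraction `frac` of the data, using tricube weights, and takes the fitted value at that point. This is only the basic idea; it leaves out the robustness iterations of the full algorithm, and the function name and `frac` default are choices made for this example.

```python
def lowess(xs, ys, frac=0.5):
    """Locally weighted linear regression (no robustness iterations).

    Returns the smoothed value at every x in xs.
    """
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))  # points in each local window
    smoothed = []
    for x0 in xs:
        # The distance to the k-th nearest neighbour sets the window width.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights: near points count most, points at or beyond h count zero.
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted least-squares line through the neighbourhood.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed
```

Applied to daily log-in counts (x as the day index, y as the count), this gives a smooth trend curve rather than a forecast by itself. In practice the `lowess` function in `statsmodels.nonparametric.smoothers_lowess` is the more complete option.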
 

Related to [Python] finding the correct data mining approach

1. What is data mining and why is it important?

Data mining is the process of extracting useful and actionable insights from large amounts of data. It is important because it helps organizations make better decisions, identify patterns and trends, and improve their overall efficiency and productivity.

2. How do I choose the right data mining approach for my project?

Choosing the right data mining approach depends on the type of data you have, the goals of your project, and the resources available. Some common approaches include classification, clustering, regression, and association rule mining. It is important to carefully consider the specific needs of your project before selecting an approach.

3. What is the difference between supervised and unsupervised learning in data mining?

Supervised learning involves using labeled data to train a model and make predictions, while unsupervised learning involves finding patterns and relationships in unlabeled data. In other words, supervised learning requires a target variable to be identified, while unsupervised learning does not.

4. Can Python be used for data mining?

Yes, Python is a popular and powerful programming language for data mining. It offers a variety of libraries suited to data mining tasks, such as pandas for data manipulation, scikit-learn for machine learning, and TensorFlow for deep learning. Its simple syntax and extensive community support make it a great choice for beginners and experienced data scientists alike.

5. What are some common challenges in data mining and how can they be addressed?

Some common challenges in data mining include dealing with missing or noisy data, selecting relevant features, and handling large datasets. These challenges can be addressed by using data cleaning techniques, feature selection algorithms, and distributed computing platforms. It is also important to carefully plan and design a data mining project to address potential challenges from the start.
