Training YOLOv9 on custom dataset

  • Thread starter falyusuf
  • #1
falyusuf
Homework Statement
I want to use the Python language in Kaggle to train YOLOv9 on a custom dataset containing three folders: train, test, and valid. Each folder has two subfolders: images and labels. The dataset is installed from Roboflow, using the YOLOv9 model. However, I encountered an error during the training process.

Below, I have attached my code and the error message.
Relevant Equations
-
Setting up YOLOv9 and downloading the dataset:
!git clone https://github.com/SkalskiP/yolov9.git
%cd yolov9
!pip3 install -r requirements.txt -q

import os
Home = os.getcwd()
print(Home)

!mkdir -p /kaggle/working/yolov9/weights
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c.pt
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-e.pt
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c.pt
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-e.pt
   
%cd /kaggle/working/yolov9

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="#my_api")
project = rf.workspace("egat-43h2x").project("parking_lot-1lqk2")
version = project.version(1)
dataset = version.download("yolov9")
The dataset has been uploaded.

training YOLOv9:
%cd /kaggle/working/yolov9

!python train.py \
--batch 16 --epochs 50 --img 640 --min-items 0 --close-mosaic 15 \
--data /kaggle/input/parkingspacesyolov9/data.yaml \
--weights /kaggle/working/yolov9/weights/gelan-c.pt \
--cfg /kaggle/working/yolov9/models/detect/gelan-c.yaml \
--hyp hyp.scratch-high.yaml

I am getting this error:
Logging results to runs/train/exp
Starting training for 50 epochs...

Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
0%| | 0/84 00:00
Traceback (most recent call last):
  File "/kaggle/working/yolov9/train.py", line 634, in <module>
    main(opt)
  File "/kaggle/working/yolov9/train.py", line 528, in main
    train(opt.hyp, opt, device, callbacks)
  File "/kaggle/working/yolov9/train.py", line 277, in train
    for i, (imgs, targets, paths, _) in pbar: # batch -------------------------------------------------------------
  File "/opt/conda/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/kaggle/working/yolov9/utils/dataloaders.py", line 656, in __getitem__
    img, labels = self.load_mosaic(index)
  File "/kaggle/working/yolov9/utils/dataloaders.py", line 791, in load_mosaic
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
  File "/kaggle/working/yolov9/utils/augmentations.py", line 248, in copy_paste
    l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range

How to fix?
 
  • #2
falyusuf said:
Traceback (most recent call last):
<snip>
  File "/kaggle/working/yolov9/utils/augmentations.py", line 248, in copy_paste
    l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range
The above is the last frame you showed, so it represents the most recent call. It looks to me like the index j is out of range for one or more of the lists labels, boxes, and segments.
 
  • #3
Mark44 said:
The above is the last frame you showed, so it represents the most recent call. It looks to me like the index j is out of range for one or more of the lists labels, boxes, and segments.
I have copied the code snippet for uploading the dataset directly from Roboflow for the YOLOv9 model. I installed YOLOv9 and its requirements as mentioned on the official website. Where might the problem be?
 
  • #4
falyusuf said:
I have copied the code snippet for uploading the dataset directly from Roboflow for the YOLOv9 model. I installed YOLOv9 and its requirements as mentioned on the official website. Where might the problem be?
Mark just told you. Have you checked the indices and their ranges yet?
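One way to act on that suggestion is to inspect the label files rather than the training code. The failing line uses the same index j for labels, boxes, and segments, so the error would appear whenever segments is shorter than the other two. A situation that could cause this, offered as a hypothesis and not a confirmed diagnosis, is a label set that mixes bounding-box rows (5 values) with polygon rows (a class index plus three or more x, y pairs). The sketch below only counts the two row types per file; the function names are made up for this example and are not part of YOLOv9 or Roboflow.

```python
from pathlib import Path


def count_label_rows(label_dir):
    """Return {filename: (box_rows, polygon_rows, malformed_rows)} for *.txt files."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        boxes = polygons = malformed = 0
        for line in path.read_text().splitlines():
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) == 5:
                boxes += 1      # class, x_center, y_center, width, height
            elif len(fields) >= 7:
                polygons += 1   # class plus at least three x, y pairs
            else:
                malformed += 1  # too few values to be a box or a polygon
        report[path.name] = (boxes, polygons, malformed)
    return report


def is_mixed(report):
    """True when the folder contains both box rows and polygon rows."""
    total_boxes = sum(b for b, _, _ in report.values())
    total_polygons = sum(p for _, p, _ in report.values())
    return total_boxes > 0 and total_polygons > 0
```

Running count_label_rows on the labels subfolder of train (its exact path on Kaggle is not shown in the thread) and passing the result to is_mixed shows whether the two annotation types coexist. If they do, options worth trying are re-exporting the dataset with a single annotation type, or setting copy_paste to 0 in a copy of the hyperparameter file so that this augmentation step does not index segments.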
 

FAQ: Training YOLOv9 on custom dataset

How do I prepare my custom dataset for training YOLOv9?

To prepare a custom dataset for YOLOv9, label the images in YOLO format: each image has a text file with the same base name, containing one line per object with the class index followed by the normalized bounding-box values (x_center, y_center, width, height). Split the dataset into training, validation, and test sets, as with the train, valid, and test folders in the post above. Finally, create a data configuration file (data.yaml in the command above) that lists the paths to these sets and the class names.
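As a rough illustration of the two pieces described here, the sketch below validates a single bounding-box label line and builds the text of a data configuration file. The keys train, val, test, nc, and names follow the usual YOLOv5-style layout; the root path and the class names in the usage note are hypothetical, and both function names are invented for this example.

```python
def parse_box_line(line):
    """Parse one YOLO box row: class index plus four normalized values."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 values, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    coords = tuple(float(v) for v in fields[1:])
    if not all(0.0 <= v <= 1.0 for v in coords):
        raise ValueError(f"coordinates must be normalized to [0, 1]: {line!r}")
    return (class_id, *coords)


def build_data_yaml(root, class_names):
    """Return data.yaml text for a dataset with train/valid/test image folders."""
    quoted = ", ".join(f"'{name}'" for name in class_names)
    lines = [
        f"train: {root}/train/images",
        f"val: {root}/valid/images",
        f"test: {root}/test/images",
        f"nc: {len(class_names)}",  # number of classes
        f"names: [{quoted}]",
    ]
    return "\n".join(lines) + "\n"
```

For example, build_data_yaml("/data/parking", ["empty", "occupied"]) returns five lines that can be written to a data.yaml file, and parse_box_line raises ValueError for a row that is not a well-formed box, which makes it easy to spot unexpected rows before training starts.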

What are the hardware requirements for training YOLOv9?

Training YOLOv9 is practical only with a GPU, and the amount of VRAM needed depends mainly on the model variant, the image size, and the batch size (the command above uses --img 640 and --batch 16). As a rough guide rather than a hard limit, 8 GB of VRAM, a multi-core CPU, and 16 GB of RAM are a workable starting point, and reducing the batch size is the usual first remedy for out-of-memory errors. Fast local storage such as an SSD also shortens data loading times.

How do I fine-tune the hyperparameters for YOLOv9 training?

Tuning hyperparameters for YOLOv9 means adjusting values such as the learning rate, batch size, number of epochs, and augmentation strengths. Batch size and epochs are command-line options (--batch and --epochs in the command above), while most other values live in the file passed with --hyp, such as hyp.scratch-high.yaml. Start from the defaults, change one value at a time, and compare runs on the validation set using mAP (mean Average Precision).
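A minimal sketch of changing one value at a time, assuming the hyperparameter file is a flat list of "key: value  # comment" lines. The helper override_hyp is invented for this example, and the sample values in the usage note are illustrative, not copied from the repository.

```python
import re


def override_hyp(text, key, value):
    """Replace the value of one top-level key, keeping any trailing comment."""
    pattern = re.compile(
        rf"^({re.escape(key)}[ \t]*:[ \t]*)(\S+)(.*)$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", text, count=1
    )
    if count == 0:
        raise KeyError(key)  # the key is not present in the file
    return new_text
```

Given the text "copy_paste: 0.3  # copy-paste probability", calling override_hyp(text, "copy_paste", 0.0) returns the same line with 0.0 in place of 0.3. Writing the result to a new file and passing that file to --hyp leaves the original defaults untouched, which keeps runs easy to compare.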

How can I monitor the training process and evaluate the model's performance?

train.py writes its logs to a run directory (runs/train/exp in the output above), and tools such as TensorBoard or Weights & Biases can plot the logged metrics during training. For a detector, the useful curves are the training and validation losses together with precision, recall, and mAP, rather than classification accuracy. Evaluate on the validation set periodically to catch overfitting, and after training use the held-out test set to measure final performance.
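If the run directory contains a per-epoch CSV of metrics, a few lines of Python are enough to find the best epoch without a dashboard. This is a sketch under assumptions: the column names "epoch" and "metrics/mAP_0.5" follow the YOLOv5-style results file and may differ in a given YOLOv9 checkout, and best_epoch is a hypothetical helper.

```python
import csv
import io


def best_epoch(csv_text, metric):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]  # headers may be space-padded
    metric_col = header.index(metric)
    epoch_col = header.index("epoch")
    best = None
    for row in reader:
        if not row:
            continue  # skip empty lines
        value = float(row[metric_col])
        epoch = int(float(row[epoch_col]))
        if best is None or value > best[1]:
            best = (epoch, value)
    return best
```

Reading the file with open(path).read() and passing the text to best_epoch gives the epoch to compare against the final one; a best epoch far earlier than the last is one sign that the later epochs were overfitting.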

What are common issues faced during YOLOv9 training and how can I troubleshoot them?

Common issues during YOLOv9 training include overfitting, underfitting, poor convergence, and data-pipeline errors such as the IndexError in this thread. For the first three, check that the dataset is reasonably balanced and large enough, and use data augmentation to increase its diversity. If validation loss rises while training loss keeps falling, the model is overfitting: train for fewer epochs, add data, or strengthen augmentation or weight decay. For poor convergence, try a lower learning rate or a different optimizer. For data-loader errors, read the innermost frame of the traceback and check the label files that frame processes.
