Training YOLOv8 Models for Hailo-8
This guide covers the complete workflow for training a YOLOv8 object detection model on a Linux machine with GPU acceleration and compiling it to run on a Hailo-8 AI accelerator.
Prerequisites
Hardware Requirements
Linux machine with NVIDIA GPU (for training)
Enough GPU memory (VRAM) for the chosen model; requirements grow with model and batch size (e.g. 4-6 GB for yolov8n, 10-11 GB for yolov8m)
Hailo-8 device (for deployment)
Minimum 16GB RAM (32GB+ recommended for training)
75GB+ free disk space
Software Requirements
Ubuntu 20.04 or 22.04 (recommended)
NVIDIA drivers installed
Docker with NVIDIA Container Toolkit
Python 3.8 or higher
CUDA 11.0+ and cuDNN
Part 1: Setting Up the Training Environment
Install NVIDIA Drivers and CUDA
# Check if NVIDIA drivers are installed
nvidia-smi
# Install NVIDIA drivers if needed (Ubuntu)
sudo apt update
sudo apt install nvidia-driver-525
sudo reboot
Install Docker and NVIDIA Container Toolkit
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
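sudo usermod -aG docker $USER
# Log out and back in (or run newgrp docker) so the group change takes effect
# Install NVIDIA Container Toolkit (repository setup from NVIDIA's current install guide)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker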
# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Install Python and YOLOv8
# Create a virtual environment
python3 -m venv yolov8-env
source yolov8-env/bin/activate
# Install PyTorch with CUDA 11.8 support first, so Ultralytics does not pull in a different build
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Install Ultralytics YOLOv8
pip install ultralytics
Part 2: Preparing Your Dataset
Using Roboflow
You can use Roboflow to easily create and manage your dataset:
Sign up at https://roboflow.com/
Upload your images and annotate them
Export the dataset in YOLO format
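Roboflow's YOLO export may group files by split first (for example train/images and train/labels); Ultralytics accepts either layout as long as the paths in the dataset YAML point at the image folders.
Dataset Structure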
YOLOv8 expects datasets in YOLO format with the following structure:
dataset/
├── images/
│   ├── train/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── val/
│       ├── image1.jpg
│       └── ...
└── labels/
    ├── train/
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    └── val/
        ├── image1.txt
        └── ...
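Before training, it can help to confirm that every image has a matching label file. The sketch below assumes the images/<split> and labels/<split> layout above and a few common image extensions; find_missing_labels is a hypothetical helper, not part of Ultralytics. Ultralytics treats an image without a label file as background, so a missing label is only a problem when it was not intentional.

```python
from pathlib import Path

# Image extensions to check (assumption); extend if your dataset uses others
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_missing_labels(dataset_root, splits=("train", "val")):
    """Return images (relative to dataset_root) that have no matching label file.

    Assumes the images/<split> and labels/<split> layout shown above.
    """
    root = Path(dataset_root)
    missing = []
    for split in splits:
        image_dir = root / "images" / split
        label_dir = root / "labels" / split
        for image in sorted(image_dir.glob("*")):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # skip non-image files
            # images/<split>/image1.jpg -> labels/<split>/image1.txt
            if not (label_dir / (image.stem + ".txt")).exists():
                missing.append(image.relative_to(root).as_posix())
    return missing


# Example: print(find_missing_labels("dataset"))
```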
Label Format
Each label file contains one line per object in YOLO format:
class_id center_x center_y width height
The four coordinates are normalized to [0, 1] (x values and width by the image width, y values and height by the image height):
class_id: Integer class index (starting from 0)
center_x: X coordinate of the bounding box center
center_y: Y coordinate of the bounding box center
width: Width of the bounding box
height: Height of the bounding box
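As a worked example of the format, the hypothetical helper below converts a pixel-space box given by its top-left and bottom-right corners (a common annotation-tool convention, assumed here) into a YOLO label line:

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into a YOLO label line.

    (x_min, y_min) is the top-left corner and (x_max, y_max) the bottom-right,
    both in pixels; the output uses normalized center/size values.
    """
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


# A 200x240-pixel box with its top-left corner at (100, 120) in a 640x480 image
print(to_yolo_line(1, 100, 120, 300, 360, 640, 480))
# -> 1 0.312500 0.500000 0.312500 0.500000
```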
Create Dataset Configuration
Create a dataset.yaml file:
# dataset.yaml
path: /path/to/dataset
train: images/train
val: images/val
# Number of classes
nc: 3
# Class names
names:
0: class1
1: class2
2: class3
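Since every class_id must be smaller than nc, a quick scan of the label files can catch out-of-range classes and unnormalized coordinates, and the per-class counts also show how balanced the dataset is. This is a minimal sketch for detection labels (exactly five fields per line); check_labels is a hypothetical name:

```python
from collections import Counter
from pathlib import Path


def check_labels(label_dir, nc):
    """Validate YOLO detection label files in label_dir against nc classes.

    Returns (counts, errors): counts maps class_id -> number of valid boxes,
    errors lists problems as "file:line: message" strings.
    """
    counts = Counter()
    errors = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        lines = label_file.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            where = f"{label_file.name}:{line_no}"
            if len(fields) != 5:
                errors.append(f"{where}: expected 5 fields, got {len(fields)}")
                continue
            try:
                class_id = int(fields[0])
                coords = [float(v) for v in fields[1:]]
            except ValueError:
                errors.append(f"{where}: non-numeric value")
                continue
            if not 0 <= class_id < nc:
                errors.append(f"{where}: class_id {class_id} not in 0..{nc - 1}")
                continue
            if not all(0.0 <= v <= 1.0 for v in coords):
                errors.append(f"{where}: coordinates not normalized to [0, 1]")
                continue
            counts[class_id] += 1
    return counts, errors


# Example: counts, errors = check_labels("dataset/labels/train", nc=3)
```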
Part 3: Training YOLOv8 Model
Basic Training
# Activate environment
source yolov8-env/bin/activate
# Train YOLOv8n (nano) model
yolo detect train data=dataset.yaml model=yolov8n.pt epochs=100 imgsz=640 batch=16
Training Parameters
Key training parameters:
model: Pre-trained model (yolov8n.pt, yolov8s.pt, yolov8m.pt, yolov8l.pt, yolov8x.pt)
epochs: Number of training epochs (typically 100-300)
imgsz: Image size (640 is standard; use a multiple of 32, since other values are rounded up)
batch: Batch size (adjust based on GPU memory)
device: GPU index (e.g. 0 or 1) or cpu
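As far as we know, Ultralytics rounds an imgsz that is not a multiple of 32 up to the next multiple (with a warning) rather than rejecting it. The hypothetical helper below reproduces that arithmetic, which is handy when choosing a size you will reuse in the export and Hailo compile steps:

```python
import math


def round_imgsz(imgsz, stride=32):
    """Round an image size up to the next multiple of the model's largest stride."""
    return math.ceil(imgsz / stride) * stride


for size in (640, 600, 416, 650):
    print(size, "->", round_imgsz(size))
```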
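Training metrics and weights are saved under runs/detect/ in a new run directory (train, train2, ... by default; add name=train_exp to the training command to get the runs/detect/train_exp/ path used below). If TensorBoard logging is enabled (it requires the tensorboard package), view the metrics with:
tensorboard --logdir runs/detect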
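Each run directory also contains a results.csv with one row per epoch. The sketch below finds the epoch with the best validation mAP50-95; the column name metrics/mAP50-95(B) is an assumption based on recent Ultralytics versions, so check the header of your own file. best_epoch is a hypothetical helper.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest `metric` in results.csv text."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Some Ultralytics versions pad columns with spaces, so strip keys and values
        row = {key.strip(): value.strip() for key, value in row.items()}
        value = float(row[metric])
        if best is None or value > best[1]:
            best = (int(float(row["epoch"])), value)
    return best


# Example:
# from pathlib import Path
# print(best_epoch(Path("runs/detect/train/results.csv").read_text()))
```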
Part 4: Export to ONNX Format
The Hailo compilation step below starts from an ONNX model, so export the best checkpoint first:
# Export best model to ONNX
yolo export model=runs/detect/train_exp/weights/best.pt format=onnx imgsz=640 simplify=True
Or with Python:
from ultralytics import YOLO
# Load trained model
model = YOLO('runs/detect/train_exp/weights/best.pt')
# Export to ONNX
model.export(
format='onnx',
imgsz=640,
simplify=True,
opset=11,
)
This creates best.onnx in the same directory.
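To sanity-check the export, it helps to know what output shape to expect. Assuming the standard YOLOv8 detection head (strides 8, 16 and 32), a square input and batch size 1, the output holds 4 box values plus one score per class for every grid cell on the three scales. The hypothetical helper below computes that shape; with onnxruntime installed you could compare it against InferenceSession("best.onnx").get_outputs()[0].shape.

```python
def yolov8_output_shape(imgsz=640, nc=80, strides=(8, 16, 32)):
    """Expected YOLOv8 detection output shape for a square input: (batch, 4 + nc, N)."""
    # One prediction per grid cell on each detection scale:
    # 640 -> 80x80 + 40x40 + 20x20 = 8400 cells
    num_predictions = sum((imgsz // s) ** 2 for s in strides)
    # 4 box values (center_x, center_y, width, height) followed by one score per class
    return (1, 4 + nc, num_predictions)


print(yolov8_output_shape(640, nc=3))  # (1, 7, 8400)
```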
Part 5: Compile Model with Hailo Docker
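Pull Hailo Docker Image
The Hailo AI Software Suite image is distributed through the Hailo Developer Zone, which requires an account. If your release is provided as an archive instead of through a registry, load it with docker load -i <archive> rather than pulling it; the tag must match your suite release.
# Pull the Hailo AI Software Suite Docker image
docker pull hailo/hailo_ai_sw_suite:2023-07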
# Verify the image
docker images | grep hailo
Start Hailo Docker Container
# Create a working directory
mkdir -p ~/hailo-workspace
cd ~/hailo-workspace
# Copy your ONNX model
cp /path/to/best.onnx ~/hailo-workspace/
# Run Hailo container with GPU support
docker run -it --gpus all \
-v ~/hailo-workspace:/workspace \
--name hailo-compiler \
hailo/hailo_ai_sw_suite:2023-07 \
/bin/bash
Compile Model to HEF Format
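Inside the Docker container, compile the model with the Hailo Model Zoo CLI (hailomz), which runs the Dataflow Compiler's parsing, optimization (quantization) and compilation steps. Prepare the calibration images (see Calibration Dataset below) before running it: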
# Navigate to workspace
cd /workspace
# Compile ONNX model to HEF (Hailo Executable Format)
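# The model name must match the trained architecture, and --classes must equal nc in dataset.yaml
hailomz compile yolov8n \
  --ckpt best.onnx \
  --hw-arch hailo8 \
  --calib-path /workspace/calibration_images \
  --classes 3 \
  --performance
# The HEF is written to the current directory (e.g. yolov8n.hef; check with ls *.hef).
# Rename it to match the file name used in the deployment steps
mv yolov8n.hef yolov8n_hailo8.hef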
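The --classes value should match nc in dataset.yaml. The hypothetical helper below builds the same hailomz compile arguments used above, taking the class count from the dataset YAML text; it assumes a top-level nc: line as in the example configuration, and that you copy dataset.yaml into the workspace if you run it inside the container.

```python
import re


def hailomz_compile_cmd(dataset_yaml_text, onnx_path, calib_dir,
                        model="yolov8n", hw_arch="hailo8"):
    """Build the hailomz compile argument list, taking --classes from dataset.yaml."""
    # Assumes a top-level "nc: <int>" line, as in the dataset.yaml example
    match = re.search(r"^nc:\s*(\d+)", dataset_yaml_text, re.MULTILINE)
    if match is None:
        raise ValueError("no top-level 'nc:' entry found in dataset.yaml")
    return [
        "hailomz", "compile", model,
        "--ckpt", onnx_path,
        "--hw-arch", hw_arch,
        "--calib-path", calib_dir,
        "--classes", match.group(1),
        "--performance",
    ]


# Inside the container, for example:
# import subprocess
# from pathlib import Path
# cmd = hailomz_compile_cmd(Path("dataset.yaml").read_text(), "best.onnx",
#                           "/workspace/calibration_images")
# subprocess.run(cmd, check=True)
```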
Calibration Dataset
The calibration process needs representative images. Prepare this directory before running the compile command above; community scripts on GitHub can automate the selection.
# Create the calibration directory (aim for 50-100 representative images)
mkdir -p ~/hailo-workspace/calibration_images
# Copy representative images from your validation set (trim to 50-100 varied images if the set is large)
cp dataset/images/val/*.jpg ~/hailo-workspace/calibration_images/
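Copying the whole validation set can give far more than 50-100 images. The sketch below copies a reproducible random sample instead; sample_calibration_images is a hypothetical helper, and random sampling is only a simple stand-in for hand-picking images that cover your lighting, backgrounds and object sizes.

```python
import random
import shutil
from pathlib import Path


def sample_calibration_images(src_dir, dst_dir, count=64, seed=0,
                              suffixes=(".jpg", ".jpeg", ".png")):
    """Copy a reproducible random sample of images into dst_dir; return copied names."""
    images = sorted(p for p in Path(src_dir).iterdir()
                    if p.suffix.lower() in suffixes)
    rng = random.Random(seed)  # fixed seed -> same sample on every run
    chosen = rng.sample(images, min(count, len(images)))
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for image in chosen:
        shutil.copy2(image, dst / image.name)
    return sorted(p.name for p in chosen)


# Example (on the host):
# from pathlib import Path
# sample_calibration_images("dataset/images/val",
#                           Path.home() / "hailo-workspace" / "calibration_images")
```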
Part 6: Deploy to Hailo-8 Device
Copy HEF to Target System
# Exit Docker container
exit
# HEF file is now in ~/hailo-workspace/
ls ~/hailo-workspace/*.hef
# Copy to your robot or target device
scp ~/hailo-workspace/yolov8n_hailo8.hef user@robot:/path/to/models/
Python Inference Code
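Example Python code for inference on Hailo-8 using the HailoRT Python API (hailo_platform). It assumes the HEF was compiled with the Model Zoo's default YOLOv8 settings, which apply input normalization on the device, so frames are fed as uint8 RGB; adjust the input format if your compilation differs:
from hailo_platform import (
    HEF, VDevice, HailoStreamInterface, ConfigureParams,
    InferVStreams, InputVStreamParams, OutputVStreamParams, FormatType,
)
import numpy as np
import cv2


class YOLOv8Hailo:
    def __init__(self, hef_path):
        self.hef = HEF(hef_path)
        # VDevice uses an available Hailo-8 (connected over PCIe)
        self.device = VDevice()
        configure_params = ConfigureParams.create_from_hef(
            hef=self.hef, interface=HailoStreamInterface.PCIe)
        self.network_group = self.device.configure(self.hef, configure_params)[0]
        self.network_group_params = self.network_group.create_params()
        # Send uint8 pixels; normalization is assumed to be compiled into the HEF
        self.input_params = InputVStreamParams.make(
            self.network_group, format_type=FormatType.UINT8)
        # Receive dequantized float32 outputs
        self.output_params = OutputVStreamParams.make(
            self.network_group, format_type=FormatType.FLOAT32)
        self.input_info = self.hef.get_input_vstream_infos()[0]

    def preprocess(self, image):
        """Preprocess a BGR OpenCV image for YOLOv8."""
        # Resize to the model input size read from the HEF (height, width, channels)
        height, width = self.input_info.shape[:2]
        img = cv2.resize(image, (width, height))
        # Convert BGR to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # Keep uint8 values in [0, 255] and add a batch dimension -> (1, H, W, 3)
        return np.expand_dims(img, axis=0)

    def postprocess(self, outputs, conf_threshold=0.25):
        """Post-process YOLOv8 outputs."""
        # Implementation depends on your specific model output format
        # (for example, whether NMS was compiled into the HEF). Typically involves:
        # 1. Decoding bounding boxes
        # 2. Applying confidence threshold
        # 3. Non-maximum suppression
        detections = []
        # Your post-processing logic here
        return detections

    def infer(self, image):
        """Run inference on a single image."""
        input_data = {self.input_info.name: self.preprocess(image)}
        with InferVStreams(self.network_group, self.input_params,
                           self.output_params) as infer_pipeline:
            # The network group must be active while inferring
            with self.network_group.activate(self.network_group_params):
                output_dict = infer_pipeline.infer(input_data)
        return self.postprocess(output_dict)


# Usage
detector = YOLOv8Hailo('yolov8n_hailo8.hef')
image = cv2.imread('test_image.jpg')
results = detector.infer(image)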
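What the postprocess stub has to handle depends on the outputs of the compiled HEF, which you can inspect with hef.get_output_vstream_infos(). As a reference point, the sketch below decodes the raw output of the exported best.onnx, shape (1, 4 + nc, N), using a confidence threshold and greedy per-class NMS; pass output[0]. It is a minimal NumPy version, not the Ultralytics or Hailo implementation, and a HEF compiled with on-device NMS will need different decoding. Boxes come out in model-input pixels (e.g. 640x640), so scale them by the original image size divided by the input size before drawing.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def decode_yolov8(output, conf_threshold=0.25, iou_threshold=0.45):
    """Decode a (4 + nc, N) YOLOv8 output into rows [x1, y1, x2, y2, score, class_id]."""
    preds = output.T                        # -> (N, 4 + nc)
    cxcywh, scores = preds[:, :4], preds[:, 4:]
    class_ids = scores.argmax(axis=1)       # best class per prediction
    confs = scores.max(axis=1)
    keep = confs >= conf_threshold
    cxcywh, class_ids, confs = cxcywh[keep], class_ids[keep], confs[keep]
    # Convert center/size to corner coordinates
    boxes = np.empty_like(cxcywh)
    boxes[:, :2] = cxcywh[:, :2] - cxcywh[:, 2:] / 2
    boxes[:, 2:] = cxcywh[:, :2] + cxcywh[:, 2:] / 2
    detections = []
    # Greedy NMS per class, from highest to lowest confidence
    for cls in np.unique(class_ids):
        idx = np.where(class_ids == cls)[0]
        idx = idx[np.argsort(-confs[idx])]
        while idx.size > 0:
            best = idx[0]
            detections.append([*boxes[best], confs[best], int(cls)])
            if idx.size == 1:
                break
            overlaps = iou(boxes[best], boxes[idx[1:]])
            idx = idx[1:][overlaps < iou_threshold]  # drop heavy overlaps
    return np.array(detections, dtype=np.float32).reshape(-1, 6)
```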
Best Practices
Training Tips
Use transfer learning: Always start with pre-trained weights
Augmentation: Enable data augmentation for better generalization
Dataset size: Aim for at least 1,500 images per class (Ultralytics' training tips also suggest around 10,000 labeled instances per class)
Class balance: Try to balance the number of samples per class
Image quality: Use high-quality, representative images
Validation split: Use 80/20 or 70/30 train/val split
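If your images and labels sit in two flat folders, the sketch below copies them into the images/labels train/val layout from Part 2 with a fixed seed. split_dataset is a hypothetical helper; a purely random split can leak near-duplicate frames (for example from the same video) into both sets, so split by recording session when that applies.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_images, src_labels, dst_root, val_fraction=0.2, seed=0):
    """Copy image/label pairs into dst_root/{images,labels}/{train,val}.

    Returns a dict with the number of images placed in each split.
    """
    images = sorted(p for p in Path(src_images).iterdir()
                    if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    random.Random(seed).shuffle(images)  # reproducible shuffle
    n_val = int(round(len(images) * val_fraction))
    splits = {"val": images[:n_val], "train": images[n_val:]}
    for split, files in splits.items():
        img_dir = Path(dst_root) / "images" / split
        lbl_dir = Path(dst_root) / "labels" / split
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)
        for image in files:
            shutil.copy2(image, img_dir / image.name)
            label = Path(src_labels) / (image.stem + ".txt")
            if label.exists():  # images without labels stay as background
                shutil.copy2(label, lbl_dir / label.name)
    return {split: len(files) for split, files in splits.items()}
```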
Hailo Optimization Tips
Model size: Smaller models (YOLOv8n, YOLOv8s) compile faster and reach higher frame rates on Hailo-8
Input size: 640x640 is standard, but 416x416 may be faster
Calibration data: Use 50-100 diverse, representative images
Compression: Higher compression saves memory but may reduce accuracy
Testing: Always validate HEF accuracy against original model
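One simple way to compare the HEF against the original model is to run both on the same images and measure how many of the original detections the HEF reproduces. The sketch below is a rough agreement score (same class, IoU of at least 0.5, each candidate matched at most once), not a substitute for mAP; the function names and the (x1, y1, x2, y2, class_id) tuple format are assumptions.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2, ...) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_agreement(reference, candidate, iou_threshold=0.5):
    """Fraction of reference detections matched by a same-class candidate detection."""
    if not reference:
        return 1.0
    unused = list(candidate)
    matched = 0
    for ref in reference:
        best_i, best_iou = None, iou_threshold
        for i, cand in enumerate(unused):
            if cand[4] != ref[4]:
                continue  # class must match
            overlap = box_iou(ref, cand)
            if overlap >= best_iou:
                best_i, best_iou = i, overlap
        if best_i is not None:
            unused.pop(best_i)  # each candidate can match only once
            matched += 1
    return matched / len(reference)


# e.g. detection_agreement(onnx_detections, hef_detections) for one image
```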
Troubleshooting
Training Issues
Out of memory during training:
# Reduce batch size
yolo detect train data=dataset.yaml model=yolov8n.pt batch=8
# Or switch to a smaller model (e.g. from yolov8s.pt to yolov8n.pt)
yolo detect train data=dataset.yaml model=yolov8n.pt
Poor training results:
Check dataset labels are correct
Increase epochs (300-500 for small datasets)
Adjust the initial learning rate (lr0)
Use more data augmentation
Compilation Issues
ONNX export fails:
# Pin an explicit, older opset version (by default the exporter picks a recent opset)
yolo export model=best.pt format=onnx opset=11 simplify=True
HEF compilation fails:
Ensure ONNX model is simplified
Check calibration images are valid
Try lower compression level
Verify model architecture is supported by Hailo
Performance Issues
Slow inference on Hailo-8:
Use smaller input size (416 instead of 640)
Choose lighter model (yolov8n instead of yolov8s)
Compile with the --performance flag (longer compilation, higher throughput)
Accuracy drop after compilation:
Use more calibration images
Lower compression level
Verify post-processing implementation
Example Workflow Summary
Complete workflow from start to finish:
# 1. Prepare environment
source yolov8-env/bin/activate
# 2. Train model
yolo detect train data=dataset.yaml model=yolov8n.pt epochs=100 imgsz=640 batch=16
# 3. Validate model
yolo detect val model=runs/detect/train/weights/best.pt data=dataset.yaml
# 4. Export to ONNX
yolo export model=runs/detect/train/weights/best.pt format=onnx imgsz=640 simplify=True
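# 5. Copy the model and calibration images into the shared workspace, then start Hailo Docker
mkdir -p ~/hailo-workspace/calibration_images
cp runs/detect/train/weights/best.onnx ~/hailo-workspace/
cp dataset/images/val/*.jpg ~/hailo-workspace/calibration_images/
docker run -it --gpus all -v ~/hailo-workspace:/workspace hailo/hailo_ai_sw_suite:2023-07 /bin/bash
# 6. Inside Docker: Compile to HEF (--classes must equal nc in dataset.yaml)
cd /workspace
hailomz compile yolov8n --ckpt best.onnx --hw-arch hailo8 --calib-path calibration_images --classes 3 --performance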
This completes the full pipeline from training to deployment on Hailo-8 hardware.