Training YOLOv8 Models for Hailo-8

This guide covers the complete workflow for training a YOLOv8 object detection model on a Linux machine with GPU acceleration and compiling it to run on a Hailo-8 AI accelerator.

Prerequisites

Hardware Requirements

Linux machine with NVIDIA GPU (for training)
A large amount of VRAM may be required for training large models(e.g. 4-6 GB for yolov8n, 10-11 GB for yolov8m)
Hailo-8 device (for deployment)
Minimum 16GB RAM (32GB+ recommended for training)
75GB+ free disk space

Software Requirements

Ubuntu 20.04 or 22.04 (recommended)
NVIDIA drivers installed
Docker with NVIDIA Container Toolkit
Python 3.8 or higher
CUDA 11.0+ and cuDNN

Part 1: Setting Up the Training Environment

Install NVIDIA Drivers and CUDA

# Check if NVIDIA drivers are installed
nvidia-smi

# Install NVIDIA drivers if needed (Ubuntu)
sudo apt update
sudo apt install nvidia-driver-525
sudo reboot

Install Docker and NVIDIA Container Toolkit

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Install Python and YOLOv8

# Create a virtual environment
python3 -m venv yolov8-env
source yolov8-env/bin/activate

# Install Ultralytics YOLOv8
pip install ultralytics
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Part 2: Preparing Your Dataset

ROBOFLOW

You can use Roboflow to easily create and manage your dataset:

Sign up at https://roboflow.com/
Upload your images and annotate them
Export the dataset in YOLO format

the exported data is in the following format: Dataset Structure ^^^^^^^^^^^^^^^^^

YOLOv8 expects datasets in YOLO format with the following structure:

dataset/
├── images/
│   ├── train/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── val/
│       ├── image1.jpg
│       └── ...
└── labels/
    ├── train/
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    └── val/
        ├── image1.txt
        └── ...

Label Format

Each label file contains one line per object in YOLO format:

class_id center_x center_y width height

Where all coordinates are normalized to [0, 1]:

class_id: Integer class index (starting from 0)
center_x: X coordinate of bounding box center
center_y: Y coordinate of bounding box center
width: Width of bounding box
height: Height of bounding box

Create Dataset Configuration

Create a dataset.yaml file:

# dataset.yaml
path: /path/to/dataset
train: images/train
val: images/val

# Number of classes
nc: 3

# Class names
names:
  0: class1
  1: class2
  2: class3

Part 3: Training YOLOv8 Model

Basic Training

# Activate environment
source yolov8-env/bin/activate

# Train YOLOv8n (nano) model
yolo detect train data=dataset.yaml model=yolov8n.pt epochs=100 imgsz=640 batch=16

Training Parameters

Key training parameters:

model: Pre-trained model (yolov8n.pt, yolov8s.pt, yolov8m.pt, yolov8l.pt, yolov8x.pt)
epochs: Number of training epochs (typically 100-300)
imgsz: Image size (640 is standard, must be multiple of 32)
batch: Batch size (adjust based on GPU memory)
device: GPU device (0, 1, or cpu)

Training metrics are saved to runs/detect/train_exp/. View with TensorBoard:

Export to ONNX Format

Hailo requires models in ONNX format first:

# Export best model to ONNX
yolo export model=runs/detect/train_exp/weights/best.pt format=onnx imgsz=640 simplify=True

Or with Python:

from ultralytics import YOLO

# Load trained model
model = YOLO('runs/detect/train_exp/weights/best.pt')

# Export to ONNX
model.export(
    format='onnx',
    imgsz=640,
    simplify=True,
    opset=11,
)

This creates best.onnx in the same directory.

Part 5: Compile Model with Hailo Docker

Pull Hailo Docker Image

# Pull the Hailo Dataflow Compiler Docker image
docker pull hailo/hailo_ai_sw_suite:2023-07

# Verify the image
docker images | grep hailo

Start Hailo Docker Container

# Create a working directory
mkdir -p ~/hailo-workspace
cd ~/hailo-workspace

# Copy your ONNX model
cp /path/to/best.onnx ~/hailo-workspace/

# Run Hailo container with GPU support
docker run -it --gpus all \
  -v ~/hailo-workspace:/workspace \
  --name hailo-compiler \
  hailo/hailo_ai_sw_suite:2023-07 \
  /bin/bash

Compile Model to HEF Format

Inside the Docker container, use the Hailo Dataflow Compiler:

# Navigate to workspace
cd /workspace

# Compile ONNX model to HEF (Hailo Executable Format)
hailomz compile yolov8n \
  --ckpt best.onnx \
  --hw-arch hailo8 \
  --calib-path /path/to/calibration/images \
  --classes 2 \
  --performance \
  --output-dir ./compiled

Calibration Dataset

The calibration process needs representative images:

There are some scripts on github that automate the process .. code-block:: bash

# Create calibration directory with 50-100 representative images mkdir -p ~/hailo-workspace/calibration_images

# Copy representative images from your validation set cp dataset/images/val/*.jpg ~/hailo-workspace/calibration_images/

Part 6: Deploy to Hailo-8 Device

Copy HEF to Target System

# Exit Docker container
exit

# HEF file is now in ~/hailo-workspace/
ls ~/hailo-workspace/*.hef

# Copy to your robot or target device
scp ~/hailo-workspace/yolov8n_hailo8.hef user@robot:/path/to/models/

Python Inference Code

Example Python code for inference on Hailo-8:

from hailo_platform import (
    HEF, Device, VDevice, HailoStreamInterface,
    InferVStreams, ConfigureParams
)
import numpy as np
import cv2

class YOLOv8Hailo:
    def __init__(self, hef_path):
        self.hef = HEF(hef_path)
        self.device = Device()
        self.network_group = self.device.configure(self.hef)[0]

    def preprocess(self, image):
        """Preprocess image for YOLOv8"""
        # Resize to model input size
        img = cv2.resize(image, (640, 640))
        # Convert BGR to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # Normalize to [0, 1]
        img = img.astype(np.float32) / 255.0
        # Add batch dimension
        img = np.expand_dims(img, axis=0)
        return img

    def postprocess(self, outputs, conf_threshold=0.25):
        """Post-process YOLOv8 outputs"""
        # Implementation depends on your specific model output format
        # Typically involves:
        # 1. Decoding bounding boxes
        # 2. Applying confidence threshold
        # 3. Non-maximum suppression
        detections = []
        # Your post-processing logic here
        return detections

    def infer(self, image):
        """Run inference on image"""
        # Preprocess
        input_data = self.preprocess(image)

        # Run inference
        with InferVStreams(self.network_group) as infer_pipeline:
            input_dict = {self.hef.get_input_vstream_infos()[0].name: input_data}
            output_dict = infer_pipeline.infer(input_dict)

        # Post-process
        detections = self.postprocess(output_dict)
        return detections

# Usage
detector = YOLOv8Hailo('yolov8n_hailo8.hef')
image = cv2.imread('test_image.jpg')
results = detector.infer(image)

Best Practices

Training Tips

Use transfer learning: Always start with pre-trained weights
Augmentation: Enable data augmentation for better generalization
Dataset size: Aim for at least 1500+ images per class
Class balance: Try to balance the number of samples per class
Image quality: Use high-quality, representative images
Validation split: Use 80/20 or 70/30 train/val split

Hailo Optimization Tips

Model size: Smaller models (YOLOv8n, YOLOv8s) compile faster and run better on Hailo-8
Input size: 640x640 is standard, but 416x416 may be faster
Calibration data: Use 50-100 diverse, representative images
Compression: Higher compression saves memory but may reduce accuracy
Testing: Always validate HEF accuracy against original model

Troubleshooting

Training Issues

Out of memory during training:

# Reduce batch size
yolo detect train data=dataset.yaml model=yolov8n.pt batch=8

# Or use smaller model
yolo detect train data=dataset.yaml model=yolov8n.pt

Poor training results:

Check dataset labels are correct
Increase epochs (300-500 for small datasets)
Adjust learning rate
Use more data augmentation

Compilation Issues

ONNX export fails:

# Try different opset version
yolo export model=best.pt format=onnx opset=11 simplify=True

HEF compilation fails:

Ensure ONNX model is simplified
Check calibration images are valid
Try lower compression level
Verify model architecture is supported by Hailo

Performance Issues

Slow inference on Hailo-8:

Use smaller input size (416 instead of 640)
Choose lighter model (yolov8n instead of yolov8s)
Enable hardware optimizations during compilation

Accuracy drop after compilation:

Use more calibration images
Lower compression level
Verify post-processing implementation

Additional Resources

Example Workflow Summary

Complete workflow from start to finish:

# 1. Prepare environment
source yolov8-env/bin/activate

# 2. Train model
yolo detect train data=dataset.yaml model=yolov8n.pt epochs=100 imgsz=640 batch=16

# 3. Validate model
yolo detect val model=runs/detect/train/weights/best.pt data=dataset.yaml

# 4. Export to ONNX
yolo export model=runs/detect/train/weights/best.pt format=onnx imgsz=640 simplify=True

# 5. Start Hailo Docker
docker run -it --gpus all -v ~/hailo-workspace:/workspace hailo/hailo_ai_sw_suite:2023-07 /bin/bash

# 6. Inside Docker: Compile to HEF
hailo parser onnx /workspace/best.onnx
hailo optimize --model-name yolov8n --hw-arch hailo8 --calib-set /workspace/calibration_images
hailo compiler --hw-arch hailo8 yolov8n_optimized.har --output yolov8n_hailo8.hef

This completes the full pipeline from training to deployment on Hailo-8 hardware.