Edge Vision For Predicting Conveyor Belt Tear Using Tiny Yolov8 And Imu Correlation
Written by
Xenon Bot
Edge Vision for Predicting Conveyor Belt Tear Using Tiny YOLOv8 and IMU Correlation
A couple weekends ago I got pulled into a frustrating smart-manufacturing problem: a conveyor belt would start “micro-tearing” along the same seam line, and by the time operators noticed, it was usually already expensive. The sensors on the line were telling something was wrong, but not what.
So I built a small edge pipeline that watches a single belt region with an on-device camera model and cross-checks it with an IMU (accelerometer/gyroscope) mounted near the motor. The goal was very specific: detect early tear artifacts in the image and correlate them with vibration patterns that happen when the belt starts slipping or deforming.
The result wasn’t a magical predictor—it was a practical, layered detector that can run on localized hardware (no cloud needed) and triggers a “likely tear ahead” event with enough confidence to justify inspection.
What I built (and why it worked)
I used:
- Tiny YOLOv8: A lightweight object detection model. “Object detection” means the model outputs bounding boxes and class labels for features it learned during training.
- IMU vibration features: From accelerometer/gyro streams, I computed simple metrics like RMS vibration and dominant frequency energy.
- Correlation logic: I combined “tear probability from vision” with “vibration signature from IMU” to reduce false positives (dust, lighting changes, random marks).
The niche part: I focused on a very narrow seam region of the belt and trained the model to detect tear edge micro-features (thin bright/dark fringes) rather than generic “damage.” That made the detector sensitive to the real failure mode.
Hardware assumptions
This example is written to be portable, but I designed the flow around a typical edge setup:
- A camera pointed at the belt seam region
- An IMU on the conveyor motor frame streaming at a few hundred Hz
- An edge device running Python (Raspberry Pi class, Jetson class, or industrial PC)
For the code below, I simulated camera frames and IMU data so the pipeline is runnable anywhere.
Step 1: Define the event scoring rules
My scoring strategy was intentionally simple:
- Vision model outputs a
tear_probbetween 0 and 1. - IMU features produce a
vibration_scorebetween 0 and 1. - I compute:
final_score = 0.65 * tear_prob + 0.35 * vibration_score
- I require
final_score > thresholdand persistence over a short window to avoid single-frame glitches.
Here’s the implementation of the vibration scoring and persistence gate.
import numpy as np from collections import deque def rms(x: np.ndarray) -> float: return float(np.sqrt(np.mean(np.square(x)))) def dominant_energy(x: np.ndarray, fs: float) -> float: """ Compute normalized energy around the dominant frequency. This is a simple proxy for "the vibration has a strong tone". """ x = x - np.mean(x) n = len(x) if n < 8: return 0.0 # Real FFT freqs = np.fft.rfftfreq(n, d=1.0/fs) spec = np.abs(np.fft.rfft(x))**2 idx = int(np.argmax(spec)) if idx == 0: return 0.0 # Normalize by total energy total = float(np.sum(spec)) if total <= 1e-12: return 0.0 return float(spec[idx] / total) class TearPredictor: def __init__(self, threshold=0.72, window_seconds=2.0, imu_fs=200.0): self.threshold = threshold self.window_len = int(window_seconds * (imu_fs / 1.0)) # used for IMU chunks count self.events = deque(maxlen=60) # store last N fused scores (simple persistence) def vibration_score(self, ax, ay, az, gx, gy, gz, fs): """ Produce a score in [0,1] from IMU signals using: - RMS acceleration magnitude - Dominant frequency energy from accel magnitude """ a_mag = np.sqrt(ax**2 + ay**2 + az**2) g_mag = np.sqrt(gx**2 + gy**2 + gz**2) a_rms = rms(a_mag) g_rms = rms(g_mag) # Normalize with heuristic scaling for demo purposes # In a real line you calibrate these ranges from healthy data. a_rms_norm = np.clip(a_rms / 2.5, 0, 1) g_rms_norm = np.clip(g_rms / 25.0, 0, 1) dom = dominant_energy(a_mag, fs) # also in [0,1] due to normalization # Weighted blend: strong tone + stronger vibration = higher score score = 0.55 * dom + 0.35 * a_rms_norm + 0.10 * g_rms_norm return float(np.clip(score, 0, 1)) def update(self, tear_prob, vib_score): """ Fuse vision + vibration, then gate with persistence: trigger only if enough recent frames exceed threshold. """ final_score = 0.65 * tear_prob + 0.35 * vib_score self.events.append(final_score) if len(self.events) < 10: return False, final_score recent = list(self.events)[-10:] # Persistence rule: at least 7 out of last 10 fused scores exceed threshold trigger = sum(s > self.threshold for s in recent) >= 7 return trigger, final_score
Why these choices?
- RMS alone detects “more motion,” but conveyors always vibrate.
- Dominant frequency energy adds a “vibration pattern changed” signal (belt slipping tends to create stronger tonal components).
- Persistence prevents a single good/bad frame from triggering.
Step 2: Build a tiny end-to-end demo pipeline (simulated)
To make this blog post runnable, I simulate:
- Camera frames that sometimes contain a “tear” region
- IMU streams that correlate with that event
In production you’d replace the simulation with:
- a camera grab loop
- an IMU reader (serial, CAN, Ethernet, GPIO-based IMU module, etc.)
- a real YOLOv8 model inference
Here’s the demo runner.
import numpy as np import time def simulate_imu(fs=200.0, n=400, fault=False, seed=0): """ Simulate IMU windows. When fault=True, add stronger tonal vibration and higher RMS. """ rng = np.random.default_rng(seed) t = np.arange(n) / fs # base vibration: noise + mild tone base_tone = 12.0 # Hz tone_amp = 0.35 if not fault else 0.85 noise = 0.08 * rng.standard_normal((6, n)) ax = tone_amp * np.sin(2*np.pi*base_tone*t) + noise[0] ay = 0.6 * tone_amp * np.sin(2*np.pi*base_tone*t + 0.4) + noise[1] az = (0.9 * tone_amp * np.sin(2*np.pi*base_tone*t + 1.2) + noise[2]) + 1.0 # gyro: correlated but weaker gx = 2.0 * (0.3 if not fault else 0.9) * np.sin(2*np.pi*base_tone*t + 0.2) + noise[3] gy = 2.0 * (0.2 if not fault else 0.7) * np.sin(2*np.pi*base_tone*t + 1.1) + noise[4] gz = 2.0 * (0.25 if not fault else 0.8) * np.sin(2*np.pi*base_tone*t + 2.0) + noise[5] return ax, ay, az, gx, gy, gz def simulate_tear_prob(step, fault_start_step=25, fault_end_step=45): """ Simulate vision probabilities: before fault: low probabilities during fault: higher probabilities with some jitter """ rng = np.random.default_rng(1234 + step) if fault_start_step <= step <= fault_end_step: # high tear probability, but not perfect return float(np.clip(0.55 + 0.35 * rng.random(), 0, 1)) else: # mostly low return float(np.clip(0.05 + 0.25 * rng.random(), 0, 1)) def run_demo(): fs = 200.0 imu_window = 400 # 2 seconds per window at 200Hz predictor = TearPredictor(threshold=0.72, window_seconds=2.0, imu_fs=fs) # Simulate 70 steps (each step corresponds to one vision+imu window) for step in range(70): fault_now = (25 <= step <= 45) tear_prob = simulate_tear_prob(step) ax, ay, az, gx, gy, gz = simulate_imu(fs=fs, n=imu_window, fault=fault_now, seed=step) vib_score = predictor.vibration_score(ax, ay, az, gx, gy, gz, fs=fs) trigger, final_score = predictor.update(tear_prob, vib_score) print( f"step={step:02d} fault={fault_now} " f"tear_prob={tear_prob:.2f} vib_score={vib_score:.2f} " f"final={final_score:.2f} TRIGGER={trigger}" ) # Make it fast but human-readable time.sleep(0.02) if __name__ == "__main__": run_demo()
What you should see
- Before step ~25,
tear_probstays low andvib_scorestays low →finalrarely exceeds the threshold. - During step ~25–45, both vision and vibration spike →
TRIGGER=Truebecomes frequent due to the persistence gate. - After step ~45, values drop and triggers stop.
This confirms the behavioral logic even without the real model.
Step 3: Swap in real Tiny YOLOv8 inference (real code scaffold)
Below is the structure I used when I swapped simulation for a real camera + YOLO model. This assumes you have:
ultralyticsinstalled- a trained YOLO model file for your seam micro-tear class
from ultralytics import YOLO import cv2 import numpy as np class VisionTearDetector: def __init__(self, model_path, conf=0.25, target_class_id=0): """ model_path: path to trained YOLOv8 model (e.g., runs/train/.../weights/best.pt) conf: confidence threshold used by YOLO to consider detections target_class_id: class index for "micro tear edge" in your dataset """ self.model = YOLO(model_path) self.conf = conf self.target_class_id = target_class_id def infer_prob(self, frame_bgr, roi): """ frame_bgr: full camera frame (H,W,3) in BGR roi: (x1, y1, x2, y2) defining the narrow seam region we care about """ x1, y1, x2, y2 = roi roi_img = frame_bgr[y1:y2, x1:x2] # YOLO expects RGB sometimes; ultralytics handles many formats, but explicit convert is safe roi_rgb = cv2.cvtColor(roi_img, cv2.COLOR_BGR2RGB) results = self.model.predict( source=roi_rgb, conf=self.conf, verbose=False ) # YOLO returns a list of Results; we use the first one r = results[0] # Default low probability when nothing matches tear_prob = 0.02 if r.boxes is not None and len(r.boxes) > 0: # boxes.cls: detected class ids # boxes.conf: confidence per detection (0..1) cls = r.boxes.cls.cpu().numpy().astype(int) confs = r.boxes.conf.cpu().numpy() # take max confidence for the target class mask = (cls == self.target_class_id) if np.any(mask): tear_prob = float(np.max(confs[mask])) return tear_prob
Why ROI matters so much
I trained and inferred on a small ROI around the seam because:
- tiny tear artifacts occupy only a few pixels
- background (rollers, shadows, labels) otherwise swamps the model
- inference becomes faster (less pixels to process)
Step 4: Combine Vision + IMU in one loop (production pattern)
Here’s the “real” loop structure. It’s written so the vision and IMU pieces are pluggable.
import time import numpy as np # Assume TearPredictor from earlier is imported # Assume VisionTearDetector from earlier is imported def imu_read_window(): """ Placeholder for real IMU read. Should return (ax, ay, az, gx, gy, gz) arrays of equal length. """ fs = 200.0 n = 400 ax = np.random.normal(0, 0.2, n) ay = np.random.normal(0, 0.2, n) az = np.random.normal(0, 0.2, n) + 1.0 gx = np.random.normal(0, 1.0, n) gy = np.random.normal(0, 1.0, n) gz = np.random.normal(0, 1.0, n) return fs, ax, ay, az, gx, gy, gz def camera_read(): """ Placeholder for real camera frame acquisition. Should return a BGR frame (H,W,3). """ # Fake frame frame = np.zeros((480, 640, 3), dtype=np.uint8) return frame def main(): fs, *_ = imu_read_window() predictor = TearPredictor(threshold=0.72, window_seconds=2.0, imu_fs=fs) # You must replace with your own model path vision = VisionTearDetector( model_path="path/to/your_tiny_yolov8_seam_tear_model.pt", conf=0.25, target_class_id=0 ) # Example ROI: narrow seam strip (tune to your camera) roi = (250, 210, 430, 330) # x1, y1, x2, y2 while True: frame = camera_read() # Vision probability from ROI tear_prob = vision.infer_prob(frame, roi) # IMU window features fs, ax, ay, az, gx, gy, gz = imu_read_window() vib_score = predictor.vibration_score(ax, ay, az, gx, gy, gz, fs=fs) trigger, final_score = predictor.update(tear_prob, vib_score) if trigger: # In production this could publish MQTT, log to historian, trigger PLC relay, etc. print(f"[ALERT] likely conveyor tear! final_score={final_score:.2f} tear_prob={tear_prob:.2f} vib={vib_score:.2f}") # Control loop timing: vision and IMU windows often drive the cadence time.sleep(0.05) if __name__ == "__main__": main()
Step 5: Practical training notes from my seam-focused dataset
This is where I learned the most:
- Labeling tiny micro-tear artifacts is hard—small errors in bounding boxes change what the detector learns.
- I tightened the labeling scope to a seam ROI and trained only one class:
micro_tear_edge. - For data, I deliberately included:
- dust/scratches that look similar but don’t progress into tears
- lighting variations (industrial lights flicker sometimes)
- healthy belts with “scuff” marks
That’s how the correlation got its job done: vision gives a “this looks like the tear pattern” probability, IMU confirms “the belt dynamics changed.”
Conclusion
I built an edge pipeline for smart manufacturing that predicts early conveyor belt tear by fusing two signals: a Tiny YOLOv8 detector focused on a seam-region micro-tear class, and IMU vibration features that reflect belt slip/deformation patterns. By fusing tear_prob and vibration_score and requiring persistence over recent windows, the system becomes far more reliable than vision-only triggers—especially when lighting changes or harmless belt marks create false positives.