Engineering Deep Dives

The Four-Tracker Spectrum: Picking the Right Multi-Object Tracker for Edge Vision

Real-time multi-object tracking on an edge AI box used to be a binary choice: pay the BoT-SORT tax or live with weak re-identification. Adding ByteTrack and OC-SORT opens a real four-point spectrum — 79ms down to 16ms per frame on the same hardware, mission-selectable per deployment. Here's how to choose.


If you're running real-time object detection on an edge device — a Jetson, an NVIDIA AGX board, a Coral, anything that's not a cloud GPU — the tracker you put downstream of your detector is where your power budget goes to die.

A typical pipeline runs the detector once per frame, then hands the bounding boxes to a tracker that decides which detection corresponds to which target across frames. The detector usually gets all the attention. The tracker is treated as an afterthought — "use BoT-SORT, it's the default, move on." But on an edge box, the tracker often costs more wall-clock time than the detector itself. A 13W power budget doesn't care that BoT-SORT shines on academic benchmarks; it cares that BoT-SORT plus its ReID embedding compute eats 79 milliseconds per frame on dense scenes, capping the pipeline at 10 FPS and leaving just 21 of each 100 milliseconds for everything else.
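To make that cost split concrete, here is a minimal sketch of a per-frame loop that times the detector and tracker spans separately. `detect` and `track` are stand-in stubs, not a real TensorRT engine or tracker; in a deployment they would wrap those components.

```python
# Per-frame pipeline sketch: time the detector and the tracker
# independently so the tracker's share of the frame budget is visible.
import time

def detect(frame):
    # Stub: returns (x1, y1, x2, y2, confidence) tuples.
    return [(10, 10, 50, 50, 0.9), (60, 60, 90, 90, 0.4)]

def track(detections, state):
    # Stub: assigns incrementing IDs. A real tracker matches
    # detections to existing tracks across frames.
    return [(i, det) for i, det in enumerate(detections)], state

def run_frame(frame, state):
    t0 = time.perf_counter()
    dets = detect(frame)
    t1 = time.perf_counter()
    tracks, state = track(dets, state)
    t2 = time.perf_counter()
    # On an edge box the tracker span (t2 - t1) often dominates (t1 - t0).
    timing = {"detect_ms": (t1 - t0) * 1e3, "track_ms": (t2 - t1) * 1e3}
    return tracks, state, timing
```

Logging `track_ms` at p95/p99 per tracker, on your own captures, is what turns the selection below from folklore into measurement.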

Treating tracker selection as a strategic decision instead of a default unlocks a four-point spectrum on the same hardware running the same detector. Here's the picture and how to navigate it.

The four trackers, in order of cost

These are the four trackers most commonly worth considering on an edge AI deployment in 2026. Latency numbers below come from a real-time thermal identification pipeline on an NVIDIA Jetson AGX Orin 64GB at MODE_30W with a fine-tuned YOLO26n TensorRT FP16 engine. Same detector, same hardware, same captures — only the tracker changes.

1. BoT-SORT with ReID — the maximum-fidelity option

BoT-SORT augments a Kalman filter motion model with a re-identification embedding network. When a target leaves the frame and re-enters, the embedding network compares its visual features to the embeddings of recently-lost tracks and reassigns the original track ID. This is the right tracker when re-identification across occlusion is mission-critical — persistent surveillance of a small set of targets over minutes, maritime patrol where a vessel disappears behind another vessel and re-emerges, anything where losing a track ID is operationally expensive.

The cost: roughly 79ms per frame at p95 on a dense scene. ReID adds about 15ms at p95 on top of BoT-SORT itself, and the p99 long tail is dominated by the embedding compute. On a 13W edge budget, that's a 10 FPS deploy ceiling. Acceptable for persistent surveillance. Not acceptable for fast-moving steady-state ISR.

2. BoT-SORT without ReID — the Kalman-only middle ground

The same BoT-SORT pipeline minus the ReID embedding. You still get the Kalman motion model, the Hungarian matching, the global motion compensation (GMC). You lose the cross-occlusion re-identification.

The cost drops to roughly 64ms at p95 dense — about 15ms saved by dropping the ReID. That's a 15 FPS deploy ceiling. The right tracker for sparse-target single-pass scenarios where targets don't leave and re-enter the field of view: a coastal flight watching a single aircraft pass through, a fixed-pole lookout scanning for incoming targets, anywhere the re-identification cost is theoretical.
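In Ultralytics, the two BoT-SORT variants are typically one tracker config with the ReID flag flipped. A sketch, assuming a recent Ultralytics release; key names follow the shipped botsort.yaml defaults, so verify them against your installed version:

```yaml
# custom_botsort.yaml — pass via model.track(source, tracker="custom_botsort.yaml")
tracker_type: botsort
track_high_thresh: 0.5     # first-stage association threshold
track_low_thresh: 0.1      # second-stage (low-confidence) threshold
new_track_thresh: 0.6      # threshold for spawning a new track
track_buffer: 30           # frames to keep a lost track alive
match_thresh: 0.8
gmc_method: sparseOptFlow  # global motion compensation method
with_reid: true            # flip to false for the Kalman-only variant
```

Flipping `with_reid` is the entire difference between options 1 and 2 above; nothing else in the pipeline changes.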

3. OC-SORT — Kalman with observation-centric re-update, no GMC tax

OC-SORT is the newest of the four. It replaces BoT-SORT's GMC layer with an observation-centric re-update (ORU) that handles brief occlusions by re-anchoring the Kalman state to the most recent observation when a track is recovered. The motion model is similar to BoT-SORT-noreid; the architectural difference is the ORU re-anchoring and the elimination of the GMC compute.

The cost lands at roughly 17ms at p95 on a dense scene — about 1ms over ByteTrack, about 47ms under BoT-SORT-noreid. That's a 30 FPS deploy ceiling. It's the right tracker when short occlusions matter (a target moves behind a building for a few frames) but a moving camera with global motion compensation is overkill. Static cameras at lookout posts are the canonical case.
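The ORU idea fits in a few lines. This is a simplified illustration, assuming a 1-D constant-velocity Kalman filter over a box center; real OC-SORT applies the same replay over the full box state, and the function names here are illustrative.

```python
# Observation-centric re-update (ORU) sketch: when a lost track is
# re-associated, replay the Kalman filter along a virtual linear
# trajectory between the last observation and the re-association,
# instead of trusting a run of blind predictions.
import numpy as np

def kalman_step(x, P, z=None, q=1.0, r=1.0):
    # x = [position, velocity]; constant-velocity predict, optional update.
    F = np.array([[1.0, 1.0], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    x = F @ x
    P = F @ P @ F.T + q * np.eye(2)
    if z is not None:
        S = H @ P @ H.T + r
        K = P @ H.T / S
        x = x + (K * (z - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
    return x, P

def oru_reanchor(x, P, z_last, z_now, gap):
    # Track was lost for `gap` frames between observations z_last and
    # z_now. Backfill the gap with interpolated virtual observations,
    # then apply the real re-association observation.
    for k in range(1, gap + 1):
        z_virtual = z_last + (z_now - z_last) * k / (gap + 1)
        x, P = kalman_step(x, P, z=z_virtual)
    return kalman_step(x, P, z=z_now)
```

The payoff is that the velocity estimate after re-acquisition reflects where the target actually went during the occlusion, not where the pre-occlusion motion model drifted to.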

The catch: OC-SORT isn't shipped natively in Ultralytics. The common Python implementation lives in Roboflow's trackers library, which requires Python ≥ 3.10 and numpy ≥ 2. On a JetPack 5.1.2 device capped at Python 3.8 and numpy 1.23, that doesn't install cleanly. The workaround is vendoring the upstream noahcao/OC_SORT reference directly — we wrote a separate post on that integration story.

4. ByteTrack — IOU-only, fastest, no Kalman

ByteTrack uses two-stage IOU matching against the previous frame's tracks and skips both the Kalman motion model and the appearance embedding. It works because most detections in a steady-state video stream have enough spatial overlap with their own previous bounding box that IOU matching is sufficient. The classic two-stage trick — match high-confidence detections first, then sweep the unmatched low-confidence detections against unmatched tracks — recovers a surprising amount of accuracy from a very lightweight algorithm.
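The two-stage trick is compact enough to sketch. This is an illustrative condensation, assuming detections as (x1, y1, x2, y2, score) tuples and tracks as dicts with a `box` key; greedy IOU matching stands in for the Hungarian solver a production implementation would use.

```python
# Two-stage association sketch: high-confidence detections match
# first, then low-confidence detections sweep the leftover tracks.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(dets, tracks, thresh):
    # Highest-IOU pairs first; each det/track used at most once.
    pairs, used_d, used_t = [], set(), set()
    cands = sorted(((iou(d[:4], t["box"]), i, j)
                    for i, d in enumerate(dets)
                    for j, t in enumerate(tracks)), reverse=True)
    for score, i, j in cands:
        if score < thresh or i in used_d or j in used_t:
            continue
        pairs.append((i, j)); used_d.add(i); used_t.add(j)
    return pairs, used_d, used_t

def bytetrack_associate(dets, tracks, high=0.5, iou_thresh=0.3):
    # Stage 1: high-confidence detections against all tracks.
    hi = [d for d in dets if d[4] >= high]
    lo = [d for d in dets if d[4] < high]
    pairs1, _, used_t = greedy_match(hi, tracks, iou_thresh)
    # Stage 2: low-confidence detections against unmatched tracks only.
    # Note: indices in pairs2 are local to the `lo` and `rest` lists.
    rest = [t for j, t in enumerate(tracks) if j not in used_t]
    pairs2, _, _ = greedy_match(lo, rest, iou_thresh)
    return pairs1, pairs2
```

The second stage is why ByteTrack keeps tracks alive through brief detector confidence dips: a target that fades to 0.3 confidence for a frame still matches its own track by overlap.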

The cost: roughly 16ms at p95 on a dense scene, with a flat distribution — p99 is within 1-2ms of p95, no long tail. That's a 30 FPS deploy ceiling with headroom. The right tracker for fast steady-state ISR where targets are reliably visible and the spatial overlap assumption holds.

ByteTrack fails on two scene types: fast camera motion (no Kalman to compensate) and frequent occlusion (no ReID to recover). Don't use it on a moving drone tracking a maneuvering target. Do use it on a fixed sensor watching a steady stream of well-separated targets.

The deploy spectrum, side-by-side

Tracker            p95 dense (ms)   FPS budget   Mission profile
BoT-SORT-reid      79               10           Persistent surveillance, re-entry critical, maritime
BoT-SORT-noreid    64               15           Sparse single-pass, Kalman + GMC, no re-entry needed
OC-SORT            17               30           Short-occlusion / re-acquisition, static camera, no GMC tax
ByteTrack          16               30           Fast steady-state ISR, reliable IOU overlap

Same hardware. Same detector. Same captures. The difference between the fastest and slowest is a factor of 5 — and the decision about which to deploy is entirely a mission profile question, not a hardware question.

The trap most engineering teams fall into is selecting one tracker as the deployment default and forcing every mission to live inside its profile. That's how you get persistent-surveillance latency on a fast steady-state mission, or how you end up dropping tracks on a re-entry scenario because someone optimized for FPS at the expense of robustness.

The better pattern is making the tracker a runtime flag. The detector, the engine, the inference loop, the JSONL output, the downstream MAVLink publish — all of it stays the same. The tracker swaps via a CLI argument. The operator picks the right profile at flight time.
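A minimal sketch of that pattern; the tracker names and the factory dict are illustrative placeholders, to be wired to your actual implementations.

```python
# Mission-selectable tracker via a CLI flag. Everything else in the
# pipeline (detector, engine, output, publish) stays fixed.
import argparse

TRACKERS = {
    "botsort-reid":   lambda: {"name": "botsort", "with_reid": True},
    "botsort-noreid": lambda: {"name": "botsort", "with_reid": False},
    "ocsort":         lambda: {"name": "ocsort"},
    "bytetrack":      lambda: {"name": "bytetrack"},
}

def build_parser():
    p = argparse.ArgumentParser()
    p.add_argument("--tracker", choices=sorted(TRACKERS),
                   default="bytetrack",
                   help="mission-selectable tracker profile")
    return p

# Typical use at startup:
# args = build_parser().parse_args()
# tracker = TRACKERS[args.tracker]()
```

The `choices` argument gives the operator a self-documenting error message at flight time instead of a stack trace mid-mission.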

Why this matters beyond the trackers

The structural lesson isn't about BoT-SORT versus ByteTrack. It's about treating components in your inference pipeline as substitutable along well-defined axes — latency, robustness, motion-model fidelity, appearance-feature dependency — rather than locked-in defaults inherited from a paper or a library's quickstart.

Other components in an edge vision pipeline that benefit from the same treatment:

The detector itself. YOLO26n at FP16 versus FP32, RF-DETR Nano versus Small versus Medium, alternative detectors entirely. Each costs a different amount of compute, and the mission profile dictates which one wins.

The vision language model. Qwen3-VL 8B versus Molmo 7B versus a smaller distilled model on the box, with or without a cloud fallback to Claude or GPT-5 when the task warrants. Same latency-versus-capability tradeoff, different axis.

The motion compensation. BoT-SORT's GMC layer is one option. EKF + IMU fusion is another. Pure visual odometry is a third. Each costs different compute and matches different platform constraints (static, moving, drone-mounted, ground-mounted).

The four-tracker spectrum is one example of a broader pattern: treat your inference pipeline as a set of swappable components configured per-mission, not as a fixed stack that runs everywhere.

Picking your starting point

For a project that's just spinning up real-time tracking on an edge box:

  • Default to ByteTrack if your camera is static or near-static and your targets don't disappear and re-emerge. It's the fastest, the most predictable distribution, and the cheapest to debug.
  • Move to OC-SORT when brief occlusions are part of the mission profile — targets moving behind obstacles, frame edges, partial fade-outs.
  • Move to BoT-SORT-noreid when the camera itself is moving and global motion compensation matters more than visual appearance features.
  • Move to BoT-SORT-reid when persistent track identity across full occlusion is operationally required — the kind of scenario where confusing two targets is worse than dropping one.

Start with the simplest tracker that meets the mission. Move up the spectrum when measured evidence shows you need to. Don't deploy the most capable tracker by default; you'll burn frame budget that your downstream pipeline needs.
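The decision list above condenses into a small function. The boolean inputs mirror the mission-profile questions; the names and the ordering of checks are illustrative, not a complete policy.

```python
# Starting-point heuristic: answer three mission-profile questions,
# get a tracker. Most severe requirement wins.
def pick_tracker(camera_moving: bool,
                 brief_occlusions: bool,
                 full_occlusion_reentry: bool) -> str:
    if full_occlusion_reentry:
        return "botsort-reid"    # persistent identity across occlusion
    if camera_moving:
        return "botsort-noreid"  # GMC matters, appearance doesn't
    if brief_occlusions:
        return "ocsort"          # ORU re-anchoring, no GMC tax
    return "bytetrack"           # fastest; assumes stable IOU overlap
```
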

