Our AI Kept Calling Cars Bicycles. The Bug Was in the Docs.

At Blue Octopus Technology we built a model to recognize objects in aerial footage — the kind of thing that looks at a video feed and draws a box around each vehicle, person, or bike and says what it is. In testing, it kept making the same baffling mistake. It would look at a car, draw a clean box around it, and label it a bicycle.

Not occasionally. Reliably. A car was a bicycle, every time.

Our first instinct was the natural one: the model needs more training. It hasn't seen enough examples. That instinct was wrong, and chasing it would have cost us weeks. The real problem was much smaller and much more embarrassing, and it's a trap anyone working with AI can fall into.

[Illustration: Octo realizing the labels are swapped. The model boxed a car and called it a bicycle because that is what the answer key told it.]

The model was right. The answer key was wrong.

Here's how these models learn. You show them thousands of labeled examples — this picture is a car, this one is a person, this one is a bicycle — and they learn the patterns. The labels are the answer key. The model is only ever as correct as the answer key it studied from.

Our answer key had cars and bicycles swapped.

Not in the obvious way — nobody sat there typing "bicycle" under pictures of cars. It was subtler and more insidious. The dataset we trained on uses numbers internally, not words. Object type 1, object type 3, object type 7, and so on. Somewhere there's a small table that says which number means which word.

That table was wrong. It listed the numbers in the wrong order. And when we wrote the code to translate those numbers back into words for training, we followed the table — the documentation — instead of checking what the numbers actually corresponded to in the data.

So the model studied diligently, learned exactly what we taught it, and faithfully reproduced our mistake. It was calling cars bicycles because we had labeled cars as bicycles, without ever realizing it, by trusting a document over the thing the document described.
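To make the failure concrete, here is a minimal sketch in Python. The class numbers and names are invented for illustration and are not the real dataset's table; the point is only that the same numeric labels turn into different words depending on which lookup table you believe.

```python
# Hypothetical illustration: these IDs and names are invented, not the real dataset's.

# What the documentation claimed each numeric class ID meant.
DOCUMENTED_NAMES = {1: "person", 3: "car", 7: "bicycle"}

# What the annotations actually contained (only discoverable by looking at the data).
ACTUAL_NAMES = {1: "person", 3: "bicycle", 7: "car"}


def translate(class_ids, table):
    """Turn numeric class IDs into words using the given lookup table."""
    return [table[class_id] for class_id in class_ids]


# Three annotated objects from one frame: really a person, a car, and a bicycle.
frame_class_ids = [1, 7, 3]

labels_we_trained_on = translate(frame_class_ids, DOCUMENTED_NAMES)
labels_we_should_have_used = translate(frame_class_ids, ACTUAL_NAMES)

print(labels_we_trained_on)        # ['person', 'bicycle', 'car']
print(labels_we_should_have_used)  # ['person', 'car', 'bicycle']
```

Nothing in this code raises an error or looks suspicious. Both tables are valid; only one of them matches the pictures.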

The model wasn't failing. It was succeeding — at learning the wrong answer key we'd handed it.

Why this is the expensive habit

The reason this is worth a whole blog post isn't the bug. Bugs happen. It's the habit that caused it, because that habit is everywhere in AI work and it's almost invisible.

The habit is trusting the description of the data instead of the data.

A document that says "here's how this dataset is organized" feels authoritative. It's written down. It's official. And it's exactly the kind of thing that goes quietly out of date, or was wrong from the start, while everyone downstream keeps trusting it because checking feels redundant. Why would you open the data and count when there's a table right there telling you the answer?

Because the table can be wrong about the data, and the data is what the model actually learns from. The data is the ground truth. The document is just somebody's claim about the data, and claims rot.

How we actually fixed it

There were two fixes, and the difference between them matters.

The fast fix was a patch at the very end: catch the model when it says "bicycle," and if we know it really means "car," swap the word back before anyone sees it. That's a band-aid. It works, it's honest about being a band-aid, and it bought us time. But it leaves the real problem — the swapped answer key — sitting in place underneath.
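A band-aid of this kind might look roughly like the sketch below. It is an illustration, not our production code: the detection format and the names `OUTPUT_LABEL_FIX` and `patch_label` are hypothetical, and it assumes the swap is symmetric (every "bicycle" is really a car and every "car" is really a bicycle).

```python
# Hypothetical output-side patch. The model itself is untouched and still
# believes the swapped answer key; we only rewrite the word it emits.
OUTPUT_LABEL_FIX = {"bicycle": "car", "car": "bicycle"}


def patch_label(predicted_label):
    """Swap the known-confused labels; pass every other label through unchanged."""
    return OUTPUT_LABEL_FIX.get(predicted_label, predicted_label)


# Invented detections in an assumed format: a label plus a bounding box.
detections = [
    {"label": "bicycle", "box": (40, 60, 120, 110)},
    {"label": "person", "box": (200, 80, 220, 130)},
]

# Build patched copies rather than mutating the model's raw output.
patched = [{**d, "label": patch_label(d["label"])} for d in detections]

print([d["label"] for d in patched])  # ['car', 'person']
```

The weakness is visible in the code itself: the correction lives far away from the cause, so anything that reads the model's raw output, or retrains on the same labels, inherits the swap again.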

The real fix was to go back to the data, verify which number actually means which object by looking at the examples themselves rather than the table, correct the translation, and retrain the model on a now-correct answer key. More work. The right work. Once the model studied from a correct key, it called cars cars.
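One way to do that kind of verification is sketched below; it is not our actual tooling. For each numeric class ID it pulls a few random examples for a human to open and look at, plus a rough size statistic. The annotation fields (`image`, `class_id`, `width`, `height`) are assumed for the example, and the size check leans on the assumption that, from the air, a car's box is usually much larger than a bicycle's.

```python
import random
import statistics
from collections import defaultdict


def group_by_class(annotations):
    """Bucket annotations by their numeric class ID."""
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["class_id"]].append(ann)
    return by_class


def review_sheet(annotations, per_class=3, seed=0):
    """For each class ID: count, median box area, and a few examples to eyeball."""
    rng = random.Random(seed)  # fixed seed so the same examples come up each run
    sheet = {}
    for class_id, items in sorted(group_by_class(annotations).items()):
        areas = [a["width"] * a["height"] for a in items]
        sheet[class_id] = {
            "count": len(items),
            "median_area": statistics.median(areas),
            "examples": rng.sample(items, min(per_class, len(items))),
        }
    return sheet


# Toy annotations, invented for illustration.
annotations = [
    {"image": "frame_001.jpg", "class_id": 3, "width": 12, "height": 20},
    {"image": "frame_002.jpg", "class_id": 3, "width": 10, "height": 22},
    {"image": "frame_001.jpg", "class_id": 7, "width": 48, "height": 90},
    {"image": "frame_003.jpg", "class_id": 7, "width": 50, "height": 96},
]

sheet = review_sheet(annotations)
for class_id, row in sheet.items():
    files = sorted(e["image"] for e in row["examples"])
    print(class_id, row["count"], row["median_area"], files)
```

In this toy data, class 3 has small boxes and class 7 has large ones. If the documentation says class 3 is "car," that mismatch is the cue to open the sampled frames and check with your own eyes before trusting either the table or the statistic.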

We're glad we found it. We're less glad about how close the band-aid came to becoming permanent, which is its own lesson about the difference between making a symptom go away and fixing the cause.

What this means if you're deploying AI

You're probably not training vision models. But you are, increasingly, feeding AI your own data and trusting its output — and the same trap is waiting.

If you hand an AI a spreadsheet and a note that says "column D is revenue," and column D is actually cost, the AI will give you beautifully confident, completely backwards analysis, and nothing will look broken. If your customer records have a field everyone thinks means one thing and it quietly means another, the AI will build on the misunderstanding, fluently.

The defense is boring and it works: before you trust what an AI tells you about your data, spot-check that the AI — and you — understand what the data actually is. Open it. Look at a few rows. Confirm the labels match reality. Don't take the documentation's word for it, and don't take the AI's word for it either.
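If your data lives in a spreadsheet export, the spot-check can be a few lines. The sketch below is one hypothetical way to do it with Python's standard `csv` module: it puts the first few raw values of every column next to whatever the accompanying note claims that column means. The data, the note, and the helper name `peek_columns` are all invented for the example.

```python
import csv
import io


def peek_columns(csv_text, claimed_meaning, rows=3):
    """Pair each column's first few raw values with what the note claims it means."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # zip stops after `rows` rows, so large files are not read in full.
    sample = [row for _, row in zip(range(rows), reader)]
    return {
        column: {
            "claimed": claimed_meaning.get(column, "(undocumented)"),
            "values": [row[column] for row in sample],
        }
        for column in reader.fieldnames
    }


# Invented export whose headers are as unhelpful as spreadsheet letters.
csv_text = "A,B,C,D\nwidgets,10,250.00,180.00\ngadgets,4,120.00,95.00\n"
note = {"A": "product", "B": "units sold", "D": "revenue"}

result = peek_columns(csv_text, note)
for column, info in result.items():
    print(column, "|", info["claimed"], "|", info["values"])
```

The code does not decide whether the note is right; it cannot. It only puts the claim and the raw values side by side, so a person who knows the business can notice that the "revenue" column is smaller than an undocumented column next to it and go ask why.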

Our model spent who-knows-how-long calling cars bicycles, with total confidence, because we believed a table. The fix was five lines of code. Finding it took a week. The lesson was worth more than the week: trust the data, not the story someone wrote about the data.

That includes the stories AI tells you. Especially those.

