
How We Built a Business Directory Using AI

We built a directory of 18,906 businesses across Western North Carolina for under $500. The hardest part wasn't building it — it was making the data actually useful.

The first version of BluePages was a CSV file with 1,166 businesses and a lot of empty columns.

We had names. We had addresses. We had Google Business listings. What we didn't have was anything useful — no websites, no scores, no way to tell which businesses were thriving online and which were invisible. Just a list.

That was the starting point. What came next was months of pipeline work, dead-end experiments, and one lesson we learned the hard way more than once: raw data is not the same as useful data.

This is the story of how we built BluePages — a free business directory covering 18,906 businesses across 150+ cities in Western North Carolina — and what we learned along the way.

It Started with a Question Nobody Was Answering

We had been writing about how most small businesses are invisible to AI. We'd built a scoring system that evaluated businesses on 13 digital presence signals. We'd scored thousands of them and published the results.

But the more we dug into the data, the more we realized there was no good, comprehensive directory of local businesses in Western North Carolina. Google Maps has listings, sure. Yelp has some. But none of them told you anything about a business's digital health. None of them scored businesses on the signals that actually matter in 2026: structured data, email authentication, mobile responsiveness, content freshness.

We thought: what if the directory itself was the product?

Phase One — Getting the Raw Data

Everything started with the Google Places API. We queried it city by city, category by category, pulling business names, addresses, phone numbers, Google ratings, review counts, and whatever metadata Google would give us.
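The city-by-category sweep can be sketched roughly like this. This is a minimal illustration, not our production code: the function names, the ", NC" query suffix, and the injected `search` callable (a stand-in for whatever wrapper actually calls Google Places) are all assumptions. The one detail taken from the Places API itself is that each result carries a `place_id`, which makes it easy to drop duplicates when the same business shows up under several queries.

```python
from itertools import product
from typing import Callable, Iterable


def build_queries(cities: Iterable[str], categories: Iterable[str]) -> list[str]:
    """Build one text query per (city, category) pair."""
    # The ", NC" suffix is an assumption made for this sketch.
    return [f"{category} in {city}, NC" for city, category in product(cities, categories)]


def collect_businesses(
    queries: Iterable[str],
    search: Callable[[str], list[dict]],
) -> dict[str, dict]:
    """Run every query and keep the first record seen for each place_id.

    `search` is a hypothetical wrapper around the Places API that returns
    a list of result dicts, each containing a "place_id" key.
    """
    seen: dict[str, dict] = {}
    for query in queries:
        for place in search(query):
            seen.setdefault(place["place_id"], place)
    return seen
```

Passing the search function in as a parameter keeps the sweep logic testable without spending API credits.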

The first batch covered 17 cities. Then we kept expanding — batch after batch — until we'd covered over 150 cities across the region. By the time we stopped adding new cities, we had 18,906 business records in the database.

But "records" is a generous word for what we had at that point. Most entries were just a name, an address, and a Google listing. No website. No domain. No way to score them on anything meaningful.

About 70% of the businesses in our database don't have a website at all. They exist as a Google Business listing and nothing more. That ratio alone tells a story about the state of small business digital presence in Western NC.

The Domain Matching Problem

This is where we lost the most time.

To score a business, we need its website. Google Places sometimes gives you a URL, but not always. And when it does, the URL might be a Facebook page, a Yelp listing, or a dead link. So we built a domain matching pipeline to find and verify the actual website for each business.

Our first attempt used fuzzy string matching. Take the business name, generate some URL candidates, check if they resolve. Simple enough in theory.

In practice, it produced garbage.

"Mountain View Dental" could match mountainviewdental.com, mountainviewdentistry.com, mvdental.com, or a dozen other variations. Fuzzy matching would confidently return the wrong one. A dental practice in Asheville would get matched to an orthodontist in Oregon. A landscaping company would get matched to a completely unrelated business that happened to share two words in its name.
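To make the failure mode concrete, here is a small sketch of name-to-domain similarity using Python's standard `difflib`. It is an illustration of the general approach, not the matcher we actually ran; the helper names are hypothetical and the candidate domains are the examples from above.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def fuzzy_score(business_name: str, domain: str) -> float:
    """Similarity between a business name and the first label of a domain."""
    label = domain.split(".")[0]
    return SequenceMatcher(None, normalize(business_name), normalize(label)).ratio()


candidates = ["mountainviewdental.com", "mountainviewdentistry.com", "mvdental.com"]
ranked = sorted(
    candidates,
    key=lambda domain: fuzzy_score("Mountain View Dental", domain),
    reverse=True,
)
for domain in ranked:
    print(f"{domain}: {fuzzy_score('Mountain View Dental', domain):.2f}")
```

The top candidate gets a perfect score, and the runner-up scores high as well. Nothing in either number says whether the site belongs to a practice in Asheville or one in Oregon, which is exactly the gap a string score cannot close.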

We burned weeks on this before accepting the fundamental lesson: prefer API validation over fuzzy matching. Instead of guessing domains and hoping we were right, we started validating every match through actual API calls — checking DNS records, verifying page content, confirming the business name appeared on the site. It was slower. It was more expensive. It worked.
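A stripped-down version of that validation idea, using only the Python standard library, might look like the sketch below. The function names, the user agent string, and the simple "name appears in the page text" rule are assumptions made for illustration; a real pipeline would likely add more checks (address, phone number, redirects).

```python
import re
import socket
import urllib.request


def domain_resolves(domain: str) -> bool:
    """Return True if DNS can resolve the domain."""
    try:
        socket.getaddrinfo(domain, 443)
        return True
    except socket.gaierror:
        return False


def fetch_homepage(domain: str, timeout: float = 10.0) -> str:
    """Download (at most ~500 KB of) the homepage HTML."""
    request = urllib.request.Request(
        f"https://{domain}/",
        headers={"User-Agent": "directory-verifier-sketch"},  # hypothetical UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read(500_000).decode("utf-8", errors="replace")


def _normalize(text: str) -> str:
    """Lowercase and collapse anything non-alphanumeric into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def name_on_page(business_name: str, html: str) -> bool:
    """Check whether the business name appears in the page's visible text."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, fine for a sketch
    return _normalize(business_name) in _normalize(text)


def verify_match(business_name, domain, fetch=fetch_homepage, resolves=domain_resolves) -> bool:
    """Accept a domain only if it resolves AND the page mentions the business."""
    if not resolves(domain):
        return False
    try:
        html = fetch(domain)
    except OSError:  # covers URLError, HTTPError, timeouts, TLS failures
        return False
    return name_on_page(business_name, html)
```

The `fetch` and `resolves` parameters exist so the decision logic can be exercised with canned pages instead of live network calls.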

After cleaning up the domain pipeline, we ended up with 8,012 businesses with verified domains. That's about 42% of the total database. The rest either don't have websites or have websites we couldn't confidently match.

Honest numbers. We could have inflated that by loosening our matching criteria, but then we'd be scoring the wrong businesses. Accuracy matters more than coverage.

Enrichment — Turning Domains into Data

Once we had verified domains, the enrichment pipeline kicked in. This is the phase where a URL turns into something useful.

For each domain, we crawled the site and checked for 13 signals across four categories — presence, security, marketing, and technical infrastructure. Does the site have SSL? Is there structured data? Are email authentication records configured? When was the content last updated? Is there a CRM indicator in the page source?

Each signal has a weight. A website existing at all is the most important — without it, nothing else scores. Structured data is heavily weighted because it's the single biggest factor in whether AI systems can understand what a business does. Email authentication is weighted because it's a trust signal that most businesses don't know exists.

All 13 signals roll up into a score from 0 to 100. We scored 7,266 businesses this way.
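The roll-up works along the lines of the sketch below. The signal names and weights here are placeholders chosen for illustration, not the real 13-signal model; the only properties carried over from the description above are that each signal has a weight, that a missing website zeroes the score, and that the result lands between 0 and 100.

```python
# Hypothetical signals and weights, for illustration only.
WEIGHTS = {
    "has_website": 30,
    "structured_data": 20,
    "email_authentication": 15,
    "ssl": 15,
    "mobile_responsive": 10,
    "fresh_content": 10,
}


def readiness_score(signals: dict[str, bool], weights: dict[str, int] = WEIGHTS) -> int:
    """Weighted score from 0 to 100; no website means nothing else counts."""
    if not signals.get("has_website"):
        return 0
    earned = sum(weight for name, weight in weights.items() if signals.get(name))
    return round(100 * earned / sum(weights.values()))
```

Dividing by the total weight keeps the scale at 0 to 100 even if signals are added or reweighted later.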

The average score was around 40. That's not great. It means the typical scored business in Western NC earns well under half of the weighted points available for the signals that AI systems look for when deciding who to recommend.

The Database Diet

Here's a detail that doesn't sound important until you're paying for hosting: our SQLite database started at 999MB.

That's a problem when you're deploying to a $3.50/month VPS on AWS Lightsail. A gigabyte database on a tiny server means slow queries, slow page loads, and unhappy users.

We went through the data and found enormous bloat — cached HTML from crawls, redundant metadata, uncompressed text fields storing entire web pages. None of it needed to be in the production database.

After cleanup and optimization, the database dropped to 206MB. Same business records. Same scores. About one-fifth the size. The site got noticeably faster, and we stopped worrying about our VPS running out of disk space.
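For anyone facing the same problem, the core of this kind of cleanup in SQLite is small. The sketch below assumes a hypothetical `crawl_pages` table with a `raw_html` column holding cached pages; our actual schema differs. The important detail is the `VACUUM` at the end: clearing or deleting data only marks pages inside the file as free, and the file on disk does not shrink until the database is rebuilt.

```python
import os
import sqlite3


def slim_database(path: str) -> tuple[int, int]:
    """Clear cached HTML and rebuild the file; return (bytes_before, bytes_after).

    The table and column names are assumptions made for this sketch.
    """
    before = os.path.getsize(path)
    conn = sqlite3.connect(path)
    try:
        # Drop the bulky cached pages but keep every row and its score.
        conn.execute("UPDATE crawl_pages SET raw_html = NULL")
        conn.commit()
        # VACUUM must run outside a transaction; it rewrites the file compactly.
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)
```

Run it against a copy of the production file first, since `VACUUM` rewrites the whole database and temporarily needs extra disk space.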

The Stack

For the technically curious: BluePages runs on Next.js hosted on Vercel for the frontend, with a Python enrichment pipeline handling all the crawling, scoring, and data processing. The database is SQLite — simple, portable, and easy to deploy. The API server runs on a Lightsail instance.

We recently added an AI chat feature called Ask Octo that lets you ask questions about local businesses in plain English. "Who's a good dentist in Black Mountain?" or "Find me a plumber in Asheville with good reviews." It pulls from the same database and returns real answers with business details, scores, and links.

The whole thing cost less than $500 to build, not counting our time. API calls to Google Places were the biggest expense. Hosting is under $20/month.

What We Actually Learned

Most small businesses have terrible digital presence — and that's the opportunity. When 70% of businesses in a region don't even have a website, and the ones that do average a 40 out of 100 on basic digital signals, there's a massive gap between where businesses are and where they need to be. That gap is a service opportunity.

Domain matching is way harder than it sounds. This was our most expensive lesson in terms of time wasted. If you're building anything that needs to connect business names to websites, don't try to be clever with string matching. Validate everything. The API calls cost money, but wrong data costs more.

Database size matters at small scale. When you're running on big infrastructure, nobody notices a bloated database. When you're on a $3.50/month VPS, every megabyte counts. We cut our database by roughly 80%, from 999MB to 206MB, and the site went from sluggish to responsive. Constraint forces discipline.

The hardest part isn't building the directory — it's enriching the data. Pulling 18,906 business listings from Google Places is the easy part. A few days of API calls and you've got a big spreadsheet. But a big spreadsheet is not a product. The scoring, the domain verification, the signal analysis, the ongoing recrawls to keep data fresh — that's where all the real work lives. That's also what makes the data defensible. Anyone can pull a list of businesses from Google. Not everyone can tell you which ones are ready for AI and which ones are invisible.

The Bigger Idea

We didn't build BluePages to prove a technical point. We built it because we saw a model that nobody was talking about.

Niche directories are an overlooked AI business model. The economics are simple — under $500 to build, monetized through lead generation, and the data itself is the moat. Once you've crawled, verified, and scored thousands of businesses in a region, that dataset doesn't exist anywhere else. You can't replicate it by asking ChatGPT.

Local business data is also resistant to the disruption that AI is causing everywhere else. AI can write blog posts and generate images and summarize research papers. It can't go verify that a plumber in Weaverville actually has a working website with structured data. That requires building a pipeline, running it, and maintaining it over time.

A 273-day directory transcript from a source we studied validated this model before we started building. The pattern is real: pick a geography, pick a vertical, build the data layer, and the business opportunities follow.

What Comes Next

BluePages has 18,906 businesses, 8,012 with verified domains, and 7,266 with full AI readiness scores. It covers 150+ cities across Western North Carolina. It's free to search, free to browse, and the AI chat answers questions about local businesses in real time.

But the data is only as good as the last time we checked it. Businesses update their websites. New businesses open. Old ones close. The enrichment pipeline needs to keep running.

And the scoring model is version one. We weighted signals based on our best understanding of what AI systems prioritize, but we don't have ground truth data on exactly how much ChatGPT cares about JSON-LD versus email authentication versus page speed. Our weights are informed guesses. They'll get better as we learn more.

The point was never to build a perfect directory. It was to build a useful one — something that shows real businesses where they stand and what they can do about it.

It was never about the technology. It's about the plumber in Asheville who looks up their score, sees they're missing structured data, and fixes it. That's the whole point.
