A field report on rebuilding the search on our aircraft marketplace, starting from a one-character bug and ending with search that resolves what people actually mean.

The short version:


The problem

We run an aircraft marketplace that pulls listings from more than 100 broker sites. Search is the front door. If someone types the name of an airplane and gets the wrong airplane, we have lost them before they have seen a single listing.

The bug that started this: search "Cirrus SR22" and you get the Cirrus SR22, correct. Add a space, search "Cirrus SR 22", and you get the Cirrus SR22T instead. The SR22T is the turbocharged version, a separate model that sells for tens of thousands more. A space bar was quietly sending buyers to the wrong, more expensive airplane.

That is the kind of bug that does not throw an error and does not show up in a log. It just hands people the wrong results. We decided to rebuild search properly. Most of what we learned applies to any marketplace where people search for things you keep in a catalog: cars by make and model, parts by number, books by title.

Learning 1: Look at what people actually type before you change anything

The first thing we did was not write code. We pulled a year of real search queries out of our analytics (site search shows up as a dimension in GA4). It came to 834 distinct queries, and reading that list reshaped the whole plan.

People do not type the way you picture. The real queries included:


If we had built against our assumptions, we would have tuned for tidy "Make Model" queries that are only a slice of the traffic. The single space in "Cirrus SR 22" that kicked this off was not a rare slip. It was a common shape. We even found the literal string "Cirrus sr 22" in the log, which was oddly satisfying.

The point is simple. Your analytics already records what people search for. Read it first. It tells you which cases are worth getting exactly right and which are noise.

Learning 2: Fuzzy matching against raw listing text is the trap

The old search resolved make and model by taking the query and fuzzy-matching it against strings collected from listings, using a string-similarity library. That sounds reasonable, and it is the default approach almost everyone reaches for. It is also exactly why "Cirrus SR 22" broke.

Here is the mechanism. "SR 22" was split into two tokens, "sr" and "22". The matcher tried the single token "sr" first, and "sr" is a partial match for both "sr22" and "sr22t" at the same score. The tiebreak happened to pick the turbo. The two-token match "sr 22" that would have landed on the plain SR22 never got its turn, because the greedy single-token pass fired first.

You can patch that one ordering. The deeper problem stays. String similarity has no idea that SR22 and SR22T are two different airplanes at two different prices. It only sees that the strings look alike. In a catalog full of names that differ by a single character (SR20, SR22, SR22T, DA40, DA42, DA62, PA28, PA32), "looks alike" is precisely the wrong signal. The near-misses are not noise to smooth over. They are different products.
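To make that concrete, here is a minimal sketch of how a generic similarity score treats these names. It uses Python's standard-library difflib as a stand-in (the library our old search used is not shown here), and the similarity helper and the short name list are only for illustration.

```python
from difflib import SequenceMatcher

# Illustrative subset of catalog names that differ by a character or two.
CATALOG_NAMES = ["sr20", "sr22", "sr22t", "da40", "da42", "da62", "pa28", "pa32"]


def similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1]; knows nothing about airplanes."""
    return SequenceMatcher(None, a, b).ratio()


def rank_by_similarity(query: str) -> list[tuple[float, str]]:
    """Score every catalog name against the query, highest score first."""
    scored = [(similarity(query, name), name) for name in CATALOG_NAMES]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, name in rank_by_similarity("sr22"):
        print(f"{name:6s} {score:.2f}")
```

In this sketch the SR22T scores close to a perfect match for the query "sr22", and the DA40 and DA42 score well against each other. No threshold on a score like that cleanly separates a typo from a different product.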

Learning 3: Resolve the query against your catalog, not your data

This is the heart of the rebuild. We already had a curated taxonomy: manufacturer, then model family, then variant, with aliases (so Cessna also matches "Cesna", and Daher also matches the old "Socata" name). That taxonomy is the source of truth for what an airplane is.

Instead of fuzzy-matching the query against listing strings, we resolve it against the taxonomy with exact matching on a normalized form. Normalized means lowercase, strip spaces and hyphens, then compare. "SR 22", "SR-22", and "SR22" all normalize to "sr22", which matches the SR22 model exactly and never matches the SR22T, because "sr22" is not "sr22t". The same step makes "diamond da 40", "DA-40", and "DA40" all land on the DA40. No similarity scores, no tiebreaks, no guessing.
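A minimal sketch of that normalize-then-look-up step, assuming a tiny in-memory taxonomy. The MAKE_ALIASES and MODELS tables, the slug format, and the function names are hypothetical, and this version is make-anchored: it expects the make as the first token.

```python
import re
from typing import Optional

# Hypothetical slice of a curated taxonomy: alias -> canonical make.
MAKE_ALIASES = {
    "cirrus": "cirrus",
    "diamond": "diamond",
    "cessna": "cessna",
    "cesna": "cessna",  # known misspelling kept as an alias
}

# Canonical make -> {normalized model designator: model slug}.
MODELS = {
    "cirrus": {"sr20": "cirrus-sr20", "sr22": "cirrus-sr22", "sr22t": "cirrus-sr22t"},
    "diamond": {"da40": "diamond-da40", "da42": "diamond-da42"},
    "cessna": {"172": "cessna-172"},
}


def normalize(text: str) -> str:
    """Lowercase and strip whitespace and hyphens: 'SR 22' -> 'sr22'."""
    return re.sub(r"[\s\-]+", "", text.lower())


def resolve(query: str) -> Optional[str]:
    """Return the model slug for a 'make model' query, or None."""
    tokens = query.lower().split()
    if not tokens:
        return None
    make = MAKE_ALIASES.get(tokens[0])
    if make is None:
        return None
    model_key = normalize(" ".join(tokens[1:]))
    # Exact dictionary lookup: 'sr22' can never return the 'sr22t' entry.
    return MODELS.get(make, {}).get(model_key)
```

Because the lookup is an exact dictionary hit, the space in "Cirrus SR 22" disappears during normalization and the query lands on the same slug as "Cirrus SR22".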

That reframes search from a fuzzy text problem into a lookup against a structured catalog we already own. The query is messy, the catalog is clean, and the job is to map one onto the other. Once a query resolves to a model, we have its id (a slug), and we can match listings by that id rather than by how their titles happen to be spelled.

The one case the resolver missed at first was a model typed with no make, just "172" or "DA40" or "SR22". A make-anchored resolver needs the make to find the model. So we added a second pass that looks a model designator up across every manufacturer when it is distinctive enough to be unambiguous. "172" is Cessna, "da40" is Diamond, and that is settled.
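One way to sketch that second pass is a reverse index from model designator to the makes that use it. Here "distinctive enough" is assumed to mean the designator belongs to exactly one manufacturer; the real rule may be stricter. The taxonomy contents, including the two placeholder makes that share a designator, are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical taxonomy: make -> model designators.
TAXONOMY = {
    "cessna": ["172", "182"],
    "diamond": ["DA40", "DA42"],
    "cirrus": ["SR22", "SR22T"],
    "maker-a": ["100"],  # placeholder makes sharing a designator
    "maker-b": ["100"],
}


def normalize(text: str) -> str:
    """Lowercase and strip whitespace and hyphens."""
    return re.sub(r"[\s\-]+", "", text.lower())


def build_model_index(taxonomy: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized designator to the set of makes that use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for make, models in taxonomy.items():
        for model in models:
            index[normalize(model)].add(make)
    return index


def resolve_model_only(query: str, index: dict[str, set[str]]) -> Optional[tuple[str, str]]:
    """Return (make, model) only when exactly one make owns the designator."""
    key = normalize(query)
    makes = index.get(key, set())
    if len(makes) == 1:
        return next(iter(makes)), key
    return None  # unknown or ambiguous: let a later stage handle it


MODEL_INDEX = build_model_index(TAXONOMY)
```

Returning None for a shared designator keeps the resolver from guessing. An ambiguous query can still fall through to whatever comes next.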

Learning 4: Search is two jobs, and the second one is where it bit us

Resolving the query is only half of search. The other half is building the result set and ordering it. We fixed resolution, shipped it, and a user found a bug within a day that had nothing to do with text.

The report: search "Diamond DA40", get a great list of DA40s, then switch the sort to "cheapest first" and the page fills with random cheap airplanes. A two-seat trainer here, a different make there, whatever happened to be cheap.

The cause was uncomfortable. The search had never actually filtered the results down to DA40s. It kept the whole catalog in the result set and ranked the DA40s to the top. Under the default relevance sort that looks identical to filtering, because the relevant items are on top and nobody scrolls past 8,000 listings. The instant you sort by price, the relevance ranking is thrown away and you are sorting the entire marketplace by price. The DA40s sink into the pile.

The fix: when a query resolves to a subject, a specific model or at least a make, restrict the result set to that subject, then sort within it. Now "DA40 by price" means the cheapest DA40s, not the cheapest anything. There was a tidy side effect too. The result counts became honest. Searching "Cessna" used to claim several thousand results because it was ranking the whole site. Now it returns the Cessnas.
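The filter-then-sort order can be sketched like this. The Listing shape, the field names, and the search signature are assumptions for illustration, not our actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    title: str
    make_id: str
    model_id: str
    price: int


def search(
    listings: list[Listing],
    make_id: Optional[str] = None,
    model_id: Optional[str] = None,
    sort: str = "relevance",
) -> list[Listing]:
    """Restrict to the resolved subject first, then apply the chosen sort."""
    results = list(listings)
    if model_id is not None:
        results = [x for x in results if x.model_id == model_id]
    elif make_id is not None:
        results = [x for x in results if x.make_id == make_id]
    # Sorting happens only inside the filtered set, so a price sort
    # cannot pull in unrelated cheap listings.
    if sort == "price":
        results = sorted(results, key=lambda x: x.price)
    return results
```

With this shape, len(results) is also the honest result count, since nothing outside the resolved subject is ever in the list.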

If there is one debugging lesson here, it is this. A result list that looks right under the default sort can be hiding the fact that it was never filtered at all. Change the sort and watch what falls out.

Learning 5: Show the exact match first, but do not hide the close one

Our first cut of the fix was a touch too strict. Search "SR22" and it returned only the plain SR22, hiding the SR22T completely. Technically correct, since they are different models. The owner pushed back, and he was right. The turbo SR22 is, to a buyer, basically the same airplane. Show it. Just put it below the plain SR22, not in place of it.

So ranking groups models by a "series stem". We take the model identifier and drop any trailing letters that follow the number, so sr22 and sr22t share the stem sr22, m20j and m20k share m20, and 172 and 172rg share 172. The order becomes: exact model first, same-series siblings right below, other models from the same maker after that, and unrelated airplanes last. The detail that matters is that da40 and da42 do not share a stem, because the numbers differ, so they stay apart. The line between "same series, different trim" and "different model" is something we only drew correctly because a person who knows airplanes looked at the output. A unit test would have happily left the SR22T hidden and called it green.
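A minimal sketch of the stem rule and the four ranking tiers, assuming model identifiers are already normalized to lowercase. The function names and the (make, model) tuple shape are hypothetical.

```python
import re


def series_stem(model: str) -> str:
    """Drop letters that trail the last digit: 'sr22t' -> 'sr22', '172rg' -> '172'."""
    match = re.fullmatch(r"(.*\d)[a-z]+", model)
    return match.group(1) if match else model


def tier(query: tuple[str, str], candidate: tuple[str, str]) -> int:
    """0 = exact model, 1 = same series, 2 = same maker, 3 = unrelated."""
    q_make, q_model = query
    c_make, c_model = candidate
    if c_make != q_make:
        return 3
    if c_model == q_model:
        return 0
    if series_stem(c_model) == series_stem(q_model):
        return 1
    return 2


def rank(query: tuple[str, str], candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order candidates by tier; sorted() is stable, so ties keep their input order."""
    return sorted(candidates, key=lambda c: tier(query, c))
```

The stem only strips letters after the final digit, which is why da40 and da42 keep different stems while sr22 and sr22t collapse to one.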

Learning 6: Build autocomplete from the catalog too

Once resolution worked, we added a type-ahead dropdown. The tempting move is to build the suggestions from an index of listing text. We built them from the same taxonomy instead: every brand, model, and variant that actually has a listing, with its count. Same source of truth as resolution, so the thing you click resolves cleanly when the search runs.

It also has to match the way people type, which is not always left to right in one neat string. "cess 172" should suggest the Cessna 172 even though "cess" and "172" are never next to each other in the real name. So the matcher is token-aware: every typed token has to appear somewhere in the candidate, and whole-query prefixes rank highest. That one detail is the difference between an autocomplete that feels smart and one that feels broken.
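The token rule can be sketched as follows. Candidates are assumed to be (display name, listing count) pairs built from the taxonomy. Breaking ties by listing count is an assumption added for this sketch, and the counts in the usage example are invented.

```python
def suggest(typed: str, candidates: list[tuple[str, int]], limit: int = 5) -> list[str]:
    """Suggest names that contain every typed token; whole-query prefixes first."""
    typed_norm = " ".join(typed.lower().split())
    tokens = typed_norm.split()
    if not tokens:
        return []
    scored = []
    for name, count in candidates:
        lowered = name.lower()
        # Every token must appear somewhere in the name, in any position.
        if not all(token in lowered for token in tokens):
            continue
        is_prefix = lowered.startswith(typed_norm)
        # False sorts before True, so prefix matches come first;
        # within a group, higher listing counts come first (assumed tiebreak).
        scored.append((not is_prefix, -count, name))
    scored.sort()
    return [name for _, _, name in scored[:limit]]


if __name__ == "__main__":
    demo = [("Cessna 172", 40), ("Cessna 182", 25), ("Cirrus SR22", 30)]
    print(suggest("cess 172", demo))
```

Here "cess 172" is not a prefix of "cessna 172", yet both tokens appear in the name, so the Cessna 172 is still suggested.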

Learning 7: Decide what you will not resolve

A lot of real queries do not map to a make or model, and forcing them is a mistake. "4 seater", "helikopter", "Rotax 912uls", "Germany", "N556L". The design that worked is a clean split. The taxonomy resolves make, model, and variant precisely. A handful of specific extras get their own handling: seat counts including spelled-out numbers ("four seats"), category words in several languages, and registrations (we added the registration field to the searchable text, so "N556L" now finds the exact airplane). Everything else falls through to a plain text search over the listing fields.
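A sketch of that split for queries the taxonomy did not resolve. The patterns are deliberately loose assumptions: the registration regex only approximates US N-numbers, the number-word list stops at six, and the category table holds just two spellings. The check order and the classify name are illustrative too.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

# '4 seater', '4-seater', 'four seats'
SEATS_RE = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")[\s-]*(?:seats?|seater)\b"
)

# Loose approximation of a US registration such as 'N556L' (assumption).
REGISTRATION_RE = re.compile(r"n[1-9]\d{0,4}[a-z]{0,2}")

# Category words in more than one language (illustrative subset).
CATEGORY_WORDS = {"helicopter": "helicopter", "helikopter": "helicopter"}


def classify(query: str) -> tuple[str, object]:
    """Route a query to one special handler, or fall through to text search."""
    q = query.strip().lower()
    seats = SEATS_RE.search(q)
    if seats:
        raw = seats.group(1)
        return ("seats", int(raw) if raw.isdigit() else NUMBER_WORDS[raw])
    if REGISTRATION_RE.fullmatch(q):
        return ("registration", q.upper())
    if q in CATEGORY_WORDS:
        return ("category", CATEGORY_WORDS[q])
    # Anything unrecognized becomes a plain text search over listing fields.
    return ("text", q)
```

The last branch matters most: "Rotax 912uls" and "Germany" are not forced into any structured bucket. They pass through unchanged to text search.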

We also tightened the old fuzzy matcher rather than deleting it. It still catches genuine misspellings like "Cesna" and "Boing", but only when the candidate shares a first letter and clears a higher bar. That stopped it from turning an engine model like "Rotax 912uls" into some unrelated airplane, which it used to do. The fuzzy logic is now a small safety net behind the exact resolver, not the main event.
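The two guards can be sketched with difflib from the Python standard library, again as a stand-in for the real matcher. The 0.8 cutoff and the make list are hypothetical; the source of truth for the actual bar is our own tuning, which is not reproduced here.

```python
from difflib import get_close_matches
from typing import Optional

# Illustrative list of canonical makes.
MAKES = ["cessna", "boeing", "cirrus", "diamond", "piper"]


def fuzzy_make(token: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a make for a likely misspelling, or None if nothing is close enough."""
    token = token.lower()
    # Guard 1: only consider makes that share the token's first letter.
    candidates = [make for make in MAKES if make[:1] == token[:1]]
    # Guard 2: require a high similarity score (cutoff value is an assumption).
    matches = get_close_matches(token, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With these guards "Cesna" and "Boing" still recover their makes, while "Rotax" has no candidate sharing its first letter in this list and resolves to nothing.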

The search we landed on

Stripped down, the search now does this:


What we would tell another marketplace team

  1. Read your real search logs before you tune anything. They will not look like you expect.
  2. If you keep a catalog of what you sell, resolve queries against that catalog, not against the raw text of your listings. Exact matching on a normalized form beats fuzzy similarity for anything where near-identical names mean different products.
  3. Treat resolution and ranking as two separate jobs. A list that looks correct under the default sort can be unfiltered underneath. Change the sort to find out.
  4. Ranking carries product judgment. "Show the close variant, just lower down" is the kind of rule you get from a person, not from a test.
  5. Decide which queries you will resolve precisely and which you will let fall through to text search. Trying to resolve all of them is how you end up with confident wrong answers.

Search felt like a language problem when we started. It turned out to be a data-modeling problem. We already had the right catalog of every airplane we list. The work was not inventing a clever matcher. It was pointing the messy query at the clean catalog, and then getting out of the way.