---
title: "The Limits of LLMs: Shipping Software Without Outsourcing Judgment"
date: "2026-02-09"
summary: "LLMs are great at plausible solutions, but engineers still own assumptions, trade-offs, and failure modes."
description: "Using place deduplication as a real example, this post explores what LLMs do well, what they miss, and how to ship responsibly without outsourcing judgment."
tags: ["AI", "LLMs", "Development", "Systems Design", "MVP"]
featured: false
readTime: 12
image: "/assets/og-default.png"
author: "David Martin"
canonicalURL: "https://djmtech.dev/blog/the-limits-of-llms"
---

# The Limits of LLMs: Shipping Software Without Outsourcing Judgment

Large language models are very good at producing plausible solutions.
They are not good at knowing whether a solution is appropriate for your constraints, scale, or failure modes.

This post isn't about dunking on AI tools. I use them constantly. It's about understanding where their usefulness ends and where responsibility begins.

I ran into this boundary while solving a very real problem: place deduplication in a map-based app.

---

## The Problem That Triggered the Lesson

When users add places to a map, duplicates are inevitable.

My original logic required an exact match on both name and coordinates. It was safe, but naive. In the real world:

- People type "Starbucks" and "Starbucks Coffee"
- GPS coordinates drift by a few meters
- Different restaurants can exist at the exact same location

The result was predictable: obvious duplicates slipping through, and frustrated users wondering why the app didn't "just know."

So I asked Cursor for help.

---

## What LLMs Are Good At (And What Cursor Did Well)

Cursor proposed a multi-stage deduplication strategy:

1. Match on a deterministic external ID (Mapbox Place ID)
2. Fall back to strict name + coordinate matching
3. Finally, use fuzzy matching as a last resort

The fuzzy step combined:

- String similarity (substring checks and Levenshtein distance)
- Physical proximity (Haversine distance)

Candidates were scored using weighted heuristics:

- 70% name similarity
- 30% distance proximity
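The weighted scoring can be sketched roughly like this. This is a minimal illustration, not the app's actual code: the function names and the 50-meter normalization radius are my assumptions, and `difflib.SequenceMatcher` stands in for a proper Levenshtein implementation.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lng points, in meters."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Substring check first, then an edit-distance-style ratio (0..1)."""
    a, b = a.lower().strip(), b.lower().strip()
    if a in b or b in a:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def dedup_score(name_a, coord_a, name_b, coord_b, radius_m=50.0):
    """Weighted heuristic: 70% name similarity, 30% distance proximity."""
    dist = haversine_m(*coord_a, *coord_b)
    proximity = max(0.0, 1.0 - dist / radius_m)  # 1.0 at 0 m, 0.0 at or beyond the radius
    return 0.7 * name_similarity(name_a, name_b) + 0.3 * proximity
```

With this sketch, "Starbucks" vs. "Starbucks Coffee" a few meters apart scores near 1.0, while unrelated names at the same coordinates score much lower.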

On paper, this was a solid answer. In fact, it was too easy to accept.

This is where LLMs shine: they can rapidly assemble a solution that looks reasonable, idiomatic, and complete.

But that's also where the danger lives.

---

## The Temptation to Just Ship It

At first glance, it would have been easy to shrug and merge the code.

This is the failure mode I see constantly with AI-assisted development:

> "It seems fine, and I'm tired, so whatever."

If I had done that, I would have shipped behavior I didn't fully understand -- and that's where systems quietly rot.

So I slowed down.

---

## What I Learned by Reviewing the Code

Walking through the implementation carefully made a few things clear.

### What It Actually Does Well

- Minor name variations resolve cleanly
- Small coordinate drift no longer causes duplicates
- Different restaurants at the same coordinates remain distinct

### What It Quietly Assumes

- The dataset per search is small
- The client is allowed to make heuristic decisions
- Occasional duplicates are acceptable
- Concurrency is not yet a dominant problem

None of these are wrong. But none of them are guarantees, either.

LLMs don't flag these assumptions for you. They're implicit. It's your job to surface them.

---

## Why I Chose to Keep the Implementation (For Now)

I ultimately shipped the solution. Not because it was "what Cursor suggested," but because I understood the trade-offs.

This approach is acceptable today because:

- Fuzzy matching only runs as a fallback
- The search radius is tightly bounded (50 meters)
- Duplicates are recoverable with admin tooling
- Thresholds are tunable, not locked in forever

Most importantly, the behavior aligns with user expectations, which is the actual goal of deduplication in an MVP.

This wasn't blind trust in AI. It was conditional acceptance.

---

## The "Google Question": A Calibration Tool, Not a Goal

When I'm evaluating a solution, I often ask myself a question I never intend to fully answer:

> "How would Google build this?"

Not because I want to build that.
But because it gives me a sense of where my current logic sits on the spectrum between toy and planet-scale system.

It's a way to surface hidden assumptions.

So, purely as an exercise, I asked:

**What would place deduplication look like if the constraint wasn't my MVP, but the entire world?**

The answer is... a lot.

---

## A Rough Sketch of a Google-Scale Deduplication System

At Google scale, you're not deduplicating places for a handful of users. You're reconciling billions of noisy, multilingual, user-contributed entities, all while maintaining low latency and avoiding catastrophic merges.

That changes everything.

### 1. Canonical Name Normalization

Instead of lowercasing strings and calling it a day, you'd see:

- Unicode normalization (NFKC)
- Language-aware stemming and transliteration
- Removal of common business suffixes ("Restaurant", "Ltd", "LLC")
- Large, curated synonym dictionaries for global brands

Some of this would be rules-based. Some would be learned over time. The point isn't elegance... it's coverage.
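A toy version of the rules-based part might look like the following. The suffix list and helper name are invented for illustration; a real system would rely on large, curated, language-aware data rather than a hard-coded set.

```python
import re
import unicodedata

# Hypothetical, tiny suffix list; real systems curate these per language and region.
COMMON_SUFFIXES = {"restaurant", "ltd", "llc", "inc", "coffee"}


def normalize_place_name(name: str) -> str:
    # NFKC folds compatibility variants (full-width characters, ligatures, etc.).
    name = unicodedata.normalize("NFKC", name)
    # Drop accents: decompose, then remove combining marks.
    name = "".join(
        c for c in unicodedata.normalize("NFD", name) if not unicodedata.combining(c)
    )
    # Lowercase and tokenize on alphanumerics.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Strip trailing business suffixes.
    while tokens and tokens[-1] in COMMON_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

Under these assumptions, "Starbucks Coffee" and "Starbucks" normalize to the same key; transliteration and brand-synonym handling would sit on top of this.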

### 2. Geospatial Indexing at Planet Scale

Rather than radius-based Haversine queries, Google uses **S2 geometry**, which partitions the Earth into hierarchical cells.

Every place is indexed into these cells, enabling:

- Fast "nearby" queries
- Consistent behavior near poles and datelines
- Efficient spatial joins at massive scale

Naive latitude/longitude math works in the common case. S2 works everywhere.
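To make the cell idea concrete without pulling in the real S2 library, here is a deliberately naive flat-grid analogy: bucket places into fixed-size lat/lng cells, then look up a cell plus its eight neighbors. Real S2 cells are hierarchical and live on a sphere, which is exactly what makes them behave near poles and the dateline; this toy does not. All names and the cell size are invented.

```python
from collections import defaultdict

CELL_DEG = 0.001  # roughly 111 m of latitude per cell; purely illustrative


def cell_id(lat, lng):
    """Bucket a coordinate into a fixed-size grid cell."""
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))


class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, place_id, lat, lng):
        self.cells[cell_id(lat, lng)].append((place_id, lat, lng))

    def nearby(self, lat, lng):
        """Candidate places in the query cell and its 8 neighbors."""
        ci, cj = cell_id(lat, lng)
        candidates = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                candidates.extend(self.cells.get((ci + di, cj + dj), []))
        return candidates
```

The payoff is the same in both the toy and the real thing: a "nearby" query touches a handful of cells instead of scanning every place on Earth.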

### 3. Semantic Similarity, Not Just String Distance

Instead of relying purely on edit distance:

- Place names are embedded into semantic vectors
- Similarity is computed using cosine distance
- Context like categories, cuisine types, and metadata can be included

This allows the system to understand that
"Joe's Pizza" and "Joe's Famous NY Pizza" are probably related, even when the strings differ significantly.
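Cosine similarity itself is simple; the hard part is producing good embeddings. In the sketch below, the three-dimensional vectors are hand-made toys (imagine dimensions loosely encoding "pizza-ness", "NY-ness", and "Thai-ness"), not outputs of any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings"; not from a real model.
joes_pizza = [0.9, 0.3, 0.0]
joes_famous_ny_pizza = [0.85, 0.6, 0.0]
thai_palace = [0.05, 0.1, 0.95]
```

The two pizza vectors come out highly similar even though their strings differ, while the Thai vector does not, which is the behavior edit distance alone can't give you.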

### 4. Multi-Signal Confidence Scoring

Deduplication decisions aren't binary. They're probabilistic.

A real system would combine:

- Name similarity (rules + ML)
- Spatial proximity (S2-based)
- Category overlap
- Phone number or domain matches
- Review patterns
- Possibly even image similarity

These signals feed an ensemble model that outputs a confidence score. Only merges above a carefully tuned threshold are allowed automatically.
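A linear stand-in for that ensemble might look like this. The signal names, weights, and thresholds are all invented for illustration; in practice they would be learned and tuned, not hand-set.

```python
# Invented signal weights; a real system would learn these with an ensemble model.
SIGNAL_WEIGHTS = {
    "name_similarity": 0.35,
    "spatial_proximity": 0.25,
    "category_overlap": 0.15,
    "phone_match": 0.15,
    "review_pattern": 0.10,
}
AUTO_MERGE_THRESHOLD = 0.85   # only very confident pairs merge automatically
REVIEW_THRESHOLD = 0.50       # ambiguous pairs get routed to humans


def merge_confidence(signals):
    """Weighted sum of per-signal scores in [0, 1]; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())


def decide(signals):
    score = merge_confidence(signals)
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "keep_separate"
```

The important property is the middle band: instead of forcing a binary call, uncertain pairs are routed to human review.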

### 5. Humans Still Exist

For ambiguous cases, humans step in:

- Moderators
- Trusted contributors
- Regional experts

At scale, human judgment becomes a scarce but necessary resource.

### 6. Offline Reconciliation and Monitoring

Finally, none of this is "set and forget."

There are:

- Batch jobs that reconcile place graphs nightly
- Dashboards tracking false merges
- A/B tests tuning thresholds
- SREs watching metrics like hawks

This system is expensive. It's complex. And it's justified because the cost of being wrong is enormous.

---

## Why This Matters (And Why I Didn't Build It)

I don't ask "how would Google build this" because I want to imitate it.

I ask it because it reveals:

- Which problems I'm not solving yet
- Which assumptions are safe at my scale
- Which heuristics will eventually stop working

My fuzzy-matching approach is nowhere near this, and that's fine.

Google builds for billions of users, adversarial input, and permanent correctness.
I'm building for an MVP where duplicates are recoverable and behavior needs to feel intuitive.

Understanding the gap helps me ship responsibly now, without pretending my constraints don't exist.

---

## Where LLMs Actually Stop

This experience reinforced something important:

LLMs are excellent at proposing shapes.
They are bad at owning consequences.

They won't:

- Tell you when a heuristic becomes dangerous
- Warn you about race conditions
- Decide which failure modes are acceptable
- Notice when "good enough" quietly turns into technical debt

That responsibility doesn't go away just because the code compiled.

---

## Conclusion

This post isn't about fuzzy matching. It's about using AI without surrendering agency.

Cursor helped me move faster.
Reviewing the code helped me stay honest.

I now know:

- What my app is doing
- Why it behaves the way it does
- Where it will eventually break
- What to watch as it grows

That's the real line between assistance and abdication.

AI can write code.
Only engineers decide when it's safe to ship.

---

### Rule I'm Trying to Follow

> Never ship AI-generated code I couldn't explain to a teammate or debug at 2am.

---

## References & Further Reading

### Geospatial Distance & Matching

- **Haversine Formula**
  Movable Type Scripts - *Calculate distance, bearing and more between Latitude/Longitude points*
  https://www.movable-type.co.uk/scripts/latlong.html

- **Haversine Formula (Reference)**
  Wikipedia - *Haversine formula*
  https://en.wikipedia.org/wiki/Haversine_formula

- **Levenshtein Distance (Edit Distance)**
  Wikipedia - *Levenshtein distance*
  https://en.wikipedia.org/wiki/Levenshtein_distance

- **Original Levenshtein Paper (1966)**
  Levenshtein, V. I. - *Binary codes capable of correcting deletions, insertions, and reversals*
  https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf

- **Wagner-Fischer Algorithm (1974)**
  Wagner, R. A., & Fischer, M. J. - *The String-to-String Correction Problem*
  https://par.cse.nsysu.edu.tw/resource/lab_relative/Wagn74.pdf

---

### Google-Scale Geospatial & Infrastructure References

- **S2 Geometry Library (Open Source)**
  Google - *S2 Geometry Library*
  https://github.com/google/s2geometry

- **S2 Geometry in Practice (Google Cloud)**
  Google Cloud Blog - *Best practices for spatial clustering in BigQuery*
  https://cloud.google.com/blog/products/data-analytics/best-practices-for-spatial-clustering-in-bigquery

- **Spatial Indexing with S2 Cells**
  Google Cloud Docs - *Grid systems for spatial analysis*
  https://cloud.google.com/bigquery/docs/grid-systems-spatial-analysis

---

### Large-Scale Systems & Reliability

- **Bigtable: A Distributed Storage System**
  Chang et al. (Google Research, 2006)
  https://research.google.com/archive/bigtable-osdi06.pdf

- **Cloud Spanner**
  Google Cloud - *Spanner: Globally distributed, strongly consistent database*
  https://cloud.google.com/spanner

- **Site Reliability Engineering**
  Google - *Monitoring Distributed Systems*
  https://sre.google/sre-book/monitoring-distributed-systems/

---

### Experimentation, Tuning, and Human-in-the-Loop

- **Controlled Experiments at Scale**
  Kohavi et al. - *Practical Guide to Controlled Experiments on the Web*
  https://ai.stanford.edu/~ronnyk/2007GuideControlledExperiments.pdf

- **Google Local Guides Program**
  Google Transparency Center - *Local Guides Program Policies*
  https://transparency.google/intl/en/our-policies/product-terms/local-guides/

---

*Note: The "Google-scale" architecture described in this post is an informed synthesis based on public documentation, research papers, and industry-standard patterns. It is intended as a conceptual calibration tool rather than a literal description of any single internal Google system.*