---
title: "The Limits of LLMs: Shipping Software Without Outsourcing Judgment"
date: "2026-02-09"
summary: "LLMs are great at plausible solutions, but engineers still own assumptions, trade-offs, and failure modes."
description: "Using place deduplication as a real example, this post explores what LLMs do well, what they miss, and how to ship responsibly without outsourcing judgment."
tags: ["AI", "LLMs", "Development", "Systems Design", "MVP"]
featured: false
readTime: 12
image: "/assets/og-default.png"
author: "David Martin"
canonicalURL: "https://djmtech.dev/blog/the-limits-of-llms"
---

# The Limits of LLMs: Shipping Software Without Outsourcing Judgment

Large language models are very good at producing plausible solutions. They are not good at knowing whether a solution is appropriate for your constraints, scale, or failure modes.

This post isn't about dunking on AI tools. I use them constantly. It's about understanding where their usefulness ends and where responsibility begins.

I ran into this boundary while solving a very real problem: place deduplication in a map-based app.

---

## The Problem That Triggered the Lesson

When users add places to a map, duplicates are inevitable.

My original logic required an exact match on both name and coordinates. It was safe, but naive. In the real world:

- People type "Starbucks" and "Starbucks Coffee"
- GPS coordinates drift by a few meters
- Different restaurants can exist at the exact same location

The result was predictable: obvious duplicates slipping through, and frustrated users wondering why the app didn't "just know."

So I asked Cursor for help.

---

## What LLMs Are Good At (And What Cursor Did Well)

Cursor proposed a multi-stage deduplication strategy:

1. Match on a deterministic external ID (Mapbox Place ID)
2. Fall back to strict name + coordinate matching
3. Finally, use fuzzy matching as a last resort

The fuzzy step combined:

- String similarity (substring checks and Levenshtein distance)
- Physical proximity (Haversine distance)

Candidates were scored using weighted heuristics:

- 70% name similarity
- 30% distance proximity

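
The post doesn't include the app's actual code, so here is a minimal sketch of how that weighted blend might look. The function names are mine, the Levenshtein and Haversine implementations are textbook versions, and the 50-meter normalization radius matches the bound the app uses for its search.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in meters between two lat/lng points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def fuzzy_score(name_a, name_b, coords_a, coords_b, radius_m=50.0) -> float:
    """Weighted blend: 70% name similarity, 30% distance proximity."""
    a, b = name_a.lower().strip(), name_b.lower().strip()
    if a in b or b in a:  # substring check short-circuits the edit distance
        name_sim = 1.0
    else:
        name_sim = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
    dist = haversine_m(*coords_a, *coords_b)
    proximity = max(0.0, 1 - dist / radius_m)  # 1.0 at the same spot, 0.0 at the radius
    return 0.7 * name_sim + 0.3 * proximity
```

With this shape, "Starbucks" a few meters from "Starbucks Coffee" scores near 1.0, while two unrelated names at the same coordinates stay well below any sane merge threshold.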
On paper, this was a solid answer. In fact, it was too easy to accept.

This is where LLMs shine: they can rapidly assemble a solution that looks reasonable, idiomatic, and complete.

But that's also where the danger lives.

---

## The Temptation to Just Ship It

At first glance, it would have been easy to shrug and merge the code.

This is the failure mode I see constantly with AI-assisted development:

> "It seems fine, and I'm tired, so whatever."

If I had done that, I would have shipped behavior I didn't fully understand -- and that's where systems quietly rot.

So I slowed down.

---

## What I Learned by Reviewing the Code

Walking through the implementation carefully made a few things clear.

### What It Actually Does Well

- Minor name variations resolve cleanly
- Small coordinate drift no longer causes duplicates
- Different restaurants at the same coordinates remain distinct

### What It Quietly Assumes

- The dataset per search is small
- The client is allowed to make heuristic decisions
- Occasional duplicates are acceptable
- Concurrency is not yet a dominant problem

None of these assumptions are wrong. But none of them are guarantees, either.

LLMs don't flag these assumptions for you. They're implicit. It's your job to surface them.

---

## Why I Chose to Keep the Implementation (For Now)

I ultimately shipped the solution. Not because it was "what Cursor suggested," but because I understood the trade-offs.

This approach is acceptable today because:

- Fuzzy matching only runs as a fallback
- The search radius is tightly bounded (50 meters)
- Duplicates are recoverable with admin tooling
- Thresholds are tunable, not locked in forever

Most importantly, the behavior aligns with user expectations, which is the actual goal of deduplication in an MVP.

This wasn't blind trust in AI. It was conditional acceptance.

---

## The "Google Question": A Calibration Tool, Not a Goal

When I'm evaluating a solution, I often ask myself a question I never intend to fully answer:

> "How would Google build this?"

Not because I want to build that, but because it gives me a sense of where my current logic sits on the spectrum between toy and planet-scale system. It's a way to surface hidden assumptions.

So, purely as an exercise, I asked:

**What would place deduplication look like if the constraint wasn't my MVP, but the entire world?**

The answer is... a lot.

---

## A Rough Sketch of a Google-Scale Deduplication System

At Google scale, you're not deduplicating places for a handful of users. You're reconciling billions of noisy, multilingual, user-contributed entities, all while maintaining low latency and avoiding catastrophic merges.

That changes everything.

### 1. Canonical Name Normalization

Instead of lowercasing strings and calling it a day, you'd see:

- Unicode normalization (NFKC)
- Language-aware stemming and transliteration
- Removal of common business suffixes ("Restaurant", "Ltd", "LLC")
- Large, curated synonym dictionaries for global brands

Some of this would be rules-based. Some would be learned over time. The point isn't elegance... it's coverage.

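
A rules-only sketch of the first and third bullets might look like this. The suffix list here is a tiny illustrative sample, nowhere near the curated dictionaries a real system would maintain:

```python
import unicodedata

# Tiny illustrative suffix list; a production system would curate thousands,
# per language and per region.
BUSINESS_SUFFIXES = {"restaurant", "cafe", "coffee", "ltd", "llc", "inc"}

def canonical_name(raw: str) -> str:
    """Normalize Unicode, lowercase, and strip common business suffixes."""
    # NFKC folds compatibility characters (e.g. full-width letters) into
    # canonical forms before comparison; casefold() is aggressive lowercasing.
    text = unicodedata.normalize("NFKC", raw).casefold()
    tokens = [t for t in text.split() if t not in BUSINESS_SUFFIXES]
    return " ".join(tokens)
```

After this pass, "Starbucks Coffee" and "Starbucks" collapse to the same key, which is exactly the class of near-miss the naive exact-match logic kept missing.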
### 2. Geospatial Indexing at Planet Scale

Rather than radius-based Haversine queries, Google uses **S2 geometry**, which partitions the Earth into hierarchical cells.

Every place is indexed into these cells, enabling:

- Fast "nearby" queries
- Consistent behavior near poles and datelines
- Efficient spatial joins at massive scale

Raw latitude/longitude math works most of the time. S2 works everywhere.

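
S2 itself is a full library, but the core idea (bucket places into cells, then only compare within a cell and its neighbors) can be shown with a toy flat grid. To be clear, this is a deliberate simplification, not S2: unlike S2, a flat grid misbehaves near the poles and the antimeridian, which is part of why the real thing exists.

```python
from collections import defaultdict

CELL_DEG = 0.001  # ~111 m of latitude per cell; a toy stand-in for an S2 level

def cell_of(lat: float, lng: float) -> tuple[int, int]:
    """Map a coordinate onto a fixed-size grid cell."""
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))

class GridIndex:
    """Bucket places by cell so a 'nearby' query only scans a 3x3 neighborhood."""

    def __init__(self):
        self.cells: dict[tuple[int, int], list] = defaultdict(list)

    def add(self, place_id: str, lat: float, lng: float) -> None:
        self.cells[cell_of(lat, lng)].append((place_id, lat, lng))

    def nearby(self, lat: float, lng: float) -> list:
        cx, cy = cell_of(lat, lng)
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                out.extend(self.cells.get((cx + dx, cy + dy), []))
        return out
```

The payoff is the same one S2 delivers at scale: a dedup check touches a handful of candidates instead of scanning every place in the database.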
### 3. Semantic Similarity, Not Just String Distance

Instead of relying purely on edit distance:

- Place names are embedded into semantic vectors
- Similarity is computed using cosine distance
- Context like categories, cuisine types, and metadata can be included

This allows the system to understand that "Joe's Pizza" and "Joe's Famous NY Pizza" are probably related, even when strings differ significantly.

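
Learned embeddings require a trained model, but the cosine step itself is easy to show. Here character-trigram counts stand in for real semantic vectors; that substitution is mine, and it only captures surface overlap, not meaning:

```python
import math
from collections import Counter

def trigram_vector(name: str) -> Counter:
    """Character-trigram counts as a crude stand-in for a learned embedding."""
    s = f"  {name.lower()} "  # pad so very short names still produce trigrams
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Even this crude version scores "Joe's Pizza" against "Joe's Famous NY Pizza" well above an unrelated pair; a real embedding model does the same thing in a space where "NY" and "New York" also land close together.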
### 4. Multi-Signal Confidence Scoring

Deduplication decisions aren't binary. They're probabilistic.

A real system would combine:

- Name similarity (rules + ML)
- Spatial proximity (S2-based)
- Category overlap
- Phone number or domain matches
- Review patterns
- Possibly even image similarity

These signals feed an ensemble model that outputs a confidence score. Only merges above a carefully tuned threshold are allowed automatically.

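
The threshold-gating shape can be sketched with a plain weighted average. Everything concrete here is invented for illustration: the signal names, the weights, and both cutoffs would be learned and tuned from data in a real system, not hard-coded.

```python
# Illustrative only: these signals, weights, and thresholds are made up for
# the sketch; a production system would learn and tune them continuously.
AUTO_MERGE_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.5

WEIGHTS = {
    "name_similarity": 0.40,
    "spatial_proximity": 0.30,
    "category_overlap": 0.15,
    "phone_match": 0.15,
}

def merge_confidence(signals: dict[str, float]) -> float:
    """Weighted average of per-signal scores, each expected in [0, 1]."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

def decide(signals: dict[str, float]) -> str:
    """Auto-merge only above the high threshold; route the gray zone to review."""
    score = merge_confidence(signals)
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto-merge"
    return "human-review" if score >= REVIEW_THRESHOLD else "keep-separate"
```

The middle tier is the important part: ambiguous candidates don't get merged or discarded, they get escalated, which is where the next section picks up.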
### 5. Humans Still Exist

For ambiguous cases, humans step in:

- Moderators
- Trusted contributors
- Regional experts

At scale, human judgment becomes a scarce but necessary resource.

### 6. Offline Reconciliation and Monitoring

Finally, none of this is "set and forget."

There are:

- Batch jobs that reconcile place graphs nightly
- Dashboards tracking false merges
- A/B tests tuning thresholds
- SREs watching metrics like hawks

This system is expensive. It's complex. And it's justified because the cost of being wrong is enormous.

---

## Why This Matters (And Why I Didn't Build It)

I don't ask "how would Google build this?" because I want to imitate it.

I ask it because it reveals:

- Which problems I'm not solving yet
- Which assumptions are safe at my scale
- Which heuristics will eventually stop working

My fuzzy-matching approach is nowhere near this, and that's fine.

Google builds for billions of users, adversarial input, and permanent correctness. I'm building for an MVP where duplicates are recoverable and behavior needs to feel intuitive.

Understanding the gap helps me ship responsibly now, without pretending my constraints don't exist.

---

## Where LLMs Actually Stop

This experience reinforced something important:

LLMs are excellent at proposing shapes. They are bad at owning consequences.

They won't:

- Tell you when a heuristic becomes dangerous
- Warn you about race conditions
- Decide which failure modes are acceptable
- Notice when "good enough" quietly turns into technical debt

That responsibility doesn't go away just because the code compiled.

---

256+
257+
## Conclusion
258+
259+
This post isn't about fuzzy matching. It's about using AI without surrendering agency.
260+
261+
Cursor helped me move faster.
262+
Reviewing the code helped me stay honest.
263+
264+
I now know:
265+
266+
- What my app is doing
267+
- Why it behaves the way it does
268+
- Where it will eventually break
269+
- What to watch as it grows
270+
271+
That's the real line between assistance and abdication.
272+
273+
AI can write code.
274+
Only engineers decide when it's safe to ship.
275+
276+
---
277+
278+
### Rule I'm Trying to Follow
279+
280+
> Never ship AI-generated code I couldn't explain to a teammate or debug at 2am.
281+
282+
---
## References & Further Reading

### Geospatial Distance & Matching

- **Haversine Formula**
  Movable Type Scripts - *Calculate distance, bearing and more between Latitude/Longitude points*
  https://www.movable-type.co.uk/scripts/latlong.html

- **Haversine Formula (Reference)**
  Wikipedia - *Haversine formula*
  https://en.wikipedia.org/wiki/Haversine_formula

- **Levenshtein Distance (Edit Distance)**
  Wikipedia - *Levenshtein distance*
  https://en.wikipedia.org/wiki/Levenshtein_distance

- **Original Levenshtein Paper (1966)**
  Levenshtein, V. I. - *Binary codes capable of correcting deletions, insertions, and reversals*
  https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf

- **Wagner-Fischer Algorithm (1974)**
  Wagner, R. A., & Fischer, M. J. - *The String-to-String Correction Problem*
  https://par.cse.nsysu.edu.tw/resource/lab_relative/Wagn74.pdf

---

### Google-Scale Geospatial & Infrastructure References

- **S2 Geometry Library (Open Source)**
  Google - *S2 Geometry Library*
  https://github.com/google/s2geometry

- **S2 Geometry in Practice (Google Cloud)**
  Google Cloud Blog - *Best practices for spatial clustering in BigQuery*
  https://cloud.google.com/blog/products/data-analytics/best-practices-for-spatial-clustering-in-bigquery

- **Spatial Indexing with S2 Cells**
  Google Cloud Docs - *Grid systems for spatial analysis*
  https://cloud.google.com/bigquery/docs/grid-systems-spatial-analysis

---

### Large-Scale Systems & Reliability

- **Bigtable: A Distributed Storage System**
  Chang et al. (Google Research, 2006)
  https://research.google.com/archive/bigtable-osdi06.pdf

- **Cloud Spanner**
  Google Cloud - *Spanner: Globally distributed, strongly consistent database*
  https://cloud.google.com/spanner

- **Site Reliability Engineering**
  Google - *Monitoring Distributed Systems*
  https://sre.google/sre-book/monitoring-distributed-systems/

---

### Experimentation, Tuning, and Human-in-the-Loop

- **Controlled Experiments at Scale**
  Kohavi et al. - *Practical Guide to Controlled Experiments on the Web*
  https://ai.stanford.edu/~ronnyk/2007GuideControlledExperiments.pdf

- **Google Local Guides Program**
  Google Transparency Center - *Local Guides Program Policies*
  https://transparency.google/intl/en/our-policies/product-terms/local-guides/

---

*Note: The "Google-scale" architecture described in this post is an informed synthesis based on public documentation, research papers, and industry-standard patterns. It is intended as a conceptual calibration tool rather than a literal description of any single internal Google system.*
