The first open-vocabulary semantic atlas of New York City.
I ran a Vision Language Model on millions of images of New York City to create a searchable visual index of the city. This project moves beyond the rigid grid of addresses to map the invisible systems (culture, wealth, infrastructure, etc.) that actually define the urban experience.
Maps are blind. To Google or Apple, the city is a grid of addresses and listings. The rest of the world gets flattened. The map can tell you where a pharmacy is, but it cannot tell you where the fire escapes are, where the murals are, or where the street trees actually cast shade.
I set out to close this gap. By processing street view imagery with a Vision Language Model (VLM), I did not ask the computer for coordinates; I asked it to look. At scale, the model translated the visual noise of the street into structured data, turning pixels into patterns and moving from a map of location to a map of meaning.
Every ten years, New York City conducts a massive, manual census of its street trees. Thousands of volunteers walk every block with clipboards, counting and identifying every oak and maple. They do it because the digital map does not know the trees exist.
I wanted to explore: if a human can look at a street corner and see "gentrification," "neglect," or "culture," can a machine do the same? Can we automate the perception of urban biology?
Standard maps rely on manual entry into databases. I used a supercomputer to "watch" the city instead. By generating hundreds of descriptive tags for every street view image in New York City, I created a searchable visual index.
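The core data structure behind such an index is simple to sketch. Here is a minimal, self-contained illustration: per-image tags (which in the real pipeline would come from a VLM; the tags, coordinates, and structure below are invented for the example) folded into an inverted index that maps each tag to every location where it was seen.

```python
from collections import defaultdict

# Hypothetical per-image tags, standing in for VLM output on street
# view frames. Coordinates and tags are invented for illustration.
images = {
    (40.7158, -73.9970): ["scaffolding", "fire escape", "red brick"],
    (40.7306, -73.9866): ["street tree", "stoop", "fire escape"],
    (40.7580, -73.9855): ["glass tower", "billboard", "scaffolding"],
}

# Invert: tag -> list of coordinates where that tag was observed.
index = defaultdict(list)
for coords, tags in images.items():
    for tag in tags:
        index[tag].append(coords)

print(sorted(index["scaffolding"]))
# -> [(40.7158, -73.997), (40.758, -73.9855)]
```

Once inverted, any tag lookup becomes a constant-time query that returns a point cloud ready to plot on a map.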
When we query "Chinese," the AI identifies architectural patterns, signage density, and color palettes. It successfully delineates Chinatown without knowing a single zip code. When we query "Gothic," it reveals the 19th-century spine of the city (churches, universities, and older civic buildings) separating the historic from the modern glass towers.
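Open-vocabulary queries like "Gothic" do not need an exact tag match: in a CLIP-style joint text-image embedding space, the query text is embedded and every image is ranked by cosine similarity. The toy sketch below uses hand-made 3-dimensional embeddings (all vectors and names are invented) purely to show the ranking mechanism.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented 3-d embeddings; a real pipeline would use a joint
# text-image encoder producing vectors with hundreds of dimensions.
image_embeddings = {
    "pointed_arch_church": [0.9, 0.1, 0.0],
    "glass_tower":         [0.1, 0.9, 0.1],
    "brownstone":          [0.5, 0.2, 0.6],
}
query_gothic = [1.0, 0.0, 0.1]  # pretend text embedding for "Gothic"

ranked = sorted(image_embeddings,
                key=lambda name: cosine(image_embeddings[name], query_gothic),
                reverse=True)
print(ranked[0])  # -> pointed_arch_church
```

Because ranking happens in embedding space, the same index answers queries it was never explicitly tagged for, which is what makes the vocabulary "open."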
Querying "Gothic" reveals the historic spine of New York City, distinct from the glass of modern skyscrapers.
This was the most unexpected finding in the dataset. When we queried "East" vs "West," the model accurately lit up the respective sides of Manhattan.
Is it reading street signs? Shadows? The model somehow figured out which way it was facing just by analyzing the image data.
When you stop looking for addresses and start looking for patterns, the invisible becomes obvious.
An in-depth look at the query "scaffolding."
An in-depth look at the query "conditioning."
Mapping scaffolding is effectively a way to map change. It shows where money is being spent on renovation, and where Local Law 11 is forcing facade repairs. It captures the temporary city, frozen in 2025.
Consider the air conditioner. As central HVAC retrofits modernize the skyline, the window unit becomes a marker of building age and socioeconomic strata. A semantic query instantly lights up every wall sleeve and hanging unit across the boroughs, revealing the city's pace of renovation in real time.
I found over 3,000 unique descriptive tags. Here are some of the ones I thought were interesting (more on the Searchable City website):
However, this approach has inherent limitations. It is bound by the same physics as the human eye. A fire hydrant can vanish behind a double‑parked delivery truck. A basement entrance can dissolve into darkness.
And then there are the structural blind spots: what the camera never sees. Courtyards. Lobbies. Rooftops. The private city behind the street wall. Unlike ground-truth datasets provided by the city, a visual index carries the biases of its vantage point. It sees what the street view car sees: no more, no less. So treat this atlas as a hypothesis engine, not a verdict.
Imagine a city you can Ctrl+F.
Not a list of addresses: a living surface you can query. Search: “flood risk.” Search: “closed storefront.” Search: “stoops where people actually sit.”
We’re heading toward a continuous, searchable reality. As cameras multiply and refresh cycles compress, the map stops being a document and becomes a question you can ask at any moment. The interface is simple—a search bar—but what it returns is new: a city organized by meaning instead of coordinates.
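What "Ctrl+F for a city" could look like in code: a substring search over a tag-to-coordinates index. The index contents here are invented placeholders; the point is that the interface really is just a search function returning points on a map.

```python
# Hypothetical tag index (tag -> coordinates); values are invented.
index = {
    "flood risk sign":   [(40.60, -74.00)],
    "closed storefront": [(40.71, -73.99), (40.82, -73.94)],
    "stoop with people": [(40.68, -73.97)],
}

def ctrl_f(index, query):
    """Return every coordinate whose tag contains the query substring."""
    q = query.lower()
    hits = []
    for tag, coords in index.items():
        if q in tag.lower():
            hits.extend(coords)
    return sorted(set(hits))

print(ctrl_f(index, "storefront"))
# -> [(40.71, -73.99), (40.82, -73.94)]
```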
Imagery from Google Maps. © 2025 Google LLC, used under fair use.









