
Searchable City


The first open-vocabulary semantic atlas of New York City.

I ran a Vision Language Model on millions of images of New York City to create a searchable visual index of the city. This project moves beyond the rigid grid of addresses to map the invisible systems (culture, wealth, infrastructure, etc.) that actually define the urban experience.


Details

Maps are blind. To Google or Apple, the city is a grid of addresses and listings. The rest of the world gets flattened. The map can tell you where a pharmacy is, but it cannot tell you where the fire escapes are, where the murals are, or where the street trees actually cast shade.

I sought to address this mapping gap. By processing street view imagery with a Vision Language Model (VLM), I did not ask the computer for coordinates; I asked it to look. At scale, this translated the visual noise of the street into structured data, turning pixels into patterns and moving from a map of location to a map of meaning.

Inspiration

Every ten years, New York City conducts a massive, manual census of its street trees. Thousands of volunteers walk every block with clipboards, counting and identifying every oak and maple. They do it because the digital map does not know the trees exist.

I wanted to explore: if a human can look at a street corner and see "gentrification," "neglect," or "culture," can a machine do the same? Can we automate the perception of urban biology?

Overview

Standard maps rely on manual entry into databases. I used a supercomputer to "watch" the city instead. By generating hundreds of descriptive tags for every street view image in New York City, I created a searchable visual index.
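At its core, a searchable visual index like this can be sketched as an inverted map from descriptive tag to image locations. The following is a minimal, hypothetical illustration; the coordinates, tags, and `search` helper are invented for the sketch and are not the project's actual code:

```python
from collections import defaultdict

# Hypothetical per-image output from the VLM tagging pass.
# Keys are (lat, lon) panorama locations; values are descriptive tags.
image_tags = {
    (40.7158, -73.9970): ["red lanterns", "dense signage", "fire escape"],
    (40.7308, -73.9973): ["gothic arch", "church", "street tree"],
    (40.7484, -73.9857): ["scaffolding", "glass tower", "taxi"],
}

# Invert tags -> locations so any tag becomes a map query.
index = defaultdict(set)
for location, tags in image_tags.items():
    for tag in tags:
        index[tag].add(location)

def search(term):
    """Return every location whose tags contain the query term."""
    term = term.lower()
    return sorted(
        {loc for tag, locs in index.items() if term in tag.lower() for loc in locs}
    )

print(search("gothic"))
```

Plotting the locations returned by such a query is what produces the heatmaps shown below.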

When we query "Chinese," it successfully delineates Chinatown without knowing a single zip code.

When we query "Chinese," the AI identifies architectural patterns, signage density, and color palettes. It successfully delineates Chinatown without knowing a single zip code. When we query "Gothic," it reveals the 19th-century spine of the city (churches, universities, and older civic buildings) separating the historic from the modern glass towers.

Querying "Gothic" reveals the historic spine of New York City, distinct from the glass of modern skyscrapers.
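Open-vocabulary queries need not match tags exactly: a query like "Gothic" can be expanded to nearby tags in the vocabulary. Here is a minimal sketch using stdlib fuzzy matching over a hypothetical tag list (a real system would more likely use embedding similarity; the vocabulary and `expand_query` helper are illustrative):

```python
import difflib

# Hypothetical slice of the tag vocabulary produced by the VLM.
vocabulary = [
    "gothic arch", "gothic revival facade", "glass curtain wall",
    "red lantern", "hanzi signage", "scaffolding", "fire escape",
]

def expand_query(query, cutoff=0.4):
    """Map a free-text query to the closest tags in the vocabulary."""
    query = query.lower()
    # Exact substring hits first, then fuzzy neighbours (handles typos).
    hits = [t for t in vocabulary if query in t]
    fuzzy = difflib.get_close_matches(query, vocabulary, n=5, cutoff=cutoff)
    return list(dict.fromkeys(hits + fuzzy))  # dedupe, keep order

print(expand_query("gothic"))
```

The fuzzy step also tolerates misspelled queries, so "scafolding" still finds the scaffolding tag.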

The Ghost in the Machine

This was the most unexpected finding in the dataset. When we queried "East" vs "West," the model accurately lit up the respective sides of Manhattan.


Is it reading street signs? Shadows? The model somehow figured out which way it was facing just by analyzing the image data.

The Decoded City

When you stop looking for addresses and start looking for patterns, the invisible becomes obvious.

An in-depth look at the query "scaffolding."
An in-depth look at the query "conditioning."

Perpetual Construction

Mapping scaffolding is effectively a way to map change. It shows where money is being spent on renovation, and where Local Law 11 is forcing facade repairs. It captures the temporary city, frozen in 2025.

The Air Conditioner

Consider the air conditioner. As modern HVAC systems retrofit the skyline, the window unit becomes a marker of building age and socioeconomic stratum. A semantic query instantly lights up every wall sleeve or hanging unit across the boroughs, revealing the city's pace of renovation in real time.

The Visual Language

I found over 3,000 unique descriptive tags. Here are some of the ones I thought were interesting (more on the Searchable City website):

| Query | Observation |
| --- | --- |
| BAGEL | The breakfast of champions. Note the complete absence in industrial zones. |
| BEER | Identifies bars, advertisements, and bodegas with neon signage. |
| GARBAGE | Correlates with commercial density and foot traffic. |
| GRAFFITI | The unauthorized art layer of New York City. |
| FLOWER | The city's landscape punctuated by seasonal blooms. |
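A vocabulary like this is surfaced simply by counting how often each tag appears across the corpus. A minimal sketch over made-up tag lists (the real index would aggregate millions of images):

```python
from collections import Counter

# Hypothetical tag lists from three street view images.
tag_lists = [
    ["bagel", "awning", "fire escape"],
    ["beer", "neon sign", "awning"],
    ["graffiti", "awning", "garbage bag"],
]

# Count every tag occurrence across the corpus.
counts = Counter(tag for tags in tag_lists for tag in tags)

print(len(counts))            # size of the unique-tag vocabulary
print(counts.most_common(1))  # most frequent tag with its count
```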

The Blind Spots

However, this approach has inherent limitations. It is bound by the same physics as the human eye. A fire hydrant can vanish behind a double‑parked delivery truck. A basement entrance can dissolve into darkness.


And then there are the structural blind spots: what the camera never sees. Courtyards. Lobbies. Rooftops. The private city behind the street wall. Unlike ground-truth datasets provided by the city, a visual index carries the biases of its vantage point. It sees what the street view car sees: no more, no less. So treat this atlas as a hypothesis engine, not a verdict.

The Searchable Future

Imagine a city you can Ctrl+F.

Not a list of addresses: a living surface you can query. Search: “flood risk.” Search: “closed storefront.” Search: “stoops where people actually sit.”

We’re heading toward a continuous, searchable reality. As cameras multiply and refresh cycles compress, the map stops being a document and becomes a question you can ask at any moment. The interface is simple—a search bar—but what it returns is new: a city organized by meaning instead of coordinates.


Special Thanks

Imagery from Google Maps. © 2025 Google LLC, used under fair use.
