Possible optimization for very large tables #9

@andybak

Description

I'm going to implement this in my custom subclass of MVTManager, but I thought it was worth writing up the plan (and my eventual findings) here in case it's worth implementing upstream.

Problem:

I have a table with a few million rows, and returning a full result set when zoomed out is both slow and fairly useless. At low zoom levels you probably only need a random subset of the results, since most items will end up taking up less than a single pixel anyway.

The slowness comes from 2 sources:

  1. the cost of actually querying that many rows
  2. the amount of data you have to send down the wire

Solving (2) would be easy if it weren't for (1): any normal approach to filtering the results (e.g. ORDER BY random() with a LIMIT) still has to visit every row, so you pay the query cost regardless.

However, Postgres supports a very fast way to return a subset of rows, and the sampling happens before any WHERE clause is evaluated:

SELECT * FROM tablename TABLESAMPLE SYSTEM(0.1) WHERE ...;

I'm going to experiment with dynamically calculating the sample percentage based on zoom level. I have a hunch there's a simple linear formula, based on the total rows in the result set and the zoom level, that will return a visually similar set of tiles much more quickly. It's just a case of tweaking the slope of that formula.
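
To make that concrete, here's a rough sketch of what I have in mind for the subclass (the function names, constants, and the way the clause gets spliced into the SQL are all placeholders of my own, not anything in MVTManager):

TARGET_ROWS = 50_000      # rough budget of rows worth drawing per request (placeholder)
FULL_SAMPLE_ZOOM = 12     # at this zoom level and above, don't sample at all (placeholder)
SLOPE_PER_ZOOM = 12.0     # percentage points dropped per zoom level below FULL_SAMPLE_ZOOM
MIN_PERCENT = 0.05        # floor so huge tables still return something

def sample_percent(zoom, approx_total_rows):
    """Return a percentage for TABLESAMPLE SYSTEM, or None to skip sampling."""
    if zoom >= FULL_SAMPLE_ZOOM or approx_total_rows <= TARGET_ROWS:
        return None
    # The linear formula mentioned above: tweak SLOPE_PER_ZOOM until the
    # sampled tiles look visually similar to the unsampled ones.
    pct = 100.0 - SLOPE_PER_ZOOM * (FULL_SAMPLE_ZOOM - zoom)
    return max(min(pct, 100.0), MIN_PERCENT)

def from_clause(table, zoom, approx_total_rows):
    """TABLESAMPLE has to sit directly after the table name in FROM."""
    pct = sample_percent(zoom, approx_total_rows)
    if pct is None:
        return table
    return "{} TABLESAMPLE SYSTEM ({:.3f})".format(table, pct)

A query built as "SELECT ... FROM " + from_clause("mytable", zoom, approx_rows) + " WHERE ..." then degrades gracefully: at high zoom it's the normal full query, and the further you zoom out the smaller the sampled fraction gets.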

One more point worth noting: because COUNT(*) itself is slow on large tables, there's a trick that gets a fast approximate count:

SELECT reltuples AS ct FROM pg_class WHERE oid = 'tablename'::regclass;
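
For completeness, this is roughly how I'd expect to run that estimate from Django inside the manager (again just a sketch; approx_row_count is my own helper name):

from django.db import connection

def approx_row_count(table_name):
    """Fast row-count estimate from pg_class.reltuples.

    The number is only refreshed by VACUUM / ANALYZE / autovacuum, so on a
    table that has never been analyzed it can come back as 0 (or -1 on
    newer Postgres); in that case you'd want to fall back to a real COUNT(*).
    """
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT reltuples::bigint FROM pg_class WHERE oid = %s::regclass",
            [table_name],
        )
        return cursor.fetchone()[0]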
