Skip to content

Latest commit

 

History

History
363 lines (287 loc) · 11.5 KB

File metadata and controls

363 lines (287 loc) · 11.5 KB

Plan: Categorical Column Filtering

Overview

Add a dynamic filtering system that detects categorical columns from loaded parquet data and provides dropdown/multi-select inputs to filter the data. Filters will update the map and all charts, grouping by and summing values for each category. Users can select a subset of values (e.g., just "one", "three").


Architecture

Data Flow

Parquet File (loaded)
    ↓
DuckDB-WASM detects columns & types
    ↓
Identify categorical columns (string/enum types with low cardinality)
    ↓
Query distinct values for each categorical column
    ↓
Render dropdown/multi-select UI components
    ↓
User selects values (e.g., ["one", "three"])
    ↓
Build SQL predicate (WHERE column IN ('one', 'three'))
    ↓
Combine with existing Mosaic brush selection
    ↓
Update Map + Charts with filtered data

Implementation Steps

1. Create Categorical Filter Slice (src/CategoricalFilterSlice.ts)

A new Zustand slice to manage categorical filter state:

// State structure
type CategoricalFilterState = {
  categoricalFilters: {
    // Discovered categorical columns with their unique values
    columns: Array<{
      name: string;
      values: string[];       // All possible values
      selected: string[];     // Currently selected values (empty = all)
    }>;
    // Actions
    setColumns: (columns: Array<{name: string; values: string[]}>) => void;
    setSelectedValues: (columnName: string, values: string[]) => void;
    clearFilter: (columnName: string) => void;
    clearAllFilters: () => void;
  };
};

Config schema (Zod):

export const CategoricalFilterConfig = z.object({
  selectedFilters: z.record(z.string(), z.array(z.string())).default({}),
});

2. Create Column Detection Utility (src/utils/categoricalColumns.ts)

Utility functions to detect and query categorical columns:

// Detect categorical columns from table schema
async function detectCategoricalColumns(
  connector: DuckDbConnector,
  tableName: string,
  maxCardinality: number = 50  // Columns with > 50 unique values are not categorical
): Promise<string[]>

// Get distinct values for a categorical column
async function getDistinctValues(
  connector: DuckDbConnector,
  tableName: string,
  columnName: string
): Promise<string[]>

Detection logic:

  1. Query table schema: DESCRIBE tableName
  2. Filter for string/varchar columns
  3. For each candidate, count distinct values: SELECT COUNT(DISTINCT col) FROM table
  4. Include only columns with cardinality ≤ maxCardinality

3. Create Filter UI Components

3.1 Single Column Filter (src/components/filters/CategoricalFilter.tsx)

A reusable dropdown/multi-select component:

interface CategoricalFilterProps {
  columnName: string;
  values: string[];           // All possible values
  selected: string[];         // Currently selected
  onChange: (selected: string[]) => void;
}

UI options (choose one):

Option A: Multi-select Dropdown (Recommended)

  • Uses @sqlrooms/ui Select/Combobox with multi-select
  • Shows selected count as badge
  • Searchable for columns with many values
  • "Select All" / "Clear" buttons

Option B: Checkbox List

  • Vertical list of checkboxes
  • Good for columns with few values (< 10)
  • Takes more vertical space

Option C: Chip/Tag Selection

  • Horizontal chips that toggle on click
  • Visual and compact
  • Best for 5-15 values

3.2 Filters Container (src/components/filters/CategoricalFiltersPanel.tsx)

Container that renders all detected categorical filters:

export function CategoricalFiltersPanel() {
  const columns = useRoomStore(state => state.categoricalFilters.columns);
  const setSelectedValues = useRoomStore(state => state.categoricalFilters.setSelectedValues);

  return (
    <AccordionItem value="categorical-filters">
      <AccordionTrigger>Filter by Category</AccordionTrigger>
      <AccordionContent>
        {columns.map(col => (
          <CategoricalFilter
            key={col.name}
            columnName={col.name}
            values={col.values}
            selected={col.selected}
            onChange={(values) => setSelectedValues(col.name, values)}
          />
        ))}
      </AccordionContent>
    </AccordionItem>
  );
}

4. Integrate with Mosaic Selection System

4.1 Build Combined Filter Predicate

Create a utility to combine categorical filters with the brush selection:

// src/utils/buildFilterPredicate.ts
import { sql } from '@sqlrooms/mosaic';

export function buildCategoricalPredicate(
  filters: Record<string, string[]>
): string | null {
  const predicates = Object.entries(filters)
    .filter(([_, values]) => values.length > 0)
    .map(([column, values]) => {
      const escaped = values.map(v => `'${v.replace(/'/g, "''")}'`).join(', ');
      return `"${column}" IN (${escaped})`;
    });

  return predicates.length > 0 ? predicates.join(' AND ') : null;
}

4.2 Update MapView Query

Modify src/components/map/MapView.tsx to include categorical filters:

const categoricalPredicate = useRoomStore(state =>
  buildCategoricalPredicate(state.categoricalFilters.selectedFilters)
);

const { data } = useMosaicClient<Table>({
  selectionName: 'brush',
  query: (filter: any) => {
    let query = Query.from('earthquakes')
      .select('Latitude', 'Longitude', 'Magnitude', 'Depth', 'DateTime');

    if (categoricalPredicate) {
      query = query.where(sql`${categoricalPredicate}`);
    }
    return query.where(filter);
  },
});

4.3 Update Chart Plots

Modify src/components/filters/filterPlots.ts to accept categorical filter:

export const createMagPlot = (brush: Selection, categoricalFilter?: string) => {
  const source = categoricalFilter
    ? vg.from('earthquakes', { where: categoricalFilter })
    : vg.from('earthquakes');

  return vg.plot(
    vg.rectY(source, { /* ... */ }),
    // ...
  );
};

5. Initialize Filters on Data Load

5.1 Hook for Auto-Detection (src/hooks/useCategoricalColumns.ts)

export function useCategoricalColumns(tableName: string) {
  const connector = useRoomStore(state => state.db.connector);
  const setColumns = useRoomStore(state => state.categoricalFilters.setColumns);
  const isTableReady = useRoomStore(state =>
    state.db.tables.find(t => t.tableName === tableName)
  );

  useEffect(() => {
    if (!isTableReady || !connector) return;

    async function detect() {
      const columns = await detectCategoricalColumns(connector, tableName);
      const columnsWithValues = await Promise.all(
        columns.map(async name => ({
          name,
          values: await getDistinctValues(connector, tableName, name),
          selected: [],
        }))
      );
      setColumns(columnsWithValues);
    }

    detect();
  }, [isTableReady, connector, tableName]);
}

5.2 Integrate in MainView or FiltersPanel

Call the hook when the component mounts:

// In FiltersPanel.tsx
useCategoricalColumns('earthquakes');

6. Update Store Configuration

Modify src/store.ts to include the new slice:

import { createCategoricalFilterSlice, CategoricalFilterSliceState } from './CategoricalFilterSlice';

export type RoomState = RoomShellSliceState &
  SqlEditorSliceState &
  MosaicSliceState &
  MapSettingsSliceState &
  CategoricalFilterSliceState;  // Add new slice

// In store creation:
...createCategoricalFilterSlice()(set, get, store),

File Changes Summary

File Action Description
src/CategoricalFilterSlice.ts Create New Zustand slice for filter state
src/utils/categoricalColumns.ts Create Column detection and value queries
src/utils/buildFilterPredicate.ts Create SQL predicate builder
src/hooks/useCategoricalColumns.ts Create Auto-detection hook
src/components/filters/CategoricalFilter.tsx Create Single filter dropdown component
src/components/filters/CategoricalFiltersPanel.tsx Create Container for all categorical filters
src/components/filters/FiltersPanel.tsx Modify Add CategoricalFiltersPanel to accordion
src/components/filters/filterPlots.ts Modify Accept categorical filter parameter
src/components/map/MapView.tsx Modify Include categorical filter in query
src/store.ts Modify Add CategoricalFilterSlice

UI Mockup

┌─────────────────────────────────────────┐
│ Filters Panel                           │
├─────────────────────────────────────────┤
│ ▼ Filter by Category                    │
│   ┌─────────────────────────────────┐   │
│   │ Region          [Select...]  ▼  │   │
│   │                                 │   │
│   │ ☑ California                    │   │
│   │ ☐ Nevada                        │   │
│   │ ☑ Oregon                        │   │
│   │ ☐ Washington                    │   │
│   │                                 │   │
│   │ [Select All] [Clear]            │   │
│   └─────────────────────────────────┘   │
│                                         │
│   ┌─────────────────────────────────┐   │
│   │ Source Type     [2 selected] ▼  │   │
│   └─────────────────────────────────┘   │
├─────────────────────────────────────────┤
│ ▼ Distribution by Magnitude             │
│   [Histogram chart...]                  │
├─────────────────────────────────────────┤
│ ▼ Temporal Frequency                    │
│   [Timeline chart...]                   │
└─────────────────────────────────────────┘

Edge Cases & Considerations

  1. Empty Selection: When no values are selected, show all data (no filter applied)
  2. Large Cardinality: Skip columns with >50 unique values (not truly categorical)
  3. Null Values: Handle NULL values in categorical columns - display as "(Empty)" or similar
  4. Special Characters: Escape quotes and special chars in SQL predicates
  5. Performance: Cache distinct values, don't re-query on every render
  6. Loading State: Show skeleton/spinner while detecting columns
  7. No Categorical Columns: Hide the filter section if no categorical columns detected

Testing Checklist

  • Categorical columns auto-detected on data load
  • Dropdown shows all unique values for each column
  • Selecting values filters the map correctly
  • Selecting values filters all charts correctly
  • Multi-select works (e.g., select "one" and "three")
  • "Select All" selects all values
  • "Clear" removes all selections (shows all data)
  • Filters persist across page refresh (if persistence enabled)
  • Combines correctly with brush selection on charts
  • Performance acceptable with large datasets
  • Handles columns with NULL values