| 1 | +# Hierarchical Geography Filtering |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The DataSpace platform now supports hierarchical geography filtering for datasets and use cases. When a parent geography (e.g., a country) is selected as a filter, search results automatically include entities tagged with any of its child geographies (e.g., states/provinces within that country). |
| 6 | + |
| 7 | +## How It Works |
| 8 | + |
| 9 | +### Geography Hierarchy Structure |
| 10 | + |
| 11 | +The geography data is organized hierarchically: |
| 12 | +- **Region** (e.g., Asia Pacific) |
| 13 | + - **Country** (e.g., India, Thailand, Indonesia) |
| 14 | + - **State/Province** (e.g., Assam, Maharashtra, Bangkok) |
| 15 | + |
| 16 | +### Filtering Behavior |
| 17 | + |
| 18 | +**Before:** Selecting "India" as a geography filter would only return datasets/use cases explicitly tagged with "India". |
| 19 | + |
| 20 | +**After:** Selecting "India" as a geography filter now returns: |
| 21 | +- Datasets/use cases tagged with "India" |
| 22 | +- Datasets/use cases tagged with any Indian state (Assam, Maharashtra, Karnataka, etc.) |
| 23 | + |
| 24 | +This makes it easier to discover all relevant content for a geographic region without having to select each child geography individually. |
| 25 | + |
| 26 | +## Implementation Details |
| 27 | + |
| 28 | +### Backend Changes |
| 29 | + |
| 30 | +#### 1. Geography Model (`api/models/Geography.py`) |
| 31 | + |
| 32 | +Added two new methods to the `Geography` model: |
| 33 | + |
| 34 | +```python |
| 35 | +def get_all_descendant_names(self) -> List[str]: |
| 36 | + """ |
| 37 | + Get all descendant geography names including self. |
| 38 | + Returns a list of geography names including self and all descendants. |
| 39 | + """ |
| 40 | + |
| 41 | +@classmethod |
| 42 | +def get_geography_names_with_descendants(cls, geography_names: List[str]) -> List[str]: |
| 43 | + """ |
| 44 | + Given a list of geography names, return all names including their descendants. |
| 45 | + This is a helper method for filtering that expands parent geographies to include children. |
| 46 | + """ |
| 47 | +``` |
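The method bodies are not shown in this change. As a minimal sketch of the behaviour the docstrings describe, the plain-Python stand-in below walks an in-memory tree recursively. The `GeoNode` class, its `children` field, the `expand_names` helper, and the choice to pass unknown names through unchanged are all assumptions made for illustration, not the actual Django model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GeoNode:
    """Hypothetical in-memory stand-in for a Geography row."""
    name: str
    children: List["GeoNode"] = field(default_factory=list)

    def get_all_descendant_names(self) -> List[str]:
        # Start with this node, then recurse into each child.
        names = [self.name]
        for child in self.children:
            names.extend(child.get_all_descendant_names())
        return names


def expand_names(index: Dict[str, GeoNode], geography_names: List[str]) -> List[str]:
    """Expand each requested name to itself plus its descendants, without duplicates."""
    expanded: List[str] = []
    seen = set()
    for name in geography_names:
        node = index.get(name)
        # Assumption: names missing from the hierarchy are passed through unchanged.
        candidates = node.get_all_descendant_names() if node else [name]
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


assam = GeoNode("Assam")
maharashtra = GeoNode("Maharashtra")
india = GeoNode("India", [assam, maharashtra])
index = {node.name: node for node in (india, assam, maharashtra)}

print(expand_names(index, ["India", "Assam"]))  # ['India', 'Assam', 'Maharashtra']
```

De-duplicating while preserving order keeps the expanded list stable when a request names both a parent and one of its children.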
| 48 | + |
| 49 | +#### 2. Search Views |
| 50 | + |
| 51 | +Updated both search views to use hierarchical filtering: |
| 52 | + |
| 53 | +- **`api/views/search_dataset.py`** - Dataset search |
| 54 | +- **`api/views/search_usecase.py`** - Use case search |
| 55 | + |
| 56 | +In the `add_filters` method, when processing geography filters: |
| 57 | + |
| 58 | +```python |
| 59 | +# For geographies, expand to include all descendant geographies |
| 60 | +if filter == "geographies": |
| 61 | + filter_values = Geography.get_geography_names_with_descendants( |
| 62 | + filter_values |
| 63 | + ) |
| 64 | +``` |
| 65 | + |
| 66 | +### API Usage |
| 67 | + |
| 68 | +The API endpoints remain unchanged. The hierarchical filtering is applied automatically on the backend: |
| 69 | + |
| 70 | +**Dataset Search:** |
| 71 | +``` |
| 72 | +GET /api/search/dataset/?geographies=India |
| 73 | +``` |
| 74 | + |
| 75 | +**Use Case Search:** |
| 76 | +``` |
| 77 | +GET /api/search/usecase/?geographies=Thailand |
| 78 | +``` |
| 79 | + |
| 80 | +**Multiple Geographies:** |
| 81 | +``` |
| 82 | +GET /api/search/dataset/?geographies=India,Thailand |
| 83 | +``` |
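For clients that assemble these requests in code, the sketch below shows one way to build the comma-separated `geographies` parameter with the standard library. The `build_search_url` helper and the `https://dataspace.example` host are hypothetical; only the endpoint paths and the parameter name come from the examples above.

```python
from typing import List
from urllib.parse import urlencode


def build_search_url(base_url: str, entity: str, geographies: List[str]) -> str:
    """Build a search URL with a comma-separated `geographies` parameter."""
    # safe="," leaves the separators unescaped; spaces in names become "+".
    query = urlencode({"geographies": ",".join(geographies)}, safe=",")
    return f"{base_url}/api/search/{entity}/?{query}"


# Hypothetical host; replace with the real deployment URL.
print(build_search_url("https://dataspace.example", "dataset", ["India", "Thailand"]))
# https://dataspace.example/api/search/dataset/?geographies=India,Thailand
```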
| 84 | + |
| 85 | +## Examples |
| 86 | + |
| 87 | +### Example 1: Single Parent Geography |
| 88 | + |
| 89 | +**Request:** |
| 90 | +``` |
| 91 | +GET /api/search/dataset/?geographies=India |
| 92 | +``` |
| 93 | + |
| 94 | +**Behavior:** |
| 95 | +- Expands "India" to include all Indian states and union territories (Assam, Maharashtra, Karnataka, etc.) |
| 96 | +- Returns datasets tagged with India OR any Indian state |
| 97 | + |
| 98 | +### Example 2: Multiple Geographies |
| 99 | + |
| 100 | +**Request:** |
| 101 | +``` |
| 102 | +GET /api/search/dataset/?geographies=India,Thailand |
| 103 | +``` |
| 104 | + |
| 105 | +**Behavior:** |
| 106 | +- Expands "India" to include all Indian states |
| 107 | +- Expands "Thailand" to include all Thai provinces |
| 108 | +- Returns datasets tagged with any of these geographies |
| 109 | + |
| 110 | +### Example 3: Child Geography |
| 111 | + |
| 112 | +**Request:** |
| 113 | +``` |
| 114 | +GET /api/search/usecase/?geographies=Assam |
| 115 | +``` |
| 116 | + |
| 117 | +**Behavior:** |
| 118 | +- "Assam" is a leaf node (no children) |
| 119 | +- Returns use cases tagged with Assam only |
| 120 | +- No expansion occurs |
| 121 | + |
| 122 | +### Example 4: Mixed Parent and Child |
| 123 | + |
| 124 | +**Request:** |
| 125 | +``` |
| 126 | +GET /api/search/dataset/?geographies=India,Bangkok |
| 127 | +``` |
| 128 | + |
| 129 | +**Behavior:** |
| 130 | +- Expands "India" to include all Indian states |
| 131 | +- "Bangkok" remains as-is (leaf node) |
| 132 | +- Returns datasets tagged with India, any Indian state, or Bangkok |
| 133 | + |
| 134 | +## Testing |
| 135 | + |
| 136 | +### Unit Tests |
| 137 | + |
| 138 | +Comprehensive unit tests are available in `tests/test_geography_hierarchy.py`: |
| 139 | + |
| 140 | +```bash |
| 141 | +pytest tests/test_geography_hierarchy.py -v |
| 142 | +``` |
| 143 | + |
| 144 | +Tests cover: |
| 145 | +- Parent geography expansion |
| 146 | +- Leaf node behavior |
| 147 | +- Multiple geography expansion |
| 148 | +- Non-existent geography handling |
| 149 | +- Mixed existing/non-existent geographies |
| 150 | +- Multi-level hierarchy depth |
| 151 | + |
| 152 | +### Manual Testing |
| 153 | + |
| 154 | +You can test the hierarchy methods directly: |
| 155 | + |
| 156 | +```python |
| 157 | +from api.models import Geography |
| 158 | + |
| 159 | +# Get a country |
| 160 | +india = Geography.objects.get(name="India") |
| 161 | + |
| 162 | +# Get all descendants (including itself) |
| 163 | +descendants = india.get_all_descendant_names() |
| 164 | +print(f"India has {len(descendants)} geographies") |
| 165 | + |
| 166 | +# Expand multiple geographies |
| 167 | +expanded = Geography.get_geography_names_with_descendants(["India", "Thailand"]) |
| 168 | +print(f"Expanded to {len(expanded)} geographies") |
| 169 | +``` |
| 170 | + |
| 171 | +## Performance Considerations |
| 172 | + |
| 173 | +### Recursive Query Optimization |
| 174 | + |
| 175 | +The `get_all_descendant_names()` method uses recursion to traverse the geography tree. For the current dataset (4 countries with ~200 states/provinces total), this is performant. |
| 176 | + |
| 177 | +If the geography hierarchy grows significantly deeper or wider, consider: |
| 178 | +1. Adding a caching layer for frequently accessed hierarchies |
| 179 | +2. Using Django's `prefetch_related()` for bulk operations |
| 180 | +3. Implementing a materialized path or nested set model |
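To illustrate the third option, here is a minimal materialized-path sketch in plain Python. Each geography stores its full path from the root, so finding descendants becomes a prefix match instead of a recursive walk. The `PATHS` table, the slug format, and the `descendant_names` helper are hypothetical and not part of the current model.

```python
from typing import Dict, List

# Hypothetical materialized paths: each name maps to its full path from the root.
PATHS: Dict[str, str] = {
    "Asia Pacific": "/asia-pacific/",
    "India": "/asia-pacific/india/",
    "Assam": "/asia-pacific/india/assam/",
    "Maharashtra": "/asia-pacific/india/maharashtra/",
    "Thailand": "/asia-pacific/thailand/",
    "Bangkok": "/asia-pacific/thailand/bangkok/",
}


def descendant_names(name: str) -> List[str]:
    """Return `name` and every geography whose path starts with its path."""
    prefix = PATHS.get(name)
    if prefix is None:
        return []
    # A single prefix scan replaces the recursive tree walk.
    return sorted(n for n, path in PATHS.items() if path.startswith(prefix))


print(descendant_names("India"))  # ['Assam', 'India', 'Maharashtra']
```

The trailing slash on every path keeps one slug from matching another that merely begins with the same characters. In a database-backed version, the same lookup would typically be a single prefix query on an indexed path column.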
| 181 | + |
| 182 | +### Elasticsearch Impact |
| 183 | + |
| 184 | +The geography expansion happens before the Elasticsearch query is executed, so: |
| 185 | +- No changes to Elasticsearch index structure required |
| 186 | +- Query performance remains similar: the query still uses a `terms` filter, only with a longer list of values |
| 187 | +- The expanded list of geography names is passed directly to Elasticsearch |
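As a rough sketch of what this looks like on the query side, the helper below wraps an already expanded list of names in an Elasticsearch `terms` filter clause. The `build_geography_filter` function, the surrounding `bool`/`filter` structure, and the `geographies` field name are assumptions; the real views may assemble the query differently.

```python
from typing import Any, Dict, List


def build_geography_filter(expanded_names: List[str], field: str = "geographies") -> Dict[str, Any]:
    """Build an Elasticsearch `terms` filter clause for the expanded names."""
    # A `terms` query matches documents whose field contains any listed value.
    return {"bool": {"filter": [{"terms": {field: expanded_names}}]}}


# Names as they might look after expanding "India" (illustrative subset).
query = build_geography_filter(["India", "Assam", "Maharashtra"])
print(query)
```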
| 188 | + |
| 189 | +## Backward Compatibility |
| 190 | + |
| 191 | +✅ **Fully backward compatible** |
| 192 | + |
| 193 | +- No API changes required |
| 194 | +- Frontend code continues to work without modifications |
| 195 | +- Existing geography filters automatically benefit from hierarchical filtering |
| 196 | +- No database migrations needed |
| 197 | + |
| 198 | +## Future Enhancements |
| 199 | + |
| 200 | +Potential improvements for future iterations: |
| 201 | + |
| 202 | +1. **Configurable Hierarchy Depth**: Allow API consumers to specify how many levels to expand |
| 203 | +2. **Parent Geography Aggregations**: Show parent geographies in faceted search results |
| 204 | +3. **Geography Path Display**: Show full hierarchy path (e.g., "Asia Pacific > India > Assam") |
| 205 | +4. **Caching**: Cache geography hierarchies for improved performance |
| 206 | +5. **Reverse Lookup**: Find parent geographies for a given child |
| 207 | + |
| 208 | +## Related Files |
| 209 | + |
| 210 | +- `api/models/Geography.py` - Geography model with hierarchy methods |
| 211 | +- `api/views/search_dataset.py` - Dataset search with hierarchical filtering |
| 212 | +- `api/views/search_usecase.py` - Use case search with hierarchical filtering |
| 213 | +- `tests/test_geography_hierarchy.py` - Unit tests for hierarchy functionality |
| 214 | +- `api/management/commands/populate_geographies.py` - Geography data population |
| 215 | + |
| 216 | +## Support |
| 217 | + |
| 218 | +For questions or issues related to hierarchical geography filtering, please: |
| 219 | +1. Check the unit tests for usage examples |
| 220 | +2. Review the Geography model implementation |
| 221 | +3. Test with the provided test script |
| 222 | +4. Contact the development team |