Commit 434e5dd

add hierarchy filtering for geographies
1 parent f8835ef commit 434e5dd

5 files changed: +424 -3 lines changed


api/models/Geography.py

Lines changed: 44 additions & 1 deletion
@@ -1,4 +1,4 @@
-from typing import Optional
+from typing import List, Optional
 
 from django.db import models
 
@@ -19,6 +19,49 @@ class Geography(models.Model):
     def __str__(self) -> str:
         return f"{self.name} ({self.type})"
 
+    def get_all_descendant_names(self) -> List[str]:
+        """
+        Get all descendant geography names including self.
+        This is used for hierarchical filtering - when a parent geography is selected,
+        all child geographies should also be included in the filter.
+
+        Returns:
+            List of geography names including self and all descendants
+        """
+        descendants = [self.name]
+        children = Geography.objects.filter(parent_id=self.id)
+
+        for child in children:
+            descendants.extend(child.get_all_descendant_names())  # type: ignore[attr-defined]
+
+        return descendants
+
+    @classmethod
+    def get_geography_names_with_descendants(
+        cls, geography_names: List[str]
+    ) -> List[str]:
+        """
+        Given a list of geography names, return all names including their descendants.
+        This is a helper method for filtering that expands parent geographies to include children.
+
+        Args:
+            geography_names: List of geography names to expand
+
+        Returns:
+            List of geography names including all descendants
+        """
+        all_names = set()
+
+        for name in geography_names:
+            try:
+                geography = cls.objects.get(name=name)
+                all_names.update(geography.get_all_descendant_names())
+            except cls.DoesNotExist:
+                # If geography doesn't exist, just add the name as-is
+                all_names.add(name)
+
+        return list(all_names)
+
     class Meta:
         db_table = "geography"
         verbose_name_plural = "geographies"

api/views/search_dataset.py

Lines changed: 8 additions & 1 deletion
@@ -9,7 +9,7 @@
 from rest_framework import serializers
 from rest_framework.permissions import AllowAny
 
-from api.models import Dataset, DatasetMetadata, Metadata
+from api.models import Dataset, DatasetMetadata, Geography, Metadata
 from api.utils.telemetry_utils import trace_method, track_metrics
 from api.views.paginated_elastic_view import PaginatedElasticSearchAPIView
 from search.documents import DatasetDocument
@@ -260,6 +260,13 @@ def add_filters(self, filters: Dict[str, str], search: Search) -> Search:
             raw_filter = filter + ".raw"
             if raw_filter in self.aggregations:
                 filter_values = filters[filter].split(",")
+
+                # For geographies, expand to include all descendant geographies
+                if filter == "geographies":
+                    filter_values = Geography.get_geography_names_with_descendants(
+                        filter_values
+                    )
+
                 search = search.filter("terms", **{raw_filter: filter_values})
             else:
                 search = search.filter("term", **{filter: filters[filter]})

api/views/search_usecase.py

Lines changed: 8 additions & 1 deletion
@@ -9,7 +9,7 @@
 from rest_framework import serializers
 from rest_framework.permissions import AllowAny
 
-from api.models import Metadata, UseCase, UseCaseMetadata
+from api.models import Geography, Metadata, UseCase, UseCaseMetadata
 from api.utils.telemetry_utils import trace_method, track_metrics
 from api.views.paginated_elastic_view import PaginatedElasticSearchAPIView
 from search.documents import UseCaseDocument
@@ -311,6 +311,13 @@ def add_filters(self, filters: Dict[str, str], search: Search) -> Search:
             raw_filter = filter + ".raw"
             if raw_filter in self.aggregations:
                 filter_values = filters[filter].split(",")
+
+                # For geographies, expand to include all descendant geographies
+                if filter == "geographies":
+                    filter_values = Geography.get_geography_names_with_descendants(
+                        filter_values
+                    )
+
                 search = search.filter("terms", **{raw_filter: filter_values})
             else:
                 search = search.filter("term", **{filter: filters[filter]})
Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
# Hierarchical Geography Filtering

## Overview

The DataSpace platform now supports hierarchical geography filtering for datasets and use cases. When a parent geography (e.g., a country) is selected as a filter, the search results will automatically include entities tagged with any child geographies (e.g., states/provinces within that country).

## How It Works

### Geography Hierarchy Structure

The geography data is organized hierarchically:
- **Region** (e.g., Asia Pacific)
- **Country** (e.g., India, Thailand, Indonesia)
- **State/Province** (e.g., Assam, Maharashtra, Bangkok)

### Filtering Behavior

**Before:** Selecting "India" as a geography filter would only return datasets/use cases explicitly tagged with "India".

**After:** Selecting "India" as a geography filter now returns:
- Datasets/use cases tagged with "India"
- Datasets/use cases tagged with any Indian state (Assam, Maharashtra, Karnataka, etc.)

This makes it easier to discover all relevant content for a geographic region without having to select each child geography individually.
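
The before/after behavior can be illustrated without a database. The sketch below is a minimal in-memory stand-in, assuming a hypothetical `CHILDREN` mapping (sample names taken from the examples in this document; the real hierarchy lives in the `Geography` table) and a hypothetical `expand` helper. It is not the model code itself.

```python
from typing import Dict, List, Set

# Hypothetical sample hierarchy: parent name -> child names.
# The real data is stored in the Geography table, not in a dict.
CHILDREN: Dict[str, List[str]] = {
    "Asia Pacific": ["India", "Thailand"],
    "India": ["Assam", "Maharashtra", "Karnataka"],
    "Thailand": ["Bangkok"],
}


def expand(names: List[str]) -> Set[str]:
    """Return the given names plus every descendant found in CHILDREN."""
    expanded: Set[str] = set()
    stack = list(names)
    while stack:
        name = stack.pop()
        if name in expanded:
            continue  # already handled, skip duplicates
        expanded.add(name)
        # Unknown names and leaf nodes simply have no children to add.
        stack.extend(CHILDREN.get(name, []))
    return expanded


before = {"India"}         # old behavior: exact match only
after = expand(["India"])  # new behavior: India plus its states
```

Names that are not in the mapping are kept unchanged, which mirrors the `DoesNotExist` fallback in `get_geography_names_with_descendants`.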

## Implementation Details

### Backend Changes

#### 1. Geography Model (`api/models/Geography.py`)

Added two new methods to the `Geography` model:

```python
def get_all_descendant_names(self) -> List[str]:
    """
    Get all descendant geography names including self.
    Returns a list of geography names including self and all descendants.
    """

@classmethod
def get_geography_names_with_descendants(cls, geography_names: List[str]) -> List[str]:
    """
    Given a list of geography names, return all names including their descendants.
    This is a helper method for filtering that expands parent geographies to include children.
    """
```

#### 2. Search Views

Updated both search views to use hierarchical filtering:

**`api/views/search_dataset.py`** - Dataset search
**`api/views/search_usecase.py`** - Use case search

In the `add_filters` method, when processing geography filters:

```python
# For geographies, expand to include all descendant geographies
if filter == "geographies":
    filter_values = Geography.get_geography_names_with_descendants(
        filter_values
    )
```

### API Usage

The API endpoints remain unchanged. The hierarchical filtering is applied automatically on the backend:

**Dataset Search:**
```
GET /api/search/dataset/?geographies=India
```

**Use Case Search:**
```
GET /api/search/usecase/?geographies=Thailand
```

**Multiple Geographies:**
```
GET /api/search/dataset/?geographies=India,Thailand
```
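
How the `geographies` parameter turns into filter values can be sketched with the standard library alone. Here `parse_qs` stands in for the framework's query-string handling, which is an assumption, and `filter_values_from_query` is a hypothetical helper; only the `split(",")` step mirrors `add_filters`.

```python
from typing import List
from urllib.parse import parse_qs


def filter_values_from_query(query_string: str) -> List[str]:
    """Extract raw geography filter values from a query string (sketch)."""
    params = parse_qs(query_string)
    # parse_qs returns a list per key; take the first value if present.
    raw = params.get("geographies", [""])[0]
    # Same splitting step as in add_filters: commas separate values.
    return raw.split(",") if raw else []


single = filter_values_from_query("geographies=India")
multiple = filter_values_from_query("geographies=India,Thailand")
# split(",") does not trim whitespace, so a space after the comma is kept.
spaced = filter_values_from_query("geographies=India,%20Thailand")
```

Because whitespace is not trimmed, a value such as `" Thailand"` would normally match no geography by name and would be passed through as-is by the `DoesNotExist` fallback, so clients should send the values without spaces.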

## Examples

### Example 1: Single Parent Geography

**Request:**
```
GET /api/search/dataset/?geographies=India
```

**Behavior:**
- Expands "India" to include all Indian states/UTs (Assam, Maharashtra, Karnataka, etc.)
- Returns datasets tagged with India OR any Indian state

### Example 2: Multiple Geographies

**Request:**
```
GET /api/search/dataset/?geographies=India,Thailand
```

**Behavior:**
- Expands "India" to include all Indian states
- Expands "Thailand" to include all Thai provinces
- Returns datasets tagged with any of these geographies

### Example 3: Child Geography

**Request:**
```
GET /api/search/usecase/?geographies=Assam
```

**Behavior:**
- "Assam" is a leaf node (no children)
- Returns use cases tagged with Assam only
- No expansion occurs

### Example 4: Mixed Parent and Child

**Request:**
```
GET /api/search/dataset/?geographies=India,Bangkok
```

**Behavior:**
- Expands "India" to include all Indian states
- "Bangkok" remains as-is (leaf node)
- Returns datasets tagged with India, any Indian state, or Bangkok

## Testing

### Unit Tests

Comprehensive unit tests are available in `tests/test_geography_hierarchy.py`:

```bash
pytest tests/test_geography_hierarchy.py -v
```

Tests cover:
- Parent geography expansion
- Leaf node behavior
- Multiple geography expansion
- Non-existent geography handling
- Mixed existing/non-existent geographies
- Multi-level hierarchy depth

### Manual Testing

You can test the hierarchy methods directly:

```python
from api.models import Geography

# Get a country
india = Geography.objects.get(name="India")

# Get all descendants (including itself)
descendants = india.get_all_descendant_names()
print(f"India has {len(descendants)} geographies")

# Expand multiple geographies
expanded = Geography.get_geography_names_with_descendants(["India", "Thailand"])
print(f"Expanded to {len(expanded)} geographies")
```

## Performance Considerations

### Recursive Query Optimization

The `get_all_descendant_names()` method uses recursion to traverse the geography tree, issuing one `filter(parent_id=...)` query for every geography it visits, including leaf nodes whose query returns no rows. Expanding a country with N states therefore costs N + 1 queries. For the current dataset (4 countries with ~200 states/provinces total), that is on the order of a couple hundred small queries for a full expansion, which is acceptable.

If the geography hierarchy grows significantly deeper or wider, consider:
1. Adding a caching layer for frequently accessed hierarchies
2. Using Django's `prefetch_related()` for bulk operations
3. Implementing a materialized path or nested set model
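
One further way to reduce the per-node queries is to walk the tree level by level, fetching the children of a whole level in a single lookup. The sketch below assumes a hypothetical `fetch_children` callable that takes a set of parent ids and returns `(id, name)` pairs; with the Django model it might wrap something like `Geography.objects.filter(parent_id__in=ids).values_list("id", "name")`, which is an assumption and not part of this commit. An in-memory table stands in for the database so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Row = Tuple[int, str]  # (geography id, geography name)


def descendant_names_by_level(
    root_id: int,
    root_name: str,
    fetch_children: Callable[[Set[int]], Iterable[Row]],
) -> List[str]:
    """Collect root plus descendants using one lookup per tree level."""
    names = [root_name]
    seen = {root_id}      # guards against cycles in parent links
    frontier = {root_id}  # ids whose children are fetched next
    while frontier:
        rows = list(fetch_children(frontier))
        frontier = set()
        for child_id, child_name in rows:
            if child_id in seen:
                continue
            seen.add(child_id)
            names.append(child_name)
            frontier.add(child_id)
    return names


# Hypothetical sample rows: id -> (name, parent_id).
TABLE: Dict[int, Tuple[str, Optional[int]]] = {
    1: ("Asia Pacific", None),
    2: ("India", 1),
    3: ("Thailand", 1),
    4: ("Assam", 2),
    5: ("Maharashtra", 2),
    6: ("Bangkok", 3),
}
CALLS: List[Set[int]] = []  # records each lookup to count "queries"


def fetch_children_in_memory(parent_ids: Set[int]) -> List[Row]:
    """Stand-in for a single parent_id IN (...) database query."""
    CALLS.append(set(parent_ids))
    return [
        (gid, name)
        for gid, (name, parent) in sorted(TABLE.items())
        if parent in parent_ids
    ]


names = descendant_names_by_level(1, "Asia Pacific", fetch_children_in_memory)
```

On this sample the walk performs three lookups (one per level, including the final empty one), whereas the per-node recursion would perform six, one for each geography visited.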

### Elasticsearch Impact

The geography expansion happens before the Elasticsearch query is executed, so:
- No changes to Elasticsearch index structure required
- Query performance remains similar (using `terms` filter)
- The expanded list of geography names is passed directly to Elasticsearch
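
For reference, the filter clause that reaches Elasticsearch has roughly the shape below. This is a sketch using plain dictionaries rather than the search client used by the views; the field name `geographies.raw` follows from `raw_filter = filter + ".raw"` in `add_filters`, while `build_terms_clause` and the sample names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_terms_clause(
    field: str, values: Iterable[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Build a terms filter clause as a plain dict (illustrative only)."""
    # Deduplicate and sort so the clause is stable across calls.
    return {"terms": {field: sorted(set(values))}}


# Sample expanded list, as if "India" had been expanded to two states.
clause = build_terms_clause(
    "geographies.raw", ["India", "Assam", "Maharashtra", "India"]
)
```

A `terms` clause matches documents whose field contains any of the listed values, so the only change on the Elasticsearch side is a longer value list.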

## Backward Compatibility

**Fully backward compatible**

- No API changes required
- Frontend code continues to work without modifications
- Existing geography filters automatically benefit from hierarchical filtering
- No database migrations needed

## Future Enhancements

Potential improvements for future iterations:

1. **Configurable Hierarchy Depth**: Allow API consumers to specify how many levels to expand
2. **Parent Geography Aggregations**: Show parent geographies in faceted search results
3. **Geography Path Display**: Show full hierarchy path (e.g., "Asia Pacific > India > Assam")
4. **Caching**: Cache geography hierarchies for improved performance
5. **Reverse Lookup**: Find parent geographies for a given child
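
As an illustration of the path display and reverse lookup ideas, the sketch below walks parent links upward. It assumes a hypothetical in-memory `PARENT` mapping from child name to parent name; the real model exposes the link through `parent_id`, and nothing like `ancestors` or `hierarchy_path` exists in this commit.

```python
from typing import Dict, List, Optional

# Hypothetical sample data: geography name -> parent name (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "Asia Pacific": None,
    "India": "Asia Pacific",
    "Assam": "India",
}


def ancestors(name: str) -> List[str]:
    """Return the parents of a geography, nearest first (reverse lookup)."""
    chain: List[str] = []
    seen = {name}  # guards against cycles in parent links
    current = PARENT.get(name)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = PARENT.get(current)
    return chain


def hierarchy_path(name: str, separator: str = " > ") -> str:
    """Format the full path from the root down to the given geography."""
    return separator.join(reversed([name] + ancestors(name)))


path = hierarchy_path("Assam")
```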

## Related Files

- `api/models/Geography.py` - Geography model with hierarchy methods
- `api/views/search_dataset.py` - Dataset search with hierarchical filtering
- `api/views/search_usecase.py` - Use case search with hierarchical filtering
- `tests/test_geography_hierarchy.py` - Unit tests for hierarchy functionality
- `api/management/commands/populate_geographies.py` - Geography data population

## Support

For questions or issues related to hierarchical geography filtering, please:
1. Check the unit tests for usage examples
2. Review the Geography model implementation
3. Test with the provided test script
4. Contact the development team
