4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,10 @@ and this project uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html)

- Obstore and VirtualiZarr should not be required([#1097](https://github.com/nsidc/earthaccess/issues/1097))([@betolink](https://github.com/betolink))

### Added

- Multi-feature search support (`multi_bounding_box`, `multipolygon`, `multipoint`, `multicircle`, `multiline`) in a single API call, following the [CMR](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#g-polygon) search API conventions.

## [v0.15.0] - 2025-09-16

### Changed
57 changes: 57 additions & 0 deletions docs/user_guide/search.md
@@ -394,6 +394,63 @@ results = earthaccess.search_data(
)
```

### Multi-feature support

`earthaccess` supports multi-feature searches for five spatial types: bounding boxes, polygons, points, circles, and lines. To query several features of the same type in one search, use the keyword formed by prefixing the feature type with `multi` (`multi_` for `bounding_box`):

- `multi_bounding_box`
- `multipolygon`
- `multipoint`
- `multicircle`
- `multiline`

`earthaccess.search_data` returns granules that intersect any of the specified features, that is, the features are combined with a logical OR. For example, to search with multiple polygons:

```python
polygons = [
    # same polygon used in the single query
    [
        (-49.64860422604741, 69.23553485026147),
        (-49.667876114626296, 69.07309059285959),
        (-49.1722491331669, 69.03175841820749),
        (-47.53552489113113, 69.03872918462292),
        (-47.35616491854395, 69.22149993224824),
        (-48.1447695277283, 69.33507802083219),
        (-49.178671242118384, 69.29455117736225),
        (-49.64860422604741, 69.23553485026147),
    ],
    # a second polygon over the Eyjafjallajökull volcano in Iceland
    [
        (-19.61490317965708, 63.63370144220765),
        (-19.61490317965708, 63.61370144220765),
        (-19.59490317965708, 63.61370144220765),
        (-19.59490317965708, 63.63370144220765),
        (-19.61490317965708, 63.63370144220765),
    ],
]

results = earthaccess.search_data(
    short_name="ATL06",
    multipolygon=polygons,
)
```

Similarly, to query multiple points, pass a list of (lon, lat) pairs:

```python
lon_lat_pairs = [
    (-105.25303896425012, 40.01259873086735),
    (-96.123457744456789, 19.98765455634521),
]

results = earthaccess.search_data(
    short_name="ATL06",
    multipoint=lon_lat_pairs,
)
```

Because all features travel in one request, a multi-feature search returns every granule that intersects any of them without issuing a separate API call per feature.
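As a rough illustration of why a single call is enough, the sketch below encodes several polygons as repeated `polygon[]` parameters plus an `options[polygon][or]=true` flag, which is how the CMR search API documentation describes OR-combined spatial parameters. The helper `build_multipolygon_query` is hypothetical, and the exact serialization that `earthaccess` and `python-cmr` perform internally is an assumption here, not something this guide confirms.

```python
from urllib.parse import parse_qsl, urlencode


def build_multipolygon_query(short_name, polygons):
    """Hypothetical helper: encode several polygons into one query string."""
    params = [("short_name", short_name)]
    for ring in polygons:
        # Each polygon becomes one flat "lon1,lat1,lon2,lat2,..." value.
        flat = ",".join(f"{lon},{lat}" for lon, lat in ring)
        params.append(("polygon[]", flat))
    # Ask for granules that intersect ANY polygon rather than all of them.
    params.append(("options[polygon][or]", "true"))
    return urlencode(params)


# Two small closed rings, listed counter-clockwise as (lon, lat) pairs.
rings = [
    [(0, 0), (1, 0), (1, 1), (0, 0)],
    [(10, 10), (11, 10), (11, 11), (10, 10)],
]
query = build_multipolygon_query("ATL06", rings)
pairs = parse_qsl(query)  # decode again to inspect the parameters
```

Both polygons end up in the same query string, so one HTTP request can carry the whole search.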

## Search for services

NASA Earthdata provides services that you can use to transform data before you download it. Transformations include converting data files to a different file format, subsetting data by spatial extent, time range or variable, reprojecting or transforming data to a different coordinate reference system (CRS) from the one it is stored in. Not all datasets have services and not all transformation services are available for a given dataset.
96 changes: 96 additions & 0 deletions earthaccess/search.py
@@ -851,6 +851,26 @@ def point(self, lon: FloatLike, lat: FloatLike) -> Self:
"""
return super().point(lon, lat)

def multipoint(self, lon_lat_pairs: Sequence[PointLike]) -> Self:
Collaborator

Please add an or_ kwarg so the caller can explicitly set the option, instead of assuming it should be set to True (and please update the docstring to include it):

Suggested change
def multipoint(self, lon_lat_pairs: Sequence[PointLike]) -> Self:
def points(self, lon_lat_pairs: Sequence[PointLike], *, or_: bool | None = None) -> Self:

I would prefer or_ to be a kwarg because passing a bool value without a keyword is not intuitive.

However, making it a positional arg allows for it to be set like so (whereas this is not possible when or_ is a kwarg):

earthaccess.search_data(
    short_name="ATL06",
    polygons=(polygons, True),
)

Again, it's not intuitive as to what True means here, so I'd like some input from others on this point (cc: @betolink, @mfisher87, @jhkennedy).

I don't like the "bare" boolean argument, but on the other hand, I don't like that (sub)kwargs cannot be specified via search_data.

In order to pass or_ when it is a kwarg, the caller cannot use search_data, and must instead directly construct a DataGranules instance, like so:

query = (
    DataGranules()
    .short_name("ATL06")
    .polygons(polygons, or_=True)
)

(Alternatively, renaming or_ to any_ might make it even more readable -- i.e., any/all is more intuitive than or/and, no?)
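To make the trade-off concrete, here is a minimal sketch of the keyword-only variant. `SketchQuery` is a hypothetical stand-in, not the real `DataGranules` class, and its `point` method is assumed to append on each call, as described elsewhere in this review.

```python
from typing import Optional, Sequence, Tuple


class SketchQuery:
    """Hypothetical stand-in for a granule query (not the earthaccess class)."""

    def __init__(self) -> None:
        self.params: dict = {}
        self.options: dict = {}

    def point(self, lon: float, lat: float) -> "SketchQuery":
        # Assumed behavior: append, so repeated calls accumulate points.
        self.params.setdefault("point", []).append(f"{float(lon)},{float(lat)}")
        return self

    def points(
        self,
        lon_lat_pairs: Sequence[Tuple[float, float]],
        *,
        or_: Optional[bool] = None,
    ) -> "SketchQuery":
        for lon, lat in lon_lat_pairs:
            self.point(lon, lat)
        # Only record the option when the caller chose a value explicitly.
        if or_ is not None:
            self.options.setdefault("point", {})["or"] = or_
        return self


query = SketchQuery().points([(-105.25, 40.01), (-96.12, 19.99)], or_=True)

# Because of the bare "*", a positional boolean is rejected outright.
try:
    SketchQuery().points([(0, 0)], True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

The bare `*` in the signature is what forces callers to spell out `or_=True`, which is the readability benefit argued for above; the cost is that a plain `search_data(points=...)` keyword cannot carry the flag.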

"""Filter by granules that include multiple geographic points.

Parameters:
lon_lat_pairs: sequence of (lon, lat) tuples

Returns:
self
"""
points = []

for x, y in lon_lat_pairs:
self.point(x, y)
points.append(self.params.pop('point')[0])

self.params['point'] = points
Comment on lines +863 to +869
Collaborator

This should be unnecessary because the point method already performs an append each time it is called. That is, the point method is already implemented to support being called multiple times, adding a new point to the list of points on each call.

Therefore, this should be all that's required:

Suggested change
-        points = []
-        for x, y in lon_lat_pairs:
-            self.point(x, y)
-            points.append(self.params.pop('point')[0])
-        self.params['point'] = points
+        for x, y in lon_lat_pairs:
+            self.point(x, y)

self.options['point'] = {'or': True}
Collaborator

I would prefer not to assume the caller wants the or option to be set to True, so let's add an or_ kwarg, defaulted to None:

Suggested change
self.options['point'] = {'or': True}
self.option("point", "or", or_)

return self


@override
def polygon(self, coordinates: Sequence[PointLike]) -> Self:
"""Filter by granules that overlap a polygonal area. Must be used in combination
@@ -869,6 +889,25 @@ def polygon(self, coordinates: Sequence[PointLike]) -> Self:
"""
return super().polygon(coordinates)

def multipolygon(self, multi_coordinates: Sequence[Sequence[PointLike]]) -> Self:
Collaborator

Please add an or_ kwarg, as I've described in my comment on the multipoint method:

Suggested change
def multipolygon(self, multi_coordinates: Sequence[Sequence[PointLike]]) -> Self:
def polygons(self, polygons_: Sequence[Sequence[PointLike]], *, or_: bool | None = None) -> Self:

"""Filter by granules that overlap any polygonal area from an input list.

Parameters:
multi_coordinates: list of lists of (lon, lat) tuples
Collaborator

Suggested change
multi_coordinates: list of lists of (lon, lat) tuples
polygons_: list of lists of (lon, lat) tuples


Returns:
self
"""
polygons = []

for polygon in multi_coordinates:
self.polygon(polygon)
polygons.append(self.params.pop('polygon'))

self.params['polygon'] = polygons
Comment on lines +901 to +907
Collaborator

Suggested change
-        polygons = []
-        for polygon in multi_coordinates:
-            self.polygon(polygon)
-            polygons.append(self.params.pop('polygon'))
-        self.params['polygon'] = polygons
+        self.params["polygon"] = [
+            self.polygon(polygon).params.pop("polygon") for polygon in polygons_
+        ]
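The pop-inside-a-comprehension pattern can be checked in isolation. In the sketch below, `OverwritingQuery` is a hypothetical class whose `polygon` method overwrites the `polygon` parameter on every call (assumed here to mirror the inherited method); the comprehension harvests each encoded value before the next call replaces it.

```python
class OverwritingQuery:
    """Hypothetical query whose polygon() overwrites instead of appending."""

    def __init__(self):
        self.params = {}

    def polygon(self, coordinates):
        # Single-feature setter: the previous value is replaced.
        self.params["polygon"] = ",".join(
            f"{lon},{lat}" for lon, lat in coordinates
        )
        return self

    def polygons(self, polygons_):
        # The right-hand side is fully evaluated first: each polygon() call
        # sets the key, pop() removes it again, and the encoded strings
        # accumulate in the list that is finally assigned back.
        self.params["polygon"] = [
            self.polygon(ring).params.pop("polygon") for ring in polygons_
        ]
        return self


rings = [
    [(0, 0), (1, 0), (1, 1), (0, 0)],
    [(10, 10), (11, 10), (11, 11), (10, 10)],
]
q = OverwritingQuery().polygons(rings)
```

This keeps any validation done by the single-feature method while still producing a list of encoded polygons.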

self.options['polygon'] = {'or': True}
Collaborator

Suggested change
self.options['polygon'] = {'or': True}
self.option("polygon", "or", or_)

return self

@override
def bounding_box(
self,
@@ -895,6 +934,25 @@ def bounding_box(
return super().bounding_box(
lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat
)

def multi_bounding_box(self, boxes: Sequence[Tuple[FloatLike, FloatLike, FloatLike, FloatLike]]) -> Self:
Collaborator

Suggested change
def multi_bounding_box(self, boxes: Sequence[Tuple[FloatLike, FloatLike, FloatLike, FloatLike]]) -> Self:
def bounding_boxes(self, bboxes: Sequence[tuple[FloatLike, FloatLike, FloatLike, FloatLike]], *, or_: bool | None = None) -> Self:

"""Filter by granules that overlap any bounding box from an input list.

Parameters:
boxes: list of tuples of (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
Collaborator

Suggested change
boxes: list of tuples of (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
bboxes: list of tuples of (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)


Returns:
self
"""
bboxes = []

for box in boxes:
self.bounding_box(*box)
bboxes.append(self.params.pop('bounding_box'))

self.params['bounding_box'] = bboxes
Comment on lines +947 to +953
Collaborator

Suggested change
-        bboxes = []
-        for box in boxes:
-            self.bounding_box(*box)
-            bboxes.append(self.params.pop('bounding_box'))
-        self.params['bounding_box'] = bboxes
+        self.params["bounding_box"] = [
+            self.bounding_box(*bbox).params.pop("bounding_box") for bbox in bboxes
+        ]

self.options['bounding_box'] = {'or': True}
Collaborator

Suggested change
self.options['bounding_box'] = {'or': True}
self.option("bounding_box", "or", or_)

return self

@override
def line(self, coordinates: Sequence[PointLike]) -> Self:
@@ -913,6 +971,44 @@ def line(self, coordinates: Sequence[PointLike]) -> Self:
pairs, or a coordinate could not be converted to a float.
"""
return super().line(coordinates)

def multiline(self, multi_coordinates: Sequence[Sequence[PointLike]]) -> Self:
Collaborator

Suggested change
def multiline(self, multi_coordinates: Sequence[Sequence[PointLike]]) -> Self:
def lines(self, lines_: Sequence[Sequence[PointLike]], *, or_: bool | None = None) -> Self:

"""Filter by granules that overlap any series of connected points from an input list.

Parameters:
multi_coordinates: a list of lists of (lon, lat) tuples
Collaborator

Suggested change
multi_coordinates: a list of lists of (lon, lat) tuples
lines_: a list of lists of (lon, lat) tuples


Returns:
self
"""
lines = []

for line in multi_coordinates:
self.line(line)
lines.append(self.params.pop('line'))

self.params['line'] = lines
Comment on lines +984 to +990
Collaborator

Suggested change
-        lines = []
-        for line in multi_coordinates:
-            self.line(line)
-            lines.append(self.params.pop('line'))
-        self.params['line'] = lines
+        self.params["line"] = [
+            self.line(line).params.pop("line") for line in lines_
+        ]

self.options['line'] = {'or': True}
Collaborator

Suggested change
self.options['line'] = {'or': True}
self.option("line", "or", or_)

return self

def multicircle(self, multi_circles: Sequence[Tuple[FloatLike,FloatLike,FloatLike]]) -> Self:
Collaborator

Suggested change
def multicircle(self, multi_circles: Sequence[Tuple[FloatLike,FloatLike,FloatLike]]) -> Self:
def circles(self, circles_: Sequence[Tuple[FloatLike,FloatLike,FloatLike]], *, or_: bool | None = None) -> Self:

"""Filter by granules that overlap any circle from an input list.

Parameters:
multi_circles: list of tuples of (lon, lat, radius)
Collaborator

Suggested change
multi_circles: list of tuples of (lon, lat, radius)
circles_: list of tuples of (lon, lat, radius)


Returns:
self
"""
circles = []

for circle in multi_circles:
self.circle(*circle)
circles.append(self.params.pop('circle'))

self.params['circle'] = circles
Comment on lines +1003 to +1009
Collaborator

Suggested change
-        circles = []
-        for circle in multi_circles:
-            self.circle(*circle)
-            circles.append(self.params.pop('circle'))
-        self.params['circle'] = circles
+        self.params["circle"] = [
+            self.circle(*circle).params.pop("circle") for circle in circles_
+        ]

self.options['circle'] = {'or': True}
Collaborator

Suggested change
self.options['circle'] = {'or': True}
self.option("circle", "or", or_)

return self

@override
def downloadable(self, downloadable: bool = True) -> Self:
138 changes: 138 additions & 0 deletions tests/integration/test_search_data.py
@@ -260,3 +260,141 @@ def test_search_data_by_short_name_with_line():
count=expected_count,
)
assert len(results) > 0


@pytest.mark.skipif(SKIP_THIS, reason="calls python-cmr, set SKIP_THIS=False to run")
def test_search_data_by_short_name_with_multipoint():
    """Tests searching for granules with multiple points."""
    # Test with 2 points
    multipoint_coords = [
        (-105.61708725711999, 36.38510879364757),  # Taos, NM
        (-112.73, 42.5),  # Idaho/Utah area
    ]
    results = earthaccess.search_data(
        short_name="MOD10A1",
        multipoint=multipoint_coords,
        count=expected_count,
    )
    assert len(results) > 0

    # Verify that multipoint returns at least as many results as a single point
    single_point_results = earthaccess.search_data(
        short_name="MOD10A1",
        point=multipoint_coords[0],
        count=expected_count,
    )
    # Note: multipoint uses OR logic, so it should return >= single point results
    assert len(results) >= len(single_point_results)


@pytest.mark.skipif(SKIP_THIS, reason="calls python-cmr, set SKIP_THIS=False to run")
def test_search_data_by_short_name_with_multipolygon():
    """Tests searching for granules with multiple polygons."""
    # Second polygon near Greenland
    polygon2 = [
        (-45.0, 70.0),
        (-45.0, 68.0),
        (-40.0, 68.0),
        (-40.0, 70.0),
        (-45.0, 70.0),
    ]

    multipolygon_coords = [polygon, polygon2]

    results = earthaccess.search_data(
        short_name="ATL06",
        multipolygon=multipolygon_coords,
        count=expected_count,
    )
    assert len(results) > 0

    # Verify that multipolygon returns at least as many results as a single polygon
    single_polygon_results = earthaccess.search_data(
        short_name="ATL06",
        polygon=polygon,
        count=expected_count,
    )
    # Note: multipolygon uses OR logic, so it should return >= single polygon results
    assert len(results) >= len(single_polygon_results)


@pytest.mark.skipif(SKIP_THIS, reason="calls python-cmr, set SKIP_THIS=False to run")
def test_search_data_by_short_name_with_multi_bounding_box():
    """Tests searching for granules with multiple bounding boxes."""
    # Greenland area bounding boxes
    bbox1 = (-46.5, 61.0, -42.5, 63.0)  # Original bbox from existing test
    bbox2 = (-50.0, 65.0, -45.0, 68.0)  # Another Greenland area

    multi_bboxes = [bbox1, bbox2]

    results = earthaccess.search_data(
        short_name="ATL06",
        multi_bounding_box=multi_bboxes,
        count=expected_count,
    )
    assert len(results) > 0

    # Verify that multi_bounding_box returns at least as many results as a single bbox
    single_bbox_results = earthaccess.search_data(
        short_name="ATL06",
        bounding_box=bbox1,
        count=expected_count,
    )
    # Note: multi_bounding_box uses OR logic, so it should return >= single bbox results
    assert len(results) >= len(single_bbox_results)


@pytest.mark.skipif(SKIP_THIS, reason="calls python-cmr, set SKIP_THIS=False to run")
def test_search_data_by_short_name_with_multiline():
    """Tests searching for granules with multiple lines."""
    # Second line in a different area
    line2 = [
        (-120.0, 40.0),
        (-119.0, 41.0),
        (-118.0, 42.0),
        (-117.0, 43.0),
    ]

    multiline_coords = [line, line2]

    results = earthaccess.search_data(
        short_name="ATL08",
        multiline=multiline_coords,
        count=expected_count,
    )
    assert len(results) > 0

    # Verify that multiline returns at least as many results as a single line
    single_line_results = earthaccess.search_data(
        short_name="ATL08",
        line=line,
        count=expected_count,
    )
    # Note: multiline uses OR logic, so it should return >= single line results
    assert len(results) >= len(single_line_results)


@pytest.mark.skipif(SKIP_THIS, reason="calls python-cmr, set SKIP_THIS=False to run")
def test_search_data_by_short_name_with_multicircle():
    """Tests searching for granules with multiple circles."""
    # Define two circles
    circle1 = (-105.61708725711999, 36.38510879364757, 1000.0)  # Taos, NM
    circle2 = (-110.0, 35.0, 1500.0)  # Another area

    multicircle_coords = [circle1, circle2]

    results = earthaccess.search_data(
        short_name="ATL03",
        multicircle=multicircle_coords,
        count=expected_count,
    )
    assert len(results) > 0

    # Verify that multicircle returns at least as many results as a single circle
    single_circle_results = earthaccess.search_data(
        short_name="ATL03",
        circle=circle1,
        count=expected_count,
    )
    # Note: multicircle uses OR logic, so it should return >= single circle results
    assert len(results) >= len(single_circle_results)
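The `>=` assertions in these tests rest on a simple property: matching any of several features can only add hits, and truncating both result lists to the same `count` preserves that ordering. The offline sketch below illustrates the property with made-up data; `fake_search`, `boxes_intersect`, and the `GRANULES` footprints are all hypothetical and do not call CMR.

```python
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def fake_search(granules: Dict[str, BBox], boxes: Sequence[BBox], count: int) -> List[str]:
    """Return up to `count` granule names whose footprint hits ANY box."""
    hits = [
        name
        for name, footprint in granules.items()
        if any(boxes_intersect(footprint, box) for box in boxes)
    ]
    return hits[:count]


# Made-up footprints: one per search box, plus one far away from both.
GRANULES = {
    "g1": (-46.0, 61.5, -45.0, 62.0),
    "g2": (-49.0, 66.0, -48.0, 67.0),
    "g3": (10.0, 10.0, 11.0, 11.0),
}
bbox1 = (-46.5, 61.0, -42.5, 63.0)
bbox2 = (-50.0, 65.0, -45.0, 68.0)

single = fake_search(GRANULES, [bbox1], count=10)
multi = fake_search(GRANULES, [bbox1, bbox2], count=10)
```

A stand-in like this could back a unit test that runs without network access, leaving the integration tests above to confirm the behavior against the real service.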