add behavior to allow for deleting items out of collections in bulk by philvarner · Pull Request #891 · stac-utils/stac-server

philvarner · 2025-04-24T20:31:29Z

Related Issue(s):

Bulk operations feature inadvertently dropped in 2.0.0 #885

Proposed Changes:

removes unused code for the old "bulk" operations feature that was removed in 2.0.0
adds new "actions" ingest and a "truncate" action to remove all of the items from a collection
adds better code coverage configuration and run targets
dependency update of minor and patch versions
got ingest.js to 100% code coverage 🎉

PR Checklist:

I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.

Copilot

Pull Request Overview

This PR removes deprecated bulk operations code and introduces a new "truncate" ingest action to delete all items from a collection while keeping the collection intact. It also refactors the ingest processing functions (renaming ingestItems to processMessages) and updates related tests and documentation.

Removed unused bulk operations functions.
Added new ingest action "truncate" with dedicated error handling.
Updated tests, documentation, and CI scripts to support processMessages and ingest actions.

Reviewed Changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/unit/test-ingest-2.js	Replaced ingestItems with processMessages in unit tests
tests/system/test-ingest.js	Updated tests to validate new truncate action behavior and error handling
tests/system/test-api-search-post.js	Replaced ingestItems with processMessages
tests/system/test-api-search-get.js	Replaced ingestItems with processMessages in GET-based tests
tests/system/test-api-get-collection-aggregate.js	Updated ingest function reference
tests/system/test-api-get-aggregate.js	Updated ingest function reference
tests/helpers/ingest.js	Renamed variable from item to msg for clarity; refactored ingestFixture
src/lib/stac-utils.js	Added isStacEntity and isAction helper functions
src/lib/ingest.js	Refactored ingest flow to support new actions; replaced bulk operations with direct calls
src/lib/fs.js	Removed unused readJson helper
src/lambdas/ingest/index.js	Updated to call processMessages instead of ingestItems
README.md	Updated documentation to include ingest actions
CHANGELOG.md	Documented changes associated with new ingest actions
.github/workflows/push.yaml	Updated CI test commands to use new coverage script commands

Files not reviewed (3)

.c8rc: Language not supported
package.json: Language not supported
tests/fixtures/truncate.json: Language not supported

tests/system/test-ingest.js

README.md

tests/system/test-ingest.js

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

philvarner · 2025-04-24T20:41:03Z

src/lambdas/ingest/index.js

 import got from 'got' // eslint-disable-line import/no-unresolved
 import { createIndex } from '../../lib/database-client.js'
-import { ingestItems, publishResultsToSns } from '../../lib/ingest.js'
+import { processMessages, publishResultsToSns } from '../../lib/ingest.js'


renamed to better indicate that there were multiple types of messages it was expecting. This was already an inaccurate name, as Collections could be ingested in addition to Items already.

philvarner · 2025-04-24T20:41:16Z

src/lib/fs.js

- * @param {string} filename
- * @returns {Promise<unknown>}
- */
-export const readJson = (filename) => readFile(filename, 'utf8').then(JSON.parse)


philvarner · 2025-04-24T20:41:43Z

src/lib/ingest.js

-  /** @type {{ hasOwnProperty: (arg0: string) => any; type: string, collection: string; links: any[]; id: any; }} */ data
-) {
-  let index = ''
+export async function convertIngestMsgToDbOperation(data) {


no change in what this does, just a more accurate name

philvarner · 2025-04-24T20:42:38Z

src/lib/ingest.js

  logger.debug('data', data)
  if (isCollection(data)) {
    index = COLLECTIONS_INDEX
+    action = 'index'


this method now decides if this is an index or "action command" operation

philvarner · 2025-04-24T20:44:15Z

src/lib/ingest.js

-    throw new InvalidIngestError('Expected a "links" property on the stac object')
-  }
-  const links = data.links.filter(
+  const links = (data.links || []).filter(


For whatever reason, ingest rejected the Item if it was missing a links field with an array value. While this is required for a valid STAC Item, it's one of the only validations that ingest actually does, so it's pretty useless. This just adds an empty links array if it's missing.

philvarner · 2025-04-24T20:45:05Z

src/lib/ingest.js

-  }, [])
-  return operations
-}
+  if (!index) {


all operations require an index -- maybe in the future, we'll have some that don't.

philvarner · 2025-04-24T20:46:19Z

src/lib/ingest.js

+      await createIndex(id)
+    }
+  } else if (action === 'truncate') {
+    if (process.env['ENABLE_INGEST_ACTION_TRUNCATE'] !== 'true') {


only allow truncate if enabled. For example, a deployment may want to enable this in dev and staging, but disable in prod so you don't accidentally delete the real catalog.

This is a really important use case for the environment variable, so I wouldn't relegate it to just a comment on the PR. I'd add it to the README.md where you document the variable.

philvarner · 2025-04-24T20:47:41Z

src/lib/ingest.js

        logger.warn('Invalid ingest item', result.error)
      } else {
-        logger.error('Error while ingesting item', result.error)
+        logger.error('Error while ingesting item::', result.error)


added a double colon, because the downstream errors this surfaces have colons in them, and it was confusing as to how they related.

philvarner · 2025-04-24T20:48:24Z

src/lib/ingest.js

      error = e
    }
-    results.push({ record, dbRecord, result, error })
+    results.push({ record: msg, dbRecord: dbOp, result, error })


preserve the output format of this message, even though we've changed the internal names.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

philvarner · 2025-04-24T20:50:33Z

tests/system/test-ingest.js


 test('Ingest item failure is published to post-ingest SNS topic', async (t) => {
-  const collection = await ingestCollectionAndPurgePostIngestQueue(t)
+  await ingestCollectionAndPurgePostIngestQueue(t)


this previously failed because Item was missing a links field, so I had to make it fail for another reason, e.g., collection doesn't exist

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

pro-phil · 2025-04-24T21:05:09Z

src/lib/ingest.js

+      await createIndex(id)
+    }
+  } else if (action === 'truncate') {
+    if (process.env['ENABLE_INGEST_ACTION_TRUNCATE'] !== 'true') {


This is a really important use case for the environment variable, so I wouldn't relegate it to just a comment on the PR. I'd add it to the README.md where you document the variable.

pro-phil · 2025-04-24T21:44:41Z

src/lib/stac-utils.js

+}
+
+export function isAction(record) {
+  return record && record.type === 'action'


nitpick: action is not capitalized by Collection and Feature are.

I might also rename the parameter from record to msg, since you changed the nomenclature elsewhere.

also, moving to a msgs metaphor seems like the right thing to do, but Collection and Feature aren't really msg types, so i'm a bit concerned about overloading type this way. I'd almost lean toward having and action_type and record_type where record_type is pertinent when the action_type is something like add_item. But again, not a code base I'm super familiar with so I may be worried about nothing.

🧌 Shouldn't it be STACtion? 🙄

Alright, I'll see myself out...

In all seriousness though, I think @phil-osk is right. This change is conflating objects to be ingested with actions. I don't love that. It might be the pragmatic choice, but I think it in concept is an incorrect choice.

Perhaps it would be better to have a new, separate lambda that is an action lambda (for lack of a better name at the moment) that could consume action messages, some of which could be ingest actions. This new lambda could maybe be considered a replacement for the existing ingest lambda, and the ingest lambda would then be deprecated? Or maybe there's value in maintaining support for both, to support item/collection sources that are not stac-server specific, because it is probably not ideal to require having yet another lambda in between just to munge items/collections into an action message format.

Alternately, might I ask this question: why expose this truncate functionality in this manner, as opposed to via an API endpoint?

Definitely the pragmatic choice and not the ideal conceptual one. We're not going to invest the time to separate all this out into something cleaner though, which is why all of this code is like it is anyway. 🤷🏻

A new lambda would be nice, but that's also days of work, including supporting it via both the builtin serverless example and the terraform module.

It could be another api endpoint, though I'm not sure which one. I don't think we actually want something this powerful in the API, but that's not a sufficient reason. It's really another pragmatic one -- in an actual deployment with an auth proxy, that would have to be exposed by the proxy, or a user would have to ssh tunnel directly to the stac-server to run it, rather than the user just being able to run an aws cli command to send the command to the ingest queue.

nitpick: action is not capitalized by Collection and Feature are.

That was an intentional decision, since those names are defined by STAC and GeoJSON. I wanted to keep the discriminator as type, but didn't want to use Action as I thought it might be confused with some OGC type named Action.

I tried to be conservative with renaming things, since most of these function, parameter, and variable names are confusing already. record isn't even a term we use in STAC, and I don't if it's supposed to be referring to OGC Records, the generic record/struct data structure, or "a thing that is recorded". I can clean this up more as a preparatory change if that's desired.

I'd like to move ahead with merging this. While I understand and agree with most of the critique here, this change is at least in line with all of the other code. The current need that I'm attempting fill with this behavior doesn't warrant a change more significant than this, so I don't feel I can justify that investment. stac-server is not actively maintained, so we're entirely reliant on people just implementing what they need -- e.g., we should take what we can get until a better situation arises.

pro-phil · 2025-04-24T21:46:27Z

tests/system/test-api-get-aggregate.js

  ]
  const items = await Promise.all(fixtureFiles.map((x) => loadJson(x)))
-  await ingestItems(items)
+  await processMessages(items)


nitpick: as before, this is jarring to pass items as messages. the naming incongruity is a yellow flag to me.

add behavior to allow for deleting items out of collections in bulk

222ec82

This was referenced Apr 24, 2025

add stac-server enable_ingest_action_truncate configuration Element84/filmdrop-aws-tf-modules#136

Merged

Bulk operations feature inadvertently dropped in 2.0.0 #885

Closed

philvarner requested review from Copilot and pjhartzell April 24, 2025 20:37

Copilot AI reviewed Apr 24, 2025

View reviewed changes

tests/system/test-ingest.js Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

tests/system/test-ingest.js Outdated Show resolved Hide resolved

Update README.md

cb78c94

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

philvarner commented Apr 24, 2025

View reviewed changes

Update tests/system/test-ingest.js

a3233dc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

philvarner commented Apr 24, 2025

View reviewed changes

Update tests/system/test-ingest.js

3e0dddb

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

philvarner marked this pull request as ready for review April 24, 2025 20:52

pro-phil reviewed Apr 24, 2025

View reviewed changes

add comment about dev/staging/prod

8cbe1eb

philvarner enabled auto-merge May 5, 2025 14:30

philvarner disabled auto-merge May 5, 2025 14:30

philvarner merged commit 5f4c03f into main May 5, 2025
3 checks passed

philvarner deleted the pv/enable_ingest_action_truncate branch May 5, 2025 14:31

Conversation

philvarner commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

philvarner commented Apr 24, 2025 •

edited

Loading