add behavior to allow for deleting items out of collections in bulk #891

Merged
philvarner merged 5 commits into main from pv/enable_ingest_action_truncate
May 5, 2025

Collaborator

@philvarner philvarner commented Apr 24, 2025

Related Issue(s):

Proposed Changes:

  1. removes unused code for the old "bulk" operations feature that was removed in 2.0.0
  2. adds new "actions" ingest and a "truncate" action to remove all of the items from a collection
  3. adds better code coverage configuration and run targets
  4. dependency update of minor and patch versions
  5. got ingest.js to 100% code coverage 🎉

PR Checklist:

  • I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.

Copilot AI left a comment

Pull Request Overview

This PR removes deprecated bulk operations code and introduces a new "truncate" ingest action to delete all items from a collection while keeping the collection intact. It also refactors the ingest processing functions (renaming ingestItems to processMessages) and updates related tests and documentation.

  • Removed unused bulk operations functions.
  • Added new ingest action "truncate" with dedicated error handling.
  • Updated tests, documentation, and CI scripts to support processMessages and ingest actions.
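As a rough illustration of the new ingest action, a truncate message might look like the object below. Only the `type: 'action'` discriminator and the `truncate` action name appear in this PR's diff; the `command` and `collection` field names and the collection ID are assumptions for illustration, so the README should be treated as the authority on the actual shape.

```javascript
// Hypothetical truncate message; `command` and `collection` are assumed field names.
const truncateMsg = {
  type: 'action', // lowercase, unlike the STAC/GeoJSON types 'Collection' and 'Feature'
  command: 'truncate',
  collection: 'my-collection', // placeholder collection ID
}

// A queue message body is a string, so serialize the object before sending it.
const body = JSON.stringify(truncateMsg)
```

The serialized `body` is what would be sent to the ingest queue as the message body.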

Reviewed Changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/unit/test-ingest-2.js | Replaced ingestItems with processMessages in unit tests |
| tests/system/test-ingest.js | Updated tests to validate new truncate action behavior and error handling |
| tests/system/test-api-search-post.js | Replaced ingestItems with processMessages |
| tests/system/test-api-search-get.js | Replaced ingestItems with processMessages in GET-based tests |
| tests/system/test-api-get-collection-aggregate.js | Updated ingest function reference |
| tests/system/test-api-get-aggregate.js | Updated ingest function reference |
| tests/helpers/ingest.js | Renamed variable from item to msg for clarity; refactored ingestFixture |
| src/lib/stac-utils.js | Added isStacEntity and isAction helper functions |
| src/lib/ingest.js | Refactored ingest flow to support new actions; replaced bulk operations with direct calls |
| src/lib/fs.js | Removed unused readJson helper |
| src/lambdas/ingest/index.js | Updated to call processMessages instead of ingestItems |
| README.md | Updated documentation to include ingest actions |
| CHANGELOG.md | Documented changes associated with new ingest actions |
| .github/workflows/push.yaml | Updated CI test commands to use new coverage script commands |
Files not reviewed (3)
  • .c8rc: Language not supported
  • package.json: Language not supported
  • tests/fixtures/truncate.json: Language not supported

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
import got from 'got' // eslint-disable-line import/no-unresolved
import { createIndex } from '../../lib/database-client.js'
- import { ingestItems, publishResultsToSns } from '../../lib/ingest.js'
+ import { processMessages, publishResultsToSns } from '../../lib/ingest.js'
Collaborator Author

renamed to better indicate that it expects multiple types of messages. The old name was already inaccurate, since Collections could already be ingested in addition to Items.

* @param {string} filename
* @returns {Promise<unknown>}
*/
export const readJson = (filename) => readFile(filename, 'utf8').then(JSON.parse)
Collaborator Author

unused

/** @type {{ hasOwnProperty: (arg0: string) => any; type: string, collection: string; links: any[]; id: any; }} */ data
) {
let index = ''
export async function convertIngestMsgToDbOperation(data) {
Collaborator Author

no change in what this does, just a more accurate name

logger.debug('data', data)
if (isCollection(data)) {
index = COLLECTIONS_INDEX
action = 'index'
Collaborator Author

this method now decides if this is an index or "action command" operation
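The decision described here can be sketched roughly as follows. This is a simplified illustration rather than the code in src/lib/ingest.js: the `describeOperation` name, the `'collections'` index value, the `command` field, and the convention that an Item's index is named after its collection are all assumptions.

```javascript
const COLLECTIONS_INDEX = 'collections' // assumed value of the constant used in the diff

const isCollection = (msg) => Boolean(msg) && msg.type === 'Collection'
const isItem = (msg) => Boolean(msg) && msg.type === 'Feature'
const isAction = (msg) => Boolean(msg) && msg.type === 'action'

// Hypothetical helper: returns a plain description of the operation
// instead of touching a database.
function describeOperation(msg) {
  if (isCollection(msg)) {
    return { index: COLLECTIONS_INDEX, action: 'index' }
  }
  if (isItem(msg)) {
    // Assumption: Items live in an index named after their collection.
    return { index: msg.collection, action: 'index' }
  }
  if (isAction(msg)) {
    // Assumption: `command` carries the name of the action to run.
    return { index: msg.collection, action: msg.command }
  }
  throw new Error(`Unrecognized message type: ${msg && msg.type}`)
}
```

Under these assumptions, STAC entities map to an `index` operation, while an action message maps to whatever command it names, such as `truncate`.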

throw new InvalidIngestError('Expected a "links" property on the stac object')
}
- const links = data.links.filter(
+ const links = (data.links || []).filter(
Collaborator Author

For whatever reason, ingest rejected the Item if it was missing a links field with an array value. While this is required for a valid STAC Item, it's one of the only validations that ingest actually does, so it's pretty useless. This just adds an empty links array if it's missing.
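A minimal sketch of the fallback, assuming the filter drops a fixed set of link relations. The diff does not show the actual predicate, so the `DROPPED_RELS` list and the `filterLinks` name are hypothetical.

```javascript
// Assumed set of link relations to drop; the real predicate is not shown in the diff.
const DROPPED_RELS = ['self', 'root', 'parent', 'child', 'collection', 'item', 'items']

// Hypothetical helper illustrating the `(data.links || [])` fallback.
function filterLinks(data) {
  // Fall back to an empty array so a missing `links` field no longer throws.
  return (data.links || []).filter((link) => !DROPPED_RELS.includes(link.rel))
}
```

With this fallback, an Item without `links` yields an empty array instead of a `TypeError` from calling `filter` on `undefined`.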

}, [])
return operations
}
if (!index) {
Collaborator Author

all operations require an index -- maybe in the future, we'll have some that don't.

await createIndex(id)
}
} else if (action === 'truncate') {
if (process.env['ENABLE_INGEST_ACTION_TRUNCATE'] !== 'true') {
Collaborator Author

only allow truncate if enabled. For example, a deployment may want to enable this in dev and staging, but disable in prod so you don't accidentally delete the real catalog.

This is a really important use case for the environment variable, so I wouldn't relegate it to just a comment on the PR. I'd add it to the README.md where you document the variable.
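The gate discussed in this thread can be sketched as a small guard. The environment variable name and the strict comparison against `'true'` come from the diff; the `assertTruncateEnabled` name, the error message, and the choice of `InvalidIngestError` as the thrown class are assumptions.

```javascript
// Stand-in for the project's error class of the same name.
class InvalidIngestError extends Error {}

// Hypothetical guard. `env` defaults to process.env; passing it in keeps the check testable.
function assertTruncateEnabled(env = process.env) {
  // Only the exact string 'true' enables the action; unset, 'TRUE', or '1' do not.
  if (env['ENABLE_INGEST_ACTION_TRUNCATE'] !== 'true') {
    throw new InvalidIngestError('truncate is not enabled for this deployment')
  }
}
```

A deployment could then set `ENABLE_INGEST_ACTION_TRUNCATE=true` in dev and staging and leave it unset in prod, so a stray truncate message is rejected instead of emptying the real catalog.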

logger.warn('Invalid ingest item', result.error)
} else {
- logger.error('Error while ingesting item', result.error)
+ logger.error('Error while ingesting item::', result.error)
Collaborator Author

added a double colon, because the downstream errors this surfaces have colons in them, and it was confusing as to how they related.

error = e
}
- results.push({ record, dbRecord, result, error })
+ results.push({ record: msg, dbRecord: dbOp, result, error })
Collaborator Author

preserve the output format of this message, even though we've changed the internal names.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

test('Ingest item failure is published to post-ingest SNS topic', async (t) => {
- const collection = await ingestCollectionAndPurgePostIngestQueue(t)
+ await ingestCollectionAndPurgePostIngestQueue(t)
Collaborator Author

this previously failed because Item was missing a links field, so I had to make it fail for another reason, e.g., collection doesn't exist

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@philvarner philvarner marked this pull request as ready for review April 24, 2025 20:52
}

export function isAction(record) {
return record && record.type === 'action'

nitpick: action is not capitalized but Collection and Feature are.

I might also rename the parameter from record to msg, since you changed the nomenclature elsewhere.

Also, moving to a msgs metaphor seems like the right thing to do, but Collection and Feature aren't really msg types, so I'm a bit concerned about overloading type this way. I'd almost lean toward having an action_type and a record_type, where record_type is pertinent when the action_type is something like add_item. But again, this is not a code base I'm super familiar with, so I may be worried about nothing.

Member

🧌 Shouldn't it be STACtion? 🙄

Alright, I'll see myself out...

In all seriousness though, I think @phil-osk is right. This change conflates objects to be ingested with actions. I don't love that. It might be the pragmatic choice, but I think it is conceptually the wrong choice.

Perhaps it would be better to have a new, separate lambda that is an action lambda (for lack of a better name at the moment) that could consume action messages, some of which could be ingest actions. This new lambda could maybe be considered a replacement for the existing ingest lambda, and the ingest lambda would then be deprecated? Or maybe there's value in maintaining support for both, to support item/collection sources that are not stac-server specific, because it is probably not ideal to require having yet another lambda in between just to munge items/collections into an action message format.

Alternately, might I ask this question: why expose this truncate functionality in this manner, as opposed to via an API endpoint?

Collaborator Author

Definitely the pragmatic choice and not the ideal conceptual one. We're not going to invest the time to separate all this out into something cleaner though, which is why all of this code is like it is anyway. 🤷🏻

A new lambda would be nice, but that's also days of work, including supporting it via both the builtin serverless example and the terraform module.

It could be another api endpoint, though I'm not sure which one. I don't think we actually want something this powerful in the API, but that's not a sufficient reason. It's really another pragmatic one -- in an actual deployment with an auth proxy, that would have to be exposed by the proxy, or a user would have to ssh tunnel directly to the stac-server to run it, rather than the user just being able to run an aws cli command to send the command to the ingest queue.

Collaborator Author

nitpick: action is not capitalized but Collection and Feature are.

That was an intentional decision, since those names are defined by STAC and GeoJSON. I wanted to keep the discriminator as type, but didn't want to use Action as I thought it might be confused with some OGC type named Action.

Collaborator Author

I tried to be conservative with renaming things, since most of these function, parameter, and variable names are confusing already. record isn't even a term we use in STAC, and I don't know if it's supposed to refer to OGC Records, the generic record/struct data structure, or "a thing that is recorded". I can clean this up more as a preparatory change if that's desired.

Collaborator Author

I'd like to move ahead with merging this. While I understand and agree with most of the critique here, this change is at least in line with all of the other code. The current need that I'm attempting to fill with this behavior doesn't warrant a change more significant than this, so I don't feel I can justify that investment. stac-server is not actively maintained, so we're entirely reliant on people just implementing what they need; i.e., we should take what we can get until a better situation arises.

]
const items = await Promise.all(fixtureFiles.map((x) => loadJson(x)))
- await ingestItems(items)
+ await processMessages(items)

nitpick: as before, it is jarring to pass items as messages. The naming incongruity is a yellow flag to me.

@philvarner philvarner enabled auto-merge May 5, 2025 14:30
@philvarner philvarner disabled auto-merge May 5, 2025 14:30
@philvarner philvarner merged commit 5f4c03f into main May 5, 2025
3 checks passed
@philvarner philvarner deleted the pv/enable_ingest_action_truncate branch May 5, 2025 14:31