add behavior to allow for deleting items out of collections in bulk#891
add behavior to allow for deleting items out of collections in bulk#891philvarner merged 5 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR removes deprecated bulk operations code and introduces a new "truncate" ingest action to delete all items from a collection while keeping the collection intact. It also refactors the ingest processing functions (renaming ingestItems to processMessages) and updates related tests and documentation.
- Removed unused bulk operations functions.
- Added new ingest action "truncate" with dedicated error handling.
- Updated tests, documentation, and CI scripts to support processMessages and ingest actions.
Reviewed Changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test-ingest-2.js | Replaced ingestItems with processMessages in unit tests |
| tests/system/test-ingest.js | Updated tests to validate new truncate action behavior and error handling |
| tests/system/test-api-search-post.js | Replaced ingestItems with processMessages |
| tests/system/test-api-search-get.js | Replaced ingestItems with processMessages in GET-based tests |
| tests/system/test-api-get-collection-aggregate.js | Updated ingest function reference |
| tests/system/test-api-get-aggregate.js | Updated ingest function reference |
| tests/helpers/ingest.js | Renamed variable from item to msg for clarity; refactored ingestFixture |
| src/lib/stac-utils.js | Added isStacEntity and isAction helper functions |
| src/lib/ingest.js | Refactored ingest flow to support new actions; replaced bulk operations with direct calls |
| src/lib/fs.js | Removed unused readJson helper |
| src/lambdas/ingest/index.js | Updated to call processMessages instead of ingestItems |
| README.md | Updated documentation to include ingest actions |
| CHANGELOG.md | Documented changes associated with new ingest actions |
| .github/workflows/push.yaml | Updated CI test commands to use new coverage script commands |
Files not reviewed (3)
- .c8rc: Language not supported
- package.json: Language not supported
- tests/fixtures/truncate.json: Language not supported
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| import got from 'got' // eslint-disable-line import/no-unresolved | ||
| import { createIndex } from '../../lib/database-client.js' | ||
| import { ingestItems, publishResultsToSns } from '../../lib/ingest.js' | ||
| import { processMessages, publishResultsToSns } from '../../lib/ingest.js' |
There was a problem hiding this comment.
renamed to better indicate that there were multiple types of messages it was expecting. This was already an inaccurate name, as Collections could be ingested in addition to Items already.
| * @param {string} filename | ||
| * @returns {Promise<unknown>} | ||
| */ | ||
| export const readJson = (filename) => readFile(filename, 'utf8').then(JSON.parse) |
| /** @type {{ hasOwnProperty: (arg0: string) => any; type: string, collection: string; links: any[]; id: any; }} */ data | ||
| ) { | ||
| let index = '' | ||
| export async function convertIngestMsgToDbOperation(data) { |
There was a problem hiding this comment.
no change in what this does, just a more accurate name
| logger.debug('data', data) | ||
| if (isCollection(data)) { | ||
| index = COLLECTIONS_INDEX | ||
| action = 'index' |
There was a problem hiding this comment.
this method now decides if this is an index or "action command" operation
| throw new InvalidIngestError('Expected a "links" property on the stac object') | ||
| } | ||
| const links = data.links.filter( | ||
| const links = (data.links || []).filter( |
There was a problem hiding this comment.
For whatever reason, ingest rejected the Item if it was missing a links field with an array value. While this is required for a valid STAC Item, it's one of the only validations that ingest actually does, so it's pretty useless. This just adds an empty links array if it's missing.
| }, []) | ||
| return operations | ||
| } | ||
| if (!index) { |
There was a problem hiding this comment.
all operations require an index -- maybe in the future, we'll have some that don't.
| await createIndex(id) | ||
| } | ||
| } else if (action === 'truncate') { | ||
| if (process.env['ENABLE_INGEST_ACTION_TRUNCATE'] !== 'true') { |
There was a problem hiding this comment.
only allow truncate if enabled. For example, a deployment may want to enable this in dev and staging, but disable in prod so you don't accidentally delete the real catalog.
There was a problem hiding this comment.
This is a really important use case for the environment variable, so I wouldn't relegate it to just a comment on the PR. I'd add it to the README.md where you document the variable.
| logger.warn('Invalid ingest item', result.error) | ||
| } else { | ||
| logger.error('Error while ingesting item', result.error) | ||
| logger.error('Error while ingesting item::', result.error) |
There was a problem hiding this comment.
added a double colon, because the downstream errors this surfaces have colons in them, and it was confusing as to how they related.
| error = e | ||
| } | ||
| results.push({ record, dbRecord, result, error }) | ||
| results.push({ record: msg, dbRecord: dbOp, result, error }) |
There was a problem hiding this comment.
preserve the output format of this message, even though we've changed the internal names.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
|
||
| test('Ingest item failure is published to post-ingest SNS topic', async (t) => { | ||
| const collection = await ingestCollectionAndPurgePostIngestQueue(t) | ||
| await ingestCollectionAndPurgePostIngestQueue(t) |
There was a problem hiding this comment.
this previously failed because Item was missing a links field, so I had to make it fail for another reason, e.g., collection doesn't exist
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| await createIndex(id) | ||
| } | ||
| } else if (action === 'truncate') { | ||
| if (process.env['ENABLE_INGEST_ACTION_TRUNCATE'] !== 'true') { |
There was a problem hiding this comment.
This is a really important use case for the environment variable, so I wouldn't relegate it to just a comment on the PR. I'd add it to the README.md where you document the variable.
| } | ||
|
|
||
| export function isAction(record) { | ||
| return record && record.type === 'action' |
There was a problem hiding this comment.
nitpick: action is not capitalized by Collection and Feature are.
I might also rename the parameter from record to msg, since you changed the nomenclature elsewhere.
also, moving to a msgs metaphor seems like the right thing to do, but Collection and Feature aren't really msg types, so i'm a bit concerned about overloading type this way. I'd almost lean toward having and action_type and record_type where record_type is pertinent when the action_type is something like add_item. But again, not a code base I'm super familiar with so I may be worried about nothing.
There was a problem hiding this comment.
🧌 Shouldn't it be STACtion? 🙄
Alright, I'll see myself out...
In all seriousness though, I think @phil-osk is right. This change is conflating objects to be ingested with actions. I don't love that. It might be the pragmatic choice, but I think it in concept is an incorrect choice.
Perhaps it would be better to have a new, separate lambda that is an action lambda (for lack of a better name at the moment) that could consume action messages, some of which could be ingest actions. This new lambda could maybe be considered a replacement for the existing ingest lambda, and the ingest lambda would then be deprecated? Or maybe there's value in maintaining support for both, to support item/collection sources that are not stac-server specific, because it is probably not ideal to require having yet another lambda in between just to munge items/collections into an action message format.
Alternately, might I ask this question: why expose this truncate functionality in this manner, as opposed to via an API endpoint?
There was a problem hiding this comment.
Definitely the pragmatic choice and not the ideal conceptual one. We're not going to invest the time to separate all this out into something cleaner though, which is why all of this code is like it is anyway. 🤷🏻
A new lambda would be nice, but that's also days of work, including supporting it via both the builtin serverless example and the terraform module.
It could be another api endpoint, though I'm not sure which one. I don't think we actually want something this powerful in the API, but that's not a sufficient reason. It's really another pragmatic one -- in an actual deployment with an auth proxy, that would have to be exposed by the proxy, or a user would have to ssh tunnel directly to the stac-server to run it, rather than the user just being able to run an aws cli command to send the command to the ingest queue.
There was a problem hiding this comment.
nitpick: action is not capitalized by Collection and Feature are.
That was an intentional decision, since those names are defined by STAC and GeoJSON. I wanted to keep the discriminator as type, but didn't want to use Action as I thought it might be confused with some OGC type named Action.
There was a problem hiding this comment.
I tried to be conservative with renaming things, since most of these function, parameter, and variable names are confusing already. record isn't even a term we use in STAC, and I don't if it's supposed to be referring to OGC Records, the generic record/struct data structure, or "a thing that is recorded". I can clean this up more as a preparatory change if that's desired.
There was a problem hiding this comment.
I'd like to move ahead with merging this. While I understand and agree with most of the critique here, this change is at least in line with all of the other code. The current need that I'm attempting fill with this behavior doesn't warrant a change more significant than this, so I don't feel I can justify that investment. stac-server is not actively maintained, so we're entirely reliant on people just implementing what they need -- e.g., we should take what we can get until a better situation arises.
| ] | ||
| const items = await Promise.all(fixtureFiles.map((x) => loadJson(x))) | ||
| await ingestItems(items) | ||
| await processMessages(items) |
There was a problem hiding this comment.
nitpick: as before, this is jarring to pass items as messages. the naming incongruity is a yellow flag to me.
Related Issue(s):
Proposed Changes:
PR Checklist: