-
Notifications
You must be signed in to change notification settings - Fork 8.5k
Description
Summary
The deletePolicyBulk method in synthetics_private_location.ts silently swallows all errors during package policy deletion, making it impossible for callers to detect and handle failures. This can lead to orphaned package policies and inconsistent state.
Current Behavior
In synthetics_private_location.ts:
async deletePolicyBulk(policyIdsToDelete: string[]) {
const soClient = this.server.coreStart.savedObjects.createInternalRepository();
const esClient = this.server.coreStart.elasticsearch.client.asInternalUser;
if (policyIdsToDelete.length > 0) {
try {
return await this.server.fleet.packagePolicyService.delete(
soClient,
esClient,
policyIdsToDelete,
{
force: true,
asyncDeploy: true,
}
);
} catch (e) {
this.server.logger.error(e);
// Returns undefined on error - caller cannot detect failure
}
}
}The calling code in editMonitors does not capture or check the delete response:
const [_createResponse, failedUpdatesRes] = await Promise.all([
this.createPolicyBulk(policiesToCreate),
this.updatePolicyBulk(policiesToUpdate),
this.deletePolicyBulk(uniqueToDelete), // Delete response is discarded
]);Problems
1. Silent failure masking
All errors (500s, 404s, permission errors, etc.) are caught and logged but not surfaced to the caller. The method returns undefined on error, which is indistinguishable from a successful operation with no policies to delete.
Impact:
- 500 errors during deletion are completely invisible — orphaned package policies may remain
- The monitor edit operation will appear successful even when policies weren't properly cleaned up
2. Potential race condition with parallel operations
The editMonitors function executes create, update, and delete operations in parallel using Promise.all. While this improves performance, it can create race conditions where:
- A "migrated" policy is being created with a new ID while the legacy policy is being deleted
- Fleet agents may receive operations in an unexpected order
Impact:
- During migration flows, agents could temporarily see inconsistent state
- In edge cases, the wrong policy could be active on the agent
Suggested Improvements
Option A: Return detailed results and handle in caller
async deletePolicyBulk(policyIdsToDelete: string[]): Promise<DeletePackagePoliciesResponse | undefined> {
if (policyIdsToDelete.length === 0) {
return undefined;
}
const soClient = this.server.coreStart.savedObjects.createInternalRepository();
const esClient = this.server.coreStart.elasticsearch.client.asInternalUser;
const result = await this.server.fleet.packagePolicyService.delete(
soClient,
esClient,
policyIdsToDelete,
{ force: true, asyncDeploy: true }
);
// Log and return failures for caller to handle
const failures = result.filter(r => r.statusCode !== 200 && r.statusCode !== 404);
if (failures.length > 0) {
this.server.logger.error(`Failed to delete ${failures.length} package policies: ${JSON.stringify(failures)}`);
}
return result;
}Option B: Distinguish 404s from actual failures
async deletePolicyBulk(policyIdsToDelete: string[]) {
// ... existing setup ...
try {
return await this.server.fleet.packagePolicyService.delete(/* ... */);
} catch (e) {
// 404s are expected (policy already deleted or never existed)
if (e.statusCode === 404) {
this.server.logger.debug(`Policy not found during cleanup: ${e.message}`);
return [];
}
// Re-throw actual failures so caller can handle them
throw e;
}
}Option C: Sequential operations for migration scenarios
For migration scenarios specifically, consider executing delete operations after create/update to avoid race conditions:
const [_createResponse, failedUpdatesRes] = await Promise.all([
this.createPolicyBulk(policiesToCreate),
this.updatePolicyBulk(policiesToUpdate),
]);
// Delete after create/update completes to avoid race conditions
const deleteResult = await this.deletePolicyBulk(uniqueToDelete);Additional Context
This issue exists independently of recent changes and affects all monitor edit operations involving private locations. The Fleet packagePolicyService.delete method returns detailed results per policy (including statusCode and body.message), but this information is currently discarded.
Acceptance Criteria
-
deletePolicyBulkreturns meaningful results to callers - Actual deletion failures (5xx errors) are surfaced, not just logged
- Expected "not found" scenarios (404s) are handled gracefully
- Callers can detect and respond to deletion failures appropriately