Episodes that reach STUCK_EPISODE_THRESHOLD once should remain in the error state until they are completely uploaded. Currently the CloudWatch alarms cycle in and out of the error state because the alarm samples the publisher's log state more frequently than the stuck episode period.
The current log level periods look like:
┌───────────┬──────────────────────────────────┬───────────┐
│ Duration  │ Threshold                        │ Log level │
├───────────┼──────────────────────────────────┼───────────┤
│ < 20 min  │ below SLOW_EPISODE_THRESHOLD     │ info      │
├───────────┼──────────────────────────────────┼───────────┤
│ 20–35 min │ SLOW_EPISODE_THRESHOLD (20 min)  │ warn      │
├───────────┼──────────────────────────────────┼───────────┤
│ >= 35 min │ STUCK_EPISODE_THRESHOLD (35 min) │ error     │
└───────────┴──────────────────────────────────┴───────────┘
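
A minimal sketch of the intended behavior, not the actual publisher code: the threshold values come from the table above, while `log_level`, `StickyEpisodeLevels`, `level_for`, and `mark_uploaded` are hypothetical names introduced for illustration. It assumes each episode has a stable ID and that the publisher is told when an upload completes.

```python
from datetime import timedelta

# Values taken from the table above.
SLOW_EPISODE_THRESHOLD = timedelta(minutes=20)
STUCK_EPISODE_THRESHOLD = timedelta(minutes=35)


def log_level(duration: timedelta) -> str:
    """Map an episode's duration to a log level, as in the table."""
    if duration >= STUCK_EPISODE_THRESHOLD:
        return "error"
    if duration >= SLOW_EPISODE_THRESHOLD:
        return "warn"
    return "info"


class StickyEpisodeLevels:
    """Hypothetical tracker: once an episode reaches 'error', it stays
    there until it is explicitly marked as uploaded."""

    def __init__(self) -> None:
        self._stuck: set[str] = set()

    def level_for(self, episode_id: str, duration: timedelta) -> str:
        # An episode that was stuck once keeps reporting 'error',
        # whatever duration is reported on later samples.
        if episode_id in self._stuck:
            return "error"
        level = log_level(duration)
        if level == "error":
            self._stuck.add(episode_id)
        return level

    def mark_uploaded(self, episode_id: str) -> None:
        # Upload completed: clear the sticky error state.
        self._stuck.discard(episode_id)
```

With this shape, every sample the alarm takes between the first error and the completed upload sees the same error level, so the alarm has no reason to flip back to OK in between.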