Add SSM command output logging to S3 (Traceability)#306

Open
harshavemula-ua wants to merge 1 commit into main from feature/ssm-logging-clean

Conversation

@harshavemula-ua harshavemula-ua commented Feb 11, 2026

Summary

  • Add full EC2 command stdout/stderr logging to s3://ciroh-community-ngen-datastream/ssm-logs/{execution_name}/ via SSM OutputS3BucketName and OutputS3KeyPrefix
  • Inject Step Functions execution name into Commander Lambda payload using States.JsonMerge for traceable, browsable log paths
  • Logs are organized by execution name (e.g., ssm-logs/cfe_nom_short_range_00_VPU_07_20260211.../) matching the Step Functions console
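
As a rough illustration of the first bullet, the sketch below builds the keyword arguments for boto3's SSM `send_command` with S3 output enabled. The helper name `build_send_command_kwargs` and the `unknown-execution` fallback are assumptions made for illustration, not the code in this PR. `OutputS3BucketName`, `OutputS3KeyPrefix`, and the `executionTimeout` value are taken from the diff.

```python
def build_send_command_kwargs(instance_id, commands, execution_name,
                              bucket="ciroh-community-ngen-datastream"):
    """Return kwargs for ssm.send_command with stdout/stderr written to S3."""
    # Assumption: fall back to a fixed folder when no execution name was injected.
    name = execution_name or "unknown-execution"
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {
            "commands": list(commands),
            "executionTimeout": [f"{3600 * 24}"],  # 24 hours, as in the diff
        },
        # SSM appends {command_id}/{instance_id}/... below this prefix.
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": f"ssm-logs/{name}",
    }
```

With boto3 available, the result could be passed as `boto3.client("ssm").send_command(**kwargs)`. Keeping the argument construction in a pure function makes the log path easy to unit test without AWS access.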

S3 Log Folder Structure

ssm-logs/
  └── {execution_name}/                          ← matches Step Functions execution name
        └── {command_id}/                         ← SSM command ID
              └── {instance_id}/                  ← EC2 instance ID
                    └── awsrunShellScript/
                          └── 0.awsrunShellScript/
                                ├── stdout        ← full command output
                                └── stderr        ← full error output
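
To locate the logs for a given run programmatically, the layout above can be turned into object keys. This is a minimal sketch: the helper name `ssm_log_keys` is hypothetical, and the command and instance IDs in the usage note are placeholders.

```python
def ssm_log_keys(execution_name, command_id, instance_id):
    """Return the S3 keys for stdout and stderr, following the folder layout above."""
    base = (
        f"ssm-logs/{execution_name}/{command_id}/{instance_id}/"
        "awsrunShellScript/0.awsrunShellScript"
    )
    return {"stdout": f"{base}/stdout", "stderr": f"{base}/stderr"}
```

For example, `ssm_log_keys("my-exec", "11111111-2222-3333-4444-555555555555", "i-0123456789abcdef0")["stdout"]` gives the key that could be passed to an S3 `get_object` call on the ciroh-community-ngen-datastream bucket.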

Lifecycle Configuration

aws s3api put-bucket-lifecycle-configuration \
  --bucket ciroh-community-ngen-datastream \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "DeleteSSMLogsAfter3Days",
        "Filter": {
          "Prefix": "ssm-logs/"
        },
        "Status": "Enabled",
        "Expiration": {
          "Days": 3
        }
      }
    ]
  }'
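
One caveat: `put-bucket-lifecycle-configuration` replaces the bucket's entire lifecycle configuration, so any rules already on the bucket would need to be included in the same call. A minimal sketch of merging the rule above into an existing rule list follows. The helper name `merge_lifecycle_rule` is hypothetical, and whether this bucket currently has other rules is not stated in this PR.

```python
# The rule from the CLI command above, expressed as a Python dict.
SSM_LOG_RULE = {
    "ID": "DeleteSSMLogsAfter3Days",
    "Filter": {"Prefix": "ssm-logs/"},
    "Status": "Enabled",
    "Expiration": {"Days": 3},
}


def merge_lifecycle_rule(existing_rules, new_rule):
    """Return existing rules plus new_rule, replacing any rule with the same ID."""
    kept = [rule for rule in existing_rules if rule.get("ID") != new_rule["ID"]]
    return kept + [new_rule]
```

With boto3, the existing rules could be read from `get_bucket_lifecycle_configuration(Bucket=...)["Rules"]` (which raises a `ClientError` when the bucket has no lifecycle configuration) and the merged list written back with `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration={"Rules": merged})`.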

Sample Output

Test plan

  • Deployed to test_harsha environment
  • Triggered state machine execution — Commander succeeded
  • Verified SSM logs written to s3://ciroh-community-ngen-datastream/ssm-logs/test-exec-logging-20260211-123820/
  • Confirmed full stdout (1.2 MB) and stderr (38 KB) captured without truncation
  • Verify no impact on existing scheduled production executions

Write full EC2 command stdout/stderr to
s3://ciroh-community-ngen-datastream/ssm-logs/{execution_name}/
by adding OutputS3BucketName and OutputS3KeyPrefix to the SSM
send_command call. Inject Step Functions execution name into the
Commander Lambda payload via States.JsonMerge for traceable log paths.
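
For readers unfamiliar with States.JsonMerge, here is one way the injection could look, written as Python dicts that serialize to Amazon States Language. This is a sketch under assumptions, not the state machine in this PR: the state names, the `$.trace` path, and the `commander-lambda` function name are placeholders. `shallow_merge` is only a local stand-in for the shallow merge that States.JsonMerge performs when its third argument is `false`.

```python
import json

# Hypothetical states: a Pass state copies the execution name from the
# context object, then the Task state merges it into the Lambda payload.
STATES = {
    "AddExecutionName": {
        "Type": "Pass",
        "Parameters": {"execution_name.$": "$$.Execution.Name"},
        "ResultPath": "$.trace",
        "Next": "Commander",
    },
    "Commander": {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
            "FunctionName": "commander-lambda",  # placeholder name
            "Payload.$": "States.JsonMerge($, $.trace, false)",
        },
        "End": True,
    },
}


def shallow_merge(first, second):
    """Local stand-in for a shallow JSON merge: keys in `second` win."""
    return {**first, **second}


# What the Lambda would receive for an example state input.
state_input = {"commands": ["echo hi"], "trace": {"execution_name": "exec-1"}}
payload = shallow_merge(state_input, state_input["trace"])
asl_json = json.dumps(STATES, indent=2)
```

Here `payload["execution_name"]` is `"exec-1"`, which the Lambda could then use to build the `ssm-logs/{execution_name}` prefix.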
@harshavemula-ua changed the title from "Add SSM command output logging to S3" to "Add SSM command output logging to S3 (Traceability)" on Feb 11, 2026
}
'executionTimeout': [f"{3600*24}"]
},
OutputS3BucketName='ciroh-community-ngen-datastream',

@harshavemula-ua, I think this is a worthy and needed feature, but it may motivate some changes in datastreamcli or the prod BMI configs before we deploy. The reason is that NextGen can generate a massive amount of printouts, which will bog the EC2 instance down if SSM needs to stream everything to S3. I've dealt with this in the past by piping all of the ngiab Docker printouts to /dev/null, but that discards everything NextGen prints. Before we merge this, let's run some tests to confirm the EC2 instance does not slow down substantially for any execution type. If it does, let's identify why and see if we can suppress printouts in a judicious way.
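
One possible middle ground between streaming everything and piping to /dev/null is to keep only the lines that look like problems and count the rest. The sketch below is an illustration only: the function name `filter_output` and the default pattern are guesses, not based on actual NextGen output, and would need tuning against real logs.

```python
import re


def filter_output(lines, keep=r"error|warn|exception|fail"):
    """Return (kept_lines, suppressed_count), keeping only lines that match `keep`."""
    pattern = re.compile(keep, re.IGNORECASE)
    kept, suppressed = [], 0
    for line in lines:
        if pattern.search(line):
            kept.append(line)
        else:
            suppressed += 1
    return kept, suppressed
```

Reporting the suppressed count alongside the kept lines would at least show how much output was dropped for each execution type during the slowdown tests.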
