
Conversation

@iamh2o iamh2o commented Dec 2, 2025

Summary

  • replace the Python package with a single bash script providing clone, verify, and ensure commands for the reference S3 buckets
  • add usage documentation for the new bash entry point and remove Python-specific files

Testing

  • not run (bash script only)

Codex Task

Copilot AI review requested due to automatic review settings December 2, 2025 08:11
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +343 to +345

```shell
set_aws_env "$profile" "$region"

case "$cmd" in
```

P1: Honour command-level region when configuring AWS

The AWS environment is set at the top of main (line 343) before the per-command options are parsed, so a --region passed after clone/ensure (as shown in the README examples) never updates AWS_REGION/AWS_DEFAULT_REGION. Profiles without a default region will then fail with “You must specify a region” during the STS and S3 calls, and even with a default the AWS CLI may operate in the wrong region despite the bucket name using the requested suffix. The env should be set after parsing the command-specific --region so the AWS CLI runs in the user-specified region.
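The proposed ordering can be sketched minimally. This is an illustration only: the function and option names (`set_aws_env`, `main`) follow the review's description and may not match the script's actual identifiers.

```shell
#!/usr/bin/env bash
# Sketch: parse command-level options BEFORE exporting the AWS environment,
# so a trailing --region wins over the global default.
set -euo pipefail

set_aws_env() {
  export AWS_PROFILE="$1" AWS_REGION="$2" AWS_DEFAULT_REGION="$2"
}

main() {
  local profile="daylily-service" region="us-east-1" cmd="$1"; shift
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --region) region="$2"; shift 2 ;;
      *) shift ;;
    esac
  done
  # Only now, with the command-level --region applied, touch the environment.
  set_aws_env "$profile" "$region"
  echo "AWS_REGION=$AWS_REGION"
}

main clone --region us-west-2   # prints AWS_REGION=us-west-2
```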

Copilot AI left a comment

Pull request overview

This PR performs a complete rewrite of the repository, replacing the Python package with a single bash script for managing AWS S3 reference buckets. The change eliminates all Python dependencies and infrastructure in favor of direct AWS CLI usage.

Key changes:

  • Replaces Python CLI and library with a ~415-line bash script (scripts/daylily-omics-references.sh)
  • Removes all Python source files, test suite, and build configuration
  • Updates documentation to reflect the bash-based workflow

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| scripts/daylily-omics-references.sh | New bash implementation providing clone, verify, and ensure commands for S3 bucket management |
| README.md | Updated documentation with bash script usage examples; removed Python-specific installation instructions |
| src/daylily_omics_references/*.py | Removed Python package implementation (manager, CLI, constants) |
| src/boto3/*.py, src/botocore/*.py | Removed test stub implementations of the AWS SDK |
| tests/test_manager.py | Removed Python test suite |
| pyproject.toml | Removed Python package configuration |
| pytest.ini | Removed pytest configuration |
| environment.yml | Removed Conda environment specification |

```shell
  fi
  printf '%s' "$version" | aws s3 cp - "s3://$bucket/$VERSION_INFO_KEY"
}
```

Copilot AI Dec 2, 2025

When endpoint_args is empty (acceleration disabled), the unguarded expansion "${endpoint_args[@]}" fails with an "unbound variable" error under set -u on bash versions before 4.4. Consider using "${endpoint_args[@]+"${endpoint_args[@]}"}" to safely handle the empty-array case on every bash version.

Suggested change

```shell
local cmd=(aws s3 cp "s3://$source_bucket/$prefix" "s3://$dest_bucket/$prefix" --recursive --request-payer requester --metadata-directive REPLACE ${endpoint_args[@]+"${endpoint_args[@]}"})
```
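The difference is easy to demonstrate outside the script. In this sketch the `endpoint_args`/`build_cmd` names are illustrative, and `echo` stands in for the real `aws` invocation so it runs without credentials.

```shell
#!/usr/bin/env bash
# The ${arr[@]+"${arr[@]}"} guard: expands to nothing when the array is empty,
# instead of tripping "unbound variable" under set -u on bash < 4.4.
set -u

endpoint_args=()   # acceleration disabled: no extra flags

build_cmd() {
  # echo stands in for aws so the sketch is runnable anywhere
  local cmd=(echo aws s3 cp ${endpoint_args[@]+"${endpoint_args[@]}"})
  "${cmd[@]}"
}

build_cmd   # prints: aws s3 cp

endpoint_args=(--endpoint-url https://s3-accelerate.amazonaws.com)
build_cmd   # prints: aws s3 cp --endpoint-url https://s3-accelerate.amazonaws.com
```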

```shell
verify_bucket() {
  local bucket="$1" version="$2" include_hg38="$3" include_b37="$4" include_giab="$5"

  if ! bucket_exists "$bucket"; then
```

Copilot AI Dec 2, 2025

Using echo to pipe output may be problematic if the version string contains special characters or escape sequences. Consider using printf '%s' "$bucket_version" instead of echo "$bucket_version" for more reliable output handling.

Suggested change

```shell
if ! bucket_exists "$bucket"; then
bucket_version=$(printf '%s\n' "$bucket_version" | tr -d '\n' | tr -d '\r')
```
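A quick standalone illustration of why `printf '%s'` is the safer choice here; the version strings below are made up for demonstration.

```shell
#!/usr/bin/env bash
# echo may interpret backslash escapes (dash/sh, or bash with xpg_echo on)
# and silently consumes a leading -n; printf '%s' emits the string verbatim.

version='v1.2\n'                     # contains a literal backslash-n
printf '%s' "$version"               # prints v1.2\n verbatim, no newline added
echo

# Normalising a fetched version string before comparison, as suggested:
bucket_version=$'v0.7.131c\r\n'      # trailing CRLF from a remote read
clean=$(printf '%s' "$bucket_version" | tr -d '\r\n')
printf '%s\n' "$clean"               # prints: v0.7.131c
```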

```shell
    "${cmd[@]}"
  fi
}
```

Copilot AI Dec 2, 2025

Using echo to pipe JSON output may be unreliable. Consider piping $output directly: [[ $(jq '.Contents | length' <<<"$output") -gt 0 ]] to avoid potential issues with echo interpreting backslashes or special characters.

Suggested change

```shell
[[ $(jq '.Contents | length' <<<"$output") -gt 0 ]]
```
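As a self-contained illustration of the here-string form (the sample JSON is made up; requires `jq`):

```shell
#!/usr/bin/env bash
# <<< feeds the variable to jq directly -- no echo in the pipeline, so
# backslashes and special characters in the JSON survive untouched.
set -euo pipefail

output='{"Contents":[{"Key":"hg38/ref.fa"},{"Key":"hg38/ref.fa.fai"}]}'

if [[ $(jq '.Contents | length' <<<"$output") -gt 0 ]]; then
  echo "prefix exists"
fi
# prints: prefix exists
```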

Comment on lines +81 to +93

```shell
local args=("--bucket" "$bucket")
if [[ "$region" != "us-east-1" ]]; then
  args+=("--create-bucket-configuration" "LocationConstraint=$region")
fi

log INFO "Creating bucket $bucket in $region"
aws s3api create-bucket "${args[@]}"
log INFO "Waiting for bucket $bucket to become listable"
aws s3api wait bucket-exists --bucket "$bucket"
aws s3api put-bucket-accelerate-configuration --bucket "$bucket" --accelerate-configuration Status=Enabled

local account_id
```

Copilot AI Dec 2, 2025

If the aws s3api create-bucket command fails, the script will exit due to set -e, but subsequent commands (wait, accelerate, policy) will not execute. Consider adding explicit error checking and cleanup logic to handle partial bucket creation failures gracefully.
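One possible shape for that cleanup, sketched with a stubbed `aws` function so the pattern runs without credentials. The function name, rollback choice (deleting the half-configured bucket), and FAIL_STEP mechanism are all illustrative, not the script's current behaviour.

```shell
#!/usr/bin/env bash
# Sketch: roll back a half-configured bucket if any post-create step fails.
# `aws` is stubbed; set FAIL_STEP to a subcommand name to simulate a failure.
set -euo pipefail

aws() {
  if [[ "${FAIL_STEP:-}" == "$2" ]]; then return 1; fi
  echo "aws $*"
}

create_bucket_safely() {
  local bucket="$1"
  aws s3api create-bucket --bucket "$bucket" || return 1
  # From here on, a failure would leave a half-configured bucket: clean it up.
  if ! aws s3api wait bucket-exists --bucket "$bucket" ||
     ! aws s3api put-bucket-accelerate-configuration --bucket "$bucket" \
          --accelerate-configuration Status=Enabled; then
    echo "configuration failed; removing partially created bucket $bucket" >&2
    aws s3api delete-bucket --bucket "$bucket" >/dev/null || true
    return 1
  fi
}

create_bucket_safely demo-bucket                       # all steps succeed
FAIL_STEP=put-bucket-accelerate-configuration \
  create_bucket_safely demo-bucket || echo "rolled back"
```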

Comment on lines +77 to +78

```
dependencies are required. If you modify the script, ensure it remains POSIX
shell compatible and keep the usage examples up to date.
```

Copilot AI Dec 2, 2025

The statement "ensure it remains POSIX shell compatible" is inconsistent with the script implementation. The bash script uses bash-specific features (e.g., associative arrays with declare -A, [[ test syntax, (( )) arithmetic). Consider either updating this to say "bash compatible" or clarifying that the script is bash-specific, not POSIX shell.

Suggested change

```diff
-dependencies are required. If you modify the script, ensure it remains POSIX
-shell compatible and keep the usage examples up to date.
+dependencies are required. If you modify the script, ensure it remains bash
+compatible and keep the usage examples up to date.
```
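For anyone auditing the claim, these are the kinds of bash-only constructs in question; each fails under dash or a strict POSIX sh. The variable contents below are made up for illustration.

```shell
#!/usr/bin/env bash
# bash-only features of the kind the review mentions; none are POSIX sh.

declare -A genomes=([hg38]="GRCh38" [b37]="GRCh37")    # associative arrays
[[ "hg38" == hg* ]] && echo "[[ ]] glob match works"   # [[ ]] test syntax
(( ${#genomes[@]} == 2 )) && echo "(( )) arithmetic works"
```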

Comment on lines +33 to 40
```
scripts/daylily-omics-references.sh \
--profile daylily-service \
clone \
--bucket-prefix myorg \
--region us-west-2 \
--execute
```

Copilot AI Dec 2, 2025

The example shows using backslash line continuations within a code block, but uses indented continuation lines which may not execute correctly if copied directly. Consider aligning the continuation properly or using a single-line command for clarity.

```shell
log INFO "Creating bucket $bucket in $region"
aws s3api create-bucket "${args[@]}"
log INFO "Waiting for bucket $bucket to become listable"
aws s3api wait bucket-exists --bucket "$bucket"
```

Copilot AI Dec 2, 2025

The bucket policy is constructed as a single-line JSON string which makes it difficult to read and maintain. Consider using proper formatting with line breaks or using jq to construct the policy for better maintainability.

Suggested change

```
aws s3api wait bucket-exists --bucket "$bucket"
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceTLS",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::$bucket",
        "arn:aws:s3:::$bucket/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    },
    {
      "Sid": "AllowAccountFullAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::$account_id:root"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::$bucket",
        "arn:aws:s3:::$bucket/*"
      ]
    }
  ]
}
```
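If generating the policy programmatically is preferred over an inline string, a jq-based construction along these lines keeps it readable and safely quoted. The bucket and account values are placeholders, and this requires `jq`.

```shell
#!/usr/bin/env bash
# Sketch: build the bucket policy with jq --arg instead of a one-line string.
set -euo pipefail

bucket="example-bucket"          # placeholder
account_id="123456789012"        # placeholder

policy=$(jq -n --arg bucket "$bucket" --arg account "$account_id" '{
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "EnforceTLS",
      Effect: "Deny",
      Principal: "*",
      Action: "s3:*",
      Resource: ["arn:aws:s3:::\($bucket)", "arn:aws:s3:::\($bucket)/*"],
      Condition: { Bool: { "aws:SecureTransport": "false" } }
    },
    {
      Sid: "AllowAccountFullAccess",
      Effect: "Allow",
      Principal: { AWS: "arn:aws:iam::\($account):root" },
      Action: "s3:*",
      Resource: ["arn:aws:s3:::\($bucket)", "arn:aws:s3:::\($bucket)/*"]
    }
  ]
}')
printf '%s\n' "$policy"
# then: aws s3api put-bucket-policy --bucket "$bucket" --policy "$policy"
```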

```
--include-giab       Include GIAB data (default: enabled)
--exclude-giab       Skip GIAB data
--execute            Perform actions when creating buckets (default is dry-run)
--use-acceleration   Copy using the S3 accelerate endpoint
```

Copilot AI Dec 2, 2025

The script uses jq in the prefix_exists() function (line 150) but only checks for the aws binary. Add require_binary jq here to ensure jq is available before the script runs.

Suggested change
```diff
 --use-acceleration Copy using the S3 accelerate endpoint
+require_binary aws
+require_binary jq
```
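A self-contained sketch of such a guard. The `require_binary` body here is assumed from the name the review uses; the script's real helper may differ.

```shell
#!/usr/bin/env bash
# Fail fast if a required external tool is missing from PATH.
require_binary() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "ERROR: required binary '$1' not found in PATH" >&2
    return 1
  }
}

require_binary sh && echo "sh found"                       # always on PATH
require_binary no-such-tool-xyz || echo "missing, as expected"
```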

```shell
log INFO "Creating bucket $bucket in $region"
aws s3api create-bucket "${args[@]}"
log INFO "Waiting for bucket $bucket to become listable"
aws s3api wait bucket-exists --bucket "$bucket"
```

Copilot AI Dec 2, 2025

The bucket policy grants full s3:* permissions to the entire AWS account via Principal {"AWS": "arn:aws:iam::$account_id:root"}. This effectively allows any IAM user/role in the account to read, write, delete objects, and even modify the bucket policy, enabling tampering with reference data. Restrict the Principal to specific IAM roles/users that require access and scope Action to least privilege (e.g., read-only where possible), or remove this broad allow and rely on IAM identity policies; also consider enabling S3 Block Public Access and bucket-owner-enforced object ownership.
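As one possible shape for a tighter policy (the account ID, role name, and bucket name below are placeholders, and the right principals depend entirely on the deployment):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyForReferenceConsumers",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/daylily-reference-reader" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```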
