Commit 7ae4f2b

Refactor: gsutil to gcloud storage Migration with Per-Command Feature Flags (#4930)
# Refactor: `gsutil` to `gcloud storage` Migration with Per-Command Feature Flags

## Summary

This pull request fundamentally refactors our `gsutil` to `gcloud storage` migration strategy. The previous approach used a single global feature flag, `USE_GCLOUD_STORAGE`, which activated the migration for all commands simultaneously. This PR replaces that global flag with a granular, per-command flag system, allowing for a safer, more controlled, and incremental transition.

The following environment variables have been introduced to individually control their corresponding commands:

* `USE_GCLOUD_STORAGE_CP`
* `USE_GCLOUD_STORAGE_RSYNC`
* `USE_GCLOUD_STORAGE_LS`
* `USE_GCLOUD_STORAGE_MB`
* `USE_GCLOUD_STORAGE_CORS_SET`
* `USE_GCLOUD_STORAGE_DEFSTORAGECLASS_SET`
* `USE_GCLOUD_STORAGE_IAM_CH`

-----

## Motivation

The migration from `gsutil` to the more modern `gcloud storage` tools is an important step to modernize our codebase and leverage the performance and feature improvements of `gcloud`. However, a "big bang" migration with a single global flag posed a significant risk. A bug or performance issue in a single command could impact all storage operations, causing widespread instability.

This new approach mitigates that risk by allowing us to enable and test the migration for each command in isolation. This makes it easier to identify issues, roll back specific changes, and ensure a smooth, stable transition to the new tooling.

-----

## Implementation Details

The main technical changes include:

* **Dynamic Flag Function:**
    * The global `use_gcloud_storage()` function has been removed.
    * A new function, `use_gcloud_for_command(command: str)`, has been introduced in `src/clusterfuzz/_internal/google_cloud_utils/gsutil.py`. This function dynamically constructs the environment variable name (e.g., `USE_GCLOUD_STORAGE_CP`) based on the provided command and checks if it is set.
* **`GSUtilRunner` Refactoring:**
    * The `GSUtilRunner` in `gsutil.py` no longer relies on an internal state (`self.use_gcloud_storage`).
    * Its methods (`rsync`, `download_file`, etc.) now invoke `use_gcloud_for_command` with the relevant command to decide at runtime which CLI (`gsutil` or `gcloud storage`) to use.
* **Direct `gsutil` Call Updates:**
    * Scripts that invoked `gsutil` directly, such as `src/local/butler/common.py` and `src/clusterfuzz/_internal/scripts/copy_corpus.py`, have been refactored to use the new feature flag logic.
    * Shell scripts in `configs/` and `docker/` have also been updated to check the new command-specific environment variables.
* **User Interface:**
    * The upload instruction message in the `corpora` handler (`src/appengine/handlers/corpora.py`) now dynamically displays the `gsutil rsync` or `gcloud storage rsync` command, depending on the state of the `USE_GCLOUD_STORAGE_RSYNC` flag, ensuring users see the correct command.

-----

## Rollout Strategy and How to Use

Activating the new functionality is done via environment variables. By default, all commands will continue to use `gsutil`.

To enable `gcloud storage` for a specific command, set the corresponding environment variable to a truthy value (e.g., `"1"` or `"True"`).

**Example:**

To test the `cp` command using `gcloud storage`, set the following environment variable:

```bash
export USE_GCLOUD_STORAGE_CP=1
```

To revert to `gsutil`, simply unset the environment variable or set it to an empty string.

### Recommended Rollout Strategy:

1. **Staging Environment**: Start by enabling the flags one by one in a staging or testing environment.
2. **Monitoring**: After enabling each flag, closely monitor logs, performance metrics, and related functionality to ensure there are no regressions.
3. **Incremental Rollout**: We suggest the following order for a production rollout, starting with lower-risk commands:
    * `ls`
    * `mb`, `cors set`, `defstorageclass set`
    * `cp`
    * `rsync`
4. **Production**: Once a command has been thoroughly tested and validated in staging, the corresponding flag can be enabled in production.
5. **Migration Completion**: The ultimate goal is to have all flags enabled in production. After a stabilization period, we can plan a future refactoring to completely remove the legacy `gsutil` code and the feature flags themselves.

-----

## Tests

* All existing unit tests in `gsutil_test.py` and `deploy_test.py` have been updated to use the new per-command feature flags.
* Tests have been parameterized to ensure each command functions correctly with both `gsutil` (flag disabled) and `gcloud storage` (flag enabled).
* All tests, linters, and formatters are passing, ensuring code quality and consistency.
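
The per-command lookup described under Implementation Details can be sketched roughly as follows. This is an illustrative approximation, not the code in `gsutil.py`: the truthiness rule (anything other than empty, `0`, or `false` counts as enabled) and the `cli_prefix` helper are assumptions made for the example, and the real function may read its flags through a different environment helper.

```python
import os
from typing import List

# Prefix taken from the flag names listed in the commit message.
_FLAG_PREFIX = "USE_GCLOUD_STORAGE_"


def use_gcloud_for_command(command: str) -> bool:
    """Return True when the per-command flag for `command` is enabled.

    "cp" maps to USE_GCLOUD_STORAGE_CP and "cors set" maps to
    USE_GCLOUD_STORAGE_CORS_SET.
    """
    flag_name = _FLAG_PREFIX + "_".join(command.upper().split())
    value = os.environ.get(flag_name, "")
    # Assumption: unset, empty, "0" and "false" all mean "keep using gsutil".
    return value.strip().lower() not in ("", "0", "false")


def cli_prefix(command: str) -> List[str]:
    """Hypothetical helper: choose the executable prefix for one command."""
    if use_gcloud_for_command(command):
        return ["gcloud", "storage"]
    return ["gsutil"]
```

With `USE_GCLOUD_STORAGE_CP=1` exported and no rsync flag set, `cli_prefix("cp")` returns `["gcloud", "storage"]` while `cli_prefix("rsync")` returns `["gsutil"]`. The prefix alone is not enough for every command: subcommands and flags differ between the two CLIs, so a real runner also has to translate the arguments.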
1 parent aa66083 commit 7ae4f2b

21 files changed: +1372 -809 lines changed

.github/workflows/tests.yaml

Lines changed: 2 additions & 3 deletions
@@ -23,9 +23,8 @@ jobs:
 
     steps:
       - uses: actions/checkout@v3
-      - run: | # Needed for git diff to work.
-          git fetch origin master --depth 1
-          git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master
+        with:
+          fetch-depth: 0
 
       - name: Setup python environment
         uses: actions/setup-python@b55428b1882923874294fa556849718a1d7f2ca5

AGENTS.md

Lines changed: 8 additions & 0 deletions
@@ -65,6 +65,14 @@ python butler.py py_unittest -t core -m
 
 The `-m` flag runs the tests in parallel, which is recommended.
 
+### Running a single test file
+
+To run a single test file, you can use the `-p` or `--pattern` flag. For example, to run the tests in `deploy_test.py`, you can use the following command:
+
+```bash
+python butler.py py_unittest -t core -p deploy_test.py
+```
+
 ## Linting
 
 To check the code for style and linting issues, run the following command:

Pipfile

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ crcmod = "==1.7"
 future = "==0.17.1"
 protobuf = "==4.23.4"
 psutil = "==5.9.4"
+google-cloud-ndb = "==2.3.4"
 
 [dev-packages]
 Fabric = "==1.14.1"

Pipfile.lock

Lines changed: 671 additions & 380 deletions
Generated file; diff not rendered by default.

cloudbuild.yaml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ steps:
 # the repository to grab the .git directory.
 git clone 'https://github.com/google/clusterfuzz' tmp
 git -C tmp fetch origin "$COMMIT_SHA"
+git -C tmp fetch origin master
 git -C tmp checkout -qf FETCH_HEAD
 mv tmp/.git .git
 rm -rf tmp

configs/test/bot/setup/android.bash

Lines changed: 18 additions & 12 deletions
@@ -74,20 +74,26 @@ fi
 echo "Activating credentials with the Google Cloud SDK."
 $GSUTIL_PATH/gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
 
-# Otherwise, gsutil will error out due to multiple types of configured
-# credentials. For more information about this, see
-# https://cloud.google.com/storage/docs/gsutil/commands/config#configuration-file-selection-procedure
-echo "Specifying the proper Boto configuration file."
-BOTO_CONFIG_PATH=$($GSUTIL_PATH/gsutil -D 2>&1 | grep "config_file_list" | egrep -o "/[^']+gserviceaccount\.com/\.boto") || true
-if [ -f $BOTO_CONFIG_PATH ]; then
-  export BOTO_CONFIG="$BOTO_CONFIG_PATH"
-else
-  echo "WARNING: failed to identify the Boto configuration file and specify BOTO_CONFIG env."
-fi
-
 echo "Downloading ClusterFuzz source code."
 rm -rf clusterfuzz
-$GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+if [ "$USE_GCLOUD_STORAGE_CP" = "1" ]; then
+  echo "Downloading ClusterFuzz source code using gcloud."
+  $GSUTIL_PATH/gcloud storage cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+else
+  # Otherwise, gsutil will error out due to multiple types of configured
+  # credentials. For more information about this, see
+  # https://cloud.google.com/storage/docs/gsutil/commands/config#configuration-file-selection-procedure
+  echo "Specifying the proper Boto configuration file."
+  BOTO_CONFIG_PATH=$($GSUTIL_PATH/gsutil -D 2>&1 | grep "config_file_list" | egrep -o "/[^']+gserviceaccount\.com/\.boto") || true
+  if [ -f $BOTO_CONFIG_PATH ]; then
+    export BOTO_CONFIG="$BOTO_CONFIG_PATH"
+  else
+    echo "WARNING: failed to identify the Boto configuration file and specify BOTO_CONFIG env."
+  fi
+
+  echo "Downloading ClusterFuzz source code using gsutil."
+  $GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+fi
 unzip -q clusterfuzz-source.zip
 
 echo "Installing ClusterFuzz package dependencies using pipenv."

configs/test/bot/setup/linux.bash

Lines changed: 18 additions & 12 deletions
@@ -69,20 +69,26 @@ fi
 echo "Activating credentials with the Google Cloud SDK."
 $GSUTIL_PATH/gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
 
-# Otherwise, gsutil will error out due to multiple types of configured
-# credentials. For more information about this, see
-# https://cloud.google.com/storage/docs/gsutil/commands/config#configuration-file-selection-procedure
-echo "Specifying the proper Boto configuration file."
-BOTO_CONFIG_PATH=$($GSUTIL_PATH/gsutil -D 2>&1 | grep "config_file_list" | egrep -o "/[^']+gserviceaccount\.com/\.boto") || true
-if [ -f $BOTO_CONFIG_PATH ]; then
-  export BOTO_CONFIG="$BOTO_CONFIG_PATH"
-else
-  echo "WARNING: failed to identify the Boto configuration file and specify BOTO_CONFIG env."
-fi
-
 echo "Downloading ClusterFuzz source code."
 rm -rf clusterfuzz
-$GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+if [ "$USE_GCLOUD_STORAGE_CP" = "1" ]; then
+  echo "Downloading ClusterFuzz source code using gcloud."
+  $GSUTIL_PATH/gcloud storage cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+else
+  # Otherwise, gsutil will error out due to multiple types of configured
+  # credentials. For more information about this, see
+  # https://cloud.google.com/storage/docs/gsutil/commands/config#configuration-file-selection-procedure
+  echo "Specifying the proper Boto configuration file."
+  BOTO_CONFIG_PATH=$($GSUTIL_PATH/gsutil -D 2>&1 | grep "config_file_list" | egrep -o "/[^']+gserviceaccount\.com/\.boto") || true
+  if [ -f $BOTO_CONFIG_PATH ]; then
+    export BOTO_CONFIG="$BOTO_CONFIG_PATH"
+  else
+    echo "WARNING: failed to identify the Boto configuration file and specify BOTO_CONFIG env."
+  fi
+
+  echo "Downloading ClusterFuzz source code using gsutil."
+  $GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+fi
 unzip -q clusterfuzz-source.zip
 
 echo "Installing ClusterFuzz package dependencies using pipenv."

configs/test/bot/setup/mac.bash

Lines changed: 15 additions & 9 deletions
@@ -65,17 +65,23 @@ echo "Specifying the proper Boto configuration file."
 # Otherwise, gsutil will error out due to multiple types of configured
 # credentials. For more information about this, see
 # https://cloud.google.com/storage/docs/gsutil/commands/config#configuration-file-selection-procedure
-BOTO_CONFIG_PATH=$($GSUTIL_PATH/gsutil -D 2>&1 | grep "config_file_list" | egrep -o "/[^']+gserviceaccount\.com/\.boto")
-
-if [ -f $BOTO_CONFIG_PATH ]; then
-  export BOTO_CONFIG="$BOTO_CONFIG_PATH"
-else
-  echo "WARNING: failed to identify the Boto configuration file and specify BOTO_CONFIG env."
-fi
-
 echo "Downloading ClusterFuzz source code."
 rm -rf clusterfuzz
-$GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+if [ "$USE_GCLOUD_STORAGE_CP" = "1" ]; then
+  echo "Downloading ClusterFuzz source code using gcloud."
+  $GSUTIL_PATH/gcloud storage cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+else
+  BOTO_CONFIG_PATH=$($GSUTIL_PATH/gsutil -D 2>&1 | grep "config_file_list" | egrep -o "/[^']+gserviceaccount\.com/\.boto")
+
+  if [ -f $BOTO_CONFIG_PATH ]; then
+    export BOTO_CONFIG="$BOTO_CONFIG_PATH"
+  else
+    echo "WARNING: failed to identify the Boto configuration file and specify BOTO_CONFIG env."
+  fi
+
+  echo "Downloading ClusterFuzz source code using gsutil."
+  $GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/$DEPLOYMENT_ZIP clusterfuzz-source.zip
+fi
 unzip -q clusterfuzz-source.zip
 
 echo "Installing ClusterFuzz package dependencies using pipenv."

configs/test/gce/android-init.bash

Lines changed: 5 additions & 1 deletion
@@ -92,7 +92,11 @@ cd $INSTALL_DIRECTORY
 
 echo "Downloading ClusterFuzz source code."
 rm -rf $ROOT_DIR
-$GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/linux.zip clusterfuzz-source.zip
+if [ "$USE_GCLOUD_STORAGE_CP" = "1" ]; then
+  $GSUTIL_PATH/gcloud storage cp gs://$DEPLOYMENT_BUCKET/linux.zip clusterfuzz-source.zip
+else
+  $GSUTIL_PATH/gsutil cp gs://$DEPLOYMENT_BUCKET/linux.zip clusterfuzz-source.zip
+fi
 unzip -q clusterfuzz-source.zip
 
 echo "Setting up android."

configs/test/gce/windows-init.ps1

Lines changed: 6 additions & 1 deletion
@@ -270,9 +270,14 @@ rm "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\sym" -Recurse -Force
 $deploymentBucket = $webClient.DownloadString('http://metadata.google.internal/computeMetadata/v1/project/attributes/deployment-bucket')
 
 # Download ClusterFuzz source.
+Write-Host "Downloading ClusterFuzz source code."
 rm c:\clusterfuzz -Recurse -Force
 $fileName = "$tmp\clusterfuzz.zip"
-gsutil cp gs://$deploymentBucket/windows-3.zip $fileName
+if ($env:USE_GCLOUD_STORAGE_CP -eq "1") {
+  gcloud storage cp gs://$deploymentBucket/windows-3.zip $fileName
+} else {
+  gsutil cp gs://$deploymentBucket/windows-3.zip $fileName
+}
 unzip $fileName
 
 # Resize partition to max available size.
