Commit d73df1e

Consolidate coverage uploading (#787)

* Use test template for gpu tests
* Remove unneeded dependencies
* Fix naming
* Update build args for dockerfile
* Remove the use of venv
* Prune dockerfile
* Set workdir
* Update workspace
* Use source activate
* Fix typo
* Update test path
* Update pytest run command
* Set timeout to 20 minutes
* Coverage settings in pyproject toml
* Update codecov yml
* Check workspace if coverage file was generated
* Expand workspace ls
* Remove coveragerc in favor of pyproject.toml
* Update source path
* Move cpu test into cicd NeMo
* Update cpu test config
* Fix workflow syntax
* Fix spacing on workflow syntax
* Update cpu test name
* Update naming and dependency
* Echo coverage report name
* Echo coverage report var
* Add id to cpu coverage report generation
* Test updating gpu needs
* Revert need
* Update coverage paths
* Address PR comments
* Update name
* Update docs_only logic
* Remove config from testing

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
1 parent 115c547 commit d73df1e

9 files changed: +644 -330 lines changed

.coveragerc

Lines changed: 0 additions & 8 deletions
This file was deleted.
Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+name: "Test Template"
+description: "Template for running NeMo tests in a containerized environment"
+
+inputs:
+  timeout:
+    description: "Max runtime of test in minutes"
+    required: false
+    default: "10"
+  script:
+    description: "Test script to execute"
+    required: true
+  is_optional:
+    description: "Failure will cancel all other tests if set to true"
+    required: false
+    default: "false"
+  is_unit_test:
+    description: "Upload coverage as unit test"
+    required: false
+    default: "false"
+  cpu-only:
+    description: "Run tests on CPU only"
+    required: false
+    default: "false"
+  azure-client-id:
+    description: "Azure Client ID"
+    required: true
+  azure-tenant-id:
+    description: "Azure Tenant ID"
+    required: true
+  azure-subscription-id:
+    description: "Azure Subscription ID"
+    required: true
+  has-azure-credentials:
+    description: "Has Azure credentials"
+    required: false
+    default: "false"
+  PAT:
+    description: "GitHub Personal Access Token"
+    required: true
+runs:
+  using: "composite"
+  steps:
+    - name: Install Azure CLI
+      if: ${{ inputs.has-azure-credentials == 'true' }}
+      shell: bash
+      run: |
+        curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
+
+    - name: Azure Login
+      if: ${{ inputs.has-azure-credentials == 'true' }}
+      uses: azure/login@v2
+      with:
+        client-id: ${{ inputs.azure-client-id }}
+        tenant-id: ${{ inputs.azure-tenant-id }}
+        subscription-id: ${{ inputs.azure-subscription-id }}
+
+    - name: Azure Fileshare
+      if: ${{ inputs.has-azure-credentials == 'true' && inputs.is_unit_test == 'false' }}
+      shell: bash
+      id: azure-fileshare
+      run: |
+        sudo apt update
+        sudo apt install -y cifs-utils
+
+        RESOURCE_GROUP_NAME="azure-gpu-vm-runner_group"
+        STORAGE_ACCOUNT_NAME="nemocistorageaccount2"
+        FILE_SHARE_NAME="fileshare"
+
+        MNT_ROOT="/media"
+        MNT_PATH="$MNT_ROOT/$STORAGE_ACCOUNT_NAME/$FILE_SHARE_NAME"
+
+        echo "MNT_PATH=$MNT_PATH" | tee -a "$GITHUB_OUTPUT"
+
+        sudo mkdir -p $MNT_PATH
+
+        # Create a folder to store the credentials for this storage account and
+        # any other that you might set up.
+        CREDENTIAL_ROOT="/etc/smbcredentials"
+        sudo mkdir -p "/etc/smbcredentials"
+
+        # Get the storage account key for the indicated storage account.
+        # You must be logged in with az login and your user identity must have
+        # permissions to list the storage account keys for this command to work.
+        STORAGE_ACCOUNT_KEY=$(az storage account keys list \
+          --resource-group $RESOURCE_GROUP_NAME \
+          --account-name $STORAGE_ACCOUNT_NAME \
+          --query "[0].value" --output tsv | tr -d '"')
+
+        # Create the credential file for this individual storage account
+        SMB_CREDENTIAL_FILE="$CREDENTIAL_ROOT/$STORAGE_ACCOUNT_NAME.cred"
+        if [ ! -f $SMB_CREDENTIAL_FILE ]; then
+          echo "username=$STORAGE_ACCOUNT_NAME" | sudo tee $SMB_CREDENTIAL_FILE > /dev/null
+          echo "password=$STORAGE_ACCOUNT_KEY" | sudo tee -a $SMB_CREDENTIAL_FILE > /dev/null
+        else
+          echo "The credential file $SMB_CREDENTIAL_FILE already exists, and was not modified."
+        fi
+
+        # Change permissions on the credential file so only root can read or modify the password file.
+        sudo chmod 600 $SMB_CREDENTIAL_FILE
+
+        # This command assumes you have logged in with az login
+        HTTP_ENDPOINT=$(az storage account show --resource-group $RESOURCE_GROUP_NAME --name $STORAGE_ACCOUNT_NAME --query "primaryEndpoints.file" --output tsv | tr -d '"')
+        SMB_PATH=$(echo $HTTP_ENDPOINT | cut -c7-${#HTTP_ENDPOINT})$FILE_SHARE_NAME
+
+        STORAGE_ACCOUNT_KEY=$(az storage account keys list --resource-group $RESOURCE_GROUP_NAME --account-name $STORAGE_ACCOUNT_NAME --query "[0].value" --output tsv | tr -d '"')
+
+        sudo mount -t cifs $SMB_PATH $MNT_PATH -o credentials=$SMB_CREDENTIAL_FILE,serverino,nosharesock,actimeo=30,mfsymlinks
+
+        ls -al $MNT_PATH/TestData
+
+    - name: Checkout repository
+      uses: actions/checkout@v2
+      with:
+        path: NeMo-Curator
+
+    - name: Build container
+      shell: bash
+      env:
+        GH_TOKEN: ${{ inputs.PAT }}
+      run: |
+        docker build -f Dockerfile -t curator .
+
+    - name: Start container
+      shell: bash
+      run: |
+        MNT_PATH=${{ steps.azure-fileshare.outputs.mnt_path }}
+
+        ARG=("")
+        if [[ "${{ inputs.cpu-only }}" == "false" ]]; then
+          ARG=("--runtime=nvidia --gpus all")
+        fi
+
+        cmd=$(cat <<RUN_TEST_EOF
+        #!/bin/bash
+        docker container rm -f nemo_container_${{ github.run_id }} || true
+        docker run \
+          --rm \
+          -d \
+          --name nemo_container_${{ github.run_id }} ${ARG[@]} \
+          --shm-size=64g \
+          --env RUN_ID=${{ github.run_id }} \
+          --volume $(pwd)/NeMo-Curator:/workspace \
+          --workdir /workspace \
+          --volume $MNT_PATH/TestData:/home/TestData \
+          curator \
+          bash -c "sleep $(( ${{ inputs.timeout }} * 60 + 60 ))"
+        RUN_TEST_EOF
+        )
+
+        echo "$cmd" | tee "retry_job.sh"
+        bash retry_job.sh
+
+    - name: Create run-script
+      id: create
+      shell: bash
+      run: |
+        COVERAGE_PREFIX=$([[ "${{ inputs.is_unit_test }}" == "true" ]] && echo "unit-test" || echo "e2e")
+        echo "coverage-prefix=$COVERAGE_PREFIX" | tee -a "$GITHUB_OUTPUT"
+
+        cmd=$(cat <<'RUN_TEST_EOF'
+        #!/bin/bash
+
+        docker exec -t nemo_container_${{ github.run_id }} bash -c '
+          set -e
+
+          source activate /opt/conda/envs/curator
+
+          bash tests/${{ inputs.script }}.sh
+        '
+
+        RUN_TEST_EOF
+        )
+
+        echo "timeout_in_seconds=$(( ${{ inputs.timeout }} * 60 ))" | tee -a "$GITHUB_OUTPUT"
+        echo "$cmd" | tee "job.sh"
+
+    - name: Run main script
+      uses: nick-fields/retry@v3
+      id: run-main-script
+      with:
+        timeout_seconds: ${{ steps.create.outputs.timeout_in_seconds }}
+        max_attempts: 3
+        shell: bash
+        retry_on: timeout
+        command: /bin/bash job.sh
+        on_retry_command: /bin/bash retry_job.sh
+
+    - name: Check result
+      id: check
+      shell: bash
+      run: |
+        docker exec nemo_container_${{ github.run_id }} coverage combine || true
+        docker exec nemo_container_${{ github.run_id }} coverage xml || true
+        docker cp nemo_container_${{ github.run_id }}:/workspace/.coverage .coverage
+        docker cp nemo_container_${{ github.run_id }}:/workspace/coverage.xml coverage.xml
+
+        coverage_report=coverage-${{ steps.create.outputs.coverage-prefix }}-${{ github.run_id }}-$(uuidgen)
+        echo "coverage_report=$coverage_report" >> "$GITHUB_OUTPUT"
+
+        EXIT_CODE=${{ steps.run-main-script.outputs.exit_code }}
+        IS_SUCCESS=$([[ "$EXIT_CODE" -eq 0 ]] && echo "true" || echo "false")
+
+        if [[ "$IS_SUCCESS" == "false" && "${{ inputs.is_optional }}" == "true" ]]; then
+          echo "::warning:: Test failed, but displayed as successful because it is marked as optional."
+          IS_SUCCESS=true
+        fi
+
+        if [[ "$IS_SUCCESS" == "false" ]]; then
+          echo Test did not finish successfully.
+          exit 1
+        fi
+
+        exit $EXIT_CODE
+
+    - name: Test coverage
+      shell: bash -x -e -u -o pipefail {0}
+      run: |
+        docker exec -t nemo_container_${{ github.run_id }} coverage report -i
+
+    - name: Upload artifacts
+      uses: actions/upload-artifact@v4
+      if: ${{ steps.check.outputs.coverage_report != 'none' }}
+      with:
+        name: ${{ steps.check.outputs.coverage_report }}
+        path: |
+          coverage.xml
+          .coverage
+        include-hidden-files: true
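
The "Check result" step in the template above decides the step outcome from the test exit code and the `is_optional` input. The sketch below isolates that decision so it can be tried outside a workflow. The function names `classify_result` and `step_status` are hypothetical and not part of the commit. It also assumes that an optional failure is meant to end with status 0, as the warning text suggests. Read literally, the final `exit $EXIT_CODE` in the template would still return the non-zero code in that case.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): map a test exit code and the
# is_optional flag to the outcome the step should report.
classify_result() {
  local exit_code="$1" is_optional="$2"
  if [[ "$exit_code" -eq 0 ]]; then
    echo "success"
  elif [[ "$is_optional" == "true" ]]; then
    # Assumption: optional failures are downgraded to a warning.
    echo "optional-failure"
  else
    echo "failure"
  fi
}

# Hypothetical helper: returns 0 unless the outcome is a non-optional failure.
step_status() {
  [[ "$(classify_result "$1" "$2")" != "failure" ]]
}

classify_result 0 false   # prints "success"
classify_result 1 true    # prints "optional-failure"
classify_result 1 false   # prints "failure"
```

Inside a step, `step_status "$EXIT_CODE" "${{ inputs.is_optional }}" || exit 1` would replace the chain of `IS_SUCCESS` checks, if the assumed intent holds.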

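The "Azure Fileshare" step builds the SMB path by cutting the first six characters (`https:`) off the storage account's file endpoint and appending the share name. A minimal sketch of that derivation follows. The endpoint value is a made-up placeholder, not the real account, and the second variant using parameter expansion is an alternative shown for comparison, not what the commit does.

```shell
#!/bin/bash
# Placeholder endpoint for illustration only; the template obtains the real
# value from `az storage account show`.
HTTP_ENDPOINT="https://exampleaccount.file.core.windows.net/"
FILE_SHARE_NAME="fileshare"

# Same idea as the template: drop characters 1-6 ("https:") and keep the rest.
smb_path_cut="$(echo "$HTTP_ENDPOINT" | cut -c7-)$FILE_SHARE_NAME"

# Alternative: strip the literal "https:" prefix with parameter expansion.
smb_path_expansion="${HTTP_ENDPOINT#https:}$FILE_SHARE_NAME"

echo "$smb_path_cut"        # prints //exampleaccount.file.core.windows.net/fileshare
echo "$smb_path_expansion"  # prints the same path
```

The parameter-expansion form avoids a subshell and leaves the value unchanged if the prefix is ever something other than `https:`, whereas `cut` removes six characters regardless of what they are.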