enable pytest to survive crashing tests and potentially complete the remaining tests #1909

Draft. Wants to merge 184 commits into base: main
d3f32fa
modify build
mengfei25 Jul 16, 2025
bdc58d7
modify ut
mengfei25 Jul 16, 2025
b07b490
modify e2e
mengfei25 Jul 16, 2025
7b4582b
update
mengfei25 Jul 16, 2025
3ae4b09
update
mengfei25 Jul 16, 2025
fe06ca3
update
mengfei25 Jul 16, 2025
be531f7
Update nightly_ondemand.yml
mengfei25 Jul 16, 2025
1df6138
update
mengfei25 Jul 17, 2025
9fe4dcb
update
mengfei25 Jul 17, 2025
ef91984
update
mengfei25 Jul 17, 2025
01fbe46
update
mengfei25 Jul 17, 2025
191b5c0
update
mengfei25 Jul 17, 2025
f313b85
update
mengfei25 Jul 18, 2025
7d4488b
update
mengfei25 Jul 18, 2025
66e28da
update
mengfei25 Jul 18, 2025
8b22418
update
mengfei25 Jul 18, 2025
acf94d1
update
mengfei25 Jul 18, 2025
a6fa7da
update
mengfei25 Jul 18, 2025
053bed3
update
mengfei25 Jul 18, 2025
7ee8d4b
update
mengfei25 Jul 18, 2025
428e483
update
mengfei25 Jul 18, 2025
c483968
update
mengfei25 Jul 18, 2025
32474f8
update
mengfei25 Jul 18, 2025
5976099
update
mengfei25 Jul 18, 2025
28e53b2
update
mengfei25 Jul 18, 2025
2e9921e
update
mengfei25 Jul 18, 2025
b058b1a
update
mengfei25 Jul 18, 2025
93e5444
update
mengfei25 Jul 18, 2025
2aa5b11
Merge branch 'main' into mengfeil/containerd
mengfei25 Jul 18, 2025
8baec84
update
mengfei25 Jul 18, 2025
95709b9
Merge branch 'mengfeil/containerd' of https://github.com/intel/torch-…
mengfei25 Jul 18, 2025
d4c78aa
update
mengfei25 Jul 18, 2025
a8154f1
update
mengfei25 Jul 18, 2025
f25ecfe
update
mengfei25 Jul 18, 2025
e06e1bd
update
mengfei25 Jul 18, 2025
c437f29
update
mengfei25 Jul 18, 2025
0ae0bb1
update
mengfei25 Jul 18, 2025
d4da95d
update
mengfei25 Jul 18, 2025
db17d7d
update
mengfei25 Jul 18, 2025
b9c247a
update
mengfei25 Jul 18, 2025
a7d76ae
Merge branch 'main' into mengfeil/containerd
mengfei25 Jul 18, 2025
9ae98ea
update
mengfei25 Jul 18, 2025
c06f1ee
update
mengfei25 Jul 18, 2025
6e14f8b
update
mengfei25 Jul 21, 2025
9a621c5
update
mengfei25 Jul 21, 2025
bb17bab
update
mengfei25 Jul 21, 2025
981c744
update
mengfei25 Jul 21, 2025
6482077
update
mengfei25 Jul 21, 2025
ec0c1f2
update
mengfei25 Jul 21, 2025
1cc986e
update
mengfei25 Jul 21, 2025
b2b48c5
update
mengfei25 Jul 21, 2025
bccec93
update
mengfei25 Jul 21, 2025
9f604a7
update
mengfei25 Jul 21, 2025
8a78c7c
update
mengfei25 Jul 22, 2025
46d00c8
update
mengfei25 Jul 22, 2025
e3949d8
update
mengfei25 Jul 22, 2025
3f69213
update
mengfei25 Jul 22, 2025
e8b015a
update
mengfei25 Jul 22, 2025
bbd82cd
update
mengfei25 Jul 22, 2025
c144bab
get runner
mengfei25 Jul 22, 2025
40180c0
test env
mengfei25 Jul 22, 2025
54ea2f0
update
mengfei25 Jul 22, 2025
9b660b9
Revert "update"
mengfei25 Jul 22, 2025
7d025c0
update
mengfei25 Jul 22, 2025
dd23ceb
update
mengfei25 Jul 22, 2025
f21e4c9
Merge branch 'main' into mengfeil/containerd
mengfei25 Jul 22, 2025
de4a432
update
mengfei25 Jul 22, 2025
517c324
Merge branch 'mengfeil/containerd' of https://github.com/intel/torch-…
mengfei25 Jul 22, 2025
65cc01a
remove useless inputs for op benchmark
mengfei25 Jul 22, 2025
f727ef8
checkout torch-xpu-ops
mengfei25 Jul 22, 2025
18ada97
modify get runner
mengfei25 Jul 22, 2025
018f968
modify build
mengfei25 Jul 22, 2025
93fa112
modify build
mengfei25 Jul 23, 2025
ad8cc67
update
mengfei25 Jul 23, 2025
de0f557
Merge branch 'main' into mengfeil/containerd
mengfei25 Jul 23, 2025
7c9d3a3
update
mengfei25 Jul 23, 2025
2fc3b8e
update
mengfei25 Jul 23, 2025
78cedbf
update
mengfei25 Jul 23, 2025
c6bc928
update
mengfei25 Jul 23, 2025
9765fac
update
mengfei25 Jul 23, 2025
eda9634
modify ut
mengfei25 Jul 23, 2025
ec697f5
modify build
mengfei25 Jul 23, 2025
c1e4ca7
modify build
mengfei25 Jul 23, 2025
50e40fe
modify build
mengfei25 Jul 24, 2025
6173798
Merge branch 'main' into mengfeil/containerd
mengfei25 Jul 24, 2025
b3f6f0e
update
mengfei25 Jul 24, 2025
5848944
Merge branch 'mengfeil/containerd' of https://github.com/intel/torch-…
mengfei25 Jul 24, 2025
42da693
modify build
mengfei25 Jul 24, 2025
26b56db
modify build
mengfei25 Jul 24, 2025
77d8172
modify build
mengfei25 Jul 24, 2025
e9d551a
update
mengfei25 Jul 24, 2025
9649dfd
update
mengfei25 Jul 24, 2025
84a5132
update
mengfei25 Jul 24, 2025
ee18a1c
update
mengfei25 Jul 24, 2025
43fee42
update
mengfei25 Jul 24, 2025
e8f1c0d
update
mengfei25 Jul 24, 2025
517b081
update
mengfei25 Jul 25, 2025
ddecdf9
update
mengfei25 Jul 25, 2025
d1bf4cf
modify ut
mengfei25 Jul 25, 2025
d99668c
update
mengfei25 Jul 25, 2025
d0d1ceb
update
mengfei25 Jul 25, 2025
1f26538
update
mengfei25 Jul 25, 2025
d06b8db
update
mengfei25 Jul 25, 2025
70577e1
update
mengfei25 Jul 25, 2025
2467e9e
update
mengfei25 Jul 25, 2025
96ff039
update
mengfei25 Jul 25, 2025
da12ea0
update
mengfei25 Jul 25, 2025
4f6ecfd
modify ut
mengfei25 Jul 25, 2025
ba97507
modify ut
mengfei25 Jul 25, 2025
9896441
fix pip warnings
mengfei25 Jul 25, 2025
50467ee
modify ut logs path
mengfei25 Jul 25, 2025
5c62bc9
modify ut logs path
mengfei25 Jul 25, 2025
8b33c21
set run name for nightly and on-demand tests
mengfei25 Jul 25, 2025
f08c528
modify ut logs path
mengfei25 Jul 25, 2025
55bd5dc
ut summray always
mengfei25 Jul 25, 2025
dbd3a27
fix ut logs path
mengfei25 Jul 25, 2025
2e7680d
fix e2e summary permission
mengfei25 Jul 25, 2025
0a78df1
fix ut log path
mengfei25 Jul 25, 2025
074992f
update
mengfei25 Jul 25, 2025
b11510f
update
mengfei25 Jul 25, 2025
a18995b
modify e2e summary
mengfei25 Jul 25, 2025
754202d
modify e2e summary
mengfei25 Jul 25, 2025
27c5cff
modify e2e summary
mengfei25 Jul 25, 2025
92d7ff1
update
mengfei25 Jul 25, 2025
0fade31
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 4, 2025
56520ca
update
mengfei25 Aug 4, 2025
9117a0c
update
mengfei25 Aug 4, 2025
79155e5
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 4, 2025
587aa95
update
mengfei25 Aug 4, 2025
3b0b94d
update
mengfei25 Aug 4, 2025
e47b3e4
update
mengfei25 Aug 5, 2025
eafefa4
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 5, 2025
51578bd
enable pytest to survive crashing tests and potentially complete the …
mengfei25 Aug 6, 2025
75c99ff
update
mengfei25 Aug 6, 2025
15d0daf
Merge branch 'mengfeil/containerd' into mengfeil/ut_skip_crash
mengfei25 Aug 6, 2025
dcc4433
fix lint issue
mengfei25 Aug 6, 2025
e244cb1
Update pull.yml
mengfei25 Aug 6, 2025
47cbdf5
modify pt2e
mengfei25 Aug 7, 2025
37e652e
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 7, 2025
de15a3f
update
mengfei25 Aug 7, 2025
8445b8b
e2e test matrix tests
mengfei25 Aug 11, 2025
a145fa2
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 11, 2025
8fe34c5
modify e2e summary
mengfei25 Aug 11, 2025
bfc98da
update
mengfei25 Aug 11, 2025
8c66acd
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 11, 2025
8064126
update
mengfei25 Aug 11, 2025
1ea6a62
update
mengfei25 Aug 11, 2025
530af25
update
mengfei25 Aug 11, 2025
091678f
update
mengfei25 Aug 11, 2025
eaa4bc4
update
mengfei25 Aug 11, 2025
f70ef8a
update deps
mengfei25 Aug 11, 2025
a12045a
update
mengfei25 Aug 11, 2025
70415c2
modify cache dir
mengfei25 Aug 11, 2025
5fcc6c6
update
mengfei25 Aug 12, 2025
067ef5a
Rebase
mengfei25 Aug 12, 2025
263a393
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 12, 2025
18f22e0
update
mengfei25 Aug 12, 2025
2eb33ed
Merge branch 'mengfeil/containerd' of https://github.com/intel/torch-…
mengfei25 Aug 12, 2025
0add64e
update
mengfei25 Aug 12, 2025
8902540
update
mengfei25 Aug 13, 2025
0eda9f7
update
mengfei25 Aug 13, 2025
3631454
update
mengfei25 Aug 13, 2025
00a3720
Revert "update"
mengfei25 Aug 13, 2025
315544e
Rebase
mengfei25 Aug 13, 2025
0aab07a
update
mengfei25 Aug 13, 2025
29a9fd8
update
mengfei25 Aug 13, 2025
c69854f
update
mengfei25 Aug 13, 2025
3d98b0e
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 13, 2025
a6b2302
merge main
mengfei25 Aug 13, 2025
0a17050
update
mengfei25 Aug 13, 2025
a7257b0
modify e2e summary
mengfei25 Aug 14, 2025
88af21f
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 14, 2025
8a54cfa
modify on-demand test
mengfei25 Aug 14, 2025
82783de
Merge branch 'mengfeil/containerd' of https://github.com/intel/torch-…
mengfei25 Aug 14, 2025
23f097f
modify on-demand test
mengfei25 Aug 14, 2025
a047acc
rebase
mengfei25 Aug 14, 2025
21aedbf
Merge branch 'main' into mengfeil/containerd
mengfei25 Aug 14, 2025
7df6ea3
rebase
mengfei25 Aug 14, 2025
58122cd
rebase
mengfei25 Aug 14, 2025
72c4bb5
parallel 1 to skip crash only
mengfei25 Aug 14, 2025
d951abf
install pytest-xdist
mengfei25 Aug 14, 2025
d459b6e
modify
mengfei25 Aug 14, 2025
171cc61
modify
mengfei25 Aug 14, 2025
d50cf68
lint python
mengfei25 Aug 14, 2025
47 changes: 47 additions & 0 deletions .github/actions/get-runner/action.yml
@@ -0,0 +1,47 @@
name: Get Runner Infos

outputs:
  runner_id:
    value: ${{ steps.runner.outputs.runner_id }}
  user_id:
    value: ${{ steps.runner.outputs.user_id }}
  render_id:
    value: ${{ steps.runner.outputs.render_id }}
  hostname:
    value: ${{ steps.runner.outputs.hostname }}

permissions: read-all

runs:
  using: composite
  steps:
    - name: Get runner
      shell: bash -xe {0}
      id: runner
      run: |
        # get test runner
        echo "runner_id=$(echo ${RUNNER_NAME} |sed 's/\-[0-9]$//')" |tee -a ${GITHUB_OUTPUT}
        echo "user_id=$(id -u)" |tee -a ${GITHUB_OUTPUT}
        echo "render_id=$(getent group render |cut -d: -f3)" |tee -a ${GITHUB_OUTPUT}
        echo "hostname=$(hostname)" |tee -a ${GITHUB_OUTPUT}
        # show host info
        lscpu
        lshw -C display
        free -h
        df -h
        cat /etc/os-release
        uname -a
    - name: Cleanup host
      shell: bash -xe {0}
      run: |
        # clean docker cache
        docker system prune -af || true
        # clean workspace
        ls -al
        sudo find ./ |grep -v "^\./$" |xargs sudo rm -rf
        cd ${RUNNER_WORKSPACE}/..
        if [ "${PWD}" != "/" ];then
          ls -al
          sudo chmod 777 -R torch-xpu-ops _temp _actions _tool || true
          sudo rm -rf _temp
        fi
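
The `runner_id` output above is the runner name with a trailing `-<digit>` removed. The following is a minimal sketch of that one expression in isolation, not part of the action. The function name `strip_runner_suffix` and the runner names are hypothetical, chosen only to show which inputs the pattern matches.

```shell
#!/usr/bin/env sh
# Hypothetical helper mirroring the sed expression used in the "Get runner" step.
strip_runner_suffix() {
    # Remove one trailing "-<single digit>" from the name, if present.
    printf '%s\n' "$1" | sed 's/-[0-9]$//'
}

strip_runner_suffix "pvc-node-3"     # prints: pvc-node
strip_runner_suffix "pvc-node"       # prints: pvc-node
strip_runner_suffix "pvc-node-12"    # prints: pvc-node-12 (a two-digit suffix does not match)
```

The last call shows a limit of the pattern: only a single-digit suffix is stripped, so a runner numbered 10 or higher would keep its full name.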
185 changes: 0 additions & 185 deletions .github/actions/inductor-xpu-e2e-test/action.yml

This file was deleted.

111 changes: 111 additions & 0 deletions .github/actions/linux-e2etest/action.yml
@@ -0,0 +1,111 @@
name: Linux E2E Test

inputs:
  env_prepare:
    required: false
    description: If set to any value, will prepare suite test env
  suite:
    required: true
    type: string
    default: 'huggingface'
    description: Dynamo benchmarks test suite. huggingface,timm_models,torchbench. Delimiter is comma
  dt:
    required: true
    type: string
    default: 'float32'
    description: Data precision of the test. float32,bfloat16,float16,amp_bf16,amp_fp16. Delimiter is comma
  mode:
    required: true
    type: string
    default: 'inference'
    description: inference,training. Delimiter is comma
  scenario:
    required: true
    type: string
    default: 'accuracy'
    description: accuracy,performance. Delimiter is comma

runs:
  using: composite
  steps:
    - name: E2E Test (${{ inputs.suite }} ${{ inputs.dt }} ${{ inputs.mode }} ${{ inputs.scenario }})
      shell: bash -x {0}
      run: |
        pip list |grep -E 'intel|torch'
        cp ./.github/scripts/inductor_xpu_test.sh ./pytorch
        cd ./pytorch
        # check param
        function contains() {
            contains_status="echo 'Start $2 ...'"
            {
                [[ $1 =~ (^|,)$2($|,) ]]
            } || {
                echo "[Warning] $2 is not a supported type! Skipped!"
                contains_status="continue"
            }
        }
        xpu_num=$(clinfo --list |awk 'BEGIN{gpu=0;}{if(gpu==1 && $0~/Platform/){gpu=0;}; if(gpu==1){print $0;}; if($0~/Platform.*Graphics/){gpu=1;}}' |wc -l)
        cores_per_instance="$(lscpu |grep -E 'Core\(s\) per socket:|Socket\(s\):' |awk -v i="${xpu_num}" 'BEGIN{sum=1}{sum*=$NF}END{print sum/i}')"
        export OMP_NUM_THREADS=${cores_per_instance}
        for suite in $(echo ${{ inputs.suite }} |sed 's/,/ /g')
        do
          if [ "${suite}" == "pt2e" ];then
            continue
          fi
          contains "huggingface,timm_models,torchbench" $suite
          $contains_status
          for dt in $(echo ${{ inputs.dt }} |sed 's/,/ /g')
          do
            contains "float32,bfloat16,float16,amp_bf16,amp_fp16" $dt
            $contains_status
            for mode in $(echo ${{ inputs.mode }} |sed 's/,/ /g')
            do
              contains "inference,training" $mode
              $contains_status
              for scenario in $(echo ${{ inputs.scenario }} |sed 's/,/ /g')
              do
                contains "accuracy,performance" $scenario
                $contains_status
                if [ "${MODEL_ONLY_NAME}" == "" ];then
                  for xpu_id in $(seq 0 $[ ${xpu_num} - 1 ])
                  do
                    cpu_list="$(echo "${cores_per_instance} ${xpu_id}" |awk '{printf("%d-%d", $1*$2, $1*$2+$1-1)}')"
                    numactl --localalloc --physcpubind=${cpu_list} bash -x inductor_xpu_test.sh ${suite} ${dt} ${mode} ${scenario} xpu ${xpu_id} static ${xpu_num} ${xpu_id} &
                  done
                else
                  for test_model in $(echo ${MODEL_ONLY_NAME} |sed 's/,/ /g')
                  do
                    numactl --localalloc bash -x inductor_xpu_test.sh ${suite} ${dt} ${mode} ${scenario} xpu 0 static 1 0 ${test_model}
                  done
                fi
                wait
                # summarize pass rate
                LOG_DIR="inductor_log/${suite}/${dt}"
                LOG_NAME=inductor_${suite}_${dt}_${mode}_xpu_${scenario}_all.log
                rm -f ${LOG_DIR}/${LOG_NAME}
                find ${LOG_DIR}/ -name "inductor_${suite}_${dt}_${mode}_xpu_${scenario}_card*.log" |xargs cat >> ${LOG_DIR}/${LOG_NAME} 2>&1
              done
            done
          done
        done
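
The parameter check and the per-device CPU binding in the step above can be sketched in isolation as follows. This is a minimal sketch, not the action's code: `in_csv_list` and `cpu_range` are hypothetical names, the membership test uses a `case` glob on a comma-padded list instead of the step's `=~` regex, and the host with 2 sockets, 28 cores per socket and 4 devices is an assumed example.

```shell
#!/usr/bin/env sh
# Sketch only: hypothetical helpers, not part of the action.

# Succeeds when $2 is an exact item of the comma-separated list in $1.
in_csv_list() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the inclusive CPU range "first-last" for device index $2
# when each device is given $1 cores.
cpu_range() {
    echo "$(( $1 * $2 ))-$(( $1 * $2 + $1 - 1 ))"
}

in_csv_list "float32,bfloat16,float16" "float16" && echo "float16 accepted"
in_csv_list "float32,bfloat16,float16" "float" || echo "float rejected"

# Assumed host: 2 sockets x 28 cores = 56 cores shared by 4 devices.
cores_per_instance=$(( 2 * 28 / 4 ))
for xpu_id in 0 1 2 3; do
    # prints 0-13, 14-27, 28-41, 42-55 for devices 0..3
    echo "xpu ${xpu_id}: cpus $(cpu_range "${cores_per_instance}" "${xpu_id}")"
done
```

Padding both the list and the item with commas keeps a partial name such as `float` from matching `float32`, which is the same guarantee the anchored regex in the step provides.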

    - name: Summary E2E Test (${{ inputs.suite }} ${{ inputs.dt }} ${{ inputs.mode }} ${{ inputs.scenario }})
      shell: bash -xe {0}
      run: |
        cd ./pytorch
        rm -f inductor_log/summary_accuracy.csv
        for var in $(find inductor_log/ -name "inductor_*_xpu_accuracy.csv")
        do
          sed -i "s/$/,$(basename $var)/" $var
          cat $var >> inductor_log/summary_accuracy.csv
        done
        cp ${{ github.workspace }}/.github/scripts/inductor_summary.py ./
        csv_file="$(find inductor_log/ -name "inductor_*_xpu_*.csv" |tail -n 1)"
        if [ -f "${csv_file}" ];then
          pip install styleFrame scipy pandas
          dt=$(echo ${{ inputs.dt }} |sed 's/,/ /g')
          mode=$(echo ${{ inputs.mode }} |sed 's/,/ /g')
          suite=$(echo ${{ inputs.suite }} |sed 's/,/ /g')
          scenario=$(echo ${{ inputs.scenario }} |sed 's/,/ /g')
          python inductor_summary.py -p ${dt} -s ${suite} -m ${mode} -sc ${scenario}
        fi
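
The accuracy summary above tags every CSV row with the name of the file it came from before concatenating. The sketch below reproduces that idea on throwaway data. It is an illustration under stated assumptions, not the action's code: the file names and rows are made up, and it streams through `sed` without `-i` so the input files are left unmodified.

```shell
#!/usr/bin/env sh
# Sketch only: the file names and CSV rows are invented for illustration.
workdir="$(mktemp -d)"
printf 'model_a,pass\nmodel_b,fail\n' > "${workdir}/inductor_demo_float32_inference_xpu_accuracy.csv"
printf 'model_c,pass\n' > "${workdir}/inductor_demo_bfloat16_inference_xpu_accuracy.csv"

summary="${workdir}/summary_accuracy.csv"
rm -f "${summary}"
for csv in "${workdir}"/inductor_*_xpu_accuracy.csv; do
    # Append the source file name as a last column and collect the rows.
    sed "s/\$/,$(basename "${csv}")/" "${csv}" >> "${summary}"
done

cat "${summary}"
```

One possible reason to prefer the streaming form: with `sed -i`, running the step twice on the same logs would append the file-name column a second time, whereas here a rerun produces the same summary.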