Device telemetry for benchmark #11301
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11301
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit a6b2e51 with merge base 8cfa858. This comment was automatically generated by Dr. CI and updates every 15 minutes.
I don't follow the time_in_state part of the summary. That is supposed to be the time spent in each frequency state, right? If so, the output pasted in the summary doesn't make sense.
Why do this conditionally? Why not just always have this?
OK, I see that you are already doing this.
Why battery stats? I thought these devices are always plugged in, and they should be. If they are plugged in, there isn't much info we get out of battery measurements.
- echo "Mandatory Cool Down for 10 minutes"
- |
  adb -s $DEVICEFARM_DEVICE_UDID shell 'cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state /sys/devices/system/cpu/cpu*/cpufreq/stats/trans_table' > $DEVICEFARM_LOG_DIR/state_before.txt
  adb -s $DEVICEFARM_DEVICE_UDID shell 'sleep 600'
Why do you need to schedule the sleep on the device? Just make sure the job submissions are 10 minutes apart.
Or is this the only way to make sure that really nothing else runs (like other users)?
  adb -s $DEVICEFARM_DEVICE_UDID shell 'cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state /sys/devices/system/cpu/cpu*/cpufreq/stats/trans_table' > $DEVICEFARM_LOG_DIR/state_before.txt
  adb -s $DEVICEFARM_DEVICE_UDID shell 'sleep 600'

- echo "Collect Device Telemetry - CPU Scaling Configuration"
You want to collect these numbers before the sleep, not after.
Oh, never mind. I see you are doing that after the benchmark as well. Name the log files appropriately: for example, the pre-benchmark log should have a prebenchmark suffix.
The sleep doesn't need to be scheduled on the device, but the benchmark submission framework should make sure that after a benchmark has run, the device is unavailable for a certain amount of time. I don't know whether just doing a sleep will block the device. This has to be handled at the benchmark infra level.
Looks good at a high level. The logical next step is to write a Python script that checks these txt files and discards or flags the numbers if we see throttling, thermal interrupts, or something along those lines. But I understand that parsing and processing these kernel logs can be a pain, so I will leave it up to you how much to automate.
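As a rough illustration of the kind of script suggested above, here is a minimal sketch. It assumes each snapshot has been reduced to the text of one CPU policy's time_in_state file, where every line is a "<frequency> <time>" pair. The function names, the `min_share` threshold, and the "low share of time at the top frequency" heuristic are all hypothetical; the PR does not specify any of them, and a low share is only a hint, not proof of throttling.

```python
def parse_time_in_state(text):
    """Parse '<freq> <time>' lines into {freq: time}; other lines are skipped."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines made of exactly two plain integers.
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            counts[int(parts[0])] = int(parts[1])
    return counts


def time_in_state_delta(before_text, after_text):
    """Time spent per frequency between the two cumulative snapshots."""
    before = parse_time_in_state(before_text)
    after = parse_time_in_state(after_text)
    return {freq: ticks - before.get(freq, 0) for freq, ticks in after.items()}


def max_freq_share(delta):
    """Fraction of the elapsed time spent at the highest listed frequency."""
    total = sum(delta.values())
    if total <= 0:
        return None  # nothing recorded between the snapshots
    return delta[max(delta)] / total


def should_flag(before_text, after_text, min_share=0.5):
    """Hypothetical heuristic: flag runs that spent little time at top frequency."""
    share = max_freq_share(time_in_state_delta(before_text, after_text))
    return share is not None and share < min_share
```

A flagged run would then be marked or discarded by whatever reporting step consumes the benchmark numbers; how strict `min_share` should be depends on the governor and the workload.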
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
This PR establishes basic device telemetry for Android devices in the benchmark infra.
It fills the DevX gap discussed in #10983.
The goals of establishing device telemetry are to:
The device telemetry collected via this PR:
1. CPU scaling configuration
Whether CPU scaling is locked on the device under test.
This is a one-time stat collected before the benchmark run, after the mandatory cool-down sleep.
Here is an example of the CPU scaling config collected from an S22:
The governor walt is the dynamic, scheduler-based governor common on Qualcomm chips. min_freq != max_freq also shows that CPU scaling is not locked.
2. CPU frequency transitions
Record time_in_state, trans_table and total_trans before and after the benchmark run. The difference between the two snapshots shows the frequency behavior during the benchmark itself.
Important: time_in_state, trans_table and total_trans are cumulative statistics since system boot or the last reset, so they need to be reset before the benchmark jobs start in order to get accurate CPU frequency transitions for the benchmark run. We need to collect them both pre-benchmark and post-benchmark.
Here is an example of the CPU frequency transitions collected from an S22:
3. The Thermal Stats
Record the thermal status of the device before the benchmark.
This ensures the device is not overheated when the benchmark starts. We may conditionally put the device into an extra sleep, or skip benchmarking, if overheating is detected. To begin with, we take the simplest approach and put the device into a mandatory 10-minute sleep unconditionally.
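The conditional behavior mentioned above is not implemented in this PR. As a hedged sketch of what such a decision could look like, the function below takes thermal zone readings already converted to degrees Celsius and returns one of three actions. The function name, the action strings, and both thresholds are hypothetical placeholders, not measured limits for any device.

```python
def cooldown_action(zone_temps_c, extra_sleep_at_c=40.0, skip_at_c=50.0):
    """Pick an action from pre-benchmark thermal readings (degrees Celsius).

    Both thresholds are illustrative assumptions, not values from the PR.
    """
    if not zone_temps_c:
        raise ValueError("no thermal readings collected")
    hottest = max(zone_temps_c)
    if hottest >= skip_at_c:
        return "skip"          # too hot: do not benchmark at all
    if hottest >= extra_sleep_at_c:
        return "extra_sleep"   # warm: cool down longer before running
    return "run"               # within limits: proceed with the benchmark
```

Using the hottest zone is a conservative choice; a real policy might instead look only at CPU or skin-temperature zones.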
Here is an example of the thermal stats collected from an S22:
4. Battery Status and Info
Record the battery status before and after the benchmark.
The battery status and info are used to determine battery health, which is typically useful for identifying perf regressions caused by a low battery level, battery mode, etc. This also reports the temperature, in tenths of a degree Celsius.
Besides determining device health, comparing the battery level before and after the benchmark gives a rough understanding of the power consumption of running ML models on-device. Although the primary perf metrics we focus on today are latency and accuracy, the power consumption of a model is a crucial on-device metric because it significantly affects user experience. Collecting this metric gives us an early signal about power efficiency.
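To make the before/after comparison concrete, here is a minimal sketch. It assumes the collected battery text contains `level: <int>` and `temperature: <int>` lines (the shape printed by `adb shell dumpsys battery`); whether the PR's logs use exactly that format is an assumption, and the function names are hypothetical.

```python
import re

# Matches lines such as "  level: 87" or "  temperature: 312".
_FIELD = re.compile(r"\s*(level|temperature):\s*(-?\d+)\s*$")


def parse_battery(text):
    """Extract the integer 'level' and 'temperature' fields from a battery dump."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def battery_summary(before_text, after_text):
    """Battery level drop and temperatures (converted from tenths of a degree C)."""
    before = parse_battery(before_text)
    after = parse_battery(after_text)
    return {
        "level_drop": before["level"] - after["level"],
        "temp_before_c": before["temperature"] / 10,
        "temp_after_c": after["temperature"] / 10,
    }
```

The level drop is only a coarse signal: it is reported in whole percentage points, and on a device that stays plugged in it may not change at all.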
Here is an example of the battery stats collected from an S22:
The raw artifacts are downloadable via the CI, under DEVICEFARM_LOG_DIR: