ARM performance issues #4722
-
The rpi 4 has 4 cores whereas the 3b+ only has 1. Node.js has a just-in-time compiler where compilation happens in the background, i.e., while the program is running. On the rpi 4 that work is done on another core but on the 3b+ the program and the compiler share the same core. You can see it for yourself with a system profiler like perf.
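To check how many cores Node.js actually sees on a given board, something like the following can be run on each device, with and without `taskset`. This is only a minimal sketch: `summarizeCpus` is a helper name made up for this example, the model strings reported on ARM boards may be empty, and `os.availableParallelism()` is assumed to reflect the affinity mask on Linux, which is worth verifying on your own system.

```javascript
const os = require('node:os');

// Hypothetical helper: group the logical CPUs reported by the OS by model string.
function summarizeCpus(cpus) {
  const counts = new Map();
  for (const { model } of cpus) {
    const key = model || 'unknown'; // some ARM kernels report no model name
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

console.log('logical CPUs by model:', summarizeCpus(os.cpus()));

// os.availableParallelism() was added in Node.js 18.14; fall back to the
// plain CPU count on older versions.
const usable = typeof os.availableParallelism === 'function'
  ? os.availableParallelism()
  : os.cpus().length;
console.log('CPUs usable by this process:', usable);
```

Running it once directly and once under `taskset -c 0` gives a rough idea of whether background compilation has a spare core to run on.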
-
I did some further digging and compiled prime.c without -O2. Now the result is:
pi@Pi-4B:
pi@Pi-3plus:
pi@R4S:
====================================
Dropping "-O2" seems to have no effect on the A72, but it makes the A53 spend about twice as long executing the code.
-
I'm trying to develop something for an IoT edge device using Node.js v18.16, and I have run into a performance problem.
First, I must say that Node.js is very fast in some cases.
I wrote a very simple program that counts prime numbers, in both C and JS, to test the performance of my devices. On a Raspberry Pi 4B (overclocked to 2 GHz; all my test platforms are 64-bit), the result is:
pi@Pi-4B:~/devel/nodejs $ node --jitless prime.js
Warning: disabling flag --expose_wasm due to conflicting flags
Total time: 118.194S Total Prime: 664578
pi@Pi-4B:~/devel/nodejs $ node prime.js
Total time: 7.846S Total Prime: 664578
pi@Pi-4B:~/devel/nodejs $ ./prime
prime count: 664578 delta sec: 7.840 Sec
==============================================================
"prime" is a C program compiled with "gcc -o prime prime1.c -O2 -lm".
On the Raspberry Pi 3B+, the result is:
pi@Pi-3plus:~/devel/nodejs $ node --jitless prime.js
Warning: disabling flag --expose_wasm due to conflicting flags
Total time: 462.267S Total Prime: 664578
pi@Pi-3plus:~/devel/nodejs $ node prime.js
Total time: 29.455S Total Prime: 664578
pi@Pi-3plus:~/devel/nodejs $ ./prime
prime count: 664578 delta sec: 14.381 Sec
===============================================================
On the Pi 4B, Node.js runs almost as fast as C, but on the Pi 3B+ it runs much slower, at about half the speed of C.
Why? To be sure, I ran more tests on all of my devices (RK3399, RK3588S, RK3308, Intel N4200).
The RK3399 looks like a hybrid of the Pi 4B and the Pi 3B+: it has 2 A72 + 4 A53 cores and 4 GB of memory.
pi@R4S:~/devel/nodejs$ taskset -c 5 node prime.js
Total time: 7.875S Total Prime: 664578
pi@R4S:~/devel/nodejs$ taskset -c 5 ./prime
prime count: 664578 delta sec: 7.717 Sec
pi@R4S:~/devel/nodejs$ taskset -c 0 node prime.js
Total time: 25.485S Total Prime: 664578
pi@R4S:~/devel/nodejs$ taskset -c 0 ./prime
prime count: 664578 delta sec: 12.402 Sec
===============================================================
The same thing happened! On the A53, Node.js runs much slower, at about half the speed of C.
On the RK3588S (4 A76 + 4 A55, 16 GB of memory), the test result is similar to that of the RK3399:
firefly@firefly:~/nodejs$ taskset -c 6 ./prime
prime count: 664578 delta sec: 6.899 Sec
firefly@firefly:~/nodejs$ taskset -c 6 node prime.js
Total time: 6.868S Total Prime: 664578
firefly@firefly:~/nodejs$ taskset -c 2 ./prime
prime count: 664578 delta sec: 11.010 Sec
firefly@firefly:~/nodejs$ taskset -c 2 node prime.js
Total time: 22.605S Total Prime: 664578
===============================================================
On the RK3308 (Rock Pi S), Node.js also runs much slower, at about half the speed of C.
pi@Pi-S:~/nodejs$ node prime.js
Total time: 36.975S Total Prime: 664578
pi@Pi-S:~/nodejs$ ./prime
prime count: 664578 delta sec: 16.707 Sec
===============================================================
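To put a number on "about half of C", the Node.js and C timings quoted above can be divided directly. The sketch below only restates figures from the runs in this post; the `results` table and the `slowdown` helper are names introduced for this example, and the core labels follow the `taskset -c` values used above.

```javascript
// Timings in seconds, copied from the runs quoted in this post.
const results = [
  { platform: 'Pi 4B',           node: 7.846,  c: 7.840 },
  { platform: 'Pi 3B+',          node: 29.455, c: 14.381 },
  { platform: 'RK3399, core 5',  node: 7.875,  c: 7.717 },
  { platform: 'RK3399, core 0',  node: 25.485, c: 12.402 },
  { platform: 'RK3588S, core 6', node: 6.868,  c: 6.899 },
  { platform: 'RK3588S, core 2', node: 22.605, c: 11.010 },
  { platform: 'RK3308',          node: 36.975, c: 16.707 },
];

// Hypothetical helper: how many times longer Node.js took than the C binary.
function slowdown(result) {
  return result.node / result.c;
}

for (const r of results) {
  console.log(`${r.platform.padEnd(16)} ${slowdown(r).toFixed(2)}x`);
}
```

On the big cores the ratio stays close to 1, while on the little cores it lands a little above 2, which matches the pattern described in the text.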
On the Intel N4200, Node.js runs as fast as C, as it does on the A72 and A76.
Can anyone tell me why? I would also be grateful if someone could tell me how to optimize performance on the A53 platform.
Here is the code of prime.c and prime.js:
/* prime.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>

int main(void)
{
    const long long Prime = 10000000;   /* upper bound of the search */
    long long sqrt_i, i, n, j;
    double delta_sec;                   /* double keeps microsecond precision */
    struct timeval in_time, out_time;

    n = 0;
    gettimeofday(&in_time, NULL);
    /* Trial division over odd candidates only, so 2 itself is not counted. */
    for (i = 3; i <= Prime; i = i + 2) {
        sqrt_i = sqrt(i);
        for (j = 2; j <= sqrt_i; j++)
            if (i % j == 0) break;
        if (j > sqrt_i) {
            n++;
        }
    }
    gettimeofday(&out_time, NULL);
    delta_sec = (out_time.tv_sec - in_time.tv_sec) + (out_time.tv_usec - in_time.tv_usec) * 1e-6;
    printf("prime count: %lld delta sec: %.3f Sec\n", n, delta_sec);
    /* sizeof yields a size_t, which needs %zu rather than %d. */
    printf("size of short:%zu\n", sizeof(short));
    printf("size of int:%zu\n", sizeof(int));
    printf("size of long:%zu\n", sizeof(long));
    printf("size of long long:%zu\n", sizeof(long long));
    printf("size of float:%zu\n", sizeof(float));
    printf("size of double:%zu\n", sizeof(double));
    return 0;
}
// prime.js
const Prime = 10000000; // upper bound of the search
let sqrt_i;
let in_time, out_time;
let n = 0;
let i, j;
in_time = new Date().getTime();
// Trial division over odd candidates only, so 2 itself is not counted.
nextPrime:
for (i = 3; i <= Prime; i = i + 2) {
    sqrt_i = Math.sqrt(i);
    for (j = 2; j <= sqrt_i; j++) {
        if (i % j == 0) break;
    }
    if (j > sqrt_i) n++;
}
out_time = new Date().getTime();
console.log(`Total time: ${(out_time - in_time) / 1000}S Total Prime: ${n}`);
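One way to see whether JIT warm-up accounts for part of the gap is to wrap the same loop in a function and time it several times within one process using a monotonic clock. This is a rough sketch, not a definitive benchmark: `countOddPrimes`, `timeRun`, and the smaller `LIMIT` are assumptions made for this example and are not part of the original prime.js.

```javascript
// Same trial-division loop as prime.js, wrapped in a function so it can be rerun.
function countOddPrimes(limit) {
  let n = 0;
  for (let i = 3; i <= limit; i += 2) {
    const sqrtI = Math.sqrt(i);
    let j = 2;
    for (; j <= sqrtI; j++) {
      if (i % j === 0) break;
    }
    if (j > sqrtI) n++;
  }
  return n;
}

// Time a single call with the monotonic nanosecond clock.
function timeRun(limit) {
  const start = process.hrtime.bigint();
  const count = countOddPrimes(limit);
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return { count, seconds };
}

// Assumed limit, smaller than the 10000000 used above so repeated runs stay short.
const LIMIT = 1000000;
for (let run = 1; run <= 3; run++) {
  const { count, seconds } = timeRun(LIMIT);
  console.log(`run ${run}: ${seconds.toFixed(3)} s, primes: ${count}`);
}
```

If the later runs are clearly faster than the first, warm-up is contributing. If all runs take about the same time, the difference more likely comes from the generated code itself.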