Commit e98f184
Performance: Optimize para_gemm and para_linear_transform (deepmodeling#5967)
* change to copy matrix
* optimize PLinearTransform::act
* fix: CUDA compiling error without LCAO
* optimize allocate for GPU
* fix compile
* update results
1 parent e6c6cad commit e98f184
11 files changed, +292 -263 lines changed
File tree
- source
- module_base
- kernels
- cuda
- rocm
- test
- module_hsolver
- tests/integrate/102_PW_BPCG_BP