1+ Optimizer Benchmark Summary
2+ =========================
3+
4+ Dataset: cnn_mnist
5+ -----------------
6+ Final Test Accuracy:
7+ SGD: 97.57%
8+ Adam: 97.23%
9+ AdamW: 97.20%
10+ MaxFactor: 97.47%
11+
12+ Convergence Speed (epochs to 90% of final accuracy):
13+ SGD: 1 epochs
14+ Adam: 0 epochs
15+ AdamW: 0 epochs
16+ MaxFactor: 1 epochs
17+
18+ Average Time per Epoch:
19+ SGD: 1.17s
20+ Adam: 2.18s
21+ AdamW: 2.35s
22+ MaxFactor: 2.64s
23+
24+ Average Parameter Update Norm:
25+ SGD: 0.2764
26+ Adam: 0.5658
27+ AdamW: 0.5640
28+ MaxFactor: 0.6968
29+
30+
31+ Dataset: cnn_cifar
32+ -----------------
33+ Final Test Accuracy:
34+ SGD: 54.17%
35+ Adam: 21.43%
36+ AdamW: 21.47%
37+ MaxFactor: 49.57%
38+
39+ Convergence Speed (epochs to 90% of final accuracy):
40+ SGD: 6 epochs
41+ Adam: 3 epochs
42+ AdamW: 3 epochs
43+ MaxFactor: 5 epochs
44+
45+ Average Time per Epoch:
46+ SGD: 3.61s
47+ Adam: 3.62s
48+ AdamW: 1.97s
49+ MaxFactor: 1.97s
50+
51+ Average Parameter Update Norm:
52+ SGD: 0.3957
53+ Adam: 0.3934
54+ AdamW: 0.3926
55+ MaxFactor: 0.6972
56+
57+
58+ Dataset: convnet_cifar
59+ -----------------
60+ Final Test Accuracy:
61+ SGD: 48.37%
62+ Adam: 32.13%
63+ AdamW: 32.30%
64+ MaxFactor: 42.87%
65+
66+ Convergence Speed (epochs to 90% of final accuracy):
67+ SGD: 7 epochs
68+ Adam: 6 epochs
69+ AdamW: 6 epochs
70+ MaxFactor: 8 epochs
71+
72+ Average Time per Epoch:
73+ SGD: 1.87s
74+ Adam: 1.88s
75+ AdamW: 1.89s
76+ MaxFactor: 2.34s
77+
78+ Average Parameter Update Norm:
79+ SGD: 0.2950
80+ Adam: 0.8404
81+ AdamW: 0.8322
82+ MaxFactor: 0.6114
83+
84+
85+
86+ Memory Usage Comparison
87+ =====================
88+
89+ Feature Dimension: 100
90+ --------------------------
91+ SGD: 26.50 MB
92+ Adam: 35.34 MB
93+ AdamW: 35.34 MB
94+ MaxFactor: 26.53 MB
95+
96+ Feature Dimension: 200
97+ --------------------------
98+ SGD: 29.60 MB
99+ Adam: 39.21 MB
100+ AdamW: 39.21 MB
101+ MaxFactor: 29.62 MB
102+
103+ Feature Dimension: 400
104+ --------------------------
105+ SGD: 34.28 MB
106+ Adam: 46.34 MB
107+ AdamW: 46.34 MB
108+ MaxFactor: 34.31 MB
109+
110+ Feature Dimension: 800
111+ --------------------------
112+ SGD: 42.91 MB
113+ Adam: 57.21 MB
114+ AdamW: 57.21 MB
115+ MaxFactor: 42.94 MB
116+
117+ Feature Dimension: 1600
118+ --------------------------
119+ SGD: 61.66 MB
120+ Adam: 82.21 MB
121+ AdamW: 82.21 MB
122+ MaxFactor: 61.69 MB
123+
124+ MaxFactor uses 25.1% less memory than AdamW on average.
125+
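The closing line of the memory comparison can be checked against the table itself. The sketch below assumes the figure is the mean of the per-dimension relative reductions of MaxFactor versus AdamW. The source does not state how the average was taken, so that method is an assumption. The function name `average_savings` and the dictionary layout are illustrative only; the numbers are copied from the table above.

```python
# Memory usage (MB) per feature dimension, copied from the table above.
memory_mb = {
    100: {"AdamW": 35.34, "MaxFactor": 26.53},
    200: {"AdamW": 39.21, "MaxFactor": 29.62},
    400: {"AdamW": 46.34, "MaxFactor": 34.31},
    800: {"AdamW": 57.21, "MaxFactor": 42.94},
    1600: {"AdamW": 82.21, "MaxFactor": 61.69},
}


def average_savings(table, baseline="AdamW", candidate="MaxFactor"):
    """Mean relative memory reduction of `candidate` vs `baseline`, in percent.

    Assumption: the average is taken over per-dimension reductions,
    not over the summed totals.
    """
    reductions = [1.0 - row[candidate] / row[baseline] for row in table.values()]
    return 100.0 * sum(reductions) / len(reductions)


if __name__ == "__main__":
    print(f"MaxFactor vs AdamW: {average_savings(memory_mb):.2f}% less memory on average")
```

Under this assumption the mean works out to about 25.05%, which is consistent with the 25.1% reported. The per-dimension reductions range from roughly 24.5% to 26.0%, so the saving is fairly stable across feature dimensions.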
1+ Optimizer Benchmark Summary
2+ =========================
3+
4+ Dataset: cnn_mnist
5+ -----------------
6+ Final Test Accuracy:
7+ SGD: 97.57%
8+ Adam: 97.23%
9+ AdamW: 97.20%
10+ MaxFactor: 97.27%
11+
12+ Convergence Speed (epochs to 90% of final accuracy):
13+ SGD: 1 epochs
14+ Adam: 0 epochs
15+ AdamW: 0 epochs
16+ MaxFactor: 1 epochs
17+
18+ Average Time per Epoch:
19+ SGD: 1.10s
20+ Adam: 1.14s
21+ AdamW: 1.13s
22+ MaxFactor: 1.32s
23+
24+ Average Parameter Update Norm:
25+ SGD: 0.2764
26+ Adam: 0.5658
27+ AdamW: 0.5640
28+ MaxFactor: 1.2449
29+
30+
31+ Dataset: cnn_cifar
32+ -----------------
33+ Final Test Accuracy:
34+ SGD: 54.17%
35+ Adam: 21.43%
36+ AdamW: 21.47%
37+ MaxFactor: 46.77%
38+
39+ Convergence Speed (epochs to 90% of final accuracy):
40+ SGD: 6 epochs
41+ Adam: 3 epochs
42+ AdamW: 3 epochs
43+ MaxFactor: 7 epochs
44+
45+ Average Time per Epoch:
46+ SGD: 1.84s
47+ Adam: 1.83s
48+ AdamW: 1.83s
49+ MaxFactor: 2.05s
50+
51+ Average Parameter Update Norm:
52+ SGD: 0.3957
53+ Adam: 0.3934
54+ AdamW: 0.3926
55+ MaxFactor: 1.1278
56+
57+
58+ Dataset: convnet_cifar
59+ -----------------
60+ Final Test Accuracy:
61+ SGD: 48.37%
62+ Adam: 32.13%
63+ AdamW: 32.30%
64+ MaxFactor: 33.47%
65+
66+ Convergence Speed (epochs to 90% of final accuracy):
67+ SGD: 7 epochs
68+ Adam: 6 epochs
69+ AdamW: 6 epochs
70+ MaxFactor: 4 epochs
71+
72+ Average Time per Epoch:
73+ SGD: 1.91s
74+ Adam: 1.87s
75+ AdamW: 1.88s
76+ MaxFactor: 3.15s
77+
78+ Average Parameter Update Norm:
79+ SGD: 0.2950
80+ Adam: 0.8404
81+ AdamW: 0.8322
82+ MaxFactor: 1.3449
83+
84+
85+
86+ Memory Usage Comparison
87+ =====================
88+
89+ Feature Dimension: 100
90+ --------------------------
91+ SGD: 26.50 MB
92+ Adam: 35.34 MB
93+ AdamW: 35.34 MB
94+ MaxFactor: 26.53 MB
95+
96+ Feature Dimension: 200
97+ --------------------------
98+ SGD: 29.60 MB
99+ Adam: 39.21 MB
100+ AdamW: 39.21 MB
101+ MaxFactor: 29.62 MB
102+
103+ Feature Dimension: 400
104+ --------------------------
105+ SGD: 34.28 MB
106+ Adam: 46.34 MB
107+ AdamW: 46.34 MB
108+ MaxFactor: 34.31 MB
109+
110+ Feature Dimension: 800
111+ --------------------------
112+ SGD: 42.91 MB
113+ Adam: 57.21 MB
114+ AdamW: 57.21 MB
115+ MaxFactor: 42.94 MB
116+
117+ Feature Dimension: 1600
118+ --------------------------
119+ SGD: 61.66 MB
120+ Adam: 82.21 MB
121+ AdamW: 82.21 MB
122+ MaxFactor: 61.69 MB
123+
124+ MaxFactor uses 25.1% less memory than AdamW on average.
125+
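The "Convergence Speed (epochs to 90% of final accuracy)" entries in the summaries above could be derived along the following lines. This is a minimal sketch, not the benchmark's actual code. It assumes a per-epoch list of test accuracies and a zero-based epoch index, which would explain the "0 epochs" entries for Adam and AdamW on cnn_mnist. The function name `epochs_to_fraction` and the sample histories are hypothetical and are not taken from the benchmark runs.

```python
def epochs_to_fraction(acc_history, fraction=0.9):
    """Return the zero-based index of the first epoch whose accuracy
    reaches `fraction` of the final (last-epoch) accuracy.

    Assumption: epochs are counted from 0, so a run that is already
    within 90% of its final accuracy after the first epoch reports 0.
    """
    if not acc_history:
        raise ValueError("acc_history must contain at least one epoch")
    target = fraction * acc_history[-1]
    for epoch, acc in enumerate(acc_history):
        if acc >= target:
            return epoch
    # Only reachable if the final accuracy is negative, which should not occur.
    return len(acc_history) - 1


if __name__ == "__main__":
    # Hypothetical accuracy curves, for illustration only.
    fast_run = [92.0, 96.0, 97.2]        # already above 90% of final at epoch 0
    slow_run = [20.0, 35.0, 44.0, 50.0]  # reaches 45.0 (90% of 50.0) only at epoch 3
    print(epochs_to_fraction(fast_run), epochs_to_fraction(slow_run))
```

Because the threshold is relative to each optimizer's own final accuracy, a low count here does not imply a good result. On cnn_cifar, Adam reaches its threshold in 3 epochs but ends at 21.43%, well below SGD's 54.17%.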