Commit 56fd697

Merge pull request #493 from komaksym/add_new_problem_CosineAnnealingLR
add new problem: CosineAnnealingLR learning rate scheduler
2 parents 57b1114 + 20d0fe4 commit 56fd697

File tree

7 files changed: 157 additions & 0 deletions
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
## Problem

Write a Python class `CosineAnnealingLRScheduler` that implements a learning rate scheduler based on the Cosine Annealing LR strategy. The class should have an `__init__` method taking initial_lr (float, the initial learning rate), T_max (int, the maximum number of iterations/epochs), and min_lr (float, the minimum learning rate), and a **get_lr(self, epoch)** method that returns the learning rate for a given epoch (int), following a cosine annealing schedule and rounded to 4 decimal places. Use only standard Python and the math module for trigonometric functions.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
  "input": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)\nprint(f\"{scheduler.get_lr(epoch=0):.4f}\")\nprint(f\"{scheduler.get_lr(epoch=2):.4f}\")\nprint(f\"{scheduler.get_lr(epoch=5):.4f}\")\nprint(f\"{scheduler.get_lr(epoch=7):.4f}\")\nprint(f\"{scheduler.get_lr(epoch=10):.4f}\")",
  "output": "0.1000\n0.0905\n0.0505\n0.0214\n0.0010",
  "reasoning": "The learning rate starts at initial_lr (0.1), follows a cosine curve, and reaches min_lr (0.001) at T_max (epoch 10); for epochs beyond T_max the epoch is clamped, so the rate holds at min_lr. Each value is rounded to 4 decimal places."
}
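To see where a value like 0.0905 comes from, here is a quick arithmetic check of the epoch-2 case using only the standard math module (an illustrative snippet, not part of the committed files):

import math

# Epoch 2 with initial_lr=0.1, T_max=10, min_lr=0.001:
# LR = min_lr + 0.5 * (initial_lr - min_lr) * (1 + cos(2/10 * pi))
lr = 0.001 + 0.5 * (0.1 - 0.001) * (1 + math.cos(2 / 10 * math.pi))
print(round(lr, 4))  # 0.0905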
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# **Learning Rate Schedulers: CosineAnnealingLR**

## **1. Definition**

A **learning rate scheduler** is a technique used in machine learning to adjust the learning rate during the training of a model. The **learning rate** dictates the step size taken in the direction of the negative gradient of the loss function.

**CosineAnnealingLR (Cosine Annealing Learning Rate)** is a scheduler that decreases the learning rate from a maximum value to a minimum value following the shape of a cosine curve. This approach helps achieve faster convergence while also allowing the model to explore flatter regions of the loss landscape towards the end of training. It is particularly effective for deep neural networks.

## **2. Why Use Learning Rate Schedulers?**

* **Faster Convergence:** A higher initial learning rate allows quicker movement through the loss landscape.
* **Improved Performance:** A smaller learning rate towards the end of training enables finer adjustments, helping the model converge to a better local minimum and preventing oscillations.
* **Avoiding Local Minima:** The cyclical nature of cosine annealing (often used with restarts) can help the optimizer escape shallow local minima.
* **Stability:** A gradual reduction in learning rate promotes training stability.

## **3. CosineAnnealingLR Mechanism**

The learning rate is scheduled according to a cosine function. Over a cycle of $T_{\text{max}}$ epochs, the learning rate decreases from an initial learning rate (often considered the maximum, $LR_{\text{max}}$) to a minimum learning rate ($LR_{\text{min}}$).

The formula for the learning rate at a given epoch $e$ is:

$$LR_e = LR_{\text{min}} + 0.5 \times (LR_{\text{initial}} - LR_{\text{min}}) \times \left(1 + \cos\left(\frac{e}{T_{\text{max}}} \times \pi\right)\right)$$

Where:

* $LR_e$: The learning rate at epoch $e$.
* $LR_{\text{initial}}$: The initial (maximum) learning rate.
* $LR_{\text{min}}$: The minimum learning rate that the schedule will reach.
* $T_{\text{max}}$: The maximum number of epochs in the cosine annealing cycle. The learning rate reaches $LR_{\text{min}}$ at epoch $T_{\text{max}}$.
* $e$: The current epoch number (0-indexed), clamped between 0 and $T_{\text{max}}$.
* $\pi$: The mathematical constant pi (approximately 3.14159).
* $\cos(\cdot)$: The cosine function.

**Example:**
If $LR_{\text{initial}} = 0.1$, $T_{\text{max}} = 10$, and $LR_{\text{min}} = 0.001$:

* **Epoch 0:**
  $LR_0 = 0.001 + 0.5 \times (0.1 - 0.001) \times (1 + \cos(0)) = 0.001 + 0.0495 \times 2 = 0.1$

* **Epoch 5 (mid-point):**
  $LR_5 = 0.001 + 0.5 \times (0.1 - 0.001) \times (1 + \cos(\pi/2)) = 0.001 + 0.0495 \times 1 = 0.0505$

* **Epoch 10 (end of cycle):**
  $LR_{10} = 0.001 + 0.5 \times (0.1 - 0.001) \times (1 + \cos(\pi)) = 0.001 + 0.0495 \times 0 = 0.001$
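As a quick check of the formula, the sketch below evaluates the schedule at the epochs worked out above, using only standard Python and the math module (the helper function name is ours, for illustration; it is not part of the problem's required API):

import math

def cosine_annealing_lr(epoch, initial_lr=0.1, T_max=10, min_lr=0.001):
    # Clamp the epoch to [0, T_max], then apply the cosine annealing formula.
    e = min(epoch, T_max)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1 + math.cos(e / T_max * math.pi))

for epoch in (0, 5, 10):
    print(epoch, round(cosine_annealing_lr(epoch), 4))
# 0 0.1
# 5 0.0505
# 10 0.001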
## **4. Applications of Learning Rate Schedulers**

Learning rate schedulers, including CosineAnnealingLR, are widely used in training various machine learning models, especially deep neural networks, across diverse applications such as:

* **Image Classification:** Training Convolutional Neural Networks (CNNs) for tasks like object recognition.
* **Natural Language Processing (NLP):** Training Recurrent Neural Networks (RNNs) and Transformers for tasks like machine translation, text generation, and sentiment analysis.
* **Speech Recognition:** Training models for converting spoken language to text.
* **Reinforcement Learning:** Optimizing policies in reinforcement learning agents.
* **Any optimization problem** where gradient descent or its variants are used.
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
{
  "id": "155",
  "title": "CosineAnnealingLR Learning Rate Scheduler",
  "difficulty": "medium",
  "category": "Machine Learning",
  "video": "",
  "likes": "0",
  "dislikes": "0",
  "contributor": [
    {
      "profile_link": "https://github.com/komaksym",
      "name": "komaksym"
    }
  ]
}
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
import math


class CosineAnnealingLRScheduler:
    def __init__(self, initial_lr, T_max, min_lr):
        """
        Initializes the CosineAnnealingLR scheduler.

        Args:
            initial_lr (float): The initial (maximum) learning rate.
            T_max (int): The maximum number of epochs in the cosine annealing cycle.
                The learning rate will reach min_lr at this epoch.
            min_lr (float): The minimum learning rate.
        """
        self.initial_lr = initial_lr
        self.T_max = T_max
        self.min_lr = min_lr

    def get_lr(self, epoch):
        """
        Calculates and returns the current learning rate for a given epoch,
        following a cosine annealing schedule and rounded to 4 decimal places.

        Args:
            epoch (int): The current epoch number (0-indexed).

        Returns:
            float: The learning rate for the given epoch, rounded to 4 decimal places.
        """
        # The cosine formula is defined for e in [0, T_max]. In practice,
        # schedulers may restart or hold the LR after T_max; for this problem,
        # clamp the epoch to T_max.
        current_epoch = min(epoch, self.T_max)

        # LR_e = LR_min + 0.5 * (LR_initial - LR_min) * (1 + cos(e / T_max * pi))
        lr = self.min_lr + 0.5 * (self.initial_lr - self.min_lr) * \
            (1 + math.cos(current_epoch / self.T_max * math.pi))

        # Round the learning rate to 4 decimal places
        return round(lr, 4)
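A short usage sketch of the reference solution above (the optimizer hookup in the comment is illustrative only; the scheduler itself just computes values):

scheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)
for epoch in range(12):
    lr = scheduler.get_lr(epoch)
    # In a real training loop, lr would be handed to the optimizer here,
    # e.g. optimizer.param_groups[0]["lr"] = lr in PyTorch.
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
# Epochs 10 and 11 both print 0.0010 because the epoch is clamped at T_max.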
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
import math


class CosineAnnealingLRScheduler:
    def __init__(self, initial_lr, T_max, min_lr):
        # Initialize initial_lr, T_max, and min_lr
        pass

    def get_lr(self, epoch):
        # Calculate and return the learning rate for the given epoch, rounded to 4 decimal places
        pass
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
[
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)\nprint(f\"{scheduler.get_lr(epoch=0):.4f}\")",
    "expected_output": "0.1000"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)\nprint(f\"{scheduler.get_lr(epoch=2):.4f}\")",
    "expected_output": "0.0905"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)\nprint(f\"{scheduler.get_lr(epoch=5):.4f}\")",
    "expected_output": "0.0505"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)\nprint(f\"{scheduler.get_lr(epoch=7):.4f}\")",
    "expected_output": "0.0214"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)\nprint(f\"{scheduler.get_lr(epoch=10):.4f}\")",
    "expected_output": "0.0010"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.05, T_max=50, min_lr=0.0)\nprint(f\"{scheduler.get_lr(epoch=0):.4f}\\n{scheduler.get_lr(epoch=25):.4f}\\n{scheduler.get_lr(epoch=50):.4f}\")",
    "expected_output": "0.0500\n0.0250\n0.0000"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.001, T_max=1, min_lr=0.0001)\nprint(f\"{scheduler.get_lr(epoch=0):.4f}\\n{scheduler.get_lr(epoch=1):.4f}\")",
    "expected_output": "0.0010\n0.0001"
  },
  {
    "test": "import math\nscheduler = CosineAnnealingLRScheduler(initial_lr=0.2, T_max=20, min_lr=0.01)\nprint(f\"{scheduler.get_lr(epoch=15):.4f}\")",
    "expected_output": "0.0378"
  }
]

0 commit comments

Comments
 (0)