diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 0000000..dd84ea7
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,38 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Desktop (please complete the following information):**
+ - OS: [e.g. iOS]
+ - Browser [e.g. chrome, safari]
+ - Version [e.g. 22]
+
+**Smartphone (please complete the following information):**
+ - Device: [e.g. iPhone6]
+ - OS: [e.g. iOS8.1]
+ - Browser [e.g. stock browser, safari]
+ - Version [e.g. 22]
+
+**Additional context**
+Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 0000000..bbcbbe7
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000..5f2a828
--- /dev/null
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,76 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to making participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies both within project spaces and in public spaces
+when an individual is representing the project or its community. Examples of
+representing a project or community include using an official project e-mail
+address, posting via an official social media account, or acting as an appointed
+representative at an online or offline event. Representation of a project may be
+further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at ryan.dsilva.98@gmail.com. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..888972d
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,8 @@
+# Contributing
+
+Keeping this simple. To contribute:
+1. Create an issue with the feature request/bug, or ask to be assigned one of the existing issues.
+2. Create a new branch from develop and make all your changes in that branch.
+3. Open a pull request into develop, adding either [@RyanDsilva](https://github.com/RyanDsilva) or [@sanfernoronha](https://github.com/sanfernoronha) as a reviewer.
+
+Thanks for contributing! Let's make this project help thousands of people get started with Neural Networks.
diff --git a/README.md b/README.md
index 6d1f878..5430d92 100644
--- a/README.md
+++ b/README.md
@@ -66,20 +66,31 @@ True Values:
 
 ## Roadmap 📑
 
-- [x] Basic Activation Functions
-- [x] Basic Loss Functions
-- [x] Gradient Descent
+- [ ] Activation Functions
+  - [x] Linear
+  - [x] Sigmoid
+  - [x] Tanh
+  - [x] ReLu
+  - [ ] LeakyReLu
+  - [ ] SoftMax
+  - [ ] GeLu
+- [ ] Loss Functions
+  - [x] MAE
+  - [x] MSE
+  - [ ] CrossEntropy
+- [ ] Optimizers
+  - [x] Gradient Descent
+  - [x] Gradient Descent w/ Momentum
+  - [ ] Nesterov Accelerated Gradient
+  - [ ] RMSProp
+  - [ ] Adam
+- [ ] Regularization
+  - [ ] L1
+  - [ ] L2
+  - [ ] Dropout
 - [x] Layer Architecture
 - [x] Wrapper Classes
 - [x] Hyperparameters Configuration
-- [ ] Exotic Functions
-  - [ ] SoftMax Activation
-  - [ ] Gradient Descent w/ Momentum
-  - [ ] RMSProp Optimizer
-  - [ ] Adam Optimizer
-  - [ ] CrossEntropy Loss Function
-  - [ ] GeLu Activation
-- [ ] Regularization
 - [ ] Clean Architecture
 - [ ] UI (Similar to Tensorflow Playground)
 
@@ -87,6 +98,15 @@ True Values:
 
 ###### Collaborations in implementing and maintaining this project are welcome. Kindly reach out to me if interested.
 
+## Contributors 🌟
+
+
+
+
+
+
+
+
 ## References 📚
 - Deep Learning Specialization, Andrew NG - Coursera
 
diff --git a/core/dense.py b/core/dense.py
index 1762e7d..5dde8c9 100644
--- a/core/dense.py
+++ b/core/dense.py
@@ -9,6 +9,8 @@ class Dense(Layer):
     def __init__(self, input_size, output_size):
         self.weights = np.random.rand(input_size, output_size) - 0.5
         self.bias = np.random.rand(1, output_size) - 0.5
+        self.vW = np.zeros([input_size, output_size])
+        self.vB = np.zeros([1, output_size])
 
     def forward_propagation(self, input_data):
         self.input = input_data
@@ -19,8 +21,12 @@ def backward_propagation(self, output_error, optimizer_fn, learning_rate):
         input_error = np.dot(output_error, self.weights.T)
         dW = np.dot(self.input.T, output_error)
         dB = output_error
-        w_updated, b_updated = optimizer_fn(
-            self.weights, self.bias, dW, dB, learning_rate)
+
+        w_updated, b_updated, vW_updated, vB_updated = optimizer_fn.minimize(
+            self.weights, self.bias, dW, dB, self.vW, self.vB, learning_rate
+        )
         self.weights = w_updated
         self.bias = b_updated
+        self.vW = vW_updated
+        self.vB = vB_updated
         return input_error
diff --git a/main.py b/main.py
index 82e975e..59c408a 100644
--- a/main.py
+++ b/main.py
@@ -1,4 +1,5 @@
 import numpy as np
+import time
 
 import config
 from core.network import Network
@@ -6,26 +7,26 @@
 from core.activation_layer import Activation
 from activations.activation_functions import Tanh, dTanh
 from loss.loss_functions import MSE, dMSE
-from optimizers.optimizer_functions import GradientDescent
+from optimizers.optimizer_functions import Momentum
 
 from keras.datasets import mnist
 from keras.utils import np_utils
 
 # Load MNIST
 (x_train, y_train), (x_test, y_test) = mnist.load_data()
-x_train = x_train.reshape(x_train.shape[0], 1, 28*28)
-x_train = x_train.astype('float32')
+x_train = x_train.reshape(x_train.shape[0], 1, 28 * 28)
+x_train = x_train.astype("float32")
 x_train /= 255
 y_train = np_utils.to_categorical(y_train)
 
-x_test = x_test.reshape(x_test.shape[0], 1, 28*28)
-x_test = x_test.astype('float32')
+x_test = x_test.reshape(x_test.shape[0], 1, 28 * 28)
+x_test = x_test.astype("float32")
 x_test /= 255
 y_test = np_utils.to_categorical(y_test)
 
 # Model
 nn = Network()
-nn.add(Dense(28*28, 100))
+nn.add(Dense(28 * 28, 100))
 nn.add(Activation(Tanh, dTanh))
 nn.add(Dense(100, 50))
 nn.add(Activation(Tanh, dTanh))
@@ -33,10 +34,12 @@ nn.add(Activation(Tanh, dTanh))
 
 # Training
+
 nn.useLoss(MSE, dMSE)
-nn.useOptimizer(GradientDescent, learning_rate=config.learning_rate)
+nn.useOptimizer(Momentum(), learning_rate=config.learning_rate)
 
 nn.fit(x_train[0:2000], y_train[0:2000], epochs=config.epochs)
 
+
 # Prediction
 out = nn.predict(x_test[0:2])
 print("\nPredicted Values: ")
diff --git a/optimizers/README.md b/optimizers/README.md
index a1bf819..1e12546 100644
--- a/optimizers/README.md
+++ b/optimizers/README.md
@@ -18,6 +18,8 @@ Optimizer Functions help us update the parameters in the most efficient way poss
 
 
 
+`vdW: accumulator for weight parameter | beta: momentum term (dampening factor) | dJ/dW: weights gradient (obtained from loss function)`
+
 
 - RMSProp
 
diff --git a/optimizers/optimizer_functions.py b/optimizers/optimizer_functions.py
index ecc2efb..3936221 100644
--- a/optimizers/optimizer_functions.py
+++ b/optimizers/optimizer_functions.py
@@ -1,8 +1,9 @@
 import numpy as np
 
 
-def GradientDescent(w, b, dW, dB, learning_rate=0.01):
-    """Implements Gradient Descent to find minima of cost function
+class GradientDescent:
+    def minimize(self, w, b, dW, dB, vW, vB, learning_rate=0.01):
+        """Implements Gradient Descent to find minima of cost function
 
     Parameters:
     - w (numpy array): weights matrix
@@ -16,32 +17,66 @@ def GradientDescent(w, b, dW, dB, learning_rate=0.01):
     - b_updated (numpy array): updated bias
 
     """
-    w_updated = w - learning_rate*dW
-    b_updated = b - learning_rate*dB
-    return w_updated, b_updated
+        w_updated = w - learning_rate * dW
+        b_updated = b - learning_rate * dB
+        return w_updated, b_updated, vW, vB
 
 
-def Momentum(w, b, dW, dB, learning_rate=0.01, beta=0.9):
-    """Implements Gradient Descent with Momentum to find minima of cost function
+class Momentum:
+    def minimize(self, w, b, dW, dB, vW, vB, learning_rate=0.01, beta=0.9):
+        """Implements Gradient Descent with Momentum to find minima of cost function
 
-    Parameters:
-    - w (numpy array): weights matrix
-    - b (numpy array): bias matrix
-    - dW (numpy array): gradient of weights matrix wrt cost function
-    - dB (numpy array): gradient of bias matrix wrt cost function
-    - learning_rate (double): learning rate used to update weights
-    - beta (double):
+        Parameters:
+        - w (numpy array): weights matrix
+        - b (numpy array): bias matrix
+        - dW (numpy array): gradient of weights matrix wrt cost function
+        - dB (numpy array): gradient of bias matrix wrt cost function
+        - learning_rate (double): learning rate used to update weights
+        - beta (double): Momentum term for smoothing
+        - vW (numpy array): holds the state of the optimizer for previous iteration (weights)
+        - vB (numpy array): holds the state of the optimizer for previous iteration (biases)
 
-    Returns:
-    - w_updated (numpy array): updated weights
-    - b_updated (numpy array): updated bias
+        Returns:
+        - w_updated (numpy array): updated weights
+        - b_updated (numpy array): updated bias
+        - vW (numpy array): updated state of the optimizer for current iteration (weights)
+        - vB (numpy array): updated state of the optimizer for current iteration (biases)
 
-    """
-    pass
+        """
+        vW = beta * vW + (1 - beta) * dW
+        vB = beta * vB + (1 - beta) * dB
+        w_updated = w - learning_rate * vW
+        b_updated = b - learning_rate * vB
+        return w_updated, b_updated, vW, vB
 
 
-def RMSProp(w, b, dW, dB, learning_rate, beta, epsilon):
-    pass
+class RMSProp:
+    def minimize(self, w, b, dW, dB, sW, sB, learning_rate=0.01, beta=0.9, epsilon=1e-07):
+        """Implements Gradient Descent with RMSProp to find minima of cost function
+        Parameters:
+        - w (numpy array): weights matrix
+        - b (numpy array): bias matrix
+        - dW (numpy array): gradient of weights matrix wrt cost function
+        - dB (numpy array): gradient of bias matrix wrt cost function
+        - learning_rate (double): learning rate used to update weights
+        - beta (double): Momentum term for smoothing
+        - sW (numpy array): holds the state of the optimizer for previous iteration (weights)
+        - sB (numpy array): holds the state of the optimizer for previous iteration (biases)
+        - epsilon (double): a small constant for numerical stability
+
+        Returns:
+        - w_updated (numpy array): updated weights
+        - b_updated (numpy array): updated bias
+        - sW (numpy array): updated state of the optimizer for current iteration (weights)
+        - sB (numpy array): updated state of the optimizer for current iteration (biases)
+        """
+        sW = beta * sW + (1 - beta) * np.square(dW)
+        sB = beta * sB + (1 - beta) * np.square(dB)
+        w_updated = w - (learning_rate * dW) / (np.sqrt(sW) + epsilon)
+        b_updated = b - (learning_rate * dB) / (np.sqrt(sB) + epsilon)
+
+        return w_updated, b_updated, sW, sB
 
 
 def Adam(w, b, dW, dB, learning_rate, beta1, beta2, epsilon):
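For reviewers who want to sanity-check the new `minimize` interface without pulling the whole repo, here is a minimal standalone sketch of the Momentum update introduced above. It assumes only NumPy; the shapes and learning rate are arbitrary illustration values, not taken from the project config.

```python
import numpy as np


class Momentum:
    """Gradient descent with momentum, mirroring the minimize() signature in this diff."""

    def minimize(self, w, b, dW, dB, vW, vB, learning_rate=0.01, beta=0.9):
        # Exponentially weighted moving average of past gradients.
        vW = beta * vW + (1 - beta) * dW
        vB = beta * vB + (1 - beta) * dB
        # Step in the direction of the smoothed gradient.
        w_updated = w - learning_rate * vW
        b_updated = b - learning_rate * vB
        return w_updated, b_updated, vW, vB


# Tiny hand-rolled example: zero parameters, all-ones gradients.
w, b = np.zeros((2, 2)), np.zeros((1, 2))
dW, dB = np.ones((2, 2)), np.ones((1, 2))
vW, vB = np.zeros((2, 2)), np.zeros((1, 2))

opt = Momentum()
w, b, vW, vB = opt.minimize(w, b, dW, dB, vW, vB, learning_rate=0.1)
# First step: vW = 0.1 * dW, so w moves by -0.1 * 0.1 ≈ -0.01 per entry.
print(w[0, 0])
```

Note that because `Dense` stores `vW`/`vB` itself and threads them through `minimize`, the optimizer objects stay stateless, which is why `GradientDescent.minimize` can simply pass the velocity buffers back unchanged.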