diff --git a/SUMMARY.md b/SUMMARY.md index 66371b16..3e1f30fa 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -1,108 +1,107 @@ -# Summary +# Table of contents * [Introduction to Machine Learning Interviews Book](README.md) - * [Target audience](contents/0-target-audience.md) - * [About the questions](contents/0-about-the-questions.md) - * [About the answers](contents/0-about-the-answers.md) - * [Gaming the interview process](contents/0-gaming-the-interview-process.md) - * [Acknowledgments](contents/0-acknowledgments.md) - * [About the author](contents/0-about-the-author.md) + * [Target audience](contents/0-target-audience.md) + * [About the questions](contents/0-about-the-questions.md) + * [About the answers](contents/0-about-the-answers.md) + * [Gaming the interview process](contents/0-gaming-the-interview-process.md) + * [Acknowledgments](contents/0-acknowledgments.md) + * [About the author](contents/0-about-the-author.md) * [Part I. Overview](contents/part-i.-overview.md) - * [Chapter 1. Machine learning jobs](contents/chapter-1.-ml-jobs.md) - * [1.1 Different machine learning roles](contents/1.1-different-ml-roles.md) - * [1.1.1 Working in research vs. working in production](contents/1.1.1-working-in-research-vs.-workingin-production.md) - * [1.1.2 Research](contents/1.1.2-research.md) - * [1.1.2.1 Research vs. applied research](contents/1.1.2.1-research-vs.-applied-research.md) - * [1.1.2.2 Research scientist vs. research engineer](contents/1.1.2.2-research-scientist-vs.-research-engineer.md) - * [1.1.3 Production](contents/1.1.3-production.md) - * [1.1.3.1 Production cycle](contents/1.1.3.1-production-cycle.md) - * [1.1.3.2 Machine learning engineer vs. software engineer](contents/1.1.3.2-machine-learning-engineer-vs.-software-engineer.md) - * [1.1.3.3 Machine learning engineer vs. data scientist](contents/1.1.3.3-machine-learning-engineer-vs.-data-scientist.md) - * [1.1.3.4 Other technical roles in ML production](contents/1.1.3.4-other-technical-roles-in-ml-production.md) - * [1.1.3.5 Understanding roles and titles](contents/1.1.3.5-understanding-roles-and-titles.md) - * [1.2 Types of companies](contents/1.2-types-of-companies.md) - * [1.2.1 Applications companies vs. tooling companies](contents/1.2.1-applications-companies-vs.-tooling-companies.md) - * [1.2.2 Enterprise vs. consumer products](contents/1.2.2-enterprise-vs.-consumer-products.md) - * [1.2.3 Startups or big companies](contents/1.2.3-startups-or-big-companies.md) - * [Chapter 2. Machine learning interview process](contents/chapter-2.-machine-learning-interview-process.md) - * [2.1 Understanding the interviewers’ mindset](contents/2.1-understanding-the-interviewers’-mindset.md) - * [2.1.1 What companies want from candidates](contents/2.1.1-what-companies-want-from-candidates.md) - * [2.1.1.1 Technical skills](contents/2.1.1.1-technical-skills.md) - * [2.1.1.2 Non-technical skills](contents/2.1.1.2-non-technical-skills.md) - * [2.1.1.3 What exactly is culture fit?](contents/2.1.1.3-what-exactly-is-culture-fit.md) - * [2.1.1.4 Junior vs senior roles](contents/2.1.1.4-junior-vs-senior-roles.md) - * [2.1.1.5 Do I need a Ph.D. 
to work in machine learning?](contents/2.1.1.5-do-i-need-a-ph.d.-to-work-in-machine-learning.md) - * [2.1.2 How companies source candidates](contents/2.1.2-how-companies-source-candidates.md) - * [2.1.3 What signals companies look for in candidates](contents/2.1.3-what-signals-companies-look-for-in-candidates.md) - * [2.2 Interview pipeline](contents/2.2-interview-pipeline.md) - * [2.2.1 Common interview formats](contents/2.2.1-common-interview-formats.md) - * [2.2.2 Alternative interview formats](contents/2.2.2-alternative-interview-formats.md) - * [2.2.3 Interviews at big companies vs. at small companies](contents/2.2.3-interviews-at-big-companies-vs.-at-small-companies.md) - * [2.2.4 Interviews for internships vs. for full-time positions](contents/2.2.4-interviews-for-internships-vs.-for-full-time-positions.md) - * [2.3 Types of questions](contents/2.3-types-of-questions.md) - * [2.3.1 Behavioral questions](contents/2.3.1-behavioral-questions.md) - * [2.3.1.1 Background and resume](contents/2.3.1.1-background-and-resume.md) - * [2.3.1.2 Interests](contents/2.3.1.2-interests.md) - * [2.3.1.3 Communication](contents/2.3.1.3-communication.md) - * [2.3.1.4 Personality](contents/2.3.1.4-personality.md) - * [2.3.2 Questions to ask your interviewers](contents/2.3.2-questions-to-ask-your-interviewers.md) - * [2.3.3 Bad interview questions](contents/2.3.3-bad-interview-questions.md) - * [2.4 Red flags](contents/2.4-red-flags.md) - * [2.5 Timeline](contents/2.5-timeline.md) - * [2.6 Understanding your odds](contents/2.6-understanding-your-odds.md) - * [Chapter 3. After an offer](contents/chapter-3.-after-an-offer.md) - * [3.1 Compensation package](contents/3.1-compensation-package.md) - * [3.1.1 Base salary](contents/3.1.1-base-salary.md) - * [3.1.2 Equity grants](contents/3.1.2-equity-grants.md) - * [3.1.3 Bonuses](contents/3.1.3-bonuses.md) - * [3.1.4 Compensation packages at different levels](contents/3.1.4-compensation-packages-at-different-levels.md) - * [3.2 Negotiation](contents/3.2-negotiation.md) - * [3.2.1 Compensation expectations](contents/3.2.1-compensation-expectations.md) - * [3.3 Career progression](contents/3.3-career-progression.md) - * [Chapter 4. Where to start](contents/chapter-4.-where-to-start.md) - * [4.1 How long do I need for my job search?](contents/4.1-how-long-do-i-need-for-my-job-search.md) - * [4.2 How other people did it](contents/4.2-how-other-people-did-it.md) - * [4.3 Resources](contents/4.3-resources.md) - * [4.3.1 Courses](contents/4.3.1-courses.md) - * [4.3.2 Books & articles](contents/4.3.2-books-&-articles.md) - * [4.3.3 Other resources](contents/4.3.3-other-resources.md) - * [4.4 Do’s and don’ts for ML interviews](contents/4.4-do’s-and-don’ts-for-ml-interviews.md) - * [4.4.1 Do’s](contents/4.4.1-do’s.md) - * [4.4.2 Don’ts](contents/4.4.2-don’ts.md) + * [Chapter 1. Machine learning jobs](contents/chapter-1.-ml-jobs.md) + * [1.1 Different machine learning roles](contents/1.1-different-ml-roles.md) + * [1.1.1 Working in research vs. working in production](contents/1.1.1-working-in-research-vs.-workingin-production.md) + * [1.1.2 Research](contents/1.1.2-research.md) + * [1.1.2.1 Research vs. applied research](contents/1.1.2.1-research-vs.-applied-research.md) + * [1.1.2.2 Research scientist vs. research engineer](contents/1.1.2.2-research-scientist-vs.-research-engineer.md) + * [1.1.3 Production](contents/1.1.3-production.md) + * [1.1.3.1 Production cycle](contents/1.1.3.1-production-cycle.md) + * [1.1.3.2 Machine learning engineer vs. 
software engineer](contents/1.1.3.2-machine-learning-engineer-vs.-software-engineer.md) + * [1.1.3.3 Machine learning engineer vs. data scientist](contents/1.1.3.3-machine-learning-engineer-vs.-data-scientist.md) + * [1.1.3.4 Other technical roles in ML production](contents/1.1.3.4-other-technical-roles-in-ml-production.md) + * [1.1.3.5 Understanding roles and titles](contents/1.1.3.5-understanding-roles-and-titles.md) + * [1.2 Types of companies](contents/1.2-types-of-companies.md) + * [1.2.1 Applications companies vs. tooling companies](contents/1.2.1-applications-companies-vs.-tooling-companies.md) + * [1.2.2 Enterprise vs. consumer products](contents/1.2.2-enterprise-vs.-consumer-products.md) + * [1.2.3 Startups or big companies](contents/1.2.3-startups-or-big-companies.md) + * [Chapter 2. Machine learning interview process](contents/chapter-2.-machine-learning-interview-process.md) + * [2.1 Understanding the interviewers’ mindset](contents/2.1-understanding-the-interviewers’-mindset.md) + * [2.1.1 What companies want from candidates](contents/2.1.1-what-companies-want-from-candidates.md) + * [2.1.1.1 Technical skills](contents/2.1.1.1-technical-skills.md) + * [2.1.1.2 Non-technical skills](contents/2.1.1.2-non-technical-skills.md) + * [2.1.1.3 What exactly is culture fit?](contents/2.1.1.3-what-exactly-is-culture-fit.md) + * [2.1.1.4 Junior vs senior roles](contents/2.1.1.4-junior-vs-senior-roles.md) + * [2.1.1.5 Do I need a Ph.D. to work in machine learning?](contents/2.1.1.5-do-i-need-a-ph.d.-to-work-in-machine-learning.md) + * [2.1.2 How companies source candidates](contents/2.1.2-how-companies-source-candidates.md) + * [2.1.3 What signals companies look for in candidates](contents/2.1.3-what-signals-companies-look-for-in-candidates.md) + * [2.2 Interview pipeline](contents/2.2-interview-pipeline.md) + * [2.2.1 Common interview formats](contents/2.2.1-common-interview-formats.md) + * [2.2.2 Alternative interview formats](contents/2.2.2-alternative-interview-formats.md) + * [2.2.3 Interviews at big companies vs. at small companies](contents/2.2.3-interviews-at-big-companies-vs.-at-small-companies.md) + * [2.2.4 Interviews for internships vs. for full-time positions](contents/2.2.4-interviews-for-internships-vs.-for-full-time-positions.md) + * [2.3 Types of questions](contents/2.3-types-of-questions.md) + * [2.3.1 Behavioral questions](contents/2.3.1-behavioral-questions.md) + * [2.3.1.1 Background and resume](contents/2.3.1.1-background-and-resume.md) + * [2.3.1.2 Interests](contents/2.3.1.2-interests.md) + * [2.3.1.3 Communication](contents/2.3.1.3-communication.md) + * [2.3.1.4 Personality](contents/2.3.1.4-personality.md) + * [2.3.2 Questions to ask your interviewers](contents/2.3.2-questions-to-ask-your-interviewers.md) + * [2.3.3 Bad interview questions](contents/2.3.3-bad-interview-questions.md) + * [2.4 Red flags](contents/2.4-red-flags.md) + * [2.5 Timeline](contents/2.5-timeline.md) + * [2.6 Understanding your odds](contents/2.6-understanding-your-odds.md) + * [Chapter 3. 
After an offer](contents/chapter-3.-after-an-offer.md) + * [3.1 Compensation package](contents/3.1-compensation-package.md) + * [3.1.1 Base salary](contents/3.1.1-base-salary.md) + * [3.1.2 Equity grants](contents/3.1.2-equity-grants.md) + * [3.1.3 Bonuses](contents/3.1.3-bonuses.md) + * [3.1.4 Compensation packages at different levels](contents/3.1.4-compensation-packages-at-different-levels.md) + * [3.2 Negotiation](contents/3.2-negotiation.md) + * [3.2.1 Compensation expectations](contents/3.2.1-compensation-expectations.md) + * [3.3 Career progression](contents/3.3-career-progression.md) + * [Chapter 4. Where to start](contents/chapter-4.-where-to-start.md) + * [4.1 How long do I need for my job search?](contents/4.1-how-long-do-i-need-for-my-job-search.md) + * [4.2 How other people did it](contents/4.2-how-other-people-did-it.md) + * [4.3 Resources](contents/4.3-resources.md) + * [4.3.1 Courses](contents/4.3.1-courses.md) + * [4.3.2 Books & articles](contents/4.3.2-books-&-articles.md) + * [4.3.3 Other resources](contents/4.3.3-other-resources.md) + * [4.4 Do’s and don’ts for ML interviews](contents/4.4-do’s-and-don’ts-for-ml-interviews.md) + * [4.4.1 Do’s](contents/4.4.1-do’s.md) + * [4.4.2 Don’ts](contents/4.4.2-don’ts.md) * [Part II: Questions](contents/part-ii.-questions.md) - * [Chapter 5. Math](contents/chapter-5.-math.md) - * [Notation](contents/notation.md) - * [5.1 Algebra and (little) calculus](contents/5.1-algebra-and-calculus.md) - * [5.1.1 Vectors](contents/5.1.1-vectors.md) - * [5.1.2 Matrices](contents/5.1.2-matrices.md) - * [5.1.3 Dimensionality reduction](contents/5.1.3-dimensionality-reduction.md) - * [5.1.4 Calculus and convex optimization](contents/5.1.4-calculus-and-convex-optimization.md) - * [5.2 Probability and statistics](contents/5.2-probability-and-statistics.md) - * [5.2.1 Probability](contents/5.2.1-probability.md) - * [5.2.1.1 Basic concepts to review](contents/5.2.1.1-basic-concepts-to-review.md) - * [5.2.1.2 Questions](contents/5.2.1.2-questions.md) - * [5.2.2 Stats](contents/5.2.2-stats.md) - * [Chapter 6. Computer Science](contents/chapter-6.-computer-science.md) - * [6.1 Algorithms](contents/6.1-algorithms.md) - * [6.2 Complexity and numerical analysis](contents/6.2-complexity-and-numerical-analysis.md) - * [6.3 Data](contents/6.3-data.md) - * [6.3.1 Data structures](contents/6.3.1-data-structures.md) - * [Chapter 7. Machine learning workflows](contents/chapter-7.-machine-learning-workflows.md) - * [7.1 Basics](contents/7.1-basics.md) - * [7.2 Sampling and creating training data](contents/7.2-sampling-and-creating-training-data.md) - * [7.3 Objective functions, metrics, and evaluation](contents/7.3-objective-functions,-metrics,-and-evaluation.md) - * [Chapter 8. Machine learning algorithms](contents/chapter-8.-machine-learning-algorithms.md) - * [8.1 Classical machine learning](contents/8.1-classical-machine-learning.md) - * [8.1.1 Overview: Basic algorithm](contents/8.1.1-overview:-basic-algorithm.md) - * [8.1.2 Questions](contents/8.1.2-questions.md) - * [8.2 Deep learning architectures and applications](contents/8.2-deep-learning-architectures-and-applications.md) - * [8.2.1 Natural language processing](contents/8.2.1-natural-language-processing.md) - * [8.2.2 Computer vision](contents/8.2.2-computer-vision.md) - * [8.2.3 Reinforcement learning](contents/8.2.3-reinforcement-learning.md) - * [8.2.4 Other](contents/8.2.4-other.md) - * [8.3 Training neural networks](contents/8.3-training-neural-networks.md) + * [Chapter 5. 
Math](contents/chapter-5.-math.md) + * [Notation](contents/notation.md) + * [5.1 Algebra and (little) calculus](contents/5.1-algebra-and-calculus.md) + * [5.1.1 Vectors](contents/5.1.1-vectors.md) + * [5.1.2 Matrices](contents/5.1.2-matrices.md) + * [5.1.3 Dimensionality reduction](contents/5.1.3-dimensionality-reduction.md) + * [5.1.4 Calculus and convex optimization](contents/5.1.4-calculus-and-convex-optimization.md) + * [5.2 Probability and statistics](contents/5.2-probability-and-statistics.md) + * [5.2.1 Probability](contents/5.2.1-probability.md) + * [5.2.1.1 Basic concepts to review](contents/5.2.1.1-basic-concepts-to-review.md) + * [5.2.1.2 Questions](contents/5.2.1.2-questions.md) + * [5.2.2 Stats](contents/5.2.2-stats.md) + * [Chapter 6. Computer Science](contents/chapter-6.-computer-science.md) + * [6.1 Algorithms](contents/6.1-algorithms.md) + * [6.2 Complexity and numerical analysis](contents/6.2-complexity-and-numerical-analysis.md) + * [6.3 Data](contents/6.3-data.md) + * [6.3.1 Data structures](contents/6.3.1-data-structures.md) + * [Chapter 7. Machine learning workflows](contents/chapter-7.-machine-learning-workflows.md) + * [7.1 Basics](contents/7.1-basics.md) + * [7.2 Sampling and creating training data](contents/7.2-sampling-and-creating-training-data.md) + * [7.3 Objective functions, metrics, and evaluation](contents/7.3-objective-functions,-metrics,-and-evaluation.md) + * [Chapter 8. Machine learning algorithms](contents/chapter-8.-machine-learning-algorithms.md) + * [8.1 Classical machine learning](contents/8.1-classical-machine-learning.md) + * [8.1.1 Overview: Basic algorithm](contents/8.1.1-overview-basic-algorithm.md) + * [8.1.2 Questions](contents/8.1.2-questions.md) + * [8.2 Deep learning architectures and applications](contents/8.2-deep-learning-architectures-and-applications.md) + * [8.2.1 Natural language processing](contents/8.2.1-natural-language-processing.md) + * [8.2.2 Computer vision](contents/8.2.2-computer-vision.md) + * [8.2.3 Reinforcement learning](contents/8.2.3-reinforcement-learning.md) + * [8.2.4 Other](contents/8.2.4-other.md) + * [8.3 Training neural networks](contents/8.3-training-neural-networks.md) * [Appendix](contents/appendix.md) - * [A. For interviewers](contents/a.-for-interviewers.md) - * [The zen of interviews](contents/the-zen-of-interviews.md) - * [B. Building your network](contents/b.-building-your-network.md) - + * [A. For interviewers](contents/a.-for-interviewers.md) + * [The zen of interviews](contents/the-zen-of-interviews.md) + * [B. 
Building your network](contents/b.-building-your-network.md) diff --git a/answers/chapter5.md b/answers/chapter5.md index e69de29b..7918c607 100644 --- a/answers/chapter5.md +++ b/answers/chapter5.md @@ -0,0 +1,2 @@ +Geometrically, the dot product of two vectors measures how similar their directions are: a · b = |a||b|cos θ, where θ is the angle between them. +In other words, it quantifies how much one vector goes in the direction of the other -- it equals the length of the projection of one vector onto the other, multiplied by the length of that other vector. diff --git a/contents/8.1.1-overview:-basic-algorithm.md b/contents/8.1.1-overview-basic-algorithm.md similarity index 77% rename from contents/8.1.1-overview:-basic-algorithm.md rename to contents/8.1.1-overview-basic-algorithm.md index 6b0de3f2..8750402d 100644 --- a/contents/8.1.1-overview:-basic-algorithm.md +++ b/contents/8.1.1-overview-basic-algorithm.md @@ -1,34 +1,34 @@ -#### 8.1.1 Overview: Basic algorithms +# 8.1.1 Overview: Basic algorithm -##### 8.1.1.1 k-nearest neighbor (k-NN) +## 8.1.1.1 k-nearest neighbor (k-NN) k-NN is a non-parametric method used for classification and regression. Given an object, the algorithm’s output is computed from its k closest training examples in the feature space. -* In k-NN classification, each object is classified into the class most common among its k nearest neighbors. -* In k-NN regression, each object’s value is calculated as the average of the values of its k nearest neighbors. +* In k-NN classification, each object is classified into the class most common among its k nearest neighbors. +* In k-NN regression, each object’s value is calculated as the average of the values of its k nearest neighbors. **Applications**: anomaly detection, search, recommender system -##### 8.1.1.2 k-means clustering +## 8.1.1.2 k-means clustering k-means clustering aims to partition observations into k clusters in which each observation belongs to the cluster with the nearest mean. k-means minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances. The algorithm doesn’t guarantee convergence to the global optimum. The result may depend on the initial clusters. As the algorithm is usually fast, it is common to run it multiple times with different starting conditions. -The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. +The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. The algorithm has a loose relationship to the k-nearest neighbor classifier. After obtaining clusters using k-means clustering, we can classify new data into those clusters by applying the 1-nearest neighbor classifier to the cluster centers. **Applications**: Vector quantization for signal processing (where k-means clustering was originally developed), cluster analysis, feature learning, topic modeling. -##### 8.1.1.3 EM (expectation-maximization) algorithm +## 8.1.1.3 EM (expectation-maximization) algorithm EM algorithm is an iterative method to find maximum likelihood (MLE) or maximum a posteriori (MAP) estimates of parameters. It’s useful when the model depends on unobserved latent variables and equations can’t be solved directly. The iteration alternates between performing: -* an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters -* a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step.
These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. +* an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters +* a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. EM algorithm is guaranteed to return a local optimum of the sample likelihood function. @@ -36,7 +36,7 @@ EM algorithm is guaranteed to return a local optimum of the sample likelihood fu **Applications**: Data clustering, collaborative filtering. -##### 8.1.1.4 Tree-based methods +## 8.1.1.4 Tree-based methods **Decision tree** is a tree-based method that goes from observations about an object (represented in the branches) to conclusions about its target value (represented in the leaves). At its core, decision trees are nest if-else conditions. @@ -44,11 +44,11 @@ In **classification trees**, the target value is discrete and each leaf represen Decision trees are easy to interpret and can be used to visualize decisions. However, they are overfit to the data they are trained on -- small changes to the training set can result in significantly different tree structures, which lead to significantly different outputs. -##### 8.1.1.5 Bagging and boosting +## 8.1.1.5 Bagging and boosting Bagging and boosting are two popular ensembling methods commonly used with tree-based algorithms that can also be used for other algorithms. -###### 8.1.1.5.1 Bagging +### 8.1.1.5.1 Bagging Bagging, shortened for **b**ootstrap **agg**regat**ing**, is designed to improve the stability and accuracy of ML algorithms. It reduces variance and helps to avoid overfitting. @@ -58,12 +58,10 @@ If the problem is classification, the final prediction is decided by the majorit If the problem is regression, the final prediction is the average of all models’ predictions. -Bagging generally improves unstable methods, such as neural networks, classification and regression trees, and subset selection in linear regression. However, it can mildly degrade the performance of stable methods such as k-nearest neighbors[^2]. +Bagging generally improves unstable methods, such as neural networks, classification and regression trees, and subset selection in linear regression. However, it can mildly degrade the performance of stable methods such as k-nearest neighbors\[^2]. -
-  <img src="images/image26.png" alt="Bagging">
-  Illustration by <a href="https://en.wikipedia.org/wiki/Bootstrap_aggregating#/media/File:Ensemble_Bagging.svg">Sirakorn</a>
-</div>
+![Bagging](images/image26.png)\ +Illustration by [Sirakorn](https://en.wikipedia.org/wiki/Bootstrap\_aggregating#/media/File:Ensemble\_Bagging.svg) A **random forest** is an example of bagging. A random forest is a collection of decision trees constructed by both **bagging** and **feature randomness**, each tree can pick only from a random subset of features to use. @@ -73,36 +71,33 @@ Due to its ensembling nature, random forests correct for decision trees’ overf For more information on random forests, see [Understanding Random Forest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2) by Tony Yiu. -###### 8.1.1.5.2 Boosting +### 8.1.1.5.2 Boosting Boosting is a family of iterative ensemble algorithms that convert weak learners to strong ones. Each learner in this ensemble is trained on the same set of samples but the samples are weighted differently among iterations. Thus, future weak learners focus more on the examples that previous weak learners misclassified. - 1. You start by training the first weak classifier on the original dataset. 2. Samples are reweighted based on how well the first classifier classifies them, e.g. misclassified samples are given higher weight. 3. Train the second classifier on this reweighted dataset. Your ensemble now consists of the first and the second classifiers. 4. Samples are weighted based on how well the ensemble classifies them. 5. Train the third classifier on this reweighted dataset. Add the third classifier to the ensemble. 6. Repeat for as many iterations as needed. -7. Form the final strong classifier as a weighted combination of the existing classifiers -- classifiers with smaller training errors have higher weights. - -
-  <img src="images/image27.png" alt="Boosting">
-  Illustration by <a href="https://en.wikipedia.org/wiki/Boosting_(machine_learning)#/media/File:Ensemble_Boosting.svg">Sirakorn</a>
-</div>
+7. Form the final strong classifier as a weighted combination of the existing classifiers -- classifiers with smaller training errors have higher weights. +![Boosting](images/image27.png)\ +Illustration by [Sirakorn](https://en.wikipedia.org/wiki/Boosting\_\(machine\_learning\)#/media/File:Ensemble\_Boosting.svg) An example of a boosting algorithm is Gradient Boosting Machine which produces a prediction model typically from weak decision trees. It builds the model in a stage-wise fashion as other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. -XGBoost, a variant of GBM, used to be [the algorithm of choice for many winning teams of machine learning competitions](https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions). It’s been used in a wide range of tasks from classification, ranking, to the discovery of the Higgs Boson[^3]. However, many teams have been opting for [LightGBM](https://github.com/microsoft/LightGBM), a distributed gradient boosting framework that allows parallel learning which generally allows faster training on large datasets. +XGBoost, a variant of GBM, used to be [the algorithm of choice for many winning teams of machine learning competitions](https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions). It’s been used in a wide range of tasks from classification, ranking, to the discovery of the Higgs Boson\[^3]. However, many teams have been opting for [LightGBM](https://github.com/microsoft/LightGBM), a distributed gradient boosting framework that allows parallel learning which generally allows faster training on large datasets. -##### 8.1.1.6 Kernel methods +## 8.1.1.6 Kernel methods In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw representation. -Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick".[1] Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. +Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick".\[1] Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. 
Algorithms capable of operating with kernels include the kernel perceptron, support vector machines (SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters, and many others. Any linear model can be turned into a non-linear model by applying the kernel trick to the model: replacing its features (predictors) with a kernel function. ---- -*This book was created by [Chip Huyen](https://huyenchip.com) with the help of wonderful friends. For feedback, errata, and suggestions, the author can be reached [here](https://huyenchip.com/communication/). Copyright ©2021 Chip Huyen.* \ No newline at end of file +*** + +_This book was created by_ [_Chip Huyen_](https://huyenchip.com) _with the help of wonderful friends. For feedback, errata, and suggestions, the author can be reached_ [_here_](https://huyenchip.com/communication/)_. Copyright ©2021 Chip Huyen._
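To make the geometric reading of the dot product in the new `answers/chapter5.md` concrete, here is a minimal NumPy sketch; the example vectors and variable names are illustrative, not from the book:

```python
import numpy as np

a = np.array([3.0, 0.0])
b = np.array([2.0, 2.0])

dot = np.dot(a, b)                                          # a · b = 6.0
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine of the angle between a and b ≈ 0.707

# "How much b goes in the direction of a": the projection of b onto a has
# length (a · b) / |a|; scaling it back by |a| recovers the dot product.
proj_len = dot / np.linalg.norm(a)                          # 2.0
print(dot, cos_theta, proj_len * np.linalg.norm(a))         # 6.0 0.707... 6.0
```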
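Similarly, the bagging vs. boosting contrast in 8.1.1.5 can be sketched with scikit-learn. The dataset, estimator choices (AdaBoost standing in for the sample-reweighting procedure described in the numbered steps), and hyperparameters below are assumptions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, split into train and test sets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Bagging: each tree is trained on a bootstrap sample; predictions are combined by majority vote.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    # Random forest: bagging plus feature randomness at each split.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Boosting: learners are trained sequentially, upweighting samples earlier learners misclassified.
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```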