
Commit bfc1fda

add figure for word2vec
1 parent e9a7ac2 commit bfc1fda

File tree

2 files changed: +15 −4 lines changed


images/fig_nlp_word2vec.png

164 KB

notebooks/24_NLP_4_ngrams_word_vectors.ipynb

Lines changed: 15 additions & 4 deletions
@@ -2028,16 +2028,26 @@
20282028
"\n",
20292029
"### Word2Vec\n",
20302030
"\n",
2031-
"The fundamental idea behind Word2Vec is to use the context in which words appear to learn their meanings. As shown in the {numref}`fig_word2vec_sliding_window`, a sliding window of a fixed size (in this case, 5) moves across the sentence \"The customer likes cake with a cappuccino.\" At each step, the algorithm selects a target word and its surrounding context words. The goal is to predict the target word based on its context or vice versa.\n",
2031+
"The fundamental idea behind Word2Vec is to use the context in which words appear to learn their meanings. As shown in the {numref}`fig_word2vec_sliding_window`, a sliding window of a fixed size (here, 5) moves over the sentence \n",
20322032
"\n",
2033-
"For example, in the phrase \"the customer likes,\" the target word is \"the,\" and the context words are \"customer\" and \"likes.\" This process is repeated for each possible position in the sentence. These word-context pairs are fed into the Word2Vec model, which learns to map each word to a unique vector in such a way that words appearing in similar contexts have similar vectors. This vector representation captures semantic similarities, meaning that words with similar meanings or usages are positioned closer together in the vector space. Word2Vec thus enables various applications such as sentiment analysis, machine translation, and recommendation systems by providing a mathematical representation of words that reflects their meanings and relationships.\n",
2033+
"> \"The customer likes cake with a cappuccino.\"\n",
20342034
"\n",
2035-
"Word2Vec models can be trained using two main methods: Continuous Bag of Words (CBOW) and Skip-Gram. In CBOW, the model predicts a target word based on its surrounding context words, focusing on understanding the word's context to infer its meaning. Conversely, the Skip-Gram model predicts the surrounding context words given a target word, emphasizing the ability to generate context from a single word {cite}`mikolov_distributed_2013`{cite}`mikolov_efficient_2013`.\n",
2035+
"At each position, the algorithm selects the central word as **target** treats the remaining words in the window as its **context**. For example, in the phrase \"the customer likes,\" the target word is \"the,\" and the context words are \"customer\" and \"likes.\" This process is repeated for each possible position in the sentence. \n",
2036+
"\n",
2037+
"These word-context pairs are fed into the Word2Vec model, which learns to map each word to a unique vector in such a way that words appearing in similar contexts have similar vectors. This vector representation captures semantic similarities, meaning that words with similar meanings or usages are positioned closer together in the vector space. Word2Vec thereby enables various applications such as sentiment analysis, machine translation, and recommendation systems by providing a mathematical representation of words that reflects their meanings and relationships.\n",
20362038
"\n",
20372039
"```{figure} ../images/fig_word2vec_sliding_window.png\n",
20382040
":name: fig_word2vec_sliding_window\n",
20392041
"\n",
20402042
"Techniques such as Word2Vec learn vector representations of individual words based on their \"context\", which is given by the neighboring words. \n",
2043+
"```\n",
2044+
"\n",
2045+
"Word2Vec models can be trained using two main methods: Continuous Bag of Words (CBOW) and Skip-Gram. In CBOW, the model predicts a target word based on its surrounding context words, focusing on understanding the word's context to infer its meaning (see {numref}`fig_nlp_word2vec`). Conversely, the Skip-Gram model predicts the surrounding context words given a target word, emphasizing the ability to generate context from a single word {cite}`mikolov_distributed_2013`{cite}`mikolov_efficient_2013`.\n",
2046+
"\n",
2047+
"```{figure} ../images/fig_nlp_word2vec.png\n",
2048+
":name: fig_nlp_word2vec\n",
2049+
"\n",
2050+
"The aim of a Word2Vec model is typically to learn vector representations of individual words based on some context words. This is done in such a way that the very large (and very sparse) input vectors are converted into highly compressed float vectors. In this example figure, the context words are \"cherry\", \"is\", \"and\", \"sweet\", and the target word would be \"red\".\n",
20412051
"```\n"
20422052
]
20432053
},
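The sliding-window pairing described in the edited cell above can be sketched in a few lines of Python. This is an illustration only, not code from the notebook; the example sentence and the window size of 5 come from the text, while the variable names and the symmetric-context layout are assumptions:

```python
# Minimal sketch: extract (target, context) pairs with a sliding window of width 5.
sentence = "the customer likes cake with a cappuccino".split()
window_size = 5           # total window width, as in the figure
half = window_size // 2   # up to 2 context words on each side of the target

pairs = []
for i, target in enumerate(sentence):
    # context = neighboring words to the left and right of the target
    context = sentence[max(0, i - half):i] + sentence[i + 1:i + 1 + half]
    pairs.append((target, context))

print(pairs[0])  # ('the', ['customer', 'likes']), matching the example in the text
```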
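For the CBOW versus Skip-Gram distinction from the same cell, a hedged usage sketch with gensim (one common library for training Word2Vec models, not necessarily the one used in this notebook; the toy corpus below is invented for illustration):

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (invented for illustration).
corpus = [
    "the customer likes cake with a cappuccino".split(),
    "the customer likes coffee with a croissant".split(),
]

# sg=0 trains a CBOW model (predict the target word from its context),
# sg=1 trains a Skip-Gram model (predict the context words from the target).
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

# Every word is now mapped to a dense float vector of length 50.
print(cbow.wv["cappuccino"].shape)        # (50,)
print(skipgram.wv.most_similar("cake"))   # words closest in the learned vector space
```

In this sketch the only change between the two training methods is the `sg` flag; the remaining hyperparameters (vector size, window, minimum count) are shared.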
@@ -2657,7 +2667,8 @@
26572667
"\n",
26582668
"Very good starting points for going deeper are:\n",
26592669
"- The book \"Speech and Language Processing\" by Jurafsky and Martin {cite}`jurafsky2024speech`, see [link to the book](https://web.stanford.edu/~jurafsky/slp3/).\n",
2660-
"- \"Natural language processing with transformers\" by Tunstall, von Werra, and Wolf {cite}`tunstall2022natural`"
2670+
"- \"Natural language processing with transformers\" by Tunstall, von Werra, and Wolf {cite}`tunstall2022natural`\n",
2671+
"- Another free online course with Python code material: [nlp planet](https://www.nlplanet.org/course-practical-nlp/)."
26612672
]
26622673
},
26632674
{
