diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
index 9ebde4c42..2cb0fae14 100644
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -14,6 +14,6 @@ jobs:
       package: course
       path_to_docs: course/chapters/
       additional_args: --not_python_module
-      languages: ar bn de en es fa fr gj he hi id it ja ko ne pl pt ru ro te th tr vi zh-CN zh-TW
+      languages: ar bn de en es fa fr gj he hi id it ja ko my ne pl pt ru ro te th tr vi zh-CN zh-TW
     secrets:
       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
index a0f408054..8e9259cd5 100644
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -16,4 +16,4 @@ jobs:
       package: course
       path_to_docs: course/chapters/
       additional_args: --not_python_module
-      languages: ar bn de en es fa fr gj he hi id it ja ko ne pl pt ru ro te th tr vi zh-CN zh-TW
+      languages: ar bn de en es fa fr gj he hi id it ja ko my ne pl pt ru ro te th tr vi zh-CN zh-TW
diff --git a/chapters/bn/chapter2/1.mdx b/chapters/bn/chapter2/1.mdx
index 358c1f474..fd19b9784 100644
--- a/chapters/bn/chapter2/1.mdx
+++ b/chapters/bn/chapter2/1.mdx
@@ -20,6 +20,5 @@
 
 তারপরে আমরা টোকেনাইজার API দেখব, যা `pipeline()` ফাংশনের অন্য একটি প্রধান উপাদান। টোকেনাইজার জিনিসটা প্রথম ও শেষ প্রসেসিং স্টেপগুলোতে মেইনলি কাজে লাগে, নিউরাল নেটওয়ার্কের জন্য টেক্সট ডাটা থেকে সংখ্যাসূচক ইনপুটে রূপান্তর  এবং পরে আবার প্রয়োজন অনুযায়ী সংখ্যাসূচক ডাটা থেকে টেক্সট ডাটাতে রূপান্তর করার সময়। পরিশেষে, আমরা আপনাকে দেখাব কিভাবে ব্যাচের মাধ্যমে একাধিক বাক্যকে একটি মডেলে পাঠানো যায়।  তারপরে আরেকবার হাই-লেভেলে `tokenizer()` ফাংশনটিকে একনজরে দেখার মাধ্যমে পুরো অধ্যায়ের ইতি টানব।
 
-<Tip>
-⚠️ Model Hub এবং 🤗 Transformers এর সাথে উপলব্ধ সমস্ত বৈশিষ্ট্যগুলি থেকে উপকৃত হওয়ার জন্য, আমরা সাজেস্ট করি <a href="https://huggingface.co/join">এখানে  একটি একাউন্ট তৈরি করার জন্যে।</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ Model Hub এবং 🤗 Transformers এর সাথে উপলব্ধ সমস্ত বৈশিষ্ট্যগুলি থেকে উপকৃত হওয়ার জন্য, আমরা সাজেস্ট করি <a href="https://huggingface.co/join">এখানে  একটি একাউন্ট তৈরি করার জন্যে।</a>.
\ No newline at end of file
diff --git a/chapters/de/chapter1/3.mdx b/chapters/de/chapter1/3.mdx
index 7db9cb48a..9e15b54d2 100644
--- a/chapters/de/chapter1/3.mdx
+++ b/chapters/de/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 In diesem Abschnitt schauen wir uns an, was Transformer-Modelle zu leisten imstande sind. Zudem verwenden wir unser erstes Werkzeug aus der 🤗 Transformers-Bibliothek: die Funktion `pipeline()`.
 
-<Tip>
-👀 Siehst du rechts oben die Schaltfläche <em>Open in Colab</em>? Klicke darauf, um ein Google Colab Notebook, das alle Codebeispiele dieses Abschnitts enthält, zu öffnen. Diese Schaltfläche ist in jedem Abschnitt, der Codebeispiele enthält, zu finden.
-
-Wenn du die Beispiele lieber lokal ausführen möchtest, empfehlen wir dir, einen Blick auf das Kapitel <a href="/course/chapter0">Einrichtung</a> zu werfen.
-</Tip>
+> [!TIP]
+> 👀 Siehst du rechts oben die Schaltfläche <em>Open in Colab</em>? Klicke darauf, um ein Google Colab Notebook, das alle Codebeispiele dieses Abschnitts enthält, zu öffnen. Diese Schaltfläche ist in jedem Abschnitt, der Codebeispiele enthält, zu finden.
+>
+> Wenn du die Beispiele lieber lokal ausführen möchtest, empfehlen wir dir, einen Blick auf das Kapitel <a href="/course/chapter0">Einrichtung</a> zu werfen.
 
 ## Transformer-Modelle sind überall anzutreffen!
 
@@ -23,9 +22,8 @@ Transformer-Modelle werden verwendet, um alle Arten von CL-Aufgaben (engl. Tasks
 
 Die [🤗 Transformers-Bibliothek](https://github.com/huggingface/transformers) bietet die Funktionalität, um diese geteilten Modelle zu erstellen und zu nutzen. Der [Model Hub](https://huggingface.co/models) enthält Tausende von vortrainierten Modellen, die jeder herunterladen und nutzen kann. Auch du kannst dort deine eigenen Modelle hochladen!
 
-<Tip>
-⚠️ Der Hugging Face Hub ist nicht auf Transformer-Modelle beschränkt. Jede bzw. jeder kann die von ihr bzw. ihm gewünschten Arten von Modellen oder Datensätzen teilen! <a href="https://huggingface.co/join">Erstelle ein Konto auf huggingface.co</a>, um alle verfügbaren Features nutzen zu können!
-</Tip>
+> [!TIP]
+> ⚠️ Der Hugging Face Hub ist nicht auf Transformer-Modelle beschränkt. Jede bzw. jeder kann die von ihr bzw. ihm gewünschten Arten von Modellen oder Datensätzen teilen! <a href="https://huggingface.co/join">Erstelle ein Konto auf huggingface.co</a>, um alle verfügbaren Features nutzen zu können!
 
 Bevor wir uns ansehen, wie Transformer-Modelle im Einzelnen funktionieren, widmen wir uns ein paar Beispielen, die veranschaulichen, wie sie zur Lösung interessanter CL-Problemstellungen eingesetzt werden können.
 
@@ -104,11 +102,8 @@ classifier(
 
 Diese Pipeline heißt _zero-shot_, weil du das Modell nicht erst auf deine Daten feintunen musst, ehe du es verwenden kannst. Sie kann direkt die Wahrscheinlichkeiten für jede beliebige von dir vorgegebene Liste von Labels liefern!
 
-<Tip>
-
-✏️ **Probiere es aus!** Spiel mit deinen eigenen Sequenzen und Labels herum und beobachte, wie sich das Modell verhält.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probiere es aus!** Spiel mit deinen eigenen Sequenzen und Labels herum und beobachte, wie sich das Modell verhält.
 
 
 ## Textgenerierung
@@ -132,11 +127,8 @@ generator("In this course, we will teach you how to")
 
 Mit dem Argument `num_return_sequences` kannst du steuern, wie viele verschiedene Sequenzen erzeugt werden und mit dem Argument `max_length`, wie lang der Ausgabetext insgesamt sein soll.
 
-<Tip>
-
-✏️ **Probiere es aus!** Wähle die Argumente `num_return_sequences` und `max_length` so, dass zwei Sätze mit jeweils 15 Wörtern erzeugt werden.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probiere es aus!** Wähle die Argumente `num_return_sequences` und `max_length` so, dass zwei Sätze mit jeweils 15 Wörtern erzeugt werden.
 
 
 ## Verwendung eines beliebigen Modells vom Hub in einer Pipeline
@@ -168,11 +160,8 @@ Du kannst deine Suche nach einem Modell verfeinern, indem du auf eines der `Lang
 
 Nachdem du auf ein Modell geklickt und es ausgewählt hast, siehst du, dass es ein Widget gibt, mit dem du es direkt online ausprobieren kannst. Dementsprechend kannst du die Fähigkeiten eines Modells erst schnell testen, bevor du dich dazu entschließt, es herunterzuladen.
 
-<Tip>
-
-✏️ **Probiere es aus!** Verwende die Filter, um ein Textgenerierungsmodell für eine andere Sprache zu finden. Experimentiere ruhig ein wenig mit dem Widget und verwende das Modell in einer Pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **Probiere es aus!** Verwende die Filter, um ein Textgenerierungsmodell für eine andere Sprache zu finden. Experimentiere ruhig ein wenig mit dem Widget und verwende das Modell in einer Pipeline!
 
 ### Die Inference API
 
@@ -204,11 +193,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 Mit dem Argument `top_k` kannst du bestimmen, wie viele Möglichkeiten dir ausgegeben werden sollen. Beachte, dass das Modell hier das spezielle Wort `<mask>` auffüllt, das oft als *Mask-Token* bezeichnet wird. Andere Modelle, die dazu dienen, Maskierungen aufzufüllen, können andere Mask Tokens haben. Deshalb ist es immer gut, erst das verwendete Mask Token zu ermitteln, wenn du andere Modelle nutzen möchtest. Eine Möglichkeit, zu überprüfen, welches Mask Token verwendet wird, ist das Widget.
 
-<Tip>
-
-✏️ **Probiere es aus!** Suche im Hub nach dem Modell `bert-base-cased` und finde sein Mask Token im Widget, das auf der Inference API basiert, heraus. Was sagt dieses Modell für den oben in der Pipeline verwendeten Satz vorher?
-
-</Tip>
+> [!TIP]
+> ✏️ **Probiere es aus!** Suche im Hub nach dem Modell `bert-base-cased` und finde sein Mask Token im Widget, das auf der Inference API basiert, heraus. Was sagt dieses Modell für den oben in der Pipeline verwendeten Satz vorher?
 
 ## Named Entity Recognition
 
@@ -232,11 +218,8 @@ Hier hat das Modell richtig erkannt, dass Sylvain eine Person (PER), Hugging Fac
 
 In der Funktion zur Erstellung der Pipeline übergeben wir die Option `grouped_entities=True`, um die Pipeline anzuweisen, die Teile des Satzes, die der gleichen Entität entsprechen, zu gruppieren: Hier hat das Modell "Hugging" und "Face" richtigerweise als eine einzelne Organisation gruppiert, auch wenn der Name aus mehreren Wörtern besteht. Wie wir im nächsten Kapitel sehen werden, werden bei der Vorverarbeitung (engl. Preprocessing) sogar einige Wörter in kleinere Teile zerlegt. Zum Beispiel wird `Sylvain` in vier Teile zerlegt: `S`, `##yl`, `##va` und `##in`. Im Nachverarbeitungsschritt (engl. Post-Processing) hat die Pipeline diese Teile erfolgreich neu gruppiert.
 
-<Tip>
-
-✏️ **Probiere es aus!** Suche im Model Hub nach einem Modell, das in der Lage ist, Part-of-Speech-Tagging (in der Regel als POS abgekürzt) im Englischen durchzuführen (Anm.: d. h. Wortarten zuzuordnen). Was sagt dieses Modell für den Satz im obigen Beispiel vorher?
-
-</Tip>
+> [!TIP]
+> ✏️ **Probiere es aus!** Suche im Model Hub nach einem Modell, das in der Lage ist, Part-of-Speech-Tagging (in der Regel als POS abgekürzt) im Englischen durchzuführen (Anm.: d. h. Wortarten zuzuordnen). Was sagt dieses Modell für den Satz im obigen Beispiel vorher?
 
 ## Frage-Antwort-Systeme (Question Answering)
 
@@ -320,10 +303,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 Wie bei der Textgenerierung und -zusammenfassung kannst du auch hier `max_length` oder `min_length` als Argumente für das Ergebnis angeben.
 
-<Tip>
-
-✏️ **Probiere es aus!** Suche nach Übersetzungsmodellen in anderen Sprachen und versuche, den vorangegangenen Satz in mehrere verschiedene Sprachen zu übersetzen.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probiere es aus!** Suche nach Übersetzungsmodellen in anderen Sprachen und versuche, den vorangegangenen Satz in mehrere verschiedene Sprachen zu übersetzen.
 
 Die bisher gezeigten Pipelines dienen hauptsächlich zu Demonstrationszwecken. Sie wurden für bestimmte Aufgabenstellungen programmiert und sind nicht für Abwandlungen geeignet. Im nächsten Kapitel erfährst du, was sich hinter einer `pipeline()`-Funktion verbirgt und wie du ihr Verhalten anpassen kannst.
diff --git a/chapters/de/chapter3/2.mdx b/chapters/de/chapter3/2.mdx
index f77dc0141..1c9aaad0a 100644
--- a/chapters/de/chapter3/2.mdx
+++ b/chapters/de/chapter3/2.mdx
@@ -149,11 +149,8 @@ raw_train_dataset.features
 
 Hinter den Kulissen ist `label` vom Typ `ClassLabel`, und die Zuordnung von Ganzzahlen zum Labelnamen wird im Ordner *names* gespeichert. `0` entspricht `not_equivalent`, also "nicht äquivalent", und `1` entspricht `equivalent`, also "äquivalent".
 
-<Tip>
-
-✏️ **Probier es aus!** Sieh dir das Element 15 der Trainingsdaten und Element 87 des Validierungsdaten an. Was sind ihre Labels?
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier es aus!** Sieh dir das Element 15 der Trainingsdaten und Element 87 der Validierungsdaten an. Was sind ihre Labels?
 
 ### Vorverarbeitung eines Datensatzes
 
@@ -191,11 +188,8 @@ inputs
 
 In [Kapitel 2](/course/chapter2) haben wir die Schlüsselwerte `input_ids` und `attention_mask` behandelt, allerdings haben wir es aufgeschoben, über `token_type_ids` zu sprechen. In diesem Beispiel teilt diese dem Modell mit, welcher Teil des Input der erste Satz und welcher der zweite Satz ist.
 
-<Tip>
-
-✏️ **Probier es aus!** Nimm Element 15 der Trainingsdaten und tokenisiere die beiden Sätze separat und als Paar. Wo liegt der Unterschied zwischen den beiden Ergebnissen?
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier es aus!** Nimm Element 15 der Trainingsdaten und tokenisiere die beiden Sätze separat und als Paar. Wo liegt der Unterschied zwischen den beiden Ergebnissen?
 
 Wenn wir die IDs in `input_ids` zurück in Worte dekodieren:
 
@@ -353,11 +347,8 @@ Das sieht gut aus! Jetzt, da wir vom Rohtext zu Batches übergegangen sind, mit
 
 {/if}
 
-<Tip>
-
-✏️ **Probier es aus!** Repliziere die Vorverarbeitung auf dem GLUE SST-2-Datensatz. Es ist ein bisschen anders, da es aus einzelnen Sätzen statt aus Paaren besteht, aber der Rest von dem, was wir gemacht haben, sollte gleich aussehen. Alternative wäre eine schwierigere Herausforderung, eine Vorverarbeitungsfunktion zu schreiben, die bei allen GLUE-Aufgaben funktioniert.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier es aus!** Repliziere die Vorverarbeitung auf dem GLUE SST-2-Datensatz. Es ist ein bisschen anders, da es aus einzelnen Sätzen statt aus Paaren besteht, aber der Rest von dem, was wir gemacht haben, sollte gleich aussehen. Alternativ wäre es eine schwierigere Herausforderung, eine Vorverarbeitungsfunktion zu schreiben, die bei allen GLUE-Aufgaben funktioniert.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/de/chapter3/3.mdx b/chapters/de/chapter3/3.mdx
index 202251577..ef20299a4 100644
--- a/chapters/de/chapter3/3.mdx
+++ b/chapters/de/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Wenn du dein Modell während des Trainings automatisch in das Hub hochladen möchtest, kann in `TrainingArguments` das Argument `push_to_hub=True` angegeben werden. Darüber erfahren wir in [Kapitel 4](/course/chapter4/3) mehr.
-
-</Tip>
+> [!TIP]
+> 💡 Wenn du dein Modell während des Trainings automatisch in das Hub hochladen möchtest, kann in `TrainingArguments` das Argument `push_to_hub=True` angegeben werden. Darüber erfahren wir in [Kapitel 4](/course/chapter4/3) mehr.
 
 Der zweite Schritt ist die Definition unseres Modells. Wie im [vorherigen Kapitel](/course/chapter2) verwenden wir die Klasse `AutoModelForSequenceClassification` mit zwei Labels:
 
@@ -164,9 +161,6 @@ Der `Trainer` funktioniert sofort auf mehreren GPUs oder TPUs und bietet zahlrei
 
 Damit ist die Einführung in das Fein-tunen mit der `Trainer` API abgeschlossen. Beispiele für die gängigsten CL-Aufgaben werden in Kapitel 7 gegeben, aber jetzt schauen wir uns erst einmal an, wie man das Gleiche in PyTorch bewerkstelligen kann.
 
-<Tip>
-
-✏️ **Probier es aus!** Fein-tune ein Modell mit dem GLUE SST-2 Datensatz, indem du die Datenverarbeitung aus Abschnitt 2 verwendest.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier es aus!** Fein-tune ein Modell mit dem GLUE SST-2 Datensatz, indem du die Datenverarbeitung aus Abschnitt 2 verwendest.
 
diff --git a/chapters/de/chapter3/3_tf.mdx b/chapters/de/chapter3/3_tf.mdx
index 970d06835..4d4cbb18b 100644
--- a/chapters/de/chapter3/3_tf.mdx
+++ b/chapters/de/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ Im Gegensatz zu [Kapitel 2](/course/chapter2) wird eine Warnung angezeigt, nachd
 
 Um das Modell mit unserem Datensatz fein-tunen zu können, müssen wir das Modell `kompilieren()` und unsere Daten an die `fit()`-Methode übergeben. Damit wird das Fein-tuning gestartet (dies sollte auf einer GPU ein paar Minuten dauern) und der Trainingsverlust sowie der Validierungsverlust am Ende jeder Epoche gemeldet.
 
-<Tip>
-
-🤗 Transformer Modelle haben eine besondere Fähigkeit, die die meisten Keras Modelle nicht haben - sie können automatisch einen geeigneten Verlust verwenden, der intern berechnet wird. Dieser Verlust wird standardmäßig verwendet, wenn in `compile()` kein Verlustargument angegeben wird. Um den internen Verlust zu verwenden, musst du deine Labels als Teil des Input übergeben und nicht als separates Label, wie es normalerweise bei Keras-Modellen der Fall ist. Beispiele dafür gibt es in Teil 2 des Kurses, wobei die Definition der richtigen Verlustfunktion schwierig sein kann. Für die Klassifizierung von Sequenzen eignet sich jedoch eine der Standardverlustfunktionen von Keras, die wir hier verwenden werden.
-
-</Tip>
+> [!TIP]
+> 🤗 Transformer Modelle haben eine besondere Fähigkeit, die die meisten Keras Modelle nicht haben - sie können automatisch einen geeigneten Verlust verwenden, der intern berechnet wird. Dieser Verlust wird standardmäßig verwendet, wenn in `compile()` kein Verlustargument angegeben wird. Um den internen Verlust zu verwenden, musst du deine Labels als Teil des Input übergeben und nicht als separates Label, wie es normalerweise bei Keras-Modellen der Fall ist. Beispiele dafür gibt es in Teil 2 des Kurses, wobei die Definition der richtigen Verlustfunktion schwierig sein kann. Für die Klassifizierung von Sequenzen eignet sich jedoch eine der Standardverlustfunktionen von Keras, die wir hier verwenden werden.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Hier gibt es einen sehr häufigen Stolperstein - du *kannst* Keras einfach den Namen des Verlusts als String übergeben, aber standardmäßig geht Keras davon aus, dass du bereits einen Softmax auf die Outputs angewendet hast. Viele Modelle geben jedoch die Werte direkt vor der Anwendung des Softmax als *Logits* aus. Hier ist es wichtig der Keras Verlustfunktion mitzuteilen, dass unser Modell genau diess tut, und das geht nur indem sie direkt aufgerufen wird, und nicht über den Namen mit einem String.
-
-</Tip>
+> [!WARNING]
+> Hier gibt es einen sehr häufigen Stolperstein - du *kannst* Keras einfach den Namen des Verlusts als String übergeben, aber standardmäßig geht Keras davon aus, dass du bereits einen Softmax auf die Outputs angewendet hast. Viele Modelle geben jedoch die Werte direkt vor der Anwendung des Softmax als *Logits* aus. Hier ist es wichtig, der Keras-Verlustfunktion mitzuteilen, dass unser Modell genau dies tut, und das geht nur, indem sie direkt aufgerufen wird und nicht über den Namen mit einem String.
 
 
 ### Verbesserung der Trainingsperformance
@@ -122,11 +116,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-Die 🤗 Transformer Bibliothek hat eine `create_optimizer()`-Funktion, die einen `AdamW`-Optimierer mit Lernratenabfall erzeugt. Das ist eine praktisches Tool, auf das wir in den nächsten Abschnitten des Kurses im Detail eingehen werden.
-
-</Tip>
+> [!TIP]
+> Die 🤗 Transformer Bibliothek hat eine `create_optimizer()`-Funktion, die einen `AdamW`-Optimierer mit Lernratenabfall erzeugt. Das ist ein praktisches Tool, auf das wir in den nächsten Abschnitten des Kurses im Detail eingehen werden.
 
 Somit haben wir einen neuen Optimierer definiert und können ihn zum Training verwenden. Zuerst laden wir das Modell neu, um die Änderungen an der Gewichtung aus dem letzten Trainingslauf zurückzusetzen, und dann können wir es mit dem neuen Optimierer kompilieren:
 
@@ -144,11 +135,8 @@ Jetzt starten wir einen erneuten Trainingslauf mit `fit`:
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Wenn du dein Modell während des Trainings automatisch in den Hub hochladen möchtest, kannst du in der Methode `model.fit()` einen `PushToHubCallback` mitgeben. Mehr darüber erfahren wir in [Kapitel 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 Wenn du dein Modell während des Trainings automatisch in den Hub hochladen möchtest, kannst du in der Methode `model.fit()` einen `PushToHubCallback` mitgeben. Mehr darüber erfahren wir in [Kapitel 4](/course/chapter4/3).
 
 ### Modell-Vorhersagen
 
@@ -188,8 +176,5 @@ Die genauen Ergebnisse können variieren, da die zufällige Initialisierung des
 
 Damit ist die Einführung in das Fein-tunen mit der Keras-API abgeschlossen. Beispiele für die gängigsten CL-Aufgaben findest du in Kapitel 7.
 
-<Tip>
-
-✏️ **Probier es aus!** Fein-tune ein Modell mit dem GLUE SST-2 Datensatz, indem du die Datenverarbeitung aus Abschnitt 2 verwendest.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier es aus!** Fein-tune ein Modell mit dem GLUE SST-2 Datensatz, indem du die Datenverarbeitung aus Abschnitt 2 verwendest.
diff --git a/chapters/de/chapter3/4.mdx b/chapters/de/chapter3/4.mdx
index bf63ada9c..6395de2e0 100644
--- a/chapters/de/chapter3/4.mdx
+++ b/chapters/de/chapter3/4.mdx
@@ -196,11 +196,8 @@ metric.compute()
 
 Auch hier werden deine Ergebnisse wegen der Zufälligkeit bei der Initialisierung des Modellkopfes und der Datenverteilung etwas anders ausfallen, aber sie sollten in etwa gleich sein.
 
-<Tip>
-
-✏️ **Probier es selbt!** Ändere die vorherige Trainingsschleife, um dein Modell auf dem SST-2-Datensatz fein zu tunen.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier es selbst!** Ändere die vorherige Trainingsschleife, um dein Modell auf dem SST-2-Datensatz fein zu tunen.
 
 ### Verbessere deine Trainingsschleife mit 🤗 Accelerate
 
@@ -292,9 +289,8 @@ Die erste Zeile, die hinzugefügt werden muss, ist die Import-Zeile. Die zweite
 
 Der Hauptteil der Arbeit wird dann in der Zeile erledigt, die die Dataloader, das Modell und den Optimierer an `accelerator.prepare()` sendet. Dadurch werden diese Objekte in den richtigen Container verpackt, damit das verteilte Training wie vorgesehen funktioniert. Die verbleibenden Änderungen sind das Entfernen der Zeile, die das Batch auf dem Gerät mit `device` ablegt (wenn du das beibehalten willst, kannst du es einfach in `accelerator.device` ändern) und das Ersetzen von `loss.backward()` durch `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ Um von dem Geschwindigkeitsvorteil der Cloud TPUs zu profitieren, empfehlen wir, deine Samples mit den Argumenten `padding="max_length"` und `max_length` des Tokenizers auf eine feste Länge aufzufüllen.
-</Tip>
+> [!TIP]
+> ⚠️ Um von dem Geschwindigkeitsvorteil der Cloud TPUs zu profitieren, empfehlen wir, deine Samples mit den Argumenten `padding="max_length"` und `max_length` des Tokenizers auf eine feste Länge aufzufüllen.
 
 Wenn du damit experimentieren möchtest, siehst du hier, wie die komplette Trainingsschleife mit 🤗 Accelerate aussieht:
 
diff --git a/chapters/de/chapter4/2.mdx b/chapters/de/chapter4/2.mdx
index a445d6f60..082e5040c 100644
--- a/chapters/de/chapter4/2.mdx
+++ b/chapters/de/chapter4/2.mdx
@@ -91,6 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-Wenn du ein vortrainiertes Modell verwendest, prüf erstmal, wie genau das traininert wurde, mit welchen Datensätzen, sowie seine Einschränkungen und Biases. All diese Informationen sollten auf der Modellbeschreibungskarte stehen.
-</Tip>
+> [!TIP]
+> Wenn du ein vortrainiertes Modell verwendest, prüf erstmal, wie genau es trainiert wurde, mit welchen Datensätzen, sowie seine Einschränkungen und Biases. All diese Informationen sollten auf der Modellbeschreibungskarte stehen.
diff --git a/chapters/de/chapter4/3.mdx b/chapters/de/chapter4/3.mdx
index 343221b0b..6a9b99441 100644
--- a/chapters/de/chapter4/3.mdx
+++ b/chapters/de/chapter4/3.mdx
@@ -171,11 +171,8 @@ Click auf den Tab "Files and versions" und da solltest du die Dateien finden, di
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Probier das selber aus!** Lade das Modell und den Tokenizer vom Checkpoint `bert-base-cased` mit der Methode `push_to_hub()` hoch. Überprüfe, dass der Repository auf deiner Seite richtig erscheint, bevor du den löschst.
-
-</Tip>
+> [!TIP]
+> ✏️ **Probier das selber aus!** Lade das Modell und den Tokenizer vom Checkpoint `bert-base-cased` mit der Methode `push_to_hub()` hoch. Überprüfe, dass das Repository auf deiner Seite richtig erscheint, bevor du es löschst.
 
 Wie du schon gesehen hast, akzeptiert die Methode `push_to_hub()` mehrere Argumente. Dies erlaub das Hochladen auf den Namespace eines spezifischen Repositorys oder einer Organisation, sowie die Möglichkeit, einen anderen API Token zu benutzten. Wir empfehlen dir, die Dokumentation der Methode direkt auf [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html) zu lesen, um dir eine Vorstellung zu schaffen, was alles damit möglich ist.
 
@@ -459,9 +456,8 @@ Wenn du dir die Dateigrößen anschaust (z.B. mit `ls -lh`), solltest du sehen,
 
 {/if}
 
-<Tip>
-✏️  Wenn ein Repository mittels der Webinterface kreiert wird, wird die *.gitattributes* Datei automatisch gesetzt, um bestimmte Dateiendungen wie *.bin* und *.h5* als große Dateien zu betrachten, sodass git-lfs sie tracken kann, ohne dass du weiteres konfigurieren musst.
-</Tip> 
+> [!TIP]
+> ✏️  Wenn ein Repository über das Webinterface erstellt wird, wird die *.gitattributes*-Datei automatisch so eingerichtet, dass bestimmte Dateiendungen wie *.bin* und *.h5* als große Dateien behandelt werden, sodass git-lfs sie tracken kann, ohne dass du etwas weiter konfigurieren musst.
 
 Nun können wir weitermachen und so arbeiten wie wir es mit normalen Git Repositories machen. Wir können die Dateien stagen mit dem Git-Befehl `git add`:
 
diff --git a/chapters/en/chapter1/1.mdx b/chapters/en/chapter1/1.mdx
index ff9f55560..9294f9af1 100644
--- a/chapters/en/chapter1/1.mdx
+++ b/chapters/en/chapter1/1.mdx
@@ -146,9 +146,8 @@ For some languages, the [course YouTube videos](https://youtube.com/playlist?lis
 
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/subtitles.png" alt="Activating subtitles for the Hugging Face course YouTube videos" width="75%">
 
-<Tip>
-Don't see your language in the above table or you'd like to contribute to an existing translation? You can help us translate the course by following the instructions <a href="https://github.com/huggingface/course#translating-the-course-into-your-language">here</a>.
-</Tip>
+> [!TIP]
+> Don't see your language in the above table or you'd like to contribute to an existing translation? You can help us translate the course by following the instructions <a href="https://github.com/huggingface/course#translating-the-course-into-your-language">here</a>.
 
 ## Let's go 🚀
 
diff --git a/chapters/en/chapter1/2.mdx b/chapters/en/chapter1/2.mdx
index 13fa4eef2..ebfb179a4 100644
--- a/chapters/en/chapter1/2.mdx
+++ b/chapters/en/chapter1/2.mdx
@@ -27,11 +27,8 @@ NLP isn't limited to written text though. It also tackles complex challenges in
 
 In recent years, the field of NLP has been revolutionized by Large Language Models (LLMs). These models, which include architectures like GPT (Generative Pre-trained Transformer) and [Llama](https://huggingface.co/meta-llama), have transformed what's possible in language processing.
 
-<Tip>
-
-A large language model (LLM) is an AI model trained on massive amounts of text data that can understand and generate human-like text, recognize patterns in language, and perform a wide variety of language tasks without task-specific training. They represent a significant advancement in the field of natural language processing (NLP).
-
-</Tip>
+> [!TIP]
+> A large language model (LLM) is an AI model trained on massive amounts of text data that can understand and generate human-like text, recognize patterns in language, and perform a wide variety of language tasks without task-specific training. They represent a significant advancement in the field of natural language processing (NLP).
 
 LLMs are characterized by:
 - **Scale**: They contain millions, billions, or even hundreds of billions of parameters
diff --git a/chapters/en/chapter1/3.mdx b/chapters/en/chapter1/3.mdx
index df13888ec..2865476fb 100644
--- a/chapters/en/chapter1/3.mdx
+++ b/chapters/en/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the `pipeline()` function.
 
-<Tip>
-👀 See that <em>Open in Colab</em> button on the top right? Click on it to open a Google Colab notebook with all the code samples of this section. This button will be present in any section containing code examples. 
-
-If you want to run the examples locally, we recommend taking a look at the <a href="/course/chapter0">setup</a>.
-</Tip>
+> [!TIP]
+> 👀 See that <em>Open in Colab</em> button on the top right? Click on it to open a Google Colab notebook with all the code samples of this section. This button will be present in any section containing code examples.
+>
+> If you want to run the examples locally, we recommend taking a look at the <a href="/course/chapter0">setup</a>.
 
 ## Transformers are everywhere![[transformers-are-everywhere]]
 
@@ -23,11 +22,8 @@ Transformer models are used to solve all kinds of tasks across different modalit
 
 The [🤗 Transformers library](https://github.com/huggingface/transformers) provides the functionality to create and use those shared models. The [Model Hub](https://huggingface.co/models) contains millions of pretrained models that anyone can download and use. You can also upload your own models to the Hub!
 
-<Tip>
-
-⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of models or datasets they want! <a href="https://huggingface.co/join">Create a huggingface.co</a> account to benefit from all available features!
-
-</Tip>
+> [!TIP]
+> ⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of model or dataset they want! <a href="https://huggingface.co/join">Create a huggingface.co account</a> to benefit from all available features!
 
 Before diving into how Transformer models work under the hood, let's look at a few examples of how they can be used to solve some interesting NLP problems.
 
@@ -75,11 +71,8 @@ The `pipeline()` function supports multiple modalities, allowing you to work wit
 
 Here's an overview of what's available:
 
-<Tip>
-
-For a full and updated list of pipelines, see the [🤗 Transformers documentation](https://huggingface.co/docs/hub/en/models-tasks).
-
-</Tip>
+> [!TIP]
+> For a full and updated list of pipelines, see the [🤗 Transformers documentation](https://huggingface.co/docs/hub/en/models-tasks).
 
 ### Text pipelines
 
@@ -130,11 +123,8 @@ classifier(
 
 This pipeline is called _zero-shot_ because you don't need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
 
-<Tip>
-
-✏️ **Try it out!** Play around with your own sequences and labels and see how the model behaves.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Play around with your own sequences and labels and see how the model behaves.
 
 
 ## Text generation[[text-generation]]
@@ -158,11 +148,8 @@ generator("In this course, we will teach you how to")
 
 You can control how many different sequences are generated with the argument `num_return_sequences` and the total length of the output text with the argument `max_length`.
 
-<Tip>
-
-✏️ **Try it out!** Use the `num_return_sequences` and `max_length` arguments to generate two sentences of 15 words each.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the `num_return_sequences` and `max_length` arguments to generate two sentences of 15 words each.
 
 ## Using any model from the Hub in a pipeline[[using-any-model-from-the-hub-in-a-pipeline]]
 
@@ -193,11 +180,8 @@ You can refine your search for a model by clicking on the language tags, and pic
 
 Once you select a model by clicking on it, you'll see that there is a widget enabling you to try it directly online. This way you can quickly test the model's capabilities before downloading it.
 
-<Tip>
-
-✏️ **Try it out!** Use the filters to find a text generation model for another language. Feel free to play with the widget and use it in a pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the filters to find a text generation model for another language. Feel free to play with the widget and use it in a pipeline!
 
 ### Inference Providers[[inference-providers]]
 
@@ -229,11 +213,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 The `top_k` argument controls how many possibilities you want to be displayed. Note that here the model fills in the special `<mask>` word, which is often referred to as a *mask token*. Other mask-filling models might have different mask tokens, so it's always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.
 
-<Tip>
-
-✏️ **Try it out!** Search for the `bert-base-cased` model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our `pipeline` example above?
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Search for the `bert-base-cased` model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our `pipeline` example above?
 
 ## Named entity recognition[[named-entity-recognition]]
 
@@ -257,11 +238,8 @@ Here the model correctly identified that Sylvain is a person (PER), Hugging Face
 
 We pass the option `grouped_entities=True` in the pipeline creation function to tell the pipeline to regroup together the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the post-processing step, the pipeline successfully regrouped those pieces.
 
-<Tip>
-
-✏️ **Try it out!** Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?
 
 ## Question answering[[question-answering]]
 
@@ -345,11 +323,8 @@ translator("Ce cours est produit par Hugging Face.")
 
 Like with text generation and summarization, you can specify a `max_length` or a `min_length` for the result.
 
-<Tip>
-
-✏️ **Try it out!** Search for translation models in other languages and try to translate the previous sentence into a few different languages.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Search for translation models in other languages and try to translate the previous sentence into a few different languages.
 
 ## Image and audio pipelines
 
diff --git a/chapters/en/chapter1/4.mdx b/chapters/en/chapter1/4.mdx
index 3870b541f..7500f6376 100644
--- a/chapters/en/chapter1/4.mdx
+++ b/chapters/en/chapter1/4.mdx
@@ -7,11 +7,8 @@
 
 In this section, we will take a look at the architecture of Transformer models and dive deeper into the concepts of attention, encoder-decoder architecture, and more.
 
-<Tip warning={true}>
-
-🚀 We're taking things up a notch here. This section is detailed and technical, so don't worry if you don't understand everything right away. We'll come back to these concepts later in the course.
-
-</Tip>
+> [!WARNING]
+> 🚀 We're taking things up a notch here. This section is detailed and technical, so don't worry if you don't understand everything right away. We'll come back to these concepts later in the course.
 
 ## A bit of Transformer history[[a-bit-of-transformer-history]]
 
@@ -34,8 +31,7 @@ The [Transformer architecture](https://arxiv.org/abs/1706.03762) was introduced
 
 - **May 2020**, [GPT-3](https://huggingface.co/papers/2005.14165), an even bigger version of GPT-2 that is able to perform well on a variety of tasks without the need for fine-tuning (called _zero-shot learning_)
 
-- **January 2022**: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better
-This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
+- **January 2022**: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better.
 
 - **January 2023**: [Llama](https://huggingface.co/papers/2302.13971), a large language model that is able to generate text in a variety of languages.
 
@@ -45,6 +41,8 @@ This list is far from comprehensive, and is just meant to highlight a few of the
 
 - **November 2024**: [SmolLM2](https://huggingface.co/papers/2502.02737), a state-of-the-art small language model (135 million to 1.7 billion parameters) that achieves impressive performance despite its compact size, and unlocking new possibilities for mobile and edge devices.
 
+This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
+
 - GPT-like (also called _auto-regressive_ Transformer models)
 - BERT-like (also called _auto-encoding_ Transformer models) 
 - T5-like (also called _sequence-to-sequence_ Transformer models)
diff --git a/chapters/en/chapter1/5.mdx b/chapters/en/chapter1/5.mdx
index 339fe62a4..f00f6643b 100644
--- a/chapters/en/chapter1/5.mdx
+++ b/chapters/en/chapter1/5.mdx
@@ -4,11 +4,8 @@
 
 In [Transformers, what can they do?](/course/chapter1/3), you learned about natural language processing (NLP), speech and audio, computer vision tasks, and some important applications of them. This page will look closely at how models solve these tasks and explain what's happening under the hood. There are many ways to solve a given task, some models may implement certain techniques or even approach the task from a new angle, but for Transformer models, the general idea is the same. Owing to its flexible architecture, most models are a variant of an encoder, a decoder, or an encoder-decoder structure. 
 
-<Tip>
-
-Before diving into specific architectural variants, it's helpful to understand that most tasks follow a similar pattern: input data is processed through a model, and the output is interpreted for a specific task. The differences lie in how the data is prepared, what model architecture variant is used, and how the output is processed.
-
-</Tip>
+> [!TIP]
+> Before diving into specific architectural variants, it's helpful to understand that most tasks follow a similar pattern: input data is processed through a model, and the output is interpreted for a specific task. The differences lie in how the data is prepared, what model architecture variant is used, and how the output is processed.
 
 To explain how tasks are solved, we'll walk through what goes on inside the model to output useful predictions. We'll cover the following models and their corresponding tasks:
 
@@ -21,11 +18,8 @@ To explain how tasks are solved, we'll walk through what goes on inside the mode
 - [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2) for NLP tasks like text generation that use a decoder
 - [BART](https://huggingface.co/docs/transformers/model_doc/bart) for NLP tasks like summarization and translation that use an encoder-decoder
 
-<Tip>
-
-Before you go further, it is good to have some basic knowledge of the original Transformer architecture. Knowing how encoders, decoders, and attention work will aid you in understanding how different Transformer models work. Be sure to check out our [the previous section](https://huggingface.co/course/chapter1/4?fw=pt) for more information! 
-
-</Tip>
+> [!TIP]
+> Before you go further, it is good to have some basic knowledge of the original Transformer architecture. Knowing how encoders, decoders, and attention work will aid you in understanding how different Transformer models work. Be sure to check out [the previous section](https://huggingface.co/course/chapter1/4?fw=pt) for more information!
 
 ## Transformer models for language 
 
@@ -59,11 +53,8 @@ As we covered in the previous section, language models are typically pretrained
 
 In the following sections, we'll explore specific model architectures and how they're applied to various tasks across speech, vision, and text domains.
 
-<Tip>
-
-Understanding which part of the Transformer architecture (encoder, decoder, or both) is best suited for a particular NLP task is key to choosing the right model. Generally, tasks requiring bidirectional context use encoders, tasks generating text use decoders, and tasks converting one sequence to another use encoder-decoders.
-
-</Tip>
+> [!TIP]
+> Understanding which part of the Transformer architecture (encoder, decoder, or both) is best suited for a particular NLP task is key to choosing the right model. Generally, tasks requiring bidirectional context use encoders, tasks generating text use decoders, and tasks converting one sequence to another use encoder-decoders.
 
 ### Text generation
 
@@ -83,9 +74,8 @@ GPT-2's pretraining objective is based entirely on [causal language modeling](ht
 
 Ready to try your hand at text generation? Check out our complete [causal language modeling guide](https://huggingface.co/docs/transformers/tasks/language_modeling#causal-language-modeling) to learn how to finetune DistilGPT-2 and use it for inference!
 
-<Tip>
-For more information about text generation, check out the [text generation strategies](generation_strategies) guide!
-</Tip>
+> [!TIP]
+> For more information about text generation, check out the [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies#generation-strategies) guide!
 
 ### Text classification
 
@@ -121,11 +111,8 @@ To use BERT for question answering, add a span classification head on top of the
 
 Ready to try your hand at question answering? Check out our complete [question answering guide](https://huggingface.co/docs/transformers/tasks/question_answering) to learn how to finetune DistilBERT and use it for inference!
 
-<Tip>
-
-💡 Notice how easy it is to use BERT for different tasks once it's been pretrained. You only need to add a specific head to the pretrained model to manipulate the hidden states into your desired output!
-
-</Tip>
+> [!TIP]
+> 💡 Notice how easy it is to use BERT for different tasks once it's been pretrained. You only need to add a specific head to the pretrained model to manipulate the hidden states into your desired output!
 
 ### Summarization
 
@@ -143,11 +130,8 @@ Encoder-decoder models like [BART](https://huggingface.co/docs/transformers/mode
 
 Ready to try your hand at summarization? Check out our complete [summarization guide](https://huggingface.co/docs/transformers/tasks/summarization) to learn how to finetune T5 and use it for inference!
 
-<Tip>
-
-For more information about text generation, check out the [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies) guide!
-
-</Tip>
+> [!TIP]
+> For more information about text generation, check out the [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies) guide!
 
 ### Translation
 
@@ -158,11 +142,8 @@ BART has since been followed up by a multilingual version, mBART, intended for t
 
 Ready to try your hand at translation? Check out our complete [translation guide](https://huggingface.co/docs/transformers/tasks/translation) to learn how to finetune T5 and use it for inference!
 
-<Tip>
-
-As you've seen throughout this guide, many models follow similar patterns despite addressing different tasks. Understanding these common patterns can help you quickly grasp how new models work and how to adapt existing models to your specific needs.
-
-</Tip>
+> [!TIP]
+> As you've seen throughout this guide, many models follow similar patterns despite addressing different tasks. Understanding these common patterns can help you quickly grasp how new models work and how to adapt existing models to your specific needs.
 
 ## Modalities beyond text
 
@@ -190,11 +171,8 @@ Whisper was pretrained on a massive and diverse dataset of 680,000 hours of labe
 
 Now that Whisper is pretrained, you can use it directly for zero-shot inference or finetune it on your data for improved performance on specific tasks like automatic speech recognition or speech translation!
 
-<Tip>
-
-The key innovation in Whisper is its training on an unprecedented scale of diverse, weakly supervised audio data from the internet. This allows it to generalize remarkably well to different languages, accents, and tasks without task-specific finetuning.
-
-</Tip>
+> [!TIP]
+> The key innovation in Whisper is its training on an unprecedented scale of diverse, weakly supervised audio data from the internet. This allows it to generalize remarkably well to different languages, accents, and tasks without task-specific finetuning.
 
 ### Automatic speech recognition
 
@@ -223,11 +201,8 @@ There are two ways to approach computer vision tasks:
 1. Split an image into a sequence of patches and process them in parallel with a Transformer.
 2. Use a modern CNN, like [ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext), which relies on convolutional layers but adopts modern network designs.
 
-<Tip>
-
-A third approach mixes Transformers with convolutions (for example, [Convolutional Vision Transformer](https://huggingface.co/docs/transformers/model_doc/cvt) or [LeViT](https://huggingface.co/docs/transformers/model_doc/levit)). We won't discuss those because they just combine the two approaches we examine here.  
-
-</Tip>
+> [!TIP]
+> A third approach mixes Transformers with convolutions (for example, [Convolutional Vision Transformer](https://huggingface.co/docs/transformers/model_doc/cvt) or [LeViT](https://huggingface.co/docs/transformers/model_doc/levit)). We won't discuss those because they just combine the two approaches we examine here.
 
 ViT and ConvNeXT are commonly used for image classification, but for other vision tasks like object detection, segmentation, and depth estimation, we'll look at DETR, Mask2Former and GLPN, respectively; these models are better suited for those tasks.
 
@@ -256,8 +231,5 @@ The main change ViT introduced was in how images are fed to a Transformer:
 Ready to try your hand at image classification? Check out our complete [image classification guide](https://huggingface.co/docs/transformers/tasks/image_classification) to learn how to fine-tune ViT and use it for inference!  
 
 
-<Tip>
-
-Notice the parallel between ViT and BERT: both use a special token (<code>[CLS]</code>) to capture the overall representation, both add position information to their embeddings, and both use a Transformer encoder to process the sequence of tokens/patches.  
-
-</Tip>
+> [!TIP]
+> Notice the parallel between ViT and BERT: both use a special token (<code>[CLS]</code>) to capture the overall representation, both add position information to their embeddings, and both use a Transformer encoder to process the sequence of tokens/patches.
diff --git a/chapters/en/chapter1/6.mdx b/chapters/en/chapter1/6.mdx
index e540c8624..049078cf3 100644
--- a/chapters/en/chapter1/6.mdx
+++ b/chapters/en/chapter1/6.mdx
@@ -5,16 +5,13 @@
 
 # Transformer Architectures[[transformer-architectures]]
 
-In the previous sections, we introduced the general Transformer architecture and explored how these models can solve various tasks. Now, let's take a closer look at the three main architectural variants of Transformer models and understand when to use each one. Then, we looked at how those architectures are applied to different language tasks. 
+In the previous sections, we introduced the general Transformer architecture and explored how these models can solve various tasks. Now, let's take a closer look at the three main architectural variants of Transformer models and understand when to use each one. Then, we'll look at how those architectures are applied to different language tasks.
 
 In this section, we're going to dive deeper into the three main architectural variants of Transformer models and understand when to use each one.
 
 
-<Tip>
-
-Remember that most Transformer models use one of three architectures: encoder-only, decoder-only, or encoder-decoder (sequence-to-sequence). Understanding these differences will help you choose the right model for your specific task.
-
-</Tip>
+> [!TIP]
+> Remember that most Transformer models use one of three architectures: encoder-only, decoder-only, or encoder-decoder (sequence-to-sequence). Understanding these differences will help you choose the right model for your specific task.
 
 ## Encoder models[[encoder-models]]
 
@@ -26,11 +23,8 @@ The pretraining of these models usually revolves around somehow corrupting a giv
 
 Encoder models are best suited for tasks requiring an understanding of the full sentence, such as sentence classification, named entity recognition (and more generally word classification), and extractive question answering.
 
-<Tip>
-
-As we saw in [How 🤗 Transformers solve tasks](/chapter1/5), encoder models like BERT excel at understanding text because they can look at the entire context in both directions. This makes them perfect for tasks where comprehension of the whole input is important.
-
-</Tip>
+> [!TIP]
+> As we saw in [How 🤗 Transformers solve tasks](https://huggingface.co/learn/llm-course/chapter1/5), encoder models like BERT excel at understanding text because they can look at the entire context in both directions. This makes them perfect for tasks where comprehension of the whole input is important.
 
 Representatives of this family of models include:
 
@@ -48,11 +42,8 @@ The pretraining of decoder models usually revolves around predicting the next wo
 
 These models are best suited for tasks involving text generation.
 
-<Tip>
-
-Decoder models like GPT are designed to generate text by predicting one token at a time. As we explored in [How 🤗 Transformers solve tasks](/chapter1/5), they can only see previous tokens, which makes them excellent for creative text generation but less ideal for tasks requiring bidirectional understanding.
-
-</Tip>
+> [!TIP]
+> Decoder models like GPT are designed to generate text by predicting one token at a time. As we explored in [How 🤗 Transformers solve tasks](https://huggingface.co/learn/llm-course/chapter1/5), they can only see previous tokens, which makes them excellent for creative text generation but less ideal for tasks requiring bidirectional understanding.
 
 Representatives of this family of models include:
 
@@ -85,7 +76,7 @@ Modern decoder-based LLMs have demonstrated impressive capabilities:
 | Reasoning | Working through problems step by step | Solving math problems or logical puzzles |
 | Few-shot learning | Learning from a few examples in the prompt | Classifying text after seeing just 2-3 examples |
 
-You can experiment with decoder-based LLMs directly in your browser via model repo pages on the Hub. Here's an an example with the classic [GPT-2](https://huggingface.co/openai-community/gpt2) (OpenAI's finest open source model!):
+You can experiment with decoder-based LLMs directly in your browser via model repo pages on the Hub. Here's an example with the classic [GPT-2](https://huggingface.co/openai-community/gpt2) (OpenAI's finest open source model!):
 
 <iframe
 	src="https://huggingface.co/openai-community/gpt2"
@@ -104,11 +95,8 @@ The pretraining of these models can take different forms, but it often involves
 
 Sequence-to-sequence models are best suited for tasks revolving around generating new sentences depending on a given input, such as summarization, translation, or generative question answering.
 
-<Tip>
-
-As we saw in [How 🤗 Transformers solve tasks](/chapter1/5), encoder-decoder models like BART and T5 combine the strengths of both architectures. The encoder provides deep bidirectional understanding of the input, while the decoder generates appropriate output text. This makes them perfect for tasks that transform one sequence into another, like translation or summarization.  
-
-</Tip>
+> [!TIP]
+> As we saw in [How 🤗 Transformers solve tasks](https://huggingface.co/learn/llm-course/chapter1/5), encoder-decoder models like BART and T5 combine the strengths of both architectures. The encoder provides deep bidirectional understanding of the input, while the decoder generates appropriate output text. This makes them perfect for tasks that transform one sequence into another, like translation or summarization.
 
 ### Practical applications
 
@@ -153,17 +141,14 @@ When working on a specific NLP task, how do you decide which architecture to use
 | Question answering (generative) | Encoder-Decoder or Decoder | T5, GPT |
 | Conversational AI | Decoder | GPT, LLaMA |
 
-<Tip>  
-
-When in doubt about which model to use, consider:  
-
-1. What kind of understanding does your task need? (Bidirectional or unidirectional)  
-2. Are you generating new text or analyzing existing text?  
-3. Do you need to transform one sequence into another?  
-
-The answers to these questions will guide you toward the right architecture.  
-
-</Tip> 
+> [!TIP]
+> When in doubt about which model to use, consider:
+>
+> 1. What kind of understanding does your task need? (Bidirectional or unidirectional)
+> 2. Are you generating new text or analyzing existing text?
+> 3. Do you need to transform one sequence into another?
+>
+> The answers to these questions will guide you toward the right architecture.
 
 ## The evolution of LLMs
 
@@ -175,11 +160,8 @@ Most transformer models use full attention in the sense that the attention matri
 computational bottleneck when you have long texts. Longformer and reformer are models that try to be more efficient and
 use a sparse version of the attention matrix to speed up training.
 
-<Tip>
-
-Standard attention mechanisms have a computational complexity of O(n²), where n is the sequence length. This becomes problematic for very long sequences. The specialized attention mechanisms below help address this limitation.
-
-</Tip>
+> [!TIP]
+> Standard attention mechanisms have a computational complexity of O(n²), where n is the sequence length. This becomes problematic for very long sequences. The specialized attention mechanisms below help address this limitation.
 
 ### LSH attention
 
@@ -221,4 +203,4 @@ in E2.
 
 In this section, we've explored the three main Transformer architectures and some specialized attention mechanisms. Understanding these architectural differences is crucial for selecting the right model for your specific NLP task.
 
-As we move forward in the course, you'll get hands-on experience with these different architectures and learn how to fine-tune them for your specific needs. In the next section, we'll look at some of the limitations and biases present in these models that you should be aware of when deploying them.
\ No newline at end of file
+As we move forward in the course, you'll get hands-on experience with these different architectures and learn how to fine-tune them for your specific needs. In the next section, we'll look at some of the limitations and biases present in these models that you should be aware of when deploying them.
diff --git a/chapters/en/chapter1/8.mdx b/chapters/en/chapter1/8.mdx
index a84e76f41..6be1cb515 100644
--- a/chapters/en/chapter1/8.mdx
+++ b/chapters/en/chapter1/8.mdx
@@ -23,11 +23,8 @@ The attention mechanism is what gives LLMs their ability to understand context a
 
 This process of identifying the most relevant words to predict the next token has proven to be incredibly effective. Although the basic principle of training LLMs—predicting the next token—has remained generally consistent since BERT and GPT-2, there have been significant advancements in scaling neural networks and making the attention mechanism work for longer and longer sequences, at lower and lower costs.
 
-<Tip>
-
-In short, the attention mechanism is the key to LLMs being able to generate text that is both coherent and context-aware. It sets modern LLMs apart from previous generations of language models.
-
-</Tip>
+> [!TIP]
+> In short, the attention mechanism is the key to LLMs being able to generate text that is both coherent and context-aware. It sets modern LLMs apart from previous generations of language models.
 
 ### Context Length and Attention Span
 
@@ -42,11 +39,8 @@ These capabilities are limited by several practical factors:
 
 In an ideal world, we could feed unlimited context to the model, but hardware constraints and computational costs make this impractical. This is why different models are designed with different context lengths to balance capability with efficiency.
 
-<Tip>
-
-The context length is the maximum number of tokens the model can consider at once when generating a response.
-
-</Tip>
+> [!TIP]
+> The context length is the maximum number of tokens the model can consider at once when generating a response.
 
 ### The Art of Prompting
 
@@ -54,11 +48,8 @@ When we pass information to LLMs, we structure our input in a way that guides th
 
 Understanding how LLMs process information helps us craft better prompts. Since the model's primary task is to predict the next token by analyzing the importance of each input token, the wording of your input sequence becomes crucial.
 
-<Tip>
-
-Careful design of the prompt makes it easier **to guide the generation of the LLM toward the desired output**.
-
-</Tip>
+> [!TIP]
+> Careful design of the prompt makes it easier **to guide the generation of the LLM toward the desired output**.
 
 ## The Two-Phase Inference Process
 
diff --git a/chapters/en/chapter1/9.mdx b/chapters/en/chapter1/9.mdx
index b5082b85e..13f448a32 100644
--- a/chapters/en/chapter1/9.mdx
+++ b/chapters/en/chapter1/9.mdx
@@ -9,7 +9,7 @@
 
 If your intent is to use a pretrained model or a fine-tuned version in production, please be aware that, while these models are powerful tools, they come with limitations. The biggest of these is that, to enable pretraining on large amounts of data, researchers often scrape all the content they can find, taking the best as well as the worst of what is available on the internet. 
 
-To give a quick illustration, let's go back the example of a `fill-mask` pipeline with the BERT model:
+To give a quick illustration, let's go back to the example of a `fill-mask` pipeline with the BERT model:
 
 ```python
 from transformers import pipeline
diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx
index 0d1913406..e6da9f28d 100644
--- a/chapters/en/chapter11/1.mdx
+++ b/chapters/en/chapter11/1.mdx
@@ -18,9 +18,8 @@ Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by add
 
 Evaluation is a crucial step in the fine-tuning process. It allows us to measure the performance of the model on a task-specific dataset.
 
-<Tip>
-⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend <a href="https://huggingface.co/join">creating an account</a>.
-</Tip>
+> [!TIP]
+> ⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend <a href="https://huggingface.co/join">creating an account</a>.
 
 ## References
 
diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx
index b7be1664e..08b03f700 100644
--- a/chapters/en/chapter11/2.mdx
+++ b/chapters/en/chapter11/2.mdx
@@ -10,13 +10,12 @@
 
 Chat templates are essential for structuring interactions between language models and users. Whether you're building a simple chatbot or a complex AI agent, understanding how to properly format your conversations is crucial for getting the best results from your model. In this guide, we'll explore what chat templates are, why they matter, and how to use them effectively.
 
-<Tip>
-Chat templates are crucial for:
-- Maintaining consistent conversation structure
-- Ensuring proper role identification
-- Managing context across multiple turns
-- Supporting advanced features like tool use
-</Tip>
+> [!TIP]
+> Chat templates are crucial for:
+> - Maintaining consistent conversation structure
+> - Ensuring proper role identification
+> - Managing context across multiple turns
+> - Supporting advanced features like tool use
 
 ## Model Types and Templates
 
@@ -27,9 +26,8 @@ Instruction tuned models are trained to follow a specific conversational structu
 
 To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). Here's a guide on [ChatML](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146).
 
-<Tip warning={true}>
-When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. The easiest way to ensure this is to check the model tokenizer configuration on the Hub. For example, the `SmolLM2-135M-Instruct` model uses <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146">this configuration</a>.  
-</Tip>
+> [!WARNING]
+> When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. The easiest way to ensure this is to check the model tokenizer configuration on the Hub. For example, the `SmolLM2-135M-Instruct` model uses <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146">this configuration</a>.
 
 ### Common Template Formats
 
@@ -143,12 +141,11 @@ Chat templates can handle more complex scenarios beyond just conversational inte
 3. **Function Calling**: For structured function execution
 4. **Multi-turn Context**: For maintaining conversation history
 
-<Tip>
-When implementing advanced features:
-- Test thoroughly with your specific model. Vision and tool use template are particularly diverse.
-- Monitor token usage carefully between each feature and model.
-- Document the expected format for each feature
-</Tip>
+> [!TIP]
+> When implementing advanced features:
+> - Test thoroughly with your specific model. Vision and tool-use templates are particularly diverse.
+> - Monitor token usage carefully across features and models.
+> - Document the expected format for each feature.
 
 For multimodal conversations, chat templates can include image references or base64-encoded images:
 
@@ -208,44 +205,42 @@ When working with chat templates, follow these key practices:
 4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs
 5. **Validation**: Validate message structure before sending to the model
 
-<Tip warning={true}>
-Common pitfalls to avoid:
-- Mixing different template formats in the same application
-- Exceeding token limits with long conversation histories
-- Not properly escaping special characters in messages
-- Forgetting to validate input message structure
-- Ignoring model-specific template requirements
-</Tip>
+> [!WARNING]
+> Common pitfalls to avoid:
+> - Mixing different template formats in the same application
+> - Exceeding token limits with long conversation histories
+> - Not properly escaping special characters in messages
+> - Forgetting to validate input message structure
+> - Ignoring model-specific template requirements
 
 ## Hands-on Exercise
 
 Let's practice implementing chat templates with a real-world example.
 
-<Tip>
-Follow these steps to convert the `HuggingFaceTB/smoltalk` dataset into chatml format:
-
-1. Load the dataset:
-```python
-from datasets import load_dataset
-
-dataset = load_dataset("HuggingFaceTB/smoltalk")
-```
-
-2. Create a processing function:
-```python
-def convert_to_chatml(example):
-    return {
-        "messages": [
-            {"role": "user", "content": example["input"]},
-            {"role": "assistant", "content": example["output"]},
-        ]
-    }
-```
-
-3. Apply the chat template using your chosen model's tokenizer
-
-Remember to validate your output format matches your target model's requirements!
-</Tip>
+> [!TIP]
+> Follow these steps to convert the `HuggingFaceTB/smoltalk` dataset into chatml format:
+>
+> 1. Load the dataset:
+> ```python
+> from datasets import load_dataset
+>
+> dataset = load_dataset("HuggingFaceTB/smoltalk")
+> ```
+>
+> 2. Create a processing function:
+> ```python
+> def convert_to_chatml(example):
+>     return {
+>         "messages": [
+>             {"role": "user", "content": example["input"]},
+>             {"role": "assistant", "content": example["output"]},
+>         ]
+>     }
+> ```
+>
+> 3. Apply the chat template using your chosen model's tokenizer
+>
+> Remember to validate your output format matches your target model's requirements!
 
 ## Additional Resources
 
diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx
index 6acdfdb79..2967f7bbc 100644
--- a/chapters/en/chapter11/3.mdx
+++ b/chapters/en/chapter11/3.mdx
@@ -14,12 +14,11 @@ This page provides a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSee
 
 Before diving into implementation, it's important to understand when SFT is the right choice for your project. As a first step, you should consider whether using an existing instruction-tuned model with well-crafted prompts would suffice for your use case. SFT involves significant computational resources and engineering effort, so it should only be pursued when prompting existing models proves insufficient.
 
-<Tip>
-Consider SFT only if you:
-- Need additional performance beyond what prompting can achieve
-- Have a specific use case where the cost of using a large general-purpose model outweighs the cost of fine-tuning a smaller model
-- Require specialized output formats or domain-specific knowledge that existing models struggle with
-</Tip>
+> [!TIP]
+> Consider SFT only if you:
+> - Need additional performance beyond what prompting can achieve
+> - Have a specific use case where the cost of using a large general-purpose model outweighs the cost of fine-tuning a smaller model
+> - Require specialized output formats or domain-specific knowledge that existing models struggle with
 
 If you determine that SFT is necessary, the decision to proceed depends on two primary factors:
 
@@ -36,15 +35,14 @@ When working in specialized domains, SFT helps align the model with domain-speci
 3. Handling technical queries appropriately
 4. Following industry-specific guidelines
 
-<Tip>
-Before starting SFT, evaluate whether your use case requires:
-- Precise output formatting
-- Domain-specific knowledge
-- Consistent response patterns
-- Adherence to specific guidelines
-
-This evaluation will help determine if SFT is the right approach for your needs.
-</Tip>
+> [!TIP]
+> Before starting SFT, evaluate whether your use case requires:
+> - Precise output formatting
+> - Domain-specific knowledge
+> - Consistent response patterns
+> - Adherence to specific guidelines
+>
+> This evaluation will help determine if SFT is the right approach for your needs.
 
 ## Dataset Preparation
 
@@ -88,13 +86,12 @@ The SFTTrainer configuration requires consideration of several parameters that c
    - `eval_steps`: How often to evaluate on validation data
    - `save_steps`: Frequency of model checkpoint saves
 
-<Tip>
-Start with conservative values and adjust based on monitoring:
-- Begin with 1-3 epochs
-- Use smaller batch sizes initially
-- Monitor validation metrics closely
-- Adjust learning rate if training is unstable
-</Tip>
+> [!TIP]
+> Start with conservative values and adjust based on monitoring:
+> - Begin with 1-3 epochs
+> - Use smaller batch sizes initially
+> - Monitor validation metrics closely
+> - Adjust learning rate if training is unstable
 
 ## Implementation with TRL
 
@@ -145,9 +142,8 @@ trainer = SFTTrainer(
 trainer.train()
 ```
 
-<Tip>
-When using a dataset with a "messages" field (like the example above), the SFTTrainer automatically applies the model's chat template, which it retrieves from the hub. This means you don't need any additional configuration to handle chat-style conversations - the trainer will format the messages according to the model's expected template format.
-</Tip>
+> [!TIP]
+> When using a dataset with a "messages" field (like the example above), the SFTTrainer automatically applies the model's chat template, which it retrieves from the hub. This means you don't need any additional configuration to handle chat-style conversations - the trainer will format the messages according to the model's expected template format.
 
 ## Packing the Dataset
 
@@ -201,13 +197,12 @@ Effective monitoring involves tracking quantitative metrics, and evaluating qual
 - Learning rate progression
 - Gradient norms
 
-<Tip warning={true}>
-Watch for these warning signs during training:
-1. Validation loss increasing while training loss decreases (overfitting)
-2. No significant improvement in loss values (underfitting)
-3. Extremely low loss values (potential memorization)
-4. Inconsistent output formatting (template learning issues)
-</Tip>
+> [!WARNING]
+> Watch for these warning signs during training:
+> 1. Validation loss increasing while training loss decreases (overfitting)
+> 2. No significant improvement in loss values (underfitting)
+> 3. Extremely low loss values (potential memorization)
+> 4. Inconsistent output formatting (template learning issues)
 
 ### The Path to Convergence
 
@@ -243,9 +238,8 @@ Extremely low loss values could suggest memorization rather than learning. This
 - The outputs lack diversity
 - The responses are too similar to training examples
 
-<Tip warning={true}>
-Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss.
-</Tip>
+> [!WARNING]
+> Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss.
 
 We should note that the interpretation of the loss values we outline here is aimed on the most common case, and in fact, loss values can behave on various ways depending on the model, the dataset, the training parameters, etc. If you interested in exploring more about outlined patterns, you should check out this blog post by the people at [Fast AI](https://www.fast.ai/posts/2023-09-04-learning-jumps/).
 
@@ -260,14 +254,13 @@ After completing SFT, consider these follow-up actions:
 3. Test domain-specific knowledge retention
 4. Monitor real-world performance metrics
 
-<Tip>
-Document your training process, including:
-- Dataset characteristics
-- Training parameters
-- Performance metrics
-- Known limitations
-This documentation will be valuable for future model iterations.
-</Tip>
+> [!TIP]
+> Document your training process, including:
+> - Dataset characteristics
+> - Training parameters
+> - Performance metrics
+> - Known limitations
+>
+> This documentation will be valuable for future model iterations.
 
 ## Quiz
 
diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx
index 2470d0634..a3e977ef3 100644
--- a/chapters/en/chapter11/4.mdx
+++ b/chapters/en/chapter11/4.mdx
@@ -67,9 +67,8 @@ Let's walk through the LoRA configuration and key parameters.
 | `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. |
 | `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. |
 
-<Tip>
-When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key.
-</Tip>
+> [!TIP]
+> When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key.
 
 ## Using TRL with PEFT
 
@@ -112,11 +111,8 @@ trainer = SFTTrainer(
 )
 ```
 
-<Tip>
-
-✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
 
 ## Merging LoRA Adapters
 
@@ -158,11 +154,8 @@ merged_model.save_pretrained("path/to/save/merged_model")
 tokenizer.save_pretrained("path/to/save/merged_model")
 ```
 
-<Tip>
-
-✏️ **Try it out!** Merge the adapter weights back into the base model. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Merge the adapter weights back into the base model. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
 
 
 # Resources
diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx
index 27462e20e..479521698 100644
--- a/chapters/en/chapter11/5.mdx
+++ b/chapters/en/chapter11/5.mdx
@@ -120,11 +120,8 @@ Results are displayed in a tabular format showing:
 
 Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information.
 
-<Tip>
-
-✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval.
 
 # End-of-chapter quiz[[end-of-chapter-quiz]]
 
diff --git a/chapters/en/chapter12/1.mdx b/chapters/en/chapter12/1.mdx
index a74d35134..773f2c808 100644
--- a/chapters/en/chapter12/1.mdx
+++ b/chapters/en/chapter12/1.mdx
@@ -12,9 +12,9 @@ LLMs have shown excellent performance on many generative tasks. However, up unti
 
 Open R1 is a project that aims to make LLMs reason on complex problems. It does this by using reinforcement learning to encourage LLMs to 'think' and reason. 
 
-In simple terms, the model is train to generate thoughts as well as outputs, and to structure these thoughts and outputs so that they can be handled separately by the user. 
+In simple terms, the model is trained to generate thoughts as well as outputs, and to structure these thoughts and outputs so that they can be handled separately by the user. 
 
-Let's take a look at an example. A we gave ourself the task of solving the following problem, we might think like this:
+Let's take a look at an example. If we gave ourselves the task of solving the following problem, we might think like this:
 
 ```sh
 Problem: "I have 3 apples and 2 oranges. How many pieces of fruit do I have in total?"
@@ -77,16 +77,13 @@ To get the most out of this chapter, it's helpful to have:
 
 Don't worry if you're missing some of these – we'll explain key concepts as we go along! 🚀
 
-<Tip>
-
-If you don't have all the prerequisites, check out this [course](/course/chapter1/1) from units 1 to 11
-
-</Tip>
+> [!TIP]
+> If you don't have all the prerequisites, check out this [course](/course/chapter1/1) from units 1 to 11
 
 ## How to Use This Chapter
 
 1. **Read Sequentially**: The sections build on each other, so it's best to read them in order
-2. **Share Notes**: Write down key concepts and questions and discuss them with in the community in [Discord](https://discord.gg/F3vZujJH)
+2. **Share Notes**: Write down key concepts and questions and discuss them with the community on [Discord](https://discord.gg/UrrTSsSyjb)
 3. **Try the Code**: When we get to practical examples, try them yourself
 4. **Join the Community**: Use the resources we provide to connect with other learners
 
diff --git a/chapters/en/chapter12/2.mdx b/chapters/en/chapter12/2.mdx
index 64db11635..0d4743967 100644
--- a/chapters/en/chapter12/2.mdx
+++ b/chapters/en/chapter12/2.mdx
@@ -4,11 +4,8 @@ Welcome to the first page!
 
 We're going to start our journey into the exciting world of Reinforcement Learning (RL) and discover how it's revolutionizing the way we train Language Models like the ones you might use every day.
 
-<Tip>
-
-In this chapter, we are focusing on reinforcement learning for language models. However, reinforcement learning is a broad field with many applications beyond language models. If you're interested in learning more about reinforcement learning, you should check out the [Deep Reinforcement Learning course](https://huggingface.co/courses/deep-rl-course/en/unit1/introduction).
-
-</Tip>
+> [!TIP]
+> In this chapter, we are focusing on reinforcement learning for language models. However, reinforcement learning is a broad field with many applications beyond language models. If you're interested in learning more about reinforcement learning, you should check out the [Deep Reinforcement Learning course](https://huggingface.co/courses/deep-rl-course/en/unit1/introduction).
 
 This page will give you a friendly and clear introduction to RL, even if you've never encountered it before. We'll break down the core ideas and see why RL is becoming so important in the field of Large Language Models (LLMs).
 
@@ -119,14 +116,11 @@ Proximal Policy Optimization (PPO) was one of the first highly effective techniq
 
 Direct Preference Optimization (DPO) was later developed as a simpler technique that eliminates the need for a separate reward model using preference data directly. Essentially, framing the problem as a classification task between the chosen and rejected responses.
 
-<Tip>
-
-DPO and PPO are complex reinforcement learning algorithms in their own right, which we will not cover in this course. If you're interested in learning more about them, you can check out the following resources:
-
-- [Proximal Policy Optimization](https://huggingface.co/docs/trl/main/en/ppo_trainer)
-- [Direct Preference Optimization](https://huggingface.co/docs/trl/main/en/dpo_trainer)
-
-</Tip>
+> [!TIP]
+> DPO and PPO are complex reinforcement learning algorithms in their own right, which we will not cover in this course. If you're interested in learning more about them, you can check out the following resources:
+>
+> - [Proximal Policy Optimization](https://huggingface.co/docs/trl/main/en/ppo_trainer)
+> - [Direct Preference Optimization](https://huggingface.co/docs/trl/main/en/dpo_trainer)
 
 Unlike DPO and PPO, GRPO groups similar samples together and compares them as a group. The group-based approach provides more stable gradients and better convergence properties compared to other methods.
 
diff --git a/chapters/en/chapter12/3.mdx b/chapters/en/chapter12/3.mdx
index a42762dbd..fd9d5583d 100644
--- a/chapters/en/chapter12/3.mdx
+++ b/chapters/en/chapter12/3.mdx
@@ -10,11 +10,8 @@ In the next chapter, we will build on this knowledge and implement GRPO in pract
 
 The initial goal of the paper was to explore whether pure reinforcement learning could develop reasoning capabilities without supervised fine-tuning. 
 
-<Tip>
-
-Up until that point, all the popular LLMs required some supervised fine-tuning, which we explored in [chapter 11](/course/chapter11/1).
-
-</Tip>
+> [!TIP]
+> Up until that point, all the popular LLMs required some supervised fine-tuning, which we explored in [chapter 11](/course/chapter11/1).
 
 ## The Breakthrough 'Aha' Moment
 
@@ -157,13 +154,10 @@ This approach proves more stable than traditional methods because:
 - The group-based normalization helps prevent issues with reward scaling
 - The KL penalty acts like a safety net, ensuring the model doesn't forget what it already knows while learning new things
 
-<Tip>
-
-GRPO's key innovations are:
-- Learning directly from any function or model, eliminating the reliance on a separate reward model.
-- Group-based learning, which is more stable and efficient than traditional methods like pairwise comparisons.
-
-</Tip>
+> [!TIP]
+> GRPO's key innovations are:
+> - Learning directly from any function or model, eliminating the reliance on a separate reward model.
+> - Group-based learning, which is more stable and efficient than traditional methods like pairwise comparisons.
 
 This breakdown is complex, but the key takeaway is that GRPO is a more efficient and stable way to train a model to reason. 
 
diff --git a/chapters/en/chapter12/3a.mdx b/chapters/en/chapter12/3a.mdx
index b4effb193..a849c0b9d 100644
--- a/chapters/en/chapter12/3a.mdx
+++ b/chapters/en/chapter12/3a.mdx
@@ -1,20 +1,17 @@
 # Advanced Understanding of Group Relative Policy Optimization (GRPO) in DeepSeekMath
 
-<Tip>
-
-This section dives into the technical and mathematical details of GRPO. It was authored by [Shirin Yamani](https://github.com/shirinyamani).
-
-</Tip>
+> [!TIP]
+> This section dives into the technical and mathematical details of GRPO. It was authored by <a href="https://huggingface.com/shirinyamani" target="_blank">Shirin Yamani</a>.
 
 Let's deepen our understanding of GRPO so that we can improve our model's training process.
 
 GRPO directly evaluates the model-generated responses by comparing them within groups of generation to optimize policy model, instead of training a separate value model (Critic). This approach leads to significant reduction in computational cost!
 
-GRPO can be applied to any verifiable task where the correctness of the response can be determined. For instance, in math reasoning, the correctness of the response can be easily verified by comparing it to the ground truth. 
+GRPO can be applied to any verifiable task where the correctness of the response can be determined. For instance, in math reasoning, the correctness of the response can be easily verified by comparing it to the ground truth.
 
 Before diving into the technical details, let's visualize how GRPO works at a high level:
 
-![deep](./img/2.jpg)
+![deep](https://huggingface.co/reasoning-course/images/resolve/main/grpo/16.png)
 
 Now that we have a visual overview, let's break down how GRPO works step by step.
 
@@ -28,14 +25,19 @@ Let's walk through each step of the algorithm in detail:
 
 The first step is to generate multiple possible answers for each question. This creates a diverse set of outputs that can be compared against each other.
 
-For each question $q$, the model will generate $G$ outputs (group size) from the trained policy:{ ${o_1, o_2, o_3, \dots, o_G}\pi_{\theta_{\text{old}}}$ }, $G=8$ where each $o_i$ represents one completion from the model.
+For each question \\( q \\), the model will generate \\( G \\) outputs (the group size) from the trained policy \\( \pi_{\theta_{\text{old}}} \\): \\( \{o_1, o_2, o_3, \dots, o_G\} \sim \pi_{\theta_{\text{old}}} \\). Here \\( G=8 \\), and each \\( o_i \\) represents one completion from the model.
 
-#### Example:
+#### Example
 
 To make this concrete, let's look at a simple arithmetic problem:
 
-- **Question** $q$ : $\text{Calculate}\space2 + 2 \times 6$
-- **Outputs** $(G = 8)$: $\{o_1:14 \text{ (correct)}, o_2:16 \text{ (wrong)}, o_3:10 \text{ (wrong)}, \ldots, o_8:14 \text{ (correct)}\}$
+**Question** 
+
+\\( q \\) : \\( \text{Calculate}\space2 + 2 \times 6 \\)
+
+**Outputs** 
+
+\\( (G = 8) \\): \\( \{o_1:14 \text{ (correct)}, o_2:16 \text{ (wrong)}, o_3:10 \text{ (wrong)}, \ldots, o_8:14 \text{ (correct)}\} \\)
 
 Notice how some of the generated answers are correct (14) while others are wrong (16 or 10). This diversity is crucial for the next step.
 
@@ -43,34 +45,36 @@ Notice how some of the generated answers are correct (14) while others are wrong
 
 Once we have multiple responses, we need a way to determine which ones are better than others. This is where the advantage calculation comes in.
 
-#### Reward Distribution:
+#### Reward Distribution
 
 First, we assign a reward score to each generated response. In this example, we'll use a reward model, but as we learnt in the previous section, we can use any reward returning function.
 
-Assign a RM score to each of the generated responses based on the correctness $r_i$ *(e.g. 1 for correct response, 0 for wrong response)* then for each of the $r_i$ calculate the following Advantage value 
+Assign an RM score \\( r_i \\) to each generated response based on its correctness *(e.g. 1 for a correct response, 0 for a wrong response)*, then for each \\( r_i \\) calculate the following advantage value.
 
-#### Advantage Value Formula:
+#### Advantage Value Formula
 
 The key insight of GRPO is that we don't need absolute measures of quality - we can compare outputs within the same group. This is done using standardization:
 
 $$A_i = \frac{r_i - \text{mean}(\{r_1, r_2, \ldots, r_G\})}{\text{std}(\{r_1, r_2, \ldots, r_G\})}$$
 
-#### Example:
+#### Example
 
 Continuing with our arithmetic example for the same example above, imagine we have 8 responses, 4 of which is correct and the rest wrong, therefore;
-- Group Average: $mean(r_i) = 0.5$
-- Std: $std(r_i) = 0.53$
-- Advantage Value:
-	- Correct response: $A_i = \frac{1 - 0.5}{0.53}= 0.94$
-	- Wrong response: $A_i = \frac{0 - 0.5}{0.53}= -0.94$
 
-#### Interpretation:  
+| Metric | Value |
+|--------|-------|
+| Group Average | \\( mean(r_i) = 0.5 \\) |
+| Standard Deviation | \\( std(r_i) = 0.53 \\) |
+| Advantage Value (Correct response) | \\( A_i = \frac{1 - 0.5}{0.53}= 0.94 \\) |
+| Advantage Value (Wrong response) | \\( A_i = \frac{0 - 0.5}{0.53}= -0.94 \\) |
+
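+The value 0.53 follows if the standard deviation is taken to be the sample standard deviation (dividing by \\( G - 1 \\)). This is an assumption inferred from the numbers, since the population version (dividing by \\( G \\)) would give exactly 0.5. A worked sketch for four rewards of 1 and four rewards of 0:
+
+$$\text{std}(r_i) = \sqrt{\frac{1}{G-1}\sum_{i=1}^{G}\left(r_i - 0.5\right)^2} = \sqrt{\frac{8 \times 0.25}{7}} \approx 0.53$$
+
+Because the advantages are centred on the group mean, they sum to zero across the group:
+
+$$\sum_{i=1}^{G} A_i = 4 \times 0.94 + 4 \times (-0.94) = 0$$
+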
+#### Interpretation
 
 Now that we have calculated the advantage values, let's understand what they mean:
 
-This standardization (i.e. $A_i$ weighting) allows the model to assess each response's relative performance, guiding the optimization process to favour responses that are better than average (high reward) and discourage those that are worse.  For instance if $A_i > 0$, then the $o_i$ is better response than the average level within its group; and if $A_i < 0$, then the $o_i$ then the quality of the response is less than the average (i.e. poor quality/performance).
+This standardization (i.e. \\( A_i \\) weighting) allows the model to assess each response's relative performance, guiding the optimization process to favour responses that are better than average (high reward) and discourage those that are worse. For instance, if \\( A_i > 0 \\), then \\( o_i \\) is a better response than the average within its group; and if \\( A_i < 0 \\), then the quality of \\( o_i \\) is below the average (i.e. poor quality/performance).
 
-For the example above, if $A_i = 0.94 \text{(correct output)}$ then during optimization steps its generation probability will be increased. 
+For the example above, if \\( A_i = 0.94 \text{(correct output)} \\) then during optimization steps its generation probability will be increased.
 
 With our advantage values calculated, we're now ready to update the policy.
 
@@ -80,7 +84,7 @@ The final step is to use these advantage values to update our model so that it b
 
 The target function for policy update is:
 
-$$J_{GRPO}(\theta) = \left[\frac{1}{G} \sum_{i=1}^{G} \min \left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon \right) A_i \right)\right]- \beta D_{KL}(\pi_{\theta} || \pi_{ref})$$
+$$J_{GRPO}(\theta) = \left[\frac{1}{G} \sum_{i=1}^{G} \min \left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i, \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon \right) A_i \right)\right]- \beta D_{KL}(\pi_{\theta} \|\| \pi_{ref})$$
 
 This formula might look intimidating at first, but it's built from several components that each serve an important purpose. Let's break them down one by one.
 
@@ -92,13 +96,14 @@ The GRPO update function combines several techniques to ensure stable and effect
 
 The probability ratio is defined as:
 
-$\left(\frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}\right)$ 
+\\( \left(\frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}\right) \\) 
 
 Intuitively, the formula compares how much the new model's response probability differs from the old model's response probability while incorporating a preference for responses that improve the expected outcome.
 
-#### Interpretation:
-- If $\text{ratio} > 1$, the new model assigns a higher probability to response $o_i$​ than the old model.
-- If $\text{ratio} < 1$, the new model assigns a lower probability to $o_i$​ 
+#### Interpretation
+
+- If  \\( \text{ratio} > 1 \\), the new model assigns a higher probability to response \\( o_i \\) than the old model.
+- If  \\( \text{ratio} < 1 \\), the new model assigns a lower probability to \\( o_i \\) 
 
 This ratio allows us to control how much the model changes at each step, which leads us to the next component.
 
@@ -106,22 +111,25 @@ This ratio allows us to control how much the model changes at each step, which l
 
 The clipping function is defined as:
 
-$\text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon\right)$ 
+\\( \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon\right) \\) 
+
+This limits the ratio discussed above to the range \\( [1 - \epsilon, 1 + \epsilon] \\), which prevents drastic updates that step too far from the old policy. In other words, it limits how much the probability ratio can change, which helps maintain stability by avoiding updates that push the new model too far from the old one.
 
-Limit the ratio discussed above to be within $[1 - \epsilon, 1 + \epsilon]$ to avoid/control drastic changes or crazy updates and stepping too far off from the old policy. In other words, it limit how much the probability ratio can increase to help maintaining stability by avoiding updates that push the new model too far from the old one.
+#### Example (ε = 0.2)
 
-#### Example $\space \text{suppose}(\epsilon = 0.2)$
 Let's look at two different scenarios to better understand this clipping function:
 
 - **Case 1**: if the new policy has a probability of 0.9 for a specific response and the old policy has a probabiliy of 0.5, it means this response is getting reinforeced by the new policy to have higher probability, but within a controlled limit which is the clipping to tight up its hands to not get drastic 
-	- $\text{Ratio}: \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} = \frac{0.9}{0.5} = 1.8  → \text{Clip}\space1.2$ (upper bound limit 1.2) 
+	- \\( \text{Ratio}: \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} = \frac{0.9}{0.5} = 1.8  → \text{Clip}\space1.2 \\) (upper bound limit 1.2) 
 - **Case 2**: If the new policy is not in favour of a response (lower probability e.g. 0.2), meaning if the response is not beneficial the increase might be incorrect, and the model would be penalized.
-	- $\text{Ratio}: \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} = \frac{0.2}{0.5} = 0.4  →\text{Clip}\space0.8$ (lower bound limit 0.8)
-#### Interpretation:
+	- \\( \text{Ratio}: \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} = \frac{0.2}{0.5} = 0.4  →\text{Clip}\space0.8 \\) (lower bound limit 0.8)
+
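+To see how the clipped ratio enters the objective, the two cases can be plugged into the \\( \min(\cdot) \\) term of \\( J_{GRPO} \\). This is an illustrative sketch that assumes Case 1 is a correct response with \\( A_i = 0.94 \\) and Case 2 a wrong response with \\( A_i = -0.94 \\); that pairing is chosen here for illustration and is not stated in the cases themselves.
+
+$$\text{Case 1: } \min(1.8 \times 0.94,\ 1.2 \times 0.94) = \min(1.692,\ 1.128) = 1.128$$
+
+$$\text{Case 2: } \min(0.4 \times (-0.94),\ 0.8 \times (-0.94)) = \min(-0.376,\ -0.752) = -0.752$$
+
+Under these assumptions the clipped term is the one selected in both cases, so pushing the ratio further past the bound earns no additional objective value.
+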
+#### Interpretation
+
 - The formula encourages the new model to favour responses that the old model underweighted **if they improve the outcome**.
-- If the old model already favoured a response with a high probability, the new model can still reinforce it **but only within a controlled limit $[1 - \epsilon, 1 + \epsilon]$, $\text{(e.g., }\epsilon = 0.2, \space \text{so} \space [0.8-1.2])$**.
+- If the old model already favoured a response with a high probability, the new model can still reinforce it **but only within a controlled limit \\( [1 - \epsilon, 1 + \epsilon] \\), \\( \text{(e.g., }\epsilon = 0.2, \space \text{so} \space [0.8-1.2]) \\)**.
 - If the old model overestimated a response that performs poorly, the new model is **discouraged** from maintaining that high probability.
-- Therefore, intuitively, By incorporating the probability ratio, the objective function ensures that updates to the policy are proportional to the advantage $A_i$ while being moderated to prevent drastic changes. T
+- Therefore, intuitively, by incorporating the probability ratio, the objective function ensures that updates to the policy are proportional to the advantage \\( A_i \\) while being moderated to prevent drastic changes.
 
 While the clipping function helps prevent drastic changes, we need one more safeguard to ensure our model doesn't deviate too far from its original behavior.
 
@@ -129,33 +137,35 @@ While the clipping function helps prevent drastic changes, we need one more safe
 
 The KL divergence term is:
 
-$\beta D_{KL}(\pi_{\theta} || \pi_{ref})$
+\\( \beta D_{KL}(\pi_{\theta} \|\| \pi_{ref}) \\)
 
-In the KL divergence term, the $\pi_{ref}$ is basically the pre-update model's output, `per_token_logps` and $\pi_{\theta}$ is the new model's output, `new_per_token_logps`. Theoretically, KL divergence is minimized to prevent the model from deviating too far from its original behavior during optimization. This helps strike a balance between improving performance based on the reward signal and maintaining coherence. In this context, minimizing KL divergence reduces the risk of the model generating nonsensical text or, in the case of mathematical reasoning, producing extremely incorrect answers.
+In the KL divergence term, the \\( \pi_{ref} \\) is basically the pre-update model's output, `per_token_logps` and \\( \pi_{\theta} \\) is the new model's output, `new_per_token_logps`. Theoretically, KL divergence is minimized to prevent the model from deviating too far from its original behavior during optimization. This helps strike a balance between improving performance based on the reward signal and maintaining coherence. In this context, minimizing KL divergence reduces the risk of the model generating nonsensical text or, in the case of mathematical reasoning, producing extremely incorrect answers.
 
 #### Interpretation
+
 - A KL divergence penalty keeps the model's outputs close to its original distribution, preventing extreme shifts.
 - Instead of drifting towards completely irrational outputs, the model would refine its understanding while still allowing some exploration
 
 #### Math Definition
+
 For those interested in the mathematical details, let's look at the formal definition:
 
 Recall that KL distance is defined as follows:
-$$D_{KL}(P || Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$
+$$D_{KL}(P \|\| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$
 In RLHF, the two distributions of interest are often the distribution of the new model version, P(x), and a distribution of the reference policy, Q(x).
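+
+As a small numerical sketch of this definition, take two hypothetical two-outcome distributions \\( P = (0.5, 0.5) \\) and \\( Q = (0.25, 0.75) \\). These values are made up for illustration, and the natural logarithm is assumed:
+
+$$D_{KL}(P \|\| Q) = 0.5 \ln\frac{0.5}{0.25} + 0.5 \ln\frac{0.5}{0.75} \approx 0.347 - 0.203 = 0.144$$
+
+The divergence is zero only when the two distributions are identical, and it grows as \\( P \\) moves away from \\( Q \\).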
 
-#### The Role of $\beta$ Parameter
+#### The Role of β Parameter
 
-The coefficient $\beta$ controls how strongly we enforce the KL divergence constraint:
+The coefficient \\( \beta \\) controls how strongly we enforce the KL divergence constraint:
 
--  **Higher $\beta$ (Stronger KL Penalty)**
+-  **Higher β (Stronger KL Penalty)**
     - More constraint on policy updates. The model remains close to its reference distribution.
     - Can slow down adaptation: The model may struggle to explore better responses.
-- **Lower $\beta$ (Weaker KL Penalty)**
+- **Lower β (Weaker KL Penalty)**
     - More freedom to update policy: The model can deviate more from the reference.
     - Faster adaptation but risk of instability: The model might learn reward-hacking behaviors.
 	- Over-optimization risk: If the reward model is flawed, the policy might generate nonsensical outputs.
-- **Original** [DeepSeekMath](https://arxiv.org/abs/2402.03300) paper set this $\beta= 0.04$
+- The **original** [DeepSeekMath](https://arxiv.org/abs/2402.03300) paper set \\( \beta = 0.04 \\).
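
As a rough illustration of this trade-off, the sketch below subtracts a β-weighted KL penalty from a reward-driven term. The function name and the numeric values are hypothetical; only \\( \beta = 0.04 \\) is taken from the text above.

```python
def penalized_objective(surrogate, kl, beta):
    """Reward-driven term minus the beta-weighted KL penalty."""
    return surrogate - beta * kl


# Hypothetical values for one update step (not taken from a real run).
surrogate, kl = 1.128, 0.085

weak_penalty = penalized_objective(surrogate, kl, beta=0.04)   # DeepSeekMath setting
strong_penalty = penalized_objective(surrogate, kl, beta=1.0)  # illustrative large beta

print(weak_penalty, strong_penalty)
```

For the same amount of drift from the reference, the larger β removes more of the objective, so the optimizer has less incentive to move away from the reference policy.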
 
 Now that we understand the components of GRPO, let's see how they work together in a complete example.
 
@@ -169,9 +179,9 @@ $$\text{Q: Calculate}\space2 + 2 \times 6$$
 
 ### Step 1: Group Sampling
 
-First, we generate multiple responses from our model:
+First, we generate multiple responses from our model.
 
-Generate $(G = 8)$ responses, $4$ of which are correct answer ($14, \text{reward=} 1$) and $4$ incorrect $\text{(reward= 0)}$, Therefore:
+Generate \\( G = 8 \\) responses, \\( 4 \\) of which give the correct answer (\\( 14 \\), \\( \text{reward} = 1 \\)) and \\( 4 \\) of which are incorrect (\\( \text{reward} = 0 \\)). Therefore:
 
 $${o_1:14(correct), o_2:10 (wrong), o_3:16 (wrong), ... o_G:14(correct)}$$
 
@@ -179,20 +189,20 @@ $${o_1:14(correct), o_2:10 (wrong), o_3:16 (wrong), ... o_G:14(correct)}$$
 
 Next, we calculate the advantage values to determine which responses are better than average:
 
-- Group Average: 
-$$mean(r_i) = 0.5$$
-- Std: $$std(r_i) = 0.53$$
-- Advantage Value:
-	- Correct response: $A_i = \frac{1 - 0.5}{0.53}= 0.94$
-	- Wrong response: $A_i = \frac{0 - 0.5}{0.53}= -0.94$
+| Statistic | Value |
+|-----------|-------|
+| Group Average | \\( mean(r_i) = 0.5 \\) |
+| Standard Deviation | \\( std(r_i) = 0.53 \\) |
+| Advantage Value (Correct response) | \\( A_i = \frac{1 - 0.5}{0.53}= 0.94 \\) |
+| Advantage Value (Wrong response) | \\( A_i = \frac{0 - 0.5}{0.53}= -0.94 \\) |
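
The numbers in the table can be reproduced with a few lines of Python. This is a sketch under two assumptions: the order of the middle rewards is made up, and the standard deviation is the sample version (dividing by \\( n - 1 \\)), which is what gives roughly 0.53 here.

```python
import statistics

# 4 correct (reward 1) and 4 wrong (reward 0) responses; the order of the
# middle entries is hypothetical.
rewards = [1, 0, 0, 1, 1, 0, 0, 1]

mean_r = statistics.mean(rewards)
std_r = statistics.stdev(rewards)  # sample standard deviation (n - 1)

# Group-relative advantage: how far each reward sits from the group mean,
# measured in standard deviations.
advantages = [(r - mean_r) / std_r for r in rewards]

print(f"mean={mean_r:.2f} std={std_r:.2f}")
print([round(a, 3) for a in advantages])
```

With the unrounded standard deviation the advantages come out near ±0.935; the table's ±0.94 follows from dividing by the rounded value 0.53. Using the population standard deviation instead would give exactly 0.5 and advantages of ±1.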
 
 ### Step 3: Policy Update
 
 Finally, we update our model to reinforce the correct responses:
 
-- Assuming the probability of old policy ($\pi_{\theta_{old}}$) for a correct output $o_1$ is $0.5$ and the new policy increases it to $0.7$ then:
+- Assume the old policy (\\( \pi_{\theta_{old}} \\)) assigns probability \\( 0.5 \\) to a correct output \\( o_1 \\) and the new policy increases it to \\( 0.7 \\). Then:
 $$\text{Ratio}: \frac{0.7}{0.5} = 1.4  →\text{after Clip}\space1.2 \space (\epsilon = 0.2)$$
-- Then when the target function is re-weighted, the model tends to reinforce the generation of correct output, and the $\text{KL Divergence}$  limits the deviation from the reference policy. 
+- When the objective is re-weighted with this clipped ratio, the model tends to reinforce the generation of the correct output, while the \\( \text{KL Divergence} \\) term limits the deviation from the reference policy.
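
The clipping step can be sketched as follows, assuming the standard PPO-style clipped surrogate (the minimum of the unclipped and clipped terms). The function name is hypothetical, and the advantage value 0.94 is the one from Step 2.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO-style clipped term: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


ratio = 0.7 / 0.5  # new probability / old probability = 1.4
advantage = 0.94   # advantage of a correct response from Step 2

term = clipped_surrogate(ratio, advantage)
print(f"ratio={ratio:.1f} clipped term={term:.3f}")
```

Because the ratio 1.4 exceeds \\( 1 + \epsilon = 1.2 \\), the contribution is capped at \\( 1.2 \times 0.94 \\), so a single update cannot push the probability of this output arbitrarily far.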
 
 With the theoretical understanding in place, let's see how GRPO can be implemented in code.
 
@@ -385,7 +395,6 @@ As you continue exploring GRPO, consider experimenting with different group size
 Happy training! 🚀
 
 ## References
-
 1. [RLHF Book by Nathan Lambert](https://github.com/natolambert/rlhf-book)
 2. [DeepSeek-V3 Technical Report](https://huggingface.co/papers/2412.19437)
 3. [DeepSeekMath](https://huggingface.co/papers/2402.03300)
diff --git a/chapters/en/chapter12/4.mdx b/chapters/en/chapter12/4.mdx
index ea9e339c1..769bb561c 100644
--- a/chapters/en/chapter12/4.mdx
+++ b/chapters/en/chapter12/4.mdx
@@ -4,11 +4,8 @@ In this page, we'll learn how to implement Group Relative Policy Optimization (G
 
 We'll explore the core concepts of GRPO as they are embodied in TRL's GRPOTrainer, using snippets from the official TRL documentation to guide us.
 
-<Tip>
-
-This chapter is aimed at TRL beginners. If you are already familiar with TRL, you might want to also check out the [Open R1 implementation](https://github.com/huggingface/open-r1/blob/main/src/open_r1/grpo.py) of GRPO.
-
-</Tip>
+> [!TIP]
+> This chapter is aimed at TRL beginners. If you are already familiar with TRL, you might also want to check out the [Open R1 implementation](https://github.com/huggingface/open-r1/blob/main/src/open_r1/grpo.py) of GRPO.
 
 First, let's remind ourselves of some of the important concepts of the GRPO algorithm:
 
diff --git a/chapters/en/chapter12/5.mdx b/chapters/en/chapter12/5.mdx
index d240dbb7a..a00b205cd 100644
--- a/chapters/en/chapter12/5.mdx
+++ b/chapters/en/chapter12/5.mdx
@@ -8,11 +8,8 @@
 
 Now that you've seen the theory, let's put it into practice! In this exercise, you'll fine-tune a model with GRPO.
 
-<Tip>
-
-This exercise was written by LLM fine-tuning expert [@mlabonne](https://huggingface.co/mlabonne).
-
-</Tip>
+> [!TIP]
+> This exercise was written by LLM fine-tuning expert [@mlabonne](https://huggingface.co/mlabonne).
 
 ## Install dependencies
 
diff --git a/chapters/en/chapter12/6.mdx b/chapters/en/chapter12/6.mdx
index d40a10408..7a60ea6e9 100644
--- a/chapters/en/chapter12/6.mdx
+++ b/chapters/en/chapter12/6.mdx
@@ -10,12 +10,8 @@ In this exercise, you'll fine-tune a model with GRPO (Group Relative Policy Opti
 
 Unsloth is a library that accelerates LLM fine-tuning, making it possible to train models faster and with fewer computational resources. Unsloth plugs into TRL, so we'll build on what we learned in the previous sections and adapt it for Unsloth specifics.
 
-
-<Tip>
-
-This exercise can be run on a free Google Colab T4 GPU. For the best experience, follow along with the notebook linked above and try it out yourself.
-
-</Tip>
+> [!TIP]
+> This exercise can be run on a free Google Colab T4 GPU. For the best experience, follow along with the notebook linked above and try it out yourself.
 
 ## Install dependencies
 
@@ -72,11 +68,8 @@ model = FastLanguageModel.get_peft_model(
 
 This code loads the model in 4-bit quantization to save memory and applies LoRA (Low-Rank Adaptation) for efficient fine-tuning. The `target_modules` parameter specifies which layers of the model to fine-tune, and `use_gradient_checkpointing` enables training with longer contexts.
 
-<Tip>
-
-We won't cover the details of LoRA in this chapter, but you can learn more in [Chapter 11](/course/chapter11/3).
-
-</Tip>
+> [!TIP]
+> We won't cover the details of LoRA in this chapter, but you can learn more in [Chapter 11](/course/chapter11/3).
 
 ## Data Preparation
 
@@ -279,11 +272,8 @@ Now let's start the training:
 trainer.train()
 ```
 
-<Tip warning={true}>
-
-Training may take some time. You might not see rewards increase immediately - it can take 150-200 steps before you start seeing improvements. Be patient!
-
-</Tip>
+> [!WARNING]
+> Training may take some time. You might not see rewards increase immediately - it can take 150-200 steps before you start seeing improvements. Be patient!
 
 ## Testing the Model
 
diff --git a/chapters/en/chapter2/1.mdx b/chapters/en/chapter2/1.mdx
index 70e290a9d..a298aecc8 100644
--- a/chapters/en/chapter2/1.mdx
+++ b/chapters/en/chapter2/1.mdx
@@ -20,6 +20,5 @@ This chapter will begin with an end-to-end example where we use a model and a to
 
 Then we'll look at the tokenizer API, which is the other main component of the `pipeline()` function. Tokenizers take care of the first and last processing steps, handling the conversion from text to numerical inputs for the neural network, and the conversion back to text when it is needed. Finally, we'll show you how to handle sending multiple sentences through a model in a prepared batch, then wrap it all up with a closer look at the high-level `tokenizer()` function.
 
-<Tip>
-⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend <a href="https://huggingface.co/join">creating an account</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend <a href="https://huggingface.co/join">creating an account</a>.
\ No newline at end of file
diff --git a/chapters/en/chapter2/2.mdx b/chapters/en/chapter2/2.mdx
index 205e07e51..b47928a1c 100644
--- a/chapters/en/chapter2/2.mdx
+++ b/chapters/en/chapter2/2.mdx
@@ -228,8 +228,5 @@ Now we can conclude that the model predicted the following:
 
 We have successfully reproduced the three steps of the pipeline: preprocessing with tokenizers, passing the inputs through the model, and postprocessing! Now let's take some time to dive deeper into each of those steps.
 
-<Tip>
-
-✏️ **Try it out!** Choose two (or more) texts of your own and run them through the `sentiment-analysis` pipeline. Then replicate the steps you saw here yourself and check that you obtain the same results!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Choose two (or more) texts of your own and run them through the `sentiment-analysis` pipeline. Then replicate the steps you saw here yourself and check that you obtain the same results!
diff --git a/chapters/en/chapter2/4.mdx b/chapters/en/chapter2/4.mdx
index d07264690..024af4860 100644
--- a/chapters/en/chapter2/4.mdx
+++ b/chapters/en/chapter2/4.mdx
@@ -197,11 +197,8 @@ print(ids)
 
 These outputs, once converted to the appropriate framework tensor, can then be used as inputs to a model as seen earlier in this chapter.
 
-<Tip>
-
-✏️ **Try it out!** Replicate the two last steps (tokenization and conversion to input IDs) on the input sentences we used in section 2 ("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!"). Check that you get the same input IDs we got earlier!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Replicate the two last steps (tokenization and conversion to input IDs) on the input sentences we used in section 2 ("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!"). Check that you get the same input IDs we got earlier!
 
 ## Decoding[[decoding]]
 
diff --git a/chapters/en/chapter2/5.mdx b/chapters/en/chapter2/5.mdx
index 299a15c5f..8040813e6 100644
--- a/chapters/en/chapter2/5.mdx
+++ b/chapters/en/chapter2/5.mdx
@@ -96,11 +96,8 @@ batched_ids = [ids, ids]
 
 This is a batch of two identical sequences!
 
-<Tip>
-
-✏️ **Try it out!** Convert this `batched_ids` list into a tensor and pass it through your model. Check that you obtain the same logits as before (but twice)!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Convert this `batched_ids` list into a tensor and pass it through your model. Check that you obtain the same logits as before (but twice)!
 
 Batching allows the model to work when you feed it multiple sentences. Using multiple sequences is just as simple as building a batch with a single sequence. There's a second issue, though. When you're trying to batch together two (or more) sentences, they might be of different lengths. If you've ever worked with tensors before, you know that they need to be of rectangular shape, so you won't be able to convert the list of input IDs into a tensor directly. To work around this problem, we usually *pad* the inputs.
 
@@ -184,11 +181,8 @@ Now we get the same logits for the second sentence in the batch.
 
 Notice how the last value of the second sequence is a padding ID, which is a 0 value in the attention mask.
 
-<Tip>
-
-✏️ **Try it out!** Apply the tokenization manually on the two sentences used in section 2 ("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!"). Pass them through the model and check that you get the same logits as in section 2. Now batch them together using the padding token, then create the proper attention mask. Check that you obtain the same results when going through the model!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Apply the tokenization manually on the two sentences used in section 2 ("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!"). Pass them through the model and check that you get the same logits as in section 2. Now batch them together using the padding token, then create the proper attention mask. Check that you obtain the same results when going through the model!
 
 ## Longer sequences[[longer-sequences]]
 
diff --git a/chapters/en/chapter3/1.mdx b/chapters/en/chapter3/1.mdx
index 743b13c6c..2776eb1c2 100644
--- a/chapters/en/chapter3/1.mdx
+++ b/chapters/en/chapter3/1.mdx
@@ -15,11 +15,8 @@ In [Chapter 2](/course/chapter2) we explored how to use tokenizers and pretraine
 * How to leverage the 🤗 Accelerate library to easily run distributed training on any setup
 * How to apply current fine-tuning best practices for maximum performance
 
-<Tip>
-
-📚 **Essential Resources**: Before starting, you might want to review the [🤗 Datasets documentation](https://huggingface.co/docs/datasets/) for data processing.
-
-</Tip>
+> [!TIP]
+> 📚 **Essential Resources**: Before starting, you might want to review the [🤗 Datasets documentation](https://huggingface.co/docs/datasets/) for data processing.
 
 This chapter will also serve as an introduction to some Hugging Face libraries beyond the 🤗 Transformers library! We'll see how libraries like 🤗 Datasets, 🤗 Tokenizers, 🤗 Accelerate, and 🤗 Evaluate can help you train models more efficiently and effectively.
 
@@ -30,11 +27,8 @@ Each of the main sections in this chapter will teach you something different:
 
 By the end of this chapter, you'll be able to fine-tune models on your own datasets using both high-level APIs and custom training loops, applying the latest best practices in the field.
 
-<Tip>
-
-🎯 **What You'll Build**: By the end of this chapter, you'll have fine-tuned a BERT model for text classification and understand how to adapt the techniques to your own datasets and tasks.
-
-</Tip>
+> [!TIP]
+> 🎯 **What You'll Build**: By the end of this chapter, you'll have fine-tuned a BERT model for text classification and understand how to adapt the techniques to your own datasets and tasks.
 
 This chapter focuses exclusively on **PyTorch**, as it has become the standard framework for modern deep learning research and production. We'll use the latest APIs and best practices from the Hugging Face ecosystem.
 
diff --git a/chapters/en/chapter3/2.mdx b/chapters/en/chapter3/2.mdx
index bc1b00179..232441b84 100644
--- a/chapters/en/chapter3/2.mdx
+++ b/chapters/en/chapter3/2.mdx
@@ -45,11 +45,8 @@ The Hub doesn't just contain models; it also has multiple datasets in lots of di
 
 The 🤗 Datasets library provides a very simple command to download and cache a dataset on the Hub. We can download the MRPC dataset like this:
 
-<Tip>
-
-💡 **Additional Resources**: For more dataset loading techniques and examples, check out the [🤗 Datasets documentation](https://huggingface.co/docs/datasets/).
-
-</Tip> 
+> [!TIP]
+> 💡 **Additional Resources**: For more dataset loading techniques and examples, check out the [🤗 Datasets documentation](https://huggingface.co/docs/datasets/). 
 
 ```py
 from datasets import load_dataset
@@ -77,11 +74,8 @@ DatasetDict({
 
 As you can see, we get a `DatasetDict` object which contains the training set, the validation set, and the test set. Each of those contains several columns (`sentence1`, `sentence2`, `label`, and `idx`) and a variable number of rows, which are the number of elements in each set (so, there are 3,668 pairs of sentences in the training set, 408 in the validation set, and 1,725 in the test set).
 
-<Tip>
-
-This command downloads and caches the dataset, by default in *~/.cache/huggingface/datasets*. Recall from Chapter 2 that you can customize your cache folder by setting the `HF_HOME` environment variable.
-
-</Tip>
+> [!TIP]
+> This command downloads and caches the dataset, by default in *~/.cache/huggingface/datasets*. Recall from Chapter 2 that you can customize your cache folder by setting the `HF_HOME` environment variable.
 
 We can access each pair of sentences in our `raw_datasets` object by indexing, like with a dictionary:
 
@@ -112,11 +106,8 @@ raw_train_dataset.features
 
 Behind the scenes, `label` is of type `ClassLabel`, and the mapping of integers to label name is stored in the *names* folder. `0` corresponds to `not_equivalent`, and `1` corresponds to `equivalent`.
 
-<Tip>
-
-✏️ **Try it out!** Look at element 15 of the training set and element 87 of the validation set. What are their labels?
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Look at element 15 of the training set and element 87 of the validation set. What are their labels?
 
 ### Preprocessing a dataset[[preprocessing-a-dataset]]
 
@@ -133,11 +124,8 @@ tokenized_sentences_1 = tokenizer(raw_datasets["train"]["sentence1"])
 tokenized_sentences_2 = tokenizer(raw_datasets["train"]["sentence2"])
 ```
 
-<Tip>
-
-💡 **Deep Dive**: For more advanced tokenization techniques and understanding how different tokenizers work, explore the [🤗 Tokenizers documentation](https://huggingface.co/docs/transformers/main/en/tokenizer_summary) and the [tokenization guide in the cookbook](https://huggingface.co/learn/cookbook/en/advanced_rag#tokenization-strategies).
-
-</Tip>
+> [!TIP]
+> 💡 **Deep Dive**: For more advanced tokenization techniques and understanding how different tokenizers work, explore the [🤗 Tokenizers documentation](https://huggingface.co/docs/transformers/main/en/tokenizer_summary) and the [tokenization guide in the cookbook](https://huggingface.co/learn/cookbook/en/advanced_rag#tokenization-strategies).
 
 However, we can't just pass two sequences to the model and get a prediction of whether the two sentences are paraphrases or not. We need to handle the two sequences as a pair, and apply the appropriate preprocessing. Fortunately, the tokenizer can also take a pair of sequences and prepare it the way our BERT model expects: 
 
@@ -156,11 +144,8 @@ inputs
 
 We discussed the `input_ids` and `attention_mask` keys in [Chapter 2](/course/chapter2), but we put off talking about `token_type_ids`. In this example, this is what tells the model which part of the input is the first sentence and which is the second sentence.
 
-<Tip>
-
-✏️ **Try it out!** Take element 15 of the training set and tokenize the two sentences separately and as a pair. What's the difference between the two results?
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Take element 15 of the training set and tokenize the two sentences separately and as a pair. What's the difference between the two results?
 
 If we decode the IDs inside `input_ids` back to words:
 
@@ -215,11 +200,8 @@ This function takes a dictionary (like the items of our dataset) and returns a n
 
 Note that we've left the `padding` argument out in our tokenization function for now. This is because padding all the samples to the maximum length is not efficient: it's better to pad the samples when we're building a batch, as then we only need to pad to the maximum length in that batch, and not the maximum length in the entire dataset. This can save a lot of time and processing power when the inputs have very variable lengths!
 
-<Tip>
-
-📚 **Performance Tips**: Learn more about efficient data processing techniques in the [🤗 Datasets performance guide](https://huggingface.co/docs/datasets/about_arrow).
-
-</Tip>
+> [!TIP]
+> 📚 **Performance Tips**: Learn more about efficient data processing techniques in the [🤗 Datasets performance guide](https://huggingface.co/docs/datasets/about_arrow).
 
 Here is how we apply the tokenization function on all our datasets at once. We're using `batched=True` in our call to `map` so the function is applied to multiple elements of our dataset at once, and not on each element separately. This allows for faster preprocessing.
 
@@ -259,11 +241,8 @@ The last thing we will need to do is pad all the examples to the length of the l
 
 The function that is responsible for putting together samples inside a batch is called a *collate function*. It's an argument you can pass when you build a `DataLoader`, the default being a function that will just convert your samples to PyTorch tensors and concatenate them (recursively if your elements are lists, tuples, or dictionaries). This won't be possible in our case since the inputs we have won't all be of the same size. We have deliberately postponed the padding, to only apply it as necessary on each batch and avoid having over-long inputs with a lot of padding. This will speed up training by quite a bit, but note that if you're training on a TPU it can cause problems — TPUs prefer fixed shapes, even when that requires extra padding.
 
-<Tip>
-
-🚀 **Optimization Guide**: For more details on optimizing training performance, including padding strategies and TPU considerations, see the [🤗 Transformers performance documentation](https://huggingface.co/docs/transformers/main/en/performance).
-
-</Tip>
+> [!TIP]
+> 🚀 **Optimization Guide**: For more details on optimizing training performance, including padding strategies and TPU considerations, see the [🤗 Transformers performance documentation](https://huggingface.co/docs/transformers/main/en/performance).
 
 To do this in practice, we have to define a collate function that will apply the correct amount of padding to the items of the dataset we want to batch together. Fortunately, the 🤗 Transformers library provides us with such a function via `DataCollatorWithPadding`. It takes a tokenizer when you instantiate it (to know which padding token to use, and whether the model expects padding to be on the left or on the right of the inputs) and will do everything you need:
 
@@ -301,13 +280,10 @@ batch = data_collator(samples)
 
 Looking good! Now that we've gone from raw text to batches our model can deal with, we're ready to fine-tune it!
 
-<Tip>
-
-✏️ **Try it out!** Replicate the preprocessing on the GLUE SST-2 dataset. It's a little bit different since it's composed of single sentences instead of pairs, but the rest of what we did should look the same. For a harder challenge, try to write a preprocessing function that works on any of the GLUE tasks.
-
-📖 **Additional Practice**: Check out these hands-on examples from the [🤗 Transformers examples](https://huggingface.co/docs/transformers/main/en/notebooks).
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Replicate the preprocessing on the GLUE SST-2 dataset. It's a little bit different since it's composed of single sentences instead of pairs, but the rest of what we did should look the same. For a harder challenge, try to write a preprocessing function that works on any of the GLUE tasks.
+>
+> 📖 **Additional Practice**: Check out these hands-on examples from the [🤗 Transformers examples](https://huggingface.co/docs/transformers/main/en/notebooks).
 
 Perfect! Now that we have preprocessed our data with the latest best practices from the 🤗 Datasets library, we're ready to move on to training our model using the modern Trainer API. The next section will show you how to fine-tune your model effectively using the latest features and optimizations available in the Hugging Face ecosystem.
 
@@ -435,12 +411,9 @@ Test your understanding of data processing concepts:
 	]}
 />
 
-<Tip>
-
-💡 **Key Takeaways:**
-- Use `batched=True` with `Dataset.map()` for significantly faster preprocessing
-- Dynamic padding with `DataCollatorWithPadding` is more efficient than fixed-length padding
-- Always preprocess your data to match what your model expects (numerical tensors, correct column names)
-- The 🤗 Datasets library provides powerful tools for efficient data processing at scale
-
-</Tip>
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - Use `batched=True` with `Dataset.map()` for significantly faster preprocessing
+> - Dynamic padding with `DataCollatorWithPadding` is more efficient than fixed-length padding
+> - Always preprocess your data to match what your model expects (numerical tensors, correct column names)
+> - The 🤗 Datasets library provides powerful tools for efficient data processing at scale
diff --git a/chapters/en/chapter3/3.mdx b/chapters/en/chapter3/3.mdx
index 12705fca7..20c47152c 100644
--- a/chapters/en/chapter3/3.mdx
+++ b/chapters/en/chapter3/3.mdx
@@ -13,11 +13,8 @@
 
 🤗 Transformers provides a `Trainer` class to help you fine-tune any of the pretrained models it provides on your dataset with modern best practices. Once you've done all the data preprocessing work in the last section, you have just a few steps left to define the `Trainer`. The hardest part is likely to be preparing the environment to run `Trainer.train()`, as it will run very slowly on a CPU. If you don't have a GPU set up, you can get access to free GPUs or TPUs on [Google Colab](https://colab.research.google.com/).
 
-<Tip>
-
-📚 **Training Resources**: Before diving into training, familiarize yourself with the comprehensive [🤗 Transformers training guide](https://huggingface.co/docs/transformers/main/en/training) and explore practical examples in the [fine-tuning cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
-
-</Tip>
+> [!TIP]
+> 📚 **Training Resources**: Before diving into training, familiarize yourself with the comprehensive [🤗 Transformers training guide](https://huggingface.co/docs/transformers/main/en/training) and explore practical examples in the [fine-tuning cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
 
 The code examples below assume you have already executed the examples in the previous section. Here is a short summary recapping what you need:
 
@@ -50,11 +47,8 @@ training_args = TrainingArguments("test-trainer")
 
 If you want to automatically upload your model to the Hub during training, pass along `push_to_hub=True` in the `TrainingArguments`. We will learn more about this in [Chapter 4](/course/chapter4/3).
 
-<Tip>
-
-🚀 **Advanced Configuration**: For detailed information on all available training arguments and optimization strategies, check out the [TrainingArguments documentation](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) and the [training configuration cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
-
-</Tip>
+> [!TIP]
+> 🚀 **Advanced Configuration**: For detailed information on all available training arguments and optimization strategies, check out the [TrainingArguments documentation](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) and the [training configuration cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
 
 The second step is to define our model. As in the [previous chapter](/course/chapter2), we will use the `AutoModelForSequenceClassification` class, with two labels:
 
@@ -83,11 +77,8 @@ trainer = Trainer(
 
 When you pass a tokenizer as the `processing_class`, the default `data_collator` used by the `Trainer` will be a `DataCollatorWithPadding`. You can skip the `data_collator=data_collator` line in this case, but we included it here to show you this important part of the processing pipeline.
 
-<Tip>
-
-📖 **Learn More**: For comprehensive details on the Trainer class and its parameters, visit the [Trainer API documentation](https://huggingface.co/docs/transformers/main/en/main_classes/trainer) and explore advanced usage patterns in the [training cookbook recipes](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
-
-</Tip>
+> [!TIP]
+> 📖 **Learn More**: For comprehensive details on the Trainer class and its parameters, visit the [Trainer API documentation](https://huggingface.co/docs/transformers/main/en/main_classes/trainer) and explore advanced usage patterns in the [training cookbook recipes](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
 
 To fine-tune the model on our dataset, we just have to call the `train()` method of our `Trainer`:
 
@@ -137,11 +128,8 @@ metric.compute(predictions=preds, references=predictions.label_ids)
 {'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}
 ```
 
-<Tip>
-
-Learn about different evaluation metrics and strategies in the [🤗 Evaluate documentation](https://huggingface.co/docs/evaluate/).
-
-</Tip>
+> [!TIP]
+> Learn about different evaluation metrics and strategies in the [🤗 Evaluate documentation](https://huggingface.co/docs/evaluate/).
 
 The exact results you get may vary, as the random initialization of the model head might change the metrics it achieved. Here, we can see our model has an accuracy of 85.78% on the validation set and an F1 score of 89.97. Those are the two metrics used to evaluate results on the MRPC dataset for the GLUE benchmark. The table in the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf) reported an F1 score of 88.9 for the base model. That was the `uncased` model while we are currently using the `cased` model, which explains the better result.
 
@@ -216,21 +204,15 @@ training_args = TrainingArguments(
 )
 ```
 
-<Tip>
-
-🎯 **Performance Optimization**: For more advanced training techniques including distributed training, memory optimization, and hardware-specific optimizations, explore the [🤗 Transformers performance guide](https://huggingface.co/docs/transformers/main/en/performance).
-
-</Tip>
+> [!TIP]
+> 🎯 **Performance Optimization**: For more advanced training techniques including distributed training, memory optimization, and hardware-specific optimizations, explore the [🤗 Transformers performance guide](https://huggingface.co/docs/transformers/main/en/performance).
 
 The `Trainer` will work out of the box on multiple GPUs or TPUs and provides lots of options for distributed training. We will go over everything it supports in Chapter 10.
 
 This concludes the introduction to fine-tuning using the `Trainer` API. An example of doing this for most common NLP tasks will be given in [Chapter 7](/course/chapter7), but for now let's look at how to do the same thing with a pure PyTorch training loop.
 
-<Tip>
-
-📝 **More Examples**: Check out the comprehensive collection of [🤗 Transformers notebooks](https://huggingface.co/docs/transformers/main/en/notebooks).
-
-</Tip>
+> [!TIP]
+> 📝 **More Examples**: Check out the comprehensive collection of [🤗 Transformers notebooks](https://huggingface.co/docs/transformers/main/en/notebooks).
 
 ## Section Quiz[[section-quiz]]
 
@@ -380,14 +362,11 @@ Test your understanding of the Trainer API and fine-tuning concepts:
 	]}
 />
 
-<Tip>
-
-💡 **Key Takeaways:**
-- The `Trainer` API provides a high-level interface that handles most training complexity
-- Use `processing_class` to specify your tokenizer for proper data handling
-- `TrainingArguments` controls all aspects of training: learning rate, batch size, evaluation strategy, and optimizations
-- `compute_metrics` enables custom evaluation metrics beyond just training loss
-- Modern features like mixed precision (`fp16=True`) and gradient accumulation can significantly improve training efficiency
-
-</Tip>
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - The `Trainer` API provides a high-level interface that handles most training complexity
+> - Use `processing_class` to specify your tokenizer for proper data handling
+> - `TrainingArguments` controls all aspects of training: learning rate, batch size, evaluation strategy, and optimizations
+> - `compute_metrics` enables custom evaluation metrics beyond just training loss
+> - Modern features like mixed precision (`fp16=True`) and gradient accumulation can significantly improve training efficiency
 
diff --git a/chapters/en/chapter3/4.mdx b/chapters/en/chapter3/4.mdx
index e69c4e750..2bbcb8e57 100644
--- a/chapters/en/chapter3/4.mdx
+++ b/chapters/en/chapter3/4.mdx
@@ -11,11 +11,8 @@
 
 Now we'll see how to achieve the same results as we did in the last section without using the `Trainer` class, implementing a training loop from scratch with modern PyTorch best practices. Again, we assume you have done the data processing in section 2. Here is a short summary covering everything you will need:
 
-<Tip>
-
-🏗️ **Training from Scratch**: This section builds on the previous content. For comprehensive guidance on PyTorch training loops and best practices, check out the [🤗 Transformers training documentation](https://huggingface.co/docs/transformers/main/en/training#train-in-native-pytorch) and the [custom training cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu#model).
-
-</Tip>
+> [!TIP]
+> 🏗️ **Training from Scratch**: This section builds on the previous content. For comprehensive guidance on PyTorch training loops and best practices, check out the [🤗 Transformers training documentation](https://huggingface.co/docs/transformers/main/en/training#train-in-native-pytorch) and the [custom training cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu#model).
 
 ```py
 from datasets import load_dataset
@@ -116,16 +113,13 @@ from torch.optim import AdamW
 optimizer = AdamW(model.parameters(), lr=5e-5)
 ```
 
-<Tip>
-
-💡 **Modern Optimization Tips**: For even better performance, you can try:
-- **AdamW with weight decay**: `AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)`
-- **8-bit Adam**: Use `bitsandbytes` for memory-efficient optimization
-- **Different learning rates**: Lower learning rates (1e-5 to 3e-5) often work better for large models
-
-🚀 **Optimization Resources**: Learn more about optimizers and training strategies in the [🤗 Transformers optimization guide](https://huggingface.co/docs/transformers/main/en/performance#optimizer).
-
-</Tip>
+> [!TIP]
+> 💡 **Modern Optimization Tips**: For even better performance, you can try:
+> - **AdamW with weight decay**: `AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)`
+> - **8-bit Adam**: Use `bitsandbytes` for memory-efficient optimization
+> - **Different learning rates**: Lower learning rates (1e-5 to 3e-5) often work better for large models
+>
+> 🚀 **Optimization Resources**: Learn more about optimizers and training strategies in the [🤗 Transformers optimization guide](https://huggingface.co/docs/transformers/main/en/performance#optimizer).
 
 Finally, the learning rate scheduler used by default is just a linear decay from the maximum value (5e-5) to 0. To properly define it, we need to know the number of training steps we will take, which is the number of epochs we want to run multiplied by the number of training batches (which is the length of our training dataloader). The `Trainer` uses three epochs by default, so we will follow that:
 
@@ -184,18 +178,15 @@ for epoch in range(num_epochs):
         progress_bar.update(1)
 ```
 
-<Tip>
-
-💡 **Modern Training Optimizations**: To make your training loop even more efficient, consider:
-
-- **Gradient Clipping**: Add `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` before `optimizer.step()`
-- **Mixed Precision**: Use `torch.cuda.amp.autocast()` and `GradScaler` for faster training
-- **Gradient Accumulation**: Accumulate gradients over multiple batches to simulate larger batch sizes
-- **Checkpointing**: Save model checkpoints periodically to resume training if interrupted
-
-🔧 **Implementation Guide**: For detailed examples of these optimizations, see the [🤗 Transformers efficient training guide](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one) and the [range of optimizers](https://huggingface.co/docs/transformers/main/en/optimizers).
-
-</Tip>
+> [!TIP]
+> 💡 **Modern Training Optimizations**: To make your training loop even more efficient, consider:
+>
+> - **Gradient Clipping**: Add `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` before `optimizer.step()`
+> - **Mixed Precision**: Use `torch.autocast(device_type="cuda")` and `torch.amp.GradScaler` for faster training
+> - **Gradient Accumulation**: Accumulate gradients over multiple batches to simulate larger batch sizes
+> - **Checkpointing**: Save model checkpoints periodically to resume training if interrupted
+>
+> 🔧 **Implementation Guide**: For detailed examples of these optimizations, see the [🤗 Transformers efficient training guide](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one) and the [range of optimizers](https://huggingface.co/docs/transformers/main/en/optimizers).
 
 You can see that the core of the training loop looks a lot like the one in the introduction. We didn't ask for any reporting, so this training loop will not tell us anything about how the model fares. We need to add an evaluation loop for that.
 
@@ -204,11 +195,8 @@ You can see that the core of the training loop looks a lot like the one in the i
 
 As we did earlier, we will use a metric provided by the 🤗 Evaluate library. We've already seen the `metric.compute()` method, but metrics can actually accumulate batches for us as we go over the prediction loop with the method `add_batch()`. Once we have accumulated all the batches, we can get the final result with `metric.compute()`. Here's how to implement all of this in an evaluation loop:
 
-<Tip>
-
-📊 **Evaluation Best Practices**: For more sophisticated evaluation strategies and metrics, explore the [🤗 Evaluate documentation](https://huggingface.co/docs/evaluate/) and the [comprehensive evaluation cookbook](https://github.com/huggingface/evaluation-guidebook).
-
-</Tip>
+> [!TIP]
+> 📊 **Evaluation Best Practices**: For more sophisticated evaluation strategies and metrics, explore the [🤗 Evaluate documentation](https://huggingface.co/docs/evaluate/) and the [comprehensive evaluation cookbook](https://github.com/huggingface/evaluation-guidebook).
 
 ```py
 import evaluate
@@ -233,11 +221,8 @@ metric.compute()
 
 Again, your results will be slightly different because of the randomness in the model head initialization and the data shuffling, but they should be in the same ballpark.
 
-<Tip>
-
-✏️ **Try it out!** Modify the previous training loop to fine-tune your model on the SST-2 dataset.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Modify the previous training loop to fine-tune your model on the SST-2 dataset.
 
 ### Supercharge your training loop with 🤗 Accelerate[[supercharge-your-training-loop-with-accelerate]]
 
@@ -245,11 +230,8 @@ Again, your results will be slightly different because of the randomness in the
 
 The training loop we defined earlier works fine on a single CPU or GPU. But using the [🤗 Accelerate](https://github.com/huggingface/accelerate) library, with just a few adjustments we can enable distributed training on multiple GPUs or TPUs. 🤗 Accelerate handles the complexity of distributed training, mixed precision, and device placement automatically. Starting from the creation of the training and validation dataloaders, here is what our manual training loop looks like:
 
-<Tip>
-
-⚡ **Accelerate Deep Dive**: Learn everything about distributed training, mixed precision, and hardware optimization in the [🤗 Accelerate documentation](https://huggingface.co/docs/accelerate/) and explore practical examples in the [transformers documentation](https://huggingface.co/docs/transformers/main/en/accelerate).
-
-</Tip>
+> [!TIP]
+> ⚡ **Accelerate Deep Dive**: Learn everything about distributed training, mixed precision, and hardware optimization in the [🤗 Accelerate documentation](https://huggingface.co/docs/accelerate/) and explore practical examples in the [transformers documentation](https://huggingface.co/docs/transformers/main/en/accelerate).
 
 ```py
 from accelerate import Accelerator
@@ -293,9 +275,8 @@ The first line to add is the import line. The second line instantiates an `Accel
 
 Then the main bulk of the work is done in the line that sends the dataloaders, the model, and the optimizer to `accelerator.prepare()`. This will wrap those objects in the proper container to make sure your distributed training works as intended. The remaining changes to make are removing the line that puts the batch on the `device` (again, if you want to keep this you can just change it to use `accelerator.device`) and replacing `loss.backward()` with `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ In order to benefit from the speed-up offered by Cloud TPUs, we recommend padding your samples to a fixed length with the `padding="max_length"` and `max_length` arguments of the tokenizer.
-</Tip>
+> [!TIP]
+> ⚠️ In order to benefit from the speed-up offered by Cloud TPUs, we recommend padding your samples to a fixed length with the `padding="max_length"` and `max_length` arguments of the tokenizer.
 
 If you'd like to copy and paste it to play around, here's what the complete training loop looks like with 🤗 Accelerate:
 
@@ -361,11 +342,8 @@ notebook_launcher(training_function)
 
 You can find more examples in the [🤗 Accelerate repo](https://github.com/huggingface/accelerate/tree/main/examples).
 
-<Tip>
-
-🌐 **Distributed Training**: For comprehensive coverage of multi-GPU and multi-node training, check out the [🤗 Transformers distributed training guide](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many) and the [scaling training cookbook](https://huggingface.co/docs/transformers/main/en/accelerate).
-
-</Tip>
+> [!TIP]
+> 🌐 **Distributed Training**: For comprehensive coverage of multi-GPU and multi-node training, check out the [🤗 Transformers distributed training guide](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many) and the [scaling training cookbook](https://huggingface.co/docs/transformers/main/en/accelerate).
 
 ### Next Steps and Best Practices[[next-steps-and-best-practices]]
 
@@ -555,14 +533,11 @@ Test your understanding of custom training loops and advanced training technique
 	]}
 />
 
-<Tip>
-
-💡 **Key Takeaways:**
-- Manual training loops give you complete control but require understanding of the proper sequence: forward → backward → optimizer step → scheduler step → zero gradients
-- AdamW with weight decay is the recommended optimizer for transformer models
-- Always use `model.eval()` and `torch.no_grad()` during evaluation for correct behavior and efficiency
-- 🤗 Accelerate makes distributed training accessible with minimal code changes
-- Device management (moving tensors to GPU/CPU) is crucial for PyTorch operations
-- Modern techniques like mixed precision, gradient accumulation, and gradient clipping can significantly improve training efficiency
-
-</Tip>
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - Manual training loops give you complete control but require understanding of the proper sequence: forward → backward → optimizer step → scheduler step → zero gradients
+> - AdamW with weight decay is the recommended optimizer for transformer models
+> - Always use `model.eval()` and `torch.no_grad()` during evaluation for correct behavior and efficiency
+> - 🤗 Accelerate makes distributed training accessible with minimal code changes
+> - Device management (moving tensors to GPU/CPU) is crucial for PyTorch operations
+> - Modern techniques like mixed precision, gradient accumulation, and gradient clipping can significantly improve training efficiency
diff --git a/chapters/en/chapter3/5.mdx b/chapters/en/chapter3/5.mdx
index ee107eaf8..c402879c4 100644
--- a/chapters/en/chapter3/5.mdx
+++ b/chapters/en/chapter3/5.mdx
@@ -76,11 +76,8 @@ The accuracy curve shows the percentage of correct predictions over time. Unlike
 - **Increase with training**: Accuracy should generally improve as the model learns if it is able to learn the patterns in the data
 - **May show plateaus**: Accuracy often increases in discrete jumps rather than smoothly, as the model makes predictions that are close to the true labels
 
-<Tip>
-
-💡 **Why Accuracy Curves Are "Steppy"**: Unlike loss, which is continuous, accuracy is calculated by comparing discrete predictions to true labels. Small improvements in model confidence might not change the final prediction, causing accuracy to remain flat until a threshold is crossed.
-
-</Tip>
+> [!TIP]
+> 💡 **Why Accuracy Curves Are "Steppy"**: Unlike loss, which is continuous, accuracy is calculated by comparing discrete predictions to true labels. Small improvements in model confidence might not change the final prediction, causing accuracy to remain flat until a threshold is crossed.
 
 ### Convergence[[convergence]]
 
@@ -110,14 +107,11 @@ One notable difference between the curves is the smoothness and the presence of
 
 For example, in a binary classifier distinguishing cats (0) from dogs (1), if the model predicts 0.3 for an image of a dog (true value 1), this is rounded to 0 and is an incorrect classification. If in the next step it predicts 0.4, it's still incorrect. The loss will have decreased because 0.4 is closer to 1 than 0.3, but the accuracy remains unchanged, creating a plateau. The accuracy will only jump up when the model predicts a value greater than 0.5 that gets rounded to 1.
 
-<Tip>
-
-**Characteristics of healthy curves:**
-- **Smooth decline in loss**: Both training and validation loss decrease steadily
-- **Close training/validation performance**: Small gap between training and validation metrics
-- **Convergence**: Curves level off, indicating the model has learned the patterns
-
-</Tip>
+> [!TIP]
+> **Characteristics of healthy curves:**
+> - **Smooth decline in loss**: Both training and validation loss decrease steadily
+> - **Close training/validation performance**: Small gap between training and validation metrics
+> - **Convergence**: Curves level off, indicating the model has learned the patterns
 
 ### Practical Examples[[practical-examples]]
 
@@ -141,16 +135,14 @@ After the training process is complete, you can analyze the complete curves to u
 3. **Generalization**: How close are training and validation performance?
 4. **Trends**: Would additional training likely improve performance?
 
-<Tip>
-
-🔍 **W&B Dashboard Features**: Weights & Biases automatically creates beautiful, interactive plots of your learning curves. You can:
-- Compare multiple runs side by side
-- Add custom metrics and visualizations  
-- Set up alerts for anomalous behavior
-- Share results with your team
-
-Learn more in the [Weights & Biases documentation](https://docs.wandb.ai/).
-</Tip>
+> [!TIP]
+> 🔍 **W&B Dashboard Features**: Weights & Biases automatically creates beautiful, interactive plots of your learning curves. You can:
+> - Compare multiple runs side by side
+> - Add custom metrics and visualizations  
+> - Set up alerts for anomalous behavior
+> - Share results with your team
+>
+> Learn more in the [Weights & Biases documentation](https://docs.wandb.ai/).
 
 #### Overfitting[[overfitting]]
 
@@ -279,19 +271,16 @@ training_args = TrainingArguments(
 
 Understanding learning curves is crucial for becoming an effective machine learning practitioner. These visual tools provide immediate feedback about your model's training progress and help you make informed decisions about when to stop training, adjust hyperparameters, or try different approaches. With practice, you'll develop an intuitive understanding of what healthy learning curves look like and how to address issues when they arise. 
 
-<Tip>
-
-💡 **Key Takeaways:**
-- Learning curves are essential tools for understanding model training progress
-- Monitor both loss and accuracy curves, but remember they have different characteristics
-- Overfitting shows as diverging training/validation performance
-- Underfitting shows as poor performance on both training and validation data
-- Tools like Weights & Biases make it easy to track and analyze learning curves
-- Early stopping and proper regularization can address most common training issues
-
-🔬 **Next Steps**: Practice analyzing learning curves on your own fine-tuning experiments. Try different hyperparameters and observe how they affect the curve shapes. This hands-on experience is the best way to develop intuition for reading training progress.
-
-</Tip>
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - Learning curves are essential tools for understanding model training progress
+> - Monitor both loss and accuracy curves, but remember they have different characteristics
+> - Overfitting shows as diverging training/validation performance
+> - Underfitting shows as poor performance on both training and validation data
+> - Tools like Weights & Biases make it easy to track and analyze learning curves
+> - Early stopping and proper regularization can address most common training issues
+>
+> 🔬 **Next Steps**: Practice analyzing learning curves on your own fine-tuning experiments. Try different hyperparameters and observe how they affect the curve shapes. This hands-on experience is the best way to develop intuition for reading training progress.
 
 ## Section Quiz[[section-quiz]]
 
diff --git a/chapters/en/chapter3/6.mdx b/chapters/en/chapter3/6.mdx
index e24553dfd..cb05477c1 100644
--- a/chapters/en/chapter3/6.mdx
+++ b/chapters/en/chapter3/6.mdx
@@ -16,31 +16,25 @@ That was comprehensive! In the first two chapters you learned about models and t
 * Used 🤗 Accelerate to make your training code work seamlessly on multiple GPUs or TPUs
 * Applied modern optimization techniques like mixed precision training and gradient accumulation
 
-<Tip>
-
-🎉 **Congratulations!** You've mastered the fundamentals of fine-tuning transformer models. You're now ready to tackle real-world ML projects!
-
-📖 **Continue Learning**: Explore these resources to deepen your knowledge:
-- [🤗 Transformers task guides](https://huggingface.co/docs/transformers/main/en/tasks/sequence_classification) for specific NLP tasks
-- [🤗 Transformers examples](https://huggingface.co/docs/transformers/main/en/notebooks) for comprehensive notebooks
-
-🚀 **Next Steps**: 
-- Try fine-tuning on your own dataset using the techniques you've learned
-- Experiment with different model architectures available on the [Hugging Face Hub](https://huggingface.co/models)
-- Join the [Hugging Face community](https://discuss.huggingface.co/) to share your projects and get help
-
-</Tip>
+> [!TIP]
+> 🎉 **Congratulations!** You've mastered the fundamentals of fine-tuning transformer models. You're now ready to tackle real-world ML projects!
+>
+> 📖 **Continue Learning**: Explore these resources to deepen your knowledge:
+> - [🤗 Transformers task guides](https://huggingface.co/docs/transformers/main/en/tasks/sequence_classification) for specific NLP tasks
+> - [🤗 Transformers examples](https://huggingface.co/docs/transformers/main/en/notebooks) for comprehensive notebooks
+>
+> 🚀 **Next Steps**: 
+> - Try fine-tuning on your own dataset using the techniques you've learned
+> - Experiment with different model architectures available on the [Hugging Face Hub](https://huggingface.co/models)
+> - Join the [Hugging Face community](https://discuss.huggingface.co/) to share your projects and get help
 
 This is just the beginning of your journey with 🤗 Transformers. In the next chapter, we'll explore how to share your models and tokenizers with the community and contribute to the ever-growing ecosystem of pretrained models.
 
 The skills you've developed here - data preprocessing, training configuration, evaluation, and optimization - are fundamental to any machine learning project. Whether you're working on text classification, named entity recognition, question answering, or any other NLP task, these techniques will serve you well.
 
-<Tip>
-
-💡 **Pro Tips for Success**:
-- Always start with a strong baseline using the `Trainer` API before implementing custom training loops
-- Use the 🤗 Hub to find pretrained models that are close to your task for better starting points
-- Monitor your training with proper evaluation metrics and don't forget to save checkpoints
-- Leverage the community - share your models and datasets to help others and get feedback on your work
-
-</Tip>
+> [!TIP]
+> 💡 **Pro Tips for Success**:
+> - Always start with a strong baseline using the `Trainer` API before implementing custom training loops
+> - Use the 🤗 Hub to find pretrained models that are close to your task for better starting points
+> - Monitor your training with proper evaluation metrics and don't forget to save checkpoints
+> - Leverage the community - share your models and datasets to help others and get feedback on your work
diff --git a/chapters/en/chapter4/2.mdx b/chapters/en/chapter4/2.mdx
index 0bd50669b..e1fe5bebb 100644
--- a/chapters/en/chapter4/2.mdx
+++ b/chapters/en/chapter4/2.mdx
@@ -91,6 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-When using a pretrained model, make sure to check how it was trained, on which datasets, its limits, and its biases. All of this information should be indicated on its model card.
-</Tip>
+> [!TIP]
+> When using a pretrained model, make sure to check how it was trained, on which datasets, its limits, and its biases. All of this information should be indicated on its model card.
diff --git a/chapters/en/chapter4/3.mdx b/chapters/en/chapter4/3.mdx
index 9de3fb1d8..586cf03f0 100644
--- a/chapters/en/chapter4/3.mdx
+++ b/chapters/en/chapter4/3.mdx
@@ -172,11 +172,8 @@ Click on the "Files and versions" tab, and you should see the files visible in t
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Try it out!** Take the model and tokenizer associated with the `bert-base-cased` checkpoint and upload them to a repo in your namespace using the `push_to_hub()` method. Double-check that the repo appears properly on your page before deleting it.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Take the model and tokenizer associated with the `bert-base-cased` checkpoint and upload them to a repo in your namespace using the `push_to_hub()` method. Double-check that the repo appears properly on your page before deleting it.
 
 As you've seen, the `push_to_hub()` method accepts several arguments, making it possible to upload to a specific repository or organization namespace, or to use a different API token. We recommend you take a look at the method specification available directly in the [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing) to get an idea of what is possible.
 
@@ -465,9 +462,8 @@ If you look at the file sizes (for example, with `ls -lh`), you should see that
 
 {/if}
 
-<Tip>
-✏️ When creating the repository from the web interface, the *.gitattributes* file is automatically set up to consider files with certain extensions, such as *.bin* and *.h5*, as large files, and git-lfs will track them with no necessary setup on your side.
-</Tip> 
+> [!TIP]
+> ✏️ When creating the repository from the web interface, the *.gitattributes* file is automatically set up to consider files with certain extensions, such as *.bin* and *.h5*, as large files, and git-lfs will track them with no setup needed on your side.
 
 We can now go ahead and proceed like we would usually do with traditional Git repositories. We can add all the files to Git's staging environment using the `git add` command:
 
diff --git a/chapters/en/chapter5/2.mdx b/chapters/en/chapter5/2.mdx
index acf417bba..ba7471279 100644
--- a/chapters/en/chapter5/2.mdx
+++ b/chapters/en/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 We can see that the compressed files have been replaced with _SQuAD_it-train.json_ and _SQuAD_it-test.json_, and that the data is stored in the JSON format.
 
-<Tip>
-
-✎ If you're wondering why there's a `!` character in the above shell commands, that's because we're running them within a Jupyter notebook. Simply remove the prefix if you want to download and unzip the dataset within a terminal.
-
-</Tip>
+> [!TIP]
+> ✎ If you're wondering why there's a `!` character in the above shell commands, that's because we're running them within a Jupyter notebook. Simply remove the prefix if you want to download and unzip the dataset within a terminal.
 
 To load a JSON file with the `load_dataset()` function, we just need to know if we're dealing with ordinary JSON (similar to a nested dictionary) or JSON Lines (line-separated JSON). Like many question answering datasets, SQuAD-it uses the nested format, with all the text stored in a `data` field. This means we can load the dataset by specifying the `field` argument as follows:
 
@@ -126,11 +123,8 @@ DatasetDict({
 
 This is exactly what we wanted. Now, we can apply various preprocessing techniques to clean up the data, tokenize the reviews, and so on.
 
-<Tip>
-
-The `data_files` argument of the `load_dataset()` function is quite flexible and can be either a single file path, a list of file paths, or a dictionary that maps split names to file paths. You can also glob files that match a specified pattern according to the rules used by the Unix shell (e.g., you can glob all the JSON files in a directory as a single split by setting `data_files="*.json"`). See the 🤗 Datasets [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) for more details.
-
-</Tip>
+> [!TIP]
+> The `data_files` argument of the `load_dataset()` function is quite flexible and can be either a single file path, a list of file paths, or a dictionary that maps split names to file paths. You can also glob files that match a specified pattern according to the rules used by the Unix shell (e.g., you can glob all the JSON files in a directory as a single split by setting `data_files="*.json"`). See the 🤗 Datasets [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) for more details.
 
 The loading scripts in 🤗 Datasets actually support automatic decompression of the input files, so we could have skipped the use of `gzip` by pointing the `data_files` argument directly to the compressed files:
 
@@ -158,10 +152,7 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 This returns the same `DatasetDict` object obtained above, but saves us the step of manually downloading and decompressing the _SQuAD_it-*.json.gz_ files. This wraps up our foray into the various ways to load datasets that aren't hosted on the Hugging Face Hub. Now that we've got a dataset to play with, let's get our hands dirty with various data-wrangling techniques!
 
-<Tip>
-
-✏️ **Try it out!** Pick another dataset hosted on GitHub or the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) and try loading it both locally and remotely using the techniques introduced above. For bonus points, try loading a dataset that’s stored in a CSV or text format (see the [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) for more information on these formats).
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Pick another dataset hosted on GitHub or the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) and try loading it both locally and remotely using the techniques introduced above. For bonus points, try loading a dataset that’s stored in a CSV or text format (see the [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) for more information on these formats).
 
 
diff --git a/chapters/en/chapter5/3.mdx b/chapters/en/chapter5/3.mdx
index 8b4619fa5..88faf83e3 100644
--- a/chapters/en/chapter5/3.mdx
+++ b/chapters/en/chapter5/3.mdx
@@ -89,11 +89,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Try it out!** Use the `Dataset.unique()` function to find the number of unique drugs and conditions in the training and test sets.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the `Dataset.unique()` function to find the number of unique drugs and conditions in the training and test sets.
 
 Next, let's normalize all the `condition` labels using `Dataset.map()`. As we did with tokenization in [Chapter 3](/course/chapter3), we can define a simple function that can be applied across all the rows of each split in `drug_dataset`:
 
@@ -217,11 +214,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 As we suspected, some reviews contain just a single word, which, although it may be okay for sentiment analysis, would not be informative if we want to predict the condition.
 
-<Tip>
-
-🙋 An alternative way to add new columns to a dataset is with the `Dataset.add_column()` function. This allows you to provide the column as a Python list or NumPy array and can be handy in situations where `Dataset.map()` is not well suited for your analysis.
-
-</Tip>
+> [!TIP]
+> 🙋 An alternative way to add new columns to a dataset is with the `Dataset.add_column()` function. This allows you to provide the column as a Python list or NumPy array and can be handy in situations where `Dataset.map()` is not well suited for your analysis.
 
 Let's use the `Dataset.filter()` function to remove reviews that contain fewer than 30 words. Similarly to what we did with the `condition` column, we can filter out the very short reviews by requiring that the reviews have a length above this threshold:
 
@@ -236,11 +230,8 @@ print(drug_dataset.num_rows)
 
 As you can see, this has removed around 15% of the reviews from our original training and test sets.
 
-<Tip>
-
-✏️ **Try it out!** Use the `Dataset.sort()` function to inspect the reviews with the largest numbers of words. See the [documentation](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) to see which argument you need to use sort the reviews by length in descending order.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the `Dataset.sort()` function to inspect the reviews with the largest numbers of words. See the [documentation](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) to see which argument you need to use to sort the reviews by length in descending order.
 
 The last thing we need to deal with is the presence of HTML character codes in our reviews. We can use Python's `html` module to unescape these characters, like so:
 
@@ -297,11 +288,8 @@ As you saw in [Chapter 3](/course/chapter3), we can pass one or several examples
 
 You can also time a whole cell by putting `%%time` at the beginning of the cell. On the hardware we executed this on, it showed 10.8s for this instruction (it's the number written after "Wall time").
 
-<Tip>
-
-✏️ **Try it out!** Execute the same instruction with and without `batched=True`, then try it with a slow tokenizer (add `use_fast=False` in the `AutoTokenizer.from_pretrained()` method) so you can see what numbers you get on your hardware.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Execute the same instruction with and without `batched=True`, then try it with a slow tokenizer (add `use_fast=False` in the `AutoTokenizer.from_pretrained()` method) so you can see what numbers you get on your hardware.
 
 Here are the results we obtained with and without batching, with a fast and a slow tokenizer:
 
@@ -338,19 +326,13 @@ Options         | Fast tokenizer | Slow tokenizer
 
 Those are much more reasonable results for the slow tokenizer, but the performance of the fast tokenizer was also substantially improved. Note, however, that won't always be the case -- for values of `num_proc` other than 8, our tests showed that it was faster to use `batched=True` without that option. In general, we don't recommend using Python multiprocessing for fast tokenizers with `batched=True`.
 
-<Tip>
-
-Using `num_proc` to speed up your processing is usually a great idea, as long as the function you are using is not already doing some kind of multiprocessing of its own.
-
-</Tip>
+> [!TIP]
+> Using `num_proc` to speed up your processing is usually a great idea, as long as the function you are using is not already doing some kind of multiprocessing of its own.
 
 All of this functionality condensed into a single method is already pretty amazing, but there's more! With `Dataset.map()` and `batched=True` you can change the number of elements in your dataset. This is super useful in many situations where you want to create several training features from one example, and we will need to do this as part of the preprocessing for several of the NLP tasks we'll undertake in [Chapter 7](/course/chapter7).
 
-<Tip>
-
-💡 In machine learning, an _example_ is usually defined as the set of _features_ that we feed to the model. In some contexts, these features will be the set of columns in a `Dataset`, but in others (like here and for question answering), multiple features can be extracted from a single example and belong to a single column.
-
-</Tip>
+> [!TIP]
+> 💡 In machine learning, an _example_ is usually defined as the set of _features_ that we feed to the model. In some contexts, these features will be the set of columns in a `Dataset`, but in others (like here and for question answering), multiple features can be extracted from a single example and belong to a single column.
 
 Let's have a look at how it works! Here we will tokenize our examples and truncate them to a maximum length of 128, but we will ask the tokenizer to return *all* the chunks of the texts instead of just the first one. This can be done with `return_overflowing_tokens=True`:
 
@@ -520,11 +502,8 @@ Let's create a `pandas.DataFrame` for the whole training set by selecting all th
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 Under the hood, `Dataset.set_format()` changes the return format for the dataset's `__getitem__()` dunder method. This means that when we want to create a new object like `train_df` from a `Dataset` in the `"pandas"` format, we need to slice the whole dataset to obtain a `pandas.DataFrame`. You can verify for yourself that the type of `drug_dataset["train"]` is `Dataset`, irrespective of the output format.
-
-</Tip>
+> [!TIP]
+> 🚨 Under the hood, `Dataset.set_format()` changes the return format for the dataset's `__getitem__()` dunder method. This means that when we want to create a new object like `train_df` from a `Dataset` in the `"pandas"` format, we need to slice the whole dataset to obtain a `pandas.DataFrame`. You can verify for yourself that the type of `drug_dataset["train"]` is `Dataset`, irrespective of the output format.
 
 
 From here we can use all the Pandas functionality that we want. For example, we can do fancy chaining to compute the class distribution among the `condition` entries:
@@ -595,11 +574,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **Try it out!** Compute the average rating per drug and store the result in a new `Dataset`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Compute the average rating per drug and store the result in a new `Dataset`.
 
 This wraps up our tour of the various preprocessing techniques available in 🤗 Datasets. To round out the section, let's create a validation set to prepare the dataset for training a classifier on. Before doing so, we'll reset the output format of `drug_dataset` from `"pandas"` to `"arrow"`:
 
diff --git a/chapters/en/chapter5/4.mdx b/chapters/en/chapter5/4.mdx
index 8e6415a3f..1fdd4e031 100644
--- a/chapters/en/chapter5/4.mdx
+++ b/chapters/en/chapter5/4.mdx
@@ -44,11 +44,8 @@ Dataset({
 
 We can see that there are 15,518,009 rows and 2 columns in our dataset -- that's a lot!
 
-<Tip>
-
-✎ By default, 🤗 Datasets will decompress the files needed to load a dataset. If you want to preserve hard drive space, you can pass `DownloadConfig(delete_extracted=True)` to the `download_config` argument of `load_dataset()`. See the [documentation](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) for more details.
-
-</Tip>
+> [!TIP]
+> ✎ By default, 🤗 Datasets will decompress the files needed to load a dataset. If you want to preserve hard drive space, you can pass `DownloadConfig(delete_extracted=True)` to the `download_config` argument of `load_dataset()`. See the [documentation](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) for more details.
 
 Let's inspect the contents of the first example:
 
@@ -99,11 +96,8 @@ Dataset size (cache file) : 19.54 GB
 
 Nice -- despite it being almost 20 GB large, we're able to load and access the dataset with much less RAM!
 
-<Tip>
-
-✏️ **Try it out!** Pick one of the [subsets](https://the-eye.eu/public/AI/pile_preliminary_components/) from the Pile that is larger than your laptop or desktop's RAM, load it with 🤗 Datasets, and measure the amount of RAM used. Note that to get an accurate measurement, you'll want to do this in a new process. You can find the decompressed sizes of each subset in Table 1 of [the Pile paper](https://arxiv.org/abs/2101.00027).
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Pick one of the [subsets](https://the-eye.eu/public/AI/pile_preliminary_components/) from the Pile that is larger than your laptop or desktop's RAM, load it with 🤗 Datasets, and measure the amount of RAM used. Note that to get an accurate measurement, you'll want to do this in a new process. You can find the decompressed sizes of each subset in Table 1 of [the Pile paper](https://arxiv.org/abs/2101.00027).
 
 If you're familiar with Pandas, this result might come as a surprise because of Wes Kinney's famous [rule of thumb](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) that you typically need 5 to 10 times as much RAM as the size of your dataset. So how does 🤗 Datasets solve this memory management problem? 🤗 Datasets treats each dataset as a [memory-mapped file](https://en.wikipedia.org/wiki/Memory-mapped_file), which provides a mapping between RAM and filesystem storage that allows the library to access and operate on elements of the dataset without needing to fully load it into memory.
 
@@ -131,11 +125,8 @@ print(
 
 Here we've used Python's `timeit` module to measure the execution time taken by `code_snippet`. You'll typically be able to iterate over a dataset at speed of a few tenths of a GB/s to several GB/s. This works great for the vast majority of applications, but sometimes you'll have to work with a dataset that is too large to even store on your laptop's hard drive. For example, if we tried to download the Pile in its entirety, we'd need 825 GB of free disk space! To handle these cases, 🤗 Datasets provides a streaming feature that allows us to download and access elements on the fly, without needing to download the whole dataset. Let's take a look at how this works.
 
-<Tip>
-
-💡 In Jupyter notebooks you can also time cells using the [`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 In Jupyter notebooks you can also time cells using the [`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Streaming datasets[[streaming-datasets]]
 
@@ -173,11 +164,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 To speed up tokenization with streaming you can pass `batched=True`, as we saw in the last section. It will process the examples batch by batch; the default batch size is 1,000 and can be specified with the `batch_size` argument.
-
-</Tip>
+> [!TIP]
+> 💡 To speed up tokenization with streaming you can pass `batched=True`, as we saw in the last section. It will process the examples batch by batch; the default batch size is 1,000 and can be specified with the `batch_size` argument.
 
 You can also shuffle a streamed dataset using `IterableDataset.shuffle()`, but unlike `Dataset.shuffle()` this only shuffles the elements in a predefined `buffer_size`:
 
@@ -278,10 +266,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Try it out!** Use one of the large Common Crawl corpora like [`mc4`](https://huggingface.co/datasets/mc4) or [`oscar`](https://huggingface.co/datasets/oscar) to create a streaming multilingual dataset that represents the spoken proportions of languages in a country of your choice. For example, the four national languages in Switzerland are German, French, Italian, and Romansh, so you could try creating a Swiss corpus by sampling the Oscar subsets according to their spoken proportion.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use one of the large Common Crawl corpora like [`mc4`](https://huggingface.co/datasets/mc4) or [`oscar`](https://huggingface.co/datasets/oscar) to create a streaming multilingual dataset that represents the spoken proportions of languages in a country of your choice. For example, the four national languages in Switzerland are German, French, Italian, and Romansh, so you could try creating a Swiss corpus by sampling the Oscar subsets according to their spoken proportion.
 
 You now have all the tools you need to load and process datasets of all shapes and sizes -- but unless you're exceptionally lucky, there will come a point in your NLP journey where you'll have to actually create a dataset to solve the problem at hand. That's the topic of the next section!
diff --git a/chapters/en/chapter5/5.mdx b/chapters/en/chapter5/5.mdx
index 5688ea04a..aae856d22 100644
--- a/chapters/en/chapter5/5.mdx
+++ b/chapters/en/chapter5/5.mdx
@@ -113,11 +113,8 @@ response.json()
 
 Whoa, that's a lot of information! We can see useful fields like `title`, `body`, and `number` that describe the issue, as well as information about the GitHub user who opened the issue.
 
-<Tip>
-
-✏️ **Try it out!** Click on a few of the URLs in the JSON payload above to get a feel for what type of information each GitHub issue is linked to.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Click on a few of the URLs in the JSON payload above to get a feel for what type of information each GitHub issue is linked to.
 
 As described in the GitHub [documentation](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting), unauthenticated requests are limited to 60 requests per hour. Although you can increase the `per_page` query parameter to reduce the number of requests you make, you will still hit the rate limit on any repository that has more than a few thousand issues. So instead, you should follow GitHub's [instructions](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) on creating a _personal access token_ so that you can boost the rate limit to 5,000 requests per hour. Once you have your token, you can include it as part of the request header:
 
@@ -126,11 +123,8 @@ GITHUB_TOKEN = xxx  # Copy your GitHub token here
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ Do not share a notebook with your `GITHUB_TOKEN` pasted in it. We recommend you delete the last cell once you have executed it to avoid leaking this information accidentally. Even better, store the token in a *.env* file and use the [`python-dotenv` library](https://github.com/theskumar/python-dotenv) to load it automatically for you as an environment variable.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Do not share a notebook with your `GITHUB_TOKEN` pasted in it. We recommend you delete the last cell once you have executed it to avoid leaking this information accidentally. Even better, store the token in a *.env* file and use the [`python-dotenv` library](https://github.com/theskumar/python-dotenv) to load it automatically for you as an environment variable.
 
 Now that we have our access token, let's create a function that can download all the issues from a GitHub repository:
 
@@ -237,11 +231,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **Try it out!** Calculate the average time it takes to close issues in 🤗 Datasets. You may find the `Dataset.filter()` function useful to filter out the pull requests and open issues, and you can use the `Dataset.set_format()` function to convert the dataset to a `DataFrame` so you can easily manipulate the `created_at` and `closed_at` timestamps. For bonus points, calculate the average time it takes to close pull requests.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Calculate the average time it takes to close issues in 🤗 Datasets. You may find the `Dataset.filter()` function useful to filter out the pull requests and open issues, and you can use the `Dataset.set_format()` function to convert the dataset to a `DataFrame` so you can easily manipulate the `created_at` and `closed_at` timestamps. For bonus points, calculate the average time it takes to close pull requests.
 
 Although we could proceed to further clean up the dataset by dropping or renaming some columns, it is generally a good practice to keep the dataset as "raw" as possible at this stage so that it can be easily used in multiple applications.
 
@@ -363,11 +354,8 @@ Dataset({
 
 Cool, we've pushed our dataset to the Hub and it's available for others to use! There's just one important thing left to do: adding a _dataset card_ that explains how the corpus was created and provides other useful information for the community.
 
-<Tip>
-
-💡 You can also upload a dataset to the Hugging Face Hub directly from the terminal by using `huggingface-cli` and a bit of Git magic. See the [🤗 Datasets guide](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) for details on how to do this.
-
-</Tip>
+> [!TIP]
+> 💡 You can also upload a dataset to the Hugging Face Hub directly from the terminal by using `huggingface-cli` and a bit of Git magic. See the [🤗 Datasets guide](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) for details on how to do this.
 
 ## Creating a dataset card[[creating-a-dataset-card]]
 
@@ -389,18 +377,12 @@ You can create the *README.md* file directly on the Hub, and you can find a temp
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **Try it out!** Use the `dataset-tagging` application and [🤗 Datasets guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) to complete the *README.md* file for your GitHub issues dataset.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the `dataset-tagging` application and [🤗 Datasets guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) to complete the *README.md* file for your GitHub issues dataset.
 
 That's it! We've seen in this section that creating a good dataset can be quite involved, but fortunately uploading it and sharing it with the community is not. In the next section we'll use our new dataset to create a semantic search engine with 🤗 Datasets that can match questions to the most relevant issues and comments.
 
-<Tip>
-
-✏️ **Try it out!** Go through the steps we took in this section to create a dataset of GitHub issues for your favorite open source library (pick something other than 🤗 Datasets, of course!). For bonus points, fine-tune a multilabel classifier to predict the tags present in the `labels` field.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Go through the steps we took in this section to create a dataset of GitHub issues for your favorite open source library (pick something other than 🤗 Datasets, of course!). For bonus points, fine-tune a multilabel classifier to predict the tags present in the `labels` field.
 
 
diff --git a/chapters/en/chapter5/6.mdx b/chapters/en/chapter5/6.mdx
index 418abbbb6..e7dfbbf49 100644
--- a/chapters/en/chapter5/6.mdx
+++ b/chapters/en/chapter5/6.mdx
@@ -176,11 +176,8 @@ Dataset({
 Okay, this has given us a few thousand comments to work with!
 
 
-<Tip>
-
-✏️ **Try it out!** See if you can use `Dataset.map()` to explode the `comments` column of `issues_dataset` _without_ resorting to the use of Pandas. This is a little tricky; you might find the ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) section of the 🤗 Datasets documentation useful for this task.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** See if you can use `Dataset.map()` to explode the `comments` column of `issues_dataset` _without_ resorting to the use of Pandas. This is a little tricky; you might find the ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) section of the 🤗 Datasets documentation useful for this task.
 
 Now that we have one comment per row, let's create a new `comments_length` column that contains the number of words per comment:
 
@@ -511,8 +508,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Not bad! Our second hit seems to match the query.
 
-<Tip>
-
-✏️ **Try it out!** Create your own query and see whether you can find an answer in the retrieved documents. You might have to increase the `k` parameter in `Dataset.get_nearest_examples()` to broaden the search.
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️ **Try it out!** Create your own query and see whether you can find an answer in the retrieved documents. You might have to increase the `k` parameter in `Dataset.get_nearest_examples()` to broaden the search.
\ No newline at end of file
diff --git a/chapters/en/chapter6/2.mdx b/chapters/en/chapter6/2.mdx
index e966c486e..cd50a0c85 100644
--- a/chapters/en/chapter6/2.mdx
+++ b/chapters/en/chapter6/2.mdx
@@ -11,11 +11,8 @@ If a language model is not available in the language you are interested in, or i
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ Training a tokenizer is not the same as training a model! Model training uses stochastic gradient descent to make the loss a little bit smaller for each batch. It's randomized by nature (meaning you have to set some seeds to get the same results when doing the same training twice). Training a tokenizer is a statistical process that tries to identify which subwords are the best to pick for a given corpus, and the exact rules used to pick them depend on the tokenization algorithm. It's deterministic, meaning you always get the same results when training with the same algorithm on the same corpus.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Training a tokenizer is not the same as training a model! Model training uses stochastic gradient descent to make the loss a little bit smaller for each batch. It's randomized by nature (meaning you have to set some seeds to get the same results when doing the same training twice). Training a tokenizer is a statistical process that tries to identify which subwords are the best to pick for a given corpus, and the exact rules used to pick them depend on the tokenization algorithm. It's deterministic, meaning you always get the same results when training with the same algorithm on the same corpus.
 
 ## Assembling a corpus[[assembling-a-corpus]]
 
diff --git a/chapters/en/chapter6/3.mdx b/chapters/en/chapter6/3.mdx
index 88250f6df..e3e5ee182 100644
--- a/chapters/en/chapter6/3.mdx
+++ b/chapters/en/chapter6/3.mdx
@@ -33,11 +33,8 @@ In the following discussion, we will often make the distinction between "slow" a
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ When tokenizing a single sentence, you won't always see a difference in speed between the slow and fast versions of the same tokenizer. In fact, the fast version might actually be slower! It's only when tokenizing lots of texts in parallel at the same time that you will be able to clearly see the difference.
-
-</Tip>
+> [!WARNING]
+> ⚠️ When tokenizing a single sentence, you won't always see a difference in speed between the slow and fast versions of the same tokenizer. In fact, the fast version might actually be slower! It's only when tokenizing lots of texts in parallel at the same time that you will be able to clearly see the difference.
 
 ## Batch encoding[[batch-encoding]]
 
@@ -107,13 +104,10 @@ encoding.word_ids()
 
 We can see that the tokenizer's special tokens `[CLS]` and `[SEP]` are mapped to `None`, and then each token is mapped to the word it originates from. This is especially useful to determine if a token is at the start of a word or if two tokens are in the same word. We could rely on the `##` prefix for that, but it only works for BERT-like tokenizers; this method works for any type of tokenizer as long as it's a fast one. In the next chapter, we'll see how we can use this capability to apply the labels we have for each word properly to the tokens in tasks like named entity recognition (NER) and part-of-speech (POS) tagging. We can also use it to mask all the tokens coming from the same word in masked language modeling (a technique called _whole word masking_).
 
-<Tip>
-
-The notion of what a word is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.
-
-✏️ **Try it out!** Create a tokenizer from the `bert-base-cased` and `roberta-base` checkpoints and tokenize "81s" with them. What do you observe? What are the word IDs?
-
-</Tip>
+> [!TIP]
+> Defining what a word is can be complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so they will consider it two words.
+>
+> ✏️ **Try it out!** Create a tokenizer from the `bert-base-cased` and `roberta-base` checkpoints and tokenize "81s" with them. What do you observe? What are the word IDs?
 
 Similarly, there is a `sentence_ids()` method that we can use to map a token to the sentence it came from (though in this case, the `token_type_ids` returned by the tokenizer can give us the same information).
 
@@ -130,11 +124,8 @@ Sylvain
 
 As we mentioned previously, this is all powered by the fact the fast tokenizer keeps track of the span of text each token comes from in a list of *offsets*. To illustrate their use, next we'll show you how to replicate the results of the `token-classification` pipeline manually.
 
-<Tip>
-
-✏️ **Try it out!** Create your own example text and see if you can understand which tokens are associated with word ID, and also how to extract the character spans for a single word. For bonus points, try using two sentences as input and see if the sentence IDs make sense to you.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Create your own example text and see if you can understand which tokens are associated with each word ID, and also how to extract the character spans for a single word. For bonus points, try using two sentences as input and see if the sentence IDs make sense to you.
 
 ## Inside the `token-classification` pipeline[[inside-the-token-classification-pipeline]]
 
diff --git a/chapters/en/chapter6/3b.mdx b/chapters/en/chapter6/3b.mdx
index d0affbcba..4fbdea5c1 100644
--- a/chapters/en/chapter6/3b.mdx
+++ b/chapters/en/chapter6/3b.mdx
@@ -275,11 +275,8 @@ We're not quite done yet, but at least we already have the correct score for the
 0.97773
 ```
 
-<Tip>
-
-✏️ **Try it out!** Compute the start and end indices for the five most likely answers.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Compute the start and end indices for the five most likely answers.
 
 We have the `start_index` and `end_index` of the answer in terms of tokens, so now we just need to convert to the character indices in the context. This is where the offsets will be super useful. We can grab them and use them like we did in the token classification task:
 
@@ -313,11 +310,8 @@ print(result)
 
 Great! That's the same as in our first example!
 
-<Tip>
-
-✏️ **Try it out!** Use the best scores you computed earlier to show the five most likely answers. To check your results, go back to the first pipeline and pass in `top_k=5` when calling it.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the best scores you computed earlier to show the five most likely answers. To check your results, go back to the first pipeline and pass in `top_k=5` when calling it.
 
 ## Handling long contexts[[handling-long-contexts]]
 
@@ -608,11 +602,8 @@ print(candidates)
 
 Those two candidates correspond to the best answers the model was able to find in each chunk. The model is way more confident the right answer is in the second part (which is a good sign!). Now we just have to map those two token spans to spans of characters in the context (we only need to map the second one to have our answer, but it's interesting to see what the model has picked in the first chunk).
 
-<Tip>
-
-✏️ **Try it out!** Adapt the code above to return the scores and spans for the five most likely answers (in total, not per chunk).
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Adapt the code above to return the scores and spans for the five most likely answers (in total, not per chunk).
 
 The `offsets` we grabbed earlier is actually a list of offsets, with one list per chunk of text:
 
@@ -633,10 +624,7 @@ for candidate, offset in zip(candidates, offsets):
 
 If we ignore the first result, we get the same result as our pipeline for this long context -- yay!
 
-<Tip>
-
-✏️ **Try it out!** Use the best scores you computed before to show the five most likely answers (for the whole context, not each chunk). To check your results, go back to the first pipeline and pass in `top_k=5` when calling it.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Use the best scores you computed before to show the five most likely answers (for the whole context, not each chunk). To check your results, go back to the first pipeline and pass in `top_k=5` when calling it.
 
 This concludes our deep dive into the tokenizer's capabilities. We will put all of this in practice again in the next chapter, when we show you how to fine-tune a model on a range of common NLP tasks.
diff --git a/chapters/en/chapter6/4.mdx b/chapters/en/chapter6/4.mdx
index 8c7999588..5008699c6 100644
--- a/chapters/en/chapter6/4.mdx
+++ b/chapters/en/chapter6/4.mdx
@@ -47,11 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 In this example, since we picked the `bert-base-uncased` checkpoint, the normalization applied lowercasing and removed the accents. 
 
-<Tip>
-
-✏️ **Try it out!** Load a tokenizer from the `bert-base-cased` checkpoint and pass the same example to it. What are the main differences you can see between the cased and uncased versions of the tokenizer?
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Load a tokenizer from the `bert-base-cased` checkpoint and pass the same example to it. What are the main differences you can see between the cased and uncased versions of the tokenizer?
 
 ## Pre-tokenization[[pre-tokenization]]
 
diff --git a/chapters/en/chapter6/5.mdx b/chapters/en/chapter6/5.mdx
index e877653ef..e8375d2c9 100644
--- a/chapters/en/chapter6/5.mdx
+++ b/chapters/en/chapter6/5.mdx
@@ -11,11 +11,8 @@ Byte-Pair Encoding (BPE) was initially developed as an algorithm to compress tex
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 This section covers BPE in depth, going as far as showing a full implementation. You can skip to the end if you just want a general overview of the tokenization algorithm.
-
-</Tip>
+> [!TIP]
+> 💡 This section covers BPE in depth, going as far as showing a full implementation. You can skip to the end if you just want a general overview of the tokenization algorithm.
 
 ## Training algorithm[[training-algorithm]]
 
@@ -27,11 +24,8 @@ BPE training starts by computing the unique set of words used in the corpus (aft
 
 The base vocabulary will then be `["b", "g", "h", "n", "p", "s", "u"]`. For real-world cases, that base vocabulary will contain all the ASCII characters, at the very least, and probably some Unicode characters as well. If an example you are tokenizing uses a character that is not in the training corpus, that character will be converted to the unknown token. That's one reason why lots of NLP models are very bad at analyzing content with emojis, for instance.
 
-<Tip>
-
-The GPT-2 and RoBERTa tokenizers (which are pretty similar) have a clever way to deal with this: they don't look at words as being written with Unicode characters, but with bytes. This way the base vocabulary has a small size (256), but every character you can think of will still be included and not end up being converted to the unknown token. This trick is called *byte-level BPE*.
-
-</Tip>
+> [!TIP]
+> The GPT-2 and RoBERTa tokenizers (which are pretty similar) have a clever way to deal with this: they don't look at words as being written with Unicode characters, but with bytes. This way the base vocabulary has a small size (256), but every character you can think of will still be included and not end up being converted to the unknown token. This trick is called *byte-level BPE*.
 
 After getting this base vocabulary, we add new tokens until the desired vocabulary size is reached by learning *merges*, which are rules to merge two elements of the existing vocabulary together into a new one. So, at the beginning these merges will create tokens with two characters, and then, as training progresses, longer subwords.
 
@@ -74,11 +68,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 And we continue like this until we reach the desired vocabulary size.
 
-<Tip>
-
-✏️ **Now your turn!** What do you think the next merge rule will be?
-
-</Tip>
+> [!TIP]
+> ✏️ **Now your turn!** What do you think the next merge rule will be?
 
 ## Tokenization algorithm[[tokenization-algorithm]]
 
@@ -99,11 +90,8 @@ Let's take the example we used during training, with the three merge rules learn
 
 The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"h"` and `"ug"` being merged.
 
-<Tip>
-
-✏️ **Now your turn!** How do you think  the word `"unhug"` will be tokenized?
-
-</Tip>
+> [!TIP]
+> ✏️ **Now your turn!** How do you think the word `"unhug"` will be tokenized?
 
 ## Implementing BPE[[implementing-bpe]]
 
@@ -315,11 +303,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 Using `train_new_from_iterator()` on the same corpus won't result in the exact same vocabulary. This is because when there is a choice of the most frequent pair, we selected the first one encountered, while the 🤗 Tokenizers library selects the first one based on its inner IDs.
-
-</Tip>
+> [!TIP]
+> 💡 Using `train_new_from_iterator()` on the same corpus won't result in the exact same vocabulary. This is because when there is a choice of the most frequent pair, we selected the first one encountered, while the 🤗 Tokenizers library selects the first one based on its inner IDs.
 
 To tokenize a new text, we pre-tokenize it, split it, then apply all the merge rules learned:
 
@@ -351,10 +336,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ Our implementation will throw an error if there is an unknown character since we didn't do anything to handle them. GPT-2 doesn't actually have an unknown token (it's impossible to get an unknown character when using byte-level BPE), but this could happen here because we did not include all the possible bytes in the initial vocabulary. This aspect of BPE is beyond the scope of this section, so we've left the details out.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Our implementation will throw an error if there is an unknown character since we didn't do anything to handle them. GPT-2 doesn't actually have an unknown token (it's impossible to get an unknown character when using byte-level BPE), but this could happen here because we did not include all the possible bytes in the initial vocabulary. This aspect of BPE is beyond the scope of this section, so we've left the details out.
 
 That's it for the BPE algorithm! Next, we'll have a look at WordPiece.
\ No newline at end of file
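Note on the BPE hunks above: the context line "we pre-tokenize it, split it, then apply all the merge rules learned" can be illustrated with a minimal sketch. The `apply_merges` helper and the three merge rules below are hypothetical and chosen only for illustration; they are not the course's implementation, and they skip pre-tokenization and unknown-character handling entirely.

```python
def apply_merges(word, merges):
    """Split `word` into characters, then apply each merge rule in learned order."""
    symbols = list(word)
    for left, right in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                # Replace the adjacent pair with the merged symbol.
                symbols[i : i + 2] = [left + right]
            else:
                i += 1
    return symbols


# Hypothetical merge rules, listed in the order they were "learned".
merges = [("u", "g"), ("u", "n"), ("h", "ug")]

print(apply_merges("hug", merges))  # ['hug']
print(apply_merges("bug", merges))  # ['b', 'ug']
```

The order of the rules matters: `("h", "ug")` can only fire because `("u", "g")` was applied first.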
diff --git a/chapters/en/chapter6/6.mdx b/chapters/en/chapter6/6.mdx
index eb0cbddeb..96957b763 100644
--- a/chapters/en/chapter6/6.mdx
+++ b/chapters/en/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece is the tokenization algorithm Google developed to pretrain BERT. It ha
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 This section covers WordPiece in depth, going as far as showing a full implementation. You can skip to the end if you just want a general overview of the tokenization algorithm.
-
-</Tip>
+> [!TIP]
+> 💡 This section covers WordPiece in depth, going as far as showing a full implementation. You can skip to the end if you just want a general overview of the tokenization algorithm.
 
 ## Training algorithm[[training-algorithm]]
 
-<Tip warning={true}>
-
-⚠️ Google never open-sourced its implementation of the training algorithm of WordPiece, so what follows is our best guess based on the published literature. It may not be 100% accurate.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google never open-sourced its implementation of the training algorithm of WordPiece, so what follows is our best guess based on the published literature. It may not be 100% accurate.
 
 Like BPE, WordPiece starts from a small vocabulary including the special tokens used by the model and the initial alphabet. Since it identifies subwords by adding a prefix (like `##` for BERT), each word is initially split by adding that prefix to all the characters inside the word. So, for instance, `"word"` gets split like this:
 
@@ -76,11 +70,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 and we continue like this until we reach the desired vocabulary size.
 
-<Tip>
-
-✏️ **Now your turn!** What will the next merge rule be?
-
-</Tip>
+> [!TIP]
+> ✏️ **Now your turn!** What will the next merge rule be?
 
 ## Tokenization algorithm[[tokenization-algorithm]]
 
@@ -92,11 +83,8 @@ As another example, let's see how the word `"bugs"` would be tokenized. `"b"` is
 
 When the tokenization gets to a stage where it's not possible to find a subword in the vocabulary, the whole word is tokenized as unknown -- so, for instance, `"mug"` would be tokenized as `["[UNK]"]`, as would `"bum"` (even if we can begin with `"b"` and `"##u"`, `"##m"` is not in the vocabulary, and the resulting tokenization will just be `["[UNK]"]`, not `["b", "##u", "[UNK]"]`). This is another difference from BPE, which would only classify the individual characters not in the vocabulary as unknown.
 
-<Tip>
-
-✏️ **Now your turn!** How will the word `"pugs"` be tokenized?
-
-</Tip>
+> [!TIP]
+> ✏️ **Now your turn!** How will the word `"pugs"` be tokenized?
 
 ## Implementing WordPiece[[implementing-wordpiece]]
 
@@ -314,11 +302,8 @@ print(vocab)
 
 As we can see, compared to BPE, this tokenizer learns parts of words as tokens a bit faster.
 
-<Tip>
-
-💡 Using `train_new_from_iterator()` on the same corpus won't result in the exact same vocabulary. This is because the 🤗 Tokenizers library does not implement WordPiece for the training (since we are not completely sure of its internals), but uses BPE instead.
-
-</Tip>
+> [!TIP]
+> 💡 Using `train_new_from_iterator()` on the same corpus won't result in the exact same vocabulary. This is because the 🤗 Tokenizers library does not implement WordPiece for the training (since we are not completely sure of its internals), but uses BPE instead.
 
 To tokenize a new text, we pre-tokenize it, split it, then apply the tokenization algorithm on each word. That is, we look for the biggest subword starting at the beginning of the first word and split it, then we repeat the process on the second part, and so on for the rest of that word and the following words in the text:
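Note on the WordPiece hunks above: the longest-match-first procedure described in the last context line, together with the rule that a word becomes `[UNK]` as a whole when any piece is missing, can be sketched as follows. The `encode_word` helper and the toy vocabulary are assumptions made for illustration, not necessarily the course's own code.

```python
def encode_word(word, vocab, prefix="##", unk="[UNK]"):
    """Greedy longest-match-first tokenization of a single word."""
    tokens = []
    while word:
        # Find the longest prefix of the remaining string that is in the vocabulary.
        end = len(word)
        while end > 0 and word[:end] not in vocab:
            end -= 1
        if end == 0:
            # No subword matches: the whole word is unknown, not just this piece.
            return [unk]
        tokens.append(word[:end])
        word = word[end:]
        if word:
            # Continuation pieces carry the "##" prefix.
            word = prefix + word
    return tokens


# Toy vocabulary, assumed here for illustration.
vocab = {"b", "h", "p", "##g", "##n", "##s", "##u", "##gs", "hu", "hug"}

print(encode_word("bugs", vocab))  # ['b', '##u', '##gs']
print(encode_word("bum", vocab))   # ['[UNK]']
print(encode_word("mug", vocab))   # ['[UNK]']
```

Returning early with a single `[UNK]` is what distinguishes this behavior from BPE, where only the unknown characters would be replaced.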
 
diff --git a/chapters/en/chapter6/7.mdx b/chapters/en/chapter6/7.mdx
index b0f0320f4..deb1c9ee4 100644
--- a/chapters/en/chapter6/7.mdx
+++ b/chapters/en/chapter6/7.mdx
@@ -13,11 +13,8 @@ SentencePiece addresses the fact that not all languages use spaces to separate w
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 This section covers Unigram in depth, going as far as showing a full implementation. You can skip to the end if you just want a general overview of the tokenization algorithm.
-
-</Tip>
+> [!TIP]
+> 💡 This section covers Unigram in depth, going as far as showing a full implementation. You can skip to the end if you just want a general overview of the tokenization algorithm.
 
 ## Training algorithm[[training-algorithm]]
 
@@ -58,11 +55,8 @@ Here are the frequencies of all the possible subwords in the vocabulary:
 
 So, the sum of all frequencies is 210, and the probability of the subword `"ug"` is thus 20/210.
 
-<Tip>
-
-✏️ **Now your turn!** Write the code to compute the frequencies above and double-check that the results shown are correct, as well as the total sum.
-
-</Tip>
+> [!TIP]
+> ✏️ **Now your turn!** Write the code to compute the frequencies above and double-check that the results shown are correct, as well as the total sum.
 
 Now, to tokenize a given word, we look at all the possible segmentations into tokens and compute the probability of each according to the Unigram model. Since all tokens are considered independent, this probability is just the product of the probability of each token. For instance, the tokenization `["p", "u", "g"]` of `"pug"` has the probability:
 
@@ -100,11 +94,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 Thus `"unhug"` would be tokenized as `["un", "hug"]`.
 
-<Tip>
-
-✏️ **Now your turn!** Determine the tokenization of the word `"huggun"`, and its score.
-
-</Tip>
+> [!TIP]
+> ✏️ **Now your turn!** Determine the tokenization of the word `"huggun"`, and its score.
 
 ## Back to training[[back-to-training]]
 
@@ -217,11 +208,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece uses a more efficient algorithm called Enhanced Suffix Array (ESA) to create the initial vocabulary.
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece uses a more efficient algorithm called Enhanced Suffix Array (ESA) to create the initial vocabulary.
 
 Next, we compute the sum of all frequencies, to convert the frequencies into probabilities. For our model we will store the logarithms of the probabilities, because it's more numerically stable to add logarithms than to multiply small numbers, and this will simplify the computation of the loss of the model:
 
@@ -342,11 +330,8 @@ Since `"ll"` is used in the tokenization of `"Hopefully"`, and removing it will
 0.0
 ```
 
-<Tip>
-
-💡 This approach is very inefficient, so SentencePiece uses an approximation of the loss of the model without token X: instead of starting from scratch, it just replaces token X by its segmentation in the vocabulary that is left. This way, all the scores can be computed at once at the same time as the model loss.
-
-</Tip>
+> [!TIP]
+> 💡 This approach is very inefficient, so SentencePiece uses an approximation of the loss of the model without token X: instead of starting from scratch, it just replaces token X by its segmentation in the vocabulary that is left. This way, all the scores can be computed at once at the same time as the model loss.
 
 With all of this in place, the last thing we need to do is add the special tokens used by the model to the vocabulary, then loop until we have pruned enough tokens from the vocabulary to reach our desired size:
 
@@ -380,10 +365,7 @@ tokenize("This is the Hugging Face course.", model)
 ['▁This', '▁is', '▁the', '▁Hugging', '▁Face', '▁', 'c', 'ou', 'r', 's', 'e', '.']
 ```
 
-<Tip>
-
-The XLNetTokenizer uses SentencePiece which is why the `"_"` character is included. To decode with SentencePiece, concatenate all the tokens and replace `"_"` with a space.
-
-</Tip>
+> [!TIP]
+> The XLNetTokenizer uses SentencePiece, which is why the `"▁"` character is included. To decode with SentencePiece, concatenate all the tokens and replace `"▁"` with a space.
 
 That's it for Unigram! Hopefully by now you're feeling like an expert in all things tokenizer. In the next section, we will delve into the building blocks of the 🤗 Tokenizers library, and show you how you can use them to build your own tokenizer.
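Note on the last Unigram hunk above: the decoding rule in the tip (concatenate the tokens, then turn the word-boundary marker into a space) can be sketched in a few lines. The `decode` helper is hypothetical; it uses the marker visible in the printed tokens (`▁`, U+2581), and stripping the leading space is an extra choice made here for readability.

```python
def decode(tokens, marker="\u2581"):
    """Join SentencePiece-style tokens and turn the marker back into spaces."""
    text = "".join(tokens).replace(marker, " ")
    # The first token starts with the marker, so drop the resulting leading space.
    return text.strip()


tokens = ["▁This", "▁is", "▁the", "▁Hugging", "▁Face", "▁", "c", "ou", "r", "s", "e", "."]
print(decode(tokens))  # This is the Hugging Face course.
```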
diff --git a/chapters/en/chapter6/8.mdx b/chapters/en/chapter6/8.mdx
index 38086edcf..71bcafbe2 100644
--- a/chapters/en/chapter6/8.mdx
+++ b/chapters/en/chapter6/8.mdx
@@ -111,12 +111,9 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**To go further** If you test the two versions of the previous normalizers on a string containing the unicode character `u"\u0085"` you will surely notice that these two normalizers are not exactly equivalent. 
-To not over-complicate the version with `normalizers.Sequence` too much , we haven't included the Regex replacements that the `BertNormalizer` requires when the `clean_text` argument is set to `True` - which is the default behavior. But don't worry: it is possible to get exactly the same normalization without using the handy `BertNormalizer` by adding two `normalizers.Replace`'s to the normalizers sequence. 
-
-</Tip>
+> [!TIP]
+> **To go further** If you test the two versions of the previous normalizers on a string containing the unicode character `u"\u0085"`, you will notice that these two normalizers are not exactly equivalent.
+> To avoid over-complicating the version with `normalizers.Sequence`, we haven't included the regex replacements that the `BertNormalizer` requires when the `clean_text` argument is set to `True`, which is the default behavior. But don't worry: it is possible to get exactly the same normalization without using the handy `BertNormalizer` by adding two `normalizers.Replace` steps to the normalizers sequence.
 
 Next is the pre-tokenization step. Again, there is a prebuilt `BertPreTokenizer` that we can use:
 
diff --git a/chapters/en/chapter7/1.mdx b/chapters/en/chapter7/1.mdx
index b81dc277f..068cc90ae 100644
--- a/chapters/en/chapter7/1.mdx
+++ b/chapters/en/chapter7/1.mdx
@@ -33,8 +33,5 @@ Each section can be read independently.
 {/if}
 
 
-<Tip>
-
-If you read the sections in sequence, you will notice that they have quite a bit of code and prose in common. The repetition is intentional, to allow you to dip in (or come back later) to any task that interests you and find a complete working example.
-
-</Tip>
+> [!TIP]
+> If you read the sections in sequence, you will notice that they have quite a bit of code and prose in common. The repetition is intentional, to allow you to dip in (or come back later) to any task that interests you and find a complete working example.
diff --git a/chapters/en/chapter7/2.mdx b/chapters/en/chapter7/2.mdx
index 0bae719d9..65845cbb0 100644
--- a/chapters/en/chapter7/2.mdx
+++ b/chapters/en/chapter7/2.mdx
@@ -45,11 +45,8 @@ You can find the model we'll train and upload to the Hub and double-check its pr
 
 First things first, we need a dataset suitable for token classification. In this section we will use the [CoNLL-2003 dataset](https://huggingface.co/datasets/conll2003), which contains news stories from Reuters. 
 
-<Tip>
-
-💡 As long as your dataset consists of texts split into words with their corresponding labels, you will be able to adapt the data processing procedures described here to your own dataset. Refer back to [Chapter 5](/course/chapter5) if you need a refresher on how to load your own custom data in a `Dataset`.
-
-</Tip>
+> [!TIP]
+> 💡 As long as your dataset consists of texts split into words with their corresponding labels, you will be able to adapt the data processing procedures described here to your own dataset. Refer back to [Chapter 5](/course/chapter5) if you need a refresher on how to load your own custom data in a `Dataset`.
 
 ### The CoNLL-2003 dataset[[the-conll-2003-dataset]]
 
@@ -167,11 +164,8 @@ And for an example mixing `B-` and `I-` labels, here's what the same code gives
 
 As we can see, entities spanning two words, like "European Union" and "Werner Zwingmann," are attributed a `B-` label for the first word and an `I-` label for the second.
 
-<Tip>
-
-✏️ **Your turn!** Print the same two sentences with their POS or chunking labels.
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** Print the same two sentences with their POS or chunking labels.
 
 ### Processing the data[[processing-the-data]]
 
@@ -263,11 +257,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 As we can see, our function added the `-100` for the two special tokens at the beginning and the end, and a new `0` for our word that was split into two tokens.
 
-<Tip>
-
-✏️ **Your turn!** Some researchers prefer to attribute only one label per word, and assign `-100` to the other subtokens in a given word. This is to avoid long words that split into lots of subtokens contributing heavily to the loss. Change the previous function to align labels with input IDs by following this rule.
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** Some researchers prefer to attribute only one label per word, and assign `-100` to the other subtokens in a given word. This is to avoid long words that split into lots of subtokens contributing heavily to the loss. Change the previous function to align labels with input IDs by following this rule.
 
 To preprocess our whole dataset, we need to tokenize all the inputs and apply `align_labels_with_tokens()` on all the labels. To take advantage of the speed of our fast tokenizer, it's best to tokenize lots of texts at the same time, so we'll write a function that processes a list of examples and use the `Dataset.map()` method with the option `batched=True`. The only thing that is different from our previous example is that the `word_ids()` function needs to get the index of the example we want the word IDs of when the inputs to the tokenizer are lists of texts (or in our case, list of lists of words), so we add that too:
 
@@ -429,11 +420,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ If you have a model with the wrong number of labels, you will get an obscure error when calling `model.fit()` later. This can be annoying to debug, so make sure you do this check to confirm you have the expected number of labels.
-
-</Tip>
+> [!WARNING]
+> ⚠️ If you have a model with the wrong number of labels, you will get an obscure error when calling `model.fit()` later. This can be annoying to debug, so make sure you do this check to confirm you have the expected number of labels.
 
 ### Fine-tuning the model[[fine-tuning-the-model]]
 
@@ -497,11 +485,8 @@ model.fit(
 
 You can specify the full name of the repository you want to push to with the `hub_model_id` argument (in particular, you will have to use this argument to push to an organization). For instance, when we pushed the model to the [`huggingface-course` organization](https://huggingface.co/huggingface-course), we added `hub_model_id="huggingface-course/bert-finetuned-ner"`. By default, the repository used will be in your namespace and named after the output directory you set, for example `"cool_huggingface_user/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when calling `model.fit()` and will need to set a new name.
-
-</Tip>
+> [!TIP]
+> 💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when calling `model.fit()` and will need to set a new name.
 
 Note that while the training happens, each time the model is saved (here, every epoch) it is uploaded to the Hub in the background. This way, you will be able to resume your training on another machine if necessary.
 
@@ -679,11 +664,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ If you have a model with the wrong number of labels, you will get an obscure error when calling the `Trainer.train()` method later on (something like "CUDA error: device-side assert triggered"). This is the number one cause of bugs reported by users for such errors, so make sure you do this check to confirm that you have the expected number of labels.
-
-</Tip>
+> [!WARNING]
+> ⚠️ If you have a model with the wrong number of labels, you will get an obscure error when calling the `Trainer.train()` method later on (something like "CUDA error: device-side assert triggered"). This is the number one cause of bugs reported by users for such errors, so make sure you do this check to confirm that you have the expected number of labels.
 
 ### Fine-tuning the model[[fine-tuning-the-model]]
 
@@ -721,11 +703,8 @@ args = TrainingArguments(
 
 You've seen most of those before: we set some hyperparameters (like the learning rate, the number of epochs to train for, and the weight decay), and we specify `push_to_hub=True` to indicate that we want to save the model and evaluate it at the end of every epoch, and that we want to upload our results to the Model Hub. Note that you can specify the name of the repository you want to push to with the `hub_model_id` argument (in particular, you will have to use this argument to push to an organization). For instance, when we pushed the model to the [`huggingface-course` organization](https://huggingface.co/huggingface-course), we added `hub_model_id="huggingface-course/bert-finetuned-ner"` to `TrainingArguments`. By default, the repository used will be in your namespace and named after the output directory you set, so in our case it will be `"sgugger/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when defining your `Trainer` and will need to set a new name.
-
-</Tip>
+> [!TIP]
+> 💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when defining your `Trainer` and will need to set a new name.
 
 Finally, we just pass everything to the `Trainer` and launch the training:
 
@@ -813,11 +792,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 If you're training on a TPU, you'll need to move all the code starting from the cell above into a dedicated training function. See [Chapter 3](/course/chapter3) for more details.
-
-</Tip>
+> [!TIP]
+> 🚨 If you're training on a TPU, you'll need to move all the code starting from the cell above into a dedicated training function. See [Chapter 3](/course/chapter3) for more details.
 
 Now that we have sent our `train_dataloader` to `accelerator.prepare()`, we can use its length to compute the number of training steps. Remember that we should always do this after preparing the dataloader, as that method will change its length. We use a classic linear schedule from the learning rate to 0:
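Note on the last hunk above: the "classic linear schedule from the learning rate to 0" mentioned in the final context line can be written out by hand to see what it computes. This is a sketch only: `linear_lr` is a hypothetical helper, warmup is assumed to be zero, and the numbers stand in for the real dataloader length.

```python
def linear_lr(step, num_training_steps, base_lr):
    """Learning rate after `step` updates, decaying linearly from base_lr to 0."""
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / num_training_steps


num_train_epochs = 3
num_update_steps_per_epoch = 100  # stands in for len(train_dataloader) after prepare()
num_training_steps = num_train_epochs * num_update_steps_per_epoch

for step in (0, 150, 300):
    print(step, linear_lr(step, num_training_steps, base_lr=2e-5))
```

Because the total number of steps depends on the dataloader length, computing it before `accelerator.prepare()` would give the wrong decay rate when the data is sharded across devices.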
 
diff --git a/chapters/en/chapter7/3.mdx b/chapters/en/chapter7/3.mdx
index de3da9a1f..1707e7113 100644
--- a/chapters/en/chapter7/3.mdx
+++ b/chapters/en/chapter7/3.mdx
@@ -41,11 +41,8 @@ Let's dive in!
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 If the terms "masked language modeling" and "pretrained model" sound unfamiliar to you, go check out [Chapter 1](/course/chapter1), where we explain all these core concepts, complete with videos!
-
-</Tip>
+> [!TIP]
+> 🙋 If the terms "masked language modeling" and "pretrained model" sound unfamiliar to you, go check out [Chapter 1](/course/chapter1), where we explain all these core concepts, complete with videos!
 
 ## Picking a pretrained model for masked language modeling[[picking-a-pretrained-model-for-masked-language-modeling]]
 
@@ -237,11 +234,8 @@ for row in sample:
 
 Yep, these are certainly movie reviews, and if you're old enough you may even understand the comment in the last review about owning a VHS version 😜! Although we won't need the labels for language modeling, we can already see that a `0` denotes a negative review, while a `1` corresponds to a positive one.
 
-<Tip>
-
-✏️ **Try it out!** Create a random sample of the `unsupervised` split and verify that the labels are neither `0` nor `1`. While you're at it, you could also check that the labels in the `train` and `test` splits are indeed `0` or `1` -- this is a useful sanity check that every NLP practitioner should perform at the start of a new project!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Create a random sample of the `unsupervised` split and verify that the labels are neither `0` nor `1`. While you're at it, you could also check that the labels in the `train` and `test` splits are indeed `0` or `1` -- this is a useful sanity check that every NLP practitioner should perform at the start of a new project!
 
 Now that we've had a quick look at the data, let's dive into preparing it for masked language modeling. As we'll see, there are some additional steps that one needs to take compared to the sequence classification tasks we saw in [Chapter 3](/course/chapter3). Let's go!
 
@@ -299,11 +293,8 @@ tokenizer.model_max_length
 
 This value is derived from the *tokenizer_config.json* file associated with a checkpoint; in this case we can see that the context size is 512 tokens, just like with BERT.
 
-<Tip>
-
-✏️ **Try it out!** Some Transformer models, like [BigBird](https://huggingface.co/google/bigbird-roberta-base) and [Longformer](hf.co/allenai/longformer-base-4096), have a much longer context length than BERT and other early Transformer models. Instantiate the tokenizer for one of these checkpoints and verify that the `model_max_length` agrees with what's quoted on its model card.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Some Transformer models, like [BigBird](https://huggingface.co/google/bigbird-roberta-base) and [Longformer](https://hf.co/allenai/longformer-base-4096), have a much longer context length than BERT and other early Transformer models. Instantiate the tokenizer for one of these checkpoints and verify that the `model_max_length` agrees with what's quoted on its model card.
 
 So, in order to run our experiments on GPUs like those found on Google Colab, we'll pick something a bit smaller that can fit in memory:
 
@@ -311,11 +302,8 @@ So, in order to run our experiments on GPUs like those found on Google Colab, we
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-Note that using a small chunk size can be detrimental in real-world scenarios, so you should use a size that corresponds to the use case you will apply your model to.
-
-</Tip>
+> [!WARNING]
+> Note that using a small chunk size can be detrimental in real-world scenarios, so you should use a size that corresponds to the use case you will apply your model to.
 
 Now comes the fun part. To show how the concatenation works, let's take a few reviews from our tokenized training set and print out the number of tokens per review:
 
@@ -472,11 +460,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 Nice, it worked! We can see that the `[MASK]` token has been randomly inserted at various locations in our text. These will be the tokens which our model will have to predict during training -- and the beauty of the data collator is that it will randomize the `[MASK]` insertion with every batch! 
 
-<Tip>
-
-✏️ **Try it out!** Run the code snippet above several times to see the random masking happen in front of your very eyes! Also replace the `tokenizer.decode()` method with `tokenizer.convert_ids_to_tokens()` to see that sometimes a single token from a given word is masked, and not the others.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Run the code snippet above several times to see the random masking happen in front of your very eyes! Also replace the `tokenizer.decode()` method with `tokenizer.convert_ids_to_tokens()` to see that sometimes a single token from a given word is masked, and not the others.
 
 {#if fw === 'pt'}
 
@@ -586,11 +571,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **Try it out!** Run the code snippet above several times to see the random masking happen in front of your very eyes! Also replace the `tokenizer.decode()` method with `tokenizer.convert_ids_to_tokens()` to see that the tokens from a given word are always masked together.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Run the code snippet above several times to see the random masking happen in front of your very eyes! Also replace the `tokenizer.decode()` method with `tokenizer.convert_ids_to_tokens()` to see that the tokens from a given word are always masked together.
 
 Now that we have two data collators, the rest of the fine-tuning steps are standard. Training can take a while on Google Colab if you're not lucky enough to score a mythical P100 GPU 😭, so we'll first downsample the size of the training set to a few thousand examples. Don't worry, we'll still get a pretty decent language model! A quick way to downsample a dataset in 🤗 Datasets is via the `Dataset.train_test_split()` function that we saw in [Chapter 5](/course/chapter5):
 
@@ -815,11 +797,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **Your turn!** Run the training above after changing the data collator to the whole word masking collator. Do you get better results?
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** Run the training above after changing the data collator to the whole word masking collator. Do you get better results?
 
 {#if fw === 'pt'} 
 
@@ -1037,8 +1016,5 @@ Neat -- our model has clearly adapted its weights to predict words that are more
 
 This wraps up our first experiment with training a language model. In [section 6](/course/en/chapter7/6) you'll learn how to train an auto-regressive model like GPT-2 from scratch; head over there if you'd like to see how you can pretrain your very own Transformer model!
 
-<Tip>
-
-✏️ **Try it out!** To quantify the benefits of domain adaptation, fine-tune a classifier on the IMDb labels for both the pretrained and fine-tuned DistilBERT checkpoints. If you need a refresher on text classification, check out [Chapter 3](/course/chapter3). 
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** To quantify the benefits of domain adaptation, fine-tune a classifier on the IMDb labels for both the pretrained and fine-tuned DistilBERT checkpoints. If you need a refresher on text classification, check out [Chapter 3](/course/chapter3).
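Note on the `chunk_size` hunks above: the concatenate-then-chunk step that the context lines introduce ("To show how the concatenation works...") can be sketched on plain lists of token IDs. The `group_into_chunks` helper is hypothetical, and dropping the final partial chunk is an assumption made here for simplicity; padding it would be an equally valid choice.

```python
def group_into_chunks(tokenized_reviews, chunk_size):
    """Concatenate token ID lists, then split them into equal-sized chunks."""
    concatenated = [tok for review in tokenized_reviews for tok in review]
    # Assumption: drop the remainder so every chunk has exactly chunk_size tokens.
    total_length = (len(concatenated) // chunk_size) * chunk_size
    return [
        concatenated[i : i + chunk_size]
        for i in range(0, total_length, chunk_size)
    ]


# Three "reviews" of different lengths, with made-up token IDs.
reviews = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(group_into_chunks(reviews, chunk_size=4))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note that chunks can span review boundaries, which is why separator tokens such as `[SEP] [CLS]` show up in the middle of the decoded chunks.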
diff --git a/chapters/en/chapter7/4.mdx b/chapters/en/chapter7/4.mdx
index c95421e17..9414cf34d 100644
--- a/chapters/en/chapter7/4.mdx
+++ b/chapters/en/chapter7/4.mdx
@@ -156,11 +156,8 @@ It will be interesting to see if our fine-tuned model picks up on those particul
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **Your turn!** Another English word that is often used in French is "email." Find the first sample in the training dataset that uses this word. How is it translated? How does the pretrained model translate the same English sentence?
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** Another English word that is often used in French is "email." Find the first sample in the training dataset that uses this word. How is it translated? How does the pretrained model translate the same English sentence?
 
 ### Processing the data[[processing-the-data]]
 
@@ -177,11 +174,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="pt")
 
 You can also replace the `model_checkpoint` with any other model you prefer from the [Hub](https://huggingface.co/models), or a local folder where you've saved a pretrained model and a tokenizer.
 
-<Tip>
-
-💡 If you are using a multilingual tokenizer such as mBART, mBART-50, or M2M100, you will need to set the language codes of your inputs and targets in the tokenizer by setting `tokenizer.src_lang` and `tokenizer.tgt_lang` to the right values.
-
-</Tip>
+> [!TIP]
+> 💡 If you are using a multilingual tokenizer such as mBART, mBART-50, or M2M100, you will need to set the language codes of your inputs and targets in the tokenizer by setting `tokenizer.src_lang` and `tokenizer.tgt_lang` to the right values.
 
 The preparation of our data is pretty straightforward. There's just one thing to remember: you need to ensure that the tokenizer processes the targets in the output language (here, French). You can do this by passing the targets to the `text_target` argument of the tokenizer's `__call__` method.
 
@@ -231,17 +225,11 @@ def preprocess_function(examples):
 
 Note that we set the same maximum length for our inputs and outputs. Since the texts we're dealing with seem pretty short, we use 128.
 
-<Tip>
+> [!TIP]
+> 💡 If you are using a T5 model (more specifically, one of the `t5-xxx` checkpoints), the model will expect the text inputs to have a prefix indicating the task at hand, such as `translate English to French:`.
 
-💡 If you are using a T5 model (more specifically, one of the `t5-xxx` checkpoints), the model will expect the text inputs to have a prefix indicating the task at hand, such as `translate: English to French:`.
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ We don't pay attention to the attention mask of the targets, as the model won't expect it. Instead, the labels corresponding to a padding token should be set to `-100` so they are ignored in the loss computation. This will be done by our data collator later on since we are applying dynamic padding, but if you use padding here, you should adapt the preprocessing function to set all labels that correspond to the padding token to `-100`.
-
-</Tip>
+> [!WARNING]
+> ⚠️ We don't pay attention to the attention mask of the targets, as the model won't expect it. Instead, the labels corresponding to a padding token should be set to `-100` so they are ignored in the loss computation. This will be done by our data collator later on since we are applying dynamic padding, but if you use padding here, you should adapt the preprocessing function to set all labels that correspond to the padding token to `-100`.
 
 We can now apply that preprocessing in one go on all the splits of our dataset:
 
@@ -649,11 +637,8 @@ model.fit(
 
 Note that you can specify the name of the repository you want to push to with the `hub_model_id` argument (in particular, you will have to use this argument to push to an organization). For instance, when we pushed the model to the [`huggingface-course` organization](https://huggingface.co/huggingface-course), we added `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` to `Seq2SeqTrainingArguments`. By default, the repository used will be in your namespace and named after the output directory you set, so here it will be `"sgugger/marian-finetuned-kde4-en-to-fr"` (which is the model we linked to at the beginning of this section).
 
-<Tip>
-
-💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when calling `model.fit()` and will need to set a new name.
-
-</Tip>
+> [!TIP]
+> 💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when calling `model.fit()` and will need to set a new name.
 
 Finally, let's see what our metrics look like now that training has finished:
 
@@ -699,11 +684,8 @@ Apart from the usual hyperparameters (like learning rate, number of epochs, batc
 
 Note that you can specify the full name of the repository you want to push to with the `hub_model_id` argument (in particular, you will have to use this argument to push to an organization). For instance, when we pushed the model to the [`huggingface-course` organization](https://huggingface.co/huggingface-course), we added `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` to `Seq2SeqTrainingArguments`. By default, the repository used will be in your namespace and named after the output directory you set, so in our case it will be `"sgugger/marian-finetuned-kde4-en-to-fr"` (which is the model we linked to at the beginning of this section).
 
-<Tip>
-
-💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when defining your `Seq2SeqTrainer` and will need to set a new name.
-
-</Tip>
+> [!TIP]
+> 💡 If the output directory you are using already exists, it needs to be a local clone of the repository you want to push to. If it isn't, you'll get an error when defining your `Seq2SeqTrainer` and will need to set a new name.
 
 
 Finally, we just pass everything to the `Seq2SeqTrainer`:
@@ -995,8 +977,5 @@ translator(
 
 Another great example of domain adaptation!
 
-<Tip>
-
-✏️ **Your turn!** What does the model return on the sample with the word "email" you identified earlier?
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** What does the model return on the sample with the word "email" you identified earlier?
diff --git a/chapters/en/chapter7/5.mdx b/chapters/en/chapter7/5.mdx
index b8afcfaa0..869c65e44 100644
--- a/chapters/en/chapter7/5.mdx
+++ b/chapters/en/chapter7/5.mdx
@@ -86,11 +86,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ **Try it out!** Change the random seed in the `Dataset.shuffle()` command to explore other reviews in the corpus. If you're a Spanish speaker, take a look at some of the reviews in `spanish_dataset` to see if the titles also seem like reasonable summaries.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Change the random seed in the `Dataset.shuffle()` command to explore other reviews in the corpus. If you're a Spanish speaker, take a look at some of the reviews in `spanish_dataset` to see if the titles also seem like reasonable summaries.
 
 This sample shows the diversity of reviews one typically finds online, ranging from positive to negative (and everything in between!). Although the example with the "meh" title is not very informative, the other titles look like decent summaries of the reviews themselves. Training a summarization model on all 400,000 reviews would take far too long on a single GPU, so instead we'll focus on generating summaries for a single domain of products. To get a feel for what domains we can choose from, let's convert `english_dataset` to a `pandas.DataFrame` and compute the number of reviews per product category:
 
@@ -228,11 +225,8 @@ We'll focus on mT5, an interesting architecture based on T5 that was pretrained
 mT5 doesn't use prefixes, but shares much of the versatility of T5 and has the advantage of being multilingual. Now that we've picked a model, let's take a look at preparing our data for training.
 
 
-<Tip>
-
-✏️ **Try it out!** Once you've worked through this section, see how well mT5 compares to mBART by fine-tuning the latter with the same techniques. For bonus points, you can also try fine-tuning T5 on just the English reviews. Since T5 has a special prefix prompt, you'll need to prepend `summarize:` to the input examples in the preprocessing steps below.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Once you've worked through this section, see how well mT5 compares to mBART by fine-tuning the latter with the same techniques. For bonus points, you can also try fine-tuning T5 on just the English reviews. Since T5 has a special prefix prompt, you'll need to prepend `summarize:` to the input examples in the preprocessing steps below.
 
 ## Preprocessing the data[[preprocessing-the-data]]
 
@@ -247,11 +241,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡 In the early stages of your NLP projects, a good practice is to train a class of "small" models on a small sample of data. This allows you to debug and iterate faster toward an end-to-end workflow. Once you are confident in the results, you can always scale up the model by simply changing the model checkpoint!
-
-</Tip>
+> [!TIP]
+> 💡 In the early stages of your NLP projects, a good practice is to train a class of "small" models on a small sample of data. This allows you to debug and iterate faster toward an end-to-end workflow. Once you are confident in the results, you can always scale up the model by simply changing the model checkpoint!
 
 Let's test out the mT5 tokenizer on a small example:
 
@@ -306,11 +297,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 Now that the corpus has been preprocessed, let's take a look at some metrics that are commonly used for summarization. As we'll see, there is no silver bullet when it comes to measuring the quality of machine-generated text.
 
-<Tip>
-
-💡 You may have noticed that we used `batched=True` in our `Dataset.map()` function above. This encodes the examples in batches of 1,000 (the default) and allows you to make use of the multithreading capabilities of the fast tokenizers in 🤗 Transformers. Where possible, try using `batched=True` to get the most out of your preprocessing!
-
-</Tip>
+> [!TIP]
+> 💡 You may have noticed that we used `batched=True` in our `Dataset.map()` function above. This encodes the examples in batches of 1,000 (the default) and allows you to make use of the multithreading capabilities of the fast tokenizers in 🤗 Transformers. Where possible, try using `batched=True` to get the most out of your preprocessing!
 
 
 ## Metrics for text summarization[[metrics-for-text-summarization]]
@@ -328,11 +316,8 @@ reference_summary = "I loved reading the Hunger Games"
 
 One way to compare them could be to count the number of overlapping words, which in this case would be 6. However, this is a bit crude, so instead ROUGE is based on computing the _precision_ and _recall_ scores for the overlap.
 
-<Tip>
-
-🙋 Don't worry if this is the first time you've heard of precision and recall -- we'll go through some explicit examples together to make it all clear. These metrics are usually encountered in classification tasks, so if you want to understand how precision and recall are defined in that context, we recommend checking out the `scikit-learn` [guides](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
-
-</Tip>
+> [!TIP]
+> 🙋 Don't worry if this is the first time you've heard of precision and recall -- we'll go through some explicit examples together to make it all clear. These metrics are usually encountered in classification tasks, so if you want to understand how precision and recall are defined in that context, we recommend checking out the `scikit-learn` [guides](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
 
 For ROUGE, recall measures how much of the reference summary is captured by the generated one. If we are just comparing words, recall can be calculated according to the following formula:
 
@@ -384,11 +369,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 
 Great, the precision and recall numbers match up! Now what about those other ROUGE scores? `rouge2` measures the overlap between bigrams (think the overlap of pairs of words), while `rougeL` and `rougeLsum` measure the longest matching sequences of words by looking for the longest common substrings in the generated and reference summaries. The "sum" in `rougeLsum` refers to the fact that this metric is computed over a whole summary, while `rougeL` is computed as the average over individual sentences.
 
-<Tip>
-
-✏️ **Try it out!** Create your own example of a generated and reference summary and see if the resulting ROUGE scores agree with a manual calculation based on the formulas for precision and recall. For bonus points, split the text into bigrams and compare the precision and recall for the `rouge2` metric.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Create your own example of a generated and reference summary and see if the resulting ROUGE scores agree with a manual calculation based on the formulas for precision and recall. For bonus points, split the text into bigrams and compare the precision and recall for the `rouge2` metric.
 
 We'll use these ROUGE scores to track the performance of our model, but before doing that let's do something every good NLP practitioner should do: create a strong, yet simple baseline!
 
@@ -478,11 +460,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 If you're wondering why you don't see any warnings about fine-tuning the model on a downstream task, that's because for sequence-to-sequence tasks we keep all the weights of the network. Compare this to our text classification model in [Chapter 3](/course/chapter3), where the head of the pretrained model was replaced with a randomly initialized network.
-
-</Tip>
+> [!TIP]
+> 💡 If you're wondering why you don't see any warnings about fine-tuning the model on a downstream task, that's because for sequence-to-sequence tasks we keep all the weights of the network. Compare this to our text classification model in [Chapter 3](/course/chapter3), where the head of the pretrained model was replaced with a randomly initialized network.
 
 The next thing we need to do is log in to the Hugging Face Hub. If you're running this code in a notebook, you can do so with the following utility function:
 
@@ -843,11 +822,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 If you're training on a TPU, you'll need to move all the code above into a dedicated training function. See [Chapter 3](/course/chapter3) for more details.
-
-</Tip>
+> [!TIP]
+> 🚨 If you're training on a TPU, you'll need to move all the code above into a dedicated training function. See [Chapter 3](/course/chapter3) for more details.
 
 Now that we've prepared our objects, there are three remaining things to do:
 
diff --git a/chapters/en/chapter7/6.mdx b/chapters/en/chapter7/6.mdx
index 44551f15d..ebe5c1863 100644
--- a/chapters/en/chapter7/6.mdx
+++ b/chapters/en/chapter7/6.mdx
@@ -135,11 +135,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-Pretraining the language model will take a while. We suggest that you first run the training loop on a sample of the data by uncommenting the two partial lines above, and make sure that the training successfully completes and the models are stored. Nothing is more frustrating than a training run failing at the last step because you forgot to create a folder or because there's a typo at the end of the training loop!
-
-</Tip>
+> [!TIP]
+> Pretraining the language model will take a while. We suggest that you first run the training loop on a sample of the data by uncommenting the two partial lines above, and make sure that the training successfully completes and the models are stored. Nothing is more frustrating than a training run failing at the last step because you forgot to create a folder or because there's a typo at the end of the training loop!
 
 Let's look at an example from the dataset. We'll just show the first 200 characters of each field:
 
@@ -252,11 +249,8 @@ We now have 16.7 million examples with 128 tokens each, which corresponds to abo
 
 Now that we have the dataset ready, let's set up the model!
 
-<Tip>
-
-✏️ **Try it out!** Getting rid of all the chunks that are smaller than the context size wasn't a big issue here because we're using small context windows. As you increase the context size (or if you have a corpus of short documents), the fraction of chunks that are thrown away will also grow. A more efficient way to prepare the data is to join all the tokenized samples in a batch with an `eos_token_id` token in between, and then perform the chunking on the concatenated sequences. As an exercise, modify the `tokenize()` function to make use of that approach. Note that you'll want to set `truncation=False` and remove the other arguments from the tokenizer to get the full sequence of token IDs.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Getting rid of all the chunks that are smaller than the context size wasn't a big issue here because we're using small context windows. As you increase the context size (or if you have a corpus of short documents), the fraction of chunks that are thrown away will also grow. A more efficient way to prepare the data is to join all the tokenized samples in a batch with an `eos_token_id` token in between, and then perform the chunking on the concatenated sequences. As an exercise, modify the `tokenize()` function to make use of that approach. Note that you'll want to set `truncation=False` and remove the other arguments from the tokenizer to get the full sequence of token IDs.
 
 
 ## Initializing a new model[[initializing-a-new-model]]
@@ -398,11 +392,8 @@ tf_eval_dataset = model.prepare_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ Shifting the inputs and labels to align them happens inside the model, so the data collator just copies the inputs to create the labels.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Shifting the inputs and labels to align them happens inside the model, so the data collator just copies the inputs to create the labels.
 
 
 Now we have everything in place to actually train our model -- that wasn't so much work after all! Before we start training we should log in to Hugging Face. If you're working in a notebook, you can do so with the following utility function:
@@ -501,25 +492,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
+> [!TIP]
+> ✏️ **Try it out!** It only took us about 30 lines of code in addition to the `TrainingArguments` to get from raw texts to training GPT-2. Try it out with your own dataset and see if you can get good results!
 
-✏️ **Try it out!** It only took us about 30 lines of code in addition to the `TrainingArguments` to get from raw texts to training GPT-2. Try it out with your own dataset and see if you can get good results! 
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 If you have access to a machine with multiple GPUs, try to run the code there. The `Trainer` automatically manages multiple machines, and this can speed up training tremendously.
-
-{:else}
-
-💡 If you have access to a machine with multiple GPUs, you can try using a `MirroredStrategy` context to substantially speed up training. You'll need to create a `tf.distribute.MirroredStrategy` object, and make sure that any `to_tf_dataset()` or `prepare_tf_dataset()` methods as well as model creation and the call to `fit()` are all run in its `scope()` context. You can see documentation on this [here](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 If you have access to a machine with multiple GPUs, try to run the code there. The `Trainer` automatically manages multiple GPUs, and this can speed up training tremendously.
+>
+> {:else}
+>
+> 💡 If you have access to a machine with multiple GPUs, you can try using a `MirroredStrategy` context to substantially speed up training. You'll need to create a `tf.distribute.MirroredStrategy` object, and make sure that any `to_tf_dataset()` or `prepare_tf_dataset()` methods as well as model creation and the call to `fit()` are all run in its `scope()` context. You can see documentation on this [here](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
+>
+> {/if}
 
 ## Code generation with a pipeline[[code-generation-with-a-pipeline]]
 
@@ -795,11 +780,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 If you're training on a TPU, you'll need to move all the code starting at the cell above into a dedicated training function. See [Chapter 3](/course/chapter3) for more details.
-
-</Tip>
+> [!TIP]
+> 🚨 If you're training on a TPU, you'll need to move all the code starting at the cell above into a dedicated training function. See [Chapter 3](/course/chapter3) for more details.
 
 Now that we have sent our `train_dataloader` to `accelerator.prepare()`, we can use its length to compute the number of training steps. Remember that we should always do this after preparing the dataloader, as that method will change its length. We use a classic linear schedule from the learning rate to 0:
 
@@ -899,16 +881,10 @@ for epoch in range(num_train_epochs):
 
 And that's it -- you now have your own custom training loop for causal language models such as GPT-2 that you can further customize to your needs. 
 
-<Tip>
-
-✏️ **Try it out!** Either create your own custom loss function tailored to your use case, or add another custom step into the training loop.
-
-</Tip>
-
-<Tip>
-
-✏️ **Try it out!** When running long training experiments it's a good idea to log important metrics using tools such as TensorBoard or Weights & Biases. Add proper logging to the training loop so you can always check how the training is going.
+> [!TIP]
+> ✏️ **Try it out!** Either create your own custom loss function tailored to your use case, or add another custom step into the training loop.
 
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** When running long training experiments it's a good idea to log important metrics using tools such as TensorBoard or Weights & Biases. Add proper logging to the training loop so you can always check how the training is going.
 
 {/if}
diff --git a/chapters/en/chapter7/7.mdx b/chapters/en/chapter7/7.mdx
index 34556be21..b708071f3 100644
--- a/chapters/en/chapter7/7.mdx
+++ b/chapters/en/chapter7/7.mdx
@@ -32,11 +32,8 @@ We will fine-tune a BERT model on the [SQuAD dataset](https://rajpurkar.github.i
 
 This is actually showcasing the model that was trained and uploaded to the Hub using the code shown in this section. You can find it and double-check the predictions [here](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F).
 
-<Tip>
-
-💡 Encoder-only models like BERT tend to be great at extracting answers to factoid questions like "Who invented the Transformer architecture?" but fare poorly when given open-ended questions like "Why is the sky blue?" In these more challenging cases, encoder-decoder models like T5 and BART are typically used to synthesize the information in a way that's quite similar to [text summarization](/course/chapter7/5). If you're interested in this type of *generative* question answering, we recommend checking out our [demo](https://yjernite.github.io/lfqa.html) based on the [ELI5 dataset](https://huggingface.co/datasets/eli5).
-
-</Tip>
+> [!TIP]
+> 💡 Encoder-only models like BERT tend to be great at extracting answers to factoid questions like "Who invented the Transformer architecture?" but fare poorly when given open-ended questions like "Why is the sky blue?" In these more challenging cases, encoder-decoder models like T5 and BART are typically used to synthesize the information in a way that's quite similar to [text summarization](/course/chapter7/5). If you're interested in this type of *generative* question answering, we recommend checking out our [demo](https://yjernite.github.io/lfqa.html) based on the [ELI5 dataset](https://huggingface.co/datasets/eli5).
 
 ## Preparing the data[[preparing-the-data]]
 
@@ -359,11 +356,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 Indeed, we don't see the answer inside the context.
 
-<Tip>
-
-✏️ **Your turn!** When using the XLNet architecture, padding is applied on the left and the question and context are switched. Adapt all the code we just saw to the XLNet architecture (and add `padding=True`). Be aware that the `[CLS]` token may not be at the 0 position with padding applied.
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** When using the XLNet architecture, padding is applied on the left and the question and context are switched. Adapt all the code we just saw to the XLNet architecture (and add `padding=True`). Be aware that the `[CLS]` token may not be at the 0 position with padding applied.
 
 Now that we have seen step by step how to preprocess our training data, we can group it in a function we will apply on the whole training dataset. We'll pad every feature to the maximum length we set, as most of the contexts will be long (and the corresponding samples will be split into several features), so there is no real benefit to applying dynamic padding here:
 
@@ -908,11 +902,8 @@ By default, the repository used will be in your namespace and named after the ou
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 If the output directory you are using exists, it needs to be a local clone of the repository you want to push to (so set a new name if you get an error when defining your `Trainer`).
-
-</Tip>
+> [!TIP]
+> 💡 If the output directory you are using exists, it needs to be a local clone of the repository you want to push to (so set a new name if you get an error when defining your `Trainer`).
 
 Finally, we just pass everything to the `Trainer` class and launch the training:
 
@@ -996,11 +987,8 @@ The `Trainer` also drafts a model card with all the evaluation results and uploa
 
 At this stage, you can use the inference widget on the Model Hub to test the model and share it with your friends, family, and favorite pets. You have successfully fine-tuned a model on a question answering task -- congratulations!
 
-<Tip>
-
-✏️ **Your turn!** Try another model architecture to see if it performs better on this task!
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** Try another model architecture to see if it performs better on this task!
 
 {#if fw === 'pt'}
 
diff --git a/chapters/en/chapter8/2.mdx b/chapters/en/chapter8/2.mdx
index 6bc7cc000..0eebe3b23 100644
--- a/chapters/en/chapter8/2.mdx
+++ b/chapters/en/chapter8/2.mdx
@@ -85,11 +85,8 @@ Oh no, something seems to have gone wrong! If you're new to programming, these k
 
 There's a lot of information contained in these reports, so let's walk through the key parts together. The first thing to note is that tracebacks should be read _from bottom to top_. This might sound weird if you're used to reading English text from top to bottom, but it reflects the fact that the traceback shows the sequence of function calls that the `pipeline` makes when downloading the model and tokenizer. (Check out [Chapter 2](/course/chapter2) for more details on how the `pipeline` works under the hood.)
 
-<Tip>
-
-🚨 See that blue box around "6 frames" in the traceback from Google Colab? That's a special feature of Colab, which  compresses the traceback into "frames." If you can't seem to find the source of an error, make sure you expand the full traceback by clicking on those two little arrows.
-
-</Tip>
+> [!TIP]
+> 🚨 See that blue box around "6 frames" in the traceback from Google Colab? That's a special feature of Colab, which compresses the traceback into "frames." If you can't seem to find the source of an error, make sure you expand the full traceback by clicking on those two little arrows.
 
 This means that the last line of the traceback indicates the last error message and gives the name of the exception that was raised. In this case, the exception type is `OSError`, which indicates a system-related error. If we read the accompanying error message, we can see that there seems to be a problem with the model's *config.json* file, and we're given two suggestions to fix it:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 If you encounter an error message that is difficult to understand, just copy and paste the message into the Google or [Stack Overflow](https://stackoverflow.com/) search bar (yes, really!). There's a good chance that you're not the first person to encounter the error, and this is a good way to find solutions that others in the community have posted. For example, searching for `OSError: Can't load config for` on Stack Overflow gives several [hits](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) that could be used as a starting point for solving the problem.
-
-</Tip>
+> [!TIP]
+> 💡 If you encounter an error message that is difficult to understand, just copy and paste the message into the Google or [Stack Overflow](https://stackoverflow.com/) search bar (yes, really!). There's a good chance that you're not the first person to encounter the error, and this is a good way to find solutions that others in the community have posted. For example, searching for `OSError: Can't load config for` on Stack Overflow gives several [hits](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) that could be used as a starting point for solving the problem.
 
 The first suggestion is asking us to check whether the model ID is actually correct, so the first order of business is to copy the identifier and paste it into the Hub's search bar:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 The approach we're taking here is not foolproof, since our colleague may have tweaked the configuration of `distilbert-base-uncased` before fine-tuning the model. In real life, we'd want to check with them first, but for the purposes of this section we'll assume they used the default configuration.
-
-</Tip>
+> [!WARNING]
+> 🚨 The approach we're taking here is not foolproof, since our colleague may have tweaked the configuration of `distilbert-base-uncased` before fine-tuning the model. In real life, we'd want to check with them first, but for the purposes of this section we'll assume they used the default configuration.
 
 We can then push this to our model repository with the configuration's `push_to_hub()` function:
 
diff --git a/chapters/en/chapter8/4.mdx b/chapters/en/chapter8/4.mdx
index 9d9d9848c..888073f40 100644
--- a/chapters/en/chapter8/4.mdx
+++ b/chapters/en/chapter8/4.mdx
@@ -245,11 +245,8 @@ So `1` means `neutral`, which means the two sentences we saw above are not in co
 
 We don't have token type IDs here, since DistilBERT does not expect them; if you have some in your model, you should also make sure that they properly match where the first and second sentences are in the input.
 
-<Tip>
-
-✏️ **Your turn!** Check that everything seems correct with the second element of the training dataset.
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** Check that everything seems correct with the second element of the training dataset.
 
 We are only doing the check on the training set here, but you should of course double-check the validation and test sets the same way.
 
@@ -522,11 +519,8 @@ Whenever you get an error message that starts with `RuntimeError: CUDA out of me
 
 To solve this issue, you just need to use less GPU space -- something that is often easier said than done. First, make sure you don't have two models on the GPU at the same time (unless that's required for your problem, of course). Then, you should probably reduce your batch size, as it directly affects the sizes of all the intermediate outputs of the model and their gradients. If the problem persists, consider using a smaller version of your model.
 
-<Tip>
-
-In the next part of the course, we'll look at more advanced techniques that can help you reduce your memory footprint and let you fine-tune the biggest models.
-
-</Tip>
+> [!TIP]
+> In the next part of the course, we'll look at more advanced techniques that can help you reduce your memory footprint and let you fine-tune the biggest models.
 
 ### Evaluating the model[[evaluating-the-model]]
 
@@ -553,11 +547,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 You should always make sure you can run `trainer.evaluate()` before launching `trainer.train()`, to avoid wasting lots of compute resources before hitting an error.
-
-</Tip>
+> [!TIP]
+> 💡 You should always make sure you can run `trainer.evaluate()` before launching `trainer.train()`, to avoid wasting lots of compute resources before hitting an error.
 
 Before attempting to debug a problem in the evaluation loop, you should first make sure that you've had a look at the data, are able to form a batch properly, and can run your model on it. We've completed all of those steps, so the following code can be executed without error:
 
@@ -687,11 +678,8 @@ trainer.train()
 
 In this instance, there are no more problems, and our script will fine-tune a model that should give reasonable results. But what can we do when the training proceeds without any error, and the model trained does not perform well at all? That's the hardest part of machine learning, and we'll show you a few techniques that can help.
 
-<Tip>
-
-💡 If you're using a manual training loop, the same steps apply to debug your training pipeline, but it's easier to separate them. Make sure you have not forgotten the `model.eval()` or `model.train()` at the right places, or the `zero_grad()` at each step, however!
-
-</Tip>
+> [!TIP]
+> 💡 If you're using a manual training loop, the same steps apply to debug your training pipeline, but it's easier to separate them. Make sure you have not forgotten the `model.eval()` or `model.train()` at the right places, or the `zero_grad()` at each step, however!
 
 ## Debugging silent errors during training[[debugging-silent-errors-during-training]]
 
@@ -706,11 +694,8 @@ Your model will only learn something if it's actually possible to learn anything
 - Is there one label that's more common than the others?
 - What should the loss/metric be if the model predicted a random answer/always the same answer?
 
-<Tip warning={true}>
-
-⚠️ If you are doing distributed training, print samples of your dataset in each process and triple-check that you get the same thing. One common bug is to have some source of randomness in the data creation that makes each process have a different version of the dataset.
-
-</Tip>
+> [!WARNING]
+> ⚠️ If you are doing distributed training, print samples of your dataset in each process and triple-check that you get the same thing. One common bug is to have some source of randomness in the data creation that makes each process have a different version of the dataset.
 
 After looking at your data, go through a few of the model's predictions and decode them too. If the model is always predicting the same thing, it might be because your dataset is biased toward one category (for classification problems); techniques like oversampling rare classes might help.
 
@@ -739,11 +724,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 If your training data is unbalanced, make sure to build a batch of training data containing all the labels.
-
-</Tip>
+> [!TIP]
+> 💡 If your training data is unbalanced, make sure to build a batch of training data containing all the labels.
 
 The resulting model should have close-to-perfect results on the same `batch`. Let's compute the metric on the resulting predictions:
 
@@ -764,11 +746,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 If you don't manage to have your model obtain perfect results like this, it means there is something wrong with the way you framed the problem or your data, so you should fix that. Only when you manage to pass the overfitting test can you be sure that your model can actually learn something.
 
-<Tip warning={true}>
-
-⚠️ You will have to recreate your model and your `Trainer` after this test, as the model obtained probably won't be able to recover and learn something useful on your full dataset.
-
-</Tip>
+> [!WARNING]
+> ⚠️ You will have to recreate your model and your `Trainer` after this test, as the model obtained probably won't be able to recover and learn something useful on your full dataset.
 
 ### Don't tune anything until you have a first baseline[[dont-tune-anything-until-you-have-a-first-baseline]]
 
diff --git a/chapters/en/chapter8/4_tf.mdx b/chapters/en/chapter8/4_tf.mdx
index 675219820..9358bcba9 100644
--- a/chapters/en/chapter8/4_tf.mdx
+++ b/chapters/en/chapter8/4_tf.mdx
@@ -111,15 +111,12 @@ model.compile(optimizer="adam")
 
 Now we'll use the model's internal loss, and this problem should be resolved!
 
-<Tip>
-
-✏️ **Your turn!** As an optional challenge after we've resolved the other issues, you can try coming back to this step and getting the model to work with the original Keras-computed loss instead of the internal loss. You'll need to add `"labels"` to the `label_cols` argument of `to_tf_dataset()` to ensure that the labels are correctly outputted, which will get you gradients -- but there's one more problem with the loss that we specified. Training will still run with this problem, but learning will be very slow and will plateau at a high training loss. Can you figure out what it is?
-
-A ROT13-encoded hint, if you're stuck: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
-
-And a second hint: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
-
-</Tip>
+> [!TIP]
+> ✏️ **Your turn!** As an optional challenge after we've resolved the other issues, you can try coming back to this step and getting the model to work with the original Keras-computed loss instead of the internal loss. You'll need to add `"labels"` to the `label_cols` argument of `to_tf_dataset()` to ensure that the labels are correctly outputted, which will get you gradients -- but there's one more problem with the loss that we specified. Training will still run with this problem, but learning will be very slow and will plateau at a high training loss. Can you figure out what it is?
+>
+> A ROT13-encoded hint, if you're stuck: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
+>
+> And a second hint: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
 
 Now, let's try training. We should get gradients now, so hopefully (ominous music plays here) we can just call `model.fit()` and everything will work fine!
 
@@ -362,11 +359,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡 You can also import the `create_optimizer()` function from 🤗 Transformers, which will give you an AdamW optimizer with correct weight decay as well as learning rate warmup and decay. This optimizer will often produce slightly better results than the ones you get with the default Adam optimizer.
-
-</Tip>
+> [!TIP]
+> 💡 You can also import the `create_optimizer()` function from 🤗 Transformers, which will give you an AdamW optimizer with correct weight decay as well as learning rate warmup and decay. This optimizer will often produce slightly better results than the ones you get with the default Adam optimizer.
 
 Now, we can try fitting the model with the new, improved learning rate:
 
@@ -388,11 +382,8 @@ We've covered the issues in the script above, but there are several other common
 
 The telltale sign of running out of memory is an error like "OOM when allocating tensor" -- OOM is short for "out of memory." This is a very common hazard when dealing with large language models. If you encounter this, a good strategy is to halve your batch size and try again. Bear in mind, though, that some models are *very* large. For example, the full-size GPT-2 has 1.5B parameters, which means you'll need 6 GB of memory just to store the model, and another 6 GB for its gradients! Training the full GPT-2 model will usually require over 20 GB of VRAM no matter what batch size you use, which only a few GPUs have. More lightweight models like `distilbert-base-cased` are much easier to run, and train much more quickly too.
 
-<Tip>
-
-In the next part of the course, we'll look at more advanced techniques that can help you reduce your memory footprint and let you fine-tune the biggest models.
-
-</Tip>
+> [!TIP]
+> In the next part of the course, we'll look at more advanced techniques that can help you reduce your memory footprint and let you fine-tune the biggest models.
 
 ### Hungry Hungry TensorFlow 🦛[[hungry-hungry-tensorflow]]
 
@@ -448,21 +439,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 If your training data is unbalanced, make sure to build a batch of training data containing all the labels.
-
-</Tip>
+> [!TIP]
+> 💡 If your training data is unbalanced, make sure to build a batch of training data containing all the labels.
 
 The resulting model should have close-to-perfect results on the `batch`, with a loss declining quickly toward 0 (or the minimum value for the loss you're using).
 
 If you don't manage to have your model obtain perfect results like this, it means there is something wrong with the way you framed the problem or your data, so you should fix that. Only when you manage to pass the overfitting test can you be sure that your model can actually learn something.
 
-<Tip warning={true}>
-
-⚠️ You will have to recreate your model and recompile after this overfitting test, as the model obtained probably won't be able to recover and learn something useful on your full dataset.
-
-</Tip>
+> [!WARNING]
+> ⚠️ You will have to recreate your model and recompile after this overfitting test, as the model obtained probably won't be able to recover and learn something useful on your full dataset.
 
 ### Don't tune anything until you have a first baseline[[dont-tune-anything-until-you-have-a-first-baseline]]
 
diff --git a/chapters/en/chapter8/5.mdx b/chapters/en/chapter8/5.mdx
index a17b9c234..d6220fd04 100644
--- a/chapters/en/chapter8/5.mdx
+++ b/chapters/en/chapter8/5.mdx
@@ -17,11 +17,8 @@ When you are sure you have a bug in your hand, the first step is to build a mini
 
 It's very important to isolate the piece of code that produces the bug, as no one in the Hugging Face team is a magician (yet), and they can't fix what they can't see. A minimal reproducible example should, as the name indicates, be reproducible. This means that it should not rely on any external files or data you may have. Try to replace the data you are using with some dummy values that look like your real ones and still produce the same error.
 
-<Tip>
-
-🚨 Many issues in the 🤗 Transformers repository are unsolved because the data used to reproduce them is not accessible.
-
-</Tip>
+> [!TIP]
+> 🚨 Many issues in the 🤗 Transformers repository are unsolved because the data used to reproduce them is not accessible.
 
 Once you have something that is self-contained, you can try to reduce it into even less lines of code, building what we call a _minimal reproducible example_. While this requires a bit more work on your side, you will almost be guaranteed to get help and a fix if you provide a nice, short bug reproducer.
 
diff --git a/chapters/en/chapter9/1.mdx b/chapters/en/chapter9/1.mdx
index f8d2a75f1..e75ab7f0f 100644
--- a/chapters/en/chapter9/1.mdx
+++ b/chapters/en/chapter9/1.mdx
@@ -32,6 +32,5 @@ Here are some examples of machine learning demos built with Gradio:
 
 This chapter is broken down into sections which include both _concepts_ and _applications_. After you learn the concept in each section, you'll apply it to build a particular kind of demo, ranging from image classification to speech recognition. By the time you finish this chapter, you'll be able to build these demos (and many more!) in just a few lines of Python code.
 
-<Tip>
-👀 Check out <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> to see many recent examples of machine learning demos built by the machine learning community!
-</Tip>
\ No newline at end of file
+> [!TIP]
+> 👀 Check out <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> to see many recent examples of machine learning demos built by the machine learning community!
\ No newline at end of file
diff --git a/chapters/en/chapter9/7.mdx b/chapters/en/chapter9/7.mdx
index fce4f80fb..c315c979a 100644
--- a/chapters/en/chapter9/7.mdx
+++ b/chapters/en/chapter9/7.mdx
@@ -61,9 +61,8 @@ demo.launch()
 This simple example above introduces 4 concepts that underlie Blocks:
 
 1. Blocks allow you to build web applications that combine markdown, HTML, buttons, and interactive components simply by instantiating objects in Python inside of a `with gradio.Blocks` context.
-<Tip>
-🙋If you're not familiar with the `with` statement in Python, we recommend checking out the excellent [tutorial](https://realpython.com/python-with-statement/) from Real Python. Come back here after reading that 🤗
-</Tip>
+> [!TIP]
+> 🙋 If you're not familiar with the `with` statement in Python, we recommend checking out the excellent [tutorial](https://realpython.com/python-with-statement/) from Real Python. Come back here after reading that 🤗
 The order in which you instantiate components matters as each element gets rendered into the web app in the order it was created. (More complex layouts are discussed below)
 
 2. You can define regular Python functions anywhere in your code and run them with user input using `Blocks`. In our example, we have a simple function that "flips" the input text, but you can write any Python function, from a simple calculation to processing the predictions from a machine learning model.
diff --git a/chapters/es/chapter1/3.mdx b/chapters/es/chapter1/3.mdx
index 07c6f9c46..8b9c35e3d 100644
--- a/chapters/es/chapter1/3.mdx
+++ b/chapters/es/chapter1/3.mdx
@@ -9,13 +9,10 @@
 
 En esta sección, veremos qué pueden hacer los Transformadores y usaremos nuestra primera herramienta de la librería 🤗 Transformers: la función `pipeline()`.
 
-<Tip>
-
-👀 Ves el botón <em>Open in Colab</em> en la parte superior derecha? Haz clic en él para abrir un cuaderno de Google Colab con todos los ejemplos de código de esta sección. Este botón aparecerá en cualquier sección que tenga ejemplos de código.
-
-Si quieres ejecutar los ejemplos localmente, te recomendamos revisar la <a href="/course/chapter0">configuración</a>.
-
-</Tip>
+> [!TIP]
+> 👀 ¿Ves el botón <em>Open in Colab</em> en la parte superior derecha? Haz clic en él para abrir un cuaderno de Google Colab con todos los ejemplos de código de esta sección. Este botón aparecerá en cualquier sección que tenga ejemplos de código.
+>
+> Si quieres ejecutar los ejemplos localmente, te recomendamos revisar la <a href="/course/chapter0">configuración</a>.
 
 ## ¡Los Transformadores están en todas partes!
 
@@ -25,11 +22,8 @@ Los Transformadores se usan para resolver todo tipo de tareas de PLN, como las m
 
 La [librería 🤗 Transformers](https://github.com/huggingface/transformers) provee la funcionalidad de crear y usar estos modelos compartidos. El [Hub de Modelos](https://huggingface.co/models) contiene miles de modelos preentrenados que cualquiera puede descargar y usar. ¡Tú también puedes subir tus propios modelos al Hub!
 
-<Tip>
-
-⚠️ El Hub de Hugging Face no se limita a Transformadores. ¡Cualquiera puede compartir los tipos de modelos o conjuntos de datos que quiera! ¡<a href="https://huggingface.co/join">Crea una cuenta de huggingface.co</a> para beneficiarte de todas las funciones disponibles!
-
-</Tip>
+> [!TIP]
+> ⚠️ El Hub de Hugging Face no se limita a Transformadores. ¡Cualquiera puede compartir los tipos de modelos o conjuntos de datos que quiera! ¡<a href="https://huggingface.co/join">Crea una cuenta de huggingface.co</a> para beneficiarte de todas las funciones disponibles!
 
 Antes de ver cómo funcionan internamente los Transformadores, veamos un par de ejemplos sobre cómo pueden ser usados para resolver tareas de PLN. 
 
@@ -107,11 +101,8 @@ classifier(
 
 Este pipeline se llama _zero-shot_ porque no necesitas ajustar el modelo con tus datos para usarlo. ¡Puede devolver directamente puntajes de probabilidad para cualquier lista de de etiquetas que escojas!
 
-<Tip>
-
-✏️ **¡Pruébalo!** Juega con tus propias secuencias y etiquetas, y observa cómo se comporta el modelo.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Pruébalo!** Juega con tus propias secuencias y etiquetas, y observa cómo se comporta el modelo.
 
 
 ## Generación de texto
@@ -135,11 +126,8 @@ generator("In this course, we will teach you how to")
 
 Puedes controlar cuántas secuencias diferentes se generan con el argumento `num_return_sequences` y la longitud total del texto de salida con el argumento `max_length`.
 
-<Tip>
-
-✏️ **¡Pruébalo!** Usa los argumentos `num_return_sequences` y `max_length` para generar dos oraciones de 15 palabras cada una.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Pruébalo!** Usa los argumentos `num_return_sequences` y `max_length` para generar dos oraciones de 15 palabras cada una.
 
 
 ## Usa cualquier modelo del Hub en un pipeline
@@ -171,11 +159,8 @@ Puedes refinar tu búsqueda de un modelo haciendo clic en las etiquetas de idiom
 
 Una vez has seleccionado un modelo haciendo clic en él, verás que hay un widget que te permite probarlo directamente en línea. De esta manera puedes probar rápidamente las capacidades del modelo antes de descargarlo.
 
-<Tip>
-
-✏️ **¡Pruébalo!** Usa los filtros para encontrar un modelo de generación de texto para un idioma diferente. ¡Siéntete libre de jugar con el widget y úsalo en un pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Pruébalo!** Usa los filtros para encontrar un modelo de generación de texto para un idioma diferente. ¡Siéntete libre de jugar con el widget y úsalo en un pipeline!
 
 
 ### La API de Inferencia
@@ -208,11 +193,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 El argumento `top_k` controla el número de posibilidades que se van a mostrar. Nota que en este caso el modelo llena la palabra especial `<mask>`, que se denomina comúnmente como *mask token*. Otros modelos pueden tener diferentes tokens, por lo que es una buena idea verificar la palabra especial adecuada cuando estés explorando diferentes modelos. Una manera de confirmar es revisar la palabra usada en el widget.
 
-<Tip>
-
-✏️ **¡Pruébalo!** Busca el modelo `bert-base-cased` en el Hub e identifica su *mask token* en el widget de la API de Inferencia. ¿Qué predice este modelo para la oración que está en el ejemplo de `pipeline` anterior?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Pruébalo!** Busca el modelo `bert-base-cased` en el Hub e identifica su *mask token* en el widget de la API de Inferencia. ¿Qué predice este modelo para la oración que está en el ejemplo de `pipeline` anterior?
 
 ## Reconocimiento de entidades nombradas
 
@@ -236,11 +218,8 @@ En este caso el modelo identificó correctamente que Sylvain es una persona (PER
 
 Pasamos la opción `grouped_entities=True` en la función de creación del pipeline para decirle que agrupe las partes de la oración que corresponden a la misma entidad: Aquí el modelo agrupó correctamente "Hugging" y "Face" como una sola organización, a pesar de que su nombre está compuesto de varias palabras. De hecho, como veremos en el siguiente capítulo, el preprocesamiento puede incluso dividir palabras en partes más pequeñas. Por ejemplo, 'Sylvain' se separa en cuatro piezas: `S`, `##yl`, `##va` y`##in`. En el paso de prosprocesamiento, el pipeline reagrupa de manera exitosa dichas piezas.
 
-<Tip>
-
-✏️ **¡Pruébalo!** Busca en el Model Hub un modelo capaz de hacer etiquetado *part-of-speech* (que se abrevia usualmente como POS) en Inglés. ¿Qué predice este modelo para la oración en el ejemplo de arriba?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Pruébalo!** Busca en el Model Hub un modelo capaz de hacer etiquetado *part-of-speech* (que se abrevia usualmente como POS) en Inglés. ¿Qué predice este modelo para la oración en el ejemplo de arriba?
 
 ## Responder preguntas
 
@@ -324,10 +303,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 Al igual que los pipelines de generación de textos y resumen, puedes especificar una longitud máxima (`max_length`) o mínima (`min_length`) para el resultado.
 
-<Tip>
-
-✏️ **¡Pruébalo!** Busca modelos de traducción en otros idiomas e intenta traducir la oración anterior en varios de ellos.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Pruébalo!** Busca modelos de traducción en otros idiomas e intenta traducir la oración anterior en varios de ellos.
 
 Los pipelines vistos hasta el momento son principalmente para fines demostrativos. Fueron programados para tareas específicas y no pueden desarrollar variaciones de ellas. En el siguiente capítulo, aprenderás qué está detrás de una función `pipeline()` y cómo personalizar su comportamiento.
\ No newline at end of file
diff --git a/chapters/es/chapter2/4.mdx b/chapters/es/chapter2/4.mdx
index e2b954efb..1202ab35a 100644
--- a/chapters/es/chapter2/4.mdx
+++ b/chapters/es/chapter2/4.mdx
@@ -216,11 +216,8 @@ print(ids)
 
 Estos resultados, una vez convertidos en el tensor del marco apropiado, pueden utilizarse como entradas de un modelo, como se ha visto anteriormente en este capítulo.
 
-<Tip>
-
-✏️ **Try it out!** Replica los dos últimos pasos (tokenización y conversión a IDs de entrada) en las frases de entrada que utilizamos en la sección 2 ("Llevo toda la vida esperando un curso de HuggingFace" y "¡Odio tanto esto!"). Comprueba que obtienes los mismos ID de entrada que obtuvimos antes!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Replica los dos últimos pasos (tokenización y conversión a IDs de entrada) en las frases de entrada que utilizamos en la sección 2 ("Llevo toda la vida esperando un curso de HuggingFace" y "¡Odio tanto esto!"). ¡Comprueba que obtienes los mismos IDs de entrada que obtuvimos antes!
 
 ## Decodificación
 
diff --git a/chapters/es/chapter2/5.mdx b/chapters/es/chapter2/5.mdx
index 9a4cb0f8d..b419db097 100644
--- a/chapters/es/chapter2/5.mdx
+++ b/chapters/es/chapter2/5.mdx
@@ -177,11 +177,8 @@ batched_ids = [ids, ids]
 
 Se trata de un lote de dos secuencias idénticas.
 
-<Tip>
-
-✏️ **Try it out!** Convierte esta lista `batched_ids` en un tensor y pásalo por tu modelo. Comprueba que obtienes los mismos logits que antes (¡pero dos veces!).
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Convierte esta lista `batched_ids` en un tensor y pásalo por tu modelo. Comprueba que obtienes los mismos logits que antes (¡pero dos veces!).
 
 La creación de lotes permite que el modelo funcione cuando lo alimentas con múltiples sentencias. Utilizar varias secuencias es tan sencillo como crear un lote con una sola secuencia. Sin embargo, hay un segundo problema. Cuando se trata de agrupar dos (o más) frases, éstas pueden ser de diferente longitud. Si alguna vez ha trabajado con tensores, sabrá que deben tener forma rectangular, por lo que no podrá convertir la lista de IDs de entrada en un tensor directamente. Para evitar este problema, usamos el *padding* para las entradas.
 
@@ -310,11 +307,8 @@ Ahora obtenemos los mismos logits para la segunda frase del lote.
 
 Podemos ver que el último valor de la segunda secuencia es un ID de relleno, que es un valor 0 en la máscara de atención.
 
-<Tip>
-
-✏️ **Try it out!** Aplique la tokenización manualmente a las dos frases utilizadas en la sección 2 ("Llevo toda la vida esperando un curso de HuggingFace" y "¡Odio tanto esto!"). Páselas por el modelo y compruebe que obtiene los mismos logits que en la sección 2. Ahora júntalos usando el token de relleno, y luego crea la máscara de atención adecuada. Comprueba que obtienes los mismos resultados al pasarlos por el modelo.
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Aplique la tokenización manualmente a las dos frases utilizadas en la sección 2 ("Llevo toda la vida esperando un curso de HuggingFace" y "¡Odio tanto esto!"). Páselas por el modelo y compruebe que obtiene los mismos logits que en la sección 2. Ahora júntalos usando el token de relleno, y luego crea la máscara de atención adecuada. Comprueba que obtienes los mismos resultados al pasarlos por el modelo.
 
 ## Secuencias largas
 
diff --git a/chapters/es/chapter3/2.mdx b/chapters/es/chapter3/2.mdx
index b428d6265..7bd2d8fd2 100644
--- a/chapters/es/chapter3/2.mdx
+++ b/chapters/es/chapter3/2.mdx
@@ -92,9 +92,8 @@ El Hub no solo contiene modelos; sino que también tiene múltiples conjunto de
 
 La librería 🤗 Datasets provee un comando muy simple para descargar y memorizar un conjunto de datos en el Hub. Podemos descargar el conjunto de datos de la siguiente manera:
 
-<Tip>
-⚠️ **Advertencia** Asegúrate de que `datasets` esté instalado ejecutando `pip install datasets`. Luego, carga el conjunto de datos MRPC y imprímelo para ver qué contiene.
-</Tip> 
+> [!TIP]
+> ⚠️ **Advertencia** Asegúrate de que `datasets` esté instalado ejecutando `pip install datasets`. Luego, carga el conjunto de datos MRPC e imprímelo para ver qué contiene.
 
 ```py
 from datasets import load_dataset
@@ -153,11 +152,8 @@ raw_train_dataset.features
 
 Internamente, `label` es del tipo de dato `ClassLabel`, y la asociación de valores enteros y sus etiquetas esta almacenado en la carpeta *names*. `0` corresponde con `not_equivalent`, y `1` corresponde con `equivalent`.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Mira el elemento 15 del conjunto de datos de entrenamiento y el elemento 87 del conjunto de datos de validación. Cuáles son sus etiquetas?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Mira el elemento 15 del conjunto de datos de entrenamiento y el elemento 87 del conjunto de datos de validación. ¿Cuáles son sus etiquetas?
 
 ### Preprocesando un conjunto de datos
 
@@ -197,11 +193,8 @@ inputs
 
 Nosotros consideramos las llaves `input_ids` y `attention_mask` en el [Capítulo 2](/course/chapter2), pero postergamos hablar sobre la llave `token_type_ids`. En este ejemplo, esta es la que le dice al modelo cual parte de la entrada es la primera oración y cual es la segunda.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Toma el elemento 15 del conjunto de datos de entrenamiento y tokeniza las dos oraciones independientemente y como un par. Cuál es la diferencia entre los dos resultados?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Toma el elemento 15 del conjunto de datos de entrenamiento y tokeniza las dos oraciones independientemente y como un par. ¿Cuál es la diferencia entre los dos resultados?
 
 Si convertimos los IDs dentro de `input_ids` en palabras:
 
@@ -362,11 +355,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ **¡Inténtalo!** Reproduce el preprocesamiento en el conjunto de datos GLUE SST-2. Es un poco diferente ya que esta compuesto de oraciones individuales en lugar de pares, pero el resto de lo que hicimos debería ser igual. Para un reto mayor, intenta escribir una función de preprocesamiento que trabaje con cualquiera de las tareas GLUE.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Reproduce el preprocesamiento en el conjunto de datos GLUE SST-2. Es un poco diferente ya que está compuesto de oraciones individuales en lugar de pares, pero el resto de lo que hicimos debería ser igual. Para un reto mayor, intenta escribir una función de preprocesamiento que trabaje con cualquiera de las tareas GLUE.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/es/chapter3/3.mdx b/chapters/es/chapter3/3.mdx
index b1e543542..bb2b0bfb8 100644
--- a/chapters/es/chapter3/3.mdx
+++ b/chapters/es/chapter3/3.mdx
@@ -52,11 +52,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Si quieres subir automáticamente tu modelo al Hub durante el entrenamiento, incluye `push_to_hub=True` en `TrainingArguments`. Aprenderemos más sobre esto en el [Capítulo 4](/course/chapter4/3).
-
-</Tip>
+> [!TIP]
+> 💡 Si quieres subir automáticamente tu modelo al Hub durante el entrenamiento, incluye `push_to_hub=True` en `TrainingArguments`. Aprenderemos más sobre esto en el [Capítulo 4](/course/chapter4/3).
 
 El segundo paso es definir nuestro modelo. Como en el [capítulo anterior](/course/chapter2), utilizaremos la clase `AutoModelForSequenceClassification`, con dos etiquetas:
 
@@ -173,8 +170,5 @@ El `Trainer` funciona en múltiples GPUs o TPUs y proporciona muchas opciones, c
 
 Con esto concluye la introducción al ajuste utilizando la API de `Trainer`. En el [Capítulo 7](/course/chapter7) se dará un ejemplo de cómo hacer esto para las tareas más comunes de PLN, pero ahora veamos cómo hacer lo mismo en PyTorch puro.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Ajusta un modelo sobre el dataset GLUE SST-2 utilizando el procesamiento de datos que has implementado en la sección 2.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Ajusta un modelo sobre el dataset GLUE SST-2 utilizando el procesamiento de datos que has implementado en la sección 2.
diff --git a/chapters/es/chapter3/3_tf.mdx b/chapters/es/chapter3/3_tf.mdx
index 34f5dcb69..6c20aba5f 100644
--- a/chapters/es/chapter3/3_tf.mdx
+++ b/chapters/es/chapter3/3_tf.mdx
@@ -80,11 +80,8 @@ Observarás que, a diferencia del [Capítulo 2](/course/es/chapter2), aparece un
 
 Para afinar el modelo en nuestro dataset, sólo tenemos que compilar nuestro modelo con `compile()` y luego pasar nuestros datos al método `fit()`. Esto iniciará el proceso de ajuste (que debería tardar un par de minutos en una GPU) e informará de la pérdida de entrenamiento a medida que avanza, además de la pérdida de validación al final de cada época.
 
-<Tip>
-
-Ten en cuenta que los modelos 🤗 Transformers tienen una característica especial que la mayoría de los modelos Keras no tienen - pueden usar automáticamente una pérdida apropiada que calculan internamente. Usarán esta pérdida por defecto si no estableces un argumento de pérdida en `compile()`. Tea en cuenta que para utilizar la pérdida interna tendrás que pasar las etiquetas como parte de la entrada, en vez de como una etiqueta separada como es habitual en los modelos Keras. Veremos ejemplos de esto en la Parte 2 del curso, donde definir la función de pérdida correcta puede ser complicado. Para la clasificación de secuencias, sin embargo, una función de pérdida estándar de Keras funciona bien, así que eso es lo que usaremos aquí.
-
-</Tip>
+> [!TIP]
+> Ten en cuenta que los modelos 🤗 Transformers tienen una característica especial que la mayoría de los modelos Keras no tienen - pueden usar automáticamente una pérdida apropiada que calculan internamente. Usarán esta pérdida por defecto si no estableces un argumento de pérdida en `compile()`. Ten en cuenta que para utilizar la pérdida interna tendrás que pasar las etiquetas como parte de la entrada, en vez de como una etiqueta separada como es habitual en los modelos Keras. Veremos ejemplos de esto en la Parte 2 del curso, donde definir la función de pérdida correcta puede ser complicado. Para la clasificación de secuencias, sin embargo, una función de pérdida estándar de Keras funciona bien, así que eso es lo que usaremos aquí.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -100,11 +97,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Ten en cuenta un fallo muy común aquí: por poder, _puedes_ pasar simplemente el nombre de la función de pérdida como una cadena a Keras, pero por defecto Keras asumirá que ya has aplicado una función softmax a tus salidas. Sin embargo, muchos modelos devuelven los valores justo antes de que se aplique la función softmax, también conocidos como _logits_. Tenemos que decirle a la función de pérdida que eso es lo que hace nuestro modelo, y la única manera de hacerlo es llamándola directamente, en lugar de pasar su nombre con una cadena.
-
-</Tip>
+> [!WARNING]
+> Ten en cuenta un fallo muy común aquí: por poder, _puedes_ pasar simplemente el nombre de la función de pérdida como una cadena a Keras, pero por defecto Keras asumirá que ya has aplicado una función softmax a tus salidas. Sin embargo, muchos modelos devuelven los valores justo antes de que se aplique la función softmax, también conocidos como _logits_. Tenemos que decirle a la función de pérdida que eso es lo que hace nuestro modelo, y la única manera de hacerlo es llamándola directamente, en lugar de pasar su nombre con una cadena.
 
 ### Mejorar el rendimiento del entrenamiento
 
@@ -136,11 +130,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-La librería 🤗 Transformers también tiene una función `create_optimizer()` que creará un optimizador `AdamW` con descenso de tasa de aprendizaje. Verás en detalle este útil atajo en próximas secciones del curso.
-
-</Tip>
+> [!TIP]
+> La librería 🤗 Transformers también tiene una función `create_optimizer()` que creará un optimizador `AdamW` con descenso de tasa de aprendizaje. Verás en detalle este útil atajo en próximas secciones del curso.
 
 Ahora tenemos nuestro nuevo optimizador, y podemos intentar entrenar con él. En primer lugar, vamos a recargar el modelo, para restablecer los cambios en los pesos del entrenamiento que acabamos de hacer, y luego podemos compilarlo con el nuevo optimizador:
 
@@ -158,11 +149,8 @@ Ahora, ajustamos de nuevo:
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Si quieres subir automáticamente tu modelo a Hub durante el entrenamiento, puedes pasar un `PushToHubCallback` en el método `model.fit()`. Aprenderemos más sobre esto en el [Capítulo 4](/course/es/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 Si quieres subir automáticamente tu modelo al Hub durante el entrenamiento, puedes pasar un `PushToHubCallback` en el método `model.fit()`. Aprenderemos más sobre esto en el [Capítulo 4](/course/es/chapter4/3).
 
 ### Predicciones del Modelo[[model-predictions]]
 
diff --git a/chapters/es/chapter3/4.mdx b/chapters/es/chapter3/4.mdx
index 65ce4a045..f223d596b 100644
--- a/chapters/es/chapter3/4.mdx
+++ b/chapters/es/chapter3/4.mdx
@@ -196,11 +196,8 @@ metric.compute()
 
 De nuevo, tus resultados serán un tanto diferente debido a la inicialización aleatoria en la cabeza del modelo y el mezclado de los datos, pero deberían tener valores similares.
 
-<Tip>
-
-✏️ **Inténtalo!** Modifica el bucle de entrenamiento anterior para ajustar tu modelo en el conjunto de datos SST-2.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Modifica el bucle de entrenamiento anterior para ajustar tu modelo en el conjunto de datos SST-2.
 
 ### Repotencia tu bucle de entrenamiento con Accelerate 🤗
 
@@ -292,11 +289,10 @@ La primera línea a agregarse es la línea del `import`. La segunda línea crea
 
 Ahora la mayor parte del trabajo se hace en la línea que envía los `dataloaders`, el modelo y el optimizador al `accelerator.prepare()`. Este va a envolver esos objetos en el contenedor apropiado para asegurarse que tu entrenamiento distribuido funcione como se espera. Los cambios que quedan son remover la línea que coloca el lote en el `device` (de nuevo, si deseas dejarlo así bastaría con cambiarlo para que use el `accelerator.device`) y reemplazar `loss.backward()` con `accelerator.backward(loss)`.
 
-<Tip>
-  ⚠️ Para obtener el beneficio de la aceleración ofrecida por los TPUs de la
-  nube, recomendamos rellenar las muestras hasta una longitud fija con los
-  argumentos `padding="max_length"` y `max_length` del tokenizador.
-</Tip>
+> [!TIP]
+> ⚠️ Para obtener el beneficio de la aceleración ofrecida por los TPUs de la
+>   nube, recomendamos rellenar las muestras hasta una longitud fija con los
+>   argumentos `padding="max_length"` y `max_length` del tokenizador.
 
 Si deseas copiarlo y pegarlo para probar, así es como luce el bucle completo de entrenamiento con 🤗 Accelerate:
 
diff --git a/chapters/es/chapter5/2.mdx b/chapters/es/chapter5/2.mdx
index e68820743..be29f2443 100644
--- a/chapters/es/chapter5/2.mdx
+++ b/chapters/es/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 De este modo, podemos ver que los archivos comprimidos son reemplazados por los archuvos en formato JSON _SQuAD_it-train.json_ y _SQuAD_it-test.json_.
 
-<Tip>
-
-✎ Si te preguntas por qué hay un carácter de signo de admiración (`!`) en los comandos de shell, esto es porque los estamos ejecutando desde un cuaderno de Jupyter. Si quieres descargar y descomprimir el archivo directamente desde la terminal, elimina el signo de admiración.
-
-</Tip>
+> [!TIP]
+> ✎ Si te preguntas por qué hay un carácter de signo de admiración (`!`) en los comandos de shell, esto es porque los estamos ejecutando desde un cuaderno de Jupyter. Si quieres descargar y descomprimir el archivo directamente desde la terminal, elimina el signo de admiración.
 
 Para cargar un archivo JSON con la función `load_dataset()`, necesitamos saber si estamos trabajando con un archivo JSON ordinario (parecido a un diccionario anidado) o con JSON Lines (JSON separado por líneas). Como muchos de los datasets de respuesta a preguntas que te vas a encontrar, SQuAD-it usa el formato anidado, en el que el texto está almacenado en un campo `data`. Esto significa que podemos cargar el dataset especificando el argumento `field` de la siguiente manera: 
 
@@ -127,11 +124,8 @@ DatasetDict({
 
 Esto es exactamente lo que queríamos. Ahora podemos aplicar varias técnicas de preprocesamiento para limpiar los datos, _tokenizar_ las reseñas, entre otras tareas.
 
-<Tip>
-
-El argumento `data_files` de la función `load_dataset()` es muy flexible. Puede ser una única ruta de archivo, una lista de rutas o un diccionario que mapee los nombres de los conjuntos a las rutas de archivo. También puedes buscar archivos que cumplan con cierto patrón específico de acuerdo con las reglas usadas por el shell de Unix (e.g., puedes buscar todos los archivos JSON en una carpeta al definir `datafiles="*.json"`). Revisa la [documentación](https://huggingface.co/docs/datasets/loading#local-and-remote-files) para más detalles.
-
-</Tip>
+> [!TIP]
+> El argumento `data_files` de la función `load_dataset()` es muy flexible. Puede ser una única ruta de archivo, una lista de rutas o un diccionario que mapee los nombres de los conjuntos a las rutas de archivo. También puedes buscar archivos que cumplan con cierto patrón específico de acuerdo con las reglas usadas por el shell de Unix (e.g., puedes buscar todos los archivos JSON en una carpeta al definir `data_files="*.json"`). Revisa la [documentación](https://huggingface.co/docs/datasets/loading#local-and-remote-files) para más detalles.
 
 Los scripts de carga en 🤗 Datasets también pueden descomprimir los archivos de entrada automáticamente, así que podemos saltarnos el uso de `gzip` especificando el argumento `data_files` directamente a la ruta de los archivos comprimidos.
 
@@ -159,8 +153,5 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 Esto devuelve el mismo objeto `DatasetDict` que obtuvimos antes, pero nos ahorra el paso de descargar y descomprimir manualmente los archivos _SQuAD_it-*.json.gz_. Con esto concluimos nuestra exploración de las diferentes maneras de cargar datasets que no están alojados en el Hub de Hugging Face. Ahora que tenemos un dataset para experimentar, ¡pongámonos manos a la obra con diferentes técnicas de procesamiento de datos!
 
-<Tip>
-
-✏️ **¡Inténtalo!** Escoge otro dataset alojado en GitHub o en el [Repositorio de Machine Learning de UCI](https://archive.ics.uci.edu/ml/index.php) e intenta cargarlo local y remotamente usando las técnicas descritas con anterioridad. Para puntos extra, intenta cargar un dataset que esté guardado en un formato CSV o de texto (revisa la [documentación](https://huggingface.co/docs/datasets/loading#local-and-remote-files) pata tener más información sobre estos formatos).
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Escoge otro dataset alojado en GitHub o en el [Repositorio de Machine Learning de UCI](https://archive.ics.uci.edu/ml/index.php) e intenta cargarlo local y remotamente usando las técnicas descritas con anterioridad. Para puntos extra, intenta cargar un dataset que esté guardado en un formato CSV o de texto (revisa la [documentación](https://huggingface.co/docs/datasets/loading#local-and-remote-files) para tener más información sobre estos formatos).
diff --git a/chapters/es/chapter5/3.mdx b/chapters/es/chapter5/3.mdx
index 61441ef56..322d1ec1c 100644
--- a/chapters/es/chapter5/3.mdx
+++ b/chapters/es/chapter5/3.mdx
@@ -89,11 +89,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **¡Inténtalo!** Usa la función `Dataset.unique()` para encontrar el número de medicamentos y condiciones únicas en los conjuntos de entrenamiento y de prueba.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa la función `Dataset.unique()` para encontrar el número de medicamentos y condiciones únicas en los conjuntos de entrenamiento y de prueba.
 
 Ahora normalicemos todas las etiquetas de `condition` usando `Dataset.map()`. Tal como lo hicimos con la tokenización en el [Capítulo 3](/course/chapter3), podemos definir una función simple que pueda ser aplicada en todas las filas de cada conjunto en el `drug_dataset`:
 
@@ -217,11 +214,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Como lo discutimos anteriormente, algunas reseñas incluyen una sola palabra, que si bien puede ser útil para el análisis de sentimientos, no sería tan informativa si quisiéramos predecir la condición.
 
-<Tip>
-
-🙋 Una forma alternativa de añadir nuevas columnas al dataset es a través de la función `Dataset.add_column()`. Esta te permite incluir la columna como una lista de Python o un array de NumPy y puede ser útil en situaciones en las que `Dataset.map()` no se ajusta a tu caso de uso.
-
-</Tip>
+> [!TIP]
+> 🙋 Una forma alternativa de añadir nuevas columnas al dataset es a través de la función `Dataset.add_column()`. Esta te permite incluir la columna como una lista de Python o un array de NumPy y puede ser útil en situaciones en las que `Dataset.map()` no se ajusta a tu caso de uso.
 
 Usemos la función `Dataset.filter()` para quitar las reseñas que contienen menos de 30 palabras. Similar a lo que hicimos con la columna `condition`, podemos filtrar las reseñas cortas al incluir una condición de que su longitud esté por encima de este umbral:
 
@@ -236,11 +230,8 @@ print(drug_dataset.num_rows)
 
 Como puedes ver, esto ha eliminado alrededor del 15% de las reseñas de nuestros conjuntos originales de entrenamiento y prueba.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Usa la función `Dataset.sort()` para inspeccionar las reseñas con el mayor número de palabras. Revisa la [documentación](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) para ver cuál argumento necesitas para ordenar las reseñas de mayor a menor.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa la función `Dataset.sort()` para inspeccionar las reseñas con el mayor número de palabras. Revisa la [documentación](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) para ver cuál argumento necesitas para ordenar las reseñas de mayor a menor.
 
 Por último, tenemos que lidiar con la presencia de códigos de caracteres HTML en las reseñas. Podemos usar el módulo `html` de Python para transformar estos códigos así:
 
@@ -297,11 +288,8 @@ Como viste en el [Capítulo 3](/course/chapter3), podemos pasar uno o varios eje
 
 También puedes medir el tiempo de una celda completa añadiendo `%%time` al inicio de la celda. En el hardware en el que lo ejecutamos, nos arrojó 10.8s para esta instrucción (es el número que aparece después de "Wall time").
 
-<Tip>
-
-✏️ **¡Inténtalo!** Ejecuta la misma instrucción con y sin `batched=True` y luego usa un tokenizador "lento" (añade `use_fast=False` en el método `AutoTokenizer.from_pretrained()`) para ver cuánto tiempo se toman en tu computador.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Ejecuta la misma instrucción con y sin `batched=True` y luego usa un tokenizador "lento" (añade `use_fast=False` en el método `AutoTokenizer.from_pretrained()`) para ver cuánto tiempo se toman en tu computador.
 
 Estos son los resultados que obtuvimos con y sin la ejecución por lotes, con un tokenizador rápido y lento:
 
@@ -338,19 +326,13 @@ Opciones         | Tokenizador rápido | Tokenizador lento
 
 Estos son resultados mucho más razonables para el tokenizador lento, aunque el desempeño del rápido también mejoró sustancialmente. Sin embargo, este no siempre será el caso: para valores de `num_proc` diferentes a 8, nuestras pruebas mostraron que era más rápido usar `batched=true` sin esta opción. En general, no recomendamos usar el multiprocesamiento de Python para tokenizadores rápidos con `batched=True`.
 
-<Tip>
-
-Usar `num_proc` para acelerar tu procesamiento suele ser una buena idea, siempre y cuando la función que uses no esté usando multiples procesos por si misma.
-
-</Tip>
+> [!TIP]
+> Usar `num_proc` para acelerar tu procesamiento suele ser una buena idea, siempre y cuando la función que uses no esté usando múltiples procesos por sí misma.
 
 Que toda esta funcionalidad está incluida en un método es algo impresionante en si mismo, ¡pero hay más!. Con `Dataset.map()` y `batched=True` puedes cambiar el número de elementos en tu dataset. Esto es súper útil en situaciones en las que quieres crear varias características de entrenamiento de un ejemplo, algo que haremos en el preprocesamiento para varias de las tareas de PLN que abordaremos en el [Capítulo 7](/course/chapter7).
 
-<Tip>
-
-💡 Un _ejemplo_ en Machine Learning se suele definir como el conjunto de _features_ que le damos al modelo. En algunos contextos estos features serán el conjunto de columnas en un `Dataset`, mientras que en otros se pueden extraer múltiples features de un solo ejemplo que pertenecen a una columna –como aquí y en tareas de responder preguntas-.
-
-</Tip>
+> [!TIP]
+> 💡 Un _ejemplo_ en Machine Learning se suele definir como el conjunto de _features_ que le damos al modelo. En algunos contextos estos features serán el conjunto de columnas en un `Dataset`, mientras que en otros se pueden extraer múltiples features de un solo ejemplo que pertenecen a una columna (como aquí y en tareas de responder preguntas).
 
 ¡Veamos cómo funciona! En este ejemplo vamos a tokenizar nuestros ejemplos y limitarlos a una longitud máxima de 128, pero le pediremos al tokenizador que devuelva *todos* los fragmentos de texto en vez de unicamente el primero. Esto se puede lograr con el argumento `return_overflowing_tokens=True`:
 
@@ -519,11 +501,8 @@ Creemos un `pandas.DataFrame` para el conjunto de entrenamiento entero al selecc
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 Internamente, `Dataset.set_format()` cambia el formato de devolución del método _dunder_ `__getitem()__`. Esto significa que cuando queremos crear un objeto nuevo como `train_df` de un `Dataset` en formato `"pandas"`, tenemos que seleccionar el dataset completo para obtener un `pandas.DataFrame`. Puedes verificar por ti mismo que el tipo de `drug_dataset["train"]` es `Dataset` sin importar el formato de salida.
-
-</Tip>
+> [!TIP]
+> 🚨 Internamente, `Dataset.set_format()` cambia el formato de devolución del método _dunder_ `__getitem__()`. Esto significa que cuando queremos crear un objeto nuevo como `train_df` de un `Dataset` en formato `"pandas"`, tenemos que seleccionar el dataset completo para obtener un `pandas.DataFrame`. Puedes verificar por ti mismo que el tipo de `drug_dataset["train"]` es `Dataset` sin importar el formato de salida.
 
 De aquí en adelante podemos usar toda la funcionalidad de pandas cuando queramos. Por ejemplo, podemos hacer un encadenamiento sofisticado para calcular la distribución de clase entre las entradas de `condition`:
 
@@ -591,11 +570,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **¡Inténtalo!** Calcula la calificación promedio por medicamento y guarda el resultado en un nuevo `Dataset`.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Calcula la calificación promedio por medicamento y guarda el resultado en un nuevo `Dataset`.
 
 Con esto terminamos nuestro tour de las múltiples técnicas de preprocesamiento disponibles en 🤗 Datasets. Para concluir, creemos un set de validación para preparar el conjunto de datos y entrenar el clasificador. Antes de hacerlo, vamos a reiniciar el formato de salida de `drug_dataset` de `"pandas"` a `"arrow"`:
 
diff --git a/chapters/es/chapter5/4.mdx b/chapters/es/chapter5/4.mdx
index 326a2a609..242d0c612 100644
--- a/chapters/es/chapter5/4.mdx
+++ b/chapters/es/chapter5/4.mdx
@@ -43,11 +43,8 @@ Dataset({
 
 Como podemos ver, hay 15.518.009 filas y dos columnas en el dataset, ¡un montón!
 
-<Tip>
-
-✎ Por defecto, 🤗 Datasets va a descomprimir los archivos necesarios para cargar un dataset. Si quieres ahorrar espacio de almacenamiento, puedes usar `DownloadConfig(delete_extracted=True)` al argumento `download_config` de `load_dataset()`. Revisa la [documentación](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) para más detalles.
-
-</Tip>
+> [!TIP]
+> ✎ Por defecto, 🤗 Datasets va a descomprimir los archivos necesarios para cargar un dataset. Si quieres ahorrar espacio de almacenamiento, puedes pasar `DownloadConfig(delete_extracted=True)` al argumento `download_config` de `load_dataset()`. Revisa la [documentación](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) para más detalles.
 
 Veamos el contenido del primer ejemplo:
 
@@ -98,11 +95,8 @@ Dataset size (cache file) : 19.54 GB
 
 Bien, a pesar de que el archivo es de casi 20 GB, ¡podemos cargarlo y acceder a su contenido con mucha menos RAM!
 
-<Tip>
-
-✏️ **¡Inténtalo!** Escoge alguno de los [subconjuntos](https://mystic.the-eye.eu/public/AI/pile_preliminary_components/) del _Pile_ que sea más grande que la RAM de tu computador portátil o de escritorio, cárgalo con 🤗 Datasets y mide la cantidad de RAM utilizada. Recuerda que para tener una medición precisa, tienes que hacerlo en un nuevo proceso. Puedes encontrar los tamaños de cada uno de los subconjuntos sin comprimir en la Tabla 1 del [paper de _Pile_](https://arxiv.org/abs/2101.00027).
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Escoge alguno de los [subconjuntos](https://mystic.the-eye.eu/public/AI/pile_preliminary_components/) del _Pile_ que sea más grande que la RAM de tu computador portátil o de escritorio, cárgalo con 🤗 Datasets y mide la cantidad de RAM utilizada. Recuerda que para tener una medición precisa, tienes que hacerlo en un nuevo proceso. Puedes encontrar los tamaños de cada uno de los subconjuntos sin comprimir en la Tabla 1 del [paper de _Pile_](https://arxiv.org/abs/2101.00027).
 
 Si estás familiarizado con Pandas, este resultado puede ser sorprendente por la famosa [regla de Wes Kinney](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) que indica que típicamente necesitas de 5 a 10 veces la RAM que el tamaño del archivo de tu dataset. ¿Cómo resuelve entonces 🤗 Datasets este problema de manejo de memoria? 🤗 Datasets trata cada dataset como un [archivo proyectado en memoria](https://en.wikipedia.org/wiki/Memory-mapped_file), lo que permite un mapeo entre la RAM y el sistema de almacenamiento de archivos, que le permite a la librería acceder y operar los elementos del dataset sin necesidad de tenerlos cargados completamente en memoria.
 
@@ -130,11 +124,8 @@ print(
 
 Aquí usamos el módulo `timeit` de Python para medir el tiempo de ejecución que se toma `code_snippet`. Típicamemente, puedes iterar a lo largo de un dataset a una velocidad de unas cuantas décimas de un GB por segundo. Esto funciona muy bien para la gran mayoría de aplicaciones, pero algunas veces tendrás que trabajar con un dataset que es tan grande para incluso almacenarse en el disco de tu computador. Por ejemplo, si quisieramos descargar el _Pile_ completo ¡necesitaríamos 825 GB de almacenamiento libre! Para trabajar con esos casos, 🤗 Datasets puede trabajar haciendo _streaming_, lo que permite la descarga y acceso a los elementos sobre la marcha, sin necesidad de descargar todo el dataset. Veamos cómo funciona:
 
-<Tip>
-
-💡 En los cuadernos de Jupyter también puedes medir el tiempo de ejecución de las celdas usando [`%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 En los cuadernos de Jupyter también puedes medir el tiempo de ejecución de las celdas usando [`%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Haciendo _streaming_ de datasets
 
@@ -171,11 +162,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Para acelerar la tokenización con _streaming_ puedes definir `batched=True`, como lo vimos en la sección anterior. Esto va a procesar los ejemplos lote por lote. Recuerda que el tamaño por defecto de los lotes es 1.000 y puede ser especificado con el argumento `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Para acelerar la tokenización con _streaming_ puedes definir `batched=True`, como lo vimos en la sección anterior. Esto va a procesar los ejemplos lote por lote. Recuerda que el tamaño por defecto de los lotes es 1.000 y puede ser especificado con el argumento `batch_size`.
 
 También puedes aleatorizar el orden de un dataset _streamed_ usando `IterableDataset.shuffle()`, pero a diferencia de `Dataset.shuffle()` esto sólo afecta a los elementos en un `buffer_size` determinado:
 
@@ -276,10 +264,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **¡Inténtalo!** Usa alguno de los corpus grandes de Common Crawl como [`mc4`](https://huggingface.co/datasets/mc4) u [`oscar`](https://huggingface.co/datasets/oscar) para crear un dataset _streaming_ multilenguaje que represente las proporciones de lenguajes hablados en un país de tu elección. Por ejemplo, los 4 lenguajes nacionales en Suiza son alemán, francés, italiano y romanche, así que podrías crear un corpus suizo al hacer un muestreo de Oscar de acuerdo con su proporción de lenguaje.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa alguno de los corpus grandes de Common Crawl como [`mc4`](https://huggingface.co/datasets/mc4) u [`oscar`](https://huggingface.co/datasets/oscar) para crear un dataset _streaming_ multilenguaje que represente las proporciones de lenguajes hablados en un país de tu elección. Por ejemplo, los 4 lenguajes nacionales en Suiza son alemán, francés, italiano y romanche, así que podrías crear un corpus suizo al hacer un muestreo de Oscar de acuerdo con su proporción de lenguaje.
 
 Ya tienes todas las herramientas para cargar y procesar datasets de todas las formas y tamaños, pero a menos que seas muy afortunado, llegará un punto en tu camino de PLN en el que tendrás que crear el dataset tu mismo para resolver tu problema particular. De esto hablaremos en la siguiente sección.
diff --git a/chapters/es/chapter5/5.mdx b/chapters/es/chapter5/5.mdx
index 21419d4e5..b0415ed0b 100644
--- a/chapters/es/chapter5/5.mdx
+++ b/chapters/es/chapter5/5.mdx
@@ -113,11 +113,8 @@ response.json()
 
 Wow, ¡es mucha información! Podemos ver campos útiles como `title`, `body` y `number`, que describen el issue, así como información del usuario de GitHub que lo abrió.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Haz clic en algunas de las URL en el _payload_ JSON de arriba para explorar la información que está enlazada al issue de GitHub.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Haz clic en algunas de las URL en el _payload_ JSON de arriba para explorar la información que está enlazada al issue de GitHub.
 
 Tal como se describe en la [documentación](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) de GitHub, los pedidos sin autenticación están limitados a 60 por hora. Si bien puedes incrementar el parámetro de búsqueda `per_page` para reducir el número de pedidos que haces, igual puedes alcanzar el límite de pedidos en cualquier repositorio que tenga más que un par de miles de issues. En vez de hacer eso, puedes seguir las [instrucciones](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) de GitHub para crear un _token de acceso personal_ y que puedas incrementar el límite de pedidos a 5.000 por hora. Una vez tengas tu token, puedes incluirlo como parte del encabezado del pedido:
 
@@ -126,11 +123,8 @@ GITHUB_TOKEN = xxx  # Copy your GitHub token here
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ No compartas un cuaderno que contenga tu `GITHUB_TOKEN`. Te recomendamos eliminar la última celda una vez la has ejecutado para evitar filtrar accidentalmente esta información. Aún mejor, guarda el token en un archivo *.env* y usa la librería [`python-dotenv`](https://github.com/theskumar/python-dotenv) para cargarla automáticamente como una variable de ambiente.
-
-</Tip>
+> [!WARNING]
+> ⚠️ No compartas un cuaderno que contenga tu `GITHUB_TOKEN`. Te recomendamos eliminar la última celda una vez la has ejecutado para evitar filtrar accidentalmente esta información. Aún mejor, guarda el token en un archivo *.env* y usa la librería [`python-dotenv`](https://github.com/theskumar/python-dotenv) para cargarla automáticamente como una variable de ambiente.
 
 Ahora que tenemos nuestro token de acceso, creemos una función que descargue todos los issues de un repositorio de GitHub:
 
@@ -237,11 +231,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **¡Inténtalo!** Calcula el tiempo promedio que toma cerrar issues en 🤗 Datasets. La función `Dataset.filter()` te será útil para filtrar los pull requests y los issues abiertos, y puedes usar la función `Dataset.set_format()` para convertir el dataset a un `DataFrame` para poder manipular fácilmente los timestamps de `created_at` y `closed_at`. Para puntos extra, calcula el tiempo promedio que toma cerrar pull requests.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Calcula el tiempo promedio que toma cerrar issues en 🤗 Datasets. La función `Dataset.filter()` te será útil para filtrar los pull requests y los issues abiertos, y puedes usar la función `Dataset.set_format()` para convertir el dataset a un `DataFrame` para poder manipular fácilmente los timestamps de `created_at` y `closed_at`. Para puntos extra, calcula el tiempo promedio que toma cerrar pull requests.
 
 Si bien podemos limpiar aún más el dataset eliminando o renombrando algunas columnas, es una buena práctica mantener un dataset lo más parecido al original en esta etapa, para que se pueda usar fácilmente en varias aplicaciones.
 
@@ -376,11 +367,8 @@ repo_url
 
 En este ejemplo, hemos creado un repositorio vacío para el dataset llamado `github-issues` bajo el nombre de usuario `lewtun` (¡el nombre de usuario debería ser tu nombre de usuario del Hub cuando estés ejecutando este código!).
 
-<Tip>
-
-✏️ **¡Inténtalo!** Usa tu nombre de usuario de Hugging Face Hub para obtener un token y crear un repositorio vacío llamado `github-issues`. Recuerda **nunca guardar tus credenciales** en Colab o cualquier otro repositorio, ya que esta información puede ser aprovechada por terceros.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa tu nombre de usuario de Hugging Face Hub para obtener un token y crear un repositorio vacío llamado `github-issues`. Recuerda **nunca guardar tus credenciales** en Colab o cualquier otro repositorio, ya que esta información puede ser aprovechada por terceros.
 
 Ahora clonemos el repositorio del Hub a nuestra máquina local y copiemos nuestro dataset ahí. 🤗 Hub incluye una clase `Repositorio` que envuelve muchos de los comandos comunes de Git, así que para clonar el repositorio remoto solamente necesitamos dar la URL y la ruta local en la que lo queremos clonar:
 
@@ -425,11 +413,8 @@ Dataset({
 
 ¡Genial, hemos subido el dataset al Hub y ya está disponible para que otras personas lo usen! Sólo hay una cosa restante por hacer: añadir una _tarjeta del dataset_ (_dataset card_) que explique cómo se creó el corpus y provea información útil para la comunidad.
 
-<Tip>
-
-💡 También puedes subir un dataset al Hub de Hugging Face directamente desde la terminal usando `huggingface-cli` y un poco de Git. Revisa la [guía de 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) para más detalles sobre cómo hacerlo.
-
-</Tip>
+> [!TIP]
+> 💡 También puedes subir un dataset al Hub de Hugging Face directamente desde la terminal usando `huggingface-cli` y un poco de Git. Revisa la [guía de 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) para más detalles sobre cómo hacerlo.
 
 ## Creando una tarjeta del dataset
 
@@ -451,16 +436,10 @@ Puedes crear el archivo *README.md* directamente desde el Hub y puedes encontrar
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **¡Inténtalo!** Usa la aplicación `dataset-tagging` y la [guía de 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) para completar el archivo *README.md* para tu dataset de issues de GitHub.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa la aplicación `dataset-tagging` y la [guía de 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) para completar el archivo *README.md* para tu dataset de issues de GitHub.
 
 ¡Eso es todo! Hemos visto que crear un buen dataset requiere de mucho esfuerzo de tu parte, pero afortunadamente subirlo y compartirlo con la comunidad no. En la siguiente sección usaremos nuestro nuevo dataset para crear un motor de búsqueda semántica con 🤗 Datasets que pueda emparejar preguntas con los issues y comentarios más relevantes.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Sigue los pasos descritos en esta sección para crear un dataset de issues de GitHub de tu librería de código abierto favorita (¡por supuesto, escoge algo distinto a 🤗 Datasets!). Para puntos extra, ajusta un clasificador de etiquetas múltiples para predecir las etiquetas presentes en el campo `labels`.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Sigue los pasos descritos en esta sección para crear un dataset de issues de GitHub de tu librería de código abierto favorita (¡por supuesto, escoge algo distinto a 🤗 Datasets!). Para puntos extra, ajusta un clasificador de etiquetas múltiples para predecir las etiquetas presentes en el campo `labels`.
diff --git a/chapters/es/chapter5/6.mdx b/chapters/es/chapter5/6.mdx
index d92a1333c..fde2e87b4 100644
--- a/chapters/es/chapter5/6.mdx
+++ b/chapters/es/chapter5/6.mdx
@@ -187,11 +187,8 @@ Dataset({
 
 ¡Esto nos ha dado varios miles de comentarios con los que trabajar!
 
-<Tip>
-
-✏️ **¡Inténtalo!** Prueba si puedes usar la función `Dataset.map()` para "explotar" la columna `comments` en `issues_dataset` _sin_ necesidad de usar Pandas. Esto es un poco complejo; te recomendamos revisar la sección de ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) de la documentación de 🤗 Datasets para completar esta tarea.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Prueba si puedes usar la función `Dataset.map()` para "explotar" la columna `comments` en `issues_dataset` _sin_ necesidad de usar Pandas. Esto es un poco complejo; te recomendamos revisar la sección de ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) de la documentación de 🤗 Datasets para completar esta tarea.
 
 Ahora que tenemos un comentario para cada fila, creemos una columna `comments_length` que contenga el número de palabras por comentario:
 
@@ -521,8 +518,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 ¡No está mal! El segundo comentario parece responder la pregunta.
 
-<Tip>
-
-✏️ **¡Inténtalo!** Crea tu propia pregunta y prueba si puedes encontrar una respuesta en los documentos devueltos. Puede que tengas que incrementar el parámetro `k` en `Dataset.get_nearest_examples()` para aumentar la búsqueda.
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️ **¡Inténtalo!** Crea tu propia pregunta y prueba si puedes encontrar una respuesta en los documentos devueltos. Puede que tengas que incrementar el parámetro `k` en `Dataset.get_nearest_examples()` para aumentar la búsqueda.
\ No newline at end of file
diff --git a/chapters/es/chapter6/2.mdx b/chapters/es/chapter6/2.mdx
index d4e629374..951da5b35 100644
--- a/chapters/es/chapter6/2.mdx
+++ b/chapters/es/chapter6/2.mdx
@@ -12,11 +12,8 @@ Si un modelo de lenguaje no está disponible en el lenguaje en el que estás int
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ ¡Entrenar un tokenizador no es lo mismo que entrenar un modelo! Entrenar un modelo utiliza `stochastic gradient descent` para minimizar la pérdida (`loss`) en cada lote (`batch`). Es un proceso aleatorio por naturaleza (lo que signifiva que hay que fijar semillas para poder obterner los mismos resultados cuando se realiza el mismo entrenamiento dos veces). Entrenar un tokenizador es un proceso estadístico que intenta identificar cuales son las mejores subpalabras para un corpus dado, y las reglas exactas para elegir estas subpalabras dependen del algoritmo de tokenización. Es un proceso deterministico, lo que significa que siempre se obtienen los mismos resultados al entrenar el mismo algoritmo en el mismo corpus. 
-
-</Tip>
+> [!WARNING]
+> ⚠️ ¡Entrenar un tokenizador no es lo mismo que entrenar un modelo! Entrenar un modelo utiliza `stochastic gradient descent` para minimizar la pérdida (`loss`) en cada lote (`batch`). Es un proceso aleatorio por naturaleza (lo que significa que hay que fijar semillas para poder obtener los mismos resultados cuando se realiza el mismo entrenamiento dos veces). Entrenar un tokenizador es un proceso estadístico que intenta identificar cuáles son las mejores subpalabras para un corpus dado, y las reglas exactas para elegir estas subpalabras dependen del algoritmo de tokenización. Es un proceso determinístico, lo que significa que siempre se obtienen los mismos resultados al entrenar el mismo algoritmo en el mismo corpus.
 
 ## Ensamblando un Corpus[[assembling-a-corpus]]
 
diff --git a/chapters/es/chapter6/3.mdx b/chapters/es/chapter6/3.mdx
index 65392631a..41a30b725 100644
--- a/chapters/es/chapter6/3.mdx
+++ b/chapters/es/chapter6/3.mdx
@@ -38,11 +38,8 @@ En la siguiente discusión, a menudo haremos la diferencia entre un tokenizador
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ Al tokenizar una sóla oración, no siempre verás una diferencia de velocidad entre la versión lenta y la rápida del mismo tokenizador. De hecho, las versión rápida podría incluso ser más lenta! Es sólo cuando se tokenizan montones de textos en paralelos al mismo tiempo que serás capaz de ver claramente la diferencia.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Al tokenizar una sola oración, no siempre verás una diferencia de velocidad entre la versión lenta y la rápida del mismo tokenizador. De hecho, ¡la versión rápida podría incluso ser más lenta! Solo cuando se tokenizan montones de textos en paralelo al mismo tiempo serás capaz de ver claramente la diferencia.
 
 ## Codificación en Lotes (Batch Encoding)[[batch-encoding]]
 
@@ -110,13 +107,10 @@ encoding.word_ids()
 
 Podemos ver que los tokens especiales del tokenizador `[CLS]` y `[SEP]` están mapeados a `None`, y que cada token está mapeado a la palabra de la cual se origina. Esto es especialmente útil para determinar si el token está al inicio de la palabra o si dos tokens están en la misma palabra. POdríamos confiar en el prefijo `[CLS]` and `[SEP]` para eso, pero eso sólo funciona para tokenizadores tipo BERT; este método funciona para cualquier tipo de tokenizador mientras sea de tipo rápido. En el próximo capítulo, veremos como podemos usar esta capacidad para aplicar etiquetas para cada palabra de manera apropiada en tareas como Reconocimiento de Entidades (Named Entity Recognition NER), y etiquetado de partes de discurso (part-of-speech POS tagging). También podemos usarlo para enmascarar todos los tokens que provienen de la misma palabra en masked language modeling (una técnica llamada _whole word masking_).
 
-<Tip>
-
-La noción de qué es una palabra es complicada. Por ejemplo "I'll" (la contracción de "I will" en inglés) ¿cuenta como una o dos palabras? De hecho depende del tokenizador y la operación de pretokenización que aplica. Algunos tokenizadores sólo separan en espacios, por lo que considerarán esto como una sóla palabra. Otros utilizan puntuación por sobre los espacios, por lo que lo considerarán como dos palabras. 
-
-✏️ **Inténtalo!** Crea un tokenizador a partir de los checkpoints `bert-base-cased` y `roberta-base` y tokeniza con ellos "81s". ¿Qué observas? Cuál son los IDs de la palabra?
-
-</Tip>
+> [!TIP]
+> La noción de qué es una palabra es complicada. Por ejemplo, "I'll" (la contracción de "I will" en inglés) ¿cuenta como una o dos palabras? De hecho, depende del tokenizador y de la operación de pretokenización que aplica. Algunos tokenizadores sólo separan en espacios, por lo que considerarán esto como una sola palabra. Otros utilizan la puntuación además de los espacios, por lo que lo considerarán como dos palabras.
+>
+> ✏️ **¡Inténtalo!** Crea un tokenizador a partir de los checkpoints `bert-base-cased` y `roberta-base` y tokeniza con ellos "81s". ¿Qué observas? ¿Cuáles son los IDs de palabra?
 
 De manera similar está el método `sentence_ids()` que podemos utilizar para mapear un token a la oración de la cuál proviene (aunque en este caso el `token_type_ids` retornado por el tokenizador puede darnos la misma información).
 
@@ -133,11 +127,8 @@ Sylvain
 
 Como mencionamos previamente, todo esto funciona gracias al hecho de que los tokenizadores rápidos llevan registro de la porción de texto del que cada token proviene en una lista de *offsets*. Para ilustrar sus usos, a continuación mostraremos como replicar los resultados del pipeline de `clasificación de tokens` de manera manual.
 
-<Tip>
-
-✏️ **Inténtalo!** Crea tu propio texto de ejemplo y ve si puedes entender qué tokens están asociados con el ID de palabra, y también cómo extraer los caracteres para una palabra. Como bonus, intenta usar dos oraciones como entrada/input y ve si los IDs de oraciones te hacen sentido.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Crea tu propio texto de ejemplo y ve si puedes entender qué tokens están asociados con cada ID de palabra, y también cómo extraer los caracteres de una palabra. Como bonus, intenta usar dos oraciones como entrada (input) y ve si los IDs de oraciones tienen sentido para ti.
 
 ## Dentro del Pipeline de `clasificación de tokens`[[inside-the-token-classification-pipeline]]
 
diff --git a/chapters/es/chapter6/3b.mdx b/chapters/es/chapter6/3b.mdx
index df6d509fd..f4f18634d 100644
--- a/chapters/es/chapter6/3b.mdx
+++ b/chapters/es/chapter6/3b.mdx
@@ -276,11 +276,8 @@ No estamos listos aún, pero al menos ya tenemos el puntaje correcto para la res
 0.97773
 ```
 
-<Tip>
-
-✏️ **Inténtalo!** Calcula los índices de inicio y término para las cinco respuestas más probables.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Calcula los índices de inicio y término para las cinco respuestas más probables.
 
 Tenemos el `start_index` y el `end_index` de la respuesta en términos de tokens, así que ahora sólo necesitamos convertirlos en los índices de caracteres en el contexto. Aquí es donde los offsets serán sumamente útiles. Podemos tomarlos y usarlos como lo hicimos en la tarea de clasificación de tokens:
 
@@ -314,11 +311,8 @@ print(result)
 
 Genial! Obtuvimos lo mismo que en nuestro primer ejemplo!
 
-<Tip>
-
-✏️ **Inténtalo!** Usaremos los mejores puntajes calculados anteriormente para mostrar las cinco respuestas más probables. Para revisar nuestros resultados regresa al primer pipeline y agrega `top_k=5` al llamarlo.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa los mejores puntajes calculados anteriormente para mostrar las cinco respuestas más probables. Para revisar tus resultados, regresa al primer pipeline y agrega `top_k=5` al llamarlo.
 
 ## Manejando contextos largos[[handling-long-contexts]]
 
@@ -609,11 +603,8 @@ print(candidates)
 
 Estos dos candidatos corresponden a las mejores respuestas que el modelo fue capaz de encontrar en cada trozo. El modelo está mucho más confiado de que la respuesta correcta está en la segunda parte (¡lo que es una buena señal!). Ahora sólo tenemos que mapear dichos tokens a los caracteres en el contexto (sólo necesitamos mapear la segunda para obtener nuestra respuesta, pero es interesante ver que el modelo ha elegido en el primer trozo).
 
-<Tip>
-
-✏️ **Inténtalo!** Adapta el código de arriba pra retornar los puntajes de las 5 respuestas más probables (en total, no por trozo).
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Adapta el código de arriba para retornar los puntajes de las 5 respuestas más probables (en total, no por trozo).
 
 Los `offsets` que tomamos antes es en realidad una lista de offsets, con una lista por trozo de texto:
 
@@ -634,10 +625,7 @@ for candidate, offset in zip(candidates, offsets):
 
 Si ignoramos el primer resultado, obtenemos el mismo resultado que nuestro pipeline para el contexto largo -- bien!
 
-<Tip>
-
-✏️ **Inténtalo!** Usa los mejores puntajes que calculaste antes para mostrar las 5 respuestas más probables. Para revisar tus resultados, regresa al primer pipeline y agrega `top_k=5` al llamarlo.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Usa los mejores puntajes que calculaste antes para mostrar las 5 respuestas más probables. Para revisar tus resultados, regresa al primer pipeline y agrega `top_k=5` al llamarlo.
 
 Esto concluye nuestra profundización en las capacidades de los tokenizadores. Pondremos todo esto en práctica de nuevo en el siguiente capítulo, cuando te mostremos cómo hacer fine-tuning a un modelo en una variedad de tareas comunes de PLN.
diff --git a/chapters/es/chapter6/4.mdx b/chapters/es/chapter6/4.mdx
index c020896f7..44b7aa2c6 100644
--- a/chapters/es/chapter6/4.mdx
+++ b/chapters/es/chapter6/4.mdx
@@ -47,11 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 En este ejemplo, dado que elegimos el punto de control (checkpoint) `bert-base-uncased`, la normalización aplicó transformación a minúsculas y remoción de acentos. 
 
-<Tip>
-
-✏️ **Inténtalo!** Carga un tokenizador desde el punto de control (checkpoint)`bert-base-cased` y pásale el mismo ejemplo. Cuáles son las principales diferencias que puedes ver entre las versiones cased y uncased de los tokenizadores?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Inténtalo!** Carga un tokenizador desde el punto de control (checkpoint) `bert-base-cased` y pásale el mismo ejemplo. ¿Cuáles son las principales diferencias que puedes ver entre las versiones cased y uncased de los tokenizadores?
 
 ## Pre-tokenización[[pre-tokenization]]
 
diff --git a/chapters/es/chapter6/5.mdx b/chapters/es/chapter6/5.mdx
index c8bd82898..f579601c3 100644
--- a/chapters/es/chapter6/5.mdx
+++ b/chapters/es/chapter6/5.mdx
@@ -11,11 +11,8 @@ La codificación por pares de byte (Byte-Pair Encoding (BPE)) fue inicialmente d
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 Esta sección cubre BPE en produndidad, yendo tan lejos como para mostrar una implementación completa. Puedes saltarte hasta el final si sólo quieres una descripción general del algoritmo de tokenización. 
-
-</Tip>
+> [!TIP]
+> 💡 Esta sección cubre BPE en profundidad, yendo tan lejos como para mostrar una implementación completa. Puedes saltarte hasta el final si sólo quieres una descripción general del algoritmo de tokenización.
 
 ## Algoritmo de Entrenamiento[[training-algorithm]]
 
@@ -28,11 +25,8 @@ El entrenamiento de BPE comienza calculando el conjunto de palabras únicas usad
 
 El vocabulario vase entonces será `["b", "g", "h", "n", "p", "s", "u"]`. Para casos reales, el vocabulario base contendrá todos los caracteres ASCII, al menos, y probablemente algunos caracteres Unicode también. Si un ejemplo que estás tokenizando usa un caracter que no está en el corpus de entrenamiento, ese caracter será convertido al token "desconocido". Esa es una razón por la cual muchos modelos de NLP son muy malos analizando contenido con emojis.
 
-<Tip>
-
-Los tokenizadores de GPT-2 y RoBERTa (que son bastante similares) tienen una manera bien inteligente de lidiar con esto: ellos no miran a las palabras como si estuvieran escritas con caracteres Unicode, sino con bytes. De esa manera el vocabulario base tiene un tamaño pequeño (256), pero cada caracter que te puedas imaginar estará incluido y no terminará convertido en el token "desconocido". Este truco se llama *byte-level BPE*.
-
-</Tip>
+> [!TIP]
+> Los tokenizadores de GPT-2 y RoBERTa (que son bastante similares) tienen una manera muy inteligente de lidiar con esto: no miran las palabras como si estuvieran escritas con caracteres Unicode, sino con bytes. De esa manera el vocabulario base tiene un tamaño pequeño (256), pero cada carácter que te puedas imaginar estará incluido y no terminará convertido en el token "desconocido". Este truco se llama *byte-level BPE*.
 
 Luego de obtener el vocabulario base, agregamos nuevos tokens hasta que el tamaño deseado del vocabulario se alcance por medio de aprender *fusiones* (merges), las cuales son reglas para fusionar dos elementos del vocabulario existente en uno nuevo. Por lo que al inicio de estas fusiones crearemos tokens con dos caracteres, y luego, a medida que el entrenamiento avance, subpalabras más largas.
 
@@ -75,11 +69,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 Y continuamos así hasta que alcancemos el tamaño deseado del vocabulario.
 
-<Tip>
-
-✏️ **Ahora es tu turno!** Cuál crees que será la siguiente regla de fusión?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Ahora es tu turno!** ¿Cuál crees que será la siguiente regla de fusión?
 
 ## Algoritmo de Tokenización[[tokenization-algorithm]]
 
@@ -99,11 +90,8 @@ Tomemos el ejemplo que usamos durante el entrenamiento, con las tres reglas de f
 ```
 La palabra `"bug"` será tokenizada como `["b", "ug"]`. En cambio, `"mug"`, será tokenizado como `["[UNK]", "ug"]` dado que la letra `"m"` no fue parte del vocabulario base. De la misma manera, la palabra `"thug"` será tokenizada como `["[UNK]", "hug"]`: la letra `"t"` no está en el vocabulario base, y aplicando las reglas de fusión resulta primero la fusión de `"u"` y `"g"` y luego de `"hu"` and `"g"`.
 
-<Tip>
-
-✏️ **Ahora es tu turno!** ¿Cómo crees será tokenizada la palabra `"unhug"`?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Ahora es tu turno!** ¿Cómo crees que será tokenizada la palabra `"unhug"`?
 
 ## Implementando BPE[[implementing-bpe]]
 
@@ -315,11 +303,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 Usar `train_new_from_iterator()` en el mismo corpus no resultará en exactament el mismo vocabulario. Esto es porque cuando hay una elección del par más frecuente, seleccionamos el primero encontrado, mientras que la librería 🤗 Tokenizers selecciona el primero basado en sus IDs internos. 
-
-</Tip>
+> [!TIP]
+> 💡 Usar `train_new_from_iterator()` en el mismo corpus no resultará en exactamente el mismo vocabulario. Esto es porque, cuando hay varios candidatos a par más frecuente, nosotros seleccionamos el primero encontrado, mientras que la librería 🤗 Tokenizers selecciona el primero basándose en sus IDs internos.
 
 Para tokenizar un nuevo texto lo pre-tokenizamos, lo separamos, luego aplicamos todas las reglas de fusión aprendidas:
 
@@ -351,10 +336,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ Nuestra implementación arrojará un error si hay un caracter desconocido dado que no hicimos nada para manejarlos. GPT-2 en realidad no tiene un token desconocido (es imposible obtener un caracter desconocido cuando se usa byte-level BPE), pero esto podría ocurrir acá porque no incluímos todos los posibles bytes en el vocabulario inicial. Este aspectode BPE va más allá del alcance de está sección, por lo que dejaremos los detalles fuera. 
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nuestra implementación arrojará un error si hay un carácter desconocido, dado que no hicimos nada para manejarlos. GPT-2 en realidad no tiene un token desconocido (es imposible obtener un carácter desconocido cuando se usa byte-level BPE), pero esto podría ocurrir acá porque no incluimos todos los posibles bytes en el vocabulario inicial. Este aspecto de BPE va más allá del alcance de esta sección, por lo que dejaremos los detalles fuera.
 
 Eso es todo para el algoritmo BPE! A continuación echaremos un vistazo a WordPiece.
\ No newline at end of file
diff --git a/chapters/es/chapter6/6.mdx b/chapters/es/chapter6/6.mdx
index bb5dd62e5..3f9c00404 100644
--- a/chapters/es/chapter6/6.mdx
+++ b/chapters/es/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece es el algoritmo de tokenización que Google desarrolló para pre-entre
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 Esta sección cubre WordPiece en profundidad, yendo tan lejos como para mostrar una implementación completa. Puedes saltarte hasta el final si sólo quieres una descripción general del algoritmo de tokenización. 
-
-</Tip>
+> [!TIP]
+> 💡 Esta sección cubre WordPiece en profundidad, yendo tan lejos como para mostrar una implementación completa. Puedes saltarte hasta el final si sólo quieres una descripción general del algoritmo de tokenización.
 
 ## Algoritmo de Entrenamiento[[training-algorithm]]
 
-<Tip warning={true}>
-
-⚠️ Google nunca liberó el código (open-sourced) su implementación del algoritmo de entrenamiento de WordPiece, por tanto lo que sigue es nuestra mejor suposición badado en la literatura publicada. Puede no ser 100% preciso.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google nunca liberó el código (open-sourced) de su implementación del algoritmo de entrenamiento de WordPiece, por tanto lo que sigue es nuestra mejor suposición basada en la literatura publicada. Puede no ser 100% preciso.
 
 Al igual que BPE, WordPiece comienza a partir de un pequeño vocabulario incluyendo los tokens especiales utilizados por el modelo y el alfabeto inicial. Dado que identifica subpalabras (subwords) agregando un prefijo (como `##` para el caso de BERT), cada palabra está inicialmente separada agregando dicho prefijo a todos los caracteres dentro de la palabra. Por lo que por ejemplo la palabra `"word"` queda separada así:
 
@@ -75,11 +69,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 y continuamos como esto hasta que alcancemos el tamaño de vocabulario deseado.
 
-<Tip>
-
-✏️ **Ahora es tu turno!** Cuál será la siguiente regla de fusioń?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Ahora es tu turno!** ¿Cuál será la siguiente regla de fusión?
 
 ## Algoritmo de Tokenización[[tokenization-algorithm]]
 
@@ -91,11 +82,8 @@ Como otro ejemplo, veamos como la palabra `"bugs"` sería tokenizado. `"b"` es l
 
 Cuando la tokenización llega a la etapa donde ya no es posible encontrar una subpalabra en el vocabulario, la palabra entera es tokenizada como desconocida -- Por ejemplo, `"mug"` sería tokenizada como `["[UNK]"]`, al igual que `"bum"` (incluso si podemos comenzar con `"b"` y `"##u"`, `"##m"` no está en el vocabulario, y la tokenización resultante será sólo `["[UNK]"]`, y no `["b", "##u", "[UNK]"]`). Este es otra diferencia con respecto a BPE, el cual sólo clasificaría los caracteres individuales que no están en el vocabulario como desconocido.
 
-<Tip>
-
-✏️ **Ahora es tu turno!** ¿Cómo se tokenizaría la palabra `"pugs"`?
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Ahora es tu turno!** ¿Cómo se tokenizaría la palabra `"pugs"`?
 
 ## Implementando WordPiece[[implementing-wordpiece]]
 
@@ -313,11 +301,8 @@ print(vocab)
 
 Como podemos ver, comparado con BPE, este tokenizador aprende partes de palabras como tokens un poco más rápido.
 
-<Tip>
-
-💡 Usar `train_new_from_iterator()` en el mismo corpus no resultará en exactamente el mismo vocabulario. Esto porque la librería 🤗 Tokenizers no implementa WordPiece para el entrenamiento (dado que no estamos completamente seguros de su funcionamiento interno), en vez de eso utiliza BPE.
-
-</Tip>
+> [!TIP]
+> 💡 Usar `train_new_from_iterator()` en el mismo corpus no resultará en exactamente el mismo vocabulario. Esto es porque la librería 🤗 Tokenizers no implementa WordPiece para el entrenamiento (dado que no estamos completamente seguros de su funcionamiento interno); en vez de eso, utiliza BPE.
 
 Para tokenizar un nuevo texto, lo pre-tokenizamos, lo separamos, y luego aplicamos el algoritmo de tokenización para cada palabra. Es decir, miramos la subpalabra más grande comenzando al inicio de la primera palabra y la separamos, luego repetimos el proceso en la segunda parte, y así pará el resto de dicha palabra y de las siguientes palabras en el texto:
 
diff --git a/chapters/es/chapter6/7.mdx b/chapters/es/chapter6/7.mdx
index 325e86c7c..87be84e3d 100644
--- a/chapters/es/chapter6/7.mdx
+++ b/chapters/es/chapter6/7.mdx
@@ -11,11 +11,8 @@ El algoritmo de Unigram es a menudo utilizado en SetencePiece, el cual es el alg
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 Esta sección cubre Unigram en profundidad, yendo tan lejos como para mostrar una implementación completa. Puedes saltarte hasta el final si sólo quieres una descripción general del algoritmo de tokenización. 
-
-</Tip>
+> [!TIP]
+> 💡 Esta sección cubre Unigram en profundidad, yendo tan lejos como para mostrar una implementación completa. Puedes saltarte hasta el final si sólo quieres una descripción general del algoritmo de tokenización.
 
 ## Algoritmo de Entrenamiento[[training-algorithm]]
 
@@ -56,11 +53,8 @@ Acá están las frecuencias de todas las posibles subpalabras en el vocabulario:
 
 Por lo que, la suma de todas las frecuencias es 210, y la probabilidad de la subpalabra `"ug"` es por lo tanto 20/210.
 
-<Tip>
-
-✏️ **Ahora es tu turno!** Escribe el código para calcular las frecuencias de arriba y chequea que los resultados mostrados son correctos, como también la suma total.
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Ahora es tu turno!** Escribe el código para calcular las frecuencias de arriba y comprueba que los resultados mostrados sean correctos, así como la suma total.
 
 Ahora, para tokenizar una palabra dada, miramos todas las posibles segmentaciones en tokens y calculamos la probabilidad de cada uno de acuerdo al modelo Unigram. Dado que todos los tokens se consideran como independientes, esta probabilidad es sólo el producto de la probabilidad de cada token. Por ejemplo, la tokenización `["p", "u", "g"]` de `"pug"` tiene como probabilidad: 
 
@@ -98,12 +92,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 Por lo tanto, `"unhug"` se tokenizaría como `["un", "hug"]`.
 
-<Tip>
-
-✏️ **Ahora es tu turno!** Determina la tokenización de la palabra `"huggun"`, y su puntaje
-
-
-</Tip>
+> [!TIP]
+> ✏️ **¡Ahora es tu turno!** Determina la tokenización de la palabra `"huggun"` y su puntaje.
 
 ## De vuelta al entrenamiento[[back-to-training]]
 
@@ -216,11 +206,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece usa un algoritmo más eficiente llamado Enhanced Suffix Array (ESA) para crear el vocabulario inicial. 
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece usa un algoritmo más eficiente llamado Enhanced Suffix Array (ESA) para crear el vocabulario inicial.
 
 A continuación, calculamos la suma de todas las frecuencias, para convertir las frecuencias en probabilidades. Para nuestro modelo, almacenaremos los logaritmos de las probabilidades, porque es numericamente más estable sumar logaritmos que multiplicar números pequeños, y esto simplificará el cálculo de la pérdida (`loss`) del modelo:
 
@@ -341,11 +328,8 @@ Dado que `"ll"` se usa en la tokenización de `"Hopefully"`, y removerlo nos har
 0.0
 ```
 
-<Tip>
-
-💡 Este acercamiento es muy ineficiente, por lo que SentencePiece usa una aproximación de la pérdida del modelo sin el token X: en vez de comenzar desde cero, sólo reemplaza el token X por su segmentación en el vocabulario que queda. De esta manera, todos los puntajes se pueden calcular de una sóla vez al mismo tiempo que la pérdida del modelo.
-
-</Tip>
+> [!TIP]
+> 💡 Este enfoque es muy ineficiente, por lo que SentencePiece usa una aproximación de la pérdida del modelo sin el token X: en vez de comenzar desde cero, sólo reemplaza el token X por su segmentación en el vocabulario que queda. De esta manera, todos los puntajes se pueden calcular de una sola vez, al mismo tiempo que la pérdida del modelo.
 
 Con todo esto en su lugar, lo último que necesitamos hacer es agregar los tokens especiales usados por el modelo al vocabulario, e iterar hasta haber podado suficientes tokens de nuestro vocabulario hasta alcanzar el tamaño deseado:
 
diff --git a/chapters/es/chapter6/8.mdx b/chapters/es/chapter6/8.mdx
index 0754b82d0..f6fdb1e43 100644
--- a/chapters/es/chapter6/8.mdx
+++ b/chapters/es/chapter6/8.mdx
@@ -111,13 +111,10 @@ Como hemos visto antes, podemos usar el método `normalize_str()` del `normalize
 hello how are u?
 ```
 
-<Tip>
-
-**Para ir más allá** Si pruebas las dos versiones de los normalizadores previos en un string conteniendo un caracter unicode `u"\u0085"`
-de seguro notarás que los dos normalizadores no son exactamente equivalentes.
-Para no sobre-complicar demasiado la version con `normalizers.Sequence`, no hemos incluido los reemplazos usando Expresiones Regulares (Regex) que el `BertNormalizer` requiere cuando el argumento `clean_text` se fija como `True` - lo cual es el comportamiento por defecto. Pero no te preocupes, es posible obtener la misma normalización sin usar el útil `BertNormalizer` agregando dos `normalizers.Replace` a la secuencia de normalizadores.
-
-</Tip>
+> [!TIP]
+> **Para ir más allá** Si pruebas las dos versiones de los normalizadores previos en un string que contenga el carácter Unicode `u"\u0085"`,
+> de seguro notarás que los dos normalizadores no son exactamente equivalentes.
+> Para no complicar demasiado la versión con `normalizers.Sequence`, no hemos incluido los reemplazos usando Expresiones Regulares (Regex) que el `BertNormalizer` requiere cuando el argumento `clean_text` se fija como `True`, lo cual es el comportamiento por defecto. Pero no te preocupes, es posible obtener la misma normalización sin usar el útil `BertNormalizer` agregando dos `normalizers.Replace` a la secuencia de normalizadores.
 
 A continuación está la etapa de pre-tokenización. De nuevo, hay un `BertPreTokenizer` pre-hecho que podemos usar:
 
diff --git a/chapters/es/chapter8/2.mdx b/chapters/es/chapter8/2.mdx
index 8386daa25..2f308f2ef 100644
--- a/chapters/es/chapter8/2.mdx
+++ b/chapters/es/chapter8/2.mdx
@@ -85,11 +85,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 Hay mucha información contenida en estos reportes, así que vamos a repasar juntos las partes clave. La primera cosa que notamos es que el _traceback_ debería ser leído de _abajo hacia arriba_. Esto puede sonar extraño si estás acostumbrado a leer en español de arriba hacia abajo, pero refleja el hecho de que el _traceback_ muestra la secuencia de funciones llamadas que el `pipeline` realiza al descargar el modelo y el tokenizador. (Ve al [Capítulo 2](/course/chapter2) para más detalles sobre cómo funciona el `pipeline` bajo el capó) 
 
-<Tip>
-
-🚨 ¿Ves el cuadro azul alrededor de "6 frames" en el traceback de Google Colab? Es una característica especial de Colab, que comprime el traceback en "frames". Si no puedes encontrar el origen de un error, asegúrate de ampliar el traceback completo haciendo clic en esas dos flechitas. 
-
-</Tip>
+> [!TIP]
+> 🚨 ¿Ves el cuadro azul alrededor de "6 frames" en el traceback de Google Colab? Es una característica especial de Colab, que comprime el traceback en "frames". Si no puedes encontrar el origen de un error, asegúrate de ampliar el traceback completo haciendo clic en esas dos flechitas.
 
 Esto significa que la última línea del traceback indica el último mensaje de error y nos da el nombre de la excepción (exception) que se ha generado. En este caso, el tipo de excepción es `OSError`, lo que indica un error relacionado con el sistema. Si leemos el mensaje de error que lo acompaña, podemos ver que parece haber un problema con el archivo *config.json* del modelo, y nos da dos sugerencias para solucionarlo:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Si te encuentras con un mensaje de error difícil de entender, simplemente copia y pega el mensaje en la barra de búsqueda de Google o de [Stack Overflow](https://stackoverflow.com/) (¡sí, en serio!). Es muy posible que no seas la primera persona en encontrar el error, y esta es una buena forma de hallar soluciones que otros miembros de la comunidad han publicado. Por ejemplo, al buscar `OSError: Can't load config for` en Stack Overflow se obtienen varios resultados que pueden ser utilizados como punto de partida para resolver el problema.
-
-</Tip>
+> [!TIP]
+> 💡 Si te encuentras con un mensaje de error difícil de entender, simplemente copia y pega el mensaje en la barra de búsqueda de Google o de [Stack Overflow](https://stackoverflow.com/) (¡sí, en serio!). Es muy posible que no seas la primera persona en encontrar el error, y esta es una buena forma de hallar soluciones que otros miembros de la comunidad han publicado. Por ejemplo, al buscar `OSError: Can't load config for` en Stack Overflow se obtienen varios resultados que pueden ser utilizados como punto de partida para resolver el problema.
 
 La primera sugerencia nos pide que comprobemos si el identificador del modelo es realmente correcto, así que lo primero es copiar el identificador y pegarlo en la barra de búsqueda del Hub:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 El enfoque que tomamos aquí no es infalible, ya que nuestro compañero puede haber cambiado la configuración de `distilbert-base-uncased` antes de ajustar (fine-tuning) el modelo. En la vida real, nos gustaría consultar con él primero, pero para los fines de esta sección asumiremos que usó la configuración predeterminada.
-
-</Tip>
+> [!WARNING]
+> 🚨 El enfoque que tomamos aquí no es infalible, ya que nuestro compañero puede haber cambiado la configuración de `distilbert-base-uncased` antes de ajustar (fine-tuning) el modelo. En la vida real, nos gustaría consultar con él primero, pero para los fines de esta sección asumiremos que usó la configuración predeterminada.
 
 Luego podemos enviar esto a nuestro repositorio del modelo con la función de configuración `push_to_hub()`: 
 
diff --git a/chapters/fa/chapter2/1.mdx b/chapters/fa/chapter2/1.mdx
index ad6bdd22e..ce64a09d0 100644
--- a/chapters/fa/chapter2/1.mdx
+++ b/chapters/fa/chapter2/1.mdx
@@ -21,10 +21,7 @@
 
 سپس نگاهی به API مربوط به توکِنایزر خواهیم داشت که بخش دیگر پیاده‌سازی تابع <span dir="ltr">pipeline()</span> است. توکِنایزرها مرحله اول و مرحله آخر پردازش را انجام می‌دهند که در طی آن‌ها داده‌های نوشتاری را به ورودی‌های عددی برای شبکه عصبی تبدیل نموده و هنگام نیاز باز داده‌های عددی را به نوشتار تبدیل می‌کنند. در انتها، به شما نشان خواهیم داد چگونه چندین جمله را همزمان در یک بتچ از پیش آماده شده از مدل عبور دهید و سپس فصل را با نگاهی نزدیک‌تر به تابع بالادستی <span dir="ltr">tokenizer()</span> به اتمام خواهیم برد.
 
-<Tip>
-
-⚠️ برای بهره بردن از تمامی ویژگی‌های موجود در هاب مدل‌ها و همچنین ترنسفورمرهای هاگینگ‌فِیس پیشنهاد می‌کنیم که <a href="https://huggingface.co/join"> حساب کاربری بسازید.</a>
-
-</Tip>
+> [!TIP]
+> ⚠️ برای بهره بردن از تمامی ویژگی‌های موجود در هاب مدل‌ها و همچنین ترنسفورمرهای هاگینگ‌فِیس پیشنهاد می‌کنیم که <a href="https://huggingface.co/join"> حساب کاربری بسازید.</a>
 
 </div>
diff --git a/chapters/fa/chapter2/2.mdx b/chapters/fa/chapter2/2.mdx
index 4066e727b..c0676b88e 100644
--- a/chapters/fa/chapter2/2.mdx
+++ b/chapters/fa/chapter2/2.mdx
@@ -23,9 +23,8 @@
 
 {/if}
 
-<Tip>
-این اولین بخشی است که محتوای آن بسته به اینکه از پایتورچ یا تِنسورفِلو استفاده می‌کنید کمی متفاوت است. از سویچ بالای صفحه برای انتخاب پلتفرمی که ترجیح می‌دهید استفاده کنید!
-</Tip>
+> [!TIP]
+> این اولین بخشی است که محتوای آن بسته به اینکه از پایتورچ یا تِنسورفِلو استفاده می‌کنید کمی متفاوت است. از سویچ بالای صفحه برای انتخاب پلتفرمی که ترجیح می‌دهید استفاده کنید!
 
 
 {#if fw === 'pt'}
@@ -434,11 +433,8 @@ model.config.id2label
 
 ما با موفقیت سه مرحله خط تولید را در اینجا نشان دادیم: پیش‌پردازش توسط توکِنایزرها، گذر ورودی‌ها از مدل و پس‌پردازش! اکنون زمان آن فرا رسیده که به شکلی عمیق‌تر وارد هر یک از این مراحل شویم.
 
-<Tip>
-
-✏️ **خودتان امتحان کنید!** دو نوشته از خودتان (یا حتی بیشتر) را از خط تولید `sentiment-analysis` بگذرانید. سپس مراحلی که در اینجا دیدیم را تکرار کنید و بررسی کنید که نتایج همان هستند!
-
-</Tip>
+> [!TIP]
+> ✏️ **خودتان امتحان کنید!** دو نوشته از خودتان (یا حتی بیشتر) را از خط تولید `sentiment-analysis` بگذرانید. سپس مراحلی که در اینجا دیدیم را تکرار کنید و بررسی کنید که نتایج همان هستند!
 
 </div>
 
diff --git a/chapters/fa/chapter3/2.mdx b/chapters/fa/chapter3/2.mdx
index b0702bcff..433823fe8 100644
--- a/chapters/fa/chapter3/2.mdx
+++ b/chapters/fa/chapter3/2.mdx
@@ -106,9 +106,8 @@ model.train_on_batch(batch, labels)
 
 <div dir="ltr">
 
-<Tip>
-⚠️ **هشدار** مطمئن شوید که `datasets` نصب شده است. برای اطمینان، دستور `pip install datasets` را اجرا کنید. سپس، مجموعه داده MRPC را بارگذاری کنید و آن را چاپ کنید تا ببینید چه چیزی در آن وجود دارد.
-</Tip> 
+> [!TIP]
+> ⚠️ **هشدار** مطمئن شوید که `datasets` نصب شده است. برای اطمینان، دستور `pip install datasets` را اجرا کنید. سپس، مجموعه داده MRPC را بارگذاری کنید و آن را چاپ کنید تا ببینید چه چیزی در آن وجود دارد. 
 ```py
 from datasets import load_dataset
 
@@ -188,9 +187,8 @@ raw_train_dataset.features
 
 در پشت صحنه، `label` از نوع `ClassLabel` می‌باشد، و نگاشت اعداد صحیح به نام برچسب در پوشه‌ *names* ذخیره شده است. `0` مربوط به `not_equivalent` و `1` مربوط به `equivalent` می‌باشد.
 
-<Tip>
-✏️ **امتحان کنید!** عنصر شماره ۱۵ از مجموعه `training` و عنصر شماره ۸۷ از مجموعه `validation` را مشاهده کنید. برچسب‌های آنها چیست؟
-</Tip>
+> [!TIP]
+> ✏️ **امتحان کنید!** عنصر شماره ۱۵ از مجموعه `training` و عنصر شماره ۸۷ از مجموعه `validation` را مشاهده کنید. برچسب‌های آنها چیست؟
 
 ### پیش‌پردازش دیتاسِت‌‌ها
 
@@ -240,11 +238,8 @@ inputs
 
 در [فصل ۲](/course/chapter2) در مورد کلیدهای `input_ids` و `attention_mask` بحث کردیم، اما از گفتگو در مورد `token_type_ids` اجتناب کردیم. در این مثال این همان چیزی است که به مدل می‌گوید کدام بخش از ورودی جمله اول و کدام بخش جمله دوم است.
 
-<Tip>
-
-✏️ **امتحان کنید!** عنصر شماره ۱۵ از مجموعه `training` را بردارید و دو جمله را به صورت جداگانه و جفت توکِن کنید. تفاوت دو نتیجه چیست؟
-
-</Tip>
+> [!TIP]
+> ✏️ **امتحان کنید!** عنصر شماره ۱۵ از مجموعه `training` را بردارید و دو جمله را به صورت جداگانه و جفت توکِن کنید. تفاوت دو نتیجه چیست؟
 
 اگر شناسه‌های داخل `input_ids` را به کلمات کدگشایی کنیم:
 
@@ -464,11 +459,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ **امتحان کنید!** پروسه پیش‌پردازش را روی دیتاسِت GLUE SST-2 باز تکرار کنید. از آنجایی که این مجموعه به جای دو جمله‌ها شامل تک جمله‌ها می‌باشد این کار کمی متفاوت است، اما بقیه کارهایی که انجام داده‌ایم باید یکسان به نظر برسند. برای یک چالش مشکل‌تر، سعی کنید تابع پیش‌پردازشی بنویسید که برای همه مسئله‌های GLUE کار کند.
-
-</Tip>
+> [!TIP]
+> ✏️ **امتحان کنید!** پروسه پیش‌پردازش را روی دیتاسِت GLUE SST-2 باز تکرار کنید. از آنجایی که این مجموعه به جای دو جمله‌ها شامل تک جمله‌ها می‌باشد این کار کمی متفاوت است، اما بقیه کارهایی که انجام داده‌ایم باید یکسان به نظر برسند. برای یک چالش مشکل‌تر، سعی کنید تابع پیش‌پردازشی بنویسید که برای همه مسئله‌های GLUE کار کند.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/fa/chapter3/3.mdx b/chapters/fa/chapter3/3.mdx
index aa64dc157..3126fa20b 100644
--- a/chapters/fa/chapter3/3.mdx
+++ b/chapters/fa/chapter3/3.mdx
@@ -54,11 +54,8 @@ training_args = TrainingArguments("test-trainer")
 
 </div>
 
-<Tip>
-
-💡 اگر مایلید مدل‌تان را به صورت خودکار در حین تعلیم در هاب بارگذاری کنید، پارامتر `push_to_hub=True` را در `TrainingArguments` ارسال کنید. در [فصل ۴](/course/chapter4/3) در این باره بیشتر خواهیم آموخت.
-
-</Tip>
+> [!TIP]
+> 💡 اگر مایلید مدل‌تان را به صورت خودکار در حین تعلیم در هاب بارگذاری کنید، پارامتر `push_to_hub=True` را در `TrainingArguments` ارسال کنید. در [فصل ۴](/course/chapter4/3) در این باره بیشتر خواهیم آموخت.
 
 مرحله دوم تعریف مدل‌مان می‌باشد. مانند [فصل قبل](/course/chapter2)، از کلاس `AutoModelForSequenceClassification` با دو برچسب کلاس استفاده خواهیم کرد:
 
@@ -212,10 +209,7 @@ trainer.train()
 
 این پایان مقدمه‌ای بر کوک کردن با استفاده از `Trainer` API می‌باشد. در [فصل ۷](/course/chapter7) مثالی برای نشان دادن چگونگی انجام این کار برای معمول‌ترین مسئله‌های NLP ارائه خواهیم کرد، اما اکنون اجازه دهید ببینیم چگونه همین کار را صرفا با استفاده از PyTorch انجام دهیم. 
 
-<Tip>
-
-✏️ **اتحان کنید!** با استفاده از پردازش داده‌ای که در بخش ۲ انجام دادید، مدلی را روی دیتاسِت GLUE SST-2 کوک کنید.
-
-</Tip>
+> [!TIP]
+> ✏️ **امتحان کنید!** با استفاده از پردازش داده‌ای که در بخش ۲ انجام دادید، مدلی را روی دیتاسِت GLUE SST-2 کوک کنید.
 
 </div>
\ No newline at end of file
diff --git a/chapters/fa/chapter3/3_tf.mdx b/chapters/fa/chapter3/3_tf.mdx
index fb49d492f..13125f967 100644
--- a/chapters/fa/chapter3/3_tf.mdx
+++ b/chapters/fa/chapter3/3_tf.mdx
@@ -81,11 +81,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_lab
 
 برای کوک‌ کردن مدل روی دِیتاسِت‌مان، ما فقط باید مدل را <span dir="ltr">`compile()`</span> کنیم و سپس داده‌مان را به تابع <span dir="ltr">`fit()`</span> ارسال کنیم. این کار فرایند کوک‌ کردن را شروع می‌کند (که باید چند دقیقه روی GPU طول بکشد) و در همین حین هزینه `training` و هزینه `validation` را در انتهای هر epoch گزارش می‌دهد.
 
-<Tip>
-
-توجه داشته باشید که مدل‌های ترَنسفورمِر هاگینگ‌فِیس قابلیت ویژه‌ای دارند که بسیاری از مدل‌های کِراس ندارند - آنها می‌توانند به صورت خودکار از یک تابع هزینه مناسب که به صورت داخلی محاسبه می‌کنند استفاده کنند. در صورتی که شما آرگومانی برای تابع هزینه در زمان <span dir="ltr">`compile()`</span> تعیین نکنید آنها از این تابع هزینه به صورت پیش‌فرض استفاده خواهند کرد. توجه داشته باشید که جهت استفاده از تابع هزینه داخلی شما نیاز خواهید داشت برچسب دسته‌های خودتان را به عنوان بخشی از ورودی، نه به صورت یک برچسب دسته مجزا که روش معمول استفاده از برچسب دسته‌ها در مدل‌های کِراس می‌باشد، ارسال کنید. شما مثال‌هایی از این را در بخش ۲ این درس خواهید دید، جایی که تعیین تابع هزینه‌ی درست می‌تواند تا اندازه‌ای پیچیده باشد. به هر حال، برای دسته‌بندی رشته‌‌‌ها، یک تابع هزینه استانداد کِراس به خوبی کار می‌کند، چیزی که ما در اینجا استفاده خواهیم کرد.
-
-</Tip>
+> [!TIP]
+> توجه داشته باشید که مدل‌های ترَنسفورمِر هاگینگ‌فِیس قابلیت ویژه‌ای دارند که بسیاری از مدل‌های کِراس ندارند - آنها می‌توانند به صورت خودکار از یک تابع هزینه مناسب که به صورت داخلی محاسبه می‌کنند استفاده کنند. در صورتی که شما آرگومانی برای تابع هزینه در زمان <span dir="ltr">`compile()`</span> تعیین نکنید آنها از این تابع هزینه به صورت پیش‌فرض استفاده خواهند کرد. توجه داشته باشید که جهت استفاده از تابع هزینه داخلی شما نیاز خواهید داشت برچسب دسته‌های خودتان را به عنوان بخشی از ورودی، نه به صورت یک برچسب دسته مجزا که روش معمول استفاده از برچسب دسته‌ها در مدل‌های کِراس می‌باشد، ارسال کنید. شما مثال‌هایی از این را در بخش ۲ این درس خواهید دید، جایی که تعیین تابع هزینه‌ی درست می‌تواند تا اندازه‌ای پیچیده باشد. به هر حال، برای دسته‌بندی رشته‌ها، یک تابع هزینه استاندارد کِراس به خوبی کار می‌کند، چیزی که ما در اینجا استفاده خواهیم کرد.
 
 <div dir="ltr">
 
@@ -105,11 +102,8 @@ model.fit(
 
 </div>
 
-<Tip warning={true}>
-
-در اینجا توجه شما را به یک مسئله عام جلب می‌کنیم - شما *می‌توانید* فقط نام تابع هزینه را به صورت یک متغیر متنی برای کِراس ارسال کنید، اما کِراس به صورت پیش‌فرض فکر می‌کند شما یک لایه softmax از پیش به خروجی‌تان اعمال کرده‌اید. با این حال، بسیاری از مدل‌ها مقادیر را درست قبل از اینکه softmax به آنها اعمال شود به خروجی می‌دهند، که همچنین به عنوان *logits* شناخته می‌شوند. ما نیاز داریم که به تابع هزینه بگوییم، این کاری است که مدل‌مان انجام می‌دهد و تنها راه گفتن آن این است که به جای ارسال نام تابع هزینه به صورت متغیر متنی، آن را به صورت مستقیم صدا بزنیم.
-
-</Tip>
+> [!WARNING]
+> در اینجا توجه شما را به یک مسئله عام جلب می‌کنیم - شما *می‌توانید* فقط نام تابع هزینه را به صورت یک متغیر متنی برای کِراس ارسال کنید، اما کِراس به صورت پیش‌فرض فکر می‌کند شما یک لایه softmax از پیش به خروجی‌تان اعمال کرده‌اید. با این حال، بسیاری از مدل‌ها مقادیر را درست قبل از اینکه softmax به آنها اعمال شود به خروجی می‌دهند، که همچنین به عنوان *logits* شناخته می‌شوند. ما نیاز داریم که به تابع هزینه بگوییم، این کاری است که مدل‌مان انجام می‌دهد و تنها راه گفتن آن این است که به جای ارسال نام تابع هزینه به صورت متغیر متنی، آن را به صورت مستقیم صدا بزنیم.
 
 ### بهبود کارایی تعلیم
 
@@ -140,11 +134,8 @@ opt = Adam(learning_rate=lr_scheduler)
 
 </div>
 
-<Tip>
-
-کتابخانه ترنسفورمرهای هاگینگ‌فِیس همچنین یک تابع <span dir="ltr">`create_optimizer()`</span> دارد که بهینه‌سازی از نوع `AdamW`، دارای میزان کاهش نرخ یادگیری می‌سازد. این یک میان‌بر مناسب است که آن‌ را با جزئیات در بخش‌های بعدی این آموزش خواهید دید.
-
-</Tip>
+> [!TIP]
+> کتابخانه ترنسفورمرهای هاگینگ‌فِیس همچنین یک تابع <span dir="ltr">`create_optimizer()`</span> دارد که بهینه‌سازی از نوع `AdamW`، دارای میزان کاهش نرخ یادگیری می‌سازد. این یک میان‌بر مناسب است که آن‌ را با جزئیات در بخش‌های بعدی این آموزش خواهید دید.
 
 اکنون بهینه‌ساز کاملا جدیدمان را در اختیار داریم و می‌توانیم آن را تعلیم دهیم. ابتدا، اجازه دهید مدل را مجددا بارگذاری کنیم تا تغییرات ایجاد شده بر وزنها که در تعلیم قبلی اعمال شده‌اند را به حالت اولیه بازگردانیم، سپس می‌توانیم مدل را با بهینه ساز جدید تدوین کنیم: 
 
@@ -171,11 +162,8 @@ model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 </div>
 
 
-<Tip>
-
-💡 اگر مایلید مدلتان را در حین تعلیم به صورت خودکار در هاب بارگذاری کنید، می‌توانید پارامتر `PushToHubCallback` را در تابع <span dir="ltr">`model.fit()`</span> ارسال کنید. در [فصل ۴](/course/chapter4/3) در این مورد بیشتر خواهیم آموخت. 
-
-</Tip>
+> [!TIP]
+> 💡 اگر مایلید مدلتان را در حین تعلیم به صورت خودکار در هاب بارگذاری کنید، می‌توانید پارامتر `PushToHubCallback` را در تابع <span dir="ltr">`model.fit()`</span> ارسال کنید. در [فصل ۴](/course/chapter4/3) در این مورد بیشتر خواهیم آموخت.
 
 ### پیش‌بینی‌های مدل
 
diff --git a/chapters/fa/chapter4/2.mdx b/chapters/fa/chapter4/2.mdx
index e47a6db83..2d1f9ceec 100644
--- a/chapters/fa/chapter4/2.mdx
+++ b/chapters/fa/chapter4/2.mdx
@@ -116,8 +116,7 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 
 {/if}
 
-<Tip>
-هنگامی که مدلی از پیش تعلیم دیده را استفاده می‌کنید، حتما بررسی کنید که این تعلیم چگونه و روی چه دیتاسِت‌هایی صورت پذیرفته و چه محدودیت‌ها و سوگیری‌هایی را شامل می‌شود. تمامی این اطلاعات می‌بایست در صفحه توضیحات مدل نشان داده شوند.
-</Tip>
+> [!TIP]
+> هنگامی که مدلی از پیش تعلیم دیده را استفاده می‌کنید، حتما بررسی کنید که این تعلیم چگونه و روی چه دیتاسِت‌هایی صورت پذیرفته و چه محدودیت‌ها و سوگیری‌هایی را شامل می‌شود. تمامی این اطلاعات می‌بایست در صفحه توضیحات مدل نشان داده شوند.
 
 </div>
diff --git a/chapters/fr/chapter1/3.mdx b/chapters/fr/chapter1/3.mdx
index 810112fe2..0d528063c 100644
--- a/chapters/fr/chapter1/3.mdx
+++ b/chapters/fr/chapter1/3.mdx
@@ -11,11 +11,10 @@
 
 Dans cette section, nous allons voir ce que peuvent faire les *transformers* et utiliser notre premier outil de la bibliothèque 🤗 *Transformers* : la fonction `pipeline()`.
 
-<Tip>
-👀 Vous voyez ce bouton <em>Open in Colab</em> en haut à droite ? Cliquez dessus pour ouvrir un <i>notebook</i> Colab avec tous les exemples de code de cette section. Ce bouton sera présent dans n'importe quelle section contenant des exemples de code.
-
-Si vous souhaitez exécuter les codes en local, nous vous recommandons de jeter un œil au chapitre <a href="/course/fr/chapter0">configuration</a>.
-</Tip>
+> [!TIP]
+> 👀 Vous voyez ce bouton <em>Open in Colab</em> en haut à droite ? Cliquez dessus pour ouvrir un <i>notebook</i> Colab avec tous les exemples de code de cette section. Ce bouton sera présent dans n'importe quelle section contenant des exemples de code.
+>
+> Si vous souhaitez exécuter les codes en local, nous vous recommandons de jeter un œil au chapitre <a href="/course/fr/chapter0">configuration</a>.
 
 ## Les <i>transformers</i> sont partout !
 
@@ -25,9 +24,8 @@ Les *transformers* sont utilisés pour résoudre toute sorte de tâches de NLP c
 
 La bibliothèque [🤗 *Transformers*](https://github.com/huggingface/transformers) fournit toutes les fonctionnalités nécessaires pour créer et utiliser les modèles partagés. Le [*Hub*](https://huggingface.co/models) contient des milliers de modèles pré-entraînés que n'importe qui peut télécharger et utiliser. Vous pouvez également transférer vos propres modèles vers le *Hub* !
 
-<Tip>
-	⚠️ Le <i>Hub</i> n'est pas limité aux <i>transformers</i>. Tout le monde peut partager n'importe quel modèle ou jeu de données s'il le souhaite ! <a href="https://huggingface.co/join">Créez un compte sur huggingface.co</a> pour bénéficier de toutes les fonctionnalités disponibles !
-</Tip>
+> [!TIP]
+> ⚠️ Le <i>Hub</i> n'est pas limité aux <i>transformers</i>. Tout le monde peut partager n'importe quel modèle ou jeu de données s'il le souhaite ! <a href="https://huggingface.co/join">Créez un compte sur huggingface.co</a> pour bénéficier de toutes les fonctionnalités disponibles !
 
 Avant de découvrir en détail comment les *transformers* fonctionnent, nous allons voir quelques exemples de comment ils peuvent être utilisés pour résoudre des problèmes intéressants de NLP.
 
@@ -113,11 +111,8 @@ classifier(
 
 Ce pipeline est appelé _zero-shot_ car vous n'avez pas besoin d'entraîner spécifiquement le modèle sur vos données pour l'utiliser. Il peut directement retourner des scores de probabilité pour n'importe quel ensemble de labels que vous choisissez !
 
-<Tip>
-
-✏️ **Essayez !** Jouez avec vos propres séquences et labels et voyez comment le modèle fonctionne.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Jouez avec vos propres séquences et labels et voyez comment le modèle fonctionne.
 
 
 ## Génération de texte
@@ -148,11 +143,8 @@ generator(
 
 Il est possible de contrôler le nombre de séquences générées avec l'argument `num_return_sequences` et la longueur totale du texte généré avec l'argument `max_length`.
 
-<Tip>
-
-✏️ **Essayez !** Utilisez les arguments `num_return_sequences` et `max_length` pour générer deux phrases de 15 mots chacune.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez les arguments `num_return_sequences` et `max_length` pour générer deux phrases de 15 mots chacune.
 
 
 ## Utiliser n'importe quel modèle du <i>Hub</i> dans un pipeline
@@ -190,11 +182,8 @@ Vous pouvez améliorer votre recherche de modèle en cliquant sur les *filtres*
 
 Une fois que vous avez choisi un modèle, vous verrez que vous pouvez tester son fonctionnement en ligne directement. Cela vous permet de tester rapidement les capacités du modèle avant de le télécharger.
 
-<Tip>
-
-✏️ **Essayez !** Utilisez les filtres pour trouver un modèle de génération de texte pour une autre langue. N'hésitez pas à jouer avec le *widget* et l'utiliser dans un pipeline !
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez les filtres pour trouver un modèle de génération de texte pour une autre langue. N'hésitez pas à jouer avec le *widget* et l'utiliser dans un pipeline !
 
 ### L'API d'inférence
 
@@ -228,11 +217,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 L'argument `top_k` permet de contrôler le nombre de possibilités que vous souhaitez afficher. Notez que dans ce cas, le modèle remplace le mot spécial `<mask>`, qui est souvent appelé un *mot masqué*. D'autres modèles permettant de remplacer les mots manquants peuvent avoir des mots masqués différents, donc il est toujours bon de vérifier le mot masqué approprié lorsque vous comparez d'autres modèles. Une façon de le vérifier est de regarder le mot masqué utilisé dans l'outil de test de la page du modèle.
 
-<Tip>
-
-✏️ **Essayez !** Recherchez le modèle `bert-base-cased` sur le *Hub* et identifiez le mot masqué dans l'outil d'inférence. Que prédit le modèle pour la phrase dans notre exemple de pipeline au-dessus ?
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Recherchez le modèle `bert-base-cased` sur le *Hub* et identifiez le mot masqué dans l'outil d'inférence. Que prédit le modèle pour la phrase dans notre exemple de pipeline au-dessus ?
 
 ## Reconnaissance d'entités nommées
 
@@ -258,11 +244,8 @@ Nous pouvons voir que le modèle a correctement identifié Sylvain comme une per
 
 Il est possible d'utiliser l'option `grouped_entities=True` lors de la création du pipeline pour regrouper les parties du texte qui correspondent à la même entité : ici le modèle à correctement regroupé `Hugging` et `Face` comme une seule organisation, même si le nom comporte plusieurs mots. En effet, comme nous allons voir dans le prochain chapitre, la prétraitement du texte sépare parfois certains mots en plus petites parties. Par exemple, `Sylvain` est séparé en quatre morceaux : `S`, `##yl`, `##va`, et `##in`. Dans l'étape de post-traitement, le pipeline a réussi à regrouper ces morceaux.
 
-<Tip>
-
-✏️ **Essayez !** Recherchez sur le *Hub* un modèle capable de reconnaître les différentes parties du langage (généralement abrégé en POS pour *Part-of-speech*) en anglais. Que prédit le modèle pour la phrase dans notre exemple du pipeline au-dessus ?
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Recherchez sur le *Hub* un modèle capable de reconnaître les différentes parties du langage (généralement abrégé en POS pour *Part-of-speech*) en anglais. Que prédit le modèle pour la phrase dans notre exemple du pipeline au-dessus ?
 
 ## Réponse à des questions
 
@@ -375,10 +358,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 Comme pour la génération de texte et le résumé de texte, il est possible de spécifier une `max_length` (longueur maximale) ou une `min_length` (longueur minimale) pour le résultat.
 
-<Tip>
-
-✏️ **Essayez !** Recherchez d'autres modèles de traduction sur le *Hub* et essayez de traduire la phrase précédente en plusieurs langues différentes.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Recherchez d'autres modèles de traduction sur le *Hub* et essayez de traduire la phrase précédente en plusieurs langues différentes.
 
 Les pipelines présentés jusqu'ici sont principalement destinés à des fins de démonstration. Ils ont été programmés pour des tâches spécifiques et ne peuvent pas effectuer de variations de celles-ci. Dans le chapitre suivant, vous apprendrez ce qu'il y a dans un `pipeline()` et comment modifier son comportement.
diff --git a/chapters/fr/chapter2/1.mdx b/chapters/fr/chapter2/1.mdx
index 2fb2880e1..ea0ccdb5c 100644
--- a/chapters/fr/chapter2/1.mdx
+++ b/chapters/fr/chapter2/1.mdx
@@ -24,6 +24,5 @@ Nous examinerons ensuite l'API *tokenizer* qui est l'autre composant principal d
 Les *tokenizers* s'occupent de la première et de la dernière étape du traitement en gérant la conversion du texte en entrées numériques pour le réseau neuronal et la reconversion en texte lorsqu'elle est nécessaire. 
 Enfin, nous montrerons comment gérer l'envoi de plusieurs phrases à travers un modèle dans un batch préparé et nous conclurons le tout en examinant de plus près la fonction `tokenizer()`.
 
-<Tip>
-  ⚠️ Afin de bénéficier de toutes les fonctionnalités disponibles avec le <i>Hub</i> et la bibliothèque 🤗 <i>Transformers</i>, nous vous recommandons <a href="https://huggingface.co/join">de créer un compte</a>.
-</Tip>
+> [!TIP]
+> ⚠️ Afin de bénéficier de toutes les fonctionnalités disponibles avec le <i>Hub</i> et la bibliothèque 🤗 <i>Transformers</i>, nous vous recommandons <a href="https://huggingface.co/join">de créer un compte</a>.
diff --git a/chapters/fr/chapter2/2.mdx b/chapters/fr/chapter2/2.mdx
index b05f8b154..fa8476654 100644
--- a/chapters/fr/chapter2/2.mdx
+++ b/chapters/fr/chapter2/2.mdx
@@ -26,9 +26,8 @@
 
 {/if}
 
-<Tip>
-Il s'agit de la première section dont le contenu est légèrement différent selon que vous utilisez PyTorch ou TensorFlow. Cliquez sur le bouton situé au-dessus du titre pour sélectionner la plateforme que vous préférez !
-</Tip>
+> [!TIP]
+> Il s'agit de la première section dont le contenu est légèrement différent selon que vous utilisez PyTorch ou TensorFlow. Cliquez sur le bouton situé au-dessus du titre pour sélectionner la plateforme que vous préférez !
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -346,8 +345,5 @@ Nous pouvons maintenant conclure que le modèle a prédit ce qui suit :
 
 Nous avons reproduit avec succès les trois étapes du pipeline : prétraitement avec les *tokenizers*, passage des entrées dans le modèle et post-traitement ! Prenons maintenant le temps de nous plonger plus profondément dans chacune de ces étapes.
 
-<Tip>
-
-✏️ **Essayez !** Choisissez deux (ou plus) textes de votre choix (en anglais) et faites-les passer par le pipeline `sentiment-analysis`. Reproduisez ensuite vous-même les étapes vues ici et vérifiez que vous obtenez les mêmes résultats !
-  
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Choisissez deux (ou plus) textes de votre choix (en anglais) et faites-les passer par le pipeline `sentiment-analysis`. Reproduisez ensuite vous-même les étapes vues ici et vérifiez que vous obtenez les mêmes résultats !
diff --git a/chapters/fr/chapter2/4.mdx b/chapters/fr/chapter2/4.mdx
index bf36f6268..8ad739967 100644
--- a/chapters/fr/chapter2/4.mdx
+++ b/chapters/fr/chapter2/4.mdx
@@ -232,11 +232,8 @@ print(ids)
 
 Une fois converties en tenseur dans le *framework* approprié, ces sorties peuvent ensuite être utilisées comme entrées d'un modèle, comme nous l'avons vu précédemment dans ce chapitre.
 
-<Tip>
-
-✏️ **Essayez !** Reproduisez les deux dernières étapes (tokénisation et conversion en identifiants d'entrée) sur les phrases des entrées que nous avons utilisées dans la section 2 (« <i>I've been waiting for a HuggingFace course my whole life.</i> » et « <i>I hate this so much!</i> »). Vérifiez que vous obtenez les mêmes identifiants d'entrée que nous avons obtenus précédemment !
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Reproduisez les deux dernières étapes (tokénisation et conversion en identifiants d'entrée) sur les phrases des entrées que nous avons utilisées dans la section 2 (« <i>I've been waiting for a HuggingFace course my whole life.</i> » et « <i>I hate this so much!</i> »). Vérifiez que vous obtenez les mêmes identifiants d'entrée que nous avons obtenus précédemment !
 
 ## Décodage
 
diff --git a/chapters/fr/chapter2/5.mdx b/chapters/fr/chapter2/5.mdx
index 9e3cff91e..4154b8663 100644
--- a/chapters/fr/chapter2/5.mdx
+++ b/chapters/fr/chapter2/5.mdx
@@ -192,11 +192,8 @@ batched_ids = [ids, ids]
 
 Il s'agit d'un batch de deux séquences identiques !
 
-<Tip>
-
-✏️ **Essayez !** Convertissez cette liste `batched_ids` en un tenseur et passez-la dans votre modèle. Vérifiez que vous obtenez les mêmes logits que précédemment (mais deux fois) !
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Convertissez cette liste `batched_ids` en un tenseur et passez-la dans votre modèle. Vérifiez que vous obtenez les mêmes logits que précédemment (mais deux fois) !
 
 Utiliser des *batchs* permet au modèle de fonctionner lorsque vous lui donnez plusieurs séquences. Utiliser plusieurs séquences est aussi simple que de construire un batch avec une seule séquence. Il y a cependant un deuxième problème. Lorsque vous essayez de regrouper deux phrases (ou plus), elles peuvent être de longueurs différentes. Si vous avez déjà travaillé avec des tenseurs, vous savez qu'ils doivent être de forme rectangulaire. Vous ne pourrez donc pas convertir directement la liste des identifiants d'entrée en un tenseur. Pour contourner ce problème, nous avons l'habitude de *rembourrer*/*remplir* (le *padding* en anglais) les entrées.
 
@@ -334,11 +331,8 @@ Nous obtenons maintenant les mêmes logits pour la deuxième phrase du batch.
 Remarquez comment la dernière valeur de la deuxième séquence est un identifiant de *padding* valant 0 dans le masque d'attention.
 
 
-<Tip>
-
-✏️ **Essayez !** Appliquez la tokenisation manuellement sur les deux phrases utilisées dans la section 2 (« <i>I've been waiting for a HuggingFace course my whole life.</i> » et « <i>I hate this so much!</i> »). Passez-les dans le modèle et vérifiez que vous obtenez les mêmes logits que dans la section 2. Ensuite regroupez-les en utilisant le jeton de *padding* et créez le masque d'attention approprié. Vérifiez que vous obtenez les mêmes résultats qu’en passant par le modèle !
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Appliquez la tokenisation manuellement sur les deux phrases utilisées dans la section 2 (« <i>I've been waiting for a HuggingFace course my whole life.</i> » et « <i>I hate this so much!</i> »). Passez-les dans le modèle et vérifiez que vous obtenez les mêmes logits que dans la section 2. Ensuite regroupez-les en utilisant le jeton de *padding* et créez le masque d'attention approprié. Vérifiez que vous obtenez les mêmes résultats qu’en passant par le modèle !
 
 
 ## Séquences plus longues
diff --git a/chapters/fr/chapter3/2.mdx b/chapters/fr/chapter3/2.mdx
index 695f323d3..ffec8b9e4 100644
--- a/chapters/fr/chapter3/2.mdx
+++ b/chapters/fr/chapter3/2.mdx
@@ -97,9 +97,8 @@ Le *Hub* ne contient pas seulement des modèles mais aussi plusieurs jeux de don
 
 La bibliothèque 🤗 *Datasets* propose une commande très simple pour télécharger et mettre en cache un jeu de données à partir du *Hub*. On peut télécharger le jeu de données MRPC comme ceci :   
 
-<Tip>
-⚠️ **Attention** Assurez-vous que `datasets` est installé en exécutant `pip install datasets`. Ensuite, chargez le jeu de données MRPC et imprimez-le pour voir ce qu'il contient.
-</Tip> 
+> [!TIP]
+> ⚠️ **Attention** Assurez-vous que `datasets` est installé en exécutant `pip install datasets`. Ensuite, chargez le jeu de données MRPC et imprimez-le pour voir ce qu'il contient.
 
 ```py
 from datasets import load_dataset
@@ -160,10 +159,8 @@ raw_train_dataset.features
 
 En réalité, `label` est de type `ClassLabel` et la correspondance des entiers aux noms des labels est enregistrée le dossier *names*. `0` correspond à  `not_equivalent` et `1` correspond à `equivalent`.
 
-<Tip>
-
-✏️ **Essayez !** Regardez l'élément 15 de l'ensemble d'entraînement et l'élément 87 de l'ensemble de validation. Quelles sont leurs étiquettes ?
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Regardez l'élément 15 de l'ensemble d'entraînement et l'élément 87 de l'ensemble de validation. Quelles sont leurs étiquettes ?
 
 ### Prétraitement d'un jeu de données
 
@@ -203,11 +200,8 @@ inputs
 
 Nous avons discuté des clés `input_ids` et `attention_mask` dans le [chapitre 2](/course/fr/chapter2), mais nous avons laissé de côté les `token_type_ids`. Dans cet exemple, c'est ce qui indique au modèle quelle partie de l'entrée est la première phrase et quelle partie est la deuxième phrase.
 
-<Tip>
-
-✏️ **Essayez !** Prenez l'élément 15 de l'ensemble d'entraînement et tokenisez les deux phrases séparément et par paire. Quelle est la différence entre les deux résultats ?
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Prenez l'élément 15 de l'ensemble d'entraînement et tokenisez les deux phrases séparément et par paire. Quelle est la différence entre les deux résultats ?
 
 Si on décode les IDs dans `input_ids` en mots :
 
@@ -362,11 +356,8 @@ C'est beau ! Maintenant que nous sommes passés du texte brut à des batchs que
 
 {/if}
 
-<Tip>
-
-✏️ **Essayez !** Reproduisez le prétraitement sur le jeu de données GLUE SST-2. C'est un peu différent puisqu'il est composé de phrases simples au lieu de paires, mais le reste de ce que nous avons fait devrait être identique. Pour un défi plus difficile, essayez d'écrire une fonction de prétraitement qui fonctionne sur toutes les tâches GLUE.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Reproduisez le prétraitement sur le jeu de données GLUE SST-2. C'est un peu différent puisqu'il est composé de phrases simples au lieu de paires, mais le reste de ce que nous avons fait devrait être identique. Pour un défi plus difficile, essayez d'écrire une fonction de prétraitement qui fonctionne sur toutes les tâches GLUE.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/fr/chapter3/3.mdx b/chapters/fr/chapter3/3.mdx
index ba0061e7f..36e2ec914 100644
--- a/chapters/fr/chapter3/3.mdx
+++ b/chapters/fr/chapter3/3.mdx
@@ -44,11 +44,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Si vous voulez télécharger automatiquement votre modèle sur le *Hub* pendant l'entraînement, passez `push_to_hub=True` dans le `TrainingArguments`. Nous en apprendrons plus à ce sujet au [chapitre 4](/course/fr/chapter4/3).
-
-</Tip>
+> [!TIP]
+> 💡 Si vous voulez télécharger automatiquement votre modèle sur le *Hub* pendant l'entraînement, passez `push_to_hub=True` dans le `TrainingArguments`. Nous en apprendrons plus à ce sujet au [chapitre 4](/course/fr/chapter4/3).
 
 La deuxième étape consiste à définir notre modèle. Comme dans le [chapitre précédent](/course/fr/chapter2), nous utiliserons la classe `AutoModelForSequenceClassification`, avec deux labels :
 
@@ -166,8 +163,5 @@ Le `Trainer` fonctionnera sur plusieurs GPUs ou TPUs et fournit beaucoup d'optio
 
 Ceci conclut l'introduction au *fine-tuning* en utilisant l'API `Trainer`. Un exemple d'utilisation pour les tâches de NLP les plus communes es donné dans le [chapitre 7](/course/fr/chapter7), mais pour l'instant regardons comment faire la même chose en PyTorch pur.
 
-<Tip>
-
-✏️ **Essayez !** *Finetunez* un modèle sur le jeu de données GLUE SST-2, en utilisant le traitement des données que vous avez fait dans la section 2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** *Finetunez* un modèle sur le jeu de données GLUE SST-2, en utilisant le traitement des données que vous avez fait dans la section 2.
diff --git a/chapters/fr/chapter3/3_tf.mdx b/chapters/fr/chapter3/3_tf.mdx
index 41755c3cd..3188f3a53 100644
--- a/chapters/fr/chapter3/3_tf.mdx
+++ b/chapters/fr/chapter3/3_tf.mdx
@@ -72,11 +72,8 @@ Vous remarquerez que, contrairement au [chapitre 2](/course/fr/chapter2), vous o
 
 Pour *finetuner* le modèle sur notre jeu de données, nous devons simplement `compiler()` notre modèle et ensuite passer nos données à la méthode `fit()`. Cela va démarrer le processus de *finetuning* (qui devrait prendre quelques minutes sur un GPU) et rapporter la perte d'entraînement au fur et à mesure, plus la perte de validation à la fin de chaque époque.
 
-<Tip>
-
-Notez que les modèles 🤗 *Transformers* ont une capacité spéciale que la plupart des modèles Keras n'ont pas. Ils peuvent automatiquement utiliser une perte appropriée qu'ils calculent en interne. Ils utiliseront cette perte par défaut si vous ne définissez pas un argument de perte dans `compile()`. Notez que pour utiliser la perte interne, vous devrez passer vos labels comme faisant partie de l'entrée, et non pas comme un label séparé, ce qui est la façon normale d'utiliser les labels avec les modèles Keras. Vous verrez des exemples de cela dans la partie 2 du cours, où la définition de la fonction de perte correcte peut être délicate. Pour la classification des séquences, cependant, une fonction de perte standard de Keras fonctionne bien, et c'est donc ce que nous utiliserons ici.
-
-</Tip>
+> [!TIP]
+> Notez que les modèles 🤗 *Transformers* ont une capacité spéciale que la plupart des modèles Keras n'ont pas. Ils peuvent automatiquement utiliser une perte appropriée qu'ils calculent en interne. Ils utiliseront cette perte par défaut si vous ne définissez pas un argument de perte dans `compile()`. Notez que pour utiliser la perte interne, vous devrez passer vos labels comme faisant partie de l'entrée, et non pas comme un label séparé, ce qui est la façon normale d'utiliser les labels avec les modèles Keras. Vous verrez des exemples de cela dans la partie 2 du cours, où la définition de la fonction de perte correcte peut être délicate. Pour la classification des séquences, cependant, une fonction de perte standard de Keras fonctionne bien, et c'est donc ce que nous utiliserons ici.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -92,11 +89,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Notez un piège très commun ici. Vous *pouvez* simplement passer le nom de la perte comme une chaîne à Keras, mais par défaut Keras supposera que vous avez déjà appliqué une fonction softmax à vos sorties. Cependant, de nombreux modèles produisent les valeurs juste avant l'application de la softmax, que l'on appelle aussi les *logits*. Nous devons indiquer à la fonction de perte que c'est ce que fait notre modèle, et la seule façon de le faire est de l'appeler directement, plutôt que par son nom avec une chaîne.
-
-</Tip>
+> [!WARNING]
+> Notez un piège très commun ici. Vous *pouvez* simplement passer le nom de la perte comme une chaîne à Keras, mais par défaut Keras supposera que vous avez déjà appliqué une fonction softmax à vos sorties. Cependant, de nombreux modèles produisent les valeurs juste avant l'application de la softmax, que l'on appelle aussi les *logits*. Nous devons indiquer à la fonction de perte que c'est ce que fait notre modèle, et la seule façon de le faire est de l'appeler directement, plutôt que par son nom avec une chaîne.
 
 
 ### Améliorer les performances d'entraînement
@@ -124,11 +118,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-La bibliothèque 🤗 *Transformers* possède également une fonction `create_optimizer()` qui créera un optimiseur `AdamW` avec un taux d'apprentissage décroissant. Il s'agit d'un raccourci pratique que vous verrez en détail dans les prochaines sections du cours.
-
-</Tip>
+> [!TIP]
+> La bibliothèque 🤗 *Transformers* possède également une fonction `create_optimizer()` qui créera un optimiseur `AdamW` avec un taux d'apprentissage décroissant. Il s'agit d'un raccourci pratique que vous verrez en détail dans les prochaines sections du cours.
 
 Nous avons maintenant notre tout nouvel optimiseur et nous pouvons essayer de nous entraîner avec lui. Tout d'abord, rechargeons le modèle pour réinitialiser les modifications apportées aux poids lors de l'entraînement que nous venons d'effectuer, puis nous pouvons le compiler avec le nouvel optimiseur :
 
@@ -146,11 +137,8 @@ Maintenant, on *fit* :
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Si vous voulez télécharger automatiquement votre modèle sur le *Hub* pendant l'entraînement, vous pouvez passer un `PushToHubCallback` dans la méthode `model.fit()`. Nous en apprendrons davantage à ce sujet au [chapitre 4](/course/fr/chapter4/3).
-
-</Tip>
+> [!TIP]
+> 💡 Si vous voulez télécharger automatiquement votre modèle sur le *Hub* pendant l'entraînement, vous pouvez passer un `PushToHubCallback` dans la méthode `model.fit()`. Nous en apprendrons davantage à ce sujet au [chapitre 4](/course/fr/chapter4/3).
 
 ### Prédictions du modèle
 
diff --git a/chapters/fr/chapter3/4.mdx b/chapters/fr/chapter3/4.mdx
index 0240bbc04..dd4fb0e0e 100644
--- a/chapters/fr/chapter3/4.mdx
+++ b/chapters/fr/chapter3/4.mdx
@@ -199,11 +199,8 @@ metric.compute()
 
 Une fois encore, vos résultats seront légèrement différents en raison du caractère aléatoire de l'initialisation de la tête du modèle et du mélange des données, mais ils devraient se situer dans la même fourchette.
 
-<Tip>
-
-✏️ **Essayez** Modifiez la boucle d'entraînement précédente pour *finetuner* votre modèle sur le jeu de données SST-2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez** Modifiez la boucle d'entraînement précédente pour *finetuner* votre modèle sur le jeu de données SST-2.
 
 ### Optimisez votre boucle d'entraînement avec 🤗 <i>Accelerate</i>
 
@@ -295,9 +292,8 @@ La première ligne à ajouter est la ligne d'importation. La deuxième ligne ins
 
 Ensuite, le gros du travail est fait dans la ligne qui envoie les *dataloaders*, le modèle, et l'optimiseur à `accelerator.prepare()`. Cela va envelopper ces objets dans le conteneur approprié pour s'assurer que votre entraînement distribué fonctionne comme prévu. Les changements restants à faire sont la suppression de la ligne qui met le batch sur le `device` (encore une fois, si vous voulez le garder, vous pouvez juste le changer pour utiliser `accelerator.device`) et le remplacement de `loss.backward()` par `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ Afin de bénéficier de la rapidité offerte par les TPUs du Cloud, nous vous recommandons de rembourrer vos échantillons à une longueur fixe avec les arguments `padding="max_length"` et `max_length` du <i>tokenizer</i>.
-</Tip>
+> [!TIP]
+> ⚠️ Afin de bénéficier de la rapidité offerte par les TPUs du Cloud, nous vous recommandons de rembourrer vos échantillons à une longueur fixe avec les arguments `padding="max_length"` et `max_length` du <i>tokenizer</i>.
 
 Si vous souhaitez faire un copier-coller pour jouer, voici à quoi ressemble la boucle d'entraînement complète avec 🤗 <i>Accelerate</i> :
 
diff --git a/chapters/fr/chapter4/2.mdx b/chapters/fr/chapter4/2.mdx
index 0990c4663..cde765cec 100644
--- a/chapters/fr/chapter4/2.mdx
+++ b/chapters/fr/chapter4/2.mdx
@@ -95,6 +95,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-Lorsque vous utilisez un modèle pré-entraîné, assurez-vous de vérifier comment il a été entraîné, sur quels jeux de données, ses limites et ses biais. Toutes ces informations doivent être indiquées dans sa carte.
-</Tip>
+> [!TIP]
+> Lorsque vous utilisez un modèle pré-entraîné, assurez-vous de vérifier comment il a été entraîné, sur quels jeux de données, ses limites et ses biais. Toutes ces informations doivent être indiquées dans sa carte.
diff --git a/chapters/fr/chapter4/3.mdx b/chapters/fr/chapter4/3.mdx
index f349d7d9d..08b16824a 100644
--- a/chapters/fr/chapter4/3.mdx
+++ b/chapters/fr/chapter4/3.mdx
@@ -173,11 +173,8 @@ Cliquez sur l'onglet « Fichiers et versions » et vous devriez voir les fichier
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Essayez** Prenez le modèle et le *tokenizer* associés au *checkpoint* `bert-base-cased` et téléchargez-les vers un dépôt dans votre espace en utilisant la méthode `push_to_hub()`. Vérifiez que le dépôt apparaît correctement sur votre page avant de le supprimer.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez** Prenez le modèle et le *tokenizer* associés au *checkpoint* `bert-base-cased` et téléchargez-les vers un dépôt dans votre espace en utilisant la méthode `push_to_hub()`. Vérifiez que le dépôt apparaît correctement sur votre page avant de le supprimer.
 
 Comme vous l'avez vu, la méthode `push_to_hub()` accepte plusieurs arguments, ce qui permet de télécharger vers un dépôt ou un espace d'organisation spécifique, ou d'utiliser un jeton d'API différent. Nous vous recommandons de jeter un coup d'œil à la spécification de la méthode disponible directement dans la documentation de [🤗 *Transformers*](https://huggingface.co/transformers/model_sharing.html) pour avoir une idée de ce qui est possible.
 
@@ -464,9 +461,8 @@ Si vous regardez la taille des fichiers (par exemple, avec `ls -lh`), vous devri
 
 {/if}
 
-<Tip>
-✏️ Lors de la création du dépôt à partir de l'interface web, le fichier <i>.gitattributes</i> est automatiquement configuré pour considérer les fichiers avec certaines extensions, comme <i>.bin</i> et <i>.h5</i>, comme des fichiers volumineux, et git-lfs les suivra sans aucune configuration nécessaire de votre part.
-</Tip> 
+> [!TIP]
+> ✏️ Lors de la création du dépôt à partir de l'interface web, le fichier <i>.gitattributes</i> est automatiquement configuré pour considérer les fichiers avec certaines extensions, comme <i>.bin</i> et <i>.h5</i>, comme des fichiers volumineux, et git-lfs les suivra sans aucune configuration nécessaire de votre part.
 
 Nous pouvons maintenant aller de l'avant et procéder comme nous le ferions habituellement avec des dépôts Git traditionnels. Nous pouvons ajouter tous les fichiers à l'environnement Git en utilisant la commande `git add` :
 
diff --git a/chapters/fr/chapter5/2.mdx b/chapters/fr/chapter5/2.mdx
index 58640da8b..db6a6dd3f 100644
--- a/chapters/fr/chapter5/2.mdx
+++ b/chapters/fr/chapter5/2.mdx
@@ -50,11 +50,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 Nous pouvons voir que les fichiers compressés ont été remplacés par _SQuAD_it-train.json_ et _SQuAD_it-text.json_, et que les données sont stockées au format JSON.
 
-<Tip>
-
-✎ Si vous vous demandez pourquoi il y a un caractère `!` dans les commandes *shell* ci-dessus, c'est parce que nous les exécutons dans un *notebook* Jupyter. Supprimez simplement le préfixe si vous souhaitez télécharger et décompresser le jeu de données dans un terminal.
-
-</Tip>
+> [!TIP]
+> ✎ Si vous vous demandez pourquoi il y a un caractère `!` dans les commandes *shell* ci-dessus, c'est parce que nous les exécutons dans un *notebook* Jupyter. Supprimez simplement le préfixe si vous souhaitez télécharger et décompresser le jeu de données dans un terminal.
 
 Pour charger un fichier JSON avec la fonction `load_dataset()`, nous avons juste besoin de savoir si nous avons affaire à du JSON ordinaire (similaire à un dictionnaire imbriqué) ou à des lignes JSON (JSON séparé par des lignes). Comme de nombreux jeux de données de questions-réponses, SQuAD-it utilise le format imbriqué où tout le texte est stocké dans un champ `data`. Cela signifie que nous pouvons charger le jeu de données en spécifiant l'argument `field` comme suit :
 
@@ -130,11 +127,8 @@ DatasetDict({
 
 C'est exactement ce que nous voulions. Désormais, nous pouvons appliquer diverses techniques de prétraitement pour nettoyer les données, tokeniser les avis, etc.
 
-<Tip>
-
-L'argument `data_files` de la fonction `load_dataset()` est assez flexible et peut être soit un chemin de fichier unique, une liste de chemins de fichiers, ou un dictionnaire qui fait correspondre les noms des échantillons aux chemins de fichiers. Vous pouvez également regrouper les fichiers correspondant à un motif spécifié selon les règles utilisées par le shell Unix. Par exemple, vous pouvez regrouper tous les fichiers JSON d'un répertoire en une seule division en définissant `data_files="*.json"`. Voir la [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) de 🤗 *Datasets* pour plus de détails.
-
-</Tip>
+> [!TIP]
+> L'argument `data_files` de la fonction `load_dataset()` est assez flexible et peut être soit un chemin de fichier unique, une liste de chemins de fichiers, ou un dictionnaire qui fait correspondre les noms des échantillons aux chemins de fichiers. Vous pouvez également regrouper les fichiers correspondant à un motif spécifié selon les règles utilisées par le shell Unix. Par exemple, vous pouvez regrouper tous les fichiers JSON d'un répertoire en une seule division en définissant `data_files="*.json"`. Voir la [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) de 🤗 *Datasets* pour plus de détails.
 
 Les scripts de chargement de 🤗 *Datasets* prennent en charge la décompression automatique des fichiers d'entrée. Nous aurions donc pu ignorer l'utilisation de `gzip` en pointant l'argument `data_files` directement sur les fichiers compressés :
 
@@ -162,8 +156,5 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 Cela renvoie le même objet `DatasetDict` obtenu ci-dessus mais nous évite de télécharger et de décompresser manuellement les fichiers _SQuAD_it-*.json.gz_. Ceci conclut notre incursion dans les différentes façons de charger des jeux de données qui ne sont pas hébergés sur le *Hub*. Maintenant que nous avons un jeu de données avec lequel jouer, mettons la main à la pâte avec diverses techniques de gestion des données !
 
-<Tip>
-
-✏️ **Essayez !** Choisissez un autre jeu de données hébergé sur GitHub ou dans le [*UCI Machine Learning Repository*](https://archive.ics.uci.edu/ml/index.php) et essayez de le charger localement et à distance en utilisant les techniques présentées ci-dessus. Pour obtenir des points bonus, essayez de charger un jeu de données stocké au format CSV ou texte (voir la [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) pour plus d'informations sur ces formats).
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Choisissez un autre jeu de données hébergé sur GitHub ou dans le [*UCI Machine Learning Repository*](https://archive.ics.uci.edu/ml/index.php) et essayez de le charger localement et à distance en utilisant les techniques présentées ci-dessus. Pour obtenir des points bonus, essayez de charger un jeu de données stocké au format CSV ou texte (voir la [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) pour plus d'informations sur ces formats).
diff --git a/chapters/fr/chapter5/3.mdx b/chapters/fr/chapter5/3.mdx
index cbade2aaa..3e4c99d30 100644
--- a/chapters/fr/chapter5/3.mdx
+++ b/chapters/fr/chapter5/3.mdx
@@ -97,11 +97,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Essayez !** Utilisez la fonction ` Dataset.unique()` pour trouver le nombre de médicaments et de conditions uniques dans les échantillons d'entraînement et de test.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez la fonction `Dataset.unique()` pour trouver le nombre de médicaments et de conditions uniques dans les échantillons d'entraînement et de test.
 
 Ensuite, normalisons toutes les étiquettes `condition` en utilisant `Dataset.map()`. Comme nous l'avons fait avec la tokenisation dans le [chapitre 3](/course/fr/chapter3), nous pouvons définir une fonction simple qui peut être appliquée sur toutes les lignes de chaque division dans `drug_dataset` :
 
@@ -228,11 +225,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Comme nous le soupçonnions, certaines critiques ne contiennent qu'un seul mot, ce qui, bien que cela puisse convenir à l'analyse des sentiments, n’est pas informatif si nous voulons prédire la condition.
 
-<Tip>
-
-🙋 Une autre façon d'ajouter de nouvelles colonnes à un jeu de données consiste à utiliser la fonction `Dataset.add_column()`. Cela vous permet de donner la colonne sous forme de liste Python ou de tableau NumPy et peut être utile dans les situations où `Dataset.map()` n'est pas bien adapté à votre analyse.
-
-</Tip>
+> [!TIP]
+> 🙋 Une autre façon d'ajouter de nouvelles colonnes à un jeu de données consiste à utiliser la fonction `Dataset.add_column()`. Cela vous permet de donner la colonne sous forme de liste Python ou de tableau NumPy et peut être utile dans les situations où `Dataset.map()` n'est pas bien adapté à votre analyse.
 
 Utilisons la fonction `Dataset.filter()` pour supprimer les avis contenant moins de 30 mots. De la même manière que nous l'avons fait avec la colonne `condition`, nous pouvons filtrer les avis très courts en exigeant que les avis aient une longueur supérieure à ce seuil :
 
@@ -247,11 +241,8 @@ print(drug_dataset.num_rows)
 
 Comme vous pouvez le constater, cela a supprimé environ 15 % des avis de nos jeux d'entraînement et de test d'origine.
 
-<Tip>
-
-✏️ **Essayez !** Utilisez la fonction `Dataset.sort()` pour inspecter les avis avec le plus grand nombre de mots. Consultez la [documentation](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) pour voir quel argument vous devez utiliser pour trier les avis par longueur dans l'ordre décroissant.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez la fonction `Dataset.sort()` pour inspecter les avis avec le plus grand nombre de mots. Consultez la [documentation](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) pour voir quel argument vous devez utiliser pour trier les avis par longueur dans l'ordre décroissant.
 
 La dernière chose à laquelle nous devons faire face est la présence de caractères HTML dans nos avis. Nous pouvons utiliser le module `html` de Python pour supprimer ces caractères, comme ceci :
 
@@ -308,11 +299,8 @@ Comme vous l'avez vu dans le [chapitre 3](/course/fr/chapter3), nous pouvons pas
 
 Vous pouvez également chronométrer une cellule entière en mettant `%%time` au début de la cellule. Sur le matériel sur lequel nous avons exécuté cela, cela affichait 10,8 s pour cette instruction (c'est le nombre écrit après "Wall time").
 
-<Tip>
-
-✏️ **Essayez !** Exécutez la même instruction avec et sans `batched=True`, puis essayez-le avec un *tokenizer* lent (ajoutez `use_fast=False` dans la méthode `AutoTokenizer.from_pretrained()`) afin que vous puissiez voir quels temps vous obtenez sur votre matériel.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Exécutez la même instruction avec et sans `batched=True`, puis essayez-le avec un *tokenizer* lent (ajoutez `use_fast=False` dans la méthode `AutoTokenizer.from_pretrained()`) afin que vous puissiez voir quels temps vous obtenez sur votre matériel.
 
 Voici les résultats que nous avons obtenus avec et sans *batching*, avec un *tokenizer* rapide et un lent :
 
@@ -349,19 +337,13 @@ Options                       | *Tokenizer* rapide | *Tokenizer* lent
 
 Ce sont des résultats beaucoup plus raisonnables pour le *tokenizer* lent mais les performances du *tokenizer* rapide ont également été considérablement améliorées. Notez, cependant, que ce ne sera pas toujours le cas : pour des valeurs de `num_proc` autres que 8, nos tests ont montré qu'il était plus rapide d'utiliser `batched=True` sans cette option. En général, nous ne recommandons pas d'utiliser le multitraitement pour les *tokenizers* rapides avec `batched=True`.
 
-<Tip>
-
-Utiliser `num_proc` pour accélérer votre traitement est généralement une bonne idée tant que la fonction que vous utilisez n'effectue pas déjà une sorte de multitraitement.
-
-</Tip>
+> [!TIP]
+> Utiliser `num_proc` pour accélérer votre traitement est généralement une bonne idée tant que la fonction que vous utilisez n'effectue pas déjà une sorte de multitraitement.
 
 Toutes ces fonctionnalités condensées en une seule méthode sont déjà assez étonnantes, mais il y a plus ! Avec `Dataset.map()` et `batched=True` vous pouvez modifier le nombre d'éléments dans votre jeu de données. Ceci est très utile dans de nombreuses situations où vous souhaitez créer plusieurs fonctionnalités d'entraînement à partir d'un exemple. Nous devrons le faire dans le cadre du prétraitement de plusieurs des tâches de traitement du langage naturel que nous entreprendrons dans le [chapitre 7](/course/fr/chapter7).
 
-<Tip>
-
-💡 En apprentissage automatique, un _exemple_ est généralement défini comme l'ensemble de _features_ que nous donnons au modèle. Dans certains contextes, ces caractéristiques seront l'ensemble des colonnes d'un `Dataset`, mais dans d'autres (comme ici et pour la réponse aux questions), plusieurs caractéristiques peuvent être extraites d'un seul exemple et appartenir à une seule colonne.
-
-</Tip>
+> [!TIP]
+> 💡 En apprentissage automatique, un _exemple_ est généralement défini comme l'ensemble de _features_ que nous donnons au modèle. Dans certains contextes, ces caractéristiques seront l'ensemble des colonnes d'un `Dataset`, mais dans d'autres (comme ici et pour la réponse aux questions), plusieurs caractéristiques peuvent être extraites d'un seul exemple et appartenir à une seule colonne.
 
 Voyons comment cela fonctionne ! Ici, nous allons tokeniser nos exemples et les tronquer à une longueur maximale de 128 mais nous demanderons au *tokenizer* de renvoyer *tous* les morceaux des textes au lieu du premier. Cela peut être fait avec `return_overflowing_tokens=True` :
 
@@ -530,11 +512,8 @@ Créons un `pandas.DataFrame` pour l'ensemble d'entraînement en sélectionnant
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 Sous le capot, `Dataset.set_format()` change le format de retour pour la méthode `__getitem__()`. Cela signifie que lorsque nous voulons créer un nouvel objet comme `train_df` à partir d'un `Dataset` au format `"pandas"`, nous devons découper tout le jeu de données pour obtenir un `pandas.DataFrame`. Vous pouvez vérifier par vous-même que le type de `drug_dataset["train"]` est `Dataset`, quel que soit le format de sortie.
-
-</Tip>
+> [!TIP]
+> 🚨 Sous le capot, `Dataset.set_format()` change le format de retour pour la méthode `__getitem__()`. Cela signifie que lorsque nous voulons créer un nouvel objet comme `train_df` à partir d'un `Dataset` au format `"pandas"`, nous devons découper tout le jeu de données pour obtenir un `pandas.DataFrame`. Vous pouvez vérifier par vous-même que le type de `drug_dataset["train"]` est `Dataset`, quel que soit le format de sortie.
 
 
 De là, nous pouvons utiliser toutes les fonctionnalités Pandas que nous voulons. Par exemple, nous pouvons faire un chaînage sophistiqué pour calculer la distribution de classe parmi les entrées `condition` :
@@ -605,11 +584,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **Essayez !** Calculez la note moyenne par médicament et stockez le résultat dans un nouveau jeu de données.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Calculez la note moyenne par médicament et stockez le résultat dans un nouveau jeu de données.
 
 Ceci conclut notre visite des différentes techniques de prétraitement disponibles dans 🤗 *Datasets*. Pour compléter la section, créons un ensemble de validation pour préparer le jeu de données à l’entraînement d'un classifieur. Avant cela, nous allons réinitialiser le format de sortie de `drug_dataset` de `"pandas"` à `"arrow"` :
 
diff --git a/chapters/fr/chapter5/4.mdx b/chapters/fr/chapter5/4.mdx
index 2d3d62ba6..13d7cb308 100644
--- a/chapters/fr/chapter5/4.mdx
+++ b/chapters/fr/chapter5/4.mdx
@@ -46,11 +46,8 @@ Dataset({
 
 Nous pouvons voir qu'il y a 15 518 009 lignes et 2 colonnes dans notre jeu de données. C'est beaucoup !
 
-<Tip>
-
-✎ Par défaut, 🤗 *Datasets* décompresse les fichiers nécessaires pour charger un jeu de données. Si vous souhaitez conserver de l'espace sur le disque dur, vous pouvez passer `DownloadConfig(delete_extracted=True)` à l'argument `download_config` de `load_dataset()`. Voir la [documentation](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) pour plus de détails.
-
-</Tip>
+> [!TIP]
+> ✎ Par défaut, 🤗 *Datasets* décompresse les fichiers nécessaires pour charger un jeu de données. Si vous souhaitez conserver de l'espace sur le disque dur, vous pouvez passer `DownloadConfig(delete_extracted=True)` à l'argument `download_config` de `load_dataset()`. Voir la [documentation](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) pour plus de détails.
 
 Inspectons le contenu du premier exemple :
 
@@ -103,11 +100,8 @@ Dataset size (cache file) : 19.54 GB
 
 Malgré sa taille de près de 20 Go, nous pouvons charger et accéder au jeu de données avec beaucoup moins de RAM !
 
-<Tip>
-
-✏️ **Essayez !** Choisissez l'un des [sous-ensembles](https://the-eye.eu/public/AI/pile_preliminary_components/) de *The Pile* qui est plus grand que la RAM de votre ordinateur portable ou de bureau. Chargez-le avec 🤗 *Datasets* et mesurez la quantité de RAM utilisée. Notez que pour obtenir une mesure précise, vous devrez le faire dans un nouveau processus. Vous pouvez trouver les tailles décompressées de chaque sous-ensemble dans le tableau 1 du papier de [*The Pile*](https://arxiv.org/abs/2101.00027).
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Choisissez l'un des [sous-ensembles](https://the-eye.eu/public/AI/pile_preliminary_components/) de *The Pile* qui est plus grand que la RAM de votre ordinateur portable ou de bureau. Chargez-le avec 🤗 *Datasets* et mesurez la quantité de RAM utilisée. Notez que pour obtenir une mesure précise, vous devrez le faire dans un nouveau processus. Vous pouvez trouver les tailles décompressées de chaque sous-ensemble dans le tableau 1 du papier de [*The Pile*](https://arxiv.org/abs/2101.00027).
 
 Si vous êtes familier avec Pandas, ce résultat pourrait surprendre en raison de la célèbre [règle d'or](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) de Wes Kinney selon laquelle vous avez généralement besoin de 5 à 10 fois plus de RAM que la taille de votre jeu de données. Alors, comment 🤗 *Datasets* résout-il ce problème de gestion de la mémoire ? 🤗 *Datasets* traite chaque jeu de données comme un [fichier mappé en mémoire](https://en.wikipedia.org/wiki/Memory-mapped_file). Cela fournit un mappage entre la RAM et le stockage du système de fichiers permettant à la bibliothèque d'accéder et d'opérer sur des éléments du jeu de données sans avoir besoin de le charger entièrement en mémoire.
 
@@ -135,11 +129,8 @@ print(
 
 Ici, nous avons utilisé le module `timeit` de Python pour mesurer le temps d'exécution pris par `code_snippet`. Vous pourrez généralement itérer sur un jeu de données à une vitesse de quelques dixièmes de Go/s à plusieurs Go/s. Cela fonctionne très bien pour la grande majorité des applications, mais vous devrez parfois travailler avec un jeu de données trop volumineux pour être même stocké sur le disque dur de votre ordinateur portable. Par exemple, si nous essayions de télécharger *The Pile* dans son intégralité, nous aurions besoin de 825 Go d'espace disque libre ! Pour gérer ces cas, 🤗 *Datasets* fournit une fonctionnalité de streaming qui nous permet de télécharger et d'accéder aux éléments à la volée, sans avoir besoin de télécharger l'intégralité du jeu de données. Voyons comment cela fonctionne.
 
-<Tip>
-
-💡 Dans les *notebooks* Jupyter, vous pouvez également chronométrer les cellules à l'aide de la fonction magique [`%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 Dans les *notebooks* Jupyter, vous pouvez également chronométrer les cellules à l'aide de la fonction magique [`%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Jeux de données en continu
 
@@ -177,11 +168,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Pour accélérer la tokenisation avec le streaming, vous pouvez passer `batched=True`, comme nous l'avons vu dans la dernière section. Il traitera les exemples batch par batch. La taille de batch par défaut est de 1 000 et peut être spécifiée avec l'argument `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Pour accélérer la tokenisation avec le streaming, vous pouvez passer `batched=True`, comme nous l'avons vu dans la dernière section. Il traitera les exemples batch par batch. La taille de batch par défaut est de 1 000 et peut être spécifiée avec l'argument `batch_size`.
 
 Vous pouvez également mélanger un jeu de données diffusé en continu à l'aide de `IterableDataset.shuffle()`, mais contrairement à `Dataset.shuffle()`, cela ne mélange que les éléments dans un `buffer_size` prédéfini :
 
@@ -289,10 +277,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Essayez !** Utilisez l'un des grands corpus Common Crawl comme [`mc4`](https://huggingface.co/datasets/mc4) ou [`oscar`](https://huggingface.co/datasets/oscar) pour créer en streaming un jeu de données multilingue représentant les proportions de langues parlées dans un pays de votre choix. Par exemple, les quatre langues nationales en Suisse sont l'allemand, le français, l'italien et le romanche. Vous pouvez donc essayer de créer un corpus suisse en échantillonnant les sous-ensembles Oscar en fonction de leur proportion parlée.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez l'un des grands corpus Common Crawl comme [`mc4`](https://huggingface.co/datasets/mc4) ou [`oscar`](https://huggingface.co/datasets/oscar) pour créer en streaming un jeu de données multilingue représentant les proportions de langues parlées dans un pays de votre choix. Par exemple, les quatre langues nationales en Suisse sont l'allemand, le français, l'italien et le romanche. Vous pouvez donc essayer de créer un corpus suisse en échantillonnant les sous-ensembles Oscar en fonction de leur proportion parlée.
 
 Vous disposez maintenant de tous les outils dont vous avez besoin pour charger et traiter des jeux de données de toutes formes et tailles. Cependant à moins que vous ne soyez exceptionnellement chanceux, il arrivera un moment dans votre cheminement en traitement du langage naturel où vous devrez réellement créer un jeu de données pour résoudre un problème donné. C'est le sujet de la section suivante !
diff --git a/chapters/fr/chapter5/5.mdx b/chapters/fr/chapter5/5.mdx
index 63a14bf36..4e3bc530b 100644
--- a/chapters/fr/chapter5/5.mdx
+++ b/chapters/fr/chapter5/5.mdx
@@ -116,11 +116,8 @@ response.json()
 
 Waouh, ça fait beaucoup d'informations ! Nous pouvons voir des champs utiles comme `title`, `body` et `number` qui décrivent le problème, ainsi que des informations sur l'utilisateur GitHub qui a ouvert le problème.
 
-<Tip>
-
-✏️ **Essayez !** Cliquez sur quelques-unes des URL pour avoir une idée du type d'informations auxquelles chaque problème GitHub est lié.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Cliquez sur quelques-unes des URL pour avoir une idée du type d'informations auxquelles chaque problème GitHub est lié.
 
 Comme décrit dans la [documentation GitHub](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting), les requêtes non authentifiées sont limitées à 60 requêtes par heure. Bien que vous puissiez augmenter le paramètre de requête `per_page` pour réduire le nombre de requêtes que vous effectuez, vous atteindrez toujours la limite de débit sur tout dépôt contenant des milliers de problèmes. Donc, à la place, vous devez suivre les [instructions de GitHub](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) sur la création d'un _jeton d'accès personnel_ afin que vous peut augmenter la limite de débit à 5 000 requêtes par heure. Une fois que vous avez votre *token*, vous pouvez l'inclure dans l'en-tête de la requête :
 
@@ -129,11 +126,8 @@ GITHUB_TOKEN = xxx  # Copiez votre jeton GitHub ici
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ Ne partagez pas un *notebook* avec votre `GITHUB_TOKEN` collé dedans. Nous vous recommandons de supprimer la dernière cellule une fois que vous l'avez exécutée pour éviter de divulguer accidentellement ces informations. Mieux encore, stockez le jeton dans un fichier *.env* et utilisez la [bibliothèque `python-dotenv`](https://github.com/theskumar/python-dotenv) pour le charger automatiquement pour vous en tant que variable d'environnement.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Ne partagez pas un *notebook* avec votre `GITHUB_TOKEN` collé dedans. Nous vous recommandons de supprimer la dernière cellule une fois que vous l'avez exécutée pour éviter de divulguer accidentellement ces informations. Mieux encore, stockez le jeton dans un fichier *.env* et utilisez la [bibliothèque `python-dotenv`](https://github.com/theskumar/python-dotenv) pour le charger automatiquement pour vous en tant que variable d'environnement.
 
 Maintenant que nous avons notre jeton d'accès, créons une fonction qui peut télécharger tous les problèmes depuis un référentiel GitHub :
 
@@ -240,11 +234,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **Essayez !** Calculez le temps moyen nécessaire pour résoudre les problèmes dans 🤗 *Datasets*. Vous pouvez trouver la fonction `Dataset.filter()` utile pour filtrer les demandes d'extraction et les problèmes ouverts. Vous pouvez utiliser la fonction `Dataset.set_format()` pour convertir le jeu de données en un `DataFrame` afin que vous puissiez facilement manipuler les horodatages `created_at` et `closed_at`. Pour les points bonus, calculez le temps moyen nécessaire pour fermer les *pull_requests*.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Calculez le temps moyen nécessaire pour résoudre les problèmes dans 🤗 *Datasets*. Vous pouvez trouver la fonction `Dataset.filter()` utile pour filtrer les demandes d'extraction et les problèmes ouverts. Vous pouvez utiliser la fonction `Dataset.set_format()` pour convertir le jeu de données en un `DataFrame` afin que vous puissiez facilement manipuler les horodatages `created_at` et `closed_at`. Pour les points bonus, calculez le temps moyen nécessaire pour fermer les *pull_requests*.
 
 Bien que nous puissions continuer à nettoyer davantage le jeu de données en supprimant ou en renommant certaines colonnes, il est généralement recommandé de le conserver aussi brut que possible à ce stade afin qu'il puisse être facilement utilisé dans plusieurs applications.
 
@@ -380,11 +371,8 @@ repo_url
 
 Dans cet exemple, nous avons créé un dépôt vide appelé `github-issues` sous le nom d'utilisateur `lewtun` (le nom d'utilisateur doit être votre nom d'utilisateur Hub lorsque vous exécutez ce code !).
 
-<Tip>
-
-✏️ **Essayez !** Utilisez votre nom d'utilisateur et votre mot de passe Hugging Face pour obtenir un jeton et créer un dépôt vide appelé `github-issues`. N'oubliez pas de **n'enregistrez jamais vos informations d'identification** dans Colab ou tout autre référentiel car ces informations peuvent être exploitées par de mauvais individus.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez votre nom d'utilisateur et votre mot de passe Hugging Face pour obtenir un jeton et créer un dépôt vide appelé `github-issues`. N'oubliez pas : **n'enregistrez jamais vos informations d'identification** dans Colab ou tout autre référentiel car ces informations peuvent être exploitées par de mauvais individus.
 
 Ensuite, clonons le dépôt du Hub sur notre machine locale et copions-y notre fichier jeu de données. 🤗 *Hub* fournit une classe `Repository` pratique qui encapsule de nombreuses commandes Git courantes. Donc pour cloner le dépôt distant, nous devons simplement fournir l'URL et le chemin local vers lesquels nous souhaitons cloner :
 
@@ -429,11 +417,8 @@ Dataset({
 
 Cool, nous avons poussé notre jeu de données vers le *Hub* et il est disponible pour que d'autres puissent l'utiliser ! Il ne reste plus qu'une chose importante à faire : ajouter une _carte de jeu de données_ qui explique comment le corpus a été créé et fournit d'autres informations utiles à la communauté.
 
-<Tip>
-
-💡 Vous pouvez également télécharger un jeu de données sur le *Hub* directement depuis le terminal en utilisant `huggingface-cli` et un peu de magie Git. Consultez le [guide de 🤗 *Datasets*](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) pour savoir comment procéder.
-
-</Tip>
+> [!TIP]
+> 💡 Vous pouvez également télécharger un jeu de données sur le *Hub* directement depuis le terminal en utilisant `huggingface-cli` et un peu de magie Git. Consultez le [guide de 🤗 *Datasets*](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) pour savoir comment procéder.
 
 ## Création d'une carte pour un jeu de données
 
@@ -456,15 +441,10 @@ Vous pouvez créer le fichier *README.md* directement sur le *Hub* et vous pouve
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **Essayez !** Utilisez l'application `dataset-tagging` et [le guide de 🤗 *Datasets*](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) pour compléter le fichier *README.md* de votre jeu de données de problèmes GitHub.
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez l'application `dataset-tagging` et [le guide de 🤗 *Datasets*](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) pour compléter le fichier *README.md* de votre jeu de données de problèmes GitHub.
 
 C’est tout ! Nous avons vu dans cette section que la création d'un bon jeu de données peut être assez complexe, mais heureusement, le télécharger et le partager avec la communauté ne l'est pas. Dans la section suivante, nous utiliserons notre nouveau jeu de données pour créer un moteur de recherche sémantique avec 🤗 *Datasets* qui peut faire correspondre les questions aux problèmes et commentaires les plus pertinents.
 
-<Tip>
-
-✏️ **Essayez !** Suivez les étapes que nous avons suivies dans cette section pour créer un jeu de données de problèmes GitHub pour votre bibliothèque open source préférée (choisissez autre chose que 🤗 *Datasets*, bien sûr !). Pour obtenir des points bonus, *finetunez* un classifieur multilabel pour prédire les balises présentes dans le champ `labels`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Suivez les étapes que nous avons suivies dans cette section pour créer un jeu de données de problèmes GitHub pour votre bibliothèque open source préférée (choisissez autre chose que 🤗 *Datasets*, bien sûr !). Pour obtenir des points bonus, *finetunez* un classifieur multilabel pour prédire les balises présentes dans le champ `labels`.
diff --git a/chapters/fr/chapter5/6.mdx b/chapters/fr/chapter5/6.mdx
index 19d2e1f5d..9f4647b2a 100644
--- a/chapters/fr/chapter5/6.mdx
+++ b/chapters/fr/chapter5/6.mdx
@@ -191,11 +191,8 @@ Dataset({
 D'accord, cela nous a donné quelques milliers de commentaires avec lesquels travailler !
 
 
-<Tip>
-
-✏️ **Essayez !** Voyez si vous pouvez utiliser `Dataset.map()` pour exploser la colonne `comments` de `issues_dataset` _sans_ recourir à l'utilisation de Pandas. C'est un peu délicat. La section [« Batch mapping »](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) de la documentation 🤗 *Datasets* peut être utile pour cette tâche.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Voyez si vous pouvez utiliser `Dataset.map()` pour exploser la colonne `comments` de `issues_dataset` _sans_ recourir à l'utilisation de Pandas. C'est un peu délicat. La section [« Batch mapping »](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) de la documentation 🤗 *Datasets* peut être utile pour cette tâche.
 
 Maintenant que nous avons un commentaire par ligne, créons une nouvelle colonne `comments_length` contenant le nombre de mots par commentaire :
 
@@ -526,8 +523,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Pas mal ! Notre deuxième résultat semble correspondre à la requête.
 
-<Tip>
-
-✏️ **Essayez !** Créez votre propre requête et voyez si vous pouvez trouver une réponse dans les documents récupérés. Vous devrez peut-être augmenter le paramètre `k` dans `Dataset.get_nearest_examples()` pour élargir la recherche.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Créez votre propre requête et voyez si vous pouvez trouver une réponse dans les documents récupérés. Vous devrez peut-être augmenter le paramètre `k` dans `Dataset.get_nearest_examples()` pour élargir la recherche.
diff --git a/chapters/fr/chapter6/2.mdx b/chapters/fr/chapter6/2.mdx
index b77452523..fbcaccbe3 100644
--- a/chapters/fr/chapter6/2.mdx
+++ b/chapters/fr/chapter6/2.mdx
@@ -13,12 +13,8 @@ Si un modèle de langue n'est pas disponible dans la langue qui vous intéresse
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ Entraîner un *tokenizer* n'est pas la même chose qu'entraîner un modèle ! L'entraînement du modèle utilise la descente de gradient stochastique pour réduire un peu plus la perte à chaque batch. Il est par nature aléatoire (ce qui signifie que vous devez définir des graines pour obtenir les mêmes résultats lorsque vous effectuez deux fois le même entraînement). Entraîner un *tokenizer* est un processus statistique qui identifie les meilleurs sous-mots à choisir pour un corpus donné. Les règles exactes utilisées pour les choisir dépendent de l'algorithme de tokénisation. Le processus est déterministe, ce qui signifie que vous obtenez toujours les mêmes résultats lorsque vous vous entraînez avec le même algorithme sur le même corpus.
-
-
-</Tip>
+> [!WARNING]
+> ⚠️ Entraîner un *tokenizer* n'est pas la même chose qu'entraîner un modèle ! L'entraînement du modèle utilise la descente de gradient stochastique pour réduire un peu plus la perte à chaque batch. Il est par nature aléatoire (ce qui signifie que vous devez définir des graines pour obtenir les mêmes résultats lorsque vous effectuez deux fois le même entraînement). Entraîner un *tokenizer* est un processus statistique qui identifie les meilleurs sous-mots à choisir pour un corpus donné. Les règles exactes utilisées pour les choisir dépendent de l'algorithme de tokénisation. Le processus est déterministe, ce qui signifie que vous obtenez toujours les mêmes résultats lorsque vous vous entraînez avec le même algorithme sur le même corpus.
 
 ## Assemblage d'un corpus
 
diff --git a/chapters/fr/chapter6/3.mdx b/chapters/fr/chapter6/3.mdx
index 554d3705f..8950a15ec 100644
--- a/chapters/fr/chapter6/3.mdx
+++ b/chapters/fr/chapter6/3.mdx
@@ -37,11 +37,8 @@ Dans la discussion qui suit, nous ferons souvent la distinction entre les *token
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ Lors de la tokenisation d'une seule phrase, vous ne verrez pas toujours une différence de vitesse entre les versions lente et rapide d'un même *tokenizer*. En fait, la version rapide peut même être plus lente ! Ce n'est que lorsque vous tokenisez beaucoup de textes en parallèle et en même temps que vous pourrez clairement voir la différence.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Lors de la tokenisation d'une seule phrase, vous ne verrez pas toujours une différence de vitesse entre les versions lente et rapide d'un même *tokenizer*. En fait, la version rapide peut même être plus lente ! Ce n'est que lorsque vous tokenisez beaucoup de textes en parallèle et en même temps que vous pourrez clairement voir la différence.
 
 ## L'objet <i>BatchEncoding</i>
 
@@ -116,11 +113,8 @@ On peut voir que les *tokens* spéciaux du *tokenizer*, `[CLS]` et `[SEP]`, sont
 
 La notion de ce qu'est un mot est compliquée. Par exemple, est-ce que « I'll » (contraction de « I will ») compte pour un ou deux mots ? Cela dépend en fait du *tokenizer* et de l'opération de prétokénisation qu'il applique. Certains *tokenizer* se contentent de séparer les espaces et considèrent donc qu'il s'agit d'un seul mot. D'autres utilisent la ponctuation en plus des espaces et considèrent donc qu'il s'agit de deux mots.
 
-<Tip>
-
-✏️ **Essayez !** Créez un *tokenizer* à partir des <i>checkpoints</i> `bert-base-cased` et `roberta-base` et tokenisez « 81s » avec. Qu'observez-vous ? Quels sont les identifiants des mots ?
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Créez un *tokenizer* à partir des <i>checkpoints</i> `bert-base-cased` et `roberta-base` et tokenisez « 81s » avec. Qu'observez-vous ? Quels sont les identifiants des mots ?
 
 De même, il existe une méthode `sentence_ids()` que nous pouvons utiliser pour associer un *token* à la phrase dont il provient (bien que dans ce cas, le `token_type_ids` retourné par le *tokenizer* peut nous donner la même information).
 
@@ -137,11 +131,8 @@ Sylvain
 
 Comme nous l'avons mentionné précédemment, tout ceci est rendu possible par le fait que le *tokenizer* rapide garde la trace de la partie du texte d'où provient chaque *token* dans une liste d'*offsets*. Pour illustrer leur utilisation, nous allons maintenant vous montrer comment reproduire manuellement les résultats du pipeline `token-classification`.
 
-<Tip>
-
-✏️ **Essayez !** Rédigez votre propre texte et voyez si vous pouvez comprendre quels *tokens* sont associés à l'identifiant du mot et comment extraire les étendues de caractères pour un seul mot. Pour obtenir des points bonus, essayez d'utiliser deux phrases en entrée et voyez si les identifiants ont un sens pour vous.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Rédigez votre propre texte et voyez si vous pouvez comprendre quels *tokens* sont associés à l'identifiant du mot et comment extraire les étendues de caractères pour un seul mot. Pour obtenir des points bonus, essayez d'utiliser deux phrases en entrée et voyez si les identifiants ont un sens pour vous.
 
 ## A l'intérieur du pipeline `token-classification`
 
diff --git a/chapters/fr/chapter6/3b.mdx b/chapters/fr/chapter6/3b.mdx
index f523851ed..6047ae1b1 100644
--- a/chapters/fr/chapter6/3b.mdx
+++ b/chapters/fr/chapter6/3b.mdx
@@ -319,11 +319,8 @@ Nous n'avons pas encore tout à fait terminé, mais au moins nous avons déjà l
 0.97773
 ```
 
-<Tip>
-
-✏️ **Essayez !** Calculez les indices de début et de fin pour les cinq réponses les plus probables.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Calculez les indices de début et de fin pour les cinq réponses les plus probables.
 
 Nous avons les `start_index` et `end_index` de la réponse en termes de *tokens*. Maintenant nous devons juste convertir en indices de caractères dans le contexte. C'est là que les *offsets* seront super utiles. Nous pouvons les saisir et les utiliser comme nous l'avons fait dans la tâche de classification des *tokens* :
 
@@ -357,11 +354,8 @@ print(result)
 
 Super ! C'est la même chose que dans notre premier exemple !
 
-<Tip>
-
-✏️ **Essayez !** Utilisez les meilleurs scores que vous avez calculés précédemment pour afficher les cinq réponses les plus probables. Pour vérifier vos résultats, retournez au premier pipeline et passez dans `top_k=5` lorsque vous l'appelez.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez les meilleurs scores que vous avez calculés précédemment pour afficher les cinq réponses les plus probables. Pour vérifier vos résultats, retournez au premier pipeline et passez `top_k=5` lorsque vous l'appelez.
 
 ## Gestion des contextes longs
 
@@ -689,11 +683,8 @@ print(candidates)
 
 Ces deux candidats correspondent aux meilleures réponses que le modèle a pu trouver dans chaque morceau. Le modèle est beaucoup plus confiant dans le fait que la bonne réponse se trouve dans la deuxième partie (ce qui est bon signe !). Il ne nous reste plus qu'à faire correspondre ces deux espaces de *tokens* à des espaces de caractères dans le contexte (nous n'avons besoin de faire correspondre que le second pour avoir notre réponse, mais il est intéressant de voir ce que le modèle a choisi dans le premier morceau).
 
-<Tip>
-
-✏️ **Essayez !** Adaptez le code ci-dessus pour renvoyer les scores et les étendues des cinq réponses les plus probables (au total, pas par morceau).
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Adaptez le code ci-dessus pour renvoyer les scores et les étendues des cinq réponses les plus probables (au total, pas par morceau).
 
 Le `offsets` que nous avons saisi plus tôt est en fait une liste d'*offsets* avec une liste par morceau de texte :
 
@@ -714,10 +705,7 @@ for candidate, offset in zip(candidates, offsets):
 
 Si nous ignorons le premier résultat, nous obtenons le même résultat que notre pipeline pour ce long contexte !
 
-<Tip>
-
-✏️ **Essayez !** Utilisez les meilleurs scores que vous avez calculés auparavant pour montrer les cinq réponses les plus probables (pour l'ensemble du contexte, pas pour chaque morceau). Pour vérifier vos résultats, retournez au premier pipeline et spécifiez `top_k=5` en argument en l'appelant.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Utilisez les meilleurs scores que vous avez calculés auparavant pour montrer les cinq réponses les plus probables (pour l'ensemble du contexte, pas pour chaque morceau). Pour vérifier vos résultats, retournez au premier pipeline et spécifiez `top_k=5` en argument en l'appelant.
 
 Ceci conclut notre plongée en profondeur dans les capacités du *tokenizer*. Nous mettrons à nouveau tout cela en pratique dans le prochain chapitre, lorsque nous vous montrerons comment *finetuner* un modèle sur une série de tâches NLP courantes.
diff --git a/chapters/fr/chapter6/4.mdx b/chapters/fr/chapter6/4.mdx
index 968c6578b..956d985a7 100644
--- a/chapters/fr/chapter6/4.mdx
+++ b/chapters/fr/chapter6/4.mdx
@@ -49,11 +49,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 Dans cet exemple, puisque nous avons choisi le *checkpoint* `bert-base-uncased`, la normalisation a mis le texte en minuscule et supprimé les accents. 
 
-<Tip>
-
-✏️ **Essayez !** Chargez un *tokenizer* depuis le *checkpoint* `bert-base-cased` et passez-lui le même exemple. Quelles sont les principales différences que vous pouvez voir entre les versions casée et non casée du *tokenizer* ?
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Chargez un *tokenizer* depuis le *checkpoint* `bert-base-cased` et passez-lui le même exemple. Quelles sont les principales différences que vous pouvez voir entre les versions casée et non casée du *tokenizer* ?
 
 ## Prétokenization
 
diff --git a/chapters/fr/chapter6/5.mdx b/chapters/fr/chapter6/5.mdx
index 5c9108ba3..44bcfaff0 100644
--- a/chapters/fr/chapter6/5.mdx
+++ b/chapters/fr/chapter6/5.mdx
@@ -13,11 +13,8 @@ Le *Byte-Pair Encoding* (BPE) a été initialement développé en tant qu'algori
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 Cette section couvre le BPE en profondeur, allant jusqu'à montrer une implémentation complète. Vous pouvez passer directement à la fin si vous souhaitez simplement avoir un aperçu général de l'algorithme de tokenisation.
-
-</Tip>
+> [!TIP]
+> 💡 Cette section couvre le BPE en profondeur, allant jusqu'à montrer une implémentation complète. Vous pouvez passer directement à la fin si vous souhaitez simplement avoir un aperçu général de l'algorithme de tokenisation.
 
 ## Algorithme d'entraînement
 
@@ -29,11 +26,8 @@ L'entraînement du BPE commence par le calcul de l'unique ensemble de mots utili
 
 Le vocabulaire de base sera alors `["b", "g", "h", "n", "p", "s", "u"]`. Dans le monde réel, le vocabulaire de base contient au moins tous les caractères ASCII et probablement aussi quelques caractères Unicode. Si un exemple que vous tokenisez utilise un caractère qui n'est pas dans le corpus d'entraînement, ce caractère est converti en *token* inconnu. C'est l'une des raisons pour lesquelles de nombreux modèles de NLP sont par exemple très mauvais dans l'analyse de contenus contenant des emojis.
 
-<Tip>
-
-Les *tokenizers* du GPT-2 et de RoBERTa (qui sont assez similaires) ont une façon intelligente de gérer ce problème : ils ne considèrent pas les mots comme étant écrits avec des caractères Unicode mais avec des octets. De cette façon, le vocabulaire de base a une petite taille (256) et tous les caractères auxquels vous pouvez penser seront inclus dedans et ne finiront pas par être convertis en un *token* inconnu. Cette astuce est appelée *byte-level BPE*.
-
-</Tip>
+> [!TIP]
+> Les *tokenizers* du GPT-2 et de RoBERTa (qui sont assez similaires) ont une façon intelligente de gérer ce problème : ils ne considèrent pas les mots comme étant écrits avec des caractères Unicode mais avec des octets. De cette façon, le vocabulaire de base a une petite taille (256) et tous les caractères auxquels vous pouvez penser seront inclus dedans et ne finiront pas par être convertis en un *token* inconnu. Cette astuce est appelée *byte-level BPE*.
 
 Après avoir obtenu ce vocabulaire de base, nous ajoutons de nouveaux *tokens* jusqu'à ce que la taille souhaitée du vocabulaire soit atteinte en apprenant les fusions qui sont des règles permettant de fusionner deux éléments du vocabulaire existant pour en créer un nouveau. Ainsi, au début, ces fusions créeront des *tokens* de deux caractères, puis au fur et à mesure de l'entraînement, des sous-mots plus longs.
 
@@ -76,11 +70,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 Et nous continuons ainsi jusqu'à ce que nous atteignions la taille de vocabulaire souhaitée.
 
-<Tip>
-
-✏️ **A votre tour !** A votre avis, quelle sera la prochaine règle de fusion ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** A votre avis, quelle sera la prochaine règle de fusion ?
 
 ## Algorithme de tokenisation
 
@@ -101,11 +92,8 @@ Prenons l'exemple que nous avons utilisé pendant l'entraînement, avec les troi
 
 Le mot « bug »  sera traduit par « ["b", "ug"] ». Par contre, le mot « mug » (tasse en français) sera traduit par « ["[UNK]", "ug"] » puisque la lettre « m » ne fait pas partie du vocabulaire de base. De la même façon, le mot « thug » (voyou en français) sera tokenisé en « ["[UNK]", "hug"] » car la lettre « t » n'est pas dans le vocabulaire de base et l'application des règles de fusion résulte d'abord en la fusion de « u » et « g » et ensuite en la fusion de « hu » et « g ».
 
-<Tip>
-
-✏️ **A votre tour !** Comment pensez-vous que le mot « unhug » (détacher en français) sera tokenisé ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Comment pensez-vous que le mot « unhug » (détacher en français) sera tokenisé ?
 
 ## Implémentation du BPE
 
@@ -321,11 +309,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 Utiliser `train_new_from_iterator()` sur le même corpus ne donnera pas exactement le même vocabulaire. C'est parce que lorsqu'il y a un choix de la paire la plus fréquente, nous avons sélectionné la première rencontrée, alors que la bibliothèque 🤗 *Tokenizers* sélectionne la première en fonction de ses identifiants internes.
-
-</Tip>
+> [!TIP]
+> 💡 Utiliser `train_new_from_iterator()` sur le même corpus ne donnera pas exactement le même vocabulaire. C'est parce que lorsqu'il y a un choix de la paire la plus fréquente, nous avons sélectionné la première rencontrée, alors que la bibliothèque 🤗 *Tokenizers* sélectionne la première en fonction de ses identifiants internes.
 
 Pour tokeniser un nouveau texte, on le prétokenise, on le divise, puis on applique toutes les règles de fusion apprises :
 
@@ -357,10 +342,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ Notre implémentation lancera une erreur s'il y a un caractère inconnu puisque nous n'avons rien fait pour les gérer. GPT-2 n'a pas réellement de <i>token</i> inconnu (il est impossible d'obtenir un caractère inconnu en utilisant le BPE au niveau de l'octet) mais cela pourrait arriver ici car nous n'avons pas inclus tous les octets possibles dans le vocabulaire initial. Cet aspect du BPE dépasse le cadre de cette section, nous avons donc laissé ces détails de côté.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Notre implémentation lancera une erreur s'il y a un caractère inconnu puisque nous n'avons rien fait pour gérer ce cas. GPT-2 n'a pas réellement de <i>token</i> inconnu (il est impossible d'obtenir un caractère inconnu en utilisant le BPE au niveau de l'octet) mais cela pourrait arriver ici car nous n'avons pas inclus tous les octets possibles dans le vocabulaire initial. Cet aspect du BPE dépasse le cadre de cette section, nous avons donc laissé ces détails de côté.
 
 C'est tout pour l'algorithme BPE ! Nous allons nous intéresser à WordPiece dans la suite.
diff --git a/chapters/fr/chapter6/6.mdx b/chapters/fr/chapter6/6.mdx
index 0e5516588..76bc57f92 100644
--- a/chapters/fr/chapter6/6.mdx
+++ b/chapters/fr/chapter6/6.mdx
@@ -13,19 +13,13 @@
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 Cette section couvre le <i>WordPiece</i> en profondeur, allant jusqu'à montrer une implémentation complète. Vous pouvez passer directement à la fin si vous souhaitez simplement avoir un aperçu général de l'algorithme de tokénisation.
-
-</Tip>
+> [!TIP]
+> 💡 Cette section couvre le <i>WordPiece</i> en profondeur, allant jusqu'à montrer une implémentation complète. Vous pouvez passer directement à la fin si vous souhaitez simplement avoir un aperçu général de l'algorithme de tokénisation.
 
 ## Algorithme d'entraînement
 
-<Tip warning={true}>
-
-⚠️ Google n'a jamais mis en ligne son implémentation de l'algorithme d'entraînement du <i>WordPiece</i>. Ce qui suit est donc notre meilleure estimation basée sur la littérature publiée. Il se peut qu'elle ne soit pas exacte à 100 %.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google n'a jamais mis en ligne son implémentation de l'algorithme d'entraînement du <i>WordPiece</i>. Ce qui suit est donc notre meilleure estimation basée sur la littérature publiée. Il se peut qu'elle ne soit pas exacte à 100 %.
 
 Comme le BPE, *WordPiece* part d'un petit vocabulaire comprenant les *tokens* spéciaux utilisés par le modèle et l'alphabet initial. Puisqu'il identifie les sous-mots en ajoutant un préfixe (comme `##` pour BERT), chaque mot est initialement découpé en ajoutant ce préfixe à tous les caractères du mot. Ainsi par exemple, `"word"` est divisé comme ceci :
 
@@ -78,11 +72,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 et nous continuons ainsi jusqu'à ce que nous atteignions la taille de vocabulaire souhaitée.
 
-<Tip>
-
-✏️ **A votre tour !** Quelle sera la prochaine règle de fusion ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Quelle sera la prochaine règle de fusion ?
 
 ## Algorithme de tokenisation
 
@@ -94,11 +85,8 @@ Comme autre exemple, voyons comment le mot `"bugs"` serait tokenisé. `"b"` est
 
 Lorsque la tokenisation arrive à un stade où il n'est pas possible de trouver un sous-mot dans le vocabulaire, le mot entier est tokenisé comme inconnu. Par exemple, `"mug"` serait tokenisé comme `["[UNK]"]`, tout comme `"bum"` (même si on peut commencer par " b " et " ##u ", " ##m " ne fait pas partie du vocabulaire, et le *tokenizer* résultant sera simplement `["[UNK]"]` " et non `["b", "##u", "[UNK]"]` "). C'est une autre différence avec le BPE qui classerait seulement les caractères individuels qui ne sont pas dans le vocabulaire comme inconnus.
 
-<Tip>
-
-✏️ **A votre tour !** Comment le mot `"pugs"` sera-t-il tokenisé ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Comment le mot `"pugs"` sera-t-il tokenisé ?
 
 ## Implémentation de <i>WordPiece</i>
 
@@ -320,11 +308,8 @@ print(vocab)
 
 Comme nous pouvons le voir, comparé à BPE, ce *tokenizer* apprend les parties de mots comme des *tokens* un peu plus rapidement.
 
-<Tip>
-
-💡 Utiliser `train_new_from_iterator()` sur le même corpus ne donnera pas exactement le même vocabulaire. C'est parce que la bibliothèque 🤗 *Tokenizers* n'implémente pas *WordPiece* pour l'entraînement (puisque nous ne sommes pas complètement sûrs de son fonctionnement interne), mais utilise le BPE à la place.
-
-</Tip>
+> [!TIP]
+> 💡 Utiliser `train_new_from_iterator()` sur le même corpus ne donnera pas exactement le même vocabulaire. C'est parce que la bibliothèque 🤗 *Tokenizers* n'implémente pas *WordPiece* pour l'entraînement (puisque nous ne sommes pas complètement sûrs de son fonctionnement interne), mais utilise le BPE à la place.
 
 Pour tokeniser un nouveau texte, on le prétokenise, on le divise, puis on applique l'algorithme de tokenisation sur chaque mot. En d'autres termes, nous recherchons le plus grand sous-mot commençant au début du premier mot et le divisons. Puis nous répétons le processus sur la deuxième partie et ainsi de suite pour le reste de ce mot et les mots suivants dans le texte :
 
diff --git a/chapters/fr/chapter6/7.mdx b/chapters/fr/chapter6/7.mdx
index ab970d817..6ff5f0b27 100644
--- a/chapters/fr/chapter6/7.mdx
+++ b/chapters/fr/chapter6/7.mdx
@@ -13,11 +13,8 @@ L'algorithme *Unigram* est souvent utilisé dans *SentencePiece*, qui est l'algo
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 Cette section couvre *Unigram* en profondeur, allant jusqu'à montrer une implémentation complète. Vous pouvez passer directement à la fin si vous souhaitez simplement avoir un aperçu général de l'algorithme de tokénisation.
-
-</Tip>
+> [!TIP]
+> 💡 Cette section couvre *Unigram* en profondeur, allant jusqu'à montrer une implémentation complète. Vous pouvez passer directement à la fin si vous souhaitez simplement avoir un aperçu général de l'algorithme de tokénisation.
 
 ## Algorithme d'entraînement
 
@@ -58,11 +55,8 @@ Voici les fréquences de tous les sous-mots possibles dans le vocabulaire :
 
 Ainsi, la somme de toutes les fréquences est de 210 et la probabilité du sous-mot `"ug"` est donc de 20/210.
 
-<Tip>
-
-✏️ **A votre tour !** Ecrivez le code permettant de calculer les fréquences ci-dessus et vérifiez que les résultats affichés sont corrects, de même que la somme totale.
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Ecrivez le code permettant de calculer les fréquences ci-dessus et vérifiez que les résultats affichés sont corrects, de même que la somme totale.
 
 Maintenant, pour tokeniser un mot donné, nous examinons toutes les segmentations possibles en *tokens* et calculons la probabilité de chacune d'entre elles selon le modèle *Unigram*. Puisque tous les *tokens* sont considérés comme indépendants, cette probabilité est juste le produit de la probabilité de chaque *token*. Par exemple, la tokenisation `["p", "u", "g"]` de `"pug"` a la probabilité :
 
@@ -100,11 +94,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 Ainsi, `"unhug"` serait tokenisé comme `["un", "hug"]`.
 
-<Tip>
-
-✏️ **A votre tour !** Déterminer la tokenization du mot `"huggun"` et son score.
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Déterminez la tokenisation du mot `"huggun"` et son score.
 
 ## Retour à l'entraînement
 
@@ -221,11 +212,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 *SentencePiece* utilise un algorithme plus efficace appelé *Enhanced Suffix Array* (ESA) pour créer le vocabulaire initial.
-
-</Tip>
+> [!TIP]
+> 💡 *SentencePiece* utilise un algorithme plus efficace appelé *Enhanced Suffix Array* (ESA) pour créer le vocabulaire initial.
 
 Ensuite, nous calculons la somme de toutes les fréquences, pour convertir les fréquences en probabilités. Pour notre modèle, nous allons stocker les logarithmes des probabilités, car c'est plus stable numériquement d'additionner des logarithmes que de multiplier des petits nombres. Cela simplifiera aussi le calcul de la perte du modèle :
 
@@ -346,11 +334,8 @@ Puisque `"ll"` est utilisé dans la tokenisation de `"Hopefully"`, et que le sup
 0.0
 ```
 
-<Tip>
-
-💡 Cette approche est très inefficace, c'est pourquoi *SentencePiece* utilise une approximation de la perte du modèle sans le *token* X. Au lieu de partir de zéro, il remplace simplement le *token* X par sa segmentation dans le vocabulaire restant. De cette façon, tous les scores peuvent être calculés en une seule fois, en même temps que la perte du modèle.
-
-</Tip>
+> [!TIP]
+> 💡 Cette approche est très inefficace, c'est pourquoi *SentencePiece* utilise une approximation de la perte du modèle sans le *token* X. Au lieu de partir de zéro, il remplace simplement le *token* X par sa segmentation dans le vocabulaire restant. De cette façon, tous les scores peuvent être calculés en une seule fois, en même temps que la perte du modèle.
 
 Une fois tout cela en place, la dernière chose à faire est d'ajouter les *tokens* spéciaux utilisés par le modèle au vocabulaire, puis de boucler jusqu'à ce que nous ayons élagué suffisamment de *tokens* du vocabulaire pour atteindre la taille souhaitée :
 
diff --git a/chapters/fr/chapter6/8.mdx b/chapters/fr/chapter6/8.mdx
index 5bed366b9..44dd5298d 100644
--- a/chapters/fr/chapter6/8.mdx
+++ b/chapters/fr/chapter6/8.mdx
@@ -115,12 +115,9 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**Pour aller plus loin** Si vous testez les deux versions des normaliseurs précédents sur une chaîne contenant le caractère unicode `u"\u0085"` vous remarquerez sûrement qu'ils ne sont pas exactement équivalents. 
-Pour ne pas trop compliquer la version avec `normalizers.Sequence`, nous n'avons pas inclus les Regex que le `BertNormalizer` requiert quand l'argument `clean_text` est mis à `True` ce qui est le comportement par défaut. Mais ne vous inquiétez pas : il est possible d'obtenir exactement la même normalisation sans utiliser le très pratique `BertNormalizer` en ajoutant deux `normalizers.Replace` à la séquence de normalisation. 
-
-</Tip>
+> [!TIP]
+> **Pour aller plus loin** Si vous testez les deux versions des normaliseurs précédents sur une chaîne contenant le caractère unicode `u"\u0085"` vous remarquerez sûrement qu'ils ne sont pas exactement équivalents. 
+> Pour ne pas trop compliquer la version avec `normalizers.Sequence`, nous n'avons pas inclus les Regex que le `BertNormalizer` requiert quand l'argument `clean_text` est mis à `True` ce qui est le comportement par défaut. Mais ne vous inquiétez pas : il est possible d'obtenir exactement la même normalisation sans utiliser le très pratique `BertNormalizer` en ajoutant deux `normalizers.Replace` à la séquence de normalisation.
 
 L'étape suivante est la prétokenisation. Encore une fois, il y a un `BertPreTokenizer` préconstruit que nous pouvons utiliser :
 
diff --git a/chapters/fr/chapter7/1.mdx b/chapters/fr/chapter7/1.mdx
index 603659553..0d5db03ab 100644
--- a/chapters/fr/chapter7/1.mdx
+++ b/chapters/fr/chapter7/1.mdx
@@ -31,8 +31,5 @@ Chaque section peut être lue indépendamment.
 {/if}
 
 
-<Tip>
-
-Si vous lisez les sections dans l'ordre, vous remarquerez qu'elles ont beaucoup de code et de prose en commun. La répétition est intentionnelle, afin de vous permettre de vous plonger (ou de revenir plus tard) dans une tâche qui vous intéresse et de trouver un exemple fonctionnel complet.
-
-</Tip>
+> [!TIP]
+> Si vous lisez les sections dans l'ordre, vous remarquerez qu'elles ont beaucoup de code et de prose en commun. La répétition est intentionnelle, afin de vous permettre de vous plonger (ou de revenir plus tard) dans une tâche qui vous intéresse et de trouver un exemple fonctionnel complet.
diff --git a/chapters/fr/chapter7/2.mdx b/chapters/fr/chapter7/2.mdx
index 67bd0297e..f52969eec 100644
--- a/chapters/fr/chapter7/2.mdx
+++ b/chapters/fr/chapter7/2.mdx
@@ -49,11 +49,8 @@ Vous pouvez trouver, télécharger et vérifier les précisions de ce modèle su
 
 Tout d'abord, nous avons besoin d'un jeu de données adapté à la classification des *tokens*. Dans cette section, nous utiliserons le jeu de données [CoNLL-2003](https://huggingface.co/datasets/conll2003), qui contient des articles de presse de Reuters. 
 
-<Tip>
-
-💡 Tant que votre jeu de données consiste en des textes divisés en mots avec leurs étiquettes correspondantes, vous pourrez adapter les procédures de traitement des données décrites ici à votre propre jeu de données. Reportez-vous au [chapitre 5](/course/fr/chapter5) si vous avez besoin d'un rafraîchissement sur la façon de charger vos propres données personnalisées dans un `Dataset`.
-
-</Tip>
+> [!TIP]
+> 💡 Tant que votre jeu de données consiste en des textes divisés en mots avec leurs étiquettes correspondantes, vous pourrez adapter les procédures de traitement des données décrites ici à votre propre jeu de données. Reportez-vous au [chapitre 5](/course/fr/chapter5) si vous avez besoin d'un rafraîchissement sur la façon de charger vos propres données personnalisées dans un `Dataset`.
 
 ### Le jeu de données CoNLL-2003
 
@@ -171,11 +168,8 @@ Et pour un exemple mélangeant les étiquettes `B-` et `I-`, voici ce que le mê
 
 Comme on peut le voir, les entités couvrant deux mots, comme « European Union » et « Werner Zwingmann », se voient attribuer une étiquette `B-` pour le premier mot et une étiquette `I-` pour le second.
 
-<Tip>
-
-✏️ *A votre tour !* Affichez les deux mêmes phrases avec leurs étiquettes POS ou *chunking*.
-
-</Tip>
+> [!TIP]
+> ✏️ *A votre tour !* Affichez les deux mêmes phrases avec leurs étiquettes POS ou *chunking*.
 
 ### Traitement des données
 
@@ -267,10 +261,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 Comme nous pouvons le voir, notre fonction a ajouté `-100` pour les deux *tokens* spéciaux du début et de fin, et un nouveau `0` pour notre mot qui a été divisé en deux *tokens*.
 
-<Tip>
-
-✏️ *A votre tour !* Certains chercheurs préfèrent n'attribuer qu'une seule étiquette par mot et attribuer `-100` aux autres sous-*tokens* dans un mot donné. Ceci afin d'éviter que les longs mots qui se divisent en plusieurs batchs ne contribuent fortement à la perte. Changez la fonction précédente pour aligner les étiquettes avec les identifiants d'entrée en suivant cette règle.
-</Tip>
+> [!TIP]
+> ✏️ *A votre tour !* Certains chercheurs préfèrent n'attribuer qu'une seule étiquette par mot et attribuer `-100` aux autres sous-*tokens* dans un mot donné. Ceci afin d'éviter que les longs mots qui se divisent en de nombreux sous-*tokens* ne contribuent fortement à la perte. Changez la fonction précédente pour aligner les étiquettes avec les identifiants d'entrée en suivant cette règle.
 
 Pour prétraiter notre jeu de données, nous devons tokeniser toutes les entrées et appliquer `align_labels_with_tokens()` sur toutes les étiquettes. Pour profiter de la vitesse de notre *tokenizer* rapide, il est préférable de tokeniser beaucoup de textes en même temps. Nous allons donc écrire une fonction qui traite une liste d'exemples et utiliser la méthode `Dataset.map()` avec l'option `batched=True`. La seule chose qui diffère de notre exemple précédent est que la fonction `word_ids()` a besoin de récupérer l'index de l'exemple dont nous voulons les identifiants de mots lorsque les entrées du *tokenizer* sont des listes de textes (ou dans notre cas, des listes de mots), donc nous l'ajoutons aussi :
 
@@ -432,11 +424,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Si vous avez un modèle avec le mauvais nombre d'étiquettes, vous obtiendrez plus tard une erreur obscure lors de l'appel de `model.fit()`. Cela peut être ennuyeux à déboguer donc assurez-vous de faire cette vérification pour confirmer que vous avez le nombre d'étiquettes attendu.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Si vous avez un modèle avec le mauvais nombre d'étiquettes, vous obtiendrez plus tard une erreur obscure lors de l'appel de `model.fit()`. Cela peut être ennuyeux à déboguer donc assurez-vous de faire cette vérification pour confirmer que vous avez le nombre d'étiquettes attendu.
 
 ### <i>Finetuning</i> du modèle
 
@@ -500,11 +489,8 @@ model.fit(
 
 Vous pouvez spécifier le nom complet du dépôt vers lequel vous voulez pousser avec l'argument `hub_model_id` (en particulier, vous devrez utiliser cet argument pour pousser vers une organisation). Par exemple, lorsque nous avons poussé le modèle vers l'organisation [`huggingface-course`](https://huggingface.co/huggingface-course), nous avons ajouté `hub_model_id="huggingface-course/bert-finetuned-ner"`. Par défaut, le dépôt utilisé sera dans votre espace de noms et nommé après le répertoire de sortie que vous avez défini, par exemple `"cool_huggingface_user/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de l'appel de `model.fit()` et devrez définir un nouveau nom.
-
-</Tip>
+> [!TIP]
+> 💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de l'appel de `model.fit()` et devrez définir un nouveau nom.
 
 Notez que pendant l'entraînement, chaque fois que le modèle est sauvegardé (ici, à chaque époque), il est téléchargé sur le *Hub* en arrière-plan. De cette façon, vous pourrez reprendre votre entraînement sur une autre machine si nécessaire.
 
@@ -682,11 +668,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Si vous avez un modèle avec le mauvais nombre d'étiquettes, vous obtiendrez une erreur obscure lors de l'appel de la méthode `Trainer.train()` (quelque chose comme "CUDA error : device-side assert triggered"). C'est la première cause de bogues signalés par les utilisateurs pour de telles erreurs, donc assurez-vous de faire cette vérification pour confirmer que vous avez le nombre d'étiquettes attendu.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Si vous avez un modèle avec le mauvais nombre d'étiquettes, vous obtiendrez une erreur obscure lors de l'appel de la méthode `Trainer.train()` (quelque chose comme "CUDA error: device-side assert triggered"). C'est la première cause de bogues signalés par les utilisateurs pour de telles erreurs, donc assurez-vous de faire cette vérification pour confirmer que vous avez le nombre d'étiquettes attendu.
 
 ### <i>Finetuning</i> du modèle
 
@@ -724,11 +707,8 @@ args = TrainingArguments(
 
 Vous avez déjà vu la plupart d'entre eux. Nous définissons quelques hyperparamètres (comme le taux d'apprentissage, le nombre d'époques à entraîner, et le taux de décroissance des poids), et nous spécifions `push_to_hub=True` pour indiquer que nous voulons sauvegarder le modèle, l'évaluer à la fin de chaque époque, et que nous voulons télécharger nos résultats vers le *Hub*. Notez que vous pouvez spécifier le nom du dépôt vers lequel vous voulez pousser avec l'argument `hub_model_id` (en particulier, vous devrez utiliser cet argument pour pousser vers une organisation). Par exemple, lorsque nous avons poussé le modèle vers l'organisation [`huggingface-course`](https://huggingface.co/huggingface-course), nous avons ajouté `hub_model_id="huggingface-course/bert-finetuned-ner"``TrainingArguments`. Par défaut, le dépôt utilisé sera dans votre espace de noms et nommé d'après le répertoire de sortie que vous avez défini, donc dans notre cas ce sera `"sgugger/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de la définition de votre `Trainer` et devrez définir un nouveau nom.
-
-</Tip>
+> [!TIP]
+> 💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de la définition de votre `Trainer` et devrez définir un nouveau nom.
 
 Enfin, nous passons tout au `Trainer` et lançons l'entraînement :
 
@@ -816,11 +796,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Si vous entraînez sur un TPU, vous devrez déplacer tout le code à partir de la cellule ci-dessus dans une fonction d'entraînement dédiée. Voir le [chapitre 3](/course/fr/chapter3) pour plus de détails.
-
-</Tip>
+> [!TIP]
+> 🚨 Si vous entraînez sur un TPU, vous devrez déplacer tout le code à partir de la cellule ci-dessus dans une fonction d'entraînement dédiée. Voir le [chapitre 3](/course/fr/chapter3) pour plus de détails.
 
 Maintenant que nous avons envoyé notre `train_dataloader` à `accelerator.prepare()`, nous pouvons utiliser sa longueur pour calculer le nombre d'étapes d'entraînement. Rappelez-vous que nous devrions toujours faire cela après avoir préparé le *dataloader* car cette méthode modifiera sa longueur. Nous utilisons un programme linéaire classique du taux d'apprentissage à 0 :
 
diff --git a/chapters/fr/chapter7/3.mdx b/chapters/fr/chapter7/3.mdx
index 11733c1b3..407d10fc7 100644
--- a/chapters/fr/chapter7/3.mdx
+++ b/chapters/fr/chapter7/3.mdx
@@ -45,11 +45,8 @@ Allons-y !
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 Si les termes « modélisation du langage masqué » et « modèle pré-entraîné » ne vous sont pas familiers, consultez le [chapitre 1](/course/fr/chapiter1), où nous expliquons tous ces concepts fondamentaux, vidéos à l'appui !
-
-</Tip>
+> [!TIP]
+> 🙋 Si les termes « modélisation du langage masqué » et « modèle pré-entraîné » ne vous sont pas familiers, consultez le [chapitre 1](/course/fr/chapter1), où nous expliquons tous ces concepts fondamentaux, vidéos à l'appui !
 
 ## Choix d'un modèle pré-entraîné pour la modélisation du langage masqué
 
@@ -241,11 +238,8 @@ for row in sample:
 
 Oui, ce sont bien des critiques de films, et si vous êtes assez âgés, vous pouvez même comprendre le commentaire dans la dernière critique sur le fait de posséder une version VHS 😜 ! Bien que nous n'ayons pas besoin des étiquettes pour la modélisation du langage, nous pouvons déjà voir qu'un `0` dénote une critique négative, tandis qu'un `1` correspond à une critique positive.
 
-<Tip>
-
-✏️ **Essayez !** Créez un échantillon aléatoire de la répartition `unsupervised` et vérifiez que les étiquettes ne sont ni `0` ni `1`. Pendant que vous y êtes, vous pouvez aussi vérifier que les étiquettes dans les échantillons `train` et `test` sont bien `0` ou `1`. C'est un contrôle utile que tout praticien en NLP devrait effectuer au début d'un nouveau projet !
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Créez un échantillon aléatoire de la répartition `unsupervised` et vérifiez que les étiquettes ne sont ni `0` ni `1`. Pendant que vous y êtes, vous pouvez aussi vérifier que les étiquettes dans les échantillons `train` et `test` sont bien `0` ou `1`. C'est un contrôle utile que tout praticien en NLP devrait effectuer au début d'un nouveau projet !
 
 Maintenant que nous avons jeté un coup d'œil rapide aux données, plongeons dans leur préparation pour la modélisation du langage masqué. Comme nous allons le voir, il y a quelques étapes supplémentaires à suivre par rapport aux tâches de classification de séquences que nous avons vues au [chapitre 3](/course/fr/chapter3). Allons-y !
 
@@ -304,11 +298,8 @@ tokenizer.model_max_length
 
 Cette valeur est dérivée du fichier *tokenizer_config.json* associé à un *checkpoint*. Dans ce cas, nous pouvons voir que la taille du contexte est de 512 *tokens*, tout comme avec BERT.
 
-<Tip>
-
-✏️ **Essayez !** Certains *transformers*, comme [BigBird](https://huggingface.co/google/bigbird-roberta-base) et [Longformer](hf.co/allenai/longformer-base-4096), ont une longueur de contexte beaucoup plus longue que BERT et les autres premiers *transformers*. Instanciez le *tokenizer* pour l'un de ces *checkpoints* et vérifiez que le `model_max_length` correspond à ce qui est indiqué sur sa carte.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Certains *transformers*, comme [BigBird](https://huggingface.co/google/bigbird-roberta-base) et [Longformer](https://hf.co/allenai/longformer-base-4096), ont une longueur de contexte beaucoup plus longue que BERT et les autres premiers *transformers*. Instanciez le *tokenizer* pour l'un de ces *checkpoints* et vérifiez que le `model_max_length` correspond à ce qui est indiqué sur sa carte.
 
 Ainsi, pour réaliser nos expériences sur des GPUs comme ceux disponibles sur Google Colab, nous choisirons quelque chose d'un peu plus petit qui peut tenir en mémoire :
 
@@ -316,11 +307,8 @@ Ainsi, pour réaliser nos expériences sur des GPUs comme ceux disponibles sur G
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-Notez que l'utilisation d'une petite taille peut être préjudiciable dans les scénarios du monde réel. Vous devez donc utiliser une taille qui correspond au cas d'utilisation auquel vous appliquerez votre modèle.
-
-</Tip>
+> [!WARNING]
+> Notez que l'utilisation d'une petite taille peut être préjudiciable dans les scénarios du monde réel. Vous devez donc utiliser une taille qui correspond au cas d'utilisation auquel vous appliquerez votre modèle.
 
 Maintenant vient la partie amusante. Pour montrer comment la concaténation fonctionne, prenons quelques commentaires de notre ensemble d'entraînement et affichons le nombre de *tokens* par commentaire :
 
@@ -477,11 +465,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 Super, ça a marché ! Nous pouvons voir que le *token* `[MASK]` a été inséré de façon aléatoire à différents endroits dans notre texte. Ce seront les *tokens* que notre modèle devra prédire pendant l'entraînement. Et la beauté du collecteur de données est qu'il va rendre aléatoire l'insertion du `[MASK]` à chaque batch ! 
 
-<Tip>
-
-✏️ **Essayez** Exécutez le code ci-dessus plusieurs fois pour voir le masquage aléatoire se produire sous vos yeux ! Remplacez aussi la méthode `tokenizer.decode()` par `tokenizer.convert_ids_to_tokens()` pour voir que parfois un seul *token* d'un mot donné est masqué et pas les autres.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Exécutez le code ci-dessus plusieurs fois pour voir le masquage aléatoire se produire sous vos yeux ! Remplacez aussi la méthode `tokenizer.decode()` par `tokenizer.convert_ids_to_tokens()` pour voir que parfois un seul *token* d'un mot donné est masqué et pas les autres.
 
 {#if fw === 'pt'}
 
@@ -591,11 +576,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **Essayez** Exécutez le code ci-dessus plusieurs fois pour voir le masquage aléatoire se produire sous vos yeux ! Remplacez aussi la méthode `tokenizer.decode()` par `tokenizer.convert_ids_to_tokens()` pour voir que les *tokens* d'un mot donné sont toujours masqués ensemble.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Exécutez le code ci-dessus plusieurs fois pour voir le masquage aléatoire se produire sous vos yeux ! Remplacez aussi la méthode `tokenizer.decode()` par `tokenizer.convert_ids_to_tokens()` pour voir que les *tokens* d'un mot donné sont toujours masqués ensemble.
 
 Maintenant que nous avons deux assembleurs de données, les étapes restantes du *finetuning* sont standards. L'entraînement peut prendre un certain temps sur Google Colab si vous n'avez pas la chance de tomber sur un mythique GPU P100 😭. Ainsi nous allons d'abord réduire la taille du jeu d'entraînement à quelques milliers d'exemples. Ne vous inquiétez pas, nous obtiendrons quand même un modèle de langage assez décent ! Un moyen rapide de réduire la taille d'un jeu de données dans 🤗 *Datasets* est la fonction `Dataset.train_test_split()` que nous avons vue au [chapitre 5](/course/fr/chapter5) :
 
@@ -820,11 +802,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **A votre tour !** Exécutez l'entraînement ci-dessus après avoir remplacé le collecteur de données par le collecteur de mots entiers masqués. Obtenez-vous de meilleurs résultats ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Exécutez l'entraînement ci-dessus après avoir remplacé le collecteur de données par le collecteur de mots entiers masqués. Obtenez-vous de meilleurs résultats ?
 
 {#if fw === 'pt'} 
 
@@ -1042,8 +1021,5 @@ Notre modèle a clairement adapté ses pondérations pour prédire les mots qui
 
 Ceci conclut notre première expérience d'entraînement d'un modèle de langage. Dans la [section 6](/course/fr/chapter7/section6), vous apprendrez comment entraîner à partir de zéro un modèle autorégressif comme GPT-2. Allez-y si vous voulez voir comment vous pouvez pré-entraîner votre propre *transformer* !
 
-<Tip>
-
-✏️ **Essayez !** Pour quantifier les avantages de l'adaptation au domaine, <i>finetunez</i> un classifieur sur le jeu de données IMDb pour à la fois, le <i>checkpoint</i> de DistilBERT pré-entraîné et e <i>checkpoint</i> de DistilBERT <i>finetuné</i>. Si vous avez besoin d'un rafraîchissement sur la classification de texte, consultez le [chapitre 3](/course/fr/chapter3). 
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Pour quantifier les avantages de l'adaptation au domaine, <i>finetunez</i> un classifieur sur le jeu de données IMDb à la fois à partir du <i>checkpoint</i> de DistilBERT pré-entraîné et du <i>checkpoint</i> de DistilBERT <i>finetuné</i>. Si vous avez besoin d'un rafraîchissement sur la classification de texte, consultez le [chapitre 3](/course/fr/chapter3).
diff --git a/chapters/fr/chapter7/4.mdx b/chapters/fr/chapter7/4.mdx
index 1699e2038..3ac1fa50d 100644
--- a/chapters/fr/chapter7/4.mdx
+++ b/chapters/fr/chapter7/4.mdx
@@ -161,11 +161,8 @@ Il sera intéressant de voir si notre modèle *finetuné* tient compte de ces pa
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **A votre tour !** Un autre mot anglais souvent utilisé en français est « *email* ». Trouvez le premier échantillon dans l'échantillon d'entraînement qui utilise ce mot. Comment est-il traduit ? Comment le modèle pré-entraîné traduit-il cette même phrase ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Un autre mot anglais souvent utilisé en français est « *email* ». Trouvez le premier échantillon dans l'échantillon d'entraînement qui utilise ce mot. Comment est-il traduit ? Comment le modèle pré-entraîné traduit-il cette même phrase ?
 
 ### Traitement des données
 
@@ -182,11 +179,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="tf")
 
 Vous pouvez remplacer le `model_checkpoint` par un tout autre modèle disponible sur le [*Hub*](https://huggingface.co/models) qui aurait votre préférence, ou par un dossier en local où vous avez sauvegardé un modèle pré-entraîné et un *tokenizer*.
 
-<Tip>
-
-💡 Si vous utilisez un *tokenizer* multilingue tel que mBART, mBART-50 ou M2M100, vous devrez définir les codes de langue de vos entrées et cibles dans le *tokenizer* en définissant `tokenizer.src_lang` et `tokenizer.tgt_lang` aux bonnes valeurs.
-
-</Tip>
+> [!TIP]
+> 💡 Si vous utilisez un *tokenizer* multilingue tel que mBART, mBART-50 ou M2M100, vous devrez définir les codes de langue de vos entrées et cibles dans le *tokenizer* en définissant `tokenizer.src_lang` et `tokenizer.tgt_lang` aux bonnes valeurs.
 
 La préparation de nos données est assez simple. Il y a juste une chose à retenir : vous traitez les entrées comme d'habitude, mais pour les cibles, vous devez envelopper le *tokenizer* dans le gestionnaire de contexte `as_target_tokenizer()`.
 
@@ -249,17 +243,11 @@ def preprocess_function(examples):
 
 Notez que nous avons fixé des longueurs maximales similaires pour nos entrées et nos sorties. Comme les textes que nous traitons semblent assez courts, nous utilisons 128.
 
-<Tip>
+> [!TIP]
+> 💡 Si vous utilisez un modèle T5 (plus précisément, un des *checkpoints* `t5-xxx`), le modèle s'attendra à ce que les entrées aient un préfixe indiquant la tâche à accomplir, comme `translate English to French:`.
 
-💡 Si vous utilisez un modèle T5 (plus précisément, un des *checkpoints* `t5-xxx`), le modèle s'attendra à ce que les entrées aient un préfixe indiquant la tâche à accomplir, comme `translate: English to French:`.
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ Nous ne faisons pas attention au masque d'attention des cibles car le modèle ne s'y attend pas. Au lieu de cela, les étiquettes correspondant à un *token* de *padding* doivent être mises à `-100` afin qu'elles soient ignorées dans le calcul de la perte. Cela sera fait par notre assembleur de données plus tard puisque nous appliquons le *padding* dynamique, mais si vous utilisez le *padding* ici, vous devriez adapter la fonction de prétraitement pour mettre toutes les étiquettes qui correspondent au *token* de *padding* à `-100`.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nous ne faisons pas attention au masque d'attention des cibles car le modèle ne s'y attend pas. Au lieu de cela, les étiquettes correspondant à un *token* de *padding* doivent être mises à `-100` afin qu'elles soient ignorées dans le calcul de la perte. Cela sera fait par notre assembleur de données plus tard puisque nous appliquons le *padding* dynamique, mais si vous utilisez le *padding* ici, vous devriez adapter la fonction de prétraitement pour mettre toutes les étiquettes qui correspondent au *token* de *padding* à `-100`.
 
 Nous pouvons maintenant appliquer ce prétraitement en une seule fois sur toutes les échantillons de notre jeu de données :
 
@@ -661,11 +649,8 @@ model.fit(
 
 Notez que vous pouvez spécifier le nom du dépôt vers lequel vous voulez pousser le modèle avec l'argument `hub_model_id` (en particulier, vous devrez utiliser cet argument pour pousser vers une organisation). Par exemple, lorsque nous avons poussé le modèle vers l'organisation [`huggingface-course`](https://huggingface.co/huggingface-course), nous avons ajouté `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` dans `Seq2SeqTrainingArguments`. Par défaut, le dépôt utilisé sera dans votre espace et nommé après le répertoire de sortie que vous avez défini. Ici ce sera `"sgugger/marian-finetuned-kde4-en-to-fr"` (qui est le modèle que nous avons lié au début de cette section).
 
-<Tip>
-
-💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de l'appel de `model.fit()` et devrez définir un nouveau nom.
-
-</Tip>
+> [!TIP]
+> 💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de l'appel de `model.fit()` et devrez définir un nouveau nom.
 
 Enfin, voyons à quoi ressemblent nos métriques maintenant que l'entraînement est terminé :
 
@@ -711,11 +696,8 @@ En dehors des hyperparamètres habituels (comme le taux d'apprentissage, le nomb
 
 Notez que vous pouvez spécifier le nom complet du dépôt vers lequel vous voulez pousser avec l'argument `hub_model_id` (en particulier, vous devrez utiliser cet argument pour pousser vers une organisation). Par exemple, lorsque nous avons poussé le modèle vers l'organisation [`huggingface-course`](https://huggingface.co/huggingface-course), nous avons ajouté `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` à `Seq2SeqTrainingArguments`. Par défaut, le dépôt utilisé sera dans votre espace et nommé d'après le répertoire de sortie que vous avez défini. Dans notre cas ce sera `"sgugger/marian-finetuned-kde4-en-to-fr"` (qui est le modèle que nous avons lié au début de cette section).
 
-<Tip>
-
-💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de la définition de votre `Seq2SeqTrainer` et devrez définir un nouveau nom.
-
-</Tip>
+> [!TIP]
+> 💡 Si le répertoire de sortie que vous utilisez existe déjà, il doit être un clone local du dépôt vers lequel vous voulez pousser. S'il ne l'est pas, vous obtiendrez une erreur lors de la définition de votre `Seq2SeqTrainer` et devrez définir un nouveau nom.
 
 
 Enfin, nous passons tout au `Seq2SeqTrainer` :
@@ -1007,8 +989,5 @@ translator(
 
 Un autre excellent exemple d'adaptation au domaine !
 
-<Tip>
-
-✏️ **A votre tour !** Que retourne le modèle sur l'échantillon avec le mot « *email* » que vous avez identifié plus tôt ?
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Que retourne le modèle sur l'échantillon avec le mot « *email* » que vous avez identifié plus tôt ?
diff --git a/chapters/fr/chapter7/5.mdx b/chapters/fr/chapter7/5.mdx
index 0a963e8cb..66ab948e9 100644
--- a/chapters/fr/chapter7/5.mdx
+++ b/chapters/fr/chapter7/5.mdx
@@ -95,11 +95,8 @@ show_samples(english_dataset)
 # Je l'ai acheté pour manipuler diverses pièces d'avion et des "trucs" de hangar que je devais organiser ; il a vraiment fait l'affaire. L'unité est arrivée rapidement, était bien emballée et est arrivée intacte (toujours un bon signe). Il y a cinq supports muraux - trois sur le dessus et deux sur le dessous. Je voulais le monter sur le mur, alors tout ce que j'ai eu à faire était d'enlever les deux couches supérieures de tiroirs en plastique, ainsi que les tiroirs d'angle inférieurs, de le placer où je voulais et de le marquer ; j'ai ensuite utilisé quelques-uns des nouveaux ancrages muraux à vis en plastique (la variété de 50 livres) et il s'est facilement monté sur le mur. Certains ont fait remarquer qu'ils voulaient des séparateurs pour les tiroirs, et qu'ils les ont fabriqués. Bonne idée. Pour ma part, j'avais besoin de quelque chose dont je pouvais voir le contenu à hauteur des yeux, et je voulais donc des tiroirs plus grands. J'aime aussi le fait qu'il s'agisse du nouveau plastique qui ne se fragilise pas et ne se fend pas comme mes anciens tiroirs en plastique. J'aime la construction entièrement en plastique. Elle est suffisamment résistante pour contenir des pièces métalliques, mais étant en plastique, elle n'est pas aussi lourde qu'un cadre métallique, ce qui permet de la fixer facilement au mur et de la charger d'objets lourds ou légers. Aucun problème. Pour le prix, c'est imbattable. C'est le meilleur que j'ai acheté à ce jour, et j'utilise des versions de ce type depuis plus de quarante ans.
 ```
 
-<Tip>
-
-✏️ **Essayez !** Changez la graine aléatoire dans la commande `Dataset.shuffle()` pour explorer d'autres critiques dans le corpus. Si vous parlez espagnol, jetez un coup d'œil  à certaines des critiques dans `spanish_dataset` pour voir si les titres semblent aussi être des résumés raisonnables.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Changez la graine aléatoire dans la commande `Dataset.shuffle()` pour explorer d'autres critiques dans le corpus. Si vous parlez espagnol, jetez un coup d'œil à certaines des critiques dans `spanish_dataset` pour voir si les titres semblent aussi être des résumés raisonnables.
 
 Cet échantillon montre la diversité des critiques que l'on trouve généralement en ligne, allant du positif au négatif (et tout ce qui se trouve entre les deux !). Bien que l'exemple avec le titre « meh » ne soit pas très informatif, les autres titres semblent être des résumés décents des critiques. Entraîner un modèle de résumé sur l'ensemble des 400 000 avis prendrait beaucoup trop de temps sur un seul GPU, nous allons donc nous concentrer sur la génération de résumés pour un seul domaine de produits. Pour avoir une idée des domaines parmi lesquels nous pouvons choisir, convertissons `english_dataset` en `pandas.DataFrame` et calculons le nombre d'avis par catégorie de produits :
 
@@ -249,11 +246,8 @@ Nous allons nous concentrer sur mT5, une architecture intéressante basée sur T
 mT5 n'utilise pas de préfixes mais partage une grande partie de la polyvalence de T5 et a l'avantage d'être multilingue. Maintenant que nous avons choisi un modèle, voyons comment préparer nos données pour l'entraînement.
 
 
-<Tip>
-
-✏️ **Essayez !** Une fois que vous aurez terminé cette section, comparez le mT5 à mBART en *finetunant* ce dernier avec les mêmes techniques. Pour des points bonus, vous pouvez aussi essayer de *finetuner* le T5 uniquement sur les critiques anglaises. Puisque le T5 a un préfixe spécial, vous devrez ajouter `summarize:` aux entrées dans les étapes de prétraitement ci-dessous.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Une fois que vous aurez terminé cette section, comparez le mT5 à mBART en *finetunant* ce dernier avec les mêmes techniques. Pour des points bonus, vous pouvez aussi essayer de *finetuner* le T5 uniquement sur les critiques anglaises. Puisque le T5 a un préfixe spécial, vous devrez ajouter `summarize:` aux entrées dans les étapes de prétraitement ci-dessous.
 
 ## Prétraitement des données
 
@@ -268,11 +262,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡 Aux premiers stades de vos projets de NLP, une bonne pratique consiste à entraîner une classe de « petits » modèles sur un petit échantillon de données. Cela vous permet de déboguer et d'itérer plus rapidement vers un flux de travail de bout en bout. Une fois que vous avez confiance dans les résultats, vous pouvez toujours faire évoluer le modèle en changeant simplement le *checkpoint* du modèle !
-
-</Tip>
+> [!TIP]
+> 💡 Aux premiers stades de vos projets de NLP, une bonne pratique consiste à entraîner une classe de « petits » modèles sur un petit échantillon de données. Cela vous permet de déboguer et d'itérer plus rapidement vers un flux de travail de bout en bout. Une fois que vous avez confiance dans les résultats, vous pouvez toujours faire évoluer le modèle en changeant simplement le *checkpoint* du modèle !
 
 Testons le *tokenizer* de mT5 sur un petit exemple :
 
@@ -329,11 +320,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 Maintenant que le corpus a été prétraité, examinons certaines métriques couramment utilisées pour le résumé. Comme nous allons le voir, il n'existe pas de solution miracle pour mesurer la qualité d'un texte généré par une machine.
 
-<Tip>
-
-💡 Vous avez peut-être remarqué que nous avons utilisé `batched=True` dans notre fonction `Dataset.map()` ci-dessus. Cela permet de coder les exemples par lots de 1 000 (par défaut) et d'utiliser les capacités de *multithreading* des *tokenizers* rapides de 🤗 *Transformers*. Lorsque cela est possible, essayez d'utiliser `batched=True` pour tirer le meilleur parti de votre prétraitement !
-
-</Tip>
+> [!TIP]
+> 💡 Vous avez peut-être remarqué que nous avons utilisé `batched=True` dans notre fonction `Dataset.map()` ci-dessus. Cela permet de coder les exemples par lots de 1 000 (par défaut) et d'utiliser les capacités de *multithreading* des *tokenizers* rapides de 🤗 *Transformers*. Lorsque cela est possible, essayez d'utiliser `batched=True` pour tirer le meilleur parti de votre prétraitement !
 
 
 ## Métriques pour le résumé de texte
@@ -353,11 +341,8 @@ reference_summary = "I loved reading the Hunger Games"
 
 Une façon de les comparer pourrait être de compter le nombre de mots qui se chevauchent, qui dans ce cas serait de 6. Cependant, cette méthode est un peu grossière, c'est pourquoi ROUGE se base sur le calcul des scores de _précision_ et de _rappel_ pour le chevauchement.
 
-<Tip>
-
-🙋 Ne vous inquiétez pas si c'est la première fois que vous entendez parler de précision et de rappel. Nous allons parcourir ensemble quelques exemples explicites pour que tout soit clair. Ces métriques sont généralement rencontrées dans les tâches de classification, donc si vous voulez comprendre comment la précision et le rappel sont définis dans ce contexte, nous vous recommandons de consulter les [guides de `scikit-learn`](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
-
-</Tip>
+> [!TIP]
+> 🙋 Ne vous inquiétez pas si c'est la première fois que vous entendez parler de précision et de rappel. Nous allons parcourir ensemble quelques exemples explicites pour que tout soit clair. Ces métriques sont généralement rencontrées dans les tâches de classification, donc si vous voulez comprendre comment la précision et le rappel sont définis dans ce contexte, nous vous recommandons de consulter les [guides de `scikit-learn`](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
 
 Pour ROUGE, le rappel mesure la proportion du résumé de référence qui est capturée par le résumé généré. Si nous ne faisons que comparer des mots, le rappel peut être calculé selon la formule suivante :
 
@@ -409,11 +394,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 
 Super, les chiffres de précision et de rappel correspondent ! Maintenant, qu'en est-il des autres scores ROUGE ? `rouge2` mesure le chevauchement entre les bigrammes (chevauchement des paires de mots), tandis que `rougeL` et `rougeLsum` mesurent les plus longues séquences de mots correspondants en recherchant les plus longues sous-souches communes dans les résumés générés et de référence. Le « sum » dans `rougeLsum` fait référence au fait que cette métrique est calculée sur un résumé entier, alors que `rougeL` est calculée comme une moyenne sur des phrases individuelles.
 
-<Tip>
-
-✏️ **Essayez !** Créez votre propre exemple de résumé généré et de référence et voyez si les scores ROUGE obtenus correspondent à un calcul manuel basé sur les formules de précision et de rappel. Pour des points bonus, divisez le texte en bigrammes et comparez la précision et le rappel pour la métrique `rouge2`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Créez votre propre exemple de résumé généré et de référence et voyez si les scores ROUGE obtenus correspondent à un calcul manuel basé sur les formules de précision et de rappel. Pour des points bonus, divisez le texte en bigrammes et comparez la précision et le rappel pour la métrique `rouge2`.
 
 Nous utiliserons ces scores ROUGE pour suivre les performances de notre modèle, mais avant cela, faisons ce que tout bon praticien de NLP devrait faire : créer une *baseline* solide, mais simple !
 
@@ -506,11 +488,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 Si vous vous demandez pourquoi vous ne voyez aucun avertissement concernant le *finetuning* du modèle sur une tâche en aval, c'est parce que pour les tâches de séquence à séquence, nous conservons tous les poids du réseau. Comparez cela à notre modèle de classification de texte du [chapitre 3](/course/fr/chapter3) où la tête du modèle pré-entraîné a été remplacée par un réseau initialisé de manière aléatoire.
-
-</Tip>
+> [!TIP]
+> 💡 Si vous vous demandez pourquoi vous ne voyez aucun avertissement concernant le *finetuning* du modèle sur une tâche en aval, c'est parce que pour les tâches de séquence à séquence, nous conservons tous les poids du réseau. Comparez cela à notre modèle de classification de texte du [chapitre 3](/course/fr/chapter3) où la tête du modèle pré-entraîné a été remplacée par un réseau initialisé de manière aléatoire.
 
 La prochaine chose que nous devons faire est de nous connecter au *Hub*. Si vous exécutez ce code dans un *notebook*, vous pouvez le faire avec la fonction utilitaire suivante :
 
@@ -871,11 +850,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Si vous vous entraînez sur un TPU, vous devrez déplacer tout le code ci-dessus dans une fonction d'entraînement dédiée. Voir le [chapitre 3](/course/fr/chapter3) pour plus de détails.
-
-</Tip>
+> [!TIP]
+> 🚨 Si vous vous entraînez sur un TPU, vous devrez déplacer tout le code ci-dessus dans une fonction d'entraînement dédiée. Voir le [chapitre 3](/course/fr/chapter3) pour plus de détails.
 
 Maintenant que nous avons préparé nos objets, il reste trois choses à faire :
 
diff --git a/chapters/fr/chapter7/6.mdx b/chapters/fr/chapter7/6.mdx
index 91c90a96c..7825484e9 100644
--- a/chapters/fr/chapter7/6.mdx
+++ b/chapters/fr/chapter7/6.mdx
@@ -134,11 +134,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-Le pré-entraînement du modèle de langue prendra un certain temps. Nous vous suggérons donc d'exécuter d'abord la boucle d'entraînement sur un petit échantillon des données en décommentant les deux lignes dans le code ci-dessus. Assurez-vous alors que l'entraînement se termine avec succès et que les modèles sont stockés. Rien n'est plus frustrant qu'un entraînement qui échoue à la dernière étape car vous avez oublié de créer un dossier ou parce qu'il y a une faute de frappe à la fin de la boucle d'entraînement !
-
-</Tip>
+> [!TIP]
+> Le pré-entraînement du modèle de langue prendra un certain temps. Nous vous suggérons donc d'exécuter d'abord la boucle d'entraînement sur un petit échantillon des données en décommentant les deux lignes dans le code ci-dessus. Assurez-vous alors que l'entraînement se termine avec succès et que les modèles sont stockés. Rien n'est plus frustrant qu'un entraînement qui échoue à la dernière étape car vous avez oublié de créer un dossier ou parce qu'il y a une faute de frappe à la fin de la boucle d'entraînement !
 
 Examinons un exemple tiré du jeu de données. Nous ne montrerons que les 200 premiers caractères de chaque champ :
 
@@ -251,11 +248,8 @@ Nous avons maintenant 16,7 millions d'exemples avec 128 *tokens* chacun, ce qui
 
 Maintenant que le jeu de données est prêt, configurons le modèle !
 
-<Tip>
-
-✏️ **Essayez !** Se débarrasser de tous les morceaux qui sont plus petits que la taille du contexte n'était pas un gros problème ici parce que nous utilisons de petites fenêtres de contexte. Si vous augmentez la taille du contexte (ou si vous avez un corpus de documents courts), la fraction des morceaux qui sont jetés augmentera. Une façon plus efficace de préparer les données est de joindre tous les échantillons dans un batch avec un *token* `eos_token_id` entre les deux, puis d'effectuer le découpage sur les séquences concaténées. Comme exercice, modifiez la fonction `tokenize()` pour utiliser cette approche. Notez que vous devrez mettre `truncation=False` et enlever les autres arguments du *tokenizer* pour obtenir la séquence complète des identifiants des *tokens*.
-
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Se débarrasser de tous les morceaux qui sont plus petits que la taille du contexte n'était pas un gros problème ici parce que nous utilisons de petites fenêtres de contexte. Si vous augmentez la taille du contexte (ou si vous avez un corpus de documents courts), la fraction des morceaux qui sont jetés augmentera. Une façon plus efficace de préparer les données est de joindre tous les échantillons dans un batch avec un *token* `eos_token_id` entre les deux, puis d'effectuer le découpage sur les séquences concaténées. Comme exercice, modifiez la fonction `tokenize()` pour utiliser cette approche. Notez que vous devrez mettre `truncation=False` et enlever les autres arguments du *tokenizer* pour obtenir la séquence complète des identifiants des *tokens*.
 
 
 ## Initialisation d'un nouveau modèle
@@ -397,11 +391,8 @@ tf_eval_dataset = model.prepare_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ Le déplacement des entrées et des étiquettes pour les aligner se fait à l'intérieur du modèle, de sorte que l'assembleur de données ne fait que copier les entrées pour créer les étiquettes.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Le déplacement des entrées et des étiquettes pour les aligner se fait à l'intérieur du modèle, de sorte que l'assembleur de données ne fait que copier les entrées pour créer les étiquettes.
 
 
 Nous avons maintenant tout ce qu'il faut pour entraîner notre modèle. Ce n'était pas si compliqué ! Avant de commencer l'entraînement, nous devons nous connecter à Hugging Face. Si vous travaillez dans un *notebook*, vous pouvez le faire avec la fonction utilitaire suivante :
@@ -500,25 +491,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
+> [!TIP]
+> ✏️ **Essayez !** Il ne nous a fallu qu'une trentaine de lignes de code en plus des `TrainingArguments` pour passer des textes bruts à l'entraînement du GPT-2. Essayez-le avec votre propre jeu de données et voyez si vous pouvez obtenir de bons résultats !
 
-✏️ **Essayez !** Il ne nous a fallu qu'une trentaine de lignes de code en plus des `TrainingArguments` pour passer des textes bruts à l'entraînement du GPT-2. Essayez-le avec votre propre jeu de données et voyez si vous pouvez obtenir de bons résultats ! 
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 Si vous avez accès à une machine avec plusieurs GPUs, essayez d'y exécuter le code. `Trainer` gère automatiquement plusieurs machines ce qui peut accélérer considérablement l'entraînement.
-
-{:else}
-
-💡 Si vous avez accès à une machine avec plusieurs GPUs, vous pouvez essayer d'utiliser `MirroredStrategy` pour accélérer considérablement l'entraînement. Vous devrez créer un objet `tf.distribute.MirroredStrategy` et vous assurer que toutes les méthodes `to_tf_dataset()` ou `prepare_tf_dataset()` ainsi que la création du modèle et l'appel à `fit()` sont tous exécutés dans `scope()`. Vous pouvez consulter la documentation à ce sujet [ici](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 Si vous avez accès à une machine avec plusieurs GPUs, essayez d'y exécuter le code. `Trainer` gère automatiquement plusieurs machines ce qui peut accélérer considérablement l'entraînement.
+>
+> {:else}
+>
+> 💡 Si vous avez accès à une machine avec plusieurs GPUs, vous pouvez essayer d'utiliser `MirroredStrategy` pour accélérer considérablement l'entraînement. Vous devrez créer un objet `tf.distribute.MirroredStrategy` et vous assurer que toutes les méthodes `to_tf_dataset()` ou `prepare_tf_dataset()` ainsi que la création du modèle et l'appel à `fit()` sont tous exécutés dans `scope()`. Vous pouvez consulter la documentation à ce sujet [ici](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
+>
+> {/if}
 
 ## Génération de code avec le pipeline
 
@@ -790,11 +775,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Si vous vous entraînez sur un TPU, vous devrez déplacer tout le code commençant à la cellule ci-dessus dans une fonction d'entraînement dédiée. Voir le [chapitre 3](/course/fr/chapter3) pour plus de détails.
-
-</Tip>
+> [!TIP]
+> 🚨 Si vous vous entraînez sur un TPU, vous devrez déplacer tout le code commençant à la cellule ci-dessus dans une fonction d'entraînement dédiée. Voir le [chapitre 3](/course/fr/chapter3) pour plus de détails.
 
 Maintenant que nous avons envoyé notre `train_dataloader` à `accelerator.prepare()`, nous pouvons utiliser sa longueur pour calculer le nombre d'étapes d'entraînement. Rappelez-vous que nous devons toujours faire cela après avoir préparé le *dataloader* car cette méthode modifiera sa longueur. Nous utilisons un programme linéaire classique du taux d'apprentissage à 0 :
 
@@ -893,16 +875,10 @@ for epoch in range(num_train_epochs):
 
 Et voilà, vous disposez maintenant de votre propre boucle d'entraînement personnalisée pour les modèles de langage causal tels que le GPT-2. Vous pouvez encore l'adapter à vos besoins. 
 
-<Tip>
-
-✏️ **Essayez !** Vous pouvez créer votre propre fonction de perte personnalisée, adaptée à votre cas d'utilisation, ou ajouter une autre étape personnalisée dans la boucle d'entraînement.
-
-</Tip>
-
-<Tip>
-
-✏️ **Essayez !** Lorsque vous effectuez de longues expériences d'entraînement, il est bon d'enregistrer les mesures importantes à l'aide d'outils tels que *TensorBoard* ou *Weights & Biases*. Ajoutez l'un d'eux à la boucle d'entraînement afin de pouvoir toujours vérifier comment se déroule l'entraînement.
+> [!TIP]
+> ✏️ **Essayez !** Vous pouvez créer votre propre fonction de perte personnalisée, adaptée à votre cas d'utilisation, ou ajouter une autre étape personnalisée dans la boucle d'entraînement.
 
-</Tip>
+> [!TIP]
+> ✏️ **Essayez !** Lorsque vous effectuez de longues expériences d'entraînement, il est bon d'enregistrer les mesures importantes à l'aide d'outils tels que *TensorBoard* ou *Weights & Biases*. Ajoutez l'un d'eux à la boucle d'entraînement afin de pouvoir toujours vérifier comment se déroule l'entraînement.
 
 {/if}
diff --git a/chapters/fr/chapter7/7.mdx b/chapters/fr/chapter7/7.mdx
index 885a0b126..f3c06be25 100644
--- a/chapters/fr/chapter7/7.mdx
+++ b/chapters/fr/chapter7/7.mdx
@@ -33,11 +33,8 @@ Nous allons *finetuner* un modèle BERT sur le [jeu de données SQuAD](https://r
 
 Il s'agit d'une présentation du modèle qui a été entraîné à l'aide du code présenté dans cette section et qui a ensuité été téléchargé sur le *Hub*. Vous pouvez le trouver [ici](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F)
 
-<Tip>
-
-💡 Les modèles basé que sur l'encodeur comme BERT ont tendance à être excellents pour extraire les réponses à des questions factuelles comme « Qui a inventé l'architecture Transformer ? » mais ne sont pas très performants lorsqu'on leur pose des questions ouvertes comme « Pourquoi le ciel est-il bleu ? ». Dans ces cas plus difficiles, les modèles encodeurs-décodeurs comme le T5 et BART sont généralement utilisés pour synthétiser les informations d'une manière assez similaire au [résumé de texte](/course/fr/chapter7/5). Si vous êtes intéressé par ce type de réponse aux questions *génératives*, nous vous recommandons de consulter notre [démo](https://yjernite.github.io/lfqa.html) basée sur le [jeu de données ELI5](https://huggingface.co/datasets/eli5).
-
-</Tip>
+> [!TIP]
+> 💡 Les modèles basés uniquement sur l'encodeur comme BERT ont tendance à être excellents pour extraire les réponses à des questions factuelles comme « Qui a inventé l'architecture Transformer ? » mais ne sont pas très performants lorsqu'on leur pose des questions ouvertes comme « Pourquoi le ciel est-il bleu ? ». Dans ces cas plus difficiles, les modèles encodeurs-décodeurs comme le T5 et BART sont généralement utilisés pour synthétiser les informations d'une manière assez similaire au [résumé de texte](/course/fr/chapter7/5). Si vous êtes intéressé par ce type de réponse aux questions *génératives*, nous vous recommandons de consulter notre [démo](https://yjernite.github.io/lfqa.html) basée sur le [jeu de données ELI5](https://huggingface.co/datasets/eli5).
 
 ## Préparation des données
 
@@ -380,11 +377,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 En effet, nous ne voyons pas la réponse dans le contexte.
 
-<Tip>
-
-✏️ **A votre tour !** En utilisant l'architecture XLNet, le *padding* est appliqué à gauche et la question et le contexte sont intervertis. Adaptez tout le code que nous venons de voir à l'architecture XLNet (et ajoutez `padding=True`). Soyez conscient que le token `[CLS]` peut ne pas être à la position 0 avec le *padding* appliqué.
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** En utilisant l'architecture XLNet, le *padding* est appliqué à gauche et la question et le contexte sont intervertis. Adaptez tout le code que nous venons de voir à l'architecture XLNet (et ajoutez `padding=True`). Soyez conscient que le token `[CLS]` peut ne pas être à la position 0 avec le *padding* appliqué.
 
 Maintenant que nous avons vu étape par étape comment prétraiter nos données d'entraînement, nous pouvons les regrouper dans une fonction que nous appliquerons à l'ensemble des données d'entraînement. Nous allons rembourrer chaque caractéristique à la longueur maximale que nous avons définie, car la plupart des contextes seront longs (et les échantillons correspondants seront divisés en plusieurs caractéristiques). Il n'y a donc pas de réel avantage à appliquer un rembourrage dynamique ici :
 
@@ -929,11 +923,8 @@ Par défaut, le dépôt utilisé sera dans votre espace et nommé après le rép
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 Si le répertoire de sortie que vous utilisez existe, il doit être un clone local du dépôt vers lequel vous voulez pousser (donc définissez un nouveau nom si vous obtenez une erreur lors de la définition de votre `Trainer`).
-
-</Tip>
+> [!TIP]
+> 💡 Si le répertoire de sortie que vous utilisez existe, il doit être un clone local du dépôt vers lequel vous voulez pousser (donc définissez un nouveau nom si vous obtenez une erreur lors de la définition de votre `Trainer`).
 
 Enfin, nous passons tout à la classe `Trainer` et lançons l'entraînement :
 
@@ -1017,11 +1008,8 @@ Le `Trainer` rédige également une carte de modèle avec tous les résultats de
 
 À ce stade, vous pouvez utiliser le *widget* d'inférence sur le *Hub* du modèle pour tester le modèle et le partager avec vos amis, votre famille et vos animaux préférés. Vous avez réussi à *finetuner* un modèle sur une tâche de réponse à une question. Félicitations !
 
-<Tip>
-
-✏️ **A votre tour** Essayez un autre modèle pour voir s'il est plus performant pour cette tâche !
-
-</Tip>
+> [!TIP]
+> ✏️ **A votre tour !** Essayez un autre modèle pour voir s'il est plus performant pour cette tâche !
 
 {#if fw === 'pt'}
 
diff --git a/chapters/fr/chapter8/2.mdx b/chapters/fr/chapter8/2.mdx
index df721a540..6abbb73e6 100644
--- a/chapters/fr/chapter8/2.mdx
+++ b/chapters/fr/chapter8/2.mdx
@@ -87,11 +87,8 @@ Oh non, quelque chose semble s'être mal passée ! Si vous êtes novice en progr
 
 Il y a beaucoup d'informations dans ces rapports, nous allons donc en parcourir ensemble les éléments clés. La première chose à noter est que les *tracebacks* doivent être lus _de bas en haut_. Cela peut sembler bizarre si vous avez l'habitude de lire du texte français de haut en bas mais cela reflète le fait que le *traceback* montre la séquence d'appels de fonction que le `pipeline` fait lors du téléchargement du modèle et du *tokenizer*. Consultez le [chapitre 2](/course/fr/chapter2) pour plus de détails sur la façon dont le `pipeline` fonctionne sous le capot.
 
-<Tip>
-
-🚨 Vous voyez le cadre bleu autour de « <i>6 frames</i> » dans le <i>traceback</i> de Google Colab ? Il s'agit d'une fonctionnalité spéciale de Colab qui compresse le <i>traceback</i> en <i>frames</i>. Si vous ne parvenez pas à trouver la source d'une erreur, déroulez le <i>traceback</i> en cliquant sur ces deux petites flèches.
-
-</Tip>
+> [!TIP]
+> 🚨 Vous voyez le cadre bleu autour de « <i>6 frames</i> » dans le <i>traceback</i> de Google Colab ? Il s'agit d'une fonctionnalité spéciale de Colab qui compresse le <i>traceback</i> en <i>frames</i>. Si vous ne parvenez pas à trouver la source d'une erreur, déroulez le <i>traceback</i> en cliquant sur ces deux petites flèches.
 
 Cela signifie que la dernière ligne du <i>traceback</i> indique le dernier message d'erreur et donne le nom de l'exception qui a été levée. Dans ce cas, le type d'exception est `OSError`, ce qui indique une erreur liée au système. Si nous lisons le message d'erreur qui l'accompagne, nous pouvons voir qu'il semble y avoir un problème avec le fichier *config.json* du modèle et deux suggestions nous sont données pour le résoudre :
 
@@ -105,11 +102,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Si vous rencontrez un message d'erreur difficile à comprendre, copiez et collez le message dans Google ou sur [Stack Overflow](https://stackoverflow.com/) (oui, vraiment !). Il y a de fortes chances que vous ne soyez pas la première personne à rencontrer cette erreur et c'est un bon moyen de trouver des solutions que d'autres membres de la communauté ont publiées. Par exemple, en recherchant `OSError : Can't load config for` sur Stack Overflow donne plusieurs [réponses](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) qui peuvent être utilisées comme point de départ pour résoudre le problème.
-
-</Tip>
+> [!TIP]
+> 💡 Si vous rencontrez un message d'erreur difficile à comprendre, copiez et collez le message dans Google ou sur [Stack Overflow](https://stackoverflow.com/) (oui, vraiment !). Il y a de fortes chances que vous ne soyez pas la première personne à rencontrer cette erreur et c'est un bon moyen de trouver des solutions que d'autres membres de la communauté ont publiées. Par exemple, rechercher `OSError: Can't load config for` sur Stack Overflow donne plusieurs [réponses](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) qui peuvent être utilisées comme point de départ pour résoudre le problème.
 
 La première suggestion nous demande de vérifier si l'identifiant du modèle est effectivement correct, la première chose à faire est donc de copier l'identifiant et de le coller dans la barre de recherche du *Hub* :
 
@@ -161,11 +155,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 L'approche que nous adoptons ici n'est pas infaillible puisque notre collègue peut avoir modifié la configuration de `distilbert-base-uncased` avant de <i>finetuner</i> le modèle. Dans la vie réelle, nous voudrions vérifier avec lui d'abord, mais pour les besoins de cette section nous supposerons qu'il a utilisé la configuration par défaut.
-
-</Tip>
+> [!WARNING]
+> 🚨 L'approche que nous adoptons ici n'est pas infaillible puisque notre collègue peut avoir modifié la configuration de `distilbert-base-uncased` avant de <i>finetuner</i> le modèle. Dans la vie réelle, nous voudrions vérifier avec lui d'abord, mais pour les besoins de cette section nous supposerons qu'il a utilisé la configuration par défaut.
 
 Nous pouvons ensuite le pousser vers notre dépôt de modèles avec la fonction `push_to_hub()` de la configuration :
 
diff --git a/chapters/fr/chapter8/4.mdx b/chapters/fr/chapter8/4.mdx
index fb770c1e2..95b46f9e4 100644
--- a/chapters/fr/chapter8/4.mdx
+++ b/chapters/fr/chapter8/4.mdx
@@ -247,11 +247,8 @@ Donc `1` signifie `neutral`, ce qui signifie que les deux phrases que nous avons
 
 Nous n'avons pas de *token* de type identifiant ici puisque DistilBERT ne les attend pas. Si vous en avez dans votre modèle, vous devriez également vous assurer qu'ils correspondent correctement à l'endroit où se trouvent la première et la deuxième phrase dans l'entrée.
 
-<Tip>
-
-✏️ *A votre tour !* Vérifiez que tout semble correct avec le deuxième élément du jeu de données d'entraînement.
-
-</Tip>
+> [!TIP]
+> ✏️ *A votre tour !* Vérifiez que tout semble correct avec le deuxième élément du jeu de données d'entraînement.
 
 Ici nous ne vérifions que le jeu d'entraînement. Vous devez bien sûr vérifier de la même façon les jeux de validation et de test.
 
@@ -525,11 +522,8 @@ Chaque fois que vous obtenez un message d'erreur qui commence par `RuntimeError
 
 Pour résoudre ce problème, il suffit d'utiliser moins d'espace GPU, ce qui est souvent plus facile à dire qu'à faire. Tout d'abord, assurez-vous que vous n'avez pas deux modèles sur le GPU en même temps (sauf si cela est nécessaire pour votre problème, bien sûr). Ensuite, vous devriez probablement réduire la taille de votre batch car elle affecte directement les tailles de toutes les sorties intermédiaires du modèle et leurs gradients. Si le problème persiste, envisagez d'utiliser une version plus petite de votre modèle.
 
-<Tip>
-
-Dans la prochaine partie du cours, nous examinerons des techniques plus avancées qui peuvent vous aider à réduire votre empreinte mémoire et vous permettre de <i>finetuner</i> les plus grands modèles.
-
-</Tip>
+> [!TIP]
+> Dans la prochaine partie du cours, nous examinerons des techniques plus avancées qui peuvent vous aider à réduire votre empreinte mémoire et vous permettre de <i>finetuner</i> les plus grands modèles.
 
 ### Évaluation du modèle
 
@@ -556,11 +550,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 Vous devriez toujours vous assurer que vous pouvez exécuter `trainer.evaluate()` avant de lancer `trainer.train()`, pour éviter de gaspiller beaucoup de ressources de calcul avant de tomber sur une erreur.
-
-</Tip>
+> [!TIP]
+> 💡 Vous devriez toujours vous assurer que vous pouvez exécuter `trainer.evaluate()` avant de lancer `trainer.train()`, pour éviter de gaspiller beaucoup de ressources de calcul avant de tomber sur une erreur.
 
 Avant de tenter de déboguer un problème dans la boucle d'évaluation, vous devez d'abord vous assurer que vous avez examiné les données, que vous êtes en mesure de former un batch correctement et que vous pouvez exécuter votre modèle sur ces données. Nous avons effectué toutes ces étapes, et le code suivant peut donc être exécuté sans erreur :
 
@@ -690,11 +681,8 @@ trainer.train()
 
 Dans ce cas, il n'y a plus de problème, et notre script va *finetuner* un modèle qui devrait donner des résultats raisonnables. Mais que faire lorsque l'entraînement se déroule sans erreur et que le modèle entraîné n'est pas du tout performant ? C'est la partie la plus difficile de l'apprentissage automatique et nous allons vous montrer quelques techniques qui peuvent vous aider.
 
-<Tip>
-
-💡 Si vous utilisez une boucle d'entraînement manuelle, les mêmes étapes s'appliquent pour déboguer votre pipeline d'entraînement, mais il est plus facile de les séparer. Assurez-vous cependant de ne pas avoir oublié le `model.eval()` ou le `model.train()` aux bons endroits, ou le `zero_grad()` à chaque étape !
-
-</Tip>
+> [!TIP]
+> 💡 Si vous utilisez une boucle d'entraînement manuelle, les mêmes étapes s'appliquent pour déboguer votre pipeline d'entraînement, mais il est plus facile de les séparer. Assurez-vous cependant de ne pas avoir oublié le `model.eval()` ou le `model.train()` aux bons endroits, ou le `zero_grad()` à chaque étape !
 
 ## Déboguer les erreurs silencieuses pendant l'entraînement
 
@@ -709,11 +697,8 @@ Votre modèle n'apprendra quelque chose que s'il est réellement possible d'appr
 - y a-t-il une étiquette qui est plus courante que les autres ?
 - quelle devrait être la perte/métrique si le modèle prédisait une réponse aléatoire/toujours la même réponse ?
 
-<Tip warning={true}>
-
-⚠️ Si vous effectuez un entraînement distribué, imprimez des échantillons de votre ensemble de données dans chaque processus et vérifiez par trois fois que vous obtenez la même chose. Un bug courant consiste à avoir une source d'aléa dans la création des données qui fait que chaque processus a une version différente du jeu de données.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Si vous effectuez un entraînement distribué, imprimez des échantillons de votre ensemble de données dans chaque processus et vérifiez par trois fois que vous obtenez la même chose. Un bug courant consiste à avoir une source d'aléa dans la création des données qui fait que chaque processus a une version différente du jeu de données.
 
 Après avoir examiné vos données, examinez quelques-unes des prédictions du modèle. Si votre modèle produit des *tokens*, essayez aussi de les décoder ! Si le modèle prédit toujours la même chose, cela peut être dû au fait que votre jeu de données est biaisé en faveur d'une catégorie (pour les problèmes de classification). Des techniques telles que le suréchantillonnage des classes rares peuvent aider. D'autre part, cela peut également être dû à des problèmes d'entraînement tels que de mauvais réglages des hyperparamètres.
 
@@ -742,11 +727,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 Si vos données d'entraînement ne sont pas équilibrées, veillez à créer un batch de données d'entraînement contenant toutes les étiquettes.
-
-</Tip>
+> [!TIP]
+> 💡 Si vos données d'entraînement ne sont pas équilibrées, veillez à créer un batch de données d'entraînement contenant toutes les étiquettes.
 
 Le modèle résultant devrait avoir des résultats proches de la perfection sur le même `batch`. Calculons la métrique sur les prédictions résultantes :
 
@@ -767,11 +749,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 Si vous ne parvenez pas à ce que votre modèle obtienne des résultats parfaits comme celui-ci, cela signifie qu'il y a quelque chose qui ne va pas dans la façon dont vous avez formulé le problème ou dans vos données. Vous devez donc y remédier. Ce n'est que lorsque vous parviendrez à passer le test de surentraînement que vous pourrez être sûr que votre modèle peut réellement apprendre quelque chose.
 
-<Tip warning={true}>
-
-⚠️ Vous devrez recréer votre modèle et votre `Trainer` après ce test, car le modèle obtenu ne sera probablement pas capable de récupérer et d'apprendre quelque chose d'utile sur votre jeu de données complet.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Vous devrez recréer votre modèle et votre `Trainer` après ce test, car le modèle obtenu ne sera probablement pas capable de récupérer et d'apprendre quelque chose d'utile sur votre jeu de données complet.
 
 ### Ne réglez rien tant que vous n'avez pas une première ligne de base
 
diff --git a/chapters/fr/chapter8/4_tf.mdx b/chapters/fr/chapter8/4_tf.mdx
index e35e9aadd..4c776415e 100644
--- a/chapters/fr/chapter8/4_tf.mdx
+++ b/chapters/fr/chapter8/4_tf.mdx
@@ -115,15 +115,12 @@ model.compile(optimizer="adam")
 
 Maintenant, nous allons utiliser la perte interne du modèle et ce problème devrait être résolu !
 
-<Tip>
-
-✏️ *A votre tour !* Comme défi optionnel après avoir résolu les autres problèmes, vous pouvez essayer de revenir à cette étape et faire fonctionner le modèle avec la perte originale calculée par Keras au lieu de la perte interne. Vous devrez ajouter `"labels"` à l'argument `label_cols` de `to_tf_dataset()` pour vous assurer que les labels sont correctement sortis, ce qui vous donnera des gradients. Mais il y a un autre problème avec la perte que nous avons spécifiée. L'entraînement fonctionnera toujours avec ce problème mais l'apprentissage sera très lent et se stabilisera à une perte d'entraînement élevée. Pouvez-vous trouver ce que c'est ?
-
-Un indice codé en ROT13, si vous êtes coincé : Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf ?
-
-Et un deuxième indice : Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf ny gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf ?
-
-</Tip>
+> [!TIP]
+> ✏️ *A votre tour !* Comme défi optionnel après avoir résolu les autres problèmes, vous pouvez essayer de revenir à cette étape et faire fonctionner le modèle avec la perte originale calculée par Keras au lieu de la perte interne. Vous devrez ajouter `"labels"` à l'argument `label_cols` de `to_tf_dataset()` pour vous assurer que les labels sont correctement sortis, ce qui vous donnera des gradients. Mais il y a un autre problème avec la perte que nous avons spécifiée. L'entraînement fonctionnera toujours avec ce problème mais l'apprentissage sera très lent et se stabilisera à une perte d'entraînement élevée. Pouvez-vous trouver ce que c'est ?
+>
+> Un indice codé en ROT13, si vous êtes coincé : Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf ?
+>
+> Et un deuxième indice : Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf ?
 
 Maintenant, essayons d'entraîner. Nous devrions obtenir des gradients maintenant, donc avec un peu de chance nous pouvons juste appeler `model.fit()` et tout fonctionnera bien !
 
@@ -367,11 +364,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡 Vous pouvez également importer la fonction `create_optimizer()` de 🤗 <i>Transformers</i> qui vous donnera un optimiseur AdamW avec une décroissance du taux des poids correcte ainsi qu'un réchauffement et une décroissance du taux d'apprentissage. Cet optimiseur produira souvent des résultats légèrement meilleurs que ceux que vous obtenez avec l'optimiseur Adam par défaut.
-
-</Tip>
+> [!TIP]
+> 💡 Vous pouvez également importer la fonction `create_optimizer()` de 🤗 <i>Transformers</i> qui vous donnera un optimiseur AdamW avec une décroissance des poids correcte ainsi qu'un réchauffement et une décroissance du taux d'apprentissage. Cet optimiseur produira souvent des résultats légèrement meilleurs que ceux que vous obtenez avec l'optimiseur Adam par défaut.
 
 Maintenant, nous pouvons essayer de *finetuner* le modèle avec le nouveau taux d'apprentissage :
 
@@ -393,11 +387,8 @@ Nous avons couvert les problèmes dans le script ci-dessus, mais il existe plusi
 
 Le signe révélateur d'un manque de mémoire est une erreur du type "OOM when allocating tensor" (OOM étant l'abréviation de *out of memory*). Il s'agit d'un risque très courant lorsque l'on utilise de grands modèles de langage. Si vous rencontrez ce problème, une bonne stratégie consiste à diviser par deux la taille de votre batch et à réessayer. Gardez à l'esprit, cependant, que certains modèles sont *très* grands. Par exemple, le modèle GPT-2 complet possède 1,5 Go de paramètres, ce qui signifie que vous aurez besoin de 6 Go de mémoire rien que pour stocker le modèle, et 6 autres Go pour ses gradients ! Entraîner le modèle GPT-2 complet nécessite généralement plus de 20 Go de VRAM, quelle que soit la taille du batch utilisé, ce dont seuls quelques GPUs sont dotés. Des modèles plus légers comme `distilbert-base-cased` sont beaucoup plus faciles à exécuter et s'entraînent aussi beaucoup plus rapidement.
 
-<Tip>
-
-Dans la prochaine partie du cours, nous examinerons des techniques plus avancées qui peuvent vous aider à réduire votre empreinte mémoire et vous permettre de <i>finetuner</i> les plus grands modèles.
-
-</Tip>
+> [!TIP]
+> Dans la prochaine partie du cours, nous examinerons des techniques plus avancées qui peuvent vous aider à réduire votre empreinte mémoire et vous permettre de <i>finetuner</i> les plus grands modèles.
 
 ### TensorFlow affamé 🦛
 
@@ -455,21 +446,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 Si vos données d'entraînement ne sont pas équilibrées, veillez à créer un batch de données d'entraînement contenant toutes les étiquettes.
-
-</Tip>
+> [!TIP]
+> 💡 Si vos données d'entraînement ne sont pas équilibrées, veillez à créer un batch de données d'entraînement contenant toutes les étiquettes.
 
 Le modèle résultant devrait avoir des résultats proches de la perfection sur le `batch`, avec une perte diminuant rapidement vers 0 (ou la valeur minimale pour la perte que vous utilisez).
 
 Si vous ne parvenez pas à ce que votre modèle obtienne des résultats parfaits comme celui-ci, cela signifie qu'il y a quelque chose qui ne va pas dans la façon dont vous avez formulé le problème ou dans vos données et vous devez donc y remédier. Ce n'est que lorsque vous parviendrez à passer le test de surentraînement que vous pourrez être sûr que votre modèle peut réellement apprendre quelque chose.
 
-<Tip warning={true}>
-
-⚠️ Vous devrez recréer votre modèle et votre `Trainer` après ce test, car le modèle obtenu ne sera probablement pas capable de récupérer et d'apprendre quelque chose d'utile sur votre jeu de données complet.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Vous devrez recréer votre modèle et le recompiler après ce test, car le modèle obtenu ne sera probablement pas capable de récupérer et d'apprendre quelque chose d'utile sur votre jeu de données complet.
 
 ### Ne réglez rien tant que vous n'avez pas une première ligne de base
 
diff --git a/chapters/fr/chapter8/5.mdx b/chapters/fr/chapter8/5.mdx
index 19be6c405..48109d38e 100644
--- a/chapters/fr/chapter8/5.mdx
+++ b/chapters/fr/chapter8/5.mdx
@@ -21,11 +21,8 @@ Lorsque vous êtes sûr d'avoir identifier un *bug*, la première étape consist
 
 Il est très important d'isoler le morceau de code qui produit le *bug* car personne dans l'équipe d'Hugging Face n'est (encore) un magicien et on ne peut pas réparer ce qu'on ne peut pas voir. Un exemple minimal reproductible doit, comme son nom l'indique, être reproductible. Cela signifie qu'il ne doit pas dépendre de fichiers ou de données externes que vous pourriez avoir. Essayez de remplacer les données que vous utilisez par des valeurs fictives qui ressemblent à vos valeurs réelles et qui produisent toujours la même erreur.
 
-<Tip>
-
-🚨 De nombreux problèmes dans le dépôt 🤗 *Transformers* ne sont pas résolus car les données utilisées pour les reproduire ne sont pas accessibles.
-
-</Tip>
+> [!TIP]
+> 🚨 De nombreux problèmes dans le dépôt 🤗 *Transformers* ne sont pas résolus car les données utilisées pour les reproduire ne sont pas accessibles.
 
 Une fois que vous avez quelque chose d'autonome, essayez de le réduire au moins de lignes de code possible, en construisant ce que nous appelons un _exemple reproductible minimal_. Bien que cela nécessite un peu plus de travail de votre part, vous serez presque assuré d'obtenir de l'aide et une correction si vous fournissez un exemple reproductible court et agréable.
 
diff --git a/chapters/fr/chapter9/1.mdx b/chapters/fr/chapter9/1.mdx
index 8dd18331c..9387245f1 100644
--- a/chapters/fr/chapter9/1.mdx
+++ b/chapters/fr/chapter9/1.mdx
@@ -32,6 +32,5 @@ Voici quelques exemples de démos d'apprentissage automatique construites avec G
 
 Ce chapitre est divisé en sections qui comprennent à la fois des _concepts_ et des _applications_. Après avoir appris le concept dans chaque section, vous l'appliquerez pour construire un type particulier de démo, allant de la classification d'images à la reconnaissance vocale. À la fin de ce chapitre, vous serez en mesure de créer ces démos (et bien d'autres !) en quelques lignes de code Python seulement. 
 
-<Tip>
-👀 Consultez <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> pour voir de nombreux exemples récents de démos d'apprentissage automatique construites par la communauté !
-</Tip>
+> [!TIP]
+> 👀 Consultez <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> pour voir de nombreux exemples récents de démos d'apprentissage automatique construites par la communauté !
diff --git a/chapters/fr/chapter9/7.mdx b/chapters/fr/chapter9/7.mdx
index 34395d9e7..cbfee5ba0 100644
--- a/chapters/fr/chapter9/7.mdx
+++ b/chapters/fr/chapter9/7.mdx
@@ -64,9 +64,8 @@ Ce simple exemple ci-dessus introduit 4 concepts qui sous-tendent les *Blocks* :
 
 1. Les *Blocks* vous permettent de construire des applications web qui combinent Markdown, HTML, boutons et composants interactifs simplement en instanciant des objets en Python dans un contexte `with gradio.Blocks`. 
 
-<Tip>
-🙋Si vous n'êtes pas familier avec l'instruction `with` en Python, nous vous recommandons de consulter l'excellent <a href="https://realpython.com/python-with-statement/">tutoriel</a> de Real Python. Revenez ici après l'avoir lu 🤗
-</Tip>
+> [!TIP]
+> 🙋Si vous n'êtes pas familier avec l'instruction `with` en Python, nous vous recommandons de consulter l'excellent <a href="https://realpython.com/python-with-statement/">tutoriel</a> de Real Python. Revenez ici après l'avoir lu 🤗
 
 L'ordre dans lequel vous instanciez les composants est important car chaque élément est restitué dans l'application Web dans l'ordre où il a été créé. (Les mises en page plus complexes sont abordées ci-dessous)
 
diff --git a/chapters/hi/chapter1/3.mdx b/chapters/hi/chapter1/3.mdx
index c7c0b2fdc..555bb79a2 100644
--- a/chapters/hi/chapter1/3.mdx
+++ b/chapters/hi/chapter1/3.mdx
@@ -9,13 +9,10 @@
 
 इस खंड में, हम देखेंगे कि ट्रांसफॉर्मर मॉडल क्या कर सकते हैं और 🤗 ट्रांसफॉर्मर्स लाइब्रेरी: `पाइपलाइन ()` फ़ंक्शन से हमारे पहले टूल का उपयोग कर सकते हैं।
 
-<Tip>
-
-👀 ऊपर दाईं ओर *Colab में खोलें* बटन देखें? इस अनुभाग के सभी कोड नमूनों के साथ Google Colab नोटबुक खोलने के लिए उस पर क्लिक करें। यह बटन कोड उदाहरणों वाले किसी भी अनुभाग में मौजूद होगा।
-
-यदि आप उदाहरणों को स्थानीय रूप से चलाना चाहते हैं, तो हम <a href="/course/chapter0">सेटअप</a> पर एक नज़र डालने की अनुशंसा करते हैं।
-
-</Tip>
+> [!TIP]
+> 👀 ऊपर दाईं ओर *Colab में खोलें* बटन देखें? इस अनुभाग के सभी कोड नमूनों के साथ Google Colab नोटबुक खोलने के लिए उस पर क्लिक करें। यह बटन कोड उदाहरणों वाले किसी भी अनुभाग में मौजूद होगा।
+>
+> यदि आप उदाहरणों को स्थानीय रूप से चलाना चाहते हैं, तो हम <a href="/course/chapter0">सेटअप</a> पर एक नज़र डालने की अनुशंसा करते हैं।
 
 ## ट्रांसफॉर्मर हर जगह हैं!
 
@@ -25,9 +22,8 @@
 
 [🤗 ट्रांसफॉर्मर्स लाइब्रेरी](https://github.com/huggingface/transformers) उन साझा मॉडलों को बनाने और उपयोग करने की कार्यक्षमता प्रदान करती है। [मॉडल हब](https://huggingface.co/models) में हजारों पूर्व-प्रशिक्षित मॉडल हैं जिन्हें कोई भी डाउनलोड और उपयोग कर सकता है। आप हब पर अपने स्वयं के मॉडल भी अपलोड कर सकते हैं!
 
-<Tip>
- ⚠️ हगिंग फेस हब ट्रांसफॉर्मर मॉडल तक सीमित नहीं है। कोई भी किसी भी प्रकार के मॉडल या डेटासेट साझा कर सकता है! सभी उपलब्ध सुविधाओं का लाभ उठाने के लिए एक <a href="https://huggingface.co/join">हगिंगफेस खाता बनाएं!</a>
-  </Tip>
+> [!TIP]
+> ⚠️ हगिंग फेस हब ट्रांसफॉर्मर मॉडल तक सीमित नहीं है। कोई भी किसी भी प्रकार के मॉडल या डेटासेट साझा कर सकता है! सभी उपलब्ध सुविधाओं का लाभ उठाने के लिए एक <a href="https://huggingface.co/join">हगिंगफेस खाता बनाएं!</a>
 
 ट्रांसफॉर्मर मॉडल हुड के तहत कैसे काम करते हैं, यह जानने से पहले, आइए कुछ उदाहरण देखें कि कुछ दिलचस्प प्राकृतिक भाषा प्रसंस्करण समस्याओं को हल करने के लिए उनका उपयोग कैसे किया जा सकता है।
 
@@ -122,11 +118,8 @@ classifier(
 
 इस पाइपलाइन को _शून्य-शॉट_ कहा जाता है क्योंकि इसका उपयोग करने के लिए आपको अपने डेटा पर मॉडल को फ़ाइन-ट्यून करने की आवश्यकता नहीं है। यह आपके इच्छित लेबल की किसी भी सूची के लिए सीधे संभाव्यता स्कोर लौटा सकता है!
 
-<Tip>
-  
- ✏️ **कोशिश करके देखो!** अपने स्वयं के अनुक्रमों और लेबलों के साथ खेलें और देखें कि मॉडल कैसा व्यवहार करता है।
-  
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखो!** अपने स्वयं के अनुक्रमों और लेबलों के साथ खेलें और देखें कि मॉडल कैसा व्यवहार करता है।
 
 ## पाठ निर्माण
 
@@ -149,11 +142,8 @@ generator("In this course, we will teach you how to")
 
 आप यह नियंत्रित कर सकते हैं कि `num_return_sequences` तर्क और `max_length` तर्क के साथ आउटपुट टेक्स्ट की कुल लंबाई के साथ कितने अलग-अलग क्रम उत्पन्न होते हैं।
 
-<Tip>
-  
-✏️ **कोशिश करके देखो!** 15 शब्दों के दो वाक्य बनाने के लिए `num_return_sequences` और `max_length` तर्कों का उपयोग करें।
-  
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखो!** 15 शब्दों के दो वाक्य बनाने के लिए `num_return_sequences` और `max_length` तर्कों का उपयोग करें।
 
 ## हब से पाइपलाइन में किसी भी मॉडल का उपयोग करना
 
@@ -184,11 +174,8 @@ generator(
 
 एक बार जब आप उस पर क्लिक करके एक मॉडल का चयन करते हैं, तो आप देखेंगे कि एक विजेट है जो आपको इसे सीधे ऑनलाइन आज़माने में सक्षम बनाता है। इस प्रकार आप मॉडल को डाउनलोड करने से पहले उसकी क्षमताओं का शीघ्रता से परीक्षण कर सकते हैं।
 
-<Tip>
-  
- ✏️ **कोशिश करके देखो!** किसी अन्य भाषा के लिए टेक्स्ट जनरेशन मॉडल खोजने के लिए फ़िल्टर का उपयोग करें। विजेट के साथ खेलने के लिए स्वतंत्र महसूस करें और इसे पाइपलाइन में उपयोग करें!
-  
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखो!** किसी अन्य भाषा के लिए टेक्स्ट जनरेशन मॉडल खोजने के लिए फ़िल्टर का उपयोग करें। विजेट के साथ खेलने के लिए स्वतंत्र महसूस करें और इसे पाइपलाइन में उपयोग करें!
 
 ## अनुमान एपीआई
 
@@ -220,11 +207,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 `top_k` तर्क नियंत्रित करता है कि आप कितनी संभावनाएं प्रदर्शित करना चाहते हैं। ध्यान दें कि यहां मॉडल विशेष `<mask>` शब्द भरता है, जिसे अक्सर *मास्क टोकन* के रूप में संदर्भित किया जाता है। अन्य मुखौटा-भरने वाले मॉडलों में अलग-अलग मुखौटा टोकन हो सकते हैं, इसलिए अन्य मॉडलों की खोज करते समय उचित मुखौटा शब्द को सत्यापित करना हमेशा अच्छा होता है। इसे जांचने का एक तरीका विजेट में प्रयुक्त मुखौटा शब्द को देखकर है।
 
-<Tip>
-
- ✏️ **कोशिश करके देखो!** हब पर `बर्ट-बेस-केस्ड` मॉडल खोजें और अनुमान एपीआई विजेट में इसके मुखौटा शब्द की पहचान करें। यह मॉडल उपरोक्त हमारे `पाइपलाइन` उदाहरण में वाक्य के लिए क्या भविष्यवाणी करता है?
-  
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखो!** हब पर `bert-base-cased` मॉडल खोजें और अनुमान एपीआई विजेट में इसके मुखौटा शब्द की पहचान करें। यह मॉडल उपरोक्त हमारे `pipeline` उदाहरण में वाक्य के लिए क्या भविष्यवाणी करता है?
 
 ## नामित इकाई मान्यता
 
@@ -248,11 +232,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 हम पाइपलाइन निर्माण फ़ंक्शन में विकल्प `grouped_entities=True` पास करते हैं ताकि पाइपलाइन को एक ही इकाई के अनुरूप वाक्य के हिस्सों को एक साथ फिर से समूहित करने के लिए कहा जा सके: यहां मॉडल ने एक ही संगठन के रूप में "हगिंग" और "फेस" को सही ढंग से समूहीकृत किया है, भले ही नाम में कई शब्द हों। वास्तव में, जैसा कि हम अगले अध्याय में देखेंगे, प्रीप्रोसेसिंग कुछ शब्दों को छोटे भागों में भी विभाजित करता है। उदाहरण के लिए, `सिल्वेन` को चार भागों में बांटा गया है: `S`, `##yl`, `##va`, और `##in`। प्रसंस्करण के बाद के चरण में, पाइपलाइन ने उन टुकड़ों को सफलतापूर्वक पुन: समूहित किया।
 
-<Tip>
-  
- ✏️ **कोशिश करके देखो!** अंग्रेजी में पार्ट-ऑफ-स्पीच टैगिंग (आमतौर पर पीओएस के रूप में संक्षिप्त) करने में सक्षम मॉडल के लिए मॉडल हब खोजें। यह मॉडल उपरोक्त उदाहरण में वाक्य के लिए क्या भविष्यवाणी करता है?
-  
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखो!** अंग्रेजी में पार्ट-ऑफ-स्पीच टैगिंग (आमतौर पर पीओएस के रूप में संक्षिप्त) करने में सक्षम मॉडल के लिए मॉडल हब खोजें। यह मॉडल उपरोक्त उदाहरण में वाक्य के लिए क्या भविष्यवाणी करता है?
 
 ## प्रश्न उत्तर
 
@@ -335,10 +316,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 पाठ निर्माण और संक्षेपण की तरह, आप परिणाम के लिए `max_length` या `min_length` निर्दिष्ट कर सकते हैं।
 
-<Tip>
-  
-✏️ **कोशिश करके देखो!** अन्य भाषाओं में अनुवाद मॉडल खोजें और पिछले वाक्य का कुछ भिन्न भाषाओं में अनुवाद करने का प्रयास करें।
-  
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखो!** अन्य भाषाओं में अनुवाद मॉडल खोजें और पिछले वाक्य का कुछ भिन्न भाषाओं में अनुवाद करने का प्रयास करें।
 
 अब तक दिखाई गई पाइपलाइनें ज्यादातर प्रदर्शनकारी उद्देश्यों के लिए हैं। वे विशिष्ट कार्यों के लिए प्रोग्राम किए गए थे और उनमें से विविधताएं नहीं कर सकते। अगले अध्याय में, आप सीखेंगे कि 'पाइपलाइन ()' फ़ंक्शन के अंदर क्या है और इसके व्यवहार को कैसे अनुकूलित किया जाए।
diff --git a/chapters/hi/chapter2/1.mdx b/chapters/hi/chapter2/1.mdx
index df1e0e139..a38f1af48 100644
--- a/chapters/hi/chapter2/1.mdx
+++ b/chapters/hi/chapter2/1.mdx
@@ -20,7 +20,6 @@
 
 फिर हम टोकननाइज़र API को देखेंगे, जो `pipeline()` फ़ंक्शन का अन्य मुख्य अंग है। टोकेनाइज़र पहले और अंतिम प्रसंस्करण चरणों का ध्यान रखते हैं, न्यूरल नेटवर्क के लिए पाठ से संख्यात्मक इनपुट में परिवर्तन को संभालते हैं, और आवश्यकता होने पर पाठ में परिवर्तन वापस करते हैं। अंत में, हम आपको दिखाएंगे कि एक तैयार बैच में एक मॉडल के माध्यम से कई वाक्यों को भेजने से कैसे निपटना है, फिर उच्च-स्तरीय `tokenizer()` फ़ंक्शन को करीब से  देखकर इसका अंत करेंगे।
 
-<Tip>
-⚠️ मॉडल हब और 🤗 ट्रांसफॉर्मर के साथ उपलब्ध सभी सुविधाओं का लाभ उठाने के लिए, हम <a href="https://huggingface.co/join">खाता बनाने</a> की अनुशंसा करते हैं।
-</Tip>
+> [!TIP]
+> ⚠️ मॉडल हब और 🤗 ट्रांसफॉर्मर के साथ उपलब्ध सभी सुविधाओं का लाभ उठाने के लिए, हम <a href="https://huggingface.co/join">खाता बनाने</a> की अनुशंसा करते हैं।
 
diff --git a/chapters/hi/chapter3/2.mdx b/chapters/hi/chapter3/2.mdx
index 736809b47..5a9b28acf 100644
--- a/chapters/hi/chapter3/2.mdx
+++ b/chapters/hi/chapter3/2.mdx
@@ -89,9 +89,8 @@ model.train_on_batch(batch, labels)
 
 🤗 डेटासेट लाइब्रेरी एक बहुत ही सरल कमांड प्रदान करती है हब पर डेटासेट को डाउनलोड और कैश करने के लिए। हम MRPC डेटासेट को इस तरह डाउनलोड कर सकते हैं:
 
-<Tip>
-⚠️ **चेतावनी** सुनिश्चित करें कि `datasets` स्थापित है। इसके लिए `pip install datasets` चलाएँ। फिर, MRPC डेटासेट को लोड करें और देखें कि इसमें क्या है।
-</Tip>
+> [!TIP]
+> ⚠️ **चेतावनी** सुनिश्चित करें कि `datasets` स्थापित है। इसके लिए `pip install datasets` चलाएँ। फिर, MRPC डेटासेट को लोड करें और देखें कि इसमें क्या है।
 
 ```py
 from datasets import load_dataset
@@ -150,11 +149,8 @@ raw_train_dataset.features
 
 परदे के पीछे, `label` प्रकार `ClassLabel` का है, और पूर्णांक का लेबल नाम से मानचित्रण *names* फ़ोल्डर में संग्रहित किया जाता है। `0` मेल खाता है `not_equivalent` से, और `1` मेल खाता है `equivalent` से।
 
-<Tip>
-
-✏️ **कोशिश करके देखे!** प्रशिक्षण सेट के तत्व 15 और सत्यापन सेट के तत्व 87 को देखें। उनके लेबल क्या हैं?
-
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखे!** प्रशिक्षण सेट के तत्व 15 और सत्यापन सेट के तत्व 87 को देखें। उनके लेबल क्या हैं?
 
 ### डेटासेट का पूर्वप्रक्रमण करना
 
@@ -192,11 +188,8 @@ inputs
 
 हमने [अध्याय 2](/course/chapter2) में `input_ids` और `attention_mask` कुंजियों पर चर्चा की, लेकिन हमने `token_type_ids` के बारे में बात नहीं की। इस उदाहरण में, यह कुंजी मॉडल को बताता है कि इनपुट का कौन सा हिस्सा पहला वाक्य है और कौन सा दूसरा वाक्य है।
 
-<Tip>
-
-✏️ **कोशिश करके देखे!** प्रशिक्षण सेट के तत्व 15 को लें और टोकननाइज करें दो वाक्यों को अलग-अलग और एक जोड़ी के रूप में। दोनों परिणामों में क्या अंतर है?
-
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखे!** प्रशिक्षण सेट के तत्व 15 को लें और टोकननाइज करें दो वाक्यों को अलग-अलग और एक जोड़ी के रूप में। दोनों परिणामों में क्या अंतर है?
 
 यदि हम `input_ids` के अंदर IDs को शब्दों में वापस व्याख्या करते हैं:
 
@@ -353,11 +346,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ **कोशिश करके देखे!** कोशिश करके देखे! GLUE SST-2 डेटासेट पर प्रीप्रोसेसिंग को दोहराएं। यह थोड़ा अलग है क्योंकि यह जोड़े के बजाय एकल वाक्यों से बना है, लेकिन बाकी जो हमने किया वो वैसा ही दिखना चाहिए। एक कठिन चुनौती के लिए, एक प्रीप्रोसेसिंग फ़ंक्शन लिखने का प्रयास करें जो किसी भी GLUE कार्यों पर काम करता हो।
-
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखे!** GLUE SST-2 डेटासेट पर प्रीप्रोसेसिंग को दोहराएं। यह थोड़ा अलग है क्योंकि यह जोड़े के बजाय एकल वाक्यों से बना है, लेकिन बाकी जो हमने किया वो वैसा ही दिखना चाहिए। एक कठिन चुनौती के लिए, एक प्रीप्रोसेसिंग फ़ंक्शन लिखने का प्रयास करें जो किसी भी GLUE कार्यों पर काम करता हो।
 
 {#if fw === 'tf'}
 
diff --git a/chapters/hi/chapter3/3.mdx b/chapters/hi/chapter3/3.mdx
index 9dd435aaa..387e2f799 100644
--- a/chapters/hi/chapter3/3.mdx
+++ b/chapters/hi/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 यदि आप प्रशिक्षण के दौरान अपने मॉडल को हब पर स्वचालित रूप से अपलोड करना चाहते हैं, तो आप `TrainingArguments` में `push_to_hub=True` के साथ पास कर सकते हैं। हम इसके बारे में [अध्याय 4](/course/chapter4/3) में और जानेंगे
-
-</Tip>
+> [!TIP]
+> 💡 यदि आप प्रशिक्षण के दौरान अपने मॉडल को हब पर स्वचालित रूप से अपलोड करना चाहते हैं, तो आप `TrainingArguments` में `push_to_hub=True` पास कर सकते हैं। हम इसके बारे में [अध्याय 4](/course/chapter4/3) में और जानेंगे।
 
 दूसरा कदम हमारे मॉडल को परिभाषित करना है। [पिछले अध्याय](/course/chapter2) की तरह, हम `AutoModelForSequenceClassification` वर्ग का उपयोग करेंगे, दो लेबल के साथ :
 
@@ -164,9 +161,6 @@ trainer.train()
 
 यह `Trainer` API का उपयोग करके फाइन-ट्यूनिंग के परिचय को समाप्त करता है। अधिकांश सामान्य NLP कार्यों के लिए ऐसा करने का एक उदाहरण [अध्याय 7](course/chapter7) में दिया जाएगा,  लेकिन अभी के लिए आइए देखें कि शुद्ध PyTorch में वही काम कैसे करें।
 
-<Tip>
-
-✏️ **कोशिश करके देखे!** GLUE SST-2 डेटासेट पर एक मॉडल को फाइन-ट्यून करें, डेटा प्रसंस्करण यानि डेटा प्रोसेसिंग का उपयोग करके जिसे आपने सेक्शन 2 में किया था
-
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखे!** GLUE SST-2 डेटासेट पर एक मॉडल को फाइन-ट्यून करें, डेटा प्रसंस्करण यानि डेटा प्रोसेसिंग का उपयोग करके जिसे आपने सेक्शन 2 में किया था
 
diff --git a/chapters/hi/chapter3/3_tf.mdx b/chapters/hi/chapter3/3_tf.mdx
index 4ae3c99f2..38072f990 100644
--- a/chapters/hi/chapter3/3_tf.mdx
+++ b/chapters/hi/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_lab
 
 अपने डेटासेट पर मॉडल को फाइन-ट्यून करने के लिए, हमें बस अपने मॉडल को `compile()` करना होगा और फिर अपने डेटा को `fit()` विधि में पास करना होगा। यह फ़ाइन-ट्यूनिंग प्रक्रिया को शुरू करेगा (जो GPU पर कुछ मिनट लेगा) और आगे जा कर यह हर युग यानि एपॉच के अंत में प्रशिक्षण हानि यानि लॉस साथ ही सत्यापन हानि की रिपोर्ट करेगा।
 
-<Tip>
-
-ध्यान दें कि 🤗 ट्रांसफॉर्मर मॉडल में एक विशेष क्षमता होती है जो कि अधिकांश Keras मॉडल नहीं होती - वे स्वचालित रूप से एक उचित हानि यानि लॉस का उपयोग कर सकते हैं जिसे वे आंतरिक रूप से गणना करते हैं। वे डिफ़ॉल्ट रूप से इस लॉस का उपयोग करेगा अगर आप `compile()` में लॉस आर्गूमेन्ट सेट नहीं करते हैं तो। ध्यान दें कि आंतरिक लॉस का उपयोग करने के लिए आपको अपने लेबल को इनपुट के हिस्से के रूप में पास करना होगा, न कि एक अलग लेबल के रूप में, जो कि Keras मॉडल के साथ लेबल का उपयोग करने का सामान्य तरीका है। आप पाठ्यक्रम के भाग 2 में इसके उदाहरण देखेंगे, जहां सही लॉस फ़ंक्शन को परिभाषित करना पेचीदा हो सकता है। अनुक्रम वर्गीकरण के लिए, हालांकि, एक मानक Keras लॉस फ़ंक्शन ठीक काम करता है, इसलिए हम यहां इसका उपयोग करेंगे।
-
-</Tip>
+> [!TIP]
+> ध्यान दें कि 🤗 ट्रांसफॉर्मर मॉडल में एक विशेष क्षमता होती है जो अधिकांश Keras मॉडल में नहीं होती - वे स्वचालित रूप से एक उचित हानि यानि लॉस का उपयोग कर सकते हैं जिसे वे आंतरिक रूप से गणना करते हैं। यदि आप `compile()` में लॉस आर्गूमेन्ट सेट नहीं करते हैं, तो वे डिफ़ॉल्ट रूप से इस लॉस का उपयोग करेंगे। ध्यान दें कि आंतरिक लॉस का उपयोग करने के लिए आपको अपने लेबल को इनपुट के हिस्से के रूप में पास करना होगा, न कि एक अलग लेबल के रूप में, जो कि Keras मॉडल के साथ लेबल का उपयोग करने का सामान्य तरीका है। आप पाठ्यक्रम के भाग 2 में इसके उदाहरण देखेंगे, जहां सही लॉस फ़ंक्शन को परिभाषित करना पेचीदा हो सकता है। अनुक्रम वर्गीकरण के लिए, हालांकि, एक मानक Keras लॉस फ़ंक्शन ठीक काम करता है, इसलिए हम यहां इसका उपयोग करेंगे।
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-यहां एक बहुत ही सामान्य नुकसान पर ध्यान दें - आप केवल लॉस का नाम स्ट्रिंग के रूप मे Keras को पास *कर सकते* है, लेकिन डिफ़ॉल्ट रूप से Keras यह मानेगा कि आपने पहले ही अपने आउटपुट में सॉफ्टमैक्स लागू कर दिया है। हालाँकि, कई मॉडल सॉफ्टमैक्स लागू होने से ठीक पहले मानों यानि वैल्यूज़ को आउटपुट करते हैं, जिन्हें *logits* के रूप में भी जाना जाता है। हमें लॉस फ़ंक्शन को यह बताने की आवश्यकता है कि हमारा मॉडल क्या करता है, और ऐसा करने का एकमात्र तरीका है कि इसे सीधे कॉल करना, बजाय एक स्ट्रिंग के नाम से।
-
-</Tip>
+> [!WARNING]
+> यहां एक बहुत ही सामान्य नुकसान पर ध्यान दें - आप केवल लॉस का नाम स्ट्रिंग के रूप में Keras को पास *कर सकते* हैं, लेकिन डिफ़ॉल्ट रूप से Keras यह मानेगा कि आपने पहले ही अपने आउटपुट में सॉफ्टमैक्स लागू कर दिया है। हालाँकि, कई मॉडल सॉफ्टमैक्स लागू होने से ठीक पहले मानों यानि वैल्यूज़ को आउटपुट करते हैं, जिन्हें *logits* के रूप में भी जाना जाता है। हमें लॉस फ़ंक्शन को यह बताने की आवश्यकता है कि हमारा मॉडल क्या करता है, और ऐसा करने का एकमात्र तरीका है कि इसे सीधे कॉल करना, बजाय एक स्ट्रिंग के नाम से।
 
 
 ### प्रशिक्षण प्रदर्शन में सुधार करना
@@ -131,11 +125,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-🤗 ट्रांसफॉर्मर्स लाइब्रेरी में एक `create_optimizer()` फ़ंक्शन भी है जो लर्निंग रेट क्षय के साथ एक `AdamW` ऑप्टिमाइज़र बनाएगा। यह एक सुविधाजनक शॉर्टकट है जिसे आप पाठ्यक्रम के भविष्य के अनुभागों में विस्तार से देखेंगे।
-
-</Tip>
+> [!TIP]
+> 🤗 ट्रांसफॉर्मर्स लाइब्रेरी में एक `create_optimizer()` फ़ंक्शन भी है जो लर्निंग रेट क्षय के साथ एक `AdamW` ऑप्टिमाइज़र बनाएगा। यह एक सुविधाजनक शॉर्टकट है जिसे आप पाठ्यक्रम के भविष्य के अनुभागों में विस्तार से देखेंगे।
 
 अब हमारे पास हमारा बिल्कुल नया ऑप्टिमाइज़र है, और हम इसके साथ प्रशिक्षण का प्रयास कर सकते हैं। सबसे पहले, मॉडल को फिर से लोड करें, ताकि हमारे द्वारा अभी-अभी किए गए प्रशिक्षण रन से वज़न में परिवर्तन को रीसेट कर सके, और फिर हम इसे नए ऑप्टिमाइज़र के साथ कंपाइल कर सकते हैं:
 
@@ -153,11 +144,8 @@ model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 यदि आप प्रशिक्षण के दौरान अपने मॉडल को हब पर स्वचालित रूप से अपलोड करना चाहते हैं, तो आप `model.fit()` विधि में `PushToHubCallback` के साथ पास कर सकते हैं। हम इसके बारे में [अध्याय 4](/course/chapter4/3) में और जानेंगे
-
-</Tip>
+> [!TIP]
+> 💡 यदि आप प्रशिक्षण के दौरान अपने मॉडल को हब पर स्वचालित रूप से अपलोड करना चाहते हैं, तो आप `model.fit()` विधि में `PushToHubCallback` पास कर सकते हैं। हम इसके बारे में [अध्याय 4](/course/chapter4/3) में और जानेंगे।
 
 ### मॉडल के पूर्वानुमान
 
diff --git a/chapters/hi/chapter3/4.mdx b/chapters/hi/chapter3/4.mdx
index 531a12283..5bb87dfca 100644
--- a/chapters/hi/chapter3/4.mdx
+++ b/chapters/hi/chapter3/4.mdx
@@ -197,11 +197,8 @@ metric.compute()
 
 फिर से, मॉडल हेड इनिशियलाइज़ेशन और डेटा फेरबदल में क्रमरहित होने के कारण आपके परिणाम थोड़े भिन्न होंगे, लेकिन वे एक ही बॉलपार्क में होने चाहिए।
 
-<Tip>
-
-✏️ **कोशिश करके देखे!** पिछले प्रशिक्षण लूप को संशोधित करें ताकि अपने मॉडल को SST-2 डेटासेट पर फाइन-ट्यून कर सके। 
-
-</Tip>
+> [!TIP]
+> ✏️ **कोशिश करके देखे!** पिछले प्रशिक्षण लूप को संशोधित करें ताकि अपने मॉडल को SST-2 डेटासेट पर फाइन-ट्यून कर सके।
 
 ### अपने प्रशिक्षण लूप को सुपरचार्ज करें 🤗 Accelerate के साथ।
 
@@ -293,9 +290,8 @@ for epoch in range(num_epochs):
 
 फिर काम का मुख्य हिस्सा उस लाइन में किया जाता है जो डेटालोडर्स, मॉडल और ऑप्टिमाइज़र को `accelerator.prepare()` पर भेजता है। यह उन वस्तुओं को उचित कंटेनर में लपेट देगा ताकि यह सुनिश्चित हो सके कि आपका वितरित प्रशिक्षण उद्देश्य के अनुसार काम करता है। शेष परिवर्तन है उस लाइन को हटाना जो बैच को `device` पर रखता है (फिर से, यदि आप इसे रखना चाहते हैं तो आप इसे केवल `accelerator.device` का उपयोग करने के लिए बदल सकते हैं) और `loss.backward()` को `accelerator.backward(loss)` के साथ बदलना।
 
-<Tip>
-⚠️ Cloud TPUs द्वारा पेश किए गए स्पीड-अप से लाभ उठाने के लिए, हम अनुशंसा करते हैं कि आप अपने सैम्पल्स को टोकननाइज़र के `padding="max_length"` और `max_length` प्राचल यानि आर्गुमेंट के साथ एक निश्चित लंबाई तक पैडिंग करें।
-</Tip>
+> [!TIP]
+> ⚠️ Cloud TPUs द्वारा पेश किए गए स्पीड-अप से लाभ उठाने के लिए, हम अनुशंसा करते हैं कि आप अपने सैम्पल्स को टोकननाइज़र के `padding="max_length"` और `max_length` प्राचल यानि आर्गुमेंट के साथ एक निश्चित लंबाई तक पैडिंग करें।
 
 यदि आप इसे खेलने के लिए कॉपी और पेस्ट करना चाहते हैं, तो यहां बताया गया है कि 🤗 Accelerate के साथ पूरा प्रशिक्षण लूप कैसा दिखता है:
 
diff --git a/chapters/it/chapter1/3.mdx b/chapters/it/chapter1/3.mdx
index 33546693a..140e51df2 100644
--- a/chapters/it/chapter1/3.mdx
+++ b/chapters/it/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 In questa sezione, vedremo di cosa sono capaci i modelli Transformer e useremo il nostro primo strumento della libreria 🤗 Transformer: la funzione `pipeline()`.
 
-<Tip>
-👀 Lo vedi il pulsante <em>Open in Colab</em> in alto a destra? Cliccalo per aprire il blocco note Colab di Google che contiene tutti gli esempi di codice di questa sezione. Ritroverai il pulsante in ogni sezione che contiene esempi di codice. 
-
-Se intendi compilare gli esempi localmente, ti consigliamo di dare un occhio alla sezione <a href="/course/chapter0">setup</a>.
-</Tip>
+> [!TIP]
+> 👀 Lo vedi il pulsante <em>Open in Colab</em> in alto a destra? Cliccalo per aprire il blocco note Colab di Google che contiene tutti gli esempi di codice di questa sezione. Ritroverai il pulsante in ogni sezione che contiene esempi di codice.
+>
+> Se intendi compilare gli esempi localmente, ti consigliamo di dare un occhio alla sezione <a href="/course/chapter0">setup</a>.
 
 ## I Transformer sono ovunque!
 
@@ -23,9 +22,8 @@ I modelli Transformer sono utilizzati per eseguire qualsiasi compito di NLP, com
 
 La [libreria 🤗 Transformer](https://github.com/huggingface/transformers) fornisce la funzionalità per creare e utilizzare questi modelli condivisi. Il [Model Hub](https://huggingface.co/models) contiene migliaia di modelli pre-addestrati che possono essere scaricati e usati liberamente. Puoi anche caricare i tuoi modelli nell'Hub!
 
-<Tip>
-⚠️ L'Hugging Face Hub non si limitata ai soli modelli Transformer. Chiunque può condividere qualsiasi tipo di modello o dataset (<em>insieme di dati</em>)! <a href="https://huggingface.co/join">Crea un profilo huggingface.co</a> per approfittare di tutte le funzioni disponibili!
-</Tip>
+> [!TIP]
+> ⚠️ L'Hugging Face Hub non si limita ai soli modelli Transformer. Chiunque può condividere qualsiasi tipo di modello o dataset (<em>insieme di dati</em>)! <a href="https://huggingface.co/join">Crea un profilo huggingface.co</a> per approfittare di tutte le funzioni disponibili!
 
 Prima di scoprire come funzionino i modelli Transformer dietro le quinte, vediamo qualche esempio di come questi possano essere utilizzati per risolvere alcuni problemi interessanti di NLP.
 
@@ -104,11 +102,8 @@ classifier(
 
 Questa pipeline si chiama _zero-shot_ perché non hai bisogno di affinare il modello usando i tuoi dati per poterlo utilizzare. È direttamente in grado di generare una previsione probabilistica per qualsiasi lista di etichette tu voglia!
 
-<Tip>
-
-✏️ **Provaci anche tu!** Divertiti creando sequenze ed etichette e osserva come si comporta il modello.
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Divertiti creando sequenze ed etichette e osserva come si comporta il modello.
 
 
 ## Generazione di testi
@@ -132,11 +127,8 @@ generator("In this course, we will teach you how to")
 
 Usando l'argomento `num_return_sequences` puoi controllare quante sequenze diverse vengono generate e, con l'argomento `max_length`, la lunghezza totale dell'output testuale.
 
-<Tip>
-
-✏️ **Provaci anche tu!** Usa gli argomenti `num_return_sequences` e `max_length` per generare due frasi di 15 parole ciascuna.
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Usa gli argomenti `num_return_sequences` e `max_length` per generare due frasi di 15 parole ciascuna.
 
 
 ## Utilizzo di un qualsiasi modello dell'Hub in una pipeline
@@ -168,11 +160,8 @@ Puoi affinare la ricerca di un modello cliccando sulle etichette corrispondenti
 
 Quando avrai selezionato un modello cliccando su di esso, vedrai che esiste un widget che ti permette di provarlo direttamente online. In questo modo, puoi testare velocemente le capacità del modello prima di scaricarlo.
 
-<Tip>
-
-✏️ **Provaci anche tu!** Usa i filtri per trovare un modello di generazione testuale per un'altra lingua. Sentiti libero/a di divertirti con il widget e usalo in una pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Usa i filtri per trovare un modello di generazione testuale per un'altra lingua. Sentiti libero/a di divertirti con il widget e usalo in una pipeline!
 
 ### La Inference API
 
@@ -204,11 +193,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 L'argomento `top_k` gestisce il numero di possibilità che vuoi mostrare. Nota che qui il modello inserisce la `<mask>` word speciale, la quale viene spesso chiamata *mask token*. Altri modelli di tipo mask-filling potrebbero avere mask token diversi, quindi è sempre bene verificare quale sia la corretta mask word quando esploriamo nuovi modelli. Un modo per verificarla consiste nel trovare la mask word utilizzata nel widget.
 
-<Tip>
-
-✏️ **Provaci anche tu!** Cerca il modello `bert-base-cased` nell'Hub e identifica la sua mask word nel widget dell'Inference API. Cosa predice questo modello per la frase nel nostro esempio `pipeline` qui sopra?
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Cerca il modello `bert-base-cased` nell'Hub e identifica la sua mask word nel widget dell'Inference API. Cosa predice questo modello per la frase nel nostro esempio `pipeline` qui sopra?
 
 ## Riconoscimento delle entità nominate
 
@@ -232,11 +218,8 @@ Qui il modello ha correttamente identificato che Sylvain è una persona (PER), H
 
 Passiamo l'opzione `grouped_entities=True` nella funzione di creazione della pipeline per raggruppare le parti frasali che corrispondono alla stessa entità: qui il modello raggruppa correttamente "Hugging" e "Face" come singola organizzazione, nonostante il nome sia formato da più parole. A dire il vero, come vedremo nel prossimo capitolo, il preprocessing divide perfino alcune parole in parti più piccole. Ad esempio, `Sylvain` viene suddiviso in quattro parti: `S`, `##yl`, `##va`, and `##in`. Al momento del post-processing, la pipeline raggruppa le parti con successo.
 
-<Tip>
-
-✏️ **Provaci anche tu!** Nel Model Hub, cerca un modello capace di effettuare part-of-speech tagging (comunemente abbreviato come POS) in inglese. Cosa predice il modello per la frase nell'esempio qui sopra?
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Nel Model Hub, cerca un modello capace di effettuare part-of-speech tagging (comunemente abbreviato come POS) in inglese. Cosa predice il modello per la frase nell'esempio qui sopra?
 
 ## Risposta a domande
 
@@ -320,10 +303,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 Come per le funzioni di generazione testuale e riassunto, è possibile specificare un `max_length` o un `min_length` per il risultato.
 
-<Tip>
-
-✏️ **Provaci anche tu!** Cerca modelli di traduzione in altre lingue e prova a tradurre la frase precedente in un paio di lingue diverse.
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Cerca modelli di traduzione in altre lingue e prova a tradurre la frase precedente in un paio di lingue diverse.
 
 Finora abbiamo mostrato pipeline a solo scopo dimostrativo. Tali pipeline sono state programmate per compiti ben specifici e non sono in grado di eseguire variazioni di questi ultimi. Nel prossimo capitolo, imparerai cosa si nasconde dentro la funzione `pipeline()` e come personalizzarne il comportamento.
diff --git a/chapters/it/chapter2/1.mdx b/chapters/it/chapter2/1.mdx
index d0579a712..03079864a 100644
--- a/chapters/it/chapter2/1.mdx
+++ b/chapters/it/chapter2/1.mdx
@@ -21,6 +21,5 @@ Questo capitolo inizierà con un esempio in cui usiamo un modello e un tokenizer
 
 Successivamente vedremo l'API del tokenizer, che è l'altro componente principale della funzione `pipeline()`. I tokenizer si occupano della prima e dell'ultima fase di elaborazione, gestendo la conversione da testo a input numerici per la rete neurale e la conversione di nuovo in testo quando è necessario. Infine, mostreremo come gestire l'invio di più frasi a un modello in un batch preparato, per poi concludere il tutto con un'analisi più approfondita della funzione di alto livello `tokenizer()`.
 
-<Tip>
-⚠️ Per poter usufruire di tutte le funzioni disponibili con il Model Hub e i 🤗 Transformers, si consiglia di <a href="https://huggingface.co/join">creare un account</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ Per poter usufruire di tutte le funzioni disponibili con il Model Hub e i 🤗 Transformers, si consiglia di <a href="https://huggingface.co/join">creare un account</a>.
\ No newline at end of file
diff --git a/chapters/it/chapter2/2.mdx b/chapters/it/chapter2/2.mdx
index 1471eb014..469650854 100644
--- a/chapters/it/chapter2/2.mdx
+++ b/chapters/it/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-   Questa è la prima sezione in cui il contenuto è leggermente diverso a seconda che si utilizzi PyTorch o TensorFlow. Attivate lo switch sopra il titolo per selezionare la tua piattaforma preferita!
-</Tip>
+> [!TIP]
+> Questa è la prima sezione in cui il contenuto è leggermente diverso a seconda che si utilizzi PyTorch o TensorFlow. Attivate lo switch sopra il titolo per selezionare la tua piattaforma preferita!
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -346,6 +345,5 @@ Ora possiamo concludere che il modello ha previsto quanto segue:
 
 Abbiamo riprodotto con successo le tre fasi della pipeline: preelaborazione con i tokenizer, passaggio degli input attraverso il modello e postelaborazione! Ora prendiamoci un po' di tempo per approfondire ognuna di queste fasi.
 
-<Tip>
-✏️ **Provaci anche tu!** Scegli due (o più) testi di tua proprietà e lanciali all'interno della pipeline `sentiment-analysis`. Successivamente, replica i passi che hai visto qui e verifica di aver ottenuto gli stessi risultati!
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Scegli due (o più) testi di tua proprietà e lanciali all'interno della pipeline `sentiment-analysis`. Successivamente, replica i passi che hai visto qui e verifica di aver ottenuto gli stessi risultati!
diff --git a/chapters/it/chapter2/4.mdx b/chapters/it/chapter2/4.mdx
index 5cd002b5e..7e9d0e244 100644
--- a/chapters/it/chapter2/4.mdx
+++ b/chapters/it/chapter2/4.mdx
@@ -216,11 +216,8 @@ print(ids)
 
 Questi risultati, una volta convertiti nel tensore quadro appropriato, possono essere successivamente utilizzati come input per un modello, come visto in precedenza in questo capitolo.
 
-<Tip>
-
-✏️ **Provaci anche tu!** Replica gli ultimi due passaggi (tokenizzazione e conversione in ID di input) sulle frasi di input utilizzate nella sezione 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Verificate di ottenere gli stessi ID di input che abbiamo ottenuto in precedenza!
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Replica gli ultimi due passaggi (tokenizzazione e conversione in ID di input) sulle frasi di input utilizzate nella sezione 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Verificate di ottenere gli stessi ID di input che abbiamo ottenuto in precedenza!
 
 ## Decodifica
 
diff --git a/chapters/it/chapter2/5.mdx b/chapters/it/chapter2/5.mdx
index 833f10a67..659ac5403 100644
--- a/chapters/it/chapter2/5.mdx
+++ b/chapters/it/chapter2/5.mdx
@@ -181,11 +181,8 @@ batched_ids = [ids, ids]
 
 Si tratta di un batch di due sequenze identiche!
 
-<Tip>
-
-✏️ **Try it out!** Convert this `batched_ids` list into a tensor and pass it through your model. Check that you obtain the same logits as before (but twice)!
-
-</Tip>
+> [!TIP]
+> ✏️ **Try it out!** Convert this `batched_ids` list into a tensor and pass it through your model. Check that you obtain the same logits as before (but twice)!
 
 Il batching consente al modello di funzionare quando si inseriscono più frasi. Utilizzare più sequenze è altrettanto semplice che creare un batch con una singola sequenza. C'è però un secondo problema. Quando si cerca di raggruppare due (o più) frasi, queste potrebbero essere di lunghezza diversa. Se si è già lavorato con i tensori, si sa che devono essere di forma rettangolare, quindi non è possibile convertire direttamente l'elenco degli ID in ingresso in un tensore. Per ovviare a questo problema, di solito, utilizziamo la tecnica del *padding* sugli input.
 
@@ -317,11 +314,8 @@ Ora otteniamo gli stessi logits per la seconda frase del batch.
 
 Si noti che l'ultimo valore della seconda sequenza è un ID di riempimento, che è un valore 0 nella maschera di attenzione.
 
-<Tip>
-
-✏️ **Provaci anche tu** Applicate manualmente la tokenizzazione alle due frasi utilizzate nella sezione 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Passatele attraverso il modello e verificate che si ottengano gli stessi logits della sezione 2. A questo punto, batchateli insieme utilizzando il token di padding e successivamente create la maschera di attenzione appropriata. Verificate di ottenere gli stessi risultati passando attraverso il modello!
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci anche tu!** Applicate manualmente la tokenizzazione alle due frasi utilizzate nella sezione 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Passatele attraverso il modello e verificate che si ottengano gli stessi logits della sezione 2. A questo punto, batchateli insieme utilizzando il token di padding e successivamente create la maschera di attenzione appropriata. Verificate di ottenere gli stessi risultati passando attraverso il modello!
 
 ## Sequenze più lunghe
 
diff --git a/chapters/it/chapter3/2.mdx b/chapters/it/chapter3/2.mdx
index b8c8dff85..c04650429 100644
--- a/chapters/it/chapter3/2.mdx
+++ b/chapters/it/chapter3/2.mdx
@@ -89,9 +89,8 @@ L'Hub non contiene solo modelli; contiene anche molti dataset in tante lingue di
 
 La libreria 🤗 Datasets fornisce un comando molto semplice per scaricare e mettere nella cache un dataset sull'Hub. Il dataset MRPC può essere scaricato così:
 
-<Tip>
-⚠️ **Attenzione** Assicurati che `datasets` sia installato eseguendo `pip install datasets`. Poi, carica il dataset MRPC e stampalo per vedere cosa contiene.
-</Tip>
+> [!TIP]
+> ⚠️ **Attenzione** Assicurati che `datasets` sia installato eseguendo `pip install datasets`. Poi, carica il dataset MRPC e stampalo per vedere cosa contiene.
 
 ```py
 from datasets import load_dataset
@@ -151,11 +150,8 @@ raw_train_dataset.features
 
 Dietro le quinte, `label` è del tipo  `ClassLabel`, e la corrispondenza tra i numeri e i nomi delle label è contenuta nella cartella *names*. `0` corrisponde a `not_equivalent` (significato diverso), e `1` corrisponde a `equivalent` (stesso significato).
 
-<Tip>
-
-✏️ **Prova tu!** Quali sono le label dell'elemento 15 del training set, e 87 del validation set?
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Quali sono le label dell'elemento 15 del training set, e 87 del validation set?
 
 ### Preprocessing del dataset
 
@@ -193,11 +189,8 @@ inputs
 
 Sono state già discusse nel [Capitolo 2](/course/chapter2) le chiavi `input_ids` e `attention_mask`, ma il discorso su `token_type_ids` era stato rimandato. In questo esempio, ciò può essere usato per indicare al modello quale parte dell'input è la prima frase, e quale la seconda. 
 
-<Tip>
-
-✏️ **Prova tu!** Prendere l'element 15 del training set e tokenizzare le due frasi sia separatamente, sia come coppia. Qual è la differenza tra i due risultati?
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Prendere l'elemento 15 del training set e tokenizzare le due frasi sia separatamente, sia come coppia. Qual è la differenza tra i due risultati?
 
 Decodificando gli ID in `input_ids` per ritrasformarli in parole:
 
@@ -354,11 +347,8 @@ Ottimo! Adesso che siamo passati dal testo grezzo a dei batch che il modello è
 
 {/if}
 
-<Tip>
-
-✏️ **Prova tu!** Replicare il preprocessing sul dataset GLUE SST-2. È leggermente diverso poiche è composto da frasi singole e non da coppie di frasi, ma il resto della procedura dovrebbe essere simile. Per una sfida più complessa, provare a scrivere una funzione di preprocessing che funzioni per qualsiasi dei compiti in GLUE. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Replicare il preprocessing sul dataset GLUE SST-2. È leggermente diverso poiché è composto da frasi singole e non da coppie di frasi, ma il resto della procedura dovrebbe essere simile. Per una sfida più complessa, provare a scrivere una funzione di preprocessing che funzioni per qualsiasi dei compiti in GLUE.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/it/chapter3/3.mdx b/chapters/it/chapter3/3.mdx
index 643d014e4..878a546ef 100644
--- a/chapters/it/chapter3/3.mdx
+++ b/chapters/it/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Se si vuole caricare automaticamente il modello all'Hub durante l'addestramento, basta passare `push_to_hub=True` come parametro nei `TrainingArguments`. Maggiori dettagli verranno forniti nel [Capitolo 4](/course/chapter4/3).
-
-</Tip>
+> [!TIP]
+> 💡 Se si vuole caricare automaticamente il modello all'Hub durante l'addestramento, basta passare `push_to_hub=True` come parametro nei `TrainingArguments`. Maggiori dettagli verranno forniti nel [Capitolo 4](/course/chapter4/3).
 
 Il secondo passo è definire il modello. Come nel [capitolo precedente](/course/chapter2), utilizzeremo la classe `AutoModelForSequenceClassification` con due label:
 
@@ -164,9 +161,6 @@ Il `Trainer` funzionerà direttamente su svariate GPU e TPU e ha molte opzioni,
 
 Qui si conclude l'introduzione all'affinamento usando l'API del `Trainer`. Esempi per i compiti più comuni in NLP verranno forniti nel Capitolo 7, ma per ora vediamo come ottenere la stessa cosa usando puramente PyTorch.
 
-<Tip>
-
-✏️ **Prova tu!** Affinare un modello sul dataset GLUE SST-2 utilizzando il processing dei dati già fatto nella sezione 2. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Affinare un modello sul dataset GLUE SST-2 utilizzando il processing dei dati già fatto nella sezione 2.
 
diff --git a/chapters/it/chapter3/3_tf.mdx b/chapters/it/chapter3/3_tf.mdx
index fbf82e28d..567399dc8 100644
--- a/chapters/it/chapter3/3_tf.mdx
+++ b/chapters/it/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ Diversamente dal [Capitolo 2](/course/chapter2), un avviso di avvertimento verr
 
 Per affinare il modello sul dataset, bisogna solo chiamare `compile()` (compila) sul modello e passare i dati al metodo `fit()`. Ciò farà partire il processo di affinamento (che dovrebbe richiedere un paio di minuti su una GPU) e fare il report della funzione obiettivo di addestramento, in aggiunta alla funzione obiettivo di validazione alla fine di ogni epoca. 
 
-<Tip>
-
-I modelli 🤗 Transformers hanno un'abilità speciale che manca alla maggior parte dei modelli Keras – possono usare in maniera automatica una funzione obiettivo appropriata, calcolata internamente. Questa funzione obiettivo verrà usata di default a meno che non venga definito l'argomento di funzione obiettivo nel metodo `compile()`. Per usare la funzione obiettivo interna è necessario passare le etichette come parte dell'input, non separatamente, che è l'approccio normale con i modelli Keras. Verranno mostrati esempi di ciò nella Parte 2 del corso, dove definire la funzione obiettivo correttamente può essere difficile. Per la classificazione di sequenze, invece, la funzione obiettivo standard di Keras funziona bene, quindi verrà utilizzata quella. 
-
-</Tip>
+> [!TIP]
+> I modelli 🤗 Transformers hanno un'abilità speciale che manca alla maggior parte dei modelli Keras – possono usare in maniera automatica una funzione obiettivo appropriata, calcolata internamente. Questa funzione obiettivo verrà usata di default a meno che non venga definito l'argomento di funzione obiettivo nel metodo `compile()`. Per usare la funzione obiettivo interna è necessario passare le etichette come parte dell'input, non separatamente, che è l'approccio normale con i modelli Keras. Verranno mostrati esempi di ciò nella Parte 2 del corso, dove definire la funzione obiettivo correttamente può essere difficile. Per la classificazione di sequenze, invece, la funzione obiettivo standard di Keras funziona bene, quindi verrà utilizzata quella.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Attenzione ad un errore comune — si *può* passare solo il nome della funzione obiettivo a Keras come una stringa, ma di default Keras si aspetta che softmax sia già stato applicato ai risultati. Molti modelli invece forniscono come risultato i valori prima dell'applicazione del softmax, chiamati *logits*. Bisogna informare la funzione obiettivo che il nostro modello fa esattamente questo, e il solo modo di farlo è invocandola direttamente, non tramite la stringa che rappresenta il suo nome. 
-
-</Tip>
+> [!WARNING]
+> Attenzione ad un errore comune — si *può* passare solo il nome della funzione obiettivo a Keras come una stringa, ma di default Keras si aspetta che softmax sia già stato applicato ai risultati. Molti modelli invece forniscono come risultato i valori prima dell'applicazione del softmax, chiamati *logits*. Bisogna informare la funzione obiettivo che il nostro modello fa esattamente questo, e il solo modo di farlo è invocandola direttamente, non tramite la stringa che rappresenta il suo nome.
 
 
 ### Migliorare la performance di addestramento
@@ -122,11 +116,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-La libreria 🤗 Transformers fornisce anche una funzione `create_optimizer()` che crea un ottimizzatore `AdamW` con decadimento della learning rate. Questa può essere una scorciatoia utile che verrà presentata nel dettaglio nelle sezioni future del corso.
-
-</Tip>
+> [!TIP]
+> La libreria 🤗 Transformers fornisce anche una funzione `create_optimizer()` che crea un ottimizzatore `AdamW` con decadimento della learning rate. Questa può essere una scorciatoia utile che verrà presentata nel dettaglio nelle sezioni future del corso.
 
 Adesso che abbiamo il nostro ottimizzatore nuovo di zecca, possiamo provare con un addestramento. Per prima cosa, ricarichiamo il modello, per resettare i cambiamenti ai pesi dall'addestramento precedente, dopodiché lo possiamo compilare con nuovo ottimizzatore.
 
@@ -144,11 +135,8 @@ Ora chiamiamo di nuovo fit
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Se vuoi caricare il modello in maniera automatica all'Hub durante l'addestramento, puoi passare `PushToHubCallback` al metodo `model.fit()`. Maggiori dettagli verranno forniti nel [Capitolo 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 Se vuoi caricare il modello in maniera automatica all'Hub durante l'addestramento, puoi passare `PushToHubCallback` al metodo `model.fit()`. Maggiori dettagli verranno forniti nel [Capitolo 4](/course/chapter4/3)
 
 ### Predizioni del modello
 
diff --git a/chapters/it/chapter3/4.mdx b/chapters/it/chapter3/4.mdx
index 74f0e311c..e0836f604 100644
--- a/chapters/it/chapter3/4.mdx
+++ b/chapters/it/chapter3/4.mdx
@@ -197,11 +197,8 @@ metric.compute()
 
 Ancora una volta, i vostri risultati potrebbero essere leggermente diversi a causa della casualità nell'inizializzazione della testa del modello e del ricombinamento dei dati, ma dovrebbero essere nello stesso ordine di grandezza.
 
-<Tip>
-
-✏️ **Prova tu!** Modifica il ciclo di addestramento precedente per affinare il modello sul dataset SST-2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Modifica il ciclo di addestramento precedente per affinare il modello sul dataset SST-2.
 
 ### Potenzia il tuo ciclo di addestramento con 🤗 Accelerate
 
@@ -293,9 +290,8 @@ Prima di tutto bisogna inserire la linea di importazione. La seconda linea istan
 
 Dopodiché la maggior parte del lavoro è fatta dalla linea che invia i dataloaders, il modello e gli ottimizzatori a `accelerator.prepare()`. Ciò serve a incapsulare quegli oggetti nei contenitori appropriati per far sì che l'addestramento distribuito funzioni correttamente. I cambiamenti rimanenti sono la rimozione della linea che sposta la batch sul `device` (dispositivo) (di nuovo, se volete tenerlo potete semplicemente cambiarlo con `accelerator.device`) e lo scambio di `loss.backward()` con `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ Per poter beneficiare dell'accelerazione offerta da Cloud TPUs, è raccomandabile applicare padding ad una lunghezza fissa tramite gli argomenti `padding="max_length"` e `max_length` del tokenizer.
-</Tip>
+> [!TIP]
+> ⚠️ Per poter beneficiare dell'accelerazione offerta da Cloud TPUs, è raccomandabile applicare padding ad una lunghezza fissa tramite gli argomenti `padding="max_length"` e `max_length` del tokenizer.
 
 Se volete copiare e incollare il codice per giocarci, ecco un ciclo di addestramento completo che usa 🤗 Accelerate:
 
diff --git a/chapters/it/chapter4/2.mdx b/chapters/it/chapter4/2.mdx
index dcee59816..005f6df49 100644
--- a/chapters/it/chapter4/2.mdx
+++ b/chapters/it/chapter4/2.mdx
@@ -91,6 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-Quando usate un modello pre-addestrato, assicuratevi di controllare come è stato addestrato, su quali dataset, i suoi limiti e i suoi bias. Tutte queste informazioni dovrebbero essere indicate sul cartellino del modello.
-</Tip>
+> [!TIP]
+> Quando usate un modello pre-addestrato, assicuratevi di controllare come è stato addestrato, su quali dataset, i suoi limiti e i suoi bias. Tutte queste informazioni dovrebbero essere indicate sul cartellino del modello.
diff --git a/chapters/it/chapter4/3.mdx b/chapters/it/chapter4/3.mdx
index 0a531c7fd..bb68bed50 100644
--- a/chapters/it/chapter4/3.mdx
+++ b/chapters/it/chapter4/3.mdx
@@ -172,11 +172,8 @@ Cliccando sulla scheda "Files and versions" dovreste vedere la lista dei file ca
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Prova tu!** Prendi il modello e il tokenizer associati con il cehckpoint `bert-base-cased` e caricali in un repository nel tuo namespace usando il metodo `push_to_hub()`. Verifica che il repository appaia correttamente sulla tua pagina prima di cancellarlo.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Prendi il modello e il tokenizer associati con il checkpoint `bert-base-cased` e caricali in un repository nel tuo namespace usando il metodo `push_to_hub()`. Verifica che il repository appaia correttamente sulla tua pagina prima di cancellarlo.
 
 Come avete visto, il metodo `push_to_hub()` accetta numerosi parametri, rendendo possibile caricare i file su uno specifico repository o in un namespace di una organizzazione, o utilizzare un qualunque API token. Consigliamo di leggere la documentazione disponibile alla pagina [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html) per farsi un'idea di tutte le possibilità offerte dal metodo.
 
@@ -466,10 +463,9 @@ Guardando le dimensioni dei file (ad esempio con `ls -lh`), possiamo vedere che
 
 {/if}
 
-<Tip>
-✏️ When creating the repository from the web interface, the *.gitattributes* file is automatically set up to consider files with certain extensions, such as *.bin* and *.h5*, as large files, and git-lfs will track them with no necessary setup on your side.
-✏️ Creando il reposiotry dall'interfaccia web, il file *.gitattributes*  viene automaticamente configurato per considerare file con alcune estensioni, come *.bin* e *.h5*, come file grandi, e git-lfs li traccerà senza necessità di configurazione da parte dell'utente.
-</Tip> 
+> [!TIP]
+> ✏️ When creating the repository from the web interface, the *.gitattributes* file is automatically set up to consider files with certain extensions, such as *.bin* and *.h5*, as large files, and git-lfs will track them with no necessary setup on your side.
+> ✏️ Creando il repository dall'interfaccia web, il file *.gitattributes* viene automaticamente configurato per considerare file con alcune estensioni, come *.bin* e *.h5*, come file grandi, e git-lfs li traccerà senza necessità di configurazione da parte dell'utente.
 
 Possiamo quindi procedere come faremo per un repository Git tradizionale. Possiamo aggiungere tutti i file all'ambiente di staging di Git con il comando `git add`:
 
diff --git a/chapters/it/chapter5/2.mdx b/chapters/it/chapter5/2.mdx
index c3ead7ad6..4672fb617 100644
--- a/chapters/it/chapter5/2.mdx
+++ b/chapters/it/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 Vediamo che i dati compressi sono stati sostituiti da _SQuAD_it-train.json_ e _SQuAD_it-test.json_, e che i dati sono archiviati in formato JSON.
 
-<Tip>
-
-✎ Se ti stai chiedendo perché c'è un `!` nei comandi di shell precedenti, è perché li stiamo eseguendo da un notebook Jupyter. Se vuoi scaricare e decomprimere i dataset da un terminale, non devi fare altro che rimuovere il prefisso.
-
-</Tip>
+> [!TIP]
+> ✎ Se ti stai chiedendo perché c'è un `!` nei comandi di shell precedenti, è perché li stiamo eseguendo da un notebook Jupyter. Se vuoi scaricare e decomprimere i dataset da un terminale, non devi fare altro che rimuovere il prefisso.
 
 Per caricare un file JSON con la funzione `load_dataset()`, ci serve solo sapere se abbiamo a che fare con un normale JSON (simile a un dizionario annidato) o con un JSON Lines (JSON separato da righe). Come molti dataset per il question answering, SQuAD-it usa il formato annidato, con tutto il testo immagazzinato nel campo `data`. Questo significa che possiamo caricare il dataset specificando l'argomento `field` come segue:
 
@@ -126,11 +123,8 @@ DatasetDict({
 
 Questo è proprio ciò che volevamo. Ora possiamo applicare diverse tecniche di preprocessamento per pulire i dati, tokenizzare le recensioni, e altro.
 
-<Tip>
-
-L'argomento `data_files` della funzione `load_dataset()` è molto flessibile, e può essere usato con un percorso file singolo, con una lista di percorsi file, o un dizionario che mappa i nomi delle sezioni ai percorsi file. È anche possibile usare comandi glob per recuperare tutti i file che soddisfano uno specifico pattern secondo le regole dello shell di Unix (ad esempio, è possibile recuperare tutti i file JSON presenti in una cartella usando il pattern `data_files="*.json"`). Consulta la [documentazione](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets per maggiori informazioni.
-
-</Tip>
+> [!TIP]
+> L'argomento `data_files` della funzione `load_dataset()` è molto flessibile, e può essere usato con un percorso file singolo, con una lista di percorsi file, o un dizionario che mappa i nomi delle sezioni ai percorsi file. È anche possibile usare comandi glob per recuperare tutti i file che soddisfano uno specifico pattern secondo le regole dello shell di Unix (ad esempio, è possibile recuperare tutti i file JSON presenti in una cartella usando il pattern `data_files="*.json"`). Consulta la [documentazione](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets per maggiori informazioni.
 
 Gli script presenti in 🤗 Datasets supportano la decompressione automatica dei file in input, quindi possiamo saltare l'uso di `gzip` puntando `data_files` direttamente ai file compressi:
 
@@ -158,10 +152,7 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 Questo codice restituisce lo stesso oggetto `DatasetDict` visto in precedenza, ma ci risparmia il passaggio manuale di scaricare e decomprimere i file _SQuAD_it-*.json.gz_. Questo conclude la nostra incursione nei diversi modi di caricare dataset che non sono presenti nell'Hub Hugging Face. Ora che abbiamo un dataset con cui giocare, sporchiamoci le mani con diverse tecniche di data-wrangling!
 
-<Tip>
-
-✏️ **Prova tu!** Scegli un altro dataset presente su GitHub o sulla [Repository di Machine Learning UCI](https://archive.ics.uci.edu/ml/index.php) e cerca di caricare sia in locale che in remoto usando le tecniche introdotte in precedenza. Per punti extra, prova a caricare un dataset archiviato in formato CSV o testuale (vedi la [documentazione](https://huggingface.co/docs/datasets/loading#local-and-remote-files) per ulteriori informazioni su questi formati).
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Scegli un altro dataset presente su GitHub o sulla [Repository di Machine Learning UCI](https://archive.ics.uci.edu/ml/index.php) e cerca di caricare sia in locale che in remoto usando le tecniche introdotte in precedenza. Per punti extra, prova a caricare un dataset archiviato in formato CSV o testuale (vedi la [documentazione](https://huggingface.co/docs/datasets/loading#local-and-remote-files) per ulteriori informazioni su questi formati).
 
 
diff --git a/chapters/it/chapter5/3.mdx b/chapters/it/chapter5/3.mdx
index 779174afc..70fe152e0 100644
--- a/chapters/it/chapter5/3.mdx
+++ b/chapters/it/chapter5/3.mdx
@@ -90,11 +90,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Prova tu!** Usa la funzione `Dataset.unique()` per trovare il numero di medicine diverse e condizioni nelle sezioni di addestramento e di test.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Usa la funzione `Dataset.unique()` per trovare il numero di medicine diverse e condizioni nelle sezioni di addestramento e di test.
 
 Ora, normalizziamo le etichette in `condition` utilizzando `Dataset.map()`. Così come abbiamo fatto con la tokenizzazione nel [Capitolo 3](/course/chapter3), possiamo definire una semplice funzione che può essere applicata a tutte le righe di ogni sezione nel `drug_dataset`:
 
@@ -220,11 +217,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Come sospettato, alcune recensioni contengono una sola parola che, benché possa essere utile per la sentiment analysis, non dà informazioni utili per predire la condizione.
 
-<Tip>
-
-🙋Un altro modo per aggiungere nuove colonne a un dataset è attraverso la funzione `Dataset.add_column()`. Questo ti permette di inserire le colonne come una lista Python o unarray NumPy, e può tornare utile in situazioni in cui `Dataset.map()` non è indicata per le tue analisi. 
-
-</Tip>
+> [!TIP]
+> 🙋 Un altro modo per aggiungere nuove colonne a un dataset è attraverso la funzione `Dataset.add_column()`. Questo ti permette di inserire le colonne come una lista Python o un array NumPy, e può tornare utile in situazioni in cui `Dataset.map()` non è indicata per le tue analisi.
 
 Usiamo la funzione `Dataset.filter()` per rimuovere le recensioni che contengono meno di 30 parole. Proprio come abbiamo fatto per la colonna `condition`, possiamo eliminare le recensioni più brevi aggiungendo un filtro che lascia passare solo le recensioni più lunghe di una certa soglia:
 
@@ -239,11 +233,8 @@ print(drug_dataset.num_rows)
 
 Come puoi vedere, questo ha rimosso circa il 15% delle recensioni nelle sezioni di training e di test.
 
-<Tip>
-
-✏️ **Prova tu!** Usa la funzione `Dataset.sort()` per analizzare le revisioni con il maggior numero di parole. Controlla la [documentazione](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) per vedere quali argomenti bisogna usare per ordinare le recensioni in ordine decrescente di lunghezza.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Usa la funzione `Dataset.sort()` per analizzare le recensioni con il maggior numero di parole. Controlla la [documentazione](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) per vedere quali argomenti bisogna usare per ordinare le recensioni in ordine decrescente di lunghezza.
 
 L'ultima cosa che ci resta da risolvere è la presenza di codici HTML di caratteri nelle nostre recensioni. Possiamo usare il modulo Python `html` per sostituirli, così:
 
@@ -299,10 +290,8 @@ Come visto nel [Capitolo 3](/course/chapter3), possiamo passare uno o più esemp
 
 Possiamo cronometrare anche un'intera cella inserendo `%%time` all'inizio della cella. Sull'hardware che stiamo utilizzando, mostrava 10.8s per quest'istruzione (è il numero scritto dopo "Wall time").
 
-<Tip>
-
-✏️ **Prova tu!** Esegui la stessa istruzione con e senza `batched=True`, poi prova con un tokenizzatore lento (aggiungi `add_fast=False` al metodo `AutoTokenizer.from_pretrained()`) così che puoi controllare i tempi sul tuo hardware.
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Esegui la stessa istruzione con e senza `batched=True`, poi prova con un tokenizzatore lento (aggiungi `use_fast=False` al metodo `AutoTokenizer.from_pretrained()`) così da poter controllare i tempi sul tuo hardware.
 
 Ecco i risultati che otteniamo con e senza utilizzare batch, con un tokenizzatore lento e uno veloce:
 
@@ -339,19 +328,13 @@ Opzioni                       | Tokenizzatore veloce | Tokenizzatore lento
 
 Questi sono dei risultati molto più accettabili per il tokenizzatore lento, ma anche la performance dei tokenizzatori veloci è notevolmente migliorata. Notare, comunque, che non è sempre questo il caso: per valori di `num_proc` diversi da 8, i nostri test hanno mostrato che è più veloce utilizzare `batched=True` senza l'opzione `num_proc`. In generale, non raccomandiamo l'utilizzo di multiprocessing Python per tokenizzatori veloci con `batched=True`. 
 
-<Tip>
-
-Utilizzare `num_proc` per accelerare i processi è generalmente una buona idea, a patto che la funzione che stai utilizzando non stia già usando un qualche tipo di multiprocessing per conto proprio.
-
-</Tip>
+> [!TIP]
+> Utilizzare `num_proc` per accelerare i processi è generalmente una buona idea, a patto che la funzione che stai utilizzando non stia già usando un qualche tipo di multiprocessing per conto proprio.
 
 Tutte queste funzionalità condensate in un unico metodo sono già molto utili, ma c'è altro! Con `Dataset.map()` e `batched=True`, è possibile modificare il numero di elementi nel tuo dataset. È particolarmente utile quando vuoi creare diverse feature di addestramento da un unico esempio, e ne avremo bisogno come parte di preprocessing per molti dei task NLP che affronteremo nel [Capitolo 7](/course/chapter7). 
 
-<Tip>
-
-💡 Nel machine learning, un _esempio_ è solitamente definito come un insieme di _feature_ che diamo in pasto al modello. In alcuni contesti, queste feature saranno l'insieme delle colonne in un `Dataset`, ma in altri casi (come ad esempio questo, o per il question answering), molte feature possono essere estratte da un singolo esempio, e appartenere a una sola colonna.
-
-</Tip>
+> [!TIP]
+> 💡 Nel machine learning, un _esempio_ è solitamente definito come un insieme di _feature_ che diamo in pasto al modello. In alcuni contesti, queste feature saranno l'insieme delle colonne in un `Dataset`, ma in altri casi (come ad esempio questo, o per il question answering), molte feature possono essere estratte da un singolo esempio, e appartenere a una sola colonna.
 
 Diamo un'occhiata a come funziona! Tokenizziamo i nostri esempi e tronchiamoli a una lunghezza massima di 128, ma chiediamo al tokenizzatore di restituire *tutti* i pezzi di testo e non solo il primo. Questo può essere fatto con `return_overflowing_tokens=True`:
 
@@ -520,11 +503,8 @@ Creiamo un `pandas.DataFrame` per l'intero set di addestramento selezionando tut
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 Dietro le quinte, `Dataset.set_format()` modifica il formato di restituzione del meteodo dunder `__getitem__()` del dataset. Questo significa che quando vogliamo creare un nuovo oggetto come ad esempio `train_df` da un `Dataset` in formato `"pandas"`, abbiamo bisogno di suddividere l'intero dataset per ottenere un `pandas.DataFrame`. Puoi verificare da te che `drug_dataset["train"]` ha come tipo `Dataset`, a prescindere dal formato di output.
-
-</Tip>
+> [!TIP]
+> 🚨 Dietro le quinte, `Dataset.set_format()` modifica il formato di restituzione del metodo dunder `__getitem__()` del dataset. Questo significa che quando vogliamo creare un nuovo oggetto come ad esempio `train_df` da un `Dataset` in formato `"pandas"`, abbiamo bisogno di suddividere l'intero dataset per ottenere un `pandas.DataFrame`. Puoi verificare da te che `drug_dataset["train"]` ha come tipo `Dataset`, a prescindere dal formato di output.
 
 Da qui possiamo usare tutte le funzionalità Pandas che vogliamo. Ad esempio, possiamo creare un concatenamento per calcolare la distribuzione delle classi nelle voci `condition`:
 
@@ -593,11 +573,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **Prova tu!** Calcola la valutazione media per i medicinali, e salviamo i risultati in un nuovo `Dataset`. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Calcola la valutazione media per i medicinali, e salva i risultati in un nuovo `Dataset`.
 
 Questo conclude il nostro tour delle diverse tecniche di prepocessamento disponibile in 🤗 Datasets. Per riepilogare la sezione, creiamo un set di validazione per preparare il dataset su cui addestreremo un classificatore. Prima di far ciò, resettiamo il formato di output di `drug_dataset` da `"pandas"` a `"arrow"`:
 
diff --git a/chapters/it/chapter5/4.mdx b/chapters/it/chapter5/4.mdx
index ad2f6e191..127500153 100644
--- a/chapters/it/chapter5/4.mdx
+++ b/chapters/it/chapter5/4.mdx
@@ -45,11 +45,8 @@ Dataset({
 
 Possiamo vedere che ci sono 15.518.009 righe e 2 colonne nel nostro dataset -- un bel po'!
 
-<Tip>
-
-✎ Di base, 🤗 Datasets decomprimerà i file necessari a caricare un dataset. Se vuoi risparmiare sullo spazio dell'hard disk, puoi passare `DownloadConfig(delete_extracted_True)` all'argomento `download_config` di `load_dataset()`. Per maggiori dettagli leggi la [documentazione](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig).
-
-</Tip>
+> [!TIP]
+> ✎ Di base, 🤗 Datasets decomprimerà i file necessari a caricare un dataset. Se vuoi risparmiare sullo spazio dell'hard disk, puoi passare `DownloadConfig(delete_extracted=True)` all'argomento `download_config` di `load_dataset()`. Per maggiori dettagli leggi la [documentazione](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig).
 
 Ispezioniamo i contenuti del primo esempio:
 
@@ -100,11 +97,8 @@ Dataset size (cache file) : 19.54 GB
 
 Bene -- nonostante sia grande quasi 30 GB, siamo in grado di caricare e accedere al dataset utilizzando molta meno RAM!
 
-<Tip>
-
-✏️ **Provaci tu!** Scegli uno dei [subset](https://the-eye.eu/public/AI/pile_preliminary_components/) di Pile che è più grande della RAM del tuo PC o del tuo portatile, caricalo utilizzando 🤗 Datasets e calcola la quantità di RAM utilizzata. Nota che per avere un valore preciso, dovrai creare un nuovo processo. Puoi trovare le grandezze decompresse di ogni subset nella Tavola 1 dell'[articolo su Pile](https://arxiv.org/abs/2101.00027)
-
-</Tip>
+> [!TIP]
+> ✏️ **Provaci tu!** Scegli uno dei [subset](https://the-eye.eu/public/AI/pile_preliminary_components/) di Pile che è più grande della RAM del tuo PC o del tuo portatile, caricalo utilizzando 🤗 Datasets e calcola la quantità di RAM utilizzata. Nota che per avere un valore preciso, dovrai creare un nuovo processo. Puoi trovare le grandezze decompresse di ogni subset nella Tabella 1 dell'[articolo su Pile](https://arxiv.org/abs/2101.00027).
 
 Se hai dimestichezza con Pandas, questo risultato potrebbe sorprenderti, vista la famosa [regola di Wes Kinney](https://wesmckinney.com/blog/apache-arrow-pandas-internals/), ovvero che, in linea di massima, serve una RAM 5-10 volte più grande del dataset che vuoi caricare. Come fa 🤗 Datasets a risolvere questo problema di gestione della memoria? 🤗 Datasets tratta ogni dataset come un [file mappato in memoria](https://it.wikipedia.org/wiki/File_mappato_in_memoria), il che permette di avere un mapping tra la RAM e l'archiviazione dei file di sistema, che permette alla librera di accedere e operare su elementi del dataset senza doverli caricare completamente in memoria.
 
@@ -132,11 +126,8 @@ print(
 
 Abbiamo usato il modulo di Python `timeit` per calcolare il tempo di esecuzione impiegato da `code_snippet`. Tipicamente l'iterazione su un dataset impiega un tempo che va da un decimo di GB al secondo, a diversi GB al secondo. Questo funziona perfettamente per la maggior parte delle applicazioni, ma a volte avrai bisogno di lavorare con un dataset che è troppo grande persino per essere salvato sul tuo portatile. Ad esempio, se cercassimo di scaricare Pile per intero, avremo bisogno di 825 GB di spazio libero su disko! In questi casi, 🤗 Datasets permette di utilizzare processi di streaming che ci permettono di scaricare e accedere al volo ai dati, senza bisogno di scaricare l'intero dataset. Diamo un'occhiata a come funziona.
 
-<Tip>
-
-💡 Nei notebook Jupyter, puoi cronometrare le celle utilizzando la [funzione magica `%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)
-
-</Tip>
+> [!TIP]
+> 💡 Nei notebook Jupyter, puoi cronometrare le celle utilizzando la [funzione magica `%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Streaming di dataset
 
@@ -173,11 +164,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Per velocizzare la tokenizzazione con lo streaming puoi passare `batchet=True`, come abbiamo visto nell'ultima sezione. Questo processerà gli esempi per batch. Di default, la grandezza di un batch è 1.000, e può essere specificata attraverso l'argomento `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Per velocizzare la tokenizzazione con lo streaming puoi passare `batched=True`, come abbiamo visto nell'ultima sezione. Questo processerà gli esempi per batch. Di default, la grandezza di un batch è 1.000, e può essere specificata attraverso l'argomento `batch_size`.
 
 È anche possibile mescolare un dataset in streaming utilizzato `Iterabledataset.shuffle()`, ma a differenza di `Dataset.shuffle()`, questo metodo mescola solo gli elementi in un `buffer_size` predefinito:
 
@@ -279,10 +267,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Prova tu!** Usa uno dei corpora Common Crawl come [`mc4`](https://huggingface.co/datasets/mc4) oppure [`oscar`](https://huggingface.co/datasets/oscar) per crare un dataset multilingue in streaming, che rappresenta le proporzioni delle lingue parlate in un paese a tua scelta. Ad esempio, le quattro lingue ufficiali in Svizzera sono il tedesco, il francesce, l'italiano e il romancio, per cui potresti creare un corpus della Svizzera raccogliendo i campioni da Oscar, secondo la percentuale di parlanti di ognuna.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Usa uno dei corpora Common Crawl come [`mc4`](https://huggingface.co/datasets/mc4) oppure [`oscar`](https://huggingface.co/datasets/oscar) per creare un dataset multilingue in streaming, che rappresenti le proporzioni delle lingue parlate in un paese a tua scelta. Ad esempio, le quattro lingue ufficiali in Svizzera sono il tedesco, il francese, l'italiano e il romancio, per cui potresti creare un corpus della Svizzera raccogliendo i campioni da Oscar, secondo la percentuale di parlanti di ognuna.
 
 Ora hai a tua disposizione tutti gli strumenti per caricare e processare dataset di ogni tipo -- ma a meno che tu non sia estremamente fortunato, arriverà un momento nel tuo cammino in cui dovrai effettivamente creare un dataset per risolvere i tuoi problemi. Questo sarà argomento della prossima sezione!
diff --git a/chapters/it/chapter5/5.mdx b/chapters/it/chapter5/5.mdx
index a9246a463..2a071699d 100644
--- a/chapters/it/chapter5/5.mdx
+++ b/chapters/it/chapter5/5.mdx
@@ -113,11 +113,8 @@ response.json()
 
 Wow, quante informazioni! Possiamo vedere alcuni campi utili come `title`, `body` e `number` che descrivono l'issue, così come informazioni sull'utente che l'ha aperto.
 
-<Tip>
-
-✏️ **Prova tu!** Clicca su alcuni degli URL nel payload JSON per farti un'idea del tipo di informazione a cui è collegato ogni issue GitHub.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Clicca su alcuni degli URL nel payload JSON per farti un'idea del tipo di informazione a cui è collegato ogni issue GitHub.
 
 Come descritto nella [documentazione di GitHub](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting), le richieste senza autenticazione sono limitate a 60 ogni ora. Benché possiamo aumentare il parametro della query `per_page` per ridurre il numero di richieste, raggiungerai comunque il limite su qualunque repository che ha qualche migliaio di issue. Quindi, dovresti seguire le [istruzioni](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) su come creare un _token di accesso personale_ così che puoi aumentare il limite a 5.000 richieste ogni ora. Una volta che hai ottenuto il tuo token, puoi includerlo come parte dell'header della richiesta:
 
@@ -126,11 +123,8 @@ GITHUB_TOKEN = xxx  # inserisci qui il tuo token GitHub
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ Fai attenzione a non condividere un notebook con il tuo `GITHUB_TOKEN` al suo interno. Ti consigliamo di cancellare l'ultima cella una volta che l'hai eseguita per evitare di far trapelare quest'informazione accidentalmente. Meglio ancora, salva il tuo token in un file *.env* e usa la [libreria `python-dotenv`](https://github.com/theskumar/python-dotenv) per caricarlo automaticamente come una variabile d'ambiente.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Fai attenzione a non condividere un notebook con il tuo `GITHUB_TOKEN` al suo interno. Ti consigliamo di cancellare l'ultima cella una volta che l'hai eseguita per evitare di far trapelare quest'informazione accidentalmente. Meglio ancora, salva il tuo token in un file *.env* e usa la [libreria `python-dotenv`](https://github.com/theskumar/python-dotenv) per caricarlo automaticamente come una variabile d'ambiente.
 
 Ora che abbiamo il nostro token di accesso, creiamo una funzione che scarichi tutti gli issue da una repository GitHub:
 
@@ -239,11 +233,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **Prova tu!** Calcola il tempo medio che ci vuole a chiudere un issue su 🤗 Datasets. Potrebbe essere utile usare la funzione `Dataset.filter()` per eliminare le richieste di pull e gli issue aperti, e puoi usare la funzione `Dataset.set_format()` per convertire il dataset in un `DataFrame` così che puoi facilmente manipolare i timestamp `created_at` e `closed_at`. Per dei punti bonus, calcola il tempo medio che ci vuole a chiudere le richieste di pull.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Calcola il tempo medio che ci vuole a chiudere un issue su 🤗 Datasets. Potrebbe essere utile usare la funzione `Dataset.filter()` per eliminare le richieste di pull e gli issue aperti, e puoi usare la funzione `Dataset.set_format()` per convertire il dataset in un `DataFrame` così da poter manipolare facilmente i timestamp `created_at` e `closed_at`. Per dei punti bonus, calcola il tempo medio che ci vuole a chiudere le richieste di pull.
 
 Benché potremmo procedere e pulire ulteriormente il dataset eliminando o rinominando alcune colonne, è solitamente buona prassi lasciare il dataset quando più intatto è possibile in questo stadio, così che può essere utilizzato facilmente in più applicazioni.
 
@@ -379,11 +370,8 @@ repo_url
 
 In quest'esempio, abbiamo creato una repository vuota chiamata `github-issues` con l'username `lewtun` (l'username dovrebbe essere quello del tuo account Hub quando esegui questo codice!).
 
-<Tip>
-
-✏️ **Prova tu!** Usa le tue credenziali dell'Hub Hugging Face per ottenere un token e creare una repository vuota chiamata `github-issues`. Ricordati di **non salvere mai le tue credenziali** su Colab o qualunque altra repository, perché potrebbero essere recuperate da malintenzionati.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Usa le tue credenziali dell'Hub Hugging Face per ottenere un token e creare una repository vuota chiamata `github-issues`. Ricordati di **non salvare mai le tue credenziali** su Colab o qualunque altra repository, perché potrebbero essere recuperate da malintenzionati.
 
 Ora, cloniamo la repository dall'Hub alla nostra macchina e copiamo al suo interno i file del nostro dataset. 🤗 Hub contiene una classe `Repository` che ha al suo interno molti dei comandi più comuni di Git, per cui per clonare la repository in remoto dobbiamo semplicemente fornire l'URL e il percorso locale in cui desideriamo clonare:
 
@@ -428,11 +416,8 @@ Dataset({
 
 Bene, abbiamo caricato il nostro dataset sull'Hub, e può essere utilizzato da tutti! C'è un'altra cosa importante che dobbiamo fare: aggiungere una _dataset card_ che spiega come è stato creato il corpus, e offre altre informazioni utili per la community.
 
-<Tip>
-
-💡 Puoi caricare un dataset nell'Hub di Hugging Face anche direttamente dal terminale usando `huggingface-cli` e un po' di magia Git. La [guida a 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) spiega come farlo. 
-
-</Tip>
+> [!TIP]
+> 💡 Puoi caricare un dataset nell'Hub di Hugging Face anche direttamente dal terminale usando `huggingface-cli` e un po' di magia Git. La [guida a 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) spiega come farlo.
 
 ## Creare una dataset card
 
@@ -454,17 +439,12 @@ Puoi creare il file *README.md* direttamente sull'Hub, e puoi trovare un modello
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **Prova tu!** Usa l'applicazione `dataset-tagging` e la [guida 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) per completare il file *README.md* per il tuo dataset di issue di GitHub.
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Usa l'applicazione `dataset-tagging` e la [guida 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) per completare il file *README.md* per il tuo dataset di issue di GitHub.
 
 È tutto! Abbiamo visto in questa sezione che creare un buon dataset può essere un'impresa, ma per fortuna caricarlo e condividerlo con la community è molto più semplice. Nella prossima sezione useremo il nostro nuovo dataset per creare un motore di ricerca semantico con 🤗 Datasets, che abbina alle domande gli issue e i commenti più rilevanti.
 
-<Tip>
-
-✏️ **Prova tu!** Segui i passi che abbiamo eseguito in questa sezione per creare un dataset di issue GitHub per la tua libreria open source preferita (ovviamente scegli qualcosa di diverso da 🤗 Datasets!). Per punti bonus, esegui il fine-tuning di un classificatore multiclasse per predirre i tag presenti nel campo `labels`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Segui i passi che abbiamo eseguito in questa sezione per creare un dataset di issue GitHub per la tua libreria open source preferita (ovviamente scegli qualcosa di diverso da 🤗 Datasets!). Per punti bonus, esegui il fine-tuning di un classificatore multiclasse per predire i tag presenti nel campo `labels`.
 
 
diff --git a/chapters/it/chapter5/6.mdx b/chapters/it/chapter5/6.mdx
index 1bd0dca34..86fd6c62e 100644
--- a/chapters/it/chapter5/6.mdx
+++ b/chapters/it/chapter5/6.mdx
@@ -188,11 +188,8 @@ Dataset({
 Perfetto, ora abbiamo qualche migliaio di commenti con cui lavorare!
 
 
-<Tip>
-
-✏️ **Prova tu!** Prova ad utilizzare `Dataset.map()` per far esplodere la colonna `commenti` di `issues_dataset` _senza_ utilizzare Pandas. È un po' difficile: potrebbe tornarti utile la sezione ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) della documentazione di 🤗 Datasets.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Prova ad utilizzare `Dataset.map()` per far esplodere la colonna `comments` di `issues_dataset` _senza_ utilizzare Pandas. È un po' difficile: potrebbe tornarti utile la sezione ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) della documentazione di 🤗 Datasets.
 
 Ora che abbiamo un commento per riga, creiamo una nuova colonna `comments_length` che contiene il numero di parole per ogni commento:
 
@@ -524,8 +521,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Non male! Il nostro secondo risultato sembra soddisfare la nostra richiesta.
 
-<Tip>
-
-✏️ **Prova tu!** Crea la tua query e prova a trovare una risposta tra i documenti raccolti. Potresti aver bisogno di aumentare il parametro `k` in `Dataset.get_nearest_examples()` per allargare la ricerca. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Crea la tua query e prova a trovare una risposta tra i documenti raccolti. Potresti aver bisogno di aumentare il parametro `k` in `Dataset.get_nearest_examples()` per allargare la ricerca.
diff --git a/chapters/it/chapter8/2.mdx b/chapters/it/chapter8/2.mdx
index f17c3c933..1d94d0037 100644
--- a/chapters/it/chapter8/2.mdx
+++ b/chapters/it/chapter8/2.mdx
@@ -85,11 +85,8 @@ Oh no, sembra che qualcosa sia andato storto! Se sei alle prime armi con la prog
 
 Questi messaggi contengono molte informazioni, quindi analizziamo insieme le parti principali. La prima cosa da notare è che i traceback devono essere letti _dal basso verso l'alto_. Questo può sembrare strano se si è abituati a leggere dall'alto verso il basso, ma riflette il fatto che il traceback mostra la sequenza di chiamate delle funzioni che la `pipeline` effettua quando scarica il modello e il tokenizer. (Dai un'occhiata al [Capitolo 2](/course/chapter2) per maggiori dettagli su come funziona la `pipeline`.)
 
-<Tip>
-
-🚨 Hai notato quel riquadro blu intorno a "6 frames" nel traceback di Google Colab? È una funzionalità speciale di Colab, che comprime il traceback in "frame". Se non riesci a trovare l'origine di un errore, assicurati di espandere l'intero traceback facendo clic su quelle due piccole frecce.
-
-</Tip>
+> [!TIP]
+> 🚨 Hai notato quel riquadro blu intorno a "6 frames" nel traceback di Google Colab? È una funzionalità speciale di Colab, che comprime il traceback in "frame". Se non riesci a trovare l'origine di un errore, assicurati di espandere l'intero traceback facendo clic su quelle due piccole frecce.
 
 Ciò significa che l'ultima riga del traceback indica l'ultimo messaggio di errore e fornisce il nome dell'eccezione sollevata. In questo caso, il tipo di eccezione è `OSError`, che indica un errore legato al sistema. Leggendo il messaggio di errore, si può notare che sembra esserci un problema con il file *config.json* del modello e vengono forniti due suggerimenti per risolverlo:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Se vedi un messaggio di errore che è difficile da capire, copia e incolla il messaggio nella barra di ricerca di Google o di [Stack Overflow](https://stackoverflow.com/) (sì, davvero!). C'è una buona probabilità che non sei la prima persona a riscontrare l'errore, e questo è un buon modo per trovare le soluzioni pubblicate da altri utenti della community. Ad esempio, cercando `OSError: Can't load config for` su Stack Overflow si ottengono diversi [risultati](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) che possono essere usati come punto di partenza per risolvere il problema.
-
-</Tip>
+> [!TIP]
+> 💡 Se vedi un messaggio di errore che è difficile da capire, copia e incolla il messaggio nella barra di ricerca di Google o di [Stack Overflow](https://stackoverflow.com/) (sì, davvero!). C'è una buona probabilità che tu non sia la prima persona a riscontrare l'errore, e questo è un buon modo per trovare le soluzioni pubblicate da altri utenti della community. Ad esempio, cercando `OSError: Can't load config for` su Stack Overflow si ottengono diversi [risultati](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) che possono essere usati come punto di partenza per risolvere il problema.
 
 Il primo suggerimento ci chiede di verificare se l'ID del modello è effettivamente corretto, quindi la prima cosa da fare è copiare l'identificativo e incollarlo nella barra di ricerca dell'Hub:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 L'approccio che stiamo adottando non è infallibile, poiché il/la nostro/a collega potrebbe aver modificato la configurazione di `distilbert-base-uncased` prima di affinare il modello. Nella vita reale, dovremmo verificare prima con loro, ma per lo scopo di questa sezione assumeremo che abbiano usato la configurazione predefinita.
-
-</Tip>
+> [!WARNING]
+> 🚨 L'approccio che stiamo adottando non è infallibile, poiché il/la nostro/a collega potrebbe aver modificato la configurazione di `distilbert-base-uncased` prima di affinare il modello. Nella vita reale, dovremmo verificare prima con loro, ma per lo scopo di questa sezione assumeremo che abbiano usato la configurazione predefinita.
 
 Possiamo quindi inviarlo al nostro repository del modello con la funzione `push_to_hub()` della configurazione:
 
diff --git a/chapters/it/chapter8/4.mdx b/chapters/it/chapter8/4.mdx
index d9336875b..dbe02325c 100644
--- a/chapters/it/chapter8/4.mdx
+++ b/chapters/it/chapter8/4.mdx
@@ -243,11 +243,8 @@ Quindi `1` significa `neutral` (_neutro_), il che significa che le due frasi vis
 
 Non abbiamo token type ID (_ID del tipo di token_) qui, perché DistilBERT non li prevede; se li hai nel tuo modello, devi anche assicurarti che corrispondano correttamente alla posizione della prima e della seconda frase nell'input.
 
-<Tip>
-
-✏️ **Prova tu!** Controlla che tutto sia corretto nel secondo elemento del training set.
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Controlla che tutto sia corretto nel secondo elemento del training set.
 
 In questo caso, il controllo viene effettuato solo sul training set, ma è necessario ricontrollare allo stesso modo anche il validation set e il test set.
 
@@ -518,11 +515,8 @@ Ogni volta che si riceve un messaggio di errore che inizia con `RuntimeError: CU
 
 Per risolvere questo problema, è sufficiente utilizzare meno spazio sulla GPU, cosa che spesso è più facile a dirsi che a farsi. Per prima cosa, assicuratevi di non avere due modelli sulla GPU contemporaneamente (a meno che non sia necessario per il vostro problema, ovviamente). Poi, è probabile che si debba ridurre la dimensione del batch, in quanto influisce direttamente sulle dimensioni di tutti gli output intermedi del modello e dei loro gradienti. Se il problema persiste, si può considerare di utilizzare una versione più piccola del modello.
 
-<Tip>
-
-Nella prossima parte del corso, esamineremo tecniche più avanzate che possono aiutare a ridurre l'impatto sulla memoria e ad affinare i modelli più grandi.
-
-</Tip>
+> [!TIP]
+> Nella prossima parte del corso, esamineremo tecniche più avanzate che possono aiutare a ridurre l'impatto sulla memoria e ad affinare i modelli più grandi.
 
 ### Valutazione del modello
 
@@ -549,11 +543,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 Bisogna sempre assicurarsi di poter eseguire `trainer.evaluate()` prima di lanciare `trainer.train()`, per evitare di sprecare molte risorse di calcolo prima di incorrere in un errore.
-
-</Tip>
+> [!TIP]
+> 💡 Bisogna sempre assicurarsi di poter eseguire `trainer.evaluate()` prima di lanciare `trainer.train()`, per evitare di sprecare molte risorse di calcolo prima di incorrere in un errore.
 
 Prima di tentare il debug di un problema nel ciclo di valutazione, è necessario assicurarsi di aver dato un'occhiata ai dati, di essere in grado di generare correttamente un batch e di poter eseguire il modello su di esso. Abbiamo completato tutti questi passaggi, quindi il codice seguente può essere eseguito senza errori:
 
@@ -682,11 +673,8 @@ trainer.train()
 
 In questo caso, non ci sono più problemi e il nostro script affinerà un modello che dovrebbe dare risultati ragionevoli. Ma cosa possiamo fare quando l'addestramento procede senza errori e il modello addestrato non funziona affatto bene? Questa è la parte più difficile di machine learning e ti mostreremo alcune tecniche che possono aiutarti.
 
-<Tip>
-
-💡 Se si utilizza un ciclo di addestramento manuale, per il debug della pipeline di addestramento valgono gli stessi passaggi, ma è più facile separarli. Assicurati però di non aver dimenticato il `model.eval()` o il `model.train()` nei punti giusti, o lo `zero_grad()` a ogni passo!
-
-</Tip>
+> [!TIP]
+> 💡 Se si utilizza un ciclo di addestramento manuale, per il debug della pipeline di addestramento valgono gli stessi passaggi, ma è più facile separarli. Assicurati però di non aver dimenticato il `model.eval()` o il `model.train()` nei punti giusti, o lo `zero_grad()` a ogni passo!
 
 ## Debug degli errori silenziosi durante l'addestramento
 
@@ -701,11 +689,8 @@ Il tuo modello imparerà qualcosa solo se è effettivamente possibile imparare q
 - C'è una label più comune delle altre?
 - Quale dovrebbe essere la funzione di perdita/metrica se il modello predicesse una risposta a caso/sempre la stessa risposta?
 
-<Tip warning={true}>
-
-⚠️ Se effettui un addestramento in modo distribuito, stampa campioni del set di dati in ogni processo e controlla molto attentamente che ottieni la stessa cosa. Un bug comune è la presenza di una qualche fonte di casualità nella creazione dei dati che fa sì che ogni processo abbia una versione diversa del set di dati.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Se effettui un addestramento in modo distribuito, stampa campioni del set di dati in ogni processo e controlla molto attentamente di ottenere la stessa cosa. Un bug comune è la presenza di una qualche fonte di casualità nella creazione dei dati che fa sì che ogni processo abbia una versione diversa del set di dati.
 
 Dopo aver esaminato i dati, esamina alcune previsioni del modello e decodificale. Se il modello prevede sempre la stessa cosa, potrebbe essere perché il tuo set di dati è influenzato verso una categoria (per i problemi di classificazione); tecniche come fare oversampling (_sovra-campionamento_) delle classi rare potrebbero aiutare.
 
@@ -734,11 +719,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 Se i dati di addestramento sono sbilanciati, assicurati di creare un batch di dati di addestramento contenente tutte le label.
-
-</Tip>
+> [!TIP]
+> 💡 Se i dati di addestramento sono sbilanciati, assicurati di creare un batch di dati di addestramento contenente tutte le label.
 
 Il modello risultante dovrebbe avere risultati quasi perfetti sullo stesso `batch`. Calcoliamo la metrica sulle previsioni risultanti:
 
@@ -759,11 +741,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 Se non si riesci a far sì che il modello ottenga risultati perfetti come questo, significa che c'è qualcosa di sbagliato nel modo in cui si è impostato il problema o con i dati, e quindi dovresti risolvere questa cosa. Solo quando riesci a superare il test di overfitting puoi essere sicuro/a che il tuo modello possa effettivamente imparare qualcosa.
 
-<Tip warning={true}>
-
-⚠️ Sarà necessario ricreare il modello e il `Trainer` dopo questo test, poiché il modello ottenuto probabilmente non sarà in grado di recuperare e imparare qualcosa di utile sul set di dati completo.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Sarà necessario ricreare il modello e il `Trainer` dopo questo test, poiché il modello ottenuto probabilmente non sarà in grado di recuperare e imparare qualcosa di utile sul set di dati completo.
 
 ### Non calibrare niente prima di avere una prima baseline
 
diff --git a/chapters/it/chapter8/4_tf.mdx b/chapters/it/chapter8/4_tf.mdx
index e724900d9..cdc16e2bc 100644
--- a/chapters/it/chapter8/4_tf.mdx
+++ b/chapters/it/chapter8/4_tf.mdx
@@ -110,15 +110,12 @@ model.compile(optimizer="adam")
 
 Ora utilizzeremo la funzione di perdita interna del modello e il problema dovrebbe essere risolto!
 
-<Tip>
-
-✏️ **Prova tu!** Come sfida opzionale, dopo aver risolto gli altri problemi, puoi provare a tornare a questo passaggio e a far funzionare il modello con la loss originale calcolata da Keras invece che con la loss interna. È necessario aggiungere `"labels"` all'argomento `label_cols` di `to_tf_dataset()` per assicurarsi che le label siano fornite correttamente, in modo da ottenere i gradienti, ma c'è un altro problema con la loss che abbiamo specificato. L'addestramento continuerà a funzionare con questo problema, ma l'apprendimento sarà molto lento e si bloccherà a una loss di addestramento elevata. Riesci a capire di cosa si tratta?
-
-Un suggerimento codificato in ROT13, se sei bloccato/a: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
-
-E un secondo indizio: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
-
-</Tip>
+> [!TIP]
+> ✏️ **Prova tu!** Come sfida opzionale, dopo aver risolto gli altri problemi, puoi provare a tornare a questo passaggio e a far funzionare il modello con la loss originale calcolata da Keras invece che con la loss interna. È necessario aggiungere `"labels"` all'argomento `label_cols` di `to_tf_dataset()` per assicurarsi che le label siano fornite correttamente, in modo da ottenere i gradienti, ma c'è un altro problema con la loss che abbiamo specificato. L'addestramento continuerà a funzionare con questo problema, ma l'apprendimento sarà molto lento e si bloccherà a una loss di addestramento elevata. Riesci a capire di cosa si tratta?
+>
+> Un suggerimento codificato in ROT13, se sei bloccato/a: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
+>
+> E un secondo indizio: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
 
 Ora proviamo ad avviare l'addestramento. Ora dovremmo ottenere i gradienti, quindi, se tutto va bene (musica minacciosa), possiamo chiamare `model.fit()` e tutto funzionerà bene!
 
@@ -361,11 +358,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡 È anche possibile importare la funzione `create_optimizer()` da 🤗 Transformers, che fornirà un optimizer AdamW con un corretto weight decay insieme a un learning rate warmup e decay. Questo ottimizzatore spesso produce risultati leggermente migliori di quelli ottenuti con l'ottimizzatore Adam predefinito.
-
-</Tip>
+> [!TIP]
+> 💡 È anche possibile importare la funzione `create_optimizer()` da 🤗 Transformers, che fornirà un optimizer AdamW con un corretto weight decay insieme a un learning rate warmup e decay. Questo ottimizzatore spesso produce risultati leggermente migliori di quelli ottenuti con l'ottimizzatore Adam predefinito.
 
 Adess, possiamo tentarde di fare training del modell con il nuovo learning rate migliorato:
 
@@ -387,11 +381,8 @@ Abbiamo trattato i problemi dello script di cui sopra, ma ci sono molti altri er
 
 Il segnale che indica che la memoria è esaurita è un errore del tipo "OOM when allocating tensor" (OOM è l'abbreviazione di "out of memory"). Si tratta di un rischio molto comune quando si ha a che fare con modelli linguistici di grandi dimensioni. In questo caso, una buona strategia è quella di dimezzare le dimensioni del batch e riprovare. Tenete presente, però, che alcuni modelli sono *molto* grandi. Ad esempio, il modello GPT-2 completo ha 1,5B parametri, il che significa che sono necessari 6 GB di memoria solo per memorizzare il modello e altri 6 GB per i suoi gradienti! L'addestramento del modello GPT-2 completo richiede di solito oltre 20 GB di VRAM, indipendentemente dalla dimensione del batch utilizzato, che solo poche GPU hanno. Modelli più leggeri come `distilbert-base-cased` sono molto più facili da eseguire e si addestrano molto più rapidamente.
 
-<Tip>
-
-Nella prossima parte del corso, esamineremo tecniche più avanzate che possono aiutare a ridurre l'impatto sulla memoria e ad affinare i modelli più grandi.
-
-</Tip>
+> [!TIP]
+> Nella prossima parte del corso, esamineremo tecniche più avanzate che possono aiutare a ridurre l'impatto sulla memoria e ad affinare i modelli più grandi.
 
 ### TensorFlow è molto affamato 🦛
 
@@ -447,21 +438,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 Se i dati di addestramento sono sbilanciati, assicurati di creare un batch di dati di addestramento contenente tutte le label.
-
-</Tip>
+> [!TIP]
+> 💡 Se i dati di addestramento sono sbilanciati, assicurati di creare un batch di dati di addestramento contenente tutte le label.
 
 Il modello risultante dovrebbe avere risultati quasi perfetti sul `batch`, con una loss che diminuisce rapidamente verso lo 0 (o il valore minimo per la loss che si sta utilizzando).
 
 Se non si riesci a far sì che il modello ottenga risultati perfetti come questo, significa che c'è qualcosa di sbagliato nel modo in cui si è impostato il problema o con i dati, e quindi dovresti risolvere questa cosa. Solo quando riesci a superare il test di overfitting puoi essere sicuro/a che il tuo modello possa effettivamente imparare qualcosa.
 
-<Tip warning={true}>
-
-⚠️ Sarà necessario ricreare il modello e ricompilarlo dopo questo test, poiché il modello ottenuto probabilmente non sarà in grado di recuperare e imparare qualcosa di utile sul set di dati completo.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Sarà necessario ricreare il modello e ricompilarlo dopo questo test, poiché il modello ottenuto probabilmente non sarà in grado di recuperare e imparare qualcosa di utile sul set di dati completo.
 
 ### Non calibrare niente prima di avere una prima baseline
 
diff --git a/chapters/it/chapter8/5.mdx b/chapters/it/chapter8/5.mdx
index cb53a3258..1b05071a5 100644
--- a/chapters/it/chapter8/5.mdx
+++ b/chapters/it/chapter8/5.mdx
@@ -17,11 +17,8 @@ Quando si è sicuri di avere un bug tra le mani, il primo passo è creare un min
 
 È molto importante isolare il pezzo di codice che produce il bug, poiché nessuno del team di Hugging Face è un mago (ancora) e non possono risolvere ciò che non vedono. Un minimo esempio riproducibile dovrebbe, come indica il nome, essere riproducibile. Ciò significa che non deve fare affidamento su file o dati esterni. Prova a sostituire i dati che stai usando con alcuni valori fittizi che assomigliano a quelli reali e producono lo stesso errore.
 
-<Tip>
-
-🚨 Molti issue presenti nel repository di 🤗 Transformers sono irrisolti perché i dati utilizzati per riprodurli non sono accessibili.
-
-</Tip>
+> [!TIP]
+> 🚨 Molti issue presenti nel repository di 🤗 Transformers sono irrisolti perché i dati utilizzati per riprodurli non sono accessibili.
 
 Una volta che si ha qualcosa di autocontenuto, si può cercare di ridurlo in un numero ancora minore di righe di codice, costruendo quello che chiamiamo un _minimo esempio riproducibile_. Sebbene questo richieda un po' più di lavoro da parte tua, se fornisci un breve e chiaro esempio di bug, avrai quasi la garanzia di ricevere aiuto e una correzione.
 
diff --git a/chapters/it/chapter9/1.mdx b/chapters/it/chapter9/1.mdx
index b1b73ae4b..dd08d26c0 100644
--- a/chapters/it/chapter9/1.mdx
+++ b/chapters/it/chapter9/1.mdx
@@ -32,6 +32,5 @@ Ecco alcuni esempi di demo di machine learning costruite con Gradio:
 
 Questo capitolo è suddiviso in sezioni che comprendono sia _concetti_ che _applicazioni_. Dopo aver appreso il concetto in ogni sezione, lo applicherai per creare un particolare tipo di demo, dalla classificazione delle immagini al riconoscimento vocale. Al termine di questo capitolo, sarete in grado di creare queste demo (e molte altre!) con poche righe di Python.
 
-<Tip>
-👀 Dai un'occhiata a <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> per vedere molti esempi recenti di demo di machine learning costruite dalla community!
-</Tip>
+> [!TIP]
+> 👀 Dai un'occhiata a <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> per vedere molti esempi recenti di demo di machine learning costruite dalla community!
diff --git a/chapters/ja/chapter1/3.mdx b/chapters/ja/chapter1/3.mdx
index 93803b215..92be0cddf 100644
--- a/chapters/ja/chapter1/3.mdx
+++ b/chapters/ja/chapter1/3.mdx
@@ -10,11 +10,10 @@
 
 このセクションでは、Transformerモデルができることを見ていき、🤗 Transformersライブラリの最初のツールとして `pipeline()` 関数を使ってみましょう。
 
-<Tip>
-👀 右上に<em>Open in Colab</em>というボタンがありますよね？それをクリックすると、このセクションのすべてのコードサンプルを含むGoogle Colabノートブックが開きます。このボタンは、コードサンプルを含むどのセクションにも存在します。
-
-ローカルでサンプルを実行したい場合は、<a href="/course/chapter0">セットアップ</a>を参照することをお勧めします。
-</Tip>
+> [!TIP]
+> 👀 右上に<em>Open in Colab</em>というボタンがありますよね？それをクリックすると、このセクションのすべてのコードサンプルを含むGoogle Colabノートブックが開きます。このボタンは、コードサンプルを含むどのセクションにも存在します。
+>
+> ローカルでサンプルを実行したい場合は、<a href="/course/chapter0">セットアップ</a>を参照することをお勧めします。
 
 ## Transformersは至るところに!
 
@@ -24,9 +23,8 @@ Transformerモデルは前節で述べたようなあらゆる種類のNLPタス
 
 [🤗 Transformers library](https://github.com/huggingface/transformers)は、それらの共有モデルを作成し、使用するための機能を提供します。[Model Hub](https://huggingface.co/models)には、誰でもダウンロードして使用できる何千もの事前学習済みモデルが含まれています。また、あなた自身のモデルをModel Hubにアップロードすることも可能です。
 
-<Tip>
-⚠️Hugging Face Hubはトランスフォーマーモデルに限定されるものではありません。誰でも好きな種類のモデルやデータセットを共有することができます!すべての利用可能な機能の恩恵を受けるために<a href="https://huggingface.co/join">huggingface.coのアカウントを作成</a>しましょう!
-</Tip>  
+> [!TIP]
+> ⚠️Hugging Face Hubはトランスフォーマーモデルに限定されるものではありません。誰でも好きな種類のモデルやデータセットを共有することができます!すべての利用可能な機能の恩恵を受けるために<a href="https://huggingface.co/join">huggingface.coのアカウントを作成</a>しましょう!
 
 <br>
 
@@ -107,11 +105,8 @@ classifier(
 
 このpipelineは _zero-shot_ と呼ばれます。なぜなら、これを使うために自前のデータセットでモデルのファインチューニングをする必要がないからです。任意のラベルのリストに対して直接確率スコアを返すことができます。
 
-<Tip>
-
-✏️ **試してみよう!** 独自の入力とラベルで遊んでみて、モデルがどのように振る舞うか見てみましょう。
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** 独自の入力とラベルで遊んでみて、モデルがどのように振る舞うか見てみましょう。
 
 ## 文章生成
 
@@ -134,12 +129,8 @@ generator("In this course, we will teach you how to")
 
 引数`num_return_sequences`で異なるシーケンスの生成数を、引数`max_length`で出力テキストの合計の長さを制御することができます。
 
-<Tip>
-
-
-✏️ **試してみよう!** `num_return_sequences` と `max_length` 引数を用いて、15語ずつの2つの文を生成してみましょう！
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** `num_return_sequences` と `max_length` 引数を用いて、15語ずつの2つの文を生成してみましょう！
 
 
 ## pipelineでHubから任意のモデルを使用する
@@ -171,11 +162,8 @@ generator(
 
 モデルをクリックで選択すると、オンラインで直接試用できるウィジェットが表示されます。このようにして、ダウンロードする前にモデルの機能をすばやくテストすることができます。
 
-<Tip>
-
-✏️ **試してみよう!** フィルターを使って、他の言語のテキスト生成モデルを探してみましょう。ウィジェットで自由に遊んだり、pipelineで使ってみてください！
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** フィルターを使って、他の言語のテキスト生成モデルを探してみましょう。ウィジェットで自由に遊んだり、pipelineで使ってみてください！
 
 
 ### 推論API
@@ -208,11 +196,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 `top_k` 引数は、いくつの可能性を表示させたいかをコントロールします。ここでは、モデルが特別な `<mask>` という単語を埋めていることに注意してください。これはしばしば *mask token* と呼ばれます。他の空所穴埋めモデルは異なるマスクトークンを持つかもしれないので、他のモデルを探索するときには常に適切なマスクワードを確認するのが良いでしょう。それを確認する1つの方法は、ウィジェットで使用されているマスクワードを見ることです。
 
-<Tip>
-
-✏️ **試してみよう!** Hub で `bert-base-cased` モデルを検索し、推論API ウィジェットでそのマスクワードを特定します。このモデルは上記の `pipeline` の例文に対して何を予測するでしょうか？
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** Hub で `bert-base-cased` モデルを検索し、推論API ウィジェットでそのマスクワードを特定します。このモデルは上記の `pipeline` の例文に対して何を予測するでしょうか？
 
 ## 固有表現認識
 
@@ -236,11 +221,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 pipelineの作成機能でオプション `grouped_entities=True` を渡すと、同じエンティティに対応する文の部分を再グループ化するようpipelineに指示します。ここでは、名前が複数の単語で構成されていても、モデルは "Hugging" と "Face" を一つの組織として正しくグループ化しています。実際、次の章で説明するように、前処理ではいくつかの単語をより小さなパーツに分割することさえあります。例えば、`Sylvain`は4つの部分に分割されます。`S`, `##yl`, `##va`, and `##in`.です。後処理の段階で、pipelineはこれらの断片をうまく再グループ化しました。
 
-<Tip>
-
-✏️ **試してみよう!** Model Hubで英語の品詞タグ付け（通常POSと略される）を行えるモデルを検索してください。このモデルは、上の例の文に対して何を予測するでしょうか？
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** Model Hubで英語の品詞タグ付け（通常POSと略される）を行えるモデルを検索してください。このモデルは、上の例の文に対して何を予測するでしょうか？
 
 ## 質問応答
 
@@ -323,10 +305,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 テキスト生成や要約と同様に、結果に対して `max_length` や `min_length` を指定することができます。
 
-<Tip>
-
-✏️ **試してみよう!** 他言語の翻訳モデルを検索して、前の文章をいくつかの異なる言語に翻訳してみましょう。
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** 他言語の翻訳モデルを検索して、前の文章をいくつかの異なる言語に翻訳してみましょう。
 
 これまで紹介したpipelineは、ほとんどがデモンストレーションのためのものです。これらは特定のタスクのためにプログラムされたものであり、それらのバリエーションを実行することはできません。次の章では、`pipeline()`関数の中身と、その動作をカスタマイズする方法を学びます。
diff --git a/chapters/ja/chapter2/1.mdx b/chapters/ja/chapter2/1.mdx
index 4621ae318..62875fcfd 100644
--- a/chapters/ja/chapter2/1.mdx
+++ b/chapters/ja/chapter2/1.mdx
@@ -19,7 +19,6 @@
 
 次に、`pipeline()`関数のもう一つの主要な構成要素であるトークナイザーAPIについて見ていきます。トークナイザは、テキストからニューラルネットワークの数値入力への変換を処理し、必要に応じてテキストに戻す、最初と最後の処理ステップを担います。最後に、複数の文章をバッチ処理でモデルに送る方法を紹介し、`tokenizer()` 関数を詳しく見て、まとめに移ります。
 
-<Tip>
-⚠️ Model Hubと🤗 Transformersで利用可能なすべての機能を活用するには、<a href="https://huggingface.co/join">アカウントを作成する</a>ことをお勧めします。
-</Tip>
+> [!TIP]
+> ⚠️ Model Hubと🤗 Transformersで利用可能なすべての機能を活用するには、<a href="https://huggingface.co/join">アカウントを作成する</a>ことをお勧めします。
 
diff --git a/chapters/ja/chapter2/2.mdx b/chapters/ja/chapter2/2.mdx
index e03614571..392818ae5 100644
--- a/chapters/ja/chapter2/2.mdx
+++ b/chapters/ja/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-ここは、PyTorchとTensorFlowのどちらを使うかによって内容が少し異なる最初のセクションです。タイトルの上にあるスイッチを切り替えて、好きなプラットフォームを選んでください！
-</Tip>
+> [!TIP]
+> ここは、PyTorchとTensorFlowのどちらを使うかによって内容が少し異なる最初のセクションです。タイトルの上にあるスイッチを切り替えて、好きなプラットフォームを選んでください！
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -352,8 +351,5 @@ model.config.id2label
 
 これで、pipelineの3つのステップ、すなわち、トークナイザによる前処理、モデルへの入力、そして後処理がうまく再現できました。この先では、それぞれのステップをより深く掘り下げていきましょう。
 
-<Tip>
-
-✏️ **試してみよう!** 自分でテキストを2つ（またはそれ以上）用意し、`sentiment-analysis` pipelineで実行します。そして、ここで見た手順を自分で再現して、同様な結果が得られるかどうか確認してみましょう!
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう!** 自分でテキストを2つ（またはそれ以上）用意し、`sentiment-analysis` pipelineで実行します。そして、ここで見た手順を自分で再現して、同様な結果が得られるかどうか確認してみましょう!
diff --git a/chapters/ja/chapter2/4.mdx b/chapters/ja/chapter2/4.mdx
index 630e58989..397681a53 100644
--- a/chapters/ja/chapter2/4.mdx
+++ b/chapters/ja/chapter2/4.mdx
@@ -216,11 +216,8 @@ print(ids)
 
 これらの出力は、適切なフレームワークのテンソルに変換された後、前述のようにモデルの入力として使用できます。
 
-<Tip>
-
-✏️ **試してみよう！** 最後の2つのステップ（トークン化と入力IDへの変換）を、セクション2で使った入力文（"I've been waiting for a HuggingFace course my whole life." と "I hate this so much!"）に対して再現してみましょう。先ほどと同じ入力IDが得られるかどうかを確認してみてください。
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう！** 最後の2つのステップ（トークン化と入力IDへの変換）を、セクション2で使った入力文（"I've been waiting for a HuggingFace course my whole life." と "I hate this so much!"）に対して再現してみましょう。先ほどと同じ入力IDが得られるかどうかを確認してみてください。
 
 ## デコーディング
 
diff --git a/chapters/ja/chapter2/5.mdx b/chapters/ja/chapter2/5.mdx
index 66c7342ae..7bd4169f4 100644
--- a/chapters/ja/chapter2/5.mdx
+++ b/chapters/ja/chapter2/5.mdx
@@ -181,11 +181,8 @@ batched_ids = [ids, ids]
 
 これは2つの同じ系列からなるバッチとなっています。
 
-<Tip>
-
-✏️ **試してみよう！** この `batch_ids` をテンソルに変換し、モデルに入力してみましょう。前と同じロジット（モデル出力）が得られることを確認してください（ただし、二重になっていることに注意してください）。 
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう！** この `batch_ids` をテンソルに変換し、モデルに入力してみましょう。前と同じロジット（モデル出力）が得られることを確認してください（ただし、二重になっていることに注意してください）。
 
 バッチ処理により、複数の系列をモデルに入力できるようになります。単一の系列でバッチを構築するのと同じように、簡単に複数の系列を使用することができます。ただし、ここで1つ問題があります。2つ以上の系列をバッチ処理する場合、系列の長さがそれぞれ異なる場合があります。これまでテンソルを扱ったことがある場合は、テンソルの形状は長方形である必要があることをご存知なのではないでしょうか。従って、異なる長さの系列の入力IDリストを直接テンソルに変換することはできません。この問題を回避するための方法として、入力を*パディング*することが一般的です。
 
@@ -317,11 +314,8 @@ tf.Tensor(
 
 2つ目の系列の最後の値がパディングIDであることに注目してください。これは、アテンションマスクの0の値となっています。
 
-<Tip>
-
-✏️ **試してみよう！** セクション2で使用した2つの文 ("I've been waiting for a HuggingFace course my whole life." と "I hate this so much!") を手動でトークン化してみましょう。そしてこれらをモデルに入力し、セクション2で得られたロジットと同じ結果となることを確認してみましょう。次に、パディングトークンを使用してこれらをバッチ処理し、適切なアテンションマスクを作成してみましょう。また同様にモデルに入力した際、セクション2で得られた結果と同じものになることを確認してみましょう。
-
-</Tip>
+> [!TIP]
+> ✏️ **試してみよう！** セクション2で使用した2つの文 ("I've been waiting for a HuggingFace course my whole life." と "I hate this so much!") を手動でトークン化してみましょう。そしてこれらをモデルに入力し、セクション2で得られたロジットと同じ結果となることを確認してみましょう。次に、パディングトークンを使用してこれらをバッチ処理し、適切なアテンションマスクを作成してみましょう。また同様にモデルに入力した際、セクション2で得られた結果と同じものになることを確認してみましょう。
 
 ## より長い系列
 
diff --git a/chapters/ja/chapter4/2.mdx b/chapters/ja/chapter4/2.mdx
index dedc60478..23b739aa1 100644
--- a/chapters/ja/chapter4/2.mdx
+++ b/chapters/ja/chapter4/2.mdx
@@ -91,6 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-学習済みのモデルを使う場合は、どのように学習したのか、どのデータセットで学習したのか、その限界と偏りを必ず確認すること。これらの情報はすべて、モデルカードに記載されています。
-</Tip>
+> [!TIP]
+> 学習済みのモデルを使う場合は、どのように学習したのか、どのデータセットで学習したのか、その限界と偏りを必ず確認すること。これらの情報はすべて、モデルカードに記載されています。
diff --git a/chapters/ja/chapter4/3.mdx b/chapters/ja/chapter4/3.mdx
index b6a8140db..1381a08fa 100644
--- a/chapters/ja/chapter4/3.mdx
+++ b/chapters/ja/chapter4/3.mdx
@@ -170,11 +170,8 @@ tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token=
 </div>
 {/if}
 
-<Tip>
-
-✏️ **やってみよう！** `bert-base-cased`チェックポイントに関連付けられたモデルとトークナイザーを、`push_to_hub()`メソッドを使って自分のネームスペースにあるリポジトリにアップロードします。レポジトリを削除する前に、レポジトリがあなたのページに正しく表示されることを確認してください。
-
-</Tip>
+> [!TIP]
+> ✏️ **やってみよう！** `bert-base-cased`チェックポイントに関連付けられたモデルとトークナイザーを、`push_to_hub()`メソッドを使って自分のネームスペースにあるリポジトリにアップロードします。レポジトリを削除する前に、レポジトリがあなたのページに正しく表示されることを確認してください。
 
 これまで見てきたように、`push_to_hub()`メソッドはいくつかの引数をとるので、特定のリポジトリや組織のネームスペースにアップロードしたり、別のAPI トークンを使用したりすることが可能です。詳細については、[🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html)で仕様を確認することをお勧めします。
 
@@ -459,9 +456,8 @@ config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_mod
 
 {/if}
 
-<Tip>
-✏️ ウェブインターフェースからリポジトリを作成する場合、*.gitattributes* ファイルは自動的に *.bin* や *.h5* などの特定の拡張子を持つファイルを大きなファイルとみなすように設定され、git-lfs がそれらを追跡するようになります。ユーザー側で別途設定を行う必要はありません。
-</Tip> 
+> [!TIP]
+> ✏️ ウェブインターフェースからリポジトリを作成する場合、*.gitattributes* ファイルは自動的に *.bin* や *.h5* などの特定の拡張子を持つファイルを大きなファイルとみなすように設定され、git-lfs がそれらを追跡するようになります。ユーザー側で別途設定を行う必要はありません。
 
 これで、従来の Git リポジトリと同じように作業を進められるようになりました。すべてのファイルを Git のステージング環境に追加するには、`git add` コマンドを使います：
 
diff --git a/chapters/ja/chapter7/1.mdx b/chapters/ja/chapter7/1.mdx
index cb1eb2ee2..ac0020029 100644
--- a/chapters/ja/chapter7/1.mdx
+++ b/chapters/ja/chapter7/1.mdx
@@ -30,8 +30,5 @@
 
 {/if}
 
-<Tip>
-
-各セクションを順番に読んでいくと、共通するコードや文章がかなりあることに気がつくと思います。この繰り返しは意図的なもので、興味のあるタスクに飛び込んで（あるいは後で戻って）、完全な動作例を見つけることができるようにするためのものです。
-
-</Tip>
+> [!TIP]
+> 各セクションを順番に読んでいくと、共通するコードや文章がかなりあることに気がつくと思います。この繰り返しは意図的なもので、興味のあるタスクに飛び込んで（あるいは後で戻って）、完全な動作例を見つけることができるようにするためのものです。
diff --git a/chapters/ja/chapter7/2.mdx b/chapters/ja/chapter7/2.mdx
index 7d235c0b7..9905d1286 100644
--- a/chapters/ja/chapter7/2.mdx
+++ b/chapters/ja/chapter7/2.mdx
@@ -45,11 +45,8 @@
 
 まず最初に、トークン分類に適したデータセットが必要です。このセクションでは、[CoNLL-2003 dataset](https://huggingface.co/datasets/conll2003)を使います。このデータセットはロイターが配信するニュース記事を含みます。 
 
-<Tip>
-
-💡 単語とそれに対応するラベルに分割されたテキストからなるデータセットであれば、ここで説明するデータ処理を自分のデータセットに適用することができます。独自のデータを `Dataset` にロードする方法について復習が必要な場合は、[第5章](/course/ja/chapter5) を参照してください。
-
-</Tip>
+> [!TIP]
+> 💡 単語とそれに対応するラベルに分割されたテキストからなるデータセットであれば、ここで説明するデータ処理を自分のデータセットに適用することができます。独自のデータを `Dataset` にロードする方法について復習が必要な場合は、[第5章](/course/ja/chapter5) を参照してください。
 
 ### The CoNLL-2003 dataset
 
@@ -168,11 +165,8 @@ print(line2)
 
 このように、"European Union" と "Werner Zwingmann" のように2つの単語にまたがるエンティティは、最初の単語には `B-` ラベルが、2番目の単語には `I-` ラベルが付与されます。
 
-<Tip>
-
-✏️ **あなたの番です！** 同じ2つの文をPOSラベルまたはチャンキングラベルと一緒に出力してください。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 同じ2つの文をPOSラベルまたはチャンキングラベルと一緒に出力してください。
 
 ### データの処理
 
@@ -266,11 +260,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 見てわかるように、この関数は最初と最後の2つの特別なトークンに対して `-100` を追加し、2つのトークンに分割された単語に対して新たに `0` を追加しています。
 
-<Tip>
-
-✏️ **あなたの番です！** 研究者の中には、1つの単語には1つのラベルしか付けず、与えられた単語内の他のサブトークンに`-100`を割り当てることを好む人もいます。これは、多くのサブトークンに分割される長い単語が学習時の損失に大きく寄与するのを避けるためです。このルールに従って、ラベルと入力IDを一致させるように、前の関数を変更してみましょう。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 研究者の中には、1つの単語には1つのラベルしか付けず、与えられた単語内の他のサブトークンに`-100`を割り当てることを好む人もいます。これは、多くのサブトークンに分割される長い単語が学習時の損失に大きく寄与するのを避けるためです。このルールに従って、ラベルと入力IDを一致させるように、前の関数を変更してみましょう。
 
 データセット全体の前処理として、すべての入力をトークン化し、すべてのラベルに対して `align_labels_with_tokens()` を適用する必要があります。高速なtokenizerの速度を活かすには、たくさんのテキストを同時にトークン化するのがよいでしょう。そこで、サンプルのリストを処理する関数を書いて、 `Dataset.map()` メソッドに `batched=True` オプションを付けて使用することにしましょう。以前の例と唯一違うのは、tokenizerへの入力がテキストのリスト（この場合は単語のリストのリスト）である場合、 `word_ids()` 関数は単語IDが欲しいリストのインデックスを必要とするので、これも追加します。
 
@@ -435,11 +426,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ ラベルの数が間違っているモデルがあると、後で `model.fit()` を呼び出すときによくわからないエラーが発生します。このエラーはデバッグの際に厄介なので、このチェックを必ず行い、期待通りのラベル数であることを確認してください。
-
-</Tip>
+> [!WARNING]
+> ⚠️ ラベルの数が間違っているモデルがあると、後で `model.fit()` を呼び出すときによくわからないエラーが発生します。このエラーはデバッグの際に厄介なので、このチェックを必ず行い、期待通りのラベル数であることを確認してください。
 
 ### モデルの微調整
 
@@ -506,11 +494,8 @@ model.fit(
 
 `hub_model_id` 引数には、プッシュしたいリポジトリのフルネームを指定できます。 (特に、特定の組織（organization）にプッシュする場合はこの引数を使用する必要があります)。例えば、モデルを [`huggingface-course` organization](https://huggingface.co/huggingface-course) にプッシュする場合、 `hub_model_id="huggingface-course/bert-finetuned-ner"` を追加します。デフォルトでは、使用されるリポジトリはあなたの名前が使われ、設定した出力ディレクトリにちなんだ名前、例えば `"cool_huggingface_user/bert-finetuned-ner"` となります。
 
-<Tip>
-
-💡 使用している出力ディレクトリがすでに存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`model.fit()` を呼び出すときにエラーが発生し、新しい名前を設定する必要があります。
-
-</Tip>
+> [!TIP]
+> 💡 使用している出力ディレクトリがすでに存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`model.fit()` を呼び出すときにエラーが発生し、新しい名前を設定する必要があります。
 
 学習が行われている間、モデルが保存されるたびに（今回の例ではエポックごとに）バックグラウンドでHubにアップロードされることに注意してください。このようにしておけば、必要に応じて別のマシンで学習を再開することができます。この段階で、Model Hub上の推論ウィジェットを使ってモデルをテストし、友人と共有することができます。
 
@@ -701,11 +686,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ ラベルの数が間違っているモデルがあると、後で `model.fit()` を呼び出すときによくわからないエラー（"CUDA error: device-side assert triggered"のようなエラー）が発生します。このようなエラーはユーザーから報告されるバグの原因として一番多いものです。このチェックを必ず行い、期待通りのラベル数であることを確認してください。
-
-</Tip>
+> [!WARNING]
+> ⚠️ ラベルの数が間違っているモデルがあると、後で `Trainer.train()` を呼び出すときによくわからないエラー（"CUDA error: device-side assert triggered"のようなエラー）が発生します。このようなエラーはユーザーから報告されるバグの原因として一番多いものです。このチェックを必ず行い、期待通りのラベル数であることを確認してください。
 
 ### モデルの微調整
 
@@ -753,10 +735,8 @@ args = TrainingArguments(
 
 デフォルトでは、使用されるリポジトリはあなたの名前が使われ、設定した出力ディレクトリちなんだ名前、例えば今回の例では `"sgugger/bert-finetuned-ner"` となります。
 
-<Tip>
-💡 使用する出力ディレクトリが既に存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`Trainer` を定義する際にエラーが発生し、新しい名前を設定する必要があります。
-
-</Tip>
+> [!TIP]
+> 💡 使用する出力ディレクトリが既に存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`Trainer` を定義する際にエラーが発生し、新しい名前を設定する必要があります。
 
 最後に、すべてを `Trainer` に渡して、トレーニングを開始するだけです。
 
@@ -845,11 +825,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 TPUでトレーニングする場合は、上のセルから始まるコードを全て専用のトレーニング関数に移動する必要があります。詳しくは[第3章](/course/ja/chapter3)を参照してください。
-
-</Tip>
+> [!TIP]
+> 🚨 TPUでトレーニングする場合は、上のセルから始まるコードを全て専用のトレーニング関数に移動する必要があります。詳しくは[第3章](/course/ja/chapter3)を参照してください。
 
 これで `train_dataloader` を `accelerator.prepare()` に送ったので、そのデータ長を用いて学習ステップ数を計算することができます。このメソッドはdataloaderの長さを変更するので、常にdataloaderを準備した後に行う必要があることを忘れないでください。ここでは、学習率から0まで古典的な線形スケジュールを使用します。
 
diff --git a/chapters/ja/chapter7/3.mdx b/chapters/ja/chapter7/3.mdx
index 7090ced4d..9d67cca7d 100644
--- a/chapters/ja/chapter7/3.mdx
+++ b/chapters/ja/chapter7/3.mdx
@@ -42,11 +42,8 @@ Transformerモデルを含む多くのNLPアプリケーションでは、ハギ
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 「マスク言語モデリング」や「事前学習済みモデル」という言葉に聞き覚えがない方は、[第1章](/course/ja/chapter1)でこれらの主要な概念をすべて動画付きで説明していますので、ぜひご覧になってください。
-
-</Tip>
+> [!TIP]
+> 🙋 「マスク言語モデリング」や「事前学習済みモデル」という言葉に聞き覚えがない方は、[第1章](/course/ja/chapter1)でこれらの主要な概念をすべて動画付きで説明していますので、ぜひご覧になってください。
 
 ## マスク言語モデリング用の事前学習済みモデルの選択
 
@@ -240,11 +237,8 @@ for row in sample:
 
 そう、これらは確かに映画のレビューです。もしあなたが十分に年を取っているなら、最後のレビューにあるVHS版を所有しているというコメントさえ理解できるかもしれません😜! 言語モデリングにはラベルは必要ありませんが、`0`は否定的なレビュー、`1`は肯定的なレビューに対応することがもうわかりました。
 
-<Tip>
-
-✏️ **挑戦してみましょう！** `unsupervised` のラベルがついた分割データのランダムサンプルを作成し、ラベルが `0` や `1` でないことを確認してみましょう。また、`train` と `test` 用の分割データのラベルが本当に `0` か `1` のみかを確認することもできます。これはすべての自然言語処理の実践者が新しいプロジェクトの開始時に実行すべき、有用なサニティチェックです!
-
-</Tip>
+> [!TIP]
+> ✏️ **挑戦してみましょう！** `unsupervised` のラベルがついた分割データのランダムサンプルを作成し、ラベルが `0` や `1` でないことを確認してみましょう。また、`train` と `test` 用の分割データのラベルが本当に `0` か `1` のみかを確認することもできます。これはすべての自然言語処理の実践者が新しいプロジェクトの開始時に実行すべき、有用なサニティチェックです!
 
 さて、データをざっと見たところで、マスク言語モデリングのための準備に取りかかりましょう。[第3章](/course/ja/chapter3)で見たシーケンス分類のタスクと比較すると、いくつかの追加ステップが必要であることがわかるでしょう。さあ、始めましょう！
 
@@ -302,11 +296,8 @@ tokenizer.model_max_length
 
 この値は、チェックポイントに関連付けられた *tokenizer_config.json* ファイルから取得します。この場合、BERT と同様に、コンテキストサイズが 512 トークンであることが分かります。
 
-<Tip>
-
-✏️ ** あなたの番です！ ** [BigBird](https://huggingface.co/google/bigbird-roberta-base) や [Longformer](hf.co/allenai/longformer-base-4096) などのいくつかの Transformer モデルは、BERT や他の初期の Transformer モデルよりずっと長いコンテキスト長を持っています。これらのチェックポイントのトークナイザーをインスタンス化して、`model_max_length` がそのモデルカード内に記載されているものと一致することを検証してください。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** [BigBird](https://huggingface.co/google/bigbird-roberta-base) や [Longformer](https://huggingface.co/allenai/longformer-base-4096) などのいくつかの Transformer モデルは、BERT や他の初期の Transformer モデルよりずっと長いコンテキスト長を持っています。これらのチェックポイントのトークナイザーをインスタンス化して、`model_max_length` がそのモデルカード内に記載されているものと一致することを検証してください。
 
 され、Google Colab内で利用可能なGPUで実験を行うために、メモリに収まるような少し小さめのものを選ぶことにします。
 
@@ -314,11 +305,8 @@ tokenizer.model_max_length
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-なお、小さな断片サイズを使用すると、実際のシナリオでは不利になることがあるので、モデルを適用するユースケースに対応したサイズを使用する必要があります。
-
-</Tip>
+> [!WARNING]
+> なお、小さな断片サイズを使用すると、実際のシナリオでは不利になることがあるので、モデルを適用するユースケースに対応したサイズを使用する必要があります。
 
 さて、ここからが楽しいところです。連結がどのように機能するかを示すために、トークン化されたトレーニングセットからいくつかのレビューを取り出し、レビュー毎のトークン数を出力してみましょう。
 
@@ -480,11 +468,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 これらが学習中にモデルが予測しなければならないトークンになります。そしてデータコレーターの素晴らしいところは、バッチごとに`[MASK]`の挿入をランダムにすることです! 
 
-<Tip>
-
-✏️ ** あなたの番です！ ** 上のコードを何度か実行して、ランダムなマスキングがあなたの目の前で起こるのを見ましょう! また、`tokenizer.decode()` メソッドを `tokenizer.convert_ids_to_tokens()` に置き換えると、与えた単語内から一つのトークンが選択されてマスクされ、他の単語がマスクされないことを見ることができます。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 上のコードを何度か実行して、ランダムなマスキングがあなたの目の前で起こるのを見ましょう! また、`tokenizer.decode()` メソッドを `tokenizer.convert_ids_to_tokens()` に置き換えると、与えた単語内から一つのトークンが選択されてマスクされ、他の単語がマスクされないことを見ることができます。
 
 {#if fw === 'pt'}
 
@@ -599,11 +584,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ ** あなたの番です！ ** 上のコードを何度か実行して、ランダムなマスキングがあなたの目の前で起こるのを見ましょう! また、`tokenizer.decode()` メソッドを `tokenizer.convert_ids_to_tokens()` に置き換えると、与えられた単語からのトークンが常に一緒にマスクされることを確認できます。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 上のコードを何度か実行して、ランダムなマスキングがあなたの目の前で起こるのを見ましょう! また、`tokenizer.decode()` メソッドを `tokenizer.convert_ids_to_tokens()` に置き換えると、与えられた単語からのトークンが常に一緒にマスクされることを確認できます。
 
 これで2つのデータコレーターが揃いましたので、残りの微調整ステップは標準的なものです。Google Colabで、神話に出てくるP100 GPUを運良く割り当てられなかった場合、学習に時間がかかることがあります😭そこで、まず学習セットのサイズを数千事例までダウンサンプルします。心配しないでください、それでもかなりまともな言語モデルができますよ。🤗 Datasets 内のデータセットは[第5章](/course/ja/chapter5)で紹介した `Dataset.train_test_split()` 関数で簡単にダウンサンプリングすることができます。
 
@@ -835,11 +817,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ ** あなたの番です！ ** データコレーターを全単語マスキングコレーターに変えて、上記のトレーニングを実行してみましょう。より良い結果が得られましたか？
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** データコレーターを全単語マスキングコレーターに変えて、上記のトレーニングを実行してみましょう。より良い結果が得られましたか？
 
 {#if fw === 'pt'} 
 
@@ -1062,8 +1041,5 @@ for pred in preds:
 
 これで、言語モデルを学習するための最初の実験を終えました。[セクション6](/course/ja/chapter7/section6)では、GPT-2のような自己回帰モデルをゼロから学習する方法を学びます。もし、あなた自身のTransformerモデルをどうやって事前学習するか見たいなら、そちらに向かってください
 
-<Tip>
-
-✏️ ** あなたの番です！ ** ドメイン適応の利点を定量化するために、IMDbラベルの分類器を、訓練前と微調整したDistilBERTチェックポイントの両方で微調整してください。テキスト分類について復習が必要な場合は、[第3章](/course/ja/chapter3)をチェックしてみてください。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** ドメイン適応の利点を定量化するために、IMDbラベルの分類器を、訓練前と微調整したDistilBERTチェックポイントの両方で微調整してください。テキスト分類について復習が必要な場合は、[第3章](/course/ja/chapter3)をチェックしてみてください。
diff --git a/chapters/ja/chapter7/4.mdx b/chapters/ja/chapter7/4.mdx
index e49d65d47..e12c26296 100644
--- a/chapters/ja/chapter7/4.mdx
+++ b/chapters/ja/chapter7/4.mdx
@@ -159,11 +159,8 @@ translator(
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **あなたの番です！** フランス語でよく使われるもう1つの英単語は "email "です。学習データセットから、この単語を使った最初のサンプルを見つけてください。どのように翻訳されますか？同じ英文を学習済みモデルはどのように翻訳しているでしょうか？
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** フランス語でよく使われるもう1つの英単語は "email "です。学習データセットから、この単語を使った最初のサンプルを見つけてください。どのように翻訳されますか？同じ英文を学習済みモデルはどのように翻訳しているでしょうか？
 
 ### データを加工する
 
@@ -182,11 +179,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="tf")
 
 また、`model_checkpoint` を [Hub](https://huggingface.co/models) にある好きなモデルや、事前学習したモデルやトークナイザーを保存したローカルフォルダに置き換えることができます。
 
-<Tip>
-
-💡 mBART、mBART-50、M2M100 などの多言語トークナイザーを使用している場合は、トークナイザーの `tokenizer.src_lang` と `tokenizer.tgt_lang` に入力元となる言語とターゲットとなる言語の正しい言語コード値を設定する必要があります。
-
-</Tip>
+> [!TIP]
+> 💡 mBART、mBART-50、M2M100 などの多言語トークナイザーを使用している場合は、トークナイザーの `tokenizer.src_lang` と `tokenizer.tgt_lang` に入力元となる言語とターゲットとなる言語の正しい言語コード値を設定する必要があります。
 
 データの準備はとても簡単です。入力言語は通常通り処理しますが、ターゲット言語についてはトークナイザーをコンテキストマネージャー `as_target_tokenizer()` の中にラップする必要があります。
 
@@ -249,17 +243,11 @@ def preprocess_function(examples):
 
 入力と出力に同様な最大長を設定していることに注意してください。扱うテキストはかなり短いと思われるので、128を使用しています。
 
-<Tip>
+> [!TIP]
+> 💡 T5モデル（具体的には `t5-xxx` チェックポイントの1つ）を使用している場合、モデルはテキスト入力の先頭に `translate: English to French:` のような、タスクを示す接頭辞が付いていることを期待します。
 
-💡 T5モデル（具体的には `t5-xxx` チェックポイントの1つ）を使用している場合、モデルはテキスト入力にタスクを示すプレフィックス、例えば `translate: English to French:` のような、タスクを示す接頭辞を持つテキスト入力であることを期待します。
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️私達はターゲット文のアテンションマスクはモデルが期待していないので、注意を払っていません。その代わり、パディングトークンに対応するラベルに`-100`に設定し、損失計算で無視されるようにします。私達は動的パディングを使用するのでこの処理は後でデータコレーターが行います。しかし、ここでパディングを使用する場合は、パディングトークンに対応する全てのラベルを`-100`に設定するように前処理関数を適応させる必要があります。
-
-</Tip>
+> [!WARNING]
+> ⚠️ モデルはターゲット文のアテンションマスクを必要としないため、ここでは扱っていません。その代わり、パディングトークンに対応するラベルを`-100`に設定し、損失計算で無視されるようにします。私達は動的パディングを使用するので、この処理は後でデータコレーターが行います。しかし、ここでパディングを使用する場合は、パディングトークンに対応する全てのラベルを`-100`に設定するように前処理関数を適応させる必要があります。
 
 これで、データセットのすべての分割に対して、この前処理を一度に適用することができるようになりました。
 
@@ -667,11 +655,8 @@ model.fit(
 
 デフォルトでは、使用するリポジトリはあなたの名前空間内にあり、設定した出力ディレクトリの名前になります。ここでは `"sgugger/marian-finetuned-kde4-en-to-fr"` (このセクションの最初にリンクしたモデルです)となります。
 
-<Tip>
-
-💡 使用している出力ディレクトリがすでに存在する場合、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`model.fit()` を呼び出すときにエラーが発生するので、新しい名前を設定する必要があります。
-
-</Tip>
+> [!TIP]
+> 💡 使用している出力ディレクトリがすでに存在する場合、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`model.fit()` を呼び出すときにエラーが発生するので、新しい名前を設定する必要があります。
 
 最後に、トレーニングが終了した後の指標を見てみましょう。
 
@@ -720,11 +705,8 @@ args = Seq2SeqTrainingArguments(
 なお、 `hub_model_id` 引数には、プッシュしたいリポジトリのフルネームを指定できます。(特に、組織にプッシュする場合は、この引数を使用する必要があります)。例えば、モデルを [`huggingface-course` organization`](https://huggingface.co/huggingface-course) にプッシュする場合、`Seq2SeqTrainingArguments` に `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` を追加しています。デフォルトでは、使用するリポジトリはあなたの名前空間内にあり、設定した出力ディレクトリにちなんだ名前になります。
 この例では `"sgugger/marian-finetuned-kde4-en-to-fr"` となります。(このセクションの冒頭でリンクしたモデルです)
 
-<Tip>
-
-💡 出力先ディレクトリがすでに存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`Seq2SeqTrainer` を定義するときにエラーが発生するので、新しい名前を設定する必要があります。
-
-</Tip>
+> [!TIP]
+> 💡 出力先ディレクトリがすでに存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります。そうでない場合は、`Seq2SeqTrainer` を定義するときにエラーが発生するので、新しい名前を設定する必要があります。
 
 最後に、すべてを `Seq2SeqTrainer` に渡すだけです。
 
@@ -1016,8 +998,5 @@ translator(
 
 ドメイン適応の好例がまた一つ増えましたね！
 
-<Tip>
-
-✏️ **あなたの番です！** 先ほど確認した「email」という単語が入ったサンプルでは、モデルは何を返すでしょうか？
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 先ほど確認した「email」という単語が入ったサンプルでは、モデルは何を返すでしょうか？
diff --git a/chapters/ja/chapter7/5.mdx b/chapters/ja/chapter7/5.mdx
index 1c6d9e224..443011e9e 100644
--- a/chapters/ja/chapter7/5.mdx
+++ b/chapters/ja/chapter7/5.mdx
@@ -85,11 +85,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ ** あなたの番です！ ** `Dataset.shuffle()` コマンドのランダムシードを変更して、コーパスの他のレビューも調べてみてください。もしあなたがスペイン語を話せるなら、`spanish_dataset` にあるいくつかのレビューを見て、タイトルも妥当な要約に見えるかどうか確かめてみてください。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** `Dataset.shuffle()` コマンドのランダムシードを変更して、コーパスの他のレビューも調べてみてください。もしあなたがスペイン語を話せるなら、`spanish_dataset` にあるいくつかのレビューを見て、タイトルも妥当な要約に見えるかどうか確かめてみてください。
 
 このサンプルは、肯定的なレビューから否定的なレビューまで（そしてその中間にある全てのレビュー！）、一般的にオンラインで見られるレビューの多様性を示しています。 「meh」というタイトルはあまり有益な情報を示すタイトルではありませんが、他のタイトルはレビュー自体の適切な要約のように見えます。40万件のレビューすべてについて要約モデルをトレーニングすることは、単一のGPUではあまりにも時間がかかりすぎるため、その代わりに、単一製品のドメインについて要約を生成することに焦点を当てます。どのようなドメインから選択できるかを知るために、`english_dataset` を `pandas.DataFrame` に変換して、製品カテゴリごとのレビュー数を計算してみましょう。
 
@@ -228,11 +225,8 @@ books_dataset = books_dataset.filter(lambda x: len(x["review_title"].split()) >
 
 mT5は接頭辞を使用しませんが、T5の多用途性を共有し、多言語であるという利点があります。さて、モデルを選んだところで、学習用のデータの準備に取りかかりましょう。
 
-<Tip>
-
-✏️ ** あなたの番です！ ** このセクションを終えたら、同じ手法でmBARTを微調整して、mT5がmBARTと比較してどの程度優れているかを見てみましょう。ボーナスポイントとして、英語のレビューだけでT5を微調整してみることもできます。T5には特別な接頭辞プロンプトがあるので、以下の前処理ステップでは入力例の前に`summarize:`を付ける必要があります。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** このセクションを終えたら、同じ手法でmBARTを微調整して、mT5がmBARTと比較してどの程度優れているかを見てみましょう。ボーナスポイントとして、英語のレビューだけでT5を微調整してみることもできます。T5には特別な接頭辞プロンプトがあるので、以下の前処理ステップでは入力例の前に`summarize:`を付ける必要があります。
 
 ## データの前処理
 
@@ -247,11 +241,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡 NLPプロジェクトの初期段階では、「小さな」モデルのクラスを少量のデータサンプルで学習させるのがよい方法です。これにより、エンド・ツー・エンドのワークフローに向けたデバッグと反復をより速く行うことができます。結果に自信が持てたら、モデルのチェックポイントを変更するだけで、いつでもモデルをスケールアップすることができます。
-
-</Tip>
+> [!TIP]
+> 💡 NLPプロジェクトの初期段階では、「小さな」モデルのクラスを少量のデータサンプルで学習させるのがよい方法です。これにより、エンド・ツー・エンドのワークフローに向けたデバッグと反復をより速く行うことができます。結果に自信が持てたら、モデルのチェックポイントを変更するだけで、いつでもモデルをスケールアップすることができます。
 
 少量のサンプルでmT5トークナイザーをテストしてみましょう
 
@@ -309,11 +300,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 さて、コーパスの前処理が終わったところで、要約によく使われるいくつかの指標を見てみましょう。これから見るように、機械が生成した文章の品質を測る際に、万能の手法は存在しません。
 
-<Tip>
-
-💡 上の `Dataset.map()` 関数で `batched=True` を使っていることにお気づきかもしれません。これはサンプルを1,000のバッチ（デフォルト）でエンコードし、🤗 Transformersの高速トークナイザーが持つマルチスレッド機能を利用できるようにするものです。可能であれば、前処理を最大限に活用するために `batched=True` を使ってみてください!
-
-</Tip>
+> [!TIP]
+> 💡 上の `Dataset.map()` 関数で `batched=True` を使っていることにお気づきかもしれません。これはサンプルを1,000のバッチ（デフォルト）でエンコードし、🤗 Transformersの高速トークナイザーが持つマルチスレッド機能を利用できるようにするものです。可能であれば、前処理を最大限に活用するために `batched=True` を使ってみてください!
 
 
 ## 文章要約のための指標
@@ -331,11 +319,8 @@ reference_summary = "I loved reading the Hunger Games"
 
 比較する一つの方法として、重複している単語の数を数えることが考えられますが、この場合、6個となります。しかし、これは少し粗いので、代わりにROUGEは重なり合った部分の _適合率_ と _再現率_ のスコアを計算することを基本としています。
 
-<Tip>
-
-🙋 もしあなたが適合率や再現率について初めて聞いたとしても心配しないでください。すべてを明らかにするために、いくつかの明確な例を一緒に見ていきましょう。これらの指標は通常分類タスクで遭遇するので、その分類タスクの場合に適合率と再現率がどのように定義されているかを理解したい場合は、 `scikit-learn` [guides](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html) をチェックアウトすることをお勧めします。
-
-</Tip>
+> [!TIP]
+> 🙋 もしあなたが適合率や再現率について初めて聞いたとしても心配しないでください。すべてを明らかにするために、いくつかの明確な例を一緒に見ていきましょう。これらの指標は通常分類タスクで遭遇するので、その分類タスクの場合に適合率と再現率がどのように定義されているかを理解したい場合は、 `scikit-learn` [guides](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html) をチェックアウトすることをお勧めします。
 
 ROUGEでは、生成した要約に参照元の要約がどれだけ取り込まれたかを再現率で測定します。単語を比較するだけであれば、以下の式によって再現率を計算することができます。
 
@@ -390,11 +375,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 素晴らしい！適合率と再現率の数値が一致しました。では、他のROUGEスコアについてはどうでしょうか？
 `rouge2` はビッグラム（単語のペアの重なり）の重なりを測定し、 `rougeL` と `rougeLsum` は生成されたサマリーと参照サマリーで最も長い共通部分文字列を探して、最も長くマッチする単語列を測定します。`rougeLsum` の "sum" は、 `rougeL` が個々の文の平均値として計算されるのに対し、この指標は要約全体に対して計算されるという事実を表している。
 
-<Tip>
-
-✏️ **あなたの番です！ ** 生成と参照要約の独自の例を作成し、結果のROUGEスコアが精度とリコールの公式を基にした手動計算と一致するかどうかを確認することができます。ボーナスポイントとして、テキストをビッグラムに分割し、`rouge2` 指標の適合率と制限率を比較します。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 生成要約と参照要約の独自の例を作成し、結果のROUGEスコアが適合率と再現率の公式を基にした手動計算と一致するかどうかを確認してみましょう。ボーナスポイントとして、テキストをビッグラムに分割し、`rouge2` 指標の適合率と再現率を比較してみてください。
 
 このROUGEスコアを使ってモデルのパフォーマンスを追跡していきますが、その前に優れたNLP実践者がすべきこと、それは強力かつシンプルなベースラインを作成することです。
 
@@ -484,11 +466,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 下流のタスクでモデルを微調整に関する警告が表示されないことを不思議に思うかもしれませんが、それはシーケンス間タスクでは、ネットワークのすべての重みが保持されるからです。これを[第3章](/course/ja/chapter3)のテキスト分類モデルと比較してみましょう。テキスト分類モデルでは、事前学習したモデルの先頭をランダムに初期化したネットワークに置き換えています。
-
-</Tip>
+> [!TIP]
+> 💡 下流のタスクでのモデルの微調整に関する警告が表示されないことを不思議に思うかもしれませんが、それはシーケンス間タスクでは、ネットワークのすべての重みが保持されるからです。これを[第3章](/course/ja/chapter3)のテキスト分類モデルと比較してみましょう。テキスト分類モデルでは、事前学習したモデルのヘッドをランダムに初期化したネットワークに置き換えています。
 
 次に必要なのは、ハンギングフェイス ハブにログインすることです。このコードをノートブックで実行する場合は、次のユーティリティ関数で実行できます。
 
@@ -831,11 +810,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 TPUでトレーニングする場合は、上記のコードをすべて専用のトレーニング関数に移動する必要があります。詳しくは[第3章](/course/ja/chapter3)を参照してください。
-
-</Tip>
+> [!TIP]
+> 🚨 TPUでトレーニングする場合は、上記のコードをすべて専用のトレーニング関数に移動する必要があります。詳しくは[第3章](/course/ja/chapter3)を参照してください。
 
 さて、オブジェクトの準備ができたので、残すは3つです。
 
diff --git a/chapters/ja/chapter7/6.mdx b/chapters/ja/chapter7/6.mdx
index 34d0a71b6..fb075a725 100644
--- a/chapters/ja/chapter7/6.mdx
+++ b/chapters/ja/chapter7/6.mdx
@@ -135,11 +135,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-言語モデルのプリトレーニングにはしばらく時間がかかります。まず、上記の2つのデータセットに関する部分を一旦コメント化し、サンプルデータに対して学習ループを実行し、学習が一通り正常に終了してモデルが保存されたことを確認することをお勧めします。フォルダを作り忘れたり、学習ループの最後にタイプミスがあったりして、最後のステップで学習が失敗してしまうことほど悔しいことはありません！
-
-</Tip>
+> [!TIP]
+> 言語モデルのプリトレーニングにはしばらく時間がかかります。まず、上記の2つのデータセットに関する部分を一旦コメント化し、サンプルデータに対して学習ループを実行し、学習が一通り正常に終了してモデルが保存されたことを確認することをお勧めします。フォルダを作り忘れたり、学習ループの最後にタイプミスがあったりして、最後のステップで学習が失敗してしまうことほど悔しいことはありません！
 
 データセット内の例を見てみましょう。ここでは、各フィールドの最初の200文字だけを表示することにします。
 
@@ -252,13 +249,10 @@ DatasetDict({
 
 さて、データセットの準備ができたので、モデルをセットアップしてみましょう!
 
-<Tip>
-
-✏️ **あなたの番です！**
-
-コンテキストサイズより小さい断片を全て取り除くことは、今回は小さなコンテキストウィンドウを使っているので大きな問題ではありませんでした。コンテキストサイズを大きくすると（あるいは短いドキュメントのコーパスがある場合）、捨てられる断片の割合も大きくなります。より効率的なデータの準備方法としては、トークン化されたサンプルを `eos_token_id` トークンを挟んで一括で連結し、連結したデータに対して断片分割を実行することです。練習として、その方法を利用するために `tokenize()` 関数を修正してください。トークン ID の完全なシーケンスを取得するために、 `truncation=False` を設定し、トークナイザーの他の引数を削除する必要があることに注意してください。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！**
+>
+> コンテキストサイズより小さい断片を全て取り除くことは、今回は小さなコンテキストウィンドウを使っているので大きな問題ではありませんでした。コンテキストサイズを大きくすると（あるいは短いドキュメントのコーパスがある場合）、捨てられる断片の割合も大きくなります。より効率的なデータの準備方法としては、トークン化されたサンプルを `eos_token_id` トークンを挟んで一括で連結し、連結したデータに対して断片分割を実行することです。練習として、その方法を利用するために `tokenize()` 関数を修正してください。トークン ID の完全なシーケンスを取得するために、 `truncation=False` を設定し、トークナイザーの他の引数を削除する必要があることに注意してください。
 
 ## 新しいモデルを初期化する
 
@@ -401,11 +395,8 @@ tf_eval_dataset = tokenized_dataset["valid"].to_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ 入力とラベルの位置をずらすのはモデル内部で行われるので、データコレーターは入力をコピーしてラベルを作成するだけです。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 入力とラベルの位置をずらすのはモデル内部で行われるので、データコレーターは入力をコピーしてラベルを作成するだけです。
 
 これで、実際にモデルを訓練するための準備が整いました。
 
@@ -504,25 +495,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
-
-✏️ **あなたの番です！** 生のテキストからGPT-2の学習まで、`TrainingArguments`に加えて、約30行のコードを作成するだけで済みました。あなた自身のデータセットで試してみて、良い結果が得られるかどうか確認してみてください! 
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 もし、複数のGPUを搭載したマシンを利用できるのであれば、そこでコードを実行してみてください。トレーナー`は自動的に複数のマシンを管理するため、学習速度が飛躍的に向上します。
-
-{:else}
-
-💡 もし、複数のGPUを搭載したマシンを利用できるのであれば、`MirroredStrategy`コンテキストを使って、学習を大幅にスピードアップさせることができます。そのためには `tf.distribute.MirroredStrategy` オブジェクトを作成し、 `to_tf_dataset` コマンド、モデルの作成、 `fit()` の呼び出しがすべて `scope()` コンテキストで実行されることを確認する必要があります。これに関するドキュメントは[こちら](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit)で見ることができます。
+> [!TIP]
+> ✏️ **あなたの番です！** 生のテキストからGPT-2の学習まで、`TrainingArguments`に加えて、約30行のコードを作成するだけで済みました。あなた自身のデータセットで試してみて、良い結果が得られるかどうか確認してみてください!
 
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 もし、複数のGPUを搭載したマシンを利用できるのであれば、そこでコードを実行してみてください。`Trainer`は自動的に複数のマシンを管理するため、学習速度が飛躍的に向上します。
+>
+> {:else}
+>
+> 💡 もし、複数のGPUを搭載したマシンを利用できるのであれば、`MirroredStrategy`コンテキストを使って、学習を大幅にスピードアップさせることができます。そのためには `tf.distribute.MirroredStrategy` オブジェクトを作成し、 `to_tf_dataset` コマンド、モデルの作成、 `fit()` の呼び出しがすべて `scope()` コンテキストで実行されることを確認する必要があります。これに関するドキュメントは[こちら](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit)で見ることができます。
+>
+> {/if}
 
 ## パイプラインによるコード生成
 
@@ -808,11 +793,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 TPUでトレーニングする場合は、上記のセルから始まるコードを全て専用のトレーニング関数に移動する必要があります。詳しくは[第3章](/course/ja/chapter3)を参照してください。
-
-</Tip>
+> [!TIP]
+> 🚨 TPUでトレーニングする場合は、上記のセルから始まるコードを全て専用のトレーニング関数に移動する必要があります。詳しくは[第3章](/course/ja/chapter3)を参照してください。
 
 これで `train_dataloader` を `accelerator.prepare()` に送ったので、その長さを用いて学習ステップ数を計算することができます。このメソッドはデータローダーの長さを変更するので、常にデータローダーを準備した後に行う必要があることを忘れないでください。ここでは、学習率から0までの古典的な線形スケジュールを使用します。
 
@@ -914,14 +896,10 @@ for epoch in range(num_train_epochs):
 これで完了です。
 貴方はGPT-2のような因果言語モデルのためのカスタム学習ループを作成できるようになり、更にニーズに合わせてカスタマイズすることができます。
 
-<Tip>
-✏️ **あなたの番です！** 用途に合わせた独自の損失関数を作成するか、トレーニングループに別のカスタムステップを追加してみましょう。
-</Tip>
-
-<Tip>
-
-✏️ **あなたの番です！** 長時間に及ぶ学習実験を行う場合、TensorBoardやWeights & Biasesなどのツールを使って重要な指標を記録しておくとよいでしょう。学習ループに適切なログを追加することで、学習がどのように進んでいるかを常に確認することができます。
+> [!TIP]
+> ✏️ **あなたの番です！** 用途に合わせた独自の損失関数を作成するか、トレーニングループに別のカスタムステップを追加してみましょう。
 
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** 長時間に及ぶ学習実験を行う場合、TensorBoardやWeights & Biasesなどのツールを使って重要な指標を記録しておくとよいでしょう。学習ループに適切なログを追加することで、学習がどのように進んでいるかを常に確認することができます。
 
 {/if}
\ No newline at end of file
diff --git a/chapters/ja/chapter7/7.mdx b/chapters/ja/chapter7/7.mdx
index 4a1aaa4b6..04159df08 100644
--- a/chapters/ja/chapter7/7.mdx
+++ b/chapters/ja/chapter7/7.mdx
@@ -33,13 +33,10 @@
 これは実際にこのセクションで示したコードを使って学習し、ハブにアップロードしたモデルを紹介しているものです。
 貴方は[ここ](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F)でモデルを見つけて、予測を再確認することができます。
 
-<Tip>
-
-💡 BERTのようなエンコーダのみのモデルは、「トランスフォーマーアーキテクチャを発明したのは誰ですか？」のような事実として受け入れられている解答が存在する質問に対して答えを抽出するのには優れていますが、「なぜ、空は青いのですか？」のような自由形式の質問を与えられたときにはうまくいかない傾向があります。
-
-このような難しいケースでは、T5やBARTのようなエンコーダーデコーダーモデルが、[テキスト要約](/course/ja/chapter7/5)に非常に似た典型的な方法で情報を合成するために使用されます。このタイプの *生成的* な質問回答に興味がある場合は、[ELI5データセット](https://huggingface.co/datasets/eli5)に基づく私たちの[デモ](https://yjernite.github.io/lfqa.html)をチェックすることをお勧めします。
-
-</Tip>
+> [!TIP]
+> 💡 BERTのようなエンコーダのみのモデルは、「トランスフォーマーアーキテクチャを発明したのは誰ですか？」のような事実として受け入れられている解答が存在する質問に対して答えを抽出するのには優れていますが、「なぜ、空は青いのですか？」のような自由形式の質問を与えられたときにはうまくいかない傾向があります。
+>
+> このような難しいケースでは、T5やBARTのようなエンコーダーデコーダーモデルが、[テキスト要約](/course/ja/chapter7/5)に非常に似た典型的な方法で情報を合成するために使用されます。このタイプの *生成的* な質問回答に興味がある場合は、[ELI5データセット](https://huggingface.co/datasets/eli5)に基づく私たちの[デモ](https://yjernite.github.io/lfqa.html)をチェックすることをお勧めします。
 
 ## データの準備
 
@@ -362,11 +359,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 確かに、文脈の中に答えが出てきません。
 
-<Tip>
-
-✏️ **あなたの番です！** XLNetアーキテクチャを使用する場合、左側にパディングが適用され、質問とコンテキストが切り替わります。先ほどのコードを全てXLNetアーキテクチャに適応させてください（そして`padding=True`を追加する）。パディングを適用した場合、`[CLS]` トークンが 0 の位置に来ない可能性があることに注意してください。
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** XLNetアーキテクチャを使用する場合、左側にパディングが適用され、質問とコンテキストが切り替わります。先ほどのコードを全てXLNetアーキテクチャに適応させてください（そして`padding=True`を追加する）。パディングを適用した場合、`[CLS]` トークンが 0 の位置に来ない可能性があることに注意してください。
 
 訓練データの前処理を段階的に見てきましたが、訓練データセット全体に対して適用する関数にまとめることができます。ほとんどのコンテキストは長いので（そして対応するサンプルはいくつかの特徴に分割されます）、ここで動的パディングを適用してもあまり意味がないので、すべての特徴を設定した最大長になるようにパディングします。
 
@@ -923,11 +917,8 @@ tf.keras.mixed_precision.set_global_policy("mixed_float16")
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 使用する出力ディレクトリが存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります (したがって、`Trainer` の定義時にエラーが発生した場合は、新しい名前を設定する必要があります)。
-
-</Tip>
+> [!TIP]
+> 💡 使用する出力ディレクトリが存在する場合は、プッシュしたいリポジトリのローカルクローンである必要があります (したがって、`Trainer` の定義時にエラーが発生した場合は、新しい名前を設定する必要があります)。
 
 最後に、すべてを `Trainer` クラスに渡して、トレーニングを開始するだけです。
 
@@ -1012,11 +1003,8 @@ trainer.push_to_hub(commit_message="Training complete")
 
 この段階で、モデルハブ上の推論ウィジェットを使ってモデルをテストし、友人、家族、お気に入りのペットと共有することができます。あなたは、質問応答タスクでモデルの微調整に成功しました。おめでとうございます！
 
-<Tip>
-
-✏️ **あなたの番です！** このタスクでより良いパフォーマンスが得られるかどうか、別のモデルアーキテクチャを試してみてください！
-
-</Tip>
+> [!TIP]
+> ✏️ **あなたの番です！** このタスクでより良いパフォーマンスが得られるかどうか、別のモデルアーキテクチャを試してみてください！
 
 {#if fw === 'pt'}
 
diff --git a/chapters/ja/chapter8/2.mdx b/chapters/ja/chapter8/2.mdx
index 7e6bb36cc..f55569b1e 100644
--- a/chapters/ja/chapter8/2.mdx
+++ b/chapters/ja/chapter8/2.mdx
@@ -85,11 +85,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 このレポートには多くの情報が含まれているので、主要な部分を一緒に見ていきましょう。まず注意すべきは、トレースバックは下から上へ読むということです（日本語の逆の読み方ですね！）。これは、エラートレースバックが、モデルとトークナイザーをダウンロードするときに `pipeline` が行う一連の関数呼び出しを反映していますいます（詳しくは[第２章](/course/chapter2)）をご覧下さい。
 
-<Tip>
-
-🚨 Google Colabのトレースバックで、「6frames」のあたりに青い枠があるのが見えますか？これはColabの特別な機能で、トレースバックを "フレーム "に圧縮しているのです。もしエラーの原因が詳しく見つからないようであれば、この2つの小さな矢印をクリックして、トレースバックの全容を拡大してみてください。
-
-</Tip>
+> [!TIP]
+> 🚨 Google Colabのトレースバックで、「6 frames」のあたりに青い枠があるのが見えますか？これはColabの特別な機能で、トレースバックを「フレーム」に圧縮しているのです。もしエラーの原因が見つからないようであれば、この2つの小さな矢印をクリックして、トレースバックの全体を展開してみてください。
 
 これは、トレースバックの最後の行が、発生したエラーの名前を与えることをします。エラーのタイプは `OSError` で、これはシステム関連のエラーを示しています。付属のエラーメッセージを読むと、モデルの *config.json* ファイルに問題があるようで、それを修正するための2つの提案を与えられています。
 ```python out
@@ -102,10 +99,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 理解しがたいエラーメッセージが表示された場合は、そのメッセージをコピーしてGoogle または [Stack Overflow](https://stackoverflow.com/) の検索バーに貼り付けてください！　そのエラーに遭遇したのはあなたが初めてではない可能性が高いですし、コミュニティの他の人が投稿した解決策を見つける良い方法です。例えば、`OSError: Can't load config for` を Stack Overflow で検索すると、いくつかの [ヒント](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) が見付けられます。これは、問題解決の出発点として使うことができます。
-</Tip>
+> [!TIP]
+> 💡 理解しがたいエラーメッセージが表示された場合は、そのメッセージをコピーしてGoogle または [Stack Overflow](https://stackoverflow.com/) の検索バーに貼り付けてください！　そのエラーに遭遇したのはあなたが初めてではない可能性が高いですし、コミュニティの他の人が投稿した解決策を見つける良い方法です。例えば、`OSError: Can't load config for` を Stack Overflow で検索すると、いくつかの [ヒント](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) が見付けられます。これは、問題解決の出発点として使うことができます。
 
 最初の提案は、モデルIDが実際に正しいかどうかを確認するよう求めているので、まず、識別子をコピーしてHubの検索バーに貼り付けましょう。
 
@@ -156,10 +151,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 同僚は `distilbert-base-uncased` の設定を間違えていじったかもしれないです。実際のところ、私たちはまず彼らに確認したいところですが、このセクションの目的では、彼らがデフォルトの設定を使用したと仮定することにします。
-</Tip>
+> [!WARNING]
+> 🚨 同僚は `distilbert-base-uncased` の設定を間違えていじったかもしれないです。実際のところ、私たちはまず彼らに確認したいところですが、このセクションの目的では、彼らがデフォルトの設定を使用したと仮定することにします。
 
 
 それで`push_to_hub()`　機能でモデルの設定をリポジトリにプッシュすることができます。
diff --git a/chapters/ko/chapter1/3.mdx b/chapters/ko/chapter1/3.mdx
index f0e556a5e..2021d2850 100644
--- a/chapters/ko/chapter1/3.mdx
+++ b/chapters/ko/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 이번 장에서는 트랜스포머(Transformer) 모델을 사용해 무엇을 할 수 있는지 같이 살펴보고,  🤗 Transformers 라이브러리 툴의 첫 사용을 `pipeline()` 함수와 함께 시작하겠습니다.
 
-<Tip>
-👀 오른쪽 상단에 <em>Open in Colab</em> 버튼이 보이시나요? 버튼을 클릭하면 이번 장에서 사용한 모든 코드 샘플들을 Google Colab notebook을 통해 열 수 있습니다. 이런 버튼을 예제 코드를 포함하는 모든 단원에서 발견하실 수 있습니다.
-
-로컬 환경에서 예제 코드를 실행하려면 <a href="/course/chapter0">setup</a>을 살펴보세요.
-</Tip>
+> [!TIP]
+> 👀 오른쪽 상단에 <em>Open in Colab</em> 버튼이 보이시나요? 버튼을 클릭하면 이번 장에서 사용한 모든 코드 샘플들을 Google Colab notebook을 통해 열 수 있습니다. 이런 버튼을 예제 코드를 포함하는 모든 단원에서 발견하실 수 있습니다.
+>
+> 로컬 환경에서 예제 코드를 실행하려면 <a href="/course/chapter0">setup</a>을 살펴보세요.
 
 ## 트랜스포머는 어디에나 있어요!
 
@@ -23,9 +22,8 @@
 
 [🤗 Transformers 라이브러리](https://github.com/huggingface/transformers)는 이렇게 공유한 모델을 사용하고 구축하는 기능들을 제공합니다. [Model Hub](https://huggingface.co/models)에서는 모두가 다운로드 받아 쓸 수 있는 수 천 개의 사전 학습된 모델들이 여러분을 기다리고 있습니다. 여러분만의 모델을 Hub에 업로드하는 것 또한 가능합니다!
 
-<Tip>
-⚠️ The Hugging Face Hub에는 트랜스포머 모델만 있지 않아요. 누구든지 어떠한 종류의 모델이나 데이터를 공유할 수 있습니다! <a href="https://huggingface.co/join">Create a huggingface.co</a> 링크에서 계정을 만들고 모든 기능을 사용해보세요!
-</Tip>
+> [!TIP]
+> ⚠️ The Hugging Face Hub에는 트랜스포머 모델만 있지 않아요. 누구든지 어떠한 종류의 모델이나 데이터를 공유할 수 있습니다! <a href="https://huggingface.co/join">Create a huggingface.co</a> 링크에서 계정을 만들고 모든 기능을 사용해보세요!
 
 트랜스포머 모델 안에서 무슨 일이 벌어지는지 알아보기 전에, 트랜스포머가 NLP 문제 해결에 어떻게 사용되는지 몇 가지 흥미로운 예시들을 살펴보겠습니다.
 
@@ -104,11 +102,8 @@ classifier(
 
 이러한 파이프라인이 제로샷(zero-shot)이라 불리는 이유는 여러분의 데이터에 맞춰 미세 조정(fine-tune)하지 않고도 바로 작업에 사용할 수 있기 때문입니다. 제로샷은 여러분이 원하는 어떠한 분류 레이블에 대해서도 확률 점수를 즉시 반환합니다.
 
-<Tip>
-
-✏️ **직접 해보기!** 여러분이 직접 작성한 시퀀스와 레이블을 사용해 모델이 어떻게 동작하는지 확인해보세요.
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** 여러분이 직접 작성한 시퀀스와 레이블을 사용해 모델이 어떻게 동작하는지 확인해보세요.
 
 
 ## 텍스트 생성(Text generation)
@@ -132,11 +127,8 @@ generator("In this course, we will teach you how to")
 
 `num_return_sequences`라는 인자(argument)를 통해 몇 개의 서로 다른 출력 결과를 생성할지 정할 수 있고, `max_length` 인자를 통해 출력 텍스트의 총 길이를 설정할 수 있습니다.
 
-<Tip>
-
-✏️ **직접 해보기!** `num_return_sequences` 와 `max_length` 인자를 설정해 15개의 단어를 가진 서로 다른 두 개의 문장을 출력해보세요.
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** `num_return_sequences` 와 `max_length` 인자를 설정해 15개의 단어를 가진 서로 다른 두 개의 문장을 출력해보세요.
 
 
 ## 파이프라인에 Hub의 모델 적용하기
@@ -168,11 +160,8 @@ generator(
 
 모델을 클릭하면 온라인상에서 바로 사용 가능한 위젯을 확인할 수 있고, 이를 통해 모델을 직접 다운로드 받기 전에 모델의 기능을 빠르게 테스트 해볼 수 있습니다.
 
-<Tip>
-
-✏️ **직접 해보기!** 영어를 제외한 다른 언어를 생성하는 모델을 검색해보세요. 위젯을 자유롭게 다뤄 보시고 파이프라인을 사용해보세요!
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** 영어를 제외한 다른 언어를 생성하는 모델을 검색해보세요. 위젯을 자유롭게 다뤄 보시고 파이프라인을 사용해보세요!
 
 ### 추론(Inference) API
 
@@ -204,11 +193,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 상위 몇 개의 높은 확률을 띠는 토큰을 출력할지 `top_k` 인자를 통해 조절합니다. 여기서 모델이 특이한 `<mask>` 단어를 채우는 것을 주목하세요. 이를 마스크 토큰(mask token)이라고 부릅니다. 다른 마스크 채우기 모델들은 다른 형태의 마스크 토큰을 사용할 수 있기 때문에 다른 모델을 탐색할 때 항상 해당 모델의 마스크 단어가 무엇인지 확인해야 합니다. 위젯에서 사용되는 마스크 단어를 보고 이를 확인할 수 있습니다.
 
-<Tip>
-
-✏️ **직접 해보기!** Hub에서 `bert-base-cased`를 검색해 보고 추론 API 위젯을 통해 모델의 마스크 단어가 무엇인지 확인해 보세요. 이 모델이 위의 `pipeline` 예제에서 사용한 문장에 대해 어떻게 예측하나요?
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** Hub에서 `bert-base-cased`를 검색해 보고 추론 API 위젯을 통해 모델의 마스크 단어가 무엇인지 확인해 보세요. 이 모델이 위의 `pipeline` 예제에서 사용한 문장에 대해 어떻게 예측하나요?
 
 ## 개체명 인식(Named entity recognition)
 
@@ -232,11 +218,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 파이프라인을 생성하는 함수에  `grouped_entities=True` 옵션을 전달하면 파이프라인이 같은 개체에 해당하는 문장 부분을 다시 그룹화합니다. 이 옵션을 설정하면 모델은 여러 단어로 구성된 단어임에도 “Hugging”과 “Face”를 하나의 기관으로 정확히 분류하게 됩니다. 다음 챕터에서도 확인하겠지만, 놀랍게도 전처리 과정에서 각 단어들은 더 작은 부분으로 쪼개집니다. 예를 들어 `Sylvain` 이라는 단어는 `S`, `##yl`, `##va`, `##in` 이렇게 네 조각으로 쪼개집니다. 후처리 단계에서 파이프라인은 이 조각들을 멋지게 재그룹화합니다.
 
-<Tip>
-
-✏️ **직접 해보기!** Model Hub에서 영어 품사 태깅(part-of-speech tagging, 줄여서 POS)이 가능한 모델을 찾아보세요. 이 모델이 위의 예시 문장으로 무엇을 예측하나요?
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** Model Hub에서 영어 품사 태깅(part-of-speech tagging, 줄여서 POS)이 가능한 모델을 찾아보세요. 이 모델이 위의 예시 문장으로 무엇을 예측하나요?
 
 ## 질의 응답(Question-answering)
 
@@ -319,10 +302,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 텍스트 생성 및 요약에서와 마찬가지로, `max_length` 혹은 `min_length` 를 지정하여 결과를 출력할 수 있습니다.
 
-<Tip>
-
-✏️ **직접 해보기!** 다른 언어를 지원하는 번역 모델을 검색해보고 위의 문장을 몇 가지 다른 언어들로 번역해 봅시다.
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** 다른 언어를 지원하는 번역 모델을 검색해보고 위의 문장을 몇 가지 다른 언어들로 번역해 봅시다.
 
 지금까지 보여드린 파이프라인들은 대부분 특정 작업을 위해 프로그래밍된 데모용 파이프라인으로, 여러 태스크를 동시에 지원하지는 않습니다. 다음 단원에서는  `pipeline()` 함수 내부를 살펴보고 그 동작 방식을 직접 설계하는 방법에 대해 다루겠습니다.
\ No newline at end of file
diff --git a/chapters/ko/chapter1/4.mdx b/chapters/ko/chapter1/4.mdx
index 9bb7ca6bb..bf066cb1a 100644
--- a/chapters/ko/chapter1/4.mdx
+++ b/chapters/ko/chapter1/4.mdx
@@ -44,7 +44,7 @@
 
 이러한 종류의 모델은 학습한 언어에 대해 통계 기반의 방식으로 이해를 하지만, 이는 몇몇 실생활 문제에 적합하지 않습니다. 그렇기 때문에 사전 학습된 모델은 *전이 학습(transfer learning)*이라 불리는 과정을 거칩니다. 이 과정에서 모델은 특정 작업에 맞춰 지도적(supervised)인 방법, 즉 사람이 레이블을 추가한 데이터를 사용하는 방법으로 미세 조정(fine-tune)이 이루어지는 단계를 거칩니다.
 
-하나의 예시로 문장 내에서 이전 *n*개의 단어를 읽고 다음에 올 단어를 에측하는 문제를 들 수 있습니다. 이를 과거와 현재의 입력 정보를 이용하는 방식(미래에 올 입력 정보는 이용하지 않습니다)이기 때문에 *인과적 언어 모델링(causal language modeling)*이라고 부릅니다.
+하나의 예시로 문장 내에서 이전 *n*개의 단어를 읽고 다음에 올 단어를 예측하는 문제를 들 수 있습니다. 이를 과거와 현재의 입력 정보를 이용하는 방식(미래에 올 입력 정보는 이용하지 않습니다)이기 때문에 *인과적 언어 모델링(causal language modeling)*이라고 부릅니다.
 
 <div class="flex justify-center">
 <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/causal_modeling.svg" alt="Example of causal language modeling in which the next word from a sentence is predicted.">
@@ -150,7 +150,7 @@
 
 ## 원본 구조
 
-트랜스포머 구조는 처음에 번역을 위해 만들어졌습니다. 학습시에 인코더는 특정 언어의 입력 문장을 받고, 동시에 디코더는 타겟 언어로된 동일한 의미의 문장을 받습니다. 인코더에서 어텐션 레이어는 문장 내의 모든 단어를 활요할 수 있습니다(방금 보았듯이 주어진 단어의 번역은 문장의 전후를 살펴보아야 하니까요). 반면, 디코더는 순차적으로 작동하기 때문에 문장 내에서 이미 번역이 이루어진 부분에만 주의를 기울일 수 밖에 없습니다. 이로 인해 현재 생성(번역)되고 있는 단어의 앞에 단어들만 이용할 수 있죠. 예시로, 번역된 타겟의 처음 세 단어를 예측해 놨을 때, 이 결과를 디코더로 넘기면 디코더는 인코더로부터 받은 모든 입력 정보를 함께 이용해 네 번째 올 단어를 예측하는 것입니다.
+트랜스포머 구조는 처음에 번역을 위해 만들어졌습니다. 학습시에 인코더는 특정 언어의 입력 문장을 받고, 동시에 디코더는 타겟 언어로된 동일한 의미의 문장을 받습니다. 인코더에서 어텐션 레이어는 문장 내의 모든 단어를 활용할 수 있습니다(방금 보았듯이 주어진 단어의 번역은 문장의 전후를 살펴보아야 하니까요). 반면, 디코더는 순차적으로 작동하기 때문에 문장 내에서 이미 번역이 이루어진 부분에만 주의를 기울일 수 밖에 없습니다. 이로 인해 현재 생성(번역)되고 있는 단어의 앞에 단어들만 이용할 수 있죠. 예시로, 번역된 타겟의 처음 세 단어를 예측해 놨을 때, 이 결과를 디코더로 넘기면 디코더는 인코더로부터 받은 모든 입력 정보를 함께 이용해 네 번째 올 단어를 예측하는 것입니다.
 
 모델이 타겟 문장에 대한 액세스(access)가 있는 상황에서, 훈련 속도를 높이기 위해 디코더는 전체 타겟을 제공하지만 뒤에 올 단어들을 사용할 수 없습니다. (모델이 두 번째 올 단어를 예측하기 위해 두 번째 위치 단어를 접근할 수 있다면 예측이 의미없어지겠죠?) 예를 들어, 네 번째 단어를 예측할 때 어텐션 레이어는 1~3 번째 단어에만 액세스하도록 합니다.
 
@@ -167,10 +167,10 @@
 
 ## 구조(Architectures) vs. 체크포인트(Checkpoints)
 
-트랜스포머 모델을 본격적으로 공부하기 앞서, 모델(models)과 함께 *구조(architectures)*와 *체크포인트(checkpoints)*라는 단어를 들으시게 될겁니다. 이 셋은 아래와 같이 조금 다른 의미를 갖고 있습니다:
+트랜스포머 모델을 본격적으로 공부하기 앞서, 모델(models)과 함께 *구조(architectures)*와 *체크포인트(checkpoints)*라는 단어를 듣게 되실 겁니다. 이 셋은 아래와 같이 조금 다른 의미를 갖고 있습니다:
 
 * **구조(Architecture)**: 모델의 뼈대를 의미하는 용어로, 모델 내부의 각 레이어와 각 연산 작용들을 의미합니다.
 * **체크포인트(Checkpoints)**: 주어진 구조(architecture)에 적용될 가중치들을 의미합니다.
 * **모델(Model)**: 사실 모델은 “구조”나 “가중치”만큼 구체적이지 않은, 다소 뭉뚱그려 사용되는 용어입니다. 이 강의에서는 모호함을 피하기 위해 *구조(architecture)*와 *체크포인트(checkpoint)*를 구분해서 사용하도록 하겠습니다.
 
-예를 들면, BERT는 구조에 해당하고, Google 팀이 최초 공개에서 내놓은 학습 가중치 셋인 `bert-base-cased`는 체크포인에 해당합니다. 그렇지만 “BERT 모델”, “`bert-base-cased` 모델” 등과 같이 구분하지 않고 사용하기도 합니다.
+예를 들면, BERT는 구조에 해당하고, Google 팀이 최초 공개에서 내놓은 학습 가중치 셋인 `bert-base-cased`는 체크포인트에 해당합니다. 그렇지만 “BERT 모델”, “`bert-base-cased` 모델” 등과 같이 구분하지 않고 사용하기도 합니다.
diff --git a/chapters/ko/chapter2/1.mdx b/chapters/ko/chapter2/1.mdx
index 1d28b3a27..ae4f1b82c 100644
--- a/chapters/ko/chapter2/1.mdx
+++ b/chapters/ko/chapter2/1.mdx
@@ -19,6 +19,5 @@
 
 그런 다음 `pipeline()` 함수의 중요한 구성요소인 tokenizer API를 살펴보겠습니다. tokenizer는 처리의 첫 번째 단계인 텍스트를 신경망의 수치 입력으로 바꾸는 부분과 필요할 때 다시 텍스트로 바꾸는 마지막 단계, 즉 양끝을 다룹니다. 마지막으로 여러 문장을 묶어서 모델에게 제공하는 방법을 알아보고, 기존 `tokenizer()` 함수를 자세히 살펴봄으로써 마무리짓겠습니다.
 
-<Tip>
-⚠️ Model Hub와 🤗 Transformers에서 사용할 수 있는 모든 기능을 활용하려면 <a href="https://huggingface.co/join">계정을 만드는 게</a> 좋습니다.
-</Tip>
+> [!TIP]
+> ⚠️ Model Hub와 🤗 Transformers에서 사용할 수 있는 모든 기능을 활용하려면 <a href="https://huggingface.co/join">계정을 만드는 게</a> 좋습니다.
diff --git a/chapters/ko/chapter2/2.mdx b/chapters/ko/chapter2/2.mdx
index 7468ea435..87d93f7bf 100644
--- a/chapters/ko/chapter2/2.mdx
+++ b/chapters/ko/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-PyTorch를 사용하는지, TensorFlow를 사용하는지에 따라 내용이 약간 달라지는 첫 번째 섹션입니다. 제목 상단 스위치를 이용해 선호하는 플랫폼을 선택하세요!
-</Tip>
+> [!TIP]
+> PyTorch를 사용하는지, TensorFlow를 사용하는지에 따라 내용이 약간 달라지는 첫 번째 섹션입니다. 제목 상단 스위치를 이용해 선호하는 플랫폼을 선택하세요!
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -346,8 +345,5 @@ model.config.id2label
 
 파이프라인 세 단계-토크나이저를 이용한 전처리, 모델에 입력 넣어주기, 후처리-를 성공적으로 재현했습니다! 이제 각 단계별로 좀 더 깊게 알아보는 시간을 가져봅시다.
 
-<Tip>
-
-✏️ **직접 해보세요!** 2개 이상의 문장을 골라 `sentiment-analysis` 파이프라인을 적용해보세요. 이 챕터에서 본 내용을 그대로 수행해보고 같은 결과가 나오는지 확인해보세요!
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보세요!** 2개 이상의 문장을 골라 `sentiment-analysis` 파이프라인을 적용해보세요. 이 챕터에서 본 내용을 그대로 수행해보고 같은 결과가 나오는지 확인해보세요!
diff --git a/chapters/ko/chapter2/4.mdx b/chapters/ko/chapter2/4.mdx
index fed8fc9cf..035a16b61 100644
--- a/chapters/ko/chapter2/4.mdx
+++ b/chapters/ko/chapter2/4.mdx
@@ -217,11 +217,8 @@ print(ids)
 
 적절한 프레임워크의 텐서로 변환되고 나면 이 출력 결과는 이전 장에서 본 것처럼 모델 입력으로 사용될 수 있습니다.
 
-<Tip>
-
-✏️ **직접 해보세요!** 2장에서 사용한 입력 문장("I've been waiting for a HuggingFace course my whole life."과 "I hate this so much!")을 이용해 두 단계(토큰화와 입력 ID로의 변환)를 수행해보세요. 위에서 얻은 결과와 당신이 얻은 결과가 동일한지 확인해보세요!
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보세요!** 2장에서 사용한 입력 문장("I've been waiting for a HuggingFace course my whole life."과 "I hate this so much!")을 이용해 두 단계(토큰화와 입력 ID로의 변환)를 수행해보세요. 위에서 얻은 결과와 당신이 얻은 결과가 동일한지 확인해보세요!
 
 ## 디코딩[[decoding]]
 
diff --git a/chapters/ko/chapter2/5.mdx b/chapters/ko/chapter2/5.mdx
index faf950b4f..280959193 100644
--- a/chapters/ko/chapter2/5.mdx
+++ b/chapters/ko/chapter2/5.mdx
@@ -180,11 +180,8 @@ batched_ids = [ids, ids]
 
 동일한 문장 2개로 만든 배치입니다!
 
-<Tip>
-
-✏️ **직접 해보세요!** 이 `batched_ids` 리스트를 텐서로 변환하고 모델로 전달해보세요. 이전에 얻은 로짓과 동일한 결과를 얻는지 확인해보세요. (개수는 두 개여야 합니다!)
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보세요!** 이 `batched_ids` 리스트를 텐서로 변환하고 모델로 전달해보세요. 이전에 얻은 로짓과 동일한 결과를 얻는지 확인해보세요. (개수는 두 개여야 합니다!)
 
 배치는 여러 개의 문장을 모델로 넘겼을 때도 모델이 작동하게 합니다. 다중 시퀀스를 사용하는 것은 단일 시퀀스로 배치를 만드는 것만큼 간단합니다. 하지만 두 번째 문제가 있습니다. 두 개 이상의 문장을 배치로 만드려고 할 때, 그 문장들은 아마 다른 길이를 가지고 있을 것입니다. 이전에 텐서를 다뤄본 사람이라면, 텐서의 형태가 직사각형이어야 한다는 것을 알고 있습니다. 문장 길이가 다르면 입력 ID 리스트를 텐서로 바로 변환할 수 없습니다. 이 문제를 해결하기 위해, 일반적으로 입력에 *패드*를 추가합니다.
 
@@ -316,11 +313,8 @@ tf.Tensor(
 
 두 번째 문장의 어텐션 마스크에서 마지막 값인 0은 패딩 ID라는 것을 잊지 마세요.
 
-<Tip>
-
-✏️ **직접 해보세요!** 2장에서 사용한 두 개의 문장("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!")을 이용해 직접 토큰화를 적용해보세요. 토큰화 결과를 모델에 넘기고 2장에서 얻은 것과 동일한 로짓을 얻었는지 확인해보세요. 이제 Now batch them together using the padding token, then create the proper attention mask. Check that you obtain the same results when going through the model!
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보세요!** 2장에서 사용한 두 개의 문장("I've been waiting for a HuggingFace course my whole life."와 "I hate this so much!")을 이용해 직접 토큰화를 적용해보세요. 토큰화 결과를 모델에 넘기고 2장에서 얻은 것과 동일한 로짓을 얻었는지 확인해보세요. 이제 패딩 토큰을 사용해 두 문장을 하나의 배치로 묶고, 적절한 어텐션 마스크를 만들어보세요. 모델을 통과시켰을 때 동일한 결과를 얻는지 확인해보세요!
 
 ## 길이가 긴 시퀀스[[longer-sequences]]
 
diff --git a/chapters/ko/chapter3/2.mdx b/chapters/ko/chapter3/2.mdx
index 106801824..25b180f3f 100644
--- a/chapters/ko/chapter3/2.mdx
+++ b/chapters/ko/chapter3/2.mdx
@@ -45,11 +45,8 @@ Hub에는 모델뿐만 아니라 다양한 언어로 된 여러 데이터 세트
 
 🤗 Datasets 라이브러리는 Hub에서 데이터 세트를 다운로드하고 캐시하는 매우 간단한 명령을 제공합니다. MRPC 데이터 세트를 다음과 같이 다운로드할 수 있습니다.
 
-<Tip>
-
-💡 **추가 자료**: 더 많은 데이터 세트 로딩 기법과 예제를 보려면 [🤗 Datasets 문서](https://huggingface.co/docs/datasets/)를 확인하세요.
-
-</Tip> 
+> [!TIP]
+> 💡 **추가 자료**: 더 많은 데이터 세트 로딩 기법과 예제를 보려면 [🤗 Datasets 문서](https://huggingface.co/docs/datasets/)를 확인하세요. 
 
 ```py
 from datasets import load_dataset
@@ -77,11 +74,8 @@ DatasetDict({
 
 보시다시피, 훈련 세트, 검증 세트, 테스트 세트가 포함된 `DatasetDict` 객체를 얻습니다. 각각은 여러 열(`sentence1`, `sentence2`, `label`, `idx`)과 가변적인 행 수를 포함하며, 이는 각 세트의 요소 수입니다(따라서 훈련 세트에는 3,668개의 문장 쌍이, 검증 세트에는 408개가, 테스트 세트에는 1,725개가 있습니다).
 
-<Tip>
-
-이 명령은 기본적으로 *~/.cache/huggingface/datasets*에 데이터 세트를 다운로드하고 캐시합니다. 2장에서 언급했듯이 `HF_HOME` 환경 변수를 설정하여 캐시 폴더를 맞춤 설정할 수 있습니다.
-
-</Tip>
+> [!TIP]
+> 이 명령은 기본적으로 *~/.cache/huggingface/datasets*에 데이터 세트를 다운로드하고 캐시합니다. 2장에서 언급했듯이 `HF_HOME` 환경 변수를 설정하여 캐시 폴더를 맞춤 설정할 수 있습니다.
 
 딕셔너리처럼 인덱싱하여 `raw_datasets` 객체의 각 문장 쌍에 접근할 수 있습니다.
 
@@ -112,11 +106,8 @@ raw_train_dataset.features
 
 내부적으로 `label`은 `ClassLabel` 타입이며, 정수와 레이블 이름의 매핑이 *names* 폴더에 저장되어 있습니다. `0`은 `not_equivalent`에, `1`은 `equivalent`에 해당합니다.
 
-<Tip>
-
-✏️ **직접 해보기!** 훈련 세트의 15번째 요소와 검증 세트의 87번째 요소를 살펴보세요. 그들의 레이블은 무엇인가요?
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** 훈련 세트의 15번째 요소와 검증 세트의 87번째 요소를 살펴보세요. 그들의 레이블은 무엇인가요?
 
 ### 데이터 세트 전처리[[preprocessing-a-dataset]]
 
@@ -133,11 +124,8 @@ tokenized_sentences_1 = tokenizer(raw_datasets["train"]["sentence1"])
 tokenized_sentences_2 = tokenizer(raw_datasets["train"]["sentence2"])
 ```
 
-<Tip>
-
-💡 **심화 학습**: 더 고급 토큰화 기법과 다양한 토크나이저가 작동하는 방식을 이해하려면 [🤗 Tokenizers 문서](https://huggingface.co/docs/transformers/main/en/tokenizer_summary)와 [쿡북의 토큰화 가이드](https://huggingface.co/learn/cookbook/en/advanced_rag#tokenization-strategies)를 살펴보세요.
-
-</Tip>
+> [!TIP]
+> 💡 **심화 학습**: 더 고급 토큰화 기법과 다양한 토크나이저가 작동하는 방식을 이해하려면 [🤗 Tokenizers 문서](https://huggingface.co/docs/transformers/main/en/tokenizer_summary)와 [쿡북의 토큰화 가이드](https://huggingface.co/learn/cookbook/en/advanced_rag#tokenization-strategies)를 살펴보세요.
 
 하지만 두 시퀀스를 모델에 전달하기만 해서는 두 문장이 패러프레이즈인지 아닌지 예측할 수 없습니다. 두 시퀀스를 쌍으로 처리하고 적절한 전처리를 적용해야 합니다. 다행히 토크나이저는 한 쌍의 시퀀스를 받아서 BERT 모델이 기대하는 방식으로 준비할 수도 있습니다.
 
@@ -156,11 +144,8 @@ inputs
 
 [2장](/course/chapter2)에서 `input_ids`와 `attention_mask` 키에 대해 논의했지만, `token_type_ids`에 대한 이야기는 미뤄두었습니다. 이 예제에서 이것은 입력의 어느 부분이 첫 번째 문장이고 어느 부분이 두 번째 문장인지 모델에 알려주는 역할을 합니다.
 
-<Tip>
-
-✏️ **직접 해보기!** 훈련 세트의 15번째 요소를 가져와서 두 문장을 따로따로 토큰화하고 쌍으로도 토큰화해보세요. 두 결과의 차이점은 무엇인가요?
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** 훈련 세트의 15번째 요소를 가져와서 두 문장을 따로따로 토큰화하고 쌍으로도 토큰화해보세요. 두 결과의 차이점은 무엇인가요?
 
 `input_ids` 안의 ID를 다시 단어로 디코딩하면
 
@@ -215,11 +200,8 @@ def tokenize_function(example):
 
 지금은 토큰화 함수에서 `padding` 인수를 빼둔 것에 주목하세요. 모든 샘플을 최대 길이로 패딩하는 것은 효율적이지 않기 때문입니다. 배치를 만들 때 샘플을 패딩하는 것이 더 좋습니다. 그러면 해당 배치의 최대 길이까지만 패딩하면 되고, 전체 데이터 세트의 최대 길이까지 패딩할 필요가 없기 때문입니다. 입력의 길이가 매우 다양할 때 많은 시간과 처리 능력을 절약할 수 있습니다!
 
-<Tip>
-
-📚 **성능 팁**: 효율적인 데이터 처리 기법에 대한 자세한 내용은 [🤗 Datasets 성능 가이드](https://huggingface.co/docs/datasets/about_arrow)에서 배울 수 있습니다.
-
-</Tip>
+> [!TIP]
+> 📚 **성능 팁**: 효율적인 데이터 처리 기법에 대한 자세한 내용은 [🤗 Datasets 성능 가이드](https://huggingface.co/docs/datasets/about_arrow)에서 배울 수 있습니다.
 
 다음은 모든 데이터 세트에 토큰화 함수를 한 번에 적용하는 방법입니다. `map` 호출에서 `batched=True`를 사용하므로 함수가 데이터 세트의 각 요소에 개별적으로가 아니라 여러 요소에 한 번에 적용됩니다. 이를 통해 더 빠른 전처리가 가능합니다.
 
@@ -259,11 +241,8 @@ DatasetDict({
 
 배치 내에서 샘플들을 함께 배치하는 역할을 하는 함수를 *collate function*이라고 합니다. 이는 `DataLoader`를 구축할 때 전달할 수 있는 인수로, 기본값은 샘플을 PyTorch 텐서로 변환하고 연결하는 함수입니다(요소가 목록, 튜플 또는 딕셔너리인 경우 재귀적으로). 우리의 경우 입력이 모두 같은 크기가 아니므로 이것은 불가능할 것입니다. 우리는 의도적으로 패딩을 연기하여 각 배치에서만 필요에 따라 적용하고 많은 패딩이 있는 지나치게 긴 입력을 피했습니다. 이것은 훈련을 상당히 가속화할 것이지만, TPU에서 훈련하는 경우 문제를 일으킬 수 있다는 점에 주의하세요 — TPU는 추가 패딩이 필요하더라도 고정된 모양을 선호합니다.
 
-<Tip>
-
-🚀 **최적화 가이드**: 패딩 전략과 TPU 고려사항을 포함한 훈련 성능 최적화에 대한 자세한 내용은 [🤗 Transformers 성능 문서](https://huggingface.co/docs/transformers/main/en/performance)를 참조하세요.
-
-</Tip>
+> [!TIP]
+> 🚀 **최적화 가이드**: 패딩 전략과 TPU 고려사항을 포함한 훈련 성능 최적화에 대한 자세한 내용은 [🤗 Transformers 성능 문서](https://huggingface.co/docs/transformers/main/en/performance)를 참조하세요.
 
 실제로 이를 수행하려면 함께 배치하려는 데이터 세트 항목에 적절한 양의 패딩을 적용할 collate function을 정의해야 합니다. 다행히 🤗 Transformers 라이브러리는 `DataCollatorWithPadding`을 통해 이러한 함수를 제공합니다. 인스턴스화할 때 토크나이저를 받아서(어떤 패딩 토큰을 사용할지, 모델이 입력의 왼쪽 또는 오른쪽에 패딩을 기대하는지 알기 위해) 필요한 모든 것을 수행합니다.
 
@@ -301,13 +280,10 @@ batch = data_collator(samples)
 
 좋아 보입니다! 이제 원시 텍스트에서 모델이 처리할 수 있는 배치까지 만들었으므로, 미세 조정할 준비가 되었습니다!
 
-<Tip>
-
-✏️ **직접 해보기!** GLUE SST-2 데이터 세트에서 전처리를 복제해보세요. 쌍이 아닌 단일 문장으로 구성되어 있어 약간 다르지만, 나머지는 동일하게 보일 것입니다. 더 어려운 도전을 위해서는 GLUE 작업 중 어떤 것에서도 작동하는 전처리 함수를 작성해보세요.
-
-📖 **추가 연습**: [🤗 Transformers 예제](https://huggingface.co/docs/transformers/main/en/notebooks)에서 이러한 실습 예제들을 확인해보세요.
-
-</Tip>
+> [!TIP]
+> ✏️ **직접 해보기!** GLUE SST-2 데이터 세트에서 전처리를 복제해보세요. 쌍이 아닌 단일 문장으로 구성되어 있어 약간 다르지만, 나머지는 동일하게 보일 것입니다. 더 어려운 도전을 위해서는 GLUE 작업 중 어떤 것에서도 작동하는 전처리 함수를 작성해보세요.
+>
+> 📖 **추가 연습**: [🤗 Transformers 예제](https://huggingface.co/docs/transformers/main/en/notebooks)에서 이러한 실습 예제들을 확인해보세요.
 
 완벽합니다! 이제 🤗 Datasets 라이브러리의 최신 모범 사례로 데이터를 전처리했으므로, 최신 Trainer API를 사용하여 모델을 훈련할 준비가 되었습니다. 다음 섹션에서는 Hugging Face 생태계에서 사용할 수 있는 최신 기능과 최적화를 사용하여 모델을 효과적으로 미세 조정하는 방법을 보여드리겠습니다.
 
@@ -435,12 +411,9 @@ batch = data_collator(samples)
 	]}
 />
 
-<Tip>
-
-💡 **핵심 요점**
-- 전처리를 훨씬 빠르게 하려면 `Dataset.map()`에서 `batched=True`를 사용하세요
-- `DataCollatorWithPadding`을 사용한 동적 패딩이 고정 길이 패딩보다 효율적입니다
-- 모델의 추론 결과물(수치적 텐서, 올바른 열 이름)에 맞게 항상 데이터를 전처리하세요
-- 🤗 Datasets 라이브러리는 대규모 효율적인 데이터 처리를 위한 강력한 도구를 제공합니다
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> 💡 **핵심 요점**
+> - 전처리를 훨씬 빠르게 하려면 `Dataset.map()`에서 `batched=True`를 사용하세요
+> - `DataCollatorWithPadding`을 사용한 동적 패딩이 고정 길이 패딩보다 효율적입니다
+> - 모델의 추론 결과물(수치적 텐서, 올바른 열 이름)에 맞게 항상 데이터를 전처리하세요
+> - 🤗 Datasets 라이브러리는 대규모 효율적인 데이터 처리를 위한 강력한 도구를 제공합니다
\ No newline at end of file
diff --git a/chapters/ko/chapter3/3.mdx b/chapters/ko/chapter3/3.mdx
index 4f1b830f3..5f3604aa4 100644
--- a/chapters/ko/chapter3/3.mdx
+++ b/chapters/ko/chapter3/3.mdx
@@ -13,11 +13,8 @@
 
 🤗 Transformers는 `Trainer` 클래스를 제공합니다. 이 클래스를 사용하면 사전 학습된 모델을 여러분의 데이터셋에 맞춰 최신 기법으로 쉽게 미세 조정할 수 있습니다. 이전 섹션에서 데이터 전처리 작업을 모두 마쳤다면 이제 몇 단계만 거치면 `Trainer`를 정의할 수 있습니다. 가장 어려운 부분은 `Trainer.train()`을 실행할 환경을 준비하는 과정일 수 있습니다. 이 작업은 CPU에서 매우 느리게 실행되기 때문입니다. 만약 GPU가 없다면 [Google Colab](https://colab.research.google.com/)에서 무료로 제공하는 GPU나 TPU를 이용할 수 있습니다.
 
-<Tip>
-
-📚 **훈련 리소스**: 훈련을 시작하기 전에 포괄적인 [🤗 Transformers 훈련 가이드](https://huggingface.co/docs/transformers/main/en/training)를 숙지하고 [미세 조정 쿡북](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu)의 실용적인 예제를 살펴보세요.
-
-</Tip>
+> [!TIP]
+> 📚 **훈련 리소스**: 훈련을 시작하기 전에 포괄적인 [🤗 Transformers 훈련 가이드](https://huggingface.co/docs/transformers/main/en/training)를 숙지하고 [미세 조정 쿡북](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu)의 실용적인 예제를 살펴보세요.
 
 아래 코드 예시는 이전 섹션의 코드를 모두 실행했다는 가정하에 작동합니다. 즉, 다음 사항들이 필요합니다.
 
@@ -50,11 +47,8 @@ training_args = TrainingArguments("test-trainer")
 
 훈련 중에 모델을 Hub에 자동으로 업로드하려면 `TrainingArguments`에서 `push_to_hub=True`를 전달하세요. 이 기능에 대해서는 [Chapter 4](/course/chapter4/3)에서 자세히 알아보겠습니다.
 
-<Tip>
-
-🚀 **고급 설정**: 사용 가능한 모든 훈련 인수와 최적화 전략에 대한 자세한 정보는 [TrainingArguments 문서](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments)와 [훈련 구성 쿡북](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu)을 참고하세요.
-
-</Tip>
+> [!TIP]
+> 🚀 **고급 설정**: 사용 가능한 모든 훈련 인수와 최적화 전략에 대한 자세한 정보는 [TrainingArguments 문서](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments)와 [훈련 구성 쿡북](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu)을 참고하세요.
 
 두 번째 단계는 모델을 정의하는 것입니다. [이전 챕터](/course/chapter2)에서와 같이 두 개의 라벨과 함께 `AutoModelForSequenceClassification` 클래스를 사용하겠습니다.
 
@@ -83,11 +77,8 @@ trainer = Trainer(
 
 `processing_class`에 토크나이저를 전달하면, `Trainer`가 기본적으로 `DataCollatorWithPadding`을 `data_collator`로 사용합니다. 따라서 이 경우에는 `data_collator=data_collator` 줄을 생략할 수 있지만, 데이터 처리 파이프라인의 중요한 부분을 보여드리기 위해 코드에 포함했습니다.
 
-<Tip>
-
-📖 **더 자세히 알아보기**: Trainer 클래스와 그 매개변수에 대한 자세한 내용은 [Trainer API 문서](https://huggingface.co/docs/transformers/main/en/main_classes/trainer)를 방문하고 [훈련 쿡북 레시피](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu)에서 고급 사용 패턴을 살펴보세요.
-
-</Tip>
+> [!TIP]
+> 📖 **더 자세히 알아보기**: Trainer 클래스와 그 매개변수에 대한 자세한 내용은 [Trainer API 문서](https://huggingface.co/docs/transformers/main/en/main_classes/trainer)를 방문하고 [훈련 쿡북 레시피](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu)에서 고급 사용 패턴을 살펴보세요.
 
 데이터셋에서 모델을 미세 조정하려면 `Trainer`의 `train()` 메소드를 호출하기만 하면 됩니다.
 
@@ -137,11 +128,8 @@ metric.compute(predictions=preds, references=predictions.label_ids)
 {'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}
 ```
 
-<Tip>
-
-다양한 평가 메트릭과 전략에 대해 알아보려면 [🤗 Evaluate 문서](https://huggingface.co/docs/evaluate/)를 참고하세요.
-
-</Tip>
+> [!TIP]
+> 다양한 평가 메트릭과 전략에 대해 알아보려면 [🤗 Evaluate 문서](https://huggingface.co/docs/evaluate/)를 참고하세요.
 
 모델 헤드의 가중치가 무작위로 초기화되기 때문에 얻게 되는 결과는 조금씩 다를 수 있습니다. 결과를 보면 우리 모델이 검증 세트에서 85.78%의 정확도와 89.97%의 F1 점수를 달성했음을 볼 수 있습니다. 이 두 가지는 GLUE 벤치마크의 MRPC 데이터셋에서 결과를 평가하는 데 사용되는 메트릭입니다. [BERT 논문](https://arxiv.org/pdf/1810.04805.pdf)에서는 기본 모델의 F1 점수를 88.9로 보고하였습니다. 당시에는 `uncased` 모델을 사용했지만, 우리는 현재 `cased` 모델을 사용하고 있기 때문에 더 나은 결과가 나온 것입니다.
 
@@ -216,21 +204,15 @@ training_args = TrainingArguments(
 )
 ```
 
-<Tip>
-
-🎯 **성능 최적화**: 분산 훈련, 메모리 최적화, 하드웨어별 최적화를 포함한 고급 훈련 기술에 대해서는 [🤗 Transformers 성능 가이드](https://huggingface.co/docs/transformers/main/en/performance)를 살펴보세요.
-
-</Tip>
+> [!TIP]
+> 🎯 **성능 최적화**: 분산 훈련, 메모리 최적화, 하드웨어별 최적화를 포함한 고급 훈련 기술에 대해서는 [🤗 Transformers 성능 가이드](https://huggingface.co/docs/transformers/main/en/performance)를 살펴보세요.
 
 `Trainer`는 여러 GPU 또는 TPU에서 즉시 작동하며 분산 훈련을 위한 많은 옵션을 제공합니다. 이와 관련된 모든 내용은 Chapter 10에서 다루겠습니다.
 
 이것으로 `Trainer` API를 사용한 미세 조정 소개를 마칩니다. 대부분의 일반적인 NLP 작업에 대한 예제는 [Chapter 7](/course/chapter7)에서 다룰 예정이며, 다음으로는 순수 PyTorch 코드로 동일한 작업을 수행하는 방법을 살펴보겠습니다.
 
-<Tip>
-
-📝 **더 많은 예제**: [🤗 Transformers 노트북](https://huggingface.co/docs/transformers/main/en/notebooks)에 있는 방대한 자료를 확인해 보세요.
-
-</Tip>
+> [!TIP]
+> 📝 **더 많은 예제**: [🤗 Transformers 노트북](https://huggingface.co/docs/transformers/main/en/notebooks)에 있는 방대한 자료를 확인해 보세요.
 
 ## 섹션 퀴즈[[section-quiz]]
 
@@ -380,14 +362,11 @@ Trainer API와 미세 조정 개념에 대한 이해를 테스트해보세요.
 	]}
 />
 
-<Tip>
-
-💡 **핵심 요점:**
-- `Trainer` API는 대부분의 훈련 복잡성을 처리하는 높은 수준의 인터페이스를 제공합니다.
-- `processing_class`는 적절한 데이터 처리를 위해 토크나이저를 저장하는 데 사용됩니다.
-- `TrainingArguments`는 학습률, 배치 크기, 평가 전략, 최적화 등 훈련의 모든 측면을 제어합니다.
-- `compute_metrics`를 사용하면 훈련 손실 외에 사용자 정의 평가 메트릭을 활용할 수 있습니다.
-- 혼합 정밀도(`fp16=True`)와 그레이디언트 누적과 같은 최신 기능은 훈련 효율성을 크게 향상시킬 수 있습니다.
-
-</Tip>
+> [!TIP]
+> 💡 **핵심 요점:**
+> - `Trainer` API는 대부분의 훈련 복잡성을 처리하는 높은 수준의 인터페이스를 제공합니다.
+> - `processing_class`는 적절한 데이터 처리를 위해 토크나이저를 저장하는 데 사용됩니다.
+> - `TrainingArguments`는 학습률, 배치 크기, 평가 전략, 최적화 등 훈련의 모든 측면을 제어합니다.
+> - `compute_metrics`를 사용하면 훈련 손실 외에 사용자 정의 평가 메트릭을 활용할 수 있습니다.
+> - 혼합 정밀도(`fp16=True`)와 그레이디언트 누적과 같은 최신 기능은 훈련 효율성을 크게 향상시킬 수 있습니다.
 
diff --git a/chapters/ko/chapter3/5.mdx b/chapters/ko/chapter3/5.mdx
index e0624b3d4..3fec97344 100644
--- a/chapters/ko/chapter3/5.mdx
+++ b/chapters/ko/chapter3/5.mdx
@@ -76,11 +76,8 @@ trainer.train()
 - **훈련과 함께 증가**: 모델이 데이터의 패턴을 학습할 수 있다면 학습함에 따라 정확도가 일반적으로 향상되어야 합니다.
 - **고원 현상을 보일 수 있음**: 모델이 실제 레이블에 가까운 예측을 만들어내므로, 정확도는 부드럽게 상승하기보다는 계단식으로 점프하는 경우가 많습니다. 
 
-<Tip>
-
-💡 **정확도 곡선이 "계단식"인 이유**: 연속적인 손실과 달리 정확도는 이산적인 예측을 실제 레이블과 비교하여 계산됩니다. 모델 신뢰도의 작은 개선은 최종 예측을 변경하지 않을 수 있어서 임계값을 넘을 때까지 정확도가 평평하게 유지됩니다.
-
-</Tip>
+> [!TIP]
+> 💡 **정확도 곡선이 "계단식"인 이유**: 연속적인 손실과 달리 정확도는 이산적인 예측을 실제 레이블과 비교하여 계산됩니다. 모델 신뢰도의 작은 개선은 최종 예측을 변경하지 않을 수 있어서 임계값을 넘을 때까지 정확도가 평평하게 유지됩니다.
 
 ### 수렴[[convergence]]
 
@@ -110,14 +107,11 @@ trainer.train()
 
 예를 들어, 고양이(0)와 개(1)를 구별하는 이진 분류기에서 모델이 개 이미지(실제 값 1)에 대해 0.3을 예측하면 이는 0으로 반올림되어 잘못된 분류입니다. 다음 단계에서 0.4를 예측하면 여전히 틀렸습니다. 0.4가 0.3보다 1에 더 가깝기 때문에 손실은 감소했지만 정확도는 변하지 않아 고원을 만듭니다. 정확도는 모델이 0.5보다 큰 값을 예측하여 1로 반올림될 때만 점프합니다.
 
-<Tip>
-
-**건전한 곡선의 특성**
-- **손실의 부드러운 감소**: 훈련 및 검증 손실이 모두 꾸준히 감소
-- **훈련/검증 성능이 근접**: 훈련 및 검증 지표 간의 작은 격차
-- **수렴**: 곡선이 평평해져서 모델이 패턴을 학습했음을 나타냄
-
-</Tip>
+> [!TIP]
+> **건전한 곡선의 특성**
+> - **손실의 부드러운 감소**: 훈련 및 검증 손실이 모두 꾸준히 감소
+> - **훈련/검증 성능이 근접**: 훈련 및 검증 지표 간의 작은 격차
+> - **수렴**: 곡선이 평평해져서 모델이 패턴을 학습했음을 나타냄
 
 ### 실용적인 예시[[practical-examples]]
 
@@ -141,16 +135,14 @@ trainer.train()
 3. **일반화**: 훈련과 검증 성능이 얼마나 가까운가?
 4. **경향**: 추가 훈련이 성능을 향상시킬 가능성이 있는가?
 
-<Tip>
-
-🔍 **W&B 대시보드 기능**: Weights & Biases를 사용하면 학습 곡선을 보기 좋고 상호작용 가능한 그래프로 자동으로 만들 수 있습니다.
-- 여러 실행을 나란히 비교
-- 사용자 정의 지표 및 시각화 추가
-- 이상 동작에 대한 알림 설정
-- 팀과 결과 공유
-
-[Weights & Biases 문서](https://docs.wandb.ai/)에서 자세히 알아보세요.
-</Tip>
+> [!TIP]
+> 🔍 **W&B 대시보드 기능**: Weights & Biases를 사용하면 학습 곡선을 보기 좋고 상호작용 가능한 그래프로 자동으로 만들 수 있습니다.
+> - 여러 실행을 나란히 비교
+> - 사용자 정의 지표 및 시각화 추가
+> - 이상 동작에 대한 알림 설정
+> - 팀과 결과 공유
+>
+> [Weights & Biases 문서](https://docs.wandb.ai/)에서 자세히 알아보세요.
 
 #### 과적합[[overfitting]]
 
@@ -279,19 +271,16 @@ training_args = TrainingArguments(
 
 학습 곡선을 이해하는 것은 효과적인 기계학습 전문가가 되기 위해 중요합니다. 이러한 시각적 도구는 모델의 훈련 진행 상황에 대한 즉각적인 피드백을 제공하고 언제 훈련을 중단하거나 하이퍼파라미터를 조정하거나 다른 접근 방식을 시도할지에 대한 정보에 기반한 결정을 내리는 데 도움이 됩니다. 연습을 통해 건전한 학습 곡선이 어떤 모습인지, 그리고 문제가 발생했을 때 어떻게 해결할지에 대한 직관적인 이해를 개발할 수 있습니다.
 
-<Tip>
-
-💡 **핵심 요점**
-- 학습 곡선은 모델 훈련 진행 상황을 이해하는 데 필수적인 도구입니다.
-- 손실과 정확도 곡선을 모두 모니터링하되, 서로 다른 특성을 가지고 있음을 기억하세요.
-- 과적합은 훈련/검증 성능의 분기로 나타납니다.
-- 과소적합은 훈련과 검증 데이터 모두에서 성능이 좋지 않은 것으로 나타납니다.
-- Weights & Biases와 같은 도구는 학습 곡선을 쉽게 추적하고 분석할 수 있게 해줍니다.
-- 조기 중단과 적절한 정규화는 대부분의 일반적인 훈련 문제를 해결할 수 있습니다.
-
-🔬 **다음 단계**: 자신의 미세 조정 실험에서 학습 곡선을 분석해보세요. 다양한 하이퍼파라미터를 시도하고 곡선 모양에 어떤 영향을 미치는지 관찰하세요. 이러한 실습 경험이 훈련 진행 상황을 읽는 직관을 개발하는 가장 좋은 방법입니다.
-
-</Tip>
+> [!TIP]
+> 💡 **핵심 요점**
+> - 학습 곡선은 모델 훈련 진행 상황을 이해하는 데 필수적인 도구입니다.
+> - 손실과 정확도 곡선을 모두 모니터링하되, 서로 다른 특성을 가지고 있음을 기억하세요.
+> - 과적합은 훈련/검증 성능의 분기로 나타납니다.
+> - 과소적합은 훈련과 검증 데이터 모두에서 성능이 좋지 않은 것으로 나타납니다.
+> - Weights & Biases와 같은 도구는 학습 곡선을 쉽게 추적하고 분석할 수 있게 해줍니다.
+> - 조기 중단과 적절한 정규화는 대부분의 일반적인 훈련 문제를 해결할 수 있습니다.
+>
+> 🔬 **다음 단계**: 자신의 미세 조정 실험에서 학습 곡선을 분석해보세요. 다양한 하이퍼파라미터를 시도하고 곡선 모양에 어떤 영향을 미치는지 관찰하세요. 이러한 실습 경험이 훈련 진행 상황을 읽는 직관을 개발하는 가장 좋은 방법입니다.
 
 ## 섹션 퀴즈[[section-quiz]]
 
diff --git a/chapters/ko/chapter5/2.mdx b/chapters/ko/chapter5/2.mdx
index 67ccff72a..f44e82872 100644
--- a/chapters/ko/chapter5/2.mdx
+++ b/chapters/ko/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 명령어를 통해 압축 파일들이 각각 _SQuAD_it-train.json_, _SQuAD_it-test.json_ 로 바뀌어 저장되어 있는 것을 볼 수 있습니다.
 
-<Tip>
-
-✎ 위 쉘 명령어에 `!`가 붙는 이유는 주피터 노트북에서 실행하고 있기 때문입니다. 터미널에서 명령어를 실행한다면 `!`를 없애주시면 됩니다.
-
-</Tip>
+> [!TIP]
+> ✎ 위 쉘 명령어에 `!`가 붙는 이유는 주피터 노트북에서 실행하고 있기 때문입니다. 터미널에서 명령어를 실행한다면 `!`를 없애주시면 됩니다.
 
 `load_dataset()` 함수로 JSON 파일을 로딩할 때, 파일이 일반 JSON (nested dictionary와 유사)으로 형태인 지 혹은 JSON Lines (줄로 구분된 JSON)형태인 지 알아야 합니다. 많은 질의응답 데이터셋처럼, SQuAD-it 데이터셋은 `data` 필드에 텍스트가 모두 저장된 nested 포맷을 사용하며, 이는 다음과 같이 `field` 인수를 지정하여 데이터셋을 로딩할 수 있음을 의미합니다:
 
@@ -126,11 +123,8 @@ DatasetDict({
 
 이것이 바로 우리가 원했던 것입니다. 이제 다양한 전처리 기법을 적용하여 데이터를 정리하고, 리뷰를 토큰화하는 등의 작업을 수행할 수 있습니다.
 
-<Tip>
-
-`load_dataset()` 함수의 `data_files` 인수는 매우 유연하여 하나의 파일 경로 (str), 파일 경로들의 리스트 (list) 또는 스플릿 이름을 각 파일 경로에 매핑하는 dictionary를 값으로 받을 수 있습니다. 또한 Unix 쉘에서 사용하는 규칙에 따라 지정된 패턴과 일치하는 파일들을 glob할 수도 있습니다. (예를 들어, `data_files="*.json"`와 같이 설정하면 한 폴더에 있는 모든 JSON 파일들을 하나의 스플릿으로 glob할 수 있습니다.) 자세한 내용은 🤗 Datasets [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files)에서 확인할 수 있습니다.
-
-</Tip>
+> [!TIP]
+> `load_dataset()` 함수의 `data_files` 인수는 매우 유연하여 하나의 파일 경로 (str), 파일 경로들의 리스트 (list) 또는 스플릿 이름을 각 파일 경로에 매핑하는 dictionary를 값으로 받을 수 있습니다. 또한 Unix 쉘에서 사용하는 규칙에 따라 지정된 패턴과 일치하는 파일들을 glob할 수도 있습니다. (예를 들어, `data_files="*.json"`와 같이 설정하면 한 폴더에 있는 모든 JSON 파일들을 하나의 스플릿으로 glob할 수 있습니다.) 자세한 내용은 🤗 Datasets [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files)에서 확인할 수 있습니다.
 
 🤗 Datasets 로딩 스크립트는 입력 파일의 자동 압축 해제를 지원하기 때문에 실제로 사용할 때는 `data_files` 인수로 압축 파일을 입력하면 `gzip` 사용을 생략할 수 있습니다.
 
@@ -158,10 +152,7 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 이와 같은 방법을 통해 위에서 얻은 것과 같이 `DatasetDict` 객체를 생성할 수 있으며 수동으로 파일들을 다운로드받고 압축을 푸는 작업을 생략할 수 있습니다. 지금까지 Hugging Face Hub에서 호스팅되어 있지 않은 데이터셋을 로딩하는 다양한 방법을 알아보았습니다. 이제 다뤄볼 데이터셋을 얻었으니 다양한 데이터 전처리 기법을 배워보도록 합시다.
 
-<Tip>
-
-✏️ **시도해 보세요!** GitHub 또는 [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)에서 호스팅되는 다른 데이터셋을 선택하여 위에서 배운 기술을 통해 로컬 또는 원격으로 로딩해 보시고, 추가적으로 CSV 또는 텍스트 포맷으로 저장된 데이터셋도 로딩을 시도해 보시길 바랍니다. (이 포맷들에 대한 자세한 정보는 [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files)을 참조하세요.)
-
-</Tip>
+> [!TIP]
+> ✏️ **시도해 보세요!** GitHub 또는 [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)에서 호스팅되는 다른 데이터셋을 선택하여 위에서 배운 기술을 통해 로컬 또는 원격으로 로딩해 보시고, 추가적으로 CSV 또는 텍스트 포맷으로 저장된 데이터셋도 로딩을 시도해 보시길 바랍니다. (이 포맷들에 대한 자세한 정보는 [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files)을 참조하세요.)
 
 
diff --git a/chapters/ko/chapter8/2.mdx b/chapters/ko/chapter8/2.mdx
index 1970013e5..7bc822937 100644
--- a/chapters/ko/chapter8/2.mdx
+++ b/chapters/ko/chapter8/2.mdx
@@ -86,11 +86,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 이 리포트에는 많은 정보를 담고 있으니, 같이 핵심 부분을 살펴보겠습니다. 우선 명심해야할 것은 tracebacks은 _아래부터 위로_ 읽어야 합니다. 이러한 말은 영어 텍스트를 위에서 아래로 읽어오곤 했다면 이상하게 들릴 수 있겠지만 모델과 토크나이저를 다운로드 할 때 `pipeline`이 만드는 함수 호출 순서를 보여주는 traceback을 반영했기 때문입니다. 내부에서 `pipeline`이 작동하는 방식에 대한 자세한 내용은 [단원 2](/course/chapter2)를 참고하세요.
 
-<Tip>
-
-Google Colab의 traceback에서 "6 frames" 주변의 파란 상자를 보셨나요?  traceback을 "frames"로 압축하는 Colab의 특별한 기능입니다. 만약 오류의 원인을 찾을 수 없다면, 두개의 작은 화살표를 클릭해서 전체 traceback을 확장되어 있는지 여부를 확인하세요.
-
-</Tip>
+> [!TIP]
+> Google Colab의 traceback에서 "6 frames" 주변의 파란 상자를 보셨나요? 이는 traceback을 "frames"로 압축하는 Colab의 특별한 기능입니다. 오류의 원인을 찾을 수 없다면, 두 개의 작은 화살표를 클릭해서 전체 traceback을 펼쳤는지 확인하세요.
 
 즉 마지막 에러 메시지와 발생한 예외의 이름을 가리키는 traceback의 마지막 줄을 뜻합니다. 이 경우의 예외 유형은 시스템 관련 오류를 나타내는 OS Error 입니다. 첨부된 오류 메시지를 읽으면 모델의 *config.json* 파일에 문제가 있는 것으로 보이며 이를 수정하기 위해 두 가지 선택지가 있습니다:
 
@@ -104,11 +101,8 @@ make sure that:
 """
 ```
 
-<Tip>
-
-💡 이해하기 어려운 에러 메시지를 접하게 된다면, 메세지를 복사해서 Google 또는 [스택오버플로우](https://stackoverflow.com/) 검색창에 붙여 넣기만 하세요(네 진짭니다!). 이는 오류가 발생한 첫 사람이 아닐 가능성이 높을뿐더러, 커뮤니티의 다른 사람들이 게시한 솔루션을 찾는 좋은 방법입니다. 예를 들어, 스택오버플로우에서 'OSError: Can't load config for'를 검색하면 여러 [해답](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+)을 제공하며 문제 해결을 위한 출발점으로 사용할 수 있습니다.
-
-</Tip>
+> [!TIP]
+> 💡 이해하기 어려운 에러 메시지를 접하게 된다면, 메시지를 복사해서 Google 또는 [스택오버플로우](https://stackoverflow.com/) 검색창에 붙여 넣기만 하세요(네, 진짜입니다!). 여러분이 이 오류를 처음 겪은 사람이 아닐 가능성이 높으며, 이는 커뮤니티의 다른 사람들이 게시한 솔루션을 찾는 좋은 방법입니다. 예를 들어, 스택오버플로우에서 `OSError: Can't load config for`를 검색하면 문제 해결의 출발점으로 삼을 수 있는 여러 [결과](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+)가 나옵니다.
 
 첫 번째 제안은 모델 ID가 실제로 정확한지 확인하도록 요청하는 것으로 비즈니스의 첫 순서는 식별자(모델 이름)를 복사하여 Hub의 검색 창에 붙여넣는 것입니다:
 
@@ -160,11 +154,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 여기에서 하는 접근 방식은 동료가 'distilbert-base-uncased'의 config를 수정했을 수 있으므로 이 접근 방식은 완전하지 않습니다. 우리는 동료에게 먼저 확인하고 싶겠지만, 이번 장에서의 목적상, 동료가 디폴트 config를 사용했다고 가정하겠습니다.
-
-</Tip>
+> [!WARNING]
+> 🚨 동료가 `distilbert-base-uncased`의 config를 수정했을 수 있으므로 여기에서 사용하는 접근 방식은 완전하지 않습니다. 실제로는 동료에게 먼저 확인하는 것이 좋겠지만, 이 섹션의 목적상 동료가 기본 config를 사용했다고 가정하겠습니다.
 
 그런 다음 config 클래스의 `push_to_hub()` 기능을 사용해서 config 파일을 모델 저장소로 푸시할 수 있습니다:
 We can then push this to our model repository with the configuration's `push_to_hub()` function:
diff --git a/chapters/ko/chapter8/4.mdx b/chapters/ko/chapter8/4.mdx
index ee408de5b..878ccc25e 100644
--- a/chapters/ko/chapter8/4.mdx
+++ b/chapters/ko/chapter8/4.mdx
@@ -241,11 +241,8 @@ trainer.train_dataset.features["label"].names
 
 여기에는 token type IDs 가 없습니다. DistilBERT는 사용하지 않기 때문입니다. token type IDs를 사용하는 모델의 경우 입력에서 첫 번째 및 두 번째 문장이 있는 위치와 올바르게 일치하는지 확인해야 합니다.
 
-<Tip>
-
-✏️ **여러분 차례입니다!** 학습 데이터 세트의 두 번째 원소가 정상적인지 확인해보세요.
-
-</Tip>
+> [!TIP]
+> ✏️ **여러분 차례입니다!** 학습 데이터 세트의 두 번째 원소가 정상적인지 확인해보세요.
 
 여기에선 학습 세트에 대해서만 확인하지만,  동일한 방식으로 검증 및 평가 세트를 다시 확인해야 합니다.
 
@@ -507,11 +504,8 @@ trainer.optimizer.step()
 
 이 문제를 해결하려면 GPU 공간을 적게 사용하면 됩니다. 먼저 GPU에 동시에 두 개의 모델이 있지 않은지 확인합니다(물론 문제 해결에 필요한 경우 제외). 그런 다음 배치 크기를 줄여야 합니다. 이는 모델의 모든 중간 결과값 크기와 기울기에 직접적인 영향을 미치기 때문입니다. 문제가 지속되면 더 작은 모델 버전을 사용하는 것이 좋습니다.
 
-<Tip>
-
-코스의 다음 부분에서는 메모리 사용량을 줄이고 가장 큰 모델을 파인 튜닝할 수 있는 고급 기술을 살펴보겠습니다.
-
-</Tip>
+> [!TIP]
+> 코스의 다음 부분에서는 메모리 사용량을 줄이고 가장 큰 모델을 파인 튜닝할 수 있는 고급 기술을 살펴보겠습니다.
 
 ### 모델 평가하기
 
@@ -538,11 +532,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 에러가 발생하기 전에 많은 컴퓨팅 리소스를 낭비하지 않도록 항상 `trainer.train()`을 실행하기 전에 `trainer.evaluate()`를 실행할 수 있는지 확인해야 합니다.
-
-</Tip>
+> [!TIP]
+> 💡 에러가 발생하기 전에 많은 컴퓨팅 리소스를 낭비하지 않도록 항상 `trainer.train()`을 실행하기 전에 `trainer.evaluate()`를 실행할 수 있는지 확인해야 합니다.
 
 평가 루프에서 문제를 디버깅하기 전에 먼저 데이터를 살펴보았는지, 배치를 적절하게 구성할 수 있는지, 모델을 실행할 수 있는지 확인해야 합니다. 모든 단계를 완료했으므로 다음의 코드를 에러 없이 실행할 수 있습니다.:
 
@@ -668,11 +659,8 @@ trainer.train()
 ```
 이 경우 더 이상 문제가 없으며 스크립트는 모델을 파인튜닝 할 것이고 합리적인 결과를 제공해줄 것입니다. 그러나 학습이 에러 없이 진행되었고 학습된 모델이 전혀 잘 작동하지 않을 때 우리는 무엇을 할 수 있을까요? 이것이 기계 학습의 가장 어려운 부분이며 도움이 될 수 있는 몇 가지 기술을 보여 드리겠습니다.
 
-<Tip>
-
-💡 수동 학습 루프를 사용하는 경우 학습 파이프라인을 디버그하기 위해 동일한 단계가 적용되지만 더 쉽게 분리할 수 있습니다. 하지만 올바른 위치의 `model.eval()` 또는 `model.train()` 또는 각 단계의 `zero_grad()`를 잊지 않았는지 확인하세요!
-
-</Tip>
+> [!TIP]
+> 💡 수동 학습 루프를 사용하는 경우 학습 파이프라인을 디버그하기 위해 동일한 단계가 적용되지만 더 쉽게 분리할 수 있습니다. 하지만 올바른 위치의 `model.eval()` 또는 `model.train()` 또는 각 단계의 `zero_grad()`를 잊지 않았는지 확인하세요!
 
 ## 학습 중 조용한 에러 디버깅
 
@@ -687,11 +675,8 @@ trainer.train()
 - 다른 레이블보다 정답에 가까운 레이블이 있는지?
 - 모델이 무작위 답, 언제나 같은 답을 예측할 경우 어떤 손실 함수와 매트릭을 정해야할지?
 
-<Tip warning={true}>
-
-분산 학습을 수행하는 경우 각 프로세스에서 데이터 세트의 샘플을 출력하고 동일한 결과를 얻었는지 세 번 확인하세요. 한 가지 일반적인 버그는 데이터 생성 시 각 프로세스가 서로 다른 버전의 데이터 세트를 갖도록 만드는 임의성의 원인이 있다는 것입니다.
-
-</Tip>
+> [!WARNING]
+> 분산 학습을 수행하는 경우 각 프로세스에서 데이터 세트의 샘플을 출력하고 동일한 결과를 얻었는지 세 번 확인하세요. 한 가지 일반적인 버그는 데이터 생성 시 각 프로세스가 서로 다른 버전의 데이터 세트를 갖도록 만드는 임의성의 원인이 있다는 것입니다.
 
 데이터를 살펴본 후 모델의 몇 가지 예측을 살펴보고 디코딩합니다. 모델이 항상 동일한 것을 예측하는 경우 데이터 세트가 하나의 범주(분류 문제의 경우)로 편향되어 있기 때문일 수 있습니다. 희귀 클래스를 오버샘플링하는 것과 같은 기술이 도움이 될 수 있습니다.
 
@@ -722,11 +707,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 학습 데이터가 불균형한 경우 모든 레이블을 포함하는 학습 데이터 배치를 빌드해야 합니다.
-
-</Tip>
+> [!TIP]
+> 💡 학습 데이터가 불균형한 경우 모든 레이블을 포함하는 학습 데이터 배치를 빌드해야 합니다.
 
 결과 모델은 동일한 `batch`에서 완벽에 가까운 결과를 가져야 합니다. 결과 예측에 대한 메트릭을 계산해 보겠습니다.:
 
@@ -747,11 +729,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 모델이 이와 같이 완벽한 결과를 얻지 못한다면 문제 또는 데이터를 구성하는 방식에 문제가 있음을 의미하므로 이를 수정해야 합니다. 과적합 테스트를 통과해야만 모델이 실제로 무언가를 배울 수 있다는 것을 확신할 수 있습니다.
 
-<Tip warning={true}>
-
-⚠️ 이 테스트 후에는 모델과 `Trainer`를 다시 만들어야 합니다. 학습한 모델은 전체 데이터 세트에서 유용한 것을 재구성하거나 학습할 수 없기 때문입니다.
-
-</Tip>
+> [!WARNING]
+> ⚠️ 이 테스트 후에는 모델과 `Trainer`를 다시 만들어야 합니다. 학습한 모델은 전체 데이터 세트에서 유용한 것을 재구성하거나 학습할 수 없기 때문입니다.
 
 ### 첫 기준 모델을 만들기전엔 튜닝하지 마세요.
 
diff --git a/chapters/ko/chapter8/4_tf.mdx b/chapters/ko/chapter8/4_tf.mdx
index 5640c6aba..2f53385da 100644
--- a/chapters/ko/chapter8/4_tf.mdx
+++ b/chapters/ko/chapter8/4_tf.mdx
@@ -112,15 +112,12 @@ model.compile(optimizer="adam")
 이제 모델의 내부 손실 함수를 사용하게 될 것이고 문제가 해결 될 겁니다!
 Now we'll use the model's internal loss, and this problem should be resolved!
 
-<Tip>
-
-✏️ **여러분 차례입니다!** 다른 문제를 해결 후 추가 도전으로, 이 단계로 돌아와 모델이 내부 손실 대신 원래 Keras 손실함수로 작동하도록 할 수 있습니다. 레이블이 올바르게 출력되도록 하려면 `to_tf_dataset()`의 `label_cols` 인수에 `"labels"`를 추가해야 합니다. 그러면 그래디언트가 계산됩니다. 하지만 우리가 지정한 손실함수에는 한 가지 문제가 더 있습니다. 학습은 문제가 있음에도 불구하고 계속 실행되지만 학습이 매우 느리고 높은 train 손실값에서 정체 될 수 있습니다. 왜 그런지 알아 볼까요?
- 
-만약 어렵다면 ROT13 인코드 방식의 힌트를 보세요 Transformers에서 SequenceClassification 모델의 출력을 보면 첫 번째 출력은 'logits'입니다. logits란 무엇일까요?(살면서 ROT13 인코딩은 처음 봤네요, 궁금하면 영어 원본으로 보세요.)
-
-두번째 힌트: 옵티마이저, 활성함수 또는 손실함수를 문자열로 지정하면 Keras는 해당 함수의 모든 인수 값을 기본값으로 설정합니다. SparseCategoricalCrossentropy에는 어떤 인수가 있으며 기본값은 무엇일까요?
-
-</Tip>
+> [!TIP]
+> ✏️ **여러분 차례입니다!** 다른 문제를 해결 후 추가 도전으로, 이 단계로 돌아와 모델이 내부 손실 대신 원래 Keras 손실함수로 작동하도록 할 수 있습니다. 레이블이 올바르게 출력되도록 하려면 `to_tf_dataset()`의 `label_cols` 인수에 `"labels"`를 추가해야 합니다. 그러면 그래디언트가 계산됩니다. 하지만 우리가 지정한 손실함수에는 한 가지 문제가 더 있습니다. 학습은 문제가 있음에도 불구하고 계속 실행되지만 학습이 매우 느리고 높은 train 손실값에서 정체 될 수 있습니다. 왜 그런지 알아 볼까요?
+>
+> 만약 어렵다면 다음 힌트를 참고하세요(영어 원문에서는 ROT13으로 인코딩되어 있습니다): Transformers에서 SequenceClassification 모델의 출력을 보면 첫 번째 출력은 'logits'입니다. logits란 무엇일까요?
+>
+> 두번째 힌트: 옵티마이저, 활성함수 또는 손실함수를 문자열로 지정하면 Keras는 해당 함수의 모든 인수 값을 기본값으로 설정합니다. SparseCategoricalCrossentropy에는 어떤 인수가 있으며 기본값은 무엇일까요?
 
 이제 훈련을 해보죠. 그라디언트를 가져와야 하므로 부디(갑자기 불길한 음악이 재생됨) `model.fit()`을 호출하면 모든 것이 잘 작동할 것입니다!
 
@@ -364,11 +361,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-🤗 Transformers에서 `create_optimizer()` 함수를 가져올 수도 있습니다. 이렇게 하면 적합한 가중치 감쇠(weight decay)와 학습률 워밍업과 감쇠가 포함된 AdamW 옵티마이저가 제공됩니다. 이 옵티마이저는 기본 Adam 옵티마이저로 얻은 결과보다 약간 더 나은 결과를 만들어내는 경우가 많습니다.
-
-</Tip>
+> [!TIP]
+> 🤗 Transformers에서 `create_optimizer()` 함수를 가져올 수도 있습니다. 이렇게 하면 적합한 가중치 감쇠(weight decay)와 학습률 워밍업과 감쇠가 포함된 AdamW 옵티마이저가 제공됩니다. 이 옵티마이저는 기본 Adam 옵티마이저로 얻은 결과보다 약간 더 나은 결과를 만들어내는 경우가 많습니다.
 
 이제 새롭고, 개선된 학습률로 모델 학습에 도전해보겠습니다.:
 
@@ -391,11 +385,8 @@ model.fit(train_dataset)
 
 메모리 부족을 알리는 신호는 "OOM when allocating tensor"와 같은 에러입니다. OOM은 "out of memory"의 줄임말입니다. 이것은 큰 언어 모델을 다룰 때 매우 흔한 위험입니다. 이 문제가 발생하면 배치 크기를 절반으로 줄이고 다시 시도하는 것이 좋습니다. 그러나 일부 모델은 *매우* 큽니다. 예를 들어, 최대크기 GPT-2는 1.5B개의 매개변수가 있습니다. 즉, 모델을 저장하는 데만 6GB의 메모리가 필요하고 그라디언트에는 6GB가 추가로 필요합니다! 최대 크기 GPT-2 모델을 학습하려면 사용하는 배치 크기에 관계없이 일반적으로 20GB 이상의 VRAM이 필요하며, 소수의 GPU만 해당합니다. 'distilbert-base-cased'와 같은 더 가벼운 모델은 실행하기가 훨씬 쉽고 훨씬 빠르게 학습할 수 있습니다.
 
-<Tip>
-
-다음 장에서는 메모리 사용량을 줄이고 가장 큰 모델을 파인튜닝 할 수 있는 고급 기술을 살펴보겠습니다.
-
-</Tip>
+> [!TIP]
+> 다음 장에서는 메모리 사용량을 줄이고 가장 큰 모델을 파인튜닝 할 수 있는 고급 기술을 살펴보겠습니다.
 
 ### 몹시 배고픈 TensorFlow 🦛
 
@@ -450,21 +441,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 학습 데이터가 불균형한 경우 모든 레이블을 포함하는 학습 데이터 배치를 구성해야 합니다.
-
-</Tip>
+> [!TIP]
+> 💡 학습 데이터가 불균형한 경우 모든 레이블을 포함하는 학습 데이터 배치를 구성해야 합니다.
 
 결과 모델은 `배치`에서 완벽에 가까운 결과를 가져야 하며 손실은 0(또는 사용 중인 손실의 최소값)으로 빠르게 감소합니다.
 
 모델이 이와 같은 완벽한 결과를 얻도록 관리하지 못한다면 문제 또는 데이터를 구성하는 방식에 문제가 있음을 의미하므로 수정해야 합니다. 과적합 테스트를 통과해야만 모델이 실제로 무언가를 학습할 수 있다고 확신할 수 있습니다.
 
-<Tip warning={true}>
-
-⚠️ 과적합 테스트 후에 모델을 다시 만들고 다시 컴파일해야 합니다. 학습 모델이 전체 데이터 세트에서 유용한 것을 복구하고 학습할 수 없기 때문입니다.
-
-</Tip>
+> [!WARNING]
+> ⚠️ 과적합 테스트 후에 모델을 다시 만들고 다시 컴파일해야 합니다. 학습 모델이 전체 데이터 세트에서 유용한 것을 복구하고 학습할 수 없기 때문입니다.
 
 ### 첫 번째 기준이 생길 때까지 아무 것도 조정하지 마세요.
 
diff --git a/chapters/ko/chapter8/5.mdx b/chapters/ko/chapter8/5.mdx
index 229cb67da..57ad3f660 100644
--- a/chapters/ko/chapter8/5.mdx
+++ b/chapters/ko/chapter8/5.mdx
@@ -17,11 +17,8 @@ Hugging Face 라이브러리 중에서 이상한 것을 발견하면 고칠 수
 
 버그를 생성하는 코드 조각을 분리하는 것이 매우 중요합니다. Hugging Face 팀의 어느 누구도 (아직) 마술사가 아니며 볼 수 없는 것을 고칠 수 없기 때문입니다. 최소한의 재현 가능한 예는 이름에서 알 수 있듯이 재현 가능해야 합니다. 즉, 가지고 있는 외부 파일이나 데이터에 의존해서는 안 됩니다. 사용 중인 데이터를 실제 값처럼 보이지만 여전히 동일한 오류를 생성하는 일부 더미 값으로 바꾸세요.
 
-<Tip>
-
-🚨 🤗 Transformers 저장소의 많은 문제는 트랜스포머를 재현하는 데 사용된 데이터에 액세스할 수 없기 때문에 해결되지 않습니다.
-
-</Tip>
+> [!TIP]
+> 🚨 🤗 Transformers 저장소의 많은 이슈는 이를 재현하는 데 사용된 데이터에 액세스할 수 없기 때문에 해결되지 않습니다.
 
 코드가 준비되었더라도 더 적은 수의 코드로 줄여서 우리가 _최소 재현 가능한 예제_라고 부르는 것을 만들 수 있습니다. 여기에는 약간의 추가 작업이 필요하지만 멋지고 짧은 버그 재현 코드를 제공하면 코드 수정에 대한 도움이 보장됩니다.
 
diff --git a/chapters/my/_toctree.yml b/chapters/my/_toctree.yml
index 8cb174f67..676d135ad 100644
--- a/chapters/my/_toctree.yml
+++ b/chapters/my/_toctree.yml
@@ -3,262 +3,262 @@
   - local: chapter0/1
     title: နိဒါန်း
 
-# - title: 1. Transformer မော်ဒယ်များ
-#   sections:
-#   - local: chapter1/1
-#     title: နိဒါန်း
-#   - local: chapter1/2
-#     title: သဘာဝဘာသာစကားစီမံဆောင်ရွက်မှု (Natural Language Processing - NLP) နှင့် ဘာသာစကားမော်ဒယ်ကြီးများ
-#   - local: chapter1/3
-#     title: Transformer များ၏ စွမ်းဆောင်နိုင်ရည်များ
-#   - local: chapter1/4
-#     title: Transformer များ မည်သို့လုပ်ဆောင်သလဲ။
-#   - local: chapter1/5
-#     title: 🤗 Transformers များက လုပ်ငန်းတာဝန်များကို မည်သို့ဖြေရှင်းသလဲ။
-#   - local: chapter1/6
-#     title: Transformer Architectures များ
-#   - local: chapter1/7
-#     title: အမြန်ဉာဏ်စမ်း
-#   - local: chapter1/8
-#     title: LLM များဖြင့် Inference လုပ်ဆောင်ခြင်း
-#   - local: chapter1/9
-#     title: ဘက်လိုက်မှုနှင့် ကန့်သတ်ချက်များ
-#   - local: chapter1/10
-#     title: အနှစ်ချုပ်
-#   - local: chapter1/11
-#     title: အသိအမှတ်ပြုစာမေးပွဲ
-#     quiz: 1
+- title: 1. Transformer models များ
+  sections:
+  - local: chapter1/1
+    title: နိဒါန်း
+  - local: chapter1/2
+    title: Natural Language Processing နှင့် Large Language Models များ
+  - local: chapter1/3
+    title: Transformers တွေက ဘာတွေလုပ်နိုင်လဲ။
+  - local: chapter1/4
+    title: Transformers တွေက ဘယ်လိုအလုပ်လုပ်လဲ။
+  - local: chapter1/5
+    title: 🤗 Transformers တွေက လုပ်ငန်းတာဝန်တွေကို ဘယ်လိုဖြေရှင်းပေးလဲ။
+  - local: chapter1/6
+    title: Transformer Architectures များ
+  - local: chapter1/7
+    title: အမြန်ဉာဏ်စမ်း
+  - local: chapter1/8
+    title: LLMs များဖြင့် မှန်းဆတွက်ချက်ခြင်း။ 
+  - local: chapter1/9
+    title: ဘက်လိုက်မှုနှင့် ကန့်သတ်ချက်များ
+  - local: chapter1/10
+    title: အနှစ်ချုပ်
+  - local: chapter1/11
+    title: အသိအမှတ်ပြု စာမေးပွဲ
+    quiz: 1
 
-# - title: 2. 🤗 Transformers ကိုအသုံးပြုခြင်း
-#   sections:
-#   - local: chapter2/1
-#     title: နိဒါန်း
-#   - local: chapter2/2
-#     title: Pipeline နောက်ကွယ်မှ လုပ်ဆောင်ချက်များ
-#   - local: chapter2/3
-#     title: မော်ဒယ်များ
-#   - local: chapter2/4
-#     title: Tokenizer များ
-#   - local: chapter2/5
-#     title: Sequence များစွာကို စီမံကိုင်တွယ်ခြင်း
-#   - local: chapter2/6
-#     title: အားလုံးကို ပေါင်းစည်းခြင်း
-#   - local: chapter2/7
-#     title: အခြေခံအသုံးပြုမှု ပြီးစီးပါပြီ။
-#   - local: chapter2/8
-#     title: Optimized Inference Deployment
-#   - local: chapter2/9
-#     title: အခန်းပြီးဉာဏ်စမ်း
-#     quiz: 2
+- title: 2. 🤗 Transformers ကို အသုံးပြုခြင်း
+  sections:
+  - local: chapter2/1
+    title: နိဒါန်း
+  - local: chapter2/2
+    title: Pipeline နောက်ကွယ်မှ အကြောင်းအရာများ
+  - local: chapter2/3
+    title: Models
+  - local: chapter2/4
+    title: Tokenizers
+  - local: chapter2/5
+    title: Sequence များစွာကို ကိုင်တွယ်ခြင်း
+  - local: chapter2/6
+    title: အားလုံးကို ပေါင်းစပ်ခြင်း
+  - local: chapter2/7
+    title: အခြေခံ အသုံးပြုမှု ပြီးဆုံးပါပြီ!
+  - local: chapter2/8
+    title: Optimization လုပ်ထားသော Inference Deployment
+  - local: chapter2/9
+    title: အခန်းပြီးဆုံးခြင်း စစ်ဆေးမှု
+    quiz: 2
 
-# - title: 3. ကြိုတင်လေ့ကျင့်ထားသော မော်ဒယ်အား Fine-tuning ပြုလုပ်ခြင်း
-#   sections:
-#   - local: chapter3/1
-#     title: နိဒါန်း
-#   - local: chapter3/2
-#     title: ဒေတာများအား စီမံဆောင်ရွက်ခြင်း
-#   - local: chapter3/3
-#     title: Trainer API ဖြင့် မော်ဒယ်အား Fine-tuning ပြုလုပ်ခြင်း
-#   - local: chapter3/4
-#     title: ပြည့်စုံသော လေ့ကျင့်မှု Loop
-#   - local: chapter3/5
-#     title: Learning Curve များကို နားလည်ခြင်း
-#   - local: chapter3/6
-#     title: Fine-tuning ပြီးမြောက်ပါပြီ။
-#   - local: chapter3/7
-#     title: အခန်းပြီးဉာဏ်စမ်း
-#     quiz: 3
+- title: 3. Pretrained Model တစ်ခုကို Fine-tuning လုပ်ခြင်း
+  sections:
+  - local: chapter3/1
+    title: နိဒါန်း
+  - local: chapter3/2
+    title: ဒေတာများကို စီမံဆောင်ရွက်ခြင်း
+  - local: chapter3/3
+    title: Trainer API ဖြင့် မော်ဒယ်တစ်ခုကို Fine-tuning လုပ်ခြင်း
+  - local: chapter3/4
+    title: ပြည့်စုံသော Training Loop တစ်ခု
+  - local: chapter3/5
+    title: Learning Curves များကို နားလည်ခြင်း
+  - local: chapter3/6
+    title: Fine-tuning လုပ်ငန်း ပြီးစီးပြီ!
+  - local: chapter3/7
+    title: အခန်းပြီးဆုံးခြင်း အသိအမှတ်ပြု လက်မှတ်
+    quiz: 3
 
-# - title: 4. မော်ဒယ်များနှင့် Tokenizer များအား မျှဝေခြင်း
-#   sections:
-#   - local: chapter4/1
-#     title: The Hugging Face Hub
-#   - local: chapter4/2
-#     title: ကြိုတင်လေ့ကျင့်ထားသော မော်ဒယ်များအား အသုံးပြုခြင်း
-#   - local: chapter4/3
-#     title: ကြိုတင်လေ့ကျင့်ထားသော မော်ဒယ်များအား မျှဝေခြင်း
-#   - local: chapter4/4
-#     title: မော်ဒယ်ကတ် တည်ဆောက်ခြင်း
-#   - local: chapter4/5
-#     title: အပိုင်း ၁ ပြီးစီးပါပြီ။
-#   - local: chapter4/6
-#     title: အခန်းပြီးဉာဏ်စမ်း
-#     quiz: 4
+- title: 4. Models နှင့် Tokenizers များကို မျှဝေခြင်း
+  sections:
+  - local: chapter4/1
+    title: Hugging Face Hub
+  - local: chapter4/2
+    title: Pretrained Models များကို အသုံးပြုခြင်း
+  - local: chapter4/3
+    title: Pretrained Models များကို မျှဝေခြင်း
+  - local: chapter4/4
+    title: Model Card တစ်ခု တည်ဆောက်ခြင်း
+  - local: chapter4/5
+    title: အပိုင်း ၁ ပြီးဆုံးပါပြီ!
+  - local: chapter4/6
+    title: အခန်း (၄) ဆိုင်ရာ မေးခွန်းများ
+    quiz: 4
 
-# - title: 5. The 🤗 Datasets Library
-#   sections:
-#   - local: chapter5/1
-#     title: နိဒါန်း
-#   - local: chapter5/2
-#     title: ကျွန်ုပ်၏ Dataset သည် Hub တွင်မရှိလျှင်
-#   - local: chapter5/3
-#     title: ဒေတာများအား ပိုင်းဖြတ်ရန်
-#   - local: chapter5/4
-#     title: Big Data? 🤗 Datasets က ကူညီပါပြီ။
-#   - local: chapter5/5
-#     title: ကိုယ်ပိုင် Dataset ဖန်တီးခြင်း
-#   - local: chapter5/6
-#     title: FAISS ဖြင့် Semantic Search ပြုလုပ်ခြင်း
-#   - local: chapter5/7
-#     title: 🤗 Datasets ပြီးမြောက်ပါပြီ။
-#   - local: chapter5/8
-#     title: အခန်းပြီးဉာဏ်စမ်း
-#     quiz: 5
+- title: 5. The 🤗 Datasets library
+  sections:
+  - local: chapter5/1
+    title: နိဒါန်း
+  - local: chapter5/2
+    title: ကျွန်ုပ်၏ Dataset သည် Hub တွင် မရှိလျှင် ဘာလုပ်ရမလဲ။
+  - local: chapter5/3
+    title: Slice and Dice လုပ်ဖို့ အချိန်တန်ပြီ။
+  - local: chapter5/4
+    title: Big Data လား။ 🤗 Datasets က ကူညီပါလိမ့်မယ်။
+  - local: chapter5/5
+    title: ကိုယ်ပိုင် Dataset တစ်ခု ဖန်တီးခြင်း
+  - local: chapter5/6
+    title: FAISS ဖြင့် Semantic Search ပြုလုပ်ခြင်း
+  - local: chapter5/7
+    title: 🤗 Datasets၊ အဆင်သင့်ဖြစ်ပါပြီ!
+  - local: chapter5/8
+    title: အခန်း (၅) ဆိုင်ရာ မေးခွန်းများ
+    quiz: 5
 
-# - title: 6. The 🤗 Tokenizers Library
+# - title: 6. The 🤗 Tokenizers library
 #   sections:
 #   - local: chapter6/1
-#     title: နိဒါန်း
+#     title: Introduction
 #   - local: chapter6/2
-#     title: Tokenizer အဟောင်းမှ Tokenizer အသစ်တစ်ခု လေ့ကျင့်ခြင်း
+#     title: Training a new tokenizer from an old one
 #   - local: chapter6/3
-#     title: Fast Tokenizer များ၏ ထူးခြားစွမ်းရည်များ
+#     title: Fast tokenizers' special powers
 #   - local: chapter6/3b
-#     title: QA Pipeline ရှိ Fast Tokenizer များ
+#     title: Fast tokenizers in the QA pipeline
 #   - local: chapter6/4
-#     title: Normalization နှင့် Pre-tokenization
+#     title: Normalization and pre-tokenization
 #   - local: chapter6/5
-#     title: Byte-Pair Encoding Tokenization
+#     title: Byte-Pair Encoding tokenization
 #   - local: chapter6/6
-#     title: WordPiece Tokenization
+#     title: WordPiece tokenization
 #   - local: chapter6/7
-#     title: Unigram Tokenization
+#     title: Unigram tokenization
 #   - local: chapter6/8
-#     title: Tokenizer တစ်ခုအား အဆင့်ဆင့် တည်ဆောက်ခြင်း
+#     title: Building a tokenizer, block by block
 #   - local: chapter6/9
-#     title: Tokenizer များ ပြီးမြောက်ပါပြီ။
+#     title: Tokenizers, check!
 #   - local: chapter6/10
-#     title: အခန်းပြီးဉာဏ်စမ်း
+#     title: End-of-chapter quiz
 #     quiz: 6
 
-# - title: 7. Classical NLP Tasks
+# - title: 7. Classical NLP tasks
 #   sections:
 #   - local: chapter7/1
-#     title: နိဒါန်း
+#     title: Introduction
 #   - local: chapter7/2
-#     title: Token ခွဲခြားသတ်မှတ်ခြင်း
+#     title: Token classification
 #   - local: chapter7/3
-#     title: Masked Language Model တစ်ခုအား Fine-tuning ပြုလုပ်ခြင်း
+#     title: Fine-tuning a masked language model
 #   - local: chapter7/4
-#     title: ဘာသာပြန်ခြင်း
+#     title: Translation
 #   - local: chapter7/5
-#     title: အနှစ်ချုပ်ခြင်း
+#     title: Summarization
 #   - local: chapter7/6
-#     title: Causal Language Model တစ်ခုအား အစမှ လေ့ကျင့်ခြင်း
+#     title: Training a causal language model from scratch
 #   - local: chapter7/7
-#     title: မေးခွန်းဖြေဆိုခြင်း
+#     title: Question answering
 #   - local: chapter7/8
-#     title: LLM များအား ကျွမ်းကျင်ခြင်း
+#     title: Mastering LLMs
 #   - local: chapter7/9
-#     title: အခန်းပြီးဉာဏ်စမ်း
+#     title: End-of-chapter quiz
 #     quiz: 7
 
-# - title: 8. အကူအညီတောင်းခံနည်း
+# - title: 8. How to ask for help
 #   sections:
 #   - local: chapter8/1
-#     title: နိဒါန်း
+#     title: Introduction
 #   - local: chapter8/2
-#     title: အမှားတွေ့ရှိသည့်အခါ
+#     title: What to do when you get an error
 #   - local: chapter8/3
-#     title: Forums တွင် အကူအညီတောင်းခံခြင်း
+#     title: Asking for help on the forums
 #   - local: chapter8/4
-#     title: လေ့ကျင့်မှု Pipeline အား Debugging ပြုလုပ်ခြင်း
+#     title: Debugging the training pipeline
 #     local_fw: { pt: chapter8/4, tf: chapter8/4_tf }
 #   - local: chapter8/5
-#     title: ကောင်းမွန်သော Issue တစ်ခု ရေးသားနည်း
+#     title: How to write a good issue
 #   - local: chapter8/6
-#     title: အပိုင်း ၂ ပြီးစီးပါပြီ။
+#     title: Part 2 completed!
 #   - local: chapter8/7
-#     title: အခန်းပြီးဉာဏ်စမ်း
+#     title: End-of-chapter quiz
 #     quiz: 8
 
-# - title: 9. Demo များ တည်ဆောက်ခြင်းနှင့် မျှဝေခြင်း
-#   subtitle: ကျွန်ုပ် မော်ဒယ်တစ်ခုကို လေ့ကျင့်ပြီးပါပြီ၊ ၎င်းကို မည်သို့ပြသနိုင်မည်နည်း။
+# - title: 9. Building and sharing demos
+#   subtitle: I trained a model, but how can I show it off?
 #   sections:
 #   - local: chapter9/1
-#     title: Gradio မိတ်ဆက်
+#     title: Introduction to Gradio
 #   - local: chapter9/2
-#     title: ပထမဆုံး Demo ကို တည်ဆောက်ခြင်း
+#     title: Building your first demo
 #   - local: chapter9/3
-#     title: Interface Class ကို နားလည်ခြင်း
+#     title: Understanding the Interface class
 #   - local: chapter9/4
-#     title: Demo များကို မျှဝေခြင်း
+#     title: Sharing demos with others
 #   - local: chapter9/5
-#     title: Hugging Face Hub နှင့် ပေါင်းစည်းမှုများ
+#     title: Integrations with the Hugging Face Hub
 #   - local: chapter9/6
-#     title: အဆင့်မြင့် Interface အင်္ဂါရပ်များ
+#     title: Advanced Interface features
 #   - local: chapter9/7
-#     title: Blocks မိတ်ဆက်
+#     title: Introduction to Blocks
 #   - local: chapter9/8
-#     title: Gradio ပြီးမြောက်ပါပြီ။
+#     title: Gradio, check!
 #   - local: chapter9/9
-#     title: အခန်းပြီးဉာဏ်စမ်း
+#     title: End-of-chapter quiz
 #     quiz: 9
 
-# - title: 10. အရည်အသွေးမြင့် Dataset များ ပြုစုခြင်း
-#   subtitle: ထူးခြားသော Dataset များဖန်တီးရန် Argilla ကို မည်သို့အသုံးပြုမည်နည်း။
+# - title: 10. Curate high-quality datasets
+#   subtitle: How to use Argilla to create amazing datasets
 #   sections:
 #   - local: chapter10/1
-#     title: Argilla မိတ်ဆက်
+#     title: Introduction to Argilla
 #   - local: chapter10/2
-#     title: Argilla Instance ကို စတင်ပြင်ဆင်ခြင်း
+#     title: Set up your Argilla instance
 #   - local: chapter10/3
-#     title: Dataset ကို Argilla သို့ Load လုပ်ခြင်း
+#     title: Load your dataset to Argilla
 #   - local: chapter10/4
-#     title: Dataset ကို Annotation ပြုလုပ်ခြင်း
+#     title: Annotate your dataset
 #   - local: chapter10/5
-#     title: Annotation ပြုလုပ်ပြီးသော Dataset ကို အသုံးပြုခြင်း
+#     title: Use your annotated dataset
 #   - local: chapter10/6
-#     title: Argilla ပြီးမြောက်ပါပြီ။
+#     title: Argilla, check!
 #   - local: chapter10/7
-#     title: အခန်းပြီးဉာဏ်စမ်း
+#     title: End-of-chapter quiz
 #     quiz: 10
 
-# - title: 11. Large Language Models များအား Fine-tune ပြုလုပ်ခြင်း
-#   subtitle: Supervised Fine-tuning နှင့် Low-Rank Adaptation ကို အသုံးပြု၍ Large Language Model တစ်ခုအား Fine-tune ပြုလုပ်ခြင်း
+# - title: 11. Fine-tune Large Language Models
+#   subtitle: Use Supervised Fine-tuning and Low-Rank Adaptation to fine-tune a large language model
 #   sections:
 #   - local: chapter11/1
-#     title: နိဒါန်း
+#     title: Introduction
 #   - local: chapter11/2
-#     title: Chat Template များ
+#     title: Chat Templates
 #   - local: chapter11/3
-#     title: SFTTrainer ဖြင့် Fine-Tuning ပြုလုပ်ခြင်း
+#     title: Fine-Tuning with SFTTrainer
 #   - local: chapter11/4
 #     title: LoRA (Low-Rank Adaptation)
 #   - local: chapter11/5
-#     title: အကဲဖြတ်ခြင်း
+#     title: Evaluation
 #   - local: chapter11/6
-#     title: နိဂုံး
+#     title: Conclusion
 #   - local: chapter11/7
-#     title: စာမေးပွဲအချိန်။
+#     title: Exam Time!
 #     quiz: 11
 
-# - title: 12. Reasoning မော်ဒယ်များ တည်ဆောက်ခြင်း
-#   subtitle: DeepSeek R1 ကဲ့သို့ Reasoning မော်ဒယ်များအား တည်ဆောက်နည်းကို လေ့လာပါ။
+# - title: 12. Build Reasoning Models
+#   subtitle: Learn how to build reasoning models like DeepSeek R1
 #   new: true
 #   sections:
 #   - local: chapter12/1
-#     title: နိဒါန်း
+#     title: Introduction
 #   - local: chapter12/2
-#     title: LLM များပေါ်တွင် Reinforcement Learning
+#     title: Reinforcement Learning on LLMs
 #   - local: chapter12/3
-#     title: DeepSeek R1 Paper မှ Aha Moment
+#     title: The Aha Moment in the DeepSeek R1 Paper
 #   - local: chapter12/3a
-#     title: DeepSeekMath ရှိ GRPO ကို အဆင့်မြင့် နားလည်ခြင်း
+#     title: Advanced Understanding of GRPO in DeepSeekMath
 #   - local: chapter12/4
-#     title: TRL တွင် GRPO ကို အကောင်အထည်ဖော်ခြင်း
+#     title: Implementing GRPO in TRL
 #   - local: chapter12/5
-#     title: GRPO ဖြင့် မော်ဒယ်တစ်ခုအား Fine-tune ပြုလုပ်ရန် လက်တွေ့လေ့ကျင့်ခန်း
+#     title: Practical Exercise to Fine-tune a model with GRPO
 #   - local: chapter12/6
-#     title: Unsloth ဖြင့် လက်တွေ့လေ့ကျင့်ခန်း
+#     title: Practical Exercise with Unsloth
 #   - local: chapter12/7
-#     title: မကြာမီလာမည်။
+#     title: Coming soon...
 
-# - title: သင်တန်းဆိုင်ရာ ပွဲများ
+# - title: Course Events
 #   sections:
 #   - local: events/1
-#     title: Live Sessions နှင့် Workshop များ
+#     title: Live sessions and workshops
 #   - local: events/2
-#     title: အပိုင်း ၂ ထုတ်ပြန်ခြင်း ပွဲ
+#     title: Part 2 release event
 #   - local: events/3
-#     title: Gradio Blocks Party
\ No newline at end of file
+#     title: Gradio Blocks party
diff --git a/chapters/my/chapter0/1.mdx b/chapters/my/chapter0/1.mdx
index 4cc3d24c8..178234818 100644
--- a/chapters/my/chapter0/1.mdx
+++ b/chapters/my/chapter0/1.mdx
@@ -1,34 +1,34 @@
 # နိဒါန်း[[introduction]]
 
-Hugging Face Course မှ ကြိုဆိုပါတယ်။ ဒီနိဒါန်းမှာ သင့်အတွက် အလုပ်လုပ်နိုင်တဲ့ ပတ်ဝန်းကျင်တစ်ခု ဘယ်လိုတည်ဆောက်ရမလဲဆိုတာကို လမ်းညွှန်ပေးသွားမှာပါ။ သင်ဟာ Course ကို အခုမှစတင်သူဆိုရင်တော့ [အခန်း ၁](/course/chapter1) ကို အရင်ဆုံး ဖတ်ရှုပြီးမှ သင့်ရဲ့ပတ်ဝန်းကျင်ကို ထူထောင်ပြီး ကိုယ်တိုင် Code တွေကို လေ့ကျင့်ကြည့်ဖို့ အကြံပြုလိုပါတယ်။
+Hugging Face သင်တန်းမှ ကြိုဆိုပါတယ်။ ဒီနိဒါန်းက သင် အလုပ်လုပ်ရန် ပတ်ဝန်းကျင် (working environment) ကို တည်ဆောက်ရာမှာ လမ်းညွှန်ပေးမှာပါ။ သင်တန်းကို အခုမှ စတင်သူများအတွက်၊ [Chapter 1](/course/chapter1) ကို အရင်ဆုံး လေ့လာပြီးမှ၊ သင်ကိုယ်တိုင် code တွေကို စမ်းသပ်နိုင်ဖို့ သင့်ပတ်ဝန်းကျင်ကို ပြန်လည်တည်ဆောက်ဖို့ ကျွန်တော်တို့ အကြံပြုလိုပါတယ်။
 
-ဒီ Course မှာ အသုံးပြုမယ့် Library တွေအားလုံးဟာ Python Package တွေအနေနဲ့ ရရှိနိုင်ပါတယ်။ ဒါကြောင့် ဒီမှာ Python ပတ်ဝန်းကျင်တစ်ခုကို ဘယ်လိုတည်ဆောက်ပြီး လိုအပ်တဲ့ Library တွေကို ဘယ်လို Install လုပ်ရမလဲဆိုတာကို ပြသပေးသွားမှာ ဖြစ်ပါတယ်။
+ဒီသင်တန်းမှာ ကျွန်တော်တို့ အသုံးပြုမယ့် library တွေအားလုံးဟာ Python packages တွေအဖြစ် ရရှိနိုင်ပါတယ်။ ဒါကြောင့် ဒီနေရာမှာ သင်လိုအပ်တဲ့ Python ပတ်ဝန်းကျင်ကို ဘယ်လိုတည်ဆောက်ရမလဲ၊ သီးခြား library တွေကို ဘယ်လို install လုပ်ရမလဲဆိုတာကို ပြသသွားမှာပါ။
 
-သင့်ရဲ့ အလုပ်လုပ်မယ့်ပတ်ဝန်းကျင်ကို တည်ဆောက်တဲ့ နည်းလမ်းနှစ်မျိုးကို ဖော်ပြပေးသွားမှာပါ။ Colab Notebook ဒါမှမဟုတ် Python Virtual Environment ကို အသုံးပြုတဲ့နည်းလမ်းတွေ ဖြစ်ပါတယ်။ သင့်အတွက် အဆင်ပြေဆုံး နည်းလမ်းကို ရွေးချယ်နိုင်ပါတယ်။ စတင်လေ့လာသူတွေအတွက်တော့ Colab Notebook ကို အသုံးပြုဖို့ အထူးအကြံပြုလိုပါတယ်။
+သင့်ရဲ့ အလုပ်လုပ်ရန် ပတ်ဝန်းကျင်ကို တည်ဆောက်ဖို့ နည်းလမ်းနှစ်ခုကို ကျွန်တော်တို့ ဖော်ပြပေးပါမယ်- Colab notebook ကို အသုံးပြုခြင်း ဒါမှမဟုတ် Python virtual environment ကို အသုံးပြုခြင်းတို့ ဖြစ်ပါတယ်။ သင်နဲ့ အသင့်တော်ဆုံးနည်းလမ်းကို လွတ်လပ်စွာ ရွေးချယ်နိုင်ပါတယ်။ စတင်သူများအတွက်ကတော့ Colab notebook ကို အသုံးပြုပြီး စတင်ဖို့ ကျွန်တော်တို့ အထူးအကြံပြုပါတယ်။
 
-Windows System အတွက်တော့ ဖော်ပြပေးသွားမှာ မဟုတ်ပါဘူး။ အကယ်၍ သင်က Windows ကို အသုံးပြုနေတယ်ဆိုရင်တော့ Colab Notebook ကို အသုံးပြုပြီး လေ့ကျင့်ခန်းတွေ လိုက်လုပ်ဖို့ အကြံပြုလိုပါတယ်။ Linux Distribution ဒါမှမဟုတ် macOS ကို အသုံးပြုနေတယ်ဆိုရင်တော့ ဒီမှာဖော်ပြထားတဲ့ နည်းလမ်းနှစ်မျိုးလုံးကို အသုံးပြုနိုင်ပါတယ်။
+ကျွန်တော်တို့ဟာ Windows system အကြောင်းကို ဖော်ပြသွားမှာ မဟုတ်ပါဘူး။ သင် Windows ကို အသုံးပြုနေတယ်ဆိုရင် Colab notebook ကို အသုံးပြုပြီး လိုက်လုပ်ဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။ သင် Linux distribution ဒါမှမဟုတ် macOS ကို အသုံးပြုနေတယ်ဆိုရင်တော့ ဒီနေရာမှာ ဖော်ပြထားတဲ့ နည်းလမ်းနှစ်ခုလုံးကို အသုံးပြုနိုင်ပါတယ်။
 
-ဒီ Course ရဲ့ အစိတ်အပိုင်းအများစုမှာ Hugging Face Account ရှိဖို့ လိုအပ်ပါတယ်။ ဒါကြောင့် အခုပဲ Account တစ်ခု ဖန်တီးထားဖို့ အကြံပြုလိုပါတယ်- [create an account](https://huggingface.co/join)။
+သင်တန်းရဲ့ အများစုကတော့ Hugging Face account ရှိဖို့ လိုအပ်ပါတယ်။ အခုပဲ account တစ်ခု ဖန်တီးဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်- [account တစ်ခု ဖန်တီးပါ](https://huggingface.co/join)။
 
-## Google Colab Notebook ကို အသုံးပြုခြင်း[[using-a-google-colab-notebook]]
+## Google Colab notebook ကို အသုံးပြုခြင်း[[using-a-google-colab-notebook]]
 
-Colab Notebook ကို အသုံးပြုတာဟာ အလွယ်ကူဆုံး တည်ဆောက်မှုဖြစ်ပါတယ်။ Browser မှာ Notebook တစ်ခုကို ဖွင့်ပြီး Code တွေစရေးရုံပါပဲ။
+Colab notebook ကို အသုံးပြုတာဟာ အလွယ်ကူဆုံး တည်ဆောက်မှုပုံစံ ဖြစ်ပါတယ်၊ သင်ရဲ့ browser ထဲမှာ notebook တစ်ခုကို ဖွင့်လိုက်ရုံနဲ့ တိုက်ရိုက် code ရေးလို့ရပါပြီ။
 
-Colab နဲ့ မရင်းနှီးသေးဘူးဆိုရင်တော့ [introduction](https://colab.research.google.com/notebooks/intro.ipynb) ကို အရင်ဆုံး ဖတ်ရှုဖို့ အကြံပြုလိုပါတယ်။ Colab က GPU ဒါမှမဟုတ် TPU လို Accelerating Hardware တွေကို အသုံးပြုခွင့်ပေးပြီး Workload နည်းနည်းအတွက်ဆိုရင် အခမဲ့ အသုံးပြုနိုင်ပါတယ်။
+သင် Colab နဲ့ မရင်းနှီးသေးဘူးဆိုရင် [နိဒါန်း](https://colab.research.google.com/notebooks/intro.ipynb) ကို အရင်ဆုံး လေ့လာဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။ Colab က GPU ဒါမှမဟုတ် TPU လိုမျိုး အရှိန်မြှင့် hardware အချို့ကို အသုံးပြုခွင့်ပေးပြီး၊ အလုပ်ပမာဏ နည်းတဲ့ ကိစ္စတွေအတွက်တော့ အခမဲ့ ဖြစ်ပါတယ်။
 
-Colab မှာ အသုံးပြုရတာ အဆင်ပြေပြီဆိုရင် Notebook အသစ်တစ်ခု ဖန်တီးပြီး စတင်တည်ဆောက်နိုင်ပါပြီ။
+Colab ထဲမှာ ကျွမ်းကျင်စွာ သွားလာနိုင်ပြီဆိုတာနဲ့၊ notebook အသစ်တစ်ခု ဖန်တီးပြီး စနစ်တကျ စတင်လိုက်ပါ။
 
 <div class="flex justify-center">
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter0/new_colab.png" alt="An empty colab notebook" width="80%"/>
 </div>
 
-နောက်တစ်ဆင့်ကတော့ ဒီ Course မှာ အသုံးပြုမယ့် Library တွေကို Install လုပ်ဖို့ပဲ ဖြစ်ပါတယ်။ Python အတွက် Package Manager ဖြစ်တဲ့ `pip` ကို အသုံးပြုပြီး Install လုပ်ပါမယ်။ Notebook တွေမှာ System Command တွေကို `!` Character နဲ့ စပြီး ရေးသားနိုင်ပါတယ်။ ဒါကြောင့် 🤗 Transformers Library ကို အောက်ပါအတိုင်း Install လုပ်နိုင်ပါတယ်-
+နောက်တစ်ဆင့်ကတော့ ဒီသင်တန်းမှာ ကျွန်တော်တို့ အသုံးပြုမယ့် library တွေကို install လုပ်ဖို့ ဖြစ်ပါတယ်။ install လုပ်ဖို့အတွက် Python ရဲ့ package manager ဖြစ်တဲ့ `pip` ကို ကျွန်တော်တို့ အသုံးပြုပါမယ်။ notebooks တွေမှာ `!` character ကို အရှေ့မှာ ထားခြင်းဖြင့် system commands တွေကို run နိုင်ပါတယ်။ ဒါကြောင့် 🤗 Transformers library ကို အောက်ပါအတိုင်း install လုပ်နိုင်ပါတယ်-
 
 ```
 !pip install transformers
 ```
 
-သင့်ရဲ့ Python Runtime မှာ Import လုပ်ခြင်းဖြင့် Package မှန်ကန်စွာ Install လုပ်ပြီးခြင်းကို သေချာစစ်ဆေးနိုင်ပါတယ်-
+သင့်ရဲ့ Python runtime ထဲမှာ import လုပ်ခြင်းဖြင့် package ကို မှန်ကန်စွာ install လုပ်ထားခြင်းရှိမရှိ စစ်ဆေးနိုင်ပါတယ်-
 
 ```
 import transformers
@@ -38,38 +38,38 @@ import transformers
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter0/install.gif" alt="A gif showing the result of the two commands above: installation and import" width="80%"/>
 </div>
 
-ဒါက 🤗 Transformers ရဲ့ ပေါ့ပါးတဲ့ Version ကိုသာ Install လုပ်ထားတာ ဖြစ်ပါတယ်။ အထူးသဖြင့် PyTorch ဒါမှမဟုတ် TensorFlow လို Machine Learning Framework တွေတော့ Install လုပ်ထားတာ မဟုတ်ပါဘူး။ Library ရဲ့ ကွဲပြားခြားနားတဲ့ Feature တွေ အများကြီးကို အသုံးပြုရမှာဖြစ်လို့ Development Version ကို Install လုပ်ဖို့ အကြံပြုလိုပါတယ်။ ဒီ Version မှာတော့ စိတ်ကူးနိုင်သမျှ အသုံးပြုမှုတိုင်းအတွက် လိုအပ်တဲ့ Dependency တွေ အားလုံးပါဝင်ပါတယ်-
+ဒါက 🤗 Transformers ရဲ့ အလွန်ပေါ့ပါးတဲ့ version တစ်ခုကို install လုပ်တာ ဖြစ်ပါတယ်။ အထူးသဖြင့်၊ သီးခြား machine learning frameworks (PyTorch ဒါမှမဟုတ် TensorFlow လိုမျိုး) တွေ install လုပ်ထားတာ မရှိပါဘူး။ ကျွန်တော်တို့ library ရဲ့ မတူညီတဲ့ features များစွာကို အသုံးပြုရမှာဖြစ်တာကြောင့်၊ စိတ်ကူးနိုင်သမျှ အသုံးပြုမှုတိုင်းအတွက် လိုအပ်တဲ့ dependencies အားလုံးပါဝင်တဲ့ development version ကို install လုပ်ဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်-
 
 ```
 !pip install transformers[sentencepiece]
 ```
 
-ဒါက အချိန်အနည်းငယ်ယူပါလိမ့်မယ်။ ဒါပေမဲ့ ပြီးတာနဲ့ Course ရဲ့ ကျန်အပိုင်းတွေအတွက် သင် အသင့်ဖြစ်နေပါပြီ။
+ဒါက အချိန်အနည်းငယ် ကြာပါလိမ့်မယ်၊ ဒါပေမယ့် ပြီးသွားရင်တော့ သင်တန်းတစ်လျှောက်လုံးအတွက် အဆင်သင့်ဖြစ်ပါပြီ။
 
-## Python Virtual Environment ကို အသုံးပြုခြင်း[[using-a-python-virtual-environment]]
+## Python virtual environment ကို အသုံးပြုခြင်း[[using-a-python-virtual-environment]]
 
-အကယ်၍ သင်က Python Virtual Environment ကို အသုံးပြုဖို့ နှစ်သက်တယ်ဆိုရင် ပထမအဆင့်က သင့်ရဲ့ System မှာ Python ကို Install လုပ်ဖို့ပါပဲ။ [ဒီလမ်းညွှန်](https://realpython.com/installing-python/) ကို လိုက်နာပြီး စတင်ဖို့ အကြံပြုလိုပါတယ်။
+သင် Python virtual environment ကို အသုံးပြုဖို့ ပိုကြိုက်တယ်ဆိုရင်၊ ပထမအဆင့်ကတော့ သင့် system မှာ Python ကို install လုပ်ဖို့ပါပဲ။ စတင်ဖို့အတွက် [ဒီလမ်းညွှန်](https://realpython.com/installing-python/) ကို လိုက်နာဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။
 
-Python ကို Install လုပ်ပြီးပြီဆိုရင် သင့်ရဲ့ Terminal မှာ Python Command တွေကို Run နိုင်ပြီဖြစ်ပါတယ်။ နောက်အဆင့်တွေ မဆက်ခင် Python မှန်ကန်စွာ Install လုပ်ပြီးပြီလားဆိုတာ သေချာအောင် `python --version` Command ကို Run ကြည့်နိုင်ပါတယ်။ ဒါက သင့်ရဲ့ System မှာ လက်ရှိရရှိနိုင်တဲ့ Python Version ကို ပြသပေးပါလိမ့်မယ်။
+Python ကို install လုပ်ပြီးတာနဲ့၊ သင့် terminal မှာ Python commands တွေကို run နိုင်ပါလိမ့်မယ်။ နောက်ထပ်အဆင့်တွေ မဆက်ခင် မှန်ကန်စွာ install လုပ်ထားခြင်းရှိမရှိ သေချာစေဖို့ `python --version` လို့ရိုက်ပြီး စစ်ဆေးနိုင်ပါတယ်။ ဒါက သင့် system မှာ လက်ရှိရရှိနိုင်တဲ့ Python version ကို ပြသပေးပါလိမ့်မယ်။
 
-Terminal မှာ `python --version` လို Python Command တစ်ခုကို Run တဲ့အခါ၊ Command ကို Run နေတဲ့ Program ကို သင့်ရဲ့ System မှာရှိတဲ့ “main” Python အဖြစ် မှတ်ယူသင့်ပါတယ်။ ဒီ Main Installation ကို Package တွေမပါဘဲ သန့်ရှင်းအောင်ထားဖို့ အကြံပြုလိုပါတယ်။ ပြီးတော့ သင်အလုပ်လုပ်တဲ့ Application တစ်ခုစီအတွက် သီးခြား Environment တွေ ဖန်တီးဖို့အတွက် ဒီ Main Installation ကို အသုံးပြုသင့်ပါတယ်။ ဒီနည်းလမ်းက Application တစ်ခုစီမှာ သူ့ရဲ့ ကိုယ်ပိုင် Dependency တွေနဲ့ Package တွေ ရှိနိုင်ပြီး တခြား Application တွေနဲ့ ဖြစ်နိုင်တဲ့ Compatibility ပြဿနာတွေကို စိုးရိမ်စရာမလိုတော့ပါဘူး။
+သင့် terminal မှာ `python --version` လိုမျိုး Python command တစ်ခုကို run တဲ့အခါ၊ သင် command ကို run နေတဲ့ program ကို သင့် system ရဲ့ "အဓိက" Python လို့ တွေးကြည့်သင့်ပါတယ်။ ဒီအဓိက install လုပ်ထားတဲ့ Python မှာ packages တွေ မထည့်ဘဲ ထားပြီး၊ သင်အလုပ်လုပ်တဲ့ application တစ်ခုစီအတွက် သီးခြား environments တွေ ဖန်တီးဖို့ အသုံးပြုဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။ ဒီနည်းနဲ့ application တစ်ခုစီမှာ သူ့ဘာသာသူ dependencies တွေနဲ့ packages တွေ ရှိနိုင်ပြီး၊ အခြား application တွေနဲ့ ဖြစ်နိုင်ချေရှိတဲ့ လိုက်ဖက်မှုပြဿနာတွေအတွက် သင်စိုးရိမ်စရာ မလိုပါဘူး။
 
-Python မှာ ဒါကို [virtual environments](https://docs.python.org/3/tutorial/venv.html) တွေနဲ့ လုပ်ဆောင်ပါတယ်။ Virtual Environment တွေက Self-contained Directory Tree တွေဖြစ်ပြီး တစ်ခုစီမှာ သတ်မှတ်ထားတဲ့ Python Version နဲ့ Application လိုအပ်တဲ့ Package တွေအားလုံးပါဝင်တဲ့ Python Installation တစ်ခုပါဝင်ပါတယ်။ ဒီလို Virtual Environment တစ်ခုကို ဖန်တီးတာကို ကိရိယာအမျိုးမျိုးနဲ့ လုပ်ဆောင်နိုင်ပေမဲ့ ဒီရည်ရွယ်ချက်အတွက် တရားဝင် Python Package ဖြစ်တဲ့ [`venv`](https://docs.python.org/3/library/venv.html#module-venv) ကို အသုံးပြုပါမယ်။
+Python မှာ ဒါကို [*virtual environments*](https://docs.python.org/3/tutorial/venv.html) တွေနဲ့ လုပ်ဆောင်ပါတယ်။ ဒါတွေဟာ သူ့ဘာသာသူ ပါဝင်တဲ့ directory trees တွေဖြစ်ပြီး တစ်ခုစီမှာ သီးခြား Python version တစ်ခုနဲ့ application လိုအပ်တဲ့ packages အားလုံး ပါဝင်ပါတယ်။ ဒီလို virtual environment တစ်ခု ဖန်တီးခြင်းကို ကိရိယာအမျိုးမျိုးနဲ့ လုပ်ဆောင်နိုင်ပေမယ့်၊ ကျွန်တော်တို့ကတော့ အဲဒီရည်ရွယ်ချက်အတွက် တရားဝင် Python package ဖြစ်တဲ့ [`venv`](https://docs.python.org/3/library/venv.html#module-venv) ကို အသုံးပြုပါမယ်။
 
-ပထမဆုံးအနေနဲ့ သင့် Application ထားရှိလိုတဲ့ Directory ကို ဖန်တီးပါ။ ဥပမာ- သင့်ရဲ့ Home Directory ရဲ့ Root မှာ *transformers-course* လို့ခေါ်တဲ့ Directory အသစ်တစ်ခု ဖန်တီးချင်ပါလိမ့်မယ်-
+ပထမဆုံး၊ သင်ရဲ့ application ကို ထားချင်တဲ့ directory ကို ဖန်တီးပါ - ဥပမာ၊ သင်ရဲ့ home directory ရဲ့ root မှာ *transformers-course* လို့ခေါ်တဲ့ directory အသစ်တစ်ခု ပြုလုပ်ချင်ပါလိမ့်မယ်-
 
 ```
 mkdir ~/transformers-course
 cd ~/transformers-course
 ```
 
-ဒီ Directory ထဲကနေ Python ရဲ့ `venv` Module ကို အသုံးပြုပြီး Virtual Environment တစ်ခုကို ဖန်တီးပါ-
+ဒီ directory ထဲကနေ Python ရဲ့ `venv` module ကို အသုံးပြုပြီး virtual environment တစ်ခု ဖန်တီးပါ-
 
 ```
 python -m venv .env
 ```
 
-အခုဆိုရင် သင့်ရဲ့ တခြားဘာမှမရှိတဲ့ Folder ထဲမှာ *.env* လို့ခေါ်တဲ့ Directory တစ်ခု ရှိနေပါပြီ-
+သင်ရဲ့ ဗလာဖြစ်နေတဲ့ folder ထဲမှာ *.env* လို့ခေါ်တဲ့ directory တစ်ခုကို အခု သင်တွေ့ရပါလိမ့်မယ်-
 
 ```
 ls -a
@@ -79,17 +79,17 @@ ls -a
 .      ..    .env
 ```
 
-`activate` နဲ့ `deactivate` Script တွေနဲ့ သင့်ရဲ့ Virtual Environment ထဲကို ဝင်ရောက်ခြင်းနဲ့ ထွက်ခွာခြင်းတို့ကို ပြုလုပ်နိုင်ပါတယ်-
+ဒီ `activate` နဲ့ `deactivate` scripts တွေနဲ့ သင်ရဲ့ virtual environment ထဲကို ဝင်နိုင်ပြီး ပြန်ထွက်နိုင်ပါတယ်-
 
 ```
-# Virtual Environment ကို Activate ပြုလုပ်ရန်
+# virtual environment ကို ဖွင့်
 source .env/bin/activate
 
-# Virtual Environment ကို Deactivate ပြုလုပ်ရန်
+# virtual environment ကို ပိတ်
 deactivate
 ```
 
-`which python` Command ကို Run ခြင်းဖြင့် Environment က Activate လုပ်ထားပြီလားဆိုတာ သေချာအောင် စစ်ဆေးနိုင်ပါတယ်။ အကယ်၍ Virtual Environment ကို ညွှန်ပြနေတယ်ဆိုရင် သင် အောင်မြင်စွာ Activate လုပ်ပြီးသား ဖြစ်ပါပြီ။
+`which python` command ကို run ခြင်းဖြင့် environment ကို ဖွင့်ထားခြင်းရှိမရှိ သေချာအောင် စစ်ဆေးနိုင်ပါတယ်။ အကယ်၍ ၎င်းက virtual environment ကို ညွှန်ပြနေတယ်ဆိုရင် သင်အောင်မြင်စွာ ဖွင့်ထားတာ ဖြစ်ပါတယ်။
 
 ```
 which python
@@ -99,12 +99,43 @@ which python
 /home/<user>/transformers-course/.env/bin/python
 ```
 
-### Dependencies များကို Install လုပ်ခြင်း[[installing-dependencies]]
+### Dependencies တွေကို Install လုပ်ခြင်း[[installing-dependencies]]
 
-Google Colab Instance များကို အသုံးပြုခြင်းအပိုင်းကဲ့သို့ပင် ဆက်လက်လုပ်ဆောင်ရန် လိုအပ်သော Package များကို ယခု Install လုပ်ရပါမည်။ ထပ်မံ၍ 🤗 Transformers ၏ Development Version ကို `pip` Package Manager ကို အသုံးပြု၍ Install လုပ်နိုင်သည်-
+Google Colab instances တွေကို အသုံးပြုခြင်းနဲ့ ပတ်သက်တဲ့ ယခင်အပိုင်းမှာလိုပဲ၊ ဆက်လက်လုပ်ဆောင်ဖို့ လိုအပ်တဲ့ packages တွေကို အခု သင် install လုပ်ဖို့ လိုအပ်ပါလိမ့်မယ်။ ထပ်မံပြီးတော့ `pip` package manager ကို အသုံးပြုပြီး 🤗 Transformers ရဲ့ development version ကို install လုပ်နိုင်ပါတယ်-
 
 ```
 pip install "transformers[sentencepiece]"
 ```
 
-သင် အခုဆိုရင် အားလုံးအဆင်သင့်ဖြစ်ပြီး လုပ်ဆောင်ရန် အသင့်ဖြစ်နေပါပြီ။
\ No newline at end of file
+အခု သင်အားလုံး အဆင်သင့်ဖြစ်ပြီ၊ စတင်ဖို့ အသင့်ပါပဲ။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Working Environment**: ဆော့ဖ်ဝဲလ် သို့မဟုတ် project တစ်ခုကို ဖန်တီး၊ တည်ဆောက်ပြီး run ရန်အတွက် လိုအပ်သော ကိရိယာများနှင့် ဆော့ဖ်ဝဲလ်များ အားလုံးပါဝင်သော စနစ်ပတ်ဝန်းကျင်။
+*   **Chapter 1**: သင်တန်း၏ ပထမဆုံး အခန်း။
+*   **Python Packages**: Python ပရိုဂရမ်များတွင် အသုံးပြုရန်အတွက် စုစည်းထားသော modules နှင့် code များ။
+*   **Python Environment**: Python code များကို run ရန်အတွက် လိုအပ်သော Python interpreter နှင့် libraries များ အားလုံးပါဝင်သော ပတ်ဝန်းကျင်။
+*   **Colab Notebook (Google Colab)**: Google မှ ပံ့ပိုးပေးထားသော cloud-based Jupyter Notebook environment တစ်ခုဖြစ်ပြီး Python code များကို web browser မှတစ်ဆင့် run နိုင်စေသည်။ အခမဲ့ GPU/TPU အသုံးပြုခွင့်ပေးသည်။
+*   **Python Virtual Environment**: အခြား Python environment များမှ သီးခြားစီ ခွဲထုတ်ထားသော Python environment တစ်ခု။ ၎င်းသည် project တစ်ခုစီအတွက် ၎င်း၏ကိုယ်ပိုင် dependencies များကို ထိန်းသိမ်းရန် ကူညီပေးသည်။
+*   **Linux Distribution**: Linux kernel ပေါ် အခြေခံထားသော operating system (ဥပမာ - Ubuntu, Fedora)။
+*   **macOS**: Apple Inc. မှ ထုတ်လုပ်ထားသော operating system။
+*   **Hugging Face Account**: Hugging Face ပလက်ဖောင်းပေါ်ရှိ သုံးစွဲသူအကောင့်။ ၎င်းသည် မော်ဒယ်များ၊ datasets များနှင့် အခြားအရင်းအမြစ်များကို ဝင်ရောက်ကြည့်ရှုရန် ခွင့်ပြုသည်။
+*   **GPU (Graphics Processing Unit)**: ဂရပ်ဖစ်လုပ်ဆောင်မှုအတွက် အထူးဒီဇိုင်းထုတ်ထားသော processor တစ်မျိုးဖြစ်သော်လည်း AI/ML လုပ်ငန်းများတွင် အရှိန်မြှင့်ရန် အသုံးများသည်။
+*   **TPU (Tensor Processing Unit)**: Google မှ AI/ML workloads များအတွက် အထူးဒီဇိုင်းထုတ်ထားသော processor တစ်မျိုး။
+*   **`pip`**: Python အတွက် package installer (package manager)။ Python packages များကို install လုပ်ရန်နှင့် စီမံခန့်ခွဲရန် အသုံးပြုသည်။
+*   **`!` Character**: Jupyter/Colab Notebook များတွင် shell commands များကို run ရန်အတွက် အသုံးပြုသော prefix။
+*   **🤗 Transformers Library**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး Transformer မော်ဒယ်တွေကို အသုံးပြုပြီး Natural Language Processing (NLP), computer vision, audio processing စတဲ့ နယ်ပယ်တွေမှာ အဆင့်မြင့် AI မော်ဒယ်တွေကို တည်ဆောက်ပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **Python Runtime**: Python code ကို လက်ရှိ run နေသော ပတ်ဝန်းကျင်။
+*   **Machine Learning Frameworks**: Machine learning မော်ဒယ်များကို တည်ဆောက်ရန်၊ လေ့ကျင့်ရန်နှင့် အသုံးပြုရန်အတွက် ကိရိယာများနှင့် library များ စုစည်းမှု (ဥပမာ - PyTorch, TensorFlow)။
+*   **PyTorch**: Facebook (ယခု Meta) က ဖန်တီးထားတဲ့ open-source machine learning library တစ်ခုဖြစ်ပြီး deep learning မော်ဒယ်တွေ တည်ဆောက်ဖို့အတွက် အသုံးပြုပါတယ်။
+*   **TensorFlow**: Google က ဖန်တီးထားတဲ့ open-source machine learning library တစ်ခုဖြစ်ပြီး deep learning မော်ဒယ်တွေ တည်ဆောက်ဖို့အတွက် အသုံးပြုပါတယ်။
+*   **Development Version**: ဆော့ဖ်ဝဲလ်တစ်ခု၏ အပြည့်အစုံသော၊ အင်္ဂါရပ်အားလုံးပါဝင်သည့် ဗားရှင်း (အများအားဖြင့် စမ်းသပ်ခြင်းနှင့် ဖွံ့ဖြိုးတိုးတက်မှုအတွက်)။
+*   **Dependencies**: ဆော့ဖ်ဝဲလ်တစ်ခု သို့မဟုတ် library တစ်ခု အလုပ်လုပ်ရန် လိုအပ်သော အခြား library များနှင့် modules များ။
+*   **`venv` Module**: Python ၏ တရားဝင် module တစ်ခုဖြစ်ပြီး virtual environments များကို ဖန်တီးရန် အသုံးပြုသည်။
+*   **`mkdir`**: Command line command တစ်ခုဖြစ်ပြီး directory (folder) အသစ်တစ်ခုကို ဖန်တီးရန် အသုံးပြုသည်။
+*   **`cd`**: Command line command တစ်ခုဖြစ်ပြီး directory တစ်ခုမှ အခြားတစ်ခုသို့ ပြောင်းလဲရန် အသုံးပြုသည်။
+*   **`ls -a`**: Command line command တစ်ခုဖြစ်ပြီး လက်ရှိ directory ရှိ ဖိုင်များနှင့် directory များအားလုံးကို (ဝှက်ထားသောဖိုင်များပါ အပါအဝင်) ပြသရန် အသုံးပြုသည်။
+*   **`source`**: Unix/Linux shell command တစ်ခုဖြစ်ပြီး script တစ်ခုကို လက်ရှိ shell ထဲတွင် run ရန် အသုံးပြုသည်။
+*   **`activate` / `deactivate`**: virtual environment ကို ဖွင့်ရန် သို့မဟုတ် ပိတ်ရန် အသုံးပြုသော scripts များ။
+*   **`which python`**: လက်ရှိ shell က အသုံးပြုနေသော `python` executable ၏ လမ်းကြောင်းကို ပြသရန် အသုံးပြုသော command။
+*   **`transformers[sentencepiece]`**: `transformers` library ကို `sentencepiece` dependency ပါ အပါအဝင် install လုပ်ရန်အတွက် `pip` syntax။
diff --git a/chapters/my/chapter1/1.mdx b/chapters/my/chapter1/1.mdx
new file mode 100644
index 000000000..c8d31700d
--- /dev/null
+++ b/chapters/my/chapter1/1.mdx
@@ -0,0 +1,190 @@
+# နိဒါန်း[[introduction]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+## 🤗 သင်တန်းမှ ကြိုဆိုပါတယ်။[[welcome-to-the-course]]
+
+<Youtube id="00GKzGyWFEs" />
+
+ဒီသင်တန်းက Hugging Face ရဲ့ ecosystem ထဲက library တွေဖြစ်တဲ့ 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers နဲ့ 🤗 Accelerate တို့အပြင် Hugging Face Hub ကိုပါ အသုံးပြုပြီး Large Language Models (LLMs) နဲ့ Natural Language Processing (NLP) တို့အကြောင်းကို သင်ကြားပေးမှာပါ။
+
+Hugging Face ecosystem ပြင်ပက library တွေကိုလည်း ထည့်သွင်းသင်ကြားပေးသွားမှာပါ။ ဒါတွေဟာ AI ကဏ္ဍအတွက် အံ့မခန်းပံ့ပိုးမှုတွေဖြစ်ပြီး အသုံးဝင်တဲ့ ကိရိယာတွေပါ။
+
+ဒီသင်တန်းက လုံးဝအခမဲ့ဖြစ်ပြီး ကြော်ငြာတွေလည်း မပါဝင်ပါဘူး။
+
+## Natural Language Processing (NLP) နဲ့ Large Language Models (LLMs) တွေကို နားလည်ခြင်း[[understanding-nlp-and-llms]]
+
+ဒီသင်တန်းဟာ မူလက Natural Language Processing (NLP) ကို အဓိကထားခဲ့ပေမယ့်၊ ဒီနယ်ပယ်ရဲ့ နောက်ဆုံးပေါ် တိုးတက်မှုဖြစ်တဲ့ Large Language Models (LLMs) တွေကို ပိုပြီးအလေးပေး သင်ကြားနိုင်အောင် ပြောင်းလဲထားပါတယ်။
+
+**ဘာတွေ ကွာခြားလဲ။**
+- **Natural Language Processing (NLP)** ဆိုတာ ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ ပိုကျယ်ပြန့်တဲ့ နယ်ပယ်တစ်ခုပါ။ NLP မှာ စိတ်ခံစားမှုဆန်းစစ်ခြင်း၊ နာမည်သတ်မှတ်ခြင်းနဲ့ စက်ဘာသာပြန်ခြင်းစတဲ့ နည်းစနစ်များစွာနဲ့ လုပ်ငန်းတာဝန်တွေ ပါဝင်ပါတယ်။
+- **Large Language Models (LLMs)** တွေကတော့ NLP မော်ဒယ်တွေရဲ့ အစွမ်းထက်တဲ့ အစိတ်အပိုင်းတစ်ခုဖြစ်ပြီး ၎င်းတို့ရဲ့ ကြီးမားတဲ့ အရွယ်အစား၊ များပြားတဲ့ သင်ကြားမှု ဒေတာတွေနဲ့ သီးသန့်တာဝန်အတွက် သင်ကြားမှုအနည်းဆုံးနဲ့ ဘာသာစကားလုပ်ငန်းတာဝန် အမျိုးမျိုးကို လုပ်ဆောင်နိုင်စွမ်းတို့ကြောင့် ထူးခြားပါတယ်။ Llama, GPT, ဒါမှမဟုတ် Claude စီးရီးလို မော်ဒယ်တွေဟာ LLMs တွေရဲ့ ဥပမာတွေဖြစ်ပြီး NLP နယ်ပယ်မှာ ဖြစ်နိုင်ခြေတွေကို တော်လှန်ပြောင်းလဲခဲ့ပါတယ်။
+
+ဒီသင်တန်းတစ်လျှောက်လုံးမှာ သင်ဟာ ရိုးရာ NLP သဘောတရားတွေရော၊ ခေတ်မီ LLM နည်းပညာတွေပါ လေ့လာရမှာဖြစ်ပါတယ်။ ဘာလို့လဲဆိုတော့ NLP ရဲ့ အခြေခံအုတ်မြစ်တွေကို နားလည်ထားတာဟာ LLMs တွေနဲ့ ထိထိရောက်ရောက် အလုပ်လုပ်ဖို့အတွက် အရေးကြီးလို့ပါ။
+
+## ဘာတွေ မျှော်လင့်ထားနိုင်မလဲ။[[what-to-expect]]
+
+ဒီသင်တန်းရဲ့ အကျဉ်းချုပ်ကို အောက်မှာဖော်ပြထားပါတယ်။
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary.svg" alt="Brief overview of the chapters of the course.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary-dark.svg" alt="Brief overview of the chapters of the course.">
+</div>
+
+- အခန်း (၁) မှ (၄) အထိက 🤗 Transformers library ရဲ့ အဓိကသဘောတရားတွေကို မိတ်ဆက်ပေးထားပါတယ်။ ဒီအပိုင်းအဆုံးမှာ Transformer မော်ဒယ်တွေ ဘယ်လိုအလုပ်လုပ်တယ်ဆိုတာကို သင်နားလည်လာမှာဖြစ်ပြီး Hugging Face Hub ကနေ မော်ဒယ်တစ်ခုကို ဘယ်လိုအသုံးပြုရမယ်၊ dataset တစ်ခုပေါ်မှာ ဘယ်လို fine-tune လုပ်ရမယ်၊ ပြီးတော့ သင်ရဲ့ရလဒ်တွေကို Hub ပေါ်မှာ ဘယ်လို share ရမယ်ဆိုတာကိုပါ သိရှိလာပါလိမ့်မယ်။
+- အခန်း (၅) မှ (၈) အထိကတော့ classic NLP လုပ်ငန်းတာဝန်တွေနဲ့ LLM နည်းပညာတွေထဲ မဝင်ခင် 🤗 Datasets နဲ့ 🤗 Tokenizers ရဲ့ အခြေခံတွေကို သင်ကြားပေးမှာပါ။ ဒီအပိုင်းအဆုံးမှာတော့ အသုံးအများဆုံး ဘာသာစကားလုပ်ဆောင်မှု စိန်ခေါ်မှုတွေကို ကိုယ်တိုင်ဖြေရှင်းနိုင်ပါလိမ့်မယ်။
+- အခန်း (၉) ကတော့ NLP နယ်ပယ်ကို ကျော်လွန်ပြီး သင်ရဲ့မော်ဒယ်တွေရဲ့ demo တွေကို Hugging Face Hub ပေါ်မှာ ဘယ်လိုဖန်တီးပြီး share ရမယ်ဆိုတာကို ဖော်ပြပေးပါလိမ့်မယ်။ ဒီအပိုင်းအဆုံးမှာတော့ သင်ရဲ့ 🤗 Transformers application တွေကို ကမ္ဘာကြီးကို ပြသဖို့ အဆင်သင့်ဖြစ်နေပါလိမ့်မယ်။
+- အခန်း (၁၀) မှ (၁၂) အထိကတော့ fine-tuning, အရည်အသွေးမြင့် dataset များ ပြင်ဆင်ခြင်းနဲ့ reasoning မော်ဒယ်များ တည်ဆောက်ခြင်းစတဲ့ အဆင့်မြင့် LLM ခေါင်းစဉ်များထဲကို နက်နက်နဲနဲ လေ့လာသွားမှာပါ။
+
+ဒီသင်တန်းအတွက် လိုအပ်ချက်များ-
+
+* Python ကို ကောင်းကောင်းသိနားလည်ထားဖို့လိုပါတယ်။
+* fast.ai ရဲ့ Practical Deep Learning for Coders ဒါမှမဟုတ် DeepLearning.AI က ပရိုဂရမ်တစ်ခုခုလို deep learning အခြေခံသင်တန်းတစ်ခုခု တက်ရောက်ပြီးမှ သင်ယူရင် ပိုကောင်းပါတယ်။
+* PyTorch ဒါမှမဟုတ် TensorFlow အကြောင်းကို ကြိုတင်သိထားဖို့ မလိုအပ်ပေမယ့်၊ အနည်းငယ် ရင်းနှီးထားရင်တော့ အထောက်အကူဖြစ်ပါလိမ့်မယ်။
+
+ဒီသင်တန်းပြီးမြောက်သွားရင် DeepLearning.AI ရဲ့ Natural Language Processing Specialization ကို ဆက်လက်လေ့လာဖို့ ကျွန်တော်တို့ အကြံပြုချင်ပါတယ်။ အဲဒီသင်တန်းမှာ naive Bayes နဲ့ LSTMs လို ရိုးရာ NLP မော်ဒယ်အမျိုးအစားများစွာ ပါဝင်ပြီး သိထားသင့်တဲ့အရာတွေ ဖြစ်ပါတယ်။
+
+## ကျွန်တော်တို့က ဘယ်သူတွေလဲ။[[who-are-we]]
+
+စာရေးဆရာများအကြောင်း:
+
+[**Abubakar Abid**](https://huggingface.co/abidlabs) ဟာ Stanford တက္ကသိုလ်မှာ applied machine learning ဘာသာရပ်နဲ့ ပါရဂူဘွဲ့ရရှိခဲ့ပါတယ်။ သူ ပါရဂူဘွဲ့ယူနေစဉ် [Gradio](https://github.com/gradio-app/gradio) ကို တည်ထောင်ခဲ့ပါတယ်။ Gradio ဟာ open-source Python library တစ်ခုဖြစ်ပြီး machine learning demo ပေါင်း ၆၀၀,၀၀၀ ကျော်ကို ဖန်တီးရာမှာ အသုံးပြုခဲ့ပါတယ်။ Gradio ကို Hugging Face က ဝယ်ယူခဲ့ပြီး Abubakar ကတော့ အခု Hugging Face မှာ machine learning team lead အဖြစ် တာဝန်ထမ်းဆောင်နေပါတယ်။
+
+[**Ben Burtenshaw**](https://huggingface.co/burtenshaw) ဟာ Hugging Face မှာ Machine Learning Engineer အဖြစ် တာဝန်ထမ်းဆောင်နေပါတယ်။ သူက University of Antwerp မှာ Natural Language Processing ဘာသာရပ်နဲ့ ပါရဂူဘွဲ့ရခဲ့ပြီး၊ စာတတ်မြောက်မှုစွမ်းရည် တိုးတက်စေဖို့အတွက် Transformer မော်ဒယ်တွေကို ကလေးပုံပြင်တွေ ဖန်တီးရာမှာ အသုံးပြုခဲ့ပါတယ်။ ထိုအချိန်မှစ၍ သူသည် ပညာရေးဆိုင်ရာပစ္စည်းများနှင့် ကိရိယာများကို ပိုမိုကျယ်ပြန့်သော လူ့အဖွဲ့အစည်းအတွက် အဓိကထား လုပ်ဆောင်ခဲ့ပါတယ်။
+
+[**Matthew Carrigan**](https://huggingface.co/Rocketknight1) ဟာ Hugging Face မှာ Machine Learning Engineer တစ်ဦးဖြစ်ပါတယ်။ သူက အိုင်ယာလန်နိုင်ငံ၊ ဒပ်ဘလင်မြို့မှာ နေထိုင်ပြီး အရင်က Parse.ly မှာ ML engineer အဖြစ်နဲ့ Trinity College Dublin မှာ post-doctoral researcher အဖြစ် လုပ်ကိုင်ခဲ့ပါတယ်။ သူက လက်ရှိ architecture တွေရဲ့ အရွယ်အစားကိုချဲ့ရုံနဲ့ AGI ကို ရောက်လိမ့်မယ်လို့ မယုံကြည်ပေမယ့်၊ robot တွေရဲ့ ထာဝရရှင်သန်မှုအပေါ်မှာတော့ မျှော်လင့်ချက်ကြီးမားစွာ ထားရှိပါတယ်။
+
+[**Lysandre Debut**](https://huggingface.co/lysandre) ဟာ Hugging Face မှာ Machine Learning Engineer တစ်ဦးဖြစ်ပြီး 🤗 Transformers library ကို အစောဆုံးဖွံ့ဖြိုးတိုးတက်မှု အဆင့်တွေကတည်းက စတင်လုပ်ဆောင်ခဲ့သူပါ။ သူ့ရဲ့ ရည်ရွယ်ချက်ကတော့ အလွန်ရိုးရှင်းတဲ့ API ပါတဲ့ ကိရိယာတွေကို တီထွင်ခြင်းဖြင့် လူတိုင်းအတွက် NLP ကို လက်လှမ်းမီစေဖို့ပါပဲ။
+
+[**Sylvain Gugger**](https://huggingface.co/sgugger) ဟာ Hugging Face မှာ Research Engineer တစ်ဦးဖြစ်ပြီး 🤗 Transformers library ရဲ့ အဓိက ထိန်းသိမ်းသူတွေထဲက တစ်ဦးလည်း ဖြစ်ပါတယ်။ ယခင်က သူသည် fast.ai တွင် Research Scientist အဖြစ် တာဝန်ထမ်းဆောင်ခဲ့ပြီး Jeremy Howard နှင့်အတူ _[Deep Learning for Coders with fastai and PyTorch](https://learning.oreilly.com/library/view/deep-learning-for/9781492045519/)_ စာအုပ်ကို ပူးတွဲရေးသားခဲ့ပါတယ်။ သူ့ရဲ့ သုတေသနရဲ့ အဓိကအာရုံကတော့ deep learning ကို ပိုမိုလက်လှမ်းမီအောင် ပြုလုပ်ဖို့၊ အကန့်အသတ်ရှိတဲ့ အရင်းအမြစ်တွေနဲ့ မော်ဒယ်တွေကို မြန်မြန်ဆန်ဆန် လေ့ကျင့်နိုင်တဲ့ နည်းစနစ်တွေကို ဒီဇိုင်းဆွဲပြီး တိုးတက်အောင် လုပ်ဆောင်ဖို့ ဖြစ်ပါတယ်။
+
+[**Dawood Khan**](https://huggingface.co/dawoodkhan82) ဟာ Hugging Face မှာ Machine Learning Engineer တစ်ဦးဖြစ်ပါတယ်။ သူက NYC ကလာပြီး New York University ကနေ Computer Science ဘာသာရပ်နဲ့ ဘွဲ့ရခဲ့ပါတယ်။ iOS Engineer အဖြစ် နှစ်အနည်းငယ် အလုပ်လုပ်ပြီးနောက် Dawood ဟာ Gradio ကို သူ့ရဲ့ ပူးတွဲတည်ထောင်သူတွေနဲ့အတူ စတင်ဖို့ အလုပ်ကနေ ထွက်ခဲ့ပါတယ်။ နောက်ဆုံးတော့ Gradio ကို Hugging Face က ဝယ်ယူခဲ့ပါတယ်။
+
+[**Merve Noyan**](https://huggingface.co/merve) ဟာ Hugging Face က developer advocate တစ်ဦးဖြစ်ပြီး လူတိုင်းအတွက် machine learning ကို ဒီမိုကရေစီနည်းကျစေဖို့ ကိရိယာတွေ တီထွင်ပြီး ၎င်းတို့နဲ့ပတ်သက်တဲ့ အကြောင်းအရာတွေကို ဖန်တီးနေပါတယ်။
+
+[**Lucile Saulnier**](https://huggingface.co/SaulLu) ဟာ Hugging Face မှာ machine learning engineer တစ်ဦးဖြစ်ပြီး open-source tool တွေရဲ့ အသုံးပြုမှုကို ဖွံ့ဖြိုးတိုးတက်စေကာ ပံ့ပိုးပေးနေပါတယ်။ သူမသည် Natural Language Processing နယ်ပယ်ရှိ collaborative training နှင့် BigScience ကဲ့သို့သော သုတေသနပရောဂျက်များစွာတွင်လည်း တက်ကြွစွာ ပါဝင်ဆောင်ရွက်နေပါတယ်။
+
+[**Lewis Tunstall**](https://huggingface.co/lewtun) ဟာ Hugging Face မှာ machine learning engineer တစ်ဦးဖြစ်ပြီး open-source tool တွေကို တီထွင်ကာ ပိုမိုကျယ်ပြန့်တဲ့ အသိုင်းအဝိုင်းကို လက်လှမ်းမီအောင် လုပ်ဆောင်နေသူပါ။ သူသည် O'Reilly စာအုပ်ဖြစ်သော [Natural Language Processing with Transformers](https://www.oreilly.com/library/view/natural-language-processing/9781098136789/) ၏ ပူးတွဲစာရေးဆရာလည်း ဖြစ်ပါတယ်။
+
+[**Leandro von Werra**](https://huggingface.co/lvwerra) ဟာ Hugging Face ရဲ့ open-source team မှာ machine learning engineer တစ်ဦးဖြစ်ပြီး O'Reilly စာအုပ်ဖြစ်သော [Natural Language Processing with Transformers](https://www.oreilly.com/library/view/natural-language-processing/9781098136789/) ၏ ပူးတွဲစာရေးဆရာလည်း ဖြစ်ပါတယ်။ သူသည် machine learning stack တစ်လျှောက်လုံး လုပ်ကိုင်ရင်း NLP project များကို ထုတ်လုပ်မှုအဆင့်သို့ ရောက်ရှိစေရာတွင် စက်မှုလုပ်ငန်းအတွေ့အကြုံ နှစ်ပေါင်းများစွာ ရှိခဲ့သူပါ။
+
+## မကြာခဏမေးလေ့ရှိသော မေးခွန်းများ (FAQ)[[faq]]
+
+မကြာခဏမေးလေ့ရှိတဲ့ မေးခွန်းတွေရဲ့ အဖြေတွေကတော့ ဒီမှာပါ။
+
+- **ဒီသင်တန်းတက်ရင် အသိအမှတ်ပြုလက်မှတ် (certification) ရနိုင်လား။**
+လက်ရှိအချိန်မှာတော့ ဒီသင်တန်းအတွက် အသိအမှတ်ပြုလက်မှတ် မရှိသေးပါဘူး။ ဒါပေမယ့် Hugging Face ecosystem အတွက် certification program တစ်ခုကို စီစဉ်နေပါတယ် — စောင့်မျှော်ပေးပါဦး။
+
+- **ဒီသင်တန်းအတွက် အချိန်ဘယ်လောက်ပေးရမလဲ။**
+ဒီသင်တန်းက အခန်းတစ်ခန်းစီကို တစ်ပတ်အတွင်း ပြီးစီးအောင် ဒီဇိုင်းထုတ်ထားပြီး တစ်ပတ်ကို ၆-၈ နာရီခန့် အချိန်ပေးရပါမယ်။ ဒါပေမယ့် သင်တန်းပြီးဆုံးဖို့ လိုအပ်သလောက် အချိန်ယူနိုင်ပါတယ်။
+
+- **မေးခွန်းရှိရင် ဘယ်မှာမေးလို့ရလဲ။**
+သင်တန်းရဲ့ ဘယ်အပိုင်းနဲ့ပတ်သက်ပြီး မေးခွန်းရှိသည်ဖြစ်စေ၊ စာမျက်နှာရဲ့ ထိပ်ပိုင်းမှာရှိတဲ့ "*Ask a question*" banner ကို နှိပ်လိုက်ရုံနဲ့ [Hugging Face forums](https://discuss.huggingface.co/) ရဲ့ မှန်ကန်တဲ့ အပိုင်းကို အလိုအလျောက် ရောက်ရှိသွားပါလိမ့်မယ်။
+
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/forum-button.png" alt="Link to the Hugging Face forums" width="75%">
+
+သင်တန်းပြီးဆုံးသွားတဲ့အခါ ပိုမိုလေ့ကျင့်ချင်တယ်ဆိုရင် forums မှာ [project ideas](https://discuss.huggingface.co/c/course/course-event/25) စာရင်းကိုလည်း ရရှိနိုင်ပါတယ်။
+
+- **သင်တန်းအတွက် code တွေကို ဘယ်မှာရနိုင်မလဲ။**
+အခန်းတစ်ခန်းစီအတွက် စာမျက်နှာရဲ့ ထိပ်ပိုင်းမှာရှိတဲ့ banner ကို နှိပ်လိုက်ရင် Google Colab ဒါမှမဟုတ် Amazon SageMaker Studio Lab မှာ code တွေကို run နိုင်ပါပြီ။
+
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/notebook-buttons.png" alt="Link to the Hugging Face course notebooks" width="75%">
+
+
+သင်တန်းရဲ့ code အားလုံးပါဝင်တဲ့ Jupyter notebooks တွေကို [`huggingface/notebooks`](https://github.com/huggingface/notebooks) repo မှာ လက်ခံထားပါတယ်။ သင် ကိုယ်တိုင် generate လုပ်ချင်တယ်ဆိုရင် GitHub ပေါ်ရှိ course repo မှာ ဖော်ပြထားတဲ့ ညွှန်ကြားချက်တွေကို ကြည့်ရှုနိုင်ပါတယ်။
+
+- **ဒီသင်တန်းကို ဘယ်လို ပံ့ပိုးကူညီနိုင်မလဲ။**
+ဒီသင်တန်းကို ပံ့ပိုးကူညီနိုင်တဲ့ နည်းလမ်းများစွာရှိပါတယ်။ စာလုံးပေါင်းမှားတာ ဒါမှမဟုတ် bug တွေ့ရင် [`course`](https://github.com/huggingface/course) repo မှာ issue ဖွင့်ပေးပါ။ သင်တန်းကို သင်ရဲ့ မိခင်ဘာသာစကားနဲ့ ဘာသာပြန်ဆိုဖို့ ကူညီချင်တယ်ဆိုရင် [ဒီမှာ](https://github.com/huggingface/course#translating-the-course-into-your-language) ဖော်ပြထားတဲ့ ညွှန်ကြားချက်တွေကို ကြည့်ရှုနိုင်ပါတယ်။
+
+- **ဘာသာပြန်ဆိုမှုတစ်ခုစီအတွက် ဘယ်လိုရွေးချယ်မှုတွေ လုပ်ခဲ့လဲ။**
+ဘာသာပြန်ဆိုမှုတစ်ခုစီမှာ machine learning jargon စတာတွေအတွက် လုပ်ခဲ့တဲ့ ရွေးချယ်မှုတွေကို အသေးစိတ်ဖော်ပြထားတဲ့ glossary နဲ့ TRANSLATING.txt ဖိုင်တစ်ခု ပါရှိပါတယ်။ ဥပမာအဖြစ် ဂျာမန်ဘာသာအတွက် [ဒီမှာ](https://github.com/huggingface/course/blob/main/chapters/de/TRANSLATING.txt) ကြည့်ရှုနိုင်ပါတယ်။
+
+- **ဒီသင်တန်းကို ပြန်လည်အသုံးပြုနိုင်လား။**
+ဟုတ်ကဲ့၊ အသုံးပြုနိုင်ပါတယ်။ ဒီသင်တန်းကို ခွင့်ပြုချက်မြင့်မားတဲ့ [Apache 2 license](https://www.apache.org/licenses/LICENSE-2.0.html) အောက်မှာ ထုတ်ပြန်ထားပါတယ်။ ဒါကတော့ သင်ဟာ သင့်လျော်တဲ့ credit ပေးရမယ်၊ license link ကို ထည့်သွင်းပေးရမယ်၊ ပြောင်းလဲမှုတွေလုပ်ခဲ့ရင်လည်း ဖော်ပြပေးရမယ်လို့ ဆိုလိုပါတယ်။ ဒါတွေကို သင့်လျော်တဲ့ နည်းလမ်းနဲ့ လုပ်ဆောင်နိုင်ပေမယ့်၊ လိုင်စင်ထုတ်ပေးသူက သင့်ကို ဒါမှမဟုတ် သင့်အသုံးပြုမှုကို ထောက်ခံတယ်လို့ ထင်မြင်စေမယ့်ပုံစံမျိုး မလုပ်ဆောင်ရပါဘူး။ သင်တန်းကို ကိုးကားလိုပါက အောက်ပါ BibTeX ကို အသုံးပြုပါ။
+
+```
+@misc{huggingfacecourse,
+  author = {Hugging Face},
+  title = {The Hugging Face Course, 2022},
+  howpublished = "\url{https://huggingface.co/course}",
+  year = {2022},
+  note = "[Online; accessed <today>]"
+}
+```
+
+## ဘာသာစကားများနှင့် ဘာသာပြန်ဆိုမှုများ[[languages-and-translations]]
+
+ကျွန်တော်တို့ရဲ့ အံ့ဖွယ်ကောင်းတဲ့ အသိုင်းအဝိုင်းကြောင့် ဒီသင်တန်းကို အင်္ဂလိပ်ဘာသာစကားအပြင် အခြားဘာသာစကားများစွာနဲ့လည်း ရရှိနိုင်ပါပြီ 🔥! ဘယ်ဘာသာစကားတွေ ရရှိနိုင်ပြီး ဘယ်သူတွေ ဘာသာပြန်ဆိုရာမှာ ပါဝင်ကူညီခဲ့လဲဆိုတာကို အောက်ပါဇယားမှာ ကြည့်ရှုနိုင်ပါတယ်။
+
+| Language                                                                      | Authors                                                                                                                                                                                                                                                                                                                                                  |
+|:------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [French](https://huggingface.co/course/fr/chapter1/1)                         | [@lbourdois](https://github.com/lbourdois), [@ChainYo](https://github.com/ChainYo), [@melaniedrevet](https://github.com/melaniedrevet), [@abdouaziz](https://github.com/abdouaziz)                                                                                                                                                                       |
+| [Vietnamese](https://huggingface.co/course/vi/chapter1/1)                     | [@honghanhh](https://github.com/honghanhh)                                                                                                                                                                                                                                                                                                               |
+| [Chinese (simplified)](https://huggingface.co/course/zh-CN/chapter1/1)        | [@zhlhyx](https://github.com/zhlhyx), [petrichor1122](https://github.com/petrichor1122), [@yaoqih](https://github.com/yaoqih)                                                                                                                                                                                                                    |
+| [Bengali](https://huggingface.co/course/bn/chapter1/1) (WIP)                  | [@avishek-018](https://github.com/avishek-018), [@eNipu](https://github.com/eNipu)                                                                                                                                                                                                                                                                       |
+| [German](https://huggingface.co/course/de/chapter1/1) (WIP)                   | [@JesperDramsch](https://github.com/JesperDramsch), [@MarcusFra](https://github.com/MarcusFra), [@fabridamicelli](https://github.com/fabridamicelli)                                                                                                                                                                                                     |
+| [Spanish](https://huggingface.co/course/es/chapter1/1) (WIP)                  | [@camartinezbu](https://github.com/camartinezbu), [@munozariasjm](https://github.com/munozariasjm), [@fordaz](https://github.com/fordaz)                                                                                                                                                                                                                 |
+| [Persian](https://huggingface.co/course/fa/chapter1/1) (WIP)                  | [@jowharshamshiri](https://github.com/jowharshamshiri), [@schoobani](https://github.com/schoobani)                                                                                                                                                                                                                                                       |
+| [Gujarati](https://huggingface.co/course/gu/chapter1/1) (WIP)                 | [@pandyaved98](https://github.com/pandyaved98)                                                                                                                                                                                                                                                                                                           |
+| [Hebrew](https://huggingface.co/course/he/chapter1/1) (WIP)                   | [@omer-dor](https://github.com/omer-dor)                                                                                                                                                                                                                                                                                                                 |
+| [Hindi](https://huggingface.co/course/hi/chapter1/1) (WIP)                    | [@pandyaved98](https://github.com/pandyaved98)                                                                                                                                                                                                                                                                                                           |
+| [Bahasa Indonesia](https://huggingface.co/course/id/chapter1/1) (WIP)         | [@gstdl](https://github.com/gstdl)                                                                                                                                                                                                                                                                                                                       |
+| [Italian](https://huggingface.co/course/it/chapter1/1) (WIP)                  | [@CaterinaBi](https://github.com/CaterinaBi), [@ClonedOne](https://github.com/ClonedOne), [@Nolanogenn](https://github.com/Nolanogenn), [@EdAbati](https://github.com/EdAbati), [@gdacciaro](https://github.com/gdacciaro)                                                                                                                               |
+| [Japanese](https://huggingface.co/course/ja/chapter1/1) (WIP)                 | [@hiromu166](https://github.com/@hiromu166), [@younesbelkada](https://github.com/@younesbelkada), [@HiromuHota](https://github.com/@HiromuHota)                                                                                                                                                                                                          |
+| [Korean](https://huggingface.co/course/ko/chapter1/1) (WIP)                   | [@Doohae](https://github.com/Doohae), [@wonhyeongseo](https://github.com/wonhyeongseo), [@dlfrnaos19](https://github.com/dlfrnaos19)                                                                                                                                                                                                                     |
+| [Portuguese](https://huggingface.co/course/pt/chapter1/1) (WIP)               | [@johnnv1](https://github.com/johnnv1), [@victorescosta](https://github.com/victorescosta), [@LincolnVS](https://github.com/LincolnVS)                                                                                                                                                                                                                   |
+| [Russian](https://huggingface.co/course/ru/chapter1/1) (WIP)                  | [@pdumin](https://github.com/pdumin), [@svv73](https://github.com/svv73)                                                                                                                                                                                                                                                                                 |
+| [Thai](https://huggingface.co/course/th/chapter1/1) (WIP)                     | [@peeraponw](https://github.com/peeraponw), [@a-krirk](https://github.com/a-krirk), [@jomariya23156](https://github.com/jomariya23156), [@ckingkan](https://github.com/ckingkan)                                                                                                                                                                         |
+| [Turkish](https://huggingface.co/course/tr/chapter1/1) (WIP)                  | [@tanersekmen](https://github.com/tanersekmen), [@mertbozkir](https://github.com/mertbozkir), [@ftarlaci](https://github.com/ftarlaci), [@akkasayaz](https://github.com/akkasayaz)                                                                                                                                                                       |
+| [Chinese (traditional)](https://huggingface.co/course/zh-TW/chapter1/1) (WIP) | [@davidpeng86](https://github.com/davidpeng86)                                                                                                                                                                                                                                                                                                           |
+
+အချို့ဘာသာစကားတွေအတွက် Hugging Face သင်တန်းရဲ့  [YouTube ဗီဒီယိုတွေ](https://youtube.com/playlist?list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o) မှာ အဲဒီဘာသာစကားတွေနဲ့ စာတန်းထိုးတွေ ပါဝင်ပါတယ်။ ဗီဒီယိုရဲ့ ညာဘက်အောက်ထောင့်မှာရှိတဲ့ _CC_ ခလုတ်ကို အရင်ဆုံးနှိပ်ပြီး ၎င်းတို့ကို ဖွင့်နိုင်ပါတယ်။ ထို့နောက် settings icon ⚙️ အောက်မှာရှိတဲ့ _Subtitles/CC_ option ကို ရွေးချယ်ပြီး လိုချင်တဲ့ ဘာသာစကားကို ရွေးနိုင်ပါတယ်။
+
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/subtitles.png" alt="Activating subtitles for the Hugging Face course YouTube videos" width="75%">
+
+> [!TIP]
+> အထက်ပါ ဇယားတွင် သင့်ဘာသာစကားကို မတွေ့ရပါက သို့မဟုတ် လက်ရှိဘာသာပြန်ဆိုမှုကို ပံ့ပိုးကူညီလိုပါက၊ <a href="https://github.com/huggingface/course#translating-the-course-into-your-language">ဤနေရာရှိ</a> ညွှန်ကြားချက်များကို လိုက်နာခြင်းဖြင့် သင်တန်းကို ဘာသာပြန်ဆိုရာတွင် ကူညီနိုင်ပါသည်။
+
+## စလိုက်ရအောင် 🚀
+
+သင် စတင်ဖို့ အဆင်သင့်ဖြစ်ပြီလား။ ဒီအခန်းမှာ သင်လေ့လာရမယ့်အရာတွေကတော့-
+
+* text generation နဲ့ classification လို Natural Language Processing (NLP) လုပ်ငန်းတာဝန်တွေကို `pipeline()` function အသုံးပြုပြီး ဘယ်လိုဖြေရှင်းရမယ်
+* Transformer architecture အကြောင်း
+* encoder, decoder နဲ့ encoder-decoder architecture တွေနဲ့ ၎င်းတို့ရဲ့ အသုံးပြုပုံ ကိစ္စရပ်တွေကို ဘယ်လို ခွဲခြားရမယ်
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Large Language Models (LLMs)**: လူသားဘာသာစကားကို နားလည်ပြီး ထုတ်လုပ်ပေးနိုင်တဲ့ အလွန်ကြီးမားတဲ့ Artificial Intelligence (AI) မော်ဒယ်တွေ ဖြစ်ပါတယ်။ ၎င်းတို့ဟာ ဒေတာအမြောက်အမြားနဲ့ သင်ကြားလေ့ကျင့်ထားပြီး စာရေးတာ၊ မေးခွန်းဖြေတာ စတဲ့ ဘာသာစကားဆိုင်ရာ လုပ်ငန်းမျိုးစုံကို လုပ်ဆောင်နိုင်ပါတယ်။
+*   **Natural Language Processing (NLP)**: ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး၊ ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။ ဥပမာအားဖြင့် စာသားခွဲခြမ်းစိတ်ဖြာခြင်း၊ ဘာသာပြန်ခြင်း စသည်တို့ ပါဝင်ပါတယ်။
+*   **Hugging Face Ecosystem**: Hugging Face ကုမ္ပဏီမှ ဖန်တီးထားတဲ့ AI နဲ့ machine learning အတွက် ကိရိယာတွေ၊ library တွေ၊ မော်ဒယ်တွေနဲ့ platform တွေရဲ့ အစုအဝေးတစ်ခုပါ။
+*   **🤗 Transformers**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး Transformer မော်ဒယ်တွေကို အသုံးပြုပြီး Natural Language Processing (NLP), computer vision, audio processing စတဲ့ နယ်ပယ်တွေမှာ အဆင့်မြင့် AI မော်ဒယ်တွေကို တည်ဆောက်ပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **🤗 Datasets**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် ဒေတာအစုအဝေး (datasets) တွေကို လွယ်လွယ်ကူကူ ဝင်ရောက်ရယူ၊ စီမံခန့်ခွဲပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **🤗 Tokenizers**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး စာသားတွေကို AI မော်ဒယ်တွေ နားလည်နိုင်တဲ့ ပုံစံ (tokens) တွေအဖြစ် ပြောင်းလဲပေးတဲ့ လုပ်ငန်းစဉ် (tokenization) ကို မြန်ဆန်ထိရောက်စွာ လုပ်ဆောင်ပေးပါတယ်။
+*   **🤗 Accelerate**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး PyTorch code တွေကို မတူညီတဲ့ training environment (ဥပမာ - GPU အများအပြား၊ distributed training) တွေမှာ အလွယ်တကူ run နိုင်အောင် ကူညီပေးပါတယ်။
+*   **Hugging Face Hub**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **Machine Learning (ML)**: ကွန်ပျူတာတွေဟာ ဒေတာတွေကနေ သင်ယူပြီး လုပ်ငန်းဆောင်တာတွေကို လူသားတွေရဲ့ ညွှန်ကြားချက်မပါဘဲ ကိုယ်တိုင်လုပ်ဆောင်နိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။
+*   **Artificial Intelligence (AI)**: လူသားတွေရဲ့ ဉာဏ်ရည်ဉာဏ်သွေးလိုမျိုး တွေးခေါ်နိုင်စွမ်း၊ သင်ယူနိုင်စွမ်းနဲ့ ပြဿနာဖြေရှင်းနိုင်စွမ်းရှိတဲ့ စက်တွေကို ဖန်တီးတဲ့ သိပ္ပံနယ်ပယ်တစ်ခုပါ။
+*   **Transformer Model**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Fine-tune**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် အနည်းငယ်သော ဒေတာနဲ့ ထပ်မံလေ့ကျင့်ပေးခြင်းကို ဆိုလိုပါတယ်။
+*   **Dataset**: AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် အသုံးပြုတဲ့ ဒေတာအစုအဝေးတစ်ခုပါ။
+*   **Sentiment Analysis**: စာသားတစ်ခုရဲ့ စိတ်ခံစားမှု (အပြုသဘော၊ အနုတ်သဘော၊ ကြားနေ) ကို ခွဲခြမ်းစိတ်ဖြာခြင်း။
+*   **Named Entity Recognition (NER)**: စာသားထဲက လူအမည်၊ နေရာအမည်၊ အဖွဲ့အစည်းအမည် စတဲ့ သီးခြားအမည်တွေကို ရှာဖွေဖော်ထုတ်ခြင်း။
+*   **Machine Translation**: ဘာသာစကားတစ်ခုကနေ အခြားဘာသာစကားတစ်ခုကို စာသားတွေ ဒါမှမဟုတ် စကားပြောတွေကို အလိုအလျောက် ဘာသာပြန်ဆိုခြင်း။
+*   **Gradio**: Python library တစ်ခုဖြစ်ပြီး machine learning မော်ဒယ်တွေအတွက် အသုံးပြုရလွယ်ကူတဲ့ web interface တွေ ဒါမှမဟုတ် demo တွေကို အလွယ်တကူ ဖန်တီးနိုင်စေပါတယ်။
+*   **API (Application Programming Interface)**: ဆော့ဖ်ဝဲလ် နှစ်ခုကြား အပြန်အလှန် ချိတ်ဆက်ဆောင်ရွက်နိုင်ရန် သတ်မှတ်ထားသော စည်းမျဉ်းအစုအဝေး (set of rules)။
+*   **Deep Learning**: Machine Learning ရဲ့ နယ်ပယ်ခွဲတစ်ခုဖြစ်ပြီး neural networks တွေကို အသုံးပြုကာ ဒေတာတွေကနေ ရှုပ်ထွေးတဲ့ ပုံစံတွေကို သင်ယူစေပါတယ်။
+*   **PyTorch**: Facebook (ယခု Meta) က ဖန်တီးထားတဲ့ open-source machine learning library တစ်ခုဖြစ်ပြီး deep learning မော်ဒယ်တွေ တည်ဆောက်ဖို့အတွက် အသုံးပြုပါတယ်။
+*   **TensorFlow**: Google က ဖန်တီးထားတဲ့ open-source machine learning library တစ်ခုဖြစ်ပြီး deep learning မော်ဒယ်တွေ တည်ဆောက်ဖို့အတွက် အသုံးပြုပါတယ်။
+*   **Naive Bayes**: ရိုးရှင်းပြီး အသုံးပြုရလွယ်ကူတဲ့ classification algorithm တစ်ခုဖြစ်ပြီး Bayes' Theorem ပေါ် အခြေခံထားပါတယ်။
+*   **LSTMs (Long Short-Term Memory)**: Recurrent Neural Networks (RNNs) ရဲ့ အထူးပြုပုံစံတစ်ခုဖြစ်ပြီး အချိန်ကြာမြင့်စွာ တည်ရှိနေတဲ့ မှတ်ဉာဏ် (long-term dependencies) တွေကို သင်ယူနိုင်စွမ်းရှိပါတယ်။
+*   **AGI (Artificial General Intelligence)**: လူသားတစ်ဦးလို ဉာဏ်ရည်ဉာဏ်သွေး၊ သင်ယူနိုင်စွမ်းနဲ့ လုပ်ငန်းဆောင်တာအမျိုးမျိုးကို လုပ်ဆောင်နိုင်စွမ်းရှိတဲ့ Artificial Intelligence (AI) အမျိုးအစားကို ဆိုလိုပါတယ်။
+*   **Encoder**: Transformer Architecture ရဲ့ အစိတ်အပိုင်းတစ်ခုဖြစ်ပြီး input data (ဥပမာ- စာသား) ကို နားလည်ပြီး ကိုယ်စားပြုတဲ့ အချက်အလက် (representation) အဖြစ် ပြောင်းလဲပေးပါတယ်။
+*   **Decoder**: Transformer Architecture ရဲ့ အစိတ်အပိုင်းတစ်ခုဖြစ်ပြီး encoder ကနေ ရရှိတဲ့ အချက်အလက် (representation) ကို အသုံးပြုပြီး output data (ဥပမာ- ဘာသာပြန်ထားတဲ့ စာသား) ကို ထုတ်ပေးပါတယ်။
+*   **Encoder-Decoder Architecture**: Encoder နှင့် Decoder နှစ်ခုစလုံး ပါဝင်သော Transformer architecture တစ်မျိုးဖြစ်ပြီး ဘာသာပြန်ခြင်းကဲ့သို့သော input sequence မှ output sequence တစ်ခုသို့ ပြောင်းလဲခြင်း လုပ်ငန်းများအတွက် အသုံးပြုပါတယ်။
+*   **Text Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ လူသားကဲ့သို့သော စာသားအသစ်များ ဖန်တီးခြင်း။
+*   **Classification**: ဒေတာအချက်အလက်များကို သတ်မှတ်ထားသော အမျိုးအစားများ သို့မဟုတ် အတန်းများထဲသို့ ခွဲခြားသတ်မှတ်ခြင်း။
+*   **Pipeline function**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ လုပ်ဆောင်ချက်တစ်ခုဖြစ်ပြီး မော်ဒယ်တွေကို သီးခြားလုပ်ငန်းတာဝန်များ (ဥပမာ- စာသားခွဲခြားသတ်မှတ်ခြင်း၊ စာသားထုတ်လုပ်ခြင်း) အတွက် အသုံးပြုရလွယ်ကူအောင် ပြုလုပ်ပေးပါတယ်။
\ No newline at end of file
diff --git a/chapters/my/chapter1/10.mdx b/chapters/my/chapter1/10.mdx
new file mode 100644
index 000000000..201ca3b6d
--- /dev/null
+++ b/chapters/my/chapter1/10.mdx
@@ -0,0 +1,111 @@
+# အနှစ်ချုပ်[[summary]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+ဒီအခန်းမှာ Transformer မော်ဒယ်တွေ၊ Large Language Models (LLMs) တွေရဲ့ အခြေခံသဘောတရားတွေနဲ့ ၎င်းတို့က Artificial Intelligence (AI) နယ်ပယ်အပြင် အခြားနယ်ပယ်တွေကိုပါ ဘယ်လို တော်လှန်ပြောင်းလဲနေတယ်ဆိုတာကို သင်လေ့လာခဲ့ပြီးပါပြီ။
+
+## အဓိက သဘောတရားများ[[key-concepts-covered]]
+
+### Natural Language Processing (NLP) နှင့် LLMs များ
+
+Natural Language Processing (NLP) ဆိုတာ ဘာလဲ၊ Large Language Models (LLMs) တွေက ဒီနယ်ပယ်ကို ဘယ်လို ပြောင်းလဲပစ်ခဲ့လဲဆိုတာကို ကျွန်တော်တို့ လေ့လာခဲ့ပါတယ်။ သင်လေ့လာခဲ့တဲ့ အချက်တွေကတော့-
+- NLP ဟာ classification ကနေ generation အထိ လုပ်ငန်းတာဝန်အမျိုးမျိုးကို လွှမ်းခြုံထားပါတယ်။
+- LLMs တွေဟာ ဒေတာအမြောက်အမြားနဲ့ လေ့ကျင့်ထားတဲ့ အစွမ်းထက်တဲ့ မော်ဒယ်တွေ ဖြစ်ပါတယ်။
+- ဒီမော်ဒယ်တွေက architecture တစ်ခုတည်းနဲ့ လုပ်ငန်းတာဝန်မျိုးစုံကို လုပ်ဆောင်နိုင်ပါတယ်။
+- ၎င်းတို့ရဲ့ စွမ်းရည်တွေရှိနေပေမယ့်လည်း LLMs တွေမှာ hallucinations နဲ့ bias တွေလို ကန့်သတ်ချက်တွေ ရှိပါတယ်။
+
+### Transformer ရဲ့ စွမ်းရည်များ[[transformer-capabilities]]
+
+🤗 Transformers library ထဲက `pipeline()` function က pre-trained model တွေကို လုပ်ငန်းတာဝန်အမျိုးမျိုးအတွက် ဘယ်လိုလွယ်ကူစွာ အသုံးပြုနိုင်လဲဆိုတာကို သင်တွေ့ခဲ့ရပါတယ်-
+- Text classification, token classification, နဲ့ question answering
+- Text generation နဲ့ summarization
+- Translation နဲ့ အခြား sequence-to-sequence လုပ်ငန်းတာဝန်များ
+- Speech recognition နဲ့ image classification
+
+### Transformer architecture[[transformer-architecture]]
+
+Transformer မော်ဒယ်တွေဟာ အကြမ်းဖျင်းအားဖြင့် ဘယ်လိုအလုပ်လုပ်တယ်ဆိုတာကို ကျွန်တော်တို့ ဆွေးနွေးခဲ့ပါတယ်-
+- Attention mechanism ရဲ့ အရေးပါမှု
+- Transfer learning က မော်ဒယ်တွေကို သီးခြားလုပ်ငန်းတာဝန်များအတွက် ဘယ်လို လိုက်လျောညီထွေဖြစ်အောင် ကူညီပေးတယ်ဆိုတာ
+- အဓိက architecture ပုံစံသုံးမျိုး- encoder-only, decoder-only, နဲ့ encoder-decoder
+
+### မော်ဒယ် architecture များနှင့် ၎င်းတို့၏ အသုံးချမှုများ[[model-architectures-and-their-applications]]
+ဒီအခန်းရဲ့ အဓိကအချက်ကတော့ မတူညီတဲ့ လုပ်ငန်းတာဝန်တွေအတွက် ဘယ် architecture ကို အသုံးပြုရမလဲဆိုတာကို နားလည်ခြင်း ဖြစ်ပါတယ်-
+
+| မော်ဒယ်           | ဥပမာများ                                   | လုပ်ငန်းတာဝန်များ                                                                            |
+|-----------------|--------------------------------------------|----------------------------------------------------------------------------------|
+| Encoder-only    | BERT, DistilBERT, ModernBERT               | စာကြောင်းခွဲခြားသတ်မှတ်ခြင်း (Sentence classification), အမည်သတ်မှတ်ခြင်း (named entity recognition), စာသားမှ အဖြေထုတ်ယူခြင်း (extractive question answering) |
+| Decoder-only    | GPT, LLaMA, Gemma, SmolLM                  | စာသားထုတ်လုပ်ခြင်း (Text generation), conversational AI, ဖန်တီးမှုစာရေးခြင်း (creative writing)                             |
+| Encoder-decoder | BART, T5, Marian, mBART                    | အကျဉ်းချုပ်ခြင်း (Summarization), ဘာသာပြန်ခြင်း (translation), ထုတ်လုပ်မှုမေးခွန်းဖြေခြင်း (generative question answering)                        |
+
+### ခေတ်မီ LLM ဖွံ့ဖြိုးတိုးတက်မှုများ[[modern-llm-developments]]
+နယ်ပယ်ရဲ့ မကြာသေးခင် ဖွံ့ဖြိုးတိုးတက်မှုတွေအကြောင်းကိုလည်း သင်လေ့လာခဲ့ပါတယ်-
+- LLMs တွေဟာ အချိန်နဲ့အမျှ အရွယ်အစားနဲ့ စွမ်းရည် ဘယ်လိုတိုးတက်လာခဲ့လဲဆိုတာ
+- Scaling laws သဘောတရားနဲ့ ၎င်းတို့က မော်ဒယ်ဖွံ့ဖြိုးတိုးတက်မှုကို ဘယ်လိုလမ်းညွှန်ပေးတယ်ဆိုတာ
+- မော်ဒယ်တွေကို ပိုမိုရှည်လျားတဲ့ sequences တွေကို လုပ်ဆောင်နိုင်အောင် ကူညီပေးတဲ့ သီးခြား attention mechanism များ
+- Pretraining နဲ့ instruction tuning တို့ရဲ့ နှစ်ဆင့်လေ့ကျင့်မှု ချဉ်းကပ်ပုံ
+
+### လက်တွေ့အသုံးချမှုများ[[practical-applications]]
+ဒီအခန်းတစ်လျှောက်လုံးမှာ ဒီမော်ဒယ်တွေကို လက်တွေ့ဘဝပြဿနာတွေမှာ ဘယ်လိုအသုံးချနိုင်လဲဆိုတာကို သင်တွေ့ခဲ့ရပါတယ်-
+- Hugging Face Hub ကို အသုံးပြုပြီး pre-trained model တွေ ရှာဖွေအသုံးပြုခြင်း
+- Inference API ကို အသုံးပြုပြီး browser ထဲမှာ မော်ဒယ်တွေကို တိုက်ရိုက်စမ်းသပ်ခြင်း
+- သီးခြားလုပ်ငန်းတာဝန်များအတွက် ဘယ်မော်ဒယ်တွေက အသင့်တော်ဆုံးလဲဆိုတာကို နားလည်ခြင်း
+
+## ရှေ့ဆက်မျှော်ကြည့်ခြင်း[[looking-ahead]]
+
+Transformer မော်ဒယ်တွေဆိုတာ ဘာလဲ၊ ၎င်းတို့က အကြမ်းဖျင်းအားဖြင့် ဘယ်လိုအလုပ်လုပ်တယ်ဆိုတာကို သင်သေချာနားလည်သွားပြီဆိုတော့၊ ၎င်းတို့ကို ထိထိရောက်ရောက် ဘယ်လိုအသုံးပြုရမလဲဆိုတာကို နက်နက်နဲနဲ လေ့လာဖို့ သင်အဆင်သင့်ဖြစ်ပါပြီ။ နောက်အခန်းတွေမှာ သင်လေ့လာရမယ့်အရာတွေကတော့-
+
+- Transformers library ကို အသုံးပြုပြီး မော်ဒယ်တွေကို load လုပ်ခြင်းနဲ့ fine-tune လုပ်ခြင်း
+- မော်ဒယ် input အတွက် မတူညီတဲ့ ဒေတာအမျိုးအစားတွေကို လုပ်ဆောင်ခြင်း
+- Pre-trained model တွေကို သင်ရဲ့ သီးခြားလုပ်ငန်းတာဝန်များအတွက် လိုက်လျောညီထွေဖြစ်အောင် ပြုလုပ်ခြင်း
+- လက်တွေ့အသုံးချမှုများအတွက် မော်ဒယ်များ တပ်ဆင်အသုံးပြုခြင်း (deploy)
+
+ဒီအခန်းမှာ သင်တည်ဆောက်ခဲ့တဲ့ အခြေခံအုတ်မြစ်က လာမယ့်အပိုင်းတွေမှာ ပိုမိုအဆင့်မြင့်တဲ့ အကြောင်းအရာတွေနဲ့ နည်းစနစ်တွေကို သင်လေ့လာတဲ့အခါမှာ အလွန်အသုံးဝင်ပါလိမ့်မယ်။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Transformer Models**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Large Language Models (LLMs)**: လူသားဘာသာစကားကို နားလည်ပြီး ထုတ်လုပ်ပေးနိုင်တဲ့ အလွန်ကြီးမားတဲ့ Artificial Intelligence (AI) မော်ဒယ်တွေ ဖြစ်ပါတယ်။ ၎င်းတို့ဟာ ဒေတာအမြောက်အမြားနဲ့ သင်ကြားလေ့ကျင့်ထားပြီး စာရေးတာ၊ မေးခွန်းဖြေတာ စတဲ့ ဘာသာစကားဆိုင်ရာ လုပ်ငန်းမျိုးစုံကို လုပ်ဆောင်နိုင်ပါတယ်။
+*   **Natural Language Processing (NLP)**: ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး၊ ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။ ဥပမာအားဖြင့် စာသားခွဲခြမ်းစိတ်ဖြာခြင်း၊ ဘာသာပြန်ခြင်း စသည်တို့ ပါဝင်ပါတယ်။
+*   **Artificial Intelligence (AI)**: လူသားတွေရဲ့ ဉာဏ်ရည်ဉာဏ်သွေးလိုမျိုး တွေးခေါ်နိုင်စွမ်း၊ သင်ယူနိုင်စွမ်းနဲ့ ပြဿနာဖြေရှင်းနိုင်စွမ်းရှိတဲ့ စက်တွေကို ဖန်တီးတဲ့ သိပ္ပံနယ်ပယ်တစ်ခုပါ။
+*   **Classification**: ဒေတာအချက်အလက်များကို သတ်မှတ်ထားသော အမျိုးအစားများ သို့မဟုတ် အတန်းများထဲသို့ ခွဲခြားသတ်မှတ်ခြင်း။
+*   **Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ အချက်အလက်အသစ်များ (ဥပမာ - စာသား၊ ပုံများ) ဖန်တီးခြင်း။
+*   **Hallucinations**: Artificial Intelligence (AI) မော်ဒယ်များမှ မှန်ကန်မှုမရှိသော သို့မဟုတ် အဓိပ္ပာယ်မရှိသော အချက်အလက်များကို ယုံကြည်မှုရှိရှိ ထုတ်လုပ်ပေးခြင်း။
+*   **Bias**: ဒေတာအစုအဝေး (dataset) သို့မဟုတ် မော်ဒယ်၏ လေ့ကျင့်မှုပုံစံကြောင့် ဖြစ်ပေါ်လာသော ဘက်လိုက်မှုများ။
+*   **`pipeline()` function**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ လုပ်ဆောင်ချက်တစ်ခုဖြစ်ပြီး မော်ဒယ်တွေကို သီးခြားလုပ်ငန်းတာဝန်များ (ဥပမာ- စာသားခွဲခြားသတ်မှတ်ခြင်း၊ စာသားထုတ်လုပ်ခြင်း) အတွက် အသုံးပြုရလွယ်ကူအောင် ပြုလုပ်ပေးပါတယ်။
+*   **🤗 Transformers**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး Transformer မော်ဒယ်တွေကို အသုံးပြုပြီး Natural Language Processing (NLP), computer vision, audio processing စတဲ့ နယ်ပယ်တွေမှာ အဆင့်မြင့် AI မော်ဒယ်တွေကို တည်ဆောက်ပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **Pre-trained Models**: ဒေတာအမြောက်အမြားပေါ်တွင် ကြိုတင်လေ့ကျင့်ထားပြီးသား Artificial Intelligence (AI) မော်ဒယ်တစ်ခု။ ၎င်းတို့ကို အခြားလုပ်ငန်းများအတွက် အခြေခံအဖြစ် ပြန်လည်အသုံးပြုနိုင်သည်။
+*   **Text Classification**: စာသားတစ်ခုကို သတ်မှတ်ထားသော အမျိုးအစားများ သို့မဟုတ် အတန်းများထဲသို့ ခွဲခြားသတ်မှတ်ခြင်း။
+*   **Token Classification**: စာသားတစ်ခုရှိ token (စကားလုံး သို့မဟုတ် စာလုံးတစ်ပိုင်း) တစ်ခုစီကို အမျိုးအစားခွဲခြားသတ်မှတ်ခြင်း။
+*   **Question Answering**: မေးခွန်းတစ်ခုကို ပေးထားသော စာသားအကြောင်းအရာမှ အဖြေထုတ်ပေးခြင်း။
+*   **Text Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ လူသားကဲ့သို့သော စာသားအသစ်များ ဖန်တီးခြင်း။
+*   **Summarization**: စာသားရှည်ကြီးတစ်ခုကို အဓိကအချက်အလက်များ မပျောက်ပျက်စေဘဲ အကျဉ်းချုံးဖော်ပြခြင်း။
+*   **Translation**: ဘာသာစကားတစ်ခုမှ အခြားဘာသာစကားတစ်ခုသို့ စာသားဘာသာပြန်ခြင်း။
+*   **Sequence-to-sequence Tasks**: input sequence တစ်ခုမှ output sequence တစ်ခုကို ထုတ်လုပ်ပေးသော လုပ်ငန်းတာဝန်များ။
+*   **Speech Recognition**: ပြောဆိုသော ဘာသာစကားကို ကွန်ပျူတာက စာသားအဖြစ် ပြောင်းလဲနားလည်နိုင်သည့် နည်းပညာ။
+*   **Image Classification**: ရုပ်ပုံတစ်ခုကို သတ်မှတ်ထားသော အမျိုးအစားများထဲသို့ ခွဲခြားသတ်မှတ်ခြင်း။
+*   **Attention Mechanism**: Transformer မော်ဒယ်များတွင် အသုံးပြုသော နည်းစနစ်တစ်ခုဖြစ်ပြီး input sequence အတွင်းရှိ အရေးပါသော အစိတ်အပိုင်းများကို အာရုံစိုက်ပြီး ဆက်နွယ်မှုများကို သင်ယူစေသည်။
+*   **Transfer Learning**: ကြိုတင်လေ့ကျင့်ထားပြီးသား မော်ဒယ်တစ်ခုမှ သင်ယူထားသော အသိပညာများကို အခြားဆက်စပ်လုပ်ငန်းတစ်ခုအတွက် အသုံးပြုခြင်း။
+*   **Encoder-only**: Transformer architecture အမျိုးအစားတစ်ခုဖြစ်ပြီး input ကို နားလည်ပြီး ကိုယ်စားပြုတဲ့ အချက်အလက်ကို ထုတ်ပေးတဲ့ encoder အစိတ်အပိုင်းတစ်ခုတည်း ပါဝင်ပါတယ်။
+*   **Decoder-only**: Transformer architecture အမျိုးအစားတစ်ခုဖြစ်ပြီး စာသားထုတ်လုပ်ခြင်းအတွက် အသုံးပြုတဲ့ decoder အစိတ်အပိုင်းတစ်ခုတည်း ပါဝင်ပါတယ်။
+*   **Encoder-decoder**: Transformer architecture အမျိုးအစားတစ်ခုဖြစ်ပြီး input sequence မှ output sequence သို့ ပြောင်းလဲခြင်း လုပ်ငန်းများအတွက် encoder နှင့် decoder နှစ်ခုစလုံး ပါဝင်ပါတယ်။
+*   **BERT**: Google မှ တီထွင်ထားသော Encoder-only Transformer မော်ဒယ်ဥပမာ။
+*   **DistilBERT**: BERT ၏ ပိုမိုသေးငယ်ပြီး မြန်ဆန်သော ဗားရှင်း။
+*   **ModernBERT**: BERT မော်ဒယ်နှင့် ဆင်တူသော နောက်ဆုံးပေါ် ဗားရှင်းတစ်ခု (ဤနေရာတွင် ဥပမာအဖြစ် ရည်ညွှန်းခြင်း)။
+*   **GPT (Generative Pre-trained Transformer)**: OpenAI မှ တီထွင်ထားသော Decoder-only Transformer မော်ဒယ်ဥပမာ။
+*   **LLaMA**: Meta မှ တီထွင်ထားသော Decoder-only Transformer မော်ဒယ်ဥပမာ။
+*   **Gemma**: Google မှ တီထွင်ထားသော Decoder-only Transformer မော်ဒယ်ဥပမာ။
+*   **SmolLM**: Hugging Face မှ တီထွင်ထားသော Decoder-only Transformer မော်ဒယ်ဥပမာ။
+*   **BART**: Facebook (ယခု Meta) မှ တီထွင်ထားသော Encoder-Decoder Transformer မော်ဒယ်ဥပမာ။
+*   **T5**: Google မှ တီထွင်ထားသော Encoder-Decoder Transformer မော်ဒယ်ဥပမာ။
+*   **Marian**: Encoder-Decoder Transformer မော်ဒယ်ဥပမာ (အဓိကအားဖြင့် ဘာသာပြန်ခြင်းအတွက်)။
+*   **mBART**: Facebook (ယခု Meta) မှ တီထွင်ထားသော Encoder-Decoder Transformer မော်ဒယ်ဥပမာ (ဘာသာစကားမျိုးစုံအတွက်)။
+*   **Scaling Laws**: မော်ဒယ်အရွယ်အစား၊ ဒေတာပမာဏနှင့် ကွန်ပျူတာအရင်းအမြစ်များ တိုးလာသည်နှင့်အမျှ AI မော်ဒယ်များ၏ စွမ်းဆောင်ရည်ကို ခန့်မှန်းဖော်ပြသော ဆက်နွယ်မှုများ။
+*   **Instruction Tuning**: မော်ဒယ်ကို သီးခြားညွှန်ကြားချက်များ (instructions) ကို နားလည်ပြီး လိုက်နာရန် ထပ်မံလေ့ကျင့်ပေးသော လုပ်ငန်းစဉ်။
+*   **Hugging Face Hub**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **Inference API**: Hugging Face Hub ပေါ်ရှိ မော်ဒယ်များကို web request များမှတစ်ဆင့် တိုက်ရိုက်အသုံးပြုနိုင်စေသည့် Application Programming Interface (API)။
+*   **Fine-tune**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် အနည်းငယ်သော ဒေတာနဲ့ ထပ်မံလေ့ကျင့်ပေးခြင်းကို ဆိုလိုပါတယ်။
+*   **Deploy**: Machine Learning မော်ဒယ်တစ်ခုကို အမှန်တကယ် အသုံးပြုနိုင်သော စနစ် သို့မဟုတ် environment တစ်ခုထဲသို့ ထည့်သွင်းခြင်း။
\ No newline at end of file
diff --git a/chapters/my/chapter1/11.mdx b/chapters/my/chapter1/11.mdx
new file mode 100644
index 000000000..14cb25459
--- /dev/null
+++ b/chapters/my/chapter1/11.mdx
@@ -0,0 +1,31 @@
+# စာမေးပွဲ ဖြေချိန်ရောက်ပြီ။[[exam-time]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+သင်ရဲ့ ဗဟုသုတတွေကို စမ်းသပ်ဖို့ အချိန်တန်ပါပြီ။ ဒီအခန်းမှာ သင်လေ့လာခဲ့တဲ့ သဘောတရားတွေကို နားလည်မှုရှိမရှိ စမ်းသပ်ဖို့အတွက် ဉာဏ်စမ်းမေးခွန်းတိုလေးတစ်ခုကို ကျွန်တော်တို့ ပြင်ဆင်ထားပါတယ်။
+
+ဉာဏ်စမ်းဖြေဆိုရန်အတွက် အောက်ပါအဆင့်များကို လိုက်နာရပါမယ်-
+
+1.  သင်၏ Hugging Face account သို့ ဝင်ရောက်ပါ။
+2.  ဉာဏ်စမ်းမေးခွန်းများကို ဖြေဆိုပါ။
+3.  သင်၏ အဖြေများကို တင်သွင်းပါ။
+
+## အဖြေမှန်ရွေး ဉာဏ်စမ်း
+
+ဒီဉာဏ်စမ်းမှာ ရွေးချယ်စရာများစာရင်းကနေ အဖြေမှန်ကို ရွေးချယ်ဖို့ သင့်ကို တောင်းဆိုပါလိမ့်မယ်။ supervised finetuning ရဲ့ အခြေခံသဘောတရားတွေကို ကျွန်တော်တို့ စမ်းသပ်သွားပါမယ်။
+
+<iframe
+	src="https://huggingface-course-chapter-1-exam.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Quiz**: သင်၏ ဗဟုသုတ သို့မဟုတ် စွမ်းရည်များကို စမ်းသပ်ရန်အတွက် မေးခွန်းတိုများ။
+*   **Hugging Face Account**: Hugging Face ပလက်ဖောင်းပေါ်ရှိ သုံးစွဲသူအကောင့်။ ၎င်းသည် မော်ဒယ်များ၊ datasets များနှင့် အခြားအရင်းအမြစ်များကို ဝင်ရောက်ကြည့်ရှုရန် ခွင့်ပြုသည်။
+*   **Supervised Finetuning**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို တံဆိပ်တပ်ထားသော (labeled) ဒေတာများဖြင့် သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် ထပ်မံလေ့ကျင့်ပေးသော လုပ်ငန်းစဉ်။ "Supervised" ဆိုသည်မှာ မှန်ကန်သောအဖြေ (label) ကို မော်ဒယ်က သင်ယူရန်အတွက် ပေးထားသည်ကို ရည်ညွှန်းသည်။
\ No newline at end of file
diff --git a/chapters/my/chapter1/2.mdx b/chapters/my/chapter1/2.mdx
new file mode 100644
index 000000000..08a9fd9de
--- /dev/null
+++ b/chapters/my/chapter1/2.mdx
@@ -0,0 +1,83 @@
+# Natural Language Processing နှင့် Large Language Models များ[[natural-language-processing-and-large-language-models]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+Transformer မော်ဒယ်တွေထဲကို မဝင်ခင်မှာ Natural Language Processing (NLP) ဆိုတာ ဘာလဲ၊ Large Language Models (LLMs) တွေက ဒီနယ်ပယ်ကို ဘယ်လို ပြောင်းလဲပစ်ခဲ့လဲ၊ ဘာကြောင့် ဒါတွေကို ကျွန်တော်တို့ ဂရုစိုက်သင့်လဲဆိုတာကို အကျဉ်းချုပ်လေး ကြည့်ရအောင်။
+
+## NLP ဆိုတာ ဘာလဲ။[[what-is-nlp]]
+
+<Youtube id="iNzlxWUAjd4" />
+
+Natural Language Processing (NLP) ဆိုတာက လူသားဘာသာစကားနဲ့ ပတ်သက်တဲ့အရာအားလုံးကို နားလည်ဖို့ အဓိကထားတဲ့ ဘာသာဗေဒနဲ့ Machine Learning တို့ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။ NLP လုပ်ငန်းတာဝန်တွေရဲ့ ရည်ရွယ်ချက်ကတော့ စကားလုံးတစ်လုံးချင်းစီကို နားလည်ဖို့တင်မဟုတ်ဘဲ အဲဒီစကားလုံးတွေရဲ့ အကြောင်းအရာ (context) ကိုပါ နားလည်နိုင်ဖို့ ဖြစ်ပါတယ်။
+
+အောက်ပါတို့ကတော့ အသုံးများတဲ့ NLP လုပ်ငန်းတာဝန်တွေနဲ့ ဥပမာအချို့ ဖြစ်ပါတယ်။
+
+- **စာကြောင်းအပြည့်အစုံကို အမျိုးအစားခွဲခြားခြင်း (Classifying whole sentences)**- ဝေဖန်သုံးသပ်ချက်တစ်ခုရဲ့ စိတ်ခံစားမှု (sentiment) ကို ရှာဖွေခြင်း၊ email တစ်ခုက spam ဟုတ်မဟုတ် သိရှိခြင်း၊ စာကြောင်းတစ်ကြောင်းရဲ့ သဒ္ဒါမှန်ကန်မှုကို ဆုံးဖြတ်ခြင်း၊ သို့မဟုတ် စာကြောင်းနှစ်ကြောင်းက ယုတ္တိရှိရှိ ဆက်စပ်မှုရှိမရှိ ဆုံးဖြတ်ခြင်း။
+- **စာကြောင်းတစ်ကြောင်းရှိ စကားလုံးတစ်လုံးစီကို အမျိုးအစားခွဲခြားခြင်း (Classifying each word in a sentence)**- စာကြောင်းတစ်ခုရဲ့ သဒ္ဒါဆိုင်ရာ အစိတ်အပိုင်းများ (နာမ်၊ ကြိယာ၊ နာမဝိသေသန) ကို ဖော်ထုတ်ခြင်း၊ သို့မဟုတ် သီးခြားအမည်များ (လူ၊ နေရာ၊ အဖွဲ့အစည်း) ကို ဖော်ထုတ်ခြင်း။
+- **စာသားအကြောင်းအရာ ဖန်တီးခြင်း (Generating text content)**- အလိုအလျောက်ဖန်တီးထားသော စာသားဖြင့် prompt တစ်ခုကို ဖြည့်စွက်ခြင်း၊ စာသားတစ်ခုရှိ ကွက်လပ်များကို ဝှက်ထားသော စကားလုံးများဖြင့် ဖြည့်ဆည်းခြင်း။
+- **စာသားမှ အဖြေထုတ်ယူခြင်း (Extracting an answer from a text)**- မေးခွန်းတစ်ခုနဲ့ အကြောင်းအရာတစ်ခု ပေးထားပြီးနောက်၊ အဲဒီအကြောင်းအရာမှာ ပါဝင်တဲ့ အချက်အလက်တွေအပေါ် အခြေခံပြီး မေးခွန်းရဲ့အဖြေကို ထုတ်ယူခြင်း။
+- **input စာသားတစ်ခုမှ စာကြောင်းအသစ် ဖန်တီးခြင်း (Generating a new sentence from an input text)**- စာသားတစ်ခုကို အခြားဘာသာစကားတစ်ခုသို့ ဘာသာပြန်ခြင်း၊ စာသားတစ်ခုကို အကျဉ်းချုပ်ခြင်း။
+
+NLP ဟာ စာဖြင့်ရေးသားထားတဲ့ စာသားတွေအတွက်ပဲလို့ ကန့်သတ်ထားတာ မဟုတ်ပါဘူး။ ၎င်းဟာ အသံဖိုင်တစ်ခုရဲ့ မှတ်တမ်းကို ဖန်တီးခြင်း ဒါမှမဟုတ် ပုံတစ်ပုံကို ဖော်ပြခြင်းစတဲ့ speech recognition နဲ့ computer vision နယ်ပယ်တွေမှာ ရှုပ်ထွေးတဲ့ စိန်ခေါ်မှုတွေကိုပါ ဖြေရှင်းပေးပါတယ်။
+
+## Large Language Models (LLMs) တွေ ပေါ်ပေါက်လာခြင်း[[rise-of-llms]]
+
+မကြာသေးခင်နှစ်များအတွင်းမှာတော့ Natural Language Processing (NLP) နယ်ပယ်ဟာ Large Language Models (LLMs) တွေကြောင့် တော်လှန်ပြောင်းလဲမှုတွေ ကြုံခဲ့ရပါတယ်။ GPT (Generative Pre-trained Transformer) နဲ့ [Llama](https://huggingface.co/meta-llama) လို architecture တွေပါဝင်တဲ့ ဒီမော်ဒယ်တွေဟာ ဘာသာစကားလုပ်ဆောင်ခြင်း နယ်ပယ်မှာ ဖြစ်နိုင်ခြေရှိတဲ့ အရာတွေကို ပြောင်းလဲပစ်ခဲ့ပါတယ်။
+
+> [!TIP]
+> Large Language Model (LLM) ဆိုတာ များပြားလှတဲ့ စာသားဒေတာတွေနဲ့ လေ့ကျင့်ထားတဲ့ Artificial Intelligence (AI) မော်ဒယ်တစ်ခုဖြစ်ပြီး လူသားဆန်တဲ့ စာသားတွေကို နားလည်၊ ဖန်တီးနိုင်ပါတယ်။ ဘာသာစကားပုံစံတွေကို မှတ်မိနိုင်ပြီး သီးခြားလုပ်ငန်းအတွက် လေ့ကျင့်မှုမရှိဘဲ ဘာသာစကားလုပ်ငန်းမျိုးစုံကို လုပ်ဆောင်နိုင်ပါတယ်။ ၎င်းတို့ဟာ Natural Language Processing (NLP) နယ်ပယ်မှာ သိသိသာသာ တိုးတက်မှုတစ်ခုကို ကိုယ်စားပြုပါတယ်။
+
+LLMs တွေရဲ့ ထူးခြားချက်တွေကတော့:
+- **အတိုင်းအတာ (Scale)**: ၎င်းတို့မှာ parameters သန်းပေါင်းများစွာ၊ ဘီလီယံပေါင်းများစွာ သို့မဟုတ် ဘီလီယံ ရာပေါင်းများစွာအထိ ပါဝင်ပါတယ်။
+- **ယေဘုယျစွမ်းရည်များ (General capabilities)**: သီးခြားလုပ်ငန်းအတွက် လေ့ကျင့်မှုမရှိဘဲ လုပ်ငန်းတာဝန်ပေါင်းများစွာကို လုပ်ဆောင်နိုင်ပါတယ်။
+- **အကြောင်းအရာပေါ်မူတည်၍ သင်ယူခြင်း (In-context learning)**: prompt မှာ ပေးထားတဲ့ ဥပမာတွေကနေ သင်ယူနိုင်ပါတယ်။
+- **ပေါ်ထွက်လာသော စွမ်းရည်များ (Emergent abilities)**: ဒီမော်ဒယ်တွေရဲ့ အရွယ်အစား ကြီးမားလာတာနဲ့အမျှ ၎င်းတို့ဟာ ရှင်းရှင်းလင်းလင်း programming လုပ်ထားခြင်း မရှိတဲ့ ဒါမှမဟုတ် ကြိုတင်မမျှော်လင့်ထားတဲ့ စွမ်းရည်တွေကို ပြသလာပါတယ်။
+
+LLMs တွေ ပေါ်ပေါက်လာတာနဲ့အမျှ NLP လုပ်ငန်းတာဝန်အမျိုးမျိုးအတွက် သီးသန့်မော်ဒယ်တွေ တည်ဆောက်တဲ့ ပုံစံကနေ ဘာသာစကားလုပ်ငန်းအမျိုးမျိုးကို ဖြေရှင်းနိုင်ဖို့ prompt ပေးနိုင်တဲ့ ဒါမှမဟုတ် fine-tune လုပ်နိုင်တဲ့ မော်ဒယ်ကြီးတစ်လုံးကို အသုံးပြုတဲ့ ပုံစံကို ပြောင်းလဲသွားစေခဲ့ပါတယ်။ ဒါက အဆင့်မြင့် ဘာသာစကားလုပ်ဆောင်မှုကို ပိုမိုလက်လှမ်းမီစေခဲ့ပေမယ့် ထိရောက်မှု၊ ကျင့်ဝတ်နဲ့ အသုံးပြုမှု (deployment) စတဲ့ နယ်ပယ်တွေမှာ စိန်ခေါ်မှုအသစ်တွေကိုလည်း မိတ်ဆက်ပေးခဲ့ပါတယ်။
+
+သို့သော် LLMs တွေမှာ အရေးကြီးတဲ့ ကန့်သတ်ချက်တွေလည်း ရှိပါတယ်။
+- **ထင်ယောင်ထင်မှားဖြစ်ခြင်း (Hallucinations)**: ၎င်းတို့ဟာ မမှန်ကန်တဲ့ အချက်အလက်တွေကို ယုံကြည်မှုရှိရှိ ထုတ်လုပ်နိုင်ပါတယ်။
+- **စစ်မှန်သော နားလည်မှု ကင်းမဲ့ခြင်း (Lack of true understanding)**: ၎င်းတို့ဟာ ကမ္ဘာကြီးကို စစ်မှန်စွာ နားလည်ခြင်းမရှိဘဲ စာရင်းအင်းဆိုင်ရာ ပုံစံတွေပေါ်မှာသာ လည်ပတ်ပါတယ်။
+- **ဘက်လိုက်မှု (Bias)**: ၎င်းတို့ရဲ့ သင်ကြားမှုဒေတာ ဒါမှမဟုတ် inputs တွေမှာ ပါဝင်တဲ့ ဘက်လိုက်မှုတွေကို ပြန်လည်ထုတ်လုပ်နိုင်ပါတယ်။
+- **Context windows ကန့်သတ်ချက် (Limited context windows)**: ၎င်းတို့မှာ ကန့်သတ်ထားတဲ့ context windows များ ရှိပါတယ်။ (ဒါပေမယ့် တိုးတက်လာနေပါပြီ)
+- **ကွန်ပျူတာ အရင်းအမြစ်များ (Computational resources)**: ၎င်းတို့ဟာ ကွန်ပျူတာ အရင်းအမြစ်များစွာ လိုအပ်ပါတယ်။
+
+
+## ဘာကြောင့် ဘာသာစကား လုပ်ဆောင်ခြင်း (language processing) က ခက်ခဲရတာလဲ။[[why-is-it-challenging]]
+
+ကွန်ပျူတာတွေဟာ လူသားတွေလိုမျိုး သတင်းအချက်အလက်တွေကို လုပ်ဆောင်တာ မဟုတ်ပါဘူး။ ဥပမာအနေနဲ့ "I am hungry" ဆိုတဲ့ စာကြောင်းကို ဖတ်တဲ့အခါ ကျွန်တော်တို့က အဓိပ္ပာယ်ကို အလွယ်တကူ နားလည်နိုင်ပါတယ်။ အလားတူပဲ "I am hungry" နဲ့ "I am sad" ဆိုတဲ့ စာကြောင်းနှစ်ကြောင်း ပေးထားရင် ၎င်းတို့ ဘယ်လောက်တူညီလဲဆိုတာကို အလွယ်တကူ ဆုံးဖြတ်နိုင်ပါတယ်။ Machine Learning (ML) မော်ဒယ်တွေအတွက်တော့ ဒီလိုလုပ်ငန်းတာဝန်တွေက ပိုခက်ခဲပါတယ်။ စာသားကို မော်ဒယ်က သင်ယူနိုင်တဲ့ ပုံစံမျိုးဖြစ်အောင် လုပ်ဆောင်ပေးဖို့ လိုပါတယ်။ ဘာသာစကားက ရှုပ်ထွေးတာကြောင့် ဒီလုပ်ငန်းစဉ်ကို ဘယ်လိုလုပ်ရမလဲဆိုတာ သေချာစဉ်းစားဖို့ လိုပါတယ်။ စာသားတွေကို ဘယ်လိုကိုယ်စားပြုမလဲဆိုတာနဲ့ ပတ်သက်ပြီး သုတေသနတွေ အများကြီးလုပ်ထားပြီး နောက်အခန်းမှာ နည်းလမ်းအချို့ကို ကြည့်ရှုသွားမှာပါ။
+
+LLMs တွေမှာ တိုးတက်မှုတွေ ရှိလာပေမယ့်လည်း၊ အဓိပ္ပာယ်ဝေဝါးမှု၊ ယဉ်ကျေးမှုဆိုင်ရာ အကြောင်းအရာ (cultural context)၊ လှောင်ပြောင်မှု (sarcasm) နဲ့ ဟာသ (humor) တွေကို နားလည်ခြင်း စတဲ့ အခြေခံကျတဲ့ စိန်ခေါ်မှုများစွာ ကျန်ရှိနေပါသေးတယ်။ LLMs တွေက မတူညီတဲ့ ဒေတာအစုအဝေးများစွာပေါ်မှာ အကြီးအကျယ် လေ့ကျင့်ထားခြင်းဖြင့် ဒီစိန်ခေါ်မှုတွေကို ဖြေရှင်းပေမယ့်၊ ရှုပ်ထွေးတဲ့ အခြေအနေများစွာမှာ လူသားအဆင့် နားလည်မှုအထိတော့ မရောက်နိုင်သေးပါဘူး။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Natural Language Processing (NLP)**: ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး၊ ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။ ဥပမာအားဖြင့် စာသားခွဲခြမ်းစိတ်ဖြာခြင်း၊ ဘာသာပြန်ခြင်း စသည်တို့ ပါဝင်ပါတယ်။
+*   **Large Language Models (LLMs)**: လူသားဘာသာစကားကို နားလည်ပြီး ထုတ်လုပ်ပေးနိုင်တဲ့ အလွန်ကြီးမားတဲ့ Artificial Intelligence (AI) မော်ဒယ်တွေ ဖြစ်ပါတယ်။ ၎င်းတို့ဟာ ဒေတာအမြောက်အမြားနဲ့ သင်ကြားလေ့ကျင့်ထားပြီး စာရေးတာ၊ မေးခွန်းဖြေတာ စတဲ့ ဘာသာစကားဆိုင်ရာ လုပ်ငန်းမျိုးစုံကို လုပ်ဆောင်နိုင်ပါတယ်။
+*   **Transformer Models**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Machine Learning (ML)**: ကွန်ပျူတာတွေဟာ ဒေတာတွေကနေ သင်ယူပြီး လုပ်ငန်းဆောင်တာတွေကို လူသားတွေရဲ့ ညွှန်ကြားချက်မပါဘဲ ကိုယ်တိုင်လုပ်ဆောင်နိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။
+*   **Context**: စကားလုံး၊ စာကြောင်း သို့မဟုတ် အကြောင်းအရာတစ်ခုရဲ့ အဓိပ္ပာယ်ကို နားလည်စေရန် ကူညီပေးသော ပတ်ဝန်းကျင်ရှိ အချက်အလက်များ။
+*   **Sentiment**: လူတစ်ဦးရဲ့ ခံစားချက်၊ သဘောထား ဒါမှမဟုတ် အမြင်ကို ဖော်ပြတဲ့ အရာ။ (ဥပမာ- ကောင်းတယ်၊ ဆိုးတယ်၊ ကြားနေ)
+*   **Spam**: မလိုအပ်ဘဲ အစုလိုက်အပြုံလိုက် ပေးပို့သော email များ သို့မဟုတ် မက်ဆေ့ခ်ျများ။
+*   **Grammatically Correct**: သဒ္ဒါစည်းမျဉ်းစည်းကမ်းများနှင့် ကိုက်ညီခြင်း။
+*   **Named Entities**: စာသားတစ်ခုထဲတွင် ပါဝင်သော လူအမည်၊ နေရာအမည်၊ အဖွဲ့အစည်းအမည် သို့မဟုတ် အခြားသီးခြားအမည်များ။
+*   **Text Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ လူသားကဲ့သို့သော စာသားအသစ်များ ဖန်တီးခြင်း။
+*   **Masked Words**: စာသားတစ်ခုထဲတွင် ဝှက်ထားသော သို့မဟုတ် ဖုံးကွယ်ထားသော စကားလုံးများ။
+*   **Speech Recognition**: ပြောဆိုသော ဘာသာစကားကို ကွန်ပျူတာက စာသားအဖြစ် ပြောင်းလဲနားလည်နိုင်သည့် နည်းပညာ။
+*   **Computer Vision**: ကွန်ပျူတာများကို ပုံရိပ်များ၊ ဗီဒီယိုများကို လူသားများကဲ့သို့ မြင်၊ နားလည်နိုင်အောင် သင်ကြားပေးသည့် Artificial Intelligence (AI) နယ်ပယ်။
+*   **Transcript**: အသံ သို့မဟုတ် စကားပြောကို စာသားအဖြစ် ရေးသားထားခြင်း။
+*   **GPT (Generative Pre-trained Transformer)**: OpenAI မှ တီထွင်ထားသော Transformer-based Large Language Model (LLM) အမျိုးအစားတစ်ခု။
+*   **Llama**: Meta မှ တီထွင်ထားသော Transformer-based Large Language Model (LLM) အမျိုးအစားတစ်ခု။
+*   **Parameters**: Machine Learning မော်ဒယ်တစ်ခု၏ သင်ယူနိုင်သော အစိတ်အပိုင်းများ။ ၎င်းတို့သည် လေ့ကျင့်နေစဉ်အတွင်း ဒေတာများမှ ပုံစံများကို သင်ယူကာ ချိန်ညှိပေးသည်။
+*   **Prompt**: Large Language Models (LLMs) ကို တိကျသောလုပ်ငန်းတစ်ခု လုပ်ဆောင်ရန် သို့မဟုတ် အချက်အလက်ပေးရန်အတွက် ပေးပို့သော input text သို့မဟုတ် မေးခွန်း။
+*   **Fine-tune**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် အနည်းငယ်သော ဒေတာနဲ့ ထပ်မံလေ့ကျင့်ပေးခြင်းကို ဆိုလိုပါတယ်။
+*   **Deployment**: Machine Learning မော်ဒယ်တစ်ခုကို အမှန်တကယ် အသုံးပြုနိုင်သော စနစ် သို့မဟုတ် environment တစ်ခုထဲသို့ ထည့်သွင်းခြင်း။
+*   **Hallucinations**: Artificial Intelligence (AI) မော်ဒယ်များမှ မှန်ကန်မှုမရှိသော သို့မဟုတ် အဓိပ္ပာယ်မရှိသော အချက်အလက်များကို ယုံကြည်မှုရှိရှိ ထုတ်လုပ်ပေးခြင်း။
+*   **Bias**: ဒေတာအစုအဝေး (dataset) သို့မဟုတ် မော်ဒယ်၏ လေ့ကျင့်မှုပုံစံကြောင့် ဖြစ်ပေါ်လာသော ဘက်လိုက်မှုများ။
+*   **Context Windows**: Large Language Models (LLMs) တစ်ခုက တစ်ပြိုင်နက်တည်း ပြန်လည်ကြည့်ရှုနိုင်သော သို့မဟုတ် စီမံဆောင်ရွက်နိုင်သော input text ၏ ပမာဏ။
+*   **Computational Resources**: ကွန်ပျူတာစနစ်တစ်ခု၏ လုပ်ဆောင်နိုင်စွမ်းများ (ဥပမာ - CPU, GPU, memory, storage)။
+*   **Cultural Context**: လူ့အဖွဲ့အစည်းတစ်ခု၏ ဓလေ့ထုံးတမ်းများ၊ ယုံကြည်မှုများ၊ တန်ဖိုးများနှင့် အပြုအမူများ။
+*   **Sarcasm**: ပြက်ရယ်ပြုခြင်း သို့မဟုတ် ဆန့်ကျင်ဘက် အဓိပ္ပာယ်ကို ဖော်ပြရန် စကားလုံးများကို အသုံးပြုခြင်း။
+*   **Humor**: ရယ်စရာကောင်းသော သို့မဟုတ် ပျော်ရွှင်စေသော အရာ။
\ No newline at end of file
diff --git a/chapters/my/chapter1/3.mdx b/chapters/my/chapter1/3.mdx
new file mode 100644
index 000000000..401da476b
--- /dev/null
+++ b/chapters/my/chapter1/3.mdx
@@ -0,0 +1,434 @@
+# Transformers တွေက ဘာတွေလုပ်နိုင်လဲ။[[transformers-what-can-they-do]]
+
+<CourseFloatingBanner chapter={1}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter1/section3.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter1/section3.ipynb"},
+]} />
+
+ဒီအပိုင်းမှာတော့ Transformer မော်ဒယ်တွေ ဘာတွေလုပ်နိုင်လဲဆိုတာကို ကြည့်ရှုသွားမှာဖြစ်ပြီး Hugging Face ရဲ့ 🤗 Transformers library ထဲက ပထမဆုံး ကိရိယာဖြစ်တဲ့ `pipeline()` function ကို အသုံးပြုသွားမှာပါ။
+
+> [!TIP]
+> 👀 ညာဘက်အပေါ်ထောင့်မှာရှိတဲ့ <em>Open in Colab</em> ခလုတ်ကို မြင်ရလား။ ဒီကဏ္ဍက code ဥပမာတွေ အားလုံးပါဝင်တဲ့ Google Colab notebook ကို ဖွင့်ဖို့ အဲဒီခလုတ်ကို နှိပ်လိုက်ပါ။ ဒီခလုတ်က code ဥပမာတွေ ပါဝင်တဲ့ မည်သည့်ကဏ္ဍမှာမဆို ရှိနေမှာပါ။
+>
+> ဥပမာတွေကို ကိုယ်တိုင် run ချင်တယ်ဆိုရင်တော့ <a href="/course/chapter0">setup</a> ကို လေ့လာကြည့်ဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။
+
+## Transformers တွေက နေရာတိုင်းမှာ ရှိနေပါတယ်![[transformers-are-everywhere]]
+
+Transformer မော်ဒယ်တွေကို Natural Language Processing (NLP), computer vision, audio processing အပါအဝင် မတူညီတဲ့ နယ်ပယ်အသီးသီးက လုပ်ငန်းတာဝန်မျိုးစုံကို ဖြေရှင်းဖို့ အသုံးပြုပါတယ်။ အောက်ဖော်ပြပါ ကုမ္ပဏီတွေနဲ့ အဖွဲ့အစည်းအချို့ကတော့ Hugging Face နဲ့ Transformer မော်ဒယ်တွေကို အသုံးပြုနေကြပြီး ၎င်းတို့ရဲ့ မော်ဒယ်တွေကို မျှဝေခြင်းဖြင့် community ကိုလည်း ပြန်လည်ပံ့ပိုးပေးနေကြပါတယ်။
+
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/companies.PNG" alt="Companies using Hugging Face" width="100%">
+
+[🤗 Transformers library](https://huggingface.co/docs/transformers/index) ကတော့ အဲဒီမျှဝေထားတဲ့ မော်ဒယ်တွေကို ဖန်တီးပြီး အသုံးပြုနိုင်တဲ့ functionality တွေကို ပံ့ပိုးပေးပါတယ်။ [Model Hub](https://huggingface.co/models) မှာတော့ လူတိုင်း download လုပ်ပြီး အသုံးပြုနိုင်တဲ့ ကြိုတင်လေ့ကျင့်ထားသော (pretrained) မော်ဒယ်ပေါင်း သန်းချီ ပါဝင်ပါတယ်။ သင်ရဲ့ ကိုယ်ပိုင်မော်ဒယ်တွေကိုလည်း Hub ကို upload တင်နိုင်ပါတယ်။
+
+> [!TIP]
+> ⚠️ Hugging Face Hub ဟာ Transformer မော်ဒယ်တွေအတွက်ပဲ ကန့်သတ်ထားတာ မဟုတ်ပါဘူး။ လူတိုင်းက သူတို့လိုချင်တဲ့ မည်သည့်မော်ဒယ် သို့မဟုတ် dataset ကိုမဆို မျှဝေနိုင်ပါတယ်။ ရရှိနိုင်တဲ့ အင်္ဂါရပ်အားလုံးကို ရယူဖို့ <a href="https://huggingface.co/join">huggingface.co account တစ်ခု ဖန်တီးပါ</a>!
+
+Transformer မော်ဒယ်တွေ ဘယ်လိုအလုပ်လုပ်လဲဆိုတာကို အသေးစိတ် မလေ့လာခင်မှာ စိတ်ဝင်စားစရာကောင်းတဲ့ NLP ပြဿနာအချို့ကို ဖြေရှင်းဖို့ ဘယ်လိုအသုံးပြုနိုင်လဲဆိုတာကို ဥပမာအချို့နဲ့ ကြည့်ရအောင်။
+
+## Pipelines တွေနဲ့ အလုပ်လုပ်ခြင်း[[working-with-pipelines]]
+
+<Youtube id="tiZFewofSLM" />
+
+🤗 Transformers library ရဲ့ အခြေခံအကျဆုံး object ကတော့ `pipeline()` function ဖြစ်ပါတယ်။ ၎င်းက မော်ဒယ်တစ်ခုကို ၎င်းရဲ့ လိုအပ်တဲ့ preprocessing နဲ့ postprocessing အဆင့်တွေနဲ့ ချိတ်ဆက်ပေးပြီး ကျွန်တော်တို့ စာသားတစ်ခုခုကို တိုက်ရိုက်ထည့်သွင်းပြီး နားလည်လွယ်တဲ့ အဖြေတစ်ခု ရယူနိုင်စေပါတယ်။
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("sentiment-analysis")
+classifier("I've been waiting for a HuggingFace course my whole life.")
+```
+
+```python out
+[{'label': 'POSITIVE', 'score': 0.9598047137260437}]
+```
+
+ကျွန်တော်တို့ စာကြောင်းများစွာကိုတောင် ပေးပို့နိုင်ပါတယ်။
+
+```python
+classifier(
+    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
+)
+```
+
+```python out
+[{'label': 'POSITIVE', 'score': 0.9598047137260437},
+ {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
+```
+
+ပုံမှန်အားဖြင့် ဒီ pipeline ဟာ အင်္ဂလိပ်ဘာသာစကားမှာ sentiment analysis အတွက် fine-tune လုပ်ထားတဲ့ သီးခြား pretrained model တစ်ခုကို ရွေးချယ်ပေးပါတယ်။ `classifier` object ကို ဖန်တီးတဲ့အခါ မော်ဒယ်ကို download လုပ်ပြီး cache လုပ်ထားပါတယ်။ ဒီ command ကို ထပ် run တဲ့အခါ cache လုပ်ထားတဲ့ မော်ဒယ်ကို ပြန်သုံးမှာဖြစ်ပြီး မော်ဒယ်ကို ထပ် download လုပ်ဖို့ မလိုအပ်တော့ပါဘူး။
+
+pipeline ထဲကို စာသားအချို့ ပေးပို့တဲ့အခါ အဓိက အဆင့်သုံးဆင့် ပါဝင်ပါတယ်။
+
+1. စာသားကို မော်ဒယ် နားလည်နိုင်တဲ့ ပုံစံအဖြစ် preprocessing လုပ်ပါတယ်။
+2. preprocessing လုပ်ထားတဲ့ inputs တွေကို မော်ဒယ်ဆီ ပေးပို့ပါတယ်။
+3. မော်ဒယ်ရဲ့ ခန့်မှန်းချက်တွေကို post-processing လုပ်ပြီး နားလည်လွယ်အောင် ပြန်ထုတ်ပေးပါတယ်။
+
+## မတူညီသော နယ်ပယ်များအတွက် ရနိုင်သော pipelines များ
+
+`pipeline()` function ဟာ မတူညီတဲ့ modality များစွာကို ထောက်ပံ့ပေးပြီး စာသား၊ ပုံရိပ်၊ အသံနဲ့ multimodal လုပ်ငန်းတာဝန်တွေအထိ လုပ်ဆောင်နိုင်ပါတယ်။ ဒီသင်တန်းမှာတော့ စာသားလုပ်ငန်းတာဝန်တွေကို အဓိကထားမှာဖြစ်ပေမယ့် Transformer architecture ရဲ့ အလားအလာကို နားလည်ထားဖို့က အသုံးဝင်တာကြောင့် အကျဉ်းချုပ် ဖော်ပြပေးပါမယ်။
+
+အောက်ဖော်ပြပါတို့ကတော့ ရရှိနိုင်တဲ့ အရာတွေရဲ့ အကျဉ်းချုပ် ဖြစ်ပါတယ်။
+
+> [!TIP]
+> pipelines တွေရဲ့ ပြည့်စုံပြီး နောက်ဆုံးပေါ် စာရင်းအတွက် [🤗 Transformers documentation](https://huggingface.co/docs/hub/en/models-tasks) ကို ကြည့်ရှုပါ။
+
+### စာသား Pipelines (Text pipelines)
+
+- `text-generation`: prompt တစ်ခုမှ စာသားကို ဖန်တီးခြင်း။
+- `text-classification`: စာသားကို ကြိုတင်သတ်မှတ်ထားသော အမျိုးအစားများအဖြစ် ခွဲခြားခြင်း။
+- `summarization`: အဓိက အချက်အလက်များကို ထိန်းသိမ်းထားရင်း စာသားတစ်ခုကို ပိုမိုတိုတောင်းသော ပုံစံအဖြစ် ပြုလုပ်ခြင်း။
+- `translation`: စာသားတစ်ခုကို ဘာသာစကားတစ်ခုမှ အခြားဘာသာစကားတစ်ခုသို့ ဘာသာပြန်ခြင်း။
+- `zero-shot-classification`: သီးခြား label များပေါ်တွင် ကြိုတင်လေ့ကျင့်မှုမရှိဘဲ စာသားကို အမျိုးအစားခွဲခြားခြင်း။
+- `feature-extraction`: စာသားများ၏ vector representations များကို ထုတ်ယူခြင်း။
+
+### ပုံရိပ် Pipelines (Image pipelines)
+
+- `image-to-text`: ပုံရိပ်များ၏ စာသားဖော်ပြချက်များကို ဖန်တီးခြင်း။
+- `image-classification`: ပုံရိပ်တစ်ခုရှိ အရာဝတ္ထုများကို ခွဲခြားသတ်မှတ်ခြင်း။
+- `object-detection`: ပုံရိပ်များရှိ အရာဝတ္ထုများကို နေရာရှာပြီး ခွဲခြားသတ်မှတ်ခြင်း။
+
+### အသံ Pipelines (Audio pipelines)
+
+- `automatic-speech-recognition`: စကားပြောကို စာသားအဖြစ် ပြောင်းလဲခြင်း။
+- `audio-classification`: အသံများကို အမျိုးအစားများအဖြစ် ခွဲခြားခြင်း။
+- `text-to-speech`: စာသားကို ပြောဆိုသောအသံအဖြစ် ပြောင်းလဲခြင်း။
+
+### Multimodal pipelines
+
+- `image-text-to-text`: စာသား prompt တစ်ခုအပေါ် အခြေခံပြီး ပုံရိပ်တစ်ခုကို တုံ့ပြန်ခြင်း။
+
+ဒီ pipelines အချို့ကို ပိုမိုအသေးစိတ် လေ့လာကြည့်ရအောင်။
+
+## Zero-shot classification[[zero-shot-classification]]
+
+ကျွန်တော်တို့ စိန်ခေါ်မှု ပိုများတဲ့ လုပ်ငန်းတာဝန်တစ်ခုနဲ့ စတင်ပါမယ်။ အဲဒါကတော့ label မတပ်ရသေးတဲ့ စာသားတွေကို အမျိုးအစားခွဲခြားဖို့ လိုအပ်တဲ့ အလုပ်ပါ။ ဒါဟာ လက်တွေ့ကမ္ဘာ project တွေမှာ အဖြစ်များတဲ့ အခြေအနေတစ်ခုပါ။ ဘာလို့လဲဆိုတော့ စာသားတွေကို label တပ်ဖို့က အချိန်ကုန်လေ့ရှိပြီး နယ်ပယ်ဆိုင်ရာ ကျွမ်းကျင်မှု (domain expertise) လိုအပ်လို့ပါ။ ဒီလိုကိစ္စမျိုးအတွက် `zero-shot-classification` pipeline က အလွန်အစွမ်းထက်ပါတယ်။ ၎င်းက classification အတွက် အသုံးပြုရမယ့် labels တွေကို သတ်မှတ်နိုင်စေတာကြောင့် pretrained model ရဲ့ labels တွေကို အားကိုးစရာ မလိုတော့ပါဘူး။ မော်ဒယ်က စာကြောင်းတစ်ကြောင်းကို positive သို့မဟုတ် negative လို့ ဘယ်လိုခွဲခြားနိုင်လဲဆိုတာ သင်မြင်ပြီးပါပြီ — ဒါပေမယ့် သင်လိုချင်တဲ့ အခြား labels အစုအဝေးနဲ့လည်း စာသားကို ခွဲခြားနိုင်ပါသေးတယ်။
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("zero-shot-classification")
+classifier(
+    "This is a course about the Transformers library",
+    candidate_labels=["education", "politics", "business"],
+)
+```
+
+```python out
+{'sequence': 'This is a course about the Transformers library',
+ 'labels': ['education', 'business', 'politics'],
+ 'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}
+```
+
+ဒီ pipeline ကို _zero-shot_ လို့ခေါ်တာကတော့ သင်ရဲ့ဒေတာပေါ်မှာ မော်ဒယ်ကို fine-tune လုပ်ဖို့ မလိုအပ်ဘဲ အသုံးပြုနိုင်လို့ပါ။ ဒါဟာ သင်လိုချင်တဲ့ labels စာရင်းအတွက် ဖြစ်နိုင်ခြေရမှတ် (probability scores) တွေကို တိုက်ရိုက်ပြန်ပေးနိုင်ပါတယ်။
+
+> [!TIP]
+> ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** သင်ရဲ့ ကိုယ်ပိုင် sequences နဲ့ labels တွေနဲ့ စမ်းသပ်ပြီး မော်ဒယ် ဘယ်လို အလုပ်လုပ်လဲ ကြည့်ပါ။
+
+## စာသား ဖန်တီးခြင်း (Text generation)[[text-generation]]
+
+အခုတော့ စာသားအချို့ ဖန်တီးဖို့ pipeline ကို ဘယ်လိုအသုံးပြုလဲဆိုတာ ကြည့်ရအောင်။ ဒီနေရာမှာ အဓိကစိတ်ကူးက သင် prompt တစ်ခု ပေးလိုက်ရင် မော်ဒယ်က ကျန်ရှိတဲ့ စာသားကို ဖန်တီးပေးခြင်းဖြင့် auto-complete လုပ်ပေးပါလိမ့်မယ်။ ဒါက ဖုန်းတွေမှာ တွေ့ရတဲ့ predictive text feature နဲ့ ဆင်တူပါတယ်။ စာသားဖန်တီးခြင်းမှာ randomness ပါဝင်တာကြောင့် အောက်မှာ ပြထားတဲ့ ရလဒ်တွေအတိုင်း အတိအကျမရရင်လည်း ပုံမှန်ပါပဲ။
+
+```python
+from transformers import pipeline
+
+generator = pipeline("text-generation")
+generator("In this course, we will teach you how to")
+```
+
+```python out
+[{'generated_text': 'In this course, we will teach you how to understand and use '
+                    'data flow and data interchange when handling user data. We '
+                    'will be working with one or more of the most commonly used '
+                    'data flows — data flows of various types, as seen by the '
+                    'HTTP'}]
+```
+
+`num_return_sequences` argument နဲ့ ဖန်တီးမယ့် sequence အရေအတွက်ကို ထိန်းချုပ်နိုင်ပြီး `max_length` argument နဲ့ ထွက်ပေါ်လာမယ့် စာသားရဲ့ စုစုပေါင်းအရှည်ကို ထိန်းချုပ်နိုင်ပါတယ်။
+
+> [!TIP]
+> ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** `num_return_sequences` နဲ့ `max_length` arguments တွေကို အသုံးပြုပြီး စကားလုံး ၁၅ လုံးစီ ပါဝင်တဲ့ စာကြောင်းနှစ်ကြောင်း ဖန်တီးပါ။
+
+## Hub မှ မည်သည့်မော်ဒယ်ကိုမဆို pipeline ထဲတွင် အသုံးပြုခြင်း[[using-any-model-from-the-hub-in-a-pipeline]]
+
+အထက်ပါ ဥပမာတွေမှာတော့ လုပ်ငန်းတာဝန်အတွက် ပုံမှန်မော်ဒယ်ကို အသုံးပြုခဲ့တာဖြစ်ပေမယ့်၊ သီးခြားလုပ်ငန်းတစ်ခု — ဥပမာ စာသားဖန်တီးခြင်းအတွက် Hub ကနေ သီးခြားမော်ဒယ်တစ်ခုကိုလည်း pipeline ထဲမှာ ရွေးချယ်အသုံးပြုနိုင်ပါတယ်။ [Model Hub](https://huggingface.co/models) ကိုသွားပြီး ဘယ်ဘက်ခြမ်းမှာရှိတဲ့ သက်ဆိုင်ရာ tag ကို နှိပ်လိုက်ရင် အဲဒီလုပ်ငန်းအတွက် ထောက်ပံ့ပေးတဲ့ မော်ဒယ်တွေကိုသာ ပြသပေးပါလိမ့်မယ်။ [ဒီစာမျက်နှာ](https://huggingface.co/models?pipeline_tag=text-generation) လို စာမျက်နှာကို သင်ရောက်သွားပါလိမ့်မယ်။
+
+[`HuggingFaceTB/SmolLM2-360M`](https://huggingface.co/HuggingFaceTB/SmolLM2-360M) မော်ဒယ်ကို စမ်းကြည့်ရအောင်။ အရင်ကလိုပဲ pipeline ထဲမှာ အောက်ပါအတိုင်း load လုပ်နိုင်ပါတယ်။
+
+```python
+from transformers import pipeline
+
+generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
+generator(
+    "In this course, we will teach you how to",
+    max_length=30,
+    num_return_sequences=2,
+)
+```
+
+```python out
+[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
+                    'move your mental and physical capabilities to your advantage.'},
+ {'generated_text': 'In this course, we will teach you how to become an expert and '
+                    'practice realtime, and with a hands on experience on both real '
+                    'time and real'}]
+```
+
+ဘာသာစကား tags တွေကို နှိပ်ပြီး သင်ရဲ့မော်ဒယ်ရှာဖွေမှုကို ပိုမိုတိကျအောင် လုပ်ဆောင်နိုင်ပြီး အခြားဘာသာစကားနဲ့ စာသားဖန်တီးပေးမယ့် မော်ဒယ်တစ်ခုကို ရွေးချယ်နိုင်ပါတယ်။ Model Hub မှာ ဘာသာစကားမျိုးစုံကို ထောက်ပံ့ပေးတဲ့ multilingual model တွေအတွက် checkpoints တွေလည်း ပါဝင်ပါတယ်။
+
+မော်ဒယ်တစ်ခုကို နှိပ်ပြီး ရွေးချယ်လိုက်တာနဲ့ ၎င်းကို တိုက်ရိုက် online မှာ စမ်းသပ်နိုင်တဲ့ widget တစ်ခုကို တွေ့ရပါလိမ့်မယ်။ ဒီနည်းနဲ့ မော်ဒယ်ကို download မလုပ်ခင်မှာ ၎င်းရဲ့စွမ်းဆောင်ရည်တွေကို အမြန်စမ်းသပ်နိုင်ပါတယ်။
+
+> [!TIP]
+> ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** အခြားဘာသာစကားတစ်ခုအတွက် text generation မော်ဒယ်တစ်ခုကို ရှာဖွေဖို့ filters တွေကို အသုံးပြုပါ။ widget နဲ့ စမ်းသပ်ပြီး pipeline ထဲမှာ အသုံးပြုနိုင်ပါတယ်။
+
+### Inference Providers များ[[inference-providers]]
+
+မော်ဒယ်အားလုံးကို Hugging Face [website](https://huggingface.co/docs/inference-providers/en/index) မှာ ရရှိနိုင်တဲ့ Inference Providers ကို အသုံးပြုပြီး သင့် browser ကနေ တိုက်ရိုက် စမ်းသပ်နိုင်ပါတယ်။ ဒီစာမျက်နှာမှာ စိတ်ကြိုက်စာသားထည့်သွင်းပြီး မော်ဒယ်က input data ကို ဘယ်လိုလုပ်ဆောင်လဲဆိုတာကို ကြည့်ရှုခြင်းဖြင့် မော်ဒယ်နဲ့ တိုက်ရိုက်ကစားနိုင်ပါတယ်။
+
+Widget ကို စွမ်းဆောင်ပေးတဲ့ Inference Providers ကိုလည်း ငွေပေးချေရတဲ့ ထုတ်ကုန်အဖြစ် ရရှိနိုင်ပါတယ်။ ဒါက သင်ရဲ့ workflow တွေအတွက် လိုအပ်ရင် အလွန်အသုံးဝင်ပါတယ်။ အသေးစိတ်အတွက် [pricing page](https://huggingface.co/docs/inference-providers/en/pricing) ကို ကြည့်ပါ။
+
+## Mask filling[[mask-filling]]
+
+သင်စမ်းသပ်ရမယ့် နောက်ထပ် pipeline ကတော့ `fill-mask` ဖြစ်ပါတယ်။ ဒီလုပ်ငန်းတာဝန်ရဲ့ စိတ်ကူးက ပေးထားတဲ့ စာသားတစ်ခုမှာရှိတဲ့ ကွက်လပ်တွေကို ဖြည့်ဆည်းပေးဖို့ပါပဲ။
+
+```python
+from transformers import pipeline
+
+unmasker = pipeline("fill-mask")
+unmasker("This course will teach you all about <mask> models.", top_k=2)
+```
+
+```python out
+[{'sequence': 'This course will teach you all about mathematical models.',
+  'score': 0.19619831442832947,
+  'token': 30412,
+  'token_str': ' mathematical'},
+ {'sequence': 'This course will teach you all about computational models.',
+  'score': 0.04052725434303284,
+  'token': 38163,
+  'token_str': ' computational'}]
+```
+
+`top_k` argument က သင်ပြသလိုတဲ့ ဖြစ်နိုင်ခြေ အရေအတွက်ကို ထိန်းချုပ်ပါတယ်။ ဒီနေရာမှာ မော်ဒယ်က `<mask>` ဆိုတဲ့ အထူးစကားလုံးကို ဖြည့်ပေးတာကို သတိပြုပါ။ ဒါကို *mask token* လို့ မကြာခဏ ရည်ညွှန်းလေ့ရှိပါတယ်။ အခြား mask-filling မော်ဒယ်တွေမှာ မတူညီတဲ့ mask token တွေ ရှိနိုင်တာကြောင့် အခြားမော်ဒယ်တွေကို လေ့လာတဲ့အခါ မှန်ကန်တဲ့ mask word ကို အမြဲစစ်ဆေးတာ ကောင်းပါတယ်။ စစ်ဆေးဖို့ နည်းလမ်းတစ်ခုကတော့ widget မှာ အသုံးပြုထားတဲ့ mask word ကို ကြည့်တာပါပဲ။
+
+> [!TIP]
+> ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** Hub ပေါ်မှာ `bert-base-cased` မော်ဒယ်ကို ရှာဖွေပြီး Inference API widget မှာ ၎င်းရဲ့ mask word ကို ခွဲခြားသတ်မှတ်ပါ။ ကျွန်တော်တို့ရဲ့ `pipeline` ဥပမာမှာပါတဲ့ စာကြောင်းအတွက် ဒီမော်ဒယ်က ဘာကို ခန့်မှန်းပေးမလဲ။
+
+## Named entity recognition[[named-entity-recognition]]
+
+Named Entity Recognition (NER) ဆိုတာက input text ထဲက ဘယ်အပိုင်းတွေက လူ၊ နေရာ ဒါမှမဟုတ် အဖွဲ့အစည်းလို entities တွေနဲ့ ကိုက်ညီလဲဆိုတာကို မော်ဒယ်က ရှာဖွေရတဲ့ လုပ်ငန်းတာဝန်တစ်ခု ဖြစ်ပါတယ်။ ဥပမာတစ်ခုကို ကြည့်ရအောင်။
+
+```python
+from transformers import pipeline
+
+ner = pipeline("ner", grouped_entities=True)
+ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
+```
+
+```python out
+[{'entity_group': 'PER', 'score': 0.99816, 'word': 'Sylvain', 'start': 11, 'end': 18}, 
+ {'entity_group': 'ORG', 'score': 0.97960, 'word': 'Hugging Face', 'start': 33, 'end': 45}, 
+ {'entity_group': 'LOC', 'score': 0.99321, 'word': 'Brooklyn', 'start': 49, 'end': 57}
+]
+```
+
+ဒီနေရာမှာ မော်ဒယ်က Sylvain ဟာ လူ (PER) ဖြစ်ကြောင်း၊ Hugging Face က အဖွဲ့အစည်း (ORG) ဖြစ်ကြောင်း၊ Brooklyn က နေရာ (LOC) ဖြစ်ကြောင်း မှန်ကန်စွာ ဖော်ထုတ်ခဲ့ပါတယ်။
+
+pipeline ဖန်တီးတဲ့ function မှာ `grouped_entities=True` option ကို ပေးလိုက်တာက စာကြောင်းထဲက တူညီတဲ့ entity နဲ့ ကိုက်ညီတဲ့ အစိတ်အပိုင်းတွေကို အတူတကွ ပြန်လည်စုစည်းဖို့ pipeline ကို ပြောတာပါ။ ဒီနေရာမှာ မော်ဒယ်က "Hugging" နဲ့ "Face" ကို စကားလုံးများစွာနဲ့ ဖွဲ့စည်းထားတဲ့ နာမည်ဖြစ်ပေမယ့် တစ်ခုတည်းသော အဖွဲ့အစည်းအဖြစ် မှန်ကန်စွာ စုစည်းခဲ့ပါတယ်။ တကယ်တော့ နောက်အခန်းမှာ ကျွန်တော်တို့ မြင်ရမှာဖြစ်သလို preprocessing က စကားလုံးအချို့ကို ပိုမိုသေးငယ်တဲ့ အစိတ်အပိုင်းတွေအဖြစ် ခွဲထုတ်တာတောင် လုပ်ပါတယ်။ ဥပမာ၊ `Sylvain` ကို `S`၊ `##yl`၊ `##va`၊ `##in` ဆိုပြီး လေးပိုင်းခွဲပါတယ်။ post-processing အဆင့်မှာတော့ pipeline က အဲဒီအပိုင်းတွေကို အောင်မြင်စွာ ပြန်လည်စုစည်းပေးပါတယ်။
+
+> [!TIP]
+> ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** Model Hub မှာ အင်္ဂလိပ်ဘာသာစကားမှာ part-of-speech tagging (အတိုကောက်အားဖြင့် POS) လုပ်ဆောင်နိုင်တဲ့ မော်ဒယ်တစ်ခုကို ရှာဖွေပါ။ အထက်ပါ ဥပမာမှာပါတဲ့ စာကြောင်းအတွက် ဒီမော်ဒယ်က ဘာကို ခန့်မှန်းပေးမလဲ။
+
+## မေးခွန်းဖြေဆိုခြင်း (Question answering)[[question-answering]]
+
+`question-answering` pipeline ကတော့ ပေးထားတဲ့ အကြောင်းအရာတစ်ခုကနေ အချက်အလက်တွေကို အသုံးပြုပြီး မေးခွန်းတွေကို ဖြေပေးပါတယ်။
+
+```python
+from transformers import pipeline
+
+question_answerer = pipeline("question-answering")
+question_answerer(
+    question="Where do I work?",
+    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
+)
+```
+
+```python out
+{'score': 0.6385916471481323, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
+```
+
+ဒီ pipeline က ပေးထားတဲ့ context ကနေ အချက်အလက်တွေကို ထုတ်ယူပြီး အလုပ်လုပ်တာကို သတိပြုပါ။ ၎င်းက အဖြေကို ဖန်တီးပေးတာ မဟုတ်ပါဘူး။
+
+## အနှစ်ချုပ်ခြင်း (Summarization)[[summarization]]
+
+Summarization ဆိုတာက စာသားတစ်ခုကို ပိုမိုတိုတောင်းတဲ့ စာသားတစ်ခုအဖြစ် လျှော့ချပြီး စာသားမှာ ရည်ညွှန်းထားတဲ့ အရေးကြီးတဲ့ အချက်အလက်အားလုံး (သို့မဟုတ် အများစု) ကို ထိန်းသိမ်းထားတဲ့ လုပ်ငန်းတာဝန် ဖြစ်ပါတယ်။ ဥပမာတစ်ခုကို ကြည့်ရအောင်။
+
+```python
+from transformers import pipeline
+
+summarizer = pipeline("summarization")
+summarizer(
+    """
+    America has changed dramatically during recent years. Not only has the number of 
+    graduates in traditional engineering disciplines such as mechanical, civil, 
+    electrical, chemical, and aeronautical engineering declined, but in most of 
+    the premier American universities engineering curricula now concentrate on 
+    and encourage largely the study of engineering science. As a result, there 
+    are declining offerings in engineering subjects dealing with infrastructure, 
+    the environment, and related issues, and greater concentration on high 
+    technology subjects, largely supporting increasingly complex scientific 
+    developments. While the latter is important, it should not be at the expense 
+    of more traditional engineering.
+
+    Rapidly developing economies such as China and India, as well as other 
+    industrial countries in Europe and Asia, continue to encourage and advance 
+    the teaching of engineering. Both China and India, respectively, graduate 
+    six and eight times as many traditional engineers as does the United States. 
+    Other industrial countries at minimum maintain their output, while America 
+    suffers an increasingly serious decline in the number of engineering graduates 
+    and a lack of well-educated engineers.
+"""
+)
+```
+
+```python out
+[{'summary_text': ' America has changed dramatically during recent years . The '
+                  'number of engineering graduates in the U.S. has declined in '
+                  'traditional engineering disciplines such as mechanical, civil '
+                  ', electrical, chemical, and aeronautical engineering . Rapidly '
+                  'developing economies such as China and India, as well as other '
+                  'industrial countries in Europe and Asia, continue to encourage '
+                  'and advance engineering .'}]
+```
+
+Text generation နဲ့ အလားတူပဲ၊ ရလဒ်အတွက် `max_length` သို့မဟုတ် `min_length` ကို သတ်မှတ်နိုင်ပါတယ်။
+
+## ဘာသာပြန်ခြင်း (Translation)[[translation]]
+
+ဘာသာပြန်ခြင်းအတွက်တော့ လုပ်ငန်းတာဝန်အမည်မှာ ဘာသာစကားတွဲ (ဥပမာ- `"translation_en_to_fr"`) ကို ပေးလိုက်ရင် default မော်ဒယ်ကို အသုံးပြုနိုင်ပါတယ်။ ဒါပေမယ့် အလွယ်ကူဆုံးနည်းလမ်းကတော့ [Model Hub](https://huggingface.co/models) ကနေ သင်အသုံးပြုလိုတဲ့ မော်ဒယ်ကို ရွေးချယ်တာပါပဲ။ ဒီနေရာမှာတော့ ပြင်သစ်ဘာသာကနေ အင်္ဂလိပ်ဘာသာကို ဘာသာပြန်ကြည့်ပါမယ်။
+
+```python
+from transformers import pipeline
+
+translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
+translator("Ce cours est produit par Hugging Face.")
+```
+
+```python out
+[{'translation_text': 'This course is produced by Hugging Face.'}]
+```
+
+Text generation နဲ့ summarization တွေလိုပဲ ရလဒ်အတွက် `max_length` သို့မဟုတ် `min_length` ကို သတ်မှတ်နိုင်ပါတယ်။
+
+> [!TIP]
+> ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** အခြားဘာသာစကားတွေမှာ ဘာသာပြန်မော်ဒယ်တွေကို ရှာဖွေပြီး အရင်စာကြောင်းကို မတူညီတဲ့ ဘာသာစကားအချို့ဆီ ဘာသာပြန်ကြည့်ပါ။
+
+## ပုံရိပ်နှင့် အသံ Pipelines များ (Image and audio pipelines)
+
+စာသားအပြင် Transformer မော်ဒယ်တွေဟာ ပုံရိပ်နဲ့ အသံတွေနဲ့လည်း အလုပ်လုပ်နိုင်ပါတယ်။ ဥပမာအချို့ကတော့-
+
+### Image classification
+
+```python
+from transformers import pipeline
+
+image_classifier = pipeline(
+    task="image-classification", model="google/vit-base-patch16-224"
+)
+result = image_classifier(
+    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
+)
+print(result)
+```
+
+```python out
+[{'label': 'lynx, catamount', 'score': 0.43350091576576233},
+ {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
+  'score': 0.034796204417943954},
+ {'label': 'snow leopard, ounce, Panthera uncia',
+  'score': 0.03240183740854263},
+ {'label': 'Egyptian cat', 'score': 0.02394474856555462},
+ {'label': 'tiger cat', 'score': 0.02288915030658245}]
+```
+
+### Automatic speech recognition
+
+```python
+from transformers import pipeline
+
+transcriber = pipeline(
+    task="automatic-speech-recognition", model="openai/whisper-large-v3"
+)
+result = transcriber(
+    "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
+)
+print(result)
+```
+
+```python out
+{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
+```
+
+## ဒေတာ အရင်းအမြစ်မျိုးစုံမှ ဒေတာများကို ပေါင်းစပ်ခြင်း (Combining data from multiple sources)
+
+Transformer မော်ဒယ်တွေရဲ့ အစွမ်းထက်တဲ့ အသုံးချမှုတစ်ခုကတော့ ဒေတာအရင်းအမြစ်မျိုးစုံကနေ ဒေတာတွေကို ပေါင်းစပ်ပြီး လုပ်ဆောင်နိုင်စွမ်းပါပဲ။ ဒါက အောက်ပါကိစ္စတွေအတွက် အထူးအသုံးဝင်ပါတယ်။
+
+1. ဒေတာဘေ့စ်(databases)များစွာ သို့မဟုတ် repository များစွာကို ရှာဖွေခြင်း။
+2. မတူညီတဲ့ format တွေ (စာသား၊ ပုံရိပ်၊ အသံ) ကနေ အချက်အလက်တွေကို စုစည်းခြင်း။
+3. ဆက်စပ်အချက်အလက်တွေရဲ့ ပေါင်းစပ်ထားသော မြင်ကွင်းတစ်ခုကို ဖန်တီးခြင်း။
+
+
+
+ဥပမာအားဖြင့်၊ သင်ဟာ အောက်ပါတို့ကို လုပ်ဆောင်နိုင်တဲ့ စနစ်တစ်ခုကို တည်ဆောက်နိုင်ပါတယ်။
+- စာသားနဲ့ ပုံရိပ်လို နယ်ပယ်အမျိုးစုံက ဒေတာဘေ့စ်တွေမှာ အချက်အလက်တွေကို ရှာဖွေခြင်း။
+- မတူညီတဲ့ အရင်းအမြစ်တွေကနေ ရရှိတဲ့ ရလဒ်တွေကို တစ်ခုတည်းသော တုံ့ပြန်မှုအဖြစ် ပေါင်းစပ်ခြင်း။ ဥပမာ- အသံဖိုင်တစ်ခုနဲ့ စာသားဖော်ပြချက်တစ်ခုကနေ။
+- document တွေနဲ့ metadata တွေရဲ့ ဒေတာဘေ့စ်တစ်ခုကနေ အသက်ဆိုင်ဆုံး အချက်အလက်တွေကို တင်ပြခြင်း။
+
+## နိဂုံးချုပ်
+
+ဒီအခန်းမှာ ပြသထားတဲ့ pipelines တွေကတော့ အများစုက သရုပ်ပြရန်အတွက်သာ ဖြစ်ပါတယ်။ ၎င်းတို့ကို သီးခြားလုပ်ငန်းတာဝန်များအတွက် ရေးဆွဲထားတာဖြစ်ပြီး ၎င်းတို့ရဲ့ မူကွဲတွေကိုတော့ လုပ်ဆောင်နိုင်ခြင်း မရှိပါဘူး။ နောက်အခန်းမှာတော့ `pipeline()` function ထဲမှာ ဘာတွေပါလဲ၊ ၎င်းရဲ့ လုပ်ဆောင်ပုံကို ဘယ်လိုစိတ်ကြိုက် ပြုပြင်နိုင်လဲဆိုတာကို သင်လေ့လာရမှာ ဖြစ်ပါတယ်။
+
+## Glossary
+
+*   **Transformer Models**: A deep learning architecture that has been highly successful in Natural Language Processing (NLP). These models use an "attention mechanism" to learn the relationships between the words in a text.
+*   **🤗 Transformers library**: A library from Hugging Face for building and using state-of-the-art Transformer-based models in fields such as Natural Language Processing (NLP), computer vision, and audio processing.
+*   **`pipeline()` function**: A function in the Hugging Face Transformers library that makes models easy to use for specific tasks (for example, text classification or text generation).
+*   **Natural Language Processing (NLP)**: A subfield of Artificial Intelligence (AI) that enables computers to understand, interpret, and generate human language. Examples include text analysis and translation.
+*   **Computer Vision**: A field of Artificial Intelligence (AI) that teaches computers to see and understand images and videos as humans do.
+*   **Audio Processing**: Analyzing, transforming, or generating audio data.
+*   **Model Hub**: An online platform from Hugging Face for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **Pretrained Model**: A machine learning model that has already been trained on large datasets. It can then be fine-tuned for other tasks.
+*   **Fine-tuned**: Describes a pretrained model that has been trained further on a small amount of data for a specific task.
+*   **Sentiment Analysis**: Determining the sentiment (positive, negative, or neutral) expressed in a text.
+*   **Preprocessing**: Preparing data by converting it into a form that a machine learning model can consume.
+*   **Postprocessing**: Converting a machine learning model's outputs back into a form that humans can easily understand.
+*   **Multimodal**: Able to work with several data types (for example, text, images, and audio).
+*   **Text Generation**: Using AI models to create new, human-like text.
+*   **Text Classification**: Assigning a text to one of a set of predefined categories.
+*   **Summarization**: Producing a shorter version of a text while preserving its key information.
+*   **Translation**: Converting a text from one language into another.
+*   **Zero-shot Classification**: Classifying text without prior training on the specific labels.
+*   **Feature Extraction**: Extracting important features from data.
+*   **Image to Text**: Generating a text description from an image.
+*   **Image Classification**: Identifying the objects in an image.
+*   **Object Detection**: Locating and identifying objects in images.
+*   **Automatic Speech Recognition**: Technology that lets a computer convert spoken language into text.
+*   **Audio Classification**: Assigning audio to categories.
+*   **Text to Speech**: Converting text into spoken audio.
+*   **Image-Text to Text**: Generating a text response based on an image and a text prompt about it.
+*   **Label**: A name or category used to describe a data point.
+*   **Domain Expertise**: Deep knowledge or skill in a specific field or subject.
+*   **Prompt**: The input text or question given to a Large Language Model (LLM) so that it performs a specific task or provides information.
+*   **Randomness**: Behavior that is unpredictable or follows no fixed pattern.
+*   **`num_return_sequences`**: An argument that controls how many output sequences a text generation pipeline produces.
+*   **`max_length`**: An argument that sets the maximum length of the output text in a text generation pipeline.
+*   **Inference Providers**: A service that makes models on the Hugging Face Hub available through a cloud-based API.
+*   **Mask Token**: A special token that stands for a blank in the text in `fill-mask` tasks.
+*   **Named Entity Recognition (NER)**: Finding named entities in a text, such as the names of people, places, and organizations.
+*   **Entities**: The items or objects referred to in a text (for example, people, places, organizations).
+*   **`grouped_entities`**: An option in the NER pipeline that groups the pieces belonging to the same entity into a single entity.
+*   **Part-of-speech Tagging (POS)**: Identifying the grammatical role (for example, noun, verb, adjective) of each word in a sentence.
+*   **Question Answering**: A task that extracts information from a given context to answer a question.
+*   **`min_length`**: An argument that sets the minimum length of the output text in a text generation or summarization pipeline.
+*   **Multilingual Models**: Models that can understand and process many languages.
diff --git a/chapters/my/chapter1/4.mdx b/chapters/my/chapter1/4.mdx
new file mode 100644
index 000000000..43e613876
--- /dev/null
+++ b/chapters/my/chapter1/4.mdx
@@ -0,0 +1,255 @@
+# How do Transformers work?[[how-do-transformers-work]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+In this section, we will take a look at the architecture of Transformer models and dive deeper into attention, the encoder-decoder architecture, and more.
+
+> [!WARNING]
+> 🚀 We're taking things up a notch here. This section is detailed and technical, so don't worry if you don't understand everything right away. We'll come back to these concepts later in the course.
+
+## A bit of Transformer history[[a-bit-of-transformer-history]]
+
+Here are some reference points in the (short) history of Transformer models:
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_chrono.svg" alt="A brief chronology of Transformers models.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_chrono-dark.svg" alt="A brief chronology of Transformers models.">
+</div>
+
+The [Transformer architecture](https://arxiv.org/abs/1706.03762) was introduced in June 2017. The original research focused on translation tasks. Several influential models followed, including:
+
+- **June 2018**: [GPT](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), the first pretrained Transformer model, which was fine-tuned on various NLP tasks and obtained state-of-the-art results.
+
+- **October 2018**: [BERT](https://arxiv.org/abs/1810.04805), another large pretrained model, this one designed to produce better summaries of sentences (more on this in the next chapter!).
+
+- **February 2019**: [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), an improved (and bigger) version of GPT that was not immediately released publicly due to ethical concerns.
+
+- **October 2019**: [T5](https://huggingface.co/papers/1910.10683), a multi-task focused implementation of the sequence-to-sequence Transformer architecture.
+
+- **May 2020**: [GPT-3](https://huggingface.co/papers/2005.14165), an even bigger version of GPT-2 that performs well on a variety of tasks without the need for fine-tuning (this is called _zero-shot learning_).
+
+- **January 2022**: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better.
+
+- **February 2023**: [Llama](https://huggingface.co/papers/2302.13971), a large language model that can generate text in a variety of languages.
+
+- **September 2023**: [Mistral](https://huggingface.co/papers/2310.06825), a 7-billion-parameter language model that outperforms Llama 2 13B across all evaluated benchmarks. It uses grouped-query attention for faster inference and sliding window attention to handle sequences of arbitrary length.
+
+- **May 2024**: [Gemma 2](https://huggingface.co/papers/2408.00118), a family of lightweight, state-of-the-art open models ranging from 2B to 27B parameters that combine interleaved local-global attentions with group-query attention. The smaller models are trained using knowledge distillation and deliver performance competitive with models 2-3 times larger.
+
+- **November 2024**: [SmolLM2](https://huggingface.co/papers/2502.02737), a state-of-the-art small language model (135 million to 1.7 billion parameters) that achieves impressive performance despite its compact size, opening up new possibilities for mobile and edge devices.
+
+This list is far from comprehensive and is only meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
+
+- GPT-like models (also called _auto-regressive_ Transformer models)
+- BERT-like models (also called _auto-encoding_ Transformer models)
+- T5-like models (also called _sequence-to-sequence_ Transformer models)
+
+We will dive into these families in more depth later on.
+
+## Transformers are language models[[transformers-are-language-models]]
+
+All the Transformer models mentioned above (GPT, BERT, T5, etc.) have been trained as *language models*. This means they have been trained on large amounts of raw text in a self-supervised fashion.
+
+Self-supervised learning is a type of training in which the objective is automatically computed from the inputs of the model. That means humans are not needed to label the data.
+
+This type of model develops a statistical understanding of the language it has been trained on, but it is less useful for specific practical tasks. Because of this, the general pretrained model then goes through a process called *transfer learning* or *fine-tuning*. During this process, the model is fine-tuned in a supervised way (that is, using human-annotated labels) on a given task.
+
+An example of a task is predicting the next word in a sentence after reading the *n* previous words. This is called *causal language modeling* because the output depends on the past and present inputs, but not the future ones.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/causal_modeling.svg" alt="Example of causal language modeling in which the next word from a sentence is predicted.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/causal_modeling-dark.svg" alt="Example of causal language modeling in which the next word from a sentence is predicted.">
+</div>
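
As a rough illustration of why causal language modeling needs no human labels, the sketch below builds (context, next word) training pairs directly from a raw sentence. The function name `causal_lm_examples` and the whitespace split are hypothetical stand-ins chosen for this example; real models operate on tokens produced by a tokenizer.

```python
def causal_lm_examples(tokens):
    """Return (context, target) pairs where each target is the token that follows its context."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Whitespace splitting is a simplification standing in for a real tokenizer.
sentence = "My name is Sylvain".split()
for context, target in causal_lm_examples(sentence):
    print(context, "->", target)
```

Every target comes from the text itself, which is what makes the objective self-supervised.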
+
+Another example is *masked language modeling*, in which the model predicts a masked word in the sentence.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/masked_modeling.svg" alt="Example of masked language modeling in which a masked word from a sentence is predicted.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/masked_modeling-dark.svg" alt="Example of masked language modeling in which a masked word from a sentence is predicted.">
+</div>
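
A masked language modeling example can be sketched in the same spirit: hide some tokens and keep the originals as the targets to recover. The helper `mask_tokens`, the `[MASK]` string, and the 15% masking rate are assumptions made for illustration, not a description of any particular model's exact recipe.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with mask_token.

    Returns (masked_tokens, labels), where labels holds the original token
    at masked positions and None elsewhere.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


masked, labels = mask_tokens("My name is Sylvain and I like this course".split(), mask_prob=0.3)
print(masked)
print(labels)
```

Unlike the causal case, the model sees the words on both sides of each masked position.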
+
+## Transformers are big models[[transformers-are-big-models]]
+
+Apart from a few outliers (like DistilBERT), the general strategy for achieving better performance is to increase both the size of the models and the amount of data they are pretrained on.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/model_parameters.png" alt="Number of parameters of recent Transformers models" width="90%">
+</div>
+
+Unfortunately, training a model, especially a large one, requires a large amount of data. This becomes very costly in terms of time and compute resources. It also has an environmental impact, as the following graph shows.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/carbon_footprint.svg" alt="The carbon footprint of a large language model.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/carbon_footprint-dark.svg" alt="The carbon footprint of a large language model.">
+</div>
+
+<Youtube id="ftWlj4FBHTg"/>
+
+And this graph shows a project for a (very big) model led by a team consciously trying to reduce the environmental impact of pretraining. The footprint of running many trials to find the best hyperparameters would be even higher.
+
+Imagine if each time a research team, a student organization, or a company wanted to train a model, it did so from scratch. This would lead to huge, unnecessary global costs.
+
+This is why sharing language models is so important: sharing trained weights and building on top of already trained weights reduces the overall compute cost and carbon footprint.
+
+By the way, you can evaluate the carbon footprint of your models' training with several tools, for example [ML CO2 Impact](https://mlco2.github.io/impact/) or [Code Carbon](https://codecarbon.io/), which is integrated in 🤗 Transformers. To learn more, you can read this [blog post](https://huggingface.co/blog/carbon-emissions-on-the-hub), which shows how to generate an `emissions.csv` file with an estimate of the footprint of your training, as well as the [documentation](https://huggingface.co/docs/hub/model-cards-co2) of 🤗 Transformers addressing this topic.
+
+## Transfer Learning[[transfer-learning]]
+
+<Youtube id="BqqfQnyjmgg" />
+
+*Pretraining* is the act of training a model from scratch: the weights are randomly initialized, and the training starts without any prior knowledge.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/pretraining.svg" alt="The pretraining of a language model is costly in both time and money.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/pretraining-dark.svg" alt="The pretraining of a language model is costly in both time and money.">
+</div>
+
+This pretraining is usually done on very large amounts of data. It therefore requires a very large corpus, and training can take several weeks.
+
+*Fine-tuning*, on the other hand, is the training done **after** a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task. Why not simply train the model for your final use case from the start (from **scratch**)? There are a few reasons:
+
+*   The pretrained model was already trained on a dataset that has some similarities with the fine-tuning dataset. The fine-tuning process can therefore take advantage of the knowledge the initial model acquired during pretraining (for instance, with NLP problems, the pretrained model will have some statistical understanding of the language you are using for your task).
+*   Since the pretrained model was already trained on lots of data, fine-tuning requires far less data to get decent results.
+*   For the same reason, the amount of time and resources needed to get good results is much lower.
+
+For example, one could take a pretrained model trained on the English language and fine-tune it on an arXiv corpus, resulting in a science/research-based model. The fine-tuning requires only a limited amount of data: the knowledge the pretrained model has acquired is "transferred," hence the term *transfer learning*.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/finetuning.svg" alt="The fine-tuning of a language model is cheaper than pretraining in both time and money.">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/finetuning-dark.svg" alt="The fine-tuning of a language model is cheaper than pretraining in both time and money.">
+</div>
+
+Fine-tuning a model therefore has lower time, data, financial, and environmental costs. It is also quicker and easier to iterate over different fine-tuning schemes, as the training is less constraining than a full pretraining.
+
+This process will also achieve better results than training from scratch (unless you have lots of data), which is why you should always try to use a pretrained model, one as close as possible to the task you have at hand, and fine-tune it.
+
+## General Transformer architecture[[general-transformer-architecture]]
+
+In this section, we'll go over the general architecture of the Transformer model. Don't worry if you don't understand some of the concepts; later sections cover each of the components in detail.
+
+<Youtube id="H39Z_720T5s" />
+
+The model is primarily composed of two blocks:
+
+* **Encoder (left)**: The encoder receives an input and builds a representation of it (its features). This means that the model is optimized to acquire understanding from the input.
+* **Decoder (right)**: The decoder uses the encoder's representation (features) along with other inputs to generate a target sequence. This means that the model is optimized for generating outputs.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_blocks.svg" alt="Architecture of a Transformers models">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_blocks-dark.svg" alt="Architecture of a Transformers models">
+</div>
+
+Each of these parts can be used independently, depending on the task:
+
+*   **Encoder-only models**: Good for tasks that require understanding of the input, such as sentence classification and named entity recognition.
+*   **Decoder-only models**: Good for generative tasks such as text generation.
+*   **Encoder-decoder models** or **sequence-to-sequence models**: Good for generative tasks that require an input, such as translation or summarization.
+
+We will dive into each of these architectures separately in later sections.
+
+## Attention layers[[attention-layers]]
+
+A key feature of Transformer models is that they are built with special layers called *attention layers*. In fact, the title of the paper introducing the Transformer architecture was ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762). We will explore the details of attention layers later in the course; for now, all you need to know is that this layer tells the model to pay specific attention to certain words in the sentence you passed it (and more or less ignore the others) when dealing with the representation of each word.
+
+To put this into context, consider the task of translating text from English to French. Given the input "You like this course", a translation model will need to also attend to the adjacent word "You" to get the proper translation for the word "like", because in French the verb "like" is conjugated differently depending on the subject. The rest of the sentence, however, is not useful for the translation of that word. In the same vein, when translating "this" the model will also need to pay attention to the word "course", because "this" translates differently depending on whether the associated noun is masculine or feminine. Again, the other words in the sentence will not matter for the translation of "this". With more complex sentences (and more complex grammar rules), the model would need to pay special attention to words that might appear farther away in the sentence to properly translate each word.
+
+The same concept applies to any task associated with natural language: a word by itself has a meaning, but that meaning is deeply affected by the context, which can be any other word (or words) before or after the word being studied.
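
One common way to make "paying attention" concrete is scaled dot-product attention, the formulation used in the "Attention Is All You Need" paper: each word's query is compared with every word's key, the scores are normalized with a softmax, and the result weights an average of the values. The sketch below is a minimal pure-Python version for a single query; the two-dimensional vectors are invented toy numbers, not real embeddings.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    output = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, output


words = ["You", "like", "this", "course"]
# Hypothetical toy vectors: "You" is deliberately aligned with the query below.
keys = values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.2]]
query = [4.0, 0.0]  # imagined query for "like", looking for its subject

weights, output = attend(query, keys, values)
for word, weight in zip(words, weights):
    print(f"{word:>6}: {weight:.2f}")
```

With these toy numbers the largest weight lands on "You", mirroring the translation example above.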
+
+Now that you have an idea of what attention layers are all about, let's take a closer look at the Transformer architecture.
+
+## The original architecture[[the-original-architecture]]
+
+The Transformer architecture was originally designed for translation. During training, the encoder receives inputs (sentences) in a certain language, while the decoder receives the same sentences in the desired target language. In the encoder, the attention layers can use all the words in a sentence (since, as we just saw, the translation of a given word can depend on what comes after it as well as before it in the sentence). The decoder, however, works sequentially and can only pay attention to the words in the sentence that it has already translated (so, only the words before the word currently being generated). For example, once we have predicted the first three words of the translated target, we give them to the decoder, which then uses all the inputs of the encoder to try to predict the fourth word.
+
+To speed things up during training (when the model has access to target sentences), the decoder is fed the whole target, but it is not allowed to use future words (if it had access to the word at position 2 when trying to predict the word at position 2, the problem would not be very hard!). For instance, when trying to predict the fourth word, the attention layer will only have access to the words in positions 1 to 3.
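
The restriction on future words can be pictured as a boolean mask. In this minimal sketch, `lookahead_mask` is a hypothetical helper whose row for a given target position marks which positions may be attended to while predicting it; real implementations typically apply such a mask by setting the disallowed attention scores to negative infinity before the softmax.

```python
def lookahead_mask(n):
    """Build an n x n mask of booleans.

    Row p (0-indexed) describes predicting the word at position p + 1.
    Entry q is True when the word at position q + 1 may be attended to,
    which is only the case for strictly earlier positions.
    """
    return [[q < p for q in range(n)] for p in range(n)]


mask = lookahead_mask(4)
for row in mask:
    print(["x" if allowed else "." for allowed in row])

# Predicting the fourth word: positions 1 to 3 are visible, position 4 is not.
print(mask[3])
```

The first row is empty because nothing precedes the first word; in practice a start-of-sequence token usually fills that role.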
+
+The original Transformer architecture looked like this, with the encoder on the left and the decoder on the right:
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers.svg" alt="Architecture of a Transformers models">
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers-dark.svg" alt="Architecture of a Transformers models">
+</div>
+
+Note that the first attention layer in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful because different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may help determine the best translation of a given word.
+
+The *attention mask* can also be used in the encoder/decoder to prevent the model from paying attention to some special words, for instance the special padding word used to make all the inputs the same length when batching sentences together.
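
To illustrate the padding case, the sketch below pads a batch of tokenized sentences to a common length and builds the matching attention mask, using 1 for real tokens and 0 for padding. The `pad_batch` helper and the `[PAD]` string are assumptions made for this example rather than any library's API.

```python
def pad_batch(sequences, pad_token="[PAD]"):
    """Pad every sequence to the longest length and return (padded, attention_masks)."""
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_token] * n_pad)
        masks.append([1] * len(seq) + [0] * n_pad)  # 0 marks positions to ignore
    return padded, masks


batch = [["You", "like", "this", "course"], ["Hello", "there"]]
padded, masks = pad_batch(batch)
for tokens, mask in zip(padded, masks):
    print(tokens, mask)
```

The model can then skip every position whose mask value is 0, so the padding does not influence the representations of the real words.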
+
+## Architectures vs. checkpoints[[architecture-vs-checkpoints]]
+
+As we dive into Transformer models in this course, you'll see mentions of *architectures* and *checkpoints* as well as *models*. These terms all have slightly different meanings:
+
+*   **Architecture**: This is the skeleton of the model, the definition of each layer and each operation that happens within the model.
+*   **Checkpoints**: These are the weights that will be loaded in a given architecture.
+*   **Model**: This is an umbrella term that isn't as precise as "architecture" or "checkpoint": it can mean both. This course will specify *architecture* or *checkpoint* when it matters, to reduce ambiguity.
+
+For example, BERT is an architecture, while `bert-base-cased`, a set of weights trained by the Google team for the first release of BERT, is a checkpoint. However, one can say both "the BERT model" and "the `bert-base-cased` model."
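
The distinction can be mimicked with a toy example. In the sketch below, the hypothetical class `TinyLinear` plays the role of an architecture (it fixes the computation), and the two dictionaries play the role of checkpoints (they supply the weight values). It is an analogy only, not how any real library stores weights.

```python
class TinyLinear:
    """Architecture: fixes the computation y = w * x + b, not the values of w and b."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0  # placeholder weights until a checkpoint is loaded

    def load_checkpoint(self, checkpoint):
        """Load a set of weights into this architecture."""
        self.w, self.b = checkpoint["w"], checkpoint["b"]
        return self

    def __call__(self, x):
        return self.w * x + self.b


# Two different checkpoints for the same architecture (values are made up).
checkpoint_a = {"w": 2.0, "b": 1.0}
checkpoint_b = {"w": -1.0, "b": 0.5}

model_a = TinyLinear().load_checkpoint(checkpoint_a)
model_b = TinyLinear().load_checkpoint(checkpoint_b)
print(model_a(3.0), model_b(3.0))
```

Both objects share one architecture but behave differently because they hold different weights. In 🤗 Transformers, the analogous step is typically a call such as `BertModel.from_pretrained("bert-base-cased")`, which loads a checkpoint into the BERT architecture.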
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Transformer Models**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Attention**: Transformer model များတွင် အသုံးပြုသော ယန္တရားတစ်ခုဖြစ်ပြီး input sequence အတွင်းရှိ အရေးကြီးသော အစိတ်အပိုင်းများကို မော်ဒယ်အား ပိုမိုအာရုံစိုက်စေသည်။
+*   **Encoder-Decoder Architecture**: Encoder နှင့် Decoder နှစ်ခုစလုံး ပါဝင်သော Transformer architecture တစ်မျိုးဖြစ်ပြီး ဘာသာပြန်ခြင်းကဲ့သို့သော input sequence မှ output sequence တစ်ခုသို့ ပြောင်းလဲခြင်း လုပ်ငန်းများအတွက် အသုံးပြုပါတယ်။
+*   **Architecture**: Machine Learning မော်ဒယ်တစ်ခု၏ အတွင်းပိုင်းတည်ဆောက်ပုံ၊ အလွှာများ (layers) နှင့် လုပ်ဆောင်မှုများ (operations) ၏ အဓိပ္ပာယ်ဖွင့်ဆိုချက်။
+*   **GPT (Generative Pre-trained Transformer)**: OpenAI မှ တီထွင်ထားသော Transformer-based Large Language Model (LLM) အမျိုးအစားတစ်ခု။
+*   **Pretrained Model**: ဒေတာအမြောက်အမြားပေါ်တွင် ကြိုတင်လေ့ကျင့်ထားသော မော်ဒယ်။
+*   **Fine-tuning**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် အနည်းငယ်သော ဒေတာနဲ့ ထပ်မံလေ့ကျင့်ပေးခြင်းကို ဆိုလိုပါတယ်။
+*   **NLP (Natural Language Processing)**: ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး၊ ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။
+*   **BERT (Bidirectional Encoder Representations from Transformers)**: Google က ထုတ်လုပ်ထားတဲ့ Transformer-based Pretrained Model တစ်ခုဖြစ်ပြီး စာသားတွေရဲ့ အဓိပ္ပာယ်ကို နားလည်ဖို့အတွက် အသုံးပြုပါတယ်။
+*   **GPT-2**: GPT ရဲ့ ပိုမိုကောင်းမွန်ပြီး ပိုကြီးတဲ့ ဗားရှင်း။
+*   **Ethical Concerns**: ကျင့်ဝတ်ဆိုင်ရာ စိုးရိမ်ပူပန်မှုများ။
+*   **T5 (Text-to-Text Transfer Transformer)**: Google က ထုတ်လုပ်ထားတဲ့ Transformer-based Model တစ်ခုဖြစ်ပြီး NLP လုပ်ငန်းတာဝန်များစွာကို text-to-text format ဖြင့် ဖြေရှင်းရန် ဒီဇိုင်းထုတ်ထားပါတယ်။
+*   **Sequence-to-sequence**: input sequence တစ်ခုကို output sequence တစ်ခုအဖြစ် ပြောင်းလဲပေးသော မော်ဒယ်အမျိုးအစား။ (ဥပမာ- ဘာသာပြန်ခြင်း)
+*   **GPT-3**: GPT-2 ထက် ပိုမိုကြီးမားသော ဗားရှင်း။
+*   **Zero-shot learning**: မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခုအတွက် လေ့ကျင့်မှုမရှိဘဲ လုပ်ငန်းကို လုပ်ဆောင်စေခြင်း။
+*   **InstructGPT**: ညွှန်ကြားချက်များကို ပိုမိုကောင်းမွန်စွာ လိုက်နာနိုင်ရန် လေ့ကျင့်ထားသော GPT-3 ၏ ဗားရှင်းတစ်ခု။
+*   **Llama**: Meta မှ တီထွင်ထားသော Transformer-based Large Language Model (LLM) အမျိုးအစားတစ်ခု။
+*   **Mistral**: ၇ ဘီလီယံ parameter ပါရှိသော Large Language Model (LLM) တစ်ခု။
+*   **Grouped-query attention**: Transformer model များတွင် အသုံးပြုသော attention mechanism တစ်မျိုးဖြစ်ပြီး inference ကို ပိုမိုမြန်ဆန်စေရန် ကူညီပေးသည်။
+*   **Inference**: လေ့ကျင့်ပြီးသား မော်ဒယ်တစ်ခုကို အသုံးပြု၍ input အသစ်များမှ ခန့်မှန်းချက်များ သို့မဟုတ် output များထုတ်လုပ်ခြင်း။
+*   **Sliding window attention**: Transformer model များတွင် အသုံးပြုသော attention mechanism တစ်မျိုးဖြစ်ပြီး ရှည်လျားသော sequences များကို ထိထိရောက်ရောက် ကိုင်တွယ်နိုင်စေသည်။
+*   **Gemma 2**: Google DeepMind မှ ထုတ်လုပ်သော lightweight, state-of-the-art open models မိသားစုတစ်ခု။
+*   **Interleaved local-global attentions**: Transformer model များတွင် အသုံးပြုသော attention mechanism တစ်မျိုးဖြစ်ပြီး local နှင့် global information နှစ်ခုလုံးကို အာရုံစိုက်နိုင်စေသည်။
+*   **Knowledge distillation**: ပိုမိုကြီးမားသော၊ ပိုမိုရှုပ်ထွေးသော မော်ဒယ် (teacher model) ၏ အသိပညာကို ပိုမိုသေးငယ်သော၊ ရိုးရှင်းသော မော်ဒယ် (student model) သို့ လွှဲပြောင်းပေးသည့် နည်းလမ်း။
+*   **SmolLM2**: သေးငယ်သော အရွယ်အစားရှိသော်လည်း ထူးခြားသော စွမ်းဆောင်ရည်ကို ရရှိစေသော Small Language Model (SLM) တစ်ခု။
+*   **Mobile and Edge Devices**: စမတ်ဖုန်းများ၊ တက်ဘလက်များ၊ IoT ကိရိယာများကဲ့သို့ ကွန်ပျူတာစွမ်းအား ကန့်သတ်ချက်ရှိသော ကိရိယာများ။
+*   **Auto-regressive Transformer models**: GPT ကဲ့သို့ မော်ဒယ်များ၊ နောက်ထပ်လာမည့် token ကို ယခင် token များအပေါ် အခြေခံ၍ ခန့်မှန်းသည်။
+*   **Auto-encoding Transformer models**: BERT ကဲ့သို့ မော်ဒယ်များ၊ masked token များကို input sequence တစ်ခုလုံးအပေါ် အခြေခံ၍ ခန့်မှန်းသည်။
+*   **Sequence-to-sequence Transformer models**: T5 ကဲ့သို့ မော်ဒယ်များ၊ input sequence တစ်ခုကို output sequence တစ်ခုအဖြစ် ပြောင်းလဲပေးသည်။
+*   **Language Models**: လူသားဘာသာစကားကို နားလည်ပြီး ဖန်တီးနိုင်အောင် သင်ကြားထားသော မော်ဒယ်များ။
+*   **Self-supervised learning**: မော်ဒယ်၏ input တွေကနေ ရည်ရွယ်ချက်ကို အလိုအလျောက် တွက်ချက်ပေးတဲ့ သင်ယူမှုပုံစံတစ်ခု။
+*   **Transfer Learning**: ကြိုတင်လေ့ကျင့်ထားသော မော်ဒယ်မှ ရရှိသောအသိပညာကို အခြားဆက်စပ်လုပ်ငန်းတစ်ခုသို့ လွှဲပြောင်းအသုံးပြုခြင်း။
+*   **Supervised Learning**: human-annotated labels တွေကို အသုံးပြုပြီး မော်ဒယ်ကို သင်ကြားပေးတဲ့ သင်ယူမှုပုံစံတစ်ခု။
+*   **Annotated Labels**: လူသားများက ဒေတာများကို မှတ်သားထားသော အမှတ်အသားများ သို့မဟုတ် အမျိုးအစားများ။
+*   **Causal Language Modeling**: input sequence ၏ ယခင် token များကို အခြေခံ၍ နောက်ထပ်လာမည့် token ကို ခန့်မှန်းခြင်း။
+*   **Masked Language Modeling**: input sequence ထဲရှိ masked (ဝှက်ထားသော) token များကို ခန့်မှန်းခြင်း။
+*   **Outliers**: အခြားသောအချက်အလက်များနှင့် ကွဲပြားစွာ ထူးခြားနေသော အချက်အလက်များ။
+*   **DistilBERT**: BERT မော်ဒယ်၏ ပိုမိုသေးငယ်ပြီး ပိုမိုမြန်ဆန်သော ဗားရှင်း။
+*   **Hyperparameters**: Machine Learning မော်ဒယ်တစ်ခုကို လေ့ကျင့်ရာတွင် သတ်မှတ်ထားသော parameter များ (ဥပမာ- learning rate, batch size)။
+*   **Carbon Footprint**: ကာဗွန်ဒိုင်အောက်ဆိုဒ် ထုတ်လွှတ်မှုပမာဏ။
+*   **Pretraining**: မော်ဒယ်တစ်ခုကို အစကနေ လေ့ကျင့်ခြင်း။
+*   **Weights**: Machine Learning မော်ဒယ်တစ်ခု၏ သင်ယူနိုင်သော အစိတ်အပိုင်းများ။
+*   **Randomly Initialized**: မော်ဒယ်၏ weights များကို ကျပန်းတန်ဖိုးများဖြင့် စတင်သတ်မှတ်ခြင်း။
+*   **Corpus**: စာသားများစွာ၏ စုဆောင်းမှု (Collection of text data)။
+*   **arXiv corpus**: သိပ္ပံနည်းကျ စာတမ်းများ၊ သုတေသနစာတမ်းများ စသည်တို့၏ စုဆောင်းမှု။
+*   **Encoder**: The part of the Transformer architecture that processes input data (for example, text) and turns it into a representation.
+*   **Decoder**: The part of the Transformer architecture that uses the representation produced by the encoder to generate output data (for example, translated text).
+*   **Features**: The distinguishing characteristics or properties of a piece of data.
+*   **Sentence Classification**: Assigning a whole sentence to one of a set of predefined categories.
+*   **Named Entity Recognition (NER)**: Finding and identifying named entities in text, such as the names of people, places, and organizations.
+*   **Text Generation**: Using AI models to create new, human-like text.
+*   **Translation**: Automatically translating text or speech from one language into another.
+*   **Summarization**: Condensing a text into a summary that contains only the key information.
+*   **Attention Layers**: Layers in Transformer models that help the model focus on different parts of the input data.
+*   **Attention Mechanism**: The mechanism in Transformer models that helps capture the relationships between different words in the input sequence.
+*   **Conjugated**: Said of a verb whose form changes depending on the subject or tense.
+*   **Masculine/Feminine**: The grammatical gender used to classify nouns in some languages.
+*   **Context**: The surrounding information that helps make the meaning of a word, sentence, or topic clear.
+*   **Target Language**: The language to translate into.
+*   **Sequentially**: Done one after another, in order.
+*   **Attention Mask**: A mask used to prevent the model from attending to certain input tokens.
+*   **Padding Word**: A special word added so that input sequences have the same length.
+*   **Batching**: Grouping multiple data samples together so the model can train or run inference on them in a single pass.
+*   **Checkpoint**: The trained weights for a given architecture.
\ No newline at end of file
diff --git a/chapters/my/chapter1/5.mdx b/chapters/my/chapter1/5.mdx
new file mode 100644
index 000000000..6e8cfee59
--- /dev/null
+++ b/chapters/my/chapter1/5.mdx
@@ -0,0 +1,303 @@
+# How 🤗 Transformers solve tasks[[how-transformers-solve-tasks]]
+
+<Youtube id="zsfR7eY9Uho" />
+
+In [Transformers, what can they do?](/course/chapter1/3), you learned about natural language processing (NLP), speech and audio, and computer vision tasks, along with some of their important applications. This page looks closely at how models solve these tasks and explains what happens under the hood. There are many ways to solve a given task. Some models implement particular techniques or even approach the task from a new angle, but for Transformer models the general idea is the same. Thanks to their flexible architecture, most models are a variant of an encoder, a decoder, or an encoder-decoder structure.
+
+> [!TIP]
+> Before diving into specific architectural variants, it helps to understand that most tasks follow a similar pattern: input data is processed through a model, and the output is interpreted for a specific task. The differences lie in how the data is prepared, which model architecture variant is used, and how the output is processed.
+
+To explain how tasks are solved, we'll walk through what goes on inside the model to produce useful predictions. We'll cover the following models and their corresponding tasks:
+
+- [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2) for audio classification and automatic speech recognition (ASR)
+- [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit) and [ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext) for image classification
+- [DETR](https://huggingface.co/docs/transformers/model_doc/detr) for object detection
+- [Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former) for image segmentation
+- [GLPN](https://huggingface.co/docs/transformers/model_doc/glpn) for depth estimation
+- [BERT](https://huggingface.co/docs/transformers/model_doc/bert) for NLP tasks that use an encoder, like text classification, token classification, and question answering
+- [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2) for NLP tasks that use a decoder, like text generation
+- [BART](https://huggingface.co/docs/transformers/model_doc/bart) for NLP tasks that use an encoder-decoder, like summarization and translation
+
+> [!TIP]
+> Before you go further, it is good to have some basic knowledge of the original Transformer architecture. Knowing how encoders, decoders, and attention work will help you understand how different Transformer models work. Be sure to check out our [previous section](https://huggingface.co/course/chapter1/4?fw=pt) for more details.
+
+## Transformer models for language[[transformer-models-for-language]]
+
+Language models are at the heart of modern NLP. They are designed to understand and generate human language by learning the statistical patterns and relationships between words or tokens in text.
+
+The Transformer was initially designed for machine translation, and since then it has become the default architecture for a wide range of AI tasks. Some tasks lend themselves to the Transformer's encoder structure, while others are better suited to the decoder. Still other tasks make use of both the encoder and the decoder.
+
+### How language models work[[how-language-models-work]]
+
+Language models are trained to predict the probability of a word given the context of the surrounding words. This gives them a foundational understanding of language that can generalize to other tasks.
+
+There are two main approaches for training a Transformer model:
+
+1.  **Masked language modeling (MLM)**: Used by encoder models like BERT, this approach randomly masks some tokens in the input and trains the model to predict the original tokens based on the surrounding context. This allows the model to learn bidirectional context (looking at words both before and after the masked word).
+
+2.  **Causal language modeling (CLM)**: Used by decoder models like GPT, this approach predicts the next token based on all previous tokens in the sequence. The model can only use context from the left (previous tokens) to predict the next token.
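+
+As a rough illustration of the two objectives above, the following sketch builds one training example of each kind from a toy token list. The helper names (`make_clm_example`, `make_mlm_example`), the `mask_prob` parameter, and the use of `None` for ignored positions are assumptions made for this example; real implementations work on token IDs and apply additional masking rules.
+
```python
import random

MASK = "[MASK]"


def make_clm_example(tokens):
    """Causal LM: each input position is paired with the token that follows it."""
    return tokens[:-1], tokens[1:]


def make_mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM: hide some tokens; only hidden positions carry a label."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the model must recover the original token
        else:
            inputs.append(tok)
            labels.append(None)  # position is ignored by the loss
    return inputs, labels


tokens = ["the", "cat", "sat", "on", "the", "mat"]
clm_inputs, clm_labels = make_clm_example(tokens)
mlm_inputs, mlm_labels = make_mlm_example(tokens, mask_prob=0.5)
```
+
+The causal example only ever pairs a position with the token that follows it, while the masked example can place a label anywhere in the sequence, which is why an encoder trained this way can use context from both sides.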
+
+### Types of language models[[types-of-language-models]]
+
+In the Transformers library, language models generally fall into three architectural categories:
+
+1.  **Encoder-only models** (like BERT): These models use a bidirectional approach to understand context from both directions. They are best suited for tasks that require a deep understanding of text, such as classification, named entity recognition, and question answering.
+
+2.  **Decoder-only models** (like GPT, Llama): These models process text from left to right and are particularly good at text generation tasks. They can complete sentences, write essays, or even generate code based on a prompt.
+
+3.  **Encoder-decoder models** (like T5, BART): These models combine both approaches, using an encoder to understand the input and a decoder to generate the output. They excel at sequence-to-sequence tasks such as translation, summarization, and question answering.
+
+![transformer-models-for-language](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_architecture.png)
+
+As we covered in the previous section, language models are typically pretrained on large amounts of text data in a self-supervised manner (without human annotations), then fine-tuned on specific tasks. This approach, known as transfer learning, allows these models to adapt to many different NLP tasks with relatively small amounts of task-specific data.
+
+In the following sections, we'll explore specific model architectures and how they are applied to various tasks across the speech, vision, and text domains.
+
+> [!TIP]
+> Understanding which part of the Transformer architecture (encoder, decoder, or both) is best suited to a particular NLP task is key to choosing the right model. Generally, tasks requiring bidirectional context use encoders, tasks generating text use decoders, and tasks converting one sequence into another use encoder-decoders.
+
+### Text generation[[text-generation]]
+
+Text generation involves creating coherent and contextually relevant text based on a prompt or input.
+
+[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2) is a decoder-only model pretrained on a large amount of text. Given a prompt, it can generate convincing (though not always accurate) text, and it can complete other NLP tasks like question answering despite not being explicitly trained to do so.
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/gpt2_architecture.png"/>
+</div>
+
+1.  GPT-2 uses [byte pair encoding (BPE)](https://huggingface.co/docs/transformers/tokenizer_summary#bytepair-encoding-bpe) to tokenize words and generate a token embedding. Positional encodings are added to the token embeddings to indicate the position of each token in the sequence. The input embeddings are passed through multiple decoder blocks to output some final hidden state. Within each decoder block, GPT-2 uses a *masked self-attention* layer, which means GPT-2 can't attend to future tokens. It is only allowed to attend to tokens on the left (the previous tokens). This is different from BERT's `[MASK]` token because, in masked self-attention, an attention mask is applied so that future tokens receive an attention weight of `0`.
+
+2.  The output from the decoder is passed to a language modeling head, which performs a linear transformation to convert the hidden states into logits. The label for each position is the next token in the sequence, so the logits and the labels are offset from each other by one position. The cross-entropy loss is calculated between the shifted logits and the labels, and the model outputs the most likely next token.
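+
+The masked self-attention described above can be sketched in plain Python. This is a minimal illustration, not GPT-2's actual implementation: the function name `causal_attention_weights` is made up for this example, and it assumes the common approach of setting the scores of future positions to negative infinity before the softmax, which makes their attention weights exactly `0`.
+
```python
import math


def causal_attention_weights(scores):
    """Mask future positions in a square score matrix, then softmax each row."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only attend to positions 0..i; later ones get -inf.
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        row_max = max(masked)
        exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy input: four tokens whose raw attention scores are all equal.
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
```
+
+With equal raw scores, the first token attends only to itself, the second splits its attention evenly over the first two tokens, and so on; no row ever places weight on a later position.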
+
+GPT-2's pretraining objective is based on [causal language modeling](https://huggingface.co/docs/transformers/glossary#causal-language-modeling), predicting the next word in a sequence. This makes GPT-2 especially good at tasks that involve generating text.
+
+Ready to try your hand at text generation? Check out our complete [causal language modeling guide](https://huggingface.co/docs/transformers/tasks/language_modeling#causal-language-modeling) to learn how to fine-tune DistilGPT-2 and use it for inference.
+
+> [!TIP]
+> For more information about text generation, check out the [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies) guide.
+
+### Text classification[[text-classification]]
+
+Text classification involves assigning predefined categories to text documents (for example, sentiment analysis, topic classification, or spam detection).
+
+[BERT](https://huggingface.co/docs/transformers/model_doc/bert) is an encoder-only model and the first model to effectively implement deep bidirectionality, learning richer representations of the text by attending to words on both sides.
+
+1.  BERT uses [WordPiece](https://huggingface.co/docs/transformers/tokenizer_summary#wordpiece) tokenization to generate a token embedding of the text. To tell the difference between a single sentence and a pair of sentences, a special `[SEP]` token is added to separate them. A special `[CLS]` token is added to the beginning of every sequence of text. The final output at the `[CLS]` token is used as the input to the classification head for classification tasks. BERT also adds a segment embedding to denote whether a token belongs to the first or the second sentence in a pair of sentences.
+
+2.  BERT is pretrained with two objectives: masked language modeling and next-sentence prediction. In masked language modeling, some percentage of the input tokens are randomly masked, and the model needs to predict them. This solves the issue of bidirectionality, where the model could cheat by seeing all the words and "predicting" the next word. The final hidden states of the predicted masked tokens are passed to a feedforward network with a softmax over the vocabulary to predict the masked word.
+
+    The second pretraining objective is next-sentence prediction. The model must predict whether sentence B follows sentence A. Half of the time sentence B is the next sentence, and the other half of the time sentence B is a random sentence. The prediction of whether it is the next sentence or not is passed to a feedforward network with a softmax over the two classes (`IsNext` and `NotNext`).
+
+3.  The input embeddings are passed through multiple encoder layers to output some final hidden states.
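+
+To make the special tokens and segment embeddings from step 1 concrete, here is a small sketch of how a single sentence or a sentence pair could be laid out before it reaches the encoder. The function name `build_bert_inputs` is hypothetical, and the example operates on word strings rather than WordPiece token IDs to keep it short.
+
```python
def build_bert_inputs(sentence_a, sentence_b=None):
    """Return (tokens, segment_ids) for one sentence or a sentence pair."""
    # Every sequence starts with [CLS]; each sentence is closed by [SEP].
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if sentence_b is not None:
        tokens += sentence_b + ["[SEP]"]
        # Tokens of the second sentence (and its [SEP]) get segment id 1.
        segment_ids += [1] * (len(sentence_b) + 1)
    return tokens, segment_ids


single_tokens, single_segments = build_bert_inputs(["hello", "world"])
pair_tokens, pair_segments = build_bert_inputs(
    ["how", "are", "you"], ["fine", "thanks"]
)
```
+
+The segment IDs are what the segment embedding is looked up from, so the model can tell which sentence each token belongs to.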
+
+To use the pretrained model for text classification, add a sequence classification head on top of the base BERT model. The sequence classification head is a linear layer that accepts the final hidden states and performs a linear transformation to convert them into logits. The cross-entropy loss is calculated between the logits and the target to find the most likely label.
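+
+The classification head described above amounts to one linear layer followed by a cross-entropy loss. The sketch below spells that out in plain Python with a toy 3-dimensional hidden state and two labels; the helper names and all numeric values are invented for illustration, and a real model would use a much larger hidden size and a tensor library.
+
```python
import math


def linear(hidden, weight, bias):
    """logits[k] = sum_d hidden[d] * weight[k][d] + bias[k]"""
    return [
        sum(h * w for h, w in zip(hidden, row)) + b
        for row, b in zip(weight, bias)
    ]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct label."""
    return -math.log(softmax(logits)[target])


cls_hidden = [0.5, -1.0, 2.0]                 # toy final hidden state of [CLS]
weight = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # one row of weights per label
bias = [0.0, 0.0]

logits = linear(cls_hidden, weight, bias)
predicted = max(range(len(logits)), key=logits.__getitem__)
loss = cross_entropy(logits, target=1)
```
+
+During fine-tuning the loss is what gets minimized; at inference time only the `argmax` over the logits is needed to pick the label.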
+
+Ready to try your hand at text classification? Check out our complete [text classification guide](https://huggingface.co/docs/transformers/tasks/sequence_classification) to learn how to fine-tune DistilBERT and use it for inference.
+
+### Token classification[[token-classification]]
+
+Token classification involves assigning a label to each token in a sequence, as in named entity recognition or part-of-speech tagging.
+
+To use BERT for token classification tasks like named entity recognition (NER), add a token classification head on top of the base BERT model. The token classification head is a linear layer that accepts the final hidden states and performs a linear transformation to convert them into logits. The cross-entropy loss is calculated between the logits and the label of each token to find the most likely label for each token.
+
+Ready to try your hand at token classification? Check out our complete [token classification guide](https://huggingface.co/docs/transformers/tasks/token_classification) to learn how to fine-tune DistilBERT and use it for inference.
+
+### Question answering[[question-answering]]
+
+Question answering involves finding the answer to a question within a given context or passage.
+
+To use BERT for question answering, add a span classification head on top of the base BERT model. This linear layer accepts the final hidden states and performs a linear transformation to compute the `span` start and end logits corresponding to the answer. The cross-entropy loss is calculated between the logits and the label positions to find the most likely span of text corresponding to the answer.
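+
+Once a span classification head has produced start and end logits, an answer still has to be read off from them. One common heuristic, assumed here rather than taken from the text above, is to pick the pair of positions with the highest summed score such that the end does not come before the start. The function name `best_span`, the `max_answer_len` parameter, and the logit values are all illustrative.
+
```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logit + end_logit with end >= start."""
    best = (0, 0)
    best_score = float("-inf")
    for start, start_logit in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


context = ["paris", "is", "the", "capital", "of", "france"]
start_logits = [0.1, 0.2, 0.3, 2.5, 0.1, 0.4]   # toy values
end_logits = [0.0, 0.1, 0.2, 0.9, 0.3, 3.0]     # toy values

start, end = best_span(start_logits, end_logits)
answer = context[start:end + 1]
```
+
+Here the highest-scoring valid pair starts at "capital" and ends at "france", so the extracted answer is the three-token span between them.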
+
+Ready to try your hand at question answering? Check out our complete [question answering guide](https://huggingface.co/docs/transformers/tasks/question_answering) to learn how to fine-tune DistilBERT and use it for inference.
+
+> [!TIP]
+> 💡 Notice how easy it is to use BERT for different tasks once it has been pretrained. You only need to add a specific head on top of the pretrained model to get the output you want.
+
+### Summarization[[summarization]]
+
+Summarization involves condensing a longer text into a shorter version while preserving its key information and meaning.
+
+Encoder-decoder models like [BART](https://huggingface.co/docs/transformers/model_doc/bart) and [T5](https://huggingface.co/docs/transformers/model_doc/t5) are designed for the sequence-to-sequence pattern of a summarization task. We'll explain how BART works in this section, and then you can try fine-tuning T5 yourself.
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bart_architecture.png"/>
+</div>
+
+1.  BART's encoder architecture is very similar to BERT's and accepts a token and positional embedding of the text. BART is pretrained by corrupting the input and then reconstructing it with the decoder. Unlike other encoders with specific corruption strategies, BART can apply any type of corruption. The *text infilling* corruption strategy works best, though. In text infilling, a number of text spans are each replaced with a **single** mask token. This is important because the model has to predict the masked tokens, and it teaches the model to predict how many tokens are missing. The input embeddings and masked spans are passed through the encoder to output some final hidden states, but unlike BERT, BART doesn't add a final feedforward network at the end to predict a word.
+
+2.  The encoder's output is passed to the decoder, which must predict the masked tokens and any uncorrupted tokens from the encoder's output. This gives the decoder additional context to help it restore the original text. The output from the decoder is passed to a language modeling head, which performs a linear transformation to convert the hidden states into logits. The cross-entropy loss is calculated between the logits and the label, which is just the token shifted to the right.
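+
+The text infilling corruption from step 1 is easy to picture with a toy example. The sketch below replaces each chosen span with a single mask token, whatever the span's length. The function name `text_infill`, the `<mask>` string, and the hand-picked spans are assumptions for illustration; how spans are sampled during actual pretraining is not covered here.
+
```python
def text_infill(tokens, spans, mask_token="<mask>"):
    """Replace each (start, length) span with ONE mask token.

    `spans` must be sorted by start position and must not overlap.
    """
    corrupted = []
    cursor = 0
    for start, length in spans:
        corrupted.extend(tokens[cursor:start])  # keep the text before the span
        corrupted.append(mask_token)            # one mask, regardless of length
        cursor = start + length
    corrupted.extend(tokens[cursor:])           # keep the remaining text
    return corrupted


original = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
corrupted = text_infill(original, spans=[(1, 2), (5, 3)])
```
+
+Because a two-token span and a three-token span both collapse to one mask, the model cannot read the number of missing tokens off the input and has to learn to predict it.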
+
+Ready to try your hand at summarization? Check out our complete [summarization guide](https://huggingface.co/docs/transformers/tasks/summarization) to learn how to fine-tune T5 and use it for inference.
+
+> [!TIP]
+> For more information about text generation, check out the [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies) guide.
+
+### Translation[[translation]]
+
+Translation involves converting text from one language to another while preserving its meaning. Translation is another example of a sequence-to-sequence task, which means you can use an encoder-decoder model like [BART](https://huggingface.co/docs/transformers/model_doc/bart) or [T5](https://huggingface.co/docs/transformers/model_doc/t5) for it. We'll explain how BART works in this section, and then you can try fine-tuning T5 yourself.
+
+BART adapts to translation by adding a separate, randomly initialized encoder that maps a source language to an input that can be decoded into the target language. This new encoder's embeddings are passed to the pretrained encoder instead of the original word embeddings. The source encoder is trained by updating the source encoder, positional embeddings, and input embeddings with the cross-entropy loss from the model output. The remaining model parameters are frozen in this first step, and all the model parameters are trained together in the second step. BART has since been followed by mBART, a multilingual version intended for translation and pretrained on many different languages.
+
+Ready to try your hand at translation? Check out our complete [translation guide](https://huggingface.co/docs/transformers/tasks/translation) to learn how to fine-tune T5 and use it for inference.
+
+> [!TIP]
+> As you have seen throughout this guide, many models follow similar patterns even when they address different tasks. Understanding these common patterns can help you quickly grasp how new models work and how to adapt existing models to your own needs.
+
+## Modalities beyond text[[modalities-beyond-text]]
+
+Transformers are not limited to text. They can also be applied to other modalities such as speech and audio, images, and video. This course focuses on text, but we will briefly introduce the other modalities.
+
+### Speech and audio[[speech-and-audio]]
+
+Let's start by exploring how Transformer models handle speech and audio data, which present unique challenges compared to text or images.
+
+[Whisper](https://huggingface.co/docs/transformers/main/en/model_doc/whisper) is an encoder-decoder (sequence-to-sequence) transformer pretrained on 680,000 hours of labeled audio data. This amount of pretraining data enables zero-shot performance on audio tasks in English and many other languages. The decoder allows Whisper to map the encoder's learned speech representations to useful outputs, such as text, without additional fine-tuning. Whisper works out of the box.
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/whisper_architecture.png"/>
+</div>
+
+Diagram is from the [Whisper paper](https://huggingface.co/papers/2212.04356).
+
+This model has two main components:
+
+1.  **Encoder**: Processes the input audio. The raw audio is first converted into a log-Mel spectrogram. This spectrogram is then passed through a Transformer encoder network.
+
+2.  **Decoder**: Takes the encoded audio representation and autoregressively predicts the corresponding text tokens. It is a standard Transformer decoder trained to predict the next text token given the previous tokens and the encoder output. Special tokens at the beginning of the decoder input steer the model towards specific tasks like transcription, translation, or language identification.
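+
+The autoregressive loop in the decoder description can be sketched without any model at all. In the example below, `greedy_decode` is a hypothetical helper, `toy_next_token` is a stand-in that simply replays a fixed transcript instead of looking at audio, and the special-token spellings are illustrative and should be treated as assumptions rather than the model's exact vocabulary.
+
```python
def greedy_decode(next_token_fn, prefix, eos_token, max_new_tokens=20):
    """Append one token at a time until eos_token or the length limit."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        next_token = next_token_fn(tokens)  # depends on all previous tokens
        tokens.append(next_token)
        if next_token == eos_token:
            break
    return tokens


# Task prefix: special tokens that steer the decoder (illustrative spellings).
prefix = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>"]
EOS = "<|endoftext|>"

# Stand-in for the real decoder: replays a fixed transcript, ignoring audio.
TOY_TRANSCRIPT = ["I", "have", "a", "dream", EOS]


def toy_next_token(tokens):
    generated_so_far = len(tokens) - len(prefix)
    return TOY_TRANSCRIPT[generated_so_far]


decoded = greedy_decode(toy_next_token, prefix, EOS)
transcript = decoded[len(prefix):-1]  # drop the task prefix and the EOS token
```
+
+In a real system the stand-in would be replaced by a forward pass through the decoder, conditioned on the encoder output, and changing the prefix tokens is what switches the task.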
+
+Whisper is pretrained on a massive and diverse dataset of 680,000 hours of labeled audio data collected from the web. This large-scale, weakly supervised pretraining is the key to its strong zero-shot performance across many languages, accents, and tasks without task-specific fine-tuning.
+
+Now that Whisper is pretrained, you can use it directly for zero-shot inference, or fine-tune it on your own data for better performance on specific tasks like automatic speech recognition or speech translation.
+
+> [!TIP]
+> The key innovation in Whisper is its training on an unprecedented scale of diverse, weakly supervised audio data from the internet. This allows it to generalize remarkably well to different languages, accents, and tasks without task-specific fine-tuning.
+
+### Automatic speech recognition[[automatic-speech-recognition]]
+
+To use the pretrained model for automatic speech recognition, you leverage its full encoder-decoder structure. The encoder processes the audio input, and the decoder autoregressively generates the transcript token by token. When fine-tuning, the model is typically trained with a standard sequence-to-sequence loss (such as cross-entropy) to predict the correct text tokens based on the audio input.
+
+The easiest way to use a fine-tuned model for inference is within a `pipeline`.
+
+```python
+from transformers import pipeline
+
+transcriber = pipeline(
+    task="automatic-speech-recognition", model="openai/whisper-base.en"
+)
+transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
+# Output: {'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
+```
+
+Ready to try your hand at automatic speech recognition? Check out our complete [automatic speech recognition guide](https://huggingface.co/docs/transformers/tasks/asr) to learn how to fine-tune Whisper and use it for inference.
+
+### Computer vision[[computer-vision]]
+
+Now let's move on to computer vision tasks, which deal with understanding and interpreting visual information from images or videos.
+
+There are two ways to approach computer vision tasks:
+
+1.  Split an image into a sequence of patches and process them in parallel with a Transformer.
+2.  Use a modern CNN, like [ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext), which relies on convolutional layers but adopts modern network designs.
+
+> [!TIP]
+> A third approach mixes Transformers with convolutions (for example, [Convolutional Vision Transformer](https://huggingface.co/docs/transformers/model_doc/cvt) or [LeViT](https://huggingface.co/docs/transformers/model_doc/levit)). We won't discuss those because they just combine the two approaches examined here.
+
+ViT and ConvNeXT are both commonly used for image classification, but for other vision tasks like object detection, segmentation, and depth estimation, we'll look at DETR, Mask2Former, and GLPN, respectively. These models are better suited to those tasks.
+
+### Image classification[[image-classification]]
+
+Image classification is one of the fundamental computer vision tasks. Let's see how different model architectures approach this problem.
+
+ViT and ConvNeXT can both be used for image classification. The main difference is that ViT uses an attention mechanism while ConvNeXT uses convolutions.
+
+[ViT](https://huggingface.co/docs/transformers/model_doc/vit) replaces convolutions entirely with a pure Transformer architecture. If you are familiar with the original Transformer, you are already most of the way toward understanding ViT.
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vit_architecture.jpg"/>
+</div>
+
+The main change ViT introduced was in how images are fed to a Transformer:
+
+1.  An image is split into square, non-overlapping patches, each of which gets turned into a vector or *patch embedding*. The patch embeddings are generated by a convolutional 2D layer, which creates the proper input dimensions (for a base Transformer, 768 values for each patch embedding). A 224x224 pixel image can be split into 196 image patches of 16x16 pixels. Just as text is tokenized into words, an image is "tokenized" into a sequence of patches.
+
+2.  A *learnable embedding* - a special `[CLS]` token - is added to the beginning of the patch embeddings, just like in BERT. The final hidden state of the `[CLS]` token is used as the input to the attached classification head; the other outputs are ignored. This token helps the model learn how to encode a representation of the image.
+
+3.  The last thing to add to the patch and learnable embeddings are the *position embeddings*, because the model doesn't know how the image patches are ordered. The position embeddings are also learnable and have the same size as the patch embeddings. Finally, all of the embeddings are passed to the Transformer encoder.
+
+4.  The output, specifically only the output at the `[CLS]` token, is passed to a multilayer perceptron head (MLP). ViT's pretraining objective is simply classification. Like other classification heads, the MLP head converts the output into logits over the class labels and calculates the cross-entropy loss to find the most likely class.
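+
+The patch arithmetic in step 1 can be checked with a short sketch. It cuts a toy 224x224 RGB image, stored as nested lists, into 16x16 patches and flattens each one. The function name `split_into_patches` is made up for this example, and flattening is only a stand-in for the learned convolutional projection that produces the actual patch embeddings.
+
```python
def split_into_patches(image, patch_size):
    """Cut an H x W x C nested-list image into flattened square patches.

    Patches are returned in row-major order (left to right, top to bottom).
    """
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                channel
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for channel in pixel
            ]
            patches.append(patch)
    return patches


# Toy all-black RGB image: 224 rows x 224 columns x 3 channels.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = split_into_patches(image, patch_size=16)
num_patches = len(patches)
values_per_patch = len(patches[0])
```
+
+The image yields (224 / 16)² = 196 patches, and each 16x16 RGB patch holds 16 × 16 × 3 = 768 raw values, which happens to match the 768-value embedding size mentioned above for a base model.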
+
+Ready to try your hand at image classification? Check out our complete [image classification guide](https://huggingface.co/docs/transformers/tasks/image_classification) to learn how to fine-tune ViT and use it for inference.
+
+
+> [!TIP]
+> Notice the parallel between ViT and BERT: both use a special token (<code>[CLS]</code>) to capture the overall representation, both add position information to their embeddings, and both use a Transformer encoder to process the sequence of tokens/patches.
+
+## Glossary
+
+*   **Natural Language Processing (NLP)**: A subfield of artificial intelligence (AI) that enables computers to understand, interpret, and generate human language. Examples include text analysis and translation.
+*   **Large Language Models (LLMs)**: Very large AI models that can understand and generate human language. They are trained on vast amounts of data and can perform many language tasks, such as writing and answering questions.
+*   **Transformer Models**: A deep learning architecture that has been highly successful in natural language processing (NLP). It uses an "attention mechanism" to learn the relationships between words in text.
+*   **Encoder**: The part of the Transformer architecture that processes input data (for example, text) and turns it into a representation.
+*   **Decoder**: The part of the Transformer architecture that uses the representation produced by the encoder to generate output data (for example, translated text).
+*   **Encoder-Decoder Structure**: A Transformer architecture containing both an encoder and a decoder, used for tasks that turn an input sequence into an output sequence, such as translation.
+*   **Architecture**: The design or structure of a machine learning model.
+*   **Input Data**: The data fed into a model.
+*   **Output**: The results a model produces.
+*   **Predictions**: The results a model predicts.
+*   **Audio Classification**: Assigning audio samples to predefined categories.
+*   **Automatic Speech Recognition (ASR)**: Technology that automatically converts spoken language into text.
+*   **Image Classification**: Assigning images to predefined categories.
+*   **Object Detection**: Finding the objects in an image and determining their locations.
+*   **Image Segmentation**: Assigning the pixels of an image to distinct objects or regions.
+*   **Depth Estimation**: Estimating the distance from the camera of the objects in an image.
+*   **Text Classification**: Assigning text documents to predefined categories.
+*   **Token Classification**: Assigning a label to each token in a text sequence.
+*   **Question Answering**: Finding the answer to a question within a given text.
+*   **Text Generation**: Using AI models to create new, human-like text.
+*   **Summarization**: Shortening a long text while preserving its key information and meaning.
+*   **Translation**: စာသားကို ဘာသာစကားတစ်ခုမှ အခြားဘာသာစကားတစ်ခုသို့ အဓိပ္ပာယ်မပျက် ဘာသာပြန်ခြင်း။
+*   **Attention Mechanism**: Transformer မော်ဒယ်များတွင် အသုံးပြုသော နည်းစနစ်တစ်ခုဖြစ်ပြီး input sequence ၏ မတူညီသော အစိတ်အပိုင်းများအပေါ် အာရုံစိုက်ပြီး ဆက်နွယ်မှုများကို သင်ယူစေသည်။
+*   **Language Models**: လူသားဘာသာစကားကို နားလည်ပြီး ထုတ်ပေးနိုင်ရန် ဒီဇိုင်းထုတ်ထားသော Machine Learning မော်ဒယ်များ။
+*   **Tokens**: စာသားတစ်ခု၏ အသေးငယ်ဆုံးသော အစိတ်အပိုင်းများ (ဥပမာ- စကားလုံးများ၊ စာလုံးများ)။
+*   **Machine Translation**: ဘာသာစကားတစ်ခုကနေ အခြားဘာသာစကားတစ်ခုကို စာသားတွေ ဒါမှမဟုတ် စကားပြောတွေကို အလိုအလျောက် ဘာသာပြန်ဆိုခြင်း။
+*   **Bidirectional Context**: စာသားတစ်ခုကို စကားလုံးတစ်လုံးရဲ့ အရှေ့နဲ့ အနောက် နှစ်ဖက်လုံးကနေ ကြည့်ရှုပြီး နားလည်ခြင်း။
+*   **Masked Language Modeling (MLM)**: input tokens အချို့ကို ဖုံးကွယ်ထားပြီး မော်ဒယ်ကို ၎င်းတို့ကို ခန့်မှန်းစေရန် လေ့ကျင့်သော pretraining နည်းလမ်း။
+*   **Causal Language Modeling (CLM)**: input sequence ၏ အရင် tokens များပေါ် အခြေခံပြီး နောက် token ကို ခန့်မှန်းစေရန် မော်ဒယ်ကို လေ့ကျင့်သော pretraining နည်းလမ်း။
+*   **Named Entity Recognition (NER)**: စာသားထဲက လူအမည်၊ နေရာအမည်၊ အဖွဲ့အစည်းအမည် စတဲ့ သီးခြားအမည်တွေကို ရှာဖွေဖော်ထုတ်ခြင်း။
+*   **Part-of-Speech (POS) Tagging**: စာကြောင်းတစ်ခုရှိ စကားလုံးတစ်လုံးစီကို သက်ဆိုင်ရာ သဒ္ဒါအမျိုးအစား (ဥပမာ- နာမ်၊ ကြိယာ၊ နာမဝိသေသန) ကို သတ်မှတ်ပေးခြင်း။
+*   **Self-supervised**: ဒေတာများကို လူသားများက လက်ဖြင့် မှတ်သား (annotate) ရန် မလိုအပ်ဘဲ ဒေတာကိုယ်တိုင်ကနေ သင်ယူနိုင်သော လေ့ကျင့်မှုနည်းလမ်း။
+*   **Human Annotations**: လူသားများက ဒေတာများကို လက်ဖြင့် မှတ်သားခြင်း သို့မဟုတ် အညွှန်းတပ်ခြင်း။
+*   **Transfer Learning**: ကြိုတင်လေ့ကျင့်ထားပြီးသား မော်ဒယ် (pre-trained model) တစ်ခုကို အခြားလုပ်ငန်းတာဝန်အသစ်တစ်ခုအတွက် ပြန်လည်အသုံးပြုခြင်း။
+*   **Byte Pair Encoding (BPE)**: စာသားများကို tokens အဖြစ် ပြောင်းလဲရန် အသုံးပြုသော tokenization နည်းလမ်းတစ်ခု။
+*   **Token Embedding**: tokens များကို vector ပုံစံဖြင့် ကိုယ်စားပြုခြင်း။
+*   **Positional Encodings**: sequence တစ်ခုရှိ token တစ်ခုချင်းစီ၏ တည်နေရာ အချက်အလက်များကို ထပ်ထည့်ပေးခြင်း။
+*   **Decoder Blocks**: Transformer decoder ၏ အစိတ်အပိုင်းများ။
+*   **Masked Self-Attention**: Transformer decoder တွင် အသုံးပြုသော attention mechanism တစ်မျိုးဖြစ်ပြီး မော်ဒယ်ကို future tokens များသို့ ကြည့်ရှုခွင့်မပြုပါ။
+*   **Attention Mask**: attention mechanism တွင် အချို့ tokens များကို လျစ်လျူရှုရန် သို့မဟုတ် ၎င်းတို့၏ score ကို သုညသတ်မှတ်ရန် အသုံးပြုသော mask တစ်ခု။
+*   **Language Modeling Head**: မော်ဒယ်၏ hidden states များကို logits အဖြစ် ပြောင်းလဲပေးသည့် layer။
+*   **Linear Transformation**: သင်္ချာဆိုင်ရာ အပြောင်းအလဲတစ်ခုဖြစ်ပြီး input vector ကို output vector အဖြစ် ပြောင်းလဲပေးသည်။
+*   **Logits**: မော်ဒယ်၏ output မတိုင်မီ raw, unnormalized prediction scores များ။
+*   **Cross-Entropy Loss**: classification လုပ်ငန်းတာဝန်များတွင် အသုံးပြုသော loss function တစ်ခုဖြစ်ပြီး မော်ဒယ်၏ ခန့်မှန်းချက်များနှင့် အမှန်တကယ် labels များကြား ခြားနားချက်ကို တိုင်းတာသည်။
+*   **WordPiece**: စာသားများကို tokens အဖြစ် ပြောင်းလဲရန် BERT မှ အသုံးပြုသော tokenization နည်းလမ်းတစ်ခု။
+*   **`[SEP]` Token**: စာကြောင်းများကြား ခွဲခြားရန် အသုံးပြုသော အထူး token။
+*   **`[CLS]` Token**: စာကြောင်းတစ်ခု၏ အစတွင် ထည့်သွင်းပြီး စာကြောင်းတစ်ခုလုံး၏ ကိုယ်စားပြုမှုကို ဖမ်းယူရန် အသုံးပြုသော အထူး token။
+*   **Segment Embedding**: token တစ်ခုက စာကြောင်းတစ်စုံမှာ ပထမ သို့မဟုတ် ဒုတိယစာကြောင်းမှာ ပါဝင်သည်ကို ဖော်ပြသော embedding။
+*   **Feedforward Network**: neural network တစ်ခု၏ အခြေခံ layer တစ်ခု။
+*   **Softmax**: multi-class classification တွင် ဖြစ်နိုင်ခြေများကို တွက်ချက်ရန် အသုံးပြုသော activation function တစ်ခု။
+*   **Next-Sentence Prediction**: မော်ဒယ်ကို စာကြောင်း B က စာကြောင်း A နောက်က လိုက်သလားဆိုတာ ခန့်မှန်းစေရန် လေ့ကျင့်သော pretraining လုပ်ငန်းတာဝန်။
+*   **Sequence Classification Head**: sequence classification လုပ်ငန်းတာဝန်များအတွက် မော်ဒယ်၏ output တွင် ထပ်ထည့်သော linear layer။
+*   **Token Classification Head**: token classification လုပ်ငန်းတာဝန်များအတွက် မော်ဒယ်၏ output တွင် ထပ်ထည့်သော linear layer။
+*   **Span Classification Head**: question answering လုပ်ငန်းတာဝန်များအတွက် မော်ဒယ်၏ output တွင် ထပ်ထည့်သော linear layer ဖြစ်ပြီး အဖြေ၏ start/end positions များကို ခန့်မှန်းသည်။
+*   **Corrupting**: မော်ဒယ်ကို လေ့ကျင့်ရန်အတွက် input data တွင် ရည်ရွယ်ချက်ရှိရှိ အပြောင်းအလဲများ ပြုလုပ်ခြင်း။
+*   **Text Infilling**: စာသားအပိုင်းအချို့ကို ဖုံးကွယ်ထားပြီး မော်ဒယ်ကို ၎င်းတို့ကို ခန့်မှန်းစေရန် လေ့ကျင့်သော corruption strategy။
+*   **Log-Mel Spectrogram**: အသံအချက်ပြမှုတစ်ခု၏ ကြိမ်နှုန်းနှင့် အချိန်အလိုက် ပြောင်းလဲမှုများကို ပုံရိပ်အဖြစ် ကိုယ်စားပြုခြင်း။
+*   **Autoregressively**: အရင်က ခန့်မှန်းထားတဲ့ outputs တွေပေါ် အခြေခံပြီး နောက် output ကို ခန့်မှန်းတဲ့ လုပ်ငန်းစဉ်။
+*   **Zero-shot Performance**: မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းအတွက် လေ့ကျင့်ထားခြင်းမရှိဘဲ လုပ်ငန်းအသစ်တစ်ခုကို လုပ်ဆောင်နိုင်စွမ်း။
+*   **Weakly Supervised Pretraining**: လူသားမှတ်သားမှု (human annotations) နည်းပါးသော သို့မဟုတ် မရှိသော ဒေတာများကို အသုံးပြု၍ မော်ဒယ်ကို ကြိုတင်လေ့ကျင့်ခြင်း။
+*   **Pipeline**: Hugging Face Transformers library တွင် ပါဝင်သော လုပ်ဆောင်ချက်တစ်ခုဖြစ်ပြီး မော်ဒယ်များကို သီးခြားလုပ်ငန်းတာဝန်များအတွက် အသုံးပြုရလွယ်ကူစေရန် ကူညီပေးသည်။
+*   **Patches**: ပုံတစ်ပုံကို ခွဲခြမ်းထားသော သေးငယ်သော အစိတ်အပိုင်းများ။
+*   **Convolutional 2D Layer**: ပုံများကို လုပ်ဆောင်ရန် အသုံးပြုသော neural network layer တစ်မျိုး။
+*   **Multilayer Perceptron (MLP) Head**: classification လုပ်ငန်းတာဝန်များအတွက် အသုံးပြုသော feedforward neural network layer။
+*   **Convolutional Neural Network (CNN)**: ပုံများနှင့် ဗီဒီယိုများကို လုပ်ဆောင်ရန် အထူးဒီဇိုင်းထုတ်ထားသော neural network အမျိုးအစားတစ်ခု။
+*   **Convolutional Layers**: CNN ၏ အဓိက အစိတ်အပိုင်းများဖြစ်ပြီး ပုံများမှ features များကို ထုတ်ယူရန် အသုံးပြုသည်။
\ No newline at end of file
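Review note on the ViT steps at the end of the file above: the text says that a `[CLS]` token is added to the patch embeddings and that the learnable position embeddings have the same size as the patch embeddings. A minimal sketch of the resulting shapes follows. The function name and the concrete values (224-pixel images, 16-pixel patches, hidden size 768) are assumptions chosen for illustration and do not appear in the chapter.

```python
def vit_sequence_shape(image_size, patch_size, hidden_dim):
    """Return the token counts and embedding shapes for a square image.

    Hypothetical helper: it only does the shape arithmetic described in
    the chapter and does not build a real model.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    # One extra position for the special [CLS] token.
    sequence_length = num_patches + 1

    return {
        "num_patches": num_patches,
        "sequence_length": sequence_length,
        # Patch embeddings (with [CLS]) and position embeddings share a shape,
        # so they can be added element-wise before the Transformer encoder.
        "patch_embeddings": (sequence_length, hidden_dim),
        "position_embeddings": (sequence_length, hidden_dim),
    }


# Assumed example configuration, not taken from the chapter.
shape = vit_sequence_shape(image_size=224, patch_size=16, hidden_dim=768)
print(shape)
```

Only the output at the `[CLS]` position is passed to the MLP head, which is why the sequence carries one more position than there are patches.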
diff --git a/chapters/my/chapter1/6.mdx b/chapters/my/chapter1/6.mdx
new file mode 100644
index 000000000..9619942e3
--- /dev/null
+++ b/chapters/my/chapter1/6.mdx
@@ -0,0 +1,235 @@
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+# Transformer Architectures များ[[transformer-architectures]]
+
+ယခင်အပိုင်းတွေမှာ Transformer architecture အကြောင်းကို မိတ်ဆက်ပေးခဲ့ပြီး ဒီမော်ဒယ်တွေက လုပ်ငန်းတာဝန်အမျိုးမျိုးကို ဘယ်လိုဖြေရှင်းပေးနိုင်လဲဆိုတာနဲ့ အဲဒီ architecture တွေကို မတူညီတဲ့ ဘာသာစကားလုပ်ငန်းတာဝန်တွေမှာ ဘယ်လိုအသုံးပြုလဲဆိုတာကို လေ့လာခဲ့ကြပါတယ်။
+
+ဒီအပိုင်းမှာတော့ Transformer မော်ဒယ်တွေရဲ့ အဓိက architecture ပုံစံသုံးမျိုးကို ပိုမိုနက်နဲစွာ လေ့လာပြီး တစ်ခုချင်းစီကို ဘယ်အချိန်မှာ အသုံးပြုသင့်လဲဆိုတာကို နားလည်အောင် လုပ်ဆောင်သွားပါမယ်။
+
+
+> [!TIP]
+> Transformer မော်ဒယ်အများစုဟာ architecture သုံးမျိုးထဲက တစ်ခုကို အသုံးပြုတယ်ဆိုတာ သတိရပါ။ အဲဒါတွေကတော့ encoder-only, decoder-only, ဒါမှမဟုတ် encoder-decoder (sequence-to-sequence) တို့ ဖြစ်ပါတယ်။ ဒီကွာခြားချက်တွေကို နားလည်ထားခြင်းက သင့်ရဲ့ သီးခြားလုပ်ငန်းတာဝန်အတွက် မှန်ကန်တဲ့ မော်ဒယ်ကို ရွေးချယ်နိုင်ဖို့ ကူညီပါလိမ့်မယ်။
+
+## Encoder မော်ဒယ်များ[[encoder-models]]
+
+<Youtube id="MUqNwgPjJvQ" />
+
+Encoder မော်ဒယ်တွေဟာ Transformer မော်ဒယ်ရဲ့ encoder အပိုင်းကိုသာ အသုံးပြုပါတယ်။ အဆင့်တိုင်းမှာ attention layers တွေက မူလစာကြောင်းထဲက စကားလုံးအားလုံးကို ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။ ဒီမော်ဒယ်တွေကို "bi-directional" attention ရှိတယ်လို့ မကြာခဏ ဖော်ပြလေ့ရှိပြီး *auto-encoding models* လို့လည်း ခေါ်ကြပါတယ်။
+
+ဒီမော်ဒယ်တွေရဲ့ pretraining က များသောအားဖြင့် ပေးထားတဲ့ စာကြောင်းတစ်ခုကို တစ်နည်းတစ်ဖုံ ဖျက်ဆီးပြီး (ဥပမာ- ကျပန်းစကားလုံးတွေကို ဝှက်ထားခြင်းဖြင့်) မူလစာကြောင်းကို ရှာဖွေခြင်း သို့မဟုတ် ပြန်လည်တည်ဆောက်ခြင်းတို့ကို မော်ဒယ်ကို တာဝန်ပေးခြင်းအပေါ် အခြေခံပါတယ်။
+
+Encoder မော်ဒယ်တွေဟာ စာကြောင်းအပြည့်အစုံကို နားလည်ဖို့ လိုအပ်တဲ့ လုပ်ငန်းတာဝန်များအတွက် အသင့်တော်ဆုံးဖြစ်ပြီး ဥပမာအားဖြင့် sentence classification, named entity recognition (နဲ့ ပိုမိုယေဘုယျအားဖြင့် word classification) နဲ့ extractive question answering တို့ပဲ ဖြစ်ပါတယ်။
+
+> [!TIP]
+> "[🤗 Transformers တွေက လုပ်ငန်းတာဝန်တွေကို ဘယ်လိုဖြေရှင်းပေးလဲ။](/chapter1/5)" မှာ ကျွန်တော်တို့ မြင်တွေ့ခဲ့ရသလို BERT လို encoder မော်ဒယ်တွေဟာ စာသားကို နားလည်ရာမှာ ထူးချွန်ပါတယ်။ ဘာကြောင့်လဲဆိုတော့ ၎င်းတို့ဟာ input တစ်ခုလုံးရဲ့ အကြောင်းအရာ (context) ကို နှစ်ဖက်စလုံးကနေ ကြည့်ရှုနိုင်လို့ပါ။ ဒါကြောင့် input တစ်ခုလုံးရဲ့ နားလည်မှုက အရေးကြီးတဲ့ လုပ်ငန်းတာဝန်တွေအတွက် ၎င်းတို့ဟာ အကောင်းဆုံးပါပဲ။
+
+ဒီမော်ဒယ်မိသားစုရဲ့ ကိုယ်စားပြုမော်ဒယ်တွေကတော့:
+
+- [BERT](https://huggingface.co/docs/transformers/model_doc/bert)
+- [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)
+- [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)
+
+## Decoder မော်ဒယ်များ[[decoder-models]]
+
+<Youtube id="d_ixlCubqQw" />
+
+Decoder မော်ဒယ်တွေဟာ Transformer မော်ဒယ်ရဲ့ decoder အပိုင်းကိုသာ အသုံးပြုပါတယ်။ အဆင့်တိုင်းမှာ ပေးထားတဲ့ စကားလုံးတစ်ခုအတွက် attention layers တွေက စာကြောင်းထဲမှာ အရင်ရှိနေတဲ့ စကားလုံးတွေကိုသာ ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။ ဒီမော်ဒယ်တွေကို မကြာခဏ *auto-regressive models* လို့ ခေါ်ကြပါတယ်။
+
+Decoder မော်ဒယ်တွေရဲ့ pretraining က များသောအားဖြင့် စာကြောင်းထဲက နောက်ထပ်စကားလုံးကို ခန့်မှန်းခြင်းအပေါ် အခြေခံပါတယ်။
+
+ဒီမော်ဒယ်တွေဟာ စာသားဖန်တီးခြင်း (text generation) နဲ့ပတ်သက်တဲ့ လုပ်ငန်းတာဝန်တွေအတွက် အသင့်တော်ဆုံး ဖြစ်ပါတယ်။
+
+> [!TIP]
+> GPT လို decoder မော်ဒယ်တွေကို တစ်ကြိမ်လျှင် token တစ်ခုချင်းစီကို ခန့်မှန်းခြင်းဖြင့် စာသားတွေ ဖန်တီးဖို့ ဒီဇိုင်းထုတ်ထားပါတယ်။ "[🤗 Transformers တွေက လုပ်ငန်းတာဝန်တွေကို ဘယ်လိုဖြေရှင်းပေးလဲ။](/chapter1/5)" မှာ ကျွန်တော်တို့ လေ့လာခဲ့ရသလို၊ ၎င်းတို့ဟာ ယခင် tokens တွေကိုသာ မြင်နိုင်တာကြောင့် ဖန်တီးမှုဆိုင်ရာ စာသားဖန်တီးခြင်းအတွက် အလွန်ကောင်းမွန်ပေမယ့် bi-directional နားလည်မှု လိုအပ်တဲ့ လုပ်ငန်းတာဝန်တွေအတွက်တော့ သိပ်မသင့်လျော်ပါဘူး။
+
+ဒီမော်ဒယ်မိသားစုရဲ့ ကိုယ်စားပြုမော်ဒယ်တွေကတော့:
+
+- [Hugging Face SmolLM Series](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)
+- [Meta's Llama Series](https://huggingface.co/docs/transformers/en/model_doc/llama4)
+- [Google's Gemma Series](https://huggingface.co/docs/transformers/main/en/model_doc/gemma3)
+- [DeepSeek's V3](https://huggingface.co/deepseek-ai/DeepSeek-V3)
+
+### ခေတ်မီ Large Language Models (LLMs) များ
+
+ခေတ်မီ Large Language Models (LLMs) အများစုဟာ decoder-only architecture ကို အသုံးပြုပါတယ်။ ဒီမော်ဒယ်တွေဟာ လွန်ခဲ့တဲ့ နှစ်အနည်းငယ်အတွင်း အရွယ်အစားနဲ့ စွမ်းရည်တွေ သိသိသာသာ ကြီးထွားလာခဲ့ပြီး အကြီးဆုံးမော်ဒယ်အချို့မှာ parameters ဘီလီယံပေါင်း ရာနဲ့ချီ ပါဝင်ပါတယ်။
+
+ခေတ်မီ LLMs တွေကို ပုံမှန်အားဖြင့် အဆင့်နှစ်ဆင့်နဲ့ လေ့ကျင့်ပါတယ်။
+1. **Pretraining**: မော်ဒယ်ဟာ များပြားလှတဲ့ စာသားဒေတာတွေပေါ်မှာ နောက်ထပ် token ကို ခန့်မှန်းဖို့ သင်ယူပါတယ်။
+2. **Instruction tuning**: မော်ဒယ်ဟာ ညွှန်ကြားချက်တွေကို လိုက်နာပြီး အထောက်အကူဖြစ်စေတဲ့ တုံ့ပြန်မှုတွေကို ဖန်တီးဖို့ fine-tune လုပ်ခံရပါတယ်။
+
+ဒီချဉ်းကပ်မှုက ကျယ်ပြန့်တဲ့ ခေါင်းစဉ်တွေနဲ့ လုပ်ငန်းတာဝန်မျိုးစုံမှာ လူသားဆန်တဲ့ စာသားတွေကို နားလည်ပြီး ဖန်တီးနိုင်တဲ့ မော်ဒယ်တွေကို ဖြစ်ပေါ်စေခဲ့ပါတယ်။
+
+#### ခေတ်မီ LLMs တွေရဲ့ အဓိက စွမ်းရည်များ
+
+ခေတ်မီ decoder-based LLMs တွေဟာ အထင်ကြီးစရာကောင်းတဲ့ စွမ်းရည်တွေကို ပြသခဲ့ပါတယ်။
+
+| စွမ်းရည်           | ဖော်ပြချက်                                            | ဥပမာ                                    |
+|--------------------|--------------------------------------------------------|------------------------------------------|
+| စာသားဖန်တီးခြင်း    | ဆက်စပ်မှုရှိပြီး အကြောင်းအရာနှင့်ကိုက်ညီသော စာသားများ ဖန်တီးခြင်း | စာစီစာကုံး၊ ပုံပြင်များ သို့မဟုတ် emails များ ရေးသားခြင်း |
+| အကျဉ်းချုပ်ခြင်း     | စာရွက်စာတမ်းရှည်များကို ပိုမိုတိုတောင်းသော ပုံစံများဖြင့် အကျဉ်းချုံးခြင်း | အစီရင်ခံစာများ၏ အနှစ်ချုပ်များ ဖန်တီးခြင်း |
+| ဘာသာပြန်ခြင်း      | ဘာသာစကားများအကြား စာသားများကို ပြောင်းလဲခြင်း           | အင်္ဂလိပ်မှ စပိန်သို့ ဘာသာပြန်ခြင်း        |
+| မေးခွန်းဖြေဆိုခြင်း | အချက်အလက်ဆိုင်ရာ မေးခွန်းများကို ဖြေဆိုခြင်း           | "ပြင်သစ်နိုင်ငံရဲ့ မြို့တော်က ဘာလဲ"      |
+| Code ဖန်တီးခြင်း   | Code snippets များကို ရေးသားခြင်း သို့မဟုတ် ဖြည့်စွက်ခြင်း | ဖော်ပြချက်အပေါ် အခြေခံ၍ function တစ်ခု ဖန်တီးခြင်း |
+| ဆင်ခြင်တုံတရား      | ပြဿနာများကို အဆင့်ဆင့်ဖြေရှင်းခြင်း                     | သင်္ချာပြဿနာများ သို့မဟုတ် ယုတ္တိပဟေဠိများ ဖြေရှင်းခြင်း |
+| Few-shot learning  | prompt တွင် ဥပမာအနည်းငယ်မှ သင်ယူခြင်း                 | ဥပမာ ၂-၃ ခုသာ မြင်ပြီးနောက် စာသားများကို အမျိုးအစားခွဲခြားခြင်း |
+
+Hub ပေါ်ရှိ model repo စာမျက်နှာများမှတစ်ဆင့် သင်၏ browser ထဲတွင် decoder-based LLMs များကို တိုက်ရိုက် စမ်းသပ်နိုင်ပါတယ်။ classic [GPT-2](https://huggingface.co/openai-community/gpt2) (OpenAI ရဲ့ အကောင်းဆုံး open source မော်ဒယ်!) ဥပမာတစ်ခုကတော့ ဒီမှာပါ။
+
+<iframe
+	src="https://huggingface.co/openai-community/gpt2"
+	frameborder="0"
+	width="100%"
+	height="450"
+></iframe>
+
+## Sequence-to-sequence မော်ဒယ်များ[[sequence-to-sequence-models]]
+
+<Youtube id="0_4KEb08xrE" />
+
+Encoder-decoder မော်ဒယ်တွေ (သို့မဟုတ် *sequence-to-sequence models* လို့လည်း ခေါ်ကြပါတယ်) ဟာ Transformer architecture ရဲ့ အစိတ်အပိုင်းနှစ်ခုလုံးကို အသုံးပြုပါတယ်။ အဆင့်တိုင်းမှာ encoder ရဲ့ attention layers တွေက မူလစာကြောင်းထဲက စကားလုံးအားလုံးကို ဝင်ရောက်ကြည့်ရှုနိုင်ပြီး၊ decoder ရဲ့ attention layers တွေကတော့ input ထဲမှာ ပေးထားတဲ့ စကားလုံးတစ်ခုရဲ့ အရင်ရှိနေတဲ့ စကားလုံးတွေကိုသာ ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။
+
+ဒီမော်ဒယ်တွေရဲ့ pretraining က ပုံစံအမျိုးမျိုး ရှိနိုင်ပေမယ့် များသောအားဖြင့် input က တစ်နည်းတစ်ဖုံ ဖျက်ဆီးခံထားရတဲ့ စာကြောင်းတစ်ခုကို ပြန်လည်တည်ဆောက်ခြင်း (ဥပမာ- ကျပန်းစကားလုံးတွေကို ဝှက်ထားခြင်းဖြင့်) တို့ ပါဝင်ပါတယ်။ T5 မော်ဒယ်ရဲ့ pretraining ကတော့ စာသားအပိုင်းအစများ (စကားလုံးများစွာ ပါဝင်နိုင်ပါတယ်) ကို mask special token တစ်ခုတည်းနဲ့ အစားထိုးပြီး အဲဒီ mask token က အစားထိုးထားတဲ့ စာသားကို ခန့်မှန်းဖို့ လုပ်ငန်းတာဝန်ပေးခြင်းတို့ ပါဝင်ပါတယ်။
+
+Sequence-to-sequence မော်ဒယ်တွေဟာ ပေးထားတဲ့ input အပေါ် မူတည်ပြီး စာကြောင်းအသစ်တွေ ဖန်တီးခြင်းနဲ့ပတ်သက်တဲ့ လုပ်ငန်းတာဝန်တွေအတွက် အသင့်တော်ဆုံးဖြစ်ပြီး ဥပမာအားဖြင့် summarization, translation, ဒါမှမဟုတ် generative question answering တို့ပဲ ဖြစ်ပါတယ်။
+
+> [!TIP]
+> "[🤗 Transformers တွေက လုပ်ငန်းတာဝန်တွေကို ဘယ်လိုဖြေရှင်းပေးလဲ။](/chapter1/5)" မှာ ကျွန်တော်တို့ မြင်တွေ့ခဲ့ရသလို BART နဲ့ T5 လို encoder-decoder မော်ဒယ်တွေဟာ architecture နှစ်ခုလုံးရဲ့ အားသာချက်တွေကို ပေါင်းစပ်ထားပါတယ်။ encoder က input ကို နှစ်ဖက်စလုံးကနေ နက်ရှိုင်းစွာ နားလည်မှုကို ပေးပြီး၊ decoder ကတော့ သင့်လျော်တဲ့ output စာသားကို ဖန်တီးပေးပါတယ်။ ဒါကြောင့် ဘာသာပြန်ခြင်း ဒါမှမဟုတ် အကျဉ်းချုပ်ခြင်းလို sequence တစ်ခုကို အခြားတစ်ခုသို့ ပြောင်းလဲပေးတဲ့ လုပ်ငန်းတာဝန်တွေအတွက် ၎င်းတို့ဟာ အကောင်းဆုံးပါပဲ။
+
+### လက်တွေ့အသုံးချမှုများ
+
+Sequence-to-sequence မော်ဒယ်တွေဟာ စာသားတစ်ခုရဲ့ ပုံစံကို အဓိပ္ပာယ်မပျက်စီးစေဘဲ အခြားပုံစံတစ်ခုသို့ ပြောင်းလဲဖို့ လိုအပ်တဲ့ လုပ်ငန်းတာဝန်တွေမှာ ထူးချွန်ပါတယ်။ လက်တွေ့အသုံးချမှုအချို့ကတော့:
+
+| အသုံးချမှု           | ဖော်ပြချက်                                            | ဥပမာ မော်ဒယ် |
+|--------------------|--------------------------------------------------------|---------------|
+| စက်ဘာသာပြန်ခြင်း    | ဘာသာစကားများအကြား စာသားများကို ပြောင်းလဲခြင်း           | Marian, T5    |
+| စာသားအကျဉ်းချုပ်ခြင်း | စာသားရှည်များကို အကျဉ်းချုံး ဖန်တီးခြင်း                  | BART, T5      |
+| ဒေတာမှ စာသားဖန်တီးခြင်း | ဖွဲ့စည်းထားသော ဒေတာများကို သဘာဝဘာသာစကားအဖြစ် ပြောင်းလဲခြင်း | T5            |
+| သဒ္ဒါပြင်ဆင်ခြင်း      | စာသားရှိ သဒ္ဒါအမှားများကို ပြင်ဆင်ခြင်း                   | T5            |
+| မေးခွန်းဖြေဆိုခြင်း | အကြောင်းအရာ (context) အပေါ် အခြေခံ၍ အဖြေများ ဖန်တီးခြင်း | BART, T5      |
+
+ဘာသာပြန်ခြင်းအတွက် sequence-to-sequence မော်ဒယ်ရဲ့ အပြန်အလှန်တုံ့ပြန်နိုင်တဲ့ demo တစ်ခုကတော့ ဒီမှာပါ။
+
+<iframe
+	src="https://course-demos-speech-to-speech-translation.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
+ဒီမော်ဒယ်မိသားစုရဲ့ ကိုယ်စားပြုမော်ဒယ်တွေကတော့:
+
+- [BART](https://huggingface.co/docs/transformers/model_doc/bart)
+- [mBART](https://huggingface.co/docs/transformers/model_doc/mbart)
+- [Marian](https://huggingface.co/docs/transformers/model_doc/marian)
+- [T5](https://huggingface.co/docs/transformers/model_doc/t5)
+
+## မှန်ကန်သော Architecture ကို ရွေးချယ်ခြင်း[[choosing-the-right-architecture]]
+
+တိကျတဲ့ NLP လုပ်ငန်းတာဝန်တစ်ခုကို လုပ်ဆောင်တဲ့အခါ ဘယ် architecture ကို အသုံးပြုရမယ်ဆိုတာ ဘယ်လိုဆုံးဖြတ်မလဲ။ အောက်ပါကတော့ အမြန်လမ်းညွှန်ချက် ဖြစ်ပါတယ်။
+
+| လုပ်ငန်းတာဝန်                      | အကြံပြုထားသော Architecture | ဥပမာများ       |
+|-------------------------------------|--------------------------------|-------------------|
+| စာသားခွဲခြားသတ်မှတ်ခြင်း (sentiment, topic) | Encoder                        | BERT, RoBERTa     |
+| စာသားဖန်တီးခြင်း (creative writing) | Decoder                        | GPT, LLaMA        |
+| ဘာသာပြန်ခြင်း                        | Encoder-Decoder                | T5, BART          |
+| အကျဉ်းချုပ်ခြင်း                     | Encoder-Decoder                | BART, T5          |
+| Named entity recognition            | Encoder                        | BERT, RoBERTa     |
+| မေးခွန်းဖြေဆိုခြင်း (extractive)   | Encoder                        | BERT, RoBERTa     |
+| မေးခွန်းဖြေဆိုခြင်း (generative)   | Encoder-Decoder သို့မဟုတ် Decoder | T5, GPT           |
+| Conversational AI                   | Decoder                        | GPT, LLaMA        |
+
+> [!TIP]
+> ဘယ်မော်ဒယ်ကို အသုံးပြုရမယ်ဆိုတာ မသေချာတဲ့အခါ အောက်ပါအချက်တွေကို ထည့်သွင်းစဉ်းစားပါ။
+>
+> 1.  သင့်လုပ်ငန်းတာဝန်က ဘယ်လိုနားလည်မှုမျိုး လိုအပ်လဲ။ (Bi-directional သို့မဟုတ် Uni-directional)
+> 2.  သင်ဟာ စာသားအသစ် ဖန်တီးမှာလား ဒါမှမဟုတ် ရှိပြီးသားစာသားကို ခွဲခြမ်းစိတ်ဖြာမှာလား။
+> 3.  Sequence တစ်ခုကို အခြားတစ်ခုသို့ ပြောင်းလဲဖို့ လိုအပ်ပါသလား။
+>
+> ဒီမေးခွန်းတွေရဲ့ အဖြေတွေက သင့်ကို မှန်ကန်တဲ့ architecture ဆီ ဦးတည်စေပါလိမ့်မယ်။
+
+## LLMs တွေရဲ့ ဆင့်ကဲပြောင်းလဲမှု
+
+Large Language Models တွေဟာ မကြာသေးခင်နှစ်များအတွင်းမှာ အလျင်အမြန် ဆင့်ကဲပြောင်းလဲလာခဲ့ပြီး မျိုးဆက်သစ်တိုင်းမှာ စွမ်းရည်တွေ သိသိသာသာ တိုးတက်လာခဲ့ပါတယ်။
+
+## Attention Mechanisms[[attention-mechanisms]]
+
+Transformer မော်ဒယ်အများစုက attention matrix က စတုရန်းပုံ (square) ဖြစ်တဲ့ full attention ကို အသုံးပြုပါတယ်။ စာသားတွေ ရှည်လျားလာတဲ့အခါမှာ ဒါဟာ တွက်ချက်မှုဆိုင်ရာ အဓိက အဟန့်အတားတစ်ခု ဖြစ်နိုင်ပါတယ်။ Longformer နဲ့ Reformer တို့ဟာ ပိုမိုထိရောက်အောင် ကြိုးစားပြီး လေ့ကျင့်မှုမြန်ဆန်စေရန် attention matrix ရဲ့ sparse version ကို အသုံးပြုကြပါတယ်။
+
+> [!TIP]
+> Standard attention mechanisms တွေမှာ computational complexity က O(n²) ရှိပါတယ်။ n ဆိုတာ sequence length ပါ။ ဒီဟာက ရှည်လျားလွန်းတဲ့ sequences တွေအတွက် ပြဿနာ ဖြစ်လာပါတယ်။ အောက်မှာဖော်ပြထားတဲ့ specialized attention mechanisms တွေက ဒီကန့်သတ်ချက်ကို ဖြေရှင်းပေးပါတယ်။
+
+### LSH attention
+
+[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer) က LSH attention ကို အသုံးပြုပါတယ်။ softmax(QK^t) မှာ QK^t matrix ရဲ့ အကြီးဆုံး elements (softmax dimension မှာ) တွေကသာ အသုံးဝင်တဲ့ အထောက်အကူတွေကို ပေးပါလိမ့်မယ်။ ဒါကြောင့် Q မှာရှိတဲ့ query q တစ်ခုစီအတွက် q နဲ့ နီးစပ်တဲ့ K မှာရှိတဲ့ key k တွေကိုသာ ထည့်သွင်းစဉ်းစားနိုင်ပါတယ်။ q နဲ့ k နီးစပ်မှုရှိမရှိကို ဆုံးဖြတ်ဖို့ hash function တစ်ခုကို အသုံးပြုပါတယ်။ attention mask ကို လက်ရှိ token ကို ဝှက်ထားဖို့ ပြင်ဆင်ထားပါတယ် (ပထမဆုံး position မှာမှလွဲ၍)။ ဘာလို့လဲဆိုတော့ လက်ရှိ token အတွက် query နဲ့ key က တူညီနေတဲ့အတွက် အချင်းချင်း အလွန်ဆင်တူနေမှာ ဖြစ်လို့ပါ။ hash က ကျပန်းဖြစ်နိုင်တာကြောင့် လက်တွေ့မှာ hash function အများအပြားကို အသုံးပြုပါတယ် (`n_rounds` parameter နဲ့ ဆုံးဖြတ်ပါတယ်) ပြီးတော့ ၎င်းတို့ကို ပျမ်းမျှယူပါတယ်။
+
+### Local attention
+
+[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer) က local attention ကို အသုံးပြုပါတယ်- များသောအားဖြင့် ဒေသတွင်း အကြောင်းအရာ (ဥပမာ- ဘယ်ဘက်နဲ့ ညာဘက်က tokens နှစ်ခုက ဘာတွေလဲ) က ပေးထားတဲ့ token တစ်ခုအတွက် အရေးယူဖို့ လုံလောက်ပါတယ်။ ဒါ့အပြင် window ငယ်တစ်ခုရှိတဲ့ attention layers တွေကို စုစည်းခြင်းဖြင့်၊ နောက်ဆုံး layer က window ထဲရှိ tokens တွေထက် ပိုမိုကျယ်ပြန့်တဲ့ receptive field ကို ပိုင်ဆိုင်နိုင်ပြီး စာကြောင်းတစ်ခုလုံးကို ကိုယ်စားပြုတဲ့ အချက်အလက် (representation) ကို တည်ဆောက်နိုင်စေပါတယ်။
+
+ကြိုတင်ရွေးချယ်ထားသော input tokens အချို့ကိုလည်း global attention ပေးထားပါတယ်။ ဒီ tokens အနည်းငယ်အတွက် attention matrix က tokens အားလုံးကို ဝင်ရောက်ကြည့်ရှုနိုင်ပြီး ဒီလုပ်ငန်းစဉ်က symmetric ဖြစ်ပါတယ်- အခြား tokens အားလုံးက အဲဒီသီးခြား tokens တွေကို ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ် (၎င်းတို့ရဲ့ local window ထဲမှာရှိတဲ့ tokens တွေအပြင်)။ ဒါကို စာတမ်းရဲ့ ပုံ 2d မှာ ပြသထားပါတယ်၊ အောက်မှာ attention mask ဥပမာကို ကြည့်ပါ။
+
+<div class="flex justify-center">
+    <img scale="50 %" align="center" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/local_attention_mask.png"/>
+</div>
+
+parameters နည်းတဲ့ attention matrices တွေကို အသုံးပြုခြင်းအားဖြင့် မော်ဒယ်က sequence length ပိုကြီးတဲ့ inputs တွေကို လက်ခံနိုင်စေပါတယ်။
+
+### Axial positional encodings
+
+[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer) က axial positional encodings တွေကို အသုံးပြုပါတယ်။ ရိုးရာ Transformer မော်ဒယ်တွေမှာ positional encoding E က \\(l\\) by \\(d\\) matrix တစ်ခုဖြစ်ပြီး \\(l\\) က sequence length ဖြစ်ကာ \\(d\\) က hidden state ရဲ့ dimension ဖြစ်ပါတယ်။ စာသားတွေ အလွန်ရှည်လျားရင် ဒီ matrix က အလွန်ကြီးမားပြီး GPU ပေါ်မှာ နေရာအများကြီး ယူနိုင်ပါတယ်။ ဒါကို ဖြေလျှော့ဖို့အတွက် axial positional encodings တွေက အဲဒီကြီးမားတဲ့ matrix E ကို E1 နဲ့ E2 ဆိုတဲ့ သေးငယ်တဲ့ matrices နှစ်ခုအဖြစ် ခွဲထုတ်တာကို ဆိုလိုပါတယ်။ E1 နဲ့ E2 ရဲ့ dimensions တွေကတော့ \\(l_{1} \times d_{1}\\) နဲ့ \\(l_{2} \times d_{2}\\) ဖြစ်ပြီး \\(l_{1} \times l_{2} = l\\) နဲ့ \\(d_{1} + d_{2} = d\\) ဖြစ်ပါတယ်။ (အလျားတွေအတွက် မြှောက်လဒ်နဲ့ဆိုရင် ဒါက အများကြီး သေးငယ်သွားပါလိမ့်မယ်)။ E မှာရှိတဲ့ time step \\(j\\) အတွက် embedding ကို E1 မှာရှိတဲ့ time step \\(j \% l_{1}\\) အတွက် embedding နဲ့ E2 မှာရှိတဲ့ time step \\(j // l_{1}\\) အတွက် embedding တွေကို ပေါင်းစပ်ခြင်းဖြင့် ရရှိပါတယ်။
+
+## နိဂုံးချုပ်[[conclusion]]
+
+ဒီအပိုင်းမှာ ကျွန်တော်တို့ Transformer architectures သုံးမျိုးနဲ့ အထူးပြု attention mechanisms အချို့ကို လေ့လာခဲ့ပါတယ်။ ဒီ architecture တွေရဲ့ ကွာခြားချက်တွေကို နားလည်ထားတာက သင့်ရဲ့ သီးခြား NLP လုပ်ငန်းတာဝန်အတွက် မှန်ကန်တဲ့ မော်ဒယ်ကို ရွေးချယ်ဖို့အတွက် အရေးကြီးပါတယ်။
+
+သင်တန်းမှာ ဆက်လက်လုပ်ဆောင်သွားတဲ့အခါ ဒီမတူညီတဲ့ architecture တွေနဲ့ လက်တွေ့အတွေ့အကြုံတွေ ရရှိလာမှာဖြစ်ပြီး သင့်ရဲ့ သီးခြားလိုအပ်ချက်တွေအတွက် ဘယ်လို fine-tune လုပ်ရမယ်ဆိုတာကို သင်ယူရမှာ ဖြစ်ပါတယ်။ နောက်အပိုင်းမှာတော့ ဒီမော်ဒယ်တွေမှာ ရှိနေတဲ့ ကန့်သတ်ချက်တွေနဲ့ ဘက်လိုက်မှုအချို့ကို လေ့လာသွားမှာဖြစ်ပြီး ၎င်းတို့ကို အသုံးပြုတဲ့အခါ သတိထားသင့်တဲ့ အချက်တွေပဲ ဖြစ်ပါတယ်။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Architecture**: ကွန်ပျူတာစနစ်တစ်ခု၊ ဆော့ဖ်ဝဲလ်တစ်ခု သို့မဟုတ် မော်ဒယ်တစ်ခု၏ အစိတ်အပိုင်းများ စုစည်းပုံနှင့် ၎င်းတို့အချင်းချင်း ဆက်စပ်လုပ်ဆောင်ပုံကို ဖော်ပြသည့် အခြေခံဒီဇိုင်း သို့မဟုတ် ဖွဲ့စည်းပုံ။
+*   **Transformer Architecture**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Encoder-only**: Transformer မော်ဒယ်ရဲ့ encoder အစိတ်အပိုင်းကိုသာ အသုံးပြုထားသော architecture အမျိုးအစား။ စာသားနားလည်မှုလုပ်ငန်းများအတွက် သင့်တော်သည်။
+*   **Decoder-only**: Transformer မော်ဒယ်ရဲ့ decoder အစိတ်အပိုင်းကိုသာ အသုံးပြုထားသော architecture အမျိုးအစား။ စာသားဖန်တီးမှုလုပ်ငန်းများအတွက် သင့်တော်သည်။
+*   **Encoder-Decoder (Sequence-to-sequence)**: Transformer မော်ဒယ်ရဲ့ encoder နှင့် decoder နှစ်ခုစလုံးကို အသုံးပြုထားသော architecture အမျိုးအစား။ စာသားတစ်ခုကို အခြားစာသားတစ်ခုအဖြစ် ပြောင်းလဲခြင်းလုပ်ငန်းများ (ဥပမာ- ဘာသာပြန်ခြင်း) အတွက် သင့်တော်သည်။
+*   **Sentence Classification**: စာကြောင်းတစ်ခုလုံး၏ အဓိပ္ပာယ် သို့မဟုတ် ရည်ရွယ်ချက်ကို အမျိုးအစားခွဲခြားခြင်း (ဥပမာ- စိတ်ခံစားမှု၊ ခေါင်းစဉ်)။
+*   **Named Entity Recognition (NER)**: စာသားထဲက လူအမည်၊ နေရာအမည်၊ အဖွဲ့အစည်းအမည် စတဲ့ သီးခြားအမည်တွေကို ရှာဖွေဖော်ထုတ်ခြင်း။
+*   **Word Classification**: စာကြောင်းတစ်ခုရှိ စကားလုံးတစ်လုံးချင်းစီကို ၎င်း၏ သဒ္ဒါ သို့မဟုတ် အခြားအဓိပ္ပာယ်အရ အမျိုးအစားခွဲခြားခြင်း။
+*   **Extractive Question Answering**: ပေးထားသော စာသားအပိုင်းအစမှ မေးခွန်း၏ အဖြေကို တိုက်ရိုက်ထုတ်ယူခြင်း။
+*   **Bi-directional Attention**: မော်ဒယ်က စာသားတစ်ခုလုံး၏ အကြောင်းအရာ (context) ကို ရှေ့ဘက်နှင့် နောက်ဘက် နှစ်ဖက်စလုံးမှ ကြည့်ရှုနားလည်နိုင်ခြင်း။
+*   **Auto-encoding Models**: စာသားကို ဖျက်ဆီးပြီးနောက် မူလစာသားကို ပြန်လည်တည်ဆောက်ရန် သင်ကြားထားသော မော်ဒယ်များ။
+*   **BERT (Bidirectional Encoder Representations from Transformers)**: Google မှ တီထွင်ထားသော encoder-only Transformer မော်ဒယ်။
+*   **DistilBERT**: BERT မော်ဒယ်ကို သေးငယ်ပြီး ပိုမိုမြန်ဆန်အောင် ပြုလုပ်ထားသော မော်ဒယ်။
+*   **Text Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ လူသားကဲ့သို့သော စာသားအသစ်များ ဖန်တီးခြင်း။
+*   **Auto-regressive Models**: နောက်ထပ် token ကို ခန့်မှန်းရန် ယခင် tokens များကိုသာ အသုံးပြု၍ စာသားများကို တစ်ကြိမ်လျှင် token တစ်ခုချင်းစီ ဖန်တီးသော မော်ဒယ်များ။
+*   **GPT (Generative Pre-trained Transformer)**: OpenAI မှ တီထွင်ထားသော decoder-only Transformer မော်ဒယ်။
+*   **Llama**: Meta မှ တီထွင်ထားသော decoder-only Large Language Model (LLM) အမျိုးအစား။
+*   **Gemma**: Google မှ တီထွင်ထားသော decoder-only Large Language Model (LLM) အမျိုးအစား။
+*   **DeepSeek**: DeepSeek AI မှ တီထွင်ထားသော decoder-only Large Language Model (LLM) အမျိုးအစား။
+*   **Pretraining**: မော်ဒယ်ကို များပြားလှသော အထွေထွေဒေတာများဖြင့် အစောပိုင်းသင်ကြားမှု။
+*   **Instruction Tuning**: မော်ဒယ်ကို သီးခြားညွှန်ကြားချက်များကို လိုက်နာပြီး အထောက်အကူဖြစ်စေသော တုံ့ပြန်မှုများ ထုတ်လုပ်ရန် fine-tune လုပ်ခြင်း။
+*   **Token**: စာသားကို ပိုင်းခြားထားသော အသေးငယ်ဆုံးယူနစ် (ဥပမာ- စကားလုံး၊ စာလုံးအစိတ်အပိုင်း)။
+*   **Summarization**: စာသားရှည်များကို အကျဉ်းချုပ်ဖော်ပြခြင်း။
+*   **Translation**: ဘာသာစကားတစ်ခုမှ အခြားတစ်ခုသို့ စာသားများကို ပြောင်းလဲခြင်း။
+*   **Generative Question Answering**: မေးခွန်း၏ အဖြေကို ပေးထားသော အကြောင်းအရာ (context) အပေါ် အခြေခံ၍ စာသားအသစ်များ ဖန်တီးခြင်းဖြင့် ထုတ်ပေးခြင်း။
+*   **BART (Bidirectional and Auto-Regressive Transformers)**: Encoder-Decoder Transformer မော်ဒယ်တစ်မျိုး။
+*   **T5 (Text-to-Text Transfer Transformer)**: Encoder-Decoder Transformer မော်ဒယ်တစ်မျိုးဖြစ်ပြီး လုပ်ငန်းတာဝန်အားလုံးကို "text-to-text" ပုံစံဖြင့် ဖြေရှင်းရန် ဒီဇိုင်းထုတ်ထားသည်။
+*   **Marian**: အဓိကအားဖြင့် machine translation အတွက် အသုံးပြုသော encoder-decoder မော်ဒယ်။
+*   **mBART**: Multilingual BART (ဘာသာစကားမျိုးစုံအတွက် BART)။
+*   **Data-to-text Generation**: ဖွဲ့စည်းထားသော ဒေတာများကို သဘာဝဘာသာစကားစာသားအဖြစ် ပြောင်းလဲခြင်း။
+*   **Grammar Correction**: စာသားရှိ သဒ္ဒါအမှားများကို ပြင်ဆင်ခြင်း။
+*   **Conversational AI**: လူသားများနှင့် သဘာဝဘာသာစကားဖြင့် အပြန်အလှန်ပြောဆိုနိုင်သော AI စနစ်များ။
+*   **RoBERTa**: BERT ကို ပိုမိုကောင်းမွန်အောင် လေ့ကျင့်ထားသော encoder-only မော်ဒယ်။
+*   **Attention Matrix**: Transformer မော်ဒယ်များတွင် အသုံးပြုသော matrix တစ်ခုဖြစ်ပြီး input sequence အတွင်းရှိ token များအချင်းချင်း မည်မျှဆက်စပ်နေသည်ကို ဖော်ပြသည်။
+*   **Computational Bottleneck**: စနစ်တစ်ခု၏ စွမ်းဆောင်ရည်ကို ကန့်သတ်ထားသော အရင်းအမြစ် သို့မဟုတ် လုပ်ငန်းစဉ်။
+*   **Sparse Attention**: attention matrix ၏ အရေးမကြီးသော အစိတ်အပိုင်းများကို လျစ်လျူရှုခြင်းဖြင့် တွက်ချက်မှု ထိရောက်အောင် ပြုလုပ်ထားသော attention mechanism အမျိုးအစား။
+*   **LSH (Locality Sensitive Hashing) Attention**: Reformer မော်ဒယ်တွင် အသုံးပြုသော attention အမျိုးအစားဖြစ်ပြီး ဆင်တူသော query နှင့် key များကို ရှာဖွေရန် hash function များကို အသုံးပြုသည်။
+*   **Longformer**: ရှည်လျားသော input sequences များကို ကိုင်တွယ်နိုင်ရန် local attention နှင့် global attention တို့ကို ပေါင်းစပ်အသုံးပြုထားသော Transformer မော်ဒယ်။
+*   **Local Attention**: ပေးထားသော token တစ်ခုအတွက် အနီးအနားရှိ tokens များကိုသာ အာရုံစိုက်သော attention mechanism။
+*   **Receptive Field**: neural network layer တစ်ခု၏ output ယူနစ်တစ်ခုကို လွှမ်းမိုးသော input data ၏ အရွယ်အစား။
+*   **Global Attention**: အချို့သော input tokens များအတွက် input sequence ရှိ tokens အားလုံးကို အာရုံစိုက်ခွင့်ပြုသော attention mechanism။
+*   **Axial Positional Encodings**: ရှည်လျားသော sequences များအတွက် positional encoding ကို ပိုမိုထိရောက်အောင် ပြုလုပ်ရန် matrix တစ်ခုကို သေးငယ်သော matrices နှစ်ခုအဖြစ် ခွဲထုတ်ခြင်းနည်းလမ်း။
+*   **Hidden State**: Transformer မော်ဒယ်များတွင် layer တစ်ခုမှ အခြားတစ်ခုသို့ လက်ဆင့်ကမ်းပေးသော အတွင်းပိုင်း ကိုယ်စားပြုအချက်အလက်။
+*   **Dimension**: vector သို့မဟုတ် matrix တစ်ခု၏ အတိုင်းအတာအရေအတွက်။
\ No newline at end of file
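Review note on the axial positional encodings paragraph in `chapters/my/chapter1/6.mdx`: the paragraph states that the embedding for time step j combines row `j % l1` of E1 with row `j // l1` of E2, where `l1 * l2 = l` and `d1 + d2 = d`. The sketch below checks that arithmetic in plain Python. The helper names and the concrete sizes are assumptions for illustration only; this is not the Reformer implementation in 🤗 Transformers.

```python
import random


def make_table(rows, cols, rng):
    """Build a rows x cols table of random floats (stand-in for learned weights)."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def axial_position_embedding(j, e1, e2):
    """Concatenate row (j % l1) of E1 with row (j // l1) of E2."""
    l1 = len(e1)
    return e1[j % l1] + e2[j // l1]  # list concatenation: length d1 + d2


# Assumed sizes: l = l1 * l2 = 8192 positions, d = d1 + d2 = 1024 features.
l1, l2 = 64, 128
d1, d2 = 256, 768

rng = random.Random(0)
e1 = make_table(l1, d1, rng)
e2 = make_table(l2, d2, rng)

# Parameter count of one full l x d table versus the two factored tables.
full_params = (l1 * l2) * (d1 + d2)
axial_params = l1 * d1 + l2 * d2

emb = axial_position_embedding(70, e1, e2)  # 70 % 64 = 6, 70 // 64 = 1
print(len(emb), full_params, axial_params)
```

With these assumed sizes the two factored tables hold far fewer values than a single full table, which is the memory saving the paragraph describes.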
diff --git a/chapters/my/chapter1/7.mdx b/chapters/my/chapter1/7.mdx
new file mode 100644
index 000000000..c51de1dd1
--- /dev/null
+++ b/chapters/my/chapter1/7.mdx
@@ -0,0 +1,287 @@
+<!-- DISABLE-FRONTMATTER-SECTIONS -->
+
+# အမှတ်မပေးသော Quiz[[ungraded-quiz]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+ဒီအခန်းက သင်ကြားရမယ့် အကြောင်းအရာတွေ အများကြီးကို ဖော်ပြခဲ့ပြီးပါပြီ။ အသေးစိတ် အချက်အလက်အားလုံးကို နားမလည်သေးရင်လည်း စိတ်မပူပါနဲ့။ ဒါပေမယ့် ဒီ quiz နဲ့ သင် သင်ယူခဲ့တာတွေကို ပြန်လည်သုံးသပ်ကြည့်ရအောင်။
+
+ဒီ quiz က အမှတ်မပေးတဲ့အတွက် သင်နှစ်သက်သလောက် အကြိမ်ကြိမ် ကြိုးစားဖြေဆိုနိုင်ပါတယ်။ မေးခွန်းအချို့နဲ့ ရုန်းကန်ရရင် အကြံပြုချက်တွေကို လိုက်နာပြီး သင်ခန်းစာတွေကို ပြန်လည်လေ့လာပါ။ ဒီအကြောင်းအရာတွေကို အသိအမှတ်ပြု စာမေးပွဲမှာ ထပ်မံဖြေဆိုရမှာ ဖြစ်ပါတယ်။
+
+### 1. Hub ကို ရှာဖွေပြီး `roberta-large-mnli` checkpoint ကို ရှာပါ။ ၎င်းသည် မည်သည့်လုပ်ငန်းကို လုပ်ဆောင်ပါသနည်း။
+
+<Question
+	choices={[
+		{
+			text: "အကျဉ်းချုပ်ခြင်း (Summarization)",
+			explain: "roberta-large-mnli စာမျက်နှာကို <a href=\"https://huggingface.co/roberta-large-mnli\">ပြန်လည်ကြည့်ရှုပါ။</a>"
+		},
+		{
+			text: "စာသားခွဲခြားသတ်မှတ်ခြင်း (Text classification)",
+			explain: "ပိုတိတိကျကျပြောရရင် ၎င်းသည် စာကြောင်းနှစ်ကြောင်းက ယုတ္တိရှိရှိ ဆက်စပ်မှုရှိမရှိကို အဆင့်သုံးဆင့် (contradiction, neutral, entailment) နဲ့ ခွဲခြားသတ်မှတ်ပါတယ်။ ဒီလုပ်ငန်းကို <em>natural language inference</em> လို့လည်း ခေါ်ပါတယ်။",
+			correct: true
+		},
+		{
+			text: "စာသားထုတ်လုပ်ခြင်း (Text generation)",
+			explain: "roberta-large-mnli စာမျက်နှာကို <a href=\"https://huggingface.co/roberta-large-mnli\">ပြန်လည်ကြည့်ရှုပါ။</a>"
+		}
+	]}
+/>
+
+### 2. အောက်ပါ code သည် မည်သည့်အရာကို ပြန်ပေးမည်နည်း။
+
+```py
+from transformers import pipeline
+
+ner = pipeline("ner", grouped_entities=True)
+ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
+```
+
+<Question
+	choices={[
+		{
+			text: "၎င်းသည် ဤစာကြောင်းအတွက် \"positive\" သို့မဟုတ် \"negative\" အညွှန်းများပါသော classification scores များကို ပြန်ပေးပါလိမ့်မည်။",
+			explain: "ဒါက မမှန်ကန်ပါဘူး — ဒါက `sentiment-analysis` pipeline ဖြစ်ပါလိမ့်မယ်။"
+		},
+		{
+			text: "၎င်းသည် ဤစာကြောင်းကို ဖြည့်စွက်ထားသော ဖန်တီးထားသည့် စာသားကို ပြန်ပေးပါလိမ့်မည်။",
+			explain: "ဒါက မမှန်ကန်ပါဘူး — ဒါက `text-generation` pipeline ဖြစ်ပါလိမ့်မယ်။",
+		},
+		{
+			text: "၎င်းသည် လူပုဂ္ဂိုလ်များ၊ အဖွဲ့အစည်းများ သို့မဟုတ် နေရာများကို ကိုယ်စားပြုသည့် စကားလုံးများကို ပြန်ပေးပါလိမ့်မည်။",
+			explain: "ထို့အပြင် `grouped_entities=True` ကို အသုံးပြုထားသောကြောင့် ၎င်းသည် 'Hugging Face' ကဲ့သို့သော တူညီသည့် entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးပါလိမ့်မည်။",
+			correct: true
+		}
+	]}
+/>
+
+### 3. What should replace ... in this code sample?
+
+```py
+from transformers import pipeline
+
+filler = pipeline("fill-mask", model="bert-base-cased")
+result = filler("...")
+```
+
+<Question
+	choices={[
+		{
+			text: "This &#60;mask> has been waiting for you.",
+			explain: "This is incorrect. Check the `bert-base-cased` model card and try to spot your mistake."
+		},
+		{
+			text: "This [MASK] has been waiting for you.",
+			explain: "This model's mask token is `[MASK]`.",
+			correct: true
+		},
+		{
+			text: "This man has been waiting for you.",
+			explain: "This is incorrect. This pipeline fills in masked words, so it needs a mask token somewhere in the input."
+		}
+	]}
+/>
+
+### 4. Why will this code fail?
+
+```py
+from transformers import pipeline
+
+classifier = pipeline("zero-shot-classification")
+result = classifier("This is a course about the Transformers library")
+```
+
+<Question
+	choices={[
+		{
+			text: "This pipeline requires labels to be given in order to classify this text.",
+			explain: "Right: the correct code needs to include `candidate_labels=[...]`.",
+			correct: true
+		},
+		{
+			text: "This pipeline requires several sentences, not just one.",
+			explain: "This is incorrect. When used properly, though, this pipeline can process a list of sentences (like all other pipelines)."
+		},
+		{
+			text: "The 🤗 Transformers library is broken, as usual.",
+			explain: "We won't dignify this answer with a comment."
+		},
+		{
+			text: "This pipeline requires longer inputs; this one is too short.",
+			explain: "This is incorrect. Note that a very long text will be truncated when processed by this pipeline."
+		}
+	]}
+/>
+
+### 5. What does "transfer learning" mean?
+
+<Question
+	choices={[
+		{
+			text: "Transferring the knowledge of a pretrained model to a new model by training it on the same dataset.",
+			explain: "No, that would just give two versions of the same model."
+		},
+		{
+			text: "Transferring the knowledge of a pretrained model to a new model by initializing the second model with the first model's weights.",
+			explain: "When the second model is trained on a new task, it 'transfers' the knowledge of the first model.",
+			correct: true
+		},
+		{
+			text: "Transferring the knowledge of a pretrained model to a new model by building the second model with the same architecture as the first model.",
+			explain: "The architecture only describes how the model is built. No knowledge is shared or transferred in this case."
+		}
+	]}
+/>
+
+### 6. True or false? A language model usually does not need labels for its pretraining.
+
+<Question
+	choices={[
+		{
+			text: "True",
+			explain: "Pretraining is usually *self-supervised*, which means the labels are created automatically from the inputs (for example, predicting the next word or filling in masked words).",
+			correct: true
+		},
+		{
+			text: "False",
+			explain: "This is not the correct answer."
+		}
+	]}
+/>
+
+### 7. Select the sentence that best describes the terms "model", "architecture", and "weights".
+
+<Question
+	choices={[
+		{
+			text: "If a model is a building, its architecture is the blueprint and the weights are the people living inside.",
+			explain: "Following this metaphor, the weights would be the bricks and other materials used to construct the building."
+		},
+		{
+			text: "An architecture is a map for building a model, and its weights are the cities shown on the map.",
+			explain: "The problem with this metaphor is that a map usually represents one existing reality (there is only one city in France named Paris). For a given architecture, many different sets of weights are possible."
+		},
+		{
+			text: "An architecture is a succession of mathematical functions used to build a model, and its weights are the parameters of those functions.",
+			explain: "The same set of mathematical functions (architecture) can be used to build different models by using different parameters (weights).",
+			correct: true
+		}
+	]}
+/>
+
+### 8. Which of these types of models would you use for completing prompts with generated text?
+
+<Question
+	choices={[
+		{
+			text: "An encoder model",
+			explain: "An encoder model produces a representation of the whole sentence, which is better suited to tasks like classification."
+		},
+		{
+			text: "A decoder model",
+			explain: "Decoder models are perfectly suited to generating text from a prompt.",
+			correct: true
+		},
+		{
+			text: "A sequence-to-sequence model",
+			explain: "Sequence-to-sequence models are better suited to tasks where you want to generate sentences in relation to input sentences, not to continue a given prompt."
+		}
+	]}
+/>
+
+### 9. Which of these types of models would you use for summarizing texts?
+
+<Question
+	choices={[
+		{
+			text: "An encoder model",
+			explain: "An encoder model produces a representation of the whole sentence, which is better suited to tasks like classification."
+		},
+		{
+			text: "A decoder model",
+			explain: "Decoder models are good at generating output text (such as summaries), but they cannot exploit a context such as the whole text to be summarized."
+		},
+		{
+			text: "A sequence-to-sequence model",
+			explain: "Sequence-to-sequence models are perfectly suited to a summarization task.",
+			correct: true
+		}
+	]}
+/>
+
+### 10. Which of these types of models would you use for classifying text inputs according to certain labels?
+
+<Question
+	choices={[
+		{
+			text: "An encoder model",
+			explain: "An encoder model produces a representation of the whole sentence, which is perfectly suited to a task like classification.",
+			correct: true
+		},
+		{
+			text: "A decoder model",
+			explain: "Decoder models are good at generating output text, not at extracting a label from a sentence."
+		},
+		{
+			text: "A sequence-to-sequence model",
+			explain: "Sequence-to-sequence models are better suited to tasks where you want to generate text based on an input sentence, not a label.",
+		}
+	]}
+/>
+
+### 11. What possible sources can the bias observed in a model have?
+
+<Question
+	choices={[
+		{
+			text: "The model is a fine-tuned version of a pretrained model and it picked up its bias from it.",
+			explain: "When applying transfer learning, the bias in the pretrained model persists in the fine-tuned model.",
+			correct: true
+		},
+		{
+			text: "The data the model was trained on is biased.",
+			explain: "This is the most obvious source of bias, but not the only one.",
+			correct: true
+		},
+		{
+			text: "The metric the model was optimizing for is biased.",
+			explain: "A less obvious source of bias is the way the model is trained. The model will blindly optimize for whatever metric you chose.",
+			correct: true
+		}
+	]}
+/>
+
+## Glossary
+
+*   **Quiz**: Questions for reviewing the material you have learned.
+*   **Transformer Models**: A deep learning architecture that has been highly successful in Natural Language Processing (NLP). It learns the relationships between the words in a text using an "attention mechanism".
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **Checkpoint**: A saved snapshot of a model's state at some point during training.
+*   **Task**: The specific job an AI model is trained to perform (for example, text classification or text generation).
+*   **Summarization**: Condensing a text into a shorter form without losing its key information.
+*   **Text Classification**: Assigning a text to one of a set of predefined categories or labels.
+*   **Natural Language Inference (NLI)**: The task of determining the logical relationship between two sentences (for example, contradiction, neutral, or entailment).
+*   **Text Generation**: Using AI models to create new, human-like text.
+*   **`pipeline()` function**: A function in the Hugging Face Transformers library that makes it easy to use models for specific tasks (for example, text classification or text generation).
+*   **`ner` (Named Entity Recognition)**: Identifying named entities in a text, such as people, places, and organizations.
+*   **`grouped_entities=True`**: A parameter of the `ner` pipeline that groups together the words belonging to the same entity.
+*   **`sentiment-analysis` pipeline**: A pipeline for analyzing the sentiment (positive or negative) of a text.
+*   **`text-generation` pipeline**: A pipeline for generating new text based on an input prompt.
+*   **`fill-mask` pipeline**: A pipeline for filling in the masked words (mask tokens) in a text.
+*   **`bert-base-cased`**: A version of the BERT (Bidirectional Encoder Representations from Transformers) model that distinguishes between uppercase and lowercase English letters. Its mask token is `[MASK]`.
+*   **`zero-shot-classification` pipeline**: A pipeline that can classify text using labels the model was not explicitly trained on.
+*   **`candidate_labels`**: A parameter of the `zero-shot-classification` pipeline that passes the list of possible labels for classifying the text.
+*   **Transfer Learning**: Transferring the knowledge of a pretrained model to a new model for a different task.
+*   **Pretrained Model**: A model that has already been trained on large datasets.
+*   **Fine-tuned Model**: A pretrained model that has been trained further, typically on a smaller dataset, for a specific task.
+*   **Weights**: The learnable values of a machine learning model. They are adjusted during training as the model learns patterns from the data.
+*   **Architecture**: The structure or design of a machine learning model. It defines the sequence of mathematical functions and how they are connected.
+*   **Self-supervised Learning**: A form of learning in which the labels are generated automatically from the inputs.
+*   **Encoder Model**: The part of the Transformer architecture that reads the input (for example, text) and turns it into a representation. It is well suited to tasks like classification.
+*   **Decoder Model**: The part of the Transformer architecture that generates output one token at a time (for example, translated or newly generated text). In encoder-decoder models it also uses the representation produced by the encoder.
+*   **Sequence-to-sequence Model**: A Transformer architecture containing both an encoder and a decoder, used for tasks that map an input sequence to an output sequence (for example, translation or summarization).
+*   **Bias**: Systematic skew in a model's behavior, arising from the dataset or from the way the model is trained.
+*   **Metric**: A measure used to evaluate a model's performance.
\ No newline at end of file
diff --git a/chapters/my/chapter1/8.mdx b/chapters/my/chapter1/8.mdx
new file mode 100644
index 000000000..74d80239c
--- /dev/null
+++ b/chapters/my/chapter1/8.mdx
@@ -0,0 +1,304 @@
+# Deep dive into Text Generation Inference with LLMs[[inference-with-llms]]
+
+<CourseFloatingBanner
+    chapter={1}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+<Youtube id="Xp2w1_LKZN4" />
+
+So far, we have explored the Transformer architecture in relation to a range of discrete tasks, such as text classification or summarization. However, Large Language Models are most often used for text generation, and that is what we will explore in this chapter.
+
+On this page, we will cover the core concepts behind LLM inference, building a thorough understanding of how these models generate text and which key components are involved in the inference process.
+
+## Understanding the Basics[[understanding-the-basics]]
+
+Let's start with the fundamentals. Inference is the process of using a trained LLM to generate human-like text from a given input prompt. Language models use the knowledge acquired during training to formulate responses one token at a time. The model uses probabilities learned across billions of parameters to predict and generate the next token in a sequence. This sequential generation is what allows LLMs to produce coherent, contextually relevant text.
+
+## The Role of Attention[[the-role-of-attention]]
+
+The attention mechanism is what lets LLMs understand context and generate coherent responses. When predicting the next word, not every word in a sentence carries equal weight. For example, in the sentence "The capital of France is ...", the words "France" and "capital" are crucial for determining that "Paris" should come next. This ability to focus on the relevant information is what we call attention.
+
+<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AttentionSceneFinal.gif" alt="Visual Gif of Attention" width="60%">
+
+This process of identifying the most relevant words for predicting the next token has proven to be remarkably effective. Although the basic principle of training LLMs (predicting the next token) has stayed broadly the same since BERT and GPT-2, there have been significant advances in scaling neural networks and in making the attention mechanism work on longer sequences at lower cost.
+
+> [!TIP]
+> In short, the attention mechanism is the key to LLMs being able to generate text that is both coherent and context-aware. It sets modern LLMs apart from previous generations of language models.
+
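To make the idea of "weighting" tokens concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The 2-dimensional key and value vectors for "The", "capital", and "France" are made-up numbers chosen for illustration; they do not come from any real model.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical vectors for the tokens "The", "capital", "France".
keys = [[0.1, 0.0], [1.0, 0.5], [0.9, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]  # stands in for the position that must predict the next word

weights, output = attend(query, keys, values)
```

With these toy numbers, "France" and "capital" receive larger weights than "The", which mirrors the example in the text.
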
+### Context Length and Attention Span[[context-length-and-attention-span]]
+
+Now that we understand attention, let's look at how much context an LLM can actually handle. This brings us to context length, sometimes described as the model's "attention span".
+
+Context length is the maximum number of tokens (words or parts of words) that an LLM can process at once. You can think of it as the size of the model's working memory.
+
+This capacity is limited by several practical factors:
+- The model's architecture and size
+- The available computational resources
+- The complexity of the input and of the desired output
+
+In an ideal world, we could feed a model unlimited context, but hardware constraints and computational costs make this impractical. This is why different models are designed with different context lengths, to balance capability against efficiency.
+
+> [!TIP]
+> Context length is the maximum number of tokens the model can take into account at once when generating a response.
+
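As a rough illustration of what a fixed context length means in practice, the sketch below trims a prompt so that the prompt plus the tokens to be generated fit in the window. The function name and the policy of keeping the most recent tokens are assumptions made for this example; real applications may truncate differently or reject over-long inputs.

```python
def fit_to_context(prompt_tokens, context_length, max_new_tokens):
    """Keep only the most recent prompt tokens that fit in the context window.

    Hypothetical helper: the window must hold the prompt AND the new tokens.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return prompt_tokens[-budget:]


# A 10-token prompt, an 8-token window, and room reserved for 3 new tokens.
trimmed = fit_to_context(list(range(10)), context_length=8, max_new_tokens=3)
```
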
+### The Art of Prompting[[the-art-of-prompting]]
+
+When we pass information to an LLM, we structure the input so that it steers the model's generation toward the desired output. This is called _prompting_.
+
+Understanding how LLMs process information helps us write better prompts. Since the model's main job is to predict the next token by weighing the importance of each input token, the wording of your input sequence becomes crucial.
+
+> [!TIP]
+> Careful prompt design makes it easier **to guide the LLM's generation toward the desired output**.
+
+## The Two-Phase Inference Process[[the-two-phase-inference-process]]
+
+Now that we understand the basic components, let's look more closely at how LLMs actually generate text. The process can be broken down into two main phases: prefill and decode. These phases work together, and each plays an important role in producing coherent text.
+
+### The Prefill Phase[[the-prefill-phase]]
+
+The prefill phase is like the preparation stage in cooking: it is where all the initial ingredients are processed and made ready. This phase involves three key steps:
+
+1.  **Tokenization**: Converting the input text into tokens (think of these as the basic building blocks the model understands)
+2.  **Embedding Conversion**: Transforming these tokens into numerical representations that capture their meaning
+3.  **Initial Processing**: Running these embeddings through the model's neural network layers to build a rich understanding of the context
+
+This phase is computationally intensive because it has to process all the input tokens at once. Think of it as reading and understanding an entire paragraph before starting to write a response.
+
+You can experiment with different tokenizers in the interactive playground below:
+
+<iframe
+	src="https://agents-course-the-tokenizer-playground.static.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
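The first two prefill steps can be sketched with a toy example. The whitespace tokenizer, the six-word vocabulary, and the random embedding table below are all stand-ins invented for illustration; real models use learned subword tokenizers and learned embeddings.

```python
import random


def tokenize(text, vocab):
    """Toy tokenizer: lowercase, split on whitespace, map unknown words to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


# Hypothetical vocabulary and embedding table (random values, not learned).
vocab = {"<unk>": 0, "the": 1, "capital": 2, "of": 3, "france": 4, "is": 5}
rng = random.Random(0)
embedding_dim = 4
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)] for _ in vocab
]

# Step 1: text -> token ids. Step 2: token ids -> vectors.
token_ids = tokenize("The capital of France is", vocab)
embeddings = [embedding_table[token_id] for token_id in token_ids]
```

The third step, running the embeddings through the network, is what the model itself does and is not reproduced here.
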
+### The Decode Phase[[the-decode-phase]]
+
+After the prefill phase has processed the input, we move on to the decode phase, which is where the actual text generation happens. The model generates one token at a time in what is called an autoregressive process (each new token depends on all the previous tokens).
+
+The decode phase involves several key steps that are repeated for each new token:
+1.  **Attention Computation**: Looking back at all previous tokens to understand the context
+2.  **Probability Calculation**: Determining the likelihood of each possible next token
+3.  **Token Selection**: Choosing the next token based on these probabilities
+4.  **Continuation Check**: Deciding whether to continue or to stop generating
+
+This phase is memory-intensive because the model has to keep track of all previously generated tokens and their relationships.
+
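The decode loop described above can be sketched as follows. The "model" here is a hypothetical lookup table that only looks at the last token, which is a deliberate simplification: a real LLM conditions on all previous tokens through attention. Token selection is greedy (always the most likely token), which is only one of several possible strategies.

```python
# Hypothetical next-token probabilities, keyed by the previous token only.
BIGRAMS = {
    "the": {"capital": 0.9, "<eos>": 0.1},
    "capital": {"of": 0.8, "is": 0.2},
    "of": {"france": 0.7, "the": 0.3},
    "france": {"is": 0.6, "<eos>": 0.4},
    "is": {"paris": 0.9, "<eos>": 0.1},
    "paris": {"<eos>": 1.0},
}


def next_token_probs(tokens):
    """Stand-in for the model's forward pass (probability calculation)."""
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # token selection (greedy)
        if token == "<eos>":               # continuation check
            break
        tokens.append(token)               # the new token feeds the next step
    return tokens


result = generate(["the"])
```
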
+## Sampling Strategies[[sampling-strategies]]
+
+Now that we understand how the model generates text, let's explore the ways we can control this generation process. Just as a writer might choose between being more creative or more precise, we can adjust how the model selects its tokens.
+
+You can interact with the basic decoding process yourself, using SmolLM2, in this Space (remember that it decodes until it reaches an **EOS** token, which is **<|im_end|>** for this model):
+
+<iframe
+	src="https://agents-course-decoding-visualizer.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
+### Understanding Token Selection: From Probabilities to Token Choices[[understanding-token-selection-from-probabilities-to-token-choices]]
+
+When the model needs to choose the next token, it starts with raw scores (called logits) for every token in its vocabulary. But how do we turn these scores into actual choices? Let's break the process down:
+
+![image](https://huggingface.co/reasoning-course/images/resolve/main/inference/1.png)  
+
+1.  **Raw Logits**: Think of these as the model's initial gut feeling about each possible next token.
+2.  **Temperature Control**: This works like a creativity dial. Higher values (>1.0) make choices more random and creative, while lower values (<1.0) make them more focused and deterministic.
+3.  **Top-p (Nucleus) Sampling**: Instead of considering every possible token, we only look at the most likely ones whose probabilities add up to a chosen threshold (for example, the top 90%).
+4.  **Top-k Filtering**: An alternative approach in which we only consider the k most likely next tokens.
+
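The steps above can be sketched in plain Python. This is an illustrative implementation, not the code of any particular library; the function names are made up, and the order in which the filters are applied (temperature, then top-k, then top-p) is an assumption for this example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature rescales the logits first."""
    scaled = [logit / temperature for logit in logits]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    total = sum(probs)
    return [p / total for p in probs]


def filter_top_k(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def filter_top_p(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize([q if i in keep else 0.0 for i, q in enumerate(probs)])


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
token_id = sample(logits, temperature=0.7, top_k=3, top_p=0.9)
```

Lowering the temperature concentrates probability on the top token, while top-k and top-p both remove the unlikely tail before sampling.
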
+### Managing Repetition: Keeping Output Fresh[[managing-repetition-keeping-output-fresh]]
+
+A common challenge with LLMs is their tendency to repeat themselves, much like a speaker who keeps returning to the same points. To address this, we use two types of penalties:
+
+1.  **Presence Penalty**: A fixed penalty applied to any token that has already appeared, regardless of how many times. It helps prevent the model from reusing the same words.
+2.  **Frequency Penalty**: A penalty that grows with the number of times a token has been used. The more often a word has appeared, the less likely it is to be chosen again.
+
+![image](https://huggingface.co/reasoning-course/images/resolve/main/inference/2.png)  
+
+These penalties are applied early in the token selection process, adjusting the raw scores before the other sampling strategies are applied. Think of them as gentle nudges that encourage the model to explore new vocabulary.
+
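One way to picture the two penalties is as a subtraction from the logits of tokens that have already been generated. The subtractive formula below is an assumption for this sketch (it follows one common convention); other systems use different formulas, such as dividing the logits by a repetition penalty.

```python
from collections import Counter


def apply_penalties(logits, generated_ids, presence_penalty=0.0, frequency_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated output.

    Illustrative formula: logit -= presence_penalty + frequency_penalty * count
    """
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= presence_penalty + frequency_penalty * count
    return adjusted


# Token 0 was generated twice, token 2 once, token 1 never.
adjusted = apply_penalties(
    [1.0, 1.0, 1.0], generated_ids=[0, 0, 2],
    presence_penalty=0.5, frequency_penalty=0.1,
)
```

The unused token keeps its score, the token used once is penalized a little, and the token used twice is penalized the most.
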
+### Controlling Generation Length: Setting Boundaries[[controlling-generation-length-setting-boundaries]]
+
+Just as a good story needs proper pacing and length, we need ways to control how much text our LLM generates. This is crucial for practical applications, whether we want a tweet-length response or a full blog post.
+
+We can control generation length in several ways:
+1.  **Token Limits**: Setting minimum and maximum token counts
+2.  **Stop Sequences**: Defining specific patterns that signal the end of generation
+3.  **End-of-Sequence Detection**: Letting the model conclude its response naturally
+
+For example, if we want to generate a single paragraph, we might set a maximum of 100 tokens and use "\n\n" as a stop sequence. This keeps the output focused and appropriately sized for its purpose.
+
+![image](https://huggingface.co/reasoning-course/images/resolve/main/inference/3.png)  
+
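The three stopping mechanisms can be combined in one loop, sketched below. The function and parameter names are hypothetical, and the "stream" is just a list of text pieces standing in for tokens produced by a model.

```python
def generate_with_limits(token_stream, max_new_tokens=100,
                         stop_sequence="\n\n", eos_token="<eos>"):
    """Collect tokens until a token limit, a stop sequence, or EOS is reached."""
    pieces = []
    for count, token in enumerate(token_stream):
        if count >= max_new_tokens or token == eos_token:
            break  # token limit or end-of-sequence detection
        pieces.append(token)
        text = "".join(pieces)
        if stop_sequence and stop_sequence in text:
            return text.split(stop_sequence)[0]  # cut at the stop sequence
    return "".join(pieces)


stream = ["Hello", " world", ".", "\n", "\n", "Next paragraph"]
paragraph = generate_with_limits(stream)
```

Note that the stop sequence is checked on the accumulated text, because it may be split across several tokens, as it is in this example.
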
+### Beam Search: Looking Ahead for Better Coherence[[beam-search-looking-ahead-for-better-coherence]]
+
+While the strategies discussed so far make decisions one token at a time, beam search takes a more holistic approach. Instead of committing to a single choice at each step, it explores several possible paths simultaneously, like a game player thinking several moves ahead.
+
+![image](https://huggingface.co/reasoning-course/images/resolve/main/inference/4.png)  
+
+Here is how it works:
+1.  At each step, maintain several candidate sequences (typically 5-10).
+2.  For each candidate, compute the probabilities of the next token.
+3.  Keep only the most promising combinations of sequences and next tokens.
+4.  Continue this process until the desired length or a stop condition is reached.
+5.  Select the sequence with the highest overall probability.
+
+You can explore beam search visually here:
+
+<iframe
+	src="https://agents-course-beam-search-visualizer.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
+This approach often produces more coherent and grammatically correct text, though it requires more computational resources than simpler methods.
+
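The beam search steps listed earlier can be sketched with a toy model. The probability table is invented for illustration and, unlike a real LLM, depends only on the previous token. Scores are summed log-probabilities, so the highest score corresponds to the highest overall sequence probability.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def beam_search(start, beam_width=2, max_steps=5):
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "<eos>":
                candidates.append((tokens, score))  # finished, carry over
                continue
            for token, prob in MODEL[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the most promising candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(tokens[-1] == "<eos>" for tokens, _ in beams):
            break
    return beams[0]


best_tokens, best_score = beam_search("<s>", beam_width=2)
greedy_tokens, greedy_score = beam_search("<s>", beam_width=1)
```

In this toy setup, a beam width of 1 behaves like greedy decoding and commits to "the" because it is the likeliest first token, while a width of 2 keeps "a" alive long enough to find the sequence with the higher overall probability.
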
+## Practical Challenges and Optimization[[practical-challenges-and-optimization]]
+
+As we wrap up our exploration of LLM inference, let's look at the practical challenges you will face when deploying these models, and at how to measure and optimize their performance.
+
+### Key Performance Metrics[[key-performance-metrics]]
+
+When working with LLMs, four critical metrics will shape your implementation decisions:
+
+1.  **Time to First Token (TTFT)**: How quickly can you get the first piece of the response? This is crucial for user experience and is mainly affected by the prefill phase.
+2.  **Time Per Output Token (TPOT)**: How fast can you generate subsequent tokens? This determines the overall generation speed.
+3.  **Throughput**: How many requests can you handle simultaneously? This affects scaling and cost efficiency.
+4.  **VRAM Usage**: How much GPU memory do you need? This often becomes the primary constraint in real-world applications.
+
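The first two metrics can be computed from timestamps recorded while streaming a response. The helper below is a hypothetical sketch: it assumes you have the time the request was sent and the arrival time of each generated token, all in seconds.

```python
def latency_metrics(request_time, token_times):
    """Compute TTFT and TPOT from a request timestamp and token arrival times."""
    ttft = token_times[0] - request_time
    if len(token_times) > 1:
        # Average gap between consecutive tokens after the first one.
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return {"ttft": ttft, "tpot": tpot}


# Made-up timings: first token after 0.5 s, then one token every 0.1 s.
metrics = latency_metrics(0.0, [0.5, 0.6, 0.7, 0.8])
```
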
+### The Context Length Challenge[[the-context-length-challenge]]
+
+One of the most significant challenges in LLM inference is managing context length effectively. Longer contexts provide more information but come with substantial costs:
+
+- **Memory Usage**: Grows quadratically with context length.
+- **Processing Speed**: Decreases linearly as contexts get longer.
+- **Resource Allocation**: Requires careful balancing of VRAM usage.
+
+Recent models such as [Qwen2.5-1M](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-1M) offer impressive 1M-token context windows, but this comes at the cost of significantly slower inference. The key is finding the right balance for your specific use case.
+
+<div style="max-width: 800px; margin: 20px auto; padding: 20px; 
+font-family: system-ui;">
+    <div style="border: 2px solid #ddd; border-radius: 8px; 
+    padding: 20px; margin-bottom: 20px;">
+        <div style="display: flex; align-items: center; 
+        margin-bottom: 15px;">
+            <div style="flex: 1; text-align: center; padding: 
+            10px; background: #f0f0f0; border-radius: 4px;">
+                Input Text (Raw)
+            </div>
+            <div style="margin: 0 10px;">→</div>
+            <div style="flex: 1; text-align: center; padding: 
+            10px; background: #e1f5fe; border-radius: 4px;">
+                Tokenized Input
+            </div>
+        </div>
+        <div style="display: flex; margin-bottom: 15px;">
+            <div style="flex: 1; border: 1px solid #ccc; 
+            padding: 10px; margin: 5px; background: #e8f5e9; 
+            border-radius: 4px; text-align: center;">
+                Context Window<br/>(e.g., 4K tokens)
+                <div style="display: flex; margin-top: 10px;">
+                    <div style="flex: 1; background: #81c784; 
+                    margin: 2px; height: 20px; border-radius: 
+                    2px;"></div>
+                    <div style="flex: 1; background: #81c784; 
+                    margin: 2px; height: 20px; border-radius: 
+                    2px;"></div>
+                    <div style="flex: 1; background: #81c784; 
+                    margin: 2px; height: 20px; border-radius: 
+                    2px;"></div>
+                    <div style="flex: 1; background: #81c784; 
+                    margin: 2px; height: 20px; border-radius: 
+                    2px;"></div>
+                </div>
+            </div>
+        </div>
+        <div style="display: flex; justify-content: 
+        space-between; text-align: center; font-size: 0.9em; 
+        color: #666;">
+            <div style="flex: 1;">
+                <div style="border: 1px solid #ffcc80; padding: 
+                8px; margin: 5px; background: #fff3e0; 
+                border-radius: 4px;">
+                    Memory Usage<br/>∝ Length²
+                </div>
+            </div>
+            <div style="flex: 1;">
+                <div style="border: 1px solid #90caf9; padding: 
+                8px; margin: 5px; background: #e3f2fd; 
+                border-radius: 4px;">
+                    Processing Time<br/>∝ Length
+                </div>
+            </div>
+        </div>
+    </div>
+</div>
+
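To see where the quadratic term comes from, consider a naive attention implementation that stores one score for every pair of tokens. The function below is back-of-the-envelope arithmetic under that assumption, with a hypothetical default of 2 bytes per value; memory-efficient attention implementations avoid materializing the full matrix, so real numbers can be much lower.

```python
def attention_matrix_bytes(context_length, num_heads=1, bytes_per_value=2):
    """Size of a full context_length x context_length score matrix per layer."""
    return context_length * context_length * num_heads * bytes_per_value


small = attention_matrix_bytes(4096)
large = attention_matrix_bytes(8192)
ratio = large // small  # doubling the context length quadruples the size
```
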
+### The KV Cache Optimization[[the-kv-cache-optimization]]
+
+To address these challenges, one of the most powerful optimizations is KV (Key-Value) caching. This technique significantly improves inference speed by storing and reusing intermediate calculations. This optimization:
+- Reduces repeated calculations
+- Improves generation speed
+- Makes long-context generation practical
+
+The trade-off is additional memory usage, but the performance benefits usually far outweigh this cost.
+
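The saving from KV caching can be illustrated by counting how many key/value computations a decode loop performs. Everything below is a toy model: `project` is a hypothetical stand-in for the key and value projections of one token, and the cache is a pair of Python lists rather than GPU tensors.

```python
class KVCache:
    """Stores the keys and values computed for every token seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def project(token_id):
    """Stand-in for computing one token's key and value (counts its calls)."""
    project.calls += 1
    return float(token_id), float(token_id) * 2.0


project.calls = 0


def decode_without_cache(token_ids):
    kv = []
    for step in range(1, len(token_ids) + 1):
        # Every step recomputes keys/values for ALL tokens seen so far.
        kv = [project(t) for t in token_ids[:step]]
    return kv


def decode_with_cache(token_ids):
    cache = KVCache()
    for t in token_ids:
        # Only the new token is projected; earlier results are reused.
        cache.append(*project(t))
    return list(zip(cache.keys, cache.values))


ids = [3, 1, 4, 1, 5]
uncached = decode_without_cache(ids)
uncached_calls = project.calls

project.calls = 0
cached = decode_with_cache(ids)
cached_calls = project.calls
```

For five tokens, the uncached loop performs 15 projections and the cached loop performs 5, while both end up with the same keys and values. The cache itself is the extra memory cost mentioned above.
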
+## Conclusion[[conclusion]]
+
+Understanding LLM inference is crucial for deploying and optimizing these powerful models effectively. We have covered the key components:
+
+- The fundamental role of attention and context
+- The two-phase inference process
+- The various sampling strategies for controlling generation
+- Practical challenges and optimizations
+
+By mastering these concepts, you will be better prepared to build applications that use LLMs effectively and efficiently.
+
+Remember that the field of LLM inference is evolving rapidly, with new techniques and optimizations emerging regularly. Stay curious and keep experimenting with different approaches to find what works best for your specific use case.
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Inference**: လေ့ကျင့်ပြီးသား Artificial Intelligence (AI) မော်ဒယ်တစ်ခုကို အသုံးပြုပြီး input data ကနေ ခန့်မှန်းချက်တွေ ဒါမှမဟုတ် output တွေကို ထုတ်လုပ်တဲ့ လုပ်ငန်းစဉ်။
+*   **Large Language Models (LLMs)**: လူသားဘာသာစကားကို နားလည်ပြီး ထုတ်လုပ်ပေးနိုင်တဲ့ အလွန်ကြီးမားတဲ့ Artificial Intelligence (AI) မော်ဒယ်တွေ ဖြစ်ပါတယ်။ ၎င်းတို့ဟာ ဒေတာအမြောက်အမြားနဲ့ သင်ကြားလေ့ကျင့်ထားပြီး စာရေးတာ၊ မေးခွန်းဖြေတာ စတဲ့ ဘာသာစကားဆိုင်ရာ လုပ်ငန်းမျိုးစုံကို လုပ်ဆောင်နိုင်ပါတယ်။
+*   **Transformer Architecture**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Text Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ လူသားကဲ့သို့သော စာသားအသစ်များ ဖန်တီးခြင်း။
+*   **Text Classification**: စာသားတစ်ခုကို သတ်မှတ်ထားသော အမျိုးအစားများ သို့မဟုတ် အတန်းများထဲသို့ ခွဲခြားသတ်မှတ်ခြင်း။
+*   **Summarization**: စာသားရှည်ကြီးတစ်ခုကို အဓိကအချက်အလက်များ မပျောက်ပျက်စေဘဲ အကျဉ်းချုံးဖော်ပြခြင်း။
+*   **Prompt**: Large Language Models (LLMs) ကို တိကျသောလုပ်ငန်းတစ်ခု လုပ်ဆောင်ရန် သို့မဟုတ် အချက်အလက်ပေးရန်အတွက် ပေးပို့သော input text သို့မဟုတ် မေးခွန်း။
+*   **Parameters**: Machine Learning မော်ဒယ်တစ်ခု၏ သင်ယူနိုင်သော အစိတ်အပိုင်းများ။ ၎င်းတို့သည် လေ့ကျင့်နေစဉ်အတွင်း ဒေတာများမှ ပုံစံများကို သင်ယူကာ ချိန်ညှိပေးသည်။
+*   **Token**: စာသား (သို့မဟုတ် အခြားဒေတာ) ကို AI မော်ဒယ်များ စီမံဆောင်ရွက်နိုင်ရန် ပိုင်းခြားထားသော အသေးငယ်ဆုံးယူနစ်။ စကားလုံး၊ စာလုံးတစ်ပိုင်း သို့မဟုတ် တစ်ခုတည်းသော စာလုံးတစ်လုံး ဖြစ်နိုင်သည်။
+*   **Sequence**: အစဉ်လိုက် စီစဉ်ထားသော tokens များ။
+*   **Coherent**: ယုတ္တိရှိရှိ ဆက်စပ်နေခြင်း၊ နားလည်လွယ်ခြင်း။
+*   **Context**: စကားလုံး၊ စာကြောင်း သို့မဟုတ် အကြောင်းအရာတစ်ခုရဲ့ အဓိပ္ပာယ်ကို နားလည်စေရန် ကူညီပေးသော ပတ်ဝန်းကျင်ရှိ အချက်အလက်များ။
+*   **Attention Mechanism**: Transformer မော်ဒယ်များတွင် အသုံးပြုသော နည်းစနစ်တစ်ခုဖြစ်ပြီး input sequence အတွင်းရှိ အရေးပါသော အစိတ်အပိုင်းများကို အာရုံစိုက်ပြီး ဆက်နွယ်မှုများကို သင်ယူစေသည်။
+*   **BERT (Bidirectional Encoder Representations from Transformers)**: Google မှ တီထွင်ထားသော Transformer-based NLP မော်ဒယ်တစ်ခု။
+*   **GPT-2 (Generative Pre-trained Transformer 2)**: OpenAI မှ တီထွင်ထားသော Transformer-based NLP မော်ဒယ်တစ်ခု။
+*   **Neural Networks**: လူသားဦးနှောက်၏ လုပ်ဆောင်မှုပုံစံကို အတုယူထားသော ကွန်ပျူတာစနစ်များ။
+*   **Context Length**: Large Language Model (LLM) တစ်ခုက တစ်ကြိမ်တည်း လုပ်ဆောင်နိုင်သော အများဆုံး token အရေအတွက်။
+*   **Working Memory**: မော်ဒယ်က လက်ရှိလုပ်ငန်းဆောင်တာအတွက် လိုအပ်တဲ့ အချက်အလက်တွေကို ခဏတာ ထိန်းသိမ်းထားတဲ့ မှတ်ဉာဏ်။
+*   **Hardware Constraints**: ကွန်ပျူတာစနစ်ရဲ့ ရုပ်ပိုင်းဆိုင်ရာ ကန့်သတ်ချက်များ (ဥပမာ- GPU memory, processing power)။
+*   **Computational Costs**: ကွန်ပျူတာအရင်းအမြစ်များ (ဥပမာ- လျှပ်စစ်ဓာတ်အား၊ စက်အချိန်) အသုံးပြုခြင်းအတွက် ကုန်ကျစရိတ်။
+*   **Tokenization**: input text ကို AI မော်ဒယ် နားလည်နိုင်တဲ့ tokens တွေအဖြစ် ပြောင်းလဲတဲ့ လုပ်ငန်းစဉ်။
+*   **Embeddings**: tokens တွေရဲ့ အဓိပ္ပာယ်ကို ဖမ်းယူထားတဲ့ ဂဏန်းဆိုင်ရာ ကိုယ်စားပြုမှုများ။
+*   **Autoregressive Process**: နောက်ထပ်ထုတ်လုပ်မယ့် output က ယခင်ထုတ်လုပ်ခဲ့တဲ့ outputs အားလုံးပေါ် မူတည်နေတဲ့ လုပ်ငန်းစဉ်။
+*   **Logits**: မော်ဒယ်က ဖြစ်နိုင်ခြေရှိတဲ့ နောက်ထပ် token တစ်ခုစီအတွက် ထုတ်ပေးတဲ့ raw scores များ။ ၎င်းတို့ဟာ normalize မလုပ်ရသေးတဲ့ တန်ဖိုးတွေဖြစ်ပြီး softmax ကို ဖြတ်သန်းပြီးမှသာ ဖြစ်နိုင်ခြေတွေ ဖြစ်လာပါတယ်။
+*   **Temperature Control**: LLM ရဲ့ output မှာ ကျပန်းဆန်မှု (randomness) သို့မဟုတ် တီထွင်ဖန်တီးမှု (creativity) ပမာဏကို ထိန်းညှိရန် အသုံးပြုသော parameter တစ်ခု။
+*   **Top-p (Nucleus) Sampling**: နောက်ထပ် token ကို ရွေးချယ်ရာတွင် ဖြစ်နိုင်ခြေအမြင့်ဆုံး token များမှစ၍ စုစုပေါင်းဖြစ်နိုင်ခြေသည် သတ်မှတ်ထားသော ကန့်သတ်ချက် p သို့ ရောက်သည်အထိ ပေါင်းယူပြီး ထို token များကိုသာ ထည့်သွင်းစဉ်းစားခြင်း။
+*   **Top-k Filtering**: နောက်ထပ် token ကို ရွေးချယ်ရာတွင် အဖြစ်နိုင်ဆုံး k လုံးကိုသာ ထည့်သွင်းစဉ်းစားခြင်း။
+*   **Presence Penalty**: ယခင်က ပေါ်ထွက်ဖူးသော token များအတွက် ပုံသေပြစ်ဒဏ်ချမှတ်ခြင်း။
+*   **Frequency Penalty**: ယခင်က ပေါ်ထွက်ဖူးသော token များအတွက် ၎င်းတို့ပေါ်ထွက်သည့် အကြိမ်အရေအတွက်အလိုက် ပြစ်ဒဏ်ကို တိုးမြှင့်ချမှတ်ခြင်း။
+*   **Token Limits**: ထုတ်လုပ်မည့် token အရေအတွက်အတွက် အနည်းဆုံးနှင့် အများဆုံး ကန့်သတ်ချက်များ။
+*   **Stop Sequences**: စာသားထုတ်လုပ်မှုကို ရပ်တန့်ရန် အချက်ပြသည့် သတ်မှတ်ထားသော စာသားပုံစံများ။
+*   **End-of-Sequence (EOS) Token**: မော်ဒယ်က စာသားထုတ်လုပ်မှုကို ပြီးဆုံးရန် အချက်ပြသည့် အထူး token တစ်ခု။
+*   **Beam Search**: စာသားထုတ်လုပ်ရာတွင် ဖြစ်နိုင်ခြေအကောင်းဆုံး sequence များစွာကို တစ်ပြိုင်နက်တည်း ရှာဖွေပြီး အကောင်းဆုံးကို ရွေးချယ်သည့် နည်းဗျူဟာ။
+*   **Time to First Token (TTFT)**: LLM တစ်ခုက input prompt ကို လက်ခံရရှိပြီးနောက် ပထမဆုံး token ကို ထုတ်လုပ်ရန် ကြာမြင့်သော အချိန်။
+*   **Time Per Output Token (TPOT)**: LLM တစ်ခုက နောက်ဆက်တွဲ output token တစ်ခုစီကို ထုတ်လုပ်ရန် ကြာမြင့်သော အချိန်။
+*   **Throughput**: LLM စနစ်တစ်ခုက သတ်မှတ်ထားသော အချိန်ကာလတစ်ခုအတွင်း လုပ်ဆောင်နိုင်သော requests အရေအတွက်။
+*   **VRAM Usage**: GPU (Graphics Processing Unit) ၏ memory အသုံးပြုမှု။
+*   **KV (Key-Value) Caching**: LLM inference တွင် အကြားအချက်အလက်များကို သိုလှောင်ပြီး ပြန်လည်အသုံးပြုခြင်းဖြင့် လုပ်ဆောင်မှုအမြန်နှုန်းကို မြှင့်တင်ပေးသော နည်းပညာ။
\ No newline at end of file
diff --git a/chapters/my/chapter1/9.mdx b/chapters/my/chapter1/9.mdx
new file mode 100644
index 000000000..71141e5da
--- /dev/null
+++ b/chapters/my/chapter1/9.mdx
@@ -0,0 +1,52 @@
+# ဘက်လိုက်မှုနှင့် ကန့်သတ်ချက်များ[[bias-and-limitations]]
+
+<CourseFloatingBanner chapter={1}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter1/section8.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter1/section8.ipynb"},
+]} />
+
+အကယ်၍ သင်က pre-trained model တစ်ခုကို ဒါမှမဟုတ် fine-tuned version တစ်ခုကို ထုတ်လုပ်မှု (production) မှာ အသုံးပြုဖို့ ရည်ရွယ်တယ်ဆိုရင်၊ ဒီမော်ဒယ်တွေဟာ အစွမ်းထက်တဲ့ ကိရိယာတွေဖြစ်ပေမယ့် ကန့်သတ်ချက်တွေနဲ့ လာတယ်ဆိုတာကို သတိပြုသင့်ပါတယ်။ အကြီးဆုံး ကန့်သတ်ချက်ကတော့ ဒေတာပမာဏများစွာပေါ်မှာ pre-training လုပ်နိုင်ဖို့အတွက် သုတေသီတွေဟာ အင်တာနက်ပေါ်က တွေ့သမျှ အကြောင်းအရာအားလုံးကို ရယူကြပြီး၊ အကောင်းဆုံးအရာတွေရော အဆိုးဆုံးအရာတွေရော ပါဝင်လာတတ်ပါတယ်။
+
+ဥပမာအနေနဲ့ ရှင်းပြရရင် BERT မော်ဒယ်ကို အသုံးပြုထားတဲ့ `fill-mask` pipeline ဥပမာကို ပြန်သွားကြည့်ရအောင်။
+
+```python
+from transformers import pipeline
+
+unmasker = pipeline("fill-mask", model="bert-base-uncased")
+result = unmasker("This man works as a [MASK].")
+print([r["token_str"] for r in result])
+
+result = unmasker("This woman works as a [MASK].")
+print([r["token_str"] for r in result])
+```
+
+```python out
+['lawyer', 'carpenter', 'doctor', 'waiter', 'mechanic']
+['nurse', 'waitress', 'teacher', 'maid', 'prostitute']
+```
+
+ဒီစာကြောင်းနှစ်ကြောင်းမှာ ပျောက်ဆုံးနေတဲ့ စကားလုံးကို ဖြည့်ဖို့ မေးတဲ့အခါ၊ မော်ဒယ်က လိင်ကွဲပြားမှုမရှိတဲ့ အဖြေတစ်ခု (waiter/waitress) ကိုသာ ပေးပါတယ်။ ကျန်တဲ့ အလုပ်အကိုင်တွေကတော့ သီးခြားလိင်နဲ့ ပုံမှန်အားဖြင့် ဆက်စပ်နေတဲ့ အလုပ်အကိုင်တွေ ဖြစ်ပါတယ်။ ဟုတ်ပါတယ်၊ "prostitute" က "woman" နဲ့ "work" တို့နဲ့ မော်ဒယ်က ဆက်စပ်တဲ့ ဖြစ်နိုင်ခြေ ထိပ်ဆုံး ၅ ခုထဲမှာ ပါဝင်ခဲ့ပါတယ်။ BERT ဟာ အင်တာနက်တစ်လျှောက်ကနေ ဒေတာတွေကို scrape လုပ်ပြီး တည်ဆောက်ထားတာ မဟုတ်တဲ့ ရှားပါး Transformer မော်ဒယ်တွေထဲက တစ်ခုဖြစ်ပြီး၊ ကြည့်ရတာ ကြားနေတဲ့ ဒေတာ (English Wikipedia နဲ့ BookCorpus datasets တွေနဲ့ လေ့ကျင့်ထားပါတယ်) ကို အသုံးပြုထားတာ ဖြစ်ပေမယ့်လည်း ဒီလို ဘက်လိုက်မှုမျိုး ဖြစ်ပေါ်နေပါတယ်။
+
+ဒီကိရိယာတွေကို အသုံးပြုတဲ့အခါ သင်သုံးနေတဲ့ မူရင်းမော်ဒယ်ဟာ sexist၊ racist ဒါမှမဟုတ် homophobic အကြောင်းအရာတွေကို အလွန်လွယ်ကူစွာ ထုတ်လုပ်နိုင်တယ်ဆိုတာကို သတိရနေဖို့ လိုအပ်ပါတယ်။ သင်ရဲ့ ဒေတာပေါ်မှာ မော်ဒယ်ကို fine-tuning လုပ်တာဟာ ဒီအတွင်းပိုင်း ဘက်လိုက်မှုကို ပျောက်ကွယ်သွားစေမှာ မဟုတ်ပါဘူး။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Bias**: ဒေတာအစုအဝေး (dataset) သို့မဟုတ် မော်ဒယ်၏ လေ့ကျင့်မှုပုံစံကြောင့် ဖြစ်ပေါ်လာသော ဘက်လိုက်မှုများ။ ဥပမာ - လူမျိုး၊ လိင်၊ ဘာသာ စသည်တို့ကို ခွဲခြားဆက်ဆံခြင်း။
+*   **Limitations**: AI မော်ဒယ်များ၏ လုပ်ဆောင်နိုင်စွမ်းနှင့် ပတ်သက်သော ကန့်သတ်ချက်များ၊ အားနည်းချက်များ။
+*   **Pretrained Model**: ဒေတာအမြောက်အမြားပေါ်တွင် ကြိုတင်လေ့ကျင့်ထားပြီးသား Artificial Intelligence (AI) မော်ဒယ်တစ်ခု။ ၎င်းတို့ကို အခြားလုပ်ငန်းများအတွက် အခြေခံအဖြစ် ပြန်လည်အသုံးပြုနိုင်သည်။
+*   **Fine-tuned Version**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် အနည်းငယ်သော ဒေတာနဲ့ ထပ်မံလေ့ကျင့်ပေးထားသော မော်ဒယ်၏ ပုံစံ။
+*   **Production**: ဆော့ဖ်ဝဲလ် သို့မဟုတ် မော်ဒယ်တစ်ခုကို အမှန်တကယ် အသုံးပြုနေသော လက်တွေ့ပတ်ဝန်းကျင် သို့မဟုတ် စနစ်။
+*   **Transformer Models**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Scrape**: အင်တာနက်ပေါ်မှ ဒေတာများကို အလိုအလျောက် စုဆောင်းခြင်း။
+*   **`fill-mask` pipeline**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ function တစ်ခုဖြစ်ပြီး input text ထဲက `[MASK]` နေရာမှာ ပျောက်ဆုံးနေတဲ့ စကားလုံးကို ခန့်မှန်းပြီး ဖြည့်စွက်ပေးတဲ့ လုပ်ငန်းဆောင်တာ။
+*   **BERT (Bidirectional Encoder Representations from Transformers)**: Google မှ တီထွင်ထားသော Transformer-based NLP မော်ဒယ်တစ်ခု။
+*   **`bert-base-uncased`**: BERT မော်ဒယ်၏ အခြေခံဗားရှင်း (base version) ဖြစ်ပြီး စာလုံးအကြီးအသေး ခွဲခြားခြင်းမရှိ (uncased) ဘဲ လေ့ကျင့်ထားသည်။
+*   **`token_str`**: ထုတ်လုပ်လိုက်သော token ကို ကိုယ်စားပြုသော စာသား string။
+*   **English Wikipedia**: အင်္ဂလိပ်ဘာသာစကားဖြင့် ရေးသားထားသော Wikipedia စွယ်စုံကျမ်း၏ အချက်အလက်များ။
+*   **BookCorpus**: စာအုပ်များစွာမှ စုဆောင်းထားသော စာသားဒေတာအစုအဝေးတစ်ခု။
+*   **Dataset**: AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် အသုံးပြုတဲ့ ဒေတာအစုအဝေးတစ်ခုပါ။
+*   **Sexist**: လိင်အပေါ်အခြေခံပြီး ခွဲခြားဆက်ဆံခြင်း သို့မဟုတ် ဘက်လိုက်ခြင်း။
+*   **Racist**: လူမျိုးအပေါ်အခြေခံပြီး ခွဲခြားဆက်ဆံခြင်း သို့မဟုတ် ဘက်လိုက်ခြင်း။
+*   **Homophobic**: လိင်တူချစ်သူများကို မနှစ်သက်ခြင်း သို့မဟုတ် ခွဲခြားဆက်ဆံခြင်း။
\ No newline at end of file
diff --git a/chapters/my/chapter2/1.mdx b/chapters/my/chapter2/1.mdx
new file mode 100644
index 000000000..a6f0750ad
--- /dev/null
+++ b/chapters/my/chapter2/1.mdx
@@ -0,0 +1,53 @@
+# နိဒါန်း[[introduction]]
+
+<CourseFloatingBanner
+    chapter={2}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+[Chapter 1](/course/chapter1) မှာ သင်တွေ့ခဲ့ရသလို Transformer မော်ဒယ်တွေဟာ များသောအားဖြင့် အရွယ်အစား အလွန်ကြီးမားပါတယ်။ Parameters သန်းပေါင်းများစွာကနေ ဘီလီယံပေါင်းများစွာအထိ ရှိတာကြောင့် ဒီမော်ဒယ်တွေကို လေ့ကျင့်တာနဲ့ အသုံးပြုတာ (deploy) ဟာ ရှုပ်ထွေးတဲ့ လုပ်ငန်းစဉ်တစ်ခု ဖြစ်ပါတယ်။ ဒီအပြင်၊ မော်ဒယ်အသစ်တွေ နေ့တိုင်းနီးပါး ထွက်ပေါ်လာပြီး တစ်ခုချင်းစီမှာ သူ့ရဲ့ကိုယ်ပိုင် implement လုပ်ပုံတွေရှိတာကြောင့် ဒါတွေကို အားလုံး စမ်းသပ်ကြည့်ဖို့က မလွယ်ပါဘူး။
+
+🤗 Transformers library ကို ဒီပြဿနာကို ဖြေရှင်းဖို့အတွက် ဖန်တီးခဲ့တာပါ။ သူ့ရဲ့ ရည်ရွယ်ချက်ကတော့ Transformer မော်ဒယ်တိုင်းကို load လုပ်နိုင်၊ train လုပ်နိုင်ပြီး save လုပ်နိုင်တဲ့ API တစ်ခုတည်းကို ပံ့ပိုးပေးဖို့ပါပဲ။ library ရဲ့ အဓိကအင်္ဂါရပ်တွေကတော့-
+
+-   **အသုံးပြုရလွယ်ကူမှု**: state-of-the-art Natural Language Processing (NLP) မော်ဒယ်တစ်ခုကို inference အတွက် download လုပ်တာ၊ load လုပ်တာနဲ့ အသုံးပြုတာကို code နှစ်ကြောင်းတည်းနဲ့ လုပ်ဆောင်နိုင်ပါတယ်။
+-   **ပြောင်းလွယ်ပြင်လွယ်မှု (Flexibility)**: အခြေခံအားဖြင့် မော်ဒယ်အားလုံးဟာ ရိုးရှင်းတဲ့ PyTorch `nn.Module` classes တွေဖြစ်ပြီး ၎င်းတို့ရဲ့ သက်ဆိုင်ရာ machine learning (ML) frameworks တွေထဲက တခြားမော်ဒယ်တွေလိုမျိုး ကိုင်တွယ်နိုင်ပါတယ်။
+-   **ရိုးရှင်းမှု (Simplicity)**: library တစ်လျှောက်လုံးမှာ abstraction တွေဟာ သိပ်မရှိပါဘူး။ "All in one file" ဆိုတာက အဓိကသဘောတရားတစ်ခုပါ- မော်ဒယ်တစ်ခုရဲ့ forward pass ကို file တစ်ခုတည်းမှာ အပြည့်အစုံ သတ်မှတ်ထားတာကြောင့် code ကို နားလည်ရလွယ်ကူပြီး ပြင်ဆင်ရ လွယ်ကူပါတယ်။
+
+ဒီနောက်ဆုံးအင်္ဂါရပ်က 🤗 Transformers ကို အခြားသော ML library တွေနဲ့ အတော်လေး ကွဲပြားစေပါတယ်။ မော်ဒယ်တွေကို file တွေတစ်လျှောက် မျှဝေထားတဲ့ modules တွေနဲ့ တည်ဆောက်ထားတာ မဟုတ်ပါဘူး။ အဲဒီအစား မော်ဒယ်တစ်ခုစီမှာ သူ့ကိုယ်ပိုင် layers တွေ ရှိပါတယ်။ ဒါက မော်ဒယ်တွေကို ပိုမိုနားလည်ရလွယ်ကူပြီး လက်လှမ်းမီစေတဲ့အပြင်၊ မော်ဒယ်တစ်ခုပေါ်မှာ အခြားမော်ဒယ်တွေကို မထိခိုက်စေဘဲ အလွယ်တကူ စမ်းသပ်နိုင်စေပါတယ်။
+
+ဒီအခန်းကို end-to-end ဥပမာတစ်ခုနဲ့ စတင်ပါမယ်။ ဒီဥပမာမှာ ကျွန်တော်တို့ဟာ [Chapter 1](/course/chapter1) မှာ မိတ်ဆက်ခဲ့တဲ့ `pipeline()` function ကို ပြန်လည်ဖန်တီးဖို့အတွက် မော်ဒယ်တစ်ခုနဲ့ tokenizer တစ်ခုကို ပေါင်းပြီး အသုံးပြုပါမယ်။ နောက်တစ်ဆင့်အနေနဲ့ model API ကို ဆွေးနွေးပါမယ်- မော်ဒယ်နဲ့ configuration classes တွေထဲကို နက်ရှိုင်းစွာ လေ့လာပြီး၊ မော်ဒယ်တစ်ခုကို ဘယ်လို load လုပ်ရမယ်၊ ပြီးတော့ ဂဏန်းဆိုင်ရာ inputs တွေကို output predictions တွေအဖြစ် ဘယ်လိုလုပ်ဆောင်တယ်ဆိုတာကို သင်ပြပါမယ်။
+
+အဲဒီနောက် `pipeline()` function ရဲ့ အခြားအဓိက အစိတ်အပိုင်းဖြစ်တဲ့ tokenizer API ကို ကြည့်ပါမယ်။ Tokenizers တွေက ပထမဆုံးနဲ့ နောက်ဆုံး လုပ်ဆောင်မှုအဆင့်တွေကို ကိုင်တွယ်ပေးပြီး၊ စာသားကနေ neural network အတွက် ဂဏန်းဆိုင်ရာ inputs တွေအဖြစ် ပြောင်းလဲခြင်းနဲ့ လိုအပ်တဲ့အခါ စာသားအဖြစ် ပြန်ပြောင်းလဲခြင်းတို့ကို လုပ်ဆောင်ပေးပါတယ်။ နောက်ဆုံးအနေနဲ့၊ မော်ဒယ်တစ်ခုကနေတဆင့် စာကြောင်းများစွာကို batch အဖြစ် ပေးပို့တာကို ဘယ်လိုကိုင်တွယ်ရမလဲဆိုတာ သင်ပြပြီး၊ အဆင့်မြင့် `tokenizer()` function ကို ပိုမိုနက်ရှိုင်းစွာ လေ့လာခြင်းဖြင့် အားလုံးကို အပြီးသတ်ပါမယ်။
+
+> [!TIP]
+> ⚠️ Model Hub နဲ့ 🤗 Transformers မှာ ရရှိနိုင်တဲ့ အင်္ဂါရပ်အားလုံးကို ရယူဖို့အတွက် [account တစ်ခု ဖန်တီး](https://huggingface.co/join) ဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။
+
+---
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Transformer Models**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Parameters**: Machine Learning မော်ဒယ်တစ်ခု၏ သင်ယူနိုင်သော အစိတ်အပိုင်းများ။ ၎င်းတို့သည် လေ့ကျင့်နေစဉ်အတွင်း ဒေတာများမှ ပုံစံများကို သင်ယူကာ ချိန်ညှိပေးသည်။
+*   **Deploying**: Machine Learning မော်ဒယ်တစ်ခုကို အမှန်တကယ် အသုံးပြုနိုင်သော စနစ် သို့မဟုတ် environment တစ်ခုထဲသို့ ထည့်သွင်းခြင်း။
+*   **🤗 Transformers Library**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး Transformer မော်ဒယ်တွေကို အသုံးပြုပြီး Natural Language Processing (NLP), computer vision, audio processing စတဲ့ နယ်ပယ်တွေမှာ အဆင့်မြင့် AI မော်ဒယ်တွေကို တည်ဆောက်ပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **API (Application Programming Interface)**: ဆော့ဖ်ဝဲလ် နှစ်ခုကြား အပြန်အလှန် ချိတ်ဆက်ဆောင်ရွက်နိုင်ရန် လမ်းကြောင်းဖွင့်ပေးသော စည်းမျဉ်းအစုအဝေး (set of rules)။
+*   **State-of-the-art (SOTA)**: လက်ရှိအချိန်တွင် အကောင်းဆုံး သို့မဟုတ် အဆင့်မြင့်ဆုံး စွမ်းဆောင်ရည်ကို ပြသနိုင်သော နည်းပညာ သို့မဟုတ် မော်ဒယ်။
+*   **NLP (Natural Language Processing)**: ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး၊ ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ Artificial Intelligence (AI) ရဲ့ နယ်ပယ်ခွဲတစ်ခု ဖြစ်ပါတယ်။
+*   **Inference**: လေ့ကျင့်ပြီးသား Artificial Intelligence (AI) မော်ဒယ်တစ်ခုကို အသုံးပြုပြီး input data ကနေ ခန့်မှန်းချက်တွေ ဒါမှမဟုတ် output တွေကို ထုတ်လုပ်တဲ့ လုပ်ငန်းစဉ်။
+*   **Flexibility**: ပြောင်းလွယ်ပြင်လွယ်ရှိခြင်း၊ အခြေအနေအမျိုးမျိုးနဲ့ လိုက်လျောညီထွေစွာ အသုံးပြုနိုင်ခြင်း။
+*   **PyTorch `nn.Module` classes**: PyTorch deep learning framework မှာ Neural Network layers တွေနဲ့ models တွေကို တည်ဆောက်ဖို့အတွက် အသုံးပြုတဲ့ အခြေခံ class တွေ။
+*   **Machine Learning (ML) Frameworks**: Machine learning မော်ဒယ်များကို တည်ဆောက်ရန်၊ လေ့ကျင့်ရန်နှင့် အသုံးပြုရန်အတွက် ကိရိယာများနှင့် library များ စုစည်းမှု (ဥပမာ - PyTorch, TensorFlow)။
+*   **Abstractions**: ကွန်ပျူတာပရိုဂရမ်းမင်းတွင် ရှုပ်ထွေးသောအသေးစိတ်အချက်အလက်များကို ဝှက်ထားပြီး အရေးကြီးသော အချက်များကိုသာ ပြသခြင်း။
+*   **Forward Pass**: Neural Network တစ်ခုတွင် input data ကို ယူပြီး network layers များကို ဖြတ်သန်းကာ output prediction ကို ထုတ်လုပ်သည့် လုပ်ငန်းစဉ်။
+*   **End-to-end Example**: စနစ်တစ်ခု၏ စတင်ခြင်းမှ အဆုံးအထိ အပြည့်အစုံ ပြသထားသော ဥပမာ။
+*   **`pipeline()` function**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ လုပ်ဆောင်ချက်တစ်ခုဖြစ်ပြီး မော်ဒယ်တွေကို သီးခြားလုပ်ငန်းတာဝန်များ (ဥပမာ- စာသားခွဲခြားသတ်မှတ်ခြင်း၊ စာသားထုတ်လုပ်ခြင်း) အတွက် အသုံးပြုရလွယ်ကူအောင် ပြုလုပ်ပေးပါတယ်။
+*   **Tokenizer**: စာသား (သို့မဟုတ် အခြားဒေတာ) ကို AI မော်ဒယ်များ စီမံဆောင်ရွက်နိုင်ရန် tokens တွေအဖြစ် ပိုင်းခြားပေးသည့် ကိရိယာ သို့မဟုတ် လုပ်ငန်းစဉ်။
+*   **Model API**: မော်ဒယ်တစ်ခုကို ပရိုဂရမ်ကနေ ဘယ်လို ဝင်ရောက်အသုံးပြုနိုင်မလဲဆိုတာကို သတ်မှတ်ပေးတဲ့ interface။
+*   **Configuration Classes**: Transformer မော်ဒယ်တစ်ခု၏ architecture နှင့် hyperparameters များကို သတ်မှတ်ပေးသော Python classes များ။
+*   **Numerical Inputs**: ကွန်ပျူတာစနစ်များက လုပ်ဆောင်နိုင်သော ဂဏန်းပုံစံဖြင့် ဖော်ပြထားသော အချက်အလက်များ။
+*   **Output Predictions**: မော်ဒယ်က input ကို အခြေခံပြီး ခန့်မှန်းထုတ်ပေးသော ရလဒ်များ။
+*   **Tokenizer API**: Tokenizer တစ်ခုကို ပရိုဂရမ်ကနေ ဘယ်လို ဝင်ရောက်အသုံးပြုနိုင်မလဲဆိုတာကို သတ်မှတ်ပေးတဲ့ interface။
+*   **Neural Network**: လူသားဦးနှောက်၏ လုပ်ဆောင်မှုပုံစံကို အတုယူထားသော ကွန်ပျူတာစနစ်များ။
+*   **Batch**: မော်ဒယ်တစ်ခုက တစ်ပြိုင်နက်တည်း လုပ်ဆောင်ရန်အတွက် စုစည်းထားသော inputs အများအပြား။
+*   **Model Hub**: Hugging Face ပေါ်ရှိ pre-trained model များနှင့် datasets များကို ရှာဖွေ၊ မျှဝေပြီး အသုံးပြုနိုင်သော online platform။
+*   **Hugging Face Account**: Hugging Face ပလက်ဖောင်းပေါ်ရှိ သုံးစွဲသူအကောင့်။ ၎င်းသည် မော်ဒယ်များ၊ datasets များနှင့် အခြားအရင်းအမြစ်များကို ဝင်ရောက်ကြည့်ရှုရန် ခွင့်ပြုသည်။
\ No newline at end of file
diff --git a/chapters/my/chapter2/2.mdx b/chapters/my/chapter2/2.mdx
new file mode 100644
index 000000000..582918623
--- /dev/null
+++ b/chapters/my/chapter2/2.mdx
@@ -0,0 +1,289 @@
+<FrameworkSwitchCourse {fw} />
+
+# Pipeline နောက်ကွယ်မှ အကြောင်းအရာများ[[behind-the-pipeline]]
+
+<CourseFloatingBanner chapter={2}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section2_pt.ipynb"},
+]} />
+
+<Youtube id="1pedAIvTWXk"/>
+
+ဥပမာအပြည့်အစုံတစ်ခုနဲ့ စလိုက်ရအောင်။ [Chapter 1](/course/chapter1) မှာ အောက်ပါ code ကို run ခဲ့တုန်းက နောက်ကွယ်မှာ ဘာတွေ ဖြစ်ပျက်ခဲ့လဲဆိုတာ ကြည့်ကြပါမယ်။
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("sentiment-analysis")
+classifier(
+    [
+        "I've been waiting for a HuggingFace course my whole life.",
+        "I hate this so much!",
+    ]
+)
+```
+
+အောက်ပါရလဒ်ကို ရရှိခဲ့ပါတယ်။
+
+```python out
+[{'label': 'POSITIVE', 'score': 0.9598047137260437},
+ {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
+```
+
+[Chapter 1](/course/chapter1) မှာ ကျွန်တော်တို့ တွေ့ခဲ့ရသလို၊ ဒီ pipeline ဟာ အဆင့်သုံးဆင့်ကို ပေါင်းစပ်ထားပါတယ်၊ preprocessing လုပ်ခြင်း၊ model ကနေတဆင့် inputs တွေကို ပေးပို့ခြင်း၊ နဲ့ postprocessing လုပ်ခြင်းတို့ ဖြစ်ပါတယ်။
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline.svg" alt="The full NLP pipeline: tokenization of text, conversion to IDs, and inference through the Transformer model and the model head."/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline-dark.svg" alt="The full NLP pipeline: tokenization of text, conversion to IDs, and inference through the Transformer model and the model head."/>
+</div>
+
+ဒါတွေကို တစ်ခုချင်းစီ အမြန် လေ့လာကြည့်ရအောင်။
+
+## Tokenizer ဖြင့် Preprocessing ပြုလုပ်ခြင်း[[preprocessing-with-a-tokenizer]]
+
+အခြား neural network များကဲ့သို့ Transformer မော်ဒယ်များသည် raw text များကို တိုက်ရိုက်လုပ်ဆောင်၍ မရပါ။ ထို့ကြောင့် ကျွန်တော်တို့ pipeline ၏ ပထမအဆင့်မှာ text inputs များကို မော်ဒယ်နားလည်နိုင်သော ဂဏန်းများအဖြစ် ပြောင်းလဲခြင်းဖြစ်သည်။ ၎င်းကို ပြုလုပ်ရန် ကျွန်တော်တို့သည် *tokenizer* ကို အသုံးပြုပါသည်။ ၎င်းသည် အောက်ပါတို့ကို လုပ်ဆောင်ရန် တာဝန်ရှိသည်-
+
+- input ကို *tokens* ဟုခေါ်သော စကားလုံးများ၊ subwords များ သို့မဟုတ် သင်္ကေတများ (ဥပမာ- ပုဒ်ဖြတ်သင်္ကေတများ) အဖြစ် ပိုင်းခြားခြင်း
+- token တစ်ခုစီကို integer တစ်ခုသို့ တွဲချိတ်ခြင်း
+- မော်ဒယ်အတွက် အသုံးဝင်နိုင်သော အပို inputs များကို ထည့်သွင်းခြင်း
+
+ဒီ preprocessing အားလုံးကို မော်ဒယ်ကို pre-trained လုပ်ခဲ့စဉ်က အတိအကျလုပ်ခဲ့တဲ့ နည်းလမ်းအတိုင်း ပြုလုပ်ဖို့ လိုအပ်ပါတယ်။ ဒါကြောင့် ကျွန်တော်တို့ အရင်ဆုံး [Model Hub](https://huggingface.co/models) ကနေ အဲဒီအချက်အလက်တွေကို download လုပ်ဖို့ လိုပါတယ်။ ဒါကို လုပ်ဖို့အတွက် `AutoTokenizer` class နဲ့ သူ့ရဲ့ `from_pretrained()` method ကို ကျွန်တော်တို့ အသုံးပြုပါတယ်။ ကျွန်တော်တို့ model ရဲ့ checkpoint name ကို အသုံးပြုပြီး၊ ၎င်းသည် model ရဲ့ tokenizer နဲ့ ဆက်စပ်နေတဲ့ ဒေတာတွေကို အလိုအလျောက် ရယူပြီး cache လုပ်ပါလိမ့်မယ် (ဒါကြောင့် အောက်က code ကို ပထမဆုံးအကြိမ် run မှသာ download လုပ်ပါလိမ့်မယ်)။
+
+`sentiment-analysis` pipeline ရဲ့ default checkpoint က `distilbert-base-uncased-finetuned-sst-2-english` ဖြစ်တာကြောင့် (၎င်းရဲ့ model card ကို [ဒီနေရာမှာ](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) ကြည့်နိုင်ပါတယ်)၊ အောက်ပါ code ကို run ပါမယ်။
+
+```python
+from transformers import AutoTokenizer
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+```
+
+tokenizer ကို ရရှိပြီဆိုတာနဲ့၊ ကျွန်တော်တို့ရဲ့ စာကြောင်းတွေကို တိုက်ရိုက် ပေးပို့နိုင်ပြီး model ကို ထည့်သွင်းဖို့ အဆင်သင့်ဖြစ်နေတဲ့ dictionary တစ်ခု ပြန်ရပါလိမ့်မယ်။ လုပ်ဆောင်ဖို့ ကျန်ရှိတာကတော့ input IDs တွေရဲ့ list ကို tensors တွေအဖြစ် ပြောင်းလဲဖို့ပါပဲ။
+
+သင်ဟာ backend မှာ ဘယ် ML framework ကို အသုံးပြုလဲဆိုတာ စိုးရိမ်စရာမလိုဘဲ 🤗 Transformers ကို အသုံးပြုနိုင်ပါတယ်။ အချို့မော်ဒယ်တွေအတွက် PyTorch ဒါမှမဟုတ် Flax ဖြစ်နိုင်ပါတယ်။ သို့သော် Transformer မော်ဒယ်တွေက *tensors* တွေကိုပဲ input အဖြစ် လက်ခံပါတယ်။ tensors တွေအကြောင်းကို အခုမှ စကြားဖူးတာဆိုရင်၊ ၎င်းတို့ကို NumPy arrays တွေအဖြစ် တွေးကြည့်နိုင်ပါတယ်။ NumPy array တစ်ခုက scalar (0D)၊ vector (1D)၊ matrix (2D) သို့မဟုတ် dimension များစွာရှိနိုင်ပါတယ်။ ဒါက တကယ်တော့ tensor တစ်ခုပါပဲ။ အခြား ML frameworks တွေရဲ့ tensors တွေလည်း အလားတူပဲ အလုပ်လုပ်ပြီး၊ NumPy arrays တွေလိုပဲ လွယ်ကူစွာ instantiate လုပ်နိုင်ပါတယ်။
+
+ကျွန်တော်တို့ ပြန်လိုချင်တဲ့ tensors (PyTorch သို့မဟုတ် plain NumPy) အမျိုးအစားကို သတ်မှတ်ဖို့အတွက် `return_tensors` argument ကို အသုံးပြုနိုင်ပါတယ်။
+
+```python
+raw_inputs = [
+    "I've been waiting for a HuggingFace course my whole life.",
+    "I hate this so much!",
+]
+inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
+print(inputs)
+```
+
+padding နဲ့ truncation အကြောင်းကို အခုလောလောဆယ် စိတ်ပူမနေပါနဲ့၊ ဒါတွေကို နောက်မှ ရှင်းပြပါမယ်။ ဒီနေရာမှာ မှတ်ထားရမယ့် အဓိကအချက်တွေကတော့ သင်ဟာ စာကြောင်းတစ်ကြောင်း ဒါမှမဟုတ် စာကြောင်းများစွာပါတဲ့ list ကို ပေးပို့နိုင်သလို၊ သင်ပြန်လိုချင်တဲ့ tensors အမျိုးအစားကိုလည်း သတ်မှတ်နိုင်ပါတယ် (မည်သည့် type ကိုမျှ မပေးပို့ရင် list of lists အဖြစ် ရလဒ်ရပါလိမ့်မယ်)။
+
+PyTorch tensors အဖြစ် ရလဒ်တွေက အောက်ပါအတိုင်း ဖြစ်ပါတယ်။
+
+```python out
+{
+    'input_ids': tensor([
+        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172, 2607,  2026,  2878,  2166,  1012,   102],
+        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
+    ]), 
+    'attention_mask': tensor([
+        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
+        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
+    ])
+}
+```
+
+output ကိုယ်တိုင်က `input_ids` နဲ့ `attention_mask` ဆိုတဲ့ key နှစ်ခုပါဝင်တဲ့ dictionary တစ်ခု ဖြစ်ပါတယ်။ `input_ids` မှာ integer row နှစ်ခု (စာကြောင်းတစ်ကြောင်းစီအတွက် တစ်ခု) ပါဝင်ပြီး ၎င်းတို့ဟာ စာကြောင်းတစ်ကြောင်းစီရှိ tokens တွေရဲ့ ထူးခြားတဲ့ identifiers တွေ ဖြစ်ပါတယ်။ `attention_mask` ဆိုတာ ဘာလဲဆိုတာကို ဒီအခန်းရဲ့ နောက်ပိုင်းမှာ ကျွန်တော်တို့ ရှင်းပြပါမယ်။
+
+## Model ကို ဖြတ်သန်းခြင်း[[going-through-the-model]]
+
+ကျွန်တော်တို့ tokenizer ကို လုပ်ခဲ့သလိုပဲ pre-trained model ကို download လုပ်နိုင်ပါတယ်။ 🤗 Transformers က `from_pretrained()` method ပါဝင်တဲ့ `AutoModel` class ကို ပံ့ပိုးပေးပါတယ်။
+
+```python
+from transformers import AutoModel
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+model = AutoModel.from_pretrained(checkpoint)
+```
+
+ဒီ code snippet မှာ ကျွန်တော်တို့ဟာ ယခင်က pipeline မှာ အသုံးပြုခဲ့တဲ့ checkpoint အတူတူကို download လုပ်ပြီး model တစ်ခုကို instantiate လုပ်ခဲ့ပါတယ်။ (ဒါကို အမှန်တကယ်တော့ cache လုပ်ထားပြီးသား ဖြစ်သင့်ပါတယ်)။
+
+ဒီ architecture မှာ base Transformer module သာ ပါဝင်ပါတယ်- inputs အချို့ကို ပေးလိုက်တဲ့အခါ ၎င်းသည် *hidden states* ဟုခေါ်သော အရာများကို ထုတ်ပေးပါတယ်။ ၎င်းတို့ကို *features* ဟုလည်း ခေါ်ပါတယ်။ model input တစ်ခုစီအတွက် **Transformer model က အဲဒီ input ကို အကြောင်းအရာအရ နားလည်ထားမှုကို ကိုယ်စားပြုတဲ့ high-dimensional vector တစ်ခုကို** ကျွန်တော်တို့ ပြန်ရပါလိမ့်မယ်။
+
+ဒါကို နားမလည်ရင် စိတ်မပူပါနဲ့။ ဒါတွေကို နောက်မှ အားလုံးရှင်းပြပါမယ်။
+
+ဒီ hidden states တွေက သူ့ဘာသာသူ အသုံးဝင်နိုင်ပေမယ့်၊ ၎င်းတို့ဟာ များသောအားဖြင့် *head* လို့ခေါ်တဲ့ model ရဲ့ နောက်ထပ်အစိတ်အပိုင်းတစ်ခုရဲ့ inputs တွေ ဖြစ်ပါတယ်။ [Chapter 1](/course/chapter1) မှာ မတူညီတဲ့ လုပ်ငန်းတာဝန်တွေကို architecture တူတူနဲ့ လုပ်ဆောင်နိုင်ခဲ့ပေမယ့်၊ ဒီလုပ်ငန်းတာဝန်တစ်ခုစီမှာ ၎င်းနဲ့ ဆက်စပ်နေတဲ့ မတူညီတဲ့ head တစ်ခုစီ ရှိပါတယ်။
+
+### High-dimensional vector တစ်ခုလား။[[a-high-dimensional-vector]]
+
+Transformer module ကနေ ထုတ်ပေးတဲ့ vector ဟာ များသောအားဖြင့် ကြီးမားပါတယ်။ ဒါက အများအားဖြင့် dimensions သုံးခု ရှိပါတယ်-
+
+-   **Batch size**: တစ်ကြိမ်တည်း လုပ်ဆောင်တဲ့ sequence အရေအတွက် (ကျွန်တော်တို့ ဥပမာမှာ ၂ ခု)။
+-   **Sequence length**: sequence ရဲ့ ဂဏန်းဆိုင်ရာ ကိုယ်စားပြုမှုရဲ့ အရှည် (ကျွန်တော်တို့ ဥပမာမှာ ၁၆ ခု)။
+-   **Hidden size**: model input တစ်ခုစီရဲ့ vector dimension။
+
+နောက်ဆုံးတန်ဖိုးကြောင့် "high dimensional" လို့ ခေါ်တာ ဖြစ်ပါတယ်။ hidden size က အလွန်ကြီးမားနိုင်ပါတယ် (ပိုသေးငယ်တဲ့ မော်ဒယ်တွေမှာ 768 ကို အတွေ့ရများပြီး၊ ပိုကြီးတဲ့ မော်ဒယ်တွေမှာ 3072 ဒါမှမဟုတ် ဒီထက်ပိုများနိုင်ပါတယ်)။
+
+ကျွန်တော်တို့ preprocessing လုပ်ထားတဲ့ inputs တွေကို model ကို ပေးပို့ကြည့်ရင် ဒါကို တွေ့နိုင်ပါတယ်။ 
+
+```python
+outputs = model(**inputs)
+print(outputs.last_hidden_state.shape)
+```
+
+```python out
+torch.Size([2, 16, 768])
+```
+
+🤗 Transformers မော်ဒယ်တွေရဲ့ outputs တွေဟာ `namedtuple` တွေ ဒါမှမဟုတ် dictionaries တွေလို အလုပ်လုပ်တယ်ဆိုတာကို သတိပြုပါ။ attribute တွေ (ကျွန်တော်တို့ လုပ်ခဲ့သလို) ဒါမှမဟုတ် key ( `outputs["last_hidden_state"]` ) နဲ့ ဒါမှမဟုတ် သင်ရှာနေတဲ့အရာ ဘယ်နေရာမှာရှိတယ်ဆိုတာ အတိအကျသိရင် index ( `outputs[0]` ) နဲ့ပါ ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။
+
+### Model heads: ဂဏန်းတွေကနေ အဓိပ္ပာယ်ထုတ်ယူခြင်း[[model-heads-making-sense-out-of-numbers]]
+
+Model heads တွေက hidden states တွေရဲ့ high-dimensional vector ကို input အဖြစ် ယူပြီး ၎င်းတို့ကို မတူညီတဲ့ dimension တစ်ခုပေါ်သို့ project လုပ်ပါတယ်။ ၎င်းတို့ဟာ များသောအားဖြင့် linear layers တစ်ခု ဒါမှမဟုတ် အနည်းငယ်နဲ့ ဖွဲ့စည်းထားပါတယ်-
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/transformer_and_head.svg" alt="A Transformer network alongside its head."/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/transformer_and_head-dark.svg" alt="A Transformer network alongside its head."/>
+</div>
+
+Transformer model ရဲ့ output ကို model head ကို တိုက်ရိုက်ပို့ပြီး လုပ်ဆောင်ပါတယ်။
+
+ဒီပုံမှာ မော်ဒယ်ကို embeddings layer နဲ့ နောက်ဆက်တွဲ layers တွေနဲ့ ကိုယ်စားပြုထားပါတယ်။ embeddings layer က tokenized input ထဲက input ID တစ်ခုစီကို ၎င်းနဲ့ ဆက်စပ်နေတဲ့ token ကို ကိုယ်စားပြုတဲ့ vector တစ်ခုအဖြစ် ပြောင်းလဲပေးပါတယ်။ နောက်ဆက်တွဲ layers တွေက attention mechanism ကို အသုံးပြုပြီး အဲဒီ vectors တွေကို စီမံခန့်ခွဲကာ စာကြောင်းတွေရဲ့ နောက်ဆုံးကိုယ်စားပြုမှုကို ထုတ်ပေးပါတယ်။
+
+🤗 Transformers မှာ မတူညီတဲ့ architecture များစွာ ရရှိနိုင်ပြီး၊ တစ်ခုချင်းစီကို သီးခြားလုပ်ငန်းတစ်ခုကို ဖြေရှင်းဖို့ ဒီဇိုင်းထုတ်ထားပါတယ်။ အောက်ပါတို့ကတော့ မပြည့်စုံသေးသော စာရင်းတစ်ခု ဖြစ်ပါတယ်-
+
+-   `*Model` (hidden states များကို ပြန်ရယူခြင်း)
+-   `*ForCausalLM`
+-   `*ForMaskedLM`
+-   `*ForMultipleChoice`
+-   `*ForQuestionAnswering`
+-   `*ForSequenceClassification`
+-   `*ForTokenClassification`
+-   နဲ့ အခြားအရာများ 🤗
+
+ကျွန်တော်တို့ရဲ့ ဥပမာအတွက်၊ sequence classification head ပါဝင်တဲ့ မော်ဒယ်တစ်ခု လိုအပ်ပါလိမ့်မယ် (စာကြောင်းတွေကို positive သို့မဟုတ် negative အဖြစ် ခွဲခြားသတ်မှတ်နိုင်ဖို့)။ ဒါကြောင့် ကျွန်တော်တို့ဟာ `AutoModel` class ကို အမှန်တကယ် အသုံးပြုမှာ မဟုတ်ဘဲ `AutoModelForSequenceClassification` ကို အသုံးပြုပါမယ်။
+
+```python
+from transformers import AutoModelForSequenceClassification
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
+outputs = model(**inputs)
+```
+
+အခု outputs တွေရဲ့ shape ကို ကြည့်လိုက်ရင်၊ dimensionality က အများကြီး နိမ့်သွားပါလိမ့်မယ်၊ model head က ယခင်က ကျွန်တော်တို့ တွေ့ခဲ့တဲ့ high-dimensional vectors တွေကို input အဖြစ် ယူပြီး၊ တန်ဖိုးနှစ်ခု (label တစ်ခုစီအတွက် တစ်ခု) ပါဝင်တဲ့ vectors တွေကို ထုတ်ပေးပါတယ်။
+
+```python
+print(outputs.logits.shape)
+```
+
+```python out
+torch.Size([2, 2])
+```
+
+ကျွန်တော်တို့မှာ စာကြောင်းနှစ်ကြောင်းနဲ့ label နှစ်ခုပဲ ရှိတာကြောင့်၊ ကျွန်တော်တို့ model ကနေ ရရှိတဲ့ ရလဒ်ဟာ 2 x 2 shape ဖြစ်ပါတယ်။
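+
+shape ဘယ်လိုပြောင်းသွားလဲဆိုတာကို ရိုးရှင်းအောင်လုပ်ထားတဲ့ ပုံစံတစ်ခုနဲ့ ပြနိုင်ပါတယ်။ ဒီနေရာမှာ head ကို linear layer တစ်ခုတည်းလို့ ယူဆထားပြီး၊ $h$, $W$, $b$ ဆိုတဲ့ သင်္ကေတတွေကိုလည်း ရှင်းပြဖို့အတွက်သာ သုံးထားတာပါ။ checkpoint အမှန်ရဲ့ head မှာ layers ပိုများနိုင်ပါတယ်-
+
+$$
+h \in \mathbb{R}^{768}, \quad W \in \mathbb{R}^{2 \times 768}, \quad b \in \mathbb{R}^{2}, \qquad \text{logits} = W h + b \in \mathbb{R}^{2}
+$$
+
+စာကြောင်းတစ်ကြောင်းစီအတွက် vector $h$ တစ်ခုစီကို ဒီလို project လုပ်တဲ့အတွက် `[2, 16, 768]` shape ရှိတဲ့ hidden states ကနေ `[2, 2]` shape ရှိတဲ့ logits ကို ရရှိတာ ဖြစ်ပါတယ်။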
+
+## Output ကို Postprocessing ပြုလုပ်ခြင်း[[postprocessing-the-output]]
+
+ကျွန်တော်တို့ model ကနေ output အဖြစ် ရရှိတဲ့ တန်ဖိုးတွေက သူ့ဘာသာသူ အဓိပ္ပာယ်ရှိတာ မဟုတ်ပါဘူး။ ကြည့်ကြည့်ရအောင်။
+
+```python
+print(outputs.logits)
+```
+
+```python out
+tensor([[-1.5607,  1.6123],
+        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward>)
+```
+
+ကျွန်တော်တို့ရဲ့ model က ပထမစာကြောင်းအတွက် `[-1.5607, 1.6123]` ကို ခန့်မှန်းခဲ့ပြီး၊ ဒုတိယစာကြောင်းအတွက် `[ 4.1692, -3.3464]` ကို ခန့်မှန်းခဲ့ပါတယ်။ ဒါတွေက ဖြစ်နိုင်ခြေတွေ မဟုတ်ဘဲ *logits* တွေ ဖြစ်ပါတယ်။ ၎င်းတို့က model ရဲ့ နောက်ဆုံး layer ကနေ ထုတ်ပေးတဲ့ raw, unnormalized scores တွေပါ။ ဖြစ်နိုင်ခြေတွေအဖြစ် ပြောင်းလဲဖို့အတွက် [SoftMax](https://en.wikipedia.org/wiki/Softmax_function) layer ကို ဖြတ်သန်းဖို့ လိုအပ်ပါတယ် (🤗 Transformers model အားလုံးက logits တွေကို ထုတ်ပေးပါတယ်၊ ဘာလို့လဲဆိုတော့ training အတွက် loss function က SoftMax လိုမျိုး နောက်ဆုံး activation function နဲ့ cross entropy လိုမျိုး loss function အမှန်တကယ်ကို ပေါင်းစပ်ထားတာ ဖြစ်လေ့ရှိပါတယ်)။
+
+```py
+import torch
+
+predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+print(predictions)
+```
+
+```python out
+tensor([[4.0195e-02, 9.5980e-01],
+        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward>)
+```
+
+အခု ကျွန်တော်တို့ model က ပထမစာကြောင်းအတွက် `[0.0402, 0.9598]` ကို ခန့်မှန်းခဲ့ပြီး၊ ဒုတိယစာကြောင်းအတွက် `[0.9995, 0.0005]` ကို ခန့်မှန်းခဲ့တယ်ဆိုတာ တွေ့ရပါပြီ။ ဒါတွေက အသိအမှတ်ပြုနိုင်တဲ့ ဖြစ်နိုင်ခြေ scores တွေ ဖြစ်ပါတယ်။
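+
+ပထမစာကြောင်းရဲ့ logits ကို အသုံးပြုပြီး ဒီပြောင်းလဲမှုကို လက်နဲ့ စစ်ဆေးကြည့်နိုင်ပါတယ်။ အောက်ပါတွက်ချက်မှုက ဥပမာပြရုံသက်သက်ဖြစ်ပြီး၊ $z_0$, $z_1$, $p_0$, $p_1$ ဆိုတဲ့ သင်္ကေတတွေဟာ ရှင်းပြဖို့အတွက် ဒီနေရာမှာ ယူဆသုံးထားတာသာ ဖြစ်ပါတယ်။ တန်ဖိုးတွေကို အနီးစပ်ဆုံး ဖြတ်ထားပါတယ်-
+
+$$
+\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad z = (z_0, z_1) = (-1.5607,\ 1.6123)
+$$
+
+$$
+p_0 = \frac{e^{-1.5607}}{e^{-1.5607} + e^{1.6123}} = \frac{1}{1 + e^{3.1730}} \approx \frac{1}{1 + 23.88} \approx 0.0402, \qquad p_1 = 1 - p_0 \approx 0.9598
+$$
+
+ဒီတန်ဖိုးတွေဟာ အထက်က `predictions` ရဲ့ ပထမ row နဲ့ ကိုက်ညီပါတယ်။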
+
+position တစ်ခုစီနဲ့ ကိုက်ညီတဲ့ labels တွေကို ရယူဖို့အတွက် model config ရဲ့ `id2label` attribute ကို စစ်ဆေးနိုင်ပါတယ် (ဒီအကြောင်းကို နောက်အပိုင်းမှာ ပိုမိုသိရှိရပါလိမ့်မယ်)။
+
+```python
+model.config.id2label
+```
+
+```python out
+{0: 'NEGATIVE', 1: 'POSITIVE'}
+```
+
+အခု ကျွန်တော်တို့ model က အောက်ပါအတိုင်း ခန့်မှန်းခဲ့တယ်လို့ ကောက်ချက်ချနိုင်ပါပြီ-
+
+- ပထမစာကြောင်း - NEGATIVE: 0.0402, POSITIVE: 0.9598
+- ဒုတိယစာကြောင်း - NEGATIVE: 0.9995, POSITIVE: 0.0005
+
+ကျွန်တော်တို့ pipeline ရဲ့ အဆင့်သုံးဆင့်လုံးကို အောင်မြင်စွာ ပြန်လည်ဖန်တီးနိုင်ခဲ့ပါပြီ- tokenizers တွေနဲ့ preprocessing လုပ်ခြင်း၊ model ကနေတဆင့် inputs တွေကို ပေးပို့ခြင်း၊ နဲ့ postprocessing လုပ်ခြင်းတို့ ဖြစ်ပါတယ်။ အခုတော့ ဒီအဆင့်တစ်ခုချင်းစီကို ပိုပြီး နက်နက်နဲနဲ လေ့လာကြည့်ရအောင်။
+
+> [!TIP]
+> ✏️ **စမ်းသပ်ကြည့်ပါ။** သင်ကိုယ်တိုင် စာသား (၂) ခု (သို့မဟုတ် ပိုမိုများပြား) ရွေးချယ်ပြီး `sentiment-analysis` pipeline ကနေတဆင့် run ပါ။ ထို့နောက် ဒီနေရာမှာ သင်တွေ့ခဲ့ရတဲ့ အဆင့်တွေကို ကိုယ်တိုင်ပြန်လုပ်ပြီး တူညီတဲ့ ရလဒ်တွေ ရရှိမရရှိ စစ်ဆေးပါ။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Pipeline**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ လုပ်ဆောင်ချက်တစ်ခုဖြစ်ပြီး မော်ဒယ်တွေကို သီးခြားလုပ်ငန်းတာဝန်များ (ဥပမာ- စာသားခွဲခြားသတ်မှတ်ခြင်း၊ စာသားထုတ်လုပ်ခြင်း) အတွက် အသုံးပြုရလွယ်ကူအောင် ပြုလုပ်ပေးပါတယ်။
+*   **Preprocessing**: Machine Learning မော်ဒယ်တစ်ခုကို မထည့်သွင်းမီ raw data များကို လုပ်ဆောင်ရန် အသင့်ဖြစ်အောင် ပြင်ဆင်ခြင်း။
+*   **Postprocessing**: Machine Learning မော်ဒယ်တစ်ခု၏ output များကို ပိုမိုနားလည်လွယ်သော သို့မဟုတ် အသုံးဝင်သော ပုံစံသို့ ပြောင်းလဲခြင်း။
+*   **Neural Networks**: လူသားဦးနှောက်၏ လုပ်ဆောင်မှုပုံစံကို အတုယူထားသော ကွန်ပျူတာစနစ်များ။
+*   **Transformer Models**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။ ၎င်းတို့ဟာ စာသားတွေထဲက စကားလုံးတွေရဲ့ ဆက်နွယ်မှုတွေကို "attention mechanism" သုံးပြီး နားလည်အောင် သင်ကြားပေးပါတယ်။
+*   **Raw Text**: မည်သည့်လုပ်ဆောင်မှုမျှ မပြုလုပ်ရသေးသော သို့မဟုတ် ပုံစံမချရသေးသော မူရင်းစာသား။
+*   **Tokenizer**: စာသား (သို့မဟုတ် အခြားဒေတာ) ကို AI မော်ဒယ်များ စီမံဆောင်ရွက်နိုင်ရန် tokens တွေအဖြစ် ပိုင်းခြားပေးသည့် ကိရိယာ သို့မဟုတ် လုပ်ငန်းစဉ်။
+*   **Tokens**: စာသားကို ခွဲခြမ်းစိတ်ဖြာရာတွင် အသုံးပြုသော အသေးငယ်ဆုံးယူနစ်များ (ဥပမာ- စကားလုံးများ၊ subwords များ သို့မဟုတ် ပုဒ်ဖြတ်သင်္ကေတများ)။
+*   **Integer**: ကိန်းပြည့် (ဒသမ သို့မဟုတ် အပိုင်းကိန်း မပါသော ကိန်းဂဏန်း)။
+*   **Pretrained**: ဒေတာအမြောက်အမြားပေါ်တွင် ကြိုတင်လေ့ကျင့်ထားပြီးသား Artificial Intelligence (AI) မော်ဒယ်တစ်ခု။
+*   **Model Hub**: Hugging Face ပေါ်ရှိ pre-trained model များနှင့် datasets များကို ရှာဖွေ၊ မျှဝေပြီး အသုံးပြုနိုင်သော online platform။
+*   **`AutoTokenizer` Class**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး သက်ဆိုင်ရာ tokenizer ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **`from_pretrained()` Method**: Pre-trained model သို့မဟုတ် tokenizer ကို load လုပ်ရန် အသုံးပြုသော method။
+*   **Checkpoint Name**: အင်တာနက်ပေါ်ရှိ Hugging Face Hub မှ pre-trained model သို့မဟုတ် tokenizer ကို ဖော်ထုတ်ရန် အသုံးပြုသော အမည်။
+*   **Cache**: မကြာခဏ အသုံးပြုရသော ဒေတာများကို အမြန်ဆုံး ဝင်ရောက်ရယူနိုင်ရန် ယာယီသိုလှောင်ထားသော နေရာ။
+*   **`sentiment-analysis` pipeline**: စာသားတစ်ခု၏ စိတ်ခံစားမှု (အပြုသဘော သို့မဟုတ် အနုတ်သဘော) ကို ခွဲခြမ်းစိတ်ဖြာပေးသော pipeline။
+*   **`distilbert-base-uncased-finetuned-sst-2-english`**: `sentiment-analysis` pipeline ၏ default checkpoint အဖြစ် အသုံးပြုသော DistilBERT မော်ဒယ်၏ အမည်။ `base` သည် မော်ဒယ်၏ အရွယ်အစားကို ဖော်ပြပြီး `uncased` သည် စာလုံးအကြီးအသေး ခွဲခြားခြင်းမရှိဘဲ လေ့ကျင့်ထားကြောင်း ဖော်ပြသည်။ `finetuned-sst-2-english` က SST-2 dataset တွင် English ဘာသာစကားအတွက် fine-tune လုပ်ထားသည်ကို ဆိုလိုသည်။
+*   **Model Card**: Hugging Face Hub ပေါ်ရှိ မော်ဒယ်တစ်ခု၏ အချက်အလက်များ၊ အသုံးပြုပုံနှင့် စွမ်းဆောင်ရည်များကို အကျဉ်းချုပ်ဖော်ပြထားသော စာမျက်နှာ။
+*   **Dictionary**: key-value pair များဖြင့် ဒေတာများကို သိုလှောင်သော ဒေတာဖွဲ့စည်းပုံ။
+*   **Tensors**: Machine Learning frameworks (PyTorch, TensorFlow) များတွင် ဒေတာများကို ကိုယ်စားပြုသော multi-dimensional array များ။
+*   **NumPy Arrays**: Python တွင် ဂဏန်းတွက်ချက်မှုများအတွက် အသုံးပြုသော NumPy library ၏ multi-dimensional array များ။
+*   **Scalar (0D)**: Dimension မရှိသော တစ်ခုတည်းသော ကိန်းဂဏန်းတန်ဖိုး။
+*   **Vector (1D)**: ကိန်းဂဏန်းတန်ဖိုးများ၏ တစ်ကြောင်းတည်းသော sequence။
+*   **Matrix (2D)**: ကိန်းဂဏန်းတန်ဖိုးများကို အတန်း (rows) နှင့် ကော်လံ (columns) များဖြင့် စီစဉ်ထားသော အစုအဝေး။
+*   **`return_tensors` Argument**: tokenizer ကို ခေါ်ဆိုသောအခါ ပြန်လိုချင်သော tensor အမျိုးအစားကို သတ်မှတ်ရန် အသုံးပြုသော argument။
+*   **`padding`**: မတူညီသော အရှည်ရှိသည့် input sequence များကို အရှည်တူညီအောင် သတ်မှတ်ထားသော တန်ဖိုးများဖြင့် ဖြည့်စွက်ခြင်း။
+*   **`truncation`**: အရှည်ကန့်သတ်ချက်ထက် ပိုနေသော input sequence များကို ဖြတ်တောက်ခြင်း။
+*   **`input_ids`**: Tokenizer မှ ထုတ်ပေးသော tokens တစ်ခုစီ၏ ထူးခြားသော ဂဏန်းဆိုင်ရာ ID များ။
+*   **`attention_mask`**: မော်ဒယ်ကို အာရုံစိုက်သင့်သည့် tokens များနှင့် လျစ်လျူရှုသင့်သည့် (padding) tokens များကို ခွဲခြားပေးသည့် binary mask။
+*   **`AutoModel` Class**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး Transformer model ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **Hidden States**: Transformer model ၏ အလယ်အလတ် layers များမှ ထုတ်ပေးသော output များ။ ၎င်းတို့သည် input ၏ အကြောင်းအရာဆိုင်ရာ ကိုယ်စားပြုမှုများကို ဖမ်းယူထားသည်။
+*   **Features**: Hidden states များကို ရည်ညွှန်းသော အခြားအသုံးအနှုန်းတစ်ခု။
+*   **High-dimensional Vector**: dimension များစွာရှိသော vector တစ်ခု။
+*   **Batch Size**: မော်ဒယ်က တစ်ပြိုင်နက်တည်း လုပ်ဆောင်သော input sequence အရေအတွက်။
+*   **Sequence Length**: input sequence ၏ token အရေအတွက်။
+*   **Hidden Size**: hidden states vector တစ်ခု၏ dimension အရွယ်အစား။
+*   **`namedtuple`s**: Python တွင် tuple ကဲ့သို့ အလုပ်လုပ်သော်လည်း attribute name များဖြင့် elements များကို ဝင်ရောက်ကြည့်ရှုနိုင်သော data type။
+*   **Model Heads**: Transformer model ၏ hidden states များကို သီးခြားလုပ်ငန်းတစ်ခုအတွက် လိုအပ်သော output များအဖြစ် ပြောင်းလဲပေးသော အစိတ်အပိုင်း။ များသောအားဖြင့် linear layers များဖြင့် ဖွဲ့စည်းထားသည်။
+*   **Embeddings Layer**: input IDs များကို vector representations များအဖြစ် ပြောင်းလဲပေးသော model layer။
+*   **`*Model`**: base Transformer model (hidden states များကို ပြန်ရယူရန်) ကို ကိုယ်စားပြုသော Hugging Face model class family။
+*   **`*ForCausalLM`**: Causal Language Modeling (နောက်ထပ် token ကို ခန့်မှန်းခြင်း) အတွက် ဒီဇိုင်းထုတ်ထားသော model class family။
+*   **`*ForMaskedLM`**: Masked Language Modeling (ပျောက်ဆုံးနေသော token များကို ဖြည့်စွက်ခြင်း) အတွက် ဒီဇိုင်းထုတ်ထားသော model class family။
+*   **`*ForMultipleChoice`**: Multiple Choice question answering အတွက် ဒီဇိုင်းထုတ်ထားသော model class family။
+*   **`*ForQuestionAnswering`**: Question Answering လုပ်ငန်းတာဝန်များအတွက် ဒီဇိုင်းထုတ်ထားသော model class family။
+*   **`*ForSequenceClassification`**: Sequence Classification လုပ်ငန်းတာဝန်များ (ဥပမာ- sentiment analysis) အတွက် ဒီဇိုင်းထုတ်ထားသော model class family။
+*   **`*ForTokenClassification`**: Token Classification လုပ်ငန်းတာဝန်များ (ဥပမာ- Named Entity Recognition) အတွက် ဒီဇိုင်းထုတ်ထားသော model class family။
+*   **`AutoModelForSequenceClassification`**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး sequence classification အတွက် pre-trained model ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **`outputs.logits`**: မော်ဒယ်၏ နောက်ဆုံး layer မှ ထုတ်ပေးသော raw, unnormalized scores များ။
+*   **Logits**: မော်ဒယ်၏ နောက်ဆုံး layer မှ ထုတ်ပေးသော raw, unnormalized scores များ။ ၎င်းတို့သည် ဖြစ်နိုင်ခြေများအဖြစ်သို့ ပြောင်းလဲခြင်းမရှိသေးပါ။
+*   **SoftMax Layer**: input numbers များကို 0 နှင့် 1 ကြားရှိ ဖြစ်နိုင်ခြေများအဖြစ်သို့ ပြောင်းလဲပေးသော activation function တစ်ခု။ ၎င်းတို့၏ စုစုပေါင်းသည် 1 ဖြစ်သည်။
+*   **Loss Function**: မော်ဒယ်၏ ခန့်မှန်းချက်များနှင့် အမှန်တကယ်တန်ဖိုးများကြား ကွာခြားမှုကို တိုင်းတာသော function တစ်ခု။
+*   **Cross Entropy**: Classification လုပ်ငန်းများတွင် အသုံးများသော loss function တစ်ခု။
+*   **`torch.nn.functional.softmax(outputs.logits, dim=-1)`**: PyTorch တွင် softmax function ကို `outputs.logits` ပေါ်တွင် နောက်ဆုံး dimension (dim=-1) အတိုင်း အသုံးပြုခြင်း။
+*   **`model.config.id2label`**: Model configuration ထဲတွင် `id` (ဂဏန်း) မှ `label` (စာသား) သို့ တွဲချိတ်ပေးသော dictionary တစ်ခု။
\ No newline at end of file
diff --git a/chapters/my/chapter2/3.mdx b/chapters/my/chapter2/3.mdx
new file mode 100644
index 000000000..b7d947938
--- /dev/null
+++ b/chapters/my/chapter2/3.mdx
@@ -0,0 +1,342 @@
+<FrameworkSwitchCourse {fw} />
+
+# Models[[the-models]]
+
+<CourseFloatingBanner chapter={2}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb"},
+]} />
+
+<Youtube id="AhChOFRegn4"/>
+
+ဒီအပိုင်းမှာတော့ model တွေကို ဘယ်လိုဖန်တီးရမလဲ၊ အသုံးပြုရမလဲဆိုတာကို ပိုမိုနက်နဲစွာ လေ့လာသွားပါမယ်။ checkpoint တစ်ခုကနေ မည်သည့် model ကိုမဆို instantiate လုပ်ချင်တဲ့အခါ အသုံးဝင်တဲ့ `AutoModel` class ကို ကျွန်တော်တို့ အသုံးပြုပါမယ်။
+
+## Transformer တစ်ခုကို ဖန်တီးခြင်း[[creating-a-transformer]]
+
+`AutoModel` တစ်ခုကို instantiate လုပ်တဲ့အခါ ဘာတွေဖြစ်ပျက်လဲဆိုတာကို ကြည့်ခြင်းဖြင့် စတင်လိုက်ရအောင်။
+
+```py
+from transformers import AutoModel
+
+model = AutoModel.from_pretrained("bert-base-cased")
+```
+
+tokenizer နဲ့ ဆင်တူစွာ၊ `from_pretrained()` method က Hugging Face Hub ကနေ model data တွေကို download လုပ်ပြီး cache လုပ်ပါလိမ့်မယ်။ ယခင်က ဖော်ပြခဲ့သလိုပဲ၊ checkpoint name က သီးခြား model architecture နဲ့ weights တွေနဲ့ ကိုက်ညီပါတယ်။ ဒီဥပမာမှာတော့ basic architecture (12 layers, 768 hidden size, 12 attention heads) နဲ့ cased inputs (ဆိုလိုသည်မှာ စာလုံးအကြီးအသေး ခွဲခြားမှုက အရေးကြီးသည်) ပါဝင်တဲ့ BERT model တစ်ခု ဖြစ်ပါတယ်။ Hub မှာ ရရှိနိုင်တဲ့ checkpoints များစွာ ရှိပါတယ် - [ဒီနေရာမှာ](https://huggingface.co/models) ရှာဖွေနိုင်ပါတယ်။
+
+`AutoModel` class နဲ့ ၎င်းရဲ့ ဆက်စပ် classes တွေဟာ တကယ်တော့ ပေးထားတဲ့ checkpoint အတွက် သင့်လျော်တဲ့ model architecture ကို ရယူဖို့ ဒီဇိုင်းထုတ်ထားတဲ့ ရိုးရှင်းတဲ့ wrappers တွေပါ။ ဒါက "auto" class တစ်ခုဖြစ်ပြီး သင့်အတွက် သင့်လျော်တဲ့ model architecture ကို ခန့်မှန်းပြီး မှန်ကန်တဲ့ model class ကို instantiate လုပ်ပေးပါလိမ့်မယ်။ သို့သော်၊ သင်အသုံးပြုချင်တဲ့ model အမျိုးအစားကို သိရှိထားတယ်ဆိုရင်၊ ၎င်းရဲ့ architecture ကို တိုက်ရိုက်သတ်မှတ်ပေးတဲ့ class ကို အသုံးပြုနိုင်ပါတယ်။ 
+
+```py
+from transformers import BertModel
+
+model = BertModel.from_pretrained("bert-base-cased")
+```
+
+## Loading နှင့် Saving[[loading-and-saving]]
+
+Model တစ်ခုကို save လုပ်တာက tokenizer တစ်ခုကို save လုပ်တာလိုပဲ ရိုးရှင်းပါတယ်။ တကယ်တော့၊ model တွေမှာ model ရဲ့ weights တွေနဲ့ architecture configuration တွေကို save လုပ်ပေးတဲ့ `save_pretrained()` method တူတူကို ပိုင်ဆိုင်ထားပါတယ်။
+
+```py
+model.save_pretrained("directory_on_my_computer")
+```
+
+ဒါက သင့် disk ထဲမှာ ဖိုင်နှစ်ခုကို save လုပ်ပါလိမ့်မယ်။ 
+
+```
+ls directory_on_my_computer
+
+config.json model.safetensors
+```
+
+*config.json* ဖိုင်ထဲကို ကြည့်လိုက်ရင်၊ model architecture ကို တည်ဆောက်ဖို့ လိုအပ်တဲ့ attributes အားလုံးကို တွေ့ရပါလိမ့်မယ်။ ဒီဖိုင်ထဲမှာ checkpoint ဘယ်ကနေ စတင်ခဲ့သလဲ၊ နောက်ဆုံး checkpoint ကို save လုပ်ခဲ့တုန်းက သင်အသုံးပြုခဲ့တဲ့ 🤗 Transformers version စတဲ့ metadata အချို့လည်း ပါဝင်ပါတယ်။
+
+*model.safetensors* ဖိုင်ကို state dictionary လို့ခေါ်ပါတယ်။ ၎င်းထဲမှာ သင့် model ရဲ့ weights အားလုံး ပါဝင်ပါတယ်။ ဖိုင်နှစ်ခုစလုံး အတူတူ အလုပ်လုပ်ပါတယ်- configuration file က model architecture အကြောင်း သိရှိဖို့ လိုအပ်ပြီး၊ model weights တွေကတော့ model ရဲ့ parameters တွေ ဖြစ်ပါတယ်။
+
+save လုပ်ထားတဲ့ model တစ်ခုကို ပြန်လည်အသုံးပြုဖို့အတွက် `from_pretrained()` method ကို ထပ်မံအသုံးပြုပါ။
+
+```py
+from transformers import AutoModel
+
+model = AutoModel.from_pretrained("directory_on_my_computer")
+```
+
+🤗 Transformers library ရဲ့ အံ့ဖွယ်ကောင်းတဲ့ အင်္ဂါရပ်တစ်ခုကတော့ model တွေနဲ့ tokenizers တွေကို community နဲ့ အလွယ်တကူ မျှဝေနိုင်စွမ်းပါပဲ။ ဒါကို လုပ်ဖို့ Hugging Face မှာ account ရှိဖို့ သေချာပါစေ။ သင် notebook ကို အသုံးပြုနေတယ်ဆိုရင်၊ ဒါနဲ့ အလွယ်တကူ log in လုပ်ဆောင်နိုင်ပါတယ်။
+
+```python
+from huggingface_hub import notebook_login
+
+notebook_login()
+```
+
+မဟုတ်ရင်တော့ သင့် terminal မှာ အောက်ပါအတိုင်း run ပါ။
+
+```bash
+huggingface-cli login
+```
+
+အဲဒီနောက် `push_to_hub()` method နဲ့ model ကို Hub ကို push လုပ်နိုင်ပါတယ်။
+
+```py
+model.push_to_hub("my-awesome-model")
+```
+
+ဒါက model files တွေကို Hub ကို upload လုပ်ပါလိမ့်မယ်။ သင့် namespace အောက်မှာ *my-awesome-model* လို့ နာမည်ပေးထားတဲ့ repository ထဲမှာပါ။ အဲဒီနောက်၊ မည်သူမဆို သင့် model ကို `from_pretrained()` method နဲ့ load လုပ်နိုင်ပါပြီ။
+
+```py
+from transformers import AutoModel
+
+model = AutoModel.from_pretrained("your-username/my-awesome-model")
+```
+
+Hub API နဲ့ ပိုပြီး လုပ်ဆောင်နိုင်တာတွေ အများကြီး ရှိပါတယ်-
+- local repository ကနေ model တစ်ခုကို push လုပ်ခြင်း
+- အားလုံးကို ပြန်လည် upload မလုပ်ဘဲ သီးခြား files များကို update လုပ်ခြင်း
+- model ရဲ့ စွမ်းဆောင်ရည်၊ ကန့်သတ်ချက်များ၊ သိထားတဲ့ bias စသည်တို့ကို မှတ်တမ်းတင်ဖို့ model cards တွေ ထည့်သွင်းခြင်း
+
+ဒီအကြောင်းအရာတွေအတွက် ပြည့်စုံတဲ့ tutorial ကို [documentation](https://huggingface.co/docs/huggingface_hub/how-to-upstream) မှာ ကြည့်ရှုနိုင်ပါတယ်၊ ဒါမှမဟုတ် အဆင့်မြင့် [Chapter 4](/course/chapter4) ကို လေ့လာနိုင်ပါတယ်။
+
+## Text များကို Encoding လုပ်ခြင်း[[encoding-text]]
+
+Transformer မော်ဒယ်တွေက inputs တွေကို ဂဏန်းတွေအဖြစ် ပြောင်းလဲခြင်းဖြင့် text တွေကို ကိုင်တွယ်ပါတယ်။ ဒီနေရာမှာ သင်ရဲ့ text ကို tokenizer က ဘယ်လိုလုပ်ဆောင်တယ်ဆိုတာကို အတိအကျ ကြည့်ပါမယ်။ [Chapter 1](/course/chapter1) မှာ tokenizer တွေက text ကို tokens တွေအဖြစ် ပိုင်းခြားပြီး အဲဒီ tokens တွေကို ဂဏန်းတွေအဖြစ် ပြောင်းလဲတယ်ဆိုတာကို ကျွန်တော်တို့ တွေ့ခဲ့ရပါပြီ။ ဒီ conversion ကို ရိုးရှင်းတဲ့ tokenizer တစ်ခုနဲ့ ကြည့်နိုင်ပါတယ်။
+
+```py
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+
+encoded_input = tokenizer("Hello, I'm a single sentence!")
+print(encoded_input)
+```
+
+```python out
+{'input_ids': [101, 8667, 117, 1000, 1045, 1005, 1049, 2235, 17662, 12172, 1012, 102], 
+ 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
+ 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
+```
+
+ကျွန်တော်တို့ဟာ အောက်ပါ fields တွေပါဝင်တဲ့ dictionary တစ်ခုကို ရရှိပါတယ် -
+- input_ids: သင့် tokens တွေရဲ့ ဂဏန်းဆိုင်ရာ ကိုယ်စားပြုမှုများ
+- token_type_ids: ဒါတွေက model ကို input ရဲ့ ဘယ်အပိုင်းက sentence A ဖြစ်ပြီး ဘယ်အပိုင်းက sentence B ဖြစ်တယ်ဆိုတာ ပြောပြပါတယ် (နောက်အပိုင်းမှာ ပိုမိုဆွေးနွေးပါမယ်)
+- attention_mask: ဒါက မည်သည့် tokens များကို အာရုံစိုက်သင့်ပြီး မည်သည့် tokens များကို အာရုံစိုက်ရန် မလိုအပ်ကြောင်း ဖော်ပြပါတယ် (နောက်မှ ပိုမိုဆွေးနွေးပါမယ်)
+
+မူရင်း text ကို ပြန်ရဖို့အတွက် input IDs တွေကို decode လုပ်နိုင်ပါတယ်။
+
+```py
+tokenizer.decode(encoded_input["input_ids"])
+```
+
+```python out
+"[CLS] Hello, I'm a single sentence! [SEP]"
+```
+
+tokenizer က model လိုအပ်တဲ့ special tokens တွေဖြစ်တဲ့ `[CLS]` နဲ့ `[SEP]` တွေကို ထည့်သွင်းပေးထားတာကို သင်သတိထားမိပါလိမ့်မယ်။ မော်ဒယ်အားလုံးက special tokens တွေ လိုအပ်တာ မဟုတ်ပါဘူး။ ၎င်းတို့ကို မော်ဒယ်ကို pretrained လုပ်ခဲ့တုန်းက အသုံးပြုခဲ့ရင် အသုံးဝင်ပါတယ်။ အဲဒီအခါမှာတော့ tokenizer က ဒီ tokens တွေကို model က မျှော်လင့်ထားတဲ့အတိုင်း ထည့်ပေးဖို့ လိုအပ်ပါတယ်။
+
+စာကြောင်းများစွာကို တစ်ကြိမ်တည်း encode လုပ်နိုင်ပါတယ်၊ ဒါကို batch လုပ်ခြင်းဖြင့် (ဒီအကြောင်းကို မကြာမီ ဆွေးနွေးပါမယ်) ဒါမှမဟုတ် list တစ်ခု ပေးပို့ခြင်းဖြင့် လုပ်ဆောင်နိုင်ပါတယ်။
+
+```py
+encoded_input = tokenizer(["How are you?", "I'm fine, thank you!"])
+print(encoded_input)
+```
+
+```python out
+{'input_ids': [[101, 1731, 1132, 1128, 136, 102], [101, 1045, 1005, 1049, 2503, 117, 5763, 1128, 136, 102]], 
+ 'token_type_ids': [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 
+ 'attention_mask': [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
+```
+
+စာကြောင်းများစွာကို ပေးပို့တဲ့အခါ၊ tokenizer က dictionary value တစ်ခုစီအတွက် စာကြောင်းတစ်ခုစီအတွက် list တစ်ခု ပြန်ပေးတာကို သတိပြုပါ။ tokenizer ကို PyTorch ကနေ tensors တွေကို တိုက်ရိုက်ပြန်ပေးဖို့လည်း တောင်းဆိုနိုင်ပါတယ်။
+
+```py
+# မှတ်ချက်- padding မပါဘဲ အရှည်မတူတဲ့ စာကြောင်းတွေကို tensor အဖြစ် တောင်းဆိုရင် ValueError ဖြစ်ပါတယ်။
+encoded_input = tokenizer(["How are you?", "I'm fine, thank you!"], return_tensors="pt")
+print(encoded_input)
+```
+
+ဒါပေမယ့် ပြဿနာတစ်ခု ရှိပါတယ်- list နှစ်ခုရဲ့ အရှည်က မတူပါဘူး။ Arrays နဲ့ tensors တွေဟာ ထောင့်မှန်ပုံစံ (rectangular shapes) ဖြစ်ဖို့ လိုအပ်ပါတယ်။ ဒါကြောင့် ဒီ list တွေကို PyTorch tensor (သို့မဟုတ် NumPy array) အဖြစ် ရိုးရှင်းစွာ ပြောင်းလဲလို့ မရပါဘူး။ tokenizer က ဒါအတွက် ရွေးချယ်စရာတစ်ခု ပေးထားပါတယ်၊ padding ပါ။
+
+### Inputs တွေကို Padding လုပ်ခြင်း[[padding-inputs]]
+
+ကျွန်တော်တို့ inputs တွေကို pad လုပ်ဖို့ tokenizer ကို တောင်းဆိုရင်၊ ၎င်းက အရှည်ဆုံးစာကြောင်းထက် တိုနေတဲ့ စာကြောင်းတွေမှာ special padding token တွေ ထည့်သွင်းခြင်းဖြင့် စာကြောင်းအားလုံးကို အရှည်တူညီအောင် ပြုလုပ်ပေးပါလိမ့်မယ်။
+
+```py
+encoded_input = tokenizer(
+    ["How are you?", "I'm fine, thank you!"], padding=True, return_tensors="pt"
+)
+print(encoded_input)
+```
+
+```python out
+{'input_ids': tensor([[  101,  1731,  1132,  1128,   136,   102,     0,     0,     0,     0],
+         [  101,  1045,  1005,  1049,  2503,   117,  5763,  1128,   136,   102]]), 
+ 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 
+ 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
+         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
+```
+
+အခု ကျွန်တော်တို့မှာ ထောင့်မှန်ပုံစံ tensors တွေ ရပါပြီ။ padding tokens တွေကို ID 0 နဲ့ input IDs တွေအဖြစ် encode လုပ်ထားပြီး၊ ၎င်းတို့မှာ attention mask value ကလည်း 0 ဖြစ်တာကို သတိပြုပါ။ ဒါက ဘာလို့လဲဆိုတော့ အဲဒီ padding tokens တွေကို model က analyze လုပ်ဖို့ မလိုအပ်ပါဘူး- ၎င်းတို့က တကယ့်စာကြောင်းရဲ့ အစိတ်အပိုင်းတွေ မဟုတ်ပါဘူး။
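+
+To make the padding step concrete, here is a minimal pure-Python sketch of right-padding a batch and building the matching attention mask. It is a simplification under stated assumptions, not the tokenizer's real implementation: `pad_batch` is a hypothetical helper and the padding ID of 0 is taken from the output above.
+
+```python
+def pad_batch(batch, pad_id=0):
+    # Pad every sequence on the right up to the longest one in the batch.
+    max_len = max(len(seq) for seq in batch)
+    input_ids, attention_mask = [], []
+    for seq in batch:
+        n_pad = max_len - len(seq)
+        input_ids.append(seq + [pad_id] * n_pad)
+        # 1 marks a real token, 0 marks a padding position.
+        attention_mask.append([1] * len(seq) + [0] * n_pad)
+    return {"input_ids": input_ids, "attention_mask": attention_mask}
+
+
+# Token IDs copied from the example output above.
+batch = [
+    [101, 1731, 1132, 1128, 136, 102],
+    [101, 1045, 1005, 1049, 2503, 117, 5763, 1128, 136, 102],
+]
+padded = pad_batch(batch)
+print(padded["input_ids"])
+print(padded["attention_mask"])
+```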
+
+### Inputs တွေကို Truncating လုပ်ခြင်း[[truncating-inputs]]
+
+Tensors တွေက model က လုပ်ဆောင်ဖို့အတွက် အရမ်းကြီးလာနိုင်ပါတယ်။ ဥပမာအားဖြင့်၊ BERT ကို အများဆုံး tokens 512 ခုအထိပဲ sequences တွေနဲ့ pretrain လုပ်ထားတာကြောင့် ပိုရှည်တဲ့ sequences တွေကို လုပ်ဆောင်လို့ မရပါဘူး။ သင့်မှာ model က ကိုင်တွယ်နိုင်တာထက် ပိုရှည်တဲ့ sequences တွေရှိရင်၊ `truncation` parameter နဲ့ ၎င်းတို့ကို ဖြတ်တောက်ဖို့ လိုအပ်ပါလိမ့်မယ်-
+
+```py
+encoded_input = tokenizer(
+    "This is a very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long sentence.",
+    truncation=True,
+)
+print(encoded_input["input_ids"])
+```
+
+```python out
+[101, 1188, 1110, 170, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1505, 1179, 5650, 119, 102]
+```
+
+padding နဲ့ truncation arguments တွေကို ပေါင်းစပ်ခြင်းဖြင့်၊ သင်လိုအပ်တဲ့ တိကျတဲ့ size ရှိတဲ့ tensors တွေကို ရရှိကြောင်း သေချာစေနိုင်ပါတယ်။
+
+```py
+encoded_input = tokenizer(
+    ["How are you?", "I'm fine, thank you!"],
+    padding=True,
+    truncation=True,
+    max_length=5,
+    return_tensors="pt",
+)
+print(encoded_input)
+```
+
+```python out
+{'input_ids': tensor([[  101,  1731,  1132,  1128,   102],
+         [  101,  1045,  1005,  1049,   102]]), 
+ 'token_type_ids': tensor([[0, 0, 0, 0, 0],
+         [0, 0, 0, 0, 0]]), 
+ 'attention_mask': tensor([[1, 1, 1, 1, 1],
+         [1, 1, 1, 1, 1]])}
+```
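+
+The effect of `max_length=5` above can be approximated by hand. The sketch below assumes a simplified rule (cut the content tokens, then keep `[CLS]` and `[SEP]` at both ends); `truncate` is a hypothetical helper, and the real tokenizer supports more strategies than this.
+
+```python
+CLS_ID, SEP_ID = 101, 102  # special-token IDs as they appear in the outputs above
+
+
+def truncate(ids, max_length):
+    # Sequences that already fit are returned unchanged.
+    if len(ids) <= max_length:
+        return list(ids)
+    # Drop [CLS]/[SEP], cut the content, then put the special tokens back.
+    content = ids[1:-1]
+    keep = max(max_length - 2, 0)
+    return [CLS_ID] + content[:keep] + [SEP_ID]
+
+
+batch = [
+    [101, 1731, 1132, 1128, 136, 102],
+    [101, 1045, 1005, 1049, 2503, 117, 5763, 1128, 136, 102],
+]
+truncated = [truncate(seq, max_length=5) for seq in batch]
+print(truncated)
+```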
+
+### Special Tokens တွေ ထည့်သွင်းခြင်း
+
+Special tokens တွေ (သို့မဟုတ် ၎င်းတို့ရဲ့ သဘောတရား) ဟာ BERT နဲ့ ဆင်းသက်လာတဲ့ မော်ဒယ်တွေအတွက် အထူးအရေးကြီးပါတယ်။ ဒီ tokens တွေကို စာကြောင်းရဲ့ အစ ( `[CLS]` ) ဒါမှမဟုတ် စာကြောင်းတွေကြားက ပိုင်းခြားတဲ့နေရာ ( `[SEP]` ) လိုမျိုး စာကြောင်းရဲ့ နယ်နိမိတ်တွေကို ပိုမိုကောင်းမွန်စွာ ကိုယ်စားပြုနိုင်ဖို့ ထည့်သွင်းထားတာပါ။ ရိုးရှင်းတဲ့ ဥပမာတစ်ခုကို ကြည့်ကြည့်ရအောင်။
+
+```py
+encoded_input = tokenizer("How are you?")
+print(encoded_input["input_ids"])
+tokenizer.decode(encoded_input["input_ids"])
+```
+
+```python out
+[101, 1731, 1132, 1128, 136, 102]
+'[CLS] How are you? [SEP]'
+```
+
+ဒီ special tokens တွေကို tokenizer က အလိုအလျောက် ထည့်သွင်းပေးပါတယ်။ မော်ဒယ်အားလုံးက special tokens တွေ လိုအပ်တာ မဟုတ်ပါဘူး။ ၎င်းတို့ကို မော်ဒယ်ကို pretrained လုပ်ခဲ့တုန်းက အသုံးပြုခဲ့ရင် အဓိကအားဖြင့် အသုံးပြုပါတယ်။ အဲဒီအခါမှာတော့ tokenizer က ဒီ tokens တွေကို model က မျှော်လင့်ထားတဲ့အတိုင်း ထည့်ပေးပါလိမ့်မယ်။
+
+### ဒါတွေအားလုံး ဘာကြောင့် လိုအပ်တာလဲ။
+
+ဒီနေရာမှာ တိကျတဲ့ ဥပမာတစ်ခု ရှိပါတယ်- encode လုပ်ထားတဲ့ sequences တွေကို စဉ်းစားကြည့်ပါ။
+
+```py
+sequences = [
+    "I've been waiting for a HuggingFace course my whole life.",
+    "I hate this so much!",
+]
+```
+
+tokenized လုပ်ပြီးတာနဲ့ ကျွန်တော်တို့မှာ အောက်ပါအတိုင်း ရှိပါတယ်။
+
+```python
+encoded_sequences = [
+    [
+        101,
+        1045,
+        1005,
+        2310,
+        2042,
+        3403,
+        2005,
+        1037,
+        17662,
+        12172,
+        2607,
+        2026,
+        2878,
+        2166,
+        1012,
+        102,
+    ],
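+    # ဒုတိယ sequence ကို ပထမ sequence နဲ့ အရှည်တူအောင် padding ID 0 နဲ့ ဖြည့်ထားပါတယ်။
+    [101, 1045, 5223, 2023, 2061, 2172, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0],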
+]
+```
+
+ဒါက encode လုပ်ထားတဲ့ sequences တွေရဲ့ list တစ်ခုပါ- list of lists တစ်ခုပေါ့။ Tensors တွေက ထောင့်မှန်ပုံစံ (rectangular shapes) တွေကိုသာ လက်ခံပါတယ် (matrices တွေကို တွေးကြည့်ပါ)။ ဒီ "array" ဟာ ထောင့်မှန်ပုံစံ ဖြစ်နေပြီဆိုတော့ ဒါကို tensor အဖြစ် ပြောင်းလဲဖို့က လွယ်ကူပါတယ်။
+
+```py
+import torch
+
+model_inputs = torch.tensor(encoded_sequences)
+```
+
+### Tensors များကို Model ၏ Inputs များအဖြစ် အသုံးပြုခြင်း[[using-the-tensors-as-inputs-to-the-model]]
+
+tensors တွေကို model နဲ့ အသုံးပြုတာက အလွန်ရိုးရှင်းပါတယ်- inputs တွေနဲ့ model ကို ခေါ်လိုက်ရုံပါပဲ။
+
+```py
+output = model(model_inputs)
+```
+
+model က မတူညီတဲ့ arguments များစွာကို လက်ခံပေမယ့်၊ input IDs တွေကသာ လိုအပ်တဲ့ arguments တွေပါ။ အခြား arguments တွေက ဘာလုပ်တယ်၊ ဘယ်အချိန်မှာ လိုအပ်တယ်ဆိုတာကို နောက်မှ ရှင်းပြပါမယ်။ ဒါပေမယ့် ပထမဆုံး Transformer model က နားလည်နိုင်တဲ့ inputs တွေကို တည်ဆောက်တဲ့ tokenizers တွေအကြောင်းကို ပိုပြီး နက်နဲစွာ လေ့လာဖို့ လိုအပ်ပါတယ်။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Models**: Artificial Intelligence (AI) နယ်ပယ်တွင် အသုံးပြုသော သင်္ချာဆိုင်ရာ ပုံစံများ သို့မဟုတ် algorithms များ။
+*   **`AutoModel` Class**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး Transformer model ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **Checkpoint**: မော်ဒယ်တစ်ခု၏ လေ့ကျင့်မှုအခြေအနေ (weights, architecture configuration) ကို သတ်မှတ်ထားသော အချိန်တစ်ခုတွင် မှတ်တမ်းတင်ထားခြင်း။
+*   **`from_pretrained()` Method**: Pre-trained model သို့မဟုတ် tokenizer ကို load လုပ်ရန် အသုံးပြုသော method။
+*   **`bert-base-cased`**: BERT မော်ဒယ်၏ အမည်။ `base` သည် မော်ဒယ်၏ အရွယ်အစားကို ဖော်ပြပြီး `cased` သည် စာလုံးအကြီးအသေး ခွဲခြားမှုကို ထည့်သွင်းစဉ်းစားပြီး လေ့ကျင့်ထားကြောင်း ဖော်ပြသည်။
+*   **Hugging Face Hub**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **Model Architecture**: မော်ဒယ်တစ်ခု၏ ဖွဲ့စည်းပုံ၊ layer အမျိုးအစားများ၊ ၎င်းတို့ ချိတ်ဆက်ပုံ စသည်တို့ကို ဖော်ပြသော ဒီဇိုင်း။
+*   **Weights**: Machine Learning မော်ဒယ်တစ်ခု၏ သင်ယူနိုင်သော အစိတ်အပိုင်းများ။ ၎င်းတို့သည် လေ့ကျင့်နေစဉ်အတွင်း ဒေတာများမှ ပုံစံများကို သင်ယူကာ ချိန်ညှိပေးသည်။
+*   **Layers**: Neural Network တစ်ခု၏ အဆင့်များ။
+*   **Hidden Size**: Hidden states vector တစ်ခု၏ dimension အရွယ်အစား။
+*   **Attention Heads**: Transformer model ၏ attention mechanism တွင် ပါဝင်သော အပြိုင်လုပ်ဆောင်နိုင်သည့် အစိတ်အပိုင်းများ။
+*   **Cased Inputs**: စာလုံးအကြီးအသေး ကွာခြားမှုကို ထည့်သွင်းစဉ်းစားသည့် inputs များ။
+*   **Wrappers**: အခြား code များကို ပိုမိုလွယ်ကူစွာ အသုံးပြုနိုင်စေရန် ပတ်ခြုံပေးထားသော code များ။
+*   **`BertModel` Class**: BERT model architecture ကို တိုက်ရိုက်သတ်မှတ်ပေးသော class။
+*   **`save_pretrained()` Method**: Model သို့မဟုတ် tokenizer ၏ weights များနှင့် architecture configuration ကို save လုပ်ရန် အသုံးပြုသော method။
+*   **`config.json`**: Model architecture တည်ဆောက်ရန် လိုအပ်သော attributes များနှင့် metadata များ ပါဝင်သော JSON ဖိုင်။
+*   **`model.safetensors` / `pytorch_model.safetensors`**: Model ၏ weights များ ပါဝင်သော ဖိုင်။
+*   **State Dictionary**: မော်ဒယ်တစ်ခု၏ သင်ယူထားသော parameters (weights) များကို သိုလှောင်ထားသော dictionary။
+*   **Metadata**: ဒေတာအကြောင်းအရာနှင့်ပတ်သက်သော အချက်အလက်များ (ဥပမာ - ဘယ်ကနေ စတင်ခဲ့သည်၊ ဘယ် version ဖြင့် save လုပ်ခဲ့သည်)။
+*   **Hugging Face**: AI နှင့် machine learning အတွက် tools များနှင့် platform များ ထောက်ပံ့ပေးသော ကုမ္ပဏီ။
+*   **`huggingface_hub`**: Hugging Face Hub နှင့် ချိတ်ဆက်ရန်အတွက် Python library။
+*   **`notebook_login()`**: Jupyter/Colab notebook များတွင် Hugging Face Hub ကို log in လုပ်ရန် အသုံးပြုသော function။
+*   **`huggingface-cli login`**: terminal တွင် Hugging Face Hub ကို log in လုပ်ရန် အသုံးပြုသော command line tool။
+*   **`push_to_hub()` Method**: model ကို Hugging Face Hub သို့ upload လုပ်ရန် အသုံးပြုသော method။
+*   **Repository**: Hugging Face Hub ပေါ်ရှိ model files များ သို့မဟုတ် datasets များကို သိုလှောင်ထားသော နေရာ။
+*   **Namespace**: Hugging Face Hub တွင် သုံးစွဲသူအကောင့် သို့မဟုတ် အဖွဲ့အစည်းအမည်။
+*   **Hub API**: Hugging Face Hub နှင့် ပရိုဂရမ်ဖြင့် အပြန်အလှန် ချိတ်ဆက်ရန်အတွက် API။
+*   **Local Repository**: သင့်ကွန်ပျူတာပေါ်ရှိ model files များ သို့မဟုတ် datasets များကို သိုလှောင်ထားသော နေရာ။
+*   **Model Cards**: Hugging Face Hub ပေါ်ရှိ မော်ဒယ်တစ်ခု၏ အချက်အလက်များ၊ အသုံးပြုပုံနှင့် စွမ်းဆောင်ရည်များကို အကျဉ်းချုပ်ဖော်ပြထားသော စာမျက်နှာ။
+*   **`AutoTokenizer` Class**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး သက်ဆိုင်ရာ tokenizer ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **Tokens**: စာသားကို ခွဲခြမ်းစိတ်ဖြာရာတွင် အသုံးပြုသော အသေးငယ်ဆုံးယူနစ်များ (ဥပမာ- စကားလုံးများ၊ subwords များ သို့မဟုတ် ပုဒ်ဖြတ်သင်္ကေတများ)။
+*   **`encoded_input`**: Tokenizer ကနေ ထွက်လာတဲ့ encode လုပ်ထားတဲ့ input data။
+*   **`input_ids`**: Tokenizer မှ ထုတ်ပေးသော tokens တစ်ခုစီ၏ ထူးခြားသော ဂဏန်းဆိုင်ရာ ID များ။
+*   **`token_type_ids`**: Multi-sentence inputs တွေမှာ မည်သည့် token က မည်သည့် sentence (A သို့မဟုတ် B) မှ လာသည်ကို model ကို ပြောပြပေးသော IDs များ။
+*   **`attention_mask`**: မော်ဒယ်ကို အာရုံစိုက်သင့်သည့် tokens များနှင့် လျစ်လျူရှုသင့်သည့် (padding) tokens များကို ခွဲခြားပေးသည့် binary mask။
+*   **`tokenizer.decode()` Method**: Token IDs များကို မူရင်းစာသားသို့ ပြန်ပြောင်းလဲပေးသော method။
+*   **Special Tokens**: Transformer model များက စာကြောင်းနယ်နိမိတ်များ သို့မဟုတ် အခြားအချက်အလက်များကို ကိုယ်စားပြုရန် အသုံးပြုသော အထူး tokens များ (ဥပမာ - `[CLS]`, `[SEP]`, `[PAD]`)။
+*   **`[CLS]`**: BERT မော်ဒယ်တွင် classification task အတွက် အသုံးပြုသော special token (စာကြောင်း၏ အစတွင် ပေါ်လာသည်)။
+*   **`[SEP]`**: BERT မော်ဒယ်တွင် စာကြောင်းများကြား ပိုင်းခြားရန် အသုံးပြုသော special token။
+*   **Batching**: မတူညီသော input များစွာကို တစ်ပြိုင်နက်တည်း လုပ်ဆောင်နိုင်ရန် အုပ်စုဖွဲ့ခြင်း။
+*   **Tensors**: Machine Learning frameworks (PyTorch, TensorFlow) များတွင် ဒေတာများကို ကိုယ်စားပြုသော multi-dimensional array များ။
+*   **Rectangular Shapes**: ညီညာသော အတန်းများနှင့် ကော်လံများပါဝင်သည့် ပုံစံ (matrices ကဲ့သို့)။
+*   **Arguments**: function သို့မဟုတ် method တစ်ခုသို့ ပေးပို့သော တန်ဖိုးများ။
\ No newline at end of file
diff --git a/chapters/my/chapter2/4.mdx b/chapters/my/chapter2/4.mdx
new file mode 100644
index 000000000..c1acefe9a
--- /dev/null
+++ b/chapters/my/chapter2/4.mdx
@@ -0,0 +1,252 @@
+<FrameworkSwitchCourse {fw} />
+
+# Tokenizers[[tokenizers]]
+
+<CourseFloatingBanner chapter={2}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section4_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section4_pt.ipynb"},
+]} />
+
+<Youtube id="VFp38yj8h3A"/>
+
+Tokenizers တွေဟာ NLP pipeline ရဲ့ အဓိက အစိတ်အပိုင်းတွေထဲက တစ်ခုပါ။ ၎င်းတို့မှာ ရည်ရွယ်ချက်တစ်ခုတည်းပဲ ရှိပါတယ်၊ text ကို model က လုပ်ဆောင်နိုင်တဲ့ data အဖြစ် ပြောင်းလဲဖို့ပါပဲ။ Model တွေက ဂဏန်းတွေကိုပဲ လုပ်ဆောင်နိုင်တာမို့၊ tokenizers တွေက ကျွန်တော်တို့ရဲ့ text inputs တွေကို numerical data အဖြစ် ပြောင်းလဲပေးဖို့ လိုအပ်ပါတယ်။ ဒီအပိုင်းမှာတော့ tokenization pipeline မှာ ဘာတွေ အတိအကျဖြစ်ပျက်လဲဆိုတာကို လေ့လာသွားပါမယ်။
+
+NLP လုပ်ငန်းတွေမှာ အများအားဖြင့် လုပ်ဆောင်တဲ့ data က raw text ပါ။ ဒီလို text ရဲ့ ဥပမာတစ်ခုကို အောက်မှာ ကြည့်ပါ။
+
+```
+Jim Henson was a puppeteer
+```
+
+သို့သော်လည်း၊ model တွေက ဂဏန်းတွေကိုပဲ လုပ်ဆောင်နိုင်တာမို့၊ raw text ကို ဂဏန်းတွေအဖြစ် ပြောင်းလဲဖို့ နည်းလမ်းတစ်ခုကို ကျွန်တော်တို့ ရှာဖွေဖို့ လိုအပ်ပါတယ်။ ဒါက tokenizers တွေ လုပ်ဆောင်တဲ့အရာ ဖြစ်ပြီး၊ ဒါကို လုပ်ဆောင်ဖို့ နည်းလမ်းများစွာ ရှိပါတယ်။ ရည်ရွယ်ချက်ကတော့ အဓိပ္ပာယ်အရှိဆုံး ကိုယ်စားပြုမှု (ဆိုလိုသည်မှာ model အတွက် အဓိပ္ပာယ်အရှိဆုံး ကိုယ်စားပြုမှု) နဲ့ ဖြစ်နိုင်ရင် အသေးငယ်ဆုံး ကိုယ်စားပြုမှုကို ရှာဖွေဖို့ပါပဲ။
+
+tokenization algorithm အချို့ရဲ့ ဥပမာတွေကို ကြည့်ပြီး၊ tokenization နဲ့ ပတ်သက်ပြီး သင့်မှာရှိနိုင်တဲ့ မေးခွန်းအချို့ကို ဖြေဆိုဖို့ ကြိုးစားကြရအောင်။
+
+## Word-based[[word-based]]
+
+<Youtube id="nhJxYji1aho"/>
+
+ပထမဆုံး တွေးမိတဲ့ tokenizer အမျိုးအစားကတော့ *word-based* ပါ။ ဒါကို စည်းမျဉ်းအနည်းငယ်နဲ့ တည်ဆောက်ပြီး အသုံးပြုဖို့ အလွန်လွယ်ကူပြီး၊ အများအားဖြင့် ကောင်းမွန်တဲ့ ရလဒ်တွေ ထွက်ပေါ်လာပါတယ်။ ဥပမာအားဖြင့်၊ အောက်က ပုံမှာ၊ ရည်ရွယ်ချက်က raw text ကို စကားလုံးတွေအဖြစ် ပိုင်းခြားပြီး တစ်ခုချင်းစီအတွက် ဂဏန်းဆိုင်ရာ ကိုယ်စားပြုမှုကို ရှာဖွေဖို့ပါပဲ။
+
+<div class="flex justify-center">
+  <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/word_based_tokenization.svg" alt="An example of word-based tokenization."/>
+  <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/word_based_tokenization-dark.svg" alt="An example of word-based tokenization."/>
+</div>
+
+text ကို ပိုင်းခြားဖို့ နည်းလမ်းအမျိုးမျိုး ရှိပါတယ်။ ဥပမာအားဖြင့်၊ Python ရဲ့ `split()` function ကို အသုံးပြုပြီး whitespace ကို သုံးကာ text ကို စကားလုံးတွေအဖြစ် tokenize လုပ်ဆောင်နိုင်ပါတယ်။
+
+```py
+tokenized_text = "Jim Henson was a puppeteer".split()
+print(tokenized_text)
+```
+
+```python out
+['Jim', 'Henson', 'was', 'a', 'puppeteer']
+```
+
+punctuation အတွက် အပိုစည်းမျဉ်းတွေရှိတဲ့ word tokenizer အမျိုးအစားတွေလည်း ရှိပါသေးတယ်။ ဒီလို tokenizer အမျိုးအစားနဲ့ဆိုရင် ကျွန်တော်တို့ဟာ အတော်လေး ကြီးမားတဲ့ "vocabularies" တွေနဲ့ အဆုံးသတ်နိုင်ပါတယ်။ vocabulary ဆိုတာက ကျွန်တော်တို့ရဲ့ corpus မှာရှိတဲ့ သီးခြား tokens စုစုပေါင်းအရေအတွက်နဲ့ သတ်မှတ်ပါတယ်။
+
+စကားလုံးတစ်ခုစီကို ID တစ်ခုစီ ခွဲပေးပြီး 0 ကနေစပြီး vocabulary အရွယ်အစားအထိ သတ်မှတ်ပေးပါတယ်။ model က ဒီ IDs တွေကို စကားလုံးတစ်ခုစီကို ခွဲခြားသိမြင်ဖို့ အသုံးပြုပါတယ်။
+
+ကျွန်တော်တို့ဟာ word-based tokenizer တစ်ခုနဲ့ ဘာသာစကားတစ်ခုကို အပြည့်အစုံ ကာဗာလုပ်ချင်တယ်ဆိုရင်၊ ဘာသာစကားထဲက စကားလုံးတစ်ခုစီအတွက် identifier တစ်ခုစီ ရှိဖို့ လိုအပ်ပါလိမ့်မယ်။ ဒါက ကြီးမားတဲ့ tokens အရေအတွက်ကို ထုတ်ပေးပါလိမ့်မယ်။ ဥပမာအားဖြင့်၊ English ဘာသာစကားမှာ စကားလုံး ၅၀၀,၀၀၀ ကျော်ရှိတာကြောင့် စကားလုံးတစ်ခုစီကနေ input ID တစ်ခုဆီ map လုပ်ဖို့အတွက် ID အရေအတွက် အများကြီးကို မှတ်ထားဖို့ လိုအပ်ပါလိမ့်မယ်။ ဒါ့အပြင် "dog" လို စကားလုံးတွေကို "dogs" လို စကားလုံးတွေနဲ့ မတူအောင် ကိုယ်စားပြုထားပြီး၊ "dog" နဲ့ "dogs" တို့ဟာ ဆင်တူတယ်ဆိုတာကို model က အစပိုင်းမှာ သိဖို့ နည်းလမ်းမရှိပါဘူး- ၎င်းက စကားလုံးနှစ်ခုကို ဆက်စပ်မှုမရှိဘူးလို့ ခွဲခြားသတ်မှတ်ပါလိမ့်မယ်။ "run" နဲ့ "running" လို အခြားဆင်တူစကားလုံးတွေနဲ့လည်း အတူတူပါပဲ၊ model က အစပိုင်းမှာ ဆင်တူတယ်လို့ မမြင်ပါဘူး။
+
+နောက်ဆုံးအနေနဲ့၊ ကျွန်တော်တို့ရဲ့ vocabulary မှာ မပါဝင်တဲ့ စကားလုံးတွေကို ကိုယ်စားပြုဖို့ custom token တစ်ခု လိုအပ်ပါတယ်။ ဒါကို "unknown" token လို့ ခေါ်ပြီး၊ မကြာခဏဆိုသလို "[UNK]" သို့မဟုတ် "&lt;unk&gt;" နဲ့ ကိုယ်စားပြုပါတယ်။ tokenizer က ဒီ tokens တွေ အများကြီး ထုတ်ပေးနေတာကို သင်တွေ့ရရင် ဒါက မကောင်းတဲ့ လက္ခဏာတစ်ခုပါ။ ဘာလို့လဲဆိုတော့ ၎င်းက စကားလုံးတစ်ခုရဲ့ အဓိပ္ပာယ်ရှိတဲ့ ကိုယ်စားပြုမှုကို ရယူနိုင်ခြင်းမရှိဘဲ၊ သင်ဟာ အချက်အလက်တွေကို ဆုံးရှုံးနေတာကြောင့်ပါပဲ။ vocabulary ကို ဖန်တီးတဲ့အခါ ရည်ရွယ်ချက်ကတော့ tokenizer က unknown token အဖြစ် စကားလုံးအနည်းဆုံးကို tokenized လုပ်နိုင်အောင် ဖန်တီးဖို့ပါပဲ။
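+
+The word-to-ID mapping and the unknown token described above can be sketched in a few lines. This is a toy illustration only: the two-sentence corpus, the choice of ID 0 for `[UNK]`, and the `encode` helper are all assumptions made for the example.
+
+```python
+corpus = ["Jim Henson was a puppeteer", "a puppeteer was here"]
+
+# Reserve ID 0 for the unknown token, then number words by first appearance.
+vocab = {"[UNK]": 0}
+for sentence in corpus:
+    for word in sentence.split():
+        vocab.setdefault(word, len(vocab))
+
+
+def encode(text):
+    # Words missing from the vocabulary all collapse to the [UNK] ID.
+    return [vocab.get(word, vocab["[UNK]"]) for word in text.split()]
+
+
+ids = encode("Jim was a dog")
+print(vocab)
+print(ids)
+```
+
+Because "dog" never appears in the toy corpus, it is encoded as the `[UNK]` ID and its meaning is lost, which is the information loss the paragraph above warns about.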
+
+unknown tokens အရေအတွက်ကို လျှော့ချဖို့ နည်းလမ်းတစ်ခုကတော့ တစ်ဆင့်နိမ့်ဆင်းပြီး *character-based* tokenizer ကို အသုံးပြုဖို့ပါပဲ။
+
+## Character-based[[character-based]]
+
+<Youtube id="ssLq_EK2jLE"/>
+
+Character-based tokenizers split the text into characters, rather than words. This has two primary benefits:
+
+-   The vocabulary is much smaller.
+-   There are far fewer out-of-vocabulary (unknown) tokens, since every word can be built from characters.
+
+But here too some questions arise concerning spaces and punctuation:
+
+<div class="flex justify-center">
+  <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/character_based_tokenization.svg" alt="An example of character-based tokenization."/>
+  <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/character_based_tokenization-dark.svg" alt="An example of character-based tokenization."/>
+</div>
+
+This approach isn't perfect either. Since the representation is now based on characters rather than words, one could argue that, intuitively, it's less meaningful: each character doesn't mean a lot on its own, whereas that is the case with words. However, this again differs according to the language; in Chinese, for example, each character carries more information than a character in a Latin language.
+
+Another thing to consider is that we'll end up with a very large number of tokens to be processed by our model: whereas a word would only be a single token with a word-based tokenizer, it can easily turn into 10 or more tokens when converted into characters.
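+
+To make the trade-off concrete, the sketch below counts tokens for one sentence under whitespace splitting and under character splitting. The sample sentence is an arbitrary choice, and plain `str.split()` and `list()` stand in for real tokenizers, which is a simplification.
+
+```python
+text = "Let's do tokenization!"
+
+# Word-level view: split on whitespace only.
+word_tokens = text.split()
+
+# Character-level view: every character (spaces included) becomes a token.
+char_tokens = list(text)
+
+print(len(word_tokens), "word tokens")
+print(len(char_tokens), "character tokens")
+print(len(set(char_tokens)), "distinct characters needed in the vocabulary")
+```
+
+The character-level sequence is several times longer, while the set of distinct symbols stays small.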
+
+To get the best of both worlds, we can use a third technique that combines the two approaches: *subword tokenization*.
+
+## Subword Tokenization[[subword-tokenization]]
+
+<Youtube id="zHvTiHr506c"/>
+
+Subword tokenization algorithms rely on the principle that frequently used words should not be split into smaller subwords, but rare words should be decomposed into meaningful subwords.
+
+For instance, "annoyingly" might be considered a rare word and could be decomposed into "annoying" and "ly". These are both likely to appear more frequently as standalone subwords, while at the same time the meaning of "annoyingly" is kept by the composite meaning of "annoying" and "ly".
+
+Here is an example showing how a subword tokenization algorithm would tokenize the sequence "Let's do tokenization!":
+
+<div class="flex justify-center">
+  <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/bpe_subword.svg" alt="A subword tokenization algorithm."/>
+  <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/bpe_subword-dark.svg" alt="A subword tokenization algorithm."/>
+</div>
+
+These subwords end up providing a lot of semantic meaning: for instance, in the example above "tokenization" was split into "token" and "ization", two tokens that have a semantic meaning while being space-efficient (only two tokens are needed to represent a long word). This allows us to have relatively good coverage with small vocabularies, and close to no unknown tokens.
+
+This approach is especially useful in agglutinative languages such as Turkish, where you can form (almost) arbitrarily long complex words by stringing together subwords.
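+
+The splitting described above can be sketched with greedy longest-match-first matching, the strategy WordPiece-style tokenizers apply once a vocabulary exists. The `wordpiece_tokenize` helper and the tiny `toy_vocab` are hypothetical; a real tokenizer learns its vocabulary from a corpus and handles casing, punctuation, and normalization as well.
+
+```python
+def wordpiece_tokenize(word, vocab, unk="[UNK]"):
+    """Split one word into the longest vocabulary pieces, left to right."""
+    tokens = []
+    start = 0
+    while start < len(word):
+        end = len(word)
+        piece = None
+        # Try the longest remaining substring first, then shrink it.
+        while start < end:
+            candidate = word[start:end]
+            if start > 0:
+                # Pieces that continue a word carry the "##" prefix.
+                candidate = "##" + candidate
+            if candidate in vocab:
+                piece = candidate
+                break
+            end -= 1
+        if piece is None:
+            # No piece matches: the whole word maps to the unknown token.
+            return [unk]
+        tokens.append(piece)
+        start = end
+    return tokens
+
+
+# Hypothetical vocabulary, chosen to mirror the examples in the text.
+toy_vocab = {"annoying", "##ly", "token", "##ization", "do"}
+
+print(wordpiece_tokenize("annoyingly", toy_vocab))
+print(wordpiece_tokenize("tokenization", toy_vocab))
+print(wordpiece_tokenize("xyz", toy_vocab))
+```
+
+Frequent words that are in the vocabulary stay whole, rarer words are assembled from known pieces, and only words with no matching pieces fall back to `[UNK]`.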
+
+### And more![[and-more]]
+
+Unsurprisingly, there are many more techniques out there. To name a few:
+
+-   Byte-level BPE, as used in GPT-2
+-   WordPiece, as used in BERT
+-   SentencePiece or Unigram, as used in several multilingual models
+
+You should now have sufficient knowledge of how tokenizers work to get started with the API.
+
+## Loading and saving[[loading-and-saving]]
+
+Loading and saving tokenizers is as simple as it is with models. Actually, it's based on the same two methods: `from_pretrained()` and `save_pretrained()`. These methods will load or save the algorithm used by the tokenizer (a bit like the *architecture* of the model) as well as its vocabulary (a bit like the *weights* of the model).
+
+Loading the BERT tokenizer trained with the same checkpoint as BERT is done the same way as loading the model, except we use the `BertTokenizer` class:
+
+```py
+from transformers import BertTokenizer
+
+tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+```
+
+Similar to `AutoModel`, the `AutoTokenizer` class will grab the proper tokenizer class in the library based on the checkpoint name, and can be used directly with any checkpoint:
+
+```py
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+```
+
+We can now use the tokenizer as shown in the previous section:
+
+```python
+tokenizer("Using a Transformer network is simple")
+```
+
+```python out
+{'input_ids': [101, 7993, 170, 11303, 1200, 2443, 1110, 3014, 102],
+ 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
+ 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
+```
+
+Saving a tokenizer is identical to saving a model:
+
+```py
+tokenizer.save_pretrained("directory_on_my_computer")
+```
+
+We'll talk more about `token_type_ids` in [Chapter 3](/course/chapter3), and we'll explain the `attention_mask` key a little later. First, let's see how the `input_ids` are generated. To do this, we'll need to look at the intermediate methods of the tokenizer.
+
+## Encoding[[encoding]]
+
+<Youtube id="Yffk5aydLzg"/>
+
+Translating text to numbers is known as *encoding*. Encoding is done in a two-step process: the tokenization, followed by the conversion to input IDs.
+
+As we've seen, the first step is to split the text into words (or parts of words, punctuation symbols, etc.), usually called *tokens*. There are multiple rules that can govern that process, which is why we need to instantiate the tokenizer using the name of the model, to make sure we use the same rules that were used when the model was pretrained.
+
+The second step is to convert those tokens into numbers, so we can build a tensor out of them and feed them to the model. To do this, the tokenizer has a *vocabulary*, which is the part we download when we instantiate it with the `from_pretrained()` method. Again, we need to use the same vocabulary used when the model was pretrained.
+
+To get a better understanding of the two steps, we'll explore them separately. Note that we will use some methods that perform parts of the tokenization pipeline separately to show you the intermediate results of those steps, but in practice, you should call the tokenizer directly on your inputs (as shown in section 2).
+
+### Tokenization[[tokenization]]
+
+The tokenization process is done by the `tokenize()` method of the tokenizer:
+
+```py
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+
+sequence = "Using a Transformer network is simple"
+tokens = tokenizer.tokenize(sequence)
+
+print(tokens)
+```
+
+The output of this method is a list of strings, or tokens:
+
+```python out
+['Using', 'a', 'transform', '##er', 'network', 'is', 'simple']
+```
+
+This tokenizer is a subword tokenizer: it splits the words until it obtains tokens that can be represented by its vocabulary. That's the case here with `transformer`, which is split into two tokens: `transform` and `##er`.
+
+### From tokens to input IDs[[from-tokens-to-input-ids]]
+
+The conversion to input IDs is handled by the `convert_tokens_to_ids()` tokenizer method:
+
+```py
+ids = tokenizer.convert_tokens_to_ids(tokens)
+
+print(ids)
+```
+
+```python out
+[7993, 170, 11303, 1200, 2443, 1110, 3014]
+```
+
+These outputs, once converted to the appropriate framework tensor, can then be used as inputs to a model as seen earlier in this chapter.
+
+> [!TIP]
+> ✏️ **Try it out!** Replicate the two last steps (tokenization and conversion to input IDs) on the input sentences we used in section 2 ("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!"). Check that you get the same input IDs we got earlier!
+
+## Decoding[[decoding]]
+
+*Decoding* is going the other way around: from vocabulary indices, we want to get a string. This can be done with the `decode()` method as follows:
+
+```py
+decoded_string = tokenizer.decode([7993, 170, 11303, 1200, 2443, 1110, 3014])
+print(decoded_string)
+```
+
+```python out
+'Using a Transformer network is simple'
+```
+
+Note that the `decode` method not only converts the indices back to tokens, but also groups together the tokens that were part of the same words to produce a readable sentence. This behavior will be extremely useful when we use models that predict new text (either text generated from a prompt, or for sequence-to-sequence problems like translation or summarization).
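+
+The grouping step can be illustrated with a small sketch that merges WordPiece-style continuation tokens back into words. The `merge_wordpieces` helper is hypothetical and only handles the `##` prefix; the real `decode()` method also maps IDs to tokens and applies further cleanup rules.
+
+```python
+def merge_wordpieces(tokens):
+    """Join tokens into a string, attaching '##' pieces to the previous word."""
+    words = []
+    for token in tokens:
+        if token.startswith("##") and words:
+            # Continuation piece: glue it to the word being built.
+            words[-1] += token[2:]
+        else:
+            # Any other token starts a new word.
+            words.append(token)
+    return " ".join(words)
+
+
+# Token list taken from the tokenization example earlier in this section.
+tokens = ["Using", "a", "transform", "##er", "network", "is", "simple"]
+print(merge_wordpieces(tokens))
+```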
+
+By now you should understand the atomic operations a tokenizer can handle: tokenization, conversion to IDs, and converting IDs back to a string. However, we've just scraped the tip of the iceberg. In the following section, we'll take our approach to its limits and take a look at how to overcome them.
+
+## Glossary
+
+*   **Tokenizers**: Tools or processes that split text (or other data) into tokens so that AI models can process it.
+*   **NLP Pipeline**: The sequence of steps carried out to complete a Natural Language Processing (NLP) task.
+*   **Raw Text**: Original text that has not yet been processed or formatted.
+*   **Numerical Data**: Information expressed in numeric form.
+*   **Tokenization**: The process of splitting text into tokens.
+*   **Tokens**: The smallest units used when analyzing text (for example, words, subwords, or punctuation symbols).
+*   **Word-based Tokenizer**: A type of tokenizer that splits text into words.
+*   **Whitespace**: Blank characters in text (space, tab, newline).
+*   **`split()` Function**: A Python function used to split a string on a given delimiter.
+*   **Vocabulary**: The full set of distinct tokens that a tokenizer or model knows and can handle.
+*   **Corpus**: A large collection of text data used in machine learning.
+*   **ID**: The unique number that represents a token.
+*   **Unknown Token (`[UNK]`, `<unk>`)**: A special token used to represent words that are not in the tokenizer's vocabulary.
+*   **Character-based Tokenizer**: A type of tokenizer that splits text into characters.
+*   **Out-of-vocabulary (OOV) Tokens**: Tokens that are not in the tokenizer's vocabulary.
+*   **Subword Tokenization**: A tokenization approach that keeps frequently used words whole and splits rare words into meaningful subwords.
+*   **Agglutinative Languages**: Languages in which words are formed by combining small parts (for example, Turkish).
+*   **Byte-level BPE**: A variant of Byte Pair Encoding (BPE) that operates on bytes rather than characters. Used in GPT-2.
+*   **WordPiece**: A subword tokenization algorithm created by Google and used in BERT.
+*   **SentencePiece / Unigram**: Subword tokenization algorithms created by Google and used in multilingual models.
+*   **`AutoTokenizer` Class**: A class in the Hugging Face Transformers library that automatically loads the appropriate tokenizer from a model name.
+*   **`BertTokenizer` Class**: A tokenizer class designed specifically for the BERT model.
+*   **`from_pretrained()` Method**: A method used to load a pretrained model or tokenizer.
+*   **`save_pretrained()` Method**: A method used to save a model (its weights and architecture configuration) or a tokenizer (its algorithm and vocabulary).
+*   **Encoding**: The process of converting text into a numerical representation.
+*   **`tokenize()` Method**: The tokenizer method that splits text into tokens.
+*   **`convert_tokens_to_ids()` Method**: The tokenizer method that converts a list of tokens into a list of input IDs.
+*   **Decoding**: The process of converting a numerical representation (vocabulary indices) back into text.
+*   **`decode()` Method**: The method that converts input IDs back into text.
+*   **Sequence-to-sequence Problems**: Tasks that transform an input sequence into an output sequence (for example, translation or summarization).
+*   **Prompt**: The starting text given to a model so that it generates text.
\ No newline at end of file
diff --git a/chapters/my/chapter2/5.mdx b/chapters/my/chapter2/5.mdx
new file mode 100644
index 000000000..f9ef55615
--- /dev/null
+++ b/chapters/my/chapter2/5.mdx
@@ -0,0 +1,225 @@
+<FrameworkSwitchCourse {fw} />
+
+# Handling multiple sequences[[handling-multiple-sequences]]
+
+<CourseFloatingBanner chapter={2}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section5_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section5_pt.ipynb"},
+]} />
+
+<Youtube id="M6adb1j2jPI"/>
+
+In the previous section, we explored the simplest of use cases: doing inference on a single sequence of a small length. However, some questions emerge already:
+
+-   How do we handle multiple sequences?
+-   How do we handle multiple sequences *of different lengths*?
+-   Are vocabulary indices the only inputs that allow a model to work well?
+-   Is there such a thing as too long a sequence?
+
+Let's see what kinds of problems these questions pose, and how we can solve them using the 🤗 Transformers API.
+
+## Models expect a batch of inputs[[models-expect-a-batch-of-inputs]]
+
+In the previous exercise you saw how sequences get translated into lists of numbers. Let's convert this list of numbers to a tensor and send it to the model:
+
+```py
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
+
+sequence = "I've been waiting for a HuggingFace course my whole life."
+
+tokens = tokenizer.tokenize(sequence)
+ids = tokenizer.convert_tokens_to_ids(tokens)
+input_ids = torch.tensor(ids)
+# This line will fail.
+model(input_ids)
+```
+
+```python out
+IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
+```
+
+Oh no! Why did this fail? We followed the steps from the pipeline in section 2.
+
+The problem is that we sent a single sequence to the model, whereas 🤗 Transformers models expect multiple sentences by default. Here we tried to do everything the tokenizer did behind the scenes when we applied it to a `sequence`. But if you look closely, you'll see that the tokenizer didn't just convert the list of input IDs into a tensor, it added a dimension on top of it:
+
+```py
+tokenized_inputs = tokenizer(sequence, return_tensors="pt")
+print(tokenized_inputs["input_ids"])
+```
+
+```python out
+tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
+          2607,  2026,  2878,  2166,  1012,   102]])
+```
+
+Let's try again and add a new dimension:
+
+```py
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
+
+sequence = "I've been waiting for a HuggingFace course my whole life."
+
+tokens = tokenizer.tokenize(sequence)
+ids = tokenizer.convert_tokens_to_ids(tokens)
+
+input_ids = torch.tensor([ids])
+print("Input IDs:", input_ids)
+
+output = model(input_ids)
+print("Logits:", output.logits)
+```
+
+We print the input IDs as well as the resulting logits. Here's the output:
+
+```python out
+Input IDs: [[ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607, 2026,  2878,  2166,  1012]]
+Logits: [[-2.7276,  2.8789]]
+```
+
+*Batching* is the act of sending multiple sentences through the model, all at once. If you only have one sentence, you can still build a batch from it, for example by repeating the same sequence:
+
+```py
+batched_ids = [ids, ids]
+```
+
+This is a batch of two identical sequences!
+
+> [!TIP]
+> ✏️ **Try it out!** Convert this `batched_ids` list into a tensor and pass it through your model. Check that you obtain the same logits as before (but twice, once per sequence)!
+
+Batching allows the model to work when you feed it multiple sentences. Using multiple sequences is just as simple as building a batch with a single sequence. There's a second issue, though. When you're trying to batch together two (or more) sentences, they might be of different lengths. If you've ever worked with tensors before, you know that they need to be of rectangular shape, so you won't be able to convert the list of input IDs into a tensor directly. To work around this problem, we usually *pad* the inputs.
+
+## Padding the inputs[[padding-the-inputs]]
+
+The following list of lists cannot be converted to a tensor:
+
+```py no-format
+batched_ids = [
+    [200, 200, 200],
+    [200, 200]
+]
+```
+
+In order to work around this, we'll use *padding* to make our tensors have a rectangular shape. Padding makes sure all our sentences have the same length by adding a special word called the *padding token* to the sentences with fewer values. For example, if you have 10 sentences with 10 words and 1 sentence with 20 words, padding will ensure all the sentences have 20 words. In our example, the resulting tensor looks like this:
+
+```py no-format
+padding_id = 100
+
+batched_ids = [
+    [200, 200, 200],
+    [200, 200, padding_id],
+]
+```
+
+The padding token ID can be found in `tokenizer.pad_token_id`. Let's use it and send our two sentences through the model individually and batched together:
+
+```py no-format
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
+
+sequence1_ids = [[200, 200, 200]]
+sequence2_ids = [[200, 200]]
+batched_ids = [
+    [200, 200, 200],
+    [200, 200, tokenizer.pad_token_id],
+]
+
+print(model(torch.tensor(sequence1_ids)).logits)
+print(model(torch.tensor(sequence2_ids)).logits)
+print(model(torch.tensor(batched_ids)).logits)
+```
+
+```python out
+tensor([[ 1.5694, -1.3895]], grad_fn=<AddmmBackward>)
+tensor([[ 0.5803, -0.4125]], grad_fn=<AddmmBackward>)
+tensor([[ 1.5694, -1.3895],
+        [ 1.3373, -1.2163]], grad_fn=<AddmmBackward>)
+```
+
+There's something wrong with the logits in our batched predictions: the second row should be the same as the logits for the second sentence, but we've got completely different values!
+
+This is because the key feature of Transformer models is attention layers that *contextualize* each token. These will take into account the padding tokens since they attend to all of the tokens of a sequence. To get the same result when passing individual sentences of different lengths through the model or when passing a batch with the same sentences and padding applied, we need to tell those attention layers to ignore the padding tokens. This is done by using an attention mask.
+
+## Attention masks[[attention-masks]]
+
+*Attention masks* are tensors with the exact same shape as the input IDs tensor, filled with 0s and 1s: 1s indicate the corresponding tokens should be attended to, and 0s indicate the corresponding tokens should not be attended to (i.e., they should be ignored by the attention layers of the model).
+
+Let's complete the previous example with an attention mask:
+
+```py no-format
+batched_ids = [
+    [200, 200, 200],
+    [200, 200, tokenizer.pad_token_id],
+]
+
+attention_mask = [
+    [1, 1, 1],
+    [1, 1, 0],
+]
+
+outputs = model(torch.tensor(batched_ids), attention_mask=torch.tensor(attention_mask))
+print(outputs.logits)
+```
+
+```python out
+tensor([[ 1.5694, -1.3895],
+        [ 0.5803, -0.4125]], grad_fn=<AddmmBackward>)
+```
+
+Now we get the same logits for the second sentence in the batch.
+
+Notice how the last value of the second sequence is a padding ID, which is a 0 value in the attention mask.
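+
+Padding and mask construction go hand in hand, so they can be sketched together in plain Python. The `pad_batch` helper is hypothetical, and `pad_id=0` is only a placeholder value; in practice the ID should come from `tokenizer.pad_token_id`, and the tokenizer can build both structures for you.
+
+```python
+def pad_batch(batch, pad_id):
+    """Right-pad every sequence to the longest length and build the matching mask."""
+    max_len = max(len(seq) for seq in batch)
+    padded, mask = [], []
+    for seq in batch:
+        n_pad = max_len - len(seq)
+        # Real tokens are kept as-is; padding IDs fill the remainder.
+        padded.append(seq + [pad_id] * n_pad)
+        # 1 marks a real token, 0 marks a padding position to be ignored.
+        mask.append([1] * len(seq) + [0] * n_pad)
+    return padded, mask
+
+
+batched_ids, attention_mask = pad_batch([[200, 200, 200], [200, 200]], pad_id=0)
+print(batched_ids)
+print(attention_mask)
+```
+
+Both lists are now rectangular and have the same shape, so each can be converted to a tensor and passed to the model as in the example above.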
+
+> [!TIP]
+> ✏️ **Try it out!** Apply the tokenization manually on the two sentences used in section 2 ("I've been waiting for a HuggingFace course my whole life." and "I hate this so much!"). Pass them through the model and check that you get the same logits as in section 2. Now batch them together using the padding token, then create the proper attention mask. Check that you obtain the same results when going through the model!
+
+## Longer sequences[[longer-sequences]]
+
+With Transformer models, there is a limit to the lengths of the sequences we can pass the models. Most models handle sequences of up to 512 or 1024 tokens, and will crash when asked to process longer sequences. There are two solutions to this problem:
+
+-   Use a model with a longer supported sequence length.
+-   Truncate your sequences.
+
+Models have different supported sequence lengths, and some specialize in handling very long sequences. [Longformer](https://huggingface.co/docs/transformers/model_doc/longformer) is one example, and another is [LED](https://huggingface.co/docs/transformers/model_doc/led). If you're working on a task that requires very long sequences, we recommend you take a look at those models.
+
+Otherwise, we recommend you truncate your sequences by specifying the `max_sequence_length` parameter:
+
+```py
+sequence = sequence[:max_sequence_length]
+```
+
+## Glossary
+
+*   **Inference**: The process of using a trained Artificial Intelligence (AI) model to produce predictions or outputs from input data.
+*   **Sequence**: A piece of text, or an ordered collection of words or tokens.
+*   **Vocabulary Indices**: The unique numeric IDs of the tokens obtained after encoding text.
+*   **🤗 Transformers API**: The functions, classes, and methods that programmers can call to use the Hugging Face Transformers library.
+*   **Batching**: Grouping several inputs together so they can be processed at the same time.
+*   **Tensor**: A multi-dimensional array used to represent data in machine learning frameworks (PyTorch, TensorFlow).
+*   **`AutoTokenizer` Class**: A class in the Hugging Face Transformers library that automatically loads the appropriate tokenizer from a model name.
+*   **`AutoModelForSequenceClassification` Class**: A class in the Hugging Face Transformers library that automatically loads a pretrained model for sequence classification.
+*   **`from_pretrained()` Method**: A method used to load a pretrained model or tokenizer.
+*   **`tokenize()` Method**: The tokenizer method that splits text into tokens.
+*   **`convert_tokens_to_ids()` Method**: The tokenizer method that converts a list of tokens into a list of input IDs.
+*   **`input_ids`**: The unique numeric IDs of the tokens produced by the tokenizer.
+*   **`output.logits`**: The raw, unnormalized scores produced by the model's final layer.
+*   **Padding**: Filling input sequences of different lengths with a designated value so that they all have the same length.
+*   **Padding Token**: The special token used for padding (for example, `[PAD]`).
+*   **`tokenizer.pad_token_id`**: The ID of the tokenizer's padding token.
+*   **Attention Layers**: Components of a Transformer model that help determine the importance of the different tokens in an input sequence.
+*   **Contextualize**: To interpret the meaning of a word according to the sentence or text in which it appears.
+*   **Attention Mask**: A binary mask that tells the model which tokens to attend to and which (padding) tokens to ignore.
+*   **Truncate**: To cut sequences down to a given length limit.
+*   **`max_sequence_length` Parameter**: A parameter that sets the maximum length of an input sequence.
+*   **Longformer**: A Transformer model designed to handle very long sequences.
+*   **LED (Longformer-Encoder-Decoder)**: An encoder-decoder Transformer model based on Longformer.
\ No newline at end of file
diff --git a/chapters/my/chapter2/6.mdx b/chapters/my/chapter2/6.mdx
new file mode 100644
index 000000000..a24a3475a
--- /dev/null
+++ b/chapters/my/chapter2/6.mdx
@@ -0,0 +1,160 @@
+<FrameworkSwitchCourse {fw} />
+
+# Putting it all together[[putting-it-all-together]]
+
+<CourseFloatingBanner chapter={2}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section6_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section6_pt.ipynb"},
+]} />
+
+In the last few sections, we've been trying our best to do most of the work by hand. We've explored how tokenizers work and looked at tokenization, conversion to input IDs, padding, truncation, and attention masks.
+
+However, as we saw in section 2, the 🤗 Transformers API can handle all of this for us with a high-level function that we'll dive into here. When you call your `tokenizer` directly on the sentence, you get back inputs that are ready to pass through your model:
+
+```py
+from transformers import AutoTokenizer
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+sequence = "I've been waiting for a HuggingFace course my whole life."
+
+model_inputs = tokenizer(sequence)
+```
+
+Here, the `model_inputs` variable contains everything that's necessary for a model to operate well. For DistilBERT, that includes the input IDs as well as the attention mask. Other models that accept additional inputs will also have those output by the `tokenizer` object.
+
+As we'll see in some examples below, this method is very powerful. First, it can tokenize a single sequence:
+
+```py
+sequence = "I've been waiting for a HuggingFace course my whole life."
+
+model_inputs = tokenizer(sequence)
+```
+
+It also handles multiple sequences at a time, with no change in the API:
+
+```py
+sequences = ["I've been waiting for a HuggingFace course my whole life.", "So have I!"]
+
+model_inputs = tokenizer(sequences)
+```
+
+It can pad according to several objectives:
+
+```py
+# Will pad the sequences up to the maximum sequence length
+model_inputs = tokenizer(sequences, padding="longest")
+
+# Will pad the sequences up to the model max length
+# (512 for BERT or DistilBERT)
+model_inputs = tokenizer(sequences, padding="max_length")
+
+# Will pad the sequences up to the specified max length
+model_inputs = tokenizer(sequences, padding="max_length", max_length=8)
+```
+
+It can also truncate sequences:
+
+```py
+sequences = ["I've been waiting for a HuggingFace course my whole life.", "So have I!"]
+
+# Will truncate the sequences that are longer than the model max length
+# (512 for BERT or DistilBERT)
+model_inputs = tokenizer(sequences, truncation=True)
+
+# Will truncate the sequences that are longer than the specified max length
+model_inputs = tokenizer(sequences, max_length=8, truncation=True)
+```
+
+The `tokenizer` object can handle the conversion to specific framework tensors, which can then be directly sent to the model. For example, in the following code sample we are prompting the tokenizer to return tensors from the different frameworks: `"pt"` returns PyTorch tensors and `"np"` returns NumPy arrays:
+
+```py
+sequences = ["I've been waiting for a HuggingFace course my whole life.", "So have I!"]
+
+# Returns PyTorch tensors
+model_inputs = tokenizer(sequences, padding=True, return_tensors="pt")
+
+# Returns NumPy arrays
+model_inputs = tokenizer(sequences, padding=True, return_tensors="np")
+```
+
+## Special tokens[[special-tokens]]
+
+If we take a look at the input IDs returned by the tokenizer, we will see they are a tiny bit different from what we had earlier:
+
+```py
+sequence = "I've been waiting for a HuggingFace course my whole life."
+
+model_inputs = tokenizer(sequence)
+print(model_inputs["input_ids"])
+
+tokens = tokenizer.tokenize(sequence)
+ids = tokenizer.convert_tokens_to_ids(tokens)
+print(ids)
+```
+
+```python out
+[101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102]
+[1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]
+```
+
+One token ID was added at the beginning, and one at the end. Let's decode the two sequences of IDs above to see what this is about:
+
+```py
+print(tokenizer.decode(model_inputs["input_ids"]))
+print(tokenizer.decode(ids))
+```
+
+```python out
+"[CLS] i've been waiting for a huggingface course my whole life. [SEP]"
+"i've been waiting for a huggingface course my whole life."
+```
+
+The tokenizer added the special word `[CLS]` at the beginning and the special word `[SEP]` at the end. This is because the model was pretrained with those tokens, so to get the same results for inference we need to add them as well. Note that some models don't add special words, or add different ones; models may also add these special words only at the beginning, or only at the end. In any case, the tokenizer knows which ones are expected and will deal with this for you.
+
+## Wrapping up: From tokenizer to model[[wrapping-up-from-tokenizer-to-model]]
+
+Now that we've seen all the individual steps the `tokenizer` object uses when applied to texts, let's see one final time how it can handle multiple sequences (padding!), very long sequences (truncation!), and multiple types of tensors with its main API:
+
+```py
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
+sequences = ["I've been waiting for a HuggingFace course my whole life.", "So have I!"]
+
+tokens = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")
+output = model(**tokens)
+```
+
+## Glossary
+
+*   **Tokenizer**: A tool or process that splits text (or other data) into tokens so that AI models can process it.
+*   **Tokenization**: The process of splitting text into tokens.
+*   **Input IDs**: The unique numerical IDs that the tokenizer assigns to each token.
+*   **Padding**: Filling input sequences of different lengths with a designated value so that they all have the same length.
+*   **Truncation**: Cutting off input sequences that exceed a length limit.
+*   **Attention Mask**: A binary mask that tells the model which tokens to attend to and which (padding) tokens to ignore.
+*   **🤗 Transformers API**: The functions, classes, and methods that programmers can call to use the Hugging Face Transformers library.
+*   **`AutoTokenizer` Class**: A class in the Hugging Face Transformers library that automatically loads the matching tokenizer from a checkpoint name.
+*   **`from_pretrained()` Method**: The method used to load a pretrained model or tokenizer.
+*   **`distilbert-base-uncased-finetuned-sst-2-english`**: The name of the DistilBERT checkpoint used as the default for the `sentiment-analysis` pipeline. `base` indicates the model size, `uncased` indicates that it was trained without distinguishing upper and lower case, and `finetuned-sst-2-english` means it was fine-tuned on the SST-2 dataset for English.
+*   **`model_inputs` Variable**: The variable that holds all of the model inputs produced by the tokenizer.
+*   **PyTorch Tensors**: The multi-dimensional arrays used to represent data in the PyTorch deep learning framework.
+*   **NumPy Arrays**: The multi-dimensional arrays provided by NumPy, a Python library for numerical computing.
+*   **`padding="longest"`**: Pads up to the longest sequence in the batch.
+*   **`padding="max_length"`**: Pads up to the length given by `max_length`, or up to the model's maximum length when `max_length` is not set.
+*   **`max_length`**: The length limit used for padding or truncation.
+*   **`truncation=True`**: Truncates sequences to `max_length`, or to the model's maximum length when `max_length` is not set.
+*   **`return_tensors="pt"`**: Instructs the tokenizer to return PyTorch tensors.
+*   **`return_tensors="np"`**: Instructs the tokenizer to return NumPy arrays.
+*   **Special Tokens**: Special tokens that Transformer models use to mark sequence boundaries or carry other information (for example `[CLS]`, `[SEP]`, `[PAD]`).
+*   **`[CLS]`**: A special token used for classification tasks in BERT models (it appears at the start of the sequence).
+*   **`[SEP]`**: A special token used to separate sentences in BERT models.
+*   **`tokenizer.decode()` Method**: The method that converts token IDs back into text.
+*   **Inference**: The process of using a trained Artificial Intelligence (AI) model to produce predictions or outputs from input data.
+*   **`AutoModelForSequenceClassification` Class**: A class in the Hugging Face Transformers library that automatically loads a pretrained model for sequence classification.
+*   **`model(**tokens)`**: Unpacks the dictionary produced by the tokenizer into keyword arguments for the model.
\ No newline at end of file
diff --git a/chapters/my/chapter2/7.mdx b/chapters/my/chapter2/7.mdx
new file mode 100644
index 000000000..a2543dcec
--- /dev/null
+++ b/chapters/my/chapter2/7.mdx
@@ -0,0 +1,32 @@
+# Basic usage completed![[basic-usage-completed]]
+
+<CourseFloatingBanner
+    chapter={2}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+Great job following the course up to here! To recap, in this chapter you:
+
+-   Learned the basic building blocks of a Transformer model.
+-   Learned what makes up a tokenization pipeline.
+-   Saw how to use a Transformer model in practice.
+-   Learned how to use a tokenizer to convert text into tensors that the model can understand.
+-   Set up a tokenizer and a model together to get from text to predictions.
+-   Learned the limitations of input IDs, and learned about attention masks.
+-   Played around with versatile and configurable tokenizer methods.
+
+From now on, you should be able to freely navigate the 🤗 Transformers docs: the vocabulary will sound familiar, and you've already seen the methods that you'll use the majority of the time.
+
+## Glossary
+
+*   **Transformer Model**: A deep learning architecture that has been highly successful in Natural Language Processing (NLP). It learns the relationships between the words in a text by using an "attention mechanism".
+*   **Tokenization Pipeline**: The steps needed to convert text into a numerical representation that AI models can process (for example tokenization, conversion to input IDs, padding, truncation).
+*   **Tokenizer**: A tool or process that splits text (or other data) into tokens so that AI models can process it.
+*   **Text**: Textual data written in human language.
+*   **Tensors**: The multi-dimensional arrays used to represent data in Machine Learning frameworks (PyTorch, TensorFlow).
+*   **Predictions**: The results a Machine Learning model produces based on its input data.
+*   **Input IDs**: The unique numerical IDs that the tokenizer assigns to each token.
+*   **Attention Masks**: Binary masks that tell the model which tokens to attend to and which (padding) tokens to ignore.
+*   **Configurable Tokenizer Methods**: Tokenizer functions that can be adjusted to the user's needs.
+*   **🤗 Transformers Docs**: The official documentation of the Hugging Face Transformers library.
+*   **Vocabulary**: The full set of unique tokens that a tokenizer or model knows and can handle.
\ No newline at end of file
diff --git a/chapters/my/chapter2/8.mdx b/chapters/my/chapter2/8.mdx
new file mode 100644
index 000000000..74e01168c
--- /dev/null
+++ b/chapters/my/chapter2/8.mdx
@@ -0,0 +1,930 @@
+# Optimized Inference Deployment[[optimized-inference-deployment]]
+
+In this section, we'll explore advanced frameworks for optimizing LLM deployments: Text Generation Inference (TGI), vLLM, and llama.cpp. These applications are primarily used in production environments to serve LLMs to users. This section focuses on how to deploy these frameworks in production rather than how to use them for inference on a single machine.
+
+We'll cover how these tools maximize inference efficiency and simplify production deployments of Large Language Models.
+
+## Framework Selection Guide[[framework-selection-guide]]
+
+TGI, vLLM, and llama.cpp serve similar purposes but have distinct characteristics that make them better suited to different use cases. Let's look at the key differences between them, focusing on performance and integration.
+
+### Memory Management and Performance[[memory-management-and-performance]]
+
+**TGI** is designed to be stable and predictable in production, using fixed sequence lengths to keep memory usage consistent. TGI manages memory using Flash Attention 2 and continuous batching. This means it can process attention calculations very efficiently and keep the GPU busy by constantly feeding it work. The system can move parts of the model between CPU and GPU when needed, which helps it handle larger models.
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/flash-attn.png" alt="Flash Attention" />
+
+<Tip title="How Flash Attention Works">
+
+Flash Attention is a technique that optimizes the attention mechanism in transformer models by addressing memory bandwidth bottlenecks. As discussed earlier in [Chapter 1.8](/course/chapter1/8), the attention mechanism has quadratic complexity and memory usage, making it inefficient for long sequences.
+
+The key innovation is in how it manages memory transfers between High Bandwidth Memory (HBM) and the faster SRAM cache. Traditional attention repeatedly transfers data between HBM and SRAM, creating bottlenecks by leaving the GPU idle. Flash Attention loads data into SRAM once and performs all calculations there, minimizing expensive memory transfers.
+
+While the benefits are most significant during training, Flash Attention's reduced VRAM usage and improved efficiency make it valuable for inference as well, enabling faster and more scalable LLM serving.
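+
+To make the block-at-a-time idea more concrete, here is a minimal pure-Python sketch of the "online softmax" rescaling that lets one attention row be computed block by block without ever holding the full row of scores. It is an illustration under simplifying assumptions, not the Flash Attention kernel itself: the function names, the scalar values, and the block size are all hypothetical and chosen only for readability.
+
+```python
+import math
+
+
+def attention_row_naive(scores, values):
+    """Softmax-weighted sum computed with the full score row in memory."""
+    m = max(scores)
+    weights = [math.exp(s - m) for s in scores]
+    total = sum(weights)
+    return sum(w * v for w, v in zip(weights, values)) / total
+
+
+def attention_row_blockwise(scores, values, block_size=2):
+    """Same result, but visiting the scores one small block at a time."""
+    running_max = float("-inf")
+    running_sum = 0.0  # softmax denominator accumulated so far
+    running_out = 0.0  # unnormalized weighted sum accumulated so far
+    for start in range(0, len(scores), block_size):
+        block_s = scores[start:start + block_size]
+        block_v = values[start:start + block_size]
+        new_max = max(running_max, max(block_s))
+        # Rescale what was accumulated under the previous maximum.
+        scale = math.exp(running_max - new_max)
+        running_sum *= scale
+        running_out *= scale
+        for s, v in zip(block_s, block_v):
+            w = math.exp(s - new_max)
+            running_sum += w
+            running_out += w * v
+        running_max = new_max
+    return running_out / running_sum
+
+
+# Hypothetical attention scores and (scalar) values for one query.
+scores = [0.5, 2.0, -1.0, 3.0, 0.1]
+values = [1.0, 2.0, 3.0, 4.0, 5.0]
+naive = attention_row_naive(scores, values)
+blockwise = attention_row_blockwise(scores, values)
+```
+
+Both functions return the same number up to floating-point error; the blockwise version only ever needs one block of scores at a time, which is the property that allows the real kernel to keep its working set in fast on-chip memory.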
+
+</Tip>
+
+**vLLM** takes a different approach by using PagedAttention. Just like a computer manages its memory in pages, vLLM splits the model's KV cache memory into smaller blocks. This means it can handle requests of different sizes more flexibly and doesn't waste memory space. It's particularly good at sharing memory between different requests and reduces memory fragmentation, which makes the whole system more efficient.
+
+<Tip title="How PagedAttention Works">
+
+PagedAttention is a technique that addresses another critical bottleneck in LLM inference: KV cache memory management. As discussed in [Chapter 1.8](/course/chapter1/8), during text generation the model stores attention keys and values (the KV cache) for each generated token to reduce redundant computations. The KV cache can become enormous, especially with long sequences or multiple concurrent requests.
+
+vLLM's key innovation lies in how it manages this cache:
+
+1.  **Memory Paging**: Instead of treating the KV cache as one large block, it is divided into fixed-size "pages" (similar to virtual memory in operating systems).
+2.  **Non-contiguous Storage**: Pages don't need to be stored contiguously in GPU memory, allowing for more flexible memory allocation.
+3.  **Page Table Management**: A page table tracks which pages belong to which sequence, enabling efficient lookup and access.
+4.  **Memory Sharing**: For operations like parallel sampling, pages storing the KV cache for the prompt can be shared across multiple sequences.
+
+The PagedAttention approach can lead to up to 24x higher throughput compared to traditional methods, making it a game-changer for production LLM deployments. If you want to go really deep into how PagedAttention works, you can read [the guide from the vLLM documentation](https://docs.vllm.ai/en/latest/design/kernel/paged_attention.html).
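+
+The four ideas above can be sketched with a toy page table in plain Python. This is only a bookkeeping model under stated assumptions, not vLLM's implementation: the `PagedKVCache` class, its method names, and the page counts are hypothetical, and no real keys or values are stored.
+
+```python
+class PagedKVCache:
+    """Toy page table: maps each sequence to a list of physical page ids."""
+
+    def __init__(self, num_pages, page_size):
+        self.page_size = page_size
+        self.free_pages = list(range(num_pages))
+        self.page_table = {}  # sequence id -> list of physical page ids
+        self.lengths = {}     # sequence id -> number of tokens stored
+        self.ref_counts = {}  # physical page id -> number of sequences using it
+
+    def _allocate(self):
+        # Any free page works: pages do not need to be contiguous.
+        page = self.free_pages.pop()
+        self.ref_counts[page] = 1
+        return page
+
+    def append_token(self, seq_id):
+        """Reserve room for one more token, allocating a page only when needed."""
+        pages = self.page_table.setdefault(seq_id, [])
+        length = self.lengths.get(seq_id, 0)
+        if length % self.page_size == 0:
+            # The last page is full (or there is no page yet).
+            pages.append(self._allocate())
+        elif self.ref_counts[pages[-1]] > 1:
+            # Copy-on-write: the partially filled last page is shared,
+            # so this sequence takes a private copy before writing.
+            self.ref_counts[pages[-1]] -= 1
+            pages[-1] = self._allocate()
+        self.lengths[seq_id] = length + 1
+
+    def fork(self, parent_id, child_id):
+        """Share the parent's pages with a new sequence (e.g. parallel sampling)."""
+        self.page_table[child_id] = list(self.page_table[parent_id])
+        self.lengths[child_id] = self.lengths[parent_id]
+        for page in self.page_table[child_id]:
+            self.ref_counts[page] += 1
+
+
+cache = PagedKVCache(num_pages=8, page_size=4)
+for _ in range(8):  # an 8-token prompt fills exactly two pages
+    cache.append_token("prompt")
+cache.fork("prompt", "sample_a")
+cache.fork("prompt", "sample_b")
+cache.append_token("sample_a")  # each sample gets its own new page
+cache.append_token("sample_b")
+```
+
+In this sketch the two samples reuse the prompt's two pages and each allocates only one extra page, so three sequences occupy four pages in total instead of the eight that separate contiguous copies would need.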
+
+</Tip>
+
+**llama.cpp** is a highly optimized C/C++ implementation originally designed for running LLaMA models on consumer hardware. It focuses on CPU efficiency with optional GPU acceleration and is ideal for resource-constrained environments. llama.cpp uses quantization techniques to reduce model size and memory requirements while maintaining good performance. It implements optimized kernels for various CPU architectures and supports basic KV cache management for efficient token generation.
+
+<Tip title="How llama.cpp Quantization Works">
+
+Quantization in llama.cpp reduces the precision of model weights from 32-bit or 16-bit floating point to lower precision formats such as 8-bit integers (INT8), 4-bit, or even lower. This significantly reduces memory usage and improves inference speed with minimal quality loss.
+
+Key quantization features in llama.cpp include:
+1.  **Multiple Quantization Levels**: Supports 8-bit, 4-bit, 3-bit, and even 2-bit quantization.
+2.  **GGML/GGUF Format**: Uses custom tensor formats optimized for quantized inference.
+3.  **Mixed Precision**: Can apply different quantization levels to different parts of the model.
+4.  **Hardware-Specific Optimizations**: Includes optimized code paths for various CPU architectures (AVX2, AVX-512, NEON).
+
+This approach enables running billion-parameter models on consumer hardware with limited memory, making it well suited to local deployments and edge devices.
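+
+As a rough illustration of what "reducing precision" means, the sketch below quantizes one small block of weights to 4-bit signed integers with a single shared scale and then restores them. It is a simplified symmetric scheme written for this explanation and is not the actual GGUF block layout; the function names and the sample weights are hypothetical.
+
+```python
+def quantize_block(weights, bits=4):
+    """Symmetric quantization of one block: one float scale plus small integers."""
+    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed values
+    max_abs = max(abs(w) for w in weights)
+    scale = max_abs / qmax if max_abs > 0 else 1.0
+    # Round to the nearest integer step and clamp to the representable range.
+    quants = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
+    return scale, quants
+
+
+def dequantize_block(scale, quants):
+    """Recover approximate float weights from the stored integers."""
+    return [q * scale for q in quants]
+
+
+# Hypothetical weights for a single block.
+weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.44, 0.02]
+scale, quants = quantize_block(weights, bits=4)
+restored = dequantize_block(scale, quants)
+max_error = max(abs(w - r) for w, r in zip(weights, restored))
+```
+
+Each weight now needs only 4 bits plus a share of the block's scale, and the reconstruction error per weight is bounded by half a quantization step (`scale / 2`), which is the trade-off between memory and quality that the text describes.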
+
+</Tip>
+
+### Deployment and Integration[[deployment-and-integration]]
+
+Let's move on to the deployment and integration differences between the frameworks.
+
+**TGI** excels in enterprise-level deployment with its production-ready features. It comes with built-in Kubernetes support and includes everything you need for running in production, such as monitoring through Prometheus and Grafana, automatic scaling, and comprehensive safety features. The system also includes enterprise-grade logging and various protective measures like content filtering and rate limiting to keep your deployment secure and stable.
+
+**vLLM** takes a more flexible, developer-friendly approach to deployment. It's built with Python at its core and can easily replace OpenAI's API in your existing applications. The framework focuses on delivering raw performance and can be customized to fit your specific needs. It works particularly well with Ray for managing clusters, making it a great choice when you need high performance and adaptability.
+
+**llama.cpp** prioritizes simplicity and portability. Its server implementation is lightweight and can run on a wide range of hardware, from powerful servers to consumer laptops and even some high-end mobile devices. With minimal dependencies and a simple C/C++ core, it's easy to deploy in environments where installing Python frameworks would be challenging. The server provides an OpenAI-compatible API while maintaining a much smaller resource footprint than the other solutions.
+
+## Getting Started[[getting-started]]
+
+Let's explore how to use these frameworks for deploying LLMs, starting with installation and basic setup.
+
+### Installation and Basic Setup[[installation-and-basic-setup]]
+
+<hfoptions id="inference-frameworks" >
+
+<hfoption value="tgi" label="TGI">
+
+TGI is easy to install and use, with deep integration into the Hugging Face ecosystem.
+
+First, launch the TGI server using Docker:
+
+```sh
+docker run --gpus all \
+    --shm-size 1g \
+    -p 8080:80 \
+    -v ~/.cache/huggingface:/data \
+    ghcr.io/huggingface/text-generation-inference:latest \
+    --model-id HuggingFaceTB/SmolLM2-360M-Instruct
+```
+
+Then interact with it using Hugging Face's InferenceClient:
+
+```python
+from huggingface_hub import InferenceClient
+
+# Initialize client pointing to TGI endpoint
+client = InferenceClient(
+    model="http://localhost:8080",  # URL to the TGI server
+)
+
+# Text generation
+response = client.text_generation(
+    "Tell me a story",
+    max_new_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+    details=True,
+    stop_sequences=[],
+)
+print(response.generated_text)
+
+# For chat format
+response = client.chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a story"},
+    ],
+    max_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+```
+
+Alternatively, you can use the OpenAI client:
+
+```python
+from openai import OpenAI
+
+# Initialize client pointing to TGI endpoint
+client = OpenAI(
+    base_url="http://localhost:8080/v1",  # Make sure to include /v1
+    api_key="not-needed",  # TGI doesn't require an API key by default
+)
+
+# Chat completion
+response = client.chat.completions.create(
+    model="HuggingFaceTB/SmolLM2-360M-Instruct",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a story"},
+    ],
+    max_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+```
+
+</hfoption>
+
+<hfoption value="llama.cpp" label="llama.cpp">
+
+llama.cpp is easy to install and use, requiring minimal dependencies and supporting both CPU and GPU inference.
+
+First, install and build llama.cpp:
+
+```sh
+# Clone the repository
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+
+# Build the project
+make
+
+# Download the SmolLM2-1.7B-Instruct-GGUF model
+curl -L -O https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF/resolve/main/smollm2-1.7b-instruct.Q4_K_M.gguf
+```
+
+Then, launch the server (with OpenAI API compatibility):
+
+```sh
+# Start the server
+./server \
+    -m smollm2-1.7b-instruct.Q4_K_M.gguf \
+    --host 0.0.0.0 \
+    --port 8080 \
+    -c 4096 \
+    --n-gpu-layers 0  # Set to a higher number to use the GPU
+```
+
+Interact with the server using Hugging Face's InferenceClient:
+
+```python
+from huggingface_hub import InferenceClient
+
+# Initialize client pointing to llama.cpp server
+client = InferenceClient(
+    model="http://localhost:8080/v1",  # URL to the llama.cpp server
+    token="sk-no-key-required",  # llama.cpp server requires this placeholder
+)
+
+# Text generation
+response = client.text_generation(
+    "Tell me a story",
+    max_new_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+    details=True,
+)
+print(response.generated_text)
+
+# For chat format
+response = client.chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a story"},
+    ],
+    max_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+```
+
+Alternatively, you can use the OpenAI client:
+
+```python
+from openai import OpenAI
+
+# Initialize client pointing to llama.cpp server
+client = OpenAI(
+    base_url="http://localhost:8080/v1",
+    api_key="sk-no-key-required",  # llama.cpp server requires this placeholder
+)
+
+# Chat completion
+response = client.chat.completions.create(
+    model="smollm2-1.7b-instruct",  # Model identifier can be anything, as the server only loads one model
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a story"},
+    ],
+    max_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+```
+
+</hfoption>
+
+<hfoption value="vllm" label="vLLM">
+
+vLLM is easy to install and use, with both OpenAI API compatibility and a native Python interface.
+
+First, launch the vLLM OpenAI-compatible server:
+
+```sh
+python -m vllm.entrypoints.openai.api_server \
+    --model HuggingFaceTB/SmolLM2-360M-Instruct \
+    --host 0.0.0.0 \
+    --port 8000
+```
+
+Then interact with it using Hugging Face's InferenceClient:
+
+```python
+from huggingface_hub import InferenceClient
+
+# Initialize client pointing to vLLM endpoint
+client = InferenceClient(
+    model="http://localhost:8000/v1",  # URL to the vLLM server
+)
+
+# Text generation
+response = client.text_generation(
+    "Tell me a story",
+    max_new_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+    details=True,
+)
+print(response.generated_text)
+
+# For chat format
+response = client.chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a story"},
+    ],
+    max_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+```
+
+Alternatively, you can use the OpenAI client:
+
+```python
+from openai import OpenAI
+
+# Initialize client pointing to vLLM endpoint
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="not-needed",  # vLLM doesn't require an API key by default
+)
+
+# Chat completion
+response = client.chat.completions.create(
+    model="HuggingFaceTB/SmolLM2-360M-Instruct",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a story"},
+    ],
+    max_tokens=100,
+    temperature=0.7,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+```
+
+</hfoption>
+
+</hfoptions>
+
+### Basic Text Generation
+
+Let's look at examples of text generation with the frameworks:
+
+<hfoptions id="inference-frameworks" >
+
+<hfoption value="tgi" label="TGI">
+
+First, deploy TGI with advanced parameters:
+```sh
+docker run --gpus all \
+    --shm-size 1g \
+    -p 8080:80 \
+    -v ~/.cache/huggingface:/data \
+    ghcr.io/huggingface/text-generation-inference:latest \
+    --model-id HuggingFaceTB/SmolLM2-360M-Instruct \
+    --max-total-tokens 4096 \
+    --max-input-length 3072 \
+    --max-batch-total-tokens 8192 \
+    --waiting-served-ratio 1.2
+```
+
+Use the InferenceClient for flexible text generation:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="http://localhost:8080")
+
+# Advanced parameters example
+response = client.chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a creative storyteller."},
+        {"role": "user", "content": "Write a creative story"},
+    ],
+    temperature=0.8,
+    max_tokens=200,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+
+# Raw text generation
+response = client.text_generation(
+    "Write a creative story about space exploration",
+    max_new_tokens=200,
+    temperature=0.8,
+    top_p=0.95,
+    repetition_penalty=1.1,
+    do_sample=True,
+    details=True,
+)
+print(response.generated_text)
+```
+
+Or use the OpenAI client:
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
+
+# Advanced parameters example
+response = client.chat.completions.create(
+    model="HuggingFaceTB/SmolLM2-360M-Instruct",
+    messages=[
+        {"role": "system", "content": "You are a creative storyteller."},
+        {"role": "user", "content": "Write a creative story"},
+    ],
+    temperature=0.8,  # Higher for more creativity
+)
+print(response.choices[0].message.content)
+```
+
+</hfoption>
+
+<hfoption value="llama.cpp" label="llama.cpp">
+
+For llama.cpp, you can set advanced parameters when launching the server:
+
+```sh
+# -c: context size
+# --threads: CPU threads to use
+# --batch-size: batch size for prompt evaluation
+# --n-gpu-layers: GPU layers (0 = CPU only)
+./server \
+    -m smollm2-1.7b-instruct.Q4_K_M.gguf \
+    --host 0.0.0.0 \
+    --port 8080 \
+    -c 4096 \
+    --threads 8 \
+    --batch-size 512 \
+    --n-gpu-layers 0
+```
+
+Use the InferenceClient:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="http://localhost:8080/v1", token="sk-no-key-required")
+
+# Advanced parameters example
+response = client.chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a creative storyteller."},
+        {"role": "user", "content": "Write a creative story"},
+    ],
+    temperature=0.8,
+    max_tokens=200,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+
+# For direct text generation
+response = client.text_generation(
+    "Write a creative story about space exploration",
+    max_new_tokens=200,
+    temperature=0.8,
+    top_p=0.95,
+    repetition_penalty=1.1,
+    details=True,
+)
+print(response.generated_text)
+```
+
+Or use the OpenAI client for generation with control over the sampling parameters:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")
+
+# Advanced parameters example
+response = client.chat.completions.create(
+    model="smollm2-1.7b-instruct",
+    messages=[
+        {"role": "system", "content": "You are a creative storyteller."},
+        {"role": "user", "content": "Write a creative story"},
+    ],
+    temperature=0.8,  # Higher for more creativity
+    top_p=0.95,  # Nucleus sampling probability
+    frequency_penalty=0.5,  # Reduce repetition of frequent tokens
+    presence_penalty=0.5,  # Reduce repetition by penalizing tokens already present
+    max_tokens=200,  # Maximum generation length
+)
+print(response.choices[0].message.content)
+```
+
+You can also use llama.cpp's native library for even more control:
+
+```python
+# Using the llama-cpp-python package for direct model access
+from llama_cpp import Llama
+
+# Load the model
+llm = Llama(
+    model_path="smollm2-1.7b-instruct.Q4_K_M.gguf",
+    n_ctx=4096,  # Context window size
+    n_threads=8,  # CPU threads
+    n_gpu_layers=0,  # GPU layers (0 = CPU only)
+)
+
+# Format the prompt according to the model's expected format
+prompt = """<|im_start|>system
+You are a creative storyteller.
+<|im_end|>
+<|im_start|>user
+Write a creative story
+<|im_end|>
+<|im_start|>assistant
+"""
+
+# Generate a response with precise parameter control
+output = llm(
+    prompt,
+    max_tokens=200,
+    temperature=0.8,
+    top_p=0.95,
+    frequency_penalty=0.5,
+    presence_penalty=0.5,
+    stop=["<|im_end|>"],
+)
+
+print(output["choices"][0]["text"])
+```
+
+</hfoption>
+
+<hfoption value="vllm" label="vLLM">
+
+For advanced usage with vLLM, you can use the InferenceClient:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(model="http://localhost:8000/v1")
+
+# Advanced parameters example
+response = client.chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a creative storyteller."},
+        {"role": "user", "content": "Write a creative story"},
+    ],
+    temperature=0.8,
+    max_tokens=200,
+    top_p=0.95,
+)
+print(response.choices[0].message.content)
+
+# For direct text generation
+response = client.text_generation(
+    "Write a creative story about space exploration",
+    max_new_tokens=200,
+    temperature=0.8,
+    top_p=0.95,
+    details=True,
+)
+print(response.generated_text)
+```
+
+You can also use the OpenAI client:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
+
+# Advanced parameters example
+response = client.chat.completions.create(
+    model="HuggingFaceTB/SmolLM2-360M-Instruct",
+    messages=[
+        {"role": "system", "content": "You are a creative storyteller."},
+        {"role": "user", "content": "Write a creative story"},
+    ],
+    temperature=0.8,
+    top_p=0.95,
+    max_tokens=200,
+)
+print(response.choices[0].message.content)
+```
+
+vLLM also provides a native Python interface with fine-grained control:
+
+```python
+from vllm import LLM, SamplingParams
+
+# Initialize the model with advanced parameters
+llm = LLM(
+    model="HuggingFaceTB/SmolLM2-360M-Instruct",
+    gpu_memory_utilization=0.85,
+    max_num_batched_tokens=8192,
+    max_num_seqs=256,
+    block_size=16,
+)
+
+# Configure sampling parameters
+sampling_params = SamplingParams(
+    temperature=0.8,  # Higher for more creativity
+    top_p=0.95,  # Sample from the tokens covering the top 95% of probability mass
+    max_tokens=100,  # Maximum length
+    presence_penalty=1.1,  # Reduce repetition
+    frequency_penalty=1.1,  # Reduce repetition
+    stop=["\n\n", "###"],  # Stop sequences
+)
+
+# Generate text
+prompt = "Write a creative story"
+outputs = llm.generate(prompt, sampling_params)
+print(outputs[0].outputs[0].text)
+
+# For chat-style interactions
+chat_prompt = [
+    {"role": "system", "content": "You are a creative storyteller."},
+    {"role": "user", "content": "Write a creative story"},
+]
+# Apply the model's chat template through the tokenizer
+tokenizer = llm.get_tokenizer()
+formatted_prompt = tokenizer.apply_chat_template(
+    chat_prompt, tokenize=False, add_generation_prompt=True
+)
+outputs = llm.generate(formatted_prompt, sampling_params)
+print(outputs[0].outputs[0].text)
+```
+
+</hfoption>
+
+</hfoptions>
+
+## Advanced Generation Control
+
+### Token Selection နှင့် Sampling[[token-selection-and-sampling]]
+
+text ကို generate လုပ်တဲ့ လုပ်ငန်းစဉ်မှာ အဆင့်တိုင်းမှာ နောက်ထပ် token ကို ရွေးချယ်တာ ပါဝင်ပါတယ်။ ဒီရွေးချယ်မှု လုပ်ငန်းစဉ်ကို parameters အမျိုးမျိုးကနေတစ်ဆင့် ထိန်းချုပ်နိုင်ပါတယ် -
+
+1.  **Raw Logits**: token တစ်ခုစီအတွက် model က ထုတ်ပေးတဲ့ normalize မလုပ်ရသေးသော မူရင်း scores များ (softmax မသုံးမီအထိ probabilities မဟုတ်သေးပါ)။
+2.  **Temperature**: ရွေးချယ်မှုမှာရှိတဲ့ ကျပန်းဆန်မှုကို ထိန်းချုပ်ပါတယ် (ပိုမြင့်ရင် ပိုမိုဖန်တီးမှုရှိပါတယ်)။
+3.  **Top-p (Nucleus) Sampling**: ဖြစ်နိုင်ခြေပမာဏ X% ကို ဖွဲ့စည်းထားတဲ့ ထိပ်ဆုံး tokens တွေကို စစ်ထုတ်ပါတယ်။
+4.  **Top-k Filtering**: ဖြစ်နိုင်ခြေအများဆုံး tokens k ခုအထိ ရွေးချယ်မှုကို ကန့်သတ်ပါတယ်။
+
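+အထက်ပါ အဆင့်တွေ ဘယ်လို ပေါင်းစပ်အလုပ်လုပ်သလဲဆိုတာကို အောက်ပါ minimal sketch က ပြသပေးပါတယ်။ ဒါဟာ TGI, llama.cpp ဒါမှမဟုတ် vLLM ရဲ့ တကယ့် implementation မဟုတ်ဘဲ NumPy ကိုသာ အသုံးပြုထားတဲ့ ရိုးရှင်းသော ဥပမာတစ်ခုသာ ဖြစ်ပါတယ်။ `filter_logits` ဆိုတဲ့ function နာမည်နဲ့ `toy_logits` တန်ဖိုးတွေက ရှင်းပြဖို့အတွက် ယူဆထားတဲ့ အရာတွေဖြစ်ပြီး၊ frameworks တွေမှာ filters တွေကို အသုံးပြုတဲ့ အစီအစဉ်က ကွဲပြားနိုင်ပါတယ်။
+
+```python
+import numpy as np
+
+
+def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
+    """Return sampling probabilities after temperature, top-k and top-p."""
+    # Temperature rescales the raw logits before softmax.
+    scaled = np.asarray(logits, dtype=np.float64) / temperature
+
+    # Top-k: keep only the k highest-scoring tokens (0 disables it).
+    if top_k > 0:
+        kth_best = np.sort(scaled)[-top_k]
+        scaled = np.where(scaled < kth_best, -np.inf, scaled)
+
+    # Softmax turns the remaining logits into probabilities.
+    probs = np.exp(scaled - scaled.max())
+    probs /= probs.sum()
+
+    # Top-p: keep the smallest set of tokens whose cumulative
+    # probability reaches top_p, then renormalize.
+    order = np.argsort(-probs)
+    cumulative = np.cumsum(probs[order])
+    cutoff = np.searchsorted(cumulative, top_p) + 1
+    keep = order[:cutoff]
+    filtered = np.zeros_like(probs)
+    filtered[keep] = probs[keep]
+    return filtered / filtered.sum()
+
+
+toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for 4 tokens
+probs = filter_logits(toy_logits, temperature=0.8, top_k=3, top_p=0.95)
+rng = np.random.default_rng(0)
+next_token = rng.choice(len(probs), p=probs)  # sample the next token id
+print(probs, next_token)
+```
+
+temperature ကို မြှင့်လိုက်ရင် distribution က ပိုပြန့်လာပြီး၊ `top_k` နဲ့ `top_p` ကို လျှော့လိုက်ရင် ရွေးချယ်စရာ tokens အရေအတွက် နည်းသွားပါတယ်။
+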
+ဒီ parameters တွေကို ဘယ်လို configure လုပ်ရမလဲဆိုတာကတော့...
+
+<hfoptions id="inference-frameworks" >
+
+<hfoption value="tgi" label="TGI">
+
+```python
+client.generate(
+    "Write a creative story",
+    temperature=0.8,  # ပိုမိုဖန်တီးမှုရှိရန် ပိုမိုမြင့်မားသော တန်ဖိုး
+    top_p=0.95,  # စုစုပေါင်း ဖြစ်နိုင်ခြေ 95% အထိ ဖွဲ့စည်းထားသော ထိပ်ဆုံး tokens များကိုသာ ထည့်သွင်းစဉ်းစားပါ။
+    top_k=50,  # ထိပ်ဆုံး 50 tokens များကို ထည့်သွင်းစဉ်းစားပါ။
+    max_new_tokens=100,  # အများဆုံး အရှည်
+    repetition_penalty=1.1,  # ထပ်ခါတလဲလဲ မဖြစ်အောင် လျှော့ချပါ။
+)
+```
+
+</hfoption>
+
+<hfoption value="llama.cpp" label="llama.cpp">
+
+```python
+# OpenAI API compatibility မှတစ်ဆင့်
+response = client.completions.create(
+    model="smollm2-1.7b-instruct",  # Model name (llama.cpp server အတွက် မည်သည့် string မဆို ဖြစ်နိုင်သည်)
+    prompt="Write a creative story",
+    temperature=0.8,  # ပိုမိုဖန်တီးမှုရှိရန် ပိုမိုမြင့်မားသော တန်ဖိုး
+    top_p=0.95,  # စုစုပေါင်း ဖြစ်နိုင်ခြေ 95% အထိ ဖွဲ့စည်းထားသော ထိပ်ဆုံး tokens များကိုသာ ထည့်သွင်းစဉ်းစားပါ။
+    frequency_penalty=1.1,  # မကြာခဏ ပေါ်လာသော tokens များကို ထပ်ခါတလဲလဲ မဖြစ်အောင် လျှော့ချပါ။
+    presence_penalty=0.1,  # ရှိပြီးသား tokens များကို ထပ်ခါတလဲလဲ မဖြစ်အောင် လျှော့ချပါ။
+    max_tokens=100,  # အများဆုံး အရှည်
+)
+
+# llama-cpp-python တိုက်ရိုက် access မှတစ်ဆင့်
+output = llm(
+    "Write a creative story",
+    temperature=0.8,
+    top_p=0.95,
+    top_k=50,
+    max_tokens=100,
+    repeat_penalty=1.1,
+)
+```
+
+</hfoption>
+
+<hfoption value="vllm" label="vLLM">
+
+```python
+params = SamplingParams(
+    temperature=0.8,  # ပိုမိုဖန်တီးမှုရှိရန် ပိုမိုမြင့်မားသော တန်ဖိုး
+    top_p=0.95,  # စုစုပေါင်း ဖြစ်နိုင်ခြေ 95% အထိ ဖွဲ့စည်းထားသော ထိပ်ဆုံး tokens များကိုသာ ထည့်သွင်းစဉ်းစားပါ။
+    top_k=50,  # ထိပ်ဆုံး 50 tokens များကို ထည့်သွင်းစဉ်းစားပါ။
+    max_tokens=100,  # အများဆုံး အရှည်
+    presence_penalty=0.1,  # ထပ်ခါတလဲလဲ မဖြစ်အောင် လျှော့ချပါ။
+)
+llm.generate("Write a creative story", sampling_params=params)
+```
+
+</hfoption>
+
+</hfoptions>
+
+### ထပ်ခါတလဲလဲ ဖြစ်မှုကို ထိန်းချုပ်ခြင်း[[controlling-repetition]]
+
+frameworks သုံးခုလုံးက ထပ်ခါတလဲလဲ text generation ကို ကာကွယ်ဖို့ နည်းလမ်းတွေ ပံ့ပိုးပေးပါတယ်။
+
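+ဒီ penalties တွေက logits တွေကို ဘယ်လို ပြောင်းလဲသလဲဆိုတာကို အောက်ပါ sketch နဲ့ ခန့်မှန်းကြည့်နိုင်ပါတယ်။ multiplicative `repetition_penalty` နဲ့ additive `frequency_penalty` / `presence_penalty` တို့ရဲ့ အသုံးများတဲ့ ပုံစံကိုသာ ပြထားတာဖြစ်ပြီး၊ framework တစ်ခုချင်းစီရဲ့ အတိအကျ implementation (ဥပမာ prompt tokens တွေကိုပါ ထည့်တွက်ခြင်း ရှိမရှိ) က ကွဲပြားနိုင်ပါတယ်။ `apply_penalties` function နဲ့ ဥပမာ တန်ဖိုးတွေက ယူဆထားချက်တွေ ဖြစ်ပါတယ်။
+
+```python
+import numpy as np
+
+
+def apply_penalties(logits, generated_ids, repetition_penalty=1.0,
+                    frequency_penalty=0.0, presence_penalty=0.0):
+    """Adjust next-token logits based on the tokens generated so far."""
+    logits = np.asarray(logits, dtype=np.float64).copy()
+    ids = np.asarray(generated_ids, dtype=np.int64)
+    counts = np.bincount(ids, minlength=len(logits))
+    seen = counts > 0
+
+    # Multiplicative penalty: shrink positive logits, push negative ones lower.
+    logits[seen] = np.where(
+        logits[seen] > 0,
+        logits[seen] / repetition_penalty,
+        logits[seen] * repetition_penalty,
+    )
+    # Additive penalties: frequency grows with the count, presence is flat.
+    logits -= frequency_penalty * counts
+    logits -= presence_penalty * seen
+    return logits
+
+
+toy_logits = [3.0, 2.0, -1.0, 0.5]  # hypothetical scores for 4 tokens
+history = [0, 0, 2]  # token 0 was generated twice, token 2 once
+print(apply_penalties(toy_logits, history, repetition_penalty=1.1))
+print(apply_penalties(toy_logits, history, frequency_penalty=0.5, presence_penalty=0.5))
+```
+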
+<hfoptions id="inference-frameworks" >
+
+<hfoption value="tgi" label="TGI">
+
+```python
+client.generate(
+    "Write a varied text",
+    repetition_penalty=1.1,  # ထပ်ခါတလဲလဲ ဖြစ်သော tokens များကို ဒဏ်ခတ်ပါ။
+    frequency_penalty=0.1,  # မကြာခဏ ပေါ်လာသော tokens များကို ဒဏ်ခတ်ပါ။
+)
+```
+
+</hfoption>
+
+<hfoption value="llama.cpp" label="llama.cpp">
+
+```python
+# OpenAI API မှတစ်ဆင့်
+response = client.completions.create(
+    model="smollm2-1.7b-instruct",
+    prompt="Write a varied text",
+    frequency_penalty=1.1,  # မကြာခဏ ပေါ်လာသော tokens များကို ဒဏ်ခတ်ပါ။
+    presence_penalty=0.8,  # ရှိပြီးသား tokens များကို ဒဏ်ခတ်ပါ။
+)
+
+# တိုက်ရိုက် library မှတစ်ဆင့်
+output = llm(
+    "Write a varied text",
+    repeat_penalty=1.1,  # Penalize repeated tokens
+    frequency_penalty=0.5,  # အပို frequency penalty
+    presence_penalty=0.5,  # အပို presence penalty
+)
+```
+
+</hfoption>
+
+<hfoption value="vllm" label="vLLM">
+
+```python
+params = SamplingParams(
+    presence_penalty=0.1,  # Token ရှိခြင်းအတွက် ဒဏ်ခတ်ပါ။
+    frequency_penalty=0.1,  # Token မကြာခဏ ပေါ်လာခြင်းအတွက် ဒဏ်ခတ်ပါ။
+)
+```
+
+</hfoption>
+
+</hfoptions>
+
+### အရှည် ထိန်းချုပ်ခြင်းနှင့် ရပ်တန့်ခြင်း Sequences များ[[length-control-and-stop-sequences]]
+
+generation length ကို ထိန်းချုပ်နိုင်ပြီး ဘယ်အချိန်မှာ ရပ်တန့်ရမယ်ဆိုတာ သတ်မှတ်နိုင်ပါတယ်။
+
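+Stop sequences တွေရဲ့ အခြေခံ အယူအဆကို အောက်ပါ sketch က ပြသပေးပါတယ်။ တကယ့် frameworks တွေက generation လုပ်နေစဉ်မှာပဲ စစ်ဆေးပြီး ရပ်တန့်တာဖြစ်ပေမယ့်၊ ဒီဥပမာကတော့ ထွက်ပြီးသား text ကို ဖြတ်တောက်တာသာ ဖြစ်ပါတယ်။ `truncate_at_stop` ဆိုတာ ရှင်းပြဖို့အတွက် ယူဆထားတဲ့ function နာမည်ဖြစ်ပြီး၊ stop sequence ကိုယ်တိုင်ကို output ထဲ ထည့်၊ မထည့်ကလည်း framework အလိုက် ကွဲပြားနိုင်ပါတယ်။
+
+```python
+def truncate_at_stop(text, stop_sequences):
+    """Cut text at the earliest occurrence of any stop sequence."""
+    cut = len(text)
+    for stop in stop_sequences:
+        index = text.find(stop)
+        if index != -1:
+            cut = min(cut, index)
+    return text[:cut]
+
+
+# Hypothetical model output containing both stop sequences
+sample = "First paragraph.\n\nSecond paragraph. ### notes"
+print(repr(truncate_at_stop(sample, ["\n\n", "###"])))
+```
+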
+<hfoptions id="inference-frameworks" >
+
+<hfoption value="tgi" label="TGI">
+
+```python
+client.generate(
+    "Generate a short paragraph",
+    max_new_tokens=100,
+    stop_sequences=["\n\n", "###"],
+)
+```
+
+</hfoption>
+
+<hfoption value="llama.cpp" label="llama.cpp">
+
+```python
+# OpenAI API မှတစ်ဆင့်
+response = client.completions.create(
+    model="smollm2-1.7b-instruct",
+    prompt="Generate a short paragraph",
+    max_tokens=100,
+    stop=["\n\n", "###"],
+)
+
+# တိုက်ရိုက် library မှတစ်ဆင့်
+output = llm("Generate a short paragraph", max_tokens=100, stop=["\n\n", "###"])
+```
+
+</hfoption>
+
+<hfoption value="vllm" label="vLLM">
+
+```python
+params = SamplingParams(
+    max_tokens=100,
+    min_tokens=10,
+    stop=["###", "\n\n"],
+    ignore_eos=False,
+    skip_special_tokens=True,
+)
+```
+
+</hfoption>
+
+</hfoptions>
+
+## Memory Management[[memory-management]]
+
+frameworks သုံးခုလုံးက ထိရောက်တဲ့ inference အတွက် အဆင့်မြင့် memory management နည်းစနစ်တွေကို implement လုပ်ထားပါတယ်။
+
+<hfoptions id="inference-frameworks" >
+
+<hfoption value="tgi" label="TGI">
+
+TGI က Flash Attention 2 နဲ့ continuous batching ကို အသုံးပြုပါတယ်။
+
+```sh
+# Memory optimization ပါဝင်တဲ့ Docker deployment
+docker run --gpus all -p 8080:80 \
+    --shm-size 1g \
+    ghcr.io/huggingface/text-generation-inference:latest \
+    --model-id HuggingFaceTB/SmolLM2-1.7B-Instruct \
+    --max-batch-total-tokens 8192 \
+    --max-input-length 4096
+```
+
+</hfoption>
+
+<hfoption value="llama.cpp" label="llama.cpp">
+
+llama.cpp က quantization နဲ့ optimized memory layout ကို အသုံးပြုပါတယ်-
+
+```sh
+# Memory optimizations ပါဝင်တဲ့ Server
+#   -c: context size၊ --threads: CPU threads၊ --n-gpu-layers: GPU ပေါ်မှာ ထားမယ့် layers (ပိုကြီးတဲ့ models တွေအတွက် များများ အသုံးပြုပါ)
+#   --mlock: swapping မဖြစ်စေရန် memory ကို lock လုပ်ပါ၊ --cont-batching: continuous batching ကို ဖွင့်ပါ
+#   (line continuation `\` နောက်မှာ comment ရေးရင် command ပြတ်သွားတဲ့အတွက် comments တွေကို အပေါ်မှာ ထားပါတယ်)
+./server \
+    -m smollm2-1.7b-instruct.Q4_K_M.gguf \
+    --host 0.0.0.0 \
+    --port 8080 \
+    -c 2048 \
+    --threads 4 \
+    --n-gpu-layers 32 \
+    --mlock \
+    --cont-batching
+```
+
+သင့် GPU အတွက် အရမ်းကြီးတဲ့ models တွေအတွက် CPU offloading ကို အသုံးပြုနိုင်ပါတယ်။
+
+```sh
+# --n-gpu-layers 20: ပထမဆုံး 20 layers ကို GPU မှာ ထားပါ
+# --threads 8: CPU layers တွေအတွက် CPU threads များများ အသုံးပြုပါ
+./server \
+    -m smollm2-1.7b-instruct.Q4_K_M.gguf \
+    --n-gpu-layers 20 \
+    --threads 8
+```
+
+</hfoption>
+
+<hfoption value="vllm" label="vLLM">
+
+vLLM က အကောင်းဆုံး memory management အတွက် PagedAttention ကို အသုံးပြုပါတယ်။
+
+```python
+from vllm import LLM
+
+# Engine arguments are passed directly to the LLM constructor
+llm = LLM(
+    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
+    gpu_memory_utilization=0.85,
+    max_num_batched_tokens=8192,
+    block_size=16,
+)
+```
+
+</hfoption>
+
+</hfoptions>
+
+## အရင်းအမြစ်များ[[resources]]
+
+-   [Text Generation Inference Documentation](https://huggingface.co/docs/text-generation-inference)
+-   [TGI GitHub Repository](https://github.com/huggingface/text-generation-inference)
+-   [vLLM Documentation](https://docs.vllm.ai/)
+-   [vLLM GitHub Repository](https://github.com/vllm-project/vllm)
+-   [PagedAttention Paper](https://arxiv.org/abs/2309.06180)
+-   [llama.cpp GitHub Repository](https://github.com/ggerganov/llama.cpp)
+-   [llama-cpp-python Repository](https://github.com/abetlen/llama-cpp-python)
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Optimized Inference Deployment**: AI မော်ဒယ်များကို အသုံးပြုသူများထံသို့ ထိရောက်စွာနှင့် လျင်မြန်စွာ ဝန်ဆောင်မှုပေးနိုင်ရန် အကောင်းဆုံးဖြစ်အောင် ပြုလုပ်ထားသော လုပ်ငန်းစဉ်။
+*   **LLM (Large Language Model)**: လူသားဘာသာစကားကို နားလည်ပြီး ထုတ်လုပ်ပေးနိုင်တဲ့ အလွန်ကြီးမားတဲ့ Artificial Intelligence (AI) မော်ဒယ်တွေ ဖြစ်ပါတယ်။
+*   **Text Generation Inference (TGI)**: Hugging Face မှ LLM များအတွက် မြန်နှုန်းမြင့် text generation ကို အထူးပြုထားသော framework တစ်ခု။
+*   **vLLM**: မြန်နှုန်းမြင့် LLM inference အတွက် ဒီဇိုင်းထုတ်ထားသော library တစ်ခုဖြစ်ပြီး PagedAttention ကို အသုံးပြုသည်။
+*   **llama.cpp**: LLaMA models များကို consumer hardware ပေါ်တွင် run နိုင်ရန် အဓိကထားသော C/C++ implementation တစ်ခု။
+*   **Production Environments**: ဆော့ဖ်ဝဲလ် သို့မဟုတ် မော်ဒယ်များကို အစစ်အမှန် အသုံးပြုသူများထံသို့ ဝန်ဆောင်မှုပေးသည့် ပတ်ဝန်းကျင်။
+*   **Inference Efficiency**: AI မော်ဒယ်တစ်ခုက input data မှ output ကို ထုတ်လုပ်ရာတွင် အချိန်နှင့် အရင်းအမြစ်များကို မည်မျှ ထိရောက်စွာ အသုံးပြုနိုင်မှု။
+*   **Framework Selection Guide**: မတူညီသော အသုံးပြုမှုပုံစံများအတွက် သင့်လျော်သော framework ကို ရွေးချယ်ရန် လမ်းညွှန်။
+*   **Memory Management**: ကွန်ပျူတာ၏ memory ကို ထိထိရောက်ရောက် စီမံခန့်ခွဲခြင်း။
+*   **Performance**: စနစ်တစ်ခု၏ အလုပ်လုပ်နိုင်စွမ်း သို့မဟုတ် အရှိန်အဟုန်။
+*   **Flash Attention 2**: Transformer models များတွင် attention mechanism ကို memory bandwidth bottlenecks များကို ဖြေရှင်းပေးခြင်းဖြင့် optimization လုပ်သော နည်းပညာ။
+*   **Continuous Batching**: GPU ကို အလုပ်များနေအောင် ထိန်းထားနိုင်ရန် requests များကို အဆက်မပြတ် batch လုပ်ပြီး ပေးပို့သော နည်းလမ်း။
+*   **GPU (Graphics Processing Unit)**: AI/ML လုပ်ငန်းများတွင် အရှိန်မြှင့်ရန် အသုံးပြုသော processor။
+*   **CPU (Central Processing Unit)**: ကွန်ပျူတာ၏ အဓိက processor။
+*   **VRAM (Video RAM)**: GPU တွင် အသုံးပြုသော RAM အမျိုးအစား။
+*   **PagedAttention**: LLM inference တွင် KV cache memory management ကို optimization လုပ်သော နည်းပညာ။
+*   **KV Cache**: Text generation လုပ်နေစဉ်အတွင်း Transformer model က သိမ်းဆည်းထားသော attention keys နှင့် values များ။
+*   **Memory Paging**: Memory ကို fixed-size "pages" များအဖြစ် ပိုင်းခြားခြင်း။
+*   **Memory Fragmentation**: Memory ကို ထိရောက်စွာ အသုံးမပြုနိုင်ဘဲ အပိုင်းအစများအဖြစ် ပြန့်ကျဲနေခြင်း။
+*   **Throughput**: အချိန်ယူနစ်တစ်ခုအတွင်း စနစ်တစ်ခုက လုပ်ဆောင်နိုင်သော လုပ်ငန်းပမာဏ။
+*   **Quantization**: Model weights တွေရဲ့ precision ကို လျှော့ချခြင်းဖြင့် model size နဲ့ memory requirements တွေကို လျှော့ချသော နည်းလမ်း။
+*   **INT8 (8-bit integers)**: 8-bit integers ဖြင့် ကိုယ်စားပြုသော ကိန်းဂဏန်းများ။
+*   **GGML/GGUF Format**: llama.cpp မှ quantized inference အတွက် အကောင်းဆုံးဖြစ်အောင် ပြုလုပ်ထားသော custom tensor formats များ။
+*   **Mixed Precision**: Model ၏ မတူညီသော အစိတ်အပိုင်းများတွင် မတူညီသော quantization levels များကို အသုံးပြုခြင်း။
+*   **CPU Architectures**: CPU အမျိုးအစားများ (ဥပမာ - AVX2, AVX-512, NEON)။
+*   **Local Deployments**: မော်ဒယ်များကို သုံးစွဲသူ၏ ကွန်ပျူတာ သို့မဟုတ် local server ပေါ်တွင် တပ်ဆင်အသုံးပြုခြင်း။
+*   **Edge Devices**: ကွန်ပျူတာကွန်ရက်၏ အစွန်းပိုင်း (ဥပမာ - mobile devices, IoT devices) တွင် အလုပ်လုပ်သော devices များ။
+*   **Enterprise-level Deployment**: လုပ်ငန်းကြီးများအတွက် ဒီဇိုင်းထုတ်ထားသော deployment ပုံစံ။
+*   **Kubernetes Support**: Containerized application များကို automate လုပ်ပြီး deploy, scale လုပ်ရန်အတွက် Kubernetes platform ကို ထောက်ပံ့ခြင်း။
+*   **Prometheus**: Monitoring system တစ်ခု။
+*   **Grafana**: Data visualization tool တစ်ခု။
+*   **Automatic Scaling**: requests များ၏ ပမာဏအပေါ် မူတည်ပြီး resources များကို အလိုအလျောက် ချိန်ညှိခြင်း။
+*   **Content Filtering**: မသင့်လျော်သော သို့မဟုတ် အန္တရာယ်ရှိသော အကြောင်းအရာများကို စစ်ထုတ်ခြင်း။
+*   **Rate Limiting**: အချိန်အတိုင်းအတာတစ်ခုအတွင်း requests အရေအတွက်ကို ကန့်သတ်ခြင်း။
+*   **Developer-friendly Approach**: Developers များအတွက် အသုံးပြုရလွယ်ကူသော ချဉ်းကပ်မှု။
+*   **OpenAI API Compatibility**: OpenAI ၏ API နှင့် တွဲဖက်အသုံးပြုနိုင်ခြင်း။
+*   **Ray**: Distributed computing အတွက် Python framework တစ်ခု။
+*   **Portability**: ဆော့ဖ်ဝဲလ်တစ်ခုကို မတူညီသော platform များ သို့မဟုတ် environments များသို့ အလွယ်တကူ ရွှေ့ပြောင်းအသုံးပြုနိုင်ခြင်း။
+*   **`docker run --gpus all`**: Docker container ကို GPU အားလုံးကို အသုံးပြုပြီး run ရန် command။
+*   **`--shm-size 1g`**: Shared memory size ကို 1GB အဖြစ် သတ်မှတ်ခြင်း။
+*   **`InferenceClient`**: Hugging Face Hub မှ inference endpoint များနှင့် အပြန်အလှန်ဆက်သွယ်ရန် Python client။
+*   **`openai`**: OpenAI API ကို အသုံးပြုရန်အတွက် Python client library။
+*   **`git clone`**: Git repository ကို download လုပ်ရန် command။
+*   **`make`**: Source code ကို executable file အဖြစ် build လုပ်ရန် command။
+*   **`curl -L -O`**: URL မှ file တစ်ခုကို download လုပ်ရန် command။
+*   **`--host`, `--port`**: Server ကို listen လုပ်မည့် host address နှင့် port နံပါတ်။
+*   **`--n-gpu-layers`**: GPU တွင် ထားရှိမည့် model layers အရေအတွက်။
+*   **Context Size (`-c`)**: Model က တစ်ကြိမ်တည်း လုပ်ဆောင်နိုင်သော tokens အရေအတွက် အများဆုံး။
+*   **CPU Threads (`--threads`)**: CPU တွင် အသုံးပြုမည့် threads အရေအတွက်။
+*   **Batch Size (`--batch-size`)**: Prompt evaluation အတွက် batch အရွယ်အစား။
+*   **`llama_cpp`**: llama.cpp C++ library အတွက် Python bindings။
+*   **`Llama` Class**: llama-cpp-python library မှ LLaMA model ကို load လုပ်ရန် class။
+*   **`n_ctx`**: Model ၏ context window size။
+*   **`n_threads`**: CPU threads အရေအတွက်။
+*   **`n_gpu_layers`**: GPU ပေါ်တွင် ထားရှိမည့် layers အရေအတွက်။
+*   **`SamplingParams`**: vLLM တွင် text generation အတွက် sampling parameters များကို သတ်မှတ်ရန် class။
+*   **Temperature**: generated text ၏ randomness သို့မဟုတ် creativity ကို ထိန်းချုပ်သော parameter။
+*   **Top-p (Nucleus) Sampling**: စုစုပေါင်း ဖြစ်နိုင်ခြေသည် သတ်မှတ်ထားသော တန်ဖိုး (ဥပမာ - 0.95) သို့ ရောက်သည်အထိ ဖြစ်နိုင်ခြေအများဆုံး tokens များကို ရွေးချယ်ပြီး ကျန်များကို ဖယ်ထုတ်သည်။
+*   **Top-k Filtering**: ဖြစ်နိုင်ခြေအများဆုံး tokens `k` ခုကိုသာ ရွေးချယ်ပြီး ကျန်များကို လျစ်လျူရှုသည်။
+*   **`max_new_tokens` / `max_tokens`**: Generate လုပ်မည့် tokens အရေအတွက် အများဆုံး။
+*   **`repetition_penalty`**: ထပ်ခါတလဲလဲ ဖြစ်သော tokens များကို ဒဏ်ခတ်ရန် parameter။
+*   **`do_sample`**: True ဖြစ်ပါက sampling ကို အသုံးပြုပြီး၊ False ဖြစ်ပါက greedy decoding ကို အသုံးပြုသည်။
+*   **`frequency_penalty`**: မကြာခဏ ပေါ်လာသော tokens များကို ထပ်ခါတလဲလဲ မဖြစ်အောင် လျှော့ချရန် parameter။
+*   **`presence_penalty`**: ရှိပြီးသား tokens များကို ထပ်ခါတလဲလဲ မဖြစ်အောင် လျှော့ချရန် parameter။
+*   **`min_new_tokens` / `min_tokens`**: Generate လုပ်မည့် tokens အရေအတွက် အနည်းဆုံး။
+*   **`stop_sequences`**: Generated text ကို ရပ်တန့်ရန်အတွက် သတ်မှတ်ထားသော sequence များ။
+*   **`ignore_eos`**: End-of-sequence token ကို လျစ်လျူရှုရန်။
+*   **`skip_special_tokens`**: Generated text မှ special tokens များကို ဖယ်ရှားရန်။
+*   **CPU Offloading**: Model ၏ အစိတ်အပိုင်းအချို့ကို GPU မှ CPU သို့ ရွှေ့ပြောင်းပြီး လုပ်ဆောင်ခြင်း။
+*   **`--mlock`**: Memory ကို lock လုပ်ပြီး swapping မဖြစ်စေရန် ကာကွယ်ခြင်း။
\ No newline at end of file
diff --git a/chapters/my/chapter2/9.mdx b/chapters/my/chapter2/9.mdx
new file mode 100644
index 000000000..aa0f8475d
--- /dev/null
+++ b/chapters/my/chapter2/9.mdx
@@ -0,0 +1,252 @@
+<FrameworkSwitchCourse {fw} />
+
+<!-- DISABLE-FRONTMATTER-SECTIONS -->
+
+# အခန်းပြီးဆုံးခြင်း စစ်ဆေးမှု[[end-of-chapter-quiz]]
+
+<CourseFloatingBanner
+    chapter={2}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+### 1. Language modeling pipeline ၏ အစီအစဉ်က ဘာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "ပထမဆုံး၊ text ကို ကိုင်တွယ်ပြီး raw predictions တွေကို ပြန်ပေးတဲ့ model ဖြစ်ပါတယ်။ ထို့နောက် tokenizer က ဒီ predictions တွေကို နားလည်ပြီး လိုအပ်တဲ့အခါ text အဖြစ် ပြန်ပြောင်းပေးပါတယ်။",
+			explain: "Model က text ကို နားမလည်နိုင်ပါဘူး။ Tokenizer က text ကို အရင် tokenize လုပ်ပြီး model က နားလည်နိုင်အောင် IDs တွေအဖြစ် ပြောင်းလဲပေးရပါမယ်။"
+		},
+		{
+			text: "ပထမဆုံး၊ text ကို ကိုင်တွယ်ပြီး IDs တွေကို ပြန်ပေးတဲ့ tokenizer ဖြစ်ပါတယ်။ Model က ဒီ IDs တွေကို ကိုင်တွယ်ပြီး text ဖြစ်နိုင်တဲ့ prediction တစ်ခုကို ထုတ်ပေးပါတယ်။",
+			explain: "Model ရဲ့ prediction က တိုက်ရိုက် text မဖြစ်နိုင်ပါဘူး။ Prediction ကို text အဖြစ် ပြန်ပြောင်းဖို့ tokenizer ကို ထပ်မံအသုံးပြုရပါမယ်။"
+		},
+		{
+			text: "Tokenizer က text ကို ကိုင်တွယ်ပြီး IDs တွေကို ပြန်ပေးပါတယ်။ Model က ဒီ IDs တွေကို ကိုင်တွယ်ပြီး prediction တစ်ခုကို ထုတ်ပေးပါတယ်။ ထို့နောက် tokenizer ကို ဒီ predictions တွေကို text အဖြစ် ပြန်ပြောင်းဖို့အတွက် တစ်ဖန် ထပ်မံအသုံးပြုနိုင်ပါတယ်။",
+			explain: "Tokenizer ကို tokenize လုပ်ခြင်းနှင့် de-tokenize လုပ်ခြင်း နှစ်ခုလုံးအတွက် အသုံးပြုနိုင်ပါတယ်။",
+            correct: true
+		}
+	]}
+/>
+
+### 2. Base Transformer model က ထုတ်ပေးတဲ့ tensor မှာ dimension ဘယ်နှစ်ခုရှိပြီး၊ ဘာတွေလဲ။
+
+<Question
+	choices={[
+		{
+			text: "2 ခု: Sequence length နဲ့ batch size",
+			explain: "မှားပါတယ်။ Model က ထုတ်ပေးတဲ့ tensor မှာ တတိယ dimension တစ်ခုရှိပါတယ်- hidden size ပါ။"
+		},
+		{
+			text: "2 ခု: Sequence length နဲ့ hidden size",
+			explain: "မှားပါတယ်။ Transformer model အားလုံးက batches တွေကို ကိုင်တွယ်ပါတယ်၊ single sequence တစ်ခုနဲ့ဆိုရင်တောင်မှ batch size က 1 ဖြစ်ပါလိမ့်မယ်။"
+		},
+		{
+			text: "3 ခု: Sequence length, batch size နဲ့ hidden size",
+			explain: "ကောင်းလိုက်တာ။",
+            correct: true
+		}
+	]}
+/>
+
+### 3. အောက်ပါတို့ထဲမှ မည်သည့်အရာက subword tokenization ဥပမာတစ်ခုလဲ။
+
+<Question
+	choices={[
+		{
+			text: "WordPiece",
+			explain: "ဟုတ်ပါတယ်၊ ဒါက subword tokenization ဥပမာတစ်ခုပါပဲ။",
+            correct: true
+		},
+		{
+			text: "Character-based tokenization",
+			explain: "Character-based tokenization ဟာ subword tokenization အမျိုးအစား မဟုတ်ပါဘူး။"
+		},
+		{
+			text: "Whitespace နဲ့ punctuation တွေနဲ့ ပိုင်းခြားခြင်း",
+			explain: "ဒါက word-based tokenization နည်းလမ်းတစ်ခုပါ။"
+		},
+		{
+			text: "BPE",
+			explain: "ဟုတ်ပါတယ်၊ ဒါက subword tokenization ဥပမာတစ်ခုပါပဲ။",
+            correct: true
+        },
+		{
+			text: "Unigram",
+			explain: "ဟုတ်ပါတယ်၊ ဒါက subword tokenization ဥပမာတစ်ခုပါပဲ။",
+            correct: true
+        },
+		{
+			text: "အထက်ပါအဖြေများမှ တစ်ခုမှ မဟုတ်ပါ။",
+			explain: "မှားပါတယ်။"
+        }
+	]}
+/>
+
+### 4. Model head ဆိုတာ ဘာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "Base Transformer network ရဲ့ အစိတ်အပိုင်းတစ်ခုဖြစ်ပြီး tensors တွေကို ၎င်းတို့ရဲ့ မှန်ကန်တဲ့ layers တွေဆီ ပြန်လည်လမ်းကြောင်းပြောင်းပေးပါတယ်။",
+			explain: "ဒီလို အစိတ်အပိုင်းမျိုး မရှိပါဘူး။"
+		},
+		{
+			text: "Self-attention mechanism လို့လည်း လူသိများပြီး၊ ၎င်းသည် sequence ၏ အခြား tokens များနှင့်အညီ token တစ်ခု၏ ကိုယ်စားပြုမှုကို လိုက်လျောညီထွေဖြစ်အောင် ပြုလုပ်ပေးပါတယ်။",
+			explain: "Self-attention layer မှာ attention 'heads' တွေ ပါဝင်ပေမယ့် ဒါတွေက adaptation heads တွေ မဟုတ်ပါဘူး။"
+		},
+		{
+			text: "Transformer predictions တွေကို task-specific output တစ်ခုအဖြစ် ပြောင်းလဲဖို့အတွက် ပုံမှန်အားဖြင့် layers တစ်ခု သို့မဟုတ် အနည်းငယ်နဲ့ ဖွဲ့စည်းထားတဲ့ အပိုအစိတ်အပိုင်းတစ်ခု။",
+			explain: "မှန်ပါတယ်။ Adaptation heads တွေဟာ (ရိုးရှင်းစွာ heads လို့လည်း လူသိများပါတယ်) မတူညီတဲ့ ပုံစံမျိုးစုံနဲ့ လာပါတယ်- language modeling heads, question answering heads, sequence classification heads... ",
+			correct: true
+		} 
+	]}
+/>
+
+### 5. AutoModel ဆိုတာ ဘာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "သင်၏ data ပေါ်တွင် အလိုအလျောက် လေ့ကျင့်ပေးသော model တစ်ခု။",
+			explain: "ဒါကို ကျွန်တော်တို့ရဲ့ <a href='https://huggingface.co/autotrain'>AutoTrain</a> product နဲ့ မှားနေတာလား။"
+		},
+		{
+			text: "Checkpoint ကို အခြေခံပြီး မှန်ကန်တဲ့ architecture ကို ပြန်ပေးတဲ့ object တစ်ခု။",
+			explain: "မှန်ပါပြီ- `AutoModel` က မှန်ကန်တဲ့ architecture ကို ပြန်ပေးဖို့အတွက် initialize လုပ်မယ့် checkpoint ကို သိဖို့ပဲ လိုအပ်ပါတယ်။",
+			correct: true
+		},
+		{
+			text: "၎င်း၏ inputs များအတွက် အသုံးပြုသော ဘာသာစကားကို အလိုအလျောက် ထောက်လှမ်းပြီး မှန်ကန်သော weights များကို load လုပ်ပေးသော model တစ်ခု။",
+			explain: "အချို့ checkpoints တွေနဲ့ models တွေက ဘာသာစကားများစွာကို ကိုင်တွယ်နိုင်စွမ်းရှိပေမယ့်၊ ဘာသာစကားအရ checkpoint ကို အလိုအလျောက် ရွေးချယ်ဖို့အတွက် built-in tools တွေ မရှိသေးပါဘူး။ သင့်လုပ်ငန်းအတွက် အကောင်းဆုံး checkpoint ကို ရှာဖွေဖို့ <a href='https://huggingface.co/models'>Model Hub</a> ကို သွားသင့်ပါတယ်။"
+		} 
+	]}
+/>
+
+### 6. အရှည်မတူညီသော sequences များကို အတူတကွ batch လုပ်သည့်အခါ မည်သည့်နည်းလမ်းများကို သိရှိထားသင့်သလဲ။
+
+<Question
+	choices={[
+		{
+			text: "Truncating",
+			explain: "ဟုတ်ပါတယ်၊ truncation က rectangular shape ဖြစ်အောင် sequences တွေကို ညီမျှအောင် လုပ်ဖို့ မှန်ကန်တဲ့ နည်းလမ်းတစ်ခုပါပဲ။ ဒါပေမယ့် တစ်ခုတည်းသော နည်းလမ်းလား။",
+			correct: true
+		},
+		{
+			text: "Returning tensors",
+			explain: "အခြားနည်းလမ်းတွေက rectangular tensors တွေကို ပြန်ပေးနိုင်ပေမယ့်၊ sequences တွေကို batch လုပ်တဲ့အခါ tensors တွေကို ပြန်ပေးတာက အသုံးမဝင်ပါဘူး။"
+		},
+		{
+			text: "Padding",
+			explain: "ဟုတ်ပါတယ်၊ padding က rectangular shape ဖြစ်အောင် sequences တွေကို ညီမျှအောင် လုပ်ဖို့ မှန်ကန်တဲ့ နည်းလမ်းတစ်ခုပါပဲ။ ဒါပေမယ့် တစ်ခုတည်းသော နည်းလမ်းလား။",
+			correct: true
+		}, 
+		{
+			text: "Attention masking",
+			explain: "ဟုတ်ပါတယ်။ အရှည်မတူညီသော sequences များကို ကိုင်တွယ်သည့်အခါ Attention masks များသည် အလွန်အရေးကြီးပါသည်။ သို့သော်လည်း ၎င်းသည် သိရှိထားရမည့် တစ်ခုတည်းသော နည်းပညာ မဟုတ်သေးပါ။",
+			correct: true
+		} 
+	]}
+/>
+
+### 7. sequence classification model က ထုတ်ပေးတဲ့ logits တွေပေါ်မှာ SoftMax function ကို အသုံးပြုရခြင်းရဲ့ ရည်ရွယ်ချက်က ဘာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "Logits တွေကို ပိုမိုယုံကြည်စိတ်ချရအောင် ပြုလုပ်ပေးပါတယ်။",
+			explain: "မဟုတ်ပါဘူး၊ SoftMax function က ရလဒ်တွေရဲ့ ယုံကြည်စိတ်ချရမှုကို မထိခိုက်ပါဘူး။"
+		},
+		{
+			text: "၎င်းတို့ နားလည်နိုင်အောင် အနိမ့်ဆုံးနှင့် အမြင့်ဆုံးကန့်သတ်ချက်ကို သတ်မှတ်ပေးပါတယ်။",
+			explain: "ထွက်လာတဲ့ တန်ဖိုးတွေက 0 နဲ့ 1 ကြားမှာ ရှိပါတယ်။ ဒါပေမယ့် ဒါက SoftMax function ကို အသုံးပြုရတဲ့ တစ်ခုတည်းသော အကြောင်းပြချက်တော့ မဟုတ်ပါဘူး။",
+            correct: true
+		},
+		{
+			text: "output ရဲ့ စုစုပေါင်းတန်ဖိုးက 1 ဖြစ်လာပြီး ဖြစ်နိုင်ခြေဆိုင်ရာ အဓိပ္ပာယ်ဖွင့်ဆိုနိုင်ခြေ ရှိလာပါတယ်။",
+			explain: "မှန်ပါပြီ။ ဒါပေမယ့် ဒါက SoftMax function ကို အသုံးပြုရတဲ့ တစ်ခုတည်းသော အကြောင်းပြချက်တော့ မဟုတ်ပါဘူး။",
+            correct: true
+		}
+	]}
+/>
+
+### 8. tokenizer API ရဲ့ အများစုက ဘယ် method ပေါ်မှာ အခြေခံထားလဲ။
+
+<Question
+	choices={[
+		{
+			text: "<code>encode</code>၊ text ကို IDs အဖြစ် encode လုပ်နိုင်ပြီး IDs တွေကို predictions အဖြစ် encode လုပ်နိုင်လို့ပါ။",
+			explain: "မှားပါတယ်။ `encode` method ဟာ tokenizers တွေမှာ ရှိပေမယ့် models တွေမှာတော့ မရှိပါဘူး။"
+		},
+		{
+			text: "tokenizer object ကို တိုက်ရိုက်ခေါ်ခြင်း။",
+			explain: "မှန်ပါပြီ။ tokenizer ရဲ့ `__call__` method ဟာ အလွန်အစွမ်းထက်တဲ့ method တစ်ခုဖြစ်ပြီး ဘာမဆိုနီးပါး ကိုင်တွယ်နိုင်ပါတယ်။ ဒါက model ကနေ predictions တွေကို ရယူဖို့ အသုံးပြုတဲ့ method လည်း ဖြစ်ပါတယ်။",
+			correct: true
+		},
+		{
+			text: "<code>pad</code>",
+			explain: "မှားပါတယ်။ Padding က အလွန်အသုံးဝင်ပေမယ့် tokenizer API ရဲ့ တစ်စိတ်တစ်ပိုင်းမျှသာ ဖြစ်ပါတယ်။"
+		},
+		{
+			text: "<code>tokenize</code>",
+			explain: "`tokenize` method ဟာ အသုံးဝင်ဆုံး methods တွေထဲက တစ်ခုဖြစ်ပေမယ့် tokenizer API ရဲ့ အဓိက အစိတ်အပိုင်းတော့ မဟုတ်ပါဘူး။"
+		}
+	]}
+/>
+
+### 9. ဒီ code sample မှာ `result` variable က ဘာတွေ ပါဝင်သလဲ။
+
+```py
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+result = tokenizer.tokenize("Hello!")
+```
+
+<Question
+	choices={[
+		{
+			text: "Strings များ၏ list တစ်ခု၊ string တစ်ခုစီသည် token တစ်ခုဖြစ်သည်။",
+			explain: "ဟုတ်ပါတယ်၊ ဒါကို IDs တွေအဖြစ် ပြောင်းလဲပြီး model ကို ပို့လိုက်ပါ။",
+            correct: true
+		},
+		{
+			text: "IDs များ၏ list တစ်ခု။",
+			explain: "မှားပါတယ်။ ဒါက `__call__` ဒါမှမဟုတ် `convert_tokens_to_ids` method အတွက်ပါ။"
+		},
+		{
+			text: "Tokens များအားလုံး ပါဝင်သော string တစ်ခု။",
+			explain: "ဒါက မသင့်တော်ပါဘူး၊ ဘာလို့လဲဆိုတော့ ရည်ရွယ်ချက်က string ကို tokens များစွာအဖြစ် ပိုင်းခြားဖို့ပါ။"
+		}
+	]}
+/>
+
+### 10. အောက်ပါ code မှာ တစ်ခုခု မှားနေတာ ရှိပါသလား။
+
+```py
+from transformers import AutoTokenizer, AutoModel
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+model = AutoModel.from_pretrained("gpt2")
+
+encoded = tokenizer("Hey!", return_tensors="pt")
+result = model(**encoded)
+```
+
+<Question
+	choices={[
+		{
+			text: "မရှိပါဘူး၊ မှန်ကန်ပုံရပါတယ်။",
+			explain: "ကံမကောင်းစွာနဲ့ပဲ၊ model တစ်ခုကို မတူညီတဲ့ checkpoint နဲ့ train လုပ်ထားတဲ့ tokenizer တစ်ခုနဲ့ တွဲဖက်တာက ကောင်းတဲ့ အကြံတစ်ခု မဟုတ်ပါဘူး။ model ကို ဒီ tokenizer ရဲ့ output ကနေ အဓိပ္ပာယ်ထုတ်ယူဖို့ train လုပ်ထားတာ မဟုတ်ပါဘူး၊ ဒါကြောင့် model output က (run နိုင်ခဲ့ရင်တောင်) ဘာအဓိပ္ပာယ်မှ မရှိပါဘူး။"
+		},
+		{
+			text: "Tokenizer နဲ့ model ဟာ အမြဲတမ်း checkpoint တူတူကနေ ဖြစ်သင့်ပါတယ်။",
+			explain: "မှန်ပါပြီ။",
+            correct: true
+		},
+		{
+			text: "Input တိုင်းဟာ batch ဖြစ်တာကြောင့် tokenizer နဲ့ pad လုပ်ခြင်းနဲ့ truncate လုပ်ခြင်းက ကောင်းတဲ့ အလေ့အကျင့်ပါ။",
+			explain: "Model input တိုင်းဟာ batch ဖြစ်ဖို့ လိုအပ်တာ မှန်ပါတယ်။ သို့သော်လည်း၊ ဒီ sequence ကို truncate ဒါမှမဟုတ် pad လုပ်တာက အဓိပ္ပာယ်ရှိမှာ မဟုတ်ပါဘူး။ ဘာလို့လဲဆိုတော့ ဒါတစ်ခုတည်းပဲ ရှိလို့ပါ။ ဒါတွေက sentences list တစ်ခုကို batch လုပ်ဖို့အတွက် နည်းလမ်းတွေပါ။"
+		}
+	]}
+/>
\ No newline at end of file
diff --git a/chapters/my/chapter3/1.mdx b/chapters/my/chapter3/1.mdx
new file mode 100644
index 000000000..fcee2dad7
--- /dev/null
+++ b/chapters/my/chapter3/1.mdx
@@ -0,0 +1,59 @@
+<FrameworkSwitchCourse {fw} />
+
+# နိဒါန်း[[introduction]]
+
+<CourseFloatingBanner
+    chapter={3}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+[Chapter 2](/course/chapter2) မှာ ကျွန်တော်တို့ဟာ tokenizers တွေနဲ့ pre-trained models တွေကို အသုံးပြုပြီး predictions တွေ ဘယ်လိုလုပ်ရမယ်ဆိုတာကို လေ့လာခဲ့ပါတယ်။ ဒါပေမယ့် သီးခြားလုပ်ငန်းတစ်ခုကို ဖြေရှင်းဖို့အတွက် pre-trained model တစ်ခုကို fine-tune လုပ်ချင်တယ်ဆိုရင်ကော။ ဒါက ဒီအခန်းရဲ့ အကြောင်းအရာပါပဲ။ သင်ဟာ အောက်ပါတို့ကို သင်ယူရပါလိမ့်မယ်-
+
+*   နောက်ဆုံးပေါ် 🤗 Datasets features တွေကို အသုံးပြုပြီး Hub ကနေ ကြီးမားတဲ့ dataset တစ်ခုကို ဘယ်လိုပြင်ဆင်ရမလဲ
+*   ခေတ်မီအကောင်းဆုံး အလေ့အကျင့်တွေနဲ့ model တစ်ခုကို fine-tune လုပ်ဖို့ high-level `Trainer` API ကို ဘယ်လိုအသုံးပြုရမလဲ
+*   optimization နည်းစနစ်တွေနဲ့ custom training loop တစ်ခုကို ဘယ်လို implement လုပ်ရမလဲ
+*   မည်သည့် setup ပေါ်မှာမဆို distributed training ကို အလွယ်တကူ run နိုင်ဖို့ 🤗 Accelerate library ကို ဘယ်လိုအသုံးချရမလဲ
+*   အမြင့်ဆုံး performance အတွက် လက်ရှိ fine-tuning အကောင်းဆုံး အလေ့အကျင့်တွေကို ဘယ်လိုအသုံးချရမလဲ
+
+> [!TIP]
+> 📚 **မရှိမဖြစ် လိုအပ်သော အရင်းအမြစ်များ**: မစတင်မီ၊ data processing အတွက် [🤗 Datasets documentation](https://huggingface.co/docs/datasets/) ကို ပြန်လည်လေ့လာနိုင်ပါတယ်။
+
+ဒီအခန်းက 🤗 Transformers library အပြင် အချို့ Hugging Face libraries တွေကိုပါ မိတ်ဆက်ပေးပါလိမ့်မယ်။ 🤗 Datasets, 🤗 Tokenizers, 🤗 Accelerate, နဲ့ 🤗 Evaluate လို libraries တွေက models တွေကို ပိုမိုထိရောက်ပြီး အကျိုးရှိရှိ train လုပ်ဖို့ ဘယ်လိုကူညီပေးနိုင်တယ်ဆိုတာကို ကျွန်တော်တို့ တွေ့မြင်ရပါလိမ့်မယ်။
+
+ဒီအခန်းရဲ့ အဓိကအပိုင်းတစ်ခုစီက သင့်ကို မတူညီတဲ့အရာတွေကို သင်ကြားပေးပါလိမ့်မယ်-
+-   **အပိုင်း ၂**: ခေတ်မီ data preprocessing နည်းစနစ်တွေနဲ့ ထိရောက်တဲ့ dataset handling တွေကို သင်ယူပါ။
+-   **အပိုင်း ၃**: သူ့ရဲ့ နောက်ဆုံးပေါ် features အားလုံးနဲ့ အစွမ်းထက်တဲ့ Trainer API ကို ကျွမ်းကျင်အောင် လေ့လာပါ။
+-   **အပိုင်း ၄**: training loops တွေကို အစကနေ implement လုပ်ပြီး Accelerate နဲ့ distributed training ကို နားလည်ပါ။
+
+ဒီအခန်းရဲ့ အဆုံးမှာတော့ သင်ဟာ high-level APIs နဲ့ custom training loops နှစ်ခုလုံးကို အသုံးပြုပြီး၊ နယ်ပယ်ရဲ့ နောက်ဆုံးပေါ် အကောင်းဆုံး အလေ့အကျင့်တွေကို အသုံးချကာ သင်ကိုယ်ပိုင် datasets တွေနဲ့ လုပ်ငန်းတွေပေါ်မှာ models တွေကို fine-tune လုပ်နိုင်ပါလိမ့်မယ်။
+
+> [!TIP]
+> 🎯 **သင် တည်ဆောက်မည့်အရာ**: ဒီအခန်းရဲ့ အဆုံးမှာ သင်ဟာ text classification အတွက် BERT model တစ်ခုကို fine-tune လုပ်ပြီး၊ သင်၏ datasets တွေနဲ့ လုပ်ငန်းတွေအတွက် ဒီနည်းစနစ်တွေကို ဘယ်လို လိုက်လျောညီထွေဖြစ်အောင် အသုံးချရမလဲဆိုတာကို နားလည်လာပါလိမ့်မယ်။
+
+ဒီအခန်းက **PyTorch** ကိုသာ သီးသန့်အာရုံစိုက်ထားပါတယ်၊ ဘာလို့လဲဆိုတော့ ဒါက ခေတ်မီ deep learning သုတေသနနဲ့ production အတွက် standard framework တစ်ခု ဖြစ်လာလို့ပါပဲ။ Hugging Face ecosystem ရဲ့ နောက်ဆုံးပေါ် APIs နဲ့ အကောင်းဆုံး အလေ့အကျင့်တွေကို ကျွန်တော်တို့ အသုံးပြုပါမယ်။
+
+သင့်ရဲ့ train လုပ်ထားတဲ့ models တွေကို Hugging Face Hub ကို upload လုပ်ဖို့အတွက် Hugging Face account တစ်ခု လိုအပ်ပါလိမ့်မယ်- [account တစ်ခု ဖန်တီးပါ](https://huggingface.co/join)
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Fine-tune**: ကြိုတင်လေ့ကျင့်ထားပြီးသား (pre-trained) မော်ဒယ်တစ်ခုကို သီးခြားလုပ်ငန်းတစ်ခု (specific task) အတွက် အနည်းငယ်သော ဒေတာနဲ့ ထပ်မံလေ့ကျင့်ပေးခြင်းကို ဆိုလိုပါတယ်။
+*   **Pretrained Models**: ဒေတာအမြောက်အမြားပေါ်တွင် ကြိုတင်လေ့ကျင့်ထားပြီးသား Artificial Intelligence (AI) မော်ဒယ်များ။
+*   **Predictions**: Machine Learning မော်ဒယ်တစ်ခုက input data ကို အခြေခံပြီး ခန့်မှန်းထုတ်ပေးသော ရလဒ်များ။
+*   **Dataset**: AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် အသုံးပြုတဲ့ ဒေတာအစုအဝေးတစ်ခုပါ။
+*   **Hugging Face Hub**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **🤗 Datasets**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် ဒေတာအစုအဝေး (datasets) တွေကို လွယ်လွယ်ကူကူ ဝင်ရောက်ရယူ၊ စီမံခန့်ခွဲပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **`Trainer` API**: 🤗 Transformers library မှာ ပါဝင်တဲ့ high-level API တစ်ခုဖြစ်ပြီး Transformer models တွေကို အလွယ်တကူ လေ့ကျင့်ပြီး fine-tune လုပ်နိုင်စေပါတယ်။
+*   **Best Practices**: နယ်ပယ်တစ်ခုအတွင်း လုပ်ဆောင်မှုများကို ထိရောက်ပြီး အကျိုးရှိစေရန် အကောင်းဆုံးနည်းလမ်းများ။
+*   **Custom Training Loop**: model တစ်ခုကို လေ့ကျင့်ရန်အတွက် ကိုယ်တိုင်ရေးသားထားသော code loop။
+*   **Optimization Techniques**: model လေ့ကျင့်မှုကို ပိုမိုမြန်ဆန်စေရန် သို့မဟုတ် ပိုမိုထိရောက်စေရန် အသုံးပြုသော နည်းလမ်းများ။
+*   **🤗 Accelerate Library**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး PyTorch code တွေကို မတူညီတဲ့ training environment (ဥပမာ - GPU အများအပြား၊ distributed training) တွေမှာ အလွယ်တကူ run နိုင်အောင် ကူညီပေးပါတယ်။
+*   **Distributed Training**: model တစ်ခုကို ကွန်ပျူတာ သို့မဟုတ် GPU များစွာကို အသုံးပြုပြီး အပြိုင် လေ့ကျင့်ခြင်း။
+*   **🤗 Tokenizers**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး စာသားတွေကို AI မော်ဒယ်တွေ နားလည်နိုင်တဲ့ ပုံစံ (tokens) တွေအဖြစ် ပြောင်းလဲပေးတဲ့ လုပ်ငန်းစဉ် (tokenization) ကို မြန်ဆန်ထိရောက်စွာ လုပ်ဆောင်ပေးပါတယ်။
+*   **🤗 Evaluate**: Hugging Face မှ machine learning models များ၏ စွမ်းဆောင်ရည်ကို တိုင်းတာရန်အတွက် metrics များနှင့် evaluation components များကို ပံ့ပိုးပေးသော library။
+*   **BERT Model**: Google က ဖန်တီးခဲ့သော Transformer-based language model တစ်ခု။
+*   **Text Classification**: စာသားတစ်ခုကို ကြိုတင်သတ်မှတ်ထားသော အမျိုးအစားများအဖြစ် ခွဲခြားသတ်မှတ်ခြင်း။
+*   **PyTorch**: Facebook (ယခု Meta) က ဖန်တီးထားတဲ့ open-source machine learning library တစ်ခုဖြစ်ပြီး deep learning မော်ဒယ်တွေ တည်ဆောက်ဖို့အတွက် အသုံးပြုပါတယ်။
+*   **Deep Learning**: Machine Learning ရဲ့ နယ်ပယ်ခွဲတစ်ခုဖြစ်ပြီး neural networks တွေကို အသုံးပြုကာ ဒေတာတွေကနေ ရှုပ်ထွေးတဲ့ ပုံစံတွေကို သင်ယူစေပါတယ်။
+*   **Hugging Face Account**: Hugging Face ပလက်ဖောင်းပေါ်ရှိ သုံးစွဲသူအကောင့်။
+*   **Data Preprocessing**: Raw data ကို machine learning model တစ်ခုက လုပ်ဆောင်နိုင်ရန် ပြင်ဆင်ခြင်း။
+*   **Dataset Handling**: Dataset များကို စီမံခန့်ခွဲခြင်းနှင့် အသုံးပြုခြင်း။
\ No newline at end of file
diff --git a/chapters/my/chapter3/2.mdx b/chapters/my/chapter3/2.mdx
new file mode 100644
index 000000000..12925ae06
--- /dev/null
+++ b/chapters/my/chapter3/2.mdx
@@ -0,0 +1,472 @@
+# ဒေတာများကို စီမံဆောင်ရွက်ခြင်း[[processing-the-data]]
+
+<CourseFloatingBanner chapter={3}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter3/section2.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter3/section2.ipynb"},
+]} />
+
+[ယခင်အခန်း](/course/chapter2) မှ ဥပမာကို ဆက်လက်၍၊ batch တစ်ခုပေါ်တွင် sequence classifier ကို မည်သို့လေ့ကျင့်(Train)ရမည်ကို ဤတွင် ဖော်ပြထားသည်။
+
+
+```python
+import torch
+from torch.optim import AdamW
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# အရင်ကအတိုင်း
+checkpoint = "bert-base-uncased"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
+sequences = [
+    "I've been waiting for a HuggingFace course my whole life.",
+    "This course is amazing!",
+]
+batch = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")
+
+# ဒါက အသစ်ပါ
+batch["labels"] = torch.tensor([1, 1])
+
+optimizer = AdamW(model.parameters())
+loss = model(**batch).loss
+loss.backward()
+optimizer.step()
+```
+
+Of course, training the model on just two sentences is not going to yield good results. To get better results, you will need to prepare a bigger dataset.
+
+In this section we will use as an example the MRPC (Microsoft Research Paraphrase Corpus) dataset, introduced in a [paper](https://www.aclweb.org/anthology/I05-5002.pdf) by William B. Dolan and Chris Brockett. The dataset consists of 5,801 pairs of sentences, with a label indicating whether they are paraphrases or not (i.e., whether both sentences mean the same thing). We selected it for this chapter because it is a small dataset, so it is easy to experiment with training on it.
+
+### Loading a dataset from the Hub[[loading-a-dataset-from-the-hub]]
+
+<Youtube id="_BZearw7f0w"/>
+
+The Hub doesn't just contain models; it also has many datasets in lots of different languages. You can browse the datasets [here](https://huggingface.co/datasets), and we recommend you try to load and process a new dataset once you have gone through this section (see the general documentation [here](https://huggingface.co/docs/datasets/loading)). But for now, let's focus on the MRPC dataset! This is one of the 10 datasets composing the [GLUE benchmark](https://gluebenchmark.com/), which is an academic benchmark used to measure the performance of ML models across 10 different text classification tasks.
+
+The 🤗 Datasets library provides a very simple command to download and cache a dataset from the Hub. We can download the MRPC dataset like this:
+
+> [!TIP]
+> 💡 **Additional resources**: For more dataset loading techniques and examples, check out the [🤗 Datasets documentation](https://huggingface.co/docs/datasets/).
+
+```py
+from datasets import load_dataset
+
+raw_datasets = load_dataset("glue", "mrpc")
+raw_datasets
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['sentence1', 'sentence2', 'label', 'idx'],
+        num_rows: 3668
+    })
+    validation: Dataset({
+        features: ['sentence1', 'sentence2', 'label', 'idx'],
+        num_rows: 408
+    })
+    test: Dataset({
+        features: ['sentence1', 'sentence2', 'label', 'idx'],
+        num_rows: 1725
+    })
+})
+```
+
+As you can see, we get a `DatasetDict` object which contains the training set, the validation set, and the test set. Each of those contains several columns (`sentence1`, `sentence2`, `label`, and `idx`) and a different number of rows (so there are 3,668 pairs of sentences in the training set, 408 in the validation set, and 1,725 in the test set).
+
+> [!TIP]
+> This command downloads and caches the dataset, by default in *~/.cache/huggingface/datasets*. Recall from Chapter 2 that you can customize your cache folder by setting the `HF_HOME` environment variable.
+
+We can access each pair of sentences in our `raw_datasets` object by indexing, like with a dictionary:
+
+```py
+raw_train_dataset = raw_datasets["train"]
+raw_train_dataset[0]
+```
+
+```python out
+{'idx': 0,
+ 'label': 1,
+ 'sentence1': 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .',
+ 'sentence2': 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'}
+```
+
+We can see the labels are already integers, so we won't have to do any preprocessing there. To know which integer corresponds to which label, we can inspect the `features` of our `raw_train_dataset`. This will tell us the type of each column:
+
+```py
+raw_train_dataset.features
+```
+
+```python out
+{'sentence1': Value(dtype='string', id=None),
+ 'sentence2': Value(dtype='string', id=None),
+ 'label': ClassLabel(num_classes=2, names=['not_equivalent', 'equivalent'], names_file=None, id=None),
+ 'idx': Value(dtype='int32', id=None)}
+```
+
+Behind the scenes, `label` is of type `ClassLabel`, and the mapping of integers to label names is stored in its *names* attribute (not a folder on disk). `0` corresponds to `not_equivalent`, and `1` corresponds to `equivalent`.
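+
+As a rough illustration of what this mapping amounts to, the sketch below rebuilds it with a plain Python list. The helper names `int_to_name` and `name_to_int` are hypothetical and exist only for this example; the real `ClassLabel` feature offers comparable `int2str()` and `str2int()` methods.
+
+```python
+# Label names in the order reported by `raw_train_dataset.features` above.
+names = ["not_equivalent", "equivalent"]
+
+
+def int_to_name(label_id):
+    """Map an integer label to its name (hypothetical helper)."""
+    return names[label_id]
+
+
+def name_to_int(label_name):
+    """Map a label name back to its integer ID (hypothetical helper)."""
+    return names.index(label_name)
+
+
+print(int_to_name(1))  # equivalent
+print(name_to_int("not_equivalent"))  # 0
+```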
+
+> [!TIP]
+> ✏️ **Try it out!** Look at element 15 of the training set and element 87 of the validation set. What are their labels?
+
+### Preprocessing a dataset[[preprocessing-a-dataset]]
+
+<Youtube id="0u3ioSwev3s"/>
+
+To preprocess the dataset, we need to convert the text to numbers the model can make sense of. As you saw in the [previous chapter](/course/chapter2), this is done with a tokenizer. We can feed the tokenizer one sentence or a list of sentences, so we can directly tokenize all the first sentences and all the second sentences of each pair like this:
+
+```py
+from transformers import AutoTokenizer
+
+checkpoint = "bert-base-uncased"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+tokenized_sentences_1 = tokenizer(raw_datasets["train"]["sentence1"])
+tokenized_sentences_2 = tokenizer(raw_datasets["train"]["sentence2"])
+```
+
+> [!TIP]
+> 💡 **Deep dive**: For more advanced tokenization techniques and an understanding of how different tokenizers work, explore the [🤗 Tokenizers documentation](https://huggingface.co/docs/transformers/main/en/tokenizer_summary) and the [tokenization guide in the cookbook](https://huggingface.co/learn/cookbook/en/advanced_rag#tokenization-strategies).
+
+However, we can't just pass two sequences to the model and get a prediction of whether the two sentences are paraphrases or not. We need to handle the two sequences as a pair, and apply the appropriate preprocessing. Fortunately, the tokenizer can also take a pair of sequences and prepare it the way our BERT model expects:
+
+```py
+inputs = tokenizer("This is the first sentence.", "This is the second one.")
+inputs
+```
+
+```python out
+{ 
+  'input_ids': [101, 2023, 2003, 1996, 2034, 6251, 1012, 102, 2023, 2003, 1996, 2117, 2028, 1012, 102],
+  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
+  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+}
+```
+
+We discussed the `input_ids` and `attention_mask` keys in [Chapter 2](/course/chapter2), but we put off talking about `token_type_ids`. In this example, this is what tells the model which part of the input is the first sentence and which is the second sentence.
+
+> [!TIP]
+> ✏️ **Try it out!** Take element 15 of the training set and tokenize the two sentences separately and as a pair. What's the difference between the two results?
+
+If we decode the IDs inside `input_ids` back to words:
+
+```py
+tokenizer.convert_ids_to_tokens(inputs["input_ids"])
+```
+
+we will get:
+
+```python out
+['[CLS]', 'this', 'is', 'the', 'first', 'sentence', '.', '[SEP]', 'this', 'is', 'the', 'second', 'one', '.', '[SEP]']
+```
+
+So we see the model expects the inputs to be of the form `[CLS] sentence1 [SEP] sentence2 [SEP]` when there are two sentences. Aligning this with the `token_type_ids` gives us:
+
+```python out
+['[CLS]', 'this', 'is', 'the', 'first', 'sentence', '.', '[SEP]', 'this', 'is', 'the', 'second', 'one', '.', '[SEP]']
+[      0,      0,    0,     0,       0,          0,   0,       0,      1,    1,     1,        1,     1,   1,       1]
+```
+
+As you can see, the parts of the input corresponding to `[CLS] sentence1 [SEP]` all have a token type ID of `0`, while the other parts, corresponding to `sentence2 [SEP]`, all have a token type ID of `1`.
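+
+The layout described above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not the tokenizer's actual implementation: `build_pair` is a hypothetical helper, and it takes already-split tokens rather than raw text.
+
+```python
+def build_pair(tokens_a, tokens_b):
+    """Lay out a sentence pair as [CLS] A [SEP] B [SEP] with matching type IDs."""
+    first = ["[CLS]"] + tokens_a + ["[SEP]"]
+    second = tokens_b + ["[SEP]"]
+    tokens = first + second
+    # Type ID 0 covers the first segment, type ID 1 covers the second.
+    token_type_ids = [0] * len(first) + [1] * len(second)
+    return tokens, token_type_ids
+
+
+tokens, type_ids = build_pair(
+    ["this", "is", "the", "first", "sentence", "."],
+    ["this", "is", "the", "second", "one", "."],
+)
+for token, type_id in zip(tokens, type_ids):
+    print(f"{token:>10} {type_id}")
+```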
+
+Note that if you select a different checkpoint, you won't necessarily have the `token_type_ids` in your tokenized inputs (for instance, they're not returned if you use a DistilBERT model). They are only returned when the model will know what to do with them, because it has seen them during its pretraining.
+
+Here, BERT is pretrained with token type IDs, and on top of the masked language modeling objective we talked about in [Chapter 1](/course/chapter1), it has an additional objective called _next sentence prediction_. The goal of this task is to model the relationship between pairs of sentences.
+
+With next sentence prediction, the model is provided pairs of sentences (with randomly masked tokens) and asked to predict whether the second sentence follows the first. To make the task non-trivial, half of the time the sentences follow each other in the original document they were extracted from, and the other half of the time the two sentences come from two different documents.
+
+In general, you don't need to worry about whether or not there are `token_type_ids` in your tokenized inputs: as long as you use the same checkpoint for the tokenizer and the model, everything will be fine, as the tokenizer knows what to provide to its model.
+
+Now that we have seen how our tokenizer can deal with one pair of sentences, we can use it to tokenize our whole dataset: like in the [previous chapter](/course/chapter2), we can feed the tokenizer a list of pairs of sentences by giving it the list of first sentences, then the list of second sentences. This is also compatible with the padding and truncation options we saw in [Chapter 2](/course/chapter2). So, one way to preprocess the training dataset is:
+
+```py
+tokenized_dataset = tokenizer(
+    raw_datasets["train"]["sentence1"],
+    raw_datasets["train"]["sentence2"],
+    padding=True,
+    truncation=True,
+)
+```
+
+This works well, but it has the disadvantage of returning a dictionary (with our keys, `input_ids`, `attention_mask`, and `token_type_ids`, and values that are lists of lists). It will also only work if you have enough RAM to store your whole dataset during the tokenization (whereas the datasets from the 🤗 Datasets library are [Apache Arrow](https://arrow.apache.org/) files stored on the disk, so you only keep the samples you ask for loaded in memory).
+
+To keep the data as a dataset, we will use the `Dataset.map()` method. This also allows us some extra flexibility, if we need more preprocessing done than just tokenization. The `map()` method works by applying a function on each element of the dataset, so let's define a function that tokenizes our inputs:
+
+```py
+def tokenize_function(example):
+    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
+```
+
+This function takes a dictionary (like the items of our dataset) and returns a new dictionary with the keys `input_ids`, `attention_mask`, and `token_type_ids`. Note that it also works if the `example` dictionary contains several samples (each key as a list of sentences) since the `tokenizer` works on lists of pairs of sentences, as seen before. This will allow us to use the option `batched=True` in our call to `map()`, which will greatly speed up the tokenization. The `tokenizer` is backed by a tokenizer written in Rust from the [🤗 Tokenizers](https://github.com/huggingface/tokenizers) library. This tokenizer can be very fast, but only if we give it lots of inputs at once.
+
+Note that we've left the `padding` argument out of our tokenization function for now. This is because padding all the samples to the maximum length is not efficient: it's better to pad the samples when we're building a batch, as then we only need to pad to the maximum length in that batch, and not the maximum length in the entire dataset. This can save a lot of time and processing power when the inputs have very variable lengths!
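+
+To make the saving concrete, here is a small back-of-the-envelope sketch that counts how many token positions the model would process under each strategy. The sequence lengths and the batch size are made up for illustration, and `padded_tokens` is a hypothetical helper.
+
+```python
+def padded_tokens(lengths, batch_size, dynamic):
+    """Count token positions after padding.
+
+    dynamic=True pads each batch to its own longest sample;
+    dynamic=False pads every sample to the longest one overall.
+    """
+    global_max = max(lengths)
+    total = 0
+    for start in range(0, len(lengths), batch_size):
+        batch = lengths[start:start + batch_size]
+        target = max(batch) if dynamic else global_max
+        total += target * len(batch)
+    return total
+
+
+# Hypothetical lengths: four short samples followed by four mixed ones.
+lengths = [12, 15, 9, 14, 120, 118, 30, 28]
+print(padded_tokens(lengths, batch_size=4, dynamic=False))  # 960
+print(padded_tokens(lengths, batch_size=4, dynamic=True))  # 540
+```
+
+In this toy case the first batch shrinks from 120 to 15 positions per sample, which is where most of the reduction comes from.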
+
+> [!TIP]
+> 📚 **Performance tips**: To learn more about efficient data processing techniques, see the [🤗 Datasets performance guide](https://huggingface.co/docs/datasets/about_arrow).
+
+Here is how we apply the tokenization function on all our datasets at once. We're using `batched=True` in our call to `map` so the function is applied to multiple elements of our dataset at once, and not on each element separately. This allows for faster preprocessing.
+
+```py
+tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
+tokenized_datasets
+```
+
+The way the 🤗 Datasets library applies this processing is by adding new fields to the datasets, one for each key in the dictionary returned by the preprocessing function:
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['attention_mask', 'idx', 'input_ids', 'label', 'sentence1', 'sentence2', 'token_type_ids'],
+        num_rows: 3668
+    })
+    validation: Dataset({
+        features: ['attention_mask', 'idx', 'input_ids', 'label', 'sentence1', 'sentence2', 'token_type_ids'],
+        num_rows: 408
+    })
+    test: Dataset({
+        features: ['attention_mask', 'idx', 'input_ids', 'label', 'sentence1', 'sentence2', 'token_type_ids'],
+        num_rows: 1725
+    })
+})
+```
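+
+The following sketch mimics, in plain Python, how a batched mapping hands a function a dictionary of lists and merges the returned columns back into each row. It is a simplified stand-in under stated assumptions, not the 🤗 Datasets implementation: `batched_map`, `to_batch`, and `count_words` are hypothetical names, and rows are ordinary dictionaries rather than Arrow-backed records.
+
+```python
+def to_batch(rows):
+    """Turn a list of row dicts into one dict of lists (a batch)."""
+    return {key: [row[key] for row in rows] for key in rows[0]}
+
+
+def count_words(batch):
+    """Hypothetical batched function: receives lists, returns a new column."""
+    return {
+        "n_words": [
+            len(s1.split()) + len(s2.split())
+            for s1, s2 in zip(batch["sentence1"], batch["sentence2"])
+        ]
+    }
+
+
+def batched_map(rows, function, batch_size=2):
+    """Apply `function` to slices of rows and merge the returned columns."""
+    out = []
+    for start in range(0, len(rows), batch_size):
+        chunk = rows[start:start + batch_size]
+        new_columns = function(to_batch(chunk))
+        for i, row in enumerate(chunk):
+            merged = dict(row)  # keep the existing fields
+            for key, values in new_columns.items():
+                merged[key] = values[i]  # add (or overwrite) one field per key
+            out.append(merged)
+    return out
+
+
+rows = [
+    {"sentence1": "a b c", "sentence2": "d e"},
+    {"sentence1": "f", "sentence2": "g h"},
+    {"sentence1": "i j", "sentence2": "k"},
+]
+result = batched_map(rows, count_words)
+print([row["n_words"] for row in result])  # [5, 3, 3]
+```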
+
+You can even use multiprocessing when applying your preprocessing function with `map()` by passing along a `num_proc` argument. We didn't do this here because the 🤗 Tokenizers library already uses multiple threads to tokenize our samples faster, but if you are not using a fast tokenizer backed by this library, this could speed up your preprocessing.
+
+Our `tokenize_function` returns a dictionary with the keys `input_ids`, `attention_mask`, and `token_type_ids`, so those three fields are added to all splits of our dataset. Note that we could also have changed existing fields if our preprocessing function returned a new value for an existing key in the dataset to which we applied `map()`.
+
+The last thing we will need to do is pad all the examples to the length of the longest element when we batch elements together, a technique we refer to as *dynamic padding*.
+
+##### Dynamic padding[[dynamic-padding]]
+
+<Youtube id="7q5NyFT8REg"/>
+
+The function that is responsible for putting together samples inside a batch is called a *collate function*. It's an argument you can pass when you build a `DataLoader`, the default being a function that will just convert your samples to PyTorch tensors and concatenate them (recursively if your elements are lists, tuples, or dictionaries). This won't be possible in our case since the inputs we have won't all be of the same size. We have deliberately postponed the padding, to only apply it as necessary on each batch and avoid having over-long inputs with a lot of padding. This will speed up training by quite a bit, but note that if you're training on a TPU it can cause problems, since TPUs prefer fixed shapes, even when that requires extra padding.
+
+> [!TIP]
+> 🚀 **Optimization guide**: For more details on optimizing training performance, including padding strategies and TPU considerations, see the [🤗 Transformers performance documentation](https://huggingface.co/docs/transformers/main/en/performance).
+
+To do this in practice, we have to define a collate function that will apply the correct amount of padding to the items of the dataset we want to batch together. Fortunately, the 🤗 Transformers library provides us with such a function via `DataCollatorWithPadding`. It takes a tokenizer when you instantiate it (to know which padding token to use, and whether the model expects padding to be on the left or on the right of the inputs) and will do everything you need:
+
+```py
+from transformers import DataCollatorWithPadding
+
+data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
+```
+
+To test this new toy, let's grab a few samples from our training set that we would like to batch together. Here, we remove the columns `idx`, `sentence1`, and `sentence2` as they won't be needed and contain strings (and we can't create tensors with strings) and have a look at the lengths of each entry in the batch:
+
+```py
+samples = tokenized_datasets["train"][:8]
+samples = {k: v for k, v in samples.items() if k not in ["idx", "sentence1", "sentence2"]}
+[len(x) for x in samples["input_ids"]]
+```
+
+```python out
+[50, 59, 47, 67, 59, 50, 62, 32]
+```
+
+No surprise, we get samples of varying length, from 32 to 67. Dynamic padding means the samples in this batch should all be padded to a length of 67, the maximum length inside the batch. Without dynamic padding, all of the samples would have to be padded to the maximum length in the whole dataset, or the maximum length the model can accept. Let's double-check that our `data_collator` is dynamically padding the batch properly:
+
+```py
+batch = data_collator(samples)
+{k: v.shape for k, v in batch.items()}
+```
+
+```python out
+{'attention_mask': torch.Size([8, 67]),
+ 'input_ids': torch.Size([8, 67]),
+ 'token_type_ids': torch.Size([8, 67]),
+ 'labels': torch.Size([8])}
+```
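+
+For intuition, the core of what a padding collate function does can be sketched without any library. This is a simplified stand-in, not the `DataCollatorWithPadding` source: `pad_collate` is a hypothetical helper, it assumes right-side padding with a pad token ID of `0`, and it returns plain lists, whereas the real collator also converts to tensors and renames `label` to `labels`.
+
+```python
+def pad_collate(samples, pad_token_id=0):
+    """Pad every list-valued field to the longest `input_ids` in the batch."""
+    max_len = max(len(ids) for ids in samples["input_ids"])
+    # Value used to fill each padded field (assumed defaults for this sketch).
+    pad_values = {
+        "input_ids": pad_token_id,
+        "attention_mask": 0,
+        "token_type_ids": 0,
+    }
+    batch = {}
+    for key, values in samples.items():
+        if key in pad_values:
+            batch[key] = [
+                v + [pad_values[key]] * (max_len - len(v)) for v in values
+            ]
+        else:
+            batch[key] = list(values)  # e.g. labels need no padding
+    return batch
+
+
+samples = {
+    "input_ids": [[101, 2023, 102], [101, 2023, 2003, 1996, 102]],
+    "attention_mask": [[1, 1, 1], [1, 1, 1, 1, 1]],
+    "token_type_ids": [[0, 0, 0], [0, 0, 0, 0, 0]],
+    "label": [1, 0],
+}
+batch = pad_collate(samples)
+print(batch["input_ids"])
+print(batch["attention_mask"])
+```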
+
+Looking good! Now that we've gone from raw text to batches our model can deal with, we're ready to fine-tune it!
+
+> [!TIP]
+> ✏️ **Try it out!** Replicate the preprocessing on the GLUE SST-2 dataset. It's a little bit different since it's composed of single sentences instead of pairs, but the rest of what we did should look the same. For a harder challenge, try to write a preprocessing function that works on any of the GLUE tasks.
+>
+> 📖 **Additional practice**: Work through the hands-on examples from the [🤗 Transformers examples](https://huggingface.co/docs/transformers/main/en/notebooks).
+
+Now that the data has been preprocessed with the 🤗 Datasets library, we are ready to train the model with the Trainer API. The next section shows how to fine-tune your model with it.
+
+## Section quiz[[section-quiz]]
+
+Test your understanding of data processing concepts:
+
+### 1. What is the main advantage of using `Dataset.map()` with `batched=True`?
+
+<Question
+	choices={[
+		{
+			text: "It uses less memory.",
+			explain: "While it can be more memory efficient, this is not the main advantage."
+		},
+		{
+			text: "It processes multiple examples at once, making tokenization much faster.",
+			explain: "Correct! Batched processing lets the fast tokenizer work on many examples at once, which significantly improves speed.",
+            correct: true
+		},
+		{
+			text: "It automatically handles padding.",
+			explain: "Batching doesn't handle padding automatically; that is done by the data collator."
+		},
+        {
+			text: "It converts the data to PyTorch tensors.",
+			explain: "Tensor conversion happens when the format is set, not during batched mapping."
+		}
+	]}
+/>
+
+### 2. Why do we use dynamic padding instead of padding all sequences to the maximum length in the dataset?
+
+<Question
+	choices={[
+		{
+			text: "Dynamic padding is required by the model architecture.",
+			explain: "No, models can handle both fixed and dynamic padding."
+		},
+		{
+			text: "It reduces computational overhead by padding only to the maximum length in each batch.",
+			explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by padding only to the batch's maximum length.",
+            correct: true
+		},
+		{
+			text: "It improves model accuracy.",
+			explain: "The padding strategy doesn't directly affect model accuracy."
+		},
+        {
+			text: "It is required when using DataCollatorWithPadding.",
+			explain: "DataCollatorWithPadding enables dynamic padding, but you can still use fixed padding if needed."
+		}
+	]}
+/>
+
+### 3. What does the `token_type_ids` field represent in BERT tokenization?
+
+<Question
+	choices={[
+		{
+			text: "The position of each token in the sequence.",
+			explain: "That describes position embeddings, not token_type_ids."
+		},
+		{
+			text: "Which sentence each token belongs to when processing sentence pairs.",
+			explain: "Correct! token_type_ids distinguish the first sentence (0) from the second sentence (1) in sentence-pair tasks.",
+            correct: true
+		},
+		{
+			text: "The attention mask for each token.",
+			explain: "The attention mask is a separate field that indicates which tokens to attend to."
+		},
+        {
+			text: "The vocabulary ID of each token.",
+			explain: "That is the input_ids field, not token_type_ids."
+		}
+	]}
+/>
+
+### 4. When loading a dataset with `load_dataset('glue', 'mrpc')`, what does the second argument specify?
+
+<Question
+	choices={[
+		{
+			text: "The version of the dataset to load.",
+			explain: "Different parameters are used to specify a version."
+		},
+		{
+			text: "The specific task or subset within the GLUE benchmark.",
+			explain: "Correct! MRPC is one of the specific tasks within the larger GLUE benchmark collection.",
+            correct: true
+		},
+		{
+			text: "The split of the dataset (train/validation/test).",
+			explain: "Splits are selected with the separate split argument or accessed after loading; the second positional argument is not the split."
+		},
+        {
+			text: "The format in which the data is returned.",
+			explain: "The format is set after loading, using the set_format() method."
+		}
+	]}
+/>
+
+### 5. What is the purpose of removing columns like 'sentence1' and 'sentence2' before training?
+
+<Question
+	choices={[
+		{
+			text: "To save memory during training.",
+			explain: "While it does save a little memory, this is not the main reason."
+		},
+		{
+			text: "The model doesn't expect these raw text columns, and they would cause errors.",
+			explain: "Correct! Models expect numerical tensors, not raw text strings. Keeping the text columns would cause errors.",
+            correct: true
+		},
+		{
+			text: "These columns aren't needed for evaluation.",
+			explain: "True, but the main reason is that the model cannot process raw text."
+		},
+        {
+			text: "It significantly improves training speed.",
+			explain: "Any speed gain is minor compared with avoiding errors from incompatible data types."
+		}
+	]}
+/>
+
+> [!TIP]
+> 💡 **Key takeaways:**
+> - Use `Dataset.map()` with `batched=True` for significantly faster preprocessing.
+> - Dynamic padding with `DataCollatorWithPadding` is more efficient than fixed-length padding.
+> - Always preprocess your data to match what your model expects (numerical tensors, correct column names).
+> - The 🤗 Datasets library provides powerful tools for efficient data processing at scale.
+
+## Glossary
+
+*   **Inference**: The process of using a trained artificial intelligence (AI) model to produce predictions or outputs from input data.
+*   **Sequence Classifier**: An AI model trained to assign a text sequence to one of a set of predefined categories.
+*   **Batch**: A group of several inputs processed together at the same time.
+*   **`torch.optim.AdamW`**: The AdamW optimizer in PyTorch, used to update a model's parameters during training.
+*   **`torch.tensor`**: A PyTorch function that creates a tensor, the multi-dimensional array PyTorch uses to store data.
+*   **`model.parameters()`**: A method that returns the model's trainable parameters (weights and biases).
+*   **`loss`**: A value measuring the difference between the model's predictions and the actual labels.
+*   **`loss.backward()`**: A PyTorch method that performs backpropagation, computing gradients for the model's parameters.
+*   **`optimizer.step()`**: An optimizer method that updates the model's parameters using the computed gradients.
+*   **MRPC (Microsoft Research Paraphrase Corpus) Dataset**: A dataset introduced by William B. Dolan and Chris Brockett in which each sentence pair is labeled as being a paraphrase (same meaning) or not.
+*   **Paraphrase**: Words or sentences that have the same meaning.
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **GLUE Benchmark**: An academic benchmark used to measure the performance of ML models across 10 text classification tasks.
+*   **🤗 Datasets Library**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **`load_dataset()` Function**: A function from the 🤗 Datasets library used to download and cache datasets.
+*   **`DatasetDict` Object**: An object that stores several datasets, such as the training, validation, and test sets, in a dictionary-like structure.
+*   **Training Set**: The part of the dataset used to train the model.
+*   **Validation Set**: The part of the dataset used to evaluate the model's performance during training.
+*   **Test Set**: The part of the dataset used to measure the model's final performance.
+*   **`HF_HOME` Environment Variable**: An environment variable that sets where Hugging Face libraries store their cache files.
+*   **`raw_datasets["train"]`**: Accesses the training set of the `DatasetDict` object.
+*   **`raw_train_dataset.features`**: A property that returns the types and metadata of the dataset's columns.
+*   **`ClassLabel`**: A feature type used by the 🤗 Datasets library to handle categorical labels.
+*   **Tokenizer**: A tool or process that splits text (or other data) into tokens so that AI models can process it.
+*   **`AutoTokenizer`**: A class in the 🤗 Transformers library that automatically loads the matching tokenizer from a model name.
+*   **Pretrained**: Describes a model that has already been trained on a large amount of data.
+*   **`input_ids`**: The unique numerical IDs of the tokens produced by the tokenizer.
+*   **`attention_mask`**: A binary mask that tells the model which tokens to attend to and which (padding) tokens to ignore.
+*   **`token_type_ids`**: IDs indicating, in sentence-pair tasks, which sentence (first or second) each token of the input sequence belongs to.
+*   **`convert_ids_to_tokens()` Method**: A tokenizer method that converts input IDs back into tokens.
+*   **`[CLS]` Token**: A special token that marks the start of a sequence in BERT.
+*   **`[SEP]` Token**: A special token used in BERT to mark the end of a sentence or to separate two sentences.
+*   **Masked Language Modeling Objective**: A pretraining task used for models like BERT in which some words in a sentence are hidden and the model must predict them.
+*   **Next Sentence Prediction**: A pretraining task used for models like BERT in which the model is given two sentences and must predict whether the second follows the first.
+*   **`Dataset.map()` Method**: A method in the 🤗 Datasets library that applies a function to each element, or each batch of elements, of a dataset.
+*   **`batched=True`**: An argument to `map()` that applies the function to multiple elements of the dataset at once.
+*   **Rust**: A systems programming language used to build high-performance applications.
+*   **🤗 Tokenizers Library**: A Hugging Face library written in Rust that provides fast, efficient tokenization.
+*   **Dynamic Padding**: Padding the samples in a batch only up to the length of the longest sample in that batch.
+*   **Collate Function**: A function used by a `DataLoader` to put samples together into a batch.
+*   **`DataLoader`**: A PyTorch utility class that loads data from a dataset batch by batch.
+*   **PyTorch Tensors**: The multi-dimensional arrays that represent data in PyTorch.
+*   **Recursively**: Describes a process that applies itself again to nested structures, such as lists inside lists.
+*   **TPU (Tensor Processing Unit)**: A processor designed by Google specifically for AI/ML workloads.
+*   **`DataCollatorWithPadding`**: A class from the 🤗 Transformers library that assembles samples into a batch using dynamic padding.
+*   **`num_proc` Argument**: An argument to `map()` that enables multiprocessing to speed up preprocessing.
+*   **GLUE SST-2 Dataset**: A sentiment analysis task in the GLUE benchmark made up of single sentences.
+*   **Trainer API**: A high-level API in the 🤗 Transformers library designed for training models efficiently.
+*   **Apache Arrow**: An in-memory data format that speeds up data exchange between data analytics applications.
\ No newline at end of file
diff --git a/chapters/my/chapter3/3.mdx b/chapters/my/chapter3/3.mdx
new file mode 100644
index 000000000..65029db63
--- /dev/null
+++ b/chapters/my/chapter3/3.mdx
@@ -0,0 +1,443 @@
+<FrameworkSwitchCourse {fw} />
+
+# Fine-tuning a model with the Trainer API[[fine-tuning-a-model-with-the-trainer-api]]
+
+<CourseFloatingBanner chapter={3}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter3/section3.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter3/section3.ipynb"},
+]} />
+
+<Youtube id="nvBXf7s7vTI"/>
+
+🤗 Transformers provides a `Trainer` class to help you fine-tune any of the pretrained models it provides on your dataset with modern best practices. Once you've done all the data preprocessing work in the last section, you have just a few steps left to define the `Trainer`. The hardest part is likely to be preparing the environment to run `Trainer.train()`, as it will run very slowly on a CPU. If you don't have a GPU set up, you can get access to free GPUs or TPUs on [Google Colab](https://colab.research.google.com/).
+
+> [!TIP]
+> 📚 **Training Resources**: Before diving into training, read through the comprehensive [🤗 Transformers training guide](https://huggingface.co/docs/transformers/main/en/training) and explore the practical examples in the [fine-tuning cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
+
+The code examples below assume you have already executed the examples in the previous section. Here is a short summary recapping what you need:
+
+```py
+from datasets import load_dataset
+from transformers import AutoTokenizer, DataCollatorWithPadding
+
+raw_datasets = load_dataset("glue", "mrpc")
+checkpoint = "bert-base-uncased"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+
+def tokenize_function(example):
+    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
+
+
+tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
+data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
+```
+
+### Training[[training]]
+
+The first step before we can define our `Trainer` is to define a `TrainingArguments` class that will contain all the hyperparameters the `Trainer` will use for training and evaluation. The only argument you have to provide is a directory where the trained model will be saved, as well as the checkpoints along the way. For everything else you can keep the defaults, which should work well for a basic fine-tuning.
+
+```py
+from transformers import TrainingArguments
+
+training_args = TrainingArguments("test-trainer")
+```
+
+If you want to automatically upload your model to the Hub during training, pass `push_to_hub=True` in the `TrainingArguments`. We will learn more about this in [Chapter 4](/course/chapter4/3).
+
+> [!TIP]
+> 🚀 **Advanced Configuration**: For detailed information on all available training arguments and optimization strategies, see the [TrainingArguments documentation](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) and the [training configuration cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
+
+The second step is to define our model. As in the [previous chapter](/course/chapter2), we will use the `AutoModelForSequenceClassification` class, with two labels:
+
+```py
+from transformers import AutoModelForSequenceClassification
+
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
+```
+
+You will notice that you get a warning after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead. The warnings indicate that some weights were not used (the ones corresponding to the dropped pretraining head) and that some others were randomly initialized (the ones for the new head). The message ends by encouraging you to train the model, which is exactly what we are going to do now.
+
+Once we have our model, we can define a `Trainer` by passing it all the objects constructed so far: the `model`, the `training_args`, the training and validation datasets, our `data_collator`, and our `processing_class`. The `processing_class` parameter is a newer addition that tells the `Trainer` which tokenizer to use for processing:
+
+```py
+from transformers import Trainer
+
+trainer = Trainer(
+    model,
+    training_args,
+    train_dataset=tokenized_datasets["train"],
+    eval_dataset=tokenized_datasets["validation"],
+    data_collator=data_collator,
+    processing_class=tokenizer,
+)
+```
+
+When you pass a tokenizer as the `processing_class`, the default `data_collator` used by the `Trainer` will be a `DataCollatorWithPadding`. You can therefore skip the line `data_collator=data_collator` in this case, but we include it here to show you this important part of the processing pipeline.
+
+> [!TIP]
+> 📖 **Learn More**: For comprehensive details on the Trainer class and its parameters, visit the [Trainer API documentation](https://huggingface.co/docs/transformers/main/en/main_classes/trainer) and explore advanced usage patterns in the [training cookbook recipes](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu).
+
+To fine-tune the model on our dataset, we just have to call the `train()` method of our `Trainer`:
+
+```py
+trainer.train()
+```
+
+This will start the fine-tuning (which should take a couple of minutes on a GPU) and report the training loss every 500 steps. It won't, however, tell you how well (or badly) your model is performing. This is because:
+
+1.  We didn't tell the `Trainer` to evaluate during training by setting `eval_strategy` in `TrainingArguments` to either `"steps"` (evaluate every `eval_steps`) or `"epoch"` (evaluate at the end of each epoch).
+2.  We didn't provide the `Trainer` with a `compute_metrics()` function to calculate a metric during that evaluation (without one, the evaluation would just print the loss, which is not a very intuitive number).
+
+### Evaluation[[evaluation]]
+
+Let's see how we can build a useful `compute_metrics()` function and use it the next time we train. The function must take an `EvalPrediction` object (a named tuple with a `predictions` field and a `label_ids` field) and return a dictionary mapping strings to floats (the strings being the names of the metrics returned, and the floats their values). To get some predictions from our model, we can use the `Trainer.predict()` method:
+
+```py
+predictions = trainer.predict(tokenized_datasets["validation"])
+print(predictions.predictions.shape, predictions.label_ids.shape)
+```
+
+```python out
+(408, 2) (408,)
+```
+
+The output of the `predict()` method is another named tuple with three fields: `predictions`, `label_ids`, and `metrics`. The `metrics` field will contain the loss on the dataset passed, as well as some time metrics (how long it took to predict, in total and on average). Once we complete our `compute_metrics()` function and pass it to the `Trainer`, that field will also contain the metrics returned by `compute_metrics()`.
+
+As you can see, `predictions` is a two-dimensional array with shape 408 x 2 (408 being the number of elements in the dataset we passed to `predict()`). Those are the logits for each element of that dataset (as you saw in the [previous chapter](/course/chapter2), all Transformer models return logits). To transform them into predictions that we can compare to our labels, we need to take the index with the maximum value on the second axis:
+
+```py
+import numpy as np
+
+preds = np.argmax(predictions.predictions, axis=-1)
+```
+
+We can now compare those `preds` to the labels. To build our `compute_metrics()` function, we will rely on the metrics from the 🤗 [Evaluate](https://github.com/huggingface/evaluate/) library. We can load the metrics associated with the MRPC dataset as easily as we loaded the dataset, this time with the `evaluate.load()` function. The object returned has a `compute()` method we can use to do the metric calculation:
+
+```py
+import evaluate
+
+metric = evaluate.load("glue", "mrpc")
+metric.compute(predictions=preds, references=predictions.label_ids)
+```
+
+```python out
+{'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}
+```
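+
+To make the two numbers above less opaque, here is a minimal sketch of how accuracy and F1 can be computed by hand in plain Python. The helper name `accuracy_and_f1` and the toy lists are illustrative assumptions, not part of 🤗 Evaluate, and the library's own implementation may differ in details such as edge-case handling.
+
+```py
+def accuracy_and_f1(preds, labels, positive=1):
+    """Hypothetical helper: accuracy and F1 for binary labels."""
+    pairs = list(zip(preds, labels))
+    correct = sum(p == l for p, l in pairs)
+    tp = sum(p == positive and l == positive for p, l in pairs)
+    fp = sum(p == positive and l != positive for p, l in pairs)
+    fn = sum(p != positive and l == positive for p, l in pairs)
+    # Guard against empty denominators when a class is never predicted.
+    precision = tp / (tp + fp) if tp + fp else 0.0
+    recall = tp / (tp + fn) if tp + fn else 0.0
+    # F1 is the harmonic mean of precision and recall.
+    if precision + recall:
+        f1 = 2 * precision * recall / (precision + recall)
+    else:
+        f1 = 0.0
+    return {"accuracy": correct / len(labels), "f1": f1}
+
+
+# Made-up predictions and labels, only to exercise the helper.
+toy_preds = [1, 0, 1, 1, 0, 1]
+toy_labels = [1, 0, 0, 1, 1, 1]
+scores = accuracy_and_f1(toy_preds, toy_labels)
+print(scores)
+```
+
+On this toy input four of six predictions are correct, and precision and recall are both 3/4, so the F1 score is also 3/4.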
+
+> [!TIP]
+> Learn about different evaluation metrics and strategies in the [🤗 Evaluate documentation](https://huggingface.co/docs/evaluate/).
+
+The exact results you get may vary, because the random initialization of the model head can change the metrics it reaches. Here, our model has an accuracy of 85.78% and an F1 score of 89.97% on the validation set. Those are the two metrics used to evaluate results on the MRPC dataset for the GLUE benchmark. The table in the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf) reports an F1 score of 88.9 for the base model. Note that the checkpoint loaded above is `bert-base-uncased`, so a cased/uncased difference does not explain the gap; the paper's figure is a GLUE test-set score, while ours is measured on the validation set, so the two numbers are not directly comparable.
+
+Wrapping everything together, we get our `compute_metrics()` function:
+
+```py
+def compute_metrics(eval_preds):
+    metric = evaluate.load("glue", "mrpc")
+    logits, labels = eval_preds
+    predictions = np.argmax(logits, axis=-1)
+    return metric.compute(predictions=predictions, references=labels)
+```
+
+To see it used in action to report metrics at the end of each epoch, here is how we define a new `Trainer` with this `compute_metrics()` function:
+
+```py
+training_args = TrainingArguments("test-trainer", eval_strategy="epoch")
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
+
+trainer = Trainer(
+    model,
+    training_args,
+    train_dataset=tokenized_datasets["train"],
+    eval_dataset=tokenized_datasets["validation"],
+    data_collator=data_collator,
+    processing_class=tokenizer,
+    compute_metrics=compute_metrics,
+)
+```
+
+Note that we create a new `TrainingArguments` with its `eval_strategy` set to `"epoch"` and a new model. Otherwise, we would just be continuing the training of the model we have already trained. To launch a new training run, we execute:
+
+```py
+trainer.train()
+```
+
+This time, it will report the validation loss and metrics at the end of each epoch on top of the training loss. Again, the exact accuracy/F1 score you reach might differ a bit from what we found, because of the random head initialization of the model, but it should be in the same ballpark.
+
+### Advanced Training Features[[advanced-training-features]]
+
+The `Trainer` comes with many built-in features that make modern deep learning best practices accessible:
+
+**Mixed Precision Training**: Use `fp16=True` in your training arguments for faster training and reduced memory usage:
+
+```py
+training_args = TrainingArguments(
+    "test-trainer",
+    eval_strategy="epoch",
+    fp16=True,  # Enable mixed precision
+)
+```
+
+**Gradient Accumulation**: To get the effect of a larger batch size when GPU memory is limited:
+
+```py
+training_args = TrainingArguments(
+    "test-trainer",
+    eval_strategy="epoch",
+    per_device_train_batch_size=4,
+    gradient_accumulation_steps=4,  # Effective batch size = 4 * 4 = 16
+)
+```
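+
+The arithmetic in the comment above can be checked with a small, framework-free sketch. It assumes a one-parameter linear model and a mean-squared-error loss (both chosen only for illustration; `mse_grad` is a hypothetical helper), and shows that summing four micro-batch gradients, each scaled by `1 / accumulation_steps`, reproduces the gradient of a single batch of 16. This mirrors the idea behind gradient accumulation, not the `Trainer`'s internal code.
+
+```py
+def mse_grad(w, xs, ys):
+    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
+    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
+
+
+# Toy data: 16 examples drawn from y = 3x + 1.
+xs = [float(i) for i in range(16)]
+ys = [3.0 * x + 1.0 for x in xs]
+w = 0.5
+micro_batch_size, accumulation_steps = 4, 4
+
+# Gradient from one pass over the full batch of 16 examples.
+full_grad = mse_grad(w, xs, ys)
+
+# Gradient accumulated over four micro-batches of 4 examples each.
+accumulated = 0.0
+for step in range(accumulation_steps):
+    start = step * micro_batch_size
+    chunk_x = xs[start:start + micro_batch_size]
+    chunk_y = ys[start:start + micro_batch_size]
+    # Scale each micro-batch gradient so the sum is an average over all 16.
+    accumulated += mse_grad(w, chunk_x, chunk_y) / accumulation_steps
+
+print(full_grad, accumulated)
+```
+
+The two values agree up to floating-point rounding, which is why only the memory footprint of a 4-example batch is needed to take a 16-example step.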
+
+**Learning Rate Scheduling**: The Trainer uses linear decay by default, but you can customize this:
+
+```py
+training_args = TrainingArguments(
+    "test-trainer",
+    eval_strategy="epoch",
+    learning_rate=2e-5,
+    lr_scheduler_type="cosine",  # Try different schedulers
+)
+```
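+
+To get a feel for how the two schedules differ, the sketch below evaluates simplified linear and cosine decay formulas in plain Python. The functions `linear_lr` and `cosine_lr` are hypothetical helpers written for this illustration; they ignore warmup and are not the library's scheduler implementations.
+
+```py
+import math
+
+
+def linear_lr(step, total_steps, base_lr):
+    """Linear decay from base_lr down to 0 (no warmup in this sketch)."""
+    return base_lr * max(0.0, (total_steps - step) / total_steps)
+
+
+def cosine_lr(step, total_steps, base_lr):
+    """Half-cosine decay from base_lr down to 0 (no warmup in this sketch)."""
+    progress = min(step / total_steps, 1.0)
+    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
+
+
+total_steps, base_lr = 1000, 2e-5
+for step in (0, 250, 500, 750, 1000):
+    print(step, linear_lr(step, total_steps, base_lr), cosine_lr(step, total_steps, base_lr))
+```
+
+Both schedules start at `base_lr` and end at zero. The cosine curve stays higher during the first half of training and drops faster in the second half.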
+
+> [!TIP]
+> 🎯 **Performance Optimization**: For more advanced training techniques, including distributed training, memory optimization, and hardware-specific optimizations, explore the [🤗 Transformers performance guide](https://huggingface.co/docs/transformers/main/en/performance).
+
+The `Trainer` works out of the box on multiple GPUs or TPUs and provides many options for distributed training. We will go over everything it supports in Chapter 10.
+
+This concludes the introduction to fine-tuning with the `Trainer` API. An example of doing this for most common NLP tasks will be given in [Chapter 7](/course/chapter7), but for now let's look at how to do the same thing with a pure PyTorch training loop.
+
+> [!TIP]
+> 📝 **More Examples**: Check out the comprehensive collection of [🤗 Transformers notebooks](https://huggingface.co/docs/transformers/main/en/notebooks).
+
+## Section Quiz[[section-quiz]]
+
+Test your understanding of the Trainer API and fine-tuning concepts:
+
+### 1. What is the purpose of the `processing_class` parameter in the Trainer?
+
+<Question
+	choices={[
+		{
+			text: "It specifies which model architecture to use.",
+			explain: "The model architecture is specified when loading the model, not in the Trainer."
+		},
+		{
+			text: "It tells the Trainer which tokenizer to use for processing data.",
+			explain: "The processing_class parameter is a modern addition that helps the Trainer know which tokenizer to use.",
+            correct: true
+		},
+		{
+			text: "It determines the batch size for training.",
+			explain: "Batch size is set in TrainingArguments, not via processing_class."
+		},
+        {
+			text: "It controls the evaluation frequency.",
+			explain: "Evaluation frequency is controlled by eval_strategy in TrainingArguments."
+		}
+	]}
+/>
+
+### 2. Which TrainingArguments parameter controls how often evaluation occurs during training?
+
+<Question
+	choices={[
+		{
+			text: "eval_frequency",
+			explain: "There is no eval_frequency parameter in TrainingArguments."
+		},
+		{
+			text: "eval_strategy",
+			explain: "Setting eval_strategy to 'epoch', 'steps', or 'no' controls when evaluation happens.",
+            correct: true
+		},
+		{
+			text: "evaluation_steps",
+			explain: "The related parameter is named eval_steps. It sets the number of steps between evaluations, but eval_strategy decides whether and when evaluation runs at all."
+		},
+        {
+			text: "do_eval",
+			explain: "do_eval only indicates whether evaluation should run; it does not set how often. The timing is controlled by eval_strategy."
+		}
+	]}
+/>
+
+### 3. What does `fp16=True` in TrainingArguments enable?
+
+<Question
+	choices={[
+		{
+			text: "16-bit integer precision for faster training.",
+			explain: "fp16 refers to floating-point precision, not integer precision."
+		},
+		{
+			text: "Mixed precision training with 16-bit floating-point numbers for faster training and reduced memory usage.",
+			explain: "Mixed precision training uses 16-bit floats for the forward pass and 32-bit floats for gradients, improving speed and reducing memory usage.",
+            correct: true
+		},
+		{
+			text: "Training for exactly 16 epochs.",
+			explain: "fp16 has nothing to do with the number of epochs."
+		},
+        {
+			text: "Using 16 GPUs for distributed training.",
+			explain: "The number of GPUs is not controlled by the fp16 parameter."
+		}
+	]}
+/>
+
+### 4. What is the role of the `compute_metrics` function in the Trainer?
+
+<Question
+	choices={[
+		{
+			text: "It calculates the loss during training.",
+			explain: "Loss calculation is handled automatically by the model, not by compute_metrics."
+		},
+		{
+			text: "It converts logits to predictions and calculates evaluation metrics such as accuracy and F1.",
+			explain: "compute_metrics takes predictions and labels and returns metrics for evaluation.",
+            correct: true
+		},
+		{
+			text: "It determines which optimizer to use.",
+			explain: "Optimizer selection is not handled by compute_metrics."
+		},
+        {
+			text: "It preprocesses the training data.",
+			explain: "Data preprocessing is done before training, not by compute_metrics during evaluation."
+		}
+	]}
+/>
+
+### 5. What happens when you don't provide an `eval_dataset` to the Trainer?
+
+<Question
+	choices={[
+		{
+			text: "Training will fail with an error.",
+			explain: "Training can proceed without an eval_dataset, but you won't get evaluation metrics."
+		},
+		{
+			text: "The Trainer will automatically split the training data for evaluation.",
+			explain: "The Trainer does not create validation splits automatically."
+		},
+		{
+			text: "You won't get evaluation metrics during training, but training will still work.",
+			explain: "Evaluation is optional: you can train without it, but you won't see validation metrics.",
+            correct: true
+		},
+        {
+			text: "The model will use the training data for evaluation.",
+			explain: "The Trainer won't automatically use the training data for evaluation; it simply won't evaluate."
+		}
+	]}
+/>
+
+### 6. What is gradient accumulation and how do you enable it?
+
+<Question
+	choices={[
+		{
+			text: "It saves gradients to disk, enabled with save_gradients=True.",
+			explain: "Gradient accumulation does not involve saving gradients to disk."
+		},
+		{
+			text: "It accumulates gradients over multiple batches before updating, enabled with gradient_accumulation_steps.",
+			explain: "This lets you simulate larger batch sizes by accumulating gradients over several forward passes.",
+            correct: true
+		},
+		{
+			text: "It speeds up gradient computation, enabled automatically with fp16.",
+			explain: "While fp16 can speed up training, gradient accumulation is a separate technique."
+		},
+        {
+			text: "It prevents gradient overflow, enabled with gradient_clipping=True.",
+			explain: "That describes gradient clipping, not gradient accumulation."
+		}
+	]}
+/>
+
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - The `Trainer` API is a high-level interface that handles most of the complexity of training.
+> - Use `processing_class` to specify your tokenizer so your data is handled correctly.
+> - `TrainingArguments` controls all aspects of training: learning rate, batch size, evaluation strategy, and optimizations.
+> - `compute_metrics` enables custom evaluation metrics beyond the training loss.
+> - Modern features such as mixed precision (`fp16=True`) and gradient accumulation can significantly improve training efficiency.
+
+## Glossary
+
+*   **Fine-tuning**: Further training a pretrained model on a small amount of data for a specific task.
+*   **Trainer API**: A high-level API (Application Programming Interface) from the Hugging Face Transformers library designed for training models efficiently.
+*   **Pretrained Models**: AI (Artificial Intelligence) models that have already been trained on large amounts of data.
+*   **Dataset**: A collection of data used to train AI models.
+*   **Preprocessing**: The process of converting data into a form the model can understand and work with.
+*   **GPU (Graphics Processing Unit)**: A processor designed for graphics workloads that is also widely used to accelerate AI/ML (Artificial Intelligence/Machine Learning) tasks.
+*   **CPU (Central Processing Unit)**: The main processor of a computer, which handles general-purpose work.
+*   **TPU (Tensor Processing Unit)**: A processor designed by Google specifically for AI/ML workloads.
+*   **Google Colab**: A cloud-based Jupyter Notebook environment provided by Google that runs Python code from a web browser and offers free GPU/TPU access.
+*   **`datasets` (Library)**: A Hugging Face library for easily accessing, managing, and using datasets for training AI models.
+*   **`transformers` (Library)**: A Hugging Face library for building and using advanced Transformer-based models in fields such as Natural Language Processing (NLP), computer vision, and audio processing.
+*   **`AutoTokenizer`**: A class in the Hugging Face Transformers library that automatically loads the tokenizer matching a given model name.
+*   **`DataCollatorWithPadding`**: A class from the Hugging Face Transformers library that assembles samples into a batch using dynamic padding.
+*   **`load_dataset()` Function**: The function from the Hugging Face Datasets library used to download and cache datasets.
+*   **Checkpoint**: A snapshot of a model's weights and configuration saved at a given point in time.
+*   **`tokenize_function()`**: The function that converts text into tokens.
+*   **`raw_datasets`**: The object holding the datasets before preprocessing.
+*   **`tokenized_datasets`**: The object holding the datasets after tokenization.
+*   **`batched=True`**: An argument of the `map()` method that applies the function to several elements of the dataset at once.
+*   **Hyperparameters**: Parameters that the user sets when training a model (for example, learning rate and batch size).
+*   **TrainingArguments Class**: The class used to set the hyperparameters and other settings needed when training a model with the Trainer.
+*   **Evaluation**: Measuring the performance of a model.
+*   **`test-trainer`**: The name of the directory where the trained model and its checkpoints are saved.
+*   **`push_to_hub=True`**: An argument that makes the model upload automatically to the Hugging Face Hub during training.
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing models, datasets, and demos.
+*   **`AutoModelForSequenceClassification` Class**: A class in the Hugging Face Transformers library that automatically loads a pretrained model for sequence classification.
+*   **`num_labels`**: The number of labels (classes) for a classification task.
+*   **Head (Model Head)**: An additional component (one or two layers) placed on top of the main body of a Transformer model that adapts the model's outputs to a specific task. For example, the head for sequence classification produces logits.
+*   **Randomly Initialized**: Describes model parameters (weights) that are given random values at the start.
+*   **`Trainer` Class**: The class from the 🤗 Transformers library used to train and evaluate models.
+*   **`train_dataset`**: The training set passed to the Trainer.
+*   **`eval_dataset`**: The validation set passed to the Trainer.
+*   **`data_collator`**: The function that assembles samples into a batch.
+*   **`processing_class`**: The parameter that tells the Trainer which tokenizer to use for data processing.
+*   **`trainer.train()` Method**: The method of the Trainer class that fine-tunes the model.
+*   **Training Loss**: A value measuring the difference between the model's predictions and the true labels during training.
+*   **`eval_strategy`**: The parameter in TrainingArguments that sets when and how often evaluation runs (for example, `"epoch"` or `"steps"`).
+*   **`eval_steps`**: The number of training steps between two evaluations.
+*   **`epoch`**: One complete pass of the model over the entire training dataset.
+*   **`compute_metrics()` Function**: The function passed to the Trainer to calculate metrics (for example, accuracy and F1 score) during evaluation.
+*   **`Trainer.predict()` Method**: The method of the Trainer class that returns the model's predictions on a dataset.
+*   **`EvalPrediction` Object**: The named tuple that `compute_metrics()` receives, with a `predictions` field and a `label_ids` field.
+*   **`predictions` Field**: The field holding the model's predictions (here, the logits).
+*   **`label_ids` Field**: The field holding the true label IDs.
+*   **`metrics` Field**: The field of the `predict()` output holding the loss, some time metrics, and any metrics returned by `compute_metrics()`.
+*   **Validation Loss**: A value measuring the difference between the model's predictions and the true labels on the validation set.
+*   **Logits**: The raw, unnormalized scores produced by the last layer of the model. They can be converted into probabilities with a function such as softmax.
+*   **`numpy` (np)**: A Python library for numerical computation.
+*   **`np.argmax()`**: A NumPy function that returns the index of the maximum value along a given axis of an array.
+*   **🤗 Evaluate Library**: A Hugging Face library that provides a wide range of metrics for measuring the performance of machine learning models.
+*   **`evaluate.load()` Function**: The function from the 🤗 Evaluate library used to load an evaluation metric.
+*   **`metric.compute()` Method**: The method of a loaded metric object that calculates the metric value from predictions and references (the true labels).
+*   **Accuracy**: A metric measuring the proportion of samples that are predicted correctly.
+*   **F1 Score**: A metric computed as the harmonic mean of precision and recall.
+    *   **Precision**: Among the samples the model predicted as positive, the fraction that are actually positive.
+    *   **Recall**: Among the samples that are actually positive, the fraction the model correctly predicted as positive.
+    *   **Harmonic Mean**: The reciprocal of the average of the reciprocals of a set of numbers.
+*   **Mixed Precision Training**: Training a model with a mix of 16-bit floating-point numbers (fp16) and 32-bit floating-point numbers (fp32). This speeds up training and reduces memory usage.
+*   **`fp16=True`**: The parameter set in TrainingArguments to enable mixed precision training.
+*   **Gradient Accumulation**: Accumulating gradients over several batches before updating the weights, in order to simulate a larger batch size when GPU memory is limited.
+*   **`per_device_train_batch_size`**: The training batch size for each device (for example, each GPU).
+*   **`gradient_accumulation_steps`**: The number of steps over which gradients are accumulated before an update.
+*   **Learning Rate Scheduling**: A way of changing the learning rate over the course of training.
+*   **Linear Decay**: A schedule that lowers the learning rate linearly over time.
+*   **`learning_rate`**: The step size used when updating the model's weights during training.
+*   **`lr_scheduler_type="cosine"`**: Selects the type of learning rate scheduler, here cosine decay.
+*   **Distributed Training**: Training a model on several machines or devices at the same time.
+*   **Pure PyTorch Training Loop**: Writing the training code yourself with the basic building blocks of PyTorch instead of using the Trainer API.
\ No newline at end of file
diff --git a/chapters/my/chapter3/4.mdx b/chapters/my/chapter3/4.mdx
new file mode 100644
index 000000000..ce8b440dd
--- /dev/null
+++ b/chapters/my/chapter3/4.mdx
@@ -0,0 +1,626 @@
+# A full training loop[[a-full-training]]
+
+<CourseFloatingBanner chapter={3}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter3/section4.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter3/section4.ipynb"},
+]} />
+
+<Youtube id="Dh9CL8fyG80"/>
+
+Now we'll see how to achieve the same results as in the last section without using the `Trainer` class, by implementing a training loop from scratch with modern PyTorch best practices. Again, we assume you have done the data processing in section 2. Here is a short summary covering everything you will need:
+
+> [!TIP]
+> 🏗️ **Training from Scratch**: This section builds on the previous content. For comprehensive guidance on PyTorch training loops and best practices, check out the [🤗 Transformers training documentation](https://huggingface.co/docs/transformers/main/en/training#train-in-native-pytorch) and the [custom training cookbook](https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu#model).
+
+```py
+from datasets import load_dataset
+from transformers import AutoTokenizer, DataCollatorWithPadding
+
+raw_datasets = load_dataset("glue", "mrpc")
+checkpoint = "bert-base-uncased"
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+
+def tokenize_function(example):
+    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
+
+
+tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
+data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
+```
+
+### Prepare for training[[prepare-for-training]]
+
+Before actually writing our training loop, we will need to define a few objects. The first ones are the dataloaders we will use to iterate over batches. But before we can define those dataloaders, we need to apply a bit of postprocessing to our `tokenized_datasets`, to take care of some things that the `Trainer` did for us automatically. Specifically, we need to:
+
+- model က မမျှော်လင့်ထားသော (ဥပမာ `sentence1` နှင့် `sentence2` columns ကဲ့သို့) values များနှင့် ကိုက်ညီသော columns များကို ဖယ်ရှားပါ။
+- `label` column ကို `labels` ဟု ပြန်လည်အမည်ပြောင်းပါ (ဘာလို့လဲဆိုတော့ model က argument ကို `labels` ဟု အမည်ပေးထားတာကို မျှော်လင့်ထားလို့ပါ)။
+- datasets ၏ format ကို PyTorch tensors များကို ပြန်ပို့မည့်အစား lists များကို ပြန်ပို့ရန် သတ်မှတ်ပါ။
+
+ကျွန်တော်တို့၏ `tokenized_datasets` တွင် ထိုအဆင့်တစ်ခုစီအတွက် method တစ်ခုစီ ရှိသည်။
+
+```py
+tokenized_datasets = tokenized_datasets.remove_columns(["sentence1", "sentence2", "idx"])
+tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
+tokenized_datasets.set_format("torch")
+tokenized_datasets["train"].column_names
+```
+
+We can then check that the result only has columns that our model will accept:
+
+```python
+["attention_mask", "input_ids", "labels", "token_type_ids"]
+```
+
+Now that this is done, we can easily define our dataloaders:
+
+```py
+from torch.utils.data import DataLoader
+
+train_dataloader = DataLoader(
+    tokenized_datasets["train"], shuffle=True, batch_size=8, collate_fn=data_collator
+)
+eval_dataloader = DataLoader(
+    tokenized_datasets["validation"], batch_size=8, collate_fn=data_collator
+)
+```
+
+To quickly check that there is no mistake in the data processing, we can inspect a batch like this:
+
+```py
+for batch in train_dataloader:
+    break
+{k: v.shape for k, v in batch.items()}
+```
+
+```python out
+{'attention_mask': torch.Size([8, 65]),
+ 'input_ids': torch.Size([8, 65]),
+ 'labels': torch.Size([8]),
+ 'token_type_ids': torch.Size([8, 65])}
+```
+
+Note that the actual shapes will probably be slightly different for you, since we set `shuffle=True` for the training dataloader and we pad to the maximum length inside each batch.
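
The effect of padding to the longest sequence in each batch can be sketched in pure Python. The helper `pad_to_longest` below is hypothetical and only mimics the idea behind dynamic padding (it is not the `DataCollatorWithPadding` implementation), and the token ids are arbitrary placeholder values.

```python
def pad_to_longest(batch, pad_id=0):
    """Pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def shape(rows):
    """Return (number of rows, row length) for a rectangular list of lists."""
    return (len(rows), len(rows[0]))


# Two batches drawn from the same data end up with different widths.
batch_a = pad_to_longest([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
batch_b = pad_to_longest([[101, 102], [101, 2023, 102]])

shape_a = shape(batch_a["input_ids"])
shape_b = shape(batch_b["input_ids"])
print(shape_a, shape_b)
```

Because the width depends on which samples land in a batch, shuffling changes the shapes from one run to the next.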
+
+Now that we're completely finished with data preprocessing (a satisfying yet elusive goal for any ML practitioner), let's turn to the model. We instantiate it exactly as we did in the previous section:
+
+```py
+from transformers import AutoModelForSequenceClassification
+
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
+```
+
+To make sure that everything will go smoothly during training, we pass our batch to this model:
+
+```py
+outputs = model(**batch)
+print(outputs.loss, outputs.logits.shape)
+```
+
+```python out
+tensor(0.5441, grad_fn=<NllLossBackward>) torch.Size([8, 2])
+```
+
+All 🤗 Transformers models return the loss when `labels` are provided, and we also get the logits (two for each input in our batch, so a tensor of size 8 x 2).
+
+We're almost ready to write our training loop! We're just missing two things: an optimizer and a learning rate scheduler. Since we are trying to replicate by hand what the `Trainer` was doing, we will use the same defaults. The optimizer used by the `Trainer` is `AdamW`, which is the same as Adam, but with a twist for weight decay regularization (see ["Decoupled Weight Decay Regularization"](https://arxiv.org/abs/1711.05101) by Ilya Loshchilov and Frank Hutter):
+
+```py
+from torch.optim import AdamW
+
+optimizer = AdamW(model.parameters(), lr=5e-5)
+```
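
The weight decay "twist" can be made concrete with a single-parameter sketch. The two functions below are hypothetical helpers written for illustration (they are not the PyTorch implementation). Each performs only the first update step: one for Adam with an L2 penalty folded into the gradient, the other for AdamW with decoupled decay. The values `lr=5e-5` and `wd=0.01` are assumptions.

```python
import math


def adam_l2_step(w, grad, lr=5e-5, wd=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam step with the L2 penalty added to the gradient."""
    g = grad + wd * w  # the decay term goes through the adaptive scaling below
    m_hat = ((1 - beta1) * g) / (1 - beta1)      # bias-corrected first moment (t = 1)
    v_hat = ((1 - beta2) * g * g) / (1 - beta2)  # bias-corrected second moment (t = 1)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def adamw_step(w, grad, lr=5e-5, wd=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """First AdamW step: the decay is applied directly to the weight."""
    w = w * (1 - lr * wd)  # decoupled weight decay
    m_hat = ((1 - beta1) * grad) / (1 - beta1)
    v_hat = ((1 - beta2) * grad * grad) / (1 - beta2)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


# With a zero gradient, only the regularization acts on the weight.
w_l2 = adam_l2_step(2.0, 0.0)
w_adamw = adamw_step(2.0, 0.0)
print(2.0 - w_l2, 2.0 - w_adamw)
```

With a zero gradient, the AdamW step shrinks the weight by the factor `1 - lr * wd`, so the decay stays proportional to the weight. In the L2 variant the decay term is divided by the adaptive denominator, so the step is close to `lr` however small `wd` is.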
+
+> [!TIP]
+> 💡 **Modern Optimization Tips**: For even better performance, you can try:
+> - **AdamW with weight decay**: `AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)`
+> - **8-bit Adam**: Use `bitsandbytes` for memory-efficient optimization.
+> - **Different learning rates**: Lower learning rates (1e-5 to 3e-5) often work better for large models.
+>
+> 🚀 **Optimization Resources**: Learn more about optimizers and training strategies in the [🤗 Transformers optimization guide](https://huggingface.co/docs/transformers/main/en/performance#optimizer).
+
+Finally, the learning rate scheduler used by default is just a linear decay from the maximum value (5e-5) to 0. To define it properly, we need to know the number of training steps we will take, which is the number of epochs we want to run multiplied by the number of training batches (the length of our training dataloader). The `Trainer` uses three epochs by default, so we will follow that:
+
+```py
+from transformers import get_scheduler
+
+num_epochs = 3
+num_training_steps = num_epochs * len(train_dataloader)
+lr_scheduler = get_scheduler(
+    "linear",
+    optimizer=optimizer,
+    num_warmup_steps=0,
+    num_training_steps=num_training_steps,
+)
+print(num_training_steps)
+```
+
+```python out
+1377
+```
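
The step count printed above and the shape of the linear schedule can be reproduced with plain arithmetic. The sketch below is pure Python and does not call `get_scheduler`. The helper names are hypothetical, and the figure of 3,668 training examples for MRPC is an assumption that does not appear in this section.

```python
import math


def count_training_steps(num_examples, batch_size, num_epochs):
    """Number of optimizer steps when the last partial batch is kept."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return num_epochs * batches_per_epoch


def linear_decay_lr(step, num_training_steps, max_lr=5e-5, num_warmup_steps=0):
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < num_warmup_steps:
        return max_lr * (step / max(1, num_warmup_steps))
    remaining = max(0, num_training_steps - step)
    return max_lr * (remaining / max(1, num_training_steps - num_warmup_steps))


total_steps = count_training_steps(num_examples=3668, batch_size=8, num_epochs=3)
print(total_steps)

# Learning rate at the start, after one epoch, and at the very end.
for step in (0, total_steps // 3, total_steps):
    print(step, linear_decay_lr(step, total_steps))
```

With `num_warmup_steps=0` the schedule starts at the maximum value and reaches 0 exactly on the last step.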
+
+### Training Loop[[the-training-loop]]
+
+One last thing: we will want to use the GPU if we have access to one (on a CPU, training might take several hours instead of a couple of minutes). To do this, we define a `device` on which we will put our model and our batches:
+
+```py
+import torch
+
+device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
+model.to(device)
+device
+```
+
+```python out
+device(type='cuda')
+```
+
+We are now ready to train! To get some sense of when training will be finished, we add a progress bar over our number of training steps, using the `tqdm` library:
+
+```py
+from tqdm.auto import tqdm
+
+progress_bar = tqdm(range(num_training_steps))
+
+model.train()
+for epoch in range(num_epochs):
+    for batch in train_dataloader:
+        batch = {k: v.to(device) for k, v in batch.items()}
+        outputs = model(**batch)
+        loss = outputs.loss
+        loss.backward()
+
+        optimizer.step()
+        lr_scheduler.step()
+        optimizer.zero_grad()
+        progress_bar.update(1)
+```
+
+> [!TIP]
+> 💡 **Modern Training Optimizations**: To make your training loop even more efficient, consider:
+>
+> - **Gradient Clipping**: Add `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` before `optimizer.step()`.
+> - **Mixed Precision**: Use `torch.cuda.amp.autocast()` and `GradScaler` for faster training.
+> - **Gradient Accumulation**: Accumulate gradients over multiple batches to simulate larger batch sizes.
+> - **Checkpointing**: Save model checkpoints periodically so that training can resume if it is interrupted.
+>
+> 🔧 **Implementation Guide**: For detailed examples of these optimizations, see the [🤗 Transformers efficient training guide](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one) and the [range of optimizers](https://huggingface.co/docs/transformers/main/en/optimizers).
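
Two of the ideas in the tip above, gradient clipping and gradient accumulation, can be illustrated without PyTorch. The sketch below uses a one-parameter least-squares model, and all helper names and data values are hypothetical. It shows the arithmetic behind the techniques, not the library calls.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale gradient values so that their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


def mean_grad(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Add up micro-batch gradients, each divided by the number of micro-batches."""
    starts = range(0, len(xs), micro_batch_size)
    total = 0.0
    for s in starts:
        micro = mean_grad(w, xs[s:s + micro_batch_size], ys[s:s + micro_batch_size])
        total += micro / len(starts)
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2 * x + 1 for x in xs]

full = mean_grad(0.5, xs, ys)                               # one batch of 8
accumulated = accumulated_grad(0.5, xs, ys, micro_batch_size=2)  # four batches of 2
clipped = clip_by_global_norm([3.0, 4.0])
print(full, accumulated, clipped)
```

With equally sized micro-batches the accumulated gradient matches the large-batch gradient. In a PyTorch loop the same effect is usually obtained by dividing the loss by the number of accumulation steps and calling `optimizer.step()` and `optimizer.zero_grad()` only once every few batches.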
+
+You can see that the core of the training loop looks a lot like the one in the introduction. We didn't ask for any reporting, so this training loop will not tell us anything about how the model fares. We need to add an evaluation loop for that.
+
+### Evaluation Loop[[the-evaluation-loop]]
+
+As we did earlier, we will use a metric provided by the 🤗 Evaluate library. We've already seen the `metric.compute()` method, but metrics can also accumulate batches for us as we go over the prediction loop, using the `add_batch()` method. Once we have accumulated all the batches, we can get the final result with `metric.compute()`. Here's how to implement all of this in an evaluation loop:
+
+> [!TIP]
+> 📊 **Evaluation Best Practices**: For more sophisticated evaluation strategies and metrics, explore the [🤗 Evaluate documentation](https://huggingface.co/docs/evaluate/) and the [comprehensive evaluation cookbook](https://github.com/huggingface/evaluation-guidebook).
+
+```py
+import evaluate
+
+metric = evaluate.load("glue", "mrpc")
+model.eval()
+for batch in eval_dataloader:
+    batch = {k: v.to(device) for k, v in batch.items()}
+    with torch.no_grad():
+        outputs = model(**batch)
+
+    logits = outputs.logits
+    predictions = torch.argmax(logits, dim=-1)
+    metric.add_batch(predictions=predictions, references=batch["labels"])
+
+metric.compute()
+```
+
+```python out
+{'accuracy': 0.8431372549019608, 'f1': 0.8907849829351535}
+```
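
The accumulate-then-compute pattern used in the evaluation loop can be sketched with a small stand-in class. `RunningAccuracyF1` below is hypothetical and is not the 🤗 Evaluate implementation. It only mirrors the `add_batch()` / `compute()` interface for a binary task, and the logits and labels are made-up values.

```python
class RunningAccuracyF1:
    """Hypothetical stand-in for a metric object with add_batch() and compute()."""

    def __init__(self):
        self.predictions = []
        self.references = []

    def add_batch(self, predictions, references):
        # Store the results of one batch; nothing is computed yet.
        self.predictions.extend(predictions)
        self.references.extend(references)

    def compute(self):
        pairs = list(zip(self.predictions, self.references))
        tp = sum(1 for p, r in pairs if p == 1 and r == 1)
        fp = sum(1 for p, r in pairs if p == 1 and r == 0)
        fn = sum(1 for p, r in pairs if p == 0 and r == 1)
        accuracy = sum(1 for p, r in pairs if p == r) / len(pairs)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        return {"accuracy": accuracy, "f1": f1}


def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])


metric = RunningAccuracyF1()
batches = [
    ([[0.1, 0.9], [2.0, -1.0]], [1, 0]),
    ([[0.3, 0.2], [-0.5, 0.5]], [1, 1]),
]
for logits, labels in batches:
    metric.add_batch([argmax(row) for row in logits], labels)

result = metric.compute()
print(result)
```

Computing the score once at the end matters for metrics such as F1, which in general cannot be obtained by averaging per-batch values.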
+
+Again, your results will be slightly different because of the randomness in the model head initialization and the data shuffling, but they should be in the same ballpark.
+
+> [!TIP]
+> ✏️ **Try it out!** Modify the previous training loop to fine-tune your model on the SST-2 dataset.
+
+### Supercharge your training loop with 🤗 Accelerate[[supercharge-your-training-loop-with-accelerate]]
+
+<Youtube id="s7dy8QRgjJ0" />
+
+The training loop we defined earlier works fine on a single CPU or GPU. But using the [🤗 Accelerate](https://github.com/huggingface/accelerate) library, with just a few adjustments we can enable distributed training on multiple GPUs or TPUs. 🤗 Accelerate handles the complexities of distributed training, mixed precision, and device placement automatically. Starting from the creation of the training and validation dataloaders, here is what our training loop looks like once adapted to 🤗 Accelerate:
+
+> [!TIP]
+> ⚡ **Accelerate Deep Dive**: Learn everything about distributed training, mixed precision, and hardware optimization in the [🤗 Accelerate documentation](https://huggingface.co/docs/accelerate/) and explore practical examples in the [transformers documentation](https://huggingface.co/docs/transformers/main/en/accelerate).
+
+```py
+from accelerate import Accelerator
+from torch.optim import AdamW
+from transformers import AutoModelForSequenceClassification, get_scheduler
+
+accelerator = Accelerator()
+
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
+optimizer = AdamW(model.parameters(), lr=3e-5)
+
+train_dl, eval_dl, model, optimizer = accelerator.prepare(
+    train_dataloader, eval_dataloader, model, optimizer
+)
+
+num_epochs = 3
+num_training_steps = num_epochs * len(train_dl)
+lr_scheduler = get_scheduler(
+    "linear",
+    optimizer=optimizer,
+    num_warmup_steps=0,
+    num_training_steps=num_training_steps,
+)
+
+progress_bar = tqdm(range(num_training_steps))
+
+model.train()
+for epoch in range(num_epochs):
+    for batch in train_dl:
+        outputs = model(**batch)
+        loss = outputs.loss
+        accelerator.backward(loss)
+
+        optimizer.step()
+        lr_scheduler.step()
+        optimizer.zero_grad()
+        progress_bar.update(1)
+```
+
+The first line to add is the import line. The second line instantiates an `Accelerator` object that will look at the environment and initialize the proper distributed setup. 🤗 Accelerate handles the device placement for you, so you can remove the lines that put the model on the device (or, if you prefer, change them to use `accelerator.device` instead of `device`).
+
+Then the main bulk of the work is done in the line that sends the dataloaders, the model, and the optimizer to `accelerator.prepare()`. This will wrap those objects in the proper container to make sure your distributed training works as intended. The remaining changes to make are removing the line that puts the batch on the `device` (again, if you want to keep this you can just change it to use `accelerator.device`) and replacing `loss.backward()` with `accelerator.backward(loss)`.
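
One detail in the code above is that `num_training_steps` is computed from `len(train_dl)` after the call to `accelerator.prepare()`. A plausible reason is that the prepared dataloader splits its batches across processes, so each process iterates over only its own share. That sharding behaviour is an assumption here, and the arithmetic below is a hypothetical illustration rather than the library's code. It reuses the 459 batches per epoch implied by the 1377 steps reported earlier.

```python
import math


def batches_per_process(num_batches, num_processes):
    """Batches each process sees per epoch if batches are split evenly (rounded up)."""
    return math.ceil(num_batches / num_processes)


def steps_per_process(num_batches, num_processes, num_epochs=3):
    """Optimizer steps taken by each process over the whole run."""
    return num_epochs * batches_per_process(num_batches, num_processes)


for n in (1, 4, 8):
    print(n, steps_per_process(459, n))
```

Under that assumption, a scheduler built from the unprepared dataloader length would decay too slowly on several devices, because each process takes fewer steps than the single-device count.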
+
+> [!TIP]
+> ⚠️ In order to benefit from the speed-up offered by Cloud TPUs, we recommend padding your samples to a fixed length with the `padding="max_length"` and `max_length` arguments of the tokenizer.
+
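+If you'd like to copy and paste it to play around, here's what the complete training loop looks like with 🤗 Accelerate (it reuses `checkpoint`, `train_dataloader`, and `eval_dataloader` defined above):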
+
+```py
+from accelerate import Accelerator
+from torch.optim import AdamW
+from tqdm.auto import tqdm
+from transformers import AutoModelForSequenceClassification, get_scheduler
+
+accelerator = Accelerator()
+
+model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
+optimizer = AdamW(model.parameters(), lr=3e-5)
+
+train_dl, eval_dl, model, optimizer = accelerator.prepare(
+    train_dataloader, eval_dataloader, model, optimizer
+)
+
+num_epochs = 3
+num_training_steps = num_epochs * len(train_dl)
+lr_scheduler = get_scheduler(
+    "linear",
+    optimizer=optimizer,
+    num_warmup_steps=0,
+    num_training_steps=num_training_steps,
+)
+
+progress_bar = tqdm(range(num_training_steps))
+
+model.train()
+for epoch in range(num_epochs):
+    for batch in train_dl:
+        outputs = model(**batch)
+        loss = outputs.loss
+        accelerator.backward(loss)
+
+        optimizer.step()
+        lr_scheduler.step()
+        optimizer.zero_grad()
+        progress_bar.update(1)
+```
+
+Putting this in a `train.py` script will make that script runnable on any kind of distributed setup. To try it out in your distributed setup, run the command:
+
+```bash
+accelerate config
+```
+
+which will prompt you to answer a few questions and dump your answers in a configuration file used by this command:
+
+```bash
+accelerate launch train.py
+```
+
+which will launch the distributed training.
+
+If you want to try this in a Notebook (for instance, to test it with TPUs on Colab), just paste the code in a `training_function()` and run a last cell with:
+
+```python
+from accelerate import notebook_launcher
+
+notebook_launcher(training_function)
+```
+
+You can find more examples in the [🤗 Accelerate repo](https://github.com/huggingface/accelerate/tree/main/examples).
+
+> [!TIP]
+> 🌐 **Distributed Training**: For comprehensive coverage of multi-GPU and multi-node training, check out the [🤗 Transformers distributed training guide](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many) and the [scaling training cookbook](https://huggingface.co/docs/transformers/main/en/accelerate).
+
+### Next steps and best practices[[next-steps-and-best-practices]]
+
+Now that you've learned how to implement training from scratch, here are some additional considerations for production use:
+
+**Model Evaluation**: Always evaluate your model on multiple metrics, not just accuracy. Use the 🤗 Evaluate library for comprehensive evaluation.
+
+**Hyperparameter Tuning**: Consider using libraries like Optuna or Ray Tune for systematic hyperparameter optimization.
+
+**Model Monitoring**: Track training metrics, learning curves, and validation performance throughout training.
+
+**Model Sharing**: Once trained, share your model on the Hugging Face Hub to make it available to the community.
+
+**Efficiency**: For large models, consider techniques like gradient checkpointing, parameter-efficient fine-tuning (LoRA, AdaLoRA), or quantization methods.
+
+This concludes our deep dive into fine-tuning with custom training loops. The skills you've learned here will serve you well when you need complete control over the training process or want to implement custom training logic that goes beyond what the `Trainer` API offers.
+
+## Section quiz[[section-quiz]]
+
+Test your understanding of custom training loops and advanced training techniques:
+
+### 1. What is the main difference between the Adam and AdamW optimizers?
+
+<Question
+	choices={[
+		{
+			text: "AdamW uses a different learning rate schedule.",
+			explain: "Learning rate scheduling is separate from the choice of optimizer."
+		},
+		{
+			text: "AdamW includes decoupled weight decay regularization.",
+			explain: "Correct! AdamW decouples weight decay from the gradient-based parameter updates, which leads to better regularization.",
+            correct: true
+		},
+		{
+			text: "AdamW only works with transformer models.",
+			explain: "AdamW can be used with any model architecture, not just transformers."
+		},
+        {
+			text: "AdamW requires less memory than Adam.",
+			explain: "Both optimizers have similar memory requirements."
+		}
+	]}
+/>
+
+### 2. In a training loop, what is the correct order of operations?
+
+<Question
+	choices={[
+		{
+			text: "Forward pass → Backward pass → Optimizer step → Zero gradients",
+			explain: "Close, but this order leaves out the learning rate scheduler step, which the loop in this section runs right after the optimizer step."
+		},
+		{
+			text: "Forward pass → Backward pass → Optimizer step → Scheduler step → Zero gradients",
+			explain: "Correct! This is the order used in this section: compute the loss, compute the gradients, update the parameters, update the learning rate, then clear the gradients.",
+            correct: true
+		},
+		{
+			text: "Zero gradients → Forward pass → Optimizer step → Backward pass",
+			explain: "The backward pass must come after the forward pass and before the optimizer step, because the optimizer needs the gradients it computes."
+		},
+        {
+			text: "Forward pass → Zero gradients → Backward pass → Optimizer step",
+			explain: "Zeroing before the backward pass only clears the gradients left over from the previous step, so the gradients are still valid, but this is not the order used in this section and it leaves out the scheduler step."
+		}
+	]}
+/>
+
+### 3. What does the 🤗 Accelerate library primarily help with?
+
+<Question
+	choices={[
+		{
+			text: "Making your models train faster by optimizing the forward pass.",
+			explain: "Accelerate does not optimize the model architecture itself."
+		},
+		{
+			text: "Automatically selecting the best hyperparameters.",
+			explain: "Accelerate does not perform hyperparameter optimization."
+		},
+		{
+			text: "Enabling distributed training across multiple GPUs/TPUs with minimal code changes.",
+			explain: "Correct! Accelerate handles the complexity of distributed training, so the same code runs seamlessly on single or multiple devices.",
+            correct: true
+		},
+        {
+			text: "Converting models to different frameworks such as TensorFlow.",
+			explain: "Accelerate works within PyTorch and does not convert between frameworks."
+		}
+	]}
+/>
+
+### 4. Why do we move batches to the device in a training loop?
+
+<Question
+	choices={[
+		{
+			text: "To make training faster.",
+			explain: "While it can affect speed, the main reason is compatibility."
+		},
+		{
+			text: "Because the model and the data must be on the same device (CPU/GPU) for computation to take place.",
+			explain: "Correct! PyTorch requires tensors to be on the same device for operations to work.",
+            correct: true
+		},
+		{
+			text: "To save memory.",
+			explain: "Moving data to a device does not inherently save memory."
+		},
+        {
+			text: "Because the DataLoader requires it.",
+			explain: "The DataLoader does not require any specific device placement."
+		}
+	]}
+/>
+
+### 5. What does `model.eval()` do before evaluation?
+
+<Question
+	choices={[
+		{
+			text: "It freezes the model parameters so they cannot be updated.",
+			explain: "model.eval() does not freeze parameters; that is done by setting requires_grad=False."
+		},
+		{
+			text: "It changes the behavior of layers such as dropout and batch normalization for inference.",
+			explain: "Correct! eval() mode disables dropout and makes batch norm use its running statistics instead of computing them from the current batch.",
+            correct: true
+		},
+		{
+			text: "It enables gradient computation for the evaluation metrics.",
+			explain: "Actually, we usually use torch.no_grad() to disable gradient computation during evaluation."
+		},
+        {
+			text: "It automatically computes evaluation metrics.",
+			explain: "model.eval() only changes layer behavior; you still need to implement the metric computation yourself."
+		}
+	]}
+/>
+
+### 6. What is the purpose of `torch.no_grad()` during evaluation?
+
+<Question
+	choices={[
+		{
+			text: "To prevent the model from making predictions.",
+			explain: "torch.no_grad() does not prevent predictions; it only disables gradient computation."
+		},
+		{
+			text: "To save memory and speed up computation by disabling gradient tracking.",
+			explain: "Correct! Gradients are not needed for evaluation, so disabling them saves memory and computation.",
+            correct: true
+		},
+		{
+			text: "To enable evaluation mode for the model.",
+			explain: "Evaluation mode is enabled with model.eval(), not with torch.no_grad()."
+		},
+        {
+			text: "To guarantee identical results across runs.",
+			explain: "Reproducibility is handled by setting random seeds, not by torch.no_grad()."
+		}
+	]}
+/>
+
+### 7. What changes when you use 🤗 Accelerate in your training loop?
+
+<Question
+	choices={[
+		{
+			text: "You must rewrite your entire training loop from scratch.",
+			explain: "Accelerate requires only minimal changes to existing PyTorch code."
+		},
+		{
+			text: "You wrap the key objects with accelerator.prepare() and use accelerator.backward() instead of loss.backward().",
+			explain: "Correct! These are the main changes: prepare your objects and use accelerator.backward() so that distributed training works properly.",
+            correct: true
+		},
+		{
+			text: "You need to specify the number of GPUs in your code.",
+			explain: "Accelerate detects the available hardware automatically."
+		},
+        {
+			text: "You must use a different optimizer and scheduler.",
+			explain: "You can use the same optimizers and schedulers with Accelerate."
+		}
+	]}
+/>
+
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - Manual training loops give you complete control but require understanding the proper sequence: forward → backward → optimizer step → scheduler step → zero gradients.
+> - AdamW with weight decay is the recommended optimizer for transformer models.
+> - Always use `model.eval()` and `torch.no_grad()` during evaluation for correct behavior and efficiency.
+> - 🤗 Accelerate makes distributed training accessible with minimal code changes.
+> - Device management (moving tensors to GPU/CPU) is crucial for PyTorch operations.
+> - Modern techniques such as mixed precision, gradient accumulation, and gradient clipping can significantly improve training efficiency.
+
+## Glossary
+
+*   **Training Loop**: The process of training a model by iterating over the data repeatedly.
+*   **`Trainer` Class**: A class from the 🤗 Transformers library used to train and evaluate models.
+*   **PyTorch**: An open-source machine learning library created by Facebook (now Meta), used to build deep learning models.
+*   **Data Processing**: The process of converting data into a form the model can understand and work with.
+*   **Dataloaders**: PyTorch utility objects that serve data from datasets in batches.
+*   **Batch**: A group of several inputs that are processed together.
+*   **Postprocessing**: Further adjustments applied to the data after preprocessing.
+*   **`tokenized_datasets`**: The object that holds the tokenized datasets.
+*   **`remove_columns()` Method**: Method used to remove unneeded columns from a dataset.
+*   **`rename_column()` Method**: Method used to change the name of a dataset column.
+*   **`set_format("torch")` Method**: Method that sets the output format of a dataset to PyTorch tensors.
+*   **PyTorch Tensors**: The multi-dimensional arrays that represent data in the PyTorch framework.
+*   **`torch.utils.data.DataLoader`**: The PyTorch class used to create dataloaders.
+*   **`shuffle=True`**: `DataLoader` parameter that shuffles the samples randomly at every epoch.
+*   **`batch_size`**: The number of samples in each batch.
+*   **`collate_fn`**: The function that assembles samples into a batch (for example, `DataCollatorWithPadding`).
+*   **`DataCollatorWithPadding`**: A class provided by the Hugging Face Transformers library that assembles samples into a batch using dynamic padding.
+*   **Model**: In artificial intelligence (AI), a mathematical structure designed to learn from data and make predictions.
+*   **`AutoModelForSequenceClassification`**: A class in the Hugging Face Transformers library that automatically loads a pretrained model for sequence classification.
+*   **`num_labels`**: The number of labels (classes) in the classification task.
+*   **`outputs = model(**batch)`**: Passing an input batch to the model and retrieving its output.
+*   **`outputs.loss`**: The loss value returned in the model output.
+*   **`outputs.logits`**: The raw, unnormalized scores returned in the model output.
+*   **Optimizer**: The algorithm used to update the model's parameters during training.
+*   **Learning Rate Scheduler**: A method for changing the learning rate over the course of training.
+*   **`AdamW`**: A variant of the `Adam` optimizer that handles weight decay regularization more effectively.
+    *   **Adam (Adaptive Moment Estimation)**: A widely used deep learning optimizer that adapts the step size of each parameter automatically.
+    *   **Weight Decay Regularization**: A technique that reduces overfitting by keeping the model's weights small.
+    *   **Decoupled Weight Decay**: Applying weight decay separately from the gradient-based update.
+*   **`model.parameters()`**: Method that returns an iterator over the model's parameters (weights and biases).
+*   **`lr` (Learning Rate)**: The step size used when updating the model's weights during training.
+*   **`get_scheduler()` Function**: Function from the Hugging Face Transformers library used to create a learning rate scheduler.
+*   **Linear Decay**: A scheduling method that lowers the learning rate linearly over time.
+*   **`num_warmup_steps`**: The number of training steps at the start of training during which the learning rate is increased.
+*   **`num_training_steps`**: The total number of steps in the training process.
+*   **`device`**: The computing device (CPU or GPU) on which the model and data are placed.
+*   **`torch.device("cuda")`**: Selects the GPU (CUDA-enabled) as the device.
+*   **`torch.device("cpu")`**: Selects the CPU as the device.
+*   **`model.to(device)`**: Moves the model to the specified device.
+*   **`tqdm` (Library)**: A library used to create progress bars for Python loops.
+*   **`tqdm.auto.tqdm`**: The progress bar class from the `tqdm` library.
+*   **`progress_bar.update(1)`**: Advances the progress bar by one step.
+*   **`model.train()` Method**: Puts the model in training mode, which switches layers such as dropout and batch normalization to their training behavior.
+*   **`batch = {k: v.to(device) for k, v in batch.items()}`**: Moves the tensors in the batch to the specified device.
+*   **`loss.backward()` Method**: The PyTorch method that runs backpropagation and computes the gradients for the model's parameters.
+*   **`optimizer.step()` Method**: The optimizer method that updates the model's parameters using the computed gradients.
+*   **`lr_scheduler.step()` Method**: The method that updates the learning rate scheduler.
+*   **`optimizer.zero_grad()` Method**: Resets the gradients held by the optimizer to zero (to prevent them from accumulating into the next batch).
+*   **Gradient Clipping**: A technique that limits the magnitude of the gradients to protect against exploding gradients.
+*   **`torch.nn.utils.clip_grad_norm_`**: The PyTorch function used to clip gradients by their norm.
+*   **`max_norm`**: The maximum norm allowed when clipping gradients.
+*   **`torch.cuda.amp.autocast()`**: A context manager for mixed precision training in PyTorch.
+*   **`GradScaler`**: A PyTorch class used to scale gradients in mixed precision training.
+*   **Gradient Accumulation**: Accumulating gradients over several batches before updating, to simulate larger batch sizes when GPU memory is limited.
+*   **Checkpointing**: Saving the model parameters periodically.
+*   **`model.eval()` Method**: Puts the model in evaluation mode, which switches layers such as dropout and batch normalization to their inference behavior.
+*   **`torch.no_grad()`**: A PyTorch context manager that disables gradient computation. It is used during evaluation to save memory and speed up computation.
+*   **`torch.argmax(logits, dim=-1)`**: Returns the index of the largest logit, which gives the prediction. `dim=-1` refers to the last dimension.
+*   **`metric.add_batch()` Method**: A method of the 🤗 Evaluate metric object that accumulates predictions and references batch by batch.
+*   **`metric.compute()` Method**: A method of the metric object that computes the final metric value from the accumulated predictions and references.
+*   **`randomness`**: The property of being random; events that cannot be predicted in advance.
+*   **SST-2 Dataset**: A sentiment analysis task in the GLUE benchmark, made up of single sentences.
+*   **🤗 Accelerate Library**: A library from Hugging Face that lets PyTorch training loops run in distributed setups (multiple GPUs, TPUs) with minimal code changes.
+*   **`Accelerator` Object**: The main object of the 🤗 Accelerate library, which initializes the distributed setup.
+*   **`accelerator.prepare()` Method**: A method of the `Accelerator` object that wraps the dataloaders, model, and optimizer in containers suited to distributed training.
+*   **`accelerator.device`**: The device currently in use, as exposed by the `Accelerator` object.
+*   **`accelerator.backward(loss)` Method**: The `Accelerator` method used to compute gradients so that they are handled correctly in distributed training.
+*   **`accelerate config` Command**: The command line tool that creates the configuration file needed to start using 🤗 Accelerate.
+*   **`accelerate launch train.py` Command**: The command line tool used to run the `train.py` script in a distributed setup.
+*   **`notebook_launcher()` Function**: A function from the 🤗 Accelerate library that runs distributed training functions in a Notebook environment.
+*   **Production Use**: The stage at which software or an AI model is put to practical use.
+*   **Optuna / Ray Tune (Libraries)**: Libraries used for hyperparameter optimization.
+*   **Hyperparameter Tuning**: The process of searching for the hyperparameters that give the best model performance.
+*   **Model Monitoring**: Keeping track of training metrics, learning curves, and validation performance as training proceeds.
+*   **Learning Curves**: Plots of the training loss and the validation loss or metrics over the course of training.
+*   **Parameter-efficient Fine-tuning (PEFT)**: Methods that save memory and computation by fine-tuning only a small number of parameters (for example, LoRA and AdaLoRA).
+    *   **LoRA (Low-Rank Adaptation)**: A PEFT method that freezes the weights of the pretrained model and fine-tunes only small, low-rank matrices.
+    *   **AdaLoRA**: A variant of LoRA that uses adaptive rank selection.
+*   **Quantization**: A technique that reduces memory use and computation by converting the model's parameters from floating-point to smaller data types such as integers.
+*   **Gradient Checkpointing**: A technique that reduces memory use by storing only some of the intermediate activations of the computation graph and recomputing the others when the backward pass needs them.
\ No newline at end of file
diff --git a/chapters/my/chapter3/5.mdx b/chapters/my/chapter3/5.mdx
new file mode 100644
index 000000000..412c6e054
--- /dev/null
+++ b/chapters/my/chapter3/5.mdx
@@ -0,0 +1,485 @@
+# Understanding Learning Curves[[understanding-learning-curves]]
+
+<CourseFloatingBanner chapter={3}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter3/section7.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter3/section7.ipynb"},
+]} />
+
+<Youtube id="7q5NyFT8REg"/>
+
+Now that you have learned how to implement fine-tuning using both the `Trainer` API and custom training loops, it's important to understand how to interpret the results. Learning curves are invaluable tools for evaluating your model's performance during training and for identifying potential issues before they reduce performance.
+
+In this section, we will explore how to read and interpret accuracy and loss curves, understand what different curve shapes tell us about our model's behavior, and learn how to address common training issues.
+
+## What are Learning Curves?[[what-are-learning-curves]]
+
+Learning curves are visual representations of your model's performance metrics over time during training. The two most important curves to monitor are:
+
+-   **Loss curves**: Show how the model's error (loss) changes over training steps or epochs.
+-   **Accuracy curves**: Show the percentage of correct predictions over training steps or epochs.
+
+These curves help us understand whether our model is learning effectively and can guide us in making adjustments to improve performance. In Transformers, these metrics are computed individually for each batch and then logged to disk. We can then use tools such as [Weights & Biases](https://wandb.ai/) to visualize these curves and track our model's performance over time.
+
+### Loss Curves[[loss-curves]]
+
+The loss curve shows how the model's error decreases over time. In a typical successful training run, you'll see a curve similar to the one below:
+
+![Loss Curve](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/1.png)
+
+-   **High initial loss**: The model starts without optimization, so predictions are initially poor.
+-   **Decreasing loss**: As training progresses, the loss should generally decrease.
+-   **Convergence**: Eventually, the loss stabilizes at a low value, indicating that the model has learned the patterns in the data.
+
+As in previous chapters, we can use the `Trainer` API to track these metrics and visualize them in a dashboard. Below is an example of how to do this with Weights & Biases.
+
+```python
+# Example of tracking loss during training with the Trainer
+from transformers import Trainer, TrainingArguments
+import wandb
+
+# Initialize Weights & Biases for experiment tracking
+wandb.init(project="transformer-fine-tuning", name="bert-mrpc-analysis")
+
+training_args = TrainingArguments(
+    output_dir="./results",
+    eval_strategy="steps",
+    eval_steps=50,
+    save_steps=100,
+    logging_steps=10,  # Log metrics every 10 steps
+    num_train_epochs=3,
+    per_device_train_batch_size=16,
+    per_device_eval_batch_size=16,
+    report_to="wandb",  # Send logs to Weights & Biases
+)
+
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    train_dataset=tokenized_datasets["train"],
+    eval_dataset=tokenized_datasets["validation"],
+    data_collator=data_collator,
+    processing_class=tokenizer,
+    compute_metrics=compute_metrics,
+)
+
+# Train and automatically log metrics
+trainer.train()
+```
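+
+The logged values can also be read back in plain Python once training has finished. The sketch below assumes the list-of-dicts layout of `trainer.state.log_history`, where training entries carry a `loss` key and evaluation entries carry `eval_`-prefixed keys. The sample values and the `split_curves` helper are hypothetical and only illustrate how to separate the two curves.
+
+```python
+# Hypothetical excerpt shaped like `trainer.state.log_history` (values are made up).
+log_history = [
+    {"loss": 0.69, "epoch": 0.1, "step": 10},
+    {"loss": 0.61, "epoch": 0.2, "step": 20},
+    {"eval_loss": 0.58, "eval_accuracy": 0.71, "epoch": 0.2, "step": 20},
+    {"loss": 0.52, "epoch": 0.3, "step": 30},
+    {"eval_loss": 0.50, "eval_accuracy": 0.78, "epoch": 0.4, "step": 40},
+]
+
+
+def split_curves(history):
+    """Separate training-loss points from evaluation points, keyed by step."""
+    train = [(e["step"], e["loss"]) for e in history if "loss" in e]
+    evals = [
+        (e["step"], e["eval_loss"], e.get("eval_accuracy"))
+        for e in history
+        if "eval_loss" in e
+    ]
+    return train, evals
+
+
+train_curve, eval_curve = split_curves(log_history)
+print("train:", train_curve)
+print("eval: ", eval_curve)
+```
+
+With a real run, passing `trainer.state.log_history` instead of the sample list should give the points to plot for each curve.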
+
+### Accuracy Curves[[accuracy-curves]]
+
+The accuracy curve shows the percentage of correct predictions over time. Unlike loss curves, accuracy curves should generally increase as the model learns, and they can look more "steppy" than the loss curve.
+
+![Accuracy Curve](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/2.png)
+
+-   **Start low**: Initial accuracy should be low, as the model has not yet learned the patterns in the data.
+-   **Increase with training**: Accuracy should generally improve as training progresses, provided the model is able to learn the patterns in the data.
+-   **May show plateaus**: Accuracy often increases in discrete jumps rather than smoothly, as the model's predictions move closer to the true labels.
+
+> [!TIP]
+> 💡 **Why Accuracy Curves Are "Steppy"**: Unlike loss, which is continuous, accuracy is calculated by comparing discrete predictions to true labels. Small improvements in model confidence might not change the final prediction, so accuracy stays flat until a threshold is crossed.
+
+### Convergence[[convergence]]
+
+Convergence occurs when the model's performance stabilizes and the loss and accuracy curves level off. This is a sign that the model has learned the patterns in the data and is ready to be used. Put simply, each time we train a model we aim for it to converge to a stable level of performance.
+
+![Convergence](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/4.png)
+
+Once a model has converged, we can use it to make predictions on new data and refer to evaluation metrics to understand how well it performs.
+
+## Interpreting Learning Curve Patterns[[interpreting-learning-curve-patterns]]
+
+Different curve shapes reveal different aspects of your model's training. Let's examine the most common patterns and what they mean.
+
+### Healthy Learning Curves[[healthy-learning-curves]]
+
+A well-behaved training run typically shows curve shapes similar to the one below:
+
+![Healthy Loss Curve](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/5.png)
+
+Let's look at the illustration above. It displays both the loss curve (on the left) and the corresponding accuracy curve (on the right). These curves have distinct characteristics.
+
+The loss curve shows the value of the model's loss over time. Initially the loss is high, and then it gradually decreases, indicating that the model is improving. A decreasing loss suggests that the model is making better predictions, because the loss represents the error between the predicted output and the true output.
+
+Now let's turn to the accuracy curve. It represents the model's accuracy over time. The accuracy curve begins at a low value and increases as training progresses. Accuracy measures the proportion of correctly classified instances, so as the accuracy curve rises, the model is making more correct predictions.
+
+One notable difference between the curves is the smoothness and the presence of "plateaus" on the accuracy curve. While the loss decreases smoothly, the plateaus on the accuracy curve indicate discrete jumps in accuracy rather than a continuous increase. This behavior comes from how accuracy is measured. The loss can improve whenever the model's output moves closer to the target, even if the final prediction is still incorrect. Accuracy, however, only improves when the prediction crosses the threshold needed to be correct.
+
+For example, in a binary classifier distinguishing cats (0) from dogs (1), if the model predicts 0.3 for an image of a dog (true value 1), this is rounded to 0 and is an incorrect classification. If it predicts 0.4 at the next step, it is still incorrect. The loss will have decreased because 0.4 is closer to 1 than 0.3, but the accuracy remains unchanged, creating a plateau. The accuracy only increases once the model predicts a value greater than 0.5, which is rounded to 1.
+
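+The cat/dog example can be checked numerically. The sketch below assumes binary cross-entropy as the loss (the text does not name a specific loss function) and a 0.5 decision threshold; the helper names are hypothetical.
+
+```python
+import math
+
+
+def binary_cross_entropy(p, label):
+    """Loss for a single predicted probability p against a 0/1 label (assumed loss)."""
+    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
+
+
+def is_correct(p, label, threshold=0.5):
+    """A prediction counts as correct only once it crosses the threshold."""
+    return int(p > threshold) == label
+
+
+# Predictions for one dog image (true label 1) at three successive steps.
+predictions = [0.3, 0.4, 0.6]
+losses = [binary_cross_entropy(p, 1) for p in predictions]
+correct = [is_correct(p, 1) for p in predictions]
+
+for p, loss, ok in zip(predictions, losses, correct):
+    print(f"p={p:.1f}  loss={loss:.3f}  correct={ok}")
+# p=0.3  loss=1.204  correct=False
+# p=0.4  loss=0.916  correct=False
+# p=0.6  loss=0.511  correct=True
+```
+
+The loss falls at every step, while the prediction only becomes correct at the last one, which is the plateau-then-jump shape seen in accuracy curves.
+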
+> [!TIP]
+> **Characteristics of healthy curves:**
+> - **Smooth decline in loss**: Both training and validation loss decrease steadily.
+> - **Close training/validation performance**: Small gap between training and validation metrics.
+> - **Convergence**: Curves level off, indicating the model has learned the patterns.
+
+### Practical Examples[[practical-examples]]
+
+Let's work through some practical examples of learning curves. First, we will highlight some approaches to monitoring the learning curves during training. After that, we will break down the different patterns that can be observed in learning curves.
+
+#### During Training[[during-training]]
+
+During the training process (after you've called `trainer.train()`), you can monitor these key indicators:
+
+1.  **Loss convergence**: Is the loss still decreasing, or has it plateaued?
+2.  **Overfitting signs**: Is the validation loss starting to increase while the training loss decreases?
+3.  **Learning rate**: Are the curves too erratic (learning rate too high) or too flat (learning rate too low)?
+4.  **Stability**: Are there sudden spikes or drops that indicate problems?
+
+#### After Training[[after-training]]
+
+After the training process has completed, you can analyze the complete curves to understand the model's performance.
+
+1.  **Final performance**: Did the model reach acceptable performance levels?
+2.  **Efficiency**: Could the same performance be achieved with fewer epochs?
+3.  **Generalization**: How close are training and validation performance?
+4.  **Trends**: Would additional training likely improve performance?
+
+> [!TIP]
+> 🔍 **W&B Dashboard Features**: Weights & Biases automatically creates interactive plots of your learning curves. You can:
+> - Compare multiple runs side by side
+> - Add custom metrics and visualizations
+> - Set up alerts for anomalous behavior
+> - Share results with your team
+>
+> Learn more in the [Weights & Biases documentation](https://docs.wandb.ai/).
+
+#### 1. Overfitting[[overfitting]]
+
+Overfitting occurs when the model learns too much from the training data and is unable to generalize to different data (represented by the validation set).
+
+![Overfitting](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/10.png)
+
+**Symptoms:**
+
+- Validation loss increases or plateaus while training loss continues to decrease.
+- Large gap between training and validation accuracy.
+- Training accuracy much higher than validation accuracy.
+
+**Solutions for overfitting:**
+- **Regularization**: Add dropout, weight decay, or other regularization techniques.
+- **Early stopping**: Stop training when validation performance stops improving.
+- **Data augmentation**: Increase the diversity of the training data.
+- **Reduce model complexity**: Use a smaller model or fewer parameters.
+
+In the example below, we use early stopping to prevent overfitting. We set `early_stopping_patience` to 3, which means that training stops if the validation loss does not improve for 3 consecutive evaluations (here, one evaluation every 100 steps).
+
+```python
+# Example of detecting overfitting with early stopping
+from transformers import EarlyStoppingCallback, Trainer, TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="./results",
+    eval_strategy="steps",
+    eval_steps=100,
+    save_strategy="steps",
+    save_steps=100,
+    load_best_model_at_end=True,
+    metric_for_best_model="eval_loss",
+    greater_is_better=False,
+    num_train_epochs=10,  # Set high, but training will stop early
+)
+
+# Add early stopping to prevent overfitting
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    train_dataset=tokenized_datasets["train"],
+    eval_dataset=tokenized_datasets["validation"],
+    data_collator=data_collator,
+    processing_class=tokenizer,
+    compute_metrics=compute_metrics,
+    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
+)
+```
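+
+To make the patience rule concrete, the sketch below is a simplified, plain-Python stand-in for the stopping logic, not the library's implementation. The `early_stopping_step` function, its `min_delta` argument, and the sample losses are hypothetical. It counts consecutive evaluations without improvement and reports the evaluation at which training would stop.
+
+```python
+def early_stopping_step(eval_losses, patience=3, min_delta=0.0):
+    """Return the index of the evaluation at which training would stop, or None."""
+    best = float("inf")
+    bad_evals = 0
+    for i, loss in enumerate(eval_losses):
+        if loss < best - min_delta:
+            # New best validation loss: reset the counter.
+            best = loss
+            bad_evals = 0
+        else:
+            # No improvement at this evaluation.
+            bad_evals += 1
+            if bad_evals >= patience:
+                return i
+    return None
+
+
+# Validation loss improves for three evaluations, then drifts upward.
+eval_losses = [0.60, 0.52, 0.48, 0.49, 0.50, 0.51, 0.47]
+stop_at = early_stopping_step(eval_losses, patience=3)
+print(stop_at)  # 5: the third consecutive evaluation without improvement
+```
+
+Note that the run stops before the later value of 0.47 is ever seen, which is the trade-off a small patience value makes.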
+
+#### 2. Underfitting[[underfitting]]
+
+Underfitting occurs when the model is too simple to capture the underlying patterns in the data. This can happen for several reasons:
+
+- The model is too small or lacks the capacity to learn the patterns.
+- The learning rate is too low, causing slow learning.
+- The dataset is too small or not representative of the problem.
+- The model is not properly regularized (for example, the regularization is too strong).
+
+![Underfitting](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/7.png)
+
+**Symptoms:**
+- Both training and validation loss remain high.
+- Model performance plateaus early in training.
+- Training accuracy is lower than expected.
+
+**Solutions for underfitting:**
+- **Increase model capacity**: Use a larger model or more parameters.
+- **Train longer**: Increase the number of epochs.
+- **Adjust the learning rate**: Try different learning rates.
+- **Check data quality**: Ensure your data is properly preprocessed.
+
+In the example below, we train for more epochs to see if the model can learn the patterns in the data.
+
+```python
+from transformers import TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="./results",
+    num_train_epochs=10,  # increased from 5 to give the model more time to learn
+)
+```
+
+#### 3. Erratic Learning Curves[[erratic-learning-curves]]
+
+Erratic learning curves occur when the model is not learning effectively. This can happen for several reasons:
+
+- The learning rate is too high, causing the model to overshoot the optimal parameters.
+- The batch size is too small, so gradient estimates are noisy and the model learns slowly and unevenly.
+- The model is not properly regularized, causing it to overfit to the training data.
+- The dataset is not properly preprocessed, causing the model to learn from noise.
+
+![Erratic Learning Curves](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/3.png)
+
+**Symptoms:**
+- Frequent fluctuations in loss or accuracy.
+- Curves show high variance or instability.
+- Performance oscillates without a clear trend.
+
+Both training and validation curves show erratic behavior.
+
+![Erratic Learning Curves](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter3/9.png)
+
+**Solutions for erratic curves:**
+- **Lower the learning rate**: Reduce the step size for more stable training.
+- **Increase the batch size**: Larger batches provide more stable gradients.
+- **Gradient clipping**: Prevent exploding gradients.
+- **Better data preprocessing**: Ensure consistent data quality.
+
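+As an illustration of the gradient clipping item above, here is a minimal plain-Python sketch of clipping by global norm. The `clip_by_global_norm` function and the sample gradient values are hypothetical. With `Trainer`, clipping is controlled through the `max_grad_norm` argument of `TrainingArguments` rather than written by hand.
+
+```python
+import math
+
+
+def clip_by_global_norm(grads, max_norm):
+    """Rescale gradients so their overall L2 norm does not exceed max_norm."""
+    total_norm = math.sqrt(sum(g * g for g in grads))
+    if total_norm <= max_norm or total_norm == 0.0:
+        # Already within bounds: leave the gradients unchanged.
+        return list(grads), total_norm
+    scale = max_norm / total_norm
+    return [g * scale for g in grads], total_norm
+
+
+grads = [3.0, 4.0]  # L2 norm is 5.0
+clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
+norm_after = math.sqrt(sum(g * g for g in clipped))
+print(norm_before, norm_after)
+```
+
+The direction of the update is preserved; only its length is capped, which limits the size of any single step when a batch produces an unusually large gradient.
+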
+In the example below, we lower the learning rate and increase the batch size.
+
+```python
+from transformers import TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="./results",
+    learning_rate=1e-5,  # lowered from 1e-4 for more stable updates
+    per_device_train_batch_size=32,  # increased from 16 for smoother gradients
+)
+```
+
+## Key Takeaways[[key-takeaways]]
+
+Understanding learning curves is crucial for becoming an effective machine learning practitioner. These visual tools provide immediate feedback about your model's training progress and help you make informed decisions about when to stop training, adjust hyperparameters, or try different approaches. With practice, you'll develop an intuitive understanding of what healthy learning curves look like and how to address issues when they arise.
+
+> [!TIP]
+> 💡 **Key Takeaways:**
+> - Learning curves are essential tools for understanding model training progress.
+> - Monitor both loss and accuracy curves, but remember that they have different characteristics.
+> - Overfitting shows up as diverging training/validation performance.
+> - Underfitting shows up as poor performance on both training and validation data.
+> - Tools like Weights & Biases make it easy to track and analyze learning curves.
+> - Early stopping and proper regularization can address most common training issues.
+>
+> 🔬 **Next Steps**: Practice analyzing learning curves on your own fine-tuning experiments. Try different hyperparameters and observe how they affect the curve shapes. This hands-on experience is the best way to develop intuition for reading training progress.
+
+## Section Quiz[[section-quiz]]
+
+Test your understanding of learning curves and training analysis concepts:
+
+### 1. What does it typically mean when training loss decreases but validation loss starts increasing?
+
+<Question
+	choices={[
+		{
+			text: "The model is learning successfully and will continue to improve.",
+			explain: "If validation loss is increasing while training loss decreases, this indicates a problem, not success."
+		},
+		{
+			text: "The model is overfitting to the training data.",
+			explain: "Correct! This is a classic sign of overfitting: the model performs well on training data but poorly on unseen validation data.",
+            correct: true
+		},
+		{
+			text: "The learning rate is too low.",
+			explain: "A low learning rate would cause slow learning, not a divergence between training and validation performance."
+		},
+        {
+			text: "The dataset is too small.",
+			explain: "While small datasets can contribute to overfitting, this specific pattern is the definition of overfitting regardless of dataset size."
+		}
+	]}
+/>
+
+### 2. Why do accuracy curves often show a "steppy" or plateau-like pattern rather than smooth increases?
+
+<Question
+	choices={[
+		{
+			text: "There is an error in the accuracy calculation.",
+			explain: "The steppy pattern is normal and expected, not an error."
+		},
+		{
+			text: "Accuracy is a discrete metric that only changes when predictions cross decision boundaries.",
+			explain: "Correct! Unlike loss, accuracy depends on discrete prediction decisions, so small improvements in confidence may not change the final accuracy until a threshold is crossed.",
+            correct: true
+		},
+		{
+			text: "The model is not learning effectively.",
+			explain: "Steppy accuracy curves are normal even when the model is learning well."
+		},
+        {
+			text: "The batch size is too small.",
+			explain: "Batch size affects training stability, but it does not explain the inherently discrete nature of accuracy metrics."
+		}
+	]}
+/>
+
+### 3. What is the best approach when you observe erratic, highly fluctuating learning curves?
+
+<Question
+	choices={[
+		{
+			text: "Increase the learning rate to speed up convergence.",
+			explain: "Increasing the learning rate would likely make the fluctuations worse."
+		},
+		{
+			text: "Reduce the learning rate and possibly increase the batch size.",
+			explain: "Correct! Lower learning rates and larger batch sizes typically lead to more stable training.",
+            correct: true
+		},
+		{
+			text: "Stop training immediately, as the model will not improve.",
+			explain: "Erratic curves can often be fixed with hyperparameter adjustments."
+		},
+        {
+			text: "Switch to a completely different model architecture.",
+			explain: "This is premature: erratic curves are usually fixable with hyperparameter tuning."
+		}
+	]}
+/>
+
+### 4. When should you consider using early stopping?
+
+<Question
+	choices={[
+		{
+			text: "Always, as it prevents any form of overfitting.",
+			explain: "While early stopping is useful, it is not always necessary, especially if other regularization methods are working."
+		},
+		{
+			text: "When validation performance stops improving or starts degrading.",
+			explain: "Correct! Early stopping helps prevent overfitting by halting training when the model stops generalizing better.",
+            correct: true
+		},
+		{
+			text: "Only when training loss is still decreasing rapidly.",
+			explain: "If training loss is decreasing rapidly and validation performance is good, you might want to continue training."
+		},
+        {
+			text: "Never, as it prevents the model from reaching its full potential.",
+			explain: "Early stopping is a valuable technique that often improves final model performance by preventing overfitting."
+		}
+	]}
+/>
+
+### 5. What indicates that your model might be underfitting?
+
+<Question
+	choices={[
+		{
+			text: "Training accuracy is much higher than validation accuracy.",
+			explain: "This describes overfitting, not underfitting."
+		},
+		{
+			text: "Both training and validation performance are poor and plateau early.",
+			explain: "Correct! Underfitting occurs when the model lacks the capacity to learn the patterns, resulting in poor performance on both training and validation data.",
+            correct: true
+		},
+		{
+			text: "The learning curves are very smooth with no fluctuations.",
+			explain: "Smooth curves are generally good and do not indicate underfitting."
+		},
+        {
+			text: "The validation loss is decreasing faster than the training loss.",
+			explain: "This would actually be a positive sign, not a problem."
+		}
+	]}
+/>
+
+## Glossary
+
+*   **Fine-tuning**: Further training a pre-trained model on a specific task, typically with a relatively small amount of data.
+*   **Trainer API (Application Programming Interface)**: A high-level API in the Hugging Face Transformers library designed for training models efficiently.
+*   **Custom Training Loops**: Writing the training code yourself using basic PyTorch functionality, without abstractions such as the Trainer API.
+*   **Learning Curves**: Visual representations of a model's performance metrics (e.g. loss, accuracy) over time during training.
+*   **Performance Metrics**: Values used to measure a model's performance (e.g. accuracy, F1 score).
+*   **Loss Curves**: A learning curve showing how the model's error (loss) changes over training steps or epochs.
+*   **Accuracy Curves**: A learning curve showing the percentage of correct predictions over training steps or epochs.
+*   **Training Steps**: One training step is the processing of one training batch.
+*   **Epochs**: One epoch is one complete pass of the model over the entire dataset.
+*   **Weights & Biases (wandb)**: A platform providing tools to track, visualize, and share machine learning experiments.
+*   **Loss**: A value (error) measuring the difference between the model's predictions and the true labels.
+*   **Optimization**: Adjusting the model's parameters to reduce the loss and improve performance.
+*   **Convergence**: The state in which the model's performance metrics (loss, accuracy) stabilize during training and no longer improve.
+*   **Dashboard**: A user interface that gathers information and displays it visually.
+*   **`wandb.init()` Function**: The function that initializes Weights & Biases and starts experiment tracking.
+*   **`project` (wandb argument)**: The name of the project under which experiments are grouped in Weights & Biases.
+*   **`name` (wandb argument)**: The name given to the current training run in Weights & Biases.
+*   **`TrainingArguments` Class**: The class used to specify hyperparameters and other settings when training a model with the Trainer.
+*   **`output_dir`**: The directory where the trained model and checkpoints are saved.
+*   **`eval_strategy="steps"`**: The `eval_strategy` option that runs evaluation every given number of training steps.
+*   **`eval_steps`**: The number of training steps between evaluations.
+*   **`save_steps`**: The number of steps between model checkpoint saves.
+*   **`logging_steps`**: The number of steps between metric logs.
+*   **`num_train_epochs`**: The number of epochs to train for.
+*   **`per_device_train_batch_size`**: The training batch size per device (e.g. per GPU).
+*   **`per_device_eval_batch_size`**: The evaluation batch size per device (e.g. per GPU).
+*   **`report_to="wandb"`**: The parameter setting that sends logs to Weights & Biases.
+*   **`model` (Trainer argument)**: The model object passed to the Trainer.
+*   **`args` (Trainer argument)**: The `TrainingArguments` object passed to the Trainer.
+*   **`train_dataset`**: The training set passed to the Trainer.
+*   **`eval_dataset`**: The validation set passed to the Trainer.
+*   **`data_collator`**: The function that assembles samples into a batch.
+*   **`processing_class`**: The parameter that tells the Trainer which tokenizer to use for data processing.
+*   **`compute_metrics`**: The function passed to the Trainer to compute metrics (e.g. accuracy, F1 score) during evaluation.
+*   **Plateaus**: Flat segments of a learning curve where performance stops improving.
+*   **Discrete Predictions**: Model outputs converted into distinct categories (e.g. 0 or 1).
+*   **Threshold**: The value that must be crossed for a discrete prediction to be made.
+*   **Binary Classifier**: A model that sorts data into two categories.
+*   **Overfitting**: The model learns too much from the training data and its performance drops on unseen data (the validation set).
+*   **Generalize (Generalization)**: The model's ability to apply the patterns it has learned to new, unseen data.
+*   **Regularization**: Techniques that prevent overfitting by reducing the model's complexity (e.g. dropout, weight decay).
+    *   **Dropout**: A technique that reduces overfitting by randomly disabling some neurons in the neural network's layers.
+    *   **Weight Decay**: A technique that reduces overfitting by keeping the model's weights small.
+*   **Early Stopping**: Stopping training when validation performance stops improving.
+*   **Data Augmentation**: Increasing the diversity of the training data by transforming it (e.g. rotating images, paraphrasing text).
+*   **Model Complexity**: How complex the model is (e.g. number of layers, number of parameters).
+*   **`EarlyStoppingCallback`**: The callback used to add early stopping to the `Trainer`.
+*   **`early_stopping_patience`**: The number of consecutive evaluations without improvement in validation performance that are tolerated before training stops.
+*   **`save_strategy`**: The strategy that determines when model checkpoints are saved (e.g. `"steps"`).
+*   **`load_best_model_at_end=True`**: The parameter setting that loads the model with the best validation performance when training finishes.
+*   **`metric_for_best_model="eval_loss"`**: The metric used to decide which model is best (here, the evaluation loss).
+*   **`greater_is_better=False`**: Specifies whether a larger value (True) or a smaller value (False) of `metric_for_best_model` is better.
+*   **Underfitting**: The model is too simple to learn the underlying patterns in the training data, so it performs poorly on both training and validation data.
+*   **Model Capacity**: The model's ability to learn, or to capture complex patterns.
+*   **Learning Rate**: The step size used when updating the model's weights during training.
+*   **Erratic Learning Curves**: A learning curve pattern in which the model is not learning effectively and the performance metrics fluctuate frequently.
+*   **Fluctuations**: Rises and falls in values, or instability.
+*   **Variance**: The amount of spread in data points.
+*   **Stability**: Steadiness of performance, with little variation.
+*   **Gradient Clipping**: A technique that prevents the exploding gradient problem by limiting the magnitude of the gradients.
+*   **Batch Size**: batch တစ်ခုစီတွင် ပါဝင်မည့် samples အရေအတွက်။
+*   **Gradients**: Model ၏ loss function ကို လျှော့ချရန်အတွက် model ၏ weights များကို မည်သည့်လမ်းကြောင်းသို့ ချိန်ညှိရမည်ကို ညွှန်ပြသော တန်ဖိုးများ။
\ No newline at end of file
diff --git a/chapters/my/chapter3/6.mdx b/chapters/my/chapter3/6.mdx
new file mode 100644
index 000000000..4c67a8b54
--- /dev/null
+++ b/chapters/my/chapter3/6.mdx
@@ -0,0 +1,76 @@
+<FrameworkSwitchCourse {fw} />
+
+# Fine-tuning, Check![[fine-tuning-check]]
+
+<CourseFloatingBanner
+    chapter={3}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+That was comprehensive! In the first two chapters you learned about models and tokenizers, and now you know how to fine-tune them for your own data using modern best practices. To recap, in this chapter you:
+
+*   Learned about datasets on the [Hub](https://huggingface.co/datasets) and modern data processing techniques.
+*   Learned how to load and preprocess datasets efficiently, including using dynamic padding and data collators.
+*   Implemented fine-tuning and evaluation using the high-level `Trainer` API with its latest features.
+*   Implemented a complete custom training loop from scratch with PyTorch.
+*   Used 🤗 Accelerate to make your training code work seamlessly on multiple GPUs or TPUs.
+*   Applied modern optimization techniques such as mixed precision training and gradient accumulation.
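+
+Two of the ideas recapped above, dynamic padding and gradient accumulation, come down to small pieces of arithmetic. The following is a minimal plain-Python sketch, not library code: the helper names `pad_batch` and `effective_batch_size` are hypothetical and exist only for illustration. In real training code, a collator such as `DataCollatorWithPadding` would typically handle the padding.
+
+```py
+def pad_batch(sequences, pad_id=0):
+    """Pad every sequence only up to the longest length in THIS batch."""
+    longest = max(len(seq) for seq in sequences)
+    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
+    # 1 marks a real token, 0 marks padding.
+    attention_mask = [
+        [1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences
+    ]
+    return {"input_ids": input_ids, "attention_mask": attention_mask}
+
+
+def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
+    """Number of samples that contribute to each optimizer update."""
+    return per_device_batch_size * accumulation_steps * num_devices
+
+
+batch = pad_batch([[101, 7592, 102], [101, 102]])
+print(batch["input_ids"])  # [[101, 7592, 102], [101, 102, 0]]
+print(effective_batch_size(8, 4, num_devices=2))  # 64
+```
+
+Because the padding length is recomputed for each batch, a batch of short sentences is never padded out to the longest sentence in the whole dataset.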
+
+> [!TIP]
+> 🎉 **Congratulations!** You've mastered the fundamentals of fine-tuning transformer models. You're now ready to tackle real-world ML (Machine Learning) projects!
+>
+> 📖 **Continue Learning**: Explore these resources to deepen your knowledge.
+> - [🤗 Transformers task guides](https://huggingface.co/docs/transformers/main/en/tasks/sequence_classification) for specific NLP (Natural Language Processing) tasks
+> - [🤗 Transformers examples](https://huggingface.co/docs/transformers/main/en/notebooks) for comprehensive notebooks
+>
+> 🚀 **Next Steps**:
+> - Try fine-tuning on your own dataset using the techniques you've learned.
+> - Experiment with the different model architectures available on the [Hugging Face Hub](https://huggingface.co/models).
+> - Join the [Hugging Face community](https://discuss.huggingface.co/) to share your projects and get help.
+
+This is just the beginning of your journey with 🤗 Transformers. In the next chapter, we'll explore how to share your models and tokenizers with the community and contribute to the ever-growing ecosystem of pretrained models.
+
+The skills you've developed here (data preprocessing, training configuration, evaluation, and optimization) are fundamental to any machine learning project. Whether you're working on text classification, named entity recognition, question answering, or any other NLP task, these techniques will serve you well.
+
+> [!TIP]
+> 💡 **Tips for Success**:
+> - Always start with a strong baseline using the `Trainer` API before implementing custom training loops.
+> - Use the 🤗 Hub to find pretrained models that are close to your task, so that you begin from a better starting point.
+> - Monitor your training with proper evaluation metrics, and don't forget to save checkpoints.
+> - Leverage the community: share your models and datasets to help others and to get feedback on your work.
+
+## Glossary
+
+*   **Fine-tuning**: Further training a pre-trained model for a specific task, usually on a relatively small amount of data.
+*   **Models**: Mathematical structures in Artificial Intelligence (AI) that are designed to learn from data and make predictions.
+*   **Tokenizers**: A tool or process that splits text (or other data) into tokens so that AI models can process it.
+*   **Hub (Hugging Face Hub)**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **Data Processing Techniques**: Methods for transforming data into a form that a model can understand and process.
+*   **Dynamic Padding**: Padding the samples in a batch only up to the length of the longest sample in that batch.
+*   **Data Collators**: Functions or classes that assemble individual samples into a batch.
+*   **High-level API (Application Programming Interface)**: A programming interface that hides complex details in order to be easier to use.
+*   **`Trainer` API**: A high-level API in the Hugging Face Transformers library designed for training models efficiently.
+*   **Evaluation**: Measuring a model's performance.
+*   **Custom Training Loop**: Writing the training code yourself with the basic building blocks of the PyTorch library, rather than relying on abstractions such as the Trainer API.
+*   **PyTorch**: An open-source machine learning library created by Facebook (now Meta) and used to build deep learning models.
+*   **🤗 Accelerate**: A library from Hugging Face that lets PyTorch training loops run in distributed setups (multiple GPUs, TPUs) with only minor code changes.
+*   **Multiple GPUs (Graphics Processing Units)**: Using several processors that were originally designed for graphics workloads. They are widely used to speed up AI/ML (Artificial Intelligence/Machine Learning) workloads.
+*   **TPUs (Tensor Processing Units)**: Processors designed by Google specifically for AI/ML workloads.
+*   **Mixed Precision Training**: Training a model with a mix of 16-bit floating-point numbers (fp16) and 32-bit floating-point numbers (fp32). This typically speeds up training and reduces memory usage.
+*   **Gradient Accumulation**: Accumulating gradients over several batches before performing a weight update, in order to simulate a larger batch size when GPU memory is limited.
+*   **ML Projects (Machine Learning Projects)**: Projects that use machine learning techniques to solve a problem.
+*   **`task guides` (Transformers)**: Guides for carrying out specific NLP (Natural Language Processing) tasks with the Transformers library.
+*   **`notebooks` (Transformers)**: Jupyter Notebooks containing usage examples for the Hugging Face Transformers library.
+*   **Model Architectures**: The design that describes a model's layers and how they are connected.
+*   **Community**: The users, developers, and researchers from the AI/ML field who gather around Hugging Face.
+*   **Preprocessing**: The process of transforming data into a form that a model can understand and process.
+*   **Training Configuration**: Specifying the settings and parameters needed to train a model.
+*   **Optimization**: Methods for improving a model's performance or making training more efficient.
+*   **Text Classification**: Assigning a text to one of a set of predefined categories.
+*   **Named Entity Recognition (NER)**: Identifying specific names in a text, such as people, places, and organizations.
+*   **Question Answering**: Finding the answer to a question within a text.
+*   **Baseline**: A basic model or result used as a first point of reference for performance.
+*   **Pretrained Models**: AI (Artificial Intelligence) models that have already been trained on large amounts of data.
+*   **Evaluation Metrics**: Values used to measure a model's performance (for example, accuracy or F1 score).
+*   **Checkpoints**: Snapshots of a model's weights and other settings (configuration) saved at a given point in time.
+*   **Feedback**: Responses or suggestions received about a piece of work.
\ No newline at end of file
diff --git a/chapters/my/chapter3/7.mdx b/chapters/my/chapter3/7.mdx
new file mode 100644
index 000000000..b8fb43a31
--- /dev/null
+++ b/chapters/my/chapter3/7.mdx
@@ -0,0 +1,39 @@
+<FrameworkSwitchCourse {fw} />
+
+<!-- DISABLE-FRONTMATTER-SECTIONS -->
+
+# End-of-chapter Certificate[[end-of-chapter-certificate]]
+
+<CourseFloatingBanner chapter={3}
+  classNames="absolute z-10 right-0 top-0"
+/>
+
+Congratulations on completing the course! You've learned how to fine-tune pretrained models, how to understand learning curves, and how to share your models with the community. Now it's time to take a quiz to test your knowledge and earn your certificate.
+
+To take the quiz, you will need to follow these steps:
+
+1.  Sign in to your Hugging Face account.
+2.  Answer the questions in the quiz.
+3.  Submit your answers.
+
+## Multiple Choice Quiz
+
+In this quiz, you will be asked to select the correct answer from a list of options. We'll test you on the fundamentals of supervised finetuning.
+
+<iframe
+	src="https://huggingface-course-unit-3-quiz.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
+## Glossary
+
+*   **Fine-tune (Fine-tuning)**: Further training a pre-trained model for a specific task, usually on a relatively small amount of data.
+*   **Pretrained Models**: AI (Artificial Intelligence) models that have already been trained on large amounts of data.
+*   **Learning Curves**: Plots of a model's performance metrics (for example, loss or accuracy) over the course of training.
+*   **Community**: The users, developers, and researchers from the AI/ML field who gather around Hugging Face.
+*   **Quiz**: A short test or set of questions that checks your understanding of the material you have studied.
+*   **Certificate**: A document recognizing that you have completed a course or training.
+*   **Hugging Face Account**: A user account on the Hugging Face platform. It gives you access to models, datasets, and other resources.
+*   **Supervised Finetuning**: Further training a pretrained model for a specific task using labeled data (inputs paired with their corresponding outputs).
\ No newline at end of file
diff --git a/chapters/my/chapter4/1.mdx b/chapters/my/chapter4/1.mdx
new file mode 100644
index 000000000..c68f03fa5
--- /dev/null
+++ b/chapters/my/chapter4/1.mdx
@@ -0,0 +1,45 @@
+# Hugging Face Hub[[the-hugging-face-hub]]
+
+<CourseFloatingBanner
+    chapter={4}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+The [Hugging Face Hub](https://huggingface.co/), our main website, is a central platform that enables anyone to discover, use, and contribute state-of-the-art models and datasets. It hosts more than 10,000 publicly available models. We'll focus on the models in this chapter, and take a look at the datasets in Chapter 5.
+
+The models in the Hub are not limited to 🤗 Transformers or even NLP (Natural Language Processing). There are models from [Flair](https://github.com/flairNLP/flair) and [AllenNLP](https://github.com/allenai/allennlp) for NLP, from [Asteroid](https://github.com/asteroid-team/asteroid) and [pyannote](https://github.com/pyannote/pyannote-audio) for speech, and from [timm](https://github.com/rwightman/pytorch-image-models) for vision, to name a few.
+
+Each of these models is hosted as a Git repository, which allows versioning and reproducibility. Sharing a model on the Hub means opening it up to the community and making it accessible to anyone who wants to use it easily. This removes their need to train a model on their own and simplifies both sharing and usage.
+
+Additionally, sharing a model on the Hub automatically deploys a hosted Inference API for that model. Anyone in the community is free to test it out directly on the model's page, with custom inputs and appropriate widgets.
+
+The best part is that sharing and using any public model on the Hub is completely free. [Paid plans](https://huggingface.co/pricing) also exist if you wish to share models privately.
+
+The video below shows how to navigate the Hub.
+
+<Youtube id="XvSGPZFEjDY"/>
+
+Having a huggingface.co account is required to follow along with this part, as we'll be creating and managing repositories on the Hugging Face Hub: [create an account](https://huggingface.co/join).
+
+## Glossary
+
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing AI (Artificial Intelligence) models, datasets, and demos.
+*   **State-of-the-art Models (SOTA Models)**: Models that currently achieve the best performance on a given task.
+*   **Datasets**: Collections of data used to train and evaluate AI models.
+*   **🤗 Transformers (Library)**: A library from Hugging Face for building and using advanced AI models based on the Transformer architecture, in areas such as Natural Language Processing (NLP), computer vision, and audio processing.
+*   **NLP (Natural Language Processing)**: A subfield of Artificial Intelligence (AI) that enables computers to understand, interpret, and generate human language.
+*   **Flair (Library)**: An open-source framework for NLP tasks.
+*   **AllenNLP (Library)**: An open-source deep learning library for NLP research and applications.
+*   **Speech (Speech Processing)**: The field concerned with enabling computers to understand and process audio data (speech).
+*   **Asteroid (Library)**: A PyTorch library for speech enhancement and source separation.
+*   **pyannote (Library)**: An open-source toolkit for audio and speech analysis.
+*   **Vision (Computer Vision)**: The field concerned with enabling computers to understand and process information from images and videos.
+*   **timm (Library)**: A library that collects state-of-the-art image models (for example, for image classification) in PyTorch.
+*   **Git Repository**: A place that stores a project's files and the history of changes to them, using the Git version control system.
+*   **Versioning**: The process of tracking and managing different versions of files or projects.
+*   **Reproducibility**: Being able to obtain the same results again from a given set of code and data.
+*   **Community**: The users, developers, and researchers from the AI/ML field who gather around Hugging Face.
+*   **Inference API (Application Programming Interface)**: An interface to a service that uses a trained AI model to produce predictions or outputs from input data.
+*   **Deploy**: Installing software on a system so that it is available for use.
+*   **Widgets**: Interactive components of a Graphical User Interface (GUI), such as an input box or a button.
+*   **Private Models**: Models that only designated users can access.
\ No newline at end of file
diff --git a/chapters/my/chapter4/2.mdx b/chapters/my/chapter4/2.mdx
new file mode 100644
index 000000000..126933f99
--- /dev/null
+++ b/chapters/my/chapter4/2.mdx
@@ -0,0 +1,120 @@
+<FrameworkSwitchCourse {fw} />
+
+# Using pretrained models[[using-pretrained-models]]
+
+{#if fw === 'pt'}
+
+<CourseFloatingBanner chapter={4}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter4/section2_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter4/section2_pt.ipynb"},
+]} />
+
+{:else}
+
+<CourseFloatingBanner chapter={4}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter4/section2_tf.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter4/section2_tf.ipynb"},
+]} />
+
+{/if}
+
+The Model Hub makes selecting the appropriate model simple, so that using it in any downstream library can be done in a few lines of code. Let's take a look at how to actually use one of these models, and how to contribute back to the community.
+
+Let's say we're looking for a French-based model that can perform mask filling.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/camembert.gif" alt="Selecting the Camembert model." width="80%"/>
+</div>
+
+We select the `camembert-base` checkpoint to try it out. The identifier `camembert-base` is all we need to start using it! As you've seen in previous chapters, we can instantiate it using the `pipeline()` function:
+
+```py
+from transformers import pipeline
+
+camembert_fill_mask = pipeline("fill-mask", model="camembert-base")
+results = camembert_fill_mask("Le camembert est <mask> :)")
+```
+
+```python out
+[
+  {'sequence': 'Le camembert est délicieux :)', 'score': 0.49091005325317383, 'token': 7200, 'token_str': 'délicieux'}, 
+  {'sequence': 'Le camembert est excellent :)', 'score': 0.1055697426199913, 'token': 2183, 'token_str': 'excellent'}, 
+  {'sequence': 'Le camembert est succulent :)', 'score': 0.03453313186764717, 'token': 26202, 'token_str': 'succulent'}, 
+  {'sequence': 'Le camembert est meilleur :)', 'score': 0.0330314114689827, 'token': 528, 'token_str': 'meilleur'}, 
+  {'sequence': 'Le camembert est parfait :)', 'score': 0.03007650189101696, 'token': 1654, 'token_str': 'parfait'}
+]
+```
+
+As you can see, loading a model within a pipeline is extremely simple. The only thing you need to watch out for is that the chosen checkpoint is suitable for the task it's going to be used for. For example, here we are loading the `camembert-base` checkpoint in the `fill-mask` pipeline, which is completely fine. But if we were to load this checkpoint in the `text-classification` pipeline, the results would not make any sense because the head of `camembert-base` is not suitable for this task! We recommend using the task selector in the Hugging Face Hub interface in order to select the appropriate checkpoints:
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/tasks.png" alt="The task selector on the web interface." width="80%"/>
+</div>
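+
+The same suitability check can also be done from code. The sketch below assumes that the `huggingface_hub` package is installed and that the checkpoint's Hub metadata declares a task tag. The helper names `is_suitable` and `checkpoint_supports` are hypothetical, and the comparison is deliberately naive: it does not resolve pipeline aliases such as `sentiment-analysis`.
+
+```py
+def is_suitable(pipeline_tag, task):
+    """Return True when the task tag declared on the Hub matches the task."""
+    return pipeline_tag == task
+
+
+def checkpoint_supports(repo_id, task):
+    """Look up a checkpoint's declared task on the Hub (needs network access)."""
+    # Imported lazily so the pure helper above works without the package.
+    from huggingface_hub import model_info
+
+    return is_suitable(model_info(repo_id).pipeline_tag, task)
+
+
+# Example call (requires network access, so no output is shown here):
+# checkpoint_supports("camembert-base", "fill-mask")
+```
+
+A checkpoint that declares no task tag yields `None` for `pipeline_tag`, so this sketch reports it as unsuitable for every task.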
+
+You can also instantiate the checkpoint using the model architecture directly:
+
+{#if fw === 'pt'}
+```py
+from transformers import CamembertTokenizer, CamembertForMaskedLM
+
+tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
+model = CamembertForMaskedLM.from_pretrained("camembert-base")
+```
+
+However, we recommend using the [`Auto*` classes](https://huggingface.co/transformers/model_doc/auto?highlight=auto#auto-classes) instead, as these are by design architecture-agnostic. While the previous code sample limits users to checkpoints loadable in the CamemBERT architecture, using the `Auto*` classes makes switching checkpoints simple:
+
+```py
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("camembert-base")
+model = AutoModelForMaskedLM.from_pretrained("camembert-base")
+```
+{:else}
+```py
+from transformers import CamembertTokenizer, TFCamembertForMaskedLM
+
+tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
+model = TFCamembertForMaskedLM.from_pretrained("camembert-base")
+```
+
+However, we recommend using the [`TFAuto*` classes](https://huggingface.co/transformers/model_doc/auto?highlight=auto#auto-classes) instead, as these are by design architecture-agnostic. While the previous code sample limits users to checkpoints loadable in the CamemBERT architecture, using the `TFAuto*` classes makes switching checkpoints simple:
+
+```py
+from transformers import AutoTokenizer, TFAutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("camembert-base")
+model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
+```
+{/if}
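+
+To make "architecture-agnostic" more concrete, here is a toy sketch of the dispatch idea: read the `model_type` recorded in a checkpoint's configuration and look up the matching class in a registry. This only illustrates the concept and is not the actual 🤗 Transformers implementation. The `Toy*` classes, `MASKED_LM_REGISTRY`, and `auto_masked_lm_class` are all hypothetical names.
+
+```py
+class ToyCamembertForMaskedLM:
+    model_type = "camembert"
+
+
+class ToyBertForMaskedLM:
+    model_type = "bert"
+
+
+# Registry from the model type stored in a config to a concrete class.
+MASKED_LM_REGISTRY = {
+    cls.model_type: cls for cls in (ToyCamembertForMaskedLM, ToyBertForMaskedLM)
+}
+
+
+def auto_masked_lm_class(config):
+    """Pick the concrete class from the `model_type` entry of a config dict."""
+    model_type = config["model_type"]
+    try:
+        return MASKED_LM_REGISTRY[model_type]
+    except KeyError:
+        raise ValueError(f"Unrecognized model_type: {model_type!r}") from None
+
+
+print(auto_masked_lm_class({"model_type": "camembert"}).__name__)
+# ToyCamembertForMaskedLM
+```
+
+With this design, switching checkpoints only changes the configuration that is read. The calling code stays the same, which is the convenience the `Auto*` classes provide.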
+
+<Tip>
+When using a pretrained model, make sure to check how it was trained, on which datasets, and what its limits and biases are. All of this information should be indicated on its model card.
+</Tip>
+
+## Glossary
+
+*   **Model Hub**: Refers to the Hugging Face Hub, the central platform for discovering, sharing, and using AI models.
+*   **Downstream Library**: A library that is built on top of other libraries or makes use of them.
+*   **Community**: The users, developers, and researchers from the AI/ML field who gather around Hugging Face.
+*   **French-based Model**: An AI model trained on French-language text.
+*   **Mask Filling**: A Natural Language Processing (NLP) task in which the model predicts and fills in the hidden (masked) words of a sentence.
+*   **`camembert-base`**: The checkpoint identifier for the base version of the CamemBERT model.
+*   **Checkpoint**: A snapshot of a model's weights and other settings (configuration) saved at a given point in time.
+*   **Identifier**: A name or code used to refer to one specific thing.
+*   **`pipeline()` Function**: A function in the Hugging Face Transformers library that makes models easy to use for specific tasks (for example, text classification or text generation).
+*   **Instantiate**: Creating an object from a class.
+*   **`fill-mask` Pipeline**: A pipeline designed for the mask filling task.
+*   **`text-classification` Pipeline**: A pipeline designed for the text classification task.
+*   **Head (Model Head)**: An additional component (one or a few layers) placed on top of the main body of a Transformer model that adapts the model's outputs to a specific task. For example, a sequence classification head produces logits.
+*   **Model Architecture**: The design that describes a model's layers and how they are connected.
+*   **`CamembertTokenizer`**: The tokenizer class built specifically for the CamemBERT model.
+*   **`CamembertForMaskedLM`**: The CamemBERT model class for Masked Language Modeling (MLM).
+*   **`Auto*` Classes (e.g., `AutoTokenizer`, `AutoModelForMaskedLM`)**: Classes in the Hugging Face Transformers library that automatically select the appropriate tokenizer or model class for a given checkpoint name. They are architecture-agnostic.
+*   **Architecture-agnostic**: Able to work without specific knowledge of the model's underlying architecture. This makes it easy to switch between different architectures.
+*   **`TFAuto*` Classes (e.g., `TFAutoModelForMaskedLM`)**: The counterparts of the `Auto*` classes for the TensorFlow framework, with the same functionality.
+*   **Model Card**: The information page that accompanies each model on the Hugging Face Hub. It describes how the model was trained, which datasets were used, its limitations and biases, and how to use it.
+*   **Biases**: Systematic skews in a model's predictions that stem from its training data or from mathematical causes.
\ No newline at end of file
diff --git a/chapters/my/chapter4/3.mdx b/chapters/my/chapter4/3.mdx
new file mode 100644
index 000000000..744dacbaf
--- /dev/null
+++ b/chapters/my/chapter4/3.mdx
@@ -0,0 +1,734 @@
+<FrameworkSwitchCourse {fw} />
+
+# Sharing pretrained models[[sharing-pretrained-models]]
+
+{#if fw === 'pt'}
+
+<CourseFloatingBanner chapter={4}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter4/section3_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter4/section3_pt.ipynb"},
+]} />
+
+{:else}
+
+<CourseFloatingBanner chapter={4}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter4/section3_tf.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter4/section3_tf.ipynb"},
+]} />
+
+{/if}
+
+In the steps below, we'll take a look at the easiest ways to share pretrained models on the 🤗 Hub. There are tools and utilities available that make it simple to share and update models directly on the Hub, which we will explore below.
+
+<Youtube id="9yY3RB_GSPM"/>
+
+We encourage all users who train models to contribute by sharing them with the community. Sharing models, even ones trained on very specific datasets, helps others by saving them time and compute resources and by giving them access to useful trained artifacts. In turn, you can benefit from the work that others have done!
+
+There are three ways to go about creating new model repositories:
+
+-   Using the `push_to_hub` API
+-   Using the `huggingface_hub` Python library
+-   Using the web interface
+
+Once you've created a repository, you can upload files to it via git and git-lfs. We'll walk you through creating model repositories and uploading files to them in the following sections.
+
+## Using the `push_to_hub` API[[using-the-pushtohub-api]]
+
+{#if fw === 'pt'}
+
+<Youtube id="Zh0FfmVrKX0"/>
+
+{:else}
+
+<Youtube id="pUh5cGmNV8Y"/>
+
+{/if}
+
+The simplest way to upload files to the Hub is by leveraging the `push_to_hub` API.
+
+Before going further, you'll need to generate an authentication token so that the `huggingface_hub` API knows who you are and which namespaces you have write access to. Make sure you are in an environment where you have `transformers` installed (see [Setup](/course/chapter0)). If you are in a notebook, you can use the following function to log in:
+
+```python
+from huggingface_hub import notebook_login
+
+notebook_login()
+```
+
+In a terminal, you can run:
+
+```bash
+huggingface-cli login
+```
+
+In both cases, you will be prompted for your credentials. Older versions of `huggingface_hub` asked for the username and password you use to log in to the Hub, while recent versions ask for a user access token (with write permission) generated from your Hub account settings. If you do not have a Hub profile yet, you should create one [here](https://huggingface.co/join).
+
+Great! You now have your authentication token stored in your cache folder. Let's create some repositories!
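+
+For scripts or CI jobs where an interactive prompt is not available, logging in can also be done programmatically. The sketch below assumes the token has been exported in an environment variable named `HF_TOKEN` and that `huggingface_hub` is installed. The helper names `read_token` and `login_from_env` are hypothetical.
+
+```py
+import os
+
+
+def read_token(env=os.environ, name="HF_TOKEN"):
+    """Return the token stored in the environment, or None if it is missing."""
+    token = env.get(name, "").strip()
+    return token or None
+
+
+def login_from_env():
+    """Log in to the Hub without a prompt (requires `huggingface_hub`)."""
+    # Imported lazily so read_token() can be used without the package.
+    from huggingface_hub import login
+
+    token = read_token()
+    if token is None:
+        raise RuntimeError("Set the HF_TOKEN environment variable first.")
+    login(token=token)
+```
+
+Keeping the token in an environment variable rather than in the source file avoids committing it to a repository by accident.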
+
+{#if fw === 'pt'}
+
+If you have used the `Trainer` API to train a model, the easiest way to upload it to the Hub is to set `push_to_hub=True` when you define your `TrainingArguments`:
+
+```py
+from transformers import TrainingArguments
+
+training_args = TrainingArguments(
+    "bert-finetuned-mrpc", save_strategy="epoch", push_to_hub=True
+)
+```
+
+When you call `trainer.train()`, the `Trainer` will upload your model to the Hub each time it is saved (here, every epoch), in a repository in your namespace. That repository will be named after the output directory you picked (here `bert-finetuned-mrpc`), but you can choose a different name with `hub_model_id = "a_different_name"`.
+
+To upload your model to an organization you are a member of, just pass it with `hub_model_id = "my_organization/my_repo_name"`.
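+
+As a sketch of how these options combine, the hypothetical helper below assembles the keyword arguments described above and only imports `transformers` when the arguments are actually built. The helper names `hub_training_kwargs` and `build_training_args` are illustrative. `output_dir`, `save_strategy`, `push_to_hub`, and `hub_model_id` are the `TrainingArguments` parameters already used in this section.
+
+```py
+def hub_training_kwargs(output_dir, repo_name=None, organization=None):
+    """Assemble TrainingArguments keyword arguments for pushing to the Hub."""
+    kwargs = {
+        "output_dir": output_dir,
+        "save_strategy": "epoch",
+        "push_to_hub": True,
+    }
+    if repo_name is not None or organization is not None:
+        # Fall back to the output directory name when no repo name is given.
+        name = repo_name or output_dir
+        kwargs["hub_model_id"] = f"{organization}/{name}" if organization else name
+    return kwargs
+
+
+def build_training_args(**options):
+    """Create the TrainingArguments object (requires `transformers`)."""
+    from transformers import TrainingArguments
+
+    return TrainingArguments(**hub_training_kwargs(**options))
+
+
+kwargs = hub_training_kwargs("bert-finetuned-mrpc", organization="my_organization")
+print(kwargs["hub_model_id"])  # my_organization/bert-finetuned-mrpc
+```
+
+When neither `repo_name` nor `organization` is given, `hub_model_id` is left unset, so the repository name defaults to the output directory as described above.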
+
+Once your training is finished, you should do a final `trainer.push_to_hub()` to upload the last version of your model. It will also generate a model card with all the relevant metadata, reporting the hyperparameters used and the evaluation results. Here is an example of the content you might find in such a model card:
+
+<div class="flex justify-center">
+  <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/model_card.png" alt="An example of an auto-generated model card." width="100%"/>
+</div>
+
+{:else}
+
+သင် model ကို train လုပ်ဖို့ Keras ကို အသုံးပြုနေတယ်ဆိုရင်၊ ဒါကို Hub ကို upload လုပ်ဖို့ အလွယ်ကူဆုံးနည်းလမ်းက `model.fit()` ကို ခေါ်တဲ့အခါ `PushToHubCallback` တစ်ခုကို ထည့်ပေးဖို့ပါပဲ။
+
+```py
+from transformers import PushToHubCallback
+
+callback = PushToHubCallback(
+    "bert-finetuned-mrpc", save_strategy="epoch", tokenizer=tokenizer
+)
+```
+
+ပြီးရင် သင့် `model.fit()` ကို ခေါ်တဲ့အခါ `callbacks=[callback]` ကို ထည့်သွင်းသင့်ပါတယ်။ callback က သင့် model ကို သိမ်းဆည်းတိုင်း (ဒီနေရာမှာတော့ epoch တိုင်း) သင့် namespace ထဲက repository တစ်ခုမှာ Hub ကို upload လုပ်ပါလိမ့်မယ်။ အဲဒီ repository ကို သင်ရွေးချယ်ခဲ့တဲ့ output directory (ဒီနေရာမှာ `bert-finetuned-mrpc`) အတိုင်း နာမည်ပေးထားပါလိမ့်မယ်။ ဒါပေမယ့် `hub_model_id = "a_different_name"` နဲ့ မတူညီတဲ့ နာမည်တစ်ခုကို ရွေးချယ်နိုင်ပါတယ်။
+
+သင်အဖွဲ့ဝင်ဖြစ်တဲ့ organization တစ်ခုကို သင့် model ကို upload လုပ်ဖို့အတွက် `hub_model_id = "my_organization/my_repo_name"` နဲ့ ထည့်ပေးလိုက်ရုံပါပဲ။
+
+{/if}
+
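The naming rule described above (the repository takes the name of the output directory unless `hub_model_id` is given) can be sketched in a few lines. This is only an illustration of the rule as stated in the text, not the library's actual implementation; `resolve_repo_name` is a hypothetical helper introduced here.

```python
from pathlib import Path
from typing import Optional


def resolve_repo_name(output_dir: str, hub_model_id: Optional[str] = None) -> str:
    """Hypothetical helper: pick the repository name as the text describes it."""
    if hub_model_id is not None:
        # An explicit ID wins; it may include an organization namespace.
        return hub_model_id
    # Otherwise the repository is named after the output directory.
    return Path(output_dir).name


default_name = resolve_repo_name("bert-finetuned-mrpc")
org_name = resolve_repo_name(
    "bert-finetuned-mrpc", hub_model_id="my_organization/my_repo_name"
)
print(default_name, org_name)
```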
+အနိမ့်အဆင့်မှာဆိုရင် Model Hub ကို model တွေ၊ tokenizers တွေနဲ့ configuration objects တွေမှာရှိတဲ့ ၎င်းတို့ရဲ့ `push_to_hub()` method ကနေ တိုက်ရိုက် ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။ ဒီ method က repository ဖန်တီးတာနဲ့ model နဲ့ tokenizer files တွေကို repository ကို တိုက်ရိုက် push လုပ်တာ နှစ်ခုလုံးကို လုပ်ဆောင်ပေးပါတယ်။ အောက်မှာ ကျွန်တော်တို့တွေ့ရမယ့် API နဲ့ မတူဘဲ၊ manual handling လုံးဝမလိုအပ်ပါဘူး။
+
+ဒါက ဘယ်လိုအလုပ်လုပ်တယ်ဆိုတာ သိဖို့အတွက်၊ ပထမဆုံး model တစ်ခုနဲ့ tokenizer တစ်ခုကို initialize လုပ်ကြရအောင်...
+
+{#if fw === 'pt'}
+```py
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+checkpoint = "camembert-base"
+
+model = AutoModelForMaskedLM.from_pretrained(checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+```
+{:else}
+```py
+from transformers import TFAutoModelForMaskedLM, AutoTokenizer
+
+checkpoint = "camembert-base"
+
+model = TFAutoModelForMaskedLM.from_pretrained(checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+```
+{/if}
+
+ဒီ models တွေနဲ့ သင်ဘာမဆိုလုပ်ဆောင်နိုင်ပါတယ်၊ tokenizer ကို tokens တွေထည့်တာ၊ model ကို train လုပ်တာ၊ fine-tune လုပ်တာ။ ရရှိလာတဲ့ model, weights, နဲ့ tokenizer တွေနဲ့ သင်ကျေနပ်သွားတာနဲ့၊ `model` object မှာ တိုက်ရိုက်ရရှိနိုင်တဲ့ `push_to_hub()` method ကို အကျိုးရှိရှိ အသုံးပြုနိုင်ပါတယ်။
+
+```py
+model.push_to_hub("dummy-model")
+```
+
+ဒါက သင့် profile မှာ `dummy-model` repository အသစ်တစ်ခုကို ဖန်တီးပေးပြီး သင့် model files တွေနဲ့ ဖြည့်ပေးပါလိမ့်မယ်။
+tokenizer နဲ့လည်း အတူတူလုပ်ပါ၊ ဒါမှ files အားလုံးဟာ ဒီ repository မှာ ရရှိနိုင်ပါလိမ့်မယ်။
+
+```py
+tokenizer.push_to_hub("dummy-model")
+```
+
+သင် organization တစ်ခုမှာ အဖွဲ့ဝင်ဆိုရင်၊ အဲဒီ organization ရဲ့ namespace ကို upload လုပ်ဖို့ `organization` argument ကို သတ်မှတ်ပေးလိုက်ရုံပါပဲ။
+
+```py
+tokenizer.push_to_hub("dummy-model", organization="huggingface")
+```
+
+သင် သီးခြား Hugging Face token တစ်ခုကို အသုံးပြုချင်တယ်ဆိုရင်၊ `push_to_hub()` method ကိုလည်း သတ်မှတ်ပေးနိုင်ပါတယ်။
+
+```py
+tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token="<TOKEN>")
+```
+
+အခု သင်အခုမှ upload လုပ်ထားတဲ့ model ကို ရှာဖို့ Model Hub: *https://huggingface.co/user-or-organization/dummy-model* ကို သွားပါ။
+
+"Files and versions" tab ကို နှိပ်ပြီး အောက်ပါ screenshot မှာ မြင်ရတဲ့ files တွေကို သင်တွေ့ရပါလိမ့်မယ်။
+
+{#if fw === 'pt'}
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/push_to_hub_dummy_model.png" alt="Dummy model containing both the tokenizer and model files." width="80%"/>
+</div>
+{:else}
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/push_to_hub_dummy_model_tf.png" alt="Dummy model containing both the tokenizer and model files." width="80%"/>
+</div>
+{/if}
+
+<Tip>
+
+✏️ **စမ်းသပ်ကြည့်ပါ!** `bert-base-cased` checkpoint နဲ့ ဆက်စပ်နေတဲ့ model နဲ့ tokenizer ကို ယူပြီး `push_to_hub()` method ကို အသုံးပြုပြီး သင့် namespace ထဲက repo တစ်ခုကို upload လုပ်ပါ။ repo ဟာ သင့် page မှာ မှန်ကန်စွာ ပေါ်လာခြင်းရှိမရှိ နှစ်ကြိမ်စစ်ဆေးပြီးမှ ဖျက်ပစ်ပါ။
+
+</Tip>
+
+သင်တွေ့ခဲ့ရတဲ့အတိုင်း၊ `push_to_hub()` method က arguments အများအပြားကို လက်ခံပါတယ်။ ဒါကြောင့် သီးခြား repository ဒါမှမဟုတ် organization namespace ကို upload လုပ်တာ၊ ဒါမှမဟုတ် မတူညီတဲ့ API token တစ်ခုကို အသုံးပြုတာတွေ လုပ်ဆောင်နိုင်ပါတယ်။ ဖြစ်နိုင်ခြေရှိတဲ့ အရာတွေအကြောင်း သိရှိဖို့ [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing) မှာ တိုက်ရိုက်ရရှိနိုင်တဲ့ method specification ကို ကြည့်ရှုဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။
+
+The `push_to_hub()` method is backed by the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) Python package, which offers a direct API to the Hugging Face Hub. It is integrated into 🤗 Transformers and several other machine learning libraries, such as [`allennlp`](https://github.com/allenai/allennlp). Although this chapter focuses on the 🤗 Transformers integration, integrating it into your own code or library is simple.
+
+သင်အခုမှ ဖန်တီးခဲ့တဲ့ repository ကို files တွေ ဘယ်လို upload လုပ်ရမယ်ဆိုတာ သိဖို့ နောက်ဆုံးအပိုင်းကို သွားလိုက်ပါ။
+
+## `huggingface_hub` Python library ကို အသုံးပြုခြင်း[[using-the-huggingfacehub-python-library]]
+
+`huggingface_hub` Python library ဟာ model တွေနဲ့ datasets hub တွေအတွက် ကိရိယာအစုံအလင်ကို ပေးတဲ့ package တစ်ခု ဖြစ်ပါတယ်။ Hub ပေါ်က repositories တွေအကြောင်း အချက်အလက်ရယူတာ၊ ၎င်းတို့ကို စီမံခန့်ခွဲတာလိုမျိုး အသုံးများတဲ့ လုပ်ငန်းတွေအတွက် ရိုးရှင်းတဲ့ methods တွေနဲ့ classes တွေကို ပေးထားပါတယ်။ repositories တွေရဲ့ content ကို စီမံခန့်ခွဲဖို့နဲ့ သင်ရဲ့ project တွေနဲ့ libraries တွေထဲမှာ Hub ကို ပေါင်းစပ်ဖို့အတွက် git ရဲ့ အပေါ်မှာ အလုပ်လုပ်တဲ့ ရိုးရှင်းတဲ့ APIs တွေကို ပေးထားပါတယ်။
+
+`push_to_hub` API ကို အသုံးပြုတာနဲ့ ဆင်တူစွာ၊ ဒါက သင့် API token ကို သင့် cache မှာ သိမ်းဆည်းထားဖို့ လိုအပ်ပါလိမ့်မယ်။ ဒါကို လုပ်ဆောင်ဖို့အတွက်၊ ယခင်အပိုင်းမှာ ဖော်ပြခဲ့တဲ့အတိုင်း CLI ကနေ `login` command ကို အသုံးပြုဖို့ လိုအပ်ပါလိမ့်မယ် (Google Colab မှာ run နေတယ်ဆိုရင် ဒီ commands တွေကို `!` character နဲ့ အရှေ့ကနေ ထည့်သွင်းဖို့ သေချာပါစေ)။
+
+```bash
+huggingface-cli login
+```
+
+`huggingface_hub` package က ကျွန်တော်တို့ရဲ့ ရည်ရွယ်ချက်အတွက် အသုံးဝင်တဲ့ methods တွေနဲ့ classes အများအပြားကို ပေးထားပါတယ်။ ပထမဆုံး၊ repository ဖန်တီးခြင်း၊ ဖျက်ပစ်ခြင်း စသည်တို့ကို စီမံခန့်ခွဲရန် methods အချို့ရှိပါတယ်။
+
+```python no-format
+from huggingface_hub import (
+    # User management
+    login,
+    logout,
+    whoami,
+
+    # Repository creation and management
+    create_repo,
+    delete_repo,
+    update_repo_visibility,
+
+    # And some methods to retrieve/change information about the content
+    list_models,
+    list_datasets,
+    list_metrics,
+    list_repo_files,
+    upload_file,
+    delete_file,
+)
+```
+
+In addition, it offers the very powerful `Repository` class for managing a local repository. We will explore these methods and that class in the next few sections to understand how to use them.
+
+`create_repo` method ကို အသုံးပြုပြီး Hub မှာ repository အသစ်တစ်ခု ဖန်တီးနိုင်ပါတယ်။
+
+```py
+from huggingface_hub import create_repo
+
+create_repo("dummy-model")
+```
+
+ဒါက သင့် namespace မှာ `dummy-model` repository ကို ဖန်တီးပေးပါလိမ့်မယ်။ သင်ကြိုက်နှစ်သက်ရင် `organization` argument ကို အသုံးပြုပြီး repository က ဘယ် organization နဲ့ သက်ဆိုင်တယ်ဆိုတာ သတ်မှတ်နိုင်ပါတယ်။
+
+```py
+from huggingface_hub import create_repo
+
+create_repo("dummy-model", organization="huggingface")
+```
+
+ဒါက သင်အဲဒီ organization မှာ အဖွဲ့ဝင်ဖြစ်တယ်ဆိုရင် `dummy-model` repository ကို `huggingface` namespace မှာ ဖန်တီးပေးပါလိမ့်မယ်။
+
+အသုံးဝင်နိုင်တဲ့ အခြား arguments တွေကတော့...
+
+-   `private`၊ repository ကို တခြားသူတွေ မြင်နိုင်မမြင်နိုင် သတ်မှတ်ဖို့။
+-   `token`၊ သင် cache မှာ သိမ်းထားတဲ့ token ကို သတ်မှတ်ထားတဲ့ token တစ်ခုနဲ့ override လုပ်ချင်ရင်။
+-   `repo_type`၊ model အစား `dataset` ဒါမှမဟုတ် `space` တစ်ခုကို ဖန်တီးချင်ရင်။ လက်ခံနိုင်တဲ့ တန်ဖိုးတွေကတော့ `"dataset"` နဲ့ `"space"` ဖြစ်ပါတယ်။
+
+repository ကို ဖန်တီးပြီးတာနဲ့၊ ကျွန်တော်တို့ files တွေ ထည့်သွင်းသင့်ပါတယ်။ ဒါကို ဘယ်လိုနည်းလမ်းသုံးခုနဲ့ ကိုင်တွယ်နိုင်မလဲဆိုတာ သိဖို့ နောက်အပိုင်းကို သွားလိုက်ပါ။
+
+## Web Interface ကို အသုံးပြုခြင်း[[using-the-web-interface]]
+
+web interface က Hub မှာ repositories တွေကို တိုက်ရိုက်စီမံခန့်ခွဲဖို့ ကိရိယာတွေကို ပေးထားပါတယ်။ interface ကို အသုံးပြုပြီး၊ သင် repositories တွေ လွယ်လွယ်ကူကူ ဖန်တီးနိုင်တယ်၊ files တွေ ထည့်နိုင်တယ် (ကြီးမားတဲ့ files တွေတောင်မှ!)၊ models တွေကို လေ့လာနိုင်တယ်၊ diffs တွေကို မြင်နိုင်တယ်၊ ပြီးတော့ တခြားအများကြီး လုပ်နိုင်ပါတယ်။
+
+repository အသစ်တစ်ခု ဖန်တီးဖို့ [huggingface.co/new](https://huggingface.co/new) ကို သွားပါ။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/new_model.png" alt="Page showcasing the model used for the creation of a new model repository." width="80%"/>
+</div>
+
+ပထမဆုံး၊ repository ရဲ့ ပိုင်ရှင်ကို သတ်မှတ်ပါ၊ ဒါက သင်ကိုယ်တိုင် ဒါမှမဟုတ် သင်နဲ့ ဆက်စပ်နေတဲ့ organization တစ်ခုခု ဖြစ်နိုင်ပါတယ်။ သင် organization တစ်ခုကို ရွေးချယ်ရင်၊ model ကို organization ရဲ့ page မှာ ဖော်ပြထားမှာဖြစ်ပြီး organization ရဲ့ အဖွဲ့ဝင်တိုင်းက repository ကို ပံ့ပိုးကူညီနိုင်ပါလိမ့်မယ်။
+
+နောက်တစ်ခုကတော့ သင့် model ရဲ့ နာမည်ကို ထည့်သွင်းပါ။ ဒါက repository ရဲ့ နာမည်လည်း ဖြစ်ပါလိမ့်မယ်။ နောက်ဆုံးအနေနဲ့၊ သင်ရဲ့ model ကို public ဒါမှမဟုတ် private ဖြစ်စေချင်လားဆိုတာ သတ်မှတ်နိုင်ပါတယ်။ Private models တွေကို အများပြည်သူမြင်ကွင်းကနေ ဝှက်ထားပါတယ်။
+
+သင့် model repository ကို ဖန်တီးပြီးတာနဲ့၊ အောက်ပါကဲ့သို့ page တစ်ခုကို သင်တွေ့ရပါလိမ့်မယ်။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/empty_model.png" alt="An empty model page after creating a new repository." width="80%"/>
+</div>
+
+ဒီနေရာမှာ သင့် model ကို လက်ခံထားမှာ ဖြစ်ပါတယ်။ ဒါကို စတင်ဖြည့်ဆည်းဖို့၊ web interface ကနေ README file တစ်ခုကို တိုက်ရိုက်ထည့်သွင်းနိုင်ပါတယ်။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/dummy_model.png" alt="The README file showing the Markdown capabilities." width="80%"/>
+</div>
+
+README file က Markdown နဲ့ ရေးထားပါတယ်၊ လွတ်လပ်စွာ ဖန်တီးနိုင်ပါတယ်။ ဒီအခန်းရဲ့ တတိယအပိုင်းက model card တစ်ခုတည်ဆောက်ခြင်းအတွက် ရည်ရွယ်ပါတယ်။ ဒါတွေက သင့် model ကို တန်ဖိုးရှိစေရာမှာ အရေးပါပါတယ်၊ ဘာလို့လဲဆိုတော့ ဒါတွေက သင့် model က ဘာတွေလုပ်နိုင်လဲဆိုတာကို တခြားသူတွေကို ပြောပြတဲ့ နေရာဖြစ်လို့ပါပဲ။
+
+"Files and versions" tab ကို သင်ကြည့်လိုက်ရင်၊ အဲဒီမှာ files အများကြီး မရှိသေးတာကို သင်တွေ့ရပါလိမ့်မယ်၊ သင်အခုမှ ဖန်တီးခဲ့တဲ့ *README.md* နဲ့ ကြီးမားတဲ့ files တွေကို ခြေရာခံထားတဲ့ *.gitattributes* file ပဲ ရှိပါသေးတယ်။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/files.png" alt="The 'Files and versions' tab only shows the .gitattributes and README.md files." width="80%"/>
+</div>
+
+files အသစ်တွေ ဘယ်လိုထည့်ရမလဲဆိုတာကို နောက်မှာ ကျွန်တော်တို့ လေ့လာသွားပါမယ်။
+
+## Model Files များကို Upload လုပ်ခြင်း[[uploading-the-model-files]]
+
+Hugging Face Hub မှာ files တွေကို စီမံခန့်ခွဲတဲ့ system က ပုံမှန် files တွေအတွက် git ကို အခြေခံထားပြီး၊ ကြီးမားတဲ့ files တွေအတွက် git-lfs ([Git Large File Storage](https://git-lfs.github.com/)) ကို အခြေခံထားပါတယ်။
+
+In the next sections, we go over three ways of uploading files to the Hub: the `upload_file` function and the `Repository` class, both from `huggingface_hub`, and plain git commands.
+
+### `upload_file` ချဉ်းကပ်မှု[[the-uploadfile-approach]]
+
+`upload_file` ကို အသုံးပြုတာက သင့် system မှာ git နဲ့ git-lfs install လုပ်ထားဖို့ မလိုအပ်ပါဘူး။ ဒါက HTTP POST requests တွေကို အသုံးပြုပြီး files တွေကို 🤗 Hub ကို တိုက်ရိုက် push လုပ်ပါတယ်။ ဒီနည်းလမ်းရဲ့ ကန့်သတ်ချက်ကတော့ 5GB ထက်ကြီးတဲ့ files တွေကို ကိုင်တွယ်နိုင်ခြင်း မရှိပါဘူး။
+သင့် files တွေက 5GB ထက်ကြီးတယ်ဆိုရင် အောက်မှာ အသေးစိတ်ဖော်ပြထားတဲ့ အခြားနည်းလမ်းနှစ်ခုကို လိုက်နာပါ။
+
+API ကို အောက်ပါအတိုင်း အသုံးပြုနိုင်ပါတယ်။
+
+```py
+from huggingface_hub import upload_file
+
+upload_file(
+    "<path_to_file>/config.json",
+    path_in_repo="config.json",
+    repo_id="<namespace>/dummy-model",
+)
+```
+
+This uploads the file `config.json` located at `<path_to_file>` to the root of the `dummy-model` repository, under the name `config.json`.
+
+အသုံးဝင်နိုင်တဲ့ အခြား arguments တွေကတော့...
+
+-   `token`၊ သင် cache မှာ သိမ်းထားတဲ့ token ကို သတ်မှတ်ထားတဲ့ token တစ်ခုနဲ့ override လုပ်ချင်ရင်။
+-   `repo_type`၊ model အစား `dataset` ဒါမှမဟုတ် `space` တစ်ခုကို upload လုပ်ချင်ရင်။ လက်ခံနိုင်တဲ့ တန်ဖိုးတွေကတော့ `"dataset"` နဲ့ `"space"` ဖြစ်ပါတယ်။
+
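The 5GB limit mentioned for `upload_file` can be checked before deciding how to upload. Below is a minimal sketch under stated assumptions: `choose_upload_approach` is a hypothetical helper, and "5GB" is read as 5 * 1000**3 bytes, which the text does not specify.

```python
import os
import tempfile

# Size limit stated in the text for the HTTP-based `upload_file` approach.
# Assumption: "5GB" is interpreted here as 5 * 1000**3 bytes.
UPLOAD_FILE_LIMIT_BYTES = 5 * 1000**3


def choose_upload_approach(size_in_bytes: int) -> str:
    """Hypothetical helper: suggest an approach from the file size alone."""
    if size_in_bytes > UPLOAD_FILE_LIMIT_BYTES:
        # Too large for `upload_file`: use the `Repository` class or plain git.
        return "git"
    return "upload_file"


# Usage: measure a small file on disk, then compare with a 6GB size.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as handle:
    handle.write(b'{"hidden_size": 768}')
    path = handle.name

small_file_choice = choose_upload_approach(os.path.getsize(path))
large_file_choice = choose_upload_approach(6 * 1000**3)
os.remove(path)
print(small_file_choice, large_file_choice)
```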
+### `Repository` Class[[the-repository-class]]
+
+`Repository` class က local repository တစ်ခုကို git-like နည်းလမ်းနဲ့ စီမံခန့်ခွဲပါတယ်။ ဒါက ကျွန်တော်တို့ လိုအပ်တဲ့ features အားလုံးကို ပေးဖို့အတွက် git နဲ့ ဖြစ်နိုင်တဲ့ ခက်ခဲတဲ့အချက်အများစုကို abstract လုပ်ထားပါတယ်။
+
+ဒီ class ကို အသုံးပြုဖို့အတွက် git နဲ့ git-lfs install လုပ်ထားဖို့ လိုအပ်ပါတယ်။ ဒါကြောင့် မစတင်ခင် git-lfs ကို install လုပ်ပြီး set up လုပ်ထားကြောင်း သေချာပါစေ ([ဒီနေရာမှာ](https://git-lfs.github.com/) install လုပ်နည်းလမ်းညွှန်တွေကို ကြည့်ပါ)။
+
+ကျွန်တော်တို့ အခုမှ ဖန်တီးခဲ့တဲ့ repository နဲ့ စတင်လုပ်ဆောင်ဖို့၊ remote repository ကို clone လုပ်ခြင်းဖြင့် local folder တစ်ခုထဲကို initialize လုပ်နိုင်ပါတယ်။
+
+```py
+from huggingface_hub import Repository
+
+repo = Repository("<path_to_dummy_folder>", clone_from="<namespace>/dummy-model")
+```
+
+ဒါက ကျွန်တော်တို့ရဲ့ working directory မှာ `<path_to_dummy_folder>` folder ကို ဖန်တီးလိုက်ပါတယ်။ ဒီ folder မှာ `.gitattributes` file တစ်ခုတည်းသာ ပါဝင်ပါတယ်၊ ဘာလို့လဲဆိုတော့ repository ကို `create_repo` ကနေ instantiate လုပ်တဲ့အခါ ဖန်တီးခဲ့တဲ့ တစ်ခုတည်းသော file ဖြစ်လို့ပါပဲ။
+
+ဒီအချိန်ကစပြီး၊ ကျွန်တော်တို့ဟာ ရိုးရာ git methods အများအပြားကို အကျိုးရှိရှိ အသုံးပြုနိုင်ပါတယ်။
+
+```py
+repo.git_pull()
+repo.git_add()
+repo.git_commit()
+repo.git_push()
+repo.git_tag()
+```
+
+နဲ့ တခြားအရာတွေ! ရရှိနိုင်တဲ့ methods အားလုံးကို ခြုံငုံသုံးသပ်ဖို့ [ဒီနေရာမှာ](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub#advanced-programmatic-repository-management) ရရှိနိုင်တဲ့ `Repository` documentation ကို ကြည့်ရှုဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။
+
+လက်ရှိမှာ၊ Hub ကို push လုပ်ချင်တဲ့ model တစ်ခုနဲ့ tokenizer တစ်ခု ရှိပါတယ်။ ကျွန်တော်တို့ repository ကို အောင်မြင်စွာ clone လုပ်ခဲ့ပြီးဖြစ်လို့၊ အဲဒီ repository ထဲမှာ files တွေကို သိမ်းဆည်းနိုင်ပါတယ်။
+
+နောက်ဆုံးပြောင်းလဲမှုတွေကို pull လုပ်ခြင်းဖြင့် ကျွန်တော်တို့ရဲ့ local clone က နောက်ဆုံးအခြေအနေဖြစ်ကြောင်း အရင်ဆုံး သေချာစေပါတယ်။
+
+```py
+repo.git_pull()
+```
+
+ဒါပြီးတာနဲ့ model နဲ့ tokenizer files တွေကို သိမ်းဆည်းပါတယ်။
+
+```py
+model.save_pretrained("<path_to_dummy_folder>")
+tokenizer.save_pretrained("<path_to_dummy_folder>")
+```
+
+`<path_to_dummy_folder>` မှာ model နဲ့ tokenizer files အားလုံး အခုပါဝင်နေပါပြီ။ files တွေကို staging area ကို ထည့်တာ၊ commit လုပ်တာနဲ့ Hub ကို push လုပ်တာလိုမျိုး ပုံမှန် git workflow ကို ကျွန်တော်တို့ လိုက်နာပါတယ်။
+
+```py
+repo.git_add()
+repo.git_commit("Add model and tokenizer files")
+repo.git_push()
+```
+
+ဂုဏ်ယူပါတယ်! သင် Hub မှာ ပထမဆုံး files တွေကို push လုပ်ခဲ့ပါပြီ။
+
+### Git-based ချဉ်းကပ်မှု[[the-git-based-approach]]
+
+This is the most bare-bones approach to uploading files: we use git and git-lfs directly. The previous approaches abstract away most of the difficulty, but the method below comes with a few caveats, so we will follow a more complex use case.
+
+This approach requires git and git-lfs to be installed, so make sure you have [git-lfs](https://git-lfs.github.com/) installed and set up before beginning (the linked page has installation instructions).
+
+ပထမဆုံး git-lfs ကို initialize လုပ်ခြင်းဖြင့် စတင်ပါ။
+
+```bash
+git lfs install
+```
+
+```bash
+Updated git hooks.
+Git LFS initialized.
+```
+
+ဒါပြီးတာနဲ့၊ ပထမအဆင့်ကတော့ သင်ရဲ့ model repository ကို clone လုပ်ဖို့ပါပဲ။
+
+```bash
+git clone https://huggingface.co/<namespace>/<your-model-id>
+```
+
+ကျွန်တော့် username က `lysandre` ဖြစ်ပြီး model name က `dummy` ကို အသုံးပြုခဲ့တဲ့အတွက်၊ ကျွန်တော့်အတွက် command က အောက်ပါအတိုင်း ဖြစ်ပါလိမ့်မယ်။
+
+```bash
+git clone https://huggingface.co/lysandre/dummy
+```
+
+ကျွန်တော့် working directory မှာ *dummy* လို့ခေါ်တဲ့ folder တစ်ခု အခုရှိပါပြီ။ folder ထဲကို `cd` လုပ်ပြီး contents တွေကို ကြည့်နိုင်ပါတယ်။
+
+```bash
+cd dummy && ls
+```
+
+```bash
+README.md
+```
+
+သင် Hugging Face Hub ရဲ့ `create_repo` method ကို အသုံးပြုပြီး repository ကို အခုမှ ဖန်တီးခဲ့တာဆိုရင်၊ ဒီ folder မှာ hidden `.gitattributes` file တစ်ခုတည်းသာ ပါဝင်သင့်ပါတယ်။ သင် web interface ကို အသုံးပြုပြီး repository တစ်ခု ဖန်တီးဖို့ ယခင်အပိုင်းက ညွှန်ကြားချက်တွေကို လိုက်နာခဲ့တယ်ဆိုရင်တော့၊ folder မှာ hidden `.gitattributes` file နဲ့အတူ *README.md* file တစ်ခု ပါဝင်သင့်ပါတယ်။
+
+Adding a regular-sized file, such as a configuration file, a vocabulary file, or essentially any file under a few megabytes, is done exactly as in any git-based system. Larger files, however, must be registered through git-lfs before they can be pushed to *huggingface.co*.
+
+ကျွန်တော်တို့ရဲ့ dummy repository ကို commit လုပ်ချင်တဲ့ model တစ်ခုနဲ့ tokenizer တစ်ခုကို generate လုပ်ဖို့ Python ကို ခဏပြန်သွားကြရအောင်...
+
+{#if fw === 'pt'}
+```py
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+checkpoint = "camembert-base"
+
+model = AutoModelForMaskedLM.from_pretrained(checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+# Do whatever with the model, train it, fine-tune it...
+
+model.save_pretrained("<path_to_dummy_folder>")
+tokenizer.save_pretrained("<path_to_dummy_folder>")
+```
+{:else}
+```py
+from transformers import TFAutoModelForMaskedLM, AutoTokenizer
+
+checkpoint = "camembert-base"
+
+model = TFAutoModelForMaskedLM.from_pretrained(checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+# Do whatever with the model, train it, fine-tune it...
+
+model.save_pretrained("<path_to_dummy_folder>")
+tokenizer.save_pretrained("<path_to_dummy_folder>")
+```
+{/if}
+
+model နဲ့ tokenizer artifacts အချို့ကို သိမ်းဆည်းပြီးပြီဆိုတော့ *dummy* folder ကို ထပ်ကြည့်ကြရအောင်...
+
+```bash
+ls
+```
+
+{#if fw === 'pt'}
+```bash
+config.json  pytorch_model.bin  README.md  sentencepiece.bpe.model  special_tokens_map.json  tokenizer_config.json  tokenizer.json
+```
+
+သင် file sizes တွေကို (ဥပမာ- `ls -lh` နဲ့) ကြည့်လိုက်ရင် model state dict file (*pytorch_model.bin*) က 400 MB ကျော်ရှိတဲ့ တစ်ခုတည်းသော ခြားနားချက်ဖြစ်တာကို သင်တွေ့ရပါလိမ့်မယ်။
+
+{:else}
+```bash
+config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_model.h5  tokenizer_config.json  tokenizer.json
+```
+
+If you look at the file sizes (for example, with `ls -lh`), you should see that the model state dict file (*tf_model.h5*) is the only outlier, at more than 400 MB.
+
+{/if}
+
+<Tip>
+✏️ web interface ကနေ repository ကို ဖန်တီးတဲ့အခါ၊ *.gitattributes* file ကို *.bin* နဲ့ *.h5* လိုမျိုး သတ်မှတ်ထားတဲ့ extensions တွေပါတဲ့ files တွေကို ကြီးမားတဲ့ files တွေအဖြစ် သတ်မှတ်ဖို့ အလိုအလျောက် set up လုပ်ထားပါတယ်။ ဒါကြောင့် သင့်ဘက်ကနေ မည်သည့် setup မှ မလိုအပ်ဘဲ git-lfs က ၎င်းတို့ကို ခြေရာခံပါလိမ့်မယ်။
+</Tip>
+
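To make the role of *.gitattributes* concrete, here is a simplified sketch of how extension patterns decide which files are treated as large. It uses Python's `fnmatch`, which only approximates git's own pattern matching. The pattern list is an assumption: the text names *.bin* and *.h5*, and `*.model` is added here only so that `sentencepiece.bpe.model` is also covered. `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatch

# Assumed patterns: the text names *.bin and *.h5; *.model is an assumption
# made here so that sentencepiece.bpe.model is also treated as a large file.
LFS_PATTERNS = ["*.bin", "*.h5", "*.model"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file name matches any of the LFS patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


files = [
    "config.json",
    "pytorch_model.bin",
    "sentencepiece.bpe.model",
    "tokenizer.json",
]
tracked = [name for name in files if is_lfs_tracked(name)]
print(tracked)
```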
+အခု ကျွန်တော်တို့ဟာ ရိုးရာ Git repositories တွေနဲ့ ပုံမှန်လုပ်ဆောင်သလိုပဲ ဆက်လက်လုပ်ဆောင်နိုင်ပါပြီ။ `git add` command ကို အသုံးပြုပြီး files အားလုံးကို Git ရဲ့ staging environment ထဲကို ထည့်နိုင်ပါတယ်။
+
+```bash
+git add .
+```
+
+ပြီးရင် လက်ရှိ staged လုပ်ထားတဲ့ files တွေကို ကြည့်နိုင်ပါတယ်။
+
+```bash
+git status
+```
+
+{#if fw === 'pt'}
+```bash
+On branch main
+Your branch is up to date with 'origin/main'.
+
+Changes to be committed:
+  (use "git restore --staged <file>..." to unstage)
+	modified:   .gitattributes
+	new file:   config.json
+	new file:   pytorch_model.bin
+	new file:   sentencepiece.bpe.model
+	new file:   special_tokens_map.json
+	new file:   tokenizer.json
+	new file:   tokenizer_config.json
+```
+{:else}
+```bash
+On branch main
+Your branch is up to date with 'origin/main'.
+
+Changes to be committed:
+  (use "git restore --staged <file>..." to unstage)
+	modified:   .gitattributes
+	new file:   config.json
+	new file:   sentencepiece.bpe.model
+	new file:   special_tokens_map.json
+	new file:   tf_model.h5
+	new file:   tokenizer.json
+	new file:   tokenizer_config.json
+```
+{/if}
+
+အလားတူပဲ၊ git-lfs က မှန်ကန်တဲ့ files တွေကို ခြေရာခံနေခြင်းရှိမရှိ ၎င်းရဲ့ `status` command ကို အသုံးပြုပြီး သေချာအောင် လုပ်နိုင်ပါတယ်။
+
+```bash
+git lfs status
+```
+
+{#if fw === 'pt'}
+```bash
+On branch main
+Objects to be pushed to origin/main:
+
+
+Objects to be committed:
+
+	config.json (Git: bc20ff2)
+	pytorch_model.bin (LFS: 35686c2)
+	sentencepiece.bpe.model (LFS: 988bc5a)
+	special_tokens_map.json (Git: cb23931)
+	tokenizer.json (Git: 851ff3e)
+	tokenizer_config.json (Git: f0f7783)
+
+Objects not staged for commit:
+
+
+```
+
+files အားလုံးမှာ `Git` ကို handler အဖြစ်ရှိနေတာကို ကျွန်တော်တို့ တွေ့နိုင်ပါတယ်၊ *pytorch_model.bin* နဲ့ *sentencepiece.bpe.model* တို့မှလွဲ၍ `LFS` ကို handler အဖြစ်ရှိနေပါတယ်။ ကောင်းပါပြီ!
+
+{:else}
+```bash
+On branch main
+Objects to be pushed to origin/main:
+
+
+Objects to be committed:
+
+	config.json (Git: bc20ff2)
+	sentencepiece.bpe.model (LFS: 988bc5a)
+	special_tokens_map.json (Git: cb23931)
+	tf_model.h5 (LFS: 86fce29)
+	tokenizer.json (Git: 851ff3e)
+	tokenizer_config.json (Git: f0f7783)
+
+Objects not staged for commit:
+
+
+```
+
+We can see that all files have `Git` as a handler, except *tf_model.h5* and *sentencepiece.bpe.model*, which have `LFS`. Great!
+
+{/if}
+
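If you would rather check the handlers programmatically than by eye, the status lines can be parsed with a regular expression. This is a rough sketch that assumes the `name (Handler: hash)` layout shown in the output above; `handlers_by_file` is a hypothetical helper, and the sample lines are the entries common to both variants of that output.

```python
import re

# Sample lines taken from the `git lfs status` output shown above
# (only the entries common to the PyTorch and TensorFlow variants).
STATUS_LINES = [
    "config.json (Git: bc20ff2)",
    "sentencepiece.bpe.model (LFS: 988bc5a)",
    "special_tokens_map.json (Git: cb23931)",
    "tokenizer.json (Git: 851ff3e)",
    "tokenizer_config.json (Git: f0f7783)",
]

# One entry looks like: <file name> (<Git|LFS>: <short hash>)
ENTRY = re.compile(r"^\s*(?P<name>\S+) \((?P<handler>Git|LFS): [0-9a-f]+\)\s*$")


def handlers_by_file(lines):
    """Map each file name to the handler ("Git" or "LFS") reported for it."""
    result = {}
    for line in lines:
        match = ENTRY.match(line)
        if match:  # headers and blank lines do not match and are skipped
            result[match.group("name")] = match.group("handler")
    return result


handlers = handlers_by_file(STATUS_LINES)
lfs_files = sorted(name for name, handler in handlers.items() if handler == "LFS")
print(lfs_files)
```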
+နောက်ဆုံးအဆင့်တွေဖြစ်တဲ့ commit လုပ်ခြင်းနဲ့ *huggingface.co* remote repository ကို push လုပ်ခြင်းဆီ ဆက်သွားကြရအောင်...
+
+```bash
+git commit -m "First model version"
+```
+
+{#if fw === 'pt'}
+```bash
+[main b08aab1] First model version
+ 7 files changed, 29027 insertions(+)
+ create mode 100644 config.json
+ create mode 100644 pytorch_model.bin
+ create mode 100644 sentencepiece.bpe.model
+ create mode 100644 special_tokens_map.json
+ create mode 100644 tokenizer.json
+ create mode 100644 tokenizer_config.json
+```
+{:else}
+```bash
+[main b08aab1] First model version
+ 6 files changed, 36 insertions(+)
+ create mode 100644 config.json
+ create mode 100644 sentencepiece.bpe.model
+ create mode 100644 special_tokens_map.json
+ create mode 100644 tf_model.h5
+ create mode 100644 tokenizer.json
+ create mode 100644 tokenizer_config.json
+```
+{/if}
+
+push လုပ်တာက သင့် internet connection ရဲ့ မြန်နှုန်းနဲ့ သင့် files တွေရဲ့ အရွယ်အစားပေါ်မူတည်ပြီး အချိန်အနည်းငယ် ကြာနိုင်ပါတယ်။
+
+```bash
+git push
+```
+
+```bash
+Uploading LFS objects: 100% (1/1), 433 MB | 1.3 MB/s, done.
+Enumerating objects: 11, done.
+Counting objects: 100% (11/11), done.
+Delta compression using up to 12 threads
+Compressing objects: 100% (9/9), done.
+Writing objects: 100% (9/9), 288.27 KiB | 6.27 MiB/s, done.
+Total 9 (delta 1), reused 0 (delta 0), pack-reused 0
+To https://huggingface.co/lysandre/dummy
+   891b41d..b08aab1  main -> main
+```
+
+{#if fw === 'pt'}
+ဒါပြီးတာနဲ့ model repository ကို ကျွန်တော်တို့ကြည့်လိုက်ရင်၊ အခုမှ ထည့်သွင်းထားတဲ့ files တွေအားလုံးကို ကျွန်တော်တို့ တွေ့မြင်နိုင်ပါပြီ။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/full_model.png" alt="The 'Files and versions' tab now contains all the recently uploaded files." width="80%"/>
+</div>
+
+UI (User Interface) က သင့်ကို model files တွေနဲ့ commits တွေကို လေ့လာနိုင်စေပြီး commit တစ်ခုစီက ဖြစ်ပေါ်စေတဲ့ diffs တွေကို ကြည့်ရှုနိုင်စေပါတယ်။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/diffs.gif" alt="The diff introduced by the recent commit." width="80%"/>
+</div>
+{:else}
+ဒါပြီးတာနဲ့ model repository ကို ကျွန်တော်တို့ကြည့်လိုက်ရင်၊ အခုမှ ထည့်သွင်းထားတဲ့ files တွေအားလုံးကို ကျွန်တော်တို့ တွေ့မြင်နိုင်ပါပြီ။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/full_model_tf.png" alt="The 'Files and versions' tab now contains all the recently uploaded files." width="80%"/>
+</div>
+
+UI (User Interface) က သင့်ကို model files တွေနဲ့ commits တွေကို လေ့လာနိုင်စေပြီး commit တစ်ခုစီက ဖြစ်ပေါ်စေတဲ့ diffs တွေကို ကြည့်ရှုနိုင်စေပါတယ်။
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/diffstf.gif" alt="The diff introduced by the recent commit." width="80%"/>
+</div>
+{/if}
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **Pretrained Models**: အကြီးစား ဒေတာအမြောက်အမြားဖြင့် ကြိုတင်လေ့ကျင့်ထားပြီးဖြစ်သော AI (Artificial Intelligence) မော်ဒယ်များ။
+*   **🤗 Hub (Hugging Face Hub)**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **`push_to_hub` API (Application Programming Interface)**: Hugging Face Transformers library မှ ပံ့ပိုးပေးသော method တစ်ခုဖြစ်ပြီး trained models များနှင့် tokenizers များကို Hugging Face Hub သို့ အလွယ်တကူ upload လုပ်နိုင်စေသည်။
+*   **`huggingface_hub` Python Library**: Hugging Face Hub ကို အပြန်အလှန်ဆက်သွယ်ရန် (repositories ဖန်တီးခြင်း၊ files များ upload လုပ်ခြင်း စသည်ဖြင့်) ကိရိယာများနှင့် functions များကို ပံ့ပိုးပေးသော Python library။
+*   **Web Interface**: Hugging Face Hub ဝက်ဘ်ဆိုဒ်၏ ဂရပ်ဖစ်အခြေခံ အသုံးပြုသူမျက်နှာပြင်။
+*   **Git**: Version control system တစ်ခုဖြစ်ပြီး code changes များကို ခြေရာခံရန်နှင့် developer အများအပြား ပူးပေါင်းလုပ်ဆောင်နိုင်ရန် ဒီဇိုင်းထုတ်ထားသည်။
+*   **Git-LFS (Git Large File Storage)**: Git repositories များတွင် ကြီးမားသော files များကို ထိရောက်စွာ ကိုင်တွယ်ရန်အတွက် extension တစ်ခု။
+*   **Compute Resources**: AI/ML model များကို လေ့ကျင့်ရန်နှင့် အသုံးပြုရန်အတွက် လိုအပ်သော ကွန်ပျူတာ hardware (ဥပမာ- CPU, GPU, memory)။
+*   **Trained Artifacts**: လေ့ကျင့်ထားသော AI model များ၊ ၎င်းတို့၏ weights များ၊ tokenizers များ စသည်ဖြင့် အသုံးပြုနိုင်သော output များ။
+*   **Model Repositories**: Hugging Face Hub တွင် AI model များကို သိမ်းဆည်းထားသော Git repositories များ။
+*   **Authentication Token**: အသုံးပြုသူတစ်ဦး၏ အထောက်အထားကို စစ်ဆေးရန်နှင့် ၎င်းတို့၏ permissions များကို အတည်ပြုရန် အသုံးပြုသော လုံခြုံရေး token။
+*   **`huggingface-cli login`**: Hugging Face CLI (Command Line Interface) မှတစ်ဆင့် Hub ကို login လုပ်ရန် command။
+*   **CLI (Command Line Interface)**: ကွန်ပျူတာပရိုဂရမ်တစ်ခုနှင့် text-based commands များ အသုံးပြု၍ အပြန်အလှန်ဆက်သွယ်သော နည်းလမ်း။
+*   **Notebook**: Jupyter Notebook သို့မဟုတ် Google Colab ကဲ့သို့သော interactive computing environment။
+*   **`notebook_login()` Function**: Jupyter/Colab Notebooks များတွင် Hugging Face Hub ကို login လုပ်ရန် `huggingface_hub` library မှ function။
+*   **Cache Folder**: အနာဂတ်တွင် မြန်ဆန်စွာ ဝင်ရောက်ကြည့်ရှုနိုင်ရန် ဒေတာများကို ယာယီသိမ်းဆည်းထားသော နေရာ။
+*   **Namespace**: Hugging Face Hub တွင် အသုံးပြုသူများ သို့မဟုတ် organization များအတွက် ထူးခြားသော ID သို့မဟုတ် ခွဲခြားသတ်မှတ်မှု။
+*   **`Trainer` API**: Hugging Face Transformers library မှ model များကို ထိရောက်စွာ လေ့ကျင့်ရန်အတွက် ဒီဇိုင်းထုတ်ထားသော မြင့်မားသောအဆင့် (high-level) API။
+*   **`TrainingArguments`**: Trainer API တွင် training process အတွက် parameters များကို သတ်မှတ်ရန် အသုံးပြုသော class။
+*   **`bert-finetuned-mrpc`**: MRPC dataset တွင် fine-tuned လုပ်ထားသော BERT model ကို ဖော်ပြသော အမည်။
+*   **`save_strategy="epoch"`**: Model ကို epoch တိုင်း သိမ်းဆည်းရန် သတ်မှတ်ခြင်း။
+*   **`push_to_hub=True`**: Model ကို Hugging Face Hub သို့ အလိုအလျောက် push လုပ်ရန် သတ်မှတ်ခြင်း။
+*   **`trainer.train()`**: Trainer API ကို အသုံးပြု၍ model ကို လေ့ကျင့်ရန် method။
+*   **`trainer.push_to_hub()`**: Trainer API ကို အသုံးပြု၍ လက်ရှိ training အခြေအနေကို Hub သို့ push လုပ်ရန် method။
+*   **Model Card**: Hugging Face Hub တွင် မော်ဒယ်တစ်ခုစီအတွက် ပါရှိသော အချက်အလက်များပါသည့် စာမျက်နှာ။ ၎င်းတွင် မော်ဒယ်ကို မည်သို့လေ့ကျင့်ခဲ့သည်၊ မည်သည့် datasets များကို အသုံးပြုခဲ့သည်၊ ၎င်း၏ ကန့်သတ်ချက်များ၊ ဘက်လိုက်မှုများ (biases) နှင့် အသုံးပြုနည်းများ ပါဝင်သည်။
+*   **Metadata**: ဒေတာအကြောင်းအရာများကို ဖော်ပြသော ဒေတာ (ဥပမာ- hyperparameters, evaluation results)။
+*   **Hyperparameters**: Model ကို လေ့ကျင့်ရာတွင် အသုံးပြုသော ပြင်ပ parameters များ (ဥပမာ- learning rate, batch size)။
+*   **Evaluation Results**: Model ၏ စွမ်းဆောင်ရည်ကို တိုင်းတာသော ရလဒ်များ (ဥပမာ- accuracy, F1 score)။
+*   **Keras**: TensorFlow အပေါ်တွင် တည်ဆောက်ထားသော open-source deep learning library တစ်ခု။
+*   **`PushToHubCallback`**: Keras models များကို Hub သို့ push လုပ်ရန်အတွက် Hugging Face Transformers library မှ callback class။
+*   **`model.fit()`**: Keras မှာ model ကို လေ့ကျင့်ရန် method။
+*   **`hub_model_id`**: Hub ပေါ်ရှိ model repository အတွက် သတ်မှတ်ထားသော ID။
+*   **Organization Namespace**: Hugging Face Hub တွင် organization တစ်ခုအတွက် သတ်မှတ်ထားသော namespace။
+*   **Configuration Objects**: model ၏ ဖွဲ့စည်းပုံနှင့် parameters များကို သိမ်းဆည်းထားသော object များ။
+*   **`push_to_hub()` Method (Model/Tokenizer/Config)**: model, tokenizer, သို့မဟုတ် configuration object များပေါ်တွင် တိုက်ရိုက်ရရှိနိုင်သော method ဖြစ်ပြီး ၎င်းတို့ကို Hub သို့ push လုပ်ရန်။
+*   **`AutoModelForMaskedLM` / `TFAutoModelForMaskedLM`**: Masked Language Modeling (MLM) အတွက် AutoModel classes များ (PyTorch/TensorFlow)။
+*   **`AutoTokenizer`**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး သက်ဆိုင်ရာ tokenizer ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **`from_pretrained()`**: Pretrained model သို့မဟုတ် tokenizer ကို Hub မှ load လုပ်ရန် method။
+*   **Weights**: Model ၏ လေ့ကျင့်ပြီးသား parameters များ။
+*   **`dummy-model`**: ဥပမာပြရန်အတွက် အသုံးပြုသော model repository အမည်။
+*   **API Token**: Access token ကို ရည်ညွှန်းသည်။
+*   **`use_auth_token` Argument**: authentication token ကို `push_to_hub()` method သို့ တိုက်ရိုက်ပေးရန် argument။
+*   **`bert-base-cased`**: BERT model ၏ base version, cased text (အကြီးအသေး ခွဲခြားသော) ဖြင့် လေ့ကျင့်ထားသော checkpoint identifier။
+*   **`model_sharing` (Transformers documentation)**: Transformers documentation ရှိ model မျှဝေခြင်းနှင့် ပတ်သက်သော အပိုင်း။
+*   **`allennlp` (Library)**: An open-source deep learning library for NLP research and applications.
+*   **`login`, `logout`, `whoami` (User Management)**: Functions from the `huggingface_hub` library for managing the user account.
+*   **`create_repo`, `delete_repo`, `update_repo_visibility` (Repository Management)**: Functions from the `huggingface_hub` library for creating repositories, deleting them, and changing their visibility.
+*   **`list_models`, `list_datasets`, `list_metrics`, `list_repo_files`, `upload_file`, `delete_file` (Content Information)**: Functions from the `huggingface_hub` library for retrieving or modifying information about content on the Hub.
+*   **`private` Argument (create_repo)**: Argument that sets whether the repository is public or private.
+*   **`repo_type` Argument (create_repo)**: Argument that sets the repository type (for example `"model"`, `"dataset"`, `"space"`).
+*   **Space (Hugging Face Space)**: The platform on the Hugging Face Hub for hosting AI demos.
+*   **Owner**: The user or organization that owns the repository.
+*   **Affiliated with**: Associated with; connected to.
+*   **README File (README.md)**: A text file containing information about a project or repository, usually written in Markdown.
+*   **Markdown**: A lightweight markup language for formatting text.
+*   **`.gitattributes` File**: Configuration file in a Git repository that specifies which large files are handled by git-lfs.
+*   **HTTP POST Requests**: The HTTP method used to send data to a web server.
+*   **Local Repository**: The copy of a Git repository on your computer.
+*   **Clone**: Copying a remote Git repository into a local folder.
+*   **`repo.git_pull()`**: Method that updates the local repository with the latest changes from the remote repository.
+*   **`repo.git_add()`**: Method that adds files to Git's staging area.
+*   **`repo.git_commit()`**: Method that records staged files in the repository's history.
+*   **`repo.git_push()`**: Method that sends local commits to the remote repository.
+*   **`repo.git_tag()`**: Method that creates a tag marking a specific point in the Git history.
+*   **`model.save_pretrained()` / `tokenizer.save_pretrained()`**: Methods that save a model's weights and configuration, or a tokenizer's files, to a local folder.
+*   **Staging Area**: Where Git temporarily holds changes before they are committed.
+*   **Commit**: Recording changes in a Git repository's history.
+*   **Git-based System**: A system built on the Git version control system.
+*   **`git lfs install`**: Command that initializes Git LFS.
+*   **`git clone`**: Command that clones a Git repository.
+*   **`cd`**: Command-line command for changing from one directory to another.
+*   **`ls`**: Command-line command that lists the files and directories in the current directory.
+*   **Configuration File**: A file that stores the settings and parameters of a piece of software.
+*   **Vocabulary File**: A file listing the tokens a tokenizer uses.
+*   **Model State Dict File (`pytorch_model.bin` / `tf_model.h5`)**: The file that stores the weights of a PyTorch or TensorFlow model.
+*   **`ls -lh`**: The `ls` command with options for a long listing (`-l`) and human-readable file sizes (`-h`).
+*   **Outlier**: Something that differs from the majority.
+*   **Git's Staging Environment**: Where Git holds the changes prepared for the next commit.
+*   **`git add .`**: Command that adds all changes in the current directory to the staging area.
+*   **`git status`**: Command that shows the current state of the repository (staged, unstaged, and untracked files).
+*   **Handler**: A tool or process that manages a particular file type.
+*   **Remote Repository**: The online (cloud-hosted) copy of a local repository.
+*   **`git commit -m "Message"`**: Command that creates a commit with a commit message.
+*   **`git push`**: Command that sends local commits to the remote repository.
+*   **Uploading LFS Objects**: Uploading the large files managed by Git LFS.
+*   **Enumerating Objects**: Listing the objects in the Git history.
+*   **Counting Objects**: Counting the objects in the Git commit history.
+*   **Delta Compression**: A technique that reduces size by storing only the changes.
+*   **Compressing Objects**: Compressing Git objects.
+*   **Writing Objects**: Writing objects to the Git database.
+*   **Total/Reused/Pack-reused**: Summary figures reported by the Git push process.
+*   **UI (User Interface)**: The graphical surface through which a user interacts with software.
+*   **Commits**: Changes recorded in a Git repository's history.
+*   **Diffs**: A display of the differences between two commits.
\ No newline at end of file
diff --git a/chapters/my/chapter4/4.mdx b/chapters/my/chapter4/4.mdx
new file mode 100644
index 000000000..16bb39742
--- /dev/null
+++ b/chapters/my/chapter4/4.mdx
@@ -0,0 +1,142 @@
+# Building a model card[[building-a-model-card]]
+
+<CourseFloatingBanner
+    chapter={4}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+The model card is a file which is arguably as important as the model and tokenizer files in a model repository. It is the central definition of the model, ensuring reusability by fellow community members and reproducibility of results, and providing a platform on which other members may build their artifacts.
+
+Documenting the training and evaluation process helps others understand what to expect of a model. Providing sufficient information about the data that was used, and about the preprocessing and postprocessing that were done, ensures that the limitations, biases, and contexts in which the model is and is not useful can be identified and understood.
+
+Therefore, creating a model card that clearly defines your model is a very important step. Here, we provide some tips that will help you with this. Creating the model card is done through the *README.md* file you saw earlier, which is a Markdown file.
+
+The "model card" concept originates from a research direction at Google, first shared in the paper ["Model Cards for Model Reporting"](https://arxiv.org/abs/1810.03993) by Margaret Mitchell et al. Much of the information here is based on that paper, and we recommend you take a look at it to understand why model cards are so important in a world that values reproducibility, reusability, and fairness.
+
+The model card usually starts with a very brief, high-level overview of what the model is for, followed by additional details in the following sections:
+
+- Model description
+- Intended uses & limitations
+- How to use
+- Limitations and bias
+- Training data
+- Training procedure
+- Evaluation results
+
+Let's take a look at what each of these sections should contain.
+
+### Model description[[model-description]]
+
+The model description provides basic details about the model. This includes the architecture, version, whether it was introduced in a paper, whether an original implementation is available, the author, and general information about the model. Any copyright should be attributed here. General information about training procedures, parameters, and important disclaimers can also be mentioned in this section.
+
+### Intended uses & limitations[[intended-uses-limitations]]
+
+Here you describe the use cases the model is intended for, including the languages, fields, and domains where it can be applied. This section of the model card can also document areas that are known to be out of scope for the model, or where it is likely to perform suboptimally.
+
+### How to use[[how-to-use]]
+
+This section should include some examples of how to use the model. These can showcase usage of the `pipeline()` function, usage of the model and tokenizer classes, and any other code you think might be helpful.
+
+### Training data[[training-data]]
+
+This part should indicate which dataset(s) the model was trained on. A brief description of the dataset(s) is also welcome.
+
+### Training procedure[[training-procedure]]
+
+In this section you should describe all the aspects of training that are relevant from a reproducibility perspective. This includes any preprocessing and postprocessing that were done on the data, as well as details such as the number of epochs the model was trained for, the batch size, and the learning rate.
+
+### Variable and metrics[[variable-and-metrics]]
+
+Here you should describe the metrics you use for evaluation and the different factors you are measuring. Stating which metric(s) were used, on which dataset and which dataset split, makes it easy to compare your model's performance with that of other models. These should be informed by the previous sections, such as the intended users and use cases.
+
+### Evaluation results[[evaluation-results]]
+
+Finally, indicate how well the model performs on the evaluation dataset. If the model uses a decision threshold, either state the decision threshold used in the evaluation, or provide details on evaluation at different thresholds for the intended uses.
+
+## Example[[example]]
+
+Check out the following for a few examples of well-crafted model cards:
+
+- [`bert-base-cased`](https://huggingface.co/bert-base-cased)
+- [`gpt2`](https://huggingface.co/gpt2)
+- [`distilbert`](https://huggingface.co/distilbert-base-uncased)
+
+More examples from different organizations and companies are available [here](https://github.com/huggingface/model_card/blob/master/examples.md).
+
+## Note[[note]]
+
+Model cards are not a requirement when publishing models, and you don't need to include all of the sections described above when you make one. However, explicit documentation of the model can only benefit future users, so we recommend that you fill in as many of the sections as possible to the best of your knowledge and ability.
+
+## Model Card Metadata[[model-card-metadata]]
+
+If you have done a little exploring of the Hugging Face Hub, you will have seen that some models belong to certain categories: you can filter them by tasks, languages, libraries, and more. The categories a model belongs to are determined by the metadata you add in the model card header.
+
+For example, if you take a look at the [`camembert-base` model card](https://huggingface.co/camembert-base/blob/main/README.md), you should see the following lines in the model card header:
+
+```
+---
+language: fr
+license: mit
+datasets:
+- oscar
+---
+```
+
+This metadata is parsed by the Hugging Face Hub, which then identifies this model as a French model, with an MIT license, trained on the Oscar dataset.
+
+The [full model card specification](https://github.com/huggingface/hub-docs/blame/main/modelcard.md) allows specifying languages, licenses, tags, datasets, and metrics, as well as the evaluation results the model obtained during training.
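+
+As a rough sketch of what a fuller header could look like, the block below extends the `camembert-base` header shown earlier with `tags` and `metrics` fields. The field names are taken from the list in the previous paragraph; the values `fill-mask` and `accuracy` are illustrative assumptions, not copied from the real card.
+
+```
+---
+language: fr
+license: mit
+tags:
+- fill-mask   # illustrative value (assumption)
+datasets:
+- oscar
+metrics:
+- accuracy    # illustrative value (assumption)
+---
+```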
+
+## Glossary
+
+*   **Model Card**: The page of information that accompanies each model on the Hugging Face Hub. It covers how the model was trained, which datasets were used, its limitations and biases, and how to use it.
+*   **Model Repository**: A Git-versioned location that stores model files, tokenizer files, the model card (README.md), and other related files.
+*   **Tokenizer Files**: The configuration and vocabulary files needed to rebuild a tokenizer.
+*   **Community Members**: People who use and contribute to the Hugging Face platform.
+*   **Reproducibility**: Being able to obtain the same results again using the specified code and data.
+*   **Reusability**: Being able to reuse software components or models in other projects.
+*   **Artifacts**: Things produced in a machine learning project (for example trained models, datasets, code).
+*   **Training Process**: The process of training a model on data.
+*   **Evaluation Process**: The process of measuring a model's performance.
+*   **Data**: The information used to train and evaluate a model.
+*   **Preprocessing**: Transforming data into a form the model can understand and process.
+*   **Postprocessing**: Preparing the model's outputs for their final use.
+*   **Limitations**: The constraints or weaknesses of a model.
+*   **Biases**: Systematic skews in a model's predictions caused by the data or by mathematical factors.
+*   **Contexts**: The situations or environments in which a model is used.
+*   **`README.md`**: The main documentation file of a project, written in Markdown.
+*   **Markdown File**: A plain-text file that uses formatting syntax to style text.
+*   **Model Cards for Model Reporting (Paper)**: The research paper by Margaret Mitchell et al. that describes why model cards matter.
+*   **Fairness**: AI systems treating groups and individuals equitably, without bias.
+*   **High-level Overview**: A summary of the main points without the complex details.
+*   **Model Description**: Basic information about the model (architecture, version, author, general info).
+*   **Architecture**: The design that describes a model's layers and how they are connected.
+*   **Version**: A specific release of a piece of software or a model.
+*   **Original Implementation**: The code in which the model was first implemented.
+*   **Author**: The creator of the model.
+*   **Copyright**: The legally protected right over a published or created work.
+*   **Parameters**: The internal values that determine a model's behavior.
+*   **Disclaimers**: Statements that limit responsibility or liability.
+*   **Intended Uses**: The purposes the model is meant to be used for.
+*   **Domains**: Specific fields (for example the medical domain or the legal domain).
+*   **Out of Scope**: Outside the range of what the model is meant to handle.
+*   **Suboptimally**: Performing below the best achievable level.
+*   **`pipeline()` Function**: A function in the Hugging Face Transformers library that makes models easy to use for specific tasks (for example text classification or text generation).
+*   **Tokenizer Classes**: The classes used to work with a tokenizer.
+*   **Dataset(s)**: The collection(s) of data used to train the model.
+*   **Number of Epochs**: The number of times the model is trained on the full training dataset.
+*   **Batch Size**: The number of input samples passed to the model in each training step.
+*   **Learning Rate**: A hyperparameter that controls how much the model's weights change during training.
+*   **Variables**: The factors being measured.
+*   **Metrics**: Values used to measure a model's performance (for example accuracy or F1 score).
+*   **Evaluation Dataset**: The portion of a dataset used to evaluate the model's performance.
+*   **Decision Threshold**: The value applied to a classification model's output to decide on a class.
+*   **Metadata**: Information about data (data about data).
+*   **Model Card Header**: Information about the model written in YAML at the top of the Markdown file (README.md).
+*   **`camembert-base` Model Card**: The model card containing information about the `camembert-base` model.
+*   **`language: fr`**: Metadata stating that the model's language is French.
+*   **`license: mit`**: Metadata stating that the model's license is MIT.
+*   **`datasets: - oscar`**: Metadata stating that the model was trained on the Oscar dataset.
+*   **Parsed**: Analyzed and interpreted.
+*   **MIT License**: An open-source license that permits free use, modification, and distribution of the software.
+*   **Oscar Dataset**: A large-scale multilingual text corpus.
+*   **Tags**: Keywords used to describe or categorize a model.
\ No newline at end of file
diff --git a/chapters/my/chapter4/5.mdx b/chapters/my/chapter4/5.mdx
new file mode 100644
index 000000000..1d7e1005b
--- /dev/null
+++ b/chapters/my/chapter4/5.mdx
@@ -0,0 +1,24 @@
+# Part 1 completed![[part-1-completed]]
+
+<CourseFloatingBanner
+    chapter={4}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+This is the end of the first part of the course! Part 2 will be released on November 15th with a big community event; see more information [here](https://huggingface.co/blog/course-launch-event).
+
+You should now be able to fine-tune a pretrained model on a text classification problem (single sentences or pairs of sentences) and upload the result to the Model Hub. To make sure you have mastered this first part, you should do exactly that on a problem that interests you (and not necessarily in English if you speak another language). You can find help in the [Hugging Face forums](https://discuss.huggingface.co/) and share your project in [this topic](https://discuss.huggingface.co/t/share-your-projects/6803) once you're finished.
+
+We can't wait to see what you will build.
+
+## Glossary
+
+*   **Pretrained Model**: An AI (Artificial Intelligence) model that has already been trained on a large amount of data.
+*   **Text Classification Problem**: A problem that involves assigning text to predefined categories.
+*   **Fine-tune**: Further training a pretrained model on a small amount of data for a specific task.
+*   **Model Hub**: Refers to the Hugging Face Hub, the central platform for finding, sharing, and using AI models.
+*   **Upload**: Sending files from one computer to an online server or to another computer.
+*   **Hugging Face Forums**: The online forum where Hugging Face users ask questions, give answers, and hold discussions.
+*   **Community Event**: A gathering or activity in which members of a community take part.
+*   **Topic**: A specific thread in an online forum or discussion.
+*   **Project**: A planned piece of work carried out to meet a defined goal.
\ No newline at end of file
diff --git a/chapters/my/chapter4/6.mdx b/chapters/my/chapter4/6.mdx
new file mode 100644
index 000000000..d2209f586
--- /dev/null
+++ b/chapters/my/chapter4/6.mdx
@@ -0,0 +1,271 @@
+<FrameworkSwitchCourse {fw} />
+
+<!-- DISABLE-FRONTMATTER-SECTIONS -->
+
+# End-of-chapter quiz[[end-of-chapter-quiz]]
+
+<CourseFloatingBanner
+    chapter={4}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+Let's test what you learned in this chapter!
+
+### 1. What are models on the Hub limited to?
+
+<Question
+	choices={[
+		{
+			text: "Models from the 🤗 Transformers library.",
+			explain: "While models from the 🤗 Transformers library are supported on the Hugging Face Hub, they're not the only ones!"
+		},
+		{
+			text: "All models with a similar interface to 🤗 Transformers.",
+			explain: "No interface requirement is set when uploading models to the Hugging Face Hub."
+		},
+		{
+			text: "There are no limits.",
+			explain: "Right! There are no limits when uploading models to the Hub.",
+            correct: true
+		},
+        {
+			text: "Models that are in some way related to NLP.",
+			explain: "No requirement is set regarding the field of application!"
+		}
+	]}
+/>
+
+### 2. How can you manage models on the Hub?
+
+<Question
+	choices={[
+		{
+			text: "Through a GCP account.",
+			explain: "Incorrect!"
+		},
+		{
+			text: "Through peer-to-peer distribution.",
+			explain: "Incorrect!"
+		},
+		{
+			text: "Through git and git-lfs.",
+			explain: "Correct! Models on the Hub are simple Git repositories, leveraging <code>git-lfs</code> for large files.",
+            correct: true
+		}
+	]}
+/>
+
+### 3. What can you do using the Hugging Face Hub web interface?
+
+<Question
+	choices={[
+		{
+			text: "Fork an existing repository.",
+			explain: "Forking a repository is not possible on the Hugging Face Hub."
+		},
+		{
+			text: "Create a new model repository.",
+			explain: "Correct! That's not all you can do, though.",
+            correct: true
+		},
+		{
+			text: "Manage and edit files.",
+			explain: "Correct! That's not the only right answer, though.",
+            correct: true
+		},
+        {
+			text: "Upload files.",
+			explain: "Right! But that's not all.",
+            correct: true
+		},
+        {
+			text: "See diffs across versions.",
+			explain: "Correct! That's not all you can do, though.",
+            correct: true
+		}
+	]}
+/>
+
+### 4. What is a model card?
+
+<Question
+	choices={[
+		{
+			text: "A rough description of the model, therefore less important than the model and tokenizer files.",
+			explain: "It is indeed a description of the model, but it's an important piece: if it's incomplete or absent, the model's utility is drastically reduced."
+		},
+		{
+			text: "A way to ensure reproducibility, reusability, and fairness.",
+			explain: "Correct! Sharing the right information in the model card will help users leverage your model and be aware of its limits and biases.",
+            correct: true
+		},
+		{
+			text: "A Python file that can be run to retrieve information about the model.",
+			explain: "Model cards are simple Markdown files."
+		}
+	]}
+/>
+
+### 5. Which of these objects of the 🤗 Transformers library can be directly shared on the Hub with `push_to_hub()`?
+
+{#if fw === 'pt'}
+<Question
+	choices={[
+		{
+			text: "Tokenizer",
+			explain: "Correct! All tokenizers have the <code>push_to_hub</code> method, and using it will push all the tokenizer files (vocabulary, architecture of the tokenizer, etc.) to a given repo. That's not the only right answer, though!",
+            correct: true
+		},
+		{
+			text: "Model configuration",
+			explain: "Right! All model configurations have the <code>push_to_hub</code> method, and using it will push them to a given repo. What else can you share?",
+            correct: true
+		},
+		{
+			text: "Model",
+			explain: "Correct! All models have the <code>push_to_hub</code> method, and using it will push them and their configuration files to a given repo. That's not all you can share, though.",
+            correct: true
+		},
+        {
+			text: "Trainer",
+			explain: "That's right: the <code>Trainer</code> also implements the <code>push_to_hub</code> method, and using it will upload the model, its configuration, the tokenizer, and a model card draft to a given repo. Try another answer!",
+            correct: true
+		}
+	]}
+/>
+{:else}
+<Question
+	choices={[
+		{
+			text: "Tokenizer",
+			explain: "Correct! All tokenizers have the <code>push_to_hub</code> method, and using it will push all the tokenizer files (vocabulary, architecture of the tokenizer, etc.) to a given repo. That's not the only right answer, though!",
+            correct: true
+		},
+		{
+			text: "Model configuration",
+			explain: "Right! All model configurations have the <code>push_to_hub</code> method, and using it will push them to a given repo. What else can you share?",
+            correct: true
+		},
+		{
+			text: "Model",
+			explain: "Correct! All models have the <code>push_to_hub</code> method, and using it will push them and their configuration files to a given repo. That's not all you can share, though.",
+            correct: true
+		},
+		{
+			text: "All of the above with a dedicated callback",
+			explain: "That's right: the <code>PushToHubCallback</code> will regularly send all of those objects to a repo during training.",
+            correct: true
+		}
+	]}
+/>
+{/if}
+
+### 6. What is the first step when using the `push_to_hub()` method or the CLI tools?
+
+<Question
+	choices={[
+		{
+			text: "Log in on the website.",
+			explain: "This won't help you on your local machine."
+		},
+		{
+			text: "Run 'huggingface-cli login' in a terminal.",
+			explain: "Correct: this will download and cache your personal token.",
+            correct: true
+		},
+		{
+			text: "Run 'notebook_login()' in a notebook.",
+			explain: "Correct: this will display a widget to let you authenticate.",
+            correct: true
+		},
+	]}
+/>
+
+### 7. You're using a model and a tokenizer. How can you upload them to the Hub?
+
+<Question
+	choices={[
+		{
+			text: "By calling the push_to_hub method directly on the model and the tokenizer.",
+			explain: "Correct!",
+            correct: true
+		},
+		{
+			text: "Within the Python runtime, by wrapping them in a <code>huggingface_hub</code> utility.",
+			explain: "Models and tokenizers already benefit from <code>huggingface_hub</code> utilities: no need for additional wrapping!"
+		},
+		{
+			text: "By saving them to disk and calling <code>transformers-cli upload-model</code>.",
+			explain: "The command <code>upload-model</code> does not exist."
+		}
+	]}
+/>
+
+### 8. Which git operations can you do with the `Repository` class?
+
+<Question
+	choices={[
+		{
+			text: "A commit",
+			explain: "Correct, the <code>git_commit()</code> method is there for that.",
+            correct: true
+		},
+		{
+			text: "A pull",
+			explain: "That is the purpose of the <code>git_pull()</code> method.",
+            correct: true
+		},
+		{
+			text: "A push",
+			explain: "The <code>git_push()</code> method does this.",
+            correct: true
+		},
+		{
+			text: "A merge",
+			explain: "No, that operation will never be possible with this API."
+		}
+	]}
+/>
+
+## Glossary
+
+*   **Models**: Mathematical structures in the field of Artificial Intelligence (AI) designed to learn from data and make predictions.
+*   **Hub (Hugging Face Hub)**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **🤗 Transformers Library**: A library from Hugging Face for building and using state-of-the-art Transformer models in Natural Language Processing (NLP), computer vision, audio processing, and other fields.
+*   **Interface**: The means through which two pieces of software, or a user and a piece of software, interact.
+*   **NLP (Natural Language Processing)**: The subfield of Artificial Intelligence (AI) concerned with enabling computers to understand, interpret, and generate human language.
+*   **GCP (Google Cloud Platform)**: The cloud computing services provided by Google.
+*   **Peer-to-peer distribution**: Sharing files or data directly between the computers on a network.
+*   **`git`**: A version control system used to track and manage project files.
+*   **`git-lfs` (Git Large File Storage)**: A Git extension for handling large binary files efficiently in Git repositories.
+*   **Git Repositories**: Locations that store a project's files and their change history using the Git version control system.
+*   **Fork**: Creating a copy of an existing repository that can then be modified independently.
+*   **Model Repository**: A Git-versioned location that stores model files, tokenizer files, the model card (README.md), and other related files.
+*   **Diffs**: A display of the differences between two files.
+*   **Model Card**: The page of information that accompanies each model on the Hugging Face Hub. It covers how the model was trained, which datasets were used, its limitations and biases, and how to use it.
+*   **Reproducibility**: Being able to obtain the same results again using the specified code and data.
+*   **Reusability**: Being able to reuse software components or models in other projects.
+*   **Fairness**: AI systems treating groups and individuals equitably, without bias.
+*   **Tokenizer**: A tool or process that splits text (or other data) into tokens that AI models can process.
+*   **`push_to_hub()` Method**: The method in the Hugging Face Transformers library used to upload a model, tokenizer, or configuration to the Hugging Face Hub.
+*   **Model Configuration**: The information that describes a model's structure (architecture, hyperparameters, and so on).
+*   **Trainer**: The high-level API in the Hugging Face Transformers library for training models.
+*   **Vocabulary**: The full set of unique tokens that a tokenizer or model knows and can handle.
+*   **`PushToHubCallback`**: A Keras callback for TensorFlow training (not part of the PyTorch `Trainer`) that regularly pushes the model, its configuration, and the tokenizer to a Hub repo during training.
+*   **CLI Tools (Command Line Interface Tools)**: Software tools operated from the command line.
+*   **`huggingface-cli login`**: The Hugging Face CLI (Command Line Interface) command for logging in to the Hugging Face Hub.
+*   **Personal Token**: A unique code used to authenticate your account on the Hugging Face Hub.
+*   **Cache**: Temporary storage that keeps frequently used data available for quick access.
+*   **`notebook_login()`**: The function for logging in to the Hugging Face Hub from Jupyter or Colab notebooks.
+*   **Widget**: An interactive component of a Graphical User Interface (GUI), such as an input box or a button.
+*   **`huggingface_hub` Utility**: The Python library for interacting with the Hugging Face Hub.
+*   **Python Runtime**: The environment in which Python code is currently running.
+*   **`Repository` Class**: The class in the `huggingface_hub` library for working with Git repositories.
+*   **Git Operations**: The operations available through the Git version control system (for example commit, pull, push, merge).
+*   **Commit**: Recording changes in a Git repository.
+*   **`git_commit()` Method**: The `Repository` class method for committing.
+*   **Pull**: Fetching changes from the remote repository into the local repository.
+*   **`git_pull()` Method**: The `Repository` class method for pulling.
+*   **Push**: Sending changes from the local repository to the remote repository.
+*   **`git_push()` Method**: The `Repository` class method for pushing.
+*   **Merge**: Combining changes from different branches in Git.
\ No newline at end of file
diff --git a/chapters/my/chapter5/1.mdx b/chapters/my/chapter5/1.mdx
new file mode 100644
index 000000000..d7a739c80
--- /dev/null
+++ b/chapters/my/chapter5/1.mdx
@@ -0,0 +1,40 @@
+# Introduction[[introduction]]
+
+<CourseFloatingBanner
+    chapter={5}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+In [Chapter 3](/course/chapter3) you got your first taste of the 🤗 Datasets library and saw that there were three main steps when it came to fine-tuning a model:
+
+1. Load a dataset from the Hugging Face Hub.
+2. Preprocess the data with `Dataset.map()`.
+3. Load and compute metrics.
+
+But this is just scratching the surface of what 🤗 Datasets can do! In this chapter, we will take a deep dive into the library. Along the way, we'll find answers to the following questions:
+
+*   What do you do when your dataset is not on the Hub?
+*   How can you slice and dice a dataset? (And what if you _really_ need to use Pandas?)
+*   What do you do when your dataset is huge and will melt your laptop's RAM?
+*   What are "memory mapping" and Apache Arrow?
+*   How can you create your own dataset and push it to the Hub?
+
+The techniques you learn here will prepare you for the advanced tokenization and fine-tuning tasks in [Chapter 6](/course/chapter6) and [Chapter 7](/course/chapter7), so grab a coffee and let's get started!
+
+## Glossary
+
+*   **🤗 Datasets Library**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **Fine-tuning**: Further training a pretrained model on a specific task, usually with a relatively small amount of data.
+*   **Model**: In artificial intelligence (AI), a mathematical structure designed to learn from data and make predictions.
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **Dataset**: A collection of data used to train AI models.
+*   **Preprocess**: The process of transforming data into a form that the model can understand and work with.
+*   **`Dataset.map()`**: A method in the 🤗 Datasets library that applies a function to each element, or each batch, of a dataset.
+*   **Metrics**: Values used to measure a model's performance (for example accuracy, F1 score).
+*   **Slice and Dice**: Cutting a dataset into pieces and reshaping it as needed.
+*   **Pandas**: An open-source Python library for data analysis and manipulation.
+*   **RAM (Random Access Memory)**: The computer's temporary working memory.
+*   **Memory Mapping**: A technique that maps the contents of a file directly into the computer's virtual memory. It lets large files be loaded from disk into memory only as needed, which reduces memory usage.
+*   **Apache Arrow**: An in-memory columnar data format that makes exchanging data between data analytics applications fast and efficient.
+*   **Push to the Hub**: Uploading a model, dataset, or other artifacts to the Hugging Face Hub.
+*   **Tokenization**: The process of splitting text (or other data) into tokens that AI models can process.
\ No newline at end of file
diff --git a/chapters/my/chapter5/2.mdx b/chapters/my/chapter5/2.mdx
new file mode 100644
index 000000000..e147f5866
--- /dev/null
+++ b/chapters/my/chapter5/2.mdx
@@ -0,0 +1,193 @@
+# What if my dataset isn't on the Hub?[[what-if-my-dataset-isnt-on-the-hub]]
+
+<CourseFloatingBanner chapter={5}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter5/section2.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter5/section2.ipynb"},
+]} />
+
+You know how to use the [Hugging Face Hub](https://huggingface.co/datasets) to download datasets, but you'll often find yourself working with data that is stored either on your laptop or on a remote server. In this section we'll show you how 🤗 Datasets can be used to load datasets that aren't available on the Hugging Face Hub.
+
+<Youtube id="HyQgpJTkRdE"/>
+
+## Working with local and remote datasets[[working-with-local-and-remote-datasets]]
+
+🤗 Datasets provides loading scripts to handle the loading of local and remote datasets. It supports several common data formats, such as:
+
+|    Data format     | Loading script |                         Example                         |
+| :----------------: | :------------: | :-----------------------------------------------------: |
+|     CSV & TSV      |     `csv`      |     `load_dataset("csv", data_files="my_file.csv")`     |
+|     Text files     |     `text`     |    `load_dataset("text", data_files="my_file.txt")`     |
+| JSON & JSON Lines  |     `json`     |   `load_dataset("json", data_files="my_file.jsonl")`    |
+| Pickled DataFrames |    `pandas`    | `load_dataset("pandas", data_files="my_dataframe.pkl")` |
+
+As shown in the table, for each data format we just need to specify the type of loading script in the `load_dataset()` function, along with a `data_files` argument that specifies the path to one or more files. Let's start by loading a dataset from local files; later we'll see how to do the same with remote files.
+
+## Loading a local dataset[[loading-a-local-dataset]]
+
+For this example we'll use the [SQuAD-it dataset](https://github.com/crux82/squad-it/), which is a large-scale dataset for question answering in Italian.
+
+The training and test splits are hosted on GitHub, so we can download them with a simple `wget` command:
+
+```python
+!wget https://github.com/crux82/squad-it/raw/master/SQuAD_it-train.json.gz
+!wget https://github.com/crux82/squad-it/raw/master/SQuAD_it-test.json.gz
+```
+
+This will download two compressed files called *SQuAD_it-train.json.gz* and *SQuAD_it-test.json.gz*, which we can decompress with the Linux `gzip` command:
+
+```python
+!gzip -dkv SQuAD_it-*.json.gz
+```
+
+```bash
+SQuAD_it-test.json.gz:	   87.4% -- replaced with SQuAD_it-test.json
+SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
+```
+
+The output reports that _SQuAD_it-train.json_ and _SQuAD_it-test.json_ have been produced from the archives (the `-k` flag keeps the original `.gz` files), and that the data is stored in the JSON format.
+
+> [!TIP]
+> ✎ If you're wondering why there's a `!` character in the above shell commands, that's because we're running them within a Jupyter notebook. Simply remove the prefix if you want to download and unzip the dataset within a terminal.
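+
+If `gzip` is not available on your machine (for example on Windows), the same step can be sketched with Python's standard library. This is only a minimal illustration under that assumption; `gunzip_keep` is a hypothetical helper name and is not part of 🤗 Datasets:
+
+```python
+import gzip
+import shutil
+from pathlib import Path
+
+
+def gunzip_keep(src):
+    """Decompress a .gz file next to the original, keeping the archive (like `gzip -dk`)."""
+    src_path = Path(src)
+    dst_path = src_path.with_suffix("")  # "x.json.gz" -> "x.json"
+    with gzip.open(src_path, "rb") as f_in, open(dst_path, "wb") as f_out:
+        shutil.copyfileobj(f_in, f_out)
+    return dst_path
+
+
+# Decompress every matching archive found in the current directory
+for archive in Path(".").glob("SQuAD_it-*.json.gz"):
+    print(gunzip_keep(archive))
+```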
+
+To load a JSON file with the `load_dataset()` function, we just need to know if we're dealing with ordinary JSON (similar to a nested dictionary) or JSON Lines (line-separated JSON). Like many question answering datasets, SQuAD-it uses the nested format, with all the text stored in a `data` field. This means we can load the dataset by specifying the `field` argument as follows:
+
+```py
+from datasets import load_dataset
+
+squad_it_dataset = load_dataset("json", data_files="SQuAD_it-train.json", field="data")
+```
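+
+To make the role of `field="data"` more concrete, here is a rough sketch using only the standard `json` module. The miniature document below is hypothetical (it only imitates the nested layout described above), and the code illustrates the idea rather than how 🤗 Datasets reads the file internally:
+
+```python
+import json
+
+# Hypothetical miniature of a SQuAD-style file: the records live under the "data" key
+raw_text = """
+{
+  "version": "example",
+  "data": [
+    {"title": "First article", "paragraphs": []},
+    {"title": "Second article", "paragraphs": []}
+  ]
+}
+"""
+
+document = json.loads(raw_text)
+records = document["data"]  # the list that field="data" points at
+column_names = sorted(records[0])  # the keys of each record become the columns
+
+print(len(records), column_names)
+```
+
+Each element of the `data` list corresponds to one row, which is why the loaded dataset exposes `title` and `paragraphs` as its features.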
+
+By default, loading local files creates a `DatasetDict` object with a `train` split. We can see this by inspecting the `squad_it_dataset` object:
+
+```py
+squad_it_dataset
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['title', 'paragraphs'],
+        num_rows: 442
+    })
+})
+```
+
+This shows us the number of rows and the column names associated with the training set. We can view one of the examples by indexing into the `train` split as follows:
+
+```py
+squad_it_dataset["train"][0]
+```
+
+```python out
+{
+    "title": "Terremoto del Sichuan del 2008",
+    "paragraphs": [
+        {
+            "context": "Il terremoto del Sichuan del 2008 o il terremoto...",
+            "qas": [
+                {
+                    "answers": [{"answer_start": 29, "text": "2008"}],
+                    "id": "56cdca7862d2951400fa6826",
+                    "question": "In quale anno si è verificato il terremoto nel Sichuan?",
+                },
+                ...
+            ],
+        },
+        ...
+    ],
+}
+```
+
+Great, we've loaded our first local dataset! But while this worked for the training set, what we really want is to include both the `train` and `test` splits in a single `DatasetDict` object so we can apply `Dataset.map()` functions across both splits at once. To do this, we can provide a dictionary to the `data_files` argument that maps each split name to a file associated with that split:
+
+```py
+data_files = {"train": "SQuAD_it-train.json", "test": "SQuAD_it-test.json"}
+squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
+squad_it_dataset
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['title', 'paragraphs'],
+        num_rows: 442
+    })
+    test: Dataset({
+        features: ['title', 'paragraphs'],
+        num_rows: 48
+    })
+})
+```
+
+This is exactly what we wanted. Now, we can apply various preprocessing techniques to clean up the data, tokenize the text, and so on.
+
+> [!TIP]
+> The `data_files` argument of the `load_dataset()` function is quite flexible and can be either a single file path, a list of file paths, or a dictionary that maps split names to file paths. You can also glob files that match a specified pattern according to the rules used by the Unix shell (e.g., you can glob all the JSON files in a directory as a single split by setting `data_files="*.json"`). See the 🤗 Datasets [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) for more details.
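+
+As a rough illustration of shell-style patterns, the sketch below uses Python's `fnmatch` module, which follows similar wildcard rules. The file names are hypothetical, and this is not a description of how 🤗 Datasets resolves patterns internally:
+
+```python
+from fnmatch import fnmatch
+
+# Hypothetical directory listing
+filenames = ["SQuAD_it-train.json", "SQuAD_it-test.json", "README.md", "notes.txt"]
+
+# "*" matches any run of characters, so "*.json" selects both JSON files
+json_files = [name for name in filenames if fnmatch(name, "*.json")]
+print(json_files)
+```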
+
+The loading scripts in 🤗 Datasets actually support automatic decompression of the input files, so we could have skipped the use of `gzip` by pointing the `data_files` argument directly to the compressed files:
+
+```py
+data_files = {"train": "SQuAD_it-train.json.gz", "test": "SQuAD_it-test.json.gz"}
+squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
+```
+
+This can be useful if you don't want to manually decompress many GZIP files. The automatic decompression also applies to other common formats like ZIP and TAR, so you just need to point `data_files` to the compressed files and you're good to go!
+
+Now that you know how to load local files on your laptop or desktop, let's take a look at loading remote files.
+
+## Loading a remote dataset[[loading-a-remote-dataset]]
+
+If you're working as a data scientist or coder in a company, there's a good chance the datasets you want to analyze are stored on some remote server. Fortunately, loading remote files is just as simple as loading local ones! Instead of providing a path to local files, we point the `data_files` argument of `load_dataset()` to one or more URLs where the remote files are stored. For example, for the SQuAD-it dataset hosted on GitHub, we can just point `data_files` to the _SQuAD_it-*.json.gz_ URLs as follows:
+
+```py
+url = "https://github.com/crux82/squad-it/raw/master/"
+data_files = {
+    "train": url + "SQuAD_it-train.json.gz",
+    "test": url + "SQuAD_it-test.json.gz",
+}
+squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
+```
+
+This returns the same `DatasetDict` object obtained above, but saves us the step of manually downloading and decompressing the _SQuAD_it-*.json.gz_ files. This wraps up our foray into the various ways to load datasets that aren't hosted on the Hugging Face Hub. Now that we've got a dataset to play with, let's get our hands dirty with various data-wrangling techniques!
+
+> [!TIP]
+> ✏️ **Try it out!** Pick another dataset hosted on GitHub or the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) and try loading it both locally and remotely using the techniques introduced above. For bonus points, try loading a dataset that's stored in a CSV or text format (see the [documentation](https://huggingface.co/docs/datasets/loading#local-and-remote-files) for more information on these formats).
+
+## Glossary
+
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **Local Dataset**: A dataset stored on the hard drive of your own computer (laptop or desktop).
+*   **Remote Dataset**: A dataset stored on an online server or in cloud storage.
+*   **🤗 Datasets**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **Loading Scripts**: Code inside the 🤗 Datasets library that is responsible for loading datasets from different data formats (CSV, JSON, text, Pandas).
+*   **CSV (Comma-Separated Values)**: A plain-text file format in which data values are separated by commas.
+*   **TSV (Tab-Separated Values)**: A plain-text file format in which data values are separated by tabs.
+*   **Text Files**: Files that contain only text.
+*   **JSON (JavaScript Object Notation)**: A lightweight data-interchange format that is easy for humans to read and for machines to process.
+*   **JSON Lines**: A variant of JSON that stores one JSON object per line.
+*   **Pickled DataFrames**: Files in which a Python object (for example a Pandas DataFrame) has been saved in binary form using Python's `pickle` module.
+*   **`load_dataset()` Function**: The 🤗 Datasets function used to load a dataset (from the Hub, local files, or remote URLs) and cache it.
+*   **`data_files` Argument**: The `load_dataset()` argument that specifies the path (or URL) of the local or remote dataset files.
+*   **SQuAD-it Dataset**: A dataset for question answering in Italian.
+*   **GitHub**: A web-based platform that uses Git for version control and hosts code and projects.
+*   **`wget` Command**: A Unix/Linux command-line utility for downloading files from the network.
+*   **Compressed Files**: Files that have been compressed to reduce their size (for example `.gz`, `.zip`, `.tar`).
+*   **`gzip` Command**: A Linux command-line utility for compressing or decompressing files in the GZIP format.
+*   **`!` Character**: The prefix used to run shell commands in Jupyter/Colab notebooks.
+*   **Jupyter Notebook**: An interactive computing environment that combines code, text, images, and mathematical equations.
+*   **Terminal**: An interface for controlling the computer through the command line.
+*   **Nested Dictionary**: A dictionary that contains other dictionaries.
+*   **`field` Argument**: The `load_dataset()` argument that specifies which field of a JSON file the data should be loaded from.
+*   **`DatasetDict` Object**: An object that stores several datasets, such as the training, validation, and test sets, in dictionary form.
+*   **`train` Split**: The part of the dataset used to train the model.
+*   **`test` Split**: The part of the dataset used to measure the model's final performance.
+*   **Rows (Examples)**: The individual data entries in a dataset.
+*   **Column Names (Features)**: The attributes, or fields, of a dataset.
+*   **Indexing**: Accessing a specific element of a dataset (or list, or dictionary) through its index (or key).
+*   **Preprocessing Techniques**: Methods used to transform data into a form that the model can understand and work with.
+*   **Tokenize**: To split text (or other data) into tokens that AI models can process.
+*   **Glob Files**: Finding files that match a pattern by using wildcards (for example `*`, `?`) in the Unix shell.
+*   **URL (Uniform Resource Locator)**: The address of a resource on the web (for example a web page or a file).
+*   **Data-wrangling Techniques**: Processes that turn raw data into a cleaner and more useful form.
+*   **UCI Machine Learning Repository**: A repository that collects many machine learning datasets.
diff --git a/chapters/my/chapter5/3.mdx b/chapters/my/chapter5/3.mdx
new file mode 100644
index 000000000..70ffe91c0
--- /dev/null
+++ b/chapters/my/chapter5/3.mdx
@@ -0,0 +1,811 @@
+# Time to slice and dice[[time-to-slice-and-dice]]
+
+<CourseFloatingBanner chapter={5}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter5/section3.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter5/section3.ipynb"},
+]} />
+
+Most of the time, the data you work with won't be perfectly prepared for training models. In this section we'll explore the various features that 🤗 Datasets provides to clean up your datasets.
+
+<Youtube id="tqfSFcPMgOI"/>
+
+## Slicing and dicing our data[[slicing-and-dicing-our-data]]
+
+Similar to Pandas, 🤗 Datasets provides several functions to manipulate the contents of `Dataset` and `DatasetDict` objects. We already encountered the `Dataset.map()` method in [Chapter 3](/course/chapter3), and in this section we'll explore some of the other functions at our disposal.
+
+For this example we'll use the [Drug Review Dataset](https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29) that's hosted on the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php), which contains patient reviews on various drugs, along with the condition being treated and a 10-star rating of the patient's satisfaction.
+
+First we need to download and extract the data, which can be done with the `wget` and `unzip` commands:
+
+```py
+!wget "https://archive.ics.uci.edu/ml/machine-learning-databases/00462/drugsCom_raw.zip"
+!unzip drugsCom_raw.zip
+```
+
+Since TSV is just a variant of CSV that uses tabs instead of commas as the separator, we can load these files by using the `csv` loading script and specifying the `delimiter` argument in the `load_dataset()` function as follows:
+
+```py
+from datasets import load_dataset
+
+data_files = {"train": "drugsComTrain_raw.tsv", "test": "drugsComTest_raw.tsv"}
+# \t is the tab character in Python
+drug_dataset = load_dataset("csv", data_files=data_files, delimiter="\t")
+```
+
+A good practice when doing any sort of data analysis is to grab a small random sample to get a quick feel for the type of data you're working with. In 🤗 Datasets, we can create a random sample by chaining the `Dataset.shuffle()` and `Dataset.select()` functions together:
+
+```py
+drug_sample = drug_dataset["train"].shuffle(seed=42).select(range(1000))
+# Peek at the first few examples
+drug_sample[:3]
+```
+
+```python out
+{'Unnamed: 0': [87571, 178045, 80482],
+ 'drugName': ['Naproxen', 'Duloxetine', 'Mobic'],
+ 'condition': ['Gout, Acute', 'ibromyalgia', 'Inflammatory Conditions'],
+ 'review': ['"like the previous person mention, I&#039;m a strong believer of aleve, it works faster for my gout than the prescription meds I take. No more going to the doctor for refills.....Aleve works!"',
+  '"I have taken Cymbalta for about a year and a half for fibromyalgia pain. It is great\r\nas a pain reducer and an anti-depressant, however, the side effects outweighed \r\nany benefit I got from it. I had trouble with restlessness, being tired constantly,\r\ndizziness, dry mouth, numbness and tingling in my feet, and horrible sweating. I am\r\nbeing weaned off of it now. Went from 60 mg to 30mg and now to 15 mg. I will be\r\noff completely in about a week. The fibro pain is coming back, but I would rather deal with it than the side effects."',
+  '"I have been taking Mobic for over a year with no side effects other than an elevated blood pressure.  I had severe knee and ankle pain which completely went away after taking Mobic.  I attempted to stop the medication however pain returned after a few days."'],
+ 'rating': [9.0, 3.0, 10.0],
+ 'date': ['September 2, 2015', 'November 7, 2011', 'June 5, 2013'],
+ 'usefulCount': [36, 13, 128]}
+```
+
+Note that we've fixed the seed in `Dataset.shuffle()` for reproducibility purposes. `Dataset.select()` expects an iterable of indices, so we've passed `range(1000)` to grab the first 1,000 examples from the shuffled dataset. From this sample we can already see a few quirks in our dataset:
+
+*   The `Unnamed: 0` column looks suspiciously like an anonymized ID for each patient.
+*   The `condition` column includes a mix of uppercase and lowercase labels.
+*   The reviews are of varying length and contain a mix of Python line separators (`\r\n`) as well as HTML character codes like `&\#039;`.
+
+Let's see how we can use 🤗 Datasets to deal with each of these issues. To test the patient ID hypothesis for the `Unnamed: 0` column, we can use the `Dataset.unique()` function to verify that the number of IDs matches the number of rows in each split:
+
+```py
+for split in drug_dataset.keys():
+    assert len(drug_dataset[split]) == len(drug_dataset[split].unique("Unnamed: 0"))
+```
+
+This seems to confirm our hypothesis, so let's clean up the dataset a bit by renaming the `Unnamed: 0` column to something a bit more interpretable. We can use the `DatasetDict.rename_column()` function to rename the column across both splits in one go:
+
+```py
+drug_dataset = drug_dataset.rename_column(
+    original_column_name="Unnamed: 0", new_column_name="patient_id"
+)
+drug_dataset
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount'],
+        num_rows: 161297
+    })
+    test: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount'],
+        num_rows: 53766
+    })
+})
+```
+
+> [!TIP]
+> ✏️ **Try it out!** Use the `Dataset.unique()` function to find the number of unique drugs and conditions in the training and test sets.
+
+Next, let's normalize all the `condition` labels using `Dataset.map()`. As we did with tokenization in [Chapter 3](/course/chapter3), we can define a simple function that can be applied across all the rows of each split in `drug_dataset`:
+
+```py
+def lowercase_condition(example):
+    return {"condition": example["condition"].lower()}
+
+
+drug_dataset.map(lowercase_condition)
+```
+
+```python out
+AttributeError: 'NoneType' object has no attribute 'lower'
+```
+
+Oh no, we've run into a problem with our map function! From the error we can infer that some of the entries in the `condition` column are `None`, which cannot be lowercased as they're not strings. Let's drop these rows using `Dataset.filter()`, which works in a similar way to `Dataset.map()` and expects a function that receives a single example of the dataset. Instead of writing an explicit function like:
+
+```py
+def filter_nones(x):
+    return x["condition"] is not None
+```
+
+and then running `drug_dataset.filter(filter_nones)`, we can do this in one line using a _lambda function_. In Python, lambda functions are small functions that you can define without explicitly naming them. They take the general form:
+
+```
+lambda <arguments> : <expression>
+```
+
+where `lambda` is one of Python's special [keywords](https://docs.python.org/3/reference/lexical_analysis.html#keywords), `<arguments>` is a comma-separated list of parameter names that define the inputs to the function, and `<expression>` is the single expression whose value the function returns. For example, we can define a simple lambda function that squares a number as follows:
+
+```
+lambda x : x * x
+```
+
+To apply this function to an input, we need to wrap it and the input in parentheses:
+
+```py
+(lambda x: x * x)(3)
+```
+
+```python out
+9
+```
+
+Similarly, we can define lambda functions with multiple arguments by separating them with commas. For example, we can compute the area of a triangle as follows:
+
+```py
+(lambda base, height: 0.5 * base * height)(4, 8)
+```
+
+```python out
+16.0
+```
+
+Lambda functions are handy when you want to define small, single-use functions (for more information about them, we recommend reading the excellent [Real Python tutorial](https://realpython.com/python-lambda/) by Andre Burgaud). In the 🤗 Datasets context, we can use lambda functions to define simple map and filter operations, so let's use this trick to eliminate the `None` entries in our dataset:
+
+```py
+drug_dataset = drug_dataset.filter(lambda x: x["condition"] is not None)
+```
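+
+As a plain-Python analogy (not how 🤗 Datasets implements filtering), the predicate simply decides row by row what is kept. The rows below are hypothetical:
+
+```python
+# Hypothetical rows standing in for dataset examples
+rows = [
+    {"condition": "ADHD"},
+    {"condition": None},
+    {"condition": "Birth Control"},
+]
+
+# Same predicate as in the filter call above
+keep = lambda x: x["condition"] is not None
+
+# Only rows for which the predicate returns True survive
+kept_rows = [row for row in rows if keep(row)]
+print(kept_rows)
+```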
+
+With the `None` entries removed, we can normalize our `condition` column:
+
+```py
+drug_dataset = drug_dataset.map(lowercase_condition)
+# Check that lowercasing worked
+drug_dataset["train"]["condition"][:3]
+```
+
+```python out
+['left ventricular dysfunction', 'adhd', 'birth control']
+```
+
+It works! Now that we've cleaned up the labels, let's take a look at cleaning up the reviews themselves.
+
+## Creating new columns[[creating-new-columns]]
+
+Whenever you're dealing with customer reviews, a good practice is to check the number of words in each review. A review might be just a single word like "Great!" or a full-blown essay with thousands of words, and depending on the use case you'll need to handle these extremes differently. To compute the number of words in each review, we'll use a rough heuristic based on splitting each text by whitespace.
+
+Let's define a simple function that counts the number of words in each review:
+
+```py
+def compute_review_length(example):
+    return {"review_length": len(example["review"].split())}
+```
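+
+To get a feel for the heuristic before applying it to the whole dataset, the function can be called on a single hand-written example. The dictionary below is a hypothetical stand-in for one dataset row, and merging the result into it is only an analogy for adding a column:
+
+```python
+def compute_review_length(example):
+    # Whitespace split as a rough word count
+    return {"review_length": len(example["review"].split())}
+
+
+# Hypothetical single row with just the field the function reads
+example = {"review": '"Works well, no side effects so far"'}
+
+# Merge the returned dictionary into the row to mimic a new column
+updated = {**example, **compute_review_length(example)}
+print(updated["review_length"])
+```
+
+Note that punctuation stays attached to the neighbouring words, so this counts whitespace-separated chunks rather than linguistic words.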
+
+Unlike our `lowercase_condition()` function, `compute_review_length()` returns a dictionary whose key does not correspond to one of the column names in the dataset. In this case, when `compute_review_length()` is passed to `Dataset.map()`, it will be applied to all the rows in the dataset to create a new `review_length` column:
+
+```py
+drug_dataset = drug_dataset.map(compute_review_length)
+# Inspect the first training example
+drug_dataset["train"][0]
+```
+
+```python out
+{'patient_id': 206461,
+ 'drugName': 'Valsartan',
+ 'condition': 'left ventricular dysfunction',
+ 'review': '"It has no side effect, I take it in combination of Bystolic 5 Mg and Fish Oil"',
+ 'rating': 9.0,
+ 'date': 'May 20, 2012',
+ 'usefulCount': 27,
+ 'review_length': 17}
+```
+
+As expected, we can see a `review_length` column has been added to our training set. We can sort this new column with `Dataset.sort()` to see what the extreme values look like:
+
+```py
+drug_dataset["train"].sort("review_length")[:3]
+```
+
+```python out
+{'patient_id': [103488, 23627, 20558],
+ 'drugName': ['Loestrin 21 1 / 20', 'Chlorzoxazone', 'Nucynta'],
+ 'condition': ['birth control', 'muscle spasm', 'pain'],
+ 'review': ['"Excellent."', '"useless"', '"ok"'],
+ 'rating': [10.0, 1.0, 6.0],
+ 'date': ['November 4, 2008', 'March 24, 2017', 'August 20, 2016'],
+ 'usefulCount': [5, 2, 10],
+ 'review_length': [1, 1, 1]}
+```
+
+As we suspected, some reviews contain just a single word, which may be okay for sentiment analysis but would not be informative if we want to predict the condition.
+
+> [!TIP]
+> 🙋 An alternative way to add new columns to a dataset is with the `Dataset.add_column()` function. This allows you to provide the column as a Python list or NumPy array and can be handy in situations where `Dataset.map()` is not well suited for your analysis.
+
+Let's use the `Dataset.filter()` function to keep only the reviews that contain more than 30 words. Similarly to what we did with the `condition` column, we can filter out the very short reviews by requiring that the reviews have a length above this threshold:
+
+```py
+drug_dataset = drug_dataset.filter(lambda x: x["review_length"] > 30)
+print(drug_dataset.num_rows)
+```
+
+```python out
+{'train': 138514, 'test': 46108}
+```
+
+As you can see, this has removed around 15% of the reviews from our original training and test sets.
+
+> [!TIP]
+> ✏️ **Try it out!** Use the `Dataset.sort()` function to inspect the reviews with the largest numbers of words. See the [documentation](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) to see which argument you need to use to sort the reviews by length in descending order.
+
+The last thing we need to deal with is the presence of HTML character codes in our reviews. We can use Python's `html` module to unescape these characters, like so:
+
+```py
+import html
+
+text = "I&#039;m a transformer called BERT"
+html.unescape(text)
+```
+
+```python out
+"I'm a transformer called BERT"
+```
+
+We'll use `Dataset.map()` to unescape all the HTML characters in our corpus:
+
+```python
+drug_dataset = drug_dataset.map(lambda x: {"review": html.unescape(x["review"])})
+```
+
+As you can see, the `Dataset.map()` method is quite useful for processing data, and we haven't even scratched the surface of everything it can do!
+
+## The `map()` method's superpowers[[the-map-methods-superpowers]]
+
+The `Dataset.map()` method takes a `batched` argument. If it is set to `True`, the map function receives examples in batches (the batch size is configurable but defaults to 1,000). For instance, the previous map function that unescaped all the HTML took a bit of time to run (you can read the time taken from the progress bars). We can speed this up by processing several elements at the same time using a list comprehension.
+
+When you specify `batched=True`, the function receives a dictionary with the fields of the dataset, but each value is now a _list of values_ rather than a single value. The value returned by the function should have the same shape: a dictionary with the fields we want to update or add to our dataset, each mapped to a list of values. For example, here is another way to unescape all HTML characters, this time using `batched=True`:
+
+```python
+new_drug_dataset = drug_dataset.map(
+    lambda x: {"review": [html.unescape(o) for o in x["review"]]}, batched=True
+)
+```
+
+If you're running this code in a notebook, you'll see that this command executes much faster than the previous one. That is not because our reviews have already been HTML-unescaped: if you re-execute the instruction from the previous section (without `batched=True`), it will take the same amount of time as before. The reason is that list comprehensions are usually faster than executing the same code in a `for` loop, and we also gain some performance by accessing lots of elements at the same time instead of one by one.
+
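+The difference between the two calling conventions can be seen without any dataset at all. The sketch below uses hypothetical function names and an invented two-row batch to show what a map function receives and returns in each mode; it illustrates the data shapes only, not how `Dataset.map()` is implemented.
+
+```python
+import html
+
+
+def unescape_example(example):
+    # Per-example form: every value is a single item.
+    return {"review": html.unescape(example["review"])}
+
+
+def unescape_batch(batch):
+    # Batched form: every value is a list, and so is every returned value.
+    return {"review": [html.unescape(text) for text in batch["review"]]}
+
+
+# Hypothetical two-row batch, shaped like the input of a batched map function.
+batch = {"review": ["I&#039;m fine", "5 &gt; 3"], "rating": [9.0, 4.0]}
+
+single = unescape_example({"review": "I&#039;m fine", "rating": 9.0})
+batched = unescape_batch(batch)
+print(single)
+print(batched)
+```
+
+In both cases only the `review` key is returned, since that is the only column being updated.
+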
+Using `Dataset.map()` with `batched=True` will be essential to unlock the speed of the "fast" tokenizers that we'll encounter in [Chapter 6](/course/chapter6), which can quickly tokenize big lists of texts. For instance, to tokenize all the drug reviews with a fast tokenizer, we could use a function like this:
+
+```python
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+
+
+def tokenize_function(examples):
+    return tokenizer(examples["review"], truncation=True)
+```
+
+As you saw in [Chapter 3](/course/chapter3), we can pass one or several examples to the tokenizer, so we can use this function with or without `batched=True`. Let's take this opportunity to compare the performance of the different options. In a notebook, you can time a one-line instruction by adding `%time` before the line of code you wish to measure:
+
+```python no-format
+%time tokenized_dataset = drug_dataset.map(tokenize_function, batched=True)
+```
+
+You can also time a whole cell by putting `%%time` at the beginning of the cell. On the hardware we used, this instruction took 10.8s (the number written after "Wall time").
+
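+Outside a notebook, where the `%time` magic is unavailable, a comparable wall-time measurement can be sketched with the standard library. The helper name `time_call` is hypothetical, and the workload is a stand-in; in practice it would be the `Dataset.map()` call being measured.
+
+```python
+import time
+
+
+def time_call(func, *args, **kwargs):
+    """Return (result, elapsed_seconds) for a single call to func."""
+    start = time.perf_counter()
+    result = func(*args, **kwargs)
+    elapsed = time.perf_counter() - start
+    return result, elapsed
+
+
+# Stand-in workload used only to exercise the helper.
+result, elapsed = time_call(sum, range(1_000_000))
+print(f"Wall time: {elapsed:.3f}s")
+```
+
+A single measurement is noisy, so repeating the call a few times and comparing the numbers gives a more reliable picture.
+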
+> [!TIP]
+> ✏️ **Try it out!** Execute the same instruction with and without `batched=True`, then try it with a slow tokenizer (add `use_fast=False` in the `AutoTokenizer.from_pretrained()` method) so you can see what numbers you get on your hardware.
+
+Here are the results we obtained with and without batching, with a fast and a slow tokenizer:
+
+Options         | Fast tokenizer | Slow tokenizer
+:--------------:|:--------------:|:-------------:
+`batched=True`  | 10.8s          | 4min41s
+`batched=False` | 59.2s          | 5min3s
+
+This means that using a fast tokenizer with the `batched=True` option is roughly 30 times faster than its slow counterpart with no batching (10.8s versus 5min3s), which is truly amazing! That's the main reason why fast tokenizers are the default when using `AutoTokenizer` (and why they are called "fast"). They achieve such a speedup because, behind the scenes, the tokenization code is executed in Rust, a language that makes it easy to parallelize code execution.
+
+Parallelization is also the reason for the nearly 6x speedup the fast tokenizer gets from batching: you can't parallelize a single tokenization operation, but when you want to tokenize lots of texts at the same time you can simply split the execution across several processes, each responsible for its own texts.
+
+`Dataset.map()` also has some parallelization capabilities of its own. Since they are not backed by Rust, they won't let a slow tokenizer catch up with a fast one, but they can still be helpful (especially if you're using a tokenizer that doesn't have a fast version). To enable multiprocessing, use the `num_proc` argument to specify the number of processes to use in your call to `Dataset.map()`:
+
+```py
+slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False)
+
+
+def slow_tokenize_function(examples):
+    return slow_tokenizer(examples["review"], truncation=True)
+
+
+tokenized_dataset = drug_dataset.map(slow_tokenize_function, batched=True, num_proc=8)
+```
+
+You can experiment a little with timing to determine the optimal number of processes; in our case, 8 seemed to produce the best speed gain. Here are the numbers we got with and without multiprocessing:
+
+Options         | Fast tokenizer | Slow tokenizer
+:--------------:|:--------------:|:-------------:
+`batched=True`  | 10.8s          | 4min41s
+`batched=False` | 59.2s          | 5min3s
+`batched=True`, `num_proc=8`  | 6.52s          | 41.3s
+`batched=False`, `num_proc=8` | 9.49s          | 45.2s
+
+Those are much more reasonable results for the slow tokenizer, but the performance of the fast tokenizer also improved substantially. Note, however, that this won't always be the case: for values of `num_proc` other than 8, our tests showed that it was faster to use `batched=True` without that option. In general, we don't recommend using Python multiprocessing for fast tokenizers with `batched=True`.
+
+> [!TIP]
+> Using `num_proc` to speed up your processing is usually a great idea, as long as the function you are using is not already doing some kind of multiprocessing of its own.
+
+Having all of this functionality condensed into a single method is already pretty amazing, but there's more! With `Dataset.map()` and `batched=True` you can change the number of elements in your dataset. This is very useful in the many situations where you want to create several training features from one example, and we will need to do this as part of the preprocessing for several of the NLP tasks we'll undertake in [Chapter 7](/course/chapter7).
+
+> [!TIP]
+> 💡 In machine learning, an _example_ is usually defined as the set of _features_ that we feed to the model. In some contexts, these features will be the set of columns in a `Dataset`, but in others (like here and for question answering), multiple features can be extracted from a single example and belong to a single column.
+
+Let's have a look at how it works! Here we will tokenize our examples and truncate them to a maximum length of 128, but we will ask the tokenizer to return *all* the chunks of the texts instead of just the first one. This can be done with `return_overflowing_tokens=True`:
+
+```py
+def tokenize_and_split(examples):
+    return tokenizer(
+        examples["review"],
+        truncation=True,
+        max_length=128,
+        return_overflowing_tokens=True,
+    )
+```
+
+Let's test `tokenize_and_split` on one example before using `Dataset.map()` on the whole dataset:
+
+```py
+result = tokenize_and_split(drug_dataset["train"][0])
+[len(inp) for inp in result["input_ids"]]
+```
+
+```python out
+[128, 49]
+```
+
+So, our first example in the training set became two features because it was tokenized to more than the maximum number of tokens we specified: the first one of length 128 and the second one of length 49. Now let's do this for all elements of the dataset!
+
+```py
+tokenized_dataset = drug_dataset.map(tokenize_and_split, batched=True)
+```
+
+```python out
+ArrowInvalid: Column 1 named condition expected length 1463 but got length 1000
+```
+
+Oh no! That didn't work! Why not? Looking at the error message gives us a clue: there is a mismatch in the lengths of the columns, one being of length 1,463 and the other of length 1,000. If you've looked at the `Dataset.map()` [documentation](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.map), you may recall that 1,000 is the number of samples passed to the function we are mapping; here those 1,000 examples gave 1,463 new features, resulting in a shape error.
+
+The problem is that we're trying to mix two different datasets of different sizes: the `drug_dataset` columns have a certain number of examples (the 1,000 in our error), but the `tokenized_dataset` we are building will have more (the 1,463 in the error message; it is more than 1,000 because `return_overflowing_tokens=True` tokenizes long reviews into more than one example). That doesn't work for a `Dataset`, so we need to either remove the columns from the old dataset or make them the same size as they are in the new dataset. We can do the former with the `remove_columns` argument:
+
+```py
+tokenized_dataset = drug_dataset.map(
+    tokenize_and_split, batched=True, remove_columns=drug_dataset["train"].column_names
+)
+```
+
+Now this works without error. We can check that our new dataset has many more elements than the original dataset by comparing the lengths:
+
+```py
+len(tokenized_dataset["train"]), len(drug_dataset["train"])
+```
+
+```python out
+(206772, 138514)
+```
+
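+The growth in row count comes from long sequences being cut into several chunks. The sketch below reproduces that effect in pure Python on invented token counts. It is a simplification under stated assumptions: the helper `split_into_chunks` is hypothetical, and it ignores the special tokens a real tokenizer adds to every chunk.
+
+```python
+def split_into_chunks(token_ids, max_length):
+    # Consecutive slices of at most `max_length` tokens (special tokens ignored).
+    return [
+        token_ids[i : i + max_length] for i in range(0, len(token_ids), max_length)
+    ]
+
+
+# Hypothetical token sequences for three reviews of 177, 90 and 300 tokens.
+examples = [list(range(177)), list(range(90)), list(range(300))]
+
+features = [chunk for ids in examples for chunk in split_into_chunks(ids, 128)]
+chunk_lengths = [len(chunk) for chunk in features]
+print(len(examples), "examples ->", len(features), "features")
+print(chunk_lengths)
+```
+
+Three inputs produce six outputs here, which is the same kind of mismatch that made the first `Dataset.map()` call fail.
+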
+We mentioned that we can also deal with the mismatched length problem by making the old columns the same size as the new ones. To do this, we need the `overflow_to_sample_mapping` field the tokenizer returns when we set `return_overflowing_tokens=True`. It gives us a mapping from a new feature index to the index of the sample it originated from. Using this, we can associate each key present in our original dataset with a list of values of the right size by repeating the values of each example as many times as it generates new features:
+
+```py
+def tokenize_and_split(examples):
+    result = tokenizer(
+        examples["review"],
+        truncation=True,
+        max_length=128,
+        return_overflowing_tokens=True,
+    )
+    # Extract the mapping between new and old indices
+    sample_map = result.pop("overflow_to_sample_mapping")
+    for key, values in examples.items():
+        result[key] = [values[i] for i in sample_map]
+    return result
+```
+
+We can see that it works with `Dataset.map()` without us needing to remove the old columns:
+
+```py
+tokenized_dataset = drug_dataset.map(tokenize_and_split, batched=True)
+tokenized_dataset
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['attention_mask', 'condition', 'date', 'drugName', 'input_ids', 'patient_id', 'rating', 'review', 'review_length', 'token_type_ids', 'usefulCount'],
+        num_rows: 206772
+    })
+    test: Dataset({
+        features: ['attention_mask', 'condition', 'date', 'drugName', 'input_ids', 'patient_id', 'rating', 'review', 'review_length', 'token_type_ids', 'usefulCount'],
+        num_rows: 68876
+    })
+})
+```
+
+We get the same number of training features as before, but here we've kept all the old fields. If you need them for some post-processing after applying your model, you might want to use this approach.
+
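+The column-repetition step can be traced by hand on a toy batch. In the sketch below, the helper name `repeat_columns`, the batch contents, and the mapping `[0, 0, 1]` are all invented for illustration; the mapping stands for the kind of list held in `overflow_to_sample_mapping` when the first of two reviews overflows into two features.
+
+```python
+def repeat_columns(examples, sample_map):
+    # For each new feature, copy the values of the example it came from.
+    return {key: [values[i] for i in sample_map] for key, values in examples.items()}
+
+
+# Hypothetical batch of two reviews; the first one produced two features.
+examples = {"drugName": ["A", "B"], "rating": [9.0, 4.0]}
+sample_map = [0, 0, 1]
+
+expanded = repeat_columns(examples, sample_map)
+print(expanded)
+```
+
+Every column now has three entries, one per feature, so the old columns line up with the new tokenized ones.
+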
+You've now seen how 🤗 Datasets can be used to preprocess a dataset in various ways. Although the processing functions of 🤗 Datasets will cover most of your model training needs, there may be times when you'll need to switch to Pandas to access more powerful features, like `DataFrame.groupby()` or high-level APIs for visualization. Fortunately, 🤗 Datasets is designed to be interoperable with libraries such as Pandas, NumPy, PyTorch, TensorFlow, and JAX. Let's take a look at how this works.
+
+## From `Dataset`s to `DataFrame`s and back[[from-datasets-to-dataframes-and-back]]
+
+<Youtube id="tfcY1067A5Q"/>
+
+To enable conversion between various third-party libraries, 🤗 Datasets provides a `Dataset.set_format()` function. This function only changes the _output format_ of the dataset, so you can easily switch to another format without affecting the underlying _data format_, which is Apache Arrow. The formatting is done in place. To demonstrate, let's convert our dataset to Pandas:
+
+```py
+drug_dataset.set_format("pandas")
+```
+
+Now when we access elements of the dataset we get a `pandas.DataFrame` instead of a dictionary:
+
+```py
+drug_dataset["train"][:3]
+```
+
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>patient_id</th>
+      <th>drugName</th>
+      <th>condition</th>
+      <th>review</th>
+      <th>rating</th>
+      <th>date</th>
+      <th>usefulCount</th>
+      <th>review_length</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>95260</td>
+      <td>Guanfacine</td>
+      <td>adhd</td>
+      <td>"My son is halfway through his fourth week of Intuniv..."</td>
+      <td>8.0</td>
+      <td>April 27, 2010</td>
+      <td>192</td>
+      <td>141</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>92703</td>
+      <td>Lybrel</td>
+      <td>birth control</td>
+      <td>"I used to take another oral contraceptive, which had 21 pill cycle, and was very happy- very light periods, max 5 days, no other side effects..."</td>
+      <td>5.0</td>
+      <td>December 14, 2009</td>
+      <td>17</td>
+      <td>134</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>138000</td>
+      <td>Ortho Evra</td>
+      <td>birth control</td>
+      <td>"This is my first time using any form of birth control..."</td>
+      <td>8.0</td>
+      <td>November 3, 2015</td>
+      <td>10</td>
+      <td>89</td>
+    </tr>
+  </tbody>
+</table>
+
+Let's create a `pandas.DataFrame` for the whole training set by selecting all the elements of `drug_dataset["train"]`:
+
+```py
+train_df = drug_dataset["train"][:]
+```
+
+> [!TIP]
+> 🚨 Under the hood, `Dataset.set_format()` changes the return format of the dataset's `__getitem__()` dunder method. This means that when we want to create a new object like `train_df` from a `Dataset` in the `"pandas"` format, we need to slice the whole dataset to obtain a `pandas.DataFrame`. You can verify for yourself that the type of `drug_dataset["train"]` is `Dataset`, irrespective of the output format.
+
+From here we can use all the Pandas functionality that we want. For example, we can do fancy chaining to compute the class distribution among the `condition` entries:
+
+```py
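+frequencies = (
+    train_df["condition"]
+    .value_counts()
+    # Name the index and the counts explicitly so the result does not depend
+    # on the default column names, which differ between pandas versions
+    .rename_axis("condition")
+    .reset_index(name="frequency")
+)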
+frequencies.head()
+```
+
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>condition</th>
+      <th>frequency</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>birth control</td>
+      <td>27655</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>depression</td>
+      <td>8023</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>acne</td>
+      <td>5209</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>anxiety</td>
+      <td>4991</td>
+    </tr>
+    <tr>
+      <th>4</th>
+      <td>pain</td>
+      <td>4744</td>
+    </tr>
+  </tbody>
+</table>
+
+And once we're done with our Pandas analysis, we can always create a new `Dataset` object by using the `Dataset.from_pandas()` function as follows:
+
+```py
+from datasets import Dataset
+
+freq_dataset = Dataset.from_pandas(frequencies)
+freq_dataset
+```
+
+```python out
+Dataset({
+    features: ['condition', 'frequency'],
+    num_rows: 819
+})
+```
+
+> [!TIP]
+> ✏️ **Try it out!** Compute the average rating per drug and store the result in a new `Dataset`.
+
+This wraps up our tour of the various preprocessing techniques available in 🤗 Datasets. To round out the section, let's create a validation set to prepare the dataset for training a classifier. Before doing so, we'll reset the output format of `drug_dataset` from `"pandas"` to `"arrow"`:
+
+```python
+drug_dataset.reset_format()
+```
+
+## Creating a validation set[[creating-a-validation-set]]
+
+Although we have a test set we could use for evaluation, it's good practice to leave the test set untouched and create a separate validation set during development. Once you are happy with the performance of your models on the validation set, you can do a final sanity check on the test set. This process helps mitigate the risk that you'll overfit to the test set and deploy a model that fails on real-world data.
+
+🤗 Datasets provides a `Dataset.train_test_split()` function that is based on the famous functionality from `scikit-learn`. Let's use it to split our training set into `train` and `validation` splits (we set the `seed` argument for reproducibility):
+
+```py
+drug_dataset_clean = drug_dataset["train"].train_test_split(train_size=0.8, seed=42)
+# Rename the default "test" split to "validation"
+drug_dataset_clean["validation"] = drug_dataset_clean.pop("test")
+# Add the "test" set to our `DatasetDict`
+drug_dataset_clean["test"] = drug_dataset["test"]
+drug_dataset_clean
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount', 'review_length'],
+        num_rows: 110811
+    })
+    validation: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount', 'review_length'],
+        num_rows: 27703
+    })
+    test: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount', 'review_length'],
+        num_rows: 46108
+    })
+})
+```
+
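+Conceptually, a seeded split shuffles the row indices with a fixed random seed and cuts the shuffled list at the requested fraction. The sketch below shows that idea with the standard library only. The helper `split_indices` is hypothetical and is not how `Dataset.train_test_split()` is implemented; it only illustrates why fixing the seed makes the split reproducible.
+
+```python
+import random
+
+
+def split_indices(num_rows, train_size, seed):
+    # Shuffle row indices with a fixed seed, then cut at the requested fraction.
+    indices = list(range(num_rows))
+    random.Random(seed).shuffle(indices)
+    cut = int(num_rows * train_size)
+    return indices[:cut], indices[cut:]
+
+
+train_idx, valid_idx = split_indices(10, train_size=0.8, seed=42)
+again_train, again_valid = split_indices(10, train_size=0.8, seed=42)
+print(len(train_idx), len(valid_idx))
+```
+
+Calling the helper twice with the same seed yields identical index lists, and the two parts never share a row.
+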
+Great, we've now prepared a dataset that's ready for training some models! In [section 5](/course/chapter5/5) we'll show you how to upload datasets to the Hugging Face Hub, but for now let's cap off our analysis by looking at a few ways you can save datasets on your local machine.
+
+## Saving a dataset[[saving-a-dataset]]
+
+<Youtube id="blF9uxYcKHo"/>
+
+Although 🤗 Datasets caches every downloaded dataset and the operations performed on it, there are times when you'll want to save a dataset to disk (for example, in case the cache gets deleted). As shown in the table below, 🤗 Datasets provides three main functions to save your dataset in different formats:
+
+| Data format |        Function        |
+| :---------: | :--------------------: |
+|    Arrow    | `Dataset.save_to_disk()` |
+|     CSV     |    `Dataset.to_csv()`    |
+|    JSON     |   `Dataset.to_json()`    |
+
+For example, let's save our cleaned dataset in the Arrow format:
+
+```py
+drug_dataset_clean.save_to_disk("drug-reviews")
+```
+
+This will create a directory with the following structure:
+
+```
+drug-reviews/
+├── dataset_dict.json
+├── test
+│   ├── dataset.arrow
+│   ├── dataset_info.json
+│   └── state.json
+├── train
+│   ├── dataset.arrow
+│   ├── dataset_info.json
+│   ├── indices.arrow
+│   └── state.json
+└── validation
+    ├── dataset.arrow
+    ├── dataset_info.json
+    ├── indices.arrow
+    └── state.json
+```
+
+Here we can see that each split is associated with its own *dataset.arrow* table, plus some metadata in *dataset_info.json* and *state.json*. You can think of the Arrow format as a fancy table of columns and rows that is optimized for building high-performance applications that process and transport large datasets.
+
+Once the dataset is saved, we can load it by using the `load_from_disk()` function as follows:
+
+```py
+from datasets import load_from_disk
+
+drug_dataset_reloaded = load_from_disk("drug-reviews")
+drug_dataset_reloaded
+```
+
+```python out
+DatasetDict({
+    train: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount', 'review_length'],
+        num_rows: 110811
+    })
+    validation: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount', 'review_length'],
+        num_rows: 27703
+    })
+    test: Dataset({
+        features: ['patient_id', 'drugName', 'condition', 'review', 'rating', 'date', 'usefulCount', 'review_length'],
+        num_rows: 46108
+    })
+})
+```
+
+For the CSV and JSON formats, we have to store each split as a separate file. One way to do this is by iterating over the keys and values in the `DatasetDict` object:
+
+```py
+for split, dataset in drug_dataset_clean.items():
+    dataset.to_json(f"drug-reviews-{split}.jsonl")
+```
+
+This saves each split in [JSON Lines format](https://jsonlines.org), where each row in the dataset is stored as a single line of JSON. Here's what the first example looks like:
+
+```py
+!head -n 1 drug-reviews-train.jsonl
+```
+
+```python out
+{"patient_id":141780,"drugName":"Escitalopram","condition":"depression","review":"\"I seemed to experience the regular side effects of LEXAPRO, insomnia, low sex drive, sleepiness during the day. I am taking it at night because my doctor said if it made me tired to take it at night. I assumed it would and started out taking it at night. Strange dreams, some pleasant. I was diagnosed with fibromyalgia. Seems to be helping with the pain. Have had anxiety and depression in my family, and have tried quite a few other medications that haven't worked. Only have been on it for two weeks but feel more positive in my mind, want to accomplish more in my life. Hopefully the side effects will dwindle away, worth it to stick with it from hearing others responses. Great medication.\"","rating":9.0,"date":"May 29, 2011","usefulCount":10,"review_length":125}
+```
+
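+Because every line of a JSON Lines file is a complete JSON document, such files can also be written and inspected with the standard library alone. The sketch below round-trips two invented rows through a temporary file; the file name and row contents are made up for illustration.
+
+```python
+import json
+import os
+import tempfile
+
+# Invented rows for illustration.
+rows = [
+    {"drugName": "A", "rating": 9.0},
+    {"drugName": "B", "rating": 4.0},
+]
+
+path = os.path.join(tempfile.mkdtemp(), "toy-reviews.jsonl")
+
+# Write one JSON object per line.
+with open(path, "w", encoding="utf-8") as f:
+    for row in rows:
+        f.write(json.dumps(row) + "\n")
+
+# Read the file back, parsing each non-empty line independently.
+with open(path, encoding="utf-8") as f:
+    reloaded = [json.loads(line) for line in f if line.strip()]
+
+print(reloaded[0])
+```
+
+Parsing line by line means a large file never has to be held in memory as a single JSON document.
+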
+We can then use the techniques from [section 2](/course/chapter5/2) to load the JSON files as follows:
+
+```py
+from datasets import load_dataset
+
+data_files = {
+    "train": "drug-reviews-train.jsonl",
+    "validation": "drug-reviews-validation.jsonl",
+    "test": "drug-reviews-test.jsonl",
+}
+drug_dataset_reloaded = load_dataset("json", data_files=data_files)
+```
+
+And that's it for our excursion into data wrangling with 🤗 Datasets! Now that we have a cleaned dataset for training a model, here are a few ideas that you could try out:
+
+1. Use the techniques from [Chapter 3](/course/chapter3) to train a classifier that can predict the patient condition based on the drug review.
+2. Use the `summarization` pipeline from [Chapter 1](/course/chapter1) to generate summaries of the reviews.
+
+Next, we'll take a look at how 🤗 Datasets can enable you to work with huge datasets without blowing up your laptop!
+
+## Glossary
+
+*   **Slice and Dice**: Cutting a dataset into pieces and reshaping it as needed.
+*   **🤗 Datasets Library**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **Pandas**: An open-source library for data analysis and manipulation in Python.
+*   **`Dataset` Object**: The object that represents a table of data in the 🤗 Datasets library.
+*   **`DatasetDict` Object**: An object that stores several datasets, such as a training set, a validation set, and a test set, in dictionary form.
+*   **`Dataset.map()` Method**: A method in the 🤗 Datasets library that applies a function to each element, or each batch of elements, of a dataset.
+*   **UCI Machine Learning Repository**: A repository that collects many machine learning datasets.
+*   **Drug Review Dataset**: A dataset of patient reviews of drugs, together with the related conditions and ratings.
+*   **`wget` Command**: A Unix/Linux command-line utility for downloading files from the network.
+*   **`unzip` Command**: A command-line utility for decompressing ZIP archives.
+*   **TSV (Tab-Separated Values)**: A plain-text file format in which data values are separated by tabs.
+*   **CSV (Comma-Separated Values)**: A plain-text file format in which data values are separated by commas.
+*   **`delimiter` Argument**: Specifies the character that separates columns in CSV/TSV files.
+*   **`load_dataset()` Function**: The 🤗 Datasets function that loads a dataset, from the Hugging Face Hub or from local or remote files, and caches it.
+*   **`data_files` Argument**: The `load_dataset()` argument that specifies the path (or URL) of the dataset files, whether local or remote.
+*   **Random Sample**: A small, randomly selected portion of a dataset.
+*   **`Dataset.shuffle()`**: A function that randomly shuffles the examples in a dataset.
+*   **`seed`**: A value that fixes random number generation so that results are reproducible.
+*   **`Dataset.select()`**: A function that selects the examples at the given indices of a dataset.
+*   **Indices**: The positional numbers of the examples in a dataset.
+*   **Anonymized ID**: An identifier from which personal information has been hidden.
+*   **Uppercase/Lowercase Labels**: Category labels written in uppercase or lowercase letters.
+*   **Python Line Separators (`\r\n`)**: The characters that mark a line break (carriage return and newline) as they appear in Python strings.
+*   **HTML Character Codes (`&#039;`)**: Codes that represent special characters in HTML.
+*   **`Dataset.unique()`**: A function that finds the unique values in a dataset column.
+*   **`DatasetDict.rename_column()`**: A function that renames a column across the splits of a `DatasetDict` object.
+*   **`Dataset.filter()`**: A function that keeps only the examples that satisfy a given condition.
+*   **Lambda Function**: A short, anonymous function in Python, written inline.
+*   **Keywords**: Words that have a special meaning in the Python programming language.
+*   **`html` Module**: The Python module for handling HTML entities.
+*   **`html.unescape()`**: A function that converts HTML character codes back to the characters they stand for.
+*   **Heuristic**: A quick, practical approach to solving a problem.
+*   **Whitespace**: Blank characters in text (space, tab, newline).
+*   **`Dataset.add_column()`**: A function that adds a new column to a dataset.
+*   **NumPy Array**: A multi-dimensional array object for handling numbers efficiently in Python.
+*   **Descending Order**: An ordering from largest to smallest.
+*   **`batched=True`**: The `Dataset.map()` argument that makes the function process examples in batches.
+*   **List Comprehension**: A concise way to build a new list in Python.
+*   **`%time` / `%%time`**: IPython/Jupyter Notebook magic commands for measuring code execution time.
+*   **Wall Time**: The total elapsed (actual clock) time of a code execution.
+*   **Fast Tokenizer**: A tokenizer available through the Hugging Face Transformers library that is implemented in Rust and provides fast tokenization.
+*   **Slow Tokenizer**: A tokenizer implemented in Python, which is slower than a fast tokenizer.
+*   **`AutoTokenizer`**: A class in the Hugging Face Transformers library that automatically loads the appropriate tokenizer from a model name.
+*   **`truncation=True`**: Tells the tokenizer to cut input sequences down to the maximum length (`max_length`).
+*   **`use_fast=False`**: The `AutoTokenizer.from_pretrained()` argument that selects the slow tokenizer instead of the fast one.
+*   **Rust**: A systems programming language used to build high-performance applications.
+*   **Parallelize Code Execution**: Running code simultaneously across multiple threads or processes.
+*   **Multiprocessing**: Using several CPU cores of a computer to carry out tasks at the same time.
+*   **`num_proc` Argument**: The `Dataset.map()` argument that sets the number of processes to use for multiprocessing.
+*   **Optimal Number of Processes**: The number of processes that gives the best performance.
+*   **`return_overflowing_tokens=True`**: The argument that tells the tokenizer to return not only the first chunk of an input text but also the remaining chunks (the overflowing tokens).
+*   **ArrowInvalid**: An error raised by the Apache Arrow library, typically caused by mismatched data structures.
+*   **Mismatched Lengths**: A difference in length between datasets or columns.
+*   **`remove_columns` Argument**: The `Dataset.map()` argument used to remove columns that are no longer needed.
+*   **`column_names` Property**: A property that returns the column names of a dataset.
+*   **`overflow_to_sample_mapping` Field**: A field returned by the tokenizer when `return_overflowing_tokens=True` is set. It links each output feature to the input sample it came from.
+*   **Post-processing**: The process of preparing a model's outputs for their final use.
+*   **`DataFrame.groupby()`**: The Pandas DataFrame method that groups data by the values of one or more columns.
+*   **High-level APIs**: Programming interfaces that are easy for software developers to use and offer a high level of abstraction.
+*   **Visualization**: Presenting data as graphs or images.
+*   **NumPy**: A Python library for handling numbers efficiently.
+*   **PyTorch**: An open-source machine learning library created by Facebook (now Meta) and used to build deep learning models.
+*   **TensorFlow**: An open-source machine learning library created by Google and used to build deep learning models.
+*   **JAX**: A high-performance numerical computing library created by Google and used for AI/ML research.
+*   **`Dataset.set_format()` Function**: A function that changes the output format of a dataset (for example, "pandas", "pytorch", or "numpy").
+*   **Output Format**: The format in which a dataset returns its data.
+*   **Underlying Data Format**: The format in which data is stored inside a dataset (Apache Arrow).
+*   **Apache Arrow**: An in-memory data format that makes data exchange between data analytics applications fast and efficient.
+*   **`pandas.DataFrame`**: The Pandas data structure that represents tabular data.
+*   **`__getitem__()` Dunder Method**: The special method called when the elements of a Python object are accessed by index or key.
+*   **`Dataset.from_pandas()` Function**: A function that creates a `Dataset` object from a Pandas DataFrame.
+*   **Validation Set**: Training လုပ်နေစဉ် model ၏ စွမ်းဆောင်ရည်ကို အကဲဖြတ်ရန် အသုံးပြုသော dataset အပိုင်း။
+*   **Test Set**: Model ၏ နောက်ဆုံး စွမ်းဆောင်ရည်ကို တိုင်းတာရန် အသုံးပြုသော dataset အပိုင်း။
+*   **Sanity Check**: စနစ် သို့မဟုတ် code က မှန်ကန်စွာ အလုပ်လုပ်ခြင်းရှိမရှိ စစ်ဆေးရန် ရိုးရှင်းသော စမ်းသပ်မှု။
+*   **Overfit**: Model က training data ကို အလွန်အကျွံ သင်ယူသွားပြီး မမြင်ဘူးသေးသော data တွင် စွမ်းဆောင်ရည် ကျဆင်းခြင်း။
+*   **`Dataset.train_test_split()` Function**: Dataset ကို training နှင့် testing (သို့မဟုတ် validation) splits များအဖြစ် ခွဲထုတ်ရန် အသုံးပြုသော function။ `scikit-learn` library ၏ functionality နှင့် ဆင်တူသည်။
+*   **`scikit-learn`**: Python အတွက် machine learning library တစ်ခု။
+*   **`train_size` Argument**: `train_test_split()` function တွင် training set ၏ အရွယ်အစား (ရာခိုင်နှုန်း) ကို သတ်မှတ်ရန်။
+*   **`pop()` Method**: Dictionary မှ သတ်မှတ်ထားသော key နှင့် ၎င်း၏ value ကို ဖယ်ရှားပြီး value ကို ပြန်ပေးသော method။
+*   **Classifier**: ဒေတာအချက်အလက်များကို သတ်မှတ်ထားသော အမျိုးအစားများ သို့မဟုတ် အတန်းများထဲသို့ ခွဲခြားသတ်မှတ်ရန် လေ့ကျင့်ထားသော model။
+*   **Cache**: မကြာခဏ အသုံးပြုရသော ဒေတာများကို မြန်မြန်ဆန်ဆန် ဝင်ရောက်ကြည့်ရှုနိုင်ရန် သိမ်းဆည်းထားသော ယာယီသိုလှောင်ရာနေရာ။
+*   **`Dataset.save_to_disk()`**: Dataset ကို Apache Arrow format ဖြင့် local disk တွင် သိမ်းဆည်းရန် အသုံးပြုသော function။
+*   **`Dataset.to_csv()`**: Dataset ကို CSV format ဖြင့် local disk တွင် သိမ်းဆည်းရန် အသုံးပြုသော function။
+*   **`Dataset.to_json()`**: Dataset ကို JSON format ဖြင့် local disk တွင် သိမ်းဆည်းရန် အသုံးပြုသော function။
+*   **`load_from_disk()` Function**: Local disk တွင် သိမ်းဆည်းထားသော dataset ကို ပြန်လည် load လုပ်ရန် အသုံးပြုသော function။
+*   **JSON Lines Format**: JSON objects များကို line တစ်ကြောင်းစီတွင် တစ်ခုစီ ထားရှိသော JSON format ၏ ပုံစံတစ်မျိုး။
+*   **`summarization` Pipeline**: စာသားကို အကျဉ်းချုပ်ပေးသည့် Natural Language Processing (NLP) pipeline။
+*   **Data Wrangling**: ကုန်ကြမ်းဒေတာ (raw data) များကို ပိုမိုအသုံးဝင်ပြီး သန့်ရှင်းသော ပုံစံသို့ ပြောင်းလဲရန်အတွက် လုပ်ဆောင်သော လုပ်ငန်းစဉ်များ။
+*   **Laptop RAM**: Laptop ကွန်ပျူတာ၏ Random Access Memory (RAM)။
\ No newline at end of file
diff --git a/chapters/my/chapter5/4.mdx b/chapters/my/chapter5/4.mdx
new file mode 100644
index 000000000..8800628f1
--- /dev/null
+++ b/chapters/my/chapter5/4.mdx
@@ -0,0 +1,336 @@
+# Big data? 🤗 Datasets to the rescue![[big-data-datasets-to-the-rescue]]
+
+<CourseFloatingBanner chapter={5}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter5/section4.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter5/section4.ipynb"},
+]} />
+
+Nowadays it is not uncommon to find yourself working with multi-gigabyte datasets, especially if you're planning to pretrain a transformer like BERT or GPT-2 from scratch. In these cases, even _loading_ the data can be a challenge. For example, the WebText corpus used to pretrain GPT-2 consists of over 8 million documents and 40 GB of text, and loading all of it into your laptop's RAM is likely to overwhelm the machine.
+
+Fortunately, 🤗 Datasets has been designed to overcome these limitations. It frees you from memory management problems by treating datasets as _memory-mapped_ files, and from hard drive limits by _streaming_ the entries in a corpus.
+
+<Youtube id="JwISwTCPPWo"/>
+
+In this section we'll explore these features of 🤗 Datasets using the Pile, a huge 825 GB corpus. Let's get started!
+
+## What is the Pile?[[what-is-the-pile]]
+
+The Pile is an English text corpus that was created by [EleutherAI](https://www.eleuther.ai) for training large-scale language models. It includes a diverse range of datasets, spanning scientific articles, GitHub code repositories, and filtered web text. The training corpus is available in [14 GB chunks](https://the-eye.eu/public/AI/pile/), and you can also download several of the [individual components](https://the-eye.eu/public/AI/pile_preliminary_components/). Let's start by taking a look at the PubMed Abstracts dataset, which is a corpus of abstracts from 15 million biomedical publications on [PubMed](https://pubmed.ncbi.nlm.nih.gov/). The dataset is in [JSON Lines format](https://jsonlines.org) and is compressed using the `zstandard` library, so first we need to install that:
+
+```py
+!pip install zstandard
+```
+
+Next, we can load the dataset using the method for remote files that we learned in [section 2](/course/chapter5/2):
+
+```py
+from datasets import load_dataset
+
+# This takes a little while to run, so go grab a tea or coffee while you wait :)
+data_files = "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"
+pubmed_dataset = load_dataset("json", data_files=data_files, split="train")
+pubmed_dataset
+```
+
+```python out
+Dataset({
+    features: ['meta', 'text'],
+    num_rows: 15518009
+})
+```
+
+We can see that there are 15,518,009 rows and 2 columns in our dataset. That's a lot!
+
+> [!TIP]
+> ✎ By default, 🤗 Datasets will decompress the files needed to load a dataset. If you want to preserve hard drive space, you can pass `DownloadConfig(delete_extracted=True)` to the `download_config` argument of `load_dataset()`. See the [documentation](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) for more details.
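+
+As a rough sketch of the tip above, the configuration object can be built once and handed to `load_dataset()`. The helper name `load_pubmed_without_extracted_copy` is made up for illustration; `DownloadConfig`, `delete_extracted`, and `download_config` are the names mentioned in the tip. Nothing is downloaded until the helper is actually called.
+
+```python
+from datasets import DownloadConfig, load_dataset
+
+# Ask 🤗 Datasets to delete the decompressed copy once the dataset is prepared.
+download_config = DownloadConfig(delete_extracted=True)
+
+
+def load_pubmed_without_extracted_copy(data_files):
+    # Hypothetical helper: calling it triggers the (large) download.
+    return load_dataset(
+        "json",
+        data_files=data_files,
+        split="train",
+        download_config=download_config,
+    )
+```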
+
+Let's inspect the contents of the first example:
+
+```py
+pubmed_dataset[0]
+```
+
+```python out
+{'meta': {'pmid': 11409574, 'language': 'eng'},
+ 'text': 'Epidemiology of hypoxaemia in children with acute lower respiratory infection.\nTo determine the prevalence of hypoxaemia in children aged under 5 years suffering acute lower respiratory infections (ALRI), the risk factors for hypoxaemia in children under 5 years of age with ALRI, and the association of hypoxaemia with an increased risk of dying in children of the same age ...'}
+```
+
+Okay, this looks like the abstract from a medical article. Now let's see how much RAM we've used to load the dataset!
+
+## The magic of memory mapping[[the-magic-of-memory-mapping]]
+
+A simple way to measure memory usage in Python is with the [`psutil`](https://psutil.readthedocs.io/en/latest/) library, which can be installed with `pip` as follows:
+
+```python
+!pip install psutil
+```
+
+It provides a `Process` class that allows us to check the memory usage of the current process as follows:
+
+```py
+import psutil
+
+# Process.memory_info is expressed in bytes, so convert to megabytes
+print(f"RAM used: {psutil.Process().memory_info().rss / (1024 * 1024):.2f} MB")
+```
+
+```python out
+RAM used: 5678.33 MB
+```
+
+Here the `rss` attribute refers to the _resident set size_, which is the portion of a process's memory that is held in RAM. This measurement also includes the memory used by the Python interpreter and the libraries we've loaded, so the actual amount of memory used to load the dataset is a bit smaller. For comparison, let's see how large the dataset is on disk, using the `dataset_size` attribute. Since the result is expressed in bytes like before, we need to manually convert it to gigabytes:
+
+```py
+print(f"Dataset size in bytes : {pubmed_dataset.dataset_size}")
+size_gb = pubmed_dataset.dataset_size / (1024**3)
+print(f"Dataset size (cache file) : {size_gb:.2f} GB")
+```
+
+```python out
+Dataset size in bytes : 20979437051
+Dataset size (cache file) : 19.54 GB
+```
+
+Nice. Despite it being almost 20 GB large, we're able to load and access the dataset with much less RAM!
+
+> [!TIP]
+> ✏️ **Try it out!** Pick one of the [subsets](https://the-eye.eu/public/AI/pile_preliminary_components/) from the Pile that is larger than your laptop or desktop's RAM, load it with 🤗 Datasets, and measure the amount of RAM used. Note that to get an accurate measurement, you'll want to do this in a new process. You can find the decompressed sizes of each subset in Table 1 of [the Pile paper](https://arxiv.org/abs/2101.00027).
+
+If you're familiar with Pandas, this result might come as a surprise because of Wes McKinney's famous [rule of thumb](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) that you typically need 5 to 10 times as much RAM as the size of your dataset. So how does 🤗 Datasets solve this memory management problem? 🤗 Datasets treats each dataset as a [memory-mapped file](https://en.wikipedia.org/wiki/Memory-mapped_file), which provides a mapping between RAM and filesystem storage that allows the library to access and operate on elements of the dataset without needing to fully load it into memory.
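+
+To make the idea of a memory-mapped file concrete, here is a minimal sketch using Python's standard `mmap` module. It is not how 🤗 Datasets is implemented (the library relies on Apache Arrow); the tiny temporary file simply stands in for a large dataset file on disk.
+
+```python
+import mmap
+import os
+import tempfile
+
+# Write a small throwaway file that stands in for a large dataset file.
+with tempfile.NamedTemporaryFile(delete=False) as tmp:
+    tmp.write(b"first record\nsecond record\nthird record\n")
+    path = tmp.name
+
+with open(path, "rb") as f:
+    # Map the file into virtual memory; pages are read only when touched.
+    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
+        start = mapped.find(b"second")
+        # Slicing copies just these bytes instead of loading the whole file.
+        chunk = mapped[start : start + len(b"second record")]
+
+os.remove(path)
+print(chunk)
+```
+
+The slice behaves like ordinary `bytes` access, yet the operating system decides which parts of the file actually reside in RAM.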
+
+Memory-mapped files can also be shared across multiple processes, which enables methods like `Dataset.map()` to be parallelized without needing to move or copy the dataset. Under the hood, these capabilities are all realized by the [Apache Arrow](https://arrow.apache.org) memory format and the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library, which make data loading and processing lightning fast. (For more details about Apache Arrow and comparisons to Pandas, check out [Dejan Simic's blog post](https://towardsdatascience.com/apache-arrow-read-dataframe-with-zero-memory-69634092b1a).) To see this in action, let's run a little speed test by iterating over all the elements in the PubMed Abstracts dataset:
+
+```py
+import timeit
+
+code_snippet = """batch_size = 1000
+
+for idx in range(0, len(pubmed_dataset), batch_size):
+    _ = pubmed_dataset[idx:idx + batch_size]
+"""
+
+time = timeit.timeit(stmt=code_snippet, number=1, globals=globals())
+print(
+    f"Iterated over {len(pubmed_dataset)} examples (about {size_gb:.1f} GB) in "
+    f"{time:.1f}s, i.e. {size_gb/time:.3f} GB/s"
+)
+```
+
+```python out
+'Iterated over 15518009 examples (about 19.5 GB) in 64.2s, i.e. 0.304 GB/s'
+```
+
+Here we've used Python's `timeit` module to measure the execution time taken by `code_snippet`. You'll typically be able to iterate over a dataset at a speed of a few tenths of a GB/s to several GB/s. This works well for the vast majority of applications, but sometimes you'll have to work with a dataset that is too large to even store on your laptop's hard drive. For example, if we tried to download the Pile in its entirety, we'd need 825 GB of free disk space! To handle these cases, 🤗 Datasets provides a streaming feature that allows us to download and access elements on the fly, without needing to download the whole dataset. Let's take a look at how this works.
+
+> [!TIP]
+> 💡 In Jupyter notebooks you can also time cells using the [`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
+
+## Streaming datasets[[streaming-datasets]]
+
+To enable dataset streaming you just need to pass the `streaming=True` argument to the `load_dataset()` function. For example, let's load the PubMed Abstracts dataset again, but in streaming mode:
+
+```py
+pubmed_dataset_streamed = load_dataset(
+    "json", data_files=data_files, split="train", streaming=True
+)
+```
+
+Instead of the familiar `Dataset` that we've encountered elsewhere in this chapter, the object returned with `streaming=True` is an `IterableDataset`. As the name suggests, to access the elements of an `IterableDataset` we need to iterate over it. We can access the first element of our streamed dataset as follows:
+
+```py
+next(iter(pubmed_dataset_streamed))
+```
+
+```python out
+{'meta': {'pmid': 11409574, 'language': 'eng'},
+ 'text': 'Epidemiology of hypoxaemia in children with acute lower respiratory infection.\nTo determine the prevalence of hypoxaemia in children aged under 5 years suffering acute lower respiratory infections (ALRI), the risk factors for hypoxaemia in children under 5 years of age with ALRI, and the association of hypoxaemia with an increased risk of dying in children of the same age ...'}
+```
+
+The elements from a streamed dataset can be processed on the fly using `IterableDataset.map()`, which is useful during training if you need to tokenize the inputs. The process is exactly the same as the one we used to tokenize our dataset in [Chapter 3](/course/chapter3), with the only difference being that outputs are returned one by one:
+
+```py
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+tokenized_dataset = pubmed_dataset_streamed.map(lambda x: tokenizer(x["text"]))
+next(iter(tokenized_dataset))
+```
+
+```python out
+{'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
+```
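+
+Conceptually, streaming combined with `map()` behaves like a chain of Python generators: nothing is read or transformed until the next element is requested. The sketch below illustrates that idea with the standard library only; `stream_examples`, `lazy_map`, and `add_word_count` are hypothetical names, and the in-memory JSON Lines text stands in for a remote file.
+
+```python
+import io
+import json
+
+# Stand-in for a remote JSON Lines file; a real stream arrives over the network.
+raw_lines = io.StringIO(
+    '{"text": "first abstract"}\n'
+    '{"text": "second abstract"}\n'
+    '{"text": "third abstract"}\n'
+)
+
+
+def stream_examples(lines):
+    # Parse one line at a time, so only the current example is held in memory.
+    for line in lines:
+        yield json.loads(line)
+
+
+def lazy_map(function, examples):
+    # Apply `function` only when the next example is requested.
+    for example in examples:
+        yield function(example)
+
+
+def add_word_count(example):
+    # Toy transformation standing in for a tokenizer.
+    return {**example, "num_words": len(example["text"].split())}
+
+
+processed = lazy_map(add_word_count, stream_examples(raw_lines))
+first = next(iter(processed))
+print(first)
+```
+
+Only the first line has been parsed at this point; the remaining lines stay untouched until the iterator is advanced again.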
+
+> [!TIP]
+> 💡 To speed up tokenization with streaming you can pass `batched=True`, as we saw in the last section. It will process the examples batch by batch; the default batch size is 1,000 and can be specified with the `batch_size` argument.
+
+You can also shuffle a streamed dataset using `IterableDataset.shuffle()`, but unlike `Dataset.shuffle()` this only shuffles the elements in a predefined `buffer_size`:
+
+```py
+shuffled_dataset = pubmed_dataset_streamed.shuffle(buffer_size=10_000, seed=42)
+next(iter(shuffled_dataset))
+```
+
+```python out
+{'meta': {'pmid': 11410799, 'language': 'eng'},
+ 'text': 'Randomized study of dose or schedule modification of granulocyte colony-stimulating factor in platinum-based chemotherapy for elderly patients with lung cancer ...'}
+```
+
+In this example, we selected a random example from the first 10,000 examples in the buffer. Once an example is accessed, its spot in the buffer is filled with the next example in the corpus (i.e., the 10,001st example in the case above). You can also select elements from a streamed dataset using the `IterableDataset.take()` and `IterableDataset.skip()` functions, which act in a similar way to `Dataset.select()`. For example, to select the first 5 examples in the PubMed Abstracts dataset we can do the following:
+
+```py
+dataset_head = pubmed_dataset_streamed.take(5)
+list(dataset_head)
+```
+
+```python out
+[{'meta': {'pmid': 11409574, 'language': 'eng'},
+  'text': 'Epidemiology of hypoxaemia in children with acute lower respiratory infection ...'},
+ {'meta': {'pmid': 11409575, 'language': 'eng'},
+  'text': 'Clinical signs of hypoxaemia in children with acute lower respiratory infection: indicators of oxygen therapy ...'},
+ {'meta': {'pmid': 11409576, 'language': 'eng'},
+  'text': "Hypoxaemia in children with severe pneumonia in Papua New Guinea ..."},
+ {'meta': {'pmid': 11409577, 'language': 'eng'},
+  'text': 'Oxygen concentrators and cylinders ...'},
+ {'meta': {'pmid': 11409578, 'language': 'eng'},
+  'text': 'Oxygen supply in rural africa: a personal experience ...'}]
+```
+
+Similarly, you can use the `IterableDataset.skip()` function to create training and validation splits from a shuffled dataset as follows:
+
+```py
+# Skip the first 1,000 examples and include the rest in the training set
+train_dataset = shuffled_dataset.skip(1000)
+# Take the first 1,000 examples for the validation set
+validation_dataset = shuffled_dataset.take(1000)
+```
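+
+The buffer-based shuffling and the `take()`/`skip()` split can be mimicked in plain Python to see why they fit together. This is a simplified model, not the library's actual implementation: `buffered_shuffle` and `shuffled_stream` are hypothetical names, and a small range of integers stands in for the corpus.
+
+```python
+import random
+from itertools import islice
+
+
+def buffered_shuffle(examples, buffer_size, seed):
+    # Simplified model of shuffling a stream with a fixed-size buffer.
+    if buffer_size < 1:
+        raise ValueError("buffer_size must be at least 1")
+    rng = random.Random(seed)
+    iterator = iter(examples)
+    buffer = list(islice(iterator, buffer_size))  # fill the buffer first
+    for incoming in iterator:
+        slot = rng.randrange(len(buffer))
+        yield buffer[slot]       # emit a random element from the buffer
+        buffer[slot] = incoming  # refill the slot with the next stream element
+    rng.shuffle(buffer)          # stream exhausted: drain what is left
+    yield from buffer
+
+
+def shuffled_stream():
+    # Rebuild the stream with the same seed so both splits see the same order.
+    return buffered_shuffle(range(20), buffer_size=5, seed=42)
+
+
+validation = list(islice(shuffled_stream(), 3))   # like take(3)
+train = list(islice(shuffled_stream(), 3, None))  # like skip(3)
+print(validation, train)
+```
+
+Because both splits replay the same seeded order, the validation examples are exactly the ones the training split skips. The first element emitted always comes from the first `buffer_size` items of the stream, which is why a small buffer gives only a local shuffle.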
+
+Let's round out our exploration of dataset streaming with a common application: combining multiple datasets together to create a single corpus. 🤗 Datasets provides an `interleave_datasets()` function that converts a list of `IterableDataset` objects into a single `IterableDataset`, where the elements of the new dataset are obtained by alternating among the source examples. This function is especially useful when you're trying to combine large datasets, so as an example let's stream the FreeLaw subset of the Pile, which is a 51 GB dataset of legal opinions from US courts:
+
+```py
+law_dataset_streamed = load_dataset(
+    "json",
+    data_files="https://the-eye.eu/public/AI/pile_preliminary_components/FreeLaw_Opinions.jsonl.zst",
+    split="train",
+    streaming=True,
+)
+next(iter(law_dataset_streamed))
+```
+
+```python out
+{'meta': {'case_ID': '110921.json',
+  'case_jurisdiction': 'scotus.tar.gz',
+  'date_created': '2010-04-28T17:12:49Z'},
+ 'text': '\n461 U.S. 238 (1983)\nOLIM ET AL.\nv.\nWAKINEKONA\nNo. 81-1581.\nSupreme Court of United States.\nArgued January 19, 1983.\nDecided April 26, 1983.\nCERTIORARI TO THE UNITED STATES COURT OF APPEALS FOR THE NINTH CIRCUIT\n*239 Michael A. Lilly, First Deputy Attorney General of Hawaii, argued the cause for petitioners. With him on the brief was James H. Dannenberg, Deputy Attorney General...'}
+```
+
+This dataset is large enough to stress the RAM of most laptops, yet we've been able to load and access it without breaking a sweat! Let's now combine the examples from the FreeLaw and PubMed Abstracts datasets with the `interleave_datasets()` function:
+
+```py
+from itertools import islice
+from datasets import interleave_datasets
+
+combined_dataset = interleave_datasets([pubmed_dataset_streamed, law_dataset_streamed])
+list(islice(combined_dataset, 2))
+```
+
+```python out
+[{'meta': {'pmid': 11409574, 'language': 'eng'},
+  'text': 'Epidemiology of hypoxaemia in children with acute lower respiratory infection ...'},
+ {'meta': {'case_ID': '110921.json',
+   'case_jurisdiction': 'scotus.tar.gz',
+   'date_created': '2010-04-28T17:12:49Z'},
+  'text': '\n461 U.S. 238 (1983)\nOLIM ET AL.\nv.\nWAKINEKONA\nNo. 81-1581.\nSupreme Court of United States.\nArgued January 19, 1983.\nDecided April 26, 1983.\nCERTIORARI TO THE UNITED STATES COURT OF APPEALS FOR THE NINTH CIRCUIT\n*239 Michael A. Lilly, First Deputy Attorney General of Hawaii, argued the cause for petitioners. With him on the brief was James H. Dannenberg, Deputy Attorney General...'}]
+```
+
+Here we've used the `islice()` function from Python's `itertools` module to select the first two examples from the combined dataset, and we can see that they match the first examples from each of the two source datasets.
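+
+The alternating behaviour can be pictured with a small pure-Python generator. This is only an illustration of the idea, not the code behind `interleave_datasets()`: the `alternate` function is a hypothetical name, the toy generators stand in for the two streamed datasets, and stopping as soon as one source runs out is a simplification chosen for this sketch.
+
+```python
+from itertools import islice
+
+
+def alternate(*sources):
+    # Take one element from each source in turn; stop once any source runs out.
+    iterators = [iter(source) for source in sources]
+    while True:
+        for iterator in iterators:
+            try:
+                yield next(iterator)
+            except StopIteration:
+                return
+
+
+# Toy stand-ins for the two streamed datasets.
+pubmed_like = ({"source": "pubmed", "idx": i} for i in range(3))
+law_like = ({"source": "law", "idx": i} for i in range(3))
+
+combined = list(islice(alternate(pubmed_like, law_like), 4))
+print([example["source"] for example in combined])
+```
+
+As with the real streamed datasets, nothing is materialized up front: each source is advanced only when its turn comes.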
+
+Finally, if you want to stream the Pile in its 825 GB entirety, you can grab all the prepared files as follows:
+
+```py
+base_url = "https://the-eye.eu/public/AI/pile/"
+data_files = {
+    "train": [base_url + "train/" + f"{idx:02d}.jsonl.zst" for idx in range(30)],
+    "validation": base_url + "val.jsonl.zst",
+    "test": base_url + "test.jsonl.zst",
+}
+pile_dataset = load_dataset("json", data_files=data_files, streaming=True)
+next(iter(pile_dataset["train"]))
+```
+
+```python out
+{'meta': {'pile_set_name': 'Pile-CC'},
+ 'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
+```
+
+> [!TIP]
+> ✏️ **Try it out!** Use one of the large Common Crawl corpora like [`mc4`](https://huggingface.co/datasets/mc4) or [`oscar`](https://huggingface.co/datasets/oscar) to create a streaming multilingual dataset that represents the spoken proportions of languages in a country of your choice. For example, the four national languages in Switzerland are German, French, Italian, and Romansh, so you could try creating a Swiss corpus by sampling the Oscar subsets according to their spoken proportion.
+
+You now have all the tools you need to load and process datasets of all shapes and sizes. However, unless you're exceptionally lucky, there will come a point in your NLP journey where you'll have to actually create a dataset to solve the problem at hand. That's the topic of the next section!
+
+## Glossary
+
+*   **Big Data**: Extremely large collections of data that are difficult to handle with conventional data management tools.
+*   **🤗 Datasets**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **Pretrain**: Training a model from scratch on a large amount of data, before any task-specific fine-tuning.
+*   **Transformer**: A deep learning architecture that has been highly successful in Natural Language Processing (NLP).
+*   **BERT (Bidirectional Encoder Representations from Transformers)**: A Transformer-based language model created by Google.
+*   **GPT-2 (Generative Pre-trained Transformer 2)**: A Transformer-based language model created by OpenAI.
+*   **WebText Corpus**: The large text collection used to pretrain GPT-2.
+*   **RAM (Random Access Memory)**: A computer's temporary working memory.
+*   **Memory Management**: Controlling and organizing how a computer's memory is used.
+*   **Memory-mapped Files**: A technique that maps the contents of a file directly into a computer's virtual memory. Only the parts of a large file that are needed are loaded from disk into memory, which reduces memory usage.
+*   **Streaming**: Downloading and processing data piece by piece, without having to download or load the whole dataset first.
+*   **Corpus**: A large collection of text.
+*   **Hard Drive Limits**: The storage capacity limits of a computer's hard drive (disk storage).
+*   **The Pile**: A large English text corpus created by EleutherAI.
+*   **EleutherAI**: A collective focused on building large-scale open-source AI models.
+*   **PubMed Abstracts Dataset**: A dataset containing abstracts of biomedical publications from PubMed.
+*   **JSON Lines Format**: A variant of JSON in which each line holds one JSON object.
+*   **`zstandard` Library**: A Python library for compressing and decompressing files with the Zstandard compression algorithm.
+*   **`pip install zstandard`**: Installing the `zstandard` library with the Python package manager `pip`.
+*   **`load_dataset()` Function**: The function in the Hugging Face Datasets library used to load datasets from the Hub, local files, or remote files.
+*   **`data_files` Argument**: The argument of `load_dataset()` that specifies the path (or URL) of the dataset files, whether local or remote.
+*   **`split` Argument**: The argument of `load_dataset()` that specifies which split of the dataset to load (e.g., "train", "validation", "test").
+*   **`Dataset` Object**: The object that represents data in the 🤗 Datasets library.
+*   **`DownloadConfig(delete_extracted=True)`**: An instance of the `DownloadConfig` class configured to delete extracted files after they have been used.
+*   **`download_config` Argument**: The argument of `load_dataset()` that specifies the download configuration.
+*   **`psutil` Library**: A Python library for inspecting processes and system utilization.
+*   **`Process` Class**: The `psutil` class that represents a process, by default the current Python process.
+*   **`memory_info().rss`**: The amount of memory a process occupies in RAM (the resident set size), in bytes, as reported by `Process.memory_info()`.
+*   **`dataset_size` Attribute**: The attribute of a `Dataset` object that gives its size on disk in bytes.
+*   **Wes McKinney**: The creator of the Pandas library.
+*   **Rule of Thumb**: A practical guideline based on experience.
+*   **Apache Arrow**: An in-memory data format that makes data exchange between data analytics applications fast and efficient.
+*   **`pyarrow` Library**: The Python library for working with the Apache Arrow format.
+*   **`timeit` Module**: A built-in Python module for measuring the execution time of code snippets.
+*   **`code_snippet`**: The piece of Python code being timed.
+*   **`%%timeit` Magic Function**: A magic function for measuring the execution time of cells in Jupyter notebooks.
+*   **`streaming=True` Argument**: The argument of `load_dataset()` that enables dataset streaming.
+*   **`IterableDataset`**: The object that represents data in streaming mode in the 🤗 Datasets library. Its elements can only be accessed by iterating over it.
+*   **`next(iter(dataset))`**: The Python idiom for fetching the first element of an `IterableDataset`.
+*   **`IterableDataset.map()`**: The method for processing the elements of an `IterableDataset`.
+*   **`AutoTokenizer`**: A class in the Hugging Face Transformers library that automatically loads the appropriate tokenizer from a model name.
+*   **`distilbert-base-uncased`**: The checkpoint identifier for the base, uncased version of the DistilBERT model.
+*   **`input_ids`**: The unique numerical IDs of the tokens produced by the tokenizer.
+*   **`attention_mask`**: A binary mask that tells the model which tokens to attend to and which (padding) tokens to ignore.
+*   **`batched=True`**: An argument of the `map()` method that applies the function to many elements of the dataset at once.
+*   **`batch_size`**: The number of examples processed together in one batch; with `batched=True`, `map()` uses 1,000 by default.
+*   **`IterableDataset.shuffle()`**: The method for shuffling the elements of a streamed dataset. It only shuffles the elements within the specified `buffer_size`.
+*   **`buffer_size`**: The size of the in-memory buffer that `IterableDataset.shuffle()` uses to shuffle elements.
+*   **`seed`**: The starting point of the random number generator, set to make results reproducible.
+*   **`IterableDataset.take()`**: The method for taking the first `n` elements of a streamed dataset.
+*   **`IterableDataset.skip()`**: The method for skipping the first `n` elements of a streamed dataset.
+*   **`Dataset.select()`**: The method for selecting elements of a regular `Dataset` by index.
+*   **Training Dataset**: The portion of a dataset used to train a model.
+*   **Validation Dataset**: The portion of a dataset used to evaluate a model's performance during training.
+*   **`interleave_datasets()` Function**: A function in the Hugging Face Datasets library that converts a list of `IterableDataset` objects into a single `IterableDataset`.
+*   **`itertools` Module**: A built-in Python module that provides useful functions for creating and working with iterators.
+*   **`islice()` Function**: The `itertools` function for slicing the elements of an iterator.
+*   **FreeLaw Subset**: A component of the Pile that contains legal opinions from US courts.
+*   **Common Crawl Corpora**: Large datasets built from web data collected from the internet (e.g., `mc4`, `oscar`).
+*   **Multilingual Dataset**: A dataset that contains multiple languages.
+*   **Sampling**: Selecting a smaller portion from a large dataset.
+*   **NLP Journey**: The process of learning and working in the field of Natural Language Processing (NLP).
+*   **Problem at Hand**: The problem that currently needs to be solved.
diff --git a/chapters/my/chapter5/5.mdx b/chapters/my/chapter5/5.mdx
new file mode 100644
index 000000000..1f0f50b12
--- /dev/null
+++ b/chapters/my/chapter5/5.mdx
@@ -0,0 +1,465 @@
+# Creating your own dataset[[creating-your-own-dataset]]
+
+<CourseFloatingBanner chapter={5}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter5/section5.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter5/section5.ipynb"},
+]} />
+
+Sometimes the dataset that you need to build an NLP application doesn't exist, so you'll need to create it yourself. In this section we'll show you how to create a corpus of [GitHub issues](https://github.com/features/issues/), which are commonly used to track bugs or features in GitHub repositories. This corpus could be used for various purposes, including:
+
+*   Exploring how long it takes to close open issues or pull requests
+*   Training a _multilabel classifier_ that can tag issues with metadata based on the issue's description (e.g., "bug," "enhancement," or "question")
+*   Creating a semantic search engine to find which issues match a user's query
+
+Here we'll focus on creating the corpus, and in the next section we'll tackle the semantic search application. To keep things easy to follow, we'll use the GitHub issues associated with a popular open source project: 🤗 Datasets! Let's take a look at how to get the data and explore the information contained in these issues.
+
+## Getting the data[[getting-the-data]]
+
+You can find all the issues in 🤗 Datasets by navigating to the repository's [Issues tab](https://github.com/huggingface/datasets/issues). As shown in the following screenshot, at the time of writing there were 331 open issues and 668 closed ones.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/datasets-issues.png" alt="The GitHub issues associated with 🤗 Datasets." width="80%"/>
+</div>
+
+If you click on one of these issues you'll find it contains a title, a description, and a set of labels that characterize the issue. An example is shown in the screenshot below.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/datasets-issues-single.png" alt="A typical GitHub issue in the 🤗 Datasets repository." width="80%"/>
+</div>
+
+To download all the repository's issues, we'll use the [GitHub REST API](https://docs.github.com/en/rest) to poll the [`Issues` endpoint](https://docs.github.com/en/rest/reference/issues#list-repository-issues). This endpoint returns a list of JSON objects, with each object containing the title and description as well as the status of the issue and a large amount of other metadata.
+
+A convenient way to download the issues is via the `requests` library, a third-party package that is the de facto standard for making HTTP requests in Python. You can install the library by running:
+
+```python
+!pip install requests
+```
+
+Once the library is installed, you can make GET requests to the `Issues` endpoint by invoking the `requests.get()` function. For example, you can run the following command to retrieve the first issue on the first page:
+
+```py
+import requests
+
+url = "https://api.github.com/repos/huggingface/datasets/issues?page=1&per_page=1"
+response = requests.get(url)
+```
+
+The `response` object contains a lot of useful information about the request, including the HTTP status code:
+
+```py
+response.status_code
+```
+
+```python out
+200
+```
+
+Here a `200` status means the request was successful (you can find a list of possible HTTP status codes [here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)). What we are really interested in, though, is the _payload_, which can be accessed in various formats such as bytes, strings, or JSON. Since we know our issues are in JSON format, let's inspect the payload as follows:
+
+```py
+response.json()
+```
+
+```python out
+[{'url': 'https://api.github.com/repos/huggingface/datasets/issues/2792',
+  'repository_url': 'https://api.github.com/repos/huggingface/datasets',
+  'labels_url': 'https://api.github.com/repos/huggingface/datasets/issues/2792/labels{/name}',
+  'comments_url': 'https://api.github.com/repos/huggingface/datasets/issues/2792/comments',
+  'events_url': 'https://api.github.com/repos/huggingface/datasets/issues/2792/events',
+  'html_url': 'https://github.com/huggingface/datasets/pull/2792',
+  'id': 968650274,
+  'node_id': 'MDExOlB1bGxSZXF1ZXN0NzEwNzUyMjc0',
+  'number': 2792,
+  'title': 'Update GooAQ',
+  'user': {'login': 'bhavitvyamalik',
+   'id': 19718818,
+   'node_id': 'MDQ6VXNlcjE5NzE4ODE4',
+   'avatar_url': 'https://avatars.githubusercontent.com/u/19718818?v=4',
+   'gravatar_id': '',
+   'url': 'https://api.github.com/users/bhavitvyamalik',
+   'html_url': 'https://github.com/bhavitvyamalik',
+   'followers_url': 'https://api.github.com/users/bhavitvyamalik/followers',
+   'following_url': 'https://api.github.com/users/bhavitvyamalik/following{/other_user}',
+   'gists_url': 'https://api.github.com/users/bhavitvyamalik/gists{/gist_id}',
+   'starred_url': 'https://api.github.com/users/bhavitvyamalik/starred{/owner}{/repo}',
+   'subscriptions_url': 'https://api.github.com/users/bhavitvyamalik/subscriptions',
+   'organizations_url': 'https://api.github.com/users/bhavitvyamalik/orgs',
+   'repos_url': 'https://api.github.com/users/bhavitvyamalik/repos',
+   'events_url': 'https://api.github.com/users/bhavitvyamalik/events{/privacy}',
+   'received_events_url': 'https://api.github.com/users/bhavitvyamalik/received_events',
+   'type': 'User',
+   'site_admin': False},
+  'labels': [],
+  'state': 'open',
+  'locked': False,
+  'assignee': None,
+  'assignees': [],
+  'milestone': None,
+  'comments': 1,
+  'created_at': '2021-08-12T11:40:18Z',
+  'updated_at': '2021-08-12T12:31:17Z',
+  'closed_at': None,
+  'author_association': 'CONTRIBUTOR',
+  'active_lock_reason': None,
+  'pull_request': {'url': 'https://api.github.com/repos/huggingface/datasets/pulls/2792',
+   'html_url': 'https://github.com/huggingface/datasets/pull/2792',
+   'diff_url': 'https://github.com/huggingface/datasets/pull/2792.diff',
+   'patch_url': 'https://github.com/huggingface/datasets/pull/2792.patch'},
+  'body': '[GooAQ](https://github.com/allenai/gooaq) dataset was recently updated after splits were added for the same. This PR contains new updated GooAQ with train/val/test splits and updated README as well.',
+  'performed_via_github_app': None}]
+```
+
+Whoa, that's a lot of information! We can see useful fields like `title`, `body`, and `number` that describe the issue, as well as information about the GitHub user who opened the issue.
+
+> [!TIP]
+> ✏️ **Try it out!** Click on a few of the URLs in the JSON payload above to get a feel for what type of information each GitHub issue is linked to.
+
+As described in the GitHub [documentation](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting), unauthenticated requests are limited to 60 requests per hour. Although you can increase the `per_page` query parameter to reduce the number of requests you make, you will still hit the rate limit on any repository that has thousands of issues. So instead, you should follow GitHub's [instructions](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) on creating a _personal access token_ so that you can boost the rate limit to 5,000 requests per hour. Once you have your token, you can include it as part of the request header:
+
+```py
+GITHUB_TOKEN = "xxx"  # Paste your GitHub token here, keeping the quotes
+headers = {"Authorization": f"token {GITHUB_TOKEN}"}
+```
+
+> [!WARNING]
+> ⚠️ Do not share a notebook with your `GITHUB_TOKEN` pasted in it. We recommend you delete the last cell once you have executed it to avoid leaking this information accidentally. Even better, store the token in a *.env* file and use the [`python-dotenv` library](https://github.com/theskumar/python-dotenv) to load it automatically for you as an environment variable.
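+
+As a minimal sketch of the environment-variable approach, the snippet below reads the token with the standard library's `os.environ` instead of hard-coding it. The helper name `build_auth_headers` and its fallback (an empty header dictionary when the variable is unset) are assumptions made for illustration, not part of the course code.
+
+```python
+import os
+
+
+def build_auth_headers(env_var="GITHUB_TOKEN"):
+    """Build request headers from an environment variable (hypothetical helper)."""
+    token = os.environ.get(env_var)
+    if not token:
+        # Without a token the request is unauthenticated and rate limited more strictly
+        return {}
+    return {"Authorization": f"token {token}"}
+```
+
+If you keep the token in a *.env* file, loading that file with `python-dotenv` before calling the helper should populate the variable; assigning `headers = build_auth_headers()` would then stand in for the hard-coded cell.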
+
+Now that we have our access token, let's create a function that can download all the issues from a GitHub repository:
+
+```py
+import time
+import math
+from pathlib import Path
+import pandas as pd
+from tqdm.notebook import tqdm
+
+
+def fetch_issues(
+    owner="huggingface",
+    repo="datasets",
+    num_issues=10_000,
+    rate_limit=5_000,
+    issues_path=Path("."),
+):
+    if not issues_path.is_dir():
+        issues_path.mkdir(exist_ok=True)
+
+    batch = []
+    all_issues = []
+    per_page = 100  # Number of issues to return per page
+    num_pages = math.ceil(num_issues / per_page)
+    base_url = "https://api.github.com/repos"
+
+    # GitHub page numbers start at 1, so iterate from 1 to num_pages inclusive
+    for page in tqdm(range(1, num_pages + 1)):
+        # Query with state=all to get both open and closed issues
+        query = f"issues?page={page}&per_page={per_page}&state=all"
+        issues = requests.get(f"{base_url}/{owner}/{repo}/{query}", headers=headers)
+        batch.extend(issues.json())
+
+        if len(batch) > rate_limit and len(all_issues) < num_issues:
+            all_issues.extend(batch)
+            batch = []  # Flush the batch for the next time period
+            print("Reached GitHub rate limit. Sleeping for one hour ...")
+            time.sleep(60 * 60 + 1)
+
+    all_issues.extend(batch)
+    df = pd.DataFrame.from_records(all_issues)
+    df.to_json(f"{issues_path}/{repo}-issues.jsonl", orient="records", lines=True)
+    print(
+        f"Downloaded all the issues for {repo}! Dataset stored at {issues_path}/{repo}-issues.jsonl"
+    )
+```
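+
+The function above pauses for a fixed hour. As a hedged alternative sketch, a response's rate-limit headers can be inspected to work out how long to wait. The header names `X-RateLimit-Remaining` and `X-RateLimit-Reset` are the ones GitHub documents for its REST API; the helper `seconds_until_reset` is hypothetical and is not used by `fetch_issues()`.
+
+```python
+import time
+
+
+def seconds_until_reset(response_headers, now=None):
+    """Return how long to wait before the rate limit window resets (hypothetical helper)."""
+    remaining = int(response_headers.get("X-RateLimit-Remaining", 1))
+    if remaining > 0:
+        return 0  # Requests are still available in the current window
+    now = time.time() if now is None else now
+    # The reset value is a UTC timestamp in seconds since the epoch
+    reset_at = int(response_headers.get("X-RateLimit-Reset", now))
+    # Never return a negative wait if the reset time has already passed
+    return max(0, reset_at - int(now))
+```
+
+With a `requests` response, the call would be `seconds_until_reset(response.headers)`, and the result could be passed to `time.sleep()`.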
+
+Now when we call `fetch_issues()` it will download all the issues in batches to avoid exceeding GitHub's limit on the number of requests per hour. The result will be stored in a *repository_name-issues.jsonl* file, where each line is a JSON object that represents an issue. Let's use this function to grab all the issues from 🤗 Datasets:
+
+```py
+# Depending on your internet connection, this can take several minutes to run...
+fetch_issues()
+```
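+
+To make the JSON Lines layout described above concrete, here is a small sketch that writes and reads such a file using only the standard library. The records and the file name `example-issues.jsonl` are made up for illustration; the real file produced by `fetch_issues()` has many more fields per line.
+
+```python
+import json
+import tempfile
+from pathlib import Path
+
+# Made-up records standing in for downloaded issues
+records = [{"number": 1, "title": "First"}, {"number": 2, "title": "Second"}]
+
+with tempfile.TemporaryDirectory() as tmp_dir:
+    path = Path(tmp_dir) / "example-issues.jsonl"
+    # Write one JSON object per line
+    path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
+    # Read the file back line by line, skipping blank lines
+    with path.open(encoding="utf-8") as f:
+        loaded = [json.loads(line) for line in f if line.strip()]
+
+print(loaded[1]["title"])
+```
+
+Because every line is an independent JSON object, a file like this can be processed one record at a time without loading everything into memory.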
+
+Once the issues are downloaded we can load them locally using our newfound skills from [section 2](/course/chapter5/2):
+
+```py
+from datasets import load_dataset
+
+issues_dataset = load_dataset("json", data_files="datasets-issues.jsonl", split="train")
+issues_dataset
+```
+
+```python out
+Dataset({
+    features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'pull_request', 'body', 'timeline_url', 'performed_via_github_app'],
+    num_rows: 3019
+})
+```
+
+Great, we've created our first dataset from scratch! But why are there several thousand issues when the [Issues tab](https://github.com/huggingface/datasets/issues) of the 🤗 Datasets repository only shows around 1,000 issues in total 🤔? As described in the GitHub [documentation](https://docs.github.com/en/rest/reference/issues#list-issues-assigned-to-the-authenticated-user), that's because we've downloaded all the pull requests as well:
+
+> GitHub's REST API v3 considers every pull request an issue, but not every issue is a pull request. For this reason, "Issues" endpoints may return both issues and pull requests in the response. You can identify pull requests by the `pull_request` key. Be aware that the `id` of a pull request returned from "Issues" endpoints will be an issue id.
+
+Since the contents of issues and pull requests are quite different, let's do some minor preprocessing to enable us to distinguish between them.
+
+## Cleaning up the data[[cleaning-up-the-data]]
+
+The above snippet from GitHub's documentation tells us that the `pull_request` column can be used to differentiate between issues and pull requests. Let's look at a random sample to see what the difference is. As we did in [section 3](/course/chapter5/3), we'll chain `Dataset.shuffle()` and `Dataset.select()` to create a random sample and then zip the `html_url` and `pull_request` columns so we can compare the various URLs:
+
+```py
+sample = issues_dataset.shuffle(seed=666).select(range(3))
+
+# Print out the URL and pull request entries
+for url, pr in zip(sample["html_url"], sample["pull_request"]):
+    print(f">> URL: {url}")
+    print(f">> Pull request: {pr}\n")
+```
+
+```python out
+>> URL: https://github.com/huggingface/datasets/pull/850
+>> Pull request: {'url': 'https://api.github.com/repos/huggingface/datasets/pulls/850', 'html_url': 'https://github.com/huggingface/datasets/pull/850', 'diff_url': 'https://github.com/huggingface/datasets/pull/850.diff', 'patch_url': 'https://github.com/huggingface/datasets/pull/850.patch'}
+
+>> URL: https://github.com/huggingface/datasets/issues/2773
+>> Pull request: None
+
+>> URL: https://github.com/huggingface/datasets/pull/783
+>> Pull request: {'url': 'https://api.github.com/repos/huggingface/datasets/pulls/783', 'html_url': 'https://github.com/huggingface/datasets/pull/783', 'diff_url': 'https://github.com/huggingface/datasets/pull/783.diff', 'patch_url': 'https://github.com/huggingface/datasets/pull/783.patch'}
+```
+
+Here we can see that each pull request is associated with various URLs, while ordinary issues have a `None` entry. We can use this distinction to create a new `is_pull_request` column that checks whether the `pull_request` field is `None` or not:
+
+```py
+issues_dataset = issues_dataset.map(
+    lambda x: {"is_pull_request": x["pull_request"] is not None}
+)
+```
+
+> [!TIP]
+> ✏️ **Try it out!** Calculate the average time it takes to close issues in 🤗 Datasets. You may find the `Dataset.filter()` function useful to filter out the pull requests and open issues, and you can use the `Dataset.set_format()` function to convert the dataset to a `DataFrame` so you can easily manipulate the `created_at` and `closed_at` timestamps. For bonus points, calculate the average time it takes to close pull requests.
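+
+As a starting point for the timestamp arithmetic, here is a minimal sketch that uses only the standard library rather than a `DataFrame`. The three records are made up; they only mimic the `created_at` and `closed_at` format seen in the payload earlier, and the helper `parse_timestamp` is hypothetical.
+
+```python
+from datetime import datetime, timedelta
+
+# Made-up records using the same timestamp format as the Issues endpoint
+records = [
+    {"created_at": "2021-08-01T10:00:00Z", "closed_at": "2021-08-02T10:00:00Z"},
+    {"created_at": "2021-08-03T08:00:00Z", "closed_at": "2021-08-03T20:00:00Z"},
+    {"created_at": "2021-08-05T09:00:00Z", "closed_at": None},  # still open
+]
+
+
+def parse_timestamp(value):
+    """Parse a timestamp such as '2021-08-12T11:40:18Z' into a datetime."""
+    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
+
+
+# Only closed items have a duration
+durations = [
+    parse_timestamp(r["closed_at"]) - parse_timestamp(r["created_at"])
+    for r in records
+    if r["closed_at"] is not None
+]
+mean_time_to_close = sum(durations, timedelta()) / len(durations)
+print(mean_time_to_close)
+```
+
+The same subtraction works on datetime columns of a `DataFrame`, which is the route the exercise suggests for the full dataset.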
+
+Although we could clean up the dataset further by dropping or renaming some columns, it is generally good practice to keep the dataset as "raw" as possible at this stage so that it can easily be used in multiple applications.
+
+Before we push our dataset to the Hugging Face Hub, let's deal with one thing that's missing from it: the comments associated with each issue and pull request. We'll add them next with the GitHub REST API.
+
+## Augmenting the dataset[[augmenting-the-dataset]]
+
+As shown in the following screenshot, the comments associated with an issue or pull request provide a rich source of information, especially if we're interested in building a search engine to answer user queries about the library.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/datasets-issues-comment.png" alt="Comments associated with an issue about 🤗 Datasets." width="80%"/>
+</div>
+
+The GitHub REST API provides a [`Comments` endpoint](https://docs.github.com/en/rest/reference/issues#list-issue-comments) that returns all the comments associated with an issue number. Let's test the endpoint to see what it returns:
+
+```py
+issue_number = 2792
+url = f"https://api.github.com/repos/huggingface/datasets/issues/{issue_number}/comments"
+response = requests.get(url, headers=headers)
+response.json()
+```
+
+```python out
+[{'url': 'https://api.github.com/repos/huggingface/datasets/issues/comments/897594128',
+  'html_url': 'https://github.com/huggingface/datasets/pull/2792#issuecomment-897594128',
+  'issue_url': 'https://api.github.com/repos/huggingface/datasets/issues/2792',
+  'id': 897594128,
+  'node_id': 'IC_kwDODunzps41gDMQ',
+  'user': {'login': 'bhavitvyamalik',
+   'id': 19718818,
+   'node_id': 'MDQ6VXNlcjE5NzE4ODE4',
+   'avatar_url': 'https://avatars.githubusercontent.com/u/19718818?v=4',
+   'gravatar_id': '',
+   'url': 'https://api.github.com/users/bhavitvyamalik',
+   'html_url': 'https://github.com/bhavitvyamalik',
+   'followers_url': 'https://api.github.com/users/bhavitvyamalik/followers',
+   'following_url': 'https://api.github.com/users/bhavitvyamalik/following{/other_user}',
+   'gists_url': 'https://api.github.com/users/bhavitvyamalik/gists{/gist_id}',
+   'starred_url': 'https://api.github.com/users/bhavitvyamalik/starred{/owner}{/repo}',
+   'subscriptions_url': 'https://api.github.com/users/bhavitvyamalik/subscriptions',
+   'organizations_url': 'https://api.github.com/users/bhavitvyamalik/orgs',
+   'repos_url': 'https://api.github.com/users/bhavitvyamalik/repos',
+   'events_url': 'https://api.github.com/users/bhavitvyamalik/events{/privacy}',
+   'received_events_url': 'https://api.github.com/users/bhavitvyamalik/received_events',
+   'type': 'User',
+   'site_admin': False},
+  'created_at': '2021-08-12T12:21:52Z',
+  'updated_at': '2021-08-12T12:31:17Z',
+  'author_association': 'CONTRIBUTOR',
+  'body': "@albertvillanova my tests are failing here:\r\n```\r\ndataset_name = 'gooaq'\r\n\r\n    def test_load_dataset(self, dataset_name):\r\n        configs = self.dataset_tester.load_all_configs(dataset_name, is_local=True)[:1]\r\n>       self.dataset_tester.check_load_dataset(dataset_name, configs, is_local=True, use_local_dummy_data=True)\r\n\r\ntests/test_dataset_common.py:234: \r\n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \r\ntests/test_dataset_common.py:187: in check_load_dataset\r\n    self.parent.assertTrue(len(dataset[split]) > 0)\r\nE   AssertionError: False is not true\r\n```\r\nWhen I try loading dataset on local machine it works fine. Any suggestions on how can I avoid this error?",
+  'performed_via_github_app': None}]
+```
+
+We can see that the comment is stored in the `body` field, so let's write a simple function that returns all the comments associated with an issue by picking out the `body` contents for each element in `response.json()`:
+
+```py
+def get_comments(issue_number):
+    url = f"https://api.github.com/repos/huggingface/datasets/issues/{issue_number}/comments"
+    response = requests.get(url, headers=headers)
+    return [r["body"] for r in response.json()]
+
+
+# Test that our function works as expected
+get_comments(2792)
+```
+
+```python out
+["@albertvillanova my tests are failing here:\r\n```\r\ndataset_name = 'gooaq'\r\n\r\n    def test_load_dataset(self, dataset_name):\r\n        configs = self.dataset_tester.load_all_configs(dataset_name, is_local=True)[:1]\r\n>       self.dataset_tester.check_load_dataset(dataset_name, configs, is_local=True, use_local_dummy_data=True)\r\n\r\ntests/test_dataset_common.py:234: \r\n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \r\ntests/test_dataset_common.py:187: in check_load_dataset\r\n    self.parent.assertTrue(len(dataset[split]) > 0)\r\nE   AssertionError: False is not true\r\n```\r\nWhen I try loading dataset on local machine it works fine. Any suggestions on how can I avoid this error?"]
+```
+
+This looks good, so let's use `Dataset.map()` to add a new `comments` column to each issue in our dataset:
+
+```py
+# Depending on your internet connection, this can take a few minutes...
+issues_with_comments_dataset = issues_dataset.map(
+    lambda x: {"comments": get_comments(x["number"])}
+)
+```
+
+The final step is to push our dataset to the Hub. Let's take a look at how we can do that.
+
+## Uploading the dataset to the Hugging Face Hub[[uploading-the-dataset-to-the-hugging-face-hub]]
+
+<Youtube id="HaN6qCr_Afc"/>
+
+Now that we have our augmented dataset, it's time to push it to the Hub so we can share it with the community! Uploading a dataset is very simple: just like models and tokenizers from 🤗 Transformers, we can use a `push_to_hub()` method to push a dataset. To do that we need an authentication token, which can be obtained by first logging into the Hugging Face Hub with the `notebook_login()` function:
+
+```py
+from huggingface_hub import notebook_login
+
+notebook_login()
+```
+
+This will create a widget where you can enter your username and password, and an API token will be saved in *~/.huggingface/token*. If you're running the code in a terminal, you can log in via the CLI instead:
+
+```bash
+huggingface-cli login
+```
+
+Once we've done this, we can upload our dataset by running:
+
+```py
+issues_with_comments_dataset.push_to_hub("github-issues")
+```
+
+From here, anyone can download the dataset by simply providing `load_dataset()` with the repository ID as the `path` argument:
+
+```py
+remote_dataset = load_dataset("lewtun/github-issues", split="train")
+remote_dataset
+```
+
+```python out
+Dataset({
+    features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'pull_request', 'body', 'performed_via_github_app', 'is_pull_request'],
+    num_rows: 2855
+})
+```
+
+Cool, we've pushed our dataset to the Hub and it's available for others to use! There's just one important thing left to do: adding a _dataset card_ that explains how the corpus was created and provides other useful information for the community.
+
+> [!TIP]
+> 💡 You can also upload a dataset to the Hugging Face Hub directly from the terminal by using `huggingface-cli` and a bit of Git magic. See the [🤗 Datasets guide](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) for details on how to do this.
+
+## Creating a dataset card[[creating-a-dataset-card]]
+
+Well-documented datasets are more likely to be useful to others (including your future self!), as they provide the context that lets users decide whether the dataset is relevant to their task and evaluate any potential biases in, or risks associated with, using the dataset.
+
+On the Hugging Face Hub, this information is stored in each dataset repository's *README.md* file. There are two main steps you should take before creating this file:
+
+1. Use the [`datasets-tagging` application](https://huggingface.co/datasets/tagging/) to create metadata tags in YAML format. These tags are used for a variety of search features on the Hugging Face Hub and ensure your dataset can be easily found by members of the community. Since we have created a custom dataset here, you'll need to clone the `datasets-tagging` repository and run the application locally. Here's what the interface looks like:
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/datasets-tagger.png" alt="The `datasets-tagging` interface." width="80%"/>
+</div>
+
+2. Read the [🤗 Datasets guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) on creating informative dataset cards and use it as a template.
+
+You can create the *README.md* file directly on the Hub, and you can find a template dataset card in the `lewtun/github-issues` dataset repository. A screenshot of the filled-out dataset card is shown below.
+
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
+</div>
+
+> [!TIP]
+> ✏️ **Try it out!** Use the `datasets-tagging` application and the [🤗 Datasets guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) to complete the *README.md* file for your GitHub issues dataset.
+
+That's it! We've seen in this section that creating a good dataset can be quite involved, but fortunately uploading it and sharing it with the community is not. In the next section we'll use our new dataset to create a semantic search engine with 🤗 Datasets that can match questions to the most relevant issues and comments.
+
+> [!TIP]
+> ✏️ **Try it out!** Go through the steps we took in this section to create a dataset of GitHub issues for your favorite open source library (pick something other than 🤗 Datasets). For bonus points, fine-tune a multilabel classifier to predict the tags present in the `labels` field.
+
+## Glossary
+
+*   **NLP Application**: An application that uses Natural Language Processing (NLP) techniques to perform tasks involving human language.
+*   **Corpus**: A large collection of text gathered for language-related study.
+*   **GitHub Issues**: Records in GitHub repositories used to track bugs, request features, or hold discussions about the project.
+*   **Pull Requests**: On GitHub, a developer's proposal of changes to a project's code together with a request to merge them into the main codebase.
+*   **Multilabel Classifier**: A machine learning model that can assign several labels from a fixed set to one input at the same time.
+*   **Metadata**: Data about data.
+*   **Semantic Search Engine**: A search engine that retrieves results based on meaning.
+*   **Query**: The search request sent to a search engine.
+*   **🤗 Datasets**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **Open Source Project**: A software project whose source code the public is free to use, modify, and share.
+*   **Repository**: The place where a project's code is collected, with its files tracked and managed by the Git version control system.
+*   **Issues Tab**: The tab in a GitHub repository that lists its issues.
+*   **GitHub REST API**: A web service that allows programmatic access to and management of data on the GitHub platform.
+*   **`Issues` Endpoint**: The GitHub REST API endpoint used to retrieve issues and pull requests.
+*   **JSON Objects**: Data structures expressed in JavaScript Object Notation (JSON) format.
+*   **HTTP Requests**: Requests for resources from a web server made with the Hypertext Transfer Protocol (HTTP).
+*   **`requests` Library**: A popular Python library for making HTTP requests.
+*   **`pip install requests`**: The command that installs the `requests` library with the Python package manager `pip`.
+*   **GET Request**: The HTTP request method used to retrieve information from a web server.
+*   **`requests.get()` Function**: The function in the `requests` library that sends a GET request.
+*   **`response` Object**: The object that represents the result of an HTTP request.
+*   **HTTP Status Code**: A numeric code that describes the outcome of an HTTP request (for example, 200 = OK, 404 = Not Found).
+*   **Payload**: The content (data) of an HTTP response.
+*   **Bytes**: Units of digital data, usually 8 bits each; here, the raw binary form of the response payload.
+*   **Strings**: Sequences of characters that represent text.
+*   **`response.json()`**: A method that parses the payload of an HTTP response as JSON and returns it as a Python dictionary or list.
+*   **`title` Field**: The title of an issue or pull request.
+*   **`body` Field**: The main description of an issue or pull request, or the content of a comment.
+*   **`number` Field**: The number that identifies an issue or pull request within its repository.
+*   **Rate Limiting**: Restricting the number of requests that can be made within a given period.
+*   **`per_page` Query Parameter**: The parameter that sets how many items an API request returns per page.
+*   **Personal Access Token**: A security token used to authenticate with the GitHub API. Authenticated requests get a higher rate limit.
+*   **Request Header**: Key-value pairs used to send additional information with an HTTP request.
+*   **`Authorization` Header**: The request header used for API authentication.
+*   **`.env` File**: A file that stores environment variables.
+*   **`python-dotenv` Library**: A library that loads environment variables from a `.env` file into a Python application.
+*   **Environment Variable**: A variable set in the operating system that programs can read to obtain information.
+*   **`pathlib`**: The Python module for handling file system paths in an object-oriented way.
+*   **`pandas`**: An open-source Python library for data analysis and manipulation.
+*   **`tqdm.notebook.tqdm`**: The function from the `tqdm` library that displays a progress bar in notebook environments.
+*   **`fetch_issues()` Function**: The function we wrote to download issues from the GitHub API.
+*   **Batches**: Groups of data items that are processed together.
+*   **`DataFrame.from_records()`**: A method that creates a pandas DataFrame from a list of dictionaries.
+*   **`to_json()`**: A method that saves a DataFrame to a file in JSON format.
+*   **`orient="records"`**: A JSON export orientation in which each row becomes one JSON object.
+*   **`lines=True`**: The `to_json()` argument that exports in JSON Lines format.
+*   **`jsonl` (JSON Lines) File**: A text file format that stores one JSON object per line.
+*   **`split="train"`**: Tells `load_dataset()` to load the training split of the dataset.
+*   **`pull_request` Key**: The key in a GitHub API response that distinguishes whether an item is a pull request.
+*   **`Dataset.shuffle()`**: A method that randomly shuffles the rows of a dataset.
+*   **`Dataset.select()`**: A method that selects the rows at the given indices from a dataset.
+*   **`zip()`**: A built-in Python function that pairs up the elements of several iterables.
+*   **`html_url` Column**: The web URL of an issue or pull request.
+*   **`None` Entry**: The Python object that represents a missing or unspecified value.
+*   **`is_pull_request` Column**: A new boolean (True/False) column that indicates whether an item is a pull request.
+*   **`Dataset.filter()`**: A method that keeps only the rows of a dataset that satisfy the given criteria.
+*   **`Dataset.set_format()`**: A method that changes the output format of a dataset (for example, "pandas" or "torch").
+*   **`created_at` / `closed_at` Timestamps**: The date and time at which an issue or pull request was created or closed.
+*   **"Raw" Dataset**: A dataset that has not yet undergone any preprocessing or cleaning.
+*   **Augmenting the Dataset**: Adding extra data or information to a dataset.
+*   **`Comments` Endpoint**: The GitHub REST API endpoint used to retrieve the comments associated with an issue or pull request.
+*   **`notebook_login()` Function (from `huggingface_hub`)**: A function for authenticating with the Hugging Face Hub from Jupyter notebooks.
+*   **API Token**: A unique key used to access an API.
+*   **`~/.huggingface/token`**: The file path where the Hugging Face authentication token is stored.
+*   **`huggingface-cli login`**: The command for logging in to the Hugging Face Hub from the command line interface (CLI).
+*   **Repository ID**: The unique ID of a repository on the Hugging Face Hub (for example, `lewtun/github-issues`).
+*   **Dataset Card**: The informational page (the README.md file) that accompanies each dataset on the Hugging Face Hub. It covers how the dataset was created, its limitations and biases, and how to use it.
+*   **Context**: Background information that is important for understanding a piece of information or a situation.
+*   **Potential Biases**: Biases that may be present in a dataset.
+*   **Risks**: Possible harms associated with using a dataset.
+*   **`datasets-tagging` Application**: An application from Hugging Face that helps create metadata tags for a dataset.
+*   **YAML Format**: A human-readable data serialization standard that is common in configuration files.
+*   **Metadata Tags**: Keywords or labels used to make a dataset searchable and to categorize it.
+*   **Clone**: Downloading a complete copy of a Git repository to a local machine.
+*   **Local Machine**: The computer you are working on.
\ No newline at end of file
diff --git a/chapters/my/chapter5/6.mdx b/chapters/my/chapter5/6.mdx
new file mode 100644
index 000000000..c3a81c424
--- /dev/null
+++ b/chapters/my/chapter5/6.mdx
@@ -0,0 +1,589 @@
+<FrameworkSwitchCourse {fw} />
+
+# Semantic search with FAISS[[semantic-search-with-faiss]]
+
+{#if fw === 'pt'}
+
+<CourseFloatingBanner chapter={5}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter5/section6_pt.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter5/section6_pt.ipynb"},
+]} />
+
+{:else}
+
+<CourseFloatingBanner chapter={5}
+  classNames="absolute z-10 right-0 top-0"
+  notebooks={[
+    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter5/section6_tf.ipynb"},
+    {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter5/section6_tf.ipynb"},
+]} />
+
+{/if}
+
+In [section 5](/course/chapter5/5), we created a dataset of GitHub issues and comments from the 🤗 Datasets repository. In this section we'll use this information to build a search engine that can help us find answers to our most pressing questions about the library.
+
+<Youtube id="OATCgQtNX2o"/>
+
+## Using embeddings for semantic search[[using-embeddings-for-semantic-search]]
+
+As we saw in [Chapter 1](/course/chapter1), Transformer-based language models represent each token in a span of text as an _embedding vector_. The individual embeddings can also be "pooled" to create a vector representation for whole sentences, paragraphs, or (in some cases) documents. These embeddings can then be used to find similar documents in the corpus by computing the dot-product similarity (or some other similarity metric) between each embedding and the query, and returning the documents with the highest similarity.
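+
+As a minimal sketch of the ranking step described above, the snippet below scores a few documents against a query by dot product and sorts them. The three-dimensional vectors and the names `corpus`, `query`, and `dot` are made up for illustration; real embeddings come from a model and have hundreds of dimensions.
+
+```py
+# Toy 3-dimensional "embeddings" (hypothetical values, not model outputs).
+corpus = {
+    "doc_a": [0.9, 0.1, 0.0],
+    "doc_b": [0.1, 0.8, 0.3],
+    "doc_c": [0.7, 0.2, 0.1],
+}
+query = [1.0, 0.0, 0.0]
+
+
+def dot(u, v):
+    """Dot product of two equal-length vectors."""
+    return sum(a * b for a, b in zip(u, v))
+
+
+# Score every document against the query, then sort from most to least similar.
+ranked = sorted(corpus, key=lambda name: dot(query, corpus[name]), reverse=True)
+print(ranked)
+```
+
+Here `doc_a` ranks first because its vector points in almost the same direction as the query.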
+
+In this section we'll use embeddings to develop a semantic search engine. These search engines offer several advantages over conventional approaches that are based on matching keywords in a query with the documents.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/semantic-search.svg" alt="Semantic search."/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/semantic-search-dark.svg" alt="Semantic search."/>
+</div>
+
+## Loading and preparing the dataset[[loading-and-preparing-the-dataset]]
+
+The first thing we need to do is download our dataset of GitHub issues, so let's use the `load_dataset()` function as usual:
+
+```py
+from datasets import load_dataset
+
+issues_dataset = load_dataset("lewtun/github-issues", split="train")
+issues_dataset
+```
+
+```python out
+Dataset({
+    features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'pull_request', 'body', 'performed_via_github_app', 'is_pull_request'],
+    num_rows: 2855
+})
+```
+
+Here we've specified the default `train` split in `load_dataset()`, so it returns a `Dataset` instead of a `DatasetDict`. The first order of business is to filter out the pull requests, as these are rarely used for answering user queries and will introduce noise in our search engine. As should be familiar by now, we can use the `Dataset.filter()` function to exclude these rows from our dataset. While we're at it, let's also filter out rows with no comments, since these provide no answers to user queries:
+
+```py
+issues_dataset = issues_dataset.filter(
+    lambda x: (x["is_pull_request"] == False and len(x["comments"]) > 0)
+)
+issues_dataset
+```
+
+```python out
+Dataset({
+    features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'pull_request', 'body', 'performed_via_github_app', 'is_pull_request'],
+    num_rows: 771
+})
+```
+
+We can see that there are a lot of columns in our dataset, most of which we don't need to build our search engine. From a search perspective, the most informative columns are `title`, `body`, and `comments`, while `html_url` provides us with a link back to the source issue. Let's use the `Dataset.remove_columns()` function to drop the rest:
+
+```py
+columns = issues_dataset.column_names
+columns_to_keep = ["title", "body", "html_url", "comments"]
+columns_to_remove = set(columns_to_keep).symmetric_difference(columns)
+issues_dataset = issues_dataset.remove_columns(columns_to_remove)
+issues_dataset
+```
+
+```python out
+Dataset({
+    features: ['html_url', 'title', 'comments', 'body'],
+    num_rows: 771
+})
+```
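+
+The column selection above relies on `symmetric_difference()`. The sketch below, which uses a shortened, made-up column list, suggests why that works here: the symmetric difference equals the plain set difference only because every name in `columns_to_keep` is also present in `columns`.
+
+```py
+# Shortened, hypothetical column list for illustration.
+columns = ["url", "title", "body", "html_url", "comments", "state"]
+columns_to_keep = ["title", "body", "html_url", "comments"]
+
+# Symmetric difference: names that appear in exactly one of the two sets.
+sym_diff = set(columns_to_keep).symmetric_difference(columns)
+# Plain difference: names in `columns` that are not in `columns_to_keep`.
+plain_diff = set(columns) - set(columns_to_keep)
+
+print(sorted(sym_diff))
+print(sym_diff == plain_diff)
+```
+
+If `columns_to_keep` contained a name that is not a column of the dataset, the symmetric difference would include that name as well, so the plain difference is arguably the safer choice in general.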
+
+To create our embeddings we'll augment each comment with the issue's title and body, since these fields often include useful contextual information. Because our `comments` column is currently a list of comments for each issue, we need to "explode" the column so that each row consists of an `(html_url, title, body, comment)` tuple. In Pandas we can do this with the [`DataFrame.explode()` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html), which creates a new row for each element in a list-like column, while replicating all the other column values. To see this in action, let's first switch to the Pandas `DataFrame` format:
+
+```py
+issues_dataset.set_format("pandas")
+df = issues_dataset[:]
+```
+
+If we inspect the first row in this `DataFrame`, we can see there are four comments associated with this issue:
+
+```py
+df["comments"][0].tolist()
+```
+
+```python out
+['the bug code locate in ：\r\n    if data_args.task_name is not None:\r\n        # Downloading and loading a dataset from the hub.\r\n        datasets = load_dataset("glue", data_args.task_name, cache_dir=model_args.cache_dir)',
+ 'Hi @jinec,\r\n\r\nFrom time to time we get this kind of `ConnectionError` coming from the github.com website: https://raw.githubusercontent.com\r\n\r\nNormally, it should work if you wait a little and then retry.\r\n\r\nCould you please confirm if the problem persists?',
+ 'cannot connect，even by Web browser，please check that  there is some  problems。',
+ 'I can access https://raw.githubusercontent.com/huggingface/datasets/1.7.0/datasets/glue/glue.py without problem...']
+```
+
+When we explode `df`, we expect to get one row for each of these comments. Let's check if that's the case:
+
+```py
+comments_df = df.explode("comments", ignore_index=True)
+comments_df.head(4)
+```
+
+<table border="1" class="dataframe" style="table-layout: fixed; word-wrap:break-word; width: 100%;">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>html_url</th>
+      <th>title</th>
+      <th>comments</th>
+      <th>body</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>https://github.com/huggingface/datasets/issues/2787</td>
+      <td>ConnectionError: Couldn't reach https://raw.githubusercontent.com</td>
+      <td>the bug code locate in ：\r\n    if data_args.task_name is not None...</td>
+      <td>Hello,\r\nI am trying to run run_glue.py and it gives me this error...</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>https://github.com/huggingface/datasets/issues/2787</td>
+      <td>ConnectionError: Couldn't reach https://raw.githubusercontent.com</td>
+      <td>Hi @jinec,\r\n\r\nFrom time to time we get this kind of `ConnectionError` coming from the github.com website: https://raw.githubusercontent.com...</td>
+      <td>Hello,\r\nI am trying to run run_glue.py and it gives me this error...</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>https://github.com/huggingface/datasets/issues/2787</td>
+      <td>ConnectionError: Couldn't reach https://raw.githubusercontent.com</td>
+      <td>cannot connect，even by Web browser，please check that  there is some  problems。</td>
+      <td>Hello,\r\nI am trying to run run_glue.py and it gives me this error...</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>https://github.com/huggingface/datasets/issues/2787</td>
+      <td>ConnectionError: Couldn't reach https://raw.githubusercontent.com</td>
+      <td>I can access https://raw.githubusercontent.com/huggingface/datasets/1.7.0/datasets/glue/glue.py without problem...</td>
+      <td>Hello,\r\nI am trying to run run_glue.py and it gives me this error...</td>
+    </tr>
+  </tbody>
+</table>
+
+Great, we can see the rows have been replicated, with the `comments` column containing the individual comments. Now that we're finished with Pandas, we can quickly switch back to a `Dataset` by loading the `DataFrame` in memory:
+
+```py
+from datasets import Dataset
+
+comments_dataset = Dataset.from_pandas(comments_df)
+comments_dataset
+```
+
+```python out
+Dataset({
+    features: ['html_url', 'title', 'comments', 'body'],
+    num_rows: 2842
+})
+```
+
+Okay, this has given us a few thousand comments to work with!
+
+> [!TIP]
+> ✏️ **Try it out!** See if you can use `Dataset.map()` to explode the `comments` column of `issues_dataset` _without_ resorting to Pandas. This is a little tricky; you might find the ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) section of the 🤗 Datasets documentation useful for this task.
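+
+One possible starting point for this exercise is sketched below in plain Python. The function name `explode_comments` and the toy batch are hypothetical; the sketch only assumes that a batched mapping function receives a dictionary of column lists and may return a different number of rows than it was given.
+
+```py
+def explode_comments(batch):
+    """Return one output row per comment, repeating the other column values."""
+    exploded = {key: [] for key in batch}
+    for i, comments in enumerate(batch["comments"]):
+        for comment in comments:
+            for key in batch:
+                value = comment if key == "comments" else batch[key][i]
+                exploded[key].append(value)
+    return exploded
+
+
+# A toy batch in the column-oriented layout described above.
+toy_batch = {
+    "title": ["Issue A", "Issue B"],
+    "comments": [["first", "second"], ["third"]],
+}
+print(explode_comments(toy_batch))
+```
+
+With 🤗 Datasets, a function like this would be passed to `Dataset.map()` with `batched=True`, after switching the dataset back from the Pandas format (for example with `Dataset.reset_format()`). That last step is not verified here, so treat it as a hint rather than a finished solution.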
+
+Now that we have one comment per row, let's create a new `comment_length` column that contains the number of words per comment:
+
+```py
+comments_dataset = comments_dataset.map(
+    lambda x: {"comment_length": len(x["comments"].split())}
+)
+```
+
+We can use this new column to filter out short comments, which typically include things like "cc @lewtun" or "Thanks!" that are not relevant for our search engine. There's no precise number to select for the filter, but around 15 words seems like a good start:
+
+```py
+comments_dataset = comments_dataset.filter(lambda x: x["comment_length"] > 15)
+comments_dataset
+```
+
+```python out
+Dataset({
+    features: ['html_url', 'title', 'comments', 'body', 'comment_length'],
+    num_rows: 2098
+})
+```
+
+Having cleaned up our dataset a bit, let's concatenate the issue title, description, and comments together in a new `text` column. As usual, we'll write a simple function that we can pass to `Dataset.map()`:
+
+```py
+def concatenate_text(examples):
+    return {
+        "text": examples["title"]
+        + " \n "
+        + examples["body"]
+        + " \n "
+        + examples["comments"]
+    }
+
+
+comments_dataset = comments_dataset.map(concatenate_text)
+```
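+
+The concatenation above uses `+`, which raises a `TypeError` in Python if any of the three fields is `None`. Whether this dataset contains such rows is not checked here; the sketch below shows a hypothetical variant, `concatenate_text_safe`, that treats missing fields as empty strings.
+
+```py
+def concatenate_text_safe(example):
+    """Join title, body, and comment, treating missing (None) fields as empty."""
+    parts = [example["title"], example["body"], example["comments"]]
+    return {"text": " \n ".join(part or "" for part in parts)}
+
+
+# Hypothetical row with a missing body.
+toy_example = {"title": "Load offline", "body": None, "comments": "Use save_to_disk."}
+print(concatenate_text_safe(toy_example))
+```
+
+The separator is the same `" \n "` used in `concatenate_text()`, so rows with all three fields present produce identical text.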
+
+We're finally ready to create some embeddings! Let's take a look.
+
+## Creating text embeddings[[creating-text-embeddings]]
+
+We saw in [Chapter 2](/course/chapter2) that we can obtain token embeddings by using the `AutoModel` class. All we need to do is pick a suitable checkpoint to load the model from. Fortunately, there's a library called `sentence-transformers` that is dedicated to creating embeddings. As described in the library's [documentation](https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search), our use case is an example of _asymmetric semantic search_ because we have a short query whose answer we'd like to find in a longer document, like an issue comment. The handy [model overview table](https://www.sbert.net/docs/pretrained_models.html#model-overview) in the documentation indicates that the `multi-qa-mpnet-base-dot-v1` checkpoint has the best performance for semantic search, so we'll use that for our application. We'll also load the tokenizer using the same checkpoint:
+
+{#if fw === 'pt'}
+
+```py
+from transformers import AutoTokenizer, AutoModel
+
+model_ckpt = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
+tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
+model = AutoModel.from_pretrained(model_ckpt)
+```
+
+To speed up the embedding process, it helps to place the model and inputs on a GPU device, so let's do that now:
+
+```py
+import torch
+
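+# Use the GPU when one is available; otherwise fall back to the CPU.
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")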
+model.to(device)
+```
+
+{:else}
+
+```py
+from transformers import AutoTokenizer, TFAutoModel
+
+model_ckpt = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
+tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
+model = TFAutoModel.from_pretrained(model_ckpt, from_pt=True)
+```
+
+Note that we've set `from_pt=True` as an argument of the `from_pretrained()` method. That's because the `multi-qa-mpnet-base-dot-v1` checkpoint only has PyTorch weights, so setting `from_pt=True` will automatically convert them to the TensorFlow format for us. As you can see, it is very simple to switch between frameworks in 🤗 Transformers!
+
+{/if}
+
+As we mentioned earlier, we'd like to represent each entry in our GitHub issues corpus as a single vector, so we need to "pool" or average our token embeddings in some way. One popular approach is to perform *CLS pooling* on our model's outputs, where we simply collect the last hidden state for the special `[CLS]` token. The following function does the trick for us:
+
+```py
+def cls_pooling(model_output):
+    return model_output.last_hidden_state[:, 0]
+```
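+
+To make the difference between "pooling" and "averaging" concrete, here is a plain-Python sketch on a toy hidden state. Nested lists stand in for tensors, and the names `cls_pool` and `mean_pool` are hypothetical. Mean pooling is shown only as a contrast; it is not the strategy used in this section.
+
+```py
+# Toy "last hidden state": 1 sequence, 3 token positions, hidden size 2.
+# Position 0 plays the role of the special first token; position 2 is padding.
+last_hidden_state = [[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]]
+attention_mask = [[1, 1, 0]]
+
+
+def cls_pool(hidden):
+    """Keep only the first token's vector for each sequence."""
+    return [sequence[0] for sequence in hidden]
+
+
+def mean_pool(hidden, mask):
+    """Average the token vectors of each sequence, ignoring padded positions."""
+    pooled = []
+    for sequence, seq_mask in zip(hidden, mask):
+        kept = [vec for vec, m in zip(sequence, seq_mask) if m == 1]
+        pooled.append([sum(dim) / len(kept) for dim in zip(*kept)])
+    return pooled
+
+
+print(cls_pool(last_hidden_state))
+print(mean_pool(last_hidden_state, attention_mask))
+```
+
+Both strategies reduce a `(batch, tokens, hidden)` structure to one vector per sequence; they differ only in which token vectors contribute to it.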
+
+Next, we'll create a helper function that will tokenize a list of documents, place the tensors on the GPU, feed them to the model, and finally apply CLS pooling to the outputs:
+
+{#if fw === 'pt'}
+
+```py
+def get_embeddings(text_list):
+    encoded_input = tokenizer(
+        text_list, padding=True, truncation=True, return_tensors="pt"
+    )
+    encoded_input = {k: v.to(device) for k, v in encoded_input.items()}
+    model_output = model(**encoded_input)
+    return cls_pooling(model_output)
+```
+
+We can test that the function works by feeding it the first text entry in our corpus and inspecting the output shape:
+
+```py
+embedding = get_embeddings(comments_dataset["text"][0])
+embedding.shape
+```
+
+```python out
+torch.Size([1, 768])
+```
+
+Great, we've converted the first entry in our corpus into a 768-dimensional vector! We can use `Dataset.map()` to apply our `get_embeddings()` function to each row in our corpus, so let's create a new `embeddings` column as follows:
+
+```py
+embeddings_dataset = comments_dataset.map(
+    lambda x: {"embeddings": get_embeddings(x["text"]).detach().cpu().numpy()[0]}
+)
+```
+
+{:else}
+
+```py
+def get_embeddings(text_list):
+    encoded_input = tokenizer(
+        text_list, padding=True, truncation=True, return_tensors="tf"
+    )
+    encoded_input = {k: v for k, v in encoded_input.items()}
+    model_output = model(**encoded_input)
+    return cls_pooling(model_output)
+```
+
+We can test that the function works by feeding it the first text entry in our corpus and inspecting the output shape:
+
+```py
+embedding = get_embeddings(comments_dataset["text"][0])
+embedding.shape
+```
+
+```python out
+TensorShape([1, 768])
+```
+
+Great, we've converted the first entry in our corpus into a 768-dimensional vector! We can use `Dataset.map()` to apply our `get_embeddings()` function to each row in our corpus, so let's create a new `embeddings` column as follows:
+
+```py
+embeddings_dataset = comments_dataset.map(
+    lambda x: {"embeddings": get_embeddings(x["text"]).numpy()[0]}
+)
+```
+
+{/if}
+
+Notice that we've converted the embeddings to NumPy arrays. That's because 🤗 Datasets requires this format when we try to index them with FAISS, which we'll do next.
+
+## Using FAISS for efficient similarity search[[using-faiss-for-efficient-similarity-search]]
+
+Now that we have a dataset of embeddings, we need some way to search over them. To do this, we'll use a special data structure in 🤗 Datasets called a _FAISS index_. [FAISS](https://faiss.ai/) (short for Facebook AI Similarity Search) is a library that provides efficient algorithms to quickly search and cluster embedding vectors.
+
+The basic idea behind FAISS is to create a special data structure called an _index_ that allows one to find which embeddings are similar to an input embedding. Creating a FAISS index in 🤗 Datasets is simple: we use the `Dataset.add_faiss_index()` function and specify which column of our dataset we'd like to index:
+
+```py
+embeddings_dataset.add_faiss_index(column="embeddings")
+```
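+
+Conceptually, the simplest kind of index answers a query by scoring every stored vector and keeping the best `k`. The plain-Python sketch below illustrates that idea with inner-product scoring; the function name `nearest_examples` and the toy vectors are hypothetical. It is not how FAISS is implemented, and the metric a real index uses (for example inner product or Euclidean distance, where smaller is better) depends on how the index is configured.
+
+```py
+def nearest_examples(index_vectors, query, k):
+    """Score every stored vector exhaustively; return the top-k (score, position) pairs."""
+    scored = [
+        (sum(q * v for q, v in zip(query, vector)), position)
+        for position, vector in enumerate(index_vectors)
+    ]
+    # Highest inner product first.
+    scored.sort(key=lambda pair: pair[0], reverse=True)
+    return scored[:k]
+
+
+toy_vectors = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8], [0.9, 0.1]]
+top = nearest_examples(toy_vectors, query=[1.0, 0.0], k=2)
+print(top)
+```
+
+An exhaustive scan like this costs time proportional to the size of the corpus for every query, which is the cost that dedicated index structures aim to reduce on large collections.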
+
+We can now perform queries on this index by doing a nearest neighbor lookup with the `Dataset.get_nearest_examples()` function. Let's test this out by first embedding a question as follows:
+
+{#if fw === 'pt'}
+
+```py
+question = "How can I load a dataset offline?"
+question_embedding = get_embeddings([question]).cpu().detach().numpy()
+question_embedding.shape
+```
+
+```python out
+(1, 768)
+```
+
+{:else}
+
+```py
+question = "How can I load a dataset offline?"
+question_embedding = get_embeddings([question]).numpy()
+question_embedding.shape
+```
+
+```python out
+(1, 768)
+```
+
+{/if}
+
+Just like with the documents, we now have a 768-dimensional vector representing the query, which we can compare against the whole corpus to find the most similar embeddings:
+
+```py
+scores, samples = embeddings_dataset.get_nearest_examples(
+    "embeddings", question_embedding, k=5
+)
+```
+
+The `Dataset.get_nearest_examples()` function returns a tuple of scores that rank the similarity between the query and each document, and a corresponding set of samples (here, the 5 best matches). Let's collect these in a `pandas.DataFrame` so we can easily sort them:
+
+```py
+import pandas as pd
+
+samples_df = pd.DataFrame.from_dict(samples)
+samples_df["scores"] = scores
+samples_df.sort_values("scores", ascending=False, inplace=True)
+```
+
+Now we can iterate over the first few rows to see how well our query matched the available comments:
+
+```py
+for _, row in samples_df.iterrows():
+    print(f"COMMENT: {row.comments}")
+    print(f"SCORE: {row.scores}")
+    print(f"TITLE: {row.title}")
+    print(f"URL: {row.html_url}")
+    print("=" * 50)
+    print()
+```
+
+```python out
+"""
+COMMENT: Requiring online connection is a deal breaker in some cases unfortunately so it'd be great if offline mode is added similar to how `transformers` loads models offline fine.
+
+@mandubian's second bullet point suggests that there's a workaround allowing you to use your offline (custom?) dataset with `datasets`. Could you please elaborate on how that should look like?
+SCORE: 25.505046844482422
+TITLE: Discussion using datasets in offline mode
+URL: https://github.com/huggingface/datasets/issues/824
+==================================================
+
+COMMENT: The local dataset builders (csv, text , json and pandas) are now part of the `datasets` package since #1726 :)
+You can now use them offline
+\`\`\`python
+datasets = load_dataset("text", data_files=data_files)
+\`\`\`
+
+We'll do a new release soon
+SCORE: 24.555509567260742
+TITLE: Discussion using datasets in offline mode
+URL: https://github.com/huggingface/datasets/issues/824
+==================================================
+
+COMMENT: I opened a PR that allows to reload modules that have already been loaded once even if there's no internet.
+
+Let me know if you know other ways that can make the offline mode experience better. I'd be happy to add them :)
+
+I already note the "freeze" modules option, to prevent local modules updates. It would be a cool feature.
+
+----------
+
+> @mandubian's second bullet point suggests that there's a workaround allowing you to use your offline (custom?) dataset with `datasets`. Could you please elaborate on how that should look like?
+
+Indeed `load_dataset` allows to load remote dataset script (squad, glue, etc.) but also you own local ones.
+For example if you have a dataset script at `./my_dataset/my_dataset.py` then you can do
+\`\`\`python
+load_dataset("./my_dataset")
+\`\`\`
+and the dataset script will generate your dataset once and for all.
+
+----------
+
+About I'm looking into having `csv`, `json`, `text`, `pandas` dataset builders already included in the `datasets` package, so that they are available offline by default, as opposed to the other datasets that require the script to be downloaded.
+cf #1724
+SCORE: 24.14896583557129
+TITLE: Discussion using datasets in offline mode
+URL: https://github.com/huggingface/datasets/issues/824
+==================================================
+
+COMMENT: > here is my way to load a dataset offline, but it **requires** an online machine
+>
+> 1. (online machine)
+>
+> ```
+>
+> import datasets
+>
+> data = datasets.load_dataset(...)
+>
+> data.save_to_disk(/YOUR/DATASET/DIR)
+>
+> ```
+>
+> 2. copy the dir from online to the offline machine
+>
+> 3. (offline machine)
+>
+> ```
+>
+> import datasets
+>
+> data = datasets.load_from_disk(/SAVED/DATA/DIR)
+>
+> ```
+>
+>
+>
+> HTH.
+
+
+SCORE: 22.893993377685547
+TITLE: Discussion using datasets in offline mode
+URL: https://github.com/huggingface/datasets/issues/824
+==================================================
+
+COMMENT: here is my way to load a dataset offline, but it **requires** an online machine
+1. (online machine)
+\`\`\`
+import datasets
+data = datasets.load_dataset(...)
+data.save_to_disk(/YOUR/DATASET/DIR)
+\`\`\`
+2. copy the dir from online to the offline machine
+3. (offline machine)
+\`\`\`
+import datasets
+data = datasets.load_from_disk(/SAVED/DATA/DIR)
+\`\`\`
+
+HTH.
+SCORE: 22.406635284423828
+TITLE: Discussion using datasets in offline mode
+URL: https://github.com/huggingface/datasets/issues/824
+==================================================
+"""
+```
+
+Not bad! Our second hit seems to match the query.
+
+> [!TIP]
+> ✏️ **Try it out!** Create your own query and see whether you can find an answer in the retrieved documents. You might have to increase the `k` parameter in `Dataset.get_nearest_examples()` to broaden the search.
+
+## Glossary
+
+*   **FAISS (Facebook AI Similarity Search)**: A library that provides efficient algorithms for quickly searching and clustering embedding vectors.
+*   **Semantic Search**: A type of search that retrieves documents matching the meaning and intent of a query rather than its exact keywords.
+*   **Embeddings**: Representations of text (or other data) as numbers in a multi-dimensional vector space.
+*   **GitHub Issues**: A feature of GitHub repositories used to record problems, bugs, or feature requests.
+*   **Comments**: Text remarks added by users under a GitHub issue or pull request.
+*   **Search Engine**: A system that finds information matching a user's query.
+*   **Transformer-based Language Models**: Language models built on the Transformer architecture.
+*   **Embedding Vector**: A vector of numeric values representing a piece of text (a token, sentence, or paragraph).
+*   **Pooling**: Combining individual embeddings into a single vector representation for a larger unit of text (for example, a sentence or document).
+*   **Corpus**: A large collection of texts used for research.
+*   **Dot-product Similarity**: A metric that scores the similarity between two vectors by their dot product.
+*   **Similarity Metric**: A way of measuring how similar two objects (for example, embeddings) are.
+*   **Query**: A request for information from a search engine or database.
+*   **Conventional Approaches**: Traditional or customary methods.
+*   **Keywords**: The main words used in a search query.
+*   **`load_dataset()` Function**: The function in the Hugging Face Datasets library used to download and cache datasets.
+*   **`split="train"`**: An argument of the `load_dataset()` function that selects the training split.
+*   **`Dataset` Object**: An object from the Hugging Face Datasets library that represents a dataset.
+*   **`DatasetDict` Object**: An object that stores several datasets, such as the training, validation, and test sets, in dictionary form.
+*   **Pull Requests**: Requests on GitHub to merge code changes into a project's main branch.
+*   **`Dataset.filter()` Function**: A function used to remove the rows of a dataset that do not meet the specified conditions.
+*   **`is_pull_request`**: A feature (boolean value) indicating whether a GitHub issue is a pull request.
+*   **`len(x["comments"]) > 0`**: Checks whether the length of the comment list is greater than zero.
+*   **`title` Column**: The column that stores the issue's title.
+*   **`body` Column**: The column that stores the issue's description.
+*   **`comments` Column**: The column that stores the comments on an issue (a list of strings).
+*   **`html_url` Column**: The column that stores the issue's GitHub URL.
+*   **`Dataset.remove_columns()` Function**: A function used to remove unneeded columns from a dataset.
+*   **`set()`**: An unordered Python collection of items that contains no duplicates.
+*   **`symmetric_difference()`**: A method that returns the items belonging to exactly one of two sets.
+*   **Contextual Information**: Background information that helps in understanding the meaning of a situation or text.
+*   **"Explode" a Column**: Creating a new row for each element of a list-like column in a Pandas DataFrame.
+*   **Pandas `DataFrame`**: A two-dimensional data structure for handling tabular data in Python.
+*   **`DataFrame.explode()` Function**: The Pandas function that creates a new row for each element of a list-like column.
+*   **`issues_dataset.set_format("pandas")`**: Switches the dataset's output format to Pandas DataFrames.
+*   **`issues_dataset[:]`**: Selects the entire dataset (in Pandas format).
+*   **`comments_df.head(4)`**: Displays the first four rows of the DataFrame.
+*   **`ignore_index=True`**: An argument of the `explode()` function that creates a new index instead of preserving the original one.
+*   **`Dataset.from_pandas()`**: A method that creates a Hugging Face Dataset object from a Pandas DataFrame.
+*   **`comment_length` Column**: The column that stores the number of words in each comment.
+*   **`x["comments"].split()`**: Splits the comment text into words.
+*   **`Dataset.map()`**: A method of the 🤗 Datasets library that applies a function to each element, or each batch, of a dataset.
+*   **`AutoModel` Class**: Hugging Face Transformers library မှ မော်ဒယ်အမည်ကို အသုံးပြုပြီး သက်ဆိုင်ရာ model class ကို အလိုအလျောက် load လုပ်ပေးသော class။
+*   **Checkpoint**: မော်ဒယ်၏ weights များနှင့် အခြားဖွဲ့စည်းပုံများ (configuration) ကို သတ်မှတ်ထားသော အချိန်တစ်ခုတွင် သိမ်းဆည်းထားခြင်း။
+*   **`sentence-transformers` Library**: Sentence embeddings များ ဖန်တီးရန်အတွက် ဒီဇိုင်းထုတ်ထားသော Python library။
+*   **Asymmetric Semantic Search**: query သည် တိုတောင်းပြီး document သည် ရှည်လျားသော semantic search အမျိုးအစား (ဥပမာ- မေးခွန်းတစ်ခုကို အဖြေရှာခြင်း)။
+*   **`multi-qa-mpnet-base-dot-v1`**: Semantic search အတွက် စွမ်းဆောင်ရည်ကောင်းမွန်သော sentence-transformer model checkpoint။
+*   **`AutoTokenizer`**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး သက်ဆိုင်ရာ tokenizer ကို အလိုအလျောက် load လုပ်ပေးသည်။
+*   **GPU (Graphics Processing Unit)**: ဂရပ်ဖစ်လုပ်ဆောင်မှုအတွက် အထူးဒီဇိုင်းထုတ်ထားသော processor တစ်မျိုးဖြစ်သော်လည်း AI/ML လုပ်ငန်းများတွင် အရှိန်မြှင့်ရန် အသုံးများသည်။
+*   **`torch.device("cuda")`**: PyTorch တွင် GPU device ကို ရည်ညွှန်းသည်။
+*   **`model.to(device)`**: PyTorch model ကို သတ်မှတ်ထားသော device (GPU) သို့ ရွှေ့ပြောင်းခြင်း။
+*   **`TFAutoModel`**: TensorFlow framework အတွက် `AutoModel` နှင့် တူညီသော လုပ်ဆောင်ချက်များရှိသည်။
+*   **`from_pt=True`**: `TFAutoModel.from_pretrained()` တွင် PyTorch weights များကို TensorFlow format သို့ အလိုအလျောက် ပြောင်းလဲရန် argument။
+*   **CLS Pooling**: Transformer model ၏ output မှ `[CLS]` token ၏ last hidden state ကို အသုံးပြု၍ text sequence အတွက် single vector representation တစ်ခု ဖန်တီးခြင်း။
+*   **`[CLS]` Token**: BERT model တွင် sequence ၏ အစကို ကိုယ်စားပြုသော special token။
+*   **Last Hidden State**: Transformer model ၏ နောက်ဆုံး layer မှ output embeddings များ။
+*   **`encoded_input`**: Tokenizer မှ ထုတ်ပေးသော input IDs, attention masks စသည်တို့ ပါဝင်သော dictionary။
+*   **`padding=True`**: Tokenization လုပ်ရာတွင် sequence အရှည်များ ကွဲပြားပါက အရှည်ဆုံး sequence အရှည်အတိုင်း ဖြည့်ပေးခြင်း။
+*   **`truncation=True`**: sequence အရှည်သည် model ၏ အများဆုံး input အရှည်ထက် ရှည်လျားပါက ဖြတ်တောက်ခြင်း။
+*   **`return_tensors="pt"`**: PyTorch tensors များအဖြစ် output ပြန်ပေးရန် argument။
+*   **`encoded_input.items()`**: dictionary မှ key-value pairs များကို ရယူခြင်း။
+*   **`k: v.to(device)`**: dictionary comprehension ဖြင့် input tensors များကို GPU သို့ ရွှေ့ခြင်း။
+*   **`model(**encoded_input)`**: model ကို encoded inputs များဖြင့် run ခြင်း။
+*   **`embedding.shape`**: embedding vector ၏ ပုံသဏ္ဍာန် (dimensions) ကို ပြသခြင်း။
+*   **768-dimensional Vector**: dimensions ၇၆၈ ခုပါဝင်သော vector။
+*   **`detach().cpu().numpy()[0]`**: PyTorch tensor ကို detach (computation graph မှ ဖြတ်တောက်) ပြီး CPU သို့ ရွှေ့၊ ထို့နောက် NumPy array အဖြစ် ပြောင်းလဲကာ ပထမဆုံး element ကို ရယူခြင်း။
+*   **NumPy Arrays**: Python တွင် ဂဏန်းဆိုင်ရာ တွက်ချက်မှုများအတွက် အသုံးပြုသော array object။
+*   **FAISS Index**: FAISS library မှ efficient similarity search အတွက် အသုံးပြုသော data structure။
+*   **`Dataset.add_faiss_index()` Function**: Hugging Face Dataset တွင် FAISS index တစ်ခုကို ထည့်သွင်းရန် အသုံးပြုသော function။
+*   **Nearest Neighbor Lookup**: input query နှင့် အနီးဆုံးသော item များကို ရှာဖွေခြင်း။
+*   **`Dataset.get_nearest_examples()` Function**: Dataset ၏ FAISS index ကို အသုံးပြုပြီး query နှင့် အနီးဆုံးသော examples များကို ပြန်ပေးသော function။
+*   **`question_embedding`**: Query စာသား၏ embedding vector။
+*   **`k` Parameter**: `get_nearest_examples()` function တွင် အနီးဆုံး examples အရေအတွက် (k) ကို သတ်မှတ်ရန် အသုံးပြုသော parameter။
+*   **`pd.DataFrame.from_dict()`**: dictionary တစ်ခုမှ Pandas DataFrame တစ်ခုကို ဖန်တီးသော method။
+*   **`samples_df.sort_values("scores", ascending=False, inplace=True)`**: DataFrame ကို `scores` column အလိုက် အများဆုံးမှ အနည်းဆုံးသို့ စီစဉ်ခြင်း။
+*   **`iterrows()`**: DataFrame ၏ rows များကို iterate လုပ်ရန် အသုံးပြုသော method။
\ No newline at end of file
diff --git a/chapters/my/chapter5/7.mdx b/chapters/my/chapter5/7.mdx
new file mode 100644
index 000000000..132826eec
--- /dev/null
+++ b/chapters/my/chapter5/7.mdx
@@ -0,0 +1,36 @@
+# 🤗 Datasets၊ အဆင်သင့်ဖြစ်ပါပြီ![[datasets-check]]
+
+<CourseFloatingBanner
+    chapter={5}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+🤗 Datasets library ကို ကောင်းကောင်း လေ့လာခဲ့ပြီးပါပြီ၊ ဒီအထိ ရောက်လာတဲ့အတွက် ဂုဏ်ယူပါတယ်။ ဒီအခန်းကနေ သင်ရရှိခဲ့တဲ့ ဗဟုသုတတွေနဲ့ သင်ဟာ အောက်ပါတို့ကို လုပ်ဆောင်နိုင်ပါလိမ့်မယ်။
+
+-   Hugging Face Hub၊ သင့် laptop ဒါမှမဟုတ် သင့်ကုမ္ပဏီက remote server တစ်ခုကနေ dataset တွေကို load လုပ်ပါ။
+-   `Dataset.map()` နဲ့ `Dataset.filter()` functions တွေကို ပေါင်းစပ်အသုံးပြုပြီး သင့် data တွေကို wrangle လုပ်ပါ။
+-   `Dataset.set_format()` ကို အသုံးပြုပြီး Pandas နဲ့ NumPy လို data formats တွေကြား လျင်မြန်စွာ ပြောင်းလဲပါ။
+-   သင့်ကိုယ်ပိုင် dataset ကို ဖန်တီးပြီး Hugging Face Hub ကို push လုပ်ပါ။
+-   Transformer model ကို အသုံးပြုပြီး သင့် documents တွေကို embed လုပ်ကာ FAISS ကို အသုံးပြုပြီး semantic search engine တစ်ခုကို တည်ဆောက်ပါ။
+
+[Chapter 7](/course/chapter7) မှာ၊ Transformer models တွေအတွက် အကောင်းဆုံးဖြစ်တဲ့ အဓိက NLP tasks တွေကို နက်နက်နဲနဲ လေ့လာရင်း ဒီအရာအားလုံးကို ကောင်းကောင်း အသုံးချသွားမှာပါ။ ရှေ့ကို ဆက်မသွားခင်၊ 🤗 Datasets အပေါ် သင်ရဲ့ ဗဟုသုတကို quick quiz တစ်ခုနဲ့ စစ်ဆေးကြည့်လိုက်ပါ။
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **🤗 Datasets Library**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် ဒေတာအစုအဝေး (datasets) တွေကို လွယ်လွယ်ကူကူ ဝင်ရောက်ရယူ၊ စီမံခန့်ခွဲပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **Hugging Face Hub**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **Laptop**: သယ်ဆောင်ရလွယ်ကူသော ကိုယ်ပိုင်ကွန်ပျူတာ။
+*   **Remote Server**: ကွန်ရက်တစ်ခုပေါ်တွင် ဝန်ဆောင်မှုများ သို့မဟုတ် အရင်းအမြစ်များကို ပံ့ပိုးပေးသော ကွန်ပျူတာ။
+*   **Wrangle Data**: ကုန်ကြမ်းဒေတာ (raw data) များကို ပိုမိုအသုံးဝင်ပြီး သန့်ရှင်းသော ပုံစံသို့ ပြောင်းလဲရန်အတွက် လုပ်ဆောင်သော လုပ်ငန်းစဉ်များ။
+*   **`Dataset.map()` Function**: 🤗 Datasets library မှာ ပါဝင်တဲ့ method တစ်ခုဖြစ်ပြီး dataset ရဲ့ element တစ်ခုစီ ဒါမှမဟုတ် batch တစ်ခုစီပေါ်မှာ function တစ်ခုကို အသုံးပြုနိုင်စေသည်။
+*   **`Dataset.filter()` Function**: 🤗 Datasets library မှာ ပါဝင်တဲ့ method တစ်ခုဖြစ်ပြီး သတ်မှတ်ထားသော အခြေအနေများနှင့် ကိုက်ညီသော ဒေတာများကိုသာ dataset မှ ရွေးထုတ်ရန် အသုံးပြုသည်။
+*   **Pandas**: Python programming language အတွက် data analysis နှင့် manipulation အတွက် အသုံးပြုသော open-source library။
+*   **NumPy**: Python programming language အတွက် numerical computing (ဂဏန်းတွက်ချက်မှု) အတွက် အသုံးပြုသော library။
+*   **`Dataset.set_format()` Function**: 🤗 Datasets library မှာ ပါဝင်တဲ့ method တစ်ခုဖြစ်ပြီး dataset ၏ output format (ဥပမာ- "pandas", "numpy", "torch", "tensorflow") ကို သတ်မှတ်ရန် အသုံးပြုသည်။
+*   **Push to the Hub**: Hugging Face Hub သို့ model, dataset သို့မဟုတ် အခြား artifacts များကို upload လုပ်ခြင်း။
+*   **Embed Documents**: စာသား document များကို vector space အတွင်းရှိ ဂဏန်းဆိုင်ရာ ကိုယ်စားပြုမှုများ (embeddings) အဖြစ် ပြောင်းလဲခြင်း။ ၎င်းသည် document များကြား ဆင်တူမှုများကို တိုင်းတာနိုင်စေသည်။
+*   **Transformer Model**: Natural Language Processing (NLP) မှာ အောင်မြင်မှုများစွာရရှိခဲ့တဲ့ deep learning architecture တစ်မျိုးပါ။
+*   **Semantic Search Engine**: စာလုံးများကို ကိုက်ညီမှု ရှာဖွေခြင်းထက် အဓိပ္ပာယ်ပေါ်မူတည်၍ ရှာဖွေနိုင်သော search engine။
+*   **FAISS (Facebook AI Similarity Search)**: Facebook AI မှ ထုတ်လုပ်ထားသော library တစ်ခုဖြစ်ပြီး vector များကို မြန်ဆန်ထိရောက်စွာ ရှာဖွေခြင်းနှင့် grouping လုပ်ခြင်းအတွက် အသုံးပြုသည်။
+*   **NLP Tasks (Natural Language Processing Tasks)**: ကွန်ပျူတာတွေ လူသားဘာသာစကားကို နားလည်၊ အဓိပ္ပာယ်ဖော်ပြီး၊ ဖန်တီးနိုင်အောင် လုပ်ဆောင်ပေးတဲ့ အလုပ်တွေ (ဥပမာ- text classification, question answering)။
+*   **Quick Quiz**: ဗဟုသုတကို လျင်မြန်စွာ စစ်ဆေးသည့် မေးခွန်းအနည်းငယ်။
\ No newline at end of file
diff --git a/chapters/my/chapter5/8.mdx b/chapters/my/chapter5/8.mdx
new file mode 100644
index 000000000..e70c2a1df
--- /dev/null
+++ b/chapters/my/chapter5/8.mdx
@@ -0,0 +1,296 @@
+<!-- DISABLE-FRONTMATTER-SECTIONS -->
+
+# အခန်း (၅) ဆိုင်ရာ မေးခွန်းများ[[end-of-chapter-quiz]]
+
+<CourseFloatingBanner
+    chapter={5}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+ဒီအခန်းမှာ အချက်အလက်များစွာကို ဖော်ပြခဲ့ပါတယ်။ အသေးစိတ်အချက်အလက်အားလုံးကို နားမလည်သေးရင်လည်း စိတ်မပူပါနဲ့၊ နောက်အခန်းတွေက အတွင်းပိုင်းလုပ်ဆောင်မှုတွေကို နားလည်အောင် ကူညီပေးပါလိမ့်မယ်။
+
+ဒါပေမယ့် ဆက်မသွားခင်၊ ဒီအခန်းမှာ သင်ယူခဲ့တာတွေကို စစ်ဆေးကြည့်ရအောင်။
+
+### ၁။ 🤗 Datasets မှာရှိတဲ့ `load_dataset()` function က အောက်ပါနေရာတွေထဲက ဘယ်နေရာကနေ dataset တစ်ခုကို load လုပ်နိုင်စေသလဲ။
+
+<Question
+	choices={[
+		{
+			text: "Locally၊ ဥပမာ သင့် laptop ပေါ်ကနေ",
+			explain: "မှန်ပါတယ်။ local datasets တွေကို load လုပ်ဖို့အတွက် local files တွေရဲ့ paths တွေကို `load_dataset()` ရဲ့ `data_files` argument ကို ပေးနိုင်ပါတယ်။",
+			correct: true
+		},
+		{
+			text: "Hugging Face Hub ကနေ",
+			explain: "မှန်ပါတယ်။ dataset ID ကို ပေးခြင်းဖြင့် Hub ပေါ်က datasets တွေကို load လုပ်နိုင်ပါတယ်။ ဥပမာ- <code>load_dataset('emotion')</code>။",
+			correct: true
+		},
+		{
+			text: "Remote server တစ်ခုကနေ",
+			explain: "မှန်ပါတယ်။ remote files တွေကို load လုပ်ဖို့အတွက် URLs တွေကို `load_dataset()` ရဲ့ `data_files` argument ကို ပေးနိုင်ပါတယ်။",
+			correct: true
+		},
+	]}
+/>
+
+### ၂။ အောက်ပါအတိုင်း GLUE tasks ထဲက တစ်ခုကို load လုပ်တယ်လို့ ယူဆပါစို့-
+
+```py
+from datasets import load_dataset
+
+dataset = load_dataset("glue", "mrpc", split="train")
+```
+
+အောက်ပါ commands တွေထဲက ဘယ်ဟာက `dataset` ကနေ elements ၅၀ ကို random sample အဖြစ် ထုတ်လုပ်ပေးမလဲ။
+
+<Question
+	choices={[
+		{
+			text: "<code>dataset.sample(50)</code>",
+			explain: "ဒါက မမှန်ပါဘူး -- <code>Dataset.sample()</code> method မရှိပါဘူး။"
+		},
+		{
+			text: "<code>dataset.shuffle().select(range(50))</code>",
+			explain: "မှန်ပါတယ်။ ဒီအခန်းမှာ သင်တွေ့ခဲ့ရတဲ့အတိုင်း၊ သင်ဟာ dataset ကို အရင် shuffle လုပ်ပြီးမှ ၎င်းကနေ samples တွေကို ရွေးထုတ်တာပါ။",
+			correct: true
+		},
+		{
+			text: "<code>dataset.select(range(50)).shuffle()</code>",
+			explain: "ဒါက မမှန်ပါဘူး -- code က run မှာဖြစ်ပေမယ့်၊ dataset ထဲက ပထမဆုံး elements ၅၀ ကိုပဲ shuffle လုပ်ပါလိမ့်မယ်။"
+		}
+	]}
+/>
+
+### ၃။ `pets_dataset` လို့ခေါ်တဲ့ အိမ်မွေးတိရစ္ဆာန်တွေနဲ့ ပတ်သက်တဲ့ dataset တစ်ခုရှိပြီး၊ တိရစ္ဆာန်တစ်ခုစီရဲ့ နာမည်ကို ဖော်ပြတဲ့ `name` column ပါဝင်တယ်လို့ ယူဆပါ။ အောက်ပါနည်းလမ်းတွေထဲက ဘယ်ဟာက နာမည် "L" စာလုံးနဲ့ စတင်တဲ့ တိရစ္ဆာန်တွေအားလုံးအတွက် dataset ကို filter လုပ်နိုင်စေမှာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "<code>pets_dataset.filter(lambda x : x['name'].startswith('L'))</code>",
+			explain: "မှန်ပါတယ်။ ဒီလို မြန်ဆန်တဲ့ filters တွေအတွက် Python lambda function ကို အသုံးပြုတာက အကောင်းဆုံးပါပဲ။ တခြားဖြေရှင်းနည်းတစ်ခုကို စဉ်းစားနိုင်မလား။",
+			correct: true
+		},
+		{
+			text: "<code>pets_dataset.filter(lambda x['name'].startswith('L'))</code>",
+			explain: "ဒါက မမှန်ပါဘူး -- lambda function တစ်ခုက <code>lambda *arguments* : *expression*</code> ပုံစံရှိတာကြောင့်၊ ဒီကိစ္စမှာ arguments တွေ ပေးဖို့လိုပါတယ်။"
+		},
+		{
+			text: "<code>def filter_names(x): return x['name'].startswith('L')</code> လို function တစ်ခု ဖန်တီးပြီး <code>pets_dataset.filter(filter_names)</code> ကို run ပါ။",
+			explain: "မှန်ပါတယ်။ <code>Dataset.map()</code> နဲ့ အတူတူပါပဲ၊ သင်ဟာ <code>Dataset.filter()</code> ကို explicit functions တွေ ပေးနိုင်ပါတယ်။ ဒါက တိုတောင်းတဲ့ lambda function နဲ့ မသင့်လျော်တဲ့ ရှုပ်ထွေးတဲ့ logic တွေရှိတဲ့အခါ အသုံးဝင်ပါတယ်။ တခြားဘယ်ဖြေရှင်းနည်းတွေ အလုပ်ဖြစ်မလဲ။",
+			correct: true
+		}
+	]}
+/>
+
+### ၄။ Memory mapping ဆိုတာ ဘာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "CPU နဲ့ GPU RAM ကြားက mapping တစ်ခု",
+			explain: "ဒါက မဟုတ်ပါဘူး -- ထပ်ကြိုးစားပါ။",
+		},
+		{
+			text: "RAM နဲ့ filesystem storage ကြားက mapping တစ်ခု",
+			explain: "မှန်ပါတယ်။ 🤗 Datasets က dataset တစ်ခုစီကို memory-mapped file တစ်ခုလို သတ်မှတ်ပါတယ်။ ဒါက library ကို dataset ရဲ့ elements တွေကို memory ထဲကို အပြည့်အဝ load လုပ်ဖို့ မလိုဘဲ ဝင်ရောက်ကြည့်ရှုပြီး လုပ်ဆောင်နိုင်စေပါတယ်။",
+			correct: true
+		},
+		{
+			text: "🤗 Datasets cache ထဲက files နှစ်ခုကြားက mapping တစ်ခု",
+			explain: "ဒါက မမှန်ပါဘူး -- ထပ်ကြိုးစားပါ။"
+		}
+	]}
+/>
+
+### ၅။ အောက်ပါတို့ထဲက ဘယ်အရာတွေက memory mapping ရဲ့ အဓိက အကျိုးကျေးဇူးတွေလဲ။
+
+<Question
+	choices={[
+		{
+			text: "memory-mapped files တွေကို ဝင်ရောက်ကြည့်ရှုတာက disk ကနေ ဖတ်တာ ဒါမှမဟုတ် disk ကို ရေးတာထက် ပိုမြန်ပါတယ်။",
+			explain: "မှန်ပါတယ်။ ဒါက 🤗 Datasets ကို အလွန်လျင်မြန်စေပါတယ်။ ဒါပေမယ့် ဒါတစ်ခုတည်းသော အကျိုးကျေးဇူးတော့ မဟုတ်ပါဘူး။",
+			correct: true
+		},
+		{
+			text: "Applications တွေဟာ အလွန်ကြီးမားတဲ့ file တစ်ခုထဲက data segments တွေကို file တစ်ခုလုံးကို RAM ထဲကို အရင်ဖတ်စရာ မလိုဘဲ ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။",
+			explain: "မှန်ပါတယ်။ ဒါက 🤗 Datasets ကို multi-gigabyte datasets တွေကို သင့် laptop ပေါ်မှာ CPU ကို မပိတ်မိစေဘဲ load လုပ်နိုင်စေပါတယ်။ memory mapping က တခြားဘာ အကျိုးကျေးဇူးတွေ ပေးလဲ။",
+			correct: true
+		},
+		{
+			text: "စွမ်းအင် နည်းနည်းပဲ သုံးစွဲတာကြောင့် သင့်ဘက်ထရီက ပိုကြာကြာ ခံပါတယ်။",
+			explain: "ဒါက မမှန်ပါဘူး -- ထပ်ကြိုးစားပါ။"
+		}
+	]}
+/>
+
+### ၆။ အောက်ပါ code က ဘာကြောင့် အလုပ်မလုပ်တာလဲ။
+
+```py
+from datasets import load_dataset
+
+dataset = load_dataset("allocine", streaming=True, split="train")
+dataset[0]
+```
+
+<Question
+	choices={[
+		{
+			text: "RAM ထဲကို မဆံ့လောက်အောင် ကြီးမားတဲ့ dataset ကို stream လုပ်ဖို့ ကြိုးစားနေတာ။",
+			explain: "ဒါက မမှန်ပါဘူး -- streaming datasets တွေကို on the fly မှာ decompress လုပ်တာဖြစ်ပြီး၊ terabyte-sized datasets တွေကို RAM အနည်းငယ်နဲ့ လုပ်ဆောင်နိုင်ပါတယ်!",
+		},
+		{
+			text: "<code>IterableDataset</code> ကို ဝင်ရောက်ကြည့်ရှုဖို့ ကြိုးစားနေတာ။",
+			explain: "မှန်ပါတယ်။ <code>IterableDataset</code> ဆိုတာ generator တစ်ခုဖြစ်ပြီး container မဟုတ်ပါဘူး၊ ဒါကြောင့် ၎င်းရဲ့ elements တွေကို <code>next(iter(dataset))</code> ကို အသုံးပြုပြီး ဝင်ရောက်ကြည့်ရှုသင့်ပါတယ်။",
+			correct: true
+		},
+		{
+			text: "<code>allocine</code> dataset မှာ <code>train</code> split မရှိပါဘူး။",
+			explain: "ဒါက မမှန်ပါဘူး -- Hub ပေါ်က <a href='https://huggingface.co/datasets/allocine'><code>allocine</code> dataset card</a> ကို ကြည့်ပြီး ၎င်းမှာ ဘယ် splits တွေ ပါဝင်လဲဆိုတာ စစ်ဆေးပါ။"
+		}
+	]}
+/>
+
+### ၇။ Dataset card တစ်ခု ဖန်တီးခြင်းရဲ့ အဓိက အကျိုးကျေးဇူးတွေက ဘာတွေလဲ။
+
+<Question
+	choices={[
+		{
+			text: "ဒါက dataset ရဲ့ ရည်ရွယ်အသုံးပြုပုံနဲ့ ထောက်ပံ့ထားတဲ့ tasks တွေအကြောင်း အချက်အလက်တွေ ပေးတာကြောင့် community ထဲက တခြားသူတွေက ဒါကို အသုံးပြုဖို့ အသိပေးဆုံးဖြတ်ချက် ချနိုင်ပါတယ်။",
+			explain: "မှန်ပါတယ်။ မှတ်တမ်းမရှိတဲ့ datasets တွေဟာ dataset ဖန်တီးသူတွေရဲ့ ရည်ရွယ်ချက်တွေကို ထင်ဟပ်နိုင်ခြင်းမရှိတဲ့ models တွေကို train လုပ်ဖို့ အသုံးပြုနိုင်ပါတယ်၊ ဒါမှမဟုတ် privacy သို့မဟုတ် licensing ကန့်သတ်ချက်တွေကို ချိုးဖောက်တဲ့ data တွေပေါ်မှာ train လုပ်ထားရင် ဥပဒေရေးရာ မရှင်းလင်းတဲ့ models တွေကို ထုတ်လုပ်နိုင်ပါတယ်။ ဒါပေမယ့် ဒါတစ်ခုတည်းသော အကျိုးကျေးဇူးတော့ မဟုတ်ပါဘူး!",
+			correct : true
+		},
+		{
+			text: "corpus ထဲမှာ ရှိနေတဲ့ ဘက်လိုက်မှုတွေကို အာရုံစိုက်မိစေဖို့ ကူညီပေးပါတယ်။",
+			explain: "မှန်ပါတယ်။ datasets အားလုံးနီးပါးမှာ ဘက်လိုက်မှုပုံစံအချို့ ပါဝင်ပြီး၊ ဒါတွေက အောက်ဘက်မှာ အနုတ်လက္ခဏာဆောင်တဲ့ အကျိုးဆက်တွေ ဖြစ်ပေါ်စေနိုင်ပါတယ်။ ဒါတွေကို သိရှိနားလည်ခြင်းက model တည်ဆောက်သူတွေကို မွေးရာပါ ဘက်လိုက်မှုတွေကို ဘယ်လိုဖြေရှင်းရမလဲဆိုတာ နားလည်စေပါတယ်။ dataset cards တွေက တခြားဘာတွေ ကူညီပေးသေးလဲ။",
+			correct : true
+		},
+		{
+			text: "community ထဲက တခြားသူတွေက ကျွန်ုပ်ရဲ့ dataset ကို အသုံးပြုနိုင်ခြေကို တိုးတက်စေပါတယ်။",
+			explain: "မှန်ပါတယ်။ ကောင်းကောင်းရေးထားတဲ့ dataset card တစ်ခုက သင့်ရဲ့ တန်ဖိုးရှိတဲ့ dataset ကို ပိုမိုအသုံးပြုလာစေဖို့ ဦးဆောင်ပါလိမ့်မယ်။ တခြားဘာ အကျိုးကျေးဇူးတွေ ပေးသေးလဲ။",
+			correct: true
+		},
+	]}
+/>
+
+### ၈။ Semantic search ဆိုတာ ဘာလဲ။
+
+<Question
+	choices={[
+		{
+			text: "query ထဲက စကားလုံးတွေနဲ့ corpus ထဲက documents တွေကြား တိကျတဲ့ ကိုက်ညီမှုတွေကို ရှာဖွေတဲ့ နည်းလမ်းတစ်ခု။",
+			explain: "ဒါက မမှန်ပါဘူး -- ဒီလို ရှာဖွေမှုကို *lexical search* လို့ခေါ်ပြီး၊ ရိုးရာ search engines တွေမှာ သင်ပုံမှန်တွေ့ရတာပါ။"
+		},
+		{
+			text: "query ရဲ့ contextual meaning ကို နားလည်ခြင်းဖြင့် ကိုက်ညီတဲ့ documents တွေကို ရှာဖွေတဲ့ နည်းလမ်းတစ်ခု။",
+			explain: "မှန်ပါတယ်။ Semantic search က embedding vectors တွေကို အသုံးပြုပြီး queries နဲ့ documents တွေကို ကိုယ်စားပြုကာ၊ ၎င်းတို့ကြားက ထပ်နေမှုပမာဏကို တိုင်းတာဖို့ similarity metric ကို အသုံးပြုပါတယ်။ တခြားဘယ်လို ဖော်ပြနိုင်သေးလဲ။",
+			correct: true
+		},
+		{
+			text: "search accuracy ကို တိုးတက်စေတဲ့ နည်းလမ်းတစ်ခု။",
+			explain: "မှန်ပါတယ်။ Semantic search engines တွေက keyword matching ထက် query ရဲ့ ရည်ရွယ်ချက်ကို ပိုကောင်းကောင်း နားလည်နိုင်ပြီး ပုံမှန်အားဖြင့် ပိုမိုမြင့်မားတဲ့ precision နဲ့ documents တွေကို ပြန်လည်ရယူပါတယ်။ ဒါပေမယ့် ဒါတစ်ခုတည်းသော အဖြေမှန် မဟုတ်ပါဘူး -- semantic search က တခြားဘာတွေ ပံ့ပိုးပေးသေးလဲ။",
+			correct: true
+		}
+	]}
+/>
+
+### ၉။ Asymmetric semantic search အတွက်၊ သင်အများအားဖြင့် ဘာတွေရှိလဲ။
+
+<Question
+	choices={[
+		{
+			text: "တိုတောင်းသော query တစ်ခုနဲ့ query ကို ဖြေကြားပေးတဲ့ ပိုရှည်တဲ့ paragraph တစ်ခု။",
+			explain: "မှန်ပါတယ်!",
+			correct : true
+		},
+		{
+			text: "query တွေနဲ့ paragraphs တွေက အရှည်တူညီလောက်ပါတယ်။",
+			explain: "ဒါက symmetric semantic search ရဲ့ ဥပမာတစ်ခုပါ -- ထပ်ကြိုးစားပါ။"
+		},
+		{
+			text: "ရှည်လျားသော query တစ်ခုနဲ့ query ကို ဖြေကြားပေးတဲ့ ပိုတိုတောင်းတဲ့ paragraph တစ်ခု။",
+			explain: "ဒါက မမှန်ပါဘူး -- ထပ်ကြိုးစားပါ။"
+		}
+	]}
+/>
+
+### ၁၀။ 🤗 Datasets ကို speech processing လိုမျိုး အခြား domains တွေမှာ အသုံးပြုဖို့ data တွေ load လုပ်ဖို့ အသုံးပြုနိုင်မလား။
+
+<Question
+	choices={[
+		{
+			text: "မရပါဘူး။",
+			explain: "ဒါက မမှန်ပါဘူး -- 🤗 Datasets က လက်ရှိမှာ tabular data, audio နဲ့ computer vision တွေကို ထောက်ပံ့ပေးပါတယ်။ computer vision ဥပမာအတွက် Hub ပေါ်က <a href='https://huggingface.co/datasets/mnist'>MNIST dataset</a> ကို ကြည့်ပါ။"
+		},
+		{
+			text: "ရပါတယ်။",
+			explain: "မှန်ပါတယ်။ 🤗 Transformers library မှာ speech နဲ့ vision နဲ့ပတ်သက်တဲ့ စိတ်လှုပ်ရှားဖွယ် တိုးတက်မှုတွေကို ကြည့်ပြီး 🤗 Datasets ကို ဒီ domains တွေမှာ ဘယ်လိုအသုံးပြုလဲဆိုတာ ကြည့်ပါ။",
+			correct : true
+		},
+	]}
+/>
+
+## ဝေါဟာရ ရှင်းလင်းချက် (Glossary)
+
+*   **🤗 Datasets Library**: Hugging Face က ထုတ်လုပ်ထားတဲ့ library တစ်ခုဖြစ်ပြီး AI မော်ဒယ်တွေ လေ့ကျင့်ဖို့အတွက် ဒေတာအစုအဝေး (datasets) တွေကို လွယ်လွယ်ကူကူ ဝင်ရောက်ရယူ၊ စီမံခန့်ခွဲပြီး အသုံးပြုနိုင်စေပါတယ်။
+*   **`load_dataset()` Function**: Hugging Face Datasets library မှ dataset များကို download လုပ်ပြီး cache လုပ်ရန် အသုံးပြုသော function။
+*   **Locally**: သင့်ကွန်ပျူတာ (laptop သို့မဟုတ် desktop) ၏ hard drive ပေါ်တွင်။
+*   **Laptop**: သယ်ဆောင်ရလွယ်ကူသော ကိုယ်ပိုင်ကွန်ပျူတာ။
+*   **`data_files` Argument**: `load_dataset()` function တွင် dataset files (local သို့မဟုတ် remote) ၏ path (သို့မဟုတ် URL) ကို သတ်မှတ်ရန် အသုံးပြုသော argument။
+*   **Hugging Face Hub**: AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
+*   **Dataset ID**: Hugging Face Hub ပေါ်ရှိ dataset တစ်ခု၏ ထူးခြားသော ဖော်ထုတ်ကိန်း (identifier)။
+*   **Remote Server**: ကွန်ရက်တစ်ခုပေါ်တွင် ဝန်ဆောင်မှုများ သို့မဟုတ် အရင်းအမြစ်များကို ပံ့ပိုးပေးသော ကွန်ပျူတာ။
+*   **URL (Uniform Resource Locator)**: web ပေါ်ရှိ အရင်းအမြစ်တစ်ခု (ဥပမာ- web page, file) ၏ လိပ်စာ။
+*   **GLUE Tasks (General Language Understanding Evaluation Tasks)**: စာသားခွဲခြားသတ်မှတ်ခြင်း လုပ်ငန်း ၁၀ ခုတွင် ML model များ၏ စွမ်းဆောင်ရည်ကို တိုင်းတာရန် အသုံးပြုသည့် academic benchmark တစ်ခု။
+*   **`mrpc` (Microsoft Research Paraphrase Corpus)**: GLUE benchmark ထဲက paraphrase detection task တစ်ခု။
+*   **`split="train"`**: dataset ရဲ့ training split ကို ရွေးချယ်ခြင်း။
+*   **Random Sample**: dataset တစ်ခုမှ ကျပန်းရွေးချယ်ထားသော elements များ။
+*   **`Dataset.sample()` Method**: `Dataset` object မှာ မရှိပါ။
+*   **`Dataset.shuffle()` Method**: dataset အတွင်းရှိ elements များကို ကျပန်းရောနှော (shuffle) ရန် အသုံးပြုသော method။
+*   **`Dataset.select(range(50))` Method**: dataset ၏ ပထမဆုံး elements ၅၀ ကို ရွေးထုတ်ရန် အသုံးပြုသော method။
+*   **`Dataset.filter()` Method**: 🤗 Datasets library မှာ ပါဝင်တဲ့ method တစ်ခုဖြစ်ပြီး သတ်မှတ်ထားသော အခြေအနေများနှင့် ကိုက်ညီသော ဒေတာများကိုသာ dataset မှ ရွေးထုတ်ရန် အသုံးပြုသည်။
+*   **Lambda Function (Python Lambda)**: အမည်မရှိသော (anonymous) function တစ်ခုဖြစ်ပြီး code လိုင်းတစ်ကြောင်းတည်းဖြင့် သတ်မှတ်နိုင်သည်။
+*   **`x['name'].startswith('L')`**: dictionary `x` အတွင်းရှိ `name` key ၏ value သည် 'L' ဖြင့် စတင်ခြင်းရှိမရှိ စစ်ဆေးသော Python expression။
+*   **Memory Mapping**: ဖိုင်တစ်ခု၏ အကြောင်းအရာများကို ကွန်ပျူတာ၏ virtual memory နေရာသို့ တိုက်ရိုက်ချိတ်ဆက်ပေးသည့် နည်းလမ်း။
+*   **CPU (Central Processing Unit)**: ကွန်ပျူတာ၏ ပင်မ processor။
+*   **GPU (Graphics Processing Unit)**: ဂရပ်ဖစ်လုပ်ဆောင်မှုအတွက် အထူးဒီဇိုင်းထုတ်ထားသော processor တစ်မျိုးဖြစ်သော်လည်း AI/ML လုပ်ငန်းများတွင် အရှိန်မြှင့်ရန် အသုံးများသည်။
+*   **RAM (Random Access Memory)**: ကွန်ပျူတာ၏ ယာယီမှတ်ဉာဏ်သိုလှောင်ရာနေရာ။
+*   **Filesystem Storage**: ကွန်ပျူတာ၏ hard disk သို့မဟုတ် solid-state drive (SSD) ကဲ့သို့သော အမြဲတမ်းသိုလှောင်ရာနေရာ။
+*   **🤗 Datasets Cache**: 🤗 Datasets library မှ download လုပ်ထားသော datasets များနှင့် processing လုပ်ထားသော ဒေတာများကို ယာယီသိမ်းဆည်းထားသော နေရာ။
+*   **Blazing Fast**: အလွန်လျင်မြန်စွာ လုပ်ဆောင်ခြင်း။
+*   **Multi-gigabyte Datasets**: gigabyte အရွယ်အစားများစွာရှိသော datasets များ။
+*   **"Blowing up your CPU"**: CPU ကို အလွန်အမင်း ဝန်ပိစေခြင်းကို ဆိုလိုသည်။
+*   **`streaming=True`**: `load_dataset()` function တွင် dataset ကို memory ထဲသို့ တစ်ခါတည်း အားလုံး load မလုပ်ဘဲ၊ လိုအပ်သလို အပိုင်းလိုက် stream လုပ်ရန် သတ်မှတ်ခြင်း။
+*   **`IterableDataset`**: 🤗 Datasets library ၏ class တစ်ခုဖြစ်ပြီး dataset ကို generator တစ်ခုအဖြစ် လုပ်ဆောင်စေကာ memory ထဲသို့ ဒေတာအားလုံးကို တစ်ခါတည်း load မလုပ်ဘဲ လိုအပ်သလို ထုတ်ပေးသည်။
+*   **Generator**: Python တွင် iteration လုပ်နိုင်သော object တစ်ခုဖြစ်ပြီး ၎င်းသည် အရာအားလုံးကို memory ထဲသို့ တစ်ပြိုင်နက်တည်း သိမ်းဆည်းမထားဘဲ လိုအပ်သလို တန်ဖိုးများကို ထုတ်ပေးသည်။
+*   **Container**: Python တွင် elements များကို သိမ်းဆည်းထားနိုင်သော object (ဥပမာ- list, tuple, dictionary)။
+*   **`next(iter(dataset))`**: `IterableDataset` မှ နောက်ထပ် element တစ်ခုကို ရယူရန် အသုံးပြုသော Python code။
+*   **`allocine` Dataset**: Hugging Face Hub ပေါ်ရှိ dataset တစ်ခု (ပြင်သစ်ရုပ်ရှင် reviews များ ပါဝင်သည်)။
+*   **Dataset Card**: Hugging Face Hub တွင် dataset တစ်ခုစီအတွက် ပါရှိသော အချက်အလက်များပါသည့် စာမျက်နှာ။
+*   **Intended Use**: Dataset ကို အသုံးပြုရန် ရည်ရွယ်ထားသော ကိစ္စရပ်များ။
+*   **Supported Tasks**: Dataset ကို အသုံးပြု၍ လုပ်ဆောင်နိုင်သော လုပ်ငန်းများ။
+*   **Informed Decision**: အချက်အလက်အပြည့်အစုံကို အခြေခံပြီး ဆုံးဖြတ်ချက်ချခြင်း။
+*   **Undocumented Datasets**: အသုံးပြုပုံ၊ ကန့်သတ်ချက်များ သို့မဟုတ် ဘက်လိုက်မှုများအတွက် တရားဝင်မှတ်တမ်းမရှိသော datasets များ။
+*   **Legal Status**: ဥပဒေရေးရာ အခြေအနေ။
+*   **Murky**: ရှင်းရှင်းလင်းလင်း မရှိခြင်း၊ မရေမရာဖြစ်ခြင်း။
+*   **Privacy Restrictions**: ကိုယ်ရေးကိုယ်တာ အချက်အလက်များနှင့် ပတ်သက်သော ကန့်သတ်ချက်များ။
+*   **Licensing Restrictions**: လိုင်စင်နှင့် ပတ်သက်သော ကန့်သတ်ချက်များ။
+*   **Corpus**: စာသား (သို့မဟုတ် အခြားဒေတာ) အစုအဝေးကြီးတစ်ခု။
+*   **Bias**: Model တစ်ခု၏ ခန့်မှန်းချက်များတွင် ဒေတာ သို့မဟုတ် သင်္ချာဆိုင်ရာ အကြောင်းများကြောင့် ဖြစ်ပေါ်လာသော ဘက်လိုက်မှုများ။
+*   **Negative Consequences Downstream**: နောက်ဆက်တွဲအဆင့်များတွင် ဖြစ်ပေါ်လာနိုင်သော အနုတ်လက္ခဏာဆောင်သည့် ရလဒ်များ။
+*   **Semantic Search**: စာလုံးများကို ကိုက်ညီမှု ရှာဖွေခြင်းထက် အဓိပ္ပာယ်ပေါ်မူတည်၍ ရှာဖွေသော နည်းလမ်း။
+*   **Lexical Search**: စကားလုံးများကို တိကျသော ကိုက်ညီမှုအပေါ် အခြေခံ၍ ရှာဖွေခြင်း။
+*   **Query**: search engine တွင် ရှာဖွေရန် ထည့်သွင်းသော စကားလုံး သို့မဟုတ် စာကြောင်း။
+*   **Documents**: ရှာဖွေရန် စုစည်းထားသော စာသားအချက်အလက်များ။
+*   **Contextual Meaning**: စာသားတစ်ခု၏ အကြောင်းအရာအလိုက် အဓိပ္ပာယ်။
+*   **Embedding Vectors**: စာသား သို့မဟုတ် အခြားဒေတာများကို ဂဏန်းဆိုင်ရာ vector များအဖြစ် ကိုယ်စားပြုခြင်း။
+*   **Similarity Metric**: elements နှစ်ခုကြား ဆင်တူမှုပမာဏကို တိုင်းတာသော တန်ဖိုး။
+*   **Overlap**: အရာနှစ်ခုကြား တူညီသော သို့မဟုတ် ထပ်နေသော အစိတ်အပိုင်းများ။
+*   **Search Accuracy**: ရှာဖွေမှုရလဒ်များ၏ မှန်ကန်မှုပမာဏ။
+*   **Keyword Matching**: search query ထဲက စကားလုံးတွေနဲ့ document ထဲက စကားလုံးတွေ တိကျစွာ ကိုက်ညီမှုကို ရှာဖွေခြင်း။
+*   **Precision**: search results များထဲမှ သက်ဆိုင်ရာရလဒ်များ၏ ရာခိုင်နှုန်း။
+*   **Asymmetric Semantic Search**: Query နှင့် Document များ၏ အရှည် သို့မဟုတ် ပုံစံ ကွာခြားသည့် semantic search အမျိုးအစား (ဥပမာ- တိုတိုလေး query နှင့် ရှည်လျားသော document)။
+*   **Symmetric Semantic Search**: Query နှင့် Document များ၏ အရှည် သို့မဟုတ် ပုံစံ တူညီသည့် semantic search အမျိုးအစား။
+*   **Tabular Data**: ဇယားပုံစံဖြင့် စုစည်းထားသော ဒေတာ (rows and columns)။
+*   **Audio (Speech Processing)**: အသံအချက်အလက်များကို AI စနစ်များဖြင့် စီမံဆောင်ရွက်ခြင်း။
+*   **Computer Vision**: ကွန်ပျူတာများကို ပုံရိပ်များ သို့မဟုတ် ဗီဒီယိုများမှ အချက်အလက်များ နားလည်စေရန် သင်ကြားပေးခြင်း။
+*   **MNIST Dataset**: handwritten digits များပါဝင်သော computer vision dataset တစ်ခု။
\ No newline at end of file
diff --git a/chapters/pt/chapter1/3.mdx b/chapters/pt/chapter1/3.mdx
index 8ef239099..f1f4ec49f 100644
--- a/chapters/pt/chapter1/3.mdx
+++ b/chapters/pt/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 Nessa seção, observaremos sobre o que os modelos Transformers podem fazer e usar nossa primeira ferramenta da biblioteca 🤗 Transformers: a função `pipeline()` .
 
-<Tip>
-👀 Tá vendo o botão <em>Open in Colab</em> no topo direito? Clique nele e abra um notebook Google Colab notebook  com todas as amostras de códigos dessa seção. Esse botão estará presente em cada seção contendo exemplos de códigos. 
-
-Se você deseja rodar os exemplos localmente, nós recomendamos dar uma olhada no <a href="/course/chapter0">setup</a>.
-</Tip>
+> [!TIP]
+> 👀 Tá vendo o botão <em>Open in Colab</em> no topo direito? Clique nele e abra um notebook Google Colab com todas as amostras de código dessa seção. Esse botão estará presente em cada seção contendo exemplos de código.
+>
+> Se você deseja rodar os exemplos localmente, nós recomendamos dar uma olhada no <a href="/course/chapter0">setup</a>.
 
 ## Transformers estão por toda parte!
 
@@ -23,9 +22,8 @@ Os modelos Transformers são usados para resolver todos os tipos de tarefas de N
 
 A [biblioteca 🤗 Transformers](https://github.com/huggingface/transformers) oferece a funcionalidade para criar e usar esses modelos compartilhados. O [Model Hub](https://huggingface.co/models) contém milhares de modelos pré-treinados que qualquer um pode baixar e usar. Você pode também dar upload nos seus próprios modelos no Hub!
 
-<Tip>
-⚠️ O Hugging Face Hub não é limitado aos modelos Transformers. Qualquer um pode compartilhar quaisquer tipos de modelos ou datasets que quiserem! <a href="https://huggingface.co/join">Crie uma conta na huggingface.co</a> para se beneficiar de todos os recursos disponíveis!
-</Tip>
+> [!TIP]
+> ⚠️ O Hugging Face Hub não é limitado aos modelos Transformers. Qualquer um pode compartilhar quaisquer tipos de modelos ou datasets que quiser! <a href="https://huggingface.co/join">Crie uma conta na huggingface.co</a> para se beneficiar de todos os recursos disponíveis!
 
 Antes de aprofundarmos sobre como os modelos Transformers funcionam por debaixo dos panos, vamos olhar alguns exemplos de como eles podem ser usados para solucionar alguns problemas de NLP interessantes.
 
@@ -104,11 +102,8 @@ classifier(
 
 Esse pipeline é chamado de _zero-shot_ porque você não precisa fazer o ajuste fino do modelo nos dados que você o utiliza. Pode diretamente retornar scores de probabilidade para qualquer lista de rótulos que você quiser!
 
-<Tip>
-
-✏️ **Experimente!** Brinque com suas próprias sequências e rótulos e veja como o modelo se comporta.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Brinque com suas próprias sequências e rótulos e veja como o modelo se comporta.
 
 
 ## Geração de Texto
@@ -134,11 +129,8 @@ generator(
 
 Você pode controlar quão diferentes sequências são geradas com o argumento `num_return_sequences` e o tamanho total da saída de texto (*output*) com o argumento `max_length`.
 
-<Tip>
-
-✏️ **Experimente!** Use os argumentos `num_return_sequences` e `max_length` para gerar 2 textos com 15 palavras cada.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use os argumentos `num_return_sequences` e `max_length` para gerar 2 textos com 15 palavras cada.
 
 
 ## Usando qualquer modelo do Hub em um pipeline
@@ -170,11 +162,8 @@ Você pode refinar sua pesquisa por um modelo clicando nas tags de linguagem, e
 
 Uma vez que você seleciona o modelo clicando nele, você irá ver que há um widget que permite que você teste-o diretamente online. Desse modo você pode rapidamente testar as capacidades do modelo antes de baixa-lo.
 
-<Tip>
-
-✏️ **Experimente!** Use os filtros para encontrar um modelo de geração de texto em outra lingua. Fique à vontade para brincar com o widget e usa-lo em um pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use os filtros para encontrar um modelo de geração de texto em outra língua. Fique à vontade para brincar com o widget e usá-lo em um pipeline!
 
 ### A API de Inferência
 
@@ -206,11 +195,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 O argumento `top_k` controla quantas possibilidades você quer que sejam geradas. Note que aqui o modelo preenche com uma palavra `<máscara>` especial, que é frequentemente referida como *mask token*.  Outros modelos de preenchimento de máscara podem ter diferentes *mask tokens*, então é sempre bom verificar uma palavra máscara apropriada quando explorar outros modelos. Um modo de checar isso é olhando para a palavra máscara usada no widget.
 
-<Tip>
-
-✏️ **Experimente!**  Pesquise pelo modelo `bert-base-cased` no Hub e identifique suas palavras máscara no widget da API de inferência. O que esse modelo prediz para a sentença em nosso `pipeline` no exemplo acima?
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!**  Pesquise pelo modelo `bert-base-cased` no Hub e identifique suas palavras máscara no widget da API de inferência. O que esse modelo prediz para a sentença em nosso `pipeline` no exemplo acima?
 
 ## Reconhecimento de entidades nomeadas
 
@@ -234,11 +220,8 @@ Aqui o modelo corretamente identificou que Sylvain é uma pessoa (PER), Hugging
 
 Nós passamos a opção `grouped_entities=True` na criação da função do pipelina para dize-lo para reagrupar juntos as partes da sentença que correspondem à mesma entidade: aqui o modelo agrupou corretamente "Hugging" e "Face" como única organização, ainda que o mesmo nome consista em múltiplas palavras. Na verdade, como veremos no próximo capítulo, o pré-processamento até mesmo divide algumas palavras em partes menores. Por exemplo, `Sylvain`  é dividido em 4 pedaços: `S`, `##yl`, `##va`, e `##in`. No passo de pós-processamento, o pipeline satisfatoriamente reagrupa esses pedaços.
 
-<Tip>
-
-✏️ **Experimente!** Procure no Model Hub por um modelo capaz de fazer o tageamento de partes do discurso (usualmente abreviado como POS) em inglês. O que o modelo prediz para a sentença no exemplo acima?
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Procure no Model Hub por um modelo capaz de fazer o tageamento de partes do discurso (usualmente abreviado como POS) em inglês. O que o modelo prediz para a sentença no exemplo acima?
 
 ## Responder perguntas
 
@@ -322,10 +305,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 Como a geração de texto e a sumarização, você pode especificar o tamanho máximo  `max_length`  e mínimo `min_length` para o resultado.
 
-<Tip>
-
-✏️ **Experimente!** Pesquise por modelos de tradução em outras línguas e experimente traduzir a sentença anterior em idiomas diferentes.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Pesquise por modelos de tradução em outras línguas e experimente traduzir a sentença anterior em idiomas diferentes.
 
 Os pipelines mostrados até agora são em sua maioria para propósitos demonstrativos. Eles foram programados para tarefas específicas e não podem performar variações delas. No próximo capítulo, você aprenderá o que está por dentro da função `pipeline()` e como customizar seu comportamento.
diff --git a/chapters/pt/chapter2/1.mdx b/chapters/pt/chapter2/1.mdx
index 3d3b8e83e..295f158be 100644
--- a/chapters/pt/chapter2/1.mdx
+++ b/chapters/pt/chapter2/1.mdx
@@ -20,6 +20,5 @@ Este capítulo começará com um exemplo de ponta a ponta onde usamos um modelo
 Depois veremos a API do tokenizer, que é o outro componente principal da função `pipeline()`. Os Tokenizers cuidam da primeira e da última etapa do processamento, cuidando da conversão de texto para entradas numéricas para a rede neural, e da conversão de volta ao texto quando for necessário. Por fim, mostraremos a você como lidar com o envio de várias frases através de um modelo em um batch preparado, depois olharemos tudo mais atentamente a função de alto nível `tokenizer()`.
 
 
-<Tip>
-⚠️ Para beneficiar-se de todos os recursos disponíveis com o Model Hub e 🤗 Transformers, recomendamos  <a href="https://huggingface.co/join"> criar uma conta</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ Para beneficiar-se de todos os recursos disponíveis com o Model Hub e 🤗 Transformers, recomendamos <a href="https://huggingface.co/join">criar uma conta</a>.
\ No newline at end of file
diff --git a/chapters/pt/chapter2/2.mdx b/chapters/pt/chapter2/2.mdx
index 3e4a38522..6318928d1 100644
--- a/chapters/pt/chapter2/2.mdx
+++ b/chapters/pt/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-Esta é a primeira seção onde o conteúdo é ligeiramente diferente, dependendo se você usa PyTorch e TensorFlow. Para selecionar a plataforma que você prefere, basta alterar no botão no topo.
-</Tip>
+> [!TIP]
+> Esta é a primeira seção onde o conteúdo é ligeiramente diferente, dependendo se você usa PyTorch e TensorFlow. Para selecionar a plataforma que você prefere, basta alterar no botão no topo.
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -347,8 +346,5 @@ Agora podemos concluir que o modelo previu o seguinte:
 
 Reproduzimos com sucesso as três etapas do pipeline: o pré-processamento, passagem das entradas através do modelo, e o pós-processamento! Agora, vamos levar algum tempo para mergulhar mais fundo em cada uma dessas etapas.
 
-<Tip>
-
-✏️ **Experimente!** Escolha duas (ou mais) textos próprios e passe-os através do pipeline `sentiment-analysis`. Em seguida, replique as etapas que você mesmo viu aqui e verifique se você obtém os mesmos resultados!
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Escolha duas (ou mais) textos próprios e passe-os através do pipeline `sentiment-analysis`. Em seguida, replique as etapas que você mesmo viu aqui e verifique se você obtém os mesmos resultados!
diff --git a/chapters/pt/chapter2/4.mdx b/chapters/pt/chapter2/4.mdx
index aab3ec7fe..a0346e272 100644
--- a/chapters/pt/chapter2/4.mdx
+++ b/chapters/pt/chapter2/4.mdx
@@ -221,11 +221,8 @@ print(ids)
 
 Estas saídas, uma vez convertidas no tensor com a estrutura apropriada, podem então ser usadas como entradas para um modelo como visto anteriormente neste capítulo.
 
-<Tip>
-
-✏️ **Experimente realizar isso!** Replicar os dois últimos passos (tokenização e conversão para IDs de entrada) nas frases de entrada que usamos na seção 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Verifique se você recebe os mesmos IDs de entrada que recebemos antes!
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente realizar isso!** Replicar os dois últimos passos (tokenização e conversão para IDs de entrada) nas frases de entrada que usamos na seção 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Verifique se você recebe os mesmos IDs de entrada que recebemos antes!
 
 ## Decoding
 
diff --git a/chapters/pt/chapter2/5.mdx b/chapters/pt/chapter2/5.mdx
index 700cf373a..003e72fa9 100644
--- a/chapters/pt/chapter2/5.mdx
+++ b/chapters/pt/chapter2/5.mdx
@@ -180,11 +180,8 @@ batched_ids = [ids, ids]
 
 Este é um lote de duas sequências idênticas!
 
-<Tip>
-
-✏️ **Experimente!** Converta esta lista de `batched_ids` em um tensor e passe-a através de seu modelo. Verifique se você obtém os mesmos logits que antes (mas duas vezes)!
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Converta esta lista de `batched_ids` em um tensor e passe-a através de seu modelo. Verifique se você obtém os mesmos logits que antes (mas duas vezes)!
 
 O Batching permite que o modelo funcione quando você o alimenta com várias frases. Usar várias sequências é tão simples quanto construir um lote com uma única sequência. Há uma segunda questão, no entanto. Quando você está tentando agrupar duas (ou mais) sentenças, elas podem ser de comprimentos diferentes. Se você já trabalhou com tensores antes, você sabe que eles precisam ser de forma retangular, então você não será capaz de converter a lista de IDs de entrada em um tensor diretamente. Para contornar este problema, normalmente realizamos uma *padronização* (padding) nas entradas.
 
@@ -318,11 +315,8 @@ Agora obtemos os mesmos logits para a segunda frase do batch.
 
 Observe como o último valor da segunda sequência é um ID de padding, que é um valor 0 na máscara de atenção.
 
-<Tip>
-
-✏️ **Experimente!** Aplique a tokenização manualmente nas duas frases usadas na seção 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Passe-as através do modelo e verifique se você obtém os mesmos logits que na seção 2. Agora, agrupe-os usando o token de padding e depois crie a máscara de atenção adequada. Verifique que você obtenha os mesmos resultados ao passar pelo modelo!
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Aplique a tokenização manualmente nas duas frases usadas na seção 2 ("I've been waiting for a HuggingFace course my whole life." e "I hate this so much!"). Passe-as através do modelo e verifique se você obtém os mesmos logits que na seção 2. Agora, agrupe-os usando o token de padding e depois crie a máscara de atenção adequada. Verifique que você obtenha os mesmos resultados ao passar pelo modelo!
 
 ## Sequências mais longas
 
diff --git a/chapters/pt/chapter4/2.mdx b/chapters/pt/chapter4/2.mdx
index 70c53d600..0a547c3d6 100644
--- a/chapters/pt/chapter4/2.mdx
+++ b/chapters/pt/chapter4/2.mdx
@@ -92,6 +92,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-Ao utilizar um modelo pré-treinado, certifique-se de verificar como ele foi treinado, em quais datasets, seus limites e seus enviesamentos. Todas estas informações devem ser indicadas em seu modelo de cartão.
-</Tip>
+> [!TIP]
+> Ao utilizar um modelo pré-treinado, certifique-se de verificar como ele foi treinado, em quais datasets, seus limites e seus enviesamentos. Todas estas informações devem ser indicadas em seu modelo de cartão.
diff --git a/chapters/pt/chapter4/3.mdx b/chapters/pt/chapter4/3.mdx
index 93c652963..c97ffb29d 100644
--- a/chapters/pt/chapter4/3.mdx
+++ b/chapters/pt/chapter4/3.mdx
@@ -174,11 +174,8 @@ Clique na aba "Files and versions", e você deve ver os arquivos visíveis na se
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Teste-o!** Pegue o modelo e o tokenizer associados ao checkpoint `bert-base-cased` e carregue-os para um repo em seu namespace utilizando o método `push_to_hub()`. Verifique novamente se o repo aparece corretamente em sua página antes de excluí-lo.
-
-</Tip>
+> [!TIP]
+> ✏️ **Teste-o!** Pegue o modelo e o tokenizer associados ao checkpoint `bert-base-cased` e carregue-os para um repo em seu namespace utilizando o método `push_to_hub()`. Verifique novamente se o repo aparece corretamente em sua página antes de excluí-lo.
 
 Como você já viu, o método `push_to_hub()` aceita vários argumentos, tornando possível carregar para um repositório específico ou espaço de nomes de organizações, ou utilizar um token API diferente. Recomendamos que você dê uma olhada na especificação do método disponível diretamente na documentação [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html) para ter uma idéia do que é possível.
 
@@ -465,9 +462,8 @@ Se você olhar para os tamanhos de arquivo (por exemplo, com `ls -lh`), você de
 
 {/if}
 
-<Tip>
-✏️ Ao criar o repositório a partir da interface web, o arquivo *.gitattributes* é automaticamente configurado para considerar arquivos com certas extensões, como *.bin* e *.h5*, como arquivos grandes, e o git-lfs os rastreará sem nenhuma configuração necessária em seu lado.
-</Tip> 
+> [!TIP]
+> ✏️ Ao criar o repositório a partir da interface web, o arquivo *.gitattributes* é automaticamente configurado para considerar arquivos com certas extensões, como *.bin* e *.h5*, como arquivos grandes, e o git-lfs os rastreará sem nenhuma configuração necessária em seu lado. 
 
 Agora podemos ir em frente e proceder como normalmente faríamos com os repositórios tradicionais da Git. Podemos adicionar todos os arquivos ao ambiente de encenação do Git utilizando o comando `git add`:
 
diff --git a/chapters/pt/chapter5/2.mdx b/chapters/pt/chapter5/2.mdx
index 2d12ad4e0..c0544573a 100644
--- a/chapters/pt/chapter5/2.mdx
+++ b/chapters/pt/chapter5/2.mdx
@@ -49,10 +49,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 Podemos ver que os arquivos compactados foram substituídos por _SQuAD_it-train.json_ e _SQuAD_it-text.json_, e que os dados são armazenados no formato JSON.
 
-<Tip>
-
-✎ Se você está se perguntando por que há um `!` nos comandos shell acima, é porque estamos executando-os dentro de um Jupyter notebook. Basta remover o prefixo se você quiser baixar e descompactar o conjunto de dados dentro de um terminal.
-</Tip>
+> [!TIP]
+> ✎ Se você está se perguntando por que há um `!` nos comandos shell acima, é porque estamos executando-os dentro de um Jupyter notebook. Basta remover o prefixo se você quiser baixar e descompactar o conjunto de dados dentro de um terminal.
 
 Para carregar um arquivo JSON com a função `load_dataset()`, só precisamos saber se estamos lidando com o JSON comum (semelhante a um dicionário aninhado) ou Linhas JSON (JSON line-separated JSON). Como muitos conjuntos de dados que respondem a perguntas, o SQuAD utiliza o formato aninhado, com todo o texto armazenado em um campo `data`. Isto significa que podemos carregar o conjunto de dados especificando o argumento  `field` da seguinte forma:
 
@@ -128,11 +126,8 @@ DatasetDict({
 Isto é exatamente o que queríamos. Agora, podemos aplicar várias técnicas de pré-processamento para limpar os dados, assinalar as revisões, e assim por diante.
 
 
-<Tip>
-
-O argumento `data_files` da função `load_dataset()` é bastante flexível e pode ser um único caminho de arquivo ou uma lista de caminhos de arquivo, ou um dicionário que mapeia nomes divididos para caminhos de arquivo. Você também pode incluir arquivos que correspondam a um padrão especificado de acordo com as regras utilizadas pela Unix shell (por exemplo, você pode adicionar todos os arquivos JSON em um diretório como uma única divisão, definindo `data_files="*.json"`). Consulte a [documentação](https://huggingface.co/docs/datasets/loading#local-and-remote-files) do 🤗 Datasets para obter mais detalhes.
-
-</Tip>
+> [!TIP]
+> O argumento `data_files` da função `load_dataset()` é bastante flexível e pode ser um único caminho de arquivo ou uma lista de caminhos de arquivo, ou um dicionário que mapeia nomes divididos para caminhos de arquivo. Você também pode incluir arquivos que correspondam a um padrão especificado de acordo com as regras utilizadas pela Unix shell (por exemplo, você pode adicionar todos os arquivos JSON em um diretório como uma única divisão, definindo `data_files="*.json"`). Consulte a [documentação](https://huggingface.co/docs/datasets/loading#local-and-remote-files) do 🤗 Datasets para obter mais detalhes.
 
 Os scripts de carregamento em 🤗 Datasets realmente suportam a descompressão automática dos arquivos de entrada, então poderíamos ter pulado o uso do `gzip` ao apontar o argumento `data_files` diretamente para os arquivos compactados:
 
@@ -160,8 +155,7 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 Isto retorna o mesmo objeto `DatasetDict` obtido anteriormente, mas nos poupa o passo de baixar e descomprimir manualmente os arquivos _SQuAD_it-*.json.gz_. Isto envolve nas várias formas de carregar conjuntos de dados que não estão hospedados no Hugging Face Hub. Agora que temos um conjunto de dados para brincar, vamos sujar as mãos com várias técnicas de manipulação de dados!
 
-<Tip>
-✏️ **Tente fazer isso!** Escolha outro conjunto de dados hospedado no GitHub ou no [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) e tente carregá-lo tanto local como remotamente usando as técnicas introduzidas acima. Para pontos bônus, tente carregar um conjunto de dados que esteja armazenado em formato CSV ou texto (veja a [documentação](https://huggingface.co/docs/datasets/loading#local-and-remote-files) para mais informações sobre estes formatos).
-</Tip>
+> [!TIP]
+> ✏️ **Tente fazer isso!** Escolha outro conjunto de dados hospedado no GitHub ou no [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) e tente carregá-lo tanto local como remotamente usando as técnicas introduzidas acima. Para pontos bônus, tente carregar um conjunto de dados que esteja armazenado em formato CSV ou texto (veja a [documentação](https://huggingface.co/docs/datasets/loading#local-and-remote-files) para mais informações sobre estes formatos).
 
 
diff --git a/chapters/pt/chapter5/3.mdx b/chapters/pt/chapter5/3.mdx
index 707145410..997a05e83 100644
--- a/chapters/pt/chapter5/3.mdx
+++ b/chapters/pt/chapter5/3.mdx
@@ -89,11 +89,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Experimente!** Use a função `Dataset.unique()` para encontrar o número de medicamentos e condições exclusivos nos conjuntos de treinamento e teste.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use a função `Dataset.unique()` para encontrar o número de medicamentos e condições exclusivos nos conjuntos de treinamento e teste.
 
 Em seguida, vamos normalizar todos os rótulos `condition` usando `Dataset.map()`. Como fizemos com a tokenização no [Capítulo 3](/course/chapter3), podemos definir uma função simples que pode ser aplicada em todas as linhas de cada divisão em `drug_dataset`:
 
@@ -217,11 +214,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Como suspeitávamos, algumas revisões contêm apenas uma única palavra, que, embora possa ser boa para análise de sentimentos, não seria informativa se quisermos prever a condição.
 
-<Tip>
-
-🙋 Uma forma alternativa de adicionar novas colunas a um conjunto de dados é com a função `Dataset.add_column()`. Isso permite que você forneça a coluna como uma lista Python ou array NumPy e pode ser útil em situações em que `Dataset.map()` não é adequado para sua análise.
-
-</Tip>
+> [!TIP]
+> 🙋 Uma forma alternativa de adicionar novas colunas a um conjunto de dados é com a função `Dataset.add_column()`. Isso permite que você forneça a coluna como uma lista Python ou array NumPy e pode ser útil em situações em que `Dataset.map()` não é adequado para sua análise.
 
 Vamos usar a função `Dataset.filter()` para remover comentários que contenham menos de 30 palavras. Da mesma forma que fizemos com a coluna "condição", podemos filtrar as reviews muito curtas exigindo que as reviews tenham um comprimento acima desse limite.
 
@@ -236,11 +230,8 @@ print(drug_dataset.num_rows)
 
 Como você pode ver, isso removeu cerca de 15% das avaliações de nossos conjuntos de treinamento e teste originais.
 
-<Tip>
-
-✏️ **Experimente!** Use a função `Dataset.sort()` para inspecionar as resenhas com o maior número de palavras. Consulte a [documentação](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) para ver qual argumento você precisa usar para classificar as avaliações por tamanho em ordem decrescente.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use a função `Dataset.sort()` para inspecionar as resenhas com o maior número de palavras. Consulte a [documentação](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) para ver qual argumento você precisa usar para classificar as avaliações por tamanho em ordem decrescente.
 
 A última coisa com a qual precisamos lidar é a presença de códigos de caracteres HTML em nossas análises. Podemos usar o módulo `html` do Python para liberar esses caracteres, assim:
 
@@ -297,11 +288,8 @@ Como você viu no [Capítulo 3](/course/chapter3), podemos passar um ou vários
 
 Você também pode cronometrar uma célula inteira colocando `%%time` no início da célula. No hardware em que executamos isso, ele mostrava 10,8s para esta instrução (é o número escrito depois de "Wall time").
 
-<Tip>
-
-✏️ **Experimente!** Execute a mesma instrução com e sem `batched=True`, então tente com um tokenizer lento (adicione `use_fast=False` no método `AutoTokenizer.from_pretrained()`) para que você possa veja quais números você obtém em seu hardware.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Execute a mesma instrução com e sem `batched=True`, então tente com um tokenizer lento (adicione `use_fast=False` no método `AutoTokenizer.from_pretrained()`) para que você possa veja quais números você obtém em seu hardware.
 
 Aqui estão os resultados que obtivemos com e sem batching, com um tokenizer rápido e lento:
 
@@ -338,19 +326,13 @@ Opções                          | Tokenizador rápido | Tokenizador lento
 
 Esses são resultados muito mais razoáveis ​​para o tokenizer lento, mas o desempenho do tokenizer rápido também foi substancialmente melhorado. Observe, no entanto, que nem sempre será o caso -- para valores de `num_proc` diferentes de 8, nossos testes mostraram que era mais rápido usar `batched=True` sem essa opção. Em geral, não recomendamos o uso de multiprocessamento Python para tokenizers rápidos com `batched=True`.
 
-<Tip>
-
-Usar `num_proc` para acelerar seu processamento geralmente é uma ótima idéia, desde que a função que você está usando não esteja fazendo algum tipo de multiprocessamento próprio.
-
-</Tip>
+> [!TIP]
+> Usar `num_proc` para acelerar seu processamento geralmente é uma ótima idéia, desde que a função que você está usando não esteja fazendo algum tipo de multiprocessamento próprio.
 
 Toda essa funcionalidade condensada em um único método já é incrível, mas tem mais! Com `Dataset.map()` e `batched=True` você pode alterar o número de elementos em seu conjunto de dados. Isso é super útil em muitas situações em que você deseja criar vários recursos de treinamento a partir de um exemplo, e precisaremos fazer isso como parte do pré-processamento de várias das tarefas de PNL que realizaremos no [Capítulo 7](/course/chapter7).
 
-<Tip>
-
-💡 No aprendizado de máquina, um _exemplo_ geralmente é definido como o conjunto de _recursos_ que alimentamos o modelo. Em alguns contextos, esses recursos serão o conjunto de colunas em um `Dataset`, mas em outros (como aqui e para resposta a perguntas), vários recursos podem ser extraídos de um único exemplo e pertencer a uma única coluna.
-
-</Tip>
+> [!TIP]
+> 💡 No aprendizado de máquina, um _exemplo_ geralmente é definido como o conjunto de _recursos_ que alimentamos o modelo. Em alguns contextos, esses recursos serão o conjunto de colunas em um `Dataset`, mas em outros (como aqui e para resposta a perguntas), vários recursos podem ser extraídos de um único exemplo e pertencer a uma única coluna.
 
 Vamos dar uma olhada em como funciona! Aqui vamos tokenizar nossos exemplos e truncá-los para um comprimento máximo de 128, mas pediremos ao tokenizer para retornar *todos* os pedaços dos textos em vez de apenas o primeiro. Isso pode ser feito com `return_overflowing_tokens=True`:
 ```py
@@ -518,11 +500,8 @@ Vamos criar um `pandas.DataFrame` para todo o conjunto de treinamento selecionan
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 `Dataset.set_format()` altera o formato de retorno para o método dunder `__getitem__()` do conjunto de dados. Isso significa que quando queremos criar um novo objeto como `train_df` a partir de um `Dataset` no formato `"pandas"`, precisamos dividir todo o conjunto de dados para obter um `pandas.DataFrame`. Você pode verificar por si mesmo que o tipo de `drug_dataset["train"]` é `Dataset`, independentemente do formato de saída.
-
-</Tip>
+> [!TIP]
+> 🚨 `Dataset.set_format()` altera o formato de retorno para o método dunder `__getitem__()` do conjunto de dados. Isso significa que quando queremos criar um novo objeto como `train_df` a partir de um `Dataset` no formato `"pandas"`, precisamos dividir todo o conjunto de dados para obter um `pandas.DataFrame`. Você pode verificar por si mesmo que o tipo de `drug_dataset["train"]` é `Dataset`, independentemente do formato de saída.
 
 
 A partir daqui, podemos usar todas as funcionalidades do Pandas que queremos. Por exemplo, podemos fazer um encadeamento sofisticado para calcular a distribuição de classes entre as entradas `condition`:
@@ -593,11 +572,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **Experimente!** Calcule a classificação média por medicamento e armazene o resultado em um novo `Dataset`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Calcule a classificação média por medicamento e armazene o resultado em um novo `Dataset`.
 
 Isso encerra nosso tour pelas várias técnicas de pré-processamento disponíveis em 🤗 Datasets. Para completar a seção, vamos criar um conjunto de validação para preparar o conjunto de dados para treinar um classificador. Antes de fazer isso, vamos redefinir o formato de saída de `drug_dataset` de `"pandas"` para `"arrow"`:
 
diff --git a/chapters/pt/chapter5/4.mdx b/chapters/pt/chapter5/4.mdx
index 2a64e2471..f424f360c 100644
--- a/chapters/pt/chapter5/4.mdx
+++ b/chapters/pt/chapter5/4.mdx
@@ -44,11 +44,8 @@ Dataset({
 
 Podemos ver que há 15.518.009 linhas e 2 colunas em nosso conjunto de dados - isso é muito!
 
-<Tip>
-
-✎ Por padrão, 🤗 Datasets descompactará os arquivos necessários para carregar um dataset. Se você quiser preservar espaço no disco rígido, você pode passar `DownloadConfig(delete_extracted=True)` para o argumento `download_config` de `load_dataset()`. Consulte a [documentação](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) para obter mais detalhes.
-
-</Tip>
+> [!TIP]
+> ✎ Por padrão, 🤗 Datasets descompactará os arquivos necessários para carregar um dataset. Se você quiser preservar espaço no disco rígido, você pode passar `DownloadConfig(delete_extracted=True)` para o argumento `download_config` de `load_dataset()`. Consulte a [documentação](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) para obter mais detalhes.
 
 Vamos inspecionar o conteúdo do primeiro exemplo:
 
@@ -99,11 +96,8 @@ Dataset size (cache file) : 19.54 GB
 
 Legal -- apesar de ter quase 20 GB de tamanho, podemos carregar e acessar o conjunto de dados com muito menos RAM!
 
-<Tip>
-
-✏️ **Experimente!** Escolha um dos [subconjuntos](https://the-eye.eu/public/AI/pile_preliminary_components/) da `The Pile` que é maior que a RAM do seu laptop ou desktop, carregue com 🤗 Datasets e meça a quantidade de RAM usada. Observe que, para obter uma medição precisa, você desejará fazer isso em um novo processo. Você pode encontrar os tamanhos descompactados de cada subconjunto na Tabela 1 do [artigo do `The Pile`](https://arxiv.org/abs/2101.00027).
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Escolha um dos [subconjuntos](https://the-eye.eu/public/AI/pile_preliminary_components/) da `The Pile` que é maior que a RAM do seu laptop ou desktop, carregue com 🤗 Datasets e meça a quantidade de RAM usada. Observe que, para obter uma medição precisa, você desejará fazer isso em um novo processo. Você pode encontrar os tamanhos descompactados de cada subconjunto na Tabela 1 do [artigo do `The Pile`](https://arxiv.org/abs/2101.00027).
 
 Se você estiver familiarizado com Pandas, esse resultado pode ser uma surpresa por causa da famosa [regra de ouro] de Wes Kinney (https://wesmckinney.com/blog/apache-arrow-pandas-internals/) de que você normalmente precisa de 5 para 10 vezes mais RAM do que o tamanho do seu conjunto de dados. Então, como 🤗 Datasets resolve esse problema de gerenciamento de memória? 🤗 Os conjuntos de dados tratam cada conjunto de dados como um [arquivo mapeado em memória](https://en.wikipedia.org/wiki/Memory-mapped_file), que fornece um mapeamento entre RAM e armazenamento do sistema de arquivos que permite que a biblioteca acesse e opere em elementos do conjunto de dados sem precisar carregá-lo totalmente na memória.
 
@@ -131,11 +125,8 @@ print(
 
 Aqui usamos o módulo `timeit` do Python para medir o tempo de execução do `code_snippet`. Normalmente, você poderá iterar em um conjunto de dados a uma velocidade de alguns décimos de GB/s a vários GB/s. Isso funciona muito bem para a grande maioria dos aplicativos, mas às vezes você terá que trabalhar com um conjunto de dados grande demais para ser armazenado no disco rígido do seu laptop. Por exemplo, se tentássemos baixar o Pile por completo, precisaríamos de 825 GB de espaço livre em disco! Para lidar com esses casos, 🤗 Datasets fornece um recurso de streaming que nos permite baixar e acessar elementos em tempo real, sem a necessidade de baixar todo o conjunto de dados. Vamos dar uma olhada em como isso funciona.
 
-<Tip>
-
-💡 Nos notebooks Jupyter, você também pode cronometrar células usando a [`%%timeit` função mágica](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 Nos notebooks Jupyter, você também pode cronometrar células usando a [`%%timeit` função mágica](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Conjuntos de dados em streaming
 
@@ -173,11 +164,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Para acelerar a tokenização com streaming você pode passar `batched=True`, como vimos na última seção. Ele processará os exemplos lote por lote; o tamanho do lote padrão é 1.000 e pode ser especificado com o argumento `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Para acelerar a tokenização com streaming você pode passar `batched=True`, como vimos na última seção. Ele processará os exemplos lote por lote; o tamanho do lote padrão é 1.000 e pode ser especificado com o argumento `batch_size`.
 
 Você também pode embaralhar um conjunto de dados transmitido usando `IterableDataset.shuffle()`, mas, diferentemente de `Dataset.shuffle()`, isso apenas embaralha os elementos em um `buffer_size` predefinido:
 
@@ -278,10 +266,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Experimente!** Use um dos grandes corpora Common Crawl como [`mc4`](https://huggingface.co/datasets/mc4) ou [`oscar`](https://huggingface.co/datasets/oscar) para criar um conjunto de dados multilíngue de streaming que represente as proporções faladas de idiomas em um país de sua escolha. Por exemplo, as quatro línguas nacionais na Suíça são alemão, francês, italiano e romanche, então você pode tentar criar um corpus suíço amostrando os subconjuntos do Oscar de acordo com sua proporção falada.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use um dos grandes corpora Common Crawl como [`mc4`](https://huggingface.co/datasets/mc4) ou [`oscar`](https://huggingface.co/datasets/oscar) para criar um conjunto de dados multilíngue de streaming que represente as proporções faladas de idiomas em um país de sua escolha. Por exemplo, as quatro línguas nacionais na Suíça são alemão, francês, italiano e romanche, então você pode tentar criar um corpus suíço amostrando os subconjuntos do Oscar de acordo com sua proporção falada.
 
 Agora você tem todas as ferramentas necessárias para carregar e processar conjuntos de dados de todas as formas e tamanhos, mas, a menos que tenha muita sorte, chegará um ponto em sua jornada de PNL em que você terá que criar um conjunto de dados para resolver o problema. problema em mãos. Esse é o tema da próxima seção!
diff --git a/chapters/pt/chapter5/5.mdx b/chapters/pt/chapter5/5.mdx
index 23e3831dc..09dc6b26d 100644
--- a/chapters/pt/chapter5/5.mdx
+++ b/chapters/pt/chapter5/5.mdx
@@ -113,11 +113,8 @@ response.json()
 
 Uau, é muita informação! Podemos ver campos úteis como `title`, `body` e `number` que descrevem a issue, bem como informações sobre o usuário do GitHub que abriu a issue.
 
-<Tip>
-
-✏️ **Experimente!** Clique em alguns dos URLs na carga JSON acima para ter uma ideia de que tipo de informação cada issue do GitHub está vinculado.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Clique em alguns dos URLs na carga JSON acima para ter uma ideia de que tipo de informação cada issue do GitHub está vinculado.
 
 Conforme descrito na [documentação] do GitHub (https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting), as solicitações não autenticadas são limitadas a 60 solicitações por hora. Embora você possa aumentar o parâmetro de consulta `per_page` para reduzir o número de solicitações feitas, você ainda atingirá o limite de taxa em qualquer repositório que tenha mais do que alguns milhares de issues. Então, em vez disso, você deve seguir as [instruções] do GitHub (https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) sobre como criar um _token de acesso pessoal_ para que você pode aumentar o limite de taxa para 5.000 solicitações por hora. Depois de ter seu token, você pode incluí-lo como parte do cabeçalho da solicitação:
 
@@ -126,11 +123,8 @@ GITHUB_TOKEN = xxx  # Copy your GitHub token here
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ Não compartilhe um notebook com seu `GITHUB_TOKEN` colado nele. Recomendamos que você exclua a última célula depois de executá-la para evitar o vazamento dessas informações acidentalmente. Melhor ainda, armazene o token em um arquivo *.env* e use a [`python-dotenv` library](https://github.com/theskumar/python-dotenv) para carregá-lo automaticamente para você como uma variável de ambiente.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Não compartilhe um notebook com seu `GITHUB_TOKEN` colado nele. Recomendamos que você exclua a última célula depois de executá-la para evitar o vazamento dessas informações acidentalmente. Melhor ainda, armazene o token em um arquivo *.env* e use a [`python-dotenv` library](https://github.com/theskumar/python-dotenv) para carregá-lo automaticamente para você como uma variável de ambiente.
 
 Agora que temos nosso token de acesso, vamos criar uma função que possa baixar todas as issues de um repositório do GitHub:
 
@@ -238,11 +232,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **Experimente!** Calcule o tempo médio que leva para fechar as issues em 🤗 Datasets. Você pode achar a função `Dataset.filter()` útil para filtrar os pull requests e as issues abertas, e você pode usar a função `Dataset.set_format()` para converter o conjunto de dados em um `DataFrame` para que você possa manipular facilmente os timestamps `created_at` e `closed_at`. Para pontos de bônus, calcule o tempo médio que leva para fechar os pull requests.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Calcule o tempo médio que leva para fechar as issues em 🤗 Datasets. Você pode achar a função `Dataset.filter()` útil para filtrar os pull requests e as issues abertas, e você pode usar a função `Dataset.set_format()` para converter o conjunto de dados em um `DataFrame` para que você possa manipular facilmente os timestamps `created_at` e `closed_at`. Para pontos de bônus, calcule o tempo médio que leva para fechar os pull requests.
 
 Embora possamos continuar a limpar o conjunto de dados descartando ou renomeando algumas colunas, geralmente é uma boa prática manter o conjunto de dados o mais "bruto" possível neste estágio para que possa ser facilmente usado em vários aplicativos.
 
@@ -378,11 +369,8 @@ repo_url
 
 Neste exemplo, criamos um repositório de conjunto de dados vazio chamado `github-issues` sob o nome de usuário `lewtun` (o nome de usuário deve ser seu nome de usuário do Hub quando você estiver executando este código!).
 
-<Tip>
-
-✏️ **Experimente!** Use seu nome de usuário e senha do Hugging Face Hub para obter um token e criar um repositório vazio chamado `github-issues`. Lembre-se de **nunca salvar suas credenciais** no Colab ou em qualquer outro repositório, pois essas informações podem ser exploradas por agentes mal-intencionados.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use seu nome de usuário e senha do Hugging Face Hub para obter um token e criar um repositório vazio chamado `github-issues`. Lembre-se de **nunca salvar suas credenciais** no Colab ou em qualquer outro repositório, pois essas informações podem ser exploradas por agentes mal-intencionados.
 
 Em seguida, vamos clonar o repositório do Hub para nossa máquina local e copiar nosso arquivo de conjunto de dados para ele. O 🤗 Hub fornece uma classe `Repository` útil que envolve muitos dos comandos comuns do Git, portanto, para clonar o repositório remoto, basta fornecer o URL e o caminho local para o qual desejamos clonar:
 
@@ -428,11 +416,8 @@ Dataset({
 
 Legal, nós enviamos nosso conjunto de dados para o Hub e está disponível para outros usarem! Há apenas uma coisa importante a fazer: adicionar um _cartão de conjunto de dados_ que explica como o corpus foi criado e fornece outras informações úteis para a comunidade.
 
-<Tip>
-
-💡 Você também pode enviar um conjunto de dados para o Hugging Face Hub diretamente do terminal usando `huggingface-cli` e um pouco de magia Git. Consulte o [guia do 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) para obter detalhes sobre como fazer isso.
-
-</Tip>
+> [!TIP]
+> 💡 Você também pode enviar um conjunto de dados para o Hugging Face Hub diretamente do terminal usando `huggingface-cli` e um pouco de magia Git. Consulte o [guia do 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) para obter detalhes sobre como fazer isso.
 
 ## Criando um cartão do datasets
 
@@ -454,18 +439,12 @@ Você pode criar o arquivo *README.md* diretamente no Hub e encontrar um cartão
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **Experimente!** Use o aplicativo `dataset-tagging` e [guia do 🤗 datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) para concluir o *Arquivo README.md* para o conjunto de dados de issues do GitHub.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Use o aplicativo `dataset-tagging` e o [guia do 🤗 datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) para concluir o arquivo *README.md* para o conjunto de dados de issues do GitHub.
 
 É isso! Vimos nesta seção que criar um bom conjunto de dados pode ser bastante complicado, mas felizmente carregá-lo e compartilhá-lo com a comunidade não é. Na próxima seção, usaremos nosso novo conjunto de dados para criar um mecanismo de pesquisa semântica com o 🤗 datasets que podem corresponder perguntas as issues e comentários mais relevantes.
 
-<Tip>
-
-✏️ **Experimente!** Siga as etapas que seguimos nesta seção para criar um conjunto de dados de issues do GitHub para sua biblioteca de código aberto favorita (escolha algo diferente do 🤗 datasets, é claro!). Para pontos de bônus, ajuste um classificador multilabel para prever as tags presentes no campo `labels`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Siga as etapas que seguimos nesta seção para criar um conjunto de dados de issues do GitHub para sua biblioteca de código aberto favorita (escolha algo diferente do 🤗 datasets, é claro!). Para pontos de bônus, ajuste um classificador multilabel para prever as tags presentes no campo `labels`.
 
 
diff --git a/chapters/pt/chapter5/6.mdx b/chapters/pt/chapter5/6.mdx
index f0a868c57..689512757 100644
--- a/chapters/pt/chapter5/6.mdx
+++ b/chapters/pt/chapter5/6.mdx
@@ -186,11 +186,8 @@ Dataset({
 
 Ok, isso nos deu alguns milhares de comentários para trabalhar!
 
-<Tip>
-
-✏️ **Experimente!** Veja se você pode usar `Dataset.map()` para explodir a coluna `comments` de `issues_dataset` _sem_ recorrer ao uso de Pandas. Isso é um pouco complicado; você pode achar útil para esta tarefa a seção ["Mapeamento em lote"](https://huggingface.co/docs/datasets/v1.12.1/about_map_batch#batch-mapping) da documentação do 🤗 Dataset.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Veja se você pode usar `Dataset.map()` para explodir a coluna `comments` de `issues_dataset` _sem_ recorrer ao uso de Pandas. Isso é um pouco complicado; você pode achar útil para esta tarefa a seção ["Mapeamento em lote"](https://huggingface.co/docs/datasets/v1.12.1/about_map_batch#batch-mapping) da documentação do 🤗 Datasets.
 
 Agora que temos um comentário por linha, vamos criar uma nova coluna `comments_length` que contém o número de palavras por comentário:
 
@@ -522,8 +519,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Nada mal! Nosso segundo resultado parece corresponder à consulta.
 
-<Tip>
-
-✏️ **Experimente!** Crie sua própria consulta e veja se consegue encontrar uma resposta nos documentos recuperados. Você pode ter que aumentar o parâmetro `k` em `Dataset.get_nearest_examples()` para ampliar a pesquisa.
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️ **Experimente!** Crie sua própria consulta e veja se consegue encontrar uma resposta nos documentos recuperados. Você pode ter que aumentar o parâmetro `k` em `Dataset.get_nearest_examples()` para ampliar a pesquisa.
\ No newline at end of file
diff --git a/chapters/pt/chapter6/2.mdx b/chapters/pt/chapter6/2.mdx
index 4cefd09ea..35e02f6a0 100644
--- a/chapters/pt/chapter6/2.mdx
+++ b/chapters/pt/chapter6/2.mdx
@@ -11,11 +11,8 @@ Se um modelo de linguagem não estiver disponível no idioma que você estiver i
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ Treinar um tokenizador não é o mesmo que treinar um modelo! O treinamento de um modelo usa o gradiente descendente estocástico para fazer a perda um pouquinho menor a cada batch. Portanto, é aleatório por natureza (o que significa que você deve definir seeds para obter o mesmo resultado quando estiver fazendo o mesmo treino novamente). Treinar um tokenizador é um processo estatístico que tenta identificar que subpalavras são as melhores para escolher dependendo do algoritmo de tokenização. Portanto, este processo é determinístico, o que significa que você terá sempre o mesmo resultado quando for treinar com o mesmo algoritmo no mesmo corpus.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Treinar um tokenizador não é o mesmo que treinar um modelo! O treinamento de um modelo usa o gradiente descendente estocástico para fazer a perda um pouquinho menor a cada batch. Portanto, é aleatório por natureza (o que significa que você deve definir seeds para obter o mesmo resultado quando estiver fazendo o mesmo treino novamente). Treinar um tokenizador é um processo estatístico que tenta identificar que subpalavras são as melhores para escolher dependendo do algoritmo de tokenização. Portanto, este processo é determinístico, o que significa que você terá sempre o mesmo resultado quando for treinar com o mesmo algoritmo no mesmo corpus.
 
 ## Montando um corpus
 
diff --git a/chapters/pt/chapter6/3.mdx b/chapters/pt/chapter6/3.mdx
index 3233b9530..3ed672ffc 100644
--- a/chapters/pt/chapter6/3.mdx
+++ b/chapters/pt/chapter6/3.mdx
@@ -32,10 +32,8 @@ Na discussão a seguir, muitas vezes faremos a distinção entre tokenizadores "
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ Ao tokenizar uma única frase, você nem sempre verá uma diferença de velocidade entre as versões lenta e rápida do mesmo tokenizador. Na verdade, a versão rápida pode ser mais lenta! É somente ao tokenizar muitos textos em paralelo ao mesmo tempo que você poderá ver a diferença com maior nitidez.
-</Tip>
+> [!WARNING]
+> ⚠️ Ao tokenizar uma única frase, você nem sempre verá uma diferença de velocidade entre as versões lenta e rápida do mesmo tokenizador. Na verdade, a versão rápida pode ser mais lenta! É somente ao tokenizar muitos textos em paralelo ao mesmo tempo que você poderá ver a diferença com maior nitidez.
 
 ## Codificação em lote
 
@@ -105,13 +103,10 @@ encoding.word_ids()
 
 Podemos observar que as palavras especiais do tokenizador `[CLS]` e `[SEP]` são mapeados para `None`, e então cada token é mapeada para a palavra de onde se origina. Isso é especialmente útil para determinar se um token está no início da palavra ou se dois tokens estão em uma mesma palavra. Poderíamos contar com o prefix `##` para isso, mas apenas para tokenizadores do tipo BERT; este método funciona para qualquer tipo de tokenizador, desde que seja do tipo rápido. No próximo capítulo, nós veremos como podemos usar esse recurso para aplicar os rótulos que temos para cada palavra adequadamente aos tokens em tarefas como reconhecimento de entidade nomeada (em inglês, Named Entity Recognition, ou NER) e marcação de parte da fala (em inglês, part-of-speech, ou POS). Também podemos usá-lo para mascarar todos os tokens provenientes da mesma palavra na modelagem de linguagem mascarada (uma técnica chamada _mascaramento da palavra inteira_)
 
-<Tip>
-
-A noção do que é uma palavra é complicada. Por exemplo, "d'água" (uma contração de "da água") conta como uma ou duas palavras? Na verdade, depende do tokenizador e da operação de pré-tokenização que é aplicada. Alguns tokenizadores apenas dividem em espaços, então eles considerarão isso como uma palavra. Outros usam pontuação em cima dos espaços, então considerarão duas palavras.
-
-✏️ **Experimente!** Crie um tokenizador a partir dos checkpoints de `bert-base-cased `e `roberta-base` e tokenize "81s" com eles. O que você observa? Quais são os IDs das palavras?
-
-</Tip>
+> [!TIP]
+> A noção do que é uma palavra é complicada. Por exemplo, "d'água" (uma contração de "da água") conta como uma ou duas palavras? Na verdade, depende do tokenizador e da operação de pré-tokenização que é aplicada. Alguns tokenizadores apenas dividem em espaços, então eles considerarão isso como uma palavra. Outros usam pontuação em cima dos espaços, então considerarão duas palavras.
+>
+> ✏️ **Experimente!** Crie um tokenizador a partir dos checkpoints de `bert-base-cased` e `roberta-base` e tokenize "81s" com eles. O que você observa? Quais são os IDs das palavras?
 
 Da mesma forma, existe um método `sentence_ids()` que podemos usar para mapear um token para a sentença de onde veio (embora, neste caso, o `token_type_ids` retornado pelo tokenizador possa nos dar a mesma informação).
 
@@ -128,11 +123,8 @@ Sylvain
 
 Como mencionamos anteriormente, isso é apoiado pelo fato de que o tokenizador rápido acompanha o intervalo de texto de cada token em uma lista de *offsets*. Para ilustrar seu uso, mostraremos a seguir como replicar manualmente os resultados do pipeline `token-classification`. 
 
-<Tip>
-
-✏️ **Experimente!** Crie seu próprio texto de exemplo e veja se você consegue entender quais tokens estão associados ao ID da palavra e também como extrair os intervalos de caracteres para uma única palavra. Como bônus, tente usar duas frases como entrada e veja se os IDs das frases fazem sentido para você.
-
-</Tip>
+> [!TIP]
+> ✏️ **Experimente!** Crie seu próprio texto de exemplo e veja se você consegue entender quais tokens estão associados ao ID da palavra e também como extrair os intervalos de caracteres para uma única palavra. Como bônus, tente usar duas frases como entrada e veja se os IDs das frases fazem sentido para você.
 
 ## Dentro do pipeline `token-classification`
 
diff --git a/chapters/pt/chapter7/1.mdx b/chapters/pt/chapter7/1.mdx
index 65009e31a..8a190ac7d 100644
--- a/chapters/pt/chapter7/1.mdx
+++ b/chapters/pt/chapter7/1.mdx
@@ -31,8 +31,5 @@ Cada seção pode ser lida de forma independente.
 {/if}
 
 
-<Tip>
-
-Se ler as seções em sequência, notará que elas têm bastante código e texto em comum. Essa repetição é intencional para que possa mergulhar (ou voltar mais tarde) em qualquer tarefa que lhe interesse e encontrar um exemplo completo.
-
-</Tip>
+> [!TIP]
+> Se ler as seções em sequência, notará que elas têm bastante código e texto em comum. Essa repetição é intencional para que possa mergulhar (ou voltar mais tarde) em qualquer tarefa que lhe interesse e encontrar um exemplo completo.
diff --git a/chapters/pt/chapter8/2.mdx b/chapters/pt/chapter8/2.mdx
index e158d0be6..08279cd4f 100644
--- a/chapters/pt/chapter8/2.mdx
+++ b/chapters/pt/chapter8/2.mdx
@@ -85,11 +85,8 @@ Oh não, algo parece ter dado errado! Se você é novo em programação, esse ti
 
 Há muitas informações contidas nesses relatórios, então vamos percorrer as partes principais juntos. A primeira coisa a notar é que os tracebacks devem ser lidos _de baixo para cima_. Isso pode soar estranho se você está acostumado a ler texto em inglês de cima para baixo, mas reflete o fato de que o traceback mostra a sequência de chamadas de função que o `pipeline` faz ao baixar o modelo e o tokenizer. (Confira o [Capítulo 2](/course/chapter2) para mais detalhes sobre como o `pipeline` funciona nos bastidores.)
 
-<Tip>
-
-🚨 Está vendo aquela caixa azul em torno de "6 frames" no traceback do Google Colab? Esse é um recurso especial do Colab, que compacta o traceback em "quadros". Se você não conseguir encontrar a fonte de um erro, certifique-se de expandir o rastreamento completo clicando nessas duas pequenas setas.
-
-</Tip>
+> [!TIP]
+> 🚨 Está vendo aquela caixa azul em torno de "6 frames" no traceback do Google Colab? Esse é um recurso especial do Colab, que compacta o traceback em "quadros". Se você não conseguir encontrar a fonte de um erro, certifique-se de expandir o rastreamento completo clicando nessas duas pequenas setas.
 
 Isso significa que a última linha do traceback indica a última mensagem de erro e fornece o nome da exceção que foi gerada. Nesse caso, o tipo de exceção é `OSError`, que indica um erro relacionado ao sistema. Se lermos a mensagem de erro que a acompanha, veremos que parece haver um problema com o arquivo *config.json* do modelo e recebemos duas sugestões para corrigi-lo:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Se você encontrar uma mensagem de erro difícil de entender, basta copiar e colar a mensagem na barra de pesquisa do Google ou [Stack Overflow](https://stackoverflow.com/) (sim, sério!). Há uma boa chance de você não ser a primeira pessoa a encontrar o erro, e essa é uma boa maneira de encontrar soluções que outras pessoas da comunidade postaram. Por exemplo, pesquisar por `OSError: Can't load config for` no Stack Overflow fornece vários [hits](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) que poderia ser usado como ponto de partida para resolver o problema.
-
-</Tip>
+> [!TIP]
+> 💡 Se você encontrar uma mensagem de erro difícil de entender, basta copiar e colar a mensagem na barra de pesquisa do Google ou [Stack Overflow](https://stackoverflow.com/) (sim, sério!). Há uma boa chance de você não ser a primeira pessoa a encontrar o erro, e essa é uma boa maneira de encontrar soluções que outras pessoas da comunidade postaram. Por exemplo, pesquisar por `OSError: Can't load config for` no Stack Overflow fornece vários [hits](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) que poderiam ser usados como ponto de partida para resolver o problema.
 
 A primeira sugestão é nos pedir para verificar se o ID do modelo está realmente correto, então a primeira ordem do dia é copiar o identificador e colá-lo na barra de pesquisa do Hub:
 
@@ -160,11 +154,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 A abordagem que estamos tomando aqui não é infalível, já que nosso colega pode ter ajustado a configuração de `distilbert-base-uncased` antes de ajustar o modelo. Na vida real, gostaríamos de verificar com eles primeiro, mas para os propósitos desta seção, vamos supor que eles usaram a configuração padrão.
-
-</Tip>
+> [!WARNING]
+> 🚨 A abordagem que estamos tomando aqui não é infalível, já que nosso colega pode ter ajustado a configuração de `distilbert-base-uncased` antes de ajustar o modelo. Na vida real, gostaríamos de verificar com eles primeiro, mas para os propósitos desta seção, vamos supor que eles usaram a configuração padrão.
 
 Podemos então enviar isso para o nosso repositório de modelos com a função `push_to_hub()` da configuração:
 
diff --git a/chapters/ro/chapter1/3.mdx b/chapters/ro/chapter1/3.mdx
index f5d890f1e..9a5d63d3d 100644
--- a/chapters/ro/chapter1/3.mdx
+++ b/chapters/ro/chapter1/3.mdx
@@ -9,10 +9,9 @@
 
 În această parte, vom explora ce pot face modelele Transformer și vom folosi primul nostru instrument din biblioteca 🤗 Transformers: funcția `pipeline()`.
 
-<Tip>
-👀 Vedeți butonul <em>Open in Colab</em> din dreapta sus? Faceți clic pe el pentru a deschide un notebook Google Colab cu toate exemplele de cod din această secțiune. Acest buton va fi prezent în orice secțiune care conține exemple de cod. 
-Dacă doriți să executați exemplele local, vă recomandăm să aruncați o privire la <a href="/course/chapter0">setup</a>.
-</Tip>
+> [!TIP]
+> 👀 Vedeți butonul <em>Open in Colab</em> din dreapta sus? Faceți clic pe el pentru a deschide un notebook Google Colab cu toate exemplele de cod din această secțiune. Acest buton va fi prezent în orice secțiune care conține exemple de cod. 
+> Dacă doriți să executați exemplele local, vă recomandăm să aruncați o privire la <a href="/course/chapter0">setup</a>.
 
 ## Modelele Transformer sunt peste tot![[modelele-transformer-sunt-peste-tot]]
 
@@ -22,9 +21,8 @@ Modelele Transformer sunt utilizate pentru a rezolva toate tipurile de sarcini N
 
 Biblioteca [🤗 Transformers](https://github.com/huggingface/transformers) oferă funcționalitatea de a crea și utiliza aceste modele partajate. [Model Hub](https://huggingface.co/models) conține mii de modele preinstruite pe care oricine le poate descărca și utiliza. De asemenea, vă puteți încărca propriile modele pe Hub!
 
-<Tip>
-⚠️ Hub-ul Hugging Face nu este limitat la modelele Transformer. Oricine poate partaja orice fel de modele sau seturi de date pe care le dorește! <a href="https://huggingface.co/join">Creați un cont huggingface.co</a> pentru a beneficia de toate funcțiile disponibile!
-</Tip>
+> [!TIP]
+> ⚠️ Hub-ul Hugging Face nu este limitat la modelele Transformer. Oricine poate partaja orice fel de modele sau seturi de date pe care le dorește! <a href="https://huggingface.co/join">Creați un cont huggingface.co</a> pentru a beneficia de toate funcțiile disponibile!
 
 Înainte de a analiza funcționarea internă a modelelor Transformer , să ne oprim asupra unor exemple privind modul în care acestea pot fi utilizate pentru a rezolva unele probleme interesante de NLP.
 
@@ -103,11 +101,8 @@ classifier(
 
 Acest pipeline se numește _zero-shot_ deoarece nu trebuie să reglați modelul pe datele dvs. pentru a o utiliza. Aceasta poate returna direct scoruri de probabilitate pentru orice listă de etichete doriți!
 
-<Tip>
-
-✏️ **Încercați** Jucați-vă cu propriile secvențe și etichete și vedeți cum se comportă modelul.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Jucați-vă cu propriile secvențe și etichete și vedeți cum se comportă modelul.
 
 
 ## Text generation[[text-generation]]
@@ -131,11 +126,8 @@ generator("In this course, we will teach you how to")
 
 Puteți controla câte secvențe diferite sunt generate cu argumentul `num_return_sequences` și lungimea totală a textului de ieșire cu argumentul `max_length`.
 
-<Tip>
-
-✏️ **Încercați!** Utilizați argumentele `num_return_sequences` și `max_length` pentru a genera două propoziții a câte 15 cuvinte fiecare.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Utilizați argumentele `num_return_sequences` și `max_length` pentru a genera două propoziții a câte 15 cuvinte fiecare.
 
 
 ## Utilizarea oricărui model de pe Hub într-un pipeline[[utilizarea-oricărui-model-de-pe-hub-într-un-pipeline]]
@@ -167,11 +159,8 @@ Puteți să vă îmbunătățiți căutarea unui model făcând clic pe etichete
 
 După ce selectați un model făcând clic pe el, veți vedea că există un widget care vă permite să îl încercați direct online. În acest fel, puteți testa rapid capacitățile modelului înainte de a-l descărca.
 
-<Tip>
-
-✏️ ** Încercați!** Utilizați filtrele pentru a găsi un model de generare a textului pentru o altă limbă. Nu ezitați să explorați widget-ul și să îl utilizați într-un pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Utilizați filtrele pentru a găsi un model de generare a textului pentru o altă limbă. Nu ezitați să explorați widget-ul și să îl utilizați într-un pipeline!
 
 ### API-ul de inferență[[api-ul-de-inferență]]
 
@@ -203,11 +192,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 Argumentul `top_k` controlează câte posibilități doriți să fie afișate. Rețineți că aici modelul completează cuvântul special `<mask>`, care este adesea denumit *mask token*. Alte modele de umplere a măștii ar putea avea token-uri de mască diferite, astfel încât este întotdeauna bine să verificați cuvântul de mască adecvat atunci când explorați alte modele. O modalitate de verificare este să vă uitați la cuvântul mască utilizat în widget.
 
-<Tip>
-
-✏️ **Încercați!** Căutați modelul `bert-base-cased` pe Hub și identificați-i cuvântul mască în widget-ul Inference API. Ce prezice acest model pentru propoziția din exemplul nostru `pipeline` de mai sus?
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Căutați modelul `bert-base-cased` pe Hub și identificați-i cuvântul mască în widget-ul Inference API. Ce prezice acest model pentru propoziția din exemplul nostru `pipeline` de mai sus?
 
 ## Named entity recognition[[named-entity-recognition]]
 
@@ -231,11 +217,8 @@ Aici, modelul a identificat corect că Sylvain este o persoană (PER), Hugging F
 
 Trecem opțiunea `grouped_entities=True` în funcția de creare a pipeline-ului pentru a-i spune pipeline-ului să regrupeze părțile propoziției care corespund aceleiași entități: aici, modelul a grupat corect „Hugging” și „Face” ca o singură organizație, chiar dacă numele este format din mai multe cuvinte. De fapt, după cum vom vedea în capitolul următor, preprocesarea chiar împarte unele cuvinte în părți mai mici. De exemplu, `Sylvain` este împărțit în patru părți: `S`, `##yl`, `##va`, și `##in`. În etapa de postprocesare, pipeline-ul a reușit să regrupeze aceste părți.
 
-<Tip>
-
-✏️ **Încercați!** Căutați în Hub-ul de modele un model capabil să facă etichetarea părții de vorbire (de obicei abreviată ca POS) în limba engleză. Ce prezice acest model pentru propoziția din exemplul de mai sus?
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Căutați în Hub-ul de modele un model capabil să facă etichetarea părții de vorbire (de obicei abreviată ca POS) în limba engleză. Ce prezice acest model pentru propoziția din exemplul de mai sus?
 
 ## Question answering[[question-answering]]
 
@@ -317,10 +300,7 @@ translator("Ce cours est produit par Hugging Face.")
 ```
 
 Ca și în cazul generării și rezumării textului, puteți specifica `max_length` sau `min_length` pentru rezultat.
-<Tip>
-
-✏️ **Încercați!** Căutați modele de traducere în alte limbi și încercați să traduceți propoziția anterioară în câteva limbi diferite.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Căutați modele de traducere în alte limbi și încercați să traduceți propoziția anterioară în câteva limbi diferite.
 
 Pipeline-urile prezentate până în acest moment au în principal scop demonstrativ. Ele au fost programate pentru sarcini specifice și nu pot efectua variații ale acestora. În capitolul următor, veți afla ce se află în interiorul unei funcții `pipeline()` și cum să îi personalizați comportamentul.
\ No newline at end of file
diff --git a/chapters/ro/chapter11/1.mdx b/chapters/ro/chapter11/1.mdx
index 967010bab..6d688d49c 100644
--- a/chapters/ro/chapter11/1.mdx
+++ b/chapters/ro/chapter11/1.mdx
@@ -18,9 +18,8 @@ Adaptarea de rang scăzut (LoRA) este o tehnică pentru fine-tuningul modelelor
 
 Evaluarea este un pas crucial în procesul de fine-tuning. Ne permite să măsurăm performanța modelului pe un set de date specific sarcinii.
 
-<Tip>
-⚠️ Pentru a beneficia de toate funcționalitățile disponibile cu Model Hub și 🤗 Transformers, recomandăm <a href="https://huggingface.co/join">crearea unui cont</a>.
-</Tip>
+> [!TIP]
+> ⚠️ Pentru a beneficia de toate funcționalitățile disponibile cu Model Hub și 🤗 Transformers, recomandăm <a href="https://huggingface.co/join">crearea unui cont</a>.
 
 ## Referințe
 
diff --git a/chapters/ro/chapter11/2.mdx b/chapters/ro/chapter11/2.mdx
index b5fe8d46a..b42a48097 100644
--- a/chapters/ro/chapter11/2.mdx
+++ b/chapters/ro/chapter11/2.mdx
@@ -10,13 +10,12 @@
 
 Template-urile de chat sunt esențiale pentru structurarea interacțiunilor dintre modelele de limbaj și utilizatori. Indiferent dacă construiți un chatbot simplu sau un agent AI complex, înțelegerea modului de a formata corect conversațiile este crucială pentru a obține cele mai bune rezultate de la modelul dumneavoastră. În acest ghid, vom explora ce sunt template-urile de chat, de ce sunt importante și cum să le folosiți eficient.
 
-<Tip>
-Template-urile de chat sunt cruciale pentru:
-- Menținerea unei structuri consecvente de conversație
-- Asigurarea identificării corecte a rolurilor
-- Gestionarea contextului pe mai multe tururi
-- Suportarea funcționalităților avansate precum utilizarea instrumentelor
-</Tip>
+> [!TIP]
+> Template-urile de chat sunt cruciale pentru:
+> - Menținerea unei structuri consecvente de conversație
+> - Asigurarea identificării corecte a rolurilor
+> - Gestionarea contextului pe mai multe tururi
+> - Suportarea funcționalităților avansate precum utilizarea instrumentelor
 
 ## Tipuri de modele și template-uri
 
@@ -27,9 +26,8 @@ Modelele ajustate pentru instrucțiuni sunt antrenate să urmeze o structură co
 
 Pentru a face ca un model de bază să se comporte ca un model instruct, trebuie să formatăm prompturile într-un mod consecvent pe care modelul să îl înțeleagă. Aici intervin template-urile de chat. ChatML este unul dintre aceste formate de template care structurează conversațiile cu indicatori clari de rol (sistem, utilizator, asistent). Iată un ghid despre [ChatML](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146).
 
-<Tip warning={true}>
-Când folosiți un model instruct, verificați întotdeauna că folosiți formatul corect de template de chat. Folosirea unui template greșit poate duce la performanțe slabe ale modelului sau comportament neașteptat. Cea mai ușoară modalitate de a vă asigura de acest lucru este să verificați configurația tokenizer-ului modelului pe Hub. De exemplu, modelul `SmolLM2-135M-Instruct` folosește <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146">această configurație</a>.
-</Tip>
+> [!WARNING]
+> Când folosiți un model instruct, verificați întotdeauna că folosiți formatul corect de template de chat. Folosirea unui template greșit poate duce la performanțe slabe ale modelului sau comportament neașteptat. Cea mai ușoară modalitate de a vă asigura de acest lucru este să verificați configurația tokenizer-ului modelului pe Hub. De exemplu, modelul `SmolLM2-135M-Instruct` folosește <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146">această configurație</a>.
 
 ### Formate comune de template
 
@@ -142,12 +140,11 @@ Template-urile de chat pot gestiona scenarii mai complexe dincolo de simpla inte
 3. **Apeluri de funcții**: Pentru executarea structurată de funcții
 4. **Context multi-turn**: Pentru menținerea istoricului conversației
 
-<Tip>
-Când implementați funcționalități avansate:
-- Testați temeinic cu modelul dumneavoastră specific. Template-urile de viziune și utilizare a instrumentelor sunt deosebit de diverse.
-- Monitorizați cu atenție utilizarea token-urilor între fiecare funcționalitate și model.
-- Documentați formatul așteptat pentru fiecare funcționalitate
-</Tip>
+> [!TIP]
+> Când implementați funcționalități avansate:
+> - Testați temeinic cu modelul dumneavoastră specific. Template-urile de viziune și utilizare a instrumentelor sunt deosebit de diverse.
+> - Monitorizați cu atenție utilizarea token-urilor între fiecare funcționalitate și model.
+> - Documentați formatul așteptat pentru fiecare funcționalitate
 
 Pentru conversații multimodale, template-urile de chat pot include referințe la imagini sau imagini codificate în base64:
 
@@ -207,44 +204,42 @@ Când lucrați cu template-uri de chat, urmați aceste practici cheie:
 4. **Gestionarea erorilor**: Includeți gestionarea adecvată a erorilor pentru apelurile de instrumente și intrările multimodale
 5. **Validare**: Validați structura mesajelor înainte de a le trimite la model
 
-<Tip warning={true}>
-Capcane comune de evitat:
-- Amestecarea diferitelor formate de template în aceeași aplicație
-- Depășirea limitelor de token-uri cu istoricuri lungi de conversație
-- Neescaparea corespunzătoare a caracterelor speciale în mesaje
-- Uitarea validării structurii mesajelor de intrare
-- Ignorarea cerințelor specifice de template ale modelului
-</Tip>
+> [!WARNING]
+> Capcane comune de evitat:
+> - Amestecarea diferitelor formate de template în aceeași aplicație
+> - Depășirea limitelor de token-uri cu istoricuri lungi de conversație
+> - Neescaparea corespunzătoare a caracterelor speciale în mesaje
+> - Uitarea validării structurii mesajelor de intrare
+> - Ignorarea cerințelor specifice de template ale modelului
 
 ## Exercițiu practic
 
 Să exersăm implementarea template-urilor de chat cu un exemplu din lumea reală.
 
-<Tip>
-Urmați acești pași pentru a converti setul de date `HuggingFaceTB/smoltalk` în formatul chatml:
-
-1. Încărcați setul de date:
-```python
-from datasets import load_dataset
-
-dataset = load_dataset("HuggingFaceTB/smoltalk")
-```
-
-2. Creați o funcție de procesare:
-```python
-def convert_to_chatml(example):
-    return {
-        "messages": [
-            {"role": "user", "content": example["input"]},
-            {"role": "assistant", "content": example["output"]},
-        ]
-    }
-```
-
-3. Aplicați template-ul de chat folosind tokenizer-ul modelului ales
-
-Amintiți-vă să validați că formatul de ieșire se potrivește cu cerințele modelului țintă!
-</Tip>
+> [!TIP]
+> Urmați acești pași pentru a converti setul de date `HuggingFaceTB/smoltalk` în formatul chatml:
+>
+> 1. Încărcați setul de date:
+> ```python
+> from datasets import load_dataset
+>
+> dataset = load_dataset("HuggingFaceTB/smoltalk")
+> ```
+>
+> 2. Creați o funcție de procesare:
+> ```python
+> def convert_to_chatml(example):
+>     return {
+>         "messages": [
+>             {"role": "user", "content": example["input"]},
+>             {"role": "assistant", "content": example["output"]},
+>         ]
+>     }
+> ```
+>
+> 3. Aplicați template-ul de chat folosind tokenizer-ul modelului ales
+>
+> Amintiți-vă să validați că formatul de ieșire se potrivește cu cerințele modelului țintă!
 
 ## Resurse adiționale
 
diff --git a/chapters/ro/chapter11/3.mdx b/chapters/ro/chapter11/3.mdx
index 1d9f924a2..71856d2a8 100644
--- a/chapters/ro/chapter11/3.mdx
+++ b/chapters/ro/chapter11/3.mdx
@@ -14,12 +14,11 @@ Această pagină oferă un ghid pas cu pas pentru fine-tuningul modelului [`deep
 
 Înainte de a se adânci în implementare, este important să înțelegem când SFT este alegerea potrivită pentru proiectul dumneavoastră. Ca primul pas, ar trebui să considerați dacă folosirea unui model existent ajustat pentru instrucțiuni cu prompturi bine elaborate ar fi suficientă pentru cazul dumneavoastră de utilizare. SFT implică resurse computaționale semnificative și efort de inginerie, așa că ar trebui urmărit doar când promptarea modelelor existente se dovedește insuficientă.
 
-<Tip>
-Considerați SFT doar dacă:
-- Aveți nevoie de performanțe suplimentare dincolo de ceea ce poate realiza promptarea
-- Aveți un caz de utilizare specific în care costul folosirii unui model mare de uz general depășește costul fine-tuningului unui model mai mic
-- Aveți nevoie de formate de ieșire specializate sau cunoștințe specifice domeniului cu care modelele existente se confruntă
-</Tip>
+> [!TIP]
+> Considerați SFT doar dacă:
+> - Aveți nevoie de performanțe suplimentare dincolo de ceea ce poate realiza promptarea
+> - Aveți un caz de utilizare specific în care costul folosirii unui model mare de uz general depășește costul fine-tuningului unui model mai mic
+> - Aveți nevoie de formate de ieșire specializate sau cunoștințe specifice domeniului cu care modelele existente se confruntă
 
 Dacă determinați că SFT este necesar, decizia de a continua depinde de doi factori principali:
 
@@ -36,15 +35,14 @@ Când lucrați în domenii specializate, SFT ajută la alinierea modelului cu ce
 3. Gestionarea adecvată a întrebărilor tehnice
 4. Urmarea ghidurilor specifice industriei
 
-<Tip>
-Înainte de a începe SFT, evaluați dacă cazul dumneavoastră de utilizare necesită:
-- Formatare precisă de ieșire
-- Cunoștințe specifice domeniului
-- Modele consecvente de răspuns
-- Aderarea la ghiduri specifice
-
-Această evaluare va ajuta să determinați dacă SFT este abordarea potrivită pentru nevoile dumneavoastră.
-</Tip>
+> [!TIP]
+> Înainte de a începe SFT, evaluați dacă cazul dumneavoastră de utilizare necesită:
+> - Formatare precisă de ieșire
+> - Cunoștințe specifice domeniului
+> - Modele consecvente de răspuns
+> - Aderarea la ghiduri specifice
+>
+> Această evaluare va ajuta să determinați dacă SFT este abordarea potrivită pentru nevoile dumneavoastră.
 
 ## Pregătirea setului de date
 
@@ -88,13 +86,12 @@ Configurația SFTTrainer necesită considerarea mai multor parametri care contro
    - `eval_steps`: Cât de des să evalueze pe datele de validare
    - `save_steps`: Frecvența salvării punctelor de verificare ale modelului
 
-<Tip>
-Începeți cu valori conservatoare și ajustați bazat pe monitorizare:
-- Începeți cu 1-3 epoci
-- Folosiți dimensiuni mai mici ale batch-ului inițial
-- Monitorizați metricile de validare atent
-- Ajustați rata de învățare dacă antrenamentul este instabil
-</Tip>
+> [!TIP]
+> Începeți cu valori conservatoare și ajustați bazat pe monitorizare:
+> - Începeți cu 1-3 epoci
+> - Folosiți dimensiuni mai mici ale batch-ului inițial
+> - Monitorizați metricile de validare atent
+> - Ajustați rata de învățare dacă antrenamentul este instabil
 
 ## Implementare cu TRL
 
@@ -145,9 +142,8 @@ trainer = SFTTrainer(
 trainer.train()
 ```
 
-<Tip>
-Când folosiți un set de date cu un câmp "messages" (ca exemplul de mai sus), SFTTrainer aplică automat template-ul de chat al modelului, pe care îl recuperează de pe hub. Aceasta înseamnă că nu aveți nevoie de nicio configurație suplimentară pentru a gestiona conversațiile în stil chat - trainer-ul va formata mesajele conform formatului de template așteptat al modelului.
-</Tip>
+> [!TIP]
+> Când folosiți un set de date cu un câmp "messages" (ca exemplul de mai sus), SFTTrainer aplică automat template-ul de chat al modelului, pe care îl recuperează de pe hub. Aceasta înseamnă că nu aveți nevoie de nicio configurație suplimentară pentru a gestiona conversațiile în stil chat - trainer-ul va formata mesajele conform formatului de template așteptat al modelului.
 
 ## Împachetarea setului de date
 
@@ -201,13 +197,12 @@ Monitorizarea eficientă implică urmărirea metricilor cantitative și evaluare
 - Progresia ratei de învățare
 - Normele gradientului
 
-<Tip warning={true}>
-Urmăriți aceste semne de avertizare în timpul antrenamentului:
-1. Pierderea validării crește în timp ce pierderea antrenamentului scade (supraadaptare)
-2. Nicio îmbunătățire semnificativă în valorile pierderilor (subadaptare)
-3. Valori extrem de mici ale pierderilor (posibilă memorizare)
-4. Formatare inconsistentă a ieșirii (probleme de învățare a template-ului)
-</Tip>
+> [!WARNING]
+> Urmăriți aceste semne de avertizare în timpul antrenamentului:
+> 1. Pierderea validării crește în timp ce pierderea antrenamentului scade (supraadaptare)
+> 2. Nicio îmbunătățire semnificativă în valorile pierderilor (subadaptare)
+> 3. Valori extrem de mici ale pierderilor (posibilă memorizare)
+> 4. Formatare inconsistentă a ieșirii (probleme de învățare a template-ului)
 
 ### Calea către convergență
 
@@ -242,9 +237,8 @@ Valori extrem de mici ale pierderilor ar putea sugera memorizare mai degrabă de
 - Ieșirile lipsesc de diversitate
 - Răspunsurile sunt prea similare cu exemplele de antrenare
 
-<Tip warning={true}>
-Monitorizați atât valorile pierderilor, cât și ieșirile efective ale modelului în timpul antrenamentului. Uneori pierderea poate arăta bine în timp ce modelul dezvoltă comportamente nedorite. Evaluarea calitativă regulată a răspunsurilor modelului ajută la detectarea problemelor pe care metricile singure le-ar putea rata.
-</Tip>
+> [!WARNING]
+> Monitorizați atât valorile pierderilor, cât și ieșirile efective ale modelului în timpul antrenamentului. Uneori pierderea poate arăta bine în timp ce modelul dezvoltă comportamente nedorite. Evaluarea calitativă regulată a răspunsurilor modelului ajută la detectarea problemelor pe care metricile singure le-ar putea rata.
 
 Ar trebui să observăm că interpretarea valorilor pierderilor pe care o descriem aici este destinată cazului cel mai comun, și de fapt, valorile pierderilor se pot comporta în moduri diferite în funcție de model, setul de date, parametrii de antrenare, etc. Dacă sunteți interesați să explorați mai multe despre modelele descrise, ar trebui să consultați acest articol de blog de la oamenii de la [Fast AI](https://www.fast.ai/posts/2023-09-04-learning-jumps/).
 
@@ -259,14 +253,13 @@ După finalizarea SFT, considerați aceste acțiuni de urmărire:
 3. Testați reținerea cunoștințelor specifice domeniului
 4. Monitorizați metricile de performanță din lumea reală
 
-<Tip>
-Documentați procesul de antrenare, inclusiv:
-- Caracteristicile setului de date
-- Parametrii antrenamentului
-- Metricile de performanță
-- Limitările cunoscute
-Această documentație va fi valoroasă pentru iterațiile viitoare ale modelului.
-</Tip>
+> [!TIP]
+> Documentați procesul de antrenare, inclusiv:
+> - Caracteristicile setului de date
+> - Parametrii antrenamentului
+> - Metricile de performanță
+> - Limitările cunoscute
+> Această documentație va fi valoroasă pentru iterațiile viitoare ale modelului.
 
 ## Chestionar
 
diff --git a/chapters/ro/chapter11/4.mdx b/chapters/ro/chapter11/4.mdx
index f243b94e3..3edb5b76a 100644
--- a/chapters/ro/chapter11/4.mdx
+++ b/chapters/ro/chapter11/4.mdx
@@ -67,9 +67,8 @@ Să parcurgem configurația LoRA și parametrii cheie.
 | `bias` | Controlează antrenarea termenilor de bias. Opțiunile sunt "none", "all" sau "lora_only". "none" este cel mai comun pentru eficiența memoriei. |
 | `target_modules` | Specifică la care module ale modelului să aplice LoRA. Poate fi "all-linear" sau module specifice precum "q_proj,v_proj". Mai multe module permit o adaptabilitate mai mare dar cresc utilizarea memoriei. |
 
-<Tip>
-Când implementați metode PEFT, începeți cu valori mici ale rangului (4-8) pentru LoRA și monitorizați pierderea antrenamentului. Folosiți seturi de validare pentru a preveni supraadaptarea și comparați rezultatele cu liniile de bază de fine-tuning complet când este posibil. Eficacitatea diferitelor metode poate varia în funcție de sarcină, așa că experimentarea este cheia.
-</Tip>
+> [!TIP]
+> Când implementați metode PEFT, începeți cu valori mici ale rangului (4-8) pentru LoRA și monitorizați pierderea antrenamentului. Folosiți seturi de validare pentru a preveni supraadaptarea și comparați rezultatele cu liniile de bază de fine-tuning complet când este posibil. Eficacitatea diferitelor metode poate varia în funcție de sarcină, așa că experimentarea este cheia.
 
 ## Folosirea TRL cu PEFT
 
@@ -112,11 +111,8 @@ trainer = SFTTrainer(
 )
 ```
 
-<Tip>
-
-✏️ **Încercați!** Construiți pe modelul dumneavoastră ajustat fin din secțiunea anterioară, dar faceți fine-tuning cu LoRA. Folosiți setul de date `HuggingFaceTB/smoltalk` pentru a face fine-tuning la un model `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, folosind configurația LoRA pe care am definit-o mai sus.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Construiți pe modelul dumneavoastră ajustat fin din secțiunea anterioară, dar faceți fine-tuning cu LoRA. Folosiți setul de date `HuggingFaceTB/smoltalk` pentru a face fine-tuning la un model `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, folosind configurația LoRA pe care am definit-o mai sus.
 
 ## Îmbinarea adaptorilor LoRA
 
@@ -158,11 +154,8 @@ merged_model.save_pretrained("path/to/save/merged_model")
 tokenizer.save_pretrained("path/to/save/merged_model")
 ```
 
-<Tip>
-
-✏️ **Încercați!** Îmbinați greutățile adaptorului înapoi în modelul de bază. Folosiți setul de date `HuggingFaceTB/smoltalk` pentru a face fine-tuning la un model `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, folosind configurația LoRA pe care am definit-o mai sus.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Îmbinați greutățile adaptorului înapoi în modelul de bază. Folosiți setul de date `HuggingFaceTB/smoltalk` pentru a face fine-tuning la un model `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, folosind configurația LoRA pe care am definit-o mai sus.
 
 # Resurse
 
diff --git a/chapters/ro/chapter11/5.mdx b/chapters/ro/chapter11/5.mdx
index 61ca3c867..6093775dc 100644
--- a/chapters/ro/chapter11/5.mdx
+++ b/chapters/ro/chapter11/5.mdx
@@ -120,11 +120,8 @@ Rezultatele sunt afișate în format tabular arătând:
 
 Lighteval include, de asemenea, un API Python pentru sarcini de evaluare mai detaliate, care este util pentru manipularea rezultatelor într-un mod mai flexibil. Consultați [documentația Lighteval](https://huggingface.co/docs/lighteval/using-the-python-api) pentru mai multe informații.
 
-<Tip>
-
-✏️ **Încercați!** Evaluați modelul dumneavoastră ajustat fin pe o sarcină specifică în lighteval.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Evaluați modelul dumneavoastră ajustat fin pe o sarcină specifică în lighteval.
 
 # Chestionar de sfârșit de capitol[[end-of-chapter-quiz]]
 
diff --git a/chapters/ro/chapter12/1.mdx b/chapters/ro/chapter12/1.mdx
index 4dbae17fa..fb2f414ed 100644
--- a/chapters/ro/chapter12/1.mdx
+++ b/chapters/ro/chapter12/1.mdx
@@ -77,11 +77,8 @@ Pentru a obține cel mai mult din acest capitol, este util să ai:
 
 Nu-ți face griji dacă îți lipsesc unele dintre acestea – vom explica conceptele cheie pe măsură ce mergem! 🚀
 
-<Tip>
-
-Dacă nu ai toate cerințele prealabile, consultă acest [curs](/course/chapter1/1) de la unitățile 1 la 11
-
-</Tip>
+> [!TIP]
+> Dacă nu ai toate cerințele prealabile, consultă acest [curs](/course/chapter1/1) de la unitățile 1 la 11.
 
 ## Cum să Folosești Acest Capitol
 
diff --git a/chapters/ro/chapter12/2.mdx b/chapters/ro/chapter12/2.mdx
index fc76264eb..5b08479f4 100644
--- a/chapters/ro/chapter12/2.mdx
+++ b/chapters/ro/chapter12/2.mdx
@@ -4,11 +4,8 @@ Bun venit la prima pagină!
 
 Vom începe călătoria noastră în lumea captivantă a Învățării prin Întărire (RL) și vom descoperi cum aceasta revoluționează modul în care antrenăm Modelele de Limbaj precum cele pe care le-ai putea folosi în fiecare zi.
 
-<Tip>
-
-În acest capitol, ne concentrăm pe învățarea prin întărire pentru modelele de limbaj. Cu toate acestea, învățarea prin întărire este un domeniu larg cu multe aplicații dincolo de modelele de limbaj. Dacă ești interesat să înveți mai multe despre învățarea prin întărire, ar trebui să consulți [Cursul de Învățare prin Întărire Profundă](https://huggingface.co/courses/deep-rl-course/en/unit1/introduction).
-
-</Tip>
+> [!TIP]
+> În acest capitol, ne concentrăm pe învățarea prin întărire pentru modelele de limbaj. Cu toate acestea, învățarea prin întărire este un domeniu larg cu multe aplicații dincolo de modelele de limbaj. Dacă ești interesat să înveți mai multe despre învățarea prin întărire, ar trebui să consulți [Cursul de Învățare prin Întărire Profundă](https://huggingface.co/courses/deep-rl-course/en/unit1/introduction).
 
 Această pagină îți va oferi o introducere prietenoasă și clară în RL, chiar dacă nu ai întâlnit-o niciodată înainte. Vom descompune ideile principale și vom vedea de ce RL devine atât de important în domeniul Modelelor Mari de Limbaj (LLM-uri).
 
@@ -119,14 +116,11 @@ Optimizarea Politicii Proximale (PPO) a fost una dintre primele tehnici foarte e
 
 Optimizarea Directă a Preferinței (DPO) a fost dezvoltată mai târziu ca o tehnică mai simplă care elimină nevoia unui model de recompensă separat folosind datele de preferință direct. În esență, încadrează problema ca o sarcină de clasificare între răspunsurile alese și respinse.
 
-<Tip>
-
-DPO și PPO sunt algoritmi complecși de învățare prin întărire în sine, pe care nu îi vom acoperi în acest curs. Dacă ești interesat să înveți mai multe despre ei, poți consulta următoarele resurse:
-
-- [Optimizarea Politicii Proximale](https://huggingface.co/docs/trl/main/en/ppo_trainer)
-- [Optimizarea Directă a Preferinței](https://huggingface.co/docs/trl/main/en/dpo_trainer)
-
-</Tip>
+> [!TIP]
+> DPO și PPO sunt algoritmi complecși de învățare prin întărire în sine, pe care nu îi vom acoperi în acest curs. Dacă ești interesat să înveți mai multe despre ei, poți consulta următoarele resurse:
+>
+> - [Optimizarea Politicii Proximale](https://huggingface.co/docs/trl/main/en/ppo_trainer)
+> - [Optimizarea Directă a Preferinței](https://huggingface.co/docs/trl/main/en/dpo_trainer)
 
 Spre deosebire de DPO și PPO, GRPO grupează eșantioane similare împreună și le compară ca grup. Abordarea bazată pe grup oferă gradienți mai stabili și proprietăți de convergență mai bune comparativ cu alte metode.
 
diff --git a/chapters/ro/chapter12/3.mdx b/chapters/ro/chapter12/3.mdx
index 1cd3a12e2..893a65cb9 100644
--- a/chapters/ro/chapter12/3.mdx
+++ b/chapters/ro/chapter12/3.mdx
@@ -10,11 +10,8 @@ DeepSeek R1 reprezintă un progres semnificativ în antrenarea modelelor de limb
 
 Obiectivul inițial al lucrării a fost să exploreze dacă învățarea prin întărire pură ar putea dezvolta capacități de raționament fără ajustarea fină supervizată.
 
-<Tip>
-
-Până la acel moment, toate LLM-urile populare necesitau o anumită ajustare fină supervizată, pe care am explorat-o în [capitolul 11](/course/chapter11/1).
-
-</Tip>
+> [!TIP]
+> Până la acel moment, toate LLM-urile populare necesitau o anumită ajustare fină supervizată, pe care am explorat-o în [capitolul 11](/course/chapter11/1).
 
 ## Momentul Revoluționar 'Aha'
 
@@ -157,13 +154,10 @@ Această abordare se dovedește mai stabilă decât metodele tradiționale pentr
 - Normalizarea bazată pe grup ajută la prevenirea problemelor cu scalarea recompenselor
 - Penalitatea KL acționează ca o plasă de siguranță, asigurându-se că modelul nu uită ce știa deja în timp ce învață lucruri noi
 
-<Tip>
-
-Inovațiile cheie ale GRPO sunt:
-- Învățarea direct din orice funcție sau model, eliminând dependența de un model de recompensă separat.
-- Învățarea bazată pe grup, care este mai stabilă și eficientă decât metodele tradiționale cum ar fi comparațiile perechi.
-
-</Tip>
+> [!TIP]
+> Inovațiile cheie ale GRPO sunt:
+> - Învățarea direct din orice funcție sau model, eliminând dependența de un model de recompensă separat.
+> - Învățarea bazată pe grup, care este mai stabilă și eficientă decât metodele tradiționale cum ar fi comparațiile perechi.
 
 Această descompunere este complexă, dar concluzia cheie este că GRPO este o modalitate mai eficientă și stabilă de a antrena un model să raționeze.
 
diff --git a/chapters/ro/chapter12/3a.mdx b/chapters/ro/chapter12/3a.mdx
index 9d4522cb6..c02cfbab9 100644
--- a/chapters/ro/chapter12/3a.mdx
+++ b/chapters/ro/chapter12/3a.mdx
@@ -1,10 +1,7 @@
 # Înțelegerea Avansată a Optimizării Relative a Politicii de Grup (GRPO) în DeepSeekMath
 
-<Tip>
-
-Această secțiune se scufundă în detaliile tehnice și matematice ale GRPO. A fost scrisă de [Shirin Yamani](https://github.com/shirinyamani).
-
-</Tip>
+> [!TIP]
+> Această secțiune se scufundă în detaliile tehnice și matematice ale GRPO. A fost scrisă de [Shirin Yamani](https://github.com/shirinyamani).
 
 Să ne aprofundăm înțelegerea GRPO astfel încât să putem îmbunătăți procesul de antrenare al modelului nostru.
 
diff --git a/chapters/ro/chapter12/4.mdx b/chapters/ro/chapter12/4.mdx
index bc50b2c7b..c785b6b18 100644
--- a/chapters/ro/chapter12/4.mdx
+++ b/chapters/ro/chapter12/4.mdx
@@ -4,11 +4,8 @@
 
 Vom explora conceptele centrale ale GRPO așa cum sunt întruchipate în GRPOTrainer din TRL, folosind fragmente din documentația oficială TRL pentru a ne ghida.
 
-<Tip>
-
-Acest capitol este destinat începătorilor TRL. Dacă ești deja familiar cu TRL, ai putea de asemenea să consulți [implementarea Open R1](https://github.com/huggingface/open-r1/blob/main/src/open_r1/grpo.py) a GRPO.
-
-</Tip>
+> [!TIP]
+> Acest capitol este destinat începătorilor TRL. Dacă ești deja familiar cu TRL, ai putea de asemenea să consulți [implementarea Open R1](https://github.com/huggingface/open-r1/blob/main/src/open_r1/grpo.py) a GRPO.
 
 În primul rând, să ne reamintim unele dintre conceptele importante ale algoritmului GRPO:
 
diff --git a/chapters/ro/chapter12/5.mdx b/chapters/ro/chapter12/5.mdx
index 80b961e2e..4d47ffe3c 100644
--- a/chapters/ro/chapter12/5.mdx
+++ b/chapters/ro/chapter12/5.mdx
@@ -8,11 +8,8 @@
 
 Acum că ai văzut teoria, să o punem în practică! În acest exercițiu, vei ajusta fin un model cu GRPO.
 
-<Tip>
-
-Acest exercițiu a fost scris de expertul în ajustarea fină LLM [@mlabonne](https://huggingface.co/mlabonne).
-
-</Tip>
+> [!TIP]
+> Acest exercițiu a fost scris de expertul în ajustarea fină LLM [@mlabonne](https://huggingface.co/mlabonne).
 
 ## Instalează dependențele
 
diff --git a/chapters/ro/chapter12/6.mdx b/chapters/ro/chapter12/6.mdx
index 5be72a1ee..330fb637e 100644
--- a/chapters/ro/chapter12/6.mdx
+++ b/chapters/ro/chapter12/6.mdx
@@ -10,11 +10,8 @@
 
 Unsloth este o bibliotecă care accelerează ajustarea fină a LLM-urilor, făcând posibilă antrenarea modelelor mai repede și cu mai puține resurse computaționale. Unsloth se conectează la TRL, deci vom construi pe ceea ce am învățat în secțiunile anterioare, și o vom adapta pentru specificațiile Unsloth.
 
-<Tip>
-
-Acest exercițiu poate fi rulat pe un GPU T4 Google Colab gratuit. Pentru cea mai bună experiență, urmărește notebook-ul legat mai sus și încearcă-l singur.
-
-</Tip>
+> [!TIP]
+> Acest exercițiu poate fi rulat pe un GPU T4 Google Colab gratuit. Pentru cea mai bună experiență, urmărește notebook-ul legat mai sus și încearcă-l singur.
 
 ## Instalează dependențele
 
@@ -71,11 +68,8 @@ model = FastLanguageModel.get_peft_model(
 
 Acest cod încarcă modelul în cuantizare 4-bit pentru a economisi memoria și aplică LoRA (Adaptarea de Rang Mic) pentru ajustarea fină eficientă. Parametrul `target_modules` specifică care straturi ale modelului să fie ajustate fin, și `use_gradient_checkpointing` permite antrenarea cu contexte mai lungi.
 
-<Tip>
-
-Nu vom acoperi detaliile LoRA în acest capitol, dar poți învăța mai multe în [Capitolul 11](/course/chapter11/3).
-
-</Tip>
+> [!TIP]
+> Nu vom acoperi detaliile LoRA în acest capitol, dar poți învăța mai multe în [Capitolul 11](/course/chapter11/3).
 
 ## Pregătirea Datelor
 
@@ -278,11 +272,8 @@ Acum să începem antrenarea:
 trainer.train()
 ```
 
-<Tip warning={true}>
-
-Antrenarea poate dura ceva timp. S-ar putea să nu vezi recompensele crescând imediat - poate dura 150-200 de pași înainte să începi să vezi îmbunătățiri. Fii răbdător!
-
-</Tip>
+> [!WARNING]
+> Antrenarea poate dura ceva timp. S-ar putea să nu vezi recompensele crescând imediat - poate dura 150-200 de pași înainte să începi să vezi îmbunătățiri. Fii răbdător!
 
 ## Testarea Modelului
 
diff --git a/chapters/ro/chapter2/1.mdx b/chapters/ro/chapter2/1.mdx
index 3adddcc0b..e682ba1f9 100644
--- a/chapters/ro/chapter2/1.mdx
+++ b/chapters/ro/chapter2/1.mdx
@@ -21,7 +21,6 @@ Acest capitol va începe cu un exemplu end-to-end în care folosim împreună un
 Apoi vom analiza API-ul tokenizer, care este cealaltă componentă principală a funcției `pipeline()`. Tokenizerii se ocupă de prima și ultima etapă de procesare, gestionând conversia de la text la intrări numerice pentru rețeaua neuronală și conversia înapoi la text atunci când este necesar. În cele din urmă, vă vom arăta cum să vă ocupați de trimiterea mai multor propoziții printr-un model în cadrul unui batch pregătit, apoi vom încheia totul cu o examinare mai atentă a funcției  `tokenizer()`.
 
 
-<Tip>
-⚠️  
-Pentru a beneficia de toate funcțiile disponibile cu Model Hub și 🤗 Transformers, vă recomandăm <a href="https://huggingface.co/join">să vă creați un cont</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️  
+> Pentru a beneficia de toate funcțiile disponibile cu Model Hub și 🤗 Transformers, vă recomandăm <a href="https://huggingface.co/join">să vă creați un cont</a>.
\ No newline at end of file
diff --git a/chapters/ro/chapter2/2.mdx b/chapters/ro/chapter2/2.mdx
index a244194e8..8e870ec7f 100644
--- a/chapters/ro/chapter2/2.mdx
+++ b/chapters/ro/chapter2/2.mdx
@@ -22,10 +22,8 @@
 
 {/if}
 
-<Tip>
- 
-Aceasta este prima secțiune în care conținutul este ușor diferit în funcție de utilizarea PyTorch sau TensorFlow. Schimbați comutatorul din partea de sus a titlului pentru a selecta platforma pe care o preferați!
-</Tip>
+> [!TIP]
+> Aceasta este prima secțiune în care conținutul este ușor diferit în funcție de utilizarea PyTorch sau TensorFlow. Schimbați comutatorul din partea de sus a titlului pentru a selecta platforma pe care o preferați!
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -356,8 +354,5 @@ Acum putem concluziona că modelul a prezis următoarele:
 
 Am reprodus cu succes cele trei etape ale pipeline-ului: preprocesarea cu tokenizatoare, trecerea intrărilor prin model și postprocesarea! Acum haideți să analizăm în profunzime fiecare dintre aceste etape.
 
-<Tip>
-
-✏️  **Încercați!** Alegeți două (sau mai multe) texte proprii și treceți-le prin conducta `sentiment-analysis`. Apoi repetați pașii pe care i-ați văzut aici și verificați dacă obțineți aceleași rezultate!
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Alegeți două (sau mai multe) texte proprii și treceți-le prin pipeline-ul `sentiment-analysis`. Apoi repetați pașii pe care i-ați văzut aici și verificați dacă obțineți aceleași rezultate!
diff --git a/chapters/ro/chapter2/4.mdx b/chapters/ro/chapter2/4.mdx
index 365db17b3..e5845c115 100644
--- a/chapters/ro/chapter2/4.mdx
+++ b/chapters/ro/chapter2/4.mdx
@@ -226,11 +226,8 @@ print(ids)
 
 Aceste rezultate, odată convertite în tensorul framework-ului corespunzător, pot fi apoi utilizate ca intrări într-un model, așa cum am văzut mai devreme în acest capitol.
 
-<Tip>
-
-✏️ **Încercați!** Replicați ultimii doi pași (tokenizarea și conversia în ID-uri de intrare) pe propozițiile de intrare pe care le-am folosit în secțiunea 2 ("I've been waiting for a HuggingFace course my whole life." și "I hate this so much!"). Verificați dacă obțineți aceleași ID-uri de intrare pe care le-am obținut mai devreme! 
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Replicați ultimii doi pași (tokenizarea și conversia în ID-uri de intrare) pe propozițiile de intrare pe care le-am folosit în secțiunea 2 ("I've been waiting for a HuggingFace course my whole life." și "I hate this so much!"). Verificați dacă obțineți aceleași ID-uri de intrare pe care le-am obținut mai devreme!
 
 ## Decodificare[[decodificare]]
 
diff --git a/chapters/ro/chapter2/5.mdx b/chapters/ro/chapter2/5.mdx
index 9bd200369..82c84ceed 100644
--- a/chapters/ro/chapter2/5.mdx
+++ b/chapters/ro/chapter2/5.mdx
@@ -181,11 +181,8 @@ batched_ids = [ids, ids]
 
 Acesta este un batch de două secvențe identice!
 
-<Tip>
-
-✏️ **Încercați!** Convertiți această listă `batched_ids` într-un tensor și treceți-o prin modelul dumneavoastră. Verificați dacă obțineți aceleași logits ca înainte (dar de două ori)!
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Convertiți această listă `batched_ids` într-un tensor și treceți-o prin modelul dumneavoastră. Verificați dacă obțineți aceleași logits ca înainte (dar de două ori)!
 
 Batching-ul permite modelului să funcționeze atunci când îi furnizați mai multe secvențe. Utilizarea mai multor secvențe este la fel de simplă ca și crearea unui lot cu o singură secvență. Există însă o a doua problemă. Atunci când încercați să combinați două (sau mai multe) propoziții, acestea pot avea lungimi diferite. Dacă ați mai lucrat vreodată cu tensori, știți că aceștia trebuie să aibă o formă dreptunghiulară, deci nu veți putea converti direct lista de ID-uri de intrare într-un tensor. Pentru a rezolva această problemă, de obicei *umplem* datele de intrare.
 
@@ -317,11 +314,8 @@ Acum obținem aceeași logits pentru a doua propoziție din batch.
 
 Observăm cum ultima valoare a celei de-a doua secvențe este un ID de padding, care este o valoare 0 în masca de atenție.
 
-<Tip>
-
-✏️ **Încercați!** Aplicați manual tokenizarea pe cele două propoziții utilizate în secțiunea 2 ("I've been waiting for a HuggingFace course my whole life." și "I hate this so much!"). Treceți-le prin model și verificați dacă obțineți aceeași logiți ca în secțiunea 2. Acum grupați-le împreună folosind token-ul de padding, apoi creați masca de atenție corespunzătoare. Verificați dacă obțineți aceleași rezultate atunci când parcurgeți modelul!
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Aplicați manual tokenizarea pe cele două propoziții utilizate în secțiunea 2 ("I've been waiting for a HuggingFace course my whole life." și "I hate this so much!"). Treceți-le prin model și verificați dacă obțineți aceiași logiți ca în secțiunea 2. Acum grupați-le împreună folosind token-ul de padding, apoi creați masca de atenție corespunzătoare. Verificați dacă obțineți aceleași rezultate atunci când parcurgeți modelul!
 
 ## Secvențe mai lungi[[secvențe-mai-lungi]]
 
diff --git a/chapters/ro/chapter3/2.mdx b/chapters/ro/chapter3/2.mdx
index e6f391072..8662bc931 100644
--- a/chapters/ro/chapter3/2.mdx
+++ b/chapters/ro/chapter3/2.mdx
@@ -88,9 +88,8 @@ Hub-ul nu conține doar modele, ci și multe seturi de date în limbi diferite.
 
 Biblioteca 🤗 Datasets oferă o comandă foarte simplă pentru a descărca și stoca în cache un set de date pe Hub. Putem descărca setul de date MRPC astfel:
 
-<Tip>
-⚠️ **Atenție** Asigurați-vă că `datasets` este instalat prin rularea `pip install datasets`. Apoi, încărcați setul de date MRPC și tipăriți-l pentru a vedea ce conține.
-</Tip> 
+> [!TIP]
+> ⚠️ **Atenție** Asigurați-vă că `datasets` este instalat prin rularea `pip install datasets`. Apoi, încărcați setul de date MRPC și tipăriți-l pentru a vedea ce conține.
 
 ```py
 from datasets import load_dataset
@@ -149,11 +148,8 @@ raw_train_dataset.features
 
 În culise, `label` este de tipul `ClassLabel`, iar maparea numerelor întregi și numele etichetei este stocată în folderul *names*. `0` corespunde la `not_equivalent`, iar `1` corespunde la `equivalent`.
 
-<Tip>
-
-✏️ **Încercați!** Uitați-vă la elementul 15 din setul de antrenament și la elementul 87 din setul de validare. Care sunt etichetele lor?
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Uitați-vă la elementul 15 din setul de antrenament și la elementul 87 din setul de validare. Care sunt etichetele lor?
 
 ### Preprocesarea unui set de date[[preprocesarea-unui-set-de-date]]
 
@@ -191,11 +187,8 @@ inputs
 
 Am discutat despre cheile `input_ids` și `attention_mask` în [Capitolul 2](/course/chapter2), dar am amânat discuția despre `token_type_ids`. În acest exemplu, aceasta este ceea ce îi spune modelului care parte a intrării este prima propoziție și care este a doua propoziție.
 
-<Tip>
-
-✏️ **Încercați!** Luați elementul 15 din setul de antrenament și tokenizați cele două propoziții separat apoi ca pe o pereche. Care este diferența dintre cele două rezultate?
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Luați elementul 15 din setul de antrenament și tokenizați cele două propoziții separat apoi ca pe o pereche. Care este diferența dintre cele două rezultate?
 
 Dacă decodificăm ID-urile din `input_ids` înapoi în cuvinte:
 
@@ -352,11 +345,8 @@ Perfect! Acum că am trecut de la text brut la batch-uri cu care modelul nostru
 
 {/if}
 
-<Tip>
-
-✏️ **Încearcați!** Replicați preprocesarea pe setul de date GLUE SST-2. Acesta este puțin diferit, deoarece este compus din propoziții simple în loc de perechi, dar restul lucrurilor pe care le-am făcut ar trebui să fie la fel. Pentru o provocare mai dificilă, încercați să scrieți o funcție de preprocesare care să funcționeze pe oricare dintre sarcinile GLUE.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Replicați preprocesarea pe setul de date GLUE SST-2. Acesta este puțin diferit, deoarece este compus din propoziții simple în loc de perechi, dar restul lucrurilor pe care le-am făcut ar trebui să fie la fel. Pentru o provocare mai dificilă, încercați să scrieți o funcție de preprocesare care să funcționeze pe oricare dintre sarcinile GLUE.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/ro/chapter3/3.mdx b/chapters/ro/chapter3/3.mdx
index 5da0fc0dd..2368e6046 100644
--- a/chapters/ro/chapter3/3.mdx
+++ b/chapters/ro/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Dacă doriți să încărcați automat modelul în Hub în timpul instruirii, treceți `push_to_hub=True` în `TrainingArguments`. Vom afla mai multe despre acest lucru în [Capitolul 4](/course/chapter4/3).
-
-</Tip>
+> [!TIP]
+> 💡 Dacă doriți să încărcați automat modelul în Hub în timpul instruirii, treceți `push_to_hub=True` în `TrainingArguments`. Vom afla mai multe despre acest lucru în [Capitolul 4](/course/chapter4/3).
 
 Al doilea pas este să ne definim modelul. Ca și în [capitolul anterior](/course/chapter2), vom folosi clasa `AutoModelForSequenceClassification`, cu două etichete:
 
@@ -163,9 +160,6 @@ Modelul `Trainer` va funcționa din start pe mai multe GPU sau TPU și oferă o
 
 Aceasta încheie introducerea la reglarea fină cu ajutorul API-ului `Trainer`. În [Capitolul 7](/course/chapter7) va fi prezentat un exemplu de efectuare a acestei operații pentru cele mai comune sarcini NLP, dar pentru moment să analizăm cum se poate face același lucru în PyTorch pur.
 
-<Tip>
-
-✏️ **Încercați!** Ajustați un model pe setul de date GLUE SST-2, folosind procesarea datelor efectuată în secțiunea 2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Ajustați un model pe setul de date GLUE SST-2, folosind procesarea datelor efectuată în secțiunea 2.
 
diff --git a/chapters/ro/chapter3/3_tf.mdx b/chapters/ro/chapter3/3_tf.mdx
index 73d2fa64b..25cec3088 100644
--- a/chapters/ro/chapter3/3_tf.mdx
+++ b/chapters/ro/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ Veți observa că, spre deosebire de [Capitolul 2](/course/chapter2), veți prim
 
 Pentru a realiza fine-tuning-ul modelului pe setul nostru de date, trebuie doar să `compilăm()` modelul nostru și apoi să transmitem datele noastre metodei `fit()`. Aceasta va începe procesul de `fine-tuning` (care ar trebui să dureze câteva minute pe un GPU) și va raporta pierderea de formare pe parcurs, plus pierderea de validare la sfârșitul fiecărei epoci.
 
-<Tip>
-
-Rețineți că modelele 🤗 Transformers au o abilitate specială pe care majoritatea modelelor Keras nu o au - ele pot utiliza automat o valoare adecvată a pierderii pe care o calculează intern. Ele vor utiliza această valoare în mod implicit dacă nu setați un argument de pierdere în `compile()`. Rețineți că pentru a utiliza  valoarea internă a pierderii va trebui să transmiteți etichetele ca parte a datelor de intrare, nu ca etichetă separată, care este modul normal de utilizare a etichetelor cu modelele Keras. Veți vedea exemple în acest sens în partea 2 a cursului, unde definirea funcției de pierdere corecte poate fi complicată. Cu toate acestea, pentru clasificarea secvențelor, o funcție de pierdere Keras standard funcționează bine, așa că aceasta este cea pe care o vom utiliza aici.
-
-</Tip>
+> [!TIP]
+> Rețineți că modelele 🤗 Transformers au o abilitate specială pe care majoritatea modelelor Keras nu o au - ele pot utiliza automat o valoare adecvată a pierderii pe care o calculează intern. Ele vor utiliza această valoare în mod implicit dacă nu setați un argument de pierdere în `compile()`. Rețineți că pentru a utiliza valoarea internă a pierderii va trebui să transmiteți etichetele ca parte a datelor de intrare, nu ca etichetă separată, care este modul normal de utilizare a etichetelor cu modelele Keras. Veți vedea exemple în acest sens în partea 2 a cursului, unde definirea funcției de pierdere corecte poate fi complicată. Cu toate acestea, pentru clasificarea secvențelor, o funcție de pierdere Keras standard funcționează bine, așa că aceasta este cea pe care o vom utiliza aici.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Remarcați o problemă foarte des întâlnită aici - *puteți* doar să transmiteți numele pierderii ca șir de caractere către Keras, dar în mod implicit Keras va presupune că ați aplicat deja un softmax rezultatelor dvs. Cu toate acestea, multe modele produc valorile chiar înainte de aplicarea softmax, care sunt cunoscute și sub numele de *logits*. Trebuie să spunem funcției de pierdere că asta face modelul nostru, iar singura modalitate de a face acest lucru este să o apelăm direct, mai degrabă decât prin nume cu un șir.
-
-</Tip>
+> [!WARNING]
+> Remarcați o problemă foarte des întâlnită aici - *puteți* doar să transmiteți numele pierderii ca șir de caractere către Keras, dar în mod implicit Keras va presupune că ați aplicat deja un softmax rezultatelor dvs. Cu toate acestea, multe modele produc valorile chiar înainte de aplicarea softmax, care sunt cunoscute și sub numele de *logits*. Trebuie să spunem funcției de pierdere că asta face modelul nostru, iar singura modalitate de a face acest lucru este să o apelăm direct, mai degrabă decât prin nume cu un șir.
 
 
 ### Îmbunătățirea performanțelor de instruire[[Îmbunătățirea-performanțelor-de-instruire]]
@@ -130,11 +124,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-Biblioteca Transformers 🤗 are, de asemenea, o funcție `create_optimizer()` care va crea un optimizator `AdamW` cu rata de învățare în scădere. Aceasta este o scurtătură convenabilă pe care o veți vedea în detaliu în secțiunile viitoare ale cursului.
-
-</Tip>
+> [!TIP]
+> Biblioteca Transformers 🤗 are, de asemenea, o funcție `create_optimizer()` care va crea un optimizator `AdamW` cu rata de învățare în scădere. Aceasta este o scurtătură convenabilă pe care o veți vedea în detaliu în secțiunile viitoare ale cursului.
 
 Acum avem optimizatorul nostru complet nou și putem încerca să ne antrenăm cu el. În primul rând, să reîncărcăm modelul, pentru a reseta modificările aduse ponderilor în urma instruirii pe care tocmai am efectuat-o, iar apoi îl putem compila cu noul optimizator:
 
@@ -152,11 +143,8 @@ Acum, ne potrivim din nou:
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Dacă doriți să încărcați automat modelul dvs. în Hub în timpul instruirii, puteți trece un `PushToHubCallback` în metoda `model.fit()`. Vom afla mai multe despre acest lucru în [Capitolul 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 Dacă doriți să încărcați automat modelul dvs. în Hub în timpul instruirii, puteți trece un `PushToHubCallback` în metoda `model.fit()`. Vom afla mai multe despre acest lucru în [Capitolul 4](/course/chapter4/3).
 
 ### Predicțiile modelului[[predicțiile-modelului]]
 
diff --git a/chapters/ro/chapter3/4.mdx b/chapters/ro/chapter3/4.mdx
index 797e7d9c9..6ea1f50f7 100644
--- a/chapters/ro/chapter3/4.mdx
+++ b/chapters/ro/chapter3/4.mdx
@@ -197,11 +197,8 @@ metric.compute()
 
 Din nou, rezultatele voastre vor fi ușor diferite din cauza aleatorietății în inițializarea layer-ului final (model head) și a amestecării datelor, dar ar trebui să fie în aceeași zonă valorică.
 
-<Tip>
-
-✏️ **Încercați!** Modificați bucla de antrenament anterioară pentru a vă rafina modelul pe dataset-ul SST-2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Modificați bucla de antrenament anterioară pentru a vă rafina modelul pe dataset-ul SST-2.
 
 ### Îmbunătățiți circuitul de antrenament cu 🤗 Accelerate[[îmbunătățiți-circuitul-de-antrenament-cu-accelerate]]
 
@@ -291,9 +288,8 @@ Prima linie de adăugat este linia de import. A doua linie instanțiază un obie
 
 Apoi, partea principală a muncii este făcută în linia care trimite dataloaders, modelul și optimizer-ul la `accelerator.prepare()`. Aceasta va împacheta acele obiecte în containerul potrivit pentru a vă asigura că antrenarea distribuită funcționează corespunzător. Restul modificărilor constau în eliminarea liniei care mută batch-ul pe `device` (din nou, dacă doriți să o păstrați puteți doar să o schimbați să folosească `accelerator.device`) și înlocuirea `loss.backward()` cu `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ Pentru a beneficia de creșterea vitezei oferită de Cloud TPU-uri, vă recomandăm să împachetați mostrele la o lungime fixă folosind argumentele `padding="max_length"` și `max_length` ale tokenizer-ului.
-</Tip>
+> [!TIP]
+> ⚠️ Pentru a beneficia de creșterea vitezei oferită de Cloud TPU-uri, vă recomandăm să împachetați mostrele la o lungime fixă folosind argumentele `padding="max_length"` și `max_length` ale tokenizer-ului.
 
 Dacă vreți să copiați și să lipiți pentru a vă juca, iată cum arată bucla completă de antrenament cu 🤗 Accelerate:
 
diff --git a/chapters/ro/chapter4/2.mdx b/chapters/ro/chapter4/2.mdx
index bbe94669b..d24191960 100644
--- a/chapters/ro/chapter4/2.mdx
+++ b/chapters/ro/chapter4/2.mdx
@@ -92,7 +92,6 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-În momentul în care folosiți un model preantrenat, asigurați-vă să verificați cum a fost antrenat și pe ce date se bazează. De asemenea, trebuie să cunoasceți limitele și prejudecățile sale. Toată această informație va fi indicată pe cartea modelului.
-</Tip>
+> [!TIP]
+> În momentul în care folosiți un model preantrenat, asigurați-vă că verificați cum a fost antrenat și pe ce date se bazează. De asemenea, trebuie să cunoașteți limitele și prejudecățile sale. Toată această informație va fi indicată pe cartea modelului.
 
diff --git a/chapters/ro/chapter4/3.mdx b/chapters/ro/chapter4/3.mdx
index dc0a68d61..2f8bebb05 100644
--- a/chapters/ro/chapter4/3.mdx
+++ b/chapters/ro/chapter4/3.mdx
@@ -171,11 +171,8 @@ Apăsați pe tabul "Files and versions", iar acum ar trebui să vedeți fișiere
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Încercați!** Luați modelul și tokenizerul asociat cu checkpointul `bert-base-cased` și încărcați-l pe un repo în namespace-ul tău folosind metoda `push_to_hub()`. Verificați că repo-ul apare  în pagina dumneavoastră înainte de a-l șterge.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Luați modelul și tokenizerul asociate cu checkpointul `bert-base-cased` și încărcați-le pe un repo în namespace-ul dumneavoastră folosind metoda `push_to_hub()`. Verificați că repo-ul apare în pagina dumneavoastră înainte de a-l șterge.
 
 Ați văzut că metoda `push_to_hub()` acceptă mai multe argumente, ceea ce permite încărcarea într-un repository specific sau namespace al unei organizații, sau utilizarea unui token API diferit. Vă recomandăm să vă uitați la specificația metodei disponibilă direct în [documentația 🤗 Transformers](https://huggingface.co/transformers/model_sharing) pentru a înțelge ceea ce este posibil.
 
@@ -462,9 +459,8 @@ Dacă priviți la dimensiunile fișierelor (de exemplu, folosind `ls -lh`), ar t
 
 {/if}
 
-<Tip>
-✏️ Dacă creați repositoriul folosind interfața web, fișierul *.gitattributes* va fi automat configurat pentru a considera anumite extensii de fișiere, precum *.bin* și *.h5*, ca fiind fișiere mari. În acest caz nu este necesară nici o setare suplimentară din partea ta, pentru că git-lfs le va urmări automat.
-</Tip>
+> [!TIP]
+> ✏️ Dacă creați repositoriul folosind interfața web, fișierul *.gitattributes* va fi automat configurat pentru a considera anumite extensii de fișiere, precum *.bin* și *.h5*, ca fiind fișiere mari. În acest caz nu este necesară nicio setare suplimentară din partea dumneavoastră, pentru că git-lfs le va urmări automat.
 
 Acum putem continua procesul în felul nostru obișnuit cu repositoriurile Git tradiționale. Putem adăuga toate fișierele în mediul de stocare a Git folosind comanda `git add`:
 
diff --git a/chapters/ro/chapter5/2.mdx b/chapters/ro/chapter5/2.mdx
index 9ed66fd46..1605b94ff 100644
--- a/chapters/ro/chapter5/2.mdx
+++ b/chapters/ro/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 Putem vedea că fișierele comprimate au fost înlocuite cu _SQuAD_it-train.json_ și _SQuAD_it-test.json_, și că datele sunt stocate în formatul JSON.
 
-<Tip>
-
-✎ Dacă vă întrebați de ce există un caracter `!` în comenzile shell de mai sus, aceasta este pentru că le rulăm într-un notebook Jupyter. Pur și simplu eliminați prefixul dacă doriți să descărcați și să dezarhivați datasetul într-un terminal.
-
-</Tip>
+> [!TIP]
+> ✎ Dacă vă întrebați de ce există un caracter `!` în comenzile shell de mai sus, aceasta este pentru că le rulăm într-un notebook Jupyter. Pur și simplu eliminați prefixul dacă doriți să descărcați și să dezarhivați datasetul într-un terminal.
 
 Pentru a încărca un fișier JSON cu funcția `load_dataset()`, trebuie doar să știm dacă avem de-a face cu JSON obișnuit (similar cu un dicționar imbricat) sau JSON Lines (JSON separat pe linii). Ca multe dataseturi de întrebări și răspunsuri, SQuAD-it folosește formatul imbricat, cu tot textul stocat într-un câmp `data`. Aceasta înseamnă că putem încărca datasetul specificând argumentul `field` după cum urmează:
 
@@ -126,11 +123,8 @@ DatasetDict({
 
 Acesta este exact ceea ce am dorit. Acum, putem aplica diverse tehnici de preprocesare pentru a curăța datele, tokeniza recenziile și așa mai departe.
 
-<Tip>
-
-Argumentul `data_files` al funcției `load_dataset()` este destul de flexibil și poate fi fie o singură cale de fișier, o listă de căi de fișiere, sau un dicționar care mapează numele spliturilor la căile fișierelor. De asemenea, puteți folosi glob pentru fișiere care se potrivesc unui model specificat conform regulilor folosite de shell-ul Unix (de exemplu, puteți face glob pentru toate fișierele JSON dintr-un director ca un singur split prin setarea `data_files="*.json"`). Consultați [documentația](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets pentru mai multe detalii.
-
-</Tip>
+> [!TIP]
+> Argumentul `data_files` al funcției `load_dataset()` este destul de flexibil și poate fi fie o singură cale de fișier, o listă de căi de fișiere, sau un dicționar care mapează numele spliturilor la căile fișierelor. De asemenea, puteți folosi glob pentru fișiere care se potrivesc unui model specificat conform regulilor folosite de shell-ul Unix (de exemplu, puteți face glob pentru toate fișierele JSON dintr-un director ca un singur split prin setarea `data_files="*.json"`). Consultați [documentația](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets pentru mai multe detalii.
 
 Scripturile de încărcare din 🤗 Datasets suportă de fapt decomprimarea automată a fișierelor de intrare, deci am fi putut să sărim peste folosirea `gzip` prin indicarea argumentului `data_files` direct către fișierele comprimate:
 
@@ -158,11 +152,8 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 Aceasta returnează același obiect `DatasetDict` obținut mai sus, dar ne economisește pasul de a descărca și decomprima manual fișierele _SQuAD_it-*.json.gz_. Aceasta încheie incursiunea noastră în diversele modalități de încărcare a dataseturilor care nu sunt găzduite pe Hugging Face Hub. Acum că avem un dataset cu care să ne jucăm, să explorăm diverse tehnici de manipulare a datelor!
 
-<Tip>
-
-✏️ **Încercați!** Alegeți un alt dataset găzduit pe GitHub sau [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) și încercați să îl încărcați atât local cât și remote folosind tehnicile introduse mai sus. Pentru puncte bonus, încercați să încărcați un dataset care este stocat în format CSV sau text (consultați [documentația](https://huggingface.co/docs/datasets/loading#local-and-remote-files) pentru mai multe informații despre aceste formate).
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Alegeți un alt dataset găzduit pe GitHub sau [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) și încercați să îl încărcați atât local cât și remote folosind tehnicile introduse mai sus. Pentru puncte bonus, încercați să încărcați un dataset care este stocat în format CSV sau text (consultați [documentația](https://huggingface.co/docs/datasets/loading#local-and-remote-files) pentru mai multe informații despre aceste formate).
 
 
 
diff --git a/chapters/ro/chapter5/3.mdx b/chapters/ro/chapter5/3.mdx
index b79c2453a..c904c52a9 100644
--- a/chapters/ro/chapter5/3.mdx
+++ b/chapters/ro/chapter5/3.mdx
@@ -89,11 +89,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Încearcă!** Folosiți funcția `Dataset.unique()` pentru a găsi numărul de medicamente și condiții unice în seturile de antrenare și testare.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Folosiți funcția `Dataset.unique()` pentru a găsi numărul de medicamente și condiții unice în seturile de antrenare și testare.
 
 În continuare, vom normaliza toate `condition` labels folosind `Dataset.map()`. La fel cum am făcut cu tokenizarea în [Capitolul 3](/course/chapter3), putem defini o funcție simplă care poate fi aplicată pe toate rândurile fiecărui split din `drug_dataset`:
 
@@ -219,11 +216,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Precum am presupus, unele recenii conțin doar un singur cuvânt, ceea ce, deși ar putea fi OK pentru analiza sentimentului, nu ar fi informativ dacă vrem să prezicem condiției.
 
-<Tip>
-
-🙋 O alternativă la adăugarea unei noi coloane într-un dataset este funcția `Dataset.add_column()`. Aceasta permite să oferiți coloana ca o listă Python sau array NumPy și poate fi utilă în situații în care `Dataset.map()` nu este bine adaptat pentru analiza dumneavoastră.
-
-</Tip>
+> [!TIP]
+> 🙋 O alternativă la adăugarea unei noi coloane într-un dataset este funcția `Dataset.add_column()`. Aceasta permite să oferiți coloana ca o listă Python sau array NumPy și poate fi utilă în situații în care `Dataset.map()` nu este bine adaptat pentru analiza dumneavoastră.
 
 Hai să folosim funcția `Dataset.filter()` pentru a elimina recenziile care conțin mai puțin de 30 de cuvinte. Similar cum am făcut în cazul coloanei `condition`, putem elimina recenziile foarte scurte cerând ca recenziile să aibă o lungime mai mare decât acest prag:
 
@@ -238,11 +232,8 @@ print(drug_dataset.num_rows)
 
 După cum vedeți, aceasta a eliminat aproximativ 15% din recenziile noastrem, din seturile originale de antrenare și testare.
 
-<Tip>
-
-✏️ **Încercați!** Folosiți funcția `Dataset.sort()` pentru a inspecta recenziile cu cele mai mari numere de cuvinte. Vezi [documentația](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) pentru a vedea ce argument trebuie să folosești pentru a sorta recenziile în ordine descrescătoare.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Folosiți funcția `Dataset.sort()` pentru a inspecta recenziile cu cele mai mari numere de cuvinte. Consultați [documentația](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) pentru a vedea ce argument trebuie să folosiți pentru a sorta recenziile în ordine descrescătoare.
 
 Ultima chestie de care trebuie să ne ocupăm este prezența caracterelor HTML în recenziile noastre. Putem folosi modulul `html` din Python pentru a face unescape acestor caractere:
 
@@ -299,11 +290,8 @@ Așa cum am văzut în [Capitolul 3](/course/chapter3), putem transmite un singu
 
 Puteți și să măsurați un întreg cell prin scrierea `%%time` la începutul celulei. În hardware-ul pe care l-am executat, acest lucru a arătat 10.8s pentru această instrucție (este numărul scris după "Wall time").
 
-<Tip>
-
-✏️ **Încercați!** Executați aceeași instrucție cu și fără `batched=True`, apoi încercați-o cu un tokenizer lent (adaugați `use_fast=False` în metoda `AutoTokenizer.from_pretrained()`), astfel să puteți vedea ce numere obțineți pe hardwareul vostru.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Executați aceeași instrucție cu și fără `batched=True`, apoi încercați-o cu un tokenizer lent (adăugați `use_fast=False` în metoda `AutoTokenizer.from_pretrained()`), astfel încât să puteți vedea ce numere obțineți pe hardware-ul vostru.
 
 Aici sunt rezultatele pe care le-am obținut cu și fără batching, folosind un tokenizer rapid și lent:
 
@@ -522,11 +510,8 @@ drug_dataset["train"][:3]
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 În spatele scenei, `Dataset.set_format()` schimbă formatul returnat pentru  `__getitem__()` dunder method a datasetului. Asta înseamnă că atunci când dorim să creăm un nou obiect ca `train_df` dintr-un `Dataset` în formatul `"pandas"`, trebuie să tăiem întreg datasetul pentru a obține un `pandas.DataFrame`. Puteți verifica voi înşivă că tipul lui `drug_dataset["train"]` este `Dataset`, indiferent de output format.
-
-</Tip>
+> [!TIP]
+> 🚨 În spatele scenei, `Dataset.set_format()` schimbă formatul returnat pentru `__getitem__()` dunder method a datasetului. Asta înseamnă că atunci când dorim să creăm un nou obiect ca `train_df` dintr-un `Dataset` în formatul `"pandas"`, trebuie să tăiem întreg datasetul pentru a obține un `pandas.DataFrame`. Puteți verifica voi înșivă că tipul lui `drug_dataset["train"]` este `Dataset`, indiferent de output format.
 
 Acum putem utiliza toate funcționalitățile Pandas pe care le dorim. De exemplu, putem face fancy chaining pentru a calcula distribuția clasei printre intrările `condition`:
 
@@ -590,11 +575,8 @@ freq_dataset
 ```
 
 
-<Tip>
-
-✏️ **Încercați!** Calculați media ratingului per medicament și salvați rezultatul într-un nou `Dataset`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Calculați media ratingului per medicament și salvați rezultatul într-un nou `Dataset`.
 
 
 Acest lucru completează turul nostru de tehnici de preprocesare disponibile în 🤗 Datasets. Pentru a finisa secțiunea, vom crea un set de validare pentru a pregăti datasetul pentru antrenarea unui clasificator. Înainte de a face asta, noi vom reseta output formatul `drug_dataset` de la `"pandas"` la `"arrow" :
diff --git a/chapters/ro/chapter5/4.mdx b/chapters/ro/chapter5/4.mdx
index 9375a8af4..5eaccf3bf 100644
--- a/chapters/ro/chapter5/4.mdx
+++ b/chapters/ro/chapter5/4.mdx
@@ -44,11 +44,8 @@ Dataset({
 
 Putem observa că există 15.518.009 de linii și două coloane în datasetul nostru – e foarte mult!
 
-<Tip>
-
-✎ De abia acum, 🤗 Datasets va descompresa fișierele necesare pentru încărcarea datasetului. Dacă doriți să salvați spațiu pe hard drive-ul dvs. , puteți transmite `DownloadConfig(delete_extracted=True)` la argumentul `download_config` al `load_dataset()`. Vedeți mai multe detalii în [documentație](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig).
-
-</Tip>
+> [!TIP]
+> ✎ De abia acum, 🤗 Datasets va decomprima fișierele necesare pentru încărcarea datasetului. Dacă doriți să salvați spațiu pe hard drive-ul dvs., puteți transmite `DownloadConfig(delete_extracted=True)` la argumentul `download_config` al `load_dataset()`. Vedeți mai multe detalii în [documentație](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig).
 
 Acum hai să analizăm conținutul primei linii:
 
@@ -100,11 +97,8 @@ Dataset size (cache file) : 19.54 GB
 
 Nice – deși este aproximativ 20 GB, putem încărca și accesa datasetul cu mult mai puțin RAM!
 
-<Tip>
-
-✏️ **Încercați!** Alegeți una dintre [subseturile](https://the-eye.eu/public/AI/pile_preliminary_components/) din Pile care este mai mare decât memoria RAM a laptopului sau dispozitivului tău, încărcați-o cu 🤗 Datasets și măsurați cantitatea de memorie folosită. Pentru o măsurare precisă, veți dori să faceți acest lucru într-un proces nou. Puteți găsi dimensiunile decomprimate ale fiecărui subset în Tabelul 1 din [Pile paper](https://arxiv.org/abs/2101.00027). 
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Alegeți unul dintre [subseturile](https://the-eye.eu/public/AI/pile_preliminary_components/) din Pile care este mai mare decât memoria RAM a laptopului sau dispozitivului dumneavoastră, încărcați-l cu 🤗 Datasets și măsurați cantitatea de memorie folosită. Pentru o măsurare precisă, veți dori să faceți acest lucru într-un proces nou. Puteți găsi dimensiunile decomprimate ale fiecărui subset în Tabelul 1 din [Pile paper](https://arxiv.org/abs/2101.00027).
 
 Dacă sunteți familiarizați cu Pandas, rezultatul acesta poate veni ca o surpriză din cauza celebrei [rule of thumbă](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) al lui Wes Kinney, care spune că în mod normal aveți nevoie de 5 până la 10 ori mai mult spațiu pe RAM decât mărimea datasetului. Deci 🤗 Datasets această problemă de memory management? 🤗 Datasets tratează fiecare dataset ca un [memory-mapped file](https://en.wikipedia.org/wiki/Memory-mapped_file), care oferă un mapping între spațiul RAM și stocarea pe sistem, ceea ce permite bibliotecii să acceseze și să opereze asupra elementelor datasetului fără a trebui să-l încarce în totalitate în memorie.
 
@@ -132,11 +126,8 @@ print(
 
 Aici am folosit modulul `timeit` al Python pentru a măsura timpul de execuție necesar pentru a rula `code_snippet`. În mod normal veți putea trece peste un dataset la viteze de câteva sute de MB/s până la câțiva GB/s. Acest lucru funcționează bine pentru majoritatea aplicațiilor, dar uneori veți avea nevoie să lucrați cu un dataset care este prea mare ca să încapă pe hard driveul laptopului tău. De exemplu, dacă am încerca să descarcăm Pile în întregime, am avea nevoie de 825 GB de spațiu liber! Pentru a vă ajuta cu astfel de cazuri, 🤗 Datasets oferă o feature de streaming care permite accesarea și descărcarea elementelor, fără a trebui să descărcați întregul dataset. Hai să vedem cum funcționează!
 
-<Tip>
-
-💡În Jupyter notebooks poți să măsori timpul unei celule utilizând [funcția magică `%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 În Jupyter notebooks poți să măsori timpul unei celule utilizând [funcția magică `%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Streamingul dataseturilor[[streaming-datasets]]
 
@@ -173,11 +164,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Pentru a accelera tokenizarea cu streaming puteți seta `batched=True`, ca și în secțiunea precedentă. Acest lucru va procesa exemplele, batch cu batch; dimensiunea implicită a batchului este de 1,000 și poate fi specificată cu argumentul `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Pentru a accelera tokenizarea cu streaming puteți seta `batched=True`, ca și în secțiunea precedentă. Acest lucru va procesa exemplele batch cu batch; dimensiunea implicită a batchului este de 1.000 și poate fi specificată cu argumentul `batch_size`.
 
 De asemenea, puteți amesteca un streamed dataset utilizând `IterableDataset.shuffle()`, dar față de `Dataset.shuffle()` acest lucru va amesteca doar elementele dintr-un `buffer_size` predefinit:
 
@@ -278,10 +266,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Încercați!** Utilizați unul dintre cele mari corpusuri Common Crawl ca [`mc4`](https://huggingface.co/datasets/mc4) sau [`oscar`](https://huggingface.co/datasets/oscar) pentru a crea un streaming dataset multilingv care reprezintă proporția limbii vorbite într-o țară aleasă de tine. De exemplu, cele patru limbi naționale din Elveția sunt germana, franceza, italiana și romansha, așadar puteți încerca să creați un corpus elvețian prin samplingul subseturilor Oscar în funcție de proporția lor vorbită.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Utilizați unul dintre corpusurile mari Common Crawl, ca [`mc4`](https://huggingface.co/datasets/mc4) sau [`oscar`](https://huggingface.co/datasets/oscar), pentru a crea un streaming dataset multilingv care reprezintă proporția limbilor vorbite într-o țară aleasă de dumneavoastră. De exemplu, cele patru limbi naționale din Elveția sunt germana, franceza, italiana și retoromana, așadar puteți încerca să creați un corpus elvețian prin samplingul subseturilor Oscar în funcție de proporția în care sunt vorbite.
 
 Acum aveți toate instrumentele necesare pentru a încărca și procesa dataseturi de orice formă și dimensiune – dar, din păcate, va veni un moment în care veți trebui să creați voi înșivă un dataset pentru a rezolva problema pe care o aveți. Acesta este subiectul următoarei secțiuni!
diff --git a/chapters/ro/chapter5/5.mdx b/chapters/ro/chapter5/5.mdx
index ce2dbbbc7..310b2793d 100644
--- a/chapters/ro/chapter5/5.mdx
+++ b/chapters/ro/chapter5/5.mdx
@@ -113,11 +113,8 @@ response.json()
 
 Uau, aceasta e o cantitate mare de informație! Putem vedea câmpuri utile cum ar fi `title`, `body` și `number` care descriu problema, precum și informații despre utilizatorul GitHub care a deschis issue-ul.
 
-<Tip>
-
-✏️ **Încercați!** Faceți clic pe câteva dintre URL-urile din payload-ul JSON de mai sus pentru a vă familiariza cu tipul de informații către care se face referire pentru fiecare GitHub issue.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Faceți clic pe câteva dintre URL-urile din payload-ul JSON de mai sus pentru a vă familiariza cu tipul de informații către care se face referire pentru fiecare GitHub issue.
 
 După cum este descris în [documentația](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) GitHub, solicitările neautentificate sunt limitate la 60 de solicitări pe oră. Deși puteți crește `per_page` query parameter pentru a reduce numărul de solicitări pe care le faceți, oricum veți atinge limita pentru orice repository care are mai mult de câteva mii de issues. Prin urmare, ar trebui să urmați [instrucțiunile](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) GitHub pentru crearea unui _personal access token_ astfel încât să puteți crește limita la 5.000 de solicitări pe oră. Odată ce aveți tokenul, îl puteți include ca parte a request header:
 
@@ -126,11 +123,8 @@ GITHUB_TOKEN = xxx  # Copy your GitHub token here
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ Nu oferiți nimănui un notebook cu `GITHUB_TOKEN` în el . Vă recomandăm să ștergeți ultima celulă odată ce ați executat-o pentru a evita scurgerea accidentală a acestor informații. Chiar mai bine, stocați tokenul într-un fișier *.env* și utilizați biblioteca `python-dotenv` pentru a îl încărca automat ca variabilă de mediu.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nu oferiți nimănui un notebook cu `GITHUB_TOKEN` în el. Vă recomandăm să ștergeți ultima celulă odată ce ați executat-o pentru a evita scurgerea accidentală a acestor informații. Chiar mai bine, stocați tokenul într-un fișier *.env* și utilizați biblioteca `python-dotenv` pentru a-l încărca automat ca variabilă de mediu.
 
 Acum că avem tokenul de acces, hai să creăm o funcție care să poată descărca toate issue-urile dintr-un repositoriu GitHub:
 
@@ -237,11 +231,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **Încercați!** Calculați timpul mediu necesar pentru închiderea issue-urilor în Datasets. Vă poate fi utilă funcția `Dataset.filter()` pentru a filtra pull requesturile și issue-urile deschise, și puteți utiliza funcția `Dataset.set_format()` pentru a converti datasetul într-un `DataFrame` astfel încât să puteți manipula cu ușurință timestampurile `created_at` și `closed_at`. Pentru puncte bonus, calculați timpul mediu necesar pentru închiderea pull requesturilor.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Calculați timpul mediu necesar pentru închiderea issue-urilor în Datasets. Vă poate fi utilă funcția `Dataset.filter()` pentru a filtra pull requesturile și issue-urile deschise, și puteți utiliza funcția `Dataset.set_format()` pentru a converti datasetul într-un `DataFrame` astfel încât să puteți manipula cu ușurință timestampurile `created_at` și `closed_at`. Pentru puncte bonus, calculați timpul mediu necesar pentru închiderea pull requesturilor.
 
 Deși am putea continua să curățăm datasetul prin eliminarea sau redenumirea unor coloane, este, în general, o practică bună să păstrăm datasetul cât mai "raw" posibil la acest stadiu, astfel încât să poată fi utilizat ușor în multiple aplicații.
 
@@ -363,11 +354,8 @@ Dataset({
 
 Cool, am încărcat datasetul nostru pe Hub și acum este disponibil pentru alții să îl utilizeze! Mai este doar un lucru important de făcut: adăugarea unui _dataset card_ care explică cum a fost creat corpusul și oferă alte informații utile pentru comunitate.
 
-<Tip>
-
-💡 De asemenea, puteți încărca un dataset pe Hugging Face Hub direct din terminal utilizând `huggingface-cli` și puțină magie Git. Consultați [ghidul 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) pentru detalii despre cum puteți face asta.
-
-</Tip>
+> [!TIP]
+> 💡 De asemenea, puteți încărca un dataset pe Hugging Face Hub direct din terminal utilizând `huggingface-cli` și puțină magie Git. Consultați [ghidul 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) pentru detalii despre cum puteți face asta.
 
 ## Crearea unei dataset card[[creating-a-dataset-card]]
 
@@ -389,16 +377,10 @@ Puteți crea fișierul *README.md* direct pe Hub și puteți găsi un template p
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="Dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **Încercați!** Utilizați aplicația `dataset-tagging` și [ghidul 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) pentru a completa fișierul *README.md* pentru datasetul de probleme GitHub.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Utilizați aplicația `dataset-tagging` și [ghidul 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) pentru a completa fișierul *README.md* pentru datasetul de probleme GitHub.
 
 Astfel, am văzut în această secțiune că crearea unui dataset bun poate fi destul de complicată, dar, spre norocul nsotru, încărcarea și oferirea acestuia comunității nu sunt. În secțiunea următoare, vom utiliza datasetul nou pentru a crea un motor de căutare semantic cu 🤗 Datasets care poate să asocieze întrebări cu cele mai relevante issues și comentarii.
 
-<Tip>
-
-✏️ **Încercați!** Treceți prin pașii pe care i-am făcut în această secțiune pentru a crea un dataset de issues GitHub pentru o biblioteca open source care îți place(alegeți altceva înafară de 🤗 Datasets, desigur!). Pentru puncte bonus, faceți fine-tune unui multilabel classifier pentru a prezice tagurile prezente în câmpul `labels`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Treceți prin pașii pe care i-am făcut în această secțiune pentru a crea un dataset de issues GitHub pentru o bibliotecă open source care vă place (alegeți altceva în afară de 🤗 Datasets, desigur!). Pentru puncte bonus, faceți fine-tune unui multilabel classifier pentru a prezice tagurile prezente în câmpul `labels`.
diff --git a/chapters/ro/chapter5/6.mdx b/chapters/ro/chapter5/6.mdx
index 2fce7ed1e..f917fdeac 100644
--- a/chapters/ro/chapter5/6.mdx
+++ b/chapters/ro/chapter5/6.mdx
@@ -176,11 +176,8 @@ Dataset({
 Okay, acest lucru ne-a oferit câteva mii de comentarii cu care să lucrăm!
 
 
-<Tip>
-
-✏️ **Încercați!** Vezi dacă poți utiliza `Dataset.map()` pentru a exploda coloana `comments` din `issues_dataset` _fără_ a recurge la utilizarea Pandas. Acest lucru este puțin dificil; s-ar putea să găsiți utilă secțiunea ["Mapping batch"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) din documentația 🤗 Datasets pentru această sarcină.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Vezi dacă poți utiliza `Dataset.map()` pentru a exploda coloana `comments` din `issues_dataset` _fără_ a recurge la utilizarea Pandas. Acest lucru este puțin dificil; s-ar putea să găsiți utilă secțiunea ["Mapping batch"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) din documentația 🤗 Datasets pentru această sarcină.
 
 Acum că avem un singur comentariu pe rând, să creăm o nouă coloană `comments_length` care conține numărul de cuvinte din fiecare comentariu:
 
@@ -510,8 +507,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Nu-i rău! A doua încercare se pare că se potrivește cu query-ul!
 
-<Tip>
-
-✏️ **Încearcă!** Creează propriul tău query și vezi dacp poți găsi un răspuns și să extragi documentele. S-ar putea să trebuiești să crești parametrul `k` în `Dataset.get_nearest_examples()` pentru a mări căutarea.
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️ **Încearcă!** Creează propriul tău query și vezi dacă poți găsi un răspuns și să extragi documentele. S-ar putea să trebuiască să crești parametrul `k` în `Dataset.get_nearest_examples()` pentru a lărgi căutarea.
\ No newline at end of file
diff --git a/chapters/ro/chapter6/2.mdx b/chapters/ro/chapter6/2.mdx
index 955c275c1..fca53211e 100644
--- a/chapters/ro/chapter6/2.mdx
+++ b/chapters/ro/chapter6/2.mdx
@@ -11,11 +11,8 @@ Dacă un model de limbaj nu este disponibil în limba dorită sau dacă corpusul
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ Antrenarea unui tokenizer nu este același lucru ca antrenarea unui model! Antrenarea modelului folosește stochastic gradient descent pentru a face pierderea puțin mai mică pentru fiecare batch. Este randomizată prin natură (ceea ce înseamnă că trebuie să setați niște seeduri pentru a obține aceleași rezultate atunci când faceți aceeași antrenare de două ori). Antrenarea unui tokenizer este un proces statistic care încearcă să identifice care subcuvinte sunt cele mai bune pentru a fi selectate pentru un anumit corpus, și regulile exacte utilizate pentru a le selecta depind de algoritmul de tokenizare. Este determinist, ceea ce înseamnă că întotdeauna obțineți aceleași rezultate atunci când antrenați cu același algoritm pe același corpus.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Antrenarea unui tokenizer nu este același lucru ca antrenarea unui model! Antrenarea modelului folosește stochastic gradient descent pentru a face pierderea puțin mai mică pentru fiecare batch. Este randomizată prin natură (ceea ce înseamnă că trebuie să setați niște seeduri pentru a obține aceleași rezultate atunci când faceți aceeași antrenare de două ori). Antrenarea unui tokenizer este un proces statistic care încearcă să identifice care subcuvinte sunt cele mai bune pentru a fi selectate pentru un anumit corpus, și regulile exacte utilizate pentru a le selecta depind de algoritmul de tokenizare. Este determinist, ceea ce înseamnă că întotdeauna obțineți aceleași rezultate atunci când antrenați cu același algoritm pe același corpus.
 
 ## Asamblarea unui corpus[[assembling-a-corpus]]
 
diff --git a/chapters/ro/chapter6/3.mdx b/chapters/ro/chapter6/3.mdx
index 6f79c5ab7..a2a440c98 100644
--- a/chapters/ro/chapter6/3.mdx
+++ b/chapters/ro/chapter6/3.mdx
@@ -33,11 +33,8 @@
 `batched=True`  | 10,8s          | 4min41s
 `batched=False` | 59,2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ Atunci când tokenizați o singură propoziție, nu veți vedea întotdeauna o diferență de viteză între versiunea lentă și rapidă ale aceluiași tokenizer. De fapt, versiunea rapidă poate fi chiar mai lentă! Abia atunci când tokenizați multe texte în paralel, în același timp, veți putea observa clar diferența.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Atunci când tokenizați o singură propoziție, nu veți vedea întotdeauna o diferență de viteză între versiunea lentă și rapidă ale aceluiași tokenizer. De fapt, versiunea rapidă poate fi chiar mai lentă! Abia atunci când tokenizați multe texte în paralel, în același timp, veți putea observa clar diferența.
 
 ## Batch encoding[[batch-encoding]]
 
@@ -106,13 +103,10 @@ encoding.word_ids()
 
 Putem vedea că tokenizerul are tokeni speciali `[CLS]` și `[SEP]` care sunt mapped la `None`, iar apoi fiecare token este mapped la cuvântul din care provine. Acest lucru este deosebit de util pentru a determina dacă un token este la începutul unui cuvânt sau dacă două tokenuri sunt în același cuvânt. Ne-am putea baza pe prefixul `##` pentru aceasta, dar funcționează doar pentru tokenizeri de tip BERT; această metodă funcționează pentru orice tip de tokenizator, atâta timp cât este unul rapid. În capitolul următor, vom vedea cum putem utiliza această capabilitate pentru a aplica labeluri pe care le avem pentru fiecare cuvânt în mod corespunzător tokenurilor în sarcini precum named entity recognition (NER) și part-of-speech (POS). De asemenea, îl putem utiliza pentru a face mask tuturor tokenurilor care provin din același cuvânt în masked language modeling(o tehnică numită _whole word masking_).
 
-<Tip>
-
-Noțiunea de ceea ce este un cuvânt este complicată. De exemplu, "I'll" (o prescurtare a "I will") contează ca unul sau două cuvinte? Acest lucru depinde de tokenizer și de operațiunea de pre-tokenizare pe care o aplică. Unii tokenizeri se divid doar pe spații, așa că vor considera acest lucru ca un singur cuvânt. Alții folosesc punctuația pe lângă spații, deci vor considera două cuvinte.
-
-✏️ **Încercați!** Creați un tokenizer din checkpointurile `bert-base-cased` și `roberta-base` și tokenizați "81s" cu ele. Ce observați? Care sunt ID-urile cuvintelor?
-
-</Tip>
+> [!TIP]
+> Noțiunea de ceea ce este un cuvânt este complicată. De exemplu, "I'll" (o prescurtare a "I will") contează ca unul sau două cuvinte? Acest lucru depinde de tokenizer și de operațiunea de pre-tokenizare pe care o aplică. Unii tokenizeri se divid doar pe spații, așa că vor considera acest lucru ca un singur cuvânt. Alții folosesc punctuația pe lângă spații, deci vor considera două cuvinte.
+>
+> ✏️ **Încercați!** Creați un tokenizer din checkpointurile `bert-base-cased` și `roberta-base` și tokenizați "81s" cu ele. Ce observați? Care sunt ID-urile cuvintelor?
 
 În mod similar, există o metodă `sentence_ids()` pe care o putem utiliza pentru a face map unui token la propoziția din care provine (deși, în acest caz, `token_type_ids` returnate de tokenizer ne pot oferi aceeași informație).
 
@@ -129,11 +123,8 @@ Sylvain
 
 Așa cum am menționat anterior, toate acestea sunt posibile datorită faptului că tokenizerul rapid ține evidența spanului de text de la care provine fiecare token într-o listă de *offseturi*. Pentru a ilustra modul în care se utilizează acestea, în continuare vă vom arăta cum să replicați rezultatele pipelineului `token-classification` manual.
 
-<Tip>
-
-✏️ **Încercați!** Creați propriul exemplu de text și încercați să înțelegeți care tokenuri sunt asociate cu ID-ul cuvântului și, de asemenea, cum să extrageți spanurile pentru singur cuvânt. Pentru puncte bonus, încercați să utilizați două propoziții ca inputuri și să vedeți dacă ID-urile propozițiilor au sens pentru voi.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Creați propriul exemplu de text și încercați să înțelegeți care tokenuri sunt asociate cu ID-ul cuvântului și, de asemenea, cum să extrageți spanurile pentru un singur cuvânt. Pentru puncte bonus, încercați să utilizați două propoziții ca inputuri și să vedeți dacă ID-urile propozițiilor au sens pentru voi.
 
 ## În interiorul pipelineului `token-classification`[[inside-the-token-classification-pipeline]]
 
diff --git a/chapters/ro/chapter6/3b.mdx b/chapters/ro/chapter6/3b.mdx
index 0a7997708..55f2c0e9c 100644
--- a/chapters/ro/chapter6/3b.mdx
+++ b/chapters/ro/chapter6/3b.mdx
@@ -276,11 +276,8 @@ Nu am terminat încă, dar cel puțin avem deja scorul corect pentru răspuns (p
 0.97773
 ```
 
-<Tip>
-
-✏️ **Încercați!** Calculați indicii de început și de sfârșit pentru cele mai probabile cinci răspunsuri.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Calculați indicii de început și de sfârșit pentru cele mai probabile cinci răspunsuri.
 
 Avem `start_index` și `end_index` ale răspunsului în termeni de tokens, deci acum trebuie doar să convertim în character indices în context. Acesta este momentul în care offseturile vor fi foarte utile. Putem să le luăm și să le folosim așa cum am făcut în sarcina de clasificare a tokenurilor:
 
@@ -314,11 +311,8 @@ print(result)
 
 Grozav! Este la fel ca în primul nostru exemplu!
 
-<Tip>
-
-✏️ **Încercați!** Utilizați cele mai bune scoruri pe care le-ați calculat anterior pentru a afișa cele mai probabile cinci răspunsuri. Pentru a vă verifica rezultatele, întoarceți-vă la primul pipeline și introduceți `top_k=5` atunci când îl apelați.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Utilizați cele mai bune scoruri pe care le-ați calculat anterior pentru a afișa cele mai probabile cinci răspunsuri. Pentru a vă verifica rezultatele, întoarceți-vă la primul pipeline și introduceți `top_k=5` atunci când îl apelați.
 
 ## Gestionarea contextelor lungi[[handling-long-contexts]]
 
@@ -609,11 +603,8 @@ print(candidates)
 
 Cei doi candidați corespund celor mai bune răspunsuri pe care modelul le-a putut găsi în fiecare parte. Modelul este mult mai încrezător că răspunsul corect se află în a doua parte (ceea ce este un semn bun!). Acum trebuie doar să facem map celor două intervale de tokenuri cu intervalele de caractere din context (trebuie să o punem în corespondență doar pe a doua pentru a avea răspunsul nostru, dar este interesant să vedem ce a ales modelul în prima parte).
 
-<Tip>
-
-✏️ **Încercați!** Adaptați codul de mai sus pentru a returna scorurile și spanurile intervalele pentru cele mai probabile cinci răspunsuri (în total, nu pe chunk).
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Adaptați codul de mai sus pentru a returna scorurile și spanurile pentru cele mai probabile cinci răspunsuri (în total, nu pe chunk).
 
 `offsets`-urile pe care le-am luat mai devreme este de fapt o listă de offsets, cu o listă pentru fiecare chunk de text:
 
@@ -634,10 +625,7 @@ for candidate, offset in zip(candidates, offsets):
 
 Dacă ignorăm primul rezultat, obținem același rezultat ca și pipelineul noastru pentru acest context lung - yay!
 
-<Tip>
-
-✏️ **Încercați!** Utilizați cele mai bune scoruri pe care le-ați calculat înainte pentru a afișa cele mai probabile cinci răspunsuri (pentru întregul context, nu pentru fiecare chunk). Pentru a vă verifica rezultatele, întoarceți-vă la primul pipeline și introduceți `top_k=5` atunci când îl apelați.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Utilizați cele mai bune scoruri pe care le-ați calculat înainte pentru a afișa cele mai probabile cinci răspunsuri (pentru întregul context, nu pentru fiecare chunk). Pentru a vă verifica rezultatele, întoarceți-vă la primul pipeline și introduceți `top_k=5` atunci când îl apelați.
 
 Aici se încheie scufundarea noastră în capacitățile tokenizerului. Vom pune toate acestea din nou în practică în capitolul următor, când vă vom arăta cum să ajustați un model pentru o serie de sarcini NLP comune.
diff --git a/chapters/ro/chapter6/4.mdx b/chapters/ro/chapter6/4.mdx
index d19b1777a..58c9abec7 100644
--- a/chapters/ro/chapter6/4.mdx
+++ b/chapters/ro/chapter6/4.mdx
@@ -47,11 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 În acest exemplu, din moment ce am ales checkpointul `bert-base-uncased`, normalizarea a aplicat scrierea cu minusculă și a eliminat accentele.
 
-<Tip>
-
-✏️ **Încercați!** Încărcați un tokenizer din checkpointul `bert-base-cased` și treceți-i același exemplu. Care sunt principalele diferențe pe care le puteți observa între versiunile cased și uncased ale tokenizerului?
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Încărcați un tokenizer din checkpointul `bert-base-cased` și treceți-i același exemplu. Care sunt principalele diferențe pe care le puteți observa între versiunile cased și uncased ale tokenizerului?
 
 ## Pre-tokenization[[pre-tokenization]]
 
diff --git a/chapters/ro/chapter6/5.mdx b/chapters/ro/chapter6/5.mdx
index 91c26b82f..2c8d4466e 100644
--- a/chapters/ro/chapter6/5.mdx
+++ b/chapters/ro/chapter6/5.mdx
@@ -11,11 +11,8 @@ Byte-Pair Encoding (BPE) a fost inițial dezvoltat ca un algoritm de comprimare
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 Această secțiune acoperă BPE în profunzime, mergând până la prezentarea unei implementări complete. Puteți sări la sfârșit dacă doriți doar o prezentare generală a algoritmului de tokenizare.
-
-</Tip>
+> [!TIP]
+> 💡 Această secțiune acoperă BPE în profunzime, mergând până la prezentarea unei implementări complete. Puteți sări la sfârșit dacă doriți doar o prezentare generală a algoritmului de tokenizare.
 
 ## Algoritmul de antrenare[[training-algorithm]]
 
@@ -27,11 +24,8 @@ Antrenarea BPE începe prin calcularea setului unic de cuvinte utilizate în cor
 
 Vocabularul de bază va fi atunci `["b", "g", "h", "n", "p", "s", "u"]`. Pentru cazurile din lumea reală, vocabularul de bază va conține cel puțin toate caracterele ASCII și, probabil, și unele caractere Unicode. Dacă un exemplu pe care îl tokenizați utilizează un caracter care nu se află în corpusul de antrenare, acel caracter va fi convertit într-un token necunoscut. Acesta este unul dintre motivele pentru care o mulțime de modele NLP sunt foarte proaste la analizarea conținutului cu emoji, de exemplu.
 
-<Tip>
-
-Tokenizerele GPT-2 și RoBERTa (care sunt destul de asemănătoare) au o modalitate inteligentă de a rezolva acest lucru: ele nu privesc cuvintele ca fiind scrise cu caractere Unicode, ci cu bytes. În acest fel, vocabularul de bază are o dimensiune mică (256), dar fiecare caracter la care vă puteți gândi va fi inclus și nu va ajunge să fie convertit într-un token necunoscut. Acest truc se numește *byte-level BPE*.
-
-</Tip>
+> [!TIP]
+> Tokenizerele GPT-2 și RoBERTa (care sunt destul de asemănătoare) au o modalitate inteligentă de a rezolva acest lucru: ele nu privesc cuvintele ca fiind scrise cu caractere Unicode, ci cu bytes. În acest fel, vocabularul de bază are o dimensiune mică (256), dar fiecare caracter la care vă puteți gândi va fi inclus și nu va ajunge să fie convertit într-un token necunoscut. Acest truc se numește *byte-level BPE*.
 
 După obținerea acestui vocabular de bază, adăugăm noi tokeni până când se atinge dimensiunea dorită a vocabularului prin învățarea prin *merges*, care sunt reguli de merge a două elemente ale vocabularului existent într-unul nou. Astfel, la început, aceste fuziuni vor crea tokenuri cu două caractere, iar apoi, pe măsură ce antrenamentul progresează, subwords mai lungi.
 
@@ -74,11 +68,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 Și continuăm astfel până când ajungem la dimensiunea dorită a vocabularului.
 
-<Tip>
-
-✏️ **Acum e rândul tău!** Care crezi că va fi următoarea regulă de fuziune?
-
-</Tip>
+> [!TIP]
+> ✏️ **Acum e rândul tău!** Care crezi că va fi următoarea regulă de fuziune?
 
 ## Algoritmul de tokenizare[[tokenization-algorithm]]
 
@@ -99,11 +90,8 @@ Să luăm exemplul pe care l-am folosit în timpul antrenamentului, cu cele trei
 
 Cuvântul `"bug"` va fi tokenizat ca `["b", "ug"]`. Cu toate acestea, cuvântul `"mug"` va fi tokenizat ca `["[UNK]", "ug"]` deoarece litera `"m"` nu a fost în vocabularul de bază. De asemenea, cuvântul `"thug"` va fi tokenizat ca `["[UNK]", "hug"]`: litera `"t"` nu se află în vocabularul de bază, iar aplicarea regulilor de merge duce mai întâi la fuzionarea lui `"u"` și `"g"` și apoi la fuzionarea lui `"h"` și `"ug"`.
 
-<Tip>
-
-✏️ **Acum e rândul tău!** Cum crezi că va fi tokenizat cuvântul `"unhug"`?
-
-</Tip>
+> [!TIP]
+> ✏️ **Acum e rândul tău!** Cum crezi că va fi tokenizat cuvântul `"unhug"`?
 
 ## Implementarea BPE[[implementing-bpe]]
 
@@ -315,11 +303,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 Folosind `train_new_from_iterator()` pe același corpus nu va rezulta exact același vocabular. Acest lucru se datorează faptului că atunci când există o alegere a celei mai frecvente perechi, am selectat-o pe prima întâlnită, în timp ce biblioteca 🤗 Tokenizers o selectează pe prima pe baza ID-urilor sale interne.
-
-</Tip>
+> [!TIP]
+> 💡 Folosind `train_new_from_iterator()` pe același corpus nu va rezulta exact același vocabular. Acest lucru se datorează faptului că atunci când există o alegere a celei mai frecvente perechi, am selectat-o pe prima întâlnită, în timp ce biblioteca 🤗 Tokenizers o selectează pe prima pe baza ID-urilor sale interne.
 
 Pentru a tokeniza un text nou, îl pre-tokenizăm, îl împărțim, apoi aplicăm toate regulile de merge învățate:
 
@@ -351,10 +336,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ Implementarea noastră va arunca o eroare dacă există un caracter necunoscut, deoarece nu am făcut nimic pentru a le gestiona. GPT-2 nu are de fapt un token necunoscut (este imposibil să obțineți un caracter necunoscut atunci când utilizați BPE la nivel de bytes), dar acest lucru s-ar putea întâmpla aici deoarece nu am inclus toate byte-urile posibile în vocabularul inițial. Acest aspect al BPE depășește domeniul de aplicare al acestei secțiuni, așa că am omis detaliile.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Implementarea noastră va arunca o eroare dacă există un caracter necunoscut, deoarece nu am făcut nimic pentru a le gestiona. GPT-2 nu are de fapt un token necunoscut (este imposibil să obțineți un caracter necunoscut atunci când utilizați BPE la nivel de bytes), dar acest lucru s-ar putea întâmpla aici deoarece nu am inclus toate byte-urile posibile în vocabularul inițial. Acest aspect al BPE depășește domeniul de aplicare al acestei secțiuni, așa că am omis detaliile.
 
 Asta e tot pentru algoritmul BPE! În continuare, ne vom uita la WordPiece.
\ No newline at end of file
diff --git a/chapters/ro/chapter6/6.mdx b/chapters/ro/chapter6/6.mdx
index ffa06beff..46b0dc826 100644
--- a/chapters/ro/chapter6/6.mdx
+++ b/chapters/ro/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece este algoritmul de tokenizare dezvoltat de Google pentru preantrenarea
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 Această secțiune acoperă WordPiece în profunzime, mergând până la prezentarea unei implementări complete. Puteți sări la sfârșit dacă doriți doar o prezentare generală a algoritmului de tokenizare.
-
-</Tip>
+> [!TIP]
+> 💡 Această secțiune acoperă WordPiece în profunzime, mergând până la prezentarea unei implementări complete. Puteți sări la sfârșit dacă doriți doar o prezentare generală a algoritmului de tokenizare.
 
 ## Algoritmul de antrenare[[training-algorithm]]
 
-<Tip warning={true}>
-
-⚠️ Google nu a publicat niciodată implementarea algoritmului de formare a WordPiece, astfel încât ceea ce urmează este cea mai bună presupunere a noastră bazată pe literatura publicată. Este posibil să nu fie 100% exactă.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google nu a publicat niciodată implementarea algoritmului de formare a WordPiece, astfel încât ceea ce urmează este cea mai bună presupunere a noastră bazată pe literatura publicată. Este posibil să nu fie 100% exactă.
 
 La fel ca BPE, WordPiece pornește de la un vocabular restrâns care include simbolurile speciale utilizate de model și alfabetul inițial. Deoarece identifică subcuvinte prin adăugarea unui prefix (cum ar fi `##` pentru BERT), fiecare cuvânt este inițial împărțit prin adăugarea prefixului respectiv la toate caracterele din cuvânt. Astfel, de exemplu, `"word"` este împărțit astfel:
 
@@ -76,11 +70,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 și continuăm astfel până când ajungem la dimensiunea dorită a vocabularului.
 
-<Tip>
-
-✏️ **Acum e rândul tău!** Care va fi următoarea regulă de merge?
-
-</Tip>
+> [!TIP]
+> ✏️ **Acum e rândul tău!** Care va fi următoarea regulă de merge?
 
 ## Algoritm de tokenizare[[tokenization-algorithm]]
 
@@ -92,11 +83,8 @@ Ca un alt exemplu, să vedem cum ar fi tokenizat cuvântul `"bugs"`. `"b"` este
 
 Atunci când tokenizarea ajunge într-un stadiu în care nu este posibilă găsirea unui subcuvânt în vocabular, întregul cuvânt este tokenizat ca necunoscut - astfel, de exemplu, `"mug"` ar fi tokenizat ca `["[UNK]"]`, la fel ca `"bum"` (chiar dacă putem începe cu `"b"` și `"##u"`, `"##m"` nu face parte din vocabular, iar tokenizarea rezultată va fi `["[UNK]"]`, nu `["b", "##u", "[UNK]"]`). Aceasta este o altă diferență față de BPE, care ar clasifica doar caracterele individuale care nu se află în vocabular ca necunoscute.
 
-<Tip>
-
-✏️ **Acum e rândul tău!** Cum va fi tokenizat cuvântul `"pugs"`?
-
-</Tip>
+> [!TIP]
+> ✏️ **Acum e rândul tău!** Cum va fi tokenizat cuvântul `"pugs"`?
 
 ## Implementând WordPiece[[implementing-wordpiece]]
 
@@ -314,11 +302,8 @@ print(vocab)
 
 După cum putem vedea, în comparație cu BPE, acest tokenizator învață părțile din cuvinte ca tokenuri puțin mai repede.
 
-<Tip>
-
-💡 Folosind `train_new_from_iterator()` pe același corpus nu va rezulta exact același vocabular. Acest lucru se datorează faptului că biblioteca 🤗 Tokenizers nu implementează WordPiece pentru antrenare (deoarece nu suntem complet siguri cum funcționează intern), ci utilizează BPE în schimb.
-
-</Tip>
+> [!TIP]
+> 💡 Folosind `train_new_from_iterator()` pe același corpus nu va rezulta exact același vocabular. Acest lucru se datorează faptului că biblioteca 🤗 Tokenizers nu implementează WordPiece pentru antrenare (deoarece nu suntem complet siguri cum funcționează intern), ci utilizează BPE în schimb.
 
 Pentru a tokeniza un text nou, îl pre-tokenizăm, îl împărțim, apoi aplicăm algoritmul de tokenizare pe fiecare cuvânt. Adică, căutăm cel mai mare subcuvânt începând de la începutul primului cuvânt și îl împărțim, apoi repetăm procesul pentru a doua parte și așa mai departe pentru restul acelui cuvânt și pentru următoarele cuvinte din text:
 
diff --git a/chapters/ro/chapter6/7.mdx b/chapters/ro/chapter6/7.mdx
index 3bc245c67..aa362f240 100644
--- a/chapters/ro/chapter6/7.mdx
+++ b/chapters/ro/chapter6/7.mdx
@@ -11,11 +11,8 @@ Algoritmul Unigram este adesea utilizat în SentencePiece, care este algoritmul
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 Această secțiune acoperă Unigram în profunzime, mergând până la prezentarea unei implementări complete. Puteți sări la sfârșit dacă doriți doar o prezentare generală a algoritmului de tokenizare.
-
-</Tip>
+> [!TIP]
+> 💡 Această secțiune acoperă Unigram în profunzime, mergând până la prezentarea unei implementări complete. Puteți sări la sfârșit dacă doriți doar o prezentare generală a algoritmului de tokenizare.
 
 ## Algoritm de antrenare[[training-algorithm]]
 
@@ -56,11 +53,8 @@ Iată frecvențele tuturor subcuvintelor posibile din vocabular:
 
 Astfel, suma tuturor frecvențelor este 210, iar probabilitatea subcuvântului `"ug"` este 20/210.
 
-<Tip>
-
-✏️ **Acum este rândul tău!** Scrie codul pentru a calcula frecvențele de mai sus și verifică de două ori dacă rezultatele afișate sunt corecte, precum și suma totală.
-
-</Tip>
+> [!TIP]
+> ✏️ **Acum este rândul tău!** Scrie codul pentru a calcula frecvențele de mai sus și verifică de două ori dacă rezultatele afișate sunt corecte, precum și suma totală.
 
 Acum, pentru a tokeniza un cuvânt dat, ne uităm la toate segmentările posibile în tokeni și calculăm probabilitatea fiecăruia în conformitate cu modelul Unigram. Deoarece toate token-urile sunt considerate independente, această probabilitate este doar produsul probabilității fiecărui token. De exemplu, tokenizarea `["p", "u", "g"]` a lui `"pug"` are probabilitatea:
 
@@ -98,11 +92,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 Astfel, `"unhug"` ar fi tokenizat ca `["un", "hug"]`.
 
-<Tip>
-
-✏️ **Acum e rândul tău!** Determinați tokenizarea cuvântului `"huggun"` și scorul acestuia.
-
-</Tip>
+> [!TIP]
+> ✏️ **Acum e rândul tău!** Determinați tokenizarea cuvântului `"huggun"` și scorul acestuia.
 
 ## Înapoi la antrenare[[back-to-training]]
 
@@ -215,11 +206,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece utilizează un algoritm mai eficient numit Enhanced Suffix Array (ESA) pentru a crea vocabularul inițial.
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece utilizează un algoritm mai eficient numit Enhanced Suffix Array (ESA) pentru a crea vocabularul inițial.
 
 În continuare, calculăm suma tuturor frecvențelor, pentru a converti frecvențele în probabilități. Pentru modelul nostru, vom stoca logaritmii probabilităților, deoarece este mai stabil din punct de vedere numeric să adăugăm logaritmi decât să multiplicăm numere mici, iar acest lucru va simplifica calcularea pierderii modelului:
 
@@ -340,11 +328,8 @@ Deoarece `"ll"` este folosit în tokenizarea lui `"Hopefully"`, iar eliminarea l
 0.0
 ```
 
-<Tip>
-
-💡 Această abordare este foarte ineficientă, astfel încât SentencePiece utilizează o aproximare a pierderii modelului fără simbolul X: în loc să înceapă de la zero, înlocuiește simbolul X cu segmentarea sa în vocabularul rămas. În acest fel, toate scorurile pot fi calculate odată, în același timp cu pierderea modelului.
-
-</Tip>
+> [!TIP]
+> 💡 Această abordare este foarte ineficientă, astfel încât SentencePiece utilizează o aproximare a pierderii modelului fără simbolul X: în loc să înceapă de la zero, înlocuiește simbolul X cu segmentarea sa în vocabularul rămas. În acest fel, toate scorurile pot fi calculate odată, în același timp cu pierderea modelului.
 
 Cu toate acestea la locul lor, ultimul lucru pe care trebuie să îl facem este să adăugăm la vocabular tokeni speciali utilizate de model, apoi să facem o buclă până când am eliminat suficienți tokeni din vocabular pentru a ajunge la dimensiunea dorită:
 
diff --git a/chapters/ro/chapter6/8.mdx b/chapters/ro/chapter6/8.mdx
index a1a2be892..d0ccf8e0b 100644
--- a/chapters/ro/chapter6/8.mdx
+++ b/chapters/ro/chapter6/8.mdx
@@ -111,13 +111,9 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**Pentru a merge mai departe** Dacă testați cele două versiuni ale normalizatorilor anteriori pe un șir care conține caracterul Unicode `u"\u0085"` veți observa cu siguranță că acești doi normalizatori nu sunt exact echivalenți.
-Pentru a nu complica prea mult versiunea cu `normalizers.Sequence` , nu am inclus înlocuirile Regex pe care `BertNormalizer` le cere atunci când argumentul `clean_text` este setat la `True` - care este comportamentul implicit. Dar nu vă faceți griji: este posibil să obțineți exact aceeași normalizare fără a utiliza utilul `BertNormalizer` prin adăugarea a două `normalizers.Replace` la secvența normalizers.
-
-
-</Tip>
+> [!TIP]
+> **Pentru a merge mai departe** Dacă testați cele două versiuni ale normalizatorilor anteriori pe un șir care conține caracterul Unicode `u"\u0085"` veți observa cu siguranță că acești doi normalizatori nu sunt exact echivalenți.
+> Pentru a nu complica prea mult versiunea cu `normalizers.Sequence`, nu am inclus înlocuirile Regex pe care `BertNormalizer` le cere atunci când argumentul `clean_text` este setat la `True` - care este comportamentul implicit. Dar nu vă faceți griji: este posibil să obțineți exact aceeași normalizare fără a utiliza utilul `BertNormalizer` prin adăugarea a două `normalizers.Replace` la secvența normalizers.
 
 Urmează etapa de pre-tokenizare. Din nou, există un `BertPreTokenizer` pre-construit pe care îl putem utiliza:
 
diff --git a/chapters/ro/chapter7/1.mdx b/chapters/ro/chapter7/1.mdx
index a5d41cf80..5ffcebc88 100644
--- a/chapters/ro/chapter7/1.mdx
+++ b/chapters/ro/chapter7/1.mdx
@@ -31,8 +31,5 @@ Fiecare secțiune poate fi citită independent.
 {/if}
 
 
-<Tip>
-
-Dacă citiți secțiunile în succesiune, veți observa că acestea au destul de mult cod și proză în comun. Repetarea este intenționată, pentru a vă permite să intrați (sau să reveniți mai târziu) la orice sarcină care vă interesează și să găsiți un exemplu.
-
-</Tip>
+> [!TIP]
+> Dacă citiți secțiunile în succesiune, veți observa că acestea au destul de mult cod și proză în comun. Repetarea este intenționată, pentru a vă permite să intrați (sau să reveniți mai târziu) la orice sarcină care vă interesează și să găsiți un exemplu.
diff --git a/chapters/ro/chapter7/2.mdx b/chapters/ro/chapter7/2.mdx
index 556650e97..1052c1887 100644
--- a/chapters/ro/chapter7/2.mdx
+++ b/chapters/ro/chapter7/2.mdx
@@ -45,11 +45,8 @@ Puteți găsi modelul pe care îl vom antrena și încărca în Hub și puteți
 
 În primul rând, avem nevoie de un dataset adecvat pentru clasificarea simbolurilor. În această secțiune vom utiliza [datasetul CoNLL-2003] (https://huggingface.co/datasets/conll2003), care conține știri de la Reuters.
 
-<Tip>
-
-💡 Atât timp cât datasetul vostru constă în texte împărțite în cuvinte cu labelurile corespunzătoare, veți putea adapta procedurile de preprocesare a datelor descrise aici la propriul dataset. Consultați [Capitolul 5](/course/chapter5) dacă aveți nevoie de o recapitulare a modului de încărcare a propriilor date personalizate într-un `Dataset`.
-
-</Tip>
+> [!TIP]
+> 💡 Atât timp cât datasetul vostru constă în texte împărțite în cuvinte cu labelurile corespunzătoare, veți putea adapta procedurile de preprocesare a datelor descrise aici la propriul dataset. Consultați [Capitolul 5](/course/chapter5) dacă aveți nevoie de o recapitulare a modului de încărcare a propriilor date personalizate într-un `Dataset`.
 
 ### Datasetul CoNLL-2003 [[the-conll-2003-dataset]]
 
@@ -167,11 +164,8 @@ print(line2)
 
 După cum putem vedea, entităților care cuprind două cuvinte, precum "Uniunea Europeană" și "Werner Zwingmann", li se atribuie un label `B-` pentru primul cuvânt și un label `I-` pentru al doilea.
 
-<Tip>
-
-✏️ **E rândul tău!** Afișați aceleași două propoziții cu labelurile POS sau chunking.
-
-</Tip>
+> [!TIP]
+> ✏️ **E rândul tău!** Afișați aceleași două propoziții cu labelurile POS sau chunking.
 
 ### Procesarea datelor[[processing-the-data]]
 
@@ -264,11 +258,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 După cum putem vedea, funcția noastră a adăugat `-100` pentru cei doi tokeni speciali de la început și de la sfârșit, și un nou `0` pentru cuvântul nostru care a fost împărțit în doi tokeni.
 
-<Tip>
-
-✏️ ** Rândul tău!** Unii cercetători preferă să atribuie un singur label pe cuvânt și să atribuie `-100` celorlalți subtokeni dintr-un cuvânt dat. Aceasta are loc pentru a evita ca cuvintele lungi care se împart în mai mulți subtokeni să contribuie puternic la pierdere. Modificați funcția anterioară pentru a alinia labelurile cu ID-urile de input urmând această regulă.
-
-</Tip>
+> [!TIP]
+> ✏️ **Rândul tău!** Unii cercetători preferă să atribuie un singur label pe cuvânt și să atribuie `-100` celorlalți subtokeni dintr-un cuvânt dat. Aceasta are loc pentru a evita ca cuvintele lungi care se împart în mai mulți subtokeni să contribuie puternic la pierdere. Modificați funcția anterioară pentru a alinia labelurile cu ID-urile de input urmând această regulă.
 
 Pentru a preprocesa întregul nostru dataset, trebuie să tokenizăm toate inputurile și să aplicăm `align_labels_with_tokens()` pe toate labelurile. Pentru a profita de viteza tokenizerului nostru rapid, este mai bine să tokenizăm multe texte în același timp, așa că vom scrie o funcție care procesează o listă de exemple și vom folosi metoda `Dataset.map()` cu opțiunea `batched=True`. Singurul lucru diferit față de exemplul nostru anterior este că funcția `word_ids()` trebuie să obțină indexul exemplului din care dorim ID-urile cuvintelor atunci când inputurile către tokenizer sunt liste de texte (sau, în cazul nostru, liste de liste de cuvinte), așa că adăugăm și acest lucru:
 
@@ -430,11 +421,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Dacă aveți un model cu un număr greșit de labeluri, veți primi o eroare obscură atunci când apelați `model.fit()` mai târziu. Acest lucru poate fi enervant pentru debbuging, așa că asigurați-vă că faceți această verificare pentru a confirma că aveți numărul așteptat de labeluri.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Dacă aveți un model cu un număr greșit de labeluri, veți primi o eroare obscură atunci când apelați `model.fit()` mai târziu. Acest lucru poate fi enervant pentru debugging, așa că asigurați-vă că faceți această verificare pentru a confirma că aveți numărul așteptat de labeluri.
 
 ### Fine-tuningul modelului[[fine-tuning-the-model]]
 
@@ -498,11 +486,8 @@ model.fit(
 
 Puteți specifica numele complet al repositoriului către care doriți să efectuați push cu argumentul `hub_model_id` (în special, va trebui să utilizați acest argument pentru a efectua push către o organizație). De exemplu, atunci când am trimis modelul către organizația [`huggingface-course`](https://huggingface.co/huggingface-course), am adăugat `hub_model_id="huggingface-course/bert-finetuned-ner"`. În mod implicit, repositoriul utilizat va fi în namespace-ul denumit după directory output pe care l-ați stabilit, de exemplu `"cool_huggingface_user/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Dacă directory output pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. Dacă nu este, veți primi o eroare atunci când apelați `model.fit()` și va trebui să setați un nume nou.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă directory output pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. Dacă nu este, veți primi o eroare atunci când apelați `model.fit()` și va trebui să setați un nume nou.
 
 Rețineți că, în timpul antrenamentului, de fiecare dată când modelul este salvat (aici, la fiecare epocă), acesta este încărcat pe Hub în fundal. În acest fel, veți putea să reluați formarea pe o altă mașină, dacă este necesar.
 
@@ -680,11 +665,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Dacă aveți un model cu un număr greșit de labeluri, veți primi o eroare obscură atunci când apelați metoda `Trainer.train()` mai târziu (ceva de genul "CUDA error: device-side assert triggered"). Aceasta este cauza numărul unu a erorilor raportate de utilizatori pentru astfel de erori, așa că asigurați-vă că faceți această verificare pentru a confirma că aveți numărul de labeluri așteptat.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Dacă aveți un model cu un număr greșit de labeluri, veți primi o eroare obscură atunci când apelați metoda `Trainer.train()` mai târziu (ceva de genul "CUDA error: device-side assert triggered"). Aceasta este cauza numărul unu a erorilor raportate de utilizatori pentru astfel de erori, așa că asigurați-vă că faceți această verificare pentru a confirma că aveți numărul de labeluri așteptat.
 
 ### Fine-tuningul modelului[[fine-tuning-the-model]]
 
@@ -722,11 +704,8 @@ args = TrainingArguments(
 
 Ați mai văzut cele mai multe dintre acestea: stabilim niște hiperparametri (cum ar fi learning rate, numărul de epoci pentru care să ne antrenăm și weights decay) și specificăm `push_to_hub=True` pentru a indica faptul că dorim să salvăm modelul și să îl evaluăm la sfârșitul fiecărei epoci și că dorim să încărcăm rezultatele noastre în Model Hub. Rețineți că puteți specifica numele repositoriului către care doriți să faceți push cu argumentul `hub_model_id` (în special, va trebui să utilizați acest argument pentru a face push către o organizație). De exemplu, atunci când am trimis modelul către organizația [`huggingface-course`](https://huggingface.co/huggingface-course), am adăugat `hub_model_id="huggingface-course/bert-finetuned-ner"` la `TrainingArguments`. În mod implicit, repositoriul utilizat va fi în namespaceul tău și denumit după output directory-ul pe care l-ați setat, deci în cazul nostru va fi `"sgugger/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Dacă output directory-ul pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriul către care doriți să faceți push. Dacă nu este așa, veți primi o eroare la definirea `Trainer` și va trebui să setați un nume nou.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă output directory-ul pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. Dacă nu este așa, veți primi o eroare la definirea `Trainer` și va trebui să setați un nume nou.
 
 În final, transmitem totul către `Trainer` și lansăm antrenarea:
 
@@ -814,11 +793,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Dacă vă antrenați pe un TPU, va trebui să mutați tot codul începând de la celula de mai sus într-o funcție de antrenament dedicată. Consultați [Capitolul 3](/course/chapter3) pentru mai multe detalii.
-
-</Tip>
+> [!TIP]
+> 🚨 Dacă vă antrenați pe un TPU, va trebui să mutați tot codul începând de la celula de mai sus într-o funcție de antrenament dedicată. Consultați [Capitolul 3](/course/chapter3) pentru mai multe detalii.
 
 Acum că am trimis `train_dataloader` la `accelerator.prepare()`, putem utiliza lungimea acestuia pentru a calcula numărul de pași de antrenare. Rețineți că ar trebui să facem întotdeauna acest lucru după ce pregătim dataloaderul, deoarece această metodă îi va modifica lungimea. Utilizăm un classic liner schedule de la rata de învățare la 0:
 
diff --git a/chapters/ro/chapter7/3.mdx b/chapters/ro/chapter7/3.mdx
index 376173cc9..ac8eb4583 100644
--- a/chapters/ro/chapter7/3.mdx
+++ b/chapters/ro/chapter7/3.mdx
@@ -41,11 +41,8 @@ Hai să începem!
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 Dacă termenii "masked language modeling" și "pretrained model" nu vă sună familiar, mergeți să verificați [Capitolul 1](/course/chapter1), unde vă explicăm toate aceste concepte de bază, cu videoclipuri!
-
-</Tip>
+> [!TIP]
+> 🙋 Dacă termenii "masked language modeling" și "pretrained model" nu vă sună familiar, mergeți să verificați [Capitolul 1](/course/chapter1), unde vă explicăm toate aceste concepte de bază, cu videoclipuri!
 
 ## Alegerea unui model preantrenat pentru masked language modeling[[picking-a-pretrained-model-for-masked-language-modeling]]
 
@@ -236,11 +233,8 @@ for row in sample:
 
 Da, acestea sunt cu siguranță recenzii de film și, dacă sunteți suficient de bătrâni, ați putea chiar înțelege comentariul din ultima recenzie despre deținerea unei versiuni VHS 😜! Deși nu vom avea nevoie de labeluri pentru modelarea limbajului, putem vedea deja că un `0` denotă o recenzie negativă, în timp ce un `1` corespunde uneia pozitive.
 
-<Tip>
-
-✏️ **Încearcă!** Creați un sample aleatoriu din segmentul `unsupervised` și verificați că labelurile nu sunt nici `0`, nici `1`. În același timp, ați putea verifica și dacă labelurile din segmentele `train` și `test` sunt într-adevăr `0` sau `1` - aceasta este o verificare utilă pe care orice practicant NLP ar trebui să o efectueze la începutul unui nou proiect!
-
-</Tip>
+> [!TIP]
+> ✏️ **Încearcă!** Creați un sample aleatoriu din segmentul `unsupervised` și verificați că labelurile nu sunt nici `0`, nici `1`. În același timp, ați putea verifica și dacă labelurile din segmentele `train` și `test` sunt într-adevăr `0` sau `1` - aceasta este o verificare utilă pe care orice practicant NLP ar trebui să o efectueze la începutul unui nou proiect!
 
 Acum că am aruncat o privire rapidă asupra datelor, să ne apucăm să le pregătim pentru modelarea limbajului mascat. După cum vom vedea, există câteva etape suplimentare pe care trebuie să le parcurgem în comparație cu sarcinile de clasificare a secvențelor pe care le-am văzut în [Capitolul 3](/course/chapter3). Să începem!
 
@@ -298,11 +292,8 @@ tokenizer.model_max_length
 
 Această valoare este derivată din fișierul *tokenizer_config.json* asociat cu un checkpoint; în acest caz putem vedea că dimensiunea contextului este de 512 tokeni, la fel ca în cazul BERT.
 
-<Tip>
-
-✏️ **Încearcă!** Unele modele Transformer, precum [BigBird](https://huggingface.co/google/bigbird-roberta-base) și [Longformer](hf.co/allenai/longformer-base-4096), au o lungime de context mult mai mare decât BERT și alte modele Transformer mai vechi. Inițializați tokenizerul pentru unul dintre aceste checkpointuri și verificați dacă `model_max_length` este în concordanță cu ceea ce este menționat pe model card.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încearcă!** Unele modele Transformer, precum [BigBird](https://huggingface.co/google/bigbird-roberta-base) și [Longformer](https://hf.co/allenai/longformer-base-4096), au o lungime de context mult mai mare decât BERT și alte modele Transformer mai vechi. Inițializați tokenizerul pentru unul dintre aceste checkpointuri și verificați dacă `model_max_length` este în concordanță cu ceea ce este menționat pe model card.
 
 Prin urmare, pentru a derula experimentele pe GPU-uri precum cele de pe Google Colab, vom alege ceva mai mic care să încapă în memorie:
 
@@ -310,11 +301,8 @@ Prin urmare, pentru a derula experimentele pe GPU-uri precum cele de pe Google C
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-Rețineți că utilizarea unei dimensiuni mici a chunkurilor poate fi dăunător în scenariile din lumea reală, astfel încât ar trebui să utilizați o dimensiune care corespunde cazului de utilizare la care veți aplica modelul.
-
-</Tip>
+> [!WARNING]
+> Rețineți că utilizarea unei dimensiuni mici a chunkurilor poate fi dăunătoare în scenariile din lumea reală, astfel încât ar trebui să utilizați o dimensiune care corespunde cazului de utilizare la care veți aplica modelul.
 
 Acum vine partea distractivă. Pentru a arăta cum funcționează concatenarea, să luăm câteva recenzii din setul nostru de antrenare tokenizat și să imprimăm numărul de tokeni per recenzie:
 
@@ -471,11 +459,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 Frumos, a funcționat! Putem vedea că tokenul `[MASK]` a fost inserat aleatoriu în diferite locuri din textul nostru. Acestea vor fi tokenii pe care modelul nostru va trebui să le prezică în timpul antrenamentului - iar frumusețea data collatorului este că va introduce aleatoriu tokenul `[MASK]` cu fiecare batch!
 
-<Tip>
-
-✏️ **Încercați!** Rulați fragmentul de cod de mai sus de mai multe ori pentru a vedea cum se întâmplă mascarea aleatorie în fața ochilor voștri! De asemenea, înlocuiți metoda `tokenizer.decode()` cu `tokenizer.convert_ids_to_tokens()` pentru a vedea că uneori un singur token dintr-un cuvânt dat este mascat, și nu celelalte.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Rulați fragmentul de cod de mai sus de mai multe ori pentru a vedea cum se întâmplă mascarea aleatorie în fața ochilor voștri! De asemenea, înlocuiți metoda `tokenizer.decode()` cu `tokenizer.convert_ids_to_tokens()` pentru a vedea că uneori un singur token dintr-un cuvânt dat este mascat, și nu celelalte.
 
 {#if fw === 'pt'}
 
@@ -585,11 +570,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **Încercați!** Rulați fragmentul de cod de mai sus de mai multe ori pentru a vedea cum se întâmplă mascarea aleatorie în fața ochilor voștri! De asemenea, înlocuiți metoda `tokenizer.decode()` cu `tokenizer.convert_ids_to_tokens()` pentru a vedea că tokenii dintr-un cuvânt dat sunt întotdeauna mascați împreună.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Rulați fragmentul de cod de mai sus de mai multe ori pentru a vedea cum se întâmplă mascarea aleatorie în fața ochilor voștri! De asemenea, înlocuiți metoda `tokenizer.decode()` cu `tokenizer.convert_ids_to_tokens()` pentru a vedea că tokenii dintr-un cuvânt dat sunt întotdeauna mascați împreună.
 
 Acum, că avem două data collators, restul pașilor fine-tuning sunt standard. Pregătirea poate dura ceva timp pe Google Colab dacă nu sunteți suficient de norocos să obțineți un GPU P100 mitic 😭, așa că vom reduce mai întâi dimensiunea setului de antrenare la câteva mii de exemple. Nu vă faceți griji, vom obține în continuare un model lingvistic destul de decent! O modalitate rapidă de a reduce sampleurile unui dataset în 🤗 Datasets este prin intermediul funcției `Dataset.train_test_split()` pe care am văzut-o în [Capitolul 5](/course/chapter5):
 
@@ -814,11 +796,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **Rândul tău!** Rulați antrenamentul de mai sus după schimbarea data collatorului cu whole word masking collator. Obțineți rezultate mai bune?
-
-</Tip>
+> [!TIP]
+> ✏️ **Rândul tău!** Rulați antrenamentul de mai sus după schimbarea data collatorului cu whole word masking collator. Obțineți rezultate mai bune?
 
 {#if fw === 'pt'} 
 
@@ -1036,8 +1015,5 @@ Frumos - modelul nostru și-a adaptat în mod clar weighturile pentru a prezice
 
 Acest lucru încheie primul nostru experiment de antrenare a unui model lingvistic. În [secțiunea 6](/course/ro/chapter7/6) veți învăța cum să antrenați de la zero un model auto-regressive precum GPT-2; mergeți acolo dacă doriți să vedeți cum vă puteți preantrena propriul model Transformer!
 
-<Tip>
-
-✏️ **Încercați!** Pentru a cuantifica beneficiile adaptării domeniului, faceți fine-tune unui clasificator pe labelurile IMDb atât pentru checkpointurile DistilBERT preantrenate, cât și pentru cele fine-tuned. Dacă aveți nevoie de o recapitulare a clasificării textului, consultați [Capitolul 3](/course/chapter3).
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Pentru a cuantifica beneficiile adaptării domeniului, faceți fine-tune unui clasificator pe labelurile IMDb atât pentru checkpointurile DistilBERT preantrenate, cât și pentru cele fine-tuned. Dacă aveți nevoie de o recapitulare a clasificării textului, consultați [Capitolul 3](/course/chapter3).
diff --git a/chapters/ro/chapter7/4.mdx b/chapters/ro/chapter7/4.mdx
index ae1098e58..88f1ffb41 100644
--- a/chapters/ro/chapter7/4.mdx
+++ b/chapters/ro/chapter7/4.mdx
@@ -156,11 +156,8 @@ Va fi interesant de văzut dacă modelul nostru fine-tuned reține aceste partic
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **Rândul tău!** Un alt cuvânt englezesc care este adesea folosit în franceză este "email". Găsiți primul sample din datasetul de antrenare care utilizează acest cuvânt. Cum este tradus? Cum traduce modelul preantrenat aceeași propoziție în limba engleză?
-
-</Tip>
+> [!TIP]
+> ✏️ **Rândul tău!** Un alt cuvânt englezesc care este adesea folosit în franceză este "email". Găsiți primul sample din datasetul de antrenare care utilizează acest cuvânt. Cum este tradus? Cum traduce modelul preantrenat aceeași propoziție în limba engleză?
 
 ### Procesarea datelor[[processing-the-data]]
 
@@ -177,11 +174,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="pt")
 
 De asemenea, puteți înlocui `model_checkpoint` cu orice alt model preferat din [Hub](https://huggingface.co/models) sau cu un folder local în care ați salvat un model preantrenat și un tokenizer.
 
-<Tip>
-
-💡 Dacă utilizați un tokenizer multilingv, cum ar fi mBART, mBART-50 sau M2M100, va trebui să setați codurile de limbă ale inputurilor și targeturilor în tokenizer prin setarea `tokenizer.src_lang` și `tokenizer.tgt_lang` la valorile corecte.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă utilizați un tokenizer multilingv, cum ar fi mBART, mBART-50 sau M2M100, va trebui să setați codurile de limbă ale inputurilor și targeturilor în tokenizer prin setarea `tokenizer.src_lang` și `tokenizer.tgt_lang` la valorile corecte.
 
 Pregătirea datelor noastre este destul de simplă. Trebuie să rețineți un singur lucru; trebuie să vă asigurați că tokenizerul procesează targeturile în limba de ieșire (aici, franceză). Puteți face acest lucru trecând targeturile la argumentul `text_targets` al metodei `__call__` a tokenizerului.
 
@@ -231,17 +225,11 @@ def preprocess_function(examples):
 
 Rețineți că am stabilit aceeași lungime maximă pentru inputurile și outputurile noastre. Deoarece textele cu care avem de-a face par destul de scurte, vom folosi 128.
 
-<Tip>
+> [!TIP]
+> 💡 Dacă utilizați un model T5 (mai precis, unul dintre checkpointurile `t5-xxx`), modelul se va aștepta ca inputurile text să aibă un prefix care să indice sarcina în cauză, cum ar fi `translate English to French:`.
 
-💡 Dacă utilizați un model T5 (mai precis, unul dintre checkpointurile `t5-xxx`), modelul se va aștepta ca inputurile text să aibă un prefix care să indice sarcina în cauză, cum ar fi `translate: din engleză în franceză:`.
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ Nu acordăm atenție attention maskului a targeturilor, deoarece modelul nu se va aștepta la aceasta. În schimb, labelurile corespunzătoare unui padding token trebuie setate la `-100`, astfel încât acestea să fie ignorate în calculul pierderilor. Acest lucru va fi făcut mai târziu de data collatorul nostru, deoarece aplicăm padding dinamic, dar dacă utilizați padding aici, ar trebui să adaptați funcția de preprocesare pentru a seta toate labelurile care corespund simbolului de padding la `-100`.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nu acordăm atenție attention maskului al targeturilor, deoarece modelul nu se va aștepta la acesta. În schimb, labelurile corespunzătoare unui padding token trebuie setate la `-100`, astfel încât acestea să fie ignorate în calculul pierderilor. Acest lucru va fi făcut mai târziu de data collatorul nostru, deoarece aplicăm padding dinamic, dar dacă utilizați padding aici, ar trebui să adaptați funcția de preprocesare pentru a seta toate labelurile care corespund simbolului de padding la `-100`.
 
 Acum putem aplica această preprocesare dintr-o singură dată pe toate diviziunile datasetului nostru:
 
@@ -649,11 +637,8 @@ model.fit(
 
 Rețineți că puteți specifica numele repositoriului către care doriți să faceți push cu argumentul `hub_model_id` (în special, va trebui să utilizați acest argument pentru a face push către o organizație). De exemplu, atunci când am trimis modelul către organizația [`huggingface-course`](https://huggingface.co/huggingface-course), am adăugat `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` la `Seq2SeqTrainingArguments`. În mod implicit, repositoriul utilizat va fi în namespaceul vostru și denumit după output directory-ul pe care l-ați stabilit, deci aici va fi `"sgugger/marian-finetuned-kde4-en-to-fr"` (care este modelul la care am făcut legătura la începutul acestei secțiuni).
 
-<Tip>
-
-💡 Dacă output directory-ul pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. Dacă nu este, veți primi o eroare atunci când apelați `model.fit()` și va trebui să setați un nume nou.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă output directory-ul pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. Dacă nu este, veți primi o eroare atunci când apelați `model.fit()` și va trebui să setați un nume nou.
 
 În cele din urmă, hai să vedem cum arată metricele noastre acum că antrenarea s-a încheiat:
 
@@ -699,11 +684,8 @@ args = Seq2SeqTrainingArguments(
 
 Rețineți că puteți specifica numele complet al repositoriului către care doriți să faceți push cu argumentul `hub_model_id` (în special, va trebui să utilizați acest argument pentru a face push către o organizație). De exemplu, atunci când am trimis modelul către organizația [`huggingface-course`](https://huggingface.co/huggingface-course), am adăugat `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` la `Seq2SeqTrainingArguments`. În mod implicit, repositoriul utilizat va fi în namespaceul vostru și denumit după output directory-ul pe care l-ați stabilit, deci în cazul nostru va fi `"sgugger/marian-finetuned-kde4-en-to-fr"` (care este modelul la care am făcut legătura la începutul acestei secțiuni).
 
-<Tip>
-
-💡 Dacă output directory-ul pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. În caz contrar, veți primi o eroare atunci când vă definiți `Seq2SeqTrainer` și va trebui să stabiliți un nume nou.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă output directory-ul pe care îl utilizați există deja, acesta trebuie să fie o clonă locală a repositoriului către care doriți să faceți push. În caz contrar, veți primi o eroare atunci când vă definiți `Seq2SeqTrainer` și va trebui să stabiliți un nume nou.
 
 În final, transmitem totul către `Seq2SeqTrainer`:
 
@@ -994,8 +976,5 @@ translator(
 
 Un alt exemplu excelent de adaptare a domeniului!
 
-<Tip>
-
-✏️ **E rândul tău!** Care este rezultatul modelului pentru sampleul cuvântul "email" pe care l-ai identificat mai devreme?
-
-</Tip>
+> [!TIP]
+> ✏️ **E rândul tău!** Care este rezultatul modelului pentru sampleul cu cuvântul "email" pe care l-ai identificat mai devreme?
diff --git a/chapters/ro/chapter7/5.mdx b/chapters/ro/chapter7/5.mdx
index 348265741..11fdb6690 100644
--- a/chapters/ro/chapter7/5.mdx
+++ b/chapters/ro/chapter7/5.mdx
@@ -87,11 +87,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ **Încercați!** Schimbați seedul aleatoriu în comanda `Dataset.shuffle()` pentru a explora alte recenzii din corpus. Dacă sunteți vorbitor de spaniolă, aruncați o privire la unele dintre recenziile din `spanish_dataset` pentru a vedea dacă și titlurile par a fi rezumate rezonabil.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Schimbați seedul aleatoriu în comanda `Dataset.shuffle()` pentru a explora alte recenzii din corpus. Dacă sunteți vorbitor de spaniolă, aruncați o privire la unele dintre recenziile din `spanish_dataset` pentru a vedea dacă și titlurile par a fi rezumate rezonabile.
 
 Acest sample arată diversitatea recenziilor pe care le găsim de obicei online, variind de la pozitive la negative (și totul între ele!). Deși exemplul cu titlul "meh" nu este foarte informativ, celelalte titluri par a fi rezumate decente ale recenziilor în sine. Antrenarea unui model de rezumare pe toate cele 400 000 de recenzii ar dura mult prea mult pe un singur GPU, așa că ne vom concentra pe generarea de rezumate pentru un singur domeniu de produse. Pentru a avea o idee despre domeniile din care putem alege, să convertim `english_dataset` într-un `pandas.DataFrame` și să calculăm numărul de recenzii per categorie de produse:
 
@@ -229,11 +226,8 @@ Ne vom concentra asupra mT5, o arhitectură interesantă bazată pe T5, care a f
 mT5 nu utilizează prefixe, dar împărtășește o mare parte din versatilitatea T5 și are avantajul de a fi multilingv. Acum că am ales un model, să aruncăm o privire la pregătirea datelor noastre pentru antrenare.
 
 
-<Tip>
-
-✏️ **Încercați!** După ce ați parcurs această secțiune, vedeți cât de bine se compară mT5 cu mBART prin aplicarea fine-tuningului acestuia din urmă cu aceleași tehnici. Pentru puncte bonus, puteți încerca, de asemenea, fine-tuningul a T5 doar pe recenziile în limba engleză. Deoarece T5 are un prefix prompt special, va trebui să adăugați `summarize:` la exemplele de intrare în pașii de preprocesare de mai jos.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** După ce ați parcurs această secțiune, vedeți cât de bine se compară mT5 cu mBART prin aplicarea fine-tuningului acestuia din urmă cu aceleași tehnici. Pentru puncte bonus, puteți încerca, de asemenea, fine-tuningul lui T5 doar pe recenziile în limba engleză. Deoarece T5 are un prefix prompt special, va trebui să adăugați `summarize:` la exemplele de intrare în pașii de preprocesare de mai jos.
 
 ## Preprocessing the data[[preprocessing-the-data]]
 
@@ -248,11 +242,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡 În stadiile inițiale ale proiectelor NLP, o bună practică este de a antrena o clasă de modele "mici" pe un sample mic de date. Acest lucru vă permite să faceți debug și să iterați mai rapid către un flux de lucru end-to-end. Odată ce sunteți încrezător în rezultate, puteți oricând să măriți modelul prin simpla schimbare a checkpointului modelului!
-
-</Tip>
+> [!TIP]
+> 💡 În stadiile inițiale ale proiectelor NLP, o bună practică este de a antrena o clasă de modele "mici" pe un sample mic de date. Acest lucru vă permite să faceți debug și să iterați mai rapid către un flux de lucru end-to-end. Odată ce sunteți încrezător în rezultate, puteți oricând să măriți modelul prin simpla schimbare a checkpointului modelului!
 
 Să testăm tokenizerul mT5 pe un mic exemplu:
 
@@ -307,11 +298,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 Acum că corpusul a fost preprocesat, să aruncăm o privire asupra unor metrici care sunt utilizate în mod obișnuit pentru sumarizare. După cum vom vedea, nu există un glonț de argint atunci când vine vorba de măsurarea calității textului generat de calculator.
 
-<Tip>
-
-💡 Poate ați observat că am folosit `batched=True` în funcția noastră `Dataset.map()` de mai sus. Aceasta codifică exemplele în batchuri de 1.000 (implicit) și vă permite să utilizați capacitățile multithreading ale tokenizerilor rapizi din 🤗 Transformers. Atunci când este posibil, încercați să utilizați `batched=True` pentru a profita la maximum de preprocesare!
-
-</Tip>
+> [!TIP]
+> 💡 Poate ați observat că am folosit `batched=True` în funcția noastră `Dataset.map()` de mai sus. Aceasta codifică exemplele în batchuri de 1.000 (implicit) și vă permite să utilizați capacitățile multithreading ale tokenizerilor rapizi din 🤗 Transformers. Atunci când este posibil, încercați să utilizați `batched=True` pentru a profita la maximum de preprocesare!
 
 
 ## Metrice pentru sumarizare[[metrics-for-text-summarization]]
@@ -329,11 +317,8 @@ reference_summary = "I loved reading the Hunger Games"
 
 O modalitate de a le compara ar fi să numărați numărul de cuvinte care se suprapun, care în acest caz ar fi 6. Cu toate acestea, acest lucru este un pic crud, astfel încât, în schimb, ROUGE se bazează pe calcularea scorurilor _preicision_ și _recall_ pentru suprapunere.
 
-<Tip>
-
-🙋 Nu vă faceți griji dacă aceasta este prima dată când auziți de precision și recall - vom trece împreună prin câteva exemple explicite pentru a clarifica totul. Aceste metrici sunt de obicei întâlnite în sarcinile de clasificare, deci dacă doriți să înțelegeți cum sunt definite precizia și recallul în acest context, vă recomandăm să consultați [ghidurile `scikit-learn`] (https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
-
-</Tip>
+> [!TIP]
+> 🙋 Nu vă faceți griji dacă aceasta este prima dată când auziți de precision și recall - vom trece împreună prin câteva exemple explicite pentru a clarifica totul. Aceste metrici sunt de obicei întâlnite în sarcinile de clasificare, deci dacă doriți să înțelegeți cum sunt definite precizia și recallul în acest context, vă recomandăm să consultați [ghidurile `scikit-learn`](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
 
 Pentru ROUGE, recall măsoară cât de mult din rezumatul de referință este capturat de cel generat. Dacă comparăm doar cuvinte, recall poate fi calculată conform următoarei formule:
 
@@ -386,11 +371,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 
 Grozav, numerele de precision și de recall se potrivesc! Acum ce se întâmplă cu celelalte scoruri ROUGE? `rouge2` măsoară suprapunerea dintre bigrame (suprapunerea perechilor de cuvinte), în timp ce `rougeL` și `rougeLsum` măsoară cele mai lungi secvențe de cuvinte care se potrivesc, căutând cele mai lungi substraturi comune în rezumatele generate și de referință. Termenul "sum" din `rougeLsum` se referă la faptul că această metrică este calculată pentru un rezumat întreg, în timp ce `rougeL` este calculată ca medie a propozițiilor individuale.
 
-<Tip>
-
-✏️ **Încercați!** Creați propriul exemplu de rezumat generat și de referință și vedeți dacă scorurile ROUGE rezultate sunt în concordanță cu un calcul manual bazat pe formulele de precision și recall. Pentru puncte bonus, împărțiți textul în bigrame și comparați precizia și recallul pentru metrica `rouge2`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Creați propriul exemplu de rezumat generat și de referință și vedeți dacă scorurile ROUGE rezultate sunt în concordanță cu un calcul manual bazat pe formulele de precision și recall. Pentru puncte bonus, împărțiți textul în bigrame și comparați precizia și recallul pentru metrica `rouge2`.
 
 Vom folosi aceste scoruri ROUGE pentru a urmări performanța modelului nostru, dar înainte de a face acest lucru, să facem ceva ce orice bun practician NLP ar trebui să facă: să creăm un baseline puternic, dar simplu!
 
@@ -480,11 +462,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 Dacă vă întrebați de ce nu vedeți niciun avertisment cu privire la fine-tuningul modelului pe un downstream task, acest lucru se datorează faptului că pentru sarcinile secvență-la-secvență păstrăm toate weighturile rețelei. Comparați acest lucru cu modelul nostru de clasificare a textului din [Capitolul 3](/course/chapter3), unde headul modelului preantrenat a fost înlocuit cu o rețea inițializată aleatoriu.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă vă întrebați de ce nu vedeți niciun avertisment cu privire la fine-tuningul modelului pe un downstream task, acest lucru se datorează faptului că pentru sarcinile secvență-la-secvență păstrăm toate weighturile rețelei. Comparați acest lucru cu modelul nostru de clasificare a textului din [Capitolul 3](/course/chapter3), unde headul modelului preantrenat a fost înlocuit cu o rețea inițializată aleatoriu.
 
 Următorul lucru pe care trebuie să îl facem este să ne conectăm la Hugging Face Hub. Dacă executați acest cod într-un notebook, puteți face acest lucru cu următoarea funcție:
 
@@ -845,11 +824,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Dacă faceți antrenarea pe un TPU, va trebui să mutați tot codul de mai sus într-o funcție de antrenare aparte. Consultați [Capitolul 3](/course/chapter3) pentru mai multe detalii.
-
-</Tip>
+> [!TIP]
+> 🚨 Dacă faceți antrenarea pe un TPU, va trebui să mutați tot codul de mai sus într-o funcție de antrenare aparte. Consultați [Capitolul 3](/course/chapter3) pentru mai multe detalii.
 
 Acum că ne-am pregătit obiectele, mai avem trei lucruri de făcut:
 
diff --git a/chapters/ro/chapter7/6.mdx b/chapters/ro/chapter7/6.mdx
index 4c78cb1bd..85f4d79cb 100644
--- a/chapters/ro/chapter7/6.mdx
+++ b/chapters/ro/chapter7/6.mdx
@@ -135,11 +135,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-Preantrenarea modelului de limbaj va dura ceva timp. Vă sugerăm să rulați mai întâi bucla de antrenare pe un sample de date prin decomentarea celor două linii parțiale de mai sus și să vă asigurați că antrenarea se finalizează cu succes și că modelele sunt stocate. Nimic nu este mai frustrant decât o rulare de antrenare care eșuează la ultimul pas pentru că ați uitat să creați un folder sau pentru că există o greșeală de tipar la sfârșitul buclei de antrenare!
-
-</Tip>
+> [!TIP]
+> Preantrenarea modelului de limbaj va dura ceva timp. Vă sugerăm să rulați mai întâi bucla de antrenare pe un sample de date prin decomentarea celor două linii parțiale de mai sus și să vă asigurați că antrenarea se finalizează cu succes și că modelele sunt stocate. Nimic nu este mai frustrant decât o rulare de antrenare care eșuează la ultimul pas pentru că ați uitat să creați un folder sau pentru că există o greșeală de tipar la sfârșitul buclei de antrenare!
 
 Să ne uităm la un exemplu din dataset. Vom arăta doar primele 200 de caractere din fiecare câmp:
 
@@ -252,11 +249,8 @@ Avem acum 16,7 milioane de exemple cu 128 de tokenii fiecare, ceea ce corespunde
 
 Acum că avem datasetul gata, hai să configurăm modelul!
 
-<Tip>
-
-✏️ **Încercați!** Eliminarea tuturor bucăților care sunt mai mici decât dimensiunea contextului nu a fost o problemă majoră aici, deoarece folosim ferestre de context mici. Pe măsură ce creșteți dimensiunea contextului (sau dacă aveți un corpus de documente scurte), fracțiunea de segmente care sunt aruncate va crește și ea. O modalitate mai eficientă de a pregăti datele este de a uni toate sampleurile tokenizate într-un batch cu un token `eos_token_id` între ele, iar apoi de a efectua chunkingul pe secvențele concatenate. Ca exercițiu, modificați funcția `tokenize()` pentru a utiliza această abordare. Rețineți că veți dori să setați `truncation=False` și să eliminați celelalte argumente din tokenizer pentru a obține secvența completă de token IDs.
-
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Eliminarea tuturor bucăților care sunt mai mici decât dimensiunea contextului nu a fost o problemă majoră aici, deoarece folosim ferestre de context mici. Pe măsură ce creșteți dimensiunea contextului (sau dacă aveți un corpus de documente scurte), fracțiunea de segmente care sunt aruncate va crește și ea. O modalitate mai eficientă de a pregăti datele este de a uni toate sampleurile tokenizate într-un batch cu un token `eos_token_id` între ele, iar apoi de a efectua chunkingul pe secvențele concatenate. Ca exercițiu, modificați funcția `tokenize()` pentru a utiliza această abordare. Rețineți că veți dori să setați `truncation=False` și să eliminați celelalte argumente din tokenizer pentru a obține secvența completă de token IDs.
 
 
 ## Inițializarea unui nou model[[initializing-a-new-model]]
@@ -398,11 +392,8 @@ tf_eval_dataset = model.prepare_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ Schimbarea inputurilor și a labelurilor pentru a le alinia are loc în interiorul modelului, astfel încât data collatorului doar copiază inputurile pentru a crea labeluri.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Schimbarea inputurilor și a labelurilor pentru a le alinia are loc în interiorul modelului, astfel încât data collatorul doar copiază inputurile pentru a crea labeluri.
 
 
 Acum avem totul pregătit pentru a ne antrena modelul - până la urmă nu a fost atât de greu! Înainte de a începe antrenamentul, trebuie să ne conectăm la Hugging Face. Dacă lucrați într-un notebook, puteți face acest lucru cu următoarea funcție de utilitate:
@@ -501,25 +492,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
+> [!TIP]
+> ✏️ **Încercați!** Ne-a luat doar aproximativ 30 de linii de cod în plus față de `TrainingArguments` pentru a ajunge de la texte brute la antrenarea GPT-2. Încercați antrenarea cu propriul dataset și vedeți dacă puteți obține rezultate bune!
 
-✏️ **Try it out!** Ne-a luat doar aproximativ 30 de linii de cod în plus față de `TrainingArguments` pentru a ajunge de la texte brute la antrenarea GPT-2. Încercați antrenarea cu propriul dataset și vedeți dacă puteți obține rezultate bune!
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 Dacă aveți acces la un calculator cu mai multe GPU-uri, încercați să rulați codul acolo. `Trainer` gestionează automat mai multe calculatoare, iar acest lucru poate accelera foarte mult antrenamentul.
-
-{:else}
-
-💡 Dacă aveți acces la un calculator cu mai multe GPU-uri, puteți încerca să utilizați un context `MirroredStrategy` pentru a accelera substanțial antrenarea. Va trebui să creați un obiect `tf.distribute.MirroredStrategy` și să vă asigurați că toate metodele `to_tf_dataset()` sau `prepare_tf_dataset()`, precum și crearea modelului și apelul la `fit()` sunt rulate în contextul său `scope()`. Puteți vedea documentația despre acest lucru [aici] (https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 Dacă aveți acces la un calculator cu mai multe GPU-uri, încercați să rulați codul acolo. `Trainer` gestionează automat mai multe calculatoare, iar acest lucru poate accelera foarte mult antrenamentul.
+>
+> {:else}
+>
+> 💡 Dacă aveți acces la un calculator cu mai multe GPU-uri, puteți încerca să utilizați un context `MirroredStrategy` pentru a accelera substanțial antrenarea. Va trebui să creați un obiect `tf.distribute.MirroredStrategy` și să vă asigurați că toate metodele `to_tf_dataset()` sau `prepare_tf_dataset()`, precum și crearea modelului și apelul la `fit()` sunt rulate în contextul său `scope()`. Puteți vedea documentația despre acest lucru [aici](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
+>
+> {/if}
 
 ## Generarea codului cu un pipeline[[code-generation-with-a-pipeline]]
 
@@ -795,11 +780,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Dacă antrenați pe un TPU, va trebui să mutați tot codul începând cu celula de mai sus într-o funcție de antrenare dedicată. Consultați [Capitolul 3](/course/chapter3) pentru mai multe detalii.
-
-</Tip>
+> [!TIP]
+> 🚨 Dacă antrenați pe un TPU, va trebui să mutați tot codul începând cu celula de mai sus într-o funcție de antrenare dedicată. Consultați [Capitolul 3](/course/chapter3) pentru mai multe detalii.
 
 Acum că am trimis `train_dataloader` la `accelerator.prepare()`, putem utiliza lungimea acestuia pentru a calcula numărul de pași de antrenare. Rețineți că ar trebui să facem acest lucru întotdeauna după ce pregătim dataloaderurile, deoarece această metodă îi va modifica lungimea. Utilizăm un program liniar clasic de la rata de învățare la 0:
 
@@ -899,16 +881,10 @@ for epoch in range(num_train_epochs):
 
 Și asta e tot - acum aveți propria buclă de antrenare personalizată pentru modele de limbaj cauzal, cum ar fi GPT-2, pe care o puteți personaliza în continuare în funcție de nevoile voastre.
 
-<Tip>
-
-✏️ **încercați!** Fie vă creați propria funcție de pierdere personalizată, adaptată la cazul vostru de utilizare, fie adăugați un alt pas personalizat în bucla de antrenare.
-
-</Tip>
-
-<Tip>
-
-✏️ **încercați!** Atunci când efectuați experimente de antrenare de lungă durată, este o idee bună să înregistrați parametrii importanți utilizând instrumente precum TensorBoard sau Weights & Biases. Adăugați o logare adecvată la bucla de antrenare, astfel încât să puteți verifica întotdeauna cum decurge antrenarea.
+> [!TIP]
+> ✏️ **Încercați!** Fie vă creați propria funcție de pierdere personalizată, adaptată la cazul vostru de utilizare, fie adăugați un alt pas personalizat în bucla de antrenare.
 
-</Tip>
+> [!TIP]
+> ✏️ **Încercați!** Atunci când efectuați experimente de antrenare de lungă durată, este o idee bună să înregistrați parametrii importanți utilizând instrumente precum TensorBoard sau Weights & Biases. Adăugați o logare adecvată la bucla de antrenare, astfel încât să puteți verifica întotdeauna cum decurge antrenarea.
 
 {/if}
diff --git a/chapters/ro/chapter7/7.mdx b/chapters/ro/chapter7/7.mdx
index 619ecf720..bcf38863e 100644
--- a/chapters/ro/chapter7/7.mdx
+++ b/chapters/ro/chapter7/7.mdx
@@ -32,11 +32,8 @@ Vom face fine-tuning unui-model BERT pe [datasetul SQuAD] (https://rajpurkar.git
 
 Aceasta este de fapt o prezentare a modelului care a fost antrenat și încărcat în Hub folosind codul prezentat în această secțiune. Puteți să-l găsiți și să verificați predicțiile [aici](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F).
 
-<Tip>
-
-💡 Modelele bazate doar pe encoding, cum ar fi BERT, tind să fie foarte bune la extragerea răspunsurilor la întrebări de tip factoid, cum ar fi "Cine a inventat arhitectura Transformer?", dar nu se descurcă prea bine atunci când primesc întrebări deschise, cum ar fi "De ce este cerul albastru?" În aceste cazuri mai dificile, modelele encoder-decoder precum T5 și BART sunt utilizate de obicei pentru a sintetiza informațiile într-un mod destul de similar cu [rezumarea textului](/course/chapter7/5). Dacă sunteți interesat de acest tip de răspuns *generativ* la întrebări, vă recomandăm să consultați [demo-ul](https://yjernite.github.io/lfqa.html) nostru bazat pe [datasetul ELI5](https://huggingface.co/datasets/eli5).
-
-</Tip>
+> [!TIP]
+> 💡 Modelele bazate doar pe encoder, cum ar fi BERT, tind să fie foarte bune la extragerea răspunsurilor la întrebări de tip factoid, cum ar fi "Cine a inventat arhitectura Transformer?", dar nu se descurcă prea bine atunci când primesc întrebări deschise, cum ar fi "De ce este cerul albastru?" În aceste cazuri mai dificile, modelele encoder-decoder precum T5 și BART sunt utilizate de obicei pentru a sintetiza informațiile într-un mod destul de similar cu [rezumarea textului](/course/chapter7/5). Dacă sunteți interesat de acest tip de răspuns *generativ* la întrebări, vă recomandăm să consultați [demo-ul](https://yjernite.github.io/lfqa.html) nostru bazat pe [datasetul ELI5](https://huggingface.co/datasets/eli5).
 
 ## Pregătirea datelor[[preparing-the-data]]
 
@@ -359,11 +356,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 Într-adevăr, nu vedem răspunsul în interiorul contextului.
 
-<Tip>
-
-✏️ **E rândul tău!** Atunci când se utilizează arhitectura XLNet, paddingul este aplicat la stânga, iar întrebarea și contextul sunt schimbate. Adaptați tot codul pe care tocmai l-am văzut la arhitectura XLNet (și adăugați `padding=True`). Fiți conștienți de faptul că tokenul `[CLS]` ar putea să nu se afle la poziția 0 în cazul aplicării paddingului.
-
-</Tip>
+> [!TIP]
+> ✏️ **E rândul tău!** Atunci când se utilizează arhitectura XLNet, paddingul este aplicat la stânga, iar întrebarea și contextul sunt schimbate. Adaptați tot codul pe care tocmai l-am văzut la arhitectura XLNet (și adăugați `padding=True`). Fiți conștienți de faptul că tokenul `[CLS]` ar putea să nu se afle la poziția 0 în cazul aplicării paddingului.
 
 Acum că am văzut pas cu pas cum să preprocesăm datele de antrenare, le putem grupa într-o funcție pe care o vom aplica întregului dataset de antrenare. Vom umple fiecare caracteristică la lungimea maximă pe care am stabilit-o, deoarece majoritatea contextelor vor fi lungi (iar sampleurile corespunzătoare vor fi împărțite în mai multe caracteristici), astfel încât nu există niciun beneficiu real pentru aplicarea paddingului dinamic aici:
 
@@ -911,11 +905,8 @@ tf.keras.mixed_precision.set_global_policy("mixed_float16")
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 Dacă folderul de ieșire pe care îl utilizați există, acesta trebuie să fie o clonă locală a repositoriul în care doriți să faceți push (deci setați un nume nou dacă primiți o eroare la definirea `Trainer`).
-
-</Tip>
+> [!TIP]
+> 💡 Dacă folderul de ieșire pe care îl utilizați există, acesta trebuie să fie o clonă locală a repositoriului în care doriți să faceți push (deci setați un nume nou dacă primiți o eroare la definirea `Trainer`).
 
 În cele din urmă, trecem totul în clasa `Trainer` și lansăm antrenarea:
 
@@ -999,11 +990,8 @@ De asemenea, `Trainer` redactează un model card cu toate rezultatele evaluării
 
 În această etapă, puteți utiliza widgetul de inferență de pe Model Hub pentru a testa modelul și pentru a-l oferi prietenilor, familia și animalele de companie preferate. Ați făcut fine-tune cu succes unui model pentru o sarcină de răspundere a unei întrebari - felicitări!
 
-<Tip>
-
-✏️ **E rândul tău!** Încearcă un alt model de arhitectură pentru a vedea dacă are performanțe mai bune la această sarcină!
-
-</Tip>
+> [!TIP]
+> ✏️ **E rândul tău!** Încearcă o altă arhitectură de model pentru a vedea dacă are performanțe mai bune la această sarcină!
 
 {#if fw === 'pt'}
 
diff --git a/chapters/ro/chapter8/2.mdx b/chapters/ro/chapter8/2.mdx
index 5473d60d4..7d84f81b0 100644
--- a/chapters/ro/chapter8/2.mdx
+++ b/chapters/ro/chapter8/2.mdx
@@ -85,11 +85,8 @@ Oh nu, se pare că ceva a mers prost! Dacă sunteți nou în programare, acest t
 
 Există multe informații conținute în aceste rapoarte, așa că să parcurgem împreună părțile cheie. Primul lucru de reținut este că traceback-urile ar trebui citite _de jos în sus_. Aceasta poate părea ciudat dacă sunteți obișnuiți să citiți textul în engleză de sus în jos, dar reflectă faptul că traceback-ul arată secvența de apeluri de funcții pe care `pipeline` le face când descarcă modelul și tokenizer-ul. (Consultați [Capitolul 2](/course/chapter2) pentru mai multe detalii despre cum funcționează `pipeline` în culise.)
 
-<Tip>
-
-🚨 Vedeți acea casetă albastră din jurul "6 frames" în traceback-ul din Google Colab? Aceasta este o caracteristică specială a Colab, care comprimă traceback-ul în "frame-uri". Dacă nu reușiți să găsiți sursa unei erori, asigurați-vă că extindeți traceback-ul complet făcând clic pe acele două săgeți mici.
-
-</Tip>
+> [!TIP]
+> 🚨 Vedeți acea casetă albastră din jurul "6 frames" în traceback-ul din Google Colab? Aceasta este o caracteristică specială a Colab, care comprimă traceback-ul în "frame-uri". Dacă nu reușiți să găsiți sursa unei erori, asigurați-vă că extindeți traceback-ul complet făcând clic pe acele două săgeți mici.
 
 Aceasta înseamnă că ultima linie a traceback-ului indică ultimul mesaj de eroare și dă numele excepției care a fost ridicată. În acest caz, tipul excepției este `OSError`, care indică o eroare legată de sistem. Dacă citim mesajul de eroare însoțitor, putem vedea că pare să existe o problemă cu fișierul *config.json* al modelului și ni se dau două sugestii pentru a o rezolva:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Dacă întâlniți un mesaj de eroare care este dificil de înțeles, doar copiați și lipiți mesajul în bara de căutare Google sau [Stack Overflow](https://stackoverflow.com/) (da, chiar!). Există o șansă bună că nu sunteți prima persoană care întâlnește eroarea, și aceasta este o modalitate bună de a găsi soluții pe care alții din comunitate le-au postat. De exemplu, căutarea pentru `OSError: Can't load config for` pe Stack Overflow dă mai multe [rezultate](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) care ar putea fi folosite ca punct de plecare pentru rezolvarea problemei.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă întâlniți un mesaj de eroare care este dificil de înțeles, doar copiați și lipiți mesajul în bara de căutare Google sau [Stack Overflow](https://stackoverflow.com/) (da, chiar!). Există o șansă bună că nu sunteți prima persoană care întâlnește eroarea, și aceasta este o modalitate bună de a găsi soluții pe care alții din comunitate le-au postat. De exemplu, căutarea pentru `OSError: Can't load config for` pe Stack Overflow dă mai multe [rezultate](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) care ar putea fi folosite ca punct de plecare pentru rezolvarea problemei.
 
 Prima sugestie ne cere să verificăm dacă ID-ul modelului este într-adevăr corect, așa că primul lucru de făcut este să copiem identificatorul și să îl lipim în bara de căutare a Hub-ului:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 Abordarea pe care o adoptăm aici nu este infailibilă, deoarece colegul nostru poate să fi modificat configurația `distilbert-base-uncased` înainte de a ajusta fin modelul. În viața reală, am vrea să verificăm cu ei mai întâi, dar în scopurile acestei secțiuni vom presupune că au folosit configurația implicită.
-
-</Tip>
+> [!WARNING]
+> 🚨 Abordarea pe care o adoptăm aici nu este infailibilă, deoarece colegul nostru poate să fi modificat configurația `distilbert-base-uncased` înainte de a ajusta fin modelul. În viața reală, am vrea să verificăm cu ei mai întâi, dar în scopurile acestei secțiuni vom presupune că au folosit configurația implicită.
 
 Apoi putem împinge aceasta în repository-ul nostru de model cu funcția `push_to_hub()` a configurației:
 
diff --git a/chapters/ro/chapter8/4.mdx b/chapters/ro/chapter8/4.mdx
index f517bb4f9..ffc274d13 100644
--- a/chapters/ro/chapter8/4.mdx
+++ b/chapters/ro/chapter8/4.mdx
@@ -245,11 +245,8 @@ Deci `1` înseamnă `neutral`, ceea ce înseamnă că cele două propoziții pe
 
 Nu avem ID-uri de tip token aici, deoarece DistilBERT nu le așteaptă; dacă aveți unele în modelul vostru, ar trebui să vă asigurați de asemenea că se potrivesc corespunzător unde sunt prima și a doua propoziție în intrare.
 
-<Tip>
-
-✏️ **Rândul vostru!** Verificați că totul pare corect cu al doilea element al setului de date de antrenament.
-
-</Tip>
+> [!TIP]
+> ✏️ **Rândul vostru!** Verificați că totul pare corect cu al doilea element al setului de date de antrenament.
 
 Facem verificarea doar pe setul de antrenament aici, dar ar trebui desigur să verificați din nou seturile de validare și test în același mod.
 
@@ -523,11 +520,8 @@ Ori de câte ori primiți un mesaj de eroare care începe cu `RuntimeError: CUDA
 
 Pentru a rezolva această problemă, trebuie doar să folosiți mai puțin spațiu GPU -- ceva care este adesea mai ușor de spus decât de făcut. În primul rând, asigurați-vă că nu aveți două modele pe GPU în același timp (dacă nu este necesar pentru problema voastră, desigur). Apoi, probabil ar trebui să reduceți dimensiunea batch-ului, deoarece aceasta afectează direct dimensiunile tuturor ieșirilor intermediare ale modelului și gradienții lor. Dacă problema persistă, considerați folosirea unei versiuni mai mici a modelului vostru.
 
-<Tip>
-
-În următoarea parte a cursului, vom examina tehnici mai avansate care vă pot ajuta să reduceți amprenta de memorie și să vă permită să ajustați fin cele mai mari modele.
-
-</Tip>
+> [!TIP]
+> În următoarea parte a cursului, vom examina tehnici mai avansate care vă pot ajuta să reduceți amprenta de memorie și să vă permită să ajustați fin cele mai mari modele.
 
 ### Evaluarea modelului[[evaluarea-modelului]]
 
@@ -554,11 +548,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 Ar trebui să vă asigurați întotdeauna că puteți rula `trainer.evaluate()` înainte de a lansa `trainer.train()`, pentru a evita risipa multor resurse de calcul înainte de a întâlni o eroare.
-
-</Tip>
+> [!TIP]
+> 💡 Ar trebui să vă asigurați întotdeauna că puteți rula `trainer.evaluate()` înainte de a lansa `trainer.train()`, pentru a evita risipa multor resurse de calcul înainte de a întâlni o eroare.
 
 Înainte de a încerca să depanați o problemă în bucla de evaluare, ar trebui să vă asigurați mai întâi că v-ați uitat la date, puteți forma un batch corespunzător și puteți rula modelul pe el. Am completat toți acești pași, așa că următorul cod poate fi executat fără eroare:
 
@@ -688,11 +679,8 @@ trainer.train()
 
 În acest caz, nu mai sunt probleme, și scriptul nostru va ajusta fin un model care ar trebui să dea rezultate rezonabile. Dar ce putem face când antrenamentul continuă fără nicio eroare, și modelul antrenat nu performează deloc bine? Aceasta este partea cea mai dificilă a machine learning-ului, și vă vom arăta câteva tehnici care pot ajuta.
 
-<Tip>
-
-💡 Dacă folosiți o buclă de antrenament manuală, aceiași pași se aplică pentru a vă depana pipeline-ul de antrenament, dar este mai ușor să îi separați. Asigurați-vă că nu ați uitat `model.eval()` sau `model.train()` la locurile potrivite, sau `zero_grad()` la fiecare pas, totuși!
-
-</Tip>
+> [!TIP]
+> 💡 Dacă folosiți o buclă de antrenament manuală, aceiași pași se aplică pentru a vă depana pipeline-ul de antrenament, dar este mai ușor să îi separați. Asigurați-vă că nu ați uitat `model.eval()` sau `model.train()` la locurile potrivite, sau `zero_grad()` la fiecare pas, totuși!
 
 ## Depanarea erorilor silențioase în timpul antrenamentului[[depanarea-erorilor-silentioase-in-timpul-antrenamentului]]
 
@@ -707,11 +695,8 @@ Modelul vostru va învăța ceva doar dacă este de fapt posibil să învețe ce
 - Există o etichetă care este mai comună decât altele?
 - Care ar trebui să fie loss-ul/metrica dacă modelul ar prezice un răspuns aleatoriu/întotdeauna același răspuns?
 
-<Tip warning={true}>
-
-⚠️ Dacă faceți antrenament distribuit, afișați eșantioane din setul vostru de date în fiecare proces și verificați de trei ori că obțineți același lucru. O eroare comună este să aveți o sursă de aleatoriu în crearea datelor care face ca fiecare proces să aibă o versiune diferită a setului de date.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Dacă faceți antrenament distribuit, afișați eșantioane din setul vostru de date în fiecare proces și verificați de trei ori că obțineți același lucru. O eroare comună este să aveți o sursă de aleatoriu în crearea datelor care face ca fiecare proces să aibă o versiune diferită a setului de date.
 
 După ce vă uitați la datele voastre, treceți prin câteva dintre predicțiile modelului și decodați-le și pe ele. Dacă modelul prezice întotdeauna același lucru, ar putea fi pentru că setul vostru de date este părtinitor către o categorie (pentru problemele de clasificare); tehnici precum supraesantionarea claselor rare ar putea ajuta.
 
@@ -740,11 +725,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 Dacă datele voastre de antrenament sunt dezechilibrate, asigurați-vă să construiți un batch de date de antrenament care conține toate etichetele.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă datele voastre de antrenament sunt dezechilibrate, asigurați-vă să construiți un batch de date de antrenament care conține toate etichetele.
 
 Modelul rezultat ar trebui să aibă rezultate aproape perfecte pe același `batch`. Să calculăm metrica pe predicțiile rezultate:
 
@@ -765,11 +747,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 Dacă nu reușiți să faceți modelul vostru să obțină rezultate perfecte ca aceasta, înseamnă că există ceva greșit cu modul în care ați formulat problema sau datele voastre, așa că ar trebui să reparați asta. Doar când reușiți să treceți testul de supraajustare puteți fi siguri că modelul vostru poate învăța de fapt ceva.
 
-<Tip warning={true}>
-
-⚠️ Va trebui să vă recreați modelul și `Trainer` după acest test, deoarece modelul obținut probabil nu va putea să se recupereze și să învețe ceva util pe setul vostru complet de date.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Va trebui să vă recreați modelul și `Trainer` după acest test, deoarece modelul obținut probabil nu va putea să se recupereze și să învețe ceva util pe setul vostru complet de date.
 
 ### Nu ajustați nimic până nu aveți o primă linie de bază[[nu-ajustati-nimic-pana-nu-aveti-o-prima-linie-de-baza]]
 
diff --git a/chapters/ro/chapter8/4_tf.mdx b/chapters/ro/chapter8/4_tf.mdx
index 2895bcf60..620901446 100644
--- a/chapters/ro/chapter8/4_tf.mdx
+++ b/chapters/ro/chapter8/4_tf.mdx
@@ -111,15 +111,12 @@ model.compile(optimizer="adam")
 
 Acum vom folosi loss-ul intern al modelului, și această problemă ar trebui să fie rezolvată! 
 
-<Tip>
-
-✏️ **Rândul vostru!** Ca o provocare opțională după ce am rezolvat celelalte probleme, puteți încerca să vă întoarceți la acest pas și să faceți modelul să funcționeze cu loss-ul original calculat de Keras în loc de loss-ul intern. Va trebui să adăugați `"labels"` la argumentul `label_cols` al `to_tf_dataset()` pentru a vă asigura că etichetele sunt scoase corect, ceea ce vă va da gradienți -- dar mai există o problemă cu loss-ul pe care l-am specificat. Antrenamentul va rula încă cu această problemă, dar învățarea va fi foarte lentă și va ajunge la un platou la un loss de antrenament ridicat. Puteți să vă dați seama ce este?
-
-Un indiciu codificat ROT13, dacă sunteți blocați: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
-
-Și un al doilea indiciu: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
-
-</Tip>
+> [!TIP]
+> ✏️ **Rândul vostru!** Ca o provocare opțională după ce am rezolvat celelalte probleme, puteți încerca să vă întoarceți la acest pas și să faceți modelul să funcționeze cu loss-ul original calculat de Keras în loc de loss-ul intern. Va trebui să adăugați `"labels"` la argumentul `label_cols` al `to_tf_dataset()` pentru a vă asigura că etichetele sunt scoase corect, ceea ce vă va da gradienți -- dar mai există o problemă cu loss-ul pe care l-am specificat. Antrenamentul va rula încă cu această problemă, dar învățarea va fi foarte lentă și va ajunge la un platou la un loss de antrenament ridicat. Puteți să vă dați seama ce este?
+>
+> Un indiciu codificat ROT13, dacă sunteți blocați: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
+>
+> Și un al doilea indiciu: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
 
 Acum, să încercăm antrenamentul. Ar trebui să primim gradienți acum, așa că cu speranță (muzică sinistră se aude aici) putem pur și simplu să apelăm `model.fit()` și totul va funcționa bine!
 
@@ -277,11 +274,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡 Puteți de asemenea importa funcția `create_optimizer()` din 🤗 Transformers, care vă va da un optimizator AdamW cu weight decay corect precum și warmup și decay pentru rata de învățare. Acest optimizator va produce adesea rezultate puțin mai bune decât cele pe care le obțineți cu optimizatorul Adam implicit.
-
-</Tip>
+> [!TIP]
+> 💡 Puteți de asemenea importa funcția `create_optimizer()` din 🤗 Transformers, care vă va da un optimizator AdamW cu weight decay corect precum și warmup și decay pentru rata de învățare. Acest optimizator va produce adesea rezultate puțin mai bune decât cele pe care le obțineți cu optimizatorul Adam implicit.
 
 Acum, putem încerca să încadrăm modelul cu noua rată de învățare îmbunătățită:
 
@@ -303,11 +297,8 @@ Am acoperit problemele din scriptul de mai sus, dar există mai multe erori comu
 
 Semnul revelator al rămânerii fără memorie este o eroare ca "OOM when allocating tensor" -- OOM este prescurtare pentru "out of memory". Aceasta este o problemă foarte comună când aveți de-a face cu modele de limbă mari. Dacă întâlniți aceasta, o strategie bună este să vă înjumătățiți dimensiunea batch-ului și să încercați din nou. Rețineți, totuși, că unele modele sunt *foarte* mari. De exemplu, GPT-2 complet are 1.5B parametri, ceea ce înseamnă că veți avea nevoie de 6 GB de memorie doar pentru a stoca modelul, și încă 6 GB pentru gradienții săi! Antrenamentul modelului GPT-2 complet va necesita de obicei peste 20 GB de VRAM indiferent de dimensiunea batch-ului pe care îl folosiți, pe care doar câteva GPU-uri îl au. Modele mai ușoare cum ar fi `distilbert-base-cased` sunt mult mai ușor de rulat, și se antrenează mult mai rapid de asemenea.
 
-<Tip>
-
-În următoarea parte a cursului, vom examina tehnici mai avansate care vă pot ajuta să reduceți amprenta de memorie și să vă permită să ajustați fin cele mai mari modele.
-
-</Tip>
+> [!TIP]
+> În următoarea parte a cursului, vom examina tehnici mai avansate care vă pot ajuta să reduceți amprenta de memorie și să vă permită să ajustați fin cele mai mari modele.
 
 ### TensorFlow flămând flămând 🦛[[tensorflow-flamand-flamand]]
 
@@ -362,21 +353,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 Dacă datele voastre de antrenament sunt dezechilibrate, asigurați-vă să construiți un batch de date de antrenament care conține toate etichetele.
-
-</Tip>
+> [!TIP]
+> 💡 Dacă datele voastre de antrenament sunt dezechilibrate, asigurați-vă să construiți un batch de date de antrenament care conține toate etichetele.
 
 Modelul rezultat ar trebui să aibă rezultate aproape perfecte pe `batch`, cu un loss care scade rapid către 0 (sau valoarea minimă pentru loss-ul pe care îl folosiți).
 
 Dacă nu reușiți să faceți modelul vostru să obțină rezultate perfecte ca aceasta, înseamnă că există ceva greșit cu modul în care ați formulat problema sau datele voastre, așa că ar trebui să reparați asta. Doar când reușiți să treceți testul de supraajustare puteți fi siguri că modelul vostru poate învăța de fapt ceva.
 
-<Tip warning={true}>
-
-⚠️ Va trebui să vă recreați modelul și să recompilați după acest test de supraajustare, deoarece modelul obținut probabil nu va putea să se recupereze și să învețe ceva util pe setul vostru complet de date.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Va trebui să vă recreați modelul și să recompilați după acest test de supraajustare, deoarece modelul obținut probabil nu va putea să se recupereze și să învețe ceva util pe setul vostru complet de date.
 
 ### Nu ajustați nimic până nu aveți o primă linie de bază[[nu-ajustati-nimic-pana-nu-aveti-o-prima-linie-de-baza]]
 
diff --git a/chapters/ro/chapter8/5.mdx b/chapters/ro/chapter8/5.mdx
index 6a1adfeda..85338c6ce 100644
--- a/chapters/ro/chapter8/5.mdx
+++ b/chapters/ro/chapter8/5.mdx
@@ -17,11 +17,8 @@ Când sunteți siguri că aveți un bug în mână, primul pas este să construi
 
 Este foarte important să izolați bucata de cod care produce bug-ul, deoarece nimeni din echipa Hugging Face nu este magician (încă), și nu pot repara ceea ce nu pot vedea. Un exemplu minimal reproductibil ar trebui, așa cum indică numele, să fie reproductibil. Aceasta înseamnă că nu ar trebui să se bazeze pe niciun fișier extern sau date pe care le-ați putea avea. Încercați să înlocuiți datele pe care le folosiți cu niște valori fictive care arată ca cele reale și încă produc aceeași eroare.
 
-<Tip>
-
-🚨 Multe probleme din repository-ul 🤗 Transformers sunt nerezolvate deoarece datele folosite pentru a le reproduce nu sunt accesibile.
-
-</Tip>
+> [!TIP]
+> 🚨 Multe probleme din repository-ul 🤗 Transformers sunt nerezolvate deoarece datele folosite pentru a le reproduce nu sunt accesibile.
 
 Odată ce aveți ceva care este autonom, puteți încerca să îl reduceți la și mai puține linii de cod, construind ceea ce numim un _exemplu minimal reproductibil_. Deși aceasta necesită puțină muncă suplimentară din partea voastră, veți fi aproape garantat să primiți ajutor și o reparație dacă furnizați un reproductor de bug frumos și scurt.
 
diff --git a/chapters/ro/chapter9/1.mdx b/chapters/ro/chapter9/1.mdx
index 1363a4b94..6e2c1fb6e 100644
--- a/chapters/ro/chapter9/1.mdx
+++ b/chapters/ro/chapter9/1.mdx
@@ -32,6 +32,5 @@ Iată câteva exemple de demo-uri de machine learning construite cu Gradio:
 
 Acest capitol este împărțit în secțiuni care includ atât _concepte_ cât și _aplicații_. După ce învățați conceptul din fiecare secțiune, îl veți aplica pentru a construi un anumit tip de demo, de la clasificarea imaginilor la recunoașterea vocii. Până când terminați acest capitol, veți putea construi aceste demo-uri (și multe altele!) în doar câteva linii de cod Python.
 
-<Tip>
-👀 Consultați <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> pentru a vedea multe exemple recente de demo-uri de machine learning construite de comunitatea machine learning!
-</Tip> 
\ No newline at end of file
+> [!TIP]
+> 👀 Consultați <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> pentru a vedea multe exemple recente de demo-uri de machine learning construite de comunitatea machine learning! 
\ No newline at end of file
diff --git a/chapters/ro/chapter9/7.mdx b/chapters/ro/chapter9/7.mdx
index 7ef5f2b4d..8f3631fa9 100644
--- a/chapters/ro/chapter9/7.mdx
+++ b/chapters/ro/chapter9/7.mdx
@@ -61,9 +61,8 @@ demo.launch()
 Acest exemplu simplu de mai sus introduce 4 concepte care stau la baza Blocks:
 
 1. Blocks vă permite să construiți aplicații web care combină markdown, HTML, butoane și componente interactive simplu prin instanțierea obiectelor în Python într-un context `with gradio.Blocks`.
-<Tip>
-🙋Dacă nu sunteți familiarizați cu declarația `with` în Python, vă recomandăm să consultați excelentul [tutorial](https://realpython.com/python-with-statement/) de la Real Python. Întoarceți-vă aici după citirea acestuia 🤗
-</Tip>
+> [!TIP]
+> 🙋Dacă nu sunteți familiarizați cu declarația `with` în Python, vă recomandăm să consultați excelentul [tutorial](https://realpython.com/python-with-statement/) de la Real Python. Întoarceți-vă aici după citirea acestuia 🤗
 Ordinea în care instanțiați componentele contează deoarece fiecare element este redat în aplicația web în ordinea în care a fost creat. (Layout-uri mai complexe sunt discutate mai jos)
 
 2. Puteți defini funcții Python obișnuite oriunde în codul dvs. și să le rulați cu intrări de la utilizator folosind `Blocks`. În exemplul nostru, avem o funcție simplă care "inversează" textul de intrare, dar puteți scrie orice funcție Python, de la un calcul simplu la procesarea predicțiilor dintr-un model de machine learning.
diff --git a/chapters/ru/chapter1/3.mdx b/chapters/ru/chapter1/3.mdx
index ed4fb86d4..8b58d7993 100644
--- a/chapters/ru/chapter1/3.mdx
+++ b/chapters/ru/chapter1/3.mdx
@@ -11,11 +11,10 @@
 
 
 
-<Tip>
-👀 Видите кнопку <em>Open in Colab</em> справа сверху? Нажмите на нее, чтобы открыть блокнот в Google Colab с примерами кода этого раздела. Эта кнопка будет во всех разделах, которые содержат примеры кода.
-
-Если вы хотите запускать ноутбуки локально, то обратите внимание на раздел <a href="/course/chapter0">setup</a>.
-</Tip>
+> [!TIP]
+> 👀 Видите кнопку <em>Open in Colab</em> справа сверху? Нажмите на нее, чтобы открыть блокнот в Google Colab с примерами кода этого раздела. Эта кнопка будет во всех разделах, которые содержат примеры кода.
+>
+> Если вы хотите запускать ноутбуки локально, то обратите внимание на раздел <a href="/course/chapter0">setup</a>.
 
 ## Трансформеры повсюду!
 
@@ -25,9 +24,8 @@
 
 Библиотека [🤗 Transformers](https://github.com/huggingface/transformers) предоставляет различную функциональность для создания и использования этих моделей. [Model Hub](https://huggingface.co/models) содержит тысячи предобученных моделей, которые может скачать и использовать любой. Вы также можете загружать свои модели на Model Hub!
 
-<Tip>
-⚠️ Hugging Face Hub не ограничивается только моделями. Любой человек может поделиться своими моделями или датасетами! Для этого нужно создать учетную запись: <a href="https://huggingface.co/join">Create a huggingface.co</a> 
-</Tip>
+> [!TIP]
+> ⚠️ Hugging Face Hub не ограничивается только моделями. Любой человек может поделиться своими моделями или датасетами! Для этого нужно создать учетную запись: <a href="https://huggingface.co/join">Create a huggingface.co</a>
 
 
 Прежде чем погрузиться в механизм работы трансформеров, давайте взглянем на несколько примеров того, как они могут быть использованы для решения задач NLP. 
@@ -106,11 +104,8 @@ classifier(
 
 Этот пайплайн называется _zero-shot_ потому, что вам нет необходимости дообучать модель для использования. Она может вернуть вам оценки вероятности для любого списка меток, который вы хотите!
 
-<Tip>
-
-✏️ **Попробуйте** Поэкспериментируйте с собственными предложениями и метками, и посмотрите как ведет себя модель!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Поэкспериментируйте с собственными предложениями и метками и посмотрите, как ведет себя модель!
 
 
 ## Генерация текста
@@ -134,11 +129,8 @@ generator("In this course, we will teach you how to")
 
 Вы можете указать, сколько разных «ответов» сгенерирует модель, задав параметр `num_return_sequences`. Чтобы изменить длину ответной последовательности, нужно передать значение в аргумент `max_length`.
 
-<Tip>
-
-✏️ **Попробуйте!** Измените `num_return_sequences` и `max_length` так, чтобы были сгенерированы два ответа каждый из которых будет состоять из 15 слов. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Измените `num_return_sequences` и `max_length` так, чтобы были сгенерированы два ответа, каждый из которых будет состоять из 15 слов.
 
 
 ## Использование произвольной модели из Hub в пайплайне
@@ -171,11 +163,8 @@ generator(
 
 Как только вы выберете модель, вы увидите, что есть виджет, позволяющий вам попробовать ее прямо на сайте. Таким образом, вы можете быстро протестировать возможности модели перед ее загрузкой.
 
-<Tip>
-
-✏️ **Попробуйте!** Найдите модель для другого языка и используйте виджет для проверки!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Найдите модель для другого языка и используйте виджет для проверки!
 
 ### The Inference API
 
@@ -210,11 +199,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 Аргумент `top_k` указывает, сколько вариантов для пропущенного слова будет отображено. Обратите внимание, что модель заполнит пропуск на месте слова `<mask>`, которое часто интерпретируют как *mask token (токен-маска)*. Другие модели могут использовать другие токены для обозначения пропуска, всегда лучше проверять это. Один из способов сделать это - обратить внимание на виджет для соответствующей модели. 
 
-<Tip>
-
-✏️ **Попробуйте!** Найдите в поиске модель `bert-based-cased` и обратите внимание на его токен-маску в виджете. Что эта модель предскажет, если применить ее в предыдущем примере?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Найдите в поиске модель `bert-base-cased` и обратите внимание на ее токен-маску в виджете. Что эта модель предскажет, если применить ее в предыдущем примере?
 
 ## Распознавание именованных сущностей (NER)
 
@@ -238,11 +224,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 Мы передали в пайплайн аргумент `grouped_entities=True` для того, чтобы модель сгруппировала части предложения, соответствующие одной сущности: в данном случае модель объединила "Hugging" и "Face" несмотря на то, что название организации состоит из двух слов. На самом деле, как мы увидим в следующей главе, препроцессинг делит даже отдельные слова на несколько частей. Например, `Sylvain` будет разделено на 4 части: `S`, `##yl`, `##va`, and `##in`. На этапе постпроцессинга пайплайн успешно объединит эти части. 
 
-<Tip>
-
-✏️ **Попробуйте!** Найдите на Model Hub модель, позволяющую решать задачу определения частей речи в предложении (part of speech tagging, POS). Что модель предскажет для предложения из примера выше?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Найдите на Model Hub модель, позволяющую решать задачу определения частей речи в предложении (part of speech tagging, POS). Что модель предскажет для предложения из примера выше?
 
 ## Вопросно-ответные системы
 
@@ -329,11 +312,8 @@ translator("Ce cours est produit par Hugging Face.")
 Так же, как и в задачах генерации и автоматического реферирования текста, вы можете указать максимальную длину `max_length` или минимальную длину `min_length` результата. 
 
 
-<Tip>
-
-✏️ **Попробуйте!** Найдите модель, которая переведет предложение из примера выше на другие языки
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Найдите модель, которая переведет предложение из примера выше на другие языки
 
 Показанные пайплайны в основном носят демонстрационный характер, потому что настроены на решение конкретных задач. В следующей главе вы узнаете, как изменить поведение функции `pipeline()`. 
 
diff --git a/chapters/ru/chapter2/1.mdx b/chapters/ru/chapter2/1.mdx
index 94add379c..b1043c6b5 100644
--- a/chapters/ru/chapter2/1.mdx
+++ b/chapters/ru/chapter2/1.mdx
@@ -20,6 +20,5 @@
 
 Затем мы рассмотрим API токенизатора, который является другим основным компонентом функции `pipeline()`. Токенизаторы берут на себя первый и последний шаги препроцессинга, обработку преобразования текста в числовые входы для нейронной сети и обратное преобразование в текст, когда это необходимо. Наконец, мы покажем вам, как обрабатывать несколько предложений, передавая их в модель в подготовленном батче, затем завершим все это более подробным рассмотрением высокоуровневой функции `tokenizer()`.
 
-<Tip>
-⚠️ Чтобы воспользоваться всеми возможностями, доступными в Model Hub и 🤗 Transformers, мы рекомендуем <a href="https://huggingface.co/join">создать учетную запись</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ Чтобы воспользоваться всеми возможностями, доступными в Model Hub и 🤗 Transformers, мы рекомендуем <a href="https://huggingface.co/join">создать учетную запись</a>.
\ No newline at end of file
diff --git a/chapters/ru/chapter2/2.mdx b/chapters/ru/chapter2/2.mdx
index b61f2f25d..941a83622 100644
--- a/chapters/ru/chapter2/2.mdx
+++ b/chapters/ru/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-Это первый раздел, в котором содержание немного отличается в зависимости от того, используете ли вы PyTorch или TensorFlow. Переключите переключатель в верхней части заголовка, чтобы выбрать платформу, которую вы предпочитаете!
-</Tip>
+> [!TIP]
+> Это первый раздел, в котором содержание немного отличается в зависимости от того, используете ли вы PyTorch или TensorFlow. Переключите переключатель в верхней части заголовка, чтобы выбрать платформу, которую вы предпочитаете!
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -346,8 +345,5 @@ model.config.id2label
 
 Мы успешно воспроизвели три этапа конвейера: предобработку с помощью токенизаторов, прохождение входных данных через модель и постобработку! Теперь давайте уделим немного времени тому, чтобы углубиться в каждый из этих этапов.
 
-<Tip>
-
-✏️ **Попробуйте! ** Выберите два (или более) собственных текста и пропустите их через конвейер `sentiment-analysis`. Затем повторите описанные здесь шаги и убедитесь, что вы получили те же результаты!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Выберите два (или более) собственных текста и пропустите их через конвейер `sentiment-analysis`. Затем повторите описанные здесь шаги и убедитесь, что вы получили те же результаты!
diff --git a/chapters/ru/chapter2/4.mdx b/chapters/ru/chapter2/4.mdx
index 45b0bb20b..167cce0d2 100644
--- a/chapters/ru/chapter2/4.mdx
+++ b/chapters/ru/chapter2/4.mdx
@@ -216,11 +216,8 @@ print(ids)
 
 Эти выходы, преобразованные в тензор соответствующего фреймворка, могут быть использованы в качестве входов в модель, как было показано ранее в этой главе.
 
-<Tip>
-
-✏️ **Попробуйте! ** Повторите два последних шага (токенизацию и преобразование во входные идентификаторы) на входных предложениях, которые мы использовали в разделе 2 ("I've been waiting for a HuggingFace course my whole life." и "I hate this so much!"). Убедитесь, что вы получили те же самые входные идентификаторы, которые мы получали ранее!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Повторите два последних шага (токенизацию и преобразование во входные идентификаторы) на входных предложениях, которые мы использовали в разделе 2 ("I've been waiting for a HuggingFace course my whole life." и "I hate this so much!"). Убедитесь, что вы получили те же самые входные идентификаторы, которые мы получали ранее!
 
 ## Декодирование[[decoding]]
 
diff --git a/chapters/ru/chapter2/5.mdx b/chapters/ru/chapter2/5.mdx
index 5dfa9c630..c507c8261 100644
--- a/chapters/ru/chapter2/5.mdx
+++ b/chapters/ru/chapter2/5.mdx
@@ -180,11 +180,8 @@ batched_ids = [ids, ids]
 
 Это батч из двух одинаковых последовательностей!
 
-<Tip>
-
-✏️ **Попробуйте!** Преобразуйте этот список `batched_ids` в тензор и пропустите его через вашу модель. Проверьте, что вы получаете те же логиты, что и раньше (но дважды)!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Преобразуйте этот список `batched_ids` в тензор и пропустите его через вашу модель. Проверьте, что вы получаете те же логиты, что и раньше (но дважды)!
 
 Батчинг позволяет модели работать, когда вы подаете ей несколько последовательностей. Использование нескольких последовательностей так же просто, как и создание батча с одной последовательностью. Однако есть и вторая проблема. Когда вы пытаетесь собрать в батч два (или более) предложения, они могут быть разной длины. Если вы когда-нибудь работали с тензорами, то знаете, что они должны иметь прямоугольную форму, поэтому вы не сможете напрямую преобразовать список входных идентификаторов в тензор. Чтобы обойти эту проблему, мы обычно прибегаем к *дополнению (pad)* входных данных.
 
@@ -316,11 +313,8 @@ tf.Tensor(
 
 Обратите внимание, что последнее значение второй последовательности - это идентификатор дополнения (padding ID), который в маске внимания имеет значение 0.
 
-<Tip>
-
-✏️ **Попробуйте! ** Примените токенизацию вручную к двум предложениям, использованным в разделе 2 ("I've been waiting for a HuggingFace course my whole life." и "I hate this so much!"). Пропустите их через модель и проверьте, что вы получите те же логиты, что и в разделе 2. Теперь объедините их в батч с использованием токена дополнения, а затем создайте соответствующую маску внимания. Проверьте, что при прохождении через модель вы получаете те же результаты!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Примените токенизацию вручную к двум предложениям, использованным в разделе 2 ("I've been waiting for a HuggingFace course my whole life." и "I hate this so much!"). Пропустите их через модель и проверьте, что вы получите те же логиты, что и в разделе 2. Теперь объедините их в батч с использованием токена дополнения, а затем создайте соответствующую маску внимания. Проверьте, что при прохождении через модель вы получаете те же результаты!
 
 ## Более длинные последовательности[[longer-sequences]]
 
diff --git a/chapters/ru/chapter3/2.mdx b/chapters/ru/chapter3/2.mdx
index 9c08acee2..b52ac2f16 100644
--- a/chapters/ru/chapter3/2.mdx
+++ b/chapters/ru/chapter3/2.mdx
@@ -89,9 +89,8 @@ Hub содержит не только модели, там также расп
 
 Библиотека 🤗 Datasets предоставляет возможность использовать очень простую команду для загрузки и кэширования датасета с Hub. Мы можем загрузить датасет следующим образом: 
 
-<Tip>
-⚠️ **Предупреждение** Убедитесь, что `datasets` установлены, выполнив `pip install datasets`. Затем загрузите набор данных MRPC и выведите его, чтобы увидеть, что он содержит.
-</Tip>
+> [!TIP]
+> ⚠️ **Предупреждение** Убедитесь, что `datasets` установлены, выполнив `pip install datasets`. Затем загрузите набор данных MRPC и выведите его, чтобы увидеть, что он содержит.
 
 ```py
 from datasets import load_dataset
@@ -150,11 +149,8 @@ raw_train_dataset.features
 
 Переменная `label` типа `ClassLabel` соответствует именам в *names*. `0` соответствует `not_equivalent`, `1` соответствует `equivalent`. 
 
-<Tip>
-
-✏️ **Попробуйте!** Посмотрите на 15-й элемент обучающей выборки и на 87-й элемент вадидационной выборки. Какие у них лейблы?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Посмотрите на 15-й элемент обучающей выборки и на 87-й элемент валидационной выборки. Какие у них лейблы?
 
 ### Предобработка датасета
 
@@ -192,11 +188,8 @@ inputs
 
 Мы уже обсуждали ключи `input_ids` и `attention_mask` в [главе 2](../chapter2/1), но не упоминали о `token_type_ids`. В этом примере мы указываем модели какая часть входных данных является первым предложением, а какая вторым. 
 
-<Tip>
-
-✏️ **Попробуйте!** Токенизируйте 15-й элемент обучающей выборки как два предложения, и как пару предложений. В чем разница между двумя результатами?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Токенизируйте 15-й элемент обучающей выборки как два предложения, и как пару предложений. В чем разница между двумя результатами?
 
 Если мы декодируем ID из `input_ids` обратно в слова: 
 
@@ -353,11 +346,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ **Попробуйте!** Повторите этап препроцессинга для набора данных GLUE SST-2. Он немного отличается, так как состоит из отдельных предложений, а не пар, но в остальном это то же самое, что мы сделали. Для более сложной задачи попробуйте написать функцию предварительной обработки, которая работает с любой из задач GLUE.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Повторите этап препроцессинга для набора данных GLUE SST-2. Он немного отличается, так как состоит из отдельных предложений, а не пар, но в остальном это то же самое, что мы сделали. Для более сложной задачи попробуйте написать функцию предварительной обработки, которая работает с любой из задач GLUE.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/ru/chapter3/3.mdx b/chapters/ru/chapter3/3.mdx
index 4f00a73b3..38ec64670 100644
--- a/chapters/ru/chapter3/3.mdx
+++ b/chapters/ru/chapter3/3.mdx
@@ -43,11 +43,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Если вы хотите автоматически загружать модель на Hub во время обучения, передайте аргумент `push_to_hub=True` в `TrainingArguments`. Мы больше узнаем об этом в [главе 4](../chapter4/3). 
-
-</Tip>
+> [!TIP]
+> 💡 Если вы хотите автоматически загружать модель на Hub во время обучения, передайте аргумент `push_to_hub=True` в `TrainingArguments`. Мы больше узнаем об этом в [главе 4](../chapter4/3).
 
 Второй шаг – задание модели. Так же, как и в [предыдущей главе](../chapter2/1), мы будем использовать класс `AutoModelForSequenceClassification` с двумя лейблами: 
 
@@ -167,9 +164,6 @@ trainer.train()
 
 На этом введение в fine-tuning с использованием API `Trainer` подошло к концу. Пример того, как сделать это же для наиболее распространенных задач  NLP мы рассмотрим в Главе 7, а сейчас взглянем на то, как реализовать то же самое на чистом PyTorch. 
 
-<Tip>
-
-✏️ **Попробуйте!** Произведите fine-tuning модели на датасете GLUE SST-2 с использованием препроцессинга из раздела 2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Произведите fine-tuning модели на датасете GLUE SST-2 с использованием препроцессинга из раздела 2.
 
diff --git a/chapters/ru/chapter3/3_tf.mdx b/chapters/ru/chapter3/3_tf.mdx
index bbe24dee5..06a49c730 100644
--- a/chapters/ru/chapter3/3_tf.mdx
+++ b/chapters/ru/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_lab
 
 Чтобы точно настроить модель в нашем наборе данных, нам просто нужно вызвать `compile()` у нашей модели, а затем передать наши данные в метод `fit()`. Это запустит процесс fine tuning (который должен занять пару минут на графическом процессоре) и сообщит о значениях функции потерь при обучении, а также о значениях функции потерь на валидации.
 
-<Tip>
-
-Обратите внимание, что у моделей 🤗 Transformers есть особая способность, которой нет у большинства моделей Keras — они могут автоматически использовать соответствующие функции потерь. Они будут использовать эти потерю по умолчанию, если вы не установите аргумент `loss` в `compile()`. Обратите внимание, что для использования внутренней функции вам нужно будет передать свои метки классов как часть обучающих данных, а не как отдельную метку, что является обычным способом использования меток с моделями Keras. Вы увидите примеры этого во второй части курса, где определение правильной функции потерь может быть сложным. Однако для классификации последовательностей стандартная функция потерь Keras отлично работает, поэтому мы будем использовать ее здесь.
-
-</Tip>
+> [!TIP]
+> Обратите внимание, что у моделей 🤗 Transformers есть особая способность, которой нет у большинства моделей Keras — они могут автоматически использовать соответствующие функции потерь. Они будут использовать эту функцию потерь по умолчанию, если вы не установите аргумент `loss` в `compile()`. Обратите внимание, что для использования внутренней функции вам нужно будет передать свои метки классов как часть обучающих данных, а не как отдельную метку, что является обычным способом использования меток с моделями Keras. Вы увидите примеры этого во второй части курса, где определение правильной функции потерь может быть сложным. Однако для классификации последовательностей стандартная функция потерь Keras отлично работает, поэтому мы будем использовать ее здесь.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Обратите внимание на очень распространенную ошибку — в Keras функцию потерь можно задать просто текстовым значением, но по умолчанию Keras будет считать, что вы уже применили softmax к своим выходам. Однако многие модели выводят значения непосредственно перед применением softmax, так называемые *логиты*. Нам нужно указать это в функции потерь, а единственный способ сделать это — вызвать ее напрямую, а не по имени в виде строки.
-
-</Tip>
+> [!WARNING]
+> Обратите внимание на очень распространенную ошибку — в Keras функцию потерь можно задать просто текстовым значением, но по умолчанию Keras будет считать, что вы уже применили softmax к своим выходам. Однако многие модели выводят значения непосредственно перед применением softmax, так называемые *логиты*. Нам нужно указать это в функции потерь, а единственный способ сделать это — вызвать ее напрямую, а не по имени в виде строки.
 
 
 ### Повышение производительности обучения
@@ -123,11 +117,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-В библиотеке 🤗 Transformers также есть функция `create_optimizer()`, которая создаст оптимизатор `AdamW` с уменьшением скорости обучения. Это удобный способ, с которым вы подробно познакомитесь в следующих разделах курса.
-
-</Tip>
+> [!TIP]
+> В библиотеке 🤗 Transformers также есть функция `create_optimizer()`, которая создаст оптимизатор `AdamW` с уменьшением скорости обучения. Это удобный способ, с которым вы подробно познакомитесь в следующих разделах курса.
 
 Теперь у нас есть новый оптимизатор, и мы можем попробовать обучить модель с помощью него. Во-первых, давайте перезагрузим модель, чтобы сбросить изменения весов из тренировочного прогона, который мы только что сделали, а затем мы можем скомпилировать ее с помощью нового оптимизатора:
 
@@ -145,11 +136,8 @@ model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Если вы хотите автоматически загружать свою модель на Hub во время обучения, вы можете передать `PushToHubCallback` в метод `model.fit()`. Мы узнаем об этом больше в [Главе 4](../chapter4/3). 
-
-</Tip>
+> [!TIP]
+> 💡 Если вы хотите автоматически загружать свою модель на Hub во время обучения, вы можете передать `PushToHubCallback` в метод `model.fit()`. Мы узнаем об этом больше в [Главе 4](../chapter4/3).
 
 ### Применение модели для классификации
 
diff --git a/chapters/ru/chapter3/4.mdx b/chapters/ru/chapter3/4.mdx
index 87f27fa40..64002b330 100644
--- a/chapters/ru/chapter3/4.mdx
+++ b/chapters/ru/chapter3/4.mdx
@@ -197,11 +197,8 @@ metric.compute()
 
 Повторим: результаты, которые получите вы, могут немного отличаться из-за наличия случайностей при инициализации параметров слоя модели и из-за случайного перемешивания датасета, однако их порядок должен совпадать.
 
-<Tip>
-
-✏️ **Попробуйте!** Измените обучающий цикл так, чтобы дообучить модель на датасете SST-2. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Измените обучающий цикл так, чтобы дообучить модель на датасете SST-2.
 
 ### Ускорение обучающего цикла с помощью 🤗 Accelerate
 
@@ -294,9 +291,8 @@ for epoch in range(num_epochs):
 
 Далее главная часть работы выполняется в строке, которая отправляет данные, модель и оптимизатор на `accelerator.prepare()`. Этот метод «обернет» ваши объекты в контейнер и убедится, что распределенное обучение выполняется корректно. Оставшиеся изменения – удаление строки, которая отправляет батч на `device` (повторим: если вы хотите оставить эту строку, замените `device` на `accelerator.device`) и замените `loss.backward()` на `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ Чтобы воспользоваться ускорением, предлагаемым облачными TPU, мы рекомендуем дополнять данные до фиксированной длины с помощью аргументов `padding="max_length"` и `max_length` токенизатора.
-</Tip>
+> [!TIP]
+> ⚠️ Чтобы воспользоваться ускорением, предлагаемым облачными TPU, мы рекомендуем дополнять данные до фиксированной длины с помощью аргументов `padding="max_length"` и `max_length` токенизатора.
 
 Если вы хотите скопировать и запустить этот код, это полная версия с использованием 🤗 Accelerate:
 
diff --git a/chapters/ru/chapter4/2.mdx b/chapters/ru/chapter4/2.mdx
index b08768ddd..ab576467a 100644
--- a/chapters/ru/chapter4/2.mdx
+++ b/chapters/ru/chapter4/2.mdx
@@ -91,7 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-
-При использовании предварительно обученной модели обязательно проверьте: как она была обучена, на каких наборах данных, ее ограничениях и смещениях. Вся эта информация должна быть указана в карточке модели.
-</Tip>
+> [!TIP]
+> При использовании предварительно обученной модели обязательно проверьте, как она была обучена и на каких наборах данных, а также каковы ее ограничения и смещения. Вся эта информация должна быть указана в карточке модели.
diff --git a/chapters/ru/chapter4/3.mdx b/chapters/ru/chapter4/3.mdx
index 3a7fbf507..8ccc057bc 100644
--- a/chapters/ru/chapter4/3.mdx
+++ b/chapters/ru/chapter4/3.mdx
@@ -172,9 +172,8 @@ tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token=
 </div>
 {/if}
 
-<Tip>
-✏️ **Попробуйте!** Используйте модель и токенайзер чекпоинта `bert-base-cased` и загрузите их в собственный профиль с помощью метода `push_to_hub()`. Проверьте, что репозиторий корректно создался перед его удалением. 
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Используйте модель и токенайзер чекпоинта `bert-base-cased` и загрузите их в собственный профиль с помощью метода `push_to_hub()`. Проверьте, что репозиторий корректно создался перед его удалением.
 
 Как мы увидели выше, метод `push_to_hub()` поддерживает несколько аргументов, позволяющих загрузить данные в конкретный профиль или профиль организации или использовать конкретный токен. Мы рекомендуем обратить внимание на спецификацию метода, доступную по ссылке [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html), и ознакомиться с остальными возможностями метода. 
 
@@ -458,9 +457,8 @@ config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_mod
 
 {/if}
 
-<Tip>
-✏️ При создании репозитория с помощью веб-интерфейса, файл *.gitattributes* автоматически фиксирует файлы с определенными расширениями (*.bin* и *.h5*) как большие файлы, и git-lfs отследит их без необходимости делать это вручную. 
-</Tip> 
+> [!TIP]
+> ✏️ При создании репозитория с помощью веб-интерфейса, файл *.gitattributes* автоматически фиксирует файлы с определенными расширениями (*.bin* и *.h5*) как большие файлы, и git-lfs отследит их без необходимости делать это вручную. 
 
 Теперь мы можем продолжить и продолжить, как обычно делаем с традиционными репозиториями Git. Мы можем добавить все файлы в промежуточную среду Git с помощью команды `git add`:
 
diff --git a/chapters/ru/chapter5/2.mdx b/chapters/ru/chapter5/2.mdx
index b4a9c9ab5..b97c3c1f7 100644
--- a/chapters/ru/chapter5/2.mdx
+++ b/chapters/ru/chapter5/2.mdx
@@ -46,9 +46,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 ```
 После выполнения команд мы увидим, что архивы будут заменены файлами _SQuAD_it-train.json_ и _SQuAD_it-text.json_ в формате JSON. 
 
-<Tip>
-✎ Причина, по которой в примере выше перед командами расположен `!` заключается в том, что мы выполняем их в Jupyter notebook. Если вы хотите запустить эти команды в терминале – просто удалите `!`. 
-</Tip>
+> [!TIP]
+> ✎ Причина, по которой в примере выше перед командами расположен `!`, заключается в том, что мы выполняем их в Jupyter notebook. Если вы хотите запустить эти команды в терминале – просто удалите `!`.
 
 Для загрузки JSON файла с помощью функции `load_dataset()` необходимо знать, с каким типом JSON-файла мы имеем дело: обычный JSON (похожий на вложенный словарь) или JSON, сформированный построчно. Как и многие датасеты для задач question-answering, SQuAD-it использует формат обычного JSON'а с текстом, хранящимся в поле `data`. Это означает, что мы можем подгрузить датасет, задав аргумент `field` следующим образом: 
 
@@ -121,11 +120,8 @@ DatasetDict({
 
 Это ровно то, чего мы хотели добиться! Далее мы можем применять различные приемы для препроцессинга данных: очистку, токенизацию  и прочее. 
 
-<Tip>
-
-Аргумент `data_files` функции `load_dataset()` очень гибкий и может являться путем к файлу, списком путей файлов или словарем, в котором указаны названия сплитов (обучающего и тестового) и пути к соответствующим файлам. Вы также можете найти все подходящие файлы в директории с использованием маски по правилам Unix-консоли (т.е. указать путь к директории и указать `data_files="*.json"` для конкретного сплита). Более подробно это изложено в [документации](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets. 
-
-</Tip>
+> [!TIP]
+> Аргумент `data_files` функции `load_dataset()` очень гибкий и может являться путем к файлу, списком путей файлов или словарем, в котором указаны названия сплитов (обучающего и тестового) и пути к соответствующим файлам. Вы также можете найти все подходящие файлы в директории с использованием маски по правилам Unix-консоли (т.е. указать путь к директории и указать `data_files="*.json"` для конкретного сплита). Более подробно это изложено в [документации](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets.
 
 Скрипты загрузки 🤗 Datasets также поддерживают автоматическую распаковку входных файлов, поэтому мы можем пропустить команду `gzip` просто передав в аргумент `data_files` пути к архивам: 
 
@@ -154,10 +150,7 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 Эта операция вернет такой же `DatasetDict`, какой мы получали ранее, но избавит нас от загрузки и разархивирования файлов _SQuAD_it-*.json.gz_ вручную. 
 На этом мы завершаем наш обзор различных способов загрузки датасетов, которые не размещены на Hugging Face Hub. Теперь, когда у нас есть датасет, с которым можно поиграться, давайте погрузимся в различные методы обработки данных!
 
-<Tip>
-
-✏️ **Попробуйте!** Выберите другой датасет, расположенный на GitHub или в архиве [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) и попробуйте загрузить его с локальной машины и с удаленного сервера. В качестве бонуса попробуйте загрузить датасет в формате CSV или обычного тектового файла (см. детали по поддерживаемым форматам в [документации](https://huggingface.co/docs/datasets/loading#local-and-remote-files)). 
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Выберите другой датасет, расположенный на GitHub или в архиве [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) и попробуйте загрузить его с локальной машины и с удаленного сервера. В качестве бонуса попробуйте загрузить датасет в формате CSV или обычного текстового файла (см. детали по поддерживаемым форматам в [документации](https://huggingface.co/docs/datasets/loading#local-and-remote-files)).
 
 
diff --git a/chapters/ru/chapter5/3.mdx b/chapters/ru/chapter5/3.mdx
index e9f0699b8..0299d7c13 100644
--- a/chapters/ru/chapter5/3.mdx
+++ b/chapters/ru/chapter5/3.mdx
@@ -89,11 +89,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Попробуйте!** Используйте функцию `Dataset.unique()` для поиска числа уникальных лекарств и состояний пациентов в обучающем и тестовом сплитах.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Используйте функцию `Dataset.unique()` для поиска числа уникальных лекарств и состояний пациентов в обучающем и тестовом сплитах.
 
 Далее нормализуем все лейблы столбца `condition` с применением `Dataset.map()`. Так же, как мы делали токенизацию в [главе 3](../chapter3/1), мы можем определить простую функцию, которая будет применения для всех строк каждого сплита в `drug_dataset`:
 
@@ -217,11 +214,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Как и ожидалось, некоторые отзывы содержат одно слово, хотя это и может быть допустимо для задачи оценки тональности текста, вряд ли будет полезно если мы хотим предсказывать состояние пациента. 
 
-<Tip>
-
-🙋 Альтернативный вариант добавления нового столбца в датасет – использовать функцию `Dataset.add_column()`. Она позволяет создать новый столбец из Python-списка или NumPy-массива, что может быть удобно, если функция `Dataset.map()` не очень подходит для вашего случая.
-
-</Tip>
+> [!TIP]
+> 🙋 Альтернативный вариант добавления нового столбца в датасет – использовать функцию `Dataset.add_column()`. Она позволяет создать новый столбец из Python-списка или NumPy-массива, что может быть удобно, если функция `Dataset.map()` не очень подходит для вашего случая.
 
 Давайте применим функцию `Dataset.filter()` для удаления отзывов, содержащих меньше 30 слов. Схожим образом мы применяли её для столбца  `condition`: мы можем отфильтровать отзывы, в которых число слов меньше порога:
 
@@ -236,11 +230,8 @@ print(drug_dataset.num_rows)
 
 Как вы можете увидеть, эта функция удалила около 15% отзывов из наших исходных обучающих и тестовых наборов данных. 
 
-<Tip>
-
-✏️ **Попробуйте!** Используйте функцию `Dataset.sort()` для проверки наиболее длинных отзывов. Изучите [документацию](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) чтобы понять, какой аргумент нужно передать в функцию, чтобы сортировка произошла в убывающем порядке.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Используйте функцию `Dataset.sort()` для проверки наиболее длинных отзывов. Изучите [документацию](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort), чтобы понять, какой аргумент нужно передать в функцию для сортировки в порядке убывания.
 
 Последняя вещь, которую нам необходимо сделать, это справиться с присутствием HTML-кодами символов в наших отзывах. Мы можем использовать модуль `html` и метод `unescape()` чтобы избавиться от них: 
 
@@ -297,11 +288,8 @@ def tokenize_function(examples):
 
 Также присутствует возможность измерить время выполнения всей ячейки: нужно заменить `%time` на `%%time` в начале ячейки. На нашем оборудовании это заняло 10.8 секунд. Это значение расположено после слов "Wall time".
 
-<Tip>
-
-✏️ **Попробуйте!** Выполните эту же инструкцию с и без параметра `batched=True`, затем попробуйте сделать это с "медленным" токенизатором (добавьте `use_fast=False` в метод `AutoTokenizer.from_pretrained()`) и посмотрите, какие значения вы получите на своем оборудовании.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Выполните эту же инструкцию с параметром `batched=True` и без него, затем попробуйте сделать это с "медленным" токенизатором (добавьте `use_fast=False` в вызов `AutoTokenizer.from_pretrained()`) и посмотрите, какие значения вы получите на своем оборудовании.
 
 Вот результаты, которые мы получили без и с применением батчинга, и двумя разными по скорости токенизаторами: 
 
@@ -338,19 +326,13 @@ Options         | Fast tokenizer | Slow tokenizer
 
 Это гораздо более разумные результаты для "медленного" токенизатора, но производительность быстрого токенизатора также существенно выросла. Однако, обратите внимание, что это не всегда так — для значений `num_proc`, отличных от 8, наши тесты показали, что быстрее использовать `batched=True` без этой опции. Как правило, мы не рекомендуем использовать мультипроцессинг Python для "быстрых" токенизаторов с параметром `batched=True`.
 
-<Tip>
-
-Использование `num_proc` для ускорения обработки обычно отличная идея, но только в тех случаях, когда функция сама по себе не производит никакой параллелизации. 
-
-</Tip>
+> [!TIP]
+> Использование `num_proc` для ускорения обработки обычно отличная идея, но только в тех случаях, когда функция сама по себе не производит никакой параллелизации.
 
 Объединение всей этой функциональности во всего лишь один метод само по себе прекрасно, но это еще не все! Используя `Dataset.map()` и `batched=True` вы можете поменять число элементов в датасете. Это очень полезно во множестве ситуаций, например, когда вы хотите создать несколько обучающих признаков из одного экземпляра текста. Мы воспользуеся этой возможностью на этапе препроцессинга для нескольких NLP-задач, которые рассмотрим в [главе 7](../chapter7)
 
-<Tip>
-
-💡 В машинном обучении экземпляром (объектом, элементом выборки) является множество _признаков_, которые мы должны подать на вход модели. В некоторых контекстах это множество признаков будет множеством колонок в `Dataset`, а в других (как в текущем примере или в задачах ответов на вопросы) признаки будут софрмированы из одного столбца. 
-
-</Tip>
+> [!TIP]
+> 💡 В машинном обучении экземпляром (объектом, элементом выборки) является множество _признаков_, которые мы должны подать на вход модели. В некоторых контекстах это множество признаков будет множеством колонок в `Dataset`, а в других (как в текущем примере или в задачах ответов на вопросы) признаки будут сформированы из одного столбца.
 
 Давайте посмотрим как это работает! В этом примере мы токенизируем наши тексты и обрежем их до максимальной длины в 128, однако мы попросим токенизатор вернуть нам *все* получившиеся токены, а не только начальные. Это может быть сделано с помощью параметра `return_overflowing_tokens=True`: 
 
@@ -522,11 +504,8 @@ drug_dataset["train"][:3]
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 Внутри `Dataset.set_format()` изменяет формат, возвращаемый методом `__getitem__()`. Это означает, что когда мы хотим создать новый объект, например, `train_df`, из `Dataset`, формата `"pandas"`, мы должны сделать slice всего датасета и получить `pandas.DataFrame`. Вы можете проверить, что тип `drug_dataset["train"]` – формата `Dataset`, несмотря на выходной формат (который станет `pandas.DataFrame`). 
-
-</Tip>
+> [!TIP]
+> 🚨 Под капотом `Dataset.set_format()` изменяет формат, возвращаемый методом `__getitem__()`. Это означает, что когда мы хотим создать новый объект, например `train_df`, из `Dataset` в формате `"pandas"`, мы должны взять срез всего датасета, чтобы получить `pandas.DataFrame`. Вы можете проверить, что `drug_dataset["train"]` по-прежнему имеет тип `Dataset`, независимо от выходного формата.
 
 Начиная с этого момента мы можем использовать всю функциональность Pandas. Например, мы можем иначе посчитать расределение `condition` среди нашей выборки: 
 
@@ -595,11 +574,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **Попробуйте!** Вычислите средний рейтинг по подному лекарству и сохраните результат в новом датасете типа `Dataset`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Вычислите средний рейтинг по каждому лекарству и сохраните результат в новом датасете типа `Dataset`.
 
 На этом мы заканчиваем наш обзор различных техник препроцессинга, доступных в 🤗 Datasets. Чтобы завершить этот раздел, давайте создадим валидационную часть выборки. Прежде, чем сделать это, мы сбросим формат `drug_dataset` обратно к `"arrow"`: 
 
diff --git a/chapters/ru/chapter5/4.mdx b/chapters/ru/chapter5/4.mdx
index c62459ec4..db5b5cb70 100644
--- a/chapters/ru/chapter5/4.mdx
+++ b/chapters/ru/chapter5/4.mdx
@@ -43,11 +43,8 @@ Dataset({
 
 Мы видим, что в нашем наборе данных 15 518 009 строк и 2 столбца — это очень много!
 
-<Tip>
-
-✎ По умолчанию 🤗 Datasets распаковывает файлы, необходимые для загрузки набора данных. Если вы хотите сохранить место на жестком диске, вы можете передать `DownloadConfig(delete_extracted=True)` в аргумент `download_config` функции `load_dataset()`. Дополнительные сведения см. в [документации](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig).
-
-</Tip>
+> [!TIP]
+> ✎ По умолчанию 🤗 Datasets распаковывает файлы, необходимые для загрузки набора данных. Если вы хотите сэкономить место на жестком диске, вы можете передать `DownloadConfig(delete_extracted=True)` в аргумент `download_config` функции `load_dataset()`. Дополнительные сведения см. в [документации](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig).
 
 Давайте посмотрим на содержимое первого экземпляра: 
 
@@ -98,11 +95,8 @@ Dataset size (cache file) : 19.54 GB
 
 Приятно — несмотря на то, что он весит почти 20 ГБ, мы можем загрузить и получить доступ к набору данных с гораздо меньшим объемом оперативной памяти!
 
-<Tip>
-
-✏️ **Попробуйте!** Выберите один из [компонентов](https://mystic.the-eye.eu/public/AI/pile_preliminary_components/) из Pile, который больше, чем оперативная память вашего ноутбука или настольного компьютера, загрузите его с 🤗 Datasets и измерьте объем используемой оперативной памяти. Обратите внимание, что для получения точных измерений вам потребуется сделать это в новом процессе. Вы можете найти распакованные размеры каждого компонента в Таблице 1 [документации Pile] (https://arxiv.org/abs/2101.00027).
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Выберите один из [компонентов](https://mystic.the-eye.eu/public/AI/pile_preliminary_components/) из Pile, который больше, чем оперативная память вашего ноутбука или настольного компьютера, загрузите его с помощью 🤗 Datasets и измерьте объем используемой оперативной памяти. Обратите внимание, что для получения точных измерений вам потребуется сделать это в новом процессе. Вы можете найти распакованные размеры каждого компонента в Таблице 1 [документации Pile](https://arxiv.org/abs/2101.00027).
 
 Если вы знакомы с Pandas, этот результат может стать неожиданностью из-за знаменитого [эмпирического правила] Уэса Кинни (https://wesmckinney.com/blog/apache-arrow-pandas-internals/), согласно которому вам обычно требуется 5 до 10 раз больше оперативной памяти, чем размер вашего набора данных. Так как же 🤗 Datasets решают эту проблему управления памятью? 🤗 Datasets рассматривают каждый набор данных как [файл с отображением в память] (https://en.wikipedia.org/wiki/Memory-mapped_file), который обеспечивает сопоставление между оперативной памятью и хранилищем файловой системы, что позволяет библиотеке получать доступ к элементам и работать с ними без необходимости полной загрузки его в память.
 
@@ -130,11 +124,8 @@ print(
 
 Здесь мы использовали модуль `timeit` Python для измерения времени выполнения `code_snippet`. Обычно вы сможете перебирать набор данных со скоростью от нескольких десятых долей ГБ/с до нескольких ГБ/с. Это прекрасно работает для подавляющего большинства приложений, но иногда вам придется работать с набором данных, который слишком велик даже для хранения на жестком диске вашего ноутбука. Например, если бы мы попытались загрузить весь Pile, нам потребовалось бы 825 ГБ свободного места на диске! Чтобы справиться с такими случаями 🤗 Datasets предоставляют функцию потоковой передачи, которая позволяет нам загружать и получать доступ к элементам на лету, без необходимости загружать весь набор данных. Давайте посмотрим, как это работает.
 
-<Tip>
-
-💡 В Jupyter notebooks вы также можете измерить время исполнения ячейки с использованием  [`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 В Jupyter notebooks вы также можете измерить время исполнения ячейки с использованием  [`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Потоковая передача датасета
 
@@ -171,11 +162,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Чтобы ускорить токенизацию с потоковой передачей, вы можете передать `batched=True`, как мы делали в последнем разделе. Он будет обрабатывать примеры батчами; размер батча по умолчанию составляет 1000 и может быть указан в аргументе `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Чтобы ускорить токенизацию с потоковой передачей, вы можете передать `batched=True`, как мы делали в предыдущем разделе. В этом случае примеры будут обрабатываться батчами; размер батча по умолчанию составляет 1000 и может быть указан в аргументе `batch_size`.
 
 Вы также можете перемешать потоковые наборы данных, используя `IterableDataset.shuffle()`, но в отличие от `Dataset.shuffle()`, это только перемешивает элементы в предопределенном `buffer_size`:
 
@@ -276,10 +264,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Попробуйте!** Используйте один из больших корпусов Common Crawl, например [`mc4`](https://huggingface.co/datasets/mc4) или [`oscar`](https://huggingface.co/ datasets/oscar) для создания потокового многоязычного набора данных, который представляет пропорции разговорных языков в стране по вашему выбору. Например, в Швейцарии есть четыре национальных языка: немецкий, французский, итальянский и рето-романский, поэтому вы можете попробовать создать швейцарский корпус, выбрав подмножества Оскаров в соответствии с их разговорной пропорцией.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Используйте один из больших корпусов Common Crawl, например [`mc4`](https://huggingface.co/datasets/mc4) или [`oscar`](https://huggingface.co/datasets/oscar), для создания потокового многоязычного набора данных, который отражает пропорции разговорных языков в стране по вашему выбору. Например, в Швейцарии есть четыре национальных языка: немецкий, французский, итальянский и рето-романский, поэтому вы можете попробовать создать швейцарский корпус, выбрав подмножества OSCAR в соответствии с долей носителей каждого языка.
 
 Теперь у вас есть все инструменты, необходимые для загрузки и обработки наборов данных всех форм и размеров, но, если только вам не повезет, в вашем путешествии по НЛП наступит момент, когда вам придется фактически создать собственный набор данных для решения проблемы. Это тема следующего раздела!
diff --git a/chapters/ru/chapter5/6.mdx b/chapters/ru/chapter5/6.mdx
index 117b06c60..d1a98c017 100644
--- a/chapters/ru/chapter5/6.mdx
+++ b/chapters/ru/chapter5/6.mdx
@@ -188,11 +188,8 @@ Dataset({
 Хорошо, это дало нам несколько тысяч комментариев для работы!
 
 
-<Tip>
-
-✏️ **Попробуйте!** Посмотрите, сможете ли вы использовать `Dataset.map()`, чтобы развернуть столбец `comments` столбца `issues_dataset` _без_ использования Pandas. Это немного сложно; вы можете найти раздел ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) документации 🤗 Datasets, полезным для этой задачи.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Посмотрите, сможете ли вы использовать `Dataset.map()`, чтобы развернуть столбец `comments` датасета `issues_dataset` _без_ использования Pandas. Это немного сложно; для этой задачи вам может пригодиться раздел ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) документации 🤗 Datasets.
 
 Теперь, когда у нас есть один комментарий в строке, давайте создадим новый столбец `comments_length`, содержащий количество слов в комментарии:
 
@@ -519,8 +516,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Неплохо! Наше второе обращение, кажется, соответствует запросу.
 
-<Tip>
-
-✏️ **Попробуйте!** Создайте свой собственный запрос и посмотрите, сможете ли вы найти ответ в найденных документах. Возможно, вам придется увеличить параметр `k` в `Dataset.get_nearest_examples()`, чтобы расширить поиск.
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️ **Попробуйте!** Создайте свой собственный запрос и посмотрите, сможете ли вы найти ответ в найденных документах. Возможно, вам придется увеличить параметр `k` в `Dataset.get_nearest_examples()`, чтобы расширить поиск.
\ No newline at end of file
diff --git a/chapters/ru/chapter6/2.mdx b/chapters/ru/chapter6/2.mdx
index 089e23962..8d59cb8af 100644
--- a/chapters/ru/chapter6/2.mdx
+++ b/chapters/ru/chapter6/2.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ Обучение токенизатора - это не то же самое, что обучение модели! При обучении модели используется стохастический градиентный спуск, чтобы сделать потери немного меньше для каждого батча. Оно рандомизировано по своей природе (это означает, что вам нужно задать некоторое число seed, чтобы получить одинаковые результаты при повторном обучении). Обучение токенизатора - это статистический процесс, который пытается определить, какие подслова лучше всего выбрать для данного корпуса, а точные правила, используемые для их выбора, зависят от алгоритма токенизации. Это детерминированный процесс, то есть вы всегда получите одинаковые результаты при обучении одного и того же алгоритма на одном и том же корпусе.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Обучение токенизатора - это не то же самое, что обучение модели! При обучении модели используется стохастический градиентный спуск, чтобы сделать потери немного меньше для каждого батча. Оно рандомизировано по своей природе (это означает, что вам нужно задать некоторое число seed, чтобы получить одинаковые результаты при повторном обучении). Обучение токенизатора - это статистический процесс, который пытается определить, какие подслова лучше всего выбрать для данного корпуса, а точные правила, используемые для их выбора, зависят от алгоритма токенизации. Это детерминированный процесс, то есть вы всегда получите одинаковые результаты при обучении одного и того же алгоритма на одном и том же корпусе.
 
 ## Сбор корпуса слов[[assembling-a-corpus]]
 
diff --git a/chapters/ru/chapter6/3.mdx b/chapters/ru/chapter6/3.mdx
index 214e2ba13..0e8f9078b 100644
--- a/chapters/ru/chapter6/3.mdx
+++ b/chapters/ru/chapter6/3.mdx
@@ -33,11 +33,8 @@
 `batched=True`  | 10.8s                  | 4min41s
 `batched=False` | 59.2s                  | 5min3s
 
-<Tip warning={true}>
-
-⚠️ Когда вы токенизируете одно предложение, вы не всегда увидите разницу в скорости между медленной и быстрой версиями одного и того же токенизатора. Более того, быстрая версия может быть даже медленнее! Только при параллельной токенизации большого количества текстов вы сможете увидеть разницу.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Когда вы токенизируете одно предложение, вы не всегда увидите разницу в скорости между медленной и быстрой версиями одного и того же токенизатора. Более того, быстрая версия может быть даже медленнее! Только при параллельной токенизации большого количества текстов вы сможете увидеть разницу.
 
 ## Batch encoding[[batch-encoding]]
 
@@ -108,13 +105,10 @@ encoding.word_ids()
 
 Мы можем видеть, что специальные токены токенизатора `[CLS]` и `[SEP]` сопоставляются с `None`, а затем каждый токен сопоставляется со словом, от которого он происходит. Это особенно полезно для определения того, находится ли токен в начале слова или два токена в одном и том же слове. Для этого мы могли бы использовать префикс `##`, но он работает только для токенизаторов типа BERT; этот метод работает для любого типа токенизаторов, лишь бы он был быстрым. В следующей главе мы увидим, как можно использовать эту возможность для применения меток, которые мы имеем для каждого слова, к токенам в таких задачах, как распознавание именованных сущностей (NER) и тегирование частей речи (part-of-speech - POS). Мы также можем использовать ее для маскирования всех токенов, происходящих от одного и того же слова, при моделировании языка по маске (masked language modeling) (эта техника называется _маскированием всего слова (whole word masking)_).
 
-<Tip>
-
-Понятие "слово" очень сложное. Например, "I'll" (сокращение от "I will") считается одним или двумя словами? На самом деле это зависит от токенизатора и применяемой им операции предварительной токенизации. Некоторые токенизаторы просто разделяют пробелы, поэтому они будут считать это одним словом. Другие используют пунктуацию поверх пробелов, поэтому будут считать это двумя словами.
-
-✏️ **Попробуйте!** Создайте токенизатор из контрольных точек `bert-base-cased` и `roberta-base` и токенизируйте с их помощью "81s". Что вы заметили? Каковы идентификаторы слов?
-
-</Tip>
+> [!TIP]
+> Понятие "слово" очень сложное. Например, "I'll" (сокращение от "I will") считается одним или двумя словами? На самом деле это зависит от токенизатора и применяемой им операции предварительной токенизации. Некоторые токенизаторы просто разделяют текст по пробелам, поэтому они будут считать это одним словом. Другие помимо пробелов учитывают пунктуацию, поэтому будут считать это двумя словами.
+>
+> ✏️ **Попробуйте!** Создайте токенизатор из контрольных точек `bert-base-cased` и `roberta-base` и токенизируйте с их помощью "81s". Что вы заметили? Каковы идентификаторы слов?
 
 Аналогично, существует метод `sentence_ids()`, который мы можем использовать для сопоставления токена с предложением, из которого оно взято (хотя в этом случае ту же информацию может дать и `token_type_ids`, возвращаемый токенизатором).
 
@@ -131,11 +125,8 @@ Sylvain
 
 Как мы уже говорили, все это происходит благодаря тому, что быстрый токенизатор отслеживает, из какого участка текста происходит каждый токен, в списке *смещений (offsets)*. Чтобы проиллюстрировать их использование, далее мы покажем, как воспроизвести результаты конвейера `token-classification` вручную.
 
-<Tip>
-
-✏️ **Попробуйте!** Создайте свой собственный пример текста и посмотрите, сможете ли вы понять, какие токены связаны с идентификаторами слов, а также как извлечь диапазоны символов для одного слова. Чтобы получить бонусные очки, попробуйте использовать два предложения в качестве входных данных и посмотрите, будут ли идентификаторы предложений иметь для вас смысл.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Создайте свой собственный пример текста и посмотрите, сможете ли вы понять, какие токены связаны с идентификаторами слов, а также как извлечь диапазоны символов для одного слова. Чтобы получить бонусные очки, попробуйте использовать два предложения в качестве входных данных и посмотрите, будут ли идентификаторы предложений иметь для вас смысл.
 
 ## Внутри конвейера `token-classification`[[inside-the-token-classification-pipeline]]
 
diff --git a/chapters/ru/chapter6/3b.mdx b/chapters/ru/chapter6/3b.mdx
index 6ad4b6ed0..91aa6c029 100644
--- a/chapters/ru/chapter6/3b.mdx
+++ b/chapters/ru/chapter6/3b.mdx
@@ -275,11 +275,8 @@ print(scores[start_index, end_index])
 0.97773
 ```
 
-<Tip>
-
-✏️ **Попробуйте!** Вычислите начальный и конечный индексы для пяти наиболее вероятных ответов.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Вычислите начальный и конечный индексы для пяти наиболее вероятных ответов.
 
 У нас есть `start_index` и `end_index` ответа в терминах токенов, так что теперь нам просто нужно преобразовать их в индексы символов в контексте. Именно здесь смещения будут очень полезны. Мы можем захватить их и использовать, как мы это делали в задаче token classification:
 
@@ -313,11 +310,8 @@ print(result)
 
 Отлично! Это то же самое, что и в нашем первом примере!
 
-<Tip>
-
-✏️ **Попробуйте! ** Используйте лучшие оценки, которые вы вычислили ранее, чтобы показать пять наиболее вероятных ответов. Чтобы проверить результаты, вернитесь к первому конвейеру и передайте `top_k=5` при его вызове.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Используйте лучшие оценки, которые вы вычислили ранее, чтобы показать пять наиболее вероятных ответов. Чтобы проверить результаты, вернитесь к первому конвейеру и передайте `top_k=5` при его вызове.
 
 ## Обработка длинных контекстов[[handling-long-contexts]]
 
@@ -608,11 +602,8 @@ print(candidates)
 
 Эти два кандидата соответствуют лучшим ответам, которые модель смогла найти в каждом фрагменте. Модель гораздо больше уверена в том, что правильный ответ находится во второй части (это хороший знак!). Теперь нам нужно сопоставить эти два диапазона токенов с диапазонами символов в контексте (для получения ответа нам нужно сопоставить только второй, но интересно посмотреть, что модель выбрала в первом фрагменте).
 
-<Tip>
-
-✏️ **Попробуйте!** Адаптируйте приведенный выше код, чтобы он возвращал оценки и промежутки для пяти наиболее вероятных ответов (в целом, а не по частям).
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Адаптируйте приведенный выше код, чтобы он возвращал оценки и промежутки для пяти наиболее вероятных ответов (в целом, а не по частям).
 
 `offsets`, которую мы взяли ранее, на самом деле является списком смещений, по одному списку на каждый фрагмент текста:
 
@@ -633,10 +624,7 @@ for candidate, offset in zip(candidates, offsets):
 
 Если мы проигнорируем первый результат, то получим тот же результат, что и в нашем конвейере для этого длинного контекста - ура!
 
-<Tip>
-
-✏️ **Попробуйте!** Используйте лучшие оценки, которые вы вычислили ранее, чтобы показать пять наиболее вероятных ответов (для всего контекста, а не для каждого фрагмента). Чтобы проверить результаты, вернитесь к первому конвейеру и передайте `top_k=5` при его вызове.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Используйте лучшие оценки, которые вы вычислили ранее, чтобы показать пять наиболее вероятных ответов (для всего контекста, а не для каждого фрагмента). Чтобы проверить результаты, вернитесь к первому конвейеру и передайте `top_k=5` при его вызове.
 
 На этом мы завершаем наше глубокое погружение в возможности токенизатора. В следующей главе мы снова применим все это на практике, когда покажем, как дообучить модель для ряда распространенных задач NLP.
diff --git a/chapters/ru/chapter6/4.mdx b/chapters/ru/chapter6/4.mdx
index fc5eaecf0..d799acc6b 100644
--- a/chapters/ru/chapter6/4.mdx
+++ b/chapters/ru/chapter6/4.mdx
@@ -47,11 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 В этом примере, поскольку мы выбрали контрольную точку `bert-base-uncased`, нормализация применила нижний регистр и удалила ударения. 
 
-<Tip>
-
-✏️ **Попробуйте!** Загрузите токенизатор из контрольной точки `bert-base-cased` и передайте ему тот же пример. Какие основные различия вы можете увидеть между версией токенизатора cased и uncased?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Загрузите токенизатор из контрольной точки `bert-base-cased` и передайте ему тот же пример. Какие основные различия вы можете увидеть между версиями токенизатора cased и uncased?
 
 ## Предварительная токенизация[[pre-tokenization]]
 
diff --git a/chapters/ru/chapter6/5.mdx b/chapters/ru/chapter6/5.mdx
index fa05c49ff..2500e00f0 100644
--- a/chapters/ru/chapter6/5.mdx
+++ b/chapters/ru/chapter6/5.mdx
@@ -11,11 +11,8 @@ Byte-Pair Encoding (BPE) изначально была разработана к
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 В этом разделе подробно рассматривается BPE, вплоть до демонстрации полной реализации. Вы можете пропустить этот раздел, если вам нужен только общий обзор алгоритма токенизации.
-
-</Tip>
+> [!TIP]
+> 💡 В этом разделе подробно рассматривается BPE, вплоть до демонстрации полной реализации. Вы можете пропустить этот раздел, если вам нужен только общий обзор алгоритма токенизации.
 
 ## Алгоритм обучения[[training-algorithm]]
 
@@ -27,11 +24,8 @@ Byte-Pair Encoding (BPE) изначально была разработана к
 
 Тогда базовым словарем будет `["b", "g", "h", "n", "p", "s", "u"]`. В реальном мире этот базовый словарь будет содержать, как минимум, все символы ASCII, а возможно, и некоторые символы Unicode. Если в примере, который вы обрабатываете, используется символ, которого нет в обучающем корпусе, этот символ будет преобразован в неизвестный токен. Это одна из причин, по которой многие модели NLP очень плохо анализируют контент с эмоджи, например.
 
-<Tip>
-
-Токенизаторы GPT-2 и RoBERTa (которые довольно похожи) имеют умный способ решения этой проблемы: они рассматривают слова не как символы Unicode, а как байты. Таким образом, базовый словарь имеет небольшой размер (256), но все символы, которые вы можете придумать, все равно будут включены и не будут преобразованы в неизвестный токен. Этот трюк называется *byte-level BPE*.
-
-</Tip>
+> [!TIP]
+> Токенизаторы GPT-2 и RoBERTa (которые довольно похожи) имеют умный способ решения этой проблемы: они рассматривают слова не как символы Unicode, а как байты. Таким образом, базовый словарь имеет небольшой размер (256), но все символы, которые вы можете придумать, все равно будут включены и не будут преобразованы в неизвестный токен. Этот трюк называется *byte-level BPE*.
 
 После получения базового словаря мы добавляем новые токены, пока не достигнем желаемого объема словаря, обучаясь *слияниям*, которые представляют собой правила слияния двух элементов существующего словаря в новый. Таким образом, в начале эти слияния будут создавать токены с двумя символами, а затем, по мере обучения, более длинные подслова.
 
@@ -74,11 +68,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 И продолжаем в том же духе, пока не достигнем желаемого размера словаря.
 
-<Tip>
-
-✏️ **Теперь ваша очередь!** Как вы думаете, каким будет следующее правило слияния?
-
-</Tip>
+> [!TIP]
+> ✏️ **Теперь ваша очередь!** Как вы думаете, каким будет следующее правило слияния?
 
 ## Алгоритм токенизации[[tokenization-algorithm]]
 
@@ -99,11 +90,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 Слово `"bug"` будет токенизировано как `["b", "ug"]`. Слово `"mug"`, однако, будет токенизировано как `["[UNK]", "ug"]`, поскольку буква `"m"` отсутствует в базовом словаре. Аналогично, слово `"thug" будет токенизировано как `["[UNK]", "hug"]`: буква `"t" отсутствует в базовом словаре, и применение правил слияния приводит сначала к слиянию `"u"` и `"g"`, а затем к слиянию `"h"` и `"ug"`.
 
-<Tip>
-
-✏️ ** Теперь ваша очередь!** Как вы думаете, как будет токенизировано слово `'unhug'`?
-
-</Tip>
+> [!TIP]
+> ✏️ **Теперь ваша очередь!** Как вы думаете, как будет токенизировано слово `'unhug'`?
 
 ## Реализация BPE[[implementing-bpe]]
 
@@ -315,11 +303,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 Использование `train_new_from_iterator()` на том же корпусе не приведет к созданию точно такого же словаря. Это связано с тем, что при выборе наиболее частотной пары мы выбираем первую попавшуюся, в то время как библиотека 🤗 Tokenizers выбирает первую пару, основываясь на ее внутренних ID.
-
-</Tip>
+> [!TIP]
+> 💡 Использование `train_new_from_iterator()` на том же корпусе не приведет к созданию точно такого же словаря. Это связано с тем, что при выборе наиболее частотной пары мы выбираем первую попавшуюся, в то время как библиотека 🤗 Tokenizers выбирает первую пару, основываясь на ее внутренних ID.
 
 Чтобы токенизировать новый текст, мы предварительно токенизируем его, разбиваем на части, а затем применяем все изученные правила слияния:
 
@@ -351,10 +336,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ Наша реализация будет выбрасывать ошибку при наличии неизвестного символа, поскольку мы ничего не сделали для их обработки. На самом деле в GPT-2 нет неизвестного токена (невозможно получить неизвестный символ при использовании BPE на уровне байтов), но здесь это может произойти, поскольку мы не включили все возможные байты в начальный словарь. Этот аспект BPE выходит за рамки данного раздела, поэтому мы опустили подробности.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Наша реализация будет выбрасывать ошибку при наличии неизвестного символа, поскольку мы ничего не сделали для их обработки. На самом деле в GPT-2 нет неизвестного токена (невозможно получить неизвестный символ при использовании BPE на уровне байтов), но здесь это может произойти, поскольку мы не включили все возможные байты в начальный словарь. Этот аспект BPE выходит за рамки данного раздела, поэтому мы опустили подробности.
 
 Вот и все об алгоритме BPE! Далее мы рассмотрим WordPiece.
\ No newline at end of file
diff --git a/chapters/ru/chapter6/6.mdx b/chapters/ru/chapter6/6.mdx
index a462d82f2..dcdc70cbe 100644
--- a/chapters/ru/chapter6/6.mdx
+++ b/chapters/ru/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece - это алгоритм токенизации, разработан
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 В этом разделе подробно рассматривается WordPiece, вплоть до демонстрации полной реализации. Вы можете пропустить его, если вам нужен только общий обзор алгоритма токенизации.
-
-</Tip>
+> [!TIP]
+> 💡 В этом разделе подробно рассматривается WordPiece, вплоть до демонстрации полной реализации. Вы можете пропустить его, если вам нужен только общий обзор алгоритма токенизации.
 
 ## Алгоритм обучения[[training-algorithm]]
 
-<Tip warning={true}>
-
-⚠️ Google никогда не предоставлял открытый доступ к своей реализации алгоритма обучения WordPiece, поэтому все вышесказанное - это наши предположения, основанные на опубликованных материалах. Возможно, они точны не на 100 %.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google никогда не предоставлял открытый доступ к своей реализации алгоритма обучения WordPiece, поэтому все изложенное ниже - это наши предположения, основанные на опубликованных материалах. Возможно, они точны не на 100 %.
 
 Как и BPE, WordPiece начинает работу с небольшого словаря, включающего специальные токены, используемые моделью, и начальный алфавит. Поскольку модель идентифицирует подслова путем добавления префикса (как `##` для BERT), каждое слово первоначально разбивается на части путем добавления этого префикса ко всем символам внутри слова. Так, например, `"word"` разбивается на части следующим образом:
 
@@ -76,11 +70,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 и мы продолжаем так до тех пор, пока не достигнем необходимого размера словаря.
 
-<Tip>
-
-✏️ **Теперь ваша очередь!** Каким будет следующее правило слияния?
-
-</Tip>
+> [!TIP]
+> ✏️ **Теперь ваша очередь!** Каким будет следующее правило слияния?
 
 ## Алгоритм токенизации[[tokenization-algorithm]]
 
@@ -92,11 +83,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 Когда токенизация доходит до стадии, когда невозможно найти подслово в словаре, все слово токенизируется как неизвестное - так, например, `"mug"` будет токенизировано как `["[UNK]"]`, как и `"bum"` (даже если мы можем начать с `"b"` и `"##u"`, `"##m"` не входит в словарь, и результирующий токен будет просто `["[UNK]"]`, а не `["b", "##u", "[UNK]"]`). Это еще одно отличие от BPE, который классифицирует как неизвестные только отдельные символы, отсутствующие в словаре.
 
-<Tip>
-
-✏️ **Теперь ваша очередь!** Как будет токенизировано слово `"pugs"`?
-
-</Tip>
+> [!TIP]
+> ✏️ **Теперь ваша очередь!** Как будет токенизировано слово `"pugs"`?
 
 ## Реализация WordPiece[[implementing-wordpiece]]
 
@@ -314,11 +302,8 @@ print(vocab)
 
 Как мы видим, по сравнению с BPE этот токенизатор быстрее выучивает части слов как токены.
 
-<Tip>
-
-💡 Использование `train_new_from_iterator()` на одном и том же корпусе не приведет к точно такому же словарю. Это происходит потому, что библиотека 🤗 Tokenizers не реализует WordPiece для обучения (поскольку мы не полностью уверены в его внутреннем устройстве), а использует вместо него BPE.
-
-</Tip>
+> [!TIP]
+> 💡 Использование `train_new_from_iterator()` на одном и том же корпусе не приведет к точно такому же словарю. Это происходит потому, что библиотека 🤗 Tokenizers не реализует WordPiece для обучения (поскольку мы не полностью уверены в его внутреннем устройстве), а использует вместо него BPE.
 
 Чтобы токенизировать новый текст, мы предварительно токенизируем его, разбиваем на части, а затем применяем алгоритм токенизации к каждому слову. То есть начиная с первого слова мы ищем самое большое подслово и разбиваем его на части, затем мы повторяем процесс для второй части, и так далее для оставшейся части этого слова и следующих слов в тексте:
 
diff --git a/chapters/ru/chapter6/7.mdx b/chapters/ru/chapter6/7.mdx
index 3b436be17..3d427ec9b 100644
--- a/chapters/ru/chapter6/7.mdx
+++ b/chapters/ru/chapter6/7.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 В этом разделе подробно рассматривается Unigram, вплоть до демонстрации полной реализации. Вы можете пропустить его, если вам нужен только общий обзор алгоритма токенизации.
-
-</Tip>
+> [!TIP]
+> 💡 В этом разделе подробно рассматривается Unigram, вплоть до демонстрации полной реализации. Вы можете пропустить его, если вам нужен только общий обзор алгоритма токенизации.
 
 ## Алгоритм обучения[[training-algorithm]]
 
@@ -56,11 +53,8 @@
 
 Итак, сумма всех частот равна 210, а вероятность появления подслова `"ug"`, таким образом, составляет 20/210.
 
-<Tip>
-
-✏️ **Теперь ваша очередь!** Напишите код для вычисления вышеуказанных частот и дважды проверьте правильность приведенных результатов, а также общую сумму.
-
-</Tip>
+> [!TIP]
+> ✏️ **Теперь ваша очередь!** Напишите код для вычисления вышеуказанных частот и перепроверьте правильность приведенных результатов, а также общую сумму.
 
 Теперь для токенизации данного слова мы рассматриваем все возможные сегментации на токены и вычисляем вероятность каждого из них в соответствии с моделью Unigram. Поскольку все токены считаются независимыми, эта вероятность равна произведению вероятностей появления каждого токена. Например, при токенизации `["p", "u", "g"]` слова `"pug"` вероятность составляет:
 
@@ -98,11 +92,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 Таким образом, `"unhug"` будет токенизировано как `["un", "hug"]`.
 
-<Tip>
-
-✏️ **Теперь ваша очередь!** Определите токенизацию слова `" huggun"` и его оценку.
-
-</Tip>
+> [!TIP]
+> ✏️ **Теперь ваша очередь!** Определите токенизацию слова `"huggun"` и его оценку.
 
 ## Назад к обучению[[back-to-training]]
 
@@ -215,11 +206,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece использует более эффективный алгоритм под названием Enhanced Suffix Array (ESA) для создания начального словаря.
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece использует более эффективный алгоритм под названием Enhanced Suffix Array (ESA) для создания начального словаря.
 
 Далее мы вычисляем сумму всех частот, чтобы преобразовать частоты в вероятности. Для нашей модели мы будем хранить логарифмы вероятностей, потому что численно стабильнее складывать логарифмы, чем перемножать маленькие числа, и это упростит вычисление потерь модели:
 
@@ -340,11 +328,8 @@ print(scores["his"])
 0.0
 ```
 
-<Tip>
-
-💡 Такой подход очень неэффективен, поэтому SentencePiece использует приближенную оценку потерь модели без токена X: вместо того чтобы начинать с нуля, он просто заменяет токен X его сегментацией в оставшемся словаре. Таким образом, все оценки могут быть вычислены одновременно с потерями модели.
-
-</Tip>
+> [!TIP]
+> 💡 Такой подход очень неэффективен, поэтому SentencePiece использует приближенную оценку потерь модели без токена X: вместо того чтобы начинать с нуля, он просто заменяет токен X его сегментацией в оставшемся словаре. Таким образом, все оценки могут быть вычислены одновременно с потерями модели.
 
 Когда этот процесс завершиться, останется только добавить в словарь специальные токены, используемые моделью, а затем итерироваться, пока мы не вычеркнем из словаря достаточно токенов, чтобы достичь желаемого размера:
 
diff --git a/chapters/ru/chapter6/8.mdx b/chapters/ru/chapter6/8.mdx
index 36b260a6b..98ca7d600 100644
--- a/chapters/ru/chapter6/8.mdx
+++ b/chapters/ru/chapter6/8.mdx
@@ -111,12 +111,9 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**Далее** если вы протестируете две версии предыдущих нормализаторов на строке, содержащей символ Unicode `u"\u0085"`, то наверняка заметите, что эти два нормализатора не совсем эквивалентны.  
-Чтобы не усложнять версию с `normalizers.Sequence`, мы не включили в нее Regex-замены, которые требует `BertNormalizer`, когда аргумент `clean_text` установлен в `True`, что является поведением по умолчанию. Но не волнуйтесь: можно получить точно такую же нормализацию без использования удобного `BertNormalizer`, добавив два `normalizers.Replace` в последовательность нормализаторов.
-
-</Tip>
+> [!TIP]
+> **Далее** если вы протестируете две версии предыдущих нормализаторов на строке, содержащей символ Unicode `u"\u0085"`, то наверняка заметите, что эти два нормализатора не совсем эквивалентны.  
+> Чтобы не усложнять версию с `normalizers.Sequence`, мы не включили в нее Regex-замены, которые требует `BertNormalizer`, когда аргумент `clean_text` установлен в `True`, что является поведением по умолчанию. Но не волнуйтесь: можно получить точно такую же нормализацию без использования удобного `BertNormalizer`, добавив два `normalizers.Replace` в последовательность нормализаторов.
 
 Далее следует этап предварительной токенизации. Опять же, есть готовый `BertPreTokenizer`, который мы можем использовать:
 
diff --git a/chapters/ru/chapter7/1.mdx b/chapters/ru/chapter7/1.mdx
index 47cf244b5..a54781bc2 100644
--- a/chapters/ru/chapter7/1.mdx
+++ b/chapters/ru/chapter7/1.mdx
@@ -31,8 +31,5 @@
 {/if}
 
 
-<Tip>
-
-Если вы будете читать разделы по порядку, то заметите, что в них довольно много общего в коде и тексте. Повторение сделано намеренно, чтобы вы могли погрузиться (или вернуться позже) в любую интересующую вас задачу и найти полный рабочий пример.
-
-</Tip>
+> [!TIP]
+> Если вы будете читать разделы по порядку, то заметите, что в них довольно много общего в коде и тексте. Повторение сделано намеренно, чтобы вы могли погрузиться (или вернуться позже) в любую интересующую вас задачу и найти полный рабочий пример.
diff --git a/chapters/ru/chapter7/2.mdx b/chapters/ru/chapter7/2.mdx
index edc843490..092853937 100644
--- a/chapters/ru/chapter7/2.mdx
+++ b/chapters/ru/chapter7/2.mdx
@@ -45,11 +45,8 @@
 
 Прежде всего, нам нужен набор данных, подходящий для классификации токенов. В этом разделе мы будем использовать [набор данных CoNLL-2003](https://huggingface.co/datasets/conll2003), который содержит новости от Reuters.
 
-<Tip>
-
-💡 Если ваш набор данных состоит из текстов, часть которых состоит из слов с соответствующими метками, вы сможете адаптировать описанные здесь процедуры обработки данных к своему набору данных. Обратитесь к [Главе 5](../chapter5/1), если вам нужно освежить в памяти то, как загружать собственные данные в `Dataset`.
-
-</Tip>
+> [!TIP]
+> 💡 Если ваш набор данных состоит из текстов, разбитых на слова с соответствующими метками, вы сможете адаптировать описанные здесь процедуры обработки данных к своему набору данных. Обратитесь к [Главе 5](../chapter5/1), если вам нужно освежить в памяти то, как загружать собственные данные в `Dataset`.
 
 ### Датасет CoNLL-2003[[the-conll-2003-dataset]]
 
@@ -167,11 +164,8 @@ print(line2)
 
 Как мы видим, сущностям, состоящим из двух слов, например "European Union" и "Werner Zwingmann", присваивается метка `B-` для первого слова и метка `I-` для второго.
 
-<Tip>
-
-✏️ **Попробуйте!** Выведите те же два предложения с метками POS или chunking.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Выведите те же два предложения с метками POS или chunking.
 
 ### Обработка данных[[processing-the-data]]
 
@@ -263,11 +257,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 Как мы видим, наша функция добавила `-100` для двух специальных токенов в начале и в конце и новый `0` для нашего слова, которое было разбито на две части.
 
-<Tip>
-
-✏️ **Попробуйте!** Некоторые исследователи предпочитают назначать только одну метку на слово и присваивать `-100` другим подтокенам в данном слове. Это делается для того, чтобы длинные слова, часть которых состоит из множества субтокенов, не вносили значительный вклад в потери.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Некоторые исследователи предпочитают назначать только одну метку на слово и присваивать `-100` остальным субтокенам в данном слове. Это делается для того, чтобы длинные слова, которые разбиваются на множество субтокенов, не вносили значительный вклад в потери.
 
 Чтобы предварительно обработать весь наш датасет, нам нужно провести токенизацию всех входных данных и применить `align_labels_with_tokens()` ко всем меткам. Чтобы воспользоваться преимуществами скорости нашего быстрого токенизатора, лучше всего токенизировать много текстов одновременно, поэтому мы напишем функцию, которая обрабатывает список примеров и использует метод `Dataset.map()` с параметром `batched=True`. Единственное отличие от нашего предыдущего примера заключается в том, что функция `word_ids()` должна получить индекс примера, идентификаторы слов которого нам нужны, с учётом того что входными данными для токенизатора являются списки текстов (или, в нашем случае, списки слов), поэтому мы добавляем и это:
 
@@ -429,11 +420,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Если у вас есть модель с неправильным количеством меток, то при последующем вызове `model.fit()` вы получите непонятную ошибку. Это может вызвать раздражение при отладке, поэтому обязательно выполните эту проверку, чтобы убедиться, что у вас есть ожидаемое количество меток.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Если у вас есть модель с неправильным количеством меток, то при последующем вызове `model.fit()` вы получите непонятную ошибку. Это может вызвать раздражение при отладке, поэтому обязательно выполните эту проверку, чтобы убедиться, что у вас есть ожидаемое количество меток.
 
 ### Дообучение модели[[fine-tuning-the-model]]
 
@@ -497,11 +485,8 @@ model.fit(
 
 С помощью аргумента `hub_model_id` можно указать полное имя репозитория, в который вы хотите передать модель (в частности, этот аргумент нужно использовать, чтобы передать модель в организацию). Например, когда мы отправили модель в [организацию `huggingface-course`](https://huggingface.co/huggingface-course), мы добавили `hub_model_id="huggingface-course/bert-finetuned-ner"`. По умолчанию используемое хранилище будет находиться в вашем пространстве имен и называться в соответствии с заданной вами выходной директорией, например `"cool_huggingface_user/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном репозитория, в который вы хотите выполнить push. Если это не так, вы получите ошибку при вызове `model.fit()` и должны будете задать новое имя.
-
-</Tip>
+> [!TIP]
+> 💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном репозитория, в который вы хотите выполнить push. Если это не так, вы получите ошибку при вызове `model.fit()` и должны будете задать новое имя.
 
 Обратите внимание, что во время обучения каждый раз, когда модель сохраняется (здесь - каждую эпоху), она загружается на хаб в фоновом режиме. Таким образом, при необходимости вы сможете возобновить обучение на другой машине.
 
@@ -679,11 +664,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Если у вас есть модель с неправильным количеством меток, то при последующем вызове метода `Trainer.train()` вы получите непонятную ошибку (что-то вроде "CUDA error: device-side assert triggered"). Это главная причина ошибок, о которых сообщают пользователи, поэтому обязательно выполните эту проверку, чтобы убедиться, что у вас есть ожидаемое количество меток.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Если у вас есть модель с неправильным количеством меток, то при последующем вызове метода `Trainer.train()` вы получите непонятную ошибку (что-то вроде "CUDA error: device-side assert triggered"). Это главная причина ошибок, о которых сообщают пользователи, поэтому обязательно выполните эту проверку, чтобы убедиться, что у вас есть ожидаемое количество меток.
 
 ### Дообучение модели[[fine-tuning-the-model]]
 
@@ -721,11 +703,8 @@ args = TrainingArguments(
 
 Большинство из них вы уже видели: мы задаем некоторые гиперпараметры (например, скорость обучения, количество эпох для обучения и затухание весов) и указываем `push_to_hub=True`, чтобы указать, что мы хотим сохранить модель и оценить ее в конце каждой эпохи, а также что мы хотим загрузить наши результаты в Model Hub. Обратите внимание, что с помощью аргумента `hub_model_id` можно указать имя репозитория, в который вы хотите передать модель (в частности, этот аргумент нужно использовать, чтобы передать модель в организацию). Например, когда мы передавали модель в [организацию `huggingface-course`](https://huggingface.co/huggingface-course), мы добавили `hub_model_id="huggingface-course/bert-finetuned-ner"` в `TrainingArguments`. По умолчанию используемый репозиторий будет находиться в вашем пространстве имен и называться в соответствии с заданным вами выходным каталогом, так что в нашем случае это будет `"sgugger/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном репозитория, в который вы хотите передать модель. Если это не так, вы получите ошибку при определении вашего `Trainer` и должны будете задать новое имя.
-
-</Tip>
+> [!TIP]
+> 💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном репозитория, в который вы хотите передать модель. Если это не так, вы получите ошибку при определении вашего `Trainer` и должны будете задать новое имя.
 
 Наконец, мы просто передаем все в `Trainer` и запускаем обучение:
 
@@ -813,11 +792,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Если вы обучаетесь на TPU, вам нужно будет перенести весь код, начиная с ячейки выше, в специальную функцию обучения. Подробнее смотрите [Главу 3](../chapter3/1).
-
-</Tip>
+> [!TIP]
+> 🚨 Если вы обучаетесь на TPU, вам нужно будет перенести весь код, начиная с ячейки выше, в специальную функцию обучения. Подробнее смотрите [Главу 3](../chapter3/1).
 
 Теперь, когда мы отправили наш `train_dataloader` в `accelerator.prepare()`, мы можем использовать его длину для вычисления количества шагов обучения. Помните, что это всегда нужно делать после подготовки загрузчика данных, так как этот метод изменит его длину. Мы используем классический линейный планировшик скорости обучения до 0:
 
diff --git a/chapters/ru/chapter7/3.mdx b/chapters/ru/chapter7/3.mdx
index da62408eb..2364bc0ea 100644
--- a/chapters/ru/chapter7/3.mdx
+++ b/chapters/ru/chapter7/3.mdx
@@ -41,11 +41,8 @@
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 Если термины "маскированное моделирование языка (masked language modeling)" и "предварительно обученная модель (pretrained model)" кажутся вам незнакомыми, загляните в [Главу 1](../chapter1/1), где мы объясняем все эти основные понятия, сопровождая их видеороликами!
-
-</Tip>
+> [!TIP]
+> 🙋 Если термины "маскированное моделирование языка (masked language modeling)" и "предварительно обученная модель (pretrained model)" кажутся вам незнакомыми, загляните в [Главу 1](../chapter1/1), где мы объясняем все эти основные понятия, сопровождая их видеороликами!
 
 ## Выбор предварительно обученной модели для маскированного моделирования языка[[picking-a-pretrained-model-for-masked-language-modeling]]
 
@@ -237,11 +234,8 @@ for row in sample:
 
 Да, это точно рецензии на фильмы, и если вы родились до 1990х, вам будет лучше понятен комментарий в последней рецензии о VHS-версии. 😜! Хотя нам не понадобятся эти метки для языкового моделирования, мы уже видим, что `0` обозначает отрицательный отзыв, а `1` - положительный.
 
-<Tip>
-
-✏️ **Попробуйте!** Создайте случайную выборку из части `unsupervised` и проверьте, что метки не являются ни `0`, ни `1`. В процессе работы вы также можете проверить, что метки в частях `train` и `test` действительно равны `0` или `1` - это полезная проверка здравомыслия, которую каждый практикующий NLP должен выполнять в начале нового проекта!
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Создайте случайную выборку из части `unsupervised` и проверьте, что метки не являются ни `0`, ни `1`. В процессе работы вы также можете проверить, что метки в частях `train` и `test` действительно равны `0` или `1` - это полезная проверка здравомыслия, которую каждый практикующий NLP должен выполнять в начале нового проекта!
 
 Теперь, когда мы вкратце ознакомились с данными, давайте перейдем к их подготовке к моделированию языка по маске. Как мы увидим, есть несколько дополнительных шагов, которые необходимо сделать по сравнению с задачами классификации последовательностей, которые мы рассматривали в [Главе 3](../chapter3/1). Поехали!
 
@@ -299,11 +293,8 @@ tokenizer.model_max_length
 
 Это значение берется из файла *tokenizer_config.json*, связанного с контрольной точкой; в данном случае мы видим, что размер контекста составляет 512 токенов, как и в случае с BERT.
 
-<Tip>
-
-✏️ **Попробуйте!** Некоторые модели трансформеров, например [BigBird](https://huggingface.co/google/bigbird-roberta-base) и [Longformer](hf.co/allenai/longformer-base-4096), имеют гораздо большую длину контекста, чем BERT и другие ранние модели трансформеров. Инстанцируйте токенизатор для одной из этих контрольных точек и проверьте, что `model_max_length` согласуется с тем, что указано в описании модели.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Некоторые модели трансформеров, например [BigBird](https://huggingface.co/google/bigbird-roberta-base) и [Longformer](https://hf.co/allenai/longformer-base-4096), имеют гораздо большую длину контекста, чем BERT и другие ранние модели трансформеров. Инстанцируйте токенизатор для одной из этих контрольных точек и проверьте, что `model_max_length` согласуется с тем, что указано в описании модели.
 
 Поэтому для проведения экспериментов на GPU, подобных тем, что стоят в Google Colab, мы выберем что-нибудь поменьше, что может поместиться в памяти:
 
@@ -311,11 +302,8 @@ tokenizer.model_max_length
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-Обратите внимание, что использование небольшого размера фрагмента может быть вредным в реальных сценариях, поэтому следует использовать размер, соответствующий сценарию использования, к которому вы будете применять вашу модель.
-
-</Tip>
+> [!WARNING]
+> Обратите внимание, что использование небольшого размера фрагмента может быть вредным в реальных сценариях, поэтому следует использовать размер, соответствующий сценарию использования, к которому вы будете применять вашу модель.
 
 Теперь наступает самое интересное. Чтобы показать, как работает конкатенация, давайте возьмем несколько отзывов из нашего обучающего набора с токенизацией и выведем количество токенов в отзыве:
 
@@ -472,11 +460,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 Отлично, сработало! Мы видим, что токен `[MASK]` был случайным образом вставлен в различные места нашего текста. Это будут токены, которые наша модель должна будет предсказать в процессе обучения - и прелесть коллатора данных в том, что он будет случайным образом вставлять `[MASK]` в каждом батче!  
 
-<Tip>
-
-✏️ **Попробуйте!** Запустите приведенный выше фрагмент кода несколько раз, чтобы увидеть, как случайное маскирование происходит на ваших глазах! Также замените метод `tokenizer.decode()` на `tokenizer.convert_ids_to_tokens()`, чтобы увидеть, что иногда маскируется один токен из данного слова, а не все остальные.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Запустите приведенный выше фрагмент кода несколько раз, чтобы увидеть, как случайное маскирование происходит на ваших глазах! Также замените метод `tokenizer.decode()` на `tokenizer.convert_ids_to_tokens()`, чтобы увидеть, что иногда маскируется только один токен из данного слова, а остальные - нет.
 
 {#if fw === 'pt'}
 
@@ -586,11 +571,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **Попробуйте!** Запустите приведенный выше фрагмент кода несколько раз, чтобы увидеть, как случайное маскирование происходит на ваших глазах! Также замените метод `tokenizer.decode()` на `tokenizer.convert_ids_to_tokens()`, чтобы увидеть, что токены из данного слова всегда маскируются вместе.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Запустите приведенный выше фрагмент кода несколько раз, чтобы увидеть, как случайное маскирование происходит на ваших глазах! Также замените метод `tokenizer.decode()` на `tokenizer.convert_ids_to_tokens()`, чтобы увидеть, что токены из данного слова всегда маскируются вместе.
 
 Теперь, когда у нас есть два колатора данных, остальные шаги по дообучению стандартны. Обучение может занять много времени в Google Colab, если вам не посчастливилось получить мифический GPU P100 😭, поэтому мы сначала уменьшим размер обучающего набора до нескольких тысяч примеров. Не волнуйтесь, мы все равно получим довольно приличную языковую модель! Быстрый способ уменьшить размер датасета в 🤗 Datasets - это функция `Dataset.train_test_split()`, которую мы рассматривали в [Главе 5](../chapter5/1):
 
@@ -815,11 +797,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **Попробуйте!** Запустите обучение, описанное выше, после замены коллатора данных на коллатор маскирующий все слово. Получили ли вы лучшие результаты?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Запустите обучение, описанное выше, после замены коллатора данных на коллатор, маскирующий все слово. Получили ли вы лучшие результаты?
 
 {#if fw === 'pt'} 
 
@@ -1037,8 +1016,5 @@ for pred in preds:
 
 На этом мы завершаем наш первый эксперимент по обучению языковой модели. В [разделе 6](../chapter7/6) вы узнаете, как обучить авторегрессионную модель типа GPT-2 с нуля; загляните туда, если хотите посмотреть, как можно предварительно обучить свою собственную модель трансформера!
 
-<Tip>
-
-✏️ **Попробуйте!** Чтобы оценить преимущества адаптации к домену, дообучите классификатор на метках IMDb как для предварительно обученных, так и для дообученных контрольных точек DistilBERT. Если вам нужно освежить в памяти классификацию текстов, ознакомьтесь с [Главой 3](../chapter3/1).
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Чтобы оценить преимущества адаптации к домену, дообучите классификатор на метках IMDb как для предварительно обученных, так и для дообученных контрольных точек DistilBERT. Если вам нужно освежить в памяти классификацию текстов, ознакомьтесь с [Главой 3](../chapter3/1).
diff --git a/chapters/ru/chapter7/4.mdx b/chapters/ru/chapter7/4.mdx
index dd6001867..e324a07e3 100644
--- a/chapters/ru/chapter7/4.mdx
+++ b/chapters/ru/chapter7/4.mdx
@@ -156,11 +156,8 @@ translator(
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **Попробуйте!** Еще одно английское слово, которое часто используется во французском языке, - "email". Найдите в обучающем датасете первый образец, в котором используется это слово. Как оно переводится? Как предварительно обученная модель переводит то же английское предложение?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Еще одно английское слово, которое часто используется во французском языке, - "email". Найдите в обучающем датасете первый образец, в котором используется это слово. Как оно переводится? Как предварительно обученная модель переводит то же английское предложение?
 
 ### Предварительная обработка данных[[processing-the-data]]
 
@@ -177,11 +174,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="pt")
 
 Вы также можете заменить `model_checkpoint` на любую другую модель из [Hub](https://huggingface.co/models) или локальной папки, в которой вы сохранили предварительно обученную модель и токенизатор.
 
-<Tip>
-
-💡 Если вы используете многоязыковой токенизатор, такой как mBART, mBART-50 или M2M100, вам нужно задать языковые коды ваших входных и целевых данных в токенизаторе, задав правильные значения параметрам `tokenizer.src_lang` и `tokenizer.tgt_lang`.
-
-</Tip>
+> [!TIP]
+> 💡 Если вы используете многоязыковой токенизатор, такой как mBART, mBART-50 или M2M100, вам нужно задать языковые коды ваших входных и целевых данных в токенизаторе, задав правильные значения параметрам `tokenizer.src_lang` и `tokenizer.tgt_lang`.
 
 Подготовка наших данных довольно проста. Нужно помнить только об одном: необходимо убедиться, что токенизатор обрабатывает целевые значения на выходном языке (здесь - французском). Вы можете сделать это, передав целевые данные в аргумент `text_targets` метода `__call__` токенизатора.
 
@@ -231,17 +225,11 @@ def preprocess_function(examples):
 
 Обратите внимание, что мы установили одинаковую максимальную длину для наших входов и выходов. Поскольку тексты, с которыми мы имеем дело, довольно короткие, мы используем 128.
 
-<Tip>
+> [!TIP]
+> 💡 Если вы используете модель T5 (точнее, одну из контрольных точек `t5-xxx`), модель будет ожидать, что текстовые данные будут иметь префикс, указывающий на поставленную задачу, например `translate English to French:`.
 
-💡 Если вы используете модель T5 (точнее, одну из контрольных точек `t5-xxx`), модель будет ожидать, что текстовые данные будут иметь префикс, указывающий на поставленную задачу, например `translate: English to French:`.
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ Мы не обращаем внимания на маску внимания целевых значений, так как модель не будет этого ожидать. Вместо этого метки, соответствующие токенам дополнения, должны быть заданы как `-100`, чтобы они игнорировались при вычислении потерь. Это будет сделано нашим коллатором данных позже, так как мы применяем динамическое дополнение (dynamic padding), но если вы используете дополнение (padding) здесь, вы должны адаптировать функцию предварительной обработки данных, чтобы установить все метки, соответствующие токену дополнения, в `-100`.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Мы не обращаем внимания на маску внимания целевых значений, так как модель не будет этого ожидать. Вместо этого метки, соответствующие токенам дополнения, должны быть заданы как `-100`, чтобы они игнорировались при вычислении потерь. Это будет сделано нашим коллатором данных позже, так как мы применяем динамическое дополнение (dynamic padding), но если вы используете дополнение (padding) здесь, вы должны адаптировать функцию предварительной обработки данных, чтобы установить все метки, соответствующие токену дополнения, в `-100`.
 
 Теперь мы можем применить эту функцию предварительной обработки ко всем частям нашего датасета за один раз:
 
@@ -649,11 +637,8 @@ model.fit(
 
 Обратите внимание, что с помощью аргумента `hub_model_id` можно указать имя репозитория, в который вы хотите отправить модель (в частности, этот аргумент нужно использовать, чтобы отправить модель в организацию). Например, когда мы отправили модель в организацию [`huggingface-course`](https://huggingface.co/huggingface-course), мы добавили `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` в `Seq2SeqTrainingArguments`. По умолчанию используемый репозиторий будет находиться в вашем пространстве имен и называться в соответствии с заданным вами выходным каталогом, поэтому здесь это будет `"sgugger/marian-finetuned-kde4-en-to-fr"` (это модель, на которую мы ссылались в начале этого раздела).
 
-<Tip>
-
-💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном репозитория, в который вы хотите выполнить push. Если это не так, вы получите ошибку при вызове `model.fit()` и должны будете задать новое имя.
-
-</Tip>
+> [!TIP]
+> 💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном репозитория, в который вы хотите выполнить push. Если это не так, вы получите ошибку при вызове `model.fit()` и должны будете задать новое имя.
 
 Наконец, давайте посмотрим, как выглядят наши метрики после завершения обучения:
 
@@ -699,11 +684,8 @@ args = Seq2SeqTrainingArguments(
 
 Обратите внимание, что в аргументе `hub_model_id` можно указать полное имя розитория, в который вы хотите отправить модель (в частности, этот аргумент нужно использовать, чтобы отправить модель в организацию). Например, когда мы отправили модель в организацию [`huggingface-course`](https://huggingface.co/huggingface-course), мы добавили `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` в `Seq2SeqTrainingArguments`. По умолчанию используемый розиторий будет находиться в вашем пространстве имен и называться в соответствии с заданным вами выходным каталогом, поэтому в нашем случае это будет `"sgugger/marian-finetuned-kde4-en-to-fr"` (это модель, на которую мы ссылались в начале этого раздела).
 
-<Tip>
-
-💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном того розитория, в который вы хотите выполнить push. Если это не так, вы получите ошибку при определении вашего `Seq2SeqTrainer` и должны будете задать новое имя.
-
-</Tip>
+> [!TIP]
+> 💡 Если выходной каталог, который вы используете, уже существует, он должен быть локальным клоном того репозитория, в который вы хотите выполнить push. Если это не так, вы получите ошибку при определении вашего `Seq2SeqTrainer` и должны будете задать новое имя.
 
 
 Наконец, мы просто передаем все в `Seq2SeqTrainer`:
@@ -995,8 +977,5 @@ translator(
 
 Еще один отличный пример доменной адаптации!
 
-<Tip>
-
-✏️ **Попробуйте!** Что возвращает модель для примера со словом "email", который вы определили ранее?
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Что возвращает модель для примера со словом "email", который вы определили ранее?
diff --git a/chapters/ru/chapter7/5.mdx b/chapters/ru/chapter7/5.mdx
index 9c14bb75a..999d7d0ce 100644
--- a/chapters/ru/chapter7/5.mdx
+++ b/chapters/ru/chapter7/5.mdx
@@ -86,11 +86,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ **Попробуйте!** Измените random seed в команде `Dataset.shuffle()`, чтобы изучить другие отзывы в корпусе. Если вы владеете испанским языком, посмотрите на некоторые отзывы в `spanish_dataset`, чтобы понять, похожи ли их названия на разумные резюме.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Измените random seed в команде `Dataset.shuffle()`, чтобы изучить другие отзывы в корпусе. Если вы владеете испанским языком, посмотрите на некоторые отзывы в `spanish_dataset`, чтобы понять, похожи ли их названия на разумные резюме.
 
 Эта выборка демонстрирует разнообразие отзывов, которые обычно можно найти в сети, - от положительных до отрицательных (и все, что между ними!). Хотя пример с названием "meh" не очень информативен, остальные названия выглядят как достойные резюме самих отзывов. Обучение модели суммаризации всех 400 000 отзывов заняло бы слишком много времени на одном GPU, поэтому вместо этого мы сосредоточимся на создании резюме для одного домена продуктов. Чтобы получить представление о том, какие домены мы можем выбрать, давайте преобразуем `english_dataset` в `pandas.DataFrame` и вычислим количество отзывов по каждой категории товаров:
 
@@ -228,11 +225,8 @@ books_dataset = books_dataset.filter(lambda x: len(x["review_title"].split()) >
 В mT5 не используются префиксы, но она обладает многими универсальными возможностями T5 и имеет многоязыковое преимущество. Теперь, когда мы выбрали модель, давайте посмотрим, как подготовить данные для обучения.
 
 
-<Tip>
-
-✏️ **Попробуйте!** После того как вы проработаете этот раздел, посмотрите, насколько хорошо mT5 сравнится с mBART, дообучив его тем же методам. Чтобы получить бонусные очки, вы также можете попробовать дообучить T5 только на английских рецензиях. Поскольку в T5 есть специальный префикс запроса, вам нужно будет добавить `summarize:` к входным примерам на следующих шагах предварительной обработки.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** После того как вы проработаете этот раздел, посмотрите, насколько хорошо mT5 показывает себя в сравнении с mBART, дообучив mBART теми же методами. Чтобы получить бонусные очки, вы также можете попробовать дообучить T5 только на английских рецензиях. Поскольку в T5 есть специальный префикс запроса, вам нужно будет добавить `summarize:` к входным примерам на следующих шагах предварительной обработки.
 
 ## Предварительная обработка данных[[preprocessing-the-data]]
 
@@ -247,11 +241,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡 На ранних стадиях ваших NLP-проектов хорошей практикой является обучение класса "маленьких" моделей на небольшой выборке данных. Это позволит вам быстрее отлаживать и итерировать модели, чтобы создать сквозной рабочий процесс. Когда вы будете уверены в результатах, вы всегда сможете увеличить масштаб модели, просто изменив контрольную точку модели!
-
-</Tip>
+> [!TIP]
+> 💡 На ранних стадиях ваших NLP-проектов хорошей практикой является обучение класса "маленьких" моделей на небольшой выборке данных. Это позволит вам быстрее отлаживать и итерировать модели, чтобы создать сквозной рабочий процесс. Когда вы будете уверены в результатах, вы всегда сможете увеличить масштаб модели, просто изменив контрольную точку модели!
 
 Давайте протестируем токенизатор mT5 на небольшом примере:
 
@@ -306,11 +297,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 Теперь, когда корпус был предварительно обработан, давайте посмотрим на некоторые метрики, которые обычно используются для суммаризации. Как мы увидим, не существует серебряной пули, когда дело доходит до измерения качества сгенерированного машиной текста.
 
-<Tip>
-
-💡 Возможно, вы заметили, что выше в функции `Dataset.map()` мы использовали `batched=True`. Это кодирует примеры в батчах по 1 000 (по умолчанию) и позволяет использовать возможности многопоточности быстрых токенизаторов в 🤗 Transformers. По возможности, попробуйте использовать `batched=True`, чтобы получить максимальную отдачу от препроцессинга!
-
-</Tip>
+> [!TIP]
+> 💡 Возможно, вы заметили, что выше в функции `Dataset.map()` мы использовали `batched=True`. Это кодирует примеры в батчах по 1 000 (по умолчанию) и позволяет использовать возможности многопоточности быстрых токенизаторов в 🤗 Transformers. По возможности, попробуйте использовать `batched=True`, чтобы получить максимальную отдачу от препроцессинга!
 
 
 ## Метрики для суммаризации текста[[metrics-for-text-summarization]]
@@ -328,11 +316,8 @@ reference_summary = "I loved reading the Hunger Games"
 
 Одним из способов их сравнения может быть подсчет количества перекрывающихся слов, которых в данном случае будет 6. Однако это несколько грубовато, поэтому вместо этого ROUGE основывается на вычислении оценок _precision_ и _recall_ для перекрытия.
 
-<Tip>
-
-🙋 Не волнуйтесь, если вы впервые слышите о precision и recall - мы вместе разберем несколько наглядных примеров, чтобы все стало понятно. Эти метрики обычно встречаются в задачах классификации, поэтому, если вы хотите понять, как определяются precision и recall в этом контексте, мы рекомендуем ознакомиться с `scikit-learn` [руководством](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
-
-</Tip>
+> [!TIP]
+> 🙋 Не волнуйтесь, если вы впервые слышите о precision и recall - мы вместе разберем несколько наглядных примеров, чтобы все стало понятно. Эти метрики обычно встречаются в задачах классификации, поэтому, если вы хотите понять, как определяются precision и recall в этом контексте, мы рекомендуем ознакомиться с `scikit-learn` [руководством](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
 
 Для ROUGE recall измеряет, насколько эталонное резюме соответствует сгенерированному. Если мы просто сравниваем слова, recall можно рассчитать по следующей формуле:
 
@@ -384,11 +369,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 
 Отлично, показатели precision и recall совпадают! А как насчет других показателей ROUGE? `rouge2` измеряет перекрытие биграмм (считайте, что это перекрытие пар слов), а `rougeL` и `rougeLsum` измеряют самые длинные совпадающие последовательности слов, ища самые длинные общие подстроки в сгенерированных и эталонных резюме. Слово "sum" в `rougeLsum` означает, что эта метрика вычисляется для всего резюме, в то время как `rougeL` вычисляется как среднее по отдельным предложениям.
 
-<Tip>
-
-✏️ **Попробуйте!** Создайте свой собственный пример сгенерированного и эталонного резюме и посмотрите, согласуются ли полученные оценки ROUGE с ручным расчетом по формулам precision и recall. Для получения бонусных очков разбейте текст на биграммы и сравните precision и recall для метрики `rouge2`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Создайте свой собственный пример сгенерированного и эталонного резюме и посмотрите, согласуются ли полученные оценки ROUGE с ручным расчетом по формулам precision и recall. Для получения бонусных очков разбейте текст на биграммы и сравните precision и recall для метрики `rouge2`.
 
 Мы будем использовать эту оценку ROUGE для отслеживания эффективности нашей модели, но перед этим давайте сделаем то, что должен сделать каждый хороший NLP-практик: создадим сильную, но простую базовую модель!
 
@@ -478,11 +460,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 Если вы задаетесь вопросом, почему вы не видите предупреждений о необходимости дообучить модель для последующей задачи, то это потому, что для задач "последовательность-в-последовательность" мы сохраняем все веса сети. Сравните это с нашей моделью классификации текста из [Главы 3](../chapter3/1), где голова предварительно обученной модели была заменена на случайно инициализированную сеть.
-
-</Tip>
+> [!TIP]
+> 💡 Если вы задаетесь вопросом, почему вы не видите предупреждений о необходимости дообучить модель для последующей задачи, то это потому, что для задач "последовательность-в-последовательность" мы сохраняем все веса сети. Сравните это с нашей моделью классификации текста из [Главы 3](../chapter3/1), где голова предварительно обученной модели была заменена на случайно инициализированную сеть.
 
 Следующее, что нам нужно сделать, это войти в Hugging Face Hub. Если вы выполняете этот код в ноутбуке, вы можете сделать это с помощью следующей полезной функции:
 
@@ -843,11 +822,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Если вы обучаете на TPU, вам нужно будет перенести весь приведенный выше код в специальную функцию обучения. Подробнее смотрите в [Главе 3](../chapter3/1).
-
-</Tip>
+> [!TIP]
+> 🚨 Если вы обучаете на TPU, вам нужно будет перенести весь приведенный выше код в специальную функцию обучения. Подробнее смотрите в [Главе 3](../chapter3/1).
 
 Теперь, когда мы подготовили наши объекты, осталось сделать три вещи:
 
diff --git a/chapters/ru/chapter7/6.mdx b/chapters/ru/chapter7/6.mdx
index 6ade23c1c..abff5a321 100644
--- a/chapters/ru/chapter7/6.mdx
+++ b/chapters/ru/chapter7/6.mdx
@@ -135,11 +135,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-Предварительное обучение языковой модели займет некоторое время. Мы рекомендуем сначала запустить цикл обучения на выборке данных, раскомментировав две частичные строки выше, и убедиться, что обучение успешно завершено и модели сохранены. Нет ничего обиднее, чем неудачное обучение на последнем этапе из-за того, что вы забыли создать папку или из-за опечатки в конце цикла обучения!
-
-</Tip>
+> [!TIP]
+> Предварительное обучение языковой модели займет некоторое время. Мы рекомендуем сначала запустить цикл обучения на выборке данных, раскомментировав две частичные строки выше, и убедиться, что обучение успешно завершено и модели сохранены. Нет ничего обиднее, чем неудачное обучение на последнем этапе из-за того, что вы забыли создать папку или из-за опечатки в конце цикла обучения!
 
 Давайте рассмотрим пример из датасета. Мы покажем только первые 200 символов каждого поля:
 
@@ -252,11 +249,8 @@ DatasetDict({
 
 Теперь, когда у нас есть готовый датасет, давайте создадим модель!
 
-<Tip>
-
-✏️ **Попробуйте!** Избавление от всех фрагментов, размер которых меньше размера контекста, не является большой проблемой, поскольку мы используем небольшие контекстные окна. При увеличении размера контекста (или если у вас корпус коротких документов) доля отбрасываемых фрагментов также будет расти. Более эффективный способ подготовки данных - объединить все токенизированные примеры в батч с маркером `eos_token_id` между ними, а затем выполнить фрагментацию на конкатенированных последовательностях. В качестве упражнения измените функцию `tokenize()`, чтобы использовать этот подход. Обратите внимание, что вам нужно установить `truncation=False` и удалить другие аргументы из токенизатора, чтобы получить полную последовательность идентификаторов токенов.
-
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Избавление от всех фрагментов, размер которых меньше размера контекста, не является большой проблемой, поскольку мы используем небольшие контекстные окна. При увеличении размера контекста (или если у вас корпус коротких документов) доля отбрасываемых фрагментов также будет расти. Более эффективный способ подготовки данных - объединить все токенизированные примеры в батч с маркером `eos_token_id` между ними, а затем выполнить фрагментацию на конкатенированных последовательностях. В качестве упражнения измените функцию `tokenize()`, чтобы использовать этот подход. Обратите внимание, что вам нужно установить `truncation=False` и удалить другие аргументы из токенизатора, чтобы получить полную последовательность идентификаторов токенов.
 
 
 ## Инициализация новой модели[[initializing-a-new-model]]
@@ -398,11 +392,8 @@ tf_eval_dataset = model.prepare_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ Сдвиг входов и меток для их выравнивания происходит внутри модели, поэтому коллатор данных просто копирует входы для создания меток.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Сдвиг входов и меток для их выравнивания происходит внутри модели, поэтому коллатор данных просто копирует входы для создания меток.
 
 
 Теперь у нас есть все необходимое для обучения нашей модели - в конце концов, это было не так уж и сложно! Прежде чем приступить к обучению, мы должны войти в Hugging Face. Если вы работаете в блокноте, вы можете сделать это с помощью следующей служебной функции:
@@ -501,25 +492,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** Всего около 30 строк кода в дополнение к `TrainingArguments` понадобилось нам, чтобы перейти от сырых текстов к обучению GPT-2. Попробуйте это на своем датасете и посмотрите, сможете ли вы получить хорошие результаты!
 
-✏️ **Попробуйте!** Всего около 30 строк кода в дополнение к `TrainingArguments` понадобилось нам, чтобы перейти от сырых текстов к обучению GPT-2. Попробуйте это на своем датасете и посмотрите, сможете ли вы получить хорошие результаты!
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 Если у вас есть доступ к компьютеру с несколькими GPU, попробуйте запустить код на нем. `Trainer` автоматически управляет несколькими компьютерами, и это может значительно ускорить обучение.
-
-{:else}
-
-💡 Если у вас есть доступ к компьютеру с несколькими GPU, вы можете попробовать использовать контекст `MirroredStrategy` для существенного ускорения обучения. Вам нужно будет создать объект `tf.distribute.MirroredStrategy` и убедиться, что все методы `to_tf_dataset()` или `prepare_tf_dataset()`, а также создание модели и вызов `fit()` выполняются в его контексте `scope()`. Документацию на эту тему можно посмотреть [здесь](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 Если у вас есть доступ к компьютеру с несколькими GPU, попробуйте запустить код на нем. `Trainer` автоматически управляет несколькими компьютерами, и это может значительно ускорить обучение.
+>
+> {:else}
+>
+> 💡 Если у вас есть доступ к компьютеру с несколькими GPU, вы можете попробовать использовать контекст `MirroredStrategy` для существенного ускорения обучения. Вам нужно будет создать объект `tf.distribute.MirroredStrategy` и убедиться, что все методы `to_tf_dataset()` или `prepare_tf_dataset()`, а также создание модели и вызов `fit()` выполняются в его контексте `scope()`. Документацию на эту тему можно посмотреть [здесь](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
+>
+> {/if}
 
 ## Генерация кода с помощью конвейера[[code-generation-with-a-pipeline]]
 
@@ -795,11 +780,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Если вы проводите обучение на TPU, вам нужно будет перенести весь код, начиная с ячейки выше, в выделенную функцию обучения. Подробнее смотрите [Главу 3](../chapter3/1).
-
-</Tip>
+> [!TIP]
+> 🚨 Если вы проводите обучение на TPU, вам нужно будет перенести весь код, начиная с ячейки выше, в выделенную функцию обучения. Подробнее смотрите [Главу 3](../chapter3/1).
 
 Теперь, когда мы отправили наш `train_dataloader` в `accelerator.prepare()`, мы можем использовать его длину для вычисления количества шагов обучения. Помните, что это всегда нужно делать после подготовки загрузчика данных, так как этот метод изменит его длину. Мы используем классический линейный график скорости обучения до 0:
 
@@ -899,16 +881,10 @@ for epoch in range(num_train_epochs):
 
 Вот и все -- теперь у вас есть свой собственный цикл обучения для каузальных языковых моделей, таких как GPT-2, который вы можете дополнительно настроить под свои нужды. 
 
-<Tip>
-
-✏️ **Попробуйте!** Либо создайте свою собственную функцию потерь, подходящую для вашего случая, либо добавьте еще один пользовательский шаг в цикл обучения.
-
-</Tip>
-
-<Tip>
-
-✏️ **Попробуйте!** При проведении длительных экспериментов по обучению полезно регистрировать важные метрики с помощью таких инструментов, как TensorBoard или Weights & Biases. Добавьте соответствующее логирование в цикл обучения, чтобы вы всегда могли проверить, как проходит обучение.
+> [!TIP]
+> ✏️ **Попробуйте!** Либо создайте свою собственную функцию потерь, подходящую для вашего случая, либо добавьте еще один пользовательский шаг в цикл обучения.
 
-</Tip>
+> [!TIP]
+> ✏️ **Попробуйте!** При проведении длительных экспериментов по обучению полезно регистрировать важные метрики с помощью таких инструментов, как TensorBoard или Weights & Biases. Добавьте соответствующее логирование в цикл обучения, чтобы вы всегда могли проверить, как проходит обучение.
 
 {/if}
diff --git a/chapters/ru/chapter7/7.mdx b/chapters/ru/chapter7/7.mdx
index c3c6764c1..2afc87576 100644
--- a/chapters/ru/chapter7/7.mdx
+++ b/chapters/ru/chapter7/7.mdx
@@ -32,11 +32,8 @@
 
 На самом деле это демонстрация модели, которая была обучена и загружена на Hub с помощью кода, показанного в этом разделе. Вы можете найти ее и перепроверить прогнозы [здесь](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F).
 
-<Tip>
-
-💡 Модели, основанные только на энкодере, такие как BERT, как правило, отлично справляются с извлечением ответов на фактоидные вопросы типа "Кто изобрел архитектуру трансформера?", но плохо справляются с открытыми вопросами типа "Почему небо голубое?". В таких сложных случаях для синтеза информации обычно используются модели энкодеров-декодеров, такие как T5 и BART, что очень похоже на [сумризацию текста](/course/chapter7/5). Если вам интересен этот тип *генеративных* ответов на вопросы, рекомендуем ознакомиться с нашим [демо](https://yjernite.github.io/lfqa.html) основанным на [датасете ELI5](https://huggingface.co/datasets/eli5).
-
-</Tip>
+> [!TIP]
+> 💡 Модели, основанные только на энкодере, такие как BERT, как правило, отлично справляются с извлечением ответов на фактоидные вопросы типа "Кто изобрел архитектуру трансформера?", но плохо справляются с открытыми вопросами типа "Почему небо голубое?". В таких сложных случаях для синтеза информации обычно используются модели энкодеров-декодеров, такие как T5 и BART, что очень похоже на [суммаризацию текста](/course/chapter7/5). Если вам интересен этот тип *генеративных* ответов на вопросы, рекомендуем ознакомиться с нашим [демо](https://yjernite.github.io/lfqa.html), основанным на [датасете ELI5](https://huggingface.co/datasets/eli5).
 
 ## Подготовка данных[[preparing-the-data]]
 
@@ -359,11 +356,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 Действительно, мы не видим ответа в контексте.
 
-<Tip>
-
-✏️ **Ваша очередь!** При использовании архитектуры XLNet дополнение применяется слева, а вопрос и контекст меняются местами. Адаптируйте весь код, который мы только что рассмотрели, к архитектуре XLNet (и добавьте `padding=True`). Имейте в виду, что токен `[CLS]` может не находиться в позиции 0 при использовании дополнения.
-
-</Tip>
+> [!TIP]
+> ✏️ **Ваша очередь!** При использовании архитектуры XLNet дополнение применяется слева, а вопрос и контекст меняются местами. Адаптируйте весь код, который мы только что рассмотрели, к архитектуре XLNet (и добавьте `padding=True`). Имейте в виду, что токен `[CLS]` может не находиться в позиции 0 при использовании дополнения.
 
 Теперь, когда мы шаг за шагом разобрались с предварительной обработкой обучающих данных, мы можем сгруппировать их в функцию, которую будем применять ко всему датасету. Мы дополним каждый признак до максимальной длины, которую мы задали, поскольку большинство контекстов будут длинными (и соответствующие образцы будут разбиты на несколько признаков), поэтому применение динамического дополнения здесь не имеет реальной пользы:
 
@@ -908,11 +902,8 @@ tf.keras.mixed_precision.set_global_policy("mixed_float16")
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 Если используемый вами выходной каталог существует, он должен быть локальным клоном того розитория, в который вы хотите отправлять данные (поэтому задайте новое имя, если вы получите ошибку при определении `Trainer`).
-
-</Tip>
+> [!TIP]
+> 💡 Если используемый вами выходной каталог существует, он должен быть локальным клоном того репозитория, в который вы хотите отправлять данные (поэтому задайте новое имя, если вы получите ошибку при определении `Trainer`).
 
 Наконец, мы просто передаем все в класс `Trainer` и запускаем обучение:
 
@@ -996,11 +987,8 @@ trainer.push_to_hub(commit_message="Training complete")
 
 На этом этапе вы можете использовать виджет инференса на Model Hub, чтобы протестировать модель и поделиться ею с друзьями, семьей и любимыми питомцами. Вы успешно провели дообучение модели для задачи ответа на вопрос - поздравляем!
 
-<Tip>
-
-✏️ **Ваша очередь!** Попробуйте другую архитектуру модели, чтобы узнать, лучше ли она справляется с этой задачей!
-
-</Tip>
+> [!TIP]
+> ✏️ **Ваша очередь!** Попробуйте другую архитектуру модели, чтобы узнать, лучше ли она справляется с этой задачей!
 
 {#if fw === 'pt'}
 
diff --git a/chapters/ru/chapter8/2.mdx b/chapters/ru/chapter8/2.mdx
index 08b975198..bf03edf88 100644
--- a/chapters/ru/chapter8/2.mdx
+++ b/chapters/ru/chapter8/2.mdx
@@ -85,11 +85,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 В этих отчетах содержится много информации, поэтому давайте вместе пройдемся по ключевым местам. Первое, что следует отметить, - это то, что трассировки следует читать _снизу вверх_. Это может показаться странным, если вы привыкли читать английский текст сверху вниз, но это логично: трассировка показывает последовательность вызовов функций, которые делает `pipeline` при загрузке модели и токенизатора. (Более подробно о том, как работает `pipeline` под капотом, читайте в [Главе 2](../chapter2/1)).
 
-<Tip>
-
-🚨 Видите синюю рамку вокруг "6 frames" в трассировке Google Colab? Это специальная функция Colab, которая помещает отчет в раскрывающийся блок текста. Если вы не можете найти источник ошибки, обязательно раскройте этот блок, нажав на эти две маленькие стрелки.
-
-</Tip>
+> [!TIP]
+> 🚨 Видите синюю рамку вокруг "6 frames" в трассировке Google Colab? Это специальная функция Colab, которая помещает отчет в раскрывающийся блок текста. Если вы не можете найти источник ошибки, обязательно раскройте этот блок, нажав на эти две маленькие стрелки.
 
 Последняя строка трассировки указывает на последнее сообщение об ошибке и дает имя исключения, которое было вызвано. В данном случае тип исключения - `OSError`, что указывает на системную ошибку. Если мы прочитаем сопроводительное сообщение об ошибке, то увидим, что, похоже, возникла проблема с файлом *config.json* модели, и нам предлагается два варианта ее устранения:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Если вы столкнулись с сообщением об ошибке, которое трудно понять, просто скопируйте и вставьте его в строку поиска Google или [Stack Overflow](https://stackoverflow.com/) (да, действительно!). Велика вероятность того, что вы не первый, кто столкнулся с этой ошибкой, и это хороший способ найти решения, которые опубликовали другие члены сообщества. Например, поиск по запросу `OSError: Can't load config for` на Stack Overflow дает несколько [результатов](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+), которые можно использовать в качестве отправной точки для решения проблемы.
-
-</Tip>
+> [!TIP]
+> 💡 Если вы столкнулись с сообщением об ошибке, которое трудно понять, просто скопируйте и вставьте его в строку поиска Google или [Stack Overflow](https://stackoverflow.com/) (да, действительно!). Велика вероятность того, что вы не первый, кто столкнулся с этой ошибкой, и это хороший способ найти решения, которые опубликовали другие члены сообщества. Например, поиск по запросу `OSError: Can't load config for` на Stack Overflow дает несколько [результатов](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+), которые можно использовать в качестве отправной точки для решения проблемы.
 
 В первом предложении нам предлагается проверить, действительно ли идентификатор модели правильный, поэтому первым делом нужно скопировать идентификатор и вставить его в строку поиска Hub:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 Применяемый здесь подход не является надежным, поскольку наш коллега мог изменить конфигурацию `distilbert-base-uncased` перед дообучением модели. В реальной жизни мы бы хотели сначала уточнить у него, но для целей этого раздела будем считать, что он использовал конфигурацию по умолчанию.
-
-</Tip>
+> [!WARNING]
+> 🚨 Применяемый здесь подход не является надежным, поскольку наш коллега мог изменить конфигурацию `distilbert-base-uncased` перед дообучением модели. В реальной жизни мы бы хотели сначала уточнить у него, но для целей этого раздела будем считать, что он использовал конфигурацию по умолчанию.
 
 Затем мы можем отправить это в наш репозиторий моделей вместе с конфигурацией с помощью функции `push_to_hub()`:
 
diff --git a/chapters/ru/chapter8/4.mdx b/chapters/ru/chapter8/4.mdx
index 2141a5afe..38561cd07 100644
--- a/chapters/ru/chapter8/4.mdx
+++ b/chapters/ru/chapter8/4.mdx
@@ -245,11 +245,8 @@ trainer.train_dataset.features["label"].names
 
 Здесь у нас нет идентификаторов типов токенов, поскольку DistilBERT их не ожидает; если в вашей модели они есть, вам также следует убедиться, что они правильно соответствуют месту первого и второго предложений во входных данных.
 
-<Tip>
-
-✏️ **Ваша очередь!** Проверьте, все ли правильно со вторым элементом обучающего набора данных.
-
-</Tip>
+> [!TIP]
+> ✏️ **Ваша очередь!** Проверьте, все ли правильно со вторым элементом обучающего набора данных.
 
 В данном случае мы проверяем только обучающий набор, но, конечно, вы должны дважды проверить валидационный и тестовый наборы таким же образом.
 
@@ -522,11 +519,8 @@ trainer.optimizer.step()
 
 Чтобы решить эту проблему, нужно просто использовать меньше памяти на GPU - что зачастую легче сказать, чем сделать. Во-первых, убедитесь, что у вас нет двух моделей на GPU одновременно (если, конечно, это не требуется для решения вашей задачи). Затем, вероятно, следует уменьшить размер батча, поскольку он напрямую влияет на размеры всех промежуточных выходов модели и их градиентов. Если проблема сохраняется, подумайте о том, чтобы использовать меньшую версию модели.
 
-<Tip>
-
-В следующей части курса мы рассмотрим более продвинутые техники, которые помогут вам уменьшить объем занимаемой памяти и позволят точно настроить самые большие модели.
-
-</Tip>
+> [!TIP]
+> В следующей части курса мы рассмотрим более продвинутые техники, которые помогут вам уменьшить объем занимаемой памяти и позволят дообучать самые большие модели.
 
 ### Валидация модели[[evaluating-the-model]]
 
@@ -553,11 +547,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 Перед запуском `trainer.train()` всегда следует убедиться, что вы можете запустить `trainer.evaluate()`, чтобы не тратить много вычислительных ресурсов до того, как столкнетесь с ошибкой.
-
-</Tip>
+> [!TIP]
+> 💡 Перед запуском `trainer.train()` всегда следует убедиться, что вы можете запустить `trainer.evaluate()`, чтобы не тратить много вычислительных ресурсов до того, как столкнетесь с ошибкой.
 
 Прежде чем пытаться отладить проблему в цикле валидации, нужно сначала убедиться, что вы посмотрели на данные, смогли правильно сформировать батч и запустить на нем свою модель. Мы выполнили все эти шаги, поэтому следующий код может быть выполнен без ошибок:
 
@@ -687,10 +678,8 @@ trainer.train()
 
 В этом случае проблем больше нет, и наш скрипт обучит модель, которая должна дать приемлемые результаты. Но что делать, если обучение проходит без ошибок, а обученная модель совсем не работает? Это самая сложная часть машинного обучения, и мы покажем вам несколько приемов, которые могут помочь.
 
-<Tip>
-
-💡 Если вы используете ручной цикл обучения, для отладки пайплайна обучения применимы те же шаги, но их проще разделить. Убедитесь, что вы не забыли `model.eval()` или `model.train()` в нужных местах, или `zero_grad()` на каждом шаге!
-</Tip>
+> [!TIP]
+> 💡 Если вы используете ручной цикл обучения, для отладки пайплайна обучения применимы те же шаги, но их проще разделить. Убедитесь, что вы не забыли `model.eval()` или `model.train()` в нужных местах, или `zero_grad()` на каждом шаге!
 
 ## Отладка скрытых ошибок во время обучения[[debugging-silent-errors-during-training]]
 
@@ -705,11 +694,8 @@ trainer.train()
 - Есть ли одна метка, которая встречается чаще других?
 - Каким должно быть значение функции потерь/метрики если модель предсказала случайный ответ/всегда один и тот же ответ?
 
-<Tip warning={true}>
-
-⚠️ Если вы проводите распределенное обучение, распечатайте образцы набора данных в каждом процессе и трижды проверьте, что вы получаете одно и то же. Одна из распространенных ошибок - наличие некоторого источника случайности при создании данных, из-за которого каждый процесс имеет свою версию набора данных.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Если вы проводите распределенное обучение, распечатайте образцы набора данных в каждом процессе и трижды проверьте, что вы получаете одно и то же. Одна из распространенных ошибок - наличие некоторого источника случайности при создании данных, из-за которого каждый процесс имеет свою версию набора данных.
 
 Просмотрев данные, проанализируйте несколько предсказаний модели и декодируйте их. Если модель постоянно предсказывает одно и то же, это может быть связано с тем, что ваш набор данных смещен в сторону одной категории (для проблем классификации); здесь могут помочь такие методы, как oversampling редких классов.
 
@@ -738,11 +724,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 Если ваши обучающие данные несбалансированы, обязательно создайте батч обучающих данных, содержащий все метки.
-
-</Tip>
+> [!TIP]
+> 💡 Если ваши обучающие данные несбалансированы, обязательно создайте батч обучающих данных, содержащий все метки.
 
 Результирующая модель должна иметь близкие к идеальным результаты на одном и том же батче. Вычислим метрику по полученным предсказаниям:
 
@@ -763,11 +746,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 Если вам не удается добиться от модели таких идеальных результатов, значит, что-то не так с постановкой задачи или данными, и вам следует это исправить. Только когда вам удастся пройти тест на переобучение, вы сможете быть уверены, что ваша модель действительно способна чему-то научиться.
 
-<Tip warning={true}>
-
-⚠️ Вам придется пересоздать модель и `Trainer` после этого теста на переобучение, поскольку полученная модель, вероятно, не сможет восстановиться и научиться чему-то полезному на полном наборе данных.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Вам придется пересоздать модель и `Trainer` после этого теста на переобучение, поскольку полученная модель, вероятно, не сможет восстановиться и научиться чему-то полезному на полном наборе данных.
 
 ### Не обучайте ничего, пока не получите первый бейзлайн.[[dont-tune-anything-until-you-have-a-first-baseline]]
 
diff --git a/chapters/ru/chapter8/4_tf.mdx b/chapters/ru/chapter8/4_tf.mdx
index c88ce58e1..1574389cc 100644
--- a/chapters/ru/chapter8/4_tf.mdx
+++ b/chapters/ru/chapter8/4_tf.mdx
@@ -111,15 +111,12 @@ model.compile(optimizer="adam")
 
 Теперь мы будем использовать конкретную функцию  потерь модели, и эта проблема должна быть решена!
 
-<Tip>
-
-✏️ **Ваша очередь!** В качестве дополнительной задачи после решения других проблем вы можете попробовать вернуться к этому шагу и заставить модель работать с оригинальной функцией потерь, вычисленными Keras. Вам нужно будет добавить `"labels"` к аргументу `label_cols` в `to_tf_dataset()`, чтобы обеспечить корректный вывод меток, что позволит получить градиенты - но есть еще одна проблема с функцией потерь, которую мы указали. Обучение будет продолжаться и с этой проблемой, но обучение будет происходить очень медленно и застопорится на высоком уровне потерь при обучении. Можете ли вы понять, в чем дело?
-
-Подсказка в кодировке ROT13, если вы застряли: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
-
-И вторая подсказка: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
-
-</Tip>
+> [!TIP]
+> ✏️ **Ваша очередь!** В качестве дополнительной задачи после решения других проблем вы можете попробовать вернуться к этому шагу и заставить модель работать с оригинальной функцией потерь, вычисленной Keras. Вам нужно будет добавить `"labels"` к аргументу `label_cols` в `to_tf_dataset()`, чтобы обеспечить корректный вывод меток, что позволит получить градиенты - но есть еще одна проблема с функцией потерь, которую мы указали. Обучение будет продолжаться и с этой проблемой, но будет происходить очень медленно и застопорится на высоком значении функции потерь. Можете ли вы понять, в чем дело?
+>
+> Подсказка в кодировке ROT13, если вы застряли: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
+>
+> И вторая подсказка: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
 
 Теперь попробуем провести обучение. Теперь мы должны получить градиенты, так что, надеюсь (здесь играет зловещая музыка), мы можем просто вызвать `model.fit()` и все будет работать отлично!
 
@@ -362,11 +359,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡 Вы также можете импортировать функцию `create_optimizer()` из 🤗 Transformers, которая даст вам оптимизатор AdamW с правильным затуханием весов, а также прогревом и затуханием скорости обучения. Этот оптимизатор часто дает несколько лучшие результаты, чем оптимизатор Adam по умолчанию.
-
-</Tip>
+> [!TIP]
+> 💡 Вы также можете импортировать функцию `create_optimizer()` из 🤗 Transformers, которая даст вам оптимизатор AdamW с правильным затуханием весов, а также прогревом и затуханием скорости обучения. Этот оптимизатор часто дает несколько лучшие результаты, чем оптимизатор Adam по умолчанию.
 
 Теперь мы можем попробовать подогнать модель под новую, улучшенную скорость обучения:
 
@@ -388,11 +382,8 @@ model.fit(train_dataset)
 
 Признаком нехватки памяти является ошибка типа "OOM when allocating tensor" - OOM - это сокращение от "out of memory". Это очень распространенная опасность при работе с большими языковыми моделями. Если вы столкнулись с этим, хорошая стратегия - уменьшить размер батча вдвое и попробовать снова. Однако имейте в виду, что некоторые модели *очень* велики. Например, полноразмерная модель GPT-2 имеет 1,5 млрд. параметров, что означает, что вам потребуется 6 Гб памяти только для хранения модели и еще 6 Гб для ее градиентов! Для обучения полной модели GPT-2 обычно требуется более 20 ГБ VRAM, независимо от размера батча, что есть лишь у некоторых GPU. Более легкие модели, такие как `distilbert-base-cased`, гораздо легче запускать, и они обучаются гораздо быстрее.
 
-<Tip>
-
-В следующей части курса мы рассмотрим более продвинутые техники, которые помогут вам уменьшить объем занимаемой памяти и позволят дообучить самые большие модели.
-
-</Tip>
+> [!TIP]
+> В следующей части курса мы рассмотрим более продвинутые техники, которые помогут вам уменьшить объем занимаемой памяти и позволят дообучить самые большие модели.
 
 ### "Голодный" TensorFlow 🦛[[hungry-hungry-tensorflow]]
 
@@ -448,21 +439,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 Если ваши обучающие данные несбалансированы, обязательно создайте партию обучающих данных, содержащую все метки.
-
-</Tip>
+> [!TIP]
+> 💡 Если ваши обучающие данные несбалансированы, обязательно создайте батч обучающих данных, содержащий все метки.
 
 Полученная модель должна иметь близкие к идеальным результаты для `батча`, значение функции потерь должно быстро уменьшаться до 0 (или минимальному значению для используемой вами функции потерь).
 
 Если вам не удается добиться идеальных результатов, значит, что-то не так с постановкой задачи или данными, и вам следует это исправить. Только когда вам удастся пройти тест на избыточную подгонку, вы сможете быть уверены, что ваша модель действительно способна чему-то научиться.
 
-<Tip warning={true}>
-
-⚠️ Вам придется пересоздать модель и перекомпилировать ее после этого теста на переобучение, поскольку полученная модель, вероятно, не сможет восстановиться и научиться чему-то полезному на полном наборе данных.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Вам придется пересоздать модель и перекомпилировать ее после этого теста на переобучение, поскольку полученная модель, вероятно, не сможет восстановиться и научиться чему-то полезному на полном наборе данных.
 
 ### Не обучайте ничего, пока не получите первый бейзлайн.[[dont-tune-anything-until-you-have-a-first-baseline]]
 
diff --git a/chapters/ru/chapter8/5.mdx b/chapters/ru/chapter8/5.mdx
index c24eabf50..fcfca9598 100644
--- a/chapters/ru/chapter8/5.mdx
+++ b/chapters/ru/chapter8/5.mdx
@@ -17,11 +17,8 @@
 
 Очень важно изолировать часть кода, в которой возникает ошибка, поскольку никто из команды Hugging Face не является волшебником (пока), и они не могут исправить то, чего не видят. Минимальный воспроизводимый пример, как видно из названия, должен быть воспроизводимым. Это значит, что он не должен опираться на какие-либо внешние файлы или данные, которые могут у вас быть. Попробуйте заменить используемые данные какими-нибудь фиктивными значениями, которые выглядят как настоящие и при этом выдают ту же ошибку.
 
-<Tip>
-
-🚨 Многие проблемы в репозитории 🤗 Transformers остаются нерешенными, потому что данные, использованные для их воспроизведения, недоступны.
-
-</Tip>
+> [!TIP]
+> 🚨 Многие проблемы в репозитории 🤗 Transformers остаются нерешенными, потому что данные, использованные для их воспроизведения, недоступны.
 
 Когда у вас есть что-то самодостаточное, вы можете попытаться сократить его до еще меньшего количества строк кода, создав то, что мы называем _минимальным воспроизводимым примером_. Хотя это требует немного больше работы с вашей стороны, вы почти гарантированно получите помощь и исправление, если предоставите хороший, короткий пример воспроизведения ошибки.
 
diff --git a/chapters/ru/chapter9/1.mdx b/chapters/ru/chapter9/1.mdx
index 4af88378c..d23299e98 100644
--- a/chapters/ru/chapter9/1.mdx
+++ b/chapters/ru/chapter9/1.mdx
@@ -32,6 +32,5 @@
 
 Эта глава разбита на разделы, включающие как _концепции_, так и _приложения_. После изучения концепций в каждом разделе вы будете применять их для создания демо определенного типа, начиная от классификации изображений и заканчивая распознаванием речи. К тому времени, как вы закончите эту главу, вы сможете создавать эти демо (и многие другие!) всего в несколько строк кода на Python.
 
-<Tip>
-👀 Проверьте <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> чтобы увидеть множество свежих примеров демо машинного обучения, созданных сообществом специалистов по машинному обучению!
-</Tip>
\ No newline at end of file
+> [!TIP]
+> 👀 Проверьте <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a>, чтобы увидеть множество свежих примеров демо машинного обучения, созданных сообществом специалистов по машинному обучению!
\ No newline at end of file
diff --git a/chapters/ru/chapter9/7.mdx b/chapters/ru/chapter9/7.mdx
index de0e1f8db..c1b62a8d6 100644
--- a/chapters/ru/chapter9/7.mdx
+++ b/chapters/ru/chapter9/7.mdx
@@ -62,10 +62,8 @@ demo.launch()
 
 1. Блоки позволяют создавать веб-приложения, сочетающие в себе разметку, HTML, кнопки и интерактивные компоненты, просто инстанцируя объекты на Python в контексте `with gradio.Blocks`.
 
-<Tip>
-🙋Если вы не знакомы с оператором `with` в Python, рекомендуем ознакомиться с отличным [руководством](https://realpython.com/python-with-statement/) от Real Python. Возвращайтесь сюда после его прочтения 🤗
-
-</Tip>
+> [!TIP]
+> 🙋Если вы не знакомы с оператором `with` в Python, рекомендуем ознакомиться с отличным [руководством](https://realpython.com/python-with-statement/) от Real Python. Возвращайтесь сюда после его прочтения 🤗
 
 Порядок, в котором вы инстанцируете компоненты, имеет значение, поскольку каждый элемент отображается в веб-приложении в том порядке, в котором он был создан. (Более сложные макеты рассматриваются ниже)
 
diff --git a/chapters/te/chapter1/1.mdx b/chapters/te/chapter1/1.mdx
index c62a72c30..a7985eba7 100644
--- a/chapters/te/chapter1/1.mdx
+++ b/chapters/te/chapter1/1.mdx
@@ -142,14 +142,13 @@
 
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/subtitles.png" alt="Activating subtitles for the Hugging Face course YouTube videos" width="75%">
 
-<Tip>
-  పైన పట్టికలో మీ భాష కనిపించలేదా లేదా మీరు ఇప్పటికే ఉన్న అనువాదానికి
-  సహకరించాలనుకుంటున్నారా? ఇక్కడ ఉన్న సూచనలను{" "}
-  <a href="https://github.com/huggingface/course#translating-the-course-into-your-language">
-    ఇక్కడ
-  </a>
-  అనుసరించడం ద్వారా మీరు కోర్సును అనువదించడానికి మాకు సహాయం చేయవచ్చు.
-</Tip>
+> [!TIP]
+> పైన పట్టికలో మీ భాష కనిపించలేదా లేదా మీరు ఇప్పటికే ఉన్న అనువాదానికి
+>   సహకరించాలనుకుంటున్నారా? ఇక్కడ ఉన్న సూచనలను{" "}
+>   <a href="https://github.com/huggingface/course#translating-the-course-into-your-language">
+>     ఇక్కడ
+>   </a>
+>   అనుసరించడం ద్వారా మీరు కోర్సును అనువదించడానికి మాకు సహాయం చేయవచ్చు.
 
 ## ప్రారంభిద్దాం 🚀
 
diff --git a/chapters/te/chapter1/2.mdx b/chapters/te/chapter1/2.mdx
index ab759edbc..210ad08e2 100644
--- a/chapters/te/chapter1/2.mdx
+++ b/chapters/te/chapter1/2.mdx
@@ -24,11 +24,8 @@ NLP కేవలం వ్రాతపూర్వక వచనానికి 
 
 ఇటీవలి సంవత్సరాలలో, NLP రంగం Large Language Models (LLMs) ద్వారా విప్లవాత్మకంగా మారింది. GPT (Generative Pre-trained Transformer) మరియు [Llama](https://huggingface.co/meta-llama), వంటి ఆర్కిటెక్చర్లను కలిగి ఉన్న ఈ మోడల్స్, భాషా ప్రాసెసింగ్‌లో సాధ్యమయ్యే వాటిని మార్చాయి.
 
-<Tip>
-
-ఒక Large Language Model (LLM) అనేది భారీ మొత్తంలో వచన డేటాపై శిక్షణ పొందిన ఒక AI మోడల్, ఇది మానవ-వంటి వచనాన్ని అర్థం చేసుకోగలదు మరియు ఉత్పత్తి చేయగలదు, భాషలో నమూనాలను గుర్తించగలదు మరియు టాస్క్-నిర్దిష్ట శిక్షణ లేకుండా విస్తృత శ్రేణి భాషా పనులను చేయగలదు. అవి Natural Language Processing (NLP) రంగంలో గణనీయమైన పురోగతిని సూచిస్తాయి.
-
-</Tip>
+> [!TIP]
+> ఒక Large Language Model (LLM) అనేది భారీ మొత్తంలో వచన డేటాపై శిక్షణ పొందిన ఒక AI మోడల్, ఇది మానవ-వంటి వచనాన్ని అర్థం చేసుకోగలదు మరియు ఉత్పత్తి చేయగలదు, భాషలో నమూనాలను గుర్తించగలదు మరియు టాస్క్-నిర్దిష్ట శిక్షణ లేకుండా విస్తృత శ్రేణి భాషా పనులను చేయగలదు. అవి Natural Language Processing (NLP) రంగంలో గణనీయమైన పురోగతిని సూచిస్తాయి.
 
 LLMలు వీటి ద్వారా వర్గీకరించబడతాయి:
 
diff --git a/chapters/te/chapter1/3.mdx b/chapters/te/chapter1/3.mdx
index 28f4e3755..3dc3f04a9 100644
--- a/chapters/te/chapter1/3.mdx
+++ b/chapters/te/chapter1/3.mdx
@@ -19,12 +19,10 @@
 
 ఈ విభాగంలో, Transformer మోడల్స్ ఏమి చేయగలవో చూద్దాం మరియు 🤗 Transformers లైబ్రరీ నుండి మా మొదటి సాధనం: `pipeline()` ఫంక్షన్‌ను ఉపయోగిద్దాం.
 
-<Tip>
-👀 కుడివైపు పైభాగంలో ఉన్న <em>Open in Colab</em> బటన్ చూడండి? దానిపై క్లిక్ చేసి, ఈ విభాగానికి సంబంధించిన అన్ని కోడ్ నమూనాలతో ఉన్న Google Colab నోట్‌బుక్‌ను తెరవండి. కోడ్ ఉదాహరణలు ఉన్న ఏ విభాగంలోనైనా ఈ బటన్ ఉంటుంది.
-
-మీరు ఉదాహరణలను స్థానికంగా అమలు చేయాలనుకుంటే, మేము <a href="/course/chapter0">సెటప్</a> ను చూడమని సిఫార్సు చేస్తాము.
-
-</Tip>
+> [!TIP]
+> 👀 కుడివైపు పైభాగంలో ఉన్న <em>Open in Colab</em> బటన్ చూడండి? దానిపై క్లిక్ చేసి, ఈ విభాగానికి సంబంధించిన అన్ని కోడ్ నమూనాలతో ఉన్న Google Colab నోట్‌బుక్‌ను తెరవండి. కోడ్ ఉదాహరణలు ఉన్న ఏ విభాగంలోనైనా ఈ బటన్ ఉంటుంది.
+>
+> మీరు ఉదాహరణలను స్థానికంగా అమలు చేయాలనుకుంటే, మేము <a href="/course/chapter0">సెటప్</a> ను చూడమని సిఫార్సు చేస్తాము.
 
 ## ట్రాన్స్‌ఫార్మర్‌లు ప్రతిచోటా ఉన్నాయి![[transformers-are-everywhere]]
 
@@ -34,11 +32,8 @@ Transformer మోడల్స్ సహజ భాషా ప్రాసెస
 
 [🤗 Transformers లైబ్రరీ](https://github.com/huggingface/transformers) ఆ పంచుకున్న మోడల్స్‌ను సృష్టించడానికి మరియు ఉపయోగించడానికి ఫంక్షనాలిటీని అందిస్తుంది. [మోడల్ హబ్](https://huggingface.co/models) లో మిలియన్ల కొద్దీ ముందుగా శిక్షణ పొందిన మోడల్స్ ఉన్నాయి, వాటిని ఎవరైనా డౌన్‌లోడ్ చేసి ఉపయోగించవచ్చు. మీరు మీ స్వంత మోడల్స్‌ను కూడా హబ్‌కు అప్‌లోడ్ చేయవచ్చు!
 
-<Tip>
-
-⚠️ Hugging Face హబ్ కేవలం Transformer మోడల్స్ కి మాత్రమే పరిమితం కాదు. ఎవరైనా ఏ రకమైన మోడల్స్ లేదా డేటాసెట్స్‌ను అయినా పంచుకోవచ్చు! అందుబాటులో ఉన్న అన్ని ఫీచర్ల నుండి ప్రయోజనం పొందడానికి <a href="https://huggingface.co/join">huggingface.co ఖాతాను సృష్టించుకోండి</a>!
-
-</Tip>
+> [!TIP]
+> ⚠️ Hugging Face హబ్ కేవలం Transformer మోడల్స్ కి మాత్రమే పరిమితం కాదు. ఎవరైనా ఏ రకమైన మోడల్స్ లేదా డేటాసెట్స్‌ను అయినా పంచుకోవచ్చు! అందుబాటులో ఉన్న అన్ని ఫీచర్ల నుండి ప్రయోజనం పొందడానికి <a href="https://huggingface.co/join">huggingface.co ఖాతాను సృష్టించుకోండి</a>!
 
 Transformer మోడల్స్ తెర వెనుక ఎలా పనిచేస్తాయో లోతుగా పరిశీలించే ముందు, కొన్ని ఆసక్తికరమైన NLP సమస్యలను పరిష్కరించడానికి అవి ఎలా ఉపయోగించబడతాయో కొన్ని ఉదాహరణలు చూద్దాం.
 
@@ -86,11 +81,8 @@ classifier(
 
 ఇక్కడ అందుబాటులో ఉన్న వాటి యొక్క స్థూలదృష్టి ఉంది:
 
-<Tip>
-
-పైప్‌లైన్‌ల పూర్తి మరియు నవీకరించబడిన జాబితా కోసం, [🤗 Transformers డాక్యుమెంటేషన్](https://huggingface.co/docs/hub/en/models-tasks) ను చూడండి.
-
-</Tip>
+> [!TIP]
+> పైప్‌లైన్‌ల పూర్తి మరియు నవీకరించబడిన జాబితా కోసం, [🤗 Transformers డాక్యుమెంటేషన్](https://huggingface.co/docs/hub/en/models-tasks) ను చూడండి.
 
 ### టెక్స్ట్ పైప్‌లైన్‌లు
 
@@ -141,11 +133,8 @@ classifier(
 
 ఈ పైప్‌లైన్‌ను _జీరో-షాట్_ అని అంటారు ఎందుకంటే దాన్ని ఉపయోగించడానికి మీరు మీ డేటాపై మోడల్‌ను ఫైన్-ట్యూన్ చేయాల్సిన అవసరం లేదు. ఇది మీకు కావలసిన లేబుల్స్ జాబితా కోసం నేరుగా సంభావ్యత స్కోర్‌లను తిరిగి ఇవ్వగలదు!
 
-<Tip>
-
-✏️ **ప్రయత్నించి చూడండి!** మీ స్వంత సీక్వెన్సులు మరియు లేబుల్స్‌తో ఆడుకోండి మరియు మోడల్ ఎలా ప్రవర్తిస్తుందో చూడండి.
-
-</Tip>
+> [!TIP]
+> ✏️ **ప్రయత్నించి చూడండి!** మీ స్వంత సీక్వెన్సులు మరియు లేబుల్స్‌తో ఆడుకోండి మరియు మోడల్ ఎలా ప్రవర్తిస్తుందో చూడండి.
 
 ## టెక్స్ట్ జనరేషన్[[text-generation]]
 
@@ -168,11 +157,8 @@ generator("In this course, we will teach you how to")
 
 మీరు `num_return_sequences` ఆర్గ్యుమెంట్‌తో ఎన్ని వేర్వేరు సీక్వెన్సులను రూపొందించాలో మరియు `max_length` ఆర్గ్యుమెంట్‌తో అవుట్‌పుట్ టెక్స్ట్ యొక్క మొత్తం పొడవును నియంత్రించవచ్చు.
 
-<Tip>
-
-✏️ **ప్రయత్నించి చూడండి!** `num_return_sequences` మరియు `max_length` ఆర్గ్యుమెంట్‌లను ఉపయోగించి 15 పదాల పొడవు గల రెండు వాక్యాలను రూపొందించండి.
-
-</Tip>
+> [!TIP]
+> ✏️ **ప్రయత్నించి చూడండి!** `num_return_sequences` మరియు `max_length` ఆర్గ్యుమెంట్‌లను ఉపయోగించి 15 పదాల పొడవు గల రెండు వాక్యాలను రూపొందించండి.
 
 ## హబ్ నుండి ఏ మోడల్‌నైనా పైప్‌లైన్‌లో ఉపయోగించడం[[using-any-model-from-the-hub-in-a-pipeline]]
 
@@ -203,11 +189,8 @@ generator(
 
 మీరు దానిపై క్లిక్ చేయడం ద్వారా ఒక మోడల్‌ను ఎంచుకున్న తర్వాత, దాన్ని నేరుగా ఆన్‌లైన్‌లో ప్రయత్నించడానికి ఒక విడ్జెట్ ఉందని మీరు చూస్తారు. ఈ విధంగా మీరు మోడల్‌ను డౌన్‌లోడ్ చేయడానికి ముందు దాని సామర్థ్యాలను త్వరగా పరీక్షించవచ్చు.
 
-<Tip>
-
-✏️ **ప్రయత్నించి చూడండి!** మరొక భాష కోసం టెక్స్ట్ జనరేషన్ మోడల్‌ను కనుగొనడానికి ఫిల్టర్‌లను ఉపయోగించండి. విడ్జెట్‌తో ఆడుకోవడానికి సంకోచించకండి మరియు దానిని పైప్‌లైన్‌లో ఉపయోగించండి!
-
-</Tip>
+> [!TIP]
+> ✏️ **ప్రయత్నించి చూడండి!** మరొక భాష కోసం టెక్స్ట్ జనరేషన్ మోడల్‌ను కనుగొనడానికి ఫిల్టర్‌లను ఉపయోగించండి. విడ్జెట్‌తో ఆడుకోవడానికి సంకోచించకండి మరియు దానిని పైప్‌లైన్‌లో ఉపయోగించండి!
 
 ### ఇన్ఫరెన్స్ ప్రొవైడర్లు[[inference-providers]]
 
@@ -239,11 +222,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 `top_k` ఆర్గ్యుమెంట్ మీరు ఎన్ని అవకాశాలను ప్రదర్శించాలనుకుంటున్నారో నియంత్రిస్తుంది. ఇక్కడ మోడల్ ప్రత్యేక `<mask>` పదాన్ని నింపుతుందని గమనించండి, దీనిని తరచుగా _మాస్క్ టోకెన్_ అని అంటారు. ఇతర మాస్క్-ఫిల్లింగ్ మోడల్స్ వేర్వేరు మాస్క్ టోకెన్‌లను కలిగి ఉండవచ్చు, కాబట్టి ఇతర మోడల్స్‌ను అన్వేషించేటప్పుడు సరైన మాస్క్ పదాన్ని ధృవీకరించడం ఎల్లప్పుడూ మంచిది. దానిని తనిఖీ చేయడానికి ఒక మార్గం విడ్జెట్‌లో ఉపయోగించిన మాస్క్ పదాన్ని చూడటం.
 
-<Tip>
-
-✏️ **ప్రయత్నించి చూడండి!** హబ్‌లో `bert-base-cased` మోడల్ కోసం శోధించి మరియు ఇన్ఫరెన్స్ API విడ్జెట్‌లో దాని మాస్క్ పదాన్ని గుర్తించండి. మా `pipeline` ఉదాహరణలోని వాక్యానికి ఈ మోడల్ ఏమి అంచనా వేస్తుంది?
-
-</Tip>
+> [!TIP]
+> ✏️ **ప్రయత్నించి చూడండి!** హబ్‌లో `bert-base-cased` మోడల్ కోసం శోధించి మరియు ఇన్ఫరెన్స్ API విడ్జెట్‌లో దాని మాస్క్ పదాన్ని గుర్తించండి. మా `pipeline` ఉదాహరణలోని వాక్యానికి ఈ మోడల్ ఏమి అంచనా వేస్తుంది?
 
 ## నేమ్డ్ ఎంటిటీ రికగ్నిషన్[[named-entity-recognition]]
 
@@ -267,11 +247,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 వాక్యంలోని ఒకే ఎంటిటీకి సంబంధించిన భాగాలను తిరిగి సమూహపరచమని పైప్‌లైన్‌కు చెప్పడానికి మేము పైప్‌లైన్ సృష్టి ఫంక్షన్‌లో `grouped_entities=True` ఎంపికను పాస్ చేస్తాము: ఇక్కడ మోడల్ "Hugging" మరియు "Face" ను ఒకే సంస్థగా సరిగ్గా సమూహపరిచింది, పేరు అనేక పదాలతో ఉన్నప్పటికీ. నిజానికి, మనం తదుపరి అధ్యాయంలో చూస్తాము, ప్రిప్రాసెసింగ్ కొన్ని పదాలను చిన్న భాగాలుగా కూడా విభజిస్తుంది. ఉదాహరణకు, `Sylvain` ను నాలుగు ముక్కలుగా విభజించారు: `S`, `##yl`, `##va`, మరియు `##in`. పోస్ట్-ప్రాసెసింగ్ దశలో, పైప్‌లైన్ ఆ ముక్కలను విజయవంతంగా తిరిగి సమూహపరిచింది.
 
-<Tip>
-
-✏️ **ప్రయత్నించి చూడండి!** ఇంగ్లీషులో పార్ట్-ఆఫ్-స్పీచ్ ట్యాగింగ్ (సాధారణంగా POS అని సంక్షిప్తం) చేయగల మోడల్ కోసం మోడల్ హబ్‌ను శోధించండి. పై ఉదాహరణలోని వాక్యానికి ఈ మోడల్ ఏమి అంచనా వేస్తుంది?
-
-</Tip>
+> [!TIP]
+> ✏️ **ప్రయత్నించి చూడండి!** ఇంగ్లీషులో పార్ట్-ఆఫ్-స్పీచ్ ట్యాగింగ్ (సాధారణంగా POS అని సంక్షిప్తం) చేయగల మోడల్ కోసం మోడల్ హబ్‌ను శోధించండి. పై ఉదాహరణలోని వాక్యానికి ఈ మోడల్ ఏమి అంచనా వేస్తుంది?
 
 ## ప్రశ్న సమాధానం[[question-answering]]
 
@@ -354,11 +331,8 @@ translator("Ce cours est produit par Hugging Face.")
 
 టెక్స్ట్ జనరేషన్ మరియు సారాంశీకరణ లాగే, మీరు ఫలితం కోసం `max_length` లేదా `min_length` ను నిర్దేశించవచ్చు.
 
-<Tip>
-
-✏️ **ప్రయత్నించి చూడండి!** ఇతర భాషలలో అనువాద మోడల్స్ కోసం శోధించి, మునుపటి వాక్యాన్ని కొన్ని వేర్వేరు భాషలలోకి అనువదించడానికి ప్రయత్నించండి.
-
-</Tip>
+> [!TIP]
+> ✏️ **ప్రయత్నించి చూడండి!** ఇతర భాషలలో అనువాద మోడల్స్ కోసం శోధించి, మునుపటి వాక్యాన్ని కొన్ని వేర్వేరు భాషలలోకి అనువదించడానికి ప్రయత్నించండి.
 
 ## ఇమేజ్ మరియు ఆడియో పైప్‌లైన్‌లు
 
diff --git a/chapters/te/chapter1/4.mdx b/chapters/te/chapter1/4.mdx
index f4f12f2aa..bf295d681 100644
--- a/chapters/te/chapter1/4.mdx
+++ b/chapters/te/chapter1/4.mdx
@@ -4,11 +4,8 @@
 
 ఈ విభాగంలో, మనం Transformer మోడల్స్ యొక్క ఆర్కిటెక్చర్‌ను పరిశీలిద్దాం మరియు అటెన్షన్, ఎన్‌కోడర్-డీకోడర్ ఆర్కిటెక్చర్ వంటి భావనలను లోతుగా చర్చిద్దాం.
 
-<Tip warning={true}>
-
-🚀 ఇక్కడ మనం విషయాలను మరింత లోతుగా తెలుసుకుంటున్నాం. ఈ విభాగం వివరంగా మరియు సాంకేతికంగా ఉంటుంది, కాబట్టి మీరు వెంటనే అన్నింటినీ అర్థం చేసుకోకపోయినా చింతించకండి. కోర్సులో తరువాత ఈ భావనల వద్దకు మనం తిరిగి వస్తాము.
-
-</Tip>
+> [!WARNING]
+> 🚀 ఇక్కడ మనం విషయాలను మరింత లోతుగా తెలుసుకుంటున్నాం. ఈ విభాగం వివరంగా మరియు సాంకేతికంగా ఉంటుంది, కాబట్టి మీరు వెంటనే అన్నింటినీ అర్థం చేసుకోకపోయినా చింతించకండి. కోర్సులో తరువాత ఈ భావనల వద్దకు మనం తిరిగి వస్తాము.
 
 ## Transformerల చరిత్ర గురించి కొంచెం[[a-bit-of-transformer-history]]
 
diff --git a/chapters/te/chapter1/5.mdx b/chapters/te/chapter1/5.mdx
index 0754c89ea..8bf4adfee 100644
--- a/chapters/te/chapter1/5.mdx
+++ b/chapters/te/chapter1/5.mdx
@@ -4,11 +4,8 @@
 
 [Transformers, what can they do?](/course/chapter1/3)లో, మీరు సహజ భాషా ప్రాసెసింగ్ (NLP), ప్రసంగం మరియు ఆడియో, కంప్యూటర్ విజన్ పనులు మరియు వాటి కొన్ని ముఖ్యమైన అనువర్తనాల గురించి తెలుసుకున్నారు. ఈ పేజీ మోడల్స్ ఈ పనులను ఎలా పరిష్కరిస్తాయో నిశితంగా పరిశీలిస్తుంది మరియు తెరవెనుక ఏమి జరుగుతుందో వివరిస్తుంది. ఒక నిర్దిష్ట పనిని పరిష్కరించడానికి చాలా మార్గాలు ఉన్నాయి, కొన్ని మోడల్స్ నిర్దిష్ట టెక్నిక్‌లను అమలు చేయవచ్చు లేదా పనిని కొత్త కోణం నుండి కూడా సంప్రదించవచ్చు, కానీ Transformer మోడల్స్ కోసం, సాధారణ ఆలోచన ఒకటే. దాని సౌకర్యవంతమైన ఆర్కిటెక్చర్ కారణంగా, చాలా మోడల్స్ ఒక ఎన్‌కోడర్, ఒక డీకోడర్, లేదా ఒక ఎన్‌కోడర్-డీకోడర్ నిర్మాణం యొక్క వైవిధ్యాలుగా ఉంటాయి.
 
-<Tip>
-
-నిర్దిష్ట ఆర్కిటెక్చరల్ వైవిధ్యాలలోకి ప్రవేశించే ముందు, చాలా పనులు ఒకే విధమైన పద్ధతిని అనుసరిస్తాయని అర్థం చేసుకోవడం సహాయపడుతుంది: ఇన్‌పుట్ డేటా ఒక మోడల్ ద్వారా ప్రాసెస్ చేయబడుతుంది మరియు అవుట్‌పుట్ ఒక నిర్దిష్ట పని కోసం వ్యాఖ్యానించబడుతుంది. తేడాలు డేటాను ఎలా తయారు చేస్తారు, ఏ మోడల్ ఆర్కిటెక్చర్ వైవిధ్యం ఉపయోగించబడుతుంది మరియు అవుట్‌పుట్ ఎలా ప్రాసెస్ చేయబడుతుంది అనే దానిలో ఉంటాయి.
-
-</Tip>
+> [!TIP]
+> నిర్దిష్ట ఆర్కిటెక్చరల్ వైవిధ్యాలలోకి ప్రవేశించే ముందు, చాలా పనులు ఒకే విధమైన పద్ధతిని అనుసరిస్తాయని అర్థం చేసుకోవడం సహాయపడుతుంది: ఇన్‌పుట్ డేటా ఒక మోడల్ ద్వారా ప్రాసెస్ చేయబడుతుంది మరియు అవుట్‌పుట్ ఒక నిర్దిష్ట పని కోసం వ్యాఖ్యానించబడుతుంది. తేడాలు డేటాను ఎలా తయారు చేస్తారు, ఏ మోడల్ ఆర్కిటెక్చర్ వైవిధ్యం ఉపయోగించబడుతుంది మరియు అవుట్‌పుట్ ఎలా ప్రాసెస్ చేయబడుతుంది అనే దానిలో ఉంటాయి.
 
 పనులు ఎలా పరిష్కరించబడతాయో వివరించడానికి, ఉపయోగకరమైన అంచనాలను అవుట్‌పుట్ చేయడానికి మోడల్ లోపల ఏమి జరుగుతుందో మేము వివరిస్తాము. మేము ఈ క్రింది మోడల్స్ మరియు వాటి సంబంధిత పనులను కవర్ చేస్తాము:
 
@@ -21,11 +18,8 @@
 - [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2) డీకోడర్‌ను ఉపయోగించే టెక్స్ట్ జనరేషన్ వంటి NLP పనుల కోసం
 - [BART](https://huggingface.co/docs/transformers/model_doc/bart) ఎన్‌కోడర్-డీకోడర్‌ను ఉపయోగించే సారాంశం మరియు అనువాదం వంటి NLP పనుల కోసం
 
-<Tip>
-
-మీరు ముందుకు వెళ్లే ముందు, అసలు Transformer ఆర్కిటెక్చర్ గురించి কিছু ప్రాథమిక పరిజ్ఞానం కలిగి ఉండటం మంచిది. ఎన్‌కోడర్‌లు, డీకోడర్‌లు మరియు అటెన్షన్ ఎలా పనిచేస్తుందో తెలుసుకోవడం వేర్వేరు Transformer మోడల్స్ ఎలా పనిచేస్తాయో అర్థం చేసుకోవడంలో మీకు సహాయపడుతుంది. మరింత సమాచారం కోసం మా [మునుపటి విభాగాన్ని](https://huggingface.co/course/chapter1/4?fw=pt) తప్పకుండా చూడండి!
-
-</Tip>
+> [!TIP]
+> మీరు ముందుకు వెళ్లే ముందు, అసలు Transformer ఆర్కిటెక్చర్ గురించి కొంత ప్రాథమిక పరిజ్ఞానం కలిగి ఉండటం మంచిది. ఎన్‌కోడర్‌లు, డీకోడర్‌లు మరియు అటెన్షన్ ఎలా పనిచేస్తుందో తెలుసుకోవడం వేర్వేరు Transformer మోడల్స్ ఎలా పనిచేస్తాయో అర్థం చేసుకోవడంలో మీకు సహాయపడుతుంది. మరింత సమాచారం కోసం మా [మునుపటి విభాగాన్ని](https://huggingface.co/course/chapter1/4?fw=pt) తప్పకుండా చూడండి!
 
 ## భాష కోసం Transformer మోడల్స్
 
@@ -58,11 +52,8 @@ Transformers లైబ్రరీలో, భాషా నమూనాలు స
 
 తరువాతి విభాగాలలో, మేము నిర్దిష్ట మోడల్ ఆర్కిటెక్చర్లను మరియు అవి ప్రసంగం, దృష్టి మరియు టెక్స్ట్ డొమైన్‌లలోని వివిధ పనులకు ఎలా వర్తింపజేయబడతాయో అన్వేషిస్తాము.
 
-<Tip>
-
-ఒక నిర్దిష్ట NLP పనికి ఏ Transformer ఆర్కిటెక్చర్ భాగం (ఎన్‌కోడర్, డీకోడర్, లేదా రెండూ) ఉత్తమంగా సరిపోతుందో అర్థం చేసుకోవడం సరైన మోడల్‌ను ఎంచుకోవడంలో కీలకం. సాధారణంగా, ద్విదిశాత్మక సందర్భం అవసరమయ్యే పనులు ఎన్‌కోడర్‌లను ఉపయోగిస్తాయి, టెక్స్ట్‌ను ఉత్పత్తి చేసే పనులు డీకోడర్‌లను ఉపయోగిస్తాయి మరియు ఒక క్రమాన్ని మరొకదానికి మార్చే పనులు ఎన్‌కోడర్-డీకోడర్‌లను ఉపయోగిస్తాయి.
-
-</Tip>
+> [!TIP]
+> ఒక నిర్దిష్ట NLP పనికి ఏ Transformer ఆర్కిటెక్చర్ భాగం (ఎన్‌కోడర్, డీకోడర్, లేదా రెండూ) ఉత్తమంగా సరిపోతుందో అర్థం చేసుకోవడం సరైన మోడల్‌ను ఎంచుకోవడంలో కీలకం. సాధారణంగా, ద్విదిశాత్మక సందర్భం అవసరమయ్యే పనులు ఎన్‌కోడర్‌లను ఉపయోగిస్తాయి, టెక్స్ట్‌ను ఉత్పత్తి చేసే పనులు డీకోడర్‌లను ఉపయోగిస్తాయి మరియు ఒక క్రమాన్ని మరొకదానికి మార్చే పనులు ఎన్‌కోడర్-డీకోడర్‌లను ఉపయోగిస్తాయి.
 
 ### టెక్స్ట్ జనరేషన్
 
@@ -90,10 +81,9 @@ GPT-2 యొక్క ప్రీ-ట్రైనింగ్ లక్ష్
 
 టెక్స్ట్ జనరేషన్‌లో మీ నైపుణ్యాన్ని ప్రయత్నించడానికి సిద్ధంగా ఉన్నారా? DistilGPT-2ను ఫైన్‌ట్యూన్ చేయడం మరియు దానిని ఇన్ఫరెన్స్ కోసం ఉపయోగించడం ఎలాగో తెలుసుకోవడానికి మా పూర్తి [causal language modeling guide](https://huggingface.co/docs/transformers/tasks/language_modeling#causal-language-modeling)ను చూడండి!
 
-<Tip>
-  టెక్స్ట్ జనరేషన్ గురించి మరింత సమాచారం కోసం, [text generation
-  strategies](generation_strategies) గైడ్‌ను చూడండి!
-</Tip>
+> [!TIP]
+> టెక్స్ట్ జనరేషన్ గురించి మరింత సమాచారం కోసం, [text generation
+>   strategies](https://huggingface.co/docs/transformers/generation_strategies) గైడ్‌ను చూడండి!
 
 ### టెక్స్ట్ వర్గీకరణ
 
@@ -129,11 +119,8 @@ GPT-2 యొక్క ప్రీ-ట్రైనింగ్ లక్ష్
 
 ప్రశ్నలకు సమాధానాలలో మీ నైపుణ్యాన్ని ప్రయత్నించడానికి సిద్ధంగా ఉన్నారా? DistilBERTను ఫైన్‌ట్యూన్ చేయడం మరియు దానిని ఇన్ఫరెన్స్ కోసం ఉపయోగించడం ఎలాగో తెలుసుకోవడానికి మా పూర్తి [question answering guide](https://huggingface.co/docs/transformers/tasks/question_answering)ను చూడండి!
 
-<Tip>
-
-💡 BERT ముందే శిక్షణ పొందిన తర్వాత వివిధ పనుల కోసం ఉపయోగించడం ఎంత సులభమో గమనించండి. మీ దాచిన స్థితులను మీ కావలసిన అవుట్‌పుట్‌గా మార్చడానికి ముందే శిక్షణ పొందిన మోడల్‌కు ఒక నిర్దిష్ట హెడ్‌ను జోడించడం మాత్రమే మీకు అవసరం!
-
-</Tip>
+> [!TIP]
+> 💡 BERT ముందే శిక్షణ పొందిన తర్వాత వివిధ పనుల కోసం ఉపయోగించడం ఎంత సులభమో గమనించండి. మీ దాచిన స్థితులను మీ కావలసిన అవుట్‌పుట్‌గా మార్చడానికి ముందే శిక్షణ పొందిన మోడల్‌కు ఒక నిర్దిష్ట హెడ్‌ను జోడించడం మాత్రమే మీకు అవసరం!
 
 ### సారాంశం
 
@@ -151,11 +138,8 @@ GPT-2 యొక్క ప్రీ-ట్రైనింగ్ లక్ష్
 
 సారాంశంలో మీ నైపుణ్యాన్ని ప్రయత్నించడానికి సిద్ధంగా ఉన్నారా? T5ను ఫైన్‌ట్యూన్ చేయడం మరియు దానిని ఇన్ఫరెన్స్ కోసం ఉపయోగించడం ఎలాగో తెలుసుకోవడానికి మా పూర్తి [summarization guide](https://huggingface.co/docs/transformers/tasks/summarization)ను చూడండి!
 
-<Tip>
-
-టెక్స్ట్ జనరేషన్ గురించి మరింత సమాచారం కోసం, [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies) గైడ్‌ను చూడండి!
-
-</Tip>
+> [!TIP]
+> టెక్స్ట్ జనరేషన్ గురించి మరింత సమాచారం కోసం, [text generation strategies](https://huggingface.co/docs/transformers/generation_strategies) గైడ్‌ను చూడండి!
 
 ### అనువాదం
 
@@ -166,11 +150,8 @@ BART తరువాత అనేక విభిన్న భాషలపై 
 
 అనువాదంలో మీ నైపుణ్యాన్ని ప్రయత్నించడానికి సిద్ధంగా ఉన్నారా? T5ను ఫైన్‌ట్యూన్ చేయడం మరియు దానిని ఇన్ఫరెన్స్ కోసం ఉపయోగించడం ఎలాగో తెలుసుకోవడానికి మా పూర్తి [translation guide](https://huggingface.co/docs/transformers/tasks/translation)ను చూడండి!
 
-<Tip>
-
-ఈ గైడ్ అంతటా మీరు చూసినట్లుగా, అనేక మోడల్స్ వేర్వేరు పనులను పరిష్కరించినప్పటికీ ఇలాంటి నమూనాలను అనుసరిస్తాయి. ఈ సాధారణ నమూనాలను అర్థం చేసుకోవడం కొత్త మోడల్స్ ఎలా పనిచేస్తాయో మరియు మీ నిర్దిష్ట అవసరాలకు అనుగుణంగా ఉన్న మోడల్స్‌ను ఎలా మార్చుకోవాలో త్వరగా గ్రహించడంలో మీకు సహాయపడుతుంది.
-
-</Tip>
+> [!TIP]
+> ఈ గైడ్ అంతటా మీరు చూసినట్లుగా, అనేక మోడల్స్ వేర్వేరు పనులను పరిష్కరించినప్పటికీ ఇలాంటి నమూనాలను అనుసరిస్తాయి. ఈ సాధారణ నమూనాలను అర్థం చేసుకోవడం కొత్త మోడల్స్ ఎలా పనిచేస్తాయో మరియు మీ నిర్దిష్ట అవసరాలకు అనుగుణంగా ఉన్న మోడల్స్‌ను ఎలా మార్చుకోవాలో త్వరగా గ్రహించడంలో మీకు సహాయపడుతుంది.
 
 ## టెక్స్ట్ దాటిన మోడాలిటీలు
 
@@ -198,11 +179,8 @@ Whisper వెబ్ నుండి సేకరించిన 680,000 గం
 
 ఇప్పుడు Whisper ముందే శిక్షణ పొందింది, మీరు దానిని జీరో-షాట్ ఇన్ఫరెన్స్ కోసం నేరుగా ఉపయోగించవచ్చు లేదా ఆటోమేటిక్ స్పీచ్ రికగ్నిషన్ లేదా స్పీచ్ ట్రాన్స్‌లేషన్ వంటి నిర్దిష్ట పనులపై మెరుగైన పనితీరు కోసం మీ డేటాపై ఫైన్‌ట్యూన్ చేయవచ్చు!
 
-<Tip>
-
-Whisperలోని కీలకమైన ఆవిష్కరణ దాని శిక్షణ, ఇది ఇంటర్నెట్ నుండి అపూర్వమైన స్థాయిలో విభిన్న, బలహీనంగా పర్యవేక్షించబడిన ఆడియో డేటాపై జరిగింది. ఇది పని-నిర్దిష్ట ఫైన్‌ట్యూనింగ్ లేకుండా విభిన్న భాషలు, యాసలు మరియు పనులకు అసాధారణంగా బాగా సాధారణీకరించడానికి అనుమతిస్తుంది.
-
-</Tip>
+> [!TIP]
+> Whisperలోని కీలకమైన ఆవిష్కరణ దాని శిక్షణ, ఇది ఇంటర్నెట్ నుండి అపూర్వమైన స్థాయిలో విభిన్న, బలహీనంగా పర్యవేక్షించబడిన ఆడియో డేటాపై జరిగింది. ఇది పని-నిర్దిష్ట ఫైన్‌ట్యూనింగ్ లేకుండా విభిన్న భాషలు, యాసలు మరియు పనులకు అసాధారణంగా బాగా సాధారణీకరించడానికి అనుమతిస్తుంది.
 
 ### ఆటోమేటిక్ స్పీచ్ రికగ్నిషన్
 
@@ -231,11 +209,8 @@ transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.f
 1. ఒక చిత్రాన్ని ప్యాచ్‌ల క్రమంగా విభజించి మరియు వాటిని ఒక Transformerతో సమాంతరంగా ప్రాసెస్ చేయండి.
 2. [ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext) వంటి ఆధునిక CNNను ఉపయోగించండి, ఇది కన్వల్యూషనల్ లేయర్‌లపై ఆధారపడుతుంది కానీ ఆధునిక నెట్‌వర్క్ డిజైన్‌లను అనుసరిస్తుంది.
 
-<Tip>
-
-మూడవ విధానం Transformersను కన్వల్యూషన్‌లతో మిళితం చేస్తుంది (ఉదాహరణకు, [Convolutional Vision Transformer](https://huggingface.co/docs/transformers/model_doc/cvt) లేదా [LeViT](https://huggingface.co/docs/transformers/model_doc/levit)). మనం వాటిని చర్చించము ఎందుకంటే అవి ఇక్కడ మనం పరిశీలించే రెండు విధానాలను మిళితం చేస్తాయి.
-
-</Tip>
+> [!TIP]
+> మూడవ విధానం Transformersను కన్వల్యూషన్‌లతో మిళితం చేస్తుంది (ఉదాహరణకు, [Convolutional Vision Transformer](https://huggingface.co/docs/transformers/model_doc/cvt) లేదా [LeViT](https://huggingface.co/docs/transformers/model_doc/levit)). మనం వాటిని చర్చించము ఎందుకంటే అవి ఇక్కడ మనం పరిశీలించే రెండు విధానాలను మిళితం చేస్తాయి.
 
 ViT మరియు ConvNeXT సాధారణంగా ఇమేజ్ వర్గీకరణ కోసం ఉపయోగించబడతాయి, కానీ వస్తువు గుర్తింపు, సెగ్మెంటేషన్, మరియు డెప్త్ ఎస్టిమేషన్ వంటి ఇతర దృష్టి పనుల కోసం, మనం వరుసగా DETR, Mask2Former మరియు GLPNలను పరిశీలిస్తాము; ఈ మోడల్స్ ఆ పనులకు బాగా సరిపోతాయి.
 
@@ -263,8 +238,5 @@ ViT ప్రవేశపెట్టిన ప్రధాన మార్ప
 
 ఇమేజ్ వర్గీకరణలో మీ నైపుణ్యాన్ని ప్రయత్నించడానికి సిద్ధంగా ఉన్నారా? ViTను ఫైన్-ట్యూన్ చేయడం మరియు దానిని ఇన్ఫరెన్స్ కోసం ఉపయోగించడం ఎలాగో తెలుసుకోవడానికి మా పూర్తి [image classification guide](https://huggingface.co/docs/transformers/tasks/image_classification)ను చూడండి!
 
-<Tip>
-
-ViT మరియు BERT మధ్య సమాంతరాన్ని గమనించండి: రెండూ మొత్తం ప్రాతినిధ్యాన్ని సంగ్రహించడానికి ఒక ప్రత్యేక టోకెన్ (<code>[CLS]</code>)ను ఉపయోగిస్తాయి, రెండూ వాటి ఎంబెడ్డింగ్‌లకు స్థాన సమాచారాన్ని జోడిస్తాయి, మరియు రెండూ టోకెన్‌లు/ప్యాచ్‌ల క్రమాన్ని ప్రాసెస్ చేయడానికి ఒక Transformer ఎన్‌కోడర్‌ను ఉపయోగిస్తాయి.
-
-</Tip>
+> [!TIP]
+> ViT మరియు BERT మధ్య సమాంతరాన్ని గమనించండి: రెండూ మొత్తం ప్రాతినిధ్యాన్ని సంగ్రహించడానికి ఒక ప్రత్యేక టోకెన్ (<code>[CLS]</code>)ను ఉపయోగిస్తాయి, రెండూ వాటి ఎంబెడ్డింగ్‌లకు స్థాన సమాచారాన్ని జోడిస్తాయి, మరియు రెండూ టోకెన్‌లు/ప్యాచ్‌ల క్రమాన్ని ప్రాసెస్ చేయడానికి ఒక Transformer ఎన్‌కోడర్‌ను ఉపయోగిస్తాయి.
diff --git a/chapters/te/chapter1/6.mdx b/chapters/te/chapter1/6.mdx
index d9939b20b..bdb062578 100644
--- a/chapters/te/chapter1/6.mdx
+++ b/chapters/te/chapter1/6.mdx
@@ -6,11 +6,8 @@
 
 ఈ విభాగంలో, మేము Transformer నమూనాల యొక్క మూడు ప్రధాన నిర్మాణ వైవిధ్యాలను లోతుగా పరిశీలించబోతున్నాము మరియు ప్రతి ఒక్కటి ఎప్పుడు ఉపయోగించాలో అర్థం చేసుకుంటాము.
 
-<Tip>
-
-చాలా Transformer నమూనాలు మూడు నిర్మాణాలలో ఒకదాన్ని ఉపయోగిస్తాయని గుర్తుంచుకోండి: ఎన్‌కోడర్-మాత్రమే, డీకోడర్-మాత్రమే లేదా ఎన్‌కోడర్-డీకోడర్ (సీక్వెన్స్-టు-సీక్వెన్స్). ఈ తేడాలను అర్థం చేసుకోవడం మీ నిర్దిష్ట పనికి సరైన నమూనాను ఎంచుకోవడంలో మీకు సహాయపడుతుంది.
-
-</Tip>
+> [!TIP]
+> చాలా Transformer నమూనాలు మూడు నిర్మాణాలలో ఒకదాన్ని ఉపయోగిస్తాయని గుర్తుంచుకోండి: ఎన్‌కోడర్-మాత్రమే, డీకోడర్-మాత్రమే లేదా ఎన్‌కోడర్-డీకోడర్ (సీక్వెన్స్-టు-సీక్వెన్స్). ఈ తేడాలను అర్థం చేసుకోవడం మీ నిర్దిష్ట పనికి సరైన నమూనాను ఎంచుకోవడంలో మీకు సహాయపడుతుంది.
 
 ## ఎన్‌కోడర్ నమూనాలు[[encoder-models]]
 
@@ -22,11 +19,8 @@
 
 వాక్య వర్గీకరణ, పేరున్న ఎంటిటీ గుర్తింపు (మరియు సాధారణంగా పద వర్గీకరణ), మరియు సంగ్రాహక ప్రశ్న-సమాధానం వంటి పూర్తి వాక్యం యొక్క అవగాహన అవసరమయ్యే పనులకు ఎన్‌కోడర్ నమూనాలు ఉత్తమంగా సరిపోతాయి.
 
-<Tip>
-
-[How 🤗 Transformers solve tasks](/chapter1/5), లో మనం చూసినట్లుగా, BERT వంటి ఎన్‌కోడర్ నమూనాలు టెక్స్ట్‌ను అర్థం చేసుకోవడంలో రాణిస్తాయి ఎందుకంటే అవి రెండు దిశలలోని మొత్తం సందర్భాన్ని చూడగలవు. ఇది మొత్తం ఇన్‌పుట్ యొక్క గ్రహణశక్తి ముఖ్యమైన పనులకు వాటిని పరిపూర్ణంగా చేస్తుంది.
-
-</Tip>
+> [!TIP]
+> [How 🤗 Transformers solve tasks](/chapter1/5) లో మనం చూసినట్లుగా, BERT వంటి ఎన్‌కోడర్ నమూనాలు టెక్స్ట్‌ను అర్థం చేసుకోవడంలో రాణిస్తాయి ఎందుకంటే అవి రెండు దిశలలోని మొత్తం సందర్భాన్ని చూడగలవు. ఇది మొత్తం ఇన్‌పుట్ యొక్క గ్రహణశక్తి ముఖ్యమైన పనులకు వాటిని పరిపూర్ణంగా చేస్తుంది.
 
 ఈ నమూనాల కుటుంబం యొక్క ప్రతినిధులు:
 
@@ -44,11 +38,8 @@
 
 ఈ నమూనాలు టెక్స్ట్ జనరేషన్ వంటి పనులకు ఉత్తమంగా సరిపోతాయి.
 
-<Tip>
-
-GPT వంటి డీకోడర్ నమూనాలు ఒక సమయంలో ఒక టోకెన్‌ను అంచనా వేయడం ద్వారా టెక్స్ట్‌ను ఉత్పత్తి చేయడానికి రూపొందించబడ్డాయి. [How 🤗 Transformers solve tasks](/chapter1/5) లో మనం అన్వేషించినట్లుగా, అవి మునుపటి టోకెన్‌లను మాత్రమే చూడగలవు, ఇది సృజనాత్మక టెక్స్ట్ జనరేషన్‌కు వాటిని అద్భుతంగా చేస్తుంది కానీ ద్విదిశాత్మక అవగాహన అవసరమయ్యే పనులకు అంత ఆదర్శవంతంగా ఉండదు.
-
-</Tip>
+> [!TIP]
+> GPT వంటి డీకోడర్ నమూనాలు ఒక సమయంలో ఒక టోకెన్‌ను అంచనా వేయడం ద్వారా టెక్స్ట్‌ను ఉత్పత్తి చేయడానికి రూపొందించబడ్డాయి. [How 🤗 Transformers solve tasks](/chapter1/5) లో మనం అన్వేషించినట్లుగా, అవి మునుపటి టోకెన్‌లను మాత్రమే చూడగలవు, ఇది సృజనాత్మక టెక్స్ట్ జనరేషన్‌కు వాటిని అద్భుతంగా చేస్తుంది కానీ ద్విదిశాత్మక అవగాహన అవసరమయ్యే పనులకు అంత ఆదర్శవంతంగా ఉండదు.
 
 ఈ నమూనాల కుటుంబం యొక్క ప్రతినిధులు:
 
@@ -101,11 +92,8 @@ GPT వంటి డీకోడర్ నమూనాలు ఒక సమయం
 
 సీక్వెన్స్-టు-సీక్వెన్స్ నమూనాలు సారాంశీకరణ, అనువాదం, లేదా సృజనాత్మక ప్రశ్న-సమాధానం వంటి ఇచ్చిన ఇన్‌పుట్‌పై ఆధారపడి కొత్త వాక్యాలను ఉత్పత్తి చేయడం చుట్టూ తిరిగే పనులకు ఉత్తమంగా సరిపోతాయి.
 
-<Tip>
-
-[How 🤗 Transformers solve tasks](/chapter1/5),లో మనం చూసినట్లుగా, BART మరియు T5 వంటి ఎన్‌కోడర్-డీకోడర్ నమూనాలు రెండు నిర్మాణాల యొక్క బలాలను మిళితం చేస్తాయి. ఎన్‌కోడర్ ఇన్‌పుట్ యొక్క లోతైన ద్విదిశాత్మక అవగాహనను అందిస్తుంది, అయితే డీకోడర్ తగిన అవుట్‌పుట్ టెక్స్ట్‌ను ఉత్పత్తి చేస్తుంది. ఇది ఒక సీక్వెన్స్‌ను మరొకదానికి మార్చే పనులకు, అనువాదం లేదా సారాంశీకరణ వంటి వాటికి వాటిని పరిపూర్ణంగా చేస్తుంది.
-
-</Tip>
+> [!TIP]
+> [How 🤗 Transformers solve tasks](/chapter1/5) లో మనం చూసినట్లుగా, BART మరియు T5 వంటి ఎన్‌కోడర్-డీకోడర్ నమూనాలు రెండు నిర్మాణాల యొక్క బలాలను మిళితం చేస్తాయి. ఎన్‌కోడర్ ఇన్‌పుట్ యొక్క లోతైన ద్విదిశాత్మక అవగాహనను అందిస్తుంది, అయితే డీకోడర్ తగిన అవుట్‌పుట్ టెక్స్ట్‌ను ఉత్పత్తి చేస్తుంది. ఇది ఒక సీక్వెన్స్‌ను మరొకదానికి మార్చే పనులకు, అనువాదం లేదా సారాంశీకరణ వంటి వాటికి వాటిని పరిపూర్ణంగా చేస్తుంది.
 
 ### ఆచరణాత్మక అనువర్తనాలు
 
@@ -149,17 +137,14 @@ Representatives of this family of models include:
 | ప్రశ్నలకు సమాధానం (జనరేటివ్)         | Encoder-Decoder or Decoder | T5, GPT       |
 | సంభాషణ AI                            | Decoder                    | GPT, LLaMA    |
 
-<Tip>
-
-ఏ నమూనాను ఉపయోగించాలో సందేహంలో ఉన్నప్పుడు, పరిగణించండి:
-
-1. మీ పనికి ఎలాంటి అవగాహన అవసరం? (ద్విదిశాత్మక లేదా ఏకదిశాత్మక)
-2. మీరు కొత్త టెక్స్ట్‌ను ఉత్పత్తి చేస్తున్నారా లేదా ఉన్న టెక్స్ట్‌ను విశ్లేషిస్తున్నారా?
-3. మీరు ఒక సీక్వెన్స్‌ను మరొకదానికి మార్చాల్సిన అవసరం ఉందా?
-
-ఈ ప్రశ్నలకు సమాధానాలు మిమ్మల్ని సరైన నిర్మాణం వైపు నడిపిస్తాయి.
-
-</Tip>
+> [!TIP]
+> ఏ నమూనాను ఉపయోగించాలో సందేహంలో ఉన్నప్పుడు, పరిగణించండి:
+>
+> 1. మీ పనికి ఎలాంటి అవగాహన అవసరం? (ద్విదిశాత్మక లేదా ఏకదిశాత్మక)
+> 2. మీరు కొత్త టెక్స్ట్‌ను ఉత్పత్తి చేస్తున్నారా లేదా ఉన్న టెక్స్ట్‌ను విశ్లేషిస్తున్నారా?
+> 3. మీరు ఒక సీక్వెన్స్‌ను మరొకదానికి మార్చాల్సిన అవసరం ఉందా?
+>
+> ఈ ప్రశ్నలకు సమాధానాలు మిమ్మల్ని సరైన నిర్మాణం వైపు నడిపిస్తాయి.
 
 ## LLMల పరిణామం
 
@@ -169,11 +154,8 @@ Representatives of this family of models include:
 
 చాలా ట్రాన్స్‌ఫార్మర్ నమూనాలు పూర్తి శ్రద్ధను ఉపయోగిస్తాయి, అంటే శ్రద్ధా మాత్రిక చతురస్రాకారంలో ఉంటుంది. మీకు పొడవైన టెక్స్ట్‌లు ఉన్నప్పుడు ఇది ఒక పెద్ద గణన అవరోధంగా ఉంటుంది. లాంగ్‌ఫార్మర్ మరియు రిఫార్మర్ అనే నమూనాలు మరింత సమర్థవంతంగా ఉండటానికి ప్రయత్నిస్తాయి మరియు శిక్షణను వేగవంతం చేయడానికి శ్రద్ధా మాత్రిక యొక్క స్పాన్ వెర్షన్‌ను ఉపయోగిస్తాయి.
 
-<Tip>
-
-ప్రామాణిక శ్రద్ధా యంత్రాంగాలు O(n²) యొక్క గణన సంక్లిష్టతను కలిగి ఉంటాయి, ఇక్కడ n సీక్వెన్స్ పొడవు. చాలా పొడవైన సీక్వెన్స్‌లకు ఇది సమస్యాత్మకంగా మారుతుంది. కింద పేర్కొన్న ప్రత్యేక శ్రద్ధా యంత్రాంగాలు ఈ పరిమితిని పరిష్కరించడంలో సహాయపడతాయి.
-
-</Tip>
+> [!TIP]
+> ప్రామాణిక శ్రద్ధా యంత్రాంగాలు O(n²) యొక్క గణన సంక్లిష్టతను కలిగి ఉంటాయి, ఇక్కడ n సీక్వెన్స్ పొడవు. చాలా పొడవైన సీక్వెన్స్‌లకు ఇది సమస్యాత్మకంగా మారుతుంది. కింద పేర్కొన్న ప్రత్యేక శ్రద్ధా యంత్రాంగాలు ఈ పరిమితిని పరిష్కరించడంలో సహాయపడతాయి.
 
 ### LSH అటెన్షన్
 
diff --git a/chapters/te/chapter1/8.mdx b/chapters/te/chapter1/8.mdx
index 12dda0e10..56bbfd026 100644
--- a/chapters/te/chapter1/8.mdx
+++ b/chapters/te/chapter1/8.mdx
@@ -20,11 +20,8 @@
 
 తదుపరి టోకెన్‌ను అంచనా వేయడానికి అత్యంత సంబంధిత పదాలను గుర్తించే ఈ ప్రక్రియ అద్భుతంగా ప్రభావవంతమైనదని నిరూపించబడింది. BERT మరియు GPT-2 కాలం నుండి LLMలకు శిక్షణ ఇచ్చే ప్రాథమిక సూత్రం — తదుపరి టోకెన్‌ను అంచనా వేయడం — సాధారణంగా స్థిరంగా ఉన్నప్పటికీ, న్యూరల్ నెట్‌వర్క్‌లను స్కేల్ చేయడంలో మరియు అటెన్షన్ మెకానిజంను తక్కువ ఖర్చుతో, సుదీర్ఘమైన సీక్వెన్స్‌ల కోసం పనిచేసేలా చేయడంలో గణనీయమైన పురోగతి సాధించబడింది.
 
-<Tip>
-
-సంక్షిప్తంగా, LLMలు పొందికగా మరియు సందర్భానుసారంగా ఉండే టెక్స్ట్‌ను ఉత్పత్తి చేయగలగడానికి అటెన్షన్ మెకానిజం కీలకం. ఇది ఆధునిక LLMలను పాత తరం భాషా నమూనాల నుండి వేరుగా నిలుపుతుంది.
-
-</Tip>
+> [!TIP]
+> సంక్షిప్తంగా, LLMలు పొందికగా మరియు సందర్భానుసారంగా ఉండే టెక్స్ట్‌ను ఉత్పత్తి చేయగలగడానికి అటెన్షన్ మెకానిజం కీలకం. ఇది ఆధునిక LLMలను పాత తరం భాషా నమూనాల నుండి వేరుగా నిలుపుతుంది.
 
 ### కాంటెక్స్ట్ లెంగ్త్ మరియు అటెన్షన్ స్పాన్
 
@@ -40,11 +37,8 @@
 
 ఒక ఆదర్శ ప్రపంచంలో, మనం మోడల్‌కు అపరిమితమైన కాంటెక్స్ట్‌ను అందించవచ్చు, కానీ హార్డ్‌వేర్ పరిమితులు మరియు గణన ఖర్చులు దీనిని అసాధ్యం చేస్తాయి. అందుకే సామర్థ్యాన్ని మరియు సమర్థతను సమతుల్యం చేయడానికి వివిధ మోడల్స్ వివిధ కాంటెక్స్ట్ లెంగ్త్‌లతో రూపొందించబడ్డాయి.
 
-<Tip>
-
-కాంటెక్స్ట్ లెంగ్త్ అనేది, స్పందనను ఉత్పత్తి చేసేటప్పుడు మోడల్ ఒకేసారి పరిగణించగల గరిష్ట టోకెన్‌ల సంఖ్య.
-
-</Tip>
+> [!TIP]
+> కాంటెక్స్ట్ లెంగ్త్ అనేది, స్పందనను ఉత్పత్తి చేసేటప్పుడు మోడల్ ఒకేసారి పరిగణించగల గరిష్ట టోకెన్‌ల సంఖ్య.
 
 ### ప్రాంప్టింగ్ కళ
 
@@ -52,11 +46,8 @@
 
 LLMలు సమాచారాన్ని ఎలా ప్రాసెస్ చేస్తాయో అర్థం చేసుకోవడం, మెరుగైన ప్రాంప్ట్‌లను రూపొందించడంలో మనకు సహాయపడుతుంది. మోడల్ యొక్క ప్రాథమిక విధి ప్రతి ఇన్‌పుట్ టోకెన్ యొక్క ప్రాముఖ్యతను విశ్లేషించడం ద్వారా తదుపరి టోకెన్‌ను అంచనా వేయడం కాబట్టి, మీ ఇన్‌పుట్ సీక్వెన్స్ యొక్క పదజాలం చాలా కీలకమైనది.
 
-<Tip>
-
-ప్రాంప్ట్‌ను జాగ్రత్తగా రూపొందించడం వల్ల LLM జనరేషన్‌ను కావలసిన అవుట్‌పుట్ వైపు నడిపించడం సులభం అవుతుంది.
-
-</Tip>
+> [!TIP]
+> ప్రాంప్ట్‌ను జాగ్రత్తగా రూపొందించడం వల్ల LLM జనరేషన్‌ను కావలసిన అవుట్‌పుట్ వైపు నడిపించడం సులభం అవుతుంది.
 
 ## రెండు-దశల ఇన్ఫరెన్స్ ప్రక్రియ
 
diff --git a/chapters/th/chapter1/3.mdx b/chapters/th/chapter1/3.mdx
index a13be7f77..5a11949de 100644
--- a/chapters/th/chapter1/3.mdx
+++ b/chapters/th/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 ในส่วนนี้ เราจะมาดูกันว่า Transformer นั้นทำอะไรได้บ้าง และมาใช้งานเครื่องมือชิ้นแรกจาก library 🤗 Transformers ซึ่งก็คือฟังก์ชัน `pipeline()`
 
-<Tip>
-👀 เห็นปุ่ม <em>Open in Colab</em> ทางด้านบนขวานั่นมั้ย? ลองคลิกเปิดดู Google Colab notebook ได้ ด้านในจะมีตัวอย่างโค้ดที่เกี่ยวข้องทั้งหมดในหน้านี้ โดยปุ่นแบบนี้จะมีในทุก ๆ หน้าที่มีโค้ดตัวอย่าง
-
-แต่หากคุณต้องการรันโค้ดตัวอย่างบนเครื่องของตนเอง เราแนะนำให้เปิดดู <a href="/course/chapter0">ติดตั้งโปรแกรม</a>
-</Tip>
+> [!TIP]
+> 👀 เห็นปุ่ม <em>Open in Colab</em> ทางด้านบนขวานั่นมั้ย? ลองคลิกเปิดดู Google Colab notebook ได้ ด้านในจะมีตัวอย่างโค้ดที่เกี่ยวข้องทั้งหมดในหน้านี้ โดยปุ่มแบบนี้จะมีในทุก ๆ หน้าที่มีโค้ดตัวอย่าง
+>
+> แต่หากคุณต้องการรันโค้ดตัวอย่างบนเครื่องของตนเอง เราแนะนำให้เปิดดู <a href="/course/chapter0">ติดตั้งโปรแกรม</a>
 
 ## Transformers, Transformers เต็มไปหมดเลย!
 
@@ -24,9 +23,8 @@
 
 [library 🤗 Transformers](https://github.com/huggingface/transformers) มีฟังก์ชันในการสร้างและใช้งานโมเดลที่มีคนแบ่งมาให้ใช้ ใน [Model Hub](https://huggingface.co/models) มีโมเดลที่ผ่านการเทรนมาแล้ว (หรือเรียกว่า pretrained model) มากกว่าหนึ่งพันโมเดลที่ใคร ๆ ก็สามารถดาวน์โหลดไปใช้งานได้ รวมถึงคุณเองก็เป็นส่วนหนึ่งในการแบ่งปันนี้ได้เช่นกัน
 
-<Tip>
-⚠️ Hugging Face Hub ไม่ได้จำกัดอยู่แค่เพียงโมเดล Transformer เท่านั้น ใคร ๆ ก็สามารถแบ่งปันโมเดลหรือชุดข้อมูลอะไรก็ได้ตามต้องการ มา <a href="https://huggingface.co/join">สร้างบัญชี huggingface.co</a> เพื่อใช้งานฟีเจอร์เหล่านี้ด้วยกันนะ!
-</Tip>
+> [!TIP]
+> ⚠️ Hugging Face Hub ไม่ได้จำกัดอยู่แค่เพียงโมเดล Transformer เท่านั้น ใคร ๆ ก็สามารถแบ่งปันโมเดลหรือชุดข้อมูลอะไรก็ได้ตามต้องการ มา <a href="https://huggingface.co/join">สร้างบัญชี huggingface.co</a> เพื่อใช้งานฟีเจอร์เหล่านี้ด้วยกันนะ!
 
 ก่อนจะเจาะลึกเข้าไปถึงเบื้องหลังการทำงานของโมเดล Transformer มาดูตัวอย่างกันว่าโมเดล Transformer นั้นสามารถแก้ปัญหา NLP ได้อย่างไรบ้าง
 
@@ -104,11 +102,8 @@ classifier(
 
 pipeline นี้เรียกว่า _zero-shot_ เพราะว่าเราไม่ต้อง fine-tune โมเดลด้วยข้อมูลของเราก่อนจะนำไปใช้ ตัวโมเดลสามารถระบุค่าความน่าจะเป็นตาม list ของ label ที่เราต้องการได้เลย!
 
-<Tip>
-
-✏️ **ลองเลย!** ทดลองโดยใช้ข้อความอะไรก็ได้ของเราเองแล้วดูว่าโมเดลส่งค่าอะไรคืนให้เราบ้าง
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ทดลองโดยใช้ข้อความอะไรก็ได้ของเราเองแล้วดูว่าโมเดลส่งค่าอะไรคืนให้เราบ้าง
 
 
 ## การสร้างข้อความ
@@ -133,11 +128,8 @@ generator("In this course, we will teach you how to")
 คุณสามารถควบคุมจำนวนข้อความที่สร้างขึ้นได้โดยกำหนดค่าที่ argument `num_return_sequences` และกำหนดความยาวของข้อความที่สร้างขึ้นมาด้วย argument `max_length`
 
 
-<Tip>
-
-✏️ **ลองเลย!** ปรับค่า argument `num_return_sequences` และ `max_length` ให้สร้างข้อความขึ้นมาสองประโยค ประโยคละ 15 คำ
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ปรับค่า argument `num_return_sequences` และ `max_length` ให้สร้างข้อความขึ้นมาสองประโยค ประโยคละ 15 คำ
 
 
 ## การใช้งานโมเดลใด ๆ จาก Hub ใน pipeline
@@ -169,11 +161,8 @@ generator(
 
 เมื่อคุณเลือกโมเดลโดยการคลิกที่ชื่อโมเดล คุณจะเห็นว่าจะมีแถบ widget เล็ก ๆ ขึ้นมาให้คุณลองก่อนแบบออนไลน์ ด้วยวิธีการนี้ คุณสามารถทดสอบความสามารถของโมเดลได้ก่อนจะดาวน์โหลดไปใช้
 
-<Tip>
-
-✏️ **ลองเลย!** ใช้ตัวกรองหาโมเดลสร้างข้อความสำหรับภาษาอื่น แนะนำให้ลองกับ widget บน pipeline ดูก่อนเลย
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ใช้ตัวกรองหาโมเดลสร้างข้อความสำหรับภาษาอื่น แนะนำให้ลองกับ widget บน pipeline ดูก่อนเลย
 
 ### The Inference API
 
@@ -205,12 +194,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 argument `top_k` ควบคุมจำนวนข้อความที่ต้องการแสดง โดยโมเดลจะเติมคำลงไปที่คำพิเศษที่เขียนว่า `<mask>` ซึ่งหมายถึง *คำที่ละไว้* ทั้งนี้ โมเดลเติมคำในช่องว่างโมเดลอื่นอาจใช้คำที่ละไว้นี้เป็นอย่างอื่น ดังนั้น โปรดรับรู้ไว้เสมอว่า หากจะใช้โมเดลใดให้ศึกษาให้แน่ชัดว่าโมเดลนั้นใช้คำที่ละไว้ว่าอะไร วิธีการหนึ่งที่ทำได้คือให้ตรวจสอบคำที่ละไว้ที่ใช้ใน widget นี้
 
-<Tip>
-
-✏️ **ลองเลย!** หาโมเดล `bert-base-cased` ใน Hub และตรวจสอบคำที่ละไว้ที่ใช้ใน widget ของ Inference API โมเดลนี้ทำนายอะไรออกมาหากเราใส่ประโยคที่ใส่เข้าไปในตัวอย่าง `pipeline` ด้านบน
-
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** หาโมเดล `bert-base-cased` ใน Hub และตรวจสอบคำที่ละไว้ที่ใช้ใน widget ของ Inference API โมเดลนี้ทำนายอะไรออกมาหากเราใส่ประโยคที่ใส่เข้าไปในตัวอย่าง `pipeline` ด้านบน
 
 ## การระบุชื่อเฉพาะ
 
@@ -234,10 +219,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 เราเพิ่มตัวเลือก `grouped_entities=True` ตอนสร้างฟังก์ชัน pipeline เพื่อระบุให้ pipeline จับกลุ่มคำที่เป็นการระบุชื่อเฉพาะของสิ่ง ๆ เดียว ในที่นี้ โมเดลจับกลุ่มคำว่า "Hugging" และ "Face" เข้าไปเป็นชื่อองค์กรองค์กรเดียว แม้ว่าจะเป็นการรวมคำหลายคำเข้าด้วยกันก็ตาม ซึ่งจริง ๆ แล้ว ในบทต่อไปเราจะเห็นว่าการประมวลผลนั้นจะแบ่งคำบางคำออกมาเป็นส่วนที่แยกย่อยลงไปอีก ตัวอย่างเช่น คำว่า `Sylvian` ถูกแบ่งออกเป็น 4 ส่วน ได้แก่ `S`, `##yl`, `##va`, และ `##in` และระหว่างการ post-processing ตัว pipeline ก็จะนำแต่ละส่วนนี้มาประกอบเข้าด้วยกัน
 
-<Tip>
-
-✏️ **ลองเลย!** หาโมเดลใน Model Hub ที่ทำงานเกี่ยวกับการระบุชื่อเฉพาะ(หรือเรียกว่า part-of-speech tagging ย่อว่า POS)ในภาษาอังกฤษ รู้มั้ยว่าโมเดลนี้ทำนายอะไรในตัวอย่างประโยคข้างต้น?
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** หาโมเดลใน Model Hub ที่ทำงานเกี่ยวกับการระบุชนิดของคำ (part-of-speech tagging ย่อว่า POS) ในภาษาอังกฤษ รู้มั้ยว่าโมเดลนี้ทำนายอะไรในตัวอย่างประโยคข้างต้น?
 
 ## ถาม-ตอบคำถาม
 
@@ -320,10 +303,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 คุณสามารถกำหนด argument `max_length` หรือ `min_length` เพื่อระบุผลลัพธ์ที่ต้องการเหมือนการสร้างข้อความและการสรุปความ
 
-<Tip>
-
-✏️ **ลองเลย!** ลองค้นหาโมเดลแปลภาษาในภาษาอื่น ๆ และทดลองแปลภาษาจากข้อความด้านบนไปยังภาษาอื่น ๆ
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ลองค้นหาโมเดลแปลภาษาในภาษาอื่น ๆ และทดลองแปลภาษาจากข้อความด้านบนไปยังภาษาอื่น ๆ
 
 คำสั่ง pipeline ที่แสดงด้านบนเป็นเพียงตัวอย่างเบื้องต้นเท่านั้น การใช้งานของคำสั่งนี้ค่อนข้างเฉพาะเจาะจงและไม่สามารถแก้ไขอะไรเบื้องหลังได้มากนัก ในบทหลัง ๆ คุณจะได้เรียนรู้หลักการทำงานเบื้องหลังของฟังก์ชัน `pipeline()` และวิธีการปรับแต่งการทำงาน
diff --git a/chapters/th/chapter2/1.mdx b/chapters/th/chapter2/1.mdx
index 690eeabe1..db70b7e89 100644
--- a/chapters/th/chapter2/1.mdx
+++ b/chapters/th/chapter2/1.mdx
@@ -19,6 +19,5 @@
 
 หลังจากนั้นเราจะไปดูกันที่ tokenizer API ซึ่งเป็นอีกหนึ่งส่วนประกอบหลักของฟังก์ชัน `pipeline()`, Tokenizers จะรับผิดชอบการประมวลผลขั้นแรกและขั้นสุดท้าย ซึ่งก็คือ การแปลงข้อมูลที่เป็นข้อความให้เป็นข้อมูลเชิงตัวเลข เพื่อใช้กับ neural network, และการแปลงข้อมูลกลับไปเป็นตัวอักษร ในกรณีที่จำเป็น และสุดท้ายเราจะแสดงวิธีการจัดการกับการส่งข้อความทีละหลายๆประโยคแบบที่เตรียมไว้เป็นชุดๆ (batch) ไปยังโมเดล และปิดท้ายด้วยฟังก์ชัน `tokenizer()`
 
-<Tip>
-⚠️ เพื่อให้ได้ประโยชน์สูงสุดจากคุณลักษณะเด่นทั้งหมดที่มีใน Model Hub และ 🤗 Transformers, เราแนะนำให้คุณ <a href="https://huggingface.co/join">สร้างบัญชี</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ เพื่อให้ได้ประโยชน์สูงสุดจากคุณลักษณะเด่นทั้งหมดที่มีใน Model Hub และ 🤗 Transformers, เราแนะนำให้คุณ <a href="https://huggingface.co/join">สร้างบัญชี</a>.
\ No newline at end of file
diff --git a/chapters/th/chapter2/2.mdx b/chapters/th/chapter2/2.mdx
index 4c4ae8111..e03b2a9f2 100644
--- a/chapters/th/chapter2/2.mdx
+++ b/chapters/th/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-Section นี้จะเป็น Section แรกที่เนื้อหาจะค่อนข้างแตกต่างกันขึ้นอยู่กับว่าคุณใช้ PyTorch หรือ TensorFlow คุณสามารถเลือก plateform ที่คุณต้องการได้จากปุ่มที่อยู่ด้านบนของชื่อหัวข้อ!
-</Tip>
+> [!TIP]
+> Section นี้จะเป็น Section แรกที่เนื้อหาจะค่อนข้างแตกต่างกันขึ้นอยู่กับว่าคุณใช้ PyTorch หรือ TensorFlow คุณสามารถเลือก platform ที่คุณต้องการได้จากปุ่มที่อยู่ด้านบนของชื่อหัวข้อ!
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -346,8 +345,5 @@ model.config.id2label
 
 ถึงตรงนี้ประสบความสำเร็จในการลองทำ สาม ขั้นตอนของ pipeline: การประมวลผลเบื้องต้น(preprocessing)โดยใช้ tokenizers, ส่งข้อมูลเข้าไปยังโมเดล,และการประมวลผลข้อมูลที่ได้จากโมเดล! ต่อจากนี้เราจะไปลงลึกในรายละเอียดของแต่ละขั้นตอน
 
-<Tip>
-
-✏️ **ลองเลย!** เลือกสอง(หรือมากกว่านั้น) ข้อความของคุณเองและลองใส่มันเข้าไปใน `sentiment-analysis` pipeline. แล้วทำขั้นตอนต่างๆ ที่คุณเรียนผ่านมาใน section นี้และตรวจสอบดูว่าคุณได้ผลเหมือนเดิมหรือไม่!
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** เลือกสอง(หรือมากกว่านั้น) ข้อความของคุณเองและลองใส่มันเข้าไปใน `sentiment-analysis` pipeline. แล้วทำขั้นตอนต่างๆ ที่คุณเรียนผ่านมาใน section นี้และตรวจสอบดูว่าคุณได้ผลเหมือนเดิมหรือไม่!
diff --git a/chapters/th/chapter2/4.mdx b/chapters/th/chapter2/4.mdx
index d9ce8fb86..c22c0969b 100644
--- a/chapters/th/chapter2/4.mdx
+++ b/chapters/th/chapter2/4.mdx
@@ -216,11 +216,8 @@ print(ids)
 
 ผลลัพธ์เหล่านี้ เมื่อทำการแปลงไปเป็น tensor ที่เหมาะสมของ framework นั้นๆ แล้ว มันสามารถถูกนำไปใช้เป็นอินพุตของโมเดลเหมือนที่เราเห็นก่อนหน้าในนี้ในบทนี้
 
-<Tip>
-
-✏️ **ลองดูสิ!** ทำซ้ำสองขั้นตอนสุดท้าย(tokenization และแปลงไปเป็น input IDs) กับข้อความที่เราใช้เป็นอินพุตใน section 2 ("I've been waiting for a HuggingFace course my whole life." และ "I hate this so much!") และลองดูว่าคุณได้ input IDs เดียวกันกับที่เราได้ก่อนหน้านี้ไหม!
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองดูสิ!** ทำซ้ำสองขั้นตอนสุดท้าย(tokenization และแปลงไปเป็น input IDs) กับข้อความที่เราใช้เป็นอินพุตใน section 2 ("I've been waiting for a HuggingFace course my whole life." และ "I hate this so much!") และลองดูว่าคุณได้ input IDs เดียวกันกับที่เราได้ก่อนหน้านี้ไหม!
 
 ## การถอดรหัส(Decoding)
 
diff --git a/chapters/th/chapter2/5.mdx b/chapters/th/chapter2/5.mdx
index 518089952..1382ef789 100644
--- a/chapters/th/chapter2/5.mdx
+++ b/chapters/th/chapter2/5.mdx
@@ -180,11 +180,8 @@ batched_ids = [ids, ids]
 
 นี่คือชุด(batch)ของข้อมูลที่ประด้วยสองประโยคที่เหมือนกัน!
 
-<Tip>
-
-✏️ **Try it out!** แปลงลิสท์ของ `batched_ids` นี้ไปเป็น tensor และใส่เข้าไปในโมเดลของคุณ แล้วลองตรวจสอบดูว่าคุณได้ logits เหมือนกับที่ได้ก่อนหน้านี้ไหม(แต่ได้สองค่า)!
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** แปลงลิสท์ของ `batched_ids` นี้ไปเป็น tensor และใส่เข้าไปในโมเดลของคุณ แล้วลองตรวจสอบดูว่าคุณได้ logits เหมือนกับที่ได้ก่อนหน้านี้ไหม(แต่ได้สองค่า)!
 
 ฺBatching นั้นทำให้โมเดลสามารถทำงานได้เมื่อคุณใส่ประโยคหลายๆประโยคเข้าไป การใช้ประโยคหลายๆประโยคนั้นก็สามารถทำได้ง่ายเหมือนกับที่ทำกับประโยคเดียว แต่ก็ยังมีปัญหาที่สอง เมื่อคุณพยายามรวมประโยคตั้งแต่สองประโยคขึ้นไปเป็นชุดข้อมูลเดียวกัน แต่ประโยคเหล่านั้นอาจจะมีความยาวที่แตกต่างกัน ถ้าคุณเคยทำ tensors มาก่อนหน้านี้ คุณจะรู้ว่ามันจำเป็นต้องมีขนาดจตุรัส(rectangular) ดังนั้นคุณจะไม่สามารถแปลงลิสท์ของ input IDs ไปเป็น tensor ได้โดยตรง เราสามารถแก้ปัญหานี้ได้ด้วยการ *pad* อินพุต
 
@@ -316,11 +313,8 @@ tf.Tensor(
 
 สังเกตุว่าค่าสุดท้ายของประโยคที่สองนั้นเป็นอย่างไรใน padding ID ซึ่งก็คือค่า 0 ใน attention mask
 
-<Tip>
-
-✏️ **ลองเลย!** ทำ tokenization กับสองประโยคใน section 2 ("I've been waiting for a HuggingFace course my whole life." และ "I hate this so much!") แล้วใส่เข้าไปในโมเดลและตรวจสอบดูว่าคุณได้ logits เหมือนกับใน section 2 หรือไม่ หลังจากนั้นให้จับประโยครวมกันเป็นชุด(batch) โดยใช้ padding token แล้วสร้าง attention mask ที่ถูกต้อง และตรวจสอบดูว่าคุณได้ผลลัพท์เหมือนกันหรือไม่หลังจากใส่เข้าไปในโมเดลแล้ว!
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ทำ tokenization กับสองประโยคใน section 2 ("I've been waiting for a HuggingFace course my whole life." และ "I hate this so much!") แล้วใส่เข้าไปในโมเดลและตรวจสอบดูว่าคุณได้ logits เหมือนกับใน section 2 หรือไม่ หลังจากนั้นให้จับประโยครวมกันเป็นชุด(batch) โดยใช้ padding token แล้วสร้าง attention mask ที่ถูกต้อง และตรวจสอบดูว่าคุณได้ผลลัพธ์เหมือนกันหรือไม่หลังจากใส่เข้าไปในโมเดลแล้ว!
 
 ## ประโยคที่ยาวขึ้น
 
diff --git a/chapters/th/chapter3/2.mdx b/chapters/th/chapter3/2.mdx
index 0f8ed19a1..ecacf2b63 100644
--- a/chapters/th/chapter3/2.mdx
+++ b/chapters/th/chapter3/2.mdx
@@ -89,9 +89,8 @@ Hub นั้นไม่ได้เก็บเพียงแค่โมเ
 
 ไลบรารี่ 🤗 Datasets library มีคำสั่งที่ใช้งานได้ง่ายมากในการดาวโหลดและ cache ชุดข้อมูลที่อยู่บน Hub เราสามารถดาวโหลดชุดข้อมูล MRPC ได้ดังนี้:
 
-<Tip>
-⚠️ **คำเตือน** ตรวจสอบให้แน่ใจว่า `datasets` ได้ถูกติดตั้งโดยการรัน `pip install datasets` ก่อน จากนั้นโหลดชุดข้อมูล MRPC และพิมพ์เพื่อดูว่ามีอะไรบ้าง
-</Tip> 
+> [!TIP]
+> ⚠️ **คำเตือน** ตรวจสอบให้แน่ใจว่า `datasets` ได้ถูกติดตั้งโดยการรัน `pip install datasets` ก่อน จากนั้นโหลดชุดข้อมูล MRPC และพิมพ์เพื่อดูว่ามีอะไรบ้าง
 
 ```py
 from datasets import load_dataset
@@ -150,11 +149,8 @@ raw_train_dataset.features
 
 เราจะเห็นเบื้องหลังของ `label` ว่าเป็นข้อมูลชนิด `ClassLabel` โดยข้อมูลการ mapping integers เข้ากับชื่อ label นั้นเก็บอยู่ในโฟลเดอร์ *names* โดย `0` จะตรงกับ `not_equivalent` และ `1` ตรงกับ `equivalent`
 
-<Tip>
-
-✏️ **ลองเลย!** ลองดูที่ element 15 ของ training set และ element 87 ของ validation set ว่ามี label เป็นอะไร?
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ลองดูที่ element 15 ของ training set และ element 87 ของ validation set ว่ามี label เป็นอะไร?
 
 ### การประมวลผลชุดข้อมูล
 
@@ -192,11 +188,8 @@ inputs
 
 เราได้อธิบายเกี่ยวกับ keys ที่ชื่อ `input_ids` และ `attention_mask` ไปแล้วใน [Chapter 2](/course/chapter2) แต่เรายังไม่ได้พูดถึง `token_type_ids` ซึ่งในตัวอย่างนี้ ตัว token_type_ids นี่เองที่เป็นตัวบอกโมเดลว่าส่วนไหนของ input ที่เป็นประโยคแรก และส่วนไหนที่เป็นประโยคที่สอง
 
-<Tip>
-
-✏️ **ลองเลย!** ลองเลือก element 15 ของ training set มาลอง tokenize ประโยคทั้งสองแยกกันทีละประโยค และลอง tokenize เป็นคู่มาเทียบกันดู การ tokenize สองแบบนี้ให้ผลลัพธ์ที่ต่างกันอย่างไร?
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ลองเลือก element 15 ของ training set มาลอง tokenize ประโยคทั้งสองแยกกันทีละประโยค และลอง tokenize เป็นคู่มาเทียบกันดู การ tokenize สองแบบนี้ให้ผลลัพธ์ที่ต่างกันอย่างไร?
 
 ถ้าเรา decode ข้อมูล IDs ที่อยู่ใน `input_ids` กลับไปเป็นคำ:
 
@@ -353,11 +346,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ **ลองเลย!** ลองทำการประมวลผลแบบนี้กับชุดข้อมูล GLUE SST-2 ดู มันจะต่างจากตัวอย่างนี้เล็กน้อย เนื่องจากชุดข้อมูลนั้นประกอบไปด้วยประโยคเดียวแทนที่จะเป็นคู่ประโยค แต่ส่วนที่เหลือก็เหมือนกัน ถ้าอยากลองความท้าทายที่ยากขึ้นไปอีก ให้ลองเขียนฟังก์ชั่นประมวลผลที่ใช้กับ GLUE tasks ได้ทุก task ดูสิ
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** ลองทำการประมวลผลแบบนี้กับชุดข้อมูล GLUE SST-2 ดู มันจะต่างจากตัวอย่างนี้เล็กน้อย เนื่องจากชุดข้อมูลนั้นประกอบไปด้วยประโยคเดียวแทนที่จะเป็นคู่ประโยค แต่ส่วนที่เหลือก็เหมือนกัน ถ้าอยากลองความท้าทายที่ยากขึ้นไปอีก ให้ลองเขียนฟังก์ชั่นประมวลผลที่ใช้กับ GLUE tasks ได้ทุก task ดูสิ
 
 {#if fw === 'tf'}
 
diff --git a/chapters/th/chapter3/3.mdx b/chapters/th/chapter3/3.mdx
index 311766740..df54574bb 100644
--- a/chapters/th/chapter3/3.mdx
+++ b/chapters/th/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 ถ้าคุณต้องการจะอัพโหลดโมเดลของคุณขึ้น Hub ระหว่างที่ทำการเทรนโดยอัตโนมัติ ให้ใส่ `push_to_hub=True` เข้าไปใน `TrainingArguments` ด้วย โดยเราจะเรียนรู้เพิ่มเติมใน [Chapter 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 ถ้าคุณต้องการจะอัพโหลดโมเดลของคุณขึ้น Hub ระหว่างที่ทำการเทรนโดยอัตโนมัติ ให้ใส่ `push_to_hub=True` เข้าไปใน `TrainingArguments` ด้วย โดยเราจะเรียนรู้เพิ่มเติมใน [Chapter 4](/course/chapter4/3)
 
 ขั้นตอนที่สองคือการกำหนดโมเดลของพวกเรา เหมือนกับใน [previous chapter](/course/chapter2) เราจะใช้ `AutoModelForSequenceClassification` class โดยมี 2 labels:
 
@@ -164,9 +161,6 @@ trainer.train()
 
 ก็เป็นอันเสร็จสิ้นวิธีการ fine-tune โดยใช้ `Trainer` API ซึ่งตัวอย่างการ fine-tune กับ NLP tasks ส่วนใหญ่ที่ใช้บ่อยจะอยู่ใน Chapter 7 แต่ตอนนี้เรามาดูการทำแบบเดียวกันนี้โดยใช้ PyTorch เพียงอย่างเดียวกัน
 
-<Tip>
-
-✏️ **ลองเลย!** Fine-tune โมเดลโดยใช้ GLUE SST-2 dataset โดยใช้การประมวลผลข้อมูลแบบที่คุณทำไว้ใน section 2
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** Fine-tune โมเดลโดยใช้ GLUE SST-2 dataset โดยใช้การประมวลผลข้อมูลแบบที่คุณทำไว้ใน section 2
 
diff --git a/chapters/th/chapter3/3_tf.mdx b/chapters/th/chapter3/3_tf.mdx
index 58a9cd8dd..09ecd1509 100644
--- a/chapters/th/chapter3/3_tf.mdx
+++ b/chapters/th/chapter3/3_tf.mdx
@@ -71,11 +71,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_lab
 เพื่อจะ fine-tune โมเดลด้วย dataset ของเรา เราแค่ต้องทำการ `compile()` โมเดลของเราแล้วส่งข้อมูลเข้าโดยใช้เมธอด `fit()` ซึ่งจะเป็นการเริ่ม fine-tuning (ซึ่งจะใช้เวลาไม่กี่นาทีบน GPU) และรายงาน training loss ระหว่างการเทรน รวมถึง validation loss เมื่อจบแต่ละ epoch 
 To fine-tune the model on our dataset, we just have to `compile()` our model and then pass our data to the `fit()` method. This will start the fine-tuning process (which should take a couple of minutes on a GPU) and report training loss as it goes, plus the validation loss at the end of each epoch.
 
-<Tip>
-
-ควรสังเกตว่าโมเดล 🤗 Transformers มีความสามารถพิเศษที่โมเดล Keras ส่วนใหญ่ไม่มี นั่นก็คือ พวกมันสามารถเลือก loss ที่เหมาะสมได้เอง โดยมันจะใช้ค่า loss นี้เป็นค่าเริ่มต้นหากคุณไม่ได้ใส่อากิวเมนต์ loss ในเมธอด `compile()` นอกจากนี้ควรระวังว่า การจะใช้ internal loss คุณจะต้องส่ง labels ของคุณเข้าไปเป็นส่วนหนึ่งของ input ด้วย ห้ามส่งแยกกัน ซึ่งเป็นวิธีการปกติที่ใช้จัดการกับ labels กับโมเดล Keras คุณจะเป็นตัวอย่างนี้ใน Part 2 ของคอร์ส ซึ่งการจะกำหนด loss function ให้ถูกต้องนั้นจะยุ่งยากเล็กน้อย อย่างไรก็ตาม สำหรับงาน sequence classification สามารถใช้ standard Keras loss function ได้โดยไม่มีปัญหา ซึ่งเราก็จะใช้แบบนั้นในตัวอย่างนี้
-
-</Tip>
+> [!TIP]
+> ควรสังเกตว่าโมเดล 🤗 Transformers มีความสามารถพิเศษที่โมเดล Keras ส่วนใหญ่ไม่มี นั่นก็คือ พวกมันสามารถเลือก loss ที่เหมาะสมได้เอง โดยมันจะใช้ค่า loss นี้เป็นค่าเริ่มต้นหากคุณไม่ได้ใส่อากิวเมนต์ loss ในเมธอด `compile()` นอกจากนี้ควรระวังว่า การจะใช้ internal loss คุณจะต้องส่ง labels ของคุณเข้าไปเป็นส่วนหนึ่งของ input ด้วย ห้ามส่งแยกกัน ซึ่งเป็นวิธีการปกติที่ใช้จัดการกับ labels กับโมเดล Keras คุณจะเห็นตัวอย่างนี้ใน Part 2 ของคอร์ส ซึ่งการจะกำหนด loss function ให้ถูกต้องนั้นจะยุ่งยากเล็กน้อย อย่างไรก็ตาม สำหรับงาน sequence classification สามารถใช้ standard Keras loss function ได้โดยไม่มีปัญหา ซึ่งเราก็จะใช้แบบนั้นในตัวอย่างนี้
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -91,11 +88,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-ให้ระวังข้อผิดพลาดที่เกิดขึ้นบ่อยตรงนี้ - คุณ *สามารถ* แค่ใส่ชื่อของ loss เป็น string เข้าไปใน Keras แต่โดยค่าเริ่มต้นแล้ว Keras จะสันนิษฐานว่าคุณได้ใช้ softmax กับ outputs ของคุณไปแล้ว อย่างไรก็ตามมีโมเดลจำนวนมากที่ให้ผลลัพธ์เป็นค่าก่อนที่จะใช้ softmax ซึ่งเรียกว่า logits เราจะต้องบอก loss function ว่าโมเดลของเราทำอะไร และวิธีการเดียวที่จะทำได้คือการ call โดยตรง ไม่ใช่การส่งชื่อที่เป็น string เข้าไป
-
-</Tip>
+> [!WARNING]
+> ให้ระวังข้อผิดพลาดที่เกิดขึ้นบ่อยตรงนี้ - คุณ *สามารถ* แค่ใส่ชื่อของ loss เป็น string เข้าไปใน Keras แต่โดยค่าเริ่มต้นแล้ว Keras จะสันนิษฐานว่าคุณได้ใช้ softmax กับ outputs ของคุณไปแล้ว อย่างไรก็ตามมีโมเดลจำนวนมากที่ให้ผลลัพธ์เป็นค่าก่อนที่จะใช้ softmax ซึ่งเรียกว่า logits เราจะต้องบอก loss function ว่าโมเดลของเราทำอะไร และวิธีการเดียวที่จะทำได้คือการ call โดยตรง ไม่ใช่การส่งชื่อที่เป็น string เข้าไป
 
 
 ### การปรับปรุงประสิทธิภาพในการเทรน
@@ -129,11 +123,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-ไลบรารี่ 🤗 Transformers ก็มีฟังก์ชั่น `create_optimizer()` ซึ่งจะสร้าง `AdamW` optimizer โดยใช้ learning rate decay ซึ่งเป็นทางลัดที่สะดวก และคุณจะได้เห็นรายละเอียดใน section ต่อ ๆ ไปในคอร์ส
-
-</Tip>
+> [!TIP]
+> ไลบรารี่ 🤗 Transformers ก็มีฟังก์ชั่น `create_optimizer()` ซึ่งจะสร้าง `AdamW` optimizer โดยใช้ learning rate decay ซึ่งเป็นทางลัดที่สะดวก และคุณจะได้เห็นรายละเอียดใน section ต่อ ๆ ไปในคอร์ส
 
 ตอนนี้เราก็มี optimizer ตัวใหม่เอี่ยม และสามารถนำไปลองเทรนได้เลย ขั้นแรก เรามาโหลดโมเดลขึ้นมากใหม่ เพื่อ reset weights จากการเทรนก่อนหน้านี้ จากนั้นเราก็จะ compile โมเดลโดยใช้ optimizer ตัวใหม่
 
@@ -151,11 +142,8 @@ model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 ถ้าคุณต้องการจะอัพโหลดโมเดลของคุณขึ้น Hub ระหว่างที่ทำการเทรนโดยอัตโนมัติ ให้ใส่ `push_to_hub=True` เข้าไปใน `TrainingArguments` ด้วย โดยเราจะเรียนรู้เพิ่มเติมใน [Chapter 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 ถ้าคุณต้องการจะอัพโหลดโมเดลของคุณขึ้น Hub ระหว่างที่ทำการเทรนโดยอัตโนมัติ ให้ใส่ `push_to_hub=True` เข้าไปใน `TrainingArguments` ด้วย โดยเราจะเรียนรู้เพิ่มเติมใน [Chapter 4](/course/chapter4/3)
 
 ### การทำนายผลของโมเดล
 
diff --git a/chapters/th/chapter3/4.mdx b/chapters/th/chapter3/4.mdx
index 97a44a944..3587608dc 100644
--- a/chapters/th/chapter3/4.mdx
+++ b/chapters/th/chapter3/4.mdx
@@ -197,11 +197,8 @@ metric.compute()
 
 ผลลัพธ์ที่ได้อาจแตกต่างไปเล็กน้อยเนื่องจากมีการสุ่มค่า weight ตอนสร้าง model head และมีการสลับข้อมูลแบบสุ่ม แต่ผลที่ได้ก็ควรจะใกล้เคียงกัน
 
-<Tip>
-
-✏️ **ลองเลย!** แก้ไขลูปในการเทรนก่อนหน้านี้เพื่อทำการ fine-tune โมเดลของคุณด้วย SST-2 dataset.
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองเลย!** แก้ไขลูปในการเทรนก่อนหน้านี้เพื่อทำการ fine-tune โมเดลของคุณด้วย SST-2 dataset.
 
 ### เร่งความเร็วให้ลูปในการเทรนของคุณด้วย 🤗 Accelerate
 
@@ -292,9 +289,8 @@ for epoch in range(num_epochs):
 
 จากนัั้นก็มีการทำงานหลัก ๆ ในบรรทัดที่ส่ง dataloaders, โมเดล และ optimizer เข้าไปที่ `accelerator.prepare()` ซึ่งเป็นการ wrap ออพเจ็กต์เหล่านี้ให้อยู่ใน container ที่เหมาะสม และทำให้แน่ใจว่า distributed training ของคุณทำงานได้ตามที่ตั้งใจไว้ การเปลี่ยนแปลงส่วนที่เหลือคือการเอาโค้ดส่วนที่คุณใส่ batch เข้าไปใน `device` ออก (และอีกครั้ง ถ้าคุณอยากคงไว้ ก็เปลี่ยนจาก `device` เป็น `accelerator.device` และแก้จาก `loss.backward()` เป็น `accelerator.backward(loss)`)
 
-<Tip>
-⚠️ เพื่อที่จะได้ประโยชน์จากความเร็วที่เพิ่มขึ้นจากการใช้ Cloud TPUs เราแนะนำให้คุณเติมข้อมูลของคุณด้วยความยาวที่คงที่โดยการกำหนดอากิวเมนต์ `padding="max_length"` และ `max_length` ให้กับ tokenizer
-</Tip>
+> [!TIP]
+> ⚠️ เพื่อที่จะได้ประโยชน์จากความเร็วที่เพิ่มขึ้นจากการใช้ Cloud TPUs เราแนะนำให้คุณเติมข้อมูลของคุณด้วยความยาวที่คงที่โดยการกำหนดอากิวเมนต์ `padding="max_length"` และ `max_length` ให้กับ tokenizer
 
 ถ้าคุณอยากคัดลองและวางโค้ดเพื่อทดลองดู โค้ดข้างล่างนี้คือตัวอย่างของลูปในการเทรนโดยใช้ 🤗 Accelerate แบบสมบูรณ์:
 
diff --git a/chapters/th/chapter4/2.mdx b/chapters/th/chapter4/2.mdx
index 9881ecc0d..586e4ca35 100644
--- a/chapters/th/chapter4/2.mdx
+++ b/chapters/th/chapter4/2.mdx
@@ -91,6 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-เมื่อมีการใช้งานโมเดลที่ผ่านการเทรนมาแล้ว (pretrained model) คุณควรตรวจสอบให้มั่นใจว่ามันถูกเทรนมาอย่างไร กับชุดข้อมูลไหน ขีดจำกัด (limits) และความลำเอียง (biases) คืออะไร ซึ่งข้อมูลทั้งหมดนี้ควรถูกระบุอยู่ในการ์ดโมเดล (model card)
-</Tip>
+> [!TIP]
+> เมื่อมีการใช้งานโมเดลที่ผ่านการเทรนมาแล้ว (pretrained model) คุณควรตรวจสอบให้มั่นใจว่ามันถูกเทรนมาอย่างไร กับชุดข้อมูลไหน ขีดจำกัด (limits) และความลำเอียง (biases) คืออะไร ซึ่งข้อมูลทั้งหมดนี้ควรถูกระบุอยู่ในการ์ดโมเดล (model card)
diff --git a/chapters/th/chapter4/3.mdx b/chapters/th/chapter4/3.mdx
index 7558c85fc..cb9966a56 100644
--- a/chapters/th/chapter4/3.mdx
+++ b/chapters/th/chapter4/3.mdx
@@ -172,11 +172,8 @@ tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token=
 </div>
 {/if}
 
-<Tip>
-
-✏️ **ทดลองใช้ได้เลย!** นำโมเดลและ tokenizer ที่เกี่ยวข้องกับ `bert-base-cased` checkpoint และอัพโหลดขึ้นไปบน repo ใน namespace ของคุณด้วยคำสั่ง `push_to_hub()` จากนั้นลองตรวจสอบดูว่า repo ปรากฏออกมาในรูปแบบที่สมควรจะเป็นบนหน้าของคุณ ก่อนที่จะลบมันออกไป
-
-</Tip>
+> [!TIP]
+> ✏️ **ทดลองใช้ได้เลย!** นำโมเดลและ tokenizer ที่เกี่ยวข้องกับ `bert-base-cased` checkpoint และอัพโหลดขึ้นไปบน repo ใน namespace ของคุณด้วยคำสั่ง `push_to_hub()` จากนั้นลองตรวจสอบดูว่า repo ปรากฏออกมาในรูปแบบที่สมควรจะเป็นบนหน้าของคุณ ก่อนที่จะลบมันออกไป
 
 อย่างที่คุณได้เห็น คำสั่ง `push_to_hub()` สามารถรับ arguments ได้มากมาย ทำให้สามารถอัพไฟล์ไปยัง repository หรือ namespace ขององค์กรแบบเจาะจง หรือใช้งานโทเค็น API ที่ต่างกันได้ เราแนะนำให้คุณลองเปิดดูรายละเอียดของคำสั่งนี้ได้โดยตรงที่ [🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html) เพื่อจะได้เข้าใจมากขึ้นว่าคำสั่งนี้สามารถทำอะไรได้บ้าง
 
@@ -465,9 +462,8 @@ config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_mod
 
 {/if}
 
-<Tip>
-✏️ ตอนที่สร้าง repository จาก web interface ไฟล์ *.gitattributes* ถูกตั้งค่าอย่างอัตโนมัติเพื่อให้พิจารณาไฟล์บางประเภท เช่น *.bin* และ *.h5* ว่าเป็นไฟล์ขนาดใหญ่ และ git-lfs จะติดตามไฟล์เหล่านั้นโดยคุณไม่จำเป็นต้องตั้งค่าอะไรเลย
-</Tip> 
+> [!TIP]
+> ✏️ ตอนที่สร้าง repository จาก web interface ไฟล์ *.gitattributes* ถูกตั้งค่าอย่างอัตโนมัติเพื่อให้พิจารณาไฟล์บางประเภท เช่น *.bin* และ *.h5* ว่าเป็นไฟล์ขนาดใหญ่ และ git-lfs จะติดตามไฟล์เหล่านั้นโดยคุณไม่จำเป็นต้องตั้งค่าอะไรเลย
 
 ต่อไปเราจะทำต่อโดยการทำแบบที่เราทำกับ Git repositories ดั้งเดิม เราสามารถเพิ่มไฟล์ทั้งหมดไปยัง staging environment ของ Git ได้ด้วยการใช้คำสั่ง `git add`:
 
diff --git a/chapters/th/chapter6/2.mdx b/chapters/th/chapter6/2.mdx
index 3b1e55c73..efb9c9bee 100644
--- a/chapters/th/chapter6/2.mdx
+++ b/chapters/th/chapter6/2.mdx
@@ -18,12 +18,10 @@
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ การเทรน tokenize จะไม่เหมือนการกับเทรนโมเดลทั่วไป ในการเทรนโมเดลทั่วไปเราใช้ stochastic gradient descent เพื่อลดค่า loss ในทุก batch กระบวนการนี้มีความ random อยู่ในตัวของมัน (ซึ่งแปลว่า ถ้าคุณเทรนโมเดลสองครั้งแล้วอยากได้ผลลัพธ์ที่เหมือนกัน คุณจะต้องตั้งค่า seed ของการ random ให้เหมือนกันในทุกครั้งที่คุณเทรน)
-ส่วนการเทรน tokenize เป็นกระบวนการทางสถิติที่พยายามจะค้นหาคำย่อยที่เหมาะสมที่สุดจากคลังข้อมูลหนึ่ง วิธีในการเลือกค้นหาคำย่อยนี้ก็มีหลากหลายวิธี
-ผลลัพธ์ของการเทรนประเภทนี้จะมีความคงที่ (deterministic) ซึ่งแปลว่าคุณจะได้ผลลัพธ์เดิมทุกครั้งหลังจากการเทรน ถ้าหากคุณใช้อัลกอริทึมและข้อมูลเดิมทุกครั้ง
-</Tip>
+> [!WARNING]
+> ⚠️ การเทรน tokenizer จะไม่เหมือนกับการเทรนโมเดลทั่วไป ในการเทรนโมเดลทั่วไปเราใช้ stochastic gradient descent เพื่อลดค่า loss ในทุก batch กระบวนการนี้มีความ random อยู่ในตัวของมัน (ซึ่งแปลว่า ถ้าคุณเทรนโมเดลสองครั้งแล้วอยากได้ผลลัพธ์ที่เหมือนกัน คุณจะต้องตั้งค่า seed ของการ random ให้เหมือนกันในทุกครั้งที่คุณเทรน)
+> ส่วนการเทรน tokenizer เป็นกระบวนการทางสถิติที่พยายามจะค้นหาคำย่อยที่เหมาะสมที่สุดจากคลังข้อมูลหนึ่ง วิธีในการเลือกค้นหาคำย่อยนี้ก็มีหลากหลายวิธี
+> ผลลัพธ์ของการเทรนประเภทนี้จะมีความคงที่ (deterministic) ซึ่งแปลว่าคุณจะได้ผลลัพธ์เดิมทุกครั้งหลังจากการเทรน ถ้าหากคุณใช้อัลกอริทึมและข้อมูลเดิมทุกครั้ง
 
 ## การสร้างคลังข้อมูล (Assembling a corpus)
 
diff --git a/chapters/th/chapter6/3.mdx b/chapters/th/chapter6/3.mdx
index 6d58f3b79..4ccdad814 100644
--- a/chapters/th/chapter6/3.mdx
+++ b/chapters/th/chapter6/3.mdx
@@ -39,11 +39,8 @@
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ ถ้าคุณเปรียบเทียบ tokenizer ทั้งสองแบบ โดยดูจากความเร็วในการตัดคำของประโยคเดียว คุณอาจจะไม่เห็นความแตกต่างมาก และบางที fast tokenizer อาจจะช้ากว่า slow tokenizer ด้วยซ้ำ คุณจะเห็นความแตกต่างที่แท้จริง ก็เมื่อลองรันกับ input ที่มีขนาดใหญ่ระดับหนึ่ง เพราะการทำแบบนี้ จะทำให้การประมวลผลแบบ parallel ถูกเรียกใช้งาน
-
-</Tip>
+> [!WARNING]
+> ⚠️ ถ้าคุณเปรียบเทียบ tokenizer ทั้งสองแบบ โดยดูจากความเร็วในการตัดคำของประโยคเดียว คุณอาจจะไม่เห็นความแตกต่างมาก และบางที fast tokenizer อาจจะช้ากว่า slow tokenizer ด้วยซ้ำ คุณจะเห็นความแตกต่างที่แท้จริง ก็เมื่อลองรันกับ input ที่มีขนาดใหญ่ระดับหนึ่ง เพราะการทำแบบนี้ จะทำให้การประมวลผลแบบ parallel ถูกเรียกใช้งาน
 
 ## Batch encoding
 
@@ -123,15 +120,12 @@ encoding.word_ids()
 ในบทหน้า เราจะมาดูกันว่า เราจะใช้ feature นี้เพื่อจับคู่ label กับ token ใน task เช่น entity recognition (NER) and part-of-speech (POS) tagging ได้อย่างไร
 นอกจากนั้น คุณยังสามารถใช้ feature นี้ เพื่อทำการปกปิด(mask) token ทุกตัวที่มาจากคำเดียวกัน เวลาใช้ masked language modeling ได้อีกด้วย (การทำแบบนี้เราเรียกว่า _whole word masking_)
 
-<Tip>
-
-นิยามของ "คำ" นั้นค่อนข้างยากที่จะกำหนด ตัวอย่างเช่น  "I'll" (เป็นการเขียนแบบสั้นของ "I will" ) ควรนับเป็นหนึ่งหรือสองคำ ?
-คำตอบของคำถามนี้นั้น ขึ้นกับว่า คุณใช้ตัวตัดคำแบบไหน และมีการปรับแต่งข้อความ input ก่อนที่จะทำการตัดคำหรือไม่
-ตัวตัดคำบางตัว อาจจะแยกคำด้วย space บางตัวอาจจะแยกคำด้วยเครื่องหมายวรรคตอน (punctuation) ก่อนแล้วจึงแบ่งด้วย space ในกรณีหลังนี้ "I'll" ก็จะถูกแบ่งเป็นสองคำ
-
-✏️ **ลองทำดู!** ให้คุณลองสร้างตัวตัดคำจาก checkpoint ของ `bert-base-cased` and `roberta-base` แล้วให้ลองตัดคำว่า "81s"  คุณสังเกตเห็นอะไรบ้าง และ ID ของคำที่ได้คืออะไร
-
-</Tip>
+> [!TIP]
+> นิยามของ "คำ" นั้นค่อนข้างยากที่จะกำหนด ตัวอย่างเช่น  "I'll" (เป็นการเขียนแบบสั้นของ "I will" ) ควรนับเป็นหนึ่งหรือสองคำ ?
+> คำตอบของคำถามนี้นั้น ขึ้นกับว่า คุณใช้ตัวตัดคำแบบไหน และมีการปรับแต่งข้อความ input ก่อนที่จะทำการตัดคำหรือไม่
+> ตัวตัดคำบางตัว อาจจะแยกคำด้วย space บางตัวอาจจะแยกคำด้วยเครื่องหมายวรรคตอน (punctuation) ก่อนแล้วจึงแบ่งด้วย space ในกรณีหลังนี้ "I'll" ก็จะถูกแบ่งเป็นสองคำ
+>
+> ✏️ **ลองทำดู!** ให้คุณลองสร้างตัวตัดคำจาก checkpoint ของ `bert-base-cased` และ `roberta-base` แล้วให้ลองตัดคำว่า "81s"  คุณสังเกตเห็นอะไรบ้าง และ ID ของคำที่ได้คืออะไร
 
 นอกจากนั้นยังมี method คล้ายๆกัน ที่ชื่อ `sentence_ids()` ที่เอาไว้ใช้เพื่อโยง token ไปหาประโยคต้นตอ ในตัวอย่างของเรา คุณสามารถใช้ `token_type_ids` ซึ่งเป็นผลลัพธ์จากการรัน tokenizer แทน `sentence_ids()` ได้ เพราะทั้งสองให้ข้อมูลเดียวกัน
 
@@ -151,12 +145,8 @@ Sylvain
 อย่างที่เราได้บอกข้างต้นแล้ว fast tokenizer สามารถทำแบบนี้ได้ เพราะมันเก็บข้อมูลเกี่ยวกับ span ของแต่ละ token เอาไว้ และบันทึกไว้ใน list ของ *offsets*
 เพื่อที่จะอธิบายการใช้งานของ feature นี้ เรามาลองคำนวณผลลัพธ์ของ pipeline `token-classification` กัน
 
-<Tip>
-
-✏️ **ลองทำดู!** ให้คุณลองคิดข้อความตัวอย่างขึ้นมา แล้วถามตัวเองว่า token ตัวไหนคู่กับ ID ของคำไหน และ คุณจะหา span ของแต่ละคำได้อย่างไร นอกจากนั้น ให้คุณลองสร้างสองประโยคเพื่อเป็น input ให้กับตัวตัดคำของคุณ แล้วดูว่า ID ของประโยคนั้นเหมาะสมหรือไม่
-
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองทำดู!** ให้คุณลองคิดข้อความตัวอย่างขึ้นมา แล้วถามตัวเองว่า token ตัวไหนคู่กับ ID ของคำไหน และ คุณจะหา span ของแต่ละคำได้อย่างไร นอกจากนั้น ให้คุณลองสร้างสองประโยคเพื่อเป็น input ให้กับตัวตัดคำของคุณ แล้วดูว่า ID ของประโยคนั้นเหมาะสมหรือไม่
 
 ## โครงสร้างภายในของ pipeline `token-classification`
 
diff --git a/chapters/th/chapter6/3b.mdx b/chapters/th/chapter6/3b.mdx
index 53f0893fd..765e695bd 100644
--- a/chapters/th/chapter6/3b.mdx
+++ b/chapters/th/chapter6/3b.mdx
@@ -278,11 +278,8 @@ print(scores[start_index, end_index])
 0.97773
 ```
 
-<Tip>
-
-✏️ **ลองทำดู!** คำนวณ index เริ่มต้นและสิ้นสุด เพื่อหาคำตอบที่น่าจะเป็นไปได้มากที่สุด 5 คำตอบ
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองทำดู!** คำนวณ index เริ่มต้นและสิ้นสุด เพื่อหาคำตอบที่น่าจะเป็นไปได้มากที่สุด 5 คำตอบ
 
 เรามี `start_index` และ `end_index` ของ token ที่จะเอามาเป็นคำตอบได้แล้ว ดังนั้นตอนนี้เราเพียงแค่ต้องแปลงเป็น index ของตัวอักษร ใน context เท่านั้น นี่คือจุดที่ offsets จะมีประโยชน์มาก เราสามารถใช้งานมันได้เหมือนที่เราทำใน token classification:
 
@@ -316,10 +313,8 @@ print(result)
 
 ยอดเยี่ยม! เราได้คำตอบเหมือนกับในตัวอย่างแรกของเรา!
 
-<Tip>
-
-✏️ **ลองดูสิ!** ใช้คะแนนที่ดีที่สุดที่คุณคำนวณไว้ก่อนหน้านี้ เพื่อคำนวณคำตอบที่น่าจะเป็นไปได้มากที่สุดห้าลำดับ ในการตรวจสอบผลลัพธ์ของคุณ ให้กลับไปที่ pipeline แรกแล้วตั้งค่า `top_k=5` ตอนที่รัน pipeline
-</Tip>
+> [!TIP]
+> ✏️ **ลองดูสิ!** ใช้คะแนนที่ดีที่สุดที่คุณคำนวณไว้ก่อนหน้านี้ เพื่อคำนวณคำตอบที่น่าจะเป็นไปได้มากที่สุดห้าลำดับ ในการตรวจสอบผลลัพธ์ของคุณ ให้กลับไปที่ pipeline แรกแล้วตั้งค่า `top_k=5` ตอนที่รัน pipeline
 
 ## การจัดการกับบริบทยาว (long contexts)
 
@@ -614,11 +609,8 @@ print(candidates)
 ```
 
 output ที่เราได้คือ span คำตอบที่ดีที่สุดของแต่ละประโยคย่อย ที่โมเดลคำนวณได้ เราจะเห็นว่าโมเดลให้ค่าความมั่นใจที่สูงมากๆกับ span คำตอบในประโยคที่สองมากกว่าประโยคแรก (ซึ่งเป็นสัญญาณที่ดี!) สิ่งที่เราต้องทำหลังจากนี้ก็คือ map ค่า span ไปสู่ตัวอักษร เพื่อดูว่า คำตอบที่โมเดลคำนวณได้คืออะไร
-<Tip>
-
-✏️ **ลองดูสิ!** ปรับโค้ดด้านบนเพื่อให้มัน return score และ span ของคำตอบที่น่าจะเป็นไปได้มากที่สุด 5 ลำดับ (โดยเปรียบเทียบ score ของทุกประโยคย่อย)
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองดูสิ!** ปรับโค้ดด้านบนเพื่อให้มัน return score และ span ของคำตอบที่น่าจะเป็นไปได้มากที่สุด 5 ลำดับ (โดยเปรียบเทียบ score ของทุกประโยคย่อย)
 
 ค่า `offsets` ที่เราใช้ก่อนหน้านี้ เป็น list ของ offsets โดยที่แต่ละประโยคย่อยจะมีหนึ่ง list :
 
@@ -639,10 +631,7 @@ for candidate, offset in zip(candidates, offsets):
 
 ถ้าไม่นับผลลัพธ์แรกที่เรา print ออกมาด้วย เราก็จะได้ผลลัพธ์เดียวกันกับผลลัพธ์จากไปป์ไลน์ -- เย้!
 
-<Tip>
-
-✏️ **ลองดูสิ!** ใช้ score ที่ดีที่สุดที่คุณคำนวณได้ก่อนหน้านี้ เพื่อแสดงคำตอบที่น่าจะเป็นไปได้มากที่สุด 5 ลำดับ (สำหรับบริบททั้งหมด ไม่ใช่แต่ละส่วน) เพื่อตรวจสอบผลลัพธ์ของคุณ ให้กลับไปที่ไปป์ไลน์แรกแล้วตั้งค่า `top_k=5` เวลารัน
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองดูสิ!** ใช้ score ที่ดีที่สุดที่คุณคำนวณได้ก่อนหน้านี้ เพื่อแสดงคำตอบที่น่าจะเป็นไปได้มากที่สุด 5 ลำดับ (สำหรับบริบททั้งหมด ไม่ใช่แต่ละส่วน) เพื่อตรวจสอบผลลัพธ์ของคุณ ให้กลับไปที่ไปป์ไลน์แรกแล้วตั้งค่า `top_k=5` เวลารัน
 
 บทนี้ถือว่าเป็น การสรุปจบการเรียนรู้ความสามารถของ tokenizer แบบละเอียด ในบทต่อไปคุณจะได้ใช้ความรู้ที่เรียนมานี้ เพื่อฝึกฝนอีก โดยคุณจะได้ฝึก fine-tune โมเดลเพื่อ task ทั่วๆไป ของ NLP
\ No newline at end of file
diff --git a/chapters/th/chapter6/4.mdx b/chapters/th/chapter6/4.mdx
index c77352e2c..d549b9efa 100644
--- a/chapters/th/chapter6/4.mdx
+++ b/chapters/th/chapter6/4.mdx
@@ -50,11 +50,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 ในตัวอย่างนี้ เนื่องจากเราเลือกใช้ checkpoint `bert-base-uncased` การ normalization จึงแปลงข้อความเป็นตัวพิมพ์เล็กและลบเครื่องหมายเน้นเสียงออก
 
-<Tip>
-
-✏️ **ลองดูสิ!** โหลด tokenizer จาก checkpoint `bert-base-cased` และใช้มันกับ input เดียวกันกับข้างบนนี้ แล้วดูว่าผลลัพธ์ต่างกันอย่างไร ระหว่าง tokenizer เวอร์ชัน cased และ uncased
-
-</Tip>
+> [!TIP]
+> ✏️ **ลองดูสิ!** โหลด tokenizer จาก checkpoint `bert-base-cased` และใช้มันกับ input เดียวกันกับข้างบนนี้ แล้วดูว่าผลลัพธ์ต่างกันอย่างไร ระหว่าง tokenizer เวอร์ชัน cased และ uncased
 
 ## Pre-tokenization
 
diff --git a/chapters/th/chapter6/5.mdx b/chapters/th/chapter6/5.mdx
index 4d6961f70..266cf09b9 100644
--- a/chapters/th/chapter6/5.mdx
+++ b/chapters/th/chapter6/5.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 บทนี้จะพูดถึง BPE อย่างละเอียด เราจะเจาะลึกถึงไปถึงการ implement อัลกอริทึมนี้ คุณสามารถข้ามไปตอนท้ายได้ ถ้าคุณสนใจเพียงแค่ภาพรวมคร่าวๆเท่านั้น
-
-</Tip>
+> [!TIP]
+> 💡 บทนี้จะพูดถึง BPE อย่างละเอียด เราจะเจาะลึกไปถึงการ implement อัลกอริทึมนี้ คุณสามารถข้ามไปตอนท้ายได้ ถ้าคุณสนใจเพียงแค่ภาพรวมคร่าวๆเท่านั้น
 
 ## อัลกอริทึมที่ใช้ในการเทรน
 
@@ -29,11 +26,8 @@ vocabulary ตั้งต้นสำหรับชุดข้อมูลน
 
 ถ้าข้อความใหม่ที่คุณต้องการจะตัดคำ มีสัญลักษณ์ที่ไม่ได้อยู่ใน training corpus สัญลักษณ์พวกนี้จะถูกแปลงเป็น unknown token นี่เป็นเหตุผลว่าทำไมโมเดล NLP จึงประมวลผลข้อความที่มีอีโมจิได้ไม่ดีนัก
 
-<Tip>
-
-Tokenizer ของ GPT-2 และ RoBERTa (ซึ่งค่อนข้างคล้ายกัน) มีวิธีการจัดการกับปัญหานี้ได้อย่างประสิทธิภาพ มันจะไม่มองแต่ละคำเป็น Unicode แต่จะมองว่าเป็น byte การทำแบบนี้ทำให้ vocabulary ตั้งต้น มีขนาดที่เล็ก (256) แต่ยังสามารถบันทึกทุกๆสัญลักษณ์ได้ โดยไม่ต้องแปลงสัญลักษณ์พิเศษต่างๆเป็น unknown token เทคนิคนี้เรียกว่า *byte-level BPE*
-
-</Tip>
+> [!TIP]
+> Tokenizer ของ GPT-2 และ RoBERTa (ซึ่งค่อนข้างคล้ายกัน) มีวิธีการจัดการกับปัญหานี้ได้อย่างมีประสิทธิภาพ มันจะไม่มองแต่ละคำเป็น Unicode แต่จะมองว่าเป็น byte การทำแบบนี้ทำให้ vocabulary ตั้งต้น มีขนาดที่เล็ก (256) แต่ยังสามารถบันทึกทุกๆสัญลักษณ์ได้ โดยไม่ต้องแปลงสัญลักษณ์พิเศษต่างๆเป็น unknown token เทคนิคนี้เรียกว่า *byte-level BPE*
 
 หลังจากสร้าง vocabulary ตั้งต้นแล้ว เราจะเพิ่ม token ใหม่ๆ เข้าไปจนว่าจะได้ vocabulary ขนาดใหญ่พอกับที่เราต้องการ โดยเราจะเทรน BPE ให้เรียน กฎที่เรียกว่า *merges* ซึ่งเป็นกฎสำหรับการรวมสองหน่วยใน vocabulary เข้าด้วยกัน
 ตอนช่วงเริ่มต้น กฎ merges พวกนี้จะสร้างคำย่อยที่ประกอบด้วยตัวอักษรสองตัว ระหว่างที่เราเทรนต่อไปเรื่อยๆ คำย่อยที่ได้ก็จะยาวขึ้น
@@ -82,11 +76,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 เราจะทำแบบนี้ต่อไปเรื่อยๆ จนกว่าจะได้ขนาดของ vocabulary ที่ต้องการ
 
-<Tip>
-
-✏️ **ตาคุณแล้ว!** คุณคิดว่ากฎ merge ต่อไปคืออะไร
-
-</Tip>
+> [!TIP]
+> ✏️ **ตาคุณแล้ว!** คุณคิดว่ากฎ merge ต่อไปคืออะไร
 
 ## Tokenization algorithm
 
@@ -108,11 +99,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 คำว่า`"bug"` จะถูกแยกเป็น `["b", "ug"]` ส่วนคำว่า `"mug"` จะถูกแยกเป็น `["[UNK]", "ug"]` เพราะว่า `"m"` ไม่ได้อยู่ใน vocabulary ของเรา
 ้เช่นเดียวกัน คำว่า `"thug"` จะถูกแยกเป็น `["[UNK]", "hug"]` เพราะว่า `"t"` ไม่ได้อยู่ใน vocabulary กฎแรกจะรวม `"u"` และ `"g"` เข้าด้วยกัน จากนั้น `"hu"` และ `"g"` ก็จะถูกรวมเข้าด้วยกัน
 
-<Tip>
-
-✏️ **ตาคุณแล้ว!** คุณคิดว่าคำว่า `"unhug"` จะถูกแยกอย่างไร
-
-</Tip>
+> [!TIP]
+> ✏️ **ตาคุณแล้ว!** คุณคิดว่าคำว่า `"unhug"` จะถูกแยกอย่างไร
 
 ## การสร้าง BPE (Implementing BPE)
 
@@ -324,11 +312,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 ถ้าคุณใช้ `train_new_from_iterator()` กับ corpus เดียวกันนี้ คุณจะไม่ได้ vocabulary เดียวกัน เพราะว่าอาจจะมีหลายคู่ token ที่มีความถี่สูงสุดเท่ากัน ในตัวอย่างของเรา เราเลือกคู่แรกที่โค้ดของเราอ่านเจอ ส่วน 🤗 Tokenizers library เลือกคู่แรกโดยเรียงตาม ID
-
-</Tip>
+> [!TIP]
+> 💡 ถ้าคุณใช้ `train_new_from_iterator()` กับ corpus เดียวกันนี้ คุณจะไม่ได้ vocabulary เดียวกัน เพราะว่าอาจจะมีหลายคู่ token ที่มีความถี่สูงสุดเท่ากัน ในตัวอย่างของเรา เราเลือกคู่แรกที่โค้ดของเราอ่านเจอ ส่วน 🤗 Tokenizers library เลือกคู่แรกโดยเรียงตาม ID
 
 หากเราต้องการ tokenize ข้อความใดข้อความหนึ่ง สิ่งที่ต้องทำคือ pre-tokenize จากนั้นจึงทำการ tokenize และสุดท้าย apply กฎ merge :
 
@@ -360,12 +345,9 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ การ implementation ในตัวอย่างของเราจะ return error ถ้าโปรแกรมอ่านเจอตัวอักษรที่ไม่มีใน vocabulary นั่นเพราะว่าเราไม่ได้เขียนโค้ดเพื่อจัดการกับกรณีแบบนี้
-ใน GPT-2 ปกติจะไม่มี unknown token แบบนี้ เพราะว่า ถ้าเราใช้ byte-level BPE เราจะไม่มีทางได้ตัวอักษรที่ unknown อย่างไรก็ตามในตัวอย่างของเรา เราไม่ได้ใช้ทุกๆ byte เพื่อสร้าง vocabulary ตั้งต้น
-อย่างไรก็ตาม หัวข้อนี้นั้นค่อนข้างลึก เราจึงจะไม่ขอพูดถึงรายละเอียดไปมากกว่านี้
-
-</Tip>
+> [!WARNING]
+> ⚠️ การ implementation ในตัวอย่างของเราจะ return error ถ้าโปรแกรมอ่านเจอตัวอักษรที่ไม่มีใน vocabulary นั่นเพราะว่าเราไม่ได้เขียนโค้ดเพื่อจัดการกับกรณีแบบนี้
+> ใน GPT-2 ปกติจะไม่มี unknown token แบบนี้ เพราะว่า ถ้าเราใช้ byte-level BPE เราจะไม่มีทางได้ตัวอักษรที่ unknown อย่างไรก็ตามในตัวอย่างของเรา เราไม่ได้ใช้ทุกๆ byte เพื่อสร้าง vocabulary ตั้งต้น
+> อย่างไรก็ตาม หัวข้อนี้นั้นค่อนข้างลึก เราจึงจะไม่ขอพูดถึงรายละเอียดไปมากกว่านี้
 
 นี่ก็คือ อัลกอริทึม BPE ในบทต่อไป เราจะมาดู WordPiece กัน
\ No newline at end of file
diff --git a/chapters/th/chapter6/6.mdx b/chapters/th/chapter6/6.mdx
index 4a40d9747..01126ee86 100644
--- a/chapters/th/chapter6/6.mdx
+++ b/chapters/th/chapter6/6.mdx
@@ -13,19 +13,13 @@ WordPiece มีความคล้ายกับ BPE ในวิธีก
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 บทนี้จะพูดถึง WordPiece อย่างละเอียด เราจะเจาะลึกถึงไปถึงการ implement อัลกอริทึมนี้ คุณสามารถข้ามไปตอนท้ายได้ ถ้าคุณสนใจเพียงแค่ภาพรวมคร่าวๆเท่านั้น
-
-</Tip>
+> [!TIP]
+> 💡 บทนี้จะพูดถึง WordPiece อย่างละเอียด เราจะเจาะลึกไปถึงการ implement อัลกอริทึมนี้ คุณสามารถข้ามไปตอนท้ายได้ ถ้าคุณสนใจเพียงแค่ภาพรวมคร่าวๆเท่านั้น
 
 ## Training algorithm
 
-<Tip warning={true}>
-
-⚠️ เนื่องจาก Google ไม่เปิดเผยโค้ดสำหรับการเทรน WordPiece ดังนั้นโค้ดที่เราจะสอนคุณต่อจากนี้ มาจากการพยายามทำตามข้อมูลที่บอกไว้ใน paper แปลว่าโค้ดอาจจะไม่แม่นยำ 100%
-
-</Tip>
+> [!WARNING]
+> ⚠️ เนื่องจาก Google ไม่เปิดเผยโค้ดสำหรับการเทรน WordPiece ดังนั้นโค้ดที่เราจะสอนคุณต่อจากนี้ มาจากการพยายามทำตามข้อมูลที่บอกไว้ใน paper แปลว่าโค้ดอาจจะไม่แม่นยำ 100%
 
 เช่นเดียวกับ BPE อัลกอริทึม WordPiece เริ่มจาก vocabulary ขนาดเล็ก ที่ประกอบไปด้วย token พิเศษที่โมเดลใช้ และตัวอักษรตั้งต้น
 เพื่อที่โมเดลจะได้รู้ว่าคำไหนเป็นคำย่อย มันจะเขียน prefix เช่น `##` (ใช้ใน BERT) ไว้ข้างหน้าของแต่ละคำย่อย ในขั้นตอนแรก แต่ละคำจะถูกแบ่งออกเป็นตัวอักษร โดยตัวอักษรที่ไม่ใช่ตัวแรกจะมี prefix นี้
@@ -87,11 +81,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 เราจะทำแบบนี้จนกว่าจะได้ vocabulary ที่มีขนาดใหญ่มากพอ
 
-<Tip>
-
-✏️ **ตาคุณบ้างแล้ว!** กฎ merge ต่อไปคืออะไร
-
-</Tip>
+> [!TIP]
+> ✏️ **ตาคุณบ้างแล้ว!** กฎ merge ต่อไปคืออะไร
 
 ## Tokenization algorithm
 
@@ -110,11 +101,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 ตัวอย่างเช่นคำว่า `"mug"` จะถูก tokenize ให้เป็น `["[UNK]"]` เช่นเดียวกันกับคำว่า `"bum"` ถึงแม้ว่าเราจะเจอ `"b"` และ `"##u"` ใน vocabulary แต่ว่า `"##m"` ไม่ได้อยู่ใน  vocabulary เราจะแปลงทั้งคำเป็น `["[UNK]"]` และจะไม่แยกมันเป็น `["b", "##u", "[UNK]"]`
 นี่เป็นสิ่งหนึ่งที่แตกต่างจาก BPE โดย BPE จะดูที่แต่ละตัวอักษร และถ้าตัวไหนไม่พบใน vocabulary ก็จะถูกคัดว่าเป็น unknown
 
-<Tip>
-
-✏️ **ถึงตาคุณแล้ว!** คำว่า `"pugs"` จะถูก tokenize อย่างไร?
-
-</Tip>
+> [!TIP]
+> ✏️ **ถึงตาคุณแล้ว!** คำว่า `"pugs"` จะถูก tokenize อย่างไร?
 
 ## Implementing WordPiece
 
@@ -332,12 +320,8 @@ print(vocab)
 
 ถ้าเทียบกับ BPE คุณจะเห็นว่า tokenizer ตัวนี้สามารถเรียนเกี่ยวกับคำย่อยได้เร็วกว่านิดหน่อย
 
-<Tip>
-
-💡 ถ้าคุณใช้ `train_new_from_iterator()` กับ corpus ตัวอย่างนี้ คุณอาจจะไม่ได้ vocabulary เดียวกัน นั่นก็เพราะ 🤗 Tokenizers library ไม่ได้ใช้ WordPiece ในการเทรน แต่เราใช้ BPE
-
-
-</Tip>
+> [!TIP]
+> 💡 ถ้าคุณใช้ `train_new_from_iterator()` กับ corpus ตัวอย่างนี้ คุณอาจจะไม่ได้ vocabulary เดียวกัน นั่นก็เพราะ 🤗 Tokenizers library ไม่ได้ใช้ WordPiece ในการเทรน แต่ใช้ BPE แทน
 
 เมื่อคุณต้องการ tokenize ข้อความใหม่ คุณจะต้องทำการ pre-tokenize ข้อความแล้วจากนั้นจึง tokenize แต่ละคำ ตามหลักการของอัลกอริทึมนี้
 เราจะมองหาคำย่อยที่ยาวที่สุด โดยอ่านจากข้างหน้าคำไปข้างหลัง จากนั้นเราจะแยกคำหลักออกตรงคำย่อยนี้ จากนั้นทำขั้นตอนนี้ซ้ำกับส่วนต่อๆไปของคำนั้น แล้วทำเช่นเดียวกันกับคำต่อไป
diff --git a/chapters/th/chapter6/7.mdx b/chapters/th/chapter6/7.mdx
index add722178..dc37eefc7 100644
--- a/chapters/th/chapter6/7.mdx
+++ b/chapters/th/chapter6/7.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 บทนี้จะพูดถึง Unigram อย่างละเอียด เราจะเจาะลึกถึงไปถึงการ implement อัลกอริทึมนี้ คุณสามารถข้ามไปตอนท้ายได้ ถ้าคุณสนใจเพียงแค่ภาพรวมคร่าวๆเท่านั้น
-
-</Tip>
+> [!TIP]
+> 💡 บทนี้จะพูดถึง Unigram อย่างละเอียด เราจะเจาะลึกไปถึงการ implement อัลกอริทึมนี้ คุณสามารถข้ามไปตอนท้ายได้ ถ้าคุณสนใจเพียงแค่ภาพรวมคร่าวๆเท่านั้น
 
 ## Training algorithm
 
@@ -63,11 +60,8 @@ Unigram ถือว่าเป็น language model ประเภทที
 
 และผลรวมของทุกๆความถี่ก็คือ 210 ดังนั้นความน่าจะเป็นของ `"ug"` ก็คือ 20/210
 
-<Tip>
-
-✏️ **ตาคุณบ้างแล้ว!** ลองเขียนโค้ดเพื่อคำนวณความถี่ของแต่ละ token แบบตัวอย่างข้างบน และคำนวณผลรวมของทุกความถี่ด้วย แล้วเช็คว่าผลลัพธ์ของคุณถูกหรือไม่
-
-</Tip>
+> [!TIP]
+> ✏️ **ตาคุณบ้างแล้ว!** ลองเขียนโค้ดเพื่อคำนวณความถี่ของแต่ละ token แบบตัวอย่างข้างบน และคำนวณผลรวมของทุกความถี่ด้วย แล้วเช็คว่าผลลัพธ์ของคุณถูกหรือไม่
 
 ในการ tokenize คำๆหนึ่งนั้น เราจะคำนวณทุกๆการตัดคำที่เป็นไปได้ (segmentation) และคำนวณความน่าจะเป็นของแต่ละ segmentation ด้วย โดยใช้วิธีการคำนวณตามโมเดล Unigram
 เนื่องจากแต่ละ token ไม่ได้ขึ้นกับ token ตัวอื่น ค่าความน่าจะเป็นของแต่ละ segmentation สามารถคำนวณได้โดย นำค่าความน่าจะเป็นของแต่ละ token ย่อยใน segmentation นั้นมาคูณกัน
@@ -112,11 +106,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 ดังนั้น `"unhug"` ก็จะถูกแบ่งเป็น `["un", "hug"]`
 
-<Tip>
-
-✏️ **ตาคุณบ้างแล้ว!** ลองคำนวณการแบ่งคำของ `"huggun"`และ score ของมัน
-
-</Tip>
+> [!TIP]
+> ✏️ **ตาคุณบ้างแล้ว!** ลองคำนวณการแบ่งคำของ `"huggun"` และ score ของมัน
 
 ## กลับมาสู่การเทรน
 
@@ -232,11 +223,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece ใช้อัลกอริทึม ชื่อ Enhanced Suffix Array (ESA) ซึ่งมีประสิทธิภาพมากกว่า Ngram ในการสร้าง vocabulary ตั้งต้น
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece ใช้อัลกอริทึม ชื่อ Enhanced Suffix Array (ESA) ซึ่งมีประสิทธิภาพมากกว่า Ngram ในการสร้าง vocabulary ตั้งต้น
 
 ขั้นตอนต่อไป เราจะคำนวณผลรวมของความถี่ของทุกๆคำ เพื่อแปลงความถี่เป็นค่าความน่าจะเป็น
 
@@ -362,12 +350,9 @@ print(scores["his"])
 0.0
 ```
 
-<Tip>
-
-💡 วิธีการคำนวณแบบข้างบนนี้ถือว่าไม่มีประสิทธิภาพนัก ดังนั้น SentencePiece จะคำนวณค่า loss แบบคร่าวๆเท่านั้น เวลาที่เราลองลบ token แต่ละตัวออก โดยมันจะแทนที่ token นั้นด้วย segmentation ของมันแทนที่จะใช้ token เต็มๆ
-การทำแบบนี้ช่วยให้เราสามารถคำนวณ score ของทุกๆตัวได้ภายในครั้งเดียว และยังสามารถคำนวณไปพร้อมๆกับค่า loss ได้อีกด้วย
-
-</Tip>
+> [!TIP]
+> 💡 วิธีการคำนวณแบบข้างบนนี้ถือว่าไม่มีประสิทธิภาพนัก ดังนั้น SentencePiece จะคำนวณค่า loss แบบคร่าวๆเท่านั้น เวลาที่เราลองลบ token แต่ละตัวออก โดยมันจะแทนที่ token นั้นด้วย segmentation ของมันแทนที่จะใช้ token เต็มๆ
+> การทำแบบนี้ช่วยให้เราสามารถคำนวณ score ของทุกๆตัวได้ภายในครั้งเดียว และยังสามารถคำนวณไปพร้อมๆกับค่า loss ได้อีกด้วย
 
 สิ่งสุดท้ายที่เราจะต้องทำก็คือ เพิ่ม token พิเศษที่โมเดลใช้ลงไปใน vocabulary จากนั้น loop จนกว่าเราจะลบ token ออกจาก vocabulary จนได้ขนาดที่เราพอใจ :
 
diff --git a/chapters/th/chapter6/8.mdx b/chapters/th/chapter6/8.mdx
index 5d7e90c33..767da8acb 100644
--- a/chapters/th/chapter6/8.mdx
+++ b/chapters/th/chapter6/8.mdx
@@ -119,13 +119,10 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**รายละเอียดเพิ่มเติม** ถ้าคุณทดลองใช้งาน normalizer ทั้งสองเวอร์ชันกับข้อความที่มีตัวอักษร unicode `u"\u0085"` คุณจะได้ผลลัพธ์ที่แตกต่างกัน
-อย่างไรก็ตาม เราไม่อยากทำให้เวอร์ชันที่สร้างจาก `normalizers.Sequence` ของเรานั้นซับซ้อนเกินไป เราจึงไม่ใช้ Regex ที่ `BertNormalizer` ใช้เวลาที่ `clean_text` ถูกตั้งค่าเป็น `True` ซึ่งเป็นค่าตั้งต้น
-แต่คุณไม่ต้องกังวลไป เพราะมันยังมีวิธีที่จะทำให้ผลลัพธ์ออกมาเป็นเหมือนกันโดยที่ไม่ต้องใช้ `BertNormalizer` นั่นคือโดยการเพิ่ม `normalizers.Replace` สองครั้ง เข้าไปใน `normalizers.Sequence`
-
-</Tip>
+> [!TIP]
+> **รายละเอียดเพิ่มเติม** ถ้าคุณทดลองใช้งาน normalizer ทั้งสองเวอร์ชันกับข้อความที่มีตัวอักษร unicode `u"\u0085"` คุณจะได้ผลลัพธ์ที่แตกต่างกัน
+> อย่างไรก็ตาม เราไม่อยากทำให้เวอร์ชันที่สร้างจาก `normalizers.Sequence` ของเรานั้นซับซ้อนเกินไป เราจึงไม่ใช้ Regex ที่ `BertNormalizer` ใช้เวลาที่ `clean_text` ถูกตั้งค่าเป็น `True` ซึ่งเป็นค่าตั้งต้น
+> แต่คุณไม่ต้องกังวลไป เพราะมันยังมีวิธีที่จะทำให้ผลลัพธ์ออกมาเป็นเหมือนกันโดยที่ไม่ต้องใช้ `BertNormalizer` นั่นคือโดยการเพิ่ม `normalizers.Replace` สองครั้ง เข้าไปใน `normalizers.Sequence`
 
 ขั้นตอนต่อไปคือ การ pre-tokenization เราจะใช้ `BertPreTokenizer` ที่ถูกสร้างมาแล้ว :
 
diff --git a/chapters/tr/chapter2/1.mdx b/chapters/tr/chapter2/1.mdx
index e0787052a..6d032f704 100644
--- a/chapters/tr/chapter2/1.mdx
+++ b/chapters/tr/chapter2/1.mdx
@@ -20,6 +20,5 @@ Bunun ardından, kütüphanenin model API’ından bahsedeceğiz: Model ve konfi
 
 Daha sonra, `pipeline()` fonksiyonunun diğer ana parçası olan simgeleştirici API’ına göz atacağız. Simgeleştiriciler, sinir ağının metni sayısal giriş verilerine ve gerektiğinde bu sayısal verileri tekrar metne dönüştüren ilk ve son işlem aşamalarından sorumludur. Son olarak, birden fazla cümleyi hazır bir grup halinde bir modele nasıl gönderebileceğinizi gösterip, `tokenizer()` fonksiyonuna yakından bakarak bu bölümü tamamlayacağız. 
 
-<Tip>
-⚠️ Model Hub ve 🤗 Transformers kütüphanesinde yeralan bütün özelliklerden yararlanabilmeniz icin, <a href="https://huggingface.co/join">bir hesap oluşturmanızı</a> tavsiye ediyoruz.
-</Tip>
+> [!TIP]
+> ⚠️ Model Hub ve 🤗 Transformers kütüphanesinde yer alan bütün özelliklerden yararlanabilmeniz için, <a href="https://huggingface.co/join">bir hesap oluşturmanızı</a> tavsiye ediyoruz.
diff --git a/chapters/vi/chapter1/3.mdx b/chapters/vi/chapter1/3.mdx
index e89a7f86f..eb0106c69 100644
--- a/chapters/vi/chapter1/3.mdx
+++ b/chapters/vi/chapter1/3.mdx
@@ -9,11 +9,10 @@
 
 Trong phần này, chúng ta sẽ xem các mô hình Transformer có thể làm được những gì và sử dụng công cụ đầu tiên từ thư viện 🤗 Transformers: hàm `pipeline()`.
 
-<Tip>
-Bạn có thấy nút <em>Mở trong Colab</em> ở trên cùng bên phải không? Bấm vào nó để mở sổ ghi chép Google Colab với tất cả các đoạn mã của phần này. Nút này sẽ xuất hiện trong bất kỳ phần nào có chứa các mã ví dụ.
-
-Nếu bạn muốn chạy các ví dụ ở máy cá nhân, các bạn có thể tham khảo phần <a href="/course/chapter0">cài đặt</a>.
-</Tip>
+> [!TIP]
+> Bạn có thấy nút <em>Mở trong Colab</em> ở trên cùng bên phải không? Bấm vào nó để mở sổ ghi chép Google Colab với tất cả các đoạn mã của phần này. Nút này sẽ xuất hiện trong bất kỳ phần nào có chứa các mã ví dụ.
+>
+> Nếu bạn muốn chạy các ví dụ ở máy cá nhân, các bạn có thể tham khảo phần <a href="/course/chapter0">cài đặt</a>.
 
 ## Transformers ở muôn nơi!
 
@@ -23,9 +22,8 @@ Các mô hình Transformers được sử dụng để giải quyết tất cả
 
 Thư viện [🤗 Transformers](https://github.com/huggingface/transformers) cung cấp tính năng tạo và sử dụng các mô hình được chia sẻ đó. [Model Hub](https://huggingface.co/models) chứa hàng nghìn mô hình được huấn luyện trước mà bất kỳ ai cũng có thể tải xuống và sử dụng. Bạn cũng có thể tải các mô hình của riêng mình lên Hub!
 
-<Tip>
-⚠️ Hugging Face Hub không giới hạn ở các mô hình Transformer. Bất kỳ ai cũng có thể chia sẻ bất kỳ loại mô hình hoặc bộ dữ liệu nào họ muốn! <a href="https://huggingface.co/join"> Tạo tài khoản huggingface.co </a> để hưởng lợi từ tất cả các tính năng có sẵn này!
-</Tip>
+> [!TIP]
+> ⚠️ Hugging Face Hub không giới hạn ở các mô hình Transformer. Bất kỳ ai cũng có thể chia sẻ bất kỳ loại mô hình hoặc bộ dữ liệu nào họ muốn! <a href="https://huggingface.co/join"> Tạo tài khoản huggingface.co </a> để hưởng lợi từ tất cả các tính năng có sẵn này!
 
 Trước khi đi sâu vào cách hoạt động của các mô hình Transformer, hãy cùng xem một vài ví dụ về cách sử dụng chúng để giải quyết một số vấn đề NLP thú vị.
 
@@ -105,11 +103,8 @@ classifier(
 
 Quy trình này được gọi là _zero-shot_ (không mẫu) vì bạn không cần tinh chỉnh mô hình trên dữ liệu của bạn để sử dụng. Nó có thể trực tiếp trả về xác suất cho bất kỳ danh sách nhãn nào bạn muốn!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Cùng thử các chuỗi văn bản và các nhãn riêng của bạn để xem mô hình hoạt động như thế nào.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Cùng thử các chuỗi văn bản và các nhãn riêng của bạn để xem mô hình hoạt động như thế nào.
 
 ## Tạo văn bản
 
@@ -132,11 +127,8 @@ generator("In this course, we will teach you how to")
 
 Bạn có thể kiểm soát số lượng chuỗi khác nhau được tạo với tham số `num_return_sequences` và tổng độ dài của văn bản đầu ra với tham số `max_length`.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng `num_return_sequences` và `max_length` để tạo ra hai câu, mỗi câu chứa 15 từ.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng `num_return_sequences` và `max_length` để tạo ra hai câu, mỗi câu chứa 15 từ.
 
 ## Sử dụng một mô hình bất kỳ từ Hub trong pipeline
 
@@ -167,11 +159,8 @@ Bạn có thể tinh chỉnh việc tìm kiếm cho một mô hình của mình
 
 Sau khi bạn chọn một mô hình bằng cách bấm vào nó, bạn sẽ thấy rằng có một tiện ích cho phép bạn dùng thử trực tuyến. Bằng cách này, bạn có thể nhanh chóng kiểm tra khả năng của mô hình trước khi tải xuống.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng bộ lọc để tìm mô hình tạo văn bản cho ngôn ngữ khác. Hãy thoải mái chơi với tiện ích này và sử dụng nó theo một pipeline!
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng bộ lọc để tìm mô hình tạo văn bản cho ngôn ngữ khác. Hãy thoải mái chơi với tiện ích này và sử dụng nó theo một pipeline!
 
 ### Inference API
 
@@ -203,11 +192,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
 Tham số `top_k` kiểm soát số lượng khả năng bạn muốn được hiển thị. Lưu ý rằng ở đây mô hình điền từ vào vị trí bị che bởi từ `<mask>`, thường được gọi là *mask token*. Các mô hình điền khác có thể có các kiểu che từ khác nhau, vì vậy, tốt nhất nên xác minh từ bị che phù hợp khi khám phá các mô hình khác. Một cách để kiểm tra đó là xem từ bị che được sử dụng trong tiện ích con.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tìm kiếm mô hình `bert-base-cased` trên Hub và xác định từ bị che của nó trong tiện ích Inference API. Mô hình này dự đoán điều gì cho câu trong ví dụ về `pipeline` của chúng ta ở trên?
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tìm kiếm mô hình `bert-base-cased` trên Hub và xác định từ bị che của nó trong tiện ích Inference API. Mô hình này dự đoán điều gì cho câu trong ví dụ về `pipeline` của chúng ta ở trên?
 
 ## Nhận dạng thực thể
 
@@ -231,11 +217,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 Chúng ta truyền `grouped_entities = True` vào trong hàm pipeline để yêu cầu pipeline nhóm lại các phần thuộc cùng một thực thể trong câu với nhau: ở đây mô hình đã nhóm chính xác "Hugging" và "Face" thành một tổ chức duy nhất, mặc dù tên bao gồm nhiều từ. Trên thực tế, như chúng ta sẽ thấy trong chương tiếp theo, quá trình tiền xử lý thậm chí còn chia một số từ thành các phần nhỏ hơn. Ví dụ: `Sylvain` được chia thành bốn phần: `S`, `##yl`, `##va`, và `##in`. Trong bước hậu xử lý, pipeline đã tập hợp lại thành công các phần đó.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tìm kiếm trên Model Hub để tìm một mô hình có thể thực hiện gán nhãn từ loại (thường được viết tắt là POS) bằng tiếng Anh. Mô hình này dự đoán điều gì cho câu trong ví dụ trên?
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tìm kiếm trên Model Hub để tìm một mô hình có thể thực hiện gán nhãn từ loại (thường được viết tắt là POS) bằng tiếng Anh. Mô hình này dự đoán điều gì cho câu trong ví dụ trên?
 
 ## Hỏi đáp
 
@@ -318,10 +301,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 Giống như tạo và tóm tắt văn bản, bạn có thể chỉ định giá trị `max_length` hoặc `min_length` cho kết quả trả về.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tìm kiếm các mô hình dịch ở các ngôn ngữ khác và cố gắng dịch câu trước đó sang một vài ngôn ngữ khác nhau.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tìm kiếm các mô hình dịch ở các ngôn ngữ khác và cố gắng dịch câu trước đó sang một vài ngôn ngữ khác nhau.
 
 Các pipeline ở trên hầu hết phục vụ mục đích trình diễn. Chúng được lập trình cho các tác vụ cụ thể và không thể thực hiện các biến thể của chúng. Trong chương tiếp theo, bạn sẽ tìm hiểu những gì bên trong một hàm  `pipeline()`  và cách tinh chỉnh hành vi của nó.
diff --git a/chapters/vi/chapter2/1.mdx b/chapters/vi/chapter2/1.mdx
index 8bbce9975..7dc4c3f65 100644
--- a/chapters/vi/chapter2/1.mdx
+++ b/chapters/vi/chapter2/1.mdx
@@ -19,6 +19,5 @@ Chương này sẽ bắt đầu với một ví dụ từ đầu đến cuối,
 
 Sau đó, chúng ta sẽ xem xét API tokenizer, một thành phần chính khác của hàm `pipeline()`. Tokenizers thực hiện các bước xử lý đầu tiên và cuối cùng, xử lý việc chuyển đổi từ văn bản đầu vào thành dạng số cho mạng nơ-ron và chuyển đổi trở lại văn bản khi cần. Cuối cùng, chúng tôi sẽ chỉ cho bạn cách xử lý việc gửi nhiều câu vào một mô hình trong một batch (lô) đã chuẩn bị, sau đó tóm tắt tất cả bằng cách xem xét kỹ hơn hàm `tokenizer()` ở bậc cao.
 
-<Tip>
-⚠️ Để có thể tận dụng tất cả các tính năng có sẵn với Model Hub và 🤗 Transformers, chúng tôi khuyến khích bạn <a href="https://huggingface.co/join">tạo tài khoản </a>.
-</Tip>
+> [!TIP]
+> ⚠️ Để có thể tận dụng tất cả các tính năng có sẵn với Model Hub và 🤗 Transformers, chúng tôi khuyến khích bạn <a href="https://huggingface.co/join">tạo tài khoản </a>.
diff --git a/chapters/vi/chapter2/2.mdx b/chapters/vi/chapter2/2.mdx
index a15e95884..da2f63afe 100644
--- a/chapters/vi/chapter2/2.mdx
+++ b/chapters/vi/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-Đây là phần đầu tiên có nội dung hơi khác một chút tùy thuộc vào việc bạn sử dụng PyTorch hay TensorFlow. Chuyển đổi công tắc trên đầu tiêu đề để chọn nền tảng bạn thích!
-</Tip>
+> [!TIP]
+> Đây là phần đầu tiên có nội dung hơi khác một chút tùy thuộc vào việc bạn sử dụng PyTorch hay TensorFlow. Chuyển đổi công tắc trên đầu tiêu đề để chọn nền tảng bạn thích!
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -346,8 +345,5 @@ Bây giờ chúng ta có thể kết luận rằng mô hình đã dự đoán nh
 
 Chúng tôi đã tái tạo thành công ba bước của quy trình: tiền xử lý bằng tokenizers, đưa đầu vào qua mô hình và hậu xử lý! Giờ thì chúng ta hãy dành một chút thời gian để đi sâu hơn vào từng bước đó.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Chọn hai (hoặc nhiều) văn bản của riêng bạn và chạy chúng thông qua `sentiment-analysis`. Sau đó, tự mình lặp lại các bước bạn đã thấy ở đây và kiểm tra xem bạn có thu được kết quả tương tự không!
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chọn hai (hoặc nhiều) văn bản của riêng bạn và chạy chúng thông qua `sentiment-analysis`. Sau đó, tự mình lặp lại các bước bạn đã thấy ở đây và kiểm tra xem bạn có thu được kết quả tương tự không!
diff --git a/chapters/vi/chapter2/4.mdx b/chapters/vi/chapter2/4.mdx
index 1d4c6cd1d..08c9ecee2 100644
--- a/chapters/vi/chapter2/4.mdx
+++ b/chapters/vi/chapter2/4.mdx
@@ -214,11 +214,8 @@ print(ids)
 
 Các đầu ra này, sau khi được chuyển đổi sang khung tensor thích hợp, có thể được sử dụng làm đầu vào cho một mô hình như đã thấy ở phần trước trong chương này.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sao chép hai bước cuối cùng (tokenize và chuyển đổi sang ID đầu vào) trên các câu đầu vào mà chúng ta đã sử dụng trong phần 2 ("I've been waiting for a HuggingFace course my whole life." và "I hate this so much!"). Kiểm tra xem bạn có nhận được các ID đầu vào giống như chúng tôi đã nhận trước đó không!
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sao chép hai bước cuối cùng (tokenize và chuyển đổi sang ID đầu vào) trên các câu đầu vào mà chúng ta đã sử dụng trong phần 2 ("I've been waiting for a HuggingFace course my whole life." và "I hate this so much!"). Kiểm tra xem bạn có nhận được các ID đầu vào giống như chúng tôi đã nhận trước đó không!
 
 ## Giải mã
 
diff --git a/chapters/vi/chapter2/5.mdx b/chapters/vi/chapter2/5.mdx
index 19a57ed7e..b8a0a4968 100644
--- a/chapters/vi/chapter2/5.mdx
+++ b/chapters/vi/chapter2/5.mdx
@@ -180,11 +180,8 @@ batched_ids = [ids, ids]
 
 Đây là một lô chứa hai chuỗi giống nhau!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Chuyển đổi danh sách `batch_ids` này thành một tensor và chuyển nó qua mô hình của bạn. Kiểm tra để đảm bảo rằng bạn có được logit giống như trước đây (nhưng hai lần)!
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chuyển đổi danh sách `batched_ids` này thành một tensor và chuyển nó qua mô hình của bạn. Kiểm tra để đảm bảo rằng bạn có được logit giống như trước đây (nhưng hai lần)!
 
 Việc phân phối lô cho phép mô hình hoạt động khi bạn đưa vào nhiều câu. Việc sử dụng nhiều chuỗi cũng đơn giản như xây dựng một lô với một chuỗi duy nhất. Tuy nhiên, có một vấn đề thứ hai. Khi bạn cố gắng ghép hai (hoặc nhiều) câu lại với nhau, chúng có thể có độ dài khác nhau. Nếu bạn đã từng làm việc với tensor trước đây, bạn biết rằng chúng cần có dạng hình chữ nhật, vì vậy bạn sẽ không thể chuyển đổi trực tiếp danh sách ID đầu vào thành tensor. Để giải quyết vấn đề này, chúng tôi thường *đệm* các đầu vào.
 
@@ -316,11 +313,8 @@ Bây giờ chúng ta nhận được các logit tương tự cho câu thứ hai
 
 Lưu ý cách giá trị cuối cùng của chuỗi thứ hai là ID đệm, là giá trị 0 trong attention mask.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Áp dụng thủ công tokenize cho hai câu được sử dụng trong phần 2 ("I've been waiting for a HuggingFace course my whole life." và "I hate this so much!"). Đưa chúng vào mô hình và kiểm tra xem bạn có nhận được các logit giống như trong phần 2 không. Bây giờ, gộp chúng lại với nhau bằng cách sử dụng token đệm, sau đó tạo attention mask thích hợp. Kiểm tra xem bạn có đạt được kết quả tương tự khi đưa qua mô hình không!
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Áp dụng thủ công tokenize cho hai câu được sử dụng trong phần 2 ("I've been waiting for a HuggingFace course my whole life." và "I hate this so much!"). Đưa chúng vào mô hình và kiểm tra xem bạn có nhận được các logit giống như trong phần 2 không. Bây giờ, gộp chúng lại với nhau bằng cách sử dụng token đệm, sau đó tạo attention mask thích hợp. Kiểm tra xem bạn có đạt được kết quả tương tự khi đưa qua mô hình không!
 
 ## Những chuỗi dài hơn
 
diff --git a/chapters/vi/chapter3/2.mdx b/chapters/vi/chapter3/2.mdx
index 5920b57b1..c316c264a 100644
--- a/chapters/vi/chapter3/2.mdx
+++ b/chapters/vi/chapter3/2.mdx
@@ -146,11 +146,8 @@ raw_train_dataset.features
 
 Phía sau, `label` thuộc loại `ClassLabel` và ánh xạ các số nguyên thành tên nhãn được lưu trữ trong thư mục *names*. `0` tương ứng với `không tương đương`, và `1` tương ứng với `tương đương`.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Nhìn vào phần tử thứ 15 của tập huấn luyện và phần tử 87 của tập kiểm định. Nhãn của chúng là gì?
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Nhìn vào phần tử thứ 15 của tập huấn luyện và phần tử 87 của tập kiểm định. Nhãn của chúng là gì?
 
 ### Tiền xử lý một bộ dữ liệu
 
@@ -188,11 +185,8 @@ inputs
 
 Chúng ta đã thảo luận về `input_ids` và `attention_mask` trong [Chương 2](/course/chapter2), nhưng chúng ta tạm dừng để nói về `token_type_ids`. Trong ví dụ này, đây là phần cho mô hình biết phần nào của đầu vào là câu đầu tiên và phần nào là câu thứ hai.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Lấy phần tử 15 của tập huấn luyện và tokenize hai câu riêng biệt và như một cặp. Sự khác biệt giữa hai kết quả là gì?
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Lấy phần tử 15 của tập huấn luyện và tokenize hai câu riêng biệt và như một cặp. Sự khác biệt giữa hai kết quả là gì?
 
 Nếu chúng ta giải mã các ID bên trong `input_ids` trở lại các từ:
 
@@ -349,11 +343,8 @@ Trông khá ổn! Giờ ta đã chuyển từ văn bản thô sang các lô mà
 
 {/if}
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sao chép tiền xử lý trên tập dữ liệu GLUE SST-2. Nó hơi khác một chút vì nó bao gồm các câu đơn thay vì các cặp, nhưng phần còn lại của những gì ta đã làm sẽ tương tự nhau. Với một thử thách khó hơn, hãy cố gắng viết một hàm tiền xử lý hoạt động trên bất kỳ tác vụ GLUE nào.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sao chép tiền xử lý trên tập dữ liệu GLUE SST-2. Nó hơi khác một chút vì nó bao gồm các câu đơn thay vì các cặp, nhưng phần còn lại của những gì ta đã làm sẽ tương tự nhau. Với một thử thách khó hơn, hãy cố gắng viết một hàm tiền xử lý hoạt động trên bất kỳ tác vụ GLUE nào.
 
 {#if fw === 'tf'}
 
diff --git a/chapters/vi/chapter3/3.mdx b/chapters/vi/chapter3/3.mdx
index cff0f3963..7c8c4faf5 100644
--- a/chapters/vi/chapter3/3.mdx
+++ b/chapters/vi/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 Nếu bạn muốn tự động tải mô hình của mình lên Hub trong quá trình huấn luyện, hãy chuyển sang phần `push_to_hub=True` trong phần `TrainingArguments`. Chúng ta sẽ tìm hiểu thêm về điều này trong [Chương 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 Nếu bạn muốn tự động tải mô hình của mình lên Hub trong quá trình huấn luyện, hãy truyền `push_to_hub=True` vào `TrainingArguments`. Chúng ta sẽ tìm hiểu thêm về điều này trong [Chương 4](/course/chapter4/3).
 
 Bước thứ hai là xác định mô hình của chúng ta. Như trong [chương trước](/course/chapter2), chúng ta sẽ sử dụng lớp `AutoModelForSequenceClassification`, với hai nhãn:
 
@@ -163,8 +160,5 @@ Lần này, nó sẽ báo cáo thông số mất mát kiểm định và chỉ s
 
 Phần này kết thúc phần giới thiệu về cách tinh chỉnh bằng API `Trainer`. Một ví dụ về việc thực hiện điều này đối với hầu hết các tác vụ NLP phổ biến sẽ được đưa ra trong [Chương 7](/course/chapter7), nhưng ở thời điểm này chúng ta hãy xem cách thực hiện điều tương tự trong PyTorch thuần túy.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tinh chỉnh mô hình trên tập dữ liệu GLUE SST-2, sử dụng quá trình xử lý dữ liệu bạn đã thực hiện trong phần 2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tinh chỉnh mô hình trên tập dữ liệu GLUE SST-2, sử dụng quá trình xử lý dữ liệu bạn đã thực hiện trong phần 2.
diff --git a/chapters/vi/chapter3/3_tf.mdx b/chapters/vi/chapter3/3_tf.mdx
index 9878b7674..f048d0270 100644
--- a/chapters/vi/chapter3/3_tf.mdx
+++ b/chapters/vi/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ Bạn sẽ nhận thấy rằng không như trong [Chương 2](/course/chapter2)
 
 Để tinh chỉnh mô hình trên tập dữ liệu của mình, chúng ta chỉ cần `compile()` mô hình và sau đó chuyển dữ liệu của ta đến phương thức `fit()`. Thao tác này sẽ bắt đầu quá trình tinh chỉnh (sẽ mất vài phút trên GPU) và báo cáo sự mất mát ở tập huấn luyện khi nó diễn ra, cộng với mất mát ở tập kiểm định ở cuối mỗi epoch.
 
-<Tip>
-
-Lưu ý rằng 🤗 các mô hình Transformers có một khả năng đặc biệt mà hầu hết các mô hình Keras không có - chúng có thể tự động sử dụng một lượng mất mát thích hợp mà chúng tính toán bên trong. Chúng sẽ sử dụng sự mất mát này theo mặc định nếu bạn không đặt tham số mất mát bên trong `compile()`. Lưu ý rằng để sử dụng hàm mất mát trong nội bộ, bạn sẽ cần truyền các nhãn của mình như một phần của đầu vào, không phải dưới dạng nhãn riêng biệt, đây là cách thông thường để sử dụng nhãn với các mô hình Keras. Bạn sẽ thấy các ví dụ về điều này trong Phần 2 của khóa học, trong đó việc xác định hàm mất mát chính xác có thể khó khăn. Tuy nhiên, đối với phân loại chuỗi, một hàm mất mát Keras tiêu chuẩn hoạt động khá tốt, vì vậy đó là những gì chúng ta sẽ sử dụng ở đây.
-
-</Tip>
+> [!TIP]
+> Lưu ý rằng các mô hình 🤗 Transformers có một khả năng đặc biệt mà hầu hết các mô hình Keras không có - chúng có thể tự động sử dụng một hàm mất mát thích hợp mà chúng tính toán bên trong. Chúng sẽ sử dụng hàm mất mát này theo mặc định nếu bạn không đặt tham số mất mát trong `compile()`. Lưu ý rằng để sử dụng hàm mất mát nội bộ, bạn sẽ cần truyền các nhãn của mình như một phần của đầu vào, thay vì dưới dạng nhãn riêng biệt như cách thông thường khi dùng nhãn với các mô hình Keras. Bạn sẽ thấy các ví dụ về điều này trong Phần 2 của khóa học, trong đó việc xác định hàm mất mát chính xác có thể khó khăn. Tuy nhiên, đối với phân loại chuỗi, một hàm mất mát Keras tiêu chuẩn hoạt động khá tốt, vì vậy đó là những gì chúng ta sẽ sử dụng ở đây.
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-Lưu ý một lỗi rất phổ biến ở đây - bạn *có thể* chỉ cần truyền tên của hàm mất mát dưới dạng chuỗi cho Keras, nhưng theo mặc định, Keras sẽ cho rằng bạn đã áp dụng softmax cho đầu ra của mình. Tuy nhiên, nhiều mô hình xuất ra các giá trị ngay trước khi áp dụng softmax, còn được gọi là *logit*. Chúng ta cần nói với hàm mất mát rằng đó là những gì mô hình của chúng ta làm và cách duy nhất để làm điều đó là gọi nó trực tiếp, thay vì đặt tên bằng một chuỗi.
-
-</Tip>
+> [!WARNING]
+> Lưu ý một lỗi rất phổ biến ở đây - bạn *có thể* chỉ cần truyền tên của hàm mất mát dưới dạng chuỗi cho Keras, nhưng theo mặc định, Keras sẽ cho rằng bạn đã áp dụng softmax cho đầu ra của mình. Tuy nhiên, nhiều mô hình xuất ra các giá trị ngay trước khi áp dụng softmax, còn được gọi là *logit*. Chúng ta cần nói với hàm mất mát rằng đó là những gì mô hình của chúng ta làm và cách duy nhất để làm điều đó là gọi nó trực tiếp, thay vì đặt tên bằng một chuỗi.
 
 ### Cải thiện hiệu suất huấn luyện
 
@@ -122,11 +116,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-Thư viện 🤗 Transformers cũng có một hàm `create_optimizer()` sẽ tạo ra một trình tối ưu hóa `AdamW` với sự giảm tốc độ học. Đây là một phím tắt thuận tiện mà bạn sẽ thấy chi tiết trong các phần sau của khóa học.
-
-</Tip>
+> [!TIP]
+> Thư viện 🤗 Transformers cũng có một hàm `create_optimizer()` sẽ tạo ra một trình tối ưu hóa `AdamW` với suy giảm tốc độ học. Đây là một lối tắt thuận tiện mà bạn sẽ thấy chi tiết trong các phần sau của khóa học.
 
 Bây giờ chúng ta đã có trình tối ưu hóa hoàn toàn mới và ta có thể thử huấn luyện với nó. Đầu tiên, hãy tải lại mô hình, để đặt lại các thay đổi đối với trọng số từ lần chạy huấn luyện mà chúng ta vừa thực hiện và sau đó ta có thể biên dịch nó bằng trình tối ưu hóa mới:
 
@@ -144,11 +135,8 @@ Giờ ta sẽ fit lại 1 lần nữa:
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 Nếu bạn muốn tự động tải mô hình của mình lên Hub trong quá trình huấn luyện, bạn có thể truyền `PushToHubCallback` vào trong phương thức `model.fit()`. Chúng ta sẽ tìm hiểu thêm về điều này trong [Chương 4](/course/chapter4/3)
-
-</Tip>
+> [!TIP]
+> 💡 Nếu bạn muốn tự động tải mô hình của mình lên Hub trong quá trình huấn luyện, bạn có thể truyền `PushToHubCallback` vào trong phương thức `model.fit()`. Chúng ta sẽ tìm hiểu thêm về điều này trong [Chương 4](/course/chapter4/3).
 
 ### Các dự đoán của mô hình
 
diff --git a/chapters/vi/chapter3/4.mdx b/chapters/vi/chapter3/4.mdx
index 2a9d4f62c..2429da9b7 100644
--- a/chapters/vi/chapter3/4.mdx
+++ b/chapters/vi/chapter3/4.mdx
@@ -196,11 +196,8 @@ metric.compute()
 
 Một lần nữa, kết quả của bạn sẽ hơi khác một chút vì sự ngẫu nhiên trong quá trình khởi tạo đầu mô hình và xáo trộn dữ liệu, nhưng chúng phải ở trong cùng một khoảng.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sửa đổi vòng lặp huấn luyện trước đó để tinh chỉnh mô hình của bạn trên tập dữ liệu SST-2.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sửa đổi vòng lặp huấn luyện trước đó để tinh chỉnh mô hình của bạn trên tập dữ liệu SST-2.
 
 ### Tăng cường trí thông minh của vòng huấn luyện với 🤗 Accelerate
 
@@ -292,9 +289,8 @@ Dòng đầu tiên cần thêm là dòng nhập. Dòng thứ hai khởi tạo m
 
 Sau đó, phần lớn công việc chính được thực hiện trong dòng gửi bộ lưu dữ liệu, mô hình và trình tối ưu hóa đến `accelerator.prepare()`. Thao tác này sẽ bọc các đối tượng đó trong hộp chứa thích hợp để đảm bảo việc huấn luyện được phân phối hoạt động như dự định. Các thay đổi còn lại cần thực hiện là loại bỏ dòng đặt lô trên `device` (một lần nữa, nếu bạn muốn giữ lại điều này, bạn chỉ cần thay đổi nó thành sử dụng `accelerator.device`) và thay thế `loss.backward()` bằng  `accelerator.backward(loss)`.
 
-<Tip>
-⚠️ Để hưởng lợi từ việc tăng tốc độ do Cloud TPUs cung cấp, chúng tôi khuyên bạn nên đệm các mẫu của mình theo độ dài cố định bằng các tham số `padding="max_length"` và `max_length` của tokenizer.
-</Tip>
+> [!TIP]
+> ⚠️ Để hưởng lợi từ việc tăng tốc độ do Cloud TPUs cung cấp, chúng tôi khuyên bạn nên đệm các mẫu của mình theo độ dài cố định bằng các tham số `padding="max_length"` và `max_length` của tokenizer.
 
 Nếu bạn muốn sao chép và dán nó để mày mò, đây là giao diện của vòng huấn luyện hoàn chỉnh với 🤗 Accelerate:
 
diff --git a/chapters/vi/chapter4/2.mdx b/chapters/vi/chapter4/2.mdx
index edefb5b7c..e2aaa0a73 100644
--- a/chapters/vi/chapter4/2.mdx
+++ b/chapters/vi/chapter4/2.mdx
@@ -122,8 +122,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 
 {/if}
 
-<Tip>
-
-Khi sử dụng một mô hình được huấn luyện trước, hãy đảm bảo kiểm tra xem nó được huấn luyện như thế nào, dựa trên tập dữ liệu nào, các giới hạn và độ sai lệch của nó. Tất cả thông tin này phải được ghi trên thẻ mô hình của nó.
-
-</Tip>
+> [!TIP]
+> Khi sử dụng một mô hình được huấn luyện trước, hãy đảm bảo kiểm tra xem nó được huấn luyện như thế nào, dựa trên tập dữ liệu nào, các giới hạn và độ sai lệch của nó. Tất cả thông tin này phải được ghi trên thẻ mô hình của nó.
diff --git a/chapters/vi/chapter4/3.mdx b/chapters/vi/chapter4/3.mdx
index c80df00cd..b221ec358 100644
--- a/chapters/vi/chapter4/3.mdx
+++ b/chapters/vi/chapter4/3.mdx
@@ -198,11 +198,8 @@ Nhấp vào tab "Files and versions" ("Tệp và phiên bản") và bạn sẽ t
 </div>
 {/if}
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Lấy mô hình và trình tokenize được liên kết với checkpoint `bert-base-cased` và tải chúng lên kho lưu trữ trong không gian tên của bạn bằng phương thức `push_to_hub()`. Kiểm tra kỹ xem repo có xuất hiện chính xác trên trang của bạn hay không trước khi xóa nó.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Lấy mô hình và trình tokenize được liên kết với checkpoint `bert-base-cased` và tải chúng lên kho lưu trữ trong không gian tên của bạn bằng phương thức `push_to_hub()`. Kiểm tra kỹ xem repo có xuất hiện chính xác trên trang của bạn hay không trước khi xóa nó.
 
 Như bạn đã thấy, phương thức `push_to_hub()` nhận một vài tham số, giúp bạn có thể tải lên không gian tên tổ chức hoặc kho lưu trữ cụ thể hoặc sử dụng token API khác. Chúng tôi khuyên bạn nên xem thông số kỹ thuật phương pháp có sẵn trực tiếp trong [🤗 tài liệu về Transformers](https://huggingface.co/transformers/model_sharing.html) để biết những gì ta có thể làm.
 
@@ -508,9 +505,8 @@ Nếu bạn nhìn vào kích thước tệp (ví dụ: với `ls -lh`), bạn s
 
 {/if}
 
-<Tip>
-  ✏️  Khi tạo kho lưu trữ từ giao diện web, tệp *.gitattributes* được tự động thiết lập để xem xét các tệp có phần mở rộng nhất định, chẳng hạn như *.bin* và *.h5*, là tệp lớn và git-lfs sẽ theo dõi chúng mà không có thiết lập cần thiết về phía bạn.
-</Tip>{" "}
+> [!TIP]
+> ✏️ Khi tạo kho lưu trữ từ giao diện web, tệp *.gitattributes* được tự động thiết lập để xem xét các tệp có phần mở rộng nhất định, chẳng hạn như *.bin* và *.h5*, là tệp lớn và git-lfs sẽ theo dõi chúng mà không có thiết lập cần thiết về phía bạn.
 
 Bây giờ chúng ta có thể tiếp tục và tiến hành như chúng ta thường làm với các kho lưu trữ Git truyền thống. Chúng ta có thể thêm tất cả các tệp vào môi trường dàn dựng của Git bằng lệnh `git add`:
 
diff --git a/chapters/vi/chapter5/2.mdx b/chapters/vi/chapter5/2.mdx
index 660739c0a..7a45b1546 100644
--- a/chapters/vi/chapter5/2.mdx
+++ b/chapters/vi/chapter5/2.mdx
@@ -49,11 +49,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 Chúng ta có thể thấy rằng các tệp nén đã được thay thế bằng _SQuAD_it-train.json_ và _SQuAD_it-text.json_, và dữ liệu được lưu trữ ở định dạng JSON.
 
-<Tip>
-
-✎ Nếu bạn đang thắc mắc tại sao lại có ký tự`!` trong các lệnh trên, đó là bởi vì chúng ta đang chạy chúng trong một sổ ghi chép Jupyter. Chỉ cần xóa tiền tố này nếu bạn muốn tải xuống và giải nén tập dữ liệu trên terminal.
-
-</Tip>
+> [!TIP]
+> ✎ Nếu bạn đang thắc mắc tại sao lại có ký tự `!` trong các lệnh trên, đó là bởi vì chúng ta đang chạy chúng trong một sổ ghi chép Jupyter. Chỉ cần xóa tiền tố này nếu bạn muốn tải xuống và giải nén tập dữ liệu trên terminal.
 
 Để tải tệp JSON bằng hàm `load_dataset()`, chúng ta chỉ cần biết liệu chúng ta đang xử lý JSON thông thường (tương tự như từ điển lồng nhau) hay JSON dòng (JSON được phân tách bằng dòng). Giống như nhiều bộ dữ liệu hỏi đáp, SQuAD-it sử dụng định dạng lồng nhau, với tất cả văn bản được lưu trữ trong trường `data`. Điều này có nghĩa là chúng ta có thể tải tập dữ liệu bằng cách chỉ định tham số `field` như sau:
 
@@ -126,11 +123,8 @@ DatasetDict({
 ```
 Đây chính xác là những gì chúng ta muốn. Giờ đây, ta có thể áp dụng nhiều kỹ thuật tiền xử lý khác nhau để làm sạch dữ liệu, mã hóa các bài đánh giá, v.v.
 
-<Tip>
-
-Tham số `data_files` của hàm `load_dataset()` khá linh hoạt và có thể là một đường dẫn tệp duy nhất, danh sách các đường dẫn tệp hoặc từ điển ánh xạ các tên tách thành đường dẫn tệp. Bạn cũng có thể tập hợp các tệp phù hợp với một mẫu được chỉ định theo các quy tắc được sử dụng bởi Unix shell (ví dụ: bạn có thể tổng hợp tất cả các tệp JSON trong một thư mục dưới dạng một lần tách duy nhất bằng cách đặt `data_files="*.json"`). Xem [tài liệu](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets để biết thêm chi tiết.
-
-</Tip>
+> [!TIP]
+> Tham số `data_files` của hàm `load_dataset()` khá linh hoạt và có thể là một đường dẫn tệp duy nhất, danh sách các đường dẫn tệp hoặc từ điển ánh xạ tên các tập (split) tới đường dẫn tệp. Bạn cũng có thể gom các tệp khớp với một mẫu được chỉ định theo các quy tắc của Unix shell (ví dụ: bạn có thể gom tất cả các tệp JSON trong một thư mục thành một tập duy nhất bằng cách đặt `data_files="*.json"`). Xem [tài liệu](https://huggingface.co/docs/datasets/loading#local-and-remote-files) 🤗 Datasets để biết thêm chi tiết.
 
 Các tập lệnh tải trong 🤗 Datasets thực sự hỗ trợ giải nén tự động các tệp đầu vào, vì vậy chúng ta có thể bỏ qua việc sử dụng `gzip` bằng cách trỏ trực tiếp tham số `data_files` vào các tệp nén:
 
@@ -158,8 +152,5 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 Điều này trả về cùng một đối tượng `DatasetDict` như ở trên, nhưng giúp ta tiết kiệm bước tải xuống và giải nén thủ công các tệp  _SQuAD_it-*.json.gz_. Điều này tổng kết bước đột phá của chúng ta vào các cách khác nhau để tải các tập dữ liệu không được lưu trữ trên Hugging Face Hub. Giờ ta đã có một tập dữ liệu để nghịch, hãy bắt tay vào các kỹ thuật sắp xếp dữ liệu khác nhau thôi!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Chọn một tập dữ liệu khác được lưu trữ trên GitHub hoặc [Kho lưu trữ Học Máy UCI](https://archive.ics.uci.edu/ml/index.php) và thử tải nó cả cục bộ và từ xa bằng cách sử dụng các kỹ thuật đã giới thiệu ở trên. Để có điểm thưởng, hãy thử tải tập dữ liệu được lưu trữ ở định dạng CSV hoặc dạng văn bản (xem [tài liệu](https://huggingface.co/docs/datasets/loading#local-and-remote-files) để biết thêm thông tin trên các định dạng này).
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chọn một tập dữ liệu khác được lưu trữ trên GitHub hoặc [Kho lưu trữ Học Máy UCI](https://archive.ics.uci.edu/ml/index.php) và thử tải nó cả cục bộ và từ xa bằng cách sử dụng các kỹ thuật đã giới thiệu ở trên. Để có điểm thưởng, hãy thử tải tập dữ liệu được lưu trữ ở định dạng CSV hoặc dạng văn bản (xem [tài liệu](https://huggingface.co/docs/datasets/loading#local-and-remote-files) để biết thêm thông tin về các định dạng này).
diff --git a/chapters/vi/chapter5/3.mdx b/chapters/vi/chapter5/3.mdx
index 20006ad93..e81bb608b 100644
--- a/chapters/vi/chapter5/3.mdx
+++ b/chapters/vi/chapter5/3.mdx
@@ -88,11 +88,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng hàm `Dataset.unique()` để tìm số lượng thuốc độc nhất và điều kiện trong tập huấn luyện và kiểm thử.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng hàm `Dataset.unique()` để tìm số lượng thuốc và tình trạng bệnh duy nhất trong tập huấn luyện và kiểm thử.
 
 Tiếp theo, hãy chuẩn hóa tất cả các nhãn `condition` bằng cách sử dụng `Dataset.map()`. Như chúng ta đã làm với tokenize trong [Chương 3](/course/chapter3), chúng ta có thể xác định một hàm đơn giản có thể được áp dụng trên tất cả các hàng của mỗi tập trong `drug_dataset`:
 
@@ -216,11 +213,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 Như ta đã nghi vấn, một số đánh giá chỉ chứa một từ duy nhất, mặc dù có thể ổn để phân tích sắc thái, nhưng sẽ không có nhiều thông tin nếu chúng tôi muốn dự đoán tình trạng bệnh.
 
-<Tip>
-
-🙋 Một cách thay thế để thêm các cột mới vào tập dữ liệu là sử dụng hàm `Dataset.add_column()`. Điều này cho phép bạn cung cấp cột dưới dạng danh sách Python hoặc mảng NumPy và có thể hữu ích trong các trường hợp mà `Dataset.map()` không phù hợp cho phân tích của bạn.
-
-</Tip>
+> [!TIP]
+> 🙋 Một cách thay thế để thêm các cột mới vào tập dữ liệu là sử dụng hàm `Dataset.add_column()`. Điều này cho phép bạn cung cấp cột dưới dạng danh sách Python hoặc mảng NumPy và có thể hữu ích trong các trường hợp mà `Dataset.map()` không phù hợp cho phân tích của bạn.
 
 Hãy sử dụng hàm `Dataset.filter()` để xóa các bài đánh giá có ít hơn 30 từ. Tương tự như những gì chúng ta đã làm với cột `condition`, chúng ta có thể lọc ra các bài đánh giá rất ngắn bằng cách yêu cầu các bài đánh giá có độ dài trên ngưỡng này:
 
@@ -235,11 +229,8 @@ print(drug_dataset.num_rows)
 
 Như bạn có thể thấy, điều này đã loại bỏ khoảng 15% bài đánh giá khỏi bộ huấn luyện và kiểm thử ban đầu.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng hàm `Dataset.sort()` để kiểm tra các bài đánh giá có số lượng từ lớn nhất. Tham khảo [tài liệu](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) để biết bạn cần sử dụng tham số nào để sắp xếp các bài đánh giá theo thứ tự giảm dần.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng hàm `Dataset.sort()` để kiểm tra các bài đánh giá có số lượng từ lớn nhất. Tham khảo [tài liệu](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.sort) để biết bạn cần sử dụng tham số nào để sắp xếp các bài đánh giá theo thứ tự giảm dần.
 
 Điều cuối cùng chúng ta cần giải quyết là sự hiện diện của ký tự HTML trong các bài đánh giá của chúng ta. Chúng ta có thể sử dụng mô-đun `html` của Python để loại bỏ qua các ký tự này, như sau:
 
@@ -296,11 +287,8 @@ Như bạn đã thấy trong [Chương 3](/course/chapter3), chúng ta có thể
 
 Bạn cũng có thể tính thời gian cho toàn bộ ô bằng cách đặt `%%time` ở đầu của ô mã. Trên phần cứng mà chúng ta thực hiện, nó hiển thị 10.8 giây cho lệnh này (đó là số được viết sau "Wall time").
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Thực hiện cùng một hướng dẫn có và không có `batched=True`, sau đó thử nó với tokenizer chậm (thêm `use_fast=False` vào `AutoTokenizer.from_pretrained()`) để bạn có thể thấy giá trị bạn nhận được trên phần cứng của mình.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Thực hiện cùng một hướng dẫn có và không có `batched=True`, sau đó thử nó với tokenizer chậm (thêm `use_fast=False` vào `AutoTokenizer.from_pretrained()`) để bạn có thể thấy giá trị bạn nhận được trên phần cứng của mình.
 
 Dưới đây là kết quả thu được khi có và không có tính năng phân lô, với tokenizer nhanh và chậm:
 
@@ -337,19 +325,13 @@ Tuỳ chọn         | Tokenizer nhanh | Tokenizer chậm
 
 Đó là những kết quả hợp lý hơn nhiều đối với tokenizer chậm, nhưng hiệu suất của tokenizer nhanh cũng đã được cải thiện đáng kể. Tuy nhiên, lưu ý rằng điều đó không phải lúc nào cũng đúng - đối với các giá trị của `num_proc` khác 8, các thử nghiệm của chúng tôi cho thấy rằng sử dụng `batched=True` mà không có tùy chọn này sẽ nhanh hơn. Nói chung, chúng tôi khuyên bạn không nên sử dụng xử lý đa luồng Python cho các trình tokenize nhanh với `batched=True`.
 
-<Tip>
-
-Sử dụng `num_proc` để tăng tốc quá trình xử lý của bạn thường là một ý tưởng tuyệt vời, miễn là hàm bạn đang sử dụng chưa thực hiện một số kiểu xử lý đa xử lý của riêng nó.
-
-</Tip>
+> [!TIP]
+> Sử dụng `num_proc` để tăng tốc quá trình xử lý của bạn thường là một ý tưởng tuyệt vời, miễn là hàm bạn đang sử dụng chưa tự thực hiện một kiểu đa xử lý nào đó.
 
 Tất cả các chức năng này được cô đọng trong một phương pháp đã khá tuyệt vời, nhưng còn nhiều hơn thế nữa! Với `Dataset.map()` và `batched=True`, bạn có thể thay đổi số lượng phần tử trong tập dữ liệu của mình. Điều này cực kỳ hữu ích trong nhiều trường hợp mà bạn muốn tạo một số đặc trưng huấn luyện từ một mẫu và chúng ta sẽ cần thực hiện điều này như một phần của quá trình tiền xử lý cho một số tác vụ NLP sẽ thực hiện trong [Chương 7](/course/chapter7).
 
-<Tip>
-
-💡 Trong học máy, một _mẫu_ thường được định nghĩa là tập hợp _đặc trưng_ mà chúng ta cung cấp cho mô hình. Trong một số ngữ cảnh, các đặc trưng này sẽ là tập hợp thành các cột trong `Dataset`, nhưng trong các trường hợp khác (như ở đây và để phục vụ hỏi đáp), nhiều đặc trưng có thể được trích xuất từ một mẫu và thuộc về một cột duy nhất.
-
-</Tip>
+> [!TIP]
+> 💡 Trong học máy, một _mẫu_ thường được định nghĩa là tập hợp _đặc trưng_ mà chúng ta cung cấp cho mô hình. Trong một số ngữ cảnh, các đặc trưng này sẽ là tập hợp các cột trong `Dataset`, nhưng trong các trường hợp khác (như ở đây và trong bài toán hỏi đáp), nhiều đặc trưng có thể được trích xuất từ một mẫu và thuộc về một cột duy nhất.
 
 Chúng ta hãy xem nó hoạt động như thế nào! Ở đây, ta sẽ tokenize các mẫu của mình và cắt chúng về độ dài tối đa là 128, nhưng ta sẽ yêu cầu trình tokenize trả về *tất cả* các đoạn văn bản thay vì chỉ đoạn văn bản đầu tiên. Điều này có thể được thực hiện với `return_overflowing_tokens=True`:
 
@@ -519,11 +501,8 @@ Hãy tạo ra một `pandas.DataFrame` cho toàn bộ tập huấn luyện bằn
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 Bên dưới `Dataset.set_format()` thay đổi định dạng trả về cho phương thức `__getitem __()` của tập dữ liệu. Điều này có nghĩa là khi chúng ta muốn tạo một đối tượng mới như `train_df` từ `Dataset` ở định dạng `"pandas"`, chúng ta cần cắt toàn bộ tập dữ liệu để có được một `pandas.DataFrame`. Bạn có thể tự xác minh xem kiểu dữ liệu của `drug_dataset["train"]` có phải là `Dataset`, bất kể định dạng đầu ra là gì.
-
-</Tip>
+> [!TIP]
+> 🚨 Bên dưới, `Dataset.set_format()` thay đổi định dạng trả về cho phương thức `__getitem__()` của tập dữ liệu. Điều này có nghĩa là khi chúng ta muốn tạo một đối tượng mới như `train_df` từ `Dataset` ở định dạng `"pandas"`, chúng ta cần cắt toàn bộ tập dữ liệu để có được một `pandas.DataFrame`. Bạn có thể tự xác minh rằng kiểu dữ liệu của `drug_dataset["train"]` vẫn là `Dataset`, bất kể định dạng đầu ra là gì.
 
 Từ đây, ta có thể sử dụng tất cả các chức năng của Pandas mà ta muốn. Ví dụ, chúng ta có thể thực hiện chuỗi lạ mắt để tính toán phân phối lớp giữa các `condition`:
 
@@ -591,11 +570,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tính xếp hạng trung bình cho mỗi loại thuốc và lưu trữ kết quả ở dạng `Dataset` mới.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tính xếp hạng trung bình cho mỗi loại thuốc và lưu trữ kết quả ở dạng `Dataset` mới.
 
 Phần này kết thúc chuyến tham quan của chúng ta về các kỹ thuật tiền xử lý khác nhau có sẵn trong 🤗 Datasets. Để hoàn thiện phần này, hãy tạo một tệp kiểm định để chuẩn bị tập dữ liệu cho việc huấn luyện một trình phân loại. Trước khi làm như vậy, chúng ta sẽ đặt lại định dạng đầu ra của `drug_dataset` từ `"pandas"` thành `"arrow"`:
 
diff --git a/chapters/vi/chapter5/4.mdx b/chapters/vi/chapter5/4.mdx
index c105b0b28..7a84df140 100644
--- a/chapters/vi/chapter5/4.mdx
+++ b/chapters/vi/chapter5/4.mdx
@@ -52,11 +52,8 @@ Dataset({
 
 Chúng ta có thể thấy rằng có 15,518,009 hàng và 2 cột trong tập dữ liệu của chúng tôi - đó là rất nhiều!
 
-<Tip>
-
-✎ Theo mặc định, 🤗 Datasets sẽ giải nén các tệp cần thiết để tải tập dữ liệu. Nếu bạn muốn bảo toàn dung lượng ổ cứng, bạn có thể truyền `DownloadConfig(delete_extracted=True)` vào tham số `download_config` của `load_dataset()`. Xem [tài liệu](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) để biết thêm chi tiết.
-
-</Tip>
+> [!TIP]
+> ✎ Theo mặc định, 🤗 Datasets sẽ giải nén các tệp cần thiết để tải tập dữ liệu. Nếu bạn muốn tiết kiệm dung lượng ổ cứng, bạn có thể truyền `DownloadConfig(delete_extracted=True)` vào tham số `download_config` của `load_dataset()`. Xem [tài liệu](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) để biết thêm chi tiết.
 
 Hãy kiểm tra nội dung của mẫu đầu tiên:
 
@@ -107,11 +104,8 @@ Dataset size (cache file) : 19.54 GB
 
 Tuyệt vời - mặc dù nó gần 20 GB, chúng ta có thể tải và truy cập tập dữ liệu với RAM ít hơn nhiều!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Chọn một trong các [tập hợp con](https://the-eye.eu/public/AI/pile_preliminary_components/) từ Pile sao cho lớn hơn RAM của máy tính xách tay hoặc máy tính để bàn của bạn, tải nó với 🤗 Datasets, và đo dung lượng RAM được sử dụng. Lưu ý rằng để có được một phép đo chính xác, bạn sẽ muốn thực hiện việc này trong một quy trình mới. Bạn có thể tìm thấy các kích thước đã giải nén của từng tập hợp con trong Bảng 1 của [bài báo về Pile](https://arxiv.org/abs/2101.00027).
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chọn một trong các [tập hợp con](https://the-eye.eu/public/AI/pile_preliminary_components/) từ Pile sao cho lớn hơn RAM của máy tính xách tay hoặc máy tính để bàn của bạn, tải nó với 🤗 Datasets, và đo dung lượng RAM được sử dụng. Lưu ý rằng để có được một phép đo chính xác, bạn sẽ muốn thực hiện việc này trong một quy trình mới. Bạn có thể tìm thấy các kích thước đã giải nén của từng tập hợp con trong Bảng 1 của [bài báo về Pile](https://arxiv.org/abs/2101.00027).
 
 Nếu bạn đã quen thuộc với Pandas, kết quả này có thể gây bất ngờ vì theo [quy tắc ngón tay cái](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) nổi tiếng của Wes Kinney, bạn thường cần gấp 5 gấp 10 lần RAM so với kích thước của tập dữ liệu của bạn. Vậy 🤗 Datasets giải quyết vấn đề quản lý bộ nhớ này như thế nào? 🤗 Datasets coi mỗi tập dữ liệu như một [tệp ánh xạ bộ nhớ](https://en.wikipedia.org/wiki/Memory-mapped_file), cung cấp ánh xạ giữa RAM và bộ nhớ hệ thống tệp cho phép thư viện truy cập và hoạt động trên các phần tử của tập dữ liệu mà không cần tải đầy đủ vào bộ nhớ.
 
@@ -139,11 +133,8 @@ print(
 
 Ở đây chúng ta đã sử dụng mô-đun `timeit` của Python để đo thời gian thực thi được thực hiện bởi `code_snippet`. Thông thường, bạn sẽ có thể lặp lại tập dữ liệu với tốc độ từ vài phần mười GB/s đến vài GB/s. Điều này hoạt động hiệu quả với đại đa số các ứng dụng, nhưng đôi khi bạn sẽ phải làm việc với một tập dữ liệu quá lớn, thậm chí không thể lưu trữ trên ổ cứng của máy tính xách tay của bạn. Ví dụ: nếu chúng tôi cố gắng tải xuống toàn bộ Pile, chúng tôi sẽ cần 825 GB dung lượng đĩa trống! Để xử lý những trường hợp này, 🤗 Datasets cung cấp tính năng phát trực tuyến cho phép chúng tôi tải xuống và truy cập các phần tử một cách nhanh chóng mà không cần tải xuống toàn bộ tập dữ liệu. Chúng ta hãy xem cách này hoạt động như thế nào.
 
-<Tip>
-
-💡 Trong sổ ghi chép Jupyter, bạn có thể định thời gian cho các ô bằng cách sử dụng[hàm ma thuật `%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
-
-</Tip>
+> [!TIP]
+> 💡 Trong sổ ghi chép Jupyter, bạn có thể định thời gian cho các ô bằng cách sử dụng [hàm ma thuật `%%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).
 
 ## Truyền trực tuyến tập dữ liệu
 
@@ -180,11 +171,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 Để tăng tốc độ trình tokenize với tính năng phát trực tuyến, bạn có thể vượt qua `batched=True`, như chúng ta đã thấy trong phần trước. Nó sẽ xử lý hàng loạt các ví dụ; kích thước lô mặc định là 1,000 và có thể được chỉ định bằng tham số `batch_size`.
-
-</Tip>
+> [!TIP]
+> 💡 Để tăng tốc quá trình tokenize khi phát trực tuyến, bạn có thể truyền `batched=True`, như chúng ta đã thấy trong phần trước. Khi đó các mẫu sẽ được xử lý theo lô; kích thước lô mặc định là 1,000 và có thể được chỉ định bằng tham số `batch_size`.
 
 Bạn cũng có thể xáo trộn một tập dữ liệu được phát trực tuyến bằng cách sử dụng `IterableDataset.shuffle()`, nhưng không giống như `Dataset.shuffle()` điều này chỉ xáo trộn các phần tử trong một `buffer_size` được định trước:
 
@@ -285,10 +273,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng một trong những kho tài liệu Common Crawl lớn như [`mc4`](https://huggingface.co/datasets/mc4) hoặc [`oscar`](https://huggingface.co/datasets/oscar) để tạo tập dữ liệu đa ngôn ngữ trực tuyến thể hiện tỷ lệ nói của các ngôn ngữ ở quốc gia bạn chọn. Ví dụ: bốn ngôn ngữ quốc gia ở Thụy Sĩ là tiếng Đức, tiếng Pháp, tiếng Ý và tiếng La Mã, vì vậy bạn có thể thử tạo một kho ngữ liệu tiếng Thụy Sĩ bằng cách lấy mẫu các tập hợp con Oscar theo tỷ lệ nói của chúng.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng một trong những kho tài liệu Common Crawl lớn như [`mc4`](https://huggingface.co/datasets/mc4) hoặc [`oscar`](https://huggingface.co/datasets/oscar) để tạo tập dữ liệu đa ngôn ngữ trực tuyến thể hiện tỷ lệ nói của các ngôn ngữ ở quốc gia bạn chọn. Ví dụ: bốn ngôn ngữ quốc gia ở Thụy Sĩ là tiếng Đức, tiếng Pháp, tiếng Ý và tiếng Romansh, vì vậy bạn có thể thử tạo một kho ngữ liệu Thụy Sĩ bằng cách lấy mẫu các tập hợp con Oscar theo tỷ lệ nói của chúng.
 
 Giờ đây, bạn có tất cả các công cụ cần thiết để tải và xử lý các tập dữ liệu ở mọi hình dạng và kích thước - nhưng trừ khi bạn đặc biệt may mắn, sẽ đến một thời điểm trong hành trình NLP của bạn, nơi bạn sẽ phải thực sự tạo một tập dữ liệu để giải quyết vấn đề vấn đề trong tầm tay. Đó là chủ đề của phần tiếp theo!
diff --git a/chapters/vi/chapter5/5.mdx b/chapters/vi/chapter5/5.mdx
index 9b05d75d6..0a61b0041 100644
--- a/chapters/vi/chapter5/5.mdx
+++ b/chapters/vi/chapter5/5.mdx
@@ -114,11 +114,8 @@ response.json()
 
 Ồ, đó là rất nhiều thông tin! Chúng ta có thể thấy các trường hữu ích như `title`, `body`,  và `number` mô tả sự cố cũng như thông tin về người dùng GitHub đã mở sự cố.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Nhấp vào một vài URL trong tải trọng JSON ở trên để biết loại thông tin mà mỗi vấn đề GitHub được liên kết với.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Nhấp vào một vài URL trong tải trọng JSON ở trên để biết loại thông tin mà mỗi vấn đề GitHub được liên kết với.
 
 Như đã mô tả trong [tài liệu](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) GitHub, các yêu cầu chưa được xác thực được giới hạn ở 60 yêu cầu mỗi giờ. Mặc dù bạn có thể tăng tham số truy vấn `per_page` để giảm số lượng yêu cầu bạn thực hiện, nhưng bạn vẫn sẽ đạt đến giới hạn tỷ lệ trên bất kỳ kho lưu trữ nào có nhiều hơn một vài nghìn vấn đề. Vì vậy, thay vào đó, bạn nên làm theo [hướng dẫn](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) của GitHub về cách tạo _personal access token_ hay _token truy cập cá nhân_ để bạn có thể tăng giới hạn tốc độ lên 5,000 yêu cầu mỗi giờ. Khi bạn có token của riêng mình, bạn có thể bao gồm nó như một phần của tiêu đề yêu cầu:
 
@@ -127,11 +124,8 @@ GITHUB_TOKEN = xxx  # Sao chép token GitHub của bạn tại đây
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ Không dùng chung notebook có dán `GITHUB_TOKEN` của bạn trong đó. Chúng tôi khuyên bạn nên xóa ô cuối cùng sau khi bạn đã thực thi nó để tránh vô tình làm rò rỉ thông tin này. Tốt hơn nữa, hãy lưu trữ token trong tệp *.env* và sử dụng [thư viện `python-dotenv`](https://github.com/theskumar/python-dotenv) để tải tự động cho bạn dưới dạng biến môi trường.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Không dùng chung notebook có dán `GITHUB_TOKEN` của bạn trong đó. Chúng tôi khuyên bạn nên xóa ô cuối cùng sau khi bạn đã thực thi nó để tránh vô tình làm rò rỉ thông tin này. Tốt hơn nữa, hãy lưu trữ token trong tệp *.env* và sử dụng [thư viện `python-dotenv`](https://github.com/theskumar/python-dotenv) để tải tự động cho bạn dưới dạng biến môi trường.
 
 Bây giờ chúng ta đã có token truy cập của mình, hãy tạo một hàm có thể tải xuống tất cả các vấn đề từ kho lưu trữ GitHub:
 
@@ -238,11 +232,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tính thời gian trung bình cần để đóng các vấn đề trong 🤗 Datasets. Bạn có thể thấy hàm `Dataset.filter()` hữu ích để lọc ra các yêu cầu kéo và các vấn đề đang mở, đồng thời bạn có thể sử dụng hàm `Dataset.set_format()` để chuyển đổi tập dữ liệu thành `DataFrame` để bạn có thể dễ dàng thao tác dấu thời gian `create_at` và `closed_at`. Đối với điểm thưởng, hãy tính thời gian trung bình cần để đóng các yêu cầu kéo.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tính thời gian trung bình cần để đóng các vấn đề trong 🤗 Datasets. Bạn có thể thấy hàm `Dataset.filter()` hữu ích để lọc ra các yêu cầu kéo và các vấn đề đang mở, đồng thời bạn có thể sử dụng hàm `Dataset.set_format()` để chuyển đổi tập dữ liệu thành `DataFrame` để bạn có thể dễ dàng thao tác dấu thời gian `created_at` và `closed_at`. Để có điểm thưởng, hãy tính thời gian trung bình cần để đóng các yêu cầu kéo.
 
 Mặc dù chúng ta có thể tiếp tục dọn dẹp tập dữ liệu bằng cách loại bỏ hoặc đổi tên một số cột, nhưng thông thường tốt nhất là giữ tập dữ liệu ở trạng thái "thô" nhất có thể ở giai đoạn này để có thể dễ dàng sử dụng trong nhiều ứng dụng.
 
@@ -384,11 +375,8 @@ repo_url
 
 Trong ví dụ này, chúng ta đã tạo một kho lưu trữ tập dữ liệu trống có tên là `github-issue` với tên người dùng `lewtun` (tên người dùng phải là tên người dùng Hub của bạn khi bạn đang chạy đoạn mã này!)
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng tên người dùng và mật khẩu Hugging Face Hub của bạn để lấy token và tạo một kho lưu trữ trống có tên là `github-issue`. Hãy nhớ **không bao giờ lưu thông tin đăng nhập của bạn** trong Colab hoặc bất kỳ kho lưu trữ nào khác, vì thông tin này có thể bị kẻ xấu lợi dụng.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng tên người dùng và mật khẩu Hugging Face Hub của bạn để lấy token và tạo một kho lưu trữ trống có tên là `github-issue`. Hãy nhớ **không bao giờ lưu thông tin đăng nhập của bạn** trong Colab hoặc bất kỳ kho lưu trữ nào khác, vì thông tin này có thể bị kẻ xấu lợi dụng.
 
 Tiếp theo, hãy sao chép kho lưu trữ từ Hub vào máy cục bộ của chúng ta và sao chép tệp tập dữ liệu của chúng ta vào đó. 🤗 Hub cung cấp một lớp `Repository` tiện dụng bao bọc nhiều lệnh Git phổ biến, do đó, để sao chép kho lưu trữ từ xa, chúng ta chỉ cần cung cấp URL và đường dẫn cục bộ mà ta muốn sao chép tới:
 
@@ -433,11 +421,8 @@ Dataset({
 
 Tuyệt vời, chúng ta đã đưa tập dữ liệu của mình vào Hub và nó có sẵn cho những người khác sử dụng! Chỉ còn một việc quan trọng cần làm: thêm _dataset card_  hay _thẻ dữ liệu_ giải thích cách tạo kho tài liệu và cung cấp thông tin hữu ích khác cho cộng đồng.
 
-<Tip>
-
-💡 Bạn cũng có thể tải tập dữ liệu lên Hugging Face Hub trực tiếp từ thiết bị đầu cuối bằng cách sử dụng `huggingface-cli` và một chút phép thuật từ Git. Tham khảo [hướng dẫn 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) để biết chi tiết về cách thực hiện việc này.
-
-</Tip>
+> [!TIP]
+> 💡 Bạn cũng có thể tải tập dữ liệu lên Hugging Face Hub trực tiếp từ thiết bị đầu cuối bằng cách sử dụng `huggingface-cli` và một chút phép thuật từ Git. Tham khảo [hướng dẫn 🤗 Datasets](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) để biết chi tiết về cách thực hiện việc này.
 
 ## Tạo thẻ dữ liệu
 
@@ -459,16 +444,10 @@ Bạn có thể tạo tệp *README.md* trực tiếp trên Hub và bạn có th
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng ứng dụng `dataset-tagging` và [hướng dẫn 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) để hoàn thành tệp *README.md* cho vấn đề về dữ liệu trên Github của bạn.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng ứng dụng `dataset-tagging` và [hướng dẫn 🤗 Datasets](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) để hoàn thành tệp *README.md* cho tập dữ liệu về các vấn đề GitHub của bạn.
 
 Vậy đó! Trong phần này, chúng ta đã thấy rằng việc tạo một tập dữ liệu tốt có thể khá liên quan, nhưng may mắn thay, việc tải nó lên và chia sẻ nó với cộng đồng thì không. Trong phần tiếp theo, chúng ta sẽ sử dụng bộ dữ liệu mới của mình để tạo một công cụ tìm kiếm ngữ nghĩa với 🤗 Datasets có thể so khớp các câu hỏi với các vấn đề và nhận xét có liên quan nhất.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Thực hiện theo các bước chúng ta đã thực hiện trong phần này để tạo tập dữ liệu về các vấn đề GitHub cho thư viện mã nguồn mở yêu thích của bạn (tất nhiên là chọn thứ khác ngoài 🤗 Datasets!). Để có điểm thưởng, hãy tinh chỉnh bộ phân loại đa nhãn để dự đoán các thẻ có trong trường `labels`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Thực hiện theo các bước chúng ta đã thực hiện trong phần này để tạo tập dữ liệu về các vấn đề GitHub cho thư viện mã nguồn mở yêu thích của bạn (tất nhiên là chọn thứ khác ngoài 🤗 Datasets!). Để có điểm thưởng, hãy tinh chỉnh bộ phân loại đa nhãn để dự đoán các thẻ có trong trường `labels`.
diff --git a/chapters/vi/chapter5/6.mdx b/chapters/vi/chapter5/6.mdx
index 6c29fff33..a65c03252 100644
--- a/chapters/vi/chapter5/6.mdx
+++ b/chapters/vi/chapter5/6.mdx
@@ -188,11 +188,8 @@ Dataset({
 Được rồi, điều này đã cho chúng ta vài nghìn nhận xét để làm việc cùng!
 
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Cùng xem liệu bạn có thể sử dụng `Dataset.map()` để khám phá cột `comments` của `issues_dataset` _mà không cần_ sử dụng Pandas hay không. Nó sẽ hơi khó khăn một chút; bạn có thể xem phần ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) của tài liệu 🤗 Datasets, một tài liệu hữu ích cho tác vụ này.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Cùng xem liệu bạn có thể sử dụng `Dataset.map()` để khám phá cột `comments` của `issues_dataset` _mà không cần_ sử dụng Pandas hay không. Nó sẽ hơi khó khăn một chút; bạn có thể xem phần ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) của tài liệu 🤗 Datasets, một tài liệu hữu ích cho tác vụ này.
 
 Bây giờ chúng ta đã có một nhận xét trên mỗi hàng, hãy tạo một cột `comments_length` mới chứa số từ trên mỗi nhận xét:
 
@@ -522,8 +519,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 Không tệ! Lần truy cập thứ hai của chúng ta dường như phù hợp với truy vấn.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tạo truy vấn của riêng bạn và xem liệu bạn có thể tìm thấy câu trả lời trong các tài liệu đã truy xuất hay không. Bạn có thể phải tăng tham số `k` trong `Dataset.get_nearest_examples()` để mở rộng tìm kiếm.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tạo truy vấn của riêng bạn và xem liệu bạn có thể tìm thấy câu trả lời trong các tài liệu đã truy xuất hay không. Bạn có thể phải tăng tham số `k` trong `Dataset.get_nearest_examples()` để mở rộng tìm kiếm.
diff --git a/chapters/vi/chapter6/2.mdx b/chapters/vi/chapter6/2.mdx
index d9bcd1349..ee264bfc9 100644
--- a/chapters/vi/chapter6/2.mdx
+++ b/chapters/vi/chapter6/2.mdx
@@ -11,11 +11,8 @@ Nếu mô hình ngôn ngữ không có sẵn ngôn ngữ bạn quan tâm hoặc
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ Huấn luyện một tokenizer không giống như huấn luyện một mô hình! Huấn luyện mô hình sử dụng giảm độ dốc ngẫu nhiên để làm cho tổn thất nhỏ hơn một chút cho mỗi đợt. Nó được ngẫu nhiên hóa bởi tự nhiên (có nghĩa là bạn phải đặt một giá trị seed để có được kết quả tương tự khi thực hiện cùng thực hiện huấn luyện hai lần). Huấn luyện một trình tokenize là một quy trình thống kê cố gắng xác định những từ phụ nào tốt nhất để chọn cho một kho dữ liệu nhất định, và các quy tắc được sử dụng để chọn chúng dựa trên thuật toán tokenize. Nó mang tính cố định, nghĩa là bạn luôn nhận được cùng một kết quả khi huấn luyện với cùng một thuật toán trên cùng một kho tài liệu.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Huấn luyện một tokenizer không giống như huấn luyện một mô hình! Huấn luyện mô hình sử dụng giảm độ dốc ngẫu nhiên để làm cho tổn thất nhỏ hơn một chút sau mỗi lô. Về bản chất, quá trình này mang tính ngẫu nhiên (có nghĩa là bạn phải đặt một giá trị seed để có được kết quả giống nhau khi thực hiện cùng một quá trình huấn luyện hai lần). Huấn luyện một trình tokenize là một quy trình thống kê cố gắng xác định những từ phụ nào tốt nhất để chọn cho một kho ngữ liệu nhất định, và các quy tắc cụ thể được sử dụng để chọn chúng phụ thuộc vào thuật toán tokenize. Quy trình này mang tính tất định, nghĩa là bạn luôn nhận được cùng một kết quả khi huấn luyện với cùng một thuật toán trên cùng một kho ngữ liệu.
 
 ## Tập hợp một kho ngữ liệu
 
diff --git a/chapters/vi/chapter6/3.mdx b/chapters/vi/chapter6/3.mdx
index f4206a87a..8afe5c687 100644
--- a/chapters/vi/chapter6/3.mdx
+++ b/chapters/vi/chapter6/3.mdx
@@ -33,11 +33,8 @@ Trong phần thảo luận kế tiếp, chúng ta sẽ phân biệt giữa các
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ Khi tokenize một câu, bạn không phải lúc nào cũng thấy sự khác biệt về tốc độ giữa các phiên bản chậm và nhanh của cùng một trình tokenize. Trên thực tế, phiên bản nhanh có thể chậm hơn! Chỉ khi tokenize nhiều văn bản song song cùng một lúc, bạn mới có thể thấy rõ sự khác biệt.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Khi tokenize một câu, bạn không phải lúc nào cũng thấy sự khác biệt về tốc độ giữa các phiên bản chậm và nhanh của cùng một trình tokenize. Trên thực tế, phiên bản nhanh có thể chậm hơn! Chỉ khi tokenize nhiều văn bản song song cùng một lúc, bạn mới có thể thấy rõ sự khác biệt.
 
 ## Mã hoá theo lô
 
@@ -107,13 +104,10 @@ encoding.word_ids()
 
 Chúng ta có thể thấy rằng các token đặc biệt của trình tokenize như `[CLS]` và `[SEP]` được ánh xạ thành `None`, và sau đó mỗi token được ánh xạ tới từ mà nó bắt nguồn. Điều này đặc biệt hữu ích để xác định xem một token nằm ở đầu một từ hay nếu hai token có trong cùng thuộc một từ. Chúng ta có thể dựa vào tiền tố `##` cho điều đó, nhưng nó chỉ hoạt động đối với các tokenize kiểu BERT; phương pháp này hoạt động với bất kỳ loại tokenizer nào miễn nó là phiên bản nhanh. Trong chương tiếp theo, chúng ta sẽ xem cách chúng ta có thể sử dụng khả năng này để áp dụng nhãn chúng ta có cho mỗi từ đúng cách với các token trong các tác vụ như nhận dạng thực thể được đặt tên (NER) và  gán nhãn từ loại (POS). Chúng ta cũng có thể sử dụng nó để che giấu tất cả các token đến từ cùng một từ trong mô hình ngôn ngữ được che (một kỹ thuật được gọi là _whole word masking_).
 
-<Tip>
-
-Khái niệm về một từ rất là phức tạp. Ví dụ: "I'll" (từ rút gọn của "I will") có được tính là một hay hai từ? Nó thực sự phụ thuộc vào trình tokenize và hoạt động tiền tokenize mà nó áp dụng. Một số tokenizer chỉ tách ra trên khoảng trắng, vì vậy họ sẽ coi đây là một từ. Những người khác sử dụng dấu câu ở đầu khoảng trắng, vì vậy sẽ coi nó là hai từ.
-
-✏️ **Thử nghiệm thôi!** Tạo tokenizer từ các checkpoints `bert-base-cased` và` roberta-base` và tokenize "81s" với chúng. Bạn quan sát thấy gì? ID từ là gì?
-
-</Tip>
+> [!TIP]
+> Khái niệm về một từ khá phức tạp. Ví dụ: "I'll" (dạng rút gọn của "I will") được tính là một hay hai từ? Điều đó thực sự phụ thuộc vào trình tokenize và thao tác tiền tokenize mà nó áp dụng. Một số tokenizer chỉ tách theo khoảng trắng, vì vậy chúng sẽ coi đây là một từ. Một số khác tách theo cả dấu câu ngoài khoảng trắng, vì vậy sẽ coi nó là hai từ.
+>
+> ✏️ **Thử nghiệm thôi!** Tạo tokenizer từ các checkpoint `bert-base-cased` và `roberta-base` và tokenize "81s" với chúng. Bạn quan sát thấy gì? ID từ là gì?
 
 Tương tự, có một phương thức `question_ids()` mà chúng ta có thể sử dụng để ánh xạ token đến câu mà nó bắt nguồn (mặc dù trong trường hợp này, `token_type_ids` được trả về bởi tokenizer có thể cung cấp cho chúng ta cùng một thông tin).
 
@@ -130,11 +124,8 @@ Sylvain
 
 Như đã đề cập trước đó, tất cả điều này thực tế được hỗ trợ bởi là trình tokenizer nhanh kết hợp khoảng văn bản mà mỗi token đến từ danh sách *offset* hay *offset*. Để minh họa việc sử dụng chúng, tiếp theo, chúng tôi sẽ hướng dẫn bạn cách sao chép các kết quả của `token-classification` theo cách thủ công.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi** Tạo văn bản mẫu của riêng bạn và xem liệu bạn có thể hiểu những token nào được liên kết với ID từ, cũng như cách trích xuất ký tự kéo dài cho một từ. Để có điểm thưởng, hãy thử sử dụng hai câu làm đầu vào và xem liệu ID câu có phù hợp với bạn không.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tạo văn bản mẫu của riêng bạn và xem liệu bạn có thể hiểu những token nào được liên kết với ID từ nào, cũng như cách trích xuất khoảng ký tự của một từ. Để có điểm thưởng, hãy thử sử dụng hai câu làm đầu vào và xem liệu ID câu có hợp lý với bạn không.
 
 ## Bên trong pipeline `token-classification`
 
diff --git a/chapters/vi/chapter6/3b.mdx b/chapters/vi/chapter6/3b.mdx
index bb8145d52..aa26e2a56 100644
--- a/chapters/vi/chapter6/3b.mdx
+++ b/chapters/vi/chapter6/3b.mdx
@@ -275,11 +275,8 @@ Chúng ta chưa xong đâu, nhưng ít nhất chúng ta đã có điểm chính
 0.97773
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tính chỉ mục bắt đầu và kết thúc cho năm cấu trả lời đầu tiện.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tính chỉ mục bắt đầu và kết thúc cho năm câu trả lời đầu tiên.
 
 Ta có `start_index` và `end_index` của câu trả lời theo token nên ta chỉ cần chuyển đổi các chỉ mục kí tự trong ngữ cảnh. Đấy là nơi offset sẽ cực kì hữu ích. Ta có thể lấy và sử dụng chúng như cách ta làm trong tác vụ phân loại token:
 
@@ -313,11 +310,8 @@ print(result)
 
 Tuyệt quá! Kết quả đó giống như trong ví dụ đầu tiên của chúng ta!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng điểm tốt nhất mà bạn đã tính toán trước đó để hiển thị năm câu trả lời có khả năng nhất. Để kiểm tra kết quả của bạn, hãy quay lại đường dẫn đầu tiên và truyền vào `top_k=5` khi gọi nó.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng điểm tốt nhất mà bạn đã tính toán trước đó để hiển thị năm câu trả lời có khả năng nhất. Để kiểm tra kết quả của bạn, hãy quay lại pipeline đầu tiên và truyền vào `top_k=5` khi gọi nó.
 
 ## Xử lý các ngữ cảnh dài
 
@@ -608,11 +602,8 @@ print(candidates)
 
 Hai ứng cử viên đó tương ứng với các câu trả lời tốt nhất mà mô hình có thể tìm thấy trong mỗi đoạn. Mô hình chắc chắn hơn rằng câu trả lời đúng nằm ở phần thứ hai (đó là một dấu hiệu tốt!). Bây giờ chúng ta chỉ cần ánh xạ khoảng hai token đó với khoảng các ký tự trong ngữ cảnh (chúng ta chỉ cần lập ánh xạ cái thứ hai để có câu trả lời, nhưng thật thú vị khi xem mô hình đã chọn những gì trong đoạn đầu tiên).
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Hãy điều chỉnh đoạn mã trên để trả về điểm và khoảng cho năm câu trả lời có nhiều khả năng nhất (tổng cộng, không phải cho mỗi đoạn).
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Hãy điều chỉnh đoạn mã trên để trả về điểm và khoảng cho năm câu trả lời có nhiều khả năng nhất (tổng cộng, không phải cho mỗi đoạn).
 
 `offsets` mà chúng ta đã nắm được trước đó thực sự là một danh sách các offset, với một danh sách trên mỗi đoạn văn bản:
 
@@ -633,10 +624,7 @@ for candidate, offset in zip(candidates, offsets):
 
 Nếu chúng ta bỏ qua kết quả đầu tiên, chúng ta sẽ nhận được kết quả tương tự như pipeline cho ngữ cảnh dài này - yayy!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Sử dụng điểm tốt nhất bạn đã tính toán trước đó để hiển thị năm câu trả lời có khả năng xảy ra nhất (cho toàn bộ ngữ cảnh, không phải từng đoạn). Để kiểm tra kết quả của bạn, hãy quay lại pipeline đầu tiên và truyền vào `top_k=5` khi gọi nó.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Sử dụng điểm tốt nhất bạn đã tính toán trước đó để hiển thị năm câu trả lời có khả năng xảy ra nhất (cho toàn bộ ngữ cảnh, không phải từng đoạn). Để kiểm tra kết quả của bạn, hãy quay lại pipeline đầu tiên và truyền vào `top_k=5` khi gọi nó.
 
 Điều này kết thúc phần đi sâu vào các khả năng của tokenizer. Chúng ta sẽ đưa tất cả những điều này vào thực tế một lần nữa trong chương tiếp theo, khi chúng tôi hướng dẫn bạn cách tinh chỉnh một mô hình về một loạt các tác vụ NLP phổ biến.
diff --git a/chapters/vi/chapter6/4.mdx b/chapters/vi/chapter6/4.mdx
index 115792dfa..6762e8871 100644
--- a/chapters/vi/chapter6/4.mdx
+++ b/chapters/vi/chapter6/4.mdx
@@ -47,11 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 Trong ví dụ này, vì chúng ta chọn checkpoint `bert-base-uncased`, bước chuẩn hoá sẽ thực hiện viết thường và loại bỏ các dấu.
 
-<Tip>
-
-✏️ **Try it out!** Tải tokenizer từ checkpoint `bert-base-cased` và truyền vào cùng một ví dụ vào.Sự khác biệt chính mà bạn có thể thấy giữa các phiên bản có dấu và không dấu của tokenizer là gì?
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tải tokenizer từ checkpoint `bert-base-cased` và truyền cùng một ví dụ vào. Sự khác biệt chính mà bạn có thể thấy giữa phiên bản cased và uncased của tokenizer là gì?
 
 ## Pre-tokenization
 
diff --git a/chapters/vi/chapter6/5.mdx b/chapters/vi/chapter6/5.mdx
index 2a9b9cf73..d476dff21 100644
--- a/chapters/vi/chapter6/5.mdx
+++ b/chapters/vi/chapter6/5.mdx
@@ -11,11 +11,8 @@ Mã hóa theo cặp (BPE) tiền thân được phát triển như một thuật
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 Phần này trình bày sâu hơn về BPE, đi xa hơn nữa là trình bày cách triển khai đầy đủ. Bạn có thể bỏ qua phần cuối nếu bạn chỉ muốn có một cái nhìn tổng quan chung về thuật toán tokenize.
-
-</Tip>
+> [!TIP]
+> 💡 Phần này trình bày sâu hơn về BPE, đi xa hơn nữa là trình bày cách triển khai đầy đủ. Bạn có thể bỏ qua phần cuối nếu bạn chỉ muốn có một cái nhìn tổng quan chung về thuật toán tokenize.
 
 ## Thuật toán huấn luyện
 
@@ -27,11 +24,8 @@ Huấn luyện BPE bắt đầu bằng cách tính toán tập hợp các từ d
 
 Từ vựng cơ sở khi đó sẽ là `["b", "g", "h", "n", "p", "s", "u"]`. Đối với các trường hợp trong thực tế, từ vựng cơ sở đó sẽ chứa tất cả các ký tự ASCII, ít nhất và có thể là một số ký tự Unicode. Nếu một mẫu bạn đang tokenize sử dụng một ký tự không có trong kho dữ liệu huấn luyện, thì ký tự đó sẽ được chuyển đổi thành token không xác định. Đó là một lý do tại sao nhiều mô hình NLP rất kém trong việc phân tích nội dung bằng biểu tượng cảm xúc.
 
-<Tip>
-
-GPT-2 và RoBERTa tokenizer (khá giống nhau) có một cách thông minh để giải quyết vấn đề này: chúng không xem các từ được viết bằng các ký tự Unicode mà là các byte. Bằng cách này, từ vựng cơ sở có kích thước nhỏ (256), nhưng mọi ký tự bạn có thể nghĩ đến sẽ vẫn được bao gồm và không bị chuyển đổi thành token không xác định. Thủ thuật này được gọi là *BPE cấp byte*.
-
-</Tip>
+> [!TIP]
+> GPT-2 và RoBERTa tokenizer (khá giống nhau) có một cách thông minh để giải quyết vấn đề này: chúng không xem các từ được viết bằng các ký tự Unicode mà là các byte. Bằng cách này, từ vựng cơ sở có kích thước nhỏ (256), nhưng mọi ký tự bạn có thể nghĩ đến sẽ vẫn được bao gồm và không bị chuyển đổi thành token không xác định. Thủ thuật này được gọi là *BPE cấp byte*.
 
 Sau khi có được bộ từ vựng cơ bản này, chúng ta thêm các token mới cho đến khi đạt được kích thước từ vựng mong muốn bằng cách học *hợp nhất*, đây là các quy tắc để hợp nhất hai yếu tố của từ vựng hiện có với nhau thành một từ mới. Vì vậy, lúc đầu sự hợp nhất này sẽ tạo ra các token có hai ký tự và sau đó, khi quá trình huấn luyện tiến triển, các từ phụ sẽ dài hơn.
 
@@ -75,11 +69,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 Và chúng ta tiếp túc làm vậy cho đến khi chúng ta chạm đến kích thước bộ tự điển ta mong muốn.
 
-<Tip>
-
-✏️ **Giờ thì đến lượt bạn!** Bạn nghĩ bước hợp nhất tiếp theo sẽ là gì?
-
-</Tip>
+> [!TIP]
+> ✏️ **Giờ thì đến lượt bạn!** Bạn nghĩ bước hợp nhất tiếp theo sẽ là gì?
 
 ## Thuật toán tokenize
 
@@ -100,11 +91,8 @@ Lấy ví dụ mà ta đã sử dụng trong quá trình huấn luyện, với b
 
 Từ `"bug"` sẽ được tokenize thành `["b", "ug"]`. `"mug"`, tuy nhiên, sẽ tokenize thành `["[UNK]", "ug"]` vì kí tự `"m"` không có trong bộ tự vựng gốc. Tương tự, từ `"thug"` sẽ được tokenize thành  `["[UNK]", "hug"]`: kí tự `"t"` không có trong bộ tự vựng gốc, và áp dụng quy tắc hợp nhất ở `"u"` và `"g"` và sau đó `"hu"` và `"g"`.
 
-<Tip>
-
-✏️ **Giờ tới lượt bạn!** Bạn nghĩ rằng `"unhug"` sẽ được tokenize như thế nào?
-
-</Tip>
+> [!TIP]
+> ✏️ **Giờ tới lượt bạn!** Bạn nghĩ rằng `"unhug"` sẽ được tokenize như thế nào?
 
 ## Triển khai BPE
 
@@ -316,11 +304,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 Sử dụng `train_new_from_iterator()` trên cùng kho ngữ liệu sẽ không mang về kết quả kho ngữ liệu y hệt. Đó là bởi khi có sự lựa chọn về cặp có tần suất cao nhất, ta đã chọn cái đầu tiên xuất hiện, trong khi thư viện 🤗 Tokenizers chọn cái đầu tiên dựa trên ID bên trong của nó.
-
-</Tip>
+> [!TIP]
+> 💡 Sử dụng `train_new_from_iterator()` trên cùng kho ngữ liệu sẽ không cho ra bộ từ vựng y hệt. Đó là bởi khi có nhiều cặp cùng có tần suất cao nhất, ta đã chọn cặp xuất hiện đầu tiên, trong khi thư viện 🤗 Tokenizers chọn cặp đầu tiên dựa trên ID bên trong của nó.
 
 Để tokenize văn bản mới, chúng ta tiền tokenize nó, tách ra, rồi áp dụng quy tắc hợp nhất được học:
 
@@ -352,10 +337,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ Các triển khai của chúng ta sẽ gặp lỗi nếu có những kí tự vô danh vì chúng ta đã không làm gì để xử lý chúng. GPT-2 không thực sự có những token vô danh (không thể có kí tự vô danh khi sử dụng BPE cấp byte), nhưng nó có thể xảy ra ở đây vì ta không bao gồm tất cả các byte có thể có trong bộ từ vựng gốc. Khía cạnh này của BPE nằm ngoài phạm vi phần này, nên chúng tôi sẽ không đi sau vào chi tiết.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Các triển khai của chúng ta sẽ gặp lỗi nếu có những kí tự vô danh vì chúng ta đã không làm gì để xử lý chúng. GPT-2 không thực sự có những token vô danh (không thể có kí tự vô danh khi sử dụng BPE cấp byte), nhưng nó có thể xảy ra ở đây vì ta không bao gồm tất cả các byte có thể có trong bộ từ vựng gốc. Khía cạnh này của BPE nằm ngoài phạm vi phần này, nên chúng tôi sẽ không đi sâu vào chi tiết.
 
 Đó là những gì ta cần biết về thuật toán BPE! Tiếp theo, chúng ta sẽ cùng tìm hiểu về WordPiece.
diff --git a/chapters/vi/chapter6/6.mdx b/chapters/vi/chapter6/6.mdx
index 744012f07..c253e9e7f 100644
--- a/chapters/vi/chapter6/6.mdx
+++ b/chapters/vi/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece là một thuật toán tokenize được Google phát triển để h
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 Phần này sẽ đi sâu vào WordPiece, cũng như các triển khai đầy đủ của nó. Bạn có thể bỏ qua phần cuối nếu bạn chỉ muốn có một cái nhìn tổng quan về thuật toán tokenize này.
-
-</Tip>
+> [!TIP]
+> 💡 Phần này sẽ đi sâu vào WordPiece, cũng như các triển khai đầy đủ của nó. Bạn có thể bỏ qua phần cuối nếu bạn chỉ muốn có một cái nhìn tổng quan về thuật toán tokenize này.
 
 ## Thuật toán huấn luyện
 
-<Tip warning={true}>
-
-⚠️ Google không bao giờ có nguồn mở về cách triển khai các thuật toán huấn luyện của WordPiece,vì vậy những gì dưới đây là phỏng đoán tốt nhất của chúng tôi dựa trên các tài liệu đã xuất bản. Nó có thể không chính xác 100%.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google chưa bao giờ công bố mã nguồn mở phần triển khai thuật toán huấn luyện của WordPiece, vì vậy những gì dưới đây là phỏng đoán tốt nhất của chúng tôi dựa trên các tài liệu đã xuất bản. Nó có thể không chính xác 100%.
 
 Giống như BPE, WordPiece bắt đầu từ một từ vựng nhỏ bao gồm các token đặc biệt được sử dụng bởi mô hình và bảng chữ cái đầu tiên. Vì nó xác định các từ phụ bằng cách thêm tiền tố (như `##` cho BERT), ban đầu mỗi từ được tách bằng cách thêm tiền tố đó vào tất cả các ký tự bên trong từ. Vì vậy, ví dụ, `"word"` được chia như thế này:
 
@@ -76,11 +70,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 và tiếp tục như vậy cho đến khi chúng ta đạt được kích thước bộ từ vựng mong muốn.
 
-<Tip>
-
-✏️ **Giờ đến lượt bạn!** Bộ quy luật hợp nhất tiếp theo là gì?
-
-</Tip>
+> [!TIP]
+> ✏️ **Giờ đến lượt bạn!** Quy tắc hợp nhất tiếp theo sẽ là gì?
 
 ## Thuật toán tokenize
 
@@ -92,11 +83,8 @@ Ví dụ khác, hãy xem từ `"bugs"` sẽ được tokenize như thế nào. `
 
 Khi quá trình tokenize đến giai đoạn không thể tìm thấy một từ khóa phụ trong từ vựng, toàn bộ từ được tokenize thành không xác định - vì vậy, ví dụ: `"mug"` sẽ được tokenize là  `["[UNK]"]`, cũng như `"bum"` (ngay cả khi chúng ta có thể bắt đầu bằng `"b"` và `"##u"`, `"##m"` không phải thuộc bộ từ vựng và kết quả tokenize sẽ chỉ là `["[UNK]"]`, không phải `["b", "##u", "[UNK]"]`). Đây là một điểm khác biệt so với BPE, chỉ phân loại các ký tự riêng lẻ không có trong từ vựng là không xác định.
 
-<Tip>
-
-✏️ **Giờ đến lượt bạn!** `"pugs"` sẽ được tokenize như thế nào?
-
-</Tip>
+> [!TIP]
+> ✏️ **Giờ đến lượt bạn!** `"pugs"` sẽ được tokenize như thế nào?
 
 ## Triển khai WordPiece
 
@@ -314,11 +302,8 @@ print(vocab)
 
 Như có thể thấy, so với BPE, tokenizer này học các phần của từ như là token nhanh hơn một chút.
 
-<Tip>
-
-💡 Sử dụng `train_new_from_iterator()` trên cùng kho ngữ liệu sẽ không mang về kết quả kho ngữ liệu y hệt. Đó là bởi thư viện 🤗 Tokenizers không triển khai WordPiece cho huấn luyện (vì chúng ta không hoàn toàn nằm rõ bên trong), và sử dụng BPE thay vào đó.
-
-</Tip>
+> [!TIP]
+> 💡 Sử dụng `train_new_from_iterator()` trên cùng kho ngữ liệu sẽ không cho ra bộ từ vựng y hệt. Đó là bởi thư viện 🤗 Tokenizers không triển khai WordPiece cho huấn luyện (vì chúng ta không hoàn toàn nắm rõ bên trong), và sử dụng BPE thay vào đó.
 
 Để tokenize những đoạn văn mới, ta tiền tokenize nó, chia nhỏ và áp dụng thuật toán tokenize cho từng từ. Vậy đó, chúng ta nhìn vào cụm từ con dài nhất bắt đầu từ đầu từ đầu tiên và chia nhỏ nó, sau đó lặp lại quy trình với phần thứ hai, và tiếp tục cho đến hết từ và các từ tiếp theo trong văn bản: 
 
diff --git a/chapters/vi/chapter6/7.mdx b/chapters/vi/chapter6/7.mdx
index ff07c3c4e..384f9299c 100644
--- a/chapters/vi/chapter6/7.mdx
+++ b/chapters/vi/chapter6/7.mdx
@@ -11,11 +11,8 @@ Thuật toán Unigram thường được sử dung trong SentencePiece,  đây l
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 Phần này sẽ đi sâu vào Unigram cũng như toàn bộ cách triển khai. Bạn có thể bỏ qua phần cuối nếu bạn chỉ quan tâm tổng quan thuật toán tokenize.
-
-</Tip>
+> [!TIP]
+> 💡 Phần này sẽ đi sâu vào Unigram cũng như toàn bộ cách triển khai. Bạn có thể bỏ qua phần cuối nếu bạn chỉ quan tâm tổng quan thuật toán tokenize.
 
 ## Thuật toán huấn luyện
 
@@ -56,11 +53,8 @@ Dưới đây là tần suất của tất cả các từ phụ có thể có tr
 
 Vậy nên, tổng của tất cả các tần suất là 210, và xác suất của từ phụ `"ug"` là 20/210.
 
-<Tip>
-
-✏️ **Giờ đến lượt bạn!** Viết đoạn mã để tính tần suất trên và kiểm tra lại kết quả hiển thị cũng như tổng đã đúng chưa.
-
-</Tip>
+> [!TIP]
+> ✏️ **Giờ đến lượt bạn!** Viết đoạn mã để tính tần suất trên và kiểm tra lại kết quả hiển thị cũng như tổng đã đúng chưa.
 
 Giờ, để tokenize một từ cho trước, chúng ta sẽ nhìn vào tất cả các phần đoạn thành token và tính xác suất của từng cái theo mô hình Unigram. Vì tất cả token được cho là độc lập, xác suất này chỉ là tích của xác suất mỗi token. Ví dụ, `["p", "u", "g"]` của `"pug"` có xác suất:
 
@@ -98,11 +92,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 Vậy `"unhug"` có thể tokenize thành `["un", "hug"]`.
 
-<Tip>
-
-✏️ **Giờ đến lượt bạn!** Xác định token của từ `"huggun"`, và điểm cảu chúng.
-
-</Tip>
+> [!TIP]
+> ✏️ **Giờ đến lượt bạn!** Xác định token của từ `"huggun"`, và điểm của chúng.
 
 ## Quay lại huấn luyện
 
@@ -215,11 +206,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece sử dụng một thuật toán hiệu quả hơn gọi là Enhanced Suffix Array (ESA) để tạo ra bộ từ vựng ban đầu.
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece sử dụng một thuật toán hiệu quả hơn gọi là Enhanced Suffix Array (ESA) để tạo ra bộ từ vựng ban đầu.
 
 Tiếp theo, chúng ta tính tổng tần suất để biến đổi các tần suất này thành xác suất. Với mô hình, chúng ta sẽ lưu các log của xác xuất, vì nó ổn định hơn về mặt số học khi cộng logarit hơn là nhân các số nhỏ và điều này sẽ đơn giản hóa việc tính toán mất mát của mô hình: 
 
@@ -340,11 +328,8 @@ Vì `"ll"` được sử dụng trong quá trình tokenize  `"Hopefully"`, và l
 0.0
 ```
 
-<Tip>
-
-💡 Phương pháp này rất không hiệu quả, nên SentencePiece  sử dụng một xấp xỉ của hàm mất mát của mô hình mà không dùng token X: thay vì bắt đầu từ đầu, nó chỉ thay thế token X bởi phân đoạn bên trái của nó trong bộ từ vựng. Bằng cách này, tất cả điểm có thể được tính trong cùng một lần đồng thời với sự mất mát của mô hình.
-
-</Tip>
+> [!TIP]
+> 💡 Phương pháp này rất không hiệu quả, nên SentencePiece  sử dụng một xấp xỉ của hàm mất mát của mô hình mà không dùng token X: thay vì bắt đầu từ đầu, nó chỉ thay thế token X bởi phân đoạn bên trái của nó trong bộ từ vựng. Bằng cách này, tất cả điểm có thể được tính trong cùng một lần đồng thời với sự mất mát của mô hình.
 
 Với tất cả những điều trên, điều cuối cùng ta cần phải làm là thêm các token đặc biệt của mô hình vào bộ từ vựng, sau đó lặp cho đến khi chúng ta cắt đủ số token ta mong muốn cho kích cỡ bộ từ vựng:
 
diff --git a/chapters/vi/chapter6/8.mdx b/chapters/vi/chapter6/8.mdx
index aa93a0fed..9e5e9a242 100644
--- a/chapters/vi/chapter6/8.mdx
+++ b/chapters/vi/chapter6/8.mdx
@@ -109,12 +109,9 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**Đào sâu hơn** Nếu bạn muốn kiểm tra xem hai phiên bản chuẩn hoá trước đó trên cũng một chuỗi chứa kí tự unicode `u"\u0085"`, bạn chắc chắn sẽ nhận thấy rằng hai cách chuẩn hoá này không hoàn toàn giống nhau.
-Để tránh phức tạp hoá phiên bản với `normalizers.Sequence` quá nhiều, chúng tôi sẽ không bao gồm các sự thay thế theo Regex mà `BertNormalizer` yêu cầu khi tham số `clean_text` được thiết lập là `True` - đây cũng là giá trị mặc định. Nhưng đừng lo: có khả năng ta sẽ nhận được kết quả chuẩn hoá giống nhau mà không cần sử dụng `BertNormalizer` thủ công bằng cách thêm hai `normalizers.Replace` vào chuỗi chuẩn hoá.
-
-</Tip>
+> [!TIP]
+> **Đào sâu hơn** Nếu bạn muốn kiểm tra xem hai phiên bản chuẩn hoá trước đó trên cùng một chuỗi chứa kí tự unicode `u"\u0085"`, bạn chắc chắn sẽ nhận thấy rằng hai cách chuẩn hoá này không hoàn toàn giống nhau.
+> Để tránh phức tạp hoá phiên bản với `normalizers.Sequence` quá nhiều, chúng tôi sẽ không bao gồm các sự thay thế theo Regex mà `BertNormalizer` yêu cầu khi tham số `clean_text` được thiết lập là `True` - đây cũng là giá trị mặc định. Nhưng đừng lo: có khả năng ta sẽ nhận được kết quả chuẩn hoá giống nhau mà không cần sử dụng `BertNormalizer` thủ công bằng cách thêm hai `normalizers.Replace` vào chuỗi chuẩn hoá.
 
 Tiếp theo là bước pre-tokenization. Một lần nữa, ta có `BertPreTokenizer` được xây dựng sẵn để dùng:
 
diff --git a/chapters/vi/chapter7/1.mdx b/chapters/vi/chapter7/1.mdx
index 57fc79527..a0acae0fe 100644
--- a/chapters/vi/chapter7/1.mdx
+++ b/chapters/vi/chapter7/1.mdx
@@ -25,8 +25,5 @@ Mỗi phần có thể được đọc độc lập.
 
 {/if}
 
-<Tip>
-
-Nếu bạn đọc các phần theo trình tự, bạn sẽ nhận thấy rằng chúng có khá nhiều điểm chung về đoạn mã và văn xuôi mô tả. Việc lặp lại là có chủ đích, để cho phép bạn nhúng tay vào (hoặc quay lại sau) bất kỳ tác vụ nào mà bạn quan tâm và tìm thấy một ví dụ hoạt động hoàn chỉnh.
-
-</Tip>
+> [!TIP]
+> Nếu bạn đọc các phần theo trình tự, bạn sẽ nhận thấy rằng chúng có khá nhiều điểm chung về đoạn mã và văn xuôi mô tả. Việc lặp lại là có chủ đích, để cho phép bạn nhúng tay vào (hoặc quay lại sau) bất kỳ tác vụ nào mà bạn quan tâm và tìm thấy một ví dụ hoạt động hoàn chỉnh.
diff --git a/chapters/vi/chapter7/2.mdx b/chapters/vi/chapter7/2.mdx
index 5167b026f..6c33c5601 100644
--- a/chapters/vi/chapter7/2.mdx
+++ b/chapters/vi/chapter7/2.mdx
@@ -79,11 +79,8 @@ Bạn có thể tìm mô hình ta sẽ huấn luyện và tải lên Hub và ki
 
 Đầu tiên, ta cần bộ dữ liệu chuẩn bị cho phân loại token. Trong chương này, chúng ta sẽ sử dụng bộ dữ liệu [CoNLL-2003](https://huggingface.co/datasets/conll2003), bao gồm các câu chuyện tin tức từ Reuters.
 
-<Tip>
-
-💡 Miễn là tập dữ liệu của bạn bao gồm các văn bản được chia thành các từ với nhãn tương ứng của chúng, bạn sẽ có thể điều chỉnh các quy trình xử lý dữ liệu được mô tả ở đây với tập dữ liệu của riêng bạn. Tham khảo lại [Chapter 5](/course/chapter5) nếu bạn cần cập nhật về cách tải dữ liệu tùy chỉnh của riêng bạn trong `Dataset`.
-
-</Tip>
+> [!TIP]
+> 💡 Miễn là tập dữ liệu của bạn bao gồm các văn bản được chia thành các từ với nhãn tương ứng của chúng, bạn sẽ có thể điều chỉnh các quy trình xử lý dữ liệu được mô tả ở đây với tập dữ liệu của riêng bạn. Tham khảo lại [Chapter 5](/course/chapter5) nếu bạn cần cập nhật về cách tải dữ liệu tùy chỉnh của riêng bạn trong `Dataset`.
 
 ### Tập dữ liệu CoNLL-2003
 
@@ -201,11 +198,8 @@ Và đối với một ví dụ trộn nhãn `B-` và `I-`, đây là những g
 
 Như chúng ta có thể thấy, các thực thể bao gồm hai từ, như "European Union" và "Werner Zwingmann", được gán nhãn `B-` cho từ đầu tiên và nhãn `I-` cho từ thứ hai.
 
-<Tip>
-
-✏️ **Đến lượt bạn!** In hai câu giống nhau bằng nhãn POS hoặc phân khúc của chúng.
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** In lại hai câu trên cùng với nhãn POS hoặc nhãn phân khúc (chunking) của chúng.
 
 ### Xử lý dữ liệu
 
@@ -297,11 +291,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 Như chúng ta có thể thấy, hàm đã thêm `-100` cho hai token đặc biệt ở đầu và cuối, và dấu `0` mới cho từ của chúng ta đã được chia thành hai token.
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Một số nhà nghiên cứu chỉ thích gán một nhãn cho mỗi từ và gán `-100` cho các token con khác trong một từ nhất định. Điều này là để tránh các từ dài được chia thành nhiều token phụ góp phần lớn vào hàm mất mát. Thay đổi chức năng trước đó để căn chỉnh nhãn với ID đầu vào bằng cách tuân theo quy tắc này.
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Một số nhà nghiên cứu chỉ thích gán một nhãn cho mỗi từ và gán `-100` cho các token con khác trong một từ nhất định. Điều này là để tránh các từ dài được chia thành nhiều token phụ góp phần lớn vào hàm mất mát. Thay đổi hàm trước đó để căn chỉnh nhãn với ID đầu vào bằng cách tuân theo quy tắc này.
 
 Để xử lý trước toàn bộ tập dữ liệu của mình, chúng ta cần tokenize tất cả các đầu vào và áp dụng `align_labels_with_tokens()` trên tất cả các nhãn. Để tận dụng tốc độ của trình tokenize nhanh của mình, tốt nhất bạn nên tokenize nhiều văn bản cùng một lúc, vì vậy chúng ta sẽ viết một hàm xử lý danh sách các ví dụ và sử dụng phương thức `Dataset.map()` với tùy chọn `batched=True`. Điều duy nhất khác với ví dụ trước là hàm `word_ids()` cần lấy chỉ mục của mẫu mà chúng ta muốn các ID từ khi các đầu vào cho tokenizer là danh sách văn bản (hoặc trong trường hợp của chúng ta là danh sách danh sách các từ), vì vậy chúng ta cũng thêm vào đó:
 
@@ -461,11 +452,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Nếu bạn có một mô hình có số nhãn sai, bạn sẽ gặp lỗi khó hiểu khi gọi `model.fit()` sau này. Điều này có thể gây khó chịu khi gỡ lỗi, vì vậy hãy đảm bảo bạn thực hiện kiểm tra này để xác nhận rằng bạn có số lượng nhãn dự kiến.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nếu bạn có một mô hình có số nhãn sai, bạn sẽ gặp lỗi khó hiểu khi gọi `model.fit()` sau này. Điều này có thể gây khó chịu khi gỡ lỗi, vì vậy hãy đảm bảo bạn thực hiện kiểm tra này để xác nhận rằng bạn có số lượng nhãn dự kiến.
 
 ### Tinh chỉnh một mô hình
 
@@ -529,11 +517,8 @@ model.fit(
 
 Bạn có thể chỉ định tên đầy đủ của kho lưu trữ mà bạn muốn đẩy đến bằng tham số `hub_model_id` (đặc biệt, bạn sẽ phải sử dụng tham số này để đẩy đến một tổ chức). Ví dụ: khi chúng ta đẩy mô hình vào tổ chức [`huggingface-course`](https://huggingface.co/huggingface-course), chúng ta đã thêm `hub_model_id="huggingface-course/bert-finetuned-ner"`. Theo mặc định, kho lưu trữ được sử dụng sẽ nằm trong không gian tên của bạn và được đặt tên theo thư mục đầu ra mà bạn đã đặt, ví dụ: `"cool_huggingface_user/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi gọi `model.fit()` và sẽ cần đặt tên mới.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi gọi `model.fit()` và sẽ cần đặt tên mới.
 
 Lưu ý rằng trong khi quá trình huấn luyện diễn ra, mỗi khi mô hình được lưu (ở đây, mỗi epoch), nó sẽ được tải lên Hub ở chế độ nền. Bằng cách này, bạn sẽ có thể tiếp tục quá trình huấn luyện của mình trên một máy khác nếu cần.
 
@@ -709,11 +694,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ Nếu bạn có mô hình với số lượng nhãn sai, bạn sẽ nhận một lỗi khó hiểu khi gọi hàm `Trainer.train()` sau đó (giống như "CUDA error: device-side assert triggered"). Đây là nguyên nhân số một gây ra lỗi do người dùng báo cáo về những lỗi như vậy, vì vậy hãy đảm bảo bạn thực hiện kiểm tra này để xác nhận rằng bạn có số lượng nhãn dự kiến.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nếu bạn có mô hình với số lượng nhãn sai, bạn sẽ nhận một lỗi khó hiểu khi gọi hàm `Trainer.train()` sau đó (giống như "CUDA error: device-side assert triggered"). Đây là nguyên nhân số một dẫn đến các lỗi dạng này được người dùng báo cáo, vì vậy hãy đảm bảo bạn thực hiện kiểm tra này để xác nhận rằng bạn có số lượng nhãn dự kiến.
 
 ### Tinh chỉnh mô hình
 
@@ -751,11 +733,8 @@ args = TrainingArguments(
 
 Bạn đã từng thấy hầu hết những điều đó trước đây: chúng ta đặt một số siêu tham số (như tốc độ học, số epoch cần luyện tập và giảm trọng lượng) và chúng ta chỉ định `push_to_hub=True` để chỉ ra rằng chúng ta muốn lưu mô hình và đánh giá nó vào cuối mỗi epoch và rằng chúng ta muốn tải kết quả của mình lên Model Hub. Lưu ý rằng bạn có thể chỉ định tên của kho lưu trữ mà bạn muốn đẩy đến bằng tham số `hub_model_id` (cụ thể là bạn sẽ phải sử dụng tham số này để đẩy đến một tổ chức). Ví dụ: khi đẩy mô hình vào tổ chức [`huggingface-course`](https://huggingface.co/huggingface-course) chúng ta đã thêm `hub_model_id="huggingface-course/bert-finetuned-ner"` vào `TrainingArguments`. Theo mặc định, kho lưu trữ được sử dụng sẽ nằm trong không gian tên của bạn và được đặt tên theo thư mục đầu ra mà bạn đã đặt, vì vậy trong trường hợp của chúng tôi, nó sẽ là `"sgugger/bert-finetuned-ner"`.
 
-<Tip>
-
-💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi xác định `Trainer` của mình và sẽ cần đặt một tên mới.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi xác định `Trainer` của mình và sẽ cần đặt một tên mới.
 
 Cuối cùng, chúng ta chỉ cần truyền mọi thứ cho  `Trainer` và bắt đầu huấn luyện:
 
@@ -842,11 +821,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Nếu bạn huấn luyện trên TPU, bạn sẽ cần chuyển tất cả các đoạn mã ở trên thành một hàm huấn luyện. Xem [Chương 3](/course/chapter3) để biết thêm chi tiết.
-
-</Tip>
+> [!TIP]
+> 🚨 Nếu bạn huấn luyện trên TPU, bạn sẽ cần chuyển tất cả các đoạn mã ở trên thành một hàm huấn luyện. Xem [Chương 3](/course/chapter3) để biết thêm chi tiết.
 
 Bây giờ, chúng ta đã gửi `train_dataloader` của mình tới `speedrator.prepare()`, chúng ta có thể sử dụng độ dài của nó để tính số bước huấn luyện. Hãy nhớ rằng chúng ta phải luôn làm điều này sau khi chuẩn bị dataloader, vì phương thức đó sẽ thay đổi độ dài của nó. Chúng ta sử dụng một lịch trình tuyến tính cổ điển từ tốc độ học đến 0:
 
diff --git a/chapters/vi/chapter7/3.mdx b/chapters/vi/chapter7/3.mdx
index 0cc470d6c..f9fa20c55 100644
--- a/chapters/vi/chapter7/3.mdx
+++ b/chapters/vi/chapter7/3.mdx
@@ -41,11 +41,8 @@ Cùng đi sâu vào thôi!
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 Nếu các thuật ngữ "mô hình ngôn ngữ bị ẩn đi" và "mô hình huấn luyện trước" nghe có vẻ xa lạ với bạn, hãy xem [Chương 1](/course/chapter1), nơi chúng tôi giải thích tất cả các khái niệm cốt lõi này, kèm theo video!
-
-</Tip>
+> [!TIP]
+> 🙋 Nếu các thuật ngữ "mô hình ngôn ngữ bị ẩn đi" và "mô hình huấn luyện trước" nghe có vẻ xa lạ với bạn, hãy xem [Chương 1](/course/chapter1), nơi chúng tôi giải thích tất cả các khái niệm cốt lõi này, kèm theo video!
 
 ## Chọn một mô hình huấn luyện trước cho mô hình ngôn ngữ bị ẩn đi
 
@@ -237,11 +234,8 @@ for row in sample:
 
 Đúng, đây chắc chắn là những bài đánh giá phim, và nếu bạn đủ lớn, bạn thậm chí có thể hiểu nhận xét trong bài đánh giá cuối cùng về việc sở hữu phiên bản VHS 😜! Mặc dù chúng ta sẽ không cần nhãn để cho mô hình ngôn ngữ, nhưng chúng ta có thể thấy rằng `0` biểu thị một đánh giá tiêu cực, trong khi `1` tương ứng với một đánh giá tích cực.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tạo ra các mẫu ngẫu nhiền từ phần `phi giám sát` và kiểm định xem nhãn của chúng là `0` hay `1`. Khi đang ở đó, bạn cũng có thể kiểm tra xem các nhãn trong phần `huấn luyện` và `kiểm thử` có thực sử là `0` hoặc `1` không -- đây là một phần kiểm tra hữu ích mànhững nhà NLP nên thực hiện đầu dự án!. 
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tạo ra các mẫu ngẫu nhiên từ phần `unsupervised` và kiểm tra xem nhãn của chúng có phải là `0` hay `1` không. Nhân tiện, bạn cũng có thể kiểm tra xem các nhãn trong phần `train` và `test` có thực sự là `0` hoặc `1` không -- đây là một bước kiểm tra hữu ích mà những người làm NLP nên thực hiện khi bắt đầu dự án!
 
 Bây giờ chúng ta đã có một cái nhìn nhanh về dữ liệu, hãy đi sâu vào việc chuẩn bị nó cho việc lập mô hình ngôn ngữ bị ẩn đi. Như chúng ta sẽ thấy, có một số bước bổ sung mà người ta cần thực hiện so với các tác vụ phân loại chuỗi mà chúng ta đã thấy trong [Chương 3](/course/chapter3). Đi thôi!
 
@@ -299,11 +293,8 @@ tokenizer.model_max_length
 
 Giá trị này có nguồn gốc từ tệp *tokenizer_config.json* được liên kết với một checkpoint; trong trường hợp này, chúng ta có thể thấy rằng kích thước ngữ cảnh là 512 token, giống như với BERT.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Một số mô hình Transformer, như[BigBird](https://huggingface.co/google/bigbird-roberta-base) và [Longformer](hf.co/allenai/longformer-base-4096),có độ dài ngữ cảnh dài hơn nhiều so với BERT và các mô hình Transformer đời đầu khác. Khởi tạo tokenizer cho một trong những checkpoint và xác minh rằng `model_max_length` tương ứng với những gì được trích dẫn trên thẻ mô hình của nó.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Một số mô hình Transformer, như [BigBird](https://huggingface.co/google/bigbird-roberta-base) và [Longformer](hf.co/allenai/longformer-base-4096), có độ dài ngữ cảnh dài hơn nhiều so với BERT và các mô hình Transformer đời đầu khác. Khởi tạo tokenizer cho một trong những checkpoint và xác minh rằng `model_max_length` tương ứng với những gì được trích dẫn trên thẻ mô hình của nó.
 
 Vì vậy, để chạy các thử nghiệm trên GPU như những GPU được tìm thấy trên Google Colab, chúng ta sẽ chọn thứ gì đó nhỏ hơn một chút có thể vừa với bộ nhớ:
 
@@ -311,11 +302,8 @@ Vì vậy, để chạy các thử nghiệm trên GPU như những GPU được
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-Lưu ý rằng việc sử dụng kích thước phân đoạn nhỏ có thể gây bất lợi trong các tình huống thực tế, vì vậy bạn nên sử dụng kích thước tương ứng với trường hợp sử dụng mà bạn sẽ áp dụng mô hình của mình.
-
-</Tip>
+> [!WARNING]
+> Lưu ý rằng việc sử dụng kích thước phân đoạn nhỏ có thể gây bất lợi trong các tình huống thực tế, vì vậy bạn nên sử dụng kích thước tương ứng với trường hợp sử dụng mà bạn sẽ áp dụng mô hình của mình.
 
 Bây giờ đến phần thú vị. Để cho biết cách nối hoạt động, hãy lấy một vài bài đánh giá từ bộ huấn luyện được tokenize và in ra số lượng token cho mỗi bài đánh giá:
 
@@ -473,11 +461,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 Tốt, nó đã hoạt động! Chúng ta có thể thấy rằng `[MASK]` đã được chèn ngẫu nhiên tại các vị trí khác nhau trong văn bản. Đây sẽ là những token mà mô hình sẽ phải dự đoán trong quá trình huấn luyện - và cái hay của công cụ đối chiếu dữ liệu là nó sẽ ngẫu nhiên chèn `[MASK]` với mọi lô!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Chạy đoạn mã trên vài lần để xem việc che ngẫu nhiên diễn ra ngay trước mắt bạn! Đồng thời  thử thay thế phương thức `tokenizer.decode()` bằng `tokenizer.convert_ids_to_tokens()` để thấy rằng đôi khi một token từ một từ nhất định bị che, chứ không phải những cái khác.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chạy đoạn mã trên vài lần để xem việc che ngẫu nhiên diễn ra ngay trước mắt bạn! Đồng thời thử thay thế phương thức `tokenizer.decode()` bằng `tokenizer.convert_ids_to_tokens()` để thấy rằng đôi khi chỉ một token của một từ nhất định bị che, còn những token khác của từ đó thì không.
 
 {#if fw === 'pt'}
 
@@ -587,11 +572,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Chạy đoạn mã trên vài lần để xem việc che ngẫu nhiên diễn ra ngay trước mắt bạn! Đồng thời  thử thay thế phương thức `tokenizer.decode()` bằng `tokenizer.convert_ids_to_tokens()` để thấy rằng đôi khi một token từ một từ nhất định bị che, chứ không phải những cái khác.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chạy đoạn mã trên vài lần để xem việc che ngẫu nhiên diễn ra ngay trước mắt bạn! Đồng thời thử thay thế phương thức `tokenizer.decode()` bằng `tokenizer.convert_ids_to_tokens()` để thấy rằng đôi khi chỉ một token của một từ nhất định bị che, còn những token khác của từ đó thì không.
 
 Giờ chúng ta có hai trình đối chiếu dữ liệu, phần còn lại của các bước tinh chỉnh là tiêu chuẩn. Quá trình huấn luyện có thể mất một khoảng thời gian trên Google Colab nếu bạn không đủ may mắn để đạt được GPU P100 thần thoại 😭, vì vậy, trước tiên chúng ta sẽ giảm kích thước của tập huấn luyện xuống còn vài nghìn mẫu. Đừng lo lắng, chúng ta sẽ vẫn nhận được một mô hình ngôn ngữ khá tốt! Một cách nhanh chóng để giảm mẫu một tập dữ liệu trong 🤗 Datasets là thông qua hàm `Dataset.train_test_split()` mà chúng ta đã thấy trong [Chapter 5](/course/chapter5):
 
@@ -819,11 +801,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Chạy bướchuấn luyện trên sau khi thay đổi trình thu thập dữ liệu thành che toàn bộ từ. Bạn có nhận được kết quả tốt hơn không?
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Chạy bước huấn luyện trên sau khi thay đổi trình thu thập dữ liệu thành che toàn bộ từ. Bạn có nhận được kết quả tốt hơn không?
 
 {#if fw === 'pt'}
 
@@ -1041,8 +1020,5 @@ Gọn gàng - mô hình của chúng ta rõ ràng đã điều chỉnh trọng s
 
 Điều này kết thúc thử nghiệm đầu tiên của chúng ta với việc huấn luyện một mô hình ngôn ngữ. Trong [phần 6](/course/chapter7/section6), bạn sẽ học cách huấn luyện một mô hình tự động hồi quy như GPT-2 từ đầu; hãy đến đó nếu bạn muốn xem cách bạn có thể huấn luyện trước mô hình Transformer của riêng mình!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Để định lượng lợi ích của việc thích ứng chuyên môn, hãy tinh chỉnh bộ phân loại trên các nhãn IMDb cho cả các checkpoint DistilBERT được huấn luyện trước và tinh chỉnh. Nếu bạn cần bồi dưỡng về phân loại văn bản, hãy xem [Chương 3](/course/chapter3).
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Để định lượng lợi ích của việc thích ứng chuyên môn, hãy tinh chỉnh bộ phân loại trên các nhãn IMDb cho cả các checkpoint DistilBERT được huấn luyện trước và tinh chỉnh. Nếu bạn cần bồi dưỡng về phân loại văn bản, hãy xem [Chương 3](/course/chapter3).
diff --git a/chapters/vi/chapter7/4.mdx b/chapters/vi/chapter7/4.mdx
index 42b060b45..b0e59ce09 100644
--- a/chapters/vi/chapter7/4.mdx
+++ b/chapters/vi/chapter7/4.mdx
@@ -156,11 +156,8 @@ Sẽ rất thú vị khi xem liệu mô hình tinh chỉnh của mình có tiế
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Một từ tiếng Anh khác thường được sử dụng trong tiếng Pháp là "email". Tìm mẫu đầu tiên trong tập dữ liệu huấn luyện sử dụng từ này. Nó được dịch như thế nào? Làm thế nào để mô hình huấn luyện trước dịch cùng một câu tiếng Anh?
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Một từ tiếng Anh khác thường được sử dụng trong tiếng Pháp là "email". Tìm mẫu đầu tiên trong tập dữ liệu huấn luyện sử dụng từ này. Nó được dịch như thế nào? Làm thế nào để mô hình huấn luyện trước dịch cùng một câu tiếng Anh?
 
 ### Chuẩn bị dữ liệu
 
@@ -177,11 +174,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="tf")
 
 You can also replace the `model_checkpoint` with any other model you prefer from the [Hub](https://huggingface.co/models), or a local folder where you've saved a pretrained model and a tokenizer.
 
-<Tip>
-
-💡 Nếu bạn đang sử dụng trình tokenize đa ngôn ngữ như mBART, mBART-50 hoặc M2M100, bạn sẽ cần đặt mã ngôn ngữ của đầu vào và nhãn của mình trong trình tokenize bằng cách đặt `tokenizer.src_lang` và `tokenizer.tgt_lang` ở bên phải các giá trị.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu bạn đang sử dụng trình tokenize đa ngôn ngữ như mBART, mBART-50 hoặc M2M100, bạn sẽ cần đặt mã ngôn ngữ của đầu vào và nhãn của mình trong trình tokenize bằng cách gán các giá trị phù hợp cho `tokenizer.src_lang` và `tokenizer.tgt_lang`.
 
 Việc chuẩn bị dữ liệu của chúng ta khá đơn giản. Chỉ có một điều cần nhớ: bạn xử lý các đầu vào như bình thường, nhưng đối với các nhãn, bạn cần phải bọc tokenizer bên trong trình quản lý ngữ cảnh `as_target_tokenizer()`.
 
@@ -244,17 +238,11 @@ def preprocess_function(examples):
 
 Note that we set similar maximum lengths for our inputs and outputs. Since the texts we're dealing with seem pretty short, we use 128.
 
-<Tip>
+> [!TIP]
+> 💡 Nếu bạn đang sử dụng mô hình T5 (cụ thể hơn là một trong các checkpoint `t5-xxx`), mô hình sẽ mong đợi các đầu vào văn bản có tiền tố cho biết tác vụ đang thực hiện, chẳng hạn như `translate: English to French:`.
 
-💡 Nếu bạn đang sử dụng mô hình T5 (cụ thể hơn là một trong các checkpoint `t5-xxx`), mô hình sẽ mong đợi các đầu vào văn bản có tiền tố cho biết tác vụ đang thực hiện, chẳng hạn như `translate: English to French:`.
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ Chúng ta không chú ý đến attention mask của các nhãn, vì mô hình sẽ không mong đợi điều đó. Thay vào đó, các nhãn tương ứng với token đệm phải được đặt thành `-100` để chúng bị bỏ qua trong tính toán mất mát. Điều này sẽ được thực hiện bởi trình đối chiếu dữ liệu của chúng ta sau này vì chúng ta đang áp dụng đệm động, nhưng nếu bạn sử dụng đệm ở đây, bạn nên điều chỉnh chức năng tiền xử lý để đặt tất cả các nhãn tương ứng với token đệm thành `-100`.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Chúng ta không chú ý đến attention mask của các nhãn, vì mô hình sẽ không mong đợi điều đó. Thay vào đó, các nhãn tương ứng với token đệm phải được đặt thành `-100` để chúng bị bỏ qua trong tính toán mất mát. Điều này sẽ được thực hiện bởi trình đối chiếu dữ liệu của chúng ta sau này vì chúng ta đang áp dụng đệm động, nhưng nếu bạn sử dụng đệm ở đây, bạn nên điều chỉnh hàm tiền xử lý để đặt tất cả các nhãn tương ứng với token đệm thành `-100`.
 
 Bây giờ chúng ta có thể áp dụng tiền xử lý đó trong một lần trên tất cả các phần của tập dữ liệu của mình:
 
@@ -642,11 +630,8 @@ model.fit(
 
 Lưu ý rằng bạn có thể chỉ định tên của kho lưu trữ mà bạn muốn đẩy lên bằng tham số `hub_model_id` (cụ thể là bạn sẽ phải sử dụng tham số này để đẩy lên một tổ chức). Ví dụ: khi chúng tôi đẩy mô hình vào tổ chức [`huggingface-course`](https://huggingface.co/huggingface-course), chúng ta đã thêm `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` thành `Seq2SeqTrainingArguments`. Theo mặc định, kho lưu trữ được sử dụng sẽ nằm trong không gian tên của bạn và được đặt tên theo thư mục đầu ra mà bạn đã đặt, vì vậy ở đây nó sẽ là `"sgugger/marian-finetuned-kde4-en-to-fr"` (là mô hình mà chúng tôi đã liên kết với ở đầu phần này).
 
-<Tip>
-
-💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi gọi `model.fit()` và sẽ cần đặt tên mới.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi gọi `model.fit()` và sẽ cần đặt tên mới.
 
 Cuối cùng, hãy xem các chỉ số của chúng ta trông như thế nào khi quá trình huấn luyện đã kết thúc:
 
@@ -692,11 +677,8 @@ Ngoài các siêu tham số thông thường (như tốc độ học, số epoch
 
 Lưu ý rằng bạn có thể chỉ định tên đầy đủ của kho lưu trữ mà bạn muốn đẩy đến bằng tham số `hub_model_id` (đặc biệt, bạn sẽ phải sử dụng tham số này để đẩy đến một tổ chức). Ví dụ: khi chúng ta đẩy mô hình vào tổ chức [`huggingface-course`](https://huggingface.co/huggingface-course), chúng ta đã thêm `hub_model_id="huggingface-course/marian-finetuned-kde4-en-to-fr"` thành `Seq2SeqTrainingArguments`. Theo mặc định, kho lưu trữ được sử dụng sẽ nằm trong không gian tên của bạn và được đặt tên theo thư mục đầu ra mà bạn đã đặt, vì vậy trong trường hợp của chúng tôi, nó sẽ là `"sgugger/marian-finetuned-kde4-en-to-fr"` (là mô hình chúng tôi liên kết đến ở đầu phần này).
 
-<Tip>
-
-💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi xác định `Seq2SeqTrainer` của mình và sẽ cần đặt tên mới.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu thư mục đầu ra bạn đang sử dụng đã tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến. Nếu không, bạn sẽ gặp lỗi khi xác định `Seq2SeqTrainer` của mình và sẽ cần đặt tên mới.
 
 Cuối cùng, ta chỉ cần truyền mọi thứ cho `Seq2SeqTrainer`:
 
@@ -986,8 +968,5 @@ translator(
 
 Một ví dụ tuyệt vời khác về thích ứng chuyện môn!
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Mô hình trả về cái gì với từ "email" bạn xác định trước đó?
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Mô hình trả về gì với mẫu chứa từ "email" mà bạn đã xác định trước đó?
diff --git a/chapters/vi/chapter7/5.mdx b/chapters/vi/chapter7/5.mdx
index 518c5217f..0786c951e 100644
--- a/chapters/vi/chapter7/5.mdx
+++ b/chapters/vi/chapter7/5.mdx
@@ -85,11 +85,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Thay đổi seed ngẫu nhiên trong lệnh `Dataset.shuffle()` để khám phá các bài đánh giá khác trong kho tài liệu. Nếu bạn là người nói tiếng Tây Ban Nha, hãy xem một số bài đánh giá trong `spanish_dataset` để xem liệu các tiêu đề có giống như những bản tóm tắt hợp lý hay không.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Thay đổi seed ngẫu nhiên trong lệnh `Dataset.shuffle()` để khám phá các bài đánh giá khác trong kho tài liệu. Nếu bạn là người nói tiếng Tây Ban Nha, hãy xem một số bài đánh giá trong `spanish_dataset` để xem liệu các tiêu đề có giống như những bản tóm tắt hợp lý hay không.
 
 Mẫu này cho thấy sự đa dạng của các bài đánh giá mà người ta thường tìm thấy trên mạng, từ tích cực đến tiêu cực (và mọi thứ ở giữa!). Mặc dù ví dụ với tiêu đề "meh" không nhiều thông tin, nhưng các tiêu đề khác trông giống như những bản tóm tắt phù hợp về bản thân các đánh giá. Việc huấn luyện một mô hình tóm tắt cho tất cả 400,000 bài đánh giá sẽ mất quá nhiều thời gian trên một GPU, vì vậy thay vào đó, chúng ta sẽ tập trung vào việc tạo tóm tắt cho một miền sản phẩm. Để biết tên miền mà chúng ta có thể chọn, hãy chuyển đổi `english_dataset` thành `pandas.DataFrame` và tính toán số lượng đánh giá cho mỗi danh mục sản phẩm:
 
@@ -227,11 +224,8 @@ Chúng ta sẽ tập trung vào mT5, một kiến trúc thú vị dựa trên T5
 mT5 không sử dụng tiền tố, nhưng chia sẻ phần lớn tính linh hoạt của T5 và có lợi thế là đa ngôn ngữ. Giờ ta đã chọn một mô hình, hãy xem xét việc chuẩn bị dữ liệu để huấn luyện.
 
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Khi bạn đã làm qua phần này, hãy xem mT5 so với mBART tốt như thế nào bằng cách tinh chỉnh phần sau với các kỹ thuật tương tự. Để có điểm thưởng, bạn cũng có thể thử tinh chỉnh T5 chỉ trên các bài đánh giá tiếng Anh. Vì T5 có tiền tố nhắc đặc biệt, bạn sẽ cần thêm  `summarize:` vào trước các mẫu đầu vào trong các bước tiền xử lý bên dưới.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Khi bạn đã làm qua phần này, hãy xem mT5 so với mBART tốt như thế nào bằng cách tinh chỉnh phần sau với các kỹ thuật tương tự. Để có điểm thưởng, bạn cũng có thể thử tinh chỉnh T5 chỉ trên các bài đánh giá tiếng Anh. Vì T5 có tiền tố nhắc đặc biệt, bạn sẽ cần thêm  `summarize:` vào trước các mẫu đầu vào trong các bước tiền xử lý bên dưới.
 
 ## Tiền xử lý dữ liệu
 
@@ -246,10 +240,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡 Trong giai đoạn đầu của các dự án NLP của bạn, một phương pháp hay là huấn luyện một lớp các mô hình "nhỏ" trên một mẫu dữ liệu nhỏ. Điều này cho phép bạn gỡ lỗi và lặp lại nhanh hơn đối với quy trình làm việc đầu cuối. Một khi bạn tự tin vào kết quả, bạn luôn có thể mở rộng mô hình bằng cách thay đổi checkpoint của mô hình!
-</Tip>
+> [!TIP]
+> 💡 Trong giai đoạn đầu của các dự án NLP của bạn, một phương pháp hay là huấn luyện một lớp các mô hình "nhỏ" trên một mẫu dữ liệu nhỏ. Điều này cho phép bạn gỡ lỗi và lặp lại nhanh hơn đối với quy trình làm việc đầu cuối. Một khi bạn tự tin vào kết quả, bạn luôn có thể mở rộng mô hình bằng cách thay đổi checkpoint của mô hình!
 
 Hãy thử nghiệm mT5 tokenizer trên một ví dụ nhỏ:
 
@@ -305,11 +297,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 Bây giờ kho dữ liệu đã được xử lý trước, chúng ta hãy xem xét một số chỉ số thường được sử dụng để tóm tắt. Như chúng ta sẽ thấy, không có giải pháp dễ dàng và nhanh chóng khi nói đến việc đo lường chất lượng của văn bản do máy tạo ra.
 
-<Tip>
-
-💡 Bạn có thể nhận thấy rằng chúng ta đã sử dụng `batched=True` trong hàm`Dataset.map()` ở trên. Điều này mã hóa các mẫu theo lô 1,000 (mặc định) và cho phép bạn sử dụng khả năng đa luồng của các bộ tokenizer nhanh trong 🤗 Transformers. Nếu có thể, hãy thử sử dụng `batched=True` để tận dụng tối đa quá trình tiền xử lý của bạn!
-
-</Tip>
+> [!TIP]
+> 💡 Bạn có thể nhận thấy rằng chúng ta đã sử dụng `batched=True` trong hàm `Dataset.map()` ở trên. Điều này mã hóa các mẫu theo lô 1,000 (mặc định) và cho phép bạn sử dụng khả năng đa luồng của các bộ tokenizer nhanh trong 🤗 Transformers. Nếu có thể, hãy thử sử dụng `batched=True` để tận dụng tối đa quá trình tiền xử lý của bạn!
 
 ## Thước đo cho tóm tắt văn bản
 
@@ -326,11 +315,8 @@ reference_summary = "I loved reading the Hunger Games"
 
 Một cách để có thể so sánh chúng là đếm số từ trùng lặp, trong trường hợp này sẽ là 6. Tuy nhiên, điều này hơi thô, vì vậy thay vào đó ROUGE dựa trên việc tính toán điểm số _precision_  và _recall_ cho sự trùng lặp.
 
-<Tip>
-
-🙋 Đừng lo lắng nếu đây là lần đầu tiên bạn nghe nói về precision và recall - chúng ta sẽ cùng nhau điểm qua một số ví dụ rõ ràng để làm rõ tất cả. Các chỉ số này thường gặp trong các tác vụ phân loại, vì vậy nếu bạn muốn hiểu cách xác định precision và recall trong ngữ cảnh đó, chúng tôi khuyên bạn nên xem [hướng dẫn `scikit-learn`](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
-
-</Tip>
+> [!TIP]
+> 🙋 Đừng lo lắng nếu đây là lần đầu tiên bạn nghe nói về precision và recall - chúng ta sẽ cùng nhau điểm qua một số ví dụ rõ ràng để làm rõ tất cả. Các chỉ số này thường gặp trong các tác vụ phân loại, vì vậy nếu bạn muốn hiểu cách xác định precision và recall trong ngữ cảnh đó, chúng tôi khuyên bạn nên xem [hướng dẫn `scikit-learn`](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html).
 
 Đối với ROUGE, recall đo lường mức độ tóm tắt tham chiếu thu được từ cái đã tạo. Nếu chúng ta chỉ so sánh các từ, recall có thể được tính theo công thức sau:
 
@@ -382,11 +368,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 
 Tuyệt vời, precision và recall lại khớp với nhau! Còn những điểm ROUGE khác thì sao? `rouge2` đo lường sự trùng lặp giữa các bigram (hãy nghĩ rằng đó là sự chồng chéo của các cặp từ), trong khi `rougeL` và `rougeLsum` đo lường các chuỗi từ phù hợp dài nhất bằng cách tìm kiếm các chuỗi con chung dài nhất trong các bản tóm tắt được tạo và tham chiếu. "sum" trong `rougeLsum` đề cập đến thực tế là chỉ số này được tính trên toàn bộ bản tóm tắt, trong khi `rougeL` được tính là giá trị trung bình trên các câu riêng lẻ.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Tạo ví dụ của riêng bạn về bản tóm tắt được tạo và tham khảo, và xem liệu điểm kết quả ROUGE có giống với tính toán thủ công dựa trên các công thức về precision và recall hay không. Để có điểm thưởng, hãy chia văn bản thành bigrams và so sánh độ chính xác và thu hồi cho chỉ số `rouge2`.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Tạo ví dụ của riêng bạn về bản tóm tắt được tạo và tham khảo, và xem liệu điểm kết quả ROUGE có giống với tính toán thủ công dựa trên các công thức về precision và recall hay không. Để có điểm thưởng, hãy chia văn bản thành bigrams và so sánh độ chính xác và thu hồi cho chỉ số `rouge2`.
 
 Chúng tôi sẽ sử dụng các điểm ROUGE này để theo dõi hiệu suất của mô hình, nhưng trước khi làm điều đó, hãy làm điều mà mọi người thực hành NLP giỏi nên làm: tạo một đường cơ sở mạnh mẽ nhưng đơn giản!
 
@@ -476,11 +459,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 Nếu bạn đang tự hỏi tại sao bạn không thấy bất kỳ cảnh báo nào về việc tinh chỉnh mô hình trên một tác vụ phía sau, đó là bởi vì đối với các tác vụ chuỗi sang chuỗi, chúng ta giữ tất cả các trọng số của mạng. So sánh mô hình này với mô hình phân loại văn bản trong [Chương 3](/course/chapter3), trong đó phần đầu của mô hình định sẵn được thay thế bằng một mạng được khởi tạo ngẫu nhiên.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu bạn đang tự hỏi tại sao bạn không thấy bất kỳ cảnh báo nào về việc tinh chỉnh mô hình trên một tác vụ phía sau, đó là bởi vì đối với các tác vụ chuỗi sang chuỗi, chúng ta giữ tất cả các trọng số của mạng. So sánh mô hình này với mô hình phân loại văn bản trong [Chương 3](/course/chapter3), trong đó phần đầu của mô hình định sẵn được thay thế bằng một mạng được khởi tạo ngẫu nhiên.
 
 Điều tiếp theo chúng ta cần làm là đăng nhập vào Hugging Face Hub. Nếu bạn đang chạy đoạn mã này trong notebook, bạn có thể làm như vậy với hàm tiện ích sau:
 
@@ -820,11 +800,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Nếu bạn đang huấn luyện trên TPU, bạn sẽ cần chuyển tất cả đoạn mã ở trên vào một hàm huấn luyện chuyên dụng. Xem [Chương 3(/course/chapter3) để biết thêm chi tiết.
-
-</Tip>
+> [!TIP]
+> 🚨 Nếu bạn đang huấn luyện trên TPU, bạn sẽ cần chuyển tất cả đoạn mã ở trên vào một hàm huấn luyện chuyên dụng. Xem [Chương 3](/course/chapter3) để biết thêm chi tiết.
 
 Bây giờ chúng ta đã chuẩn bị các đối tượng của mình, còn ba việc cần làm:
 
diff --git a/chapters/vi/chapter7/6.mdx b/chapters/vi/chapter7/6.mdx
index 0fc13dfd6..1a735073c 100644
--- a/chapters/vi/chapter7/6.mdx
+++ b/chapters/vi/chapter7/6.mdx
@@ -161,11 +161,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-Việc huấn luyện trước mô hình ngôn ngữ sẽ mất một lúc. Chúng tôi khuyên bạn trước tiên nên chạy vòng lặp huấn luyện trên một mẫu dữ liệu bằng cách bỏ chú thích hai dòng một phần ở trên và đảm bảo rằng quá trình huấn luyện hoàn tất thành công và các mô hình được lưu trữ. Không có gì khó chịu hơn là một lần chạy huấn luyện không thành công ở bước cuối cùng vì bạn quên tạo một thư mục hoặc vì có lỗi đánh máy ở cuối vòng lặp huấn luyện!
-
-</Tip>
+> [!TIP]
+> Việc huấn luyện trước mô hình ngôn ngữ sẽ mất một lúc. Chúng tôi khuyên bạn trước tiên nên chạy vòng lặp huấn luyện trên một mẫu dữ liệu bằng cách bỏ chú thích hai dòng một phần ở trên và đảm bảo rằng quá trình huấn luyện hoàn tất thành công và các mô hình được lưu trữ. Không có gì khó chịu hơn là một lần chạy huấn luyện không thành công ở bước cuối cùng vì bạn quên tạo một thư mục hoặc vì có lỗi đánh máy ở cuối vòng lặp huấn luyện!
 
 Hãy xem một ví dụ từ tập dữ liệu. Chúng ta sẽ chỉ hiển thị 200 ký tự đầu tiên của mỗi trường:
 
@@ -286,11 +283,8 @@ Hiện chúng ta có 16,7 triệu ví dụ với 128 token mỗi ví dụ, tươ
 
 Bây giờ chúng ta đã có tập dữ liệu sẵn sàng, hãy thiết lập mô hình!
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Loại bỏ tất cả các phần nhỏ hơn kích thước ngữ cảnh không phải là vấn đề lớn ở đây vì chúng ta đang sử dụng các cửa sổ ngữ cảnh nhỏ. Khi bạn tăng kích thước ngữ cảnh (hoặc nếu bạn có một kho tài liệu ngắn), phần nhỏ các phần bị vứt bỏ cũng sẽ tăng lên. Một cách hiệu quả hơn để chuẩn bị dữ liệu là kết hợp tất cả các mẫu được tokenize trong một lô với token `eos_token_id` ở giữa, và sau đó thực hiện phân đoạn trên các chuỗi được nối. Như một bài tập, hãy sửa đổi hàm `tokenize()` để sử dụng cách tiếp cận đó. Lưu ý rằng bạn sẽ muốn đặt `truncation=False` và xóa các tham số khác khỏi tokenizer để nhận được chuỗi đầy đủ của token ID.
-
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Loại bỏ tất cả các phần nhỏ hơn kích thước ngữ cảnh không phải là vấn đề lớn ở đây vì chúng ta đang sử dụng các cửa sổ ngữ cảnh nhỏ. Khi bạn tăng kích thước ngữ cảnh (hoặc nếu bạn có một kho tài liệu ngắn), phần nhỏ các phần bị vứt bỏ cũng sẽ tăng lên. Một cách hiệu quả hơn để chuẩn bị dữ liệu là kết hợp tất cả các mẫu được tokenize trong một lô với token `eos_token_id` ở giữa, và sau đó thực hiện phân đoạn trên các chuỗi được nối. Như một bài tập, hãy sửa đổi hàm `tokenize()` để sử dụng cách tiếp cận đó. Lưu ý rằng bạn sẽ muốn đặt `truncation=False` và xóa các tham số khác khỏi tokenizer để nhận được chuỗi đầy đủ của token ID.
 
 ## Khởi tạo mô hình mới
 
@@ -431,11 +425,8 @@ tf_eval_dataset = tokenized_dataset["valid"].to_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ Việc dịch chuyển các đầu vào và nhãn để căn chỉnh chúng xảy ra bên trong mô hình, do đó, bộ đối chiếu dữ liệu chỉ cần sao chép các đầu vào để tạo nhãn.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Việc dịch chuyển các đầu vào và nhãn để căn chỉnh chúng xảy ra bên trong mô hình, do đó, bộ đối chiếu dữ liệu chỉ cần sao chép các đầu vào để tạo nhãn.
 
 Bây giờ chúng ta có mọi thứ để thực sự huấn luyện mô hình của mình - đó không phải là quá nhiều công việc! Trước khi bắt đầu luyện tập, chúng ta nên đăng nhập vào Hugging Face. Nếu bạn đang làm việc trong notebook, bạn có thể làm như vậy với hàm tiện ích sau:
 
@@ -533,25 +524,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Chỉ mất khoảng 30 dòng mã ngoài `TrainingArguments` để từ văn bản thô đến huấn luyện GPT-2. Hãy dùng thử với tập dữ liệu của riêng bạn và xem liệu bạn có thể đạt được kết quả tốt hay không!
 
-✏️ **Thử nghiệm thôi!** Chỉ mất khoảng 30 dòng mã ngoài `TrainingArguments` để từ văn bản thô đến huấn luyện GPT-2. Hãy dùng thử với tập dữ liệu của riêng bạn và xem liệu bạn có thể đạt được kết quả tốt hay không!
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 Nếu bạn có quyền truy cập vào một máy có nhiều GPU, hãy thử chạy mã ở đó. `Trainer` tự động quản lý nhiều máy và điều này có thể tăng tốc quá trình huấn luyện lên rất nhiều.
-
-{:else}
-
-💡 Nếu bạn có quyền truy cập vào một máy có nhiều GPU, bạn có thể thử sử dụng ngữ cảnh `MirroredStrategy` để tăng tốc đáng kể cho quá trình huấn luyện. Bạn sẽ cần tạo một đối tượng `tf.distribute.MirroredStrategy` và đảm bảo rằng các lệnh `to_tf_dataset` cũng như tạo mô hình và lệnh gọi đến `fit()` đều được chạy trong ngữ cảnh `scope()` của nó. Bạn có thể xem tài liệu về điều này [tại đây](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 Nếu bạn có quyền truy cập vào một máy có nhiều GPU, hãy thử chạy mã ở đó. `Trainer` tự động quản lý nhiều máy và điều này có thể tăng tốc quá trình huấn luyện lên rất nhiều.
+>
+> {:else}
+>
+> 💡 Nếu bạn có quyền truy cập vào một máy có nhiều GPU, bạn có thể thử sử dụng ngữ cảnh `MirroredStrategy` để tăng tốc đáng kể cho quá trình huấn luyện. Bạn sẽ cần tạo một đối tượng `tf.distribute.MirroredStrategy` và đảm bảo rằng các lệnh `to_tf_dataset` cũng như tạo mô hình và lệnh gọi đến `fit()` đều được chạy trong ngữ cảnh `scope()` của nó. Bạn có thể xem tài liệu về điều này [tại đây](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
+>
+> {/if}
 
 ## Tạo mã với một pipeline
 
@@ -827,11 +812,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 Nếu bạn đang huấn luyện trên TPU, bạn sẽ cần chuyển tất cả mã bắt đầu từ ô ở trên vào một hàm huấn luyện chuyên dụng. Xem [Chapter 3](/course/chapter3) để biết thêm chi tiết.
-
-</Tip>
+> [!TIP]
+> 🚨 Nếu bạn đang huấn luyện trên TPU, bạn sẽ cần chuyển tất cả mã bắt đầu từ ô ở trên vào một hàm huấn luyện chuyên dụng. Xem [Chương 3](/course/chapter3) để biết thêm chi tiết.
 
 Bây giờ, chúng ta đã gửi `train_dataloader` của mình tới `accelerator.prepare()`, chúng ta có thể sử dụng độ dài của nó để tính số bước huấn luyện. Hãy nhớ rằng chúng ta phải luôn làm điều này sau khi chuẩn bị dataloader, vì phương thức đó sẽ thay đổi độ dài của nó. Chúng ta sử dụng một lịch trình tuyến tính cổ điển từ tốc độ học đến 0:
 
@@ -932,16 +914,10 @@ for epoch in range(num_train_epochs):
 
 Vậy là xong - bây giờ bạn có vòng huấn luyện tùy chỉnh của riêng mình cho các mô hình ngôn ngữ nhân quả chẳng hạn như GPT-2 mà bạn có thể tùy chỉnh thêm theo nhu cầu của mình.
 
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Hoặc tạo hàm mất tùy chỉnh của riêng bạn phù hợp với trường hợp sử dụng của bạn hoặc thêm một bước tùy chỉnh khác vào vòng lặp huấn luyện.
-
-</Tip>
-
-<Tip>
-
-✏️ **Thử nghiệm thôi!** Khi chạy các thử nghiệm huấn luyện dài, bạn nên ghi lại các chỉ số quan trọng bằng cách sử dụng các công cụ như TensorBoard hoặc Weights & Biases. Thêm ghi nhật ký thích hợp vào vòng lặp huấn luyện để bạn luôn có thể kiểm tra quá trình huấn luyện diễn ra như thế nào.
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Hoặc tạo hàm mất mát tùy chỉnh của riêng bạn phù hợp với trường hợp sử dụng của bạn hoặc thêm một bước tùy chỉnh khác vào vòng lặp huấn luyện.
 
-</Tip>
+> [!TIP]
+> ✏️ **Thử nghiệm thôi!** Khi chạy các thử nghiệm huấn luyện dài, bạn nên ghi lại các chỉ số quan trọng bằng cách sử dụng các công cụ như TensorBoard hoặc Weights & Biases. Thêm ghi nhật ký thích hợp vào vòng lặp huấn luyện để bạn luôn có thể kiểm tra quá trình huấn luyện diễn ra như thế nào.
 
 {/if}
diff --git a/chapters/vi/chapter7/7.mdx b/chapters/vi/chapter7/7.mdx
index 516daa7a9..7766d7009 100644
--- a/chapters/vi/chapter7/7.mdx
+++ b/chapters/vi/chapter7/7.mdx
@@ -32,11 +32,8 @@ Chúng ta sẽ tinh chỉnh mô hình BERT trên [bộ dữ liệu SQuAD](https:
 
 Đây thực sự cách mô hình đã được huấn luyện và tải lên Hub bằng cách sử dụng mã được hiển thị trong phần này. Bạn có thể tìm thấy nó và kiểm tra các dự đoạn [tại đây](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F).
 
-<Tip>
-
-💡 Các mô hình mã hóa như BERT có xu hướng tuyệt vời trong việc trích xuất câu trả lời cho các câu hỏi dạng thực tế như "Ai đã phát minh ra kiến trúc Transformer?" nhưng khá kém khi trả lời những câu hỏi mở như "Tại sao bầu trời lại có màu xanh?" Trong những trường hợp khó khăn hơn này, các mô hình mã hóa-giải mã như T5 và BART thường được sử dụng để tổng hợp thông tin theo cách khá giống với [tóm tắt văn bản](/course/chapter7/5). Nếu bạn quan tâm đến kiểu trả lời câu hỏi *chung chung* này, chúng tôi khuyên bạn nên xem [demo](https://yjernite.github.io/lfqa.html) của chúng tôi dựa trên [bộ dữ liệu ELI5](https://huggingface.co/datasets/eli5).
-
-</Tip>
+> [!TIP]
+> 💡 Các mô hình mã hóa như BERT có xu hướng tuyệt vời trong việc trích xuất câu trả lời cho các câu hỏi dạng thực tế như "Ai đã phát minh ra kiến trúc Transformer?" nhưng khá kém khi trả lời những câu hỏi mở như "Tại sao bầu trời lại có màu xanh?" Trong những trường hợp khó khăn hơn này, các mô hình mã hóa-giải mã như T5 và BART thường được sử dụng để tổng hợp thông tin theo cách khá giống với [tóm tắt văn bản](/course/chapter7/5). Nếu bạn quan tâm đến kiểu trả lời câu hỏi *chung chung* này, chúng tôi khuyên bạn nên xem [demo](https://yjernite.github.io/lfqa.html) của chúng tôi dựa trên [bộ dữ liệu ELI5](https://huggingface.co/datasets/eli5).
 
 ## Chuẩn bị dữ liệu
 
@@ -359,11 +356,8 @@ Bây giờ chúng ta đã thấy từng bước cách tiền xử lý dữ liệ
 
 Thật vậy, chúng ta không thấy câu trả lời bên trong ngữ cảnh.
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Khi sử dụng kiến trúc XLNet, phần đệm được áp dụng ở bên trái và câu hỏi và ngữ cảnh được chuyển đổi. Điều chỉnh tất cả mã chúng ta vừa thấy với kiến trúc XLNet (và thêm `padding=True`). Lưu ý rằng token `[CLS]` có thể không ở vị trí 0 khi áp dụng phần đệm.
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Khi sử dụng kiến trúc XLNet, phần đệm được áp dụng ở bên trái và câu hỏi và ngữ cảnh được hoán đổi vị trí. Điều chỉnh tất cả mã chúng ta vừa thấy với kiến trúc XLNet (và thêm `padding=True`). Lưu ý rằng token `[CLS]` có thể không ở vị trí 0 khi áp dụng phần đệm.
 
 Bây giờ chúng ta đã thấy từng bước cách tiền xử lý dữ liệu huấn luyện của mình, chúng ta có thể nhóm nó trong một hàm mà chúng ta sẽ áp dụng trên toàn bộ tập dữ liệu huấn luyện. Chúng ta sẽ đệm mọi đặc trưng đến độ dài tối đa mà chúng ta đã đặt, vì hầu hết các ngữ cảnh sẽ dài (và các mẫu tương ứng sẽ được chia thành nhiều đặc trưng), vì vậy không có lợi ích thực sự nào khi áp dụng đệm động ở đây:
 
@@ -916,11 +910,8 @@ Mặc định, kho lưu trữ được sử dụng sẽ nằm trong không gian
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 Nếu thư mục đầu ra bạn đang sử dụng tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến (vì vậy hãy đặt tên mới nếu bạn gặp lỗi khi xác định `Trainer` của mình).
-
-</Tip>
+> [!TIP]
+> 💡 Nếu thư mục đầu ra bạn đang sử dụng tồn tại, nó cần phải là bản sao cục bộ của kho lưu trữ mà bạn muốn đẩy đến (vì vậy hãy đặt tên mới nếu bạn gặp lỗi khi xác định `Trainer` của mình).
 
 Cuối cùng, ta chỉ cần truyền mọi thứ vào lớp `Trainer` và khởi động việc huấn luyện:
 
@@ -1004,11 +995,8 @@ trainer.push_to_hub(commit_message="Training complete")
 
 Ở giai đoạn này, bạn có thể sử dụng tiện ích luận suy trên Model Hub để kiểm tra mô hình và chia sẻ mô hình đó với bạn bè, gia đình và vật nuôi yêu thích của bạn. Bạn đã tinh chỉnh thành công một mô hình trong tác vụ hỏi đáp - xin chúc mừng!
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Hãy thử một kiến trúc mô hình khác để xem liệu nó có hoạt động tốt hơn trong tác vụ này không!
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Hãy thử một kiến trúc mô hình khác để xem liệu nó có hoạt động tốt hơn trong tác vụ này không!
 
 {#if fw === 'pt'}
 
diff --git a/chapters/vi/chapter8/2.mdx b/chapters/vi/chapter8/2.mdx
index 50d862204..d64315e3d 100644
--- a/chapters/vi/chapter8/2.mdx
+++ b/chapters/vi/chapter8/2.mdx
@@ -85,11 +85,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 Có rất nhiều thông tin có trong các báo cáo này, vì vậy chúng ta hãy cùng nhau xem qua các phần chính. Điều đầu tiên cần lưu ý là theo dõi phải được đọc _từ dưới lên trên_. Điều này nghe có vẻ kỳ lạ nếu bạn đã quen đọc văn bản tiếng Anh từ trên xuống dưới, nhưng nó phản ánh thực tế là bản truy xuất hiển thị chuỗi các lệnh gọi hàm mà `pipeline` thực hiện khi tải xuống mô hình và trình tokenizer. (Xem [Chương 2](/course/chapter2) để biết thêm chi tiết về cách hoạt động của `pipeline`.)
 
-<Tip>
-
-🚨 Bạn có thấy hộp màu xanh lam xung quanh "6 frames" trong phần truy xuất từ Google Colab không? Đó là một tính năng đặc biệt của Colab, nén phần truy xuất vào các "frames". Nếu bạn dường như không thể tìm ra nguồn gốc của lỗi, hãy đảm bảo rằng bạn mở rộng toàn bộ theo dõi bằng cách nhấp vào hai mũi tên nhỏ đó.
-
-</Tip>
+> [!TIP]
+> 🚨 Bạn có thấy hộp màu xanh lam xung quanh "6 frames" trong phần truy xuất từ Google Colab không? Đó là một tính năng đặc biệt của Colab, nén phần truy xuất vào các "frames". Nếu bạn dường như không thể tìm ra nguồn gốc của lỗi, hãy đảm bảo rằng bạn mở rộng toàn bộ theo dõi bằng cách nhấp vào hai mũi tên nhỏ đó.
 
 Điều này có nghĩa là dòng cuối cùng của truy xuất cho biết thông báo lỗi cuối cùng và cung cấp tên của ngoại lệ đã được nêu ra. Trong trường hợp này, loại ngoại lệ là `OSError`, cho biết lỗi liên quan đến hệ thống. Nếu chúng ta đọc thông báo lỗi kèm theo, chúng ta có thể thấy rằng dường như có sự cố với tệp *config.json* của mô hình và ta sẽ đưa ra hai đề xuất để khắc phục:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 Nếu bạn gặp phải thông báo lỗi khó hiểu, chỉ cần sao chép và dán thông báo đó vào thanh tìm kiếm Google hoặc [Stack Overflow](https://stackoverflow.com/) (vâng, thực sự!). Có nhiều khả năng bạn không phải là người đầu tiên gặp phải lỗi và đây là một cách tốt để tìm giải pháp mà những người khác trong cộng đồng đã đăng. Ví dụ: tìm kiếm `OSError: Can't load config for` trên Stack Overflow mang lại nhiều [lần truy cập](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) có thể được sử dụng như một điểm khởi đầu để giải quyết vấn đề.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu bạn gặp phải thông báo lỗi khó hiểu, chỉ cần sao chép và dán thông báo đó vào thanh tìm kiếm Google hoặc [Stack Overflow](https://stackoverflow.com/) (vâng, thực sự!). Có nhiều khả năng bạn không phải là người đầu tiên gặp phải lỗi và đây là một cách tốt để tìm giải pháp mà những người khác trong cộng đồng đã đăng. Ví dụ: tìm kiếm `OSError: Can't load config for` trên Stack Overflow mang lại nhiều [lần truy cập](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) có thể được sử dụng như một điểm khởi đầu để giải quyết vấn đề.
 
 Đề xuất đầu tiên là yêu cầu ta kiểm tra xem ID mô hình có thực sự chính xác hay không, vì vậy, việc đầu tiên ta làm là sao chép chỉ số nhận dạng và dán nó vào thanh tìm kiếm của Hub:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 Cách tiếp cận mà chúng tôi đang thực hiện ở đây không phải là hoàn hảo, vì đồng nghiệp của chúng ta có thể đã chỉnh sửa cấu hình của `distilbert-base-uncased` trước khi tinh chỉnh mô hình. Trong thực tế, chúng ta muốn kiểm tra với họ trước, nhưng với mục đích của phần này, chúng ta sẽ giả định rằng họ đã sử dụng cấu hình mặc định.
-
-</Tip>
+> [!WARNING]
+> 🚨 Cách tiếp cận mà chúng tôi đang thực hiện ở đây không phải là hoàn hảo, vì đồng nghiệp của chúng ta có thể đã chỉnh sửa cấu hình của `distilbert-base-uncased` trước khi tinh chỉnh mô hình. Trong thực tế, chúng ta muốn kiểm tra với họ trước, nhưng với mục đích của phần này, chúng ta sẽ giả định rằng họ đã sử dụng cấu hình mặc định.
 
 Sau đó, chúng ta có thể đẩy nó vào kho lưu trữ mô hình của mình bằng hàm `push_to_hub()` của cấu hình:
 
diff --git a/chapters/vi/chapter8/4.mdx b/chapters/vi/chapter8/4.mdx
index fd7a55996..24fd86e4e 100644
--- a/chapters/vi/chapter8/4.mdx
+++ b/chapters/vi/chapter8/4.mdx
@@ -245,11 +245,8 @@ Vì vậy, `1` có nghĩa là `neutral`, có nghĩa là hai câu chúng ta đã
 
 Chúng ta không có token ID ở đây, vì DistilBERT không mong đợi chúng; nếu bạn có một số trong mô hình của mình, bạn cũng nên đảm bảo rằng chúng khớp đúng với vị trí của câu đầu tiên và câu thứ hai trong đầu vào.
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Kiểm tra xem mọi thứ có chính xác không với phần tử thứ hai của tập dữ liệu huấn luyện.
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Kiểm tra xem mọi thứ có chính xác không với phần tử thứ hai của tập dữ liệu huấn luyện.
 
 Chúng ta chỉ thực hiện kiểm tra tập huấn luyện ở đây, nhưng tất nhiên bạn nên kiểm tra kỹ các tập kiểm định và kiểm tra theo cùng một cách.
 
@@ -522,11 +519,8 @@ Bất cứ khi nào bạn nhận được thông báo lỗi bắt đầu bằng
 
 Để giải quyết vấn đề này, bạn chỉ cần sử dụng ít dung lượng GPU hơn - điều mà nói thì dễ hơn làm. Trước tiên, hãy đảm bảo rằng bạn không có hai mô hình GPU trên cùng một lúc (tất nhiên là trừ khi đó là yêu cầu cho vấn đề của bạn). Sau đó, bạn có thể nên giảm kích thước lô của mình, vì nó ảnh hưởng trực tiếp đến kích thước của tất cả các đầu ra trung gian của mô hình và độ dốc của chúng. Nếu sự cố vẫn tiếp diễn, hãy xem xét sử dụng phiên bản mô hình nhỏ hơn của bạn.
 
-<Tip>
-
-Trong phần tiếp theo của khóa học, chúng ta sẽ xem xét các kỹ thuật nâng cao hơn có thể giúp bạn giảm dung lượng bộ nhớ và cho phép bạn tinh chỉnh các mô hình lớn nhất.
-
-</Tip>
+> [!TIP]
+> Trong phần tiếp theo của khóa học, chúng ta sẽ xem xét các kỹ thuật nâng cao hơn có thể giúp bạn giảm dung lượng bộ nhớ và cho phép bạn tinh chỉnh các mô hình lớn nhất.
 
 ### Đánh giá mô hình
 
@@ -553,11 +547,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 Bạn phải luôn đảm bảo rằng mình có thể chạy `trainr.evaluate()` trước khi khởi chạy `trainer.train()`, để tránh lãng phí nhiều tài nguyên máy tính trước khi gặp lỗi.
-
-</Tip>
+> [!TIP]
+> 💡 Bạn phải luôn đảm bảo rằng mình có thể chạy `trainer.evaluate()` trước khi khởi chạy `trainer.train()`, để tránh lãng phí nhiều tài nguyên máy tính trước khi gặp lỗi.
 
 Trước khi cố gắng gỡ lỗi một vấn đề trong vòng kiểm định, trước tiên bạn nên đảm bảo rằng bạn đã xem xét dữ liệu, có thể tạo một lô đúng cách và có thể chạy mô hình của bạn trên đó. Chúng ta đã hoàn thành tất cả các bước đó, vì vậy mã sau có thể được thực thi mà không có lỗi:
 
@@ -687,11 +678,8 @@ trainer.train()
 
 Trong trường hợp này, không còn vấn đề gì nữa và tập lệnh của chúng ta sẽ tinh chỉnh một mô hình sẽ cho kết quả hợp lý. Nhưng chúng ta có thể làm gì khi quá trình huấn luyện diễn ra mà không có bất kỳ lỗi nào, và mô hình được huấn luyện không hoạt động tốt chút nào? Đó là phần khó nhất của học máy và chúng ta sẽ chỉ cho bạn một vài kỹ thuật có thể hữu ích.
 
-<Tip>
-
-💡 Nếu bạn đang sử dụng vòng lặp huấn luyện thủ công, các bước tương tự sẽ áp dụng để gỡ lỗi quy trình huấn luyện của bạn, nhưng việc tách chúng ra sẽ dễ dàng hơn. Tuy nhiên, hãy đảm bảo rằng bạn không quên `model.eval()` hoặc `model.train()` ở đúng nơi, hoặc `zero_grad()` ở mỗi bước!
-
-</Tip>
+> [!TIP]
+> 💡 Nếu bạn đang sử dụng vòng lặp huấn luyện thủ công, các bước tương tự sẽ áp dụng để gỡ lỗi quy trình huấn luyện của bạn, nhưng việc tách chúng ra sẽ dễ dàng hơn. Tuy nhiên, hãy đảm bảo rằng bạn không quên `model.eval()` hoặc `model.train()` ở đúng nơi, hoặc `zero_grad()` ở mỗi bước!
 
 ## Debugging silent errors during training
 
@@ -706,11 +694,8 @@ Mô hình của bạn sẽ chỉ học được điều gì đó nếu nó thự
 - Có một nhãn nào phổ biến hơn những nhãn khác không?
 - Mất mát/Chỉ số sẽ là bao nhiêu nếu mô hình dự đoán một câu trả lời ngẫu nhiên/luôn là một câu trả lời giống nhau?
 
-<Tip warning={true}>
-
-⚠️ Nếu bạn đang thực hiện huấn luyện phân tán, hãy in các mẫu tập dữ liệu của bạn trong mỗi quy trình và kiểm tra ba lần để đảm bảo bạn nhận được điều tương tự. Một lỗi phổ biến là có một số nguồn ngẫu nhiên trong quá trình tạo dữ liệu khiến mỗi quy trình có một phiên bản khác nhau của tập dữ liệu.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Nếu bạn đang thực hiện huấn luyện phân tán, hãy in các mẫu tập dữ liệu của bạn trong mỗi quy trình và kiểm tra ba lần để đảm bảo bạn nhận được điều tương tự. Một lỗi phổ biến là có một số nguồn ngẫu nhiên trong quá trình tạo dữ liệu khiến mỗi quy trình có một phiên bản khác nhau của tập dữ liệu.
 
 Sau khi xem xét dữ liệu của bạn, hãy xem qua một số dự đoán của mô hình và giải mã chúng. Nếu mô hình luôn dự đoán cùng một điều, có thể là do tập dữ liệu của bạn thiên về một loại (đối với các vấn đề phân loại); các kỹ thuật như lấy mẫu quá mức các lớp hiếm có thể hữu ích.
 
@@ -739,11 +724,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 Nếu dữ liệu huấn luyện của bạn không cân bằng, hãy đảm bảo tạo một loạt dữ liệu huấn luyện có chứa tất cả các nhãn.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu dữ liệu huấn luyện của bạn không cân bằng, hãy đảm bảo tạo một lô dữ liệu huấn luyện có chứa tất cả các nhãn.
 
 Mô hình phải có kết quả trả về gần như hoàn hảo trên cùng một `lô`. Hãy tính toán các chỉ số trên các dự đoán kết quả:
 
@@ -764,11 +746,8 @@ Chính xác 100%, đây là một ví dụ điển hình về việc overfitt(c
 
 Nếu bạn không quản lý để mô hình của mình có được kết quả hoàn hảo như thế này, điều đó có nghĩa là có điều gì đó không ổn trong cách bạn định khung vấn đề hoặc dữ liệu của mình, vì vậy bạn nên khắc phục điều đó. Chỉ khi bạn vượt qua được bài kiểm tra overfit, bạn mới có thể chắc chắn rằng mô hình của mình thực sự có thể học được điều gì đó.
 
-<Tip warning={true}>
-
-⚠️ Bạn sẽ phải tạo lại mô hình và `Trainer`của mình sau bài kiểm tra overfitt này, vì mô hình thu được có thể sẽ không thể khôi phục và học được điều gì đó hữu ích trên tập dữ liệu đầy đủ của bạn.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Bạn sẽ phải tạo lại mô hình và `Trainer` của mình sau bài kiểm tra overfit này, vì mô hình thu được có thể sẽ không thể khôi phục và học được điều gì đó hữu ích trên tập dữ liệu đầy đủ của bạn.
 
 ### Không điều chỉnh bất cứ thứ gì cho đến khi bạn có mô hình cơ sở đầu tiên
 
diff --git a/chapters/vi/chapter8/4_tf.mdx b/chapters/vi/chapter8/4_tf.mdx
index 0061a3d5c..c7b6b1e5d 100644
--- a/chapters/vi/chapter8/4_tf.mdx
+++ b/chapters/vi/chapter8/4_tf.mdx
@@ -109,15 +109,12 @@ model.compile(optimizer="adam")
 
 Bây giờ chúng ta sẽ sử dụng mất mát bên trong của mô hình và vấn đề này sẽ được giải quyết!
 
-<Tip>
-
-✏️ **Đến lượt bạn!** Là một thử thách không bắt buộc sau khi chúng ta đã giải quyết xong các vấn đề khác, bạn có thể thử quay lại bước này và làm cho mô hình hoạt động với mất mát do Keras tính toán ban đầu thay vì mất mát nội bộ. Bạn sẽ cần phải thêm `"labels"` vào `label_cols` của `to_tf_dataset()` để đảm bảo rằng các nhãn được xuất chính xác, điều này sẽ giúp bạn có được độ dốc - nhưng có một vấn đề nữa với sự mất mát mà chúng ta đã chỉ định. Việc huấnl uyện vẫn sẽ diễn ra với vấn đề này, nhưng việc học sẽ rất chậm và sẽ khả năng mất mát huấn luyện cao. Bạn có thể tìm ra nó là gì không?
-
-Một gợi ý mã hoá ROT13, nếu bạn bế tắc: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
-
-Và một gợi ý thứ hai: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
-
-</Tip>
+> [!TIP]
+> ✏️ **Đến lượt bạn!** Là một thử thách không bắt buộc sau khi chúng ta đã giải quyết xong các vấn đề khác, bạn có thể thử quay lại bước này và làm cho mô hình hoạt động với mất mát do Keras tính toán ban đầu thay vì mất mát nội bộ. Bạn sẽ cần phải thêm `"labels"` vào `label_cols` của `to_tf_dataset()` để đảm bảo rằng các nhãn được xuất chính xác, điều này sẽ giúp bạn có được độ dốc - nhưng có một vấn đề nữa với sự mất mát mà chúng ta đã chỉ định. Việc huấn luyện vẫn sẽ diễn ra với vấn đề này, nhưng việc học sẽ rất chậm và mất mát huấn luyện có khả năng sẽ ở mức cao. Bạn có thể tìm ra nó là gì không?
+>
+> Một gợi ý mã hoá ROT13, nếu bạn bế tắc: Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
+>
+> Và một gợi ý thứ hai: Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
 
 Bây giờ, chúng ta hãy thử huấn luyện. Bây giờ chúng ta sẽ nhận được gradient, vì vậy hy vọng (nhạc đáng ngại phát ở đây) chúng ta có thể gọi `model.fit()` và mọi thứ sẽ hoạt động tốt!
 
@@ -360,11 +357,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡 Bạn cũng có thể nhập hàm `create_optimizer()` từ 🤗 Transformers, hàm này sẽ cho bạn một trình tối ưu AdamW với với độ phân rã trọng số chính xác cũng như khởi động và phân rã tốc độ học. Trình này thường sẽ tạo ra kết quả tốt hơn một chút so với kết quả bạn nhận được với trình tối ưu hóa Adam mặc định.
-
-</Tip>
+> [!TIP]
+> 💡 Bạn cũng có thể nhập hàm `create_optimizer()` từ 🤗 Transformers, hàm này sẽ cho bạn một trình tối ưu AdamW với độ phân rã trọng số chính xác cũng như khởi động và phân rã tốc độ học. Trình này thường sẽ tạo ra kết quả tốt hơn một chút so với kết quả bạn nhận được với trình tối ưu hóa Adam mặc định.
 
 Bây giờ, chúng tôi có thể thử điều chỉnh mô hình với tốc độ học mới, được cải thiện:
 
@@ -386,11 +380,8 @@ Chúng ta đã đề cập đến các vấn đề trong tập lệnh ở trên,
 
 Dấu hiệu cho biết sắp hết bộ nhớ là một lỗi như "OOM when allocating tensor"  - OOM là viết tắt của "hết bộ nhớ." Đây là một nguy cơ rất phổ biến khi xử lý các mô hình ngôn ngữ lớn. Nếu bạn gặp phải điều này, một chiến lược tốt là giảm một nửa kích thước lô của bạn và thử lại. Tuy nhiên, hãy nhớ rằng một số mô hình có kích thước *rất* lớn. Ví dụ: GPT-2 kích thước đầy đủ có thông số 1.5B, có nghĩa là bạn sẽ cần 6GB bộ nhớ chỉ để lưu mô hình và 6GB khác cho độ dốc của nó! Huấn luyện mô hình GPT-2 đầy đủ thường sẽ yêu cầu hơn 20GB VRAM bất kể bạn sử dụng kích thước lô nào, điều mà chỉ một số GPU có. Các mô hình nhẹ hơn như `distilbert-base-cased`  dễ chạy hơn nhiều và huấn luyện cũng nhanh hơn nhiều.
 
-<Tip>
-
-Trong phần tiếp theo của khóa học, chúng ta sẽ xem xét các kỹ thuật nâng cao hơn có thể giúp bạn giảm dung lượng bộ nhớ và cho phép bạn tinh chỉnh các mô hình lớn nhất.
-
-</Tip>
+> [!TIP]
+> Trong phần tiếp theo của khóa học, chúng ta sẽ xem xét các kỹ thuật nâng cao hơn có thể giúp bạn giảm dung lượng bộ nhớ và cho phép bạn tinh chỉnh các mô hình lớn nhất.
 
 ### TensorFlow đói rồi đói rồi🦛
 
@@ -445,21 +436,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 Nếu dữ liệu huấn luyện của bạn không cân bằng, hãy đảm bảo tạo một loạt dữ liệu huấn luyện có chứa tất cả các nhãn.
-
-</Tip>
+> [!TIP]
+> 💡 Nếu dữ liệu huấn luyện của bạn không cân bằng, hãy đảm bảo tạo một loạt dữ liệu huấn luyện có chứa tất cả các nhãn.
 
 Mô hình phải có kết quả gần như hoàn hảo trên `batch`, với mức mất mát giảm nhanh về 0 (hoặc giá trị tối thiểu cho khoản mất mát bạn đang sử dụng).
 
 Nếu bạn không quản lý để mô hình của mình có được kết quả hoàn hảo như thế này, điều đó có nghĩa là có điều gì đó không ổn trong cách bạn định khung vấn đề hoặc dữ liệu của mình, vì vậy bạn nên khắc phục điều đó. Chỉ khi bạn vượt qua được bài kiểm tra overfit, bạn mới có thể chắc chắn rằng mô hình của mình thực sự có thể học được điều gì đó.
 
-<Tip warning={true}>
-
-⚠️ Bạn sẽ phải tạo lại mô hình của mình và biên dịch lại sau bài kiểm tra overfitt này, vì mô hình thu được có thể sẽ không thể khôi phục và học được điều gì đó hữu ích trên tập dữ liệu đầy đủ của bạn.
-
-</Tip>
+> [!WARNING]
+> ⚠️ Bạn sẽ phải tạo lại mô hình của mình và biên dịch lại sau bài kiểm tra overfit này, vì mô hình thu được có thể sẽ không thể khôi phục và học được điều gì đó hữu ích trên tập dữ liệu đầy đủ của bạn.
 
 ### Không điều chỉnh bất cứ thứ gì cho đến khi bạn có mô hình cơ sở đầu tiên
 
diff --git a/chapters/vi/chapter8/5.mdx b/chapters/vi/chapter8/5.mdx
index 742660127..5e1acf651 100644
--- a/chapters/vi/chapter8/5.mdx
+++ b/chapters/vi/chapter8/5.mdx
@@ -17,11 +17,8 @@ Khi bạn chắc chắn rằng bạn có một lỗi trong tay, bước đầu t
 
 Điều rất quan trọng là phải cô lập đoạn mã tạo ra lỗi, vì không có ai trong nhóm Hugging Face là ảo thuật gia và họ không thể sửa những gì họ không thể nhìn thấy. Một ví dụ có thể tái tạo tối thiểu, như tên đã chỉ ra, phải có thể tái tạo được. Điều này có nghĩa là nó không nên dựa vào bất kỳ tệp hoặc dữ liệu bên ngoài nào mà bạn có thể có. Cố gắng thay thế dữ liệu bạn đang sử dụng bằng một số giá trị giả trông giống như dữ liệu thật của bạn mà vẫn tạo ra lỗi tương tự.
 
-<Tip>
-
-🚨 Nhiều vấn đề trong kho lưu trữ 🤗 Transformers chưa được giải quyết vì không thể truy cập được dữ liệu được sử dụng để tái tạo chúng.
-
-</Tip>
+> [!TIP]
+> 🚨 Nhiều vấn đề trong kho lưu trữ 🤗 Transformers chưa được giải quyết vì không thể truy cập được dữ liệu được sử dụng để tái tạo chúng.
 
 Một khi bạn có một cái gì đó độc lập, bạn có thể cố gắng giảm nó thành những dòng mã ít hơn, xây dựng cái mà chúng ta gọi là _ví dụ tối giản có thể tái tạo được_. Mặc dù điều này đòi hỏi bạn phải làm việc nhiều hơn một chút, nhưng bạn gần như sẽ được đảm bảo nhận được trợ giúp và bản sửa lỗi nếu bạn cung cấp một trình tạo lỗi ngắn gọn, đẹp mắt.
 
diff --git a/chapters/vi/chapter9/1.mdx b/chapters/vi/chapter9/1.mdx
index 4bc7ab1a8..83ff2f043 100644
--- a/chapters/vi/chapter9/1.mdx
+++ b/chapters/vi/chapter9/1.mdx
@@ -27,10 +27,8 @@ Dưới đây là một số ví dụ về demo học máy được xây dựng
 
 Chương này được chia thành các phần bao gồm cả _khái niệm_ và _ứng dụng_. Sau khi bạn tìm hiểu khái niệm trong mỗi phần, bạn sẽ áp dụng nó để xây dựng một loại bản demo cụ thể, từ phân loại hình ảnh đến nhận dạng giọng nói. Vào thời điểm bạn hoàn thành chương này, bạn sẽ có thể xây dựng các bản demo này (và nhiều hơn nữa!) Chỉ trong một vài dòng mã Python.
 
-<Tip>
-👀 Hãy ngó thử <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> để xem nhiều ví dụ gần đây về các bản demo học máy do cộng đồng học máy xây dựng!
-
-</Tip>
+> [!TIP]
+> 👀 Hãy ngó thử <a href="https://huggingface.co/spaces" target="_blank">Hugging Face Spaces</a> để xem nhiều ví dụ gần đây về các bản demo học máy do cộng đồng học máy xây dựng!
 
 ## Bữa tiệc Gradio
 
diff --git a/chapters/vi/chapter9/7.mdx b/chapters/vi/chapter9/7.mdx
index 91ab3a275..8f7a10d93 100644
--- a/chapters/vi/chapter9/7.mdx
+++ b/chapters/vi/chapter9/7.mdx
@@ -61,9 +61,8 @@ Ví dụ đơn giản ở trên giới thiệu 4 khái niệm làm nền tảng
 
 1. Blocks cho phép bạn xây dựng các ứng dụng web kết hợp markdown, HTML, các nút và các thành phần tương tác đơn giản bằng cách khởi tạo các đối tượng bằng Python bên trong ngữ cảnh `with gradio.Blocks`.
 
-<Tip>
-🙋Nếu bạn không quen với câu lệnh `with` trong Python, chúng tôi khuyên bạn nên xem [hướng dẫn](https://realpython.com/python-with-statement/)  tuyệt vời từ Real Python. Quay lại đây sau khi đọc xong 🤗
-</Tip>
+> [!TIP]
+> 🙋Nếu bạn không quen với câu lệnh `with` trong Python, chúng tôi khuyên bạn nên xem [hướng dẫn](https://realpython.com/python-with-statement/)  tuyệt vời từ Real Python. Quay lại đây sau khi đọc xong 🤗
 
 Thứ tự mà bạn khởi tạo các thành phần quan trọng khi mỗi phần tử được hiển thị vào ứng dụng web theo thứ tự nó được tạo. (Các bố cục phức tạp hơn được thảo luận bên dưới)
 
diff --git a/chapters/zh-CN/chapter1/3.mdx b/chapters/zh-CN/chapter1/3.mdx
index 7563b179e..032430bfa 100644
--- a/chapters/zh-CN/chapter1/3.mdx
+++ b/chapters/zh-CN/chapter1/3.mdx
@@ -10,13 +10,10 @@
 
 在本节中，我们将看看 Transformer 模型可以做什么，并使用 🤗 Transformers 库中的第一个工具： `pipeline()` 函数。
 
-<Tip>
-
-👀 看到那个右上角的在 Colab 中打开（Open in Colab）的按钮了吗？单击它就可以打开一个包含本节所有代码示例的 Google Colab Notebook 每一个有实例代码的小节都会有它。
-
-如果你想在本地运行示例，我们建议你查看<a href="/course/chapter0">第 0 章</a>。
-
-</Tip>
+> [!TIP]
+> 👀 看到那个右上角的在 Colab 中打开（Open in Colab）的按钮了吗？单击它就可以打开一个包含本节所有代码示例的 Google Colab Notebook。每一个有示例代码的小节都会有它。
+>
+> 如果你想在本地运行示例，我们建议你查看<a href="/course/chapter0">第 0 章</a>。
 
 ## Transformers 无处不在！[[Transformer 被应用于各个方面！]]
 
@@ -26,11 +23,8 @@ Transformer 模型用于解决各种 NLP 任务，就像上一节中提到的那
 
 [🤗 Transformers 库](https://github.com/huggingface/transformers) 提供了创建和使用这些共享模型的功能。 [Hugging Face 模型中心（以下简称“Hub”）](https://huggingface.co/models) 包含数千个任何人都可以下载和使用的预训练模型。你还可以将自己的模型上传到 Hub！
 
-<Tip>
-
-⚠️ Hugging Face Hub 不限于 Transformer 模型。任何人都可以分享他们想要的任何类型的模型或数据集！创建一个 [Huggingface.co 帐户](https://huggingface.co/join) 即可使用所有可用功能！
-
-</Tip>
+> [!TIP]
+> ⚠️ Hugging Face Hub 不限于 Transformer 模型。任何人都可以分享他们想要的任何类型的模型或数据集！创建一个 [Huggingface.co 帐户](https://huggingface.co/join) 即可使用所有可用功能！
 
 在深入研究 Transformer 模型的底层工作原理之前，让我们先看几个示例，看看它们如何用于解决一些有趣的 NLP 问题。
 
@@ -104,11 +98,8 @@ classifier(
 
 这个 pipeline 称为 `zero-shot` （零样本学习），因为你不需要对数据上的模型进行微调即可使用它。它可以直接返回你想要的任何标签列表的概率分数！
 
-<Tip>
-
-✏️**快来试试吧！**使用你自己的序列和标签，看看模型的表现如何。
-
-</Tip>
+> [!TIP]
+> ✏️**快来试试吧！**使用你自己的序列和标签，看看模型的表现如何。
 
 ## 文本生成 [[文本生成]]
 
@@ -129,11 +120,8 @@ generator("In this course, we will teach you how to")
 ```
 你可以使用参数 `num_return_sequences` 控制生成多少个不同的候选的句子，并使用参数 `max_length` 控制输出文本的总长度。
 
-<Tip>
-
-✏️**快来试试吧！**使用 `num_return_sequences` 和 `max_length` 参数生成两个句子，每个句子 15 个单词。
-
-</Tip>
+> [!TIP]
+> ✏️**快来试试吧！**使用 `num_return_sequences` 和 `max_length` 参数生成两个句子，每个句子 15 个单词。
 
 ## 在 pipeline 中使用 Hub 中的其他模型 [[在 pipeline 中使用 Hub 中的其他模型]]
 
@@ -163,11 +151,8 @@ generator(
 
 通过单击选择模型后，你会看到有一个小组件，可让你直接在线试用。通过这种方式，你可以在下载之前快速测试模型的功能。
 
-<Tip>
-
-✏️**快来试试吧！**使用标签筛选查找另一种语言的文本生成模型。使用小组件测试并在 pipeline 中使用它！
-
-</Tip>
+> [!TIP]
+> ✏️**快来试试吧！**使用标签筛选查找另一种语言的文本生成模型。使用小组件测试并在 pipeline 中使用它！
 
 ## 推理 API [[推理 API]]
 
@@ -198,11 +183,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 
  `top_k` 参数控制要显示的结果有多少种。请注意，这里模型填补了特殊的 `<mask>` 词，它通常被称为 `mask token` 。不同的 `mask-filling` 模型可能有不同的 `mask token` ，因此在探索其他模型时要验证正确的 `mask token` 是什么。检查它的一种方法是查看小组件中使用的 `mask token` 。
 
-<Tip>
-
-✏️**快来试试吧！**在 Hub 上搜索 `bert-base-cased` 模型并在推理 API 小组件中找到它的 mask token。对于上面 pipeline 示例中的句子，这个模型预测了什么？
-
-</Tip>
+> [!TIP]
+> ✏️**快来试试吧！**在 Hub 上搜索 `bert-base-cased` 模型并在推理 API 小组件中找到它的 mask token。对于上面 pipeline 示例中的句子，这个模型预测了什么？
 
 ## 命名实体识别 [[命名实体识别]]
 
@@ -225,11 +207,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 我们在创建 pipeline 的函数中传递的 `grouped_entities=True` 参数告诉 pipeline 将与同一实体对应的句子部分重新分组：这里模型正确地将“Hugging”和“Face”分组为一个组织，即使名称由多个词组成。事实上，正如我们即将在下一章看到的，预处理甚至会将一些单词分成更小的部分。例如， `Sylvain` 分割为了四部分： `S、##yl、##va` 和 `##in` 。在后处理步骤中，pipeline 成功地重新组合了这些部分。
 
-<Tip>
-
-✏️**快来试试吧！**在模型中心（hub）搜索能够用英语进行词性标注（通常缩写为 POS）的模型。对于上面示例中的句子，这个词性标注的模型预测了什么？
-
-</Tip>
+> [!TIP]
+> ✏️**快来试试吧！**在模型中心（hub）搜索能够用英语进行词性标注（通常缩写为 POS）的模型。对于上面示例中的句子，这个词性标注的模型预测了什么？
 
 ## 问答 [[问答]]
 
@@ -309,10 +288,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 与文本生成和摘要一样，你可以指定结果的 `max_length` 或 `min_length` 。
 
-<Tip>
-
-✏️**快来试试吧！**搜索其他语言的翻译模型，尝试将前面的句子翻译成几种不同的语言。
-
-</Tip>
+> [!TIP]
+> ✏️**快来试试吧！**搜索其他语言的翻译模型，尝试将前面的句子翻译成几种不同的语言。
 
 到目前为止显示的 pipeline 主要用于演示目的。它们是为特定任务而定制的，不能对他们进行自定义的修改。在下一章中，你将了解 `pipeline()` 函数内部的过程以及如何进行自定义的修改。
\ No newline at end of file
diff --git a/chapters/zh-CN/chapter1/4.mdx b/chapters/zh-CN/chapter1/4.mdx
index 3e75f3660..827e23803 100644
--- a/chapters/zh-CN/chapter1/4.mdx
+++ b/chapters/zh-CN/chapter1/4.mdx
@@ -77,7 +77,7 @@
 
 这只是显示了一支团队领导的（非常大的）模型项目，该团队试图减少预训练对环境的影响。如果为了获得最佳超参数而进行大量试验，所造成的碳排放当量会更高。
 
-想象一下，如果每次一个研究团队、一个学生组织或一家公司想要训练一个模型，都从头开始训练的。这将导致巨大的、不必要的浪费！
+想象一下，如果每次一个研究团队、一个学生组织或一家公司想要训练一个模型，都从头开始训练。这将导致巨大的、不必要的浪费！
 
 这就是为什么共享语言模型至关重要：共享经过训练的权重，当遇见新的需求时在预训练的权重之上进行微调，可以降低训练模型训练的算力和时间消耗，降低全球的总体计算成本和碳排放。
 
@@ -87,7 +87,7 @@
 
 <Youtube id="BqqfQnyjmgg" />
 
-预训练（Pretraining）是是指从头开始训练模型：随机初始化权重，在没有任何先验知识的情况下开始训练。
+预训练（Pretraining）是指从头开始训练模型：随机初始化权重，在没有任何先验知识的情况下开始训练。
 
 <div class="flex justify-center">
 <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/pretraining.svg" alt="The pretraining of a language model is costly in both time and money."/>
diff --git a/chapters/zh-CN/chapter2/1.mdx b/chapters/zh-CN/chapter2/1.mdx
index 01f4c1523..f13d84364 100644
--- a/chapters/zh-CN/chapter2/1.mdx
+++ b/chapters/zh-CN/chapter2/1.mdx
@@ -18,6 +18,5 @@
 
 然后我们来看看 `tokenizer` API，它是 `pipeline()` 函数的另一个重要组成部分。在 `pipeline()` 中 `Tokenizer` 负责第一步和最后一步的处理，将文本转换到神经网络的输入，以及在需要时将其转换回文本。最后，我们将向你展示如何处理将多个句子整理为一个 batch 发送给模型，然后我们将更深入地研究 `tokenizer()` 函数。
 
-<Tip>
-⚠️ 为了充分利用 Model Hub 和🤗 Transformers 提供的所有功能，我们建议你<a href="https://huggingface.co/join">创建一个账户</a>。
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ 为了充分利用 Model Hub 和🤗 Transformers 提供的所有功能，我们建议你<a href="https://huggingface.co/join">创建一个账户</a>。
\ No newline at end of file
diff --git a/chapters/zh-CN/chapter2/2.mdx b/chapters/zh-CN/chapter2/2.mdx
index d828642f2..65502872f 100644
--- a/chapters/zh-CN/chapter2/2.mdx
+++ b/chapters/zh-CN/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-这是第一部分，根据你使用 PyTorch 或者 TensorFlow，内容略有不同。点击标题上方的平台，选择你喜欢的平台！
-</Tip>
+> [!TIP]
+> 这是第一部分，根据你使用 PyTorch 或者 TensorFlow，内容略有不同。点击标题上方的平台，选择你喜欢的平台！
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -360,8 +359,5 @@ model.config.id2label
 
 我们已经成功地复刻了管道的三个步骤：使用 tokenizer 进行预处理、通过模型传递输入以及后处理！接下来，让我们花一些时间深入了解这些步骤中的每一步。
 
-<Tip>
-
-✏️ **试试看！** 选择两个（或更多）句子并分别在 `sentiment-analysis` 管道和自己实现的管道中运行它们。看一看是否获得的结果是不是相同的！
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 选择两个（或更多）句子并分别在 `sentiment-analysis` 管道和自己实现的管道中运行它们。看一看获得的结果是不是相同的！
diff --git a/chapters/zh-CN/chapter2/4.mdx b/chapters/zh-CN/chapter2/4.mdx
index f8f87eda5..5b57e7222 100644
--- a/chapters/zh-CN/chapter2/4.mdx
+++ b/chapters/zh-CN/chapter2/4.mdx
@@ -218,11 +218,8 @@ print(ids)
 
 这些输出，一旦转换为适当的框架张量，就可以用作模型的输入，如本章前面所示。
 
-<Tip>
-
-✏️ **试试看！** 请将我们在第 2 节中使用的输入句子（“I've been waiting for a HuggingFace course my whole life.”和“I hate this so much!”）执行最后两个步骤（分词和转换为 inputs ID）。检查你获得的 inputs ID 是否与我们在第二节中获得的一致！
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 请将我们在第 2 节中使用的输入句子（“I've been waiting for a HuggingFace course my whole life.”和“I hate this so much!”）执行最后两个步骤（分词和转换为 inputs ID）。检查你获得的 inputs ID 是否与我们在第二节中获得的一致！
 
 ## 解码 [[解码]]
 
diff --git a/chapters/zh-CN/chapter2/5.mdx b/chapters/zh-CN/chapter2/5.mdx
index 26e529030..8c16b8c78 100644
--- a/chapters/zh-CN/chapter2/5.mdx
+++ b/chapters/zh-CN/chapter2/5.mdx
@@ -184,11 +184,8 @@ batched_ids = [ids, ids]
 
 这就是一个包含两个相同句子的 batch 
 
-<Tip>
-
-✏️   **试试看！**  将这个 `batched_ids` 列表转换为张量，并通过你的模型进行处理。检查你是否得到了与之前相同的 logits 值（但是重复了两次）！
-
-</Tip>
+> [!TIP]
+> ✏️   **试试看！**  将这个 `batched_ids` 列表转换为张量，并通过你的模型进行处理。检查你是否得到了与之前相同的 logits 值（但是重复了两次）！
 
 批处理支持模型在输入多个句子时工作。使用多个句子就像使用单个句子构建批一样简单。不过，还有第二个问题。当你试图将两个（或更多）句子组合在一起时，它们的长度可能不同。如果你以前使用过张量，那么你知道它们必须是矩形，因此无法将 inputs ID 列表直接转换为张量。为了解决这个问题，我们通常填充输入（Padding）。
 
@@ -320,11 +317,8 @@ tf.Tensor(
 
 注意第二序列的最后一个值是填充 ID，其在注意力掩码中的值为 0。
 
-<Tip>
-
-✏️ **试试看！**在第二节使用的两个句子（“I've been waiting for a HuggingFace course my whole life.” 和 “I hate this so much!”）上手动进行 tokenize。将它们输入模型并检查你是否得到了与第二节相同的 logits 值。然后使用填充 token 将它们一起进行批处理，然后创建合适的注意力掩码。检查模型计算后是否得到了相同的结果！
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**在第二节使用的两个句子（“I've been waiting for a HuggingFace course my whole life.” 和 “I hate this so much!”）上手动进行 tokenize。将它们输入模型并检查你是否得到了与第二节相同的 logits 值。然后使用填充 token 将它们一起进行批处理，然后创建合适的注意力掩码。检查模型计算后是否得到了相同的结果！
 
 ## 更长的句子 [[更长的句子]]
 
diff --git a/chapters/zh-CN/chapter3/2.mdx b/chapters/zh-CN/chapter3/2.mdx
index b839864da..2ae86ef09 100644
--- a/chapters/zh-CN/chapter3/2.mdx
+++ b/chapters/zh-CN/chapter3/2.mdx
@@ -92,11 +92,8 @@ model.train_on_batch(batch, labels)
 模型中心（hub）不仅仅包含模型，还有许多别的语言的数据集。访问 [Datasets](https://huggingface.co/datasets) 的链接即可进行浏览。我们建议你在完成本节的学习后阅读一下 [加载和处理新的数据集](https://huggingface.co/docs/datasets/loading) 这篇文章，这会让你对 huggingface 的数据集理解更加清晰。现在让我们使用 MRPC 数据集中的 [GLUE 基准测试数据集](https://gluebenchmark.com) 作为我们训练所使用的数据集，它是构成 MRPC 数据集的 10 个数据集之一，作为一个用于衡量机器学习模型在 10 个不同文本分类任务中性能的学术基准。
 
 🤗 Datasets 库提供了一条非常便捷的命令，可以在模型中心（hub）上下载和缓存数据集。你可以以下代码下载 MRPC 数据集：
-<Tip>
-
-⚠️ **警告** 确保你已经运行 `pip install datasets` 安装了 `datasets`。然后，再继续下面的加载 MRPC 数据集和打印出来查看其内容。
-
-</Tip> 
+> [!TIP]
+> ⚠️ **警告** 确保你已经运行 `pip install datasets` 安装了 `datasets`。然后，再继续下面的加载 MRPC 数据集和打印出来查看其内容。
 
 ```py
 from datasets import load_dataset
@@ -161,11 +158,8 @@ raw_train_dataset.features
 
 上面的例子中的 `Label（标签）` 是一种 `ClassLabel（分类标签）` ，也就是使用整数建立起类别标签的映射关系。 `0` 对应于 `not_equivalent（非同义）` ， `1` 对应于 `equivalent（同义）` 。
 
-<Tip>
-
-✏️ **试试看！** 查看训练集的第 15 行元素和验证集的 87 行元素。他们的标签是什么？
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 查看训练集的第 15 行元素和验证集的第 87 行元素。它们的标签是什么？
 
 ## 预处理数据集 [[预处理数据集]]
 
@@ -219,11 +213,8 @@ inputs
 
 我们在 [第二章](/course/chapter2) 讨论了 `输入词id(input_ids)` 和 `注意力遮罩(attention_mask)` ，但尚未讨论 `token类型ID(token_type_ids)` 。在本例中， `token类型ID(token_type_ids)` 的作用就是告诉模型输入的哪一部分是第一句，哪一部分是第二句。
 
-<Tip>
-
-✏️ ** 试试看！** 选取训练集中的第 15 个元素，将两句话分别进行tokenization。结果和上方的例子有什么不同？
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 选取训练集中的第 15 个元素，将两句话分别进行tokenization。结果和上方的例子有什么不同？
 
 如果将 `input_ids` 中的 id 转换回文字：
 
@@ -426,11 +417,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ ** 试试看！** 在 GLUE SST-2 数据集上复刻上述预处理。它有点不同，因为它是由单句而不是成对的句子组成的，但是我们所做的其他事情看起来应该是一样的。另一个进阶的挑战是尝试编写一个可用于任何 GLUE 任务的预处理函数。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 在 GLUE SST-2 数据集上复刻上述预处理。它有点不同，因为它是由单句而不是成对的句子组成的，但是我们所做的其他事情看起来应该是一样的。另一个进阶的挑战是尝试编写一个可用于任何 GLUE 任务的预处理函数。
 
 {#if fw === 'tf'}
 
diff --git a/chapters/zh-CN/chapter3/3.mdx b/chapters/zh-CN/chapter3/3.mdx
index 80e30c20c..2cb66e663 100644
--- a/chapters/zh-CN/chapter3/3.mdx
+++ b/chapters/zh-CN/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 如果你想在训练期间自动将模型上传到 Hub，请将 `push_to_hub=True` 添加到 TrainingArguments 之中。我们将在 [第四章](/course/chapter4/3) 中详细介绍这部分。
-
-</Tip>
+> [!TIP]
+> 💡 如果你想在训练期间自动将模型上传到 Hub，请将 `push_to_hub=True` 添加到 TrainingArguments 之中。我们将在 [第四章](/course/chapter4/3) 中详细介绍这部分。
 
 第二步是定义我们的模型。与 [前一章](/course/chapter2) 一样，我们将使用 `AutoModelForSequenceClassification` 类，它有两个参数：
 
@@ -160,9 +157,6 @@ trainer.train()
 
 使用 `Trainer` API 微调的介绍到此结束。在 [第七章](/course/chapter7) 中会给出一个对大多数常见的 NLP 任务进行训练的例子，但现在让我们看看如何在 PyTorch 中做相同的操作。
 
-<Tip>
-
-✏️ **试试看！** 使用你在第 2 节中学到的数据处理过程，在 GLUE SST-2 数据集上对模型进行微调。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 使用你在第 2 节中学到的数据处理过程，在 GLUE SST-2 数据集上对模型进行微调。
 
diff --git a/chapters/zh-CN/chapter3/3_tf.mdx b/chapters/zh-CN/chapter3/3_tf.mdx
index 773224214..4c93fb6d5 100644
--- a/chapters/zh-CN/chapter3/3_tf.mdx
+++ b/chapters/zh-CN/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_lab
 
 为了在我们的数据集上微调模型，我们只需要在我们的模型上调用 `compile()` 方法，然后将我们的数据传递给 `fit()` 方法。 这将启动微调过程（在 GPU 上应该需要几分钟）并输出训练损失，以及每个 epoch 结束时的验证损失。
 
-<Tip>
-
-请注意🤗 Transformers 模型具有大多数 Keras 模型所没有的特殊能力——它们可以自动使用内部计算的损失。 如果你没有在 `compile()` 中设置损失参数，它们可以自动使用适当的损失函数，并在内部计算。 请注意，要使用内部损失，你需要将标签作为输入的一部分传入模型，而不是作为单独的标签（这是在 Keras 模型中使用标签的常规方式）。 你将在课程的第 2 部分中看到这方面的示例，正确定义损失函数可能会有些棘手。 然而对于序列分类来说，标准的 Keras 损失函数效果很好，因此我们将在这里使用它。
-
-</Tip>
+> [!TIP]
+> 请注意🤗 Transformers 模型具有大多数 Keras 模型所没有的特殊能力——它们可以自动使用内部计算的损失。 如果你没有在 `compile()` 中设置损失参数，它们可以自动使用适当的损失函数，并在内部计算。 请注意，要使用内部损失，你需要将标签作为输入的一部分传入模型，而不是作为单独的标签（这是在 Keras 模型中使用标签的常规方式）。 你将在课程的第 2 部分中看到这方面的示例，正确定义损失函数可能会有些棘手。 然而对于序列分类来说，标准的 Keras 损失函数效果很好，因此我们将在这里使用它。
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-请注意这里有一个非常常见的陷阱——你可以把损失的名称作为一个字符串传递给 Keras，但默认情况下，Keras 会假设你已经对输出进行了 softmax。 然而，许多模型在经过 softmax 函数之前输出的是被称为 `logits` 的值。 我们需要告诉损失函数，我们的模型是否已经使用 softmax 函数进行了处理，唯一的方法是传递一个损失函数并且在参数的部分告诉模型，而不是只传递一个字符串。
-
-</Tip>
+> [!WARNING]
+> 请注意这里有一个非常常见的陷阱——你可以把损失的名称作为一个字符串传递给 Keras，但默认情况下，Keras 会假设你已经对输出进行了 softmax。 然而，许多模型在经过 softmax 函数之前输出的是被称为 `logits` 的值。 我们需要告诉损失函数，我们的模型是否已经使用 softmax 函数进行了处理，唯一的方法是传递一个损失函数并且在参数的部分告诉模型，而不是只传递一个字符串。
 
 
 ## 改善训练的效果 [[改善训练的效果]]
@@ -123,11 +117,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-🤗 Transformers 库还有一个 `create_optimizer()` 函数，它将创建一个具有学习率衰减的 `AdamW` 优化器。 这是一个快捷的方式，你将在本课程的后续部分中详细了解。
-
-</Tip>
+> [!TIP]
+> 🤗 Transformers 库还有一个 `create_optimizer()` 函数，它将创建一个具有学习率衰减的 `AdamW` 优化器。 这是一个快捷的方式，你将在本课程的后续部分中详细了解。
 
 现在我们有了全新的优化器，我们可以尝试使用它进行训练。 首先，让我们重新加载模型，重新设置刚刚训练时的权重，然后我们可以使用新的优化器对其进行编译：
 
@@ -145,11 +136,8 @@ model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 如果你想在训练期间自动将模型上传到 Hub，你可以在 `model.fit()` 方法中传递一个 `PushToHubCallback`。 我们将在 [第四章](/course/chapter4/3) 中进一步了解这个问题。
-
-</Tip>
+> [!TIP]
+> 💡 如果你想在训练期间自动将模型上传到 Hub，你可以在 `model.fit()` 方法中传递一个 `PushToHubCallback`。 我们将在 [第四章](/course/chapter4/3) 中进一步了解这个问题。
 
 ## 模型预测 [[模型预测]]
 
diff --git a/chapters/zh-CN/chapter3/4.mdx b/chapters/zh-CN/chapter3/4.mdx
index 64a267dbb..8fdb7653e 100644
--- a/chapters/zh-CN/chapter3/4.mdx
+++ b/chapters/zh-CN/chapter3/4.mdx
@@ -197,11 +197,8 @@ metric.compute()
 
 同样，由于模型头部初始化和数据打乱的随机性，你的结果会略有不同，但应该相差不多。
 
-<Tip>
-
-✏️ **试试看！** 修改之前的训练循环以在 SST-2 数据集上微调你的模型。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 修改之前的训练循环以在 SST-2 数据集上微调你的模型。
 
 ## 使用🤗 Accelerate 加速你的训练循环 [[使用🤗 Accelerate加速你的训练循环]]
 
@@ -292,11 +289,8 @@ for epoch in range(num_epochs):
 
 然后大部分工作会在将数据加载器、模型和优化器发送到的 `accelerator.prepare()` 中完成。这将会把这些对象包装在适当的容器中，以确保你的分布式训练按预期工作。要进行的其余更改是删除将 `batch` 放在 `device` 的那行代码（同样，如果你想保留它，你可以将其更改为使用 `accelerator.device` ） 并将 `loss.backward()` 替换为 `accelerator.backward(loss)` 。
 
-<Tip>
-
-⚠️ 为了使云端 TPU 提供的加速中发挥最大的效益，我们建议使用 tokenizer 的 `padding=max_length` 和 `max_length` 参数将你的样本填充到固定长度。
-
-</Tip>
+> [!TIP]
+> ⚠️ 为了使云端 TPU 提供的加速发挥最大的效益，我们建议使用 tokenizer 的 `padding="max_length"` 和 `max_length` 参数将你的样本填充到固定长度。
 
 如果你想复制并粘贴来直接运行，以下是 🤗 Accelerate 的完整训练循环：
 
diff --git a/chapters/zh-CN/chapter4/2.mdx b/chapters/zh-CN/chapter4/2.mdx
index 956ecd244..07bde83fb 100644
--- a/chapters/zh-CN/chapter4/2.mdx
+++ b/chapters/zh-CN/chapter4/2.mdx
@@ -91,6 +91,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-使用预训练模型时，一定要检查它是如何训练的、在哪些数据集上训练的、它的局限性和偏见。所有这些信息都应在其模型卡片上有所展示。
-</Tip>
+> [!TIP]
+> 使用预训练模型时，一定要检查它是如何训练的、在哪些数据集上训练的、它的局限性和偏见。所有这些信息都应在其模型卡片上有所展示。
diff --git a/chapters/zh-CN/chapter4/3.mdx b/chapters/zh-CN/chapter4/3.mdx
index a10f1e13d..8fd08e4f9 100644
--- a/chapters/zh-CN/chapter4/3.mdx
+++ b/chapters/zh-CN/chapter4/3.mdx
@@ -180,11 +180,8 @@ tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token=
 </div>
 {/if}
 
-<Tip>
-
-✏️ **试试看** 获取与 `bert-base-cased` checkpoint 相关的模型和 tokenizer 并使用 `push_to_hub()` 方法将它们上传到你账户中的一个仓库。并请仔细检查该仓库在你的页面上显示是否正常。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 获取与 `bert-base-cased` checkpoint 相关的模型和 tokenizer，并使用 `push_to_hub()` 方法将它们上传到你账户中的一个仓库。请仔细检查该仓库在你的页面上显示是否正常。
 
 如你所见， `push_to_hub()` 方法接受多个参数，从而可以上传到特定的仓库或账户，除此之外还可以使用不同的 API 令牌验证身份。我们建议你直接查看 [🤗 Transformers 文档](https://huggingface.co/transformers/model_sharing) 了解更多的用法。
 
@@ -462,9 +459,8 @@ config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_mod
 
 {/if}
 
-<Tip>
-✏️ 当通过网页界面创建仓库时， `.gitattributes` 文件会自动将某些扩展名（如 `.bin` 和 `.h5` ）的文件视为大文件，你无需对 git-lfs 进行任何设置即可跟踪它们。
-</Tip> 
+> [!TIP]
+> ✏️ 当通过网页界面创建仓库时， `.gitattributes` 文件会自动将某些扩展名（如 `.bin` 和 `.h5` ）的文件视为大文件，你无需对 git-lfs 进行任何设置即可跟踪它们。
 
 我们现在可以继续上传，就像我们使用传统 Git 仓库一样。我们可以使用以下命令将所有文件添加到 Git 的暂存环境中 `git add` 命令：
 
diff --git a/chapters/zh-CN/chapter5/2.mdx b/chapters/zh-CN/chapter5/2.mdx
index 478e457b5..2e1d9d5fe 100644
--- a/chapters/zh-CN/chapter5/2.mdx
+++ b/chapters/zh-CN/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 我们可以看到压缩文件已经被替换为 `SQuAD_it-train.json` 和 `SQuAD_it-test.json` ，并且数据以 JSON 格式存储。
 
-<Tip>
-
-✏️ 如果你想知道为什么上面的 shell 命令中有一个 `!` ，那是因为我们现在是在 Jupyter notebook 中运行它们。如果你想在命令行中下载和解压缩数据集，只需删除前缀 `!` 即可。
-
-</Tip>
+> [!TIP]
+> ✏️ 如果你想知道为什么上面的 shell 命令中有一个 `!` ，那是因为我们现在是在 Jupyter notebook 中运行它们。如果你想在命令行中下载和解压缩数据集，只需删除前缀 `!` 即可。
 
 当我们使用 `load_dataset()` 函数来加载 JSON 文件时，我们需要知道我们是在处理普通的 JSON（类似于嵌套字典）还是 JSON Lines（每一行都是一个 JSON）。像许多问答数据集一样，SQuAD-it 使用的是嵌套字典，所有文本都存储在 `data` 字段中。这意味着我们可以通过使用参数 `field` 来加载数据集，如下所示：
 
@@ -126,11 +123,8 @@ DatasetDict({
 
 这正是我们想要的。现在，我们可以使用各种预处理技术来清洗数据、tokenize 评论等等。
 
-<Tip> 
-
-`load_dataset()` 函数的 `data_files` 参数非常灵活：可以是单个文件路径、文件路径列表或者是标签映射到文件路径的字典。你还可以根据 Unix shell 的规则，对符合指定模式的文件进行批量匹配（例如，你可以通过设置 `data_files="*.JSON"` 匹配目录中所有的 JSON 文件）。有关`load_dataset()`更多详细信息，请参阅 [🤗Datasets 文档](https://huggingface.co/docs/datasets/v2.12.0/en/loading#local-and-remote-files) 。
-
-</Tip>
+> [!TIP]
+> `load_dataset()` 函数的 `data_files` 参数非常灵活：可以是单个文件路径、文件路径列表或者是标签映射到文件路径的字典。你还可以根据 Unix shell 的规则，对符合指定模式的文件进行批量匹配（例如，你可以通过设置 `data_files="*.json"` 匹配目录中所有的 JSON 文件）。有关 `load_dataset()` 的更多详细信息，请参阅 [🤗Datasets 文档](https://huggingface.co/docs/datasets/v2.12.0/en/loading#local-and-remote-files) 。
 
 🤗 Datasets 实际上支持自动解压输入文件，所以我们可以跳过使用 `gzip` ，直接将 `data_files` 参数设置为压缩文件：
 
@@ -158,9 +152,6 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 这将返回和上面的本地例子相同的 `DatasetDict` 对象，但省去了我们手动下载和解压 `SQuAD_it-*.json.gz` 文件的步骤。这是我们对加载未托管在 Hugging Face Hub 的数据集的各种方法的总结。既然我们已经有了一个可以使用的数据集，让我们开始大展身手吧！
 
-<Tip>
-
-✏️ **试试看！** 选择托管在 GitHub 或 [UCI 机器学习仓库](https://archive.ics.uci.edu/ml/index.php) 上的另一个数据集并尝试使用上述技术在本地和远程加载它。另外，可以尝试加载 CSV 或者文本格式存储的数据集（有关这些格式的更多信息，请参阅 [文档](https://huggingface.co/docs/datasets/loading#local-and-remote-files) ）。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 选择托管在 GitHub 或 [UCI 机器学习仓库](https://archive.ics.uci.edu/ml/index.php) 上的另一个数据集并尝试使用上述技术在本地和远程加载它。另外，可以尝试加载 CSV 或者文本格式存储的数据集（有关这些格式的更多信息，请参阅 [文档](https://huggingface.co/docs/datasets/loading#local-and-remote-files) ）。
 
diff --git a/chapters/zh-CN/chapter5/3.mdx b/chapters/zh-CN/chapter5/3.mdx
index 19a7ddacc..5f9accb39 100644
--- a/chapters/zh-CN/chapter5/3.mdx
+++ b/chapters/zh-CN/chapter5/3.mdx
@@ -90,11 +90,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️  **试试看！** 使用 `Dataset.unique()` 函数查找训练和测试集中的特定药物和病症的数量。
-
-</Tip>
+> [!TIP]
+> ✏️  **试试看！** 使用 `Dataset.unique()` 函数查找训练和测试集中的特定药物和病症的数量。
 
 接下来，让我们使用 `Dataset.map()` 来规范所有的 `condition` 标签。正如我们在 [第三章](/course/chapter3) 中处理 tokenizer 一样，我们可以定义一个简单的函数，可以使用该函数 `drug_dataset` 处理每个分组的所有行：
 
@@ -218,11 +215,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 正如我们所猜想的那样，有些评论只包含一个词，虽然这对于情感分析任务来说还可以接受，但如果我们想要预测病情，那么它所提供的信息就不够丰富了。
 
-<Tip>
-
-🙋向数据集添加新列的另一种方法是使用函数 `Dataset.add_column()` ，在使用它时你可以通过 Python 列表或 NumPy 数组的方式提供数据，在不适合使用 `Dataset.map()` 情况下可以很方便。
-
-</Tip>
+> [!TIP]
+> 🙋向数据集添加新列的另一种方法是使用函数 `Dataset.add_column()` ，在使用它时你可以通过 Python 列表或 NumPy 数组的方式提供数据，在不适合使用 `Dataset.map()` 的情况下会很方便。
 
 让我们使用 `Dataset.filter()` 功能来删除包含少于 30 个单词的评论。这与我们过滤 `condition` 列的处理方式相似，我们可以通过设定评论长度的最小阈值，筛选出过短的评论：
 
@@ -237,11 +231,8 @@ print(drug_dataset.num_rows)
 
 如你所见，这个操作从我们的原始训练和测试集中删除了大约 15％ 的评论。
 
-<Tip>
-
-✏️ **试试看！**使用 `Dataset.sort()` 函数查看单词数最多的评论。你可以参阅 [文档](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.sort) 了解如何按照评论的长度降序排序。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**使用 `Dataset.sort()` 函数查看单词数最多的评论。你可以参阅 [文档](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.sort) 了解如何按照评论的长度降序排序。
 
 我们需要处理的最后一件事是处理评论中的 HTML 字符。我们可以使用 Python 的 `html` 模块来解码这些字符，如下所示：
 
@@ -298,11 +289,8 @@ def tokenize_function(examples):
 
 你也可以将 `%%time` 放置在单元格开头来统计整个单元格的执行时间。在我们的硬件上，该指令显示 10.8 秒（这就是真正（Wall time）的执行时间）。
 
-<Tip>
-
-✏️ **试试看！** 在有和无 `batched=True` 的情况下执行相同的指令，然后试试慢速 tokenizer （在 `AutoTokenizer.from_pretrained()` 方法中添加 `use_fast=False` ），这样你就可以测试一下在你的电脑上它需要多长的时间。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 在有和无 `batched=True` 的情况下执行相同的指令，然后试试慢速 tokenizer （在 `AutoTokenizer.from_pretrained()` 方法中添加 `use_fast=False` ），这样你就可以测试一下在你的电脑上它需要多长的时间。
 
 以下是我们在使用和不使用批处理时使用快速和慢速 tokenizer 获得的结果：
 
@@ -337,19 +325,13 @@ tokenized_dataset = drug_dataset.map(slow_tokenize_function, batched=True, num_p
 
 这个结果对于慢速分词器来说是更加友好了，但快速分词器的性能也得到了显著提升。但是请注意，情况并非总是如此—对于 `num_proc` 的其他值，在我们的测试中，使用 `batched=True` 而不带有 `num_proc` 参数的选项处理起来更快。总的来说，我们并不推荐在快速 tokenizer 和 `batched=True` 的情况下使用 Python 的多进程处理。
 
-<Tip>
-
-通常来说，使用 `num_proc` 以加快处理速度通常是一个好主意，只要你使用的函数本身没有进行某种类型的多进程处理。
-
-</Tip>
+> [!TIP]
+> 一般来说，只要你使用的函数本身没有进行某种类型的多进程处理，使用 `num_proc` 来加快处理速度通常是一个好主意。
 
 将所有这些功能浓缩到一个方法中已经非常了不起，但是还有更多！使用 `Dataset.map()` 和 `batched=True` 你可以更改数据集中的元素数量。当你想从一个样本中创建几个训练特征时，这是非常有用的。我们将在 [第七章](/course/chapter7) 中几个 NLP 任务的预处理中使用到这个功能，它非常便捷。
 
-<Tip>
-
-💡在机器学习中，一个样本通常可以为我们的模型提供一组特征。在某些情况下，这组特征会储存在数据集的几个列，但在某些情况下（例如此处的例子和用于问答的数据），可以从单个样本的那一列中提取多个特征。
-
-</Tip>
+> [!TIP]
+> 💡在机器学习中，一个样本通常可以为我们的模型提供一组特征。在某些情况下，这组特征会储存在数据集的几个列，但在某些情况下（例如此处的例子和用于问答的数据），可以从单个样本的那一列中提取多个特征。
 
 让我们来看看从一列中提取多个特征是如何实现的！在这里，我们将对我们的样本进行 tokenize 并将最大截断长度设置为 128，并且我们将要求 tokenizer 返回全部文本块，而不仅仅是第一个。这可以通过设置 `return_overflowing_tokens=True` 来实现：
 
@@ -518,11 +500,8 @@ drug_dataset["train"][:3]
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 实际上， `Dataset.set_format()` 仅仅改变了数据集的 `__getitem__()` 方法的返回格式。这意味着当我们想从 `"pandas"` 格式的 `Dataset` 中创建像 `train_df` 这样的新对象时，我们需要对整个数据集进行切片（[:]）才可以获得 `pandas.DataFrame` 对象。无论输出格式如何，你都可以自己验证 `drug_dataset["train"]` 的类型依然还是 `Dataset` 。
-
-</Tip>
+> [!TIP]
+> 🚨 实际上， `Dataset.set_format()` 仅仅改变了数据集的 `__getitem__()` 方法的返回格式。这意味着当我们想从 `"pandas"` 格式的 `Dataset` 中创建像 `train_df` 这样的新对象时，我们需要对整个数据集进行切片（[:]）才可以获得 `pandas.DataFrame` 对象。无论输出格式如何，你都可以自己验证 `drug_dataset["train"]` 的类型依然还是 `Dataset` 。
 
 
 有了这个基础，我们可以使用我们想要的所有 Pandas 功能。例如，我们可以巧妙地链式操作，来计算 `condition` 列中不同类别的分布 
@@ -593,11 +572,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️**试试看！**计算每种药物的平均评分并将结果存储在一个新的 Dataset 中。
-
-</Tip>
+> [!TIP]
+> ✏️**试试看！**计算每种药物的平均评分并将结果存储在一个新的 Dataset 中。
 
 到此为止，我们对🤗 Datasets 中可用的各种预处理技术的介绍就结束了。在本节的最后一部分，让我们为训练分类器创建一个验证集。在此之前，让我们将输出格式 `drug_dataset` 从 `pandas` 重置到 `arrow` ：
 
diff --git a/chapters/zh-CN/chapter5/4.mdx b/chapters/zh-CN/chapter5/4.mdx
index 8625c67ae..664947455 100644
--- a/chapters/zh-CN/chapter5/4.mdx
+++ b/chapters/zh-CN/chapter5/4.mdx
@@ -44,11 +44,8 @@ Dataset({
 
 我们可以看到我们的数据集中有 15,518,009 行和 2 列 —— 如此庞大！
 
-<Tip>
-
-✏️ 默认情况下，🤗 Datasets 会自动解压加载数据集所需的文件。如果你想保留硬盘空间，你可以把 `DownloadConfig(delete_extracted=True)` 传递给 `load_dataset()` 的 `download_config` 参数。更多详细信息，请参阅 [文档](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) 。
-
-</Tip>
+> [!TIP]
+> ✏️ 默认情况下，🤗 Datasets 会自动解压加载数据集所需的文件。如果你想节省硬盘空间，你可以把 `DownloadConfig(delete_extracted=True)` 传递给 `load_dataset()` 的 `download_config` 参数。更多详细信息，请参阅 [文档](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig) 。
 
 让我们看看数据集的第一个元素的内容：
 
@@ -99,11 +96,8 @@ print(f"数据集大小 (缓存文件) : {size_gb:.2f} GB")
 
 令人欣喜的是——尽管它将近 20GB 之大，我们却能用远小于此的 RAM 加载和访问数据集！
 
-<Tip>
-
-✏️ **试试看！** 从 Pile 选择一个比你的笔记本电脑或台式机的 RAM 更大的 [子集](https://the-eye.eu/public/AI/pile_preliminary_components/) ，用 🤗 Datasets 加载这个数据集，并且测量 RAM 的使用量。请注意，为了获得准确的测量结果，你需要新开一个进程执行这个操作。你可以在 [the Pile paper](https://arxiv.org/abs/2101.00027) 的表 1 中找到每个子集解压后的大小。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 从 Pile 选择一个比你的笔记本电脑或台式机的 RAM 更大的 [子集](https://the-eye.eu/public/AI/pile_preliminary_components/) ，用 🤗 Datasets 加载这个数据集，并且测量 RAM 的使用量。请注意，为了获得准确的测量结果，你需要新开一个进程执行这个操作。你可以在 [the Pile paper](https://arxiv.org/abs/2101.00027) 的表 1 中找到每个子集解压后的大小。
 
 如果你熟悉 Pandas，这个结果可能会让人感到很惊奇。因为根据 Wes Kinney 的著名的 [经验法则](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) ，你通常需要 5 到 10 倍于你数据集大小的 RAM。那么 🤗 Datasets 是如何解决这个内存管理问题的呢？🤗 Datasets 将每一个数据集看作一个 [内存映射文件](https://en.wikipedia.org/wiki/Memory-mapped_file) ，它提供了 RAM 和文件系统存储之间的映射，该映射允许 Datasets 库无需将其完全加载到内存中即可访问和操作数据集的元素。
 
@@ -130,11 +124,8 @@ print(
 
 这里我们使用了 Python 的 `timeit` 模块来测量执行 `code_snippet` 所耗的时间。你通常能以十分之几 GB/s 到几 GB/s 的速度遍历一个数据集。通过上述的方法就已经能够解决大多数大数据集加载的限制，但是有时候你不得不使用一个很大的数据集，它甚至都不能存储在笔记本电脑的硬盘上。例如，如果我们尝试下载整个 Pile，我们需要 825GB 的可用磁盘空间！为了处理这种情况，🤗 Datasets 提供了一个流式功能，这个功能允许我们动态下载和访问元素，并且不需要下载整个数据集。让我们来看看这个功能是如何工作的。
 
-<Tip>
-
-💡在 Jupyter 笔记中你还可以使用 [`%%timeit` 魔术函数](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) 为整个单元格计时。
-
-</Tip>
+> [!TIP]
+> 💡在 Jupyter 笔记中你还可以使用 [`%%timeit` 魔术函数](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) 为整个单元格计时。
 
 ## 流式数据集 [[流式数据集]]
 
@@ -172,11 +163,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 为了加速流式的 tokenize，你可以传递 `batched=True` ，就像我们在上一节看到的那样。它会批量处理示例；默认的批大小是 1000，可以通过 `batch_size` 参数指定批量大小。
-
-</Tip>
+> [!TIP]
+> 💡 为了加速流式的 tokenize，你可以传递 `batched=True` ，就像我们在上一节看到的那样。它会批量处理示例；默认的批大小是 1000，可以通过 `batch_size` 参数指定批量大小。
 
 你还可以使用 `IterableDataset.shuffle()` 打乱流式数据集，但与 `Dataset.shuffle()` 不同的是这只会打乱预定义 `buffer_size` 中的元素：
 
@@ -277,10 +265,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **试试看！** 使用像 [`mc4`](https://huggingface.co/datasets/mc4) 或者 [`oscar`](https://huggingface.co/datasets/oscar) 这样的大型 Common Crawl 语料库来创建一个流式多语言数据集，该数据集代表你选择的国家/地区语言的口语比例。例如，瑞士的四种民族语言分别是德语、法语、意大利语和罗曼什语，因此你可以尝试根据根据口语比例对 Oscar 子集进行抽样来创建一个瑞士语料库。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 使用像 [`mc4`](https://huggingface.co/datasets/mc4) 或者 [`oscar`](https://huggingface.co/datasets/oscar) 这样的大型 Common Crawl 语料库来创建一个流式多语言数据集，该数据集代表你选择的国家/地区语言的口语比例。例如，瑞士的四种民族语言分别是德语、法语、意大利语和罗曼什语，因此你可以尝试根据口语比例对 Oscar 子集进行抽样来创建一个瑞士语料库。
 
 你现在拥有加载和处理各种类型和大小的数据集的所需的所有工具 —— 但是除非你非常幸运，否则在你的 NLP 之旅中会有一个难题，你将不得不亲自创建一个数据集来解决手头的问题。这就是我们接下来要讨论的主题！
diff --git a/chapters/zh-CN/chapter5/5.mdx b/chapters/zh-CN/chapter5/5.mdx
index c3f203a11..53462d0bf 100644
--- a/chapters/zh-CN/chapter5/5.mdx
+++ b/chapters/zh-CN/chapter5/5.mdx
@@ -112,11 +112,8 @@ response.json()
 
 哇，好大量的信息！我们可以看到有用的字段，我们可以看到诸如 `title` 、 `body` 和 `number` 等描述 issue 的有用字段，以及关于创建 issue 的 GitHub 用户的信息。
 
-<Tip>
-
-✏️ **试试看！**打开上面 JSON 中的一些 URL，以了解每个 GitHub issue 中 url 所链接的信息类型。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**打开上面 JSON 中的一些 URL，以了解每个 GitHub issue 中 url 所链接的信息类型。
 
 如 GitHub [文档](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) 中所述，未经身份验证的请求限制为每小时 60 个请求。虽然你可以增加 `per_page` 查询参数以减少你发出的请求次数，但你仍会在任何有几千个以上 issue 的仓库上触发速率限制。因此，你应该按照 GitHub 的 [说明](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) ，创建一个 `个人访问令牌（personal access token）` 这样你就可以将速率限制提高到每小时 5,000 个请求。获得令牌后，你可以将其放在请求标头中：
 
@@ -125,11 +122,8 @@ GITHUB_TOKEN = xxx  #  将你的GitHub令牌复制到此处
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ 不要与陌生人共享存在 `GITHUB令牌` 的笔记本。我们建议你在使用完后将 `GITHUB令牌` 删除，以避免意外泄漏。一个更好的做法是，将令牌存储在．env 文件中，并使用 [`python-dotenv`库](https://github.com/theskumar/python-dotenv) 自动加载环境变量。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 不要与陌生人共享存在 `GITHUB令牌` 的笔记本。我们建议你在使用完后将 `GITHUB令牌` 删除，以避免意外泄漏。一个更好的做法是，将令牌存储在 `.env` 文件中，并使用 [`python-dotenv`库](https://github.com/theskumar/python-dotenv) 自动加载环境变量。
 
 现在我们有了访问令牌，让我们创建一个可以从 GitHub 仓库下载所有 issue 的函数：
 
@@ -236,11 +230,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ **试试看！**计算在 🤗 Datasets 中解决 issue 所需的平均时间。你可能会发现 `Dataset.filter()` 函数对于过滤 pull 请求和未解决的 issue 很有用，并且你可以使用 `Dataset.set_format()` 函数将数据集转换为 `DataFrame` ，以便你可以轻松地按照需求修改 `创建(created_at)` 和 `关闭(closed_at)` 的时间的格式（以时间戳格式）。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**计算在 🤗 Datasets 中解决 issue 所需的平均时间。你可能会发现 `Dataset.filter()` 函数对于过滤 pull 请求和未解决的 issue 很有用，并且你可以使用 `Dataset.set_format()` 函数将数据集转换为 `DataFrame` ，以便你可以轻松地按照需求修改 `创建(created_at)` 和 `关闭(closed_at)` 的时间的格式（以时间戳格式）。
 
 尽管我们可以通过删除或重命名某些列来进一步清理数据集，但在此阶段尽可能保持数据集“原始”状态通常是一个很好的做法，以便它可以在多个不同的项目中轻松使用。在我们将数据集推送到 Hugging Face Hub 之前，让我们再添加一些缺少的数据：每个 issue 和 pull 中的评论。我们接下来将添加它们——你猜对了——我们将依然使用 GitHub REST API！
 
@@ -359,11 +350,8 @@ Dataset({
 
 很酷，我们已经将我们的数据集推送到 Hub，其他人可以使用它！只剩下一件重要的事情要做：添加一个数据卡片，解释语料库是如何创建的，并为使用数据集的其他提供一些其他有用的信息。
 
-<Tip>
-
-💡 你还可以使用一些 Git 技巧和 `huggingface-cli` 直接从终端将数据集上传到 Hugging Face Hub。有关如何执行此操作的详细信息，请参阅 [🤗 Datasets 指南](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) 指南。
-
-</Tip>
+> [!TIP]
+> 💡 你还可以使用一些 Git 技巧和 `huggingface-cli` 直接从终端将数据集上传到 Hugging Face Hub。有关如何执行此操作的详细信息，请参阅 [🤗 Datasets 指南](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) 。
 
 ## 创建数据集卡片 [[创建数据集卡片]]
 
@@ -385,16 +373,11 @@ Dataset({
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️**试试看！**使用 `dataset-tagging` 应用程序和 [🤗 Datasets 指南](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) 指南来完成 GitHub issue 数据集的 README.md 文件。
-
-</Tip>
+> [!TIP]
+> ✏️**试试看！**使用 `dataset-tagging` 应用程序和 [🤗 Datasets 指南](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) 来完成 GitHub issue 数据集的 README.md 文件。
 
 很好！我们在本节中可以看到，创建一个好的数据集可能涉及相当多的工作，但幸运的是，将其上传并与社区共享会很容易实现。在下一节中，我们将使用我们的新数据集创建一个 🤗 Datasets 的语义搜索引擎，该引擎可以将输入匹配到最相关的 issue 和评论。
 
-<Tip>
-
-✏️ **试试看！**按照我们在本节中采取的步骤为你最喜欢的开源库创建一个 GitHub issue 数据集（当然是除了 🤗 Datasets）。进阶的挑战：微调多标签分类器以预测在 `labels` 字段中出现的标签。
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**按照我们在本节中采取的步骤为你最喜欢的开源库创建一个 GitHub issue 数据集（当然是除了 🤗 Datasets）。进阶的挑战：微调多标签分类器以预测在 `labels` 字段中出现的标签。
 
diff --git a/chapters/zh-CN/chapter5/6.mdx b/chapters/zh-CN/chapter5/6.mdx
index 71a46b74a..91bc56229 100644
--- a/chapters/zh-CN/chapter5/6.mdx
+++ b/chapters/zh-CN/chapter5/6.mdx
@@ -177,11 +177,8 @@ Dataset({
 太好了，我们获取到了几千条的评论！
 
 
-<Tip>
-
-✏️ **试试看！** 看看你是否可以使用 `Dataset.map()` 展开 `issues_dataset` 的 `comments` 列，这有点棘手；你可能会发现 🤗 Datasets 文档的 ["批处理映射(Batch mapping)"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) 对这个任务很有用。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 看看你是否可以使用 `Dataset.map()` 展开 `issues_dataset` 的 `comments` 列，这有点棘手；你可能会发现 🤗 Datasets 文档的 ["批处理映射(Batch mapping)"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) 对这个任务很有用。
 
 既然我们每行有一个评论，让我们创建一个新的 `comments_length` 列来存放每条评论的字数：
 
@@ -511,8 +508,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 不错！我们的输出的第 2 个结果似乎与查询匹配。
 
-<Tip>
-
-✏️  试试看！创建你自己的查询并查看你是否可以在检索到的文档中找到答案。你可能需要在 `Dataset.get_nearest_examples()` 增加参数 `k` 以扩大搜索范围。
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️  试试看！创建你自己的查询并查看你是否可以在检索到的文档中找到答案。你可能需要在 `Dataset.get_nearest_examples()` 增加参数 `k` 以扩大搜索范围。
\ No newline at end of file
diff --git a/chapters/zh-CN/chapter6/2.mdx b/chapters/zh-CN/chapter6/2.mdx
index 1c24c35f4..96727b31e 100644
--- a/chapters/zh-CN/chapter6/2.mdx
+++ b/chapters/zh-CN/chapter6/2.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ 训练 tokenizer 与训练模型不同！模型训练使用随机梯度下降使每个 batch 的 loss 小一点。它本质上是随机的（这意味着在即使两次训练的参数和算法完全相同，你也必须设置一些随机数种子才能获得相同的结果）。训练 tokenizer 是一个统计过程，它试图确定哪些子词最适合为给定的语料库选择，确定的过程取决于分词算法。它是确定性的，这意味着在相同的语料库上使用相同的算法进行训练时，得到的结果总是相同的。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 训练 tokenizer 与训练模型不同！模型训练使用随机梯度下降使每个 batch 的 loss 小一点。它本质上是随机的（这意味着即使两次训练的参数和算法完全相同，你也必须设置一些随机数种子才能获得相同的结果）。训练 tokenizer 是一个统计过程，它试图确定对于给定的语料库哪些子词最值得选择，确定的过程取决于分词算法。它是确定性的，这意味着在相同的语料库上使用相同的算法进行训练时，得到的结果总是相同的。
 
 ## 准备语料库 [[准备语料库]]
 
diff --git a/chapters/zh-CN/chapter6/3.mdx b/chapters/zh-CN/chapter6/3.mdx
index 1a9fea14a..28c30a549 100644
--- a/chapters/zh-CN/chapter6/3.mdx
+++ b/chapters/zh-CN/chapter6/3.mdx
@@ -31,11 +31,8 @@
 |               | 快速 tokenizer      | 慢速 tokenizer 
 :--------------:|:--------------:|:-------------: `batched=True` | 10.8s          | 4min41s `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ 对单个句子进行 tokenize 时，你不总是能看到同一个 tokenizer 的慢速和快速版本之间的速度差异。事实上，快速版本可能更慢！只有同时对大量文本进行 tokenize 时，你才能清楚地看到差异。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 对单个句子进行 tokenize 时，你不总是能看到同一个 tokenizer 的慢速和快速版本之间的速度差异。事实上，快速版本可能更慢！只有同时对大量文本进行 tokenize 时，你才能清楚地看到差异。
 
 ## 批量编码 [[批量编码]]
 
@@ -105,13 +102,10 @@ encoding.word_ids()
 
 我们可以看到 tokenizer 的特殊 token `[CLS]` 和 `[SEP]` 被映射到 `None` ，然后每个 token 都映射到它来源的单词。这对于确定一个 token 是否在单词的开头或两个 token 是否在同一个单词中特别有用。对于 BERT 类型（BERT-like）的的 tokenizer 我们也可以依靠 `##` 前缀来实现这个功能；不过只要是快速 tokenizer 它所提供的 `word_ids()` 方法适用于任何类型的 tokenizer 。在下一章，我们将看到如何利用这种能力，将我们为每个词正确地对应到词汇任务中的标签，如命名实体识别（NER）和词性标注（POS）。我们也可以使用它在掩码语言建模（masked language modeling）中来遮盖来自同一词的所有 token（一种称为 `全词掩码（whole word masking）` 的技术）。
 
-<Tip>
-
-词的概念是复杂的。例如，“I'll”（“I will”的缩写）算作一个词还是两个词？这实际上取决于 tokenizer 和它采用的预分词操作。有些 tokenizer 只在空格处分割，所以它们会把这个看作是一个词。有些其他 tokenizer 在空格的基础之上还使用标点，所以会认为它是两个词。
-
-✏️ **试试看！**从 `bert base cased` 和 `roberta base` checkpoint 创建一个 tokenizer 并用它们对“81s”进行分词。你观察到了什么？这些词的 ID 是什么？
-
-</Tip>
+> [!TIP]
+> 词的概念是复杂的。例如，“I'll”（“I will”的缩写）算作一个词还是两个词？这实际上取决于 tokenizer 和它采用的预分词操作。有些 tokenizer 只在空格处分割，所以它们会把这个看作是一个词。有些其他 tokenizer 在空格的基础之上还使用标点，所以会认为它是两个词。
+>
+> ✏️ **试试看！**从 `bert-base-cased` 和 `roberta-base` checkpoint 创建一个 tokenizer 并用它们对“81s”进行分词。你观察到了什么？这些词的 ID 是什么？
 
 同样，我们还有一个 `sentence_ids()` 方法，可以用它把一个 token 映射到它原始的句子（尽管在这种情况下，tokenizer 返回的 `token_type_ids`也可以为我们提供相同的信息）。
 
@@ -129,11 +123,8 @@ Sylvain
 如前所述，这一切都是由于快速分词器跟踪每个 token 来自的文本范围的一组*偏移*。为了阐明它们的作用，接下来我们将展示如何手动复现 `token-classification` 管道的结果。
 
 
-<Tip>
-
-✏️ **试试看！** 使用自己的文本，看看你是否能理解哪些 token 与单词 ID 相关联，以及如何提取单个单词的字符跨度。附加题：请尝试使用两个句子作为输入，看看句子 ID 是否对你有意义。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 使用自己的文本，看看你是否能理解哪些 token 与单词 ID 相关联，以及如何提取单个单词的字符跨度。附加题：请尝试使用两个句子作为输入，看看句子 ID 是否对你有意义。
 
 ## `token-classification` 管道内部流程 [[`token-classification`管道内部流程]]
 
diff --git a/chapters/zh-CN/chapter6/3b.mdx b/chapters/zh-CN/chapter6/3b.mdx
index c6c366722..4386c43a2 100644
--- a/chapters/zh-CN/chapter6/3b.mdx
+++ b/chapters/zh-CN/chapter6/3b.mdx
@@ -276,11 +276,8 @@ print(scores[start_index, end_index])
 0.97773
 ```
 
-<Tip>
-
-✏️ **试试看！** 计算五个最可能的答案的开始和结束索引。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 计算五个最可能的答案的开始和结束索引。
 
 我们有了答案的 `start_index` 和 `end_index` ，所以现在我们只需要将他们转换为上下文中的字符索引。这就是偏移量将会非常有用的地方。我们可以像我们在 token 分类任务中那样获取偏移量并使用它们：
 
@@ -315,11 +312,8 @@ print(result)
 
 太棒了！这和我们上面获取的结果一样！
 
-<Tip>
-
-✏️ **试试看！** 使用你之前计算的最佳分数来显示五个最可能的答案。你可以回到之前的 QA pipeline，并在调用时传入 `top_k=5` 来对比检查你的结果。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 使用你之前计算的最佳分数来显示五个最可能的答案。你可以回到之前的 QA pipeline，并在调用时传入 `top_k=5` 来对比检查你的结果。
 
 ## 处理长文本 [[处理长文本]]
 
@@ -614,11 +608,8 @@ print(candidates)
 这两个候选范围对应的是模型在每个块中能够找到的最好的答案。模型对于正确的答案在第二部分更有信心（这是个好兆头！）。现在我们只需要将这两个 token 范围映射到上下文中的字符范围（我们只需要映射第二个就能得到我们的答案，但是看看模型在第一块中选取了什么作为答案还是很有意思的）。
 
 
-<Tip>
-
-✏️ **试试看！**  调整上面的代码，以返回五个最可能的答案的得分和范围（对于整个上下文，而不是单个块）。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**  调整上面的代码，以返回五个最可能的答案的得分和范围（对于整个上下文，而不是单个块）。
 
 我们之前抓取 `offsets` 的实际上是一个偏移量列表，每个文本块都有一个列表：
 
@@ -639,10 +630,7 @@ for candidate, offset in zip(candidates, offsets):
 
 如果我们选择分数最高的第二个结果，我们会得到与 QA 管道相同结果——耶！
 
-<Tip>
-
-✏️ **试试看！** 使用你之前计算的最佳分数来显示五个最可能的答案（对于整个上下文，而不是单个块）。如果想要与 pipeline 对比检查你的结果的话，返回之前的 QA 管道，并在调用时传入 `top_k=5` 的参数。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 使用你之前计算的最佳分数来显示五个最可能的答案（对于整个上下文，而不是单个块）。如果想要与 pipeline 对比检查你的结果的话，返回之前的 QA 管道，并在调用时传入 `top_k=5` 的参数。
 
 我们已经结束了我们对 tokenizer 能力的深入探究。在下一章，我们将展示如何在一系列常见的 NLP 任务上微调模型，我们将对这些内容再次付诸实践。
\ No newline at end of file
diff --git a/chapters/zh-CN/chapter6/4.mdx b/chapters/zh-CN/chapter6/4.mdx
index 6064c0179..6d878ff6b 100644
--- a/chapters/zh-CN/chapter6/4.mdx
+++ b/chapters/zh-CN/chapter6/4.mdx
@@ -47,11 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 在这个例子中，由于我们选择了 `bert-base-uncased` checkpoint，所以会在标准化的过程中转换为小写并删除重音。
 
-<Tip>
-
-✏️ **试试看！** 从 `bert-base-cased` checkpoint 加载 tokenizer 并处理相同的示例。看一看 tokenizer 的 `cased` 和 `uncased` 版本之间的主要区别是什么？
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 从 `bert-base-cased` checkpoint 加载 tokenizer 并处理相同的示例。看一看 tokenizer 的 `cased` 和 `uncased` 版本之间的主要区别是什么？
 
 ## 预分词 [[预分词]]
 
diff --git a/chapters/zh-CN/chapter6/5.mdx b/chapters/zh-CN/chapter6/5.mdx
index 4a8e1d4f7..88719151b 100644
--- a/chapters/zh-CN/chapter6/5.mdx
+++ b/chapters/zh-CN/chapter6/5.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 本节深入介绍了 BPE，甚至展示了一个完整的实现。如果你只想大致了解 tokenization 算法，可以跳到最后。
-
-</Tip>
+> [!TIP]
+> 💡 本节深入介绍了 BPE，甚至展示了一个完整的实现。如果你只想大致了解 tokenization 算法，可以跳到最后。
 
 ## BPE 训练 [[BPE 训练]]
 
@@ -27,11 +24,8 @@ BPE 训练首先计算语料库中使用的唯一单词集合（在完成标准
 
 基础单词集合将是 `["b", "g", "h", "n", "p", "s", "u"]` 。在实际应用中，基本词汇表将至少包含所有 ASCII 字符，可能还包含一些 Unicode 字符。如果你正在 tokenization 不在训练语料库中的字符，则该字符将转换为未知 tokens，这就是为什么许多 NLP 模型在分析带有表情符号的内容的结果非常糟糕的原因之一。
 
-<Tip>
-
-GPT-2 和 RoBERTa （这两者非常相似）的 tokenizer 有一个巧妙的方法来处理这个问题：他们不把单词看成是用 Unicode 字符编写的，而是用字节编写的。这样，基本词汇表的大小很小（256），但是能包含几乎所有你能想象的字符，而不会最终转换为未知 tokens 这个技巧被称为 `字节级（byte-level） BPE` 。
-
-</Tip>
+> [!TIP]
+> GPT-2 和 RoBERTa （这两者非常相似）的 tokenizer 有一个巧妙的方法来处理这个问题：他们不把单词看成是用 Unicode 字符编写的，而是用字节编写的。这样，基本词汇表的大小很小（256），但是能包含几乎所有你能想象的字符，而不会最终转换为未知 tokens。这个技巧被称为 `字节级（byte-level） BPE` 。
 
 获得这个基础单词集合后，我们通过学习 `合并（merges）` 来添加新的 tokens 直到达到期望的词汇表大小。合并是将现有词汇表中的两个元素合并为一个新元素的规则。所以，一开始会创建出含有两个字符的 tokens 然后，随着训练的进展，会产生更长的子词。
 
@@ -74,11 +68,8 @@ GPT-2 和 RoBERTa （这两者非常相似）的 tokenizer 有一个巧妙的方
 
 我们继续这样合并，直到达到我们所需的词汇量。
 
-<Tip>
-
-✏️ **现在轮到你了！** 你认为下一个合并规则是什么？
-
-</Tip>
+> [!TIP]
+> ✏️ **现在轮到你了！** 你认为下一个合并规则是什么？
 
 ## tokenization [[tokenization]]
 
@@ -99,11 +90,8 @@ GPT-2 和 RoBERTa （这两者非常相似）的 tokenizer 有一个巧妙的方
 
 在这种情况下，单词 `"bug"` 将被转化为 `["b", "ug"]` 。然而 `"mug"` ，将被转换为 `["[UNK]", "ug"]` ，因为字母 `"m"` 不再基本词汇表中。同样，单词 `"thug"` 会被转换为 `["[UNK]", "hug"]` ：字母 `"t"` 不在基本词汇表中，使用合并规则首先会将 `"u"` 和 `"g"` 合并，然后将 `"h"` 和 `"ug"` 合并。
 
-<Tip>
-
-✏️ **现在轮到你了！** 你认为这个词 `"unhug"` 将如何被 tokenization？
-
-</Tip>
+> [!TIP]
+> ✏️ **现在轮到你了！** 你认为这个词 `"unhug"` 将如何被 tokenization？
 
 ## 实现 BPE 算法[[实现 BPE 算法]]
 
@@ -316,11 +304,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 在同一语料库上使用 `train_new_from_iterator()` 可能不会产生完全相同的词汇表。这是因为当有多个出现频率最高的对时，我们选择遇到的第一个，而 🤗 Tokenizers 库根据内部 ID 选择第一个。
-
-</Tip>
+> [!TIP]
+> 💡 在同一语料库上使用 `train_new_from_iterator()` 可能不会产生完全相同的词汇表。这是因为当有多个出现频率最高的对时，我们选择遇到的第一个，而 🤗 Tokenizers 库根据内部 ID 选择第一个。
 
 为了对新文本进行分词，我们对其进行预分词、拆分，然后使用学到的所有合并规则：
 
@@ -353,10 +338,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ 如果存在未知字符，我们的实现将抛出错误，因为我们没有做任何处理它们。GPT-2 实际上没有未知 tokens （使用字节级 BPE 时不可能得到未知字符），但这里的代码可能会出现这个错误，因为我们并未在初始词汇中包含所有可能的字节。BPE 的这一部分已超出了本节的范围，因此我们省略了一些细节。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果存在未知字符，我们的实现将抛出错误，因为我们没有对它们做任何处理。GPT-2 实际上没有未知 tokens （使用字节级 BPE 时不可能得到未知字符），但这里的代码可能会出现这个错误，因为我们并未在初始词汇中包含所有可能的字节。BPE 的这一部分已超出了本节的范围，因此我们省略了一些细节。
 
 至此，BPE 算法的介绍就到此结束！接下来，我们将研究 WordPiece 算法。
\ No newline at end of file
diff --git a/chapters/zh-CN/chapter6/6.mdx b/chapters/zh-CN/chapter6/6.mdx
index 799c8737c..8fe8c1e3a 100644
--- a/chapters/zh-CN/chapter6/6.mdx
+++ b/chapters/zh-CN/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece 是 Google 开发的用于 BERT 预训练的分词算法。自此之
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 本节详细讲述了 WordPiece，甚至展示了一个完整的实现。如果你只想对这个分词算法有个大概的理解，可以直接跳到最后。
-
-</Tip>
+> [!TIP]
+> 💡 本节详细讲述了 WordPiece，甚至展示了一个完整的实现。如果你只想对这个分词算法有个大概的理解，可以直接跳到最后。
 
 ## WordPiece 训练 [[WordPiece 训练]]
 
-<Tip warning={true}>
-
-⚠️ Google 从未开源 WordPiece 训练算法的实现，因此以下是我们基于已发表文献的最佳猜测。它可能并非 100％ 准确的。
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google 从未开源 WordPiece 训练算法的实现，因此以下是我们基于已发表文献的最佳猜测。它可能并非 100％ 准确的。
 
 与BPE 一样，WordPiece 也是从包含模型使用的特殊 tokens 和初始字母表的小词汇表开始的。由于它是通过添加前缀（如 BERT 中的 `##` ）来识别子词的，每个词最初都会通过在词内部所有字符前添加该前缀进行分割。因此，例如 `"word"` 将被这样分割：
 
@@ -77,11 +71,8 @@ $$\mathrm{score} = (\mathrm{freq\_of\_pair}) / (\mathrm{freq\_of\_first\_element
 
 然后我们就按此方式继续，直到我们达到所需的词汇表大小。
 
-<Tip>
-
-✏️ **现在轮到你了！** 下一个合并规则是什么？
-
-</Tip>
+> [!TIP]
+> ✏️ **现在轮到你了！** 下一个合并规则是什么？
 
 ## tokenization 算法 [[tokenization 算法]]
 
@@ -93,11 +84,8 @@ WordPiece 和 BPE 的分词方式有所不同，WordPiece 只保存最终词汇
 
 当分词过程中无法在词汇库中找到该子词时，整个词会被标记为 unknown（未知）—— 例如， `"mug"` 将被标记为 `["[UNK]"]` ， `"bum"` 也是如此（即使我们的词汇表中包含 `"b"` 和 `"##u"` 开始，但是 `"##m"` 不在词汇表中，因此最终的分词结果只会是 `["[UNK]"]` ，而不是 `["b", "##u", "[UNK]"]` ）。这是与 BPE 的另一个区别，BPE 只会将不在词汇库中的单个字符标记为 unknown。
 
-<Tip>
-
-✏️ **现在轮到你了！** `"pugs"` 将被如何分词？
-
-</Tip>
+> [!TIP]
+> ✏️ **现在轮到你了！** `"pugs"` 将被如何分词？
 
 ## 实现 WordPiece [[实现 WordPiece]]
 
@@ -315,11 +303,8 @@ print(vocab)
 
 如我们所见，相较于 BPE（字节对编码），此分词器在学习单词部分作为 tokens 时稍快一些。
 
-<Tip>
-
-💡 在同一语料库上使用 `train_new_from_iterator()` 不会产生完全相同的词汇表。这是因为 🤗 Tokenizers 库没有为训练实现 WordPiece（因为我们不完全确定它的真实实现方式），而是使用了 BPE。
-
-</Tip>
+> [!TIP]
+> 💡 在同一语料库上使用 `train_new_from_iterator()` 不会产生完全相同的词汇表。这是因为 🤗 Tokenizers 库没有为训练实现 WordPiece（因为我们不完全确定它的真实实现方式），而是使用了 BPE。
 
 要对新文本进行分词，我们先预分词，再进行分割，然后在每个词上使用分词算法。也就是说，我们寻找从第一个词开始的最大子词并将其分割，然后我们对第二部分重复此过程，以此类推，对该词以及文本中的后续词进行分割：
 
diff --git a/chapters/zh-CN/chapter6/7.mdx b/chapters/zh-CN/chapter6/7.mdx
index 98e16f6fb..ef9d6c16b 100644
--- a/chapters/zh-CN/chapter6/7.mdx
+++ b/chapters/zh-CN/chapter6/7.mdx
@@ -11,11 +11,8 @@ Unigram 算法常用于 SentencePiece 中，该切分算法被 AlBERT，T5，mBA
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 本节将深入探讨 Unigram，甚至展示完整的实现过程。如果你只想大致了解 tokenization 算法，可以直接跳到章节末尾。
-
-</Tip>
+> [!TIP]
+> 💡 本节将深入探讨 Unigram，甚至展示完整的实现过程。如果你只想大致了解 tokenization 算法，可以直接跳到章节末尾。
 
 ## Unigram 训练 [[Unigram 训练]]
 
@@ -56,11 +53,8 @@ Unigram 模型是一种语言模型，它认为每个符号都与其之前的符
 
 所以，所有频率之和为 210，子词 `"ug"` 出现的概率是 20/210。
 
-<Tip>
-
-✏️ **现在轮到你了！** 编写代码计算上述频率，然后验证结果的准确性，以及概率的总和是否正确。
-
-</Tip>
+> [!TIP]
+> ✏️ **现在轮到你了！** 编写代码计算上述频率，然后验证结果的准确性，以及概率的总和是否正确。
 
 现在，为了对一个给定的单词进行分词，我们会查看所有可能的分词组合，并根据 Unigram 模型计算出每种可能的概率。由于所有的分词都被视为独立的，因此这个单词分词的概率就是每个子词概率的乘积。例如，将 `"pug"` 分词为 `["p", "u", "g"]` 的概率为：
 
@@ -100,11 +94,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 因此 “unhug” 将被分词为 `["un", "hug"]` 。
 
 
-<Tip>
-
-✏️ **现在轮到你了！**  确定单词 “huggun” 的分词方式以及其得分。
-
-</Tip>
+> [!TIP]
+> ✏️ **现在轮到你了！**  确定单词 “huggun” 的分词方式以及其得分。
 
 ## 回到训练 [[回到训练]]
 
@@ -217,11 +208,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece 使用一种名为增强后缀数组（ESA）的更高效的算法来创建初始词汇表。
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece 使用一种名为增强后缀数组（ESA）的更高效的算法来创建初始词汇表。
 
 接下来，我们需要计算所有频率的总和，将频率转化为概率。在我们的模型中，我们将存储概率的对数，因为相较于小数相乘，对数相加在数值上更稳定，而且这将简化模型损失的计算：
 
@@ -342,11 +330,8 @@ print(scores["his"])
 0.0
 ```
 
-<Tip>
-
-💡 这种方式效率非常低，所以 SentencePiece 使用了一种估算方法来计算如果没有 X token，模型的损失会是多少：它不是重新开始，而是只是用剩下的词表里 X token 的分词方式来替代它。这样，所有的得分都能在和模型损失一起的同时计算出来。
-
-</Tip>
+> [!TIP]
+> 💡 这种方式效率非常低，所以 SentencePiece 使用了一种估算方法来计算如果没有 X token，模型的损失会是多少：它不是重新开始，而只是用剩下的词表里 X token 的分词方式来替代它。这样，所有的得分都可以与模型损失同时计算出来。
 
 至此，我们需要做的最后一件事就是将模型使用的特殊 tokens 添加到词汇表中，然后循环直到我们从词汇表中剪除足够多的 tokens 以达到我们期望的规模：
 
diff --git a/chapters/zh-CN/chapter6/8.mdx b/chapters/zh-CN/chapter6/8.mdx
index cbb878ef3..b8b940001 100644
--- a/chapters/zh-CN/chapter6/8.mdx
+++ b/chapters/zh-CN/chapter6/8.mdx
@@ -110,13 +110,10 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**更进一步**如果你在包含 unicode 字符的字符串上测试先前 normalizers 的两个版本，你肯定会注意到这两个 normalizers 并不完全等效。
-
-为了避免 `normalizers.Sequence` 过于复杂，我们的实现没有包含当 `clean_text` 参数设置为 `True` 时 `BertNormalizer` 需要的正则表达式替换 —— 而这是 `BertNormalizer` 默认会实现的。但不要担心：通过在 normalizer 序列中添加两个 `normalizers.Replace` 可以在不使用方便的 `BertNormalizer` 的情况下获得完全相同的标准化。
-
-</Tip>
+> [!TIP]
+> **更进一步**如果你在包含 unicode 字符的字符串上测试先前 normalizers 的两个版本，你肯定会注意到这两个 normalizers 并不完全等效。
+>
+> 为了避免 `normalizers.Sequence` 过于复杂，我们的实现没有包含当 `clean_text` 参数设置为 `True` 时 `BertNormalizer` 需要的正则表达式替换 —— 而这是 `BertNormalizer` 默认会实现的。但不要担心：通过在 normalizer 序列中添加两个 `normalizers.Replace` 可以在不使用方便的 `BertNormalizer` 的情况下获得完全相同的标准化。
 
 下一步是预分词。同样，我们可以使用预构建的 `BertPreTokenizer` ：
 
diff --git a/chapters/zh-CN/chapter7/1.mdx b/chapters/zh-CN/chapter7/1.mdx
index 8617c8397..615055176 100644
--- a/chapters/zh-CN/chapter7/1.mdx
+++ b/chapters/zh-CN/chapter7/1.mdx
@@ -26,8 +26,5 @@
 {/if}
 
 
-<Tip>
-
-如果你按顺序阅读这些部分，你会注意到各小节在代码和描述上有许多相似之处。这种重复是有意为之的，让你可以随时钻研或对比学习任何感兴趣的任务，并且在每个任务中都可以找到一个完整的可运行示例。
-
-</Tip>
+> [!TIP]
+> 如果你按顺序阅读这些部分，你会注意到各小节在代码和描述上有许多相似之处。这种重复是有意为之的，让你可以随时钻研或对比学习任何感兴趣的任务，并且在每个任务中都可以找到一个完整的可运行示例。
diff --git a/chapters/zh-CN/chapter7/2.mdx b/chapters/zh-CN/chapter7/2.mdx
index 330606138..aaf518030 100644
--- a/chapters/zh-CN/chapter7/2.mdx
+++ b/chapters/zh-CN/chapter7/2.mdx
@@ -44,11 +44,8 @@
 
 首先，我们需要一个适合 token 分类的数据集。在本节中，我们将使用 [CoNLL-2003 数据集](https://huggingface.co/datasets/conll2003) ，该数据集来源于路透社的新闻报道。
 
-<Tip>
-
-💡 只要你的数据集由分词文本和对应的标签组成，就能够将这里描述的数据处理过程应用到自己的数据集中。如果需要复习如何在 `Dataset` 中加载自定义数据集，请复习 [第五章](/course/chapter5) 。
-
-</Tip>
+> [!TIP]
+> 💡 只要你的数据集由分词文本和对应的标签组成，就能够将这里描述的数据处理过程应用到自己的数据集中。如果需要复习如何在 `Dataset` 中加载自定义数据集，请复习 [第五章](/course/chapter5) 。
 
 ### CoNLL-2003 数据集 [[CoNLL-2003 数据集]]
 
@@ -166,11 +163,8 @@ print(line2)
 
 正如我们在上面的输出中所看到的，跨越两个单词的实体，如“European Union”和“Werner Zwingmann”，数据集把第一个单词标注为了 `B-` 标签，将第二个单词标记为了 `I-` 标签。
 
-<Tip>
-
-✏️ **轮到你了！** 检查同一个句子的词性标注 （POS）或分块（chunking）列，查看输出的结果。
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 检查同一个句子的词性标注 （POS）或分块（chunking）列，查看输出的结果。
 
 ### 处理数据 [[处理数据]]
 
@@ -263,11 +257,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 正如我们所看到的，我们的函数为开头和结尾添加了两个特殊 tokens ：`-100` ，并为切分成两个 tokens 的单词添加了一个新的 `0` 标签。
 
-<Tip>
-
-✏️ **轮到你了！** 有些研究人员更喜欢只为每个单只分配一个标签，对该单词其他部分的 token 分配 `-100` 标签。这是为了避免那些分解成许多子词的长单词对损失作出过多的贡献。请按照这个思路，改变之前的函数，使标签与 inputs ID 对齐。
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 有些研究人员更喜欢只为每个单词分配一个标签，对该单词其他部分的 token 分配 `-100` 标签。这是为了避免那些分解成许多子词的长单词对损失作出过多的贡献。请按照这个思路，改变之前的函数，使标签与 inputs ID 对齐。
 
 为了预处理我们的整个数据集，我们需要对所有输入进行 `tokenize`，并使用 `align_labels_with_tokens()` 函数处理所有标签。为了充分利用快速 tokenizer 的优势，最好是同时对大量文本一起进行 `tokenize`，所以我们需要编写一个处理一组示例的函数，并使用带有 `batched=True` 参数的 `Dataset.map()` 方法。与我们之前的示例唯一不同的是，当 `tokenizer` 的输入是文本列表（或示例中单词的列表的列表）时， `word_ids()` 函数需要根据列表的索引获取 `token` 的 ID，所以我们也在下面的函数中添加了这个功能：
 
@@ -426,11 +417,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ 如果你的模型的标签数量错误，那么在后面调用 `model.fit()` 时，你会得到一个晦涩的错误。这可能会令人烦恼，所以确保你做了这个检查，确认你的标签数量是正确的。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果你的模型的标签数量错误，那么在后面调用 `model.fit()` 时，你会得到一个晦涩的错误。这可能会令人烦恼，所以确保你做了这个检查，确认你的标签数量是正确的。
 
 ### 微调模型
 
@@ -493,11 +481,8 @@ model.fit(
 
 你可以使用 `hub_model_id` 参数指定你想要推送的仓库的全名（特别需要注意的是，如果你需要推送给某个组织，就必须使用这个参数）。例如，当我们将模型推送到 [`huggingface-course` 组织](https://huggingface.co/huggingface-course) 时，我们添加了 `hub_model_id="huggingface-course/bert-finetuned-ner"` 。默认情况下，这个仓库将保存在你的账户之内，并以你设置的输出目录命名，例如 `"cool_huggingface_user/bert-finetuned-ner"` 。
 
-<Tip>
-
-💡 如果设置了模型保存的路径，并且在那里已经存在了一个非空的同名文件夹，那么该目录应该是 Hub 仓库克隆在本地的版本，如果不是，则会在调用 model.ﬁt() 方法时收到一个错误，并需要设置一个新的路径。
-
-</Tip>
+> [!TIP]
+> 💡 如果设置了模型保存的路径，并且在那里已经存在了一个非空的同名文件夹，那么该目录应该是 Hub 仓库克隆在本地的版本，如果不是，则会在调用 `model.fit()` 方法时收到一个错误，并需要设置一个新的路径。
 
 请注意，在训练过程中每次保存模型时（这里是每个 epooch），它都会在后台上传到 Hub。这样，如有需要，你将能够在另一台机器上继续你的训练。
 
@@ -673,11 +658,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ 如果你的模型的标签数量有错误，那么在后面调用 `Trainer.train()` 时，你会得到一个晦涩的错误（类似于“CUDA error：device-side assert triggered”）。这可能会令人烦恼，所以确保你做了这个检查，确认你的标签数量是正确。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果你的模型的标签数量有错误，那么在后面调用 `Trainer.train()` 时，你会得到一个晦涩的错误（类似于“CUDA error：device-side assert triggered”）。这可能会令人烦恼，所以确保你做了这个检查，确认你的标签数量是正确的。
 
 ### 微调模型
 
@@ -707,11 +689,8 @@ args = TrainingArguments(
 
 你已经对大多数内容有所了解了：我们设置了一些超参数（如学习率、训练的轮数和权重衰减），并设定 `push_to_hub=True` ，表示我们希望在每个训练轮次结束时保存并评估模型，然后将结果上传到模型中心。注意，你可以通过 `hub_model_id` 参数指定你想推送的仓库的名称（特别需要注意的是，如果你需要推送给某个组织，就必须使用这个参数）。例如，当我们将模型推送到 [`huggingface-course` 组织](https://huggingface.co/huggingface-course) 时，我们在 `TrainingArguments` 中添加了 `hub_model_id="huggingface-course/bert-finetuned-ner"` 。默认情况下，使用的仓库将保存在你的账户之内，并以你设置的输出目录命名，所以在我们的例子中，仓库的地址是 `"sgugger/bert-finetuned-ner"` 。
 
-<Tip>
-
-💡 如果你使用的输出路径已经存在一个同名的文件夹，那么它需要是你想推送到 hub 的仓库的克隆在本地的版本。如果不是，你将在声明 `Trainer` 时遇到一个错误，并需要设置一个新的路径。
-
-</Tip>
+> [!TIP]
+> 💡 如果你使用的输出路径已经存在一个同名的文件夹，那么它需要是你想推送到 hub 的仓库的克隆在本地的版本。如果不是，你将在声明 `Trainer` 时遇到一个错误，并需要设置一个新的路径。
 
 最后，我们将所有内容传递给 `Trainer` 并启动训练：
 
@@ -799,11 +778,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 如果你正在 TPU 上训练，你需要将上面单元格开始的所有代码移动到一个专门的训练函数中。更多详情请回顾 [第三章](/course/chapter3) 。
-
-</Tip>
+> [!TIP]
+> 🚨 如果你正在 TPU 上训练，你需要将上面单元格开始的所有代码移动到一个专门的训练函数中。更多详情请回顾 [第三章](/course/chapter3) 。
 
 现在我们已经将我们的 `train_dataloader` 传递给了 `accelerator.prepare()` 方法，我们还可以使用 `len()` 来计算训练步骤的数量。请记住，我们应该在准备好 `dataloader` 后再使用 `len()` ，因为改动 `dataloader` 会改变训练长度的数量。这里我们将使用一个从学习率衰减到 0 的经典线性学习率调度：
 
diff --git a/chapters/zh-CN/chapter7/3.mdx b/chapters/zh-CN/chapter7/3.mdx
index 240a82139..00247d082 100644
--- a/chapters/zh-CN/chapter7/3.mdx
+++ b/chapters/zh-CN/chapter7/3.mdx
@@ -41,11 +41,8 @@
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 如果你对“掩码语言建模”和“预训练模型”这两个术语感到陌生，请回顾 [第一章](/course/chapter1) ，我们在其中解释了所有这些核心概念，并附有视频！
-
-</Tip>
+> [!TIP]
+> 🙋 如果你对“掩码语言建模”和“预训练模型”这两个术语感到陌生，请回顾 [第一章](/course/chapter1) ，我们在其中解释了所有这些核心概念，并附有视频！
 
 ## 选择用于掩码语言建模的预训练模型 [[选择用于掩码语言建模的预训练模型]]
 
@@ -242,11 +239,8 @@ for row in sample:
 
 是的，这些肯定是电影评论，如果你年龄足够大，你甚至可能会理解上述评论中关于拥有 VHS （一种古老的盒式摄像机格式）版本的评论😜！虽然语言模型不需要预先标注好的标签，但我们已经可以看到数据集其实包含了标签， `0` 代表负面评论， `1` 代表正面评论。
 
-<Tip>
-
-✏️ **试一试！** 创建一个 `unsupervised` 部分的随机样本，并验证其标签既不是 `0` 也不是 `1` 。或者，你也可以检查 `train` 和 `test` 部分的标签确实是 `0` 或 `1` —— 每个 NLP 实践者在开始新项目时都应该对数据标注进行的有用的、合理的检查！
-
-</Tip>
+> [!TIP]
+> ✏️ **试一试！** 创建一个 `unsupervised` 部分的随机样本，并验证其标签既不是 `0` 也不是 `1` 。或者，你也可以检查 `train` 和 `test` 部分的标签确实是 `0` 或 `1` —— 这是每个 NLP 实践者在开始新项目时都应该进行的一项有用的合理性检查！
 
 现在我们已经快速浏览了一下数据，接下来我们要深入准备这些数据以供进行掩码语言建模。如我们所见，与我们在 [第三章](https://chat.openai.com/course/chapter3) 看到的序列分类任务相比，这里需要采取一些额外的步骤。让我们开始吧！
 
@@ -304,11 +298,8 @@ tokenizer.model_max_length
 
 该值来自于与 checkpoint 相关联的 `tokenizer_config.json` 文件；在我们的例子中，我们可以看到上下文大小是 512 个 `tokens` 与 BERT 模型一样。
 
-<Tip>
-
-✏️ **试试看！** 一些 Transformer 模型，例如 [BigBird](https://huggingface.co/google/bigbird-roberta-base) 和 [Longformer](hf.co/allenai/longformer-base-4096) 具有比 BERT 和其他早期 Transformer 模型更长的上下文长度。选择一个 `checkpoint` 来实例化 `tokenizer` 并验证 `model_max_length` 是否与模型卡上标注的大小一致。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 一些 Transformer 模型，例如 [BigBird](https://huggingface.co/google/bigbird-roberta-base) 和 [Longformer](https://hf.co/allenai/longformer-base-4096) 具有比 BERT 和其他早期 Transformer 模型更长的上下文长度。选择一个 `checkpoint` 来实例化 `tokenizer` 并验证 `model_max_length` 是否与模型卡上标注的大小一致。
 
 因此，为了可以在像 Google Colab 那样的 GPU 上运行我们的实验，我们会选择一个稍小一点、可以放入内存中的分块大小：
 
@@ -316,11 +307,8 @@ tokenizer.model_max_length
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-注意，在实际应用场景中，使用小的块可能会有丢失长句子之间的语义信息从而对最终模型的性能产生不利的影响，所以如果显存条件允许的话，你应该选择一个与你将要使用模型的相匹配的大小。
-
-</Tip>
+> [!WARNING]
+> 注意，在实际应用场景中，使用较小的块可能会丢失长句子之间的语义信息，从而对最终模型的性能产生不利影响。所以如果显存条件允许的话，你应该选择一个与模型实际使用场景相匹配的块大小。
 
 现在来到了最有趣的部分。为了展示如何把这些示例连接在一，我们从分词后的训练集中取出几个评论，并打印出每个评论的 token 数量：
 
@@ -477,11 +465,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 很棒，成功了！我们可以看到， `[MASK]` tokens 已随机插入我们文本中的不同位置。这些将是我们的模型在训练期间必须预测的 tokens  —— 数据整理器的美妙之处在于，它会在每个 batch 中随机插入 `[MASK]` ！
 
-<Tip>
-
-✏️ **试一试！** 多运行上面的代码片段几次，亲眼看看随机遮蔽的效果！也可以用 `tokenizer.convert_ids_to_tokens()` 替换 `tokenizer.decode()` 方法，看看只把一个给定单词的单个 token 遮蔽，而保持这个单词其他 tokens 不变的效果。
-
-</Tip>
+> [!TIP]
+> ✏️ **试一试！** 多运行上面的代码片段几次，亲眼看看随机遮蔽的效果！也可以用 `tokenizer.convert_ids_to_tokens()` 替换 `tokenizer.decode()` 方法，看看只把一个给定单词的单个 token 遮蔽，而保持这个单词其他 tokens 不变的效果。
 
 {#if fw === 'pt'}
 
@@ -590,11 +575,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **试试看！** 多次运行上面的代码片段，亲眼看看随机遮蔽的效果！也可以将 `tokenizer.decode()` 方法替换为 `tokenizer.convert_ids_to_tokens()` ，可以观察到给定单词的所有 tokens 总是被一起遮蔽。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 多次运行上面的代码片段，亲眼看看随机遮蔽的效果！也可以将 `tokenizer.decode()` 方法替换为 `tokenizer.convert_ids_to_tokens()` ，可以观察到给定单词的所有 tokens 总是被一起遮蔽。
 
 现在我们有了两个数据整理器，剩下的微调步骤与其他任务类似都是标准的。如果你在 Google Colab 上运行并且没有幸运地分配到神秘的 P100 GPU😭，那么训练可能会需要一些时间，所以我们首先将训练集的大小减小到几千个例子。不用担心，我们仍然可以得到一个相当不错的语言模型！在 🤗 Datasets 中快速筛选数据集的方法是使用我们在 [第五章](/course/chapter5) 中看到的 `Dataset.train_test_split()` 函数：
 
@@ -818,11 +800,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **轮到你了！** 将数据整理器改为全词屏蔽的数据整理器后运行上面的训练。你能得到更好的结果吗？
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 将数据整理器改为全词屏蔽的数据整理器后运行上面的训练。你能得到更好的结果吗？
 
 {#if fw === 'pt'} 
 
@@ -1039,7 +1018,5 @@ Nice！—— 我们的模型显然已经调整了它的权重来预测与电影
 
 这标志着我们第一次训练语言模型的实验到现在就结束了。在 [第 6 节](https://chat.openai.com/course/en/chapter7/6) 中，你将学习如何从头开始训练一个自动回归模型，比如 GPT-2；如果你想看看如何预训练你自己的 Transformer 模型，就赶快去那里看看吧！
 
-<Tip>
-
-✏️ **试试看！** 为了量化领域适应的好处，分别使用预训练和微调的 DistilBERT checkpoint 以及数据集自带的 IMDb 标签来微调一个分类器，并对比一下这个两个 checkpoint 的差异。如果你需要复习文本分类的知识，请查看 [第三章](/course/chapter3) 。
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 为了量化领域适应的好处，分别使用预训练和微调的 DistilBERT checkpoint 以及数据集自带的 IMDb 标签来微调一个分类器，并对比一下这两个 checkpoint 的差异。如果你需要复习文本分类的知识，请查看 [第三章](/course/chapter3) 。
diff --git a/chapters/zh-CN/chapter7/4.mdx b/chapters/zh-CN/chapter7/4.mdx
index 0a2313357..fbb1c9453 100644
--- a/chapters/zh-CN/chapter7/4.mdx
+++ b/chapters/zh-CN/chapter7/4.mdx
@@ -155,11 +155,8 @@ translator(
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **轮到你了！** 另一个在法语中经常使用的英语单词是“email”。在训练数据集中找到使用这个词的第一个样本。在数据集中它是如何翻译的？预训练模型如何翻译同一个英文句子？
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 另一个在法语中经常使用的英语单词是“email”。在训练数据集中找到使用这个词的第一个样本。在数据集中它是如何翻译的？预训练模型如何翻译同一个英文句子？
 
 ### 处理数据 [[处理数据]]
 
@@ -176,11 +173,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="pt")
 
 你也可以将 `model_checkpoint` 替换为你从 [Hub](https://huggingface.co/models) 中选择的其他模型，或者一个保存了预训练模型和 tokenizer 的本地文件夹。
 
-<Tip>
-
-💡 如果你在使用一个多语言的 tokenizer，比如 mBART，mBART-50，或者 M2M100，你需要通过设置 `tokenizer.src_lang` 和 `tokenizer.tgt_lang` 来在 tokenizer 中指定输入和目标的语言代码。
-
-</Tip>
+> [!TIP]
+> 💡 如果你在使用一个多语言的 tokenizer，比如 mBART，mBART-50，或者 M2M100，你需要通过设置 `tokenizer.src_lang` 和 `tokenizer.tgt_lang` 来在 tokenizer 中指定输入和目标的语言代码。
 
 我们的数据准备相当简单。只有一点要记住；你需要确保 tokenizer 处理的目标是输出语言（在这里是法语）。你可以通过将目标语言传递给 tokenizer 的 `__call__` 方法的 `text_targets` 参数来完成此操作。
 
@@ -230,17 +224,11 @@ def preprocess_function(examples):
 
 请注意，上述代码也为输入和输出设置了相同的最大长度。由于要处理的文本看起来很短，因此在这里将最大长度设置为 128。
 
-<Tip>
+> [!TIP]
+> 💡 如果你正在使用 T5 模型（更具体地说，一个 `t5-xxx` checkpoint ），模型会期望文本输入有一个前缀指示目前的任务，比如 `translate: English to French:` 。
 
-💡 如果你正在使用 T5 模型（更具体地说，一个 `t5-xxx` checkpoint ），模型会期望文本输入有一个前缀指示目前的任务，比如 `translate: English to French:` 。
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ 我们不需要对待遇测的目标设置注意力掩码，因为模型序列到序列的不会需要它。不过，我们应该将填充（padding） token 对应的标签设置为 `-100` ，以便在 loss 计算中忽略它们。由于我们正在使用动态填充，这将在稍后由我们的数据整理器完成，但是如果你在此处就打算进行填充，你应该调整预处理函数，将所有填充（padding） token 对应的标签设置为 `-100` 。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 我们不需要对待预测的目标设置注意力掩码，因为序列到序列模型不需要它。不过，我们应该将填充（padding） token 对应的标签设置为 `-100` ，以便在 loss 计算中忽略它们。由于我们正在使用动态填充，这将在稍后由我们的数据整理器完成，但是如果你在此处就打算进行填充，你应该调整预处理函数，将所有填充（padding） token 对应的标签设置为 `-100` 。
 
 我们现在可以一次性使用上述预处理处理数据集的所有数据。
 
@@ -650,11 +638,8 @@ model.fit(
 
 请注意，你可以使用 `hub_model_id` 参数指定要推送到的模型仓库的名称（当你想把模型推送到指定的组织的时候，就必须使用此参数）。例如，当我们将模型推送到 [`huggingface-course` 组织](https://huggingface.co/huggingface-course) 时，我们在 `Seq2SeqTrainingArguments`添加了 `hub_model_id="huggingface-course/marian-finetuned-kde4-en- to-fr"` 。默认情况下，该仓库将保存在你的账户里，并以你设置的输出目录命名，因此在我们的例子中它是 `"sgugger/marian-finetuned-kde4-en-to-fr"` 。
 
-<Tip>
-
-💡如果正在使用的输出目录已经存在一个同名的文件夹，则它应该是目标推送仓库的在本地克隆在本地的版本。如果不是，当调用 `model.fit()` 时会收到错误，并需要设置一个新的路径。
-
-</Tip>
+> [!TIP]
+> 💡如果正在使用的输出目录已经存在一个同名的文件夹，则它应该是目标推送仓库克隆在本地的版本。如果不是，当调用 `model.fit()` 时会收到错误，并需要设置一个新的路径。
 
 最后，让我们看看训练结束后我们的模型的 BLEU 的分数：
 
@@ -700,11 +685,8 @@ args = Seq2SeqTrainingArguments(
 
 请注意，你可以使用 `hub_model_id` 参数指定要推送到的存储库的名称（当你想把模型推送到指定的组织的时候，就必须使用此参数）。例如，当我们将模型推送到 [`huggingface-course` 组织](https://huggingface.co/huggingface-course) 时，我们在 `Seq2SeqTrainingArguments` 添加了 `hub_model_id="huggingface-course/marian-finetuned-kde4-en- to-fr"` 。默认情况下，该仓库将保存在你的账户中，并以你设置的输出目录命名，因此在我们的例子中它是 `"sgugger/marian-finetuned-kde4-en-to-fr"` 。
 
-<Tip>
-
-💡如果你使用的输出目录已经存在一个同名的文件夹，则它应该是推送的仓库克隆在本地的版本。如果不是，你将在定义你的 `Seq2SeqTrainer` 名称时会遇到错误，并且需要设置一个新名称。
-
-</Tip>
+> [!TIP]
+> 💡如果你使用的输出目录已经存在一个同名的文件夹，则它应该是推送的仓库克隆在本地的版本。如果不是，你将在定义 `Seq2SeqTrainer` 时遇到错误，并且需要设置一个新名称。
 
 最后，我们将所有内容传递给 `Seq2SeqTrainer` ：
 
@@ -997,8 +979,5 @@ translator(
 
 这是另一个领域适应的好例子！
 
-<Tip>
-
-✏️ **轮到你了！** 把之前找到的包含单词“email”样本输入模型，会返回什么结果？
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 把之前找到的包含单词“email”的样本输入模型，会返回什么结果？
diff --git a/chapters/zh-CN/chapter7/5.mdx b/chapters/zh-CN/chapter7/5.mdx
index f50469f85..2d4d1e155 100644
--- a/chapters/zh-CN/chapter7/5.mdx
+++ b/chapters/zh-CN/chapter7/5.mdx
@@ -86,11 +86,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ **试试看！** 更改 `Dataset.shuffle()` 命令中的随机种子以探索语料库中的其他评论。如果你是说西班牙语的人，请查看 `spanish_dataset` 中的一些评论，看看标题是否像是合理的摘要。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 更改 `Dataset.shuffle()` 命令中的随机种子以探索语料库中的其他评论。如果你是说西班牙语的人，请查看 `spanish_dataset` 中的一些评论，看看标题是否像是合理的摘要。
 
 这个示例显示了人们通常在网上评论的多样性，从积极的到消极的（以及介于两者之间的评论！）。尽管带有“meh”标题的示例的信息量不大，但其他标题看起来像是对评论本身的不错的总结。在单个 GPU 上训练所有 400,000 条评论的摘要模型将花费太长时间，因此我们将专注于为单个产品领域生成摘要。为了了解我们可以选择哪些领域，让我们将 `english_dataset` 转换为 `pandas.DataFrame` ，并计算每个产品类别的评论数量：
 
@@ -228,11 +225,8 @@ books_dataset = books_dataset.filter(lambda x: len(x["review_title"].split()) >
 
 mT5 不使用前缀，但具有 T5 的大部分功能，并且具有多语言的优势。现在我们已经选择了一个模型，接下来让我们来看看如何准备我们的训练数据。
 
-<Tip>
-
-✏️ **试试看！** 完成本节后，可以尝试比较一下 mT5 和用相同技术微调过的 mBART 的性能。附加的挑战：只在英文评论上微调 T5。因为 T5 有一个特殊的前缀提示，你需要在下面的预处理步骤中将 `summarize:` 添加到输入例子前。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 完成本节后，可以尝试比较一下 mT5 和用相同技术微调过的 mBART 的性能。附加的挑战：只在英文评论上微调 T5。因为 T5 有一个特殊的前缀提示，你需要在下面的预处理步骤中将 `summarize:` 添加到输入例子前。
 
 ## 预处理数据 [[预处理数据]]
 
@@ -247,11 +241,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡在 NLP 项目的早期阶段，一个好的做法是在小样本数据上训练一类“小”模型。这使你可以更快地调试和迭代端到端工作流。当你对结果有信心之后，你只需要通过简单地更改模型 checkpoint 就可以在较大规模数据上训练模型！
-
-</Tip>
+> [!TIP]
+> 💡在 NLP 项目的早期阶段，一个好的做法是在小样本数据上训练一类“小”模型。这使你可以更快地调试和迭代端到端工作流。当你对结果有信心之后，你只需要通过简单地更改模型 checkpoint 就可以在较大规模数据上训练模型！
 
 让我们在一个小样本上测试 mT5  tokenizer 
 
@@ -306,11 +297,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 既然语料库已经预处理完毕，我们来看看一些常用的摘要指标。正如我们在下面即将看到的，在衡量机器生成的文本的质量方面没有灵丹妙药。
 
-<Tip>
-
-💡 你可能已经注意到我们在上面的 `Dataset.map()` 函数中使用了 `batched=True` 。这将以 1000（默认值）的 batch size 对示例继续编码，并让你可以利用 🤗 Transformers 中快速 tokenizer 的多线程功能。在可能的情况下，尝试使用 `batched=True` 来加速你的预处理！
-
-</Tip>
+> [!TIP]
+> 💡 你可能已经注意到我们在上面的 `Dataset.map()` 函数中使用了 `batched=True` 。这将以 1000（默认值）的 batch size 对示例进行编码，并让你可以利用 🤗 Transformers 中快速 tokenizer 的多线程功能。在可能的情况下，尝试使用 `batched=True` 来加速你的预处理！
 
 ## 文本摘要的评估指标 [[文本摘要的评估指标]]
 
@@ -326,11 +314,8 @@ reference_summary = "I loved reading the Hunger Games"
 ```
 比较它们的一种方法是计算重叠单词的数量，在这个例子中为 6。然而，这种方法有些粗糙，因此 ROUGE 是基于计算计算重叠部分的 `精确度(Precision)` 和 `召回率(Recall)` 分数来计算的。
 
-<Tip>
-
-🙋 如果这是你第一次听说精确度（Precision）和召回率（Recall），请不要担心——我们将一起通过一些清晰的示例来理解它们。这些指标通常在分类任务中遇到，所以如果你想了解在分类任务中精确度（Precision）和召回率（Recall）是如何定义的，我们建议你查看 `scikit-learn` 的 [指南](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html) 。
-
-</Tip>
+> [!TIP]
+> 🙋 如果这是你第一次听说精确度（Precision）和召回率（Recall），请不要担心——我们将一起通过一些清晰的示例来理解它们。这些指标通常在分类任务中遇到，所以如果你想了解在分类任务中精确度（Precision）和召回率（Recall）是如何定义的，我们建议你查看 `scikit-learn` 的 [指南](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html) 。
 
 对于 ROUGE，召回率衡量的是参考摘要中被生成摘要捕获的内容量。如果我们只是比较单词，召回率可以按照以下公式计算：
 
@@ -382,11 +367,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 ```
 太好了，精确度和召回率的数字都对上了！那么其他的 ROUGE 得分表示什么含义呢？ `rouge2` 度量了二元词组（考虑单词对的重叠）之间的重叠，而 `rougeL` 和 `rougeLsum` 通过寻找生成的摘要和参考摘要中最长的公共子串来度量单词的最长匹配序列。 `rougeLsum` 中的“sum”指的是该指标是在整个摘要上计算的，而 `rougeL` 是指在各个句子上计算的平均值。
 
-<Tip>
-
-✏️ **试试看！** 自己手动创建一个生成摘要和参考摘要，看看使用 evaluate 得出的 ROUGE 分数是否与基于精确度和召回率公式的手动计算一致。附加的挑战：将文本切分为长度为2的词组，并手动计算精度和召回率与 `rouge2` 指标的精确度和召回率进行对比。
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 自己手动创建一个生成摘要和参考摘要，看看使用 evaluate 得出的 ROUGE 分数是否与基于精确度和召回率公式的手动计算一致。附加的挑战：将文本切分为长度为 2 的词组，手动计算精确度和召回率，并与 `rouge2` 指标的精确度和召回率进行对比。
 
 我们将使用这些 ROUGE 分数来跟踪我们模型的性能，但在此之前，让我们做每个优秀的 NLP 从业者都应该做的事情：创建一个强大而简单的 baseline！
 
@@ -476,11 +458,8 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 如果你想知道为什么在实例化的过程中没有看到任何关于微调模型的警告，那是因为对于序列到序列的任务，我们保留了网络的所有权重。与此相比，在 [第三章](https://chat.openai.com/course/chapter3) 中的文本分类模型中，我们用一个随机初始化的网络替换了预训练模型的头部。
-
-</Tip>
+> [!TIP]
+> 💡 如果你想知道为什么在实例化的过程中没有看到任何关于微调模型的警告，那是因为对于序列到序列的任务，我们保留了网络的所有权重。与此相比，在 [第三章](/course/chapter3) 中的文本分类模型中，我们用一个随机初始化的网络替换了预训练模型的头部。
 
 我们需要做的下一件事是登录 Hugging Face Hub。如果你在 notebook 中运行此代码，则可以使用以下实用程序函数进行此操作：
 
@@ -845,11 +824,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨如果你在 TPU 上进行训练，则需要将上述所有代码移动到专门的训练函数中。有关 TPU 的详细信息，请回顾 [第三章](/course/chapter3) 。
-
-</Tip>
+> [!TIP]
+> 🚨如果你在 TPU 上进行训练，则需要将上述所有代码移动到专门的训练函数中。有关 TPU 的详细信息，请回顾 [第三章](/course/chapter3) 。
 
 现在我们已经准备好了我们的对象，还有三个事情需要做
 
diff --git a/chapters/zh-CN/chapter7/6.mdx b/chapters/zh-CN/chapter7/6.mdx
index 83028b05f..92d47f391 100644
--- a/chapters/zh-CN/chapter7/6.mdx
+++ b/chapters/zh-CN/chapter7/6.mdx
@@ -245,11 +245,8 @@ DatasetDict({
 
 既然我们已经准备好了数据集，那就来设置模型吧！
 
-<Tip>
-
-✏️ **试一试！**这里我们删除了所有小于设定的上下文大小的块，并不会造成大问题，因为我们使用的是比较小的上下文窗口。随着增大上下文大小（或者语料库中的文档长度都很短），被抛弃的块的比例也会增加。更有效方法是将所有 tokenize 后的样本拼接起来加入一个 batch 中，每个样本之间有一个 `eos_token_id` token 作为分隔，然后对连接后的序列进行切块处理。作为练习，修改 `tokenize()` 函数以利用这种方法。请注意，为了获取完整的 token  ID 序列你需要设置 `truncation=False` ，并删除 tokenizer 中的其他参数。
-
-</Tip>
+> [!TIP]
+> ✏️ **试一试！**这里我们删除了所有小于设定的上下文大小的块，并不会造成大问题，因为我们使用的是比较小的上下文窗口。随着增大上下文大小（或者语料库中的文档长度都很短），被抛弃的块的比例也会增加。更有效的方法是将所有 tokenize 后的样本拼接起来加入一个 batch 中，每个样本之间有一个 `eos_token_id` token 作为分隔，然后对连接后的序列进行切块处理。作为练习，修改 `tokenize()` 函数以利用这种方法。请注意，为了获取完整的 token ID 序列，你需要设置 `truncation=False` ，并删除 tokenizer 中的其他参数。
 
 ## 初始化一个新模型 [[初始化一个新模型]]
 
@@ -390,11 +387,8 @@ tf_eval_dataset = model.prepare_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️ 输入序列和目标序列对齐将在模型内部自动进行，所以数据整理器只需复制输入序列来创建目标序列。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 输入序列和目标序列对齐将在模型内部自动进行，所以数据整理器只需复制输入序列来创建目标序列。
 
 现在我们已经准备好了所有东西，可以开始训练我们的模型了——好像也不是那么困难！在我们开始训练之前，我们应该登录到 Hugging Face。如果你正在使用 Notebook 运行代码，你可以使用下面的实用函数进行登录：
 
@@ -492,25 +486,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
-
-✏️ **试试看！** 除了 `TrainingArguments` 之外，我们只需要大约 30 行代码就可以从原始文本到训练 GPT-2。用你自己的数据集试试看，看看你能不能得到好的结果！
-
-</Tip>
+> [!TIP]
+> ✏️ **试试看！** 除了 `TrainingArguments` 之外，我们只需要大约 30 行代码就可以从原始文本到训练 GPT-2。用你自己的数据集试试看，看看你能不能得到好的结果！
 
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 如果你能使用多 GPU 的机器，尝试在那里运行代码。 `Trainer` 自动管理多台机器，这能极大地加快训练速度。
-
-{:else}
-
-💡 如果你正在使用具有多个 GPU 的计算机，则可以尝试使用 `MirroredStrategy` 上下文来大幅加快训练速度。你需要创建一个 `tf.distribute.MirroredStrategy` 对象，并确保所有的 `to_tf_dataset` 或 `prepare_tf_dataset()` 方法以及模型创建和对 `fit()` 的调用都在其 `scope()` 上下文中运行。你可以在 [这里](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit) 查看有关此内容的文档。
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 如果你能使用多 GPU 的机器，尝试在那里运行代码。 `Trainer` 自动管理多台机器，这能极大地加快训练速度。
+>
+> {:else}
+>
+> 💡 如果你正在使用具有多个 GPU 的计算机，则可以尝试使用 `MirroredStrategy` 上下文来大幅加快训练速度。你需要创建一个 `tf.distribute.MirroredStrategy` 对象，并确保所有的 `to_tf_dataset` 或 `prepare_tf_dataset()` 方法以及模型创建和对 `fit()` 的调用都在其 `scope()` 上下文中运行。你可以在 [这里](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit) 查看有关此内容的文档。
+>
+> {/if}
 
 ## 使用 pipeline 进行代码生成 [[使用管道生成代码]]
 
@@ -786,11 +774,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 如果你在 TPU 上训练，你需要将上述单元格开始的所有代码移到一个专门的训练函数中。更多详情请参阅 [第三章](/course/chapter3) 。
-
-</Tip>
+> [!TIP]
+> 🚨 如果你在 TPU 上训练，你需要将上述单元格开始的所有代码移到一个专门的训练函数中。更多详情请参阅 [第三章](/course/chapter3) 。
 
 现在我们已经将我们的 `train_dataloader` 传递给了 `accelerator.prepare()` ，我们可以使用 `len()` 来计算训练步骤的数量。请记住，我们应该在准备好 `dataloader` 后再使用 `len()` ，因为改动 `dataloader` 会改变其长度。我们使用一个从学习率衰减到 0 的经典线性学习率调度：
 
@@ -886,16 +871,10 @@ for epoch in range(num_train_epochs):
 
 就是这样 - 你现在拥有自己的因果语言模型（例如 GPT-2）的自定义训练循环，你可以根据自己的需要进一步定制。
 
-<Tip>
-
-✏️ **试试看！** 创建适合你的用例的自定义损失函数，或在训练循环中添加另一个自定义步骤。
-
-</Tip>
-
-<Tip>
-
-✏️ **试试看！**  当运行长时间的训练实验时，使用 TensorBoard 或 Weights & Biases 等工具记录重要指标是个好主意。向训练循环中添加适当的日志记录，这样你可以随时检查训练进度。
+> [!TIP]
+> ✏️ **试试看！** 创建适合你的用例的自定义损失函数，或在训练循环中添加另一个自定义步骤。
 
-</Tip>
+> [!TIP]
+> ✏️ **试试看！**  当运行长时间的训练实验时，使用 TensorBoard 或 Weights & Biases 等工具记录重要指标是个好主意。向训练循环中添加适当的日志记录，这样你可以随时检查训练进度。
 
 {/if}
\ No newline at end of file
diff --git a/chapters/zh-CN/chapter7/7.mdx b/chapters/zh-CN/chapter7/7.mdx
index efeda62f6..220530c0e 100644
--- a/chapters/zh-CN/chapter7/7.mdx
+++ b/chapters/zh-CN/chapter7/7.mdx
@@ -32,11 +32,8 @@
 
 本节使用的代码已经上传到了 Hub。你可以在 [这里](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F) 找到它并尝试用它进行预测。
 
-<Tip>
-
-💡 像 BERT 这样的纯编码器模型往往很擅长提取诸如 “谁发明了 Transformer 架构？”之类的事实性问题的答案。但在给出诸如 “为什么天空是蓝色的？” 之类的开放式问题时表现不佳。在这些更具挑战性的情况下，通常使用编码器-解码器模型如 T5 和 BART 来以类似于 [文本摘要](https://chat.openai.com/course/chapter7/5) 的方式整合信息。如果你对这种 `生成式（generative）` 问答感兴趣，我们推荐你查看我们做的基于 [ELI5 数据集](https://huggingface.co/datasets/eli5) 的 [演示demo](https://yjernite.github.io/lfqa.html) 。
-
-</Tip>
+> [!TIP]
+> 💡 像 BERT 这样的纯编码器模型往往很擅长提取诸如 “谁发明了 Transformer 架构？”之类的事实性问题的答案。但在给出诸如 “为什么天空是蓝色的？” 之类的开放式问题时表现不佳。在这些更具挑战性的情况下，通常使用编码器-解码器模型如 T5 和 BART 来以类似于 [文本摘要](/course/chapter7/5) 的方式整合信息。如果你对这种 `生成式（generative）` 问答感兴趣，我们推荐你查看我们做的基于 [ELI5 数据集](https://huggingface.co/datasets/eli5) 的 [演示demo](https://yjernite.github.io/lfqa.html) 。
 
 ## 准备数据 [[准备数据]]
 
@@ -358,11 +355,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 确实，我们在 Context 中没有看到答案。
 
-<Tip>
-
-✏️ **轮你来了！** 在使用 XLNet 架构时，如果截取后的文本长度没有达到设定的最大长度，需要在左侧进行填充，并且需要交互问题和 Context 的顺序。尝试将我们刚刚看到的所有代码调整为 XLNet 架构（并添加 `padding=True` ）。请注意，因为是在左侧填充的，所以填充后的 `[CLS]` tokens 可能不在索引为 0 的位置。
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 在使用 XLNet 架构时，如果截取后的文本长度没有达到设定的最大长度，需要在左侧进行填充，并且需要交换问题和 Context 的顺序。尝试将我们刚刚看到的所有代码调整为 XLNet 架构（并添加 `padding=True` ）。请注意，因为是在左侧填充的，所以填充后的 `[CLS]` token 可能不在索引为 0 的位置。
 
 现在，我们已经逐步了解了如何预处理我们的训练数据，接下来可以将其组合到一个函数中，并使用该函数处理整个训练数据集。我们将每个拆分后的样本都填充到我们设置的最大长度，因为大多数上下文都很长（相应的样本会被分割成几小块），所以在这里进行动态填充的所带来的增益不是很大。
 
@@ -910,11 +904,8 @@ tf.keras.mixed_precision.set_global_policy("mixed_float16")
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 如果你正在使用的输出目录已经存在一个同名的文件，则它需要是你要推送到的存储库克隆在本地的版本（因此，如果在定义你的 `Trainer` 时出现错误，请设置一个新的名称）。
-
-</Tip>
+> [!TIP]
+> 💡 如果你正在使用的输出目录已经存在一个同名的文件，则它需要是你要推送到的存储库克隆在本地的版本（因此，如果在定义你的 `Trainer` 时出现错误，请设置一个新的名称）。
 
 最后，我们只需将所有内容传递给 `Trainer` 类并启动训练：
 
@@ -997,11 +988,8 @@ trainer.push_to_hub(commit_message="Training complete")
 
 在这个阶段，你可以使用模型库中的推理小部件来测试模型，并与你的朋友、家人和同伴分享。恭喜你成功地在问答任务上对模型进行了微调！
 
-<Tip>
-
-✏️ **轮到你了！** 尝试使用另一个模型架构，看看它在这个任务上表现得是否更好！
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 尝试使用另一个模型架构，看看它在这个任务上表现得是否更好！
 
 {#if fw === 'pt'}
 
diff --git a/chapters/zh-CN/chapter8/2.mdx b/chapters/zh-CN/chapter8/2.mdx
index 87d549896..ea88969fe 100644
--- a/chapters/zh-CN/chapter8/2.mdx
+++ b/chapters/zh-CN/chapter8/2.mdx
@@ -85,11 +85,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 这些报告中包含很多信息，让我们一起来看看关键部分。阅读这样的报告时的阅读顺序比较特殊，应该按照从底部到顶部的顺序阅读，如果你习惯于从上到下阅读文本，这可能听起来很奇怪，但它反映了一个事实：traceback 显示了在下载模型和 tokenizer 时 `pipeline` 函数调用的顺序。（查看 [第二章](/course/chapter2) 了解有关 `pipeline` 内部原理的更多详细信息。）
 
-<Tip>
-
-🚨 看到 Google Colab 中 traceback 中间 “6 frames” 的椭圆形蓝色框了吗？这是 Colab 的一个特殊功能，它会自动将 traceback 的中间部分压缩为“frames”。如果你无法找到错误的来源，可以通过单击这两个小箭头来展开完整的 traceback。
-
-</Tip>
+> [!TIP]
+> 🚨 看到 Google Colab 中 traceback 中间 “6 frames” 的椭圆形蓝色框了吗？这是 Colab 的一个特殊功能，它会自动将 traceback 的中间部分压缩为“frames”。如果你无法找到错误的来源，可以通过单击这两个小箭头来展开完整的 traceback。
 
 这意味着 traceback 的最后一行显示的是最后一条错误消息和引发的异常名称。在这里，异常类型是 `OSError` ，表示这个错误与系统相关。如果我们阅读随之附着的错误消息，我们就可以看到模型的 `config.json` 文件似乎有问题，这里给出了两个修复的建议：
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 如果你遇到难以理解的错误消息，只需将该消息复制并粘贴到 Google 或 [Stack Overflow](https://stackoverflow.com) 搜索栏中。你很有可能不是第一个遇到错误的人，因此很有可能在社区中找到其他人发布的解决方案。例如，在 Stack Overflow 上搜索 `OSError: Can't load config for` 给出了几个 [结果](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) ，可以作为你解决问题的起点。
-
-</Tip>
+> [!TIP]
+> 💡 如果你遇到难以理解的错误消息，只需将该消息复制并粘贴到 Google 或 [Stack Overflow](https://stackoverflow.com) 搜索栏中。你很有可能不是第一个遇到错误的人，因此很有可能在社区中找到其他人发布的解决方案。例如，在 Stack Overflow 上搜索 `OSError: Can't load config for` 给出了几个 [结果](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+) ，可以作为你解决问题的起点。
 
 第一个建议是检查模型 ID 是否真的正确，所以首先要做的就是复制标签并将其粘贴到 Hub 的搜索栏中：
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 在这里采用的方法并不是百分之百可靠的，因为你的同事可能在微调模型之前已经调整了 `distilbert-base-uncased` 配置。在现实情况中，我们应该先与他们核实，但在本节中，我们先假设他们使用的是默认配置。
-
-</Tip>
+> [!WARNING]
+> 🚨 在这里采用的方法并不是百分之百可靠的，因为你的同事可能在微调模型之前已经调整了 `distilbert-base-uncased` 配置。在现实情况中，我们应该先与他们核实，但在本节中，我们先假设他们使用的是默认配置。
 
 上一步成功后，可以使用 `config` 的 `push_to_hub()` 方法将其上传到模型仓库：
 
diff --git a/chapters/zh-CN/chapter8/4.mdx b/chapters/zh-CN/chapter8/4.mdx
index d7ce5d279..2ca008b5e 100644
--- a/chapters/zh-CN/chapter8/4.mdx
+++ b/chapters/zh-CN/chapter8/4.mdx
@@ -243,11 +243,8 @@ trainer.train_dataset.features["label"].names
 
 我们这里没有 token 类型 ID，因为 DistilBERT 不需要它们；如果你的模型中有，你还应该确保 token 类型 ID 可以正确匹配输入中第一句和第二句的位置。
 
-<Tip>
-
-✏️ **轮到你了！** 检查训练数据集的第二个条数据是否正确。
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 检查训练数据集的第二条数据是否正确。
 
 我们在这里只对训练集进行检查，但你当然应该以同样的方式仔细检查验证集和测试集。
 
@@ -520,11 +517,8 @@ trainer.optimizer.step()
 
 要解决这个问题，你只需要使用更少的显存—这往往说起来容易做起来难。首先，确保你没有同时在 GPU 上运行两个模型（当然，除非在解决问题时必须要这样做）。然后，你可能应该减少 batch 的大小，因为它直接影响模型的所有中间输出的大小及其梯度。如果问题仍然存在，请考虑使用较小版本的模型，或者更换有更大显存的设备。
 
-<Tip>
-
-在课程的下一部分中，我们将介绍更先进的技术，这些技术可以帮助你减少内存占用并让你微调超大的模型。
-
-</Tip>
+> [!TIP]
+> 在课程的下一部分中，我们将介绍更先进的技术，这些技术可以帮助你减少内存占用并让你微调超大的模型。
 
 ### 评估模型 [[评估模型]]
 
@@ -551,11 +545,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 你应该始终确保在启动 `trainer.train()` 之前 `trainer.evaluate()` 是可以运行的，以避免在遇到错误之前浪费大量计算资源。
-
-</Tip>
+> [!TIP]
+> 💡 你应该始终确保在启动 `trainer.train()` 之前 `trainer.evaluate()` 是可以运行的，以避免在遇到错误之前浪费大量计算资源。
 
 在尝试调试评估循环中的问题之前，你应该首先确保你已经检查了数据，能够正确地形成了 batch 并且可以在其上运行你的模型。我们已经完成了所有这些步骤，因此可以执行以下代码而不会出错：
 
@@ -684,11 +675,8 @@ trainer.train()
 ```
 
 在这种情况下，如果没有更多错误，我们的脚本将微调一个应该给出合理结果的模型。但是，如果训练没有任何错误，而训练出来的模型根本表现不佳，我们该怎么办？这是机器学习中最难的部分，我们将向你展示一些可以帮助解决这类问题的技巧。
-<Tip>
-
-💡 如果你使用的是手动训练循环，调试训练流程时也需要遵循相同的步骤，而且更容易将训练中的各个步骤分开调试。但是，请确保你没有忘记在合适的位置调用 `model.eval()` 或 `model.train()` ，也不要忘记在每个步骤中使用 `zero_grad()` ！
-
-</Tip>
+> [!TIP]
+> 💡 如果你使用的是手动训练循环，调试训练流程时也需要遵循相同的步骤，而且更容易将训练中的各个步骤分开调试。但是，请确保你没有忘记在合适的位置调用 `model.eval()` 或 `model.train()` ，也不要忘记在每个步骤中使用 `zero_grad()` ！
 
 ## 在训练期间调试静默（没有任何错误提示）错误 [[在训练期间调试静默（没有任何错误提示）错误]]
 
@@ -703,11 +691,8 @@ trainer.train()
 - 有没有一个标签比其他标签更常见？
 - 如果模型预测的答案是随机的或总是相同的，那么 loss/ 评估指标应该是多少，是否模型根本没能学到任何知识？
 
-<Tip warning={true}>
-
-⚠️ 如果你正在进行分布式训练，请在每个进程中打印数据集的样本并仔细核对，确保你得到的是相同的内容。一个常见的错误是在数据创建过程中有一些随机性，导致每个进程具有不同版本的数据集。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果你正在进行分布式训练，请在每个进程中打印数据集的样本并仔细核对，确保你得到的是相同的内容。一个常见的错误是在数据创建过程中有一些随机性，导致每个进程具有不同版本的数据集。
 
 在检查数据后，可以检查模型的一些预测并对其进行解码。 如果模型总是预测同样的类别，那么可能是因为这个类别在数据集中的比例比较高（针对分类问题）； 过采样稀有类等技术可能会对解决这种问题有帮助。或者，这也可能是由训练的设置（如错误的超参数设置）引起的。
 
@@ -736,11 +721,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 如果你的训练数据不平衡，请确保构建一批包含所有标签的训练数据。
-
-</Tip>
+> [!TIP]
+> 💡 如果你的训练数据不平衡，请确保构建一批包含所有标签的训练数据。
 
 生成的模型在一个 `batch` 上应该有接近完美的结果。让我们计算结果预测的评估指标：
 
@@ -761,11 +743,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 如果你没有设法让你的模型获得这样的完美结果，这意味着构建问题的方式或数据有问题。只有当你通过了过拟合测试，才能确定你的模型理论上确实可以学到一些东西。
 
-<Tip warning={true}>
-
-⚠️ 在此测试之后，你需要创建模型和 `Trainer` ，因为获得的模型可能无法在你的完整数据集上恢复和学习有用的东西。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 在此测试之后，你需要重新创建模型和 `Trainer` ，因为获得的模型可能无法在你的完整数据集上恢复和学习有用的东西。
 
 ### 在你有第一个 baseline 模型之前不要调整任何东西 [[在你有第一个baseline 模型之前不要调整任何东西]]
 
diff --git a/chapters/zh-CN/chapter8/4_tf.mdx b/chapters/zh-CN/chapter8/4_tf.mdx
index f70e60508..4149d19ba 100644
--- a/chapters/zh-CN/chapter8/4_tf.mdx
+++ b/chapters/zh-CN/chapter8/4_tf.mdx
@@ -110,19 +110,16 @@ model.compile(optimizer="adam")
 
 现在我们可以使用模型的内部自动计算损失，这个问题解决了！
 
-<Tip>
-
-✏️ **轮到你了！** 作为解决其他问题后的可选挑战，你可以尝试回到这一步，让模型使用原始 Keras 计算的损失而不是内部损失。 你需要将 `"labels"` 添加到 `to_tf_dataset()` 的 `label_cols` 参数，并且确保 `to_tf_dataset()` 输出真实的标签来提供梯度，但是我们指定的损失还有一个问题。即使在这个问题上可以进行训练，学习速度仍然会非常慢，并且 loss 会达到一个较高的值。你能找出问题在哪里吗？
-
-如果你卡住了，这是一个 ROT13 编码的提示（如果你不熟悉 ROT13，可以在[这里](https://rot13.com/)解码。）：Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?（查看 Transformers 中 `SequenceClassiﬁcation` 模型的输出，第一个输出是“logits”。思考什么是 logits ？它所代表的实际含义是什么？）
-
-还有一个提示：
-
-Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?（当训练模型时直接使用字符串告诉模型指定优化器、激活函数或损失函数时，Keras 会使用优化器、激活函数或损失函数参数值的默认值。
-`SparseCategoricalCrossentropy` 损失函数有哪些参数，它们的默认值是什么？
-）
-
-</Tip>
+> [!TIP]
+> ✏️ **轮到你了！** 作为解决其他问题后的可选挑战，你可以尝试回到这一步，让模型使用原始 Keras 计算的损失而不是内部损失。 你需要将 `"labels"` 添加到 `to_tf_dataset()` 的 `label_cols` 参数，并且确保 `to_tf_dataset()` 输出真实的标签来提供梯度，但是我们指定的损失还有一个问题。即使在这个问题上可以进行训练，学习速度仍然会非常慢，并且 loss 会达到一个较高的值。你能找出问题在哪里吗？
+>
+> 如果你卡住了，这是一个 ROT13 编码的提示（如果你不熟悉 ROT13，可以在[这里](https://rot13.com/)解码。）：Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?（查看 Transformers 中 `SequenceClassiﬁcation` 模型的输出，第一个输出是“logits”。思考什么是 logits ？它所代表的实际含义是什么？）
+>
+> 还有一个提示：
+>
+> Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?（当训练模型时直接使用字符串告诉模型指定优化器、激活函数或损失函数时，Keras 会使用优化器、激活函数或损失函数参数值的默认值。
+> `SparseCategoricalCrossentropy` 损失函数有哪些参数，它们的默认值是什么？
+> ）
 
 现在让我们尝试继续进行训练。 如今已经得到梯度，所以希望（此处播放令人不安的音乐）只需调用`model.fit()`，一切都会正常工作！
 
@@ -363,11 +360,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡除了从 Keras 中导入 `Adam` 优化器你还可以从🤗 Transformers 中导入 `create_optimizer()` 函数，这将提供具有正确的权重衰减和学习率预热和衰减的 AdamW 优化器。 此优化器通常会比使用默认 `Adam` 优化器的效果稍好一些。
-
-</Tip>
+> [!TIP]
+> 💡除了从 Keras 中导入 `Adam` 优化器你还可以从🤗 Transformers 中导入 `create_optimizer()` 函数，这将提供具有正确的权重衰减和学习率预热和衰减的 AdamW 优化器。 此优化器通常会比使用默认 `Adam` 优化器的效果稍好一些。
 
 现在，我们可以使用改进后的学习率来拟合模型：
 
@@ -389,11 +383,8 @@ model.fit(train_dataset)
 
 内存不足指的是`OOM when allocating tensor`之类的错误——OOM 是`out of memory`的缩写。 在处理大型语言模型时，这是一个非常常见的错误。 如遇此种情况，可以将 batch size 减半并重试。 但有些尺寸非常大，比如全尺寸 GPT-2 的参数为 1.5B，这意味着你将需要 6 GB 的内存来存储模型，另外需要 6 GB 的内存用于梯度下降！ 无论你使用什么 batch size ，训练完整的 GPT-2 模型通常都需要超过 20 GB 的 VRAM，然而这只有少数 GPU 才可以做到。 像`distilbert-base-cased`这样更轻量级的模型更容易训练，并且训练速度也更快。
 
-<Tip>
-
-在课程的下一部分中，我们将介绍更先进的技术，这些技术可以帮助你减少内存占用并微调超大的模型。
-
-</Tip>
+> [!TIP]
+> 在课程的下一部分中，我们将介绍更先进的技术，这些技术可以帮助你减少内存占用并微调超大的模型。
 
 ### TensorFlow 🦛饿饿 [[TensorFlow 🦛饿饿]]
 
@@ -446,21 +437,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 如果训练数据不平衡，请确保构建的这个 batch 包含训练数据中所有的标签。
-
-</Tip>
+> [!TIP]
+> 💡 如果训练数据不平衡，请确保构建的这个 batch 包含训练数据中所有的标签。
 
 生成的模型在一个 batch 上应该有接近完美的结果，损失迅速下降到 0（或你正在使用的损失的最小值）。
 
 如果你没有设法让你的模型获得这样的完美结果，这意味着构建问题的方式或数据有问题。只有当你通过了过拟合测试，才能确定你的模型理论上确实可以学到一些东西。
 
-<Tip warning={true}>
-
-⚠️ 在此测试之后，你需要创建模型和 `Trainer` ，因为经过过拟合测试的模型可能无法恢复正常的参数范围，从而无法在完整数据集上学到有用的知识。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 在此测试之后，你需要重新创建模型和 `Trainer` ，因为经过过拟合测试的模型可能无法恢复正常的参数范围，从而无法在完整数据集上学到有用的知识。
 
 ### 在你有第一个 baseline 模型之前不要调整任何东西 [[在你有第一个 baseline 模型之前不要调整任何东西]]
 
diff --git a/chapters/zh-CN/chapter8/5.mdx b/chapters/zh-CN/chapter8/5.mdx
index d5cb0793e..4316dd9b7 100644
--- a/chapters/zh-CN/chapter8/5.mdx
+++ b/chapters/zh-CN/chapter8/5.mdx
@@ -17,11 +17,8 @@
 
 创建一个最小可复现的示例非常重要，因为 Hugging Face 团队中没有人是魔术师（至少目前还没有），他们不能修复他们看不到的问题。一个最小可复现的示例应该是可复现的，这意味着它不应依赖于你可能有的任何外部文件或数据。尝试用一些看起来像真实数据的虚拟值替换你正在使用的数据，并且仍然产生相同的错误。
 
-<Tip>
-
-🚨🤗 Transformers 仓库中有很多未解决的问题，因为无法访问复现这些问题的数据。
-
-</Tip>
+> [!TIP]
+> 🚨🤗 Transformers 仓库中有很多未解决的问题，因为无法访问复现这些问题的数据。
 
 在创建了一个包含所遇见的问题示例后，你可以尝试将其进一步简化，构建我们所说的“最小可复现示例”。虽然这需要你多做一些工作，但如果你提供了一个简洁明了的 bug 复现，那么几乎可以肯定会得到帮助和修复。
 
diff --git a/chapters/zh-CN/chapter9/7.mdx b/chapters/zh-CN/chapter9/7.mdx
index df418c83d..afcdf597e 100644
--- a/chapters/zh-CN/chapter9/7.mdx
+++ b/chapters/zh-CN/chapter9/7.mdx
@@ -61,9 +61,8 @@ demo.launch()
 上面简单示例介绍了 Blocks 的 4 个基本概念：
 
 1. 通过在 `with gradio.Blocks` 上下文中实例化 Python 对象，Blocks 支持构建组合了 markdown、HTML、按钮和交互式组件的 Web 网页。
-<Tip>
-🙋如果你不熟悉 Python 中的 `with` 语句，我们建议你先查看优秀的 [realpython 教程](https://realpython.com/python-with-statement) 后再回来查看🤗。
-</Tip>
+> [!TIP]
+> 🙋如果你不熟悉 Python 中的 `with` 语句，我们建议你先查看优秀的 [realpython 教程](https://realpython.com/python-with-statement) 后再回来查看🤗。
 实例化组件的顺序很重要，因为每个元素都按照创建的顺序渲染到 Web 网页中。（更复杂的布局将在下面讨论）
 
 2. 你可以在代码中的任何位置定义常规 Python 函数，并指定 `Blocks` 在用户输入的情况下运行它们。在我们的示例中，我们使用了一个可以“翻转”输入的文本简单的函数，这个函数可以是任意的Python 函数，从简单的计算到处理来自机器学习模型的预测等。
diff --git a/chapters/zh-TW/chapter1/3.mdx b/chapters/zh-TW/chapter1/3.mdx
index 206c7dd68..513f7dfd5 100644
--- a/chapters/zh-TW/chapter1/3.mdx
+++ b/chapters/zh-TW/chapter1/3.mdx
@@ -8,11 +8,10 @@
 ]} />
 
 在本節中，我們將看看 Transformer 模型可以做什麼，並使用 🤗 Transformers 庫中的第一個工具：pipeline() 函數。
-<Tip>
-👀 看到那個右上角的 <em>在Colab中打開</em>的按鈕了嗎? 單擊它就可以打開一個包含本節所有代碼示例的 Google Colab 筆記本。 每一個有實例代碼的小節都會有它。
-
-如果您想在本地運行示例，我們建議您查看<a href="/course/chapter0">準備</a>.
-</Tip>
+> [!TIP]
+> 👀 看到那個右上角的 <em>在Colab中打開</em>的按鈕了嗎? 單擊它就可以打開一個包含本節所有代碼示例的 Google Colab 筆記本。 每一個有實例代碼的小節都會有它。
+>
+> 如果您想在本地運行示例，我們建議您查看<a href="/course/chapter0">準備</a>.
 
 ## Transformer被應用於各個方面！
 Transformer 模型用於解決各種 NLP 任務，就像上一節中提到的那樣。以下是一些使用 Hugging Face 和 Transformer 模型的公司和組織，他們也通過分享他們的模型回饋社區：
@@ -21,9 +20,8 @@ Transformer 模型用於解決各種 NLP 任務，就像上一節中提到的那
 
 [🤗 Transformers 庫](https://github.com/huggingface/transformers)提供了創建和使用這些共享模型的功能。[模型中心（hub）](https://huggingface.co/models)包含數千個任何人都可以下載和使用的預訓練模型。您還可以將自己的模型上傳到 Hub！
 
-<Tip>
-⚠️ Hugging Face Hub 不限於 Transformer 模型。任何人都可以分享他們想要的任何類型的模型或數據集！創建一個 Huggingface.co 帳戶(https://huggingface.co/join)以使用所有可用功能！
-</Tip>
+> [!TIP]
+> ⚠️ Hugging Face Hub 不限於 Transformer 模型。任何人都可以分享他們想要的任何類型的模型或數據集！創建一個 Huggingface.co 帳戶(https://huggingface.co/join)以使用所有可用功能！
 
 在深入研究 Transformer 模型的底層工作原理之前，讓我們先看幾個示例，看看它們如何用於解決一些有趣的 NLP 問題。
 
@@ -97,9 +95,8 @@ classifier(
 ```
 
 此pipeline稱為zero-shot，因為您不需要對數據上的模型進行微調即可使用它。它可以直接返回您想要的任何標籤列表的概率分數！
-<Tip>
-✏️**快來試試吧！** 使用您自己的序列和標籤，看看模型的行為。
-</Tip>
+> [!TIP]
+> ✏️**快來試試吧！** 使用您自己的序列和標籤，看看模型的行為。
 
 ## 文本生成
 現在讓我們看看如何使用pipeline來生成一些文本。這裡的主要使用方法是您提供一個提示，模型將通過生成剩餘的文本來自動完成整段話。這類似於許多手機上的預測文本功能。文本生成涉及隨機性，因此如果您沒有得到相同的如下所示的結果，這是正常的。
@@ -119,9 +116,8 @@ generator("In this course, we will teach you how to")
 ```
 您可以使用參數 **num_return_sequences** 控制生成多少個不同的序列，並使用參數 **max_length** 控制輸出文本的總長度。
 
-<Tip>
-✏️**快來試試吧！** 使用 num_return_sequences 和 max_length 參數生成兩個句子，每個句子 15 個單詞。
-</Tip>
+> [!TIP]
+> ✏️**快來試試吧！** 使用 num_return_sequences 和 max_length 參數生成兩個句子，每個句子 15 個單詞。
 
 ## 在pipeline中使用 Hub 中的其他模型
 前面的示例使用了默認模型，但您也可以從 Hub 中選擇特定模型以在特定任務的pipeline中使用 - 例如，文本生成。轉到[模型中心（hub）](https://huggingface.co/models)並單擊左側的相應標籤將會只顯示該任務支持的模型。[例如這樣](https://huggingface.co/models?pipeline_tag=text-generation)。
@@ -148,9 +144,8 @@ generator(
 您可以通過單擊語言標籤來篩選搜索結果，然後選擇另一種文本生成模型的模型。模型中心（hub）甚至包含支持多種語言的多語言模型。
 
 通過單擊選擇模型後，您會看到有一個小組件，可讓您直接在線試用。通過這種方式，您可以在下載之前快速測試模型的功能。
-<Tip>
-✏️**快來試試吧！** 使用標籤篩選查找另一種語言的文本生成模型。使用小組件測試並在pipeline中使用它！
-</Tip>
+> [!TIP]
+> ✏️**快來試試吧！** 使用標籤篩選查找另一種語言的文本生成模型。使用小組件測試並在pipeline中使用它！
 
 ## 推理 API
 所有模型都可以使用 Inference API 直接通過瀏覽器進行測試，該 API 可在 [Hugging Face 網站](https://huggingface.co/)上找到。通過輸入自定義文本並觀察模型的輸出，您可以直接在此頁面上使用模型。
@@ -177,9 +172,8 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
 ```
 **top_k** 參數控制要顯示的結果有多少種。請注意，這裡模型填充了特殊的< **mask** >詞，它通常被稱為掩碼標記。其他掩碼填充模型可能有不同的掩碼標記，因此在探索其他模型時要驗證正確的掩碼字是什麼。檢查它的一種方法是查看小組件中使用的掩碼。
 
-<Tip>
-✏️**快來試試吧！** 在 Hub 上搜索基於 bert 的模型並在推理 API 小組件中找到它的掩碼。這個模型對上面pipeline示例中的句子預測了什麼？
-</Tip>
+> [!TIP]
+> ✏️**快來試試吧！** 在 Hub 上搜索基於 bert 的模型並在推理 API 小組件中找到它的掩碼。這個模型對上面pipeline示例中的句子預測了什麼？
 
 ## 命名實體識別
 命名實體識別 (NER) 是一項任務，其中模型必須找到輸入文本的哪些部分對應於諸如人員、位置或組織之類的實體。讓我們看一個例子：
@@ -199,9 +193,8 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
 
 我們在pipeline創建函數中傳遞選項 **grouped_entities=True** 以告訴pipeline將對應於同一實體的句子部分重新組合在一起：這裡模型正確地將「Hugging」和「Face」分組為一個組織，即使名稱由多個詞組成。事實上，正如我們即將在下一章看到的，預處理甚至會將一些單詞分成更小的部分。例如，**Sylvain** 分割為了四部分：**S、##yl、##va** 和 **##in**。在後處理步驟中，pipeline成功地重新組合了這些部分。
 
-<Tip>
-✏️**快來試試吧！** 在模型中心（hub）搜索能夠用英語進行詞性標注（通常縮寫為 POS）的模型。這個模型對上面例子中的句子預測了什麼？
-</Tip>
+> [!TIP]
+> ✏️**快來試試吧！** 在模型中心（hub）搜索能夠用英語進行詞性標注（通常縮寫為 POS）的模型。這個模型對上面例子中的句子預測了什麼？
 
 ## 問答系統
 問答pipeline使用來自給定上下文的信息回答問題：
@@ -279,10 +272,7 @@ translator("Ce cours est produit par Hugging Face.")
 
 與文本生成和摘要一樣，您可以指定結果的 **max_length** 或 **min_length**。
 
-<Tip>
-
-✏️**快來試試吧！** 搜索其他語言的翻譯模型，嘗試將前一句翻譯成幾種不同的語言。
-
-</Tip>
+> [!TIP]
+> ✏️**快來試試吧！** 搜索其他語言的翻譯模型，嘗試將前一句翻譯成幾種不同的語言。
 
 到目前為止顯示的pipeline主要用於演示目的。它們是為特定任務而編程的，不能對他們進行自定義的修改。在下一章中，您將瞭解 **pipeline()** 函數內部的內容以及如何進行自定義的修改。
\ No newline at end of file
diff --git a/chapters/zh-TW/chapter2/1.mdx b/chapters/zh-TW/chapter2/1.mdx
index 618d780bb..04e0f1bf8 100644
--- a/chapters/zh-TW/chapter2/1.mdx
+++ b/chapters/zh-TW/chapter2/1.mdx
@@ -18,6 +18,5 @@
 
 然後我們來看看標記器API，它是 pipeline() 函數的另一個主要組件。它是作用分詞器負責第一個和最後一個處理步驟，處理從文本到神經網絡數字輸入的轉換，以及在需要時轉換回文本。最後，我們將向您展示如何處理在一個準備好的批處理中通過一個模型發送多個句子的問題，然後詳細介紹 pipeline() 函數。
 
-<Tip>
-⚠️ 為了從模型中心和 🤗Transformers 的所有可用功能中獲益，我們建議<a href="https://huggingface.co/join">creating an account</a>.
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ⚠️ 為了從模型中心和 🤗Transformers 的所有可用功能中獲益，我們建議<a href="https://huggingface.co/join">創建一個帳戶</a>。
\ No newline at end of file
diff --git a/chapters/zh-TW/chapter2/2.mdx b/chapters/zh-TW/chapter2/2.mdx
index fcacec271..1e990c884 100644
--- a/chapters/zh-TW/chapter2/2.mdx
+++ b/chapters/zh-TW/chapter2/2.mdx
@@ -22,9 +22,8 @@
 
 {/if}
 
-<Tip>
-這是第一部分，根據您使用 PyTorch 或者 TensorFlow，內容略有不同。點擊標題上方的平臺，選一個您喜歡的吧！
-</Tip>
+> [!TIP]
+> 這是第一部分，根據您使用 PyTorch 或者 TensorFlow，內容略有不同。點擊標題上方的平臺，選一個您喜歡的吧！
 
 {#if fw === 'pt'}
 <Youtube id="1pedAIvTWXk"/>
@@ -352,8 +351,5 @@ model.config.id2label
 
 我們已經成功地複製了管道的三個步驟：使用標記化器進行預處理、通過模型傳遞輸入以及後處理！現在，讓我們花一些時間深入瞭解這些步驟中的每一步。
 
-<Tip>
-
-✏️ **試試看！** 選擇兩個（或更多）你自己的文本並在管道中運行它們。然後自己複製在這裡看到的步驟，並檢查是否獲得相同的結果！
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 選擇兩個（或更多）你自己的文本並在管道中運行它們。然後自己複製在這裡看到的步驟，並檢查是否獲得相同的結果！
diff --git a/chapters/zh-TW/chapter2/4.mdx b/chapters/zh-TW/chapter2/4.mdx
index 93bb26e65..d7991f260 100644
--- a/chapters/zh-TW/chapter2/4.mdx
+++ b/chapters/zh-TW/chapter2/4.mdx
@@ -215,11 +215,8 @@ print(ids)
 
 這些輸出一旦轉換為適當的框架張量，就可以用作模型的輸入，如本章前面所見。
 
-<Tip>
-
-✏️ **試試看！** 在我們在第 2 節中使用的輸入句子（“I've been waiting for a HuggingFace course my whole life.”和“I hate this so much!”）複製最後兩個步驟（標記化和轉換為輸入 ID）。檢查您獲得的輸入 ID 是否與我們之前獲得的相同！
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 在我們在第 2 節中使用的輸入句子（“I've been waiting for a HuggingFace course my whole life.”和“I hate this so much!”）複製最後兩個步驟（標記化和轉換為輸入 ID）。檢查您獲得的輸入 ID 是否與我們之前獲得的相同！
 
 ## 解碼
 
diff --git a/chapters/zh-TW/chapter2/5.mdx b/chapters/zh-TW/chapter2/5.mdx
index add0490a6..8c3384fbe 100644
--- a/chapters/zh-TW/chapter2/5.mdx
+++ b/chapters/zh-TW/chapter2/5.mdx
@@ -187,10 +187,8 @@ batched_ids = [ids, ids]
 
 這是一批兩個相同的序列！
 
-<Tip>
-
-✏️ **試試看！** 將此列表轉換為張量並通過模型傳遞。檢查您是否獲得與之前相同的登錄（但是隻有兩次）
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 將此列表轉換為張量並通過模型傳遞。檢查您是否獲得與之前相同的 logits（但是有兩份）！
 
 批處理允許模型在輸入多個句子時工作。使用多個序列就像使用單個序列構建批一樣簡單。不過，還有第二個問題。當你試圖將兩個（或更多）句子組合在一起時，它們的長度可能不同。如果您以前使用過張量，那麼您知道它們必須是矩形，因此無法將輸入ID列表直接轉換為張量。為了解決這個問題，我們通常填充輸入。
 
@@ -324,11 +322,8 @@ tf.Tensor(
 
 請注意，第二個序列的最後一個值是一個填充ID，它在attention mask中是一個0值。
 
-<Tip>
-
-✏️ 試試看！在第2節中使用的兩個句子上手動應用標記化（“我一生都在等待Hugging Face課程。”和“我非常討厭這個！”）。通過模型傳遞它們，並檢查您是否獲得與第2節中相同的登錄。現在使用填充標記將它們批處理在一起，然後創建適當的注意掩碼。檢查通過模型時是否獲得相同的結果！
-
-</Tip>
+> [!TIP]
+> ✏️ 試試看！在第2節中使用的兩個句子上手動應用標記化（“I've been waiting for a HuggingFace course my whole life.”和“I hate this so much!”）。通過模型傳遞它們，並檢查您是否獲得與第2節中相同的 logits。現在使用填充標記將它們批處理在一起，然後創建適當的注意掩碼。檢查通過模型時是否獲得相同的結果！
 
 ## 長序列
 
diff --git a/chapters/zh-TW/chapter3/2.mdx b/chapters/zh-TW/chapter3/2.mdx
index 3a1f8980f..4cc8f883d 100644
--- a/chapters/zh-TW/chapter3/2.mdx
+++ b/chapters/zh-TW/chapter3/2.mdx
@@ -146,11 +146,8 @@ raw_train_dataset.features
 
 在上面的例子之中,**Label（標籤）** 是一種**ClassLabel（分類標籤）**，使用整數建立起到類別標籤的映射關係。**0**對應於**not_equivalent**，**1**對應於**equivalent**。
 
-<Tip>
-
-✏️ **試試看！** 查看訓練集的第15行元素和驗證集的87行元素。他們的標籤是什麼？
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 查看訓練集的第15行元素和驗證集的87行元素。他們的標籤是什麼？
 
 ### 預處理數據集
 
@@ -188,11 +185,8 @@ inputs
 
 我們在[第二章](/course/chapter2) 討論了**輸入詞id(input_ids)** 和 **注意力遮罩(attention_mask)** ，但我們在那個時候沒有討論**類型標記ID(token_type_ids)**。在這個例子中，**類型標記ID(token_type_ids)**的作用就是告訴模型輸入的哪一部分是第一句，哪一部分是第二句。
 
-<Tip>
-
-✏️ ** 試試看！** 選取訓練集中的第15個元素，將兩句話分別標記為一對。結果和上方的例子有什麼不同？
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 選取訓練集中的第15個元素，將兩句話分別標記為一對。結果和上方的例子有什麼不同？
 
 如果我們將**input_ids**中的id轉換回文字:
 
@@ -351,11 +345,8 @@ batch = data_collator(samples)
 
 {/if}
 
-<Tip>
-
-✏️ ** 試試看！** 在GLUE SST-2數據集上應用預處理。它有點不同，因為它是由單個句子而不是成對的句子組成的，但是我們所做的其他事情看起來應該是一樣的。另一個更難的挑戰，請嘗試編寫一個可用於任何GLUE任務的預處理函數。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 在GLUE SST-2數據集上應用預處理。它有點不同，因為它是由單個句子而不是成對的句子組成的，但是我們所做的其他事情看起來應該是一樣的。另一個更難的挑戰，請嘗試編寫一個可用於任何GLUE任務的預處理函數。
 
 {#if fw === 'tf'}
 
diff --git a/chapters/zh-TW/chapter3/3.mdx b/chapters/zh-TW/chapter3/3.mdx
index 93555c2aa..c2d37016f 100644
--- a/chapters/zh-TW/chapter3/3.mdx
+++ b/chapters/zh-TW/chapter3/3.mdx
@@ -42,11 +42,8 @@ from transformers import TrainingArguments
 training_args = TrainingArguments("test-trainer")
 ```
 
-<Tip>
-
-💡 如果您想在訓練期間自動將模型上傳到 Hub，請將push_to_hub=True添加到TrainingArguments之中. 我們將在[第四章](/course/chapter4/3)中詳細介紹這部分。
-
-</Tip>
+> [!TIP]
+> 💡 如果您想在訓練期間自動將模型上傳到 Hub，請將push_to_hub=True添加到TrainingArguments之中. 我們將在[第四章](/course/chapter4/3)中詳細介紹這部分。
 
 第二步是定義我們的模型。正如在[之前的章節](/2_Using Transformers/Introduction)一樣，我們將使用 **AutoModelForSequenceClassification** 類，它有兩個參數：
 
@@ -164,9 +161,6 @@ trainer.train()
 
 使用**Trainer** API微調的介紹到此結束。對最常見的 NLP 任務執行此操作的示例將在第 7 章中給出，但現在讓我們看看如何在純 PyTorch 中執行相同的操作。
 
-<Tip>
-
-✏️ **試試看!** 使用您在第 2 節中進行的數據處理，在 GLUE SST-2 數據集上微調模型。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 使用您在第 2 節中進行的數據處理，在 GLUE SST-2 數據集上微調模型。
 
diff --git a/chapters/zh-TW/chapter3/3_tf.mdx b/chapters/zh-TW/chapter3/3_tf.mdx
index 21d095289..6423977ec 100644
--- a/chapters/zh-TW/chapter3/3_tf.mdx
+++ b/chapters/zh-TW/chapter3/3_tf.mdx
@@ -70,11 +70,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_lab
 
 要在我們的數據集上微調模型，我們只需要在我們的模型上調用 `compile()` 方法，然後將我們的數據傳遞給 `fit()` 方法。 這將啟動微調過程（在 GPU 上應該需要幾分鐘）並輸出訓練loss，以及每個 epoch 結束時的驗證loss。
 
-<Tip>
-
-請注意🤗 Transformers 模型具有大多數 Keras 模型所沒有的特殊能力——它們可以自動使用內部計算的loss。 如果您沒有在 `compile()` 中設置損失函數，他們將默認使用內部計算的損失。 請注意，要使用內部損失，您需要將標籤作為輸入的一部分傳遞，而不是作為單獨的標籤（這是在 Keras 模型中使用標籤的正常方式）。 您將在課程的第 2 部分中看到這方面的示例，其中定義正確的損失函數可能很棘手。 然而，對於序列分類，標準的 Keras 損失函數可以正常工作，所以我們將在這裡使用它。
-
-</Tip>
+> [!TIP]
+> 請注意🤗 Transformers 模型具有大多數 Keras 模型所沒有的特殊能力——它們可以自動使用內部計算的loss。 如果您沒有在 `compile()` 中設置損失函數，他們將默認使用內部計算的損失。 請注意，要使用內部損失，您需要將標籤作為輸入的一部分傳遞，而不是作為單獨的標籤（這是在 Keras 模型中使用標籤的正常方式）。 您將在課程的第 2 部分中看到這方面的示例，其中定義正確的損失函數可能很棘手。 然而，對於序列分類，標準的 Keras 損失函數可以正常工作，所以我們將在這裡使用它。
 
 ```py
 from tensorflow.keras.losses import SparseCategoricalCrossentropy
@@ -90,11 +87,8 @@ model.fit(
 )
 ```
 
-<Tip warning={true}>
-
-請注意這裡有一個非常常見的陷阱——你只是*可以*將損失的名稱作為字符串傳遞給 Keras，但默認情況下，Keras 會假設你已經對輸出應用了 softmax。 然而，許多模型在應用 softmax 之前就輸出，也稱為 *logits*。 我們需要告訴損失函數我們的模型是否經過了softmax，唯一的方法是直接調用它，而不是用字符串的名稱。
-
-</Tip>
+> [!WARNING]
+> 請注意這裡有一個非常常見的陷阱——你只是*可以*將損失的名稱作為字符串傳遞給 Keras，但默認情況下，Keras 會假設你已經對輸出應用了 softmax。 然而，許多模型在應用 softmax 之前就輸出，也稱為 *logits*。 我們需要告訴損失函數我們的模型是否經過了softmax，唯一的方法是直接調用它，而不是用字符串的名稱。
 
 
 ### 提升訓練的效果
@@ -122,11 +116,8 @@ from tensorflow.keras.optimizers import Adam
 opt = Adam(learning_rate=lr_scheduler)
 ```
 
-<Tip>
-
-🤗 Transformers 庫還有一個 `create_optimizer()` 函數，它將創建一個具有學習率衰減的 `AdamW` 優化器。 這是一個便捷的方式，您將在本課程的後續部分中詳細瞭解。
-
-</Tip>
+> [!TIP]
+> 🤗 Transformers 庫還有一個 `create_optimizer()` 函數，它將創建一個具有學習率衰減的 `AdamW` 優化器。 這是一個便捷的方式，您將在本課程的後續部分中詳細瞭解。
 
 現在我們有了全新的優化器，我們可以嘗試使用它進行訓練。 首先，讓我們重新加載模型，以重置我們剛剛進行的訓練運行對權重的更改，然後我們可以使用新的優化器對其進行編譯：
 
@@ -144,11 +135,8 @@ model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
 model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
 
-<Tip>
-
-💡 如果您想在訓練期間自動將模型上傳到 Hub，您可以在 `model.fit()` 方法中傳遞 `PushToHubCallback`。 我們將在 [第四章](/course/chapter4/3) 中進行介紹
-
-</Tip>
+> [!TIP]
+> 💡 如果您想在訓練期間自動將模型上傳到 Hub，您可以在 `model.fit()` 方法中傳遞 `PushToHubCallback`。 我們將在 [第四章](/course/chapter4/3) 中進行介紹
 
 ### 模型預測
 
diff --git a/chapters/zh-TW/chapter3/4.mdx b/chapters/zh-TW/chapter3/4.mdx
index 7ea812d27..9134b8ffd 100644
--- a/chapters/zh-TW/chapter3/4.mdx
+++ b/chapters/zh-TW/chapter3/4.mdx
@@ -196,11 +196,8 @@ metric.compute()
 
 同樣，由於模型頭部初始化和數據改組的隨機性，您的結果會略有不同，但它們應該在同一個範圍內。
 
-<Tip>
-
-✏️ **試試看！** 修改之前的訓練循環以在 SST-2 數據集上微調您的模型。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 修改之前的訓練循環以在 SST-2 數據集上微調您的模型。
 
 ### S使用🤗 Accelerate加速您的訓練循環
 
@@ -292,9 +289,8 @@ for epoch in range(num_epochs):
 
 然後大部分工作會在將數據加載器、模型和優化器發送到的`accelerator.prepare()`中完成。這將會把這些對象包裝在適當的容器中，以確保您的分佈式訓練按預期工作。要進行的其餘更改是刪除將`batch`放在 `device` 的那行代碼（同樣，如果您想保留它，您可以將其更改為使用 `accelerator.device` ) 並將 `loss.backward()` 替換為`accelerator.backward(loss)`。
 
-<Tip>
-⚠️ 為了使雲端 TPU 提供的加速發揮最大的效益，我們建議使用標記器(tokenizer)的 `padding=max_length` 和 `max_length` 參數將您的樣本填充到固定長度。
-</Tip>
+> [!TIP]
+> ⚠️ 為了使雲端 TPU 提供的加速發揮最大的效益，我們建議使用標記器(tokenizer)的 `padding="max_length"` 和 `max_length` 參數將您的樣本填充到固定長度。
 
 如果您想複製並粘貼來直接運行，以下是 🤗 Accelerate 的完整訓練循環:
 
diff --git a/chapters/zh-TW/chapter4/2.mdx b/chapters/zh-TW/chapter4/2.mdx
index 66268aeef..abef381cf 100644
--- a/chapters/zh-TW/chapter4/2.mdx
+++ b/chapters/zh-TW/chapter4/2.mdx
@@ -92,6 +92,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 ```
 {/if}
 
-<Tip>
-使用預訓練模型時，一定要檢查它是如何訓練的，在哪些數據集上，它的限制和它的偏差。所有這些信息都應在其模型卡片上註明。
-</Tip>
+> [!TIP]
+> 使用預訓練模型時，一定要檢查它是如何訓練的，在哪些數據集上，它的限制和它的偏差。所有這些信息都應在其模型卡片上註明。
diff --git a/chapters/zh-TW/chapter4/3.mdx b/chapters/zh-TW/chapter4/3.mdx
index 77bef09f3..fe38505b5 100644
--- a/chapters/zh-TW/chapter4/3.mdx
+++ b/chapters/zh-TW/chapter4/3.mdx
@@ -178,11 +178,8 @@ tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token=
 </div>
 {/if}
 
-<Tip>
-
-✏️ **試試看**！獲取與檢查點關聯的模型和標記器，並使用該方法將它們上傳到您的命名空間中的存儲庫。在刪除之前，請仔細檢查該存儲庫是否正確顯示在您的頁面上。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看**！獲取與檢查點關聯的模型和標記器，並使用該方法將它們上傳到您的命名空間中的存儲庫。在刪除之前，請仔細檢查該存儲庫是否正確顯示在您的頁面上。
 
 如您所見， **push_to_hub()** 方法接受多個參數，從而可以上傳到特定的存儲庫或組織命名空間，或使用不同的 API 令牌。我們建議您查看直接在[🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html)瞭解什麼是可能的
 
@@ -470,9 +467,8 @@ config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_mod
 
 {/if}
 
-<Tip>
-✏️ 從 web 界面創建存儲庫時，*.gitattributes* 文件會自動設置為將具有某些擴展名的文件，例如 *.bin* 和 *.h5* 視為大文件，git-lfs 會對其進行跟蹤您無需進行必要的設置。
-</Tip> 
+> [!TIP]
+> ✏️ 從 web 界面創建存儲庫時，*.gitattributes* 文件會自動設置為將具有某些擴展名的文件（例如 *.bin* 和 *.h5*）視為大文件，git-lfs 會對其進行跟蹤，您無需進行額外的設置。
 
 我們現在可以繼續進行，就像我們通常使用傳統 Git 存儲庫一樣。我們可以使用以下命令將所有文件添加到 Git 的暫存環境中 **git add** 命令：
 
diff --git a/chapters/zh-TW/chapter5/2.mdx b/chapters/zh-TW/chapter5/2.mdx
index 90cbfdc43..6120c15b7 100644
--- a/chapters/zh-TW/chapter5/2.mdx
+++ b/chapters/zh-TW/chapter5/2.mdx
@@ -48,11 +48,8 @@ SQuAD_it-train.json.gz:	   82.2% -- replaced with SQuAD_it-train.json
 
 我們可以看到壓縮文件已經被替換為SQuAD_it-train.json和SQuAD_it-text.json,並且數據以 JSON 格式存儲。
 
-<Tip>
-
-✎ 如果你想知道為什麼上面的shell命令中喲與一個字符`!`,那是因為我們是在 Jupyter notebook 中運行它們。如果您想在終端中下載和解壓縮數據集，只需刪除前綴!即可。
-
-</Tip>
+> [!TIP]
+> ✎ 如果你想知道為什麼上面的shell命令中有一個字符`!`,那是因為我們是在 Jupyter notebook 中運行它們。如果您想在終端中下載和解壓縮數據集，只需刪除前綴`!`即可。
 
 使用`load_dataset()`函數來加載JSON文件, 我們只需要知道我們是在處理普通的 JSON(類似於嵌套字典)還是 JSON 行(行分隔的 JSON)。像許多問答數據集一樣, SQuAD-it 使用嵌套格式,所有文本都存儲在 `data`文件中。這意味著我們可以通過指定參數`field`來加載數據集,如下所示:
 
@@ -126,11 +123,8 @@ DatasetDict({
 
 這正是我們想要的。現在, 現在，我們可以應用各種預處理技術來清理數據、標記評論等。
 
-<Tip>
-
-`load_dataset()`函數的`data_files`參數非常靈活並且可以是單個文件路徑、文件路徑列表或將分割後的名稱映射到文件路徑的字典。您還可以根據Unix shell使用的規則對與指定模式匹配的文件進行全局定位（例如，您可以通過設置'data_files=“*.JSON”'將目錄中的所有JSON文件作為單個拆分進行全局定位）。有關更多詳細信息，請參閱🤗Datasets 文檔。
-
-</Tip>
+> [!TIP]
+> `load_dataset()`函數的`data_files`參數非常靈活並且可以是單個文件路徑、文件路徑列表或將分割後的名稱映射到文件路徑的字典。您還可以根據Unix shell使用的規則對與指定模式匹配的文件進行全局定位（例如，您可以通過設置'data_files=“*.JSON”'將目錄中的所有JSON文件作為單個拆分進行全局定位）。有關更多詳細信息，請參閱🤗Datasets 文檔。
 
 🤗 Datasets實際上支持輸入文件的自動解壓,所以我們可以跳過使用`gzip`,直接設置 `data_files`參數傳遞壓縮文件:
 
@@ -158,10 +152,7 @@ squad_it_dataset = load_dataset("json", data_files=data_files, field="data")
 
 這將返回和上面的本地例子相同的 `DatasetDict` 對象, 但省去了我們手動下載和解壓 _SQuAD_it-*.json.gz_ 文件的步驟。這是我們對加載未託管在Hugging Face Hub的數據集的各種方法的總結。既然我們已經有了一個可以使用的數據集,讓我們開始大展身手吧！
 
-<Tip>
-
-✏️ **試試看!** 選擇託管在GitHub或[UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)上的另一個數據集並嘗試使用上述技術在本地和遠程加載它。另外,可以嘗試加載CSV或者文本格式存儲的數據集(有關這些格式的更多信息,請參閱[文檔](https://huggingface.co/docs/datasets/loading#local-and-remote-files))。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 選擇託管在GitHub或[UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)上的另一個數據集並嘗試使用上述技術在本地和遠程加載它。另外,可以嘗試加載CSV或者文本格式存儲的數據集(有關這些格式的更多信息,請參閱[文檔](https://huggingface.co/docs/datasets/loading#local-and-remote-files))。
 
 
diff --git a/chapters/zh-TW/chapter5/3.mdx b/chapters/zh-TW/chapter5/3.mdx
index 3b78e474e..c7ea2e73e 100644
--- a/chapters/zh-TW/chapter5/3.mdx
+++ b/chapters/zh-TW/chapter5/3.mdx
@@ -89,11 +89,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-✏️ **試試看！** 使用 `Dataset.unique()` 函數查找訓練和測試集中滿足某個條件的藥物經過去重之後的數量。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 使用 `Dataset.unique()` 函數查找訓練和測試集中滿足某個條件的藥物經過去重之後的數量。
 
 接下來，讓我們使用 **Dataset.map()**標準化所有 **condition** 標籤 .正如我們在[第三章](/course/chapter3)中所做的那樣，我們可以定義一個簡單的函數，可以將該函數應用於**drug_dataset** 拆分後每部分的所有行：
 
@@ -217,11 +214,8 @@ drug_dataset["train"].sort("review_length")[:3]
 
 正如我們所猜想的那樣，一些評論只包含一個詞，雖然這對於情感分析來說可能沒問題，但如果我們想要預測病情，這些評論可能並不適合。
 
-<Tip>
-
-🙋向數據集添加新列的另一種方法是使用函數Dataset.add_column() 。這允許您輸入Python 列表或 NumPy，在不適合使用Dataset.map()情況下可以很方便。
-
-</Tip>
+> [!TIP]
+> 🙋向數據集添加新列的另一種方法是使用函數Dataset.add_column() 。這允許您輸入Python 列表或 NumPy，在不適合使用Dataset.map()情況下可以很方便。
 
 讓我們使用 **Dataset.filter()** 功能來刪除包含少於 30 個單詞的評論。與我們對 **condition** 列的處理相似，我們可以通過選取評論的長度高於此閾值來過濾掉非常短的評論：
 
@@ -236,11 +230,8 @@ print(drug_dataset.num_rows)
 
 如您所見，這已經從我們的原始訓練和測試集中刪除了大約 15% 的評論。
 
-<Tip>
-
-✏️ 試試看！使用 Dataset.sort() 函數查看單詞數最多的評論。請參閱文檔以瞭解您需要使用哪個參數按長度降序對評論進行排序。
-
-</Tip>
+> [!TIP]
+> ✏️ 試試看！使用 Dataset.sort() 函數查看單詞數最多的評論。請參閱文檔以瞭解您需要使用哪個參數按長度降序對評論進行排序。
 
 我們需要處理的最後一件事是評論中是否存在 HTML 字符代碼。我們可以使用 Python 的**html**模塊取消這些字符的轉義，如下所示：
 
@@ -297,11 +288,8 @@ def tokenize_function(examples):
 
 您還可以通過將整個單元格計時 **%%time** 在單元格的開頭。在我們執行此操作的硬件上，該指令顯示 10.8 秒（這是寫在“Wall time”之後的數字）。
 
-<Tip>
-
-✏️ **試試看！** 使用和不使用 `batched=True` 執行相同的指令，然後使用慢速標記器嘗試（在 `AutoTokenizer.from_pretrained()` 方法中添加 `use_fast=False`），這樣你就可以看看在你的電腦上它需要多長的時間。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 使用和不使用 `batched=True` 執行相同的指令，然後使用慢速標記器嘗試（在 `AutoTokenizer.from_pretrained()` 方法中添加 `use_fast=False`），這樣你就可以看看在你的電腦上它需要多長的時間。
 
 以下是我們在使用和不使用批處理時使用快速和慢速分詞器獲得的結果：
 
@@ -338,19 +326,13 @@ Options         | Fast tokenizer | Slow tokenizer
 
 對於慢速分詞器來說，這些結果要合理得多，但快速分詞器的性能也得到了顯著提高。但是請注意，情況並非總是如此——除了 **num_proc=8**，我們的測試表明，使用**batched=True**而不帶有**num_proc**參數的選項處理起來更快。通常，我們不建議將 Python 多線程處理用於具有**batched=True**功能的快速標記器  .
 
-<Tip>
-
-使用num_proc以加快處理速度通常是一個好主意，只要您使用的函數還沒有自己帶有的進行某種多進程處理的方法。
-
-</Tip>
+> [!TIP]
+> 使用num_proc以加快處理速度通常是一個好主意，只要您使用的函數還沒有自己帶有的進行某種多進程處理的方法。
 
 將所有這些功能濃縮到一個方法中已經非常了不起，但還有更多！使用 **Dataset.map()** 和 **batched=True** 您可以更改數據集中的元素數量。當你想從一個例子中創建幾個訓練特徵時，這是非常有用的。我們將在[第七章](/course/chapter7).中進行的幾個NLP任務的預處理中使用到這個功能，它非常便利。
 
-<Tip>
-
-💡在機器學習中，一個例子通常可以為我們的模型提供一組特徵。在某些情況下，這些特徵會儲存在數據集的幾個列，但在其他情況下（例如此處的例子和用於問答的數據），可以從單個示例的一列中提取多個特徵
-
-</Tip>
+> [!TIP]
+> 💡在機器學習中，一個例子通常可以為我們的模型提供一組特徵。在某些情況下，這些特徵會儲存在數據集的幾個列，但在其他情況下（例如此處的例子和用於問答的數據），可以從單個示例的一列中提取多個特徵
 
 讓我們來看看它是如何工作的！在這裡，我們將標記化我們的示例並將最大截斷長度設置128，但我們將要求標記器返回全部文本塊，而不僅僅是第一個。這可以用 **return_overflowing_tokens=True** ：
 
@@ -519,11 +501,8 @@ drug_dataset["train"][:3]
 train_df = drug_dataset["train"][:]
 ```
 
-<Tip>
-
-🚨 在底層，`Dataset.set_format()` 改變了數據集的 `__getitem__()` dunder 方法的返回格式。 這意味著當我們想從 `"pandas"` 格式的 `Dataset` 中創建像 `train_df` 這樣的新對象時，我們需要對整個數據集進行切片以獲得 `pandas.DataFrame`。 無論輸出格式如何，您都可以自己驗證 `drug_dataset["train"]` 的類型依然還是 `Dataset`。
-
-</Tip>
+> [!TIP]
+> 🚨 在底層，`Dataset.set_format()` 改變了數據集的 `__getitem__()` dunder 方法的返回格式。 這意味著當我們想從 `"pandas"` 格式的 `Dataset` 中創建像 `train_df` 這樣的新對象時，我們需要對整個數據集進行切片以獲得 `pandas.DataFrame`。 無論輸出格式如何，您都可以自己驗證 `drug_dataset["train"]` 的類型依然還是 `Dataset`。
 
 
 從這裡我們可以使用我們想要的所有 Pandas 功能。例如，我們可以通過花式鏈接來計算 **condition**類之間的分佈 ：
@@ -594,11 +573,8 @@ Dataset({
 })
 ```
 
-<Tip>
-
-✏️ **試試看！** 計算每種藥物的平均評級並將結果存儲在一個新的Dataset.
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 計算每種藥物的平均評級並將結果存儲在一個新的Dataset.
 
 我們對 🤗 Datasets中可用的各種預處理技術的介紹到此結束。在最後一部分，讓我們創建一個驗證集來準備用於訓練分類器的數據集。在此之前，我們將輸出格式 **drug_dataset** 從 **pandas**重置到 **arrow** ：
 
diff --git a/chapters/zh-TW/chapter5/4.mdx b/chapters/zh-TW/chapter5/4.mdx
index 96e11bb8c..8a17b32d5 100644
--- a/chapters/zh-TW/chapter5/4.mdx
+++ b/chapters/zh-TW/chapter5/4.mdx
@@ -44,11 +44,8 @@ Dataset({
 
 我們可以看到我們的數據集中有 15,518,009 行和 2 列 -- 這是非常多的!
 
-<Tip>
-
-✎ 默認情況下, 🤗 Datasets 會自動解壓加載數據集所需的文件。 如果你想保留硬盤空間, 你可以傳遞 `DownloadConfig(delete_extracted=True)` 到 `download_config` 的 `load_dataset()`參數. 有關更多詳細信息, 請參閱文檔](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig)。
-
-</Tip>
+> [!TIP]
+> ✎ 默認情況下, 🤗 Datasets 會自動解壓加載數據集所需的文件。 如果你想節省硬盤空間, 你可以將 `DownloadConfig(delete_extracted=True)` 傳遞給 `load_dataset()` 的 `download_config` 參數。 有關更多詳細信息, 請參閱[文檔](https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DownloadConfig)。
 
 讓我們看看數據集的第一個元素的內容:
 
@@ -99,11 +96,8 @@ Dataset size (cache file) : 19.54 GB
 
 非常棒 -- 儘管它將近20GB, 但我們能夠佔用很少的RAM空間加載和訪問數據集!
 
-<Tip>
-
-✏️ **試試看!** 從[subsets](https://the-eye.eu/public/AI/pile_preliminary_components/)中選擇一個大於你的筆記本或者臺式機的RAM大小的子集, 用 🤗 Datasets加載這個數據集, 並且測量RAM的使用量。 請注意, 要獲得準確的測量結果, 你需要在另一個進程中執行這個操作。你可以在 [the Pile paper](https://arxiv.org/abs/2101.00027)的表一中找到每個子集解壓後的大小。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 從[subsets](https://the-eye.eu/public/AI/pile_preliminary_components/)中選擇一個大於你的筆記本或者臺式機的RAM大小的子集, 用 🤗 Datasets加載這個數據集, 並且測量RAM的使用量。 請注意, 要獲得準確的測量結果, 你需要在另一個進程中執行這個操作。你可以在 [the Pile paper](https://arxiv.org/abs/2101.00027)的表一中找到每個子集解壓後的大小。
 
 如果你熟悉 Pandas, 這個結果可能會讓人感到很意外。因為 Wes Kinney 的著名的[經驗法則](https://wesmckinney.com/blog/apache-arrow-pandas-internals/) 是你需要的RAM應該是數據集的大小的5倍到10倍。 那麼 🤗 Datasets 是如何解決這個內存管理問題的呢? 🤗 Datasets 將每一個數據集看作一個[內存映射文件](https://en.wikipedia.org/wiki/Memory-mapped_file), 它提供了RAM和文件系統存儲之間的映射, 該映射允許庫訪問和操作數據集的元素, 而且無需將其完全加載到內存中。
 
@@ -131,11 +125,8 @@ print(
 
 這裡我們使用了 Python的 `timeit` 模塊來測量執行 `code_snippet`所耗的時間。 你通常能以十分之幾GB/s到幾GB/s的速度迭代數據集。通過上述的方法就已經能夠解決大多數大數據集加載的限制, 但是有時候你不得不使用一個很大的數據集, 它甚至都不能存儲在筆記本電腦的硬盤上。例如, 如果我們嘗試下載整個 Pile, 我們需要825GB的可用磁盤空間! 為了處理這種情況, 🤗 Datasets 提供了一個流式功能, 這個功能允許我們動態下載和訪問元素, 並且不需要下載整個數據集。讓我們來看看這個功能是如何工作的。
 
-<Tip>
-
-💡在 Jupyter 筆記中你還可以使用[`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)為單元格計時。
-
-</Tip>
+> [!TIP]
+> 💡在 Jupyter 筆記中你還可以使用[`%%timeit` magic function](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)為單元格計時。
 
 ## 流式數據集
 
@@ -173,11 +164,8 @@ next(iter(tokenized_dataset))
 {'input_ids': [101, 4958, 5178, 4328, 6779, ...], 'attention_mask': [1, 1, 1, 1, 1, ...]}
 ```
 
-<Tip>
-
-💡 你可以傳遞 `batched=True` 來通過流式加速標記化, 如同我們在上一節看到的那樣。它將逐批處理示例; 默認的批量大小為 1,000, 可以使用 `batch_size` 參數指定批量大小。
-
-</Tip>
+> [!TIP]
+> 💡 你可以傳遞 `batched=True` 來通過流式加速標記化, 如同我們在上一節看到的那樣。它將逐批處理示例; 默認的批量大小為 1,000, 可以使用 `batch_size` 參數指定批量大小。
 
 你還可以使用 `IterableDataset.shuffle()` 打亂流式數據集, 但與 `Dataset.shuffle()` 不同的是這隻會打亂預定義 `buffer_size` 中的元素:
 
@@ -278,10 +266,7 @@ next(iter(pile_dataset["train"]))
  'text': 'It is done, and submitted. You can play “Survival of the Tastiest” on Android, and on the web...'}
 ```
 
-<Tip>
-
-✏️ **試試看!** 使用像[`mc4`](https://huggingface.co/datasets/mc4) 或者 [`oscar`](https://huggingface.co/datasets/oscar)這樣的大型 Common Crawl 語料庫來創建一個流式多語言數據集, 該數據集代表你選擇的國家/地區語言的口語比例。例如, 瑞士的四種民族語言分別是德語、法語、意大利語和羅曼什語, 因此你可以嘗試根據根據口語比例對Oscar子集進行採用來創建瑞士語料庫。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 使用像[`mc4`](https://huggingface.co/datasets/mc4) 或者 [`oscar`](https://huggingface.co/datasets/oscar)這樣的大型 Common Crawl 語料庫來創建一個流式多語言數據集, 該數據集代表你選擇的國家/地區語言的口語比例。例如, 瑞士的四種民族語言分別是德語、法語、意大利語和羅曼什語, 因此你可以嘗試根據口語比例對Oscar子集進行採樣來創建瑞士語料庫。
 
 你現在擁有加載和處理各種類型和大小的數據集的所需的所有工具 -- 但是除非你非常幸運, 否則在你的NLP之旅中會有一個難題, 你將不得不創建一個數據集來解決手頭的問題。這就是下一節的主題!
diff --git a/chapters/zh-TW/chapter5/5.mdx b/chapters/zh-TW/chapter5/5.mdx
index 6f9953910..008c94f08 100644
--- a/chapters/zh-TW/chapter5/5.mdx
+++ b/chapters/zh-TW/chapter5/5.mdx
@@ -112,10 +112,8 @@ response.json()
 
 哇，這是很多信息！我們可以看到有用的字段，例如 **標題** , **內容** ，  **參與的成員**， **issue的描述信息**，以及打開issue的GitHub 用戶的信息。
 
-<Tip>
-
-✏️ 試試看！單擊上面 JSON 中的幾個 URL，以瞭解每個 GitHub issue中我url鏈接到的實際的地址。
-</Tip>
+> [!TIP]
+> ✏️ 試試看！單擊上面 JSON 中的幾個 URL，以瞭解每個 GitHub issue中的url鏈接到的實際地址。
 
 如 GitHub[文檔](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) 中所述，未經身份驗證的請求限制為每小時 60 個請求。雖然你可以增加 **per_page** 查詢參數以減少您發出的請求數量，您仍然會遭到任何超過幾千個issue的存儲庫的速率限制。因此，您應該關注 GitHub 的[創建個人身份令牌](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)，創建一個個人訪問令牌這樣您就可以將速率限制提高到每小時 5,000 個請求。獲得令牌後，您可以將其包含在請求標頭中：
 
@@ -124,11 +122,8 @@ GITHUB_TOKEN = xxx  # Copy your GitHub token here
 headers = {"Authorization": f"token {GITHUB_TOKEN}"}
 ```
 
-<Tip warning={true}>
-
-⚠️ 不要與陌生人共享存在GITHUB令牌的筆記本。我們建議您在使用完後將GITHUB令牌刪除，以避免意外洩漏此信息。一個更好的做法是，將令牌存儲在.env文件中，並使用 [`python-dotenv` library](https://github.com/theskumar/python-dotenv) 為您自動將其作為環境變量加載。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 不要與陌生人共享存在GITHUB令牌的筆記本。我們建議您在使用完後將GITHUB令牌刪除，以避免意外洩漏此信息。一個更好的做法是，將令牌存儲在.env文件中，並使用 [`python-dotenv` library](https://github.com/theskumar/python-dotenv) 為您自動將其作為環境變量加載。
 
 現在我們有了訪問令牌，讓我們創建一個可以從 GitHub 存儲庫下載所有issue的函數：
 
@@ -235,11 +230,8 @@ issues_dataset = issues_dataset.map(
 )
 ```
 
-<Tip>
-
-✏️ 試試看！計算在 🤗 Datasets中解決issue所需的平均時間。您可能會發現 Dataset.filter()函數對於過濾拉取請求和未解決的issue很有用，並且您可以使用Dataset.set_format()函數將數據集轉換為DataFrame，以便您可以輕鬆地按照需求修改創建和關閉的時間的格式（以時間戳格式）。
-
-</Tip>
+> [!TIP]
+> ✏️ 試試看！計算在 🤗 Datasets中解決issue所需的平均時間。您可能會發現 Dataset.filter()函數對於過濾拉取請求和未解決的issue很有用，並且您可以使用Dataset.set_format()函數將數據集轉換為DataFrame，以便您可以輕鬆地按照需求修改創建和關閉的時間的格式（以時間戳格式）。
 
 儘管我們可以通過刪除或重命名某些列來進一步清理數據集，但在此階段儘可能保持數據集“原始”狀態通常是一個很好的做法，以便它可以在多個應用程序中輕鬆使用。在我們將數據集推送到 Hugging Face Hub 之前，讓我們再添加一些缺少的數據：與每個issue和拉取請求相關的評論。我們接下來將添加它們——你猜對了——我們將依然使用GitHub REST API！
 
@@ -372,11 +364,8 @@ repo_url
 
 在此示例中，我們創建了一個名為的空數據集存儲庫 **github-issues** 在下面 **lewtun** 用戶名（當您運行此代碼時，用戶名應該是您的 Hub 用戶名！）。
 
-<Tip>
-
-✏️ 試試看！使用您的 Hugging Face Hub 用戶名和密碼獲取令牌並創建一個名為 github-issues.請記住永遠不要將您的憑據保存在 Colab 或任何其他存儲庫中，因為這些信息可能會被不法分子利用。
-
-</Tip>
+> [!TIP]
+> ✏️ 試試看！使用您的 Hugging Face Hub 用戶名和密碼獲取令牌並創建一個名為 `github-issues` 的空存儲庫。請記住永遠不要將您的憑據保存在 Colab 或任何其他存儲庫中，因為這些信息可能會被不法分子利用。
 
 接下來，讓我們將存儲庫從 Hub 克隆到我們的本地機器，並將我們的數據集文件複製到其中。 🤗 Hub 提供了一個方便的 **Repository** 類，它包含許多常見 Git 命令的類，因此要克隆遠程存儲庫，我們只需要提供我們要克隆的 URL 和本地路徑：
 
@@ -421,11 +410,8 @@ Dataset({
 
 很酷，我們已經將我們的數據集推送到 Hub，其他人可以使用它！只剩下一件重要的事情要做：添加一個數據卡這解釋了語料庫是如何創建的，併為使用數據集的其他提供一些其他有用的信息。
 
-<Tip>
-
-💡 您還可以使用一些 Git 魔法直接從終端將數據集上傳到 Hugging Face Hub。有關如何執行此操作的詳細信息，請參閱 [🤗 Datasets guide](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) 指南。
-
-</Tip>
+> [!TIP]
+> 💡 您還可以使用一些 Git 魔法直接從終端將數據集上傳到 Hugging Face Hub。有關如何執行此操作的詳細信息，請參閱 [🤗 Datasets guide](https://huggingface.co/docs/datasets/share#share-a-dataset-using-the-cli) 指南。
 
 ## 創建數據集卡片
 
@@ -445,17 +431,12 @@ Dataset({
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter5/dataset-card.png" alt="A dataset card." width="80%"/>
 </div>
 
-<Tip>
-
-✏️試試看！使用應用程序和 [🤗 Datasets guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) 指南來完成 GitHub issue數據集的 README.md 文件。
-
-</Tip>
+> [!TIP]
+> ✏️試試看！使用應用程序和 [🤗 Datasets guide](https://github.com/huggingface/datasets/blob/master/templates/README_guide.md) 指南來完成 GitHub issue數據集的 README.md 文件。
 
 很好! 我們在本節中看到，創建一個好的數據集可能非常複雜，但幸運的是，將其上傳並與社區共享會很容易實現。在下一節中，我們將使用我們的新數據集創建一個帶有 🤗 Datasets的語義搜索引擎，該數據集可以將issue與最相關的issue和評論進行匹配。
 
-<Tip>
-
-✏️ 試試看！按照我們在本節中採取的步驟為您最喜歡的開源庫創建一個 GitHub issue數據集（當然，選擇 🤗 Datasets以外的其他東西！）。對於獎勵積分，微調多標籤分類器以預測該領域中存在的標籤。
-</Tip>
+> [!TIP]
+> ✏️ 試試看！按照我們在本節中採取的步驟為您最喜歡的開源庫創建一個 GitHub issue數據集（當然，選擇 🤗 Datasets以外的其他東西！）。對於獎勵積分，微調多標籤分類器以預測該領域中存在的標籤。
 
 
diff --git a/chapters/zh-TW/chapter5/6.mdx b/chapters/zh-TW/chapter5/6.mdx
index 8f578f8f6..b62dc59fd 100644
--- a/chapters/zh-TW/chapter5/6.mdx
+++ b/chapters/zh-TW/chapter5/6.mdx
@@ -186,11 +186,8 @@ Dataset({
 太好了，我們獲取到了幾千條的評論！
 
 
-<Tip>
-
-✏️ **試試看！** 看看能不能不用pandas就可以完成列的擴充； 這有點棘手； 你可能會發現 🤗 Datasets 文檔的 ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) 對這個任務很有用。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 看看能不能不用pandas就可以完成列的擴充； 這有點棘手； 你可能會發現 🤗 Datasets 文檔的 ["Batch mapping"](https://huggingface.co/docs/datasets/about_map_batch#batch-mapping) 對這個任務很有用。
 
 現在我們每行有一個評論，讓我們創建一個新的 **comments_length** 列來存放每條評論的字數：
 
@@ -519,8 +516,5 @@ URL: https://github.com/huggingface/datasets/issues/824
 
 我們的第二個搜索結果似乎與查詢相符。
 
-<Tip>
-
-✏️  試試看！創建您自己的查詢並查看您是否可以在檢索到的文檔中找到答案。您可能需要增加參數k以擴大搜索範圍。
-
-</Tip>
\ No newline at end of file
+> [!TIP]
+> ✏️  試試看！創建您自己的查詢並查看您是否可以在檢索到的文檔中找到答案。您可能需要增加參數k以擴大搜索範圍。
\ No newline at end of file
diff --git a/chapters/zh-TW/chapter6/2.mdx b/chapters/zh-TW/chapter6/2.mdx
index 57c320314..fd24bd1ab 100644
--- a/chapters/zh-TW/chapter6/2.mdx
+++ b/chapters/zh-TW/chapter6/2.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="DJimQynXZsQ"/>
 
-<Tip warning={true}>
-
-⚠️ 訓練標記器與訓練模型不同！模型訓練使用隨機梯度下降使每個batch的loss小一點。它本質上是隨機的（這意味著在進行兩次相同的訓練時，您必須設置一些隨機數種子才能獲得相同的結果）。訓練標記器是一個統計過程，它試圖確定哪些子詞最適合為給定的語料庫選擇，用於選擇它們的確切規則取決於分詞算法。它是確定性的，這意味著在相同的語料庫上使用相同的算法進行訓練時，您總是會得到相同的結果。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 訓練標記器與訓練模型不同！模型訓練使用隨機梯度下降使每個batch的loss小一點。它本質上是隨機的（這意味著在進行兩次相同的訓練時，您必須設置一些隨機數種子才能獲得相同的結果）。訓練標記器是一個統計過程，它試圖確定哪些子詞最適合為給定的語料庫選擇，用於選擇它們的確切規則取決於分詞算法。它是確定性的，這意味著在相同的語料庫上使用相同的算法進行訓練時，您總是會得到相同的結果。
 
 ## 準備語料庫
 
diff --git a/chapters/zh-TW/chapter6/3.mdx b/chapters/zh-TW/chapter6/3.mdx
index c35a3e3ca..87ba663d4 100644
--- a/chapters/zh-TW/chapter6/3.mdx
+++ b/chapters/zh-TW/chapter6/3.mdx
@@ -33,11 +33,8 @@
 `batched=True`  | 10.8s          | 4min41s
 `batched=False` | 59.2s          | 5min3s
 
-<Tip warning={true}>
-
-⚠️ 對單個句子進行分詞時，您不會總是看到相同分詞器的慢速和快速版本之間的速度差異。事實上，快速版本實際上可能更慢！只有同時對大量文本進行標記時，您才能清楚地看到差異。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 對單個句子進行分詞時，您不會總是看到相同分詞器的慢速和快速版本之間的速度差異。事實上，快速版本實際上可能更慢！只有同時對大量文本進行標記時，您才能清楚地看到差異。
 
 ## 批量編碼
 
@@ -105,13 +102,10 @@ encoding.word_ids()
 
 我們可以看到分詞器的特殊標記 **[CLS]** 和 **[SEP]** 被映射到 **None** ，然後每個標記都映射到它起源的單詞。這對於確定一個標記是否在單詞的開頭或兩個標記是否在同一個單詞中特別有用。我們可以依靠 **##** 前綴，但它僅適用於類似 BERT 的分詞器；這種方法適用於任何類型的標記器，只要它是快速的。在下一章中，我們將看到如何使用此功能將每個單詞的標籤正確應用於命名實體識別 (NER) 和詞性 (POS) 標記等任務中的標記。我們還可以使用它來屏蔽來自屏蔽語言建模中來自同一單詞的所有標記（一種稱為全詞掩碼）。
 
-<Tip>
-
-一個詞是什麼的概念很複雜。例如，“I'll”（“I will”的縮寫）算一兩個詞嗎？它實際上取決於分詞器和它應用的預分詞操作。一些標記器只是在空格上拆分，因此他們會將其視為一個詞。其他人在空格頂部使用標點符號，因此將其視為兩個詞。
-
-✏️ 試試看！從bert base cased和roberta base檢查點創建一個標記器，並用它們標記“81s”。你觀察到了什麼？ID這個詞是什麼？
-
-</Tip>
+> [!TIP]
+> 一個詞是什麼的概念很複雜。例如，“I'll”（“I will”的縮寫）算一個詞還是兩個詞？這實際上取決於分詞器和它應用的預分詞操作。一些標記器只在空格上拆分，因此會將其視為一個詞。另一些標記器除了空格之外還按標點符號拆分，因此會將其視為兩個詞。
+>
+> ✏️ 試試看！從bert base cased和roberta base檢查點創建一個標記器，並用它們標記“81s”。你觀察到了什麼？單詞 ID 是什麼？
 
 同樣，有一個 **sentence_ids()** 我們可以用來將標記映射到它來自的句子的方法（儘管在這種情況下， **token_type_ids** 分詞器返回的信息可以為我們提供相同的信息）。
 
@@ -128,11 +122,8 @@ Sylvain
 
 正如我們之前提到的，這一切都是由快速標記器跟蹤每個標記來自列表中的文本跨度這一事實提供支持的抵消.為了說明它們的用途，接下來我們將向您展示如何複製結果 **token-classification** 手動管道。
 
-<Tip>
-
-✏️ 試試看！創建您自己的示例文本，看看您是否能理解哪些標記與單詞 ID 相關聯，以及如何提取單個單詞的字符跨度。對於獎勵積分，請嘗試使用兩個句子作為輸入，看看句子 ID 是否對您有意義。
-
-</Tip>
+> [!TIP]
+> ✏️ 試試看！創建您自己的示例文本，看看您是否能理解哪些標記與單詞 ID 相關聯，以及如何提取單個單詞的字符跨度。對於獎勵積分，請嘗試使用兩個句子作為輸入，看看句子 ID 是否對您有意義。
 
 ## 在令牌分類管道內
 
diff --git a/chapters/zh-TW/chapter6/3b.mdx b/chapters/zh-TW/chapter6/3b.mdx
index 4471c4cec..5135ff755 100644
--- a/chapters/zh-TW/chapter6/3b.mdx
+++ b/chapters/zh-TW/chapter6/3b.mdx
@@ -271,11 +271,8 @@ print(scores[start_index, end_index])
 0.97773
 ```
 
-<Tip>
-
-✏️ **試試看!** 計算五個最可能的答案的開始和結束索引。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 計算五個最可能的答案的開始和結束索引。
 
 我們有 **start_index** 和 **end_index** 就標記而言的答案，所以現在我們只需要轉換為上下文中的字符索引。這是偏移量非常有用的地方。我們可以像在令牌分類任務中一樣抓住它們並使用它們：
 
@@ -309,11 +306,8 @@ print(result)
 
 太棒了！這和我們的第一個例子一樣！
 
-<Tip>
-
-✏️ **試試看!** 使用您之前計算的最佳分數來顯示五個最可能的答案。要檢查您的結果，請返回到第一個管道並在調用它時傳入。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 使用您之前計算的最佳分數來顯示五個最可能的答案。要檢查您的結果，請返回到第一個管道並在調用它時傳入。
 
 ## 處理長上下文
 
@@ -605,11 +599,8 @@ print(candidates)
 
 這兩個候選對應於模型能夠在每個塊中找到的最佳答案。該模型對正確答案在第二部分更有信心（這是一個好兆頭！）。現在我們只需要將這兩個標記跨度映射到上下文中的字符跨度（我們只需要映射第二個標記以獲得我們的答案，但看看模型在第一個塊中選擇了什麼很有趣）。
 
-<Tip>
-
-✏️ **試試看!** 修改上面的代碼以返回五個最可能的答案的分數和跨度（總計，而不是每個塊）。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 修改上面的代碼以返回五個最可能的答案的分數和跨度（總計，而不是每個塊）。
 
 這 **offsets** 我們之前抓取的實際上是一個偏移量列表，每個文本塊有一個列表：
 
@@ -630,10 +621,7 @@ for candidate, offset in zip(candidates, offsets):
 
 如果我們忽略第一個結果，我們會得到與這個長上下文的管道相同的結果——是的！
 
-<Tip>
-
-✏️ **試試看!** 使用您之前計算的最佳分數來顯示五個最可能的答案（對於整個上下文，而不是每個塊）。要檢查您的結果，請返回到第一個管道並在調用它時傳入。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 使用您之前計算的最佳分數來顯示五個最可能的答案（對於整個上下文，而不是每個塊）。要檢查您的結果，請返回到第一個管道並在調用它時傳入。
 
 我們對分詞器功能的深入研究到此結束。我們將在下一章再次將所有這些付諸實踐，屆時我們將向您展示如何在一系列常見的 NLP 任務上微調模型。
diff --git a/chapters/zh-TW/chapter6/4.mdx b/chapters/zh-TW/chapter6/4.mdx
index e866ea2bf..eb90d54dc 100644
--- a/chapters/zh-TW/chapter6/4.mdx
+++ b/chapters/zh-TW/chapter6/4.mdx
@@ -47,12 +47,8 @@ print(tokenizer.backend_tokenizer.normalizer.normalize_str("Héllò hôw are ü?
 
 在這個例子中，因為我們選擇了 **bert-base-uncased** 檢查點，標準化應用小寫並刪除重音。
 
-<Tip>
-
-✏️ **試試看!** 從檢查點加載標記器並將相同的示例傳遞給它。您可以看到分詞器的帶殼和無殼版本之間的主要區別是什麼？
-
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 從檢查點加載標記器並將相同的示例傳遞給它。您能看出分詞器的區分大小寫(cased)版本和不區分大小寫(uncased)版本之間的主要區別是什麼嗎？
 
 ## 預標記化
 
diff --git a/chapters/zh-TW/chapter6/5.mdx b/chapters/zh-TW/chapter6/5.mdx
index 1471d292d..220a788a1 100644
--- a/chapters/zh-TW/chapter6/5.mdx
+++ b/chapters/zh-TW/chapter6/5.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="HEikzVL-lZU"/>
 
-<Tip>
-
-💡 本節深入介紹了BPE，甚至展示了一個完整的實現。如果你只想大致瞭解標記化算法，可以跳到最後。
-
-</Tip>
+> [!TIP]
+> 💡 本節深入介紹了BPE，甚至展示了一個完整的實現。如果你只想大致瞭解標記化算法，可以跳到最後。
 
 ## 訓練算法
 
@@ -27,11 +24,8 @@ BPE 訓練首先計算語料庫中使用的唯一單詞集(在完成標準化和
 
 基礎詞彙將是 `["b", "g", "h", "n", "p", "s", "u"]`。對於實際情況，基本詞彙表將包含所有 ASCII 字符，至少，可能還包含一些 Unicode 字符。如果您正在標記的示例使用不在訓練語料庫中的字符，則該字符將轉換為未知標記。這就是為什麼許多 NLP 模型在分析帶有表情符號的內容方面非常糟糕的原因之一。
 
-<Tip>
-
-TGPT-2 和 RoBERTa 標記器(非常相似)有一個聰明的方法來處理這個問題: 他們不把單詞看成是用 Unicode 字符寫的，而是用字節寫的。這樣，基本詞彙表的大小很小(256),但你能想到的每個字符仍將被包含在內,而不會最終轉換為未知標記。這個技巧被稱為 *字節級 BPE*。
-
-</Tip>
+> [!TIP]
+> GPT-2 和 RoBERTa 標記器(非常相似)有一個聰明的方法來處理這個問題: 他們不把單詞看成是用 Unicode 字符寫的，而是用字節寫的。這樣，基本詞彙表的大小很小(256),但你能想到的每個字符仍將被包含在內,而不會最終轉換為未知標記。這個技巧被稱為 *字節級 BPE*。
 
 獲得這個基本詞彙後，我們添加新的標記，直到通過學習*合併*達到所需的詞彙量，這是將現有詞彙表的兩個元素合併為一個新元素的規則。因此在開始時，這些合併將創建具有兩個字符的標記，然後隨著訓練的進行，會創建更長的子詞。
 
@@ -74,11 +68,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 我們繼續這樣合併,直到達到我們所需的詞彙量。
 
-<Tip>
-
-✏️ **現在輪到你了!**你認為下一個合併規則是什麼？
-
-</Tip>
+> [!TIP]
+> ✏️ **現在輪到你了!**你認為下一個合併規則是什麼？
 
 ## 標記化算法
 
@@ -99,11 +90,8 @@ Corpus: ("hug", 10), ("p" "ug", 5), ("p" "un", 12), ("b" "un", 4), ("hug" "s", 5
 
 這個單詞 `"bug"` 將被標記為 `["b", "ug"]`。然而 `"mug"`,將被標記為 `["[UNK]", "ug"]`,因為字母 `"m"` 不再基本詞彙表中。同樣,單詞`"thug"` 會被標記為 `["[UNK]", "hug"]`: 字母 `"t"` 不在基本詞彙表中,應用合併規則首先導致 `"u"` 和 `"g"` 被合併,然後是 `"hu"` 和 `"g"` 被合併。
 
-<Tip>
-
-✏️ **現在輪到你了!** 你認為這個詞 `"unhug"` 將如何被標記？
-
-</Tip>
+> [!TIP]
+> ✏️ **現在輪到你了!** 你認為這個詞 `"unhug"` 將如何被標記？
 
 ## 實現 BPE
 
@@ -315,11 +303,8 @@ print(vocab)
  'Ġtok', 'Ġtoken', 'nd', 'Ġis', 'Ġth', 'Ġthe', 'in', 'Ġab', 'Ġtokeni']
 ```
 
-<Tip>
-
-💡 在同一語料庫上使用 `train_new_from_iterator()` 不會產生完全相同的詞彙表。這是因為當有最頻繁對的選擇時,我們選擇遇到的第一個, 而 🤗 Tokenizers 庫根據內部ID選擇第一個。
-
-</Tip>
+> [!TIP]
+> 💡 在同一語料庫上使用 `train_new_from_iterator()` 不會產生完全相同的詞彙表。這是因為當有最頻繁對的選擇時,我們選擇遇到的第一個, 而 🤗 Tokenizers 庫根據內部ID選擇第一個。
 
 為了對新文本進行分詞,我們對其進行預分詞、拆分，然後應用學到的所有合併規則:
 
@@ -351,10 +336,7 @@ tokenize("This is not a token.")
 ['This', 'Ġis', 'Ġ', 'n', 'o', 't', 'Ġa', 'Ġtoken', '.']
 ```
 
-<Tip warning={true}>
-
-⚠️ 如果存在未知字符,我們的實現將拋出錯誤,因為我們沒有做任何處理它們。GPT-2 實際上沒有未知標記(使用字節級 BPE 時不可能得到未知字符),但這可能發生在這裡,因為我們沒有在初始詞彙表中包含所有可能的字節。 BPE 的這方面超出了本節的範圍,因此我們忽略了細節。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果存在未知字符,我們的實現將拋出錯誤,因為我們沒有做任何處理它們。GPT-2 實際上沒有未知標記(使用字節級 BPE 時不可能得到未知字符),但這可能發生在這裡,因為我們沒有在初始詞彙表中包含所有可能的字節。 BPE 的這方面超出了本節的範圍,因此我們忽略了細節。
 
 這就是 BPE 算法！接下來,我們將看看 WordPiece。
\ No newline at end of file
diff --git a/chapters/zh-TW/chapter6/6.mdx b/chapters/zh-TW/chapter6/6.mdx
index 69c08c682..671cfbbcc 100644
--- a/chapters/zh-TW/chapter6/6.mdx
+++ b/chapters/zh-TW/chapter6/6.mdx
@@ -11,19 +11,13 @@ WordPiece 是 Google 為預訓練 BERT 而開發的標記化算法。此後,它
 
 <Youtube id="qpv6ms_t_1A"/>
 
-<Tip>
-
-💡 本節深入介紹 WordPiece,甚至展示完整的實現。如果您只想大致瞭解標記化算法,可以跳到最後。
-
-</Tip>
+> [!TIP]
+> 💡 本節深入介紹 WordPiece,甚至展示完整的實現。如果您只想大致瞭解標記化算法,可以跳到最後。
 
 ## 訓練算法
 
-<Tip warning={true}>
-
-⚠️ Google 從未開源 WordPiece 訓練算法的實現,因此以下是我們基於已發表文獻的最佳猜測。它可能不是 100% 準確的。
-
-</Tip>
+> [!WARNING]
+> ⚠️ Google 從未開源 WordPiece 訓練算法的實現,因此以下是我們基於已發表文獻的最佳猜測。它可能不是 100% 準確的。
 
 與 BPE 一樣,WordPiece 從一個小詞彙表開始,包括模型使用的特殊標記和初始字母表。因為它通過添加前綴來識別子詞 (如同 `##` 對於 BERT),每個單詞最初是通過將該前綴添加到單詞內的所有字符來拆分的。所以,例如 `"word"` ,像這樣拆分:
 
@@ -76,10 +70,8 @@ Corpus: ("hug", 10), ("p" "##u" "##g", 5), ("p" "##u" "##n", 12), ("b" "##u" "##
 
 我們繼續這樣處理,直到達到我們所需的詞彙量。
 
-<Tip>
-
-✏️ **現在輪到你了!** 下一個合併規則是什麼？
-</Tip>
+> [!TIP]
+> ✏️ **現在輪到你了!** 下一個合併規則是什麼？
 
 ## 標記化算法
 
@@ -91,11 +83,8 @@ WordPiece 和 BPE 中的標記化的不同在於 WordPiece 只保存最終詞彙
 
 當分詞達到無法在詞彙表中找到子詞的階段時, 整個詞被標記為未知 -- 例如, `"mug"` 將被標記為 `["[UNK]"]`,就像 `"bum"` (即使我們可以以 `"b"` 和 `"##u"` 開始, `"##m"` 不在詞彙表中,由此產生的標記將只是 `["[UNK]"]`, 不是 `["b", "##u", "[UNK]"]`)。這是與 BPE 的另一個區別,BPE 只會將不在詞彙表中的單個字符分類為未知。
 
-<Tip>
-
-✏️ **現在輪到你了!** `"pugs"` 將被如何標記?
-
-</Tip>
+> [!TIP]
+> ✏️ **現在輪到你了!** `"pugs"` 將被如何標記?
 
 ## 實現 WordPiece
 
@@ -313,11 +302,8 @@ print(vocab)
 
 正如我們所看到的,與 BPE 相比,這個標記器將單詞的一部分作為標記學習得更快一些。
 
-<Tip>
-
-💡 在同一語料庫上使用 `train_new_from_iterator()` 不會產生完全相同的詞彙表。這是因為 🤗 Tokenizers 庫沒有為訓練實現 WordPiece(因為我們不完全確定它的內部結構),而是使用 BPE。
-
-</Tip>
+> [!TIP]
+> 💡 在同一語料庫上使用 `train_new_from_iterator()` 不會產生完全相同的詞彙表。這是因為 🤗 Tokenizers 庫沒有為訓練實現 WordPiece(因為我們不完全確定它的內部結構),而是使用 BPE。
 
 為了對新文本進行分詞,我們對其進行預分詞、拆分,然後對每個單詞應用分詞算法。也就是說,我們從第一個詞的開頭尋找最大的子詞並將其拆分,然後我們在第二部分重複這個過程,對於該詞的其餘部分和文本中的以下詞,依此類推:
 
diff --git a/chapters/zh-TW/chapter6/7.mdx b/chapters/zh-TW/chapter6/7.mdx
index 95e013cb8..47a9cd754 100644
--- a/chapters/zh-TW/chapter6/7.mdx
+++ b/chapters/zh-TW/chapter6/7.mdx
@@ -11,11 +11,8 @@
 
 <Youtube id="TGZfZVuF9Yc"/>
 
-<Tip>
-
-💡 本節深入介紹了 Unigram,甚至展示了一個完整的實現。如果你只想大致瞭解標記化算法,可以跳到最後。
-
-</Tip>
+> [!TIP]
+> 💡 本節深入介紹了 Unigram,甚至展示了一個完整的實現。如果你只想大致瞭解標記化算法,可以跳到最後。
 
 ## 訓練算法
 
@@ -56,11 +53,8 @@ Unigram 模型是一種語言模型,它認為每個標記都獨立於它之前
 
 所以,所有頻率之和為210, 並且子詞 `"ug"` 出現的概率是 20/210。
 
-<Tip>
-
-✏️ **現在輪到你了!** 編寫代碼來計算上面的頻率,並仔細檢查顯示的結果以及總和是否正確。
-
-</Tip>
+> [!TIP]
+> ✏️ **現在輪到你了!** 編寫代碼來計算上面的頻率,並仔細檢查顯示的結果以及總和是否正確。
 
 現在，為了對給定的單詞進行標記，我們將所有可能的分割視為標記，並根據 Unigram 模型計算每個分割的概率。由於所有標記都被認為是獨立的，所以這個概率只是每個標記概率的乘積。例如 `"pug"` 的標記化 `["p", "u", "g"]` 的概率為:
 
@@ -98,11 +92,8 @@ Character 4 (g): "un" "hug" (score 0.005442)
 
 因此 `"unhug"` 將被標記為 `["un", "hug"]`。
 
-<Tip>
-
-✏️ **現在輪到你了!** 確定單詞 `"huggun"` 的標記化及其分數。
-
-</Tip>
+> [!TIP]
+> ✏️ **現在輪到你了!** 確定單詞 `"huggun"` 的標記化及其分數。
 
 ## 回到訓練
 
@@ -215,11 +206,8 @@ token_freqs = list(char_freqs.items()) + sorted_subwords[: 300 - len(char_freqs)
 token_freqs = {token: freq for token, freq in token_freqs}
 ```
 
-<Tip>
-
-💡 SentencePiece 使用一種稱為增強後綴數組(ESA)的更高效算法來創建初始詞彙表。
-
-</Tip>
+> [!TIP]
+> 💡 SentencePiece 使用一種稱為增強後綴數組(ESA)的更高效算法來創建初始詞彙表。
 
 接下來,我們計算所有頻率的總和,將頻率轉換為概率。對於我們的模型,我們將存儲概率的對數,因為添加對數比乘以小數在數值上更穩定,這將簡化模型損失的計算:
 
@@ -340,11 +328,8 @@ print(scores["his"])
 0.0
 ```
 
-<Tip>
-
-💡 這種方法非常低效,因此 SentencePiece 使用了沒有標記 X 的模型損失的近似值:它不是從頭開始,而是通過其在剩餘詞彙表中的分段替換標記 X。這樣,所有分數可以與模型損失同時計算。
-
-</Tip>
+> [!TIP]
+> 💡 這種方法非常低效,因此 SentencePiece 使用了沒有標記 X 的模型損失的近似值:它不是從頭開始,而是通過其在剩餘詞彙表中的分段替換標記 X。這樣,所有分數可以與模型損失同時計算。
 
 完成所有這些後,我們需要做的最後一件事是將模型使用的特殊標記添加到詞彙表中,然後循環直到我們從詞彙表中修剪了足夠的標記以達到我們想要的大小:
 
diff --git a/chapters/zh-TW/chapter6/8.mdx b/chapters/zh-TW/chapter6/8.mdx
index c61e14d7f..2182b0da9 100644
--- a/chapters/zh-TW/chapter6/8.mdx
+++ b/chapters/zh-TW/chapter6/8.mdx
@@ -111,12 +111,9 @@ print(tokenizer.normalizer.normalize_str("Héllò hôw are ü?"))
 hello how are u?
 ```
 
-<Tip>
-
-**更進一步**如果您在包含 unicode 字符的字符串上測試先前normalizers的兩個版本，您肯定會注意到這兩個normalizers並不完全等效。
-為了不過度使用 `normalizers.Sequence` 使版本過於複雜，我們沒有包含當 `clean_text` 參數設置為 `True` 時 `BertNormalizer` 需要的正則表達式替換 - 這是默認行為。 但不要擔心：通過在normalizer序列中添加兩個 `normalizers.Replace` 可以在不使用方便的 `BertNormalizer` 的情況下獲得完全相同的規範化。
-
-</Tip>
+> [!TIP]
+> **更進一步**如果您在包含 unicode 字符的字符串上測試先前normalizers的兩個版本，您肯定會注意到這兩個normalizers並不完全等效。
+> 為了不過度使用 `normalizers.Sequence` 使版本過於複雜，我們沒有包含當 `clean_text` 參數設置為 `True` 時 `BertNormalizer` 需要的正則表達式替換 - 這是默認行為。 但不要擔心：通過在normalizer序列中添加兩個 `normalizers.Replace` 可以在不使用方便的 `BertNormalizer` 的情況下獲得完全相同的規範化。
 
 接下來是預標記步驟。 同樣，我們可以使用一個預構建的“BertPreTokenizer”：
 
diff --git a/chapters/zh-TW/chapter7/1.mdx b/chapters/zh-TW/chapter7/1.mdx
index 038fcc3bd..3a8a2d142 100644
--- a/chapters/zh-TW/chapter7/1.mdx
+++ b/chapters/zh-TW/chapter7/1.mdx
@@ -26,8 +26,5 @@
 {/if}
 
 
-<Tip>
-
-如果您按順序閱讀這些部分，您會注意到它們有很多共同的代碼和陳述。 重複是有意為之的，讓您可以深入（或稍後返回）任何您感興趣的任務並找到一個完整的工作示例。
-
-</Tip>
+> [!TIP]
+> 如果您按順序閱讀這些部分，您會注意到它們有很多共同的代碼和陳述。 重複是有意為之的，讓您可以深入（或稍後返回）任何您感興趣的任務並找到一個完整的工作示例。
diff --git a/chapters/zh-TW/chapter7/2.mdx b/chapters/zh-TW/chapter7/2.mdx
index 3652439a7..9a18f2625 100644
--- a/chapters/zh-TW/chapter7/2.mdx
+++ b/chapters/zh-TW/chapter7/2.mdx
@@ -46,11 +46,8 @@
 
 首先，我們需要一個適合標記分類的數據集。在本節中，我們將使用[CoNLL-2003 數據集](https://huggingface.co/datasets/conll2003), 其中包含來自路透社的新聞報道。
 
-<Tip>
-
-💡 只要您的數據集由帶有相應標籤的分割成單詞並的文本組成，您就能夠將這裡描述的數據處理過程應用到您自己的數據集。如果需要複習如何在.Dataset中加載自定義數據，請參閱[Chapter 5](/course/chapter5)。
-
-</Tip>
+> [!TIP]
+> 💡 只要您的數據集由分割成單詞並帶有相應標籤的文本組成，您就能夠將這裡描述的數據處理過程應用到您自己的數據集。如果需要複習如何在 `Dataset` 中加載自定義數據，請參閱[Chapter 5](/course/chapter5)。
 
 ### CoNLL-2003 數據集
 
@@ -168,11 +165,8 @@ print(line2)
 
 正如我們所看到的，跨越兩個單詞的實體，如“European Union”和“Werner Zwingmann”，模型為第一個單詞標註了一個B-標籤，為第二個單詞標註了一個I-標籤。
 
-<Tip>
-
-✏️ **輪到你了！** 使用 POS 或chunking標籤識別同一個句子。
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了！** 使用 POS 或chunking標籤識別同一個句子。
 
 ### 處理數據
 
@@ -263,11 +257,8 @@ print(align_labels_with_tokens(labels, word_ids))
 
 正如我們所看到的，我們的函數為開頭和結尾的兩個特殊標記添加了  `-100` ，併為分成兩個標記的單詞添加了一個新的`0` 。
 
-<Tip>
-
-✏️ **輪到你了！** 一些研究人員更喜歡每個詞只歸屬一個標籤, 並分配 `-100` 給定詞中的其他子標記。這是為了避免分解成大量子標記的長詞對損失造成嚴重影響。按照此規則更改前一個函數使標籤與輸入id對齊。
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了！** 一些研究人員更喜歡每個詞只歸屬一個標籤, 並將 `-100` 分配給該詞中的其他子標記。這是為了避免分解成大量子標記的長詞對損失造成嚴重影響。按照此規則更改前一個函數使標籤與輸入id對齊。
 
 為了預處理我們的整個數據集，我們需要標記所有輸入並在所有標籤上應用 `align_labels_with_tokens()` 。為了利用我們的快速分詞器的速度優勢，最好同時對大量文本進行分詞，因此我們將編寫一個處理示例列表的函數並使用帶 `batched=True` 有選項的 `Dataset.map()`方法 .與我們之前的示例唯一不同的是當分詞器的輸入是文本列表（或者像例子中的單詞列表）時  `word_ids()` 函數需要獲取我們想要單詞的索引的ID，所以我們也添加它：
 
@@ -429,11 +420,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ 如果您的模型標籤數量錯誤，則在稍後調用 `model.fit()` 時將收到一個模糊的錯誤。調試起來可能很煩人，因此請確保執行此檢查以確認您具有預期的標籤數。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果您的模型標籤數量錯誤，則在稍後調用 `model.fit()` 時將收到一個模糊的錯誤。調試起來可能很煩人，因此請確保執行此檢查以確認您具有預期的標籤數。
 
 ### 微調模型
 
@@ -497,11 +485,8 @@ model.fit(
 
 您之前已經看過其中的大部分內容：我們設置了一些超參數（例如學習率、要訓練的 epoch 數和權重衰減），然後我們指定 `push_to_hub=True` 表明我們想要保存模型並在每個時期結束時對其進行評估，並且我們想要將我們的結果上傳到模型中心。請注意，可以使用hub_model_id參數指定要推送到的存儲庫的名稱(特別是，必須使用這個參數來推送到一個組織)。例如，當我們將模型推送到[`huggingface-course` organization](https://huggingface.co/huggingface-course), 我們添加了 `hub_model_id=huggingface-course/bert-finetuned-ner` 到 `TrainingArguments` .默認情況下，使用的存儲庫將在您的命名空間中並以您設置的輸出目錄命名，因此在我們的例子中它將是 `sgugger/bert-finetuned-ner` .
 
-<Tip>
-
-💡 如果您正在使用的輸出目錄已經存在，那麼輸出目錄必須是從同一個存儲庫clone下來的。如果不是，您將在聲明 `model.fit()` 時遇到錯誤，並且需要設置一個新名稱。
-
-</Tip>
+> [!TIP]
+> 💡 如果您正在使用的輸出目錄已經存在，那麼輸出目錄必須是從同一個存儲庫clone下來的。如果不是，您將在調用 `model.fit()` 時遇到錯誤，並且需要設置一個新名稱。
 
 請注意，當訓練發生時，每次保存模型時（這裡是每個epooch），它都會在後臺上傳到 Hub。這樣，如有必要，您將能夠在另一臺機器上繼續您的訓練。
 
@@ -679,11 +664,8 @@ model.config.num_labels
 9
 ```
 
-<Tip warning={true}>
-
-⚠️ 如果模型的標籤數量錯誤，稍後調用Trainer.train()方法時會出現一個模糊的錯誤（類似於“CUDA error: device-side assert triggered”）。這是用戶報告此類錯誤的第一個原因，因此請確保進行這樣的檢查以確認您擁有預期數量的標籤。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果模型的標籤數量錯誤，稍後調用Trainer.train()方法時會出現一個模糊的錯誤（類似於“CUDA error: device-side assert triggered”）。這是用戶報告此類錯誤的第一個原因，因此請確保進行這樣的檢查以確認您擁有預期數量的標籤。
 
 ### 微調模型
 
@@ -718,11 +700,8 @@ args = TrainingArguments(
 
 您之前已經看過其中的大部分內容：我們設置了一些超參數（例如學習率、要訓練的 epoch 數和權重衰減），然後我們指定 `push_to_hub=True` 表明我們想要保存模型並在每個時期結束時對其進行評估，並且我們想要將我們的結果上傳到模型中心。請注意，可以使用hub_model_id參數指定要推送到的存儲庫的名稱(特別是，必須使用這個參數來推送到一個組織)。例如，當我們將模型推送到[`huggingface-course` organization](https://huggingface.co/huggingface-course), 我們添加了 `hub_model_id=huggingface-course/bert-finetuned-ner` 到 `TrainingArguments` 。默認情況下，使用的存儲庫將在您的命名空間中並以您設置的輸出目錄命名，因此在我們的例子中它將是 `sgugger/bert-finetuned-ner`。
 
-<Tip>
-
-💡 如果您正在使用的輸出目錄已經存在，那麼輸出目錄必須是從同一個存儲庫clone下來的。如果不是，您將在聲明 `Trainer` 時遇到錯誤，並且需要設置一個新名稱。
-
-</Tip>
+> [!TIP]
+> 💡 如果您正在使用的輸出目錄已經存在，那麼輸出目錄必須是從同一個存儲庫clone下來的。如果不是，您將在聲明 `Trainer` 時遇到錯誤，並且需要設置一個新名稱。
 
 最後，我們只是將所有內容傳遞給 `Trainer` 並啟動訓練：
 
@@ -808,11 +787,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 如果您在 TPU 上進行訓練，則需要將以上單元格中的所有代碼移動到專用的訓練函數中。有關詳細信息，請參閱 [第3章](/course/chapter3)。
-
-</Tip>
+> [!TIP]
+> 🚨 如果您在 TPU 上進行訓練，則需要將以上單元格中的所有代碼移動到專用的訓練函數中。有關詳細信息，請參閱 [第3章](/course/chapter3)。
 
 現在我們已經發送了我們的 `train_dataloader` 到 `accelerator.prepare()` ，我們可以使用它的長度來計算訓練步驟的數量。請記住，我們應該始終在準備好dataloader後執行此操作，因為該方法會改變其長度。我們使用經典線性學習率調度：
 
diff --git a/chapters/zh-TW/chapter7/3.mdx b/chapters/zh-TW/chapter7/3.mdx
index 7f6e95008..fcc92b592 100644
--- a/chapters/zh-TW/chapter7/3.mdx
+++ b/chapters/zh-TW/chapter7/3.mdx
@@ -42,11 +42,8 @@
 
 <Youtube id="mqElG5QJWUg"/>
 
-<Tip>
-
-🙋 如果您對“掩碼語言建模”和“預訓練模型”這兩個術語感到陌生, 請查看[第一章](/course/chapter1), 我們在其中解釋了所有這些核心概念, 並附有視頻!
-
-</Tip>
+> [!TIP]
+> 🙋 如果您對“掩碼語言建模”和“預訓練模型”這兩個術語感到陌生, 請查看[第一章](/course/chapter1), 我們在其中解釋了所有這些核心概念, 並附有視頻!
 
 ## 選擇用於掩碼語言建模的預訓練模型
 
@@ -239,11 +236,8 @@ for row in sample:
 
 是的, 這些肯定是電影評論, 如果你年齡足夠,你甚至可能會理解上次評論中關於擁有 VHS 版本的評論😜! 雖然我們不需要語言建模的標籤, 但我們已經可以看到 `0` 表示負面評論, 而 `1` 對應正面。
 
-<Tip>
-
-✏️ **試試看!** 創建 `無監督` 拆分的隨機樣本, 並驗證標籤既不是 `0` 也不是 `1`。在此過程中, 你還可以檢查 `train` 和 `test` 拆分中的標籤是否確實為 `0` 或 `1` -- 這是每個 NLP 從業者在新項目開始時都應該執行的有用的健全性檢查!
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 創建 `unsupervised` 拆分的隨機樣本, 並驗證標籤既不是 `0` 也不是 `1`。在此過程中, 你還可以檢查 `train` 和 `test` 拆分中的標籤是否確實為 `0` 或 `1` -- 這是每個 NLP 從業者在新項目開始時都應該執行的有用的健全性檢查!
 
 現在我們已經快速瀏覽了數據, 讓我們深入研究為掩碼語言建模做準備。正如我們將看到的, 與我們在[第三章](/course/chapter3)中看到的序列分類任務相比, 還需要採取一些額外的步驟。讓我們繼續!
 
@@ -301,11 +295,8 @@ tokenizer.model_max_length
 
 該值來自於與檢查點相關聯的 *tokenizer_config.json* 文件; 在這種情況下, 我們可以看到上下文大小是 512 個標記, 就像 BERT 一樣。
 
-<Tip>
-
-✏️ **試試看!** 一些 Transformer 模型, 例如 [BigBird](https://huggingface.co/google/bigbird-roberta-base) 和 [Longformer](hf.co/allenai/longformer-base-4096), 它們具有比BERT和其他早期Transformer模型更長的上下文長度。為這些檢查點之一實例化標記器, 並驗證 `model_max_length` 是否與模型卡上引用的內容一致。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 一些 Transformer 模型, 例如 [BigBird](https://huggingface.co/google/bigbird-roberta-base) 和 [Longformer](https://huggingface.co/allenai/longformer-base-4096), 具有比 BERT 和其他早期 Transformer 模型更長的上下文長度。為這些檢查點之一實例化標記器, 並驗證 `model_max_length` 是否與模型卡上引用的內容一致。
 
 因此, 以便在像Google Colab 那樣的 GPU 上運行我們的實驗, 我們將選擇可以放入內存的更小一些的東西:
 
@@ -313,11 +304,8 @@ tokenizer.model_max_length
 chunk_size = 128
 ```
 
-<Tip warning={true}>
-
-請注意, 在實際場景中使用較小的塊大小可能是有害的, 因此你應該使用與將應用模型的用例相對應的大小。
-
-</Tip>
+> [!WARNING]
+> 請注意, 在實際場景中使用較小的塊大小可能是有害的, 因此你應該使用與將應用模型的用例相對應的大小。
 
 有趣的來了。為了展示串聯是如何工作的, 讓我們從我們的標記化訓練集中取一些評論並打印出每個評論的標記數量:
 
@@ -474,11 +462,8 @@ for chunk in data_collator(samples)["input_ids"]:
 
 很棒, 成功了! 我們可以看到, `[MASK]` 標記已隨機插入我們文本中的不同位置。 這些將是我們的模型在訓練期間必須預測的標記 -- 數據整理器的美妙之處在於, 它將隨機化每個批次的 `[MASK]` 插入! 
 
-<Tip>
-
-✏️ **試試看!** 多次運行上面的代碼片段, 看看隨機屏蔽發生在你眼前! 還要將 `tokenizer.decode()` 方法替換為 `tokenizer.convert_ids_to_tokens()` 以查看有時會屏蔽給定單詞中的單個標記, 而不是其他標記。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 多次運行上面的代碼片段, 看看隨機屏蔽發生在你眼前! 還要將 `tokenizer.decode()` 方法替換為 `tokenizer.convert_ids_to_tokens()` 以查看有時會屏蔽給定單詞中的單個標記, 而不是其他標記。
 
 {#if fw === 'pt'}
 
@@ -588,11 +573,8 @@ for chunk in batch["input_ids"]:
 '>>> .... [MASK] [MASK] [MASK] [MASK]....... high. a classic line : inspector : i\'m here to sack one of your teachers. student : welcome to bromwell high. i expect that many adults of my age think that bromwell high is far fetched. what a pity that it isn\'t! [SEP] [CLS] homelessness ( or houselessness as george carlin stated ) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. most people think of the homeless'
 ```
 
-<Tip>
-
-✏️ **試試看!** 多次運行上面的代碼片段, 看看隨機屏蔽發生在你眼前! 還要將 `tokenizer.decode()` 方法替換為 `tokenizer.convert_ids_to_tokens()` 以查看來自給定單詞的標記始終被屏蔽在一起。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 多次運行上面的代碼片段, 看看隨機屏蔽發生在你眼前! 還要將 `tokenizer.decode()` 方法替換為 `tokenizer.convert_ids_to_tokens()` 以查看來自給定單詞的標記始終被屏蔽在一起。
 
 現在我們有兩個數據整理器, 其餘的微調步驟是標準的。如果您沒有足夠幸運地獲得神話般的 P100 GPU 😭, 在 Google Colab 上進行訓練可能需要一段時間, 因此我們將首先將訓練集的大小縮減為幾千個示例。別擔心, 我們仍然會得到一個相當不錯的語言模型! 在 🤗 Datasets 中快速下采樣數據集的方法是通過我們在 [第五章](/course/chapter5) 中看到的 `Dataset.train_test_split()` 函數:
 
@@ -816,11 +798,8 @@ trainer.push_to_hub()
 
 {/if}
 
-<Tip>
-
-✏️ **輪到你了!** 將數據整理器改為全字屏蔽整理器後運行上面的訓練。你有得到更好的結果嗎?
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了!** 將數據整理器改為全字屏蔽整理器後運行上面的訓練。你有得到更好的結果嗎?
 
 {#if fw === 'pt'} 
 
@@ -1038,8 +1017,5 @@ for pred in preds:
 
 這結束了我們訓練語言模型的第一個實驗。在 [第六節](/course/chapter7/section6)中你將學習如何從頭開始訓練像 GPT-2 這樣的自迴歸模型; 如果你想了解如何預訓練您自己的 Transformer 模型, 請前往那裡!
 
-<Tip>
-
-✏️ **試試看!** 為了量化域適應的好處, 微調 IMDb 標籤上的分類器和預先訓練和微調的Distil BERT檢查點。如果你需要複習文本分類, 請查看 [第三章](/course/chapter3)。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 為了量化域適應的好處, 請分別基於預訓練的和微調後的 DistilBERT 檢查點, 在 IMDb 標籤上微調分類器。如果你需要複習文本分類, 請查看 [第三章](/course/chapter3)。
diff --git a/chapters/zh-TW/chapter7/4.mdx b/chapters/zh-TW/chapter7/4.mdx
index 667b463de..986dad63e 100644
--- a/chapters/zh-TW/chapter7/4.mdx
+++ b/chapters/zh-TW/chapter7/4.mdx
@@ -156,11 +156,8 @@ translator(
 
 <Youtube id="0Oxphw4Q9fo"/>
 
-<Tip>
-
-✏️ **輪到你了！** 另一個在法語中經常使用的英語單詞是“email”。在訓練數據集中找到使用這個詞的第一個樣本。它是如何翻譯的？預訓練模型如何翻譯同一個英文句子？
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了！** 另一個在法語中經常使用的英語單詞是“email”。在訓練數據集中找到使用這個詞的第一個樣本。它是如何翻譯的？預訓練模型如何翻譯同一個英文句子？
 
 ### 處理數據
 
@@ -177,11 +174,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, return_tensors="tf")
 
 您可以將 **model_checkpoint** 更換為[Hub](https://huggingface.co/models)上你喜歡的任何其他型號，或本地保存的預訓練模型和標記器。
 
-<Tip>
-
-💡 如果正在使用mart、mBART-50或M2 M100等多語言標記器，則需要在tokenizer中設置tokenizer.src_lang和tokenizer.tgt_lang為正確的輸入和目標的語言代碼。
-
-</Tip>
+> [!TIP]
+> 💡 如果正在使用 mBART、mBART-50 或 M2M100 等多語言標記器，則需要在 tokenizer 中將 `tokenizer.src_lang` 和 `tokenizer.tgt_lang` 設置為正確的輸入和目標語言代碼。
 
 我們的數據準備非常簡單。 只需要記住一件事：您照常處理輸入，但對於這次的輸出目標，您需要將標記器包裝在上下文管理器“as_target_tokenizer()”中。
 
@@ -244,17 +238,11 @@ def preprocess_function(examples):
 
 請注意，我們為輸入和輸出設置了相同的最大長度。由於我們處理的文本看起來很短，我們使用 128。
 
-<Tip>
+> [!TIP]
+> 💡 如果你使用的是 T5 模型（更具體地說，是 `t5-xxx` 檢查點之一），模型將需要文本輸入有一個前綴來表示正在進行的任務，例如從英語到法語的翻譯。
 
-💡如果你使用的是T5模型(更具體地說，是T5 -xxx檢查點之一)，模型將需要文本輸入有一個前綴來表示正在進行的任務，例如從英語到法語的翻譯
-
-</Tip>
-
-<Tip warning={true}>
-
-⚠️ 我們不關注目標的注意力掩碼，因為模型不會需要它。相反，對應於填充標記的標籤應設置為-100，以便在loss計算中忽略它們。這將在稍後由我們的數據整理器完成，因為我們正在應用動態填充，但是如果您在此處使用填充，您應該調整預處理函數以將與填充標記對應的所有標籤設置為 -100。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 我們不關注目標的注意力掩碼，因為模型不會需要它。相反，對應於填充標記的標籤應設置為-100，以便在loss計算中忽略它們。這將在稍後由我們的數據整理器完成，因為我們正在應用動態填充，但是如果您在此處使用填充，您應該調整預處理函數以將與填充標記對應的所有標籤設置為 -100。
 
 我們現在可以對數據集的所有數據一次性應用該預處理：
 
@@ -643,11 +631,8 @@ model.fit(
 
 請注意，您可以使用 `hub_model_id` 參數指定要推送到的存儲庫的名稱（當您想把模型推送到指定的組織的時候，您也必須使用此參數）。 例如，當我們將模型推送到 [`huggingface-course` 組織](https://huggingface.co/huggingface-course) 時，我們添加了 `hub_model_id="huggingface-course/marian-finetuned-kde4-en- to-fr"` 到 `Seq2SeqTrainingArguments`。 默認情況下，使用的存儲庫將在您的命名空間中，並以您設置的輸出目錄命名，因此這裡將是 `"sgugger/marian-finetuned-kde4-en-to-fr"`。
 
-<Tip>
-
-💡如果您使用的輸出目錄已經存在，則它需要是您要推送到的存儲庫的本地克隆。如果不是，您將在定義您的名稱時會遇到錯誤，並且需要設置一個新名稱。
-
-</Tip>
+> [!TIP]
+> 💡 如果您使用的輸出目錄已經存在，則它需要是您要推送到的存儲庫的本地克隆。如果不是，您將在調用 `model.fit()` 時遇到錯誤，並且需要設置一個新名稱。
 
 最後，讓我們看看訓練結束後我們的指標是什麼樣的：
 
@@ -693,11 +678,8 @@ args = Seq2SeqTrainingArguments(
 
 請注意，您可以使用 `hub_model_id` 參數指定要推送到的存儲庫的名稱（當您想把模型推送到指定的組織的時候，您也必須使用此參數）。 例如，當我們將模型推送到 [`huggingface-course` 組織](https://huggingface.co/huggingface-course) 時，我們添加了 `hub_model_id="huggingface-course/marian-finetuned-kde4-en- to-fr"` 到 `Seq2SeqTrainingArguments`。 默認情況下，使用的存儲庫將在您的命名空間中，並以您設置的輸出目錄命名，因此這裡將是 `"sgugger/marian-finetuned-kde4-en-to-fr"`。
 
-<Tip>
-
-💡如果您使用的輸出目錄已經存在，則它需要是您要推送到的存儲庫的本地克隆。如果不是，您將在定義您的名稱時會遇到錯誤，並且需要設置一個新名稱。
-
-</Tip>
+> [!TIP]
+> 💡 如果您使用的輸出目錄已經存在，則它需要是您要推送到的存儲庫的本地克隆。如果不是，您將在定義 `Seq2SeqTrainer` 時遇到錯誤，並且需要設置一個新名稱。
 
 
 最後，我們需要將所有內容傳遞給 **Seq2SeqTrainer** ：
@@ -989,8 +971,5 @@ translator(
 
 風格適應的另一個很好的例子！
 
-<Tip>
-
-✏️ **輪到你了！** “電子郵件”這個詞在模型返回了什麼？
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了！** 對於之前找到的包含“email”一詞的樣本，模型返回了什麼？
diff --git a/chapters/zh-TW/chapter7/5.mdx b/chapters/zh-TW/chapter7/5.mdx
index 49f6cdd4a..edd33b275 100644
--- a/chapters/zh-TW/chapter7/5.mdx
+++ b/chapters/zh-TW/chapter7/5.mdx
@@ -87,11 +87,8 @@ show_samples(english_dataset)
 '>> Review: Bought this for handling miscellaneous aircraft parts and hanger "stuff" that I needed to organize; it really fit the bill. The unit arrived quickly, was well packaged and arrived intact (always a good sign). There are five wall mounts-- three on the top and two on the bottom. I wanted to mount it on the wall, so all I had to do was to remove the top two layers of plastic drawers, as well as the bottom corner drawers, place it when I wanted and mark it; I then used some of the new plastic screw in wall anchors (the 50 pound variety) and it easily mounted to the wall. Some have remarked that they wanted dividers for the drawers, and that they made those. Good idea. My application was that I needed something that I can see the contents at about eye level, so I wanted the fuller-sized drawers. I also like that these are the new plastic that doesn\'t get brittle and split like my older plastic drawers did. I like the all-plastic construction. It\'s heavy duty enough to hold metal parts, but being made of plastic it\'s not as heavy as a metal frame, so you can easily mount it to the wall and still load it up with heavy stuff, or light stuff. No problem there. For the money, you can\'t beat it. Best one of these I\'ve bought to date-- and I\'ve been using some version of these for over forty years.'
 ```
 
-<Tip>
-
-✏️ **試試看！** 更改 `Dataset.shuffle()` 命令中的隨機種子以探索語料庫中的其他評論。 如果您是說西班牙語的人，請查看 `spanish_dataset` 中的一些評論，看看標題是否也像合理的摘要。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 更改 `Dataset.shuffle()` 命令中的隨機種子以探索語料庫中的其他評論。 如果您是說西班牙語的人，請查看 `spanish_dataset` 中的一些評論，看看標題是否也像合理的摘要。
 
 此示例顯示了人們通常在網上找到的評論的多樣性，從正面到負面（以及介於兩者之間的所有內容！）。儘管標題為“meh”的示例信息量不大，但其他標題看起來像是對評論本身的體面總結。在單個 GPU 上訓練所有 400,000 條評論的摘要模型將花費太長時間，因此我們將專注於為單個產品領域生成摘要。為了瞭解我們可以選擇哪些域，讓我們將 **english_dataset** 轉換到 **pandas.DataFrame** 並計算每個產品類別的評論數量：
 
@@ -229,11 +226,8 @@ books_dataset = books_dataset.filter(lambda x: len(x["review_title"].split()) >
 
 mT5 不使用前綴，但具有 T5 的大部分功能，並且具有多語言的優勢。現在我們已經選擇了一個模型，讓我們來看看準備我們的訓練數據。
 
-<Tip>
-
-✏️ **試試看！** 完成本節後，通過使用相同的技術對 mBART 進行微調，看看 mT5 與 mBART 相比有多好。 對於獎勵積分，您還可以嘗試僅在英文評論上微調 T5。 由於 T5 需要一個特殊的前綴提示，因此您需要在下面的預處理步驟中將“summarize:”添加到輸入示例中。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 完成本節後，通過使用相同的技術對 mBART 進行微調，看看 mT5 與 mBART 相比有多好。 對於獎勵積分，您還可以嘗試僅在英文評論上微調 T5。 由於 T5 需要一個特殊的前綴提示，因此您需要在下面的預處理步驟中將“summarize:”添加到輸入示例中。
 
 ## 預處理數據
 
@@ -248,11 +242,8 @@ model_checkpoint = "google/mt5-small"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 ```
 
-<Tip>
-
-💡在 NLP 項目的早期階段，一個好的做法是在小樣本數據上訓練一類“小”模型。這使您可以更快地調試和迭代端到端工作流。一旦您對結果充滿信心，您始終可以通過簡單地更改模型檢查點來在大規模數據上訓練模型！
-
-</Tip>
+> [!TIP]
+> 💡在 NLP 項目的早期階段，一個好的做法是在小樣本數據上訓練一類“小”模型。這使您可以更快地調試和迭代端到端工作流。一旦您對結果充滿信心，您始終可以通過簡單地更改模型檢查點來在大規模數據上訓練模型！
 
 讓我們在一個小例子上測試 mT5 標記器：
 
@@ -308,11 +299,8 @@ tokenized_datasets = books_dataset.map(preprocess_function, batched=True)
 
 既然語料庫已經預處理完畢，我們來看看一些常用的摘要指標。正如我們將看到的，在衡量機器生成的文本的質量方面沒有靈丹妙藥。
 
-<Tip>
-
-💡 你可能已經注意到我們在上面的 `Dataset.map()` 函數中使用了 `batched=True`。 這會以 1,000 個（默認）為單位對示例進行編碼，並允許您利用 🤗 Transformers 中快速標記器的多線程功能。 在可能的情況下，嘗試使用 `batched=True` 來加速您的預處理！
-
-</Tip>
+> [!TIP]
+> 💡 你可能已經注意到我們在上面的 `Dataset.map()` 函數中使用了 `batched=True`。 這會以 1,000 個（默認）為單位對示例進行編碼，並允許您利用 🤗 Transformers 中快速標記器的多線程功能。 在可能的情況下，嘗試使用 `batched=True` 來加速您的預處理！
 
 
 ## 文本摘要的指標
@@ -329,11 +317,8 @@ reference_summary = "I loved reading the Hunger Games"
 ```
 比較它們的一種方法是計算重疊單詞的數量，在這種情況下為 6。但是，這有點粗糙，因此 ROUGE 是基於計算計算重疊的 _precision_ 和 _recall_ 分數。。
 
-<Tip>
-
-🙋 如果這是您第一次聽說精確率和召回率，請不要擔心——我們將一起通過一些明確的示例來說明一切。 這些指標通常在分類任務中遇到，因此如果您想了解在該上下文中如何定義精確度和召回率，我們建議查看 scikit-learn [指南](https://scikit-learn.org/stable /auto_examples/model_selection/plot_precision_recall.html）。
-
-</Tip>
+> [!TIP]
+> 🙋 如果這是您第一次聽說精確率和召回率，請不要擔心——我們將一起通過一些明確的示例來說明一切。 這些指標通常在分類任務中遇到，因此如果您想了解在該上下文中如何定義精確率和召回率，我們建議查看 scikit-learn [指南](https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html)。
 
 對於 ROUGE，recall 衡量生成的參考摘要包含了多少參考摘要。如果我們只是比較單詞，recall可以根據以下公式計算：
 
@@ -384,11 +369,8 @@ Score(precision=0.86, recall=1.0, fmeasure=0.92)
 ```
 太好了，準確率和召回率匹配了！那麼其他的 ROUGE 分數呢？ **rouge2** 測量二元組之間的重疊（想想單詞對的重疊），而 **rougeL** 和 **rougeLsum** 通過在生成的和參考摘要中查找最長的公共子串來測量最長的單詞匹配序列。中的“總和” **rougeLsum** 指的是這個指標是在整個摘要上計算的，而 **rougeL** 計算為單個句子的平均值。
 
-<Tip>
-
-    ✏️ **試試看！** 創建您自己的生成和參考摘要示例，並查看生成的 ROUGE 分數是否與基於精確度和召回率公式的手動計算一致。 對於附加分，將文本拆分為二元組並比較“rouge2”指標的精度和召回率。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 創建您自己的生成和參考摘要示例，並查看生成的 ROUGE 分數是否與基於精確度和召回率公式的手動計算一致。 對於附加分，將文本拆分為二元組並比較“rouge2”指標的精度和召回率。
 
 我們將使用這些 ROUGE 分數來跟蹤我們模型的性能，但在此之前，讓我們做每個優秀的 NLP 從業者都應該做的事情：創建一個強大而簡單的baseline！
 
@@ -477,12 +459,9 @@ model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
 
 {/if}
 
-<Tip>
-
-💡 If you're wondering why you don't see any warnings about fine-tuning the model on a downstream task, that's because for sequence-to-sequence tasks we keep all the weights of the network. Compare this to our text classification model in [Chapter 3](/course/chapter3), where the head of the pretrained model was replaced with a randomly initialized network.
-💡 如果您想知道為什麼在下游任務中沒有看到任何關於微調模型的警告，那是因為對於序列到序列的任務，我們保留了網絡的所有權重。與我們在[第三章] (/course/chapter3)中的文本分類模型進行比較，文本分類模型預訓練模型的頭部被隨機初始化的網絡替換。
-
-</Tip>
+> [!TIP]
+> 💡 If you're wondering why you don't see any warnings about fine-tuning the model on a downstream task, that's because for sequence-to-sequence tasks we keep all the weights of the network. Compare this to our text classification model in [Chapter 3](/course/chapter3), where the head of the pretrained model was replaced with a randomly initialized network.
+> 💡 如果您想知道為什麼在下游任務中沒有看到任何關於微調模型的警告，那是因為對於序列到序列的任務，我們保留了網絡的所有權重。與我們在[第三章](/course/chapter3)中的文本分類模型進行比較，文本分類模型預訓練模型的頭部被隨機初始化的網絡替換。
 
 我們需要做的下一件事是登錄 Hugging Face Hub。如果您在notebook中運行此代碼，則可以使用以下實用程序函數執行此操作：
 
@@ -820,11 +799,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨如果您在 TPU 上進行訓練，則需要將上述所有代碼移動到專門的訓練函數中。有關詳細信息，請參閱[第三章](/course/chapter3)。
-
-</Tip>
+> [!TIP]
+> 🚨如果您在 TPU 上進行訓練，則需要將上述所有代碼移動到專門的訓練函數中。有關詳細信息，請參閱[第三章](/course/chapter3)。
 
 現在我們已經準備好了我們索要用的對象，還有三件事要做：
 
diff --git a/chapters/zh-TW/chapter7/6.mdx b/chapters/zh-TW/chapter7/6.mdx
index 49a820da5..ac665f315 100644
--- a/chapters/zh-TW/chapter7/6.mdx
+++ b/chapters/zh-TW/chapter7/6.mdx
@@ -128,11 +128,8 @@ DatasetDict({
 })
 ```
 
-<Tip>
-
-預訓練語言模型需要一段時間。我們建議您首先通過取消註釋以上兩行的註釋對數據樣本運行訓練循環，並確保訓練成功完成並存儲模型。沒有什麼比最後一步的訓練失敗更令人沮喪的了，因為你忘記創建一個文件夾或者因為保存路徑在訓練循環結束時有一個錯字！
-
-</Tip>
+> [!TIP]
+> 預訓練語言模型需要一段時間。我們建議您首先通過取消註釋以上兩行的註釋對數據樣本運行訓練循環，並確保訓練成功完成並存儲模型。沒有什麼比最後一步的訓練失敗更令人沮喪的了，因為你忘記創建一個文件夾或者因為保存路徑在訓練循環結束時有一個錯字！
 
 讓我們看一個來自數據集的例子。我們將只顯示每個字段的前 200 個字符：
 
@@ -247,11 +244,8 @@ DatasetDict({
 現在我們已經準備好了數據集，讓我們設置模型！
 
 
-<Tip>
-
-✏️ **試試看！** 擺脫所有小於上下文大小的塊在這裡並不是什麼大問題，因為我們使用的是小上下文窗口。隨著上下文大小的增加（或者如果您有一個短文檔語料庫），被丟棄的塊的比例也會增加。準備數據的更有效方法是將所有標記化的樣本加入一個批次中，每個語料之間有一個`eos_token_id` 標記, 然後對連接的序列執行分塊。作為練習，修改 `tokenize()`函數以使用該方法。請注意，您需要設置`truncation=False` 和刪除標記生成器中的其他參數以獲取完整的標記 ID 序列。
-
-</Tip>
+> [!TIP]
+> ✏️ **試試看！** 擺脫所有小於上下文大小的塊在這裡並不是什麼大問題，因為我們使用的是小上下文窗口。隨著上下文大小的增加（或者如果您有一個短文檔語料庫），被丟棄的塊的比例也會增加。準備數據的更有效方法是將所有標記化的樣本加入一個批次中，每個語料之間有一個`eos_token_id` 標記, 然後對連接的序列執行分塊。作為練習，修改 `tokenize()`函數以使用該方法。請注意，您需要設置`truncation=False` 和刪除標記生成器中的其他參數以獲取完整的標記 ID 序列。
 
 
 ## 初始化新模型
@@ -393,11 +387,8 @@ tf_eval_dataset = tokenized_dataset["valid"].to_tf_dataset(
 
 {/if}
 
-<Tip warning={true}>
-
-⚠️  移動輸入和標籤以對齊它們發生在模型內部，因此數據整理器只需複製輸入以創建標籤。
-
-</Tip>
+> [!WARNING]
+> ⚠️  移動輸入和標籤以對齊它們發生在模型內部，因此數據整理器只需複製輸入以創建標籤。
 
 
 現在我們已經準備好實際訓練我們的模型的一切了——畢竟這不是那麼多工作！在我們開始訓練之前，我們應該登錄 Hugging Face。如果您在筆記本上工作，則可以使用以下實用程序功能：
@@ -496,25 +487,19 @@ model.fit(tf_train_dataset, validation_data=tf_eval_dataset, callbacks=[callback
 
 {/if}
 
-<Tip>
+> [!TIP]
+> ✏️ **試試看!** 除了`TrainingArguments` 之外，我們只需要大約30行代碼就可以從原始文本到訓練GPT-2。 用你自己的數據集試試看，看看你能不能得到好的結果！
 
-✏️ **試試看!** 除了`TrainingArguments` 之外，我們只需要大約30行代碼就可以從原始文本到訓練GPT-2。 用你自己的數據集試試看，看看你能不能得到好的結果！
-
-</Tip>
-
-<Tip>
-
-{#if fw === 'pt'}
-
-💡 如果您可以訪問具有多個 GPU 的機器，請嘗試在那裡運行代碼。 `Trainer`自動管理多臺機器，這可以極大地加快訓練速度。
-
-{:else}
-
-💡 如果您有權訪問具有多個 GPU 的計算機，則可以嘗試使用 `MirroredStrategy` 上下文來大幅加快訓練速度。您需要創建一個`tf.distribute.MirroredStrategy`對象，並確保 `to_tf_dataset` 命令以及模型創建和對 `fit()`的調用都在其 `scope()` context. 上下文中運行。您可以查看有關此內容的文檔[here](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit).
-
-{/if}
-
-</Tip>
+> [!TIP]
+> {#if fw === 'pt'}
+>
+> 💡 如果您可以訪問具有多個 GPU 的機器，請嘗試在那裡運行代碼。 `Trainer`自動管理多臺機器，這可以極大地加快訓練速度。
+>
+> {:else}
+>
+> 💡 如果您有權訪問具有多個 GPU 的計算機，則可以嘗試使用 `MirroredStrategy` 上下文來大幅加快訓練速度。您需要創建一個 `tf.distribute.MirroredStrategy` 對象，並確保 `to_tf_dataset` 命令以及模型創建和對 `fit()` 的調用都在其 `scope()` 上下文中運行。您可以在[這裡](https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit)查看有關此內容的文檔。
+>
+> {/if}
 
 ## 使用管道生成代碼
 
@@ -790,11 +775,8 @@ model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
 )
 ```
 
-<Tip>
-
-🚨 如果您在 TPU 上進行訓練，則需要將從上面的單元格開始的所有代碼移動到專用的訓練函數中。有關詳細信息，請參閱 [第 3 章](/course/chapter3) for more details.
-
-</Tip>
+> [!TIP]
+> 🚨 如果您在 TPU 上進行訓練，則需要將從上面的單元格開始的所有代碼移動到專用的訓練函數中。有關詳細信息，請參閱 [第 3 章](/course/chapter3)。
 
 現在我們已經發送了我們的 `train_dataloader`到 `accelerator.prepare()` ，我們可以使用它的長度來計算訓練步驟的數量。請記住，我們應該始終在準備好dataloader後執行此操作，因為該方法會改變其長度。我們使用經典線性學習率調度：
 
@@ -891,16 +873,10 @@ for epoch in range(num_train_epochs):
 
 就是這樣 - 您現在擁有自己的因果語言模型（例如 GPT-2）的自定義訓練循環，您可以根據自己的需要進一步自定義。
 
-<Tip>
-
-✏️ **試試看!** 創建適合您的用例的自定義損失函數，或在訓練循環中添加另一個自定義步驟。
-
-</Tip>
-
-<Tip>
-
-✏️ **試試看!** 在運行長時間的訓練實驗時，最好使用 TensorBoard 或 Weights Biases 等工具記錄重要指標。向訓練循環添加適當的日誌記錄，以便您始終可以檢查訓練的進行情況。going.
+> [!TIP]
+> ✏️ **試試看!** 創建適合您的用例的自定義損失函數，或在訓練循環中添加另一個自定義步驟。
 
-</Tip>
+> [!TIP]
+> ✏️ **試試看!** 在運行長時間的訓練實驗時，最好使用 TensorBoard 或 Weights & Biases 等工具記錄重要指標。向訓練循環添加適當的日誌記錄，以便您始終可以檢查訓練的進行情況。
 
 {/if}
\ No newline at end of file
diff --git a/chapters/zh-TW/chapter7/7.mdx b/chapters/zh-TW/chapter7/7.mdx
index 075c1f6ff..8ad9ee93e 100644
--- a/chapters/zh-TW/chapter7/7.mdx
+++ b/chapters/zh-TW/chapter7/7.mdx
@@ -33,11 +33,8 @@
 
 本節使用的代碼已經上傳到了Hub。你可以在 [這裡](https://huggingface.co/huggingface-course/bert-finetuned-squad?context=%F0%9F%A4%97+Transformers+is+backed+by+the+three+most+popular+deep+learning+libraries+%E2%80%94+Jax%2C+PyTorch+and+TensorFlow+%E2%80%94+with+a+seamless+integration+between+them.+It%27s+straightforward+to+train+your+models+with+one+before+loading+them+for+inference+with+the+other.&question=Which+deep+learning+libraries+back+%F0%9F%A4%97+Transformers%3F) 找到它並嘗試用它進行預測。
 
-<Tip>
-
-💡 像 BERT 這樣的純編碼器模型往往很擅長提取諸如 "誰發明了 Transformer 架構?"之類的事實性問題的答案。但在給出諸如 "為什麼天空是藍色的?" 之類的開放式問題時表現不佳。在這些更具挑戰性的情況下, T5 和 BART 等編碼器-解碼器模型通常使用以與 [文本摘要](/course/chapter7/5) 非常相似的方式合成信息。如果你對這種類型的*生成式*問答感興趣, 我們建議您查看我們基於 [ELI5 數據集](https://huggingface.co/datasets/eli5) 的 [演示](https://yjernite.github.io/lfqa.html)。
-
-</Tip>
+> [!TIP]
+> 💡 像 BERT 這樣的純編碼器模型往往很擅長提取諸如 "誰發明了 Transformer 架構?" 之類的事實性問題的答案, 但在給出諸如 "為什麼天空是藍色的?" 之類的開放式問題時表現不佳。在這些更具挑戰性的情況下, T5 和 BART 等編碼器-解碼器模型通常被用來以與 [文本摘要](/course/chapter7/5) 非常相似的方式合成信息。如果你對這種類型的*生成式*問答感興趣, 我們建議您查看我們基於 [ELI5 數據集](https://huggingface.co/datasets/eli5) 的 [演示](https://yjernite.github.io/lfqa.html)。
 
 ## 準備數據
 
@@ -360,11 +357,8 @@ print(f"Theoretical answer: {answer}, decoded example: {decoded_example}")
 
 事實上, 我們在上下文中看不到答案。
 
-<Tip>
-
-✏️ **輪到你了!** 使用 XLNet 架構時, 在左側應用填充, 並切換問題和上下文。將我們剛剛看到的所有代碼改編為 XLNet 架構 (並添加 `padding=True`)。請注意, `[CLS]` 標記可能不在應用填充的 0 位置。
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了!** 使用 XLNet 架構時, 在左側應用填充, 並切換問題和上下文。將我們剛剛看到的所有代碼改編為 XLNet 架構 (並添加 `padding=True`)。請注意, `[CLS]` 標記可能不在應用填充的 0 位置。
 
 現在我們已經逐步瞭解瞭如何預處理我們的訓練數據, 我們可以將其分組到一個函數中, 我們將應用於整個訓練數據集。我們會將每個特徵填充到我們設置的最大長度, 因為大多數上下文會很長 (並且相應的樣本將被分成幾個特徵), 所以在這裡應用動態填充沒有真正的好處:
 
@@ -915,11 +909,8 @@ tf.keras.mixed_precision.set_global_policy("mixed_float16")
 
 {#if fw === 'pt'}
 
-<Tip>
-
-💡 如果您使用的輸出目錄存在, 則它需要是您要推送到的存儲庫的本地克隆 (因此, 如果在定義 `Trainer` 時出錯, 請設置新名稱)。
-
-</Tip>
+> [!TIP]
+> 💡 如果您使用的輸出目錄存在, 則它需要是您要推送到的存儲庫的本地克隆 (因此, 如果在定義 `Trainer` 時出錯, 請設置新名稱)。
 
 最後, 我們只需將所有內容傳遞給 `Trainer` 類並啟動訓練:
 
@@ -1003,11 +994,8 @@ trainer.push_to_hub(commit_message="Training complete")
 
 在這個階段, 你可以使用模型中心上的推理小部件來測試模型並與您的朋友、家人和最喜歡的寵物分享。你已經成功地微調了一個問答任務的模型 -- 恭喜!
 
-<Tip>
-
-✏️ **輪到你了!** 嘗試另一種模型架構, 看看它是否在此任務上表現更好!
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了!** 嘗試另一種模型架構, 看看它是否在此任務上表現更好!
 
 {#if fw === 'pt'}
 
diff --git a/chapters/zh-TW/chapter8/2.mdx b/chapters/zh-TW/chapter8/2.mdx
index d41597cd7..e3fd5df3c 100644
--- a/chapters/zh-TW/chapter8/2.mdx
+++ b/chapters/zh-TW/chapter8/2.mdx
@@ -85,11 +85,8 @@ OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-
 
 這些報告中包含很多信息, 所以讓我們一起來看看關鍵部分。首先要注意的是, 應該從 _從底部到頂部_ 讀取回溯。如果你習慣於從上到下閱讀英文文本, 這可能聽起來很奇怪,但它反映了這樣一個事實,即回溯顯示了在下載模型和標記器時 `管道` 進行的函數調用序列。(查看 [第二章](/course/chapter2) 瞭解有關 `pipeline` 如何在後臺工作的更多詳細信息。)
 
-<Tip>
-
-🚨 看到Google Colab 回溯中 "6 幀" 周圍的藍色框了嗎? 這是 Colab 的一個特殊功能, 它將回溯壓縮為"幀"。如果你似乎無法找到錯誤的來源, 請確保通過單擊這兩個小箭頭來展開完整的回溯。
-
-</Tip>
+> [!TIP]
+> 🚨 看到Google Colab 回溯中 "6 幀" 周圍的藍色框了嗎? 這是 Colab 的一個特殊功能, 它將回溯壓縮為"幀"。如果你似乎無法找到錯誤的來源, 請確保通過單擊這兩個小箭頭來展開完整的回溯。
 
 這意味著回溯的最後一行指示最後一條錯誤消息並給出引發的異常的名稱。在這種情況下, 異常類型是`OSError`, 表示與系統相關的錯誤。如果我們閱讀隨附的錯誤消息, 我們可以看到模型的 *config.json* 文件似乎有問題, 我們給出了兩個修復它的建議:
 
@@ -103,11 +100,8 @@ Make sure that:
 """
 ```
 
-<Tip>
-
-💡 如果你遇到難以理解的錯誤消息, 只需將該消息複製並粘貼到 Google 或 [Stack Overflow](https://stackoverflow.com/) 搜索欄中 (是的, 真的!)。你很可能不是第一個遇到錯誤的人, 這是找到社區中其他人發佈的解決方案的好方法。例如, 在 Stack Overflow 上搜索 `OSError: Can't load config for` 給出了幾個[hits](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+), 可能是用作解決問題的起點。
-
-</Tip>
+> [!TIP]
+> 💡 如果你遇到難以理解的錯誤消息, 只需將該消息複製並粘貼到 Google 或 [Stack Overflow](https://stackoverflow.com/) 搜索欄中 (是的, 真的!)。你很可能不是第一個遇到錯誤的人, 這是找到社區中其他人發佈的解決方案的好方法。例如, 在 Stack Overflow 上搜索 `OSError: Can't load config for` 給出了幾個[hits](https://stackoverflow.com/search?q=OSError%3A+Can%27t+load+config+for+), 可能是用作解決問題的起點。
 
 第一個建議是要求我們檢查模型ID是否真的正確, 所以首先要做的就是複製標識符並將其粘貼到Hub的搜索欄中:
 
@@ -159,11 +153,8 @@ pretrained_checkpoint = "distilbert-base-uncased"
 config = AutoConfig.from_pretrained(pretrained_checkpoint)
 ```
 
-<Tip warning={true}>
-
-🚨 我們在這裡採用的方法並不是萬無一失的, 因為我們的同事可能在微調模型之前已經調整了 `distilbert-base-uncased` 配置。在現實生活中, 我們想首先檢查它們, 但出於本節的目的, 我們假設它們使用默認配置。
-
-</Tip>
+> [!WARNING]
+> 🚨 我們在這裡採用的方法並不是萬無一失的, 因為我們的同事可能在微調模型之前已經調整了 `distilbert-base-uncased` 配置。在現實生活中, 我們想首先檢查它們, 但出於本節的目的, 我們假設它們使用默認配置。
 
 然後我們可以使用配置的 `push_to_hub()` 方法將其推送到我們的模型存儲庫:
 
diff --git a/chapters/zh-TW/chapter8/4.mdx b/chapters/zh-TW/chapter8/4.mdx
index 4d7b3c1c1..c0c17f392 100644
--- a/chapters/zh-TW/chapter8/4.mdx
+++ b/chapters/zh-TW/chapter8/4.mdx
@@ -243,11 +243,8 @@ trainer.train_dataset.features["label"].names
 
 我們這裡沒有令牌類型 ID，因為 DistilBERT 不需要它們； 如果您的模型中有一些，您還應該確保它們正確匹配輸入中第一句和第二句的位置。
 
-<Tip>
-
-✏️ **輪到你了！** 檢查訓練數據集的第二個元素是否正確。
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了！** 檢查訓練數據集的第二個元素是否正確。
 
 我們在這裡只對訓練集進行檢查，但您當然應該以同樣的方式仔細檢查驗證集和測試集。
 
@@ -518,11 +515,8 @@ trainer.optimizer.step()
 
 要解決這個問題，您只需要使用更少的 GPU 空間——這往往說起來容易做起來難。 首先，確保您沒有同時在 GPU 上運行兩個模型（當然，除非您的問題需要這樣做）。 然後，您可能應該減少batch的大小，因為它直接影響模型的所有中間輸出的大小及其梯度。 如果問題仍然存在，請考慮使用較小版本的模型。
 
-<Tip>
-
-在課程的下一部分中，我們將介紹更先進的技術，這些技術可以幫助您減少內存佔用並讓您微調最大的模型。
-
-</Tip>
+> [!TIP]
+> 在課程的下一部分中，我們將介紹更先進的技術，這些技術可以幫助您減少內存佔用並讓您微調最大的模型。
 
 ### 評估模型
 
@@ -549,11 +543,8 @@ trainer.evaluate()
 TypeError: only size-1 arrays can be converted to Python scalars
 ```
 
-<Tip>
-
-💡 您應該始終確保在啟動 `trainer.train()` 之前 `trainer.evaluate()`是可以運行的，以避免在遇到錯誤之前浪費大量計算資源。
-
-</Tip>
+> [!TIP]
+> 💡 您應該始終確保在啟動 `trainer.train()` 之前 `trainer.evaluate()`是可以運行的，以避免在遇到錯誤之前浪費大量計算資源。
 
 在嘗試調試評估循環中的問題之前，您應該首先確保您已經查看了數據，能夠正確地形成批處理，並且可以在其上運行您的模型。 我們已經完成了所有這些步驟，因此可以執行以下代碼而不會出錯：
 
@@ -682,11 +673,8 @@ trainer.train()
 
 在這種情況下，如果沒有更多錯誤，我們的腳本將微調一個應該給出合理結果的模型。 但是，如果訓練沒有任何錯誤，而訓練出來的模型根本表現不佳，我們該怎麼辦？ 這是機器學習中最難的部分，我們將向您展示一些可以提供幫助的技術。
 
-<Tip>
-
-💡 如果您使用手動訓練循環，則相同的步驟也適用於調試訓練管道，而且更容易將它們分開。 但是，請確保您沒有忘記正確位置的 `model.eval()` 或 `model.train()`，或者每個步驟中的 `zero_grad()`！
-
-</Tip>
+> [!TIP]
+> 💡 如果您使用手動訓練循環，則相同的步驟也適用於調試訓練管道，而且更容易將它們分開。 但是，請確保您沒有忘記正確位置的 `model.eval()` 或 `model.train()`，或者每個步驟中的 `zero_grad()`！
 
 ## 在訓練期間調試靜默（沒有任何錯誤提示）錯誤
 
@@ -701,11 +689,8 @@ trainer.train()
 - 有沒有一個標籤比其他標籤更常見？
 - 如果模型預測隨機的答案/總是相同的答案，那麼loss/評估指標應該是多少？
 
-<Tip warning={true}>
-
-⚠️ 如果您正在進行分佈式訓練，請在每個過程中打印數據集的樣本，並三次檢查您是否得到相同的結果。 一個常見的錯誤是在數據創建中有一些隨機性來源，這使得每個進程都有不同版本的數據集。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 如果您正在進行分佈式訓練，請在每個進程中打印數據集的樣本，並再三檢查您是否得到相同的結果。 一個常見的錯誤是在數據創建中有一些隨機性來源，這使得每個進程都有不同版本的數據集。
 
 查看您的數據後，查看模型的一些預測並對其進行解碼。 如果模型總是預測同樣的事情，那可能是因為你的數據集偏向一個類別（針對分類問題）； 過採樣稀有類等技術可能會有所幫助。
 
@@ -734,11 +719,8 @@ for _ in range(20):
     trainer.optimizer.zero_grad()
 ```
 
-<Tip>
-
-💡 如果您的訓練數據不平衡，請確保構建一批包含所有標籤的訓練數據。
-
-</Tip>
+> [!TIP]
+> 💡 如果您的訓練數據不平衡，請確保構建一批包含所有標籤的訓練數據。
 
 生成的模型在一個“批次”上應該有接近完美的結果。 讓我們計算結果預測的指標：
 
@@ -759,11 +741,8 @@ compute_metrics((preds.cpu().numpy(), labels.cpu().numpy()))
 
 如果你沒有設法讓你的模型獲得這樣的完美結果，這意味著你構建問題或數據的方式有問題，所以你應該修復它。 只有當你可以通過過擬合測試時，你才能確定你的模型實際上可以學到一些東西。
 
-<Tip warning={true}>
-
-⚠️ 在此測試之後，您將不得不重新創建您的模型和“Trainer”，因為獲得的模型可能無法在您的完整數據集上恢復和學習有用的東西。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 在此測試之後，您將不得不重新創建您的模型和“Trainer”，因為獲得的模型可能無法在您的完整數據集上恢復和學習有用的東西。
 
 ### 在你有第一個基線之前不要調整任何東西
 
diff --git a/chapters/zh-TW/chapter8/4_tf.mdx b/chapters/zh-TW/chapter8/4_tf.mdx
index 64a80718f..e6aa13c6b 100644
--- a/chapters/zh-TW/chapter8/4_tf.mdx
+++ b/chapters/zh-TW/chapter8/4_tf.mdx
@@ -116,15 +116,12 @@ model.compile(optimizer="adam")
 
 現在我們將使用模型的內部損失，這個問題應該解決了！
 
-<Tip>
-
-✏️ **輪到你了！** 作為我們解決其他問題後的可選挑戰，你可以嘗試回到這一步，讓模型使用原始 Keras 計算的損失而不是內部損失。 您需要將 `"labels"` 添加到 `to_tf_dataset()` 的 `label_cols` 參數，以確保正確輸出標籤，這將為您提供梯度——但我們指定的損失還有一個問題 . 訓練仍然會遇到這個問題，學習會非常緩慢，並且會在多次訓練損失時達到穩定狀態。 你能弄清楚它是什麼嗎？
-
-一個 ROT13 編碼的提示，如果你卡住了：Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`。 榮格納 ybtvgf?
-
-第二個提示：Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf 是 ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf。 Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
-
-</Tip>
+> [!TIP]
+> ✏️ **輪到你了！** 作為我們解決其他問題後的可選挑戰，你可以嘗試回到這一步，讓模型使用原始 Keras 計算的損失而不是內部損失。 您需要將 `"labels"` 添加到 `to_tf_dataset()` 的 `label_cols` 參數，以確保正確輸出標籤，這將為您提供梯度——但我們指定的損失還有一個問題。 帶著這個問題訓練仍然可以運行，但學習會非常緩慢，並且會在較高的訓練損失處停滯。 你能弄清楚它是什麼嗎？
+>
+> 一個 ROT13 編碼的提示，如果你卡住了：Vs lbh ybbx ng gur bhgchgf bs FrdhraprPynffvsvpngvba zbqryf va Genafsbezref, gurve svefg bhgchg vf `ybtvgf`. Jung ner ybtvgf?
+>
+> 第二個提示：Jura lbh fcrpvsl bcgvzvmref, npgvingvbaf be ybffrf jvgu fgevatf, Xrenf frgf nyy gur nethzrag inyhrf gb gurve qrsnhygf. Jung nethzragf qbrf FcnefrPngrtbevpnyPebffragebcl unir, naq jung ner gurve qrsnhygf?
 
 現在，讓我們嘗試訓練。 我們現在應該得到梯度，所以希望（這裡播放不祥的音樂）我們可以調用 `model.fit()` 一切都會正常工作！
 
@@ -367,11 +364,8 @@ model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint)
 model.compile(optimizer=Adam(5e-5))
 ```
 
-<Tip>
-
-💡您還可以從🤗 Transformers 中導入 `create_optimizer()` 函數，這將為您提供具有正確權重衰減以及學習率預熱和學習率衰減的 AdamW 優化器。 此優化器通常會產生比使用默認 Adam 優化器獲得的結果稍好一些的結果。
-
-</Tip>
+> [!TIP]
+> 💡您還可以從🤗 Transformers 中導入 `create_optimizer()` 函數，這將為您提供具有正確權重衰減以及學習率預熱和學習率衰減的 AdamW 優化器。 此優化器通常會產生比使用默認 Adam 優化器獲得的結果稍好一些的結果。
 
 現在，我們可以嘗試使用新的、改進後的學習率來擬合模型：
 
@@ -393,11 +387,8 @@ model.fit(train_dataset)
 
 內存不足的跡象是“分配張量時出現 OOM”之類的錯誤——OOM 是“內存不足”的縮寫。 在處理大型語言模型時，這是一個非常常見的危險。 如果遇到這種情況，一個好的策略是將批量大小減半並重試。 但請記住，有些型號*非常*大。 例如，全尺寸 GPT-2 的參數為 1.5B，這意味著您將需要 6 GB 的內存來存儲模型，另外需要 6 GB 的內存用於梯度下降！ 無論您使用什麼批量大小，訓練完整的 GPT-2 模型通常需要超過 20 GB 的 VRAM，而只有少數 GPU 擁有。 像“distilbert-base-cased”這樣更輕量級的模型更容易運行，訓練也更快。
 
-<Tip>
-
-在課程的下一部分中，我們將介紹更先進的技術，這些技術可以幫助您減少內存佔用並讓您微調最大的模型。
-
-</Tip>
+> [!TIP]
+> 在課程的下一部分中，我們將介紹更先進的技術，這些技術可以幫助您減少內存佔用並讓您微調最大的模型。
 
 ### TensorFlow 🦛餓餓
 
@@ -451,21 +442,15 @@ for batch in train_dataset:
 model.fit(batch, epochs=20)
 ```
 
-<Tip>
-
-💡 如果您的訓練數據不平衡，請確保構建一批包含所有標籤的訓練數據。
-
-</Tip>
+> [!TIP]
+> 💡 如果您的訓練數據不平衡，請確保構建一批包含所有標籤的訓練數據。
 
 生成的模型在“批次”上應該有接近完美的結果，損失迅速下降到 0（或您正在使用的損失的最小值）。
 
 如果你沒有設法讓你的模型獲得這樣的完美結果，這意味著你構建問題或數據的方式有問題，所以你應該修復它。 只有當你設法通過過擬合測試時，你才能確定你的模型實際上可以學到一些東西。
 
-<Tip warning={true}>
-
-⚠️ 在此測試之後，您將不得不重新創建您的模型和“Trainer”，因為獲得的模型可能無法在您的完整數據集上恢復和學習有用的東西。
-
-</Tip>
+> [!WARNING]
+> ⚠️ 在此測試之後，您將不得不重新創建您的模型和“Trainer”，因為獲得的模型可能無法在您的完整數據集上恢復和學習有用的東西。
 
 ### 在你有第一個基線之前不要調整任何東西
 
diff --git a/chapters/zh-TW/chapter8/5.mdx b/chapters/zh-TW/chapter8/5.mdx
index 378a8ccb9..9544a011f 100644
--- a/chapters/zh-TW/chapter8/5.mdx
+++ b/chapters/zh-TW/chapter8/5.mdx
@@ -16,11 +16,8 @@
 
 隔離產生錯誤的代碼段非常重要，因為 Hugging Face 團隊中沒有人是魔術師（目前），他們無法修復他們看不到的東西。顧名思義，最小的可重現示例應該是可重現的。這意味著它不應依賴於您可能擁有的任何外部文件或數據。嘗試用一些看起來像真實值的虛擬值替換您正在使用的數據，但仍然會產生相同的錯誤。
 
-<Tip>
-
-🚨🤗 Transformers 存儲庫中的許多問題都沒有解決，因為用於複製它們的數據不可訪問。
-
-</Tip>
+> [!TIP]
+> 🚨🤗 Transformers 存儲庫中的許多問題都沒有解決，因為用於複製它們的數據不可訪問。
 
 一旦你有一些自包含的東西，你可以嘗試將它減少到更少的代碼行，構建我們所謂的最小的可重複示例.雖然這需要你做更多的工作，但如果你提供一個漂亮的、簡短的錯誤重現器，你幾乎可以保證得到幫助和修復。
 
diff --git a/chapters/zh-TW/chapter9/7.mdx b/chapters/zh-TW/chapter9/7.mdx
index 9cf1dc7cd..e3656f552 100644
--- a/chapters/zh-TW/chapter9/7.mdx
+++ b/chapters/zh-TW/chapter9/7.mdx
@@ -61,9 +61,8 @@ demo.launch()
 上述簡單示例介紹了塊的4個基本概念:
 
 1. 塊允許你允許你構建結合markdown、HTML、按鈕和交互組件的web應用程序, 只需在一個帶有gradio的Python中實例化對象。
-<Tip>
-🙋如果你不熟悉 Python 中的 `with` 語句, 我們建議你查看來自 Real Python 的極好的[教程](https://realpython.com/python-with-statement/)。看完後回到這裡 🤗
-</Tip>
+> [!TIP]
+> 🙋如果你不熟悉 Python 中的 `with` 語句, 我們建議你查看來自 Real Python 的極好的[教程](https://realpython.com/python-with-statement/)。看完後回到這裡 🤗
 實例化組件的順序很重要, 因為每個元素都按照創建的順序呈現到 Web 應用程序中。(更復雜的佈局在下面討論)
 
 2. 你可以在代碼中的任何位置定義常規 Python 函數, 並使用`塊`在用戶輸入的情況下運行它們。在我們的示例中, 們有一個"翻轉"輸入文本的簡單函數, 但你可以編寫任何 Python 函數, 從簡單的計算到處理機器學習模型的預測。
diff --git a/utils/code_formatter.py b/utils/code_formatter.py
index dfff59a30..7f5563809 100644
--- a/utils/code_formatter.py
+++ b/utils/code_formatter.py
@@ -1,9 +1,10 @@
 import argparse
-import black
 import os
 import re
 from pathlib import Path
 
+import black
+
 
 def blackify(filename, check_only=False):
     # Read the content of the file
@@ -26,7 +27,9 @@ def blackify(filename, check_only=False):
             # Deal with ! instructions
             code = re.sub(r"^!", r"## !", code, flags=re.MULTILINE)
 
-            code_samples.append({"start_index": start_index, "end_index": line_index - 1, "code": code})
+            code_samples.append(
+                {"start_index": start_index, "end_index": line_index - 1, "code": code}
+            )
             line_index += 1
         else:
             line_index += 1
@@ -35,7 +38,9 @@ def blackify(filename, check_only=False):
     delimiter = "\n\n### New cell ###\n"
     full_code = delimiter.join([sample["code"] for sample in code_samples])
     formatted_code = full_code.replace("\t", "    ")
-    formatted_code = black.format_str(formatted_code, mode=black.FileMode({black.TargetVersion.PY37}, line_length=90))
+    formatted_code = black.format_str(
+        formatted_code, mode=black.FileMode({black.TargetVersion.PY37}, line_length=90)
+    )
 
     # Black adds last new lines we don't want, so we strip individual code samples.
     cells = formatted_code.split(delimiter)
@@ -75,7 +80,9 @@ def format_all_files(check_only=False):
             raise
 
     if check_only and len(failures) > 0:
-        raise ValueError(f"{len(failures)} files need to be formatted, run `make style`.")
+        raise ValueError(
+            f"{len(failures)} files need to be formatted, run `make style`."
+        )
 
 
 if __name__ == "__main__":
diff --git a/utils/convert_bilingual_monolingual.py b/utils/convert_bilingual_monolingual.py
index c993a6516..725f537cb 100644
--- a/utils/convert_bilingual_monolingual.py
+++ b/utils/convert_bilingual_monolingual.py
@@ -1,5 +1,5 @@
-import re
 import argparse
+import re
 from pathlib import Path
 
 PATTERN_TIMESTAMP = re.compile(
@@ -35,7 +35,9 @@ def convert(input_file, output_file):
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument(
-        "--input_language_folder", type=str, help="Folder with input bilingual SRT files to be converted"
+        "--input_language_folder",
+        type=str,
+        help="Folder with input bilingual SRT files to be converted",
     )
     parser.add_argument(
         "--output_language_folder",
@@ -50,4 +52,6 @@ def convert(input_file, output_file):
     input_files = Path(args.input_language_folder).glob("*.srt")
     for input_file in input_files:
         convert(input_file, output_path / input_file.name)
-    print(f"Succesfully converted {len(list(input_files))} files to {args.output_language_folder} folder")
+    print(
+        f"Successfully converted {len(list(input_files))} files to {args.output_language_folder} folder"
+    )
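
Review note on the hunk above: `Path.glob` returns a one-shot iterator, so by the time the final `print` calls `len(list(input_files))`, the `for` loop has already consumed it and the reported count is likely 0 regardless of how many files were converted. The sketch below reproduces that behaviour in isolation and shows one possible fix (materializing the matches before the loop). The helper name `count_converted` and the temporary folder are hypothetical and exist only for illustration; `convert` from the script is replaced by a no-op.

```python
import tempfile
from pathlib import Path


def count_converted(folder):
    """Return (count as computed in the script, count from a materialized list)."""
    # Same pattern as the script: glob() yields a one-shot iterator.
    input_files = folder.glob("*.srt")
    for _ in input_files:
        pass  # stand-in for convert(input_file, output_path / input_file.name)
    exhausted_count = len(list(input_files))  # iterator is already consumed here

    # Possible fix: collect the matches first so they survive the loop.
    input_files = sorted(folder.glob("*.srt"))
    for _ in input_files:
        pass  # stand-in for convert(...)
    materialized_count = len(input_files)
    return exhausted_count, materialized_count


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Two subtitle files plus one unrelated file that the glob should skip.
    for name in ("a.srt", "b.srt", "notes.txt"):
        (folder / name).write_text("", encoding="utf-8")
    counts = count_converted(folder)
    print(counts)  # (0, 2)
```

Since this PR is formatting-only, a change like this would probably belong in a separate follow-up rather than in this diff.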
diff --git a/utils/generate_notebooks.py b/utils/generate_notebooks.py
index 9ed74f976..9a439e7dd 100644
--- a/utils/generate_notebooks.py
+++ b/utils/generate_notebooks.py
@@ -1,12 +1,12 @@
 import argparse
 import os
 import re
-import nbformat
 import shutil
-import yaml
-
 from pathlib import Path
 
+import nbformat
+import yaml
+
 re_framework_test = re.compile(r"^{#if\s+fw\s+===\s+'([^']+)'}\s*$")
 re_framework_else = re.compile(r"^{:else}\s*$")
 re_framework_end = re.compile(r"^{/if}\s*$")
@@ -119,9 +119,17 @@ def convert_to_nb_cell(cell):
 
 def nb_cell(source, code=True):
     if not code:
-        return nbformat.notebooknode.NotebookNode({"cell_type": "markdown", "source": source, "metadata": {}})
+        return nbformat.notebooknode.NotebookNode(
+            {"cell_type": "markdown", "source": source, "metadata": {}}
+        )
     return nbformat.notebooknode.NotebookNode(
-        {"cell_type": "code", "metadata": {}, "source": source, "execution_count": None, "outputs": []}
+        {
+            "cell_type": "code",
+            "metadata": {},
+            "source": source,
+            "execution_count": None,
+            "outputs": [],
+        }
     )
 
 
@@ -188,21 +196,28 @@ def build_notebook(fname, title, output_dir="."):
             fnames.append(f"section{stem}_{key}.ipynb")
             section_names.append(f"{Path(fname).parent.stem}/{stem}_{key}")
 
-    for title, content, fname, section_name in zip(titles, contents, fnames, section_names):
+    for title, content, fname, section_name in zip(
+        titles, contents, fnames, section_names
+    ):
         cells = extract_cells(content)
         if len(cells) == 0:
             continue
 
         nb_cells = [
             nb_cell(f"# {title}", code=False),
-            nb_cell("Install the Transformers, Datasets, and Evaluate libraries to run this notebook.", code=False),
+            nb_cell(
+                "Install the Transformers, Datasets, and Evaluate libraries to run this notebook.",
+                code=False,
+            ),
         ]
 
         # Install cell
         installs = ["!pip install datasets evaluate transformers[sentencepiece]"]
         if section_name in sections_with_accelerate:
             installs.append("!pip install accelerate")
-            installs.append("# To run the training on TPU, you will need to uncomment the following line:")
+            installs.append(
+                "# To run the training on TPU, you will need to uncomment the following line:"
+            )
             installs.append(
                 "# !pip install cloud-tpu-client==0.10 torch==1.9.0 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl"
             )
@@ -219,7 +234,8 @@ def build_notebook(fname, title, output_dir="."):
             nb_cells.extend(
                 [
                     nb_cell(
-                        "You will need to setup git, adapt your email and name in the following cell.", code=False
+                        "You will need to setup git, adapt your email and name in the following cell.",
+                        code=False,
                     ),
                     nb_cell(
                         '!git config --global user.email "you@example.com"\n!git config --global user.name "Your Name"'
@@ -228,12 +244,19 @@ def build_notebook(fname, title, output_dir="."):
                         "You will also need to be logged in to the Hugging Face Hub. Execute the following and enter your credentials.",
                         code=False,
                     ),
-                    nb_cell("from huggingface_hub import notebook_login\n\nnotebook_login()"),
+                    nb_cell(
+                        "from huggingface_hub import notebook_login\n\nnotebook_login()"
+                    ),
                 ]
             )
         nb_cells += [convert_to_nb_cell(cell) for cell in cells]
         metadata = {"colab": {"name": title, "provenance": []}}
-        nb_dict = {"cells": nb_cells, "metadata": metadata, "nbformat": 4, "nbformat_minor": 4}
+        nb_dict = {
+            "cells": nb_cells,
+            "metadata": metadata,
+            "nbformat": 4,
+            "nbformat_minor": 4,
+        }
         notebook = nbformat.notebooknode.NotebookNode(nb_dict)
         os.makedirs(output_dir, exist_ok=True)
         nbformat.write(notebook, os.path.join(output_dir, fname), version=4)
@@ -243,7 +266,9 @@ def get_titles(language):
     """
     Parse the _toctree.yml file to get the correspondence filename to title
     """
-    table = yaml.safe_load(open(os.path.join(f"chapters/{language}", "_toctree.yml"), "r"))
+    table = yaml.safe_load(
+        open(os.path.join(f"chapters/{language}", "_toctree.yml"), "r")
+    )
     result = {}
     for entry in table:
         for section in entry["sections"]:
diff --git a/utils/generate_subtitles.py b/utils/generate_subtitles.py
index 68592830f..b2f159a6a 100644
--- a/utils/generate_subtitles.py
+++ b/utils/generate_subtitles.py
@@ -1,17 +1,31 @@
+import argparse
+from pathlib import Path
+
 import pandas as pd
 from youtube_transcript_api import YouTubeTranscriptApi
 from youtube_transcript_api.formatters import SRTFormatter
 from youtubesearchpython import Playlist
-from pathlib import Path
-import argparse
 
-COURSE_VIDEOS_PLAYLIST = "https://youtube.com/playlist?list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o"
-TASK_VIDEOS_PLAYLIST = "https://youtube.com/playlist?list=PLo2EIpI_JMQtyEr-sLJSy5_SnLCb4vtQf"
+COURSE_VIDEOS_PLAYLIST = (
+    "https://youtube.com/playlist?list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o"
+)
+TASK_VIDEOS_PLAYLIST = (
+    "https://youtube.com/playlist?list=PLo2EIpI_JMQtyEr-sLJSy5_SnLCb4vtQf"
+)
 # These videos are not part of the course, but are part of the task playlist
-TASK_VIDEOS_TO_SKIP = ["tjAIM7BOYhw", "WdAeKSOpxhw", "KWwzcmG98Ds", "TksaY_FDgnk", "leNG9fN9FQU", "dKE8SIt9C-w"]
+TASK_VIDEOS_TO_SKIP = [
+    "tjAIM7BOYhw",
+    "WdAeKSOpxhw",
+    "KWwzcmG98Ds",
+    "TksaY_FDgnk",
+    "leNG9fN9FQU",
+    "dKE8SIt9C-w",
+]
 
 
-def generate_subtitles(language: str, youtube_language_code: str = None, is_task_playlist: bool = False):
+def generate_subtitles(
+    language: str, youtube_language_code: str = None, is_task_playlist: bool = False
+):
     metadata = []
     formatter = SRTFormatter()
     path = Path(f"subtitles/{language}")
@@ -24,7 +38,9 @@ def generate_subtitles(language: str, youtube_language_code: str = None, is_task
     for idx, video in enumerate(playlist_videos["videos"]):
         video_id = video["id"]
         title = video["title"]
-        title_formatted = title.lower().replace(" ", "-").replace(":", "").replace("?", "")
+        title_formatted = (
+            title.lower().replace(" ", "-").replace(":", "").replace("?", "")
+        )
         id_str = f"{idx}".zfill(2)
 
         if is_task_playlist:
@@ -42,8 +58,12 @@ def generate_subtitles(language: str, youtube_language_code: str = None, is_task
 
         # Get transcript
         transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
-        english_transcript = transcript_list.find_transcript(language_codes=["en", "en-US"])
-        languages = pd.DataFrame(english_transcript.translation_languages)["language_code"].tolist()
+        english_transcript = transcript_list.find_transcript(
+            language_codes=["en", "en-US"]
+        )
+        languages = pd.DataFrame(english_transcript.translation_languages)[
+            "language_code"
+        ].tolist()
         # Map mismatched language codes
         if language not in languages:
             if youtube_language_code is None:
@@ -60,11 +80,20 @@ def generate_subtitles(language: str, youtube_language_code: str = None, is_task
             with open(srt_filename, "w", encoding="utf-8") as f:
                 f.write(srt_formatted)
         except:
-            print(f"Problem generating transcript for {title} with ID {video_id} at {video['link']}.")
+            print(
+                f"Problem generating transcript for {title} with ID {video_id} at {video['link']}."
+            )
             with open(srt_filename, "w", encoding="utf-8") as f:
                 f.write("No transcript found for this video!")
 
-        metadata.append({"id": video_id, "title": title, "link": video["link"], "srt_filename": srt_filename})
+        metadata.append(
+            {
+                "id": video_id,
+                "title": title,
+                "link": video["link"],
+                "srt_filename": srt_filename,
+            }
+        )
 
     df = pd.DataFrame(metadata)
 
@@ -74,12 +103,17 @@ def generate_subtitles(language: str, youtube_language_code: str = None, is_task
         df.to_csv(f"{path}/metadata.csv", index=False)
 
 
-
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
-    parser.add_argument("--language", type=str, help="Language to generate subtitles for")
-    parser.add_argument("--youtube_language_code", type=str, help="YouTube language code")
+    parser.add_argument(
+        "--language", type=str, help="Language to generate subtitles for"
+    )
+    parser.add_argument(
+        "--youtube_language_code", type=str, help="YouTube language code"
+    )
     args = parser.parse_args()
-    generate_subtitles(args.language, args.youtube_language_code, is_task_playlist=False)
+    generate_subtitles(
+        args.language, args.youtube_language_code, is_task_playlist=False
+    )
     generate_subtitles(args.language, args.youtube_language_code, is_task_playlist=True)
     print(f"All done! Subtitles stored at subtitles/{args.language}")
diff --git a/utils/validate_translation.py b/utils/validate_translation.py
index b3892c2cc..df47feab0 100644
--- a/utils/validate_translation.py
+++ b/utils/validate_translation.py
@@ -1,14 +1,16 @@
 import argparse
 import os
-import yaml
-
 from pathlib import Path
 
+import yaml
+
 PATH_TO_COURSE = Path("chapters/")
 
 
 def load_sections(language: str):
-    toc = yaml.safe_load(open(os.path.join(PATH_TO_COURSE / language, "_toctree.yml"), "r"))
+    toc = yaml.safe_load(
+        open(os.path.join(PATH_TO_COURSE / language, "_toctree.yml"), "r")
+    )
     sections = []
     for chapter in toc:
         for section in chapter["sections"]:
