huggingface
diff --git a/chapters/my/_toctree.yml b/chapters/my/_toctree.yml
Lines changed: 19 additions & 19 deletions
diff --git a/chapters/my/chapter5/1.mdx b/chapters/my/chapter5/1.mdx
Lines changed: 40 additions & 0 deletions
@@ -85,25 +85,25 @@
     title: Chapter 4 quiz
     quiz: 4
 
-# - title: 5. The 🤗 Datasets library
-#   sections:
-#   - local: chapter5/1
-#     title: Introduction
-#   - local: chapter5/2
-#     title: What if my dataset isn't on the Hub?
-#   - local: chapter5/3
-#     title: Time to slice and dice
-#   - local: chapter5/4
-#     title: Big data? 🤗 Datasets to the rescue!
-#   - local: chapter5/5
-#     title: Creating your own dataset
-#   - local: chapter5/6
-#     title: Semantic search with FAISS
-#   - local: chapter5/7
-#     title: 🤗 Datasets, check!
-#   - local: chapter5/8
-#     title: End-of-chapter quiz
-#     quiz: 5
+- title: 5. The 🤗 Datasets library
+  sections:
+  - local: chapter5/1
+    title: Introduction
+  - local: chapter5/2
+    title: What if my dataset isn't on the Hub?
+  - local: chapter5/3
+    title: Time to slice and dice
+  - local: chapter5/4
+    title: Big data? 🤗 Datasets to the rescue!
+  - local: chapter5/5
+    title: Creating your own dataset
+  - local: chapter5/6
+    title: Semantic search with FAISS
+  - local: chapter5/7
+    title: 🤗 Datasets, check!
+  - local: chapter5/8
+    title: Chapter 5 quiz
+    quiz: 5
 
 # - title: 6. The 🤗 Tokenizers library
 #   sections:
 
@@ -0,0 +1,40 @@
+# Introduction[[introduction]]
+
+<CourseFloatingBanner
+    chapter={5}
+    classNames="absolute z-10 right-0 top-0"
+/>
+
+In [Chapter 3](/course/chapter3) you got your first taste of the 🤗 Datasets library and saw that fine-tuning a model involves three main steps:
+
+1. Load a dataset from the Hugging Face Hub.
+2. Preprocess the data with `Dataset.map()`.
+3. Load and compute metrics.
+
+But this only scratches the surface of what 🤗 Datasets can do. In this chapter, we will take a deep dive into the library. Along the way, we will find answers to the following questions:
+
+*   What do you do when your dataset is not on the Hub?
+*   How can you slice and dice a dataset? (And what if you _really_ need to use Pandas?)
+*   What do you do when your dataset is so large that it would melt your laptop's RAM?
+*   What are "memory mapping" and Apache Arrow?
+*   How can you create your own dataset and push it to the Hub?
+
+The techniques you learn here will prepare you for the advanced tokenization and fine-tuning tasks in [Chapter 6](/course/chapter6) and [Chapter 7](/course/chapter7), so grab a coffee and let's get started!
+
+## Glossary
+
+*   **🤗 Datasets Library**: A library from Hugging Face that makes it easy to access, manage, and use datasets for training AI models.
+*   **Fine-tuning**: Further training a pre-trained model on a specific task, usually with a comparatively small amount of data.
+*   **Model**: In Artificial Intelligence (AI), a mathematical structure designed to learn from data and make predictions.
+*   **Hugging Face Hub**: An online platform for sharing, discovering, and reusing AI models, datasets, and demos.
+*   **Dataset**: A collection of data used to train AI models.
+*   **Preprocess**: The process of converting data into a form that the model can understand and work with.
+*   **`Dataset.map()`**: A method in the 🤗 Datasets library that applies a function to each element, or to each batch, of a dataset.
+*   **Metrics**: Values used to measure a model's performance (for example, accuracy or F1 score).
+*   **Slice and Dice**: Splitting a dataset into pieces and reshaping it as needed.
+*   **Pandas**: An open-source Python library for data analysis and manipulation.
+*   **RAM (Random Access Memory)**: The computer's temporary working memory.
+*   **Memory Mapping**: A technique that maps the contents of a file directly into the computer's virtual memory. Only the parts of a large file that are actually needed get loaded from disk into memory, which reduces memory usage.
+*   **Apache Arrow**: An in-memory data format that makes exchanging data between data analytics applications fast and efficient.
+*   **Push to the Hub**: Uploading a model, dataset, or other artifacts to the Hugging Face Hub.
+*   **Tokenization**: The process of splitting text (or other data) into tokens that AI models can process.
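
The Memory Mapping glossary entry above can be illustrated with a minimal sketch using Python's standard `mmap` module. This is not how 🤗 Datasets implements it internally (the library relies on Apache Arrow for that); the helper names `make_demo_file` and `read_slice`, the record layout, and the file contents are all hypothetical and exist only for illustration.

```python
import mmap
import os
import tempfile

# Each record is "record-" + 6 digits + newline = 14 bytes (made-up layout).
RECORD_SIZE = len("record-000000\n")


def make_demo_file(num_records=1000):
    """Write fixed-width records to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_records):
            f.write(f"record-{i:06d}\n".encode("ascii"))
    return path


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset` via a read-only memory map.

    The file is mapped into virtual memory rather than read in full; the
    operating system pages in only the region that is actually accessed.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


demo_path = make_demo_file()
chunk = read_slice(demo_path, 500 * RECORD_SIZE, RECORD_SIZE)
print(chunk)  # b'record-000500\n'
os.remove(demo_path)
```

The same idea scales to files far larger than available RAM, because slicing the map touches only the pages that back the requested byte range.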