Commit 7c1326e

Merge pull request #4 from thewebscraping/feat/mk-docs
feat: documentation
2 parents 01604cf + 4203f85 commit 7c1326e

6 files changed: +76 -28 lines changed

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+name: Build Documentation
+
+on:
+  release:
+    types: [published, created, released, prereleased]
+
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      max-parallel: 1
+      matrix:
+        python-version: ['3.9']
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install Dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements-dev.txt
+
+      - name: Configure Git Credentials
+        run: |
+          git config user.name github-actions[bot]
+          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
+
+      - uses: actions/cache@v4
+        with:
+          key: mkdocs-material-${{ env.cache_id }}
+          path: .cache
+          restore-keys: |
+            mkdocs-material-
+      - name: Publish Documentation
+        run: |
+          echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
+          mkdocs gh-deploy --force

README.md

Lines changed: 31 additions & 25 deletions
@@ -1,11 +1,9 @@
-
-Gemma Template
-==============
+# Gemma Template
 
 This library was developed for the Kaggle challenge:
 [**Google - Unlocking Global Communication with Gemma**](https://www.kaggle.com/competitions/gemma-language-tuning), sponsored by Google.
 
-### Credit Requirement
+## Credit Requirement
 
 **Important:** If you are a participant in the competition and wish to use this source code in your submission,
 you must clearly credit the original author before the competition's end date, **January 14, 2025**.
@@ -16,13 +14,26 @@ Please include the following information in your submission:
 Author: Tu Pham
 Kaggle Username: [bigfishdev](https://www.kaggle.com/bigfishdev)
 GitHub: [https://github.com/thewebscraping/gemma-template/](https://github.com/thewebscraping/gemma-template)
+LinkedIn: [https://www.linkedin.com/in/thetwofarm](https://www.linkedin.com/in/thetwofarm)
 ```
 
 # Overview
 
-**Gemma Template** is a lightweight and efficient Python library for generating templates to fine-tune models and craft prompts.
-Designed for flexibility, it seamlessly supports Gemma, LLaMa and other language frameworks, offering fast, user-friendly customization.
-With multilingual capabilities and advanced configuration options, ensures precise, professional, and dynamic template creation.
+Gemma Template is a lightweight and efficient Python library for generating templates to fine-tune models and craft prompts.
+Designed for flexibility, it seamlessly supports Gemma, LLaMA, and other language frameworks, offering fast, user-friendly customization.
+With multilingual capabilities and advanced configuration options, it ensures precise, professional, and dynamic template creation.
+
+### Learning Process and Acknowledgements
+As a newbie, I created Gemma Template based on what I read and learned from the following sources:
+
+- Google Cookbook: [Advanced Prompting Techniques](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Advanced_Prompting_Techniques.ipynb)
+- Google Cookbook: [Finetune_with_LLaMA_Factory](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Finetune_with_LLaMA_Factory.ipynb)
+- Google Cookbook: [Finetuning Gemma for Function Calling](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Finetuning_Gemma_for_Function_Calling.ipynb)
+- Alpaca: [Alpaca Lora Documention](https://github.com/tloen/alpaca-lora)
+- Unsloth: [Finetune Llama 3.2, Mistral, Phi-3.5, Qwen 2.5 & Gemma 2-5x faster with 80% less memory!](https://github.com/unslothai/unsloth)
+
+
+Gemma Template supports exporting dataset files in three formats: `Text`, `Alpaca`, and `GPT conversions`.
 
 # Multilingual Content Writing Assistant
 
@@ -45,18 +56,17 @@ It enhances text readability, aligns with linguistic nuances, and preserves orig
 - Aligns rewritten content with SEO best practices for discoverability.
 
 #### 4. **Professional and Multilingual Expertise**
-- Fully support for creating template with local language.
-- Supports multiple languages with advanced vocabulary and grammar enhancement.
-- Adapts tone and style to maintain professionalism and clarity.
-- Support hidden mask input text.
-- Optional: learn vocabulary enhancement with unigrams, bigrams and trigrams instruction template.
-- Full documentation, easy configuration prompts with examples.
+- Full support for creating templates in local languages.
+- Supports multiple languages with advanced prompting techniques.
+- Vocabulary and grammar enhancement with unigrams, bigrams, and trigrams instruction template.
+- Supports hidden mask input text. Adapts tone and style to maintain professionalism and clarity.
+- Full documentation with easy configuration prompts and examples.
 
 #### 5. **Customize Advanced Response Structure and Dataset Format**
-- Fully support for advanced structure response format customization.
-- Support output multiple formats such as Alpaca, GPT, STF text.
-- Can be used with other models such as LLama.
-- Dynamic prompts are enhanced using Round-Robin loop.
+- Supports advanced response structure format customization.
+- Compatible with other models such as LLaMa.
+- Enhances dynamic prompts using Round-Robin loops.
+- Outputs multiple formats such as Alpaca, GPT, and STF text.
 
 **Installation**
 ----------------
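The reworded feature list mentions enhancing dynamic prompts with a Round-Robin loop. As a minimal sketch of that general technique, with illustrative prompt strings and a hypothetical helper rather than the library's actual internals:

```python
from itertools import cycle

# Illustrative prompt variants; gemma_template's real prompts are configured elsewhere.
prompt_variants = cycle([
    "Rewrite the following article professionally:\n{document}",
    "Improve the readability and flow of this text:\n{document}",
    "Polish the grammar and tone of the passage below:\n{document}",
])

def build_prompt(document: str) -> str:
    """Return the next prompt variant in round-robin order, filled with the document."""
    return next(prompt_variants).format(document=document)

for doc in ["First sample.", "Second sample.", "Third sample.", "Fourth sample."]:
    print(build_prompt(doc))
# The fourth document wraps around to the first variant, so prompt phrasing
# stays varied across a dataset without any randomness.
```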
@@ -82,7 +92,7 @@ Start using Gemma Template with just a few lines of code:
 ```python
 from gemma_template.models import *
 
-prompt_instance = Template(
+template_instance = Template(
     structure_field=StructureField(
         title=["Custom Title"],
         description=["Custom Description"],
@@ -93,11 +103,7 @@ prompt_instance = Template(
     ),
 ) # Create fully customized structured reminders.
 
-response = prompt_instance.template(
-    template=GEMMA_TEMPLATE,
-    user_template=USER_TEMPLATE,
-    instruction_template=INSTRUCTION_TEMPLATE,
-    structure_template=STRUCTURE_TEMPLATE,
+response = template_instance.template(
     title="Gemma open models",
     description="Gemma: Introducing new state-of-the-art open models.",
     document="Gemma open models are built from the same research and technology as Gemini models. Gemma 2 comes in 2B, 9B and 27B and Gemma 1 comes in 2B and 7B sizes.",
@@ -226,8 +232,8 @@ print(dataset['text'][0])
 ```python
 dataset = gemma_template.load_dataset(
     "your_huggingface_dataset",
-    # enum: text, gpt, alpaca
-    output_format='gpt',
+    # enum: `text`, `alpaca` and `gpt`.
+    output_format='text',
     # Template for instruction the user prompt.
     instruction_template=INSTRUCTION_TEMPLATE,
     # Template for structuring the user prompt.
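The hunk above only selects among the `text`, `alpaca`, and `gpt` output formats by name. As a rough sketch of what Alpaca-style and GPT-conversation-style records conventionally look like, note that the field names below follow the common Alpaca and ShareGPT conventions and are assumptions; the exact keys gemma_template writes are not shown in this commit:

```python
# Hypothetical example records for the two structured export formats.
# Field names are conventional; gemma_template's actual output keys may differ.
alpaca_record = {
    "instruction": "Rewrite the article below in a professional tone.",
    "input": "Gemma open models are built from the same research as Gemini.",
    "output": "Gemma open models share the research and technology behind Gemini...",
}

gpt_record = {
    "conversations": [
        {"from": "human", "value": "Rewrite the article below in a professional tone."},
        {"from": "gpt", "value": "Gemma open models share the research and technology behind Gemini..."},
    ]
}
```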

docs/benchmark.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# Benchmark

examples/README.md

Whitespace-only changes.

gemma_template/models.py

Lines changed: 2 additions & 3 deletions
@@ -642,8 +642,6 @@ def get_user_kwargs(
         if language is None:
             language_code, language = get_language(document)
 
-        document = mask_hidden(language_code=language_code, **kwargs)
-
         unigrams = kwargs.get("unigrams")
         if unigrams is None:
             unigrams = self._get_frequently_words(
@@ -669,6 +667,7 @@ def get_user_kwargs(
                 excluded_words=unigrams,
             )
 
+        document = mask_hidden(language_code=language_code, **kwargs)
         instruction_kwargs = dict(
             document=document,
             topic_values=", ".join(kwargs.get("categories", []) or []),
@@ -1143,7 +1142,7 @@ def _get_structure_attrs(self, **kwargs):
         return mapping
 
     def _get_origin_data(self, **kwargs) -> dict:
-        if not kwargs.get("is_remove_data", True):
+        if kwargs.get("is_remove_data", True) is False:
             return {k: v for k, v in kwargs.items() if hasattr(self, k)}
         return {}
 
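The last hunk tightens the `is_remove_data` check: `not kwargs.get("is_remove_data", True)` is true for any falsy value (0, None, an empty string), while the new `... is False` only matches an explicit `False`. A small standalone illustration of the difference, using throwaway helper names rather than library code:

```python
# Standalone comparison of the old and new conditions (not repository code).
def old_check(kwargs: dict) -> bool:
    return not kwargs.get("is_remove_data", True)

def new_check(kwargs: dict) -> bool:
    return kwargs.get("is_remove_data", True) is False

for value in (False, 0, None, ""):
    print(repr(value), old_check({"is_remove_data": value}), new_check({"is_remove_data": value}))
# False True True
# 0     True False
# None  True False
# ''    True False
```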

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ edit_uri: ""
 nav:
   - Introduction: 'index.md'
   - Quickstart Guide: 'quickstart.md'
+  - Benchmark: 'benchmark.md'
 
 markdown_extensions:
   - admonition
