
Commit 3f03c0e

chore: improve generate prompt
1 parent 478eaba commit 3f03c0e

File tree: 15 files changed (+1268 / -1015 lines)


README.md

Lines changed: 96 additions & 98 deletions
@@ -27,13 +27,13 @@ With multilingual capabilities and advanced configuration options, it ensures pr
 As a newbie, I created Gemma Template based on what I read and learned from the following sources:
 
 - Google Gemma Cookbook: [Advanced Prompting Techniques](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Advanced_Prompting_Techniques.ipynb)
-- Google Gemma Cookbook: [Finetune_with_LLaMA_Factory](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Finetune_with_LLaMA_Factory.ipynb)
-- Google Gemma Cookbook: [Finetuning Gemma for Function Calling](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Finetuning_Gemma_for_Function_Calling.ipynb)
+- Google Gemma Cookbook: [Finetune with LLaMA Factory](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Finetune_with_LLaMA_Factory.ipynb)
+- Google Gemma Cookbook: [Fine tuning Gemma for Function Calling](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/Finetuning_Gemma_for_Function_Calling.ipynb)
 - Alpaca: [Alpaca Lora Documentation](https://github.com/tloen/alpaca-lora)
 - Unsloth: [Finetune Llama 3.2, Mistral, Phi-3.5, Qwen 2.5 & Gemma 2-5x faster with 80% less memory!](https://github.com/unslothai/unsloth)
 
 
-Gemma Template supports exporting dataset files in three formats: `Text`, `Alpaca`, and `GPT conversions`.
+Gemma Template supports exporting dataset files in three formats: `Text`, `Alpaca`, and `OpenAI`.
 
 # Multilingual Content Writing Assistant
 
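The `Alpaca` and `OpenAI` names above refer to two widely used fine-tuning record layouts. As a rough illustration only, one exported record per format might look like the sketch below; the field names (`instruction`, `input`, `output`, `messages`, `role`, `content`) follow the usual conventions for those formats and are assumptions rather than Gemma Template's documented output schema.

```python
# Illustrative shapes only; Gemma Template's actual field names may differ.

text_record = {
    "text": "<start_of_turn>user\n...prompt...<end_of_turn>\n<start_of_turn>model\n...response...<end_of_turn>",
}

alpaca_record = {
    "instruction": "Rewrite the input text or document ...",
    "input": "Gemma open models are built from the same research and technology as Gemini models. ...",
    "output": "A new family of open language models ...",
}

openai_record = {
    "messages": [
        {"role": "user", "content": "Rewrite the input text or document ...\n\n# Text:\n..."},
        {"role": "assistant", "content": "A new family of open language models ..."},
    ],
}
```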

@@ -66,7 +66,7 @@ It enhances text readability, aligns with linguistic nuances, and preserves orig
 - Supports advanced response structure format customization.
 - Compatible with other models such as LLaMa.
 - Enhances dynamic prompts using Round-Robin loops.
-- Outputs multiple formats such as Text, Alpaca and GPT conversions.
+- Outputs multiple formats such as Text, Alpaca and OpenAI.
 
 **Installation**
 ----------------
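
The feature list above mentions enhancing dynamic prompts with Round-Robin loops. As a minimal sketch of the general idea, not the library's internals, cycling through a list of prompt variants with `itertools.cycle` spreads each variant evenly across the documents being templated:

```python
from itertools import cycle

# Hypothetical prompt variants; Gemma Template ships its own template lists.
prompt_variants = cycle([
    "Rewrite the text to be more search engine friendly.",
    "Rewrite the input text or document to highlight its unique value proposition.",
])

documents = ["Doc A", "Doc B", "Doc C", "Doc D"]

# Each document is paired with the next variant in round-robin order,
# so no single prompt wording dominates the generated dataset.
prompts = [f"{next(prompt_variants)}\n\n# Text:\n{doc}" for doc in documents]
print(prompts[0])
print(prompts[2])
```
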
@@ -89,62 +89,104 @@ pip install git+https://github.com/thewebscraping/gemma-template.git
 ----------------
 Start using Gemma Template with just a few lines of code:
 
+## Load Dataset
+Returns: A Hugging Face Dataset or DatasetDict object containing the processed prompts.
+
+**Load Dataset from data dict**
 ```python
-from gemma_template.models import *
+prompt_instance = Template()
+data_dict = [
+    {
+        "id": "JnZJolR76_u2",
+        "title": "Sample title",
+        "description": "Sample description",
+        "document": "Sample document",
+        "categories": ["Topic 1", "Topic 2"],
+        "tags": ["Tag 1", "Tag 2"],
+        "output": "Sample output",
+        "main_points": ["Main point 1", "Main point 2"],
+    }
+]
+dataset = prompt_instance.load_dataset(data_dict, output_format='text') # enum: `text`, `alpaca` and `openai`.
+print(dataset['text'][0])
+```
+
+**Load Dataset from HuggingFace**
+```python
+dataset = gemma_template.load_dataset(
+    "YOUR_JSON_FILE_PATH_OR_HUGGINGFACE_DATASET",
+    # enum: `text`, `alpaca` and `openai`.
+    output_format='text',
+    # Percentage of documents that need to be word masked.
+    # Min: 0, Max: 1. Default: 0.
+    max_hidden_ratio=.1,
+    # Replace 10% of words in the input document with '_____'.
+    # Use int to extract the correct number of words. The `max_hidden_ratio` parameter must be greater than 0.
+    max_hidden_words=.1,
+    # Minimum character of a word, used to create unigrams, bigrams, and trigrams. Default is 2.
+    min_chars_length=2,
+    # Maximum character of a word, used to create unigrams, bigrams and trigrams. Default is 0.
+    max_chars_length=8,
+)
+```
+
+## Fully Customized Template
+
+```python
+from gemma_template import Template, FieldPosition, INPUT_TEMPLATE, OUTPUT_TEMPLATE, INSTRUCTION_TEMPLATE, PROMPT_TEMPLATE
 
 template_instance = Template(
-    structure_field=StructureField(
-        title=["Custom Title"],
-        description=["Custom Description"],
-        document=["Custom Article"],
-        main_points=["Custom Main Points"],
-        categories=["Custom Categories"],
-        tags=["Custom Tags"],
-    ),
-) # Create fully customized structured reminders.
-
-response = template_instance.template(
+    instruction_template=[INSTRUCTION_TEMPLATE], # Optional: dynamic Round-Robin loops
+    prompt_template=[PROMPT_TEMPLATE], # Optional: dynamic Round-Robin loops
+    input_template=[INPUT_TEMPLATE], # Optional: dynamic Round-Robin loops
+    output_template=[OUTPUT_TEMPLATE], # Optional: dynamic Round-Robin loops
+    position=FieldPosition(
+        title=["Custom Title"],
+        description=["Custom Description"],
+        document=["Custom Article"],
+        main_points=["Custom Main Points"],
+        categories=["Custom Categories"],
+        tags=["Custom Tags"],
+    ), # Optional: dynamic Round-Robin loops
+)
+
+response = template_instance.apply_template(
     title="Gemma open models",
     description="Gemma: Introducing new state-of-the-art open models.",
-    document="Gemma open models are built from the same research and technology as Gemini models. Gemma 2 comes in 2B, 9B and 27B and Gemma 1 comes in 2B and 7B sizes.",
     main_points=["Main point 1", "Main point 2"],
     categories=["Artificial Intelligence", "Gemma"],
     tags=["AI", "LLM", "Google"],
+    document="Gemma open models are built from the same research and technology as Gemini models. Gemma 2 comes in 2B, 9B and 27B and Gemma 1 comes in 2B and 7B sizes.",
    output="A new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety.",
     max_hidden_words=.1, # set 0 if you don't want to hide words.
     min_chars_length=2, # Minimum character of a word, used to create unigrams, bigrams, and trigrams. Default is 2.
-    max_chars_length=0, # Maximum character of a word, used to create unigrams, bigrams and trigrams.. Default is 0.
-) # remove kwargs if not used.
+    max_chars_length=0, # Maximum character of a word, used to create unigrams, bigrams and trigrams. Default is 0.
+) # remove kwargs if not used.
+
 print(response)
 ```
 
 ### Output:
 
 ```text
 <start_of_turn>user
-
 You are a multilingual professional writer.
 
-Rewrite the text to be more search engine friendly. Incorporate relevant keywords naturally, improve readability, and ensure it aligns with SEO best practices.
-
 # Role:
 You are a highly skilled professional content writer, linguistic analyst, and multilingual expert specializing in structured writing and advanced text processing.
 
 # Task:
 Your primary objectives are:
-1. Your primary task is to rewrite the provided content into a more structured, professional format that maintains its original intent and meaning.
-2. Enhance vocabulary comprehension by analyzing text with unigrams (single words), bigrams (two words), and trigrams (three words).
-3. Ensure your response adheres strictly to the prescribed structure format.
-4. Respond in the primary language of the input text unless alternative instructions are explicitly given.
+1. Simplification: Rewrite the input text or document to ensure it is accessible and easy to understand for a general audience while preserving the original meaning and essential details.
+2. Lexical and Grammatical Analysis: Analyze and refine vocabulary and grammar using unigrams (single words), bigrams (two words), and trigrams (three words) to enhance readability and depth.
+3. Structure and Organization: Ensure your response adheres strictly to the prescribed structure format.
+4. Language Consistency: Respond in the same language as the input text unless explicitly directed otherwise.
 
-# Additional Expectations:
+# Additional Guidelines:
 1. Provide a rewritten, enhanced version of the input text, ensuring professionalism, clarity, and improved structure.
 2. Focus on multilingual proficiency, using complex vocabulary, grammar to improve your responses.
 3. Preserve the context and cultural nuances of the original text when rewriting.
 
-Topics: Artificial Intelligence, Gemma
-Keywords: AI, LLM, Google
-
 # Text Analysis:
 Example 1: Unigrams (single words)
 and => English
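
The `max_hidden_ratio` and `max_hidden_words` parameters introduced in this hunk control how many words of the input document are replaced with the `_____` placeholder, which is what produces the masked `# Text:` block in the example output. A minimal sketch of that kind of masking, assuming a simple whitespace split and a hypothetical `mask_words` helper rather than Gemma Template's actual implementation:

```python
import random

def mask_words(document: str, ratio: float = 0.1, placeholder: str = "_____") -> str:
    """Replace roughly `ratio` of the whitespace-separated words with a placeholder."""
    words = document.split()
    if not words:
        return document
    n_hidden = max(1, int(len(words) * ratio))
    for idx in random.sample(range(len(words)), n_hidden):
        words[idx] = placeholder
    return " ".join(words)

doc = (
    "Gemma open models are built from the same research and technology as Gemini models. "
    "Gemma 2 comes in 2B, 9B and 27B and Gemma 1 comes in 2B and 7B sizes."
)
print(mask_words(doc, ratio=0.1))  # roughly 10% of the words become '_____'
```
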
@@ -165,88 +207,44 @@ Text Analysis 3: Trigrams further validate the linguistic analysis and the neces
 # Conclusion of Text Analysis:
 The linguistic analysis confirms the text is predominantly in English. Consequently, the response should be structured and written in English to align with the original text and context.
 
+# Input Text:
+Rewrite the input text or document to highlight its unique value proposition while ensuring it ranks well for targeted keywords.
+
 # Response Structure Format:
 You must follow the response structure:
-**Custom Title (Title):** Rewrite the title to reflect the main keyword and topic.
-**Custom Description (Meta Description):** Rewrite the description with a bold claim or statistic to grab attention.
-**Custom Article (Edit Article):** Reimagine this article with a more engaging and creative tone. Add metaphors, analogies, or storytelling elements to make it more captivating for readers.
-**Custom Main Points (Highlights):** Summarize the main ideas into concise, actionable key points for added context to make them more engaging.
-**Custom Categories (Topics):** Assign appropriate categories to the article based text or target audience.
-**Custom Tags (Keywords):** Focus use tags that reflect the article’s subtopics or themes for better SEO.
+
+**Custom Title (Title):** Rewrite the title to maximize clarity, appeal, and relevance to the content.
+**Custom Description (Description):** Create a description focusing on how the article addresses a common problem or challenge readers face.
+**Custom Article (Article):** Rewrite the input text or document with an authoritative tone, incorporating credible sources, data, and references to boost trustworthiness and SEO ranking.
+**Custom Main Points (Main Points):** Ensure all key points flow logically from one to the next.
+**Custom Categories (Categories):** Use categories that align with similar articles on the topic and improve SEO and discoverability.
+**Custom Tags (Tags):** Rewrite tags to make them more specific and targeted.
 
 By adhering to this format, the response will maintain linguistic integrity while enhancing professionalism, structure and alignment with user expectations.
 
 # Text:
-Gemma open models are built from the same research _____ technology as Gemini models. Gemma 2 comes in 2B, 9B and 27B _____ Gemma 1 comes in 2B _____ 7B sizes.
-
-<end_of_turn>
+Gemma open models are built _____ the same _____ and technology as Gemini models. Gemma 2 comes in 2B, 9B _____ 27B and Gemma 1 comes in 2B and 7B sizes.<end_of_turn>
 <start_of_turn>model
-
-## **Custom Title**:
+## **Custom Title:**
 ### Gemma open models
 
-## **Custom Description**:
+## **Custom Description:**
 Gemma: Introducing new state-of-the-art open models.
 
-## **Custom Article**:
+## **Custom Article:**
 A new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety.
 
-## **Custom Main Points**:
-- Main point 1
-- Main point 2
+## **Custom Main Points:**
+* Main point 1
+* Main point 2
 
-## **Custom Categories**:
-- Artificial Intelligence
-- Gemma
-
-## **Custom Tags**:
-- AI
-- LLM
-- Google<end_of_turn>
-
-```
-
-## Load Dataset
-Returns: Dataset: A Hugging Face Dataset or DatasetDict object containing the processed prompts.
+## **Custom Categories:**
+* Artificial Intelligence
+* Gemma
 
-**Load Dataset from local file path**
-```python
-prompt_instance = Template()
-data_dict = [
-    {
-        "id": "JnZJolR76_u2",
-        "title": "Sample title",
-        "description": "Sample description",
-        "document": "Sample document",
-        "categories": ["Topic 1", "Topic 2"],
-        "tags": ["Tag 1", "Tag 2"],
-        "output": "Sample output",
-        "main_points": ["Main point 1", "Main point 2"],
-    }
-]
-dataset = prompt_instance.load_dataset(data_dict, output_format='text') # enum: text, gpt, alpaca
-print(dataset['text'][0])
-```
+## **Custom Tags:**
+* AI
+* LLM
+* Google<end_of_turn>
 
-**Load Dataset from HuggingFace**
-```python
-dataset = gemma_template.load_dataset(
-    "your_huggingface_dataset",
-    # enum: `text`, `alpaca` and `gpt`.
-    output_format='text',
-    # Template for instruction the user prompt.
-    instruction_template=INSTRUCTION_TEMPLATE,
-    # Template for structuring the user prompt.
-    structure_template=STRUCTURE_TEMPLATE,
-    # Percentage of documents that need to be word masked.
-    # Min: 0, Max: 1. Default: 0.
-    max_hidden_ratio=.1,
-    # Replace 10% of words in the input document with '_____'.
-    # Use int to extract the correct number of words. The `max_hidden_ratio` parameter must be greater than 0.
-    max_hidden_words=.1,
-    # Minimum character of a word, used to create unigrams, bigrams, and trigrams. Default is 2.
-    min_chars_length=2,
-    # Maximum character of a word, used to create unigrams, bigrams and trigrams. Default is 0.
-    max_chars_length=8,
-)
 ```
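
The prompt's `# Text Analysis:` section relies on unigrams, bigrams, and trigrams built from words whose length falls between `min_chars_length` and `max_chars_length`. A minimal sketch of that kind of length-filtered n-gram extraction, using a hypothetical `ngrams` helper and assuming, as the README's defaults suggest, that a maximum of 0 means no upper bound; this is not the library's own code:

```python
import re

def ngrams(text: str, n: int, min_chars: int = 2, max_chars: int = 0) -> list[str]:
    """Build n-grams from words within the character-length bounds (max_chars=0 disables the upper bound)."""
    words = [
        w for w in re.findall(r"[A-Za-z']+", text.lower())
        if len(w) >= min_chars and (max_chars == 0 or len(w) <= max_chars)
    ]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sample = "Gemma open models are built from the same research and technology as Gemini models."
print(ngrams(sample, 1, max_chars=8)[:5])  # unigrams (single words)
print(ngrams(sample, 2, max_chars=8)[:5])  # bigrams (two words)
print(ngrams(sample, 3, max_chars=8)[:5])  # trigrams (three words)
```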
