Commit e2e87a5

fix:update api doc with diy prompt template (#150)
1 parent 30a5364 commit e2e87a5

File tree

8 files changed: +392 −91 lines


docs/en/notes/api/operators/conversations/generate/ConsistentChatGenerator.md

Lines changed: 82 additions & 16 deletions
@@ -6,32 +6,98 @@ permalink: /en/api/operators/conversations/generate/consistentchatgenerator/
## 📘 Overview

`ConsistentChatGenerator` is a multi-turn dialogue data generation operator that synthesizes dialogue data from scratch in two stages, based on predefined topics and human intents, making it well suited to creating consistent, context-aware conversational datasets.

## `__init__` function

```python
def __init__(self,
             llm_serving: LLMServingABC = None,
             num_dialogs_per_intent=20,
             num_turns_per_dialog=6,
             temperature=0.9,
             prompt_template: Union[ConsistentChatPrompt, DIYPromptABC] = None):
```

### init Parameters

| Parameter | Type | Default Value | Description |
| :--- | :--- | :--- | :--- |
| **llm_serving** | LLMServingABC | None | The Large Language Model serving instance used for generation. |
| **num_dialogs_per_intent** | int | 20 | The number of dialogs to generate for each predefined intent. |
| **num_turns_per_dialog** | int | 6 | The number of turns (user and assistant messages) in each dialog. |
| **temperature** | float | 0.9 | The sampling temperature controlling the randomness of generation. |
| **prompt_template** | Union[ConsistentChatPrompt, DIYPromptABC] | None | The prompt template used for generation; pass a DIYPromptABC implementation to customize it. |
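
If the default template does not fit, `prompt_template` accepts a `DIYPromptABC` implementation. A minimal sketch, assuming the custom class exposes a `build_prompt` method modeled on the default templates (the actual `DIYPromptABC` interface may differ):

```python
# Hypothetical sketch of a custom prompt template. DIYPromptABC's exact
# interface is not shown in this doc; the build_prompt method name and
# signature here are assumptions, not a verified API.
class MyChatPrompt:
    def build_prompt(self, topic: str) -> str:
        return (
            "Generate 6 turns of realistic user questions on the topic below, "
            "as JSON with 'category' and 'turns' fields.\n"
            f"Topic: {topic}"
        )

prompt = MyChatPrompt().build_prompt("home espresso brewing")
print(prompt)
```

An instance of such a class would then be passed as `prompt_template` when constructing the operator.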

### Prompt Template Description

The default prompt template consists of two parts: `query` and `response`.

#### query

```python
Task Description and Rules
1. Generate multiple rounds of realistic user questions based on the provided topic:
- Based on a single core topic (provided directly by the user), generate multiple rounds of realistic user questions, comprising 6-8 turns in total.
- The questions should match the characteristics of real users in natural communication: sometimes simple, sometimes vague, or including contextual backgrounds, and should reflect the language style of daily communication.
- Note: Avoid directly including the exact expression of the input topic in the questions. Instead, abstract it with natural and conversational language in practical scenarios.

2. Dynamic Dialogue Information Flow in Conversations: Below are the relevant steps of the information flow: {info_flow}

The dialogue style should adhere to the following requirements:
- Utilize natural phrasing and vivid language, avoiding overly mechanical responses.
- Favor shorter sentences in questions, with occasional subject omission allowed.
- Ensure smooth and logical transitions through lighthearted or entertaining interjections.
- Permit the expression of specific personality traits and individualized tones.
- Proactively introduce new topics when appropriate, ensuring relevance to the current theme.

The dialogue should comply with the following generation rules:
- For each round of dialogue, only simulate user questions without providing answers.
- Ensure the conversation flows naturally and reflects realistic interactive thinking.
- Avoid overly polished or templated content, ensuring the questions feel authentic and relatable in life scenarios.

Output Format:
Multi-turn Questions in JSON Format:
"category": "<Core Topic of the Conversation>",
"turns": ["<turn_1>", "<turn_2>", "<turn_3>", "..."]
To generate multi-turn queries with high topic consistency, please think step-by-step.
The input core topic for this task is: {topic}
```
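
The query stage asks the model for a JSON object with `category` and `turns` keys. An illustrative example of that shape (the topic and questions here are invented for demonstration):

```python
import json

# Invented example of the JSON shape the query prompt requests:
# a core topic plus the generated user-side turns.
example = {
    "category": "learning to cook pasta",
    "turns": [
        "I keep overcooking spaghetti, any tips?",
        "What does al dente actually feel like?",
        "Does salting the water really matter?",
    ],
}
print(json.dumps(example, indent=2))
```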

#### response

```python
Your task is to simulate a multi-turn conversation where you progressively answer a series of user questions provided under a given topic category. For each answer, focus on delivering a natural, contextually relevant, and actionable response while considering both the current question and future questions in the sequence. The goal is to ensure consistency and logical progression throughout the dialogue and to avoid unnecessary follow-up questions in the responses simultaneously. To generate multi-turn responses with high topic consistency, think step-by-step. Key Dialogue Style Requirements are as follows:
Content and Structure:
1. Directly Answer the Current Question:
- Provide a complete, useful response to the current question without posing additional questions unless they are directly relevant to future queries.
- If clarification or additional steps are needed, frame these as suggestions or explanations rather than questions.
2. Be Context-Aware:
- Always tailor each response to the current question while remaining mindful of the context provided by prior and future questions.
- Avoid prematurely addressing future queries but create subtle links where necessary to ensure smooth progression.
3. Clear, Action-Oriented Responses:
- Focus on providing actionable advice, logical explanations, or troubleshooting steps rather than speculative or rhetorical remarks.
- Avoid long or overly complex explanations; aim for clarity and efficiency.
Tone and Style:
1. Conversational and Supportive:
- Use a natural, empathetic tone that simulates real-life problem-solving interactions.
- Avoid mechanical or overly formal responses.
2. Economical with Words:
- Keep responses concise but informative. Minimize extraneous content while ensuring answers have enough detail to be helpful.
3. No Unnecessary Questions:
- Limit unnecessary questions in the responses and focus instead on providing actionable steps or solutions directly. Avoid follow-up questions that don’t align with the next user query.
Turn-by-Turn Instructions:
1. Answer Exclusively for the Current Question:
- For each turn, generate an answer that directly addresses the immediate question. Avoid revisiting past details unnecessarily unless they are highly relevant.
- While you shouldn’t anticipate or directly answer future queries, your response should create natural openings for upcoming questions if applicable.
2. Avoid Irrelevant Follow-Up Questions:
- If the immediate question doesn’t require clarification, frame your response as a statement or suggestion rather than a question.
- Maintain alignment with the logical flow of dialogue to ensure each turn is coherent.
3. Proactively Provide Scenarios or Steps:
- Where appropriate, guide the user with specific recommendations, troubleshooting actions, or observations they can make without requiring back-and-forth clarification.
Output Requirements:
The output must simulate the conversation by only providing responses (one per turn) in a sequential manner. The final format must strictly adhere to valid JSON and include the required structure.

The input core topic and questions-only turns for this task is:
core topic: {topic}
queries:
{', '.join([f'User query: {query}' for query in queries])}
```
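
The final placeholder inlines the query-stage questions into the response prompt. A quick demonstration of how that join expression renders (the queries are invented):

```python
# Invented queries, showing how the prompt's join expression formats them.
queries = ["How do I start?", "What if it fails?"]
joined = ', '.join(f'User query: {q}' for q in queries)
print(joined)
# → User query: How do I start?, User query: What if it fails?
```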
## `run` function

docs/en/notes/api/operators/text_sft/generate/CondorGenerator.md

Lines changed: 61 additions & 22 deletions
@@ -4,41 +4,80 @@ createTime: 2025/10/09 16:52:48
permalink: /en/api/operators/text_sft/generate/condorgenerator/
---

## 📘 Overview
`CondorGenerator` is a two-stage operator that generates SFT (Supervised Fine-Tuning) data from scratch. Based on predefined knowledge tree tags, it first generates questions of varying difficulty levels and then generates a corresponding answer for each question, enabling the creation of diverse instruction fine-tuning datasets.

## `__init__`
```python
def __init__(self, llm_serving: LLMServingABC = None, num_samples=15, use_task_diversity=True, prompt_template: Union[CondorQuestionPrompt, DIYPromptABC] = None):
```
### init Parameters

| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| **llm_serving** | LLMServingABC | None | The Large Language Model serving instance, used to execute inference and generation. |
| **num_samples** | int | 15 | Total number of samples to generate. For large values (e.g., above 5000), add more knowledge tree tags to preserve data richness. |
| **use_task_diversity** | bool | True | Whether to use predefined task scenarios to enhance the diversity of generated questions. |
| **prompt_template** | Union[CondorQuestionPrompt, DIYPromptABC] | None | The prompt template used for question generation; pass a DIYPromptABC implementation to customize it. |
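
A custom template presumably builds the question prompt from a theme and a domain, as the default does. A minimal sketch, assuming a `build_prompt(theme, domain)` method mirroring the default template (`DIYPromptABC`'s exact interface is not shown in this doc):

```python
# Hypothetical custom question template; the method signature mirrors the
# default CondorQuestionPrompt's build_prompt, not a verified API.
class ShortCondorPrompt:
    def build_prompt(self, theme: str, domain: str) -> str:
        return (
            f"Write three {domain} questions about {theme}, one per line, "
            "tagged [Easy], [Medium], and [Hard]."
        )

print(ShortCondorPrompt().build_prompt("table manners", "etiquette"))
```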

### Prompt Template Description

The default prompt generates questions based on predefined themes and domains.

```python
def build_prompt(self, theme, domain):
    """
    Generates the formatted prompt for LLM input based on the theme and domain.

    Parameters:
        theme (str): The main theme of the questions.
        domain (str): The domain under the given theme.

    Returns:
        str: The formatted prompt for generating questions.
    """
    prompt = f"""
Now we need to create high-quality SFT data for LLM training, so we need you to produce a batch of such data. You only
need to create Questions. I will give you a theme for SFT data Questions. You need to create three
Questions of different difficulty levels based on this new theme.\\
Your Questions must meet the following requirements:\\
1. You must strictly create only three Questions at a time. These three Questions must be in the domain of {domain}
and the Questions should align with the given theme of {theme}.\\
2. The Questions you create must have context and sufficient information; they should not be abrupt and directly ask the
question.\\
3. Your reply must strictly follow the format below. Your Questions need to be included between [Question Start] and
[Question End], and the difficulty level should be indicated at the beginning, as in the following format:\\

[Easy][Question Start]Question[Question End]

[Medium][Question Start]Question[Question End]

[Hard][Question Start]Question[Question End]

4. Your Questions of different difficulty levels should be distinct and actually reflect the different levels of difficulty.\\
\quad \\

Now it's your turn. Please provide the three Questions of different difficulty levels you created about the theme of {theme} for {domain}, according to the requirements.
"""
    return prompt
```
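
The prompt pins down a machine-parseable reply format. A sketch of how such a reply could be parsed (the parsing code is illustrative, not the operator's actual implementation, and the questions are invented):

```python
import re

# Example model reply in the [Difficulty][Question Start]...[Question End]
# format requested by the prompt above.
reply = (
    "[Easy][Question Start]What is a knowledge tree?[Question End]\n"
    "[Medium][Question Start]How are tags sampled from a knowledge tree?[Question End]\n"
    "[Hard][Question Start]Design a sampler that balances rare tags.[Question End]"
)

# Capture the difficulty tag and the question text between the markers.
pattern = re.compile(r"\[(Easy|Medium|Hard)\]\[Question Start\](.*?)\[Question End\]", re.S)
questions = [{"difficulty": d, "instruction": q.strip()} for d, q in pattern.findall(reply)]
print(questions)
```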

## `run`
```python
def run(self, storage: DataFlowStorage):
```
#### Parameters
| Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| **storage** | DataFlowStorage | Required | The DataFlowStorage instance, responsible for reading and writing data. |

## Example Usage

Example synthetic data:

```json
{
  "difficulty": "Easy",
  "instruction": "Imagine you're visiting a country where the custom is to greet others with a bow instead of a handshake. As a traveler aiming to adapt culturally, how would you approach your first meeting with locals to show respect for their traditions? How might you prepare before your trip to ensure you understand this and other greeting customs?",
  "output": "Adapting to local customs when visiting a new country is a respectful and enriching approach. Here's how you might approach your first meeting with locals to show respect for their tradition of greeting with a bow:\n\n1. **Preparation Before the Trip:**\n - **Research Cultural Practices:** Begin by researching the social norms and customs specific to the country you are visiting. This can include watching videos, reading travel guides, and exploring reputable online resources that describe local etiquette.\n - **Understand the Bowing Gesture:** Learn the different forms and depths of bowing, as some cultures have variations based on the social context or the status of the person you are greeting.\n - **Learn the Language Basics:** Familiarize yourself with basic greetings and polite phrases in the local language, which will enhance your sincerity in adapting to their customs.\n - **Connect with Locals:** If possible, reach out to locals or expatriate communities online to gather insights or tips on cultural norms and expectations.\n\n2. **During the Meeting:**\n - **Observe and Mimic:** Upon meeting someone for the first time, observe how they initiate the greeting, the depth of their bow, and then respectfully mimic their approach.\n - **Display Body Language that Conveys Openness:** Maintain good posture, smile genuinely, and make eye contact where culturally appropriate to convey warmth and openness.\n - **Offer a Polite Verbal Greeting:** Combine the physical gesture with a courteous verbal greeting in their language or a respectful equivalent.\n\n3. **Be Open to Feedback:**\n - **Ask for Guidance:** If you are unsure about the proper etiquette, it is perfectly fine to politely ask if you are doing it correctly or request guidance on any cultural nuances.\n - **Show Appreciation:** Express gratitude for any insights or corrections offered by locals, demonstrating your willingness to learn and respect their traditions.\n\nBy thoroughly preparing and demonstrating genuine respect during your interactions, you show appreciation for their culture and foster positive and meaningful connections."
}
```
