**ConsistentChatGenerator** is an operator designed to synthesize multi-turn dialogue data from scratch. It operates in a two-stage process, generating conversations based on a predefined set of topics and human intents, making it ideal for creating consistent and context-aware conversational datasets.

`ConsistentChatGenerator` is a multi-turn dialogue data generation operator that synthesizes dialogue data from scratch in two stages based on predefined topics and human intents.
1. Generate multiple rounds of realistic user questions based on the provided topic:

- Based on a single core topic (provided directly by the user), generate multiple rounds of realistic user questions, comprising 6-8 turns in total.

- The questions should match the characteristics of real users in natural communication: sometimes simple, sometimes vague, or including contextual backgrounds, and should reflect the language style of daily communication.

- Note: Avoid directly including the exact expression of the input topic in the questions. Instead, abstract it with natural and conversational language in practical scenarios.

2. Dynamic Dialogue Information Flow in Conversations: Below are the relevant steps of the information flow: {info_flow}

The dialogue style should adhere to the following requirements:
To generate multi-turn queries with high topic consistency, please think step-by-step.

The input core topic for this task is: {topic}
```
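The `{topic}` and `{info_flow}` placeholders in the template above are filled in at run time. A minimal standalone sketch of that substitution; the function name, the abbreviated template text, and the flow steps below are illustrative, not `ConsistentChatGenerator`'s actual API:

```python
# Illustrative sketch only: an abbreviated stand-in for the question-generation
# template above, with hypothetical names (not the operator's real API).
QUESTION_PROMPT = (
    "1. Generate multiple rounds of realistic user questions based on the provided topic.\n"
    "2. Dynamic Dialogue Information Flow in Conversations: {info_flow}\n"
    "The input core topic for this task is: {topic}"
)

def build_question_prompt(topic, info_flow):
    # Join the information-flow steps into one string before substitution.
    return QUESTION_PROMPT.format(topic=topic, info_flow="; ".join(info_flow))

prompt = build_question_prompt(
    "home espresso brewing",
    ["problem statement", "clarification", "solution", "follow-up"],
)
```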
#### response

```python
Your task is to simulate a multi-turn conversation where you progressively answer a series of user questions provided under a given topic category. For each answer, focus on delivering a natural, contextually relevant, and actionable response while considering both the current question and future questions in the sequence. The goal is to ensure consistency and logical progression throughout the dialogue and to avoid unnecessary follow-up questions in the responses simultaneously. To generate multi-turn responses with high topic consistency, think step-by-step. Key Dialogue Style Requirements are as follows:

Content and Structure:

1. Directly Answer the Current Question:
- Provide a complete, useful response to the current question without posing additional questions unless they are directly relevant to future queries.
- If clarification or additional steps are needed, frame these as suggestions or explanations rather than questions.

2. Be Context-Aware:
- Always tailor each response to the current question while remaining mindful of the context provided by prior and future questions.
- Avoid prematurely addressing future queries but create subtle links where necessary to ensure smooth progression.

3. Clear, Action-Oriented Responses:
- Focus on providing actionable advice, logical explanations, or troubleshooting steps rather than speculative or rhetorical remarks.
- Avoid long or overly complex explanations; aim for clarity and efficiency.

Tone and Style:

1. Conversational and Supportive:
- Use a natural, empathetic tone that simulates real-life problem-solving interactions.
- Avoid mechanical or overly formal responses.

2. Economical with Words:
- Keep responses concise but informative. Minimize extraneous content while ensuring answers have enough detail to be helpful.

3. No Unnecessary Questions:
- Limit unnecessary questions in the responses and focus instead on providing actionable steps or solutions directly. Avoid follow-up questions that don’t align with the next user query.

Turn-by-Turn Instructions:

1. Answer Exclusively for the Current Question:
- For each turn, generate an answer that directly addresses the immediate question. Avoid revisiting past details unnecessarily unless they are highly relevant.
- While you shouldn’t anticipate or directly answer future queries, your response should create natural openings for upcoming questions if applicable.

2. Avoid Irrelevant Follow-Up Questions:
- If the immediate question doesn’t require clarification, frame your response as a statement or suggestion rather than a question.
- Maintain alignment with the logical flow of dialogue to ensure each turn is coherent.

3. Proactively Provide Scenarios or Steps:
- Where appropriate, guide the user with specific recommendations, troubleshooting actions, or observations they can make without requiring back-and-forth clarification.

Output Requirements:
The output must simulate the conversation by only providing responses (one per turn) in a sequential manner. The final format must strictly adhere to valid JSON and include the required structure.

The input core topic and questions-only turns for this task are:
core topic: {topic}
queries:
{', '.join([f'User query: {query}' for query in queries])}
```

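The last line of the template is a Python f-string expression that flattens the query list into a single string. A standalone sketch of how it renders; the topic and queries below are made-up sample data:

```python
# Made-up sample inputs; only the join expression is taken from the template.
topic = "laptop battery drain"
queries = [
    "Why does my battery die so fast?",
    "Which settings should I check first?",
]

rendered = (
    f"core topic: {topic}\n"
    "queries:\n"
    f"{', '.join([f'User query: {query}' for query in queries])}"
)
```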
`CondorGenerator` is an operator that generates Supervised Fine-Tuning (SFT) data from scratch based on predefined knowledge tree tags. It operates in a two-stage process: first generating questions of varying difficulty levels, and then generating corresponding answers for each question.
## 📘 Overview
`CondorGenerator` is a two-stage operator for generating SFT (Supervised Fine-Tuning) formatted data. Based on predefined knowledge tree tags, it first generates questions of different difficulty levels, and then generates corresponding answers for each question. This operator can create diverse instruction fine-tuning datasets from scratch.
|**llm_serving**| LLMServingABC | None | The Large Language Model serving instance, used to execute inference and generation. |
|**num_samples**| int | 15 | Total number of samples to generate. When the number is large (e.g., greater than 5000), it is recommended to increase the knowledge tree tags to ensure data richness. |
|**use_task_diversity**| bool | True | Whether to use predefined task scenarios to enhance the diversity of generated questions. |
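One way to see why large `num_samples` values call for more knowledge tree tags: each (theme, domain) tag pair yields three questions of distinct difficulty (per the prompt template described below), so the number of distinct samples is bounded by the number of tag pairs. A rough, illustrative estimate with made-up tag counts:

```python
# Rough capacity estimate (illustrative; the tag counts are placeholders).
# Each (theme, domain) pair produces three questions: Easy, Medium, Hard.
def max_distinct_samples(num_themes, domains_per_theme, questions_per_pair=3):
    return num_themes * domains_per_theme * questions_per_pair

# e.g. 30 themes x 20 domains -> 1800 distinct questions, so requesting
# num_samples > 5000 would force repetition unless more tags are added.
capacity = max_distinct_samples(30, 20)
```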
### Prompt Template Description
The default prompt generates questions based on predefined themes and domains.
```python
def build_prompt(self, theme, domain):
    """
    Generates the formatted prompt for LLM input based on the theme and domain.

    Parameters:
        theme (str): The main theme of the questions.
        domain (str): The domain under the given theme.

    Returns:
        str: The formatted prompt for generating questions.
    """
    prompt = f"""
Now we need to create high-quality SFT data for LLM training, so we need you to produce a batch of such data. You only
need to create Questions. I will give you a theme for SFT data Questions. You need to create three
Questions of different difficulty levels based on this new theme.\\
Your Questions must meet the following requirements:\\
1. You must strictly create only three Questions at a time. These three Questions must be in the domain of {domain}
and the Questions should align with the given theme of {theme}.\\
2. The Questions you create must have context and sufficient information; they should not be abrupt and directly ask the
question.\\
3. Your reply must strictly follow the format below. Your Questions need to be included between [Question Start] and
[Question End], and the difficulty level should be indicated at the beginning, as in the following format:\\

[Easy][Question Start]Question[Question End]

[Medium][Question Start]Question[Question End]

[Hard][Question Start]Question[Question End]

4. Your Questions of different difficulty levels should be distinct and actually reflect the different levels of difficulty.\\
\quad \\

Now it's your turn. Please provide the three Questions of different difficulty levels you created about the theme of {theme} for {domain}, according to the requirements.
"""
    return prompt
```
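The template pins replies to a `[Difficulty][Question Start]…[Question End]` format, which makes them straightforward to parse mechanically. A sketch of one way to do that; the regex and function below are illustrative, and the operator's actual parsing logic may differ:

```python
import re

# Illustrative parser for the reply format the prompt enforces; not taken
# from CondorGenerator's real implementation.
PATTERN = re.compile(
    r"\[(Easy|Medium|Hard)\]\[Question Start\](.*?)\[Question End\]",
    re.DOTALL,
)

def parse_questions(reply):
    # Each match is a (difficulty, question) tuple.
    return [
        {"difficulty": d, "instruction": q.strip()}
        for d, q in PATTERN.findall(reply)
    ]

reply = (
    "[Easy][Question Start]What is SFT?[Question End]\n"
    "[Medium][Question Start]Why use difficulty tiers?[Question End]\n"
    "[Hard][Question Start]Design a tagging scheme.[Question End]"
)
parsed = parse_questions(reply)
```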
## `run`
```python
def run(self, storage: DataFlowStorage):
```
#### Parameters

| Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
|**storage**| DataFlowStorage | Required | The DataFlowStorage instance, responsible for reading and writing data. |
## Example Usage
Example synthetic data:
```json
{
"difficulty":"Easy",
"instruction":"Imagine you're visiting a country where the custom is to greet others with a bow instead of a handshake. As a traveler aiming to adapt culturally, how would you approach your first meeting with locals to show respect for their traditions? How might you prepare before your trip to ensure you understand this and other greeting customs?",
"output":"Adapting to local customs when visiting a new country is a respectful and enriching approach. Here's how you might approach your first meeting with locals to show respect for their tradition of greeting with a bow:\n\n1. **Preparation Before the Trip:**\n - **Research Cultural Practices:** Begin by researching the social norms and customs specific to the country you are visiting. This can include watching videos, reading travel guides, and exploring reputable online resources that describe local etiquette.\n - **Understand the Bowing Gesture:** Learn the different forms and depths of bowing, as some cultures have variations based on the social context or the status of the person you are greeting.\n - **Learn the Language Basics:** Familiarize yourself with basic greetings and polite phrases in the local language, which will enhance your sincerity in adapting to their customs.\n - **Connect with Locals:** If possible, reach out to locals or expatriate communities online to gather insights or tips on cultural norms and expectations.\n\n2. **During the Meeting:**\n - **Observe and Mimic:** Upon meeting someone for the first time, observe how they initiate the greeting, the depth of their bow, and then respectfully mimic their approach.\n - **Display Body Language that Conveys Openness:** Maintain good posture, smile genuinely, and make eye contact where culturally appropriate to convey warmth and openness.\n - **Offer a Polite Verbal Greeting:** Combine the physical gesture with a courteous verbal greeting in their language or a respectful equivalent.\n\n3. **Be Open to Feedback:**\n - **Ask for Guidance:** If you are unsure about the proper etiquette, it is perfectly fine to politely ask if you are doing it correctly or request guidance on any cultural nuances.\n - **Show Appreciation:** Express gratitude for any insights or corrections offered by locals, demonstrating your willingness to learn and respect their traditions.\n\nBy thoroughly preparing and demonstrating genuine respect during your interactions, you show appreciation for their culture and foster positive and meaningful connections."
}
```