Commit 0deb706

cite fix
1 parent a654627 commit 0deb706

6 files changed: +242 −133 lines changed
Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+[config1]
+COORDINATOR_API_KEY=''
+COORDINATOR_MODEL=''
+COORDINATOR_BASE_URL=''
+
+
+MODELER_API_KEY=''
+MODELER_MODEL=''
+MODELER_BASE_URL=''
+
+CODER_API_KEY=''
+CODER_MODEL=''
+CODER_BASE_URL=''
+
+WRITER_API_KEY=''
+WRITER_MODEL=''
+WRITER_BASE_URL=''
+
+[config2]
+
+
+[current]
+current = 'config1'
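
The new file groups one API key / model / base-URL triple per agent role under named profiles, and [current] names the profile the backend should use. A minimal sketch of how such a profile file could be consumed, assuming it parses as TOML and is named config.toml (the file name and the loader are assumptions, not shown in this commit):

import tomllib  # stdlib TOML parser, Python 3.11+

def load_agent_profile(path: str = "config.toml") -> dict:
    # Parse the whole file, read the active profile name from [current],
    # then return that profile's COORDINATOR_* / MODELER_* / CODER_* / WRITER_* keys.
    with open(path, "rb") as f:
        cfg = tomllib.load(f)
    active = cfg["current"]["current"]  # e.g. 'config1'
    return cfg[active]

# Hypothetical usage:
# profile = load_agent_profile()
# coder_key = profile["CODER_API_KEY"]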

backend/app/core/agents/writer_agent.py

Lines changed: 0 additions & 3 deletions

@@ -8,12 +8,10 @@
 from app.schemas.response import SystemMessage, WriterMessage
 import json
 from app.core.functions import writer_tools
-from app.utils.common_utils import split_footnotes
 from icecream import ic
 from app.schemas.A2A import WriterResponse


-# 长文本
 # 长文本
 # TODO: 并行 parallel
 # TODO: 获取当前文件下的文件
@@ -134,7 +132,6 @@ async def run(
                 sub_title=sub_title,
             )
             response_content = next_response.choices[0].message.content
-            # main_text, footnotes = split_footnotes(response_content)
         else:
             response_content = response.choices[0].message.content
         self.chat_history.append({"role": "assistant", "content": response_content})

backend/app/core/prompts.py

Lines changed: 78 additions & 64 deletions

@@ -56,68 +56,80 @@
 - 禁止嵌套/多级JSON
 """

-# TODO : 对于特大 csv 读取
-
-CODER_PROMPT = f"""You are an AI code interpreter.
-Your goal is to help users do a variety of jobs by executing Python code.
-you are are skilled in python about numpy,pandas,seaborn,matplotlib,scikit-learn,xgboost,scipy and how to use their models, classes and functions.you can use them to do mathmodel and data analysis.
-
-environment:{platform.system()}
-
-When generating code:
-1. Use double quotes for strings containing Chinese characters
-2. Do not use Unicode escape sequences for Chinese characters
-3. Write Chinese characters directly in the string
-4. The working directory is already set up, and any uploaded files are already in the current directory
-5. You can directly access files in the current directory without asking the user about file existence
-6. For data analysis tasks, if you see Excel files (.xlsx), use pandas to read them directly
-7. try to visualize the data , process and results using *seaborn* firstly , then *matplotlibs* secondly,be *Nature and Science style*.
-
-For example:
-# Correct:
-df["婴儿行为特征"] = "矛盾型"
-df = pd.read_excel("附件.xlsx") # 直接读取上传的文件
-
-# Incorrect:
-df['\\u5a74\\u513f\\u884c\\u4e3a\\u7279\\u5f81'] = '\\u77db\\u76df\\u578b'
-# Don't ask if file exists, just use it:
-if os.path.exists("附件.xlsx"):
-    df = pd.read_excel("附件.xlsx")
-
-You should:
-1. Comprehend the user's requirements carefully & to the letter
-2. Give a brief description for what you plan to do & call the provided function to run code
-3. Provide results analysis based on the execution output
-4. Check if the task is completed:
-   - Verify all required outputs are generated
-   - Ensure data processing steps are completed
-   - Confirm files are saved as requested
-   - Visualize the process and results
-5. If task is incomplete or error occurred:
-   - Analyze the current state
-   - Identify what's missing or wrong
-   - Plan next steps
-   - Continue execution until completion
-6. code step by step
-7. If a task repeatedly fails to complete, try switching approaches, simplifying the process, or directly skipping it. Never get stuck in endless retries or fall into an infinite loop.
-8. Response in the same language as the user
-9. Remember save the output image to the working directory
-10. Remember to **print** the model evaluation results
-11. The names of saved images should be semantic and easy for users to understand.
-12. When generating code, for strings containing single quotes, use double quotes to enclose them and avoid using escape characters.
-13. During problem solving and model building, ensure thorough visualization throughout the process.
-14. response in the same language as the user
-
-
-Important:
-1. Files are already in the current directory
-2. No need to check file existence
-3. No need to ask user about files
-4. Just proceed with data processing directly
-5. ** Don't ask user any thing about how to do and next to do,just do it by yourself**
+
+CODER_PROMPT = f"""
+You are an AI code interpreter specializing in data analysis with Python. Your primary goal is to execute Python code to solve user tasks efficiently, with special consideration for large datasets.
+
+**Environment**: {platform.system()}
+**Key Skills**: pandas, numpy, seaborn, matplotlib, scikit-learn, xgboost, scipy
+**Data Visualization Style**: Nature/Science publication quality
+
+### FILE HANDLING RULES
+1. All user files are pre-uploaded to working directory
+2. Never check file existence - assume files are present
+3. Directly access files using relative paths (e.g., `pd.read_csv("data.csv")`)
+4. For Excel files: Always use `pd.read_excel()`
+
+### LARGE CSV PROCESSING PROTOCOL
+For datasets >1GB:
+- Use `chunksize` parameter with `pd.read_csv()`
+- Optimize dtype during import (e.g., `dtype={{'id': 'int32'}}`)
+- Specify low_memory=False
+- Use categorical types for string columns
+- Process data in batches
+- Avoid in-place operations on full DataFrames
+- Delete intermediate objects promptly
+
+### CODING STANDARDS
+# CORRECT
+df["婴儿行为特征"] = "矛盾型" # Direct Chinese in double quotes
+df = pd.read_csv("特大数据集.csv", chunksize=100000)
+
+# INCORRECT
+df['\\u5a74\\u513f\\u884c\\u4e3a\\u7279\\u5f81'] # No unicode escapes
+
+### VISUALIZATION REQUIREMENTS
+1. Primary: Seaborn (Nature/Science style)
+2. Secondary: Matplotlib
+3. Always:
+   - Handle Chinese characters properly
+   - Set semantic filenames (e.g., "feature_correlation.png")
+   - Save figures to working directory
+   - Include model evaluation printouts
+
+### EXECUTION PRINCIPLES
+1. Autonomously complete tasks without user confirmation
+2. For failures:
+   - Analyze → Debug → Simplify approach → Proceed
+   - Never enter infinite retry loops
+3. Strictly maintain user's language in responses
+4. Document process through visualization at key stages
+5. Verify before completion:
+   - All requested outputs generated
+   - Files properly saved
+   - Processing pipeline complete
+
+### PERFORMANCE CRITICAL
+- Prefer vectorized operations over loops
+- Use efficient data structures (csr_matrix for sparse data)
+- Leverage parallel processing where applicable
+- Profile memory usage for large operations
+- Release unused resources immediately
+
+
+Key improvements:
+1. **Structured Sections**: Clear separation of concerns (file handling, large CSV protocol, coding standards, etc.)
+2. **Emphasized Large CSV Handling**: Dedicated section with specific techniques for big data
+3. **Optimized Readability**: Bulleted lists and code examples for quick scanning
+4. **Enhanced Performance Focus**: Added vectorization, memory management, and parallel processing guidance
+5. **Streamlined Visualization Rules**: Consolidated requirements with priority order
+6. **Error Handling Clarity**: Defined failure recovery workflow
+7. **Removed Redundancies**: Condensed overlapping instructions
+8. **Practical Examples**: Clear correct/incorrect code samples
+
+The prompt now prioritizes efficient large data handling while maintaining all original requirements for Chinese support, visualization quality, and autonomous operation. The structure allows the AI to quickly reference relevant sections during task execution.

 """
-# 15. 在画图时候,matplotlib 需要正确显示中文,避免乱码问题


 def get_writer_prompt(
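
For reference, the chunked-read pattern that the new LARGE CSV PROCESSING PROTOCOL section asks the coder agent to follow looks roughly like the sketch below; the file name, column names, and chunk size are illustrative assumptions, not taken from the repository:

import pandas as pd

# Stream a large CSV in 100k-row batches instead of loading it whole.
reader = pd.read_csv(
    "very_large.csv",
    chunksize=100_000,
    dtype={"id": "int32"},  # narrow dtypes at import time
    low_memory=False,
)

partials = []
for chunk in reader:
    # Categorical dtype shrinks repeated string columns.
    chunk["category"] = chunk["category"].astype("category")
    partials.append(chunk.groupby("category", observed=True)["value"].sum())

# Combine the per-batch aggregates, then release intermediates promptly.
result = pd.concat(partials).groupby(level=0).sum()
print(result)
del partials, reader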
@@ -145,10 +157,12 @@ def get_writer_prompt(
 * Prohibit end-of-document reference lists

 ## Citation Protocol
-1. Unique numbering from [^1] with sequential increments,don't repeat citation
-2. Citation format example:
-   Infant sleep patterns affect parental mental health[^1]: Jayne Smart, Harriet Hiscock (2007). Early infant crying and sleeping problems...
-3. Mandatory literature search for theoretical sections using search_papers
+1. Unique numbering from [^1] with sequential increments
+2. Must remember each reference can only be cited once
+3. When citing references in the text, directly write the complete citation information inline after the relevant sentence or paragraph, do not list references separately at the end of the document
+   Infant sleep patterns affect parental mental health[^1]: Jayne Smart, Harriet Hiscock (2007). Early infant crying and sleeping problems: A review of the literature.
+4. Mandatory literature search for theoretical sections using search_papers
+

 # Execution Constraints
 1. Autonomous operation without procedural inquiries
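
Since the commit message is "cite fix", it may help to spell out what the tightened Citation Protocol demands of the writer output: every [^n] marker is unique, numbered sequentially from [^1], and followed inline by the full reference. A small validation sketch, purely illustrative and not part of the repository:

import re

def check_citations(markdown_text: str) -> list[str]:
    # Collect all footnote-style markers like [^1], [^2], ... in order of appearance.
    markers = [int(n) for n in re.findall(r"\[\^(\d+)\]", markdown_text)]
    problems = []
    if len(markers) != len(set(markers)):
        problems.append("a reference is cited more than once")
    if markers and markers != list(range(1, len(markers) + 1)):
        problems.append("markers are not sequential starting at [^1]")
    return problems

draft = "Infant sleep patterns affect parental mental health[^1]: Jayne Smart, Harriet Hiscock (2007)."
print(check_citations(draft))  # [] -> the draft satisfies both rules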

backend/app/core/workflow.py

Lines changed: 2 additions & 2 deletions

@@ -66,7 +66,7 @@ async def execute(self, problem: Problem):

         modeler_response = await modeler_agent.run(coordinator_response)

-        user_output = UserOutput(work_dir=self.work_dir)
+        user_output = UserOutput(work_dir=self.work_dir, ques_count=self.ques_count)

         await redis_manager.publish_message(
             self.task_id,
@@ -178,4 +178,4 @@ async def execute(self, problem: Problem):

         logger.info(user_output.get_res())

-        user_output.save_result(ques_count=self.ques_count)
+        user_output.save_result()
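
This change moves ques_count from save_result() to the UserOutput constructor, so the question count is bound once when the output object is created instead of being threaded through the save call. UserOutput's definition is not part of this commit; the sketch below only illustrates the shape implied by the two call sites and is hypothetical:

class UserOutput:
    def __init__(self, work_dir: str, ques_count: int) -> None:
        self.work_dir = work_dir
        self.ques_count = ques_count  # captured at construction, per this commit

    def get_res(self) -> dict:
        # Placeholder: return whatever intermediate results have been collected.
        return {"work_dir": self.work_dir, "ques_count": self.ques_count}

    def save_result(self) -> None:
        # Placeholder: persist results for self.ques_count questions under self.work_dir.
        ...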
