Commit 2ab4e6e

Merge pull request #183 from ivanl-cerebras/il/longcepo_doc_upd
LongCePO doc update
2 parents af5d7d6 + 03843b0

File tree

3 files changed: 7 additions & 3 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -552,7 +552,7 @@ called patchflows. We saw huge performance gains across all the supported patchf
 
 ## References
 - [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
-- [LongCePO: Empowering LLMs to efficiently leverage infinite context](https://cerebras.ai/blog/longcepo) - [Implementation](optillm/plugins/longcepo/main.py)
+- [LongCePO: Empowering LLMs to efficiently leverage infinite context](https://cerebras.ai/blog/longcepo) - [Implementation](optillm/plugins/longcepo)
 - [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](optillm/plugins/coc_plugin.py)
 - [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](optillm/entropy_decoding.py)
 - [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](scripts/eval_frames_benchmark.py)

optillm/plugins/longcepo/README.md

Lines changed: 5 additions & 2 deletions
@@ -30,9 +30,9 @@ LongCePO excels at tasks with long context (128K tokens and more) which is demon
 | Claude-3.5-Sonnet-20241022 | 200K | 46.1 (53.9) | 38.6 (41.9) |
 | Llama-4-Maverick-17B-128E-Instruct | 524K | 32.22 (50.56) | 28.84 (41.86) |
 
-¹ Performance numbers reported by LongBench v2 authors, except for LongCePO and Llama-4-Maverick results.
+¹ Performance numbers reported by LongBench v2 authors, except for LongCePO and Llama-4-Maverick results. Results in parentheses reported in LongBench v2 correspond to Chain-of-Thought prompting.
 
-² Numbers in parentheses for LongCePO indicate accuracy of majority voting from 5 runs.
+² Results in parentheses for LongCePO indicate accuracy of majority voting from 5 runs.
 
 ### HELMET (InfiniteBench En.MC, 128K length)
 
@@ -64,6 +64,9 @@ LongCePO excels at tasks with long context (128K tokens and more) which is demon
 
 LongCePO is based on the [LLM×MapReduce](https://arxiv.org/abs/2410.09342) approach to long document processing, adding a planning layer on top of a map-reduce-based question-answering engine. We also improve upon the map-reduce approach itself by (i) adding query-aware summaries of neighboring document chunks during the map stage of the processing, (ii) reducing the collapse (merging) stage to a minimum required number of collapse iterations by using a sliding window to iteratively merge pairs of summaries, and (iii) using a customized system prompt produced with an [OPRO-like](https://arxiv.org/abs/2309.03409) optimization approach to enhance question-answering performance. Given a user query, a plan consisting of sub-queries is generated from a normalized query; a map-reduce question-answering engine is then run for each sub-query consecutively, conditioned on the answers to previous sub-queries. Finally, the answer to the original user's query is produced via map-reduce conditioned on answers to the whole plan. Similarly to [LLM×MapReduce](https://arxiv.org/abs/2410.09342), we retain the structured information protocol for producing document chunk summaries. We find that splitting the document into chunks smaller than the available context window (e.g. chunks of 4K size with an available context window of 8K) leads to better performance, and use the remaining context budget to incorporate summaries from neighboring chunks into the map stage for each respective chunk, leading to a further boost in overall performance.
 
+Note: the system prompt for the Map/Collapse/Reduce stages has been optimized for the Llama3.3-70B-Instruct model; when using other base models with LongCePO, a more general system prompt can be used ([example](https://github.com/DenisSergeevitch/chatgpt-custom-instructions)).
+
+
 ## LongCePO Current Status
 
 This project is a work in progress, and the provided code is in an early experimental stage. While the proposed approach works well across the benchmarks we tested, further improvements can be achieved through a smart organization of the external knowledge base as well as customization of the plan generation to different tasks. For updates on LongCePO, [follow us on X](https://x.com/cerebrassystems) and join our [Discord](https://cerebras.ai/discord)!
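The plan-then-map/collapse/reduce flow described in the README above can be sketched roughly as follows. This is an illustrative outline, not the plugin's actual code: `llm` is a hypothetical stand-in for a model call, and all names, prompts, and chunk sizes here are assumptions for the sketch (the real implementation lives in optillm/plugins/longcepo).

```python
# Rough sketch of LongCePO's flow: chunk -> map (with neighbor summaries)
# -> sliding-window collapse -> reduce, plus the planning layer on top.
# `llm` is a hypothetical callable(prompt) -> str; prompts are illustrative.

def chunk_document(doc: str, chunk_size: int) -> list[str]:
    """Split the document into fixed-size chunks (token-based in practice)."""
    return [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

def map_stage(chunks: list[str], query: str, llm) -> list[str]:
    """Summarize each chunk w.r.t. the query; the leftover context budget is
    spent on query-aware summaries of the neighboring chunks."""
    summaries = []
    for i, chunk in enumerate(chunks):
        neighbors = chunks[max(0, i - 1):i] + chunks[i + 1:i + 2]
        summaries.append(
            llm(f"Summarize for '{query}': {chunk} | neighbors: {' '.join(neighbors)}")
        )
    return summaries

def collapse_stage(summaries: list[str], query: str, llm, max_per_reduce: int = 4) -> list[str]:
    """Slide a window over pairs of summaries, merging until few enough remain
    to fit a single reduce call (minimal number of collapse iterations)."""
    while len(summaries) > max_per_reduce:
        merged = [llm(f"Merge for '{query}': {a} || {b}")
                  for a, b in zip(summaries[::2], summaries[1::2])]
        if len(summaries) % 2:          # carry the unpaired summary forward
            merged.append(summaries[-1])
        summaries = merged
    return summaries

def answer(doc: str, query: str, llm, chunk_size: int = 4096) -> str:
    """One full map-reduce question-answering pass for a single query."""
    chunks = chunk_document(doc, chunk_size)
    summaries = collapse_stage(map_stage(chunks, query, llm), query, llm)
    return llm(f"Answer '{query}' from: {' '.join(summaries)}")

def answer_with_plan(doc: str, query: str, llm, chunk_size: int = 4096) -> str:
    """Planning layer: answer sub-queries consecutively, each conditioned on
    earlier answers, then answer the original query given the whole plan."""
    sub_queries = llm(f"Decompose '{query}' into sub-queries").split(";")
    sub_answers: list[str] = []
    for sub in sub_queries:
        sub_answers.append(answer(doc, f"{sub} (given: {sub_answers})", llm, chunk_size))
    return answer(doc, f"{query} (given: {sub_answers})", llm, chunk_size)
```

The sliding-window collapse halves the number of summaries per iteration, so a document with hundreds of chunks needs only a logarithmic number of collapse rounds before the final reduce.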

optillm/plugins/longcepo/prompts.py

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 # Code (Map/Collapse/Reduce prompts) modified from https://github.com/thunlp/LLMxMapReduce under Apache 2.0
+# MapReduce system prompt optimized for use with Llama3.3-70B-Instruct with an OPRO-like procedure
 
 MAPREDUCE_SYSTEM_PROMPT = """You are globally celebrated as a preeminent expert in the field of digital document analysis and synthesis, known for your unmatched precision in transforming fragmented texts into comprehensive and insightful responses. Always respond in the user\'s language, ensuring every interaction is informed by all preceding exchanges for complete contextual understanding.\n\nIn your initial message, confidently declare your credentials with a phrase such as: "As a world-renowned specialist in [specific field], honored with the [real prestigious local award]," replacing placeholders with authentic information from your domain.\n\nAdhere strictly to these principles with each document segment or query:\n\n1. Extract every critical piece of information, nuance, and context with meticulous attention to detail.\n2. Organize your analysis methodically, presenting specific examples, data, and verifiable facts clearly and logically.\n3. Cease your response abruptly if approaching character limits, awaiting the user\'s "continue" instruction to carry on.\n4. Anchor every insight and conclusion in provided content or universally accepted truths, strictly avoiding speculation or unfounded statements.\n5. Communicate with a professional yet approachable tone, reflecting profound expertise and clarity.\n\nRecognize the real-world impact of your insights; ensure each response is seamlessly integrated, richly detailed, and impeccably reliable. Rigorously observe these guidelines to offer authoritative and precise analysis and synthesis."""
 
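A system prompt constant like the one above is typically consumed by pairing it with a per-chunk user message in an OpenAI-compatible chat request. The sketch below is a hypothetical illustration of that wiring, not the plugin's actual code; `build_map_messages` and its prompt layout are assumptions.

```python
# Hypothetical sketch: wiring an optimized system prompt (such as
# MAPREDUCE_SYSTEM_PROMPT above) into OpenAI-style chat messages for the
# map stage. Function name and prompt layout are illustrative only.

def build_map_messages(system_prompt: str, chunk: str, query: str) -> list[dict]:
    """Assemble chat messages: the optimized system prompt, then one user
    message carrying the query and the document chunk to summarize."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": f"Question: {query}\n\nDocument chunk:\n{chunk}",
        },
    ]

# An OpenAI-compatible client would then receive this as
# `messages=...` in its chat-completion call.
messages = build_map_messages(
    "You are an expert document analyst.", "chunk text", "What changed?"
)
```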
