Docs/limit context window 1304 #1334

Open
Sdsai0311 wants to merge 5 commits into AntonOsika:main from Sdsai0311:docs/limit-context-window-1304 (base: main)

Changes from all commits (5 commits)

- 5c1162c engineer (Sdsai0311)
- 8ca6d13 docs: fix quickstart typos and links (Sdsai0311)
- 2c5bc8d docs: fix example link in open_llms README (Sdsai0311)
- 115ba28 docs: update README to simplified project overview (Sdsai0311)
- 218b041 docs: update docs and README (Sdsai0311)

# Context window (token limit)

This note explains what a context window (token limit) is, why it matters when using LLMs, and practical strategies to work within it.

## What is the context window?

A model's context window (also called its token limit) is the maximum number of tokens the model can accept as input (and, for many APIs, input plus generated output combined). Tokens roughly correspond to pieces of words; common English text averages roughly 1–1.3 tokens per word, depending on vocabulary and punctuation.
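
As a rough illustration, token counts can be measured with a tokenizer library such as `tiktoken` (the encoding name below is an assumption; use whichever tokenizer matches your model):

```
# Minimal sketch: counting tokens with tiktoken (pip install tiktoken).
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # assumed encoding; model-dependent

def token_count(text: str) -> int:
    """Number of tokens `text` occupies under the chosen encoding."""
    return len(encoding.encode(text))

def total_tokens(messages) -> int:
    """Total tokens across a list of message strings."""
    return sum(token_count(m) for m in messages)

print(token_count("The context window limits how much text the model can read at once."))
```

The helpers `token_count` and `total_tokens` are reused by the sketches later in this note.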

If your prompt + conversation + document history exceed the context window, older content will be truncated (dropped) or the model will return an error, depending on the client.

## Why it matters

- Cost: many API providers bill per token, so sending more tokens increases cost.
- Performance: larger inputs increase latency and can require more memory on the client and server side.
- Truncation / information loss: when the context exceeds the limit, parts of the history or documents are omitted, which can break coherence, degrade reasoning, or cause the model to lose earlier instructions or facts.

## Practical strategies

Below are three pragmatic strategies to manage content so it fits the context window while preserving useful information.

### 1) Truncation (simple, predictable)

When the total token count is too large, drop old or less important content. This is easy, predictable, and safe for streaming and long chats. Use heuristics to drop older messages or large blobs (images, raw code) first.

Pros: simple, low compute overhead.
Cons: may drop crucial earlier context.

Conceptual sketch (Python-style; `token_count` is the helper from the example above):

```
def build_payload(history, new_message, system_prompt, max_tokens):
    # Reserve room for the system prompt and the newest message first.
    budget = max_tokens - token_count(system_prompt) - token_count(new_message)
    kept = []
    for msg in reversed(history):      # start from the most recent message
        if token_count(msg) > budget:
            break
        kept.insert(0, msg)            # re-insert in chronological order
        budget -= token_count(msg)
    return [system_prompt] + kept + [new_message]
```

Tips:
- Keep a sliding window of the most recent N messages.
- Prefer to keep the system instructions and the most recent user/assistant turn.
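
For example, a simple sliding window of the most recent messages can be kept with `collections.deque` (the window size here is illustrative):

```
from collections import deque

# Keep at most the 20 most recent messages; older ones fall off automatically.
window = deque(maxlen=20)

def remember(message):
    window.append(message)

def recent_messages():
    return list(window)
```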

### 2) Summarization / compaction (preserve meaning)

Compress older content into a shorter summary that preserves the important facts. Periodically summarize the conversation or documents and store the summary in place of the raw items. This preserves context at a lower token cost.

Pros: maintains semantic information; better for long-running sessions.
Cons: requires extra API calls or compute for summarization, and careful prompt engineering to avoid losing critical specifics.

Conceptual sketch (Python-style; the chunking and summarization helpers are placeholders):

```
def compact_history(history, summary_threshold):
    # Fold the oldest messages into a short summary whenever the raw
    # history grows past the threshold.
    if total_tokens(history) > summary_threshold:
        chunk = select_oldest_chunk(history)        # oldest slice of the history
        summary = call_model_summarize(chunk)       # one extra model call
        history = history[len(chunk):]              # drop the summarized messages
        history.insert(0, summary_marker(summary))  # keep a compact marker instead
    return history

# Then build the payload as in truncation, prioritizing summaries + recent messages.
```

Implementation notes:
- Use structured summaries when possible: facts, entities, decisions, open tasks.
- Keep both a human-readable summary and a small machine-friendly key-value store for retrieval.
- Re-summarize incrementally: each time you summarize, append to the existing summary rather than re-summarizing everything from scratch.
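
A structured summary record might look like the following (the field names and values are illustrative, not a required schema):

```
# Hypothetical structured summary kept alongside the prose summary.
summary_record = {
    "facts": ["User deploys on a serverless platform", "Budget is 10k tokens per request"],
    "entities": ["gpt-engineer", "OpenAI API"],
    "decisions": ["Use the hybrid context strategy"],
    "open_tasks": ["Add token-usage metrics to the dashboard"],
}
```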

### 3) Configuration option (developer-facing control)

Expose a configuration option to tune how the system behaves when approaching the token limit. Example knobs:

- max_context_tokens: hard limit used when composing payloads.
- strategy: one of ["truncate", "summarize", "hybrid"].
- preserve_system_prompts: boolean; always keep system prompts.
- preserve_recent_turns: N recent user/assistant turns to always keep.

This lets users choose tradeoffs appropriate to their use case (cost vs. fidelity).

Example configuration object (Python dict; the values are illustrative defaults):

```
config = {
    "max_context_tokens": 32000,
    "strategy": "hybrid",            # "truncate", "summarize", or "hybrid"
    "preserve_system_prompts": True,
    "preserve_recent_turns": 6,
    "summary_chunk_size": 4000,      # tokens per summarization chunk
}
```

Hybrid strategy: try to include as much recent raw context as possible, then include summaries of older content, and finally truncate if still necessary.
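
A sketch of how the `strategy` knob could select between the approaches in this note (all function names are the illustrative ones defined above and below, not an existing API):

```
def assemble_context(history, new_message, system_prompt, config):
    # Dispatch on the configured strategy; each branch reuses an earlier sketch.
    if config["strategy"] == "truncate":
        return build_payload(history, new_message, system_prompt,
                             config["max_context_tokens"])
    if config["strategy"] == "summarize":
        # Compact the raw history until it fits, then compose the payload.
        compacted = compact_history(history, config["max_context_tokens"])
        return build_payload(compacted, new_message, system_prompt,
                             config["max_context_tokens"])
    # "hybrid": recent raw turns + summaries of older content, then truncate.
    return prepare_context(history, new_message, system_prompt, config)
```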

## Pseudocode: hybrid end-to-end

```
def prepare_context(history, new_message, system_prompt, config):
    budget = config["max_context_tokens"]

    # Step 1: always reserve room for the system prompt and the newest message.
    budget -= token_count(system_prompt) + token_count(new_message)

    # Step 2: keep as many recent raw turns as fit, newest first.
    recent = []
    for msg in reversed(history.recent(config["preserve_recent_turns"])):
        if token_count(msg) > budget:
            break
        recent.insert(0, msg)              # restore chronological order
        budget -= token_count(msg)

    # Step 3: add summaries of older content while budget remains.
    summaries = []
    for chunk in chunked(history.older_than_recent(), config["summary_chunk_size"]):
        summary = get_or_create_summary(chunk)
        if token_count(summary) > budget:
            break
        summaries.append(summary)
        budget -= token_count(summary)

    # Final order: system prompt, summaries of old context, recent turns, new message.
    payload = [system_prompt] + summaries + recent + [new_message]

    # Step 4: if something still does not fit, drop the least-important items.
    if total_tokens(payload) > config["max_context_tokens"]:
        payload = truncate_least_important(payload, config["max_context_tokens"])
    return payload
```

## Troubleshooting notes & edge cases

- "Off-by-one" token errors: different tokenizers or APIs may count tokens differently. Always leave a safety buffer (e.g., 32–256 tokens) when computing allowed tokens for model input + expected output.

- Unexpected truncation of system messages: ensure system prompts are treated as highest priority and pinned into the payload.

- Cost spikes when summarizing: summarization itself consumes tokens (both input and output), so amortize summarization by doing it infrequently or offline when possible.

- Losing exact data (e.g., code or long tables): summaries can lose exact formatting or specifics. For cases where exactness matters, keep the original as a downloadable artifact and include a short index or pointer in the summary.

- Very long single documents: chunk documents into logical sections and summarize each section, or use retrieval (vector DB) plus short, relevant context injection instead of sending the whole document.
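
A minimal token-based chunking sketch (reusing the `encoding` object from the earlier tiktoken example; real chunking would usually respect section boundaries):

```
def chunk_by_tokens(text, chunk_size):
    # Split a long document into fixed-size token chunks.
    tokens = encoding.encode(text)
    return [encoding.decode(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]
```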

- Multi-user / parallel sessions: keep per-session histories and shared summaries carefully namespaced to avoid mixing users' contexts.

## Additional suggestions

- Instrument token usage and provide metrics to users (tokens per request, cost per request, average history length). This helps tune thresholds.
- Provide a debugging mode that prints the token counts and what was dropped or summarized before each request.
- When integrating with retrieval (vector DBs), index long documents and retrieve only the most relevant chunks to inject into prompts rather than pushing entire documents.
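
The debugging mode could be as simple as logging a short report before each request (a sketch; `total_tokens` is the helper from the first example):

```
def log_context_report(payload, dropped, summarized):
    # Print what the request will actually contain and what was left out.
    print(f"payload: {len(payload)} items, {total_tokens(payload)} tokens")
    print(f"dropped: {len(dropped)} messages ({total_tokens(dropped)} tokens)")
    print(f"summarized: {len(summarized)} chunks")
```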

## References and further reading

- Tokenization, and how tokens map to words, depends on the model's tokenizer (BPE, byte-level BPE, etc.).
- For long-running agents, consider combining summarization with retrieval-augmented generation (RAG) patterns.