- I. Project Overview
- II. Architecture / Design
- III. Prerequisites
- IV. Installation / Setup
- V. Usage
- VI. Infrastructure
- VII. Configuration
- VIII. Project Structure
- IX. Limitations / Assumptions
README Generator is an AI-powered CLI tool that automatically generates comprehensive README.md files for codebases. The tool analyzes a project's repository structure, source code, and Git history to produce well-structured, accurate documentation without requiring prior knowledge of the project.
The tool is designed for developers and technical teams who want to:
- Automatically generate standardized README files for their projects
- Update existing documentation based on code changes (incremental update mode)
- Ensure documentation accuracy by deriving information directly from code
- Maintain consistent documentation structure across multiple repositories
- Reduce manual documentation effort
The README Generator is built as a Python-based AI agent system with the following components:
- AI Agent (Strands Framework)
  - Uses AWS Bedrock with Claude Sonnet 4.5 as the inference model
  - Configured with extended read timeout (180 seconds) for large codebases
  - Equipped with custom tools for repository exploration and file manipulation
  - Maintains conversation state for interactive chat mode
- Custom Tools
  - `get_tree`: Recursively explores directory structure with configurable depth
  - `write_readme_file`: Writes generated content to README.md at the project root
  - `file_read`: Reads and analyzes source files (provided by strands-agents-tools)
- Git Integration
  - Detects changes since the last README.md update using Git history
  - Generates diff output to focus analysis on modified files
  - Enables incremental documentation updates rather than full regeneration
- Security Layer
  - Path validation ensures the agent can only access files within the specified root directory
  - Prevents directory traversal attacks
- Session Management
  - File-based session persistence for conversation history
  - Enables interactive chat mode for iterative refinement
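The path validation described under Security Layer can be illustrated with a minimal sketch. The function name `is_within_root` is hypothetical and is not taken from the project's source; the actual check in `main.py` may be implemented differently.

```python
from pathlib import Path


def is_within_root(root: str, candidate: str) -> bool:
    """Return True when `candidate` resolves to a location inside `root`.

    Hypothetical helper: illustrates the idea of path validation only.
    """
    root_path = Path(root).resolve()
    # Join onto the root and resolve, so "../" segments and symlinks are
    # collapsed before the comparison. An absolute candidate replaces the
    # root entirely when joined, and is therefore rejected below unless it
    # already points inside the root.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents
```

Comparing resolved path components (via `parents`) rather than string prefixes avoids accepting a sibling directory such as `/tmp/project-other` when the root is `/tmp/project`.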
- User invokes CLI with project path and project name
- Agent retrieves AWS Bedrock inference profile by name pattern (`{project_name}_{domain_name}`)
- If named profile is not found, falls back to global Claude Sonnet 4.5 profile
- System prompt is constructed from templates and optional organizational context
- Git diff analysis identifies changes since last README update (if applicable)
- Agent explores repository structure using `get_tree`
- Agent reads relevant files to understand the project
- Agent generates or updates README.md based on analysis
- (Optional) User can enter chat mode to iteratively refine the documentation
The tool implements an incremental update strategy:
- It applies when a README.md already exists and the project is a Git repository
- The tool extracts the diff of all changes since the README was last modified
- The AI agent focuses its analysis on changed files rather than re-analyzing the entire codebase
- This improves performance and reduces API costs for large repositories
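One way to obtain "the diff since the README was last modified" is sketched below. This is an assumption about the approach, not the project's actual code: the function names and the injectable `run` parameter are hypothetical, and the real `get_git_diff_since_readme_update` may use different Git commands.

```python
import subprocess
from typing import Callable, List, Optional

# A runner takes git arguments and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_git(args: List[str]) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def diff_since_readme_update(
    readme: str = "README.md", run: Runner = run_git
) -> Optional[str]:
    """Return the diff between the last commit touching `readme` and HEAD."""
    # Hash of the most recent commit that modified the README (empty if none).
    last_commit = run(["log", "-1", "--format=%H", "--", readme])
    if not last_commit:
        return None  # README never committed: a full analysis is needed
    return run(["diff", f"{last_commit}..HEAD"])
```

Passing the runner as a parameter keeps the Git calls replaceable, which makes the logic easy to exercise without a real repository.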
- Python: 3.13 or higher
- AWS Account: With access to AWS Bedrock
- AWS Credentials: Properly configured on the local machine (via `~/.aws/credentials` or environment variables)
- Poetry: For dependency management
- Terraform: 1.0+ (for infrastructure deployment)
The executing user/role must have permissions to:
- Call AWS Bedrock inference profiles (`bedrock:InvokeModel`)
- List AWS Bedrock inference profiles (`bedrock:ListInferenceProfiles`)
Before using the tool, an AWS Bedrock inference profile must be deployed via Terraform (see Infrastructure section). If the named profile is not found, the tool will attempt to use a default global Claude Sonnet 4.5 profile.

    git clone <repository-url>
    cd readme-generator

Navigate to the code directory and install Python dependencies using Poetry:

    cd code
    poetry install

Ensure AWS credentials are configured:

    aws configure

Or set environment variables:

    export AWS_ACCESS_KEY_ID="your-access-key"
    export AWS_SECRET_ACCESS_KEY="your-secret-key"
    export AWS_DEFAULT_REGION="eu-west-1"

Deploy the AWS Bedrock inference profile:

    cd ../iac
    terraform init -backend-config="bucket=<your-s3-bucket>" \
      -backend-config="dynamodb_table=<your-dynamodb-table>"
    terraform apply -var="project_name=<your-project-name>" \
      -var="git_repository=<repository-url>"

The inference profile name will be: `{project_name}_readme_generator`
Generate a README.md for the current directory:

    poetry run readme_generator -p <project-name>

Generate a README.md for a specific path:

    poetry run readme_generator -p <project-name> -r /path/to/project

Enable chat mode to iteratively refine the generated README:

    poetry run readme_generator -p <project-name> -r /path/to/project --chat-mode

In chat mode:
- The tool generates an initial README.md
- You can provide feedback and request modifications
- Type `exit` to finish
The README Generator supports two methods for providing additional context to guide the documentation generation process.
Use `--additional-context-file-path` to provide a file containing organizational or project-specific context:

    poetry run readme_generator -p <project-name> -r /path/to/project \
      --additional-context-file-path /path/to/organizational-context.md

Use cases for context files:
- Organizational Standards: Define company-wide conventions, naming patterns, infrastructure practices, or deployment workflows
- Technology Stack Context: Specify internal frameworks, libraries, or tools used across multiple projects
- Documentation Standards: Enforce specific documentation styles, required sections, or terminology
- Cloud & Infrastructure Conventions: Document AWS account structures, resource naming conventions, tagging policies, or FinOps practices
- Security & Compliance: Include security guidelines, compliance requirements, or access control patterns
Example context file (`organizational-context.md`):

    # Company XYZ Technical Context

    ## Infrastructure Conventions
    - All projects use AWS in eu-west-1 region
    - Resource naming: {project}_{domain}_{stage}_{resource}
    - All resources must have cost allocation tags

    ## Deployment
    - GitLab CI/CD is the standard platform
    - Terraform manages all infrastructure
    - Backend state stored in S3 with DynamoDB locking

    ## Technology Stack
    - Python projects use Poetry for dependency management
    - All APIs follow OpenAPI 3.0 specification
    - Monitoring uses CloudWatch and DataDog

This context will be injected into the AI agent's system prompt, ensuring generated documentation reflects organizational practices and conventions.
For simple, one-off context additions, use `-c` or `--additional-context-string`:

    poetry run readme_generator -p <project-name> -r /path/to/project \
      -c "This is a legacy project migrated from Python 2.7 to Python 3.13"

| Option | Required | Description |
|---|---|---|
| `-p`, `--project-name` | Yes | AWS project name (used to locate Bedrock inference profile) |
| `-r`, `--root-path` | No | Root path of the project to document (defaults to current directory) |
| `--chat-mode` | No | Enable interactive chat mode for README refinement |
| `--additional-context-file-path` | No | Path to file containing additional context for the AI (e.g., organizational conventions) |
| `-c`, `--additional-context-string` | No | Additional context provided as a string (for quick additions) |
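For readers who want to see how the options in the table fit together, here is a minimal sketch of an equivalent parser. It uses the standard library's `argparse` to stay self-contained; the tool itself is built on Click, so this is an illustration of the option set and not the project's implementation. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="readme_generator")
    parser.add_argument(
        "-p", "--project-name", required=True,
        help="AWS project name (used to locate the Bedrock inference profile)",
    )
    parser.add_argument(
        "-r", "--root-path", default=".",
        help="Root path of the project to document",
    )
    parser.add_argument(
        "--chat-mode", action="store_true",
        help="Enable interactive chat mode for README refinement",
    )
    parser.add_argument(
        "--additional-context-file-path", default=None,
        help="Path to a file containing additional context",
    )
    parser.add_argument(
        "-c", "--additional-context-string", default=None,
        help="Additional context provided as a string",
    )
    return parser
```

Only `--project-name` is required; every other option falls back to a default, which matches the "Required" column above.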
The infrastructure is managed with Terraform and deploys an AWS Bedrock inference profile.
File: `iac/bedrock_inference_profile.tf`
- `aws_bedrock_inference_profile.main`: Creates a Bedrock inference profile
  - Name pattern: `{project_name}_readme_generator`
  - Model: Claude Sonnet 4.5 (`global.anthropic.claude-sonnet-4-5-20250929-v1:0`)
  - Uses global inference profile for cross-region availability

| Variable | Description | Required |
|---|---|---|
| `project_name` | Name of the project (used for resource naming) | Yes |
| `git_repository` | Git repository URL (used for tagging) | Yes |
| `role_to_assume_arn` | ARN of IAM role to assume for deployment | No |
The project uses GitLab CI/CD for automated deployment:
- CI/CD Configuration: `.gitlab-ci.yml`
- Shared Templates: Includes reusable templates from `erwan.simon/devops-platform-ci-templates` (v2.0.2)
- Pipeline Stages: init, format, security, deploy, release, mirror_to_github
- Environment Selection: Derived from Git branch name (`$CI_COMMIT_REF_SLUG`)
- Project Variables:
  - `PROJECT_NAME`: poc
  - `DOMAIN_NAME`: readme_generator
  - `STAGE_NAME`: Automatically set from branch name
For local Terraform execution:
- Initialize Terraform with backend configuration:

      terraform init -backend-config="bucket=<s3-bucket>" \
        -backend-config="dynamodb_table=<dynamodb-table>"

- Create or select Terraform workspace (controls environment):

      # Create new environment workspace
      terraform workspace new prod
      # Or select existing workspace
      terraform workspace select prod

  Note: If no workspace is created, Terraform uses the `default` workspace, resulting in `stage_name=default`.

- Apply Terraform configuration:

      terraform apply -var="project_name=poc" \
        -var="git_repository=https://gitlab.com/your/repo"

- Verify AWS credentials target the correct account:

      aws sts get-caller-identity

- Backend Type: S3
- State File Key: `readme_generator.tfstate`
- Region: `eu-west-1`
- Encryption: Enabled
Backend configuration is provided at runtime (not hardcoded in Terraform files), following organizational conventions.
All AWS resources are tagged with:
- `Appli`: Project name
- `Component`: `readme_generator`
- `git_repository`: Source repository URL
These tags support cost allocation and FinOps tracking.
The tool does not require environment variables for basic operation, but relies on standard AWS SDK credential resolution:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_DEFAULT_REGION` (defaults to `eu-west-1` in Terraform)
File: `code/readme_generator/system_prompt.txt`
Defines the AI agent's behavior, analysis guidelines, and README structure requirements. This file is loaded at runtime and combined with the README template.
Key instructions include:
- Agent role and constraints
- Repository analysis methodology
- Change-based update mode: Instructions for incremental updates based on Git diff
- Context window safety: Adaptive exploration strategy to prevent token overflow
- Documentation neutrality rule: Ensures README represents current state without mentioning changes or versions
- README content requirements
- Organizational context awareness
- Output behavior and feedback loop handling
File: `code/readme_generator/readme_example.md`
Defines the expected structure and sections for generated README files.
File: `code/pyproject.toml`
- Package Name: `readme_generator`
- Version: 0.5.1
- Python Version: ^3.13
- Entry Point: `readme_generator` command mapped to `readme_generator.main:command_line_main`
The tool looks up the Bedrock inference profile using the pattern:
{project_name}_{domain_name}
Where:
- `project_name`: Provided via `-p` CLI option
- `domain_name`: Fixed to `readme_generator`

Example: `-p poc` resolves to inference profile `poc_readme_generator`
If the named profile is not found, the tool falls back to:
arn:aws:bedrock:{region}:{account}:inference-profile/global.anthropic.claude-sonnet-4-5-20250929-v1:0
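The lookup and fallback described above can be sketched as a pure function. This is a simplified illustration under stated assumptions: the function name `resolve_profile_arn` is hypothetical, and the dictionary keys `inferenceProfileName` and `inferenceProfileArn` are assumed to match the summaries returned by Bedrock's `ListInferenceProfiles` call. The project's actual resolution code may differ.

```python
from typing import Iterable, Mapping

DOMAIN_NAME = "readme_generator"  # fixed domain name, per the pattern above
FALLBACK_PROFILE_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"


def resolve_profile_arn(
    project_name: str,
    profiles: Iterable[Mapping[str, str]],
    region: str,
    account: str,
) -> str:
    """Return the ARN of the named profile, or the global fallback ARN."""
    wanted = f"{project_name}_{DOMAIN_NAME}"
    for profile in profiles:
        # Key names are an assumption about the API response shape.
        if profile.get("inferenceProfileName") == wanted:
            return profile["inferenceProfileArn"]
    # No matching profile: build the documented fallback ARN.
    return (
        f"arn:aws:bedrock:{region}:{account}:"
        f"inference-profile/{FALLBACK_PROFILE_ID}"
    )
```

Keeping the resolution separate from the AWS call means the same logic can be checked against a plain list of dictionaries, without credentials.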
The Bedrock client is configured with:
- Read Timeout: 180 seconds (to accommodate large repository analysis)
- Model: Claude Sonnet 4.5 via inference profile
- Session Management: File-based persistence for conversation history
The README Generator constructs the AI agent's system prompt by combining multiple sources in the following order:
The base instructions (`system_prompt.txt`) are the foundation of the agent's instructions, defining:
- The agent's role as a senior software engineer and technical writer
- Analysis methodology and constraints
- Required README sections and structure
- Output format and tone guidelines
- Security constraints (e.g., respecting `.gitignore`, file access boundaries)
- Organizational context awareness: Explicit instruction that "Organizational context is authoritative unless explicitly contradicted by the repository"
- Change-based update mode: Instructions for using Git diff to focus on modified files
- Documentation neutrality rule: Prohibition of version references or "new/added/removed" language
- Context window safety: Adaptive directory-by-directory exploration strategy
The README template (`readme_example.md`) is appended to the system prompt to provide a structural template with:
- Standard section headings and hierarchy
- Table of contents format
- Markdown conventions
The organizational context file is injected via `--additional-context-file-path`. This is where you can provide:
- Company-wide technical conventions
- Infrastructure and deployment standards
- Naming conventions and tagging policies
- Technology stack preferences
- Compliance and security requirements
- CI/CD platform and execution model
- Cloud provider conventions and region preferences
This ensures the AI agent interprets repositories through the lens of your organization's specific practices, producing documentation that aligns with internal standards.
Any additional context provided via `--additional-context-string` is appended:

    system_prompt += "\nFinally, the user gave you this sentence as additional context:" + user_string

If the repository is a Git repository and a README.md exists, the tool automatically appends:

    system_prompt += "\n\nHere is the diff list:\n" + str(changes_list)

This enables the agent to focus on changed files and perform incremental updates.
Final System Prompt = Base Instructions (with org context awareness)
+ README Template
+ [Organizational Context File]
+ [User Context String]
+ [Git Diff Since Last README Update]
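The composition order above can be sketched as a single function. The two literal strings for the user context and the diff list are taken from the snippets in this section; the function name `build_system_prompt`, its parameters, and the newline separators between the base instructions, template, and organizational context are assumptions for illustration.

```python
from typing import Optional, Sequence


def build_system_prompt(
    base: str,
    template: str,
    org_context: Optional[str] = None,
    user_string: Optional[str] = None,
    changes_list: Optional[Sequence[str]] = None,
) -> str:
    """Assemble the prompt layers in the documented order."""
    # Separators between the first three layers are an assumption.
    prompt = base + "\n" + template
    if org_context:
        prompt += "\n" + org_context
    if user_string:
        prompt += (
            "\nFinally, the user gave you this sentence as additional context:"
            + user_string
        )
    if changes_list:
        prompt += "\n\nHere is the diff list:\n" + str(changes_list)
    return prompt
```

Optional layers are skipped when absent, so a first run on a repository without a README yields only the base instructions and the template.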
This layered approach allows for:
- Consistency: Base prompt ensures standard behavior across all runs
- Organizational Alignment: System prompt explicitly prioritizes organizational context
- Customization: Organizational context adapts the tool to your environment
- Flexibility: User context string enables quick, one-off adjustments
- Efficiency: Git diff enables incremental updates for large repositories
- Keep it factual: Provide objective information about conventions, not preferences
- Be specific: Include concrete examples of naming patterns, resource structures, etc.
- Document CI/CD and deployment: Specify which platform is used (GitLab CI, GitHub Actions, etc.) and how environments are selected
- Include infrastructure conventions: Cloud provider, region, Terraform backend patterns, workspace usage
- Update regularly: Maintain the context file as organizational practices evolve
- Version control: Store organizational context files in a shared repository
- Scope appropriately: Separate general organizational context from project-specific details
readme-generator/
├── code/ # Python application code
│ ├── readme_generator/ # Main package
│ │ ├── main.py # CLI entry point and agent orchestration
│ │ ├── system_prompt.txt # AI agent instructions
│ │ └── readme_example.md # README template structure
│ ├── pyproject.toml # Poetry configuration and dependencies
│ └── poetry.lock # Locked dependency versions
├── iac/ # Infrastructure as Code (Terraform)
│ ├── bedrock_inference_profile.tf # Bedrock inference profile resource
│ ├── locals.tf # Local variables
│ ├── variables.tf # Input variables
│ ├── data.tf # Data sources (AWS account, region)
│ ├── terraform.tf # Provider and backend configuration
│ └── backend.hcl # Backend configuration (git-ignored)
├── .gitlab-ci.yml # GitLab CI/CD pipeline
├── .releaserc.json # Semantic release configuration
├── .gitignore # Git ignore patterns
└── LICENSE # MIT License
code/readme_generator/main.py
- CLI entry point using Click framework
- Agent initialization and orchestration
- Custom tool definitions (`get_tree`, `write_readme_file`)
- Security validation for file access
- Chat mode implementation
- Prompt construction logic (base + template + organizational context + user context + git diff)
- Git integration for change detection (`get_git_diff_since_readme_update`)
- Bedrock client configuration with extended read timeout (180 seconds)
- Inference profile resolution with fallback to global profile
code/readme_generator/system_prompt.txt
- Defines AI agent role and capabilities
- Specifies analysis guidelines
- Lists required README sections
- Sets output format and tone
- Includes organizational context awareness directive: "Organizational context is authoritative unless explicitly contradicted by the repository"
- Includes organizational context exposure guideline: "When organizational conventions materially affect how users build, deploy, or operate the project, they MUST be explicitly documented in the README"
- Defines change-based update mode: Instructions for using Git diff to focus analysis
- Defines documentation neutrality rule: Prohibition of version/change references
- Defines context window safety strategy: Adaptive directory-by-directory exploration
code/readme_generator/readme_example.md
- Markdown template for generated READMEs
- Defines standard section structure
iac/bedrock_inference_profile.tf
- Defines AWS Bedrock inference profile resource
- Configures Claude Sonnet 4.5 model
- Uses global inference profile for cross-region support
iac/locals.tf
- `domain_name`: Fixed to `readme_generator`
- `environment_name`: Computed as `{project_name}_{domain_name}`
iac/terraform.tf
- AWS provider configuration with default tags
- S3 backend configuration for state management
- IAM role assumption support
- AWS Region: Infrastructure defaults to `eu-west-1` (Ireland)
- Python Version: Requires Python 3.13 or higher
- Bedrock Access: Assumes AWS account has access to Claude Sonnet 4.5 model
- Terraform Backend: Backend configuration must be provided at initialization time (not hardcoded)
- GitLab CI/CD: CI/CD pipelines are configured for GitLab (not GitHub Actions)
- Inference Profile Naming: The tool expects inference profiles to follow the naming pattern `{project_name}_readme_generator`
- GitHub Mirror: This repository is mirrored to GitHub from GitLab (source of truth is GitLab)
- Git Repository: Change-based update mode requires the project to be a Git repository
- Path Restriction: The agent can only access files within the specified root path (security measure)
- Recursive Depth: Directory exploration is limited to a configurable depth (default: 5 levels) to prevent performance issues
- Model Dependency: Requires access to AWS Bedrock and the specific Claude model
- AWS Credentials: Relies on locally configured AWS credentials (does not support credential injection)
- Single Repository Analysis: Designed to analyze one repository at a time
- No Multi-Provider LLM Support: Currently configured only for Claude on AWS Bedrock
- GitIgnore Awareness: The system prompt instructs the agent to respect `.gitignore`, but enforcement depends on AI behavior
- Read Timeout: Bedrock API calls are subject to a 180-second read timeout, which may affect very large repositories
- Token Limits: Large codebases may exceed Claude's context window; the tool implements adaptive exploration to mitigate this
- Cost: Each README generation incurs AWS Bedrock API costs
- Network Dependency: Requires network access to AWS services
- Session Persistence: Chat mode sessions are stored locally and not shared across machines
- Terraform Workspace: Local users must manually create and select Terraform workspaces to control the environment (`stage_name`); otherwise Terraform uses the `default` workspace
- Git Diff Analysis: Change-based update mode is only available for Git repositories with existing README.md files
- Inference Profile Fallback: If the named inference profile is not found, the tool uses a global profile ARN which may have different rate limits or availability
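The Recursive Depth limit listed above can be illustrated with a minimal depth-bounded directory walk. This is a sketch, not the project's `get_tree` tool: the function name `list_tree`, its return format, and the exact meaning of "depth" (here, the number of directory levels below the root that are visited) are assumptions.

```python
import os
from typing import List


def list_tree(root: str, max_depth: int = 5) -> List[str]:
    """List file paths relative to `root`, visiting at most `max_depth`
    directory levels below it (illustrative helper only)."""
    entries: List[str] = []
    root = os.path.abspath(root)
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        # The root itself is depth 0; each nested directory adds one level.
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirs[:] = []  # pruning in place stops os.walk from descending
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            entries.append(os.path.normpath(os.path.join(rel, name)))
    return entries
```

Pruning `dirs` in place is what keeps the walk cheap on deep repositories, since directories beyond the limit are never opened.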