feat:support cnb repo #303

Status: Open; wants to merge 4 commits into base: main
Changes from 2 commits
12 changes: 12 additions & 0 deletions .trae/TODO.md
@@ -0,0 +1,12 @@
# TODO:

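- [x] 1: Check the parseRepositoryInput function in src/app/page.tsx to ensure cnb.cool is correctly recognized as the cnb type (priority: High)
- [x] 4: Fix the handleGenerateWiki function in src/app/page.tsx so that the type URL parameter uses the parsed type rather than selectedPlatform (priority: High)
- [x] 15: Remove all incorrect API call logic (GitHub, GitLab, Bitbucket, CNB, and other platform APIs) from src/app/[owner]/[repo]/page.tsx (priority: High)
- [x] 16: Remove the unnecessary CNB API proxy endpoint from api/api.py (priority: High)
- [x] 18: Simplify the frontend repository info fetching logic so that only the repository URL and token are passed to the backend (priority: High)
- [x] 20: Remove the restriction in src/app/page.tsx that forces a token for the CNB type (priority: High)
- [x] 21: Remove the CNB API proxy configuration from next.config.ts (priority: High)
- [x] 25: Update the platform description in README.md from "GitHub, GitLab, or BitBucket" to "GitHub, GitLab, BitBucket, or CNB" (priority: High)
- [x] 26: Update the platform descriptions in all language files under src/messages/ to add CNB support (priority: High)
- [x] 17: Remove the unnecessary httpx dependency from api/requirements.txt (priority: Medium)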
Contributor review comment (medium):

This file appears to be a personal TODO list for tracking development tasks. Such files are not typically committed to the project's source control. It should be removed before merging.
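
A note on TODO item 1: the actual `parseRepositoryInput` function lives in `src/app/page.tsx` and is not shown in this diff. The Python sketch below only illustrates the idea of mapping a host name to a platform type. The `HOST_TYPES` table and the `detect_repo_type` name are hypothetical; only the `cnb.cool` to `cnb` mapping and the `web` fallback type come from this PR.

```python
from urllib.parse import urlparse

# Hypothetical host-to-type table (illustration only, not the PR's code).
HOST_TYPES = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
    "cnb.cool": "cnb",
}


def detect_repo_type(repo_url: str) -> str:
    """Return the platform type for a repository URL, falling back to "web"."""
    # urlparse only fills in hostname when the URL carries a scheme,
    # so scheme-less input also ends up in the "web" fallback here.
    host = (urlparse(repo_url).hostname or "").lower()
    return HOST_TYPES.get(host, "web")
```

The point of item 4 is then that the value returned by the parser, rather than a separately selected platform, should be what ends up in the `type` URL parameter.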

5 changes: 5 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,5 @@
{
  "i18n-ally.localesPaths": [
    "src/messages"
  ]
}
10 changes: 5 additions & 5 deletions README.md
@@ -2,7 +2,7 @@

![DeepWiki Banner](screenshots/Deepwiki.png)

**DeepWiki** is my own implementation attempt of DeepWiki, automatically creates beautiful, interactive wikis for any GitHub, GitLab, or BitBucket repository! Just enter a repo name, and DeepWiki will:
**DeepWiki** is my own implementation attempt of DeepWiki, automatically creates beautiful, interactive wikis for any GitHub, GitLab, BitBucket, or CNB repository! Just enter a repo name, and DeepWiki will:

1. Analyze the code structure
2. Generate comprehensive documentation
@@ -18,7 +18,7 @@

## ✨ Features

- **Instant Documentation**: Turn any GitHub, GitLab or BitBucket repo into a wiki in seconds
- **Instant Documentation**: Turn any GitHub, GitLab, BitBucket or CNB repo into a wiki in seconds
- **Private Repository Support**: Securely access private repositories with personal access tokens
- **Smart Analysis**: AI-powered understanding of code structure and relationships
- **Beautiful Diagrams**: Automatic Mermaid diagrams to visualize architecture and data flow
@@ -104,15 +104,15 @@ yarn dev
#### Step 4: Use DeepWiki!

1. Open [http://localhost:3000](http://localhost:3000) in your browser
2. Enter a GitHub, GitLab, or Bitbucket repository (like `https://github.com/openai/codex`, `https://github.com/microsoft/autogen`, `https://gitlab.com/gitlab-org/gitlab`, or `https://bitbucket.org/redradish/atlassian_app_versions`)
2. Enter a GitHub, GitLab, Bitbucket, or CNB repository (like `https://github.com/openai/codex`, `https://github.com/microsoft/autogen`, `https://gitlab.com/gitlab-org/gitlab`, `https://bitbucket.org/redradish/atlassian_app_versions`, or `https://cnb.cool/learning-docker/project-1-jupyter`)
3. For private repositories, click "+ Add access tokens" and enter your GitHub or GitLab personal access token
4. Click "Generate Wiki" and watch the magic happen!

## 🔍 How It Works

DeepWiki uses AI to:

1. Clone and analyze the GitHub, GitLab, or Bitbucket repository (including private repos with token authentication)
1. Clone and analyze the GitHub, GitLab, Bitbucket, or CNB repository (including private repos with token authentication)
2. Create embeddings of the code for smart retrieval
3. Generate documentation with context-aware AI (using Google Gemini, OpenAI, OpenRouter, Azure OpenAI, or local Ollama models)
4. Create visual diagrams to explain code relationships
@@ -122,7 +122,7 @@ DeepWiki uses AI to:

```mermaid
graph TD
A[User inputs GitHub/GitLab/Bitbucket repo] --> AA{Private repo?}
A[User inputs GitHub/GitLab/Bitbucket/CNB repo] --> AA{Private repo?}
AA -->|Yes| AB[Add access token]
AA -->|No| B[Clone Repository]
AB --> B
3 changes: 3 additions & 0 deletions api/api.py
@@ -393,13 +393,16 @@ def generate_json_export(repo_url: str, pages: List[WikiPage]) -> str:
# Import the simplified chat implementation
from api.simple_chat import chat_completions_stream
from api.websocket_wiki import handle_websocket_chat
# Removed httpx import - no longer needed since CNB API proxy was removed

# Add the chat_completions_stream endpoint to the main app
app.add_api_route("/chat/completions/stream", chat_completions_stream, methods=["POST"])

# Add the WebSocket endpoint
app.add_websocket_route("/ws/chat", handle_websocket_chat)

# Removed CNB API proxy endpoint - backend handles all repository types via git clone

# --- Wiki Cache Helper Functions ---

WIKI_CACHE_DIR = os.path.join(get_adalflow_default_root_path(), "wikicache")
14 changes: 12 additions & 2 deletions api/config/embedder.json
@@ -1,13 +1,23 @@
{
  "embedder": {
    "client_class": "OpenAIClient",
    "batch_size": 500,
    "initialize_kwargs": {
      "api_key": "${OPENAI_API_KEY}",
      "base_url": "${OPENAI_BASE_URL}"
    },
    "batch_size": 10,
Contributor review comment (medium):

The batch_size has been drastically reduced from 500 to 10. This change may significantly degrade the performance of the document embedding process by increasing the number of API calls required. If this reduction is necessary for stability or model constraints, it would be beneficial to add a comment explaining the reason.

    "model_kwargs": {
      "model": "text-embedding-3-small",
      "model": "text-embedding-v4",
      "dimensions": 256,
      "encoding_format": "float"
    }
  },
  "embedder_ollama": {
    "client_class": "OllamaClient",
    "model_kwargs": {
      "model": "nomic-embed-text"
    }
  },
  "retriever": {
    "top_k": 20
  },
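
To make the reviewer's concern about `batch_size` concrete, here is a rough estimate that assumes one embedding request per batch. The embedder client's actual batching behaviour is not shown in this diff, and both `embedding_requests` and the 5,000-chunk figure are illustrative only.

```python
import math


def embedding_requests(num_chunks: int, batch_size: int) -> int:
    """Estimate how many batched embedding requests num_chunks inputs need."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Assumption: each batch is sent as exactly one request.
    return math.ceil(num_chunks / batch_size)


# Hypothetical repository that splits into 5,000 text chunks.
print(embedding_requests(5000, 500))  # 10
print(embedding_requests(5000, 10))   # 500
```

Under that assumption the request count grows by the same factor of 50 by which the batch size shrinks, which is why a short comment explaining the smaller limit would help.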
59 changes: 56 additions & 3 deletions api/data_pipeline.py
@@ -57,11 +57,12 @@ def count_tokens(text: str, is_ollama_embedder: bool = None) -> int:

def download_repo(repo_url: str, local_path: str, type: str = "github", access_token: str = None) -> str:
    """
    Downloads a Git repository (GitHub, GitLab, or Bitbucket) to a specified local path.
    Downloads a Git repository (GitHub, GitLab, Bitbucket, CNB, or other Git hosting services) to a specified local path.

    Args:
        repo_url (str): The URL of the Git repository to clone.
        local_path (str): The local directory where the repository will be cloned.
        type (str): The type of repository (github, gitlab, bitbucket, cnb, web, local).
        access_token (str, optional): Access token for private repositories.

    Returns:
@@ -101,6 +102,10 @@ def download_repo(repo_url: str, local_path: str, type: str = "github", access_t
            elif type == "bitbucket":
                # Format: https://x-token-auth:{token}@bitbucket.org/owner/repo.git
                clone_url = urlunparse((parsed.scheme, f"x-token-auth:{access_token}@{parsed.netloc}", parsed.path, '', '', ''))
            elif type == "cnb" or type == "web":
                # For CNB and generic web-based Git repositories, use a generic token format
                # This works for most Git hosting services that support HTTP basic auth
                clone_url = urlunparse((parsed.scheme, f"{access_token}@{parsed.netloc}", parsed.path, '', '', ''))

            logger.info("Using access token for authentication")
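
The new `cnb`/`web` branch places the raw token in the userinfo part of the clone URL. A minimal sketch of the same construction follows, with one addition that is not in the diff: the token is percent-encoded first, so a token containing `@`, `:` or `/` cannot change how the URL is parsed. `build_clone_url` is a hypothetical helper name, and whether CNB expects a bare token or a `user:token` pair is not confirmed by this diff.

```python
from typing import Optional
from urllib.parse import quote, urlparse, urlunparse


def build_clone_url(repo_url: str, access_token: Optional[str] = None) -> str:
    """Embed a token in the userinfo part of an HTTP(S) clone URL."""
    if not access_token:
        return repo_url
    parsed = urlparse(repo_url)
    # Percent-encode every reserved character in the token
    # (an addition for illustration, not part of the PR).
    userinfo = quote(access_token, safe="")
    return urlunparse(
        (parsed.scheme, f"{userinfo}@{parsed.netloc}", parsed.path, "", "", "")
    )
```

Because the resulting URL contains the credential, it should be kept out of log messages and error output.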

@@ -648,13 +653,58 @@ def get_bitbucket_file_content(repo_url: str, file_path: str, access_token: str
raise ValueError(f"Failed to get file content: {str(e)}")


def get_web_file_content(repo_url: str, file_path: str) -> str:
    """
    Retrieves the content of a file from a locally cloned web-based Git repository.

    Args:
        repo_url (str): The URL of the repository
        file_path (str): The path to the file within the repository

    Returns:
        str: The content of the file as a string

    Raises:
        ValueError: If the file cannot be found or read
    """
    try:
        # Extract repository name from URL to find local path
        url_parts = repo_url.rstrip('/').split('/')
        if len(url_parts) >= 2:
            owner = url_parts[-2]
            repo = url_parts[-1].replace(".git", "")
            repo_name = f"{owner}_{repo}"
        else:
            repo_name = url_parts[-1].replace(".git", "")

        # Get the local repository path
        from .utils import get_adalflow_default_root_path
        root_path = get_adalflow_default_root_path()
        local_repo_path = os.path.join(root_path, "repos", repo_name)

        # Construct the full file path
        full_file_path = os.path.join(local_repo_path, file_path)

        # Check if file exists
        if not os.path.exists(full_file_path):
            raise ValueError(f"File not found: {file_path}")

        # Read and return file content
        with open(full_file_path, 'r', encoding='utf-8') as f:
            return f.read()

    except Exception as e:
        raise ValueError(f"Failed to get file content from local repository: {str(e)}")
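
`get_web_file_content` joins `file_path` onto the clone directory without checking the result, so a path such as `../other_repo/file` could resolve outside the repository. One possible guard is sketched below; `resolve_inside` is a hypothetical helper and not part of this PR.

```python
import os


def resolve_inside(repo_root: str, file_path: str) -> str:
    """Join file_path onto repo_root and reject results outside repo_root."""
    root = os.path.realpath(repo_root)
    # realpath collapses ".." segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(root, file_path))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"Path escapes repository: {file_path}")
    return candidate
```

A caller would pass the returned path to `open` instead of the plain `os.path.join` result, so absolute paths and `..` segments are rejected before any file is read.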


def get_file_content(repo_url: str, file_path: str, type: str = "github", access_token: str = None) -> str:
    """
    Retrieves the content of a file from a Git repository (GitHub or GitLab).
    Retrieves the content of a file from a Git repository (GitHub, GitLab, Bitbucket, CNB, or other Git hosting services).

    Args:
        repo_url (str): The URL of the repository
        file_path (str): The path to the file within the repository
        type (str): The type of repository (github, gitlab, bitbucket, cnb, web, local)
        access_token (str, optional): Access token for private repositories

    Returns:
@@ -669,8 +719,11 @@ def get_file_content(repo_url: str, file_path: str, type: str = "github", access
        return get_gitlab_file_content(repo_url, file_path, access_token)
    elif type == "bitbucket":
        return get_bitbucket_file_content(repo_url, file_path, access_token)
    elif type == "cnb" or type == "web":
        # For cnb and web-type repositories, read from local cloned repository
        return get_web_file_content(repo_url, file_path)
    else:
        raise ValueError("Unsupported repository URL. Only GitHub and GitLab are supported.")
        raise ValueError("Unsupported repository type. Supported types: github, gitlab, bitbucket, cnb, web.")
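
The old error message had drifted out of date (it named only GitHub and GitLab although Bitbucket was already handled). One way to keep the list of supported types and the message in sync is a lookup table, sketched here under stated assumptions: the two stub fetchers stand in for the real helpers in `api/data_pipeline.py`, and `FETCHERS` and `get_file_content_sketch` are hypothetical names, not code from this PR.

```python
from typing import Callable, Dict, Optional


# Stub fetchers standing in for the real per-platform helpers.
def _from_api(repo_url: str, file_path: str, access_token: Optional[str]) -> str:
    return f"api:{repo_url}:{file_path}"


def _from_clone(repo_url: str, file_path: str, access_token: Optional[str]) -> str:
    # Local reads ignore the token; the earlier clone step already used it.
    return f"clone:{repo_url}:{file_path}"


Fetcher = Callable[[str, str, Optional[str]], str]

FETCHERS: Dict[str, Fetcher] = {
    "github": _from_api,
    "gitlab": _from_api,
    "bitbucket": _from_api,
    "cnb": _from_clone,
    "web": _from_clone,
}


def get_file_content_sketch(repo_url, file_path, type="github", access_token=None):
    """Dispatch on repository type; the error text is built from the table."""
    try:
        fetcher = FETCHERS[type]
    except KeyError:
        supported = ", ".join(FETCHERS)
        raise ValueError(
            f"Unsupported repository type. Supported types: {supported}."
        ) from None
    return fetcher(repo_url, file_path, access_token)
```

With this shape, adding another platform is a one-line change to the table, and the error message follows automatically.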

class DatabaseManager:
"""
1 change: 1 addition & 0 deletions api/requirements.txt
@@ -17,4 +17,5 @@ boto3>=1.34.0
websockets>=11.0.3
azure-identity>=1.12.0
azure-core>=1.24.0
# Removed httpx>=0.24.0 - no longer needed since CNB API proxy was removed

1 change: 1 addition & 0 deletions next.config.ts
@@ -63,6 +63,7 @@ const nextConfig: NextConfig = {
        source: '/api/lang/config',
        destination: `${TARGET_SERVER_BASE_URL}/lang/config`,
      },
      // Removed CNB API proxy - backend handles all repository types via git clone
    ];
  },
};