refactor: Refactor project structure for maintainability & modularity #309

khengyun · 2025-08-08T06:13:49Z

Summary

The original monolithic layout of DeepWiki-Open has become hard to navigate and contribute to. This PR splits the codebase into clear backend and frontend modules, and relocates all language-specific README files into a dedicated docs/ folder. These changes lay the groundwork for:

Easier onboarding of new contributors
Independent evolution of backend vs. frontend
Clear separation of responsibilities

🔍 Motivation

Monolith → Modular
The old structure mixed API, UI and docs at the root. By splitting into backend/ & frontend/, you can work on one part without accidentally touching the other.
Scalability
Future features—new UIs, CLI tools, data-processing jobs—can live in their own submodules.
Localized README maintenance
Having dozens of README.xx.md files at top level was cluttered. Now each lives under docs/{lang}/README.md, making it trivial to find and update language-specific guides.

What’s changed

New folders
- backend/ – all Node/Python server code, Dockerfiles, tests
- frontend/ – Next.js app, Tailwind config, UI components
- docs/ – per-language READMEs (en/, zh/, vi/, etc.)
Removed top-level language READMEs and merged into docs/
Updated CI/CD and Docker Compose to mount the new module paths
Refactored import paths and build scripts to point at backend/ & frontend/
Cleaned up root directory: only config files, LICENSE, and high-level entrypoints remain

📷 Before & After

Before (monolith)

After (modular)

Benefits

Clearer boundaries between service layers
Faster CI builds (scoped to changed module)
Simplified dependency management per module
Lower barrier to entry for new contributors

gemini-code-assist

Summary of Changes

Hello @khengyun, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've initiated a significant refactoring of the project's containerization strategy. The primary goal was to decouple the backend Python API from the frontend Next.js application, moving from a single, monolithic Docker build to a more modular, multi-service approach. This change enhances maintainability, simplifies future scaling, and provides clearer separation of concerns for development and deployment workflows.

Highlights

Architectural Refactoring: I've refactored the Docker setup to separate the backend (Python API) and frontend (Next.js application) into distinct Dockerfiles and services within docker-compose.yml. This significantly improves modularity and allows for independent development and deployment of each component.
Dockerfile Separation: The previous single Dockerfile has been removed. New dedicated Dockerfiles, backend/Dockerfile and frontend/Dockerfile, have been introduced to manage the build and runtime environments for each part of the application separately.
Docker Compose Updates: The docker-compose.yml file has been updated to define two services: backend and frontend. The frontend service is configured to depend on the backend service, ensuring proper startup order. Health checks have also been added for both services.
Documentation Path Update: I've updated the README.md to reflect a new documentation structure, changing the paths for language links to point to a docs/ subdirectory.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request refactors the application's Docker setup by splitting the monolithic Dockerfile into separate backend and frontend services and Dockerfiles. This is a great improvement for modularity and scalability.

My review focuses on ensuring the new Docker setup is correct, optimized, and maintains the functionality of the previous configuration. I've identified a few critical issues related to data persistence and inter-service communication in docker-compose.yml, as well as opportunities for optimization in the new Dockerfiles. Please see my detailed comments for suggestions.

gemini-code-assist · 2025-08-08T06:15:44Z

docker-compose.yml

    env_file:
      - .env


The new backend service is missing the volumes and resource limits (mem_limit, mem_reservation) that were present in the old deepwiki service.

Volumes: The removed volumes (~/.adalflow:/root/.adalflow and ./api/logs:/app/api/logs) were responsible for persisting repository data, embeddings, and logs. Without them, all this data will be lost when the container is stopped or restarted. This is likely a critical regression.

Resource Limits: The memory limits helped ensure the container doesn't consume excessive resources.

I recommend re-adding these configurations to the backend service to ensure data persistence and resource management.

env_file: - .env volumes: - ~/.adalflow:/root/.adalflow # Persist repository and embedding data - ./backend/api/logs:/app/api/logs # Persist log files across container restarts mem_limit: 6g mem_reservation: 2g

gemini-code-assist · 2025-08-08T06:15:44Z

docker-compose.yml

+    environment:
+      - NODE_ENV=production


The frontend service needs to know how to communicate with the backend service. The next.config.ts file uses the SERVER_BASE_URL environment variable for this, but it's not set for the frontend service.

Without it, the frontend will default to http://localhost:8001, which will not resolve to the backend container. You should set SERVER_BASE_URL to http://backend:8001 so the frontend can reach the backend service within the Docker network.

environment: - NODE_ENV=production - SERVER_BASE_URL=http://backend:8001

gemini-code-assist · 2025-08-08T06:15:44Z

backend/Dockerfile

+# Cài Python dependencies
+COPY backend/api/requirements.txt ./api/
+RUN python -m venv /opt/venv && \
+    /opt/venv/bin/pip install --no-cache -r api/requirements.txt
+
+# Copy source
+COPY backend/api/ ./api/
+
+ENV PATH="/opt/venv/bin:$PATH"
+ENV PORT=8001
+
+# Health-check endpoint nên implement ở FastAPI
+EXPOSE 8001
+CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8001"]


There are a few improvements that can be made to this Dockerfile:

The comments are in Vietnamese. For consistency with the rest of the project, it's better to use English.

The CMD instruction seems to reference the wrong application object. api/main.py imports app from api.api, so the command should likely be uvicorn api.api:app ....

The port is hardcoded in the CMD. It's better to use the PORT environment variable defined on line 14 for flexibility.

Here's a suggested update that addresses these points:

# Install Python dependencies COPY backend/api/requirements.txt ./api/ RUN python -m venv /opt/venv && \ /opt/venv/bin/pip install --no-cache -r api/requirements.txt # Copy source COPY backend/api/ ./api/ ENV PATH="/opt/venv/bin:$PATH" ENV PORT=8001 # A health-check endpoint should be implemented in FastAPI EXPOSE 8001 CMD ["sh", "-c", "uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]

gemini-code-assist · 2025-08-08T06:15:44Z

frontend/Dockerfile

+# Runtime stage
+FROM node:20-alpine3.22
+WORKDIR /app
+COPY --from=builder /app/.next ./.next
+COPY --from=builder /app/public ./public
+COPY --from=builder /app/node_modules ./node_modules
+COPY frontend/package.json ./
+ENV NODE_ENV=production
+EXPOSE 3000
+CMD ["npm", "run", "start"]


The runtime stage of this Dockerfile can be significantly optimized. The next.config.ts is configured with output: 'standalone', which creates a minimal server build in .next/standalone. However, this Dockerfile copies the entire node_modules directory from the builder stage, which includes development dependencies and makes the final image much larger than necessary.

To leverage the standalone output, you should copy the .next/standalone directory and the .next/static directory, and then run the server with node server.js. This will result in a smaller and more secure production image.

# Runtime stage FROM node:20-alpine3.22 WORKDIR /app ENV NODE_ENV=production COPY --from=builder /app/public ./public COPY --from=builder /app/.next/standalone ./ COPY --from=builder /app/.next/static ./.next/static EXPOSE 3000 CMD ["node", "server.js"]

…t mention

gemini-code-assist

Code Review

This pull request refactors the application from a monolithic service into separate backend and frontend services, which is a great improvement for scalability and maintainability. The changes include new Dockerfiles for each service and an updated docker-compose.yml.

My review focuses on improving the new Docker configurations:

In backend/Dockerfile, I've suggested using exec in the CMD instruction for proper signal handling.
In docker-compose.yml, I've pointed out that the frontend service needs the SERVER_BASE_URL environment variable to communicate with the backend. I've also suggested restoring some environment variables and resource limits for the backend service for better resilience and production stability.
In frontend/Dockerfile, I've highlighted the importance of using a .dockerignore file to prevent copying unnecessary files like node_modules into the Docker image.

These changes should make the new setup more robust and efficient.

gemini-code-assist · 2025-08-08T06:27:29Z

docker-compose.yml

+    environment:
+      - NODE_ENV=production


The frontend service needs to communicate with the backend service. The next.config.ts file uses process.env.SERVER_BASE_URL with a fallback to http://localhost:8001. Inside the frontend container, localhost refers to the container itself, not the backend container. To fix this, you should set SERVER_BASE_URL to http://backend:8001 in the frontend service's environment, as the services are on the same Docker network.

environment: - NODE_ENV=production - SERVER_BASE_URL=http://backend:8001

gemini-code-assist · 2025-08-08T06:27:30Z

backend/Dockerfile

+
+# A health-check endpoint should be implemented in FastAPI
+EXPOSE 8001
+CMD ["sh", "-c", "uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]


The use of sh -c in CMD can lead to issues with signal handling, as the shell might not forward signals to the uvicorn process, preventing a graceful shutdown. To ensure graceful shutdowns, you can use exec to replace the shell process with the uvicorn process. Also, the file should end with a newline character for POSIX compatibility.

CMD ["sh", "-c", "exec uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]

gemini-code-assist · 2025-08-08T06:27:30Z

docker-compose.yml

    env_file:
      - .env
-    environment:
-      - PORT=${PORT:-8001}
-      - NODE_ENV=production
-      - SERVER_BASE_URL=http://localhost:${PORT:-8001}
-      - LOG_LEVEL=${LOG_LEVEL:-INFO}
-      - LOG_FILE_PATH=${LOG_FILE_PATH:-api/logs/application.log}
    volumes:
      - ~/.adalflow:/root/.adalflow      # Persist repository and embedding data
-      - ./api/logs:/app/api/logs          # Persist log files across container restarts
-    # Resource limits for docker-compose up (not Swarm mode)
-    mem_limit: 6g
-    mem_reservation: 2g
-    # Health check configuration
+      - ./backend/api/logs:/app/api/logs # Persist log files across container restarts


The backend service configuration is missing some environment variables (LOG_LEVEL, LOG_FILE_PATH) and resource limits (mem_limit, mem_reservation) that were present in the original deepwiki service. While these might be set in the .env file, explicitly defining them with defaults in the compose file improves clarity and makes the service more resilient. Resource limits are also important for production stability.

env_file: - .env environment: - LOG_LEVEL=${LOG_LEVEL:-INFO} - LOG_FILE_PATH=${LOG_FILE_PATH:-/app/api/logs/application.log} volumes: - ~/.adalflow:/root/.adalflow # Persist repository and embedding data - ./backend/api/logs:/app/api/logs # Persist log files across container restarts mem_limit: 6g mem_reservation: 2g

gemini-code-assist · 2025-08-08T06:27:30Z

frontend/Dockerfile

+WORKDIR /app
+COPY frontend/package.json frontend/package-lock.json ./
+RUN npm ci --legacy-peer-deps
+COPY frontend/ ./


This COPY command is very broad and copies the entire frontend directory. This can be problematic if a local node_modules directory exists, as it will be copied into the Docker image, increasing its size and potentially causing issues. It is highly recommended to add a .dockerignore file in the root of your project to exclude node_modules and other unnecessary files from the Docker build context.

Kurea · 2025-08-13T10:20:20Z

Hi @khengyun and thank you for this highly valuable contribution !

I have some suggestions (in addition to those made by Gemini):
Documentation

I think we should add a cd backend and an cd frontend before installs
Maybe we can update project structure (without comments to prevent translation issues but just to display frontend and backend directory)
link to the english readme is broken, we need to replace ./Readme.md with ../Readme.md
It looks like repomix-output.md was an automatically generated file. To keep the docs directory clean, what do you think about removing it from the PR?

Docker and environement

To prevent accidentally tracking node_modules or build artifacts, could we update the .gitignore entries from /.next/ to .next/ and /node_modules to node_modules/? This makes the rule apply to any such folder in the project, not just in the root.
For consistency, we can replace PYTHON_BACKEND_HOST by SERVER_BASE_URL in app/api/wiki/projects/route.ts
To ensure the frontend container builds correctly with the backend URL, we'll need to pass SERVER_BASE_URL as a build argument (I think it will still be needed in the environment part too).
- in docker-compose

frontend:
    build:
      context: .
      dockerfile: frontend/Dockerfile
      args:
        - SERVER_BASE_URL=http://backend:8001

in frontend/Dockerfile just before npm run build, we need to receive this argument and make it available to the build process:

ARG SERVER_BASE_URL
ENV SERVER_BASE_URL=${SERVER_BASE_URL}

Websockets

To get websockets working correctly through the Next.js dev server (like the other API calls), we can add a rewrite rule to next.config.ts. This avoids CORS issues and simplifies the client-side code.

     {
       source: '/ws/chat',
       destination: `${TARGET_SERVER_BASE_URL}/ws/chat`,
     },

With the proxy rule above, we can then simplify all the WebSocket connections in the page.tsx files and websocketClient.ts to use a relative path: const wsUrl = `/ws/chat`;

These are just minor suggestions to tie it all together. Let me know if you have any questions or need a hand with any of them!

…ood4maintain

- Add cd commands before installs in documentation - Fix project structure and broken README links in docs/ - Update .gitignore patterns and replace PYTHON_BACKEND_HOST with SERVER_BASE_URL - Improve Docker configuration with proper build args - Simplify WebSocket connections with relative paths and rewrite rules Resolves all 9 points raised in the comprehensive code review.

khengyun · 2025-08-16T02:33:46Z

Hi @Kurea! 👋

Thank you so much for the detailed and valuable feedback! I really appreciate you taking the time to review the code and provide such comprehensive suggestions.

I've implemented all of your recommendations:

✅ Documentation Updates

✅ Added cd backend and cd frontend before install commands in README.md and Ollama-instruction.md
✅ Updated project structure to display frontend and backend directories without comments to prevent translation issues
✅ Fixed broken English README links - replaced ./README.md with ../README.md in all docs/ files
✅ Removed repomix-output.md from docs directory

✅ Docker and Environment

✅ Updated .gitignore entries from /node_modules to node_modules/ and /.next/ to .next/
✅ Replaced PYTHON_BACKEND_HOST with SERVER_BASE_URL in app/api/wiki/projects/route.ts
✅ Added SERVER_BASE_URL as build argument in frontend/Dockerfile
✅ Updated docker-compose.yml frontend section to pass SERVER_BASE_URL as both build arg and environment variable

✅ Websockets

✅ Added rewrite rule to next.config.ts for websockets: {source: '/ws/chat', destination: '${TARGET_SERVER_BASE_URL}/ws/chat'}
✅ Simplified WebSocket connections in websocketClient.ts to use relative path /ws/chat instead of constructing full URLs

All changes have been tested and are working correctly. The Docker setup now properly handles the backend URL configuration, and websockets work seamlessly through the Next.js dev server without CORS issues.

Thanks again for the thorough review - these improvements make the project much more robust and user-friendly! 🚀

refactor: init

7b5f351

gemini-code-assist bot reviewed Aug 8, 2025

View reviewed changes

khengyun added 2 commits August 8, 2025 13:20

fix: backend service is missing the volumes

f4b706b

fix: optimal dockerfile for backend and frontend as gemini-code-assis…

f73bb08

…t mention

gemini-code-assist bot reviewed Aug 8, 2025

View reviewed changes

khengyun changed the title ~~refactor: init~~ refactor: Refactor project structure for maintainability & modularity Aug 8, 2025

Delete repomix-output.md

ba950e7

khengyun added 2 commits August 16, 2025 08:46

Merge remote-tracking branch 'upstream/main' into project-structure-g…

5f0a148

…ood4maintain

refactor: Refactor project structure for maintainability & modularity #309

Are you sure you want to change the base?

refactor: Refactor project structure for maintainability & modularity #309

Uh oh!

Conversation

khengyun commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

🔍 Motivation

What’s changed

📷 Before & After

Benefits

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 8, 2025

Choose a reason for hiding this comment

Uh oh!

Kurea commented Aug 13, 2025

Uh oh!

khengyun commented Aug 16, 2025

✅ Documentation Updates

✅ Docker and Environment

✅ Websockets

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

khengyun commented Aug 8, 2025 •

edited

Loading