
refactor: Refactor project structure for maintainability & modularity #309


Open · khengyun wants to merge 4 commits into main

Conversation

@khengyun commented Aug 8, 2025

Summary

The original monolithic layout of DeepWiki-Open has become hard to navigate and contribute to. This PR splits the codebase into clear backend and frontend modules, and relocates all language-specific README files into a dedicated docs/ folder. These changes lay the groundwork for:

  • Easier onboarding of new contributors
  • Independent evolution of backend vs. frontend
  • Clear separation of responsibilities

🔍 Motivation

  1. Monolith → Modular
    The old structure mixed API, UI and docs at the root. By splitting into backend/ & frontend/, you can work on one part without accidentally touching the other.
  2. Scalability
    Future features—new UIs, CLI tools, data-processing jobs—can live in their own submodules.
  3. Localized README maintenance
    Dozens of README.xx.md files at the top level cluttered the root. Now each lives under docs/{lang}/README.md, making it trivial to find and update language-specific guides.

What’s changed

  • New folders (see the layout sketch after this list)

    • backend/ – all Node/Python server code, Dockerfiles, tests
    • frontend/ – Next.js app, Tailwind config, UI components
    • docs/ – per-language READMEs (en/, zh/, vi/, etc.)
  • Removed top-level language READMEs and merged them into docs/

  • Updated CI/CD and Docker Compose to mount the new module paths

  • Refactored import paths and build scripts to point at backend/ & frontend/

  • Cleaned up root directory: only config files, LICENSE, and high-level entrypoints remain
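
For orientation, here is a sketch of the resulting top-level layout, assembled from the folders listed above; the file names inside each module are illustrative, not an exhaustive listing from this PR:

    .
    ├── backend/            # server code, backend/Dockerfile, tests
    ├── frontend/           # Next.js app, Tailwind config, frontend/Dockerfile, UI components
    ├── docs/
    │   ├── en/README.md
    │   ├── zh/README.md
    │   └── vi/README.md    # plus the other language guides
    ├── docker-compose.yml
    └── LICENSE             # plus the remaining root-level config files and entrypoints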

📷 Before & After

Before (monolith): [screenshot of the original root-level layout]

After (modular): [screenshot of the new backend/, frontend/, and docs/ layout]

Benefits

  • Clearer boundaries between service layers
  • Faster CI builds (scoped to changed module)
  • Simplified dependency management per module
  • Lower barrier to entry for new contributors

Contributor @gemini-code-assist (bot) left a comment

Summary of Changes

Hello @khengyun, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've initiated a significant refactoring of the project's containerization strategy. The primary goal was to decouple the backend Python API from the frontend Next.js application, moving from a single, monolithic Docker build to a more modular, multi-service approach. This change enhances maintainability, simplifies future scaling, and provides clearer separation of concerns for development and deployment workflows.

Highlights

  • Architectural Refactoring: I've refactored the Docker setup to separate the backend (Python API) and frontend (Next.js application) into distinct Dockerfiles and services within docker-compose.yml. This significantly improves modularity and allows for independent development and deployment of each component.
  • Dockerfile Separation: The previous single Dockerfile has been removed. New dedicated Dockerfiles, backend/Dockerfile and frontend/Dockerfile, have been introduced to manage the build and runtime environments for each part of the application separately.
  • Docker Compose Updates: The docker-compose.yml file has been updated to define two services: backend and frontend. The frontend service is configured to depend on the backend service, ensuring proper startup order. Health checks have also been added for both services. (A minimal sketch of this layout follows this list.)
  • Documentation Path Update: I've updated the README.md to reflect a new documentation structure, changing the paths for language links to point to a docs/ subdirectory.
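
As a rough illustration of the two-service layout described above, here is a minimal docker-compose.yml sketch, not the file from this PR: the build contexts, port mappings, health-check commands, and the /health endpoint are assumptions (curl/wget must exist in the respective base images), while the service names, ports 8001/3000, NODE_ENV, and SERVER_BASE_URL follow the review comments below.

    services:
      backend:
        build:
          context: .
          dockerfile: backend/Dockerfile
        env_file:
          - .env
        ports:
          - "8001:8001"
        healthcheck:
          # Assumes a /health endpoint is implemented in FastAPI and curl exists in the image
          test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
          interval: 30s
          timeout: 5s
          retries: 3

      frontend:
        build:
          context: .
          dockerfile: frontend/Dockerfile
        environment:
          - NODE_ENV=production
          - SERVER_BASE_URL=http://backend:8001  # reach the backend over the compose network
        ports:
          - "3000:3000"
        depends_on:
          - backend
        healthcheck:
          # Assumes wget is available (it is in busybox-based Alpine images)
          test: ["CMD", "wget", "-qO-", "http://localhost:3000/"]
          interval: 30s
          timeout: 5s
          retries: 3
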
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review: /gemini review performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor @gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the application's Docker setup by splitting the monolithic Dockerfile into separate backend and frontend services and Dockerfiles. This is a great improvement for modularity and scalability.

My review focuses on ensuring the new Docker setup is correct, optimized, and maintains the functionality of the previous configuration. I've identified a few critical issues related to data persistence and inter-service communication in docker-compose.yml, as well as opportunities for optimization in the new Dockerfiles. Please see my detailed comments for suggestions.

Comment on lines 11 to 12
env_file:
- .env

critical

The new backend service is missing the volumes and resource limits (mem_limit, mem_reservation) that were present in the old deepwiki service.

  • Volumes: The removed volumes (~/.adalflow:/root/.adalflow and ./api/logs:/app/api/logs) were responsible for persisting repository data, embeddings, and logs. Without them, all this data will be lost when the container is stopped or restarted. This is likely a critical regression.
  • Resource Limits: The memory limits helped ensure the container doesn't consume excessive resources.

I recommend re-adding these configurations to the backend service to ensure data persistence and resource management.

    env_file:
      - .env
    volumes:
      - ~/.adalflow:/root/.adalflow      # Persist repository and embedding data
      - ./backend/api/logs:/app/api/logs # Persist log files across container restarts
    mem_limit: 6g
    mem_reservation: 2g

Comment on lines +26 to +27
environment:
- NODE_ENV=production

critical

The frontend service needs to know how to communicate with the backend service. The next.config.ts file uses the SERVER_BASE_URL environment variable for this, but it's not set for the frontend service.

Without it, the frontend will default to http://localhost:8001, which will not resolve to the backend container. You should set SERVER_BASE_URL to http://backend:8001 so the frontend can reach the backend service within the Docker network.

    environment:
      - NODE_ENV=production
      - SERVER_BASE_URL=http://backend:8001

Comment on lines 5 to 18
# Cài Python dependencies
COPY backend/api/requirements.txt ./api/
RUN python -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache -r api/requirements.txt

# Copy source
COPY backend/api/ ./api/

ENV PATH="/opt/venv/bin:$PATH"
ENV PORT=8001

# Health-check endpoint nên implement ở FastAPI
EXPOSE 8001
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8001"]

high

There are a few improvements that can be made to this Dockerfile:

  1. The comments are in Vietnamese. For consistency with the rest of the project, it's better to use English.
  2. The CMD instruction seems to reference the wrong application object. api/main.py imports app from api.api, so the command should likely be uvicorn api.api:app ....
  3. The port is hardcoded in the CMD. It's better to use the PORT environment variable defined on line 14 for flexibility.

Here's a suggested update that addresses these points:

# Install Python dependencies
COPY backend/api/requirements.txt ./api/
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache -r api/requirements.txt

# Copy source
COPY backend/api/ ./api/

ENV PATH="/opt/venv/bin:$PATH"
ENV PORT=8001

# A health-check endpoint should be implemented in FastAPI
EXPOSE 8001
CMD ["sh", "-c", "uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]

Comment on lines 10 to 19
# Runtime stage
FROM node:20-alpine3.22
WORKDIR /app
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/node_modules ./node_modules
COPY frontend/package.json ./
ENV NODE_ENV=production
EXPOSE 3000
CMD ["npm", "run", "start"]

high

The runtime stage of this Dockerfile can be significantly optimized. The next.config.ts is configured with output: 'standalone', which creates a minimal server build in .next/standalone. However, this Dockerfile copies the entire node_modules directory from the builder stage, which includes development dependencies and makes the final image much larger than necessary.

To leverage the standalone output, you should copy the .next/standalone directory and the .next/static directory, and then run the server with node server.js. This will result in a smaller and more secure production image.

# Runtime stage
FROM node:20-alpine3.22
WORKDIR /app
ENV NODE_ENV=production

COPY --from=builder /app/public ./public
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static

EXPOSE 3000
CMD ["node", "server.js"]

Contributor @gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the application from a monolithic service into separate backend and frontend services, which is a great improvement for scalability and maintainability. The changes include new Dockerfiles for each service and an updated docker-compose.yml.

My review focuses on improving the new Docker configurations:

  • In backend/Dockerfile, I've suggested using exec in the CMD instruction for proper signal handling.
  • In docker-compose.yml, I've pointed out that the frontend service needs the SERVER_BASE_URL environment variable to communicate with the backend. I've also suggested restoring some environment variables and resource limits for the backend service for better resilience and production stability.
  • In frontend/Dockerfile, I've highlighted the importance of using a .dockerignore file to prevent copying unnecessary files like node_modules into the Docker image.

These changes should make the new setup more robust and efficient.

Comment on lines +29 to +30
environment:
- NODE_ENV=production

high

The frontend service needs to communicate with the backend service. The next.config.ts file uses process.env.SERVER_BASE_URL with a fallback to http://localhost:8001. Inside the frontend container, localhost refers to the container itself, not the backend container. To fix this, you should set SERVER_BASE_URL to http://backend:8001 in the frontend service's environment, as the services are on the same Docker network.

    environment:
      - NODE_ENV=production
      - SERVER_BASE_URL=http://backend:8001


# A health-check endpoint should be implemented in FastAPI
EXPOSE 8001
CMD ["sh", "-c", "uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]

medium

The use of sh -c in CMD can lead to issues with signal handling, as the shell might not forward signals to the uvicorn process, preventing a graceful shutdown. To ensure graceful shutdowns, you can use exec to replace the shell process with the uvicorn process. Also, the file should end with a newline character for POSIX compatibility.

CMD ["sh", "-c", "exec uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]

Comment on lines 11 to +15
env_file:
- .env
environment:
- PORT=${PORT:-8001}
- NODE_ENV=production
- SERVER_BASE_URL=http://localhost:${PORT:-8001}
- LOG_LEVEL=${LOG_LEVEL:-INFO}
- LOG_FILE_PATH=${LOG_FILE_PATH:-api/logs/application.log}
volumes:
- ~/.adalflow:/root/.adalflow # Persist repository and embedding data
- ./api/logs:/app/api/logs # Persist log files across container restarts
# Resource limits for docker-compose up (not Swarm mode)
mem_limit: 6g
mem_reservation: 2g
# Health check configuration
- ./backend/api/logs:/app/api/logs # Persist log files across container restarts

medium

The backend service configuration is missing some environment variables (LOG_LEVEL, LOG_FILE_PATH) and resource limits (mem_limit, mem_reservation) that were present in the original deepwiki service. While these might be set in the .env file, explicitly defining them with defaults in the compose file improves clarity and makes the service more resilient. Resource limits are also important for production stability.

    env_file:
      - .env
    environment:
      - LOG_LEVEL=${LOG_LEVEL:-INFO}
      - LOG_FILE_PATH=${LOG_FILE_PATH:-/app/api/logs/application.log}
    volumes:
      - ~/.adalflow:/root/.adalflow      # Persist repository and embedding data
      - ./backend/api/logs:/app/api/logs # Persist log files across container restarts
    mem_limit: 6g
    mem_reservation: 2g

WORKDIR /app
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci --legacy-peer-deps
COPY frontend/ ./

medium

This COPY command is very broad and copies the entire frontend directory. This can be problematic if a local node_modules directory exists, as it will be copied into the Docker image, increasing its size and potentially causing issues. It is highly recommended to add a .dockerignore file in the root of your project to exclude node_modules and other unnecessary files from the Docker build context.
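
A minimal .dockerignore sketch along those lines, placed at the repository root; the entries are assumptions and should be tuned to what the repository actually contains:

    # Illustrative entries: keep local installs, build output, and secrets out of the build context
    **/node_modules
    **/.next
    **/__pycache__
    **/*.pyc
    .git
    .env          # docker-compose injects this at runtime via env_file, so it need not be in the image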

@khengyun changed the title from "refactor: init" to "refactor: Refactor project structure for maintainability & modularity" on Aug 8, 2025