refactor: Refactor project structure for maintainability & modularity #309
Summary of Changes
Hello @khengyun, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've initiated a significant refactoring of the project's containerization strategy. The primary goal was to decouple the backend Python API from the frontend Next.js application, moving from a single, monolithic Docker build to a more modular, multi-service approach. This change enhances maintainability, simplifies future scaling, and provides clearer separation of concerns for development and deployment workflows.
Highlights
- Architectural Refactoring: I've refactored the Docker setup to separate the backend (Python API) and frontend (Next.js application) into distinct Dockerfiles and services within docker-compose.yml. This significantly improves modularity and allows for independent development and deployment of each component.
- Dockerfile Separation: The previous single Dockerfile has been removed. New dedicated Dockerfiles, backend/Dockerfile and frontend/Dockerfile, have been introduced to manage the build and runtime environments for each part of the application separately.
- Docker Compose Updates: The docker-compose.yml file has been updated to define two services: backend and frontend. The frontend service is configured to depend on the backend service, ensuring proper startup order. Health checks have also been added for both services.
- Documentation Path Update: I've updated the README.md to reflect a new documentation structure, changing the paths for language links to point to a docs/ subdirectory.
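The health checks mentioned in the highlights need something to probe. As a minimal sketch, assuming the backend exposes an HTTP endpoint that answers with a 2xx status when healthy, a stdlib-only probe could look like the following. The `/health` path, the function names, and the default URL are hypothetical; the pull request does not name a health endpoint.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP 4xx/5xx responses
        # (HTTPError is a URLError subclass), and malformed URLs.
        return False


def main(argv: list) -> int:
    """Map the probe result to a process exit code (0 = healthy)."""
    # Hypothetical default URL; adjust to whatever endpoint the API provides.
    url = argv[1] if len(argv) > 1 else "http://localhost:8001/health"
    return 0 if is_healthy(url) else 1
```

If saved as a script that ends with `sys.exit(main(sys.argv))`, a container health check could run it and treat a non-zero exit code as unhealthy.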
Code Review
This pull request refactors the application's Docker setup by splitting the monolithic Dockerfile into separate backend and frontend services and Dockerfiles. This is a great improvement for modularity and scalability.
My review focuses on ensuring the new Docker setup is correct, optimized, and maintains the functionality of the previous configuration. I've identified a few critical issues related to data persistence and inter-service communication in docker-compose.yml, as well as opportunities for optimization in the new Dockerfiles. Please see my detailed comments for suggestions.
env_file:
- .env
The new backend service is missing the volumes and resource limits (mem_limit, mem_reservation) that were present in the old deepwiki service.
- Volumes: The removed volumes (~/.adalflow:/root/.adalflow and ./api/logs:/app/api/logs) were responsible for persisting repository data, embeddings, and logs. Without them, all this data lives only in the container's writable layer and will be lost whenever the container is removed or recreated (for example, by docker compose down followed by up). This is likely a critical regression.
- Resource Limits: The memory limits helped ensure the container doesn't consume excessive resources.
I recommend re-adding these configurations to the backend service to ensure data persistence and resource management.
env_file:
- .env
volumes:
- ~/.adalflow:/root/.adalflow # Persist repository and embedding data
- ./backend/api/logs:/app/api/logs # Persist log files across container restarts
mem_limit: 6g
mem_reservation: 2g
environment:
- NODE_ENV=production
The frontend service needs to know how to communicate with the backend service. The next.config.ts file uses the SERVER_BASE_URL environment variable for this, but it's not set for the frontend service.
Without it, the frontend will default to http://localhost:8001, which will not resolve to the backend container. You should set SERVER_BASE_URL to http://backend:8001 so the frontend can reach the backend service within the Docker network.
environment:
- NODE_ENV=production
- SERVER_BASE_URL=http://backend:8001
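The fallback behaviour described in this comment can be illustrated with a short Python sketch. The real lookup lives in next.config.ts, which is not shown here, so the helper name and structure below are hypothetical; only the variable name and the two URLs come from the review.

```python
import os
from typing import Mapping, Optional

# Fallback quoted in the review comment. Inside the frontend container,
# localhost is the frontend container itself, not the backend.
DEFAULT_SERVER_BASE_URL = "http://localhost:8001"


def resolve_server_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend base URL, using the default when the variable is unset or empty."""
    env = os.environ if env is None else env
    return env.get("SERVER_BASE_URL") or DEFAULT_SERVER_BASE_URL
```

With an empty environment the helper returns the localhost default; with SERVER_BASE_URL=http://backend:8001 it returns the address of the backend service on the Compose network.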
backend/Dockerfile
Outdated
# Cài Python dependencies
COPY backend/api/requirements.txt ./api/
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache -r api/requirements.txt

# Copy source
COPY backend/api/ ./api/

ENV PATH="/opt/venv/bin:$PATH"
ENV PORT=8001

# Health-check endpoint nên implement ở FastAPI
EXPOSE 8001
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8001"]
There are a few improvements that can be made to this Dockerfile:
- The comments are in Vietnamese. For consistency with the rest of the project, it's better to use English.
- The CMD instruction seems to reference the wrong application object. api/main.py imports app from api.api, so the command should likely be uvicorn api.api:app ....
- The port is hardcoded in the CMD. It's better to use the PORT environment variable defined on line 14 for flexibility.
Here's a suggested update that addresses these points:
# Install Python dependencies
COPY backend/api/requirements.txt ./api/
RUN python -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache -r api/requirements.txt
# Copy source
COPY backend/api/ ./api/
ENV PATH="/opt/venv/bin:$PATH"
ENV PORT=8001
# A health-check endpoint should be implemented in FastAPI
EXPOSE 8001
CMD ["sh", "-c", "uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]
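As an alternative to expanding ${PORT} through a shell, the port could be resolved inside a small Python launcher. This is only a sketch under stated assumptions: the names resolve_port, DEFAULT_PORT, and main are hypothetical, and the application path api.api:app is taken from the review comment rather than verified against the repository.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8001  # same value as ENV PORT=8001 in the Dockerfile


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Read PORT from the environment, falling back to DEFAULT_PORT."""
    env = os.environ if env is None else env
    raw = (env.get("PORT") or "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port


def main() -> None:
    # uvicorn is a third-party dependency; it is imported lazily so that
    # resolve_port() can be exercised without it installed.
    import uvicorn

    uvicorn.run("api.api:app", host="0.0.0.0", port=resolve_port())
```

A launcher like this would let CMD stay in exec form with no shell involved, at the cost of one extra module to maintain.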
frontend/Dockerfile
Outdated
# Runtime stage
FROM node:20-alpine3.22
WORKDIR /app
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/node_modules ./node_modules
COPY frontend/package.json ./
ENV NODE_ENV=production
EXPOSE 3000
CMD ["npm", "run", "start"]
The runtime stage of this Dockerfile can be significantly optimized. The next.config.ts is configured with output: 'standalone', which creates a minimal server build in .next/standalone. However, this Dockerfile copies the entire node_modules directory from the builder stage, which includes development dependencies and makes the final image much larger than necessary.
To leverage the standalone output, you should copy the .next/standalone directory and the .next/static directory, and then run the server with node server.js. This will result in a smaller and more secure production image.
# Runtime stage
FROM node:20-alpine3.22
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/public ./public
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
EXPOSE 3000
CMD ["node", "server.js"]
Code Review
This pull request refactors the application from a monolithic service into separate backend and frontend services, which is a great improvement for scalability and maintainability. The changes include new Dockerfiles for each service and an updated docker-compose.yml.
My review focuses on improving the new Docker configurations:
- In backend/Dockerfile, I've suggested using exec in the CMD instruction for proper signal handling.
- In docker-compose.yml, I've pointed out that the frontend service needs the SERVER_BASE_URL environment variable to communicate with the backend. I've also suggested restoring some environment variables and resource limits for the backend service for better resilience and production stability.
- In frontend/Dockerfile, I've highlighted the importance of using a .dockerignore file to prevent copying unnecessary files like node_modules into the Docker image.
These changes should make the new setup more robust and efficient.
environment:
- NODE_ENV=production
The frontend service needs to communicate with the backend service. The next.config.ts file uses process.env.SERVER_BASE_URL with a fallback to http://localhost:8001. Inside the frontend container, localhost refers to the container itself, not the backend container. To fix this, you should set SERVER_BASE_URL to http://backend:8001 in the frontend service's environment, as the services are on the same Docker network.
environment:
- NODE_ENV=production
- SERVER_BASE_URL=http://backend:8001
# A health-check endpoint should be implemented in FastAPI
EXPOSE 8001
CMD ["sh", "-c", "uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]
The use of sh -c in CMD can lead to issues with signal handling, as the shell might not forward signals to the uvicorn process, preventing a graceful shutdown. To ensure graceful shutdowns, you can use exec to replace the shell process with the uvicorn process. Also, the file should end with a newline character for POSIX compatibility.
CMD ["sh", "-c", "exec uvicorn api.api:app --host 0.0.0.0 --port ${PORT}"]
env_file:
- .env
environment:
- PORT=${PORT:-8001}
- NODE_ENV=production
- SERVER_BASE_URL=http://localhost:${PORT:-8001}
- LOG_LEVEL=${LOG_LEVEL:-INFO}
- LOG_FILE_PATH=${LOG_FILE_PATH:-api/logs/application.log}
volumes:
- ~/.adalflow:/root/.adalflow # Persist repository and embedding data
- ./api/logs:/app/api/logs # Persist log files across container restarts
# Resource limits for docker-compose up (not Swarm mode)
mem_limit: 6g
mem_reservation: 2g
# Health check configuration
- ./backend/api/logs:/app/api/logs # Persist log files across container restarts
The backend service configuration is missing some environment variables (LOG_LEVEL, LOG_FILE_PATH) and resource limits (mem_limit, mem_reservation) that were present in the original deepwiki service. While these might be set in the .env file, explicitly defining them with defaults in the compose file improves clarity and makes the service more resilient. Resource limits are also important for production stability.
env_file:
- .env
environment:
- LOG_LEVEL=${LOG_LEVEL:-INFO}
- LOG_FILE_PATH=${LOG_FILE_PATH:-/app/api/logs/application.log}
volumes:
- ~/.adalflow:/root/.adalflow # Persist repository and embedding data
- ./backend/api/logs:/app/api/logs # Persist log files across container restarts
mem_limit: 6g
mem_reservation: 2g
WORKDIR /app
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci --legacy-peer-deps
COPY frontend/ ./
This COPY command is very broad and copies the entire frontend directory. This can be problematic if a local node_modules directory exists, as it will be copied into the Docker image, increasing its size and potentially causing issues. It is highly recommended to add a .dockerignore file in the root of your project to exclude node_modules and other unnecessary files from the Docker build context.
Summary
The original monolithic layout of DeepWiki-Open has become hard to navigate and contribute to. This PR splits the codebase into clear backend and frontend modules, and relocates all language-specific README files into a dedicated docs/ folder. These changes lay the groundwork for:
🔍 Motivation
The old structure mixed API, UI and docs at the root. By splitting into backend/ & frontend/, you can work on one part without accidentally touching the other.
Future features (new UIs, CLI tools, data-processing jobs) can live in their own submodules.
Having dozens of README.xx.md files at top level was cluttered. Now each lives under docs/{lang}/README.md, making it trivial to find and update language-specific guides.
What's changed
New folders
- backend/ – all Node/Python server code, Dockerfiles, tests
- frontend/ – Next.js app, Tailwind config, UI components
- docs/ – per-language READMEs (en/, zh/, vi/, etc.)
Removed top-level language READMEs and merged into docs/
Updated CI/CD and Docker Compose to mount the new module paths
Refactored import paths and build scripts to point at backend/ & frontend/
Cleaned up root directory: only config files, LICENSE, and high-level entrypoints remain
📷 Before & After
Before (monolith)
After (modular)
Benefits