Summary of Changes
Hello @flyinglandlord, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a proactive shared memory (/dev/shm) monitoring feature to LightLLM. It aims to alert users about potentially insufficient SHM space, which is critical for optimal performance, by providing both continuous periodic checks and an option for a one-time startup verification.
Highlights
- New Feature: SHM Size Monitoring: LightLLM now includes a mechanism to periodically check the available `/dev/shm` size, logging warnings if it falls below the recommended 128GB threshold. This helps users identify potential memory issues that could impact performance.
- New Command-Line Argument: A new `--disable-shm-warning` argument has been added to `api_cli.py`. When this argument is set, LightLLM will perform a single SHM size check only during startup and then disable further periodic checks, providing flexibility for users who prefer a less verbose output.
lightllm/server/api_start.py
Outdated
| def get_shm_size_gb(): | ||
| """ | ||
| Get the total size of /dev/shm (in GB). | ||
| """ | ||
| try: | ||
| shm_path = "/dev/shm" | ||
| if not os.path.exists(shm_path): | ||
| logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.") | ||
| return 0 | ||
| # shutil.disk_usage returns (total, used, free) | ||
| total_bytes = shutil.disk_usage(shm_path).total | ||
| total_gb = total_bytes / (1024 ** 3) | ||
| return total_gb | ||
| except Exception as e: | ||
| logger.error(f"Error getting /dev/shm size: {e}") | ||
| return 0 |
The function calculates the total SHM size but the PR description refers to "available" space. This can be misleading. To check for available space, use shutil.disk_usage(shm_path).free and rename the function to get_shm_free_size_gb for clarity.
| def get_shm_size_gb(): | |
| """ | |
| Get the total size of /dev/shm (in GB). | |
| """ | |
| try: | |
| shm_path = "/dev/shm" | |
| if not os.path.exists(shm_path): | |
| logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.") | |
| return 0 | |
| # shutil.disk_usage returns (total, used, free) | |
| total_bytes = shutil.disk_usage(shm_path).total | |
| total_gb = total_bytes / (1024 ** 3) | |
| return total_gb | |
| except Exception as e: | |
| logger.error(f"Error getting /dev/shm size: {e}") | |
| return 0 | |
| def get_shm_free_size_gb(): | |
| """ | |
| Get the available size of /dev/shm (in GB). | |
| """ | |
| try: | |
| shm_path = "/dev/shm" | |
| if not os.path.exists(shm_path): | |
| logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.") | |
| return 0 | |
| # shutil.disk_usage returns (total, used, free) | |
| free_bytes = shutil.disk_usage(shm_path).free | |
| free_gb = free_bytes / (1024 ** 3) | |
| return free_gb | |
| except Exception as e: | |
| logger.error(f"Error getting /dev/shm size: {e}") | |
| return 0 |
lightllm/server/api_start.py
Outdated
| required_size = 128 # 128G | ||
| if shm_size < required_size: | ||
| logger.warning(f"{RED}Available shm size is less than 128G: {shm_size:.2f}G{ENDC}") | ||
| else: | ||
| logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {required_size} GB).{ENDC}") |
For better maintainability, define required_size as a constant REQUIRED_SIZE_GB and use it in the warning message to avoid hardcoding the value 128G.
| required_size = 128 # 128G | |
| if shm_size < required_size: | |
| logger.warning(f"{RED}Available shm size is less than 128G: {shm_size:.2f}G{ENDC}") | |
| else: | |
| logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {required_size} GB).{ENDC}") | |
| REQUIRED_SIZE_GB = 128 # 128G | |
| if shm_size < REQUIRED_SIZE_GB: | |
| logger.warning(f"{RED}Available shm size is less than {REQUIRED_SIZE_GB}G: {shm_size:.2f}G{ENDC}") | |
| else: | |
| logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {REQUIRED_SIZE_GB} GB).{ENDC}") |
lightllm/server/api_start.py
Outdated
| import threading | ||
| def periodic_shm_warning(): | ||
| while True: | ||
| check_shm_size() | ||
| time.sleep(120) # log the warning every 120 seconds | ||
| shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True) | ||
| shm_warning_thread.start() |
For style and maintainability:
- Move `import threading` to the top of the file (PEP 8).
- Define the sleep interval `120` as a named constant (e.g., `SHM_CHECK_INTERVAL_S = 120`) at the module level.
Since I can't suggest changes outside the diff, I'll define the constant locally, but consider moving both the import and the constant to the module level.
| import threading | |
| def periodic_shm_warning(): | |
| while True: | |
| check_shm_size() | |
| time.sleep(120) # log the warning every 120 seconds | |
| shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True) | |
| shm_warning_thread.start() | |
| import threading | |
| SHM_CHECK_INTERVAL_S = 120 # Consider moving this to a module-level constant | |
| def periodic_shm_warning(): | |
| while True: | |
| check_shm_size() | |
| time.sleep(SHM_CHECK_INTERVAL_S) # log the warning every 120 seconds | |
| shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True) | |
| shm_warning_thread.start() |
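For reference, the module-level layout suggested in this thread might look roughly like the sketch below. It is an illustration only: `start_periodic_shm_warning` is a hypothetical wrapper, and `check_shm_size` is stubbed out here because the PR's real implementation is not reproduced in this thread.

```python
import threading
import time

SHM_CHECK_INTERVAL_S = 120  # seconds between checks, defined once at module level


def check_shm_size():
    # Stand-in for the PR's real check; it only prints in this sketch.
    print("checking /dev/shm ...")


def start_periodic_shm_warning(check=check_shm_size, interval_s=SHM_CHECK_INTERVAL_S):
    """Run check() immediately, then every interval_s seconds, in a daemon thread."""

    def loop():
        while True:
            check()
            time.sleep(interval_s)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Because the thread is a daemon, it does not keep the process alive at shutdown, which matches the `daemon=True` choice in the diff.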
| args.cache_capacity * (image_size_bytes + image_token_size_bytes) | ||
| + req_shm_size_bytes | ||
| + out_token_queue_size_bytes | ||
| ) |
args.running_max_req_size * 8 * 2 * args.max_req_total_len
size of input_tokens and logprobs
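One possible reading of the two comments above is that the extra term covers two per-token arrays (`input_tokens` and `logprobs`) at 8 bytes per element for each running request. That interpretation, the helper name `token_buffers_bytes`, and the argument values below are assumptions made for illustration; they are not taken from the PR.

```python
def token_buffers_bytes(running_max_req_size, max_req_total_len,
                        bytes_per_elem=8, num_arrays=2):
    """Bytes for num_arrays per-token buffers across all running requests."""
    return running_max_req_size * bytes_per_elem * num_arrays * max_req_total_len


# Hypothetical settings: 1000 concurrent requests, 16384 tokens each.
size = token_buffers_bytes(1000, 16384)
print(f"{size} bytes = {size / 1024 ** 3:.3f} GB")
```

With these example values the term comes to 262,144,000 bytes, roughly a quarter of a gigabyte.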
lightllm/utils/shm_size_check.py
Outdated
| # Assume that loading an image at the maximum resolution yields the largest number of image_tokens from the tokenizer | ||
| if not hasattr(tokenizer, "get_image_token_length"): | ||
| raise AttributeError("Tokenizer must have a 'get_image_token_length' method for multimodal models.") | ||
| max_image_tokens = tokenizer.get_image_token_length(None) |
Construct an img and pass it in to avoid errors.
lightllm/utils/shm_size_check.py
Outdated
| req_class_size = ctypes.sizeof(ChunkedPrefillReq) | ||
| req_shm_size_bytes = req_class_size * args.running_max_req_size | ||
| # Estimate the shm size required by OutTokenQueue | ||
This part has already been counted by the sizeof above, so there is no need to compute it again.
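`ctypes.sizeof` on a structure already includes every structure or array embedded in it by value, so adding an embedded queue separately would count it twice. The sketch below uses hypothetical stand-in structures; the real layout of `ChunkedPrefillReq` is not shown in this thread, and the sketch assumes the queue is a by-value field of the request.

```python
import ctypes


class OutTokenQueue(ctypes.Structure):
    # Hypothetical fixed-capacity queue stored inline in the request.
    _fields_ = [
        ("items", ctypes.c_int * 16),
        ("head", ctypes.c_int),
        ("tail", ctypes.c_int),
    ]


class Req(ctypes.Structure):
    # Hypothetical request struct that embeds the queue by value.
    _fields_ = [
        ("request_id", ctypes.c_int),
        ("out_queue", OutTokenQueue),
    ]


# sizeof(Req) already covers the embedded OutTokenQueue.
print(ctypes.sizeof(OutTokenQueue), ctypes.sizeof(Req))
```

If the queue were referenced through a pointer instead, only the pointer would be counted and a separate estimate would still be needed.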
New Startup Command: `--disable-shm-warning`.
By default, LightLLM will now periodically check the available SHM size every 120 seconds and log warnings if it's below the recommended threshold (128GB).
If `--disable-shm-warning` is set, LightLLM will perform a one-time SHM size check only during startup, and then disable further periodic checks.
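The flag's described behavior can be sketched with `argparse`. This assumes the option is a plain boolean switch (`action="store_true"`); the actual definition in `api_cli.py` is not shown here, and `build_parser` and `plan_shm_checks` are hypothetical names used only for this illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disable-shm-warning",
        action="store_true",
        help="check /dev/shm once at startup, then skip the periodic checks",
    )
    return parser


def plan_shm_checks(args):
    """Return which checking mode the parsed arguments select."""
    return "startup-only" if args.disable_shm_warning else "periodic"


print(plan_shm_checks(build_parser().parse_args([])))
print(plan_shm_checks(build_parser().parse_args(["--disable-shm-warning"])))
```

Note that, despite its name, the flag as described does not suppress the check entirely: one check still runs at startup.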