-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Routing: Reduce peak memory usage #5488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
似乎不是 #5480 (comment) ? |
|
这块我记得最开始是为了降低 CPU 占用 后来有人反馈这样启动时内存峰值过高 不过一般使用场景下 geodat 高内存占用,根源还是由 domainmatcher 导致的 |
|
😁 这个 PR 改成给 geodat 加索引文件,只读取部分 geodat 而非整个 除非核心启动后的 gc 不能完全释放掉那些内存 |
|
如果只是这样的话那就没必要了,iOS 的 50MB 限制都够加载两个 geo 文件 |
I think there is a small misunderstanding here. This PR is not primarily about optimizing partial unmarshal for performance, but about solving a real memory-limit issue on iOS Packet Tunnel. In practice on iOS: So the issue is not whether “50 MB should theoretically be enough”, but that on iOS Packet Tunnel the startup peak memory usage (protobuf unmarshal + runtime overhead + page cache) can exceed the system limit and cause the process to be killed immediately. |
Yes, that’s correct. In practice the main issue appears before and during the construction of routing rules, not just during the initial geodat load. When using a large geosite (for example geosite:cn, which expands to ~7000 domain entries), the process is roughly: During this phase, both the expanded routing rules and the domain matcher structures coexist in memory, which significantly increases peak memory usage. On iOS Packet Tunnel, this peak is often enough to exceed the system limit and cause the process to be killed. This PR pervent keep []*RoutingRule in memory. |
如果只是这样的话还好,但 pre-map 似乎没必要吧?不是已经用上 mmap 了吗 最优解应该还是 #5480 (comment) ,有兴趣做一下吗? |
My bad, this should be fixed now.
I may be missing something here, could you please clarify what you mean or give some guidance on how you’d suggest approaching this? |
把 iOS 上 build 好的 DomainMatcher dump 到文件,然后用 mmap 读取并用 unsafe 转回 DomainMatcher,这个能实现吗 |
This looks tricky, but I see it as a next approach. I will try to explore it further next week. |
|
|
|
当前的流程是这样的:
因此改进的方向是 1 和 3 此 PR 做的是 1 针对 3 在循环里一边清理一边构建不就行了吗? |
|
都是启动阶段峰值内存问题且 1 已经由 mmap 解决了 #5480 ,据 @hossinasaadi 说这个 PR 解决的是 3 |
|
解决的似乎并不多 |
|
@Fangliding 话说你测一下这个 PR 现在的代码能降低峰值内存占用吗 |
|
这个 pr 我没有仔细 review 等下我再看看 mmap 不是滑动窗口,万一 seek 的分类比较靠后还是会加载很多,只能说缓解但不根治 |
|
我同时测了 5480和5482 的表现 只是comment在那而已 意思是 没想象的高(一百多MB的峰值少了六七MB)
这是针对苹果平台的 所以我一直没想发表意见 |
|
@hossinasaadi 这个 PR 现在的代码是依赖 mmap 的吗 |
|
Yes. It’s on mmap as far I checked. |
|
那你改个更精确的标题吧,准备合了,新增那个文件放 common/platform/filesystem 就行,别嵌套太多文件夹
|
|
我对 core 配置加载和启动过程中各对象的生命周期没有足够了解 3 是更激进地 这个 pr 应该是为了解决 config 对象被谁长期持有没法回收占用了内存 |
if we just rr.xxx = nil , seems api(update or add rule) will broken and wont update the list or may cause of other issues. |
|
@RPRX revert to |
|
只有 Windows 和 WASM 不支持 mmap 的话,不用 revert 了吧 解决一下 conflicts |
|
然后标题也要改一下 |
No need to revert. |
光从代码看 api 的 reload 函数所需的 config.rr.xxx 是从函数传入的,不是对象的字段,似乎可以直接 rr.xxx = nil 不会有副作用 我没有用过 api 仅从代码推测 如果可以的话代码会更干净 |
|
c715154遇到问题,会使用1.1.1.1解析所有出站,回退b38a412正常,访问国内地址使用223.5.5.5解析。 配置文件: "dns": { |
@Tsukimeizi Thanks for reporting this. Could you please confirm whether this PR resolves the issue? |
It works well. |
@RPRX So far, I've done the following: This prebuild step may only be necessary on iOS. If so, we can guard it behind a flag and enable it only on iOS; otherwise, we can fall back to the normal loading path. What needs to be add? |
|
@hossinasaadi 没想到这么快就实现了,iOS 上 geo 文件可以随便用了,或许彻底解决了 iOS 上内存受限的问题,你先 PR 一下吧 开启方式就环境变量指定临时文件夹 path 吧,Linux 硬路由上也需要这个,其它内存不受限的场景倒没必要,把规则放内存中更快 有了 PR 后你们测试一下 @iambabyninja @mangustyura |
|
@Meo597 我仔细查了下 mmap 确实是懒加载,所以说它有没有类似 swap 的机制,就是说读完没用的不再占用运行内存 |
This change can be implemented on other platforms as well (except Windows and WASM), but it has been tested only on Darwin (macOS) and iOS so far.
On iOS, the full CN geo file can now be loaded successfully without crashes or excessive memory usage.
On macOS, overall memory usage is reduced by approximately 20 MB based on local testing.