[Trainer] remove redundant memory metrics and set enable as default #8374

SylarTiaNII · 2024-05-07T07:56:38Z

PR types

Others

PR changes

Others

Description

Remove redundant memory metrics and enable printing of memory metrics by default.

paddle-bot · 2024-05-07T07:56:42Z

Thanks for your contribution!

ZHUI · 2024-05-07T08:01:30Z

paddlenlp/trainer/trainer.py

+                    logs["current_memory_allocated"] = current_memory_allocated / divisor
+                    logs["current_memory_reserved"] = current_memory_reserved / divisor
+                    logs["max_memory_allocated"] = max_memory_allocated / divisor
+                    logs["max_memory_reserved"] = max_memory_reserved / divisor


PaddleNLP/scripts/distribute/ci_case_dy.sh

Line 454 in 09a0ce7

mem=`cat $log_dir/workerlog.0 | grep 'global_step: 30' | awk -F 'gpu_max_memory_reserved: ' '{print $2}' | awk -F ',' '{print $1}'`

This spot should be updated to match as well: gpu_max_memory_reserved -> max_memory_reserved
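To illustrate why the CI script needs the matching rename, here is a minimal Python sketch of the same extraction the `grep | awk | awk` pipeline performs. The helper name `extract_metric` and the sample log line are hypothetical; the real `workerlog.0` format may differ.

```python
from typing import Optional


def extract_metric(log_text: str, step: int, key: str) -> Optional[str]:
    """Return the text logged after '<key>: ' on the line for the given step, or None."""
    marker = f"global_step: {step}"
    for line in log_text.splitlines():
        if marker not in line:
            continue  # same filtering as grep 'global_step: 30'
        # Same split as the two awk calls: take what follows "<key>: " up to the next comma.
        _, sep, rest = line.partition(f"{key}: ")
        if sep:
            return rest.split(",", 1)[0].strip()
    return None


# Hypothetical log line using the renamed key from this PR.
sample_log = "loss: 2.1, global_step: 30, max_memory_reserved: 1024, interval_runtime: 0.5"

old_key_result = extract_metric(sample_log, 30, "gpu_max_memory_reserved")
new_key_result = extract_metric(sample_log, 30, "max_memory_reserved")
print(old_key_result, new_key_result)
```

With the renamed metric, a lookup by the old key finds nothing, so the `mem` variable in the script would presumably end up empty unless the key is updated too.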

ZHUI · 2024-05-07T08:04:27Z

paddlenlp/trainer/trainer.py

+                    max_memory_reserved = core.device_memory_stat_peak_value("Reserved", device_id)
+                    logs["current_memory_allocated"] = current_memory_allocated / divisor
+                    logs["current_memory_reserved"] = current_memory_reserved / divisor
+                    logs["max_memory_allocated"] = max_memory_allocated / divisor


Suggested change

-                    logs["max_memory_allocated"] = max_memory_allocated / divisor
+                    logs["max_memory_allocated"] = max_memory_allocated >> 20

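This was previously reported in MB; I suggest not changing that and keeping the original form. Division produces a float, which brings decimal places into the output. I suggest using a bit shift directly, with MB as the unit.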
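The difference between the two forms can be seen in a small sketch. It assumes `divisor` is 2**20 (the PR snippet does not show its value), and the byte count is made up for illustration.

```python
MIB_SHIFT = 20  # 2**20 bytes per MB, as used by the reviewer's ">> 20"


def to_mb_shift(num_bytes: int) -> int:
    """Integer MB via bit shift; the fractional part is dropped."""
    return num_bytes >> MIB_SHIFT


def to_mb_div(num_bytes: int, divisor: int = 1 << 20) -> float:
    """MB via true division; always a float, usually with decimals."""
    return num_bytes / divisor


# Hypothetical peak value in bytes: 3 GiB plus a little extra.
peak_bytes = 3 * (1 << 30) + 123_456

shifted = to_mb_shift(peak_bytes)
divided = to_mb_div(peak_bytes)
print(shifted, divided)
```

The shift yields a plain integer equal to floor division by 2**20, so the logged value stays in the same form as the earlier MB-based output, while true division adds a fractional tail.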

codecov · 2024-05-07T08:27:03Z

Codecov Report

❌ Patch coverage is 11.11111% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.60%. Comparing base (09a0ce7) to head (4821ce2).
⚠️ Report is 1214 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/trainer/trainer.py	11.11%	16 Missing ⚠️

❌ Your patch status has failed because the patch coverage (11.11%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (54.60%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8374      +/-   ##
===========================================
- Coverage    55.36%   54.60%   -0.77%     
===========================================
  Files          614      636      +22     
  Lines        96016   108827   +12811     
===========================================
+ Hits         53164    59428    +6264     
- Misses       42852    49399    +6547


ZHUI · 2024-05-07T08:47:48Z

paddlenlp/trainer/training_args.py

    )
    skip_memory_metrics: bool = field(
-        default=True, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
+        default=False, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}


Let's leave this default unchanged. If you need the memory metrics, turn them on yourselves.
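A minimal sketch of the opt-in approach the reviewer describes, using a stand-in dataclass rather than the real training arguments class. `DemoTrainingArgs` is hypothetical; only the field name, default, and help text come from the diff above.

```python
from dataclasses import dataclass, field


@dataclass
class DemoTrainingArgs:
    # Stand-in for illustration only: mirrors the field from the diff with its original default.
    skip_memory_metrics: bool = field(
        default=True,
        metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."},
    )


# Default behaviour stays unchanged: memory metrics are skipped.
default_args = DemoTrainingArgs()

# Callers who want the memory metrics opt in explicitly.
opted_in_args = DemoTrainingArgs(skip_memory_metrics=False)

print(default_args.skip_memory_metrics, opted_in_args.skip_memory_metrics)
```

Keeping the default at `True` leaves existing log output untouched for everyone else, at the cost of one extra argument for users who want the metrics.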

ZHUI

LGTM

github-actions · 2024-07-10T00:18:15Z

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions · 2024-09-09T00:20:57Z

This Pull Request is stale because it has been open for 60 days with no activity.

CLAassistant · 2024-10-14T07:15:24Z

All committers have signed the CLA.

github-actions · 2025-01-05T00:23:14Z

This Pull Request is stale because it has been open for 60 days with no activity.

paddle-bot · 2026-01-06T06:39:23Z

Automatically closed by Paddle-bot.

ZHUI reviewed May 7, 2024

SylarTiaNII force-pushed the modify_logger branch from 1059bb4 to 69a2c35 Compare May 8, 2024 06:11

[Trainer] remove redundant memory metrics and set enable as default

4821ce2

SylarTiaNII force-pushed the modify_logger branch from 69a2c35 to 4821ce2 Compare May 8, 2024 07:05

ZHUI approved these changes May 8, 2024

PaddlePaddle locked and limited conversation to collaborators May 8, 2024

PaddlePaddle unlocked this conversation May 8, 2024

ZHUI closed this May 9, 2024

ZHUI reopened this May 9, 2024

github-actions bot added the stale label Jul 10, 2024

ZHUI closed this Jul 10, 2024

ZHUI reopened this Jul 10, 2024

PaddlePaddle locked and limited conversation to collaborators Jul 10, 2024

PaddlePaddle unlocked this conversation Jul 10, 2024

github-actions bot removed the stale label Jul 11, 2024

github-actions bot added the stale label Sep 9, 2024

github-actions bot removed the stale label Oct 15, 2024

github-actions bot added the stale label Jan 5, 2025

paddle-bot bot closed this Jan 6, 2026
