Conversation

@SylarTiaNII
Contributor

PR types

Others

PR changes

Others

Description

remove redundant memory metrics and enable memory metrics print as default

@paddle-bot

paddle-bot bot commented May 7, 2024

Thanks for your contribution!

logs["current_memory_allocated"] = current_memory_allocated / divisor
logs["current_memory_reserved"] = current_memory_reserved / divisor
logs["max_memory_allocated"] = max_memory_allocated / divisor
logs["max_memory_reserved"] = max_memory_reserved / divisor
Contributor

mem=`cat $log_dir/workerlog.0 | grep 'global_step: 30' | awk -F 'gpu_max_memory_reserved: ' '{print $2}' | awk -F ',' '{print $1}'`

This spot should be updated to match as well: gpu_max_memory_reserved -> max_memory_reserved

Contributor Author

Done.
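
The rename in this thread matters to any script that scrapes the metric out of a worker log by key. As a minimal Python sketch of the same extraction the shell pipeline performs, the value can be looked up by its exact key. The sample log line and the helper name `extract_metric` are assumptions made for illustration; the real `workerlog.0` format may differ.

```python
import re
from typing import Optional

# Hypothetical log line; the real worker log format may differ.
SAMPLE_LINE = "global_step: 30, loss: 2.31, max_memory_reserved: 40960, interval_runtime: 1.2"


def extract_metric(line: str, key: str) -> Optional[str]:
    """Return the text between '<key>: ' and the next comma, or None if the key is absent.

    The lookbehind keeps a short key from matching the tail of a longer one,
    e.g. 'max_memory_reserved' inside 'gpu_max_memory_reserved'.
    """
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(key) + r": ([^,]*)"
    match = re.search(pattern, line)
    return match.group(1).strip() if match else None


# The new key is found; the old key is no longer present in the line.
new_value = extract_metric(SAMPLE_LINE, "max_memory_reserved")
old_value = extract_metric(SAMPLE_LINE, "gpu_max_memory_reserved")
print(new_value, old_value)
```

A lookup that still uses the old `gpu_max_memory_reserved` key returns `None` here, which is the kind of silent breakage the reviewer's comment guards against.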

max_memory_reserved = core.device_memory_stat_peak_value("Reserved", device_id)
logs["current_memory_allocated"] = current_memory_allocated / divisor
logs["current_memory_reserved"] = current_memory_reserved / divisor
logs["max_memory_allocated"] = max_memory_allocated / divisor
Contributor

Suggested change
- logs["max_memory_allocated"] = max_memory_allocated / divisor
+ logs["max_memory_allocated"] = max_memory_allocated >> 20

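This value was previously reported in MB. I suggest not changing that and keeping the original form. Division produces a float, which brings decimal places into the output; I suggest using a bit shift directly, with MB as the unit.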

Contributor Author

Done.
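
To make the unit discussion concrete, here is a small sketch comparing the two conversions. It assumes the raw statistics are integer byte counts and that `divisor` would be `2**20`; neither is confirmed by the snippet in this thread, and the helper names are hypothetical.

```python
MIB_SHIFT = 20  # 2**20 bytes per MB (MiB), the unit the reviewer asks to keep


def to_mb_shift(num_bytes: int) -> int:
    """Convert bytes to whole MB with a right shift; the result stays an int."""
    return num_bytes >> MIB_SHIFT


def to_mb_div(num_bytes: int) -> float:
    """Convert bytes to MB with true division; the result is a float."""
    return num_bytes / (1 << MIB_SHIFT)


# Assumed example value: 3 GiB plus a partial MB.
peak_bytes = 3 * (1 << 30) + 123_456

shifted = to_mb_shift(peak_bytes)   # 3072, fractional MB discarded
divided = to_mb_div(peak_bytes)     # a float slightly above 3072

print(type(shifted).__name__, shifted)
print(type(divided).__name__, divided)
```

The shift is equivalent to floor division by `2**20` for non-negative integers, so logged values stay integral and easy to compare across runs.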

@codecov

codecov bot commented May 7, 2024

Codecov Report

❌ Patch coverage is 11.11111% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.60%. Comparing base (09a0ce7) to head (4821ce2).
⚠️ Report is 1214 commits behind head on develop.

Files with missing lines Patch % Lines
paddlenlp/trainer/trainer.py 11.11% 16 Missing ⚠️

❌ Your patch status has failed because the patch coverage (11.11%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (54.60%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8374      +/-   ##
===========================================
- Coverage    55.36%   54.60%   -0.77%     
===========================================
  Files          614      636      +22     
  Lines        96016   108827   +12811     
===========================================
+ Hits         53164    59428    +6264     
- Misses       42852    49399    +6547     

)
skip_memory_metrics: bool = field(
- default=True, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
+ default=False, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
Contributor

Let's leave this default unchanged. If you need the memory metrics, turn them on yourselves.
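
A minimal sketch of what keeping the flag opt-in looks like from the caller's side follows. `DemoArguments` and `build_logs` are hypothetical stand-ins written for this illustration, not the real PaddleNLP classes; only the field name, its help text, and the `max_memory_reserved` key come from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DemoArguments:
    # Hypothetical stand-in for the real training-arguments class.
    skip_memory_metrics: bool = field(
        default=True,
        metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."},
    )


def build_logs(args: DemoArguments, max_memory_reserved_bytes: int) -> Dict[str, int]:
    """Add the memory metric (in MB) only when the caller has not skipped it."""
    logs: Dict[str, int] = {}
    if not args.skip_memory_metrics:
        logs["max_memory_reserved"] = max_memory_reserved_bytes >> 20
    return logs


# Default behaviour: memory metrics are skipped.
default_logs = build_logs(DemoArguments(), 1 << 30)

# Users who want the metrics opt in explicitly.
opt_in_logs = build_logs(DemoArguments(skip_memory_metrics=False), 1 << 30)

print(default_logs, opt_in_logs)
```

With the default left at `True`, existing users see no change in their logs, and anyone who needs the numbers passes `skip_memory_metrics=False`.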

Contributor

@ZHUI ZHUI left a comment

LGTM

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators May 8, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation May 8, 2024
@ZHUI ZHUI closed this May 9, 2024
@ZHUI ZHUI reopened this May 9, 2024
@github-actions

This Pull Request is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Jul 10, 2024
@ZHUI ZHUI closed this Jul 10, 2024
@ZHUI ZHUI reopened this Jul 10, 2024
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Jul 10, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation Jul 10, 2024
@github-actions github-actions bot removed the stale label Jul 11, 2024
@github-actions

github-actions bot commented Sep 9, 2024

This Pull Request is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Sep 9, 2024
@CLAassistant

CLAassistant commented Oct 14, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot removed the stale label Oct 15, 2024
@github-actions

github-actions bot commented Jan 5, 2025

This Pull Request is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Jan 5, 2025
@paddle-bot paddle-bot bot closed this Jan 6, 2026
@paddle-bot

paddle-bot bot commented Jan 6, 2026

Automatically closed by Paddle-bot.

3 participants