
Commit b1265ee

Merge pull request #294 from Tencent/jiaruifang/accurate_timer
accurate timer
2 parents bc9683f + 1a74991 commit b1265ee


3 files changed, +7 -2 lines changed


CHANGE_LOG.md

Lines changed: 4 additions & 1 deletion
@@ -1,5 +1,8 @@
 ## v0.4.5 Dec. 2021
-refactory the files in example and adding chunk size searching.
+Evaluate on 8 nodes of SuperPod. Fix bugs in multi-GPU mem tracer.
+
+## v0.4.5 Dec. 2021
+Refactor the files in example and add chunk size searching.
 
 
 ### v0.4.4 Dec. 2021

patrickstar/utils/global_timer.py

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 import time
+import torch
 
 # from .logging import logger
 from .singleton_meta import SingletonMeta
@@ -57,6 +58,7 @@ def start_profile(self, key):
     def finish_profile(self, key):
         if not self.start_flag:
             return
+        torch.cuda.current_stream().synchronize()
         if key in self.elapse_stat:
             self.elapse_stat[key] += time.time() - self.start_time[key]
         else:

setup.py

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ def fetch_requirements(path):
 
 setup(
     name="patrickstar",
-    version="0.4.5",
+    version="0.4.6",
     description="PatrickStart library",
     long_description="PatrickStar: Parallel Training of Large Language Models via a Chunk-based Parameter Server",
     long_description_content_type="text/markdown",
