feature(xjy): add the rnd-related features #438

xiongjyu · 2025-11-07T09:06:15Z

No description provided.


lzero/entry/train_unizero_segment_with_reward_model.py

lzero/entry/__init__.py

puyuan1996 · 2025-11-10T06:24:33Z

lzero/entry/train_unizero_segment_with_reward_model.py

+                train_data_augmented.append(learner.train_iter)
+
+                log_vars = learner.train(train_data_augmented, collector.envstep)
+                reward_model.train_with_policy_batch(train_data)


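Shouldn't the reward_model be trained for some iterations first, so that UniZero then uses the trained RND network to estimate the fused reward before training its own network? In the current version the fused reward changes at every iteration, which seems too non-stationary for learning on the UniZero side.

Right. We have now added the adaptive parameter we discussed earlier: it starts at 0 and slowly ramps up after a while. In the initial phase this amounts to training only the RND network, without using the intrinsic reward.

Are all the newly launched runs using this method?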
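For reference, the warm-up idea discussed in this thread could look roughly like the sketch below. It is not the code in this PR: the function names, the default values, and the linear ramp shape are all assumptions made for illustration. The thread only states that the weight starts at 0 and rises slowly after some time.

```python
def intrinsic_weight(
    train_iter: int,
    warmup_iters: int = 1000,   # hypothetical: iterations with weight 0 (RND-only phase)
    ramp_iters: int = 4000,     # hypothetical: iterations over which the weight rises
    max_weight: float = 0.1,    # hypothetical: final weight on the intrinsic reward
) -> float:
    """Weight on the intrinsic reward: 0 during warm-up, then a linear ramp to max_weight."""
    if train_iter < warmup_iters:
        return 0.0
    progress = (train_iter - warmup_iters) / ramp_iters
    return max_weight * min(1.0, progress)


def fuse_reward(extrinsic: float, intrinsic: float, train_iter: int) -> float:
    """Fused reward: extrinsic reward plus the weighted intrinsic (RND) reward."""
    return extrinsic + intrinsic_weight(train_iter) * intrinsic
```

With a schedule of this kind, the RND network still receives training batches from the first iteration, but the reward seen by UniZero equals the extrinsic reward until the warm-up ends, which addresses the non-stationarity concern for the early phase only.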

lzero/reward_model/rnd_reward_model.py


xiongjyu added 2 commits November 6, 2025 17:23

Fix norm_type, kv_cache rewrite, _reset_collect/eval, init_weight, an…

6a92678

…d AdamW; add value_priority, adaptive policy entropy control, encoder-clip, label smoothing, latent representation analysis option, and cosine similarity loss.

feature(xjy): add the rnd-related features

1cf8688

puyuan1996 reviewed Nov 10, 2025


puyuan1996 added the research Research work in progress label Nov 10, 2025

xiongjyu changed the title ~~Dev rnd~~ feature(xjy): add the rnd-related features Nov 10, 2025

xiongjyu added 4 commits November 11, 2025 01:52

add dynamic control weights + intrinsic reward-state mapping graph; s…

e9314d1

…ync latest params

add episode-level RND intrinsic reward evaluation

ac58169

fix a bug on evaluation

b7015d8

modify some cfg

0eb9792
