<!DOCTYPE HTML>
<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  <title>Yukai Shi (石瑜恺)</title>

  <meta name="author" content="Yukai Shi">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <link rel="stylesheet" type="text/css" href="stylesheet.css">
	<link rel="icon" href="images/yukai.jpg">
</head>

<body>
  <table style="width:100%;max-width:800px;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
    <tr style="padding:0px">
      <td style="padding:0px">
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr style="padding:0px">
            <td style="padding:2.5%;width:63%;vertical-align:middle">
              <p style="text-align:center">
                <name>Yukai Shi (石瑜恺)</name>
              </p>
              <p>I am a PhD student (2022-present) at <a href="https://www.tsinghua.edu.cn/">Tsinghua University</a>, supervised by Professor <a href="https://scholar.google.com/citations?user=9akH-n8AAAAJ&hl=zh-CN">Heung-Yeung Shum</a> (former executive vice president of Microsoft). Previously, I received my bachelor's degree from the School of Artificial Intelligence at <a href="https://en.xidian.edu.cn/">Xidian University</a>. Currently, I am interning at <a href="https://www.idea.edu.cn/research/cvr.html">IDEA Research</a>, advised by Professor <a href="//www.leizhang.org/">Lei Zhang</a>. I also work closely with Professor <a href="https://pingtan.people.ust.hk/index.html">Ping Tan</a> and Dr. <a href="https://www.xtao.website/">Xin Tao</a>.
              </p>
              <p>
                My research interests lie in 3D generation and video generation. Feel free to contact me for discussion or collaboration!
              </p>
              <p>
                Email: <strong>shiyk22 AT mails Dot tsinghua Dot edu Dot cn</strong> / <strong>shiyukai22 AT gmail Dot com</strong>.
              </p>
              <p style="text-align:center">
                <a href="mailto:shiyk22@mails.tsinghua.edu.cn">Email</a> &nbsp;/&nbsp;
                <!-- <a href="data/???.pdf">CV</a> &nbsp/&nbsp -->
                <a href="https://scholar.google.com/citations?hl=zh-CN&user=oQXfkSQAAAAJ">Google Scholar</a> &nbsp;/&nbsp;
                <a href="https://github.com/shiyukai26">GitHub</a>
		            <!-- <a href="https://www.zhihu.com/people/nl0000/posts">Zhihu</a>       -->
              </p>
            </td>
            <td style="padding:2.5%;width:40%;max-width:40%">
              <a href="images/yukai.jpg"><img style="width:100%;max-width:100%" alt="profile photo" src="images/yukai.jpg" class="hoverZoomLink"></a>
            </td>
          </tr>
        </tbody></table>
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
            <tr>
            <td style="padding:20px;width:100%;vertical-align:middle">
              <heading>Research</heading>
              <!-- <p>
                I'm interested in <strong>Computer Vision, Machine Learning, Robotics, and their intersections</strong>. My long-term goal is to enable robots to master all human skills and eventually build autonomous robotic systems in physical simulations and reality.
              </p> -->
            </td>
          </tr>
        </tbody></table>
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>

	      <!-- scenemaker -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div class="one">
                <div class="two" id='recon_image'><video  width=100% height=100% muted autoplay loop>
                <source src="videos/scenemaker_demo.mp4" type="video/mp4">
                </video></div>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <a href="https://arxiv.org/abs/2512.10957">
              <papertitle>SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model </papertitle>
              </a> <br><strong>Yukai Shi</strong>, Weiyu Li, Zihao Wang, Hongyang Li, Xingyu Chen, Ping Tan, Lei Zhang
              <br>
              <em>CVPR</em>, 2026
              <br>
              <a href="https://idea-research.github.io/SceneMaker/">Project page</a>
              /
              <a href="https://arxiv.org/abs/2512.10957">Paper</a>
              /
              <a href="https://huggingface.co/datasets/LightillusionsLab/SceneMaker">Datasets</a>
              /
              <a href="https://github.com/IDEA-Research/SceneMaker">Code</a>
            </td>
          </tr>


          <!-- imba loss -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div>
              <img src='images/IMBA_loss.png' width=100%>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <a href="http://arxiv.org/abs/2507.13345">
              <papertitle>Imbalance in Balance: Online Concept Balancing in Generation Models </papertitle>
              </a> <br><strong>Yukai Shi</strong>, Jiarong Ou, Rui Chen, Haotian Yang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Kun Gai
              <br>
              <em>ICCV</em>, 2025
              <br>
              <a href="http://arxiv.org/abs/2507.13345">Paper</a>
              /
              <a href="https://github.com/KlingAIResearch/IMBA-Loss">Code</a>
            </td>
          </tr>


        <!-- koala36m -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div>
              <img src='images/Koala36M.png' width=100%>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <a href="https://arxiv.org/abs/2410.08260">
              <papertitle>Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content</papertitle>
              </a> <br>Qiuheng Wang*, <strong>Yukai Shi*</strong>, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang
              <br>
              <em>CVPR</em>, 2025
              <br>
	      <a href="https://koala36m.github.io/">Project page</a>
              /
              <a href="https://arxiv.org/abs/2410.08260">Paper</a>
	      /
	      <a href="https://github.com/KlingAIResearch/Koala-36M">Code</a>
            </td>
          </tr>


          <!-- toss -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div class="one">
                <div class="two" id='recon_image'><video  width=100% height=100% muted autoplay loop>
                <source src="videos/toss.mp4" type="video/mp4">
                Your browser does not support the video tag.
                </video></div>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <papertitle><a href="https://arxiv.org/abs/2310.10644">TOSS: High-quality Text-guided Novel View Synthesis from a Single Image</a></papertitle>
              <br>
              <strong>Yukai Shi*</strong>, Jianan Wang*, He Cao*, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
              <br>
              <em>ICLR</em>, 2024
              <br>
              <a href="https://toss3d.github.io/">Project page</a>
              /
              <a href="https://arxiv.org/abs/2310.10644">Paper</a>
              /
              <a href="https://github.com/IDEA-Research/TOSS">Code</a>
              <p></p>
              <!-- <p>
                TOSS utilizes text as semantic guidance to further constrain the solution space of NVS, and generates more plausible, controllable, multiview-consistent novel view images from a single image.
              </p> -->
            </td>
          </tr>

          <!-- dreamtime -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div>
              <img src='images/dreamtime.png' width=100%>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <a href="https://openreview.net/pdf?id=1bAUywYJTU">
              <papertitle>DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation</papertitle>
              </a> <br>Yukun Huang, Jianan Wang, <strong>Yukai Shi</strong>, Boshi Tang, Xianbiao Qi, Lei Zhang
              <br>
              <em>ICLR</em>, 2024
              <br>
              <a href="https://openreview.net/pdf?id=1bAUywYJTU">Paper</a>
              <!-- <p>
                      We propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns the 3D optimization process with the sampling process of diffusion model.
              </p> -->
            </td>
          </tr>

          <!-- dreamwaltz -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div class="one">
                <div class="two" id='recon_image'><video  width=100% height=100% muted autoplay loop>
                <source src="videos/dreamwaltz.mp4" type="video/mp4">
                Your browser does not support the video tag.
                </video></div>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <papertitle><a href="https://arxiv.org/pdf/2305.12529.pdf">DreamWaltz: Make a Scene with Complex 3D Animatable Avatars</a></papertitle>
              <br>
              Yukun Huang, Jianan Wang, Ailing Zeng, He Cao, Xianbiao Qi, <strong>Yukai Shi</strong>, Zheng-Jun Zha, Lei Zhang
              <br>
              <em>NeurIPS</em>, 2023
              <br>
              <a href="https://idea-research.github.io/DreamWaltz/">Project page</a>
              /
              <a href="https://arxiv.org/pdf/2305.12529.pdf">Paper</a>
              /
              <a href="https://github.com/IDEA-Research/DreamWaltz">Code</a>
              <p></p>
              <!-- <p>
                TOSS utilizes text as semantic guidance to further constrain the solution space of NVS, and generates more plausible, controllable, multiview-consistent novel view images from a single image.
              </p> -->
            </td>
          </tr>


          <!-- lipsformer -->
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle;">
              <div>
                <img src='images/lipsformer.png' width=100%>
              </div>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <a href="https://openreview.net/pdf?id=cHf1DcCwcH3">
              <papertitle>LipsFormer: Introducing Lipschitz Continuity to Vision Transformers</papertitle>
              </a> <br>Xianbiao Qi, Jianan Wang, Yihao Chen, <strong>Yukai Shi</strong>, Lei Zhang
              <br>
              <em>ICLR</em>, 2023
              <br>
              <a href="https://openreview.net/pdf?id=cHf1DcCwcH3">Paper</a>
              /
              <a href="https://github.com/IDEA-Research/LipsFormer">Code</a>
              <!-- <p>
                      We propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns the 3D optimization process with the sampling process of diffusion model.
              </p> -->
            </td>
          </tr>


          <!-- <tr onmouseout="dreamfusion_stop()" onmouseover="dreamfusion_start()">
            <td style="padding:20px;width:25%;vertical-align:middle">
              <div class="one">
                <div class="two" id='dreamfusion_image'>
                <img src='images/ddnm_after.png' width="160"></div>
                <img src='images/ddnm_before.png' width="160">
              </div>
              <script type="text/javascript">
                function dreamfusion_start() {
                  document.getElementById('dreamfusion_image').style.opacity = "1";
                }

                function dreamfusion_stop() {
                  document.getElementById('dreamfusion_image').style.opacity = "0";
                }
                dreamfusion_stop()
              </script>
            </td>
            <td style="padding:20px;width:75%;vertical-align:middle">
              <a href="https://wyhuai.github.io/ddnm.io/">
                <papertitle>Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model</papertitle>
              </a>
              <br>
              <strong>Yinhuai Wang*</strong>,
              <a href="https://scholar.google.com/citations?user=uoRPLHIAAAAJ&hl=en">Jiwen Yu*</a>,
							<a href="https://jianzhang.tech/">Jian Zhang</a>
              <br>
              <em>ICLR</em>, 2023 &nbsp <font color="red"><strong>(Oral Presentation)</strong></font>
              <br>
              <a href="https://wyhuai.github.io/ddnm.io/">Project page</a>
              /
              <a href="https://arxiv.org/pdf/2212.00490.pdf">arXiv</a>
              /
              <a href="https://github.com/wyhuai/DDNM">Code</a>
              <p></p>
              <p>
              We bring Range-Null space Decomposition (RND) into diffusion models, enabling diverse image restoration tasks in a zero-shot manner, without extra training or optimization.
              </p>
            </td>
          </tr> -->


        </tbody></table>


        <table width="100%" align="center" border="0" cellspacing="0" cellpadding="20"><tbody>
          <tr>
            <td>
              <heading>Academic Service</heading>
              <p>
                Conference reviewer for CVPR, NeurIPS, ICLR, ICML, and ICCV.
              </p>
            </td>
          </tr>
        </tbody></table>

        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr>
            <td style="padding:0px">
              <br>
              <p style="text-align:center;font-size:small;">
                This cool template is stolen from <a href="https://jonbarron.info/">Jon Barron</a>!
              </p>
            </td>
          </tr>
        </tbody></table>
      </td>
    </tr>
  </tbody></table>
</body>

</html>