You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
TensorFlow. TensorFlow: A System for Large-Scale
Machine Learning. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng. OSDI 2016.
Extra reading
TenorFlowDCF. Dynamic Control Flow in Large-Scale Machine Learning. Yuan Yu, Martin Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, Sanjay Ghemawat, Tim Harley, Peter Hawkins, Michael Isard, Manjunath Kudlur, Rajat Monga, Derek Murray, Xiaoqiang Zheng. EuroSys 2018.
RDAG. Improving the Expressiveness of Deep Learning Frameworks with Recursion. Eunji Jeong*, Joo Seong Jeong*, Soojeong Kim, Gyeong-In Yu, Byung-Gon Chun. EuroSys 2018.
3/16
Required reading
PyTorch. PyTorch: An Imperative Style, High-Performance
Deep Learning Library. NeurIPS 2019.
3/21
Required reading
JANUS. JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs. Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Byung-Gon Chun. NSDI 2019.
Extra reading
AutoGraph. AutoGraph: Imperative-Style Coding with Graph-based Performance. Dan Moldovan, James M Decker, Fei Wang, Andrew A Johnson, Brian K Lee, Zachary Nado, D Sculley, Tiark Rompf, Alexander B Witschko. MLSys 2019.
TFEager. TensorFlow Eager: A Multi-Stage, Python Embedded DSL for Machine Learning. Akshay Agrawal, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, Josh Levenberg, Mingsheng Hong, Rajat Monga, Shanqing Cai. MLSys 2019.
Terra. Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeongin Yu, Byung-Gon Chun. NeurIPS 2021.
3/23
PyTorch internals
3/28
Required reading
Halide. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Fredo Durand, Saman Amarasinghe. PLDI 2013.
3/30
Project proposal
4/4
Required reading
TVM. An Automated End-to-End Optimizing Compiler for Deep Learning. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. OSDI 2018.
Extra reading
TC. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen. 2018.
TACO. The Tensor Algebra Compiler. Kjolstad, Fredrik and Kamil, Shoaib and Chou, Stephen and Lugato, David and Amarasinghe, Saman. OOPSLA 2017.
4/6
Required reading
Ansor. Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica. Ansor: Generating High-Performance Tensor Programs for Deep Learning. OSDI 2020.
4/11
Required reading
TASO. TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions. Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, Alex Aiken. SOSP 2019.
Extra reading
PET. PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections. OSDI 2021.
AStitch. AStitch: Enabling A New Multi-Dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures. Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin. ASPLOS 2022.
4/13
Required reading
SparTA. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute. OSDI 2022.
Extra reading
Roller. Roller: Fast and Efficient Tensor Compilation for Deep Learning. OSDI 2022.
4/18
Required reading
Nimble. Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning. Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, Byung-Gon Chun. NeurIPS 2020 (Spotlight).
Clipper. A Low-Latency Online Prediction Serving System. Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica. NSDI 2017.
Extra reading
PRETZEL. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Markus Weimer, Matteo Interlandi. OSDI 2018.
5/2
Mid project presentation
5/4
Required reading
Clockwork. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace. OSDI 2020.
5/9
Required reading
Orca. Orca: A Distributed Serving System for Transformer-Based Generative Models. Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, Byung-Gon Chun. OSDI 2022. (patented)
5/11
Required reading
Parallax. Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks. Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun. EuroSys 2019.
Extra reading
Horovod. Horovod: fast and easy distributed deep learning in TensorFlow. Alexander Sergeev and Mike Del Balso, arXiv 2018.
PS. Scaling Distributed Machine Learning with the Parameter Server. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. OSDI 2014.
5/16
Required reading
Unity. Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization. OSDI 2022.
Extra reading
GSPMD. GSPMD: General and Scalable Parallelization for ML Computation Graphs. ICLR 2021.
5/18
Required reading
Alpa. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. OSDI 2022.
Extra reading
PipeDream. PipeDream: Generalized Pipeline Parallelism for DNN Training. Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Philip B. Gibbons, Matei Zaharia. SOSP 2019.
Mesh-Tensorflow. Mesh-TensorFlow: Deep Learning for Supercomputers. NeurIPS 2018.
5/23
CXL (Guest lecture)
5/25
Required reading
MegatronLM. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. SC 2021.
Zero.ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. SC 2020.
ZeroInfinity. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. SC 2021.
tf.data. tf.data: A Machine Learning Data Processing Framework. VLDB 2021.
5/30
고성능 딥러닝 추론 가속기를 위한 하드웨어 앱스트랙션 설계
6/1
Required reading
Gandiva. Gandiva: Introspective Cluster Scheduling for Deep Learning. OSDI 2018.
Tiresias. Tiresias: A GPU Cluster Manager for Distributed Deep Learning. NSDI 2019.
Extra reading
Themis. Themis: Fair and Efficient GPU Cluster Scheduling. NSDI 2020.
Gavel. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. OSDI 2020.
Antman. AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. OSDI 2020.
HiveD. HiveD: Sharing a GPU Cluster for Deep Learning
with Guarantees. OSDI 2020.