# Overview

With the advent of machine learning systems, the design of user-friendly
and high-performance APIs has become a paramount concern for system
designers. In the early stages of machine learning frameworks (as
depicted in Figure :numref:`ch03/framework_development_history`),
developers often opted for high-level programming languages such as Lua
(Torch) and Python (Theano) to write machine learning programs. These
frameworks offered essential functions, including model definition and
automatic differentiation, that are integral to machine learning. They
were particularly well suited to creating small-scale machine learning
applications for scientific research purposes.

<figure id="fig:ch03/framework_development_history">
<embed src="../img/ch03/framework_development_history.pdf" />
<figcaption> Evolution of Machine Learning Programming Frameworks: A
Historical Perspective</figcaption>
</figure>

The rapid advancement of deep neural networks (DNNs) since 2011 has
sparked groundbreaking achievements across AI application domains such
as computer vision, speech recognition, and natural language
processing. Training DNNs, however, requires substantial computational
power, which earlier frameworks such as Torch (primarily Lua-based) and
Theano (primarily Python-based) could not fully harness. At the same
time, programming interfaces for computational accelerators, such as
CUDA C for NVIDIA GPUs, were maturing rapidly, and multithreading
libraries built on multi-core CPU technology, such as POSIX Threads,
were gaining popularity among developers. Consequently, many machine
learning users sought to develop high-performance deep learning
applications in C/C++. These requirements led to the emergence of
frameworks such as Caffe, which employed C/C++ for their core APIs.

However, machine learning models often need to be customized for
specific deployment scenarios, data types, recognition tasks, and so
on. This customization typically falls to AI application developers,
who come from diverse backgrounds and are not always proficient in
C/C++. This became a significant bottleneck that hindered the
widespread adoption of programming frameworks such as Caffe that relied
heavily on C/C++.

In late 2015, Google introduced TensorFlow, which revolutionized the
landscape. In contrast to Torch, TensorFlow adopted a design in which
the frontend and the backend are relatively independent: the frontend
presented to users is written in the high-level programming language
Python, while the high-performance backend is implemented in C/C++.
TensorFlow's rich set of Python frontend APIs gained wide acceptance
among data scientists and machine learning researchers, and the
framework integrated seamlessly into the Python-dominated big data
ecosystem alongside libraries such as NumPy, Pandas, SciPy, Matplotlib,
and PySpark. Python's excellent interoperability with C/C++, proven in
many Python libraries, further enhanced TensorFlow's appeal. TensorFlow
thus combined the flexibility and ecosystem of Python with the high
performance of its C/C++ backend, a design philosophy inherited by
subsequent frameworks such as PyTorch, MindSpore, and PaddlePaddle.
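
The effect of this split is visible even in a tiny program. The sketch
below (a minimal example assuming a standard TensorFlow 2.x
installation; the array shapes are illustrative) defines a computation
entirely through the Python frontend, while every operation is executed
by a precompiled C/C++ kernel in the backend:

```python
import numpy as np
import tensorflow as tf

# Data from the wider Python ecosystem (here, NumPy) flows directly
# into the framework through the Python frontend.
x = tf.constant(np.random.rand(4, 3).astype(np.float32))
w = tf.Variable(tf.random.normal([3, 2]))

# Each call dispatches to a precompiled C/C++ kernel; the Python layer
# only constructs and routes the operation.
y = tf.matmul(x, w)

# Results convert back to NumPy for use with Pandas, Matplotlib, etc.
print(y.numpy())
```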

Subsequently, as prominent enterprises worldwide increasingly embraced
open-source machine learning frameworks, high-level libraries such as
Keras and TensorLayerX emerged. These libraries significantly expedited
the development of machine learning applications: they provided Python
APIs for quickly importing existing models, and these high-level APIs
were decoupled from the implementation details of any specific machine
learning framework. As a result, Keras and TensorLayerX could be used
across different machine learning frameworks.
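
As an illustration, the following minimal sketch (assuming the Keras
API bundled with TensorFlow 2.x; the layer sizes are illustrative)
defines and compiles a complete classifier in a few lines, without
touching the underlying framework's implementation details:

```python
from tensorflow import keras

# A complete model in a few declarative lines; the high-level API hides
# graph construction, kernel dispatch, and other backend details.
model = keras.Sequential([
    keras.Input(shape=(784,)),                   # flattened 28x28 input
    keras.layers.Dense(128, activation="relu"),  # hidden layer
    keras.layers.Dense(10, activation="softmax"),  # 10-class output
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```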

As deep neural networks continued to evolve, new challenges surfaced
for the APIs of machine learning frameworks, and around 2020 novel
frameworks such as MindSpore and JAX emerged to tackle them. MindSpore,
in addition to inheriting the hybrid Python and C/C++ interfaces of
TensorFlow and PyTorch, expanded the scope of machine learning
programming models, enabling efficient support for a diverse range of
AI backend chips, including NVIDIA GPUs, Huawei Ascend processors, and
ARM CPUs. Consequently, machine learning applications can be swiftly
deployed across a wide array of heterogeneous devices.
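
The sketch below illustrates this retargeting (a minimal example
assuming a MindSpore installation; the API names follow recent releases
and may differ across versions). In the common case, changing the
`device_target` string is all that is needed to move the same program
to a different chip:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

# One configuration line selects the backend chip: "CPU", "GPU"
# (NVIDIA), or "Ascend" (Huawei).
ms.set_context(device_target="CPU")

# The model definition itself is device-agnostic.
net = nn.Dense(3, 2)
x = ms.Tensor(np.ones((4, 3), dtype=np.float32))
print(net(x).shape)  # (4, 2), regardless of the chosen backend
```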

Simultaneously, the proliferation of ultra-large datasets and
ultra-large DNNs made distributed execution a fundamental design
requirement for machine learning programming frameworks. However,
implementing distributed execution in TensorFlow and PyTorch required
developers to write substantial amounts of code to partition datasets
and DNNs across distributed nodes, and many AI developers are not well
versed in distributed programming. JAX and MindSpore significantly
improve this situation by enabling programs written for a single node
to execute seamlessly across many nodes.
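
For illustration, the sketch below shows the single-process form of
this idea in JAX (device counts and shapes are illustrative; the same
mechanism extends to multiple hosts). `jax.pmap` takes ordinary
single-device code and runs one replica per available device, without
any explicit communication or placement code:

```python
import jax
import jax.numpy as jnp

def step(x):
    # Ordinary single-device code: no device placement, no communication.
    return jnp.tanh(x) * 2.0

n = jax.local_device_count()

# Shard the leading axis across devices; jax.pmap compiles `step` and
# runs one replica per device, handling distribution automatically.
xs = jnp.arange(n * 4.0).reshape(n, 4)
ys = jax.pmap(step)(xs)
print(ys.shape)  # (n, 4): one result shard per device
```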