articles/synapse-analytics/machine-learning/synapse-machine-learning-library.md
Lines changed: 18 additions & 19 deletions
@@ -11,48 +11,47 @@ ms.author: sngun
# What is SynapseML?
-SynapseML (previously known as MMLSpark), is an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. It's an ecosystem of tools used to expand the Apache Spark framework in several new directions. SynapseML unifies several existing machine learning frameworks and new Microsoft algorithms into a single, scalable API that is usable across Python, R, Scala, and Java. Using this library, developers can focus on the high-level structure of their data and tasks, and the library takes care of the machine learning implementation details.
+SynapseML (previously known as MMLSpark) is an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. SynapseML provides simple, composable, and distributed APIs for a wide variety of different machine learning tasks such as text analytics, vision, anomaly detection, and many others. SynapseML is built on the [Apache Spark distributed computing framework](https://spark.apache.org/) and shares the same API as the [SparkML/MLLib library](https://spark.apache.org/mllib/), allowing you to seamlessly embed SynapseML models into existing Apache Spark workflows.
-SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for various datasources. It also brings new networking capabilities to the Spark Ecosystem. SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
+With SynapseML, you can build scalable and intelligent systems to solve challenges in domains such as anomaly detection, computer vision, deep learning, text analytics, and others. SynapseML can train and evaluate models on single-node, multi-node, and elastically resizable clusters of computers, so you can scale your work without wasting resources. SynapseML is usable across Python, R, Scala, Java, and .NET. Furthermore, its API abstracts over a wide variety of databases, file systems, and cloud data stores to simplify experiments no matter where data is located.
-With SynapseML, you can build scalable and intelligent systems to solve challenges in domains such as Anomaly detection, Computer vision, Deep learning, Text analytics etc. It can train and evaluate models on single-node, multi-node, and elastically resizable clusters of computers, so you can scale your work without wasting resources. In addition to its availability in several different programming languages, the API abstracts over a wide variety of databases, file systems, and cloud data stores to simplify experiments no matter where data is located.
+SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
-To get started and build machine learning models in different languages, see the [Installation guide.](https://microsoft.github.io/SynapseML/docs/getting_started/installation/)
## Key features of SynapseML
-### Simplifies distributed machine learning
+### A unified API for creating, training, and scoring models
-SynapseML offers a unified API that simplifies developing fault-tolerant distributed programs. This library unifies different machine learning frameworks into a single API that is scalable, data and language agnostic. It works for batch, streaming, and serving applications.
+SynapseML offers a unified API that simplifies developing fault-tolerant distributed programs. In particular, SynapseML exposes many different machine learning frameworks under a single API that is scalable, data- and language-agnostic, and works for batch, streaming, and serving applications.
-A unified API standardizes many tools, frameworks, algorithms and streamlines the distributed machine learning experience. It enables developers to quickly compose disparate machine learning frameworks. It's helpful for use cases that require more than one framework, such as web-supervised learning, search engine creation, and many others. It can train and evaluate models on single-node, multi-node, and elastically resizable clusters of computers. You can scale up your work without wasting resources.
+A unified API standardizes many tools, frameworks, and algorithms, and streamlines the distributed machine learning experience. It enables developers to quickly compose disparate machine learning frameworks, keeps code clean, and enables workflows that require more than one framework, such as web-supervised learning, search engine creation, and many others.
-The SynapseML API abstracts over a wide variety of databases, file systems, and cloud data stores to simplify experiments no matter where the data is located.
-### Enterprise support on Azure Synapse Analytics
+### Use pre-built intelligent models
-SynapseML is generally available on Azure Synapse Analytics with enterprise support. You can now build large-scale machine learning pipelines using Azure Cognitive Services, LightGBM, ONNX, and other [selected SynapseML features](https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/streamline-collaboration-and-insights-with-simplified-machine/ba-p/2924707). It even includes templates to quickly prototype distributed machine learning systems, such as visual search engines, predictive maintenance pipelines, document translation, and more.
+Many tools in SynapseML don't require a large labeled training dataset. Instead, SynapseML provides simple APIs for pre-built intelligent services, such as Azure Cognitive Services, to quickly solve large-scale AI challenges related to both business and research. SynapseML enables developers to embed over 50 different state-of-the-art ML services directly into their systems and databases. These ready-to-use algorithms can parse a wide variety of documents, transcribe multi-speaker conversations in real time, and translate text to over 100 different languages. For more examples of how to use pre-built AI to solve tasks quickly, see [the SynapseML cognitive service examples](https://microsoft.github.io/SynapseML/docs/features/cognitive_services/CognitiveServices%20-%20Overview/).
-### Pre-built intelligent models
-
-Many tools in SynapseML don't require a large labeled training dataset. Instead, SynapseML provides simple APIs for pre-built intelligent services, such as Azure Cognitive Services, to quickly solve large-scale AI challenges related to both business and research. SynapseML enables developers to embed over 45 different state-of-the-art ML services directly into their systems and databases. The latest release includes added support for distributed form recognition, conversation transcription, and translation. These ready-to-use algorithms can parse a wide variety of documents, transcribe multi-speaker conversations in real time, and translate text to over 100 different languages.
-
-To make SynapseML's integration with Azure Cognitive Services fast and efficient, several new tools are available within Apache Spark. In particular, SynapseML automatically parses common throttling responses to ensure that jobs don’t overwhelm backend services. Additionally, it uses exponential back-offs to handle unreliable network connections and failed responses. Finally, Spark’s worker machines stay busy with new asynchronous parallelism primitive to Spark. This allows worker machines to send requests while waiting on a response from the server, which can yield a tenfold increase in throughput.
+To make SynapseML's integration with Azure Cognitive Services fast and efficient, SynapseML introduces many optimizations for service-oriented workflows. In particular, SynapseML automatically parses common throttling responses to ensure that jobs don’t overwhelm backend services. Additionally, it uses exponential back-offs to handle unreliable network connections and failed responses. Finally, Spark’s worker machines stay busy with new asynchronous parallelism primitives for Spark. Asynchronous parallelism allows worker machines to send requests while waiting on a response from the server and can yield a tenfold increase in throughput.
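The retry and parallelism behavior described above can be sketched in plain Python. This is only an illustration of the general technique, not SynapseML's implementation: `call_with_backoff`, `send_all`, the `send` callable, the `(status, body)` response shape, and the set of retryable status codes are all hypothetical names and assumptions introduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Assumption: a response is modeled as (HTTP status code, body).
Response = Tuple[int, str]

# Assumption: status codes treated as throttling or transient failures.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying throttled or failed responses with exponential back-off."""
    response = send()
    for attempt in range(max_retries):
        if response[0] not in RETRYABLE_STATUSES:
            break
        # Wait base_delay, then 2x, then 4x, ... before each retry.
        sleep(base_delay * (2 ** attempt))
        response = send()
    return response


def send_all(
    requests: Sequence[Callable[[], Response]],
    max_in_flight: int = 8,
) -> List[Response]:
    """Keep several requests in flight at once; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(call_with_backoff, requests))
```

Threads are used here only because waiting on a remote service is I/O-bound, so one worker can have several requests outstanding instead of idling on each response. SynapseML's own asynchronous primitives run inside Spark and are not shown.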
### Broad ecosystem compatibility with ONNX
-SynapseML enables developers to use models from many different ML ecosystems through the Open Neural Network Exchange (ONNX) framework. With this integration, you can execute a wide variety of classical and deep learning models at scale with only a few lines of code. This integration between ONNX and Spark automatically handles distributing ONNX models to worker nodes, batching and buffering input data for high throughput, and scheduling work on hardware accelerators.
+SynapseML enables developers to use models from many different ML ecosystems through the Open Neural Network Exchange (ONNX) framework. With this integration, you can execute a wide variety of classical and deep learning models at scale with only a few lines of code. SynapseML automatically handles distributing ONNX models to worker nodes, batching and buffering input data for high throughput, and scheduling work on hardware accelerators.
Bringing ONNX to Spark not only helps developers scale deep learning models, it also enables distributed inference across a wide variety of ML ecosystems. In particular, ONNXMLTools converts models from TensorFlow, scikit-learn, Core ML, LightGBM, XGBoost, H2O, and PyTorch to ONNX for accelerated and distributed inference using SynapseML.
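As a rough illustration of the batching step mentioned in this section, the following plain-Python sketch groups rows into fixed-size mini-batches before handing each batch to a model. It does not use SynapseML's or ONNX Runtime's API; `mini_batches`, `score_in_batches`, and `predict_batch` are hypothetical names chosen for this example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

Row = TypeVar("Row")
Out = TypeVar("Out")


def mini_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def score_in_batches(
    rows: Iterable[Row],
    predict_batch: Callable[[List[Row]], List[Out]],
    batch_size: int = 64,
) -> List[Out]:
    """Score rows one mini-batch at a time and flatten the results in order."""
    outputs: List[Out] = []
    for batch in mini_batches(rows, batch_size):
        # predict_batch stands in for a single model call on a whole batch.
        outputs.extend(predict_batch(batch))
    return outputs
```

Calling the model once per batch rather than once per row amortizes per-call overhead, which is the usual motivation for batching inputs ahead of inference.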
-## Building responsible AI systems with SynapseML
+### Build responsible AI systems
+
+After building a model, it’s imperative that researchers and engineers understand its limitations and behavior before deployment. SynapseML helps developers and researchers build responsible AI systems by introducing new tools that reveal why models make certain predictions and how to improve the training dataset to eliminate biases. SynapseML dramatically speeds the process of understanding a user’s trained model by enabling developers to distribute computation across hundreds of machines. More specifically, SynapseML includes distributed implementations of Shapley Additive Explanations (SHAP) and Locally Interpretable Model-Agnostic Explanations (LIME) to explain the predictions of vision, text, and tabular models. It also includes tools such as Individual Conditional Expectation (ICE) and partial dependence analysis to recognize biased datasets.
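To give a feel for what a Shapley-based explanation computes, here is a minimal brute-force sketch in plain Python. It is not SynapseML's distributed SHAP implementation, which this text does not detail; it computes exact Shapley values by averaging each feature's marginal contribution over every feature ordering, which is feasible only for a handful of features. `shapley_values`, `predict`, and the choice of a single `baseline` row are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of predict(instance) relative to predict(baseline)."""
    n = len(instance)
    totals = [0.0] * n
    for order in permutations(range(n)):
        # Start from the baseline and switch features to their real
        # values one at a time, in this ordering.
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]
            updated = predict(current)
            totals[feature] += updated - previous
            previous = updated
    # Average the marginal contributions over all n! orderings.
    return [total / factorial(n) for total in totals]
```

For a linear model, each feature's value comes out as its weight times its distance from the baseline, and the values always sum to the difference between the prediction for the instance and the prediction for the baseline. Because the cost grows with the factorial of the feature count, practical tools rely on sampling-based approximations and, as described above, on distributing the work across many machines.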
+
+
+## Enterprise support on Azure Synapse Analytics
-After building a model, it’s imperative that researchers and engineers understand its limitations and behavior before deployment. SynapseML helps developers and researchers build responsible AI systems by introducing new tools that reveal why models make certain predictions and how to improve the training dataset to eliminate biases. More specifically, SynapseML includes distributed implementations of Shapley Additive Explanations (SHAP) and Locally Interpretable Model-Agnostic Explanations (LIME) to explain the predictions of vision, text, and tabular models. SynapseML dramatically speeds the process of understanding a user’s trained model by enabling developers to distribute computation across hundreds of machines.
+SynapseML is generally available on Azure Synapse Analytics with enterprise support. You can build large-scale machine learning pipelines using Azure Cognitive Services, LightGBM, ONNX, and other [selected SynapseML features](https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/streamline-collaboration-and-insights-with-simplified-machine/ba-p/2924707). It even includes templates to quickly prototype distributed machine learning systems, such as visual search engines, predictive maintenance pipelines, document translation, and more.
## Next steps
* To learn more about SynapseML, see the [blog post.](https://www.microsoft.com/en-us/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library/)
* [Install SynapseML and get started with examples.](https://microsoft.github.io/SynapseML/docs/getting_started/installation/)