Commit 7fb28cf

Merge pull request #113198 from dagiro/freshness_c51
Freshness c51
2 parents ed6c391 + 5c46857 commit 7fb28cf

1 file changed: +5 -28 lines changed

articles/hdinsight/spark/apache-spark-machine-learning-mllib-ipython.md

Lines changed: 5 additions & 28 deletions
@@ -6,13 +6,13 @@ ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
-ms.custom: hdinsightactive,hdiseo17may2017
-ms.date: 04/16/2020
+ms.custom: hdinsightactive,hdiseo17may2017,seoapr2020
+ms.date: 04/27/2020
 ---
 
 # Use Apache Spark MLlib to build a machine learning application and analyze a dataset
 
-Learn how to use Apache Spark [MLlib](https://spark.apache.org/mllib/) to create a machine learning application. The application will do predictive analysis on an open dataset. From Spark's built-in machine learning libraries, this example uses *classification* through logistic regression.
+Learn how to use Apache Spark MLlib to create a machine learning application. The application will do predictive analysis on an open dataset. From Spark's built-in machine learning libraries, this example uses *classification* through logistic regression.
 
 MLlib is a core Spark library that provides many utilities useful for machine learning tasks, such as:
 
@@ -175,7 +175,7 @@ Let's start to get a sense of what the dataset contains.
 
 ![SQL query output](./media/apache-spark-machine-learning-mllib-ipython/spark-machine-learning-query-output.png "SQL query output")
 
-3. You can also use [Matplotlib](https://en.wikipedia.org/wiki/Matplotlib), a library used to construct visualization of data, to create a plot. Because the plot must be created from the locally persisted **countResultsdf** dataframe, the code snippet must begin with the `%%local` magic. This action ensures that the code is run locally on the Jupyter server.
+3. You can also use Matplotlib, a library used to construct visualization of data, to create a plot. Because the plot must be created from the locally persisted **countResultsdf** dataframe, the code snippet must begin with the `%%local` magic. This action ensures that the code is run locally on the Jupyter server.
 
 ```PySpark
 %%local
@@ -357,28 +357,5 @@ After you have finished running the application, you should shut down the notebo
 ## Next steps
 
 * [Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md)
-
-### Scenarios
-
-* [Apache Spark with BI: Interactive data analysis using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
-* [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
 * [Website log analysis using Apache Spark in HDInsight](apache-spark-custom-library-website-log-analysis.md)
-
-### Create and run applications
-
-* [Create a standalone application using Scala](apache-spark-create-standalone-application.md)
-* [Run jobs remotely on an Apache Spark cluster using Apache Livy](apache-spark-livy-rest-interface.md)
-
-### Tools and extensions
-
-* [Use HDInsight Tools Plugin for IntelliJ IDEA to create and submit Spark Scala applications](apache-spark-intellij-tool-plugin.md)
-* [Use HDInsight Tools Plugin for IntelliJ IDEA to debug Apache Spark applications remotely](apache-spark-intellij-tool-plugin-debug-jobs-remotely.md)
-* [Use Apache Zeppelin notebooks with an Apache Spark cluster on HDInsight](apache-spark-zeppelin-notebook.md)
-* [Kernels available for Jupyter notebook in Apache Spark cluster for HDInsight](apache-spark-jupyter-notebook-kernels.md)
-* [Use external packages with Jupyter notebooks](apache-spark-jupyter-notebook-use-external-packages.md)
-* [Install Jupyter on your computer and connect to an HDInsight Spark cluster](apache-spark-jupyter-notebook-install-locally.md)
-
-### Manage resources
-
-* [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
-* [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)
+* [Microsoft Cognitive Toolkit deep learning model with Azure HDInsight](apache-spark-microsoft-cognitive-toolkit.md)