articles/synapse-analytics/spark/apache-spark-to-power-bi.md
14 additions & 14 deletions
@@ -26,19 +26,19 @@ In this example, you will use Apache Spark to perform some analysis on taxi trip
1. Create a Spark dataframe by pasting the following code into a new cell and running it. The code retrieves the data through the Open Datasets API. The full dataset contains about 1.5 billion rows, so the example uses start_date and end_date to apply a filter that returns a single month of data.
2. Using Apache Spark SQL, we will create a database called NycTlcTutorial. We will use this database to store the results of our data processing.
```python
%%pyspark
-spark.sql("CREATE DATABASE IF NOT EXISTS NycTlcTutorial")
+spark.sql("CREATE DATABASE IF NOT EXISTS NycTlcTutorial")
```
3. Next, we will use Spark dataframe operations to process the data. The following code performs these transformations:
1. The removal of columns which are not needed.
@@ -63,10 +63,10 @@ In this example, you will use Apache Spark to perform some analysis on taxi trip
& (filtered_df.paymentType.isin({"1", "2"})))
```
4. Finally, we will save our dataframe using the Apache Spark ```saveAsTable``` method. This lets you later query and connect to the same table using serverless SQL pools.
Azure Synapse Analytics lets the workspace's computational engines share databases and tables between its serverless Apache Spark pools and its serverless SQL pool. This is powered by the Synapse [shared metadata management](../metadata/overview.md) capability. As a result, Spark-created databases and their parquet-backed tables become visible in the workspace serverless SQL pool.
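
As a minimal sketch of the save step described above, the helper below wraps the PySpark `DataFrameWriter` calls `mode()` and `saveAsTable()`. The helper name `save_to_shared_table`, the table name `nyctaxi`, and the `overwrite` mode are assumptions made for illustration; only the database name `NycTlcTutorial` comes from this article.

```python
def save_to_shared_table(df, database="NycTlcTutorial", table="nyctaxi", mode="overwrite"):
    """Persist a Spark DataFrame as a table and return its qualified name.

    The `table` and `mode` defaults are hypothetical, chosen only for this sketch.
    """
    qualified_name = f"{database}.{table}"
    # mode() controls what happens if the table already exists;
    # saveAsTable() registers the table in the metastore under the given name.
    df.write.mode(mode).saveAsTable(qualified_name)
    return qualified_name


# Hypothetical usage in a notebook cell, where taxi_df is the processed dataframe:
# save_to_shared_table(taxi_df)
```

Keeping the database and table names as parameters makes it easier to reuse the same cell for other tables that should be shared through the metadata capability.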
@@ -113,4 +113,4 @@ For more details on how to create a dataset through serverless SQL and connect t
## Next steps
You can continue to learn more about data visualization capabilities in Azure Synapse Analytics by visiting the following documents and tutorials:
- [Visualize data with serverless Apache Spark pools](../spark/apache-spark-data-visualization-tutorial.md)
--[Overview of data visualization with Apache Spark pools](../spark/apache-spark-data-visualization.md)
+- [Overview of data visualization with Apache Spark pools](../spark/apache-spark-data-visualization.md)