
Commit 2025daa

Merge branch 'patch-2' of https://github.com/tempacct791/azure-docs into public-prs-feb-2025-2
2 parents 5659706 + dd89a87

File tree

1 file changed: +14 −14 lines

articles/synapse-analytics/spark/apache-spark-to-power-bi.md

Lines changed: 14 additions & 14 deletions
@@ -26,19 +26,19 @@ In this example, you will use Apache Spark to perform some analysis on taxi trip

1. Create a Spark dataframe by pasting the following code into a new cell and running it. The code retrieves the data via the Open Datasets API. Pulling all of this data generates about 1.5 billion rows, so the example uses start_date and end_date to apply a filter that returns a single month of data.

    ```python
-   from azureml.opendatasets import NycTlcYellow
-   from dateutil import parser
-   from datetime import datetime
-
-   end_date = parser.parse('2018-06-06')
-   start_date = parser.parse('2018-05-01')
-   nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)
-   filtered_df = nyc_tlc.to_spark_dataframe()
+   from azureml.opendatasets import NycTlcYellow
+   from dateutil import parser
+   from datetime import datetime
+
+   end_date = parser.parse('2018-06-06')
+   start_date = parser.parse('2018-05-01')
+   nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)
+   filtered_df = spark.createDataFrame(nyc_tlc.to_pandas_dataframe())
    ```
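The updated line above converts the Open Datasets result to a pandas dataframe first and then into a Spark dataframe. Here is a minimal, hedged sketch of that pattern; the sample rows below are made up and not part of the tutorial:

```python
import pandas as pd
from pyspark.sql import SparkSession

# In a Synapse notebook `spark` is predefined; the builder is shown only so
# this sketch runs standalone.
spark = SparkSession.builder.getOrCreate()

# Hypothetical sample rows standing in for the NYC TLC data.
pdf = pd.DataFrame({
    "tpepPickupDateTime": ["2018-05-01 00:01:00", "2018-05-01 00:05:00"],
    "fareAmount": [12.5, 7.0],
})

# Same pattern as the updated line: pandas dataframe -> Spark dataframe.
sdf = spark.createDataFrame(pdf)
sdf.printSchema()
```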
2. Using Apache Spark SQL, we will create a database called NycTlcTutorial. We will use this database to store the results of our data processing.
    ```python
    %%pyspark
-   spark.sql("CREATE DATABASE IF NOT EXISTS NycTlcTutorial")
+   spark.sql("CREATE DATABASE IF NOT EXISTS NycTlcTutorial")
    ```
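If you want to confirm the database was created, one way (our suggestion, not a tutorial step) is to list the catalog's databases; note that Spark stores database identifiers lowercased:

```python
# Assumes `spark` is the active Synapse notebook session.
# Look for "nyctlctutorial" in the output.
print([db.name for db in spark.catalog.listDatabases()])
```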
3. Next, we will use Spark dataframe operations to process the data. In the following code, we perform the following transformations:
   1. Removal of columns that are not needed.

@@ -63,10 +63,10 @@ In this example, you will use Apache Spark to perform some analysis on taxi trip
            & (filtered_df.paymentType.isin({"1", "2"})))
    ```
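The hunk above elides most of step 3's code; only the tail of the filter is visible. Purely as an illustrative sketch of the kind of transformation chain the step describes, operating on `filtered_df` from step 1 — every column name here except paymentType (which appears in the diff fragment) is an assumption, not the tutorial's actual code:

```python
from pyspark.sql.functions import col

# Illustrative sketch only: keep the needed columns, rename one, and filter
# out implausible rows. Column names other than paymentType are assumed.
taxi_df = (
    filtered_df
    .select("passengerCount", "tripDistance", "fareAmount", "paymentType")
    .withColumnRenamed("tripDistance", "tripDistanceMiles")
    .filter(
        (col("passengerCount") > 0)
        & (col("tripDistanceMiles") > 0)
        & (col("fareAmount") > 0)
        & (col("paymentType").isin({"1", "2"}))
    )
)
```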
4. Finally, we will save our dataframe using the Apache Spark ```saveAsTable``` method. This will allow you to later query and connect to the same table using serverless SQL pools.
-   ```python
-   taxi_df.write.mode("overwrite").saveAsTable("NycTlcTutorial.nyctaxi")
-   ```
-
+   ```python
+   taxi_df.write.mode("overwrite").saveAsTable("NycTlcTutorial.nyctaxi")
+   ```
+
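As a quick sanity check (our suggestion, not a tutorial step), the saved table can be read back from the same Spark session:

```python
# Read the table saved above back out of the metastore and peek at it.
spark.table("NycTlcTutorial.nyctaxi").show(10)
```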

## Query data using serverless SQL pools

Azure Synapse Analytics allows the different workspace computational engines to share databases and tables between its serverless Apache Spark pools and its serverless SQL pool. This sharing is powered by the Synapse [shared metadata management](../metadata/overview.md) capability. As a result, Spark-created databases and their Parquet-backed tables become visible in the workspace serverless SQL pool.
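From the Spark side, one hedged way to see this shared metadata at work (assuming the table from step 4 exists) is to list what the metastore holds for the database; these are the same objects the workspace serverless SQL pool resolves:

```python
# List the tables registered under our database in the shared metastore.
for t in spark.catalog.listTables("NycTlcTutorial"):
    print(t.name, t.tableType)
```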

@@ -113,4 +113,4 @@ For more details on how to create a dataset through serverless SQL and connect t
## Next steps
You can continue to learn more about data visualization capabilities in Azure Synapse Analytics by visiting the following documents and tutorials:
- [Visualize data with serverless Apache Spark pools](../spark/apache-spark-data-visualization-tutorial.md)
-   - [Overview of data visualization with Apache Spark pools](../spark/apache-spark-data-visualization.md)
+   - [Overview of data visualization with Apache Spark pools](../spark/apache-spark-data-visualization.md)

