
Commit cd5bf40

Merge pull request #267216 from RamanathanChinnappan-MSFT/patch-97
(AzureCXP) fixes MicrosoftDocs/azure-docs#120145
2 parents 2d4d198 + 2c1725a commit cd5bf40


articles/synapse-analytics/spark/apache-spark-performance-hyperspace.md

Lines changed: 25 additions & 25 deletions
@@ -92,7 +92,7 @@ res3: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@297e
 
 To prepare your environment, you'll create sample data records and save them as Parquet data files. Parquet is used for illustration, but you can also use other formats such as CSV. In the subsequent cells, you'll see how you can create several Hyperspace indexes on this sample dataset and make Spark use them when running queries.
 
-The example records correspond to two datasets: department and employee. You should configure the "empLocation" and "deptLocation" paths so that on the storage account they point to your desired location to save generated data files.
+The example records correspond to two datasets: department and employee. You should configure the "emp_Location" and "dept_Location" paths so that on the storage account they point to your desired location to save generated data files.
 
 The output of running the following cell shows contents of our datasets as lists of triplets followed by references to dataFrames created to save the content of each dataset in our preferred location.
 
@@ -130,10 +130,10 @@ import spark.implicits._
 val empData: DataFrame = employees.toDF("empId", "empName", "deptId")
 val deptData: DataFrame = departments.toDF("deptId", "deptName", "location")
 
-val empLocation: String = "/<yourpath>/employees.parquet" //TODO ** customize this location path **
-val deptLocation: String = "/<yourpath>/departments.parquet" //TODO ** customize this location path **
-empData.write.mode("overwrite").parquet(empLocation)
-deptData.write.mode("overwrite").parquet(deptLocation)
+val emp_Location: String = "/<yourpath>/employees.parquet" //TODO ** customize this location path **
+val dept_Location: String = "/<yourpath>/departments.parquet" //TODO ** customize this location path **
+empData.write.mode("overwrite").parquet(emp_Location)
+deptData.write.mode("overwrite").parquet(dept_Location)
 ```
 
 ::: zone-end
@@ -218,10 +218,10 @@ var employeeSchema = new StructType(new List<StructField>()
 DataFrame empData = spark.CreateDataFrame(employees, employeeSchema);
 DataFrame deptData = spark.CreateDataFrame(departments, departmentSchema);
 
-string empLocation = "/<yourpath>/employees.parquet"; //TODO ** customize this location path **
-string deptLocation = "/<yourpath>/departments.parquet"; //TODO ** customize this location path **
-empData.Write().Mode("overwrite").Parquet(empLocation);
-deptData.Write().Mode("overwrite").Parquet(deptLocation);
+string emp_Location = "/<yourpath>/employees.parquet"; //TODO ** customize this location path **
+string dept_Location = "/<yourpath>/departments.parquet"; //TODO ** customize this location path **
+empData.Write().Mode("overwrite").Parquet(emp_Location);
+deptData.Write().Mode("overwrite").Parquet(dept_Location);
 
 ```

@@ -235,8 +235,8 @@ employees: Seq[(Int, String, Int)] = List((7369,SMITH,20), (7499,ALLEN,30), (752
 
 empData: org.apache.spark.sql.DataFrame = [empId: int, empName: string ... 1 more field]
 deptData: org.apache.spark.sql.DataFrame = [deptId: int, deptName: string ... 1 more field]
-empLocation: String = /your-path/employees.parquet
-deptLocation: String = /your-path/departments.parquet
+emp_Location: String = /your-path/employees.parquet
+dept_Location: String = /your-path/departments.parquet
 ```
 
 Let's verify the contents of the Parquet files we created to make sure they contain expected records in the correct format. Later, we'll use these data files to create Hyperspace indexes and run sample queries.
@@ -246,9 +246,9 @@ Running the following cell produces an output that displays the rows in employee
 :::zone pivot = "programming-language-scala"
 
 ```scala
-// empLocation and deptLocation are the user defined locations above to save parquet files
-val empDF: DataFrame = spark.read.parquet(empLocation)
-val deptDF: DataFrame = spark.read.parquet(deptLocation)
+// emp_Location and dept_Location are the user defined locations above to save parquet files
+val empDF: DataFrame = spark.read.parquet(emp_Location)
+val deptDF: DataFrame = spark.read.parquet(dept_Location)
 
 // Verify the data is available and correct
 empDF.show()
@@ -277,9 +277,9 @@ dept_DF.show()
 
 ```csharp
 
-// empLocation and deptLocation are the user-defined locations above to save parquet files
-DataFrame empDF = spark.Read().Parquet(empLocation);
-DataFrame deptDF = spark.Read().Parquet(deptLocation);
+// emp_Location and dept_Location are the user-defined locations above to save parquet files
+DataFrame empDF = spark.Read().Parquet(emp_Location);
+DataFrame deptDF = spark.Read().Parquet(dept_Location);
 
 // Verify the data is available and correct
 empDF.Show();
@@ -782,8 +782,8 @@ The following cell enables Hyperspace and creates two DataFrames containing your
 // Enable Hyperspace
 spark.enableHyperspace
 
-val empDFrame: DataFrame = spark.read.parquet(empLocation)
-val deptDFrame: DataFrame = spark.read.parquet(deptLocation)
+val empDFrame: DataFrame = spark.read.parquet(emp_Location)
+val deptDFrame: DataFrame = spark.read.parquet(dept_Location)
 
 empDFrame.show(5)
 deptDFrame.show(5)
@@ -815,8 +815,8 @@ dept_DF.show(5)
 // Enable Hyperspace
 spark.EnableHyperspace();
 
-DataFrame empDFrame = spark.Read().Parquet(empLocation);
-DataFrame deptDFrame = spark.Read().Parquet(deptLocation);
+DataFrame empDFrame = spark.Read().Parquet(emp_Location);
+DataFrame deptDFrame = spark.Read().Parquet(dept_Location);
 
 empDFrame.Show(5);
 deptDFrame.Show(5);
@@ -1392,9 +1392,9 @@ val extraDepartments = Seq(
 (60, "Human Resources", "San Francisco"))
 
 val extraDeptData: DataFrame = extraDepartments.toDF("deptId", "deptName", "location")
-extraDeptData.write.mode("Append").parquet(deptLocation)
+extraDeptData.write.mode("Append").parquet(dept_Location)
 
-val deptDFrameUpdated: DataFrame = spark.read.parquet(deptLocation)
+val deptDFrameUpdated: DataFrame = spark.read.parquet(dept_Location)
 
 deptDFrameUpdated.show(10)
 
@@ -1432,9 +1432,9 @@ var extraDepartments = new List<GenericRow>()
 };
 
 DataFrame extraDeptData = spark.CreateDataFrame(extraDepartments, departmentSchema);
-extraDeptData.Write().Mode("Append").Parquet(deptLocation);
+extraDeptData.Write().Mode("Append").Parquet(dept_Location);
 
-DataFrame deptDFrameUpdated = spark.Read().Parquet(deptLocation);
+DataFrame deptDFrameUpdated = spark.Read().Parquet(dept_Location);
 
 deptDFrameUpdated.Show(10);
 
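For reviewers who want to sanity-check the rename end to end, here is a minimal sketch of how the renamed `emp_Location` and `dept_Location` variables feed into the Hyperspace index workflow the surrounding article builds up. It assumes a Synapse Spark notebook where `spark` is predefined and uses the Hyperspace Scala API (`Hyperspace`, `IndexConfig`); the index name "empIndex" and its column choices are illustrative, not part of this change.

```scala
import org.apache.spark.sql.DataFrame
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._

// Renamed path variables from this commit; replace <yourpath> with your storage location.
val emp_Location: String = "/<yourpath>/employees.parquet"
val dept_Location: String = "/<yourpath>/departments.parquet"

// Read back the Parquet files written earlier in the article.
val empDF: DataFrame = spark.read.parquet(emp_Location)
val deptDF: DataFrame = spark.read.parquet(dept_Location)

// Enable Hyperspace so the Spark optimizer can pick up available indexes.
spark.enableHyperspace

// Create a covering index on the employee data.
// "empIndex" and its columns are illustrative choices, not taken from this diff.
val hs = new Hyperspace(spark)
hs.createIndex(empDF, IndexConfig("empIndex", indexedColumns = Seq("deptId"), includedColumns = Seq("empName")))
```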

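The last two hunks append extra departments to `dept_Location`, which leaves any index built on that data stale. A short sketch of the follow-up step, assuming an index named "deptIndex" was created earlier (the name is hypothetical) and Hyperspace's `refreshIndex` API:

```scala
import spark.implicits._
import com.microsoft.hyperspace._

// Append the new departments to the renamed dept_Location path, as in the diff.
val extraDeptData = Seq((60, "Human Resources", "San Francisco"))
  .toDF("deptId", "deptName", "location")
extraDeptData.write.mode("append").parquet(dept_Location)

// Rebuild the (hypothetical) "deptIndex" so it reflects the appended rows.
val hs = new Hyperspace(spark)
hs.refreshIndex("deptIndex")
```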