Commit 0d3f311

Merge pull request #234930 from Rodrigossz/main
Fixing Spark ObjectId code
2 parents 6c32c6c + 53b3639 commit 0d3f311

1 file changed: +16 -16 lines changed

articles/cosmos-db/analytical-store-introduction.md

Lines changed: 16 additions & 16 deletions
@@ -4,7 +4,7 @@ description: Learn about Azure Cosmos DB transactional (row-based) and analytica
 author: Rodrigossz
 ms.service: cosmos-db
 ms.topic: conceptual
-ms.date: 03/24/2022
+ms.date: 04/18/2023
 ms.author: rosouz
 ms.custom: seo-nov-2020, devx-track-azurecli, ignite-2022
 ms.reviewer: mjbrown
@@ -446,24 +446,24 @@ the MongoDB `_id` field is fundamental to every collection in MongoDB and origin
 
 ###### Working with the MongoDB `_id` field in Spark
 
+The example below works on Spark 2.x and 3.x versions:
+
 ```Python
-import org.apache.spark.sql.types._
-val simpleSchema = StructType(Array(
-    StructField("_id", StructType(Array(StructField("objectId",BinaryType,true)) ),true),
-    StructField("id", StringType, true)
-  ))
-
-df = spark.read.format("cosmos.olap")\
-.option("spark.synapse.linkedService", "<enter linked service name>")\
-.option("spark.cosmos.container", "<enter container name>")\
-.schema(simpleSchema)
-.load()
+val df = spark.read.format("cosmos.olap").option("spark.synapse.linkedService", "xxxx").option("spark.cosmos.container", "xxxx").load()
 
-df.select("id", "_id.objectId").show()
-```
+val convertObjectId = udf((bytes: Array[Byte]) => {
+val builder = new StringBuilder
 
-> [!NOTE]
-> This workaround was designed to work with Spark 2.4.
+for (b <- bytes) {
+builder.append(String.format("%02x", Byte.box(b)))
+}
+builder.toString
+}
+)
+
+val dfConverted = df.withColumn("objectId", col("_id.objectId")).withColumn("convertedObjectId", convertObjectId(col("_id.objectId"))).select("id", "objectId", "convertedObjectId")
+display(dfConverted)
+```
 
 ###### Working with the MongoDB `_id` field in SQL
 
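
Reviewer note: the `convertObjectId` UDF added in this commit appears to render each byte of the binary `_id.objectId` value as two lowercase hex digits, which yields the familiar 24-character ObjectId string. The block below is a minimal sketch of that same conversion in plain Python, the language named on the article's code fence. The helper name `object_id_to_hex` and the sample value are hypothetical and are not part of the commit.

```python
def object_id_to_hex(raw: bytes) -> str:
    """Render raw ObjectId bytes as a lowercase hex string.

    Hypothetical helper for illustration only; it mirrors the per-byte
    "%02x" formatting loop in the commit's Scala UDF.
    """
    # Each byte becomes exactly two hex digits, zero-padded on the left.
    return "".join(f"{b:02x}" for b in raw)


# Made-up 12-byte sample value, not taken from the article.
sample = bytes.fromhex("507f1f77bcf86cd799439011")
print(object_id_to_hex(sample))
```

In a PySpark notebook, a function like this could presumably be wrapped with `pyspark.sql.functions.udf` and applied to `col("_id.objectId")`. That wiring is an assumption and is not shown in the commit.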
