sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md
@@ -112,12 +112,13 @@ Used to influence the json serialization/deserialization behavior
|`spark.cosmos.serialization.dateTimeConversionMode`|`Default`| The date/time conversion mode (`Default`, `AlwaysEpochMilliseconds`, `AlwaysEpochMillisecondsWithSystemDefaultTimezone`). With `Default` the standard Spark 3.* behavior is used (`java.sql.Date`/`java.time.LocalDate` are converted to EpochDay, `java.sql.Timestamp`/`java.time.Instant` are converted to MicrosecondsFromEpoch). With `AlwaysEpochMilliseconds` the same behavior as the Cosmos DB connector for Spark 2.4 is applied - `java.sql.Date`, `java.time.LocalDate`, `java.sql.Timestamp` and `java.time.Instant` are converted to MillisecondsFromEpoch. The behavior of `AlwaysEpochMillisecondsWithSystemDefaultTimezone` is identical to `AlwaysEpochMilliseconds`, except that it assumes the system default time zone / Spark session time zone (specified via `spark.sql.session.timezone`) instead of UTC when the date/time to be parsed has no explicit time zone. |
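
For illustration, a minimal sketch of how this option could be set on a batch read - the account endpoint, key, database, and container values are placeholders, not real settings:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: opt into the Spark 2.4-compatible date/time conversion
// behavior on a batch read. All account values below are placeholders.
val spark = SparkSession.builder().appName("cosmos-read").getOrCreate()

val df = spark.read
  .format("cosmos.oltp")
  .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
  .option("spark.cosmos.accountKey", "<account-key>")
  .option("spark.cosmos.database", "<database>")
  .option("spark.cosmos.container", "<container>")
  // java.sql.Date/Timestamp etc. are converted to MillisecondsFromEpoch
  .option("spark.cosmos.serialization.dateTimeConversionMode", "AlwaysEpochMilliseconds")
  .load()
```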
#### Change feed (only for Spark Streaming using the `cosmos.oltp.changeFeed` data source, which is read-only) configuration
|`spark.cosmos.changeFeed.startFrom`|`Beginning`| Change feed start-from settings (`Now`, `Beginning`, or a point in time (UTC), for example `2020-02-10T14:15:03`) - the default value is `Beginning`. If the write config contains a `checkpointLocation` and any checkpoints exist, the stream always continues independent of the `spark.cosmos.changeFeed.startFrom` settings - you need to change the `checkpointLocation` or delete the checkpoints to restart the stream if that is the intention. |
|`spark.cosmos.changeFeed.mode`|`Incremental/LatestVersion`| Change feed mode (`Incremental/LatestVersion` or `FullFidelity/AllVersionsAndDeletes`) - NOTE: `FullFidelity/AllVersionsAndDeletes` is currently experimental. It requires that the subscription/account has been enabled for the private preview, and there are known breaking changes ahead for `FullFidelity/AllVersionsAndDeletes` (the schema of the returned documents). It is recommended to use `FullFidelity/AllVersionsAndDeletes` only for non-production scenarios at this point. |
|`spark.cosmos.changeFeed.itemCountPerTriggerHint`| None (process all available data in first micro-batch) | Approximate maximum number of items read from the change feed for each micro-batch/trigger. If not set, all available data in the change feed will be processed in the first micro-batch, which could overload client resources (especially memory) - so choosing a value that caps resource consumption in the Spark executors is advisable. A reasonable value is usually at least in the hundreds of thousands or single-digit millions. |
|`spark.cosmos.changeFeed.batchCheckpointLocation`| None | Can be used to generate checkpoints when using change feed queries in batch mode - so the next iteration proceeds where the previous one left off. |
|`spark.cosmos.changeFeed.performance.monitoring.enabled`|`true`| A flag indicating whether to enable change feed performance monitoring. When enabled, custom task metrics are tracked internally and used to dynamically tune the change feed micro-batch size. |
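
For illustration, a minimal sketch of a structured-streaming read from the change feed using these settings - the account values and paths are placeholders, and `spark` is an existing `SparkSession` as in the earlier sketch:

```scala
// Sketch only: stream the change feed from a fixed point in time (UTC),
// capping each micro-batch at roughly 500,000 items. All account values
// and paths are placeholders.
val changes = spark.readStream
  .format("cosmos.oltp.changeFeed")
  .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
  .option("spark.cosmos.accountKey", "<account-key>")
  .option("spark.cosmos.database", "<database>")
  .option("spark.cosmos.container", "<container>")
  .option("spark.cosmos.changeFeed.startFrom", "2020-02-10T14:15:03")
  .option("spark.cosmos.changeFeed.itemCountPerTriggerHint", "500000")
  .load()

changes.writeStream
  .format("parquet")
  .option("path", "/tmp/cosmos-changes")
  // Once checkpoints exist here, the stream resumes from them and the
  // startFrom setting above is ignored (see the table).
  .option("checkpointLocation", "/tmp/cosmos-changes-checkpoint")
  .start()
```

`spark.cosmos.changeFeed.batchCheckpointLocation` is the batch-mode counterpart; a sketch under the same placeholder assumptions:

```scala
// Sketch only: a batch change feed read that persists a checkpoint so the
// next batch run continues where this one left off.
val batchChanges = spark.read
  .format("cosmos.oltp.changeFeed")
  .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
  .option("spark.cosmos.accountKey", "<account-key>")
  .option("spark.cosmos.database", "<database>")
  .option("spark.cosmos.container", "<container>")
  .option("spark.cosmos.changeFeed.batchCheckpointLocation", "/tmp/cosmos-changes-batch-checkpoint")
  .load()
```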
sdk/cosmos/azure-cosmos-spark_3_2-12/src/test/scala/com/azure/cosmos/spark/CosmosPartitionPlannerSpec.scala
```diff
@@ -487,7 +487,7 @@ class CosmosPartitionPlannerSpec extends UnitSpec {
     calculate(1).endLsn.get shouldEqual 2150
   }

-  it should "calculateEndLsn should distribute rate based on metrics with readLimit and multiple partitions" in {
+  it should "calculateEndLsn should distribute rate based on metrics with readLimit" in {
```