add singleton to track bucket/workspace information in scala by jdries · Pull Request #443 · Open-EO/openeo-geotrellis-extensions

jdries · 2025-05-07T07:08:04Z

pvbouwel

I like the idea of the PR, but implementation-wise there are some unknowns and I would do some things differently.

pvbouwel · 2025-12-19T17:42:18Z

openeo-geotrellis/src/main/scala/org/openeo/geotrellis/MultiClientRangeReaderProvider.scala

    val isCloudFerro = s3Endpoint != null &&
      (s3Endpoint.toLowerCase.contains("cloudferro") || s3Endpoint.toLowerCase.endsWith(".dataspace.copernicus.eu"))

+    val maybeWorkspace = WorkspaceRepository.get().getWorkspaceByBucket(s3Uri.getBucket)


I don't believe this logic is best placed here. CreoS3Utils has helpers like getCreoS3Client, so if we solve the problem here in MultiClientRangeReader we'd have to solve it in other places as well. Ideally there is one factory for creating S3 clients. There are, however, multiple problems with getCreoS3Client and its consumption:

1. It assumes the region is known or defaults to 'RegionOne'.
   a. A hard-coded default does not make sense imho.
   b. The placeholder value would cause the endpoint to be resolved via the environment variable SWIFT_URL, but it would not change the region. If SigV4 checking is done strictly, that would lead to authorization failures.

2. On the consumption side, it is often called in the same file without specifying an argument.

I like the idea of a Workspace Repository. Python could provision the workspaces info that is potentially encountered.

So a ClientFactory method that just takes a bucket name (or an S3 URI from which it can extract the bucket name), uses the workspace repository to resolve the bucket to S3 details (region + endpoint, or region + profile), and potentially falls back to the legacy resolution would be usable in both places.
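To make the suggestion concrete, here is a rough sketch of such a factory. Everything in it is hypothetical: `S3Details`, `BucketResolver`, `MapBackedResolver` and `S3ClientDetailsFactory` are illustrative names, not types from this PR or from CreoS3Utils, and the legacy resolution is passed in as a plain function. It only resolves the connection details; building the actual S3 client from them is left out.

```scala
import java.net.URI

// Hypothetical value type: the S3 details a client needs for one bucket.
final case class S3Details(region: String, endpoint: Option[String], profile: Option[String])

// Hypothetical abstraction over the workspace repository lookup.
trait BucketResolver {
  def resolve(bucket: String): Option[S3Details]
}

// Simple in-memory resolver, only here to keep the sketch self-contained.
final class MapBackedResolver(entries: Map[String, S3Details]) extends BucketResolver {
  def resolve(bucket: String): Option[S3Details] = entries.get(bucket)
}

object S3ClientDetailsFactory {
  // Extracts the bucket from a URI of the form s3://bucket/key.
  // getAuthority is used instead of getHost because getHost returns null
  // for names that are not valid hostnames (for example with underscores).
  def bucketOf(s3Uri: String): String = {
    val uri = URI.create(s3Uri)
    require(uri.getScheme == "s3", s"Not an S3 URI: $s3Uri")
    uri.getAuthority
  }

  // Workspace repository first, legacy resolution second.
  // Returning an Option avoids inventing a default region when nothing matches.
  def detailsFor(
      bucket: String,
      repository: BucketResolver,
      legacy: String => Option[S3Details]
  ): Option[S3Details] =
    repository.resolve(bucket).orElse(legacy(bucket))
}
```

With this shape, both MultiClientRangeReaderProvider and the existing helpers could call `detailsFor` and no caller would have to know about a hard-coded region.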

pvbouwel · 2025-12-19T17:58:15Z

openeo-geotrellis/src/main/scala/org/openeo/workspace/WorkspaceRepository.scala

+     */
+
+    private val workspaces = scala.collection.mutable.Map[String, WorkspaceConfig]()
+    private val workspacesByBucket = scala.collection.mutable.Map[String, WorkspaceConfig]()


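Is there a particular reason not to just have a Map[String, String] for the bucket lookup? With a 1-to-1 relationship between workspace and bucket name, resolving by bucket is just 2 lookups (bucket name to workspace ID, then workspace ID to WorkspaceConfig), and the WorkspaceConfig won't be duplicated.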
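A minimal sketch of what I mean, under assumptions: `WorkspaceConfigSketch` stands in for the PR's WorkspaceConfig (its real fields are not visible in this diff), `TwoLookupRepository` is a made-up name, and I swapped `mutable.Map` for `TrieMap` so that individual reads and writes are thread-safe.

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for the PR's WorkspaceConfig; the fields are assumptions.
final case class WorkspaceConfigSketch(workspaceId: String, bucket: String, region: String)

final class TwoLookupRepository {
  // The config is stored exactly once, keyed by workspace ID.
  private val workspaces = TrieMap.empty[String, WorkspaceConfigSketch]
  // The bucket index only stores the workspace ID, not a second copy of the config.
  private val workspaceIdByBucket = TrieMap.empty[String, String]

  // Note: the two puts are individually thread-safe but not atomic together.
  def register(config: WorkspaceConfigSketch): Unit = {
    workspaces.put(config.workspaceId, config)
    workspaceIdByBucket.put(config.bucket, config.workspaceId)
  }

  def byId(workspaceId: String): Option[WorkspaceConfigSketch] =
    workspaces.get(workspaceId)

  // Two lookups: bucket name -> workspace ID -> config.
  def byBucket(bucket: String): Option[WorkspaceConfigSketch] =
    workspaceIdByBucket.get(bucket).flatMap(workspaces.get)
}
```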

pvbouwel · 2025-12-19T18:01:06Z

openeo-geotrellis/src/main/scala/org/openeo/workspace/WorkspaceRepository.scala

+    private val workspaces = scala.collection.mutable.Map[String, WorkspaceConfig]()
+    private val workspacesByBucket = scala.collection.mutable.Map[String, WorkspaceConfig]()
+
+    def registerBucketDetails( workspaceId:String, bucketName: String,


What will call these register functions? I guess the singleton is per JVM, so they would need to be called on the driver and on all executors?

For K8s we can easily make sure a file is present in the execution environment. Is the same true for YARN? Otherwise the loading can be done on initialization of the singleton.

Are sync jobs a thing on YARN? Because for jobs I guess extra files can be handed over at spark-submit time.
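For illustration, loading on initialization could look roughly like the sketch below. All of it is assumed: the environment variable name `OPENEO_WORKSPACE_CONFIG`, the comma-separated `workspaceId,bucket,region` line format, and the names `BucketEntry` and `FileBackedWorkspaces` are invented for the example. The point is only that a `lazy val` on a Scala object is initialized once per JVM on first access, so the driver and each executor would pick up the file without an explicit register call.

```scala
import scala.io.Source

// Hypothetical entry type; the real WorkspaceConfig fields are not shown in the diff.
final case class BucketEntry(workspaceId: String, bucket: String, region: String)

object FileBackedWorkspaces {
  // Hypothetical environment variable pointing at the provisioned file.
  val ConfigPathEnv = "OPENEO_WORKSPACE_CONFIG"

  // Assumed format: one "workspaceId,bucket,region" entry per line.
  // Blank lines and lines starting with '#' are ignored;
  // lines without exactly three fields are silently skipped.
  def parse(lines: Iterator[String]): Map[String, BucketEntry] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map(_.split(",").map(_.trim))
      .collect { case Array(id, bucket, region) => bucket -> BucketEntry(id, bucket, region) }
      .toMap

  def load(path: String): Map[String, BucketEntry] = {
    val source = Source.fromFile(path, "UTF-8")
    // toMap inside parse forces the iterator before the file is closed.
    try parse(source.getLines()) finally source.close()
  }

  // Evaluated once per JVM on first access; empty when the variable is unset.
  lazy val byBucket: Map[String, BucketEntry] =
    sys.env.get(ConfigPathEnv).map(load).getOrElse(Map.empty)
}
```

Whether the file can be made available on YARN the same way as on K8s is exactly the open question above; the sketch does not answer it.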

add singleton to track bucket/workspace information in scala

ba58873

#441

pvbouwel requested changes Dec 19, 2025

jdries wants to merge 1 commit into develop from workspacerepo
