
Run it from local Spark instance on Eclipse #49

@santooudnur

Description


Is it possible to access IBM Cloud Object Storage from outside the Apache Spark instance in Bluemix?

Basically, I am trying to use this library to access COS objects from a Scala program running on local Apache Spark. I am trying to connect to the Cloud Object Storage instance in my Bluemix account and read the temperatureUS.csv object in the bucket tests from Scala code.

My test code is attached here:
SparkCosS.txt
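
For context, the Stocator configuration in my test code follows the project README. A minimal sketch of the setup (with placeholder credentials; the `fs.cos.<service>.*` property names are taken from the Stocator README, so treat them as my assumption rather than an exact copy of the attachment):

    import org.apache.spark.sql.SparkSession

    object SparkCosSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("cos-test")
          .getOrCreate()

        val hconf = spark.sparkContext.hadoopConfiguration
        // Register the cos:// scheme and route it to Stocator's COS client
        hconf.set("fs.stocator.scheme.list", "cos")
        hconf.set("fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
        hconf.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
        hconf.set("fs.stocator.cos.scheme", "cos")
        // IAM credentials for the service named "myCos" in the URI below
        hconf.set("fs.cos.myCos.iam.api.key", "<apiKey>")
        hconf.set("fs.cos.myCos.iam.service.id", "<resourceInstanceId>")
        hconf.set("fs.cos.myCos.endpoint", "https://s3-api.us-geo.objectstorage.softlayer.net")

        // Bucket "tests" on service "myCos"
        val df = spark.read
          .option("header", "true")
          .csv("cos://tests.myCos/temperatureUS.csv")
        df.show()
      }
    }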
I always get the following error:

18/01/15 19:29:50 DEBUG request: Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: null; Status Code: 403; Error Code: 403 Forbidden; Request ID: 8cee1d0b-c4d8-4800-a75f-06ff49e76a5b), S3 Extended Request ID: null
18/01/15 19:29:50 DEBUG COSAPIClient: Not found cos://tests.myCos/temperatureUS.csv
18/01/15 19:29:50 WARN COSAPIClient: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 8cee1d0b-c4d8-4800-a75f-06ff49e76a5b), S3 Extended Request ID: null
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: cos://tests.myCos/temperatureUS.csv;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:626)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:349)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
at test.SparkCosFinalSL$.main(SparkCosSL.scala:86)
at test.SparkCosFinalSL.main(SparkCosSL.scala)

However, I am able to connect to the service through the Java API:

    // Point the SDK at the Bluemix IAM token endpoint before creating the client
    SDKGlobalConfiguration.IAM_ENDPOINT = "https://iam.bluemix.net/oidc/token";

    String bucketName = "testb5e78bd1988d453f81ec11cbfced949a"; // "<bucketName>"
    String api_key = "L_-uMLV9AU-ZBWr0BE6JmiHMYFqsORXndMmfrpaqJIgG"; // "<apiKey>"
    String service_instance_id = "crn:v1:bluemix:public:cloud-object-storage:global:a/647b189897a37a7ac4dbf0a3ef43fc42:866ec777-5c98-4e1c-b2bf-e5d0b1d13694::"; // "<resourceInstanceId>"
    String endpoint_url = "https://s3-api.us-geo.objectstorage.softlayer.net";
    String location = "us-geo"; // "us"

    System.out.println("Current time: " + new Timestamp(System.currentTimeMillis()));
    _s3Client = createClient(api_key, service_instance_id, endpoint_url, location);

    // Both calls succeed against the same bucket and service instance
    listObjects(bucketName, _s3Client);
    listBuckets(_s3Client);
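
The createClient helper follows the sample from the IBM COS Java SDK documentation. A minimal sketch of it (the com.ibm.oauth / com.amazonaws import paths are from the 1.x SDK and are my assumption here, not copied from my actual code):

    import com.ibm.oauth.BasicIBMOAuthCredentials;
    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.auth.AWSCredentials;
    import com.amazonaws.auth.AWSStaticCredentialsProvider;
    import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public static AmazonS3 createClient(String apiKey, String serviceInstanceId,
                                        String endpointUrl, String location) {
        // IAM credentials: the API key is exchanged for a token at IAM_ENDPOINT
        AWSCredentials credentials = new BasicIBMOAuthCredentials(apiKey, serviceInstanceId);
        ClientConfiguration clientConfig = new ClientConfiguration().withRequestTimeout(5000);
        clientConfig.setUseTcpKeepAlive(true);
        return AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .withEndpointConfiguration(new EndpointConfiguration(endpointUrl, location))
                .withPathStyleAccessEnabled(true)
                .withClientConfiguration(clientConfig)
                .build();
    }

So the same API key and service instance ID are accepted when they go through BasicIBMOAuthCredentials, but the Stocator path returns 403 Forbidden.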

Let me know if I have missed anything.

The only other observation I have when running Spark from Eclipse is that the native Hadoop library is not loaded:

18/01/15 19:42:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I would appreciate a quick response.
