
Commit df14e14 (parent 2ddcfd4)

update readme file for R

File tree: 1 file changed

r/sparkr/README.md (47 additions, 23 deletions)
@@ -1,23 +1,23 @@
# ibmos2sparkR

The package sets Spark Hadoop configurations for connecting to IBM Bluemix Object Storage and Softlayer Account Object Storage instances. This package uses the new [stocator](https://github.com/SparkTC/stocator) driver, which implements the `swift2d` protocol, and is available on the latest IBM Apache Spark Service instances (and through IBM Data Science Experience).

Using the `stocator` driver connects your Spark executor nodes directly to your data in object storage. This is an optimized, high-performance method to connect Spark to your data. All IBM Apache Spark kernels are instantiated with the `stocator` driver in the Spark kernel's classpath. You can also run this locally by installing the [stocator driver](https://github.com/SparkTC/stocator) and adding it to your local Apache Spark kernel's classpath.
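A minimal sketch of such a local setup (assumptions: SparkR >= 2.0 and a stocator jar downloaded to a placeholder path; `sparkJars` is SparkR's session argument for adding jars to the classpath):

    # Sketch: start a local SparkR session with the stocator jar on the classpath.
    # The jar path and version below are placeholders, not part of this package.
    library(SparkR)

    sparkR.session(
        master    = "local[*]",
        appName   = "ibmos2sparkR-local",
        sparkJars = "/path/to/stocator-1.0.x-jar-with-dependencies.jar"
    )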

This package expects a SparkContext instantiated by SparkR. It has been tested to work with the IBM Spark service in R notebooks on IBM DSX, though it should work with other Spark installations that utilize the [swift2d/stocator](https://github.com/SparkTC/stocator) protocol.
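In DSX notebooks the SparkContext is already available as `sc`. Outside DSX you can create it yourself; a minimal sketch, assuming the pre-2.0 SparkR API where `sparkR.init()` returns the context directly:

    # Sketch (SparkR < 2.0): create the SparkContext that this package's
    # constructors (bluemix(), softlayer(), CloudObjectStorage()) take as sc.
    library(SparkR)
    sc <- sparkR.init(appName = "ibmos2sparkR-example")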

## Installation

    library(devtools)
    devtools::install_url("https://github.com/ibm-cds-labs/ibmos2spark/archive/<version>.zip", subdir = "r/sparkr/")
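For instance, to install the `0.0.7` tagged release mentioned below:

    # Example: install a specific tagged release (0.0.7)
    devtools::install_url("https://github.com/ibm-cds-labs/ibmos2spark/archive/0.0.7.zip", subdir = "r/sparkr/")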
@@ -27,15 +27,39 @@ where `version` should be a tagged release, such as `0.0.7`. (If you're daring,

## Usage

The usage of this package depends on *from where* your Object Storage instance was created. This package is intended to connect to IBM's Object Storage instances obtained from Bluemix or Data Science Experience (DSX) or from a separate account on IBM Softlayer. It also supports IBM Cloud Object Storage (COS). The instructions below show how to connect to either type of instance.

The connection setup is essentially the same; the difference is how you deliver the credentials. If your Object Storage was created with Bluemix/DSX, you can obtain your account credentials in the form of a list with a few clicks on the side-tab within a DSX Jupyter notebook. If your Object Storage was created with a Softlayer account, each part of the credentials will be found as text that you can copy and paste into the example code below.

### Cloud Object Storage

    library(ibmos2sparkR)
    configurationName = "bluemixO123"

    # In DSX notebooks, the "insert to code" will insert this credentials list for you
    credentials <- list(
        accessKey = "123",
        secretKey = "123",
        endpoint = "https://s3-api.objectstorage.....net/"
    )

    cos <- CloudObjectStorage(sparkContext=sc, credentials=credentials, configurationName=configurationName)
    bucketName <- "bucketName"
    fileName <- "test.csv"
    url <- cos$url(bucketName, fileName)

    invisible(sparkR.session(appName = "SparkSession R"))

    df.data.1 <- read.df(url,
                         source = "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat",
                         header = "true")
    head(df.data.1)
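Data can also be written back through the same URL scheme. A minimal sketch, assuming the bucket is writable and reusing the CSV source above:

    # Sketch: write the dataframe back to the bucket via a cos$url() path (assumes write access)
    outUrl <- cos$url(bucketName, "test_out.csv")
    write.df(df.data.1,
             path = outUrl,
             source = "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat",
             mode = "overwrite")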

### Bluemix / Data Science Experience

@@ -45,11 +69,11 @@

    # In DSX notebooks, the "insert to code" will insert this credentials list for you
    creds = list(
        auth_url="https://identity.open.softlayer.com",
        region="dallas",
        project_id = "XXXXX",
        user_id="XXXXX",
        password="XXXXX")

    bmconfig = bluemix(sparkcontext=sc, name=configurationname, credentials = creds)

    container = "my_container"
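From here, reading an object follows the same pattern as the Softlayer example below. A brief sketch, assuming Spark >= 2.0.0:

    # Sketch: read a CSV object through the Bluemix configuration (Spark >= 2.0.0)
    object = "my_data.csv"
    data = read.df(bmconfig$url(container, object), source = "com.databricks.spark.csv", header = "true")
    head(data)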
@@ -67,24 +91,24 @@

    library(ibmos2sparkR)
    configurationname = "softlayerOScon" # can be any name you like (allows for multiple configurations)

    slconfig = softlayer(sparkcontext=sc,
        name=configurationname,
        auth_url="https://identity.open.softlayer.com",
        tenant = "XXXXX",
        username="XXXXX",
        password="XXXXX"
    )

    container = "my_container"
    object = "my_data.csv"

    data <- read.df(sqlContext, slconfig$url(container, object), source = "com.databricks.spark.csv", header = "true")

    # OR, for Spark >= 2.0.0

    data = read.df(slconfig$url(container, object), source = "com.databricks.spark.csv", header = "true")

## License

Copyright 2016 IBM Cloud Data Services