Commit d6bba18

PLUGIN-72 SAP Hana database Batch source, sink, action and post-action plugins
1 parent 4bfdb6d commit d6bba18

25 files changed, 1225 insertions(+), 2 deletions(-)

README.md

Lines changed: 10 additions & 1 deletion
@@ -13,17 +13,26 @@ mvn clean test \
 ```
 Note that you must set the Aurora MySQL and Aurora PostgreSQL properties to real values before running the tests.
 ## Setup Local Environment
-MySQL, PostgreSQL, MSSQL, and DB2 use prebuilt images.
+MySQL, PostgreSQL, MSSQL, DB2, and SAP HANA use prebuilt images.
 
 The Oracle DB image should be built separately.
 
+Note that you must log in to a Docker account to pull the SAP HANA image.
+An account can be created [here](https://hub.docker.com/signup).
+Also note that SAP HANA is sensitive to some CPU instructions.
+The CPU model "host-passthrough" or similar may be required when running inside a VM.
+SAP HANA requires that the DB password be provided through a URL.
+The convenience script ```docker-compose/db-plugins-env/saphana-password-server.sh```
+is provided for this purpose.
+
 Netezza requires VMware Player to run the Netezza emulator.
 
 * [Install Docker Compose](https://docs.docker.com/compose/install/)
 * Build local docker images
 * [Build Oracle DB docker image version 12.1.0.2-ee](https://github.com/oracle/docker-images/tree/master/OracleDatabase/SingleInstance)
 * Start docker environment by running commands:
 ```bash
+bash docker-compose/db-plugins-env/saphana-password-server.sh &
 cd docker-compose/db-plugins-env/
 docker-compose up -d
 ```

docker-compose/db-plugins-env/docker-compose.yml

Lines changed: 22 additions & 1 deletion
@@ -56,4 +56,25 @@ services:
     environment:
       - ORACLE_SID=cdap
       - ORACLE_PDB=mydb
-      - ORACLE_PWD=123Qwe123
+      - ORACLE_PWD=123Qwe123
+
+  saphana:
+    image: store/saplabs/hanaexpress:2.00.040.00.20190729.1
+    hostname: hxehost
+    ports:
+      - 39017:39017
+      - 39013:39013
+    ulimits:
+      nproc: 65535
+      nofile:
+        soft: 1048576
+        hard: 1048576
+    sysctls:
+      - kernel.shmmax=1073741824
+      - net.ipv4.ip_local_port_range=60000 65535
+      - kernel.shmmni=524288
+      - kernel.shmall=8388608
+    extra_hosts:
+      # Alter this if running on non-Linux machine
+      - "host:172.17.0.1"
+    command: --agree-to-sap-license --passwords-url http://host:1500
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+#!/usr/bin/env bash
+# Copyright © 2019 Cask Data, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not
+# use this file except in compliance with the License. You may obtain a copy of
+# the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations under
+# the License.
+
+
+# Tested with Ubuntu 18.04
+echo -e "HTTP/1.1 200 OK\n\n {\n \"master_password\" : \"SAPhxe123\"\n } " | nc -q 1 -l 0.0.0.0 1500
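The one-line server above sends bare `\n` line endings and no `Content-Length` header. HTTP/1.1 specifies CRLF line endings, so a stricter client could reject the reply. The following is a minimal sketch of a more explicit response builder, assuming the same JSON shape as the script. The function name `hana_password_response` and the extra headers are illustrative and not part of the commit.

```bash
#!/usr/bin/env bash
# Sketch only: builds the HTTP response served to the SAP HANA container.
# hana_password_response is a hypothetical helper; the JSON shape mirrors
# saphana-password-server.sh, the headers are an assumption.
# The password is assumed to be ASCII and free of quotes or backslashes.
hana_password_response() {
  local password="$1"
  local body
  # Command substitution strips the trailing newline, so the length computed
  # below matches exactly what is written after the headers.
  body=$(printf '{\n  "master_password" : "%s"\n}\n' "$password")
  printf 'HTTP/1.1 200 OK\r\n'
  printf 'Content-Type: application/json\r\n'
  printf 'Content-Length: %d\r\n' "${#body}"
  printf 'Connection: close\r\n'
  printf '\r\n'
  printf '%s' "$body"
}
```

It could be piped into the same listener the script uses, for example `hana_password_response "SAPhxe123" | nc -q 1 -l 0.0.0.0 1500`. As in the original script, the `nc` flags depend on the netcat variant installed.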

pom.xml

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@
     <module>netezza-plugin</module>
     <module>aurora-mysql-plugin</module>
     <module>aurora-postgresql-plugin</module>
+    <module>saphana-plugin</module>
   </modules>
 
   <licenses>
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# SAP HANA Action
+
+
+Description
+-----------
+Action that runs a SAP HANA command.
+
+
+Use Case
+--------
+The action can be used whenever you want to run a SAP HANA command before or after a data pipeline.
+For example, you may want to run a SQL UPDATE command on a database before the pipeline source pulls data from tables.
+
+
+Properties
+----------
+**Driver Name:** Name of the JDBC driver to use.
+
+**Database Command:** Database command to execute.
+
+**Host:** Host that SAP HANA is running on.
+
+**Port:** Port that SAP HANA is running on.
+
+**Database:** SAP HANA database name.
+
+**Username:** User identity for connecting to the specified database.
+
+**Password:** Password to use to connect to the specified database.
+
+**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
+will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
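To make the Host, Port and Database properties concrete, here is a small sketch of how they might combine into a connection URL. The SAP HANA JDBC driver accepts URLs of the form `jdbc:sap://<host>:<port>/?<options>`, with `databaseName` as one option. Whether the plugin assembles its URL exactly this way is an assumption, and `hana_jdbc_url` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: hana_jdbc_url is a hypothetical helper, and the URL layout is
# an assumption about how Host, Port and Database are combined.
hana_jdbc_url() {
  local host="$1" port="$2" database="$3"
  printf 'jdbc:sap://%s:%s/?databaseName=%s\n' "$host" "$port" "$database"
}

# Placeholder values: 39017 is one of the ports published in
# docker-compose.yml, and the database name HXE is assumed for illustration.
hana_jdbc_url localhost 39017 HXE
```

Connection Arguments are described as being passed to the driver separately, so they are deliberately left out of the URL in this sketch.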
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# SAP HANA Batch Sink
+
+
+Description
+-----------
+
+This sink is used whenever you need to write to a SAP HANA table.
+Suppose you periodically build a recommendation model for products on your online store.
+The model is stored in a FileSet and you want to export the contents
+of the FileSet to a SAP HANA table where it can be served to your users.
+
+Column names are auto-detected from the input schema.
+
+
+Use Case
+--------
+This sink is used whenever you need to write to a SAP HANA table.
+Suppose you periodically build a recommendation model for products on your online store.
+The model is stored in a FileSet and you want to export the contents
+of the FileSet to a SAP HANA table where it can be served to your users.
+
+Column names are auto-detected from the input schema.
+
+
+
+Properties
+----------
+
+**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc.
+
+**Driver Name:** Name of the JDBC driver to use.
+
+**Table Name:** Name of the table to export to.
+
+**Host:** Host that SAP HANA is running on.
+
+**Port:** Port that SAP HANA is running on.
+
+**Database:** SAP HANA database name.
+
+**Username:** User identity for connecting to the specified database.
+
+**Password:** Password to use to connect to the specified database.
+
+**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
+will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
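The sink documentation says column names are taken from the input schema. As a rough illustration of what that implies, the sketch below derives a parameterized INSERT statement from a table name and a list of field names. `insert_statement` is a hypothetical helper, and the statement the plugin actually issues may differ.

```bash
#!/usr/bin/env bash
# Sketch only: insert_statement is a hypothetical helper, not plugin code.
# Usage: insert_statement TABLE COLUMN [COLUMN...]
# Identifiers are used verbatim; quoting and escaping are out of scope here.
insert_statement() {
  local table="$1"
  shift
  local columns="" placeholders="" column
  for column in "$@"; do
    if [ -z "$columns" ]; then
      columns="$column"
      placeholders="?"
    else
      columns="$columns, $column"
      placeholders="$placeholders, ?"
    fi
  done
  # One "?" placeholder per column, to be bound from each input record.
  printf 'INSERT INTO %s (%s) VALUES (%s)\n' "$table" "$columns" "$placeholders"
}
```

For a hypothetical input schema with fields `product_id` and `score`, `insert_statement recommendations product_id score` prints `INSERT INTO recommendations (product_id, score) VALUES (?, ?)`.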
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# SAP HANA Batch Source
+
+
+Description
+-----------
+
+Reads from a SAP HANA database using a configurable SQL query.
+Outputs one record for each row returned by the query.
+
+Use Case
+--------
+The source is used whenever you need to read from a SAP HANA database. For example, you may want
+to create daily snapshots of a database table by using this source and writing to
+a TimePartitionedFileSet.
+
+
+
+Properties
+----------
+**Reference Name:** Name used to uniquely identify this source for lineage, annotating metadata, etc.
+
+**Driver Name:** Name of the JDBC driver to use.
+
+**Host:** Host that SAP HANA is running on.
+
+**Port:** Port that SAP HANA is running on.
+
+**Database:** SAP HANA database name.
+
+**Import Query:** The SELECT query to use to import data from the specified table.
+You can specify an arbitrary number of columns to import, or import all columns using \*. The query should
+contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'.
+The '$CONDITIONS' string will be replaced by 'splitBy' field limits specified by the bounding query.
+The '$CONDITIONS' string is not required if numSplits is set to one.
+
+**Bounding Query:** Bounding Query should return the min and max of the values of the 'splitBy' field.
+For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one.
+
+**Split-By Field Name:** Field Name which will be used to generate splits. Not required if numSplits is set to one.
+
+**Number of Splits to Generate:** Number of splits to generate.
+
+**Username:** User identity for connecting to the specified database.
+
+**Password:** Password to use to connect to the specified database.
+
+**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
+will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
+
+**Schema:** The schema of records output by the source. This will be used in place of whatever schema comes
+back from the query. However, it must match the schema that comes back from the query,
+except it can mark fields as nullable and can contain a subset of the fields.
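The interplay of Import Query, Bounding Query, Split-By Field Name and the number of splits can be sketched as follows. Given the min and max returned by the bounding query, the range is cut into roughly equal integer intervals and `$CONDITIONS` is replaced by one predicate per split. `split_queries` is a hypothetical helper for illustration only. The actual boundaries are chosen by the plugin's input format and may differ, and only integer split columns are handled here.

```bash
#!/usr/bin/env bash
# Sketch only: split_queries is a hypothetical helper, not plugin code.
# Usage: split_queries QUERY SPLIT_COLUMN MIN MAX NUM_SPLITS
split_queries() {
  local query="$1" column="$2" min="$3" max="$4" splits="$5"
  local marker='$CONDITIONS'
  local prefix suffix span i lo hi condition

  case "$query" in
    *"$marker"*) ;;
    *)
      # The placeholder is optional only when a single split is requested.
      if [ "$splits" -eq 1 ]; then
        printf '%s\n' "$query"
        return 0
      fi
      echo "query must contain $marker when numSplits > 1" >&2
      return 1
      ;;
  esac

  # Text before and after the first occurrence of the placeholder.
  prefix=${query%%"$marker"*}
  suffix=${query#*"$marker"}
  span=$(( max - min + 1 ))
  i=0
  while [ "$i" -lt "$splits" ]; do
    lo=$(( min + span * i / splits ))
    hi=$(( min + span * (i + 1) / splits ))
    if [ "$i" -eq $(( splits - 1 )) ]; then
      # The last split is closed so that the max value is included.
      condition="$column >= $lo AND $column <= $max"
    else
      condition="$column >= $lo AND $column < $hi"
    fi
    printf '%s%s%s\n' "$prefix" "$condition" "$suffix"
    i=$(( i + 1 ))
  done
}
```

With the example query from the documentation, an `id` range of 1 to 100 and four splits, the first generated query is `SELECT * FROM table WHERE id >= 1 AND id < 26` and the last is `SELECT * FROM table WHERE id >= 76 AND id <= 100`.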
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# SAP HANA Query Post-run Action
+
+
+Description
+-----------
+Runs a SAP HANA query at the end of the pipeline run.
+Can be configured to run only on success, only on failure, or always at the end of the run.
+
+Use Case
+--------
+The action is used whenever you need to run a query at the end of a pipeline run.
+For example, you may have a pipeline that imports data from a database table to
+HDFS files. At the end of the run, you may want to run a query that deletes the data
+that was read from the table.
+
+
+Properties
+----------
+**Run Condition:** When to run the action. Must be 'completion', 'success', or 'failure'. Defaults to 'success'.
+If set to 'completion', the action will be executed regardless of whether the pipeline run succeeded or failed.
+If set to 'success', the action will only be executed if the pipeline run succeeded.
+If set to 'failure', the action will only be executed if the pipeline run failed.
+
+**Driver Name:** Name of the JDBC driver to use.
+
+**Query:** Query to run.
+
+**Host:** Host that SAP HANA is running on.
+
+**Port:** Port that SAP HANA is running on.
+
+**Database:** SAP HANA database name.
+
+**Username:** User identity for connecting to the specified database.
+
+**Password:** Password to use to connect to the specified database.
+
+**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
+will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
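The Run Condition rules described above amount to a small decision table. The sketch below restates them as a function. `should_run` and its arguments are hypothetical and only mirror the documented behaviour, including the default of 'success'.

```bash
#!/usr/bin/env bash
# Sketch only: should_run is a hypothetical helper, not plugin code.
# $1 = run condition: 'completion', 'success' or 'failure' (empty means 'success')
# $2 = 'true' if the pipeline run succeeded, anything else if it failed
# Returns 0 when the action should run, 1 when it should not,
# and 2 for an unknown run condition.
should_run() {
  local condition="${1:-success}" succeeded="$2"
  case "$condition" in
    completion) return 0 ;;
    success)    [ "$succeeded" = "true" ] ;;
    failure)    [ "$succeeded" != "true" ] ;;
    *)
      echo "unknown run condition: $condition" >&2
      return 2
      ;;
  esac
}
```

For instance, `should_run failure false` succeeds, which matches the cleanup-on-failure case, while `should_run success false` does not.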