
Commit f80be7a

[CYB-190] user can specify names of CDP cloud artifacts in example create_datahub_config.sh setup script (#76)
1 parent: e5d8fbd

2 files changed: +141 additions, -72 deletions
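
In short, the setup script no longer infers CDP resources from a shared cluster-name prefix; the caller now lists each resource by name in a properties file and passes that file on the command line. A before/after sketch of the invocation, using the placeholder arguments from the script's old and new usage messages:

```shell script
# before this commit: the second argument was a shared cluster-name prefix
./create_datahub_config.sh <environment_name> <cluster_prefix>
# after this commit: the second argument is a properties file naming each resource
./create_datahub_config.sh <environment_name> <properties_file>
```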

flink-cyber/cyber-jobs/src/main/resources/examples/README.md

Lines changed: 40 additions & 33 deletions
@@ -43,7 +43,7 @@ Install the following services on the CDP Base Cluster.
 
 #### CDP Public Cloud
 
-Provision the resources below in the same CDP Environment and use the same prefix at the beginning of each resource name:
+Provision the resources below in the same CDP Environment:
 
 | Resource Type | Configuration | Basic | Full |
 | -----------------| ------| ---- | ------|
@@ -66,26 +66,26 @@ Provision the resources below in the same CDP Environment and use the same prefi
 CYBERSEC-2.3.1-1.16.1-csadh1.10.0.0-cdh7.2.17.0-334-2308141830/meta/
 ...
 ```
-4. Create a link from CYBERSEC to the parcel directory.
+4\. Create a link from CYBERSEC to the parcel directory.
 ```shell script
 [cduby@cduby-csa-081423-master0 ~]$ ln -s CYBERSEC-2.3.1-1.16.1-csadh1.10.0.0-cdh7.2.17.0-334-2308141830 CYBERSEC
 [cduby@cduby-csa-081423-master0 ~]$ ls -ld CYBERSEC
 lrwxrwxrwx. 1 cduby cduby 62 Aug 14 20:41 CYBERSEC -> CYBERSEC-2.3.1-1.16.1-csadh1.10.0.0-cdh7.2.17.0-334-2308141830
 [cduby@cduby-csa-081423-master0 ~]$ ls CYBERSEC
 bin etc jobs lib meta tools
 ```
-5. Edit the shell configuration defining the PATH variable. For example, edit .bash_profile.
+5\. Edit the shell configuration defining the PATH variable. For example, edit .bash_profile.
 ```shell script
 ## The shell config file may be different. Locate the definition of PATH in your configs.
 [cduby@cduby-csa-081423-master0 ~]$ vi .bash_profile
 ```
-6. Add $HOME/CYBERSEC/bin to the PATH
+6\. Add $HOME/CYBERSEC/bin to the PATH
 ```shell script
 ### this is an example, use your path here
 PATH=$PATH:$HOME/CYBERSEC/bin:$HOME/.local/bin:$HOME/bin
 export PATH
 ```
-7. Source the shell config or log out and log back in again to refresh the shell settings. Check the availability of the cybersec commands in the path.
+7\. Source the shell config or log out and log back in again to refresh the shell settings. Check the availability of the cybersec commands in the path.
 ```shell script
 [cduby@cduby-csa-081423-master0 ~]$ source .bash_profile
 [cduby@cduby-csa-081423-master0 ~]$ which cs-restart-parser
@@ -102,60 +102,67 @@ export PATH
 
 #### CDP Base
 1. Copy the files in examples/setup/templates to example/pipelines
-
 ```
 cd cybersec/flink-cyber/
 ```
 2. Edit the .properties files in example/pipelines with the correct settings for the cluster.
-2. If the Hbase service is not in the same cluster as Flink, download the Hbase client configs from Cloudera Manager. Move the hbase config zip to the pipelines directory. Unzip the hbase configuration files.
-3. If the Hive service is not in the same cluster as Flink, download the Hive on tez client configs from Cloudera Manager. Move the hive config zip to the pipelines directory. Unzip the hive config files.
-4. If using a separate Hive cluster, remove the hive_conf/core-site.xml and hive-conf/yarn-site.xml files.
+3. If the Hbase service is not in the same cluster as Flink, download the Hbase client configs from Cloudera Manager. Move the hbase config zip to the pipelines directory. Unzip the hbase configuration files.
+4. If the Hive service is not in the same cluster as Flink, download the Hive on tez client configs from Cloudera Manager. Move the hive config zip to the pipelines directory. Unzip the hive config files.
+5. If using a separate Hive cluster, remove the hive_conf/core-site.xml and hive-conf/yarn-site.xml files.
 
 #### CDP Public Cloud
 1. If necessary, install the [CDP CLI client](https://docs.cloudera.com/cdp-public-cloud/cloud/cli/topics/mc-cli-client-setup.html).
-2. Run the command line ./create_datahub_config.sh <environment_name> <prefix>. When prompted enter your workload password.
+2. [Install the jq package](https://jqlang.github.io/jq/download/).
+3. Create a properties file with the names of the CDP cloud resources.
+```shell script
+hive_datahub_name=name_of_hive_datahub
+kafka_datahub_name=name_of_kafka_datahub
+opdb_database_name=name_of_operational_db
+```
+Omit the lines for the hive datahub or operational DB if those services are not used. The minimal properties file is shown below:
+```shell script
+kafka_datahub_name=name_of_kafka_datahub
+```
+4\. Run the command line ./create_datahub_config.sh <environment_name> <properties_file>. When prompted, enter your workload password.
 ```shell script
 cduby@cduby-MBP16-21649 examples % cd cybersec/flink-cyber/cyber-jobs/src/main/resources/examples/setup
-cduby@cduby-MBP16-21649 setup % ./create_datahub_config.sh se-sandboxx-aws cduby
-cleaning up hive configs
+cduby@cduby-MBP16-21649 setup % ./create_datahub_config.sh se-sandboxx-aws datahub_setup_kafka_hive_opdb.properties
+When prompted, enter your workload user password.
+INFO: resetting hive configs from datahub de-cduby-013024
 Enter host password for user 'cduby':
-% Total % Received % Xferd Average Speed Time Time Time Current
-Dload Upload Total Spent Left Speed
-100 11102 0 11102 0 0 2870 0 --:--:-- 0:00:03 --:--:-- 2876
-x hive-conf/mapred-site.xml
-x hive-conf/hdfs-site.xml
-x hive-conf/hive-site.xml
-x hive-conf/atlas-application.properties
-x hive-conf/log4j.properties
 x hive-conf/hadoop-env.sh
+x hive-conf/hdfs-site.xml
 x hive-conf/log4j2.properties
-x hive-conf/redaction-rules.json
-x hive-conf/core-site.xml
+x hive-conf/beeline-site.xml
 x hive-conf/yarn-site.xml
+x hive-conf/mapred-site.xml
+x hive-conf/hive-site.xml
+x hive-conf/atlas-application.properties
 x hive-conf/hive-env.sh
-x hive-conf/beeline-site.xml
+x hive-conf/core-site.xml
+x hive-conf/redaction-rules.json
+x hive-conf/log4j.properties
+INFO: Resetting OPDB configs from datahub ckdodb-013024
 Enter host password for user 'cduby':
-% Total % Received % Xferd Average Speed Time Time Time Current
-Dload Upload Total Spent Left Speed
-100 5828 100 5828 0 0 5394 0 0:00:01 0:00:01 --:--:-- 5436
-x hbase-conf/hdfs-site.xml
-x hbase-conf/atlas-application.properties
 x hbase-conf/hbase-omid-client-config.yml
+x hbase-conf/hbase-site.xml
+x hbase-conf/jaas.conf
+x hbase-conf/hdfs-site.xml
 x hbase-conf/hbase-env.sh
 x hbase-conf/core-site.xml
+x hbase-conf/atlas-application.properties
 x hbase-conf/log4j.properties
-x hbase-conf/hbase-site.xml
-x hbase-conf/jaas.conf
+INFO: getting Phoenix connection settings
+Enter host password for user 'cduby':
 Certificate was added to keystore
-
 ```
 ### (Optional) Download Maxmind Database
 1. Optionally download the binary (mmdb) version of the [Maxmind GeoLite2 City and ASN Databases](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data). Create a Maxmind account login if you don't have one already. If you don't download the databases, the triaging job will operate but events will not have geocode (country, city, lat, lon) or asn (network) enrichments.
 2. cp the GeoLite2 .tar.gz files to the examples/setup directory.
 ```shell script
 cduby@cduby-MBP16-21649 templates % cp GeoLite2-*.tar.gz examples/setup
 ```
-3. Scp the examples directory tree to the flink gateway host.
+3\. Scp the examples directory tree to the flink gateway host.
 ```shell script
 scp -r examples <user>@<flink_gateway_host>:/home/<user>
 ```
@@ -193,7 +200,7 @@ no mock server running
 100 138 100 138 0 0 874 0 --:--:-- --:--:-- --:--:-- 878
 
 ```
-3. Start the basic pipeline. If you have all the required services installed, start the full pipeline.
+3\. Start the basic pipeline. If you have all the required services installed, start the full pipeline.
 ```shell script
 ./start_basic.sh
 ## if all required services are installed
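
Pulling the new CDP Public Cloud steps together, here is a minimal end-to-end sketch; the environment name, resource names, and properties file name below are illustrative placeholders, not values from this commit:

```shell script
cd cybersec/flink-cyber/cyber-jobs/src/main/resources/examples/setup
# name the pre-provisioned CDP resources; drop the hive/opdb lines if those
# services are not used
cat > my_datahubs.properties <<'EOF'
kafka_datahub_name=example-kafka-datahub
hive_datahub_name=example-hive-datahub
opdb_database_name=example-operational-db
EOF
# enter the workload user password when prompted
./create_datahub_config.sh example-cdp-environment my_datahubs.properties
```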

flink-cyber/cyber-jobs/src/main/resources/examples/setup/create_datahub_config.sh

Lines changed: 101 additions & 39 deletions
@@ -1,63 +1,125 @@
+#!/usr/bin/env bash
+
+report_usage_error() {
+  echo "$(basename $0) $1" >&2
+  exit 2
+}
+
+report_fail() {
+  echo "$(basename $0) ERROR: $1" >&2
+  exit 2
+}
+
+report_info() {
+  echo "INFO: $1" >&2
+}
+
 if [[ $# -ne 2 ]]; then
-  echo "$(basename $0) <environment_name> <cluster_prefix>" >&2
-  exit 2
+  report_usage_error "<environment_name> <properties_file>"
 fi
 
-echo "When prompted, enter your workload user password."
 get_dh_name() {
   cdp datahub list-clusters | jq -r '.clusters[] | select(.clusterName | contains ("'"$1"'")) | select(.workloadType | contains ("'$2'")) | .clusterName '
 }
 
+# read_properties_into_variables <property_file_name>
+# Read the name=value pairs,
+# split each line at the first equals sign into key and value,
+# convert the property name to a legal shell variable name,
+# and set that shell variable to the property value.
+function read_properties_into_variables() {
+  while read -r line; do
+    [[ "$line" =~ ^([[:space:]]*|[[:space:]]*#.*)$ ]] && continue
+    value=${line#*=}
+    key=${line%"=$value"}
+    key=$(echo $key | tr '.' '_')
+    eval ${key}=${value}
+  done <$1
+}
+
 env_name=$1
-cluster_prefix=$2
+properties_file=$2
+
+if [[ -f "$properties_file" ]]; then
+  read_properties_into_variables "$properties_file"
+else
+  report_fail "Properties file $properties_file can't be read. Check path and permissions."
+fi
+
+echo "When prompted, enter your workload user password."
+
+found_env=$(cdp environments list-environments | jq -r '.environments[] | select(.environmentName=="'"${env_name}"'") | .environmentName')
+if [[ -z "${found_env}" ]]; then
+  report_fail "Environment ${env_name} does not exist."
+fi
+
 config_dir=../pipelines
 workload_user=$(cdp iam get-user | jq -r '.user.workloadUsername')
 
 # discover hive configs
-hive_dh=$(get_dh_name "${cluster_prefix}" "Hive")
-if [[ ! -z "$hive_dh" ]]; then
-  hive_zip=$config_dir/hive-conf.zip
-  hive_conf="$config_dir/hive-conf"
-  hive_cm_api=$(cdp datahub describe-cluster --cluster-name "$hive_dh" | jq -r '.cluster.endpoints.endpoints[] | select (.serviceName | contains("CM-API")) | .serviceUrl')
-  echo resetting hive configs from datahub ${hive_dh}
-  rm -f "$hive_zip"
-  rm -rf "$hive_conf"
-  curl -S -s -o "${hive_zip}" -u "${workload_user}" ${hive_cm_api}/v41/clusters/${hive_dh}/services/hive_on_tez/clientConfig
-  if [[ -f "$hive_zip" ]]; then
-    tar -zxvf "$hive_zip" -C "$config_dir"
-    rm -f "$hive_conf/core-site.xml"
-    rm -f "$hive_conf/yarn-site.xml"
+if [[ ! -z "${hive_datahub_name}" ]]; then
+  hive_dh=$(get_dh_name "${hive_datahub_name}" "Hive")
+  if [[ ! -z "$hive_dh" ]]; then
+    hive_zip=$config_dir/hive-conf.zip
+    hive_conf="$config_dir/hive-conf"
+    hive_cm_api=$(cdp datahub describe-cluster --cluster-name "$hive_dh" | jq -r '.cluster.endpoints.endpoints[] | select (.serviceName | contains("CM-API")) | .serviceUrl')
+    report_info "resetting hive configs from datahub ${hive_dh}"
+    rm -f "$hive_zip"
+    rm -rf "$hive_conf"
+    curl -S -s -o "${hive_zip}" -u "${workload_user}" ${hive_cm_api}/v41/clusters/${hive_dh}/services/hive_on_tez/clientConfig
+    if [[ -f "$hive_zip" ]]; then
+      tar -zxvf "$hive_zip" -C "$config_dir"
+      rm -f "$hive_conf/core-site.xml"
+      rm -f "$hive_conf/yarn-site.xml"
+    else
+      report_fail "Could not get hive configuration."
+    fi
   else
-    echo "ERROR: could not get hive configuration."
-    exit 2
+    report_fail "Hive datahub '${hive_datahub_name}' not found in environment '${env_name}'"
   fi
+else
+  report_info "Hive is not configured. Property hive_datahub_name not defined in properties file"
 fi
 
-# discover kafka connection config
-kafka_dh_name=$(get_dh_name "${cluster_prefix}" "Kafka")
+
+
+# discover kafka connection config
+if [[ ! -z "${kafka_datahub_name}" ]]; then
+  kafka_dh_name=$(get_dh_name "${kafka_datahub_name}" "Kafka")
+  if [[ -z "${kafka_dh_name}" ]]; then
+    report_fail "Environment '${env_name}' does not contain a datahub named '${kafka_datahub_name}' containing Kafka"
+  fi
+else
+  report_fail "Kafka is not configured. Property kafka_datahub_name not defined in properties file"
+fi
 schema_registry=$(cdp datahub describe-cluster --cluster-name ${kafka_dh_name} | jq -r '.cluster.instanceGroups[] | select(.name | contains("master")) | .instances[].fqdn')
 kafka_broker=$(cdp datahub describe-cluster --cluster-name ${kafka_dh_name} | jq -r '.cluster.endpoints.endpoints[] | select (.serviceName | contains("KAFKA_BROKER")) | .serviceUrl' | sed 's/ //g')
 
 # opdb (hbase and phoenix) connection config
-opdb_cluster_name=$(cdp opdb list-databases --environment-name ${env_name} | jq -r '.databases[] | select(.databaseName | contains ("'"${cluster_prefix}"'")) | .databaseName')
-phoenix_query_server_host=NO_OPDB_CLUSTER
-if [[ ! -z "$opdb_cluster_name" ]]; then
-  echo resetting opdb configs from datahub ${opdb_cluster_name}
-  opdb_client_url=$(cdp opdb describe-client-connectivity --environment-name ${env_name} --database-name ${opdb_cluster_name} | jq -r '.connectors[] | select(.name=="hbase") | .configuration.clientConfigurationDetails[].url')
-  hbase_zip="$config_dir/hbase-config.zip"
-  hbase_conf="$config_dir/hbase-conf"
-  rm -f "$hbase_zip"
-  rm -rf "$hbase_conf"
-  curl -S -s -f -o "$hbase_zip" -u "${workload_user}" "${opdb_client_url}"
-  if [[ -f "$hbase_zip" ]]; then
-    tar -zxvf "$hbase_zip" -C "$config_dir"
-  else
-    echo "ERROR: could not get hbase configuration."
-    exit 2
+if [[ ! -z "${opdb_database_name}" ]]; then
+  opdb_cluster_name=$(cdp opdb list-databases --environment-name ${env_name} | jq -r '.databases[] | select(.databaseName | contains ("'"${opdb_database_name}"'")) | .databaseName')
+  phoenix_query_server_host=NO_OPDB_CLUSTER
+  if [[ ! -z "$opdb_cluster_name" ]]; then
+    report_info "Resetting OPDB configs from datahub ${opdb_cluster_name}"
+    opdb_client_url=$(cdp opdb describe-client-connectivity --environment-name ${env_name} --database-name ${opdb_cluster_name} | jq -r '.connectors[] | select(.name=="hbase") | .configuration.clientConfigurationDetails[].url')
+    hbase_zip="$config_dir/hbase-config.zip"
+    hbase_conf="$config_dir/hbase-conf"
+    rm -f "$hbase_zip"
+    rm -rf "$hbase_conf"
+    curl -S -s -f -o "$hbase_zip" -u "${workload_user}" "${opdb_client_url}"
+    if [[ -f "$hbase_zip" ]]; then
+      tar -zxvf "$hbase_zip" -C "$config_dir"
+    else
+      report_fail "Could not get HBase configuration."
+    fi
+    base_opdb_services_url=$(echo ${opdb_client_url} | sed -e 's/hbase\/clientConfig//')
+    report_info "getting Phoenix connection settings"
+    phoenix_query_server_host=$(curl -S -s -u ${workload_user} ${base_opdb_services_url}/phoenix/roles | jq -r '.items[] | select (.type | contains("PHOENIX_QUERY_SERVER")) | .hostRef.hostname')
+  else
+    report_fail "OPDB database ${opdb_database_name} not found in environment ${env_name}"
   fi
-  base_opdb_services_url=$(echo ${opdb_client_url} | sed -e 's/hbase\/clientConfig//')
-  echo "getting phoenix connection settings"
-  phoenix_query_server_host=$(curl -S -s -u ${workload_user} ${base_opdb_services_url}/phoenix/roles | jq -r '.items[] | select (.type | contains("PHOENIX_QUERY_SERVER")) | .hostRef.hostname')
+else
+  report_info "HBase and Phoenix are not configured. Property opdb_database_name not defined in properties file" >&2
 fi
 
 cdp environments get-keytab --environment-name $env_name | jq -r '.contents' | base64 --decode > ${config_dir}/krb5.keytab
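
To illustrate the new properties parsing: read_properties_into_variables skips blank and comment lines, splits each remaining line at the first equals sign, converts dots in the property name to underscores, and assigns the value to the resulting shell variable. A throwaway sketch, assuming the function above has been defined in the current shell and using a made-up file name:

```shell script
cat > /tmp/example.properties <<'EOF'
# comment and blank lines are skipped

kafka_datahub_name=example-kafka-dh
opdb.database.name=example-opdb
EOF
read_properties_into_variables /tmp/example.properties
echo "$kafka_datahub_name"    # prints: example-kafka-dh
echo "$opdb_database_name"    # prints: example-opdb (dots became underscores)
```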

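The get_dh_name helper filters `cdp datahub list-clusters` with jq on the cluster name and workload type. The same selection can be tried against hand-written JSON; the sample below only sketches the two fields the filter touches, and its values are invented for the example:

```shell script
sample='{"clusters":[
  {"clusterName":"example-kafka-dh","workloadType":"made-up workload type mentioning Kafka"},
  {"clusterName":"example-hive-dh","workloadType":"made-up workload type mentioning Hive"}
]}'
# the same selection the script performs for: get_dh_name "example-kafka-dh" "Kafka"
echo "$sample" | jq -r '.clusters[]
  | select(.clusterName | contains ("example-kafka-dh"))
  | select(.workloadType | contains ("Kafka"))
  | .clusterName'
# prints: example-kafka-dh
```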