Commit ad5fce0

Merge pull request #70 from ARGOeu/devel
Released to production
2 parents 6f127b3 + 5362c9c

File tree

128 files changed: +12441 / -2335 lines


.gitignore

Lines changed: 8 additions & 0 deletions

@@ -10,3 +10,11 @@

 # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
 hs_err_pid*
+
+# Python scripts
+__pycache__
+*.pyc
+*~
+.DS_STORE
+conf.cfg
+.pytest_cache

.travis.yml

Lines changed: 10 additions & 4 deletions

@@ -1,10 +1,16 @@
-# Set-up a python centric enviroment in order to easily choose py version:2.6
+# Set-up a python centric enviroment in order to easily choose py version:2.7
 # bonus: Java 7 and mvn also included
 language: python
-# Target py version 2.6
+# Target py version 2.7
 python:
-- "2.6"
+- "2.7"

 script:
-- cd flink_jobs/ams_stream_hbase/ && mvn test
+- pip install -r ./bin/requirements.txt
+- pytest
+- cd flink_jobs/ams_ingest_metric/ && mvn test
+- cd ../batch_ar && mvn test
+- cd ../batch_status && mvn test
 - cd ../stream_status && mvn test
+- cd ../ams_ingest_sync && mvn test
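To reproduce the new CI steps locally, a rough sketch run from the repository root (the same commands as the script section above, with each Maven module tested in a subshell):

```bash
# Python checks
pip install -r ./bin/requirements.txt
pytest

# Per-module Flink job tests
(cd flink_jobs/ams_ingest_metric && mvn test)
(cd flink_jobs/batch_ar && mvn test)
(cd flink_jobs/batch_status && mvn test)
(cd flink_jobs/stream_status && mvn test)
(cd flink_jobs/ams_ingest_sync && mvn test)
```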

README.md

Lines changed: 143 additions & 38 deletions
@@ -3,43 +3,59 @@

 ## Flink Jobs

-### AMS stream to Hbase
+### AMS ingest metric data (and store them to HDFS and/or Hbase)

-Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription and stores them to a remote hbase cluster.
+Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription and stores them to a remote hdfs and/or hbase cluster.

 Prepare job to submit in flink:

-- `cd flink_jobs/ams_stream_hbase`
+- `cd flink_jobs/ams_ingest_metric`
 - `mvn clean && mvn package`

 Run jar in flink:

-- `flink run ams-stream-hbase-0.1.jar --ams-endpoint {...} --ams-port {...} --ams-token {...} -ams-project {...} --ams-sub {...} --avro-schema {...} --hbase-master {...} --hbase-zk-quorum {...} --hbase-zk-port {...} --hbase-namespace {...} --hbase-table {...} --hbase-master-port {...}`
+- `flink run ams-ingest-metric-0.1.jar --ams.endpoint {...} --ams.port {...} --ams.token {...} -ams.project {...} --ams.sub {...} --avro.schema {...} --hbase.master {...} --hbase.zk.quorum {...} --hbase.zk.port {...} --hbase.namespace {...} --hbase.table {...} --hbase.master.port {...} --hdfs.path {...} --check.path {...} --check.interval {...} --ams.batch {...} --ams.interval {...}`

 Job required cli parameters:

-`--ams-endpoint` : ARGO messaging api endpoint to connect to msg.example.com
-`--ams-port` : ARGO messaging api port
-`--ams-token` : ARGO messaging api token
-`--ams-project` : ARGO messaging api project to connect to
-`--ams-sub` : ARGO messaging subscription to pull from
-`--hbase-master` : hbase endpoint
-`--hbase-master-port` : hbase master port
-`--hbase-zk-quorum` : comma separated list of hbase zookeeper servers
-`--hbase-zk-port` : port used by hbase zookeeper servers
-`--hbase-namespace` : table namespace used (usually tenant name)
-`--hbase-table` : table name (usually metric_data)
+`--ams.endpoint` : ARGO messaging api endpoint to connect to msg.example.com
+`--ams.port` : ARGO messaging api port
+`--ams.token` : ARGO messaging api token
+`--ams.project` : ARGO messaging api project to connect to
+`--ams.sub` : ARGO messaging subscription to pull from
+
+Job optional cli parameters:
+
+`--hbase.master` : hbase endpoint
+`--hbase.master.port` : hbase master port
+`--hbase.zk.quorum` : comma separated list of hbase zookeeper servers
+`--hbase.zk.port` : port used by hbase zookeeper servers
+`--hbase.namespace` : table namespace used (usually tenant name)
+`--hbase.table` : table name (usually metric_data)
+`--hdfs.path` : base path for storing metric data on hdfs
+`--check.path` : path to store flink checkpoints
+`--check.interval` : interval for checkpointing (in ms)
+`--ams.batch` : num of messages to be retrieved per request to AMS service
+`--ams.interval` : interval (in ms) between AMS service requests
+`--ams.proxy` : optional http proxy url to be used for AMS requests
+`--ams.verify` : optional turn on/off ssl verify
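For illustration only, a hypothetical invocation of the ingest-metric job using the new dotted parameter names; every value below (endpoint, token, project, paths, intervals) is a placeholder and not taken from this commit, and the optional hbase flags are omitted:

```bash
# Hypothetical example: ingest metric data from AMS and write it to HDFS only.
flink run ams-ingest-metric-0.1.jar \
  --ams.endpoint msg.example.com \
  --ams.port 443 \
  --ams.token s3cr3t \
  --ams.project EGI \
  --ams.sub ingest_metric \
  --avro.schema ./metric_data.avsc \
  --hdfs.path hdfs://localhost:9000/user/foo/metric_data \
  --check.path hdfs://localhost:9000/user/foo/checkpoints \
  --check.interval 30000 \
  --ams.batch 100 \
  --ams.interval 300
```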

 ### Metric data hbase schema

@@ -71,21 +87,23 @@ Each hbase table has a column family named 'data' and the following columns:

 `tags` : json list of tags used to add metadata to the metric event

-### Stream Status
-
-Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription.
-For each metric data message the job calculates status changes in the whole topology and produces status Events.
-The status events are then forwarded to a specific kafka topic
+### AMS ingest connector (sync) data to HDFS
+
+Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages that contain connector (sync) data (metric profiles, topology, weights etc.) from a specific project/subscription and stores them to an hdfs destination. Each message should have the following attributes:
+- report: name of the report that the connector data belong to
+- type: type of the connector data (metric_profile, group_endpoints, group_groups, weights, downtimes)
+- partition_date: YYYY-MM-DD format of date that the current connector data relates to.

 Prepare job to submit in flink:

-- `cd flink_jobs/stream_status`
+- `cd flink_jobs/ams_ingest_sync`
 - `mvn clean && mvn package`

 Run jar in flink:

-- `flink run streaming-status-0.1.jar --ams-endpoint {...} --ams-port {...} --ams-token {...} -ams-project {...} --ams-sub {...} --avro-schema {...} --hbase-master {...} --hbase-zk-quorum {...} --hbase-zk-port {...} --hbase-namespace {...} --hbase-table {...} --hbase-master-port {...} --sync-mps {...} --sync-egp {...} --sync-aps {...} --sync-ops {...}`
+- `flink run ams-ingest-sync-0.1.jar --ams.endpoint {...} --ams.port {...} --ams.token {...} -ams.project {...} --ams.sub {...} --hdfs.path {...} --ams.batch {...} --ams.interval {...}`

 Job required cli parameters:
@@ -99,27 +117,100 @@ Job required cli parameters:

 `--ams.sub` : ARGO messaging subscription to pull from

-`--avro.schema` : Schema used for the decoding of metric data payload
+`--hdfs.path` : Base hdfs path to store connector data to (e.g. hdfs://localhost:9000/user/foo/path/to/tenant)
+
+`--ams.batch` : num of messages to be retrieved per request to AMS service
+
+`--ams.interval` : interval (in ms) between AMS service requests
+
+`--ams.proxy` : optional http proxy url to be used for AMS requests
+
+`--ams.verify` : optional turn on/off ssl verify
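For illustration only, a hypothetical invocation of the sync-ingest job; every value is a placeholder and not part of this commit. Each AMS message the job pulls is expected to carry the report, type and partition_date attributes described above (e.g. report=Critical, type=metric_profile, partition_date=2018-03-01):

```bash
# Hypothetical example: pull connector (sync) data from AMS and store it under HDFS.
flink run ams-ingest-sync-0.1.jar \
  --ams.endpoint msg.example.com \
  --ams.port 443 \
  --ams.token s3cr3t \
  --ams.project EGI \
  --ams.sub ingest_sync \
  --hdfs.path hdfs://localhost:9000/user/foo/path/to/tenant \
  --ams.batch 100 \
  --ams.interval 300
```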
+### Stream Status
+
+Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription.
+For each metric data message the job calculates status changes in the whole topology and produces status Events.
+The status events are then forwarded to a specific kafka topic (and/or) hbase table (and/or) filesystem
+
+Prepare job to submit in flink:
+
+- `cd flink_jobs/stream_status`
+- `mvn clean && mvn package`
+
+Run jar in flink:
+
+- `flink run streaming-status-0.1.jar --ams.endpoint {...} --ams.port {...} --ams.token {...} -ams.project {...} --ams.sub.metric {...} --ams.sub.sync {...} --sync.mps {...} --sync.egp {...} --sync.aps {...} --sync.ops {...} --hbase.master {...} --hbase.zk.quorum {...} --hbase.zk.port {...} --hbase.namespace {...} --hbase.table {...} --hbase.master.port {...} --kafka.servers {...} --kafka.topic {...} --fs.output {...} --ams.batch {...} --ams.interval {...}`
+
+Job required cli input parameters:
+
+`--ams.endpoint` : ARGO messaging api endpoint to connect to msg.example.com
+`--ams.port` : ARGO messaging api port
+`--ams.token` : ARGO messaging api token
+`--ams.project` : ARGO messaging api project to connect to
+`--ams.sub.metric` : ARGO messaging subscription to pull metric data from
+`--ams.sub.sync` : ARGO messaging subscription to pull sync data from

-`--hbase-master` : hbase endpoint
-`--hbase-master-port` : hbase master port
-`--hbase-zk-quorum` : comma separated list of hbase zookeeper servers
-`--hbase-zk-port` : port used by hbase zookeeper servers
-`--hbase-namespace` : table namespace used (usually tenant name)
-`--hbase-table` : table name (usually metric_data)

 `--sync.mps` : Metric profile file used
 `--sync.egp` : Endpoint group topology file used
 `--sync.aps` : Aggregation profile used
-`--sync.ops` :Operations profile used
+`--sync.ops` : Operations profile used
+
+Job optional cli parameters for hbase output:
+
+`--hbase.master` : hbase endpoint
+`--hbase.master.port` : hbase master port
+`--hbase.zk.quorum` : comma separated list of hbase zookeeper servers
+`--hbase.zk.port` : port used by hbase zookeeper servers
+`--hbase.namespace` : table namespace used (usually tenant name)
+`--hbase.table` : table name (usually metric_data)
+
+Job optional cli parameters for kafka output:
+
+`--kafka.servers` : Kafka server list to connect to
+`--kafka.topic` : Kafka topic to send status events to
+
+Job optional cli parameters for filesystem output (local/hdfs):
+
+`--fs.output` : filesystem path for output (prefix with "hdfs://" for hdfs usage)
+
+Job optional cli parameters for mongo output:
+
+`--mongo.uri` : Mongo uri to store status events to
+`--mongo.method` : Mongo store method used (insert/upsert)
+
+Job optional cli parameters for ams ingestion:
+
+`--ams.batch` : num of messages to be retrieved per request to AMS service
+`--ams.interval` : interval (in ms) between AMS service requests
+
+Other optional cli parameters:
+
+`--daily` : true/false - controls daily regeneration of events (not used in notifications)
+`--timeout` : long(ms) - controls default timeout for event regeneration (used in notifications)
+`--ams.proxy` : optional http proxy url to be used for AMS requests
+`--ams.verify` : optional turn on/off ssl verify
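For illustration only, a hypothetical invocation of the streaming status job that forwards events to Kafka; every endpoint, profile path and Kafka setting below is a placeholder:

```bash
# Hypothetical example: compute status events from AMS metric/sync subscriptions
# and forward them to a Kafka topic.
flink run streaming-status-0.1.jar \
  --ams.endpoint msg.example.com \
  --ams.port 443 \
  --ams.token s3cr3t \
  --ams.project EGI \
  --ams.sub.metric status_metric \
  --ams.sub.sync status_sync \
  --sync.mps ./metric_profile.avro \
  --sync.egp ./group_endpoints.avro \
  --sync.aps ./aggregation_profile.json \
  --sync.ops ./operations_profile.json \
  --kafka.servers kafka1.example.com:9092,kafka2.example.com:9092 \
  --kafka.topic status_events \
  --ams.batch 100 \
  --ams.interval 300
```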
 ### Status events schema

@@ -174,18 +265,18 @@ Job required cli parameters:

 `--rec` : file location of recomputations file (local or hdfs)

-`--report` : report uuid
+`--conf` : file location of report configuration json file (local or hdfs)

 `--run.date` : target date in DD-MM-YYYY format

-`--egroup.type` : endpoint group type used in report (for e.g. SITES)
+`--mongo.uri` : MongoDB uri for outputting the results to (e.g. mongodb://localhost:21017/example_db)

-`--datastore.uri` : datastore uri for outputting the results
+`--mongo.method` : MongoDB method to be used when storing the results ~ either: `insert` or `upsert`

-## Batch Status
+## Batch AR

-Flink batch job that calculates status results for a specific date
+Flink batch job that calculates a/r results for a specific date

 Prepare job to submit in flink:
@@ -219,12 +310,26 @@ Job required cli parameters:

 `--downtimes` : file location of downtimes file (local or hdfs)

-`--report` : report uuid
+`--conf` : file location of report configuration json file (local or hdfs)

 `--run.date` : target date in DD-MM-YYYY format

-`--egroup.type` : endpoint group type used in report (for e.g. SITES)
-`--ggroup.type` : group of groups type used in report (for e.g. NGI)
-`--datastore.uri` : datastore uri for outputting the results
+`--mongo.uri` : MongoDB uri for outputting the results to (e.g. mongodb://localhost:21017/example_db)
+
+`--mongo.method` : MongoDB method to be used when storing the results ~ either: `insert` or `upsert`
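For illustration only, a hypothetical Batch AR submission built from the parameters shown above; the jar name and every value are assumptions made for this sketch, and the job's full parameter list is not visible in this diff:

```bash
# Hypothetical example: jar name, paths and dates are placeholders.
flink run batch-ar-0.1.jar \
  --conf hdfs://localhost:9000/user/foo/report_conf.json \
  --downtimes hdfs://localhost:9000/user/foo/downtimes.avro \
  --run.date 01-03-2018 \
  --mongo.uri mongodb://localhost:21017/example_db \
  --mongo.method insert
```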
+
+## Flink job names
+Running flink jobs can be listed either in the flink dashboard by visiting `http://{{flink.webui.host}}:{{flink.webui.port}}`
+or by querying the jobmanager api at `http://{{flink.webui.host}}:{{flink.webui.port}}/joboverview/running`
+
+Each job submitted has a discerning job name based on a specific template. Job names are also used by the submission wrapper scripts (`/bin` folder) to check whether an identical job is already running (to avoid duplicate submission)
+
+Job Name schemes:
+
+Job Type | Job Name scheme
+---------|----------------
+Ingest Metric | Ingesting metric data from `{{ams-endpoint}}`/v1/projects/`{{project}}`/subscriptions/`{{subscription}}`
+Ingest Sync | Ingesting sync data from `{{ams-endpoint}}`/v1/projects/`{{project}}`/subscriptions/`{{subscription}}`
+Batch AR | Ar Batch job for tenant:`{{tenant}}` on day:`{{day}}` using report:`{{report}}`
+Batch Status | Status Batch job for tenant:`{{tenant}}` on day:`{{day}}` using report:`{{report}}`
+Streaming Status | Streaming status using data from `{{ams-endpoint}}`/v1/projects/`{{project}}`/subscriptions/[`{{metric_subscription}}`,`{{sync_subscription}}`]
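The duplicate-submission check mentioned above boils down to asking the jobmanager whether a job with the same name is already in the running list. A rough shell sketch (host, port and job name are placeholders; the wrapper scripts in `/bin` implement their own version of this):

```bash
# Query the Flink jobmanager for running jobs and skip submission
# if a job with an identical name is already present.
FLINK_HOST=localhost
FLINK_PORT=8081
JOB_NAME="Ingesting metric data from msg.example.com/v1/projects/EGI/subscriptions/ingest_metric"

if curl -s "http://${FLINK_HOST}:${FLINK_PORT}/joboverview/running" | grep -q "${JOB_NAME}"; then
  echo "An identical job is already running - skipping submission"
else
  echo "No duplicate found - safe to submit"
fi
```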
