Commit ad5fce0

Merge pull request #70 from ARGOeu/devel
Released to production
2 parents 6f127b3 + 5362c9c

File tree

128 files changed: +12441 / -2335 lines


.gitignore

Lines changed: 8 additions & 0 deletions

@@ -10,3 +10,11 @@

 # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
 hs_err_pid*
+
+# Python scripts
+__pycache__
+*.pyc
+*~
+.DS_STORE
+conf.cfg
+.pytest_cache

.travis.yml

Lines changed: 10 additions & 4 deletions

@@ -1,10 +1,16 @@
-# Set-up a python centric enviroment in order to easily choose py version:2.6
+# Set-up a python centric enviroment in order to easily choose py version:2.7
 # bonus: Java 7 and mvn also included
 language: python
-# Target py version 2.6
+# Target py version 2.7
 python:
-- "2.6"
+- "2.7"

 script:
-- cd flink_jobs/ams_stream_hbase/ && mvn test
+- pip install -r ./bin/requirements.txt
+- pytest
+- cd flink_jobs/ams_ingest_metric/ && mvn test
+- cd ../batch_ar && mvn test
+- cd ../batch_status && mvn test
 - cd ../stream_status && mvn test
+- cd ../ams_ingest_sync && mvn test
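To reproduce the new CI steps locally, a rough sketch run from the repository root (the same commands as the script section above, with each Maven module tested in a subshell):

```bash
# Python checks
pip install -r ./bin/requirements.txt
pytest

# Per-module Flink job tests
(cd flink_jobs/ams_ingest_metric && mvn test)
(cd flink_jobs/batch_ar && mvn test)
(cd flink_jobs/batch_status && mvn test)
(cd flink_jobs/stream_status && mvn test)
(cd flink_jobs/ams_ingest_sync && mvn test)
```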

README.md

Lines changed: 143 additions & 38 deletions
@@ -3,43 +3,59 @@

 ## Flink Jobs

-### AMS stream to Hbase
+### AMS ingest metric data (and store them to HDFS and/or Hbase)

-Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription and stores them to a remote hbase cluster.
+Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription and stores them to a remote hdfs and/or hbase cluster.

 Prepare job to submit in flink:

-- `cd flink_jobs/ams_stream_hbase`
+- `cd flink_jobs/ams_ingest_metric`
 - `mvn clean && mvn package`

 Run jar in flink:

-- `flink run ams-stream-hbase-0.1.jar --ams-endpoint {...} --ams-port {...} --ams-token {...} -ams-project {...} --ams-sub {...} --avro-schema {...} --hbase-master {...} --hbase-zk-quorum {...} --hbase-zk-port {...} --hbase-namespace {...} --hbase-table {...} --hbase-master-port {...}`
+- `flink run ams-ingest-metric-0.1.jar --ams.endpoint {...} --ams.port {...} --ams.token {...} -ams.project {...} --ams.sub {...} --avro.schema {...} --hbase.master {...} --hbase.zk.quorum {...} --hbase.zk.port {...} --hbase.namespace {...} --hbase.table {...} --hbase.master.port {...} --hdfs.path {...} --check.path {...} --check.interval {...} --ams.batch {...} --ams.interval {...}`

 Job required cli parameters:

-`--ams-endpoint` : ARGO messaging api endpoint to connect to msg.example.com
-`--ams-port` : ARGO messaging api port
-`--ams-token` : ARGO messaging api token
-`--ams-project` : ARGO messaging api project to connect to
-`--ams-sub` : ARGO messaging subscription to pull from
-`--hbase-master` : hbase endpoint
-`--hbase-master-port` : hbase master port
-`--hbase-zk-quorum` : comma separated list of hbase zookeeper servers
-`--hbase-zk-port` : port used by hbase zookeeper servers
-`--hbase-namespace` : table namespace used (usually tenant name)
-`--hbase-table` : table name (usually metric_data)
+`--ams.endpoint` : ARGO messaging api endpoint to connect to msg.example.com
+`--ams.port` : ARGO messaging api port
+`--ams.token` : ARGO messaging api token
+`--ams.project` : ARGO messaging api project to connect to
+`--ams.sub` : ARGO messaging subscription to pull from
+
+Job optional cli parameters:
+
+`--hbase.master` : hbase endpoint
+`--hbase.master.port` : hbase master port
+`--hbase.zk.quorum` : comma separated list of hbase zookeeper servers
+`--hbase.zk.port` : port used by hbase zookeeper servers
+`--hbase.namespace` : table namespace used (usually tenant name)
+`--hbase.table` : table name (usually metric_data)
+`--hdfs.path` : base path for storing metric data on hdfs
+`--check.path` : path to store flink checkpoints
+`--check.interval` : interval for checkpointing (in ms)
+`--ams.batch` : num of messages to be retrieved per request to AMS service
+`--ams.interval` : interval (in ms) between AMS service requests
+`--ams.proxy` : optional http proxy url to be used for AMS requests
+`--ams.verify` : optional turn on/off ssl verify
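For illustration only, a hypothetical invocation of the ingest-metric job using the new dotted parameter names; every value below (endpoint, token, project, paths, intervals) is a placeholder and not taken from this commit, and the optional hbase flags are omitted:

```bash
# Hypothetical example: ingest metric data from AMS and write it to HDFS only.
flink run ams-ingest-metric-0.1.jar \
  --ams.endpoint msg.example.com \
  --ams.port 443 \
  --ams.token s3cr3t \
  --ams.project EGI \
  --ams.sub ingest_metric \
  --avro.schema ./metric_data.avsc \
  --hdfs.path hdfs://localhost:9000/user/foo/metric_data \
  --check.path hdfs://localhost:9000/user/foo/checkpoints \
  --check.interval 30000 \
  --ams.batch 100 \
  --ams.interval 300
```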

 ### Metric data hbase schema

@@ -71,21 +87,23 @@ Each hbase table has a column family named 'data' and the following columns:

 `tags` : json list of tags used to add metadata to the metric event

-### Stream Status
-
-Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription.
-For each metric data message the job calculates status changes in the whole topology and produces status Events.
-The status events are then forwarded to a specific kafka topic
+### AMS ingest connector (sync) data to HDFS
+
+Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages that contain connector (sync) data (metric profiles, topology, weights etc.) from a specific project/subscription and stores them to an hdfs destination. Each message should have the following attributes:
+- report: name of the report that the connector data belong to
+- type: type of the connector data (metric_profile, group_endpoints, group_groups, weights, downtimes)
+- partition_date: YYYY-MM-DD format of date that the current connector data relates to.

 Prepare job to submit in flink:

-- `cd flink_jobs/stream_status`
+- `cd flink_jobs/ams_ingest_sync`
 - `mvn clean && mvn package`

 Run jar in flink:

-- `flink run streaming-status-0.1.jar --ams-endpoint {...} --ams-port {...} --ams-token {...} -ams-project {...} --ams-sub {...} --avro-schema {...} --hbase-master {...} --hbase-zk-quorum {...} --hbase-zk-port {...} --hbase-namespace {...} --hbase-table {...} --hbase-master-port {...} --sync-mps {...} --sync-egp {...} --sync-aps {...} --sync-ops {...}`
+- `flink run ams-ingest-sync-0.1.jar --ams.endpoint {...} --ams.port {...} --ams.token {...} -ams.project {...} --ams.sub {...} --hdfs.path {...} --ams.batch {...} --ams.interval {...}`

 Job required cli parameters:
@@ -99,27 +117,100 @@ Job required cli parameters:

 `--ams.sub` : ARGO messaging subscription to pull from

-`--avro.schema` : Schema used for the decoding of metric data payload
+`--hdfs.path` : Base hdfs path to store connector data to (e.g. hdfs://localhost:9000/user/foo/path/to/tenant)
+
+`--ams.batch` : num of messages to be retrieved per request to AMS service
+
+`--ams.interval` : interval (in ms) between AMS service requests
+
+`--ams.proxy` : optional http proxy url to be used for AMS requests
+
+`--ams.verify` : optional turn on/off ssl verify
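For illustration only, a hypothetical invocation of the sync-ingest job; every value is a placeholder and not part of this commit. Each AMS message the job pulls is expected to carry the report, type and partition_date attributes described above (e.g. report=Critical, type=metric_profile, partition_date=2018-03-01):

```bash
# Hypothetical example: pull connector (sync) data from AMS and store it under HDFS.
flink run ams-ingest-sync-0.1.jar \
  --ams.endpoint msg.example.com \
  --ams.port 443 \
  --ams.token s3cr3t \
  --ams.project EGI \
  --ams.sub ingest_sync \
  --hdfs.path hdfs://localhost:9000/user/foo/path/to/tenant \
  --ams.batch 100 \
  --ams.interval 300
```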
+### Stream Status
+
+Flink job that connects as a subscriber to an ARGO Messaging Service, pulls messages from a specific project/subscription.
+For each metric data message the job calculates status changes in the whole topology and produces status Events.
+The status events are then forwarded to a specific kafka topic (and/or) hbase table (and/or) filesystem
+
+Prepare job to submit in flink:
+
+- `cd flink_jobs/stream_status`
+- `mvn clean && mvn package`
+
+Run jar in flink:
+
+- `flink run streaming-status-0.1.jar --ams.endpoint {...} --ams.port {...} --ams.token {...} -ams.project {...} --ams.sub.metric {...} --ams.sub.sync {...} --sync.mps {...} --sync.egp {...} --sync.aps {...} --sync.ops {...} --hbase.master {...} --hbase.zk.quorum {...} --hbase.zk.port {...} --hbase.namespace {...} --hbase.table {...} --hbase.master.port {...} --kafka.servers {...} --kafka.topic {...} --fs.output {...} --ams.batch {...} --ams.interval {...}`
+
+Job required cli input parameters:
+
+`--ams.endpoint` : ARGO messaging api endpoint to connect to msg.example.com
+`--ams.port` : ARGO messaging api port
+`--ams.token` : ARGO messaging api token
+`--ams.project` : ARGO messaging api project to connect to
+`--ams.sub.metric` : ARGO messaging subscription to pull metric data from
+`--ams.sub.sync` : ARGO messaging subscription to pull sync data from

-`--hbase-master` : hbase endpoint
-`--hbase-master-port` : hbase master port
-`--hbase-zk-quorum` : comma separated list of hbase zookeeper servers
-`--hbase-zk-port` : port used by hbase zookeeper servers
-`--hbase-namespace` : table namespace used (usually tenant name)
-`--hbase-table` : table name (usually metric_data)

 `--sync.mps` : Metric profile file used
 `--sync.egp` : Endpoint group topology file used
 `--sync.aps` : Aggregation profile used
-`--sync.ops` :Operations profile used
+`--sync.ops` : Operations profile used
+
+Job optional cli parameters for hbase output:
+
+`--hbase.master` : hbase endpoint
+`--hbase.master.port` : hbase master port
+`--hbase.zk.quorum` : comma separated list of hbase zookeeper servers
+`--hbase.zk.port` : port used by hbase zookeeper servers
+`--hbase.namespace` : table namespace used (usually tenant name)
+`--hbase.table` : table name (usually metric_data)
+
+Job optional cli parameters for kafka output:
+
+`--kafka.servers` : Kafka server list to connect to
+`--kafka.topic` : Kafka topic to send status events to
+
+Job optional cli parameters for filesystem output (local/hdfs):
+
+`--fs.output` : filesystem path for output (prefix with "hdfs://" for hdfs usage)
+
+Job optional cli parameters for mongo output:
+
+`--mongo.uri` : Mongo uri to store status events to
+`--mongo.method` : Mongo store method used (insert/upsert)
+
+Job optional cli parameters for ams ingestion:
+
+`--ams.batch` : num of messages to be retrieved per request to AMS service
+`--ams.interval` : interval (in ms) between AMS service requests
+
+Other optional cli parameters:
+
+`--daily` : true/false - controls daily regeneration of events (not used in notifications)
+`--timeout` : long(ms) - controls default timeout for event regeneration (used in notifications)
+`--ams.proxy` : optional http proxy url to be used for AMS requests
+`--ams.verify` : optional turn on/off ssl verify
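For illustration only, a hypothetical invocation of the streaming status job that forwards events to Kafka; every endpoint, profile path and Kafka setting below is a placeholder:

```bash
# Hypothetical example: compute status events from AMS metric/sync subscriptions
# and forward them to a Kafka topic.
flink run streaming-status-0.1.jar \
  --ams.endpoint msg.example.com \
  --ams.port 443 \
  --ams.token s3cr3t \
  --ams.project EGI \
  --ams.sub.metric status_metric \
  --ams.sub.sync status_sync \
  --sync.mps ./metric_profile.avro \
  --sync.egp ./group_endpoints.avro \
  --sync.aps ./aggregation_profile.json \
  --sync.ops ./operations_profile.json \
  --kafka.servers kafka1.example.com:9092,kafka2.example.com:9092 \
  --kafka.topic status_events \
  --ams.batch 100 \
  --ams.interval 300
```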
 ### Status events schema

@@ -174,18 +265,18 @@ Job required cli parameters:

 `--rec` : file location of recomputations file (local or hdfs)

-`--report` : report uuid
+`--conf` : file location of report configuration json file (local or hdfs)

 `--run.date` : target date in DD-MM-YYYY format

-`--egroup.type` : endpoint group type used in report (for e.g. SITES)
+`--mongo.uri` : MongoDB uri for outputting the results to (e.g. mongodb://localhost:21017/example_db)

-`--datastore.uri` : datastore uri for outputting the results
+`--mongo.method` : MongoDB method to be used when storing the results ~ either: `insert` or `upsert`

-## Batch Status
+## Batch AR

-Flink batch job that calculates status results for a specific date
+Flink batch job that calculates a/r results for a specific date

 Prepare job to submit in flink:
@@ -219,12 +310,26 @@ Job required cli parameters:

 `--downtimes` : file location of downtimes file (local or hdfs)

-`--report` : report uuid
+`--conf` : file location of report configuration json file (local or hdfs)

 `--run.date` : target date in DD-MM-YYYY format

-`--egroup.type` : endpoint group type used in report (for e.g. SITES)
-`--ggroup.type` : group of groups type used in report (for e.g. NGI)
-`--datastore.uri` : datastore uri for outputting the results
+`--mongo.uri` : MongoDB uri for outputting the results to (e.g. mongodb://localhost:21017/example_db)
+
+`--mongo.method` : MongoDB method to be used when storing the results ~ either: `insert` or `upsert`
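For illustration only, a hypothetical Batch AR submission built from the parameters shown above; the jar name and every value are assumptions made for this sketch, and the job's full parameter list is not visible in this diff:

```bash
# Hypothetical example: jar name, paths and dates are placeholders.
flink run batch-ar-0.1.jar \
  --conf hdfs://localhost:9000/user/foo/report_conf.json \
  --downtimes hdfs://localhost:9000/user/foo/downtimes.avro \
  --run.date 01-03-2018 \
  --mongo.uri mongodb://localhost:21017/example_db \
  --mongo.method insert
```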
+
+## Flink job names
+Running flink jobs can be listed either in the flink dashboard by visiting `http://{{flink.webui.host}}:{{flink.webui.port}}`
+or by querying the jobmanager api at `http://{{flink.webui.host}}:{{flink.webui.port}}/joboverview/running`
+
+Each job submitted has a discerning job name based on a specific template. Job names are also used by the submission wrapper scripts (`/bin` folder) to check whether an identical job is already running (to avoid duplicate submission)
+
+Job Name schemes:
+
+Job Type | Job Name scheme
+---------|----------------
+Ingest Metric | Ingesting metric data from `{{ams-endpoint}}`/v1/projects/`{{project}}`/subscriptions/`{{subscription}}`
+Ingest Sync | Ingesting sync data from `{{ams-endpoint}}`/v1/projects/`{{project}}`/subscriptions/`{{subscription}}`
+Batch AR | Ar Batch job for tenant:`{{tenant}}` on day:`{{day}}` using report:`{{report}}`
+Batch Status | Status Batch job for tenant:`{{tenant}}` on day:`{{day}}` using report:`{{report}}`
+Streaming Status | Streaming status using data from `{{ams-endpoint}}`/v1/projects/`{{project}}`/subscriptions/[`{{metric_subscription}}`,`{{sync_subscription}}`]
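The duplicate-submission check mentioned above boils down to asking the jobmanager whether a job with the same name is already in the running list. A rough shell sketch (host, port and job name are placeholders; the wrapper scripts in `/bin` implement their own version of this):

```bash
# Query the Flink jobmanager for running jobs and skip submission
# if a job with an identical name is already present.
FLINK_HOST=localhost
FLINK_PORT=8081
JOB_NAME="Ingesting metric data from msg.example.com/v1/projects/EGI/subscriptions/ingest_metric"

if curl -s "http://${FLINK_HOST}:${FLINK_PORT}/joboverview/running" | grep -q "${JOB_NAME}"; then
  echo "An identical job is already running - skipping submission"
else
  echo "No duplicate found - safe to submit"
fi
```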
