Commit 69a62ce

Merge pull request #40 from mwvgroup/u/tjr/broker-v0.2
version 0.3 (the branch name indicates "v0.2", but is incorrect)
2 parents aea85f0 + 8bdb9fe commit 69a62ce

109 files changed

+4336
-2084
lines changed

README.md

Lines changed: 81 additions & 0 deletions
@@ -1,5 +1,84 @@
# Pitt-Google LSST Broker

- [Broker Architecture](#broker-architecture)
- [Setup the Broker for the First Time](#setup-the-broker-for-the-first-time)
- [Run Nightly Broker](#run-nightly-broker)
- [Note on Resources for Test Runs](#note-on-resources-for-test-runs)
- [original README](#ogread)

__Useful tutorial/reference docs__
- [__broker/README__.md](broker/README.md)
- [broker/consumer/__kafka_console_connect__.md](broker/consumer/kafka_console_connect.md)
- [broker/beam/__beam_dataflow_primer__.md](broker/beam/beam_dataflow_primer.md)

---

# Broker Architecture
<!-- fs -->
In short:

There are 4 main components:
- The __consumer__ ingests the ZTF Kafka stream and republishes it as a Pub/Sub stream.
- The __data storage__ (x2) and __processing__ (x1) components each subscribe to the consumer's Pub/Sub stream. They store their output data in Cloud Storage and/or BigQuery, and publish it to a dedicated Pub/Sub topic.

In addition, there is a "__night conductor__" (running on a VM) that orchestrates the broker, starting up resources and jobs at night. It is also meant to shut them down in the morning, but that script isn't written yet.

You can monitor the production broker at the [__ZTF Stream Monitoring Dashboard__](https://console.cloud.google.com/monitoring/dashboards/builder/d8b7db8b-c875-4b93-8b31-d9f427f0c761?project=ardent-cycling-243415&dashboardBuilderState=%257B%2522editModeEnabled%2522:false%257D&timeDomain=1w).

Details:

1. __ZTF Alert Stream Consumer__ (ZTF Alert: Kafka -> Pub/Sub)
    - __Compute Engine VM:__ [`ztf-consumer`](https://console.cloud.google.com/compute/instancesMonitoringDetail/zones/us-central1-a/instances/ztf-consumer?project=ardent-cycling-243415&tab=monitoring&duration=PT1H)
    - __Running__ Kafka Connect [`CloudPubSubConnector`](https://github.com/GoogleCloudPlatform/pubsub/tree/master/kafka-connector)
    - __Publishes to__ Pub/Sub topic: [`ztf_alert_data`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_alert_data?project=ardent-cycling-243415)

2. __Avro File Storage__ (ZTF Alert -> Fix Schema -> GCS bucket)
    - __Cloud Function:__ [`upload_ztf_bytes_to_bucket`](https://console.cloud.google.com/functions/details/us-central1/upload_ztf_bytes_to_bucket?project=ardent-cycling-243415&pageState=%28%22functionsDetailsCharts%22:%28%22groupValue%22:%22P1D%22,%22customValue%22:null%29%29)
    - __Listens to__ PS topic: [`ztf_alert_data`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_alert_data?project=ardent-cycling-243415)
    - __Stores in__ GCS bucket: [`ztf_alert_avro_bucket`](https://console.cloud.google.com/storage/browser/ardent-cycling-243415_ztf_alert_avro_bucket;tab=objects?forceOnBucketsSortingFiltering=false&project=ardent-cycling-243415&prefix=&forceOnObjectsSortingFiltering=false)
    - __GCS bucket triggers__ Pub/Sub topic: [`ztf_alert_avro_bucket`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_alert_avro_bucket?project=ardent-cycling-243415)

3. __BigQuery Database Storage__ (ZTF Alert -> BigQuery)
    - __Dataflow job:__ [`production-ztf-ps-bq`](https://console.cloud.google.com/dataflow/jobs?project=ardent-cycling-243415)
    - __Listens to__ PS topic: [`ztf_alert_data`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_alert_data?project=ardent-cycling-243415)
    - __Stores in__ BQ table: [`ztf_alerts.alerts`](https://console.cloud.google.com/bigquery?project=ardent-cycling-243415) (ZTF alert data)

4. __Data Processing (value-added products)__ (ZTF Alert -> Extragalactic Transients Filter -> Salt2 Fit)
    - __Dataflow job:__ [`production-ztf-ps-exgal-salt2`](https://console.cloud.google.com/dataflow/jobs?project=ardent-cycling-243415)
    - __Listens to__ PS topic: [`ztf_alert_data`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_alert_data?project=ardent-cycling-243415)
    - __Stores in__ BQ table: [`ztf_alerts.salt2`](https://console.cloud.google.com/bigquery?project=ardent-cycling-243415) (Salt2 fit params)
    - __Stores in__ GCS bucket: [`ztf-sncosmo/salt2/plot_lc`](https://console.cloud.google.com/storage/browser/ardent-cycling-243415_ztf-sncosmo/salt2/plot_lc?pageState=%28%22StorageObjectListTable%22:%28%22f%22:%22%255B%255D%22%29%29&project=ardent-cycling-243415&prefix=&forceOnObjectsSortingFiltering=false) (lightcurve + Salt2 fit, png)
    - __Publishes to__ PS topics:
        - [`ztf_exgalac_trans`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_exgalac_trans?project=ardent-cycling-243415) (alerts passing extragalactic transient filter)
        - [`ztf_salt2`](https://console.cloud.google.com/cloudpubsub/topic/detail/ztf_salt2?project=ardent-cycling-243415) (Salt2 fit params)

5. __Night Conductor__ (orchestrates GCP resources and jobs to run the broker each night)
    - __Compute Engine VM:__ [`night-conductor`](https://console.cloud.google.com/compute/instancesDetail/zones/us-central1-a/instances/night-conductor?tab=details&project=ardent-cycling-243415)
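
The wiring in the details list can be summarized as a small data model. This is only an illustrative sketch, not code from the broker: the `Component` class, the `BROKER` tuple, and the two helper functions are hypothetical names introduced here, while the resource, topic, bucket, and table names are copied from the list above. It makes no calls to Google Cloud.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """One broker component (hypothetical model; names come from the README list)."""
    role: str
    resource: str                        # VM, Cloud Function, or Dataflow job name
    listens_to: Optional[str] = None     # Pub/Sub topic consumed, if any
    publishes_to: Tuple[str, ...] = ()   # Pub/Sub topics produced
    stores_in: Tuple[str, ...] = ()      # GCS buckets / BigQuery tables written


BROKER = (
    Component("consumer", "ztf-consumer",
              publishes_to=("ztf_alert_data",)),
    Component("avro file storage", "upload_ztf_bytes_to_bucket",
              listens_to="ztf_alert_data",
              publishes_to=("ztf_alert_avro_bucket",),  # emitted by the bucket trigger
              stores_in=("ztf_alert_avro_bucket",)),
    Component("bigquery storage", "production-ztf-ps-bq",
              listens_to="ztf_alert_data",
              stores_in=("ztf_alerts.alerts",)),
    Component("data processing", "production-ztf-ps-exgal-salt2",
              listens_to="ztf_alert_data",
              publishes_to=("ztf_exgalac_trans", "ztf_salt2"),
              stores_in=("ztf_alerts.salt2", "ztf-sncosmo/salt2/plot_lc")),
    Component("night conductor", "night-conductor"),
)


def listeners(topic: str) -> List[str]:
    """Resources that subscribe to the given Pub/Sub topic."""
    return [c.resource for c in BROKER if c.listens_to == topic]


def unconsumed_topics() -> List[str]:
    """Topics that are published but that no listed component subscribes to."""
    published = {t for c in BROKER for t in c.publishes_to}
    consumed = {c.listens_to for c in BROKER if c.listens_to}
    return sorted(published - consumed)


if __name__ == "__main__":
    print(listeners("ztf_alert_data"))
    print(unconsumed_topics())
```

In this model, `listeners("ztf_alert_data")` returns the three downstream components, and `unconsumed_topics()` lists the output topics that are available to subscribers outside the components described here.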

<!-- fe Broker Architecture -->

---

# Setup the Broker for the First Time
<!-- fs -->
1. Set up and configure a new Google Cloud Platform (GCP) project.
    - [Instructions in our current docs](https://pitt-broker.readthedocs.io/en/latest/installation_setup/installation.html). We would need to follow pieces of the "Installation" and "Defining Environmental Variables" sections. Our project is already set up, so most of the details are left out for now.

2. Install GCP tools on your machine:
    - [Google Cloud SDK](https://cloud.google.com/sdk/docs/install): Follow the instructions at the link. (This installs the `gcloud`, `gsutil` and `bq` command line tools.) I use Google Cloud SDK 323.0.0 as a minimum version.
    - [Cloud Client Libraries for Python](https://cloud.google.com/python/docs/reference): Each service requires a different library; the ones we need are (I hope) all listed in the `requirements.txt` in this directory. Install them with, e.g., `pip install -r requirements.txt`.

3. Follow instructions in [broker/README.md](broker/README.md) to complete the setup.
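
As a rough aid for step 2, the sketch below checks a Google Cloud SDK version string against the 323.0.0 minimum mentioned above. It is not part of the broker code: the function names are hypothetical, and it assumes the version text contains a line of the form `Google Cloud SDK X.Y.Z`, which is how `gcloud version` typically reports it.

```python
import re
from typing import Tuple

# Minimum Google Cloud SDK version quoted in the setup steps above.
MIN_SDK_VERSION = (323, 0, 0)


def parse_sdk_version(version_output: str) -> Tuple[int, ...]:
    """Extract the SDK version as a tuple of ints.

    Assumes the text contains 'Google Cloud SDK X.Y.Z' (an assumption about
    the output format, not something this README guarantees).
    """
    match = re.search(r"Google Cloud SDK (\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("could not find a Google Cloud SDK version in the text")
    return tuple(int(part) for part in match.group(1).split("."))


def sdk_is_recent_enough(version_output: str,
                         minimum: Tuple[int, ...] = MIN_SDK_VERSION) -> bool:
    """True if the reported SDK version is at least `minimum`."""
    return parse_sdk_version(version_output) >= minimum
```

One way to obtain the input is `subprocess.run(["gcloud", "version"], capture_output=True, text=True).stdout`, which requires the SDK to be installed and on your `PATH`.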
<!-- fe Setup the Broker -->

---
<a name="ogread"></a>
The following is the original README:
<!-- fs -->

[![python](https://img.shields.io/badge/python-3.7-g.svg)]()
[![Build Status](https://travis-ci.com/mwvgroup/Pitt-Google-Broker.svg?branch=master)](https://travis-ci.com/mwvgroup/Pitt-Google-Broker)
[![Documentation Status](https://readthedocs.org/projects/pitt-broker/badge/?version=latest)](https://pitt-broker.readthedocs.io/en/latest/?badge=latest)
@@ -9,3 +88,5 @@ Data from the Large Synoptic Survey Telescope ([LSST](https://www.lsst.org)) wil
The 60-second alert stream will not be made available to the public (at least not in its entirety). Instead, LSST will rely on a small number of (~7) community-developed *broker* systems to publicly relay the information. This repo represents the construction of an LSST broker designed to run on the Google Cloud Platform ([GCP](https://cloud.google.com)) using alerts from the Zwicky Transient Facility ([ZTF](https://www.ztf.caltech.edu)) as a testing ground.

Full documentation is available online via [Read the Docs](https://pitt-broker.readthedocs.io/en/latest/index.html).

<!-- fe OG readme -->
