Merged
Changes from 16 commits (22 commits total)
4 changes: 3 additions & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
@@ -36,4 +36,6 @@ USER gen3

WORKDIR /${appname}

CMD ["poetry", "run", "gunicorn", "gen3workflow.app:app", "-k", "uvicorn.workers.UvicornWorker", "-c", "gunicorn.conf.py"]
RUN chmod 755 bin/run.sh

CMD ["bash", "bin/run.sh"]
44 changes: 7 additions & 37 deletions bin/_common_setup.sh
@@ -6,45 +6,15 @@ set -e

CURRENT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

if [ -f "/src/gen3-workflow-config.yaml" ]; then
# Note: this config file path only exists in cloud-automation deployments; local runs use `python run.py` instead.
echo "Loading PROMETHEUS_MULTIPROC_DIR from /src/gen3-workflow-config.yaml"
PROMETHEUS_MULTIPROC_DIR=$(grep 'PROMETHEUS_MULTIPROC_DIR:' /src/gen3-workflow-config.yaml | awk -F': ' '{print $2}' | tr -d '"')
Collaborator:

mm the config is only at /src/gen3-workflow-config.yaml for cloud-automation deployments. If you run the service locally, for example, it's not there.

Also, I don't see PROMETHEUS_MULTIPROC_DIR in the default config, should we add it there?

And I don't get the reference to .env, since we don't use that here.

Contributor Author (@nss10, Apr 17, 2025):

Thanks!

  1. This script is only used when launching the service in an environment with multiple workers using gunicorn. In that scenario, it's mandatory to have PROMETHEUS_MULTIPROC_DIR set before the application starts.
    For local development, we continue using `python run.py`, where the Prometheus directory is set during the initialization of the metrics object.
  2. The value is already added in `config-default.yaml` and also in `config.py`.
  3. The .env comment was an oversight -- I'll update that!

Collaborator:

  1. Gotcha, could you add a comment explaining that it's made to work with cloud-automation
  2. Not sure how I missed it
else
PROMETHEUS_MULTIPROC_DIR=""
fi
# Source the environment variables from the metrics setup script
source "${CURRENT_DIR}/setup_prometheus" "$PROMETHEUS_MULTIPROC_DIR"
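The grep/awk pipeline above pulls a single scalar out of the config YAML. Its lookup logic can be mirrored in a few lines of stdlib Python for testing in isolation; a minimal sketch (the function name is ours, and real deployments read the full config through `gen3workflow/config.py`):

```python
import re


def read_yaml_scalar(path: str, key: str) -> str:
    """Naive single-level YAML lookup, mirroring the grep/awk pipeline:
    find the first `key: value` line and return the unquoted value,
    or "" when the file or key is missing."""
    pattern = re.compile(rf'^\s*{re.escape(key)}:\s*(.*?)\s*$')
    try:
        with open(path) as f:
            for line in f:
                m = pattern.match(line)
                if m:
                    return m.group(1).strip('"\'')
    except FileNotFoundError:
        pass
    return ""
```

Like the shell pipeline, this only handles flat, single-line keys; nested YAML would need a real parser.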

echo "installing dependencies with 'poetry install -vv'..."
poetry install -vv
poetry env info
echo "ensuring db exists"

# Get the username, password, host, port, and database name
db_settings=$(poetry run python $CURRENT_DIR/../gen3workflow/config.py | tail -1)
if [ -z "${db_settings}" ]; then
echo "'gen3workflow/config.py' did not return DB settings"
exit 1
fi
db_settings_array=($db_settings)
HOST=${db_settings_array[0]}
PORT=${db_settings_array[1]}
USER=${db_settings_array[2]}
PASSWORD=${db_settings_array[3]}
DB_NAME=${db_settings_array[4]}

if [ -z "${HOST}" ] || [ -z "${PORT}" ] || [ -z "${USER}" ] || [ -z "${PASSWORD}" ] || [ -z "${DB_NAME}" ]; then
echo "Failed to extract one or more components from DB settings"
exit 1
fi

echo "Extracted database name: ${DB_NAME}"
echo "Extracted username: ${USER}"

# Check if the database exists
# Use the full connection string to connect directly
if [ "$( PGPASSWORD="${PASSWORD}" psql -h "${HOST}" -p "${PORT}" -U "${USER}" -d postgres -XtAc "SELECT 1 FROM pg_database WHERE datname='${DB_NAME}'" )" = '1' ]
then
echo "Database ${DB_NAME} already exists."
else
echo "Database ${DB_NAME} does not exist. Creating it..."
# Connect to the default postgres database to create the new database
PGPASSWORD="${PASSWORD}" psql -h "${HOST}" -p "${PORT}" -U "${USER}" -d postgres -c "CREATE DATABASE \"${DB_NAME}\";"
fi

echo "running db migration with 'poetry run alembic upgrade head'..."
poetry run alembic upgrade head
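The script above splits the single settings line from `gen3workflow/config.py` into five positional fields. That word-splitting is easy to get subtly wrong, so here is a hedged Python sketch of the same parse (the names are ours, not from the repo):

```python
from typing import NamedTuple


class DBSettings(NamedTuple):
    host: str
    port: str
    user: str
    password: str
    db_name: str


def parse_db_settings(line: str) -> DBSettings:
    """Split the space-separated settings line into named fields,
    failing loudly if any component is missing (mirrors the script's
    empty-variable check)."""
    parts = line.split()
    if len(parts) != 5 or not all(parts):
        raise ValueError("Failed to extract one or more components from DB settings")
    return DBSettings(*parts)
```

Note the same caveat applies to the shell version: a password containing whitespace would break this positional format.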
36 changes: 36 additions & 0 deletions bin/_setup_db.sh
@@ -0,0 +1,36 @@
echo "ensuring db exists"

# Get the username, password, host, port, and database name
db_settings=$(poetry run python "$CURRENT_DIR/../gen3workflow/config.py" | tail -1)
if [ -z "${db_settings}" ]; then
echo "'gen3workflow/config.py' did not return DB settings"
exit 1
fi
db_settings_array=($db_settings)
HOST=${db_settings_array[0]}
PORT=${db_settings_array[1]}
USER=${db_settings_array[2]}
PASSWORD=${db_settings_array[3]}
DB_NAME=${db_settings_array[4]}

if [ -z "${HOST}" ] || [ -z "${PORT}" ] || [ -z "${USER}" ] || [ -z "${PASSWORD}" ] || [ -z "${DB_NAME}" ]; then
echo "Failed to extract one or more components from DB settings"
exit 1
fi

echo "Extracted database name: ${DB_NAME}"
echo "Extracted username: ${USER}"

# Check if the database exists
# Use the full connection string to connect directly
if [ "$( PGPASSWORD="${PASSWORD}" psql -h "${HOST}" -p "${PORT}" -U "${USER}" -d postgres -XtAc "SELECT 1 FROM pg_database WHERE datname='${DB_NAME}'" )" = '1' ]
then
echo "Database ${DB_NAME} already exists."
else
echo "Database ${DB_NAME} does not exist. Creating it..."
# Connect to the default postgres database to create the new database
PGPASSWORD="${PASSWORD}" psql -h "${HOST}" -p "${PORT}" -U "${USER}" -d postgres -c "CREATE DATABASE \"${DB_NAME}\";"
fi

echo "running db migration with 'poetry run alembic upgrade head'..."
poetry run alembic upgrade head
27 changes: 27 additions & 0 deletions bin/run.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -e

# Mostly simulates the production run of the app as described in the Dockerfile:
# Gunicorn with multiple Uvicorn workers, plus small config overrides for local
# dev (hot reload when the code is modified, logs to stdout).

CURRENT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
echo "$CURRENT_DIR"
export ENV="production"

if [ -f "${CURRENT_DIR}/.env" ]; then
echo "Loading environment variables from ${CURRENT_DIR}/.env"
source "${CURRENT_DIR}/.env"
else
echo "No .env file found in ${CURRENT_DIR}. Using default environment variables."
fi

source "${CURRENT_DIR}/bin/_common_setup.sh"

poetry run gunicorn \
gen3workflow.app:app \
-k uvicorn.workers.UvicornWorker \
-c gunicorn.conf.py \
--reload \
--access-logfile - \
--error-logfile -
33 changes: 33 additions & 0 deletions bin/setup_prometheus
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
# Prepares the prometheus_multiproc_dir folder to store the metrics from separate workers (per PID)
#
# This script is called by:
#   Dockerfile & run.py
#     - so local runs set up the necessary environment vars and folders for Prometheus metrics
#   Test framework in conftest
#     - so test runs set up the necessary environment vars and folders for Prometheus metrics

# Usage:
# ./setup_prometheus [DIR] [true]

# Default directory if no argument is provided
DIR=${1:-/var/tmp/prometheus_metrics}

# Determine whether to wipe the directory (default is to wipe)
SETUP_DIR=${2:-true}

set -ex

if [[ "$SETUP_DIR" == "true" ]]; then
echo "setting up $DIR: clearing existing files, ensuring it exists, chmod 755"
rm -Rf "$DIR"
mkdir -p "$DIR"
chmod 755 "$DIR"
fi

if id -u gen3 &>/dev/null; then
chown "$(id -u gen3)":"$(id -g gen3)" "$DIR"
fi

export PROMETHEUS_MULTIPROC_DIR="$DIR"
echo "PROMETHEUS_MULTIPROC_DIR is $PROMETHEUS_MULTIPROC_DIR"
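The directory preparation that `setup_prometheus` performs (wipe, recreate, chmod 755, export the env var) can also be sketched as a stdlib-only Python helper, useful e.g. from a conftest; the function name is ours:

```python
import os
import shutil


def setup_prometheus_dir(
    dir_path: str = "/var/tmp/prometheus_metrics", wipe: bool = True
) -> str:
    """Prepare the multiprocess metrics directory: optionally wipe it,
    recreate it with mode 755, then export PROMETHEUS_MULTIPROC_DIR so
    prometheus_client's multiprocess mode can find it."""
    if wipe:
        shutil.rmtree(dir_path, ignore_errors=True)
        os.makedirs(dir_path, exist_ok=True)
        os.chmod(dir_path, 0o755)
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = dir_path
    return dir_path
```

As in the shell script, passing `wipe=False` leaves any existing per-PID metric files in place.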
1 change: 1 addition & 0 deletions bin/test.sh
@@ -4,6 +4,7 @@ set -e
CURRENT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

source "${CURRENT_DIR}/_common_setup.sh"
source "${CURRENT_DIR}/_setup_db.sh"

echo "running tests with 'pytest'..."
poetry run pytest -vv --cov=gen3workflow --cov=migrations --cov-report term-missing:skip-covered --cov-report xml
11 changes: 11 additions & 0 deletions docs/metrics.md
@@ -0,0 +1,11 @@
## Metrics

Metrics can be exposed at a `/metrics` endpoint compatible with Prometheus scraping and visualized in Prometheus, Grafana, etc.

The metrics are defined in `gen3workflow/metrics.py` as follows:

* **gen3_workflow_api_requests_total**: API requests made to the Gen3-Workflow service.
* *More metrics yet to be decided*

You can [run Prometheus locally](https://github.com/prometheus/prometheus) if you want to test or visualize these.
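If you do run Prometheus locally, a minimal scrape configuration pointing at this service could look like the following; the job name and target port are assumptions, not values taken from this repo:

```yaml
scrape_configs:
  - job_name: gen3workflow
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]
```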
9 changes: 9 additions & 0 deletions docs/openapi.yaml
@@ -47,6 +47,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Get Status Get
type: object
description: Successful Response
@@ -61,6 +62,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Get Status Status Get
type: object
description: Successful Response
@@ -75,6 +77,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Get Version Version Get
type: object
description: Successful Response
@@ -89,6 +92,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Service Info Ga4Gh Tes V1 Service Info Get
type: object
description: Successful Response
@@ -105,6 +109,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response List Tasks Ga4Gh Tes V1 Tasks Get
type: object
description: Successful Response
@@ -120,6 +125,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Create Task Ga4Gh Tes V1 Tasks Post
type: object
description: Successful Response
@@ -143,6 +149,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Get Task Ga4Gh Tes V1 Tasks Task Id Get
type: object
description: Successful Response
@@ -172,6 +179,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Cancel Task Ga4Gh Tes V1 Tasks Task Id Cancel Post
type: object
description: Successful Response
@@ -451,6 +459,7 @@ paths:
content:
application/json:
schema:
additionalProperties: true
title: Response Get Storage Info Storage Info Get
type: object
description: Successful Response
55 changes: 54 additions & 1 deletion gen3workflow/app.py
@@ -1,17 +1,21 @@
from fastapi import FastAPI
from fastapi.security import HTTPAuthorizationCredentials
import httpx
from importlib.metadata import version
import os
import time

from cdislogging import get_logger
from gen3authz.client.arborist.async_client import ArboristClient

from fastapi import Request, HTTPException
from gen3workflow import logger
from gen3workflow.config import config
from gen3workflow.metrics import Metrics
from gen3workflow.routes.ga4gh_tes import router as ga4gh_tes_router
from gen3workflow.routes.s3 import router as s3_router
from gen3workflow.routes.storage import router as storage_router
from gen3workflow.routes.system import router as system_router
from gen3workflow.auth import Auth


def get_app(httpx_client=None) -> FastAPI:
@@ -54,6 +58,55 @@ def get_app(httpx_client=None) -> FastAPI:
logger=get_logger("gen3workflow.gen3authz", log_level=log_level),
)

logger.info(
f"Setting up Metrics with ENABLE_PROMETHEUS_METRICS flag set to {config['ENABLE_PROMETHEUS_METRICS']}"
)
app.metrics = Metrics(
enabled=config["ENABLE_PROMETHEUS_METRICS"],
prometheus_dir=config["PROMETHEUS_MULTIPROC_DIR"],
)

if app.metrics.enabled:
app.mount("/metrics", app.metrics.get_asgi_app())

@app.middleware("http")
async def middleware_log_response_and_api_metric(
request: Request, call_next
):
"""
This FastAPI middleware runs logic before and after each request.

We use it to log responses consistently across the defined endpoints (including execution time) and to record API metrics.

Args:
request (Request): the incoming HTTP request
call_next (Callable): function to call (this is handled by FastAPI's middleware support)
"""
start_time = time.perf_counter()
response = await call_next(request)
response_time_seconds = time.perf_counter() - start_time

path = request.url.path
method = request.method
if path not in config["ENDPOINTS_WITH_METRICS"]:
return response

try:
# TODO: Add user_id to this metric
metrics = app.metrics
metrics.add_create_task_api_interaction(
method=method,
path=path,
response_time_seconds=response_time_seconds,
status_code=response.status_code,
)
except Exception as e:
logger.warning(
f"Metrics were not logged for the request with {method=}, {path=}, {response.status_code=}, {response_time_seconds=}. Failed due to {e}",
exc_info=True,
)
return response

return app


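The middleware's post-request branch, path allow-list filtering plus swallow-and-log on metric failure, can be isolated as a plain function for testing; a sketch under the assumption that `emit` has the same keyword signature as `add_create_task_api_interaction` (the function and parameter names here are ours):

```python
from typing import Callable, Iterable


def record_api_metric(
    method: str,
    path: str,
    status_code: int,
    response_time_seconds: float,
    endpoints_with_metrics: Iterable[str],
    emit: Callable[..., None],
) -> bool:
    """Mirror of the middleware's post-request logic: only emit a metric
    when the path is on the allow-list, and never let a metrics failure
    break the response. Returns True only when a metric was recorded."""
    if path not in endpoints_with_metrics:
        return False
    try:
        emit(
            method=method,
            path=path,
            status_code=status_code,
            response_time_seconds=response_time_seconds,
        )
        return True
    except Exception:
        # The real middleware logs a warning here instead of raising.
        return False
```

This keeps the guarantee the middleware relies on: a broken metrics backend degrades to a missed data point, never a failed request.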
8 changes: 8 additions & 0 deletions gen3workflow/auth.py
@@ -1,3 +1,4 @@
from typing import Union
from authutils.token.fastapi import access_token
from fastapi import HTTPException, Security
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
@@ -65,6 +66,13 @@ async def get_token_claims(self) -> dict:

return token_claims

async def get_user_id(self) -> Union[str, None]:
try:
token_claims = await self.get_token_claims()
except Exception:
return None
return token_claims.get("sub")

async def authorize(
self,
method: str,
9 changes: 9 additions & 0 deletions gen3workflow/config-default.yaml
@@ -53,3 +53,12 @@ TASK_IMAGE_WHITELIST: []
# - public.ecr.aws/random/approved/public:*
# - 9876543210.dkr.ecr.us-east-1.amazonaws.com/approved/{username}:abc
# - quay.io/nextflow/bash:*


#############
# METRICS #
#############

ENABLE_PROMETHEUS_METRICS: false
PROMETHEUS_MULTIPROC_DIR: /var/tmp/prometheus_metrics
ENDPOINTS_WITH_METRICS: [ /ga4gh/tes/v1/tasks ]
6 changes: 6 additions & 0 deletions gen3workflow/config.py
@@ -61,6 +61,12 @@ def validate_top_level_configs(self) -> None:
"DB_CONNECTION_STRING": {"type": "string"},
"TASK_IMAGE_WHITELIST": {"type": "array", "items": {"type": "string"}},
"TES_SERVER_URL": {"type": "string"},
"ENABLE_PROMETHEUS_METRICS": {"type": "boolean"},
"PROMETHEUS_MULTIPROC_DIR": {"type": "string"},
"ENDPOINTS_WITH_METRICS": {
"type": "array",
"items": {"type": "string"},
},
},
}
validate(instance=self, schema=schema)
21 changes: 21 additions & 0 deletions gen3workflow/metrics.py
@@ -0,0 +1,21 @@
from typing import Any, Dict

from cdispyutils.metrics import BaseMetrics

from gen3workflow.config import config


class Metrics(BaseMetrics):
def __init__(self, prometheus_dir: str, enabled: bool = True) -> None:
Collaborator:

i think prometheus_enabled instead of just enabled would be better in case we want to expand the "metrics" class later... but maybe not worth updating cdis-python-utils...

Contributor Author:

I can create a PR with this change, but it might not be worth releasing a new version just for this.
super().__init__(
prometheus_dir=prometheus_dir, enabled=enabled
)

def add_create_task_api_interaction(
self,
**kwargs: Dict[str, Any],
) -> None:
"""
Add a metric for create_task API interactions
"""
self.increment_counter(name="gen3_workflow_tasks_created", labels=kwargs)
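When `ENABLE_PROMETHEUS_METRICS` is off, `BaseMetrics` effectively no-ops; for tests it can be handy to have an in-process stand-in with the same `increment_counter(name, labels)` shape. A sketch (the class name and internals are ours, not part of cdis-python-utils):

```python
from collections import Counter
from typing import Any, Dict, Tuple


class LocalMetrics:
    """In-process stand-in for BaseMetrics.increment_counter: keeps
    labeled counts in a dict so call sites can be exercised without
    Prometheus or a multiproc directory."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self._counters: Counter = Counter()

    def increment_counter(self, name: str, labels: Dict[str, Any]) -> None:
        if not self.enabled:
            return
        # Sort labels so key order doesn't create distinct series.
        key: Tuple = (name, tuple(sorted(labels.items())))
        self._counters[key] += 1
```

Sorting the label items mirrors how Prometheus treats a label set as unordered, so `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` increment the same series.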