feat: instance monitors #42

Merged
jason-lynch merged 14 commits into main from feat/PLAT-33/instance-monitors-reorder
Jun 6, 2025

jason-lynch commented Jun 4, 2025

This PR introduces an extensible monitoring mechanism, uses it to monitor instance status, and returns that status in the API. Each instance's status is refreshed and stored every 15 seconds for the lifetime of the instance. I've implemented an instance monitor resource in our resource model so that monitors are created and torn down alongside their instances. The API returns the full instance status in the newly implemented get-database endpoint and in the update-database endpoint [1]. The list-databases endpoint returns an abbreviated view of the instance status.
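
To illustrate the refresh-and-store loop described above, here is a minimal sketch in Python. It is not the implementation in this PR: the `InstanceMonitor` class, the `check` callback, and the dict-backed `store` are hypothetical names chosen for the example. The 15-second interval comes from the description, and the fallback to an `unknown` state with an `error` message mirrors the downed-instance response shown later.

```python
import threading
from datetime import datetime, timezone
from typing import Callable, Dict


class InstanceMonitor:
    """Hypothetical monitor that polls one instance and stores its latest status."""

    def __init__(self, instance_id: str, check: Callable[[str], dict],
                 store: Dict[str, dict], interval: float = 15.0) -> None:
        self.instance_id = instance_id
        self.check = check          # assumed callback that queries the instance
        self.store = store          # assumed storage; a plain dict in this sketch
        self.interval = interval    # seconds between refreshes
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh(self) -> None:
        """Run one check and record the result; a failed check is stored as an error."""
        try:
            status = dict(self.check(self.instance_id))
        except Exception as exc:
            status = {"state": "unknown", "error": str(exc)}
        now = datetime.now(timezone.utc)
        status["status_updated_at"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        self.store[self.instance_id] = status

    def _run(self) -> None:
        # Refresh immediately, then once per interval until stop() is called.
        while True:
            self.refresh()
            if self._stop.wait(self.interval):
                return

    def start(self) -> None:
        """Create the monitor alongside the instance."""
        self._thread.start()

    def stop(self) -> None:
        """Tear the monitor down alongside the instance."""
        self._stop.set()
        self._thread.join()
```

Waiting on a `threading.Event` instead of sleeping lets `stop()` interrupt the wait, so teardown does not have to sit through a full interval.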

Example API responses

Abbreviated view in list-databases

[
  {
    "created_at": "2025-06-03T20:26:15Z",
    "id": "5acc3f42-e24d-41ff-b487-37720f456a2e",
    "instances": [
      {
        "host_id": "8f6e5455-e228-4e2e-9129-a86cba1437c8",
        "id": "2d5c4307-9e1c-558d-ab8a-6c1f326a5fad",
        "node_name": "n1",
        "state": "available"
      },
      {
        "host_id": "36dcd7ff-9f04-476e-ac6f-5495d075607d",
        "id": "cfa24ed8-eb03-582e-abd1-0b0f7c092132",
        "node_name": "n2",
        "state": "available"
      },
      {
        "host_id": "fa461a39-5867-4a72-9923-dd7ce91e1eab",
        "id": "21285a07-b2d0-5d24-9a2a-56c8d87e487f",
        "node_name": "n3",
        "state": "available"
      }
    ],
    "state": "available",
    "updated_at": "2025-06-03T20:26:15Z"
  }
]
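
As a rough illustration of how a client might consume the abbreviated view, the sketch below counts instances per state for each database. The helper name `instance_state_counts` is hypothetical, and the payload is a trimmed copy of the response above.

```python
from collections import Counter
from typing import Dict, List


def instance_state_counts(databases: List[dict]) -> Dict[str, Dict[str, int]]:
    """Map each database ID to a count of its instances by state."""
    return {
        db["id"]: dict(Counter(inst["state"] for inst in db.get("instances", [])))
        for db in databases
    }


# Trimmed copy of the list-databases response shown above.
response = [
    {
        "id": "5acc3f42-e24d-41ff-b487-37720f456a2e",
        "instances": [
            {"node_name": "n1", "state": "available"},
            {"node_name": "n2", "state": "available"},
            {"node_name": "n3", "state": "available"},
        ],
        "state": "available",
    }
]

print(instance_state_counts(response))
```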

Full view in get-database

{
  // ...
  "instances": [
    {
      "created_at": "2025-06-03T20:26:32Z",
      "host_id": "8f6e5455-e228-4e2e-9129-a86cba1437c8",
      "hostname": "orbstack",
      "id": "2d5c4307-9e1c-558d-ab8a-6c1f326a5fad",
      "ipv4_address": "198.19.249.2",
      "node_name": "n1",
      "patroni_state": "running",
      "port": 36411,
      "postgres_version": "17.5",
      "read_only": "off",
      "role": "primary",
      "spock_version": "4.0.10",
      "state": "available",
      "status_updated_at": "2025-06-04T14:11:35Z",
      "subscriptions": [
        {
          "name": "sub_n1n2",
          "provider_node": "n2",
          "status": "replicating"
        },
        {
          "name": "sub_n1n3",
          "provider_node": "n3",
          "status": "replicating"
        }
      ],
      "updated_at": "2025-06-03T20:26:42Z"
    },
    {
      "created_at": "2025-06-03T20:26:32Z",
      "host_id": "36dcd7ff-9f04-476e-ac6f-5495d075607d",
      "hostname": "orbstack",
      "id": "cfa24ed8-eb03-582e-abd1-0b0f7c092132",
      "ipv4_address": "198.19.249.2",
      "node_name": "n2",
      "patroni_state": "running",
      "port": 36408,
      "postgres_version": "17.5",
      "read_only": "off",
      "role": "primary",
      "spock_version": "4.0.10",
      "state": "available",
      "status_updated_at": "2025-06-04T14:11:35Z",
      "subscriptions": [
        {
          "name": "sub_n2n1",
          "provider_node": "n1",
          "status": "replicating"
        },
        {
          "name": "sub_n2n3",
          "provider_node": "n3",
          "status": "replicating"
        }
      ],
      "updated_at": "2025-06-03T20:26:42Z"
    },
    {
      "created_at": "2025-06-03T20:26:32Z",
      "host_id": "fa461a39-5867-4a72-9923-dd7ce91e1eab",
      "hostname": "orbstack",
      "id": "21285a07-b2d0-5d24-9a2a-56c8d87e487f",
      "ipv4_address": "198.19.249.2",
      "node_name": "n3",
      "patroni_state": "running",
      "port": 36407,
      "postgres_version": "17.5",
      "read_only": "off",
      "role": "primary",
      "spock_version": "4.0.10",
      "state": "available",
      "status_updated_at": "2025-06-04T14:11:35Z",
      "subscriptions": [
        {
          "name": "sub_n3n1",
          "provider_node": "n1",
          "status": "replicating"
        },
        {
          "name": "sub_n3n2",
          "provider_node": "n2",
          "status": "replicating"
        }
      ],
      "updated_at": "2025-06-03T20:26:42Z"
    }
  ],
  // ...
}
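
Because the status is refreshed every 15 seconds, a client could compare `status_updated_at` against the current time to notice a monitor that has stopped reporting. The sketch below assumes a tolerance of three missed refreshes, which is an arbitrary choice for illustration. `is_stale` and `parse_timestamp` are hypothetical helpers, not part of the API.

```python
from datetime import datetime, timedelta, timezone

REFRESH_INTERVAL = timedelta(seconds=15)  # interval stated in the PR description


def parse_timestamp(value: str) -> datetime:
    """Parse the UTC timestamp format used in the example responses."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_stale(instance: dict, now: datetime, missed_refreshes: int = 3) -> bool:
    """Return True if the status is missing or older than the allowed age."""
    updated = instance.get("status_updated_at")
    if updated is None:
        return True
    return now - parse_timestamp(updated) > REFRESH_INTERVAL * missed_refreshes


instance = {"node_name": "n1", "status_updated_at": "2025-06-04T14:11:35Z"}
print(is_stale(instance, parse_timestamp("2025-06-04T14:11:50Z")))
print(is_stale(instance, parse_timestamp("2025-06-04T14:13:00Z")))
```

The first call is 15 seconds after the last update and the second is 85 seconds after it, so only the second exceeds the assumed 45-second tolerance.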

Example showing a downed instance

{
  // ...
  "instances": [
    {
      "created_at": "2025-06-03T20:26:32Z",
      "error": "failed to get postgres container: no postgres container found for \"2d5c4307-9e1c-558d-ab8a-6c1f326a5fad\"",
      "host_id": "8f6e5455-e228-4e2e-9129-a86cba1437c8",
      "id": "2d5c4307-9e1c-558d-ab8a-6c1f326a5fad",
      "node_name": "n1",
      "state": "unknown",
      "status_updated_at": "2025-06-04T13:22:30Z",
      "updated_at": "2025-06-03T20:26:42Z"
    },
    {
      "created_at": "2025-06-03T20:26:32Z",
      "host_id": "36dcd7ff-9f04-476e-ac6f-5495d075607d",
      "hostname": "orbstack",
      "id": "cfa24ed8-eb03-582e-abd1-0b0f7c092132",
      "ipv4_address": "198.19.249.2",
      "node_name": "n2",
      "patroni_state": "running",
      "port": 36408,
      "postgres_version": "17.5",
      "read_only": "off",
      "role": "primary",
      "spock_version": "4.0.10",
      "state": "available",
      "status_updated_at": "2025-06-04T13:22:30Z",
      "subscriptions": [
        {
          "name": "sub_n2n1",
          "provider_node": "n1",
          "status": "down"
        },
        {
          "name": "sub_n2n3",
          "provider_node": "n3",
          "status": "replicating"
        }
      ],
      "updated_at": "2025-06-03T20:26:42Z"
    },
    {
      "created_at": "2025-06-03T20:26:32Z",
      "host_id": "fa461a39-5867-4a72-9923-dd7ce91e1eab",
      "hostname": "orbstack",
      "id": "21285a07-b2d0-5d24-9a2a-56c8d87e487f",
      "ipv4_address": "198.19.249.2",
      "node_name": "n3",
      "patroni_state": "running",
      "port": 36407,
      "postgres_version": "17.5",
      "read_only": "off",
      "role": "primary",
      "spock_version": "4.0.10",
      "state": "available",
      "status_updated_at": "2025-06-04T13:22:30Z",
      "subscriptions": [
        {
          "name": "sub_n3n1",
          "provider_node": "n1",
          "status": "down"
        },
        {
          "name": "sub_n3n2",
          "provider_node": "n2",
          "status": "replicating"
        }
      ],
      "updated_at": "2025-06-03T20:26:42Z"
    }
  ],
  // ...
}
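
The downed-instance response above could be scanned for problems along these lines. This is a sketch under two assumptions drawn only from the examples in this PR: `available` is the healthy instance state and `replicating` is the healthy subscription status. The function name `find_problems` is hypothetical, and the data is a trimmed copy of the response.

```python
from typing import List


def find_problems(instances: List[dict]) -> List[str]:
    """Describe every instance or subscription that does not look healthy."""
    problems = []
    for inst in instances:
        node = inst["node_name"]
        if inst.get("state") != "available":
            detail = inst.get("error", "no error reported")
            problems.append(f"{node}: state is {inst.get('state')!r}: {detail}")
        for sub in inst.get("subscriptions", []):
            if sub["status"] != "replicating":
                problems.append(
                    f"{node}: subscription {sub['name']} from "
                    f"{sub['provider_node']} is {sub['status']!r}"
                )
    return problems


# Trimmed copy of the downed-instance response shown above.
example = [
    {"node_name": "n1", "state": "unknown",
     "error": "failed to get postgres container"},
    {"node_name": "n2", "state": "available", "subscriptions": [
        {"name": "sub_n2n1", "provider_node": "n1", "status": "down"},
        {"name": "sub_n2n3", "provider_node": "n3", "status": "replicating"},
    ]},
    {"node_name": "n3", "state": "available", "subscriptions": [
        {"name": "sub_n3n1", "provider_node": "n1", "status": "down"},
        {"name": "sub_n3n2", "provider_node": "n2", "status": "replicating"},
    ]},
]

for line in find_problems(example):
    print(line)
```

With this data the scan reports three problems: the `n1` instance itself, plus the `sub_n2n1` and `sub_n3n1` subscriptions that depend on it.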

PLAT-33

Footnotes

  1. The create-database response has the instances field, but the instances are created after this call has returned.

This was using the caller's host ID, which might not be the same as the
instance's host ID.

Adds new 'instance status monitoring' functionality to periodically
query and store instance status. Currently, this covers only the
status information that will be returned by the API, but it is
extensible to other information, such as metrics, in future PRs.

PLAT-33
jason-lynch requested review from mmols and tsivaprasad June 4, 2025 12:35
jason-lynch force-pushed the feat/PLAT-33/instance-monitors-reorder branch 2 times, most recently from 5fbf489 to 2f5af04 June 4, 2025 16:07
Adds instance data to the update and get database endpoints. Also
renames `inspect-database` to `get-database`.

PLAT-33
jason-lynch force-pushed the feat/PLAT-33/instance-monitors-reorder branch from 2f5af04 to 8aecc3d June 4, 2025 16:10
The stored entries are returned in reverse order, so the last entry is
at the beginning of the slice and not the end.

PLAT-33
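
The ordering issue in the commit message above can be shown with a small sketch. It assumes only that the stored entries arrive newest-first, as the message states. The `latest_entry` helper and the sample data are hypothetical.

```python
from typing import List, Optional


def latest_entry(entries: List[dict]) -> Optional[dict]:
    """Return the most recent entry from a newest-first list, or None if empty."""
    # Entries are in reverse order, so the latest one is at index 0, not -1.
    return entries[0] if entries else None


# Hypothetical stored entries, newest first.
entries = [
    {"status_updated_at": "2025-06-04T14:11:35Z", "state": "available"},
    {"status_updated_at": "2025-06-04T14:11:20Z", "state": "unknown"},
]

print(latest_entry(entries)["state"])
```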
jason-lynch force-pushed the feat/PLAT-33/instance-monitors-reorder branch from 77f9877 to dee4288 June 5, 2025 11:53
This field was just missing from the `taskToAPI` conversion function.

PLAT-33
jason-lynch requested a review from tsivaprasad June 5, 2025 20:51
jason-lynch merged commit 055b0ec into main Jun 6, 2025
2 checks passed
jason-lynch deleted the feat/PLAT-33/instance-monitors-reorder branch June 6, 2025 12:09