|
1 | 1 | # SRE Bot |
2 | 2 |
|
3 | | - |
| 3 | + |
4 | 4 |
|
5 | | -A slack bot for site reliability engineering at CDS. |
| 5 | +**SRE Bot** is a Slack bot for site reliability engineering at CDS. It automates incident management and integrates with cloud and collaboration platforms such as AWS and Google Workspace.
6 | 6 |
|
7 | | -This bot is using the Bolt framework in python (https://slack.dev/bolt-python/) and uses a web socket connection to Slack. |
| 7 | +--- |
8 | 8 |
|
9 | | -## Local Development with Containers |
| 9 | +## Features |
10 | 10 |
|
11 | | -This project uses [Visual Studio Code Remote - Containers](https://code.visualstudio.com/docs/remote/containers). |
| 11 | +- **Incident Management** |
| 12 | + - Create, update, and manage incidents (status, roles, conversations, documents, folders, notifications) |
| 13 | + - Display and update incident information |
| 14 | + - Notify about stale incident channels |
| 15 | + - Schedule incident retrospectives |
| 16 | + - On-call management for incidents |
| 17 | + |
| 18 | +- **AWS Integration & SCIM-like User/Group Management** |
| 19 | + - Manage AWS access requests and approvals |
| 20 | + - Monitor AWS account health |
| 21 | + - Assign and manage AWS user and group memberships (SCIM bridge functionality) |
| 22 | + - Manage AWS Identity Center and SSO access |
| 23 | + - Track AWS spending and cost reports |
| 24 | + |
| 25 | +- **Slack Integration & Webhook Management** |
| 26 | + - Create, list, and manage Slack webhooks |
| 27 | + - Send notifications to Slack channels |
| 28 | + - Integrate with Slack for incident and alert workflows |
| 29 | + |
| 30 | +- **Google Workspace Integration** |
| 31 | + - Manage Google Workspace users and groups (provisioning, reporting) |
| 32 | + - Integrate with Google for incident and workflow automation |
| 33 | + |
| 34 | +- **Role & Talent Management** |
| 35 | + - Manage organizational roles, including special workflows for "Talent" roles |
| 36 | + - Assign and update user roles within the organization |
| 37 | + |
| 38 | +- **Secret Management** |
| 39 | + - Store, retrieve, and manage secrets securely |
| 40 | + |
| 41 | +- **SRE & Geolocation Features** |
| 42 | + - Geolocate users or incidents (using MaxMind or similar) |
| 43 | + - SRE-specific workflows and reporting |
| 44 | + |
| 45 | +- **Notification & Alerting Integrations** |
| 46 | + - Integrate with external notification systems (OpsGenie, Sentinel, Trello, etc.) |
| 47 | + - Send and manage alerts from various sources |
| 48 | + |
| 49 | +- **Reporting & Analytics** |
| 50 | + - Generate and display reports (including Google Groups, AWS spending, etc.) |
| 51 | + |
| 52 | +- **Webhook Integrations (General)** |
| 53 | + - Manage and process incoming webhooks from various sources (AWS SNS, custom, etc.) |
| 54 | + - Route and handle webhook-based notifications |
| 55 | + |
| 56 | +--- |
12 | 57 |
|
13 | | -Here are the instructions to get started with developing locally. |
| 58 | +## Who is this for? |
14 | 59 |
|
15 | | -Requirements: |
| 60 | +- SRE teams, DevOps engineers, and incident responders at CDS or similar organizations. |
| 61 | +- Teams looking to automate cloud, incident, and collaboration workflows in Slack. |
| 62 | + |
| 63 | +--- |
| 64 | + |
| 65 | +## Example Workflows |
| 66 | + |
| 67 | +- `/incident create` — Start a new incident and assign roles |
| 68 | +- `/aws access-request` — Request temporary AWS access |
| 69 | +- `/webhook add slack` — Register a new Slack webhook for notifications |
| 70 | +- `/role assign talent` — Assign a "Talent" role to a user |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## Getting Started (Local Development) |
| 75 | + |
| 76 | +This project uses [Visual Studio Code Remote - Containers](https://code.visualstudio.com/docs/remote/containers). |
| 77 | + |
| 78 | +### Requirements |
16 | 79 |
|
17 | 80 | - Docker installed and running |
18 | 81 | - VS Code |
19 | 82 |
|
20 | | -Steps: |
| 83 | +### Steps |
21 | 84 |
|
22 | 85 | 1. Clone the repo |
23 | | -2. Open VS Code with Dev Container (see [Quick start: Open an existing folder in a container](https://code.visualstudio.com/docs/remote/containers#_quick-start-open-an-existing-folder-in-a-container)) |
24 | | -3. Install Python dependencies |
| 86 | +2. Open the cloned folder in a VS Code Dev Container ([Quick start guide](https://code.visualstudio.com/docs/remote/containers#_quick-start-open-an-existing-folder-in-a-container))
| 87 | +3. Install Python dependencies: |
| 88 | + |
| 89 | + ```sh |
| 90 | + cd app && pip install --no-cache-dir -r requirements.txt |
| 91 | + ``` |
| 92 | + |
| 93 | +4. Add a `.env` file to the `/workspace/app` folder (contact the SRE team for the project-specific `.env` setup)
| 94 | +5. Launch the dev bot: |
| 95 | + |
| 96 | + ```sh |
| 97 | + make dev |
| 98 | + ``` |
25 | 99 |
|
26 | | -``` |
27 | | -cd app && pip install --no-cache-dir -r requirements.txt |
28 | | -``` |
| 100 | +6. Test your changes in the dedicated Slack channel (the SRE team will confirm which channel to use)
29 | 101 |
|
30 | | -4. Add a ``.env`` file to the ``/workspace/app`` folder (Contact SRE team for the project specific .env setup) |
31 | | -5. Launch the dev bot with ```make dev``` |
32 | | -6. Test your development in the dedicated channel (SRE team will confirm which channel to point to) |
| 102 | +--- |
33 | 103 |
|
34 | | -## Refactoring |
| 104 | +## Project Structure |
35 | 105 |
|
36 | | -The bot is currently being refactored to separate the integration concerns from the bot's features and commands. The goal is to make the bot more modular and easier to maintain. |
| 106 | +- `app/integrations/` — Integrations with external services (Google Workspace, Slack, AWS, etc.) |
| 107 | +- `app/modules/` — Bot features and user-facing commands |
| 108 | +- `app/jobs/` — Scheduled jobs (e.g., reminders, status checks) |
37 | 109 |
|
38 | | -### Integrations |
| 110 | +--- |
39 | 111 |
|
40 | | -The `app/integrations` will contain the bot's interactions with external services (e.g. Google Workspace, Slack, etc.) |
| 112 | +## Security & Privacy |
41 | 113 |
|
42 | | -The integrations will be responsible for handling the bot's interactions with the external services. They will be responsible for sending and receiving messages, and handling any other interactions with the external services. |
| 114 | +SRE Bot handles sensitive data such as secrets and user/group assignments. Please review our [security guidelines](./SECURITY.md) and ensure you follow best practices for environment configuration and access control. |
43 | 115 |
|
44 | | -From a design perspective, they should be as simple as possible, and should not contain any business logic. They should be responsible for handling the interactions with the external services, and should delegate any business logic to the features. |
| 116 | +--- |
45 | 117 |
|
46 | | -### Features (aka modules) |
| 118 | +## Getting Help |
47 | 119 |
|
48 | | -The `app/modules` will contain the bot's features and commands. Each feature will have its own directory and will be responsible for handling the bot's interactions with the user. |
| 120 | +- For questions or support, contact the SRE team. |
| 121 | +- For feature requests or bug reports, open an issue in this repository. |
49 | 122 |
|
50 | | -The features may end up needing to interact with the integrations to perform their tasks. |
| 123 | +--- |
51 | 124 |
|
52 | | -### Jobs |
| 125 | +## License |
53 | 126 |
|
54 | | -The `app/jobs` will contain the bot's scheduled jobs. Each job will have its own directory and will be responsible for handling the bot's scheduled tasks. |
| 127 | +[MIT License](./LICENSE) |
55 | 128 |
|
56 | | -Examples may include sending daily reminders, checking the status of a service, etc. |
| 129 | +--- |
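
The new "Project Structure" section condenses the design note that this change removes: integrations stay thin and hold no business logic, modules own the features, and jobs run scheduled tasks. A minimal sketch of that layering might look like the following. Only the directory names `app/integrations/`, `app/modules/`, and `app/jobs/` come from the README. Every class and function name here is hypothetical and not taken from the repository.

```python
"""Hypothetical sketch of the integrations / modules / jobs split."""
from dataclasses import dataclass, field
from typing import Protocol


class ChatIntegration(Protocol):
    """What a feature needs from a chat integration (hypothetical interface)."""

    def post_message(self, channel: str, text: str) -> None: ...


@dataclass
class FakeSlackIntegration:
    """Stand-in for code under app/integrations/: I/O only, no business rules."""

    sent: list[tuple[str, str]] = field(default_factory=list)

    def post_message(self, channel: str, text: str) -> None:
        # A real integration would call the Slack API here; this one records calls.
        self.sent.append((channel, text))


def create_incident(chat: ChatIntegration, name: str) -> str:
    """Stand-in for a feature under app/modules/: owns the business logic."""
    channel = "incident-" + name.lower().replace(" ", "-")
    chat.post_message(channel, f"Incident '{name}' opened")
    return channel


def remind_stale_channels(chat: ChatIntegration, stale: list[str]) -> int:
    """Stand-in for a scheduled task under app/jobs/."""
    for channel in stale:
        chat.post_message(channel, "This incident channel looks stale.")
    return len(stale)


slack = FakeSlackIntegration()
channel = create_incident(slack, "API outage")
reminded = remind_stale_channels(slack, [channel])
print(channel, reminded, len(slack.sent))
```

Because the feature and the job depend only on the small `ChatIntegration` interface, they can be exercised with a fake, as above, without a Slack connection.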