6 changes: 6 additions & 0 deletions .checkov.yaml
@@ -0,0 +1,6 @@
#
#
#
skip-check:
  - CKV_OPENAPI_3  # Schemas don't need cleartext checks.
  - CKV_OPENAPI_4  # Schemas don't need security definitions.
50 changes: 50 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
name: Bug Report
description: File a bug report
title: "[Bug]: "
labels: ["bug", "triage"]
body:
  - type: textarea
    id: what-happened
    attributes:
      label: I Expect
      placeholder: Tell us what you see!
      value: |-

        ## When (optional)

        -
        -

        ## I Expect

        -
        -

        ## Instead

        -

        ## Notes

        -
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: >-
        Please copy and paste any relevant log output.
        This will be automatically formatted into code,
        so no need for backticks.
      render: shell
  - type: checkboxes
    id: terms
    attributes:
      label: Code of Conduct
      description: >
        By submitting this issue,
        you agree to follow our [Code of Conduct](https://example.com)
      options:
        - label: I agree to follow this project's Code of Conduct
          required: true
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/issue-simple.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---
name: Minimal issue template
about: Minimal Issue Template
title: '[bug]: '
labels: ''
assignees: ''
---

## When I (optional)

1. xx
1. xx
1. xx

## I expect

- [ ] yy
- [ ] zz

## Instead

-
-
-

## Notes

Attach sanitized logs, screenshots, outputs.
CC folks
142 changes: 142 additions & 0 deletions .github/POSTMORTEM.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
---
# This is a template for postmortem reports, inspired by
# the one teamdigitale published on medium.com.
# For the original version, see the references section.
title: Fake Postmortem - Cloud connectivity incident
date: 2018-05-23
summary: >-
  Fake Postmortem inspired by the following: The Digital Team's websites were unreachable for 28 hours due to a cloud provider
  outage.
authors:
  - name: Mario Rossi
  - name: Franco Bianchi
references:
  - https://medium.com/team-per-la-trasformazione-digitale/document-postmortem-technology-italian-government-public-administration-99639a0a7877
  - https://abseil.io/resources/swe-book/html/ch02.html#blameless_postmortem_culture
glossary: {}
keywords: []
...
---
# Postmortem - Template for a postmortem report

## Summary

**Impact**:

The following services cannot be reached:

- Dashboard Team
- Three-Year ICT Plan
- Designers Italia
- Developers Italia
- Docs Italia
- Forum Italia

**Duration**:
28 hours

**Cause**:
OpenStack network outage - cloud provider _Cloud SPC Lotto 1_

## Context

The Digital Team's websites are mainly static HTML generated from the source content of repositories on GitHub. The HTML is published by a web server (nginx) and served over HTTPS. Forum Italia (http://forum.italia.it) is the only exception to this deployment model and is managed separately via Docker containers. At any given time, one or more web servers can be deployed on OpenStack virtual machines of the cloud provider (Cloud SPC Lotto 1), using the API provided by the platform.

Cloud resources (virtual machines and data volumes) are allocated to services according to the Agency for Digital Italy's Cloud SPC contract.

## Impact and damage assessment

On 2018-05-19, the following services became unreachable due to an internal connectivity issue of the Cloud Service Provider "Cloud SPC":

- Dashboard Team
- Three-Year ICT Plan
- Designers Italia
- Developers Italia
- Docs Italia
- Forum Italia

## Causes and Contributing Factors

According to a postmortem document released by the supplier on 2018-06-07, the interruption of connectivity experienced by the 31 users (tenants) of the SPC Cloud service was triggered by a planned update of the OpenStack platform carried out on the night of Thursday 2018-05-17.

### Detection

The problem was detected the following morning (2018-05-18), thanks to reports from users who were no longer able to access the services provided on the Cloud SPC platform.

### Causes

The document states that a restart of the control nodes of the OpenStack platform (nodes that handle OpenStack's management services: neutron, glance, cinder, etc.) caused “an anomaly” in the network infrastructure, blocking the traffic on several computing nodes (nodes where virtual instances are executed), and causing virtual machines belonging to 31 users to become unreachable.
The postmortem document also explains that a bug in the playbook (update script) reportedly blocked network activity by modifying the permissions of the file `/var/run/neutron/lock/neutron-iptables`, as indicated in the platform's official documentation.
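
A post-update sanity check for this kind of permission regression could look like the sketch below. It is only an illustration: the helper name `lock_file_mode_ok` is hypothetical, and the required permission bits (owner read and write) are an assumption, since the supplier's document does not state which permissions the lock file needs.

```python
import os
import stat
import tempfile


def lock_file_mode_ok(path: str, required: int = stat.S_IRUSR | stat.S_IWUSR) -> bool:
    """Return True if `path` exists and its mode has all `required` bits set.

    `required` defaults to owner read+write, which is an assumption made
    for this sketch and not a documented requirement of the platform.
    """
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        return False
    return mode & required == required


if __name__ == "__main__":
    # Demonstrate on a temporary file rather than on the real lock file.
    with tempfile.NamedTemporaryFile() as tmp:
        os.chmod(tmp.name, 0o600)
        print(lock_file_mode_ok(tmp.name))
        os.chmod(tmp.name, 0o400)  # owner write bit removed
        print(lock_file_mode_ok(tmp.name))
```

Running a check like this right after a playbook finishes would surface a changed mode before traffic is affected, provided the expected mode is known in advance.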

Again according to the supplier, restarting the nodes was necessary to apply the security updates for Meltdown and Spectre (CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754).

The unavailability of the Cloud SPC infrastructure was undoubtedly the root cause of the problem, but the lack of an application-level protection mechanism for the Digital Team's services prolonged their unavailability.
Because the design phase of the services had not accounted for the entire cloud provider becoming unreachable, it was not possible to respond adequately to this event.
Despite the SPC Cloud provider's failover mechanisms, the web services were not protected from generalized outages capable of undermining the entire infrastructure of the only Cloud provider at our disposal.
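
One possible shape for the missing application-level protection is a health check that picks the first reachable origin among several providers. The sketch below is an assumption-laden illustration, not what the Digital Team deployed: the function names `is_reachable` and `pick_origin`, and any origin URLs passed to them, are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` completes with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, connection failures, timeouts and malformed URLs.
        return False


def pick_origin(
    origins: Iterable[str],
    probe: Callable[[str], bool] = is_reachable,
) -> Optional[str]:
    """Return the first origin for which `probe` succeeds, or None if all fail."""
    for origin in origins:
        if probe(origin):
            return origin
    return None
```

The `probe` parameter is injectable so the selection logic can be exercised without network access. A check like this only helps if the origins live on independent infrastructures; two origins on the same provider would have failed together in this incident.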

## Actions taken

WRITEME: A list of action items taken to mitigate/fix the problem

- * Action 1
* Owner
- * Action 2
* Owner
...

## Preventive actions

WRITEME: A list of action items to prevent this from happening again



## Lessons learned

### What went wrong

The Cloud SPC platform cannot currently distribute virtual machines across different data centers or regions (OpenStack regions).
It would have been useful to distribute virtual resources across independent infrastructures, even ones provided by the same supplier.

### What should have been done

In hindsight, the Public Administration should have had access to multiple cloud providers, to ensure the resilience of its services even when the main cloud provider suffers an outage.

### Where we got lucky

WRITEME: What things went right that could have gone wrong

### What should we do differently next time

The most important lesson we learned from this experience is the need to continue investing in the development of a cross-platform, multi-supplier Cloud model.
This model would guarantee the reliability of Public Administration services even when problems make the main cloud provider unreachable for a long period.

## Timeline

A timeline of the event, from discovery through investigation to resolution.
All times are in CEST.

### 2018-05-17

22:30 CEST: The SPC MaaS alert service sends email alerts indicating that several nodes can no longer be reached. <START of scheduled activities>

### 2018-05-19

06:50 CEST: The aforementioned services, available at the IP address 91.206.129.249, can no longer be reached <START OF INTERRUPTION>

08:00 CEST: The problem is detected and reported to the supplier

09:30 CEST: The machines are determined to be accessible through OpenStack's administration interface (API and GUI) and internal connectivity reveals no issue. Virtual machines can communicate through the tenant's private network, but do not connect to the Internet.

15:56 CEST: The Digital Team sends the supplier and CONSIP a help request via email

18:00 CEST: The supplier reports that it has identified the problem, which turns out to be the same one experienced by the DAF project, and starts work on a manual workaround

19:00 CEST: The supplier informs us that a fix has been produced and that it will be applied to the virtual machines belonging to the 31 public administrations (tenants) involved.

### 2018-05-20

11:10 CEST: The supplier restores connectivity to the VMs of the AgID tenant

11:30 CEST: The Digital Team reboots the web services and the sites are again reachable <END OF INTERRUPTION>
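
The 28-hour duration given in the summary can be cross-checked against the start and end markers of the interruption in this timeline. The snippet below is a small sketch of that arithmetic; the helper name `outage_duration` is hypothetical, and both timestamps are treated as naive local times because the timeline states that all times are in CEST.

```python
from datetime import datetime, timedelta


def outage_duration(start: datetime, end: datetime) -> timedelta:
    """Return the elapsed time between two timestamps in the same time zone."""
    return end - start


# Start and end of the interruption as marked in the timeline (both CEST).
start = datetime(2018, 5, 19, 6, 50)
end = datetime(2018, 5, 20, 11, 30)

outage = outage_duration(start, end)
hours, remainder = divmod(int(outage.total_seconds()), 3600)
print(f"{hours} h {remainder // 60} min")  # 28 h 40 min
```

The result, 28 hours and 40 minutes, is consistent with the rounded figure of 28 hours reported in the summary.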
13 changes: 13 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/pr-simple.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
## This PR

- [ ]
- [ ]
- [ ]

## It's done

- Rationale of the implementation

## Checks

- [ ] This PR conforms to the [CONTRIBUTING.md](CONTRIBUTING.md) guidelines
90 changes: 90 additions & 0 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Run linting action with some custom setup.

name: Lint

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

permissions: read-all
jobs:
  lint:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Super-Linter
        uses: super-linter/super-linter/[email protected]
        env:
          # Either disable MULTI_STATUS or pass the GITHUB_TOKEN.
          MULTI_STATUS: false
          # GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          VALIDATE_MARKDOWN: false
          VALIDATE_MARKDOWN_PRETTIER: false
          # Disabled for conflicts with the isort version used in pre-commit
          # you can re-enable it if you align your local isort with
          # the one in the super-linter image.
          VALIDATE_PYTHON_ISORT: false
          VALIDATE_XML: false
          #
          # YAML validation is delegated to pre-commit.
          #
          VALIDATE_YAML: false
          VALIDATE_YAML_PRETTIER: false
          VALIDATE_OPENAPI: false
          VALIDATE_NATURAL_LANGUAGE: false
          #
          # JS/CSS delegated to eslint.
          #
          VALIDATE_CSS: false
          VALIDATE_JAVASCRIPT_ES: false
          VALIDATE_JAVASCRIPT_STANDARD: false
          VALIDATE_JSON_PRETTIER: false
          VALIDATE_JSON: false
          #
          # TS/TSX delegated to eslint.
          #
          VALIDATE_TSX: false
          VALIDATE_TYPESCRIPT_ES: false
          # JAVASCRIPT_ES_CONFIG_FILE: eslint.config.mjs
          VALIDATE_TYPESCRIPT_STANDARD: false
          # TYPESCRIPT_ES_CONFIG_FILE: eslint.config.mjs
          VALIDATE_JSCPD: false  # Disable copy-paste detection.

  pre-commit:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    container: python:3.9
    steps:
      - uses: actions/checkout@v4

      - name: Run commit hooks.
        run: |
          pip3 --no-cache-dir install pre-commit
          git --version
          pwd
          ls -la
          id
          git config --global --add safe.directory "$PWD"
          pre-commit install
          pre-commit run -a

      # Store (expiring) logs on failure.
      # Retrieve artifacts via `gh run download`.
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: pre-commit.log
          path: /github/home/.cache/pre-commit/pre-commit.log
          retention-days: 5
28 changes: 28 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
#
# Run pre-commit hooks. You can run them without installing
# the hook with
#
# $ pre-commit run --all-files
#
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      # Manage spaces.
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-added-large-files
      - id: check-symlinks
      # Check file syntax/format
      - id: check-xml
      - id: check-json
      - id: check-yaml
        args: [--allow-multiple-documents]
      # Security checks.
      - id: detect-private-key
      - id: detect-aws-credentials
        args:
          # See https://github.com/pre-commit/pre-commit-hooks/issues/174
          - --allow-missing-credentials
26 changes: 26 additions & 0 deletions .yamllint
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#
# yamllint configuration file. It disables some checks to ease integration
# with other YAML tools (e.g. pre-commit autoformatter, ...)
#
extends: default

rules:
  document-end: disable
  document-start: disable
  truthy: disable
  brackets: disable
  line-length:
    max: 90
  indentation:
    indent-sequences: consistent

# Specify the paths to be processed
ignore: |
  pnpm-lock.yaml
  node_modules

# Override rules for specific paths
overrides:
  - path: "apps/example/public/schemas/*.yaml"
    rules:
      line-length: disable