Commit 4da87c2

Merge pull request #15 from LBHackney-IT/content-migration
Migrate content from sites that form playbook
2 parents 72bcdc4 + aa32c42

324 files changed: +12198 −84 lines changed

docs/api-playbook/11-faqs.md

Lines changed: 55 additions & 0 deletions
# Frequently Asked Questions

## How can I find Base URLs for APIs?

You can do this by looking up the API in AWS API Gateway.

1. Go to the relevant [AWS Account](https://d-936715b9ec.awsapps.com/start#/) that the API is deployed to;
2. Choose "API Gateway" under "Services" in AWS;
3. Click on your API;
4. Click on "Stages";
5. Choose the stage you want to use;
6. The full URL is shown in the blue box at the top, under the heading "Invoke URL".
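If you have programmatic access to the account, the same information can be retrieved with the AWS SDK. The snippet below is a minimal sketch using boto3 (the API name and region are placeholders, not values from this playbook); it reconstructs the invoke URL from the API id, region and stage name in the same way the console displays it.

```python
# Sketch: list the Invoke URLs for a REST API via API Gateway (assumes AWS
# credentials for the relevant account are already configured locally).
import boto3

REGION = "eu-west-2"          # assumed region
API_NAME = "my-service-api"   # placeholder: the name shown in API Gateway

client = boto3.client("apigateway", region_name=REGION)

for api in client.get_rest_apis(limit=500)["items"]:
    if api["name"] != API_NAME:
        continue
    for stage in client.get_stages(restApiId=api["id"])["item"]:
        # This is the same value the console shows as "Invoke URL".
        print(f"https://{api['id']}.execute-api.{REGION}.amazonaws.com/{stage['stageName']}")
```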
## How do I find out whether an API is healthy?

You can utilise AWS Canaries, which we use for uptime monitoring.

1. Go to the relevant [AWS Account](https://d-936715b9ec.awsapps.com/start#/) that the API is deployed to;
2. Choose "CloudWatch" under "Services" in AWS;
3. Click on "Synthetics Canaries" in the sidebar (under "Application Monitoring");
4. This will list all of the canaries and their statuses.

To see more information about canaries and how to set them up, visit the relevant page [here](../DevOps%20practices/Monitoring/uptime_monitoring).
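The same list can also be pulled with the AWS SDK. This is a minimal sketch using boto3 (it assumes credentials for the relevant account are configured and that the region below is correct).

```python
# Sketch: list all canaries in the account and their current state.
import boto3

client = boto3.client("synthetics", region_name="eu-west-2")  # assumed region

for canary in client.describe_canaries()["Canaries"]:
    # State is e.g. RUNNING, STOPPED or ERROR; a healthy canary normally
    # sits in RUNNING.
    print(f"{canary['Name']}: {canary['Status']['State']}")
```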
## Where do I find my token to use to authenticate access to APIs?

1. Visit the [Hackney Authentication Service](https://auth.hackney.gov.uk/auth?redirect_uri=https://auth.hackney.gov.uk/auth/check_token) website to check your Hackney JWT token. You may need to log in during this step;
2. Inspect your cookies to find the `hackneyToken` cookie. How to do this will depend on your browser - see [here](https://cookie-script.com/documentation/how-to-check-cookies-on-chrome-and-firefox) for instructions on how to find cookies in Google Chrome or Firefox;
3. Add this token value to the `Authorization` header in all API requests. This will allow you to authenticate access to Hackney APIs.
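For example, calling an API from a script might look like the sketch below (the base URL and endpoint are placeholders; the token value is the `hackneyToken` cookie found in step 2).

```python
# Sketch: call a Hackney API with the JWT from the hackneyToken cookie.
import requests

token = "<value of your hackneyToken cookie>"  # placeholder
base_url = "https://example.execute-api.eu-west-2.amazonaws.com/development"  # placeholder Invoke URL

response = requests.get(
    f"{base_url}/api/v1/examples",       # placeholder endpoint
    headers={"Authorization": token},    # the token goes in the Authorization header
)
response.raise_for_status()
print(response.json())
```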
## How do I find the authentication process used for an API?

You can do this by looking up the API in AWS API Gateway.

1. Go to the relevant [AWS Account](https://d-936715b9ec.awsapps.com/start#/) that the API is deployed to;
2. Choose "API Gateway" under "Services" in AWS;
3. Click on your API;
4. Click on "Resources" and select **`/{proxy+}`**. This is the API endpoint;
5. This will bring up a box that looks like this:

![API Authorization](./doc-images/api_authorisation.png)

- If it says `Authorization: Custom`, this API uses a lambda authoriser;
- If it says `API Key: Required`, this API uses an API key.
## Where do I find the API Key value?

This can be found through API Gateway.

1. Go to the relevant [AWS Account](https://d-936715b9ec.awsapps.com/start#/) that the API is deployed to;
2. Choose "API Gateway" under "Services" in AWS;
3. Choose "API Keys" from the sidebar;
4. This will bring up a list of API Keys. Choose the relevant API Key, select the blue "Show" link, and you will see the API Key value.
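The key value can also be read with the AWS SDK if you have access to the account. A minimal boto3 sketch (the key name and region are placeholders):

```python
# Sketch: look up an API key value by name via API Gateway.
import boto3

client = boto3.client("apigateway", region_name="eu-west-2")  # assumed region

keys = client.get_api_keys(nameQuery="my-service-api-key", includeValues=True)["items"]
for key in keys:
    # 'value' is only returned because includeValues=True.
    print(key["name"], key["value"])
```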
## How can I access our CircleCI Pipelines?

Visit [https://app.circleci.com/pipelines/github/LBHackney-IT](https://app.circleci.com/pipelines/github/LBHackney-IT) to view workflows from projects you follow at Hackney.

If you aren't subscribed to any projects, simply navigate to the 'Projects' tab on the sidebar to view and search for all Hackney projects.

![Circle CI Sidebar - Click on the second icon (projects) to view all projects in a workspace](./doc-images/CircleCI_sidebar.png)
Lines changed: 1 addition & 0 deletions
position: 8
Lines changed: 123 additions & 0 deletions
# Setting Up AWS DMS

# What is AWS DMS?

AWS Data Migration Service (DMS) is a service that allows us to migrate data between a source (in our case, an on-premises database) and a target (in our case, a Postgres database hosted in AWS).

## DMS supported replication types:

- **Continuous replication (CDC)**:
  * When we want to do a one-off migration of all data and then continuously capture new inserts, updates and deletes and reflect them in our target database.

- **One-off data migration**:
  * When the goal is to migrate all data from a source, and it is expected that subsequent changes will not be captured and reflected.
# Which AWS DMS setup to use?

## For continuous migration:

### CDC:

CDC is a SQL Server feature, available only on Enterprise and Developer editions.

It allows changes (inserts/updates/deletes) to be captured.

#### Use case:

When the source database does not have primary keys and you want to migrate data continuously.

### MS Replication:

MS Replication is a SQL Server feature available on all editions.

It creates a "distribution" database; every time there is a change, it is captured and stored in the "distribution" database. DMS will then read from that database to reflect the changes in the target database.

**Note:** The SQL user created must have **sysadmin** permissions to set up replication.

**Additional notes:** Configuration on the source database is required (please see the section below). Additionally, SQL Server does not come with the MS Replication feature pre-installed, so the server might require setup.

#### Use cases:

- When you want to migrate data continuously;

- When the SQL Server edition is not Enterprise/Developer;

- When the source database has tables which make use of primary keys;
## For one-off setup:

- No database configuration is required;

- The SQL user must have at least db_owner permissions;

- The replication runs once and migrates the data specified;

- There are no subsequent runs of the migration task, unless triggered by other means;

### Use cases:

1. When only a one-off migration is required;

2. When the underlying source database is a reporting server and there is no way to capture updates. In this scenario, we need to run a one-off migration daily, after the reporting server has been updated with the latest data;
# How to set up DMS:

## Database setup:

- [DMS with SQL CDC](https://docs.google.com/document/d/1EaZ-a8ejQwWQ40OGDGobxhTqtxXvtX9Ydk5mTFASUMo/edit).

- [DMS with MS Replication](https://docs.google.com/document/d/14kNirloRWXCnla08brXiTihCMIm24chygc1lGUjNVbE/edit?usp=sharing).

## AWS DMS setup via Terraform:

Both DMS and Postgres can be created via Terraform.

### DMS:

[Template repository and example usage](https://github.com/LBHackney-IT/aws-dms-terraform).

![DMS example usage](../doc-images/data_migration.png)

**Notes:**

- Follow the example usage, which also demonstrates how to add table mappings (specifying which tables are to be replicated - see the sketch after these notes);

- The source DB server should be specified by IP and not by server name;

- The DMS instance should be in the VPC where the VPN is set up, to ensure communication with on-prem is possible;

- <u>Make sure your DMS instance's subnet group has only private subnets in it!</u>
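Provisioning is done through the Terraform module above, but purely to illustrate how the replication type and table mappings fit together, here is a hedged boto3 sketch of an equivalent replication task (all ARNs and schema/table names are placeholders; the real attributes should come from the Terraform example usage).

```python
# Sketch: create a DMS replication task, showing where the migration type and
# table mappings (which tables to replicate) are supplied.
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-2")  # assumed region

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-dbo-tables",
            "object-locator": {"schema-name": "dbo", "table-name": "%"},  # placeholder schema
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="example-task",               # placeholder
    SourceEndpointArn="arn:aws:dms:...:endpoint:source",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:target",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",  # placeholder
    # "full-load" = one-off migration; "full-load-and-cdc" = continuous replication.
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```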
### Postgres:

[Template repository and example usage](https://github.com/LBHackney-IT/aws-hackney-common-terraform/tree/master/modules/database/postgres)

![PostgreSQL example usage](../doc-images/data2.png)

**Notes:**

- DMS does not support Postgres version 12, so use version 11 or older;
- Always store passwords in Parameter Store and do not hardcode them;
- "Multi_az" should be true for production databases;
- "subnet_ids" requires subnets in 2 different AZs. Make sure these are private subnets to ensure that the database is secure;
- Currently not terraformed: to enable traffic from DMS to your Postgres instance, add an ingress rule to the database's security group allowing all traffic from the DMS security group (see the console screenshot and the sketch below);
![AWS console](../doc-images/data3.png)
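Since this step is not yet terraformed, it can be done in the console (as above) or scripted. A minimal boto3 sketch, assuming you know the two security group ids (placeholders below):

```python
# Sketch: allow traffic from the DMS security group into the Postgres
# database's security group.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")  # assumed region

DB_SECURITY_GROUP = "sg-0123456789abcdef0"   # placeholder: database security group
DMS_SECURITY_GROUP = "sg-0fedcba9876543210"  # placeholder: DMS security group

ec2.authorize_security_group_ingress(
    GroupId=DB_SECURITY_GROUP,
    IpPermissions=[
        {
            "IpProtocol": "-1",  # all traffic, as described above
            "UserIdGroupPairs": [
                {"GroupId": DMS_SECURITY_GROUP, "Description": "Allow traffic from DMS"},
            ],
        }
    ],
)
```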
## Data migration using a data pipeline:

**What is a data pipeline?**

A data pipeline is an automated flow that takes data stored in one location (the source) and uploads it to a target destination.

### Data pipeline - CSV to Postgres:

As of 26/06/2020, we have implemented one data pipeline.

The pipeline takes data uploaded to an S3 bucket in .csv format and loads the data into a Postgres database.
![Data pipelines](../doc-images/data4.png)
Lines changed: 37 additions & 0 deletions
# Data Pipeline Implementation

## S3:

The source S3 bucket has been configured to invoke a Lambda function when a file with the extension `.csv` has been uploaded.

![S3 Bucket](../doc-images/data5.png)

The configuration of the source S3 bucket is done through the pipeline's serverless implementation - <u>no manual setup is required for events</u>.

![S3 configuration](../doc-images/data6.png)
## Lambda:

The Lambda function implements the following:

- Receives S3 notifications;
- Retrieves the bucket and file details from the notification;
- Truncates the target table in the target database;
- Makes use of the AWS Postgres function that copies data from a CSV file in S3 into Postgres to migrate the data;
- Logs any exceptions and errors to CloudWatch;

**Note:** The Postgres database, and a table matching the CSV format, need to be created separately.
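As an illustration of those steps, a stripped-down handler might look like the sketch below. It is not the template repository's actual code: the environment variable names, the CSV options and the use of the RDS `aws_s3.table_import_from_s3` function are assumptions for illustration.

```python
# Sketch of the pipeline Lambda: truncate the target table, then copy the
# uploaded CSV from S3 into Postgres using the RDS aws_s3 extension.
import os
import urllib.parse

import psycopg2  # assumed to be packaged with the Lambda deployment


def handler(event, context):
    # 1. Retrieve bucket and file details from the S3 notification.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    table = os.environ["TARGET_TABLE"]                 # assumed env var names
    region = os.environ.get("AWS_REGION", "eu-west-2")

    connection = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],            # in practice, read from Parameter Store
    )
    try:
        with connection, connection.cursor() as cursor:
            # 2. Truncate the target table.
            cursor.execute(f"TRUNCATE TABLE {table};")
            # 3. Copy the CSV straight from S3 into the table.
            cursor.execute(
                "SELECT aws_s3.table_import_from_s3(%s, '', '(format csv, header true)', %s, %s, %s);",
                (table, bucket, key, region),
            )
    except Exception:
        # 4. The error is logged to CloudWatch by the Lambda runtime.
        print(f"Failed to import s3://{bucket}/{key} into {table}")
        raise
    finally:
        connection.close()
```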
## How to set up the data pipeline for a project:

[Template repository for the data pipeline code implementation](https://github.com/LBHackney-IT/s3-to-postgres-data-pipeline)

1. Create a repository for your pipeline by using the above template;
2. Update the code by replacing the names of the existing pipeline with the name of your project's pipeline;
3. Ensure you populate the environment variables specified in the README file of the repository;
4. Deploy using Serverless - this will deploy the Lambda and set up an existing S3 bucket with the event it needs to listen for;

**Notes**

- You need to create the S3 bucket separately and provide its name in the serverless.yml file of the pipeline repository;
- You need to create the Postgres database separately and create the "target" table with the same columns as the ones expected in the .csv that will be uploaded to S3;
Lines changed: 69 additions & 0 deletions
# Alerting

## Application monitoring and alerting:

In Hackney, we use AWS CloudWatch to implement monitoring and alerting.

Any logs created in our APIs are recorded and accessible in AWS CloudWatch. Creation of log groups is automated via the current serverless setup.
## Metric filters:

### Filter and Pattern Syntax:

Metric filters are a useful feature that allows you to find patterns and terms in your logs. Following the logging standards identified earlier in this document, metric filters can be created to easily identify logs related to a certain phrase or term like `ERROR`.

Using the filters, developers can easily narrow down the logs they see to only the ones related to any error that has occurred, hiding all other logs such as those for successful requests.

CloudWatch also provides a way to create alarms based on metric filters, so we can get notified if a log matching a certain pattern/term has occurred.
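For example, a metric filter on the term `ERROR`, plus an alarm on that metric, could be created as in this hedged boto3 sketch (the log group, namespace and SNS topic are placeholders; in practice this would normally live in infrastructure-as-code rather than a script).

```python
# Sketch: create a metric filter counting ERROR log lines, then alarm on it.
import boto3

logs = boto3.client("logs", region_name="eu-west-2")  # assumed region
cloudwatch = boto3.client("cloudwatch", region_name="eu-west-2")

LOG_GROUP = "/aws/lambda/my-service-api"      # placeholder log group
NAMESPACE = "MyService"                       # placeholder metric namespace
SNS_TOPIC_ARN = "arn:aws:sns:eu-west-2:123456789012:alerts"  # placeholder

logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="error-logs",
    filterPattern="ERROR",                    # matches log events containing the term ERROR
    metricTransformations=[
        {"metricName": "ErrorCount", "metricNamespace": NAMESPACE, "metricValue": "1"}
    ],
)

cloudwatch.put_metric_alarm(
    AlarmName="my-service-api-errors",        # placeholder
    Namespace=NAMESPACE,
    MetricName="ErrorCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[SNS_TOPIC_ARN],             # notify e.g. an SNS topic wired to email
)
```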
### Metric filters that should be created per API:

TBC

_Needs to be the filters we want commonly available for each API._

### Alarms should be created for the following metric filters:

TBC

_Need to decide which logs should have an alarm associated with them._
## Availability monitoring and alerting:

We use AWS CloudWatch Canaries to monitor the availability of our APIs and front-end applications.

## AWS CloudWatch Canaries:

### AWS Canaries for APIs:

- Set to run every 5 mins;

- A canary invokes an API endpoint to check its availability;

- Needs to be set up per API endpoint to ensure all endpoints provided by an API are functioning as expected;

- The current creation process for a canary is **manual**;

[See the guidance for Canaries here](./uptime_monitoring.md)
### AWS Canaries for front end applications:

- Can monitor the availability of a web page;

- Alarms can be set to alert if the availability of a given web page drops;

- Logs recorded can be used to identify performance issues associated with loading a specific item;

- Can check for broken links;

- A max number of links to follow is set up;

- The canary crawls through the links and returns the first broken link identified;
## AWS CloudWatch Alarms:

We also use CloudWatch alarms to monitor for specific events in the log streams.

Specific metrics can be established as triggers on application logs, which can fire off alerts in the form of emails or other messaging mediums. We can create up to 5,000 alarms per region per account, which should give us sufficient capacity.

It may also be possible to consolidate these alarms if we have a standard format for logs (this may also be achievable by creating composite alarms, but that uses up available alarms).
Lines changed: 77 additions & 0 deletions
# Application Logging Guidelines

## Introduction:

This document defines a set of guidelines used to produce rich application logging for applications belonging to a microservice architecture.

Providing rich logging information makes it easier to investigate issues without resorting to intrusive approaches (e.g. debugging, memory dumps), and also makes the behaviour of services visible through monitoring tools that extract and/or query these logs.
## Categories:

### Application:

Logs related to application behaviour that does not result in exceptions and would not be visible externally if not logged. Common scenarios are conditional behaviours that generate different outputs based on the contents of the command or the state of the resource being manipulated. Application logs should be the only log level required within the application and must be used with caution to avoid excessive log entries that add no value to issue investigation.

### Events:

Events are application notifications used to inform external components that an activity has happened within the application. In some cases the event will let external subscribers know whether an operation succeeded or failed. Every operation will raise at least two types of events, success or failure, but in some cases it might generate different events based on the context of the operation. All events raised by an application are logged by the infrastructure components, so adding log information to notify that an event has happened is not required and would make the logging redundant.

### Exceptions:

Exceptions are an execution-flow mechanism used to interrupt the current processing flow, either because the application or one of its dependent components behaved unexpectedly and can't proceed, or because the application logic is aware it can't proceed without causing issues. All exceptions raised by an application or its dependencies are logged by the infrastructure components, so adding log information to notify that an exception has happened is not required and would make the logging redundant.
### Tracing and Context Correlation:

Whether an application executes a task successfully or not is often highly dependent on the input from the user. As a result, this contextual information may be vital when trying to diagnose a fault.

This can be achieved by setting a property of your logging platform during the startup of a component.

This allows you to view the unified stream of "events", but also to segregate per role when required to troubleshoot an individual component.

Every operation in an application is initiated by a trigger, either externally or internally (synchronous processing). In general, these triggers do not have visibility of the behaviour of each service, and just expect a result as output from a request.

In many cases, these operations may trigger further operations in dependent services to accomplish the initial operation. This chain of events needs to be correlated in order to identify possible failures or for auditing purposes. This is where the CorrelationId and TracingContext come in.

## Correlation Id:

In order to identify operations executed across different components and layers that are part of the same context, there should be a correlationId sent from the client triggering the operation.

CorrelationId is a globally unique identifier (GUID) attribute provided by callers (or auto-generated when one is not provided) to identify the chain of calls between services.

For instance, an HTTP client should send a correlationId header with every call. The downstream systems should pass along this correlationId to any downstream layer so that all traces/logs can be identified as part of the same operation.
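The header name and helper below are illustrative (this document does not mandate a specific header); the sketch just shows the propagation rule: reuse the caller's id when present, generate one otherwise, log it with every entry, and pass it on to downstream calls.

```python
# Sketch: accept, log and propagate a correlation id across service calls.
import logging
import uuid

import requests

CORRELATION_HEADER = "x-correlation-id"  # assumed header name
logging.basicConfig(format="%(levelname)s %(correlation_id)s %(message)s", level=logging.INFO)


def handle_request(incoming_headers: dict) -> None:
    # Reuse the caller's correlation id, or generate one if none was provided.
    correlation_id = incoming_headers.get(CORRELATION_HEADER) or str(uuid.uuid4())

    # Attach the id to every log entry produced while handling this operation.
    log = logging.LoggerAdapter(logging.getLogger("tenure-api"),
                                {"correlation_id": correlation_id})
    log.info("Handling operation")

    # Pass the same id to any downstream call so its traces join the chain.
    requests.get("https://downstream.example/api/v1/resource",  # placeholder URL
                 headers={CORRELATION_HEADER: correlation_id})
```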
### Tracing Context:

Tracing Context is the name given to the correlated chain of calls that happened from an initial trigger until it reached the current state. By default the Tracing Context will use the CorrelationId to identify all events raised since the first trigger initiated the operation.

Currently, the correlation identifies the events in chronological order, and events happening in parallel on separate services will be mixed up.

**Additional Attributes to Consider (Custom Dimensions)**
| Attribute Name | Scope | Example | Description |
| -------------- | ----- | ------- | ----------- |
| Domain * | Application Instance | tenure | Domain name which the service belongs to |
| Service * | Application Instance | tenure-listener / tenure-api | The service name generating the logs |
| Environment * | Application Instance | dev | Environment name |
| Version * | Application Instance | 1.2.345 | Semantic version number |
| CorrelationId * | Per Operation | 3fa85f64-5717-4562-b3fc-2c963f66afa6 | Id used to chain events and logs executed by multiple operations |
| OperationId | Per Operation | 3fa85f64-5717-4562-b3fc-2c963f66afa6 | Unique Id that identifies one occurrence of the operation, i.e. RequestId |
| OperationName | Per Operation | Update Tenure | Name of the operation being executed |
| UserId | Per Operation | KJS827HJS88S | Id of the user triggering the operation |
| Logger (SourceContext) | Per Log Entry | Company.Solution.ClassName | Name of the component or class generating the logs |
| ResourceId | Per Log Entry | tenure-12345 | When an operation is being executed in the context of an existing resource (i.e. a repair), the logs should make the id of the order being modified available |

**NOTE:** Attributes marked with a * indicate _high importance_.
## Scopes:

### Application Instance:

Each deployed instance of an application will provide the same log attributes to all log entries generated. They do not change in the scope of the operations generating the logs.

### Per Operation:

When an operation is started (API request, message from a queue), the attributes are set and used throughout the chain of calls. They do not change based on the context within the application. If a value does not come from an external call, it should be generated internally.

### Per Log Entry:

Each class or logger will have its own set of attributes, used within its context to identify the source component that is generating the logs, e.g. the class name writing the logs, the Resource Id being manipulated, or any data available only in the context of the logger.
## Filters and Masking:

### Filters:

- To reduce the high volume of log entries, applications should add the following log filters:
  - Filter out healthcheck logs (keep errors);
  - Limit the log level to:
    - Non-production environments = Information;
    - Production environments = Warning;
- Don't let ASP.NET Core Console Logging Slow your App down;
- Logging in .NET Core and ASP.NET Core;

### Masking:

Prevent log entries from containing Personally Identifiable Information (PII) by removing the attributes or masking part of the value.
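Our services are .NET, but the idea translates directly; as a language-neutral illustration, here is a small Python sketch of both a healthcheck filter and simple masking of a PII attribute (the healthcheck path and masking rule are assumptions, not a prescribed format).

```python
# Sketch: drop healthcheck log entries (unless they are errors) and mask PII.
import logging


class HealthcheckFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Keep errors; drop routine healthcheck noise.
        is_healthcheck = "/health" in record.getMessage()  # assumed healthcheck path
        return record.levelno >= logging.ERROR or not is_healthcheck


def mask_email(email: str) -> str:
    # Keep the first character and the domain; mask the rest of the local part.
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tenure-api")
logger.addFilter(HealthcheckFilter())

logger.info("GET /health 200")                                             # filtered out
logger.info("Updated tenure for %s", mask_email("resident@example.com"))   # PII masked
```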
