diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/config_postgresql.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/config_postgresql.md
index e1330ff449..5cea500059 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/config_postgresql.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/config_postgresql.md
@@ -79,6 +79,8 @@ In the above snippet, `max_connections` determines the maximum number of concurr
 
 When running a standby server, you must set this parameter to the same or higher value than on the master server. Otherwise, queries will not be allowed on the standby server.
 
+**Note:** Modifications to the 'max_connections' require a reboot of the leader, during which a new leader will be elected. The updated value for 'max_connections' will be reflected in the configuration only after the reboot, in accordance with [PostgresSQL documentation](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS).
+
 ### Pg Dump
 
 ```bash
@@ -93,13 +95,30 @@ This section configures pg_dump, a PostgreSQL utility for performing database ba
 
 ```bash
 [replication]
-lag_health_threshold = 20480
+name = 'replication'
+password = 'replication'
+# note: lag_health_threshold is in bytes - default to 300KB
+# this is just greater than 1 WAL segment
+lag_health_threshold = 307200
+# maximum lag time in seconds since log was last replayed before replica is eligible for a restart
 max_replay_lag_before_restart_s = 180
-name = "replication"
-password = "replication"
+max_wal_senders = 10
+max_replication_slots = 5
+wal_sender_timeout = 60
+wal_receiver_timeout = 60
+wal_compression = "off"
 ```
 
-This section configures replication settings. It sets the lag health threshold to 20480 bytes, the maximum allowed replication lag. It also specifies the maximum replay lag before restarting replication and provides the replication name and password.
+This section configures replication settings:
+- `name`: replication name
+- `password`: replication password.
+- `lag_health_threshold`: it sets the lag health threshold to 307200 bytes(300 kb), the maximum allowed replication lag.
+- `max_replay_lag_before_restart_s`: Custom setting; maximum lag time in seconds since log was last replayed before replica is eligible for a restart.
+- `max_wal_senders`: Limits how many standbys can connect for replication (default: 10).
+- `max_replication_slots`: Sets how many replication slots are allowed (default: 5).
+- `wal_sender_timeout`: Primary waits 60 seconds for standby response before disconnecting.
+- `wal_receiver_timeout`: Standby waits 60 seconds for data from primary before timing out.
+- `wal_compression`: Controls compression of WAL data; "off" disables it, "on" enables it.
 
 ### SSL
 
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha.md
index 3b1ccb5b6a..9a6432c403 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha.md
@@ -28,10 +28,21 @@ The Chef Automate HA equates to reliability, efficiency, and productivity, built
 
 HA architecture includes the cluster of the *Chef Automate*, *Chef Server*, *PostgreSQL*, and *OpenSearch*.
 
-### Chef Automate HA Architecture for On Premises / Cloud Non-Managed
+{{< note >}}
+Port **7799** must be accessible from the bastion host to all nodes within the Chef Automate cluster.
+Although this requirement is not explicitly illustrated in the network architecture diagram for the sake of visual clarity, it is essential for proper cluster operation. The `chef-automate verify` command depends on successful connectivity to port **7799** on each node to perform its validations correctly.
+{{< /note >}}
+
+### Chef Automate HA Architecture for OnPremise / Cloud Non-Managed
 
 ![High Availability Architecture](/images/automate/ha_arch_onprem.png)
 
+{{< note >}}
+In Chef Automate HA architecture for On-Premise or non-managed Cloud deployments, frontend nodes connect to PostgreSQL over port **5432** and use port **6432** to perform leader checks.
+
+Chef has deprecated the earlier configuration that required frontend nodes to use port **7432** for PostgreSQL connectivity.
+{{< /note >}}
+
 ### Chef Automate HA Architecture for AWS Managed
 
 ![High Availability Architecture](/images/automate/ha_arch_aws_managedservices.png)
@@ -46,6 +57,12 @@ The following shows a five-node cluster, which is a supported deployment pattern
 
 ![High Availability Architecture](/images/automate/ha_arch_minnode_cluster.png)
 
+{{< note >}}
+In Chef Automate HA architecture for On-Premise or non-managed Cloud deployments, frontend nodes connect to PostgreSQL over port **5432** and use port **6432** to perform leader checks.
+
+Chef has deprecated the earlier configuration that required frontend nodes to use port **7432** for PostgreSQL connectivity.
+{{< /note >}}
+
 {{< warning >}}
 
 - Choose Minimum node deployment type when you have VM constraints.
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_deploy_steps.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_deploy_steps.md
index d480e68921..46ac47f705 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_deploy_steps.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_deploy_steps.md
@@ -96,13 +96,28 @@ Run the following steps on Bastion Host Machine:
 
 ## Config Verify
 
-1. After successful provision, run verify config command:
+### Prerequisites
 
- ```bash
- sudo chef-automate verify -c config.toml
- ```
+#### * Directory Structure
+
+- The verification cli needs `$HOME` environment variable to be available on all nodes.
+- If in some case its not available then as a fallback the cli will be copied over to `/home//`.
+ - `ssh_user name` is read from `ssh_user` property in `config.toml`
+- Every node must have the `$HOME` directory with minimum permissions `drwx------`.
+
+#### * Permission Requirements
+
+- The specified SSH user must have:
+ - Read (r), write (w), and execute (x) permissions.
+ - Ownership of the directory.
+
+After successful provision, run verify config command:
+
+```bash
+sudo chef-automate verify -c config.toml
+```
 
- To know more about config verify, you can check [Config Verify Doc page](/automate/ha_verification_check/).
+To learn more about Config Verify, check the [Config Verify Doc page](/automate/ha_verification_check/).
 
 ## Steps to Deploy
 
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_managed_deploy_steps.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_managed_deploy_steps.md
index 38142c327e..6cd88aa48a 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_managed_deploy_steps.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_aws_managed_deploy_steps.md
@@ -88,15 +88,30 @@ Once the provisioning is successful, **if you have added custom DNS to your conf
 
 ## Config Verify
 
-1. After successful provision, run verify config command:
+### Prerequisites
 
- ```bash
- sudo chef-automate verify -c config.toml
- ```
+#### * Directory Structure
+
+- The verification cli needs `$HOME` environment variable to be available on all nodes.
+- If in some case its not available then as a fallback the cli will be copied over to `/home//`.
+ - `ssh_user name` is read from `ssh_user` property in `config.toml`
+- Every node must have the `$HOME` directory with minimum permissions `drwx------`.
+
+#### * Permission Requirements
+
+- The specified SSH user must have:
+ - Read (r), write (w), and execute (x) permissions.
+ - Ownership of the directory.
+
+After successful provision, run verify config command:
+
+```bash
+sudo chef-automate verify -c config.toml
+```
 
- To know more about config verify, you can check [Config Verify Doc page](/automate/ha_verification_check/).
+To learn more about Config Verify, check the [Config Verify Doc page](/automate/ha_verification_check/).
 
- Once the verification is successfully completed, then proceed with deployment, In case of failure, please fix the issue and re-run the verify command.
+Once the verification is completed, proceed with deployment. In case of failure, fix the issue and re-run the verify command.
 
 ## Steps to deploy
 
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_backup_restore.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_backup_restore.md
index f943afebed..676d40faa7 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_backup_restore.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_backup_restore.md
@@ -52,10 +52,6 @@ An Amazon S3 bucket is a public cloud storage resource available in Amazon Web S
 
 With the AWS Free Usage Tier*, you can get started with Amazon S3 for free in all regions except the AWS GovCloud Regions. [See](https://aws.amazon.com/s3/) for more information.
 
-## Taking Backup with Amazon S3 Bucket
-
-This section explains how to take backup for External Elasticsearch (ES) and PostgreSQL to the Amazon S3 bucket.
-
 {{< note >}}
 
 Ensure you perform the backup configuration before deploying the Chef Automate High Availability (HA) cluster.
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_cert_rotation.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_cert_rotation.md
index 241017de69..9300cf4cbd 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_cert_rotation.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_cert_rotation.md
@@ -42,6 +42,8 @@ To understand how to generate certificates, refer to the [Certificate Generation
 - `--wait-timeout` This flag sets the operation timeout duration (in seconds) for each individual node during the certificate rotation process.
 - Certificate rotation should be done in down-time window as service will restart.
 - CN (Common Name) should be the same for all certificates in Opensearch nodes.
+-Use the CLI command to generate the certificate template, ensuring that the node order in the TOML file remains unchanged. Modifying the node sequence may lead to issues when issuing unique certificates for individual nodes.
+- When specifying the subject during certificate generation, avoid using special characters. Due to the involvement of multiple processing layers, special character handling becomes complex and may necessitate manual intervention or patching.
 {{< /note >}}
 
 ### Rotate Cluster Certificates
@@ -62,6 +64,12 @@ To rotate the certificate for a node (automate,chef-server,postgres,opensearch)
 chef-automate cert-rotate --certificate-config certificate-config.toml
 ```
 
+{{< warning >}}
+It is critical to generate the certificate template using the following command:
+`chef-automate cert-rotate generate-certificate-config certificate-config.toml`
+This ensures that specific certificate metadata such as `nodes_dn` is generated in the exact same order as during processes like `upgrade`, `node add`, or `node remove`. Maintaining the IP address order generated by the `generate-certificate-config` command is essential. Any deviation from this order may result in unexpected cluster reboots or system instability.
+{{< /warning >}}
+
 #### Sample Certificate Template
 
 ```toml
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_disaster_recovery_setup.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_disaster_recovery_setup.md
index 5beccaa7e0..43613de770 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_disaster_recovery_setup.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_disaster_recovery_setup.md
@@ -129,8 +129,8 @@ Configure backups for both clusters using either [file system](/automate/ha_back
 - Stop all the services on all Automate and Chef Infra frontend nodes using the following command, use the below command from the bastion.
 
 ```sh
- chef-automate systemctl --a2
- chef-automate systemctl --cs
+ chef-automate start --a2
+ chef-automate start --cs
 ```
 
 - In the disaster recovery cluster, use the following sample command to restore the latest backup from any Chef Automate frontend instance.
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_healthcheck.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_healthcheck.md
index aac4eef421..f7e6bfc52d 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_healthcheck.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_healthcheck.md
@@ -175,3 +175,80 @@ automate-backend-ctl show --svc=automate-ha-postgresql
 cd /hab/a2_deploy_workspace/
 ./scripts/credentials set opensearch --no-auto
 ```
+
+## Precaution During Backend Node Reboot
+
+To prevent data loss, do not restart all nodes in a PostgreSQL cluster in quick succession.
+
+When a follower node (e.g., f1) is restarted, it begins synchronizing data from the current leader. If the leader node is also restarted during this synchronization process, a leader election may occur. If f1 is elected as the new leader before completing its sync, it may not have the most recent data, which can lead to inconsistencies or data loss.
+
+## Precaution during opensearch Reboot
+
+- Check cluster health
+Execute the following commands to verify the health of the cluster:
+
+```sh
+curl -X GET "https://localhost:9200/_cat/health?v" -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+```
+
+```sh
+curl -X GET "https://localhost:9200/_cat/recovery?v" -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+```
+
+- Disable shard allocation
+Before restarting the node, disable shard allocation to prevent unnecessary rebalancing during the process:
+
+```sh
+curl -X PUT "https://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
+{
+ "persistent": {
+ "cluster.routing.allocation.enable": "primaries"
+ }
+}' -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+```
+
+- Stop indexing and flush the data to disk:
+
+```sh
+curl -X POST "https://localhost:9200/_flush" -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+```
+
+- Enable shard allocation once the node is back online to resume normal data distribution.
+
+```sh
+curl -X PUT "https://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
+{
+ "persistent": {
+ "cluster.routing.allocation.enable": null
+ }
+}' -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+```
+
+- Monitor the cluster state, and verify its health and recovery status to ensure overall stability.
+
+```sh
+curl -X GET "https://localhost:9200/_cat/health?v" -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+
+curl -X GET "https://localhost:9200/_cat/recovery?v" -k
+--cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem
+--key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem
+--cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem
+```
\ No newline at end of file
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_procedure.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_procedure.md
index ebc55b2e18..e265438596 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_procedure.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_procedure.md
@@ -101,15 +101,30 @@ You can also generate a configuration file using the `init-config` subcommand.
 
 ## Config Verify
 
-1. Verify the above config using the `verify` subcommand.
+### Prerequisites
 
- ```bash
- sudo chef-automate verify -c config.toml
- ```
+#### * Directory Structure
+
+- The verification cli needs `$HOME` environment variable to be available on all nodes.
+- If in some case its not available then as a fallback the cli will be copied over to `/home//`.
+ - `ssh_user name` is read from `ssh_user` property in `config.toml`
+- Every node must have the `$HOME` directory with minimum permissions `drwx------`.
+
+#### * Permission Requirements
+
+- The specified SSH user must have:
+ - Read (r), write (w), and execute (x) permissions.
+ - Ownership of the directory.
 
- To know more about config verify, check [Config Verify Documentation](/automate/ha_verification_check/).
+Verify the above config using the `verify` subcommand.
 
- Once the verification completed successfully, proceed with the deployment. In case of failure, fix the issue and verify it by re-running the verify command.
+```bash
+sudo chef-automate verify -c config.toml
+```
+
+To learn more about Config Verify, check the [Config Verify Documentation](/automate/ha_verification_check/).
+
+Once the verification completed successfully, proceed with the deployment. In case of failure, fix the issue and verify it by re-running the verify command.
 
 ## Steps to Deploy
 
@@ -231,7 +246,10 @@ The bastion server can patch new configurations in all nodes. To know more see [
 - For the Frontend nodes you can use the same IP in Chef Automate and Chef Server.
 - For the Backend nodes you can use the same IP in PostgreSQL and OpenSearch.
 - To provide multiline certificates use triple quotes like `""" multiline certificate contents"""`.
-
+- Rebooting or restarting individual nodes outside a designated maintenance window should be avoided, especially during periods of high traffic.
+- This recommendation is based on our [performance benchmarking](/automate/ha_performance_benchmarks/#5-node-cluster-deployment) and is intended for customers managing up to 10,000 nodes under typical load conditions.
+- The 5 node Automate deployment pattern does not support dynamic scaling (i.e., adding or removing nodes). A 5 node deployment will always remain a 5 node setup.
+- Transitioning to an 11 node deployment requires decommissioning the existing 5 node cluster entirely. The new 11 node architecture must be provisioned from scratch.
 {{< /note >}}
 
 ```config
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_aws_managed_deployment.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_aws_managed_deployment.md
index b24590077d..d692e652f7 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_aws_managed_deployment.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_aws_managed_deployment.md
@@ -44,15 +44,30 @@ You can also view the [Sample Config](#sample-config-to-setup-on-premise-deploym
 
 ## Verify Configuration file
 
-1. We verify the above config using the below command:
+### Prerequisites
 
- ```bash
- sudo chef-automate verify -c config.toml
- ```
+#### * Directory Structure
+
+- The verification cli needs `$HOME` environment variable to be available on all nodes.
+- If in some case its not available then as a fallback the cli will be copied over to `/home//`.
+ - `ssh_user name` is read from `ssh_user` property in `config.toml`
+- Every node must have the `$HOME` directory with minimum permissions `drwx------`.
+
+#### * Permission Requirements
+
+- The specified SSH user must have:
+ - Read (r), write (w), and execute (x) permissions.
+ - Ownership of the directory.
+
+We verify the above config using the below command:
+
+```bash
+sudo chef-automate verify -c config.toml
+```
 
- To know more about config verify, you can check [Config Verify Doc page](/automate/ha_verification_check/).
+To learn more about Config Verify, check the [Config Verify Doc page](/automate/ha_verification_check/).
 
- Once the verification is successfully completed, then proceed with deployment, In case of failure, please fix the issue and re-run the verify command.
+Once the verification is completed, proceed with deployment. In case of failure, fix the issue and re-run the verify command.
 
 ## Steps to Deploy
 
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_customer_managed_deployment.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_customer_managed_deployment.md
index 89a1f30014..c8860f2de0 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_customer_managed_deployment.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_onprim_deployment_with_customer_managed_deployment.md
@@ -48,15 +48,30 @@ This section will discuss deploying Chef Automate HA on-premises machines with a
 
 ## Verify
 
-1. Verify the configuration file.
+### Prerequisites
 
- ```bash
- sudo chef-automate verify -c config.toml
- ```
+#### * Directory Structure
+
+- The verification cli needs `$HOME` environment variable to be available on all nodes.
+- If in some case its not available then as a fallback the cli will be copied over to `/home//`.
+ - `ssh_user name` is read from `ssh_user` property in `config.toml`
+- Every node must have the `$HOME` directory with minimum permissions `drwx------`.
+
+#### * Permission Requirements
+
+- The specified SSH user must have:
+ - Read (r), write (w), and execute (x) permissions.
+ - Ownership of the directory.
+
+Verify the configuration file.
+
+```bash
+sudo chef-automate verify -c config.toml
+```
 
- To know more about config verify, you can check [Config Verify Doc page](/automate/ha_verification_check/).
+To learn more about Config Verify, check the [Config Verify Doc page](/automate/ha_verification_check/).
 
- Once the verification is successfully completed, then proceed with deployment, In case of failure, please fix the issue and re-run the verify command.
+Once the verification is completed, proceed with deployment. In case of failure, fix the issue and re-run the verify command.
 
 ## Steps to Deploy
 
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_performance_benchmarks.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_performance_benchmarks.md
index 574edc3e8b..c2a68f9756 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_performance_benchmarks.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_performance_benchmarks.md
@@ -127,4 +127,6 @@ If a large number of Chef Infra client converges happen in a small window, it wi
 
 When expanding the Automate HA cluster to handle additional nodes or loads from users, it's preferable to scale the front-end nodes horizontally by adding more nodes rather than increasing the amount of server resources on the nodes. This makes it easier to scale as new nodes are added to Chef Infra and reduces the amount of configuration tuning required on each front-end node.
 
+For PostgreSQL, we prefer vertical scaling to horizontal scaling for high-availability deployments. All frontend nodes in this architecture directly talk to the leader node, hence it is the sole point of interaction. Increasing the number of follower nodes does not share the load but adds to the load on the leader since it has to replicate data to all followers. It adds extra replication lag and worsens performance overall.
+
 Tuning the configs to handle additional CPU cores can be time-consuming. It often can lead to bottlenecks or other issues in different parts of the Automate HA cluster.
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_troubleshooting.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_troubleshooting.md
index 5b0d72d412..d5aa094c03 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_troubleshooting.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_troubleshooting.md
@@ -36,6 +36,81 @@ This page explains the frequently encountered issues in Chef Automate High Avail
 
 To make the service healthy, ensure the chef server can curl the data collector endpoint from the chef server node.
 
+### Rate limiter ingestion issues on data collector endpoint
+
+```bash
+Sep 24 22:33:20 rp000134186 hab: automate-gateway.default(O): time="2024-09-24T22:33:20-05:00" level=error
+msg="resource=collector-requests cur=960 max=960: Resource limit exceeded" grpc_port=2001 hostname=127.0.0.1 https_port=2000 null_backend_socket=/hab/svc/automate-gateway/var/null_backend.sock
+```
+
+The rate limiter controls how many data collector requests are processed concurrently. If you experience "Resource limit exceeded" errors, you can increase the number of concurrent requests to handle more load, but be aware that this will increase CPU and memory consumption.
+
+Example Configuration:
+
+Use the following TOML template to update the gateway settings on the Automate nodes:
+
+`chef-automate config patch config.toml --a2`
+
+```toml
+[gateway.v1.sys.data_collector.limiter]
+# Setting disable to true will allow an unbounded number of
+# concurrent data collector requests (not recommended).
+disable = false
+# Sets the maximum number of concurrent inflight requests.
+# Default value = 60 * number of CPUs.
+max_inflight_requests = 1200
+```
+
+Guidance:
+
+- The default value for max_inflight_requests is 60 * number of CPUs.
+- If you encounter `Resource limit exceeded` errors, increase this value gradually by 10% to 30% based on performance improvements.
+- Monitor CPU and memory usage after each adjustment.
+
+Reference: [Chef Automate Configuration](https://docs.chef.io/automate/configuration/)
+
+### Queue is full errors on data collector endpoint
+
+```bash
+Sep 30 00:04:43 rp000134186 hab: ingest-service.default(O): time="2024-09-30T00:04:43-05:00" level=error msg="Chef run ingestion failure"
+error="Message rejected because queue is full"
+```
+
+The ingest/compliance service uses a message buffer to queue incoming data. If you encounter "Message rejected because queue is full" errors, you can increase the queue size, but this will increase CPU and memory usage.
+
+Example Configuration:
+
+Use the following TOML template to increase the queue size on the Automate nodes:
+
+`chef-automate config patch config.toml --a2`
+
+```toml
+[compliance.v1.sys.service]
+message_buffer_size = 300
+
+[ingest.v1.sys.service]
+message_buffer_size = 300
+```
+
+Guidance:
+
+- The default value for message_buffer_size is 100.
+- If you see queue overflow errors, increase the value to 300.
+- If the issue persists, increase it gradually by 100 until the problem is resolved.
+
+Avoid setting this value too high, as it may cause backpressure and increase latency if downstream processing slows down or fails.
+
+Reference: [Chef Automate Configuration](https://docs.chef.io/automate/configuration/)
+
+### Still getting 5XX on data collector endpoint
+
+Along with the above configuration changes related to rate-limiter and queue configuration, it is crucial to implement proper splay on the client nodes sending data to Automate. Configuration changes can help mitigate the issue to a certain extent, but sudden bursts of traffic from multiple clients can still overwhelm the system and cause request rejections.
+
+Splay introduces a random delay between client runs, which helps:
+- Prevent traffic spikes.
+- Distribute load more evenly across the system.
+- Reduce the chances of exceeding rate limits or overloading the queue.
+
 ### Issue: Database Accessed by Other Users
 
 ```bash
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_upgrade_introduction.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_upgrade_introduction.md
index 3b01161761..549b27d32b 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_upgrade_introduction.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/ha_upgrade_introduction.md
@@ -58,6 +58,7 @@ Steps to upgrade the Chef Automate HA are as shown below:
 - Backend upgrades will restart the backend service, which take time for cluster to be in health state.
 - Backend upgrades should be performed in maintenance window.
 - Upgrade command, currently only supports minor upgrade.
+ - We recommend always performing a chef-automate backup before initiating any upgrade.
 {{< /note >}}
 
 - To skip user confirmation prompt in upgrade, you can pass a flag
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_incident_creation.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_incident_creation.md
index 267d9e5280..7cc761cef7 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_incident_creation.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_incident_creation.md
@@ -36,9 +36,9 @@ The Incident App generates a data stream of compliance events that you can lever
 
 * A running [Chef Automate](https://www.chef.io/automate/) instance.
 * Chef Automate has a valid SSL/TLS certificate from a trusted certificate authority (CA).
-* A running [ServiceNow](https://www.servicenow.com/) instance. The supported ServiceNow versions are **Tokyo**, **San Diego** and **Rome**.
+* A running [ServiceNow](https://www.servicenow.com/) instance. The supported ServiceNow versions are **Vancouver**, **Washington DC** and **Xanadu**.
 * The ServiceNow instance is reachable on port 443.
-* The ServiceNow instance should be compatible with **Tokyo**, **San Diego** and **Rome** versions.
+* The ServiceNow instance should be compatible with **Vancouver**, **Washington DC** and **Xanadu** versions.
 
 ## Install
 
diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_integration.md b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_integration.md
index 655ea83593..c52633cf36 100644
--- a/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_integration.md
+++ b/_vendor/github.com/chef/automate/components/docs-chef-io/content/automate/servicenow_integration.md
@@ -31,9 +31,9 @@ The Integration App works by exposing the REST API endpoints for communication b
 
 - A running [Chef Automate](https://www.chef.io/automate/) instance.
 - Chef Automate has a valid SSL/TLS certificate from a trusted certificate authority (CA).
-- A running [ServiceNow](https://www.servicenow.com/) instance.* A running [ServiceNow](https://www.servicenow.com/) instance. The supported ServiceNow versions are **Tokyo**, **San Diego** and **Rome**.
+- A running [ServiceNow](https://www.servicenow.com/) instance.* A running [ServiceNow](https://www.servicenow.com/) instance. The supported ServiceNow versions are **Vancouver**, **Washington DC** and **Xanadu**.
 - The ServiceNow instance is reachable on port 443.
-- The ServiceNow instance should be compatible with **Tokyo**, **San Diego** and **Rome** versions.
+- The ServiceNow instance should be compatible with **Vancouver**, **Washington DC** and **Xanadu** versions.
 
### Required ServiceNow Plugins diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_aws_managedservices.png b/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_aws_managedservices.png index fef86d65ae..ba78ba0157 100644 Binary files a/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_aws_managedservices.png and b/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_aws_managedservices.png differ diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_minnode_cluster.png b/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_minnode_cluster.png index 483b0cdfe9..948432d210 100644 Binary files a/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_minnode_cluster.png and b/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_minnode_cluster.png differ diff --git a/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_onprem.png b/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_onprem.png index 7aaf76893d..987ab9a886 100644 Binary files a/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_onprem.png and b/_vendor/github.com/chef/automate/components/docs-chef-io/static/images/automate/ha_arch_onprem.png differ diff --git a/_vendor/modules.txt b/_vendor/modules.txt index 631bfb6ece..6135d026e9 100644 --- a/_vendor/modules.txt +++ b/_vendor/modules.txt @@ -1,4 +1,4 @@ -# github.com/chef/automate/components/docs-chef-io v0.0.0-20250317095354-4cf10eec01e7 +# github.com/chef/automate/components/docs-chef-io v0.0.0-20250515070321-84edd4277ae8 # github.com/chef/desktop-config/docs-chef-io v0.0.0-20240814044820-5af667d41a43 # 
github.com/habitat-sh/habitat/components/docs-chef-io v0.0.0-20241227173243-de19b906a228 # github.com/chef/chef-server/docs-chef-io v0.0.0-20250414141619-a0fb7ff68e94 diff --git a/go.mod b/go.mod index 1d452a253f..56940f9d8e 100644 --- a/go.mod +++ b/go.mod @@ -3,7 +3,7 @@ module github.com/chef/chef-web-docs go 1.22 require ( - github.com/chef/automate/components/docs-chef-io v0.0.0-20250317095354-4cf10eec01e7 // indirect + github.com/chef/automate/components/docs-chef-io v0.0.0-20250515070321-84edd4277ae8 // indirect github.com/chef/chef-docs-theme v0.0.0-20250217213320-727f9bce8258 // indirect github.com/chef/chef-server/docs-chef-io v0.0.0-20250414141619-a0fb7ff68e94 // indirect github.com/chef/chef-workstation/docs-chef-io v0.0.0-20250205062508-ee50345a4044 // indirect diff --git a/go.sum b/go.sum index a9bc9c87a3..d8acf16fe6 100644 --- a/go.sum +++ b/go.sum @@ -1,5 +1,5 @@ -github.com/chef/automate/components/docs-chef-io v0.0.0-20250317095354-4cf10eec01e7 h1:aBSPBATSbiVOgqNwR0fZaiQplrqc1DLEG9IAL7y97pE= -github.com/chef/automate/components/docs-chef-io v0.0.0-20250317095354-4cf10eec01e7/go.mod h1:juvLC7Rt33YOCgJ5nnfl4rWZRAbSwqjTbWmcAoA0LtU= +github.com/chef/automate/components/docs-chef-io v0.0.0-20250515070321-84edd4277ae8 h1:YDp7WgYZJ0H4aBz4Kq0OpcUNdbi2EnKU6nM8rmdlEQI= +github.com/chef/automate/components/docs-chef-io v0.0.0-20250515070321-84edd4277ae8/go.mod h1:juvLC7Rt33YOCgJ5nnfl4rWZRAbSwqjTbWmcAoA0LtU= github.com/chef/chef-docs-theme v0.0.0-20250217213320-727f9bce8258 h1:wpWL3E4Kb6ynNEwilZiKk/clD0g9AjinDB/D+OKeKHU= github.com/chef/chef-docs-theme v0.0.0-20250217213320-727f9bce8258/go.mod h1:+Jpnv+LXE6dXu2xDcMzMc0RxRGuCPAoFxq5tJ/X6QpQ= github.com/chef/chef-server/docs-chef-io v0.0.0-20250414141619-a0fb7ff68e94 h1:YpF+MQ2CQ0V/sOtGrTCxa+Lpd5J9iR6ADDkrdSMqtw0=