This repository was archived by the owner on Aug 31, 2022. It is now read-only.
v0.6: nodes randomly fail to start with passwords mismatch #62
Open
Description
We randomly observe the following symptom with the v0.6 release. We are not yet sure whether the root cause lies in the environment (BOSH director or infrastructure) or in the cassandra 0.6 release.
Task 128129 | 07:27:33 | Updating instance cassandra-seeds: cassandra-seeds/41da6c3c-0d85-4049-b5b6-7ee8b34f6cfa (0) (canary) (00:01:48)
L Error: Action Failed get_task: Task 7b430e3e-cf68-4a3f-43c4-d2e16c9fbfc8 result: 1 of 2 post-start scripts failed. Failed Jobs: cassandra. Successful Jobs: bosh-dns.
Task 128129 | 07:29:34 | Updating instance cassandra-servers: cassandra-servers/e2699534-66fa-4372-b18c-82d963c2ff4f (0) (canary) (00:02:05)
L Error: Action Failed get_task: Task f2c8e89c-dc44-4a26-5ae0-8a8fb18a70af result: 1 of 2 post-start scripts failed. Failed Jobs: cassandra. Successful Jobs: bosh-dns.
However, after a few minutes, the deployment status shows the nodes as running:
$ bosh instances
Using environment '192.168.99.155' as client 'xx'
Task 128310. Done
Deployment 'c_072dd24d-c2aa-486c-88c6-6c362ae4f609'
Instance                                                Process State  AZ  IPs
cassandra-broker/2f5ccb89-83d1-4d59-bb87-8254daeb694a   failing        z1  192.168.211.34
cassandra-seeds/41da6c3c-0d85-4049-b5b6-7ee8b34f6cfa    running        z1  192.168.211.25
cassandra-seeds/7e405a54-1acb-4fdd-a015-8650651b1e1a    running        z1  192.168.211.31
cassandra-seeds/f3fc209a-7117-49e7-9924-5c0d84f3b5fa    running        z1  192.168.211.32
cassandra-servers/e2699534-66fa-4372-b18c-82d963c2ff4f  running        z1  192.168.211.33
5 instances
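To spot instances such as the failing broker without reading the table by eye, the plain-text output can be filtered with a few lines of Python. This is only a sketch: the function name `non_running_instances` is hypothetical, and it assumes each instance row has the form `<group>/<uuid> <state> <az> <ip>` as in the output above. A process state that contains spaces would need different parsing.

```python
from typing import List, Tuple


def non_running_instances(output: str) -> List[Tuple[str, str]]:
    """Return (instance, state) pairs whose process state is not 'running'."""
    result = []
    for line in output.splitlines():
        fields = line.split()
        # Instance rows start with "<group>/<uuid>"; header and banner lines do not.
        if len(fields) >= 4 and "/" in fields[0]:
            instance, state = fields[0], fields[1]
            if state != "running":
                result.append((instance, state))
    return result
```

Fed the output above, this reports only the `cassandra-broker` instance, which is the one shown as `failing`.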
Looking at /var/vcap/sys/log/cassandra/post-start.stderr.log on cassandra-seeds/41da6c3c-0d85-4049-b5b6-7ee8b34f6cfa, we repeatedly see:
2018-07-03_07:39:53: DEBUG: setting first password, exit status: '1'
2018-07-03_07:39:53: INFO: verifying that the current password is the desired password
Connection error: ('Unable to connect to any servers', {'192.168.211.25': AuthenticationFailed('Failed to authenticate to 192.168.211.25: Error from server: code=0100 [Bad credentials] message="Provided username cassandra and/or password are incorrect"',)})
2018-07-03_07:39:53: DEBUG: verifying current password, exit status: '1'
2018-07-03_07:39:53: ERROR: the password for user 'cassandra' is inconsistent. Aborting.
2018-07-03_07:44:21: INFO: reached Cassandra on '192.168.211.25:9042' after '14' attemps. Waiting 30 more seconds for the service to be available.
2018-07-03_07:44:51: INFO: setting first password
Connection error: ('Unable to connect to any servers', {'192.168.211.25': AuthenticationFailed('Failed to authenticate to 192.168.211.25: Error from server: code=0100 [Bad credentials] message="Provided username cassandra and/or password are incorrect"',)})
2018-07-03_07:44:52: DEBUG: setting first password, exit status: '1'
2018-07-03_07:44:52: INFO: verifying that the current password is the desired password
Connection error: ('Unable to connect to any servers', {'192.168.211.25': AuthenticationFailed('Failed to authenticate to 192.168.211.25: Error from server: code=0100 [Bad credentials] message="Provided username cassandra and/or password are incorrect"',)})
2018-07-03_07:44:52: DEBUG: verifying current password, exit status: '1'
2018-07-03_07:44:52: ERROR: the password for user 'cassandra' is inconsistent. Aborting.
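The log suggests a two-step check: an attempt to set the first password, then a verification that the current password is the desired one, with an abort when both are rejected. The Python sketch below models that flow as inferred from the log messages only. It is not the release's actual post-start script. The names `check_admin_password` and `try_login` are hypothetical, and the use of a default password in the first step is an assumption.

```python
from typing import Callable


def check_admin_password(
    try_login: Callable[[str], bool],
    default_password: str,
    desired_password: str,
) -> str:
    """Return 'set', 'already-set', or 'inconsistent'.

    try_login(password) is a caller-supplied function that returns True
    when the 'cassandra' user can authenticate with that password.
    """
    # Step 1 ("setting first password"): assumed to log in with the default
    # password in order to change it to the desired one.
    if try_login(default_password):
        return "set"
    # Step 2 ("verifying that the current password is the desired password").
    if try_login(desired_password):
        return "already-set"
    # Both logins were rejected: the state the log reports as "inconsistent".
    return "inconsistent"
```

Under this reading, the "inconsistent" error means the node accepts neither password at the time of the check, which matches the two consecutive `Bad credentials` errors in each cycle of the log.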
The following is the associated BOSH deployment manifest:
---
instance_groups:
- azs:
  - z1
  env:
    bosh:
      remove_dev_tools: true
      swap_size: 0
  instances: 3
  jobs:
  - consumes:
      seeds:
        from: deployment-seeds
    name: cassandra
    properties:
      cass_KSP: "((!cassandra_key_store_pass))"
      cass_pwd: "((!cassandra_admin_password))"
      cassandra_ssl_YN: false
      client_encryption_options:
        enabled: false
        optional: true
        require_client_auth: false
      cluster_name: cluster
      heap_newsize: 1G
      max_heap_size: 6G
      num_tokens: 256
      server_encryptions:
        internode_encryption: none
      topology:
      - 10.8.32.60=DC1:RAC1
      - 10.8.32.61=DC1:RAC1
      - 10.8.32.62=DC1:RAC1
      - 10.8.32.63=DC1:RAC1
      validate_ssl_TF: false
    provides:
      seeds:
        as: deployment-seeds
    release: cassandra
  name: cassandra-seeds
  networks:
  - name: tf-net-coab-depls-instance
  persistent_disk_type: xlarge
  stemcell: trusty
  vm_type: large
- azs:
  - z1
  env:
    bosh:
      remove_dev_tools: true
      swap_size: 100
  instances: 1
  jobs:
  - consumes:
      seeds:
        from: deployment-seeds
    name: cassandra
    properties:
      cass_KSP: "((!cassandra_key_store_pass))"
      cass_pwd: "((!cassandra_admin_password))"
      cassandra_ssl_YN: false
      client_encryption_options:
        enabled: false
        optional: true
        require_client_auth: false
      cluster_name: cluster
      heap_newsize: 1G
      max_heap_size: 6G
      num_tokens: 256
      server_encryptions:
        internode_encryption: none
      topology:
      - 10.8.32.60=DC1:RAC1
      - 10.8.32.61=DC1:RAC1
      - 10.8.32.62=DC1:RAC1
      - 10.8.32.63=DC1:RAC1
      validate_ssl_TF: false
    release: cassandra
  name: cassandra-servers
  networks:
  - name: tf-net-coab-depls-instance
  persistent_disk_type: xlarge
  stemcell: trusty
  vm_type: large
- azs:
  - z1
  instances: 1
  jobs:
  - name: broker-smoke-tests
    properties:
      cf:
        admin:
          password: "((/secrets/cloudfoundry_admin_password))"
          username: admin
        api:
          url: https://api.((/secrets/cloudfoundry_system_domain))
        cassandra:
          appdomain: "((/secrets/cloudfoundry_apps_domain))"
          serviceinstancename: cassandra-instance
          servicename: cassandra
          serviceplan: default
        org: service-sandbox
        skip:
          ssl:
            validation: true
        space: cassandra-smoke-tests
    release: cassandra
  - consumes:
      seeds:
        from: deployment-seeds
    name: broker
    properties:
      broker:
        password: "((/secrets/cloudfoundry_service_brokers_cassandra_password))"
        user: cassandra-broker
      cassandra_seed:
        admin_password: "((!cassandra_admin_password))"
    release: cassandra
  - name: route-registrar
    properties:
      route_registrar:
        external_host: cassandra-broker-c_ee617363-8821-43da-8034-efb2d9343654.((!/secrets/cloudfoundry_system_domain))
        health_checker:
          interval: 10
          name: healthchk
        message_bus_servers:
        - host: "((/secrets/cloudfoundry_nats_host)):4222"
          password: "((/secrets/cloudfoundry_nats_password))"
          user: nats
        port: 8080
    release: route-registrar
  name: cassandra-broker
  networks:
  - name: tf-net-coab-depls-instance
  persistent_disk_type: xlarge
  stemcell: trusty
  vm_type: large
name: c_ee617363-8821-43da-8034-efb2d9343654
releases:
- name: cassandra
  version: '6'
- name: route-registrar
  version: '3'
stemcells:
- alias: trusty
  os: ubuntu-trusty
  version: '3468.25'
update:
  canaries: 1
  canary_watch_time: 30000-240000
  max_in_flight: 1
  serial: false
  update_watch_time: 30000-240000
variables:
- name: cassandra_admin_password
  type: password
- name: cassandra_key_store_pass
  type: password
Additional release information:
$ bosh releases
Using environment '192.168.99.155' as client 'xx'
Name              Version  Commit Hash
bosh-dns          0.2.0*   304d6ca
cassandra         6*       33952d4
mongodb-services  3*       688f3ec
node-exporter     1.1.0    d2706592+
os-conf           19*      22510c5
prometheus        21.1.0*  75e3e4b
route-registrar   3*       f7132692+
syslog            11*      0e06601
weave-scope       0.0.17*  f0cc5de2+
(*) Currently deployed
(+) Uncommitted changes
9 releases