This repository was archived by the owner on Aug 31, 2022. It is now read-only.

v0.6: nodes randomly fail to start with passwords mismatch #62

@gberche-orange

We intermittently observe the following symptom with the v0.6 release. We are not yet sure whether the root cause lies in the environment (BOSH director or infrastructure) or in the cassandra v0.6 release itself.

Task 128129 | 07:27:33 | Updating instance cassandra-seeds: cassandra-seeds/41da6c3c-0d85-4049-b5b6-7ee8b34f6cfa (0) (canary) (00:01:48)
                      L Error: Action Failed get_task: Task 7b430e3e-cf68-4a3f-43c4-d2e16c9fbfc8 result: 1 of 2 post-start scripts failed. Failed Jobs: cassandra. Successful Jobs: bosh-dns.
Task 128129 | 07:29:34 | Updating instance cassandra-servers: cassandra-servers/e2699534-66fa-4372-b18c-82d963c2ff4f (0) (canary) (00:02:05)
                      L Error: Action Failed get_task: Task f2c8e89c-dc44-4a26-5ae0-8a8fb18a70af result: 1 of 2 post-start scripts failed. Failed Jobs: cassandra. Successful Jobs: bosh-dns.

However, after a few minutes, the deployment status shows the Cassandra nodes as running:

$ bosh instances
Using environment '192.168.99.155' as client 'xx'

Task 128310. Done

Deployment 'c_072dd24d-c2aa-486c-88c6-6c362ae4f609'

Instance                                                Process State  AZ  IPs  
cassandra-broker/2f5ccb89-83d1-4d59-bb87-8254daeb694a   failing        z1  192.168.211.34  
cassandra-seeds/41da6c3c-0d85-4049-b5b6-7ee8b34f6cfa    running        z1  192.168.211.25  
cassandra-seeds/7e405a54-1acb-4fdd-a015-8650651b1e1a    running        z1  192.168.211.31  
cassandra-seeds/f3fc209a-7117-49e7-9924-5c0d84f3b5fa    running        z1  192.168.211.32  
cassandra-servers/e2699534-66fa-4372-b18c-82d963c2ff4f  running        z1  192.168.211.33  

5 instances
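To spot instances that are still unhealthy without reading the whole table, the output above can be filtered. The following is only a minimal sketch: `not_running` is a hypothetical helper name, and it assumes the plain-text table layout shown above (instance name in the first column, process state in the second).

```shell
# Hypothetical helper, not part of the bosh CLI.
# Reads `bosh instances` table output on stdin and prints every instance
# whose process state is not "running".
not_running() {
  # Instance rows are the only lines whose first field contains a "/"
  # (e.g. cassandra-seeds/<uuid>); headers and summary lines are skipped.
  awk '$1 ~ /\// && $2 != "running" { print $1, $2 }'
}
```

It would be used as `bosh -d <deployment> instances | not_running`. On the listing above it would report only the cassandra-broker instance as failing.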

Looking at /var/vcap/sys/log/cassandra/post-start.stderr.log on cassandra-seeds/41da6c3c-0d85-4049-b5b6-7ee8b34f6cfa, we repeatedly see:

2018-07-03_07:39:53: DEBUG: setting first password, exit status: '1'
2018-07-03_07:39:53: INFO: verifying that the current password is the desired password
Connection error: ('Unable to connect to any servers', {'192.168.211.25': AuthenticationFailed('Failed to authenticate to 192.168.211.25: Error from server: code=0100 [Bad credentials] message="Provided username cassandra and/or password are incorrect"',)})
2018-07-03_07:39:53: DEBUG: verifying current password, exit status: '1'
2018-07-03_07:39:53: ERROR: the password for user 'cassandra' is inconsistent. Aborting.
2018-07-03_07:44:21: INFO: reached Cassandra on '192.168.211.25:9042' after '14' attemps. Waiting 30 more seconds for the service to be available.

2018-07-03_07:44:51: INFO: setting first password
Connection error: ('Unable to connect to any servers', {'192.168.211.25': AuthenticationFailed('Failed to authenticate to 192.168.211.25: Error from server: code=0100 [Bad credentials] message="Provided username cassandra and/or password are incorrect"',)})
2018-07-03_07:44:52: DEBUG: setting first password, exit status: '1'
2018-07-03_07:44:52: INFO: verifying that the current password is the desired password
Connection error: ('Unable to connect to any servers', {'192.168.211.25': AuthenticationFailed('Failed to authenticate to 192.168.211.25: Error from server: code=0100 [Bad credentials] message="Provided username cassandra and/or password are incorrect"',)})
2018-07-03_07:44:52: DEBUG: verifying current password, exit status: '1'
2018-07-03_07:44:52: ERROR: the password for user 'cassandra' is inconsistent. Aborting.
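The log above suggests that post-start first tries to set the password and then checks that the desired password works. To check by hand which password a node currently accepts, something like the following could be run on the node. This is a diagnostic sketch, not the release's actual post-start script: the function names and the `CQLSH` override are hypothetical, and it assumes the stock Cassandra default superuser password `cassandra` as the "first" password.

```shell
# Diagnostic sketch (hypothetical names; not taken from the cassandra release).
# CQLSH can be overridden, e.g. to point at a specific cqlsh binary.
CQLSH="${CQLSH:-cqlsh}"

can_login() {
  # $1: host, $2: password to try for user 'cassandra'.
  # Succeeds only if authentication works and a trivial query runs.
  "$CQLSH" -u cassandra -p "$2" \
    -e "SELECT release_version FROM system.local;" "$1" 9042 >/dev/null 2>&1
}

password_state() {
  # $1: host, $2: desired password (the value rendered for cass_pwd).
  if can_login "$1" "$2"; then
    echo "desired"        # password already matches the manifest
  elif can_login "$1" "cassandra"; then
    echo "default"        # assumed stock default is still in place
  else
    echo "inconsistent"   # neither works, as in the ERROR lines above
  fi
}
```

For example, `password_state 192.168.211.25 "$DESIRED_PASSWORD"` printing `inconsistent` would match the state reported in the log, meaning the stored password is neither the default nor the one from the manifest.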

The following is the associated BOSH deployment manifest:

---
instance_groups:
- azs:
  - z1
  env:
    bosh:
      remove_dev_tools: true
      swap_size: 0
  instances: 3
  jobs:
  - consumes:
      seeds:
        from: deployment-seeds
    name: cassandra
    properties:
      cass_KSP: "((!cassandra_key_store_pass))"
      cass_pwd: "((!cassandra_admin_password))"
      cassandra_ssl_YN: false
      client_encryption_options:
        enabled: false
        optional: true
        require_client_auth: false
      cluster_name: cluster
      heap_newsize: 1G
      max_heap_size: 6G
      num_tokens: 256
      server_encryptions:
        internode_encryption: none
      topology:
      - 10.8.32.60=DC1:RAC1
      - 10.8.32.61=DC1:RAC1
      - 10.8.32.62=DC1:RAC1
      - 10.8.32.63=DC1:RAC1
      validate_ssl_TF: false
    provides:
      seeds:
        as: deployment-seeds
    release: cassandra
  name: cassandra-seeds
  networks:
  - name: tf-net-coab-depls-instance
  persistent_disk_type: xlarge
  stemcell: trusty
  vm_type: large
- azs:
  - z1
  env:
    bosh:
      remove_dev_tools: true
      swap_size: 100
  instances: 1
  jobs:
  - consumes:
      seeds:
        from: deployment-seeds
    name: cassandra
    properties:
      cass_KSP: "((!cassandra_key_store_pass))"
      cass_pwd: "((!cassandra_admin_password))"
      cassandra_ssl_YN: false
      client_encryption_options:
        enabled: false
        optional: true
        require_client_auth: false
      cluster_name: cluster
      heap_newsize: 1G
      max_heap_size: 6G
      num_tokens: 256
      server_encryptions:
        internode_encryption: none
      topology:
      - 10.8.32.60=DC1:RAC1
      - 10.8.32.61=DC1:RAC1
      - 10.8.32.62=DC1:RAC1
      - 10.8.32.63=DC1:RAC1
      validate_ssl_TF: false
    release: cassandra
  name: cassandra-servers
  networks:
  - name: tf-net-coab-depls-instance
  persistent_disk_type: xlarge
  stemcell: trusty
  vm_type: large
- azs:
  - z1
  instances: 1
  jobs:
  - name: broker-smoke-tests
    properties:
      cf:
        admin:
          password: "((/secrets/cloudfoundry_admin_password))"
          username: admin
        api:
          url: https://api.((/secrets/cloudfoundry_system_domain))
        cassandra:
          appdomain: "((/secrets/cloudfoundry_apps_domain))"
          serviceinstancename: cassandra-instance
          servicename: cassandra
          serviceplan: default
        org: service-sandbox
        skip:
          ssl:
            validation: true
        space: cassandra-smoke-tests
    release: cassandra
  - consumes:
      seeds:
        from: deployment-seeds
    name: broker
    properties:
      broker:
        password: "((/secrets/cloudfoundry_service_brokers_cassandra_password))"
        user: cassandra-broker
      cassandra_seed:
        admin_password: "((!cassandra_admin_password))"
    release: cassandra
  - name: route-registrar
    properties:
      route_registrar:
        external_host: cassandra-broker-c_ee617363-8821-43da-8034-efb2d9343654.((!/secrets/cloudfoundry_system_domain))
        health_checker:
          interval: 10
          name: healthchk
        message_bus_servers:
        - host: "((/secrets/cloudfoundry_nats_host)):4222"
          password: "((/secrets/cloudfoundry_nats_password))"
          user: nats
        port: 8080
    release: route-registrar
  name: cassandra-broker
  networks:
  - name: tf-net-coab-depls-instance
  persistent_disk_type: xlarge
  stemcell: trusty
  vm_type: large
name: c_ee617363-8821-43da-8034-efb2d9343654
releases:
- name: cassandra
  version: '6'
- name: route-registrar
  version: '3'
stemcells:
- alias: trusty
  os: ubuntu-trusty
  version: '3468.25'
update:
  canaries: 1
  canary_watch_time: 30000-240000
  max_in_flight: 1
  serial: false
  update_watch_time: 30000-240000
variables:
- name: cassandra_admin_password
  type: password
- name: cassandra_key_store_pass
  type: password

Additional information: releases known to the director

$ bosh releases
Using environment '192.168.99.155' as client 'xx'

Name              Version  Commit Hash  
bosh-dns          0.2.0*   304d6ca  
cassandra         6*       33952d4  
mongodb-services  3*       688f3ec  
node-exporter     1.1.0    d2706592+  
os-conf           19*      22510c5  
prometheus        21.1.0*  75e3e4b  
route-registrar   3*       f7132692+  
syslog            11*      0e06601  
weave-scope       0.0.17*  f0cc5de2+  

(*) Currently deployed
(+) Uncommitted changes

9 releases

/CC @JCL38-ORANGE @poblin-orange
