Cross-site replication does not work with only 2 units in each cluster #1273

@mastier

Description

Steps to reproduce

  1. Deploy 2 clusters in 2 models, each with just 2 postgresql units.
  2. Wait for each cluster to establish sync replication.
  3. Set up cross-site replication as advised in the docs.
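
The steps above can be sketched as a command list. This is only a rough sketch: the model names `rome` and `lisbon` are placeholders, and the endpoint names (`replication-offer`, `replication`) and the `create-replication` action are recalled from the charm's cross-site replication how-to, not stated in this report, so verify them against the docs for your revision. The script only prints the commands; it does not run them.

```python
def repro_commands(primary="rome", standby="lisbon",
                   channel="16/stable", units=2):
    """Return the Juju commands for the reproduction, in order.

    Model names are hypothetical placeholders. Endpoint and action
    names are assumptions taken from the charm documentation.
    """
    cmds = []
    # Steps 1 and 2: one 2-unit cluster per model.
    for model in (primary, standby):
        cmds.append(f"juju add-model {model}")
        cmds.append(f"juju deploy postgresql --channel {channel} -n {units}")
    # Step 3: offer the replication endpoint from the primary model ...
    cmds.append(f"juju switch {primary}")
    cmds.append("juju offer postgresql:replication-offer replication-offer")
    # ... consume and integrate it in the standby model ...
    cmds.append(f"juju switch {standby}")
    cmds.append(f"juju consume {primary}.replication-offer")
    cmds.append("juju integrate replication-offer postgresql:replication")
    # ... then start replication from the primary's leader unit.
    cmds.append(f"juju run -m {primary} postgresql/leader create-replication")
    return cmds


if __name__ == "__main__":
    for command in repro_commands():
        print(command)
```

Wait for both clusters to settle (step 2) before running the commands from `juju switch` onward.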

Expected behavior

Cross-site replication is configured and the clusters sync.

Actual behavior

The standby cluster (the one that receives the replication) stays stuck in the waiting state:

postgresql/0*                         waiting   idle        0        10.46.16.24     5432/tcp  Still starting the database in the standby leader
postgresql/1                          waiting   idle        1        10.46.16.23     5432/tcp  Waiting for the standby leader start the database

Versions

24.04.5

Juju CLI:
3.6.11

Juju agent:
3.6.11

Charm revision:
rev952 (16/stable)

LXD:
5.21.6

Log output

==> /var/snap/charmed-postgresql/common/var/log/patroni/patroni.log.7 <==                           
2025-10-29 17:57:34 UTC [195876]: INFO: Lock owner: None; I am postgresql-0                         
2025-10-29 17:57:34 UTC [195876]: INFO: trying to bootstrap a new standby leader                    
2025-10-29 17:57:34 UTC [195876]: ERROR: Error when fetching backup: pg_basebackup exited with code=1
2025-10-29 17:57:34 UTC [195876]: WARNING: Trying again in 5 seconds                                
2025-10-29 17:57:39 UTC [195876]: ERROR: Error when fetching backup: pg_basebackup exited with code=1
2025-10-29 17:57:39 UTC [195876]: ERROR: failed to bootstrap clone from remote member postgresql://10.46.28.24:5432
                                                                                                    
==> /var/snap/charmed-postgresql/common/var/log/patroni/patroni.log.70 <==                          
2025-10-29 17:55:48 UTC [193558]: ERROR: Could not remove data directory /var/snap/charmed-postgresql/common/var/lib/postgresql
Traceback (most recent call last):                                                                  
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 1346, in remove_data_directory
    shutil.rmtree(self._data_dir)                                                                   
  File "/usr/lib/python3.12/shutil.py", line 796, in rmtree                                         
    onexc(os.rmdir, path, err)                                                                      
  File "/usr/lib/python3.12/shutil.py", line 794, in rmtree                                         
    os.rmdir(path, dir_fd=dir_fd)                                                                   
OSError: [Errno 16] Device or resource busy: '/var/snap/charmed-postgresql/common/var/lib/postgresql'
                                                                                                    
==> /var/snap/charmed-postgresql/common/var/log/patroni/patroni.log.73 <==                          
2025-10-29 17:55:40 UTC [193505]: ERROR: Could not rename data directory /var/snap/charmed-postgresql/common/var/lib/postgresql
Traceback (most recent call last):                                                                  
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 1319, in move_data_directory
    os.rename(self._data_dir, new_name)                                                             
OSError: [Errno 16] Device or resource busy: '/var/snap/charmed-postgresql/common/var/lib/postgresql' -> '/var/snap/charmed-postgresql/common/var/lib/postgresql.failed'
2025-10-29 17:55:43 UTC [193558]: INFO: No PostgreSQL configuration items changed, nothing to reload.
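
For triage, a small script can tally the ERROR lines in the Patroni logs and check one possible cause of the `Errno 16` failures. This is a sketch under assumptions: the line format is inferred from the excerpts above, and the idea that the data directory is a mount point (which would make `os.rmdir` and `os.rename` on it fail with "Device or resource busy") is a hypothesis, not something this report confirms.

```python
import os
import re
from collections import Counter

# Format inferred from the excerpts above, e.g.
# 2025-10-29 17:57:34 UTC [195876]: ERROR: Error when fetching backup: ...
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC \[\d+\]: "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def count_errors(lines):
    """Count ERROR messages; traceback and file-header lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[match.group("msg")] += 1
    return counts


def is_mount_point(path):
    """True if path is a mount point (hypothesised cause of EBUSY)."""
    return os.path.ismount(path)


if __name__ == "__main__":
    sample = [
        "2025-10-29 17:57:34 UTC [195876]: ERROR: Error when fetching backup: pg_basebackup exited with code=1",
        "2025-10-29 17:57:34 UTC [195876]: WARNING: Trying again in 5 seconds",
        "2025-10-29 17:57:39 UTC [195876]: ERROR: Error when fetching backup: pg_basebackup exited with code=1",
    ]
    for message, n in count_errors(sample).most_common():
        print(n, message)
    data_dir = "/var/snap/charmed-postgresql/common/var/lib/postgresql"
    print("mount point:", is_mount_point(data_dir))
```

Run it on the affected unit so the mount check sees the real filesystem. If the directory turns out to be a mount point, that would be consistent with Patroni being unable to rename or remove it before retrying the bootstrap.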

Additional context

We discussed this internally, and it seems the problem was already spotted in the internal Jira ticket DPE-8748.

    Labels: bug