Confluence Sync

Copies an entire page tree (including images and attachments) from one Confluence instance to another, preserving hierarchy and rewriting internal links.

Current configuration

Source: Configured in config.json (source Confluence instance, space, and root page)
Target: Configured in config.json (target Confluence instance, space, and parent page; the synced root page is created under this parent, alongside any existing child pages)

Requirements

Python 3.10+
requests library (pip install requests)

Installation

# Create a virtual environment (or use an existing one)
python3 -m venv .venv
source .venv/bin/activate
pip install requests

# Copy the config template and fill in your tokens
cp config.example.json config.json

Configuration

All sensitive and environment-specific values are stored in config.json:

{
  "source": {
    "base_url": "https://source-confluence.example.com",
    "token": "<SOURCE_BEARER_TOKEN>",
    "space_key": "SOURCEKEY",
    "page_id": "123456789"
  },
  "target": {
    "base_url": "https://target-confluence.example.com",
    "token": "<TARGET_BEARER_TOKEN>",
    "space_key": "TARGETKEY",
    "parent_page_id": "987654321"
  },
  "title_prefix": "[SYNC] ",
  "restrictions": {
    "users": [
      "user@example.com"
    ],
    "groups": []
  }
}

Field	Description
`source.base_url`	URL of the source Confluence instance
`source.token`	Bearer token (API key) for the source
`source.space_key`	Space key of the source space
`source.page_id`	ID of the root page to be copied
`target.base_url`	URL of the target Confluence instance
`target.token`	Bearer token for the target
`target.space_key`	Space key of the target space
`target.parent_page_id`	ID of the parent page under which the copy is placed
`title_prefix`	Prefix prepended to all page titles on the target (e.g. `"[SYNC] "`). Set to `""` to disable. Useful to avoid title conflicts with existing pages.
`restrictions.users`	List of usernames that get read access to the synced pages
`restrictions.groups`	List of groups that get read access to the synced pages

config.json is listed in .gitignore and will not be committed. Use config.example.json as a template.
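
As a rough illustration of how the fields in the table could be checked before a sync starts, here is a minimal sketch. The helper names `validate_config` and `load_config` are hypothetical and not taken from confluence_sync.py, and treating `title_prefix` and `restrictions` as optional with empty defaults is an assumption.

```python
import json
from pathlib import Path

# Required keys per section, taken from the field table above.
REQUIRED = {
    "source": ("base_url", "token", "space_key", "page_id"),
    "target": ("base_url", "token", "space_key", "parent_page_id"),
}


def validate_config(cfg: dict) -> dict:
    """Return cfg with defaults applied, or raise ValueError listing missing fields."""
    missing = [
        f"{section}.{key}"
        for section, keys in REQUIRED.items()
        for key in keys
        if not cfg.get(section, {}).get(key)
    ]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    # Assumption: optional fields default to "no prefix" and "no restrictions".
    cfg.setdefault("title_prefix", "")
    restrictions = cfg.setdefault("restrictions", {})
    restrictions.setdefault("users", [])
    restrictions.setdefault("groups", [])
    return cfg


def load_config(path: str = "config.json") -> dict:
    """Read and validate the JSON config file."""
    return validate_config(json.loads(Path(path).read_text(encoding="utf-8")))
```

Failing early with a list of missing fields avoids discovering an empty token halfway through a long sync.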

Usage

# Full sync (fetch + create + attachments + rewrite-links + restrict)
python3 confluence_sync.py --phase all

# Dry-run: preview what would happen without making changes
python3 confluence_sync.py --phase all --dry-run

# Run individual phases
python3 confluence_sync.py --phase fetch
python3 confluence_sync.py --phase create
python3 confluence_sync.py --phase attachments
python3 confluence_sync.py --phase rewrite-links
python3 confluence_sync.py --phase restrict

# Clean up: remove all synced pages from the target
python3 confluence_sync.py --phase delete
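
The command-line surface shown above could be parsed along these lines. This is only a sketch: the function names are hypothetical, and making `--phase` a required argument is an assumption about the real script.

```python
import argparse

# The five sync phases, in the order in which "all" runs them.
PHASES = ["fetch", "create", "attachments", "rewrite-links", "restrict"]


def parse_args(argv=None):
    """Parse --phase and --dry-run as used in the examples above."""
    parser = argparse.ArgumentParser(description="Confluence Sync")
    parser.add_argument("--phase", required=True, choices=PHASES + ["all", "delete"])
    parser.add_argument("--dry-run", action="store_true")
    return parser.parse_args(argv)


def phases_to_run(phase: str) -> list[str]:
    """Expand 'all' to the five sync phases; 'delete' is never implied by 'all'."""
    return list(PHASES) if phase == "all" else [phase]
```

Keeping `delete` out of the `all` expansion matches the usage above, where cleanup is always requested explicitly.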

Phases

The script operates in 5 sequential phases (plus a separate delete phase):

1. Fetch (`--phase fetch`)

Recursively retrieves the entire page tree from the source instance:

Page content in Confluence storage format (XHTML)
List of attachments per page (with download URLs)
Hierarchical structure (parent-child relationships)

The data is stored in confluence_sync_state.json so subsequent phases do not need to re-fetch.
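
The recursive walk can be sketched as follows. The names `source_pages` and `source_tree` come from the state diagram further down, but their exact shape is an assumption. `get_page` and `get_children` are hypothetical callables that would wrap the actual REST requests, injected here so the traversal can be tried without network access.

```python
from typing import Callable


def fetch_tree(
    page_id: str,
    get_page: Callable[[str], dict],
    get_children: Callable[[str], list[str]],
) -> tuple[dict[str, dict], dict[str, list[str]]]:
    """Walk the page tree depth-first and return (source_pages, source_tree)."""
    source_pages: dict[str, dict] = {}
    source_tree: dict[str, list[str]] = {}

    def visit(pid: str) -> None:
        source_pages[pid] = get_page(pid)   # content, title, attachment list, ...
        children = get_children(pid)        # IDs of direct child pages
        source_tree[pid] = children
        for child in children:
            visit(child)

    visit(page_id)
    return source_pages, source_tree
```

For very deep page trees an explicit stack would avoid Python's recursion limit; the recursive form is kept here for readability.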

2. Create (`--phase create`)

Creates all pages on the target instance:

Root page is created as a child of target.parent_page_id
Child pages are recursively created with the same hierarchy
Page content (storage format) is transferred as-is, no conversion needed
Mapping from source ID to target ID is saved in state

Resume support: if the script stops midway, already created pages are skipped on restart.
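
One way to get this resume behaviour is to consult the ID mapping before every create call. The sketch below assumes the mapping is a plain dict from source ID to target ID; `create_page` is a hypothetical callable that would perform the API request and return the new target ID.

```python
from typing import Callable


def create_pages(
    root_id: str,
    source_tree: dict[str, list[str]],
    id_mapping: dict[str, str],
    target_parent_id: str,
    create_page: Callable[[str, str], str],
) -> dict[str, str]:
    """Create pages parent-first, skipping any source ID already in id_mapping."""

    def visit(source_id: str, parent_target_id: str) -> None:
        if source_id not in id_mapping:
            # create_page(source_id, parent_target_id) returns the new target ID.
            id_mapping[source_id] = create_page(source_id, parent_target_id)
        for child in source_tree.get(source_id, []):
            visit(child, id_mapping[source_id])

    visit(root_id, target_parent_id)
    return id_mapping
```

Because a parent is always mapped before its children are visited, a restarted run can attach new children to pages created by an earlier, interrupted run.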

3. Attachments (`--phase attachments`)

Copies all attachments (images, files) from source to target:

Downloads each attachment from the source API
Caches locally in the cache/ directory
Uploads to the corresponding target page
Tracks which pages are complete (resume support)
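
The download, cache, upload, and tracking steps above might fit together as in this sketch. The cache layout (`cache/<source page ID>/<file name>`), the `done_pages` set, and the `download` and `upload` callables are all assumptions made for illustration, not the script's actual internals.

```python
from pathlib import Path
from typing import Callable


def sync_attachments(
    attachments: dict[str, list[str]],          # source page ID -> file names
    id_mapping: dict[str, str],                 # source page ID -> target page ID
    done_pages: set[str],                       # source page IDs already completed
    download: Callable[[str, str], bytes],      # (source page ID, name) -> content
    upload: Callable[[str, str, bytes], None],  # (target page ID, name, content)
    cache_dir: Path = Path("cache"),
) -> None:
    """Copy attachments page by page, reusing cached files and skipping done pages."""
    for source_id, names in attachments.items():
        if source_id in done_pages:
            continue  # resume support: this page was finished in an earlier run
        for name in names:
            cached = cache_dir / source_id / name
            if not cached.exists():
                cached.parent.mkdir(parents=True, exist_ok=True)
                cached.write_bytes(download(source_id, name))
            upload(id_mapping[source_id], name, cached.read_bytes())
        done_pages.add(source_id)
```

A page is marked complete only after all of its attachments have been uploaded, so an interrupted page is retried in full while its downloads are served from the cache.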

4. Rewrite Links (`--phase rewrite-links`)

Rewrites internal links in all target pages:

ri:content-id references: source IDs are replaced with target IDs
Hardcoded URLs to the source instance are rewritten to the target
ri:space-key attributes are updated to the target space key
Both relative and absolute URL patterns are handled
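
The rewrite rules listed above can be approximated with a few substitutions over the storage-format body. This is a simplified sketch, not the script's actual patterns: the function name is hypothetical, and the two URL shapes handled (`pageId=<id>` query parameters and `/pages/<id>` path segments) are assumptions about which link forms occur.

```python
import re


def rewrite_links(
    body: str,
    id_mapping: dict[str, str],
    source_base: str,
    target_base: str,
    source_space: str,
    target_space: str,
) -> str:
    """Point internal links in a storage-format body at the target instance."""

    def map_id(source_id: str) -> str:
        # IDs outside the synced tree are left unchanged.
        return id_mapping.get(source_id, source_id)

    # ri:content-id references: source ID -> target ID.
    body = re.sub(
        r'ri:content-id="(\d+)"',
        lambda m: f'ri:content-id="{map_id(m.group(1))}"',
        body,
    )
    # ri:space-key attributes: source space -> target space.
    body = body.replace(f'ri:space-key="{source_space}"', f'ri:space-key="{target_space}"')
    # Absolute URLs: swap the instance base URL.
    body = body.replace(source_base.rstrip("/"), target_base.rstrip("/"))
    # Page IDs inside URLs, relative or absolute.
    body = re.sub(
        r"(pageId=|/pages/)(\d+)",
        lambda m: m.group(1) + map_id(m.group(2)),
        body,
    )
    # Space key inside URL paths.
    return body.replace(f"/spaces/{source_space}/", f"/spaces/{target_space}/")
```

Links to pages outside the synced tree keep their original ID, so after the base URL swap they may need manual review.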

5. Restrict (`--phase restrict`)

Sets a read restriction on the synced root page. Confluence uses restriction inheritance: only the listed users and groups can see the root page and all its descendants. Everyone else — including logged-in users — will not see these pages.

Restriction is set on the root page only (children inherit automatically)
Users and groups are configured in config.json under restrictions
To grant access to additional users later, add them to the restrictions.users list and re-run --phase restrict
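
For illustration only, the read restriction could be expressed as a request body like the one built below. The payload shape is an assumption modelled on Confluence's content restriction REST resource and may differ between Confluence versions and deployments; check the API documentation of your target instance before relying on it. The function name is hypothetical.

```python
def build_read_restriction(users: list[str], groups: list[str]) -> list[dict]:
    """Build a read-restriction body from restrictions.users and restrictions.groups.

    Assumption: field names follow the content restriction resource as commonly
    documented; they are not taken from confluence_sync.py.
    """
    return [
        {
            "operation": "read",
            "restrictions": {
                "user": [{"type": "known", "username": name} for name in users],
                "group": [{"type": "group", "name": name} for name in groups],
            },
        }
    ]
```

Since the body is rebuilt from config.json on every run, re-running the phase after editing the lists would replace the previous restriction rather than accumulate entries.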

6. Delete (`--phase delete`)

Removes all previously synced pages from the target:

Recursively deletes all child pages (bottom-up)
Deletes the root page
Resets the ID mapping and attachment status in the state
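
Bottom-up deletion corresponds to a post-order traversal of the page tree: each page is removed only after all of its descendants. A minimal sketch, assuming the tree is stored as a dict from page ID to child IDs (the function name is hypothetical):

```python
def deletion_order(root_id: str, tree: dict[str, list[str]]) -> list[str]:
    """Return page IDs in post-order, so every page follows all its descendants."""
    order: list[str] = []

    def visit(page_id: str) -> None:
        for child in tree.get(page_id, []):
            visit(child)
        order.append(page_id)

    visit(root_id)
    return order
```

The root page always comes last in the returned list, matching the order of the steps above.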

Generated files

File	Description
`confluence_sync_state.json`	Contains the full sync state: fetched pages, ID mapping, progress. Managed automatically.
`confluence_sync.log`	Detailed log file (DEBUG level). Console only shows INFO.
`cache/`	Local cache of downloaded attachments. Can be deleted after a successful sync.

All generated files are listed in .gitignore.

How it works

Source Confluence                        Target Confluence
┌─────────────────────┐                  ┌─────────────────────────┐
│ Root page           │   ── fetch ──>   │ State (JSON)            │
│ ├── Section A       │                  │ ├── source_pages        │
│ │   ├── Page 1      │                  │ ├── source_tree         │
│ │   └── Page 2      │                  │ └── id_mapping          │
│ ├── Section B       │                  │                         │
│ └── Section C       │   ── create ──>  │ Target parent page      │
│                     │                  │ ├── Existing content    │
│ Attachments:        │                  │ └── Root page (new)     │
│ ├── image1.png      │ ─ attachments ─> │     ├── Section A       │
│ └── diagram.svg     │                  │     │   ├── Page 1      │
│                     │                  │     │   └── Page 2      │
│ Internal links:     │                  │     ├── Section B       │
│ ri:content-id="123" │ ─ rewrite ────>  │     └── Section C       │
│ href="/spaces/SRC"  │                  │     (links rewritten)   │
└─────────────────────┘                  └─────────────────────────┘

Fetch retrieves everything and stores it locally in the state
Create builds the page tree on the target (storage format 1:1)
Attachments copies all files from source to target
Rewrite updates internal links to point to the target pages
Restrict locks down the root page so only listed users/groups can see it

Important notes

Macros: Confluence-specific macros (expand, code, toc, etc.) are copied as-is. This only works if both instances support the same plugins/macros.
Permissions: Source permissions are not copied. Instead, the restrict phase sets new read restrictions on the root page based on config.json. All child pages inherit this restriction automatically.
Titles: Page titles are transferred unchanged apart from the configured title_prefix. If a page with the resulting title already exists in the target space, the create phase will fail for that page.
Rate limiting: The script has built-in pauses (0.3-0.5s) between API calls and retry logic for HTTP 429/502/503/504.
State file: Delete confluence_sync_state.json to start a completely clean sync, or run --phase delete first to remove the pages created by the previous sync.
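
The retry behaviour mentioned in the rate limiting note could look roughly like this. It is a sketch under assumptions: the attempt limit, the exponential backoff, and the handling of a numeric Retry-After header are illustrative choices, not the script's documented behaviour. `send` is a hypothetical zero-argument callable returning an object with `status_code` and `headers`, such as a `requests.Response`.

```python
import time
from typing import Callable

# Status codes that the note above lists as retried.
RETRY_STATUSES = {429, 502, 503, 504}


def call_with_retry(
    send: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRY_STATUSES or attempt == max_attempts:
            return response
        # Use a numeric Retry-After header if present, else back off exponentially.
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
```

Passing `sleep` as a parameter keeps the function easy to exercise in tests without real waiting.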
