Copies an entire page tree (including images and attachments) from one Confluence instance to another, preserving hierarchy and rewriting internal links.
- Source: Configured in `config.json` (source Confluence instance, space, and root page)
- Target: Configured in `config.json` (target Confluence instance, space, and parent page; the synced root page becomes a sibling of the parent's existing child pages)
- Python 3.10+
- `requests` library (`pip install requests`)
```bash
# Create a virtual environment (or use an existing one)
python3 -m venv .venv
source .venv/bin/activate
pip install requests

# Copy the config template and fill in your tokens
cp config.example.json config.json
```

All sensitive and environment-specific values are stored in `config.json`:
```json
{
  "source": {
    "base_url": "https://source-confluence.example.com",
    "token": "<SOURCE_BEARER_TOKEN>",
    "space_key": "SOURCEKEY",
    "page_id": "123456789"
  },
  "target": {
    "base_url": "https://target-confluence.example.com",
    "token": "<TARGET_BEARER_TOKEN>",
    "space_key": "TARGETKEY",
    "parent_page_id": "987654321"
  },
  "title_prefix": "[SYNC] ",
  "restrictions": {
    "users": [
      "user@example.com"
    ],
    "groups": []
  }
}
```

| Field | Description |
|---|---|
| `source.base_url` | URL of the source Confluence instance |
| `source.token` | Bearer token (API key) for the source |
| `source.space_key` | Space key of the source space |
| `source.page_id` | ID of the root page to be copied |
| `target.base_url` | URL of the target Confluence instance |
| `target.token` | Bearer token for the target |
| `target.space_key` | Space key of the target space |
| `target.parent_page_id` | ID of the parent page under which the copy is placed |
| `title_prefix` | Prefix prepended to all page titles on the target (e.g. `"[SYNC] "`). Set to `""` to disable. Useful to avoid title conflicts with existing pages. |
| `restrictions.users` | List of usernames that get read access to the synced pages |
| `restrictions.groups` | List of groups that get read access to the synced pages |

`config.json` is listed in `.gitignore` and will not be committed. Use `config.example.json` as a template.
```bash
# Full sync (fetch + create + attachments + rewrite-links + restrict)
python3 confluence_sync.py --phase all

# Dry-run: preview what would happen without making changes
python3 confluence_sync.py --phase all --dry-run

# Run individual phases
python3 confluence_sync.py --phase fetch
python3 confluence_sync.py --phase create
python3 confluence_sync.py --phase attachments
python3 confluence_sync.py --phase rewrite-links
python3 confluence_sync.py --phase restrict

# Clean up: remove all synced pages from the target
python3 confluence_sync.py --phase delete
```

The script operates in five sequential phases (plus a separate `delete` phase):
Phase 1 (`fetch`) recursively retrieves the entire page tree from the source instance:
- Page content in Confluence storage format (XHTML)
- List of attachments per page (with download URLs)
- Hierarchical structure (parent-child relationships)

The data is stored in `confluence_sync_state.json` so subsequent phases do not need to re-fetch it.
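A minimal sketch of the recursive fetch, with the HTTP calls hidden behind two hypothetical callables: `get_page(page_id)` returns a dict with `title`, `body` (storage XHTML) and an optional `attachments` list, and `get_children(page_id)` returns the child page IDs. Against the Confluence REST API these would typically wrap `GET /rest/api/content/{id}?expand=body.storage` and `GET /rest/api/content/{id}/child/page` (the latter is paginated). The state keys mirror `source_pages` and `source_tree` from the overview diagram, but the exact layout here is assumed.

```python
from typing import Callable

# Hypothetical stand-ins for the REST calls the real script makes.
GetPage = Callable[[str], dict]
GetChildren = Callable[[str], list[str]]


def fetch_tree(root_id: str, get_page: GetPage, get_children: GetChildren) -> dict:
    """Walk the source tree depth-first, collecting pages and parent-child links."""
    state: dict = {"source_pages": {}, "source_tree": {}}

    def visit(page_id: str) -> None:
        if page_id in state["source_pages"]:
            return  # guard against visiting the same page twice
        page = get_page(page_id)
        state["source_pages"][page_id] = {
            "title": page["title"],
            "body": page["body"],  # storage format (XHTML), kept verbatim
            "attachments": page.get("attachments", []),
        }
        children = get_children(page_id)
        state["source_tree"][page_id] = children  # preserves child order
        for child_id in children:
            visit(child_id)

    visit(root_id)
    state["root_id"] = root_id
    return state
```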
Phase 2 (`create`) creates all pages on the target instance:
- The root page is created as a child of `target.parent_page_id`
- Child pages are created recursively with the same hierarchy
- Page content (storage format) is transferred as-is; no conversion is needed
- The mapping from source ID to target ID is saved in the state

Resume support: if the script stops midway, pages that were already created are skipped on restart.
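A sketch of how the create phase might walk the fetched tree top-down while honoring resume support. `build_payload` produces the request body accepted by Confluence's `POST /rest/api/content` (type, title, space key, ancestor, storage-format body); `create_page` is a hypothetical callable that sends it and returns the new page ID. The state keys `root_id`, `source_pages` and `source_tree` follow the assumed layout of the fetch sketch, and how the real script persists `id_mapping` after each page is not shown.

```python
from typing import Callable

# Hypothetical: sends the payload to the target instance and returns the new page ID.
CreatePage = Callable[[dict], str]


def build_payload(title: str, body: str, space_key: str, parent_id: str) -> dict:
    """Request body for POST /rest/api/content with storage-format content."""
    return {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "ancestors": [{"id": parent_id}],
        "body": {"storage": {"value": body, "representation": "storage"}},
    }


def create_tree(state: dict, cfg: dict, create_page: CreatePage,
                dry_run: bool = False) -> None:
    """Create pages top-down, skipping any source page already in id_mapping."""
    mapping = state.setdefault("id_mapping", {})
    prefix = cfg.get("title_prefix", "")
    space_key = cfg["target"]["space_key"]

    def create(src_id: str, target_parent: str) -> None:
        if src_id in mapping:
            target_id = mapping[src_id]  # created on an earlier run (resume)
        else:
            page = state["source_pages"][src_id]
            payload = build_payload(prefix + page["title"], page["body"],
                                    space_key, target_parent)
            if dry_run:
                print(f"[dry-run] would create {payload['title']!r} under {target_parent}")
                target_id = f"dry-run-{src_id}"  # placeholder so children can be listed
            else:
                target_id = create_page(payload)
                mapping[src_id] = target_id  # a real run would also save the state here
        for child_id in state["source_tree"].get(src_id, []):
            create(child_id, target_id)

    create(state["root_id"], cfg["target"]["parent_page_id"])
```

Recording each mapping as soon as the page exists is what makes resume possible: on restart, mapped pages are reused as parents instead of being created a second time.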
Phase 3 (`attachments`) copies all attachments (images, files) from source to target:
- Downloads each attachment from the source API
- Caches it locally in the `cache/` directory
- Uploads it to the corresponding target page
- Tracks which pages are complete (resume support)
Phase 4 (`rewrite-links`) rewrites internal links in all target pages:
- `ri:content-id` references: source IDs are replaced with target IDs
- Hardcoded URLs to the source instance are rewritten to point to the target
- `ri:space-key` attributes are updated to the target space key
- Both relative and absolute URL patterns are handled
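A regex-based sketch of the four rewrite rules. It assumes internal links take the shapes named in the comments (`ri:content-id="<id>"`, `?pageId=<id>`, `/pages/<id>/`, `/spaces/<KEY>/`, `/display/<KEY>/`); the real script may recognize other patterns. IDs without an entry in `id_mapping`, such as links to pages outside the synced tree, are left unchanged.

```python
import re


def rewrite_links(body: str, id_mapping: dict[str, str], src_base: str, tgt_base: str,
                  src_space: str, tgt_space: str) -> str:
    """Point internal references in a storage-format body at the target copies."""

    def mapped(source_id: str) -> str:
        # Pages outside the synced tree have no mapping and keep their ID.
        return id_mapping.get(source_id, source_id)

    # <ri:page ri:content-id="123"/> and similar resource identifiers.
    body = re.sub(r'ri:content-id="(\d+)"',
                  lambda m: f'ri:content-id="{mapped(m.group(1))}"', body)
    # ri:space-key="SRC" -> ri:space-key="TGT".
    body = body.replace(f'ri:space-key="{src_space}"', f'ri:space-key="{tgt_space}"')
    # Absolute URLs: swap the host; the path rules below then handle both URL forms.
    body = body.replace(src_base.rstrip("/"), tgt_base.rstrip("/"))
    # Page IDs in ?pageId=123 query strings and /pages/123/ path segments.
    body = re.sub(r"(pageId=|/pages/)(\d+)",
                  lambda m: m.group(1) + mapped(m.group(2)), body)
    # Space keys in /spaces/SRC/... and /display/SRC/... paths.
    space_path = r"/(spaces|display)/" + re.escape(src_space) + r"(?=[/?#'\"]|$)"
    body = re.sub(space_path, lambda m: f"/{m.group(1)}/{tgt_space}", body)
    return body
```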
Phase 5 (`restrict`) sets a read restriction on the synced root page. Confluence view restrictions are inherited: only the listed users and groups can see the root page and all its descendants. Everyone else, including logged-in users, will not see these pages.
- The restriction is set on the root page only (children inherit it automatically)
- Users and groups are configured in `config.json` under `restrictions`
- To grant access to additional users later, add them to the `restrictions.users` list and re-run `--phase restrict`
The separate `delete` phase removes all previously synced pages from the target:
- Recursively deletes all child pages (bottom-up)
- Deletes the root page
- Resets the ID mapping and attachment status in the state
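Deleting bottom-up amounts to visiting the tree in post-order, so no page is removed while it still has children. A minimal sketch over the `source_tree` and `id_mapping` state keys; each returned ID would then be passed to a delete call such as `DELETE /rest/api/content/{id}`. The function names and the `attachments_done` key are hypothetical.

```python
def deletion_order(state: dict) -> list[str]:
    """Target page IDs in post-order: every child comes before its parent."""
    tree, mapping = state["source_tree"], state.get("id_mapping", {})
    order: list[str] = []

    def visit(src_id: str) -> None:
        for child_id in tree.get(src_id, []):
            visit(child_id)
        if src_id in mapping:  # pages that were never created have nothing to delete
            order.append(mapping[src_id])

    visit(state["root_id"])
    return order


def reset_after_delete(state: dict) -> None:
    """Forget target-side progress so the next create phase starts from scratch."""
    state["id_mapping"] = {}
    state["attachments_done"] = []
```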
| File | Description |
|---|---|
| `confluence_sync_state.json` | Contains the full sync state: fetched pages, ID mapping, progress. Managed automatically. |
| `confluence_sync.log` | Detailed log file (DEBUG level). The console only shows INFO. |
| `cache/` | Local cache of downloaded attachments. Can be deleted after a successful sync. |

All generated files are listed in `.gitignore`.
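Because every phase reads and writes `confluence_sync_state.json`, a crash during a write should not corrupt it. A minimal sketch of loading the state and saving it via a temporary file plus `os.replace`; the initial keys follow the overview diagram, and the helper names are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = Path("confluence_sync_state.json")


def empty_state() -> dict:
    # Hypothetical initial layout, using the key names from the overview diagram.
    return {"source_pages": {}, "source_tree": {}, "id_mapping": {}}


def load_state(path: Path = STATE_FILE) -> dict:
    """Return the saved state, or a fresh one if no sync has run yet."""
    if not path.exists():
        return empty_state()
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(state: dict, path: Path = STATE_FILE) -> None:
    """Write to a temporary file in the same directory, then swap it in."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, ensure_ascii=False, indent=2)
        # On POSIX the rename is atomic: readers see either the old or the new file.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # do not leave half-written temp files behind
        raise
```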
```text
Source Confluence                          Target Confluence
┌─────────────────────┐                    ┌─────────────────────────┐
│ Root page           │  ── fetch ───────> │ State (JSON)            │
│ ├── Section A       │                    │ ├── source_pages        │
│ │   ├── Page 1      │                    │ ├── source_tree         │
│ │   └── Page 2      │                    │ └── id_mapping          │
│ ├── Section B       │                    │                         │
│ └── Section C       │  ── create ──────> │ Target parent page      │
│                     │                    │ ├── Existing content    │
│ Attachments:        │                    │ └── Root page (new)     │
│ ├── image1.png      │  ── attachments ─> │     ├── Section A       │
│ └── diagram.svg     │                    │     │   ├── Page 1      │
│                     │                    │     │   └── Page 2      │
│ Internal links:     │                    │     ├── Section B       │
│ ri:content-id="123" │  ── rewrite ─────> │     └── Section C       │
│ href="/spaces/SRC"  │                    │     (links rewritten)   │
└─────────────────────┘                    └─────────────────────────┘
```
- Fetch retrieves everything and stores it locally in the state
- Create builds the page tree on the target (storage format 1:1)
- Attachments copies all files from source to target
- Rewrite updates internal links to point to the target pages
- Restrict locks down the root page so only listed users/groups can see it
- Macros: Confluence-specific macros (expand, code, toc, etc.) are copied as-is. This only works if both instances have the same plugins/macros installed.
- Permissions: Source permissions are not copied. Instead, the `restrict` phase sets new read restrictions on the root page based on `config.json`. All child pages inherit this restriction automatically.
- Titles: Page titles are transferred unchanged apart from the optional `title_prefix`. Confluence requires titles to be unique within a space, so if a page with the same resulting title already exists in the target space, the `create` phase will fail for that page.
- Rate limiting: The script has built-in pauses (0.3 to 0.5 s) between API calls and retry logic for HTTP 429/502/503/504.
- State file: Delete `confluence_sync_state.json` to start a completely clean sync, or use `--phase delete` to clean up the previous sync first.
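A sketch of the retry behavior described under Rate limiting. `send` is any zero-argument callable returning an object with `status_code` and `headers` (for example a wrapped `requests` call). The exponential backoff, the honoring of a numeric `Retry-After` header, and all default values are assumptions, not the script's documented settings.

```python
import time
from typing import Any, Callable

RETRY_STATUSES = {429, 502, 503, 504}  # status codes listed under Rate limiting


def call_with_retry(send: Callable[[], Any], retries: int = 5, base_delay: float = 1.0,
                    pause: float = 0.3, sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() and retry on 429/502/503/504 (backoff values are hypothetical)."""
    attempt = 0
    while True:
        response = send()
        if response.status_code not in RETRY_STATUSES or attempt >= retries:
            sleep(pause)  # fixed pause between calls to stay under rate limits
            return response
        retry_after = str(response.headers.get("Retry-After", ""))
        # Prefer the server's hint; otherwise back off exponentially: 1 s, 2 s, 4 s, ...
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        sleep(delay)
        attempt += 1
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; after the last retry the final response is returned so the caller can log or raise on it.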