42 commits
8218aa0
draft: docs with ai slop
invalid-email-address Aug 18, 2025
5519647
fix
Enjection Aug 18, 2025
fe26631
fix
Enjection Aug 18, 2025
c0de6ed
fix
Enjection Aug 18, 2025
c7a2a3a
fix
Enjection Aug 18, 2025
0b8419b
fix
Enjection Aug 18, 2025
5cf554b
cleanup
Enjection Aug 18, 2025
29f92ef
cleanup
Enjection Aug 18, 2025
af80c04
cleanup
Enjection Aug 19, 2025
0e28fec
fix
Enjection Aug 19, 2025
16a36b5
recreate from scratch
Enjection Aug 19, 2025
bddc4ba
recreate from scratch
Enjection Aug 19, 2025
8bed6ff
recreate from scratch
Enjection Aug 19, 2025
5c01c3a
recreate from scratch
Enjection Aug 19, 2025
e691e5e
fix
Enjection Aug 21, 2025
9ce392b
fix
Enjection Aug 21, 2025
7ad4eb0
fix
Enjection Aug 21, 2025
b01fa24
fix
Enjection Aug 21, 2025
c85994c
fix
Enjection Aug 21, 2025
f5cc5e2
fix
Enjection Aug 21, 2025
fee51e0
fix
Enjection Aug 21, 2025
1371189
fix
Enjection Aug 21, 2025
3a4e698
fix
Enjection Aug 21, 2025
72ab909
fix
Enjection Aug 21, 2025
c805b5e
fix
Enjection Aug 21, 2025
b032be3
fix
Enjection Aug 21, 2025
e3361ce
fix
Enjection Aug 21, 2025
3c552c8
fix
Enjection Aug 22, 2025
4691bea
fix
Enjection Aug 22, 2025
38e8af7
fix
Enjection Aug 22, 2025
0447d45
fix
Enjection Aug 22, 2025
7e86d65
fix
Enjection Aug 22, 2025
38decad
fix
Enjection Aug 22, 2025
89df5ca
fix
Enjection Aug 22, 2025
dd7fe41
fix
Enjection Aug 22, 2025
909f520
fix
Enjection Aug 24, 2025
e2488dc
fix
Enjection Aug 24, 2025
1333fc4
fix
Enjection Aug 24, 2025
ee661ed
fix
Enjection Aug 24, 2025
2f614e6
fix
Enjection Aug 24, 2025
3745eb3
fix
Enjection Aug 24, 2025
59fcb3a
fix
Enjection Aug 24, 2025
160 changes: 160 additions & 0 deletions ydb/docs/en/core/concepts/backup-collections.md
@@ -0,0 +1,160 @@
# Backup Collections {#backup-collections}

Backup collections provide an advanced backup solution for YDB that organizes full and incremental backups into managed collections. This approach is designed for production workloads requiring efficient disaster recovery and point-in-time recovery capabilities.

## What are backup collections? {#what-are-backup-collections}

A backup collection is a named set of coordinated backups for selected database tables. Collections organize related backups and ensure they can be restored together consistently, providing:

- **Efficiency**: Incremental backups capture only changes since the previous backup.
- **Organization**: Related backups are grouped into logical collections.
- **Recovery flexibility**: Data can be restored to the point captured by any backup in the chain.

## Core concepts {#core-concepts}

### Backup collection {#backup-collection}

A named container that groups backups for a specific set of database tables. Collections ensure that all included tables are backed up consistently.

### Full backup {#full-backup}

A complete snapshot of all selected tables at a specific point in time. Serves as the baseline for subsequent incremental backups and contains all data needed for independent restoration.

### Incremental backup {#incremental-backup}

Captures only the changes (inserts, updates, deletes) since the previous backup in the chain. Significantly smaller than full backups for datasets with limited changes.

### Backup chain {#backup-chain}

An ordered sequence of backups starting with a full backup followed by zero or more incremental backups. Each incremental backup depends on all previous backups in the chain for complete restoration.
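
The dependency rule can be illustrated with a small model. The sketch below is not YDB code: the `Backup` class, the `backups_needed` function, and the backup names are hypothetical and only show which members of a chain a restore to a given point would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str  # illustrative label, not a YDB identifier
    kind: str  # "full" or "incremental"


def backups_needed(chain: list[Backup], target: int) -> list[Backup]:
    """Return the backups required to restore the state at chain[target]."""
    if not chain or chain[0].kind != "full":
        raise ValueError("a chain must start with a full backup")
    if any(b.kind != "incremental" for b in chain[1:]):
        raise ValueError("only incremental backups may follow the full backup")
    if not 0 <= target < len(chain):
        raise IndexError("target is outside the chain")
    # The full backup plus every incremental up to and including the target.
    return chain[: target + 1]


chain = [
    Backup("full_0", "full"),
    Backup("incr_1", "incremental"),
    Backup("incr_2", "incremental"),
]
print([b.name for b in backups_needed(chain, 1)])  # ['full_0', 'incr_1']
```

In this model, losing any backup in the middle of the chain makes every later incremental unusable, which is why the chain is treated as a unit.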

## Architecture and components {#architecture}

### Backup flow {#backup-flow}

1. **Collection creation**: Define which tables to include and the storage settings.
2. **Initial full backup**: Create a baseline snapshot of all tables.
3. **Regular incremental backups**: Capture ongoing changes on demand.
4. **Chain management**: Monitor backup chains and manage retention manually.

### Storage structure {#storage-structure}

Backup collections are stored in a dedicated directory structure within the database:

```text
/Root/test1/.backups/collections/
├── backup_collection_1/
│   ├── 20250821141425Z_full/           # Full backup
│   │   ├── table_1/
│   │   └── table_2/
│   └── 20250821141519Z_incremental/    # Incremental backup
│       ├── table_1/
│       └── table_2/
└── backup_collection_2/
    ├── 20250820093012Z_full/           # Full backup
    │   └── table_3/
    ├── 20250820140000Z_incremental/    # First incremental
    │   └── table_3/
    └── 20250821080000Z_incremental/    # Second incremental
        └── table_3/
```

Each backup contains:

- Table schemas as of backup time (stored implicitly).
- Data files (full or incremental changes).
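
The directory names in the listing above appear to combine a timestamp with the backup type. The sketch below assumes, based only on those example names, that the format is `YYYYMMDDhhmmssZ_<kind>` with a UTC timestamp. The helper `parse_backup_dir` is hypothetical and is not part of YDB.

```python
from datetime import datetime, timezone


def parse_backup_dir(name: str) -> tuple[datetime, str]:
    """Split a name like '20250821141425Z_full' into (UTC timestamp, kind)."""
    stamp, sep, kind = name.partition("_")
    if not sep or kind not in ("full", "incremental"):
        raise ValueError(f"unexpected backup directory name: {name!r}")
    # Assumption: the trailing 'Z' marks UTC, as in ISO 8601.
    when = datetime.strptime(stamp, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)
    return when, kind


names = [
    "20250821080000Z_incremental",
    "20250820093012Z_full",
    "20250820140000Z_incremental",
]
# Sorting by the parsed timestamp recovers the order of the chain.
ordered = sorted(names, key=lambda n: parse_backup_dir(n)[0])
print(ordered)
```

Under this assumed naming scheme, a plain lexicographic sort of the names gives the same order, because the timestamp is fixed-width and comes first.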

### Storage backends {#storage-backends}

#### Cluster storage {#cluster-storage}

Backups are stored within the YDB cluster itself, providing:

- **High availability**: Leverages cluster replication and fault tolerance.
- **Performance**: Fast backup and restore operations.
- **Integration**: Seamless integration with cluster operations.
- **Security**: Uses cluster security mechanisms.

```sql
WITH ( STORAGE = 'cluster' )
```

#### External storage {#external-storage}

External storage is not yet supported directly. To move backups to external storage systems, run [export/import operations](../reference/ydb-cli/export-import/index.md) manually.

### Background operations {#background-operations}

All backup operations run asynchronously in the background, allowing you to:

- Continue normal database operations during backups.
- Monitor progress using YDB CLI operation commands.
- Handle large datasets without blocking other activities.

## How backup collections work internally {#how-they-work}

### Backup creation process {#backup-creation-process}

1. **Transaction isolation**: The backup starts from a consistent snapshot.
2. **Change tracking**: For incremental backups, changes made since the last backup are captured and stored in a CDC stream.
3. **Change materialization**: When an incremental backup is requested, the accumulated CDC stream is compacted into incremental backup tables.

### Incremental backup mechanism {#incremental-backup-mechanism}

Incremental backups use change tracking to identify:

- **New rows**: Rows added since the last backup.
- **Modified rows**: Changed data in existing rows.
- **Deleted rows**: Removed data, recorded as tombstones.
- **Schema changes**: Not currently supported.
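
How the three kinds of row changes combine during a restore can be shown with a toy model. The sketch below represents a table as a Python dictionary keyed by primary key. The `TOMBSTONE` sentinel and the `apply_incremental` function are hypothetical illustrations of the idea, not the actual on-disk format or restore logic used by YDB.

```python
# Sentinel marking a deleted row (a tombstone) in an incremental change set.
TOMBSTONE = object()


def apply_incremental(snapshot: dict, changes: dict) -> dict:
    """Return a new snapshot with one incremental change set applied."""
    result = dict(snapshot)
    for key, value in changes.items():
        if value is TOMBSTONE:
            result.pop(key, None)  # deleted row
        else:
            result[key] = value    # new or modified row
    return result


full = {1: "alice", 2: "bob", 3: "carol"}
incr_1 = {2: "bobby", 4: "dave"}  # one modified row, one new row
incr_2 = {1: TOMBSTONE}           # one deleted row

state = full
for changes in (incr_1, incr_2):  # incrementals are applied in chain order
    state = apply_incremental(state, changes)
print(state)  # {2: 'bobby', 3: 'carol', 4: 'dave'}
```

Applying the change sets out of order could resurrect deleted rows or lose updates, which is one reason the order of the chain matters.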

## Relationship with incremental backups {#relationship-with-incremental-backups}

Backup collections are the foundation for incremental backup functionality:

- **Collections enable incrementals**: You must have a collection to create incremental backups.
- **Chain management**: Collections manage the sequence of full and incremental backups.
- **Consistency**: All tables in a collection are backed up consistently.

Without backup collections, only full export/import operations are available.

## When to use backup collections {#when-to-use}

**Ideal scenarios:**

- Production environments requiring regular backup schedules.
- Large datasets where incremental changes are much smaller than total data size.
- Scenarios requiring backup chains for efficiency.

**Consider traditional export/import for:**

- Small databases or individual tables.
- One-time data migration tasks.
- Development/testing environments.
- Simple backup scenarios without incremental needs.

## Benefits and limitations {#benefits-limitations}

### Benefits

- **Storage efficiency**: Incremental backups use significantly less storage.
- **Faster backups**: Only changes are processed after the initial full backup (note: change capture still incurs storage and CPU costs).
- **SQL interface**: Familiar SQL commands for backup management.
- **Background processing**: Non-blocking operations.
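
The storage-efficiency claim can be made concrete with some rough arithmetic. The figures below (a 1000 GB dataset, 2% of the data changing between backups, seven backups retained) are invented for illustration only, and the model ignores the overhead of change capture mentioned above.

```python
def storage_full_only(total_gb: float, backups: int) -> float:
    """Storage used when every backup is a full snapshot."""
    return total_gb * backups


def storage_with_incrementals(total_gb: float, change_ratio: float, backups: int) -> float:
    """Storage used by one full backup followed by incrementals."""
    return total_gb + total_gb * change_ratio * (backups - 1)


total_gb = 1000.0    # hypothetical dataset size
change_ratio = 0.02  # hypothetical share of data changed between backups
backups = 7          # hypothetical number of retained backups

print(round(storage_full_only(total_gb, backups), 1))
print(round(storage_with_incrementals(total_gb, change_ratio, backups), 1))
```

The gap narrows as the change ratio grows. When most of the data changes between backups, an incremental backup approaches the size of a full one.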

### Current limitations

- **Cluster storage only**: External storage requires manual export/import.
- **No collection modification**: Cannot add/remove tables after creation.
- **No partial restore**: Partial restores from collections must be managed externally.

## Next steps {#next-steps}

- **Get started**: Follow the [operations guide](../maintenance/manual/backup-collections.md) for step-by-step instructions.
- **See examples**: Explore [common scenarios](../recipes/backup-collections.md) and best practices.

## See also

- [General backup concepts](backup.md) - Overview of all backup approaches in YDB.
- [Operations guide](../maintenance/manual/backup-collections.md) - Practical instructions and examples.
- [Common recipes](../recipes/backup-collections.md) - Real-world usage scenarios.
49 changes: 49 additions & 0 deletions ydb/docs/en/core/concepts/backup.md
@@ -0,0 +1,49 @@
# Backup concepts

This section covers backup concepts and technologies available in {{ ydb-short-name }}.

{{ ydb-short-name }} provides several approaches for creating backups, each designed for different use cases and requirements.

## Export/import {#export-import}

For large-scale data migration and portability scenarios:

- **Use cases**: Large data migration between systems, archival storage, production data transfers.
- **Storage**: S3-compatible storage.

## Backup/restore {#backup-restore}

For local database backups and development workflows:

- **Use cases**: Local development environments, testing scenarios, smaller production environments, database cloning for local use.
- **Storage**: Filesystem.

## Backup collections {#backup-collections}

For production workloads requiring incremental backups:

- **Use cases**: Production environments, large datasets, regular backup schedules.
- **Storage**: Currently supports cluster storage only.

Learn more:

- [Backup collections concepts](backup-collections.md) - Architecture and concepts.
- [Operations guide](../maintenance/manual/backup-collections.md) - Practical operations.
- [Common recipes](../recipes/backup-collections.md) - Usage examples.
- [Export and import reference](../reference/ydb-cli/export-import/index.md) - Export/import operations for moving backups to external storage.

## Choosing the right approach {#choosing-approach}

| Approach | Best for | Key advantages | Considerations |
|----------|----------|----------------|----------------|
| **Export/import** | Large data migration, archival, production data transfers | Portability between systems, flexible formats, handles large datasets | Full snapshots only |
| **Backup/restore** | Local development, testing, smaller production environments | Local filesystem operations, suitable for moderate data volumes | Full snapshots only, primarily for local use |
| **Backup collections** | Production environments, large datasets | Incremental efficiency, point-in-time recovery | Requires collection setup, cluster storage only |

## See also

- [Backup and recovery guide](../devops/backup-and-recovery.md).
- [Export and import reference](../reference/ydb-cli/export-import/index.md).
4 changes: 4 additions & 0 deletions ydb/docs/en/core/concepts/toc_i.yaml
@@ -27,6 +27,10 @@ items:
  href: limits-ydb.md
- name: Multi-Version Concurrency Control (MVCC)
  href: mvcc.md
- name: Backup and restore
  href: backup.md
- name: Backup collections
  href: backup-collections.md
- name: Asynchronous replication
  href: async-replication.md
  when: feature_async_replication