23 changes: 19 additions & 4 deletions components/Layout/Footer.tsx
@@ -85,13 +85,28 @@ export function Footer({ shortFooter }: { shortFooter?: boolean }): ReactElement
</Link>
</li>
<li className="site_item">
<Link href="/docs/ios-sdk" className="link">
iOS SDK
<Link href="/docs/tools" className="link">
Tools & APIs
</Link>
</li>
<li className="site_item">
<Link href="/docs/android-sdk" className="link">
Android SDK
<Link href="/docs/self-hosted-server" className="link">
Self-Hosted Server
</Link>
</li>
<li className="site_item">
<Link href="/docs/advanced" className="link">
Configuration
</Link>
</li>
<li className="site_item">
<Link href="/docs/internals" className="link">
Internals
</Link>
</li>
<li className="site_item">
<Link href="/docs/glossary" className="link">
Glossary
</Link>
</li>
</ul>
30 changes: 15 additions & 15 deletions components/Layout/MobileGnbDropdown.tsx
@@ -89,52 +89,52 @@ export function MobileGnbDropdown({ isLoggedIn }: { isLoggedIn: boolean }) {
</li>
<li className="navigator_group">
<Link
href="/docs/ios-sdk"
href="/docs/tools"
className={classNames('navigator_item', 'add_icon', {
is_active: asPath.startsWith(`/docs/ios-sdk`),
is_active: asPath.startsWith(`/docs/tools`),
})}
>
iOS SDK
Tools & APIs
</Link>
</li>
<li className="navigator_group">
<Link
href="/docs/android-sdk"
href="/docs/self-hosted-server"
className={classNames('navigator_item', 'add_icon', {
is_active: asPath.startsWith(`/docs/android-sdk`),
is_active: asPath.startsWith(`/docs/self-hosted-server`),
})}
>
Android SDK
Self-Hosted Server
</Link>
</li>
<li className="navigator_group">
<Link
href="/docs/devtools"
href="/docs/advanced"
className={classNames('navigator_item', 'add_icon', {
is_active: asPath.startsWith(`/docs/devtools`),
is_active: asPath.startsWith(`/docs/advanced`),
})}
>
Devtools
Configuration
</Link>
</li>
<li className="navigator_group">
<Link
href="/docs/cli"
href="/docs/internals"
className={classNames('navigator_item', 'add_icon', {
is_active: asPath.startsWith(`/docs/cli`),
is_active: asPath.startsWith(`/docs/internals`),
})}
>
CLI
Internals
</Link>
</li>
<li className="navigator_group">
<Link
href="/docs/self-hosted-server"
href="/docs/glossary"
className={classNames('navigator_item', 'add_icon', {
is_active: asPath.startsWith(`/docs/self-hosted-server`),
is_active: asPath.startsWith(`/docs/glossary`),
})}
>
Self-Hosted Server
Glossary
</Link>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/advanced/projects.mdx
@@ -123,7 +123,7 @@ Each project can be independently configured with the following settings:
| **Allowed Origins** | CORS restrictions for client connections | [Security: Allowed Origins](/docs/advanced/security#allowed-origins) |
| **Max Attachments** | Maximum clients that can attach to a single document | [Document Limits](/docs/js-sdk#document-limits) |
| **Max Subscribers** | Maximum clients that can subscribe to a single document | [Document Limits](/docs/js-sdk#document-limits) |
| **Client Deactivate Threshold** | Time after which inactive clients are automatically deactivated by [Housekeeping](/docs/glossary) | [CLI](/docs/tools/cli#updating-the-project) |
| **Client Deactivate Threshold** | Time after which inactive clients are automatically deactivated by [Housekeeping](/docs/internals/housekeeping) | [CLI](/docs/tools/cli#updating-the-project) |

### Common Patterns

4 changes: 2 additions & 2 deletions docs/glossary.mdx
@@ -38,7 +38,7 @@ These terms are used in the Yorkie project and community. Understanding them wil
| [Channel](/docs/js-sdk#channel) | A lightweight, memory-only communication layer for real-time features like presence tracking and message broadcasting. Unlike Documents, Channels do not persist data to the database and are designed for ephemeral data. |
| [Broadcast](/docs/js-sdk#broadcast) | A messaging mechanism within Channels that allows clients to publish and subscribe to real-time events without persisting the data. Useful for chat messages, notifications, and other ephemeral communications. |
| [Presence](/docs/js-sdk#presence) | A data structure representing a user's current state within a document (e.g., cursor position, selection). |
| Attach / Detach | The operations that subscribe or unsubscribe a [Client](/docs/js-sdk#client) to/from a [Document](/docs/js-sdk#attaching-the-document) or [Channel](/docs/js-sdk#channel). Attaching synchronizes state with the server; detaching releases the subscription and cleans up resources. |
| [Attach / Detach](/docs/internals/document-lifecycle#attach-and-detach) | The operations that transition a [Document](/docs/js-sdk#document) or [Channel](/docs/js-sdk#channel) between lifecycle states. Attaching subscribes the client to a document and starts both synchronization and the [Watch Stream](/docs/internals/synchronization#the-watch-stream); detaching stops synchronization, closes the stream, and releases resources. See [Document Lifecycle](/docs/internals/document-lifecycle) for details on states and transitions. |

### Synchronization & State Management

@@ -72,4 +72,4 @@ These terms are used in the Yorkie project and community. Understanding them wil
| [Admin API](/docs/tools/admin-api) | A REST API that allows server-side applications to programmatically manage Yorkie documents without using the Yorkie SDK. Useful for server-side document management, automation, and integration. |
| [Auth Webhook](/docs/advanced/security#auth-webhook) | A server-side webhook that validates client requests through an external authentication server, providing fine-grained access control for documents. See [Security](/docs/advanced/security). |
| [Event Webhook](/docs/advanced/event-webhook) | A webhook that notifies external services when specific events occur in Yorkie documents (e.g., `DocumentRootChanged`). Useful for integrations, notifications, and external system synchronization. |
| Housekeeping | A maintenance process that cleans up unnecessary data on the server, such as deactivating clients that exceed the `client-deactivate-threshold`. Configured via [CLI](/docs/tools/cli#updating-the-project). |
| [Housekeeping](/docs/internals/housekeeping) | A background service that periodically cleans up unnecessary data on the server, including deactivating inactive clients and compacting documents. Configured via [CLI](/docs/tools/cli#updating-the-project). |
1 change: 1 addition & 0 deletions docs/internals.mdx
@@ -14,6 +14,7 @@ This section explains how Yorkie works under the hood and how to deploy it at sc
- **[Document Lifecycle](/docs/internals/document-lifecycle)**: Understand the state transitions of documents and clients, from attachment through detachment and removal
- **[YSON](/docs/internals/yson)**: Learn about Yorkie Structured Object Notation, the data format used to represent documents with specialized types for collaborative editing
- **[Cluster Mode](/docs/internals/cluster-mode)**: Deploy Yorkie in production with sharded cluster mode, consistent hashing, leader election, and MongoDB sharding
- **[Housekeeping](/docs/internals/housekeeping)**: Learn how the background service deactivates inactive clients to unblock garbage collection and compacts documents to reduce storage overhead

### Prerequisites

172 changes: 172 additions & 0 deletions docs/internals/housekeeping.mdx
@@ -0,0 +1,172 @@
---
title: 'Housekeeping'
order: 126
---

## Housekeeping

Housekeeping is a background service that periodically cleans up resources and data that are no longer needed in Yorkie. By deactivating stale clients and consolidating old change history, it keeps memory usage and storage overhead in check in the CRDT-based collaborative system.

### Overview

As documents are edited over time, two types of overhead accumulate:

1. **Inactive clients** prevent [Garbage Collection](/docs/internals/crdt-concepts#garbage-collection) from reclaiming tombstoned nodes.
2. **Change history** grows continuously, increasing storage and memory costs.

Housekeeping addresses both by running two scheduled tasks:

| Task | Purpose |
|------|---------|
| **Client Deactivation** | Deactivates clients that have been inactive beyond a threshold, enabling more effective garbage collection |
| **[Document Compaction](/docs/internals/synchronization#document-compaction)** | Consolidates old change history into a single snapshot to reduce storage overhead |

```mermaid
graph LR
subgraph Scheduler["Scheduler (gocron)"]
Timer["Periodic Trigger"]
end

subgraph Tasks["Housekeeping Tasks"]
CD["Client Deactivation"]
DC["Document Compaction"]
end

subgraph Lock["Distributed Locking"]
DL["Prevent Duplicate Runs"]
end

Timer --> CD
Timer --> DC
CD --> DL
DC --> DL
```

### Client Deactivation for Garbage Collection

#### Why It Matters

In Yorkie's CRDT system, [Garbage Collection](/docs/internals/crdt-concepts#garbage-collection) uses the `minVersionVector` to determine which [tombstoned](/docs/internals/crdt-concepts#tombstones) nodes can be safely removed. The `minVersionVector` is the element-wise minimum of all active clients' [version vectors](/docs/internals/crdt-concepts#version-vectors) -- it describes the set of changes that every active client has definitely received.

If a client becomes inactive but remains registered, its outdated version vector holds back the `minVersionVector`, preventing garbage collection from reclaiming potentially large amounts of data.

```mermaid
sequenceDiagram
participant A as Client A (Active)
participant S as Server
participant B as Client B (Inactive 24h+)

Note over S: minVersionVector stuck at B's old vector
Note over S: Tombstones cannot be collected

S->>S: Housekeeping: Deactivate Client B
Note over S: minVersionVector advances to A's vector
Note over S: Tombstones now eligible for GC

A->>S: PushPull
S->>A: Response with updated minVersionVector
Note over A: GC: remove eligible tombstones
```
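
The effect of a stale client on the `minVersionVector` can be illustrated with a minimal sketch. The `VersionVector` type and the `minVersionVector` function below are hypothetical names for illustration, not the server's actual data structures, and the sketch assumes that an actor missing from a vector counts as `0`:

```typescript
// Hypothetical type for illustration: actor ID -> latest timestamp seen from that actor.
type VersionVector = Record<string, number>;

// Element-wise minimum across the version vectors of all active clients.
// Assumption: an actor missing from a vector counts as 0 (nothing received yet).
function minVersionVector(vectors: VersionVector[]): VersionVector {
  const min: VersionVector = {};
  // Collect every actor that appears in any vector.
  for (const v of vectors) {
    for (const actor of Object.keys(v)) min[actor] = Infinity;
  }
  // Take the smallest value each actor has in any vector.
  for (const actor of Object.keys(min)) {
    for (const v of vectors) {
      const seen = actor in v ? v[actor] : 0;
      if (seen < min[actor]) min[actor] = seen;
    }
  }
  return min;
}

const clientA: VersionVector = { a: 10, b: 3 };
const clientB: VersionVector = { a: 2, b: 3 }; // inactive, stale view of actor "a"

// While B is still registered, the minimum is held back at a: 2.
const withB = minVersionVector([clientA, clientB]);
// After B is deactivated, only A's vector counts and the minimum advances to a: 10.
const withoutB = minVersionVector([clientA]);
```

Everything at or below the minimum is known to all active clients, which is why removing the stale vector makes more tombstones eligible for collection.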

#### How It Works

1. The scheduler triggers the deactivation task at the configured interval.
2. For each [Project](/docs/advanced/projects), the system queries for clients that have not communicated with the server for longer than the `client-deactivate-threshold` (default: 24 hours).
3. Each candidate client is deactivated, removing it from the active client set.
4. The `minVersionVector` can now advance, unblocking garbage collection.

Projects are processed in a round-robin fashion across runs, distributing load over time rather than processing all projects in a single cycle.
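
The candidate query and the round-robin traversal can be sketched as follows. This is a simplified in-memory model, not the server's implementation: the `ClientInfo` shape, the numeric cursor, and both function names are assumptions made for this example.

```typescript
// Hypothetical shape; field names are assumptions for this sketch.
interface ClientInfo {
  id: string;
  active: boolean;
  lastSeenAt: number; // epoch milliseconds of the last request to the server
}

// Step 2: clients that are still active but have been silent for longer than
// the threshold, capped at `limit` (cf. housekeeping-candidates-limit-per-project).
function findDeactivateCandidates(
  clients: ClientInfo[],
  now: number,
  thresholdMs: number,
  limit: number,
): ClientInfo[] {
  return clients
    .filter((c) => c.active && now - c.lastSeenAt > thresholdMs)
    .slice(0, limit);
}

// Round-robin: each run takes the next `fetchSize` projects and returns the
// cursor at which the following run resumes, wrapping at the end of the list.
function nextProjectBatch<T>(
  projects: T[],
  cursor: number,
  fetchSize: number,
): { batch: T[]; cursor: number } {
  if (projects.length === 0) return { batch: [], cursor: 0 };
  const start = cursor % projects.length;
  const batch = projects.slice(start, start + fetchSize);
  const end = start + batch.length;
  return { batch, cursor: end >= projects.length ? 0 : end };
}
```

With five projects and a fetch size of two, three consecutive runs would visit projects 1-2, 3-4, and 5 before wrapping around to the first project again.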

### Document Compaction

Over time, a document accumulates a large history of individual changes. Document Compaction reduces storage overhead by:

1. Removing old change history that is no longer needed for synchronization.
2. Creating a new initial change that represents the current document state.
3. Maintaining document integrity while reducing metadata size.

#### Compaction Criteria

A document is eligible for compaction when:

- It has accumulated at least `CompactionMinChanges` changes (set with the `housekeeping-compaction-min-changes` flag; default: 1000).
- It is **not** currently attached to any client.

The second condition ensures that compaction does not interfere with active editing sessions. Document content remains identical after compaction -- only the internal change history is consolidated.
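
Expressed as a predicate, the two criteria look roughly like this. The `DocumentInfo` shape and its field names are assumptions for illustration; only the two conditions and the default of 1000 come from the text above.

```typescript
// Hypothetical shape; field names are assumptions for this sketch.
interface DocumentInfo {
  key: string;
  changeCount: number; // number of changes accumulated in the document's history
  attachedClients: number; // clients currently attached to the document
}

// A document qualifies only if it has enough changes AND nobody is attached.
function isCompactionCandidate(doc: DocumentInfo, minChanges: number = 1000): boolean {
  return doc.changeCount >= minChanges && doc.attachedClients === 0;
}

const idle: DocumentInfo = { key: "notes", changeCount: 1500, attachedClients: 0 };
const busy: DocumentInfo = { key: "board", changeCount: 1500, attachedClients: 2 };

isCompactionCandidate(idle); // true: enough changes and no attached client
isCompactionCandidate(busy); // false: still being edited
```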

### Configuration

Housekeeping behavior is configured through server startup flags or the [CLI](/docs/tools/cli). The key parameters are:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `housekeeping-interval` | Time between housekeeping runs | `30s` |
| `housekeeping-candidates-limit-per-project` | Maximum candidates returned per project in a single run | `500` |
| `housekeeping-project-fetch-size` | Number of projects fetched per run | `100` |
| `housekeeping-compaction-min-changes` | Minimum number of changes before a document is eligible for compaction | `1000` |
| `client-deactivate-threshold` | Time after which an inactive client is deactivated | `24h` |

These can be set when starting the server:

```bash
$ yorkie server \
--housekeeping-interval 30s \
--housekeeping-candidates-limit-per-project 500 \
--housekeeping-compaction-min-changes 1000 \
--client-deactivate-threshold 24h
```

Or updated per project using the [CLI](/docs/tools/cli#updating-the-project):

```bash
$ yorkie project update <project-name> \
--client-deactivate-threshold 12h
```
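
When choosing values, it can help to estimate how much work a single run may take on. The sketch below is a back-of-the-envelope calculation under stated assumptions: it treats `housekeeping-candidates-limit-per-project` multiplied by `housekeeping-project-fetch-size` as an upper bound on candidates per task and per run, and assumes each run finishes within the interval. The `HousekeepingConfig` type and the helper names are hypothetical.

```typescript
// Hypothetical config shape mirroring the flags in the table above.
interface HousekeepingConfig {
  intervalSeconds: number; // housekeeping-interval
  candidatesLimitPerProject: number; // housekeeping-candidates-limit-per-project
  projectFetchSize: number; // housekeeping-project-fetch-size
}

// Upper bound on candidates a single task can see in one run.
function maxCandidatesPerRun(c: HousekeepingConfig): number {
  return c.candidatesLimitPerProject * c.projectFetchSize;
}

// Number of runs scheduled per hour.
function runsPerHour(c: HousekeepingConfig): number {
  return 3600 / c.intervalSeconds;
}

// Time for the round-robin traversal to visit every project once.
function secondsToVisitAllProjects(c: HousekeepingConfig, projectCount: number): number {
  return Math.ceil(projectCount / c.projectFetchSize) * c.intervalSeconds;
}

const defaults: HousekeepingConfig = {
  intervalSeconds: 30,
  candidatesLimitPerProject: 500,
  projectFetchSize: 100,
};

maxCandidatesPerRun(defaults); // 50000
runsPerHour(defaults); // 120
secondsToVisitAllProjects(defaults, 1000); // 300
```

Under these assumptions, a deployment with 1000 projects and the default settings would revisit each project roughly every five minutes.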

#### Configuration by Environment

For **development**, use shorter intervals and lower thresholds for faster feedback:

```bash
$ yorkie server \
--housekeeping-interval 10s \
--housekeeping-candidates-limit-per-project 10 \
--housekeeping-compaction-min-changes 100
```

For **production**, use longer intervals and higher limits to balance throughput with resource usage:

```bash
$ yorkie server \
--housekeeping-interval 1m \
--housekeeping-candidates-limit-per-project 1000 \
--housekeeping-compaction-min-changes 5000
```

### Cluster Mode Behavior

In a [Cluster Mode](/docs/internals/cluster-mode) deployment, only the leader server executes housekeeping tasks. This is coordinated through leader election, preventing duplicate work across cluster nodes.

For more on leader election, see [Cluster Mode: Architecture Components](/docs/internals/cluster-mode#architecture-components).

### Monitoring

Housekeeping logs its activity for observability:

```text
HSKP: candidates 150, deactivated 45, 2.3s
HSKP: candidates 89, compacted 12, 1.8s
```

These logs show the number of candidates processed, the actions taken, and the duration of each run. Use these to tune configuration parameters for your workload.
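
To track these numbers over time, the log lines can be parsed into structured records. The sketch below assumes the exact format of the two sample lines above, with the duration expressed in seconds; real server output may carry extra prefixes such as timestamps, or durations in other units, which this parser does not handle. `HousekeepingLog` and `parseHousekeepingLog` are hypothetical names.

```typescript
interface HousekeepingLog {
  candidates: number;
  action: "deactivated" | "compacted";
  count: number;
  seconds: number;
}

// Parses a line such as "HSKP: candidates 150, deactivated 45, 2.3s".
// Returns null for anything that does not match the assumed format.
function parseHousekeepingLog(line: string): HousekeepingLog | null {
  const pattern = /HSKP: candidates (\d+), (deactivated|compacted) (\d+), (\d+(?:\.\d+)?)s\s*$/;
  const m = pattern.exec(line);
  if (!m) return null;
  return {
    candidates: Number(m[1]),
    action: m[2] as "deactivated" | "compacted",
    count: Number(m[3]),
    seconds: Number(m[4]),
  };
}

const entry = parseHousekeepingLog("HSKP: candidates 150, deactivated 45, 2.3s");
// entry: { candidates: 150, action: "deactivated", count: 45, seconds: 2.3 }
```

A run whose candidate count regularly equals the configured limit suggests that work is being carried over to later runs, which may be a reason to raise the limit or shorten the interval.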

### Further Reading

- [Housekeeping design document](https://github.com/yorkie-team/yorkie/blob/main/design/housekeeping.md) -- Full technical design
- [CRDT Concepts: Garbage Collection](/docs/internals/crdt-concepts#garbage-collection) -- How GC uses version vectors to reclaim tombstones
- [Synchronization: Document Compaction](/docs/internals/synchronization#document-compaction) -- How compaction fits into the sync lifecycle
- [Garbage Collection design document](https://github.com/yorkie-team/yorkie/blob/main/design/garbage-collection.md) -- Deep dive into the GC mechanism
- [Projects](/docs/advanced/projects) -- Per-project configuration including housekeeping thresholds
- [CLI: Updating the Project](/docs/tools/cli#updating-the-project) -- How to configure client deactivation threshold
- [Cluster Mode](/docs/internals/cluster-mode) -- Leader election and distributed coordination
- [Glossary](/docs/glossary) -- Definitions of all key terms
2 changes: 1 addition & 1 deletion docs/internals/synchronization.mdx
@@ -236,7 +236,7 @@ Over time, a document accumulates a large history of individual changes. **Docum
2. Creating a new initial change that represents the current document state.
3. Maintaining document integrity while reducing metadata size.

Compaction is performed by the [Housekeeping](/docs/glossary) background service and only runs on documents that:
Compaction is performed by the [Housekeeping](/docs/internals/housekeeping) background service and only runs on documents that:
- Have at least a configured minimum number of changes (default: 1000)
- Are not currently attached to any client
