diff --git a/docs/operator_subscriptions/_category_.json b/docs/operator_subscriptions/_category_.json new file mode 100644 index 00000000000..b4b15b2810c --- /dev/null +++ b/docs/operator_subscriptions/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "Subscriptions", + "collapsible": true +} diff --git a/docs/operator_subscriptions/subscriptions.md b/docs/operator_subscriptions/subscriptions.md new file mode 100644 index 00000000000..5caa5256c01 --- /dev/null +++ b/docs/operator_subscriptions/subscriptions.md @@ -0,0 +1,190 @@ +# Rucio Subscriptions + +## Overview + +A **subscription** in Rucio defines a set of rules and filters that are automatically applied to datasets and files as they are registered in the system. Subscriptions can trigger the creation of replication rules for matching data, enabling automated data placement and management. + +Subscriptions are a core way to automate data workflows and ensure datasets are distributed according to organizational needs. + +--- + +## Why Use Subscriptions? + +- **Automate Data Management:** Replicate datasets automatically based on metadata, names, or other attributes. +- **Policy Enforcement:** Ensure compliance with data distribution policies. +- **Efficiency:** Avoid manual rule creation for every new dataset. + +--- + +## Key Concepts + +- **Filter:** A dictionary specifying criteria for which data the subscription applies to (e.g., dataset name patterns, account, project). +- **Replication Rules:** Instructions for how matching data should be replicated (number of copies, target RSEs, weights, etc). +- **Retroactive:** Whether the subscription should apply to pre-existing data. +- **Lifetime:** Optional duration (in days) for the subscription to remain active. +- **Priority:** Determines the order in which subscriptions are processed. +- **State:** Subscriptions can be ACTIVE, NEW, INACTIVE, or UPDATED. +- **Keep History:** Optionally record a history of all subscription changes. + +--- + +## Using the Python Client + +The `Client` class exposes subscription management methods via its `add_subscription`, `list_subscriptions`, `update_subscription`, etc. + +```python +from rucio.client import Client + +client = Client() +``` + +### Creating a Subscription + +```python +sub_id = client.add_subscription( + name="my_subscription", + account="my_account", + filter_={'project': 'ATLAS', 'datatype': 'AOD'}, + replication_rules=[{'copies': 2, 'rse_expression': 'T1_*', 'weight': 'ddm'}], + comments="Automate AOD placement", + lifetime=30, # Optional: days, or False for no expiry + retroactive=False, # Retroactive mode is not implemented + dry_run=False, # If True, only prints actions + priority=3 # Optional: default is 3 +) +print(f"Created subscription ID: {sub_id}") +``` + +--- + +### Listing Subscriptions + +```python +# List all subscriptions for an account +subs = client.list_subscriptions(account="my_account") +for sub in subs: + print(sub) + +# List a specific subscription by name +subs = client.list_subscriptions(name="my_subscription", account="my_account") +print(list(subs)) +``` + +--- + +### Updating a Subscription + +```python +result = client.update_subscription( + name="my_subscription", + account="my_account", + filter_={'project': 'ATLAS', 'datatype': 'RAW'}, + replication_rules=[{'copies': 1, 'rse_expression': 'T2_*'}], + comments="Update to RAW", + priority=2 +) +print("Subscription updated:", result) +``` + +--- + +### Deactivating a Subscription + +```python +result = client.deactivate_subscription( + name="my_subscription", + account="my_account" +) +print("Subscription deactivated:", result) +``` + +--- + +### Listing Rules Associated With a Subscription + +```python +rules = client.list_subscription_rules(account="my_account", name="my_subscription") +for rule in rules: + print(rule) +``` + +--- + +## Error Handling + +- Operations will raise exceptions (e.g., `NotFound`) if subscriptions are missing or requests are invalid. +- `retroactive=True` is not implemented and will raise `NotImplementedError`. + +--- + +# Subscription Algorithms + +Rucio supports advanced subscription scenarios through algorithms that determine how child rules are placed based on the outcome of parent rules or the result of RSE expressions. These are especially relevant for "chained" subscriptions and complex data workflows. + +--- + +## Chained Subscription Algorithms + +Chained subscriptions allow the placement of new rules to depend on the outcome or parameters of rules created in a previous step (the "parent"). The specific algorithm to use for this chaining is specified in the subscription rule. + +The two main chained subscription algorithms are: + +### 1. `associated_site` + +**Purpose:** +Selects an associated RSE based on the `associated_site` attribute from the parent rule's RSE. + +**How it works:** +- When a rule is created on an RSE that has the `associated_site` attribute, the algorithm will select one of the associated sites based on the given index (`associated_site_idx`). +- This enables multi-step workflows where the child rule is coordinated with the parent’s site. + +**Usage Example:** +If a parent rule is on an RSE with `associated_site="T1_FR_CCIN2P3,T1_DE_KIT"`, and you want the child rule to go to the first associated site: +```json +{ + "algorithm": "associated_site", + "associated_site_idx": 1 +} +``` + +--- + +### 2. `exclude_site` + +**Purpose:** +Places the child rule on an RSE that does **not** match the site of the parent rule. + +**How it works:** +- The algorithm finds the `site` attribute of the parent rule's RSE. +- Modifies the RSE expression to exclude this site, ensuring the child rule lands elsewhere. + +**Usage Example:** +Used when you need to guarantee data is not replicated to the same site in multiple workflow stages. + +--- + +## Split Rule Option + +While not an "algorithm" per se, the `split_rule` filter option is often used to create one rule per matching RSE, useful for fine-grained placement and accounting. It can be combined with chained subscriptions for even more advanced workflows. + +**Purpose:** +- When `split_rule` is specified in a subscription filter, rules are created separately for each RSE matching the expression. +- The number of rules created is equal to the number of RSEs matching the expression. +- If `"copies": "*"` is specified, the number of copies is set to the number of matched RSEs. + +--- + +## Summary Table + +| Algorithm | Purpose | Key Parameters | +|-------------------|-----------------------------------------------------|---------------------------| +| associated_site | Child rule on parent’s associated site | associated_site_idx | +| exclude_site | Child rule avoids parent’s site | | +| split_rule | Rule per matching RSE (not chained) | split_rule (filter) | + +--- + +## Further Reading + +- [Transmogrifier details and implementation](transmogrifier.md) +- [Rucio Data Management Documentation](https://rucio.readthedocs.io/en/latest/) diff --git a/docs/operator_subscriptions/transmogrifier.md b/docs/operator_subscriptions/transmogrifier.md new file mode 100644 index 00000000000..c1cc72f1f93 --- /dev/null +++ b/docs/operator_subscriptions/transmogrifier.md @@ -0,0 +1,121 @@ +# Rucio Transmogrifier Daemon + +## Purpose + +The **transmogrifier daemon** in Rucio is responsible for automatically creating and managing **replication rules** for new DIDs (Data Identifiers) according to user-defined **subscriptions**. This automation ensures that new data is distributed across storage endpoints as soon as it appears, following the policies and patterns defined by users. + +--- + +## How It Works + +### Step-by-Step Behaviour + +1. **Initialization** + - The daemon starts, sets up logging, threads, and heartbeats. + - It checks the database schema for compatibility. + +2. **Loading Subscriptions** + - Active subscriptions are loaded and validated. + - Subscriptions contain filters (which data to match) and replication rules (where and how much to replicate). + +3. **Fetching New DIDs** + - The daemon queries for new DIDs (datasets or containers) that have not yet been processed. + +4. **Matching DIDs Against Subscriptions** + - For each new DID, it fetches metadata and checks against all subscription filters. + - Each filter can include scope, name patterns, account, DID type, file size limits, and more. + +5. **Processing Matching Subscriptions** + - For each matching subscription and DID: + - Parses the subscription’s replication rules. + - Prepares rule parameters (number of copies, RSE expressions, activity, etc.). + - Handles **split** and **chained** logic for advanced placement scenarios. + +6. **Selecting RSEs (Storage Endpoints)** + - Depending on rule logic: + - **Direct:** Uses the RSEs specified in the rule. + - **Split/Chained:** Calls algorithms to dynamically select RSEs based on previous rule placements or specific logic. + +7. **Rule Creation** + - Attempts to create the required replication rules for each selected RSE. + - Handles errors, blocklisted RSEs, and retry logic. + +8. **Marking & Updating** + - Successfully processed DIDs are marked to avoid reprocessing. + - Updates the subscription’s metadata (e.g., `last_processed` timestamp). + +9. **Metrics and Logging** + - The daemon records metrics on processing counts, errors, durations, and more for monitoring. + +10. **Loop or Exit** + - If running in continuous mode, the daemon sleeps and repeats. + - If running in one-shot mode, it exits. + +--- + +## Algorithms for Chained Rules + +Chained subscriptions enable advanced, context-aware data placement strategies. The `algorithm` parameter determines how the next RSE(s) are selected based on previous rules. + +| Algorithm | Description | Typical Use Case | +|-------------------|------------------------------------------------------------------|----------------------------------------------------| +| `associated_site` | Select an associated RSE/site from a previous rule's site using the `ASSOCIATED_SITES` attribute. The `associated_site_idx` parameter determines which associated site to use. | Chaining copies across logically linked sites (e.g., for redundancy or workflow steps). | +| `exclude_site` | Select an RSE that is **not** the site used by the parent rule (using the `site` attribute to exclude). | Ensuring that different copies are placed at physically separate sites. | + +- Both algorithms are only valid for rules with split logic and a single copy. +- Errors are raised if required parameters are missing or misconfigured. + +--- + +## Flowchart: Transmogrifier Behaviour + +```mermaid +flowchart TD + A["Start transmogrifier daemon"] --> B["Initialize threads & heartbeat"] + B --> C["Load active subscriptions"] + C --> D1["Fetch new DIDs"] + D1 --> E1{"For each new DID"} & W1{"No more new DIDs"} + E1 --> F1["Fetch metadata for DID"] + F1 --> G1{"For each subscription"} + G1 --> H1{"Does DID match subscription filter?"} & U1{"All subscriptions checked"} + H1 -- No --> G1 + H1 -- Yes --> I1["Parse subscription rules"] + I1 --> J1{"For each rule in subscription"} + J1 --> K1["Prepare rule parameters
: RSE expression, copies, etc."] & T1{"All rules done for this subscription"} + K1 --> L1{"Is rule split/chained?"} + L1 -- Yes --> M1["Select RSEs using split/chained logic"] + L1 -- No --> N1["Use RSEs from rule directly"] + M1 --> O1["For each selected RSE"] + N1 --> O1 + O1 --> P1{"Is RSE blocklisted and rule wildcarded?"} + P1 -- Yes --> Q1["Check ignore_availability
Skip or create rule"] + P1 -- No --> R1["Create replication rule for DID"] + Q1 --> R1 + R1 --> S1["Record created rule and success"] + S1 --> J1 + T1 --> G1 + U1 --> V1["If successful, mark DID as processed"] + V1 --> D1 + W1 --> X1["Update last_processed in subscriptions"] + X1 --> Y1["Push metrics and logs"] + Y1 --> Z1["Sleep if not once and repeat or exit"] +``` + +--- + +## Key Concepts + +- **DID**: Data Identifier (can be a file, dataset, or container). +- **Subscription**: User-defined pattern and rules for automatic data placement. +- **Replication Rule**: Instruction for Rucio to place a copy of a DID on specified RSEs. +- **RSE**: Rucio Storage Element (storage endpoint, e.g., a data center or cloud bucket). +- **Split/Chained Logic**: Advanced methods for spreading or chaining rules across multiple sites, using algorithms. + +--- + +## Summary + +The transmogrifier daemon is central to Rucio’s automated data management, ensuring that new data is promptly and correctly distributed according to organizational policies and user-defined subscriptions. Its sophisticated logic—including split, chained, and algorithmic rule selection—supports even the most advanced data placement strategies, while robust error handling and metrics allow for reliable, scalable operation. + +--- +