Exclude system tables and only restore user's schema by kendrick-ren · Pull Request #486 · scylladb/scylla-ansible-roles

kendrick-ren · 2025-08-21T17:47:03Z

This PR excludes system tables from the restore process and restores only the user's schema, using the Scylla Manager restore --restore-schema command.

Excluded the system tables from the variable "tables_to_restore", so that the playbook only refreshes the user's tables listed in this variable.
Added the step "Restore schema from a backup snapshot" to restore the user's schema with the Scylla Manager restore --restore-schema command.
Changed the prerequisites in README.md.
Changed vars.yaml.example by adding scylla_manager_cluster_name, and changed the hosts.example inventory file by adding scyllamgr_host and splitting it into 2 sections.

vladzcloudius · 2025-08-21T20:03:09Z

Exclude system tables and only restore user's schema

Take out the step of cleaning up the directories(data/commit log/hint/...)

Use scylla manager command to restore the schema

Add related variables

Other new features (like change the tombstone_gc setting, adjust playbook structure to ensure cleanup to run, etc) will be added with another PR later

@kendrick-ren, please, split the patch into patches that change one thing, e.g. "Exclude system tables and only restore user's schema".

Also, please, provide a meaningful patch description that is more detailed than the one in the current patch (don't confuse it with the PR description).

I'll also leave a few comments in the patch itself.

vladzcloudius

As promised - there are more comments.

vladzcloudius · 2025-08-21T20:09:16Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

  tags:
    - restore_token_ring
  tasks:
+    - name: Stop all nodes serially


Nodes don't need to be stopped serially. Stopping nodes serially is only going to make this step take longer for no reason AFAICT.

That is correct, this is a mistake. Let me move the step of stopping nodes out of this play.

Fixed. Changed the playbook to stop all nodes in parallel (lines 46-56).

Test post

test

vladzcloudius · 2025-08-21T20:16:28Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+  tags:
+    - upload_snapshot
+  tasks:
+    - name: Restore


What's the point of moving this block to be before the restoration of the original cluster's seeds configuration?

The original structure of the playbook, which first set up the destination cluster (all the way) and only then restored the data, makes more sense than the proposed version, where the data restoration is stuck between two parts of the cluster configuration steps.

The intention is to make the cleanup an always-run action, so that even if the restoration or any previous step fails midway, the cluster can still come back.
The actual change that makes it an always-run action will come as a new feature in the next PR.

Fixed. Left the structure as it was before.

You did not restore the original structure - the "block" and the corresponding indentation are still part of the first commit.
Please, remove.

vladzcloudius · 2025-08-21T20:19:55Z

example-playbooks/restore_scylla_manager_backup/vars.yaml.example

+# Cluster name to restore schema to. This is the name of the target cluster which
+# is registered in ScyllaDB Manager
+#
+cluster_name: <target_cluster_name>


cluster_name is a mandatory value that must appear in scylla.yaml on every Scylla node.
Giving a variable with different semantics the same name is very confusing.
Please, rename it to something like scylla_manager_cluster_name.

Agree, scylla_manager_cluster_name is better. Changed.

vladzcloudius · 2025-08-21T20:33:58Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

      shell: |
-        scylla-manager-agent download-files -L {{ backup_location }} -n {{ host_id[inventory_hostname] }} -T {{ snapshot_tag }} -d /var/lib/scylla/data/ --dry-run | grep "^\s*\-" | cut -d"-" -f2 | cut -d"(" -f1
-      become_user: scylla
+        scylla-manager-agent download-files -L {{ backup_location }} -n {{ host_id[inventory_hostname] }} -T {{ snapshot_tag }} --dump-manifest 2>/dev/null | jq -r '.index[] | [.keyspace,.table] | join(".")'


This hunk adds a dependency on the third-party jq tool.
Note that the old version of this command didn't have this requirement.
If you still believe that using jq is necessary, please make sure to update README.md by adding the corresponding requirement to a "Prerequisites" section.

@jpancier-scylla any specific reason for switching to jq? Can you elaborate?

Based on the discussion today, we'll keep using jq and update the README accordingly.

Note that Ansible has native support for JSON.

Updated README.md to list jq as a prerequisite.

Note that Ansible has native support for JSON.

That is a good idea, let me convert the logic to use the built-in json parser instead of jq. Will submit another commit shortly.

Switched to Ansible built-in json parser, took out jq from the code and from README.md

Why not use the built-in Ansible JSON parser?

- name: Parse JSON from stdout
  ansible.builtin.set_fact:
    parsed_data: "{{ command_output.stdout | from_json }}"
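For reference, the jq filter and the from_json approach discussed in this thread both come down to the same transformation: read the manifest, walk its index entries, and join keyspace and table with a dot. A minimal Python sketch of that logic follows. The manifest fragment is hypothetical and trimmed to the two fields the jq filter reads (.index[].keyspace and .index[].table); a real manifest is assumed to carry more fields.

```python
import json

# Hypothetical, trimmed manifest: only the fields read by the jq filter
# quoted in this thread are included.
manifest_text = """
{"index": [
  {"keyspace": "system_schema", "table": "tables"},
  {"keyspace": "shop", "table": "orders"},
  {"keyspace": "audit", "table": "audit_log"}
]}
"""


def table_names(manifest_json):
    """Return 'keyspace.table' names, like jq's [.keyspace,.table] | join(".")."""
    manifest = json.loads(manifest_json)
    return [f'{entry["keyspace"]}.{entry["table"]}' for entry in manifest["index"]]


all_tables = table_names(manifest_text)
print(all_tables)
```

In the playbook the same result would come from the registered stdout passed through from_json, without the extra jq dependency.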

vladzcloudius · 2025-08-21T20:34:22Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

-#    - name: Let's see our new facts
-#      debug:
-#        msg: "{{ inventory_hostname }} old seeds list is {{ old_seeds }}"
-


Put any cleanup into a separate patch/commit.

What about this comment?

Excluded from this PR. Use a separate patch to get rid of it.

I still see it in the first commit of this PR.

vladzcloudius · 2025-08-21T20:41:25Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

  gather_facts: false
  serial: 1
  tags:
    - restore_token_ring


Unless you wipe the old data/commitlog/schema_commitlog/etc. directories like the original playbook did, Scylla nodes are not going to bootstrap and hence won't pick up new tokens.

This whole change makes very little sense.
Did you test that the playbook with your changes actually works?

The new change was validated in both sanity test and impact test

Fixed, restored the original logic.

kendrick-ren · 2025-10-21T16:29:12Z

@vladzcloudius I think all comments are resolved. Please review and add new comments if any. Thanks.

vladzcloudius · 2025-10-21T16:42:05Z

@kendrick-ren, please, don't resolve other people's comments. This makes it hard to go over changes that have been requested.
Reply to these comments instead, e.g. "fixed".

vladzcloudius · 2025-10-21T16:47:26Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

-#    - name: Let's see our new facts
-#      debug:
-#        msg: "{{ inventory_hostname }} old seeds list is {{ old_seeds }}"
-


What about this comment?

kendrick-ren · 2025-10-21T17:21:05Z

@vladzcloudius
example-playbooks/restore_scylla_manager_backup/restore.yaml

- name: Let's see our new facts

debug:

msg: "{{ inventory_hostname }} old seeds list is {{ old_seeds }}"

Member
@vladzcloudius vladzcloudius 32 minutes ago
What about this comment?

This will be resolved in another patch instead of being combined into this one, per your earlier comment.

vladzcloudius · 2025-10-21T20:49:45Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

      become_user: scylla
-      register: _tables_list
+      vars:
+        ansible_pipelining: true        


Why is pipelining needed?
According to the Ansible documentation, pipelining conflicts with privilege escalation (become), which you use in this task.

It keeps Ansible from writing a temp file under a temp directory for this step. Pipelining only conflicts with "become" when requiretty is enabled in sudoers (which is usually disabled on systems managed by Ansible).

Why do we care if Ansible is going to write a temp file?

vladzcloudius · 2025-10-21T20:54:15Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+    - name: Save tables names list as a fact
      set_fact:
-        system_schema_tables: "{{ tables_to_restore | select('search', '^\\s*system_schema\\.') | list }}"
+        all_tables: "{{ _tables_list.stdout.split('\n') }}"


I don't get it: don't you already have it in _tables_list.stdout_lines ?

oops, I meant to use that, which was why I generated stdout_lines, but forgot to change the step after.

Fixed now.

vladzcloudius · 2025-10-21T20:55:51Z

example-playbooks/restore_scylla_manager_backup/restore.yaml


+    - name: Save names of tables to restore as a fact
+      set_fact:
+        tables_to_restore: "{{ all_tables | reject('search', '^system(_|\\.)') | list }}"


What about audit keyspace?

The audit keyspace is kept here as well; this variable is only used in the node refresh step. In the later steps, the audit keyspace table content is also restored.
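To make the effect of the reject filter concrete, here is a small Python sketch of the same regular expression. It assumes that Ansible's search test behaves like Python's re.search, and the table names are made up for illustration.

```python
import re

# Same pattern as in the set_fact task: names starting with "system_" or "system."
SYSTEM_RE = re.compile(r"^system(_|\.)")


def tables_to_restore(all_tables):
    """Keep every 'keyspace.table' name that the system pattern does not match."""
    return [name for name in all_tables if not SYSTEM_RE.search(name)]


# Illustrative names only; none of them come from a real backup.
sample = [
    "system.local",
    "system_schema.tables",
    "system_auth.roles",
    "system_my_keyspace.t",
    "audit.audit_log",
    "shop.orders",
]
kept = tables_to_restore(sample)
print(kept)
```

In this sketch the audit keyspace survives the filter, while a user keyspace whose name happens to start with system_ is rejected together with the real system keyspaces.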

vladzcloudius · 2025-10-21T20:56:40Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

        port: 9042
        host: "{{ listen_address }}"

-


Cleanups should be part of a separate commit.

Okay, isolated the cleanup-related change into a separate commit.

I still see it in the first commit.

vladzcloudius · 2025-10-21T20:58:00Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

    - restore_token_ring
  tasks:
    - name: Delete initial_token in scylla.yaml of {{ inventory_hostname }}
+      tags: cleanup


Adding tags (which allow you to run specific "tagged" tasks) should be put in a separate dedicated commit.

Okay, all cleanup-related changes went into a separate commit.

vladzcloudius · 2025-10-21T20:59:05Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

        path: /etc/scylla/scylla.yaml
        regexp: '^initial_token:'
-        line: ""
+        state: absent


This is a change in behavior - it should be sent as a separate dedicated commit with a description that gives the motivation for the change.

Since the purpose is to remove initial_token, state: absent deletes the line, while the original line: "" would leave a blank line (an extra blank line compared with the user's original scylla.yaml file).

It went to a separate commit.

Please, don't mix independent changes in the same commit: there should be one commit for tagging and one for this hunk.
Each commit with a corresponding description.
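The behavior difference argued about in this thread can be illustrated with a short simulation. This is only a sketch of the two outcomes on a list of lines, not Ansible's lineinfile implementation, and it assumes a single matching line; the scylla.yaml content is invented.

```python
import re


def with_empty_line(lines, pattern):
    """Outcome described for line: "" (the matching line becomes a blank line)."""
    return ["" if re.search(pattern, line) else line for line in lines]


def with_state_absent(lines, pattern):
    """Outcome described for state: absent (the matching line is removed)."""
    return [line for line in lines if not re.search(pattern, line)]


# Invented scylla.yaml fragment for illustration.
scylla_yaml = ["cluster_name: 'prod'", "initial_token: 123,456", "num_tokens: 256"]

blanked = with_empty_line(scylla_yaml, r"^initial_token:")
removed = with_state_absent(scylla_yaml, r"^initial_token:")
print(blanked)
print(removed)
```

The first variant leaves a stray empty line where initial_token used to be; the second returns the file to its shape before the token was added.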

vladzcloudius · 2025-10-21T21:00:19Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+  tasks:
+    - name: Restore ScyllaDB schema from backup
+      tags:
+        - restore_schema


Why are there 2 different tags on the task of a play that has a single task?

That is a good catch. Removed.

vladzcloudius · 2025-10-21T21:02:17Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

        backrefs: yes

+- name: Restore schema from a backup snapshot {{ snapshot_tag }}
+  hosts: all


I don't get it: why do you run this play on hosts: all while in fact you are going to run a single command on a Scylla Manager host?

I think that is inherited, and hosts: all is the default in Ansible anyway.

But it would be overwritten by the "delegate_to" and "run_once" setting in the sub task under it.

These are new lines - not inherited.
And you can use whatever host or a group of hosts with the hosts: ... parameter. Hence the question.

Here you were supposed to use scyllamgr_host as your hosts:

Something like (after you address the corresponding comment below):

... hosts: scylla-manager ...

And drop all this delegate_to and run_once nonsense.

vladzcloudius · 2025-10-21T21:05:16Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

-      when: item not in system_schema_tables
+    - name: Restore
+      tags:
+        - restore_data


Same question as before: why are there two tags for the same block of tasks?
And even if there is any reason for that - this must come in a separate commit.
In order to be able to set a tag on a block you had to put all those commands in a block and change their indentation which makes it hard to see what was the functional change in this commit that was supposed to only change the way we restore the schema.

Please, take all this syntactic sugar out of the functional commit.

It makes sense, removed the unnecessary tag.

vladzcloudius · 2025-10-22T15:10:06Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

-#    - name: Let's see our new facts
-#      debug:
-#        msg: "{{ inventory_hostname }} old seeds list is {{ old_seeds }}"
-


I still see it in the first commit of this PR.

kendrick-ren · 2025-10-22T16:05:43Z

#486 (comment)

@vladzcloudius Resolved.

vladzcloudius

The first patch description seems to be misleading.
There wasn't any issue with tables UUIDs on the source cluster in the original procedure AFAIK.

If there was, please reference the corresponding GH issue.

AFAIU the change is required simply because the SM API has changed and there is a way to restore the schema without having to upload system KS data.

Please, reference the corresponding API change in the commit message too.

kendrick-ren · 2025-10-22T18:54:42Z

The first patch description seems to be misleading. There wasn't any issue with tables UUIDs on the source cluster in the original procedure AFAIK.

If there wasn't please, refer the corresponding GH.

AFAIU the change is required simply because the SM API has changed and there is a way to restore the schema without having to upload system KS data.

Please, reference the corresponding API change in the commit message too.

Other than the API change, I was referring to this GH issue -> scylladb/scylla-manager#3019 (comment) for table UUIDs on the target cluster. Is that no longer an issue?

vladzcloudius · 2025-10-22T18:46:17Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

      become_user: scylla
-      register: _tables_list
+      vars:
+        ansible_pipelining: true        


Why do we care if Ansible is going to write a temp file?

vladzcloudius · 2025-10-22T18:54:13Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

        port: 9042
        host: "{{ listen_address }}"

-


I still see it in the first commit.

vladzcloudius · 2025-10-22T18:57:14Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

        backrefs: yes

+- name: Restore schema from a backup snapshot {{ snapshot_tag }}
+  hosts: all


These are new lines - not inherited.
And you can use whatever host or a group of hosts with the hosts: ... parameter. Hence the question.

Here you were supposed to use scyllamgr_host as your hosts:

Something like (after you address the corresponding comment below):

... hosts: scylla-manager ...

And drop all this delegate_to and run_once nonsense.

vladzcloudius · 2025-10-22T18:59:34Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

    - name: Get names of the tables in the snapshot {{ snapshot_tag }}
      shell: |
-        scylla-manager-agent download-files -L {{ backup_location }} -n {{ host_id[inventory_hostname] }} -T {{ snapshot_tag }} -d /var/lib/scylla/data/ --dry-run | grep "^\s*\-" | cut -d"-" -f2 | cut -d"(" -f1
+        scylla-manager-agent download-files -L {{ backup_location }} -n {{ host_id[inventory_hostname] }} -T {{ snapshot_tag }} --dump-manifest


This (first) commit does a lot more than just change the way the schema is restored.
It refactors all places that use scylla-manager-agent, apparently because its API has changed.

The patch description should be adjusted accordingly.

Added description

vladzcloudius · 2025-10-22T19:01:03Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

-      with_items: "{{ tables_to_restore }}"
-      when: item not in system_schema_tables
+    - name: Restore
+      block:


vladzcloudius · 2025-10-22T19:01:42Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+  tags:
+    - upload_snapshot
+  tasks:
+    - name: Restore


You did not restore the original structure - the "block" and the corresponding indentation are still part of the first commit.
Please, remove.

vladzcloudius · 2025-10-22T19:02:50Z

example-playbooks/restore_scylla_manager_backup/vars.yaml.example

  3.66.25.100: aff05f79-7c69-4ecf-a827-5ea790a0fdc6
+
+# Host where Scylladb Manager is run from
+scyllamgr_host: <scylla_manager_host_ip, e.g. localhost if the playbook is run on the Scylla Manager host>


This should be part of the inventory, not the vars.

Make it a new section in the inventory.

vladzcloudius · 2025-10-22T19:04:44Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

        path: /etc/scylla/scylla.yaml
        regexp: '^initial_token:'
-        line: ""
+        state: absent


Please, don't mix independent changes in the same commit: there should be one commit for tagging and one for this hunk.
Each commit with a corresponding description.

vladzcloudius · 2025-10-22T19:12:10Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+      block:
+        - name: Download data
+          shell: |
+            scylla-manager-agent download-files -L {{ backup_location }} -n {{ host_id[inventory_hostname] }} -T {{ snapshot_tag }} -d {{ data_dir }} -K '*,!system*' --mode upload


The regular expressions here and at line 51 are not the same.
Hence, if the user has a KS named system_my_keyspace, you will not download its data but will try to restore it and will report success, while in fact you wouldn't have restored anything.

Right, changed the regular expression in this one.

On the other hand, the regex at line 51 would also exclude the example you gave: line 51 basically excludes everything starting with "system_" or "system.".

Hence, I changed it to '*,!system_*,!system.*'
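A quick way to check whether the download filter and the refresh filter agree is to run both over the same table names. The sketch below does that in Python. The glob_selected helper is a hypothetical approximation of comma-separated include/exclude glob patterns applied to 'keyspace.table' names (last matching pattern wins); it is not Scylla Manager's actual matching code, and the names are made up.

```python
import re
from fnmatch import fnmatchcase

# Refresh-side filter from the set_fact task in this PR.
SYSTEM_RE = re.compile(r"^system(_|\.)")


def glob_selected(name, patterns):
    """Hypothetical approximation of -K style patterns: last match wins."""
    selected = False
    for pattern in patterns.split(","):
        negate = pattern.startswith("!")
        if fnmatchcase(name, pattern[1:] if negate else pattern):
            selected = not negate
    return selected


def mismatches(names, patterns):
    """Names that one filter keeps and the other drops."""
    return [
        name for name in names
        if glob_selected(name, patterns) != (SYSTEM_RE.search(name) is None)
    ]


# Illustrative names only.
names = [
    "system.local",
    "system_schema.tables",
    "system_my_keyspace.t",
    "systematic.events",
    "shop.orders",
]
old = mismatches(names, "*,!system*")
new = mismatches(names, "*,!system_*,!system.*")
print(old, new)
```

Under these assumptions the original pattern disagrees with the regex only for a keyspace such as systematic, which would be skipped on download but still refreshed; the revised pattern agrees with the regex on every sample name.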

vladzcloudius · 2025-10-22T19:19:38Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+
+        - name: refresh nodes with the restored data
+          shell: |
+            nodetool refresh {{ item.split('.') | join(' ') }} 


This is not going to work even with the nodetool from 2025.1, since the syntax is now nodetool refresh --keyspace <KS> --table <Table name>:

$ nodetool help refresh
Load newly placed SSTables to the system without a restart

Add the files to the upload directory, by default it is located under
/var/lib/scylla/data/keyspace_name/table_name-UUID/upload.
Materialized Views (MV) and Secondary Indexes (SI) of the upload table, and if
they exist, they are automatically updated. Uploading MV or SI SSTables is not
required and will fail.

For more information, see: https://opensource.docs.scylladb.com/branch-2025.1/operating-scylla/nodetool-commands/refresh.html"

scylla-nodetool options:
  -h [ --help ]                    show help message
  --help-seastar                   show help message about seastar options
  --help-loggers                   print a list of logger names and exit
  -h [ --host ] arg (=localhost)   the hostname or ip address of the ScyllaDB node
  -p [ --port ] arg (=10000)       the port of the REST API of the ScyllaDB node
  --rest-api-port arg              the port of the REST API of the ScyllaDB node; takes precedence over --port|-p
  --password arg                   Remote jmx agent password (unused)
  --password-file arg              Path to the JMX password file (unused)
  -u [ --username ] arg            Remote jmx agent username (unused)
  --print-port                     Operate in 4.0 mode with hosts disambiguated by port number (unused)
  --keyspace arg                   The keyspace to load sstable(s) into
  --table arg                      The table to load sstable(s) into

refresh:
  --load-and-stream                Allows loading sstables that do not belong to this node, in which case they are automatically streamed to the owning nodes
  --primary-replica-only           Load the sstables and stream to primary replica node that owns the data. Repair is needed after the load and stream process

I recommend considering the REST API instead - it is probably less likely to break than nodetool.
Check if the corresponding REST API in 2022.2 is different from the one in 2025.1.

Hmm, the following docs: https://docs.scylladb.com/manual/branch-2025.1/operating-scylla/nodetool-commands/refresh.html
and https://enterprise.docs.scylladb.com/branch-2024.1/operating-scylla/nodetool-commands/refresh.html

both show that the same syntax is still supported, except that 2025.1 and later have more options added. The tests done earlier on both 2024.1 LTS and 2025.1 LTS/2025.2/2025.3 worked.

In addition, the REST API is still marked as BETA in 2024.2, and it does not appear to be exposed in the official 2024.1 LTS docs at all.

We still have customers on 2024.1 LTS. It is probably better to keep using nodetool for now.
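For comparison, the two command forms debated in this thread can be generated from the same 'keyspace.table' item. The Python sketch below only builds the strings: the positional form mirrors the Jinja expression item.split('.') | join(' ') from the diff, and the flag form uses the --keyspace and --table options from the help output quoted above. Which form a given ScyllaDB release accepts is exactly what is being discussed, so this should be checked with nodetool help refresh on the target version.

```python
def refresh_commands(item):
    """Build both command forms for one 'keyspace.table' item (strings only)."""
    keyspace, table = item.split(".", 1)
    positional = f"nodetool refresh {keyspace} {table}"
    explicit = f"nodetool refresh --keyspace {keyspace} --table {table}"
    return positional, explicit


# "shop.orders" is an illustrative table name, not one from the PR.
positional, explicit = refresh_commands("shop.orders")
print(positional)
print(explicit)
```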

vladzcloudius · 2025-10-22T19:22:47Z

example-playbooks/restore_scylla_manager_backup/restore.yaml

+            nodetool refresh {{ item.split('.') | join(' ') }} 
+          with_items: "{{ tables_to_restore }}"
+
+        - name: Restart Scylla service to pick up the restored data


Why do you need to restart Scylla to pick up the restored data? The whole point of nodetool refresh is to NOT require a restart.

That is a good one. Took it out.

kendrick-ren · 2025-10-22T23:13:50Z

For this comment (#486 (comment)): the scylla user may or may not have the privilege to write the temp file under a temp folder for the Ansible task. When it doesn't, this task prints a warning about the privilege, which may confuse customers even though it does no harm. The ansible_pipelining option avoids this.

kendrick-ren · 2025-10-22T23:14:29Z

For #486 (comment), the cleanup-related changes are now taken out of the PR.

kendrick-ren · 2025-10-22T23:14:53Z

for #486 (comment), fixed.

…uuids on the target cluster

Use the Scylla Manager command "restore --restore-schema" (see https://manager.docs.scylladb.com/stable/sctool/restore.html for details of the command) to restore the user's tables instead of the previous approach of restoring system table data. This avoids the issue discussed in scylladb/scylla-manager#3019.

Refactor the task "Get names of the tables in the snapshot" to use the new --dump-manifest option, and replace the previous JSON parsing logic with the Ansible built-in JSON parser to get more reliable results.

Refactor the inventory file to have 2 separate sections, one for Scylla hosts and the other for the Scylla Manager host, which is required to run the Scylla Manager "restore --restore-schema" command above.

kendrick-ren marked this pull request as draft August 21, 2025 18:26

kendrick-ren marked this pull request as ready for review August 21, 2025 18:29

vladzcloudius requested changes Aug 21, 2025

kendrick-ren changed the title ~~restore users schema only and add related variables~~ Exclude system tables and only restore user's schema Oct 21, 2025

kendrick-ren requested a review from vladzcloudius October 21, 2025 16:07

vladzcloudius requested changes Oct 21, 2025

kendrick-ren requested a review from vladzcloudius October 21, 2025 17:21

kendrick-ren force-pushed the master branch from 0221b9a to 3aaadfe Compare October 21, 2025 20:35

vladzcloudius requested changes Oct 21, 2025

kendrick-ren force-pushed the master branch from 8f4f4d3 to 9044898 Compare October 21, 2025 23:14

kendrick-ren requested a review from vladzcloudius October 21, 2025 23:17

vladzcloudius requested changes Oct 22, 2025

kendrick-ren force-pushed the master branch from 9044898 to 4193751 Compare October 22, 2025 15:58

kendrick-ren requested a review from vladzcloudius October 22, 2025 16:01

vladzcloudius requested changes Oct 22, 2025

kendrick-ren force-pushed the master branch from 73290b4 to 82409b8 Compare October 22, 2025 23:46

kendrick-ren requested a review from vladzcloudius October 22, 2025 23:50

kendrick-ren force-pushed the master branch from 82409b8 to 1972796 Compare October 23, 2025 16:34
