[BUG] Error submitting workflow #2132

@nocoli

Description

What is the bug?

While following https://github.com/opensearch-project/opensearch-migrations/wiki/Backfill-Workflow, the `workflow submit` command results in the following error in CloudShell:

(05:52:06) migration-console (~) -> workflow submit
NOT checking if all secrets have been created.  Run `workflow configure edit` to confirm
Initializing workflow from session: default
Submitting workflow to namespace: ma
ERROR:console_link.workflow.services.script_runner:Script failed with exit code 1
ERROR:console_link.workflow.services.script_runner:stderr: Error: Found 4 errors

Error submitting workflow: Command '['/root/configProcessor/createMigrationWorkflowFromUserConfiguration.sh', '/tmp/tmpcy9mouvd.yaml', '--prefix ma', '--etcd-endpoints http://etcd.ma.svc.cluster.local:2379']' returned non-zero exit status 1.
ERROR:console_link.workflow.commands.submit:Workflow submission failed
Traceback (most recent call last):
  File "/root/lib/console_link/console_link/workflow/commands/submit.py", line 139, in submit_command
    submit_result = runner.submit_workflow(config_yaml, args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/lib/console_link/console_link/workflow/services/script_runner.py", line 205, in submit_workflow
    output = self.run_script("createMigrationWorkflowFromUserConfiguration.sh", None,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/lib/console_link/console_link/workflow/services/script_runner.py", line 113, in run_script
    return self.run(self.script_dir / script_name, input_data, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/lib/console_link/console_link/workflow/services/script_runner.py", line 75, in run
    result = subprocess.run(
             ^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/root/configProcessor/createMigrationWorkflowFromUserConfiguration.sh', '/tmp/tmpcy9mouvd.yaml', '--prefix ma', '--etcd-endpoints http://etcd.ma.svc.cluster.local:2379']' returned non-zero exit status 1.
ERROR:console_link.workflow.commands.submit:Unexpected error submitting workflow: 1
Traceback (most recent call last):
  File "/root/lib/console_link/console_link/workflow/commands/submit.py", line 139, in submit_command
    submit_result = runner.submit_workflow(config_yaml, args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/lib/console_link/console_link/workflow/services/script_runner.py", line 205, in submit_workflow
    output = self.run_script("createMigrationWorkflowFromUserConfiguration.sh", None,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/lib/console_link/console_link/workflow/services/script_runner.py", line 113, in run_script
    return self.run(self.script_dir / script_name, input_data, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/lib/console_link/console_link/workflow/services/script_runner.py", line 75, in run
    result = subprocess.run(
             ^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/root/configProcessor/createMigrationWorkflowFromUserConfiguration.sh', '/tmp/tmpcy9mouvd.yaml', '--prefix ma', '--etcd-endpoints http://etcd.ma.svc.cluster.local:2379']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/lib/console_link/console_link/workflow/commands/submit.py", line 162, in submit_command
    ctx.exit(ExitCode.FAILURE.value)
  File "/.venv/lib64/python3.11/site-packages/click/core.py", line 738, in exit
    raise Exit(code)
click.exceptions.Exit: 1
Error: 1

What are your migration environments?

Source: Elasticsearch 6.8
Target: OpenSearch 3.3
Migration Assistant: 2.6.1

How can one reproduce the bug?

Here is the template I am using:

sourceClusters:
  my-source:
    endpoint: https://{SOURCE_ENDPOINT}.{REGION}.es.amazonaws.com
    version: "6.8"
    authConfig:
      sigv4:
        region: {REGION}
        service: es

targetClusters:
  my-target:
    endpoint: https://{TARGET_ENDPOINT}.{REGION}.es.amazonaws.com
    version: "3.3"
    authConfig:
      sigv4:
        region: {REGION}
        service: es

migrations:
- sourceCluster: my-source
  targetCluster: my-target
  snapshotMigrations:
  - indices: ["docs-*"]
    metadataMigration:
      enabled: true
    documentBackfill:
      enabled: true
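
Since YAML forbids tabs in indentation, one cheap thing to rule out is a stray tab in the saved template file (editors sometimes insert them silently). A small standard-library sketch, not part of the Migration Assistant tooling, that flags tab-indented lines:

```python
def find_tab_indented_lines(text):
    """Return (line_number, line) pairs whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append((number, line))
    return hits

# Example: a tab-indented key violates YAML's indentation rules and
# would make the template fail to parse.
sample = 'sourceClusters:\n  my-source:\n\tversion: "6.8"\n'
print(find_tab_indented_lines(sample))  # reports line 3
```

Running this over the real config file before `workflow submit` would at least separate an indentation problem from a semantic validation problem.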

What is the expected behavior?

If it's an issue with the configuration, I would expect an error that tells me exactly what is wrong with it.

Do you have any additional context?

The config is fairly basic, essentially a copy-and-paste of the steps in the link above.

Connections look fine as per below:

(05:56:52) migration-console (~) -> console clusters connection-check
SOURCE CLUSTER
ConnectionResult(connection_message='Successfully connected!', connection_established=True, cluster_version='6.8.0')
TARGET CLUSTER
ConnectionResult(connection_message='Successfully connected!', connection_established=True, cluster_version='3.3.0')
(05:58:19) migration-console (~) -> console clusters cat-indices

WARNING: Cluster information may be stale. Use --refresh to update.

SOURCE CLUSTER
health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1                      x3Gp5oeJSIKVmB-l3Rs-0Q   1   1          0            0       522b           261b
green  open   docs-2                         V3E7z6lWSaauawCWuKzU2w   5   1      58664            0     42.1mb           21mb
green  open   .tasks                         JkrS9ev-T3i5pb4GomnEqA   1   1         12            0     42.7kb         21.3kb
green  open   docs-1                         QGrM3J9_QB-buiEurf_vWw   5   1  114663048     17135159    102.7gb         51.3gb

TARGET CLUSTER
health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .plugins-ml-config             epvwuItXRyC59f1n3nIniA   1   1          1            0      8.1kb            4kb
green  open   .plugins-ml-jobs               WtfLH8svTse3eBWvWDYHJA   1   1          1            0     14.9kb          6.3kb
green  open   .opendistro-job-scheduler-lock RSJ97ZwhSfaON22el6mlKQ   1   1          1           11     59.6kb         37.8kb
green  open   .kibana_1                      6u3TGew_RXePdsp30B7Q7Q   1   1          1            0     10.5kb          5.2kb

During debugging, I added an S3 snapshot repo to the config, following the example in the link above. This produced the same error output, except `Error: Found 4 errors` became `Error: Found 6 errors`, which makes me think there could be some trouble parsing the template.
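
If it is a template-parsing problem, another cheap check (again a standard-library sketch, not part of the tool) is whether any `{PLACEHOLDER}` tokens from the wiki example were left unsubstituted in the submitted file:

```python
import re

def find_unsubstituted_placeholders(text):
    """Return the distinct {UPPER_CASE} placeholder tokens still present in text."""
    return sorted(set(re.findall(r"\{[A-Z][A-Z_]*\}", text)))

template = "endpoint: https://{SOURCE_ENDPOINT}.{REGION}.es.amazonaws.com"
print(find_unsubstituted_placeholders(template))  # ['{REGION}', '{SOURCE_ENDPOINT}']
```

An unsubstituted `region: {REGION}` parses as a nested YAML flow mapping rather than a string, which a validator could plausibly count as multiple errors.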

Labels: bug