
Conversation

@bijay27bit:
No description provided.

google-cla bot commented Dec 17, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket

#Added new scenarios for GCS Sink - Bijay
Contributor:

Remove the commented line here.

Author:

Done.


#Added new scenarios for GCS Sink - Bijay
@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario:Validate successful records transfer from BigQuery to GCS with macro enabled at sink
Contributor:

Add the macro scenario in a separate feature file named for the macro; refer to other plugins' feature files for the naming convention.

Author:

Done.
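
For reference, the macro scenario moved into its own feature file (a name like GCSSinkMacro.feature is an assumption here) would look roughly like the sketch below, built only from steps that appear elsewhere in this conversation; the specific runtime argument is illustrative:

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: Validate successful records transfer from BigQuery to GCS with macro enabled at sink
Given Open Datafusion Project to configure pipeline
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket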

Then Verify data is transferred to target GCS bucket

@GCS_SINK_TEST @BQ_SOURCE_TEST
Scenario Outline: To verify data is getting transferred successfully from BigQuery to GCS with contenttype selection
Contributor:

Add the validation for the file format as well.

Author:

In progress.
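
The requested check might look like the following step; the step name is hypothetical and should be replaced with the framework's actual validation step:

Then Validate the file format of records transferred to GCS bucket is "<FileFormat>"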

| tsv | text/plain |

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: To verify data is getting transferred successfully from BigQuery to GCS using advanced file system properties field
Contributor:

Why are we adding a macro here again? That is already covered in the macro-enabled scenario; this one should run without macros.

Author:

Done.

@BQ_SOURCE_TEST @GCS_SINK_TEST
Scenario: To verify data is getting transferred successfully from BigQuery to GCS using advanced file system properties field
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
Contributor:

Use the latest existing steps. This is a common review comment across all scenarios.
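
For example, the legacy step 'When Source is BigQuery' would be replaced with the generic plugin-selection steps used later in this same PR for the GCS sink; the BigQuery variant below is assumed by analogy:

When Select plugin: "BigQuery" from the plugins list as: "Source"
Then Connect plugins: "BigQuery" and "GCS" to establish connection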

Then Close the GCS properties
Then Save the pipeline
Then Preview and run the pipeline
Then Enter runtime argument value "gcsFileSysProperty" for key "FileSystemPr"
Contributor:

I don't see any value added in the parameter file for the file system property.

Author:

Done.
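
The fix presumably adds an entry for this argument to the plugin-parameters properties file; the key matches the runtime argument above, while the value shown is only an illustrative GCS connector setting:

gcsFileSysProperty={"fs.gs.inputstream.fast.fail.on.not.found.enable":"true"}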

Then Close the preview
Then Deploy the pipeline
Then Run the Pipeline in Runtime
Then Enter runtime argument value "projectId" for key "gcsProjectId"
Contributor:

Remove the properties from the macro that are already covered in the scenarios; for example, projectId is already covered.

Author:

Done.

errorMessageInvalidSourcePath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidDestPath=Invalid bucket name in path 'abc@'. Bucket name should
errorMessageInvalidEncryptionKey=CryptoKeyName.parse: formattedString not in valid format: Parameter "abc@" must be
errorMessageInvalidBucketNameSink=Spark program 'phase-1' failed with error: Errors were encountered during validation. Error code: 400, Unable to read or access GCS bucket. Bucket names must be at least 3 characters in length, got 2: 'gg'. Please check the system logs for more details.
Contributor:

Add only the relevant error message.

Author:

Done.
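
Presumably only the sink-relevant entry quoted above remains after the fix, e.g.:

errorMessageInvalidBucketNameSink=Spark program 'phase-1' failed with error: Errors were encountered during validation. Error code: 400, Unable to read or access GCS bucket. Bucket names must be at least 3 characters in length, got 2: 'gg'. Please check the system logs for more details.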

Then Verify data is transferred to target GCS bucket

@GCS_SINK_TEST @BQ_SOURCE_TEST @GCS_Sink_Required
Scenario Outline: To verify successful data transfer from BigQuery to GCS for different formats with write header true
Contributor:

This scenario should be from GCS source to GCS sink, right? Re-check and change accordingly. Also, why are we making it a macro scenario? That is already covered in the macro-enabled scenario anyway.

Author:

Done.

@GCS_SINK_TEST @BQ_SOURCE_TEST @GCS_Sink_Required
Scenario Outline: To verify successful data transfer from BigQuery to GCS for different formats with write header true
Given Open Datafusion Project to configure pipeline
When Source is BigQuery
Contributor:

Use the latest existing steps from the framework. Change this in all the scenarios.

Author:

Done.

Then Wait till pipeline is in running state
Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Contributor:

Add the validation steps in all the scenarios.

Then Click on the Validate button
Then Verify that the Plugin Property: "format" is displaying an in-line error message: "errorMessageInvalidFormat"

@GCS_SINK_TEST @BQ_SOURCE_TEST
Contributor:

Change the tag order for ease of understanding.

Author:

Done.
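
Presumably reordered to put the source tag first, matching the other scenarios in the file:

@BQ_SOURCE_TEST @GCS_SINK_TEST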

Then Enter runtime argument value "gcsInvalidBucketNameSink" for key "gcsSinkPath"
Then Run the Pipeline in Runtime with runtime arguments
Then Wait till pipeline is in running state
Then Verify the pipeline status is "Failed"
Contributor:

Add the 'Open and capture logs' step.

Author:

Done.
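
That is, the same step used after the run steps in the passing scenarios:

Then Open and capture logs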

@AnkitCLI added the build (Trigger unit test build) label on Dec 17, 2024.
Then Verify data is transferred to target GCS bucket
Then Validate the values of records transferred to GCS bucket is equal to the values from source BigQuery table

@GCS_CSV @GCS_SINK_TEST @GCS_Source_Required @ITN_TEST
Contributor:

Remove the ITN_TEST tag.

Author:

Done.
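
Leaving the tag line as:

@GCS_CSV @GCS_SINK_TEST @GCS_Source_Required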

When Select plugin: "GCS" from the plugins list as: "Sink"
Then Connect plugins: "GCS" and "GCS2" to establish connection
Then Navigate to the properties page of plugin: "GCS"
Then Select dropdown plugin property: "select-schema-actions-dropdown" with option value: "clear"
Contributor:

Why are we using this step?

Author:

This step just clears the output schema.

Contributor:

I mean, why are we adding this step? It is not required, right?

Then Open and capture logs
Then Verify the pipeline status is "Succeeded"
Then Verify data is transferred to target GCS bucket
Then Validate the cmek key "cmekGCS" of target GCS bucket if cmek is enabled
Contributor:

Add the step that validates the transferred values.

Author:

In progress.

Author:

Done.
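
The added validation is presumably the value-comparison step used in the other scenarios (assuming a BigQuery source here, as there):

Then Validate the values of records transferred to GCS bucket is equal to the values from source BigQuery table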

| csv | text/csv |
| tsv | text/plain |


Contributor:

Remove the extra line here.

Author:

Done.

Examples:
| FileFormat |
| csv |
#| tsv |
Contributor @itsmekumari commented Jan 6, 2025:

Why are these commented out? Uncomment tsv and delimited.

Author:

Fixed.

@bijay27bit force-pushed the E2EgcsNewChangesSink_BT branch 6 times, most recently from caf0a54 to 9c57d24, on January 14, 2025 07:52.
Then Validate the values of records transferred to GCS bucket is equal to the values from source BigQuery table

@GCS_AVRO_FILE @GCS_SINK_TEST @GCS_Source_Required
Scenario Outline: To verify data transferred successfully from GCS Source to GCS Sink with write header true at Sink
Contributor:

This scenario can be merged with 'Validate successful records transfer from BigQuery to GCS with advanced file system properties field' and 'To verify data is getting transferred successfully from BigQuery to GCS with contenttype selection'. Why do we need a separate scenario for these?

Author:

Done.

Contributor:

The comment was to merge this scenario with the other two as well.

Author:

Done. Please review.

Then Verify data is transferred to target GCS bucket
Then Validate the values of records transferred to GCS bucket is equal to the values from source BigQuery table

@GCS_AVRO_FILE @GCS_SINK_TEST @GCS_Source_Required
Contributor:

Why is the GCS_Source_Required tag here?

Author:

Removed.

@bijay27bit force-pushed the E2EgcsNewChangesSink_BT branch from c4e93b9 to e8d2829 on January 17, 2025 06:37.
@bijay27bit force-pushed the E2EgcsNewChangesSink_BT branch 7 times, most recently from 04f605c to 8ef1631, on January 22, 2025 07:53.
@itsmekumari force-pushed the E2EgcsNewChangesSink_BT branch from 8ef1631 to 4c570a8 on February 3, 2025 09:27.
@itsmekumari force-pushed the E2EgcsNewChangesSink_BT branch from 4c570a8 to 9181e54 on February 4, 2025 05:02.
@AnkitCLI merged commit 0b391d8 into data-integrations:develop on Feb 6, 2025.
15 of 16 checks passed
Labels: build (Trigger unit test build)