
Commit cb6a3d0

Merge branch 'release/0.24' into snapshot-removal-6111
2 parents: a74431a + 506a768

71 files changed: 1611 additions & 479 deletions

docs/GCSFile-batchsource.md

Lines changed: 24 additions & 2 deletions
@@ -33,22 +33,44 @@ You also can use the macro function ${conn(connection-name)}.
 **Project ID:** Google Cloud Project ID, which uniquely identifies a project.
 It can be found on the Dashboard in the Google Cloud Platform Console.
 
+**Service Account Type:** Service account type, file path where the service account is located or the JSON content of
+the service account.
+
+**Service Account File Path:** Path on the local file system of the service account key. Can be set to 'auto-detect'.
+
+**Service Account JSON:** Contents of the service account JSON file.
+
 **Path:** Path to file(s) to be read. If a directory is specified, terminate the path name with a '/'.
 For example, `gs://<bucket>/path/to/directory/`.
 An asterisk ("\*") can be used as a wildcard to match a filename pattern.
 If no files are found or matched, the pipeline will fail.
 
 **Format:** Format of the data to read.
-The format must be one of 'avro', 'blob', 'csv', 'delimited', 'json', 'parquet', 'text', 'tsv', or the
+The format must be one of 'avro', 'blob', 'csv', 'delimited', 'json', 'parquet', 'text', 'tsv', 'xls', or the
 name of any format plugin that you have deployed to your environment.
 If the format is a macro, only the pre-packaged formats can be used.
 If the format is 'blob', every input file will be read into a separate record.
 The 'blob' format also requires a schema that contains a field named 'body' of type 'bytes'.
 If the format is 'text', the schema must contain a field named 'body' of type 'string'.
 
+**Get Schema:** Auto-detects schema from file. Supported formats are: avro, parquet, csv, delimited, tsv, blob, text, and xls.
+
+**Sample Size:** The maximum number of rows that will get investigated for automatic data type detection.
+The default value is 1000. This is only used when the format is 'xls'.
+
+**Override:** A list of columns with the corresponding data types for whom the automatic data type detection gets
+skipped. This is only used when the format is 'xls'.
+
+**Terminate Reading After Empty Row:** Specify whether to stop reading after encountering the first empty row. Defaults to false. When false the reader will read all rows in the sheet. This is only used when the format is 'xls'.
+
+**Select Sheet Using:** Select the sheet by name or number. Default is 'Sheet Number'. This is only used when the format is 'xls'.
+
+**Sheet Value:** The name/number of the sheet to read from. If not specified, the first sheet will be read.
+Sheet Numbers are 0 based, ie first sheet is 0. This is only used when the format is 'xls'.
+
 **Delimiter:** Delimiter to use when the format is 'delimited'. This will be ignored for other formats.
 
-**Use First Row as Header:** Whether to use first row as header. Supported formats are 'text', 'csv', 'tsv', 'delimited'.
+**Use First Row as Header:** Whether to use first row as header. Supported formats are 'text', 'csv', 'tsv', 'delimited', 'xls'.
 
 **Enable Quoted Values:** Whether to treat content between quotes as a value. This value will only be used if the format
 is 'csv', 'tsv' or 'delimited'. For example, if this is set to true, a line that looks like `1, "a, b, c"` will output two fields.
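
As a quick illustration of how the new 'xls' options above fit together, here is a minimal sketch of a GCS batch source properties block in pipeline-config JSON. The property keys (serviceAccountType, serviceFilePath, sampleSize, sheet, sheetValue, terminateIfEmptyRow, skipHeader) are assumptions inferred from the display names documented above, not verified against the plugin source; check them against the deployed plugin before use.

    {
      "name": "GCSFile",
      "type": "batchsource",
      "properties": {
        "referenceName": "gcs_xls_example",
        "project": "auto-detect",
        "serviceAccountType": "filePath",
        "serviceFilePath": "auto-detect",
        "path": "gs://my-bucket/reports/monthly.xlsx",
        "format": "xls",
        "sampleSize": "1000",
        "sheet": "Sheet Number",
        "sheetValue": "0",
        "terminateIfEmptyRow": "false",
        "skipHeader": "true"
      }
    }

With 'Select Sheet Using' left at its default of 'Sheet Number', a sheetValue of "0" reads the first sheet; sampleSize, terminateIfEmptyRow, and the header flag apply here only because the format is 'xls'.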

pom.xml

Lines changed: 94 additions & 113 deletions
@@ -55,25 +55,14 @@
     <url>https://issues.cask.co/browse/CDAP</url>
   </issueManagement>
 
-  <distributionManagement>
-    <repository>
-      <id>sonatype.release</id>
-      <url>https://oss.sonatype.org/service/local/staging/deploy/maven2</url>
-    </repository>
-    <snapshotRepository>
-      <id>sonatype.snapshots</id>
-      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
-    </snapshotRepository>
-  </distributionManagement>
-
   <properties>
     <jee.version>7</jee.version>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <avro.version>1.11.0</avro.version>
     <bigquery.connector.hadoop3.version>hadoop3-1.2.0</bigquery.connector.hadoop3.version>
     <commons.codec.version>1.4</commons.codec.version>
-    <cdap.version>6.11.0-SNAPSHOT</cdap.version>
-    <cdap.plugin.version>2.13.0-SNAPSHOT</cdap.plugin.version>
+    <cdap.version>6.11.0</cdap.version>
+    <cdap.plugin.version>2.13.0</cdap.plugin.version>
     <dropwizard.metrics-core.version>3.2.6</dropwizard.metrics-core.version>
     <flogger.system.backend.version>0.7.1</flogger.system.backend.version>
     <gcs.connector.version>hadoop3-2.2.21</gcs.connector.version>

@@ -118,19 +107,9 @@
   </dependencyManagement>
 
   <repositories>
-    <repository>
-      <id>sonatype</id>
-      <url>https://oss.sonatype.org/content/groups/public</url>
-      <releases>
-        <enabled>true</enabled>
-      </releases>
-      <snapshots>
-        <enabled>false</enabled>
-      </snapshots>
-    </repository>
     <repository>
       <id>sonatype-snapshots</id>
-      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
+      <url>https://central.sonatype.com/repository/maven-snapshots</url>
       <releases>
         <enabled>false</enabled>
       </releases>

@@ -394,78 +373,12 @@
           <groupId>org.apache.hbase</groupId>
           <artifactId>hbase-common</artifactId>
         </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
-      <!--
-        Required by bigtable-hbase-1.x-mapreduce instead of excluded non-shaded version.
-        Shaded library is used to avoid dependency conflicts with Datastore module on profobuf-java dependency.
-        Bigtable requires version 2.x and Datastore module requires 3.x protocol.
-      -->
-      <groupId>org.apache.hbase</groupId>
-      <artifactId>hbase-shaded-client</artifactId>
-      <version>${hbase-shaded-client.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>org.slf4j</groupId>
-          <artifactId>slf4j-log4j12</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>log4j</groupId>
-          <artifactId>log4j</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
-      <!--
-        Required by bigtable-hbase-1.x-mapreduce instead of excluded non-shaded version.
-        Shaded library is used to avoid dependency conflicts with Datastore module on profobuf-java dependency.
-        Bigtable requires version 2.x and Datastore module requires 3.x protocol.
-      -->
-      <groupId>org.apache.hbase</groupId>
-      <artifactId>hbase-shaded-server</artifactId>
-      <version>${hbase-shaded-server.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>org.slf4j</groupId>
-          <artifactId>slf4j-log4j12</artifactId>
-        </exclusion>
         <exclusion>
-          <groupId>log4j</groupId>
-          <artifactId>log4j</artifactId>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-common</artifactId>
         </exclusion>
       </exclusions>
     </dependency>
-    <dependency>
-      <groupId>io.dropwizard.metrics</groupId>
-      <artifactId>metrics-core</artifactId>
-      <version>${dropwizard.metrics-core.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>com.google.cloud</groupId>
-      <artifactId>google-cloud-bigquery</artifactId>
-      <version>${google.cloud.bigquery.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>com.google.crypto.tink</groupId>
-      <artifactId>tink</artifactId>
-      <version>${google.tink.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>com.google.crypto.tink</groupId>
-      <artifactId>tink-gcpkms</artifactId>
-      <version>${google.tink.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>com.google.cloud</groupId>
-      <artifactId>google-cloud-spanner</artifactId>
-      <version>${google.cloud.spanner.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>com.google.cloud</groupId>
-      <artifactId>google-cloud-datastore</artifactId>
-      <version>${google.cloud.datastore.version}</version>
-    </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>

@@ -546,6 +459,88 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <!--
+        Required by bigtable-hbase-1.x-mapreduce instead of excluded non-shaded version.
+        Shaded library is used to avoid dependency conflicts with Datastore module on profobuf-java dependency.
+        Bigtable requires version 2.x and Datastore module requires 3.x protocol.
+      -->
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-shaded-client</artifactId>
+      <version>${hbase-shaded-client.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.slf4j</groupId>
+          <artifactId>slf4j-log4j12</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>log4j</groupId>
+          <artifactId>log4j</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.htrace</groupId>
+          <artifactId>htrace-core</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <!--
+        Required by bigtable-hbase-1.x-mapreduce instead of excluded non-shaded version.
+        Shaded library is used to avoid dependency conflicts with Datastore module on profobuf-java dependency.
+        Bigtable requires version 2.x and Datastore module requires 3.x protocol.
+      -->
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-shaded-server</artifactId>
+      <version>${hbase-shaded-server.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.slf4j</groupId>
+          <artifactId>slf4j-log4j12</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>log4j</groupId>
+          <artifactId>log4j</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.htrace</groupId>
+          <artifactId>htrace-core</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-common</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>io.dropwizard.metrics</groupId>
+      <artifactId>metrics-core</artifactId>
+      <version>${dropwizard.metrics-core.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.cloud</groupId>
+      <artifactId>google-cloud-bigquery</artifactId>
+      <version>${google.cloud.bigquery.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.crypto.tink</groupId>
+      <artifactId>tink</artifactId>
+      <version>${google.tink.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.crypto.tink</groupId>
+      <artifactId>tink-gcpkms</artifactId>
+      <version>${google.tink.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.cloud</groupId>
+      <artifactId>google-cloud-spanner</artifactId>
+      <version>${google.cloud.spanner.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.cloud</groupId>
+      <artifactId>google-cloud-datastore</artifactId>
+      <version>${google.cloud.datastore.version}</version>
+    </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-mapreduce-client-core</artifactId>

@@ -1150,29 +1145,15 @@
         </execution>
       </executions>
     </plugin>
-
-    <plugin>
-      <groupId>org.apache.maven.plugins</groupId>
-      <artifactId>maven-release-plugin</artifactId>
-      <version>2.5.3</version>
-      <configuration>
-        <tag>v${releaseVersion}</tag>
-        <tagNameFormat>v@{project.version}</tagNameFormat>
-        <autoVersionSubmodules>true</autoVersionSubmodules>
-        <!-- releaseProfiles configuration will actually force a Maven profile
-          – the `releases` profile – to become active during the Release process. -->
-        <releaseProfiles>releases</releaseProfiles>
-      </configuration>
-    </plugin>
-
     <plugin>
-      <groupId>org.sonatype.plugins</groupId>
-      <artifactId>nexus-staging-maven-plugin</artifactId>
-      <version>1.6.2</version>
+      <groupId>org.sonatype.central</groupId>
+      <artifactId>central-publishing-maven-plugin</artifactId>
+      <version>0.8.0</version>
       <extensions>true</extensions>
       <configuration>
-        <nexusUrl>https://oss.sonatype.org</nexusUrl>
-        <serverId>sonatype.release</serverId>
+        <publishingServerId>sonatype.release</publishingServerId>
+        <autoPublish>false</autoPublish>
+        <ignorePublishedComponents>true</ignorePublishedComponents>
       </configuration>
     </plugin>
   </plugins>

@@ -1296,7 +1277,7 @@
     <dependency>
       <groupId>io.cdap.tests.e2e</groupId>
       <artifactId>cdap-e2e-framework</artifactId>
-      <version>0.4.0-SNAPSHOT</version>
+      <version>0.4.0</version>
       <scope>test</scope>
     </dependency>
     <dependency>

src/e2e-test/features/bigquery/source/BigQueryToBigQuery.feature

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ Feature: BigQuery source - Verification of BigQuery to BigQuery successful data
     Then Connect source as "BigQuery" and sink as "BigQuery" to establish connection
     Then Save the pipeline
     Then Preview and run the pipeline
-    Then Wait till pipeline preview is in running state
+    Then Wait till pipeline preview is in running state and check if any error occurs
     Then Open and capture pipeline preview logs
     Then Verify the preview run status of pipeline in the logs is "failed"

src/e2e-test/features/gcs/sink/GCSSinkError.feature

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Feature: GCS sink - Verify GCS Sink plugin error scenarios
     Then Open GCS sink properties
     Then Replace input plugin property: "project" with value: "projectId"
     Then Enter input plugin property: "referenceName" with value: "gcsReferenceName"
-    Then Enter GCS source property path "gcsInvalidBucketName"
+    Then Enter input plugin property: "path" with value: "gcsInvalidBucketName"
     Then Select GCS property format "csv"
     Then Click on the Validate button
     Then Verify that the Plugin Property: "path" is displaying an in-line error message: "errorMessageInvalidBucketName"

src/e2e-test/features/gcsmove/GCSMoveErrorScenarios.feature

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ Feature: GCSMove - Validate GCSMove plugin error scenarios
     When Expand Plugin group in the LHS plugins list: "Conditions and Actions"
     When Select plugin: "GCS Move" from the plugins list as: "Conditions and Actions"
     When Navigate to the properties page of plugin: "GCS Move"
-    Then Enter GCSMove property source path "gcsInvalidBucketName"
+    Then Enter input plugin property: "sourcePath" with value: "gcsInvalidBucketName"
     Then Enter GCSMove property destination path
     Then Verify GCS Move property "sourcePath" invalid bucket name error message is displayed for bucket "gcsInvalidBucketName"
 

@@ -30,5 +30,5 @@ Feature: GCSMove - Validate GCSMove plugin error scenarios
     When Select plugin: "GCS Move" from the plugins list as: "Conditions and Actions"
     When Navigate to the properties page of plugin: "GCS Move"
     Then Enter GCSMove property source path "gcsCsvFile"
-    Then Enter GCSMove property destination path "gcsInvalidBucketName"
+    Then Enter input plugin property: "destPath" with value: "gcsInvalidBucketName"
     Then Verify GCS Move property "destPath" invalid bucket name error message is displayed for bucket "gcsInvalidBucketName"
