viveknaskar/cloud-dataflow-template-poc

Cloud Dataflow Template Proof-of-Concept

Code to create a Google Cloud Dataflow pipeline in Java, based on Apache Beam's WordCount example. The pipeline reads a text file stored in a GCS bucket and writes the number of occurrences of each word to another GCS bucket.
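To make the counting step concrete, here is a minimal plain-Java sketch of the logic, with no Beam or Dataflow dependency. It is not the code in this repository: the class and method names (`WordCountSketch`, `countWords`, `formatCounts`) are hypothetical, and it assumes words are split on runs of non-letter characters and written out as `word: count`, as in Beam's WordCount example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Assumption: split on runs of non-letter characters, as Beam's WordCount example does.
    private static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

    /** Counts how often each word occurs across the given lines (case-sensitive). */
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(TOKENIZER_PATTERN)) {
                // split() can yield empty tokens (e.g. for blank lines); skip them.
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    /** Formats each entry as "word: count", one string per output line. */
    static List<String> formatCounts(Map<String, Long> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            out.add(entry.getKey() + ": " + entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        formatCounts(countWords(lines)).forEach(System.out::println);
    }
}
```

In the actual pipeline the same steps run as distributed transforms, with the input lines read from and the formatted counts written to GCS rather than in-memory lists.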

Run the following Maven command to generate the example Dataflow project:

 mvn archetype:generate \
     -DarchetypeArtifactId=google-cloud-dataflow-java-archetypes-examples \
     -DarchetypeGroupId=com.google.cloud.dataflow \
     -DarchetypeVersion=2.5.0 \
     -DgroupId=com.viveknaskar \
     -DartifactId=dataflow-template-poc \
     -Dversion="0.1" \
     -DinteractiveMode=false \
     -Dpackage=com.viveknaskar

This generates sample code, including the standard Dataflow WordCount example. You can edit that code to create a template.

Run the following command to create the template:

 mvn compile exec:java \
     -Dexec.mainClass=com.viveknaskar.WordCount \
     -Dexec.args="--project=<your-project-id> \
     --stagingLocation=gs://dataflow-pipeline-staging/staging \
     --dataflowJobFile=gs://dataflow-pipeline-staging/templates/dataflow-template \
     --gcpTempLocation=gs://dataflow-pipeline-staging/tmp \
     --output=gs://dataflow-pipelines-output \
     --runner=DataflowRunner"

Before running the above Maven command, create the GCS buckets it references (dataflow-pipeline-staging and dataflow-pipelines-output in this example) in your GCP project.
