DataOpsDemo

Demo for DataOps solutions with Bicep IAC + SSDT + ADF. Currently the solution is focused on Bicep IAC, with SSDT + ADF to follow. The solution is based on my working experience with these technologies in customer deployments. FYI:

  • This code is provided as-is, without warranty.
  • I DO NOT recommend running this code in any production environment unless you have a full and proper understanding of the entire codebase.
  • There are multiple opportunities to optimise this code; it is provided as an intro & sample ONLY.

Refs

PreReqs

Steps before running prereq script

These items are not completed by the prereq script and will need to be done manually, ideally before running the prereq script:

  • create AAD SQL Admin groups for each environment
  • get the GUID (object ID) for each AAD SQL Admin group
    • consider whether you need to add the GUIDs of the RG-level service principals to the AAD SQL Admins group for the environment; this would allow administrative actions to be conducted from the AzDO pipelines by using the relevant service connection
  • gather your own Azure subscription IDs
  • update the Bicep templates & pipelines to substitute in your own GUIDs (for the SQL Admin groups and Azure subscription IDs); a sketch of gathering these values follows this list
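
As a minimal sketch, assuming the Azure CLI and placeholder group names of my own choosing (substitute your own naming convention), the groups and GUIDs could be gathered like this:

```powershell
# Sign in first with: az login
# Create an AAD group per environment and capture its object ID (GUID).
foreach ($stage in @('Dev', 'Test', 'Prod')) {
    az ad group create `
        --display-name "sql-admins-$stage" `
        --mail-nickname "sql-admins-$stage"

    # This object ID is the GUID to substitute into the Bicep templates & pipelines.
    $groupId = az ad group show --group "sql-admins-$stage" --query id -o tsv
    Write-Output "$stage SQL Admins group GUID: $groupId"
}

# List your subscription IDs for the same substitution step.
az account list --query '[].{Name:name, SubscriptionId:id}' -o table
```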

Running prereq script

The prerequisite items below are almost all handled within the PreReq-Setup.ps1 PowerShell/Azure CLI script. They are listed out so you can confirm you have completed the relevant steps.

  • create an Azure DevOps organisation
  • create an Azure DevOps project
  • create the service principals that will back each service connection
  • create service connections in your DevOps project for each of your environments (Dev/Test/Prod) using the relevant service principals (or managed identities)
    • if you want to manage permissions within your IAC, your service principal will need either Owner permissions (at the RG/sub level) or a custom role granting it permission to create role assignments
    • you may want to consider having two service connections, one at the sub level and one at the RG level; you would use the sub-level service connection for privileged activities and the RG-level service connection for local activities
    • please review the Refs before proceeding
  • create folders for your pipelines in the project
  • create the pipelines for your project
  • run the initial deployment of the Core Infrastructure (an illustrative sketch follows this list)
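
PreReq-Setup.ps1 automates these steps; purely as an illustration of the moving parts (the organisation URL, names, subscription ID, and YAML path below are placeholders, not values from this repo), the Azure DevOps pieces map to commands along these lines:

```powershell
# Requires the Azure DevOps CLI extension: az extension add --name azure-devops
$org = 'https://dev.azure.com/YourOrg'   # placeholder organisation URL

# Create the project and make it the default for subsequent commands.
az devops project create --name 'DataOpsDemo' --organization $org
az devops configure --defaults organization=$org project=DataOpsDemo

# One service principal per environment. Owner (or a custom role that can
# create role assignments) is only needed if the IAC manages permissions.
$sp = az ad sp create-for-rbac --name 'sc-dataops-dev' `
        --role Owner --scopes '/subscriptions/<your-sub-id>' -o json |
      Out-String | ConvertFrom-Json

# Service connection (ARM service endpoint) backed by that principal.
# The CLI reads the secret from this environment variable.
$env:AZURE_DEVOPS_EXT_AZURE_RM_SERVICE_PRINCIPAL_KEY = $sp.password
az devops service-endpoint azurerm create `
    --name 'sc-dataops-dev' `
    --azure-rm-service-principal-id $sp.appId `
    --azure-rm-subscription-id '<your-sub-id>' `
    --azure-rm-subscription-name 'Dev' `
    --azure-rm-tenant-id $sp.tenant

# Pipeline folder, pipeline from a YAML definition, and the initial run.
az pipelines folder create --path '\Core'
az pipelines create --name 'core-infra' --repository 'DataOpsDemo' `
    --repository-type tfsgit --branch main --yml-path 'pipelines/core-infra.yml'
az pipelines run --name 'core-infra'
```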

Post running prereq script

You will need to set up branch policies (approvers - minimum 1) on the main branch so that commits cannot be pushed directly to main. (Branch policies note: if any required policy is enabled, the branch cannot be deleted and changes must be made via pull request.) This ensures there is an opportunity to review the code before it is released to prod.

  • It is recommended not to allow the person who created a PR to approve their own PR (the sketch following this list enforces this via the creator-vote setting)
  • You should review the other available branch policies to see whether any others are worth implementing
  • You should review security policies on the release environments to see whether you would like to implement any environment-based controls (e.g. 'environment owner approvals' or 'release windows outside core business hours')
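
A minimal sketch of the minimum-approver policy via the Azure DevOps CLI (the repository name is a placeholder; setting --creator-vote-counts to false enforces the no-self-approval recommendation):

```powershell
# Look up the repository ID, then require at least one reviewer on main.
$repoId = az repos show --repository 'DataOpsDemo' --query id -o tsv
az repos policy approver-count create `
    --repository-id $repoId `
    --branch main `
    --blocking true `
    --enabled true `
    --minimum-approver-count 1 `
    --creator-vote-counts false `
    --allow-downvotes false `
    --reset-on-source-push true
```

Environment-based controls such as owner approvals and release windows are configured per environment in the Azure DevOps portal, under Pipelines > Environments > Approvals and checks.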

Overview

One of the biggest things we have identified with DataOps is that the people are just as important as the process, e.g. where is the delineation between what the Data Engineer does and what the DevOps Engineer does?

DevOps = People + Process + Technology

Data DevOps issues then build from people not having a full understanding of how to apply DevOps to data. This is a problem because DevOps for data is relatively new compared to DevOps for Infrastructure as Code (IAC) or applications.

This repo aims to provide a relatively simple working example of a DevOps'd IAC environment, with DataOps set up for the pipelines that ingest data into the data lake.

For more information on DataOps, review these two repositories:

DataOps Background

A DataOps approach improves a project's ability to stay on target and on time. DataOps is an emerging discipline that brings together DevOps teams with data engineers and data scientists to provide the tools, processes, and organisational structures to support the data-focused enterprise.

Advantages of a DataOps Approach

  • Faster time-to-value & a better ability to pivot and respond to real-world events as they happen
  • Better collaboration/communication across skill groups
  • Focused around data-related goals
  • Improved efficiency and better use of people's time
  • A good fit for working with a (global) data fabric

What is DataOps?

What is DataOps, exactly, and why are companies planning to invest in it? In a nutshell, DataOps controls the flow of data from source to value, speeding up the process of deriving value from data. Fundamentally, DataOps ensures that processes and systems that control the data journey are scalable and repeatable.

The activities that fall under the DataOps umbrella include integrating with data sources, performing transformations, converting data formats, and writing or delivering data to its required destination. DataOps also encompasses the monitoring and governance of data flows while ensuring security.

Issues related to a lack of DataOps:

  • Poor teamwork within the data team
  • Lack of collaboration between groups within the data organization
  • Waiting for IT to provision or configure system resources
  • Waiting for access to data
  • Moving slowly and cautiously to avoid poor quality
  • Requiring approvals, such as from an Impact Review Board
  • Inflexible data architectures
  • Process bottlenecks
  • Technical debt from previous deployments
  • Poor quality creating unplanned work
