## Data Factory Pipelines in Microsoft Fabric

Data Factory provides a modern way to integrate data, allowing you to collect, prepare, and transform data from various sources like databases, data warehouses, Lakehouse, real-time data, and more. Whether you're a beginner or an experienced developer, you can use intelligent transformations and a wide range of activities to process your data.

### Dataflows overview

Dataflows offer a low-code interface to ingest data from numerous sources and transform it using over 300 data transformations. The transformed data can be loaded into various destinations, such as Azure SQL databases. Dataflows can be run manually, on a schedule, or as part of a data pipeline.

### Important features of Dataflows

- **Low-Code Interface**: Ingest data from hundreds of sources.
- **Transformations**: Utilize 300+ data transformations.
- **Destinations**: Load data into multiple destinations like Azure SQL databases.
- **Execution**: Run dataflows manually, on a schedule, or within a data pipeline.

### Power Query integration

Dataflows are built using the Power Query experience, available across Microsoft products like Excel, Power BI, and Power Platform. Power Query enables users, from beginners to professionals, to perform data ingestion and transformations with ease. It supports joins, aggregations, data cleansing, custom transformations, and more, all through a user-friendly, visual, low-code interface.
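
To make those operations concrete, here is a minimal Python/pandas sketch of the kind of cleansing, join, and aggregation steps a dataflow performs through its visual interface; all table and column names are hypothetical:

```python
import pandas as pd

# Hypothetical source extracts; in a dataflow these would come from connectors.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, None],
    "amount": [250.0, 125.5, 80.0, 60.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20],
    "region": ["West", "East"],
})

# Data cleansing: drop rows with a missing key (a "remove empty" style step).
orders = orders.dropna(subset=["customer_id"])

# Join: the equivalent of Power Query's "Merge queries" step.
merged = orders.merge(customers, on="customer_id", how="inner")

# Aggregation: the equivalent of a "Group by" step.
sales_by_region = merged.groupby("region", as_index=False)["amount"].sum()
print(sales_by_region)
```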

### Real-World Use Cases for Dataflows

**Data Consolidation for Reporting**:
Organizations often have data spread across multiple sources such as databases, cloud storage, and on-premises systems. Dataflows can be used to consolidate this data into a single, unified dataset, which can then be used for reporting and analytics. For example, a company might use Dataflows to combine sales data from different regions into a single dataset for a comprehensive sales report. This single dataset can be further curated and promoted into a semantic model for use by a larger audience.
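
As a sketch of that consolidation pattern, the following Python snippet combines regional sales extracts into one unified dataset; the file names are placeholders for whatever sources a dataflow would actually connect to:

```python
import pandas as pd

# Placeholder extracts; in practice each region might live in a different
# system (database, cloud storage, on-premises) reached through a connector.
regional_files = ["sales_west.csv", "sales_east.csv", "sales_south.csv"]

frames = []
for path in regional_files:
    df = pd.read_csv(path)
    df["source"] = path  # keep lineage so the report can trace each row
    frames.append(df)

# One unified dataset, ready to curate and promote into a semantic model.
sales = pd.concat(frames, ignore_index=True)
```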

**Data Preparation for Machine Learning**:
Dataflows can be used to prepare and clean data for machine learning models. This preparation includes tasks such as data cleansing, transformation, and feature engineering. For instance, a data science team might use Dataflows to preprocess customer data, removing duplicates and normalizing values before feeding it into a machine learning model.
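
A minimal Python sketch of that preprocessing, assuming a hypothetical customer extract with duplicate rows and unscaled numeric features:

```python
import pandas as pd

# Hypothetical customer extract with the kinds of issues a dataflow would fix.
customers = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "age": [34, 34, 51, 29],
    "monthly_spend": [120.0, 120.0, 980.0, 45.0],
})

# Remove duplicate records.
customers = customers.drop_duplicates(subset=["customer_id"])

# Min-max normalize the numeric features into [0, 1] before model training.
for col in ["age", "monthly_spend"]:
    lo, hi = customers[col].min(), customers[col].max()
    customers[col] = (customers[col] - lo) / (hi - lo)
```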

**Real-Time Data Processing**:
Dataflows can handle real-time data ingestion and transformation, making them ideal for scenarios where timely data processing is crucial. For example, an e-commerce platform might use Dataflows to process real-time transaction data, updating inventory levels and generating real-time sales reports.
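
Dataflows provide this through their low-code interface; purely as a code-level analogy for what "processing a live feed" means, here is a minimal Spark Structured Streaming sketch (a swapped-in technique, not the dataflow mechanism itself) that uses the synthetic `rate` source to stand in for a transaction feed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("realtime-analogy").getOrCreate()

# The built-in "rate" source emits synthetic rows continuously; it stands in
# here for a real transaction feed.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Maintain a running count of "transactions" as events arrive.
counts = stream.groupBy().count()

query = (
    counts.writeStream
          .outputMode("complete")
          .format("console")
          .start()
)
query.awaitTermination(30)  # run briefly for the demo, then stop
```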

**Data Migration**:
When migrating data from legacy systems to modern platforms, Dataflows can be used to extract, transform, and load (ETL) data into the new system. This process ensures that data is accurately and efficiently transferred, minimizing downtime and data loss. For instance, a company migrating from an on-premises database to Azure SQL Database might use Dataflows to handle the data migration process.
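
As a rough illustration of that extract-transform-load flow outside the dataflow UI, here is a Python sketch that copies a table from a legacy SQL Server to Azure SQL Database in chunks; both connection strings and the table names are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings; substitute real servers and credentials.
legacy = create_engine(
    "mssql+pyodbc://user:pass@legacy-server/LegacyDb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
target = create_engine(
    "mssql+pyodbc://user:pass@myserver.database.windows.net/ModernDb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)

# Extract in chunks so large tables never have to fit in memory.
for chunk in pd.read_sql("SELECT * FROM dbo.Orders", legacy, chunksize=10_000):
    # Transform: a trivial example -- align column names with the new schema.
    chunk.columns = [c.lower() for c in chunk.columns]
    # Load: append each chunk into the destination table.
    chunk.to_sql("orders", target, schema="dbo", if_exists="append", index=False)
```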

**Self-Service Data Preparation**:
Dataflows provide a low-code interface that allows business users to prepare their own data without needing extensive technical knowledge. This approach empowers users to create their own dataflows for tasks such as data cleansing, transformation, and enrichment, reducing the dependency on IT teams. For example, a marketing team might use Dataflows to prepare campaign data for analysis.

These use cases demonstrate the flexibility and power of Dataflows in handling various data integration and transformation tasks, and they showcase a powerful self-service capability. Self-service might be more appealing to your organization's business users while still providing a roadmap to a larger ELT project that utilizes pipelines and notebooks.

### Data Pipelines

Data pipelines offer powerful workflow capabilities at cloud scale, enabling you to build complex workflows that can refresh your dataflow, move petabytes of data, and define sophisticated control flow pipelines.

### Important features of data pipelines

- **Complex Workflows**: Build workflows that can refresh dataflows, move large volumes of data, and define control flow pipelines.
- **ETL and Data Factory Workflows**: Create complex ETL (Extract, Transform, Load) and data factory workflows that perform various tasks at scale.
- **Control Flow Capabilities**: Utilize built-in control flow features to build workflow logic with loops and conditionals.

### End-to-End ETL Data Pipeline

Combine a configuration-driven copy activity with your low-code dataflow refresh in a single pipeline for a complete ETL data pipeline. You can also add code-first activities for Spark Notebooks, SQL scripts, stored procedures, and more.
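
To show how such a pipeline can also be started programmatically, here is a sketch that triggers an on-demand pipeline run through the Fabric REST API; the workspace ID, item ID, and token are placeholders, and the endpoint follows Fabric's job-scheduler pattern, so verify the exact path against the current documentation:

```python
import requests

# Placeholders; substitute your workspace ID, the pipeline's item ID, and a
# valid Microsoft Entra access token for the Fabric API.
WORKSPACE_ID = "<workspace-guid>"
PIPELINE_ID = "<pipeline-item-guid>"
TOKEN = "<access-token>"

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

# An accepted run returns 202; the run's status URL is in the Location header.
print(resp.status_code, resp.headers.get("Location"))
```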

## Notebooks in Microsoft Fabric

- **Interactive Data Exploration:** Notebooks allow users to interactively explore and analyze data, making it easier to understand and manipulate datasets (see the sketch after this list).
- **Multi-language Support:** Users can write and execute code in multiple languages within the same notebook, enhancing flexibility and collaboration.
- **Visualization:** Notebooks support rich data visualization, enabling users to create charts, graphs, and other visual representations of data.
- **Collaboration:** Notebooks facilitate collaboration by allowing multiple users to work on the same document simultaneously, share insights, and track changes.
- **Integration with Fabric Services:** Notebooks seamlessly integrate with other Microsoft Fabric services, such as Data Factory, Synapse Data Engineering, and Synapse Data Science. This approach provides a unified platform for end-to-end data workflows.
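
As a small illustration of that interactive workflow, here is what a Fabric notebook cell might look like, assuming a hypothetical Lakehouse table named `sales` (in a Fabric notebook the Spark session is pre-created as `spark`):

```python
# Interactive exploration: read a Lakehouse table and inspect its structure.
df = spark.read.table("sales")  # "sales" is a placeholder table name
df.printSchema()

# Aggregate with Spark, then hand off to pandas for an inline chart.
summary = (
    df.groupBy("region")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "total_amount")
)

import matplotlib.pyplot as plt

pdf = summary.toPandas()
pdf.plot.bar(x="region", y="total_amount")  # rendered inline by the notebook
plt.show()
```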

When comparing these technologies, it's important to note that while Data Factory focuses on data integration and pipeline automation, notebooks in Microsoft Fabric provide an interactive and ***collaborative*** environment for data exploration, documentation, transformation, and analysis. Both tools complement each other, offering a comprehensive solution for managing and analyzing data within the Microsoft Fabric ecosystem.