
diff --git a/Workloads-Specific/DataFactory/BestPractices.md b/Workloads-Specific/DataFactory/BestPractices.md
index 2b39df7..f288921 100644
--- a/Workloads-Specific/DataFactory/BestPractices.md
+++ b/Workloads-Specific/DataFactory/BestPractices.md
@@ -1,12 +1,12 @@
-# Azure Data Factory (ADF) Best Practices - Overview
+# Azure Data Factory (ADF) Best Practices - Overview
Costa Rica
-[](https://github.com)
+[](https://github.com)
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
----------
@@ -28,36 +28,36 @@ Last updated: 2025-04-16
- [Architecture examples](#architecture-examples)
- [Best Practices for ADF Pipelines](#best-practices-for-adf-pipelines)
- - [Clear Pipeline Structure](#clear-pipeline-structure)
- - [Example Pipeline Structure](#example-pipeline-structure)
- - [Parameterization](#parameterization)
- - [Incremental Loading](#incremental-loading)
- - [Use Timestamps](#use-timestamps)
- - [Change Data Capture CDC](#change-data-capture-cdc)
- - [Delta Loads](#delta-loads)
- - [Partitioning](#partitioning)
- - [Error Handling and Monitoring](#error-handling-and-monitoring)
- - [a. Use If Condition Activity](#a-use-if-condition-activity)
- - [b. Configure Activity Fault Tolerance](#b-configure-activity-fault-tolerance)
- - [c. Custom Error Handling: Use Web Activity for error handling](#c-custom-error-handling-use-web-activity-for-error-handling)
- - [d. Pipeline Monitoring: Monitor activity runs.](#d-pipeline-monitoring-monitor-activity-runs)
- - [Security Measures](#security-measures)
- - [Use Azure Key Vault](#use-azure-key-vault)
- - [Store Secrets](#store-secrets)
- - [Access Policies](#access-policies)
- - [Secure Access](#secure-access)
- - [Rotate Secrets](#rotate-secrets)
- - [Source Control](#source-control)
- - [Resource Management](#resource-management)
- - [Testing and Validation](#testing-and-validation)
- - [Documentation](#documentation)
- - [Regular Updates](#regular-updates)
- - [Performance Tuning](#performance-tuning)
+ - [Clear Pipeline Structure](#clear-pipeline-structure)
+ - [Example Pipeline Structure](#example-pipeline-structure)
+ - [Parameterization](#parameterization)
+ - [Incremental Loading](#incremental-loading)
+ - [Use Timestamps](#use-timestamps)
+ - [Change Data Capture CDC](#change-data-capture-cdc)
+ - [Delta Loads](#delta-loads)
+ - [Partitioning](#partitioning)
+ - [Error Handling and Monitoring](#error-handling-and-monitoring)
+ - [a. Use If Condition Activity](#a-use-if-condition-activity)
+ - [b. Configure Activity Fault Tolerance](#b-configure-activity-fault-tolerance)
+ - [c. Custom Error Handling: Use Web Activity for error handling](#c-custom-error-handling-use-web-activity-for-error-handling)
+ - [d. Pipeline Monitoring: Monitor activity runs.](#d-pipeline-monitoring-monitor-activity-runs)
+ - [Security Measures](#security-measures)
+ - [Use Azure Key Vault](#use-azure-key-vault)
+ - [Store Secrets](#store-secrets)
+ - [Access Policies](#access-policies)
+ - [Secure Access](#secure-access)
+ - [Rotate Secrets](#rotate-secrets)
+ - [Source Control](#source-control)
+ - [Resource Management](#resource-management)
+ - [Testing and Validation](#testing-and-validation)
+ - [Documentation](#documentation)
+ - [Regular Updates](#regular-updates)
+ - [Performance Tuning](#performance-tuning)
- [Recommended Training Modules on Microsoft Learn](#recommended-training-modules-on-microsoft-learn)
-## Architecture examples
+## Architecture examples

@@ -78,7 +78,6 @@ Last updated: 2025-04-16
| **Organized Layout** | Arrange activities in a logical sequence and avoid overlapping lines. | - Place activities in a left-to-right or top-to-bottom flow to visually represent the data flow. <br> - Group related activities together and use containers for better organization. |
| **Error Handling and Logging**| Include error handling and logging activities to capture and manage errors. | - Add a Web Activity to log errors to a monitoring system. <br> - Use Try-Catch blocks to handle errors gracefully and ensure the pipeline continues running. |
-
#### Example Pipeline Structure
> Pipeline: CopySalesDataPipeline
@@ -107,9 +106,9 @@ graph TD
- If needed, add parameters:

-
+

-
+

- **Activities Inside ForEach**:
@@ -124,7 +123,6 @@ graph TD

-
- **Set Variable Activity**: Log the status of the copy operation.
- **Name**: `LogStatus`
- **Annotation**: `Log the status of the copy operation`
@@ -138,6 +136,7 @@ graph TD

### Parameterization
+>
> Use parameters to make your pipelines more flexible and easier to manage.
| **Best Practice** | **Description** | **Example** |
@@ -148,6 +147,7 @@ graph TD
| **Parameterize Datasets** | Parameterize datasets to handle different data sources or destinations. | - Create a dataset with a parameterized file path to handle different file names dynamically. <br> - Use parameters in datasets to switch between different databases or tables. <br> - Define parameters for connection strings to dynamically connect to different data sources. |
### Incremental Loading
+>
> Implement incremental data loading to improve efficiency.
| **Best Practice** | **Description** | **Example** |
@@ -176,6 +176,7 @@ graph TD
- Use a Stored Procedure activity to update the `LastLoadedTimestamp` in the watermark table.
#### Change Data Capture (CDC)
+>
> Utilize CDC to capture and load only the changes made to the source data.
1. **Enable CDC on Source Table**:
@@ -189,6 +190,7 @@ graph TD
- Inside the ForEach activity, use Copy Data activities to apply the changes to the destination.
#### Delta Loads
+>
> Perform delta loads to update only the changed data instead of full loads.
1. **Track Changes**:
@@ -202,6 +204,7 @@ graph TD
- After loading, reset the `ChangeFlag` to 0.
#### Partitioning
+>
> Partition large datasets to improve performance and manageability.
1. **Partition Your Data**:
@@ -215,6 +218,7 @@ graph TD
- Inside the ForEach activity, use a Copy Data activity to load data for each partition.
### Error Handling and Monitoring
+>
> Set up robust error handling and monitoring to quickly identify and resolve issues.
| **Best Practice** | **Description** | **Example** |
@@ -225,6 +229,7 @@ graph TD
| **Custom Logging** | Implement custom logging to capture detailed error information. | - Use a Web Activity to log errors to an external logging service or database. <br> - Implement an Azure Function to log detailed error information and call it from the pipeline. <br> - Use a Set Variable activity to capture error details and write them to a log file in Azure Blob Storage. |
#### a. **Use If Condition Activity**
+
1. **Create a Pipeline**:
- Open Microsoft Fabric and navigate to Azure Data Factory.
@@ -251,6 +256,7 @@ graph TD

#### b. **Configure Activity Fault Tolerance**
+
1. **Set Retry Policy**:
- Select an activity within your pipeline.
- In the activity settings, configure the retry policy by specifying the number of retries and the interval between retries.
@@ -271,17 +277,18 @@ graph TD

-#### d. **Pipeline Monitoring**: Monitor activity runs.
+#### d. **Pipeline Monitoring**: Monitor activity runs
- In the ADF monitoring interface, navigate to the `Monitor` section; if you don't see it, click on `...`.
- Check the status of individual activities within your pipelines for success, failure, and skipped activities, or search for a specific pipeline.
- Click on the activity to see the `Details`, and click on the `Pipeline Run ID`:

-
+

### Security Measures
+>
> Apply security best practices to protect your data.
| **Best Practice** | **Description** | **Example** |
@@ -292,6 +299,7 @@ graph TD
| **Audit Logs** | Enable auditing to track access and changes to ADF resources. | - Use Azure Monitor to collect and analyze audit logs for ADF activities. <br> - Enable diagnostic settings to send logs to Azure Log Analytics, Event Hubs, or a storage account. <br> - Regularly review audit logs to detect and respond to unauthorized access or changes. |
### Use Azure Key Vault
+>
> Store sensitive information such as connection strings, passwords, and API keys in Azure Key Vault to enhance security and manage secrets efficiently.
| **Best Practice** | **Description** | **Example** |
@@ -325,8 +333,8 @@ graph TD

-
#### Access Policies
+>
> Configure access policies to control who can access secrets.
1. **Set Up Access Policies in Key Vault**:
@@ -343,10 +351,12 @@ graph TD
> Use managed identities to securely access Key Vault secrets.
**Grant Key Vault Access to Managed Identity**:
- - In the Key Vault, add an access policy to grant the Data Factory managed identity access to the required secrets.
- - Example: Grant `Get` and `List` permissions to the managed identity.
+
+- In the Key Vault, add an access policy to grant the Data Factory managed identity access to the required secrets.
+- Example: Grant `Get` and `List` permissions to the managed identity.
#### Rotate Secrets
+>
> Regularly rotate secrets to enhance security.
1. **Update Secrets in Key Vault**:
@@ -359,10 +369,10 @@ graph TD
- Ensure that relevant teams are notified when secrets are rotated.
- Example: Use Logic Apps to send email notifications when secrets are updated.
-
-### Source Control
+### Source Control
> Benefits of Git Integration:
+>
> - **Version Control**: Track and audit changes, and revert to previous versions if needed.
> - **Collaboration**: Multiple team members can work on the same project simultaneously.
> - **Incremental Saves**: Save partial changes without publishing them live.
@@ -397,6 +407,7 @@ graph TD
- Collaborate with team members through code reviews and comments.
### Resource Management
+>
> Optimize resource usage to improve performance and reduce costs.
| **Best Practice** | **Description** | **Example** |
@@ -407,6 +418,7 @@ graph TD
| **Resource Tagging** | Tag resources for better organization and cost tracking. | - Apply tags to ADF resources to categorize and track costs by project or department. <br> - Use tags to identify and manage resources associated with specific business units. <br> - Implement tagging policies to ensure consistent resource tagging across the organization. |
### Testing and Validation
+>
> Regularly test and validate your pipelines to ensure they work as expected.
| **Best Practice** | **Description** | **Example** |
@@ -417,6 +429,7 @@ graph TD
| **Automated Testing** | Automate testing processes to ensure consistency and reliability. | - Use Azure DevOps pipelines to automate the testing of ADF pipelines. <br> - Schedule automated tests to run after each deployment or code change. <br> - Integrate automated testing with CI/CD pipelines to ensure continuous validation. |
### Documentation
+>
> Maintain comprehensive documentation for your pipelines.
| **Best Practice** | **Description** | **Example** |
@@ -427,6 +440,7 @@ graph TD
| **Knowledge Sharing** | Share documentation with the team to ensure everyone is informed. | - Use a shared platform like SharePoint or Confluence to store and share documentation. <br> - Conduct regular training sessions to keep the team updated on best practices. <br> - Encourage team members to contribute to and update the documentation. |
### Regular Updates
+>
> Keep your pipelines and ADF environment up to date.
| **Best Practice** | **Description** | **Example** |
@@ -437,6 +451,7 @@ graph TD
| **Security Patches** | Apply security patches promptly to protect against vulnerabilities. | - Monitor security advisories and apply patches to ADF and related services. <br> - Implement a patch management process to ensure timely updates. <br> - Conduct regular security assessments to identify and address vulnerabilities. |
### Performance Tuning
+>
> Continuously monitor and tune performance.
| **Best Practice** | **Description** | **Example** |
@@ -447,6 +462,7 @@ graph TD
| **Resource Allocation** | Allocate resources efficiently to balance performance and cost. | - Adjust the number of Data Integration Units (DIUs) based on workload requirements. <br> - Use resource groups to manage and allocate resources effectively. <br> - Monitor resource usage and adjust allocations to optimize performance. |
## Recommended Training Modules on Microsoft Learn
+
- [Introductory training modules for Azure Data Factory](https://learn.microsoft.com/en-us/azure/data-factory/quickstart-learn-modules)
- [Quickstart: Get started with Azure Data Factory](https://learn.microsoft.com/en-us/azure/data-factory/quickstart-get-started)
- [Introduction to Azure Data Factory](https://learn.microsoft.com/en-us/training/modules/intro-to-azure-data-factory/): This module covers the basics of ADF and how it can help integrate your data sources
diff --git a/Workloads-Specific/DataFactory/HowMonitorChanges.md b/Workloads-Specific/DataFactory/HowMonitorChanges.md
index 100ec65..197adb1 100644
--- a/Workloads-Specific/DataFactory/HowMonitorChanges.md
+++ b/Workloads-Specific/DataFactory/HowMonitorChanges.md
@@ -5,7 +5,7 @@ Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
----------
@@ -60,7 +60,7 @@ Last updated: 2025-04-16

-## Create a pipeline
+## Create a pipeline
1. **Log in to Azure Portal**: Open your web browser and go to the Azure Portal. Enter your credentials to log in.
2. **Go to Data Factory**: Use the search bar at the top to search for `Data Factory` and select your Data Factory instance from the list.
@@ -108,7 +108,7 @@ Last updated: 2025-04-16

- ## How to see who modified a pipeline
+## How to see who modified a pipeline
1. **Log in to Azure Portal**: Open your web browser and go to the Azure Portal. Enter your credentials to log in.
2. **Go to Azure Data Factory**: Once logged in, use the search bar at the top to search for `Data Factory` and select your Data Factory instance from the list.
diff --git a/Workloads-Specific/DataScience/AI_integration/README.md b/Workloads-Specific/DataScience/AI_integration/README.md
new file mode 100644
index 0000000..59099bc
--- /dev/null
+++ b/Workloads-Specific/DataScience/AI_integration/README.md
@@ -0,0 +1,403 @@
+# Demonstration: How to integrate AI in Microsoft Fabric
+
+Costa Rica
+
+[](https://github.com/)
+[brown9804](https://github.com/brown9804)
+
+Last updated: 2025-04-21
+
+------------------------------------------
+
+> Fabric's OneLake datastore provides a unified data storage solution that supports different data formats and sources. This feature simplifies data access and management, enabling efficient data preparation and model training.
+
+
+List of References (Click to expand)
+
+- [Unleashing the Power of Microsoft Fabric and SynapseML](https://blog.fabric.microsoft.com/en-us/blog/unleashing-the-power-of-synapseml-and-microsoft-fabric-a-guide-to-qa-on-pdf-documents-2)
+- [Building a RAG application with Microsoft Fabric](https://techcommunity.microsoft.com/t5/startups-at-microsoft/building-high-scale-rag-applications-with-microsoft-fabric/ba-p/4217816)
+- [Building Custom AI Applications with Microsoft Fabric: Implementing Retrieval-Augmented Generation](https://support.fabric.microsoft.com/en-us/blog/building-custom-ai-applications-with-microsoft-fabric-implementing-retrieval-augmented-generation-for-enhanced-language-models?ft=Alicia%20Li%20%28ASA%29:author)
+- [Avail the Power of Microsoft Fabric from within Azure Machine Learning](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/avail-the-power-of-microsoft-fabric-from-within-azure-machine/ba-p/3980702)
+- [AI and Machine Learning on Databricks - Azure Databricks | Microsoft Learn](https://learn.microsoft.com/en-us/azure/databricks/machine-learning)
+- [Training and Inference of LLMs with PyTorch Fully Sharded Data Parallel](https://techcommunity.microsoft.com/t5/microsoft-developer-community/training-and-inference-of-llms-with-pytorch-fully-sharded-data/ba-p/3845995)
+- [Harness the Power of LangChain in Microsoft Fabric for Advanced Document Summarization](https://blog.fabric.microsoft.com/en-us/blog/harness-the-power-of-langchain-in-microsoft-fabric-for-advanced-document-summarization)
+- [Integrating Azure AI and Microsoft Fabric for Next-Gen AI Solutions](https://build.microsoft.com/en-US/sessions/91971ab3-93e4-429d-b2d7-5b60b2729b72)
+- [Generative AI with Microsoft Fabric](https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/generative-ai-with-microsoft-fabric/ba-p/4219444)
+- [Harness Microsoft Fabric AI Skill to Unlock Context-Rich Insights from Your Data](https://blog.fabric.microsoft.com/en-us/blog/harness-microsoft-fabric-ai-skill-to-unlock-context-rich-insights-from-your-data)
+- [LangChain-AzureOpenAI Parameter API Reference](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.azure.AzureChatOpenAI.html#)
+
+
+
+
+Table of Content (Click to expand)
+
+- [Overview](#overview)
+- [Demo](#demo)
+ - [Set Up Your Environment](#set-up-your-environment)
+ - [Install Required Libraries](#install-required-libraries)
+ - [Configure Azure OpenAI Service](#configure-azure-openai-service)
+ - [Basic Usage of LangChain Transformer](#basic-usage-of-langchain-transformer)
+ - [Using LangChain for Large Scale Literature Review](#using-langchain-for-large-scale-literature-review)
+ - [Machine Learning Integration with Microsoft Fabric](#machine-learning-integration-with-microsoft-fabric)
+
+
+
+## Overview
+
+> Microsoft Fabric is a comprehensive data analytics platform that brings together various data services to provide an end-to-end solution for data engineering, data science, data warehousing, real-time analytics, and business intelligence. It's designed to simplify the process of working with data and to enable organizations to gain insights more efficiently.
+> Capabilities Enabled by LLMs:
+>
+> - `Document Summarization`: LLMs can process and summarize large documents, making it easier to extract key information.
+> - `Question Answering:` Users can perform Q&A tasks on PDF documents, allowing for interactive data exploration.
+> - `Embedding Generation`: LLMs can generate embeddings for document chunks, which can be stored in a vector store for efficient search and retrieval.
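+
+As a quick illustration of the embedding capability above, here is a minimal, hedged sketch. It assumes the `langchain-openai` and `langchain_community` packages installed later in this demo, a `faiss-cpu` package (not covered here), and a hypothetical embeddings deployment name:
+
+```python
+from langchain_openai import AzureOpenAIEmbeddings
+from langchain_community.vectorstores import FAISS
+
+# Assumption: the Azure OpenAI environment variables set in the
+# "Configure Azure OpenAI Service" step below are already in place, and
+# "text-embedding-ada-002" is a hypothetical embeddings deployment name.
+embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-ada-002")
+
+# Embed a few document chunks and keep them in an in-memory FAISS vector store
+chunks = [
+    "OneLake is the unified data lake for Microsoft Fabric.",
+    "SynapseML scales machine learning pipelines on Spark.",
+]
+vector_store = FAISS.from_texts(chunks, embeddings)
+
+# Retrieve the chunk most similar to a question
+results = vector_store.similarity_search("What is OneLake?", k=1)
+print(results[0].page_content)
+```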
+
+## Demo
+
+Tools in practice:
+
+| **Tool** | **Description**|
+|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **LangChain**| LangChain is a framework for developing applications powered by language models. It can be used with Azure OpenAI to build applications that require natural language understanding and generation. <br> **Use Case**: Creating complex applications that involve multiple steps or stages of processing, such as preprocessing text data, applying a language model, and postprocessing the results. |
+| **SynapseML**| SynapseML is an open-source library that simplifies the creation of massively scalable machine learning pipelines. It integrates with Azure OpenAI to provide distributed computing capabilities, allowing you to apply large language models at scale. <br> **Use Case**: Applying powerful language models to massive amounts of data, enabling scenarios like batch processing of text data or large-scale text analytics. |
+
+### Set Up Your Environment
+
+1. **Register the Resource Provider**: Ensure that the `microsoft.fabric` resource provider is registered in your subscription.
+
+

+
+2. **Create a Microsoft Fabric Resource**:
+ - Navigate to the Azure Portal.
+ - Create a new resource of type **Microsoft Fabric**.
+ - Choose the appropriate subscription, resource group, capacity name, region, size, and administrator.
+
+

+
+3. **Enable Fabric Capacity in Power BI**:
+ - Go to the Power BI workspace.
+ - Select the Fabric capacity license and the Fabric resource created in Azure.
+
+

+
+4. **Pause Fabric Compute When Not in Use**: To save costs, remember to pause the Fabric compute in Azure when you're not using it.
+
+

+
+### Install Required Libraries
+
+1. **Access Microsoft Fabric**:
+ - Open your web browser and navigate to the Microsoft Fabric portal.
+ - Sign in with your Azure credentials.
+2. **Select Your Workspace**: From the Microsoft Fabric home page, select the workspace where you want to configure SynapseML.
+3. **Create a New Cluster**:
+ - Within the **Data Science** component, you should find options to create a new cluster.
+
+

+
+ - Follow the prompts to configure and create your cluster, specifying the details such as cluster name, region, node size, and node count.
+
+

+
+

+
+4. **Install SynapseML on Your Cluster**: Configure your cluster to include the SynapseML package.
+
+

+
+

+
+ ~~~
+ %pip show synapseml
+ ~~~
+
+5. **Install LangChain and Other Dependencies**:
+ > You can use `%pip install` to install the necessary packages
+
+ ```python
+ %pip install openai langchain_community
+ ```
+
+ Or you can use the environment configuration:
+
+

+
+    You can also use the `.yml` file approach: just upload your list of dependencies, e.g.:
+
+    ```yml
+    dependencies:
+      - pip:
+        - synapseml==1.0.8
+        - langchain==0.3.4
+        - langchain_community==0.3.4
+        - openai==1.53.0
+        - langchain-openai==0.2.4
+    ```
+
+### Configure Azure OpenAI Service
+
+> [!NOTE]
+> Click [here](./src/fabric-llms-overview_sample.ipynb) to see the full notebook
+
+1. **Set Up API Keys**: Ensure you have the API key and endpoint URL for your deployed model. Set these as environment variables
+
+

+
+ ```python
+ import os
+
+ # Set the API version for the Azure OpenAI service
+ os.environ["OPENAI_API_VERSION"] = "2023-08-01-preview"
+
+ # Set the base URL for the Azure OpenAI service
+ os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource-name.openai.azure.com"
+
+ # Set the API key for Azure OpenAI
+ os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
+ ```
+
+2. **Initialize Azure OpenAI Class**: Create an instance of the Azure OpenAI class using the environment variables set above.
+
+

+
+ ```python
+ from langchain_openai import AzureChatOpenAI
+
+ # Set the API base URL
+ api_base = os.environ["AZURE_OPENAI_ENDPOINT"]
+
+ # Create an instance of the Azure OpenAI Class
+ llm = AzureChatOpenAI(
+ openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
+ temperature=0.7,
+ verbose=True,
+ top_p=0.9
+ )
+ ```
+
+3. **Call the Deployed Model**: Use the Azure OpenAI service to generate text or perform other language model tasks. Here's an example of generating a response based on a prompt
+
+

+
+ ```python
+ # Define a prompt
+ messages = [
+ (
+ "system",
+ "You are a helpful assistant that translates English to French. Translate the user sentence.",
+ ),
+ ("human", "Hi, how are you?"),
+ ]
+
+ # Generate a response from the Azure OpenAI service using the invoke method
+ ai_msg = llm.invoke(messages)
+
+ # Print the response
+ print(ai_msg)
+ ```
+
+Make sure to replace `"your-api-key"` and `"https://your-resource-name.openai.azure.com"` with the actual API key and endpoint URL from your Azure OpenAI instance, and use the deployment and model names configured in that instance. This example demonstrates how to configure and use an existing Azure OpenAI instance in Microsoft Fabric.
+
+### Basic Usage of LangChain Transformer
+
+> [!NOTE]
+> E.g: Automate the process of generating definitions for technology terms using a language model.
+> `The LangChain Transformer` is a tool that makes it easy to use advanced language models for `generating and transforming text`. It works by `setting up a template for what you want to create, linking this template to a language model, and then processing your data to produce the desired output`. This setup `helps automate tasks like defining technology terms or generating other text-based content`, making your workflow smoother and more efficient.
+
+> `LangChain Transformer helps you automate the process of generating and transforming text data using advanced language models`, making it easier to integrate AI capabilities into your data workflows.
+>
+> 1. `Prompt Creation`: Start by `defining a template for the kind of text you want to generate or analyze`. For example, you might create a prompt that asks the model to define a specific technology term.
+> 2. `Chain Setup`: Then `set up a chain that links this prompt to a language model`. This chain is responsible for sending the prompt to the model and receiving the generated response.
+> 3. `Transformer Configuration`: The LangChain Transformer is `configured to use this chain`. It specifies how the `input data (like a list of technology names) should be processed and what kind of output (like definitions) should be produced`.
+> 4. `Data Processing`: Finally, `apply this setup to a dataset.` E.g., list of technology names in a DataFrame, and the transformer will use the language model to generate definitions for each technology.
+
+1. **Create a Prompt Template**: Define a prompt template for generating definitions.
+
+

+
+ ```python
+ from langchain.prompts import PromptTemplate
+
+ copy_prompt = PromptTemplate(
+ input_variables=["technology"],
+ template="Define the following word: {technology}",
+ )
+ ```
+
+2. **Set Up an LLMChain**: Create an LLMChain with the defined prompt template.
+
+

+
+ ```python
+ from langchain.chains import LLMChain
+
+ chain = LLMChain(llm=llm, prompt=copy_prompt)
+ ```
+
+3. **Configure LangChain Transformer**: Set up the LangChain transformer to execute the processing chain.
+
+

+
+ ```python
+ # Set up the LangChain transformer to execute the processing chain.
+ from synapse.ml.cognitive.langchain import LangchainTransformer
+
+ openai_api_key= os.environ["AZURE_OPENAI_API_KEY"]
+
+ transformer = (
+ LangchainTransformer()
+ .setInputCol("technology")
+ .setOutputCol("definition")
+ .setChain(chain)
+ .setSubscriptionKey(openai_api_key)
+ .setUrl(api_base)
+ )
+ ```
+
+4. **Create a Test DataFrame**: Construct a DataFrame with technology names.
+
+

+
+ ```python
+ from pyspark.sql import SparkSession
+ from pyspark.sql.functions import udf
+ from pyspark.sql.types import StringType
+
+ # Initialize Spark session
+ spark = SparkSession.builder.appName("example").getOrCreate()
+
+ # Construct a DataFrame with technology names
+ df = spark.createDataFrame(
+ [
+ (0, "docker"), (1, "spark"), (2, "python")
+ ],
+ ["label", "technology"]
+ )
+
+ # Define a simple UDF to transform the technology column
+ def transform_technology(tech):
+ return tech.upper()
+
+ # Register the UDF
+ transform_udf = udf(transform_technology, StringType())
+
+ # Apply the UDF to the DataFrame
+ transformed_df = df.withColumn("transformed_technology", transform_udf(df["technology"]))
+
+ # Show the transformed DataFrame
+ transformed_df.show()
+ ```
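+
+    As a follow-up, the transformer configured in step 3 can be applied to this DataFrame. This is a hedged sketch: it assumes the SynapseML `LangchainTransformer` exposes the standard Spark ML `transform` method and that the Azure OpenAI resource is reachable from the cluster.
+
+    ```python
+    # Hedged sketch: run the LangChain chain over each row's "technology" value
+    # and collect the generated text in the "definition" output column set in step 3.
+    definitions_df = transformer.transform(df)
+
+    # Inspect the generated definitions (truncate=False keeps long text readable)
+    definitions_df.select("technology", "definition").show(truncate=False)
+    ```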
+
+### Using LangChain for Large Scale Literature Review
+
+> [!NOTE]
+> E.g: Automating the extraction and summarization of academic papers: a script for an agent that uses LangChain to extract content from an online PDF and generate a prompt based on that content.
+> An `agent`, in the context of programming and artificial intelligence, is a `software entity that performs tasks autonomously`. It can interact with its `environment, make decisions, and execute actions based on predefined rules or learned behavior`.
+
+1. **Define Functions for Content Extraction and Prompt Generation**: Extract content from PDFs linked in arXiv papers and generate prompts for extracting specific information.
+
+

+
+ ```python
+ from langchain.document_loaders import OnlinePDFLoader
+
+ def paper_content_extraction(inputs: dict) -> dict:
+ arxiv_link = inputs["arxiv_link"]
+ loader = OnlinePDFLoader(arxiv_link)
+ pages = loader.load_and_split()
+ return {"paper_content": pages[0].page_content + pages[1].page_content}
+
+ def prompt_generation(inputs: dict) -> dict:
+ output = inputs["Output"]
+ prompt = (
+ "find the paper title, author, summary in the paper description below, output them. "
+ "After that, Use websearch to find out 3 recent papers of the first author in the author section below "
+ "(first author is the first name separated by comma) and list the paper titles in bullet points: "
+            "\n" + output + "."
+ )
+ return {"prompt": prompt}
+ ```
+
+2. **Create a Sequential Chain for Information Extraction**: Set up a chain to extract structured information from an arXiv link
+
+
+
+ ```python
+ from langchain.chains import TransformChain, SimpleSequentialChain
+
+ paper_content_extraction_chain = TransformChain(
+ input_variables=["arxiv_link"],
+ output_variables=["paper_content"],
+ transform=paper_content_extraction,
+ verbose=False,
+ )
+
+ paper_summarizer_template = """
+ You are a paper summarizer, given the paper content, it is your job to summarize the paper into a short summary,
+ and extract authors and paper title from the paper content.
+ """
+ ```
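+
+    The snippet above stops at the summarizer template. A minimal sketch of wiring the pieces together with `SimpleSequentialChain` follows; it assumes a `{paper_content}` placeholder appended to the template and uses an illustrative arXiv PDF URL.
+
+    ```python
+    from langchain.prompts import PromptTemplate
+    from langchain.chains import LLMChain
+
+    # Assumption: append a placeholder so the extracted paper content can be
+    # injected into the summarizer prompt.
+    summarizer_prompt = PromptTemplate(
+        input_variables=["paper_content"],
+        template=paper_summarizer_template + "\n{paper_content}",
+    )
+    paper_summarizer_chain = LLMChain(llm=llm, prompt=summarizer_prompt)
+
+    # Chain: arXiv link -> extracted paper content -> summary with title and authors
+    literature_review_chain = SimpleSequentialChain(
+        chains=[paper_content_extraction_chain, paper_summarizer_chain],
+        verbose=True,
+    )
+
+    # Illustrative arXiv PDF link; replace it with the paper you want to review
+    summary = literature_review_chain.run("https://arxiv.org/pdf/2303.08774.pdf")
+    print(summary)
+    ```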
+
+### Machine Learning Integration with Microsoft Fabric
+
+1. **Train and Register Machine Learning Models**: Use Microsoft Fabric's native integration with the MLflow framework to log trained machine learning models, their hyperparameters, and evaluation metrics.
+
+
+
+ ```python
+ import mlflow
+ from mlflow.models import infer_signature
+ from sklearn.datasets import make_regression
+ from sklearn.ensemble import RandomForestRegressor
+
+ # Generate synthetic regression data
+ X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
+
+ # Model parameters
+ params = {"n_estimators": 3, "random_state": 42}
+
+ # Model tags for MLflow
+ model_tags = {
+ "project_name": "grocery-forecasting",
+ "store_dept": "produce",
+ "team": "stores-ml",
+ "project_quarter": "Q3-2023"
+ }
+
+ # Log MLflow entities
+ with mlflow.start_run() as run:
+ # Train the model
+ model = RandomForestRegressor(**params).fit(X, y)
+
+ # Infer the model signature
+ signature = infer_signature(X, model.predict(X))
+
+ # Log parameters and the model
+ mlflow.log_params(params)
+ mlflow.sklearn.log_model(model, artifact_path="sklearn-model", signature=signature)
+
+ # Register the model with tags
+ model_uri = f"runs:/{run.info.run_id}/sklearn-model"
+ model_version = mlflow.register_model(model_uri, "RandomForestRegressionModel", tags=model_tags)
+
+ # Output model registration details
+ print(f"Model Name: {model_version.name}")
+ print(f"Model Version: {model_version.version}")
+ ```
+
+2. **Compare and Filter Machine Learning Models**: Use MLflow to search among multiple models saved within the workspace.
+
+
+
+ ```python
+ from pprint import pprint
+ from mlflow.tracking import MlflowClient
+
+ client = MlflowClient()
+ for rm in client.search_registered_models():
+ pprint(dict(rm), indent=4)
+ ```
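+
+    To reuse a registered model, a specific version can be loaded back through the MLflow `models:/` URI. A minimal sketch, assuming the `RandomForestRegressionModel` name and version `1` produced by the registration step above:
+
+    ```python
+    import mlflow
+
+    # Load a registered model version (assumed name/version from the previous step)
+    loaded_model = mlflow.sklearn.load_model("models:/RandomForestRegressionModel/1")
+
+    # Score a few rows of the synthetic feature matrix generated earlier
+    predictions = loaded_model.predict(X[:5])
+    print(predictions)
+    ```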
+
+
+
Total Visitors
+

+
diff --git a/Workloads-Specific/DataScience/AI_integration/src/fabric-llms-overview_sample.ipynb b/Workloads-Specific/DataScience/AI_integration/src/fabric-llms-overview_sample.ipynb
new file mode 100644
index 0000000..7b0c18d
--- /dev/null
+++ b/Workloads-Specific/DataScience/AI_integration/src/fabric-llms-overview_sample.ipynb
@@ -0,0 +1,1194 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "519955e9-2dad-456d-93db-a332d38e9433",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "# Fabric: Highlights into AI/LLMs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "d312e8d9-03fe-4b3d-aa6d-c52e3022ae39",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T03:58:26.7170509Z",
+ "execution_start_time": "2024-10-31T03:58:19.270951Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "e267b6ab-5133-4598-8251-d64374cd11e5",
+ "queued_time": "2024-10-31T03:58:18.9132075Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 5,
+ "statement_ids": [
+ 5
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 5, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Name: synapseml\r\n",
+ "Version: 1.0.8\r\n",
+ "Summary: Synapse Machine Learning\r\n",
+ "Home-page: https://github.com/Microsoft/SynapseML\r\n",
+ "Author: Microsoft\r\n",
+ "Author-email: synapseml-support@microsoft.com\r\n",
+ "License: MIT\r\n",
+ "Location: /home/trusted-service-user/cluster-env/clonedenv/lib/python3.11/site-packages\r\n",
+ "Requires: \r\n",
+ "Required-by: \r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip show synapseml"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "427610d0-3fae-45e3-8150-92ee7674f44c",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T03:58:28.6254349Z",
+ "execution_start_time": "2024-10-31T03:58:27.1124616Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "0e9f6c0f-062b-4e5d-9061-afcd89c8fd75",
+ "queued_time": "2024-10-31T03:58:19.3223486Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 6,
+ "statement_ids": [
+ 6
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 6, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Name: langchain-openai\r\n",
+ "Version: 0.2.4\r\n",
+ "Summary: An integration package connecting OpenAI and LangChain\r\n",
+ "Home-page: https://github.com/langchain-ai/langchain\r\n",
+ "Author: \r\n",
+ "Author-email: \r\n",
+ "License: MIT\r\n",
+ "Location: /home/trusted-service-user/cluster-env/clonedenv/lib/python3.11/site-packages\r\n",
+ "Requires: langchain-core, openai, tiktoken\r\n",
+ "Required-by: \r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip show langchain-openai"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "baeeb853-2104-4edf-abf4-4d4be50cb977",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T03:58:30.5465258Z",
+ "execution_start_time": "2024-10-31T03:58:29.0000586Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "716d9975-263b-4d92-b25c-b342106f5f43",
+ "queued_time": "2024-10-31T03:58:19.511824Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 7,
+ "statement_ids": [
+ 7
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 7, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Name: langchain\r\n",
+ "Version: 0.3.6\r\n",
+ "Summary: Building applications with LLMs through composability\r\n",
+ "Home-page: https://github.com/langchain-ai/langchain\r\n",
+ "Author: \r\n",
+ "Author-email: \r\n",
+ "License: MIT\r\n",
+ "Location: /home/trusted-service-user/cluster-env/clonedenv/lib/python3.11/site-packages\r\n",
+ "Requires: aiohttp, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity\r\n",
+ "Required-by: langchain-community\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip show langchain"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c58cc406-c4f5-4607-a740-0802e8e4b550",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Ensure you have the API key and endpoint URL for your deployed model. Set these as environment variables"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "3c8ada7c-2632-4c69-86d2-f5260ee8f1b7",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:20:14.3495341Z",
+ "execution_start_time": "2024-10-31T04:20:14.1128215Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "2573bf75-fe6d-40dc-b9f6-e06ebb9f7f73",
+ "queued_time": "2024-10-31T04:20:13.6194485Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 22,
+ "statement_ids": [
+ 22
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 22, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ[\"OPENAI_API_VERSION\"] = \"2023-08-01-preview\"\n",
+ "os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"https://your-resource.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-08-01-preview\"\n",
+ "os.environ[\"AZURE_OPENAI_API_KEY\"] = \"your-value\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3fac48a9-45fb-4e86-9792-8ee340b0ac60",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Create an instance of the Azure OpenAI class using the environment variables set above"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "5db10350-8000-4cbd-9bdf-d7da62d7fe61",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:20:14.9382032Z",
+ "execution_start_time": "2024-10-31T04:20:14.7083469Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "7dfaca5a-f738-4010-bba1-f764ea70f450",
+ "queued_time": "2024-10-31T04:20:14.027325Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 23,
+ "statement_ids": [
+ 23
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 23, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from langchain_openai import AzureChatOpenAI\n",
+ "\n",
+ "# Set the API base URL\n",
+ "api_base = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n",
+ "\n",
+ "# Create an instance of the Azure OpenAI Class\n",
+ "llm = AzureChatOpenAI(\n",
+ " openai_api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n",
+ " temperature=0.7,\n",
+ " verbose=True,\n",
+ " top_p=0.9\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b17d7450-34b5-4ece-8e20-a77ddcdd93c4",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Use the Azure OpenAI service to generate text or perform other language model tasks"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "cfc5fd62-085a-4eff-9192-696d9f249a8e",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:20:16.0500538Z",
+ "execution_start_time": "2024-10-31T04:20:15.2936074Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "e14e4d0b-1fd0-4dac-a07d-6479d6536ce3",
+ "queued_time": "2024-10-31T04:20:14.4969185Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 24,
+ "statement_ids": [
+ 24
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 24, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "content='Salut, comment ça va ?' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 33, 'total_tokens': 39, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': 'fp_d54531d9eb', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}} id='run-8cb7f29a-44c1-4f65-a648-15afb2d793dc-0' usage_metadata={'input_tokens': 33, 'output_tokens': 6, 'total_tokens': 39, 'input_token_details': {}, 'output_token_details': {}}\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Define a prompt\n",
+ "messages = [\n",
+ " (\n",
+ " \"system\",\n",
+ " \"You are a helpful assistant that translates English to French. Translate the user sentence.\",\n",
+ " ),\n",
+ " (\"human\", \"Hi, how are you?\"),\n",
+ "]\n",
+ "\n",
+ "# Generate a response from the Azure OpenAI service using the invoke method\n",
+ "ai_msg = llm.invoke(messages)\n",
+ "\n",
+ "# Print the response\n",
+ "print(ai_msg)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "79729106-c7f1-4879-bc2b-871b50c2ac9a",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Define a prompt template for generating definitions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "ca633361-c27b-4294-b8a7-9fc4a316afa4",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:20:16.587491Z",
+ "execution_start_time": "2024-10-31T04:20:16.3655978Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "cc3215f4-71a5-4231-af47-9bd9a8f5698a",
+ "queued_time": "2024-10-31T04:20:14.7799392Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 25,
+ "statement_ids": [
+ 25
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 25, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from langchain.prompts import PromptTemplate\n",
+ "\n",
+ "copy_prompt = PromptTemplate(\n",
+ " input_variables=[\"technology\"],\n",
+ " template=\"Define the following word: {technology}\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "899839d9-adca-4042-b662-73edcad7e432",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Create an LLMChain with the defined prompt template"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "bd4f65ca-049b-481d-bbbd-a017c6c0119b",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:20:17.1233668Z",
+ "execution_start_time": "2024-10-31T04:20:16.9052959Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "27790d83-509f-4716-bb69-9c288ad069ba",
+ "queued_time": "2024-10-31T04:20:15.1325692Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 26,
+ "statement_ids": [
+ 26
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 26, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from langchain.chains import LLMChain\n",
+ "\n",
+ "chain = LLMChain(llm=llm, prompt=copy_prompt)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "936b3ddf-cc65-436c-ba4e-ae0abe21fc2c",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Set up the LangChain transformer to execute the processing chain\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "63a00038-37b4-49ee-9c53-128c8acf9d01",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:20:18.181457Z",
+ "execution_start_time": "2024-10-31T04:20:17.4351576Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "3fb30420-f0c9-477b-ad1a-001dc0d8d37a",
+ "queued_time": "2024-10-31T04:20:15.6799013Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 27,
+ "statement_ids": [
+ 27
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 27, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from synapse.ml.cognitive.langchain import LangchainTransformer\n",
+ "\n",
+ "openai_api_key= os.environ[\"AZURE_OPENAI_API_KEY\"]\n",
+ "\n",
+ "transformer = (\n",
+ " LangchainTransformer()\n",
+ " .setInputCol(\"technology\")\n",
+ " .setOutputCol(\"definition\")\n",
+ " .setChain(chain)\n",
+ " .setSubscriptionKey(openai_api_key)\n",
+ " .setUrl(api_base)\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c74293f0-925e-4987-a6a1-b3b9b8e14b9d",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Construct a DataFrame with technology names."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "8e03963e-2fcf-4934-b96f-ac27b4e0353c",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:24:08.3891172Z",
+ "execution_start_time": "2024-10-31T04:24:02.0675933Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "856f5b73-26e8-4d20-a901-356cd92b9c2a",
+ "queued_time": "2024-10-31T04:24:01.6603792Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 29,
+ "statement_ids": [
+ 29
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 29, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "+-----+----------+----------------------+\n",
+ "|label|technology|transformed_technology|\n",
+ "+-----+----------+----------------------+\n",
+ "| 0| docker| DOCKER|\n",
+ "| 1| spark| SPARK|\n",
+ "| 2| python| PYTHON|\n",
+ "+-----+----------+----------------------+\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from pyspark.sql import SparkSession\n",
+ "from pyspark.sql.functions import udf\n",
+ "from pyspark.sql.types import StringType\n",
+ "\n",
+ "# Initialize Spark session\n",
+ "spark = SparkSession.builder.appName(\"example\").getOrCreate()\n",
+ "\n",
+ "# Construct a DataFrame with technology names\n",
+ "df = spark.createDataFrame(\n",
+ " [\n",
+ " (0, \"docker\"), (1, \"spark\"), (2, \"python\")\n",
+ " ],\n",
+ " [\"label\", \"technology\"]\n",
+ ")\n",
+ "\n",
+ "# Define a simple UDF to transform the technology column\n",
+ "def transform_technology(tech):\n",
+ " return tech.upper()\n",
+ "\n",
+ "# Register the UDF\n",
+ "transform_udf = udf(transform_technology, StringType())\n",
+ "\n",
+ "# Apply the UDF to the DataFrame\n",
+ "transformed_df = df.withColumn(\"transformed_technology\", transform_udf(df[\"technology\"]))\n",
+ "\n",
+ "# Show the transformed DataFrame\n",
+ "transformed_df.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "47ab1ba6-deaf-488d-9e95-8202669d948c",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Extract content from PDFs linked in arXiv papers and generate prompts for extracting specific information.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "8b52c87e-5971-4d28-bc4b-4160d29a1c24",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:27:08.3224773Z",
+ "execution_start_time": "2024-10-31T04:27:08.0430507Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "4eeab690-4159-41dc-be69-3cceed484314",
+ "queued_time": "2024-10-31T04:27:07.6309068Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 30,
+ "statement_ids": [
+ 30
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 30, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from langchain.document_loaders import OnlinePDFLoader\n",
+ "\n",
+ "def paper_content_extraction(inputs: dict) -> dict:\n",
+ " arxiv_link = inputs[\"arxiv_link\"]\n",
+ " loader = OnlinePDFLoader(arxiv_link)\n",
+ " pages = loader.load_and_split()\n",
+ " return {\"paper_content\": pages[0].page_content + pages[1].page_content}\n",
+ "\n",
+ "def prompt_generation(inputs: dict) -> dict:\n",
+ " output = inputs[\"Output\"]\n",
+ " prompt = (\n",
+ " \"find the paper title, author, summary in the paper description below, output them. \"\n",
+ " \"After that, Use websearch to find out 3 recent papers of the first author in the author section below \"\n",
+ " \"(first author is the first name separated by comma) and list the paper titles in bullet points: \"\n",
+ " \"\\n\" + output + \".\"\n",
+ " )\n",
+ " return {\"prompt\": prompt}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "89d79c38-ba0c-4062-911c-7ede02536298",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Set up a chain to extract structured information from an arXiv link\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "e85241a0-11c2-49c1-9b2e-63187cb24d9a",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:28:11.2331925Z",
+ "execution_start_time": "2024-10-31T04:28:11.0134852Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "232b4aa0-1b84-47f8-bb5d-347a575d9640",
+ "queued_time": "2024-10-31T04:28:10.663514Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 31,
+ "statement_ids": [
+ 31
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 31, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from langchain.chains import TransformChain, SimpleSequentialChain\n",
+ "\n",
+ "paper_content_extraction_chain = TransformChain(\n",
+ " input_variables=[\"arxiv_link\"],\n",
+ " output_variables=[\"paper_content\"],\n",
+ " transform=paper_content_extraction,\n",
+ " verbose=False,\n",
+ ")\n",
+ "\n",
+ "paper_summarizer_template = \"\"\"\n",
+ "You are a paper summarizer, given the paper content, it is your job to summarize the paper into a short summary, \n",
+ "and extract authors and paper title from the paper content.\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "64937339-791c-4aad-953b-ca990bfd324a",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Use Microsoft Fabric's native integration with the MLflow framework to log the trained machine learning models, the used hyperparameters, and evaluation metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "5bac7684-a123-4733-baa3-a748ff0fd070",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.livy.statement-meta+json": {
+ "execution_finish_time": "2024-10-31T04:36:54.8917645Z",
+ "execution_start_time": "2024-10-31T04:36:44.7561664Z",
+ "livy_statement_state": "available",
+ "normalized_state": "finished",
+ "parent_msg_id": "d2abef17-25d7-41c4-a62f-051d9b5fe8d7",
+ "queued_time": "2024-10-31T04:36:44.2999954Z",
+ "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "session_start_time": null,
+ "spark_pool": null,
+ "state": "finished",
+ "statement_id": 33,
+ "statement_ids": [
+ 33
+ ]
+ },
+ "text/plain": [
+ "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 33, Finished, Available, Finished)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Registered model 'RandomForestRegressionModel' already exists. Creating a new version of this model...\n",
+ "2024/10/31 04:36:52 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: RandomForestRegressionModel, version 2\n",
+ "Created version '2' of model 'RandomForestRegressionModel'.\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Model Name: RandomForestRegressionModel\n",
+ "Model Version: 2\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.mlflow.run-widget+json": {
+ "data": {
+ "metrics": {},
+ "params": {
+ "n_estimators": "3",
+ "random_state": "42"
+ },
+ "tags": {
+ "mlflow.rootRunId": "20c75f63-d266-40b1-83f7-d9c76fd1f4f4",
+ "mlflow.runName": "icy_hamster_xr34qfzf",
+ "mlflow.user": "4b3a56ea-6f42-450e-b7c3-fb2932c7ac32",
+ "synapseml.experiment.artifactId": "17b41ab7-b0e0-4adc-9fc9-403dd72b6e5b",
+ "synapseml.experimentName": "Notebook-1",
+ "synapseml.livy.id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325",
+ "synapseml.notebook.artifactId": "789d5fef-b2a1-409b-996f-0cdb4e748a90",
+ "synapseml.user.id": "ea5a1fdc-a08c-493a-bce9-8422f28ecd05",
+ "synapseml.user.name": "System Administrator"
+ }
+ },
+ "info": {
+ "artifact_uri": "sds://onelakewestus3.pbidedicated.windows.net/6361aeaa-b63a-44ea-b28f-26db10b31a6c/17b41ab7-b0e0-4adc-9fc9-403dd72b6e5b/20c75f63-d266-40b1-83f7-d9c76fd1f4f4/artifacts",
+ "end_time": 1730349412,
+ "experiment_id": "d52403ad-a9c2-41ba-b582-9b8e9a57917e",
+ "lifecycle_stage": "active",
+ "run_id": "20c75f63-d266-40b1-83f7-d9c76fd1f4f4",
+ "run_name": "",
+ "run_uuid": "20c75f63-d266-40b1-83f7-d9c76fd1f4f4",
+ "start_time": 1730349405,
+ "status": "FINISHED",
+ "user_id": "7ebfac85-3ebb-440f-a743-e52052051f6a"
+ },
+ "inputs": {
+ "dataset_inputs": []
+ }
+ }
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import mlflow\n",
+ "from mlflow.models import infer_signature\n",
+ "from sklearn.datasets import make_regression\n",
+ "from sklearn.ensemble import RandomForestRegressor\n",
+ "\n",
+ "# Generate synthetic regression data\n",
+ "X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)\n",
+ "\n",
+ "# Model parameters\n",
+ "params = {\"n_estimators\": 3, \"random_state\": 42}\n",
+ "\n",
+ "# Model tags for MLflow\n",
+ "model_tags = {\n",
+ " \"project_name\": \"grocery-forecasting\",\n",
+ " \"store_dept\": \"produce\",\n",
+ " \"team\": \"stores-ml\",\n",
+ " \"project_quarter\": \"Q3-2023\"\n",
+ "}\n",
+ "\n",
+ "# Log MLflow entities\n",
+ "with mlflow.start_run() as run:\n",
+ " # Train the model\n",
+ " model = RandomForestRegressor(**params).fit(X, y)\n",
+ "\n",
+ " # Infer the model signature\n",
+ " signature = infer_signature(X, model.predict(X))\n",
+ "\n",
+ " # Log parameters and the model\n",
+ " mlflow.log_params(params)\n",
+ " mlflow.sklearn.log_model(model, artifact_path=\"sklearn-model\", signature=signature)\n",
+ "\n",
+ " # Register the model with tags\n",
+ " model_uri = f\"runs:/{run.info.run_id}/sklearn-model\"\n",
+ " model_version = mlflow.register_model(model_uri, \"RandomForestRegressionModel\", tags=model_tags)\n",
+ "\n",
+ " # Output model registration details\n",
+ " print(f\"Model Name: {model_version.name}\")\n",
+ " print(f\"Model Version: {model_version.version}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "315ebdcd-e78c-4bc5-93d6-f202d02bddc5",
+ "metadata": {
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "## Use MLflow to search among multiple models saved within the workspace"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "60e6f7d3-d1ec-4ccc-9745-6c7938d2f4bc",
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark"
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from pprint import pprint\n",
+ "from mlflow.tracking import MlflowClient\n",
+ "\n",
+ "client = MlflowClient()\n",
+ "for rm in client.search_registered_models():\n",
+ " pprint(dict(rm), indent=4)"
+ ]
+ }
+ ],
+ "metadata": {
+ "dependencies": {
+ "environment": {
+ "environmentId": "766562be-9e21-456c-b270-cac7e4bf8d18",
+ "workspaceId": "6361aeaa-b63a-44ea-b28f-26db10b31a6c"
+ }
+ },
+ "kernel_info": {
+ "name": "synapse_pyspark"
+ },
+ "kernelspec": {
+ "display_name": "Synapse PySpark",
+ "language": "Python",
+ "name": "synapse_pyspark"
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "microsoft": {
+ "language": "python",
+ "language_group": "synapse_pyspark",
+ "ms_spell_check": {
+ "ms_spell_check_language": "en"
+ }
+ },
+ "nteract": {
+ "version": "nteract-front-end@1.0.0"
+ },
+ "spark_compute": {
+ "compute_id": "/trident/default",
+ "session_options": {
+ "conf": {
+ "spark.synapse.nbs.session.timeout": "1200000"
+ }
+ }
+ },
+ "widgets": {}
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/Workloads-Specific/PowerBi/ConfigureCloudConnectionsGateways.md b/Workloads-Specific/PowerBi/ConfigureCloudConnectionsGateways.md
index 7833668..7bfb5bc 100644
--- a/Workloads-Specific/PowerBi/ConfigureCloudConnectionsGateways.md
+++ b/Workloads-Specific/PowerBi/ConfigureCloudConnectionsGateways.md
@@ -5,7 +5,7 @@ Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
------------------------------------------
@@ -30,21 +30,17 @@ Last updated: 2025-04-16
Table of Contents (Click to expand)
-- [Power Bi: Cloud Connections & Gateways](#power-bi-cloud-connections--gateways)
- - [Wiki](#wiki)
- - [Content](#content)
- - [How to Manage Cloud connections](#how-to-manage-cloud-connections)
- - [Creating Shareable Connections](#creating-shareable-connections)
- - [Managing Connections](#managing-connections)
- - [Admin Monitoring Workspace](#admin-monitoring-workspace)
- - [Identify Access per report](#identify-access-per-report)
- - [Restrict Access from new gateway connections](#restrict-access-from-new-gateway-connections)
- - [On-premises Data Gateways](#on-premises-data-gateways)
- - [Virtual Network VNet Data Gateways](#virtual-network-vnet-data-gateways)
+- [How to Manage Cloud connections](#how-to-manage-cloud-connections)
+ - [Creating Shareable Connections](#creating-shareable-connections)
+ - [Managing Connections](#managing-connections)
+- [Admin Monitoring Workspace](#admin-monitoring-workspace)
+- [Identify Access per report](#identify-access-per-report)
+- [Restrict Access from new gateway connections](#restrict-access-from-new-gateway-connections)
+ - [On-premises Data Gateways](#on-premises-data-gateways)
+ - [Virtual Network VNet Data Gateways](#virtual-network-vnet-data-gateways)
-
## How to Manage Cloud connections
Managing cloud connections in Power BI, below you can find differences between personal and shareable cloud connections:
@@ -77,9 +73,10 @@ Managing cloud connections in Power BI, below you can find differences between p
| --- | --- |
| Private | Contains sensitive or confidential information, and the visibility of the data source may be restricted to authorized users. It is completely isolated from other data sources. Examples include Facebook data, a text file containing stock awards, or a workbook containing an employee review. |
| Organizational | Limits the visibility of a data source to a trusted group of people. It is isolated from all Public data sources, but is visible to other Organizational data sources. A common example is a Microsoft Word document on an intranet SharePoint site with permissions enabled for a trusted group. |
-| Public | Gives everyone visibility to the data. Only files, internet data sources, or workbook data can be marked Public. Examples include data from a Wikipedia page, or a local file containing data copied from a public web page.|
+| Public | Gives everyone visibility to the data. Only files, internet data sources, or workbook data can be marked Public. Examples include data from a Wikipedia page, or a local file containing data copied from a public web page.|
+
+Steps:
-Steps:
- Go to [Power Bi](https://app.powerbi.com/)
- Click on ⚙️, and go to `Manage connections and gateways`
@@ -91,7 +88,7 @@ Steps:
### Managing Connections
-> - `Switching to Shareable Connections`: If you want to switch from a personal cloud connection to a shareable one, you can do so in the Semantic model settings. This allows you to leverage the benefits of shareable connections, such as easier management and sharing capabilities.
+> - `Switching to Shareable Connections`: If you want to switch from a personal cloud connection to a shareable one, you can do so in the Semantic model settings. This allows you to leverage the benefits of shareable connections, such as easier management and sharing capabilities.
> - `Granular Access Control`: Power BI allows for granular access control at the tenant, workspace, and semantic model levels. This means you can enforce access policies to ensure that only authorized users can create or use specific connections.
- To assign the connection a semantic model, click on `...` over your semantic model, and go to `Settings`
@@ -120,12 +117,12 @@ Steps to setup admin monitoring workspace:
-> The report can be accessed from the Admin monitoring workspace and is designed for admins to analyze various usage scenarios.
+> The report can be accessed from the Admin monitoring workspace and is designed for admins to analyze various usage scenarios.
| Report Name | Details |
-| --- | --- |
+| --- | --- |
| Feature Usage and Adoption Report | This report provides an in-depth analysis of how different features are utilized and adopted across your Microsoft Fabric tenant. It includes pages for activity overview, analysis, and detailed activity scenarios, helping identify which users are making use of cloud connections. |
-| Purview Hub | Offers insights into data governance and compliance. It helps administrators manage and monitor data policies, ensuring that data usage aligns with organizational standards and regulatory requirements. |
+| Purview Hub | Offers insights into data governance and compliance. It helps administrators manage and monitor data policies, ensuring that data usage aligns with organizational standards and regulatory requirements. |
@@ -146,6 +143,7 @@ Benefits of sharing the semantic model:
> [!IMPORTANT]
> Other ways to get insights:
+>
> - `Monitoring Usage`: You can monitor and manage cloud connections through the Power BI service. By navigating to the Manage connections and gateways section, you can see which users have access to and are using specific cloud connections.
>
> - `Premium Capacity Metrics`: For a more detailed analysis, you can use the Premium Capacity Metrics app, which provides insights into the usage and performance of your Power BI Premium capacities.
@@ -163,7 +161,7 @@ Benefits of sharing the semantic model:
## Restrict Access from new gateway connections
-> Facilitate secure data transfer between Power BI or Power Apps and non-cloud data sources like on-premises SQL Server databases or SharePoint sites.
+> Facilitate secure data transfer between Power BI or Power Apps and non-cloud data sources like on-premises SQL Server databases or SharePoint sites.
Gateway Roles:
@@ -181,13 +179,12 @@ Connection Roles:
| `User` | - Can use the connection in Power BI reports and dataflows.
- Cannot see or update credentials. |
| `User with Sharing` | - Can use the connection in Power BI reports and dataflows.
- Can share the data source with others with User permission. |
-
Steps to Manage Gateway and Connection Roles:
- Go to [Power Bi/Fabric admin center](https://app.powerbi.com/)
- Click on ⚙️, and go to `Manage Connections and Gateways`
- Choose `Connections`, `On premises data gateway` or `Virtual Network data gateways`:
-
+
- Click on `...`, and select `Manage users`:
@@ -212,7 +209,6 @@ Steps to Restrict Access for On-Premises Data Gateways:
> - **Tenant-Level Control**: You can `restrict who can install on-premises data gateways at the tenant level through the Power Platform admin center`. This prevents unauthorized users from creating new gateway connections.
> - **Role Management**: Assign specific roles to users, such as Admin, Connection Creator, and Connection Creator with Sharing, `to control who can create and manage connections on the gateway`.
-
1. **Access the Power Platform Admin Center**: Go to the [Power Platform Admin Center](https://admin.powerplatform.microsoft.com/ext/DataGateways).
2. **Navigate to Data Gateways**:
- Click on **Data** (preview) in the left-hand menu.
@@ -226,7 +222,7 @@ Steps to Restrict Access for On-Premises Data Gateways:
-### Virtual Network (VNet) Data Gateways
+### Virtual Network (VNet) Data Gateways
> Allow Power BI to connect to data services within an Azure virtual network without needing an on-premises data gateway. This setup is particularly useful for maintaining security and compliance by keeping data traffic within the Azure backbone.
diff --git a/Workloads-Specific/PowerBi/ConfigureReadAccess.md b/Workloads-Specific/PowerBi/ConfigureReadAccess.md
index a97ecb9..598c3c1 100644
--- a/Workloads-Specific/PowerBi/ConfigureReadAccess.md
+++ b/Workloads-Specific/PowerBi/ConfigureReadAccess.md
@@ -1,11 +1,11 @@
-# Demostration: How to Configure Read Access
+# Demonstration: How to Configure Read Access
Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
-----------------------------------------
@@ -43,18 +43,16 @@ Last updated: 2025-04-16
-## Overview
+## Overview
**Create a Fabric Capacity**: Follow the prompts to configure and create the capacity.
-
## Viewer Role in Fabric Workspaces
> `Fabric Workspaces` in Microsoft Fabric are `collaborative environments where users can manage, analyze, and visualize data`. These workspaces integrate various data services and tools, providing a `unified platform for data professional`s to work together
-
| **Capability** | **Description** |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **View All Content** | - Users can view dashboards, reports, workbooks, and other content within the workspace.
- This includes content created by other users, enabling collaboration and shared insights. |
@@ -82,10 +80,10 @@ Last updated: 2025-04-16
> Semantic Model: Provides a logical description of an analytical domain using business-friendly terminology and metrics.
Capabilities:
- - Data Representation: Organizes data into a star schema with facts and dimensions.
- - Business Logic: Inherits business logic from parent lakehouses or warehouses.
- - Visualization: Supports creating Power BI reports and dashboards for visual analysis.
+- Data Representation: Organizes data into a star schema with facts and dimensions.
+- Business Logic: Inherits business logic from parent lakehouses or warehouses.
+- Visualization: Supports creating Power BI reports and dashboards for visual analysis.
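+
+To illustrate how a semantic model can be consumed programmatically, the sketch below calls the Power BI REST API `Execute Queries` endpoint with Python's `requests` package. The access token, semantic model (dataset) ID, and the `'Sales'` table are hypothetical placeholders, and the caller needs sufficient permissions on the model.
+
+```python
+import requests
+
+# Hypothetical placeholders: obtain an Azure AD access token for the Power BI API
+# and replace the semantic model (dataset) ID with your own.
+ACCESS_TOKEN = "<azure-ad-access-token>"
+SEMANTIC_MODEL_ID = "<semantic-model-id>"
+
+url = f"https://api.powerbi.com/v1.0/myorg/datasets/{SEMANTIC_MODEL_ID}/executeQueries"
+headers = {
+    "Authorization": f"Bearer {ACCESS_TOKEN}",
+    "Content-Type": "application/json",
+}
+
+# A simple DAX query evaluated against the semantic model ('Sales' is an assumed table name)
+body = {
+    "queries": [{"query": "EVALUATE TOPN(10, 'Sales')"}],
+    "serializerSettings": {"includeNulls": True},
+}
+
+response = requests.post(url, headers=headers, json=body)
+response.raise_for_status()
+
+# Rows are nested under results -> tables -> rows in the response payload
+for row in response.json()["results"][0]["tables"][0]["rows"]:
+    print(row)
+```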
@@ -98,27 +96,28 @@ Capabilities:
-
## SQL Analytics Endpoint in Fabric
> Lakehouse: A data architecture platform for storing, managing, and analyzing both structured and unstructured data.
Capabilities:
- - Data Storage: Combines the capabilities of data lakes and data warehouses.
- - SQL Analytics Endpoint: Provides a SQL-based experience for querying data.
- - Automatic Table Discovery: Automatically registers and validates tables.
+
+- Data Storage: Combines the capabilities of data lakes and data warehouses.
+- SQL Analytics Endpoint: Provides a SQL-based experience for querying data.
+- Automatic Table Discovery: Automatically registers and validates tables.
> SQL Analytics Endpoint: Allows users to query data in the lakehouse using SQL.
Capabilities:
- - T-SQL Queries: Supports T-SQL language for querying Delta tables.
- - Read-Only Mode: Operates in read-only mode, allowing data analysis without modifying the data.
- - Security: Implements SQL security for access control.
+
+- T-SQL Queries: Supports T-SQL language for querying Delta tables.
+- Read-Only Mode: Operates in read-only mode, allowing data analysis without modifying the data.
+- Security: Implements SQL security for access control.
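+
+As a minimal sketch of the read-only T-SQL experience, the snippet below connects to a lakehouse's SQL analytics endpoint with `pyodbc` and Microsoft Entra interactive authentication; the server, database, and table names are placeholders, so use the connection details shown for the endpoint in the Fabric portal.
+
+```python
+import pyodbc
+
+# Hypothetical placeholders: copy the SQL connection string of the lakehouse's
+# SQL analytics endpoint from the Fabric portal and adjust database/table names.
+conn_str = (
+    "Driver={ODBC Driver 18 for SQL Server};"
+    "Server=<sql-analytics-endpoint>.datawarehouse.fabric.microsoft.com;"
+    "Database=<lakehouse-name>;"
+    "Authentication=ActiveDirectoryInteractive;"
+    "Encrypt=yes;"
+)
+
+with pyodbc.connect(conn_str) as conn:
+    cursor = conn.cursor()
+    # Read-only T-SQL query over a Delta table exposed by the endpoint
+    cursor.execute("SELECT TOP 10 * FROM dbo.sales")
+    for row in cursor.fetchall():
+        print(row)
+```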
> Apache Endpoint: Used for real-time data streaming and processing.
Capabilities:
- - Event Streaming: Streams events to and from Real-Time Intelligence using Apache Kafka.
- - Integration: Integrates with event streams to process and route real-time events.
- - Scalability: Supports building scalable, real-time data systems.
+- Event Streaming: Streams events to and from Real-Time Intelligence using Apache Kafka.
+- Integration: Integrates with event streams to process and route real-time events.
+- Scalability: Supports building scalable, real-time data systems.
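+
+As a rough sketch of the event-streaming capability, the snippet below assumes the eventstream exposes a Kafka-compatible custom endpoint and publishes a test event with the `kafka-python` client; the bootstrap server, topic name, and connection string are placeholders taken from the endpoint details in your environment.
+
+```python
+import json
+from kafka import KafkaProducer
+
+# Hypothetical placeholders: bootstrap server, topic, and connection string come
+# from the eventstream's Kafka-compatible custom endpoint.
+producer = KafkaProducer(
+    bootstrap_servers="<eventstream-endpoint>:9093",
+    security_protocol="SASL_SSL",
+    sasl_mechanism="PLAIN",
+    sasl_plain_username="$ConnectionString",
+    sasl_plain_password="<endpoint-connection-string>",
+    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
+)
+
+# Send a sample event; the eventstream routes it on to Real-Time Intelligence
+producer.send("<topic-name>", {"device_id": "sensor-01", "temperature": 21.5})
+producer.flush()
+```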
diff --git a/Workloads-Specific/PowerBi/ConfigureWorkspaceApp.md b/Workloads-Specific/PowerBi/ConfigureWorkspaceApp.md
index 53a1345..2b02dec 100644
--- a/Workloads-Specific/PowerBi/ConfigureWorkspaceApp.md
+++ b/Workloads-Specific/PowerBi/ConfigureWorkspaceApp.md
@@ -1,11 +1,11 @@
-# Demostration: How to Configure Workspace App
+# Demonstration: How to Configure Workspace App
Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
------------------------------------------
@@ -18,7 +18,7 @@ Last updated: 2025-04-16
2. Go to [Fabric](https://app.fabric.microsoft.com/), and assign the capacity created to the workspace desired.
-
+
> Select the `large semantic model` storage format only if your model exceeds 10 GB; otherwise, keep the small format, which covers models up to 10 GB.
@@ -48,22 +48,22 @@ Last updated: 2025-04-16
- - You will see something like this:
-
+- You will see something like this:
+
- - You can leverage copilot to modify your report:
+- You can leverage copilot to modify your report:
- - Once you are ready, save your report:
+- Once you are ready, save your report:
- - At this point you will have your `lakehouse`, with your `SQL analytics endpoint`, the `semantic model` and `the report`.
-
+- At this point you will have your `lakehouse`, with your `SQL analytics endpoint`, the `semantic model` and `the report`.
+
8. A paginated report, can also be created:
@@ -89,10 +89,10 @@ Last updated: 2025-04-16
- - Let's say you want only `viewer` permissions:
+- Let's say you want only `viewer` permissions:
1. Need to give access to the lakehouse/sql analytics endpoint:
-
+
> `Read All SQL Endpoint Data` permission allows users to access and read data from SQL endpoints within the Fabric environment. This permission is typically required for users who need to:
@@ -100,8 +100,6 @@ Last updated: 2025-04-16
> - Access Reports: `View and interact with reports and dashboards that rely on SQL data sources`.
> - Data Analysis: `Perform data analysis and generate insights` using SQL-based data.
-
-
2. Make sure the person already have access to the semantic model:
diff --git a/Workloads-Specific/PowerBi/CopilotReports.md b/Workloads-Specific/PowerBi/CopilotReports.md
index d4c3b5f..e21f51c 100644
--- a/Workloads-Specific/PowerBi/CopilotReports.md
+++ b/Workloads-Specific/PowerBi/CopilotReports.md
@@ -5,14 +5,14 @@ Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
----------
-> Prerequisites:
-> - **Admin Account**: Ensure you have admin privileges in Microsoft Fabric.
-> - **Licenses**: You need a paid Fabric capacity (F64 or higher) or Power BI Premium capacity (P1 or higher).
-
+> Prerequisites:
+>
+> - **Admin Account**: Ensure you have admin privileges in Microsoft Fabric.
+> - **Licenses**: You need a paid Fabric capacity (F64 or higher) or Power BI Premium capacity (P1 or higher).
List of References (Click to expand)
@@ -32,14 +32,13 @@ Last updated: 2025-04-16
-
-## How to Tenant configuration
+## How to Configure the Tenant
1. **Sign In**: Log in to Microsoft Fabric using your admin account credentials.
2. **Access Admin Portal**: Go to the Fabric settings and select the Admin portal from the menu.
-
+
3. **Tenant Settings**: Navigate to the Tenant settings in the Admin portal.
4. **Enable Copilot**: Use the search feature to locate the Copilot settings. Toggle the switch to enable Copilot in Fabric.
@@ -50,8 +49,9 @@ Last updated: 2025-04-16
## How to Configure Workspaces
+
1. **Workspace Settings**: Ensure that your reports are located in a workspace with either Premium Power BI (P1 and above) or paid Fabric (F64 and above) capacity.
-
+
2. **Apply Capacity**: Check your license type in the Workspace settings and apply either Premium capacity or Fabric capacity to the workspace.
@@ -59,6 +59,7 @@ Last updated: 2025-04-16
## How to Using Copilot in Power BI
+
1. **Access Copilot**: Once enabled, users can access Copilot across different workloads in Fabric, including Power BI.
2. **Generate Insights**: Use Copilot to transform and analyze data, generate insights, and create visualizations and reports.
diff --git a/Workloads-Specific/PowerBi/HowUseRestAPI.md b/Workloads-Specific/PowerBi/HowUseRestAPI.md
index 49117bf..a992aa3 100644
--- a/Workloads-Specific/PowerBi/HowUseRestAPI.md
+++ b/Workloads-Specific/PowerBi/HowUseRestAPI.md
@@ -1,17 +1,16 @@
-# Demostration: How to Use Power BI REST API
+# Demonstration: How to Use Power BI REST API
Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-16
+Last updated: 2025-04-21
----------
> The Power BI REST API provides programmatic access to several Power BI resources, enabling automation and embedding of analytics.
-
List of References (Click to expand)
@@ -27,19 +26,17 @@ Last updated: 2025-04-16
-
Table of Contents (Click to expand)
- [Overview](#overview)
- [How to work around the rate limits](#how-to-work-around-the-rate-limits)
- - [Batch Request](#batch-request)
- - [Example Implementation in Python](#example-implementation-in-python)
+ - [Batch Request](#batch-request)
+ - [Example Implementation in Python](#example-implementation-in-python)
-## Overview
-
+## Overview
> [!IMPORTANT]
> There are rate limits for Power BI REST API endpoints.
@@ -71,7 +68,7 @@ Last updated: 2025-04-16
> Example of this works:
-```mermaid
+```mermaid
graph TD
A[Client Application] -->|Batch Request| B[Power BI REST API]
B -->|Response| A
@@ -137,8 +134,6 @@ response = batch_request(access_token, requests)
print(response)
```
-
-
Total Visitors

diff --git a/Workloads-Specific/PowerBi/IncrementalRefresh.md b/Workloads-Specific/PowerBi/IncrementalRefresh.md
index 1c81a0c..a6ce326 100644
--- a/Workloads-Specific/PowerBi/IncrementalRefresh.md
+++ b/Workloads-Specific/PowerBi/IncrementalRefresh.md
@@ -1,11 +1,11 @@
-# Power Bi: Incremental Refresh for Reporting - Overview
+# Power BI: Incremental Refresh for Reporting - Overview
Costa Rica
[](https://github.com/)
[brown9804](https://github.com/brown9804)
-Last updated: 2025-04-15
+Last updated: 2025-04-21
----------
@@ -28,16 +28,15 @@ Last updated: 2025-04-15
- [Overview](#overview)
- [How the VertiPaq Engine Works](#how-the-vertipaq-engine-works)
- [How to create a unique key](#how-to-create-a-unique-key)
- - [Best Practices for Creating Unique Keys in Power BI](#best-practices-for-creating-unique-keys-in-power-bi)
- - [Strategies to Avoid High Cardinality in Power BI](#strategies-to-avoid-high-cardinality-in-power-bi)
+ - [Best Practices for Creating Unique Keys in Power BI](#best-practices-for-creating-unique-keys-in-power-bi)
+ - [Strategies to Avoid High Cardinality in Power BI](#strategies-to-avoid-high-cardinality-in-power-bi)
- [Steps to Change a Column Type to Date in Power BI](#steps-to-change-a-column-type-to-date-in-power-bi)
+## Overview
-## Overview
-
-> Allows Power BI to refresh only the data that has changed or is new since the last refresh, rather than refreshing the entire dataset. Particularly useful for large datasets, reducing processing and transfer times.
+> Allows Power BI to refresh only the data that has changed or is new since the last refresh, rather than refreshing the entire dataset. Particularly useful for large datasets, reducing processing and transfer times.
| **Aspect** | **Details** |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -113,7 +112,7 @@ Last updated: 2025-04-15
- **Optimize Data Model**: Ensure your data model is well-structured, preferably using a star schema, to improve performance and make troubleshooting easier.
- **Monitor Performance**: Keep an eye on performance metrics to identify any bottlenecks or issues related to data transformations and loading. Regular monitoring can help you catch and address issues before they impact your reports and dashboards.
-## How to create a unique key
+## How to create a unique key
> By concatenating multiple columns using DAX (Data Analysis Expressions) in Power BI
@@ -125,11 +124,12 @@ Last updated: 2025-04-15
UniqueKey = [column1] & "_" & [column2] & "_" & [column3]
```
- For example:
+ For example:
```DAX
UniqueKey = [DateTimeColumn] & "_" & [CallerID] & "_" & [CallID]
```
+
- **Apply the Changes**: After entering the formula, press Enter to create the new column. In this DAX formula example, it concatenates the `DateTimeColumn`, `CallerID`, and `CallID` columns with underscores to create a unique key for each record.
### Best Practices for Creating Unique Keys in Power BI