Adding documentation for Microsoft Fabric VS Code experience for data science #8946
Open. mksuni wants to merge 32 commits into microsoft:main from mksuni:fabric-doc
Commits (32; changes shown from 30 commits):
- b890664 Add Microsoft Fabric quickstart guide for VS Code (mksuni)
- ea97175 Fix typo in Microsoft Fabric section headings (mksuni)
- be3b614 Update Microsoft Fabric quickstart documentation (mksuni)
- e03268f added Git and MCP details (mksuni)
- acb34c1 Merge branch 'microsoft:main' into main (mksuni)
- ad9606e updated fabric extension doc (mksuni)
- b8e5cb3 updated fabric data science documentation (mksuni)
- c35a4f7 Merge branch 'main' into fabric-doc (mksuni)
- 94217d4 added image for UDF (mksuni)
- 38505f5 Merge branch 'fabric-doc' of https://github.com/mksuni/vscode-docs in… (mksuni)
- 9f37ea4 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 480f238 Update microsoft-fabric-quickstart.md (mksuni)
- 09342f3 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- f8220dd Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 65c5263 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- b5a1514 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 3f9158d Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 84f7291 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 52c6727 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 633f59d Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 8126dcf Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 744ed6e Merge branch 'main' into fabric-doc (mksuni)
- ea23bf5 Update microsoft-fabric-quickstart.md (mksuni)
- 1717347 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- c042bd3 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 433ce8d Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- faae694 Update docs/datascience/microsoft-fabric-quickstart.md (mksuni)
- 7983a2b Update microsoft-fabric-quickstart.md (mksuni)
- d10989a Update toc.json (mksuni)
- 3e3c7f3 Update microsoft-fabric-quickstart.md (mksuni)
- 5f71eb3 Merge branch 'microsoft:main' into fabric-doc (mksuni)
- ab6f339 Update image captions in quickstart guide (mksuni)
Image files added (each: 3 changes, 3 additions & 0 deletions):
- docs/datascience/images/microsoft-fabric/fabric-command-palette.png
- docs/datascience/images/microsoft-fabric/fabric-git-integration.gif
- docs/datascience/images/microsoft-fabric/microsoft-fabric.png
- docs/datascience/images/microsoft-fabric/publish-user-data-function.png
- docs/datascience/images/microsoft-fabric/view-workspaces-and-items.png
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,219 @@ | ||
| --- | ||
| ContentId: 99a5d36e-ce14-4040-b1cf-7345b7fa2c7d | ||
| DateApproved: 10/9/2025 | ||
| MetaDescription: Get started with Microsoft Fabric extensions for Visual Studio Code to develop data engineering and analytics solutions | ||
| MetaSocialImage: images/datascience/fabric-social.png | ||
| --- | ||
|
|
||
| # Data science in Microsoft Fabric using Visual Studio Code | ||
|
|
||
| You can build and develop data science and data engineering solutions for [Microsoft Fabric](https://learn.microsoft.com/fabric/) within VS Code. [Microsoft Fabric](https://marketplace.visualstudio.com/items?itemName=fabric.vscode-fabric) extensions for VS Code provide an integrated development experience for working with Fabric artifacts, lakehouses, notebooks, and user data functions. | ||
|
|
||
| ## What is Microsoft Fabric? | ||
|
|
||
| [Microsoft Fabric](https://app.fabric.microsoft.com/) is an enterprise-ready, end-to-end analytics platform. It unifies data movement, data processing, ingestion, transformation, real-time event routing, and report building. It supports these capabilities with integrated services like Data Engineering, Data Factory, Data Science, Real-Time Intelligence, Data Warehouse, and Databases. [Sign up for free](https://app.fabric.microsoft.com/?pbi_source=learn-vscodedocs-microsoft-fabric-quickstart) and explore Microsoft Fabric for 60 days, with no credit card required. | ||
|
|
||
|  | ||
|
||
|
|
||
| ## Prerequisites | ||
|
|
||
| Before you get started with Microsoft Fabric extensions for VS Code, you need: | ||
|
|
||
| * **Visual Studio Code**: Install the latest version of [VS Code](https://code.visualstudio.com/). | ||
| * **Microsoft Fabric account**: You need access to a Microsoft Fabric workspace. You can [sign up for a free trial](https://app.fabric.microsoft.com/?pbi_source=learn-vscodedocs-microsoft-fabric-quickstart) to get started. | ||
| * **Python**: Install [Python 3.8 or later](https://python.org/downloads/) to work with [Notebooks](https://learn.microsoft.com/fabric/data-engineering/author-notebook-with-vs-code) and [User data functions](https://learn.microsoft.com/fabric/data-engineering/user-data-functions/create-user-data-functions-vs-code) in VS Code. | ||
|
|
||
| ## Installation and setup | ||
|
|
||
| You can find and install the extensions from the [Visual Studio Marketplace](https://marketplace.visualstudio.com/VSCode) or directly in VS Code. Select the **Extensions** view (`kb(workbench.view.extensions)`) and search for **Microsoft Fabric**. | ||
|
|
||
| ### Which extensions to use | ||
|
|
||
| | Extension | Best for | Key features | Recommended for you if… | Documentation | | ||
| |-----------------------------|-----------------------------|-----------------------------|--------------------------| --------------------------| | ||
| | **Microsoft Fabric extension** | General workspace management, item management, and working with item definitions | - Manage Fabric items (Lakehouses, Notebooks, Pipelines)<br>- Microsoft account sign-in & tenant switching<br>- Unified or grouped item views<br>- Edit Fabric notebooks with IntelliSense<br>- Command Palette integration (`Fabric:` commands) | You want a single extension to manage workspaces, notebooks, and items in Fabric directly from VS Code. | [What is the Fabric VS Code extension](https://learn.microsoft.com/fabric/data-engineering/set-up-fabric-vs-code-extension)| | ||
| | **Fabric User data functions** | Developers building custom transformations & workflows | - Author serverless functions in Fabric<br>- Local debugging with breakpoints<br>- Manage data source connections<br>- Install/manage Python libraries<br>- Deploy functions directly to Fabric workspace | You build automation or data transformation logic and need debugging + deployment from VS Code. | [Develop user data functions in VS Code](https://learn.microsoft.com/fabric/data-engineering/user-data-functions/create-user-data-functions-vs-code)| | ||
| | **Fabric Data Engineering** | Data engineers working with large-scale data & Spark | - Explore Lakehouses (tables, raw files)<br>- Develop/debug Spark notebooks<br>- Build/test Spark job definitions<br>- Sync notebooks between local VS Code & Fabric<br>- Preview schemas & sample data | You work with Spark, Lakehouses, or large-scale data pipelines and want to explore, develop, and debug locally. | [Develop Fabric notebooks in VS Code](https://learn.microsoft.com/fabric/data-engineering/setup-vs-code-extension) | | ||
|
|
||
| ## Getting started | ||
| Once you have installed the extensions and signed in, you can start working with Fabric workspaces and items. In the Command Palette (`kb(workbench.action.showCommands)`), type **Fabric** to list the commands that are specific to Microsoft Fabric. | ||
|  | ||
|
|
||
| ## Fabric Workspace and items explorer | ||
|
|
||
| The Fabric extensions let you work with both remote and local Fabric items. | ||
| - In the Fabric extension, the **Fabric Workspaces** section lists all items from your remote workspace, organized by type (Lakehouses, Notebooks, Pipelines, and more). | ||
| - In the Fabric extension, the **Local folder** section shows the Fabric item folders opened in VS Code. It reflects the structure of the item definition for each Fabric item that is open in VS Code, so you can develop locally and publish your changes to the current workspace or a new one. | ||
|
|
||
|  | ||
|
|
||
| ## Use user data functions for data science | ||
|
|
||
| 1. In the Command Palette (`kb(workbench.action.showCommands)`), type **Fabric: Create Item**. | ||
| 2. Select your workspace and select **User data function**. Provide a name and select **Python** language. | ||
| 3. When you are notified to set up the Python virtual environment, continue with the local setup. | ||
| 4. Install the libraries using `pip install` or select the user data function item in the Fabric extension to add libraries. Update the `requirements.txt` file to specify the dependencies: | ||
|
|
||
| ```txt | ||
| fabric-user-data-functions ~= 1.0 | ||
| pandas == 2.3.1 | ||
| numpy == 2.3.2 | ||
| requests == 2.32.5 | ||
| scikit-learn == 1.2.0 | ||
| joblib == 1.2.0 | ||
| ``` | ||
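A requirements file only installs cleanly when every version pin uses a valid operator such as `==` or `~=`; a single `=` is rejected by pip. The following is a minimal sketch of a pre-check you could run before installing. The helper name `invalid_requirements` is hypothetical, and the pattern is a simplification that ignores extras, environment markers, and URLs.

```python
import re

# Simplified pattern: a package name, optionally followed by one
# version operator and a version. Extras and markers are not handled.
_REQUIREMENT = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*"
    r"(\s*(===|==|~=|!=|<=|>=|<|>)\s*[A-Za-z0-9.*+!_-]+)?"
)


def invalid_requirements(lines):
    """Return the requirement lines that do not match the simplified pattern."""
    bad = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        if not _REQUIREMENT.fullmatch(stripped):
            bad.append(stripped)
    return bad


if __name__ == "__main__":
    sample = ["pandas == 2.3.1", "example-package=1.0"]
    print(invalid_requirements(sample))
```

Running the check on the lines of `requirements.txt` before `pip install` surfaces malformed pins early instead of in the middle of environment setup.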
|
|
||
| 5. Open `functions_app.py`. Here's an example of developing a User Data Function for data science using scikit-learn: | ||
|
|
||
| ```python | ||
| import datetime | ||
| import fabric.functions as fn | ||
| import logging | ||
|
|
||
| # Import additional libraries | ||
| import pandas as pd | ||
| from sklearn.ensemble import RandomForestClassifier | ||
| from sklearn.preprocessing import StandardScaler | ||
| from sklearn.model_selection import train_test_split | ||
| from sklearn.metrics import accuracy_score | ||
| import joblib | ||
|
|
||
| udf = fn.UserDataFunctions() | ||
| @udf.function() | ||
| def train_churn_model(data: list, targetColumn: str) -> dict: | ||
|     ''' | ||
|     Description: Train a Random Forest model to predict customer churn using pandas and scikit-learn. | ||
|
|
||
|     Args: | ||
|     - data (list): List of dictionaries containing customer features and churn target | ||
|       Example: [{"Age": 25, "Income": 50000, "Churn": 0}, {"Age": 45, "Income": 75000, "Churn": 1}] | ||
|     - targetColumn (str): Name of the target column for churn prediction | ||
|       Example: "Churn" | ||
|
|
||
|     Returns: dict: Model training results including accuracy and feature information | ||
|     ''' | ||
|     # Convert data to DataFrame | ||
|     df = pd.DataFrame(data) | ||
|
|
||
|     # Prepare features and target (the target must be a numeric column) | ||
|     numeric_features = df.select_dtypes(include=['number']).columns.tolist() | ||
|     if targetColumn not in numeric_features: | ||
|         raise ValueError(f"Target column '{targetColumn}' must be a numeric column in the data") | ||
|     numeric_features.remove(targetColumn) | ||
|
|
||
|     X = df[numeric_features] | ||
|     y = df[targetColumn] | ||
|
|
||
|     # Split and scale data | ||
|     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) | ||
|     scaler = StandardScaler() | ||
|     X_train_scaled = scaler.fit_transform(X_train) | ||
|     X_test_scaled = scaler.transform(X_test) | ||
|
|
||
|     # Train model | ||
|     model = RandomForestClassifier(n_estimators=100, random_state=42) | ||
|     model.fit(X_train_scaled, y_train) | ||
|
|
||
|     # Evaluate and save | ||
|     accuracy = accuracy_score(y_test, model.predict(X_test_scaled)) | ||
|     joblib.dump(model, 'churn_model.pkl') | ||
|     joblib.dump(scaler, 'scaler.pkl') | ||
|
|
||
|     return { | ||
|         'accuracy': float(accuracy), | ||
|         'features': numeric_features, | ||
|         'message': f'Model trained with {len(X_train)} samples and {accuracy:.2%} accuracy' | ||
|     } | ||
|
|
||
| @udf.function() | ||
| def predict_churn(customer_data: list) -> list: | ||
|     ''' | ||
|     Description: Predict customer churn using trained Random Forest model. | ||
|
|
||
|     Args: | ||
|     - customer_data (list): List of dictionaries containing customer features for prediction | ||
|       Example: [{"Age": 30, "Income": 60000}, {"Age": 55, "Income": 80000}] | ||
|
|
||
|     Returns: list: Customer data with churn predictions and probability scores | ||
|     ''' | ||
|     # Load saved model and scaler | ||
|     model = joblib.load('churn_model.pkl') | ||
|     scaler = joblib.load('scaler.pkl') | ||
|
|
||
|     # Convert to DataFrame and scale features | ||
|     df = pd.DataFrame(customer_data) | ||
|     X_scaled = scaler.transform(df) | ||
|
|
||
|     # Make predictions | ||
|     predictions = model.predict(X_scaled) | ||
|     probabilities = model.predict_proba(X_scaled)[:, 1] | ||
|
|
||
|     # Add predictions to a copy of each row so the caller's input is not modified | ||
|     results = [dict(row) for row in customer_data] | ||
|     for i, (pred, prob) in enumerate(zip(predictions, probabilities)): | ||
|         results[i]['churn_prediction'] = int(pred) | ||
|         results[i]['churn_probability'] = float(prob) | ||
|
|
||
|     return results | ||
| ``` | ||
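`train_churn_model` expects `data` to be a list of dictionaries in which the target column is present and numeric, alongside at least one numeric feature. The following is a minimal, standard-library sketch of a payload check you could run before calling the function. The helper `check_training_rows` is hypothetical and is not part of the Fabric SDK.

```python
from numbers import Number


def _is_numeric(value):
    # Booleans are excluded so the check stays close to pandas' 'number' dtypes.
    return isinstance(value, Number) and not isinstance(value, bool)


def check_training_rows(rows, target_column):
    """Return a list of problems; an empty list means the payload looks usable."""
    if not rows:
        return ["no rows supplied"]

    problems = []
    for index, row in enumerate(rows):
        if target_column not in row:
            problems.append(f"row {index}: missing target column {target_column!r}")
        elif not _is_numeric(row[target_column]):
            problems.append(f"row {index}: target {target_column!r} is not numeric")

    # At least one numeric column other than the target is needed as a feature.
    has_feature = any(
        key != target_column and _is_numeric(value)
        for row in rows
        for key, value in row.items()
    )
    if not has_feature:
        problems.append("no numeric feature columns found")
    return problems


if __name__ == "__main__":
    rows = [
        {"Age": 25, "Income": 50000, "Churn": 0},
        {"Age": 45, "Income": 75000, "Churn": 1},
    ]
    print(check_training_rows(rows, "Churn"))
```

Returning a list of messages rather than raising on the first issue makes it easier to report every problem in a payload at once.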
|
|
||
| 6. Test your functions locally by pressing `kbstyle(F5)`. | ||
| 7. In the Fabric extension, under **Local folder**, select the function and publish it to your workspace. | ||
|  | ||
|
||
|
|
||
| Learn more about invoking the function from: | ||
| - [Fabric Data pipelines](https://learn.microsoft.com/fabric/data-engineering/user-data-functions/create-functions-activity-data-pipelines) | ||
| - [Fabric Notebooks](https://learn.microsoft.com/fabric/data-engineering/notebook-utilities#user-data-function-udf-utilities) | ||
| - [An external application](https://learn.microsoft.com/fabric/data-engineering/user-data-functions/tutorial-invoke-from-python-app) | ||
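As a rough illustration of calling a published function from an external application, the sketch below builds (but does not send) an HTTP request with the standard library. It assumes the function is reachable at an HTTPS endpoint that accepts a JSON body of parameters and a bearer token. The URL, the token value, and the helper `build_invoke_request` are placeholders, and the linked tutorial remains the authoritative reference for the actual endpoint and authentication flow.

```python
import json
import urllib.request


def build_invoke_request(function_url, access_token, parameters):
    """Build a POST request that carries the function parameters as JSON."""
    body = json.dumps(parameters).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder endpoint and token: copy the real values from your workspace.
    request = build_invoke_request(
        "https://example.invalid/functions/predict_churn/invoke",
        "<access-token>",
        {"customer_data": [{"Age": 30, "Income": 60000}]},
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.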
|
|
||
| ## Use Fabric notebooks for data science | ||
| A Fabric notebook is an interactive workbook in Microsoft Fabric for writing and running code, visualizations, and markdown side by side. Notebooks support multiple languages (Python, Spark, SQL, Scala, and more) and are ideal for data exploration, transformation, and model development on your existing data in OneLake. | ||
|
|
||
| ### Example | ||
|
|
||
| The cell below reads a CSV with Spark, converts it to pandas, and trains a logistic regression model with scikit-learn. Replace column names and path with your dataset values. | ||
|
|
||
| ```python | ||
| def train_logistic_from_spark(spark, csv_path): | ||
|     # Read CSV with Spark, convert to pandas | ||
|     sdf = spark.read.option("header", "true").option("inferSchema", "true").csv(csv_path) | ||
|     df = sdf.toPandas().dropna() | ||
|
|
||
|     # Adjust these to match your dataset | ||
|     X = df[['feature1', 'feature2']] | ||
|     y = df['label'] | ||
|
|
||
|     from sklearn.model_selection import train_test_split | ||
|     from sklearn.linear_model import LogisticRegression | ||
|     from sklearn.metrics import accuracy_score | ||
|
|
||
|     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) | ||
|     model = LogisticRegression(max_iter=200) | ||
|     model.fit(X_train, y_train) | ||
|
|
||
|     preds = model.predict(X_test) | ||
|     return {'accuracy': float(accuracy_score(y_test, preds))} | ||
|
|
||
| # Example usage in a Fabric notebook cell | ||
| # train_logistic_from_spark(spark, '/path/to/data.csv') | ||
| ``` | ||
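The notebook example assumes the CSV contains columns named `feature1`, `feature2`, and `label`. The following is a small, hedged sketch that checks a CSV header for the expected columns before you start a training run. The helper `missing_columns` is hypothetical and works on CSV text you have already read, for example the first few lines of a sample file.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # An empty input yields an empty header.
    return [name for name in required if name not in header]


if __name__ == "__main__":
    sample = "feature1,feature2,label\n0.5,1.2,0\n"
    print(missing_columns(sample, ["feature1", "feature2", "label"]))
```

If the returned list is not empty, adjust the column names in the training cell to match your dataset.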
|
|
||
| Refer to [Microsoft Fabric Notebooks](https://learn.microsoft.com/fabric/data-engineering/how-to-use-notebook) documentation to learn more. | ||
|
|
||
| ## Git integration | ||
| Microsoft Fabric supports Git integration, which enables version control and collaboration across data and analytics projects. You can connect a Fabric workspace to a Git repository, primarily in Azure DevOps or GitHub; only supported items are synced. This integration also supports CI/CD workflows so that teams can manage releases efficiently and maintain high-quality analytics environments. | ||
|
|
||
|  | ||
|
|
||
| ## Next steps | ||
|
|
||
| Now that you have Microsoft Fabric extensions set up in VS Code, explore these resources to deepen your knowledge: | ||
|
|
||
| ### Learn more about Microsoft Fabric | ||
| * [Learn about Microsoft Fabric for Data Science](https://learn.microsoft.com/fabric/data-science/tutorial-data-science-introduction) | ||
| * [Set up your Fabric trial capacity](https://learn.microsoft.com/fabric/fundamentals/fabric-trial) | ||
| * [Microsoft Fabric fundamentals](https://learn.microsoft.com/fabric/fundamentals/fabric-overview) | ||
|
|
||
| ### Community and support | ||
|
|
||
| * [Microsoft Fabric community forums](https://community.fabric.microsoft.com/) | ||
| * [Fabric samples and templates](https://github.com/microsoft/fabric-samples) | ||
| * [Visual Studio Marketplace reviews and feedback](https://marketplace.visualstudio.com/items?itemName=fabric.vscode-fabric) | ||
Add a sentence that tells the reader what this article is about. You can start with "This article ..."