Releases: microsoft/data-formulator

Data Formulator 0.2

24 Apr 00:45
2fb5016

Data Formulator 0.2 now supports working with large datasets, powered by the backend database!

Demonstration: Exploration of Metacritic's Best Games and Reviews - 2025

  • This Kaggle dataset contains 13k+ games and 1.6M+ reviews of the best games, based on Metacritic reviews.
  • Data source: https://www.kaggle.com/datasets/davutb/metacritic-games
  • Exploration:
    • What's the relation between user scores and critic scores?
    • Which games have very high user scores but very low critic scores?
    • How does the score distribution compare between critics and users?
df-demo-game-reviews.mp4

Release details: data visualization with large datasets.

Data Formulator integrates DuckDB as a local backend database to support data exploration with large datasets (millions of rows). It is also possible to connect external databases through DuckDB. Not all connections are supported at the moment, but that's a beginning!
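
To give a feel for why a local database helps with large data, here is a minimal sketch, not Data Formulator's actual code. The full table stays in the database, and only a small sample or an aggregated result is handed to the UI. It uses Python's built-in sqlite3 purely as a stand-in for DuckDB so that it runs without extra packages. The table name, column names, helper functions, and rows are all made up for illustration.

```python
import sqlite3

# sqlite3 is only a standard-library stand-in for DuckDB in this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reviews (game TEXT, source TEXT, score REAL)")

# Synthetic rows, invented for illustration (not taken from the Kaggle dataset).
rows = [
    (f"game_{i % 50}", "critic" if i % 2 else "user", float(i % 101))
    for i in range(10_000)
]
con.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)


def sample_rows(con, n=5):
    """Return a random sample of n rows, like rolling the dice for a new preview."""
    return con.execute(
        "SELECT * FROM reviews ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()


def top_rows(con, column, n=5):
    """Return the n rows with the largest value in the given column."""
    allowed = {"game", "source", "score"}  # whitelist, since the column name is interpolated
    if column not in allowed:
        raise ValueError(f"unknown column: {column}")
    return con.execute(
        f"SELECT * FROM reviews ORDER BY {column} DESC LIMIT ?", (n,)
    ).fetchall()


def mean_score_by_source(con):
    """Aggregate inside the database so only one row per source leaves it."""
    return con.execute(
        "SELECT source, AVG(score), COUNT(*) FROM reviews "
        "GROUP BY source ORDER BY source"
    ).fetchall()


preview = sample_rows(con)             # 5 rows for the frontend table view
summary = mean_score_by_source(con)    # 2 rows for a chart, instead of 10,000
```

The point of the design is that the result handed to the frontend stays small no matter how many rows the table holds.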

  • Upload large data to the local database, or connect to an existing database (MySQL or PostgreSQL) to work with large data.
    • A sample of the data is pulled to the frontend for exploration. You can roll the dice 🎲 or sort the data by different columns to view different samples.
    • Manage the local database with the Database manager.
  • Interact with Data Formulator as usual:
    • Use drag and drop to specify a chart, and Data Formulator dynamically generates a SQL query to fetch the data needed to instantiate the chart. This process is quite fast!
    • Specify new visualization fields or provide NL instructions as usual, and the newly introduced NL2SQL agents generate SQL queries based on your instructions to prepare the data and create visualizations.
    • Anchor a dataset, follow up, join some tables, and you can dive deep into insights pretty fast!
  • (Minor feature updates)
    • Updated how derived concepts work in Data Formulator: the data transformation is executed in the backend and the updated data is appended to the new dataset. New concepts can be applied directly to the new dataset in one click.
    • Improved system performance with configurable sandboxing options (main process versus subprocess) for LLM-generated code (~3s interaction time reduction).
    • Configurable default visualization size in the main panel.
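
The sandboxing trade-off mentioned above (main process versus subprocess) can be illustrated with a small sketch. This is not Data Formulator's implementation. The function names are hypothetical, and a plain subprocess is only process isolation, not a full security sandbox. The sketch shows why running generated code in the main process avoids the start-up cost of a fresh interpreter, at the price of less isolation.

```python
import subprocess
import sys


def run_in_main_process(code: str) -> dict:
    """Execute code with exec() in the current process: fast, but not isolated."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace


def run_in_subprocess(code: str, timeout: float = 10.0) -> str:
    """Run code in a fresh interpreter: slower to start, but a crash or a
    runaway loop (cut off by the timeout) cannot take down the main process."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout


# A stand-in for LLM-generated transformation code.
generated = "result = sum(x * x for x in range(5))\nprint(result)"

in_process = run_in_main_process(generated)["result"]
isolated = run_in_subprocess(generated).strip()
```

Both calls compute the same value. The first returns it directly as a Python object, while the second has to pass it back through standard output.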

More explorations on the demo dataset:

  • What are your favorite games, and how have their reviews changed over time?
  • Which franchise has consistently improved its reviews?
  • Which games have the most divergent reviews across platforms?
  • Which games have many positive critic reviews but no users who bothered to play them?
  • What about review trends for No Man's Sky?

Well, it is time to upgrade Data Formulator and play with it! Let us know what you come up with :)

Data Formulator 0.1.7

21 Mar 02:24
79043d0

With Data Anchor, we can anchor an intermediate dataset to isolate its derivation context from its predecessors. Tables created from the anchor take the anchored table as direct input (not the original data).

This can be helpful for cleaning the initial input data (so we always work with cleaned data afterwards), or when we want to focus our analysis on a subset of the dataset.
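
One way to picture anchoring is as cutting the chain of tables that a derivation can see. The sketch below is a conceptual model only. The Table class, the anchored flag, and derivation_context are hypothetical names invented for this illustration and are not part of Data Formulator's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    name: str
    parent: Optional["Table"] = None  # the table this one was derived from
    anchored: bool = False            # True if this table acts as a fresh starting point


def derivation_context(table: Table) -> list[str]:
    """Return the names of the tables visible when deriving from `table`,
    oldest first. The walk up the parent chain stops at the nearest anchor."""
    chain = []
    node = table
    while node is not None:
        chain.append(node.name)
        if node.anchored:
            break
        node = node.parent
    return list(reversed(chain))


# Hypothetical thread: raw data -> cleaned data -> an analysis table.
raw = Table("movies_raw")
cleaned = Table("movies_cleaned", parent=raw)
profit = Table("director_profit", parent=cleaned)

before = derivation_context(profit)
cleaned.anchored = True  # anchor the cleaned table
after = derivation_context(profit)
```

Before anchoring, the context reaches back to movies_raw. After anchoring, it starts at movies_cleaned, so follow-up derivations only see the cleaned data.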

Example 1: Clean table

Use an anchor to clean the table, so that all follow-up analyses are built on top of the clean data. The analysis of director profit is based on the filtered data.

anchor-clean-data.mp4

Example 2: Analyze a subset

Create a subset from the original table to focus the analysis. The AI agent is less likely to be confused, and the analysis is faster. The anchored asian-energy dataset includes only countries from Asia.

anchor-subset-analysis.mp4

Illustration

The anchored thread has its own context, with no more access to the original data. You still have the option to add the original data back using the "multi-table" approach from the previous release. You can also go back to the original data to create another branch there.

Data Formulator 0.1.6

20 Feb 22:52
ed3bbe3

Highlight

It was supposed to only be some improvements and bug fixes over 0.1.5, but ended up getting much better --- Data Formulator now supports working with multiple tables! 🔥🔥🔥

When you add multiple tables to Data Formulator, you can select which base tables Data Formulator will use to derive the data (in the chart builder). This means Data Formulator can flexibly decide how to join or combine multiple tables together to create a visualization or answer your question.

In the demo below, we have a dataset of UK wheat production.

  • To visualize wheat production by UK monarch, we can load a second table (here I asked GPT-4o to generate the table from scratch, since it has knowledge about history :)).
  • Then, we can drag a field from the second table to indicate that we want Data Formulator to use both tables to generate the chart, and it does.
  • In the second demo, we manually tell Data Formulator that it needs to consider both tables to answer "average wheat production per monarch", and it again joins the two tables to create the answer.
df-multi-table-demo.mp4
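
For readers curious what joining the two tables might look like, here is a minimal sketch of one possible query for "average wheat production per monarch". It is not the query Data Formulator generates. It uses Python's built-in sqlite3, the table and column names are assumptions, and the production figures are invented placeholders. Only the reign years are historical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wheat (year INTEGER, production REAL)")
con.execute(
    "CREATE TABLE monarchs (name TEXT, reign_start INTEGER, reign_end INTEGER)"
)

# Production values are made-up placeholders, not real statistics.
con.executemany(
    "INSERT INTO wheat VALUES (?, ?)",
    [(1890, 10.0), (1900, 12.0), (1905, 14.0), (1920, 20.0), (1930, 22.0)],
)
con.executemany(
    "INSERT INTO monarchs VALUES (?, ?, ?)",
    [
        ("Victoria", 1837, 1901),
        ("Edward VII", 1901, 1910),
        ("George V", 1910, 1936),
    ],
)

# Range join: each year is matched to the monarch whose reign covers it.
# The interval is half-open so a succession year counts once, for the new monarch.
query = """
SELECT m.name, AVG(w.production) AS avg_production
FROM wheat AS w
JOIN monarchs AS m
  ON w.year >= m.reign_start AND w.year < m.reign_end
GROUP BY m.name
ORDER BY MIN(m.reign_start)
"""
result = con.execute(query).fetchall()
```

Because the two tables share no key column, the join condition is a range comparison on the year rather than an equality.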

Besides this feature, we have improved and fixed various UI and model selection issues reported by the community. Thanks everyone for your suggestions! Let us know what you would like to see in Data Formulator next. :)

Full Changelog: 0.1.5.1...0.1.6

Data Formulator 0.1.5

13 Feb 00:30
0e1f215

What's New

Support more models!

Still, check out this Data Formulator experience video:

data-formulator-ms-year-report-demo.mp4

0.1.5.1 -- fix the file upload bug

Data Formulator 0.1.4

07 Nov 18:41
8697bb8

This is an update to the previous version with better error message display to help users debug what's going on if Data Formulator fails to run. It also introduces direct conversation with a table, which could be useful for data cleaning.

  • We also improved the data visualization challenges in Data Formulator. Can you complete them all?
  • Comment in the issue when you do, or share your results/questions with others! [comment here]

Enjoy this version! If there is any feedback, let us know.

data-formulator-ms-year-report-demo.mp4

Data Formulator v0.1.3.3

21 Oct 17:29
c5f86f0

This is an update to the previous version with better error message display to help users debug what's going on if Data Formulator fails to run.

Update in 0.1.3.2: also includes a port option to run Data Formulator on a different port if the default one is occupied.
Update in 0.1.3.3: you can also provide cleaning instructions when uploading an image.

Enjoy this version! If there is any feedback, let us know.

Here is a demo of this new version!

data-formulator-ms-year-report-demo.mp4

Data Formulator v0.1.2 Release

11 Oct 22:19
6af6886

Data Formulator v0.1.2 is released, featuring:

  • Python package release, so that you can install Data Formulator locally with pip install data_formulator and run it with data_formulator.
  • The Codespaces script is updated to use the Python package, for faster start-up time.
  • Experimental feature for loading images or messy text as inputs.