Commit 2fb5016

[deploy] Data Formulator v0.2 -- working with large dataset (with database support)
data formulator v0.2, with database support to handle large sized data
2 parents 01a79d8 + 8b2bd62 commit 2fb5016

48 files changed, with 4493 additions and 2988 deletions.

.env.template

Lines changed: 19 additions & 1 deletion
@@ -1,2 +1,20 @@
 # Provide frontend configuration settings from environment variables
-SHOW_KEYS_ENABLED=true
+# You can override these settings when launching the app as well:
+# python -m data_formulator -p 5000 --exec-python-in-subprocess true --disable-display-keys true
+
+DISABLE_DISPLAY_KEYS=false # if true, the display keys will not be shown in the frontend
+EXEC_PYTHON_IN_SUBPROCESS=false # if true, the python code will be executed in a subprocess to avoid crashing the main app, but it will increase the time of response
+
+LOCAL_DB_DIR= # the directory to store the local database, if not provided, the app will use the temp directory
+
+# External database connection settings
+# check https://duckdb.org/docs/stable/extensions/mysql.html
+# and https://duckdb.org/docs/stable/extensions/postgres.html
+USE_EXTERNAL_DB=false # if true, the app will use an external database instead of the one in the app
+DB_NAME=mysql_db # the name to refer to this database connection
+DB_TYPE=mysql # mysql or postgresql
+DB_HOST=localhost
+DB_PORT=0
+DB_DATABASE=mysql
+DB_USER=root
+DB_PASSWORD=
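
The comment in `.env.template` says the same settings can be overridden on the command line. The commit's actual startup code is not shown in this view, so the following is only a minimal sketch of one way such precedence (CLI flag first, then environment value, then default) could be wired up with `argparse`. The function names `str_to_bool` and `load_settings` and the long option `--port` are hypothetical; only `-p`, `--exec-python-in-subprocess`, and `--disable-display-keys` appear in the template.

```python
import argparse
import os


def str_to_bool(value: str) -> bool:
    """Treat common textual truth values as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(argv=None, env=None):
    """Resolve settings: CLI flags win, then environment values, then defaults.

    Hypothetical helper for illustration, not Data Formulator's actual code.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="data_formulator")
    # "--port" is an assumed long name; the template only shows "-p".
    parser.add_argument("-p", "--port", type=int, default=5000)
    parser.add_argument(
        "--disable-display-keys",
        type=str_to_bool,
        default=str_to_bool(env.get("DISABLE_DISPLAY_KEYS", "false")),
    )
    parser.add_argument(
        "--exec-python-in-subprocess",
        type=str_to_bool,
        default=str_to_bool(env.get("EXEC_PYTHON_IN_SUBPROCESS", "false")),
    )
    return vars(parser.parse_args(argv))
```

Calling `load_settings(["--exec-python-in-subprocess", "true"], env={"DISABLE_DISPLAY_KEYS": "true"})` would enable both flags: one from the command line and one from the environment mapping.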

DEVELOPMENT.md

Lines changed: 19 additions & 3 deletions
@@ -20,9 +20,25 @@ How to set up your local machine.
 ```
 - **Configure environment variables (optional)**
 - copy `api-keys.env.example` to `api-keys.env` and add your API keys.
-- required fields for different providers are different, please refer to the [LiteLLM setup](https://docs.litellm.ai/docs#litellm-python-sdk) guide for more details.
-- currently only endpoint, model, api_key, api_base, api_version are supported.
-- this helps data formulator to automatically load the API keys when you run the app, so you don't need to set the API keys in the app UI.
+- required fields for different providers are different, please refer to the [LiteLLM setup](https://docs.litellm.ai/docs#litellm-python-sdk) guide for more details.
+- currently only endpoint, model, api_key, api_base, api_version are supported.
+- this helps data formulator to automatically load the API keys when you run the app, so you don't need to set the API keys in the app UI.
+
+- set `.env` to configure server properties:
+- copy `.env.template` to `.env`
+- configure settings as needed:
+- DISABLE_DISPLAY_KEYS: if true, API keys will not be shown in the frontend
+- EXEC_PYTHON_IN_SUBPROCESS: if true, Python code runs in a subprocess (safer but slower), you may consider setting it true when you are hosting Data Formulator for others
+- LOCAL_DB_DIR: directory to store the local database (uses temp directory if not set)
+- External database settings (when USE_EXTERNAL_DB=true):
+- DB_NAME: name to refer to this database connection
+- DB_TYPE: mysql or postgresql (currently only these two are supported)
+- DB_HOST: database host address
+- DB_PORT: database port
+- DB_DATABASE: database name
+- DB_USER: database username
+- DB_PASSWORD: database password
+

 - **Run the app**
 - **Windows**
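
The external database settings listed in the DEVELOPMENT.md hunk above map naturally onto the `ATTACH` syntax described in the DuckDB MySQL and Postgres extension pages that `.env.template` links to. How the commit actually consumes these variables is not visible here, so the sketch below is an assumption-laden illustration: `build_attach_statement` is a hypothetical helper that only assembles the SQL string and does not connect to anything. It also does not escape values containing spaces or quotes.

```python
def build_attach_statement(env):
    """Return a DuckDB ATTACH statement for the configured external DB, or None.

    Hypothetical helper: `env` is any mapping holding the DB_* settings.
    """
    if env.get("USE_EXTERNAL_DB", "false").strip().lower() != "true":
        return None  # external database disabled; use the local one

    db_type = env.get("DB_TYPE", "mysql").strip().lower()
    # The two extensions use different keys for the database name.
    if db_type == "mysql":
        name_key, duck_type = "database", "mysql"
    elif db_type in ("postgresql", "postgres"):
        name_key, duck_type = "dbname", "postgres"
    else:
        raise ValueError(f"unsupported DB_TYPE: {db_type!r}")

    parts = [
        ("host", env.get("DB_HOST")),
        ("port", env.get("DB_PORT")),
        (name_key, env.get("DB_DATABASE")),
        ("user", env.get("DB_USER")),
        ("password", env.get("DB_PASSWORD")),
    ]
    # Skip settings that are unset or empty (for example a blank password).
    conn = " ".join(f"{key}={value}" for key, value in parts if value)
    alias = env.get("DB_NAME", "external_db")  # fallback alias is an assumption
    return f"ATTACH '{conn}' AS {alias} (TYPE {duck_type});"
```

With the template defaults plus `USE_EXTERNAL_DB=true`, this would yield an `ATTACH` statement for a MySQL database aliased as `mysql_db`, which could then be passed to a DuckDB connection that has the matching extension installed and loaded.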

README.md

Lines changed: 10 additions & 1 deletion
@@ -22,9 +22,18 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data

 ## News 🔥🔥🔥

+- [04-23-2025] Data Formulator 0.2: working with *large* data 📦📦📦
+- Explore large data by:
+1. Upload large data file to the local database (powered by [DuckDB](https://github.com/duckdb/duckdb)).
+2. Use drag-and-drop to specify charts, and Data Formulator dynamically fetches data from the database to create visualizations (with ⚡️⚡️⚡️ speeds).
+3. Work with AI agents: they generate SQL queries to transform the data to create rich visualizations!
+4. Anchor the result / follow up / create a new branch / join tables; let's dive deeper.
+- Check out the demos at [[https://github.com/microsoft/data-formulator/releases/tag/0.2]](https://github.com/microsoft/data-formulator/releases/tag/0.2)
+- Improved overall system performance, and enjoy the updated derive concept functionality.
+
 - [03-20-2025] Data Formulator 0.1.7: Anchoring ⚓︎
 - Anchor an intermediate dataset, so that followup data analysis are built on top of the anchored data, not the original one.
-- Clean a data and work with only the cleaned data; create a subset from the original data or join multiple data, and then focus your analysis from there. The AI agent will be less likely to get confused and work faster. ⚡️⚡️
+- Clean a data and work with only the cleaned data; create a subset from the original data or join multiple data, and then go from there. AI agents will be less likely to get confused and work faster. ⚡️⚡️
 - Check out the demos at [[https://github.com/microsoft/data-formulator/releases/tag/0.1.7]](https://github.com/microsoft/data-formulator/releases/tag/0.1.7)
 - Don't forget to update Data Formulator to test it out!
