
Commit cb0b1c6

docs: docs for quickstart, cli, model settings (#37)
* vibe it baby
* clean up
* iterate with claude
* Save prog
* Update info pipeine
* Fix tests
* Fix typo
* remove redundant overload
* Add support for multiple default model providers and config
* pull user-defined model configs and providers if available
* Added tests for default model settings
* save progress
* refactor cli to be modular and use OOP
* new tests for cli components
* config_dir > config_path
* simplify list
* list tests
* stranded commit
* tests for commands
* tests for field.py
* tests for form.py
* more tests
* deleting providers should delete associated model configs
* add readme.md for cli
* clean up
* Fix tests
* feat: (FTUE) pull user-defined (via cli) model configs and providers (#24)
* added docs for quick start and default model settings
* Updates per chat
* update quickstart.md
* update default-model-settings.md
* add check for interface.py as well
* move default model config resolution to src/data_designer/__init__.py
* Revert "move default model config resolution to src/data_designer/__init__.py"

This reverts commit 806a81d.

* docs for cli
* update default-model-settings.md
* docs for model provider
* more docs
* add new tests for get provider name
* add lru cache
* remove non doc related changes
* PR feedback
* update reset info
* tip for settings files
* update
* update info about default inference providers
* DATA_DESIGNER_HOME_DIR -> DATA_DESIGNER_HOME

---------

Co-authored-by: Johnny Greco <[email protected]>
1 parent 9d337c2 commit cb0b1c6


6 files changed (+616, -0 lines changed)

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Configuring Model Settings Using the CLI

The Data Designer CLI provides an interactive interface for creating and managing default model providers and model configurations stored in your Data Designer home directory (default: `~/.data-designer/`).

## Configuration Files

The CLI manages two YAML configuration files:

- **`model_providers.yaml`**: Model provider configurations
- **`model_configs.yaml`**: Model configurations

!!! info "Automatic Configuration"
    If these configuration files don't already exist, the Data Designer library automatically creates them with default settings the first time it is initialized.

!!! note "Custom Directory"
    You can customize the configuration directory location with the `DATA_DESIGNER_HOME` environment variable:

    ```bash
    export DATA_DESIGNER_HOME="/path/to/your/custom/directory"
    ```

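To check which of these files currently exist, you can resolve the home directory the way this page describes it: `DATA_DESIGNER_HOME` if set, otherwise `~/.data-designer`. The sketch below does that in bash. `dd_home` and `show_config_files` are hypothetical helpers, not part of the Data Designer CLI, and the sketch assumes an empty `DATA_DESIGNER_HOME` is treated the same as an unset one; `data-designer config list` remains the supported way to inspect the contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the Data Designer home directory.
# Uses DATA_DESIGNER_HOME if set (and non-empty), else the documented default.
dd_home() {
  printf '%s\n' "${DATA_DESIGNER_HOME:-$HOME/.data-designer}"
}

# Hypothetical helper: report whether each CLI-managed file exists yet.
show_config_files() {
  local home f
  home="$(dd_home)"
  for f in model_providers.yaml model_configs.yaml; do
    if [ -f "$home/$f" ]; then
      echo "found:   $home/$f"
    else
      echo "missing: $home/$f"
    fi
  done
}

show_config_files
```

A `missing:` line before first use is expected, since the library only creates the files when it is initialized.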
## CLI Commands

The Data Designer CLI provides four main configuration commands:

```bash
# Configure model providers
data-designer config providers

# Configure models
data-designer config models

# List current configurations
data-designer config list

# Reset all configurations
data-designer config reset
```

!!! tip "Getting Help"
    See available commands:

    ```bash
    data-designer --help
    ```

    See available sub-commands:

    ```bash
    data-designer config --help
    ```

## Managing Model Providers

Run the interactive provider configuration command:

```bash
data-designer config providers
```

### Available Operations

**Add a new provider**: Define a new provider by entering its name, endpoint URL, provider type, and, optionally, an API key (as plain text or as the name of an environment variable).

**Update an existing provider**: Modify an existing provider's settings. All fields are pre-filled with current values.

**Delete a provider**: Remove a provider and its associated model configurations.

**Delete all providers**: Remove all providers and their associated model configurations.

**Change default provider**: Set which provider is used by default. This option is only available when multiple providers are configured.

## Managing Model Configurations

Run the interactive model configuration command:

```bash
data-designer config models
```

!!! info "Provider Required"
    You need at least one provider configured before adding models. Run `data-designer config providers` first if none exist.

### Available Operations

**Add a new model configuration**: Create a new model configuration with the following fields:

- **Alias**: A unique name for referencing this model in a column configuration
- **Model ID**: The model identifier (e.g., `nvidia/nvidia-nemotron-nano-9b-v2`)
- **Provider**: Select from available providers (if multiple exist)
- **Temperature**: Sampling temperature (0.0 to 2.0)
- **Top P**: Nucleus sampling parameter (0.0 to 1.0)
- **Max Tokens**: Maximum output length in tokens (1 to 100000)

!!! note "Additional Settings"
    To configure additional inference parameters or use distribution-based inference parameters, edit the `model_configs.yaml` file directly.

**Update an existing model configuration**: Modify an existing model's configuration. All fields are pre-filled with current values.

**Delete a model configuration**: Remove a single model configuration.

**Delete all model configurations**: Remove all model configurations. The CLI will ask for confirmation before proceeding.

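When you edit `model_configs.yaml` by hand, as the Additional Settings note above suggests, the CLI's interactive prompts are not there to check your values. The bash sketch below is a hypothetical pre-check against the ranges listed for those prompts. `check_inference_params` is not part of Data Designer, the sketch assumes the ranges are inclusive, and the library may apply its own (possibly different) validation when it loads the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check values against the ranges shown by the CLI prompts
# (temperature 0.0-2.0, top_p 0.0-1.0, max_tokens 1-100000, assumed inclusive).
# Prints a message to stderr for each bad value; returns 0 if all are in range.
check_inference_params() {
  local temperature="$1" top_p="$2" max_tokens="$3"
  # awk handles the floating-point comparisons that plain [ ] cannot
  awk -v t="$temperature" -v p="$top_p" -v m="$max_tokens" 'BEGIN {
    num = "^([0-9]+[.]?[0-9]*|[.][0-9]+)$"   # non-negative decimal number
    ok = 1
    if (t !~ num || t + 0 > 2.0) { print "temperature must be in [0.0, 2.0]: " t > "/dev/stderr"; ok = 0 }
    if (p !~ num || p + 0 > 1.0) { print "top_p must be in [0.0, 1.0]: " p > "/dev/stderr"; ok = 0 }
    if (m !~ /^[0-9]+$/ || m + 0 < 1 || m + 0 > 100000) { print "max_tokens must be an integer in [1, 100000]: " m > "/dev/stderr"; ok = 0 }
    exit (ok ? 0 : 1)
  }'
}

check_inference_params 0.85 0.95 2048 && echo "parameters look valid"
```

The regular expression rejects negative and non-numeric input before the numeric comparison, so only the upper bounds need explicit checks for the two floating-point parameters.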
## Listing Configurations

View all current configurations:

```bash
data-designer config list
```

This command displays:

- **Model Providers**: All configured providers with their endpoints (API keys are masked)
- **Default Provider**: The currently selected default provider
- **Model Configurations**: All configured models with their settings

## Resetting Configurations

Delete all configuration files:

```bash
data-designer config reset
```

The CLI will show which configuration files exist and ask for confirmation before deleting them.

!!! danger "Destructive Operation"
    This command permanently deletes all configuration files, which resets your setup to the default model providers and configurations. You'll need to recreate any custom providers and model configurations from scratch.

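Because `data-designer config reset` cannot be undone, you may want to copy the current files somewhere safe first. The bash sketch below assumes the files live in `DATA_DESIGNER_HOME` or `~/.data-designer`, as described above. `backup_dd_configs` is a hypothetical helper, and its default destination (a timestamped directory outside the home directory) is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the CLI-managed YAML files to a backup directory
# before running `data-designer config reset`.
# Assumes the files live in $DATA_DESIGNER_HOME, or ~/.data-designer if unset.
backup_dd_configs() {
  local home="${DATA_DESIGNER_HOME:-$HOME/.data-designer}"
  # Default destination sits outside the home directory, with a timestamp
  local dest="${1:-$HOME/data-designer-config-backup-$(date +%Y%m%d-%H%M%S)}"
  local f
  local copied=0
  mkdir -p "$dest" || return 1
  for f in model_providers.yaml model_configs.yaml; do
    # Skip files that don't exist yet (e.g. before first initialization)
    if [ -f "$home/$f" ]; then
      cp -p "$home/$f" "$dest/" || return 1
      copied=$((copied + 1))
    fi
  done
  echo "backed up $copied file(s) to $dest"
}
```

Run `backup_dd_configs` (optionally passing a destination directory) before `data-designer config reset`. Copying the saved YAML files back into the home directory should restore your previous settings.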
## See Also

- **[Model Providers](model-providers.md)**: Learn about the `ModelProvider` class and provider configuration
- **[Model Configurations](model-configs.md)**: Learn about `ModelConfig` and `InferenceParameters`
- **[Default Model Settings](default-model-settings.md)**: Pre-configured providers and model settings included with Data Designer
- **[Quick Start Guide](../quick-start.md)**: Get started with a simple example
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Default Model Settings

Data Designer ships with pre-configured model providers and model configurations so you can start generating synthetic data without manual setup.

## Model Providers

Data Designer includes two default model providers that are configured automatically:

### NVIDIA Provider (`nvidia`)

- **Endpoint**: `https://integrate.api.nvidia.com/v1`
- **API Key**: Set via the `NVIDIA_API_KEY` environment variable
- **Models**: Access to NVIDIA's hosted models from [build.nvidia.com](https://build.nvidia.com)
- **Getting Started**: Sign up and get your API key at [build.nvidia.com](https://build.nvidia.com)

The NVIDIA provider gives you access to NVIDIA-hosted models, including the Nemotron family and other NVIDIA-optimized models.

### OpenAI Provider (`openai`)

- **Endpoint**: `https://api.openai.com/v1`
- **API Key**: Set via the `OPENAI_API_KEY` environment variable
- **Models**: Access to OpenAI's model catalog
- **Getting Started**: Get your API key from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)

The OpenAI provider gives you access to GPT models and other OpenAI offerings.

## Model Configurations

Data Designer provides pre-configured model aliases for common use cases. When you create a `DataDesignerConfigBuilder` without specifying `model_configs`, these default configurations are automatically available.

### NVIDIA Models

The following model configurations use the `nvidia` provider and can be used when `NVIDIA_API_KEY` is set:

| Alias | Model | Use Case | Temperature | Top P |
|-------|-------|----------|-------------|-------|
| `nvidia-text` | `nvidia/nvidia-nemotron-nano-9b-v2` | General text generation | 0.85 | 0.95 |
| `nvidia-reasoning` | `openai/gpt-oss-20b` | Reasoning and analysis tasks | 0.35 | 0.95 |
| `nvidia-vision` | `nvidia/nemotron-nano-12b-v2-vl` | Vision and image understanding | 0.85 | 0.95 |

### OpenAI Models

The following model configurations use the `openai` provider and can be used when `OPENAI_API_KEY` is set:

| Alias | Model | Use Case | Temperature | Top P |
|-------|-------|----------|-------------|-------|
| `openai-text` | `gpt-4.1` | General text generation | 0.85 | 0.95 |
| `openai-reasoning` | `gpt-5` | Reasoning and analysis tasks | 0.35 | 0.95 |
| `openai-vision` | `gpt-5` | Vision and image understanding | 0.85 | 0.95 |

### How Default Model Providers and Configurations Work

When the Data Designer library or the CLI is initialized, it writes the default model configurations and providers to the Data Designer home directory if they do not already exist, so they are easy to find and customize. These configuration files serve as the single source of truth for model settings. By default, they are saved to the following paths:

- **Model Configs**: `~/.data-designer/model_configs.yaml`
- **Model Providers**: `~/.data-designer/model_providers.yaml`

!!! tip "Tip"
    While these files provide a convenient way to specify the model providers and configurations you use most often, you can always set them programmatically in your synthetic data generation (SDG) workflow.

You can customize the home directory location by setting the `DATA_DESIGNER_HOME` environment variable:

```bash
# In your .bashrc, .zshrc, or similar
export DATA_DESIGNER_HOME="/path/to/your/custom/directory"
```

These configuration files can be modified in two ways:

1. **Using the CLI**: Run CLI commands to add, update, or delete model configurations and providers
2. **Manual editing**: Directly edit the YAML files with your preferred text editor

Both methods operate on the same files, ensuring consistency across your entire Data Designer setup.

## Important Notes

!!! warning "API Key Requirements"
    While default model configurations are always available, you need to set the appropriate API key environment variable (`NVIDIA_API_KEY` or `OPENAI_API_KEY`) to actually use the corresponding models for data generation. Without a valid API key, any attempt to generate data using that provider's models will fail.

!!! tip "Environment Variables"
    Store your API keys in environment variables rather than hardcoding them in your scripts:

    ```bash
    # In your .bashrc, .zshrc, or similar
    export NVIDIA_API_KEY="your-api-key-here"
    export OPENAI_API_KEY="your-openai-api-key-here"
    ```

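Before generating data, you can quickly see which of the default providers' keys are present in your current shell. The bash sketch below checks only the default variable names `NVIDIA_API_KEY` and `OPENAI_API_KEY` (adjust the list if you pointed a provider at a different variable through the CLI), relies on bash indirect expansion, and cannot tell whether a key is valid. `check_default_provider_keys` and `aliases_for` are hypothetical helpers; the alias lists are copied from the tables on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a default API key variable to the default aliases
# that use it (copied from the tables on this page).
aliases_for() {
  case "$1" in
    NVIDIA_API_KEY) echo "nvidia-text nvidia-reasoning nvidia-vision" ;;
    OPENAI_API_KEY) echo "openai-text openai-reasoning openai-vision" ;;
    *) echo "(unknown)" ;;
  esac
}

# Report whether each default API key variable is set (non-empty) in this shell.
# This does not check that the key itself is valid.
check_default_provider_keys() {
  local var
  for var in NVIDIA_API_KEY OPENAI_API_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var
    if [ -n "${!var}" ]; then
      echo "$var is set: $(aliases_for "$var") can be used"
    else
      echo "$var is NOT set: generation with $(aliases_for "$var") will fail"
    fi
  done
}

check_default_provider_keys
```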
## See Also

- **[Configure Model Settings With the CLI](configure-model-settings-with-the-cli.md)**: Learn how to use the CLI to manage model settings
- **[Quick Start Guide](../quick-start.md)**: Get started with a simple example
- **[Model Configuration Reference](../code_reference/config_builder.md)**: Detailed API documentation
- **[Column Configurations](../code_reference/column_configs.md)**: Learn about all column types
