diff --git a/docs/assets/perf_stats.png b/docs/assets/perf_stats.png
new file mode 100644
index 00000000..010e5eee
Binary files /dev/null and b/docs/assets/perf_stats.png differ
diff --git a/docs/assets/perf_summary.png b/docs/assets/perf_summary.png
new file mode 100644
index 00000000..cf456fb2
Binary files /dev/null and b/docs/assets/perf_summary.png differ
diff --git a/docs/assets/request_data.png b/docs/assets/request_data.png
new file mode 100644
index 00000000..d8d9a51d
Binary files /dev/null and b/docs/assets/request_data.png differ
diff --git a/docs/assets/tokens_data.png b/docs/assets/tokens_data.png
new file mode 100644
index 00000000..ab669594
Binary files /dev/null and b/docs/assets/tokens_data.png differ
diff --git a/docs/guides/cli.md b/docs/guides/cli.md
index d30962bd..cf3c3f85 100644
--- a/docs/guides/cli.md
+++ b/docs/guides/cli.md
@@ -1 +1,154 @@
-# Coming Soon
+
+# GuideLLM CLI User Guide
+
+For more details on setup and installation, see the Setup and [Installation](https://apps.neuralmagic.com/GuideLLM/README.MD/#Installation) sections.
+
+## GuideLLM Quickstart
+
+To get started with GuideLLM, check out the [GuideLLM README](https://github.com/neuralmagic/guidellm/blob/main/README.md#getting-started).
+
+## GuideLLM CLI Details
+
+**GuideLLM** is a powerful tool for evaluating and optimizing the deployment of large language models (LLMs). The CLI provides a large set of input arguments that give you fine-grained control over every aspect of the workload you want to run.
+
+### Input Metrics
+The input arguments are split into three sections:
+
+- **Workload Overview**
+- **Workload Data**
+- **Workload Type**
+
+Once you fill out these arguments and run the command, GuideLLM runs the simulated workload. Note that the maximum run time can be set with `--max-seconds`, but the actual duration also depends on the hardware and model.
+
+### Workload Overview
+
+This section of input parameters covers what to actually benchmark, including the target host location, model, and task. The full list of arguments and their definitions is presented below:
+
+- **--target** (str): The target path or URL for the backend to evaluate. Ex: 'http://localhost:8000/v1'. [required]
+
+ - optional breakdown args if target isn't specified:
+
+ - **--host** (str): The host URL for benchmarking.
+
+ - **--port** (str): The port available for benchmarking.
+
+- **--backend** [openai_server]: The backend to use for benchmarking. The default is OpenAI Server, enabling compatibility with any server that follows the OpenAI spec, including vLLM.
+
+- **--model** (str): The model to use for benchmarking. If not provided, the first available model will be used, assuming the backend supports listing models.
+
+- **--task** (str): The task to use for benchmarking.
+
+- **--output-path** (str): The output path to save the report to for later loading. Ex: guidance_report.json. The default is None, meaning no output file is saved and results are only printed to the console.
+
+
+### Workload Data
+
+This section of input parameters covers the data arguments that need to be supplied, such as a reference to the dataset and tokenizer. The list of arguments and their definitions is presented below:
+
+- **--data** (str): The data source to use for benchmarking. Depending on the data-type, it should be a path to a data file containing prompts to run (ex: data.txt), a HuggingFace dataset name (ex: 'neuralmagic/LLM_compression_calibration'), or a configuration for emulated data (ex: 'prompt_tokens=128,generated_tokens=128'). [required]
+
+- **--data-type** [emulated, file, transformers]: The type of data to use for benchmarking. Use 'emulated' for synthetic data, 'file' for a file, or 'transformers' for a HuggingFace dataset. Specify the data source with the --data flag. [required]
+
+- **--tokenizer** (str): The tokenizer to use for calculating the number of prompt tokens. This should match the tokenizer used by the model. By default, the --model flag is used to determine the tokenizer. If not provided and the model is not available, an error will be raised. Ex: 'neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8'
+
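+For example, a minimal benchmark combining the workload overview and workload data arguments above could look like the following sketch (the server address assumes a local OpenAI-compatible deployment, and the model name is illustrative):
+
+```bash
+guidellm \
+  --target "http://localhost:8000/v1" \
+  --model "neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8" \
+  --data-type emulated \
+  --data "prompt_tokens=128,generated_tokens=128" \
+  --output-path "guidance_report.json"
+```
+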
+### Workload Type
+
+This section of input parameters covers the type of workload to run, representing the load you expect on your server in production, such as the rate and pattern of requests. The full list of arguments and their definitions is presented below:
+
+- **--rate-type** [sweep|synchronous|throughput|constant|poisson]: The type of request rate to use for benchmarking. Use sweep (the default) to run a full range from synchronous to throughput, synchronous to send requests one after the other, throughput to send requests as fast as possible, constant for a fixed request rate, or poisson for a real-world variable request rate.
+
+- **--rate** (float): The request rate to use for constant and poisson rate types. To run with multiple specific rates, provide the flag multiple times. Ex: --rate 1 --rate 2 --rate 5
+
+- **--max-seconds** (integer): The maximum number of seconds for each benchmark run. Either max-seconds, max-requests, or both must be set. The default is 120 seconds. Note that this is the maximum time for each rate supplied, not the total time. This value should be large enough to allow the server's performance to stabilize.
+
+- **--max-requests** (integer): The maximum number of requests for each benchmark run. Either max-seconds, max-requests, or both must be set. Note that this is the maximum number of requests for each rate supplied, not the total number of requests. This value should be large enough to allow the server's performance to stabilize.
+
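+For example, the sketch below benchmarks fixed request rates of 1, 2, and 5 requests per second for up to three minutes each (the server address and data configuration are illustrative):
+
+```bash
+guidellm \
+  --target "http://localhost:8000/v1" \
+  --data-type emulated \
+  --data "prompt_tokens=128,generated_tokens=128" \
+  --rate-type constant \
+  --rate 1 --rate 2 --rate 5 \
+  --max-seconds 180
+```
+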
+### Output Metrics via GuideLLM Benchmarks Report
+
+Once your GuideLLM run is complete, the output metrics are displayed in the terminal as a GuideLLM Benchmarks Report with the following four sections:
+
+- **Requests Data by Benchmark**
+- **Tokens Data by Benchmark**
+- **Performance Stats by Benchmark**
+- **Performance Summary by Benchmark**
+
+The GuideLLM Benchmarks Report surfaces key LLM metrics to help you determine the health and performance of your inference server. You can use the numbers generated by the GuideLLM Benchmarks Report to make decisions around server request processing, Service Level Objective (SLO) success/failure for your task, general model performance, and hardware impact.
+
+### Requests Data by Benchmark
+
+This section shows the request statistics for the benchmarks that were run. Request Data statistics highlight the details of the requests hitting the inference server. Viewing this information is essential to understanding how well your server processes the requests sent by GuideLLM, and it can surface potential software and hardware issues in your inference serving pipeline.
+
+![Requests Data by Benchmark](../assets/request_data.png)
+
+This table includes:
+- **Benchmark:** Synchronous or Asynchronous@X req/sec
+- **Requests Completed:** the number of successful requests handled
+- **Requests Failed:** the number of failed requests
+- **Duration (sec):** the time taken to run the specific benchmark, determined by max_seconds
+- **Start Time (HH:MI:SS):** local timestamp the GuideLLM benchmark started
+- **End Time (HH:MI:SS):** local timestamp the GuideLLM benchmark ended
+
+
+### Tokens Data by Benchmark
+This section shows the prompt and output token distribution statistics for the benchmarks that were run. Token Data statistics highlight the details of your dataset in terms of prompts and generated outputs from the model. Viewing this information is integral to understanding model performance on your task and to ensuring you can hit the SLOs required to guarantee a good user experience in your application.
+
+![Tokens Data by Benchmark](../assets/tokens_data.png)
+
+This table includes:
+- **Benchmark:** Synchronous or Asynchronous@X req/sec
+- **Prompt (token length):** the average length of prompt tokens
+- **Prompt (1%, 5%, 50%, 95%, 99%):** Distribution of prompt token length
+- **Output (token length):** the average length of output tokens
+- **Output (1%, 5%, 50%, 95%, 99%):** Distribution of output token length
+
+### Performance Stats by Benchmark
+This section shows the LLM performance statistics for the benchmarks that were run. Performance Statistics highlight the performance of the model across the key LLM performance metrics: Request Latency, Time to First Token (TTFT), Inter Token Latency (ITL, also known as TPOT), and Output Token Throughput. Viewing these key metrics is integral to ensuring your inference server meets the performance your task requires on your designated hardware.
+
+![Performance Stats by Benchmark](../assets/perf_stats.png)
+
+This table includes:
+- **Benchmark:** Synchronous or Asynchronous@X req/sec
+- **Request Latency [1%, 5%, 10%, 50%, 90%, 95%, 99%] (sec)**: the time it takes from submitting a query to receiving the full response, including the performance of your queueing/batching mechanisms and network latencies
+- **Time to First Token [1%, 5%, 10%, 50%, 90%, 95%, 99%] (ms)**: the time it takes from submitting the query to receiving the first token (if the response is not empty); often abbreviated as TTFT
+- **Inter Token Latency [1%, 5%, 10%, 50%, 90%, 95%, 99%] (ms)**: the time between consecutive tokens and is also known as time per output token (TPOT)
+- **Output Token Throughput (tokens/sec)**: the total output tokens per second throughput, accounting for all the requests happening simultaneously
+
+
+### Performance Summary by Benchmark
+This section shows the averages of the LLM performance statistics for the benchmarks that were run. The average Performance Statistics provide an overall summary of model performance across the key LLM performance metrics. Viewing these summary metrics is integral to ensuring your inference server meets the performance your task requires on your designated hardware.
+
+![Performance Summary by Benchmark](../assets/perf_summary.png)
+
+This table includes:
+- **Benchmark:** Synchronous or Asynchronous@X req/sec
+- **Request Latency (sec)**: the average time it takes from submitting a query to receiving the full response, including the performance of your queueing/batching mechanisms and network latencies
+- **Time to First Token (ms)**: the average time it takes from submitting the query to receiving the first token (if the response is not empty); often abbreviated as TTFT
+- **Inter Token Latency (ms)**: the average time between consecutive tokens and is also known as time per output token (TPOT)
+- **Output Token Throughput (tokens/sec)**: the total average output tokens per second throughput, accounting for all the requests happening simultaneously
+
+
+## Report a Bug
+
+To report a bug, file an issue on [GitHub Issues](https://github.com/neuralmagic/guidellm/issues).
+
+
+
diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md
index d30962bd..a6b8ac87 100644
--- a/docs/guides/configuration.md
+++ b/docs/guides/configuration.md
@@ -1 +1,260 @@
-# Coming Soon
+
+# GuideLLM Environment Variable Configuration
+
+Advanced users can set up a configuration file containing environment variables to tailor GuideLLM benchmark runs and reports. These environment variables control various settings in the application, including logging, dataset preferences, emulated data usage, OpenAI API configuration, and report generation options. Setting these variables in a configuration file or directly in the environment tailors the application's behavior to different environments or use cases.
+
+## GuideLLM Environment Variable Configuration Details
+
+You can set environment variables directly in your `.env` file or in your environment (e.g., via shell commands) to configure the application as needed. They are split into the following categories:
+1. General Environment Variables
+2. Logging Settings
+3. Dataset Settings
+4. Emulated Data Settings
+5. OpenAI Settings
+6. Report Generation Settings
+
+### 1. **General Environment Variables**
+
+- **`GUIDELLM__ENV`**:
+ Sets the application's operating environment. It can be one of `local`, `dev`, `staging`, or `prod`. This controls which set of configuration defaults are used (e.g., URLs for reports, log levels, etc.).
+
+- **`GUIDELLM__REQUEST_TIMEOUT`**:
+ Controls the request timeout duration for the application in seconds. The default is 30 seconds.
+
+- **`GUIDELLM__MAX_CONCURRENCY`**:
+ Determines the maximum number of concurrent processes or requests the application can handle. The default is 512.
+
+- **`GUIDELLM__NUM_SWEEP_PROFILES`**:
+ Sets the number of sweep profiles to use. The default is 9.
+
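+For example, a `.env` file setting only the general variables above might look like this (the values are illustrative):
+
+```bash
+# .env
+GUIDELLM__ENV=prod
+GUIDELLM__REQUEST_TIMEOUT=60
+GUIDELLM__MAX_CONCURRENCY=256
+GUIDELLM__NUM_SWEEP_PROFILES=9
+```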
+
+### 2. **Logging Settings**
+
+- **`GUIDELLM__LOGGING__DISABLED`**:
+ Enables or disables logging for the application. If set to `true`, logging is disabled. Default is `false`.
+
+- **`GUIDELLM__LOGGING__CLEAR_LOGGERS`**:
+ If `true`, existing loggers are cleared when the application starts. Default is `true`.
+
+- **`GUIDELLM__LOGGING__CONSOLE_LOG_LEVEL`**:
+ Sets the logging level for console output (e.g., `INFO`, `WARNING`, `ERROR`). Default is `WARNING`.
+
+- **`GUIDELLM__LOGGING__LOG_FILE`**:
+ Specifies the file path to write logs. If not set, logs are not written to a file.
+
+- **`GUIDELLM__LOGGING__LOG_FILE_LEVEL`**:
+ Sets the logging level for the log file if `LOG_FILE` is specified.
+
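+For example, to log INFO and above to the console and DEBUG and above to a file (the values are illustrative):
+
+```bash
+# .env
+GUIDELLM__LOGGING__CONSOLE_LOG_LEVEL=INFO
+GUIDELLM__LOGGING__LOG_FILE=guidellm.log
+GUIDELLM__LOGGING__LOG_FILE_LEVEL=DEBUG
+```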
+
+### 3. **Dataset Settings**
+
+- **`GUIDELLM__DATASET__PREFERRED_DATA_COLUMNS`**:
+ A list of preferred column names to use from datasets. This is useful when working with varied datasets that may have different column names for similar data types.
+
+- **`GUIDELLM__DATASET__PREFERRED_DATA_SPLITS`**:
+ A list of preferred dataset splits (e.g., `train`, `test`, `validation`) that the application uses.
+
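+Because these settings hold lists, `pydantic-settings` generally expects JSON-encoded values in the environment; the column and split names below are illustrative:
+
+```bash
+# .env
+GUIDELLM__DATASET__PREFERRED_DATA_COLUMNS='["prompt", "text", "instruction"]'
+GUIDELLM__DATASET__PREFERRED_DATA_SPLITS='["test", "validation", "train"]'
+```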
+
+### 4. **Emulated Data Settings**
+
+- **`GUIDELLM__EMULATED_DATA__SOURCE`**:
+ URL or path to the source of the emulated data. This is used when running the application with mock data.
+
+- **`GUIDELLM__EMULATED_DATA__FILTER_START`**:
+ A string that marks the start of the text to be used from the `SOURCE`.
+
+- **`GUIDELLM__EMULATED_DATA__FILTER_END`**:
+ A string that marks the end of the text to be used from the `SOURCE`.
+
+- **`GUIDELLM__EMULATED_DATA__CLEAN_TEXT_ARGS`**:
+ A dictionary of boolean settings to control the text cleaning options, such as fixing encoding, removing empty lines, etc.
+
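+A sketch of an emulated data configuration follows; the source URL, filter markers, and clean-text keys are illustrative placeholders, and the exact keys accepted may differ by GuideLLM version:
+
+```bash
+# .env
+GUIDELLM__EMULATED_DATA__SOURCE=https://example.com/corpus.txt
+GUIDELLM__EMULATED_DATA__FILTER_START="CHAPTER I"
+GUIDELLM__EMULATED_DATA__FILTER_END="END OF TEXT"
+GUIDELLM__EMULATED_DATA__CLEAN_TEXT_ARGS='{"fix_encoding": true, "remove_empty_lines": true}'
+```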
+
+### 5. **OpenAI Settings**
+
+- **`GUIDELLM__OPENAI__API_KEY`**:
+ The API key for authenticating requests to OpenAI's API.
+
+- **`GUIDELLM__OPENAI__BASE_URL`**:
+ The base URL for the OpenAI server or a compatible server. Default is `http://localhost:8000/v1`.
+
+- **`GUIDELLM__OPENAI__MAX_GEN_TOKENS`**:
+ The maximum number of tokens that can be generated in a single API request. Default is 4096.
+
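+For example (the API key is a placeholder; the other values match the documented defaults):
+
+```bash
+# .env
+GUIDELLM__OPENAI__API_KEY=your_openai_api_key
+GUIDELLM__OPENAI__BASE_URL=http://localhost:8000/v1
+GUIDELLM__OPENAI__MAX_GEN_TOKENS=4096
+```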
+
+### 6. **Report Generation Settings**
+
+- **`GUIDELLM__REPORT_GENERATION__SOURCE`**:
+ The source path or URL from which the report will be generated. If not set, defaults are used based on the environment (`local`, `dev`, `staging`, `prod`).
+
+- **`GUIDELLM__REPORT_GENERATION__REPORT_HTML_MATCH`**:
+ The placeholder string that will be matched in the HTML report to insert data. Default is `"window.report_data = {};"`.
+
+- **`GUIDELLM__REPORT_GENERATION__REPORT_HTML_PLACEHOLDER`**:
+ Placeholder format to be replaced with the generated report data.
+
+## Environment Variable Usage
+
+To use these environment variables for different goals, set them in an `.env` file or directly in your environment and then access them in your code through the `Settings` class. Here's how you can leverage these settings:
+
+### 1. **Setting Up the Environment Variables**
+
+You can set environment variables directly in your `.env` file or in your environment (e.g., via shell commands) to configure the application as needed. For example:
+
+
+```bash
+# In your .env file or shell
+export GUIDELLM__ENV=dev
+export GUIDELLM__LOGGING__DISABLED=true
+export GUIDELLM__OPENAI__API_KEY=your_openai_api_key
+export GUIDELLM__REPORT_GENERATION__SOURCE="https://example.com/report"
+
+```
+
+These settings will be loaded by the `Settings` class from the `.env` file or environment when the application starts.
+
+### 2. **Accessing Environment Variables in Code**
+
+The `Settings` class in the code is powered by `pydantic` and `pydantic_settings`, making it easy to access environment variables in your application code.
+
+For example:
+``` python
+# Import the shared settings instance (import path may vary by GuideLLM version)
+from guidellm.config import settings
+
+# Access settings
+current_settings = settings
+
+# Print the current environment
+print(f"Current Environment: {current_settings.env}")
+
+# Check if logging is disabled
+if current_settings.logging.disabled:
+ print("Logging is disabled.")
+
+# Access OpenAI API key
+openai_api_key = current_settings.openai.api_key
+print(f"Using OpenAI API Key: {openai_api_key}")
+
+# Generate a report using the source URL
+report_source = current_settings.report_generation.source
+print(f"Generating report from source: {report_source}")
+```
+
+### 3. **Customize Environment Variables for Your Goals**
+
+You can utilize the settings for various goals in your code as follows:
+
+#### Goal 1: **Customizing Logging Behavior**
+
+By setting `GUIDELLM__LOGGING__DISABLED`, `GUIDELLM__LOGGING__CLEAR_LOGGERS`, `GUIDELLM__LOGGING__CONSOLE_LOG_LEVEL`, and other logging-related settings, you can control how logging behaves:
+
+
+```python
+import logging
+
+# Assumes a module-level logger; adjust to match your application's logging setup
+logger = logging.getLogger(__name__)
+
+if current_settings.logging.disabled:
+ # Disable logging in your application
+ logger.disabled = True
+else:
+ # Set logging levels
+ logger.setLevel(current_settings.logging.console_log_level)
+ # Optionally clear existing loggers
+ if current_settings.logging.clear_loggers:
+ logging.root.handlers.clear()
+
+ # Log to a file if specified
+ if current_settings.logging.log_file:
+ file_handler = logging.FileHandler(current_settings.logging.log_file)
+ file_handler.setLevel(current_settings.logging.log_file_level or "WARNING")
+ logger.addHandler(file_handler)
+```
+
+
+#### Goal 2: **Configuring Dataset Preferences**
+
+If you want to control how your application processes datasets, you can customize dataset-related settings:
+
+``` python
+# `dataset`, `data`, and `process_data_split` are placeholders for your own data pipeline
+preferred_columns = current_settings.dataset.preferred_data_columns
+preferred_splits = current_settings.dataset.preferred_data_splits
+
+# Use preferred columns to filter dataset
+filtered_data = dataset.filter(columns=preferred_columns)
+# Use preferred splits to process only the required data splits
+for split in preferred_splits:
+ process_data_split(data[split])
+```
+
+#### Goal 3: **Using Emulated Data for Testing**
+
+To use emulated data for testing, you can adjust the `GUIDELLM__EMULATED_DATA__SOURCE` and related settings:
+
+``` python
+# `apply_filters` and `clean_text` are placeholder helpers; this sketch assumes SOURCE is a local file path
+emulated_data_source = current_settings.emulated_data.source
+# Read data from the emulated source
+with open(emulated_data_source, "r") as f:
+ data = f.read()
+
+# Apply filters and cleaning based on settings
+filtered_data = apply_filters(data, start=current_settings.emulated_data.filter_start, end=current_settings.emulated_data.filter_end)
+cleaned_data = clean_text(filtered_data, **current_settings.emulated_data.clean_text_args)
+```
+
+#### Goal 4: **Configuring OpenAI API Requests**
+To make API requests to OpenAI or a compatible server, use `GUIDELLM__OPENAI__API_KEY`, `GUIDELLM__OPENAI__BASE_URL`, and other OpenAI-related settings:
+
+``` python
+import requests
+
+headers = {
+ "Authorization": f"Bearer {current_settings.openai.api_key}"
+}
+# Append the completions endpoint to the base URL (e.g., http://localhost:8000/v1/completions)
+url = f"{current_settings.openai.base_url}/completions"
+
+response = requests.post(url, headers=headers, json={
+    "model": "your-model-name",  # placeholder: the model served by your backend
+    "prompt": "Translate this to French: 'Hello, world!'",
+    "max_tokens": current_settings.openai.max_gen_tokens
+})
+
+print(response.json())
+```
+
+#### Goal 5: **Generating Reports Dynamically**
+
+You can control report generation behavior based on environment settings:
+
+``` python
+if not current_settings.report_generation.source:
+ # Use the default report source based on environment
+ report_url = ENV_REPORT_MAPPING[current_settings.env]
+else:
+ report_url = current_settings.report_generation.source
+
+# Fetch or generate the report
+generate_report_from_source(report_url)
+```
+
+### 4. **Reloading Settings Dynamically**
+
+To dynamically reload settings based on changes in the environment or `.env` file, you can use the `reload_settings` function:
+
+``` python
+# Import path may vary by GuideLLM version
+from guidellm.config import reload_settings, settings
+
+# Reload settings when changes are detected
+reload_settings()
+
+# Re-access updated settings
+new_settings = settings
+print(f"Updated Environment: {new_settings.env}")
+```
+
+
+### 5. **Generating `.env` Files Programmatically**
+
+You can generate a `.env` file programmatically using the `generate_env_file` method:
+
+``` python
+env_file_content = settings.generate_env_file()
+
+with open(".env", "w") as env_file:
+ env_file.write(env_file_content)
+
+print("Generated .env file with current settings.")
+```
+
+## Conclusion
+By configuring and accessing these environment variables in your code, you can effectively manage application settings for various use cases such as development, testing, and production, while dynamically adjusting application behavior without needing to hard-code values directly.
diff --git a/docs/guides/examples.md b/docs/guides/examples.md
new file mode 100644
index 00000000..c0752fba
--- /dev/null
+++ b/docs/guides/examples.md
@@ -0,0 +1,2 @@
+# GuideLLM Examples
+