feat: Enhance sglang adapter and OpenAI API compatibility (#158)
Conversation
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✅
Summary of Changes: This pull request enhances the data processing pipeline by integrating SGLang compatibility through a new dataset preset and refining the `Harmonize` transform.
Activity
Code Review
This pull request introduces a new dataset preset for SGLang compatibility, llama3_8b_sglang, which leverages an enhanced Harmonize transform. The Harmonize transform now supports a 'plain' mode for direct tokenization and includes a robust guard to prevent overwriting pre-tokenized input. Additionally, optional fields in OpenAI chat completion types (refusal, usage, system_fingerprint) have been updated to include default None values, improving constructor flexibility. The changes are well-implemented and align with the stated objective of improving OpenAI API compatibility and refactoring the Harmonizer.
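The mode dispatch described above can be sketched as a self-contained row function. This is a hedged illustration, not the repo's actual code: `tokenize_row`, `encode_plain`, and `render_harmony` are hypothetical names standing in for the real tokenizer and Harmony renderer.

```python
def tokenize_row(row, mode, encode_plain, render_harmony,
                 prompt_column="prompt", tokenized_column="input_tokens"):
    """Sketch of the Harmonize mode switch: "plain" tokenizes the raw prompt
    directly; "harmony" renders a Harmony conversation first."""
    if mode not in {"harmony", "plain"}:
        raise ValueError(f"Invalid harmonize mode: {mode}")
    text = row[prompt_column]
    row[tokenized_column] = (
        encode_plain(text) if mode == "plain" else render_harmony(text)
    )
    return row
```

The real transform stores the result under `input_tokens`, which is the column the SGLang preset pre-populates.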
Pull request overview
Adds dataset preset and transform behavior needed to support SGLang usage alongside OpenAI-compatible flows, particularly for CNN/DailyMail, while improving OpenAI response type compatibility.
Changes:
- Make certain OpenAI response fields optional by providing `None` defaults in msgspec structs.
- Extend the `Harmonize` transform with a `mode` option (`harmony` vs `plain`) and add a row-level guard to avoid overwriting pre-generated `input_tokens` when row processors are fused.
- Add a CNN/DailyMail preset (`llama3_8b_sglang`) that formats a prompt and pre-tokenizes it for SGLang.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/inference_endpoint/openai/types.py` | Adds defaults for optional OpenAI response fields to improve (de)serialization compatibility. |
| `src/inference_endpoint/dataset_manager/transforms.py` | Adds `mode` to `Harmonize` and prevents overwriting pre-generated `input_tokens` in fused row-processing. |
| `src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py` | Introduces the `llama3_8b_sglang` preset to generate `input_tokens` directly (plain tokenization). |
if self.tokenized_column in row and row[self.tokenized_column] is not None:
    return row
Returning early when input_tokens is present skips populating harmonized_column even when it’s configured (non-None). If callers rely on the text harmonized prompt for debugging/logging, consider still computing harmonized_column from the existing tokens (without overwriting tokens), or update the docstring/behavior to make it explicit that the column may not be produced when tokens are pre-generated.
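One way to realize this suggestion, sketched as a standalone function with illustrative names; `to_text` stands in for the transform's detokenizer, and the normal tokenization path is elided:

```python
def guard_pretokenized(row, to_text, tokenized_column="input_tokens",
                       harmonized_column="harmonized_prompt"):
    """If tokens are pre-generated, keep them untouched but still derive the
    text column from them when it is configured and missing."""
    if row.get(tokenized_column) is not None:
        if harmonized_column is not None and row.get(harmonized_column) is None:
            row[harmonized_column] = to_text(row[tokenized_column])
        return row
    # Normal tokenization path would run here (elided in this sketch).
    raise NotImplementedError
```

This keeps the don't-overwrite guarantee while still emitting the text column for debugging/logging.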
@arekay-nv @nvzhihanj please take a look when you get the chance.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/inference_endpoint/dataset_manager/transforms.py:162
- In `mode="plain"`, `Harmonize` still instantiates `Harmonizer`, which unconditionally loads an OpenAI Harmony encoding (`harmony.load_harmony_encoding(...)`) and builds a Harmony system message. Since plain mode only needs `tokenizer.encode()`, consider loading the Harmony encoding lazily (e.g., only create `Harmonizer` when `mode == "harmony"`; for plain mode, just load/cache the HF tokenizer and encode directly) to avoid unnecessary startup cost and a dependency on Harmony assets.
self.mode = mode
if self.mode not in {"harmony", "plain"}:
raise ValueError(f"Invalid harmonize mode: {self.mode}")
self.harmonizer = Harmonizer(
tokenizer_name=tokenizer_name,
encoding_name=encoding_name,
reasoning_effort=reasoning_effort,
conversation_start_date=conversation_start_date,
)
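A minimal sketch of the lazy approach suggested above. The names are hypothetical, and the factory is injected purely to keep the sketch self-contained; the real code would construct `Harmonizer` or an HF `AutoTokenizer` inside the property.

```python
class LazyHarmonize:
    """Defer Harmony setup until first use, so plain mode never pays for it."""

    def __init__(self, tokenizer_name, mode="harmony", harmonizer_factory=None):
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode}")
        self.tokenizer_name = tokenizer_name
        self.mode = mode
        self._factory = harmonizer_factory
        self._harmonizer = None

    @property
    def harmonizer(self):
        # Built on first access only; "plain" rows never trigger this.
        if self._harmonizer is None:
            self._harmonizer = self._factory(self.tokenizer_name)
        return self._harmonizer
```

With this shape, constructing the transform in plain mode never touches Harmony assets, which addresses the startup-cost concern.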
@attafosu does this need the llama3-8b example to change? Can you also push those changes so the PR can be functionally verified? It would also be nice to add tests for this.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
@@ -145,10 +146,14 @@ def __init__(
        tokenized_column: The name of the column containing the tokenized prompt.
        harmonized_column: The name of the column containing the harmonized prompt. If None,
            the harmonized prompt will not be stored as text.
        mode: "harmony" to render a Harmony conversation; "plain" to tokenize the raw prompt.
        """
        self.prompt_column = prompt_column
        self.tokenized_column = tokenized_column
        self.harmonized_column = harmonized_column
        self.mode = mode
        if self.mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {self.mode}")
# Start sglang endpoint
# --served-model-name is needed if `model-path` here differs from `model` in the client config
docker exec -u root -w /workspace sglang-cpu-server /bin/bash -lc "python3 -m sglang.launch_server \
  --model-path $MODEL_NAME \
  --served-model-name meta-llama/Llama-3.1-8B-Instruct \
  --dtype bfloat16"
* Handle case with string response
  Handles the case where the response is a single string, not a list; needed for the AMD submission, which wasn't calculating TPOT without the fix.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
…159)
* drop orjson
* update
* trigger CI
* updates
* ci fix
* optimize msgspec implementation
* precommit
* drop old perf test
* add sglang tests
* updates
Address pending comment in #162.
* Refactored Request Building: refactored the build_request method to leverage these new pre-calculated prefixes, simplifying the logic and reducing string concatenations for common request paths.
* Prefix Rebuilding Logic: introduced a new private method _rebuild_prefixes to manage the construction and update of these prefixes, ensuring they are correctly updated when cached headers change.
* docs: add AGENTS.md with AI coding guidelines, restructure CLAUDE.md
  Move repo guidelines from CLAUDE.md into AGENTS.md so they are tool-agnostic and usable by any AI coding agent. CLAUDE.md now contains only an @AGENTS.md include directive. AGENTS.md covers architecture, code organization, development standards, and adds two new sections: a policy requiring AGENTS.md updates alongside significant refactors, and a catalog of common AI coding pitfalls specific to this codebase.
* docs: fix AGENTS.md formatting for prettier compliance
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
* Add preset dataset unit tests and documentation
  - Add test_dataset_presets.py with 20 test cases for 6 presets across 5 datasets
  - Add comprehensive testing guide and schema reference documentation
  Tests verify that transforms work correctly without end-to-end runs, enabling fast regression detection when transform code changes.
* Cleanup local directory
* Sanitize documentation
* Cleanup
* Decorate slow tests
* Update DATASET_SCHEMA_REFERENCE.md
* Cleanup
* Remove redundant dataset schema
* Add fixtures to simplify unit tests
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
assert llama3_8b_sglang_transformed is not None
assert "prompt" in llama3_8b_sglang_transformed.columns

The SGLang preset test only asserts that the prompt column exists, but this preset's key output is input_tokens (and harmonized_column=None). To prevent regressions in the new adapter/preset interaction, assert that input_tokens exists and contains a non-empty list of ints (and optionally that harmonized_prompt is absent).

Suggested change:
assert llama3_8b_sglang_transformed is not None
# SGLang preset should still provide a prompt column
assert "prompt" in llama3_8b_sglang_transformed.columns
# Key output for SGLang preset is tokenized input
assert "input_tokens" in llama3_8b_sglang_transformed.columns
input_tokens = llama3_8b_sglang_transformed["input_tokens"].iloc[0]
assert isinstance(input_tokens, list)
assert len(input_tokens) > 0
assert all(isinstance(token, int) for token in input_tokens)
# harmonized_column is expected to be None for this preset
assert "harmonized_prompt" not in llama3_8b_sglang_transformed.columns
assert "(C)" in prompt
assert "(D)" in prompt
# Should instruct to express answer as option letter
assert "A" in prompt or "option" in prompt.lower()

This assertion is effectively always true because the prompt always contains "(A)" in the choices, so the test can pass even if the instruction to answer with an option letter is removed. Tighten the check to look for the actual instruction text (e.g., "Express your final answer" / "corresponding option") rather than the letter "A".

Suggested change:
lower_prompt = prompt.lower()
assert "express your final answer" in lower_prompt or "corresponding option" in lower_prompt
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
/gemini summary
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
self.mode = mode
if self.mode not in {"harmony", "plain"}:
    raise ValueError(f"Invalid harmonize mode: {self.mode}")
self.harmonizer = Harmonizer(
    tokenizer_name=tokenizer_name,
    encoding_name=encoding_name,

In mode="plain", process_row() only needs a HuggingFace tokenizer, but __init__ still constructs a full Harmonizer, which eagerly loads the Harmony encoding and builds the system message. This adds unnecessary overhead and dependencies for the plain-tokenization path. Consider making Harmonizer lazily load the encoding/system message only when mode == "harmony", or use AutoTokenizer directly in plain mode.
@arekay-nv Added an example for this change in the llama3-8b README. Also added unit tests that cover the preset datasets.
Harmonize(
    tokenizer_name=tokenizer_name,
    prompt_column="prompt",
    tokenized_column="input_tokens",
    harmonized_column=None,
    mode="plain",
),

Can you clarify why we need Harmonize with plain mode here for a Llama model? Harmonization only works with the gpt-oss models as far as I know, so using a Harmonize transform here is a bit confusing.
So here we just want to use the Harmonizer to generate the tokenized inputs (the input_tokens needed by the SGLang API). The plain mode is introduced to ensure no chat templates or other processing are applied to the input prompt, as would otherwise happen in "harmony" mode (src/inference_endpoint/dataset_manager/transforms.py::process_row() -> src/inference_endpoint/openai/harmony.py::harmony()).
I could also add a new transform, say a Tokenizer transform, to do just that (generating tokenized inputs), but I wanted to refactor existing implementations wherever possible. If that sounds more straightforward, I can leave the Harmonizer as is and add a tokenizing transform instead.
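The alternative floated here, a dedicated tokenizing transform, might look like the following. This is a sketch under assumed names: `encode` is an injected callable, whereas in the real repo it would be an HF tokenizer's encode method.

```python
class Tokenize:
    """Row transform that only generates tokenized inputs, with the same
    don't-overwrite guard the Harmonize transform uses."""

    def __init__(self, encode, prompt_column="prompt",
                 tokenized_column="input_tokens"):
        self.encode = encode
        self.prompt_column = prompt_column
        self.tokenized_column = tokenized_column

    def process_row(self, row):
        # Respect pre-generated tokens, mirroring the Harmonize guard.
        if row.get(self.tokenized_column) is None:
            row[self.tokenized_column] = self.encode(row[self.prompt_column])
        return row
```

A preset would then use `Tokenize` instead of `Harmonize(mode="plain")`, which avoids the naming confusion raised above for non-gpt-oss models.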
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
# Exclude slow tests (Harmonize transform requires transformers)
pytest tests/unit/dataset_manager/test_dataset_presets.py -m "not slow" -v

The note says slow tests are excluded because Harmonize "requires transformers", but transformers is already a core dependency in this repo; the main reason to mark these slow is that they can trigger tokenizer/model downloads and are network-dependent. Consider rewording to reflect that.
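If the `slow` marker is not already registered, a conftest sketch like this documents the actual rationale (possible downloads / network access); the wording is hypothetical, not taken from the repo:

```python
# conftest.py (sketch): register the `slow` marker so `-m "not slow"` is
# well-defined and its rationale is documented where pytest reports it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "slow: may download tokenizer assets or require network access",
    )
```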
| Dataset | Presets | Tests |
| ------------- | ------------------------------- | ----- |
| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 6 |

The CNNDailyMail row lists 6 tests, but tests/unit/dataset_manager/test_dataset_presets.py currently defines 5 tests for CNNDailyMail (3 for llama3_8b and 2 for llama3_8b_sglang). Please update the count to match the actual test file.

Suggested change:
| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 5 |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
src/inference_endpoint/dataset_manager/transforms.py:199
- When `mode="plain"`, `harmonized_column` (defaulting to `harmonized_prompt`) is still populated via `to_text()` from the raw tokens, which is not actually a Harmony-formatted prompt. To avoid semantic confusion, consider either (a) only writing `harmonized_column` when `mode == "harmony"`, or (b) documenting the column's semantics for plain mode (and/or defaulting `harmonized_column=None` when `mode="plain"`).
if self.mode == "plain":
tokens = self.harmonizer.to_tokens(row[self.prompt_column])
row[self.tokenized_column] = tokens
else:
row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])
if self.harmonized_column is not None:
row[self.harmonized_column] = self.harmonizer.to_text(
row[self.tokenized_column]
)
# Guard pre-tokenized rows: the SGLang adapter adds a default Harmonize
# (GPT-OSS tokenizer + harmony mode). When row processors are fused, the
# dataframe-level skip is bypassed, so without this guard, the adapter's
# Harmonize would overwrite input tokens. Alternative: remove Harmonize
# from the adapter transforms and require each SGLang preset to add its
# own Harmonize with the desired tokenizer/args.
if self.tokenized_column in row and row[self.tokenized_column] is not None:
    return row
if self.mode == "plain":
    tokens = self.harmonizer.to_tokens(row[self.prompt_column])
    row[self.tokenized_column] = tokens
else:
    row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])
if self.harmonized_column is not None:
In process_row, the early return when input_tokens is already present also skips populating harmonized_column (when configured). This makes it impossible to keep preset-provided tokens while still emitting harmonized_prompt text. Consider skipping only the tokenization step (avoid overwriting input_tokens), but still fill harmonized_column if it’s set and missing (or validate it matches the existing tokens).
| Dataset | Presets | Tests |
| ------------- | ------------------------------- | ----- |
| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 6 |
| AIME25 | `gptoss` | 3 |
| GPQA | `gptoss` | 3 |
| LiveCodeBench | `gptoss` | 3 |
| OpenOrca | `llama2_70b` | 3 |
This table’s CNNDailyMail test count appears incorrect. tests/unit/dataset_manager/test_dataset_presets.py currently defines 5 tests under TestCNNDailyMailPresets (3 regular + 2 @pytest.mark.slow), not 6. Please update the count (or remove the numeric column) so the doc stays accurate.
# Exclude slow tests (Harmonize transform requires transformers)
pytest tests/unit/dataset_manager/test_dataset_presets.py -m "not slow" -v
The note “Exclude slow tests (Harmonize transform requires transformers)” is a bit misleading since transformers is already a core dependency here; the main reason these tests are slow is typically tokenizer/model downloads and external network access. Consider rewording to reflect that the slow marker is about heavyweight downloads / network dependency.
assert len(llama3_8b_transformed["prompt"][0]) > 0

def test_llama3_8b_prompt_format(self, llama3_8b_transformed, sample_cnn_data):
    """Test that llama3_8b produces properly formatted prompts."""
    prompt = llama3_8b_transformed["prompt"][0]
    assert "Summarize" in prompt
    assert "news article" in prompt
    assert "article" in sample_cnn_data.columns
    # The original article should be embedded in the prompt
    assert sample_cnn_data["article"][0] in prompt

These tests access DataFrame cells via df["col"][0], which is label-based and can break if the index isn't 0 (or if transforms preserve a non-default index). Prefer .iloc[0] for positional access (and apply it consistently throughout this test file).

Suggested change:
assert len(llama3_8b_transformed["prompt"].iloc[0]) > 0

def test_llama3_8b_prompt_format(self, llama3_8b_transformed, sample_cnn_data):
    """Test that llama3_8b produces properly formatted prompts."""
    prompt = llama3_8b_transformed["prompt"].iloc[0]
    assert "Summarize" in prompt
    assert "news article" in prompt
    assert "article" in sample_cnn_data.columns
    # The original article should be embedded in the prompt
    assert sample_cnn_data["article"].iloc[0] in prompt
assert "(C)" in prompt
assert "(D)" in prompt
# Should instruct to express answer as option letter
assert "A" in prompt or "option" in prompt.lower()

This assertion is redundant: earlier in the same test you already assert "(A)" in prompt, which implies "A" in prompt will always be true. Consider replacing this with a more specific check for the instruction text (e.g., that the prompt asks for an option letter), or drop it to avoid a non-signal assertion.

Suggested change:
assert "option" in prompt.lower()
What does this PR do?
- `Harmonizer` transform

Type of change

Related issues

Testing

Checklist