
Conversation

@avinash2692 (Contributor)

Description

This PR addresses #53 and is a follow-up to #67.

  • The PR marks stochastic tests with an llm marker so that they are expected to fail at runtime.
  • This only happens when the GITHUB_ACTION == 1 environment variable is set, so users running locally still execute the full test suite, including the stochastic tests (see the sketch below this list).
  • Also adds a GitHub Action that runs the tests once the quality checks pass.
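
As a rough illustration only (not the PR's actual hook; the environment-variable name follows the description above and is corrected later in the thread), the marking mechanism could look something like this in conftest.py:

import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Only downgrade stochastic tests on the CI runner; local runs execute everything.
    if os.environ.get("GITHUB_ACTION") != "1":
        return
    for item in items:
        if item.get_closest_marker("llm"):
            # Tests asserting on stochastic LLM output are allowed to fail on CI.
            item.add_marker(pytest.mark.xfail(reason="stochastic LLM output on CI runner"))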

@jakelorocco (Contributor) left a comment


lgtm; a few suggestions

Also, is it possible to do a dry run of these actions somehow to make sure the tests run on the GitHub runners?

@pytest.fixture(scope="module")
def backend():
    """Shared HuggingFace backend for all tests in this module."""
    # TODO: find a small 1B model to do Alora stuff on github actions.

Can we remove this or open a separate issue to fix this?

test/conftest.py Outdated
Comment on lines 28 to 48
# # Check if there is a session fixture.
# try:
#     session: MelleaSession = item._request.getfixturevalue("m_session")
# except Exception:
#     # Skip test cause all llm marked tests need a session fixture.
#     pytest.skip("`llm` marked tests requires a `m_session` fixture.")
# # Get the Ollama name.
# if isinstance(session.backend, OllamaModelBackend) or isinstance(session.backend, OpenAIBackend):
#     model_id = session.backend.model_id.ollama_name
#     # Skip tests if the model name is llama 1b
#     if model_id == "llama3.2:1b":
#         pytest.skip(
#             "Skipping LLM test: got model_id == llama3.2:1b in ollama. Used only in gh workflows."
#         )
# elif isinstance(session.backend, LocalHFBackend):
#     model_id = session.backend.model_id.hf_model_name
#     # Skip tests if the model name is llama 1b
#     if model_id == "unsloth/Llama-3.2-1B":
#         pytest.skip(
#             "Skipping LLM test: got model_id == unsloth/Llama-3.2-1B in hf. Used only in gh workflows."
#         )

Can we remove this commented-out code?

def test_session_with_parameters(model_id):
    """Test contextual session with custom parameters."""
    with start_session(backend_name="ollama", model_id="granite3.3:8b") as m:
    with start_session(backend_name="ollama", model_id=META_LLAMA_3_2_1B) as m:

Should this be using the model_id fixture?
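
For reference, the change this question points at would look roughly like the following sketch (names are taken from the quoted snippet; the test body is elided):

def test_session_with_parameters(model_id):
    """Test contextual session with custom parameters."""
    # Use the shared model_id fixture instead of a hard-coded model name.
    with start_session(backend_name="ollama", model_id=model_id) as m:
        ...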

pyproject.toml Outdated

[tool.pytest.ini_options]
markers = [
    "llm: Marks the test as needing an exact output from an LLM (deselect with '-m \" not llm\"'); this depends on the session.backend.model_id"

I feel like some of the tests you've marked with this flag don't actually require an exact output. (I commented on the ones in huggingface that I feel shouldn't have been, but it looks like the same applies to the other backends.)

Also, I think the naming is a bit confusing here. Can we do something like qualitative or output_checked or something along those lines? Because some of the tests that still run use LLMs.

@avinash2692 (Contributor, Author)

Makes sense. I changed it to quantitative. Let me know if that works.
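
As an aside, a marker renamed along these lines could also be registered from conftest.py rather than pyproject.toml; a minimal sketch (the marker name here is only illustrative):

def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "qualitative: test asserts on the exact content of an LLM response "
        "(deselect with -m 'not qualitative')",
    )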

session.reset()


@pytest.mark.llm

This doesn't require the output to be a specific value.

print(result)


@pytest.mark.llm

I think it might be fair to expect the alora to output only Y/N. But maybe it gets it wrong enough that we need to mark it this way.

assert alora_output in ["Y", "N"], alora_output


@pytest.mark.llm

Same as above here. It should be forced to Y/N here.

assert str(val_result.reason) not in ["Y", "N"]


@pytest.mark.llm

This doesn't require the output to be a specific value.

)


@pytest.mark.llm

This doesn't require the output to be a specific value.

assert len(results) == len(prompts)


@pytest.mark.llm

This doesn't require the output to be a specific value.

@avinash2692 (Contributor, Author)

> lgtm; a few suggestions
>
> Also, is it possible to do a dry run of these actions somehow to make sure the tests run on the GitHub runners?

There isn't a way to do it on GitHub (or I haven't found it yet), but I have been using https://github.com/nektos/act to test it locally on a Docker machine. Some of it is failing now because of a stupid typo in the env variables (GITHUB_ACTIONS --> GITHUB_ACTION) 🤦, but I will fix that and your suggestions in a commit in a bit.

@jakelorocco (Contributor) left a comment


lgtm; I will approve but have a few comments:

  1. I think you are still applying the qualitative marker too broadly; I saw a fair number of tests that generate output but don't check its content getting flagged with it
  2. I think the watsonx skip is fine for now, but we should come up with a more systematic way of skipping backends

@avinash2692 (Contributor, Author)

> lgtm; I will approve but have a few comments:
>
> 1. I think you are still applying the qualitative marker too broadly; I saw a fair number of tests that generate output but don't check its content getting flagged with it
>
> 2. I think the watsonx skip is fine for now, but we should come up with a more systematic way of skipping backends
  1. Agree. I think this is something we might need to address in a larger PR on how we separate our tests into inference vs. non-inference.
  2. Agree. I think this is probably a job for a DummyBackend at some point (a rough sketch of one possible shape follows below).
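
Purely as a sketch of what a more systematic skip could look like (the backend marker and the SKIP_BACKENDS variable below are hypothetical, not something this PR adds):

import os

import pytest

SKIPPED_BACKENDS = {b for b in os.environ.get("SKIP_BACKENDS", "").split(",") if b}


def pytest_collection_modifyitems(config, items):
    for item in items:
        marker = item.get_closest_marker("backend")  # hypothetical per-backend marker
        if marker and marker.args and marker.args[0] in SKIPPED_BACKENDS:
            item.add_marker(
                pytest.mark.skip(reason=f"{marker.args[0]} backend disabled via SKIP_BACKENDS")
            )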

@avinash2692 merged commit 03d93b4 into main on Aug 26, 2025
3 checks passed
@avinash2692 deleted the avi/marking-test-as-stochastic branch on August 26, 2025 at 18:03