Adding basic runtime tests and gh workflow #93
Conversation
jakelorocco left a comment:
lgtm; a few suggestions
Also, is it possible to do a dry run of these actions somehow to make sure the tests run on the GitHub runners?
test/backends/test_huggingface.py
Outdated
```python
@pytest.fixture(scope="module")
def backend():
    """Shared HuggingFace backend for all tests in this module."""
    # TODO: find a smalle 1B model to do Alora stuff on github actions.
```
Can we remove this or open a separate issue to fix this?
test/conftest.py
Outdated
```python
# # Check if there is a session fixture.
# try:
#     session: MelleaSession = item._request.getfixturevalue("m_session")
# except Exception:
#     # Skip test cause all llm marked tests need a session fixture.
#     pytest.skip("`llm` marked tests requires a `m_session` fixture.")
# # Get the Ollama name.
# if isinstance(session.backend, OllamaModelBackend) or isinstance(session.backend, OpenAIBackend):
#     model_id = session.backend.model_id.ollama_name
#     # Skip tests of the model name is llama 1b
#     if model_id == "llama3.2:1b":
#         pytest.skip(
#             "Skipping LLM test: got model_id == llama3.2:1b in ollama. Used only in gh workflows."
#         )
# elif isinstance(session.backend, LocalHFBackend):
#     model_id = session.backend.model_id.hf_model_name
#     # Skip tests of the model name is llama 1b
#     if model_id == "unsloth/Llama-3.2-1B":
#         pytest.skip(
#             "Skipping LLM test: got model_id == unsloth/Llama-3.2-1B in hf. Used only in gh workflows."
#         )
```
Can we remove this commented out code?
```diff
 def test_session_with_parameters(model_id):
     """Test contextual session with custom parameters."""
-    with start_session(backend_name="ollama", model_id="granite3.3:8b") as m:
+    with start_session(backend_name="ollama", model_id=META_LLAMA_3_2_1B) as m:
```
Should this be using the model_id fixture?
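For illustration, a minimal sketch of what the suggestion might look like; the `start_session` import path and the placeholder body are assumptions, not the PR's actual code:

```python
from mellea import start_session  # import path assumed; match the module's real imports


def test_session_with_parameters(model_id):
    """Test contextual session with custom parameters."""
    # Use the shared model_id fixture instead of hard-coding META_LLAMA_3_2_1B.
    with start_session(backend_name="ollama", model_id=model_id) as m:
        assert m is not None  # placeholder assertion; the real test body stays unchanged
```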
pyproject.toml
Outdated
```toml
[tool.pytest.ini_options]
markers = [
    "llm: Marks the test as needing an exact output from an LLM (deselect with '-m \" not llm\"'); this depends on the session.backend.model_id"
]
```
I feel like some of the tests you've marked with this flag don't actually require an exact output. (I commented on the ones in huggingface that I feel shouldn't have been, but it looks like the same applies to the other backends.)
Also, I think the naming is a bit confusing here. Can we do something like qualitative or output_checked or something along those lines? Because some of the tests that still run do use LLMs.
Makes sense. I changed it to qualitative. Let me know if that works.
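As a side note, a sketch of how a renamed marker could be registered programmatically; the marker name "qualitative" follows the suggestion above, and the PR itself registers its marker in pyproject.toml, so this is illustrative only:

```python
# conftest.py -- illustrative sketch of registering the renamed marker in code
# rather than in pyproject.toml.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "qualitative: test asserts on an exact LLM output; deselect with -m 'not qualitative'",
    )
```

Either way, the tests can still be deselected with `pytest -m "not <marker>"`.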
test/backends/test_huggingface.py
Outdated
```python
    session.reset()


@pytest.mark.llm
```
This doesn't require the output to be a specific value.
test/backends/test_huggingface.py
Outdated
```python
    print(result)


@pytest.mark.llm
```
I think it might be fair to expect the alora to output only Y/N. But maybe it gets it wrong enough that we need to mark it this way.
test/backends/test_huggingface.py
Outdated
```python
    assert alora_output in ["Y", "N"], alora_output


@pytest.mark.llm
```
Same as above here. It should be forced to Y/N here.
test/backends/test_huggingface.py
Outdated
```python
    assert str(val_result.reason) not in ["Y", "N"]


@pytest.mark.llm
```
This doesn't require the output to be a specific value.
test/backends/test_huggingface.py
Outdated
```python
    )


@pytest.mark.llm
```
This doesn't require the output to be a specific value.
test/backends/test_huggingface.py
Outdated
```python
    assert len(results) == len(prompts)


@pytest.mark.llm
```
This doesn't require the output to be a specific value.
There isn't a way to do it on GitHub (or I haven't found it yet), but I have been using https://github.com/nektos/act to test it locally on a Docker machine. Some of it is failing right now because of a stupid typo in the env variables.
jakelorocco left a comment:
lgtm; I will approve but have a few comments:
- I think you are still applying the qualitative marker too broadly; I saw a fair amount of tests that generate output but don't check its content getting flagged with it
- I think the watsonx skip is fine for now, but we should come up with a more systematic way of skipping backends (see the sketch below for one possible shape)
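One possible shape for that, sketched as a pytest hook; the `backend` marker name and the `AVAILABLE_BACKENDS` env variable are purely illustrative and not part of this PR:

```python
# conftest.py -- hypothetical sketch of a systematic backend skip.
# Tests would declare the backend they need, e.g. @pytest.mark.backend("watsonx"),
# and the environment would export AVAILABLE_BACKENDS listing what it can reach.
import os

import pytest


def pytest_runtest_setup(item):
    available = set(
        os.environ.get("AVAILABLE_BACKENDS", "ollama,huggingface").split(",")
    )
    for mark in item.iter_markers(name="backend"):
        needed = mark.args[0] if mark.args else None
        if needed and needed not in available:
            pytest.skip(f"Backend {needed!r} is not available in this environment.")
```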
Description
This PR addresses #53 and is a follow-up to #67.
- Stochastic tests are tagged with the `llm` marker so that they are expected to fail during runtime when `GITHUB_ACTION == 1` is set as an env variable, which lets users run all stochastic models locally to run the full test suite.
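For readers unfamiliar with the mechanism, a minimal sketch of how such an env-gated expected failure can be wired up in conftest.py; the exact marker name and hook body in the PR may differ:

```python
# conftest.py -- illustrative sketch only, not necessarily the PR's exact code.
# When GITHUB_ACTION == 1 (i.e. running against the small CI models), tests
# carrying the marker are expected to fail instead of breaking the workflow;
# locally the full suite runs unmodified.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    if os.environ.get("GITHUB_ACTION") != "1":
        return  # local run: execute every test normally
    for item in items:
        if "llm" in item.keywords:  # marker name per the description above
            item.add_marker(
                pytest.mark.xfail(
                    reason="Exact-output LLM tests are expected to fail on CI's small models.",
                    strict=False,
                )
            )
```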