
Enable Jupyter kernels for batch execution #61

Merged
bgruening merged 11 commits into bgruening:master from TuturBaba:patch-5
Apr 18, 2025

Conversation

@TuturBaba
Contributor

Hello,
I am a master's student doing my internship with @yvanlebras . I need to create a workflow on Galaxy and noticed an issue with the latest Jupyter tool in Galaxy. Since it cannot be used in batch mode, I looked for a simple way to fix it. From what I implemented, it seems to work. I used pip install inside Conda and, most importantly, ipykernel, which allows Jupyter to correctly recognize the kernel paths and manually name them.

@bgruening
Owner

Thanks a lot @TuturBaba.

Is your hypothesis that the conda package of ipykernel is broken? Or just too old?

Dockerfile Outdated
'r-tidyverse' \
'unixodbc' && \
conda create -n ansible-kernel --yes && \
conda run -n ansible-kernel pip install ipykernel bioblend galaxy-ie-helpers && \
Owner

Why do you install the ipykernel kernel in the ansible env?

Contributor Author

I just noticed that there are quite a few mistakes and that I didn't fully understand what I had done, so I'm reviewing my work and will get back to it later.
The main issue is with the metadata, as shown in the screenshot. Since I focused on R and Python, I saw that I could modify the metadata using ipykernel. Without thinking it through, I applied the same method to the other languages, but this only changed the kernel name while the language remained Python.
So I need to find a solution to ensure that when running jupyter kernelspec list, the different kernels are correctly configured. At the moment, with the current setup, only the kernels that come with the base image work: python3 and julia-1.10.
[Screenshot from 2025-03-25 14-39-05: jupyter kernelspec list output]

Contributor Author

And I forgot to mention: even the kernels that come with the base image don't work in batch mode.

Owner

Isn't that a problem https://github.com/anaconda/nb_conda_kernels tries to solve?

Contributor Author

Maybe, but I haven't found a complete solution with it; for now, I used this approach instead.

@TuturBaba
Contributor Author

Hello,
I have modified my Dockerfile, and I think I understand how it works, so I should be able to answer your questions.
However, I have one question: What is better, a single long RUN command or multiple RUN commands? I know that using multiple RUN commands can make the Dockerfile easier to manage since you don’t have to rebuild everything from scratch if something changes. But in terms of optimization, what is the best approach?

@bgruening
Owner

One RUN command is recommended, or you need to use a multi-stage Docker build. This is all to reduce the size of the final image.
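
As an illustration of the single-RUN pattern (the package and env names here are only examples, not the actual Dockerfile contents), the create/install/clean steps are chained so the cleanup happens in the same layer:

```dockerfile
# Sketch only: chaining everything in one RUN produces a single layer,
# so files deleted by the final clean step never bloat the final image.
RUN conda create -n rlang-kernel --yes 'r-irkernel' && \
    conda run -n rlang-kernel R -e "IRkernel::installspec(user = TRUE)" && \
    conda clean --all --yes
```

With separate RUN instructions, each one becomes its own layer, so a later conda clean cannot shrink layers created earlier.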

Dockerfile Outdated
'r-tidymodels' \
'r-tidyverse' \
'unixodbc' && \
conda run -n rlang-kernel R -e 'IRkernel::installspec(user = TRUE)' && \
Owner

You can also add here something like (name = 'rlang-kernel', displayname = 'R')

Owner

and I guess for the other kernels as well

Contributor Author

Yes, you can: conda run -n rlang-kernel R -e "IRkernel::installspec(user = TRUE, name = 'rlang-kernel', displayname = 'R')"
But it's also possible without specifying a name: conda run -n python-kernel-3.12 python -m ipykernel install --user

@TuturBaba
Contributor Author

I have a question: the Jupyter tool version 1.0.0 does not use Conda envs, so I was wondering why you chose to create environments. I did a quick test, and it seems to work fine (without any complications).

Before this change, there was an issue with the Dockerfile when trying to install R libraries using commands like system("conda install r-terra"). While the libraries were technically installed, they were not placed in the correct location. I confirmed this by checking with .libPaths().

After discovering the problem, I searched for a solution and found that adding a .Rprofile file with the correct paths makes the installation work as expected. This modification sets CONDA_PREFIX and updates the PATH in the .Rprofile to ensure R libraries are installed in the appropriate environment.

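
A minimal sketch of such a .Rprofile (the env path is an assumption for illustration, not necessarily the exact one used in this PR):

```r
# Hypothetical ~/.Rprofile: point R at the kernel's conda env so that
# conda-installed libraries end up on the library search path.
conda_prefix <- "/opt/conda/envs/rlang-kernel"   # assumed env location
Sys.setenv(
  CONDA_PREFIX = conda_prefix,
  PATH = paste(file.path(conda_prefix, "bin"), Sys.getenv("PATH"), sep = ":")
)
# Prepend the env's R library directory so .libPaths() resolves correctly
.libPaths(c(file.path(conda_prefix, "lib", "R", "library"), .libPaths()))
```
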
@bgruening
Owner

Because people wanted to install new packages for R, Python, C++, whatever. The only package manager that enables all this and does not waste resources on compiling is conda (in our community).

@TuturBaba
Contributor Author

I understand the benefit of using Conda to manage dependencies, but why create a new environment (conda create) instead of installing packages directly in the base environment with conda install? Is there a specific advantage to isolating installations?

PS: By the way, I made a commit that fixes my issue, so I think this version is working fine.

@bgruening
Owner

If you mix many different languages and many different packages, my assumption was that installing new packages over time gets more complicated and brittle. Maybe I'm wrong and this is over-engineered. Is everything working for you now? Can you maybe outline how you tested this locally and add it to the README?


@bgruening bgruening left a comment

Will merge in the next days if no one else has comments and release a new version.

@yvanlebras

Thank you Björn! I think the PR is not ready to merge yet: the issue for R is solved, but not for Python. In batch mode, when a conda package is installed, it works fine, but when the code tries to use the installed conda package, Arthur gets an error because the PATH is not the right one. Arthur is looking at it more deeply and will try to apply the best patch he can, if possible for all envs.

@TuturBaba
Contributor Author

Hello Bjorn,
I have a script that seems correct, and I’m currently testing it to see if it works as expected.
I wanted to know if you have any test scripts available on your side, especially for Ansible and Octave, as I don’t have any knowledge of how they work.

@bgruening
Owner

Ok, just checking: there are no new commits; is the current version working for you?

@TuturBaba
Contributor Author

Hello, I'm going to explain my work during the testing phase. First, I use Conda packages for the different languages. I use the python -m command to access certain package options. When I use this method to install the kernel, I don't need nb_conda_kernels.

So I can remove nb_conda_kernels, which allows the base-image kernels (Julia and Python) to get the correct path and metadata again. That's why I didn't create a new Python environment, but I can create one if needed. However, as with R, the path won't be correct; I think I can define a solution to fix that if you want.
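
For context, what python -m ipykernel install --user effectively does is write a small kernel.json under the user's Jupyter kernels directory; Jupyter then launches whatever interpreter that spec's argv points at. A stdlib-only sketch of that spec (all paths and names here are illustrative, not the actual ones from this PR):

```python
import json
import os
import tempfile

def write_kernelspec(kernels_dir, name, display_name, python_path):
    """Write a minimal kernel.json, mimicking what
    `python -m ipykernel install --user --name <name>` produces."""
    spec_dir = os.path.join(kernels_dir, name)
    os.makedirs(spec_dir, exist_ok=True)
    spec = {
        # argv must point at the env's own interpreter so batch execution
        # picks up that env's packages, not the base image's python
        "argv": [python_path, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    with open(os.path.join(spec_dir, "kernel.json"), "w") as f:
        json.dump(spec, f, indent=1)
    return spec

# Example: register a hypothetical conda env's interpreter
kernels = tempfile.mkdtemp()
spec = write_kernelspec(kernels, "python-kernel-3.12", "Python 3.12",
                        "/opt/conda/envs/python-kernel-3.12/bin/python")
print(spec["argv"][0])  # the env-specific interpreter Jupyter will launch
```

If the "language" field or argv interpreter is wrong, the kernel shows up under jupyter kernelspec list but executes with the wrong runtime, which matches the metadata problem discussed above.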

To test my Dockerfile, I use three scripts, which you can find on my GitHub: https://github.com/TuturBaba/Antarctic/tree/main/script_test_dockerfile . I only test R, Python, and Bash because I don't have experience with the other languages. I used basic commands, like doing math operations, writing to a file, and installing packages.
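
The kinds of checks described (math operations, writing to a file) boil down to notebook cells like the following. This is an illustrative sketch, not the actual scripts from the linked repo; the package-installation check is omitted since it needs conda:

```python
import os
import tempfile

result = 6 * 7                       # a basic math operation
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w") as f:           # write the result to a file
    f.write(str(result))
with open(path) as f:                # read it back to confirm the write
    content = f.read()
print(content)  # → 42
```

If cells like these run successfully in batch mode under each kernel, the kernel paths are wired up correctly.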

To test this, I simply added the following lines at the end of my Dockerfile:
ADD ./R.ipynb /import/R.ipynb
RUN chmod -R 777 /import
There was a problem with "proj": the path after download was not the right one. Because of the way we manage R, we may run into similar problems with other packages.
@yvanlebras

Hi Arthur, Björn. Seems to me this is OK to merge, isn't it? For links, related issues are usegalaxy-eu/galaxy#244 and usegalaxy-eu/galaxy#266.

@bgruening bgruening merged commit d924eed into bgruening:master Apr 18, 2025
2 checks passed
@bgruening
Owner

I will create a new release and a new Galaxy tool version. Thanks everyone!

@bgruening
Owner

Ok, it is deployed: https://usegalaxy.eu/root?tool_id=interactive_tool_jupyter_notebook (make sure to use the latest 1.0.2 version).

@bgruening
Owner

@TuturBaba it seems this has broken some other functionality. Can you try to create a new env with some new packages? I think it's not possible to switch into this new env anymore via the UI.

[screenshot]

Can you maybe create a small bash script that tests your use case, independent of Galaxy, using just this container and a notebook? That would help me make sure your use case is working while still not breaking the venv selector.

@TuturBaba
Contributor Author

@bgruening I didn’t fully understand what you want to do.
On my side, I tried to create a kernel directly in the Galaxy Jupyter tool, and I was able to create a new Python kernel with these commands (in terminal):

conda create -n test-kernel python=3.10 ipykernel -y
conda activate test-kernel
python -m ipykernel install --user --name test-kernel --display-name "Python (test)"

Also, running without Galaxy works for me because I use the container directly on my computer.
But if you need something specific, please give more details.
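
For the standalone bash test Björn asked for, a rough sketch might look like this. The image name, notebook name, and kernel name are all assumptions, and it requires Docker plus nbconvert inside the image, so it is an outline rather than a verified script:

```shell
#!/usr/bin/env bash
set -euo pipefail

IMAGE="jupyter-container"   # assumed: the image built from this Dockerfile
NB="test.ipynb"             # assumed: a notebook that uses the new kernel

# Execute the notebook headless inside the container; nbconvert exits
# non-zero if any cell raises, so `set -e` makes the script fail too.
docker run --rm -v "$PWD:/import" "$IMAGE" \
  jupyter nbconvert --to notebook --execute \
  --ExecutePreprocessor.kernel_name=test-kernel \
  --output executed.ipynb "/import/$NB"

echo "batch execution OK"
```

Running this against a notebook per kernel would exercise the batch path without Galaxy, while the venv selector can still be checked interactively in the UI.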

