LLM inference in Python
Python supports Ollama via the Ollama Python library.
We recommend using a Python uv environment for your development. Several of our Open OnDemand apps include built-in support for uv, Conda, or traditional Python venv virtual environments.
Example: Spyder and Ollama#
To provide a concrete example of a Python AI inference workflow, this guide walks through the steps to use the Spyder IDE with an Ollama/Apptainer container.
Note
This example assumes that the llama3.2 model is available in your Ollama environment. If you use a different model, replace llama3.2 with your model name. See our Ollama page for instructions on installing models.
Submit your interactive job#
- Log in to Open OnDemand
- From the top navigation menu, select Interactive Apps → Spyder:

- On the job submission form, select a Slurm account that includes access to GPU nodes
- If you are not a member of a group with access to GPU nodes, enter
backfill2in the Slurm Account field and set Number of hours to no more than 4 - Set GPUs to at least 1
- Ensure "Internet Access via Web Proxy" is checked
- Set the other values as needed
- If you are not a member of a group with access to GPU nodes, enter
- When your job starts, you might see a black screen for up to 60 seconds, and then the Spyder IDE will load
Start an Ollama container in Apptainer#
Return to the Open OnDemand tab showing your interactive session card, then follow the instructions to start an Ollama server in Apptainer.
Using the Python Ollama library#
In the terminal window in the lower-right corner of the Spyder IDE, run the following command to install the Python Ollama client library:
Once the Ollama library is installed, paste the following code into the main Python editor and click the run button in the Spyder IDE toolbar.
If it worked, you will see the chatbot response in the terminal in the lower right-hand corner of the Spyder IDE after a few seconds.
Troubleshooting#
If the Python script cannot connect to Ollama, make sure:
- Your Ollama Apptainer container is still running.
- The Ollama server is listening on
http://127.0.0.1:11434. - You selected a GPU-capable Slurm account or requested GPU resources for your interactive job.
- Internet access via web proxy was enabled if you need to install Python packages during the session.