LLM inference in Python

You can call Ollama from Python with the official Ollama Python library (the `ollama` package).

We recommend using a uv-managed Python environment for development. Several of our Open OnDemand apps have built-in support for uv, Conda, and standard Python venv virtual environments.
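To check which interpreter Spyder is using, and whether it runs inside a venv-style environment (such as one created by `uv venv` or `python -m venv`), you could run a short sketch like the following. It uses only the standard library. The helper name `in_virtual_env` is hypothetical, and Conda environments are not detected by this check.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when this interpreter runs inside a venv-style environment.

    uv and the standard-library venv module both create environments where
    sys.prefix points at the environment and sys.base_prefix points at the
    base Python installation. Conda environments do not follow this pattern,
    so this check reports False for them.
    """
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a venv/uv environment: {in_virtual_env()}")
```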

Example: Spyder and Ollama

To show a concrete Python AI inference workflow, this example walks through using the Spyder IDE with Ollama running in an Apptainer container.

Note

This example assumes that the llama3.2 model is available in your Ollama environment. If you use a different model, replace llama3.2 with your model name. See our Ollama page for instructions on installing models.
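One way to confirm the model is installed before running the example is sketched below. It assumes ollama-python 0.4 or later (where `Client.list()` returns entries with a `.model` attribute) and a server on the default port 11434; the helper names are illustrative. Ollama resolves a model name without a tag to `:latest`, so the sketch normalizes names before comparing them.

```python
from typing import List


def normalize_model_name(name: str) -> str:
    """Append Ollama's default ':latest' tag when a name has no tag."""
    # Only inspect the last path component so that a registry host:port
    # is not mistaken for a tag.
    last = name.rsplit("/", 1)[-1]
    return name if ":" in last else f"{name}:latest"


def model_available(installed: List[str], wanted: str) -> bool:
    """Return True if `wanted` (e.g. "llama3.2") matches an installed model."""
    target = normalize_model_name(wanted)
    return any(normalize_model_name(name) == target for name in installed)


def installed_models(host: str = "http://127.0.0.1:11434") -> List[str]:
    """Ask a running Ollama server which models it has (requires `ollama`)."""
    from ollama import Client  # lazy import so the helpers above work without it

    response = Client(host=host).list()
    # Assumes ollama-python >= 0.4, where each entry exposes a `.model` name.
    return [m.model for m in response.models if m.model]
```

Once the Ollama server is running, `model_available(installed_models(), "llama3.2")` should return True if the model is present.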

Submit your interactive job

1. Log in to Open OnDemand.
2. From the top navigation menu, select Interactive Apps → Spyder.
3. On the job submission form:
   - Select a Slurm account that includes access to GPU nodes. If you are not a member of a group with access to GPU nodes, enter backfill2 in the Slurm Account field and set Number of hours to no more than 4.
   - Set GPUs to at least 1.
   - Ensure "Internet Access via Web Proxy" is checked.
   - Set the other values as needed.
4. Click Launch to submit the job.
5. When your job starts, you might see a black screen for up to 60 seconds before the Spyder IDE loads.
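After Spyder loads, you can check from its console that the job received the resources requested on the form. This sketch only reads environment variables: `SLURM_JOB_ID` is set by Slurm inside a job, but whether `CUDA_VISIBLE_DEVICES` is populated depends on the cluster's GPU configuration (an assumption here), so treat an empty GPU list as a hint rather than proof. The function names are hypothetical.

```python
import os
import urllib.request
from typing import List, Optional


def parse_visible_devices(value: Optional[str]) -> List[str]:
    """Split a CUDA_VISIBLE_DEVICES-style string such as "0,1" into device IDs."""
    if not value:
        return []
    return [d.strip() for d in value.split(",") if d.strip()]


def session_report() -> dict:
    """Summarize what this interactive job can see."""
    return {
        # Set by Slurm inside a job allocation.
        "slurm_job_id": os.environ.get("SLURM_JOB_ID"),
        # Assumption: the cluster's Slurm GPU setup exports CUDA_VISIBLE_DEVICES.
        "gpus": parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES")),
        # Proxy settings taken from *_proxy environment variables, if any.
        "proxies": urllib.request.getproxies(),
    }


for key, value in session_report().items():
    print(f"{key}: {value}")
```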

Start an Ollama container in Apptainer

Return to the Open OnDemand tab showing your interactive session card, then follow the instructions to start an Ollama server in Apptainer.
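The Ollama server can take a moment to start. The sketch below polls Ollama's `/api/version` endpoint with the standard library until it responds or a timeout expires. It deliberately ignores proxy environment variables, on the assumption that the web-proxy setting from the job form could otherwise route requests for 127.0.0.1 through the proxy. The function names and timeout values are illustrative.

```python
import json
import time
import urllib.request
from typing import Optional

# An opener with an empty ProxyHandler ignores *_proxy environment variables,
# so requests to the local Ollama server go directly to 127.0.0.1.
_NO_PROXY_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ollama_version(host: str = "http://127.0.0.1:11434",
                   timeout: float = 2.0) -> Optional[str]:
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with _NO_PROXY_OPENER.open(f"{host}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and URLError/HTTPError;
        # ValueError covers a response body that is not valid JSON.
        return None


def wait_for_ollama(host: str = "http://127.0.0.1:11434",
                    timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the server until it answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if ollama_version(host) is not None:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_ollama()` returns True as soon as the server on port 11434 answers, or False after about a minute.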

Using the Python Ollama library

In the IPython console in the lower-right pane of the Spyder IDE, run the following command to install the Ollama Python client library:

# In Spyder, prefix non-Python shell commands with the '!' character
!pip install ollama
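To confirm that the package is visible to the Python interpreter that runs your code, you could check its installed version with the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "ollama") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("ollama package version:", installed_version() or "not installed")
```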

Once the Ollama library is installed, paste the following code into the main Python editor and click the run button in the Spyder IDE toolbar.

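from ollama import Client

# Connect to the Ollama server running in your Apptainer container
client = Client(host="http://127.0.0.1:11434")

# Send a single user message to the llama3.2 model
response = client.chat(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

# Print the text of the model's reply
print(response["message"]["content"])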

If it works, the model's response appears in the IPython console in the lower-right pane of the Spyder IDE after a few seconds.
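Because the reply only appears once generation finishes, you may prefer to print it as it is generated. The Ollama Python library supports this: passing `stream=True` to `chat()` yields partial responses instead of returning one. The sketch below wraps that in a hypothetical helper, `stream_chat`, which takes the client as a parameter so it can reuse the `client` from the example above.

```python
from typing import Any, Iterable


def stream_chat(client: Any, model: str, prompt: str) -> str:
    """Print a chat reply piece by piece as it arrives and return the full text.

    `client` is expected to be an ollama.Client; with stream=True,
    client.chat() yields partial responses instead of a single response.
    """
    chunks: Iterable = client.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)  # show each fragment immediately
        parts.append(piece)
    print()  # finish the line after the last fragment
    return "".join(parts)
```

For example, after running the code above, `stream_chat(client, "llama3.2", "Why is the sky blue?")` prints the answer incrementally and returns the complete text.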

Troubleshooting

If the Python script cannot connect to Ollama, make sure that:

- Your Ollama Apptainer container is still running.
- The Ollama server is listening on http://127.0.0.1:11434.
- You selected a Slurm account with GPU access and requested at least one GPU for your interactive job.
- "Internet Access via Web Proxy" was enabled, if you need to install Python packages during the session.
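To turn a failure into a hint from the checklist above, you could wrap the request as sketched below. This is an illustration, not part of the library. It assumes that recent versions of ollama-python raise Python's built-in `ConnectionError` when the server is unreachable (older versions may raise an `httpx` exception instead, which this sketch reports as unexpected) and raise `ollama.ResponseError` with `status_code` 404 when the model is missing. The helper names are hypothetical.

```python
from typing import Optional


def explain_error(exc: BaseException) -> str:
    """Map an exception from client.chat() to a likely cause."""
    status = getattr(exc, "status_code", None)  # set on ollama.ResponseError
    if isinstance(exc, ConnectionError):
        return ("Cannot reach the server: is the Ollama container running "
                "and listening on 127.0.0.1:11434?")
    if status == 404:
        return "Model not found: install it or change the model name."
    return f"Unexpected error: {exc}"


def safe_chat(prompt: str, model: str = "llama3.2",
              host: str = "http://127.0.0.1:11434") -> Optional[str]:
    """Run one chat request and print a hint instead of a traceback on failure."""
    from ollama import Client  # lazy import; requires `pip install ollama`

    try:
        response = Client(host=host).chat(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return response["message"]["content"]
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        print(explain_error(exc))
        return None
```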