LLM inference in Python

You can call Ollama from Python using the Ollama Python library.

We recommend using a uv-managed Python environment for development. Several of our Open OnDemand apps include built-in support for uv, Conda, and traditional Python venv virtual environments.

Example: Spyder and Ollama

To provide a concrete example of a Python AI inference workflow, this guide walks through the steps to use the Spyder IDE with an Ollama/Apptainer container.

Note

This example assumes that the llama3.2 model is available in your Ollama environment. If you use a different model, replace llama3.2 with your model name. See our Ollama page for instructions on installing models.
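One way to confirm that a model is available before running the example is to ask the Ollama server which models it has. The sketch below uses only the standard library and Ollama's `/api/tags` REST endpoint. It assumes the server is already running at `http://127.0.0.1:11434` (the address used later in this guide); the helper names `list_models` and `has_model` are hypothetical and are not part of any Ollama library.

```python
import json
import urllib.request

# Assumed address; matches the server address used elsewhere in this guide.
OLLAMA_URL = "http://127.0.0.1:11434"


def list_models(base_url=OLLAMA_URL, timeout=5.0):
    """Return the model names reported by the server's /api/tags endpoint."""
    # Ignore proxy environment variables: the server runs on the same node.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def has_model(names, wanted):
    """Return True if `wanted` is among `names`.

    Ollama treats a name without a tag as the ":latest" tag, so
    "llama3.2" is compared as "llama3.2:latest".
    """
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in names
```

With the server running, `has_model(list_models(), "llama3.2")` should be true if the model used in this example has been installed.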

Submit your interactive job

  1. Log in to Open OnDemand
  2. From the top navigation menu, select Interactive Apps → Spyder
  3. On the job submission form, select a Slurm account that includes access to GPU nodes
    • If you are not a member of a group with access to GPU nodes, enter backfill2 in the Slurm Account field and set Number of hours to no more than 4
    • Set GPUs to at least 1
    • Ensure "Internet Access via Web Proxy" is checked
    • Set the other values as needed
  4. When your job starts, you might see a black screen for up to 60 seconds, and then the Spyder IDE will load

Start an Ollama container in Apptainer

Return to the Open OnDemand tab showing your interactive session card, then follow the instructions to start an Ollama server in Apptainer.

Using the Python Ollama library

In the IPython console (the terminal pane in the lower-right corner of the Spyder IDE), run the following command to install the Python Ollama client library:

# In Spyder, prefix non-Python shell commands with the '!' character
!pip install ollama
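To confirm that the install succeeded before moving on, a short check like the following may help. It is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the only assumption is that the package is distributed under the name `ollama`, as in the pip command above.

```python
import importlib.metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None
```

Calling `installed_version("ollama")` in the console should return a version string once the library is installed in the active environment, and `None` otherwise.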

Once the Ollama library is installed, paste the following code into the main Python editor and click the run button in the Spyder IDE toolbar.

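from ollama import Client

# Connect to the Ollama server running in your Apptainer container
client = Client(host="http://127.0.0.1:11434")

# Send a single user message to the model and wait for the full reply
response = client.chat(
    model="llama3.2",
    messages=[
        {
            "role": "user",
            "content": "Why is the sky blue?",
        }
    ],
)

# Print only the text of the model's reply
print(response["message"]["content"])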

If the script runs successfully, the model's response appears after a few seconds in the console in the lower-right corner of the Spyder IDE.
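If you would rather see the reply as it is generated instead of waiting for the full text, the `chat` call also accepts `stream=True`, in which case it yields the reply in chunks. The following is a minimal sketch rather than a required step; the helper name `stream_chat` is hypothetical, and it expects to be passed a client created as in the example above.

```python
def stream_chat(client, model, prompt):
    """Print a reply piece by piece as it is generated and return the full text."""
    pieces = []
    stream = client.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask the server to send the reply incrementally
    )
    for chunk in stream:
        text = chunk["message"]["content"]
        print(text, end="", flush=True)
        pieces.append(text)
    print()  # finish the line once the stream ends
    return "".join(pieces)


# Example usage (requires the ollama library and a running server):
#   from ollama import Client
#   client = Client(host="http://127.0.0.1:11434")
#   stream_chat(client, "llama3.2", "Why is the sky blue?")
```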

Troubleshooting

If the Python script cannot connect to Ollama, make sure:

  • Your Ollama Apptainer container is still running.
  • The Ollama server is listening on http://127.0.0.1:11434.
  • You selected a GPU-capable Slurm account or requested GPU resources for your interactive job.
  • Internet access via web proxy was enabled if you need to install Python packages during the session.
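The first two checks above can be approximated from Python with a quick reachability test. This is a minimal sketch using only the standard library; the helper name `ollama_is_reachable` is hypothetical. It relies on an Ollama server answering a plain GET on its root URL with HTTP 200, and it bypasses proxy settings on the assumption that proxy environment variables may be set in your session.

```python
import http.client
import urllib.request


def ollama_is_reachable(base_url="http://127.0.0.1:11434", timeout=3.0):
    """Return True if an HTTP server answers at `base_url` with status 200."""
    # Ignore proxy environment variables so the request stays on this node.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

If `ollama_is_reachable()` returns `False`, the container has most likely stopped or the server is listening on a different address or port.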