Ollama
What is Ollama
Ollama is a software platform that allows users to run large language models (LLMs) locally on their own computers or servers. It simplifies the process of downloading, managing, and executing open-source LLMs, providing a command-line interface (CLI) and an API for interaction. This local deployment approach offers benefits like increased control over data, improved privacy, and potential cost savings compared to cloud-based LLM services.
How to use Ollama on our system
- Start an interactive session on a GPU node:
srun --partition=gpu --gres=gpu:1 --mem=32G --cpus-per-task=8 --time=3:00:00 --pty bash -i
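Once the job starts you land in a shell on the GPU node; you can check that a GPU was actually allocated with:
nvidia-smi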
- Once you connect to that session, choose a random port for the Ollama server to listen on; a random choice makes it unlikely to clash with other users' servers on the same node (replace 12345 below with your own):
export APPTAINERENV_OLLAMA_HOST=0.0.0.0:12345
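If you prefer not to pick a number by hand, one option is to let the shell generate one (a small sketch using shuf from coreutils; the 20000-40000 range is an arbitrary choice):
export APPTAINERENV_OLLAMA_HOST=0.0.0.0:$(shuf -i 20000-40000 -n 1)
echo $APPTAINERENV_OLLAMA_HOST  # note which port you ended up with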
- Set the path where Ollama models will be stored, preferably somewhere in scratch so the model files don't clog our backups:
module load ollama
export SCRATCH_FOLDER_PATH=<path to a scratch folder>
export OLLAMA_MODELS=${SCRATCH_FOLDER_PATH}/ollama-models
mkdir -p $OLLAMA_MODELS
- Run Ollama as a background process:
Note that the variable OLLAMA_SIF is set by the ollama module you loaded above.
apptainer run --nv $OLLAMA_SIF &
The server prints its startup logs to the terminal; press Enter a few times to get a clean prompt back.
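If you would rather keep the terminal clean, an alternative (a sketch; the log file name is an arbitrary choice) is to redirect the server's output to a file instead:
apptainer run --nv $OLLAMA_SIF > ollama-server.log 2>&1 &
You can then follow the logs with tail -f ollama-server.log and stop the server later with kill %1.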
- Run the desired model:
apptainer exec --nv $OLLAMA_SIF ollama run llama3.2
This should produce output like the following (if this is the first time you run this model, you will also see progress bars while it downloads):
llama_new_context_with_model: graph splits = 2
⠙ time=2025-05-05T14:49:04.971+10:00 level=INFO source=server.go:597 msg="llama runner started in 2.76 seconds"
[GIN] 2025/05/05 - 14:49:04 | 200 | 3.199581505s | 127.0.0.1 | POST "/api/generate"
>>> Hello there, how are you?
I'm just a language model, so I don't have emotions or feelings in the way that humans do. However, I'm functioning properly and ready to assist you with any questions or tasks
you may have! How can I help you today?[GIN] 2025/05/05 - 14:49:30 | 200 | 2.00997452s | 127.0.0.1 | POST "/api/chat"
>>>
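To leave the interactive prompt, type /bye (or press Ctrl+D). You can see which models you have downloaded so far with:
apptainer exec --nv $OLLAMA_SIF ollama list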
Note: the "--nv" flag enables GPU acceleration inside the Ollama container; without it, inference falls back to CPU and is very slow.
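The background server also exposes an HTTP API on the port you chose earlier, so you can query it non-interactively. A minimal sketch, assuming the port 12345 from the export above and the llama3.2 model pulled earlier:
curl http://localhost:12345/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
Setting "stream": false returns the whole response as a single JSON object instead of a stream of tokens.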