Ollama
What is Ollama
Ollama is a software platform that allows users to run large language models (LLMs) locally on their own computers or servers. It simplifies the process of downloading, managing, and executing open-source LLMs, providing a command-line interface (CLI) and an API for interaction. This local deployment approach offers benefits like increased control over data, improved privacy, and potential cost savings compared to cloud-based LLM services.
How to use Ollama on our system
- Start an interactive session on a GPU node:
srun --partition=gpu --gres=gpu:1 --mem=32G --cpus-per-task=8 --time=3:00:00 --pty bash -i
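Once the job starts you land in a shell on the GPU node; you can check that a GPU was actually allocated with:
nvidia-smi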
- Once you connect to that session, choose a random port for the Ollama server to listen on; a random choice makes it unlikely to clash with other users' servers on the same node (replace 12345 below with your own):
export APPTAINERENV_OLLAMA_HOST=0.0.0.0:12345
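If you prefer not to pick a number by hand, one option is to let the shell generate one (a small sketch using shuf from coreutils; the 20000-40000 range is an arbitrary choice):
export APPTAINERENV_OLLAMA_HOST=0.0.0.0:$(shuf -i 20000-40000 -n 1)
echo $APPTAINERENV_OLLAMA_HOST  # note which port you ended up with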
- Set the path where Ollama models will be stored, preferably somewhere in scratch so the model files don't clog our backups:
module load ollama
export SCRATCH_FOLDER_PATH=<path to a scratch folder>
export OLLAMA_MODELS=${SCRATCH_FOLDER_PATH}/ollama-models
mkdir -p $OLLAMA_MODELS
- Run Ollama as a background process:
Note that the variable OLLAMA_SIF is set by the ollama module you loaded above.
apptainer run --nv $OLLAMA_SIF &
The server prints its startup logs to the terminal; press Enter a few times to get a clean prompt back.
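If you would rather keep the terminal clean, an alternative (a sketch; the log file name is an arbitrary choice) is to redirect the server's output to a file instead:
apptainer run --nv $OLLAMA_SIF > ollama-server.log 2>&1 &
You can then follow the logs with tail -f ollama-server.log and stop the server later with kill %1.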
- Run the desired model:
apptainer exec --nv $OLLAMA_SIF ollama run llama3.2
This should produce output like the following (if this is the first time you run this model, you will also see progress bars while it downloads):
llama_new_context_with_model: graph splits = 2
⠙ time=2025-05-05T14:49:04.971+10:00 level=INFO source=server.go:597 msg="llama runner started in 2.76 seconds"
[GIN] 2025/05/05 - 14:49:04 | 200 | 3.199581505s | 127.0.0.1 | POST "/api/generate"
>>> Hello there, how are you?
I'm just a language model, so I don't have emotions or feelings in the way that humans do. However, I'm functioning properly and ready to assist you with any questions or tasks
you may have! How can I help you today?[GIN] 2025/05/05 - 14:49:30 | 200 | 2.00997452s | 127.0.0.1 | POST "/api/chat"
>>>
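To leave the interactive prompt, type /bye (or press Ctrl+D). You can see which models you have downloaded so far with:
apptainer exec --nv $OLLAMA_SIF ollama list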
Note: the "--nv" flag enables GPU acceleration inside the Ollama container; without it, inference falls back to CPU and is very slow.
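The background server also exposes an HTTP API on the port you chose earlier, so you can query it non-interactively. A minimal sketch, assuming the port 12345 from the export above and the llama3.2 model pulled earlier:
curl http://localhost:12345/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
Setting "stream": false returns the whole response as a single JSON object instead of a stream of tokens.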