Quick start guide
- Prerequisites
- Starting a virtual machine with cudoctl
- Installing Ollama via SSH
- Using Docker to start an LLM API
Prerequisites
- Create a project and add an SSH key
- Download the CLI tool
Starting a virtual machine with cudoctl
Start a virtual machine with the base image you require; here we will use an image that already has NVIDIA drivers installed. You can use the web console to start a virtual machine with the Ubuntu 22.04 + NVIDIA drivers + Docker image, or alternatively use the command line tool cudoctl.
To use the command line tool you will need to get an API key from the web console; see: API key
Then run cudoctl init and enter your API key.
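For example:

```bash
# Authenticate cudoctl; you will be prompted to enter your API key
cudoctl init
```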
First we search for a virtual machine type to start. Having chosen the epyc-milan-rtx-a4000 machine type (16GB GPU) in the se-smedjebacken-1 data center and the ubuntu-2204-nvidia-535-docker-v20240214 image, we can start a virtual machine:
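A minimal sketch of the search and create commands; the exact subcommand spellings, flag names, and the VM id my-ollama-vm are assumptions here, so confirm the precise syntax with `cudoctl --help`:

```bash
# Search for a suitable GPU machine type (subcommand syntax is an assumption)
cudoctl search "A4000"

# Create the VM (flag names and the id "my-ollama-vm" are assumptions)
cudoctl vm create my-ollama-vm \
  --machine-type epyc-milan-rtx-a4000 \
  --data-center se-smedjebacken-1 \
  --image ubuntu-2204-nvidia-535-docker-v20240214 \
  --gpus 1
```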
Installing Ollama via SSH
Get the IP address of the virtual machine from the web console, then connect over SSH and install Ollama; a sketch of the commands follows the model table below. Here are some example models that can be downloaded:

| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 2 | 7B | 3.8GB | ollama run llama2 |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Dolphin Phi | 2.7B | 1.6GB | ollama run dolphin-phi |
| Phi-2 | 2.7B | 1.7GB | ollama run phi |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b |
| Llama 2 70B | 70B | 39GB | ollama run llama2:70b |
| Orca Mini | 3B | 1.9GB | ollama run orca-mini |
| Vicuna | 7B | 3.8GB | ollama run vicuna |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Gemma | 2B | 1.4GB | ollama run gemma:2b |
| Gemma | 7B | 4.8GB | ollama run gemma:7b |
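A minimal sketch of connecting and installing, assuming the image's default login user is root (substitute your VM's IP address and user) and using Ollama's official install script:

```bash
# Connect to the VM (replace <vm-ip> with the address from the console)
ssh root@<vm-ip>

# Install Ollama via its official install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run a model from the table above, e.g. Llama 2 7B
ollama run llama2
```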
Using Docker to start an LLM API
If you created a VM in the previous step, delete it first; a sketch of the delete command is shown below.
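The `vm delete` subcommand spelling and the VM id are assumptions; verify the exact syntax with `cudoctl --help`:

```bash
# Delete the VM created in the previous step (id is an assumption)
cudoctl vm delete my-ollama-vm
```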
Create a start script file start-ollama.txt that runs an Ollama container and pulls the gemma:7b model, then pass it when creating the virtual machine with --start-script-file start-ollama.txt:
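A minimal sketch of start-ollama.txt, assuming Ollama's official Docker image (ollama/ollama) and its default API port 11434:

```bash
#!/bin/bash
# Run the Ollama server in Docker with GPU access
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull the gemma:7b model inside the container
docker exec ollama ollama pull gemma:7b
```

Then create the VM as before, adding the start script (the create subcommand and flag names are assumptions; check `cudoctl vm create --help`):

```bash
cudoctl vm create my-ollama-vm \
  --machine-type epyc-milan-rtx-a4000 \
  --data-center se-smedjebacken-1 \
  --image ubuntu-2204-nvidia-535-docker-v20240214 \
  --gpus 1 \
  --start-script-file start-ollama.txt
```

Once the VM is up, the Ollama HTTP API is available on port 11434 and can be queried directly:

```bash
# Generate a completion via Ollama's REST API (replace <vm-ip>)
curl http://<vm-ip>:11434/api/generate -d '{
  "model": "gemma:7b",
  "prompt": "Why is the sky blue?"
}'
```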