Get started
Go to the apps section in the web console and click either the small, medium or large instance of Ollama. This will give you some good default settings, but you can fully customise your deployment at the next step.
Customise the deployment
You can simply choose an ID for your app and deploy it, or you may want to configure the spec of the machine.
GPU selection
The model(s) you wish to run will determine the amount of VRAM you will need on your GPU. Ollama supports the models listed at ollama.com/library. Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Gemma 3 | 1B | 815MB | ollama run gemma3:1b |
| Gemma 3 | 4B | 3.3GB | ollama run gemma3 |
| Gemma 3 | 12B | 8.1GB | ollama run gemma3:12b |
| Gemma 3 | 27B | 17GB | ollama run gemma3:27b |
| QwQ | 32B | 20GB | ollama run qwq |
| DeepSeek-R1 | 7B | 4.7GB | ollama run deepseek-r1 |
| DeepSeek-R1 | 671B | 404GB | ollama run deepseek-r1:671b |
| Llama 3.3 | 70B | 43GB | ollama run llama3.3 |
| Llama 3.2 | 3B | 2.0GB | ollama run llama3.2 |
| Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b |
| Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision |
| Llama 3.2 Vision | 90B | 55GB | ollama run llama3.2-vision:90b |
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 4 | 14B | 9.1GB | ollama run phi4 |
| Phi 4 Mini | 3.8B | 2.5GB | ollama run phi4-mini |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Granite-3.2 | 8B | 4.9GB | ollama run granite3.2 |
Disk size
The default disk size is set between 100 and 200 GB, which should be enough for most users. However, many people like to compare the performance of several models, so if you plan to download and use multiple models, consider increasing your boot disk size.
Using Ollama
When you deploy the VM you will be shown the VM information page. On the left-hand side there is a pane called ‘Metadata’. For Ollama it contains the following values: port and CUDO_TOKEN.
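The curl examples in the next sections need these two values plus the VM's public IP. As a convenience, and purely as an illustrative sketch (the variable names below are not part of the deployment), you could export them in your shell:

```bash
# Placeholder values copied from the VM information page and Metadata pane;
# replace each one with the value shown for your deployment.
export VM_IP=<your-vm-ip>                 # public IP of the VM
export PORT=<port-from-metadata>          # 'port' value from the Metadata pane
export CUDO_TOKEN=<token-from-metadata>   # 'CUDO_TOKEN' value from the Metadata pane
```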
Pull a model
Use curl from your local machine to pull a model. The full model list is in the Ollama library. The model needs to fit in your GPU memory and on the VM disk. Here is an example curl request pulling tinyllama:
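A minimal sketch of that request, assuming the VM_IP, PORT and CUDO_TOKEN placeholders exported above; passing the token as a Bearer header is an assumption about how this deployment authenticates, so adjust it to match your setup:

```bash
# Pull tinyllama onto the VM via Ollama's /api/pull endpoint.
curl http://$VM_IP:$PORT/api/pull \
  -H "Authorization: Bearer $CUDO_TOKEN" \
  -d '{"name": "tinyllama"}'
```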
Test completion
Now try sending a completion using curl. Here we have turned streaming off to make the response more readable.
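A sketch of such a request against the /api/generate endpoint, under the same placeholder and authentication assumptions as the pull example:

```bash
# Generate a completion; "stream": false returns one JSON object
# instead of a stream of partial responses.
curl http://$VM_IP:$PORT/api/generate \
  -H "Authorization: Bearer $CUDO_TOKEN" \
  -d '{
    "model": "tinyllama",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```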
Continue with curl / REST API
The API has more endpoints, listed below; you can continue using curl or any other REST tool (see the API docs). An example request to a couple of these endpoints is sketched after the list.
- Generate a completion
- Generate a chat completion
- Create a Model
- List Local Models
- Show Model Information
- Copy a Model
- Delete a Model
- Pull a Model
- Push a Model
- Generate Embeddings
- List Running Models
- Version
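For instance, under the same placeholder and authentication assumptions as above, listing local models and sending a chat completion might look like this:

```bash
# List the models currently available on the VM (List Local Models).
curl http://$VM_IP:$PORT/api/tags \
  -H "Authorization: Bearer $CUDO_TOKEN"

# Generate a chat completion (Generate a chat completion).
curl http://$VM_IP:$PORT/api/chat \
  -H "Authorization: Bearer $CUDO_TOKEN" \
  -d '{
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "Summarise what Ollama does in one sentence."}],
    "stream": false
  }'
```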
Using Ollama with OpenAI API
Ollama also supports an OpenAI-compatible API. Note: OpenAI compatibility is experimental and is subject to major adjustments, including breaking changes. For fully-featured access to the Ollama API, see the Ollama Python library, JavaScript library and REST API. Install the openai sdk and reuse the port and CUDO_TOKEN values from the Metadata pane:
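A sketch of the OpenAI-compatible route using the same placeholders; /v1/chat/completions is Ollama's OpenAI-compatible chat endpoint, while the Bearer token usage is again an assumption about this deployment:

```bash
# Install the OpenAI SDK if you want to call the API from Python or JavaScript.
pip install openai

# The same OpenAI-compatible endpoint can also be exercised with curl.
curl http://$VM_IP:$PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CUDO_TOKEN" \
  -d '{
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```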